nimlgen
d08ce62553
hcq: do not reread signal in wait ( #10232 )
2025-05-09 14:38:36 +03:00
nimlgen
0464a31000
usbgpu: no overrun check needed ( #10231 )
2025-05-09 14:20:24 +03:00
nimlgen
116390083f
nvme speed write example ( #10230 )
2025-05-09 14:20:01 +03:00
chenyu
9846435c2e
fix test_div_numerator_negative ( #10229 )
the simplification was wrong with negative const_factor
2025-05-09 06:19:59 -04:00
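The negative-const_factor bug above is easier to follow when you can see the division semantics involved. The minimal sketch below assumes C-style integer division that truncates toward zero. The helpers `cdiv` and `cmod` are written here for illustration only. The sketch shows where truncating division and Python's floor division diverge once an operand is negative:

```python
def cdiv(x: int, y: int) -> int:
    """Integer division truncating toward zero (C semantics); y must be nonzero."""
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

def cmod(x: int, y: int) -> int:
    """Remainder paired with cdiv, so x == cdiv(x, y) * y + cmod(x, y)."""
    return x - cdiv(x, y) * y

# Python's // floors toward negative infinity, so it disagrees once signs differ.
print(-7 // 2, cdiv(-7, 2))  # -4 -3
print(-7 % 2, cmod(-7, 2))   # 1 -1
print(7 // -2, cdiv(7, -2))  # -4 -3
```

A rewrite that is valid under one convention but not the other changes results only for negative operands. A dedicated negative-numerator test is the kind of check that catches this.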
chenyu
cba508c8c3
update uop symbolic tests ( #10228 )
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319
better bound for mod negative number ( #10227 )
2025-05-09 01:19:47 -04:00
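The sketch below shows one way to bound `x % c` when `x` may be negative. It assumes a truncated (C-style) remainder, whose sign follows the dividend. This is an illustrative interval rule, not necessarily the exact one the commit adds. The rule rests on the fact that |x % c| is at most min(|x|, |c| - 1):

```python
def cdiv(x: int, y: int) -> int:
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

def cmod(x: int, y: int) -> int:
    return x - cdiv(x, y) * y

def mod_bounds(vmin: int, vmax: int, c: int) -> tuple[int, int]:
    """Interval containing cmod(x, c) for every x in [vmin, vmax]; c must be nonzero.

    The truncated remainder has the sign of x and |cmod(x, c)| <= min(|x|, |c| - 1).
    """
    m = abs(c) - 1
    lo = 0 if vmin >= 0 else max(vmin, -m)  # negative x can only go down to max(x, -m)
    hi = 0 if vmax <= 0 else min(vmax, m)   # positive x can only go up to min(x, m)
    return lo, hi

print(mod_bounds(-3, 10, 8))   # (-3, 7) rather than the naive (-7, 7)
print(mod_bounds(-20, -1, 5))  # (-4, 0)
```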
chenyu
99f6d89dfb
tighter idiv bound for symbolic denominator ( #10226 )
2025-05-08 22:38:56 -04:00
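When the denominator is itself a variable with a known range, the quotient can be bounded from both ranges together. The sketch below is hypothetical. It assumes truncating division and a strictly positive denominator, and the helper names are illustrative, not tinygrad's. For a fixed positive `y`, the quotient is nondecreasing in `x`. For a fixed `x`, it is monotone in `y`. The extremes over the box are therefore attained at its four corners:

```python
from itertools import product

def cdiv(x: int, y: int) -> int:
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

def idiv_bounds(xmin: int, xmax: int, ymin: int, ymax: int) -> tuple[int, int]:
    """Interval containing cdiv(x, y) for x in [xmin, xmax], y in [ymin, ymax], ymin > 0."""
    if ymin <= 0:
        raise ValueError("this sketch only handles strictly positive denominators")
    # Monotone in each argument separately, so checking the corners is enough.
    corners = [cdiv(x, y) for x, y in product((xmin, xmax), (ymin, ymax))]
    return min(corners), max(corners)

print(idiv_bounds(0, 100, 4, 10))  # (0, 25)
print(idiv_bounds(-30, 30, 2, 5))  # (-15, 15)
```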
uuuvn
82a6160ff7
Detect metal paravirtualization bug via device name instead of CI ( #10225 )
2025-05-08 19:31:47 -07:00
Xingyu
a21369d039
Enhance tensor random functions with dtype support ( #10214 )
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py
* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
2025-05-08 20:48:07 -04:00
qazal
b6904bbf83
Revert "split grouper into insert and finalize stages [pr] ( #10222 )" ( #10224 )
This reverts commit 2594e4db15.
2025-05-09 03:02:38 +03:00
qazal
2594e4db15
split grouper into insert and finalize stages [pr] ( #10222 )
2025-05-09 02:36:22 +03:00
George Hotz
0b7e3e86d0
single device copy [pr] ( #10221 )
* single device copy [pr]
* simpler
2025-05-08 15:23:22 -07:00
qazal
1d0f239df7
use Tensor.train() in schedule test + typo [pr] ( #10220 )
2025-05-08 23:46:42 +03:00
qazal
ff2aa6d0b2
buffer in create_kernel is optional [pr] ( #10218 )
* buffer in create_kernel is optional [pr]
* pylint
2025-05-08 22:35:55 +03:00
qazal
40560e77c2
minor grouper + viz fixup [pr] ( #10217 )
* minor grouper + viz fixup [pr]
* gitignore mypy_cache
* reorder create_kernels
* replace with realized
* use tensor_map + viz before spec
* lint
* add that back
2025-05-08 21:39:44 +03:00
George Hotz
0411b09763
small changes from new multi [pr] ( #10213 )
2025-05-08 07:04:27 -07:00
Sieds Lykles
a0580e8d3c
Cleanup in div_and_mod_folding [pr] ( #10178 )
* Refactor binary var simplification
* Simplify the congruence logic
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-08 06:25:32 -07:00
nimlgen
267ba9b592
usbgpu: better names in copy speed benchmark ( #10212 )
2025-05-08 16:12:37 +03:00
hooved
7b4f05fd00
Add test for correctness of Infinity in WebGPU ( #10201 )
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
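A test of infinity handling needs reference values to compare against. The sketch below is illustrative and is not the test the commit adds. It lists the IEEE 754 results that Python itself produces for a few operations involving infinity, including two that must yield NaN:

```python
import math

inf = math.inf
# IEEE 754 reference results that a backend's infinity math is expected to reproduce.
expected = {
    "inf + 1": inf + 1,
    "-inf * 2": -inf * 2,
    "1 / inf": 1 / inf,
    "max(inf, 3.0)": max(inf, 3.0),
    "inf - inf": inf - inf,  # nan
    "inf * 0": inf * 0,      # nan
}
for expr, value in expected.items():
    print(f"{expr:>14} -> {value}")
```

NaN never compares equal to itself, so comparisons against these entries need `math.isnan` rather than `==`.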
nimlgen
e24fe1c746
usbgpu: pci cache ( #10207 )
2025-05-08 14:31:01 +03:00
nimlgen
7d6ed1b1e9
hotfix: mac ci ( #10210 )
* fixed?
* cmnt
2025-05-08 14:13:23 +03:00
nimlgen
ba52fce4b2
usbgpu: benchmark in ci ( #10208 )
* usbgpu: benchmark
* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
qazal
d0e3449992
remove view_supported_devices, check allocator instead [pr] ( #10209 )
2025-05-08 11:45:02 +03:00
nimlgen
5a7f6b4d8e
am: fix launch on rdna4 ( #10206 )
2025-05-08 09:46:12 +03:00
George Hotz
8d4c563c01
all COPY can be clone ( #10205 )
* match old behavior
* simple
* it means the naive thing before the multi
* fix
2025-05-07 20:31:39 -07:00
hooved
8e76c40aea
Refactor test: Enable generality in testing UOp alu expressions ( #10200 )
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
* isolate test refactor
2025-05-07 19:39:44 -07:00
George Hotz
83efc5d5bb
lil changes from multi [pr] ( #10202 )
2025-05-07 14:42:30 -07:00
Rory Clear
9f2931ae67
Fix yolo load failing silently ( #10046 )
* wait for js before loading model
* use f32
* revert html changes, try both cameras and remove f16 req
* clean
2025-05-07 11:46:09 -07:00
uuuvn
10c9ede6b7
Cloud graph ( #9876 )
2025-05-07 11:41:41 -07:00
Sieds Lykles
2891892834
Fold constant variable ( #10196 )
* Add rule
* add test and comment
* merge rule
2025-05-07 11:39:44 -07:00
Sieds Lykles
8386527bb9
Take neg out of idiv ( #10164 )
* Add rules
* Fix tests
* Move rules lower to prevent recursion
2025-05-07 11:39:08 -07:00
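Taking a negation out of an integer division relies on an identity that holds for truncating division. Under truncation, (-x) / c == -(x / c) and x / (-c) == -(x / c). The sketch below checks both identities by brute force over a small range. It assumes C-style truncation, and `cdiv` is an illustrative helper. It also shows that the same rewrite is invalid for Python's floor division:

```python
def cdiv(x: int, y: int) -> int:
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

def neg_rules_hold(lim: int = 50) -> bool:
    """Check (-x)/c == -(x/c) and x/(-c) == -(x/c) for truncating division."""
    return all(cdiv(-x, c) == -cdiv(x, c) and cdiv(x, -c) == -cdiv(x, c)
               for x in range(-lim, lim + 1) for c in range(-7, 8) if c != 0)

print(neg_rules_hold())    # True
print(-7 // 2, -(7 // 2))  # -4 -3: floor division breaks the identity
```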
qazal
e6c80a9e40
hotfix: early kwargs.pop('err') ( #10197 )
* hotfix: early kwargs.pop('err')
* err, no container
2025-05-07 23:53:26 +08:00
qazal
3bc72f02d9
better error message for linearizer failures in viz [pr] ( #10195 )
2025-05-07 23:11:44 +08:00
Sieds Lykles
09544d4556
Add rule and test ( #10189 )
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-07 10:15:55 -04:00
nimlgen
c603b86d69
usbgpu: move queues to controller ( #10194 )
2025-05-07 16:41:16 +03:00
nimlgen
0fbe494c6b
usb: cache writes into 0xa000 ( #10191 )
* usb: cache writes into 0xa000
* mock
* match parent spec
* ugh
2025-05-07 16:03:35 +03:00
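One plausible reading of "cache writes into 0xa000" is that a write identical to the previous one at that address is skipped. The sketch below works under that assumption. `CachedWriter` and `write_fn` are hypothetical names, not the usbgpu API. Skipping is only safe if nothing but the host modifies the cached memory between writes:

```python
from typing import Callable

class CachedWriter:
    """Hypothetical sketch: skip a write whose bytes match the last write to a cached address.

    `write_fn(addr, data)` stands in for the real (expensive) USB transfer.
    """
    def __init__(self, write_fn: Callable[[int, bytes], None], cached_addrs: set[int]):
        self.write_fn, self.cached_addrs = write_fn, cached_addrs
        self.last: dict[int, bytes] = {}

    def write(self, addr: int, data: bytes) -> None:
        if addr in self.cached_addrs and self.last.get(addr) == data:
            return  # identical to the previous write: skip the transfer
        self.write_fn(addr, data)
        if addr in self.cached_addrs:
            self.last[addr] = bytes(data)

sent = []
w = CachedWriter(lambda a, d: sent.append((a, d)), cached_addrs={0xA000})
w.write(0xA000, b"\x01\x02")
w.write(0xA000, b"\x01\x02")  # skipped
w.write(0xA000, b"\x03\x04")
w.write(0xB000, b"\x01")
w.write(0xB000, b"\x01")      # not a cached address, so it is sent again
print(len(sent))  # 4
```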
nimlgen
b8fb0f11ff
hcq: parametrize signal allocation size ( #10192 )
2025-05-07 15:50:43 +03:00
nimlgen
685d5c46df
usbgpu: send pci write in batches ( #10190 )
* usbgpu: send pci write in batches
* mock
2025-05-07 14:41:56 +03:00
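Batching replaces many small transfers with fewer larger ones. This matters when every transfer pays a fixed per-call cost, as USB round trips do. The hypothetical sketch below covers only the grouping logic. `send_batched`, `send`, and the batch size are illustrative and are not the usbgpu API:

```python
from typing import Callable, Iterable

def send_batched(writes: Iterable[tuple[int, int]],
                 send: Callable[[list[tuple[int, int]]], None],
                 batch_size: int = 16) -> int:
    """Group (address, value) writes and call `send` once per batch; return the call count."""
    batch: list[tuple[int, int]] = []
    calls = 0
    for w in writes:
        batch.append(w)
        if len(batch) == batch_size:
            send(batch)
            calls += 1
            batch = []  # start a fresh list so the sent batch is not mutated
    if batch:  # flush the partial final batch
        send(batch)
        calls += 1
    return calls

batches = []
n = send_batched(((0x1000 + 4 * i, i) for i in range(40)), batches.append, batch_size=16)
print(n, [len(b) for b in batches])  # 3 [16, 16, 8]
```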
qazal
3a32fa228c
refactor merge_views matcher [pr] ( #10188 )
2025-05-07 19:22:06 +08:00
qazal
94e07725a6
only reorder expand if it can fuse with input ( #10186 )
* failing test
* only reorder expand if it can fuse with input
* (16,) is reshaped to (4, 4)
2025-05-07 18:14:31 +08:00
qazal
4ea3e373aa
decode lds ops in remu ( #10184 )
2025-05-07 16:44:18 +08:00
uuuvn
dba073e5c0
Less messy broken graph on paravirtualized metal workaround ( #10182 )
* Less messy broken graph on paravirtualized metal workaround
GitHub CI macOS runners use paravirtualized Metal, which is broken with
graph (some comments say that ICBs in particular are broken, but in my
testing they sometimes worked and other times hit an assert inside
Metal's code related to resources, so I'm not sure).
> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.
This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own Virtualization framework.
* unused import
2025-05-06 20:41:02 +03:00
nimlgen
59c03e8904
usbgpu: tiny changes in setup pci bars to match spec ( #10181 )
* usbgpu: tiny changes in setup pci bars to match spec
* unused
2025-05-06 20:39:03 +03:00
Ignacio Sica
74c25bdc8b
add support for ds_load_u8 in remu ( #10180 )
* add support for ds_load_u8 in remu
* add test for ds_load_u8
2025-05-06 20:31:00 +03:00
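ds_load_u8 reads a single unsigned byte from LDS (local data share) and zero-extends it into a 32-bit VGPR. Its signed counterpart, ds_load_i8, sign-extends the byte instead. The sketch below models both in Python. The function names and the simplified address computation (base plus instruction offset) are illustrative and do not reflect how remu implements them:

```python
def ds_load_u8(lds: bytes, addr: int, offset: int = 0) -> int:
    """Zero-extend: the upper 24 bits of the 32-bit destination are 0."""
    return lds[addr + offset]

def ds_load_i8(lds: bytes, addr: int, offset: int = 0) -> int:
    """Sign-extend bit 7 into the upper 24 bits; returns the raw 32-bit register value."""
    b = lds[addr + offset]
    return (b | 0xFFFFFF00) if b & 0x80 else b

lds = bytes([0x7F, 0x80, 0xFF, 0x01])
print(hex(ds_load_u8(lds, 2)))  # 0xff
print(hex(ds_load_i8(lds, 2)))  # 0xffffffff
```

The two variants return the same value for bytes below 0x80. Testing u8 loads therefore needs byte values with the high bit set, or a missing zero-extension goes unnoticed.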
nimlgen
10f115fdb0
usbgpu: USB_RESCAN_BUS envvar ( #10177 )
2025-05-06 17:09:36 +03:00
nimlgen
781fd8c1eb
usbgpu: some tlp error info ( #10176 )
* usbgpu: some tlp error info
* oops
2025-05-06 17:01:10 +03:00
nimlgen
aea1f77225
amd: uppercase amd_iface vals ( #10175 )
2025-05-06 15:12:50 +03:00
nimlgen
34d55857cf
usbgpu: more devs in scan_pci ( #10171 )
2025-05-06 11:55:34 +03:00
nimlgen
37a7a99adb
metal: fix graph when unrelated input buffers are not metal buffers ( #10170 )
* metal: fix graph when unrelated input buffers are not metal buffers
* tinier test
2025-05-06 11:37:16 +03:00
George Hotz
603c03bef2
fix tests for rewrite [pr] ( #10167 )
* fix tests for rewrite [pr]
* cleaner
* delete linearize_uop
* clean up the rest
2025-05-05 19:19:49 -07:00