qazal
7cfe367c07
failing test for slow embedding kernel with FUSE_ARANGE=1 [pr] ( #10330 )
2025-05-15 14:58:11 +03:00
qazal
0a45cd0cbe
grouper: merge views in fuse elementwise ( #10325 )
...
* grouper: merge views in fuse elementwise
* with gradient api
2025-05-15 13:17:09 +03:00
qazal
89d8d5b25e
add dims check in FUSE_ARANGE ( #10323 )
2025-05-15 11:33:21 +03:00
qazal
8fad0f0124
grouper: check for unsafe PAD in FUSE ( #10322 )
2025-05-15 10:53:44 +03:00
chenyu
f008e5f233
test_dtype_alu should cast bf16 input ( #10320 )
...
When testing ALU ops for bfloat16, the inputs should be cast to bfloat16 first; otherwise the comparison against numpy includes both the input rounding error and the ALU error, which makes it less accurate.
2025-05-15 01:11:39 -04:00
George Hotz
568d6d96e7
small changes from new multi [pr] ( #10318 )
2025-05-14 20:50:59 -07:00
chenyu
f6cf25fce4
cleanup test_conv2d_ceildiv_edge_case [pr] ( #10317 )
2025-05-14 23:35:28 -04:00
Kirill R.
50d7162acd
Add conv2d ceildiv edge case ( #10303 )
2025-05-14 22:50:23 -04:00
wozeparrot
9bbc2bc2a7
hotfix: filter_too_much ( #10308 )
2025-05-14 15:31:51 -07:00
George Hotz
42e70193c9
multi: instead of real, just copy ( #10289 )
...
* multi: instead of real, just copy
* fix test
* remove real
2025-05-14 10:36:55 -07:00
qazal
043efc6ec4
do not require self for track_rewrites [pr] ( #10302 )
2025-05-14 18:23:32 +03:00
qazal
d342f7688d
remove some skips in test_schedule + use assertRaisesRegex [pr] ( #10296 )
2025-05-14 14:54:07 +03:00
qazal
40f4ce3390
enable AMD CI for TestRandomness.test_multinomial [pr] ( #10295 )
2025-05-14 14:32:22 +03:00
qazal
1770e00c41
only CAPTURE_PROCESS_REPLAY=1 + add filterwarnings back [pr] ( #10292 )
2025-05-14 11:58:42 +03:00
qazal
1c97338be5
enable process replay assert for schedule [pr] ( #10280 )
...
* enable process replay assert for schedule
* start at unique+1
2025-05-14 11:10:47 +03:00
uuuvn
7bc4864bc4
Make dev a property of Allocator ( #10286 )
...
* Make `dev` a property of `Allocator`
(this is a prerequisite refactor for #10285)
At least `BufferXfer.copy` accesses it assuming it's always present;
currently most devices just add this property on their own, repeating
the same code over and over again.
This is also a bit of a footgun: see `RemoteAllocator`, which named this
property `device` instead of `dev`. I could obviously just change that
in one place, but doing it globally seems like a better solution (and it
reduces code duplication too).
`MallocAllocator` is a bit special, but passing `None` works just fine.
* typing
* ignore type instead of cast
2025-05-13 17:01:01 -07:00
uuuvn
ddff9857b8
Remote properties is a dataclass ( #10283 )
...
Not strictly required for anything, but soon there will be about 4 new
properties, and having it be a huge JSON just seems like bad taste.
It also seems right to not have a separate endpoint for this, just a
`GetProperties` request that returns a repr of this, similar to how
requests are sent in `BatchRequest`.
This will also make a switch to anything other than HTTP much simpler
if that is ever required for any reason, e.g. just a TCP stream of
`BatchRequest`s.
2025-05-13 11:56:58 -07:00
uuuvn
ba87eca0f1
Remote multi (basic) ( #10269 )
...
* Basic remote multi support
Simplest way to use remote with multiple GPUs; very slow because there
are no direct transfers (cross-device copies go through copyin/copyout)
* tests
2025-05-13 09:52:47 -07:00
George Hotz
5f64bbc63d
improve multi tests + add support for fixedvars [pr] ( #10281 )
...
* improve multi tests + add support for fixedvars [pr]
* add support for fixedvars
2025-05-13 09:27:00 -07:00
chenyu
8a906cb124
Tensor.randn_like ( #10276 )
2025-05-13 11:53:59 -04:00
chenyu
c4988bc07b
only run test_u32_to_f16 if it supports fp16 ( #10277 )
...
* only run test_u32_to_f16 if it supports fp16
* cleanup
2025-05-13 11:16:14 -04:00
uuuvn
1900c3c68a
Metal multi in ci is fine actually ( #10274 )
...
Useful for testing remote multi stuff
2025-05-13 10:07:35 -04:00
nimlgen
6f42bf8b54
usbgpu: 10 steps in benchmark to hit cache ( #10273 )
2025-05-13 17:06:50 +03:00
qazal
a2d6b0afe0
fix FUSE pushing through SHRINK ( #10271 )
2025-05-13 11:38:53 +03:00
geohotstan
1c4ab6b991
ONNX add tests against ORT ( #10270 )
...
* start
* clean up
* indicate file location too
2025-05-13 04:03:52 -04:00
Sieds Lykles
02208565de
add check ( #10257 )
2025-05-12 11:03:01 -04:00
Kirill R.
4c7c139102
Use cmod/cdiv in sym_infer ( #10258 )
...
* Use cmod/cdiv in sym_infer
* test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-12 09:07:28 -04:00
qazal
95c6a736a9
fix FUSE_ARANGE=1 for bert ( #10255 )
2025-05-12 14:44:05 +03:00
Sieds Lykles
7c4b381fbf
Extra simplify valid test [pr] ( #10256 )
...
* add test
* Change the range
* add todo test
2025-05-12 07:32:03 -04:00
chenyu
70c797b107
train bert tests ( #10248 )
...
added a working bert tiny test, and a failed bert FUSE_ARANGE test
2025-05-11 08:42:08 -04:00
nimlgen
2145bce3f9
usbgpu: copyin size is 16k ( #10240 )
...
* usbgpu: copyin size is 16k
* ush
2025-05-09 22:12:54 +03:00
Sieds Lykles
74e40aafa0
use cdiv in div and mod folding ( #10216 )
...
* use cdiv
* use cdiv and cmod there as well
* Add tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-09 12:37:24 -04:00
Sieds Lykles
8da9c070ca
take gcd out of trunc div ( #10238 )
2025-05-09 12:08:10 -04:00
chenyu
9846435c2e
fix test_div_numerator_negative ( #10229 )
...
the simplification was wrong with negative const_factor
2025-05-09 06:19:59 -04:00
chenyu
cba508c8c3
update uop symbolic tests ( #10228 )
...
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319
better bound for mod negative number ( #10227 )
2025-05-09 01:19:47 -04:00
chenyu
99f6d89dfb
tighter idiv bound for symbolic denominator ( #10226 )
2025-05-08 22:38:56 -04:00
qazal
b6904bbf83
Revert "split grouper into insert and finalize stages [pr] ( #10222 )" ( #10224 )
...
This reverts commit 2594e4db15.
2025-05-09 03:02:38 +03:00
qazal
2594e4db15
split grouper into insert and finalize stages [pr] ( #10222 )
2025-05-09 02:36:22 +03:00
qazal
1d0f239df7
use Tensor.train() in schedule test + typo [pr] ( #10220 )
2025-05-08 23:46:42 +03:00
nimlgen
267ba9b592
usbgpu: better names in copy speed benchmark ( #10212 )
2025-05-08 16:12:37 +03:00
hooved
7b4f05fd00
Add test for correctness of Infinity in WebGPU ( #10201 )
...
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
ba52fce4b2
usbgpu: benchmark in ci ( #10208 )
...
* usbgpu: benchmark
* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
qazal
d0e3449992
remove view_supported_devices, check allocator instead [pr] ( #10209 )
2025-05-08 11:45:02 +03:00
George Hotz
8d4c563c01
all COPY can be clone ( #10205 )
...
* match old behavior
* simple
* it means the naive thing before the multi
* fix
2025-05-07 20:31:39 -07:00
hooved
8e76c40aea
Refactor test: Enable generality in testing UOp alu expressions ( #10200 )
...
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
* isolate test refactor
2025-05-07 19:39:44 -07:00
uuuvn
10c9ede6b7
Cloud graph ( #9876 )
2025-05-07 11:41:41 -07:00
Sieds Lykles
2891892834
Fold constant variable ( #10196 )
...
* Add rule
* add test and comment
* merge rule
2025-05-07 11:39:44 -07:00
Sieds Lykles
8386527bb9
Take neg out of idiv ( #10164 )
...
* Add rules
* Fix tests
* Move rules lower to prevent recursion
2025-05-07 11:39:08 -07:00
Sieds Lykles
09544d4556
Add rule and test ( #10189 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-07 10:15:55 -04:00