Commit Graph

4433 Commits

Author SHA1 Message Date
George Hotz
568d6d96e7 small changes from new multi [pr] (#10318) 2025-05-14 20:50:59 -07:00
chenyu
f6cf25fce4 cleanup test_conv2d_ceildiv_edge_case [pr] (#10317) 2025-05-14 23:35:28 -04:00
Kirill R.
50d7162acd Add conv2d ceildiv edge case (#10303) 2025-05-14 22:50:23 -04:00
wozeparrot
9bbc2bc2a7 hotfix: filter_too_much (#10308) 2025-05-14 15:31:51 -07:00
George Hotz
42e70193c9 multi: instead of real, just copy (#10289)
* multi: instead of real, just copy

* fix test

* remove real
2025-05-14 10:36:55 -07:00
qazal
043efc6ec4 do not require self for track_rewrites [pr] (#10302) 2025-05-14 18:23:32 +03:00
qazal
d342f7688d remove some skips in test_schedule + use assertRaisesRegex [pr] (#10296) 2025-05-14 14:54:07 +03:00
qazal
40f4ce3390 enable AMD CI for TestRandomness.test_multinomial [pr] (#10295) 2025-05-14 14:32:22 +03:00
qazal
1770e00c41 only CAPTURE_PROCESS_REPLAY=1 + add filterwarnings back [pr] (#10292) 2025-05-14 11:58:42 +03:00
qazal
1c97338be5 enable process replay assert for schedule [pr] (#10280)
* enable process replay assert for schedule

* start at unique+1
2025-05-14 11:10:47 +03:00
uuuvn
7bc4864bc4 Make dev a property of Allocator (#10286)
* Make `dev` a property of `Allocator`

(This is a prerequisite refactor for #10285.)

At least `BufferXfer.copy` accesses it assuming it is always present;
currently most devices add this property on their own, repeating
the same code over and over again.

This is also a bit of a footgun: see `RemoteAllocator`, which named this
property `device` instead of `dev`. I could just change that
in one place, but doing it globally seems like a better solution (and it
reduces code duplication too).

`MallocAllocator` is a bit special, but passing `None` works just fine.

* typing

* ignore type instead of cast
2025-05-13 17:01:01 -07:00
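A minimal sketch of the refactor described above, assuming a simplified allocator hierarchy. `Allocator`, `MallocAllocator` and `FakeGPUAllocator` here are hypothetical stand-ins, not tinygrad's actual classes. The base class stores `dev` once, so callers such as a transfer helper can rely on it being present, and the host allocator simply passes `None`.

```python
from typing import Any, Optional

class Allocator:
    """Hypothetical base allocator: stores the owning device once for all subclasses."""
    def __init__(self, dev: Optional[Any]):
        self.dev = dev  # always present, so callers never need hasattr() checks

    def alloc(self, size: int) -> bytearray:
        raise NotImplementedError

class MallocAllocator(Allocator):
    """Host allocator: there is no real device object, so it passes None."""
    def __init__(self):
        super().__init__(None)

    def alloc(self, size: int) -> bytearray:
        return bytearray(size)

class FakeGPUAllocator(Allocator):
    """Hypothetical device allocator that reuses the base-class dev attribute."""
    def alloc(self, size: int) -> bytearray:
        return bytearray(size)

def describe_owner(allocator: Allocator) -> str:
    # Like a cross-device copy helper, this reads .dev without per-class special cases.
    return "host" if allocator.dev is None else str(allocator.dev)

print(describe_owner(MallocAllocator()))          # host
print(describe_owner(FakeGPUAllocator("GPU:0")))  # GPU:0
```

Putting the attribute in the base class means a misnamed per-device property (like `device` vs `dev`) can no longer happen by accident.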
uuuvn
ddff9857b8 Remote properties is a dataclass (#10283)
Not strictly required for anything, but soon there will be about 4 new
properties, and keeping them all in one huge JSON blob seems like bad taste.

It also seems right not to have a separate endpoint for this, just a
`GetProperties` request that returns a repr of it, similar to how
requests are sent in `BatchRequest`.

This will also make switching to something other than HTTP much simpler
if that is ever required, e.g. a plain TCP stream of
`BatchRequest`s.
2025-05-13 11:56:58 -07:00
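A sketch of the repr round trip described above, assuming a hypothetical `RemoteProperties` dataclass with made-up fields (the commit does not list the real ones). The server answers a `GetProperties`-style request with the dataclass `repr()`, and the client rebuilds it by evaluating that string with only the dataclass in scope. Disabling builtins is not a security boundary; this only illustrates the serialization idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RemoteProperties:
    # Hypothetical fields; the commit does not list the real ones.
    device_name: str
    graph_supported: bool
    transfer_supported: bool

def serialize(props: RemoteProperties) -> str:
    # The dataclass-generated repr is a valid constructor call.
    return repr(props)

def deserialize(text: str) -> RemoteProperties:
    # Evaluate with builtins disabled and only the dataclass name visible.
    obj = eval(text, {"__builtins__": {}}, {"RemoteProperties": RemoteProperties})
    if not isinstance(obj, RemoteProperties):
        raise TypeError(f"unexpected payload: {text!r}")
    return obj

props = RemoteProperties("METAL", True, False)
wire = serialize(props)
print(wire)  # RemoteProperties(device_name='METAL', graph_supported=True, transfer_supported=False)
assert deserialize(wire) == props
```

Adding a property then means adding one dataclass field, with no change to the wire format or endpoints.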
uuuvn
ba87eca0f1 Remote multi (basic) (#10269)
* Basic remote multi support

The simplest way to use remote with multiple GPUs. Very slow,
because there are no transfers (cross-device copies go through copyout + copyin).

* tests
2025-05-13 09:52:47 -07:00
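To illustrate why this path is slow, here is a minimal sketch using a hypothetical in-memory `FakeDevice` class (not tinygrad's API). Without a direct transfer, a cross-device copy must copy the data out of the source into host memory and then copy it into the destination, so every byte crosses the host link twice.

```python
class FakeDevice:
    """Hypothetical device whose 'memory' is a dict of named bytearrays."""
    def __init__(self, name: str):
        self.name = name
        self.buffers: dict[str, bytearray] = {}
        self.bytes_moved = 0  # traffic between this device and the host

    def copyin(self, key: str, src: bytes) -> None:
        self.buffers[key] = bytearray(src)
        self.bytes_moved += len(src)

    def copyout(self, key: str) -> bytes:
        data = bytes(self.buffers[key])
        self.bytes_moved += len(data)
        return data

def cross_device_copy(src: FakeDevice, dst: FakeDevice, key: str) -> None:
    # No device-to-device transfer available: stage the data through the host.
    host_copy = src.copyout(key)
    dst.copyin(key, host_copy)

gpu0, gpu1 = FakeDevice("GPU:0"), FakeDevice("GPU:1")
gpu0.copyin("x", b"\x01\x02\x03\x04")
cross_device_copy(gpu0, gpu1, "x")
print(gpu0.bytes_moved, gpu1.bytes_moved)  # 8 4
```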
George Hotz
5f64bbc63d improve multi tests + add support for fixedvars [pr] (#10281)
* improve multi tests + add support for fixedvars [pr]

* add support for fixedvars
2025-05-13 09:27:00 -07:00
chenyu
8a906cb124 Tensor.randn_like (#10276) 2025-05-13 11:53:59 -04:00
chenyu
c4988bc07b only run test_u32_to_f16 if it supports fp16 (#10277)
* only run test_u32_to_f16 if it supports fp16

* cleanup
2025-05-13 11:16:14 -04:00
uuuvn
1900c3c68a Metal multi in ci is fine actually (#10274)
Useful for testing remote multi stuff
2025-05-13 10:07:35 -04:00
nimlgen
6f42bf8b54 usbgpu: 10 steps in benchmark to hit cache (#10273) 2025-05-13 17:06:50 +03:00
qazal
a2d6b0afe0 fix FUSE pushing through SHRINK (#10271) 2025-05-13 11:38:53 +03:00
geohotstan
1c4ab6b991 ONNX add tests against ORT (#10270)
* start

* clean up

* indicate file location too
2025-05-13 04:03:52 -04:00
Sieds Lykles
02208565de add check (#10257) 2025-05-12 11:03:01 -04:00
Kirill R.
4c7c139102 Use cmod/cdiv in sym_infer (#10258)
* Use cmod/cdiv in sym_infer

* test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-12 09:07:28 -04:00
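For reference, a minimal sketch of C-style (truncating) division and remainder, the semantics that `cdiv`/`cmod` refer to. These definitions are written for illustration and may differ in detail from tinygrad's helpers. Python's `//` and `%` round toward negative infinity, so they disagree with C whenever exactly one operand is negative.

```python
def cdiv(x: int, y: int) -> int:
    """Integer division rounding toward zero, as in C (raises on y == 0)."""
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

def cmod(x: int, y: int) -> int:
    """Remainder matching cdiv, so that x == cdiv(x, y) * y + cmod(x, y)."""
    return x - cdiv(x, y) * y

# Python floor semantics vs. C truncation for -7 / 2:
print(-7 // 2, -7 % 2)            # -4 1
print(cdiv(-7, 2), cmod(-7, 2))   # -3 -1
```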
qazal
95c6a736a9 fix FUSE_ARANGE=1 for bert (#10255) 2025-05-12 14:44:05 +03:00
Sieds Lykles
7c4b381fbf Extra simplify valid test [pr] (#10256)
* add test

* Change the range

* add todo test
2025-05-12 07:32:03 -04:00
chenyu
70c797b107 train bert tests (#10248)
Added a working BERT tiny test and a failing BERT FUSE_ARANGE test.
2025-05-11 08:42:08 -04:00
nimlgen
2145bce3f9 usbgpu: copyin size is 16k (#10240)
* usbgpu: copyin size is 16k

* ush
2025-05-09 22:12:54 +03:00
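The commit title only states the size. As an illustration, here is a minimal sketch of splitting a host-to-device copy into 16 KiB pieces, assuming a hypothetical `write_chunk(offset, data)` callback that stands in for the actual USB transfer.

```python
from typing import Callable

CHUNK_SIZE = 16 * 1024  # 16 KiB per transfer

def chunked_copyin(data: bytes, write_chunk: Callable[[int, memoryview], None],
                   chunk_size: int = CHUNK_SIZE) -> int:
    """Send data in chunk_size pieces; returns the number of chunks issued."""
    view = memoryview(data)  # slicing a memoryview avoids copying the payload
    count = 0
    for offset in range(0, len(view), chunk_size):
        write_chunk(offset, view[offset:offset + chunk_size])
        count += 1
    return count

device_memory = bytearray(40_000)

def fake_write(offset: int, chunk: memoryview) -> None:
    # Hypothetical stand-in for one USB write of at most 16 KiB.
    device_memory[offset:offset + len(chunk)] = chunk

payload = bytes(i % 251 for i in range(40_000))
print(chunked_copyin(payload, fake_write))  # 3
```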
Sieds Lykles
74e40aafa0 use cdiv in div and mod folding (#10216)
* use cdiv

* use cdiv and cmod there as well

* Add tests

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-09 12:37:24 -04:00
Sieds Lykles
8da9c070ca take gcd out of trunc div (#10238) 2025-05-09 12:08:10 -04:00
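The identity behind this kind of rewrite can be checked by brute force. A sketch, assuming truncating division and a positive divisor: if every term of the numerator and the divisor share a factor g, it cancels, e.g. (6*x + 4) // 10 becomes (3*x + 2) // 5. The helper `cancel_gcd` is hypothetical and only illustrates the idea, not the commit's rewrite rule.

```python
from math import gcd

def cdiv(x: int, y: int) -> int:
    """C-style integer division (rounds toward zero)."""
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

def cancel_gcd(coeffs: list[int], const: int, d: int) -> tuple[list[int], int, int]:
    """Divide the coefficients, the constant and the divisor d > 0 by their common gcd."""
    g = gcd(*coeffs, const, d)  # g >= 1 because d > 0
    return [c // g for c in coeffs], const // g, d // g

# (6*x + 4) // 10 has common factor 2, so it becomes (3*x + 2) // 5.
(a,), b, d = cancel_gcd([6], 4, 10)
print(a, b, d)  # 3 2 5
assert all(cdiv(6 * x + 4, 10) == cdiv(a * x + b, d) for x in range(-50, 51))
```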
chenyu
9846435c2e fix test_div_numerator_negative (#10229)
The simplification was wrong with a negative const_factor.
2025-05-09 06:19:59 -04:00
chenyu
cba508c8c3 update uop symbolic tests (#10228)
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319 better bound for mod negative number (#10227) 2025-05-09 01:19:47 -04:00
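A sketch of one sound interval bound for a truncating remainder by a positive constant c. The rule actually adopted in the commit may differ. The remainder takes the sign of the dividend and its magnitude is below c, so when the dividend range [lo, hi] contains negative values the lower bound is max(lo, 1 - c) rather than 0. The function name `cmod_bounds` is hypothetical.

```python
def cmod(x: int, c: int) -> int:
    """C-style remainder: the sign follows the dividend."""
    r = abs(x) % abs(c)
    return -r if x < 0 else r

def cmod_bounds(lo: int, hi: int, c: int) -> tuple[int, int]:
    """Interval containing cmod(x, c) for every lo <= x <= hi, assuming c > 0."""
    assert c > 0 and lo <= hi
    lower = 0 if lo >= 0 else max(lo, 1 - c)  # negative dividends give remainders in [1-c, 0]
    upper = 0 if hi <= 0 else min(hi, c - 1)  # non-negative dividends give remainders in [0, c-1]
    return lower, upper

print(cmod_bounds(-3, 5, 4))  # (-3, 3)
```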
chenyu
99f6d89dfb tighter idiv bound for symbolic denominator (#10226) 2025-05-08 22:38:56 -04:00
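A sketch of why corner evaluation gives a valid, and in fact exact, interval for x // y when both operands are ranges and the denominator range is strictly positive, assuming truncating division. For fixed y > 0 the quotient is non-decreasing in x, and for fixed x it is monotone in y, so the extremes over the box occur at its four corners. The helper `idiv_bounds` is hypothetical, not the commit's code.

```python
from itertools import product

def cdiv(x: int, y: int) -> int:
    """C-style integer division (rounds toward zero)."""
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

def idiv_bounds(xlo: int, xhi: int, ylo: int, yhi: int) -> tuple[int, int]:
    """Interval of cdiv(x, y) for xlo <= x <= xhi, ylo <= y <= yhi, assuming 0 < ylo <= yhi."""
    assert 0 < ylo <= yhi and xlo <= xhi
    corners = [cdiv(x, y) for x, y in product((xlo, xhi), (ylo, yhi))]
    return min(corners), max(corners)

print(idiv_bounds(-7, 9, 2, 4))  # (-3, 4)
```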
qazal
b6904bbf83 Revert "split grouper into insert and finalize stages [pr] (#10222)" (#10224)
This reverts commit 2594e4db15.
2025-05-09 03:02:38 +03:00
qazal
2594e4db15 split grouper into insert and finalize stages [pr] (#10222) 2025-05-09 02:36:22 +03:00
qazal
1d0f239df7 use Tensor.train() in schedule test + typo [pr] (#10220) 2025-05-08 23:46:42 +03:00
nimlgen
267ba9b592 usbgpu: better names in copy speed benchmark (#10212) 2025-05-08 16:12:37 +03:00
hooved
7b4f05fd00 Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
ba52fce4b2 usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark

* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
qazal
d0e3449992 remove view_supported_devices, check allocator instead [pr] (#10209) 2025-05-08 11:45:02 +03:00
George Hotz
8d4c563c01 all COPY can be clone (#10205)
* match old behavior

* simple

* it means the naive thing before the multi

* fix
2025-05-07 20:31:39 -07:00
hooved
8e76c40aea Refactor test: Enable generality in testing UOp alu expressions (#10200)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test

* isolate test refactor
2025-05-07 19:39:44 -07:00
uuuvn
10c9ede6b7 Cloud graph (#9876) 2025-05-07 11:41:41 -07:00
Sieds Lykles
2891892834 Fold constant variable (#10196)
* Add rule

* add test and comment

* merge rule
2025-05-07 11:39:44 -07:00
Sieds Lykles
8386527bb9 Take neg out of idiv (#10164)
* Add rules

* Fix tests

* Move rules lower to prevent recursion
2025-05-07 11:39:08 -07:00
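A quick check of the identity that pulling negation out of an integer division relies on, assuming C-style truncating division: cdiv(-x, y) == -cdiv(x, y) for all x and nonzero y. Python's floor division does not satisfy it.

```python
def cdiv(x: int, y: int) -> int:
    """C-style integer division (rounds toward zero)."""
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

pairs = [(x, y) for x in range(-30, 31) for y in range(-6, 7) if y != 0]

# Truncating division: a negation in the numerator can always be moved outside.
trunc_ok = all(cdiv(-x, y) == -cdiv(x, y) for x, y in pairs)
# Floor division: the same rewrite fails, e.g. (-7) // 2 == -4 but -(7 // 2) == -3.
floor_ok = all((-x) // y == -(x // y) for x, y in pairs)
print(trunc_ok, floor_ok)  # True False
```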
Sieds Lykles
09544d4556 Add rule and test (#10189)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-07 10:15:55 -04:00
nimlgen
0fbe494c6b usb: cache writes into 0xa000 (#10191)
* usb: cache writes into 0xa000

* mock

* match parent spec

* ugh
2025-05-07 16:03:35 +03:00
nimlgen
685d5c46df usbgpu: send pci write in batches (#10190)
* usbgpu: send pci write in batches

* mock
2025-05-07 14:41:56 +03:00
qazal
94e07725a6 only reorder expand if it can fuse with input (#10186)
* failing test

* only reorder expand if it can fuse with input

* (16,) is reshaped to (4, 4)
2025-05-07 18:14:31 +08:00
uuuvn
dba073e5c0 Less messy broken graph on paravirtualized metal workaround (#10182)
* Less messy broken graph on paravirtualized metal workaround

GitHub CI macOS runners use paravirtualized Metal, which is broken with
graph (some comments say that ICB in particular is broken; in my
testing it was sometimes fine, but other times it hit an assert inside
Metal's code related to resources, so I'm not sure).

> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.

This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own Virtualization framework.

* unused import
2025-05-06 20:41:02 +03:00
nimlgen
37a7a99adb metal: fix graph when unrelated input buffers are not metal buffers (#10170)
* metal: fix graph when unrelated input buffers are not metal buffers

* tinier test
2025-05-06 11:37:16 +03:00