Commit Graph

3723 Commits

Author SHA1 Message Date
uuuvn
7bc4864bc4 Make dev a property of Allocator (#10286)
* Make `dev` a property of `Allocator`

(this is a prereq refactor for #10285)

At least `BufferXfer.copy` accesses it assuming it's always present;
currently most devices add this property on their own, repeating
the same code over and over again.

This is also a bit of a footgun: see `RemoteAllocator`, which named this
property `device` instead of `dev`. I could obviously just change that
in one place, but doing it globally seems like a better solution (and it
reduces code duplication too).

`MallocAllocator` is a bit special, but passing `None` works just fine.

* typing

* ignore type instead of cast
2025-05-13 17:01:01 -07:00
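A minimal sketch of the refactor described above, assuming a base class that takes the owning device in its constructor. The class bodies, `FakeDevice`, and the helper `transfer_device_name` are hypothetical, simplified stand-ins (not the project's actual code); the helper plays the role of code like `BufferXfer.copy` that reads `.dev` from any allocator.

```python
from typing import Any, Optional

class Allocator:
  # hypothetical base class: the device is stored once here, so every subclass exposes `.dev`
  def __init__(self, dev: Optional[Any]):
    self.dev = dev
  def alloc(self, size: int) -> bytearray:
    raise NotImplementedError

class FakeDevice:
  # hypothetical device object with just a name
  def __init__(self, name: str):
    self.name = name

class RemoteAllocator(Allocator):
  # simplified stand-in: no private `self.device` attribute, the base class owns `.dev`
  def __init__(self, dev: FakeDevice):
    super().__init__(dev)
  def alloc(self, size: int) -> bytearray:
    return bytearray(size)

class MallocAllocator(Allocator):
  # not tied to one specific device, so it passes None
  def __init__(self):
    super().__init__(None)
  def alloc(self, size: int) -> bytearray:
    return bytearray(size)

def transfer_device_name(src: Allocator) -> str:
  # hypothetical consumer: can rely on `.dev` existing on every allocator
  return src.dev.name if src.dev is not None else "host"
```

With the attribute defined in one place, a misnamed per-device copy (such as `device` vs `dev`) can no longer occur.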
uuuvn
ddff9857b8 Remote properties is a dataclass (#10283)
Not strictly required for anything, but soon there will be around 4 new
properties, and keeping them in one huge JSON blob seems like bad taste.

It also seems right to not have a separate endpoint for this, just
`GetProperties` request that returns a repr of this similar to how
requests are sent in `BatchRequest`.

This will also make switching to a transport other than HTTP much simpler
if that is ever required for any reason, e.g. a plain TCP stream of
`BatchRequest`s.
2025-05-13 11:56:58 -07:00
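A minimal sketch of "a dataclass sent as its repr", assuming the property fields shown here (they are hypothetical; the commit does not list them) and assuming the receiver rebuilds the object by evaluating the repr with only the known class in scope. This is an illustration of the idea, not the project's actual wire format or parser.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RemoteProperties:
  # hypothetical fields, for illustration only
  renderer: str
  graph_supported: bool
  device_count: int

def encode(props: RemoteProperties) -> str:
  # the dataclass-generated __repr__ is valid Python source that rebuilds the object
  return repr(props)

def decode(data: str) -> RemoteProperties:
  # evaluate with no builtins and only the expected class in scope (a minimal guard, not a hardened parser)
  obj = eval(data, {"__builtins__": {}}, {"RemoteProperties": RemoteProperties})
  if not isinstance(obj, RemoteProperties):
    raise ValueError("unexpected payload")
  return obj
```

Because the payload is just a string, the same encoding would work unchanged over HTTP or a raw TCP stream.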
uuuvn
ba87eca0f1 Remote multi (basic) (#10269)
* Basic remote multi support

The simplest way to use remote with multiple GPUs. Very slow
because there are no transfers (cross-device copies go through copyin/copyout).

* tests
2025-05-13 09:52:47 -07:00
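A minimal sketch of what "no transfers" means for a cross-device copy: the data is staged through a host buffer with one copyout and one copyin, i.e. two round trips instead of a direct device-to-device path. `HostStagedDevice` and `cross_device_copy` are hypothetical names; the method names only mirror the copyin/copyout wording of the commit and are not the project's actual signatures.

```python
class HostStagedDevice:
  # hypothetical device that only supports copying to and from host memory
  def __init__(self, name: str):
    self.name = name
    self.buffers: dict[str, bytearray] = {}
  def alloc(self, key: str, size: int) -> None:
    self.buffers[key] = bytearray(size)
  def copyin(self, key: str, data: memoryview) -> None:
    self.buffers[key][:] = data  # host -> device
  def copyout(self, dest: memoryview, key: str) -> None:
    dest[:] = self.buffers[key]  # device -> host (sizes must match)

def cross_device_copy(src: HostStagedDevice, src_key: str, dst: HostStagedDevice, dst_key: str, size: int) -> None:
  # no direct path between devices: stage through a host buffer, which is why it is slow
  staging = memoryview(bytearray(size))
  src.copyout(staging, src_key)
  dst.copyin(dst_key, staging)
```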
George Hotz
5f64bbc63d improve multi tests + add support for fixedvars [pr] (#10281)
* improve multi tests + add support for fixedvars [pr]

* add support for fixedvars
2025-05-13 09:27:00 -07:00
chenyu
8a906cb124 Tensor.randn_like (#10276) 2025-05-13 11:53:59 -04:00
chenyu
c4988bc07b only run test_u32_to_f16 if it supports fp16 (#10277)
* only run test_u32_to_f16 if it supports fp16

* cleanup
2025-05-13 11:16:14 -04:00
uuuvn
1900c3c68a Metal multi in ci is fine actually (#10274)
Useful for testing remote multi stuff
2025-05-13 10:07:35 -04:00
nimlgen
6f42bf8b54 usbgpu: 10 steps in benchmark to hit cache (#10273) 2025-05-13 17:06:50 +03:00
qazal
a2d6b0afe0 fix FUSE pushing through SHRINK (#10271) 2025-05-13 11:38:53 +03:00
geohotstan
1c4ab6b991 ONNX add tests against ORT (#10270)
* start

* clean up

* indicate file location too
2025-05-13 04:03:52 -04:00
Sieds Lykles
02208565de add check (#10257) 2025-05-12 11:03:01 -04:00
Kirill R.
4c7c139102 Use cmod/cdiv in sym_infer (#10258)
* Use cmod/cdiv in sym_infer

* test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-12 09:07:28 -04:00
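A minimal sketch of why `cdiv`/`cmod` matter when evaluating symbolic expressions, assuming (as the names suggest) that they mean C-style truncating division and the matching remainder. Python's `//` and `%` floor toward negative infinity, so they disagree with generated C code whenever an operand is negative. These definitions are illustrative and may differ from the project's helpers, e.g. in how they handle a zero divisor.

```python
def cdiv(x: int, y: int) -> int:
  # C-style integer division: truncate toward zero (Python's // floors toward -inf)
  q = abs(x) // abs(y)
  return q if (x < 0) == (y < 0) else -q

def cmod(x: int, y: int) -> int:
  # remainder matching cdiv, so x == cdiv(x, y) * y + cmod(x, y); its sign follows x
  return x - cdiv(x, y) * y

# with a negative dividend the two conventions disagree
print(-7 // 2, -7 % 2)           # prints: -4 1
print(cdiv(-7, 2), cmod(-7, 2))  # prints: -3 -1
```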
qazal
95c6a736a9 fix FUSE_ARANGE=1 for bert (#10255) 2025-05-12 14:44:05 +03:00
Sieds Lykles
7c4b381fbf Extra simplify valid test [pr] (#10256)
* add test

* Change the range

* add todo test
2025-05-12 07:32:03 -04:00
chenyu
70c797b107 train bert tests (#10248)
added a working bert tiny test, and a failing bert FUSE_ARANGE test
2025-05-11 08:42:08 -04:00
nimlgen
2145bce3f9 usbgpu: copyin size is 16k (#10240)
* usbgpu: copyin size is 16k

* ush
2025-05-09 22:12:54 +03:00
Sieds Lykles
74e40aafa0 use cdiv in div and mod folding (#10216)
* use cdiv

* use cdiv and cmod there as well

* Add tests

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-09 12:37:24 -04:00
Sieds Lykles
8da9c070ca take gcd out of trunc div (#10238) 2025-05-09 12:08:10 -04:00
chenyu
9846435c2e fix test_div_numerator_negative (#10229)
the simplification was wrong with negative const_factor
2025-05-09 06:19:59 -04:00
chenyu
cba508c8c3 update uop symbolic tests (#10228)
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319 better bound for mod negative number (#10227) 2025-05-09 01:19:47 -04:00
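A minimal sketch of a sign-aware bound for a truncating remainder, assuming the mod in question is C-style (the result takes the sign of the dividend) and the modulus is a positive constant `c`. For `x` in `[lo, hi]`, a non-negative range gives `[0, min(hi, c-1)]` and a non-positive range gives `[max(lo, -(c-1)), 0]`. The helper `cmod_bounds` is hypothetical; the exact bound the commit implements is not stated here.

```python
def cmod(x: int, y: int) -> int:
  # C-style remainder: sign follows the dividend x
  q = abs(x) // abs(y)
  q = q if (x < 0) == (y < 0) else -q
  return x - q * y

def cmod_bounds(lo: int, hi: int, c: int) -> tuple[int, int]:
  # hypothetical helper: bounds of cmod(x, c) for every x in [lo, hi], assuming c > 0
  assert c > 0 and lo <= hi
  if lo >= 0:
    return (0, min(hi, c - 1))        # non-negative dividend: remainder in [0, c-1]
  if hi <= 0:
    return (max(lo, -(c - 1)), 0)     # non-positive dividend: remainder in [-(c-1), 0]
  return (max(lo, -(c - 1)), min(hi, c - 1))  # mixed signs: union of both cases
```

For example, `cmod_bounds(-7, -1, 3)` gives `(-2, 0)`, whereas ignoring the sign would only give `(-2, 2)`.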
chenyu
99f6d89dfb tighter idiv bound for symbolic denominator (#10226) 2025-05-08 22:38:56 -04:00
qazal
b6904bbf83 Revert "split grouper into insert and finalize stages [pr] (#10222)" (#10224)
This reverts commit 2594e4db15.
2025-05-09 03:02:38 +03:00
qazal
2594e4db15 split grouper into insert and finalize stages [pr] (#10222) 2025-05-09 02:36:22 +03:00
qazal
1d0f239df7 use Tensor.train() in schedule test + typo [pr] (#10220) 2025-05-08 23:46:42 +03:00
nimlgen
267ba9b592 usbgpu: better names in copy speed benchmark (#10212) 2025-05-08 16:12:37 +03:00
hooved
7b4f05fd00 Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
ba52fce4b2 usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark

* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
qazal
d0e3449992 remove view_supported_devices, check allocator instead [pr] (#10209) 2025-05-08 11:45:02 +03:00
George Hotz
8d4c563c01 all COPY can be clone (#10205)
* match old behavior

* simple

* it means the naive thing before the multi

* fix
2025-05-07 20:31:39 -07:00
hooved
8e76c40aea Refactor test: Enable generality in testing UOp alu expressions (#10200)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test

* isolate test refactor
2025-05-07 19:39:44 -07:00
uuuvn
10c9ede6b7 Cloud graph (#9876) 2025-05-07 11:41:41 -07:00
Sieds Lykles
2891892834 Fold constant variable (#10196)
* Add rule

* add test and comment

* merge rule
2025-05-07 11:39:44 -07:00
Sieds Lykles
8386527bb9 Take neg out of idiv (#10164)
* Add rules

* Fix tests

* Move rules lower to prevent recursion
2025-05-07 11:39:08 -07:00
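A minimal check of the identity behind pulling a negation out of an integer division, assuming idiv here is truncating (C-style) division like the `cdiv` used elsewhere in this log. Truncation is symmetric around zero, so `cdiv(-x, y) == -cdiv(x, y)` always holds; Python's floor division does not have this property. The `cdiv` below is an illustrative definition, not the project's helper.

```python
def cdiv(x: int, y: int) -> int:
  # C-style integer division: truncate toward zero
  q = abs(x) // abs(y)
  return q if (x < 0) == (y < 0) else -q

def neg_out_holds(x: int, y: int) -> bool:
  # rewrite (-x) idiv y  ->  -(x idiv y)
  return cdiv(-x, y) == -cdiv(x, y)

# the same rewrite is wrong for floor division: (-7) // 2 == -4 but -(7 // 2) == -3
```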
Sieds Lykles
09544d4556 Add rule and test (#10189)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-07 10:15:55 -04:00
nimlgen
0fbe494c6b usb: cache writes into 0xa000 (#10191)
* usb: cache writes into 0xa000

* mock

* match parent spec

* ugh
2025-05-07 16:03:35 +03:00
nimlgen
685d5c46df usbgpu: send pci write in batches (#10190)
* usbgpu: send pci write in batches

* mock
2025-05-07 14:41:56 +03:00
qazal
94e07725a6 only reorder expand if it can fuse with input (#10186)
* failing test

* only reorder expand if it can fuse with input

* (16,) is reshaped to (4, 4)
2025-05-07 18:14:31 +08:00
uuuvn
dba073e5c0 Less messy broken graph on paravirtualized metal workaround (#10182)
* Less messy broken graph on paravirtualized metal workaround

GitHub CI macOS runners use paravirtualized Metal, which is broken with
graph (some comments say that ICB in particular is broken, but in my
testing it sometimes worked and other times hit an assert inside
Metal's code related to resources, so I'm not sure).

> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.

This can be reproduced locally with any virtualization software (like UTM)
that can create macOS VMs with Apple's own Virtualization framework.

* unused import
2025-05-06 20:41:02 +03:00
nimlgen
37a7a99adb metal: fix graph when unrelated input buffers are not metal buffers (#10170)
* metal: fix graph when unrelated input buffers are not metal buffers

* tinier test
2025-05-06 11:37:16 +03:00
George Hotz
603c03bef2 fix tests for rewrite [pr] (#10167)
* fix tests for rewrite [pr]

* cleaner

* delete linearize_uop

* clean up the rest
2025-05-05 19:19:49 -07:00
wozeparrot
10437904cd refactor: ops_cloud -> ops_remote [pr] (#10166) 2025-05-05 15:59:51 -07:00
Sieds Lykles
338f33efae Fast mod (#10055)
* Enable fast mod

* Add test
2025-05-05 09:15:43 -07:00
qazal
62e86bc5ec insert Ops.FUSE for arange (#10140)
* insert Ops.FUSE for arange

* reshape does not collapse

* do not fuse reshapes

* add children

* fixups

* work

* add Ops.WHERE support to z3

* fix fuse for cast

* diff

* ugh

* don't need this anymore

* contiguous

* add always_contiguous

* there too
2025-05-05 08:32:12 +03:00
George Hotz
a0240d8c2b lil work on llvm speed (#10157)
* lil work on llvm speed

* llvm failing test

* 1e-4

* simpler failing test

* once is fine

* gpt suggests this syntax change

* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
36ccaa88a6 move merge views [pr] (#10156)
* move merge views [pr]

* move flow to __init__ [pr]
2025-05-04 14:41:47 -07:00
George Hotz
5f3f162606 cache rewrites for renderer [pr] (#10155)
* add caching to rewrites for renderer [pr]

* remove that

* update ebs
2025-05-04 13:45:15 -07:00
Sieds Lykles
848c7783a4 Sign check in div const div pattern (#10150)
* Add rule

* Relax the condition

* Add test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-03 18:04:34 -04:00
George Hotz
7c33924a50 don't use real_size for mem_bytes [pr] (#10147) 2025-05-03 09:41:21 -04:00
nimlgen
45bf7c5b81 am: add allocation bench (#10135)
* init allocation bench

* sorry

* better
2025-05-02 13:51:07 +03:00