10417 Commits

Author SHA1 Message Date
qazal
0a45cd0cbe grouper: merge views in fuse elementwise (#10325)
* grouper: merge views in fuse elementwise

* with gradient api
2025-05-15 13:17:09 +03:00
qazal
89d8d5b25e add dims check in FUSE_ARANGE (#10323) 2025-05-15 11:33:21 +03:00
qazal
8fad0f0124 grouper: check for unsafe PAD in FUSE (#10322) 2025-05-15 10:53:44 +03:00
chenyu
f008e5f233 test_dtype_alu should cast bf16 input (#10320)
When testing ALU ops for bfloat16, the test should cast the inputs to bfloat16 first. Otherwise the numpy reference carries errors from both the input rounding and the ALU, which makes the comparison less accurate.
2025-05-15 01:11:39 -04:00
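The reasoning in this commit message can be illustrated with a small sketch. This is not the actual test code: `to_bf16` and `alu_error` are hypothetical helpers, and bfloat16 is emulated here by rounding a float32 bit pattern to its upper 16 bits (NaN, infinity, and overflow are not handled).

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a Python float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                  # round-to-nearest-even bias
    bits &= 0xFFFF0000                                   # keep only the bfloat16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def alu_error(a: float, b: float, cast_inputs: bool) -> float:
    """Error of an emulated bf16 add against a reference computed with or without cast inputs."""
    # Emulated device result: bf16 inputs, exact add, bf16 output.
    device = to_bf16(to_bf16(a) + to_bf16(b))
    # Reference: either sees the same bf16 inputs, or the raw unrounded ones.
    ra, rb = (to_bf16(a), to_bf16(b)) if cast_inputs else (a, b)
    return abs(device - (ra + rb))

# With cast inputs, the difference reflects only the rounding of the add itself.
print(alu_error(1.1, 2.3, cast_inputs=True))
# Without the cast, input rounding error is mixed into the same number.
print(alu_error(1.1, 2.3, cast_inputs=False))
```

With the cast, the reported error isolates what the ALU contributes, which is what the test is meant to check.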
George Hotz
568d6d96e7 small changes from new multi [pr] (#10318) 2025-05-14 20:50:59 -07:00
chenyu
f6cf25fce4 cleanup test_conv2d_ceildiv_edge_case [pr] (#10317) 2025-05-14 23:35:28 -04:00
Kirill R.
50d7162acd Add conv2d ceildiv edge case (#10303) 2025-05-14 22:50:23 -04:00
uuuvn
e5639b7788 Remote finalize (#10314)
* Remote `.q(..., wait=True)`

Seems a bit cleaner than doing `.batch_request()` after `.q(...)` for
requests with return value.

* Remote finalize

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-14 19:40:44 -07:00
George Hotz
bfc30fa6ea hotfix: typo in shm_name 2025-05-14 19:34:52 -07:00
George Hotz
2bc54b3e22 manually handle OSX 2025-05-14 19:17:51 -07:00
George Hotz
ab460486d7 Revert "resnet dataloader osx (#10316)"
This reverts commit aef336930a.
2025-05-14 19:15:07 -07:00
uuuvn
7b4f27a219 Remote .q(..., wait=True) (#10313)
Seems a bit cleaner than doing `.batch_request()` after `.q(...)` for
requests with return value.
2025-05-14 19:07:20 -07:00
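A minimal sketch of the idea described in this commit, under stated assumptions: `ToyRemote`, its fields, and `batch_submit` are hypothetical stand-ins and not the real remote device code. It shows how a `wait=True` flag on `.q(...)` can fold "queue, then flush and read the result" into one call.

```python
class ToyRemote:
    """Hypothetical request queue; not the actual tinygrad remote implementation."""

    def __init__(self):
        self.pending = []       # requests queued but not yet sent
        self.sent_batches = []  # record of every batch that was flushed

    def batch_submit(self):
        # Flush everything queued so far as one batch and return a fake result
        # for the last request in it.
        batch, self.pending = self.pending, []
        self.sent_batches.append(batch)
        return f"result of {batch[-1]}" if batch else None

    def q(self, request, wait=False):
        # Queue a request; with wait=True, flush immediately and return the result.
        self.pending.append(request)
        if wait:
            return self.batch_submit()
        return None

remote = ToyRemote()
remote.q("alloc")                      # fire and forget
value = remote.q("read", wait=True)    # queue, flush, and get the return value
```

The caller that needs a return value no longer has to remember a separate flush call after queuing.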
George Hotz
50181ab09f hotfix: bump to 13500 lines 2025-05-14 18:49:59 -07:00
George Hotz
aef336930a resnet dataloader osx (#10316)
* mlperf dataloader on mac

* resnet dataloader [pr]

* simple should work
2025-05-14 18:31:26 -07:00
wozeparrot
9b14e8c3cd feat: tag 0.10.3 (#10310) v0.10.3 2025-05-14 15:45:13 -07:00
George Hotz
18f532d110 small changes from O(1) multi [pr] (#10309) 2025-05-14 15:34:07 -07:00
wozeparrot
9bbc2bc2a7 hotfix: filter_too_much (#10308) 2025-05-14 15:31:51 -07:00
George Hotz
fc8ef63194 multi doesn't need tuple arg anymore [pr] (#10307) 2025-05-14 15:16:40 -07:00
George Hotz
7a3d4de59a hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test 2025-05-14 14:50:37 -07:00
wozeparrot
2df2ec6640 feat: unpin hypothesis (#10306) 2025-05-14 14:26:28 -07:00
uuuvn
b52452d69f Remote multi (graph) (#9902)
* Remote multi (graph)

* Remote multi (graph transfers)

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-14 10:54:23 -07:00
George Hotz
42e70193c9 multi: instead of real, just copy (#10289)
* multi: instead of real, just copy

* fix test

* remove real
2025-05-14 10:36:55 -07:00
qazal
043efc6ec4 do not require self for track_rewrites [pr] (#10302) 2025-05-14 18:23:32 +03:00
uuuvn
dd816d0237 All MultiGraphRunners can graph transfers (#10301) 2025-05-14 17:23:02 +03:00
nimlgen
e00679dc92 am_smi: fix layout with sleep mode (#10300) 2025-05-14 15:44:42 +03:00
chenyu
fbaa26247a randn_like in minrf (#10298)
tested that it trains to similar loss
2025-05-14 07:59:50 -04:00
nimlgen
0788659d08 usbgpu: fast cold boot (#10260)
* usbgpu: fast cold boot

* cleaner

* assert

* xx

* compat

* fix

* fix
2025-05-14 14:58:55 +03:00
qazal
d342f7688d remove some skips in test_schedule + use assertRaisesRegex [pr] (#10296) 2025-05-14 14:54:07 +03:00
qazal
40f4ce3390 enable AMD CI for TestRandomness.test_multinomial [pr] (#10295) 2025-05-14 14:32:22 +03:00
nimlgen
792853b9e2 usbgpu: enable cache for compute queue (#10294) 2025-05-14 13:05:36 +03:00
nimlgen
1218fc2230 usbgpu: enable cache for 64bit addresses (#10293) 2025-05-14 12:37:39 +03:00
qazal
1770e00c41 only CAPTURE_PROCESS_REPLAY=1 + add filterwarnings back [pr] (#10292) 2025-05-14 11:58:42 +03:00
qazal
1c97338be5 enable process replay assert for schedule [pr] (#10280)
* enable process replay assert for schedule

* start at unique+1
2025-05-14 11:10:47 +03:00
George Hotz
f1130ab3d3 openpilot benchmark test (#10290)
* openpilot benchmark test

* that
2025-05-13 22:49:28 -07:00
uuuvn
f726f79a9e Remote multi (transfer) (#10285) 2025-05-13 18:26:32 -07:00
uuuvn
7bc4864bc4 Make dev a property of Allocator (#10286)
* Make `dev` a property of `Allocator`

(this is a prereq refactor for #10285)

At least `BufferXfer.copy` accesses it assuming it's always present,
currently most devices just add this property on their own repeating
the same code over and over again.

This is also a bit footguny, see `RemoteAllocator`, which named this
property `device` instead of `dev`. I could obviously just change that
in one place, but doing it globally seems like a better solution (and it
reduces code duplication too).

`MallocAllocator` is a bit special, but passing `None` works just fine.

* typing

* ignore type instead of cast
2025-05-13 17:01:01 -07:00
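The refactor described above could look roughly like the following sketch. The class bodies are assumptions made for illustration: only the names `Allocator`, `MallocAllocator`, and `dev` come from the commit message, and `ToyDeviceAllocator` and `copy_between` are hypothetical.

```python
class Allocator:
    """Base class that owns `dev`, so subclasses stop re-declaring it."""

    def __init__(self, dev):
        self.dev = dev

class ToyDeviceAllocator(Allocator):
    # A device-backed allocator passes its device up instead of
    # setting self.dev (or a differently named attribute) itself.
    def __init__(self, dev):
        super().__init__(dev)

class MallocAllocator(Allocator):
    # Host memory has no device, so None is passed, as the commit notes.
    def __init__(self):
        super().__init__(None)

def copy_between(dest: Allocator, src: Allocator):
    # Code in the style of a cross-device copy can rely on `.dev` always existing.
    return (src.dev, dest.dev)

print(copy_between(ToyDeviceAllocator("GPU:1"), ToyDeviceAllocator("GPU:0")))
```

Putting the attribute in the base class also removes the chance of one subclass naming it `device` while callers expect `dev`.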
George Hotz
ec46f658d7 openpilot llvm test [pr] (#10288) 2025-05-13 16:51:41 -07:00
uuuvn
453b268342 Factor out remote connection and cache it (#10282)
Should be a small speed improvement, but the main reason this is needed
is to have a defined ordering of RemoteRequests within one host so that
transfers won't require doing something like:
```python
src_dev.batch_submit()
dest_dev.q(Transfer(dest, src_dev.session, src))
dest_dev.batch_submit()
```
for correctness.
2025-05-13 15:02:06 -07:00
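One way to picture the ordering guarantee is the sketch below. It is an illustration under assumptions, not the code from this PR: `ToyConnection`, `get_connection`, and `ToyRemoteDevice` are hypothetical names, and the "connection" is just an in-memory log.

```python
import functools

class ToyConnection:
    """Hypothetical per-host connection that records requests in arrival order."""

    def __init__(self, host: str):
        self.host = host
        self.log = []

    def send(self, session: int, request: str):
        self.log.append((session, request))

@functools.cache
def get_connection(host: str) -> ToyConnection:
    # Cached per host: every device on the same host shares one connection.
    return ToyConnection(host)

class ToyRemoteDevice:
    def __init__(self, host: str, session: int):
        self.conn = get_connection(host)
        self.session = session

    def q(self, request: str):
        self.conn.send(self.session, request)

src_dev = ToyRemoteDevice("hostA", session=0)
dest_dev = ToyRemoteDevice("hostA", session=1)
src_dev.q("write src")
dest_dev.q("transfer from session 0")  # lands after the write, on the same stream
```

Because both devices share one stream, the order in which requests were queued is the order in which the host sees them, so no explicit flush between the two devices is needed in this toy model.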
uuuvn
ddff9857b8 Remote properties is a dataclass (#10283)
Not strictly required for anything, but soon there will be like 4 new
properties and having it be a huge json just seems like bad taste.

It also seems right to not have a separate endpoint for this, just
`GetProperties` request that returns a repr of this similar to how
requests are sent in `BatchRequest`.

This will also make a switch to anything other than http much simpler
if it will be required for any reason, like just a tcp stream of
`BatchRequest`s
2025-05-13 11:56:58 -07:00
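The "request that returns a repr" idea can be sketched as follows. Everything here is an assumption for illustration: the field names in `ToyProperties`, the `handle` and `parse` helpers, and the use of `eval` with a restricted namespace are not taken from the PR, and `eval` should not be used on untrusted input.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyProperties:
    # Hypothetical fields; the real property set is not shown in the commit message.
    device_count: int
    supports_graph: bool

@dataclass(frozen=True)
class GetProperties:
    """Request type asking the server for its properties."""

def handle(request) -> str:
    # Server side: answer a GetProperties request with the repr of the dataclass.
    if isinstance(request, GetProperties):
        return repr(ToyProperties(device_count=2, supports_graph=True))
    raise ValueError(f"unknown request: {request!r}")

def parse(text: str) -> ToyProperties:
    # Client side: rebuild the dataclass from its repr, exposing only the one known class.
    return eval(text, {"__builtins__": {}}, {"ToyProperties": ToyProperties})

props = parse(handle(GetProperties()))
print(props.device_count, props.supports_graph)
```

Adding a new property then means adding one typed field rather than extending a hand-maintained JSON blob, and the same text format would work over any transport.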
uuuvn
ba87eca0f1 Remote multi (basic) (#10269)
* Basic remote multi support

Simplest thing to be able to use remote with multiple gpus, very slow
because no transfers (copyin copyout for cross-device copies)

* tests
2025-05-13 09:52:47 -07:00
George Hotz
5f64bbc63d improve multi tests + add support for fixedvars [pr] (#10281)
* improve multi tests + add support for fixedvars [pr]

* add support for fixedvars
2025-05-13 09:27:00 -07:00
chenyu
8a906cb124 Tensor.randn_like (#10276) 2025-05-13 11:53:59 -04:00
nimlgen
eab71d70ba usbgpu: rescan pci bus every run (#10279)
* usbgpu: rescan pci bus every run

* ff
2025-05-13 18:31:42 +03:00
chenyu
c4988bc07b only run test_u32_to_f16 if it supports fp16 (#10277)
* only run test_u32_to_f16 if it supports fp16

* cleanup
2025-05-13 11:16:14 -04:00
nimlgen
9924c7d0e4 usbgpu: rebar (#10275)
* usbgpu: rebar

* cache back

* revert this

* fix

* ugh

* tt
2025-05-13 17:25:51 +03:00
uuuvn
1900c3c68a Metal multi in ci is fine actually (#10274)
Useful for testing remote multi stuff
2025-05-13 10:07:35 -04:00
nimlgen
6f42bf8b54 usbgpu: 10 steps in benchmark to hit cache (#10273) 2025-05-13 17:06:50 +03:00
chenyu
ad5cb2717d FUSE_ARANGE=1 in bert bench (#10263)
still fails, something multi related maybe

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-13 09:12:19 -04:00
qazal
a2d6b0afe0 fix FUSE pushing through SHRINK (#10271) 2025-05-13 11:38:53 +03:00
geohotstan
1c4ab6b991 ONNX add tests against ORT (#10270)
* start

* clean up

* indicate file location too
2025-05-13 04:03:52 -04:00