Commit Graph

8888 Commits

Author SHA1 Message Date
Elnur Rakhmatullin
de2b323d97 Fixed a typo in "simplify" (#10358) 2025-05-16 14:45:14 -07:00
Harald Schäfer
ee5258328a You never want multiple backends (#10354) 2025-05-16 13:10:39 -07:00
George Hotz
876d2275a1 changes from new multi (#10353)
* changes from new multi

* revert hcq change
2025-05-16 13:07:29 -07:00
wozeparrot
66e00c04dd fix: skip kernel timing tests on ci cuda (#10348) 2025-05-16 11:48:06 -07:00
Ignacio Sica
a54fd745c3 simpler barrier match in remu (#10339)
* s_barrier

* remove s_barrier from syncs
2025-05-16 14:40:58 +03:00
qazal
e9e5b54e43 grouper cleanups and merge with insert_kernels [pr] (#10349)
* grouper cleanups and merge with insert_kernels [pr]

* remove that
2025-05-16 14:39:56 +03:00
b1tg
caded2f413 llvm diagnostic error (#10267)
* llvm diagnostic info

* use decorator

* better error reporting

* fix mypy

* collect all diag msgs

* test diag error

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-16 02:03:20 -04:00
George Hotz
a4a25720b2 add test_multitensor_jit_input [pr] (#10347) 2025-05-15 20:47:57 -07:00
chenyu
c798f2f427 brew --quiet to suppress already installed warnings (#10346)
example https://github.com/tinygrad/tinygrad/actions/runs/15057000247
2025-05-15 23:31:18 -04:00
wozeparrot
12a1ccc680 clean: double import (#10345) 2025-05-15 20:15:09 -07:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
wozeparrot
f59ecf2116 fix: mockgpu cuda timing (#10343) 2025-05-15 14:14:14 -07:00
nimlgen
a825608dc2 hcq: fix progs' __del__ when shutdown (#10341)
* debug ci

* better?

* and mute this?

* revert that
2025-05-15 23:26:48 +03:00
Ignacio Sica
47b3055fe2 set fail-fast behavior (#10336) 2025-05-15 11:24:45 -07:00
uuuvn
c2bf2c6bb0 Remote offset (#10311)
For memory savings from memory planner. Also for some reason it makes hlb
cifar on mac noticeably faster.

master:
```
  3  210.12 ms run,    4.34 ms python,  205.78 ms REMOTE, 2075.90 loss, 0.002698 LR, 2.07 GB used,   1558.41 GFLOPS,    327.45 GOPS
  4  210.40 ms run,    4.33 ms python,  206.07 ms REMOTE, 2481.94 loss, 0.002262 LR, 2.07 GB used,   1556.34 GFLOPS,    327.45 GOPS
  5  188.08 ms run,    4.41 ms python,  183.67 ms REMOTE, 1967.49 loss, 0.001827 LR, 2.07 GB used,   1741.00 GFLOPS,    327.45 GOPS
  6  211.19 ms run,    4.26 ms python,  206.93 ms REMOTE, 1511.62 loss, 0.001392 LR, 2.07 GB used,   1550.51 GFLOPS,    327.45 GOPS
```

this:
```
  3  189.05 ms run,    4.50 ms python,  184.55 ms REMOTE, 2075.90 loss, 0.002698 LR, 1.60 GB used,   1732.08 GFLOPS,    327.45 GOPS
  4  187.81 ms run,    4.11 ms python,  183.71 ms REMOTE, 2481.94 loss, 0.002262 LR, 1.60 GB used,   1743.49 GFLOPS,    327.45 GOPS
  5  186.70 ms run,    4.09 ms python,  182.62 ms REMOTE, 1967.49 loss, 0.001827 LR, 1.60 GB used,   1753.89 GFLOPS,    327.45 GOPS
  6  187.18 ms run,    4.06 ms python,  183.12 ms REMOTE, 1511.62 loss, 0.001392 LR, 1.60 GB used,   1749.36 GFLOPS,    327.45 GOPS
```

(`PYTHONPATH=. REMOTE=1 REMOTEDEV=METAL BS=256 STEPS=10 python examples/hlb_cifar10.py`)

Couldn't reliably reproduce the faster thing on tinybox though.
2025-05-15 11:20:01 -07:00
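
Note on c2bf2c6bb0 (Remote offset): the two log excerpts in that commit message can be reduced to a memory saving and a mean step-time ratio. The sketch below is only arithmetic over the numbers quoted in the message (steps 3 to 6 on the Mac run); the variable names are illustrative and are not part of tinygrad.

```python
from statistics import mean

# Per-step "ms run" values copied from the commit message (steps 3-6).
master_run_ms = [210.12, 210.40, 188.08, 211.19]
branch_run_ms = [189.05, 187.81, 186.70, 187.18]

# "GB used" as reported in the same log lines.
master_gb, branch_gb = 2.07, 1.60

# Relative memory saving and mean step-time ratio (master / branch).
mem_saving_pct = (master_gb - branch_gb) / master_gb * 100
speedup = mean(master_run_ms) / mean(branch_run_ms)

print(f"memory saved: {mem_saving_pct:.1f}%")
print(f"mean step time: {mean(master_run_ms):.2f} ms -> "
      f"{mean(branch_run_ms):.2f} ms ({speedup:.2f}x)")
```

Four steps per run is a small sample, and the commit message itself says the speedup could not be reliably reproduced on tinybox, so the ratio should be read as indicative only.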
Ignacio Sica
3c453e96a9 add ds_load_b96 and ds_store_b96 instructions (#10338) 2025-05-15 18:11:08 +03:00
qazal
be8202b293 add s_abs_i32 instruction to remu (#10334) 2025-05-15 16:47:58 +03:00
nimlgen
5efbe1c947 print offset only for subbuf (#10332) 2025-05-15 15:35:19 +03:00
qazal
7cfe367c07 failing test for slow embedding kernel with FUSE_ARANGE=1 [pr] (#10330) 2025-05-15 14:58:11 +03:00
nimlgen
5f03688280 usbgpu: remove max_read_len (#10328) 2025-05-15 14:49:58 +03:00
qazal
27b3dbe67e remove FUSE_ARANGE_UINT [pr] (#10324) 2025-05-15 14:39:54 +03:00
qazal
0a45cd0cbe grouper: merge views in fuse elementwise (#10325)
* grouper: merge views in fuse elementwise

* with gradient api
2025-05-15 13:17:09 +03:00
qazal
89d8d5b25e add dims check in FUSE_ARANGE (#10323) 2025-05-15 11:33:21 +03:00
qazal
8fad0f0124 grouper: check for unsafe PAD in FUSE (#10322) 2025-05-15 10:53:44 +03:00
chenyu
f008e5f233 test_dtype_alu should cast bf16 input (#10320)
when testing alu for bfloat16, it should cast inputs to bfloat16 first, otherwise numpy has both errors from input and errors from alu which is more inaccurate
2025-05-15 01:11:39 -04:00
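
Note on f008e5f233 (test_dtype_alu): the reasoning in that commit message can be illustrated with a small pure-Python simulation. This is a sketch under stated assumptions, not the actual test: `to_bf16` is a hypothetical helper that emulates bfloat16 by rounding a float32 bit pattern to its top 16 bits, and multiplication stands in for the ALU op under test.

```python
import random
import struct

def to_bf16(x: float) -> float:
    """Round x to bfloat16 precision (nearest-even); hypothetical helper."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add the rounding bias, then clear the low 16 bits of the float32 pattern.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

random.seed(0)
pairs = [(random.uniform(1.0, 2.0), random.uniform(1.0, 2.0)) for _ in range(1000)]

err_uncast = 0.0  # reference computed from the original (uncast) inputs
err_cast = 0.0    # reference computed from inputs cast to bf16 first
for a, b in pairs:
    a16, b16 = to_bf16(a), to_bf16(b)
    out = to_bf16(a16 * b16)  # simulated bf16 ALU result
    err_uncast = max(err_uncast, abs(out - a * b))
    err_cast = max(err_cast, abs(out - a16 * b16))

print(f"max error vs uncast reference: {err_uncast:.5f}")
print(f"max error vs cast reference:   {err_cast:.5f}")
```

With the inputs cast first, the only remaining difference is the rounding of the output, so a test can use a tighter tolerance that reflects the ALU alone.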
George Hotz
568d6d96e7 small changes from new multi [pr] (#10318) 2025-05-14 20:50:59 -07:00
chenyu
f6cf25fce4 cleanup test_conv2d_ceildiv_edge_case [pr] (#10317) 2025-05-14 23:35:28 -04:00
Kirill R.
50d7162acd Add conv2d ceildiv edge case (#10303) 2025-05-14 22:50:23 -04:00
uuuvn
e5639b7788 Remote finalize (#10314)
* Remote `.q(..., wait=True)`

Seems a bit cleaner than doing `.batch_request()` after `.q(...)` for
requests with return value.

* Remote finalize

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-14 19:40:44 -07:00
George Hotz
bfc30fa6ea hotfix: typo in shm_name 2025-05-14 19:34:52 -07:00
George Hotz
2bc54b3e22 manually handle OSX 2025-05-14 19:17:51 -07:00
George Hotz
ab460486d7 Revert "resnet dataloader osx (#10316)"
This reverts commit aef336930a.
2025-05-14 19:15:07 -07:00
uuuvn
7b4f27a219 Remote .q(..., wait=True) (#10313)
Seems a bit cleaner than doing `.batch_request()` after `.q(...)` for
requests with return value.
2025-05-14 19:07:20 -07:00
George Hotz
50181ab09f hotfix: bump to 13500 lines 2025-05-14 18:49:59 -07:00
George Hotz
aef336930a resnet dataloader osx (#10316)
* mlperf dataloader on mac

* resnet dataloader [pr]

* simple should work
2025-05-14 18:31:26 -07:00
wozeparrot
9b14e8c3cd feat: tag 0.10.3 (#10310) v0.10.3 2025-05-14 15:45:13 -07:00
George Hotz
18f532d110 small changes from O(1) multi [pr] (#10309) 2025-05-14 15:34:07 -07:00
wozeparrot
9bbc2bc2a7 hotfix: filter_too_much (#10308) 2025-05-14 15:31:51 -07:00
George Hotz
fc8ef63194 multi doesn't need tuple arg anymore [pr] (#10307) 2025-05-14 15:16:40 -07:00
George Hotz
7a3d4de59a hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test 2025-05-14 14:50:37 -07:00
wozeparrot
2df2ec6640 feat: unpin hypothesis (#10306) 2025-05-14 14:26:28 -07:00
uuuvn
b52452d69f Remote multi (graph) (#9902)
* Remote multi (graph)

* Remote multi (graph transfers)

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-14 10:54:23 -07:00
George Hotz
42e70193c9 multi: instead of real, just copy (#10289)
* multi: instead of real, just copy

* fix test

* remove real
2025-05-14 10:36:55 -07:00
qazal
043efc6ec4 do not require self for track_rewrites [pr] (#10302) 2025-05-14 18:23:32 +03:00
uuuvn
dd816d0237 All MultiGraphRunners can graph transfers (#10301) 2025-05-14 17:23:02 +03:00
nimlgen
e00679dc92 am_smi: fix layout with sleep mode (#10300) 2025-05-14 15:44:42 +03:00
chenyu
fbaa26247a randn_like in minrf (#10298)
tested that it trains to similar loss
2025-05-14 07:59:50 -04:00
nimlgen
0788659d08 usbgpu: fast cold boot (#10260)
* usbgpu: fast cold boot

* cleaner

* assert

* xx

* compat

* fix

* fix
2025-05-14 14:58:55 +03:00
qazal
d342f7688d remove some skips in test_schedule + use assertRaisesRegex [pr] (#10296) 2025-05-14 14:54:07 +03:00
qazal
40f4ce3390 enable AMD CI for TestRandomness.test_multinomial [pr] (#10295) 2025-05-14 14:32:22 +03:00