Elnur Rakhmatullin
de2b323d97
Fixed a typo in "simplify" ( #10358 )
2025-05-16 14:45:14 -07:00
Harald Schäfer
ee5258328a
You never want multiple backends ( #10354 )
2025-05-16 13:10:39 -07:00
George Hotz
876d2275a1
changes from new multi ( #10353 )
...
* changes from new multi
* revert hcq change
2025-05-16 13:07:29 -07:00
wozeparrot
66e00c04dd
fix: skip kernel timing tests on ci cuda ( #10348 )
2025-05-16 11:48:06 -07:00
Ignacio Sica
a54fd745c3
simpler barrier match in remu ( #10339 )
...
* s_barrier
* remove s_barrier from syncs
2025-05-16 14:40:58 +03:00
qazal
e9e5b54e43
grouper cleanups and merge with insert_kernels [pr] ( #10349 )
...
* grouper cleanups and merge with insert_kernels [pr]
* remove that
2025-05-16 14:39:56 +03:00
b1tg
caded2f413
llvm diagnostic error ( #10267 )
...
* llvm diagnostic info
* use decorator
* better error reporting
* fix mypy
* collect all diag msgs
* test diag error
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-16 02:03:20 -04:00
George Hotz
a4a25720b2
add test_multitensor_jit_input [pr] ( #10347 )
2025-05-15 20:47:57 -07:00
chenyu
c798f2f427
brew --quiet to suppress already installed warnings ( #10346 )
...
example https://github.com/tinygrad/tinygrad/actions/runs/15057000247
2025-05-15 23:31:18 -04:00
wozeparrot
12a1ccc680
clean: double import ( #10345 )
2025-05-15 20:15:09 -07:00
wozeparrot
1ed04f993b
move benchmark stat tracking to influxdb ( #10185 )
2025-05-15 16:14:56 -07:00
wozeparrot
f59ecf2116
fix: mockgpu cuda timing ( #10343 )
2025-05-15 14:14:14 -07:00
nimlgen
a825608dc2
hcq: fix progs' __del__ when shutdown ( #10341 )
...
* debug ci
* better?
* and mute this?
* revert that
2025-05-15 23:26:48 +03:00
Ignacio Sica
47b3055fe2
set fail-fast behavior ( #10336 )
2025-05-15 11:24:45 -07:00
uuuvn
c2bf2c6bb0
Remote offset ( #10311 )
...
For memory savings from memory planner. Also for some reason it makes hlb
cifar on mac noticeably faster.
master:
```
3 210.12 ms run, 4.34 ms python, 205.78 ms REMOTE, 2075.90 loss, 0.002698 LR, 2.07 GB used, 1558.41 GFLOPS, 327.45 GOPS
4 210.40 ms run, 4.33 ms python, 206.07 ms REMOTE, 2481.94 loss, 0.002262 LR, 2.07 GB used, 1556.34 GFLOPS, 327.45 GOPS
5 188.08 ms run, 4.41 ms python, 183.67 ms REMOTE, 1967.49 loss, 0.001827 LR, 2.07 GB used, 1741.00 GFLOPS, 327.45 GOPS
6 211.19 ms run, 4.26 ms python, 206.93 ms REMOTE, 1511.62 loss, 0.001392 LR, 2.07 GB used, 1550.51 GFLOPS, 327.45 GOPS
```
this:
```
3 189.05 ms run, 4.50 ms python, 184.55 ms REMOTE, 2075.90 loss, 0.002698 LR, 1.60 GB used, 1732.08 GFLOPS, 327.45 GOPS
4 187.81 ms run, 4.11 ms python, 183.71 ms REMOTE, 2481.94 loss, 0.002262 LR, 1.60 GB used, 1743.49 GFLOPS, 327.45 GOPS
5 186.70 ms run, 4.09 ms python, 182.62 ms REMOTE, 1967.49 loss, 0.001827 LR, 1.60 GB used, 1753.89 GFLOPS, 327.45 GOPS
6 187.18 ms run, 4.06 ms python, 183.12 ms REMOTE, 1511.62 loss, 0.001392 LR, 1.60 GB used, 1749.36 GFLOPS, 327.45 GOPS
```
(`PYTHONPATH=. REMOTE=1 REMOTEDEV=METAL BS=256 STEPS=10 python examples/hlb_cifar10.py`)
Couldn't reliably reproduce the faster thing on tinybox though.
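For a quick comparison of the two logs above, the per-step numbers can be averaged. This is only a sketch over the four steps quoted in this message; the helper name `summarize` and the regexes are illustrative assumptions, not part of tinygrad.

```python
import re
from statistics import mean

# Log lines copied verbatim from the commit message above.
MASTER = """\
3 210.12 ms run, 4.34 ms python, 205.78 ms REMOTE, 2075.90 loss, 0.002698 LR, 2.07 GB used, 1558.41 GFLOPS, 327.45 GOPS
4 210.40 ms run, 4.33 ms python, 206.07 ms REMOTE, 2481.94 loss, 0.002262 LR, 2.07 GB used, 1556.34 GFLOPS, 327.45 GOPS
5 188.08 ms run, 4.41 ms python, 183.67 ms REMOTE, 1967.49 loss, 0.001827 LR, 2.07 GB used, 1741.00 GFLOPS, 327.45 GOPS
6 211.19 ms run, 4.26 ms python, 206.93 ms REMOTE, 1511.62 loss, 0.001392 LR, 2.07 GB used, 1550.51 GFLOPS, 327.45 GOPS
"""
THIS = """\
3 189.05 ms run, 4.50 ms python, 184.55 ms REMOTE, 2075.90 loss, 0.002698 LR, 1.60 GB used, 1732.08 GFLOPS, 327.45 GOPS
4 187.81 ms run, 4.11 ms python, 183.71 ms REMOTE, 2481.94 loss, 0.002262 LR, 1.60 GB used, 1743.49 GFLOPS, 327.45 GOPS
5 186.70 ms run, 4.09 ms python, 182.62 ms REMOTE, 1967.49 loss, 0.001827 LR, 1.60 GB used, 1753.89 GFLOPS, 327.45 GOPS
6 187.18 ms run, 4.06 ms python, 183.12 ms REMOTE, 1511.62 loss, 0.001392 LR, 1.60 GB used, 1749.36 GFLOPS, 327.45 GOPS
"""

def summarize(log):
    """Return (mean step time in ms, mean memory in GB) for one log (hypothetical helper)."""
    runs = [float(v) for v in re.findall(r"([\d.]+) ms run", log)]
    mem = [float(v) for v in re.findall(r"([\d.]+) GB used", log)]
    return mean(runs), mean(mem)

master_ms, master_gb = summarize(MASTER)
this_ms, this_gb = summarize(THIS)

# Relative reductions of this branch versus master.
time_saving = (master_ms - this_ms) / master_ms
mem_saving = (master_gb - this_gb) / master_gb
print(f"step time: {master_ms:.2f} ms -> {this_ms:.2f} ms ({time_saving:.1%} less)")
print(f"memory:    {master_gb:.2f} GB -> {this_gb:.2f} GB ({mem_saving:.1%} less)")
```

On these four steps that works out to roughly 8% less step time and 23% less memory. The sample is tiny and step 5 on master was already fast, so the timing figure should be read loosely.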
2025-05-15 11:20:01 -07:00
Ignacio Sica
3c453e96a9
add ds_load_b96 and ds_store_b96 instructions ( #10338 )
2025-05-15 18:11:08 +03:00
qazal
be8202b293
add s_abs_i32 instruction to remu ( #10334 )
2025-05-15 16:47:58 +03:00
nimlgen
5efbe1c947
print offset only for subbuf ( #10332 )
2025-05-15 15:35:19 +03:00
qazal
7cfe367c07
failing test for slow embedding kernel with FUSE_ARANGE=1 [pr] ( #10330 )
2025-05-15 14:58:11 +03:00
nimlgen
5f03688280
usbgpu: remove max_read_len ( #10328 )
2025-05-15 14:49:58 +03:00
qazal
27b3dbe67e
remove FUSE_ARANGE_UINT [pr] ( #10324 )
2025-05-15 14:39:54 +03:00
qazal
0a45cd0cbe
grouper: merge views in fuse elementwise ( #10325 )
...
* grouper: merge views in fuse elementwise
* with gradient api
2025-05-15 13:17:09 +03:00
qazal
89d8d5b25e
add dims check in FUSE_ARANGE ( #10323 )
2025-05-15 11:33:21 +03:00
qazal
8fad0f0124
grouper: check for unsafe PAD in FUSE ( #10322 )
2025-05-15 10:53:44 +03:00
chenyu
f008e5f233
test_dtype_alu should cast bf16 input ( #10320 )
...
when testing alu for bfloat16, it should cast inputs to bfloat16 first, otherwise the numpy reference has both errors from the input and errors from the alu, which makes the comparison less accurate
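A minimal numpy-only sketch of the reasoning, not the actual test code: bfloat16 is emulated here by rounding float32 values to their top 16 bits, and `to_bf16` is a hypothetical helper written for this illustration.

```python
import numpy as np

def to_bf16(x):
    """Round float32 values to the nearest bfloat16 (ties to even), returned as float32.

    Emulation for illustration only; assumes finite inputs.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    lsb = (bits >> np.uint32(16)) & np.uint32(1)        # lowest bit that survives
    rounded = (bits + np.uint32(0x7FFF) + lsb) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)

rng = np.random.default_rng(0)
a = rng.uniform(1.0, 2.0, 1000).astype(np.float32)
b = rng.uniform(1.0, 2.0, 1000).astype(np.float32)

# What a bf16 ALU would produce: bf16 inputs, bf16 output.
device = to_bf16(to_bf16(a) * to_bf16(b))

# Reference from un-cast float32 inputs: mixes input rounding error with ALU rounding error.
err_uncast = np.abs(device - a * b).max()

# Reference from inputs cast to bf16 first: only the ALU's output rounding remains.
err_cast = np.abs(device - to_bf16(a) * to_bf16(b)).max()

print("max error vs un-cast reference:", err_uncast)
print("max error vs cast reference:   ", err_cast)
```

With the inputs cast first, the remaining error is bounded by half a bfloat16 ulp of the product, so a test tolerance only has to cover the ALU itself.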
2025-05-15 01:11:39 -04:00
George Hotz
568d6d96e7
small changes from new multi [pr] ( #10318 )
2025-05-14 20:50:59 -07:00
chenyu
f6cf25fce4
cleanup test_conv2d_ceildiv_edge_case [pr] ( #10317 )
2025-05-14 23:35:28 -04:00
Kirill R.
50d7162acd
Add conv2d ceildiv edge case ( #10303 )
2025-05-14 22:50:23 -04:00
uuuvn
e5639b7788
Remote finalize ( #10314 )
...
* Remote `.q(..., wait=True)`
Seems a bit cleaner than doing `.batch_request()` after `.q(...)` for
requests with return value.
* Remote finalize
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-14 19:40:44 -07:00
George Hotz
bfc30fa6ea
hotfix: typo in shm_name
2025-05-14 19:34:52 -07:00
George Hotz
2bc54b3e22
manually handle OSX
2025-05-14 19:17:51 -07:00
George Hotz
ab460486d7
Revert "resnet dataloader osx ( #10316 )"
...
This reverts commit aef336930a.
2025-05-14 19:15:07 -07:00
uuuvn
7b4f27a219
Remote .q(..., wait=True) ( #10313 )
...
Seems a bit cleaner than doing `.batch_request()` after `.q(...)` for
requests with return value.
2025-05-14 19:07:20 -07:00
George Hotz
50181ab09f
hotfix: bump to 13500 lines
2025-05-14 18:49:59 -07:00
George Hotz
aef336930a
resnet dataloader osx ( #10316 )
...
* mlperf dataloader on mac
* resnet dataloader [pr]
* simple should work
2025-05-14 18:31:26 -07:00
wozeparrot
9b14e8c3cd
feat: tag 0.10.3 ( #10310 )
v0.10.3
2025-05-14 15:45:13 -07:00
George Hotz
18f532d110
small changes from O(1) multi [pr] ( #10309 )
2025-05-14 15:34:07 -07:00
wozeparrot
9bbc2bc2a7
hotfix: filter_too_much ( #10308 )
2025-05-14 15:31:51 -07:00
George Hotz
fc8ef63194
multi doesn't need tuple arg anymore [pr] ( #10307 )
2025-05-14 15:16:40 -07:00
George Hotz
7a3d4de59a
hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test
2025-05-14 14:50:37 -07:00
wozeparrot
2df2ec6640
feat: unpin hypothesis ( #10306 )
2025-05-14 14:26:28 -07:00
uuuvn
b52452d69f
Remote multi (graph) ( #9902 )
...
* Remote multi (graph)
* Remote multi (graph transfers)
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-14 10:54:23 -07:00
George Hotz
42e70193c9
multi: instead of real, just copy ( #10289 )
...
* multi: instead of real, just copy
* fix test
* remove real
2025-05-14 10:36:55 -07:00
qazal
043efc6ec4
do not require self for track_rewrites [pr] ( #10302 )
2025-05-14 18:23:32 +03:00
uuuvn
dd816d0237
All MultiGraphRunners can graph transfers ( #10301 )
2025-05-14 17:23:02 +03:00
nimlgen
e00679dc92
am_smi: fix layout with sleep mode ( #10300 )
2025-05-14 15:44:42 +03:00
chenyu
fbaa26247a
randn_like in minrf ( #10298 )
...
tested that it trains to similar loss
2025-05-14 07:59:50 -04:00
nimlgen
0788659d08
usbgpu: fast cold boot ( #10260 )
...
* usbgpu: fast cold boot
* cleaner
* assert
* xx
* compat
* fix
* fix
2025-05-14 14:58:55 +03:00
qazal
d342f7688d
remove some skips in test_schedule + use assertRaisesRegex [pr] ( #10296 )
2025-05-14 14:54:07 +03:00
qazal
40f4ce3390
enable AMD CI for TestRandomness.test_multinomial [pr] ( #10295 )
2025-05-14 14:32:22 +03:00