10417 Commits

Author SHA1 Message Date
George Hotz
9fc01c1e03 support for uop tags (#10477)
* support for uop tags [pr]

* test uop tags
2025-05-22 19:53:48 -07:00
chenyu
8cc2dff4d8 only float Tensors have gradient [pr] (#10475) 2025-05-22 21:02:11 -04:00
George Hotz
147f7747f2 remove the map from create_schedule_with_vars [pr] (#10472) 2025-05-22 15:58:25 -07:00
George Hotz
6d5f87a18a lshift/rshift reverse is broken [pr] (#10467) 2025-05-22 13:01:48 -07:00
Mike Ashcroft
209d4401f8 Merge SimpleMathTrait and MathTrait (#10463) 2025-05-22 11:47:22 -07:00
George Hotz
0d39bb5de1 rename to get_kernelize_map (#10465) 2025-05-22 11:44:44 -07:00
Xingyu
1e0a59aca4 fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend (#10464) 2025-05-22 10:54:13 -07:00
George Hotz
577a0b4cfa openpilot compile4 (wip) (#10407)
* openpilot compile4

* add copies

* remove junk
2025-05-22 10:47:34 -07:00
George Hotz
ab591fa4dd make schedule explicit about kernels [pr] (#10462) 2025-05-22 09:32:16 -07:00
George Hotz
c46edbf262 hotfix: add note to relu 2025-05-22 09:13:38 -07:00
George Hotz
c6cbf0145a check that arg on copy is only used on multi [pr] (#10461) 2025-05-22 09:08:43 -07:00
Ignacio Sica
f69722dc2a refactor cuda disassemble (#10449) 2025-05-22 08:58:24 -07:00
qazal
5c4cfbc22c remove merge_views from kernel grouping rewrite [pr] (#10457) 2025-05-22 18:36:54 +03:00
nimlgen
035dffb00c nv: refactor qmd from ctypes (#10459)
* nv: refactor qmd from ctypes

* shorter

* imports

* x

* fix prefetch
2025-05-22 17:20:11 +03:00
wozeparrot
12285e926a fix: apply ip version fixes during AMDIP creation (#10454) 2025-05-22 10:14:48 +03:00
Ignacio Sica
5e6b96a1be align 16 in ptx, metal, cuda and amd (#10450) 2025-05-21 14:38:54 -07:00
nimlgen
570cb89652 amd: handle all exceptions (#10448)
* amd: handle all exceptions

* linter
2025-05-21 16:51:44 +03:00
nimlgen
475a7583b3 usbgpu: tiny changes (#10445) 2025-05-21 16:20:35 +03:00
qazal
7720c1aef1 hotfix: remove viz_sz.py [pr] (#10446) 2025-05-21 14:17:42 +03:00
chenyu
7bfb20757c fix tensor int floor div (#10327)
* fix tensor int floor div

* test_float_floordiv_scalar
2025-05-21 06:46:54 -04:00
Sieds Lykles
2b4375f36d Correct divmod folding behind flag (#10433)
* add flag

* add test

* remove import
2025-05-21 06:46:13 -04:00
qazal
df4cbb69e9 move fuzz_schedule.py to extra [pr] (#10444) 2025-05-21 10:07:24 +03:00
chenyu
29624af872 skip commavq in external_model_benchmark (#10439)
precision issue with different onnxruntime version
2025-05-21 01:45:33 -04:00
George Hotz
03e7a99ca8 add edge cases found by codex [pr] (#10423)
* add edge cases found by codex [pr]

* another test

* more edgecases

* docs

* instructions

* fine, add that one

* nan cases

* roll failures

* inv prob

* more failing tests

* err, that's failing

* more tests

* more failures

* uop verif

* failures

* webgpu
2025-05-20 14:53:18 -07:00
nimlgen
2895198c36 am: download regs (#10419)
* am: download regs

* x

* linter

* mypy

* after merge

* raise

* fixed name

* fix

* xx

* remove

* missing reg

* missing reg

* move to online

* ops
2025-05-20 18:59:56 +03:00
nimlgen
965f9e0696 amd: refactor amdreg (#10427) 2025-05-20 15:23:28 +03:00
nimlgen
0b65c367f5 hotfix: make vfio not default (#10429)
* to validate

* disable vfio causing blockingio err
2025-05-20 15:08:49 +03:00
nimlgen
cfa5c1cac6 am: disable idle d3 (#10428) 2025-05-20 13:50:58 +03:00
nimlgen
252c1dc737 am: close flock in fini (#10426) 2025-05-20 13:15:48 +03:00
George Hotz
ceb9d94eab Update AGENTS.md 2025-05-19 17:59:59 -07:00
George Hotz
9389edf7ac hotfix: add AGENTS.md 2025-05-19 17:48:42 -07:00
uuuvn
ec9955c956 Use REAL_DEV for test skips (#10420)
This should fix remote cpu tests flakiness (segfaults were in
`test_data_parallel_resnet_train_step` which is skipped on cpu but wasn't
skipped on remote cpu)
2025-05-19 17:32:14 -07:00
nimlgen
9a199ccd81 am: try to modprobe vfio (#10418)
* am: try to modprobe vfio

* fix
2025-05-19 23:46:50 +03:00
chenyu
67d1364106 update LOGMLPERF in red resnet run_and_time (#10416) 2025-05-19 13:23:33 -04:00
Sieds Lykles
db09676250 Dont simplify gate in gate, fix FUSE_ARANGE=1 python test/test_ops.py TestOps.test_scatter_add (#10411)
* substitute out index

* Add test

* change comment
2025-05-19 13:16:21 -04:00
chenyu
116d9e6306 run mlperf resnet on red box (#10413)
also made push to `update_mlperf` branch trigger
2025-05-19 12:48:36 -04:00
George Hotz
f1fe1f93c1 hotfix: 14000 lines 2025-05-19 09:40:53 -07:00
qazal
90eb3c0e5d add MobileNetV2 benchmark to comma CI (#10250)
* add MobileNetV2 to comma CI

* symlink imagenet

* also the signature

* comment that out

* need imagenetmock

* same train and test set

* quantize on CPU=1

* verbose

* need __hexagon_divsf3

* 0x858d6c15

* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
qazal
f9a5ad24c5 faster viz to_program [pr] (#10410)
* faster viz to_program [pr]

* Callable
2025-05-19 12:27:49 +03:00
qazal
cc8dda1d75 move multi_map to grouper rewrite pass (#10409)
* move multi_map to grouper rewrite pass

* delete that
2025-05-19 10:44:06 +03:00
George Hotz
b06291077c no amdgpu kernel driver (#10408)
* no amdgpu kernel driver

* don't test hip

* lower req
2025-05-18 20:52:39 -07:00
George Hotz
4b1f1a47bb hotfix: allow ModuleNotFoundError in metal llvm import 2025-05-18 20:46:31 -07:00
chenyu
485e80da69 run_and_time for resnet ci (#10405) 2025-05-18 23:39:57 -04:00
qazal
d1eeb19437 count viz javascript in lines (#10403)
* count viz javascript in lines

* don't count }

* it's javascript

* share with autogen
2025-05-18 19:34:00 -07:00
qazal
260d194523 merge insert_fuse and do_fuse [pr] (#10406) 2025-05-19 04:44:36 +03:00
uuuvn
33cf33902a Slightly less slow remote copyin (#10404)
bytes concat is slow, don't do it if data is already present in self._h

also don't cast memoryview into bytes (copy, +100ms) before it's needed

this mitigates shard copying before shrink

master:
```
*** REMOTE     6 copy 1073.74M,  REMOTE <- METAL           arg  2 mem  2.15 GB tm    806.84ms/   829.61ms (     0.00 GFLOPS    1.3|1.3     GB/s)
*** REMOTE:    7 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  3.22 GB tm    797.41ms/  1627.02ms (     0.00 GFLOPS    1.3|1.3     GB/s)
*** REMOTE:    8 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  4.29 GB tm    677.89ms/  2304.91ms (     0.00 GFLOPS    1.6|1.6     GB/s)
*** REMOTE:    9 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  5.37 GB tm    659.81ms/  2964.72ms (     0.00 GFLOPS    1.6|1.6     GB/s)
*** REMOTE:   10 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  6.44 GB tm    679.21ms/  3643.93ms (     0.00 GFLOPS    1.6|1.6     GB/s)
*** REMOTE:   11 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  7.52 GB tm    673.90ms/  4317.83ms
```

this:
```
*** REMOTE     6 copy 1073.74M,  REMOTE <- METAL           arg  2 mem  2.15 GB tm    867.06ms/   895.58ms (     0.00 GFLOPS    1.2|1.2     GB/s)
*** REMOTE:    7 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  3.22 GB tm    433.35ms/  1328.93ms (     0.00 GFLOPS    2.5|2.5     GB/s)
*** REMOTE:    8 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  4.29 GB tm    433.19ms/  1762.12ms (     0.00 GFLOPS    2.5|2.5     GB/s)
*** REMOTE:    9 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  5.37 GB tm    432.71ms/  2194.83ms (     0.00 GFLOPS    2.5|2.5     GB/s)
*** REMOTE:   10 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  6.44 GB tm    433.68ms/  2628.51ms (     0.00 GFLOPS    2.5|2.5     GB/s)
*** REMOTE:   11 copy 1073.74M, REMOTE: <- METAL           arg  2 mem  7.52 GB tm    432.91ms/  3061.42ms
```

The 430ms is basically all sha256 time.
2025-05-18 16:20:43 -07:00
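The two ideas in the commit message above (skip the concat when the data is already known by hash, and keep a `memoryview` instead of copying it into `bytes` early) can be sketched roughly as follows. This is an illustrative sketch only, not the tinygrad remote code: the class `HashedUploader` and its attributes `known`, `pending`, and `bytes_queued` are hypothetical names, and it assumes `self._h` in the commit refers to some hash-keyed store of already-sent data.

```python
import hashlib

class HashedUploader:
    """Hypothetical stand-in for a session that deduplicates blobs by sha256."""
    def __init__(self):
        self.known = set()     # digests already queued or sent (assumed structure)
        self.pending = []      # chunks waiting to be sent, kept as separate pieces
        self.bytes_queued = 0

    def copyin(self, data: memoryview) -> str:
        # hashlib reads through the buffer protocol, so no bytes(data) copy is needed
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.known:
            # only queue data that has not been seen before
            self.known.add(digest)
            self.pending.append(data)
            self.bytes_queued += data.nbytes
        return digest

    def flush(self) -> bytes:
        # a single join at send time instead of repeated bytes concatenation
        payload = b"".join(self.pending)
        self.pending.clear()
        self.bytes_queued = 0
        return payload

up = HashedUploader()
buf = memoryview(bytearray(b"x" * 1024))
d1 = up.copyin(buf)
d2 = up.copyin(buf)  # same content: hashed again, but not queued again
```

In this sketch the hash is still computed on every call, which is consistent with the commit's remark that the remaining time is essentially sha256 time.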
qazal
e55ee28b29 little smaller viz/worker.js [pr] (#10402) 2025-05-18 23:44:46 +03:00
qazal
8a6fb37560 move viz /prof to extra [pr] (#10401) 2025-05-18 23:25:59 +03:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
uuuvn
0f825e12f2 Remote fixedvars (#10371)
* amd mockgpu graph support

For testing remote graph stuff (prompted by #10371) in ci

* Remote fixedvars

Somehow none of the existing tests failed when fixedvars were added; looking
for what to add as a regression test for this

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-18 09:57:13 -07:00