George Hotz
9fc01c1e03
support for uop tags (#10477)
* support for uop tags [pr]
* test uop tags
2025-05-22 19:53:48 -07:00
chenyu
8cc2dff4d8
only float Tensors have gradient [pr] (#10475)
2025-05-22 21:02:11 -04:00
George Hotz
147f7747f2
remove the map from create_schedule_with_vars [pr] (#10472)
2025-05-22 15:58:25 -07:00
George Hotz
6d5f87a18a
lshift/rshift reverse is broken [pr] (#10467)
2025-05-22 13:01:48 -07:00
Mike Ashcroft
209d4401f8
Merge SimpleMathTrait and MathTrait (#10463)
2025-05-22 11:47:22 -07:00
George Hotz
0d39bb5de1
rename to get_kernelize_map (#10465)
2025-05-22 11:44:44 -07:00
Xingyu
1e0a59aca4
fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend (#10464)
2025-05-22 10:54:13 -07:00
George Hotz
577a0b4cfa
openpilot compile4 (wip) (#10407)
* openpilot compile4
* add copies
* remove junk
2025-05-22 10:47:34 -07:00
George Hotz
ab591fa4dd
make schedule explicit about kernels [pr] (#10462)
2025-05-22 09:32:16 -07:00
George Hotz
c46edbf262
hotfix: add note to relu
2025-05-22 09:13:38 -07:00
George Hotz
c6cbf0145a
check that arg on copy is only used on multi [pr] (#10461)
2025-05-22 09:08:43 -07:00
Ignacio Sica
f69722dc2a
refactor cuda disassemble (#10449)
2025-05-22 08:58:24 -07:00
qazal
5c4cfbc22c
remove merge_views from kernel grouping rewrite [pr] (#10457)
2025-05-22 18:36:54 +03:00
nimlgen
035dffb00c
nv: refactor qmd from ctypes (#10459)
* nv: refactor qmd from ctypes
* shorter
* imports
* x
* fix prefetch
2025-05-22 17:20:11 +03:00
wozeparrot
12285e926a
fix: apply ip version fixes during AMDIP creation (#10454)
2025-05-22 10:14:48 +03:00
Ignacio Sica
5e6b96a1be
align 16 in ptx, metal, cuda and amd (#10450)
2025-05-21 14:38:54 -07:00
nimlgen
570cb89652
amd: handle all exceptions (#10448)
* amd: handle all exceptions
* linter
2025-05-21 16:51:44 +03:00
nimlgen
475a7583b3
usbgpu: tiny changes (#10445)
2025-05-21 16:20:35 +03:00
qazal
7720c1aef1
hotfix: remove viz_sz.py [pr] (#10446)
2025-05-21 14:17:42 +03:00
chenyu
7bfb20757c
fix tensor int floor div (#10327)
* fix tensor int floor div
* test_float_floordiv_scalar
2025-05-21 06:46:54 -04:00
Sieds Lykles
2b4375f36d
Correct divmod folding behind flag (#10433)
* add flag
* add test
* remove import
2025-05-21 06:46:13 -04:00
qazal
df4cbb69e9
move fuzz_schedule.py to extra [pr] (#10444)
2025-05-21 10:07:24 +03:00
chenyu
29624af872
skip commavq in external_model_benchmark (#10439)
precision issue with a different onnxruntime version
2025-05-21 01:45:33 -04:00
George Hotz
03e7a99ca8
add edge cases found by codex [pr] (#10423)
* add edge cases found by codex [pr]
* another test
* more edgecases
* docs
* instructions
* fine, add that one
* nan cases
* roll failures
* inv prob
* more failing tests
* err, that's failing
* more tests
* more failures
* uop verif
* failures
* webgpu
2025-05-20 14:53:18 -07:00
nimlgen
2895198c36
am: download regs (#10419)
* am: download regs
* x
* linter
* mypy
* after merge
* raise
* fixed name
* fix
* xx
* remove
* missing reg
* missing reg
* move to online
* ops
2025-05-20 18:59:56 +03:00
nimlgen
965f9e0696
amd: refactor amdreg (#10427)
2025-05-20 15:23:28 +03:00
nimlgen
0b65c367f5
hotfix: make vfio not default (#10429)
* to validate
* disable vfio causing blockingio err
2025-05-20 15:08:49 +03:00
nimlgen
cfa5c1cac6
am: disable idle d3 (#10428)
2025-05-20 13:50:58 +03:00
nimlgen
252c1dc737
am: close flock in fini (#10426)
2025-05-20 13:15:48 +03:00
George Hotz
ceb9d94eab
Update AGENTS.md
2025-05-19 17:59:59 -07:00
George Hotz
9389edf7ac
hotfix: add AGENTS.md
2025-05-19 17:48:42 -07:00
uuuvn
ec9955c956
Use REAL_DEV for test skips (#10420)
This should fix remote CPU test flakiness (the segfaults were in `test_data_parallel_resnet_train_step`, which is skipped on CPU but wasn't skipped on remote CPU).
2025-05-19 17:32:14 -07:00
nimlgen
9a199ccd81
am: try to modprobe vfio (#10418)
* am: try to modprobe vfio
* fix
2025-05-19 23:46:50 +03:00
chenyu
67d1364106
update LOGMLPERF in red resnet run_and_time (#10416)
2025-05-19 13:23:33 -04:00
Sieds Lykles
db09676250
Don't simplify gate in gate, fix FUSE_ARANGE=1 python test/test_ops.py TestOps.test_scatter_add (#10411)
* substitute out index
* Add test
* change comment
2025-05-19 13:16:21 -04:00
chenyu
116d9e6306
run mlperf resnet on red box (#10413)
Also made pushes to the `update_mlperf` branch trigger the run.
2025-05-19 12:48:36 -04:00
George Hotz
f1fe1f93c1
hotfix: 14000 lines
2025-05-19 09:40:53 -07:00
qazal
90eb3c0e5d
add MobileNetV2 benchmark to comma CI (#10250)
* add MobileNetV2 to comma CI
* symlink imagenet
* also the signature
* comment that out
* need imagenetmock
* same train and test set
* quantize on CPU=1
* verbose
* need __hexagon_divsf3
* 0x858d6c15
* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
qazal
f9a5ad24c5
faster viz to_program [pr] (#10410)
* faster viz to_program [pr]
* Callable
2025-05-19 12:27:49 +03:00
qazal
cc8dda1d75
move multi_map to grouper rewrite pass (#10409)
* move multi_map to grouper rewrite pass
* delete that
2025-05-19 10:44:06 +03:00
George Hotz
b06291077c
no amdgpu kernel driver (#10408)
* no amdgpu kernel driver
* don't test hip
* lower req
2025-05-18 20:52:39 -07:00
George Hotz
4b1f1a47bb
hotfix: allow ModuleNotFoundError in metal llvm import
2025-05-18 20:46:31 -07:00
chenyu
485e80da69
run_and_time for resnet ci (#10405)
2025-05-18 23:39:57 -04:00
qazal
d1eeb19437
count viz javascript in lines (#10403)
* count viz javascript in lines
* don't count }
* it's javascript
* share with autogen
2025-05-18 19:34:00 -07:00
qazal
260d194523
merge insert_fuse and do_fuse [pr] (#10406)
2025-05-19 04:44:36 +03:00
uuuvn
33cf33902a
Slightly less slow remote copyin (#10404)
Bytes concatenation is slow, so skip it if the data is already present in `self._h`.
Also, don't cast the memoryview into bytes (a copy, +100ms) before it's needed.
This mitigates the cost of copying shards before shrink.
master:
```
*** REMOTE 6 copy 1073.74M, REMOTE <- METAL arg 2 mem 2.15 GB tm 806.84ms/ 829.61ms ( 0.00 GFLOPS 1.3|1.3 GB/s)
*** REMOTE: 7 copy 1073.74M, REMOTE: <- METAL arg 2 mem 3.22 GB tm 797.41ms/ 1627.02ms ( 0.00 GFLOPS 1.3|1.3 GB/s)
*** REMOTE: 8 copy 1073.74M, REMOTE: <- METAL arg 2 mem 4.29 GB tm 677.89ms/ 2304.91ms ( 0.00 GFLOPS 1.6|1.6 GB/s)
*** REMOTE: 9 copy 1073.74M, REMOTE: <- METAL arg 2 mem 5.37 GB tm 659.81ms/ 2964.72ms ( 0.00 GFLOPS 1.6|1.6 GB/s)
*** REMOTE: 10 copy 1073.74M, REMOTE: <- METAL arg 2 mem 6.44 GB tm 679.21ms/ 3643.93ms ( 0.00 GFLOPS 1.6|1.6 GB/s)
*** REMOTE: 11 copy 1073.74M, REMOTE: <- METAL arg 2 mem 7.52 GB tm 673.90ms/ 4317.83ms
```
this:
```
*** REMOTE 6 copy 1073.74M, REMOTE <- METAL arg 2 mem 2.15 GB tm 867.06ms/ 895.58ms ( 0.00 GFLOPS 1.2|1.2 GB/s)
*** REMOTE: 7 copy 1073.74M, REMOTE: <- METAL arg 2 mem 3.22 GB tm 433.35ms/ 1328.93ms ( 0.00 GFLOPS 2.5|2.5 GB/s)
*** REMOTE: 8 copy 1073.74M, REMOTE: <- METAL arg 2 mem 4.29 GB tm 433.19ms/ 1762.12ms ( 0.00 GFLOPS 2.5|2.5 GB/s)
*** REMOTE: 9 copy 1073.74M, REMOTE: <- METAL arg 2 mem 5.37 GB tm 432.71ms/ 2194.83ms ( 0.00 GFLOPS 2.5|2.5 GB/s)
*** REMOTE: 10 copy 1073.74M, REMOTE: <- METAL arg 2 mem 6.44 GB tm 433.68ms/ 2628.51ms ( 0.00 GFLOPS 2.5|2.5 GB/s)
*** REMOTE: 11 copy 1073.74M, REMOTE: <- METAL arg 2 mem 7.52 GB tm 432.91ms/ 3061.42ms
```
The 430ms is basically all sha256 time.
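A minimal sketch of the idea in this message, assuming a content-addressed cache keyed by SHA-256 digests (suggested by the sha256 note, but not confirmed). `HypotheticalCopyinCache` and `copyin` are illustrative names, not tinygrad's remote device code: the digest is computed directly on the memoryview, and the `bytes(...)` copy happens only when the payload is not already stored.

```python
import hashlib

class HypotheticalCopyinCache:
  """Hypothetical content-addressed store, not tinygrad's actual implementation."""
  def __init__(self) -> None:
    self._h: dict[bytes, bytes] = {}
    self.copies = 0  # counts how many payloads were materialized as bytes

  def copyin(self, src: memoryview) -> bytes:
    # hashlib accepts any contiguous buffer, so hashing needs no bytes(...) copy
    key = hashlib.sha256(src).digest()
    # only copy the payload into bytes when it is new; repeated uploads skip the copy
    if key not in self._h:
      self._h[key] = bytes(src)
      self.copies += 1
    return key
```

Uploading the same buffer twice returns the same key and copies it once, so the remaining per-upload cost is the hash itself.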
2025-05-18 16:20:43 -07:00
qazal
e55ee28b29
little smaller viz/worker.js [pr] (#10402)
2025-05-18 23:44:46 +03:00
qazal
8a6fb37560
move viz /prof to extra [pr] (#10401)
2025-05-18 23:25:59 +03:00
George Hotz
411392dfb7
move files into uop dir (#10399)
* move files into uop dir [pr]
* tinygrad.uop is a thing
* fix uop docs, no pr
* fix viz
2025-05-18 11:38:28 -07:00
uuuvn
0f825e12f2
Remote fixedvars (#10371)
* amd mockgpu graph support
For testing remote graph stuff in CI (prompted by #10371)
* Remote fixedvars
Somehow none of the existing tests failed when fixedvars were added; looking into
what to add as a regression test for this
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-18 09:57:13 -07:00