Commit Graph

11275 Commits

Author SHA1 Message Date
George Hotz
74fb405cc9 reenable the actual schedule cache 2025-12-03 15:03:42 -08:00
George Hotz
bf5de6ba5f delete abstractions2 2025-12-03 15:02:20 -08:00
George Hotz
183b3ced03 fix process replay 2025-12-03 14:56:28 -08:00
George Hotz
2280dae504 src[0].op 2025-12-03 14:50:46 -08:00
George Hotz
9ba612f0b4 Merge branch 'master' into sched_cache 2025-12-03 14:50:29 -08:00
Douglas Nyberg
f5abd38132 remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS (#13555) 2025-12-03 17:48:27 -05:00
George Hotz
32794853db why is that broken? 2025-12-03 14:44:41 -08:00
George Hotz
4a72a49082 Merge branch 'master' into sched_cache 2025-12-03 14:34:49 -08:00
George Hotz
a4c4e48385 add LUNIQUE op (#13554) 2025-12-03 14:34:34 -08:00
George Hotz
9e6f8c823d always miss 2025-12-03 14:22:26 -08:00
George Hotz
4459a88a54 fix spec 2025-12-03 14:19:07 -08:00
George Hotz
9cdda8913f put that there 2025-12-03 14:15:13 -08:00
George Hotz
e644d59f9f oops, fix cache 2025-12-03 14:07:04 -08:00
George Hotz
37a930591f preserve metadata 2025-12-03 14:04:20 -08:00
George Hotz
723179dfd6 Merge branch 'master' into sched_cache 2025-12-03 13:43:58 -08:00
George Hotz
a909cd4581 faster HEVC decode (#13552)
* faster HEVC decode

* bind to variables

* cleanups

* more cleanups
2025-12-03 11:33:05 -08:00
chenyu
22777a89ea minor test_uop_symbolic updates (#13551) 2025-12-03 13:17:44 -05:00
chenyu
a205f98ef4 tighter bound for MOD (#13550) 2025-12-03 11:24:29 -05:00
nimlgen
fcdb01abe7 hip: fix ioctl (#13548) 2025-12-03 16:40:43 +03:00
qazal
aab7535805 viz: format buffer size unit (#13547) 2025-12-03 21:35:49 +08:00
nimlgen
daea1161cc nv: nvdec for blackwell (#13546) 2025-12-03 16:30:22 +03:00
nimlgen
549f3287a8 fix caching for fetch (#13544) 2025-12-03 14:34:14 +03:00
qazal
8390de39e6 amd: static flag check for sqtt/pmc (#13545) 2025-12-03 18:36:15 +08:00
George Hotz
ddf3f2d0c4 rdna3 asm + zip_extract (#13499)
* rdna3 asm + zip_extract

* include sqtt

* fix end parsing

* disassembler working

* parsing fields

* instruction

* op

* more parsing
2025-12-02 22:56:01 -08:00
George Hotz
81bafb1af3 Merge branch 'master' into sched_cache 2025-12-02 19:59:48 -08:00
George Hotz
6bd355fa26 add needs_second_gpu decorator (#13543)
* add needs_second_gpu decorator

* more skips

* two more fixes
2025-12-02 19:08:23 -08:00
wozeparrot
0d55aec605 fix after end (#13542) 2025-12-02 18:42:58 -08:00
chenyu
8902781dc1 enable more benchmarks (#13540)
* enable more benchmarks

* disable some

* adjust ASSERT_MIN_STEP_TIME

* mac NOCLANG=1
2025-12-02 20:31:14 -05:00
George Hotz
055d5aeb7f add external_test_process_count 2025-12-02 17:26:30 -08:00
George Hotz
ed89217ef2 fix tests 2025-12-02 17:14:06 -08:00
George Hotz
79f2cfcb96 schedule cache cleanup 2025-12-02 16:59:32 -08:00
George Hotz
add768aab0 schedule cache works 2025-12-02 16:40:30 -08:00
George Hotz
2d6cf839d5 local unique 2025-12-02 15:45:56 -08:00
chenyu
e8879f7e31 match torch clamp backward (#13533)
* match torch clamp backward

* fix PYTHON
2025-12-02 17:58:32 -05:00
qazal
7622be761f add new remu instructions from #13533 (#13539) 2025-12-03 06:29:20 +08:00
wozeparrot
18640f57b2 feat: configurable timeout (#13537) 2025-12-02 13:35:35 -08:00
chenyu
21aac568fd limit lift x*y out of reduce to int [pr] (#13535) 2025-12-02 16:11:45 -05:00
Roelof van Dijk
c158e3c988 add cifar gated uop_given_valid regression test (#13536) 2025-12-02 16:02:47 -05:00
George Hotz
b4c3a6977e Merge branch 'master' into sched_cache 2025-12-02 12:54:14 -08:00
Roelof van Dijk
e329baffa7 fix cifar while keeping openpilot fused (#13528)
* this works

* test now passes
2025-12-02 12:05:56 -08:00
nimlgen
0874ba8cc8 test_hevc: do not download the whole file (#13531)
* test_hevc: do not download the whole file

* fix
2025-12-02 21:31:28 +03:00
qazal
366badaa68 require renderer argument in get_program, removes device opening in process replay [pr] (#13524) 2025-12-03 02:05:31 +08:00
George Hotz
21184ae6b1 bump cache to 14 (#13530) 2025-12-02 08:02:19 -08:00
George Hotz
037edc151c late gate for ALLOW_TF32 (#13527)
* remove ALLOW_TF32

* the right place to put that gate
2025-12-02 07:51:58 -08:00
Douglas Nyberg
6a7c58abf1 fix(onnx): unwrap list/tuple value in Pad op (#13500)
* fix(onnx): unwrap list/tuple value in Pad op

* add regression test for Pad list value

* remove trailing whitespace

* use _resolve_const for Pad constant_value
2025-12-02 07:47:20 -08:00
George Hotz
7f7aa0a7f8 start work on schedule cache 2025-12-02 07:44:10 -08:00
qazal
c65aa93081 refactor sqtt loader to enable PMC=1 SQTT=0 (#13526) 2025-12-02 22:50:38 +08:00
chenyu
60f7c6cce6 simpler drop_and_clauses [pr] (#13525) 2025-12-02 09:12:21 -05:00
nimlgen
77a76d1b13 device: respect compiler ContextVars (#13523)
* device: envvars for cc

* fix

* fix

* x

* um

* fix

* remote

* em

* cleanup

* typing

* fix

* debug

* lvp?

* ugh

* singl

* rm

* lol

* fix

* ?

* this?

* why?

* rev

* mod test

* l
2025-12-02 14:42:04 +03:00
wozeparrot
1b7dbfb37f tk: named kernels + per kernel range id (#13522) 2025-12-01 22:51:04 -08:00