nimlgen
877a7fdd61
jit: support encdec ( #13563 )
* jit: support encdec
* fix
2025-12-04 11:58:34 +03:00
Douglas Nyberg
a8a62bc08e
add max/min reduction support to ScatterND ( #13562 )
2025-12-04 00:53:47 -08:00
ayanhan
edf929ec9d
fix: add __delitem__ to Tensor with proper TypeError ( #13561 )
2025-12-04 00:53:08 -08:00
Douglas Nyberg
9411ecedc4
fix CUDA half-precision trunc() type mismatch ( #13559 )
2025-12-03 21:53:16 -05:00
ayanhan
92b40290c7
fix: add test_sum_int and remove outdated TODO in test_custom_kernel ( #13560 )
2025-12-03 21:51:58 -05:00
Christopher Milan
0a54434b15
mitigate ctypes c_bool bitfield bug ( #13558 )
* mitigate ctypes c_bool bitfield bug
* don't delete old test
2025-12-03 20:46:04 -05:00
George Hotz
96d16675fe
update examples/gradaccum_mnist.py to use the JIT
2025-12-03 16:11:42 -08:00
George Hotz
24ca8eeaa7
small fixups from schedule_cache ( #13557 )
2025-12-03 15:41:16 -08:00
Douglas Nyberg
f5abd38132
remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS ( #13555 )
2025-12-03 17:48:27 -05:00
George Hotz
a4c4e48385
add LUNIQUE op ( #13554 )
2025-12-03 14:34:34 -08:00
George Hotz
a909cd4581
faster HEVC decode ( #13552 )
* faster HEVC decode
* bind to variables
* cleanups
* more cleanups
2025-12-03 11:33:05 -08:00
chenyu
22777a89ea
minor test_uop_symbolic updates ( #13551 )
2025-12-03 13:17:44 -05:00
chenyu
a205f98ef4
tighter bound for MOD ( #13550 )
2025-12-03 11:24:29 -05:00
nimlgen
fcdb01abe7
hip: fix ioctl ( #13548 )
2025-12-03 16:40:43 +03:00
qazal
aab7535805
viz: format buffer size unit ( #13547 )
2025-12-03 21:35:49 +08:00
nimlgen
daea1161cc
nv: nvdec for blackwell ( #13546 )
2025-12-03 16:30:22 +03:00
nimlgen
549f3287a8
fix caching for fetch ( #13544 )
2025-12-03 14:34:14 +03:00
qazal
8390de39e6
amd: static flag check for sqtt/pmc ( #13545 )
2025-12-03 18:36:15 +08:00
George Hotz
ddf3f2d0c4
rdna3 asm + zip_extract ( #13499 )
* rdna3 asm + zip_extract
* include sqtt
* fix end parsing
* disassembler working
* parsing fields
* instruction
* op
* more parsing
2025-12-02 22:56:01 -08:00
George Hotz
6bd355fa26
add needs_second_gpu decorator ( #13543 )
* add needs_second_gpu decorator
* more skips
* two more fixes
2025-12-02 19:08:23 -08:00
wozeparrot
0d55aec605
fix after end ( #13542 )
2025-12-02 18:42:58 -08:00
chenyu
8902781dc1
enable more benchmarks ( #13540 )
* enable more benchmarks
* disable some
* adjust ASSERT_MIN_STEP_TIME
* mac NOCLANG=1
2025-12-02 20:31:14 -05:00
George Hotz
055d5aeb7f
add external_test_process_count
2025-12-02 17:26:30 -08:00
chenyu
e8879f7e31
match torch clamp backward ( #13533 )
* match torch clamp backward
* fix PYTHON
2025-12-02 17:58:32 -05:00
qazal
7622be761f
add new remu instructions from #13533 ( #13539 )
2025-12-03 06:29:20 +08:00
wozeparrot
18640f57b2
feat: configurable timeout ( #13537 )
2025-12-02 13:35:35 -08:00
chenyu
21aac568fd
limit lift x*y out of reduce to int [pr] ( #13535 )
2025-12-02 16:11:45 -05:00
Roelof van Dijk
c158e3c988
add cifar gated uop_given_valid regression test ( #13536 )
2025-12-02 16:02:47 -05:00
Roelof van Dijk
e329baffa7
fix cifar while keeping openpilot fused ( #13528 )
* this works
* test now passes
2025-12-02 12:05:56 -08:00
nimlgen
0874ba8cc8
test_hevc: do not download the whole file ( #13531 )
* test_hevc: do not download the whole file
* fix
2025-12-02 21:31:28 +03:00
qazal
366badaa68
require renderer argument in get_program, removes device opening in process replay [pr] ( #13524 )
2025-12-03 02:05:31 +08:00
George Hotz
21184ae6b1
bump cache to 14 ( #13530 )
2025-12-02 08:02:19 -08:00
George Hotz
037edc151c
late gate for ALLOW_TF32 ( #13527 )
* remove ALLOW_TF32
* the right place to put that gate
2025-12-02 07:51:58 -08:00
Douglas Nyberg
6a7c58abf1
fix(onnx): unwrap list/tuple value in Pad op ( #13500 )
* fix(onnx): unwrap list/tuple value in Pad op
* add regression test for Pad list value
* remove trailing whitespace
* use _resolve_const for Pad constant_value
2025-12-02 07:47:20 -08:00
qazal
c65aa93081
refactor sqtt loader to enable PMC=1 SQTT=0 ( #13526 )
2025-12-02 22:50:38 +08:00
chenyu
60f7c6cce6
simpler drop_and_clauses [pr] ( #13525 )
2025-12-02 09:12:21 -05:00
nimlgen
77a76d1b13
device: respect compiler ContextVars ( #13523 )
* device: envvars for cc
* fix
* fix
* x
* um
* fix
* remote
* em
* cleanup
* typing
* fix
* debug
* lvp?
* ugh
* singl
* rm
* lol
* fix
* ?
* this?
* why?
* rev
* mod test
* l
2025-12-02 14:42:04 +03:00
wozeparrot
1b7dbfb37f
tk: named kernels + per kernel range id ( #13522 )
2025-12-01 22:51:04 -08:00
wozeparrot
8713ae6de9
fix: dead sdv2 download link ( #13521 )
2025-12-01 22:50:53 -08:00
George Hotz
44104b0b7f
mnist with grad acc + Adam on CPU ( #13520 )
* mnist with grad acc + Adam on CPU
* still broken, but closer
* works w/o jit
* this works without the jit
2025-12-01 18:27:32 -08:00
George Hotz
7307120311
shard to one device is to ( #13519 )
* shard to one device is to
* fst
2025-12-01 16:29:53 -08:00
chenyu
0b92fd30f5
simpler simplify_valid [pr] ( #13514 )
dedup instead of getting a True clause which is removed later
2025-12-01 17:36:33 -05:00
qazal
a5ec3b24be
viz: start PMC in the counters view ( #13510 )
2025-12-02 00:01:57 +08:00
nimlgen
759b41ab91
amd: fix rsrc_word3 on gfx9 ( #13509 )
2025-12-01 12:47:54 +03:00
chenyu
ebbd114885
simpler invalid alu [pr] ( #13508 )
2025-11-30 22:18:42 -05:00
George Hotz
ada6b92b2d
add a gate to rewrite if there's no rules [pr] ( #13506 )
2025-11-30 17:40:52 -08:00
George Hotz
97b56e11e0
hotfix: 32 workgroups for radeon 8050s
2025-11-30 08:20:17 -08:00
George Hotz
bd4b9de7d2
use numpy in amd_uop_matmul for simpler tracing ( #13503 )
2025-11-30 08:04:38 -08:00
qazal
9023ca30ef
show number of waves in each SE/CU ( #13491 )
* show number of waves in each SE/CU
* update to test_ones
2025-11-30 22:29:16 +08:00
nimlgen
455dd88236
nv: minimal hevc ( #13502 )
* nv: minimal hevc
* validate
* not needed
* tralin
* var
* cpu
* fxi
* desc
* move
* cleanup
2025-11-30 16:46:55 +03:00