b1tg
241f0402b4
add seed in bert data shuffle ( #14054 )
2026-01-07 10:02:05 -05:00
nimlgen
25c82dd242
nv: profile nvdec ( #14053 )
2026-01-07 15:56:54 +03:00
qazal
35900290b2
viz: configure text height for cfg ( #14052 )
2026-01-07 18:58:56 +09:00
chenyu
87f4bc5446
update variable names around jit [pr] ( #14049 )
...
lbs, st_vars_dtype_device and rawbuffers no more
2026-01-06 22:32:41 -05:00
chenyu
2833c5a54b
few more jit tests with multi tensor inputs ( #14047 )
2026-01-06 22:05:22 -05:00
chenyu
72a3f78d19
jit includes tensor inputs in containers ( #14043 )
...
* jit includes tensor inputs in containers
* cleanup
2026-01-06 19:42:06 -05:00
chenyu
c714881832
don't allow jit input to be const ( #14045 )
...
* don't allow jit input to be unbuffered like const
* just const to fix multi
* fix rnnt
2026-01-06 18:15:22 -05:00
chenyu
a8896f28e1
test_unrealized_const_input_frozen ( #14044 )
...
unrealized const is not replaced in jit
2026-01-06 14:17:43 -05:00
nimlgen
325f4006ff
amd: copies w/o sdma ( #14036 )
...
* amd: copies w/o sdma
* as_args
* fixes
* f
2026-01-06 21:15:58 +03:00
chenyu
7fb18f7e47
raise when jit fxn returns non-Tensor output ( #14042 )
2026-01-06 12:59:20 -05:00
chenyu
4491ec0c9e
JitError ( #14041 )
...
* JitError
* test_symbolic_jit
2026-01-06 12:19:50 -05:00
chenyu
6ddddc68af
test jit tolist failure ( #14040 )
...
also moved tests to test_jit_footguns
2026-01-06 11:16:57 -05:00
chenyu
b699b9f763
test case for jit a function with item call ( #14039 )
...
* test case for jit a function with item call
output is silently wrong now
* no dtype
2026-01-06 10:40:43 -05:00
nimlgen
02084f5376
mockdsp: use dsp allocator ( #14037 )
...
* mockdsp: use dsp allocator
* fix
* ?
2026-01-06 16:04:47 +03:00
wozeparrot
2b3e01e79c
tk: support sliced local -> reg load ( #14034 )
2026-01-06 05:33:24 -05:00
George Hotz
45f7fd073d
assembly/amd: pcode bug fixes ( #14032 )
...
* bring over pcode parser
* fixes
* pdf test
* delay alu
2026-01-06 00:15:48 -08:00
wozeparrot
21d0f6bb76
tk: flat global -> local load ( #14033 )
2026-01-05 23:35:53 -08:00
qazal
3170365a5b
visualize SQTT with the same cfg infrastructure ( #13870 )
...
* start
* rough sketch
* post render dag
* art
* intro g key
* work
* custom color scale
* colors
* more blue
* better
* smaller
* use for loop in test
2026-01-06 14:53:20 +09:00
Christopher Milan
0120d69caa
autogen: avcodec (and simplify workflow) ( #14031 )
...
* simplify autogen workflow and add avcodec verification
- Consolidate all regeneration into single steps (delete + import)
- Remove continue-on-error and individual diff checks
- Use git diff at end to catch all differences
- Show artifact URL in failure message
- Add avcodec.py verification
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* patch avcodec
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 23:30:25 -05:00
George Hotz
20653d2996
assembly/amd: make pdf.py code shine ( #14029 )
...
* assembly/amd: make pdf.py code shine
* no merge
* pdf2 is the future
* something
* regen enums
* test
* work
* remove junk
* write
* pcode extraction
* pdf2 passes all tests
* simplify
* simpler pdf
* late filter
* remove hacks
* simplify pdf2.py
* field type
* remove defaults
* don't export srcenum
* simple pdf.py
* simpler
* cleaner
* less hack in PDF
2026-01-05 18:49:40 -08:00
qazal
ea7b149ca5
viz command line tool ( #14030 )
2026-01-06 10:19:47 +09:00
Christopher Milan
f86c728440
load libclang as 'libclang.so' too ( #14028 )
2026-01-05 16:56:16 -05:00
chenyu
eda6a73897
clean up canonicalize_device ( #14027 )
...
centralize the type check
2026-01-05 10:29:55 -05:00
chenyu
ce464b147a
clean up comments that mentioned outdated terms ( #14026 )
...
no MultiLazyBuffer and no ShapeTracker in comments
2026-01-05 09:42:58 -05:00
chenyu
83063cc3e4
onnx TensorScatter ( #14024 )
2026-01-05 09:05:22 -05:00
chenyu
9497ec00f2
fix onnx attention permute ( #14025 )
...
* fix onnx attention permute
* skip test_attention_4d_fp16_cpu too
2026-01-05 08:58:50 -05:00
qazal
5cff5698f7
viz: g key toggles graph and text view ( #14023 )
2026-01-05 22:41:45 +09:00
chenyu
7a81a3cb98
more passed onnx tests ( #14022 )
2026-01-05 07:46:27 -05:00
kim yongjin
34fe105386
remove unused LazySeq ( #14020 )
2026-01-05 07:38:33 -05:00
qazal
4f2f38bf64
viz: split cfg and table render ( #14021 )
2026-01-05 20:59:08 +09:00
nimlgen
70405b4f3c
am_smi: mi350 ( #14018 )
2026-01-05 13:10:56 +03:00
Christopher Milan
b2a0b9c551
autogen: dump patch in CI ( #14010 )
...
* autogen: don't fast-fail, produce patch artifact on differences
All verification steps now use continue-on-error to run completely.
Each job generates a patch artifact containing all differences found.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* add gen from header test
* fix tests
* fail if diff
* add forward decl autogen test
* remove confusing/wrong comments
* macos unittests set LIBCLANG_PATH
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 22:38:12 -05:00
chenyu
aae08b20e0
enable passed onnx tests ( #14017 )
2026-01-04 22:12:50 -05:00
chenyu
785d04d127
simpler einsum ( #14014 )
2026-01-04 20:38:59 -05:00
chenyu
f6a78a29e0
support einsum trace ( #14012 )
...
* support einsum trace
* test_einsum_scalar_cpu
2026-01-04 19:27:27 -05:00
George Hotz
404eed6172
assembly/amd: improve tests for asm ( #14007 )
...
* assembly/amd: improve tests for asm
* upd
* skip
* tests
* re bug
* more passing
* cleanups
* cdna fixups
* improve tests, better CDNA parsing
* fix CI
* no defs
* simpler
* all pass
* from pdf
* regen
2026-01-04 15:14:08 -08:00
wozeparrot
f550f9204c
fa: failing test for bwd jit ( #14009 )
...
* tk: failing test for bwd jit
* feat: mark expectedFailure
* clean: spaces
2026-01-04 16:57:43 -05:00
George Hotz
7abf4591ba
use bitsize on dtype ( #14011 )
...
* use bitsize on dtype [pr]
* bitsize
* bitsize in js export, but might be wrong
* reverts
* revert that
2026-01-04 12:16:21 -08:00
chenyu
cfb8bf5814
faster image load ( #13977 )
...
sometimes image load does not need to init with NAN
2026-01-04 13:09:59 -05:00
George Hotz
7ebda28692
assembly/amd: add CDNA support to asm ( #13982 )
...
* add CDNA support
* more cdna tests
* something
* fix more stuff
* more work
* simpler
* simplier
* cdna
* disasm
* less skip
* fixes
* simpler
2026-01-04 08:53:56 -08:00
chenyu
ad041416ca
delete unused rewrite rule [pr] ( #14006 )
2026-01-04 09:48:52 -05:00
nimlgen
bf356ae996
am: mi300 48bit address space ( #14004 )
...
* am: mi300 48bit address space
* fix
2026-01-04 15:19:25 +03:00
nimlgen
606786e152
am: do not sleep for each hive node during resets ( #14003 )
2026-01-04 14:02:11 +03:00
George Hotz
34ea053b26
assembly/amd: clean up pcode, jit pcode instead of static ( #14001 )
...
* assembly/amd: clean up pcode
* regen
* lil
* jit the pcode
* sendmsg
* cleanups
* inst prefetch lol
2026-01-03 23:06:15 -08:00
kamilisjon
280790e438
Reuse toposort in recursive_property ( #13993 )
2026-01-03 22:04:13 -08:00
kamilisjon
9a9564118c
[pr] Delete reverse_toposort ( #13987 )
...
* Delete reverse_toposort
* Update comment and profiler name
* Update profiler name
2026-01-03 22:03:44 -08:00
George Hotz
8328511808
assembly/amd: make the emu.py code shine ( #13996 )
...
* assembly/amd: make the code shine
* lil clean
* reg back in pcode
* cleanups
* gen fma_mix
* no writelane hacks
* fn cleanup
* dead vgpr_write
* readable
* smem
* cleanup bench_emu
* speedups
* simpler and faster
* direct inst._fn
* split fxn
* Revert "simpler and faster"
This reverts commit e85f6594b3.
* move lds to wavestate
* dispatcher
* pc in dispatch
* literal isn't wavestate
* cleanups + program
* one readlane
* exec_vop3sd in exec_vop
* cleaner exec_vopd
* fully merge VOP3P
* no special paths
* no SliceProxy
* low=0
* no bigint
* failing tests
* fma on python 3.13
2026-01-03 20:33:09 -08:00
qazal
bdb421f13e
process_replay: passthrough sink arg for Ops.PROGRAM input ( #14000 )
2026-01-04 13:09:39 +09:00
Galax
66caa9fe1d
fix: library linking for fedora systems ( #13999 )
2026-01-03 17:40:56 -08:00
chenyu
8003db2a28
test case of NOOP store load folding ( #13997 )
2026-01-03 14:39:26 -05:00