12045 Commits

Author SHA1 Message Date
b1tg
241f0402b4 add seed in bert data shuffle (#14054) 2026-01-07 10:02:05 -05:00
nimlgen
25c82dd242 nv: profile nvdec (#14053) 2026-01-07 15:56:54 +03:00
qazal
35900290b2 viz: configure text height for cfg (#14052) 2026-01-07 18:58:56 +09:00
chenyu
87f4bc5446 update variable names around jit [pr] (#14049)
lbs, st_vars_dtype_device and rawbuffers no more
2026-01-06 22:32:41 -05:00
chenyu
2833c5a54b few more jit tests with multi tensor inputs (#14047) 2026-01-06 22:05:22 -05:00
chenyu
72a3f78d19 jit includes tensor inputs in containers (#14043)
* jit includes tensor inputs in containers

* cleanup
2026-01-06 19:42:06 -05:00
chenyu
c714881832 don't allow jit input to be const (#14045)
* don't allow jit input to be unbuffered like const

* just const to fix multi

* fix rnnt
2026-01-06 18:15:22 -05:00
chenyu
a8896f28e1 test_unrealized_const_input_frozen (#14044)
unrealized const is not replaced in jit
2026-01-06 14:17:43 -05:00
nimlgen
325f4006ff amd: copies w/o sdma (#14036)
* amd: copies w/o sdma

* as_args

* fixes

* f
2026-01-06 21:15:58 +03:00
chenyu
7fb18f7e47 raise when jit fxn returns non-Tensor output (#14042) 2026-01-06 12:59:20 -05:00
chenyu
4491ec0c9e JitError (#14041)
* JitError

* test_symbolic_jit
2026-01-06 12:19:50 -05:00
chenyu
6ddddc68af test jit tolist failure (#14040)
also moved tests to test_jit_footguns
2026-01-06 11:16:57 -05:00
chenyu
b699b9f763 test case for jit a function with item call (#14039)
* test case for jit a function with item call

output is silently wrong now

* no dtype
2026-01-06 10:40:43 -05:00
nimlgen
02084f5376 mockdsp: use dsp allocator (#14037)
* mockdsp: use dsp allocator

* fix

* ?
2026-01-06 16:04:47 +03:00
wozeparrot
2b3e01e79c tk: support sliced local -> reg load (#14034) 2026-01-06 05:33:24 -05:00
George Hotz
45f7fd073d assembly/amd: pcode bug fixes (#14032)
* bring over pcode parser

* fixes

* pdf test

* delay alu
2026-01-06 00:15:48 -08:00
wozeparrot
21d0f6bb76 tk: flat global -> local load (#14033) 2026-01-05 23:35:53 -08:00
qazal
3170365a5b visualize SQTT with the same cfg infrastructure (#13870)
* start

* rough sketch

* post render dag

* art

* intro g key

* work

* custom color scale

* colors

* more blue

* better

* smaller

* use for loop in test
2026-01-06 14:53:20 +09:00
Christopher Milan
0120d69caa autogen: avcodec (and simplify workflow) (#14031)
* simplify autogen workflow and add avcodec verification

- Consolidate all regeneration into single steps (delete + import)
- Remove continue-on-error and individual diff checks
- Use git diff at end to catch all differences
- Show artifact URL in failure message
- Add avcodec.py verification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* patch avcodec

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 23:30:25 -05:00
George Hotz
20653d2996 assembly/amd: make pdf.py code shine (#14029)
* assembly/amd: make pdf.py code shine

* no merge

* pdf2 is the future

* something

* regen enums

* test

* work

* remove junk

* write

* pcode extraction

* pdf2 passes all tests

* simplify

* simpler pdf

* late filter

* remove hacks

* simplify pdf2.py

* field type

* remove defaults

* don't export srcenum

* simple pdf.py

* simpler

* cleaner

* less hack in PDF
2026-01-05 18:49:40 -08:00
qazal
ea7b149ca5 viz command line tool (#14030) 2026-01-06 10:19:47 +09:00
Christopher Milan
f86c728440 load libclang as 'libclang.so' too (#14028) 2026-01-05 16:56:16 -05:00
chenyu
eda6a73897 clean up canonicalize_device (#14027)
centralize the type check
2026-01-05 10:29:55 -05:00
chenyu
ce464b147a clean up comments that mentioned outdated terms (#14026)
no MultiLazyBuffer and no ShapeTracker in comments
2026-01-05 09:42:58 -05:00
chenyu
83063cc3e4 onnx TensorScatter (#14024) 2026-01-05 09:05:22 -05:00
chenyu
9497ec00f2 fix onnx attention permute (#14025)
* fix onnx attention permute

* skip test_attention_4d_fp16_cpu too
2026-01-05 08:58:50 -05:00
qazal
5cff5698f7 viz: g key toggles graph and text view (#14023) 2026-01-05 22:41:45 +09:00
chenyu
7a81a3cb98 more passed onnx tests (#14022) 2026-01-05 07:46:27 -05:00
kim yongjin
34fe105386 remove unused LazySeq (#14020) 2026-01-05 07:38:33 -05:00
qazal
4f2f38bf64 viz: split cfg and table render (#14021) 2026-01-05 20:59:08 +09:00
nimlgen
70405b4f3c am_smi: mi350 (#14018) 2026-01-05 13:10:56 +03:00
Christopher Milan
b2a0b9c551 autogen: dump patch in CI (#14010)
* autogen: don't fast-fail, produce patch artifact on differences

All verification steps now use continue-on-error to run completely.
Each job generates a patch artifact containing all differences found.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* add gen from header test

* fix tests

* fail if diff

* add forward decl autogen test

* remove confusing/wrong comments

* macos unittests set LIBCLANG_PATH

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-04 22:38:12 -05:00
chenyu
aae08b20e0 enable passed onnx tests (#14017) 2026-01-04 22:12:50 -05:00
chenyu
785d04d127 simpler einsum (#14014) 2026-01-04 20:38:59 -05:00
chenyu
f6a78a29e0 support einsum trace (#14012)
* support einsum trace

* test_einsum_scalar_cpu
2026-01-04 19:27:27 -05:00
George Hotz
404eed6172 assembly/amd: improve tests for asm (#14007)
* assembly/amd: improve tests for asm

* upd

* skip

* tests

* re bug

* more passing

* cleanups

* cdna fixups

* improve tests, better CDNA parsing

* fix CI

* no defs

* simpler

* all pass

* from pdf

* regen
2026-01-04 15:14:08 -08:00
wozeparrot
f550f9204c fa: failing test for bwd jit (#14009)
* tk: failing test for bwd jit

* feat: mark expectedFailure

* clean: spaces
2026-01-04 16:57:43 -05:00
George Hotz
7abf4591ba use bitsize on dtype (#14011)
* use bitsize on dtype [pr]

* bitsize

* bitsize in js export, but might be wrong

* reverts

* revert that
2026-01-04 12:16:21 -08:00
chenyu
cfb8bf5814 faster image load (#13977)
sometimes image load does not need to init with NAN
2026-01-04 13:09:59 -05:00
George Hotz
7ebda28692 assembly/amd: add CDNA support to asm (#13982)
* add CDNA support

* more cdna tests

* something

* fix more stuff

* more work

* simpler

* simplier

* cdna

* disasm

* less skip

* fixes

* simpler
2026-01-04 08:53:56 -08:00
chenyu
ad041416ca delete unused rewrite rule [pr] (#14006) 2026-01-04 09:48:52 -05:00
nimlgen
bf356ae996 am: mi300 48bit address space (#14004)
* am: mi300 48bit address space

* fix
2026-01-04 15:19:25 +03:00
nimlgen
606786e152 am: do not sleep for each hive node during resets (#14003) 2026-01-04 14:02:11 +03:00
George Hotz
34ea053b26 assembly/amd: clean up pcode, jit pcode instead of static (#14001)
* assembly/amd: clean up pcode

* regen

* lil

* jit the pcode

* sendmsg

* cleanups

* inst prefetch lol
2026-01-03 23:06:15 -08:00
kamilisjon
280790e438 Reuse toposort in recursive_property (#13993) 2026-01-03 22:04:13 -08:00
kamilisjon
9a9564118c [pr] Delete reverse_toposort (#13987)
* Delete reverse_toposort

* Update comment and profiler name

* Update profiler name
2026-01-03 22:03:44 -08:00
George Hotz
8328511808 assembly/amd: make the emu.py code shine (#13996)
* assembly/amd: make the code shine

* lil clean

* reg back in pcode

* cleanups

* gen fma_mix

* no writelane hacks

* fn cleanup

* dead vgpr_write

* readable

* smem

* cleanup bench_emu

* speedups

* simpler and faster

* direct inst._fn

* split fxn

* Revert "simpler and faster"

This reverts commit e85f6594b3.

* move lds to wavestate

* dispatcher

* pc in dispatch

* literal isn't wavestate

* cleanups + program

* one readlane

* exec_vop3sd in exec_vop

* cleaner exec_vopd

* fully merge VOP3P

* no special paths

* no SliceProxy

* low=0

* no bigint

* failing tests

* fma on python 3.13
2026-01-03 20:33:09 -08:00
qazal
bdb421f13e process_replay: passthrough sink arg for Ops.PROGRAM input (#14000) 2026-01-04 13:09:39 +09:00
Galax
66caa9fe1d fix: library linking for fedora systems (#13999) 2026-01-03 17:40:56 -08:00
chenyu
8003db2a28 test case of NOOP store load folding (#13997) 2026-01-03 14:39:26 -05:00