* viz cli work
* baseline test
* make cli test work without subprocess
* more checks
* check itrace
* s/return/return None
* change
* minimal
* colored
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
* improvements to the llm server
* eval script
* eval llm
* better eval gets 58.71
* cleanups
* add temperature, but multinomial is absurdly slow
* claude is so smart
* lint
* remove slop
* no more stop
* ai slop flash attention (it works)
* speed up, 2 TFLOPS + 7 GB/s
* simpler
* simpler
* optimize
* faster
* warp shuffle
* sqtt: link dispatch to exec (#15396)
* sqtt packet linking infra
* python
* javascript
* ~doubly linked list
* ui works
* work
* exec can also highlight the pc, coloring work
* more work
* rm sqtt/model.py, doesn't need to be upstreamed
* viz: no context enters in cli, update llama profile (#15404)
* removed unused named arg in rules [pr] (#15414)
* viz: sqtt printer in viz/cli.py (#15411)
* work
* sqtt timeline in CLI
* format all printers nicely
* s/Showed/Printed
* ansistrip
* sys.exit
* keep colors in list
* work from amd_copy_matmul
* has_more always gets returned
* linter
* don't print colors
* more colors
* wow this is so deep
* work
* minor details
* selected
* improve progress bar
* remove it
* 22, global_load_vaddr is so long
* remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416)
Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb
Co-authored-by: Amp <amp@ampcode.com>
* works
* cnt=20
* revert that
* uop slice tests
* simpler
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: gg <ggordbegli@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
* cdna emulator work
* accvgprs
* cdna passes most tests
* ruff
* add cdna4 to tests
* cdna emu
* crash
* pass?
* work
* gen
* clean up wave_size access
* asm_gemm passes
* remove acc from dsl.py, emulator can keep its different reg file
It's purely an encoding here; ASM_GEMM already encodes acc srcs with v[]. This can
be cleaned up later, but it is not functionally required for the emulator.
* split asm_gemm tests to ones fast on the emulator
* don't do that
* 124 stays null on rdna
* the segfault was because of hw regs, not this
* Revert "clean up wave_size access", it's explicitly tested
This reverts commit 1202ff5787.
* nullcopyout
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>