Commit Graph

11823 Commits

Author SHA1 Message Date
chenyu
8bb61c2490 stronger test_graph_input_output_aliasing (#14282)
* stronger test_graph_input_output_aliasing

* confirmed failure
2026-01-22 09:59:34 -05:00
qazal
d7afa02085 clean up the extra/sqtt directory (#14284)
* remove legacy test_timing stuff

* remove legacy test_pmc, update active_sqtt_parse
2026-01-22 19:10:59 +09:00
qazal
dff5f361b0 support rendering assembly kernels on the NULL backend (#14283)
* assembly custom kernels in DEV=NULL, use renderer arch

* update mmapeak

* llvm
2026-01-22 15:49:07 +09:00
qazal
dfefeddeed add tflops to cdna gemm custom kernel (#14281) 2026-01-22 12:48:28 +09:00
qazal
18f408a35a custom assembly kernel with variable tests (#14280)
* custom assembly kernel with variable tests

* different threads

* sink

* zeros like / flatten
2026-01-22 11:34:17 +09:00
chenyu
4de107b764 jit graph bug when input is output (#14278)
* jit graph bug when input is output

wrong result in llm

* not just metal
2026-01-21 18:49:52 -05:00
wozeparrot
76a9242a66 fa: merge kv bwd into one kernel (#14277) 2026-01-21 15:24:41 -08:00
chenyu
6279ae4a94 remove llm generate always reset start_pos (#14276)
* remove llm generate always reset start_pos

by itself seems like a bug, also added a test to repro forward_jit.reset() issue

* issue is jit graph, so revert that test
2026-01-21 16:54:30 -05:00
nimlgen
da1fedc3c8 working ioctls (#14272) 2026-01-21 20:29:04 +03:00
chenyu
574d171fa6 fix onnx Pad constant_value=None (#14271)
also removed a dead branch in _resolve_pool_pads
2026-01-21 11:51:34 -05:00
chenyu
a18d34be1e simpler split_store outer range check [pr] (#14273)
also fixed comment
2026-01-21 11:51:14 -05:00
chenyu
e64111ad08 update all_same [pr] (#14270)
add type annotation and unit test
2026-01-21 11:26:15 -05:00
chenyu
9ad3c865ac fix bug in logsumexp keepdim=True (#14268) 2026-01-21 09:49:55 -05:00
George Hotz
41d00a046d add device to local, fix PCONTIG=2 (#14266)
* add device to local, fix PCONTIG=2

* regression test

* remove the device when we render

* viz slowness

* no long
2026-01-21 22:12:18 +09:00
wozeparrot
c1d14ea832 llama8b train fixes (#14264) 2026-01-20 20:34:47 -08:00
qazal
549dbabfcb move ALLOW_DEVICE_USAGE=0 to get_program [pr] (#14263) 2026-01-21 12:56:05 +09:00
qazal
78a28227c6 assembly/amd: cdna4 mfma support (#14206) 2026-01-21 09:12:05 +09:00
George Hotz
1baefed530 assembly/amd: add hw tests from ucode branch (#14259)
* assembly/amd: add hw tests from ucode branch

* fix is per lane
2026-01-21 08:53:54 +09:00
wozeparrot
ba90e1b52e feat: script to run llama8b training (#14239) 2026-01-20 12:44:06 -08:00
Christopher Milan
daf9414bff fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1 (#14256)
* fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1

* ruff
2026-01-20 12:30:07 -05:00
chenyu
e04767e39e run pre-commit in ci (#14253)
* run pre-commit in ci

prevents pre-commit regression

* IGNORE_OOB=1

* pytest

* unit test

* split
2026-01-20 12:24:33 -05:00
nimlgen
22af7132cd fix test_dev_jitter_matrix (#14255) 2026-01-20 20:07:51 +03:00
Robbe Derks
c7fbd177d4 USBGPU: debug script for comma chestnut (#14252)
* initial debug script

* improvements
2026-01-20 18:52:25 +03:00
C T
26f8b12e01 Whisper audio helpers (mel filters in tinygrad) (#13478)
* add whisper audio helpers for stft/mel/resample

* cleanup

* add whisper stft test

* make only stft test explicitly depend on librosa

* extract sinc_window_kernel

* dehardcode device

* use same device argument

* simplify

* type annotate

* ruff format audio_helpers.py

* ruff format test_whisper.py

* add WHISPER_NEW_STFT

* rename

* undo ruff format changes

* use new stft and mel for whisper

* remove stft test that depends on librosa

* remove whitespace

* add Tensor.log10 with test\test_ops.py::TestOps::test_log10

* use Tensor.log10

* fix lint

* future: remove unused STFT class

* future: remove resample code since it isn't used (yet)

* match openai with pad_mode="reflect"

* pad_to

* future: cut resample leftovers

* cleanup

* add mel tests

* future: cut stft

* future: cut non-mel prep_audio changes

* reduce diff

* move audio_helpers.py to examples

* reduce whitespace

* fix imports

* reduce whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-20 10:50:02 -05:00
nimlgen
dc82856084 tbgpu: shim binary + remote apl pci dev (#14124)
* shim binary + remote pci dev

* v2

* rip out apl

* cmds

* rename

* clean

* remove

* rm gitignore

* ui

* install

* linter

* um

* cleaner

* assets

* normal install in ui

* cleaner app

* install script

* support fd mmap

* cleaner

* kill server when disconn

* rename + pcidevs

* sign

* install and reinstall

* no sip install

* will trigger update

* nv

* ugh

* this

* fix

* nv

* use nosip sign

* auto install

* remove

* mypy

* upd

* ditto

* print

* simpler

* ditto

* um

* simpler

* upd

* upd

* cleaner

* autogen

* cleaner

* move

* annotations

* server cleaner
2026-01-20 16:15:18 +03:00
qazal
4548fcc1b8 amd/sqtt: add rdna4 and cdna sqtt examples (#14251)
* amd/sqtt: add rdna4 and cdna sqtt examples

* work

* comment out rdna and cdna tests
2026-01-20 21:11:48 +09:00
qazal
2dc281b32a assembly/amd: test helpers for arch to gfx target mapping (#14250) 2026-01-20 20:35:09 +09:00
nimlgen
823e88c0d0 nv: request bar 3 (#14249) 2026-01-20 13:52:38 +03:00
qazal
dddd0e384f ALLOW_DEVICE_USAGE=0 in codegen (#14238) 2026-01-20 15:15:16 +09:00
George Hotz
0243f4a0f1 clear wins from ucode branch (#14243)
* clear wins from ucode branch

* two more

* revert those
2026-01-20 15:11:09 +09:00
George Hotz
5e24643889 minor import speedups (#14244)
* minor import speedups

* server stuff in server places

* pre-commit

* fix
2026-01-20 15:05:36 +09:00
George Hotz
d60a155e48 defer compilation of upats (#14242)
* defer compilation of upats

* mypy
2026-01-20 13:50:00 +09:00
George Hotz
56c8926d32 import speedups: refactor validate to late import (#14241)
* refactor validate to late import

* pre-commit stuff

* fix mypy
2026-01-20 13:23:39 +09:00
chenyu
9d3b1cf1e7 simpler _cached_to_python_const (#14236) 2026-01-19 23:10:53 -05:00
qazal
b1c5a242b7 Revert "move is_dtype_supported logic to renderer (#14188)" (#14237)
This reverts commit 161fee9a48.
2026-01-20 12:19:14 +09:00
wozeparrot
1f89eaf790 tk: fa bert mask fix + some numerical stability improvements (#14214) 2026-01-19 19:18:07 -08:00
chenyu
9ea63d7d52 failed test case for onnx IF with jit (#14235)
silently fails now since onnx treats IF cond as a const
2026-01-19 18:10:05 -05:00
Garret Castro
b65dc9fd8e refactor: use generic type for ContextVar [pr] (#13998)
* use generic type for context var

removes ops_python string cast thing, allows for handling of other string vars like `_CC`

* update Context.old_context type

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-19 13:37:54 -05:00
Martin Szewieczek
7010c176cf pre commit: fix path to test_assign.py (#14231) 2026-01-19 13:36:30 -05:00
Christopher Milan
34f6192739 look for cuda in /opt/cuda (#14230)
* look for cuda in /opt/cuda

* regen
2026-01-19 11:51:00 -05:00
qazal
0f61cbd51f viz: draw shapes directly on the canvas (#14229) 2026-01-20 00:57:06 +09:00
nimlgen
acb0045ba0 system: alloc_sysmem is part of interface (#14226) 2026-01-19 18:15:54 +03:00
qazal
ab426cb671 viz: simplify row line logic (#14227) 2026-01-20 00:00:28 +09:00
nimlgen
01653db4fd nv: GPPut is mmiointerface (#14225) 2026-01-19 17:36:26 +03:00
nimlgen
7cb7abeeb0 amd: fix scratch_wave64_lane_byte_size (#14223) 2026-01-19 15:21:39 +03:00
nimlgen
979ce211f7 amd: missing self in aql's exec (#14224) 2026-01-19 14:27:54 +03:00
George Hotz
31bcbed6bb AMD_DISABLE_SDMA for testing with -n12 (#14216) 2026-01-19 16:10:30 +09:00
qazal
578a4a50d3 viz: row lines in timeline (#14213)
* simple start, already works for memory graph

* add height to exec packets

* math.max, border-color

* borderline is in pixels

* row border color
2026-01-19 13:01:43 +09:00
Christopher Milan
161fee9a48 move is_dtype_supported logic to renderer (#14188)
* move is_dtype_supported logic to renderer

* fix CPU_COUNT

* mypy happy

* early import libclang too with llvm

* run with debug

* skip autogen tests if MTLCompiler or llvm is loaded

* run autogen tests separately in CI

* lint
2026-01-18 22:37:04 -05:00
qazal
7abe9b020f viz: add border colors to pkts timeline (#14211)
* viz: add border colors to pkts timeline

* 10
2026-01-19 11:37:46 +09:00