Commit Graph

11810 Commits

Author SHA1 Message Date
George Hotz
41d00a046d add device to local, fix PCONTIG=2 (#14266)
* add device to local, fix PCONTIG=2

* regression test

* remove the device when we render

* viz slowness

* no long
2026-01-21 22:12:18 +09:00
wozeparrot
c1d14ea832 llama8b train fixes (#14264) 2026-01-20 20:34:47 -08:00
qazal
549dbabfcb move ALLOW_DEVICE_USAGE=0 to get_program [pr] (#14263) 2026-01-21 12:56:05 +09:00
qazal
78a28227c6 assembly/amd: cdna4 mfma support (#14206) 2026-01-21 09:12:05 +09:00
George Hotz
1baefed530 assembly/amd: add hw tests from ucode branch (#14259)
* assembly/amd: add hw tests from ucode branch

* fix is per lane
2026-01-21 08:53:54 +09:00
wozeparrot
ba90e1b52e feat: script to run llama8b training (#14239) 2026-01-20 12:44:06 -08:00
Christopher Milan
daf9414bff fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1 (#14256)
* fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1

* ruff
2026-01-20 12:30:07 -05:00
chenyu
e04767e39e run pre-commit in ci (#14253)
* run pre-commit in ci

prevents pre-commit regression

* IGNORE_OOB=1

* pytest

* unit test

* split
2026-01-20 12:24:33 -05:00
nimlgen
22af7132cd fix test_dev_jitter_matrix (#14255) 2026-01-20 20:07:51 +03:00
Robbe Derks
c7fbd177d4 USBGPU: debug script for comma chestnut (#14252)
* initial debug script

* improvements
2026-01-20 18:52:25 +03:00
C T
26f8b12e01 Whisper audio helpers (mel filters in tinygrad) (#13478)
* add whisper audio helpers for stft/mel/resample

* cleanup

* add whisper stft test

* make only stft test explicitly depend on librosa

* extract sinc_window_kernel

* dehardcode device

* use same device argument

* simplify

* type annotate

* ruff format audio_helpers.py

* ruff format test_whisper.py

* add WHISPER_NEW_STFT

* rename

* undo ruff format changes

* use new stft and mel for whisper

* remove stft test that depends on librosa

* remove whitespace

* add Tensor.log10 with test/test_ops.py::TestOps::test_log10

* use Tensor.log10

* fix lint

* future: remove unused STFT class

* future: remove resample code since it isn't used (yet)

* match openai with pad_mode="reflect"

* pad_to

* future: cut resample leftovers

* cleanup

* add mel tests

* future: cut stft

* future: cut non-mel prep_audio changes

* reduce diff

* move audio_helpers.py to examples

* reduce whitespace

* fix imports

* reduce whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-20 10:50:02 -05:00
nimlgen
dc82856084 tbgpu: shim binary + remote apl pci dev (#14124)
* shim binary + remote pci dev

* v2

* rip out apl

* cmds

* rename

* clean

* remove

* rm gitignore

* ui

* install

* linter

* um

* cleaner

* assets

* normal install in ui

* cleaner app

* install script

* support fd mmap

* cleaner

* kill server when disconn

* rename + pcidevs

* sign

* install and reinstall

* no sip install

* will trigger update

* nv

* ugh

* this

* fix

* nv

* use nosip sign

* auto install

* remove

* mypy

* upd

* ditto

* print

* simpler

* ditto

* um

* simpler

* upd

* upd

* cleaner

* autogen

* cleaner

* move

* annotations

* server cleaner
2026-01-20 16:15:18 +03:00
qazal
4548fcc1b8 amd/sqtt: add rdna4 and cdna sqtt examples (#14251)
* amd/sqtt: add rdna4 and cdna sqtt examples

* work

* comment out rdna and cdna tests
2026-01-20 21:11:48 +09:00
qazal
2dc281b32a assembly/amd: test helpers for arch to gfx target mapping (#14250) 2026-01-20 20:35:09 +09:00
nimlgen
823e88c0d0 nv: request bar 3 (#14249) 2026-01-20 13:52:38 +03:00
qazal
dddd0e384f ALLOW_DEVICE_USAGE=0 in codegen (#14238) 2026-01-20 15:15:16 +09:00
George Hotz
0243f4a0f1 clear wins from ucode branch (#14243)
* clear wins from ucode branch

* two more

* revert those
2026-01-20 15:11:09 +09:00
George Hotz
5e24643889 minor import speedups (#14244)
* minor import speedups

* server stuff in server places

* pre-commit

* fix
2026-01-20 15:05:36 +09:00
George Hotz
d60a155e48 defer compilation of upats (#14242)
* defer compilation of upats

* mypy
2026-01-20 13:50:00 +09:00
George Hotz
56c8926d32 import speedups: refactor validate to late import (#14241)
* refactor validate to late import

* pre-commit stuff

* fix mypy
2026-01-20 13:23:39 +09:00
chenyu
9d3b1cf1e7 simpler _cached_to_python_const (#14236) 2026-01-19 23:10:53 -05:00
qazal
b1c5a242b7 Revert "move is_dtype_supported logic to renderer (#14188)" (#14237)
This reverts commit 161fee9a48.
2026-01-20 12:19:14 +09:00
wozeparrot
1f89eaf790 tk: fa bert mask fix + some numerical stability improvements (#14214) 2026-01-19 19:18:07 -08:00
chenyu
9ea63d7d52 failed test case for onnx IF with jit (#14235)
silently fails now since onnx treats IF cond as a const
2026-01-19 18:10:05 -05:00
Garret Castro
b65dc9fd8e refactor: use generic type for ContextVar [pr] (#13998)
* use generic type for context var

removes ops_python string cast thing, allows for handling of other string vars like `_CC`

* update Context.old_context type

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-19 13:37:54 -05:00
Martin Szewieczek
7010c176cf pre commit: fix path to test_assign.py (#14231) 2026-01-19 13:36:30 -05:00
Christopher Milan
34f6192739 look for cuda in /opt/cuda (#14230)
* look for cuda in /opt/cuda

* regen
2026-01-19 11:51:00 -05:00
qazal
0f61cbd51f viz: draw shapes directly on the canvas (#14229) 2026-01-20 00:57:06 +09:00
nimlgen
acb0045ba0 system: alloc_sysmem is part of interface (#14226) 2026-01-19 18:15:54 +03:00
qazal
ab426cb671 viz: simplify row line logic (#14227) 2026-01-20 00:00:28 +09:00
nimlgen
01653db4fd nv: GPPut is mmiointerface (#14225) 2026-01-19 17:36:26 +03:00
nimlgen
7cb7abeeb0 amd: fix scratch_wave64_lane_byte_size (#14223) 2026-01-19 15:21:39 +03:00
nimlgen
979ce211f7 amd: missing self in aql's exec (#14224) 2026-01-19 14:27:54 +03:00
George Hotz
31bcbed6bb AMD_DISABLE_SDMA for testing with -n12 (#14216) 2026-01-19 16:10:30 +09:00
qazal
578a4a50d3 viz: row lines in timeline (#14213)
* simple start, already works for memory graph

* add height to exec packets

* math.max, border-color

* borderline is in pixels

* row border color
2026-01-19 13:01:43 +09:00
Christopher Milan
161fee9a48 move is_dtype_supported logic to renderer (#14188)
* move is_dtype_supported logic to renderer

* fix CPU_COUNT

* mypy happy

* early import libclang too with llvm

* run with debug

* skip autogen tests if MTLCompiler or llvm is loaded

* run autogen tests separately in CI

* lint
2026-01-18 22:37:04 -05:00
qazal
7abe9b020f viz: add border colors to pkts timeline (#14211)
* viz: add border colors to pkts timeline

* 10
2026-01-19 11:37:46 +09:00
chenyu
67d9712ef6 jit copy aliased output if it's read later (#14210) 2026-01-18 18:48:59 -05:00
chenyu
97333b1954 jit footguns test case on assign with same buffer outputs (#14209)
related https://github.com/tinygrad/tinygrad/issues/13364
2026-01-18 16:01:09 -05:00
chenyu
e7c2df9113 improve consecutive Tensor indexing (#14208)
* improve consecutive Tensor indexing

instead of O(idx_counts*src_dims), it can just be O(idx_counts)

* test correctness
2026-01-18 15:14:33 -05:00
chenyu
c7b8f6496f remove dtypes.index_like and dtypes.fields [pr] (#14207)
barely used, so just use inline and DTYPES_DICT
2026-01-18 11:49:01 -05:00
qazal
e27a0002c5 viz: only keep the sqtt bytes for pkts (#14203)
* viz: only keep the sqtt bytes for pkts

* better option name

* work

* renames
2026-01-18 17:04:26 +09:00
qazal
d8f87ae2f2 SQTT packets to assembly mapper (#14198)
* disasm + compare to llvm

* start inst trace

* base tests pass

* work

* work

* all kernels

* qol

* refactor

* work

* work

* wave_focus

* simple

* work

* add a lot of asserts

* focus on wave0

* correct handling of IMMEDIATE_MASK

* work

* viz work

* use the metadata infra

* better
2026-01-18 16:32:13 +09:00
Christopher Milan
1eb110cd7d fix memory corruption in NIR, reenable process replay (#14204) 2026-01-18 02:05:12 -05:00
George Hotz
a51e0a86db assembly/amd: clean up disasm.py + add CDNA support (#14200)
* assembly/amd: clean up disasm.py

* cleanups

* add missing encodings

* decode is pretty

* cdna

* assert on failure

* cdna roundtrip

* cdna passing

* test

* lil cleanup

* variant cleanups

* cleanups
2026-01-18 14:48:44 +09:00
chenyu
4b18c92bc5 simpler Context.__enter__ [pr] (#14201) 2026-01-18 00:38:59 -05:00
qazal
feaa804158 skip lvp process replay in CI [pr] (#14202) 2026-01-18 13:25:04 +09:00
chenyu
b12a9fea80 runtime int call instead of cast(int) (#14183) 2026-01-17 20:34:45 -05:00
George Hotz
79c1559f69 amd asm can still be simpler (#14199)
* amd asm can still be simpler

* simpler

* V_LANE_ID

* simpler

* simpler

* compact vgpr
2026-01-17 18:40:10 +09:00
chenyu
5e6a72c33f new Onnx Gather (#14187)
instead of assuming const indices, check if it showed as a const
2026-01-16 22:24:07 -05:00