wozeparrot
ba90e1b52e
feat: script to run llama8b training ( #14239 )
2026-01-20 12:44:06 -08:00
Christopher Milan
daf9414bff
fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1 ( #14256 )
...
* fix nullptr arg to CUDA_KERNEL_NODE_PARAMS_v1
* ruff
2026-01-20 12:30:07 -05:00
chenyu
e04767e39e
run pre-commit in ci ( #14253 )
...
* run pre-commit in ci
prevents pre-commit regression
* IGNORE_OOB=1
* pytest
* unit test
* split
2026-01-20 12:24:33 -05:00
nimlgen
22af7132cd
fix test_dev_jitter_matrix ( #14255 )
2026-01-20 20:07:51 +03:00
Robbe Derks
c7fbd177d4
USBGPU: debug script for comma chestnut ( #14252 )
...
* initial debug script
* improvements
2026-01-20 18:52:25 +03:00
C T
26f8b12e01
Whisper audio helpers (mel filters in tinygrad) ( #13478 )
...
* add whisper audio helpers for stft/mel/resample
* cleanup
* add whisper stft test
* make only stft test explicitly depend on librosa
* extract sinc_window_kernel
* dehardcode device
* use same device argument
* simplify
* type annotate
* ruff format audio_helpers.py
* ruff format test_whisper.py
* add WHISPER_NEW_STFT
* rename
* undo ruff format changes
* use new stft and mel for whisper
* remove stft test that depends on librosa
* remove whitespace
* add Tensor.log10 with test/test_ops.py::TestOps::test_log10
* use Tensor.log10
* fix lint
* future: remove unused STFT class
* future: remove resample code since it isn't used (yet)
* match openai with pad_mode="reflect"
* pad_to
* future: cut resample leftovers
* cleanup
* add mel tests
* future: cut stft
* future: cut non-mel prep_audio changes
* reduce diff
* move audio_helpers.py to examples
* reduce whitespace
* fix imports
* reduce whitespace
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-20 10:50:02 -05:00
nimlgen
dc82856084
tbgpu: shim binary + remote apl pci dev ( #14124 )
...
* shim binary + remote pci dev
* v2
* rip out apl
* cmds
* rename
* clean
* remove
* rm gitignore
* ui
* install
* linter
* um
* cleaner
* assets
* normal install in ui
* cleaner app
* install script
* support fd mmap
* cleaner
* kill server when disconn
* rename + pcidevs
* sign
* install and reinstall
* no sip install
* will trigger update
* nv
* ugh
* this
* fix
* nv
* use nosip sign
* auto install
* remove
* mypy
* upd
* ditto
* print
* simpler
* ditto
* um
* simpler
* upd
* upd
* cleaner
* autogen
* cleaner
* move
* annotations
* server cleaner
2026-01-20 16:15:18 +03:00
qazal
4548fcc1b8
amd/sqtt: add rdna4 and cdna sqtt examples ( #14251 )
...
* amd/sqtt: add rdna4 and cdna sqtt examples
* work
* comment out rdna and cdna tests
2026-01-20 21:11:48 +09:00
qazal
2dc281b32a
assembly/amd: test helpers for arch to gfx target mapping ( #14250 )
2026-01-20 20:35:09 +09:00
nimlgen
823e88c0d0
nv: request bar 3 ( #14249 )
2026-01-20 13:52:38 +03:00
qazal
dddd0e384f
ALLOW_DEVICE_USAGE=0 in codegen ( #14238 )
2026-01-20 15:15:16 +09:00
George Hotz
0243f4a0f1
clear wins from ucode branch ( #14243 )
...
* clear wins from ucode branch
* two more
* revert those
2026-01-20 15:11:09 +09:00
George Hotz
5e24643889
minor import speedups ( #14244 )
...
* minor import speedups
* server stuff in server places
* pre-commit
* fix
2026-01-20 15:05:36 +09:00
George Hotz
d60a155e48
defer compilation of upats ( #14242 )
...
* defer compilation of upats
* mypy
2026-01-20 13:50:00 +09:00
George Hotz
56c8926d32
import speedups: refactor validate to late import ( #14241 )
...
* refactor validate to late import
* pre-commit stuff
* fix mypy
2026-01-20 13:23:39 +09:00
chenyu
9d3b1cf1e7
simpler _cached_to_python_const ( #14236 )
2026-01-19 23:10:53 -05:00
qazal
b1c5a242b7
Revert "move is_dtype_supported logic to renderer ( #14188 )" ( #14237 )
...
This reverts commit 161fee9a48.
2026-01-20 12:19:14 +09:00
wozeparrot
1f89eaf790
tk: fa bert mask fix + some numerical stability improvements ( #14214 )
2026-01-19 19:18:07 -08:00
chenyu
9ea63d7d52
failed test case for onnx IF with jit ( #14235 )
...
silently fails now since onnx treats IF cond as a const
2026-01-19 18:10:05 -05:00
Garret Castro
b65dc9fd8e
refactor: use generic type for ContextVar [pr] ( #13998 )
...
* use generic type for context var
removes ops_python string cast thing, allows for handling of other string vars like `_CC`
* update Context.old_context type
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-01-19 13:37:54 -05:00
Martin Szewieczek
7010c176cf
pre commit: fix path to test_assign.py ( #14231 )
2026-01-19 13:36:30 -05:00
Christopher Milan
34f6192739
look for cuda in /opt/cuda ( #14230 )
...
* look for cuda in /opt/cuda
* regen
2026-01-19 11:51:00 -05:00
qazal
0f61cbd51f
viz: draw shapes directly on the canvas ( #14229 )
2026-01-20 00:57:06 +09:00
nimlgen
acb0045ba0
system: alloc_sysmem is part of interface ( #14226 )
2026-01-19 18:15:54 +03:00
qazal
ab426cb671
viz: simplify row line logic ( #14227 )
2026-01-20 00:00:28 +09:00
nimlgen
01653db4fd
nv: GPPut is mmiointerface ( #14225 )
2026-01-19 17:36:26 +03:00
nimlgen
7cb7abeeb0
amd: fix scratch_wave64_lane_byte_size ( #14223 )
2026-01-19 15:21:39 +03:00
nimlgen
979ce211f7
amd: missing self in aql's exec ( #14224 )
2026-01-19 14:27:54 +03:00
George Hotz
31bcbed6bb
AMD_DISABLE_SDMA for testing with -n12 ( #14216 )
2026-01-19 16:10:30 +09:00
qazal
578a4a50d3
viz: row lines in timeline ( #14213 )
...
* simple start, already works for memory graph
* add height to exec packets
* math.max, border-color
* borderline is in pixels
* row border color
2026-01-19 13:01:43 +09:00
Christopher Milan
161fee9a48
move is_dtype_supported logic to renderer ( #14188 )
...
* move is_dtype_supported logic to renderer
* fix CPU_COUNT
* mypy happy
* early import libclang too with llvm
* run with debug
* skip autogen tests if MTLCompiler or llvm is loaded
* run autogen tests separately in CI
* lint
2026-01-18 22:37:04 -05:00
qazal
7abe9b020f
viz: add border colors to pkts timeline ( #14211 )
...
* viz: add border colors to pkts timeline
* 10
2026-01-19 11:37:46 +09:00
chenyu
67d9712ef6
jit copy aliased output if it's read later ( #14210 )
2026-01-18 18:48:59 -05:00
chenyu
97333b1954
jit footguns test case on assign with same buffer outputs ( #14209 )
...
related https://github.com/tinygrad/tinygrad/issues/13364
2026-01-18 16:01:09 -05:00
chenyu
e7c2df9113
improve consecutive Tensor indexing ( #14208 )
...
* improve consecutive Tensor indexing
instead of O(idx_counts*src_dims), it can just be O(idx_counts)
* test correctness
2026-01-18 15:14:33 -05:00
chenyu
c7b8f6496f
remove dtypes.index_like and dtypes.fields [pr] ( #14207 )
...
barely used, so just use inline and DTYPES_DICT
2026-01-18 11:49:01 -05:00
qazal
e27a0002c5
viz: only keep the sqtt bytes for pkts ( #14203 )
...
* viz: only keep the sqtt bytes for pkts
* better option name
* work
* renames
2026-01-18 17:04:26 +09:00
qazal
d8f87ae2f2
SQTT packets to assembly mapper ( #14198 )
...
* disasm + compare to llvm
* start inst trace
* base tests pass
* work
* work
* all kernels
* qol
* refactor
* work
* work
* wave_focus
* simple
* work
* add a lot of asserts
* focus on wave0
* correct handling of IMMEDIATE_MASK
* work
* viz work
* use the metadata infra
* better
2026-01-18 16:32:13 +09:00
Christopher Milan
1eb110cd7d
fix memory corruption in NIR, reenable process replay ( #14204 )
2026-01-18 02:05:12 -05:00
George Hotz
a51e0a86db
assembly/amd: clean up disasm.py + add CDNA support ( #14200 )
...
* assembly/amd: clean up disasm.py
* cleanups
* add missing encodings
* decode is pretty
* cdna
* assert on failure
* cdna roundtrip
* cdna passing
* test
* lil cleanup
* variant cleanups
* cleanups
2026-01-18 14:48:44 +09:00
chenyu
4b18c92bc5
simpler Context.__enter__ [pr] ( #14201 )
2026-01-18 00:38:59 -05:00
qazal
feaa804158
skip lvp process replay in CI [pr] ( #14202 )
2026-01-18 13:25:04 +09:00
chenyu
b12a9fea80
runtime int call instead of cast(int) ( #14183 )
2026-01-17 20:34:45 -05:00
George Hotz
79c1559f69
amd asm can still be simpler ( #14199 )
...
* amd asm can still be simpler
* simpler
* V_LANE_ID
* simpler
* simpler
* compact vgpr
2026-01-17 18:40:10 +09:00
chenyu
5e6a72c33f
new Onnx Gather ( #14187 )
...
instead of assuming const indices, check if it showed as a const
2026-01-16 22:24:07 -05:00
George Hotz
9f7f2f0e0c
MAX_SQTT_PKTS
2026-01-17 12:05:36 +09:00
George Hotz
50554115ee
fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul ( #14196 )
...
* fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul
* immed
* wave override
* restore ALT
* advance sgprs correctly
* no helpers
* decrease to 192 VGPRs
2026-01-17 11:58:34 +09:00
chenyu
ab244c7f81
onnx Gather should not assume indices to be const ( #14185 )
...
* onnx Gather should not assume indices to be const
added a failed test case
* just list
2026-01-16 20:55:00 -05:00
wozeparrot
a879b54234
tk: fa jit fix ( #14170 )
2026-01-16 16:38:45 -08:00
qazal
a8ae9757dd
viz: put alts in the same row, LDS color ( #14194 )
...
* viz: put alts in the same row, coloring work
* assert if packets overlap
* lds color
2026-01-17 09:36:14 +09:00