4838 Commits

Author SHA1 Message Date
chenyu
0a98fd38b3 fix tests that failed locally on mac (#13872)
keccak output was silently broken without contiguous
2025-12-29 11:23:38 -05:00
Clément Verrier
0e409ff5ce fix indentation in UOp pretty_print for repeated references (#13857)
* fix correct indentation in UOp pretty_print for repeated references

When a UOp was referenced multiple times, the walrus operator notation
(e.g., x0:=) was correctly used for the first occurrence, but subsequent
references had misaligned indentation due to an extra space character.

Fix indentation misalignment in pretty_print() when UOps are referenced
multiple times.

* add simple unit tests for UOp repr

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-12-29 10:46:16 -05:00
George Hotz
25ef866e89 write python emulator from RDNA3 psuedocode in pdf (#13841)
* write python emulator from RDNA3 psuedocode in pdf

* emu2

* more emu

* working

* more psueod

* progress

* cleanups

* delete junk

* delete stale files

* just emu

* work

* emu compare

* bemu

* cleanups and more failures

* revert bench emu

* fix emu cmp

* four tests fail

* bugfixes

* dsl

* ext

* refactor

* dsl

* div scale fix

* test_emu

* fix emu tests

* pcode

* test pcode

* top imports

* fix test_emu to use run_asm

* emu tests on real hardware

* more tests

* more emu tests

* more

* work

* work

* bug fix

* bugfixes

* fix fp16 gemm

* all ops tests pass in emulator

* fix llvm tests

* fix a few more tests

* fix mockgpu timeout
2025-12-29 07:39:53 -05:00
nimlgen
c6769badc2 mockgpu: async support (#13868)
* mockgpu: async support

* cpu
2025-12-29 13:18:37 +03:00
chenyu
784b919f7f Revert "optim empty shard #13513 (#13598)" (#13855)
* Revert "optim empty shard #13513 (#13598)"

This reverts commit 76d465dbc3.

* test_arange_shrink

* update test
2025-12-27 21:10:23 -05:00
anu
9b4de8abc7 fix beam in python 3.14+ (#13836)
* fix beam search on python 3.14

* add PickleableCount class to helpers

* change name, add test, add step

* tidy count init
2025-12-27 16:24:22 -05:00
Clément Verrier
ae013beab8 handle empty VECTORIZE in UOp.render() (#13847)
`UOp.render()` crashed with `IndexError: tuple index out of range` when
the UOp graph contained a `VECTORIZE` with empty `src=()`. This occurs
when reshaping to scalar shape `()`, e.g., `Tensor.ones(4).sum()`.

The bug was in the renderer's VECTORIZE pattern: `all_same(())` returns
`True` (vacuous truth), causing the code to access `x.src[0]` on an
empty tuple.

- Fix `IndexError` when calling `UOp.render()` on graphs containing
  empty `VECTORIZE` nodes.
- Add test for empty `VECTORIZE` rendering.
2025-12-27 10:09:39 -05:00
qazal
a2da61d096 use new style amd compiler in viz (#13848)
* working version, handcode gfx1100 arch

* get target from device properties

* lib in cfg test program spec
2025-12-27 23:59:30 +09:00
qazal
f6de9095a0 switch asm tests to dsl (#13840)
* switch asm tests to dsl

* labeled basic blocks also work

* indenting for basic blocks

* allow define from star import
2025-12-27 02:15:16 +09:00
George Hotz
9d94b8c6b2 python asm dsl in extra + python REMU (#13436)
* having fun with python asm dsl

* rdna3

* meh

* all in rdna3

* work

* more work

* work

* integration

* tests

* simpler

* simpler

* asm

* better

* simpler

* progress

* emu

* simpler

* emu

* tests

* types

* vopd

* cleaups

* work

* memory ranges

* add tracing

* refactors

* run_asm exit

* more readable

* compare to remu

* test gemm

* bug + stale

* more tests

* refactor

* tests fix

* more ins

* more instructions

* refactor

* faster

* match case

* match case

* simpler

* work

* tests

* run_asm

* work

* bug fixes

* more emu

* alu/emu

* refactor

* no pipeline emu yet

* alu direct

* fix

* bugfixes + new test

* fix exceptions in emulators

* update gen.py

* pylint

* no pdf

* improve bench_emu

* speedups

* cleanups

* more tests
2025-12-25 13:04:14 -05:00
chenyu
54af29dbdb trange can just be a function (#13827) 2025-12-24 23:57:10 -05:00
qazal
a1c1684b91 set .amdhsa_kernarg_size in asm test (#13826) 2025-12-25 13:08:14 +09:00
George Hotz
43c6e973d8 add optional compiler in Renderer (#13817)
* add optional compiler in Renderer [pr]

* fix

* late init

* remove precompiled

* cleanup
2025-12-23 17:58:46 -05:00
nimlgen
90b217896f am: xgmi p2p (#13811)
* system: use addr space

* am: xgmi

* fix

* ugh
2025-12-23 20:11:38 +03:00
George Hotz
6439a515be test fixups / speedups / var_vals refactor (#13812)
* no PYTHONPATH + llm server port 0

* llm tok speedup

* refactor var_vals
2025-12-23 12:05:59 -05:00
George Hotz
8dcba2e2cc no full_rewrite [pr] (#13809)
* no full_rewrite [pr]

* fix

* fix docs
2025-12-22 23:20:01 -05:00
George Hotz
2af2b4da5d Revert "rewrites for renderer and compiler (#13646)" (#13806)
This reverts commit 339dadf056.
2025-12-22 19:21:33 -05:00
George Hotz
339dadf056 rewrites for renderer and compiler (#13646)
* rewrites for renderer and compiler

* full_rewrite_to_program

* fix pre-commit

* compiler passed into get_program

* no pkl compiler

* lib on program spec

* fix spec

* fix test

* no device

* compiler_device

* nm

* fix nir

* fix

* simplest

* fix tests

* revert
2025-12-22 18:58:43 -05:00
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
qazal
389f01c7f4 viz: amdgpu assembly basic block graph (#13755) 2025-12-22 23:17:16 +08:00
George Hotz
df0f9d6860 add olmoe support to llm (#13792)
* add olmoe support to llm

* cleanups

* simpler

* clean

* fix mypy

* lil

* remove dumb assert
2025-12-22 10:41:35 -04:00
chenyu
5cb827f7bf clean up can_lossless_cast and add missing pairs [p] (#13793) 2025-12-21 12:18:33 -05:00
George Hotz
75a6a03664 add qwen3 moe support to tinygrad.apps.llm (#13775)
* qwen moe works

* simple moe

* one test

* integration
2025-12-21 12:36:02 -04:00
qazal
dc660c9fc0 remove stale / untested viz related files (#13785) 2025-12-21 16:42:48 +08:00
George Hotz
59c02dd87f does this fix the dtype test? (#13779)
* does this fix the dtype test?

* simpler
2025-12-20 17:31:46 -04:00
chenyu
733ef0452c update test_uop_resolve (#13777)
plain @unittest.expectedFailure is too broad
2025-12-20 12:40:59 -05:00
George Hotz
45c459848d remove more stale stuff (#13765)
* remove more stale stuff

* remove disassemblers/adreno

* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
Christopher Milan
97103831c5 Revert "remove image from BufferSpec (#13636)" (#13761)
This reverts commit 2571a1eb47.
2025-12-19 13:54:36 -05:00
Christopher Milan
2571a1eb47 remove image from BufferSpec (#13636)
* remove image from BufferSpec

* cl tiny_gemm (64) works

* mypy

* padding

* openpilot CL

* reshape properly

* remove extra qcom checks

* pad output

* mypy

* update compile test

* move undo

* TestImageCopy valid images

* TestImageRealization valid images

* TestImageDType valid images

* cleanups

* test_renderer_failures

* ruff

* mypy

* simplify ops_qcom

* bump step time
2025-12-19 13:41:20 -05:00
chenyu
185a000882 gradient of COPY (#13760) 2025-12-19 13:33:59 -05:00
George Hotz
fa40df972f fix tests for NV (#13744)
* small fix

* min diff

* bfloat16 out
2025-12-18 13:20:21 -04:00
wozeparrot
99e667bdcd tk fa bwd (#13480) 2025-12-17 23:56:37 -08:00
George Hotz
aeb7516c8a tests passing on tinybox h3 (#13742) 2025-12-17 19:04:34 -04:00
George Hotz
b013244c38 fix local tests for AMD_LLVM (#13738)
* fix local tests for AMD_LLVM

* fix linters

* skip that for now

* fix segfault
2025-12-17 12:23:46 -04:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
George Hotz
9015a22523 make tests faster (#13734) 2025-12-17 09:39:44 -04:00
chenyu
fda73c8180 support LAMB param offload (#13730)
also added Tensor.shard_like
2025-12-16 19:56:30 -05:00
George Hotz
cf0c28d5ae all tests pass on strix halo (#13728) 2025-12-16 19:35:50 -04:00
George Hotz
4b741e893f remove REMOTE=1 (#13722)
* remove REMOTE=1

* leave ibverbs
2025-12-16 15:58:10 -04:00
George Hotz
bfe374c7f5 support symbolic shapes in split/chunk when split dim is concrete (#13718)
* support symbolic shapes in split/chunk when split dim is concrete

Previously split() and chunk() required all dimensions to be concrete.
Now they only require the dimension being split to be concrete, allowing
them to work with tensors that have symbolic shapes in other dimensions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* update CLAUDE.md: add pre-commit and no-amend rules

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix dim resolution order in split/chunk

Ensure dim_sz is retrieved after dim is resolved, not before.
The previous one-liner evaluated self.shape[dim] with the original
unresolved dim value.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 13:55:06 -04:00
George Hotz
e5a66ace80 multi custom kernel support (#13716)
* multi custom kernel support

* custom kernel xfrom

* works

* no SPEC=2 on ck

* panic

* touchups
2025-12-16 11:36:30 -04:00
George Hotz
321ab943b2 qwen model is working (#13690)
* qwen model is mostly working

* add Q4_K quantization support to GGUF parser, add qwen3:1.7b model

- Add Q4_K (type 12) dequantization in nn/state.py
- Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0)
- Make bos_token_id optional for models like Qwen3 that don't have it
- Fix line length issues and add preset parameter to SimpleTokenizer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* smaller diff

* test dequant

* half split

* better

* simple tok

* mock token

* polish

* better

* fix

* replace

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 18:00:34 -04:00
wozeparrot
5d509499b2 tk: kernel finish groups stores (#13704) 2025-12-15 09:16:17 -08:00
George Hotz
54a22aa298 add test for jit footguns (#13701)
* add test for jit footguns

* shorter

* notes
2025-12-15 10:47:44 -05:00
George Hotz
a657a4e0f4 add Q4_K GGUF quantization support (#13700)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 10:17:56 -05:00
wozeparrot
7ef7ce2856 tk reg local store (#13689) 2025-12-14 23:07:30 -08:00
George Hotz
572ca80046 fast tinygrad.apps.llm (#13685)
* llm: add --benchmark support

* fix speed

* debug logging

* fix test attention
2025-12-14 21:05:21 -05:00
chenyu
ed962786d6 use assign in Tensor.backward (#13674)
preserve the grad object so that jit works
2025-12-13 22:43:06 -05:00
George Hotz
55845f7de7 schedule: cache unbinds for consistent cache keys (#13664)
* schedule: cache unbinds for consistent cache keys

strip BIND values before computing cache key so different bound values
(e.g. KV cache positions) hit the same schedule cache entry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* spec: allow single-src BIND for schedule cache key normalization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add lessons learned to CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* more claude.md

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 17:27:42 -05:00