Commit Graph

4808 Commits

Author SHA1 Message Date
chenyu
cb7c76a3bd update test_fuzz_failure to not construct full UOp (#13960) 2026-01-01 15:09:58 -05:00
chenyu
51398edf9c fix indirect import (#13958)
also deleted old external tests
2026-01-01 14:22:45 -05:00
chenyu
8e416df438 simpler InvalidType [pr] (#13957)
simpler singleton pattern
2026-01-01 13:55:51 -05:00
chenyu
4d5c4d256d update tqdm for edge case (#13956)
1.00kit/s and not 1000it/s for value 999.5
2026-01-01 11:37:26 -05:00
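The edge case named in 4d5c4d256d can be illustrated with a minimal sketch. This is not the tinygrad `tqdm` code: `si_rate`, its prefix table, and its choice of decimals are hypothetical and only show the idea of checking the rounded text, rather than the raw value, before picking an SI prefix.

```python
def si_rate(x: float, unit: str = "it/s") -> str:
    """Format a rate with an SI prefix (hypothetical helper, not tinygrad's API)."""
    prefixes = ["", "k", "M", "G", "T"]
    for i, prefix in enumerate(prefixes):
        scaled = x / 1000 ** i
        # Fewer decimals as the scaled value grows, so the text stays short.
        decimals = 2 if scaled < 10 else 1 if scaled < 100 else 0
        text = f"{scaled:.{decimals}f}"
        # Decide on the rounded text: 999.5 rounds to "1000", which must be
        # promoted to the next prefix instead of being printed as-is.
        if float(text) < 1000 or i == len(prefixes) - 1:
            return f"{text}{prefix}{unit}"
    raise AssertionError("unreachable: the last prefix always returns")

print(si_rate(999.5))
print(si_rate(999.4))
```

Comparing the raw value against 1000 would keep 999.5 in the unprefixed bucket and then round it up while printing, which is how a string like `1000it/s` can appear.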
chenyu
ed222070f7 update xlog2 fp16 decomp to not use fp32 (#13955) 2026-01-01 11:18:29 -05:00
chenyu
c69470be52 fix test_symbolic_arange_sym_step (#13952) 2026-01-01 09:41:07 -05:00
chenyu
b91b46091c delete test_tensor_uop (#13951)
old test for shape tracker. also update tests that refer to shapetracker names
2026-01-01 09:25:05 -05:00
chenyu
17ef4af72c new ceildiv that fixed symbolic conv (#13944)
* new ceildiv that fixed symbolic conv

* smaller test case
2026-01-01 09:02:41 -05:00
haofei
526fd4ec71 Fix SVD rank‑1 Jacobi rotation when tau == 0 (#13945) 2026-01-01 00:30:18 -05:00
haofei
20777f30b9 Fix QR/SVD NaNs on zero/orthogonal inputs (#13943) 2025-12-31 23:40:09 -05:00
chenyu
52acadc160 consolidate IGNORE_OOB=0 tests (#13937)
add a new unit test file and add more cases
2025-12-31 15:24:20 -05:00
Christopher Milan
13973e4dea refactor image pitch (#13928) 2025-12-31 13:22:38 -05:00
George Hotz
b998a80b5d assembly/amd: split generated stuff into enum/ins (#13924) 2025-12-31 10:10:52 -05:00
nimlgen
25440f0f72 all2all (#13902)
* all2all

* um

* fix

* x

* um

* simpler

* mypy

* fix

* t

* cmnts
2025-12-31 16:38:32 +03:00
George Hotz
0221b96761 assembly/amd: fix all ops tests (#13910)
* assembly/amd: fix all ops tests

* test_ops with smaller sizes

* ds store/load 2addr
2025-12-30 18:01:34 -05:00
George Hotz
efc99d0c55 assembly/amd: more refactors (#13907)
* assembly/amd: more refactors

* more refactors

* more refactors

* simpler emu

* generate.py

* regen all

* cleanups

* more

* work

* more readme

* lil
2025-12-30 16:13:24 -05:00
George Hotz
04c79505ec no subnormal bf16 (#13905) 2025-12-30 13:02:53 -05:00
chenyu
ab58926b00 update sampling in test_float_cast_to_unsigned (#13889)
filter is slow for small dtypes
2025-12-29 21:35:46 -05:00
George Hotz
81cf9ea0ab rename to extra.assembly.amd (#13879) 2025-12-29 14:10:55 -05:00
b1tg
63a1bb8507 multi custom kernel: support input mixed with copy and shard (#13748) 2025-12-29 12:54:27 -05:00
chenyu
0a98fd38b3 fix tests that failed locally on mac (#13872)
keccak output was silently broken without contiguous
2025-12-29 11:23:38 -05:00
Clément Verrier
0e409ff5ce fix indentation in UOp pretty_print for repeated references (#13857)
* fix correct indentation in UOp pretty_print for repeated references

When a UOp was referenced multiple times, the walrus operator notation
(e.g., x0:=) was correctly used for the first occurrence, but subsequent
references had misaligned indentation due to an extra space character.

Fix indentation misalignment in pretty_print() when UOps are referenced
multiple times.

* add simple unit tests for UOp repr

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-12-29 10:46:16 -05:00
George Hotz
25ef866e89 write python emulator from RDNA3 pseudocode in pdf (#13841)
* write python emulator from RDNA3 pseudocode in pdf

* emu2

* more emu

* working

* more pseudo

* progress

* cleanups

* delete junk

* delete stale files

* just emu

* work

* emu compare

* bemu

* cleanups and more failures

* revert bench emu

* fix emu cmp

* four tests fail

* bugfixes

* dsl

* ext

* refactor

* dsl

* div scale fix

* test_emu

* fix emu tests

* pcode

* test pcode

* top imports

* fix test_emu to use run_asm

* emu tests on real hardware

* more tests

* more emu tests

* more

* work

* work

* bug fix

* bugfixes

* fix fp16 gemm

* all ops tests pass in emulator

* fix llvm tests

* fix a few more tests

* fix mockgpu timeout
2025-12-29 07:39:53 -05:00
nimlgen
c6769badc2 mockgpu: async support (#13868)
* mockgpu: async support

* cpu
2025-12-29 13:18:37 +03:00
chenyu
784b919f7f Revert "optim empty shard #13513 (#13598)" (#13855)
* Revert "optim empty shard #13513 (#13598)"

This reverts commit 76d465dbc3.

* test_arange_shrink

* update test
2025-12-27 21:10:23 -05:00
anu
9b4de8abc7 fix beam in python 3.14+ (#13836)
* fix beam search on python 3.14

* add PickleableCount class to helpers

* change name, add test, add step

* tidy count init
2025-12-27 16:24:22 -05:00
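As background for 9b4de8abc7: Python 3.14 removed copy, deepcopy, and pickle support from `itertools` objects such as `itertools.count`. The log does not say that this was the cause of the beam failure, so treat that link as an assumption. The sketch below shows one way a picklable counter could look. `PicklableCounter` is a hypothetical name; the commit mentions a `PickleableCount` class that was later renamed, and its actual code is not shown here.

```python
import pickle

class PicklableCounter:
    """Minimal counter that survives pickling (hypothetical stand-in for itertools.count)."""
    def __init__(self, start: int = 0, step: int = 1):
        self.n, self.step = start, step

    def __iter__(self):
        return self

    def __next__(self) -> int:
        # Return the current value, then advance by step.
        cur = self.n
        self.n += self.step
        return cur

counter = PicklableCounter()
next(counter), next(counter)  # consume 0 and 1
# Plain instance attributes are pickled by default, so the state round-trips.
restored = pickle.loads(pickle.dumps(counter))
print(next(restored))
```

Keeping the state in ordinary attributes means no custom `__reduce__` is needed.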
Clément Verrier
ae013beab8 handle empty VECTORIZE in UOp.render() (#13847)
`UOp.render()` crashed with `IndexError: tuple index out of range` when
the UOp graph contained a `VECTORIZE` with empty `src=()`. This occurs
when reshaping to scalar shape `()`, e.g., `Tensor.ones(4).sum()`.

The bug was in the renderer's VECTORIZE pattern: `all_same(())` returns
`True` (vacuous truth), causing the code to access `x.src[0]` on an
empty tuple.

- Fix `IndexError` when calling `UOp.render()` on graphs containing
  empty `VECTORIZE` nodes.
- Add test for empty `VECTORIZE` rendering.
2025-12-27 10:09:39 -05:00
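The vacuous-truth problem described in ae013beab8 can be reproduced in isolation. In this sketch, `all_same` is an assumed definition written to match the behaviour the commit describes, and `first_if_uniform` is a hypothetical stand-in for the renderer pattern. Neither is copied from tinygrad.

```python
def all_same(items) -> bool:
    # Assumed definition: with an empty sequence the generator yields nothing,
    # so items[0] is never evaluated and all() returns True.
    return all(x == items[0] for x in items)

def first_if_uniform(src: tuple):
    """Return the shared element if src is non-empty and uniform, else None."""
    # The explicit length check guards against the vacuous-truth case;
    # without it, src[0] raises IndexError on an empty tuple.
    if len(src) > 0 and all_same(src):
        return src[0]
    return None

print(all_same(()))           # True (vacuous truth)
print(first_if_uniform(()))   # None instead of an IndexError
```

The guard is only one possible fix; the log does not show how the actual patch handles the empty case.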
qazal
a2da61d096 use new style amd compiler in viz (#13848)
* working version, handcode gfx1100 arch

* get target from device properties

* lib in cfg test program spec
2025-12-27 23:59:30 +09:00
qazal
f6de9095a0 switch asm tests to dsl (#13840)
* switch asm tests to dsl

* labeled basic blocks also work

* indenting for basic blocks

* allow define from star import
2025-12-27 02:15:16 +09:00
George Hotz
9d94b8c6b2 python asm dsl in extra + python REMU (#13436)
* having fun with python asm dsl

* rdna3

* meh

* all in rdna3

* work

* more work

* work

* integration

* tests

* simpler

* simpler

* asm

* better

* simpler

* progress

* emu

* simpler

* emu

* tests

* types

* vopd

* cleanups

* work

* memory ranges

* add tracing

* refactors

* run_asm exit

* more readable

* compare to remu

* test gemm

* bug + stale

* more tests

* refactor

* tests fix

* more ins

* more instructions

* refactor

* faster

* match case

* match case

* simpler

* work

* tests

* run_asm

* work

* bug fixes

* more emu

* alu/emu

* refactor

* no pipeline emu yet

* alu direct

* fix

* bugfixes + new test

* fix exceptions in emulators

* update gen.py

* pylint

* no pdf

* improve bench_emu

* speedups

* cleanups

* more tests
2025-12-25 13:04:14 -05:00
chenyu
54af29dbdb trange can just be a function (#13827) 2025-12-24 23:57:10 -05:00
qazal
a1c1684b91 set .amdhsa_kernarg_size in asm test (#13826) 2025-12-25 13:08:14 +09:00
George Hotz
43c6e973d8 add optional compiler in Renderer (#13817)
* add optional compiler in Renderer [pr]

* fix

* late init

* remove precompiled

* cleanup
2025-12-23 17:58:46 -05:00
nimlgen
90b217896f am: xgmi p2p (#13811)
* system: use addr space

* am: xgmi

* fix

* ugh
2025-12-23 20:11:38 +03:00
George Hotz
6439a515be test fixups / speedups / var_vals refactor (#13812)
* no PYTHONPATH + llm server port 0

* llm tok speedup

* refactor var_vals
2025-12-23 12:05:59 -05:00
George Hotz
8dcba2e2cc no full_rewrite [pr] (#13809)
* no full_rewrite [pr]

* fix

* fix docs
2025-12-22 23:20:01 -05:00
George Hotz
2af2b4da5d Revert "rewrites for renderer and compiler (#13646)" (#13806)
This reverts commit 339dadf056.
2025-12-22 19:21:33 -05:00
George Hotz
339dadf056 rewrites for renderer and compiler (#13646)
* rewrites for renderer and compiler

* full_rewrite_to_program

* fix pre-commit

* compiler passed into get_program

* no pkl compiler

* lib on program spec

* fix spec

* fix test

* no device

* compiler_device

* nm

* fix nir

* fix

* simplest

* fix tests

* revert
2025-12-22 18:58:43 -05:00
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
qazal
389f01c7f4 viz: amdgpu assembly basic block graph (#13755) 2025-12-22 23:17:16 +08:00
George Hotz
df0f9d6860 add olmoe support to llm (#13792)
* add olmoe support to llm

* cleanups

* simpler

* clean

* fix mypy

* lil

* remove dumb assert
2025-12-22 10:41:35 -04:00
chenyu
5cb827f7bf clean up can_lossless_cast and add missing pairs [p] (#13793) 2025-12-21 12:18:33 -05:00
George Hotz
75a6a03664 add qwen3 moe support to tinygrad.apps.llm (#13775)
* qwen moe works

* simple moe

* one test

* integration
2025-12-21 12:36:02 -04:00
qazal
dc660c9fc0 remove stale / untested viz related files (#13785) 2025-12-21 16:42:48 +08:00
George Hotz
59c02dd87f does this fix the dtype test? (#13779)
* does this fix the dtype test?

* simpler
2025-12-20 17:31:46 -04:00
chenyu
733ef0452c update test_uop_resolve (#13777)
plain @unittest.expectedFailure is too broad
2025-12-20 12:40:59 -05:00
George Hotz
45c459848d remove more stale stuff (#13765)
* remove more stale stuff

* remove disassemblers/adreno

* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
Christopher Milan
97103831c5 Revert "remove image from BufferSpec (#13636)" (#13761)
This reverts commit 2571a1eb47.
2025-12-19 13:54:36 -05:00
Christopher Milan
2571a1eb47 remove image from BufferSpec (#13636)
* remove image from BufferSpec

* cl tiny_gemm (64) works

* mypy

* padding

* openpilot CL

* reshape properly

* remove extra qcom checks

* pad output

* mypy

* update compile test

* move undo

* TestImageCopy valid images

* TestImageRealization valid images

* TestImageDType valid images

* cleanups

* test_renderer_failures

* ruff

* mypy

* simplify ops_qcom

* bump step time
2025-12-19 13:41:20 -05:00