Francis Lata
fac137779e
remove flux1 seed image ( #13843 )
2025-12-27 00:45:11 -05:00
qazal
f6de9095a0
switch asm tests to dsl ( #13840 )
* switch asm tests to dsl
* labeled basic blocks also work
* indenting for basic blocks
* allow define from star import
2025-12-27 02:15:16 +09:00
chenyu
ba922094f2
remove redundant check in disk_supports_fast_copyout ( #13838 )
2025-12-26 11:30:55 -05:00
George Hotz
e9f2aaba2a
simplify rdna3 asm ( #13835 )
* simplify rdna3 asm
* cleanups
* fix names
* fix tests
* fixes
* more test fixes
* type fixes
* tests pass + mypy passes
* 3.11 syntax
2025-12-26 11:21:03 -05:00
nimlgen
c44b4f9ae0
am: fix sdma warm boot ( #13837 )
2025-12-26 12:38:06 +03:00
George Hotz
c6937fa744
more work on RDNA3 asm ( #13833 )
* more llvm asm tests
* roundtrip test
* work
* more handwritten
* more handwritten
* work
* tests pass
* dual mov
* all tests pass
* all tests pass fast
2025-12-25 23:28:14 -05:00
George Hotz
f1111ac7de
move amd compilers to new style ( #13831 )
* move amd compilers to new style
* simplest diff
* AMDHIPrenderer
2025-12-25 13:42:24 -05:00
George Hotz
9d94b8c6b2
python asm dsl in extra + python REMU ( #13436 )
* having fun with python asm dsl
* rdna3
* meh
* all in rdna3
* work
* more work
* work
* integration
* tests
* simpler
* simpler
* asm
* better
* simpler
* progress
* emu
* simpler
* emu
* tests
* types
* vopd
* cleanups
* work
* memory ranges
* add tracing
* refactors
* run_asm exit
* more readable
* compare to remu
* test gemm
* bug + stale
* more tests
* refactor
* tests fix
* more ins
* more instructions
* refactor
* faster
* match case
* match case
* simpler
* work
* tests
* run_asm
* work
* bug fixes
* more emu
* alu/emu
* refactor
* no pipeline emu yet
* alu direct
* fix
* bugfixes + new test
* fix exceptions in emulators
* update gen.py
* pylint
* no pdf
* improve bench_emu
* speedups
* cleanups
* more tests
2025-12-25 13:04:14 -05:00
nimlgen
b5f3a5ad79
am: cleanup comment ( #13828 )
2025-12-25 18:00:28 +03:00
chenyu
8985a4a023
one less branch in Buffer.view [pr] ( #13829 )
2025-12-25 09:34:15 -05:00
chenyu
094753b4e0
renderer arch version cleanup [pr] ( #13830 )
2025-12-25 09:32:56 -05:00
chenyu
54af29dbdb
trange can just be a function ( #13827 )
2025-12-24 23:57:10 -05:00
qazal
a1c1684b91
set .amdhsa_kernarg_size in asm test ( #13826 )
2025-12-25 13:08:14 +09:00
chenyu
da1cb6a9ec
update llama dataloader ( #13825 )
separate creating the dataset from iterating over it, so eval data is not recreated for each eval
2025-12-24 17:42:08 -05:00
chenyu
a7fc0c288b
clean up BufferCopy init [pr] ( #13824 )
2025-12-24 10:40:15 -05:00
chenyu
903753c60c
llama wandb logging ( #13822 )
2025-12-24 10:24:59 -05:00
qazal
e3a646dce3
viz: skip plaintext disassemble for cfg ( #13821 )
2025-12-24 23:16:59 +09:00
chenyu
cb07c5d0e8
fewer import annotations ( #13819 )
2025-12-23 18:45:50 -05:00
George Hotz
43c6e973d8
add optional compiler in Renderer ( #13817 )
* add optional compiler in Renderer [pr]
* fix
* late init
* remove precompiled
* cleanup
2025-12-23 17:58:46 -05:00
George Hotz
8eab6175ee
get_program refactor ( #13816 )
* get_program refactor
* fix docs
* cleanup
2025-12-23 16:44:46 -05:00
George Hotz
3d3c5b2fb9
add device to program ( #13815 )
* add device to program
* from_uop
* from_uop no renderer
* simpler global_size
2025-12-23 16:15:33 -05:00
nimlgen
90b217896f
am: xgmi p2p ( #13811 )
* system: use addr space
* am: xgmi
* fix
* ugh
2025-12-23 20:11:38 +03:00
George Hotz
6439a515be
test fixups / speedups / var_vals refactor ( #13812 )
* no PYTHONPATH + llm server port 0
* llm tok speedup
* refactor var_vals
2025-12-23 12:05:59 -05:00
George Hotz
8dcba2e2cc
no full_rewrite [pr] ( #13809 )
* no full_rewrite [pr]
* fix
* fix docs
2025-12-22 23:20:01 -05:00
George Hotz
edce2303f4
rewrite to program ( #13808 )
2025-12-22 20:03:33 -05:00
George Hotz
2af2b4da5d
Revert "rewrites for renderer and compiler ( #13646 )" ( #13806 )
This reverts commit 339dadf056 .
2025-12-22 19:21:33 -05:00
George Hotz
339dadf056
rewrites for renderer and compiler ( #13646 )
* rewrites for renderer and compiler
* full_rewrite_to_program
* fix pre-commit
* compiler passed into get_program
* no pkl compiler
* lib on program spec
* fix spec
* fix test
* no device
* compiler_device
* nm
* fix nir
* fix
* simplest
* fix tests
* revert
2025-12-22 18:58:43 -05:00
Daniel Xu
4edaaf19e5
Handle tied embeddings for llama 3.2 1B ( #13796 )
Previously the output.weight layer would not be loaded, and would only
contain randomly initialized values. This led to junk when doing a
forward pass.
Signed-off-by: Daniel Xu <daniel@thinkingmachines.ai>
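The tied-embedding fix described above can be pictured with a small sketch. This is not the code from the commit: the helper `tie_output_weight`, the key name `tok_embeddings.weight`, and the plain-dict checkpoint are assumptions made for illustration. Only `output.weight` comes from the commit message.

```python
def tie_output_weight(state_dict, embed_key="tok_embeddings.weight", out_key="output.weight"):
    """Return a copy of state_dict in which a missing output projection reuses the embedding.

    Hypothetical helper: the key names are assumptions, not tinygrad's loader API.
    """
    tied = dict(state_dict)  # shallow copy, the caller's dict is left untouched
    if out_key not in tied:
        # Tied embeddings: the checkpoint stores the matrix once, so alias it
        # rather than leaving the output layer at its random initialization.
        tied[out_key] = tied[embed_key]
    return tied


# A toy checkpoint that, like a tied model, ships no separate output matrix.
checkpoint = {"tok_embeddings.weight": [[0.1, 0.2], [0.3, 0.4]]}
loaded = tie_output_weight(checkpoint)
```

Aliasing instead of copying keeps a single matrix in memory, which is the usual reason for tying the two layers.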
2025-12-22 16:31:40 -05:00
chenyu
7f1d41c9f9
delete files that import ShapeTracker ( #13805 )
2025-12-22 15:54:18 -05:00
qazal
b31373ca70
remove llvm-mca stuff from viz ( #13802 )
2025-12-23 01:41:51 +08:00
chenyu
27d899ce97
TRAIN=0 to only eval llama ( #13804 )
2025-12-22 11:55:46 -05:00
chenyu
39d962106f
update llama logging ( #13803 )
```
REWRITE_STACK_LIMIT=1000000 SMALL=1 BASEDIR=/raid/datasets/c4-8b SAMPLES=1000 BS=8 DP=8 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=8B SEQLEN=1024 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py
1 93.44 s run, 11.8750 loss, 0.000000000001 LR, 642.43 GB used, 19644.30 GFLOPS
2 101.78 s run, 11.8750 loss, 0.000000000001 LR, 1454.57 GB used, 17039.35 GFLOPS
3 7.34 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 236258.78 GFLOPS
4 4.32 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 401488.40 GFLOPS
5 4.36 s run, 11.9375 loss, 0.000000000003 LR, 1454.57 GB used, 398116.13 GFLOPS
6 4.32 s run, 11.8750 loss, 0.000000000003 LR, 1454.57 GB used, 401878.60 GFLOPS
7 4.34 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 399822.57 GFLOPS
8 4.35 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 398512.24 GFLOPS
9 4.36 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 397832.61 GFLOPS
10 4.40 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 394520.83 GFLOPS
```
2025-12-22 11:28:29 -05:00
qazal
389f01c7f4
viz: amdgpu assembly basic block graph ( #13755 )
2025-12-22 23:17:16 +08:00
George Hotz
df0f9d6860
add olmoe support to llm ( #13792 )
* add olmoe support to llm
* cleanups
* simpler
* clean
* fix mypy
* lil
* remove dumb assert
2025-12-22 10:41:35 -04:00
qazal
81d9053013
roc: cast to nullptr instead of changing header ( #13801 )
2025-12-22 22:34:06 +08:00
nimlgen
d299d30f2c
am_smi: fix with new autogen ( #13800 )
2025-12-22 16:53:26 +03:00
nimlgen
f6bda6ae4e
am: continue from saved state ( #13799 )
* am: gfx queue cont
* f
* reset
* f
* l
2025-12-22 15:55:07 +03:00
qazal
6237bd86f6
sqtt/pmc viz improvements ( #13797 )
2025-12-22 18:16:35 +09:00
Sitananda Prasad
3000b8d762
symbolic: add x ^ x -> 0 folding pattern ( #13794 )
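A rough illustration of the folding rule named in this title, written over a toy tuple-based expression tree. This is a sketch only: the `("op", lhs, rhs)` encoding and the function `fold_xor` are hypothetical and do not reflect how tinygrad's symbolic rewriter is written.

```python
def fold_xor(expr):
    """Rewrite ("xor", e, e) to 0 anywhere inside a nested tuple expression.

    Hypothetical encoding: leaves are names or ints, inner nodes are (op, *args).
    """
    if not isinstance(expr, tuple):
        return expr  # leaf: nothing to fold
    op, *args = expr
    args = [fold_xor(a) for a in args]  # fold children first (bottom-up)
    if op == "xor" and len(args) == 2 and args[0] == args[1]:
        return 0  # x ^ x is 0 for every integer x
    return (op, *args)


folded = fold_xor(("add", ("xor", "x", "x"), "y"))
```

The rule is sound because XOR of any integer with itself clears every bit, so the result does not depend on the value of `x`.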
2025-12-21 21:47:28 -04:00
chenyu
5cb827f7bf
clean up can_lossless_cast and add missing pairs [p] ( #13793 )
2025-12-21 12:18:33 -05:00
George Hotz
75a6a03664
add qwen3 moe support to tinygrad.apps.llm ( #13775 )
* qwen moe works
* simple moe
* one test
* integration
2025-12-21 12:36:02 -04:00
chenyu
29ef0809bb
can_safe_cast -> can_lossless_cast ( #13789 )
safe cast in numpy only means the result won't overflow, so lossless is more precise
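The distinction between "does not overflow" and "lossless" can be seen without NumPy. The sketch below assumes Python's `float` is an IEEE 754 64-bit double (true on common platforms); the helper name `is_lossless_int_to_float64` is made up for this example and is not tinygrad's `can_lossless_cast`.

```python
def is_lossless_int_to_float64(value):
    """True if the integer survives a round trip through a 64-bit float unchanged."""
    return int(float(value)) == value


# A double has a 53-bit significand, so 2**53 + 1 fits the range of float64
# (no overflow) but cannot be represented exactly.
big = 2**53 + 1
round_trip = int(float(big))
```

Here `round_trip` differs from `big` even though the conversion never overflows, which is why a cast that is "safe" in the overflow sense is not necessarily lossless.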
2025-12-21 11:29:19 -05:00
chenyu
ed1fd7023b
use getattr in dtype.truncate [pr] ( #13788 )
2025-12-21 11:05:43 -05:00
qazal
9839838fdd
viz UOp layout cleanup ( #13787 )
* use the same names in server and client
* first layout args, then renderer args
2025-12-21 22:11:40 +08:00
nimlgen
e523971028
am: make mqd contig ( #13786 )
2025-12-21 17:00:33 +03:00
qazal
09e060eab5
simplify viz node labels ( #13784 )
2025-12-21 16:45:06 +08:00
qazal
dc660c9fc0
remove stale / untested viz related files ( #13785 )
2025-12-21 16:42:48 +08:00
George Hotz
59c02dd87f
does this fix the dtype test? ( #13779 )
* does this fix the dtype test?
* simpler
2025-12-20 17:31:46 -04:00
George Hotz
5228f7bd06
hotfix: opencode should not reformat files
2025-12-20 15:55:29 -04:00
chenyu
733ef0452c
update test_uop_resolve ( #13777 )
plain @unittest.expectedFailure is too broad
2025-12-20 12:40:59 -05:00