11449 Commits

Author SHA1 Message Date
George Hotz
8dcba2e2cc no full_rewrite [pr] (#13809)
* no full_rewrite [pr]

* fix

* fix docs
2025-12-22 23:20:01 -05:00
George Hotz
edce2303f4 rewrite to program (#13808) 2025-12-22 20:03:33 -05:00
George Hotz
2af2b4da5d Revert "rewrites for renderer and compiler (#13646)" (#13806)
This reverts commit 339dadf056.
2025-12-22 19:21:33 -05:00
George Hotz
339dadf056 rewrites for renderer and compiler (#13646)
* rewrites for renderer and compiler

* full_rewrite_to_program

* fix pre-commit

* compiler passed into get_program

* no pkl compiler

* lib on program spec

* fix spec

* fix test

* no device

* compiler_device

* nm

* fix nir

* fix

* simplest

* fix tests

* revert
2025-12-22 18:58:43 -05:00
Daniel Xu
4edaaf19e5 Handle tied embeddings for llama 3.2 1B (#13796)
Previously the output.weight layer would not be loaded, and would only
contain randomly initialized values. This led to junk when doing a
forward pass.

Signed-off-by: Daniel Xu <daniel@thinkingmachines.ai>
2025-12-22 16:31:40 -05:00
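The failure mode described in this commit message can be illustrated with a minimal sketch. With tied embeddings the checkpoint stores only the embedding matrix, so a loader has to point the output projection at that same tensor instead of leaving it randomly initialized. The key names and the `resolve_tied_embeddings` helper below are hypothetical and are not taken from the tinygrad code.

```python
# Hypothetical key names; the real checkpoint layout may differ.
EMBED_KEY = "tok_embeddings.weight"
OUTPUT_KEY = "output.weight"

def resolve_tied_embeddings(state_dict: dict) -> dict:
    """Return a copy in which the output projection falls back to the embedding matrix."""
    resolved = dict(state_dict)
    if OUTPUT_KEY not in resolved and EMBED_KEY in resolved:
        # Tied embeddings: both layers share one tensor, so reuse the same object.
        resolved[OUTPUT_KEY] = resolved[EMBED_KEY]
    return resolved

checkpoint = {EMBED_KEY: [[0.1, 0.2], [0.3, 0.4]]}  # no output.weight stored
weights = resolve_tied_embeddings(checkpoint)
```

Reusing the same object rather than copying it keeps the two layers tied if the weights are updated later.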
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
qazal
b31373ca70 remove llvm-mca stuff from viz (#13802) 2025-12-23 01:41:51 +08:00
chenyu
27d899ce97 TRAIN=0 to only eval llama (#13804) 2025-12-22 11:55:46 -05:00
chenyu
39d962106f update llama logging (#13803)
```
REWRITE_STACK_LIMIT=1000000 SMALL=1 BASEDIR=/raid/datasets/c4-8b SAMPLES=1000 BS=8 DP=8 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=8B SEQLEN=1024 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py

    1 93.44 s run, 11.8750 loss, 0.000000000001 LR, 642.43 GB used,  19644.30 GFLOPS
    2 101.78 s run, 11.8750 loss, 0.000000000001 LR, 1454.57 GB used,  17039.35 GFLOPS
    3 7.34 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 236258.78 GFLOPS
    4 4.32 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 401488.40 GFLOPS
    5 4.36 s run, 11.9375 loss, 0.000000000003 LR, 1454.57 GB used, 398116.13 GFLOPS
    6 4.32 s run, 11.8750 loss, 0.000000000003 LR, 1454.57 GB used, 401878.60 GFLOPS
    7 4.34 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 399822.57 GFLOPS
    8 4.35 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 398512.24 GFLOPS
    9 4.36 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 397832.61 GFLOPS
   10 4.40 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 394520.83 GFLOPS
```
2025-12-22 11:28:29 -05:00
qazal
389f01c7f4 viz: amdgpu assembly basic block graph (#13755) 2025-12-22 23:17:16 +08:00
George Hotz
df0f9d6860 add olmoe support to llm (#13792)
* add olmoe support to llm

* cleanups

* simpler

* clean

* fix mypy

* lil

* remove dumb assert
2025-12-22 10:41:35 -04:00
qazal
81d9053013 roc: cast to nullptr instead of changing header (#13801) 2025-12-22 22:34:06 +08:00
nimlgen
d299d30f2c am_smi: fix with new autogen (#13800) 2025-12-22 16:53:26 +03:00
nimlgen
f6bda6ae4e am: continue from saved state (#13799)
* am: gfx queue cont

* f

* reset

* f

* l
2025-12-22 15:55:07 +03:00
qazal
6237bd86f6 sqtt/pmc viz improvements (#13797) 2025-12-22 18:16:35 +09:00
Sitananda Prasad
3000b8d762 symbolic: add x ^ x -> 0 folding pattern (#13794) 2025-12-21 21:47:28 -04:00
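A toy illustration of the `x ^ x -> 0` folding rule, assuming expressions are nested tuples of the form `("xor", lhs, rhs)`. This representation and the `fold_xor` helper are invented for the sketch and are unrelated to tinygrad's actual pattern matcher.

```python
def fold_xor(expr):
    """Rewrite ("xor", a, b) to the constant 0 when both operands are the same expression."""
    if isinstance(expr, tuple) and expr[0] == "xor":
        # Fold the operands first so nested matches are caught as well.
        lhs, rhs = fold_xor(expr[1]), fold_xor(expr[2])
        if lhs == rhs:
            return 0
        return ("xor", lhs, rhs)
    return expr  # leaves (variable names, constants) are returned unchanged

# The integer identity the rule relies on, checked exhaustively for 8-bit values.
assert all(x ^ x == 0 for x in range(256))

folded = fold_xor(("xor", ("xor", "a", "b"), ("xor", "a", "b")))
unchanged = fold_xor(("xor", "a", "b"))
```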
chenyu
5cb827f7bf clean up can_lossless_cast and add missing pairs [p] (#13793) 2025-12-21 12:18:33 -05:00
George Hotz
75a6a03664 add qwen3 moe support to tinygrad.apps.llm (#13775)
* qwen moe works

* simple moe

* one test

* integration
2025-12-21 12:36:02 -04:00
chenyu
29ef0809bb can_safe_cast -> can_lossless_cast (#13789)
safe cast in numpy only means the result won't overflow, so lossless is more precise
2025-12-21 11:29:19 -05:00
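The distinction drawn in this commit message can be seen with plain Python integers and floats. NumPy's "safe" casting rule accepts int64 to float64, yet a float64 has only 53 bits of mantissa, so large integers are rounded even though nothing overflows. The helper name below is hypothetical and only demonstrates the round-trip check.

```python
def int_to_float64_is_lossless(value: int) -> bool:
    """True when value survives a round trip through a 64-bit float unchanged."""
    return int(float(value)) == value

small = 2**53        # exactly representable in float64
large = 2**53 + 1    # needs 54 significant bits, so the conversion rounds it

small_ok = int_to_float64_is_lossless(small)
large_ok = int_to_float64_is_lossless(large)
```

Both values fit comfortably in an int64, so the conversion never overflows; only the second one loses information.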
chenyu
ed1fd7023b use getattr in dtype.truncate [pr] (#13788) 2025-12-21 11:05:43 -05:00
qazal
9839838fdd viz UOp layout cleanup (#13787)
* use the same names in server and client

* first layout args, then renderer args
2025-12-21 22:11:40 +08:00
nimlgen
e523971028 am: make mqd contig (#13786) 2025-12-21 17:00:33 +03:00
qazal
09e060eab5 simplify viz node labels (#13784) 2025-12-21 16:45:06 +08:00
qazal
dc660c9fc0 remove stale / untested viz related files (#13785) 2025-12-21 16:42:48 +08:00
George Hotz
59c02dd87f does this fix the dtype test? (#13779)
* does this fix the dtype test?

* simpler
2025-12-20 17:31:46 -04:00
George Hotz
5228f7bd06 hotfix: opencode should not reformat files 2025-12-20 15:55:29 -04:00
chenyu
733ef0452c update test_uop_resolve (#13777)
plain @unittest.expectedFailure is too broad
2025-12-20 12:40:59 -05:00
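Why a plain `@unittest.expectedFailure` is broad: it marks the test as an expected failure for any failure or exception, including ones unrelated to the behavior being tracked. One narrower alternative, which is not necessarily what this commit does, is to assert the specific exception. The `resolve` function below is a hypothetical stand-in for the code under test.

```python
import unittest

def resolve(x):
    # Hypothetical stand-in for code under test that is known to raise.
    raise ValueError("cannot resolve")

class TestNarrowFailure(unittest.TestCase):
    def test_resolve_raises(self):
        # Only a ValueError from resolve() passes; an unrelated exception
        # fails the test, unlike under @unittest.expectedFailure.
        with self.assertRaises(ValueError):
            resolve(1)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNarrowFailure)
result = unittest.TestResult()
suite.run(result)
```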
nimlgen
3db2104fb8 am: timeout sos start (#13776) 2025-12-20 17:41:33 +03:00
qazal
94f97f6988 generic viz cleanups from the basic blocks branch (#13774)
* simpler codeblock highlight

* simpler append

* status enum
2025-12-20 18:18:03 +08:00
George Hotz
a987a8ed44 add neg VIZ support to not start server (#13772) 2025-12-20 00:36:38 -04:00
qazal
b7c2f0dd1b remove stale extra/sched directory (#13770) 2025-12-20 11:57:30 +08:00
George Hotz
86cd1e9e81 remove UPatAny for typing fix [pr] (#13766)
* remove UPatAny for typing fix [pr]

* fix dtype
2025-12-19 17:41:18 -04:00
George Hotz
4702da41d5 hotfix: mkdir for extra/disassemblers 2025-12-19 17:18:37 -04:00
George Hotz
45c459848d remove more stale stuff (#13765)
* remove more stale stuff

* remove disassemblers/adreno

* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
df6cde8a00 cleanup stale examples/extra (#13764)
* cleanup stale files

* examples

* move those back

* old

* delete more
2025-12-19 16:27:37 -04:00
chenyu
80b84f5267 ruff lint tinykitten (#13762)
deleted unused imports and double spaces. a few ignores to avoid changing the real code
2025-12-19 14:31:00 -05:00
Christopher Milan
97103831c5 Revert "remove image from BufferSpec (#13636)" (#13761)
This reverts commit 2571a1eb47.
2025-12-19 13:54:36 -05:00
Christopher Milan
2571a1eb47 remove image from BufferSpec (#13636)
* remove image from BufferSpec

* cl tiny_gemm (64) works

* mypy

* padding

* openpilot CL

* reshape properly

* remove extra qcom checks

* pad output

* mypy

* update compile test

* move undo

* TestImageCopy valid images

* TestImageRealization valid images

* TestImageDType valid images

* cleanups

* test_renderer_failures

* ruff

* mypy

* simplify ops_qcom

* bump step time
2025-12-19 13:41:20 -05:00
chenyu
185a000882 gradient of COPY (#13760) 2025-12-19 13:33:59 -05:00
nimlgen
57fe4d0a59 am: no_update_ptr for master (#13757) 2025-12-19 19:37:37 +03:00
chenyu
7fcd3cf991 hotfix SPEC for AFTER(CONTIGUOUS) (#13752)
fixed spec error in `PYTHONPATH="." REWRITE_STACK_LIMIT=5000000 NULL=1 DEFAULT_FLOAT="HALF" BERT_LAYERS=2 BENCHMARK=10  BS=128 GPUS=1 MODEL=bert python3 examples/mlperf/model_train.py`
2025-12-19 10:05:45 -04:00
qazal
81b5815a66 viz: minimal data to render a graph (#13754) 2025-12-19 16:19:28 +08:00
Christopher Milan
849e46da21 DLL: _PATH variables can be parent dir (#13753) 2025-12-19 00:28:02 -05:00
qazal
159c0e92fa viz: infrastructure for basic block graphs (#13751) 2025-12-19 13:08:19 +08:00
George Hotz
fa40df972f fix tests for NV (#13744)
* small fix

* min diff

* bfloat16 out
2025-12-18 13:20:21 -04:00
nimlgen
77191fb744 hive_reset for mi350 (#13746) 2025-12-18 12:02:28 +03:00
nimlgen
ceff388f3d am: extend va space (#13745) 2025-12-18 11:20:43 +03:00
wozeparrot
99e667bdcd tk fa bwd (#13480) 2025-12-17 23:56:37 -08:00
George Hotz
aeb7516c8a tests passing on tinybox h3 (#13742) 2025-12-17 19:04:34 -04:00