11401 Commits

Author SHA1 Message Date
wozeparrot
99e667bdcd tk fa bwd (#13480) 2025-12-17 23:56:37 -08:00
George Hotz
aeb7516c8a tests passing on tinybox h3 (#13742) 2025-12-17 19:04:34 -04:00
chenyu
7cd7593c5d add script to train bert on mi350x (#13743)
adapted from mi300 config
2025-12-17 16:54:04 -05:00
George Hotz
22f3e7f995 better precommit coverage and faster (#13740)
* improve pre-commit hook speed and coverage

* remove a few

* lose that
2025-12-17 13:25:55 -04:00
George Hotz
bc78cf1197 filter warnings for nicer test output (#13739) 2025-12-17 13:25:27 -04:00
George Hotz
b013244c38 fix local tests for AMD_LLVM (#13738)
* fix local tests for AMD_LLVM

* fix linters

* skip that for now

* fix segfault
2025-12-17 12:23:46 -04:00
nimlgen
7081014c73 am_smi: mi300 (#13737)
* am_smi: mi300

* smi

* remo
2025-12-17 17:56:01 +03:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
George Hotz
9015a22523 make tests faster (#13734) 2025-12-17 09:39:44 -04:00
nimlgen
3eecb4f123 am: mi350 support (#13733) 2025-12-17 14:57:21 +03:00
wozeparrot
5151a341b3 tk: small changes from fa bwd (#13732) 2025-12-16 22:44:36 -08:00
chenyu
fda73c8180 support LAMB param offload (#13730)
also added Tensor.shard_like
2025-12-16 19:56:30 -05:00
George Hotz
cf0c28d5ae all tests pass on strix halo (#13728) 2025-12-16 19:35:50 -04:00
Christopher Milan
af1d938a50 DLL: search wsl lib folder (#13727) 2025-12-16 18:27:09 -05:00
George Hotz
0fb645cc4c move some methods to mixins (#13725)
* move some methods to mixins

* a few more

* math trunc
2025-12-16 19:20:04 -04:00
Christopher Milan
c6ba016da6 fix cuda check (#13726) 2025-12-16 18:00:09 -05:00
George Hotz
ee45669d14 pre extract afters + sched cleanups (#13720)
* pre extract afters + sched cleanups

* claude.md lesson

* tests for schedule cache

* Revert "tests for schedule cache"

This reverts commit fb3f2e800a.
2025-12-16 16:14:30 -04:00
George Hotz
4b741e893f remove REMOTE=1 (#13722)
* remove REMOTE=1

* leave ibverbs
2025-12-16 15:58:10 -04:00
George Hotz
4d8d821f56 create schedule before the cache (#13717)
* create schedule before the cache

* move create_schedule

* simpler

* simpler

* simpler
2025-12-16 14:15:31 -04:00
George Hotz
bfe374c7f5 support symbolic shapes in split/chunk when split dim is concrete (#13718)
* support symbolic shapes in split/chunk when split dim is concrete

Previously split() and chunk() required all dimensions to be concrete.
Now they only require the dimension being split to be concrete, allowing
them to work with tensors that have symbolic shapes in other dimensions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* update CLAUDE.md: add pre-commit and no-amend rules

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix dim resolution order in split/chunk

Ensure dim_sz is retrieved after dim is resolved, not before.
The previous one-liner evaluated self.shape[dim] with the original
unresolved dim value.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-16 13:55:06 -04:00
chenyu
e428fbfab6 verify dtype of llama model params (#13719) 2025-12-16 12:32:02 -05:00
George Hotz
e5a66ace80 multi custom kernel support (#13716)
* multi custom kernel support

* custom kernel xfrom

* works

* no SPEC=2 on ck

* panic

* touchups
2025-12-16 11:36:30 -04:00
nimlgen
5778722979 am: restore queues (#13714)
* am: restore queues

* l

* cmnt
2025-12-16 15:21:42 +03:00
chenyu
041e9a41c9 add contiguous in BertIntermediate (#13713)
faster step with a lot less recomputation
2025-12-15 22:37:36 -05:00
George Hotz
7589c897b2 split usbgpu tests into their own benchmark [pr] (#13711) 2025-12-15 21:42:40 -04:00
qazal
6bafd90248 remove unused process replay input [pr] (#13712) 2025-12-16 09:29:35 +08:00
George Hotz
321ab943b2 qwen model is working (#13690)
* qwen model is mostly working

* add Q4_K quantization support to GGUF parser, add qwen3:1.7b model

- Add Q4_K (type 12) dequantization in nn/state.py
- Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0)
- Make bos_token_id optional for models like Qwen3 that don't have it
- Fix line length issues and add preset parameter to SimpleTokenizer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* smaller diff

* test dequant

* half split

* better

* simple tok

* mock token

* polish

* better

* fix

* replace

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 18:00:34 -04:00
George Hotz
d43e4c7553 llm args + lil html page (#13710)
* update llm args

* lil html page

* lil

* line size

* qol
2025-12-15 17:09:31 -04:00
George Hotz
ee4a7ee12f rope half-split (#13706)
* rope half

* nicer

* this

* rearrange
2025-12-15 15:31:11 -04:00
Christopher Milan
2359e88f0c wrap cdll redo (#13705)
* wrap CDLL with custom findlib

* lint

* regen

* fix

* mypy

* hardcode libc on macos

* fix frameworks

* fix webgpu win

* remove supports

* regen metal

* regen libclang

* regen

* simpler

* regen

* regen

* find nvrtc

* fix

* regen

* fix

* typo

* regen

* split

* rsplit one

* typo

* try load DLL

* string error
2025-12-15 13:15:02 -05:00
wozeparrot
5d509499b2 tk: kernel finish groups stores (#13704) 2025-12-15 09:16:17 -08:00
George Hotz
54a22aa298 add test for jit footguns (#13701)
* add test for jit footguns

* shorter

* notes
2025-12-15 10:47:44 -05:00
George Hotz
fd49bb512d download cache by job (#13703) 2025-12-15 10:47:17 -05:00
George Hotz
a657a4e0f4 add Q4_K GGUF quantization support (#13700)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 10:17:56 -05:00
nimlgen
615dcab767 am: minimal mi300 boot (#13679)
* nbio7_9

* psp

* gmc

* gfx

* sdma

* ih

* linter

* linter

* minor

* finish

* add missing

* do not allow warm boot for now
2025-12-15 15:55:03 +03:00
qazal
72e006cd59 fast VIZ=2 startup (#13682) 2025-12-15 19:16:43 +08:00
qazal
50d34428bd fix viz endstream (#13687) 2025-12-15 16:54:18 +08:00
wozeparrot
7ef7ce2856 tk reg local store (#13689) 2025-12-14 23:07:30 -08:00
George Hotz
572ca80046 fast tinygrad.apps.llm (#13685)
* llm: add --benchmark support

* fix speed

* debug logging

* fix test attention
2025-12-14 21:05:21 -05:00
chenyu
6cad622f59 don't FREE_INTERMEDIATE in bert (#13684)
hangs green hcq consistently after an hour of training
2025-12-14 14:27:42 -05:00
chenyu
871ab8415f some onnx cleanups (#13683) 2025-12-14 13:58:54 -05:00
nimlgen
75832ce4f6 am: psp with no autoload (#13681) 2025-12-14 20:20:09 +03:00
nimlgen
8bcb1038e4 am: nbio 7.9.0 (#13680) 2025-12-14 18:35:29 +03:00
George Hotz
013240938b llm: add --benchmark support (#13678) 2025-12-14 08:35:05 -05:00
Robbe Derks
cddbdaf5e1 usbgpu: patch: auto-detect controller PID/VID (#13645)
* auto-detect controller

* fix lint?

* needs ''

* just try
2025-12-14 00:54:51 -05:00
George Hotz
d7fb5d9b62 speedups: early return from simplify (#13665)
* early return from simplify

* pm_rewrite

* more speed

* remove again

* early return from simplify

* ugh
2025-12-14 00:51:28 -05:00
George Hotz
bcbf832399 add chrism 2025-12-14 00:45:57 -05:00
chenyu
ed962786d6 use assign in Tensor.backward (#13674)
preserve the grad object so that jit works
2025-12-13 22:43:06 -05:00
chenyu
721a379c41 Revert "autogen: use wrapped CDLL with custom findlib (#13666)" (#13675)
This reverts commit f6cc3b13b9.
2025-12-13 22:42:41 -05:00
nimlgen
6402dcf940 am: xccs queue creation (#13672) 2025-12-13 18:37:09 +03:00