wozeparrot
480ad264a4
llama: per device amax (#15735)
2026-04-14 19:01:17 -07:00
wozeparrot
457508d5a0
llama: save more 2 (#15681)
2026-04-11 01:03:36 -07:00
wozeparrot
590464c8d8
llama: only support wqkv path + cleanups (#15680)
* llama: only support wqkv path + cleanups
* llama: missing transpose
2026-04-11 07:39:27 +08:00
wozeparrot
55bcd7cc9e
llama amax outside (#15670)
2026-04-09 23:08:03 -07:00
qazal
39a029ec55
remove ASM_GEMM context var (#15645)
2026-04-08 18:02:40 +09:00
wozeparrot
70dbd35023
llama: move custom_kernel into flat_llama (#15643)
2026-04-08 00:19:14 -07:00
wozeparrot
7e54992bf6
fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
wozeparrot
a65e958be9
llama: new apply_grad (#15503)
2026-03-26 19:39:25 -07:00
Christopher Milan
bc180a963c
deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
2026-03-26 03:48:03 -04:00
wozeparrot
da2031266a
llama: correct 8b init (#15397)
2026-03-24 13:41:41 -07:00
wozeparrot
87c4ec1724
llama: use flat llama (#15353)
2026-03-19 22:12:38 -07:00
George Hotz
4091d37e8e
flat llama step work (#15355)
* flat llama step work
* fp8 support
* blacklisted matmul
* chestertons fence
2026-03-20 09:06:12 +08:00
George Hotz
5524916e39
llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly
* apply grads
* fix multi issue
* multi BUFFER_VIEW support
* simpler
* skip the flaky test
2026-03-18 19:54:40 +08:00
George Hotz
6e196195d8
add test for flat llama (#15327)
* add test for flat llama
* simpler
* back to split w1/w3
* env
* still too much ram
* invalid
2026-03-18 15:16:33 +08:00
George Hotz
2605840ee2
flat llama (#15324)
* FlatTransformer
* works
* pass in buffer views
* print stuff
* print
* bugfixes
2026-03-17 19:39:55 +08:00
wozeparrot
a191ac0566
llama: use mlperf model (#15257)
2026-03-13 08:08:32 -07:00