George Hotz
1ae6528bb6
move schedule into schedule (#15736)
* move schedule into schedule
* callify to root
* sched docs
2026-04-15 11:03:25 +08:00
wozeparrot
3721c60bef
llama: bs 16 (#15737)
2026-04-14 19:52:03 -07:00
wozeparrot
480ad264a4
llama: per device amax (#15735)
2026-04-14 19:01:17 -07:00
chenyu
3394d18066
size*itemsize -> nbytes (#15729)
and some UOp.size removal to prep for size to mixin change
2026-04-14 16:27:54 -04:00
wozeparrot
2b8d303f75
allreduce in precast dtype (#15689)
2026-04-13 20:24:12 -07:00
qazal
054d78e6ff
fix llama profile.sh NULL source (#15685)
2026-04-11 22:56:05 +09:00
wozeparrot
457508d5a0
llama: save more 2 (#15681)
2026-04-11 01:03:36 -07:00
wozeparrot
590464c8d8
llama: only support wqkv path + cleanups (#15680)
* llama: only support wqkv path + cleanups
* llama: missing transpose
2026-04-11 07:39:27 +08:00
wozeparrot
55bcd7cc9e
llama amax outside (#15670)
2026-04-09 23:08:03 -07:00
chenyu
839d37b7bc
update median_step_time in model_train.py (#15649)
BENCHMARK=5 used to pick the 4th largest, not the middle one
2026-04-08 09:53:59 -04:00
qazal
39a029ec55
remove ASM_GEMM context var (#15645)
2026-04-08 18:02:40 +09:00
wozeparrot
70dbd35023
llama: move custom_kernel into flat_llama (#15643)
2026-04-08 00:19:14 -07:00
qazal
890286e8d6
update llama profile.sh (#15633)
* update llama profile.sh
* BENCHMARK 5
2026-04-08 03:18:45 +09:00
wozeparrot
810d7c00cd
llama: unify scripts (#15628)
2026-04-06 20:28:08 -07:00
wozeparrot
7e54992bf6
fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
qazal
f7aed180e4
viz/cli: add Other row in profiler (#15600)
2026-04-04 22:40:53 +09:00
wozeparrot
5b2a3251c4
mlperf system json for mi350 (#15575)
2026-04-01 15:30:33 -07:00
qazal
09f60d80fd
llama: fix FP8=1 FAKEDATA=1 (#15564)
2026-04-01 20:53:03 +09:00
wozeparrot
8b5b9a0e90
llama: run_and_time (#15533)
2026-03-31 15:46:16 -07:00
qazal
8feb8edc68
gemm/asm: add fp8 support to cdna asm_gemm (#15542)
* work
* hmm, mixins
* rhs_transposed
* also fix the dtype
* check for hipcc
* Exception
* select dev
* default
2026-03-31 19:32:54 +09:00
Christopher Milan
adbfd82d1d
DEV is ContextVar, setting Device.DEFAULT is deprecated (#15508)
2026-03-30 17:10:49 -04:00
wozeparrot
0c3e438229
llama: mllog (#15502)
2026-03-28 11:18:25 -07:00
wozeparrot
a65e958be9
llama: new apply_grad (#15503)
2026-03-26 19:39:25 -07:00
Christopher Milan
bc180a963c
deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
2026-03-26 03:48:03 -04:00
wozeparrot
1ca178f379
llama: stochastic rounding (#15456)
2026-03-25 18:16:31 -07:00
qazal
1b3d00d6ac
viz/cli: remove --offset and --limit flags (#15439)
* work
* also no more no-color
* reorder
* update llama
* sqtt readme
* itertools
* rm that
* signals back
2026-03-25 09:52:27 +09:00
wozeparrot
da2031266a
llama: correct 8b init (#15397)
2026-03-24 13:41:41 -07:00
nimlgen
2da008ae3b
jit: rm replan (#15433)
2026-03-23 19:31:51 +08:00
Pham Nguyen Hung
c89576921d
Updated the APIs of mnist_gan (#15429)
Co-authored-by: pnhung1703@gmail.com <Hung Pham>
2026-03-23 17:04:00 +08:00
qazal
c7b18e6108
viz: sqtt printer in viz/cli.py (#15411)
* work
* sqtt timeline in CLI
* format all printers nicely
* s/Showed/Printed
* ansistrip
* sys.exit
* keep colors in list
* work from amd_copy_matmul
* has_more always gets returned
* linter
* don't print colors
* more colors
* wow this is so deep
* work
* minor details
* selected
* improve progress bar
* remove it
* 22, global_load_vaddr is so long
2026-03-23 00:17:05 +09:00
qazal
2363bceb47
viz: no context enters in cli, update llama profile (#15404)
2026-03-22 05:47:02 +09:00
wozeparrot
87c4ec1724
llama: use flat llama (#15353)
2026-03-19 22:12:38 -07:00
George Hotz
4091d37e8e
flat llama step work (#15355)
* flat llama step work
* fp8 support
* blacklisted matmul
* chestertons fence
2026-03-20 09:06:12 +08:00
wozeparrot
f6687d1ffc
feat: sd seed0 update (#15354)
2026-03-18 18:42:00 -07:00
George Hotz
5524916e39
llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly
* apply grads
* fix multi issue
* multi BUFFER_VIEW support
* simpler
* skip the flaky test
2026-03-18 19:54:40 +08:00
George Hotz
6e196195d8
add test for flat llama (#15327)
* add test for flat llama
* simpler
* back to split w1/w3
* env
* still too much ram
* invalid
2026-03-18 15:16:33 +08:00
George Hotz
2605840ee2
flat llama (#15324)
* FlatTransformer
* works
* pass in buffer views
* print stuff
* print
* bugfixes
2026-03-17 19:39:55 +08:00
George Hotz
9d95321be3
set allow_implicit=False by default (#15319)
* set allow_implicit=False by default
* modernize beautiful mnist
2026-03-17 17:14:38 +08:00
wozeparrot
a191ac0566
llama: use mlperf model (#15257)
2026-03-13 08:08:32 -07:00
wozeparrot
749162bd2f
llama memory tweaks (#15223)
2026-03-12 12:36:23 -07:00
wozeparrot
4fab320abe
llama: clean (#15224)
2026-03-11 13:33:59 -07:00
wozeparrot
05d6d9120a
llama offload null (#15222)
2026-03-11 10:04:31 -07:00
wozeparrot
525a178966
llama: jit more (#15199)
2026-03-10 11:04:59 +08:00
wozeparrot
4544da1c54
llama3 fixes part3 (#15152)
2026-03-05 01:17:54 -08:00
wozeparrot
0c769289eb
llama3: more scripts (#15107)
2026-03-04 22:18:03 -08:00
Christopher Milan
592f9bf6c6
set OPENPILOT_HACKS=1 to enable replace assign (#15123)
2026-03-04 05:26:04 -05:00
Christopher Milan
de043226ba
benchmark comma usbgpu driving_vision step and load time (#15103)
Co-authored-by: Comma Device <device@comma.ai>
2026-03-03 06:08:03 -05:00
wozeparrot
92c16810ac
feat: per device mem_used (#15100)
2026-03-03 01:31:28 -08:00
wozeparrot
824ba4386a
llama3 dp fix (#15098)
2026-03-02 22:43:07 -08:00
qazal
f7aeff6061
viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print
* sys.exit
* equal check
* cleanup unpacker
* cli doesn't need PYTHONPATH
* no semicolons
* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00