Commit Graph

1310 Commits

Author SHA1 Message Date
wozeparrot
a65e958be9 llama: new apply_grad (#15503) 2026-03-26 19:39:25 -07:00
Christopher Milan
bc180a963c deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target

* add test

* update actions to use DEV

* update docs

* update readmes

* tests need that too

* update example

* update tests (comments)

* fix that test

* ruff

* mypy

* oops

* remove getenvs

* don't add Target yet

* and the test

* lint

* and docs

* more stuff

* assert

* few more fixes

* test assert
2026-03-26 03:48:03 -04:00
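As an aside on the commit above: a minimal sketch of what the new-style device selection might look like from user code, assuming tinygrad reads a `DEV` environment variable as the title "deprecate <dev>=1 in favor of DEV=<dev>" suggests. The chosen device name and the read-at-import behavior are assumptions for illustration, not taken from the PR itself.

```python
# Hedged sketch of the device selection described in #15467.
# Old style (deprecated per the commit title):  AMD=1 python3 train.py
# New style:                                    DEV=AMD python3 train.py
import os
os.environ["DEV"] = "CPU"  # assumption: must be set before tinygrad reads its env vars

from tinygrad import Tensor, Device
print(Device.DEFAULT)                      # expected to reflect the DEV setting
print((Tensor([1.0, 2.0]) * 2).tolist())   # runs on the selected device
```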
wozeparrot
1ca178f379 llama: stochastic rounding (#15456) 2026-03-25 18:16:31 -07:00
qazal
1b3d00d6ac viz/cli: remove --offset and --limit flags (#15439)
* work

* also no more no-color

* reorder

* update llama

* sqtt readme

* itertools

* rm that

* signals back
2026-03-25 09:52:27 +09:00
wozeparrot
da2031266a llama: correct 8b init (#15397) 2026-03-24 13:41:41 -07:00
nimlgen
2da008ae3b jit: rm replan (#15433) 2026-03-23 19:31:51 +08:00
Pham Nguyen Hung
c89576921d Updated the APIs of mnist_gan (#15429)
Co-authored-by: Hung Pham <pnhung1703@gmail.com>
2026-03-23 17:04:00 +08:00
qazal
c7b18e6108 viz: sqtt printer in viz/cli.py (#15411)
* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long
2026-03-23 00:17:05 +09:00
qazal
2363bceb47 viz: no context enters in cli, update llama profile (#15404) 2026-03-22 05:47:02 +09:00
wozeparrot
87c4ec1724 llama: use flat llama (#15353) 2026-03-19 22:12:38 -07:00
George Hotz
4091d37e8e flat llama step work (#15355)
* flat llama step work

* fp8 support

* blacklisted matmul

* chestertons fence
2026-03-20 09:06:12 +08:00
wozeparrot
f6687d1ffc feat: sd seed0 update (#15354) 2026-03-18 18:42:00 -07:00
George Hotz
5524916e39 llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly

* apply grads

* fix multi issue

* multi BUFFER_VIEW support

* simpler

* skip the flaky test
2026-03-18 19:54:40 +08:00
George Hotz
6e196195d8 add test for flat llama (#15327)
* add test for flat llama

* simpler

* back to split w1/w3

* env

* still too much ram

* invalid
2026-03-18 15:16:33 +08:00
George Hotz
2605840ee2 flat llama (#15324)
* FlatTransformer

* works

* pass in buffer views

* print stuff

* print

* bugfixes
2026-03-17 19:39:55 +08:00
George Hotz
9d95321be3 set allow_implicit=False by default (#15319)
* set allow_implicit=False by default

* modernize beautiful mnist
2026-03-17 17:14:38 +08:00
wozeparrot
a191ac0566 llama: use mlperf model (#15257) 2026-03-13 08:08:32 -07:00
wozeparrot
749162bd2f llama memory tweaks (#15223) 2026-03-12 12:36:23 -07:00
wozeparrot
4fab320abe llama: clean (#15224) 2026-03-11 13:33:59 -07:00
wozeparrot
05d6d9120a llama offload null (#15222) 2026-03-11 10:04:31 -07:00
wozeparrot
525a178966 llama: jit more (#15199) 2026-03-10 11:04:59 +08:00
wozeparrot
4544da1c54 llama3 fixes part3 (#15152) 2026-03-05 01:17:54 -08:00
wozeparrot
0c769289eb llama3: more scripts (#15107) 2026-03-04 22:18:03 -08:00
Christopher Milan
592f9bf6c6 set OPENPILOT_HACKS=1 to enable replace assign (#15123) 2026-03-04 05:26:04 -05:00
Christopher Milan
de043226ba benchmark comma usbgpu driving_vision step and load time (#15103)
Co-authored-by: Comma Device <device@comma.ai>
2026-03-03 06:08:03 -05:00
wozeparrot
92c16810ac feat: per device mem_used (#15100) 2026-03-03 01:31:28 -08:00
wozeparrot
824ba4386a llama3 dp fix (#15098) 2026-03-02 22:43:07 -08:00
qazal
f7aeff6061 viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print

* sys.exit

* equal check

* cleanup unpacker

* cli doesn't need PYTHONPATH

* no semicolons

* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
wozeparrot
a4f6365929 llama3: fstep takes grads (#15069) 2026-03-01 20:05:07 -08:00
wozeparrot
cfc5cf65ad llama3: vocab padding fix + jit copies on fakedata (#15067) 2026-02-28 08:44:55 -08:00
George Hotz
bb84e389cf functions for llama trainer (#15045)
* functions for llama trainer

* function there

* axis match

* fix multi

* lil cleaner

* there's a bug with HK_FLASH_ATTENTION

* training functions

* for commit
2026-02-28 12:15:18 +08:00
Nick
af94bfc401 fix retinanet shared memory race condition in parallel tests (#15030)
Append PID to shared memory names in batch_load_retinanet to prevent
FileExistsError when pytest-xdist runs multiple test workers that each
call _setup_shared_mem with the same hardcoded name.
2026-02-27 08:36:24 +08:00
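The fix described above uses a standard pattern: make the shared-memory segment name unique per process so parallel test workers cannot collide on creation. Below is a minimal sketch of that idea with Python's multiprocessing.shared_memory; the helper name _setup_shared_mem comes from the commit message, but the base name, size, and surrounding code are assumptions, not the actual batch_load_retinanet implementation.

```python
# Sketch only: per-process shared-memory naming to avoid FileExistsError
# when several pytest-xdist workers create a segment "with the same name".
import os
from multiprocessing import shared_memory

def _setup_shared_mem(base_name: str, size: int) -> shared_memory.SharedMemory:
  # Appending the PID makes the name unique per worker process, so two
  # workers no longer race on the same segment with create=True.
  name = f"{base_name}_{os.getpid()}"
  return shared_memory.SharedMemory(name=name, create=True, size=size)

if __name__ == "__main__":
  shm = _setup_shared_mem("retinanet_batch", 1024)  # hypothetical base name/size
  try:
    shm.buf[:4] = b"demo"
    print(shm.name)  # e.g. retinanet_batch_12345
  finally:
    shm.close()
    shm.unlink()
```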
wozeparrot
d941dd5aeb llama3: pad vocab when mp sharding (#14998) 2026-02-25 00:04:06 -08:00
wozeparrot
e1c9985715 llama3: better time keeping (#14999) 2026-02-24 22:42:05 -08:00
wozeparrot
8d9545e09e llama3: correctly shard wqkv (#14978) 2026-02-23 23:57:10 -08:00
wozeparrot
a36a26d4ed llama3: optim does grad acc in correct order (#14965) 2026-02-23 22:25:13 -08:00
wozeparrot
3cda781876 llama optim offload (#14901) 2026-02-21 08:53:45 -08:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
George Hotz
fc5677c28b resnet dataloader + more test cleanups (#14899)
* resnet dataloader

* tests
2026-02-20 10:05:47 +08:00
chenyu
f84a11bb9f delete uneven shard tests and mentions (#14867) 2026-02-18 14:10:33 -05:00
wozeparrot
6d301ad2c4 feat: llama wqkv (#14841) 2026-02-17 23:01:33 -08:00
wozeparrot
95e97ec341 separate llama optim (#14810) 2026-02-17 13:02:35 -08:00
wozeparrot
45aebe1572 hipkittens fa backward (#14723) 2026-02-16 00:38:44 -08:00
wozeparrot
4b5d3bda1f llama3: data seed (#14681) 2026-02-11 19:04:40 -08:00
wozeparrot
a60220bed9 llama3: move dl to numpy & jit more (#14677)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00
wozeparrot
69574542ab fix: use correct fa implementation in eval (#14651) 2026-02-09 18:20:44 -08:00
qazal
50d3f6cea5 EVAL_BS=0 in llama profile (#14643) 2026-02-10 00:49:43 +09:00
nimlgen
e087c58ae0 print tables in llama/profile.sh (#14639) 2026-02-09 12:32:54 +03:00
qazal
b7e3fbe07e llama: add VIZ=-1 to dev_run (#14583)
* llama: add VIZ=-1 to dev_run

* readme

* cleaner

* add profile.sh script

* better grouping of options

* add other row

* readme edits

* work
2026-02-06 22:59:22 +09:00
chenyu
d57d24c7d4 Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535)
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
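Purely as an illustration of what the rename above conveys (a buffer that exposes its raw bytes as a memoryview), here is a standalone sketch; this is not tinygrad's real Buffer class, and its fields and constructor are assumptions.

```python
# Standalone sketch: the method name mirrors the commit, nothing else does.
class Buffer:
  def __init__(self, data: bytes):
    self._raw = bytearray(data)

  def as_memoryview(self) -> memoryview:
    # the new name states the return type plainly: a memoryview over the raw bytes
    return memoryview(self._raw)

buf = Buffer(bytes([1, 2, 3, 4]))
print(buf.as_memoryview().tolist())  # [1, 2, 3, 4]
```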