Commit Graph

1310 Commits

Author SHA1 Message Date
wozeparrot
a65e958be9 llama: new apply_grad (#15503) 2026-03-26 19:39:25 -07:00
Christopher Milan
bc180a963c deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target

* add test

* update actions to use DEV

* update docs

* update readmes

* tests need that too

* update example

* update tests (comments)

* fix that test

* ruff

* mypy

* oops

* remove getenvs

* don't add Target yet

* and the test

* lint

* and docs

* more stuff

* assert

* few more fixes

* test assert
2026-03-26 03:48:03 -04:00
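As an aside on the commit above: a minimal sketch of what the new-style device selection might look like from user code, assuming tinygrad reads a `DEV` environment variable as the title "deprecate <dev>=1 in favor of DEV=<dev>" suggests. The chosen device name and the read-at-import behavior are assumptions for illustration, not taken from the PR itself.

```python
# Hedged sketch of the device selection described in #15467.
# Old style (deprecated per the commit title):  AMD=1 python3 train.py
# New style:                                    DEV=AMD python3 train.py
import os
os.environ["DEV"] = "CPU"  # assumption: must be set before tinygrad reads its env vars

from tinygrad import Tensor, Device
print(Device.DEFAULT)                      # expected to reflect the DEV setting
print((Tensor([1.0, 2.0]) * 2).tolist())   # runs on the selected device
```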
wozeparrot
1ca178f379 llama: stochastic rounding (#15456) 2026-03-25 18:16:31 -07:00
qazal
1b3d00d6ac viz/cli: remove --offset and --limit flags (#15439)
* work

* also no more no-color

* reorder

* update llama

* sqtt readme

* itertools

* rm that

* signals back
2026-03-25 09:52:27 +09:00
wozeparrot
da2031266a llama: correct 8b init (#15397) 2026-03-24 13:41:41 -07:00
nimlgen
2da008ae3b jit: rm replan (#15433) 2026-03-23 19:31:51 +08:00
Pham Nguyen Hung
c89576921d Updated the APIs of mnist_gan (#15429)
Co-authored-by: Hung Pham <pnhung1703@gmail.com>
2026-03-23 17:04:00 +08:00
qazal
c7b18e6108 viz: sqtt printer in viz/cli.py (#15411)
* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long
2026-03-23 00:17:05 +09:00
qazal
2363bceb47 viz: no context enters in cli, update llama profile (#15404) 2026-03-22 05:47:02 +09:00
wozeparrot
87c4ec1724 llama: use flat llama (#15353) 2026-03-19 22:12:38 -07:00
George Hotz
4091d37e8e flat llama step work (#15355)
* flat llama step work

* fp8 support

* blacklisted matmul

* chestertons fence
2026-03-20 09:06:12 +08:00
wozeparrot
f6687d1ffc feat: sd seed0 update (#15354) 2026-03-18 18:42:00 -07:00
George Hotz
5524916e39 llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly

* apply grads

* fix multi issue

* multi BUFFER_VIEW support

* simpler

* skip the flaky test
2026-03-18 19:54:40 +08:00
George Hotz
6e196195d8 add test for flat llama (#15327)
* add test for flat llama

* simpler

* back to split w1/w3

* env

* still too much ram

* invalid
2026-03-18 15:16:33 +08:00
George Hotz
2605840ee2 flat llama (#15324)
* FlatTransformer

* works

* pass in buffer views

* print stuff

* print

* bugfixes
2026-03-17 19:39:55 +08:00
George Hotz
9d95321be3 set allow_implicit=False by default (#15319)
* set allow_implicit=False by default

* modernize beautiful mnist
2026-03-17 17:14:38 +08:00
wozeparrot
a191ac0566 llama: use mlperf model (#15257) 2026-03-13 08:08:32 -07:00
wozeparrot
749162bd2f llama memory tweaks (#15223) 2026-03-12 12:36:23 -07:00
wozeparrot
4fab320abe llama: clean (#15224) 2026-03-11 13:33:59 -07:00
wozeparrot
05d6d9120a llama offload null (#15222) 2026-03-11 10:04:31 -07:00
wozeparrot
525a178966 llama: jit more (#15199) 2026-03-10 11:04:59 +08:00
wozeparrot
4544da1c54 llama3 fixes part3 (#15152) 2026-03-05 01:17:54 -08:00
wozeparrot
0c769289eb llama3: more scripts (#15107) 2026-03-04 22:18:03 -08:00
Christopher Milan
592f9bf6c6 set OPENPILOT_HACKS=1 to enable replace assign (#15123) 2026-03-04 05:26:04 -05:00
Christopher Milan
de043226ba benchmark comma usbgpu driving_vision step and load time (#15103)
Co-authored-by: Comma Device <device@comma.ai>
2026-03-03 06:08:03 -05:00
wozeparrot
92c16810ac feat: per device mem_used (#15100) 2026-03-03 01:31:28 -08:00
wozeparrot
824ba4386a llama3 dp fix (#15098) 2026-03-02 22:43:07 -08:00
qazal
f7aeff6061 viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print

* sys.exit

* equal check

* cleanup unpacker

* cli doesn't need PYTHONPATH

* no semicolons

* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
wozeparrot
a4f6365929 llama3: fstep takes grads (#15069) 2026-03-01 20:05:07 -08:00
wozeparrot
cfc5cf65ad llama3: vocab padding fix + jit copies on fakedata (#15067) 2026-02-28 08:44:55 -08:00
George Hotz
bb84e389cf functions for llama trainer (#15045)
* functions for llama trainer

* function there

* axis match

* fix multi

* lil cleaner

* there's a bug with HK_FLASH_ATTENTION

* training functions

* for commit
2026-02-28 12:15:18 +08:00
Nick
af94bfc401 fix retinanet shared memory race condition in parallel tests (#15030)
Append PID to shared memory names in batch_load_retinanet to prevent
FileExistsError when pytest-xdist runs multiple test workers that each
call _setup_shared_mem with the same hardcoded name.
2026-02-27 08:36:24 +08:00
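The fix described above uses a standard pattern: make the shared-memory segment name unique per process so parallel test workers cannot collide on creation. Below is a minimal sketch of that idea with Python's multiprocessing.shared_memory; the helper name _setup_shared_mem comes from the commit message, but the base name, size, and surrounding code are assumptions, not the actual batch_load_retinanet implementation.

```python
# Sketch only: per-process shared-memory naming to avoid FileExistsError
# when several pytest-xdist workers create a segment "with the same name".
import os
from multiprocessing import shared_memory

def _setup_shared_mem(base_name: str, size: int) -> shared_memory.SharedMemory:
  # Appending the PID makes the name unique per worker process, so two
  # workers no longer race on the same segment with create=True.
  name = f"{base_name}_{os.getpid()}"
  return shared_memory.SharedMemory(name=name, create=True, size=size)

if __name__ == "__main__":
  shm = _setup_shared_mem("retinanet_batch", 1024)  # hypothetical base name/size
  try:
    shm.buf[:4] = b"demo"
    print(shm.name)  # e.g. retinanet_batch_12345
  finally:
    shm.close()
    shm.unlink()
```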
wozeparrot
d941dd5aeb llama3: pad vocab when mp sharding (#14998) 2026-02-25 00:04:06 -08:00
wozeparrot
e1c9985715 llama3: better time keeping (#14999) 2026-02-24 22:42:05 -08:00
wozeparrot
8d9545e09e llama3: correctly shard wqkv (#14978) 2026-02-23 23:57:10 -08:00
wozeparrot
a36a26d4ed llama3: optim does grad acc in correct order (#14965) 2026-02-23 22:25:13 -08:00
wozeparrot
3cda781876 llama optim offload (#14901) 2026-02-21 08:53:45 -08:00
George Hotz
55d3a5def9 preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
George Hotz
fc5677c28b resnet dataloader + more test cleanups (#14899)
* resnet dataloader

* tests
2026-02-20 10:05:47 +08:00
chenyu
f84a11bb9f delete uneven shard tests and mentions (#14867) 2026-02-18 14:10:33 -05:00
wozeparrot
6d301ad2c4 feat: llama wqkv (#14841) 2026-02-17 23:01:33 -08:00
wozeparrot
95e97ec341 separate llama optim (#14810) 2026-02-17 13:02:35 -08:00
wozeparrot
45aebe1572 hipkittens fa backward (#14723) 2026-02-16 00:38:44 -08:00
wozeparrot
4b5d3bda1f llama3: data seed (#14681) 2026-02-11 19:04:40 -08:00
wozeparrot
a60220bed9 llama3: move dl to numpy & jit more (#14677)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00
wozeparrot
69574542ab fix: use correct fa implementation in eval (#14651) 2026-02-09 18:20:44 -08:00
qazal
50d3f6cea5 EVAL_BS=0 in llama profile (#14643) 2026-02-10 00:49:43 +09:00
nimlgen
e087c58ae0 print tables in llama/profile.sh (#14639) 2026-02-09 12:32:54 +03:00
qazal
b7e3fbe07e llama: add VIZ=-1 to dev_run (#14583)
* llama: add VIZ=-1 to dev_run

* readme

* cleaner

* add profile.sh script

* better grouping of options

* add other row

* readme edits

* work
2026-02-06 22:59:22 +09:00
chenyu
d57d24c7d4 Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535)
it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data
2026-02-04 11:31:11 -05:00
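Purely as an illustration of what the rename above conveys (a buffer that exposes its raw bytes as a memoryview), here is a standalone sketch; this is not tinygrad's real Buffer class, and its fields and constructor are assumptions.

```python
# Standalone sketch: the method name mirrors the commit, nothing else does.
class Buffer:
  def __init__(self, data: bytes):
    self._raw = bytearray(data)

  def as_memoryview(self) -> memoryview:
    # the new name states the return type plainly: a memoryview over the raw bytes
    return memoryview(self._raw)

buf = Buffer(bytes([1, 2, 3, 4]))
print(buf.as_memoryview().tolist())  # [1, 2, 3, 4]
```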