tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-07 22:23:55 -05:00

Author	SHA1	Message	Date
George Hotz	8dcba2e2cc	no full_rewrite [pr] (#13809 ) * no full_rewrite [pr] * fix * fix docs	2025-12-22 23:20:01 -05:00
George Hotz	edce2303f4	rewrite to program (#13808 )	2025-12-22 20:03:33 -05:00
George Hotz	2af2b4da5d	Revert "rewrites for renderer and compiler (#13646 )" (#13806 ) This reverts commit `339dadf056`.	2025-12-22 19:21:33 -05:00
George Hotz	339dadf056	rewrites for renderer and compiler (#13646 ) * rewrites for renderer and compiler * full_rewrite_to_program * fix pre-commit * compiler passed into get_program * no pkl compiler * lib on program spec * fix spec * fix test * no device * compiler_device * nm * fix nir * fix * simplest * fix tests * revert	2025-12-22 18:58:43 -05:00
Daniel Xu	4edaaf19e5	Handle tied embeddings for llama 3.2 1B (#13796 ) Previously the output.weight layer would not be loaded, and would only contain randomly initialized values. This led to junk when doing a forward pass. Signed-off-by: Daniel Xu <daniel@thinkingmachines.ai>	2025-12-22 16:31:40 -05:00
chenyu	7f1d41c9f9	delete files that import ShapeTracker (#13805 )	2025-12-22 15:54:18 -05:00
qazal	b31373ca70	remove llvm-mca stuff from viz (#13802 )	2025-12-23 01:41:51 +08:00
chenyu	27d899ce97	TRAIN=0 to only eval llama (#13804 )	2025-12-22 11:55:46 -05:00
chenyu	39d962106f	update llama logging (#13803 )	2025-12-22 11:28:29 -05:00
```
REWRITE_STACK_LIMIT=1000000 SMALL=1 BASEDIR=/raid/datasets/c4-8b SAMPLES=1000 BS=8 DP=8 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=8B SEQLEN=1024 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py
1 93.44 s run, 11.8750 loss, 0.000000000001 LR, 642.43 GB used, 19644.30 GFLOPS
2 101.78 s run, 11.8750 loss, 0.000000000001 LR, 1454.57 GB used, 17039.35 GFLOPS
3 7.34 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 236258.78 GFLOPS
4 4.32 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 401488.40 GFLOPS
5 4.36 s run, 11.9375 loss, 0.000000000003 LR, 1454.57 GB used, 398116.13 GFLOPS
6 4.32 s run, 11.8750 loss, 0.000000000003 LR, 1454.57 GB used, 401878.60 GFLOPS
7 4.34 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 399822.57 GFLOPS
8 4.35 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 398512.24 GFLOPS
9 4.36 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 397832.61 GFLOPS
10 4.40 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 394520.83 GFLOPS
```
qazal	389f01c7f4	viz: amdgpu assembly basic block graph (#13755 )	2025-12-22 23:17:16 +08:00
George Hotz	df0f9d6860	add olmoe support to llm (#13792 ) * add olmoe support to llm * cleanups * simpler * clean * fix mypy * lil * remove dumb assert	2025-12-22 10:41:35 -04:00
qazal	81d9053013	roc: cast to nullptr instead of changing header (#13801 )	2025-12-22 22:34:06 +08:00
nimlgen	d299d30f2c	am_smi: fix with new autogen (#13800 )	2025-12-22 16:53:26 +03:00
nimlgen	f6bda6ae4e	am: continue from saved state (#13799 ) * am: gfx queue cont * f * reset * f * l	2025-12-22 15:55:07 +03:00
qazal	6237bd86f6	sqtt/pmc viz improvements (#13797 )	2025-12-22 18:16:35 +09:00
Sitananda Prasad	3000b8d762	symbolic: add x ^ x -> 0 folding pattern (#13794 )	2025-12-21 21:47:28 -04:00
chenyu	5cb827f7bf	clean up can_lossless_cast and add missing pairs [p] (#13793 )	2025-12-21 12:18:33 -05:00
George Hotz	75a6a03664	add qwen3 moe support to tinygrad.apps.llm (#13775 ) * qwen moe works * simple moe * one test * integration	2025-12-21 12:36:02 -04:00
chenyu	29ef0809bb	can_safe_cast -> can_lossless_cast (#13789 ) safe cast in numpy only means the result won't overflow, so lossless is more precise	2025-12-21 11:29:19 -05:00
chenyu	ed1fd7023b	use getattr in dtype.truncate [pr] (#13788 )	2025-12-21 11:05:43 -05:00
qazal	9839838fdd	viz UOp layout cleanup (#13787 ) * use the same names in server and client * first layout args, then renderer args	2025-12-21 22:11:40 +08:00
nimlgen	e523971028	am: make mqd contig (#13786 )	2025-12-21 17:00:33 +03:00
qazal	09e060eab5	simplify viz node labels (#13784 )	2025-12-21 16:45:06 +08:00
qazal	dc660c9fc0	remove stale / untested viz related files (#13785 )	2025-12-21 16:42:48 +08:00
George Hotz	59c02dd87f	does this fix the dtype test? (#13779 ) * does this fix the dtype test? * simpler	2025-12-20 17:31:46 -04:00
George Hotz	5228f7bd06	hotfix: opencode should not reformat files	2025-12-20 15:55:29 -04:00
chenyu	733ef0452c	update test_uop_resolve (#13777 ) plain @unittest.expectedFailure is too broad	2025-12-20 12:40:59 -05:00
nimlgen	3db2104fb8	am: timeout sos start (#13776 )	2025-12-20 17:41:33 +03:00
qazal	94f97f6988	generic viz cleanups from the basic blocks branch (#13774 ) * simpler codeblock highlight * simpler append * status enum	2025-12-20 18:18:03 +08:00
George Hotz	a987a8ed44	add neg VIZ support to not start server (#13772 )	2025-12-20 00:36:38 -04:00
qazal	b7c2f0dd1b	remove stale extra/sched directory (#13770 )	2025-12-20 11:57:30 +08:00
George Hotz	86cd1e9e81	remove UPatAny for typing fix [pr] (#13766 ) * remove UPatAny for typing fix [pr] * fix dtype	2025-12-19 17:41:18 -04:00
George Hotz	4702da41d5	hotfix: mkdir for extra/disassemblers	2025-12-19 17:18:37 -04:00
George Hotz	45c459848d	remove more stale stuff (#13765 ) * remove more stale stuff * remove disassemblers/adreno * stale	2025-12-19 17:14:56 -04:00
George Hotz	744af193f0	remove ScheduleItem and merge it with ExecItem (#13759 ) * remove ExecItem and merge it with ScheduleItem * less diff * fix issues * min diff * don't change bufs in _lower * min diff * update * revert * fixes * diff	2025-12-19 17:04:24 -04:00
George Hotz	df6cde8a00	cleanup stale examples/extra (#13764 ) * cleanup stale files * examples * move those back * old * delete more	2025-12-19 16:27:37 -04:00
chenyu	80b84f5267	ruff lint tinykitten (#13762 ) deleted used import and double spaces. a few ignore to not change the real code	2025-12-19 14:31:00 -05:00
Christopher Milan	97103831c5	Revert "remove image from BufferSpec (#13636 )" (#13761 ) This reverts commit `2571a1eb47`.	2025-12-19 13:54:36 -05:00
Christopher Milan	2571a1eb47	remove image from BufferSpec (#13636 ) * remove image from BufferSpec * cl tiny_gemm (64) works * mypy * padding * openpilot CL * reshape properly * remove extra qcom checks * pad output * mypy * update compile test * move undo * TestImageCopy valid images * TestImageRealization valid images * TestImageDType valid images * cleanups * test_renderer_failures * ruff * mypy * simplify ops_qcom * bump step time	2025-12-19 13:41:20 -05:00
chenyu	185a000882	gradient of COPY (#13760 )	2025-12-19 13:33:59 -05:00
nimlgen	57fe4d0a59	am: no_update_ptr for master (#13757 )	2025-12-19 19:37:37 +03:00
chenyu	7fcd3cf991	hotfix SPEC for AFTER(CONTIGUOUS) (#13752 ) fixed spec error in `PYTHONPATH="." REWRITE_STACK_LIMIT=5000000 NULL=1 DEFAULT_FLOAT="HALF" BERT_LAYERS=2 BENCHMARK=10 BS=128 GPUS=1 MODEL=bert python3 examples/mlperf/model_train.py`	2025-12-19 10:05:45 -04:00
qazal	81b5815a66	viz: minimal data to render a graph (#13754 )	2025-12-19 16:19:28 +08:00
Christopher Milan	849e46da21	DLL: _PATH variables can be parent dir (#13753 )	2025-12-19 00:28:02 -05:00
qazal	159c0e92fa	viz: infrastructure for basic block graphs (#13751 )	2025-12-19 13:08:19 +08:00
George Hotz	fa40df972f	fix tests for NV (#13744 ) * small fix * min diff * bfloat16 out	2025-12-18 13:20:21 -04:00
nimlgen	77191fb744	hive_reset for mi350 (#13746 )	2025-12-18 12:02:28 +03:00
nimlgen	ceff388f3d	am: extend va space (#13745 )	2025-12-18 11:20:43 +03:00
wozeparrot	99e667bdcd	tk fa bwd (#13480 )	2025-12-17 23:56:37 -08:00
George Hotz	aeb7516c8a	tests passing on tinybox h3 (#13742 )	2025-12-17 19:04:34 -04:00
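
Commit 3000b8d762 in the list above adds an `x ^ x -> 0` folding pattern to the symbolic simplifier. The sketch below illustrates that rewrite on a tiny expression tree. The `Var`, `Const`, and `Xor` classes and the `fold_xor` function are hypothetical names invented for this illustration; they are not tinygrad's UOp or pattern-matcher API, and the real change may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Xor:
    lhs: object
    rhs: object


def fold_xor(expr):
    """Rewrite x ^ x to the constant 0, working bottom-up."""
    if isinstance(expr, Xor):
        lhs, rhs = fold_xor(expr.lhs), fold_xor(expr.rhs)
        # Frozen dataclasses compare structurally, so identical
        # subexpressions are detected with ==.
        if lhs == rhs:
            return Const(0)
        return Xor(lhs, rhs)
    return expr


a, b = Var("a"), Var("b")
print(fold_xor(Xor(a, a)))  # the whole expression folds to a constant
print(fold_xor(Xor(a, b)))  # different operands are left unchanged
```

Folding bottom-up means a nested expression such as `(a ^ a) ^ (b ^ b)` first becomes `0 ^ 0`, which then matches the same rule again.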


11449 Commits
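
Commit 29ef0809bb above renames `can_safe_cast` to `can_lossless_cast`, noting that a "safe" cast in numpy only means the result won't overflow. The distinction can be seen with int64 to float64: the conversion never overflows, yet not every value survives a round trip. The sketch below uses plain Python (whose `float` is an IEEE 754 double) rather than tinygrad or NumPy; `int_to_float64_is_lossless` is a hypothetical helper written for this illustration only.

```python
def int_to_float64_is_lossless(value: int) -> bool:
    """Return True if `value` survives an int -> float64 -> int round trip."""
    # Python's float is a 64-bit IEEE 754 double with a 53-bit significand.
    return int(float(value)) == value


exact = 2**53          # representable exactly as a float64
inexact = 2**53 + 1    # fits in int64, but needs 54 bits of precision

print(int_to_float64_is_lossless(exact))
print(int_to_float64_is_lossless(inexact))
print(float(inexact) == float(exact))  # both map to the same float64
```

Neither conversion overflows, but only the first one preserves the value, which is the property the name "lossless" describes.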