11640 Commits

Author SHA1 Message Date
nimlgen
c6769badc2 mockgpu: async support (#13868)
* mockgpu: async support

* cpu
2025-12-29 13:18:37 +03:00
qazal
fc5278746f mi350x assembly gemm cleanups (#13867) 2025-12-29 18:47:23 +09:00
George Hotz
f07c39cfa4 hwtest fixes for rdna3 dsl (#13865) 2025-12-28 20:42:29 -05:00
George Hotz
d9603c1bee improve asm dsl syntax (#13864)
* improve asm dsl syntax

* improve asm dsl syntax
2025-12-28 20:04:59 -05:00
chenyu
f5090192c8 reorder AMD tensor core benchmark test (#13860)
* reorder AMD tensor core benchmark test

* disable that
2025-12-28 12:29:51 -05:00
qazal
066d96c397 print tflops in asm gemm test (#13859)
* print tflops in asm gemm test

* change order
2025-12-29 02:26:40 +09:00
chenyu
a03cd43e78 fix typing in compute_gradient (#13852) 2025-12-28 11:52:14 -05:00
chenyu
cba05acadf re-enable TYPED=1 import test (#13858) 2025-12-28 11:49:06 -05:00
qazal
2cfbabdc34 mi350x 1tflop bf16 gemm in extra (#13702) 2025-12-28 21:45:42 +09:00
qazal
2180eee5e4 use the asm dsl in remu hwtest.py (#13856)
* remu hw test with the asm dsl

* simpler

* nthreads and exec mask

* cmp/cmpx

* assembler error in s_mov_b32

* vopd in dsl?
2025-12-28 11:32:41 +09:00
chenyu
784b919f7f Revert "optim empty shard #13513 (#13598)" (#13855)
* Revert "optim empty shard #13513 (#13598)"

This reverts commit 76d465dbc3.

* test_arange_shrink

* update test
2025-12-27 21:10:23 -05:00
anu
9b4de8abc7 fix beam in python 3.14+ (#13836)
* fix beam search on python 3.14

* add PickleableCount class to helpers

* change name, add test, add step

* tidy count init
2025-12-27 16:24:22 -05:00
chenyu
0f74909ae9 clean up rearrange (#13851) 2025-12-27 11:06:10 -05:00
qazal
f6c660f7fa simplify sqtt decoder infra (#13849)
* more work

* simpler
2025-12-28 00:31:16 +09:00
Clément Verrier
ae013beab8 handle empty VECTORIZE in UOp.render() (#13847)
`UOp.render()` crashed with `IndexError: tuple index out of range` when
the UOp graph contained a `VECTORIZE` with empty `src=()`. This occurs
when reshaping to scalar shape `()`, e.g., `Tensor.ones(4).sum()`.

The bug was in the renderer's VECTORIZE pattern: `all_same(())` returns
`True` (vacuous truth), causing the code to access `x.src[0]` on an
empty tuple.

- Fix `IndexError` when calling `UOp.render()` on graphs containing
  empty `VECTORIZE` nodes.
- Add test for empty `VECTORIZE` rendering.
2025-12-27 10:09:39 -05:00
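The vacuous-truth pitfall this commit fixes can be sketched in a few lines (a minimal illustration, not tinygrad's exact renderer code; `first_if_all_same` is a hypothetical helper showing the shape of the fix):

```python
def all_same(items) -> bool:
    # all() over an empty generator is vacuously True: the comparison body
    # (and hence items[0]) is never evaluated for an empty sequence
    return all(x == items[0] for x in items)

assert all_same(()) is True  # vacuous truth, no element ever compared

# without a non-empty guard, the "all same" branch is taken for src == ()
# and a subsequent src[0] access raises IndexError; the fix is to require
# at least one element before collapsing to the first one
def first_if_all_same(items):
    if items and all_same(items):
        return items[0]
    return None

assert first_if_all_same((3, 3, 3)) == 3
assert first_if_all_same(()) is None  # no IndexError on empty src
```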
qazal
a2da61d096 use new style amd compiler in viz (#13848)
* working version, handcode gfx1100 arch

* get target from device properties

* lib in cfg test program spec
2025-12-27 23:59:30 +09:00
JINO ROHIT
1ee92003ea minor typo (#13846) 2025-12-27 09:34:57 -05:00
nimlgen
276159cb87 system: add base_class to pci_scan_bus (#13845)
* system: add base_class to pci_scan_bus

* fix
2025-12-27 13:22:21 +03:00
Francis Lata
fac137779e remove flux1 seed image (#13843) 2025-12-27 00:45:11 -05:00
qazal
f6de9095a0 switch asm tests to dsl (#13840)
* switch asm tests to dsl

* labeled basic blocks also work

* indenting for basic blocks

* allow define from star import
2025-12-27 02:15:16 +09:00
chenyu
ba922094f2 remove redundant check in disk_supports_fast_copyout (#13838) 2025-12-26 11:30:55 -05:00
George Hotz
e9f2aaba2a simplify rdna3 asm (#13835)
* simplify rdna3 asm

* cleanups

* fix names

* fix tests

* fixes

* more test fixes

* type fixes

* tests pass + mypy passes

* 3.11 syntax
2025-12-26 11:21:03 -05:00
nimlgen
c44b4f9ae0 am: fix sdma warm boot (#13837) 2025-12-26 12:38:06 +03:00
George Hotz
c6937fa744 more work on RDNA3 asm (#13833)
* more llvm asm tests

* roundtrip test

* work

* more handwritten

* more handwritten

* work

* tests pass

* dual mov

* all tests pass

* all tests pass fast
2025-12-25 23:28:14 -05:00
George Hotz
f1111ac7de move amd compilers to new style (#13831)
* move amd compilers to new style

* simplest diff

* AMDHIPrenderer
2025-12-25 13:42:24 -05:00
George Hotz
9d94b8c6b2 python asm dsl in extra + python REMU (#13436)
* having fun with python asm dsl

* rdna3

* meh

* all in rdna3

* work

* more work

* work

* integration

* tests

* simpler

* simpler

* asm

* better

* simpler

* progress

* emu

* simpler

* emu

* tests

* types

* vopd

* cleanups

* work

* memory ranges

* add tracing

* refactors

* run_asm exit

* more readable

* compare to remu

* test gemm

* bug + stale

* more tests

* refactor

* tests fix

* more ins

* more instructions

* refactor

* faster

* match case

* match case

* simpler

* work

* tests

* run_asm

* work

* bug fixes

* more emu

* alu/emu

* refactor

* no pipeline emu yet

* alu direct

* fix

* bugfixes + new test

* fix exceptions in emulators

* update gen.py

* pylint

* no pdf

* improve bench_emu

* speedups

* cleanups

* more tests
2025-12-25 13:04:14 -05:00
nimlgen
b5f3a5ad79 am: cleanup comment (#13828) 2025-12-25 18:00:28 +03:00
chenyu
8985a4a023 one less branch in Buffer.view [pr] (#13829) 2025-12-25 09:34:15 -05:00
chenyu
094753b4e0 renderer arch version cleanup [pr] (#13830) 2025-12-25 09:32:56 -05:00
chenyu
54af29dbdb trange can just be a function (#13827) 2025-12-24 23:57:10 -05:00
qazal
a1c1684b91 set .amdhsa_kernarg_size in asm test (#13826) 2025-12-25 13:08:14 +09:00
chenyu
da1cb6a9ec update llama dataloader (#13825)
separate creating the dataset from iterating over it, so eval data is not recreated for each eval
2025-12-24 17:42:08 -05:00
chenyu
a7fc0c288b clean up BufferCopy init [pr] (#13824) 2025-12-24 10:40:15 -05:00
chenyu
903753c60c llama wandb logging (#13822) 2025-12-24 10:24:59 -05:00
qazal
e3a646dce3 viz: skip plaintext disassemble for cfg (#13821) 2025-12-24 23:16:59 +09:00
chenyu
cb07c5d0e8 fewer import annotations (#13819) 2025-12-23 18:45:50 -05:00
George Hotz
43c6e973d8 add optional compiler in Renderer (#13817)
* add optional compiler in Renderer [pr]

* fix

* late init

* remove precompiled

* cleanup
2025-12-23 17:58:46 -05:00
George Hotz
8eab6175ee get_program refactor (#13816)
* get_program refactor

* fix docs

* cleanup
2025-12-23 16:44:46 -05:00
George Hotz
3d3c5b2fb9 add device to program (#13815)
* add device to program

* from_uop

* from_uop no renderer

* simpler global_size
2025-12-23 16:15:33 -05:00
nimlgen
90b217896f am: xgmi p2p (#13811)
* system: use addr space

* am: xgmi

* fix

* ugh
2025-12-23 20:11:38 +03:00
George Hotz
6439a515be test fixups / speedups / var_vals refactor (#13812)
* no PYTHONPATH + llm server port 0

* llm tok speedup

* refactor var_vals
2025-12-23 12:05:59 -05:00
George Hotz
8dcba2e2cc no full_rewrite [pr] (#13809)
* no full_rewrite [pr]

* fix

* fix docs
2025-12-22 23:20:01 -05:00
George Hotz
edce2303f4 rewrite to program (#13808) 2025-12-22 20:03:33 -05:00
George Hotz
2af2b4da5d Revert "rewrites for renderer and compiler (#13646)" (#13806)
This reverts commit 339dadf056.
2025-12-22 19:21:33 -05:00
George Hotz
339dadf056 rewrites for renderer and compiler (#13646)
* rewrites for renderer and compiler

* full_rewrite_to_program

* fix pre-commit

* compiler passed into get_program

* no pkl compiler

* lib on program spec

* fix spec

* fix test

* no device

* compiler_device

* nm

* fix nir

* fix

* simplest

* fix tests

* revert
2025-12-22 18:58:43 -05:00
Daniel Xu
4edaaf19e5 Handle tied embeddings for llama 3.2 1B (#13796)
Previously the output.weight layer would not be loaded, and would only
contain randomly initialized values. This led to junk when doing a
forward pass.

Signed-off-by: Daniel Xu <daniel@thinkingmachines.ai>
2025-12-22 16:31:40 -05:00
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
qazal
b31373ca70 remove llvm-mca stuff from viz (#13802) 2025-12-23 01:41:51 +08:00
chenyu
27d899ce97 TRAIN=0 to only eval llama (#13804) 2025-12-22 11:55:46 -05:00
chenyu
39d962106f update llama logging (#13803)
```
REWRITE_STACK_LIMIT=1000000 SMALL=1 BASEDIR=/raid/datasets/c4-8b SAMPLES=1000 BS=8 DP=8 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=8B SEQLEN=1024 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py

    1 93.44 s run, 11.8750 loss, 0.000000000001 LR, 642.43 GB used,  19644.30 GFLOPS
    2 101.78 s run, 11.8750 loss, 0.000000000001 LR, 1454.57 GB used,  17039.35 GFLOPS
    3 7.34 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 236258.78 GFLOPS
    4 4.32 s run, 11.8750 loss, 0.000000000002 LR, 1454.57 GB used, 401488.40 GFLOPS
    5 4.36 s run, 11.9375 loss, 0.000000000003 LR, 1454.57 GB used, 398116.13 GFLOPS
    6 4.32 s run, 11.8750 loss, 0.000000000003 LR, 1454.57 GB used, 401878.60 GFLOPS
    7 4.34 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 399822.57 GFLOPS
    8 4.35 s run, 11.8750 loss, 0.000000000004 LR, 1454.57 GB used, 398512.24 GFLOPS
    9 4.36 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 397832.61 GFLOPS
   10 4.40 s run, 11.8750 loss, 0.000000000005 LR, 1454.57 GB used, 394520.83 GFLOPS
```
2025-12-22 11:28:29 -05:00