Commit Graph

10827 Commits

Author · SHA1 · Message · Date
George Hotz · 3cfd4c915f · warp_num · 2025-10-29 19:08:10 +08:00
George Hotz · abb01ce6a0 · last · 2025-10-29 18:57:38 +08:00
George Hotz · 6853da2a2f · works · 2025-10-29 18:45:16 +08:00
George Hotz · 154ddd98fd · it works, it's just slow... · 2025-10-29 18:35:44 +08:00
George Hotz · e4ef94cf10 · both is broken · 2025-10-29 18:15:38 +08:00
George Hotz · 0c274151ad · tensor core works · 2025-10-29 18:06:05 +08:00
George Hotz · c4e32d4f63 · demote op · 2025-10-29 17:47:12 +08:00
George Hotz · a9d91ffcfc · DEMOTE op for putting globals in locals · 2025-10-29 17:22:59 +08:00
George Hotz · 819592ee67 · hotfix: disable DoubleMatmul for PTX · 2025-10-29 16:37:17 +08:00
George Hotz · 30ca3f2af8 · all double matmul (#12993) · 2025-10-29 16:25:27 +08:00
  * fix more double matmuls
  * a few more
  * all double matmul passes
  * opts for flash attention
  * fix spec
  * comment
Sieds Lykles · 9f39f6391c · shared_codegen_spec and fix index spec (#12967) · 2025-10-29 09:14:11 +01:00
  * split shared_codegen_spec and fix index
  * add VCONST to program_spec and move index to shared_codegen_spec
  * working ignore_oob=0
  * cleanup
  * fix spec
  * undo that
  * move barrier and special earlier
  * fix more spec issues
  * more updates
  * remove special from program_spec
  * cleanup and fixes
  * move more to shared
  * special is not in shared_spec
  * some comments
  * dont do bounds check there
George Hotz · 1c362736aa · fix more double matmuls (#12991) · 2025-10-29 16:09:48 +08:00
  * fix more double matmuls
  * a few more
George Hotz · e42b4edf8c · remove if stuff (#12992) · 2025-10-29 15:29:35 +08:00
George Hotz · 8c47cf4323 · pcontig double matmul works (#12899) · 2025-10-29 13:06:43 +08:00
  * pcontig double matmul works
  * tests
  * contract
  * closer
  * works-ish
  * add that broadcast
  * 2 more work
  * something
  * disable broken ones
  * llvm
  * align 16
George Hotz · 35b6f4148d · delete untested quantize (#12990) · 2025-10-29 12:46:32 +08:00
Sieds Lykles · 5ce8a1d2f2 · Merge adjacent try all permutations for reduce (#12972) · 2025-10-29 05:04:54 +01:00
George Hotz · b147e7e8e6 · flatten bufferize (#12984) · 2025-10-29 11:23:43 +08:00
  * flatten bufferize
  * simpler
  * tests pass
  * flat
  * not flat
qazal · a7dac11aad · viz: keep rewrite step in back button history (#12986) · 2025-10-29 11:09:43 +08:00
qazal · 37967fa17b · viz: add integer query param helper and more typing (#12985) · 2025-10-29 10:44:01 +08:00
  * viz: query param helper
  * json.dumps once
chenyu · fb53bdad5d · unused propagate_invalid rules [pr] (#12983) · 2025-10-28 22:16:50 -04:00
  named is not used, so you know it never matched
chenyu · ef16e6c68c · unwrap instead of cast [pr] (#12982) · 2025-10-28 21:29:23 -04:00
chenyu · f55fcfecf9 · ProgramSpec uops must end with SINK [pr] (#12981) · 2025-10-28 17:12:22 -04:00
chenyu · 9442442cb1 · update variable names in search [pr] (#12979) · 2025-10-28 15:37:52 -04:00
  no lin nor linearize
wozeparrot · d66c997a39 · feat: thunderkittens fa2 (#12955) · 2025-10-28 11:27:45 -07:00
b1tg · bb307b9e81 · fix fp8 vectorization (#12977) · 2025-10-28 13:55:30 -04:00
  * fix fp8 vectorization
  * add fp8 tc to benchmark
nimlgen · c11dd56956 · amd: cleanup import urls (#12976) · 2025-10-29 00:43:02 +08:00
George Hotz · 5e01cc299b · zero len ranges fail (#12974) · 2025-10-28 22:49:55 +08:00
  * zero len ranges fail
  * fix Python backend
  * fix llvm
  * fix ptx
  * yolo fix nir
  * this works...
  * always store...
  * always store...
  * Revert "always store..."
    This reverts commit 0816cf344d.
George Hotz · e936aa7974 · cleanups from if range branch (#12973) · 2025-10-28 20:58:47 +08:00
qazal · 901d27b3ba · viz: optional text dims try 2 (#12971) · 2025-10-28 18:54:28 +08:00
George Hotz · f5a3b33d33 · add fun with nhwc convs · 2025-10-28 17:12:22 +08:00
George Hotz · 907499b02c · clean up GROUP/SINK (#12969) · 2025-10-28 16:08:10 +08:00
  * clean up GROUP/SINK
  * fix end
  * range_str color
Sieds Lykles · e22c5e7e73 · process_replay uses opts argument for KernelInfo.opts_to_apply (#12946) · 2025-10-28 09:00:28 +01:00
  * opts_to_apply is opts
  * skip beamed kernels
  * simpler change
  * fix the tensor cores tests for process replay
  * use opts
George Hotz · 6c9560a846 · more syntactic sugar for pyrender (#12968) · 2025-10-28 15:24:33 +08:00
George Hotz · b0da173f2f · add unique to const, fix longstanding bug (#12965) · 2025-10-28 15:11:37 +08:00
  * add unique to const, fix longstanding bug
  * _force_unique=True
  * fix tests
  * fix more tests
Sieds Lykles · e110f4632a · split cat (on cpu) (#12864) · 2025-10-28 07:55:19 +01:00
  * split ranges but only on cpu
  * except KernelOptError for threads
  * use GROUP and END
  * no more flatten_range needed
  * remove noop end
  * always process replay for openpilot
  * update test
  * skip test
  * fix in outs calculation
    With the new linearizer the toposort is a problem; this matches the spec now.
  * undo that
qazal · 3b82dee625 · viz: match DEBUG=2 for exec item metadata (#12966) · 2025-10-28 14:53:57 +08:00
  * viz: match DEBUG=2 for exec item metadata
  * remove repr from kernel
qazal · 99589dea81 · move viz edge tagging to UOp graph (#12964) · 2025-10-28 12:46:23 +08:00
George Hotz · bbe0bebbf3 · no range tags in kernels (#12962) · 2025-10-28 12:33:48 +08:00
George Hotz · 39c2117dea · cleanup pyrender (#12961) · 2025-10-28 10:47:39 +08:00
George Hotz · 2832954bcb · test with IGNORE_OOB=0 (#12960) · 2025-10-28 10:32:19 +08:00
George Hotz · 7784cec48e · pytest-split on spec (#12959) · 2025-10-28 10:09:01 +08:00
George Hotz · 4d817a289e · simplify spec (#12958) · 2025-10-28 09:52:32 +08:00
  * simplify spec
  * more
George Hotz · 62e62d8760 · move verify to spec / cleanup (#12956) · 2025-10-28 08:58:10 +08:00
  * move verify to spec / cleanup
  * lil
  * more explicit
wozeparrot · 24884c6768 · fix: don't use KITTENS_HOPPER for 4090 (#12954) · 2025-10-27 17:19:53 -07:00
nimlgen · 372d9e5753 · hcq: helper for visible devices (#12950) · 2025-10-28 02:27:56 +08:00
  * hcq: helper for visible devices
  * fix
  * f
Justin Erenkrantz · f2ffe9c8cf · Apply an override for nbio 7.3.0 to 7.2.0. (#12949) · 2025-10-27 11:10:10 -07:00
qazal · 63484d837e · Revert "viz graph drawing cleanups (#12933)" (#12947) · 2025-10-28 00:39:37 +08:00
  This reverts commit 189582db5e.
chenyu · a79832b01f · control_flow.py -> linearizer.py [pr] (#12948) · 2025-10-27 12:38:13 -04:00
b1tg · 45e2f916a3 · add quantize fp8 in llama3 (#12893) · 2025-10-27 10:22:57 -04:00
  * add quantize fp8 in llama3
  * don't truncate fp8 alu result
  * cast to float32 before matmul
  * --model weights/LLaMA-3/8B-SF-DPO/
  Co-authored-by: chenyu <chenyu@fastmail.com>
George Hotz · 25c2da1579 · check SPEC=2 in CI (#12945) · 2025-10-27 21:53:57 +08:00
  * check SPEC=2 in CI
  * split SPEC=2
  * fast enough