Commit Graph

9652 Commits

Author | SHA1 | Message | Date
chenyu
7ad7329257 data parallel train llama (#11466) 2025-08-01 12:13:51 -04:00
nimlgen
9f2182f92f cpu: start threading (#11324)
* cpu: threading

* syncs

* llvm

* fix

* opt

* fx

* fix

* missed sync

* one line less

* cleaner

* fix
2025-08-01 15:35:07 +03:00
qazal
c7ae1bd474 viz: more consistent border styling (#11464) 2025-08-01 09:31:06 +03:00
George Hotz
8ff03806e8 add llama layers (#11460)
* add llama layers

* add contig bw for speed
2025-07-31 16:28:04 -07:00
qazal
719827b95d viz: add flops / mem bw to device programs (#11459)
* viz: add flops / mem bw to device programs

* better spacing style
2025-08-01 02:12:30 +03:00
chenyu
3f742a5a7c comma space lab models benchmark (#11461) 2025-07-31 19:06:18 -04:00
George Hotz
474ee9daa5 hotfix: add contiguous_backward to llama 2025-07-31 15:07:12 -07:00
qazal
fa66d9772d viz: show const node when it's root (#11456) 2025-08-01 01:01:58 +03:00
qazal
056dabda5a viz: refactor to color scheme (#11455) 2025-08-01 00:17:50 +03:00
nimlgen
e5b6149dfb more typing in drivers (#11454)
* more typing in drivers

* rm
2025-07-31 23:26:33 +03:00
qazal
bad3cf5731 viz: add LLVM machine code analysis (#11421)
* start

* works everywhere

* add viz api

* utilization table

* reg pressure ui

* use llvm-mca

* llvm-mca ui

* work

* cleanup

* cycle through, defaults are enough

* x86 pending

* x86 nops

* get mcpu/mtriple from autogen

* cleanup server diff

* move parser to python

* normalize to pct of max

* segments legend

* imports

* also monospace

* max comes from the total per instruction

* base on the value
2025-08-01 01:59:26 +08:00
chenyu
e847677e8a use AxisType in search instead of colors (#11452) 2025-07-31 13:07:33 -04:00
nimlgen
75c2c42def suppress exceptions only during finalization (#11451)
* suppress exceptions only during finalization

* fix

* fix typing

* fix more warns

* fix

* better?

* Revert "better?"

This reverts commit a068aa5793.

* mm?

* no as e
2025-07-31 13:57:12 +03:00
wozeparrot
24dd0d52ed feat: test remove to cpu (#11444) 2025-07-30 20:18:56 -07:00
kevvz
c3cfcb50cb Add linalg_det and test for torch backend (#11405)
* add linalg_det and test

* space

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:04:44 -04:00
Eitan Turok
cba3655de5 Add Test for Setitem (#10559)
* init

* update

* better

* failing test

* works

* Delete test file

* clean

* lint

* simplify variable name

* rm contiguous, rm int dtype, and add assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:03:41 -04:00
wozeparrot
6252f7770e feat: fake data (#11447) 2025-07-30 17:18:20 -07:00
chenyu
e300451f3a update llama3 (#11446)
`LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7
2025-07-30 19:34:21 -04:00
wozeparrot
5fb975351a feat: flag for training on val (#11441) 2025-07-30 14:29:45 -07:00
chenyu
4ca430e5bf fix search dedup (#11439)
it should check against pre real_axis axis in actions, not real_axis.
2025-07-30 17:24:16 -04:00
wozeparrot
d3da20eca6 feat: bump mlperf workflow timeout to 6 hours (#11440) 2025-07-30 14:12:12 -07:00
wozeparrot
825b6a2505 feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
qazal
af357b5dc8 disable TRACK_MATCH_STATS in BEAM workers [pr] (#11437) 2025-07-30 23:22:08 +03:00
George Hotz
7c2d2eff86 check tensor core dims (#11436)
* check elements_per_thread in tensorcore [pr]

* check tc dims
2025-07-30 13:06:59 -07:00
nimlgen
5fc5bb5237 ci: clear processes (#11434)
* unified hcq_smi for management

* fix

* fix

* no reset for amd
2025-07-30 22:15:18 +03:00
George Hotz
4f26a9ad32 check elements_per_thread in tensorcore [pr] (#11435) 2025-07-30 11:55:48 -07:00
nimlgen
4b4ba5454c ci: move driver start higher (#11431) 2025-07-30 10:48:38 +03:00
George Hotz
1bef2d80c1 unrolls are all in the same scope (#11429)
* unrolls are all in the same scope

* fix that import
2025-07-29 16:55:37 -07:00
chenyu
204da24cfc increase driverbenchmark timeout-minutes to 15 (#11428) 2025-07-29 19:45:05 -04:00
chenyu
d5fc6af4a2 remove unused ShapeTracker.consecutive [pr] (#11426) 2025-07-29 18:36:19 -04:00
George Hotz
49a2583584 real new lowerer (#11419)
* real new lowerer

* fix group for reduce

* skip missing ranges

* fix wmma and unroll/contract

* real fix for wmma

* disable that test

* fix if gate

* simpler

* flash attention fusion works

* no end barriers

* still broken

* flash attention finally works
2025-07-29 15:35:51 -07:00
chenyu
0e5d8d5c3c remove tests that used .to_uop() (#11425)
* remove tests that used .to_uop()

* import
2025-07-29 15:52:16 -04:00
nimlgen
c88e401d0e ci: fix typos in h machine benchmarks (#11423) 2025-07-29 22:11:47 +03:00
chenyu
90a5a312eb simplify ShapeTracker in UOp.const [pr] (#11424) 2025-07-29 15:04:06 -04:00
chenyu
398594029b spec checks arg of VIEW are ShapeTracker (#11422) 2025-07-29 14:05:12 -04:00
George Hotz
1f1f99c287 hotfix: add DEBUG=3 to driver CI 2025-07-29 11:03:47 -07:00
George Hotz
50fae54175 global local dims in gpudims [pr] (#11420) 2025-07-29 10:39:03 -07:00
chenyu
9bc413f104 remove ShapeTracker.to_uop [pr] (#11418) 2025-07-29 13:29:37 -04:00
George Hotz
ba2c4df125 dont render cast ptrs standalone (#11417)
* dont render cast ptrs standalone

* barrier cleanups
2025-07-29 09:24:26 -07:00
nimlgen
d38d285489 ci: add h machines (#11416)
* ci: add h machines

* more

* fix names

* names not collide

* 20

* 10
2025-07-29 19:21:51 +03:00
Tom Clesius
2568bc0d99 ci: add caching for apt packages (#11162)
* add caching for apt packages

* remove 'inputs' from apt cache key, use outputs instead of env

* remove unnecessary mkdir for partial

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-29 09:04:56 -07:00
George Hotz
03909f2772 permute locals for HL uop matmul (#11412)
* permute locals for HL uop matmul

* parens fix that

* permutes

* 20 TFLOPS
2025-07-29 08:19:59 -07:00
nimlgen
e0c9747684 amd: fix typo in has_scratch_base_registers for mi350 (#11413) 2025-07-29 10:30:06 +03:00
George Hotz
735ad5f10d kernel4 and 5 in uops (#11411)
* move simplify views to merge views

* add amd kernel 4

* Revert "move simplify views to merge views"

This reverts commit 1e07dff384.

* k4 in python

* kernel4 written in uops

* k5 support

* cleanups
2025-07-28 19:35:48 -07:00
George Hotz
fddc645668 HL=2 top matmul (#11406)
* HL=2 top matmul

* top colored
2025-07-28 12:32:38 -07:00
nimlgen
c7b4ab86e4 fix llvm tc on mi350 (#11404) 2025-07-28 21:37:43 +03:00
chenyu
9f7c72ff8f remove UOp.valid method [pr] (#11402)
only used in add_buffer_ops
2025-07-28 11:29:08 -04:00
chenyu
b22a34331b remove const valid in fixup_ast [pr] (#11401) 2025-07-28 11:07:59 -04:00
qazal
7737cbb2a0 viz: tabulate runtime stats (#11400) 2025-07-28 15:56:39 +03:00
chenyu
ab6a27f627 remove a branch in UOp.r [pr] (#11398) 2025-07-27 18:00:01 -04:00