Commit Graph

10417 Commits

Author SHA1 Message Date
qazal
5870352fe1 viz: factorize llvm-mca call (#11490) 2025-08-04 00:31:23 +03:00
chenyu
dbc7807c61 enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
nimlgen
8f374ee1f7 nv: print devfmr in gsp logs (#11484) 2025-08-03 15:12:53 +03:00
chenyu
823f1a01db move cast around expand backward to tensor.py (#11483) 2025-08-02 23:03:54 -04:00
chenyu
0ce0f51010 generic double cast folding (#11481)
b.cast(a).cast(b) -> b if a preserves all values in b
2025-08-02 19:26:37 -04:00
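The fold rule in that commit body (`b.cast(a).cast(b) -> b` when dtype `a` preserves all values of dtype `b`) can be illustrated with a small sketch. This is not tinygrad's actual pattern matcher; it is a hypothetical helper using numpy, where `can_cast(..., casting="safe")` approximates the "preserves all values" predicate:

```python
# Hedged sketch of the double-cast fold, NOT tinygrad's implementation.
# Rule: x.astype(a).astype(x.dtype) == x whenever dtype `a` can represent
# every value of x.dtype, so the round trip folds away entirely.
import numpy as np

def fold_double_cast(x: np.ndarray, a) -> np.ndarray:
    """Compute x.astype(a).astype(x.dtype), folding to x when lossless."""
    # numpy's 'safe' casting rule is used here as an approximation of
    # "dtype a preserves all values of x.dtype".
    if np.can_cast(x.dtype, a, casting="safe"):
        return x  # fold: the intermediate cast cannot change any value
    return x.astype(a).astype(x.dtype)

# int8 -> int32 -> int8 folds (int32 holds every int8 value);
# int32 -> int8 -> int32 does not (out-of-range values are truncated).
```

In the folding case the original tensor is returned unchanged, which is the point of the optimization: the two casts disappear from the graph instead of being executed.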
qazal
72e0d1d0dc viz: profile the compiler in TINY device (#11457)
* viz: profile the compiler in TINY device

* cleanup
2025-08-03 02:03:20 +03:00
chenyu
66be747908 few more dtype cast convenience methods (#11480) 2025-08-02 15:47:09 -04:00
chenyu
e22e5da9a5 move some test_dtype tests to unit (#11479) 2025-08-02 15:25:00 -04:00
nimlgen
da0b955be4 hcq: cpu can be graphed (#11474)
* hcq: cpu can be graphed

* ops

* new jit decisions

* fix test

* fix remote

* cleaner

* fix
2025-08-02 21:01:19 +03:00
chenyu
f7965f85aa Revert "feat: faster index building (#11462)" (#11478)
This reverts commit 3a4deb08d2.
2025-08-02 12:50:48 -04:00
kevvz
ef7e01cadf Fix SVD shape bug + Fix batched SVD bug (#11477)
* failing test case

* fix

* better test

* space
2025-08-02 09:47:41 -07:00
b1tg
6ecaf8e7b2 refactor: use less index and simplify reduce axes check [pr] (#11476)
* use output_shape/full_shape

* simple final_reduces check

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-02 09:44:51 -07:00
wozeparrot
3a4deb08d2 feat: faster index building (#11462)
* feat: faster index building

* feat: correct training samples
2025-08-02 11:50:18 -04:00
nimlgen
8cc2d64edb amd: reuse create_queues for usb iface (#11473) 2025-08-02 14:40:46 +03:00
chenyu
9e8e6b45ab grad acc train llama (#11467)
* grad acc train llama

* log step time
2025-08-01 15:54:50 -04:00
chenyu
7ad7329257 data parallel train llama (#11466) 2025-08-01 12:13:51 -04:00
nimlgen
9f2182f92f cpu: start threading (#11324)
* cpu: threading

* syncs

* llvm

* fix

* opt

* fx

* fix

* missed sync

* one line less

* cleaner

* fix
2025-08-01 15:35:07 +03:00
qazal
c7ae1bd474 viz: more consistent border styling (#11464) 2025-08-01 09:31:06 +03:00
George Hotz
8ff03806e8 add llama layers (#11460)
* add llama layers

* add contig bw for speed
2025-07-31 16:28:04 -07:00
qazal
719827b95d viz: add flops / mem bw to device programs (#11459)
* viz: add flops / mem bw to device programs

* better spacing style
2025-08-01 02:12:30 +03:00
chenyu
3f742a5a7c comma space lab models benchmark (#11461) 2025-07-31 19:06:18 -04:00
George Hotz
474ee9daa5 hotfix: add contiguous_backward to llama 2025-07-31 15:07:12 -07:00
qazal
fa66d9772d viz: show const node when it's root (#11456) 2025-08-01 01:01:58 +03:00
qazal
056dabda5a viz: refactor to color scheme (#11455) 2025-08-01 00:17:50 +03:00
nimlgen
e5b6149dfb more typing in drivers (#11454)
* more typing in drivers

* rm
2025-07-31 23:26:33 +03:00
qazal
bad3cf5731 viz: add LLVM machine code analysis (#11421)
* start

* works everywhere

* add viz api

* utilization table

* reg pressure ui

* use llvm-mca

* llvm-mca ui

* work

* cleanup

* cycle through, defaults are enough

* x86 pending

* x86 nops

* get mcpu/mtriple from autogen

* cleanup server diff

* move parser to python

* normalize to pct of max

* segments legend

* imports

* also monospace

* max comes from the total per instruction

* base on the value
2025-08-01 01:59:26 +08:00
chenyu
e847677e8a use AxisType in search instead of colors (#11452) 2025-07-31 13:07:33 -04:00
nimlgen
75c2c42def suppress exceptions only during finalization (#11451)
* suppress exceptions only during finalization

* fix

* fix typing

* fix more warns

* fix

* better?

* Revert "better?"

This reverts commit a068aa5793.

* mm?

* no as e
2025-07-31 13:57:12 +03:00
wozeparrot
24dd0d52ed feat: test remove to cpu (#11444) 2025-07-30 20:18:56 -07:00
kevvz
c3cfcb50cb Add linalg_det and test for torch backend (#11405)
* add linalg_det and test

* space

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:04:44 -04:00
Eitan Turok
cba3655de5 Add Test for Setitem (#10559)
* init

* update

* better

* failing test

* works

* Delete test file

* clean

* lint

* simplify variable name

* rm contigious, rm int dtype, and add assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:03:41 -04:00
wozeparrot
6252f7770e feat: fake data (#11447) 2025-07-30 17:18:20 -07:00
chenyu
e300451f3a update llama3 (#11446)
`LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7
2025-07-30 19:34:21 -04:00
wozeparrot
5fb975351a feat: flag for training on val (#11441) 2025-07-30 14:29:45 -07:00
chenyu
4ca430e5bf fix search dedup (#11439)
It should check against the pre-real_axis axis in actions, not real_axis.
2025-07-30 17:24:16 -04:00
wozeparrot
d3da20eca6 feat: bump mlperf workflow timeout to 6 hours (#11440) 2025-07-30 14:12:12 -07:00
wozeparrot
825b6a2505 feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
qazal
af357b5dc8 disable TRACK_MATCH_STATS in BEAM workers [pr] (#11437) 2025-07-30 23:22:08 +03:00
George Hotz
7c2d2eff86 check tensor core dims (#11436)
* check elements_per_thread in tensorcore [pr]

* check tc dims
2025-07-30 13:06:59 -07:00
nimlgen
5fc5bb5237 ci: clear processes (#11434)
* unified hcq_smi for management

* fix

* fix

* no reset for amd
2025-07-30 22:15:18 +03:00
George Hotz
4f26a9ad32 check elements_per_thread in tensorcore [pr] (#11435) 2025-07-30 11:55:48 -07:00
nimlgen
4b4ba5454c ci: move driver start higher (#11431) 2025-07-30 10:48:38 +03:00
George Hotz
1bef2d80c1 unrolls are all in the same scope (#11429)
* unrolls are all in the same scope

* fix that import
2025-07-29 16:55:37 -07:00
chenyu
204da24cfc increase driverbenchmark timeout-minutes to 15 (#11428) 2025-07-29 19:45:05 -04:00
chenyu
d5fc6af4a2 remove unused ShapeTracker.consecutive [pr] (#11426) 2025-07-29 18:36:19 -04:00
George Hotz
49a2583584 real new lowerer (#11419)
* real new lowerer

* fix group for reduce

* skip missing ranges

* fix wmma and unroll/contract

* real fix for wmma

* disable that test

* fix if gate

* simpler

* flash attention fusion works

* no end barriers

* still broken

* flash attention finally works
2025-07-29 15:35:51 -07:00
chenyu
0e5d8d5c3c remove tests that used .to_uop() (#11425)
* remove tests that used .to_uop()

* import
2025-07-29 15:52:16 -04:00
nimlgen
c88e401d0e ci: fix typos in h machine benchmarks (#11423) 2025-07-29 22:11:47 +03:00
chenyu
90a5a312eb simplify ShapeTracker in UOp.const [pr] (#11424) 2025-07-29 15:04:06 -04:00
chenyu
398594029b spec checks arg of VIEW are ShapeTracker (#11422) 2025-07-29 14:05:12 -04:00