tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
George Hotz	fa02105546	hotfix: pin amd isa xml version	2026-04-10 06:47:00 +08:00
nimlgen	057dc173ab	beam uop (#15660 ) * beam as uop * x	2026-04-09 19:13:03 +03:00
nimlgen	0ff30b003d	am: reset queues from spi (#15664 ) * am: reset queues from spi * move	2026-04-09 18:25:50 +03:00
George Hotz	48a7627b04	add RDNA4 support to copy WMMA (#15663 ) * add RDNA4 support to copy WMMA * simpler * simpler * comment * assert	2026-04-09 22:48:20 +08:00
chenyu	6837881b06	remove same_shape_noop [pr] (#15662 ) no longer used	2026-04-09 09:50:26 -04:00
Christopher Milan	d08c76d9cb	c.Struct cleanup (#15640 )	2026-04-08 20:07:16 -04:00
qazal	742b3894d7	viz/cli: add pmc printer (#15651 ) * viz/cli: add pmc printer * cli work * s * linter * pack workgroups * add : to wgp * counter name	2026-04-09 08:50:54 +09:00
chenyu	4cf2759fc8	fix merge_reduce_ends (#15659 ) * fix merge_reduce_ends same range with different nesting should not merge, like cumsum twice should not merge * skip that	2026-04-08 17:20:01 -04:00
chenyu	cb681da840	move UOp.pad to mixin (#15657 ) the same arg works for Tensor.pad	2026-04-08 13:15:19 -04:00
nimlgen	28b14b0e38	mlx: remove to_be, use helpers (#15655 )	2026-04-08 20:07:28 +03:00
nimlgen	1b44cb2ac6	split update stat from execitem (#15654 )	2026-04-08 20:07:12 +03:00
qazal	71c83cc3f6	viz: put OTHER_ on the wave row (#15650 ) * viz: put OTHER_ on the wave row * update tests * cleanup cli	2026-04-08 23:13:44 +09:00
chenyu	839d37b7bc	update median_step_time in model_train.py (#15649 ) BENCHMARK=5 used to pick the 4th largest, not the middle one	2026-04-08 09:53:59 -04:00
chenyu	dae9dea903	clean up tensor random functions (#15648 ) * clean up tensor random functions * revert that	2026-04-08 09:44:37 -04:00
George Hotz	1ebeb52e59	RDNA4 asm gemm (#15427 ) * sqtt: rdna4 decoder work * diff cleanup * more diff * test * 125 * r4 --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2026-04-08 21:26:44 +08:00
nimlgen	b1e52ba0c2	the slowest line in hcq graph (#15635 ) * the slowest line in hcq graph * x	2026-04-08 15:53:52 +03:00
qazal	3ac16b3bea	viz: add wmma row, update exec duration logic (#15646 ) * viz: split wmma to its own row, fix duration logic * regs * decrease number of loops, add pickle * assert overlaps	2026-04-08 20:24:23 +09:00
George Hotz	35e3983840	Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644 )	2026-04-08 17:16:19 +08:00
qazal	39a029ec55	remove ASM_GEMM context var (#15645 )	2026-04-08 18:02:40 +09:00
qazal	dc6a51e44d	viz: add # of bytes to sdma (#15639 ) * viz: add # of bytes to sdma * update test_viz	2026-04-08 17:43:37 +09:00
wozeparrot	70dbd35023	llama: move custom_kernel into flat_llama (#15643 )	2026-04-08 00:19:14 -07:00
Christopher Milan	bcf6931a4f	fix: comma 4 does not have pcie (#15642 )	2026-04-07 23:57:03 -04:00
George Hotz	f930579b7a	llm: change the default port to 8000 so you can remember it (match vLLM)	2026-04-08 11:25:38 +08:00
b1tg	bf3763526a	llm: buffer SSE chunks to fix parse errors from split reads (#15641 )	2026-04-08 10:26:23 +08:00
qazal	a508b8fd2a	viz: delete redundant things (#15637 ) * delete that * remove * delete graph config	2026-04-08 07:18:04 +09:00
chenyu	9c6e925b56	move lerp to mixin (#15634 ) last function of math function section	2026-04-07 15:13:00 -04:00
qazal	890286e8d6	update llama profile.sh (#15633 ) * update llama profile.sh * BENCHMARK 5	2026-04-08 03:18:45 +09:00
nimlgen	b78b384d58	mlx: graph (#15621 ) * Dx * Dx * simpler * mypy * x * f * Dx * x * c * x	2026-04-07 19:43:51 +03:00
qazal	d29f0ef721	viz: speed up profiler first render (#15632 ) * viz: speed up profiler first render * better comment	2026-04-07 23:07:09 +09:00
George Hotz	d3de63d998	improvements to apps.llm (#15631 )	2026-04-07 20:34:05 +08:00
George Hotz	2b01ca59dd	USB driver for custom ASM firmware (#15597 ) * USB driver for custom ASM firmware * timeout * fix mypy * pcie mem read * flip in f/w * one tx * little endian * autodetect custom * mock bypass * lint * clean	2026-04-07 13:45:41 +08:00
wozeparrot	810d7c00cd	llama: unify scripts (#15628 )	2026-04-06 20:28:08 -07:00
Christopher Milan	19e96497ee	interface in DEV (#15620 )	2026-04-06 19:59:28 -04:00
qazal	8ba58304f7	viz: reenable tests (#15626 )	2026-04-07 07:52:44 +09:00
chenyu	2f7d085450	shared _normalize_indices for getitem (#15625 ) * shared _normalize_indices for getitem * list	2026-04-06 17:45:36 -04:00
chenyu	66ec188d50	more activations to mixin (#15624 )	2026-04-06 15:41:41 -04:00
chenyu	1483f7e71c	support shift by Tensor (#15623 ) * support shift by Tensor * use mixin	2026-04-06 15:14:57 -04:00
chenyu	6e30a5f5ea	update shifts in torch backend (#15622 )	2026-04-06 14:08:33 -04:00
chenyu	a444be172d	lower fuzz_symbolic_symbolic_div timeout (#15619 ) mitigate timeout crash due to high total time	2026-04-06 12:58:29 -04:00
chenyu	01b49c8647	support int operand for shifts (#15618 ) matches torch/jax, also symbolic rule to remove mask	2026-04-06 12:32:12 -04:00
nimlgen	e2700475cf	mlx: cleaner (#15617 ) * mlx: cleaner * x	2026-04-06 17:49:47 +03:00
Valtteri Valo	86c4431d74	add gpu_family detection to Metal, target MSL 4.0 on macOS 26+ (#15079 ) use supportsFamily API to detect GPU generation instead of parsing ICB debug description strings. also adds metal4.0 compiler target.	2026-04-06 06:51:38 +08:00
13Perrius	ff0c941548	remove redundant iteration and toposort in _deepwalk (#15532 )	2026-04-06 06:38:45 +08:00
Andrew Cappelli	e39cfe685a	validate lr, momentum, weight_decay in optimizers (#15576 )	2026-04-06 06:37:34 +08:00
nimlgen	6a334ceb27	hotfix: fix bert (#15613 )	2026-04-05 23:41:21 +03:00
nimlgen	e3986a6b74	mlx: init runtime (#15612 ) * mlx: init * x * swap	2026-04-05 22:52:29 +03:00
nimlgen	e0988dbae5	hcq: support None for signal_t and compute_t (#15611 ) * hcq: support None for signal_t and compute_t * revert * x	2026-04-05 18:56:47 +03:00
nimlgen	5e134aa087	hcq: add write/poll_bit commands (#15610 ) * hcq: add write/poll_bit commands * x	2026-04-05 18:09:44 +03:00
nimlgen	604cdbf2f7	am: large allocs aligned to 2mb to use 2mb pages (#15609 )	2026-04-05 18:01:31 +03:00
qazal	b2d5b29f45	assembly/amd: validate dsl keyword args (#15608 ) * assembly/amd: validate dsl keyword args * hm, this should use the SOP2 s_waits * use the sop2 s_waits	2026-04-05 23:00:24 +09:00

12904 Commits