tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
wozeparrot	590464c8d8	llama: only support wqkv path + cleanups (#15680 ) * llama: only support wqkv path + cleanups * llama: missing transpose	2026-04-11 07:39:27 +08:00
nimlgen	aa012d6f08	usb: faster custom (#15678 ) * usb: _f0_out_buf for e4 cmd as well * custom speed * fast	2026-04-10 23:00:31 +03:00
nimlgen	58646f9569	usb fast copyout (#15677 ) * usb * fix usb	2026-04-10 21:04:49 +03:00
qazal	0d5cdc9600	viz: split draw loop (#15676 ) * split draw loop * one draw * no functions * inline all highlights * cleanup	2026-04-10 23:25:50 +09:00
chenyu	e1334d3852	move canonicalize_device to device.py (#15675 )	2026-04-10 09:43:56 -04:00
chenyu	8e7fcc8ca3	remove _include_initial in _cumalu (#15674 ) handle negative pad in caller	2026-04-10 08:33:30 -04:00
George Hotz	9092f2a8c0	llm: add shared_expert and rope_dim support from qwen35 (#15673 ) * llm: add shared_expert and rope_dim support from qwen35 * refactor into FFNBlock and TransformerBlock * norms where they belong	2026-04-10 19:18:27 +08:00
b1tg	9ab1415937	llm: fix streaming UTF-8 decode (#15653 )	2026-04-10 17:01:02 +08:00
wozeparrot	55bcd7cc9e	llama amax outside (#15670 )	2026-04-09 23:08:03 -07:00
George Hotz	16f3448b26	Add HIP to abstractions4 (#15672 ) * cleanup formatting * add HIP option * pass in correct	2026-04-10 14:05:52 +08:00
George Hotz	ed2a72bb23	work on abstractions4 (#15671 ) * work on abstractions4 * works * offst * assembly works * RAND * cleanup * work	2026-04-10 13:25:11 +08:00
Christopher Milan	dbc23e8a1b	move HCQ_VISIBLE_DEVICES into DEV (#15668 )	2026-04-09 22:01:35 -04:00
George Hotz	fa02105546	hotfix: pin amd isa xml version	2026-04-10 06:47:00 +08:00
nimlgen	057dc173ab	beam uop (#15660 ) * beam as uop * x	2026-04-09 19:13:03 +03:00
nimlgen	0ff30b003d	am: reset queues from spi (#15664 ) * am: reset queues from spi * move	2026-04-09 18:25:50 +03:00
George Hotz	48a7627b04	add RDNA4 support to copy WMMA (#15663 ) * add RDNA4 supportt to copy WMMA * simpler * simpler * comment * assert	2026-04-09 22:48:20 +08:00
chenyu	6837881b06	remove same_shape_noop [pr] (#15662 ) no longer used	2026-04-09 09:50:26 -04:00
Christopher Milan	d08c76d9cb	c.Struct cleanup (#15640 )	2026-04-08 20:07:16 -04:00
qazal	742b3894d7	viz/cli: add pmc printer (#15651 ) * viz/cli: add pmc printer * cli work * s * linter * pack workgroups * add : to wgp * counter name	2026-04-09 08:50:54 +09:00
chenyu	4cf2759fc8	fix merge_reduce_ends (#15659 ) * fix merge_reduce_ends same range with different nesting should not merge, like cumsum twice should not merge * skip that	2026-04-08 17:20:01 -04:00
chenyu	cb681da840	move UOp.pad to mixin (#15657 ) the same arg works for Tensor.pad	2026-04-08 13:15:19 -04:00
nimlgen	28b14b0e38	mlx: remove to_be, use helpers (#15655 )	2026-04-08 20:07:28 +03:00
nimlgen	1b44cb2ac6	split update stat from execitem (#15654 )	2026-04-08 20:07:12 +03:00
qazal	71c83cc3f6	viz: put OTHER_ on the wave row (#15650 ) * viz: put OTHER_ on the wave row * update tests * cleanup cli	2026-04-08 23:13:44 +09:00
chenyu	839d37b7bc	update median_step_time in model_train.py (#15649 ) BENCHMARK=5 used to pick the 4th largest, not the middle one	2026-04-08 09:53:59 -04:00
chenyu	dae9dea903	clean up tensor random functions (#15648 ) * clean up tensor random functions * revert that	2026-04-08 09:44:37 -04:00
George Hotz	1ebeb52e59	RDNA4 asm gemm (#15427 ) * sqtt: rdna4 decoder work * diff cleanup * more diff * test * 125 * r4 --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2026-04-08 21:26:44 +08:00
nimlgen	b1e52ba0c2	the slowest line in hcq graph (#15635 ) * the slowest line in hcq graph * x	2026-04-08 15:53:52 +03:00
qazal	3ac16b3bea	viz: add wmma row, update exec duration logic (#15646 ) * viz: split wmma to its own row, fix duration logic * regs * decrease number of loops, add pickle * assert overlaps	2026-04-08 20:24:23 +09:00
George Hotz	35e3983840	Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644 )	2026-04-08 17:16:19 +08:00
qazal	39a029ec55	remove ASM_GEMM context var (#15645 )	2026-04-08 18:02:40 +09:00
qazal	dc6a51e44d	viz: add # of bytes to sdma (#15639 ) * viz: add # of bytes to sdma * update test_viz	2026-04-08 17:43:37 +09:00
wozeparrot	70dbd35023	llama: move custom_kernel into flat_llama (#15643 )	2026-04-08 00:19:14 -07:00
Christopher Milan	bcf6931a4f	fix: comma 4 does not have pcie (#15642 )	2026-04-07 23:57:03 -04:00
George Hotz	f930579b7a	llm: change the default port to 8000 so you can remember it (match vLLM)	2026-04-08 11:25:38 +08:00
b1tg	bf3763526a	llm: buffer SSE chunks to fix parse errors from split reads (#15641 )	2026-04-08 10:26:23 +08:00
qazal	a508b8fd2a	viz: delete redundant things (#15637 ) * delete that * remove * delete graph config	2026-04-08 07:18:04 +09:00
chenyu	9c6e925b56	move lerp to mixin (#15634 ) last function of math function section	2026-04-07 15:13:00 -04:00
qazal	890286e8d6	update llama profile.sh (#15633 ) * update llama profile.sh * BENCHMARK 5	2026-04-08 03:18:45 +09:00
nimlgen	b78b384d58	mlx: graph (#15621 ) * Dx * Dx * simpler * mypy * x * f * Dx * x * c * x	2026-04-07 19:43:51 +03:00
qazal	d29f0ef721	viz: speed up profiler first render (#15632 ) * viz: speed up profiler first render * better comment	2026-04-07 23:07:09 +09:00
George Hotz	d3de63d998	improvements to apps.llm (#15631 )	2026-04-07 20:34:05 +08:00
George Hotz	2b01ca59dd	USB driver for custom ASM firmware (#15597 ) * USB driver for custom ASM firmware * timeout * fix mypy * pcie mem read * flip in f/w * one tx * litle endian * autodetect custom * mock bypass * lint * clean	2026-04-07 13:45:41 +08:00
wozeparrot	810d7c00cd	llama: unify scripts (#15628 )	2026-04-06 20:28:08 -07:00
Christopher Milan	19e96497ee	interface in DEV (#15620 )	2026-04-06 19:59:28 -04:00
qazal	8ba58304f7	viz: reenable tests (#15626 )	2026-04-07 07:52:44 +09:00
chenyu	2f7d085450	shared _normalize_indices for getitem (#15625 ) * shared _normalize_indices for getitem * list	2026-04-06 17:45:36 -04:00
chenyu	66ec188d50	more activations to mixin (#15624 )	2026-04-06 15:41:41 -04:00
chenyu	1483f7e71c	support shift by Tensor (#15623 ) * support shift by Tensor * use mixin	2026-04-06 15:14:57 -04:00
chenyu	6e30a5f5ea	update shifts in torch backend (#15622 )	2026-04-06 14:08:33 -04:00

12916 Commits