Commit Graph

12935 Commits

Author SHA1 Message Date
qazal
ac027055ef viz: no global state (#15705)
* start viz data

* get_full_rewrites also moves

* update ref_map

* work

* update consumers

* cleaner cli

* linter

* cleanup tests

* back

* better

* sqtt tests
2026-04-13 21:35:20 +09:00
George Hotz
4c1fb18a09 Revert "Revert "Tests for GatedDeltaNetBlock + fix multi after assign issue (…" (#15703)
This reverts commit 0cec42db71.
2026-04-13 19:09:38 +08:00
George Hotz
0cec42db71 Revert "Tests for GatedDeltaNetBlock + fix multi after assign issue (#15700)" (#15702)
This reverts commit 6f5d756282.
2026-04-13 19:06:44 +08:00
George Hotz
6f5d756282 Tests for GatedDeltaNetBlock + fix multi after assign issue (#15700)
* broken after/assign test

* test for GatedDeltaNet

* better comments

* fix issue 1 with multi kernel

* fix 2

* fix

* linter

* public api + cleanup
2026-04-13 18:43:23 +08:00
b1tg
2b5ba0095d qwen3.5 (#15210)
* qwen3.5

* faster

* or

* rm zero hack

* less float

* T=1

* clean

* clean

* 4b

* rope_dim

* Revert "jit: captures linears, not execitems (#15399)"

This reverts commit 9656d97d97.

* DeltaNetBlock

* pairwise_topk

* clean

* Reapply "jit: captures linears, not execitems (#15399)"

This reverts commit cf3deff53d.

* clean topk, _swiglu

* common

* FFNBlock

* clean

* half

* no mix

* qwen3.5 test

* fix ssm cache invalidation

* TransformerConfig

* SSMConfig

* clean

* reset_state

* llm: reuse server conversation tokens to avoid BPE roundtrip cache miss

* import error

* prefill

* none check

* put it back

* clean pairwise_topk

* symbolic: fold BIND(CONST, CONST) to CONST

* clean

* simpler pm

* _cached_msg_count

* stream decoder; ssm checkpoints

* rm checkpoint

* attn_output_gate

* conflict, attn_output_gate

* clean, less has_ssm, assert

* chunked prefill

* _reset_cache

* _reusable_prefix_len

* revert loop

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-04-13 15:35:24 +08:00
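One bullet in the entry above mentions folding `BIND(CONST, CONST)` to `CONST` in the symbolic engine. A minimal sketch of that kind of constant-folding rewrite, using a toy node type rather than tinygrad's real `UOp`:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str            # e.g. "CONST", "BIND"
    args: tuple = ()   # payload, e.g. the constant value
    src: tuple = ()    # child nodes

def fold_bind(n: Node) -> Node:
    # a BIND whose variable and value are both constants carries no extra
    # information, so rewrite it to the bare constant value
    if n.op == "BIND" and all(s.op == "CONST" for s in n.src):
        return n.src[1]
    return n

var = Node("CONST", (3,))
val = Node("CONST", (3,))
folded = fold_bind(Node("BIND", src=(var, val)))
assert folded.op == "CONST" and folded.args == (3,)
```

The toy `Node`/`fold_bind` names are illustrative; the real rewrite lives in tinygrad's pattern matcher.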
qazal
2ada38f777 viz: execv after all producers complete (#15696) 2026-04-13 08:15:47 +09:00
chenyu
f7ff480fa6 start mixin getitem tests (#15695)
goal is to make Tensor[idx].uop equal to Tensor.uop[idx]
2026-04-12 18:54:33 -04:00
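The invariant stated above (Tensor[idx].uop equal to Tensor.uop[idx]) is the classic payoff of a shared mixin: the indexing logic is written once and both classes inherit it. A stand-in sketch with toy classes, not tinygrad's real `Tensor`/`UOp`:

```python
class GetitemMixin:
    # shared indexing logic, written once and inherited by both classes
    def __getitem__(self, idx):
        return type(self)(self.data[idx])

class UOp(GetitemMixin):
    def __init__(self, data): self.data = data

class Tensor(GetitemMixin):
    def __init__(self, data):
        self.data = data
        self.uop = UOp(data)

t = Tensor([1, 2, 3])
# indexing then lowering matches lowering then indexing
assert t[1].uop.data == t.uop[1].data == 2
```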
chenyu
77385ccb37 more trivial stuff to mixin (#15693) 2026-04-12 15:17:16 -04:00
chenyu
ff1de5ae13 normalize logsumexp contiguous_backward to mixin (#15692)
* normalize logsumexp contiguous_backward to mixin

* more
2026-04-12 13:13:00 -04:00
chenyu
0254cfe642 move usum and uprod to mixin (#15690)
and used it to clean up ops and tensor
2026-04-12 11:42:24 -04:00
nimlgen
e9b2e156b4 add jitbeam to tinygpu docs (#15691) 2026-04-12 18:20:26 +03:00
chenyu
e706f408cb suppress test warnings from numpy (#15688) 2026-04-11 22:33:20 -04:00
nimlgen
938cba4fdf amd: a bit faster usb, skip interrupts on sync (#15686) 2026-04-11 17:26:36 +03:00
qazal
054d78e6ff fix llama profile.sh NULL source (#15685) 2026-04-11 22:56:05 +09:00
Graham Robbins
4ca844e96b add Q1_0 gguf type (#15683)
* add Q1_0

* better description

* fix trailing whitespace

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-04-11 18:17:24 +08:00
George Hotz
5156a04cf5 add support for AM_POWER_LIMIT (#15684)
* add support for AM_POWER_LIMIT

* level None
2026-04-11 17:14:54 +08:00
wozeparrot
457508d5a0 llama: save more 2 (#15681) 2026-04-11 01:03:36 -07:00
George Hotz
29238b772f AMD USB: support for 0xF3 power toggle 2026-04-11 13:04:38 +08:00
George Hotz
b5a9465b13 llm: add support for moonlight (deepseek MLA) (#15466)
* add gguf Q5_0

* it works

* rebase

* simpler test

* class

* less diff

* dicts

* normal names

* simplify

* this

* simpler

* work

* work
2026-04-11 10:32:48 +08:00
wozeparrot
590464c8d8 llama: only support wqkv path + cleanups (#15680)
* llama: only support wqkv path + cleanups

* llama: missing transpose
2026-04-11 07:39:27 +08:00
nimlgen
aa012d6f08 usb: faster custom (#15678)
* usb: _f0_out_buf for e4 cmd as well

* custom speed

* fast
2026-04-10 23:00:31 +03:00
nimlgen
58646f9569 usb fast copyout (#15677)
* usb

* fix usb
2026-04-10 21:04:49 +03:00
qazal
0d5cdc9600 viz: split draw loop (#15676)
* split draw loop

* one draw

* no functions

* inline all highlights

* cleanup
2026-04-10 23:25:50 +09:00
chenyu
e1334d3852 move canonicalize_device to device.py (#15675) 2026-04-10 09:43:56 -04:00
chenyu
8e7fcc8ca3 remove _include_initial in _cumalu (#15674)
handle negative pad in caller
2026-04-10 08:33:30 -04:00
George Hotz
9092f2a8c0 llm: add shared_expert and rope_dim support from qwen35 (#15673)
* llm: add shared_expert and rope_dim support from qwen35

* refactor into FFNBlock and TransformerBlock

* norms where they belong
2026-04-10 19:18:27 +08:00
b1tg
9ab1415937 llm: fix streaming UTF-8 decode (#15653) 2026-04-10 17:01:02 +08:00
wozeparrot
55bcd7cc9e llama amax outside (#15670) 2026-04-09 23:08:03 -07:00
George Hotz
16f3448b26 Add HIP to abstractions4 (#15672)
* cleanup formatting

* add HIP option

* pass in correct
2026-04-10 14:05:52 +08:00
George Hotz
ed2a72bb23 work on abstractions4 (#15671)
* work on abstractions4

* works

* offst
(likely "offset")

* assembly works

* RAND

* cleanup

* work
2026-04-10 13:25:11 +08:00
Christopher Milan
dbc23e8a1b move HCQ_VISIBLE_DEVICES into DEV (#15668) 2026-04-09 22:01:35 -04:00
George Hotz
fa02105546 hotfix: pin amd isa xml version 2026-04-10 06:47:00 +08:00
nimlgen
057dc173ab beam uop (#15660)
* beam as uop

* x
2026-04-09 19:13:03 +03:00
nimlgen
0ff30b003d am: reset queues from spi (#15664)
* am: reset queues from spi

* move
2026-04-09 18:25:50 +03:00
George Hotz
48a7627b04 add RDNA4 support to copy WMMA (#15663)
* add RDNA4 support to copy WMMA

* simpler

* simpler

* comment

* assert
2026-04-09 22:48:20 +08:00
chenyu
6837881b06 remove same_shape_noop [pr] (#15662)
no longer used
2026-04-09 09:50:26 -04:00
Christopher Milan
d08c76d9cb c.Struct cleanup (#15640) 2026-04-08 20:07:16 -04:00
qazal
742b3894d7 viz/cli: add pmc printer (#15651)
* viz/cli: add pmc printer

* cli work

* s

* linter

* pack workgroups

* add : to wgp

* counter name
2026-04-09 08:50:54 +09:00
chenyu
4cf2759fc8 fix merge_reduce_ends (#15659)
* fix merge_reduce_ends

the same range with different nesting should not merge, e.g. applying cumsum twice

* skip that
2026-04-08 17:20:01 -04:00
chenyu
cb681da840 move UOp.pad to mixin (#15657)
the same arg works for Tensor.pad
2026-04-08 13:15:19 -04:00
nimlgen
28b14b0e38 mlx: remove to_be, use helpers (#15655) 2026-04-08 20:07:28 +03:00
nimlgen
1b44cb2ac6 split update stat from execitem (#15654) 2026-04-08 20:07:12 +03:00
qazal
71c83cc3f6 viz: put OTHER_ on the wave row (#15650)
* viz: put OTHER_ on the wave row

* update tests

* cleanup cli
2026-04-08 23:13:44 +09:00
chenyu
839d37b7bc update median_step_time in model_train.py (#15649)
BENCHMARK=5 used to pick the 4th largest, not the middle one
2026-04-08 09:53:59 -04:00
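The fix above is plain median selection: with BENCHMARK=5 the median is the 3rd smallest of the five step times (index 2 after sorting), not the 4th largest (index 1). A sketch with assumed sample timings, not the script's real data:

```python
times = [4.1, 3.9, 5.0, 3.8, 4.0]    # five assumed step times
srt = sorted(times)                  # [3.8, 3.9, 4.0, 4.1, 5.0]

fourth_largest = srt[len(srt) - 4]   # 3.9 -- what the old code reportedly picked
median_step_time = srt[len(srt) // 2]  # 4.0 -- the true middle element

assert (fourth_largest, median_step_time) == (3.9, 4.0)
```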
chenyu
dae9dea903 clean up tensor random functions (#15648)
* clean up tensor random functions

* revert that
2026-04-08 09:44:37 -04:00
George Hotz
1ebeb52e59 RDNA4 asm gemm (#15427)
* sqtt: rdna4 decoder work

* diff cleanup

* more diff

* test

* 125

* r4

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2026-04-08 21:26:44 +08:00
nimlgen
b1e52ba0c2 the slowest line in hcq graph (#15635)
* the slowest line in hcq graph

* x
2026-04-08 15:53:52 +03:00
qazal
3ac16b3bea viz: add wmma row, update exec duration logic (#15646)
* viz: split wmma to its own row, fix duration logic

* regs

* decrease number of loops, add pickle

* assert overlaps
2026-04-08 20:24:23 +09:00
George Hotz
35e3983840 Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644) 2026-04-08 17:16:19 +08:00
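Per my reading of the GGML/llama.cpp quantization spec (an assumption, not taken from this commit), a Q5_0 block holds 32 values: an fp16 scale `d`, a 32-bit word `qh` of packed 5th bits, and 16 bytes of low nibbles `qs`, with each value decoded as `d * (q - 16)`. A pure-Python dequantizer sketch under that layout:

```python
import struct

def dequant_q5_0(block: bytes) -> list[float]:
    # one 22-byte Q5_0 block: fp16 scale d, uint32 of high bits qh, 16 nibble bytes qs
    d, = struct.unpack_from("<e", block, 0)
    qh, = struct.unpack_from("<I", block, 2)
    qs = block[6:22]
    out = [0.0] * 32
    for j in range(16):
        hi0 = ((qh >> j) & 1) << 4          # 5th bit of element j
        hi1 = ((qh >> (j + 16)) & 1) << 4   # 5th bit of element j+16
        out[j]      = d * (((qs[j] & 0x0F) | hi0) - 16)  # low nibble
        out[j + 16] = d * (((qs[j] >> 4)   | hi1) - 16)  # high nibble
    return out

# all-zero quants with d=1.0 decode to -16.0 everywhere
block = struct.pack("<e", 1.0) + b"\x00" * 20
assert dequant_q5_0(block) == [-16.0] * 32
```

Q5_1 is the same idea with a separate fp16 minimum instead of the fixed -16 offset.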
qazal
39a029ec55 remove ASM_GEMM context var (#15645) 2026-04-08 18:02:40 +09:00