qazal
616e9c1483
CDNA assembly gemm in tensor.py with flag ( #14310 )
...
* work
* work
* the assembly
* remove the old one
* remove ws bufs, assert splitk
* notes cleanup
* work
* gemm args
* gemm in mixins would be nice
* add gemm gradient
* print counters
* the realize is for DEBUG=2 aesthetics
* dedup
* rewrite to python dsl, no list copies
* leave that
* add B, M, N, K to gemm name
* it's M0 not NULL
* fp16 support
* test cleanup + more gemms
* work from viz
* more work
* gemm batch_size
* xccg path work
* tiny comments on the label naming
* s_waitcnt
2026-01-31 22:34:14 +09:00
chenyu
55f806b713
tighter late_buffer_view match [pr] ( #14456 )
...
src must be len 2 at that point
2026-01-31 07:28:26 -05:00
qazal
d69bc5aa1a
make DEV=NULL EMULATE=AMD amd_asm_matmul run ( #14460 )
2026-01-31 20:45:24 +09:00
qazal
4976544bf9
multi ram usage tests on the NULL device ( #14457 )
2026-01-31 14:14:53 +09:00
chenyu
99b44121bc
failed test case for non-consecutive disk read ( #14455 )
...
silently fail now
2026-01-30 23:44:04 -05:00
George Hotz
b705c9143c
assembly/amd: test more instructions ( #14365 )
...
* assembly/amd: test more instructions
* more
* passing
* revert
* no const fold
* remove junk
* cleaner
2026-01-31 12:40:22 +08:00
George Hotz
c9a3ddb341
benchmark llama walltime script ( #14454 )
...
* benchmark llama walltime script
* adj layers
2026-01-31 10:21:54 +08:00
George Hotz
f5346d6a1a
fix USE_ATOMICS for non float dtypes and make it the default ( #14444 )
...
* embedded multistep test
* complex test
* with jit
* fix dtypes and reenable USE_ATOMICS
* that test didn't catch anything
2026-01-31 09:44:16 +08:00
Christopher Milan
e575dd8275
prevent UB in long decomp and more emulated tests ( #14447 )
2026-01-30 19:38:41 -05:00
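Context for the entry above: "long decomp" refers to decomposing 64-bit (`long`) operations into 32-bit halves for backends that emulate them (see the `EMULATED_DTYPES=long` entries below), and the classic UB hazards in a C decomposition are signed overflow and over-wide shifts. A minimal Python sketch of the idea, assuming (hypothetically) that a 64-bit add is emulated as two 32-bit limbs with an explicit carry; the actual tinygrad decomposition may differ:

```python
MASK32 = 0xFFFFFFFF

def add64(a_lo, a_hi, b_lo, b_hi):
    # Emulate a 64-bit add using two 32-bit limbs. All arithmetic is done on
    # masked unsigned values, so no signed overflow can occur (the UB hazard
    # in the equivalent C code).
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < (a_lo & MASK32) else 0
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi

def split64(x):
    # Split a 64-bit value into (low, high) 32-bit limbs.
    return x & MASK32, (x >> 32) & MASK32

lo, hi = add64(*split64(0xFFFFFFFF), *split64(1))
print(hex((hi << 32) | lo))  # -> 0x100000000 (carry propagated into the high limb)
```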
chenyu
3204f94454
correct var_vals schedule filter ( #14451 )
...
complete_create_schedule_with_vars returns var_vals that is used in the schedule
2026-01-30 17:10:07 -05:00
chenyu
cfcd1debb5
test schedule with multiple AFTER ( #14449 )
2026-01-30 15:59:00 -05:00
nimlgen
486d53d646
device: call free for external_ptr ( #14448 )
...
* device: call free for external_ptr
* lin
2026-01-30 23:53:17 +03:00
nimlgen
e0978498dc
amd: read_ptr/write_ptr/doorbells are not lists ( #14445 )
2026-01-30 23:11:57 +03:00
Christopher Milan
1803ee939d
EMULATED_DTYPES=long works with CPU_LLVM ( #14446 )
2026-01-30 13:54:43 -05:00
chenyu
03613e83ad
update TestTensorMetadata ( #14443 )
...
run with SCACHE=0 some more TODOs
2026-01-30 12:39:01 -05:00
George Hotz
cbb1eed57b
hotfix: partial revert of 9eb449f88, caused llama NaN
2026-01-30 17:19:27 +00:00
chenyu
26f5c00265
move TestTensorMetadata to unit ( #14442 )
2026-01-30 12:14:21 -05:00
chenyu
c05a0b85ae
flip unique const src order [pr] ( #14441 )
...
* flip unique const src order [pr]
matches buffer, simplifies replace_input_buffer
* combine rules
2026-01-30 11:44:18 -05:00
George Hotz
ee2c78709d
mlperf/llama: disable USE_ATOMICS for now
2026-01-31 00:42:08 +08:00
chenyu
beecac4d85
expand ranges -> unroll outer ranges [pr] ( #14440 )
2026-01-30 11:26:05 -05:00
chenyu
9eb449f882
clean up toposort sched_sink [pr] ( #14439 )
2026-01-30 10:18:28 -05:00
George Hotz
838cd078bc
use atomics for embedding backward ( #14400 )
...
* embedding is slow
* failing
* float is fine
* null
* it fails
* simplify embedding with broadcasting
* ATOMIC_ADD incoming
* min change
* simpler test
* better test
* fix test
* real test
* simpler
* cleanups
* types and names
* _zero_kernel
* grad multi
* hack
* none
* multi unshard
* more for call
* don't tag in call
* good
* call_multi
* call_multi wow claude is useless
* embedding backward mutli test
(typo in the commit bullet above: "mutli" should read "multi")
* test passes
* fix as_param
* shape_to_shape_arg
* add clip
* before cast
* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
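The entry above moves embedding backward to atomic adds. The core problem is that duplicate token indices must accumulate into the same gradient row, and a plain scatter drops those duplicates. A minimal numpy sketch of that failure mode (numpy stands in for tinygrad here; `np.add.at` plays the role of the on-device ATOMIC_ADD):

```python
import numpy as np

# Hypothetical shapes: vocab of 4, embedding dim of 3, a batch of 5 token ids.
vocab_size, dim = 4, 3
idx = np.array([1, 2, 1, 0, 1])            # note index 1 appears three times
grad_out = np.ones((len(idx), dim))        # upstream gradient, one row per token

# Naive scatter: with duplicate indices, numpy buffers the writes and only the
# last one survives -- the same lost-update race atomics prevent on a GPU.
bad = np.zeros((vocab_size, dim))
bad[idx] += grad_out
print(bad[1])    # -> [1. 1. 1.], wrong: should have accumulated 3 contributions

# np.add.at performs an unbuffered, accumulating scatter-add: the serial
# equivalent of doing each += with an atomic add.
grad_w = np.zeros((vocab_size, dim))
np.add.at(grad_w, idx, grad_out)
print(grad_w[1])  # -> [3. 3. 3.]
```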
nimlgen
1998e0bb28
nv: add prof props to dev ( #14437 )
2026-01-30 12:51:43 +03:00
George Hotz
7a9dee4e50
add call/param UOps ( #14433 )
...
* add call/param UOps
* resolve call
* skip that for now
* grad on call
* fix tests
2026-01-30 14:51:45 +08:00
qazal
66d6a68016
viz: sqtt work from cdna gemm ( #14434 )
...
* it's the tag
* initialize rows based on the disasm
* test_cfg with Ops.BINARY
* pyremu wants s_code_end?
* test_diamond
* diff cleanup
2026-01-30 14:00:56 +09:00
Christopher Milan
88caf57ef4
ci: unify python versions ( #14430 )
2026-01-29 21:42:03 -05:00
chenyu
86a204d22a
allow Tensor setitem input to be list/tuple ( #14432 )
...
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
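The commit body above says the new `Tensor` setitem behavior "matches assign, and generally matches numpy". For reference, the numpy behavior being matched (numpy shown here, not tinygrad itself):

```python
import numpy as np

t = np.zeros((2, 3))
t[0] = [1, 2, 3]   # a plain list on the RHS is converted like an array
t[1] = (4, 5, 6)   # tuples behave the same way
print(t)           # -> [[1. 2. 3.] [4. 5. 6.]]
```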
chenyu
4a80319093
clean up split_store final logic [pr] ( #14429 )
...
explicitly check the structure
2026-01-29 18:40:07 -05:00
Christopher Milan
e47f12f671
ci: replace testing_minimal with testing_unit ( #14427 )
2026-01-29 18:02:43 -05:00
wozeparrot
c2fb8b208f
fa: 32 block size ( #14416 )
2026-01-29 13:59:13 -08:00
chenyu
a979fafae5
cleanup around disk buffer [pr] ( #14428 )
...
style change, prep for refactor
2026-01-29 16:18:44 -05:00
nimlgen
dc977a03b0
nv_pma: bw decoder ( #14424 )
...
* nv_pma: bw decoder
* decoder fix
* better
2026-01-30 00:12:39 +03:00
chenyu
ddc041854b
failed test case for disk setitem ( #14426 )
...
strided setitem is wrong
2026-01-29 14:54:19 -05:00
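The failing test above pins down strided setitem semantics for disk buffers. The expected behavior is what numpy does for an in-memory array: a stride-N slice assignment writes only every Nth element, leaving the gaps untouched. A small numpy illustration of those semantics (numpy as the reference; the commit reports the disk device gets this wrong):

```python
import numpy as np

buf = np.zeros(8, dtype=np.int32)
buf[::2] = np.array([1, 2, 3, 4], dtype=np.int32)  # stride-2 setitem
print(buf)  # -> [1 0 2 0 3 0 4 0], only even offsets are written
```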
chenyu
31706bf6bc
add few more types [pr] ( #14425 )
2026-01-29 14:04:09 -05:00
nimlgen
2d5c24879f
nv: pma for 5090 ( #14420 )
...
* nv: pma for 5090
* hm
* 4090
2026-01-29 20:06:01 +03:00
nimlgen
c8dc6332d2
memory: read_fields is not universal ( #14348 )
2026-01-29 20:00:00 +03:00
chenyu
dbe8f034a7
pass z3.Context in validate ctx [pr] ( #14423 )
...
does not need to pass the whole solver
2026-01-29 11:11:47 -05:00
chenyu
033ce1b885
types for validate.py ( #14422 )
2026-01-29 10:56:50 -05:00
nimlgen
230d08ec70
test for am recovery and faults handling ( #14421 )
...
* test for am recovery and faults handling
* linter
2026-01-29 17:11:24 +03:00
George Hotz
793afbd473
simplify nn.Embedding, support AFTER in CUSTOM_KERNEL ( #14419 )
2026-01-29 17:22:13 +08:00
Christopher Milan
0c855d6149
ci: remove unused pydeps ( #14418 )
2026-01-29 01:51:26 -05:00
wozeparrot
4845e42135
llama3 gradacc fixes ( #14414 )
2026-01-28 19:12:39 -08:00
chenyu
37cde4a01a
add one line mypy report ( #14415 )
2026-01-28 20:39:32 -05:00
chenyu
15aed51544
return types for all math.py function ( #14413 )
...
calling int() on sint -> int; I think it's better supported, since some UOps can be safely cast to int
2026-01-28 20:10:11 -05:00
nimlgen
aec1ae0de1
llama: set manual_seed ( #14409 )
2026-01-28 14:40:00 -08:00
chenyu
0870ed28b1
add Self type to MathMixin ( #14411 )
...
these don't cause errors
2026-01-28 16:59:38 -05:00
chenyu
079f33c208
fix type in Tensor.mean and Tensor.var ( #14410 )
...
use Tensor.from_uop to wrap UOp from symbolic shape, kernels are the same
2026-01-28 15:24:02 -05:00
chenyu
2b5e99ccc1
minor type cleanups [pr] ( #14408 )
...
mypy --warn-redundant-casts has false negative
2026-01-28 14:11:50 -05:00
chenyu
726415dbc8
import sint directly in movement.py TYPE_CHECKING ( #14406 )
...
avoid creating string TypeAlias, fixed warning in `TYPED=1 python test/test_tiny.py`
2026-01-28 12:47:26 -05:00
nimlgen
acb2fc36ba
nv_pma: add decoder ( #14404 )
...
* nv_pma: add decoder
* cl
2026-01-28 20:44:02 +03:00