Commit Graph

11963 Commits

Author SHA1 Message Date
nimlgen
486d53d646 device: call free for external_ptr (#14448)
* device: call free for external_ptr

* lin
2026-01-30 23:53:17 +03:00
nimlgen
e0978498dc amd: read_ptr/write_ptr/doorbells are not lists (#14445) 2026-01-30 23:11:57 +03:00
Christopher Milan
1803ee939d EMULATED_DTYPES=long works with CPU_LLVM (#14446) 2026-01-30 13:54:43 -05:00
chenyu
03613e83ad update TestTensorMetadata (#14443)
run with SCACHE=0; some more TODOs
2026-01-30 12:39:01 -05:00
George Hotz
cbb1eed57b hotfix: partial revert of 9eb449f88, caused llama NaN 2026-01-30 17:19:27 +00:00
chenyu
26f5c00265 move TestTensorMetadata to unit (#14442) 2026-01-30 12:14:21 -05:00
chenyu
c05a0b85ae flip unique const src order [pr] (#14441)
* flip unique const src order [pr]

matches buffer, simplifies replace_input_buffer

* combine rules
2026-01-30 11:44:18 -05:00
George Hotz
ee2c78709d mlperf/llama: disable USE_ATOMICS for now 2026-01-31 00:42:08 +08:00
chenyu
beecac4d85 expand ranges -> unroll outer ranges [pr] (#14440) 2026-01-30 11:26:05 -05:00
chenyu
9eb449f882 clean up toposort sched_sink [pr] (#14439) 2026-01-30 10:18:28 -05:00
George Hotz
838cd078bc use atomics for embedding backward (#14400)
* embedding is slow

* failing

* float is fine

* null

* it fails

* simplify embedding with broadcasting

* ATOMIC_ADD incoming

* min change

* simpler test

* better test

* fix test

* real test

* simpler

* cleanups

* types and names

* _zero_kernel

* grad multi

* hack

* none

* multi unshard

* more for call

* don't tag in call

* good

* call_multi

* call_multi wow claude is useless

* embedding backward multi test

* test passes

* fix as_param

* shape_to_shape_arg

* add clip

* before cast

* fix spec=2, use atomics
2026-01-30 18:10:59 +08:00
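The entry above moves embedding backward onto atomic adds. A pure-Python sketch of the scatter-add the backward pass computes (illustrative only, not tinygrad's kernel): positions that share a token id update the same gradient row, which is why a parallel kernel wants ATOMIC_ADD.

```python
# Sketch: embedding backward is a scatter-add into the weight gradient.
def embedding_backward(grad_out: list[list[float]], idxs: list[int],
                       vocab_size: int, dim: int) -> list[list[float]]:
  dW = [[0.0] * dim for _ in range(vocab_size)]
  for i, tok in enumerate(idxs):
    for j in range(dim):
      # repeated token ids collide on dW[tok]; this += is the contended
      # update that ATOMIC_ADD makes safe when done in parallel
      dW[tok][j] += grad_out[i][j]
  return dW
```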
nimlgen
1998e0bb28 nv: add prof props to dev (#14437) 2026-01-30 12:51:43 +03:00
George Hotz
7a9dee4e50 add call/param UOps (#14433)
* add call/param UOps

* resolve call

* skip that for now

* grad on call

* fix tests
2026-01-30 14:51:45 +08:00
qazal
66d6a68016 viz: sqtt work from cdna gemm (#14434)
* it's the tag

* initialize rows based on the disasm

* test_cfg with Ops.BINARY

* pyremu wants s_code_end?

* test_diamond

* diff cleanup
2026-01-30 14:00:56 +09:00
Christopher Milan
88caf57ef4 ci: unify python versions (#14430) 2026-01-29 21:42:03 -05:00
chenyu
86a204d22a allow Tensor setitem input to be list/tuple (#14432)
matches assign, and generally matches numpy
2026-01-29 21:26:58 -05:00
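A minimal sketch of the new behavior (assuming a contiguous, realized target, which tinygrad's setitem generally wants):

```python
from tinygrad import Tensor

t = Tensor.zeros(4).contiguous()
t[1:3] = [1.0, 2.0]     # list rhs, matching assign and numpy
t[0:1] = (5.0,)         # tuple rhs works the same way
print(t.tolist())       # [5.0, 1.0, 2.0, 0.0]
```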
chenyu
4a80319093 clean up split_store final logic [pr] (#14429)
explicitly check the structure
2026-01-29 18:40:07 -05:00
Christopher Milan
e47f12f671 ci: replace testing_minimal with testing_unit (#14427) 2026-01-29 18:02:43 -05:00
wozeparrot
c2fb8b208f fa: 32 block size (#14416) 2026-01-29 13:59:13 -08:00
chenyu
a979fafae5 cleanup around disk buffer [pr] (#14428)
style change, prep for refactor
2026-01-29 16:18:44 -05:00
nimlgen
dc977a03b0 nv_pma: bw decoder (#14424)
* nv_pma: bw decoder

* decoder fix

* better
2026-01-30 00:12:39 +03:00
chenyu
ddc041854b failed test case for disk setitem (#14426)
strided setitem is wrong
2026-01-29 14:54:19 -05:00
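For context, the failing pattern is a strided (hence non-contiguous) write into a disk-backed tensor, roughly like this sketch (the file path is illustrative):

```python
from tinygrad import Tensor

# a disk tensor lives in a file; "disk:<path>" selects the disk device
t = Tensor.empty(8, device="disk:/tmp/scratch.bin")
t[::2] = Tensor.ones(4)  # strided setitem: the case the new test marks as wrong
```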
chenyu
31706bf6bc add a few more types [pr] (#14425) 2026-01-29 14:04:09 -05:00
nimlgen
2d5c24879f nv: pma for 5090 (#14420)
* nv: pma for 5090

* hm

* 4090
2026-01-29 20:06:01 +03:00
nimlgen
c8dc6332d2 memory: read_fields is not universal (#14348) 2026-01-29 20:00:00 +03:00
chenyu
dbe8f034a7 pass z3.Context in validate ctx [pr] (#14423)
does not need to pass the whole solver
2026-01-29 11:11:47 -05:00
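The point of the change: z3 expressions only need a z3.Context to be built, so the validator can take the context instead of the whole solver. A minimal sketch of that API shape:

```python
import z3

ctx = z3.Context()        # z3 terms are tied to a Context, not a Solver
x = z3.Int("x", ctx=ctx)  # building expressions needs only the context
s = z3.Solver(ctx=ctx)    # a solver can still be constructed from it later
s.add(x > 0)
assert s.check() == z3.sat
```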
chenyu
033ce1b885 types for validate.py (#14422) 2026-01-29 10:56:50 -05:00
nimlgen
230d08ec70 test for am recovery and faults handling (#14421)
* test for am recovery and faults handling

* linter
2026-01-29 17:11:24 +03:00
George Hotz
793afbd473 simplify nn.Embedding, support AFTER in CUSTOM_KERNEL (#14419) 2026-01-29 17:22:13 +08:00
Christopher Milan
0c855d6149 ci: remove unused pydeps (#14418) 2026-01-29 01:51:26 -05:00
wozeparrot
4845e42135 llama3 gradacc fixes (#14414) 2026-01-28 19:12:39 -08:00
chenyu
37cde4a01a add one line mypy report (#14415) 2026-01-28 20:39:32 -05:00
chenyu
15aed51544 return types for all math.py functions (#14413)
calling int() on a sint returns an int; I think this is better supported since some UOps can be safely cast to int
2026-01-28 20:10:11 -05:00
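A minimal sketch of the typing idea, assuming sint is tinygrad's usual union of int and UOp (the UOp below is a stand-in):

```python
from typing import Union

class UOp:                                  # stand-in for tinygrad's UOp
  def __init__(self, v: int): self.v = v
  def __int__(self) -> int: return self.v  # assumes a safely castable (const) UOp

sint = Union[int, UOp]

def to_int(x: sint) -> int:
  return int(x)  # annotated -> int, so callers see int rather than sint
```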
nimlgen
aec1ae0de1 llama: set manual_seed (#14409) 2026-01-28 14:40:00 -08:00
chenyu
0870ed28b1 add Self type to MathMixin (#14411)
these don't cause errors
2026-01-28 16:59:38 -05:00
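A minimal sketch of what Self buys on a mixin (the method name here is made up; typing.Self is Python 3.11+, with a typing_extensions backport):

```python
from typing import Self  # Python 3.11+

class MathMixin:
  def detach_like(self) -> Self:
    # Self makes subclasses get their own type back, not MathMixin
    return self

class Tensor(MathMixin): pass

t: Tensor = Tensor().detach_like()  # type-checks: Self resolves to Tensor
```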
chenyu
079f33c208 fix type in Tensor.mean and Tensor.var (#14410)
use Tensor.from_uop to wrap a UOp from a symbolic shape; the kernels are the same
2026-01-28 15:24:02 -05:00
chenyu
2b5e99ccc1 minor type cleanups [pr] (#14408)
mypy --warn-redundant-casts has false negatives
2026-01-28 14:11:50 -05:00
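For reference, the kind of redundancy the flag is meant to catch; the false negatives are the cases it misses:

```python
from typing import cast

def f(x: int) -> int:
  return cast(int, x)  # x is already int: --warn-redundant-casts should flag this
```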
chenyu
726415dbc8 import sint directly in movement.py TYPE_CHECKING (#14406)
avoids creating a string TypeAlias; fixed a warning in `TYPED=1 python test/test_tiny.py`
2026-01-28 12:47:26 -05:00
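A minimal sketch of the pattern (the import path is an assumption): pulling sint in only under TYPE_CHECKING, with string annotations, means no string TypeAlias object exists at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
  from tinygrad.uop.ops import sint  # assumed path; never imported at runtime

def pad_to(shape: tuple["sint", ...]) -> tuple["sint", ...]:
  # "sint" resolves only while type checking; nothing is created at runtime
  return shape
```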
nimlgen
acb2fc36ba nv_pma: add decoder (#14404)
* nv_pma: add decoder

* cl
2026-01-28 20:44:02 +03:00
chenyu
7b9bc1d8cf _MockMemoryviewMeta for mockgpu (#14405)
fixed `PYTHONPATH=. TYPED=1 DEV=AMD MOCKGPU=1 python test/test_tiny.py`; basically makes `isinstance(TrackedMemoryView_instance, memoryview)` true
2026-01-28 11:59:00 -05:00
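A minimal sketch of the metaclass trick, with hypothetical stand-ins for mockgpu's types: a metaclass __instancecheck__ lets a swapped-in memoryview class accept trackers and real memoryviews alike.

```python
class TrackedMemoryView:                    # stand-in for mockgpu's wrapper
  def __init__(self, mv: memoryview): self.mv = mv

class _MockMemoryviewMeta(type):
  # isinstance(x, <this class>) passes for real memoryviews and trackers
  def __instancecheck__(cls, inst):
    return isinstance(inst, (memoryview, TrackedMemoryView))

class mockmemoryview(metaclass=_MockMemoryviewMeta): pass

# the runtime would patch the name memoryview to mockmemoryview (an assumption)
assert isinstance(memoryview(b"x"), mockmemoryview)
assert isinstance(TrackedMemoryView(memoryview(b"x")), mockmemoryview)
```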
chenyu
93793a645b use cl.cl_mem instead of internal ctypes._CData (#14403)
fixed `CHECK_OOB=0 DEV=CL TYPED=1 python test/test_tiny.py`
2026-01-28 10:56:41 -05:00
chenyu
a9b44070a8 fix webgpu runtime types (#14402)
`CHECK_OOB=0 DEV=WEBGPU TYPED=1 python test/test_tiny.py` passed; also skips tests that failed locally
2026-01-28 10:37:25 -05:00
George Hotz
0c6b3f50aa add marker to llama training (#14401) 2026-01-28 22:44:28 +08:00
Jakob Sachs
2b7c00d3d2 fix sd-example dtype for CLIP embeddings (#14397) 2026-01-28 09:07:19 -05:00
qazal
a5a9ce3fdf viz: disasm cleanups from null emulate (#14399)
* it's AMDHIPRenderer

* don't need that indent

* less assignment stuff

* that arg order did not make sense

* pmc
2026-01-28 22:03:30 +09:00
nimlgen
544928766d hcq_smi: kill mac pids (#14398) 2026-01-28 15:00:28 +03:00
George Hotz
202b74b369 assembly/amd: continue refactors (#14386)
* simpler

* merge

* flat

* no ctx

* use the correct apis

* dup code

* write clean code

* remove bad helpers

* bits junk remove

* junk remove

* smem test

* fix tests

* correct fix + tests

* Fmt matters it seems

* wmma refactor

* a lil more

* kimi cleanups

* line
2026-01-28 17:33:03 +08:00
qazal
5bffa17f82 llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience (#14395)
* beam opens devices

* switch to hip renderer

* amd: true?

* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
qazal
0294014108 fix bufferize cost function for multi, improve VIZ=-1 cli (#14394)
* improve cli

* remove_bufferize change
2026-01-28 15:53:18 +09:00
qazal
c158acea29 failing multi ram usage test from llama gemm (#14392) 2026-01-28 14:32:32 +09:00