tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
Christopher Milan	b47397ab17	list ml_dtypes as dependency for DSP (#14562 ) * pin onnxruntime to 1.23.2 for DSP * list ml_dtypes instead This reverts commit `84bb2cc0fc`.	2026-02-05 14:27:50 -05:00
chenyu	2b47a9a1b5	skip test_xlm_roberta_large (#14563 ) symlink model not allowed in latest onnxruntime	2026-02-05 14:00:24 -05:00
chenyu	42c18da88a	add Ops asserts in toposort sched_sink [pr] (#14561 ) more explicit	2026-02-05 12:40:02 -05:00
nimlgen	483bba4f05	nv: use prof_exec_counter (#14559 )	2026-02-05 19:00:14 +03:00
qazal	190042358f	llama: faster bf16 matmul / rope backward (#14558 )	2026-02-05 23:57:25 +09:00
George Hotz	b398335f62	assembly/amd: fix saturation in python remu (#14557 ) * PYTHONREMU: failing test for V_SUB_NC_U32_E64 clamp * fix saturation in PYTHON_REMU * simpler * more tests, less lines --------- Co-authored-by: Christopher Milan <chrismilan@ucla.edu>	2026-02-05 18:35:57 +08:00
wozeparrot	c1ea6687e5	fa: simpler is faster (#14548 )	2026-02-05 01:13:17 -08:00
George Hotz	43e7eda4e7	grad_b uses custom gemm (#14550 ) * grad_b uses custom gemm * fix multi backward, acc is in float32 * test_gemm_batched * square gemm --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com> Co-authored-by: qazal <qazal.software@gmail.com>	2026-02-05 15:22:27 +09:00
qazal	f9cfb64cd9	test asm_gemm in CI (#14551 ) * test asm_gemm in CI * default float16 * use a smaller shape for multi * smaller size * smaller for CI * smaller for ci * need half	2026-02-05 13:32:22 +09:00
chenyu	c0ca7f9c51	use more UOp.sum and UOp.prod [pr] (#14549 )	2026-02-04 22:05:20 -05:00
chenyu	e8dace41b6	clean up UOp.vars [pr] (#14547 )	2026-02-04 20:52:25 -05:00
Christopher Milan	232848d086	PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 (#14546 ) * PYTHONREMU: VOP3P integer operations with constants don't cast to fp16 * put that back * cleaner * do that once	2026-02-04 20:10:59 -05:00
wozeparrot	2966619834	feat: llama uses enable_gqa during training (#14545 )	2026-02-04 16:22:31 -08:00
chenyu	664f1bf76d	minor ops/jit cleanups [pr] (#14543 )	2026-02-04 17:21:34 -05:00
chenyu	03d0fa9c3f	merge as_buf into buf_uop [pr] (#14541 )	2026-02-04 16:32:23 -05:00
chenyu	43ef24a8af	remove buf_target [pr] (#14540 ) not really needed	2026-02-04 15:03:47 -05:00
chenyu	8b7343b950	clean up is_realized [pr] (#14538 ) base cannot be Ops.MULTI since MULTI is a view now	2026-02-04 14:24:10 -05:00
Christopher Milan	5338ce6b74	test S_PACK in extra/assembly/amd/test/hw (#14537 ) * S_PACK_LL_B32_B16 in test/hw * add rest of S_PACK instructions	2026-02-04 14:17:16 -05:00
chenyu	9052db678f	remove allow_shape_mismatch in Tensor.replace (#14536 ) move all logic to torch_backend and not hacking Tensor method	2026-02-04 12:38:18 -05:00
nimlgen	ec2b6bbda8	hcq: update signal logic (#14531 )	2026-02-04 19:32:56 +03:00
nimlgen	62786d488a	am: mi3xx perf (#14529 )	2026-02-04 19:32:43 +03:00
chenyu	d57d24c7d4	Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535 ) it casts to memoryview. also inline the as_typed_buffer checks to Tensor._data	2026-02-04 11:31:11 -05:00
chenyu	024f57ecf5	jit input_buffers cleanup [pr] (#14532 )	2026-02-04 10:14:38 -05:00
chenyu	67f91e897b	UOp.is_contiguous -> UOp.has_buffer_identity [pr] (#14530 ) one more confusing buffer related method, but it's definitely not is_contiguous	2026-02-04 09:21:26 -05:00
George Hotz	fb9df1e031	pretty print binary (#14520 )	2026-02-04 18:04:35 +08:00
Christopher Milan	8c3c026d86	decomp float16 to float32 (#14417 ) * decomp float16 to float32 * denormals arent zero * add test * denormals are zero * fix * oops * bitcast works * fix LOADs * test_dtype passing * cleanup * mypy * debug print * only emulate if EMULATED * very ugly, but passes spec * add test_dtype_alu tests * Revert "very ugly, but passes spec" This reverts commit `fdc3999b65`. * bottom up decompositions * that should have symbolic * simplify a bit * SPEC really works * run with DEBUG * debug=4 * rm debug	2026-02-04 01:37:47 -05:00
Christopher Milan	ecbce5269e	PYTHONREMU properly supports S_PACK_LL_B32_B16 (#14527 ) * PYTHONREMU properly supports S_PACK_LL_B32_B16 * default	2026-02-03 23:45:33 -05:00
wozeparrot	720c9597a9	feat: llama uses is_causal on sdpa during training (#14528 )	2026-02-03 20:24:30 -08:00
chenyu	9c2fc118ef	relax setitem target check (#14526 ) old check was too conservative	2026-02-03 22:32:49 -05:00
qazal	d1bfbe9ce3	isolate slow llama gemm (#14525 )	2026-02-04 12:20:10 +09:00
nimlgen	2f55005ad9	qcom: sync cpu cache when from_blob (#14518 ) * um * fx * d * x * x * x * x * f * ren	2026-02-03 21:51:03 +03:00
chenyu	ee9d6a1f36	remove DEFINE_VAR in to_define_global [pr] (#14522 ) not needed	2026-02-03 10:12:33 -05:00
Nino Risteski	af4c74bb41	delete extra cast (#14517 )	2026-02-03 08:29:04 -05:00
chenyu	9d1e9e643e	removed a duplicated remove_bufferize rule [pr] (#14519 )	2026-02-03 08:28:07 -05:00
George Hotz	d59e6e7a37	move more tests to test/null, split some existing ones (#14512 ) * move more tests to test/null, split some existing ones * null work * null work * move more * fixes * move PIL * PIL in CLIP * don't move that	2026-02-03 20:20:20 +08:00
qazal	a98c53769a	ASM_GEMM=1 runs the UOp gemm on non cdna (#14516 ) * ASM_GEMM=1 runs the UOp gemm on non cdna tests run on mac in 3 seconds * min diff	2026-02-03 20:42:02 +09:00
qazal	5c1d21349e	viz: profiler command line tool (#14515 )	2026-02-03 19:51:25 +09:00
George Hotz	dd2de4f838	rename all DEFINE_GLOBAL to PARAM (#14511 )	2026-02-03 15:09:38 +08:00
George Hotz	dc77b3318b	move files that pass with NULL=1 to test/null (#14508 ) * move files that pass with NULL=1 to test/null * fix windows * cpu 0 * bugfix + durations	2026-02-03 13:52:36 +08:00
George Hotz	888819ee09	call autodiff gradient (#14510 )	2026-02-03 13:51:02 +08:00
wozeparrot	bbcd3d67a3	fa: faster (#14453 )	2026-02-02 21:34:17 -08:00
Christopher Milan	e579613b90	IR3 has aux (#14509 )	2026-02-02 23:46:41 -05:00
George Hotz	85c7b23160	add pytest -nauto to benchmark for mac (#14458 ) * add pytest -nauto to benchmark * 3 minute timeout * 3 min * setup env * comment * fresh db * in the pyenv	2026-02-03 12:26:09 +08:00
Christopher Milan	a5d7eb37db	IR3 works on versions earlier than 3.14 (#14507 )	2026-02-02 23:10:19 -05:00
George Hotz	33c886cafa	disable copyout on NULL backend by default (#14506 ) * disable copyout on NULL backend * gate it * allow copyout on some tests	2026-02-03 11:57:47 +08:00
chenyu	3c5845e8a5	remove cut_store_range (#14505 ) special scheduling for CPU	2026-02-02 21:58:36 -05:00
chenyu	4f2e7aed24	fix multiple REDUCE on same RANGE (#14504 ) each RANGE maps to one END, but reduce_to_acc is local and would not know this	2026-02-02 20:42:09 -05:00
chenyu	93c41a78fa	clean up NOOP [pr] (#14503 ) should not be used as a COPY, started with removing from ALWAYS_RUN_OPS	2026-02-02 19:46:45 -05:00
chenyu	66d2b02f11	delete files that depends on extra.optimization.helpers (#14499 )	2026-02-02 13:33:33 -05:00
George Hotz	ec0398fceb	test amd gpu crashes (#14459 ) * test amd gpu crashes * cleanup * less sketch tests	2026-02-02 18:57:47 +03:00

12053 Commits