Commit Graph

929 Commits

Author / SHA1 / Message / Date
nimlgen
25440f0f72 all2all (#13902)
* all2all

* um

* fix

* x

* um

* simler

* mypy

* fix

* t

* cmnts
2025-12-31 16:38:32 +03:00
George Hotz
43c6e973d8 add optional compiler in Renderer (#13817)
* add optional compiler in Renderer [pr]

* fix

* late init

* remove precompiled

* cleanup
2025-12-23 17:58:46 -05:00
nimlgen
90b217896f am: xgmi p2p (#13811)
* system: use addr space

* am: xgmi

* fix

* ugh
2025-12-23 20:11:38 +03:00
George Hotz
8dcba2e2cc no full_rewrite [pr] (#13809)
* no full_rewrite [pr]

* fix

* fix docs
2025-12-22 23:20:01 -05:00
chenyu
7f1d41c9f9 delete files that import ShapeTracker (#13805) 2025-12-22 15:54:18 -05:00
George Hotz
45c459848d remove more stale stuff (#13765)
* remove more stale stuff

* remove disassemblers/adreno

* stale
2025-12-19 17:14:56 -04:00
George Hotz
744af193f0 remove ScheduleItem and merge it with ExecItem (#13759)
* remove ExecItem and merge it with ScheduleItem

* less diff

* fix issues

* min diff

* don't change bufs in _lower

* min diff

* update

* revert

* fixes

* diff
2025-12-19 17:04:24 -04:00
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
George Hotz
4b741e893f remove REMOTE=1 (#13722)
* remove REMOTE=1

* leave ibverbs
2025-12-16 15:58:10 -04:00
nimlgen
e36385e570 am: support xgmi systems (#13659)
* am: support xgmi systems

* fake_am
2025-12-12 18:55:45 +03:00
Douglas Nyberg
947c6eefc3 add Swish op (#13541)
* add Swish ONNX operator

* add Swish regression test

* remove trailing whitespace

* upgrade ONNX to 1.20, add excludes for unimplemented ops

* upgrade ONNX to 1.19, add Swish op

* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op

* exclude attention_3d and attention_4d_gqa tests

* exclude attention fp16 tests

* exclude all attention tests

* retrigger CI

* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
George Hotz
c5bd28e21d start work on schedule cache (#13529)
* start work on schedule cache

* local unique

* schedule cache works

* schedule cache cleanup

* fix tests

* preserve metadata

* oops, fix cache

* put that there

* fix spec

* always miss

* why is that broken?

* src[0].op

* fix process replay

* delete abstractions2

* reenable the actual schedule cache

* metadata is best effort

* fix JIT in examples/gradaccum_mnist.py

* full jit

* fixed and test is real
2025-12-04 17:24:49 -08:00
Douglas Nyberg
a8a62bc08e add max/min reduction support to ScatterND (#13562) 2025-12-04 00:53:47 -08:00
Douglas Nyberg
f5abd38132 remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS (#13555) 2025-12-03 17:48:27 -05:00
George Hotz
055d5aeb7f add external_test_process_count 2025-12-02 17:26:30 -08:00
qazal
366badaa68 require renderer argument in get_program, removes device opening in process replay [pr] (#13524) 2025-12-03 02:05:31 +08:00
George Hotz
6a140f74fe split out unique_const and cache const [pr] (#13493)
* split out unique_const

* add cache to const

* call const in unique_const
2025-11-29 10:44:28 -08:00
George Hotz
18addc0a1d process replay only get_program (#13475) 2025-11-27 08:18:18 -08:00
George Hotz
a8e005b095 enable process replay (non-checking) by default (#13474) 2025-11-27 07:28:44 -08:00
George Hotz
05cd2279d0 add cache on reshape (#13466)
* remove cache on divmod, way less objects

* _apply_reshape

* reshape

* no gc on realize

* wow that cache is fast
2025-11-26 18:57:40 -08:00
Sieds Lykles
63a931ff76 Symbolic divisor fuzzer (#13433)
* render z3 range better

* working version

* rename

* add to workflow

* factor out variable_names

* smaller expressions

* smaller

* + back
2025-11-23 20:29:32 +01:00
qazal
9dcd52287a add external_benchmark_pyrender (#13378)
* add external_benchmark_pyrender

* can ctrlc it

* cpu_profile exists
2025-11-20 17:38:28 +08:00
George Hotz
8919c994b7 Revert "AxisType.PLACEHOLDER in reshape to do less graph_rewrite (#13373)" (#13375)
This reverts commit ac7559e33d.
2025-11-19 19:34:30 -08:00
George Hotz
ac7559e33d AxisType.PLACEHOLDER in reshape to do less graph_rewrite (#13373)
* AxisType.PLACEHOLDER in reshape to do less graph_rewrite

* _apply_movement_op cache
2025-11-19 19:19:58 -08:00
George Hotz
ab7df42c78 bring back fold_divmod_general with bugfix and test [pr] (#13369)
* Revert "Revert "merge to fold_divmod_general [p] (#13359)""

This reverts commit 05ccc69248.

* Revert "Revert "actually merge to fold_divmod_general [pr] (#13363)""

This reverts commit 90e5752199.

* Revert "Revert "add cache to fold_divmod_general (#13365)""

This reverts commit 8e17bd6791.

* bring back fold_divmod_general with bugfix and test
2025-11-19 14:51:51 -08:00
George Hotz
8e17bd6791 Revert "add cache to fold_divmod_general (#13365)"
This reverts commit b5309a5043.
2025-11-19 14:18:08 -08:00
George Hotz
b5309a5043 add cache to fold_divmod_general (#13365) 2025-11-19 13:49:18 -08:00
George Hotz
6fdbd03104 more divmod cleanup [p] (#13358)
* more divmod cleanup [p]

* lil cleanups, faster
2025-11-19 10:35:15 -08:00
George Hotz
385618d45b skip process replay by default (#13353) 2025-11-19 08:25:34 -08:00
George Hotz
6d3385c284 print special ops in postrange (#13318)
* print special ops in postrange

* fix on OSX
2025-11-17 14:43:23 -08:00
George Hotz
e5351699bd openpilot warp (#13283)
* openpilot image warp test

* 0.4 ms on metal, 1 ms on CPU

* new inputs each time

* reshape
2025-11-14 13:55:32 -08:00
wozeparrot
759557f633 feat: move tk tests to testextra (#13242) 2025-11-12 17:06:53 -08:00
Jan Akhremchik
bc8e537423 Add NONZERO op to onnx backend (#13211) 2025-11-12 08:55:51 -08:00
wozeparrot
371c1f2355 tk: move tiles to class (#13224) 2025-11-11 21:53:46 -08:00
wozeparrot
222bb12ddf tk softmax (#13205) 2025-11-11 15:13:16 -08:00
wozeparrot
73497af4c0 clean: use np for allclose (#13204) 2025-11-10 23:02:43 -08:00
chenyu
22b8579234 one last regressed dm kernel (#13201) 2025-11-10 23:30:52 -05:00
chenyu
829cdafccc update openpilot slow conv uop ast (#13197)
the two remaining slow ones
2025-11-10 17:03:20 -05:00
wozeparrot
6252831ceb feat: initial tk library (#13160) 2025-11-09 22:54:29 -08:00
chenyu
2ba8b4946f external_benchmark_op_cat.py (#13168)
* external_benchmark_op_cat.py

cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS

* fix
2025-11-08 01:54:10 -05:00
nimlgen
dafdb4bfb1 test hcq open with pytest (#13124)
* test hcq open with pytest

* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87 system: fix flock on pcidevs (#13123)
* system: fix locking of hcq devices

* rename and fullrun

* force ok

* fix

* fix
2025-11-06 19:02:13 +08:00
chenyu
18d4ecc1f3 lower nv test_gemm_4096 target (#13107) 2025-11-05 11:05:16 -05:00
chenyu
54141e9cb9 DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
chenyu
f6430a0559 add script for one slow openpilot conv (#12953)
* add script for one slow openpilot conv

* fix ruff
2025-10-30 18:08:41 -04:00
George Hotz
f5a3b33d33 add fun with nhwc convs 2025-10-28 17:12:22 +08:00
Sieds Lykles
e22c5e7e73 process_replay uses opts argument for KernelInfo.opts_to_apply (#12946)
* opts_to_apply is opts

* skip beamed kernels

* simpler change

* fix the tensor cores tests for process replay

* use opts
2025-10-28 09:00:28 +01:00
chenyu
a79832b01f control_flow.py -> linearizer.py [pr] (#12948) 2025-10-27 12:38:13 -04:00
Sieds Lykles
e1f8c82938 Onnx Layer/Group/RMS/Batch-Norm ReduceL2 fp32 intermediates for fp16 (#12109)
* match onnx spec

* use least_upper_dtype

* promote the square

* just cast before the square
2025-10-24 12:26:11 +02:00
George Hotz
7762b3558b clean up the spec (#12868)
* tighten up the spec

* move validate into a different file

* that moved to validate

* after(barr)
2025-10-22 19:50:42 +08:00