tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
George Hotz	bf769fa5c5	label ranges with their number (#9805 )	2025-04-09 14:31:18 +08:00
chenyu	c5db5b83b9	add SHOULD_USE_TC=1 check to simple_matmul (#9802 ) * add SHOULD_USE_TC=1 check to simple_matmul also zero centered the random input and update atol for tf32 * ATOL=2e-2 for HALF	2025-04-09 02:24:42 -04:00
qazal	f27dbc8c35	becomes_map cleanups [pr] (#9790 ) * cleanup becomes_map [pr] * source	2025-04-09 14:11:53 +08:00
qazal	7d2349c827	track_rewrites in scheduler [pr] (#9801 )	2025-04-09 12:48:14 +08:00
George Hotz	bb18adb0d5	reduce with a mul chain (#9799 ) * reduce with a mul chain * inside is just 1	2025-04-09 12:42:32 +08:00
George Hotz	78caf55154	Revert "FP8 support on NVIDIA (#8631 )" This reverts commit `2c8e4ea865`.	2025-04-09 12:27:41 +08:00
George Hotz	d1505137ad	Revert "move TestOpsFp8s skipTest (#9797 )" This reverts commit `a3aaf92b21`.	2025-04-09 12:27:40 +08:00
George Hotz	14928fecff	Revert "fix TF32 tensor core dropped in tc_sm89 (#9798 )" This reverts commit `7c9a96824f`.	2025-04-09 12:27:39 +08:00
qazal	1ed4eae510	hotfix: don't add shape to SINK viz node (#9800 )	2025-04-09 12:04:33 +08:00
chenyu	7c9a96824f	fix TF32 tensor core dropped in tc_sm89 (#9798 ) also add `SHOULD_USE_TC=1` to verify TC is applied in simple_matmul	2025-04-08 23:20:50 -04:00
chenyu	a3aaf92b21	move TestOpsFp8s skipTest (#9797 ) so get_available_devices is not called when running other tests	2025-04-08 22:44:07 -04:00
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631 ) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
hooved	5d85765327	types for WebGPU runtime (#9791 ) * add type annotations to ops_webgpu * rerun CI * add types to some _run params	2025-04-08 22:52:17 +03:00
chenyu	4c8582a7ce	pipe allow_test_size in _time_program (#9789 ) * pipe allow_test_size in _time_program it was dropped a long time ago and BEAM_ESTIMATE is doing nothing * revert BEAM_ESTIMATE	2025-04-08 09:07:40 -04:00
chenyu	8fe83385ec	add system json for mi300x mlperf (#9786 ) * add system json for mi300x mlperf ``` python3 -m mlperf_logging.system_desc_checker examples/mlperf/training_submission_v5.0/tinycorp/systems/tinybox_8xMI300X.json training 4.1.0 INFO - System description checker passed for tinybox 8xMI300X ``` also removed the rocm from tinybox_red since we are not using it * update mlperf-logging version	2025-04-08 06:36:44 -04:00
chenyu	4a807ee952	remove duplicated z3-solver in setup.py (#9787 )	2025-04-08 06:12:58 -04:00
qazal	21e872df44	remove consts from sched_sink [pr] (#9782 )	2025-04-08 16:08:24 +08:00
qazal	f13e9cf2d9	move view_left to grouper.py + tiny reorders [pr] (#9780 ) * move view_left to grouper.py [pr] * reorder grouper * test_schedule	2025-04-08 15:39:28 +08:00
chenyu	7a28133b37	failed test for single softmax backward (#9778 ) getting RecursionError with DONT_GROUP_REDUCES=1	2025-04-08 02:36:32 -04:00
George Hotz	fefee5d3ab	single kernel softmax (#9776 ) * real single kernel softmax * cleanup * fix blockend insertion * add to bert test	2025-04-08 12:35:48 +08:00
qazal	9963bb51e0	grouper tests cleanups [pr] (#9777 ) * grouper tests cleanups [pr] * viz * tuple * whitespace	2025-04-08 12:33:11 +08:00
chenyu	4cc7422769	use AM driver in bert mlperf (#9775 ) we should commit to use AM. it's 7ms slower python time now	2025-04-07 23:40:27 -04:00
George Hotz	db22094d35	hotfix: update softmax fusion test	2025-04-08 11:23:19 +08:00
Francis Lata	f8fe15e64e	move BoxCoder to mlperf helpers (#9773 )	2025-04-07 20:27:06 -04:00
Eitan Turok	bb7922b95f	Vectorize Transcendental Regression Tests (#9753 ) * init test * cleanup	2025-04-08 01:27:39 +08:00
chenyu	7c4a739fe4	full script for bert mi300x (#9772 )	2025-04-07 11:41:31 -04:00
Sieds Lykles	07d1aefaf4	fast idiv (#9755 ) * fast idiv with tests and fuzzer * Add todo comment * Add env variable to toggle fast_idiv * Move env check * Add fuzz fast_idiv to ci --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-07 08:32:24 -04:00
nimlgen	fa888ee077	minor test cleanups (#9770 ) * fix test_graph on max * pcie5	2025-04-07 15:29:12 +03:00
chenyu	3069ebfad1	use BERT_LAYERS=2 in bert init (#9769 ) save 5 minutes of scheduling in setup so we can fit more search	2025-04-07 07:46:37 -04:00
qazal	891322fd51	split into grouper.py (#9768 ) * split into grouper.py * update tests * reorder	2025-04-07 18:40:59 +08:00
qazal	219b8c9e8b	return becomes_map in scheduler [pr] (#9766 ) * add a graph_rewrite pass for creating asts [pr] * disk * benchmark * return becomes_map in scheduler * reorder schedule.py into grouper and linearizer [pr] * comments	2025-04-07 17:36:23 +08:00
qazal	6306dea6e2	add a graph_rewrite pass for creating asts [pr] (#9765 ) * add a graph_rewrite pass for creating asts [pr] * disk * benchmark	2025-04-07 16:32:11 +08:00
qazal	07eea567d4	reorder tensor_map and grouper parts [pr] (#9764 )	2025-04-07 15:36:13 +08:00
qazal	8ddb1357c0	fix UPat.location after pickle (#9763 ) * fix UPat.location after pickle [pr] * named upat test	2025-04-07 15:16:42 +08:00
qazal	4cd27aa0e6	hotfix: viz recenter and unlimited zoom (#9760 ) * hotfix: viz recenter and unlimited zoom * add shapes to the ast graph * not for COPY	2025-04-07 14:38:03 +08:00
chenyu	d0dace4306	update doc for permute to 3d tensor (#9758 ) easier to see if it's permuted to or permuted from	2025-04-07 00:38:05 -04:00
chenyu	b190d85ad7	benchmark script bert softmax (#9759 )	2025-04-07 00:31:18 -04:00
Ignacio Sica	58785181a8	AMD `bf16xf32` TC (#9717 ) * dont test bf16 for emulated amd tc * skip bf16 tc test in ci * skip bf16 for AMD in test_tensor_cores_codegen * add simple bf16 gemm test to benchmark	2025-04-07 11:41:04 +08:00
chenyu	43e4565148	weighted linear in external_benchmark_bert_matmuls (#9757 ) include the linear to get qkv, and permute so that stride matches with the real run	2025-04-06 23:35:42 -04:00
George Hotz	28e06d2d44	minor cleanups from patternmatcher [pr] (#9756 )	2025-04-07 11:28:14 +08:00
qazal	1ce4912770	viz profiler ui (#9664 ) * localhost:8000/prof * selector + table * add pid * on null selection reset filters * table sort * charset=utf-8 * clear the rest * sort by duration * render table * format * nothing in copy thread * keep starts * sort back * less javascript * diff * works on firefox	2025-04-07 00:30:17 +08:00
chenyu	8a585dc5c1	benchmark script for matmuls in bert (#9752 ) 2 main matmuls in the bert layers. getting these to be fast makes bert fast	2025-04-06 19:34:25 +08:00
qazal	139999c6d7	map viz files + query params cleanup [pr] (#9754 ) * map viz files + query params cleanup [pr] * .width + fix	2025-04-06 16:20:00 +08:00
Francis Lata	71b8890dd6	use validation dataloader inside retinanet eval (#9747 )	2025-04-05 16:46:55 -04:00
nimlgen	5f7c79676f	jit: prune independent copies (#9749 ) * jit: prune independent copies * linter * check kernel cnt	2025-04-05 20:50:28 +03:00
nimlgen	c2573b247c	jit: rename optimize_weights -> replan_buffers_memory_layout (#9751 )	2025-04-05 20:35:15 +03:00
uuuvn	493fb315b1	fix RDNA2 support (#9700 ) linux amdgpu_discovery.c:amdgpu_discovery_set_ip_blocks is a ton of switch cases with sometimes weird choices like replacing nbio 3.X with 2.3 while nbio 2.5 is somehow nbio 7.0. `import_module` currently just tries to replace revision and minor with zeroes if there is no exact match, but that's not enough to cover all that weirdness	2025-04-05 18:42:47 +03:00
chenyu	5a04f4d4ba	revert bert hparams for green and red (#9744 ) did more runs and it's not really better and not worth the change. only useful for BS=1024	2025-04-05 07:38:01 -04:00
chenyu	407ca54382	symbolic fold double where (#9436 ) * symbolic fold double where a.where(b.where(c, d), d) -> (a & b).where(c, d). a pattern in optimizer * test case	2025-04-05 05:12:17 -04:00
Sieds Lykles	9c2fc695b5	cond.logical_not().where(a,b) -> cond.where(b,a) (#9741 ) * Add rule for negation in where, simplifies arange patterns * 0 becomes 0.0 again * Only if cond is bool * ne is never None * Add a test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-04 19:13:32 -04:00
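
The two `where` rewrites named in commits 407ca54382 and 9c2fc695b5 can be sanity-checked with a small truth-table sketch. This is not tinygrad code: the `where` helper below is a hypothetical scalar stand-in for `Tensor.where`, and `check_where_rewrites` is an illustrative name, assuming only that `where` selects its first argument when the condition is true.

```python
from itertools import product


def where(cond, x, y):
    """Hypothetical scalar stand-in for Tensor.where: x if cond is true, else y."""
    return x if cond else y


def check_where_rewrites():
    # Opaque placeholder values; the rewrites must hold for any c and d.
    c, d = "c", "d"

    # Double-where fold: a.where(b.where(c, d), d) -> (a & b).where(c, d)
    for a, b in product([False, True], repeat=2):
        assert where(a, where(b, c, d), d) == where(a and b, c, d)

    # Negated condition: cond.logical_not().where(a, b) -> cond.where(b, a)
    for cond in (False, True):
        assert where(not cond, c, d) == where(cond, d, c)

    return True


if __name__ == "__main__":
    print(check_where_rewrites())
```

Both rules hold for boolean conditions only, which matches the "Only if cond is bool" note in the second commit message.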

8403 Commits