tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 07:18:40 -05:00

Author	SHA1	Message	Date
chenyu	b190d85ad7	benchmark script bert softmax (#9759 )	2025-04-07 00:31:18 -04:00
Ignacio Sica	58785181a8	AMD `bf16xf32` TC (#9717 ) * dont test bf16 for emulated amd tc * skip bf16 tc test in ci * skip bf16 for AMD in test_tensor_cores_codegen * add simple bf16 gemm test to benchmark	2025-04-07 11:41:04 +08:00
chenyu	43e4565148	weighted linear in external_benchmark_bert_matmuls (#9757 ) include the linear to get qkv, and permute so that stride matches with the real run	2025-04-06 23:35:42 -04:00
George Hotz	28e06d2d44	minor cleanups from patternmatcher [pr] (#9756 )	2025-04-07 11:28:14 +08:00
qazal	1ce4912770	viz profiler ui (#9664 ) * localhost:8000/prof * selector + table * add pid * on null selection reset filters * table sort * charset=utf-8 * clear the rest * sort by duration * render table * format * nothing in copy thread * keep starts * sort back * less javascript * diff * works on firefox	2025-04-07 00:30:17 +08:00
chenyu	8a585dc5c1	benchmark script for matmuls in bert (#9752 ) 2 main matmuls in the bert layers. getting these to be fast makes bert fast	2025-04-06 19:34:25 +08:00
qazal	139999c6d7	map viz files + query params cleanup [pr] (#9754 ) * map viz files + query params cleanup [pr] * .width + fix	2025-04-06 16:20:00 +08:00
Francis Lata	71b8890dd6	use validation dataloader inside retinanet eval (#9747 )	2025-04-05 16:46:55 -04:00
nimlgen	5f7c79676f	jit: prune independent copies (#9749 ) * jit: prune independent copies * linter * check kernel cnt	2025-04-05 20:50:28 +03:00
nimlgen	c2573b247c	jit: rename optimize_weights -> replan_buffers_memory_layout (#9751 )	2025-04-05 20:35:15 +03:00
uuuvn	493fb315b1	fix RDNA2 support (#9700 ) linux amdgpu_discovery.c:amdgpu_discovery_set_ip_blocks is a ton of switch cases with sometimes weird choices like replacing nbio 3.X with 2.3 while nbio 2.5 is somehow nbio 7.0. `import_module` currently just tries to replace revision and minor with zeroes if there is no exact match, but that's not enough to cover all that weirdness	2025-04-05 18:42:47 +03:00
chenyu	5a04f4d4ba	revert bert hparams for green and red (#9744 ) did more runs and it's not really better and not worth the change. only useful for BS=1024	2025-04-05 07:38:01 -04:00
chenyu	407ca54382	symbolic fold double where (#9436 ) * symbolic fold double where a.where(b.where(c, d), d) -> (a & b).where(c, d). a pattern in optimizer * test case	2025-04-05 05:12:17 -04:00
Sieds Lykles	9c2fc695b5	cond.logical_not().where(a,b) -> cond.where(b,a) (#9741 ) * Add rule for negation in where, simplifies arange patterns * 0 becomes 0.0 again * Only if cond is bool * ne is never None * Add a test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-04 19:13:32 -04:00
Sieds Lykles	e9a3ac02a5	Remove ne from arange pattern (#9743 )	2025-04-04 18:31:13 -04:00
nimlgen	86c55414d7	ops_amd: simplify gfx version (#9742 ) * ops_amd: simplify gfx version * fix * all vesrsion compact style * mypy * revert this * rename back to target	2025-04-04 22:18:11 +03:00
qazal	16d6aa15f1	record unittest name in process replay (#9731 ) * record unittest name in process replay * getitem * filename + (optional) name * del * get_test_method * not solved * try with linecache * test: print_loc * format * without linecache * checkout master	2025-04-05 01:39:48 +08:00
qazal	354db961c6	viz refactor to prep for profiler [pr] (#9739 )	2025-04-04 17:13:14 +08:00
chenyu	fe998798fb	linearizer failure test for OUT OF BOUNDS ACCESS (#9738 )	2025-04-04 03:48:43 -04:00
George Hotz	8b5a523743	fix minimum length in pattern matcher (#9736 )	2025-04-04 14:57:01 +08:00
chenyu	640ff681c3	rename bert script to 8xMI300X (#9734 ) and adds a script for single MI300X	2025-04-03 23:36:24 -04:00
George Hotz	b719aa1fb0	only check once for divisible fold lengths (#9732 )	2025-04-04 11:27:34 +08:00
George Hotz	926b0bcc57	cache folded upcast [pr] (#9733 )	2025-04-04 11:23:19 +08:00
George Hotz	8206c7281e	move const multiply after REDUCE (#9730 )	2025-04-04 11:07:46 +08:00
chenyu	6b3480ec70	update mi300x bert haparams (#9716 ) * update mi300x bert haparams borrowed from previous submission that also did BS=1024 * update	2025-04-03 22:30:00 -04:00
George Hotz	cac8bcf8b5	use Ops.REDUCE (#9721 ) * decrease bert python time [pr] * order copies * Revert "order copies" This reverts commit `3f62c8693b`. * rewrite count * Ops.REDUCE * acc first in the add chain * Fix tensor core acc * arange patterns look good * fix multireduce gate * reduce rewrite rule * bump that to 15 minutes * multiwmma isn't fusing * gep through wmma is gep pushing * bump that timeout too, it's all env setup * add failing test	2025-04-04 10:14:34 +08:00
nimlgen	949459fdd6	jit: fix deallocate on unallocated buffers in free_intermediates (#9699 )	2025-04-03 18:32:51 +03:00
qazal	52a8ecb15e	record unittest location in process replay [pr] (#9727 )	2025-04-03 20:50:09 +08:00
geohotstan	ac713e04db	ONNX add output shape validation (#9720 ) * add output shape validation and remove support for sequence_type * nit better err msg * add sequence_type back * improve err msg * Revert "improve err msg" This reverts commit `dc9eaea4bb`. * Revert "add sequence_type back" This reverts commit `288170b2d9`. * do explicit shape equality * small nit	2025-04-03 05:44:53 -04:00
chenyu	7dadbf3697	insert float() in bert acc (#9726 ) sum of bool by default uses default_float for acc. So without float, it might overflow with a large BS and default_float=HALF. fixed clsf_accuracy to not be inf in mi300x bert	2025-04-03 05:44:09 -04:00
chenyu	79145e3d40	cleanup truncate_bf16 [pr] (#9725 ) use torch bfloat16 for groundtruth in test. also a TODO for discrepancy	2025-04-03 05:43:49 -04:00
Ignacio Sica	bc2d86195e	increase test tolerance (#9719 )	2025-04-03 15:24:09 +08:00
chenyu	1d25844d44	Revert "disable CI red llama 3 4 gpu beam (#9690 )" (#9709 ) This reverts commit `6a5eacba8b`.	2025-04-03 02:34:39 -04:00
George Hotz	49dafe6d43	add gc tests [pr] (#9718 ) * add gc tests [pr] * del * more gc tests * add NullGraph	2025-04-03 14:08:32 +08:00
Ignacio Sica	bc91fffc5d	fix gated store with index in python backend (#9703 ) * add default gate in index * assert store * add TestRendererFailures - move test_gated_store_with_alu to new TestRenderFailures class for tests that fail on multiple renderers - add test_renderer_failures.py run on python CI * add test for gated index in 2d * test TestRenderFailures	2025-04-03 12:48:28 +08:00
qazal	f2bd65ccfc	delete Ops.EMPTY and Tensor._metaop (#9715 ) * delete Ops.EMPTY and Tensor._metaop [pr] * test_creation * arg= * abstractions2	2025-04-03 12:29:02 +08:00
George Hotz	5c7b549eab	use functools.cache instead of lru_cache(None) [pr] (#9714 ) * use functools.cache instead of lru_cache(None) [pr] * more cache	2025-04-03 11:47:13 +08:00
qazal	bbd13191f4	cleanup tensor BIND + remove outdated comments in tensor.py [pr] (#9712 ) * cleanup tensor BIND + remove outdated comments in tensor.py [pr] * from_blob whitespace * assert	2025-04-03 11:21:53 +08:00
geohotstan	e1d7e47cca	fix ONNX IsInf unintended dtype promotion (#9711 ) * add IsInf * add corresponding test * that float16 is kinda silly	2025-04-02 22:46:15 -04:00
qazal	11ae254dc5	construct BUFFER UOps directly when device in known [pr] (#9710 ) * construct BUFFER UOps directly when device in known [pr] * diff	2025-04-03 10:41:44 +08:00
George Hotz	1714fc3ba4	start work on speed [pr] (#9707 ) * fix get_location * fix get_location try 2 * clean up split_load_store [pr] * SHR fixup [pr]	2025-04-03 10:39:01 +08:00
George Hotz	0f1ffc2050	hotfix: cat tests 2048 instead of 256	2025-04-03 10:37:56 +08:00
uuuvn	5bd485c027	Fix double SDMA_OP_FENCE (#9705 ) Introduced in #9585, probably when i incorrectly resolved merge conflict while rebasing an old, mi300x-only branch. Seems to be the source of multi gpu beam llama hangs	2025-04-03 09:43:37 +08:00
chenyu	a6fec2f5ae	dev_run for bert on mi300x (#9706 )	2025-04-02 21:12:55 -04:00
nimlgen	d96b4983ac	amd: support rdna4 in runtime again (#9702 )	2025-04-03 01:19:23 +07:00
Ignacio Sica	2d6d8b7355	add bf16 mfma support (#9695 ) * add bf16 mfma support * skip tc if emulated_amd and dtypes is bf16 * hotfix	2025-04-02 21:44:49 +08:00
nimlgen	a6733f519f	dsp: make relro sections contiguous (#9701 )	2025-04-02 18:02:16 +07:00
George Hotz	ea5caefef0	gep should look at count, not vcount (#9698 ) * gep should look at count, not vcount * gep in order is a rule * min change * gep on void	2025-04-02 18:10:57 +08:00
George Hotz	f72a87fd0e	add proper support for Ops.IGNORE to remove store masks (#9692 ) * add proper support for Ops.IGNORE to remove store masks * remove useless NHWC * revert that	2025-04-02 16:38:01 +08:00
chenyu	3b8d923692	remove skip LLVM in test_div_int (#9686 )	2025-04-02 04:15:00 -04:00
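
The two `where` rewrites in the log above (#9436 and #9741) can be checked exhaustively over boolean inputs. The following is a minimal sketch in plain Python, not tinygrad code: `where` is a hypothetical scalar stand-in for an elementwise `where`, and the strings `"a"`, `"b"`, `"c"`, `"d"` are placeholder branch values. It only illustrates why the two identities hold; it says nothing about how the actual rewrite rules are implemented.

```python
from itertools import product


def where(cond: bool, x, y):
    """Hypothetical scalar stand-in for elementwise where: x if cond else y."""
    return x if cond else y


def double_where_holds() -> bool:
    # Identity from #9436: a.where(b.where(c, d), d) == (a & b).where(c, d)
    return all(
        where(a, where(b, "c", "d"), "d") == where(a and b, "c", "d")
        for a, b in product([False, True], repeat=2)
    )


def negated_where_holds() -> bool:
    # Identity from #9741: cond.logical_not().where(a, b) == cond.where(b, a)
    return all(
        where(not cond, "a", "b") == where(cond, "b", "a")
        for cond in [False, True]
    )


print(double_where_holds(), negated_where_holds())  # prints "True True"
```

Note that #9741 lists "Only if cond is bool" among its changes; the sketch likewise assumes `cond` is a boolean.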

10417 Commits