tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-08 13:45:50 -05:00

Author	SHA1	Message	Date
chenyu	9838c1a6ff	update import style in runtime (#5735 )	2024-07-26 14:00:23 -04:00
chenyu	a4e9ebc68a	update test_uop_symbolic (#5733 ) enabled more passed tests	2024-07-26 13:46:09 -04:00
George Hotz	5c688560bc	move CUDA/HIP compilers to their own files [run_process_replay] (#5732 )	2024-07-26 10:00:15 -07:00
chenyu	2cc55a3095	UOp simple mul add div fold (#5726 )	2024-07-25 22:00:30 -04:00
chenyu	78f75aa80d	remove redundant symbolic mod rule [run_process_replay] (#5725 )	2024-07-25 21:21:02 -04:00
chenyu	5521b6d437	UOp simple mul-add-lt fold (#5721 )	2024-07-25 20:49:38 -04:00
qazal	1b53207b4f	revert isolated dags scheduling (#5724 )	2024-07-25 19:45:12 -04:00
chenyu	845b0d1c9d	UOp more generic div folding (#5722 ) old: `x // c` can fold if `0 <= x.vmin <= x.vmax < c` new: `x // c` can fold if `0 < c and x.vmin // c == x.vmax // c`	2024-07-25 17:49:14 -04:00
nimlgen	fb8148077e	hcq do not update the same signal (#5719 ) * hcq do not update the same signal * import them	2024-07-26 00:24:45 +03:00
nimlgen	6ec9ea9ddd	hcq update_exec with optional params (#5708 )	2024-07-26 00:04:57 +03:00
George Hotz	8b34ee2f52	remove global_size and local_size from Kernel class [run_process_replay] (#5720 ) * remove global_size and local_size from Kernel class [run_process_replay] * sizes from the prg	2024-07-25 13:55:08 -07:00
George Hotz	142b7fb22f	faster beam [run_process_replay] (#5718 )	2024-07-25 11:58:41 -07:00
chenyu	eff7c5fd2c	halve kernel counts in metal Fuzz Test linearizer (#5716 ) the test time has increased to 3 minutes	2024-07-25 14:35:11 -04:00
George Hotz	e877ed9688	cleaner uop expand [run_process_replay] (#5715 ) * cleaner uop expand [run_process_replay] * comments	2024-07-25 11:29:53 -07:00
chenyu	a82815262c	more test_pattern_matcher fixups (#5714 )	2024-07-25 14:12:21 -04:00
George Hotz	b8b5411845	move Function to Developer section of docs	2024-07-25 11:05:23 -07:00
qazal	f02124ffa0	rename to realize_reduceop (#5713 ) * rename to realize_reduceop * shorter comment	2024-07-25 20:57:33 +03:00
chenyu	05e02ddfb3	fixup test_pattern_matcher (#5712 )	2024-07-25 13:48:52 -04:00
qazal	9ceb3a3d1f	beautiful_mnist -4.3% kernels (#5709 ) * add is_complete * partially delete forced_realized * p2 * start * refactor to can_group * remove steps * _get_inputs is nicer * fix the cache * cache is dict now * rename to group	2024-07-25 20:30:49 +03:00
kormann	92eefab4b0	method alu (#5711 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-25 13:25:38 -04:00
qazal	76877df518	map groupable children (#5710 ) * map groupable children * remove setitem	2024-07-25 19:27:48 +03:00
kormann	1e2eac755d	Fix repr upat (#5705 ) * test * fix * x fix * simpler * rm extra space	2024-07-25 12:05:48 -04:00
qazal	1c992de257	hotfix: compare_schedule defaults to false (#5707 )	2024-07-25 17:08:28 +03:00
qazal	489cda827a	more scheduler process replay tooling (#5706 ) * more scheduler process replay tooling * refactor to compare_schedule	2024-07-25 15:47:18 +03:00
qazal	4e070a2c89	start work on indexing fusion (#5590 ) * start base * the views add up base reduceop st: ShapeTracker(views=(View(shape=(60000, 1), strides=(1, 0), offset=0, mask=None, contiguous=True),)) top st: ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False))) merged buf.st+st: ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False))) * p1 * some cleanups * more cleanups * one kernel * more * late fuse arange * less lines * more work * fix st strides 1 * update test_schedule, start argmax * test_tiny_argmax * add FUSE_ARANGE * more cleanup * add utils * reduce merging * fix axis and fold if needed * more fusion * need to figure this out * now fixing all of these * todos+save a line * ready for p1	2024-07-25 13:23:38 +03:00
nimlgen	08f47d7dc3	more info on failure 41 (#5704 )	2024-07-25 12:14:28 +03:00
nimlgen	69d4f474d8	amd resnet pf (#5703 )	2024-07-25 11:21:22 +03:00
nimlgen	1038482a66	enable hip tc (#5702 )	2024-07-25 11:12:11 +03:00
qazal	5b38ff8679	shorter llvm and ptx rendering [run_process_replay] (#5686 ) * src_dtype * that's a upat * the assert in vectorize is in type_verify * uops asserts vectorizing a vectorize * assert this * for internal casts it's fine	2024-07-25 10:42:25 +03:00
chenyu	46e1151c02	UOp more generic mul -> mod folding (#5698 )	2024-07-24 21:41:25 -04:00
chenyu	66a9c372af	UOp mod reduction (#5697 )	2024-07-24 20:36:00 -04:00
George Hotz	489a5b99a5	hotfix: triton_nv_matmul touchups	2024-07-24 23:24:29 +00:00
chenyu	8648fb2636	UOp vmin/vmax on ADD (#5689 )	2024-07-24 19:09:42 -04:00
qazal	e2e70bd90b	bring unbind back in Varaible const (#5687 ) * bring unbind back in Varaible const * this shows my experience with symbolic	2024-07-24 18:37:00 -04:00
nimlgen	b026312a31	nv ptx print log (#5691 )	2024-07-24 21:40:58 +03:00
George Hotz	bf24be4c8c	triton gets 163 TFLOPS on 4090	2024-07-24 18:32:29 +00:00
chenyu	85710e86cb	UOps div folding (#5690 ) #5689, with just div folding and new test cases	2024-07-24 14:21:44 -04:00
chenyu	fb1b51811b	unify UOp min/max default [run_process_replay] (#5688 ) * unify UOp min/max default [run_process_replay] * fix that	2024-07-24 13:05:26 -04:00
George Hotz	33d44f00ae	first fold, then expand (#5673 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-24 09:43:09 -07:00
qazal	b7b4c7844f	shorter BufferOps.LOAD creation (#5685 )	2024-07-24 18:53:07 +03:00
qazal	365e7afd4d	make fusion deterministic (#5684 ) * make fusion deterministic * not this one yet * line saving	2024-07-24 18:37:31 +03:00
nimlgen	2ea54176e2	docs: add more info on HCQProgram (#5683 ) * docs: add more info on HCQProgram * linter * linter2 * one more type	2024-07-24 17:20:18 +03:00
nimlgen	baface413a	nv better nvdisasm fail message (#5682 ) * nv better nvdisasm message * cuda	2024-07-24 16:19:26 +03:00
qazal	37347528bf	shorter BufferOps.CONST creation (#5681 )	2024-07-24 19:33:04 +08:00
qazal	6dcdff3bfd	share fusion behavior for r3 kernels (#5680 ) * use groups * this is the next one * should check the whole graph	2024-07-24 19:07:10 +08:00
qazal	3ffb1059a0	scheduling infra for isolated dags (#5679 ) * refactor to get_isolated_children * move assign	2024-07-24 17:14:26 +08:00
chenyu	e6e2d86fcf	replace RANGE max fold with generic max fold (#5676 )	2024-07-24 03:15:39 -04:00
chenyu	a7a77dfd83	UOp mul lt fold (#5677 )	2024-07-24 02:49:25 -04:00
chenyu	67b036bdfd	generic UOp max folding (#5675 )	2024-07-24 01:30:32 -04:00
chenyu	d1d81b359f	UOp compute min and max in one call [run_process_replay] (#5674 ) easier to handle cases like *-1 that flip the bounds	2024-07-24 00:51:23 -04:00
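
Note on 845b0d1c9d (#5722): the old and new div-folding conditions quoted in that commit message can be compared by brute force. What follows is a minimal standalone sketch, not tinygrad code. The helper names `can_fold_old`, `can_fold_new`, `folded_value`, and `check` are hypothetical, plain Python integers stand in for a UOp's `vmin`/`vmax` bounds, and Python's floor-division semantics are assumed throughout.

```python
def can_fold_old(vmin: int, vmax: int, c: int) -> bool:
    # Old rule from the commit message: 0 <= x.vmin <= x.vmax < c
    return 0 <= vmin <= vmax < c


def can_fold_new(vmin: int, vmax: int, c: int) -> bool:
    # New rule from the commit message: 0 < c and x.vmin // c == x.vmax // c
    return 0 < c and vmin // c == vmax // c


def folded_value(vmin: int, vmax: int, c: int):
    # If the new rule applies, x // c is the same constant for every x in
    # [vmin, vmax]; otherwise there is nothing to fold (None).
    return vmin // c if can_fold_new(vmin, vmax, c) else None


def check(limit: int = 12) -> None:
    # Exhaustively verify, on a small grid, that:
    #   1. every case accepted by the old rule is accepted by the new rule
    #   2. whenever the new rule accepts, x // c really is constant on the range
    for c in range(1, limit):
        for vmin in range(-limit, 2 * limit):
            for vmax in range(vmin, 2 * limit):
                if can_fold_old(vmin, vmax, c):
                    assert can_fold_new(vmin, vmax, c)
                if can_fold_new(vmin, vmax, c):
                    q = vmin // c
                    assert all(x // c == q for x in range(vmin, vmax + 1))


check()
```

For example, with `x` bounded to `[8, 11]` and `c = 4`, the old condition fails because `vmax >= c`, while the new condition holds and `x // 4` folds to the constant `2`.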

5302 Commits