George Hotz
829262a5ee
add external_test_speed_theoretical
2024-07-26 17:45:22 -07:00
chenyu
5f168e7499
remove the optimization in AndNode.substitute ( #5747 )
...
was used in the old linearizer but is no longer needed. still need substitute because some fuzz tests call sym_infer on AndNode
2024-07-26 20:08:07 -04:00
kormann
c50e354936
NOp clean up any_len passing [run_process_replay] ( #5743 )
...
* clean allow_any_len
* min
2024-07-26 17:00:31 -07:00
George Hotz
db1d093b29
reenable LLaMA-3 8B BEAM on NV ( #5746 )
2024-07-26 16:56:41 -07:00
chenyu
c6b2d96474
minor uop uopgraph cleanups ( #5745 )
2024-07-26 19:23:48 -04:00
chenyu
3686b6726a
move GraphException to jit.py ( #5744 )
...
same place where GraphRunner is defined
2024-07-26 19:01:12 -04:00
kormann
a5ede535ef
NOp field name [run_process_replay] ( #5742 )
...
* rm def name
* add field name
2024-07-26 18:45:59 -04:00
chenyu
0d7d4dd731
UOp._min_max for MUL and MOD ( #5741 )
2024-07-26 18:38:10 -04:00
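For context, the usual interval rules here (a minimal sketch with assumed helper names, not tinygrad's `UOp._min_max`): a product's extremes are always among the four endpoint products, and a mod by a positive constant `c` on a nonnegative range is bounded by `[0, c-1]`, collapsing to the identity when the whole range already sits below `c`:

```python
def mul_min_max(a_min: int, a_max: int, b_min: int, b_max: int) -> tuple[int, int]:
    # the extremes of a product lie among the endpoint products
    products = [a_min*b_min, a_min*b_max, a_max*b_min, a_max*b_max]
    return min(products), max(products)

def mod_min_max(x_min: int, x_max: int, c: int) -> tuple[int, int]:
    # nonnegative operand, positive constant divisor assumed
    assert x_min >= 0 and c > 0
    if x_max < c: return x_min, x_max  # whole range below c: mod is the identity
    return 0, c - 1

assert mul_min_max(-2, 3, 4, 5) == (-10, 15)
assert mod_min_max(1, 3, 10) == (1, 3)
```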
George Hotz
c50e374bb6
multiple locals + get_kernel_modifier + fix valid ( #5739 )
...
* multiple locals + get_kernel_modifier + fix valid
* fix test pattern matcher
2024-07-26 15:10:10 -07:00
nimlgen
f6c0e17a2c
optimize symbolic-related updates in graphs ( #5727 )
...
* try
* faster
* cleaner
* better?
* better?
* cleaner
* fixes
* unused
* mypy
* fix clang
* remove comment
* better var names
* rename
* fix cuda
* rename
2024-07-27 00:57:59 +03:00
chenyu
dc7483ee6f
UOp simple div folding ( #5740 )
...
made UOp.divides return an Optional quotient and used it for simple div folding
2024-07-26 17:14:32 -04:00
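The idea behind `divides` as described (a toy sketch over a hypothetical tuple-based expression tree, not the real `UOp` API): return the quotient expression when every term divides exactly, else `None`, so a rewrite rule can replace `x // c` with the returned quotient directly:

```python
from typing import Optional, Union

Expr = Union[int, str, tuple]  # int const, named var, or ("add"|"mul", lhs, rhs)

def divides(expr: Expr, c: int) -> Optional[Expr]:
    # return expr // c when the division is exact, else None
    if isinstance(expr, int):
        return expr // c if expr % c == 0 else None
    if isinstance(expr, str):  # a bare variable is not known to be divisible
        return None
    op, lhs, rhs = expr
    if op == "add":
        l, r = divides(lhs, c), divides(rhs, c)
        return ("add", l, r) if l is not None and r is not None else None
    if op == "mul" and isinstance(rhs, int) and rhs % c == 0:
        return ("mul", lhs, rhs // c)
    return None

# simple div folding: (x*8 + 4) // 4  ->  x*2 + 1
assert divides(("add", ("mul", "x", 8), 4), 4) == ("add", ("mul", "x", 2), 1)
```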
chenyu
671259417f
reuse UOp __repr__ for NOp ( #5738 )
2024-07-26 16:59:55 -04:00
kormann
b0c1dba299
named UOp class "NOP" [run_process_replay] ( #5728 )
...
* NOP
* fix const + simplify compile
* rm VAR for NOOP
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-07-26 13:25:53 -07:00
George Hotz
4df46eac67
clean up tensor cores [run_process_replay] ( #5736 )
...
* clean up tensor cores [run_process_replay]
* remove tuple(wmma_sz), self.opts.device
* remove tls, leave DEVICE
2024-07-26 13:21:23 -07:00
qazal
94d578396f
separate process replay main loop ( #5734 )
...
* separate process replay main loop
* [run_process_replay]
* add kernel_changed
* test with [run_process_replay]
* revert temp [run_process_replay]
2024-07-26 21:43:08 +03:00
chenyu
9838c1a6ff
update import style in runtime ( #5735 )
2024-07-26 14:00:23 -04:00
chenyu
a4e9ebc68a
update test_uop_symbolic ( #5733 )
...
enabled more passing tests
2024-07-26 13:46:09 -04:00
George Hotz
5c688560bc
move CUDA/HIP compilers to their own files [run_process_replay] ( #5732 )
2024-07-26 10:00:15 -07:00
chenyu
2cc55a3095
UOp simple mul add div fold ( #5726 )
2024-07-25 22:00:30 -04:00
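A worked check of what such a fold typically means (assumed form, illustrative only): with `c > 0` and `0 <= y < c`, `(x*c + y) // c` folds to `x`, since the remainder term is dropped by floor division:

```python
# verify the assumed fold with c = 4 over a small range of x and y
for x in range(-4, 5):
    for y in range(4):  # 0 <= y < c
        assert (x*4 + y) // 4 == x
```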
chenyu
78f75aa80d
remove redundant symbolic mod rule [run_process_replay] ( #5725 )
2024-07-25 21:21:02 -04:00
chenyu
5521b6d437
UOp simple mul-add-lt fold ( #5721 )
2024-07-25 20:49:38 -04:00
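Likewise for the lt variant, a sanity check of the assumed rule: with `c > 0` and `0 <= y < c`, `x*c + y < c*d` folds to `x < d`, because the `+y` term can never carry past a multiple of `c`:

```python
# verify the assumed fold with c = 4, d = 2 over a small range
for x in range(-4, 5):
    for y in range(4):  # 0 <= y < c
        assert (x*4 + y < 4*2) == (x < 2)
```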
qazal
1b53207b4f
revert isolated dags scheduling ( #5724 )
2024-07-25 19:45:12 -04:00
chenyu
845b0d1c9d
UOp more generic div folding ( #5722 )
...
old: `x // c` can fold if `0 <= x.vmin <= x.vmax < c`
new: `x // c` can fold if `0 < c and x.vmin // c == x.vmax // c`
2024-07-25 17:49:14 -04:00
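A minimal check of the two conditions as stated above (`can_fold_old` and `can_fold_new` are hypothetical helpers written for illustration, not tinygrad code):

```python
def can_fold_old(vmin: int, vmax: int, c: int) -> bool:
    # old rule: x // c folds only when the whole range sits in [0, c)
    return 0 <= vmin <= vmax < c

def can_fold_new(vmin: int, vmax: int, c: int) -> bool:
    # new rule: x // c folds whenever every value in the range shares one quotient
    return 0 < c and vmin // c == vmax // c

# x in [4, 7], c = 4: the old rule rejects (vmax >= c), the new rule folds x // 4 -> 1
assert not can_fold_old(4, 7, 4) and can_fold_new(4, 7, 4)
```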
nimlgen
fb8148077e
hcq do not update the same signal ( #5719 )
...
* hcq do not update the same signal
* import them
2024-07-26 00:24:45 +03:00
nimlgen
6ec9ea9ddd
hcq update_exec with optional params ( #5708 )
2024-07-26 00:04:57 +03:00
George Hotz
8b34ee2f52
remove global_size and local_size from Kernel class [run_process_replay] ( #5720 )
...
* remove global_size and local_size from Kernel class [run_process_replay]
* sizes from the prg
2024-07-25 13:55:08 -07:00
George Hotz
142b7fb22f
faster beam [run_process_replay] ( #5718 )
2024-07-25 11:58:41 -07:00
chenyu
eff7c5fd2c
halve kernel counts in metal Fuzz Test linearizer ( #5716 )
...
the test time has increased to 3 minutes
2024-07-25 14:35:11 -04:00
George Hotz
e877ed9688
cleaner uop expand [run_process_replay] ( #5715 )
...
* cleaner uop expand [run_process_replay]
* comments
2024-07-25 11:29:53 -07:00
chenyu
a82815262c
more test_pattern_matcher fixups ( #5714 )
2024-07-25 14:12:21 -04:00
George Hotz
b8b5411845
move Function to Developer section of docs
2024-07-25 11:05:23 -07:00
qazal
f02124ffa0
rename to realize_reduceop ( #5713 )
...
* rename to realize_reduceop
* shorter comment
2024-07-25 20:57:33 +03:00
chenyu
05e02ddfb3
fixup test_pattern_matcher ( #5712 )
2024-07-25 13:48:52 -04:00
qazal
9ceb3a3d1f
beautiful_mnist -4.3% kernels ( #5709 )
...
* add is_complete
* partially delete forced_realized
* p2
* start
* refactor to can_group
* remove steps
* _get_inputs is nicer
* fix the cache
* cache is dict now
* rename to group
2024-07-25 20:30:49 +03:00
kormann
92eefab4b0
method alu ( #5711 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-25 13:25:38 -04:00
qazal
76877df518
map groupable children ( #5710 )
...
* map groupable children
* remove setitem
2024-07-25 19:27:48 +03:00
kormann
1e2eac755d
Fix repr upat ( #5705 )
...
* test
* fix
* x fix
* simpler
* rm extra space
2024-07-25 12:05:48 -04:00
qazal
1c992de257
hotfix: compare_schedule defaults to false ( #5707 )
2024-07-25 17:08:28 +03:00
qazal
489cda827a
more scheduler process replay tooling ( #5706 )
...
* more scheduler process replay tooling
* refactor to compare_schedule
2024-07-25 15:47:18 +03:00
qazal
4e070a2c89
start work on indexing fusion ( #5590 )
...
* start base
* the views add up
base reduceop st:
ShapeTracker(views=(View(shape=(60000, 1), strides=(1, 0), offset=0, mask=None, contiguous=True),))
top st:
ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False)))
merged buf.st+st:
ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False)))
* p1
* some cleanups
* more cleanups
* one kernel
* more
* late fuse arange
* less lines
* more work
* fix st strides 1
* update test_schedule, start argmax
* test_tiny_argmax
* add FUSE_ARANGE
* more cleanup
* add utils
* reduce merging
* fix axis and fold if needed
* more fusion
* need to figure this out
* now fixing all of these
* todos+save a line
* ready for p1
2024-07-25 13:23:38 +03:00
nimlgen
08f47d7dc3
more info on failure 41 ( #5704 )
2024-07-25 12:14:28 +03:00
nimlgen
69d4f474d8
amd resnet pf ( #5703 )
2024-07-25 11:21:22 +03:00
nimlgen
1038482a66
enable hip tc ( #5702 )
2024-07-25 11:12:11 +03:00
qazal
5b38ff8679
shorter llvm and ptx rendering [run_process_replay] ( #5686 )
...
* src_dtype
* that's a UPat
* the assert in vectorize is in type_verify
* uops asserts vectorizing a vectorize
* assert this
* for internal casts it's fine
2024-07-25 10:42:25 +03:00
chenyu
46e1151c02
UOp more generic mul -> mod folding ( #5698 )
2024-07-24 21:41:25 -04:00
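The identity behind this family of folds (assumed form, shown with Python's floor-mod semantics): a multiplier can be reduced modulo the divisor, so `(x*a) % b` becomes `(x*(a % b)) % b`, which folds to 0 outright when `a % b == 0`:

```python
# check the assumed identity over a small range of x
for x in range(-8, 9):
    assert (x*10) % 4 == (x*(10 % 4)) % 4  # multiplier reduced mod the divisor
    assert (x*12) % 4 == 0                 # divisor divides the multiplier: folds to 0
```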
chenyu
66a9c372af
UOp mod reduction ( #5697 )
2024-07-24 20:36:00 -04:00
George Hotz
489a5b99a5
hotfix: triton_nv_matmul touchups
2024-07-24 23:24:29 +00:00
chenyu
8648fb2636
UOp vmin/vmax on ADD ( #5689 )
2024-07-24 19:09:42 -04:00
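The ADD case is plain interval arithmetic (a minimal sketch with assumed names, not tinygrad's implementation): the bounds of a sum are the sums of the bounds:

```python
def add_min_max(a_min: int, a_max: int, b_min: int, b_max: int) -> tuple[int, int]:
    # the bounds of a sum are the sums of the bounds
    return a_min + b_min, a_max + b_max

# x in [0, 3], y in [2, 5]  ->  x + y in [2, 8]
assert add_min_max(0, 3, 2, 5) == (2, 8)
```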
qazal
e2e70bd90b
bring unbind back in Variable const ( #5687 )
...
* bring unbind back in Variable const
* this shows my experience with symbolic
2024-07-24 18:37:00 -04:00
nimlgen
b026312a31
nv ptx print log ( #5691 )
2024-07-24 21:40:58 +03:00