tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-18 18:35:12 -05:00

Author	SHA1	Message	Date
chenyu	c3da458bc3	UOp if min==max folds to CONST (#5828 ) * UOp if min==max folds to CONST * fix test	2024-07-30 22:14:22 -04:00
George Hotz	e6879035a0	work to make GEMV fast (#5824 ) * work to make GEMV fast * half8 cast * align struct * fix amd * float8 is a later problem	2024-07-30 17:41:40 -07:00
chenyu	02f0be03f2	tests on UOp div negative number and arange opts (#5825 )	2024-07-30 20:06:57 -04:00
George Hotz	693990a346	swap src[2] and src[3] in load [run_process_replay] (#5821 ) * swap src[2] and src[3] in load [run_process_replay] * cleanups + bugfix * fix ptx	2024-07-30 14:04:13 -07:00
George Hotz	17a2f74412	new style load/store folder (#5784 ) * remove old index reorder * new style folder * works better * dedup * one failure * this is fine now... * expander_rewrite * images broken, but all else should work * cleanups * make tests work with old * fix images * cleanups + bugfix * minor fixes * fix gated store folding * flip gate_creator and expander * fix gated store * remove unneeded rules * lines getting close * line count good	2024-07-30 13:17:20 -07:00
qazal	03d866b84f	UOps.IF with rewrite rules (#5812 ) * expand merge * merge barriers * gate_folder * test_linearizer_failures * this can be here * bring the new repr back * gate_folder2 * gate_creator is better * gate_folder * dedup conditions * early gate folding * dedup barrier * fold noop conditions * all consts can go away * free lines	2024-07-30 20:50:56 +03:00
chenyu	defd89e8e0	unify negative shape creation to raise ValueError (#5817 ) [run_process_replay]	2024-07-30 13:42:59 -04:00
P4ssenger	6742a4789a	Add check for negative dimension in view (#5790 ) * add check for negative dimension in view * add negative dim tests * move check to tensor level * fix error message * move check to view create --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-30 13:26:27 -04:00
Francis Lata	ce61be16f1	clean up how preprocessed folder is defined (#5813 )	2024-07-30 12:35:26 -04:00
qazal	5e827e51d2	add llama3 BEAM=2 failures to test_linearizer_failures (#5553 ) * skips * opts.device * benchmarks * add to test_linearizer_failures * remove hardcoded ones * linter * skip cpu	2024-07-30 00:37:32 +03:00
samm393	573e0f9a48	remove float division from idiv in python_alu (#5777 ) * removes float division from idiv in python_alu * add test * cleaner logic * pass clang unsigned literals correctly * suffix ULL instead of U --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-07-29 12:14:12 -04:00
samm393	2c94316bd2	ull literal support and test (#5789 ) * ull literal support and test * missing .numpy()	2024-07-29 11:50:49 -04:00
nimlgen	ab3839a80a	cleanup nv/cuda compilers (#5767 ) * cleanup nv/cuda compilers * destroy prog * small test * fix test * nv ptx rewrite key * jitlink free * ptx is part of cuda	2024-07-29 13:50:03 +03:00
chenyu	e7a14f398e	more uop_symbolic tests for divmod pairs (#5785 )	2024-07-28 21:27:06 -04:00
George Hotz	76d191ab94	move consts to end of add (#5783 ) * move consts to end of add * better * fix infinite loop	2024-07-28 17:38:57 -07:00
chenyu	71a64d8252	UOps.MUL bound when one is negative (#5781 ) * UOps.MUL bound when one is negative also one more distribute_mul rule * don't always expand	2024-07-28 19:02:47 -04:00
qazal	b775db6b60	high-level benchmark timing diff (#5776 ) * high level timings benchmark times fix defs * use the name map * skip last task	2024-07-28 23:42:57 +03:00
chenyu	600a39771d	fix Tensor.arange if (stop-start) and step have different signs (#5775 )	2024-07-28 14:34:10 -04:00
David González Martínez	d0fd84e617	feat: allow passing gradient to .backward() to compute vjp (#5771 ) * feat: allow passing gradient to .backward() to compute vjp * fix * refactor * fix trailing whitespace	2024-07-28 11:13:18 -07:00
qazal	e0e7293b0a	make process replay unique in retries [run_process_replay] (#5773 )	2024-07-28 20:44:15 +03:00
qazal	95dda8dadf	more unmatching vectorize/gep asserts [run_process_replay] (#5760 ) * merge vectorize/gep rules [run_process_replay] * assert dtypes * src= * float2=(float4.x,float4.y)	2024-07-28 15:08:54 +08:00
chenyu	bfbd7c5461	more generic UOp mul mod folding (#5765 )	2024-07-27 20:20:35 -04:00
chenyu	80c6475757	update test_uop_symbolic to test UOp min and max (#5764 ) covers #5750, #5748, #5741	2024-07-27 19:53:21 -04:00
nimlgen	ed1d784077	test profiler timer sync across devs (#5751 ) * test profiler timer sync across devs * more correct * typo	2024-07-27 16:47:37 +03:00
qazal	3e49d86c01	process replay diffs 3 things now (#5731 ) * github api infra * process replay is 3 parts now * parse benchmarks * add gh_token * complete diff * move process replay tests * last successful run * add tempdir * skip master	2024-07-27 12:52:20 +03:00
qazal	57b4a8e98d	assert process replay asserts (#5737 ) * assert process replay asserts * one ci job is fine * test: Revert "separate process replay main loop (#5734)" This reverts commit `94d578396f`. * mac sed needs that * Revert "test: Revert "separate process replay main loop (#5734)"" This reverts commit `e4ad7684d5`. * disable process replay capture * save time * amd is tiny * send to /dev/null	2024-07-27 12:07:50 +03:00
George Hotz	f8972ace38	test flops (and allow wide ALU in UOps) [run_process_replay] (#5749 ) * flops test in external_test_speed_theoretical.py * test speed theo * min SZMAX * allow wide ALU for things that support it * needed for mypy	2024-07-26 21:07:28 -07:00
George Hotz	2fde2d2914	hotfix: external_test_speed_theoretical works on 24GB	2024-07-26 18:41:52 -07:00
George Hotz	829262a5ee	add external_test_speed_theoretical	2024-07-26 17:45:22 -07:00
kormann	a5ede535ef	NOp field name [run_process_replay] (#5742 ) * rm def name * add field name	2024-07-26 18:45:59 -04:00
George Hotz	c50e374bb6	multiple locals + get_kernel_modifier + fix valid (#5739 ) * multiple locals + get_kernel_modifier + fix valid * fix test pattern matcher	2024-07-26 15:10:10 -07:00
chenyu	dc7483ee6f	UOp simple div folding (#5740 ) made UOp.divides return the Optional[quotient] and used it for simple div folding	2024-07-26 17:14:32 -04:00
chenyu	671259417f	reuse UOp `__repr__` for NOp (#5738 )	2024-07-26 16:59:55 -04:00
kormann	b0c1dba299	named UOp class "NOP" [run_process_replay] (#5728 ) * NOP * fix const + simplify compile * rm VAR for NOOP --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-07-26 13:25:53 -07:00
George Hotz	4df46eac67	clean up tensor cores [run_process_replay] (#5736 ) * clean up tensor cores [run_process_replay] * remove tuple(wmma_sz), self.opts.device * remove tls, leave DEVICE	2024-07-26 13:21:23 -07:00
qazal	94d578396f	separate process replay main loop (#5734 ) * separate process replay main loop * [run_process_replay] * add kernel_changed * test with [run_process_replay] * revert temp [run_process_replay]	2024-07-26 21:43:08 +03:00
chenyu	a4e9ebc68a	update test_uop_symbolic (#5733 ) enabled more passed tests	2024-07-26 13:46:09 -04:00
chenyu	2cc55a3095	UOp simple mul add div fold (#5726 )	2024-07-25 22:00:30 -04:00
chenyu	5521b6d437	UOp simple mul-add-lt fold (#5721 )	2024-07-25 20:49:38 -04:00
qazal	1b53207b4f	revert isolated dags scheduling (#5724 )	2024-07-25 19:45:12 -04:00
chenyu	845b0d1c9d	UOp more generic div folding (#5722 ) old: `x // c` can fold if `0 <= x.vmin <= x.vmax < c` new: `x // c` can fold if `0 < c and x.vmin // c == x.vmax // c`	2024-07-25 17:49:14 -04:00
chenyu	a82815262c	more test_pattern_matcher fixups (#5714 )	2024-07-25 14:12:21 -04:00
chenyu	05e02ddfb3	fixup test_pattern_matcher (#5712 )	2024-07-25 13:48:52 -04:00
qazal	9ceb3a3d1f	beautiful_mnist -4.3% kernels (#5709 ) * add is_complete * partially delete forced_realized * p2 * start * refactor to can_group * remove steps * _get_inputs is nicer * fix the cache * cache is dict now * rename to group	2024-07-25 20:30:49 +03:00
kormann	1e2eac755d	Fix repr upat (#5705 ) * test * fix * x fix * simpler * rm extra space	2024-07-25 12:05:48 -04:00
qazal	1c992de257	hotfix: compare_schedule defaults to false (#5707 )	2024-07-25 17:08:28 +03:00
qazal	489cda827a	more scheduler process replay tooling (#5706 ) * more scheduler process replay tooling * refactor to compare_schedule	2024-07-25 15:47:18 +03:00
qazal	4e070a2c89	start work on indexing fusion (#5590 ) * start base * the views add up base reduceop st: ShapeTracker(views=(View(shape=(60000, 1), strides=(1, 0), offset=0, mask=None, contiguous=True),)) top st: ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False))) merged buf.st+st: ShapeTracker(views=(View(shape=(512, 6000, 1, 28, 28, 10), strides=(0, 1, 0, 0, 0, 6000), offset=0, mask=None, contiguous=False), View(shape=(512, 6000, 1, 28, 28, 10), strides=(47040000, 784, 0, 28, 1, 4704000), offset=0, mask=None, contiguous=False))) * p1 * some cleanups * more cleanups * one kernel * more * late fuse arange * less lines * more work * fix st strides 1 * update test_schedule, start argmax * test_tiny_argmax * add FUSE_ARANGE * more cleanup * add utils * reduce merging * fix axis and fold if needed * more fusion * need to figure this out * now fixing all of these * todos+save a line * ready for p1	2024-07-25 13:23:38 +03:00
nimlgen	08f47d7dc3	more info on failure 41 (#5704 )	2024-07-25 12:14:28 +03:00
nimlgen	69d4f474d8	amd resnet pf (#5703 )	2024-07-25 11:21:22 +03:00
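
Note on the div folding rule quoted in 845b0d1c9d (#5722): the sketch below is plain Python, not tinygrad code. It compares the old and new conditions from that commit message on integer bounds. The function names `can_fold_div_old`, `can_fold_div_new`, and `fold_div` are hypothetical, and Python's floor division `//` is assumed as a stand-in for whatever integer division the UOp graph actually uses.

```python
from typing import Optional

def can_fold_div_old(vmin: int, vmax: int, c: int) -> bool:
    # Old rule from the commit message: x // c folds if 0 <= x.vmin <= x.vmax < c.
    return 0 <= vmin <= vmax < c

def can_fold_div_new(vmin: int, vmax: int, c: int) -> bool:
    # New rule from the commit message: x // c folds if 0 < c and
    # x.vmin // c == x.vmax // c. For c > 0 the quotient is monotone in x,
    # so equal quotients at both bounds imply one quotient for the whole range.
    return 0 < c and vmin // c == vmax // c

def fold_div(vmin: int, vmax: int, c: int) -> Optional[int]:
    # Hypothetical helper: the constant that x // c folds to, or None if
    # the bounds do not pin down a single quotient.
    return vmin // c if can_fold_div_new(vmin, vmax, c) else None

if __name__ == "__main__":
    # x in [8, 11], c = 4: the old rule rejects this, the new rule folds it.
    print(can_fold_div_old(8, 11, 4), fold_div(8, 11, 4))
    # x in [3, 4], c = 4: the quotients differ (0 and 1), so nothing folds.
    print(fold_div(3, 4, 4))
```

Under these assumptions the new condition accepts every range the old one did (there the quotient is always 0) and also ranges that do not start at zero, as long as both bounds land in the same multiple-of-c bucket.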

2555 Commits