chenyu
aee737bd9e
divide by gcd in UOp div folding (#5949)
* divide by gcd in UOp div folding
`(6x+6y)//16 -> (3x+3y)//8` etc
simpler version
* only factor out const
* don't apply for unsigned
* don't need that if
* space
2024-08-06 20:00:57 -04:00
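The gcd folding above (`(6x+6y)//16 -> (3x+3y)//8`) can be sketched on plain coefficient lists; a simplified illustration of the idea, not tinygrad's actual UOp rewrite:

```python
from math import gcd

def fold_div(coeffs, divisor):
    # divide the sum's integer coefficients and the divisor by their
    # common gcd, e.g. (6x+6y)//16 -> (3x+3y)//8
    g = gcd(divisor, *coeffs)
    if g <= 1:
        return coeffs, divisor
    return [c // g for c in coeffs], divisor // g
```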
George Hotz
6d1fdcfce2
don't reduce the same thing in a vector (#5950)
* don't reduce the same thing over and over
* cleaner way to write it that doesn't loop
2024-08-06 16:59:15 -07:00
qazal
d5d7f4e7b8
more TestIndexing correctness asserts [run_process_replay] (#5948)
* use torch in test_mnist_val
* more asserts
2024-08-07 01:50:42 +03:00
qazal
7f062929e8
start all cached scheduler functions with buf, st [run_process_replay] (#5946)
* start all cached scheduler functions with buf, st
- [x] _recursive_group
- [x] _recursive_lazyop
- [x] _recurse_reduceops
* use dict [run_process_replay]
2024-08-07 01:24:22 +03:00
chenyu
794796256c
UOp.const_factor [run_process_replay] (#5945)
* UOp.const_factor [run_process_replay]
simplify mod and div folding
* test does not work now
2024-08-06 18:18:29 -04:00
Elias Wahl
c9862e17d4
MLPERF BERT submission scripts (#5931)
* green
* red
* fix benchmark
* log
* count train samples
* oops. 4.0 -> 4.1
* note to todo
* no pillow
2024-08-06 18:09:18 -04:00
George Hotz
73d4d51845
add LBScheduleItem type [run_process_replay] (#5944)
* add LBScheduleItem type [run_process_replay]
* minor cleanups
* fix
* fix fuzz tests
* add group cache type
2024-08-06 14:49:40 -07:00
chenyu
1dab75ae37
clean up mlperf dataloader import (#5940)
use tinygrad tqdm for dataset, and PIL Image is only needed for resnet
2024-08-06 17:10:08 -04:00
qazal
7b6496f2e6
fix the reduceops cache breaking beautiful_mnist (#5938)
* fix the reduceops cache breaking beautiful_mnist
* test_sparse_categorical_crossentropy_simple
* starting tests
* atol from test_nn
* test_sparse_categorical_crossentropy_alt
* dont use torch
2024-08-07 00:02:54 +03:00
George Hotz
1417cc8df1
can reenable that test now (#5914)
2024-08-06 13:38:21 -07:00
George Hotz
75154d7ae2
add some types to the scheduler [run_process_replay] (#5941)
* add some types to the scheduler [run_process_replay]
* set -> dedup
2024-08-06 12:23:54 -07:00
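The `set -> dedup` swap above matters because a Python set loses insertion order; an order-preserving dedup can be written with `dict.fromkeys` (a common idiom, not necessarily tinygrad's exact helper):

```python
def dedup(xs):
    # dict.fromkeys preserves insertion order, so duplicates are dropped
    # without reordering, unlike list(set(xs))
    return list(dict.fromkeys(xs))
```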
George Hotz
e077bc7baf
move memory planner to realize (#5937)
2024-08-06 10:41:29 -07:00
chenyu
489575c3be
more UOp sum div with gcd tests (#5936)
* more UOp sum div with gcd tests
* one more
2024-08-06 12:50:10 -04:00
ignaciosica
81ae9fadc8
Float4 support for CLANG (#5915)
* float4 support on clang
* skip linearizer tests that require locals
* add aligned attribute
2024-08-06 07:50:12 -07:00
qazal
a7db4c3ee9
show timings for DIFF_ARANGE=1 (#5935)
* show timings for DIFF_ARANGE=1
* always with DEBUG=2
2024-08-06 17:20:38 +03:00
qazal
102a8c184b
diff fused arange schedules with ARANGE_DIFF=1 (#5934)
* diff fused arange schedules with ARANGE_DIFF=1
* better llama diff
2024-08-06 16:52:26 +03:00
qazal
f7761245aa
save_schedule pre toposort [run_process_replay] (#5933)
2024-08-06 15:10:01 +03:00
nimlgen
895e062723
nv remove useless init (#5932)
2024-08-06 14:41:40 +03:00
qazal
3d4742dd2e
override output shape in fused assign (#5930)
* override output shape in fused assign
This makes
```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.
* merge asserts
2024-08-06 13:28:50 +03:00
nimlgen
341c394c89
amd save exec offsets (#5928)
* amd save exec offsets
* fix
* better
* ugh
2024-08-06 12:11:46 +03:00
wozeparrot
5808e8a30f
mockgpu remu changes (#5925)
2024-08-05 19:26:58 -07:00
chenyu
09b7722637
UOp generic div folding (#5896)
2024-08-05 21:38:43 -04:00
George Hotz
3e1336957d
test arange with all opts (#5923)
* test arange with all opts
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
* Update test_arange.py
2024-08-05 18:38:25 -07:00
George Hotz
2e7adb529f
don't run kernels with 1000x more compute (fix BEAM with FUSE_ARANGE) (#5924)
2024-08-05 16:28:09 -07:00
George Hotz
5d17f54e3c
fast mnist indexing (#5921)
* fast mnist indexing
* more tests
* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00
George Hotz
e81c18f494
make the arange test check correctness [run_process_replay] (#5920)
2024-08-05 13:41:06 -07:00
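Checking arange correctness is cheap because the fused-arange trick computes `arange(n)[i]` as a reduction over a comparison mask; a toy model of that idea (an assumption about the kernel's shape, not tinygrad's code):

```python
def arange_via_reduce(n):
    # each output element i sums a 0/1 mask over j, counting j < i;
    # this mirrors how a fused arange becomes a masked reduce
    return [sum(1 for j in range(n) if j < i) for i in range(n)]
```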
George Hotz
8d1c884e78
capture the const pattern in both directions (#5919)
* capture the const pattern in both directions
* add regression test
2024-08-05 12:15:38 -07:00
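Capturing the const pattern "in both directions" means a commutative pattern like `x*c` must match whether the constant is the left or the right operand; a toy matcher illustrating the idea (not tinygrad's PatternMatcher API):

```python
def match_var_const(lhs, rhs):
    # return (variable, constant) regardless of operand order,
    # or None if neither side is a constant
    if isinstance(lhs, int) and not isinstance(rhs, int):
        return rhs, lhs
    if isinstance(rhs, int) and not isinstance(lhs, int):
        return lhs, rhs
    return None
```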
George Hotz
42f599870c
unroll arange is broken (#5918)
* unroll arange is broken
* fix unrolled arange
* one more test
2024-08-05 12:15:07 -07:00
wozeparrot
6740a0a6a0
hip_ioctl changes ( #5917 )
2024-08-05 11:58:38 -07:00
qazal
70949ea7e6
test cstyle compile error for max with inline const (#5838)
* test_failure_46
* GPU=1 fails too
* add test_renderer
* add failing platforms
* nv too
* assert return value
2024-08-05 19:02:16 +03:00
nimlgen
98df648a79
metal sync queues in transfer (#5308)
* metal sync queues
* cleaner
* need this
* oops
2024-08-05 18:43:22 +03:00
qazal
6a70c69167
hotfix: TC renders nv_bfloat16 (#5913)
* fix wmma bfloat16
* cleanup
2024-08-05 18:40:31 +03:00
P4ssenger
8ce9e6e693
Fix vectorized dtype rendering bug in CLANG (#5911)
* fix vectorized types rendering for clang
* fix bug in fix
* fix bug 2 in fix 2
2024-08-05 17:43:26 +03:00
qazal
e0c6520138
check arange fusing with VIEW and COPY (#5912)
* check arange fusing with VIEW and COPY
* gpu and clang
2024-08-05 17:09:21 +03:00
nimlgen
590b9ebb34
hcq copy queue is optional (#5909)
* hcq copy queue is optional
* one more
* this
2024-08-05 14:03:25 +03:00
George Hotz
159ac06b5b
remove unused reduce rules + improve unparented (#5908)
* remove unused reduce rules [run_process_replay]
* this work
* those tests are meaningless now
2024-08-04 18:18:27 -07:00
George Hotz
d7387d31bf
remove useless reduce cases [run_process_replay] (#5907)
* remove useless reduce cases [run_process_replay]
* do_reduce cleanup
* more cleanups + no longer supported tests
* Revert "more cleanups + no longer supported tests"
This reverts commit e9f2f6ba70.
* no longer supported tests
* switch ReduceOps.SUM -> BinaryOps.ADD
2024-08-04 17:11:08 -07:00
wozeparrot
94917521ee
fix: sqlite on pypy (#5906)
2024-08-04 16:40:59 -07:00
George Hotz
be8958e26b
use CONTRACT before REDUCE (#5903)
* use CONTRACT before REDUCE [run_process_replay]
* support half expand
* EXPAND GEP
2024-08-04 16:17:33 -07:00
wozeparrot
f33950f454
tracemeta fixups ( #5904 )
2024-08-04 16:15:06 -07:00
chenyu
adba5efc64
enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256
2024-08-04 18:48:46 -04:00
chenyu
4a65010de8
remove CUDACPU flag in tests [run_process_replay] (#5902)
no longer used
2024-08-04 16:06:38 -04:00
chenyu
996ff0c135
pow(2) -> square in RMSNorm [run_process_replay] (#5901)
reads nicer in metadata
2024-08-04 14:21:31 -04:00
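For reference, RMSNorm is `x / sqrt(mean(x^2) + eps)`; writing the elementwise step as `x*x` (square) instead of `pow(x, 2)` is numerically identical and only changes how the op reads in metadata. A plain-Python sketch of the formula, not tinygrad's RMSNorm class:

```python
import math

def rms_norm(xs, eps=1e-6):
    # square via x*x rather than pow(x, 2); same math, nicer metadata
    mean_sq = sum(x * x for x in xs) / len(xs)
    return [x / math.sqrt(mean_sq + eps) for x in xs]
```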
qazal
aad9234e52
test fused precompute_freqs_cis (#5900)
* test_precompute_freqs_cis
* tiny for ci
2024-08-04 21:01:05 +03:00
chenyu
c67e9887f7
support using str to specify dtype (#5897)
* support using str to specify dtype
in Tensor creation and args into `cast` and `bitcast`, and acc_dtype
* more tests
2024-08-04 12:56:28 -04:00
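Accepting a string where a dtype object is expected is a small normalization step; a toy sketch with a hypothetical name table (tinygrad resolves names onto its own DType objects):

```python
# hypothetical name table for illustration only
DTYPES = {"float32": float, "int32": int, "bool": bool}

def to_dtype(d):
    # pass dtype objects through; look strings up by name
    if isinstance(d, str):
        if d not in DTYPES:
            raise ValueError(f"unknown dtype: {d}")
        return DTYPES[d]
    return d
```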
nimlgen
4f9221e8dd
remove useless _ensure_shared_time_base ( #5899 )
2024-08-04 17:01:54 +03:00
qazal
4c5ef2cc4f
setitem with arange fusion 1 (#5898)
2024-08-04 16:09:21 +03:00
chenyu
59315ffc78
minor cleanup to UOp mod folding [run_process_replay] (#5895)
some walrus
2024-08-03 21:38:44 -04:00
nimlgen
dad8e72ee9
hcq graph refactor (#5887)
* cleanup
* prof
* cleaner
* comments
* more types
2024-08-03 23:35:33 +03:00
chenyu
da61dea1b2
simple failed UOp sub symbolic test case (#5894)
2024-08-03 14:27:23 -04:00