tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-18 02:21:40 -05:00

Author	SHA1	Message	Date
qazal	bcb2f1caa3	init REDUCE_AXIS with BinaryOps (#6256 ) * REDUCE_AXIS arg with BinaryOps * more work in kernel.py fixup sops.gz * fix TestGraphRewriteEfficiency	2024-08-24 11:28:41 +03:00
chenyu	da5cf11859	fix acc init value for MUL (#6263 )	2024-08-23 23:19:44 -04:00
George Hotz	26498b322e	add BEAM to external_benchmark_schedule.py	2024-08-23 18:10:46 -07:00
George Hotz	53a73038e3	hotfix: TestGraphRewriteEfficiency.test_create_many_uops	2024-08-23 15:51:57 -07:00
chenyu	590c0922b6	Tensor.prod (#6250 ) * Tensor.prod a new reduce op! * onnx ReduceProd	2024-08-23 10:06:32 -04:00
qazal	78d6bd8b41	start graph rewrite in the scheduler (#6248 ) * start graph rewrite in the scheduler * test: enable it * test timings * only fails in multi reduce * more isolated tests	2024-08-23 13:15:55 +03:00
George Hotz	238896ca02	loooking into graph rewrite speed (#6239 ) * loooking into graph rewrite speed * track, replace is slow * if all same, no permutations [run_process_replay] * types so compile works * no implied comprehension * TRACK_MATCH_STATS=2	2024-08-22 13:17:55 -07:00
chenyu	e745e16441	remove UnaryOps.NEG (#6238 ) * Remove UnaryOps.NEG generated new dataset with ``` time JIT=2 PYTHONPATH=. ./extra/optimization/generate_dataset.sh gzip /tmp/sops mv /tmp/sops.gz extra/datasets/ ``` * fix that	2024-08-22 14:21:39 -04:00
nimlgen	6c4ddd6260	hcq skip tests when no multidev (#6235 ) * hcq skip tests when no multidev * linter * a bit higher tinout	2024-08-22 18:27:16 +03:00
chenyu	08539f08b0	fix UOp repr with Variable in arg (#6236 )	2024-08-22 11:06:33 -04:00
chenyu	3fc8203475	remove NEG from handwritten ast in tests (#6234 ) * remove NEG from handwritten ast in tests * test_linearizer_failures	2024-08-22 09:06:59 -04:00
chenyu	1c5ef5b793	format test_linearizer_failure (#6231 ) made it easier to remove NEG	2024-08-21 21:10:56 -04:00
nimlgen	78c94abe9c	raise time limit for ci in test_profile_multidev_transfer (#6227 )	2024-08-21 22:42:03 +03:00
gswangg	c74b318458	migrate test_linearizer.py to UOp AST, pt. 2 (#6228 )	2024-08-21 22:16:11 +03:00
George Hotz	c3168952f0	wip: tracking pattern matcher [run_process_replay] (#6225 ) * wip: tracking pattern matcher * better * proper dedup * timing * early reject * mergable match stats * TrackedPattenMatcher * fix TrackedPattenMatcher * cleanups * clean that too * remove early_reject * Revert "remove early_reject" This reverts commit dc2aef14b8f5da58f5ec9566daf252513cac394c. * total * sort by time * match_stats cleanup	2024-08-21 11:57:26 -07:00
chenyu	a666450e4d	UOp pattern x + x -> x * 2 (#6224 ) * UOp pattern x + x -> x * 2 now there's no NEG, with this it covers all kinds of ax+bx * can remove x-x	2024-08-21 12:06:19 -04:00
chenyu	c9a9631818	no UnaryOps.NEG in generated UOp patterns (#6209 ) * no UnaryOps.NEG in generated UOp patterns removed pattern `x * (-1) -> -x` and `x != True` * those are fine because NEG became CMPNE and True * fix sd validation L2 norm	2024-08-21 11:08:22 -04:00
qazal	3b8cc5a3e0	more multireduce tests prep for neg removal [run_process_replay] (#6220 )	2024-08-21 12:45:24 +03:00
qazal	f03e5a4b3b	test_multireduce const has a shape (#6218 )	2024-08-21 11:02:45 +03:00
George Hotz	2c42e9c2c6	faster rewrite, no folder in expand/reduce [run_process_replay] (#6216 ) * faster rewrite, no folder in expand/reduce [run_process_replay] * is removing the expander there okay * parens * don't reconstruct exact match uop * fast do_reduce * expand pyint * most of the parents gains with less lines	2024-08-20 23:36:58 -07:00
George Hotz	16f420f7a7	split full_graph_rewrite and linearize_uop [run_process_replay] (#6215 ) * split full_graph_rewrite and linearize_uop * fix tests * graph rewrite in test uops * add types	2024-08-20 20:12:33 -07:00
George Hotz	9faf205601	CIFAR trainer + various bugfixes / improvements (#6146 ) * move cifar into datasets * support for pathlib Tensors, tar_extract, and fetch gunzip * too early for Device.DEFAULT * simpler hlb_cifar + .to(None) is default * new compiler failure, start beautiful_cifar * beautiful cifar runs but is broken * jit train step * cleaner * std_mean, not mean_std * more correct * fast indexing * don't print that * torch load broken * add eval * nicer bar * decoraters are the way to do this * bounds check the load * a few ops * batchnorm bugfix, if track_running_stats is False, use online estimate * full timing * fix fusion * unneeded realize * master tensor	2024-08-20 16:58:46 -07:00
madt2709	4bb98d8882	Fix track_running_stats in batchnorm (#6200 ) * Fix track_running_stats in batchnorm * Fix linter * Update test_fold_conv_batchnorm_notrain to keep allowed at 1 * Add test_fold_conv_batchnorm_notrain_no_running_stats * Save 1 line	2024-08-20 14:01:22 -07:00
George Hotz	a5d79688db	fix indexing out of bounds (#6208 ) * fix indeing out of bounds * 5 ops per access is fine	2024-08-20 11:34:56 -07:00
chenyu	4451bcaf95	update test_arange test_llama_embedding_opt (#6207 ) non CI uses larger embedding, still same orders of magnitude	2024-08-20 13:58:43 -04:00
qazal	074cf780dd	add option to only benchmark schedule [run_process_replay] (#6204 )	2024-08-20 16:51:27 +03:00
gswangg	0e6f057eae	migrate test_linearizer.py to UOP AST (pt. 1) (#6150 ) * migrate test_multioutput to UOP AST * inline buf declarations * migrate test_multireduce to UOp AST * update test_mid_dim_multireduce to UOp AST * update test_triple_multireduce with UOp AST * make global definitions more concise * update test_double_reduce_multireduce with UOp AST * update test_multireduce_with_parallel with UOp AST * update test_multiout_multireduce to UOp AST * make gidx style consistent across updated tests --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-08-20 10:02:20 +03:00
chenyu	10330a41c7	add CMPNE tests in test_uops (#6196 ) fixed the output_dtype for CMPNE and match the tests for CMPLT	2024-08-19 19:41:21 -04:00
chenyu	21d6739237	remove UnaryOps.NEG from lazy.py (#6193 ) * remove UnaryOps.NEG from lazy.py * neg is no longer unary	2024-08-19 18:41:28 -04:00
Gabe Caldwell	bdd6325f31	default num_classes value for one_hot (#6182 ) * num_classes=-1 If num_classes set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor. * num_classes desc comment to explain num_classes default and what that means. * replacing ' with `	2024-08-19 12:07:14 -07:00
Alessandro Benetti	9328248610	support for std_mean and cross_entropy (#6181 ) * support for std_mean and cross_entropy (#3) * Cross entropy and std mean support * remove extra examples	2024-08-19 12:06:44 -07:00
Max-We	53b20afa3f	Write tar_extract (#6180 ) * Add tar_extract * Add tar_extract tests * Fix dtype for initialization from path * Tests for path initialization * rm print --------- Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>	2024-08-19 12:06:17 -07:00
Eitan Turok	8556d0c642	Support `gunzip` in `fetch` (#6176 ) * init * update * clean * add type * clean * fix import order * shorten variable names	2024-08-19 12:04:40 -07:00
samm393	5d742f7fe3	Missing features from rearrange (#6184 ) * fixes and tests * typo in test	2024-08-19 11:19:07 -07:00
qazal	2242ff84be	type verify intermediate UOps [run_process_replay] (#6140 ) * type verify intermediate UOps [run_process_replay] * merge asserts * variable const	2024-08-19 20:59:01 +03:00
qazal	478145cb8e	lowering error in diff_schedule is fine [run_process_replay] (#6185 )	2024-08-19 20:51:12 +03:00
chenyu	00578a021b	re:6125 switch real_size to use uops [run_process_replay] (#6138 ) * switch real_size to use uops [run_process_replay] * enough to pass --------- Co-authored-by: George Hotz <geohot@gmail.com>	2024-08-19 13:20:24 -04:00
qazal	e28d29641f	more scheduler process replay tooling [run_process_replay] (#6178 )	2024-08-19 15:35:51 +03:00
chenyu	b36a7273c6	RUF018 assignment-in-assert [run_process_replay] (#6172 ) assertion should not have side effect or `-O` breaks. initially just wanted to fix the one in rearrange, but it also made some long lines less long	2024-08-19 00:34:52 -04:00
chenyu	9c60a27ece	lower float64 sin fuzzer threshold (#6173 ) 139216373.71875 failed https://github.com/tinygrad/tinygrad/actions/runs/10446960642/job/28925156240	2024-08-19 00:25:42 -04:00
samm393	fd7c84c1c8	Rearrange (#6106 ) * rearrange and tests * tidy * whitespace * remove line * -5 lines * test fix * static -> instance * fix () & add more tests * remove flags * -1 line * match einops * whitespace * repeated names	2024-08-18 20:22:28 -07:00
chenyu	2de174677a	threefry touchup [run_process_replay] (#6169 ) also why is test_gc testing _rng_counter is allocated??	2024-08-18 23:01:24 -04:00
David González Martínez	724e408736	add support for retain_graph in backward (#6145 ) * add support for retain_graph in backward * fix: dont accumulate grad on non-leaf tensors * fix order * fix: do not delete grad on leafs * fix linter * fix: can't exactly match torch behaviour internally * allow numerical room for test * refactor	2024-08-18 16:08:31 -07:00
wozeparrot	0c5189de25	threefry half (#6154 )	2024-08-18 15:23:12 -07:00
Timmy	e3d14d1ccc	Lowerer Multireduce Grouping (#6097 ) * grouping changes to codegen * linters + tests * fix identical store issue on PTX * comment in grouping multireduce tests * cleaning up diff * cleaning up diff * comments * linters * hotfix: dont change kernels --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-08-18 19:57:51 +03:00
qazal	1ba83cc7fa	split test_sgd_4convs_fuse [run_process_replay] (#6158 )	2024-08-18 18:35:42 +03:00
qazal	be6dda4093	hotfix: more lazyop rename to uop [run_process_replay] (#6157 )	2024-08-18 17:28:44 +03:00
George Hotz	17a043edad	tensor inference (#6156 ) * tensor inference * test is even better name	2024-08-18 00:19:28 -07:00
chenyu	f7950fc2b6	add E275 missing-whitespace-after-keyword linting rule (#6149 ) requires space after keywords like `assert`, `not`, `return`, `else`	2024-08-17 16:44:34 -04:00
George Hotz	88edc2902d	axis_is_masked with graph_rewrite [run_process_replay] (#6144 )	2024-08-17 10:28:49 -07:00


2555 Commits