* faster rewrite, no folder in expand/reduce [run_process_replay]
* is removing the expander there okay?
* parens
* don't reconstruct exact match uop
* fast do_reduce
* expand pyint
* most of the parents gains with fewer lines
* move cifar into datasets
* support for pathlib Tensors, tar_extract, and fetch gunzip
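A minimal sketch of what these additions enable, assuming tinygrad's API at the time (paths and URL are illustrative):

```python
from pathlib import Path
from tinygrad import Tensor
from tinygrad.helpers import fetch
from tinygrad.nn.state import tar_extract

# a Tensor can be constructed directly from a pathlib.Path (backed by the file on disk)
t = Tensor(Path("/tmp/data.bin"))

# fetch can now gunzip the downloaded file before returning its path
fp = fetch("https://example.com/archive.gz", gunzip=True)

# tar_extract unpacks a tar archive Tensor into a dict of Tensors
weights = tar_extract(Tensor(Path("/tmp/checkpoint.tar")))
```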
* too early for Device.DEFAULT
* simpler hlb_cifar + .to(None) is default
* new compiler failure, start beautiful_cifar
* beautiful cifar runs but is broken
* jit train step
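The jitted train step follows tinygrad's usual TinyJit pattern; a minimal sketch (model, data, and hyperparameters are placeholders, not the beautiful_cifar code):

```python
from tinygrad import Tensor, TinyJit, nn

model = nn.Linear(784, 10)
opt = nn.optim.SGD(nn.state.get_parameters(model), lr=0.01)

@TinyJit
def train_step(x: Tensor, y: Tensor) -> Tensor:
    with Tensor.train():
        opt.zero_grad()
        loss = model(x).sparse_categorical_crossentropy(y).backward()
        opt.step()
        return loss.realize()
```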
* cleaner
* std_mean, not mean_std
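The rename matches torch.std_mean's return order; a short usage sketch:

```python
from tinygrad import Tensor

t = Tensor([[1.0, 2.0], [3.0, 4.0]])
std, mean = t.std_mean()  # std first, then mean, like torch.std_mean
```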
* more correct
* fast indexing
* don't print that
* torch load broken
* add eval
* nicer bar
* decorators are the way to do this
* bounds check the load
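A sketch of the idea behind bounds-checking a load: clamp the address so the access is always in range, and gate the result so out-of-range indices yield a default instead of garbage (names are illustrative, not tinygrad internals):

```python
def bounds_checked_load(buf: list, idx: int, default=0.0):
    valid = 0 <= idx < len(buf)                # gate: is the index in range?
    safe_idx = min(max(idx, 0), len(buf) - 1)  # clamp so the access itself is safe
    return buf[safe_idx] if valid else default
```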
* a few ops
* batchnorm bugfix: if track_running_stats is False, use the online estimate
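The semantics of this fix, as a framework-agnostic sketch: with track_running_stats=False there are no running statistics to fall back on, so even in eval mode the layer must use the batch ("online") estimate (names are illustrative):

```python
def batchnorm_stats(x, running_mean, running_var, training, track_running_stats):
    # batch statistics when training, or whenever no running stats are tracked
    if training or not track_running_stats or running_mean is None:
        mean = x.mean(axis=0)
        var = ((x - mean) ** 2).mean(axis=0)
        return mean, var
    return running_mean, running_var  # eval with tracked stats
```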
* full timing
* fix fusion
* unneeded realize
* master tensor
* Fix track_running_stats in batchnorm
* Fix linter
* Update test_fold_conv_batchnorm_notrain to keep allowed at 1
* Add test_fold_conv_batchnorm_notrain_no_running_stats
* Save 1 line
* num_classes=-1
If num_classes is set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor.
* num_classes desc
Add a comment to explain the num_classes default and what it means.
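Assuming this refers to Tensor.one_hot (the description above matches), a short example of the inferred default:

```python
from tinygrad import Tensor

labels = Tensor([0, 2, 1])
print(labels.one_hot().shape)               # num_classes inferred as max+1 -> (3, 3)
print(labels.one_hot(num_classes=5).shape)  # explicit -> (3, 5)
```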
* replacing ' with `
* add support for retain_graph in backward
* fix: don't accumulate grad on non-leaf tensors
* fix order
* fix: do not delete grad on leaf tensors
* fix linter
* fix: can't exactly match torch behaviour internally
* allow numerical room for test
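Taken together, the autograd commits above mean backward(retain_graph=True) keeps the graph alive for a second pass, and .grad accumulates only on leaf tensors; a hedged sketch of the resulting behavior:

```python
from tinygrad import Tensor

x = Tensor([2.0], requires_grad=True)  # leaf tensor
y = x * x                              # non-leaf intermediate
loss = y.sum()

loss.backward(retain_graph=True)  # keep the graph for another pass
loss.backward()                   # grad accumulates on the leaf across calls
print(x.grad.numpy())             # 2*x per pass, accumulated: [8.]
print(y.grad)                     # None: non-leaf tensors don't get .grad
```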
* refactor
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
* fixed xmx demo
* I think I'm invoking the DPAS, but it's slow
* compiler build arg to stop register spilling; indicated where to fix the flop counter
* don't mind this
* do NOT mind me
* do not mind me
* do not view
* I will add bf16 later
* in the process of figuring out tc fields
* we figured out the fields!!!
* added check for CL device vendor, added separate IntelRenderer
* remove tc thread_local_aliases
* cleaning debris before draft PR
* edits for linter
* deduping and checking device extensions
* I will find more line reductions in other places
* before merge upstream
* double GRF size in compiler to fix register spilling (band-aid), device checking changes
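On the register-spilling band-aid: Intel's OpenCL compiler accepts a build option that requests a larger general register file per thread. A pyopencl sketch; the exact flag name is my assumption from Intel's compiler documentation, not copied from this PR:

```python
import pyopencl as cl

src = "__kernel void scale(__global float *x) { x[get_global_id(0)] *= 2.0f; }"
ctx = cl.create_some_context()
# request 256 GRF per thread (double the default) to avoid spills on Intel GPUs
prg = cl.Program(ctx, src).build(options=["-cl-intel-256-GRF-per-thread"])
```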
* tc python emulation
* fixed emulation
* tests for emulated intel tensor core
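The Python emulation boils down to what one DPAS-style tensor-core op computes: fp16 inputs with fp32 accumulation over a fixed tile. A minimal numpy sketch; the tile shape here is illustrative, not the exact XMX dims:

```python
import numpy as np

M, N, K = 8, 8, 16  # one tile: C[M,N] += A[M,K] @ B[K,N] (shape is an assumption)
A = np.random.randn(M, K).astype(np.float16)
B = np.random.randn(K, N).astype(np.float16)
C = np.zeros((M, N), dtype=np.float32)

# emulate the hardware op: widen fp16 inputs and accumulate in fp32
C += A.astype(np.float32) @ B.astype(np.float32)
```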
* TC=0 and TC=1 working on upstream, fixed perf
* test
* debris
* check for specialized cl device when we canonicalize device
* bf16 support, tc=3 test added
* address tests
* revert half2 loads on intel tc, cleanup
* linter
* fold_expanded revert
* lint, whitespace fix
* CUDA bf16 (the only one with bf16) is skipped in the tensor core tests, so I will skip Intel bf16 too
* make line shorter, no need for noqa E501
* removed device intel
* fix python emulation
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>