tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 07:18:40 -05:00

Author	SHA1	Message	Date
George Hotz	26498b322e	add BEAM to external_benchmark_schedule.py	2024-08-23 18:10:46 -07:00
George Hotz	53a73038e3	hotfix: TestGraphRewriteEfficiency.test_create_many_uops	2024-08-23 15:51:57 -07:00
George Hotz	7c3ba3fa8a	improve match stats + custom early reject [run_process_replay] (#6260) * improve match stats [run_process_replay] * custom_early_reject	2024-08-23 15:28:57 -07:00
George Hotz	0b0a8829fb	allowed_len early stop [run_process_replay] (#6257) * vectorize single rule [run_process_replay] * allowed_len gate * i mean, i guess i like the rule * cleaner way to write that, and faster	2024-08-23 13:31:07 -07:00
George Hotz	a18744188f	more early reject [run_process_replay] (#6254) * simple matcher in alu [run_process_replay] * never mind, i don't like simple matcher * allowed_len == 0 is okay sometimes * more generic matcher	2024-08-23 12:16:44 -07:00
qazal	0d4887e9df	use UOps.WMMA everywhere (#6255) * add UOps.WMMA_AXIS * delete ReduceOps.WMMA from ops	2024-08-23 15:03:26 -04:00
chenyu	66d0b14a20	simpler CMPLT UOp _min_max [run_process_replay] (#6251)	2024-08-23 10:36:16 -04:00
chenyu	590c0922b6	Tensor.prod (#6250) * Tensor.prod a new reduce op! * onnx ReduceProd	2024-08-23 10:06:32 -04:00
qazal	78d6bd8b41	start graph rewrite in the scheduler (#6248) * start graph rewrite in the scheduler * test: enable it * test timings * only fails in multi reduce * more isolated tests	2024-08-23 13:15:55 +03:00
chenyu	75700edf73	minor bitcast touchup (#6246) `not A == B` -> `A != B`	2024-08-22 20:25:28 -04:00
chenyu	4d40de867b	remove redundant `c1-(x+c2)` rule [run_process_replay] (#6243)	2024-08-22 16:45:49 -04:00
George Hotz	238896ca02	loooking into graph rewrite speed (#6239) * loooking into graph rewrite speed * track, replace is slow * if all same, no permutations [run_process_replay] * types so compile works * no implied comprehension * TRACK_MATCH_STATS=2	2024-08-22 13:17:55 -07:00
chenyu	f62c4b3b5f	remove redundant `-(xc)` pattern [run_process_replay] (#6242) covered by `xc0*c1`	2024-08-22 16:11:02 -04:00
chenyu	e745e16441	remove UnaryOps.NEG (#6238) * Remove UnaryOps.NEG generated new dataset with `time JIT=2 PYTHONPATH=. ./extra/optimization/generate_dataset.sh`, then `gzip /tmp/sops`, then `mv /tmp/sops.gz extra/datasets/` * fix that	2024-08-22 14:21:39 -04:00
nimlgen	6c4ddd6260	hcq skip tests when no multidev (#6235) * hcq skip tests when no multidev * linter * a bit higher tinout	2024-08-22 18:27:16 +03:00
chenyu	08539f08b0	fix UOp repr with Variable in arg (#6236)	2024-08-22 11:06:33 -04:00
chenyu	3fc8203475	remove NEG from handwritten ast in tests (#6234) * remove NEG from handwritten ast in tests * test_linearizer_failures	2024-08-22 09:06:59 -04:00
chenyu	1c5ef5b793	format test_linearizer_failure (#6231) made it easier to remove NEG	2024-08-21 21:10:56 -04:00
George Hotz	5cdec79469	simpler expand without dont_expand_args [run_process_replay] (#6230) * simpler expand without dont_expand_args [run_process_replay] * Revert "simpler expand without dont_expand_args [run_process_replay]" This reverts commit 81693024c097c31e601f1a199a631e9eda0d9638. * exclude_args * why does that fix it * correct fix * _swizzle_args should be fast * add comment * zip is tuples	2024-08-21 17:48:45 -07:00
nimlgen	78c94abe9c	raise time limit for ci in test_profile_multidev_transfer (#6227)	2024-08-21 22:42:03 +03:00
gswangg	c74b318458	migrate test_linearizer.py to UOp AST, pt. 2 (#6228)	2024-08-21 22:16:11 +03:00
George Hotz	c3168952f0	wip: tracking pattern matcher [run_process_replay] (#6225) * wip: tracking pattern matcher * better * proper dedup * timing * early reject * mergable match stats * TrackedPattenMatcher * fix TrackedPattenMatcher * cleanups * clean that too * remove early_reject * Revert "remove early_reject" This reverts commit dc2aef14b8f5da58f5ec9566daf252513cac394c. * total * sort by time * match_stats cleanup	2024-08-21 11:57:26 -07:00
chenyu	a666450e4d	UOp pattern x + x -> x * 2 (#6224) * UOp pattern x + x -> x * 2 now there's no NEG, with this it covers all kinds of ax+bx * can remove x-x	2024-08-21 12:06:19 -04:00
chenyu	c9a9631818	no UnaryOps.NEG in generated UOp patterns (#6209) * no UnaryOps.NEG in generated UOp patterns removed pattern `x * (-1) -> -x` and `x != True` * those are fine because NEG became CMPNE and True * fix sd validation L2 norm	2024-08-21 11:08:22 -04:00
qazal	3b8cc5a3e0	more multireduce tests prep for neg removal [run_process_replay] (#6220)	2024-08-21 12:45:24 +03:00
qazal	86c036f0d3	reorder uops.py [run_process_replay] (#6219) * reorder uops.py [run_process_replay] * nop spacing	2024-08-21 11:39:55 +03:00
qazal	f03e5a4b3b	test_multireduce const has a shape (#6218)	2024-08-21 11:02:45 +03:00
George Hotz	911bf7216c	remove unused match rules [run_process_replay] (#6217)	2024-08-21 00:16:04 -07:00
George Hotz	2c42e9c2c6	faster rewrite, no folder in expand/reduce [run_process_replay] (#6216) * faster rewrite, no folder in expand/reduce [run_process_replay] * is removing the expander there okay * parens * don't reconstruct exact match uop * fast do_reduce * expand pyint * most of the parents gains with less lines	2024-08-20 23:36:58 -07:00
George Hotz	16f420f7a7	split full_graph_rewrite and linearize_uop [run_process_replay] (#6215) * split full_graph_rewrite and linearize_uop * fix tests * graph rewrite in test uops * add types	2024-08-20 20:12:33 -07:00
George Hotz	9faf205601	CIFAR trainer + various bugfixes / improvements (#6146) * move cifar into datasets * support for pathlib Tensors, tar_extract, and fetch gunzip * too early for Device.DEFAULT * simpler hlb_cifar + .to(None) is default * new compiler failure, start beautiful_cifar * beautiful cifar runs but is broken * jit train step * cleaner * std_mean, not mean_std * more correct * fast indexing * don't print that * torch load broken * add eval * nicer bar * decoraters are the way to do this * bounds check the load * a few ops * batchnorm bugfix, if track_running_stats is False, use online estimate * full timing * fix fusion * unneeded realize * master tensor	2024-08-20 16:58:46 -07:00
George Hotz	296368f0dd	Revert "delete arg from cast [run_process_replay] (#6202)" (#6214) This reverts commit `ec52a09393`.	2024-08-20 16:45:30 -07:00
nimlgen	89c4cffd86	nv fix size in SET_SEMAPHORE_A (#6213)	2024-08-21 01:47:10 +03:00
qazal	ec52a09393	delete arg from cast [run_process_replay] (#6202)	2024-08-20 14:06:16 -07:00
Francis Lam	7376b67e36	extra/gemm/triton_nv_matmul: fix Program arguments (#6212) remove op_estimate	2024-08-20 14:05:38 -07:00
madt2709	4bb98d8882	Fix track_running_stats in batchnorm (#6200) * Fix track_running_stats in batchnorm * Fix linter * Update test_fold_conv_batchnorm_notrain to keep allowed at 1 * Add test_fold_conv_batchnorm_notrain_no_running_stats * Save 1 line	2024-08-20 14:01:22 -07:00
George Hotz	d9c62a33c3	add cifar to datasets.py (#6210)	2024-08-20 11:42:49 -07:00
George Hotz	a5d79688db	fix indexing out of bounds (#6208) * fix indeing out of bounds * 5 ops per access is fine	2024-08-20 11:34:56 -07:00
chenyu	4451bcaf95	update test_arange test_llama_embedding_opt (#6207) non CI uses larger embedding, still same orders of magnitude	2024-08-20 13:58:43 -04:00
ignaciosica	e4bb63c1be	Refactor amd kernel prefix (#6205) * refactor amd kernel_prefix * restore removed comment * nit	2024-08-20 10:37:36 -07:00
qazal	074cf780dd	add option to only benchmark schedule [run_process_replay] (#6204)	2024-08-20 16:51:27 +03:00
Francis Lata	8fd8b970b0	update URL to eval cases from recent MLPerf file movements (#6201)	2024-08-20 08:43:13 -04:00
gswangg	0e6f057eae	migrate test_linearizer.py to UOP AST (pt. 1) (#6150) * migrate test_multioutput to UOP AST * inline buf declarations * migrate test_multireduce to UOp AST * update test_mid_dim_multireduce to UOp AST * update test_triple_multireduce with UOp AST * make global definitions more concise * update test_double_reduce_multireduce with UOp AST * update test_multireduce_with_parallel with UOp AST * update test_multiout_multireduce to UOp AST * make gidx style consistent across updated tests --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-08-20 10:02:20 +03:00
chenyu	10330a41c7	add CMPNE tests in test_uops (#6196) fixed the output_dtype for CMPNE and match the tests for CMPLT	2024-08-19 19:41:21 -04:00
chenyu	21d6739237	remove UnaryOps.NEG from lazy.py (#6193) * remove UnaryOps.NEG from lazy.py * neg is no longer unary	2024-08-19 18:41:28 -04:00
chenyu	4d1b5781b5	remove UnaryOps.NEG from function.py (#6187) * remove function.Neg prep to remove UnaryOps.NEG * replace all NEG in function.py	2024-08-19 17:39:15 -04:00
nimlgen	bc44e6501b	_gpu_alloc -> allocator.alloc (#6189) * _gpu_alloc -> allocator.alloc * not needed this import * pylint	2024-08-19 23:34:22 +03:00
chenyu	96d502d8b7	update function.Max backward (#6190) instead of `(1-(x!=max))`, use `(x!=max)!=True`. prep to remove Unary.NEG, also this can be instruction fused later more easily	2024-08-19 16:06:14 -04:00
Gabe Caldwell	bdd6325f31	default num_classes value for one_hot (#6182) * num_classes=-1 If num_classes set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor. * num_classes desc comment to explain num_classes default and what that means. * replacing ' with `	2024-08-19 12:07:14 -07:00
Alessandro Benetti	9328248610	support for std_mean and cross_entropy (#6181) * support for std_mean and cross_entropy (#3) * Cross entropy and std mean support * remove extra examples	2024-08-19 12:06:44 -07:00

5720 Commits