Commit Graph

6249 Commits

qazal
9250452da4 no codegen import in ops [pr] (#6888)
* no codegen import in ops [pr]

* @track_rewrites

* all functions need this

* polish
2024-10-07 20:54:21 +08:00
George Hotz
f7f94cd62f bitcast cleanup [pr] (#6933) 2024-10-07 19:16:16 +08:00
chenyu
0cf815a93a bert use BS=66 and update hparams (#6932)
with dropout memory improvement, we can fit BS=66 now. revert to the hparams in #5891 too
2024-10-07 05:08:27 -04:00
ignaciosica
32ac24c45c Generic wmma rendering for cuda, ptx [run_process_replay] (#6838)
* generic wmma rendering for cuda, ptx

- also adds wmma generic shape ops_python support

* hotfix: fixed values in ops_python

* hotfix: more fixed values

* hotfix: revert changes in ops_python

* refactor wmma rendering

* hotfix: get n_args directly

* hotfix: use n_args[0] for a

* hotfix: simplify

* hotfix: add args_slices

* hotfix: rename args back to operands

* hotfix: fix spacing

* hotfix: rename upc to sz

* hotfix: rename args to operands in assembly

* hotfix: space

* hotfix: add comment for literal 4

* hotfix: rename some variables and change for clarity
2024-10-07 16:36:36 +08:00
qazal
b82023c97e process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]

* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
qazal
16312b4c59 rip out old scheduler process replay stuff, diff pure UOps [pr] (#6927) 2024-10-07 13:20:35 +08:00
chenyu
999e3780e9 dropout contiguous after >= p (#6892)
make it a bool buffer
2024-10-06 19:40:42 -04:00
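The "make it a bool buffer" note refers to storing the dropout keep-mask as booleans (from the `>= p` comparison) rather than as floats. A minimal pure-Python sketch of the idea — names and structure are hypothetical, not tinygrad's actual implementation:

```python
import random

def dropout(values: list[float], p: float = 0.5, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    # the comparison against p yields a bool buffer, cheaper than floats
    mask = [rng.random() >= p for _ in values]
    scale = 1.0 / (1.0 - p)  # rescale survivors to preserve the expectation
    return [v * scale if keep else 0.0 for v, keep in zip(values, mask)]
```

With `p=0.5` every surviving element is doubled, so the expected value of each output matches its input.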
wozeparrot
9eb6eef441 seed in tensor (#6869) 2024-10-06 14:46:58 -04:00
Tobias Fischer
f9e32f2bb2 clip device fix (#6924) 2024-10-07 00:47:32 +08:00
chenyu
01a2d7316d dtype=float in bert log_softmax for loss and accuracy (#6916) 2024-10-06 11:15:56 -04:00
jeffzh4ng
19a7e41113 implement logcumsumexp (#6921)
* implement logcumsumexp

* change axis=None to axis=0
2024-10-06 10:45:36 -04:00
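`logcumsumexp` computes the running log-sum-exp along an axis. A pure-Python sketch of the numerically stable recurrence (illustrative only, not the tinygrad kernel):

```python
import math

def logcumsumexp(xs: list[float]) -> list[float]:
    # out[i] = log(exp(x_0) + ... + exp(x_i)), kept stable by
    # factoring out the running max before exponentiating
    out: list[float] = []
    running = None
    for x in xs:
        if running is None:
            running = x
        else:
            m = max(running, x)
            running = m + math.log(math.exp(running - m) + math.exp(x - m))
        out.append(running)
    return out
```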
George Hotz
f588169fdc hotfix: ad for DEBUG=2 in the mnist tutorial 2024-10-06 21:05:48 +08:00
qazal
10ff1d6fb9 viz prep refactor for tracked scope decorator [pr] (#6920)
* viz prep refactor for tracked scope decorator [pr]

* fix fuzzer
2024-10-06 16:02:09 +03:00
qazal
837f9c6832 new viz fuzz tests, track multiple contexts (#6913)
* add FUZZ_VIZ option

* add FUZZ_VIZ=1 tests

* use .replace

* rewrites test

* add rewrite_stack

* add FUZZ_VIZ to ops

* what if FUZZ_VIZ was up there

* leave fuzz_viz for now
2024-10-06 14:58:15 +03:00
chenyu
75d9dcf000 support dtype in softmax and log_softmax (#6914)
matches torch. for mixed precision training, we would want to use float for softmax
2024-10-06 07:18:15 -04:00
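The point of a `dtype` argument here is that in mixed precision training the inputs may be half precision, but the exp/sum should accumulate in float. A plain-Python sketch with a hypothetical `dtype` parameter standing in for the tensor-level cast:

```python
import math

def log_softmax(xs: list[float], dtype=float) -> list[float]:
    # cast up front, so the reduction runs in the requested precision
    xs = [dtype(x) for x in xs]
    m = max(xs)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]
```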
chenyu
718b959349 log epoch start and stop for bert (#6912) 2024-10-06 06:39:46 -04:00
qazal
b066ef2282 small changes from the viz_rewrite branch [pr] (#6907)
* simpler replace

* dont show shapetracker consts

* changed_nodes shouldn't exist for the first sink
2024-10-06 12:00:55 +03:00
chenyu
16c1fa4208 use BEAM=3 for red box bert runs (#6904)
BEAM=4 slightly exceeded 30 minutes setup
2024-10-05 09:21:12 -04:00
chenyu
0e706227a2 add seed to bert result log filename (#6903)
* add seed to bert result log filename

* different name for different benchmark
2024-10-05 09:15:24 -04:00
George Hotz
8ed3a00c9c ceildiv helper [pr] (#6899) 2024-10-05 14:59:10 +08:00
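A `ceildiv` helper is commonly written branch-free via floor division on negated operands; a sketch of one such formulation (this may not match tinygrad's exact definition):

```python
def ceildiv(num: int, amt: int) -> int:
    # ceiling division without floats: Python's // floors, so
    # negating twice rounds toward positive infinity instead
    return -(num // -amt)
```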
chenyu
fd68b6dbc2 type annotation to round_up (#6898)
* type annotation to round_up

also cleaned up places where round_up was potentially called on symbolic

* fix
2024-10-04 23:27:23 -04:00
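`round_up(num, amt)` returns the smallest multiple of `amt` at or above `num`. A hedged sketch of such a helper with the integer annotation the commit describes (symbolic callers, per the commit body, would need separate handling):

```python
def round_up(num: int, amt: int) -> int:
    # smallest multiple of amt that is >= num
    return (num + amt - 1) // amt * amt
```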
chenyu
3c12244cfc remove DTypeLike from lazy (#6897)
keep only in tensor
2024-10-04 22:49:21 -04:00
George Hotz
0d6216aba1 bump the download cache (#6896) 2024-10-05 10:23:18 +08:00
George Hotz
4058a99275 symbolic in ops 2 [pr] (#6895)
* move symbolic to ops, simple [pr]

* fix for shapetracker
2024-10-05 10:20:07 +08:00
chenyu
08414d7b7c cleanup test_uop_symbolic.py (#6894)
no more test_symbolic for reference, so force expected output to be exact instead of a set
2024-10-04 20:53:10 -04:00
ignaciosica
555bcb5e54 static access for code_for_op (#6889) 2024-10-05 07:38:01 +08:00
vladov
5f6b6162b3 Suppress warnings in transcendental tests. (#6891) 2024-10-05 07:37:17 +08:00
nimlgen
707c805a68 nv set localmem sm count to max (#6890) 2024-10-04 23:29:46 +03:00
George Hotz
4df5c7a4ef move lazy to engine [pr] (#6886)
* move lazy to engine [pr]

* engine.lazy
2024-10-04 23:19:26 +08:00
George Hotz
6b063450df move hcq device to runtime [pr] (#6879)
* things that are only used in one place don't belong in helpers [pr]

* start moving hcq device [pr]

* fix paths
2024-10-04 22:26:50 +08:00
George Hotz
5be2bd18a6 use UOps.BIND instead of ASSIGN, it's different (#6885) 2024-10-04 22:26:33 +08:00
chenyu
4c3895744e type annotation for layernorm (#6883) 2024-10-04 09:03:56 -04:00
George Hotz
8ca506ee37 remove the magic methods for moving between devices [pr] (#6881)
* remove the magic methods for moving between devices [pr]

* remove unneeded clang
2024-10-04 20:27:52 +08:00
chenyu
7c8849010a fix var_vals in MCTS (#6882)
tested with JITBEAM=100 llama
2024-10-04 08:19:35 -04:00
George Hotz
a0cb16ac61 node cleanup + local metal test speed [pr] (#6880)
* node cleanup [pr]

* fix tests, including the double one on metal

* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz
cdff1d75b6 things that are only used in one place don't belong in helpers [pr] (#6878)
* things that are only used in one place don't belong in helpers [pr]

* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz
f4ec39fe58 switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
George Hotz
738a5794a9 last update for new symbolic [pr] (#6877) 2024-10-04 14:58:51 +08:00
chenyu
7391376528 update bert hparams (#6876)
4h32m with this https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/q99frv1l/overview.

loss scaler 2**13->2**10. matched the closest submission, no nan for ~10 runs.

increased lr and total step a bit.

`PARALLEL=0` after setup, same as resnet.
2024-10-04 00:39:06 -04:00
George Hotz
0dee49637e small symbolic changes [pr] (#6874)
* small symbolic changes [pr]

* need that unbind
2024-10-04 12:03:08 +08:00
George Hotz
c50d3c4979 move const mover to ops [pr] (#6873)
* move const mover to ops [pr]

* move more
2024-10-04 11:49:32 +08:00
Tim Becker
d42cb5596f Restore fast path for matching new_src in rewrite (#6870) 2024-10-04 11:22:24 +08:00
ignaciosica
8931f20765 CLANG fixed ops python [run_process_replay] (#6866)
* hotfix: fixed values in ops_python for AMX

* hotfix: remove unused import
2024-10-03 20:40:04 +08:00
George Hotz
4b6732c4f6 safe changes for new symbolic [pr] (#6864) 2024-10-03 20:39:15 +08:00
qazal
17068410e6 give EXT schedules metadata [pr] (#6865) 2024-10-03 20:14:18 +08:00
qazal
5517a07a09 viz late to_program and benchmarks [pr] (#6851)
* viz late to_program [pr]

* benchmark resnet

* delete all of checkStatus

* revert that

* fixup

* get from kernel
2024-10-03 18:29:04 +08:00
qazal
c7925414df don't default print the whole graph in buf limit error [pr] (#6861) 2024-10-03 18:02:19 +08:00
George Hotz
e10245909a explore global uop cache [pr] (#6863)
* explore global uop cache

* wvd uops

* remove useless lru caches

* key is is

* simpler rewriter
2024-10-03 13:08:13 +08:00
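The "global uop cache" with "key is is" suggests hash-consing: a global cache returns the same object for structurally identical nodes, so structural equality degrades to identity checks. A minimal sketch of the pattern — class and field names are hypothetical, not tinygrad's actual UOp:

```python
class UOp:
    # global intern table: identical (op, srcs) always yields the same object,
    # so the effective comparison key "is `is`" (object identity)
    _cache: dict[tuple, "UOp"] = {}

    def __new__(cls, op: str, *src: "UOp") -> "UOp":
        key = (op, *src)  # src nodes are already interned, so identity suffices
        if key not in cls._cache:
            obj = super().__new__(cls)
            obj.op, obj.src = op, src
            cls._cache[key] = obj
        return cls._cache[key]
```

With interning in place, a rewriter can memoize per-node results in an ordinary dict keyed by the node itself.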
George Hotz
a26c6a0ad0 cleanup with smax [pr] (#6854)
* cleanup with smax [pr]

* add that resolve
2024-10-03 08:11:02 +08:00
nimlgen
8bbf6fb88c use mv_address in ops_gpu (#6856) 2024-10-02 22:31:51 +03:00