tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-28 08:17:58 -05:00

Author	SHA1	Message	Date
qazal	a1dee0e532	early uop UOps.BUFFER (only once) [run_process_replay] (#6820 ) * buf_uops lookup [run_process_replay] * next diff will be this * fix ImageDType	2024-10-01 08:46:05 +08:00
nimlgen	e213bea426	nv shorter (#6819 )	2024-09-30 19:39:32 +03:00
George Hotz	0f28e93224	add pickle support for pattern matchers [run_process_replay] (#6816 ) * add pickle support for pattern matchers [run_process_replay] * cleaner and all * no closures * fix tests * revert that * final * cleaner * python 3.8 fix * add round trip back * this * waste lines on this. that's the final line count * max print better * more targetted fix * regrettably add 3.8 support	2024-09-30 21:54:46 +08:00
chenyu	f59517754e	add RESET_STEP in bert to control reset (#6818 ) same as resnet	2024-09-30 09:39:04 -04:00
qazal	0c24fec9f4	test current behavior of const schedule [run_process_replay] (#6817 )	2024-09-30 21:02:01 +08:00
qazal	4a4aa69b84	add a better dedup test for DEFINE_VAR with CONST arg (#6813 )	2024-09-30 15:43:55 +08:00
qazal	e7fcbe1a4d	refactor test_linearizer correctness asserts (#6812 )	2024-09-30 15:31:02 +08:00
George Hotz	9dd9f71011	no global kernel stuff [run_process_replay] (#6808 ) * use traceback instead of global metadata crap [run_process_replay] * save the kernel * correct, imports clean, no device * UNPARENTED * speed * proudly unparented * Update ops.py * update tests for unparented --------- Co-authored-by: qazal <qazal.software@gmail.com>	2024-09-30 13:52:33 +08:00
George Hotz	00b3171902	mod can be and (#6810 )	2024-09-30 12:33:15 +08:00
qazal	c9d763d331	refactor to axis_arg [run_process_replay] (#6806 ) * refactor to axis_arg [run_process_replay] * remove more arg[1]s	2024-09-30 09:37:31 +08:00
qazal	7099af4450	VIZ show rendering errors (#6807 ) * VIZ show rendering errors * show the entire traceback	2024-09-30 09:35:36 +08:00
George Hotz	2ed94e447f	gpt2: corealize opt and loss	2024-09-30 09:11:20 +08:00
qazal	2ec73d6f05	push swizzle through dim change (#6801 ) * push swizzle through dim change * can this be generic * generic version * cleanups	2024-09-30 09:04:59 +08:00
George Hotz	a76c6c740c	hand pad gpt2 (#6805 )	2024-09-30 09:03:07 +08:00
geohotstan	282abb4234	add get_available_backends (#6771 ) * lol * 1 less line lmfao * something like this? * comment * pylint * just iterator * backends -> devices --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-09-30 08:58:04 +08:00
qazal	3c15e64273	VIZ prep for the new kernel render (#6800 ) * refactor to list * remove prints in test_viz * more cleanup	2024-09-29 20:06:31 +08:00
qazal	01c9653614	add UOps.BUFFER, delete Buffer in UOps.DEFINE_GLOBAL (#6798 ) * delete DEFINE_GLOBAL buffer arg * add UOps.BUFFER	2024-09-29 18:56:07 +08:00
qazal	5e1221845f	refactor schedule edges to tuple[LazyBuffer, ...] [run_process_replay] (#6797 )	2024-09-29 11:34:39 +08:00
chenyu	68e59eb3f5	update mlperf-logging to 4.1.0-rc3 (#6796 )	2024-09-28 21:45:37 -04:00
qazal	dab05ff070	match dataclass.replace in UOp.replace [run_process_replay] (#6792 ) * UOp replace matching dataclass replace * p2 * replace creates a copy	2024-09-28 16:28:49 +08:00
chenyu	494b20e886	bert BS back to 54 (#6791 ) 60 does not run end to end	2024-09-27 22:16:05 -04:00
chenyu	572d77d1d9	bert script delete eval data after eval (#6790 ) fits BS=60 which is 2% faster than 54. also fixed wandb logging params	2024-09-27 20:54:00 -04:00
chenyu	f9c8e144ff	chmod +x mlperf bert script for red (#6789 ) also disabled raising power cap in setup. wozeparrot mentioned that's unstable and might cause bert training issue on red	2024-09-27 11:27:32 -04:00
Francis Lata	d3a387be63	[MLPerf] Prepare openimages dataset script (#6747 ) * prepare openimages for MLPerf * cleanup * fix issue when clearing jit_cache on retinanet eval * revert pandas specific changes	2024-09-27 11:13:56 -04:00
chenyu	bc82f8c5be	use where in dropout (#6758 ) should save memory since we only store mask in bool instead of the upcasted used in mul	2024-09-27 11:11:43 -04:00
qazal	76b3c1e818	add all realized Buffers to schedule graph edges [run_process_replay] (#6786 ) * add realized Buffers to bufs * simpler checks	2024-09-27 19:25:51 +08:00
qazal	568c97f7a2	add UOp.define_global [run_process_replay] (#6787 ) * add UOp.define_global [run_process_replay] * no src	2024-09-27 19:24:03 +08:00
nimlgen	b95f47784a	qcom sleep when sync (#6785 ) * qcom sleep when sync * linter * short	2024-09-27 19:14:10 +08:00
qazal	fb3fe6f39b	better VIZ (#6781 ) * ui changes * make kernels global * dont save buffers when running VIZ=1 * remove flex in layout * use os.execv * del server thread * server close * cleanup * logs cleanup * rm getenv * cleanups * remove global	2024-09-27 18:38:31 +08:00
chenyu	2fc26890c9	default BS=9 in handcode_opt bert (#6783 ) using 54 for 6 gpus now, and 2 is not a good default	2024-09-27 04:38:16 -04:00
George Hotz	9a3f6f392d	llm.c tok/s	2024-09-27 00:46:18 -07:00
George Hotz	b0e70ab04f	llm.c updates	2024-09-27 15:25:59 +08:00
George Hotz	eaa1e0eeeb	rename constant_folder to sym [run_process_replay] (#6780 )	2024-09-27 14:54:54 +08:00
qazal	900b21ef0c	viz delete const after fold (#6778 ) * viz delete const after fold * add base to tests	2024-09-27 11:58:01 +08:00
qazal	94e43dc49a	add Buffer.to_uop [run_process_replay] (#6777 )	2024-09-27 11:41:23 +08:00
qazal	98a81b36e1	viz table view (#6743 ) * fix matcher with ctx * current_kernel fix * add table * make the right things clickable * some more init work * add kernel resizer * Revert "add kernel resizer" This reverts commit `035eef3703`. * allow scroll	2024-09-27 10:26:46 +08:00
chenyu	bea7ed5986	add RUNMLPERF=1 to bert dev_run.sh (#6775 ) already set in run_and_time.sh, need RUNMLPERF=1 for it to load real data	2024-09-26 11:00:49 -04:00
George Hotz	c178dc1071	faster uops ci [run_process_replay] (#6774 )	2024-09-26 20:15:01 +08:00
George Hotz	249af24f18	metal bfloat as cast (#6773 )	2024-09-26 19:31:40 +08:00
George Hotz	ed2f28388f	render cast is rewrite rules [run_process_replay] (#6772 ) * render cast is rewrite rules [run_process_replay] * move load/store to rewrite rules * render_alu smaller * render_gep	2024-09-26 19:03:31 +08:00
nimlgen	3c56aeee70	add Tensor.from_blob (#6765 ) * draft tensor from pointer init * some docs and types * comment * cleaner * test * malloc * qcom cl interop * jit example * cleaner * dealoc * wording * docs	2024-09-26 18:33:19 +08:00
George Hotz	14ad47b515	rewrite to use uops if (#6764 ) * rewrite to use uops if * does this pass * careful penalty * fix tests * remove unused stuff * that's a cstyle rewrite * Update test_linearizer_dumb.py	2024-09-26 18:09:09 +08:00
George Hotz	7e7184bb13	cleaner ptx match rules [run_process_replay] (#6770 ) * cleaner ptx match rules [run_process_replay] * clean up load/store rules * now that's clean * oops, typo * cast back to bool	2024-09-26 17:44:10 +08:00
chenyu	12de203a43	add IGNORE_JIT_FIRST_BEAM to bert scripts (#6769 ) * update bert BEAM params copied from resnet to start with * just IGNORE_JIT_FIRST_BEAM	2024-09-26 05:38:24 -04:00
wozeparrot	15cd42cfb9	feat: support TRACEMETA=2 in handcode_opt (#6767 )	2024-09-26 16:58:29 +08:00
chenyu	5a5fbfa1eb	smaller bert script change (#6768 ) only WANDB and RUNMLPERF order. BENCHMARK and BEAM will be done differently	2024-09-26 04:54:28 -04:00
wozeparrot	abd484a9f7	fix: need numpy for docs and testing (#6766 )	2024-09-26 16:44:59 +08:00
wozeparrot	2b899164c6	no numpy (#6751 )	2024-09-26 16:40:18 +08:00
George Hotz	7fca0bc912	use pattern matcher for image [run_process_replay] (#6762 ) * use pattern matcher for image [run_process_replay] * try again * this	2024-09-26 15:49:09 +08:00
qazal	197f8fd986	early uop globals with Buffer (#6753 )	2024-09-26 15:34:21 +08:00

6172 Commits