Commit Graph

1003 Commits

nimlgen
9d3c40601f am: fast memory manager (#8654)
* start

* progress

* fixes

* smth

* mini fixes

* fix2

* ugh, need this for now

* faster

* cleanups

* tiny linters

* make mypy happier

* test & free pts

* ops

* linter

* cleanup vm

* fix

* remove map_from

* tiny fixes

* add test to ci
2025-01-20 16:58:22 +03:00
ignaciosica
d2234e308a tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
nimlgen
f671da6755 ci: add AM start time to benchmark (#8637)
* ci: add AM start time to benchmark

* am: unlock it

* add AMD

* revert this
2025-01-16 14:47:36 +03:00
chenyu
4ee3243c93 JITBEAM=2 for LLaMA-3 8B on 4 GPUs [pr] (#8623)
is it fast?
2025-01-14 19:52:38 -05:00
George Hotz
bfbe81df71 remove cast before view (#8613)
* remove cast before view

* greener

* indexing

* that passes too

* openpilot too

* ack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
chenyu
393eec3201 raise RuntimeError for uneven shard [pr] (#8593)
no 7B llama on 6 GPUs

skip 70B
2025-01-14 14:51:48 -05:00
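The commit above (#8593) turns an uneven multi-GPU shard into a hard error ("no 7B llama on 6 GPUs"). A minimal sketch of that kind of guard, assuming only the divisibility rule stated in the commit — the function name and signature are illustrative, not tinygrad's actual API:

```python
def check_even_shard(dim_size: int, n_devices: int) -> int:
    """Raise RuntimeError unless dim_size splits evenly across n_devices."""
    if dim_size % n_devices != 0:
        raise RuntimeError(f"uneven shard: {dim_size} across {n_devices} devices")
    return dim_size // n_devices
```

For example, a 7B model whose sharded dimension is not divisible by 6 would raise, while the same dimension on 4 GPUs splits cleanly.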
ignaciosica
d5a646d492 CUDA Turing TC (#8597)
* init turing tc

* reorder tc

* hotfix: remove some spaces

* revert var name to x

* consistent order of factors

* revert order of terms to match old stuff

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-14 10:35:14 -08:00
nimlgen
1ff6862a3d ci: sleep a bit to let the driver unload the prev pid (#8605) 2025-01-14 15:55:23 +03:00
nimlgen
74b83c4c41 am in ci (#8532)
* try am in ci

* no sudo

* temp

* run more am test

* run half on am

* insert amdgpu

* other machine as well
2025-01-13 19:55:17 +03:00
qazal
2f71a00236 remove PYTHONPATH=. from mypy ci [pr] (#8578) 2025-01-12 09:52:03 -08:00
qazal
98c9e23560 remove global PYTHONPATH setting in CI (test.yml) [pr] (#8568)
* remove global PYTHONPATH setting in CI [pr]

* only run mypy in tinygrad/

* still needed for benchmarks
2025-01-11 12:47:50 -05:00
qazal
60503c8621 use CAPTURE_PROCESS_REPLAY=1 in CI [pr] (#8564) 2025-01-11 06:03:48 -05:00
nimlgen
aa3d612df2 add script to install amd mockgpu on macOS (#8536)
* upload artifact every time

* hm

* sh script

* hm

* hm2

* hm2

* hm2

* no sudo

* def paths

* small comments

* text

* try auth for bigger limits
2025-01-09 01:29:25 +03:00
patrini32
21c7d7c71a MOCKGPU amd test on OSX (#8505)
* add tests

* Refactor

* cache only amd/comgr/build (saves a lot of space)

* fix

* silence warning and add check for cache hit before installing cmake

* run only pytest

* use actions/cache

* lower timeout-minutes and add Device.DEFAULT step

* add nvidia to Device.DEFAULT check

* typo

* fix

* Check only for amd and run only 2 test
2025-01-08 14:27:56 +03:00
chenyu
85a4397f27 fix create_schedule_with_vars usage in allreduce benchmark [pr] (#8522)
* fix create_schedule_with_vars usage in allreduce benchmark [pr]

because i didn't know how to use it...

* increase time limit because tiny17 is slow
2025-01-07 01:30:01 -05:00
chenyu
0061dc7447 fix benchmark allreduce and add to ci [pr] (#8521) 2025-01-07 00:37:59 -05:00
nimlgen
9bc317d5d2 mockcuda (#8503)
* init mockcuda

* run gpu ocelot

* fix

* sfixes

* disable broken tests

* linter

* these fails as well

* pylint

* mypy

* this fails on real platforms as well

* mypy please
2025-01-05 01:23:57 +03:00
uuuvn
5ffc50d58c Clang JIT (#8481)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-03 11:12:55 -05:00
George Hotz
803a47494e Revert "Clang JIT (#8312)" (#8452)
This reverts commit b6266c8e41.
2024-12-30 17:49:35 -05:00
uuuvn
b6266c8e41 Clang JIT (#8312)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-30 17:37:53 -05:00
George Hotz
0addbad36d Happy New Year! Let's get AM merged 2024-12-30 13:15:10 -05:00
qazal
9defbc7d54 add symbolic_simple to the scheduler [pr] (#8419) 2024-12-26 20:05:08 +08:00
George Hotz
9f62c80f68 hotfix: this is a loan 2024-12-20 14:47:04 -08:00
qazal
d78e75f710 hotfix: use ubuntu-22.04 ci from 8249 (#8251) 2024-12-15 02:23:00 +02:00
George Hotz
8a50868264 touchup function.py [pr] (#8220)
* touchup function.py [pr]

* remove ALLOWED_READ_IMAGE

* eh, keep it, just change it
2024-12-13 13:07:00 -08:00
ignaciosica
0a00187dce add real AMX tests to benchmark (#8216)
* add real amx to benchmark

* add debug=2 to check tc are triggered
2024-12-13 14:03:41 -05:00
George Hotz
d9a0880d33 delete fuzz uops (not tested) [pr] (#8181) 2024-12-12 01:41:27 -08:00
chenyu
26e049ab40 add ALLOWED_READ_IMAGE=2131 to openpilot (#8166)
added as exact number check now as it's not clear if more/less than allowed is any better
2024-12-11 12:14:17 -08:00
Ahmed Harmouche
a73e3677d0 Test linearizer on webgpu (#8159)
* Test linearizer on wgpu

* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
chenyu
d462f8ace0 use HALF in cifar wino benchmarks (#8153)
more representative as it hits tensor cores on tinyboxes
2024-12-10 20:21:00 -05:00
Ahmed Harmouche
a8cfdc70ed Run more webgpu tests (#8142) 2024-12-10 23:20:04 +01:00
Ahmed Harmouche
ed7318a3f5 Fix puppeteer install (#8148)
Clean npm cache before puppeteer install
2024-12-10 23:06:33 +01:00
Ahmed Harmouche
71dd222f66 Fix setitem on wgpu (#8144) 2024-12-10 19:34:25 +01:00
George Hotz
f83d715f41 move checks into compile3, delete compile2 [pr] (#8127)
* move checks into compile3 [pr]

* test_vs_onnx

* test v torch works

* float16 won't compile on compile3

* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
87c360c4b5 hotfix: add --size 8B to llama3 2024-12-09 07:53:20 -08:00
chenyu
e9692de42b don't FUZZ_ALL_ACTIONS in fuzz_linearizer.py (#8096)
mostly for speed, this is just making sure the script runs
2024-12-06 17:22:17 -05:00
Ahmed Harmouche
ce72fe1411 u32 to f16 in tinygrad (#8074)
* f16 decompression in tinygrad

* Typing and cleanup
2024-12-06 12:00:13 +01:00
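The commit above (#8074) moves f16 decompression from a u32 buffer into tinygrad itself. A hedged sketch of the underlying idea — one u32 word carries two packed f16 halves (low and high 16 bits); this decoder is illustrative and is not tinygrad's code:

```python
import struct

def u32_to_two_f16(word: int) -> tuple[float, float]:
    """Decode the low and high 16-bit halves of a u32 as IEEE 754 half floats."""
    lo = struct.unpack('<e', struct.pack('<H', word & 0xFFFF))[0]
    hi = struct.unpack('<e', struct.pack('<H', (word >> 16) & 0xFFFF))[0]
    return lo, hi
```

Here `'<e'` is Python's little-endian half-precision format code, so e.g. the word `0x40003C00` decodes to (1.0, 2.0).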
Ahmed Harmouche
ff9a89f714 Proper dtypes for input/output of exported WebGPU model (#8053)
* Respect input/output dtypes in exported WebGPU model

* Add some comments about skipped dtypes
2024-12-05 10:38:05 +01:00
George Hotz
83aecbdc70 do gpuocelot copy manually [pr] (#8050) 2024-12-05 11:51:20 +08:00
George Hotz
4a208bfb28 bump download cache version 2024-12-05 11:42:34 +08:00
Ahmed Harmouche
13eedd373b Run WebGPU tests on ubuntu (#8033) 2024-12-04 12:42:04 +01:00
Ahmed Harmouche
db330a3110 Remove WebGL (#8012) 2024-12-03 16:02:53 +01:00
George Hotz
dddfb494d7 don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000)
* don't mutate the uop/lazybuffer, just the Buffer [pr]

* fix red test

* try different fix

* that

* that's the right fix

* test for fixed behavior

* bump to 3.12
2024-12-03 19:03:51 +08:00
chenyu
17d5719a38 add process replay to webgpu tests (#7998) 2024-12-02 20:27:29 -05:00
chenyu
3c8c98253a BEAM_DEBUG=1 in speed_v_theoretical (#7942)
* DEBUG=3 in speed_v_theoretical

* BEAM_DEBUG=1
2024-11-28 08:30:55 -05:00
chenyu
a6171cbe71 add stable diffusion v2 to mac benchmark (#7917)
this caught #7902
2024-11-26 22:09:43 -05:00
qazal
345457f518 webgpu cache packages (#7911)
* webgpu -n=auto

* fix webgpu ci cache
2024-11-27 00:17:36 +08:00
qazal
6102e3159c webgpu -n=auto (#7910) 2024-11-26 21:13:12 +08:00
George Hotz
4e5bf9dc7a test assignment in jit (#7906)
* test assignment in jit

* don't waste lines

* skip broken test in webgpu
2024-11-26 17:37:00 +08:00
Ahmed Harmouche
10618aba98 Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger than u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnist now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00