tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-15 17:15:48 -05:00

Author	SHA1	Message	Date
nimlgen	0a139b1436	amd iface abstraction (#8413 ) * start on amd iface * t * unused import * fixes * internal api	2024-12-27 15:53:53 +03:00
nimlgen	90f1f0c9d5	eh (#8309 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-26 13:16:34 -05:00
nimlgen	a562ee2c6e	BumpAllocator rename start -> base (#8415 )	2024-12-25 23:12:55 +03:00
nimlgen	9ed064710a	hcq remove old profiler lines (#8414 )	2024-12-25 23:12:28 +03:00
chenyu	3f46425f1e	typos found by gemini [pr] (#8400 ) not very effective... maybe due to tokenizer	2024-12-24 22:32:25 -05:00
nimlgen	a647f3dd2c	move mockgpu to tests [pr] (#8396 ) * move mockgpu to tests * linter * i'm so sorry * sorry, python * path	2024-12-24 23:48:02 +03:00
chenyu	7ea633f94f	remove from __future__ import annotations from runtimes [pr] (#8373 ) it's not needed if we move the Device before Program and Allocator, which need Device. not updating hcq because it has a lot more stuff, and CLDevice requires CLDevice	2024-12-21 23:46:07 -05:00
chenyu	1ce9851ba6	import and type cleanups [pr] (#8359 ) Dict and DefaultDict and some imports	2024-12-20 21:52:02 -05:00
chenyu	e63c7818dc	few type cleanups [pr] (#8347 )	2024-12-20 01:56:01 -05:00
George Hotz	82833f1b3c	a little more typing [pr] (#8346 ) * a little more typing [pr] * few more	2024-12-19 22:09:52 -08:00
George Hotz	62e5d96446	more typing work [pr] (#8345 )	2024-12-19 21:46:35 -08:00
George Hotz	9c77e9f9b7	replace Tuple with tuple [pr] (#8344 ) * replace Tuple with tuple [pr] * replace List with list [pr] * replace Dict with dict [pr] * replace Set with set [pr]	2024-12-19 21:27:56 -08:00
George Hotz	adcdc583a2	small cleanups [pr] (#8343 ) * small cleanups [pr] * GPU suppress	2024-12-19 21:20:46 -08:00
George Hotz	3a9ca62b9e	get_single_element [pr] (#8328 )	2024-12-18 22:23:45 -08:00
nimlgen	777d2aec05	metal profiler + cpu_profile (#8291 ) * metal + cpu_profile * gpt example * linter + revert gpt2 for now * a bit of readme * linter * unrelated * tests * linter * b	2024-12-18 00:06:56 +03:00
nimlgen	af87e4b53c	viz profiler (#8287 ) * only hcq * fix get_metadata * linter * oops * tiny * linter * time * print pm * hmm * nits	2024-12-17 20:00:53 +03:00
George Hotz	cda34ccadf	hotfix: time.time -> time.perf_counter	2024-12-16 11:32:49 -08:00
nimlgen	a2a4ff30dc	hcq better timout haandling (#8269 )	2024-12-16 13:44:55 +03:00
chenyu	f05fd118a2	few minor code cleanups [pr] (#8267 )	2024-12-15 23:44:51 -05:00
chenyu	2e4c7d4cfb	add "tinygrad" to be part of cache_dir [pr] (#8188 ) instead of having sqlite / http download / metal compile to add "tinygrad" separately. also make it non-private since it's used in metal	2024-12-12 12:09:44 -05:00
nimlgen	bf7d1fcd2c	tiny import fixes in hcq graph (#8184 )	2024-12-12 16:30:06 +03:00
Ahmed Harmouche	2f2b1e792c	wgsl and ops_webgpu simplifications [pr] (#8182 ) Simplify wgsl and ops_webgpu	2024-12-12 14:21:58 +01:00
Ahmed Harmouche	1b94cc095a	Bump back wgpu to latest (#8179 )	2024-12-12 09:40:52 +01:00
chenyu	aaa3cc235d	unused `from __future__ import annotations` (#8171 )	2024-12-11 19:05:04 -05:00
George Hotz	8f4299fcc8	hotfix: suppress shutdown errors in CLProgram	2024-12-11 08:08:32 -08:00
nimlgen	3a7d64b96c	hcq remove update from args state (#8104 ) * hcq remove update from args state fix amd ugh qcom? qcom ops ops qcom fix qcom texture info fx qcom fix qcom qcom, sry minor works * remove old code * unrelated+sint * qcom * typing * rm comments	2024-12-08 15:22:05 +03:00
nimlgen	d6e66095fd	hcq buffer is a class (#8106 ) * hcq buffer is a class * qcom * no from_mv in qcom * remove qcombuffer * useless cast * mypy * qcom fix * _md -> meta	2024-12-08 13:29:43 +03:00
nimlgen	8b1fa9cb7d	nv hcq queue touchups (#8102 )	2024-12-07 14:09:38 +03:00
nimlgen	e180a31c5e	tiny metal cleanup (#8089 ) * tiny metal cleanup * cast * sry	2024-12-06 21:44:32 +03:00
nimlgen	d1282da7e8	hcq bump alloc (#8078 ) * hcq bump alloc * hm * nv * typo	2024-12-06 19:19:04 +03:00
nimlgen	c0240855b9	qcom has not transfer (#8075 ) * qcom alloc is not hcq alloc * maybe base? * test	2024-12-06 14:45:01 +03:00
JaSpa99	3c5d5f9414	mypy==1.13.0 (#7990 ) * explicit instantiation and narrowing asserts * explicit cast * bump * one line assert * handle case for no copy_queue_t * Revert "handle case for no copy_queue_t" This reverts commit `38347806ca`. * more readable control flow --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-06 12:09:14 +08:00
nimlgen	78c01a5c2b	amd general _gpu_alloc (#8056 ) * amd general _gpu_alloc * hmm * ops	2024-12-05 15:50:23 +03:00
nimlgen	8071600897	nv one _gpu_alloc (#8055 )	2024-12-05 15:22:03 +03:00
uuuvn	e9c5b23ba1	Use MTLCompiler directly (v2) (#7920 ) * Use MTLCompiler directly (v2) * to_block_literal and REQUEST_TYPE_COMPILE * Rewrite command encoding * Revert to_block_literal * Maybe that's more readable to some people? * Typo and comment about stdlib caching * Update ops_metal.py * Update ops_metal.py * Update ops_metal.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-04 16:36:48 +08:00
nimlgen	7fda464b08	hcq c-like args state (#8020 ) * hcq c-like args state * ugh * Dfix * rename * i	2024-12-03 23:53:35 +03:00
George Hotz	32675a8a77	sacrifice ClangGraph on the altar of lines [pr] (#8009 )	2024-12-03 21:11:15 +08:00
Ahmed Harmouche	146e1caea3	Downgrade wgpu to prevent sd segfault (#7969 )	2024-12-02 15:48:44 +01:00
wozeparrot	077e7e8ed2	fix: private segment sgpr on gfx103x (#7987 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-02 20:54:50 +08:00
nimlgen	10f431b96d	hcq replace update with sint (#7899 ) * try sym hcq * start with amd * move to nv * nv works * cache and qcom * fixes * signals * fix nv * qcom fixes * linter * linter * cache + typings * fixes * tiny fixes * linter * linter * lntr * ugh * comments	2024-11-29 20:08:13 +03:00
nimlgen	d3660ccc51	prereqs for hcq updates removal (#7959 ) * hcq signals touch ups * hcq compiled has device id * helpers * prreq hcq api * oops	2024-11-29 18:20:07 +03:00
nimlgen	309dcb1044	hcq signal add sleep (#7955 ) * hcqsignal sleep * fixes * typing * time ms is int	2024-11-29 14:04:45 +03:00
nimlgen	81d415be03	amd pkt3 refactor (#7923 ) * amd pkt3 refactor * replace this * linter * fix * cmt * fast * simpler * linter * smth * missing	2024-11-28 11:06:37 +03:00
JaSpa99	38f34ca0cb	prepare mypy==1.13.0: legacy cast (#7866 ) * use helper to narrow literal type * narrow with asserts instead of cast * remove parantheses * tensor.item() calls tensor.data() * no copy * proper indexing	2024-11-27 10:33:35 -05:00
nimlgen	84f96e48a1	hcq signal tiny refactor (#7913 ) * hcq signal tiny refactor * no mv * fix * fix2 * fix3	2024-11-26 21:48:38 +03:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
chenyu	04bee97d2a	hotfix ctypes.c_ulong(size) for metal _alloc (#7902 ) fix `Tensor.ones(1000, 1000, 1000).contiguous().realize()` on METAL	2024-11-25 18:25:33 -05:00
George Hotz	1d6d842887	move DSP to extra (room for webgpu) [pr] (#7836 )	2024-11-22 11:32:57 +08:00
George Hotz	6fc7013463	put all DSP in dsp file [pr] (#7833 )	2024-11-22 11:22:59 +08:00
George Hotz	e39af63156	no loop assert in ops_python [pr] (#7834 )	2024-11-22 11:17:36 +08:00

903 Commits