tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
chenyu	c22667b0c4	also skip test_overlapping_shrink_assignment_reverse (#14375 ) crashing	2026-01-27 12:20:39 -05:00
nimlgen	e52d58b041	autogen: update amd (#14372 )	2026-01-27 19:53:14 +03:00
nimlgen	cbf94a0a95	nv: exit early in case of failures (#14363 ) * nv: exit early in case of failures * f * cleaner	2026-01-27 19:16:22 +03:00
nimlgen	ec691cb299	am: print sq intrs (#14366 ) * am: print sq intrs * cleaner	2026-01-27 18:28:13 +03:00
qazal	a5f3d46423	hcq: do not assume kernel names are unique (#14371 ) * hcq: do not assume kernel names are unique * colored kernel name	2026-01-27 23:03:15 +09:00
George Hotz	e5df7e640b	fix branches in amd_asm_matmul (#14369 )	2026-01-27 20:48:42 +08:00
George Hotz	0ced258726	HOTFIX: skip crashing assign test	2026-01-27 20:35:17 +08:00
George Hotz	131ae604de	force_transcendental on sqrt (#14368 )	2026-01-27 20:24:41 +08:00
imaolo	14574c68fa	Add ContextVar to disable the scheduler cache (#14257 ) * add scheduler cache ContextVar * test scheduler cache context var --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2026-01-27 19:55:29 +08:00
George Hotz	bfc88bcfb8	assembly/amd: emu refactors + enable PYTHON_REMU by default (#14361 ) * assembly/amd: start refactors * cleanups * those are global * methods on ctx * const cleanup * range helper * types and imports * cleanups * cleanups * remove stale name * fix emu2 types * more typing * more mypy * cleanups * fxns * scc cleanup * cleanups * cleanups * simpler parse_pcode * laneid * no defaults for pcode * pcode is not optional * cleanups * functions cleanup * splat * expr_parser functions * single tok * invert global loops * try_eat * minor * run parser on all * no silent 0 * tests	2026-01-27 17:42:24 +08:00
Christopher Milan	2e72625652	Revert "decompose dtypes.long to ints where unsupported (#14261 )" (#14362 )	2026-01-27 02:04:59 -05:00
qazal	f866b2a513	mfma loop in asm dsl (#14349 ) * mfma loop in asm dsl * work	2026-01-27 11:11:37 +09:00
Christopher Milan	0793319929	decompose dtypes.long to ints where unsupported (#14261 ) * add works * use carry not overflow * bitwise ops * use tag instead of vec * cleaner * mul somewhat works * mul actually works * SUB and NEG work * SHL/SHR * ulong support * this should work? * oops * fix indexing * all ALU mostly works * refactor * test_dtype passing * signed division works * format * clean * some tests * ruff	2026-01-26 18:34:13 -05:00
wozeparrot	a987a4abc3	feat: llama8b dev_beam.sh (#14358 )	2026-01-26 14:51:23 -08:00
Christopher Milan	c9c533fc78	libclang path is homebrew on macos (#14357 ) * libclang path is homebrew macos * typo * ugh * typo * regen * no LIBCLANG_PATH	2026-01-26 17:32:09 -05:00
chenyu	d641e63189	improve min/max for AND (#14356 )	2026-01-26 15:44:18 -05:00
chenyu	f16372487a	fix assign hazard on shrink (#14355 ) * fix assign hazard on shrink possible to have race if both assign src and dest are shrink * test_nonoverlapping_shrink_assignment	2026-01-26 14:46:30 -05:00
chenyu	145df879c1	find_permutes -> fix_assign_hazard [pr] (#14354 ) some noop tweaks and comment updates	2026-01-26 14:05:19 -05:00
nimlgen	e152f1b0f5	llama: use ALL2ALL (#14353 )	2026-01-26 22:01:53 +03:00
nimlgen	3f25eb3026	am: ih (#14346 ) * am: ih * um * fix * line * no trap and fix ring * keep * fix	2026-01-26 20:11:04 +03:00
chenyu	823bc17fb5	failed test case for shrink overlap assigns (#14350 ) * failed test case for shrink overlap assigns current logic can create a race resulted in wrong output * skip for now	2026-01-26 11:58:45 -05:00
George Hotz	204f51e739	assembly/amd: bug fixes for PYTHON_REMU (#14347 ) * default PYTHON_REMU to 1 * mockgpu * less size * normal compile path * uniqie * more * fix clamp * Change PYTHON_REMU default to 0 in _try_dlopen_remu	2026-01-27 00:48:22 +08:00
chenyu	231305603d	remove REAL_DEV [pr] (#14337 ) it's just Device.DEFAULT now	2026-01-26 10:08:16 -05:00
Martin Szewieczek	9cbe99348a	func meshgrid: change param index to type str (#14331 )	2026-01-26 10:07:56 -05:00
George Hotz	3b43d26f10	assembly/amd: emu speed (#14344 ) * assembly/amd: emu speed * fix spec * go * don't do this * simpler * no stupid consts * hack * simpler * no index * no where * faster linearizer * fix spec * no index dtype	2026-01-26 22:21:34 +08:00
George Hotz	774a454bb5	assembly/amd: fix scratch SVE (#14340 ) * assembly/amd: default python REMU * mem_used * no lane * sve * remove that * needs s_code_end in tests	2026-01-26 21:03:51 +08:00
qazal	2d91fe6310	use amdgpu dsl in mmapeak (#14342 ) * use amdgpu dsl in mmapeak * don't rely on llvm for vgpr counting * llvm roundtrip assert * rm it, add ci * vgpr_count * move emulated test to amd, it needs comgr * env * arch * inst._fields -> inst.operands * vgpr offset	2026-01-26 22:03:43 +09:00
qazal	b2e2ace85b	viz: remove ci check, it's VIZ=-1/-2 (#14343 )	2026-01-26 20:36:23 +09:00
George Hotz	be23776ba7	assembly/amd: replace pcode with ucode (#14002 ) * a bunch of todos for my boy claude * uops have types * lil cleanups * simpler ucode * isNAN * calls * move more * cleanup pcode_parse * cvt functions * fix parser bugs * no void * minmax * more pcode parse * pretty print * transform * comments * move to transform * assign/declare * simpler norm * single PM * just Uops * simpler * more typed * all rewrite * less verbose * work * spec * transform * work * simpler spec * less spec * bitcast * simpler * simp ucode * work * more in pcode_transform * remove junk * more functions * bug * no void assign * load/store * wave * fixes * move denorm * move more functions * tests * cat is shape None * uop syntax * move a few more * program_spec * cat stuff * assign fix clear * unused * nans * fp bits * works with simplify * remove junk * special * meh * more * more * update test pcode parse * improve parser * parse some for loops * merge master * dead files * tests pass * emu2 * better emu2 * test_plus works * uselessly write more instructions * use pcode * something * something * bench_emu * progress * ds works * work * work * more passing * run compare * bench_emu * more pcode * a few more * bugfixes * bugfix * test fixes * tests pass without USE_HW * all hw tests pass * add more hw tests * new hw tests * bit * less handcode * parse more * consolidate pcode * fixes * rsrc * lane pcode * cleanups * simpler * emu bugs * one cmp test fails * fix decode and upd name * fix name and test harness * _ftz_f32 * fix denorm * fix VOPD and use load * fix carry bug * no load where / just invalid * clean * simpler * merge sops * refactoring * simplifications * bugfixes * new tests * f16 sin fix * assertion and hw tests * cvt functions * one more failure * bugfixes * bugfix + regression * more tests * fmac * no manual unrolling * ordering * LLVM backend is a lot faster * compile inst * more bugs * f16 * bugfix * fix regression * one clang call * 1M inst * scratch works 
* do scratch correctly * cleanup * regression * cmp * fmamk fixes * merge * fix vcmpx * unify memory * remove unused code * ignore oob for test * cleanups * fix mbs * unify cmp * test * minor cleanups * bump timeout * fix tests * revert the CMPLE stuff * remove opt * less diff * simpler * revert * support multiple backends * memset is a lot faster * split out in bench emu * improve timing * timing * cache that * cache that * simpler and faster * tokenize * binop table * simpler * move to parser * tok for lambda * refactor * expr_parser * delete emu2_pcode * import cleanup * lil * if parse * work * simpler * no v * trig preop is faster * durations for tests * fix cmp bug * sdst * remove scartch_size hack * null behavior * _MXCSRContext * bugfixes * DEBUG >= 3 * test smem crashes my gpu * debug * test * test smem * profiler * full inst * bugfix * rtag(1) * pc is 64-bit and word * pc is real code now * dynamic * more dynamic * fix oob access * fix crash, more dyn * all dyn * really all dyn * correct null mask * lit + format * 21s on the tests * 13s on the tests * canonical name * simm16 * more dyn * 14s * proper saddr dedup * dyn * debug 5 * better 5 * revert dynamic stuff * that can be dyn * negative offsets * dyn wmma * f16 wmma support / ops / dtype / dtype_alu * symbolic changes not needed * ConstFloat * more uop.const * __eq__ * uop tests * fix f16 * bf16 tensor cores * whitespace * remove cast roundtrip * Revert "remove cast roundtrip" This reverts commit `c5bb0381c3`. * just the fix * remove dead paths * llvm runs	2026-01-26 18:04:29 +08:00
George Hotz	984cdc4840	add wrapper class for the -0.0 != 0.0 issue (#14339 ) * add wrapper class for the -0.0 != 0.0 issue * fixes * spec fix * missed one	2026-01-26 16:52:37 +08:00
qazal	92bfe92138	assembly/amd: fix cdna mfma xml (#14329 ) * handwritten failing test * new amdxml * more mfma from fixes * ci * move arch of test integration * alt * amdxml human cleanup * _TestIntegration rename to IntegrationTestBase * it's the same problem as _LIT * better comment * better variable name	2026-01-26 17:51:26 +09:00
Garret Castro	6c109f4d75	LLVM: CPU threading support (#14320 ) * make generic llvmrenderer class for cpu and amd * move `tensor_cores` back to parent * remove empty line * restore extra matcher position * add threading * dont need to add core_id here * dont move code for workitem * cleanup --------- Co-authored-by: TheVanadium <claude_user@ret2022.localdomain>	2026-01-26 13:12:39 +08:00
George Hotz	cc49e47ea2	tinygrad changes from ucode (#14336 ) * tinygrad changes from ucode * dtype	2026-01-26 11:30:18 +08:00
Garret Castro	8477368d07	generic LLVMRenderer class for CPU and AMD (#14321 ) * make generic llvmrenderer class for cpu and amd * move `tensor_cores` back to parent * remove empty line * restore extra matcher position * cleanup --------- Co-authored-by: TheVanadium <claude_user@ret2022.localdomain>	2026-01-26 09:11:49 +08:00
George Hotz	11ce1e847d	llama train: null device support	2026-01-26 08:53:05 +08:00
chenyu	e3601788fa	update torch backend function (#14333 ) those have tensor.py implementation	2026-01-25 16:39:34 -05:00
nimlgen	9865f51e39	cupti: ref collector (#14330 ) * cupti: ref collector * ll	2026-01-25 20:35:21 +03:00
nimlgen	21ab23ae18	nv: add pma for ada (#14328 ) * nv: add pma for ada * um * fix * shorter * mock	2026-01-25 17:33:37 +03:00
George Hotz	49db266b96	ReprEnum for repr roundtrips (#14327 ) * ReprEnum for repr roundtrips * dsl * bugfixes * vdsty fixes * cleaner * fix * fix cdna fields * tests all pass	2026-01-25 18:58:31 +08:00
qazal	bf2d9d138f	viz: simplify amdgpu cfg (#14326 ) * viz: replace llvm disasm with our disasm * it starts with more code * then it becomes less * simpler, cdna disassembles with decimal simm16 * s_branch is upper case, add test * simm16s and others	2026-01-25 15:21:45 +09:00
qazal	647e527a7e	viz: replace llvm disasm with our disasm (#14325 )	2026-01-25 13:56:56 +09:00
nimlgen	4280a8eef2	am: update fw (#14323 )	2026-01-25 01:08:47 +03:00
chenyu	7e41da1ae8	fix generate_dataset.sh (#14324 ) added `set -e` so wrong pathes would fail the script, then fixed the path	2026-01-24 16:47:10 -05:00
chenyu	311bfd91d6	clean up where_on_load [pr] (#14322 ) no repeated split_uop and general cleanup	2026-01-24 14:43:43 -05:00
nimlgen	8b282ba6d2	memory: reserved vram (#14318 )	2026-01-24 19:39:24 +03:00
chenyu	00e9ba0b82	update type for split_uop and where_on_load [pr] (#14319 ) also variable names in where_on_load, before logic update	2026-01-24 11:17:41 -05:00
chenyu	cb69b7b2b2	comment out fold_where_closure (#14316 )	2026-01-24 10:15:42 -05:00
wozeparrot	d74587f16d	fa multi fix 2 (#14314 )	2026-01-23 23:35:02 -08:00
chenyu	d9f0ad1d87	update return type for Tensor.tolist (#14313 ) since sequence is incorrect since it can be list of list, use Any to avoid recursive type	2026-01-23 23:21:49 -05:00
qazal	807bc40931	assembly/amd: dsl and disasm cleanup (#14311 ) * rdna4 inst helper * remove dsl aliases	2026-01-24 11:36:12 +09:00


12045 Commits