tinygrad

Mirror of https://github.com/tinygrad/tinygrad.git, synced 2026-01-13 17:08:11 -05:00

Author	SHA1	Message	Date
George Hotz	2d4f01fda0	move mixins to mixin dir (#13105) * move mixins to mixin dir * math	2025-11-05 10:18:33 -08:00
b1tg	60d7e232f2	cuda fp8 (#12782) * cuda fp8 * tensor core * tc test * clean * clean pm	2025-10-21 15:05:25 -04:00
George Hotz	38dcadf07b	delete kernel.py (#12040) * delete kernel.py * delete that file * rip and tear * don't test search * imports * fix torch frontend * not a part of regen	2025-09-05 15:52:07 -07:00
George Hotz	82be8abfd2	move opt under codegen (#11569)	2025-08-07 14:19:17 -07:00
George Hotz	92678e59ee	move kernel to opt (#10899)	2025-06-20 15:22:28 -07:00
chenyu	c5db5b83b9	add SHOULD_USE_TC=1 check to simple_matmul (#9802) * add SHOULD_USE_TC=1 check to simple_matmul also zero centered the random input and update atol for tf32 * ATOL=2e-2 for HALF	2025-04-09 02:24:42 -04:00
George Hotz	78caf55154	Revert "FP8 support on NVIDIA (#8631)" This reverts commit `2c8e4ea865`.	2025-04-09 12:27:41 +08:00
George Hotz	14928fecff	Revert "fix TF32 tensor core dropped in tc_sm89 (#9798)" This reverts commit `7c9a96824f`.	2025-04-09 12:27:39 +08:00
chenyu	7c9a96824f	fix TF32 tensor core dropped in tc_sm89 (#9798) also add `SHOULD_USE_TC=1` to verify TC is applied in simple_matmul	2025-04-08 23:20:50 -04:00
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
chenyu	0e591baf43	redo simple_matmul change (#9450) numpy does not support bfloat16	2025-03-14 17:53:52 -04:00
chenyu	b0f63d3c04	Revert "`simple_matmul.py` uses np to generate random (#9438)" (#9449) This reverts commit `14018050c1`.	2025-03-14 17:14:22 -04:00
Ignacio Sica	14018050c1	`simple_matmul.py` uses np to generate random (#9438) * np generates randoms * hotfix: use generator for int dtype * float32 as default dtype for float generator * use np.float32 instead of stirng * add dtype= to integers generator * change import _to_np_dtype source	2025-03-14 17:36:50 -03:00
chenyu	01e8b60911	acc_dtype -> dtype (#9402) matched numpy and torch	2025-03-10 16:05:30 -04:00
ignaciosica	b49a04145e	fix for int plus minor cleanup (#8650)	2025-01-17 22:30:39 -05:00
nimlgen	81a4a9623c	add qcom dsp runtime (#6112) * calling qualcomm dsp from python * include so files * add include file * adsprpc.py * running with adsprpc * work * 32-bit support in elf * compilation works * ion * msm_ion * working DSP backend * getting 500 MFLOPS on matmul * beam works with timing * move to autogen * disasm * progress * simple tests pass * qcom_dsp * more dsp autogen * progress * some progress * works w/o lib * checkpoint * no lib * ugh, better * cleaner, but with lib. test good, but with the hack * remove autogens * small * push * simpler * revert this * run_3 * simpler * android * handle * run it * why? * run2 * to gen * cc * cleaner * elf * part of autogen * comemnt * no lib * autohen * linter * bug reproducer * cleaner * this repro is almost empty and doesn't work!!!! * with this test_ops passes, no crashes anymore * cleaner * linter * renames * shorter * remoev contextlib * ugh * myoy * cleaner * cleaner * remove import * conn * import * revert this * remove heavy .so * shorter alloc * not tue anymore --------- Co-authored-by: Comma Device <device@comma.ai> Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: George Hotz <george@comma.ai>	2024-09-13 21:01:33 +03:00
Francis Lam	c91b7b1739	test: add fuzz_matmul and better debugging for simple_matmul (#4199) also show unoptimized shape in verify_kernel	2024-04-16 23:40:31 -04:00
Francis Lam	7c5729a3bd	wmma: refactor to remove wmma_func and create TC funcs as needed (#3945) * wmma: refactor to remove wmma_func and create TC funcs as needed * test_linearizer: disable bf16 CUDA during emulation testing * cstyle: clean up creation of CUDA vec dtypes * extra/gemm: add option to accumulate to bfloat16 * cleanups * benchmark: add CUDA bfloat16 matmul * more cleanups	2024-03-27 16:43:09 -04:00
Francis Lam	a26090d404	search: change to use "spawn" and limit the number of tasks per child (#3862) also clean up some examples to use __main__ and not initialize resources outside of main	2024-03-21 21:23:36 -07:00
Francis Lam	ddbdb52f77	wmma: enable METAL half tensor cores and clean up cstyle (#3095) * wmma: enable METAL half tensor cores and clean up cstyle * revert simple_matmul rand changes and break line in tensor * added metal fp16->fp32 tensor core	2024-01-12 16:25:28 -05:00
chenyu	1d730b8853	remove ACCUM_FP32 in simple_matmul.py (#3045) * remove ACCUM_FP32 in simple_matmul.py accumate for half inputs is always in float * move test llama compile speed to metal	2024-01-08 17:37:57 -05:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
George Hotz	b5fd160b39	hotfix: increase rtol on simple_matmul	2023-12-11 10:10:29 -08:00
Francis Lam	dece9958f8	wmma: clean up to make WMMA arg order consistent (#2014) also add cache defeat to extra/gemm/simple_matmul.py	2023-10-07 17:45:40 -07:00
Francis Lam	f445e056ed	wmma: add test and tensor core shape (#1925)	2023-09-28 18:04:28 -07:00
George Hotz	e464442adf	WMMA for 7900XTX (#1563) * go * hip no LRU * work * works * 16 TFLOPS * 29 TFLOPS * 30 TFLOPS * never mind, it's 60 TFLOPS * fix metal WMMA * put hip alloc back	2023-08-19 09:07:23 -07:00
George Hotz	90fff82c8a	Rdna (#776) * assembler maybe * custom asm * rdna3 on quiet * trigger crashes * fixed notes * non-fatal rdna2 crash * Crash4 * improve rdna sniffer * comments * improve sniffer * asm * 131 TFLOPS RDNA3 * opt simple matmul * todos	2023-05-16 05:33:57 -07:00

27 Commits