tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-11 07:58:08 -05:00

Author	SHA1	Message	Date
chenyu	b34c637767	support bfloat16 for CL (#14073 )	2026-01-08 14:14:29 -05:00
chenyu	3caa1e2c98	fix cast HALF with PYTHON backend (#14058 )	2026-01-07 16:52:05 -05:00
chenyu	5f1ede7f7e	clean up test_dtype (#14055 ) use less lambda	2026-01-07 15:45:42 -05:00
George Hotz	7abf4591ba	use bitsize on dtype (#14011 ) * use bitsize on dtype [pr] * bitsize * bitsize in js export, but might be wrong * reverts * revert that	2026-01-04 12:16:21 -08:00
Jakob Sachs	ab2220b834	Handle missing bfloat16 natives on CPU architectures (#13553 ) * CPU: fix compiler-rt libcall by adding intermediate casts for bfloat16 * fix lint * remove old manual bypass of bf16 for CPU tests, and add diversion converstion from bf16 to/from fp16 --------- Co-authored-by: Jakob Sachs <jakobs99@purelymail.com>	2025-12-11 15:38:43 -05:00
Christopher Milan	0aabc1e938	Mesa NIR backend (NAK/LLVMpipe) (#12089 ) * nak works * TestOps::test_add works * testop has no crashes * fix bool casts * fix typo * add disassemble * RANGE and locals/regs * simplify NAKCompiler * disass cleanup * cleanup nir codegen * almost all tests passing * cleanup notes in extra/ * old notes * only import nak if NIR=1 * fix new SPECIAL syntax * fix local/shared memory * more tests passing * add DEFINE_VAR support * llvmpipe kinda works * diskcache * some mypy stuff * lvp passing test_ops.py * fix imports * actually fix imports * remove 'stdout' * fix llvm import * fix mypy issues * nicer errors * simpler test_dtype skips * test lvp in CI * fix github action syntax * fix more actions typos * switch to mesa 25.1.0 * diskcache_put * better generation for lvp nir_options * b64encode shader blobs * Revert diskcache changes This reverts commits `930fa3de8a` and `8428c694b3`. * general cleanup * better error messages * fix llvm import * fix windows tests * link with libm and libgcc_s * fix some errors * dont check for 'float4' * NIR uses pointer arithmetic * use tinymesa * bump tinymesa * bump tinymesa again * update lvp nir_options * print nir shader with DEBUG * simplify LVPCompiler * more tests * "gated" STORE * NAK is cacheable * more tests * all tests pass locally for NAK * test autogen in CI * autogen deps * more deps * fix uop_gc * fix macos * mypy * save 2 lines * save two more lines * save 1 line * save 4 lines * save more lines * Revert "save more lines" This reverts commit `dd3a720c5a`. * save more lines * fix LVP on windows * refactor * reorganize some code * refactor lib_gpu * move LVP check * out of order loads * remove support.mesa * bump tinymesa version * simplify LVP jit * macos * macos ci * shell: bash * testing * more testing * compute brew prefix * stupid typo * actually fix * lib * stdout on macos * inline gallivm_compile_module * Revert "inline gallivm_compile_module" This reverts commit `b65983b151`. * elf macos * semicolon * inherit from CPULLVMCompiler * ruff * disas test * fix libm linking * default is fine actually * arm works * add elf loader link test * fix NAK beam * pylint is too smart by half --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-10-15 17:38:33 +08:00
chenyu	9f2b69b870	enable few tests for PTX test_dtype (#12445 )	2025-10-03 08:56:30 -04:00
b1tg	54c15d74a4	python float8 support (#11960 ) * basic support * alu * nan in exec_alu * rand_for_dtype * inf + 0.0 * finfo * revert rand_for_dtype * clean * truncate fp8s inf * spec ok * float_to_fp8 nan/inf * least_upper_dtype * clean up --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-09-18 09:17:09 -04:00
chenyu	98ecab7563	remove ml_dtypes (#12169 )	2025-09-14 14:20:05 -04:00
chenyu	0e266f376c	ops_gpu -> ops_cl (#12103 )	2025-09-10 15:15:48 -04:00
nimlgen	551560b87c	do not use getenv('PTX') in tests (#12095 ) * test without ptx * fix tests * fix test * linters	2025-09-10 14:04:07 +03:00
b1tg	58d13a6e3e	remove redundant check (#12087 )	2025-09-09 15:15:39 -04:00
b1tg	82e955fe79	fix inf bug in float_to_fp8 (#12085 )	2025-09-09 12:02:56 -04:00
chenyu	677220ae7e	test_tesnor_data to unit/ (#12013 )	2025-09-04 19:58:27 -04:00
b1tg	a9f07c31bc	fix amd llvm sqrt (#11936 ) * fix amd llvm sqrt * lint --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-09-01 09:31:14 -04:00
b1tg	c1eeb3b99c	only skip AMD_LLVM (#11934 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-31 18:15:47 +03:00
b1tg	75d380a77c	fix transcendentals in python renderer (#11932 ) * fix transcendentals in python renderer * add test --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-31 09:37:17 -04:00
b1tg	b2cc06218a	python bfloat16 (#11912 ) * python bf16 * _to_torch_storage_type --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-29 15:18:02 -04:00
chenyu	f28f613f85	improved float_to_bf16 (#11848 ) round instead of truncate	2025-08-26 11:14:06 -04:00
chenyu	e22e5da9a5	move some test_dtype tests to unit (#11479 )	2025-08-02 15:25:00 -04:00
chenyu	a41140241b	truncate unsigned const in cstyle (#11318 ) it can be a warning or a hard error in clang PTX and PYTHON also need fix, skipping for now	2025-07-22 08:02:12 -04:00
George Hotz	a493eb396c	fix view add 0 (#10840 )	2025-06-16 16:46:12 -07:00
chenyu	8c28b5d833	move dtype spec tests into unit test (#10808 ) * move dtype spec tests into unit test can clean up more after the split * skip CI test_backward_sum_acc_dtype	2025-06-13 22:21:22 -04:00
Sieds Lykles	0daa4c6ed0	Add `DType.min` and `DType.max` properties (#10749 ) * add properties * cleaner test * remove added newline	2025-06-10 08:31:34 -07:00
George Hotz	81b9c04574	move high level stuff to unit tests [pr] (#10708 ) * move high level stuff to unit tests [pr] * process replay on unit tests * fix pr, less compute * set omp num threads * set 200MB buffer size limit * delete junk * fix tests * faster * move test_indexing to unit * faster	2025-06-08 14:05:56 -07:00
George Hotz	32e9949052	rename lazydata to uop (#10698 )	2025-06-08 08:42:22 -07:00
pkotzbach	dbbd755cba	FP8s truncate (#9937 ) * truncate fp8 * fix * maybe like that? * fix linters * ruff * move from extra and add ml_types to tests * minor changes * str to dtypes and nan support --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2025-04-22 19:12:49 -04:00
pkotzbach	5849c43382	FP8s part 1 (#9887 ) * fp8s part 1 * prettier * fixes * fixes * remove stuff that should be in next pr * revert * add creation --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2025-04-15 11:20:02 -04:00
chenyu	ce454793e6	support specifying dtype for Tensor.linear (#9886 )	2025-04-14 13:55:11 -04:00
George Hotz	78caf55154	Revert "FP8 support on NVIDIA (#8631 )" This reverts commit `2c8e4ea865`.	2025-04-09 12:27:41 +08:00
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631 ) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
chenyu	79145e3d40	cleanup truncate_bf16 [pr] (#9725 ) use torch bfloat16 for groundtruth in test. also a TODO for discrepancy	2025-04-03 05:43:49 -04:00
qazal	e26caf4c3a	hotfix: skip test_mean_half_precision_underflow on amd ci (#9476 ) The global size is very large (781250 gidx) and the emulated version takes more than 1 minute to execute the kernel.	2025-03-17 16:47:48 +08:00
chenyu	01e8b60911	acc_dtype -> dtype (#9402 ) matched numpy and torch	2025-03-10 16:05:30 -04:00
chenyu	3ae66e59a3	least_upper_float is at least default_float (#9303 ) * least_upper_float is at least default_float en route for div rounding mode. dtype of true int division would change from int32 to default_float, which matches torch too. * fix bert acc	2025-02-28 10:41:56 -05:00
qazal	cbfe95d306	bring cast before view back (#9230 ) * bring cast before view back * tune it to only trigger on expands --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-25 01:50:39 +02:00
Ahmed Harmouche	59fe45f947	Solve get_grouped_dims does not split issue (#9085 ) * Solve dims too large errors on webgpu * Simplify divisor find * Test square root divisor * Fix lint * Refactor into group_dims and split_dims * Refactor * Fix lint * Add back max check in _group_dims * Prefer grouping over split --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-16 19:57:29 -05:00
b1tg	1f1362fd27	add truncate_bf16 (#9078 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-02-15 07:59:09 +08:00
Ahmed Harmouche	916d5e7f08	WebGPU f16 support (f16 bounty part 2) (#8653 ) * WebGPU f16 support * Don't enable f16 yet * dtype tests passing after bitcast fix * Maybe all WebGPU green? * Require shader-f16 in examples * Minor wgsl touchup * 1 line shorter * Simpler * Add transcendetal support * log2 nan location mismatch on Vulkan * Nan skips	2025-02-12 19:46:53 +08:00
George Hotz	c1c5227acb	preserve size in dtype ptr [pr] (#8898 )	2025-02-05 14:38:57 +08:00
George Hotz	c85737c200	assert to prepare for grad uop [pr] (#8280 ) * assert to prepare for grad uop [pr] * fix test_nn * fix most of test_tensor * few more tests * fix multi * uniform gradient * acc_dtype * any for multi * fix typing * fix assert, CAST_BEFORE_VIEW is still the issue * explict test for CAST_BEFORE_VIEW --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-14 13:26:56 -08:00
qazal	ed618a72e7	do not use subbuffer for bitcast (#8514 ) * do not use subbuffer for bitcast * edit that test * explicit test for ptx * ptx	2025-01-06 18:40:46 +02:00
George Hotz	6608ba316d	add size of the buffer to the ptr dtype (#8322 )	2024-12-18 12:46:35 -08:00
uuuvn	da2245a458	Fix double => half cast on clang (#8265 )	2024-12-15 11:24:05 -08:00
pkotzbach	c1b79c118f	add unit tests for to_dtype (#8217 ) * add unit test for to_dtype * add unit test for to_dtype --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2024-12-13 16:21:02 -05:00
chenyu	40a4c603b9	remove more test skip for webgpu [pr] (#8192 )	2024-12-12 14:06:35 -05:00
Ahmed Harmouche	db330a3110	Remove WebGL (#8012 )	2024-12-03 16:02:53 +01:00
George Hotz	d53cd92364	fix tests for delete lazy [pr] (#7980 )	2024-12-02 12:00:48 +08:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
chenyu	40d7535eeb	clean up DTYPES_DICT [pr] (#7845 )	2024-11-22 10:01:34 -05:00

221 Commits