tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-02-05 12:15:05 -05:00

Author	SHA1	Message	Date
b1tg	54c15d74a4	python float8 support (#11960 ) * basic support * alu * nan in exec_alu * rand_for_dtype * inf + 0.0 * finfo * revert rand_for_dtype * clean * truncate fp8s inf * spec ok * float_to_fp8 nan/inf * least_upper_dtype * clean up --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-09-18 09:17:09 -04:00
chenyu	98ecab7563	remove ml_dtypes (#12169 )	2025-09-14 14:20:05 -04:00
chenyu	0e266f376c	ops_gpu -> ops_cl (#12103 )	2025-09-10 15:15:48 -04:00
nimlgen	551560b87c	do not use getenv('PTX') in tests (#12095 ) * test without ptx * fix tests * fix test * linters	2025-09-10 14:04:07 +03:00
b1tg	58d13a6e3e	remove redundant check (#12087 )	2025-09-09 15:15:39 -04:00
b1tg	82e955fe79	fix inf bug in float_to_fp8 (#12085 )	2025-09-09 12:02:56 -04:00
chenyu	677220ae7e	test_tesnor_data to unit/ (#12013 )	2025-09-04 19:58:27 -04:00
b1tg	a9f07c31bc	fix amd llvm sqrt (#11936 ) * fix amd llvm sqrt * lint --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-09-01 09:31:14 -04:00
b1tg	c1eeb3b99c	only skip AMD_LLVM (#11934 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-31 18:15:47 +03:00
b1tg	75d380a77c	fix transcendentals in python renderer (#11932 ) * fix transcendentals in python renderer * add test --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-31 09:37:17 -04:00
b1tg	b2cc06218a	python bfloat16 (#11912 ) * python bf16 * _to_torch_storage_type --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-29 15:18:02 -04:00
chenyu	f28f613f85	improved float_to_bf16 (#11848 ) round instead of truncate	2025-08-26 11:14:06 -04:00
chenyu	e22e5da9a5	move some test_dtype tests to unit (#11479 )	2025-08-02 15:25:00 -04:00
chenyu	a41140241b	truncate unsigned const in cstyle (#11318 ) it can be a warning or a hard error in clang PTX and PYTHON also need fix, skipping for now	2025-07-22 08:02:12 -04:00
George Hotz	a493eb396c	fix view add 0 (#10840 )	2025-06-16 16:46:12 -07:00
chenyu	8c28b5d833	move dtype spec tests into unit test (#10808 ) * move dtype spec tests into unit test can clean up more after the split * skip CI test_backward_sum_acc_dtype	2025-06-13 22:21:22 -04:00
Sieds Lykles	0daa4c6ed0	Add `DType.min` and `DType.max` properties (#10749 ) * add properties * cleaner test * remove added newline	2025-06-10 08:31:34 -07:00
George Hotz	81b9c04574	move high level stuff to unit tests [pr] (#10708 ) * move high level stuff to unit tests [pr] * process replay on unit tests * fix pr, less compute * set omp num threads * set 200MB buffer size limit * delete junk * fix tests * faster * move test_indexing to unit * faster	2025-06-08 14:05:56 -07:00
George Hotz	32e9949052	rename lazydata to uop (#10698 )	2025-06-08 08:42:22 -07:00
pkotzbach	dbbd755cba	FP8s truncate (#9937 ) * truncate fp8 * fix * maybe like that? * fix linters * ruff * move from extra and add ml_types to tests * minor changes * str to dtypes and nan support --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2025-04-22 19:12:49 -04:00
pkotzbach	5849c43382	FP8s part 1 (#9887 ) * fp8s part 1 * prettier * fixes * fixes * remove stuff that should be in next pr * revert * add creation --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2025-04-15 11:20:02 -04:00
chenyu	ce454793e6	support specifying dtype for Tensor.linear (#9886 )	2025-04-14 13:55:11 -04:00
George Hotz	78caf55154	Revert "FP8 support on NVIDIA (#8631 )" This reverts commit `2c8e4ea865`.	2025-04-09 12:27:41 +08:00
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631 ) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
chenyu	79145e3d40	cleanup truncate_bf16 [pr] (#9725 ) use torch bfloat16 for groundtruth in test. also a TODO for discrepancy	2025-04-03 05:43:49 -04:00
qazal	e26caf4c3a	hotfix: skip test_mean_half_precision_underflow on amd ci (#9476 ) The global size is very large (781250 gidx) and the emulated version takes more than 1 minute to execute the kernel.	2025-03-17 16:47:48 +08:00
chenyu	01e8b60911	acc_dtype -> dtype (#9402 ) matched numpy and torch	2025-03-10 16:05:30 -04:00
chenyu	3ae66e59a3	least_upper_float is at least default_float (#9303 ) * least_upper_float is at least default_float en route for div rounding mode. dtype of true int division would change from int32 to default_float, which matches torch too. * fix bert acc	2025-02-28 10:41:56 -05:00
qazal	cbfe95d306	bring cast before view back (#9230 ) * bring cast before view back * tune it to only trigger on expands --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-25 01:50:39 +02:00
Ahmed Harmouche	59fe45f947	Solve get_grouped_dims does not split issue (#9085 ) * Solve dims too large errors on webgpu * Simplify divisor find * Test square root divisor * Fix lint * Refactor into group_dims and split_dims * Refactor * Fix lint * Add back max check in _group_dims * Prefer grouping over split --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-16 19:57:29 -05:00
b1tg	1f1362fd27	add truncate_bf16 (#9078 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-02-15 07:59:09 +08:00
Ahmed Harmouche	916d5e7f08	WebGPU f16 support (f16 bounty part 2) (#8653 ) * WebGPU f16 support * Don't enable f16 yet * dtype tests passing after bitcast fix * Maybe all WebGPU green? * Require shader-f16 in examples * Minor wgsl touchup * 1 line shorter * Simpler * Add transcendetal support * log2 nan location mismatch on Vulkan * Nan skips	2025-02-12 19:46:53 +08:00
George Hotz	c1c5227acb	preserve size in dtype ptr [pr] (#8898 )	2025-02-05 14:38:57 +08:00
George Hotz	c85737c200	assert to prepare for grad uop [pr] (#8280 ) * assert to prepare for grad uop [pr] * fix test_nn * fix most of test_tensor * few more tests * fix multi * uniform gradient * acc_dtype * any for multi * fix typing * fix assert, CAST_BEFORE_VIEW is still the issue * explict test for CAST_BEFORE_VIEW --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-14 13:26:56 -08:00
qazal	ed618a72e7	do not use subbuffer for bitcast (#8514 ) * do not use subbuffer for bitcast * edit that test * explicit test for ptx * ptx	2025-01-06 18:40:46 +02:00
George Hotz	6608ba316d	add size of the buffer to the ptr dtype (#8322 )	2024-12-18 12:46:35 -08:00
uuuvn	da2245a458	Fix double => half cast on clang (#8265 )	2024-12-15 11:24:05 -08:00
pkotzbach	c1b79c118f	add unit tests for to_dtype (#8217 ) * add unit test for to_dtype * add unit test for to_dtype --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2024-12-13 16:21:02 -05:00
chenyu	40a4c603b9	remove more test skip for webgpu [pr] (#8192 )	2024-12-12 14:06:35 -05:00
Ahmed Harmouche	db330a3110	Remove WebGL (#8012 )	2024-12-03 16:02:53 +01:00
George Hotz	d53cd92364	fix tests for delete lazy [pr] (#7980 )	2024-12-02 12:00:48 +08:00
Ahmed Harmouche	10618aba98	Bring back WebGPU (#7063 ) * Start from andredaprato:webgpu-clean * Fix infs * inf wgsl function is not needed * Emulated ulong for threefry, more tests passing * Randomness tests passing * Update model export to support new changes in webgpu, efficientnet export works again * Simplify shift emulation in wgsl * Delete test file * Fix bigger than u32 u32 literal * Why was skip copies added here? * Python3.12 for webgpu tests * Fix model export syntax error * Get test ops passing with some skips * Fix lint * Much simpler shift * Run more tests * Timestamp queries are not supported in CI, so skip search tests * All fancy indexing passing * r is ctx * Run more dtype tests by using is_dtype_supported * Cleanup ulong shift rendering * UPat -> Pat, UOps -> Ops * Pat -> UPat * Refactor render_ushift if-else * Pattern to avoid ulong mul * Remove vals_dtype * is_nan trick + rewrite, test_isnan passing * Rewrite a * select(1, nan, gate) -> select(a, nan, gate) * No arg, just op * Support char, uchar, short, ushort * Run test_index_mnis now that we have uint8 * Fix pyling * Save 3 lines by using base Compiler * No more long emulation * Remove fixup_binops * No more external_local_bufx wgsl specific cstyle modif, use base extra_pm * Simpler, faster copyin/out * Skip some new tests that use long * Fix typo * copyout touchup * Save lines by using render_cast * WebGL is not supported in core, delete it from is_dtype_supported * More narrow test skips for some unary tests * TernaryOps, UnaryOps -> Ops * TinyGrad supports WebGPU * StableDiffusion demo: f16tof32 gpu is a lib, update UI * Packed load/store, no more scale_size, no core tinygrad changes * Rename copyin, copyout * Device -> dev * Fix lint * Pattern matcher rule for packed load/store * Refactor * Shorter packed load/store * this should fix lint * Fix mypy * SD compile script working * New SD webgpu UI * New default prompt * New SD weights * Fix title when webgpu not available * Run symbolic tests, simplify is_nan, use round_up * Show step time on UI * Bump minimum wgpu version to v0.19 * Fix latent --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-26 12:26:40 +08:00
chenyu	40d7535eeb	clean up DTYPES_DICT [pr] (#7845 )	2024-11-22 10:01:34 -05:00
George Hotz	205befa788	move is_dtype_supported to device [pr] (#7575 )	2024-11-07 20:38:03 +08:00
Ahmed Harmouche	36488a2a43	Use is_dtype_supported in more places in tests (#7529 )	2024-11-04 09:21:15 -05:00
George Hotz	76a41a1083	don't compare with pointer dtype (#7394 ) * don't compare with pointer dtype * more cleanup * images are pointers * handle IMAGE better * cleaner test_image * this work * pr match * cleanup	2024-10-30 17:48:27 +08:00
George Hotz	4e2895f8d2	safe changes from new dtype branch [pr] (#7397 ) * safe changes from new dtype branch [pr] * only image test on GPU	2024-10-30 17:18:48 +08:00
George Hotz	27995a2a04	vcount + cleanups (#7393 ) * Revert "Revert "Restore vcount [pr] (#7390)" (#7392)" This reverts commit `4ca53db604`. * ugh bugfix [pr] * uops_to_dtypes function * fixups * varnames * fix mypy * just 4,8 * tests	2024-10-30 12:50:15 +08:00
George Hotz	4ca53db604	Revert "Restore vcount [pr] (#7390 )" (#7392 ) This reverts commit `1058f9c9ff`.	2024-10-30 11:40:25 +08:00
George Hotz	1058f9c9ff	Restore vcount [pr] (#7390 ) * Revert "Revert "add vcount to PtrDtype (#7388)"" This reverts commit `399a5219dd`. * Revert "Revert "add tests to vcount stuff [pr] (#7389)"" This reverts commit `cc8d6dbdf3`. * no ptr	2024-10-30 11:27:55 +08:00

1 2 3 4 5

214 Commits