tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
Christopher Milan	8c3c026d86	decomp float16 to float32 (#14417 ) * decomp float16 to float32 * denormals arent zero * add test * denormals are zero * fix * oops * bitcast works * fix LOADs * test_dtype passing * cleanup * mypy * debug print * only emulate if EMULATED * very ugly, but passes spec * add test_dtype_alu tests * Revert "very ugly, but passes spec" This reverts commit `fdc3999b65`. * bottom up decompositions * that should have symbolic * simplify a bit * SPEC really works * run with DEBUG * debug=4 * rm debug	2026-02-04 01:37:47 -05:00
Christopher Milan	e575dd8275	prevent UB in long decomp and more emulated tests (#14447 )	2026-01-30 19:38:41 -05:00
Christopher Milan	5e36482314	decompose long to ints where unsupported, try 2 (#14383 )	2026-01-27 23:20:43 -05:00
Christopher Milan	2e72625652	Revert "decompose dtypes.long to ints where unsupported (#14261 )" (#14362 )	2026-01-27 02:04:59 -05:00
Christopher Milan	0793319929	decompose dtypes.long to ints where unsupported (#14261 ) * add works * use carry not overflow * bitwise ops * use tag instead of vec * cleaner * mul somewhat works * mul actually works * SUB and NEG work * SHL/SHR * ulong support * this should work? * oops * fix indexing * all ALU mostly works * refactor * test_dtype passing * signed division works * format * clean * some tests * ruff	2026-01-26 18:34:13 -05:00
chenyu	c7b8f6496f	remove dtypes.index_like and dtypes.fields [pr] (#14207 ) barely used, so just use inline and DTYPES_DICT	2026-01-18 11:49:01 -05:00
chenyu	b34c637767	support bfloat16 for CL (#14073 )	2026-01-08 14:14:29 -05:00
chenyu	3caa1e2c98	fix cast HALF with PYTHON backend (#14058 )	2026-01-07 16:52:05 -05:00
chenyu	5f1ede7f7e	clean up test_dtype (#14055 ) use less lambda	2026-01-07 15:45:42 -05:00
George Hotz	7abf4591ba	use bitsize on dtype (#14011 ) * use bitsize on dtype [pr] * bitsize * bitsize in js export, but might be wrong * reverts * revert that	2026-01-04 12:16:21 -08:00
Jakob Sachs	ab2220b834	Handle missing bfloat16 natives on CPU architectures (#13553 ) * CPU: fix compiler-rt libcall by adding intermediate casts for bfloat16 * fix lint * remove old manual bypass of bf16 for CPU tests, and add diversion converstion from bf16 to/from fp16 --------- Co-authored-by: Jakob Sachs <jakobs99@purelymail.com>	2025-12-11 15:38:43 -05:00
Christopher Milan	0aabc1e938	Mesa NIR backend (NAK/LLVMpipe) (#12089 ) * nak works * TestOps::test_add works * testop has no crashes * fix bool casts * fix typo * add disassemble * RANGE and locals/regs * simplify NAKCompiler * disass cleanup * cleanup nir codegen * almost all tests passing * cleanup notes in extra/ * old notes * only import nak if NIR=1 * fix new SPECIAL syntax * fix local/shared memory * more tests passing * add DEFINE_VAR support * llvmpipe kinda works * diskcache * some mypy stuff * lvp passing test_ops.py * fix imports * actually fix imports * remove 'stdout' * fix llvm import * fix mypy issues * nicer errors * simpler test_dtype skips * test lvp in CI * fix github action syntax * fix more actions typos * switch to mesa 25.1.0 * diskcache_put * better generation for lvp nir_options * b64encode shader blobs * Revert diskcache changes This reverts commits `930fa3de8a` and `8428c694b3`. * general cleanup * better error messages * fix llvm import * fix windows tests * link with libm and libgcc_s * fix some errors * dont check for 'float4' * NIR uses pointer arithmetic * use tinymesa * bump tinymesa * bump tinymesa again * update lvp nir_options * print nir shader with DEBUG * simplify LVPCompiler * more tests * "gated" STORE * NAK is cacheable * more tests * all tests pass locally for NAK * test autogen in CI * autogen deps * more deps * fix uop_gc * fix macos * mypy * save 2 lines * save two more lines * save 1 line * save 4 lines * save more lines * Revert "save more lines" This reverts commit `dd3a720c5a`. * save more lines * fix LVP on windows * refactor * reorganize some code * refactor lib_gpu * move LVP check * out of order loads * remove support.mesa * bump tinymesa version * simplify LVP jit * macos * macos ci * shell: bash * testing * more testing * compute brew prefix * stupid typo * actually fix * lib * stdout on macos * inline gallivm_compile_module * Revert "inline gallivm_compile_module" This reverts commit `b65983b151`. * elf macos * semicolon * inherit from CPULLVMCompiler * ruff * disas test * fix libm linking * default is fine actually * arm works * add elf loader link test * fix NAK beam * pylint is too smart by half --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-10-15 17:38:33 +08:00
chenyu	9f2b69b870	enable few tests for PTX test_dtype (#12445 )	2025-10-03 08:56:30 -04:00
b1tg	54c15d74a4	python float8 support (#11960 ) * basic support * alu * nan in exec_alu * rand_for_dtype * inf + 0.0 * finfo * revert rand_for_dtype * clean * truncate fp8s inf * spec ok * float_to_fp8 nan/inf * least_upper_dtype * clean up --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-09-18 09:17:09 -04:00
chenyu	98ecab7563	remove ml_dtypes (#12169 )	2025-09-14 14:20:05 -04:00
chenyu	0e266f376c	ops_gpu -> ops_cl (#12103 )	2025-09-10 15:15:48 -04:00
nimlgen	551560b87c	do not use getenv('PTX') in tests (#12095 ) * test without ptx * fix tests * fix test * linters	2025-09-10 14:04:07 +03:00
b1tg	58d13a6e3e	remove redundant check (#12087 )	2025-09-09 15:15:39 -04:00
b1tg	82e955fe79	fix inf bug in float_to_fp8 (#12085 )	2025-09-09 12:02:56 -04:00
chenyu	677220ae7e	test_tesnor_data to unit/ (#12013 )	2025-09-04 19:58:27 -04:00
b1tg	a9f07c31bc	fix amd llvm sqrt (#11936 ) * fix amd llvm sqrt * lint --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-09-01 09:31:14 -04:00
b1tg	c1eeb3b99c	only skip AMD_LLVM (#11934 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-31 18:15:47 +03:00
b1tg	75d380a77c	fix transcendentals in python renderer (#11932 ) * fix transcendentals in python renderer * add test --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-31 09:37:17 -04:00
b1tg	b2cc06218a	python bfloat16 (#11912 ) * python bf16 * _to_torch_storage_type --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-29 15:18:02 -04:00
chenyu	f28f613f85	improved float_to_bf16 (#11848 ) round instead of truncate	2025-08-26 11:14:06 -04:00
chenyu	e22e5da9a5	move some test_dtype tests to unit (#11479 )	2025-08-02 15:25:00 -04:00
chenyu	a41140241b	truncate unsigned const in cstyle (#11318 ) it can be a warning or a hard error in clang PTX and PYTHON also need fix, skipping for now	2025-07-22 08:02:12 -04:00
George Hotz	a493eb396c	fix view add 0 (#10840 )	2025-06-16 16:46:12 -07:00
chenyu	8c28b5d833	move dtype spec tests into unit test (#10808 ) * move dtype spec tests into unit test can clean up more after the split * skip CI test_backward_sum_acc_dtype	2025-06-13 22:21:22 -04:00
Sieds Lykles	0daa4c6ed0	Add `DType.min` and `DType.max` properties (#10749 ) * add properties * cleaner test * remove added newline	2025-06-10 08:31:34 -07:00
George Hotz	81b9c04574	move high level stuff to unit tests [pr] (#10708 ) * move high level stuff to unit tests [pr] * process replay on unit tests * fix pr, less compute * set omp num threads * set 200MB buffer size limit * delete junk * fix tests * faster * move test_indexing to unit * faster	2025-06-08 14:05:56 -07:00
George Hotz	32e9949052	rename lazydata to uop (#10698 )	2025-06-08 08:42:22 -07:00
pkotzbach	dbbd755cba	FP8s truncate (#9937 ) * truncate fp8 * fix * maybe like that? * fix linters * ruff * move from extra and add ml_types to tests * minor changes * str to dtypes and nan support --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2025-04-22 19:12:49 -04:00
pkotzbach	5849c43382	FP8s part 1 (#9887 ) * fp8s part 1 * prettier * fixes * fixes * remove stuff that should be in next pr * revert * add creation --------- Co-authored-by: pkotzbach <pawkotz@gmail.com>	2025-04-15 11:20:02 -04:00
chenyu	ce454793e6	support specifying dtype for Tensor.linear (#9886 )	2025-04-14 13:55:11 -04:00
George Hotz	78caf55154	Revert "FP8 support on NVIDIA (#8631 )" This reverts commit `2c8e4ea865`.	2025-04-09 12:27:41 +08:00
pkotzbach	2c8e4ea865	FP8 support on NVIDIA (#8631 ) * squashed fp8 commits * tensorcore start * minor changes * pre-commit * pylint * Delete fp8mul.cu * clean * small bugfix * fix test_dtype * fix test_dtype_alu * add EMULATE_CUDA_SM89 * fix ci * fix test_linearizer * fix test_linearizer * fix swizzle * add debug to simple_matmul * fixed swizzle * python emulator * refactor python emulator * setup fix * numpy setup * ml_dtypes only in emulate_cuda_sm89 * fix pylint * fix tests * fix mypy * fix mypy * fix ruff * done python emulator * add acc type * tests * mypy * clean code * add cuda tensor core tests to CI * minor fix * clean test_dtype.py * clean cstyle.py * clean test_ops.py * fix test * fix test * whitespaces * pylint * pylint * amd? * amd? * amd * reduce lines * mockgpu remove * fix * ruff * ruff * fix mypy * ruff * test only for cuda * fixed formatting * small fixes * small fix * least_upper_dtype if fp8s not supported * log and reciprocal are supported for fp8s * ops python fixes * dtypes.fp8s use * e4m3 + e5m2 result dtype test * truncate linter fix --------- Co-authored-by: pkotzbach <pawkotz@gmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-04-08 21:54:04 -04:00
chenyu	79145e3d40	cleanup truncate_bf16 [pr] (#9725 ) use torch bfloat16 for groundtruth in test. also a TODO for discrepancy	2025-04-03 05:43:49 -04:00
qazal	e26caf4c3a	hotfix: skip test_mean_half_precision_underflow on amd ci (#9476 ) The global size is very large (781250 gidx) and the emulated version takes more than 1 minute to execute the kernel.	2025-03-17 16:47:48 +08:00
chenyu	01e8b60911	acc_dtype -> dtype (#9402 ) matched numpy and torch	2025-03-10 16:05:30 -04:00
chenyu	3ae66e59a3	least_upper_float is at least default_float (#9303 ) * least_upper_float is at least default_float en route for div rounding mode. dtype of true int division would change from int32 to default_float, which matches torch too. * fix bert acc	2025-02-28 10:41:56 -05:00
qazal	cbfe95d306	bring cast before view back (#9230 ) * bring cast before view back * tune it to only trigger on expands --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-25 01:50:39 +02:00
Ahmed Harmouche	59fe45f947	Solve get_grouped_dims does not split issue (#9085 ) * Solve dims too large errors on webgpu * Simplify divisor find * Test square root divisor * Fix lint * Refactor into group_dims and split_dims * Refactor * Fix lint * Add back max check in _group_dims * Prefer grouping over split --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-16 19:57:29 -05:00
b1tg	1f1362fd27	add truncate_bf16 (#9078 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-02-15 07:59:09 +08:00
Ahmed Harmouche	916d5e7f08	WebGPU f16 support (f16 bounty part 2) (#8653 ) * WebGPU f16 support * Don't enable f16 yet * dtype tests passing after bitcast fix * Maybe all WebGPU green? * Require shader-f16 in examples * Minor wgsl touchup * 1 line shorter * Simpler * Add transcendetal support * log2 nan location mismatch on Vulkan * Nan skips	2025-02-12 19:46:53 +08:00
George Hotz	c1c5227acb	preserve size in dtype ptr [pr] (#8898 )	2025-02-05 14:38:57 +08:00
George Hotz	c85737c200	assert to prepare for grad uop [pr] (#8280 ) * assert to prepare for grad uop [pr] * fix test_nn * fix most of test_tensor * few more tests * fix multi * uniform gradient * acc_dtype * any for multi * fix typing * fix assert, CAST_BEFORE_VIEW is still the issue * explict test for CAST_BEFORE_VIEW --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-14 13:26:56 -08:00
qazal	ed618a72e7	do not use subbuffer for bitcast (#8514 ) * do not use subbuffer for bitcast * edit that test * explicit test for ptx * ptx	2025-01-06 18:40:46 +02:00
George Hotz	6608ba316d	add size of the buffer to the ptr dtype (#8322 )	2024-12-18 12:46:35 -08:00
uuuvn	da2245a458	Fix double => half cast on clang (#8265 )	2024-12-15 11:24:05 -08:00

227 Commits