tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-29 03:00:14 -04:00

Author	SHA1	Message	Date
chenyu	0703075357	bf16 is float (#2786 ) * add bfloat16 to is_float check * and test	2023-12-15 21:41:30 -05:00
chenyu	e4bbbc5bc3	Revert "Use the reduceop dtype to define the acc in linearizer (#2625 )" (#2783 ) This reverts commit `f3ed96a929`.	2023-12-15 16:29:10 -05:00
qazal	f3ed96a929	Use the reduceop dtype to define the acc in linearizer (#2625 ) * upcast the other way * Revert "upcast the other way" This reverts commit `355692ba79`. * remove uop cast, this should have never been there * add regression test * now fuzz it correct test * the accumulator is always the output type lint * fuzz all reduce ops * MULACC upcast_dtype could be half too opencl supports it https://man.opencl.org/mad.html * cast to the same dtype is a noop * internal casting support for MULACC * fuzz test mulacc internal casting * get_reduce_dtype handle vectorized acc update get_reduce_acc calls with the correct dtype update tests * pending _complete_ implementation of a function that gets the dtype based on self.reduceop +more failing tests * get_reduce_dtype try 2 add TODO * get_lazyop_info already does it * cleanup * bring back internal casting support for mulacc * use the scalar version of the acc dtype * conceptual diff cleanup * one extra line to a cleaner linearizer * correct test assumptions - these should promote? * rm mulacc cast, the cast of vins happens with the acc dtype promotion linearizer hacks * Revert "rm mulacc cast, the cast of vins happens with the acc dtype promotion" This reverts commit `afdd540733`. Revert "correct test assumptions - these should promote?" This reverts commit `49ae2206ed`. * skip tests blocked by MULACC->lazyop cleanup * final changes to add back internal casting for MULACC and update skip test logic, upcast works but downcast does not * only test the linearizer abstraction layer we wanna ensure that linearizer matches whatever lazy is returning * remove unused hypothesis module * remove mulacc related changes, those will move to the lazy pr * remove midcast test * move to helpers * Revert "remove midcast test" This reverts commit `86e74d7960`. add TODO with skip --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2023-12-15 16:14:32 -05:00
chenyu	765f8b05e5	TernaryOps.WHERE has vin[0] as bool and BinaryOps.CMPLT always outputs bool (#2782 ) * vin[0] to where is always bool * due to better hack * update test * fix test_uops	2023-12-15 14:51:51 -05:00
George Hotz	96a276cc7c	hotfix: add test_reduce_permute_nofuse to master	2023-12-15 09:39:47 -08:00
qazal	66f07d97e2	don't auto-cast half to float in unary functions (#2776 ) * least upper float * dont cast to the same thing * tests for least_upper_float * add regression tests to test_dtype_alu * the call is pretty cheap probably cache is too much overhead	2023-12-15 10:11:47 -05:00
George Hotz	c6eb618013	tests from new lazy branch (#2774 ) * tests from new lazy branch * fix lin 11 * that was needed * doesn't fail * mark * meant that * llvm passes	2023-12-14 23:06:39 -08:00
chenyu	c0f76ed4ea	transformer kvcache and mask have same dtype as input (#2771 ) * transformer kvcache and mask have same dtype as input * don't use `=0` in cstyle ternary where * (bool) * where float16 test	2023-12-14 22:41:51 -05:00
chenyu	66d9eb10b6	arange default dtype to int and zeros/ones default to float (#2769 )	2023-12-14 17:53:00 -05:00
qazal	3cf4376ce2	test_linearizer cleanup (#2766 ) * test_linearizer cleanup * use unittest.skipIf * update msg	2023-12-14 17:20:09 -05:00
chenyu	57017c87e9	remove duplicated dtype in DEFINE_GLOBAL args (#2768 ) now DEFINE_GLOBAL uop.arg[1] is always the same as uop.dtype, we can remove the one in arg and just use uop.dtype	2023-12-14 15:42:36 -05:00
chenyu	5235cdee3d	remove _arg_int32 internal type (#2767 ) in DEFINE_GLOBAL, PtrDtype(int32) is buffer and int32 is int	2023-12-14 14:17:14 -05:00
chenyu	0ae22b0f81	restore Tensor.default_type in test_hip_rdna3 (#2763 ) might cause flaky tests	2023-12-14 11:35:38 -05:00
qazal	746cb5de21	Test coverage for matvec (#2762 ) * add test coverage for matvec * skip devices that don't support locals	2023-12-14 11:34:56 -05:00
chenyu	107dd8f3d7	fix a typo in test_dtype_alu (#2754 )	2023-12-13 19:23:21 -05:00
chenyu	81a747fc63	more test cases in test_slice_fancy_indexing_with_idx (#2751 )	2023-12-13 17:52:26 -05:00
George Hotz	7e5b3e53fe	changes to prep for new lazy (#2748 ) * changes to prep for new lazy * put those back	2023-12-13 10:28:22 -08:00
Umut Zengin	8ad7cfeeb1	More simplification in to_image_idx and symbolic (#2679 ) * less valid * add test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2023-12-13 12:30:44 -05:00
chenyu	aa4a0de287	simpler Tensor.pow to integer (#2746 )	2023-12-13 11:39:20 -05:00
chenyu	2ef33abd20	some unary functions cast int input into float (#2740 ) * some unary functions cast int input into float * precision * image dtype	2023-12-13 00:10:29 -05:00
Shawn Hagler	51afe938f1	update onnx model links (#2737 )	2023-12-12 19:11:11 -08:00
chenyu	0869e7a301	update onnx benchmark urls (#2735 ) onnx is remapping the models, old ones are in archive/	2023-12-12 20:46:01 -05:00
George Hotz	6d6eb9302d	ruff checks the max line length is 150 (#2734 ) * ruff checks the max line length is 150 * fix tensor.py * a lot more * done	2023-12-12 17:34:47 -08:00
chenyu	00b611c156	simplify type promotion - remove weak types (#2730 )	2023-12-12 16:12:57 -05:00
chenyu	ef6e942a23	dtype promotion helpers (#2724 ) * dtype promotion helpers * better tests * space	2023-12-11 23:14:23 -05:00
Christopher Mauri Milan	0232db294d	fix tolist issue (#2723 )	2023-12-11 19:14:00 -08:00
chenyu	4075208127	some dtype creation spec test cases (#2722 )	2023-12-11 19:33:49 -05:00
Guy Leroy	ee9e1d3662	Extend available types for `safe_save` (#2720 ) * Extend available types to save with * Linter fix	2023-12-11 14:50:35 -08:00
qazal	a43bc78804	fix dtypes helpers for integers (#2716 ) * scalar * maybe do this instead * Revert "scalar" everything is a scalar * add tests in test_dtype * fuzz testing + fix unsigned ints * fuzz everything	2023-12-11 09:28:19 -08:00
chenyu	2ee6f689c5	simpler einsum (#2700 )	2023-12-10 21:24:44 -05:00
George Hotz	0fd44259cd	bf16 fix + cleanups from mixtral (#2698 ) * bf16 fix + cleanups from mixtral * generic bf16 cast	2023-12-10 16:31:52 -08:00
Davi Silva	7fbebb3df6	Implement einsum (#2686 ) * hopeful impl for Tensor.einsum * satisfy mypy by having less typing. :( * a few simple tests * even more tests * permute tests * xfails for improper usage * fix LLVM test fail * use argfix * more helpful error message on shape mismatch	2023-12-10 15:56:01 -08:00
chenyu	2d0e38e201	fix jit input_rawbuffers check wrt consts (#2689 ) * fix jit input_rawbuffers check wrt consts * .numpy()	2023-12-09 15:59:03 -05:00
geohotstan	67ff2b2b18	Formatted test_indexing (#2688 ) * added tensor.clone() for more correct cloning behavior * some work and randint issue * formatted * final cleanups * oops, bug fix	2023-12-09 11:38:36 -05:00
chenyu	0fb1d47aa0	two linearizer fuzzer failed test case for webgpu (#2685 ) * add a linearizer fuzzer failed for webgpu * CI specific	2023-12-08 22:52:34 -05:00
qazal	73b067f5ce	Bitcast p2 bfloat16 tests + clang fix (#2635 ) * add bf16 test support this model takes me almost a minute to download though: https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded/resolve/main/pytorch_model-00001-of-00014.bin?download=true: 100%\|█████████████████████████████\| 981M/981M [00:40<00:00, 24.2MB/s] * ensure we first load if it is bitcast to avoid taking the address of an rvalue * tiny bf16 in the cloud skip GPU * should skip torch lint * Revert "ensure we first load if it is bitcast to avoid taking the address of an rvalue" This reverts commit `b86a28ab84`. * break the kernel * skip LLVM and GPU in CI * skip CUDA	2023-12-08 10:30:10 -08:00
qazal	a29538a094	green more dtypes tests (#2656 ) * universal test cast * disable div * midcast fixup * add 64-bit types * hack maximum * use Metal precise::sin instead of default This is because the default sin function defaults to single-percision math: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#page=164 * LLVM code_for_op support for var_dtype * comment out maximum for now with a TODO explaining it * Revert "hack maximum" This reverts commit `d170048c5f`. * make the comment more specific * slightly more forgiving * ok does this fail in all backends? * weird its only Metal CI * add graph * skip sin of nan for CUDACPU This is only happening in the CUDACPU runtime and not CUDA itself. https://github.com/tinygrad/tinygrad/actions/runs/7128973726/job/19412000385#step:16:36 * METAL and CUDACPU behave differently in overflows with numpy running on CI * that skip is wrong * skip fp16 tests on LLVM similar to test_dtype original commit that skipped LLVM in CI `1826ff6b89` * remove all of sin from CUDACPU * limit range of values in CUDACPU and METAL CI * Revert "use Metal precise::sin instead of default" This reverts commit `d960094d4a`. * change atol and rtol for Metal sin * METAL CI is more imprecise * cleanup --------- Co-authored-by: George Hotz <geohot@gmail.com>	2023-12-08 10:29:20 -08:00
George Hotz	4164d0ebbd	multitensor start (#2676 ) * multitensor work * early gen fixes the tests * atol for flaky test	2023-12-07 17:07:05 -08:00
Ahmed Harmouche	4b01839774	support vals on WebGPU, run more tests (#2668 ) * Vals on webgpu, run more tests * Skip slow tests, run symbolic ops tests * Balance out tests	2023-12-07 16:45:21 -08:00
geohotstan	d02ff21f1a	enable test_index and test_advancedindex (#2648 ) * enable test_index and test_advancedindex with pretty diff * removed contig * created set_ helper function * comment change * del empty line --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2023-12-07 19:44:39 -05:00
George Hotz	00d9eda961	FROM -> COPY, move vars_from_ast (#2675 )	2023-12-07 16:32:30 -08:00
chenyu	51af99367f	fix fuzz_linearizer using new device Buffer (#2674 )	2023-12-07 19:21:47 -05:00
nimlgen	650117a8f6	split large jit into several graphs (#2650 ) * jit graph split * update * that's fine, not all buffers are there now * use logariphmic tho, seems good * no keep it simple * add test * simplify * split graph when jit item cannot be graphed	2023-12-07 10:58:25 -08:00
chenyu	fd21eced74	reduce gpt2 kernel count in test_real_world (#2663 )	2023-12-06 21:57:04 -05:00
chenyu	371005cb2d	use one kvcache tensor in gpt2 instead of two separate caches (#2662 ) * use one kvcache tensor in gpt2 * test case * is None * better test cases	2023-12-06 20:59:17 -05:00
George Hotz	5a7b2ff1b2	masked shapetrackers (#2657 )	2023-12-06 11:22:26 -08:00
chenyu	b931a20882	minor shapetracker cleanup (#2652 )	2023-12-06 11:43:52 -05:00
qazal	c704a77ca0	green dtypes ALU tests (#2617 ) * dtypes alu test * those types don't exist in torch * floats * more tests * disable those * a couple unary tests * skip float16 tests in CI for GPU * fix LLVM bool add True+True=1+1=2 which truncates to False in native LLVM * remove hardcoded float for LLVM ALU fns * less sensitive atol for fp32, 1e-10 is flaky and sometimes failed even if you revert the merge commit for non-fp32 math, nothing has changed in our kernels for fp32. * return on overflows * fix CUDA exp2 * compute results of op regardless of bounds in a python backend * skip fp16 in GPU and CUDACPU * fuzz a smaller range in the float_midcast_int32 test I sampled this and we overflow ~70% of the time. because numpy behaves differently on different devices for overflows and Metal seems to do the same, I'm opting to eliminate the non-determinism here * remove CUDA exp2 overload it's already there now --------- Co-authored-by: George Hotz <geohot@gmail.com>	2023-12-06 08:15:46 -08:00
Amrit Sahu	71d989b476	adding test to cover #2644 failure (#2645 )	2023-12-06 11:00:30 -05:00
Ahmed Harmouche	50dcd532d5	Get all WEBGPU test_ops passing (#2646 ) * Get all WEBGPU tests passing * Custom render store is not needed in wgsl	2023-12-06 07:40:37 -08:00

1137 Commits