tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-26 15:28:10 -05:00

Author	SHA1	Message	Date
Umut Zengin	8ad7cfeeb1	More simplification in to_image_idx and symbolic (#2679) * less valid * add test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2023-12-13 12:30:44 -05:00
Ahmed Harmouche	e7248b677c	Remove wgsl custom render_for (#2729) * Generic for * remove custom render_if * Simplify for loop * 150 line-length constraint * Put custom render_if back	2023-12-13 09:04:17 -08:00
tomtom-95	6b0f07e94a	add decorator to preserve info about original function (#2743)	2023-12-13 09:03:50 -08:00
chenyu	aa4a0de287	simpler Tensor.pow to integer (#2746)	2023-12-13 11:39:20 -05:00
chenyu	26f49869f4	minor tensor type annotation and cleanup (#2742)	2023-12-13 01:53:59 -05:00
chenyu	2ef33abd20	some unary functions cast int input into float (#2740) * some unary functions cast int input into float * precision * image dtype	2023-12-13 00:10:29 -05:00
George Hotz	3e778fcc52	hotfix: ***	2023-12-12 19:44:31 -08:00
Shawn Hagler	51afe938f1	update onnx model links (#2737)	2023-12-12 19:11:11 -08:00
George Hotz	431fae5ed3	hotfix: update_stats cleanup, yellow is nicer than red	2023-12-12 17:50:22 -08:00
chenyu	0869e7a301	update onnx benchmark urls (#2735) onnx is remapping the models, old ones are in archive/	2023-12-12 20:46:01 -05:00
George Hotz	6d6eb9302d	ruff checks the max line length is 150 (#2734) * ruff checks the max line length is 150 * fix tensor.py * a lot more * done	2023-12-12 17:34:47 -08:00
George Hotz	3635540ddb	shorter line (#2733)	2023-12-12 15:34:17 -08:00
nimlgen	ede7971ada	save some lines (#2731) * remove unused mem_cached var * one more	2023-12-12 15:26:27 -08:00
chenyu	00b611c156	simplify type promotion - remove weak types (#2730)	2023-12-12 16:12:57 -05:00
Nguyen Nguyen Phuong	07cf45e133	fix cuda matmul (#2725)	2023-12-12 07:59:31 -08:00
chenyu	ef6e942a23	dtype promotion helpers (#2724) * dtype promotion helpers * better tests * space	2023-12-11 23:14:23 -05:00
Christopher Mauri Milan	0232db294d	fix tolist issue (#2723)	2023-12-11 19:14:00 -08:00
chenyu	4075208127	some dtype creation spec test cases (#2722)	2023-12-11 19:33:49 -05:00
Guy Leroy	ee9e1d3662	Extend available types for `safe_save` (#2720) * Extend available types to save with * Linter fix	2023-12-11 14:50:35 -08:00
George Hotz	b5fd160b39	hotfix: increase rtol on simple_matmul	2023-12-11 10:10:29 -08:00
Gregor Kikelj	4feaaa27aa	ensure shrink is valid (#2717)	2023-12-11 09:58:43 -08:00
qazal	a43bc78804	fix dtypes helpers for integers (#2716) * scalar * maybe do this instead * Revert "scalar" everything is a scalar * add tests in test_dtype * fuzz testing + fix unsigned ints * fuzz everything	2023-12-11 09:28:19 -08:00
nimlgen	bc3c4ce50b	cuda set context before sync (#2715) * cuda set context before sync * no helper	2023-12-11 09:26:53 -08:00
Ivan Vnučec	8d206f6bfd	fix help message (#2705) llama -> mixtral	2023-12-10 22:04:35 -08:00
George Hotz	59ab3675a3	faster mixtral + green for new kernels (#2701) * green for new kernels * track ram	2023-12-10 19:04:58 -08:00
chenyu	2ee6f689c5	simpler einsum (#2700)	2023-12-10 21:24:44 -05:00
George Hotz	b01e3907a1	mixtral touch up: two lines	2023-12-10 17:21:49 -08:00
George Hotz	b3982187d1	Mixtral Example (#2691) * mixtral * simpler * global counters * simpler * weights arg	2023-12-10 17:18:31 -08:00
George Hotz	0fd44259cd	bf16 fix + cleanups from mixtral (#2698) * bf16 fix + cleanups from mixtral * generic bf16 cast	2023-12-10 16:31:52 -08:00
Davi Silva	7fbebb3df6	Implement einsum (#2686) * hopeful impl for Tensor.einsum * satisfy mypy by having less typing. :( * a few simple tests * even more tests * permute tests * xfails for improper usage * fix LLVM test fail * use argfix * more helpful error message on shape mismatch	2023-12-10 15:56:01 -08:00
chenyu	181b0970b5	slightly better extra/to_movement_ops dedups (#2695)	2023-12-10 11:05:44 -05:00
chenyu	ef18d79faa	remove noop from to_movement_ops (#2693)	2023-12-10 00:50:24 -05:00
chenyu	2d0e38e201	fix jit input_rawbuffers check wrt consts (#2689) * fix jit input_rawbuffers check wrt consts * .numpy()	2023-12-09 15:59:03 -05:00
geohotstan	67ff2b2b18	Formatted test_indexing (#2688) * added tensor.clone() for more correct cloning behavior * some work and randint issue * formatted * final cleanups * oops, bug fix	2023-12-09 11:38:36 -05:00
chenyu	1e7823e1f5	combine GROUP and GROUPTOP to a single block (#2687)	2023-12-09 01:19:32 -05:00
chenyu	0fb1d47aa0	two linearizer fuzzer failed test case for webgpu (#2685) * add a linearizer fuzzer failed for webgpu * CI specific	2023-12-08 22:52:34 -05:00
chenyu	fae5394845	validate llama output (#2681) * validate llama output * does not work with quantize	2023-12-08 16:42:01 -05:00
nickovaras	182d067407	Update yolov3.py (#2680) The current yolov3 example is broken with the current implementation of fetch in the helpers. I was tempted to fix the helpers instead but that could have just as well broken other examples.	2023-12-08 12:59:38 -08:00
qazal	73b067f5ce	Bitcast p2 bfloat16 tests + clang fix (#2635) * add bf16 test support this model takes me almost a minute to download though: https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded/resolve/main/pytorch_model-00001-of-00014.bin?download=true: 100%|█████████████████████████████| 981M/981M [00:40<00:00, 24.2MB/s] * ensure we first load if it is bitcast to avoid taking the address of an rvalue * tiny bf16 in the cloud skip GPU * should skip torch lint * Revert "ensure we first load if it is bitcast to avoid taking the address of an rvalue" This reverts commit `b86a28ab84`. * break the kernel * skip LLVM and GPU in CI * skip CUDA	2023-12-08 10:30:10 -08:00
qazal	a29538a094	green more dtypes tests (#2656) * universal test cast * disable div * midcast fixup * add 64-bit types * hack maximum * use Metal precise::sin instead of default This is because the default sin function defaults to single-precision math: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#page=164 * LLVM code_for_op support for var_dtype * comment out maximum for now with a TODO explaining it * Revert "hack maximum" This reverts commit `d170048c5f`. * make the comment more specific * slightly more forgiving * ok does this fail in all backends? * weird its only Metal CI * add graph * skip sin of nan for CUDACPU This is only happening in the CUDACPU runtime and not CUDA itself. https://github.com/tinygrad/tinygrad/actions/runs/7128973726/job/19412000385#step:16:36 * METAL and CUDACPU behave differently in overflows with numpy running on CI * that skip is wrong * skip fp16 tests on LLVM similar to test_dtype original commit that skipped LLVM in CI `1826ff6b89` * remove all of sin from CUDACPU * limit range of values in CUDACPU and METAL CI * Revert "use Metal precise::sin instead of default" This reverts commit `d960094d4a`. * change atol and rtol for Metal sin * METAL CI is more imprecise * cleanup --------- Co-authored-by: George Hotz <geohot@gmail.com>	2023-12-08 10:29:20 -08:00
George Hotz	4164d0ebbd	multitensor start (#2676) * multitensor work * early gen fixes the tests * atol for flaky test	2023-12-07 17:07:05 -08:00
Ahmed Harmouche	4b01839774	support vals on WebGPU, run more tests (#2668) * Vals on webgpu, run more tests * Skip slow tests, run symbolic ops tests * Balance out tests	2023-12-07 16:45:21 -08:00
geohotstan	d02ff21f1a	enable test_index and test_advancedindex (#2648) * enable test_index and test_advancedindex with pretty diff * removed contig * created set_ helper function * comment change * del empty line --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2023-12-07 19:44:39 -05:00
George Hotz	00d9eda961	FROM -> COPY, move vars_from_ast (#2675)	2023-12-07 16:32:30 -08:00
chenyu	51af99367f	fix fuzz_linearizer using new device Buffer (#2674)	2023-12-07 19:21:47 -05:00
nimlgen	650117a8f6	split large jit into several graphs (#2650) * jit graph split * update * that's fine, not all buffers are there now * use logarithmic tho, seems good * no keep it simple * add test * simplify * split graph when jit item cannot be graphed	2023-12-07 10:58:25 -08:00
qazal	29f2653d8d	add graph (#2670)	2023-12-07 10:53:31 -08:00
chenyu	539b00a645	move llama getenv("JIT") from models to examples (#2671) Transformer class has a jit param so we should use that in the caller	2023-12-07 12:43:22 -05:00
chenyu	fd21eced74	reduce gpt2 kernel count in test_real_world (#2663)	2023-12-06 21:57:04 -05:00
chenyu	371005cb2d	use one kvcache tensor in gpt2 instead of two separate caches (#2662) * use one kvcache tensor in gpt2 * test case * is None * better test cases	2023-12-06 20:59:17 -05:00

10417 Commits