Commit Graph

4618 Commits

chenyu
2ef33abd20 some unary functions cast int input into float (#2740)
* some unary functions cast int input into float

* precision

* image dtype
2023-12-13 00:10:29 -05:00
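
A hedged sketch of the behavior #2740 describes, assuming the import paths of this era (dtypes lived in tinygrad.helpers before later moving to tinygrad.dtype):

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumption: era-specific import path

t = Tensor([1, 2, 3], dtype=dtypes.int32)
# float-only unary functions (exp, log, sin, sqrt, ...) cast integer inputs
# to a float dtype instead of computing in integer math
print(t.exp().dtype)  # expected: a float dtype such as dtypes.float32
```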
Shawn Hagler
51afe938f1 update onnx model links (#2737) 2023-12-12 19:11:11 -08:00
chenyu
0869e7a301 update onnx benchmark urls (#2735)
onnx is remapping the models, old ones are in archive/
2023-12-12 20:46:01 -05:00
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
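
For reference, ruff's line-length cap lives in the project config; a minimal sketch of the relevant stanza (the repo's actual config file and surrounding settings may differ):

```toml
[tool.ruff]
line-length = 150
```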
chenyu
00b611c156 simplify type promotion - remove weak types (#2730) 2023-12-12 16:12:57 -05:00
chenyu
ef6e942a23 dtype promotion helpers (#2724)
* dtype promotion helpers

* better tests

* space
2023-12-11 23:14:23 -05:00
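
A small illustration of the promotion rules these helpers back (#2724, with weak types removed in #2730); treat the exact promotion table as an assumption:

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumption: era-specific import path

a = Tensor([1, 2], dtype=dtypes.int32)
b = Tensor([1.0, 2.0], dtype=dtypes.float32)
# mixing int32 and float32 promotes to their least upper dtype, float32
print((a + b).dtype)
```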
Christopher Mauri Milan
0232db294d fix tolist issue (#2723) 2023-12-11 19:14:00 -08:00
chenyu
4075208127 some dtype creation spec test cases (#2722) 2023-12-11 19:33:49 -05:00
Guy Leroy
ee9e1d3662 Extend available types for safe_save (#2720)
* Extend available types to save with

* Linter fix
2023-12-11 14:50:35 -08:00
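
A minimal usage sketch of the safetensors helpers #2720 extends (safe_save/safe_load are tinygrad's API in tinygrad.nn.state; the PR widens the set of dtypes they accept):

```python
from tinygrad.tensor import Tensor
from tinygrad.nn.state import safe_save, safe_load

state = {"weight": Tensor([[1.0, 2.0], [3.0, 4.0]])}
safe_save(state, "/tmp/model.safetensors")    # write a safetensors file
loaded = safe_load("/tmp/model.safetensors")  # dict of name -> Tensor
print(loaded["weight"].numpy())
```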
qazal
a43bc78804 fix dtypes helpers for integers (#2716)
* scalar

* maybe do this instead

* Revert "scalar"

everything is a scalar

* add tests in test_dtype

* fuzz testing + fix unsigned ints

* fuzz everything
2023-12-11 09:28:19 -08:00
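
The helpers in question classify dtypes; a hedged sketch of their use (is_int/is_unsigned/is_float are real dtypes helpers, though their exact behavior at this commit is what the PR fixes):

```python
from tinygrad.helpers import dtypes  # assumption: era-specific import path

for d in (dtypes.int8, dtypes.uint32, dtypes.float16):
    # classify each dtype: integer? unsigned integer? floating point?
    print(d, dtypes.is_int(d), dtypes.is_unsigned(d), dtypes.is_float(d))
```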
chenyu
2ee6f689c5 simpler einsum (#2700) 2023-12-10 21:24:44 -05:00
George Hotz
0fd44259cd bf16 fix + cleanups from mixtral (#2698)
* bf16 fix + cleanups from mixtral

* generic bf16 cast
2023-12-10 16:31:52 -08:00
Davi Silva
7fbebb3df6 Implement einsum (#2686)
* hopeful impl for Tensor.einsum

* satisfy mypy by having less typing. :(

* a few simple tests

* even more tests

* permute tests

* xfails for improper usage

* fix LLVM test fail

* use argfix

* more helpful error message on shape mismatch
2023-12-10 15:56:01 -08:00
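
A usage sketch of the Tensor.einsum API #2686 adds (a subscript formula plus operand tensors, numpy-style; details beyond that are assumptions):

```python
from tinygrad.tensor import Tensor

a = Tensor([[1.0, 2.0], [3.0, 4.0]])
b = Tensor([[5.0, 6.0], [7.0, 8.0]])

# matrix multiply: contract over the repeated index k
print(Tensor.einsum("ik,kj->ij", a, b).numpy())
# transpose: permute the output subscripts
print(Tensor.einsum("ij->ji", a).numpy())
```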
chenyu
2d0e38e201 fix jit input_rawbuffers check wrt consts (#2689)
* fix jit input_rawbuffers check wrt consts

* .numpy()
2023-12-09 15:59:03 -05:00
geohotstan
67ff2b2b18 Formatted test_indexing (#2688)
* added tensor.clone() for more correct cloning behavior

* some work and randint issue

* formatted

* final cleanups

* oops, bug fix
2023-12-09 11:38:36 -05:00
chenyu
0fb1d47aa0 two linearizer fuzzer failed test case for webgpu (#2685)
* add a linearizer fuzzer failed for webgpu

* CI specific
2023-12-08 22:52:34 -05:00
qazal
73b067f5ce Bitcast p2 bfloat16 tests + clang fix (#2635)
* add bf16 test support

this model takes me almost a minute to download though:

https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded/resolve/main/pytorch_model-00001-of-00014.bin?download=true: 100%|█████████████████████████████| 981M/981M [00:40<00:00, 24.2MB/s]

* ensure we first load if it is bitcast to avoid taking the address of an rvalue

* tiny bf16 in the cloud

skip GPU

* should skip torch

lint

* Revert "ensure we first load if it is bitcast to avoid taking the address of an rvalue"

This reverts commit b86a28ab84.

* break the kernel

* skip LLVM and GPU in CI

* skip CUDA
2023-12-08 10:30:10 -08:00
qazal
a29538a094 green more dtypes tests (#2656)
* universal test cast

* disable div

* midcast fixup

* add 64-bit types

* hack maximum

* use Metal precise::sin instead of default

This is because the default sin function uses single-precision math: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#page=164

* LLVM code_for_op support for var_dtype

* comment out maximum for now with a TODO explaining it

* Revert "hack maximum"

This reverts commit d170048c5f.

* make the comment more specific

* slightly more forgiving

* ok does this fail in all backends?

* weird its only Metal CI

* add graph

* skip sin of nan for CUDACPU

This is only happening in the CUDACPU runtime and not CUDA itself. https://github.com/tinygrad/tinygrad/actions/runs/7128973726/job/19412000385#step:16:36

* METAL and CUDACPU behave differently in overflows with numpy running on CI

* that skip is wrong

* skip fp16 tests on LLVM similar to test_dtype

original commit that skipped LLVM in CI 1826ff6b89

* remove all of sin from CUDACPU

* limit range of values in CUDACPU and METAL CI

* Revert "use Metal precise::sin instead of default"

This reverts commit d960094d4a.

* change atol and rtol for Metal sin

* METAL CI is more imprecise

* cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-12-08 10:29:20 -08:00
George Hotz
4164d0ebbd multitensor start (#2676)
* multitensor work

* early gen fixes the tests

* atol for flaky test
2023-12-07 17:07:05 -08:00
Ahmed Harmouche
4b01839774 support vals on WebGPU, run more tests (#2668)
* Vals on webgpu, run more tests

* Skip slow tests, run symbolic ops tests

* Balance out tests
2023-12-07 16:45:21 -08:00
geohotstan
d02ff21f1a enable test_index and test_advancedindex (#2648)
* enable test_index and test_advancedindex with pretty diff

* removed contig

* created set_ helper function

* comment change

* del empty line

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-07 19:44:39 -05:00
George Hotz
00d9eda961 FROM -> COPY, move vars_from_ast (#2675) 2023-12-07 16:32:30 -08:00
chenyu
51af99367f fix fuzz_linearizer using new device Buffer (#2674) 2023-12-07 19:21:47 -05:00
nimlgen
650117a8f6 split large jit into several graphs (#2650)
* jit graph split

* update

* that's fine, not all buffers are there now

* use logarithmic tho, seems good

* no keep it simple

* add test

* simplify

* split graph when jit item cannot be graphed
2023-12-07 10:58:25 -08:00
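
For context, a minimal TinyJit usage sketch (the import path is era-specific; the graph splitting #2650 adds happens inside the JIT, invisible to this API):

```python
from tinygrad.tensor import Tensor
from tinygrad.jit import TinyJit  # assumption: era-specific import path

@TinyJit
def step(x: Tensor) -> Tensor:
    # after warm-up, the kernels launched here are captured and replayed;
    # #2650 splits very large captures into several graphs internally
    return (x * 2).sum().realize()

for _ in range(3):  # early calls warm up and capture, later calls replay
    print(step(Tensor.ones(4)).numpy())
```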
chenyu
fd21eced74 reduce gpt2 kernel count in test_real_world (#2663) 2023-12-06 21:57:04 -05:00
chenyu
371005cb2d use one kvcache tensor in gpt2 instead of two separate caches (#2662)
* use one kvcache tensor in gpt2

* test case

* is None

* better test cases
2023-12-06 20:59:17 -05:00
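
The shape of the idea in #2662, with hypothetical dimensions: one tensor whose leading axis of size 2 holds k and v, instead of two cache tensors. tinygrad's gpt2 example manages its cache differently (preallocation and assignment), so this is only a sketch:

```python
from tinygrad.tensor import Tensor

def append_kv(cache, k, v):
    # axis 0 of the single cache holds k (index 0) and v (index 1)
    step = k.unsqueeze(0).cat(v.unsqueeze(0), dim=0)  # (2, batch, heads, 1, head_dim)
    return step if cache is None else cache.cat(step, dim=3)

cache = None
for _ in range(3):
    k, v = Tensor.randn(1, 12, 1, 64), Tensor.randn(1, 12, 1, 64)
    cache = append_kv(cache, k, v)
keys, values = cache[0], cache[1]  # both views share the one cache tensor
print(cache.shape)  # (2, 1, 12, 3, 64)
```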
George Hotz
5a7b2ff1b2 masked shapetrackers (#2657) 2023-12-06 11:22:26 -08:00
chenyu
b931a20882 minor shapetracker cleanup (#2652) 2023-12-06 11:43:52 -05:00
qazal
c704a77ca0 green dtypes ALU tests (#2617)
* dtypes alu test

* those types don't exist in torch

* floats

* more tests

* disable those

* a couple unary tests

* skip float16 tests in CI for GPU

* fix LLVM bool add True+True=1+1=2 which truncates to False in native LLVM

* remove hardcoded float for LLVM ALU fns

* less sensitive atol for fp32; 1e-10 is flaky and sometimes failed even after reverting the merge commit for non-fp32 math, since nothing has changed in our kernels for fp32.

* return on overflows

* fix CUDA exp2

* compute results of op regardless of bounds in a python backend

* skip fp16 in GPU and CUDACPU

* fuzz a smaller range in the float_midcast_int32 test

I sampled this and we overflow ~70% of the time.
Because numpy behaves differently on different devices for overflows, and Metal seems to do the same, I'm opting to eliminate the non-determinism here

* remove CUDA exp2 overload it's already there now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-12-06 08:15:46 -08:00
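
The LLVM bool fix called out above is the classic 1-bit truncation pitfall; a plain-Python/numpy illustration of the two semantics (not tinygrad code):

```python
import numpy as np

print(np.bool_(True) + np.bool_(True))  # saturating bool add: True
print(bool((1 + 1) & 1))  # add in an int register, truncate to 1 bit: False
```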
Amrit Sahu
71d989b476 adding test to cover #2644 failure (#2645) 2023-12-06 11:00:30 -05:00
Ahmed Harmouche
50dcd532d5 Get all WEBGPU test_ops passing (#2646)
* Get all WEBGPU tests passing

* Custom render store is not needed in wgsl
2023-12-06 07:40:37 -08:00
qazal
be09cc87c1 Bitcast support / fast bf16 load (#2011)
* bitcast renderers

* fast llama load

* make it one kernel

* regression testing p1: re-enable test_dtype for all backends

fix GPU

* regression testing p2: fuzz all possible cases against numpy

remove hardcoded tests since the fuzzer covers them

* define ushort

* fix indent, probably need flake8 back for CI to catch

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-05 16:19:28 -08:00
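
The fast bf16 load rests on a standard trick: a bfloat16 is the top 16 bits of a float32, so the cast is a 16-bit shift plus a bitcast. A numpy sketch of the idea (not tinygrad's actual kernel):

```python
import numpy as np

def bf16_to_f32(raw: np.ndarray) -> np.ndarray:
    # raw holds bfloat16 bit patterns as uint16; shift them into the high
    # half of a uint32, then reinterpret the bits as float32
    return (raw.astype(np.uint32) << 16).view(np.float32)

bits = np.array([0x3F80, 0x4000, 0xC040], dtype=np.uint16)  # 1.0, 2.0, -3.0
print(bf16_to_f32(bits))  # [ 1.  2. -3.]
```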
George Hotz
232ed2af3f more test cleanups (#2631)
* more test cleanups

* move test example back
2023-12-05 16:17:57 -08:00
wozeparrot
6d58c19736 binaryops xor (#2627)
* feat: initial xor

* feat: numpy xor

* feat: llvm xor

* feat: quick test for xor

* feat: slightly working xor in torch

* feat: xor in tensor

* feat: slightly better test
2023-12-05 13:21:42 -08:00
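
A usage sketch of the xor op this PR wires through the backends (elementwise ^ on integer tensors; per-backend availability at this commit is an assumption):

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumption: era-specific import path

a = Tensor([0b1100, 0b1010], dtype=dtypes.int32)
b = Tensor([0b1010, 0b0110], dtype=dtypes.int32)
print((a ^ b).numpy())  # elementwise bitwise xor: [6, 12]
```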
George Hotz
c53e854687 cast image doesn't work on nvidia (#2626)
* cast image doesn't work on nvidia

* hmm, interpreted backends use buffer size 0

* fix type

* no lru
2023-12-05 12:48:19 -08:00
George Hotz
8c67eb1c92 GPT bugfixes (#2624)
* simple fixes

* fix exp2

* fixed

* parallel beam for CUDA

* fix image dtypes
2023-12-05 11:42:28 -08:00
chenyu
8903a40541 update the onnx test so cuda local run passes (#2623) 2023-12-05 14:04:17 -05:00
George Hotz
35b5e95097 parallel beam search (#2610)
* better print

* fix beam search with vars

* cleanups

* parallel is not default

* restore that

* bugfix

* cleanups

* bugfix
2023-12-05 10:09:45 -08:00
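
Beam search is driven by the BEAM environment variable (the beam width); a hedged sketch of invoking it, noting that whether the search itself runs in parallel is internal to this PR:

```python
import os
os.environ["BEAM"] = "2"  # beam width; unset or 0 disables the search

from tinygrad.tensor import Tensor

# with BEAM set, each newly compiled kernel is tuned by beam search
out = (Tensor.randn(256, 256) @ Tensor.randn(256, 256)).realize()
```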
chenyu
dd8b4632a4 regression test for reshape fix #2616 (#2620) 2023-12-05 11:46:33 -05:00
chenyu
c257a0dd99 minor reshape cleanups (#2619)
* minor reshape cleanups

* mea culpa
2023-12-05 11:23:17 -05:00
geohotstan
fc00da538d helper functions for test_indexing.py (#2615)
* add some helpers

* I think it should all work..

* fixed get_set_tensor

* done

* del import

* bye bye typing

* style

* remove empty lines lol

* deleted dtype arg

* del trailing space
2023-12-05 02:00:41 -05:00
chenyu
7322ab8dfd onnx tests with different dtypes (#2612) 2023-12-05 00:04:08 -05:00
geohotstan
f12bcccb87 [ready] refactor getitem round 2 :D (#2568)
* new getitem

* go

* add temporary simple tests

* better

* comments

* WOW that took a while

* save 1 line lol

* work

* still need to add comprehensive tests, but i think getitem looks nice :D

* GIMME GREEN CI CHECKMARK PLS

* try..

* k idk

* added tests for errors

* fixed small hack

* added tests

* almost good

* try no contig?

* yay no more contig + comments and spacing

* finishing touches (comments)

* revert regex unittests lol

* add suggested change

* oops I fell asleep yesterday
2023-12-04 22:36:32 -05:00
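
A few of the getitem forms the refactor covers, as a usage sketch (standard Tensor indexing; the exact combinations supported at this commit are assumptions):

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(12).reshape(3, 4)
print(t[1].numpy())               # int index: row 1
print(t[:, 2].numpy())            # slice: column 2
print(t[Tensor([0, 2])].numpy())  # tensor (advanced) indexing: rows 0 and 2
```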
George Hotz
09b6e254a3 hip compile speed (#2606) 2023-12-04 13:47:40 -08:00
Amrit Sahu
e8d6a6ef2e view.reshape without symbolic (#2218)
* handle reshape of contiguous subparts with explicit mask

* remove the add/remove ones logic in reshape

* accommodate ones in accumulate logic

* make multiply commutative

* fix linting

* make mypy happy

* add test for commutative mul

* merge dimensions in shape_strides for 1 range masks

* add offsets for merging

* fix linting

* add back explicit 1 reshapes

* fix mypy errors

* fix accumulate by including state

* include non-zero stride dimension in acc

* small cleanup

* more compact to_shape_strides

* more logical cleanup

* compress more

* compress reshape mask

* adding some comments

* small bug fix

* improve test coverage

* remove explicit add remove ones

* small bug in test

* enable test_reshape_splitting_combining

* small fix

* 10 lines less to_shape_strides

* shorten reshape mask

* some more cleanup

* more cleanup

* introduce some symbols for compactness

* more symbols

* more cleaner

* lessen symbols, it became less readable

* remove merge_views from view.reshape

* change to_shape_strides to _merge_dims

* improve readability

* fix corner case

* cleanup

* better handling of 1 <= Variable('i',1,10) & new_dim = Variable('i',1,10)

* rewrite _reshape_mask for readability

* fix white space

* add comment

* nice shorthands for readability

* add proof in docs

* small nit

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-04 12:46:53 -05:00
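
For intuition, the underlying question in view.reshape is whether a strided view can be re-expressed for a new shape without materializing the data; numpy exhibits the same dichotomy (an analogy only, not tinygrad's algorithm, which instead builds a mask-aware view where possible):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
v = a.reshape(6, 4)            # merging contiguous dims: strides permit a view
print(np.shares_memory(a, v))  # True

t = a.transpose(1, 0, 2)       # non-contiguous view
w = t.reshape(6, 4)            # no stride pattern fits, so numpy copies
print(np.shares_memory(a, w))  # False
```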
George Hotz
664475f247 vals is an argument (#2599)
* vals is an argument

* don't even know how that's legal python
2023-12-03 21:50:43 -08:00
George Hotz
fcd0b2ee6c fix multigpu on tinybox (#2595)
* fix multigpu on tinybox

* fixed multigpu
2023-12-03 16:48:07 -08:00
George Hotz
61c0113928 test external_multi_gpu.py (and works in CUDA) 2023-12-03 15:57:13 -08:00
George Hotz
bbeba8ec85 use default dict for external_model_benchmark (#2592)
* device default

* Device.DEFAULT

* half max for cuda

* CUDA_INCLUDE_PATH

* closer to working

* cuda fixups

* Update ops_cuda.py
2023-12-03 15:25:43 -08:00
chenyu
550817389a enable test_sample for all backend (#2593) 2023-12-03 17:20:27 -05:00