tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-23 13:58:00 -05:00

Author	SHA1	Message	Date
George Hotz	8c67eb1c92	GPT bugfixes (#2624 ) * simple fixes * fix exp2 * fixed * parallel beam for CUDA * fix image dtypes	2023-12-05 11:42:28 -08:00
chenyu	8903a40541	update the onnx test so cuda local run passes (#2623 )	2023-12-05 14:04:17 -05:00
George Hotz	35b5e95097	parallel beam search (#2610 ) * better print * fix beam search with vars * cleanups * parallel is not default * restore that * bugfix * cleanups * bugfix	2023-12-05 10:09:45 -08:00
chenyu	dd8b4632a4	regression test for reshape fix #2616 (#2620 )	2023-12-05 11:46:33 -05:00
chenyu	c257a0dd99	minor reshape cleanups (#2619 ) * minor reshape cleanups * mea culpa	2023-12-05 11:23:17 -05:00
geohotstan	fc00da538d	helper functions for test_indexing.py (#2615 ) * add some helpers * I think it should all work.. * fixed get_set_tensor * done * del import * bye bye typing * style * remove empty lines lol * deleted dtype arg * del trailing space	2023-12-05 02:00:41 -05:00
chenyu	7322ab8dfd	onnx tests with different dtypes (#2612 )	2023-12-05 00:04:08 -05:00
geohotstan	f12bcccb87	[ready] refactor getitem round 2 :D (#2568 ) * new getitem * go * add temporary simple tests * better * comments * WOW that took awhile * save 1 line lol * work * still need to add comprehensive tests, but i think getitem looks nice :D * GIMME GREEN CI CHECKMARK PLS * try.. * k idk * added tests for errors * fixed small hack * added tests * almost good * try no contig? * yay no more contig + comments and spacing * finishing touches (comments) * revert regex unittests lol * add suggested change * oops I fell asleep yesterday	2023-12-04 22:36:32 -05:00
George Hotz	09b6e254a3	hip compile speed (#2606 )	2023-12-04 13:47:40 -08:00
Amrit Sahu	e8d6a6ef2e	view.reshape without symbolic (#2218 ) * handle reshape of contiguous subparts with explicit mask * remove the add/remove ones logic in reshape * accommodate ones in accumulate logic * make multiply commutative * fix linting * make mypy happy * add test for commutative mul * merge dimensions in shape_strides for 1 range masks * add offsets for merging * fix linting * add back explicit 1 reshapes * fix mypy errors * fix accumulate by including state * include non-zero stride dimension in acc * small cleanup * more compact to_shape_strides * more logical cleanup * compress more * compress reshape mask * adding some comments * small bug fix * improve test coverage * remove explicit add remove ones * small bug in test * enable test_reshape_splitting_combining * small fix * 10 lines less to_shape_strides * shorten reshape mask * some more cleanup * more cleanup * introduce some symbols for compactness * more symbols * more cleaner * lessen symbols, it became less readable * remove merge_views from view.reshape * change to_shape_strides to _merge_dims * improve readability * fix corner case * cleanup * better handling of 1 <= Variable('i',1,10) & new_dim = Variable('i',1,10) * rewrite _reshape_mask for readability * fix white space * add comment * nice shorthands for readability * add proof in docs * small nit --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2023-12-04 12:46:53 -05:00
George Hotz	664475f247	vals is an argument (#2599 ) * vals is an argument * don't even know how that's legal python	2023-12-03 21:50:43 -08:00
George Hotz	fcd0b2ee6c	fix multigpu on tinybox (#2595 ) * fix multigpu on tinybox * fixed multigpu	2023-12-03 16:48:07 -08:00
George Hotz	61c0113928	test external_multi_gpu.py (and works in CUDA)	2023-12-03 15:57:13 -08:00
George Hotz	bbeba8ec85	use default dict for external_model_benchmark (#2592 ) * device default * Device.DEFAULT * half max for cuda * CUDA_INCLUDE_PATH * closer to working * cuda fixups * Update ops_cuda.py	2023-12-03 15:25:43 -08:00
chenyu	550817389a	enable test_sample for all backends (#2593 )	2023-12-03 17:20:27 -05:00
qazal	4380ccb169	Non fp32 math (#2264 ) * `global_load` and `global_store` using buffer dtype * `UOps.PHI` in all dtypes * `UOps.ALU` in all dtypes * `UOps.CONST` & `UOps.DEFINE_ACC` in all dtypes * -- endof implementation -- +tiny lint changes * these tests require the fp16 extension you can run them locally to confirm they're green: (GPT2 test is broken in master for mac, see [this](https://discord.com/channels/1068976834382925865/1069001075828469790/1177993277958533261) `GPU=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_dequantizelinear_e4m3fn_float16_cpu test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_max_float16_cpu test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_min_float16_cpu test/models/test_real_world.py::TestRealWorld::test_llama test/models/test_real_world.py::TestRealWorld::test_gpt2 test/models/test_whisper.py test/test_specific_conv.py::TestSpecific::test_big_vec_mul` skip the new test_linearizer_failures in CI GPU because of the fp16 extension This passes on a real GPU since the extension is available: `GPU=1 python3 -m pytest test/test_linearizer_failures.py::TestLinearizerFailures::test_failure_8` see CI logs [here](https://github.com/tinygrad/tinygrad/actions/runs/6996590597/job/19032641427#step:14:644) * these tests fail in CI due to segfaults and CPU crashes To confirm they're green locally, you can run the following commands: 1. For the tests skipped in test_ops.py (note: CLANG is very slow) `for var in GPU CUDA CLANG; do export $var=1; for test in test/test_ops.py::TestOps::test_slice_fancy_indexing_no_dim_collapse test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_collapse_int test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_inject_none test/test_ops.py::TestOps::test_slice_fancy_indexing_dim_inject_and_collapse; do python3 -m pytest $test; done; unset $var; done` 2. 
For the ONNX tests skipped in CLANG: ``` CLANG=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_ai_onnx_ml_array_feature_extractor_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_0_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_3d_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_1_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1_mean_weight_negative_ii_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_weight_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3_none_no_weight_negative_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_4d_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_3d_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_gather_elements_negative_indices_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1d2d3d4d5_mean_weight_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1_mean_weight_negative_ii_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_no_weight_reduction_mean_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_NCd1d2d3d4d5_mean_weight_cpu \ 
test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3d4d5_mean_weight_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_mean_weight_negative_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_sce_mean_weight_ii_4d_log_prob_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_mean_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1_weight_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_sum_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_reduction_sum_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_reduction_sum_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3d4d5_none_no_weight_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2d3_sum_weight_high_ii_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_reduction_mean_expanded_cpu \ test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_nllloss_NCd1d2_with_weight_expanded_cpu ``` 3. 
The LLVM test I skipped here is already [skipped in master for all backends](https://github.com/tinygrad/tinygrad/blob/master/test/external/external_test_onnx_backend.py#L186), I just made it more specific `LLVM=1 python3 -m pytest test/external/external_test_onnx_backend.py::OnnxBackendNodeModelTest::test_dequantizelinear_e4m3fn_float16_cpu` * Revert "these tests fail in CI due to segfaults and CPU crashes" This reverts commit `15db570143`. * merge with cleanup-vectorized-hip-renders * barely working HIP P1, ALU ops need a refactor? * manage the fact that in HIP [half2 is actually an unsigned int vec](`f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L59)`) and half is a totally different __half that [has an unsigned int element in it](`f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L50)`) but can't be accessed [because it's private](`f921880387/hip/include/hip/amd_detail/amd_hip_fp16.h (L86)`). If you just do this: ``` half2 val0 = // ... half val1 = // ... ``` then you can't do: ``` val0.x + val1 // error: use of overloaded operator '+' is ambiguous (with operand types 'unsigned short' and 'half' (aka '__half')) ``` * update the sign definition to avoid division by zero in all dtypes * diff cleanup p1: why were these in the diff anyways * less hacky HIP, enable CIFAR fp16 benchmark, test ops for HIP in CI! add ALU ops overloads for HIP this will make HIP max work handle mod Revert "handle mod" This reverts commit 370fd4b3fbe99b6ae8cc293d005b106628205933. update max to use hmax add HIP GEP render logic enable CIFAR fp16 benchmark test ops for HIP back to store as float because this only works for float4 grouping right now test_ops for hip!! 
always sign * back to the sign we had before because we can't do a backward pass on a Less node * remove old hacks HIP compiling test_ops in CI takes ~9 mins, not doing it for now new HIP ALUs * reduce accs done right * refactor to function * no device hacks hacks p2 the other way * LLVM ALU ops half, float and double are all float update max * update test_uops, cmplt is always a bool in the real linearizer. assertAlmostEqual is wrong when ret is bool * cleanup LLVM wrong code * dummy change for the CUDA install glitch --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-12-03 13:45:49 -08:00
chenyu	1ac958a058	update pytest marks and CI test filters (#2587 ) * remove pytest marks * test more stuff * fine revert some * add that mark back * skip that * hmm LLVM does not work on ubuntu * too slow on CUDA CI * dup test	2023-12-03 15:20:44 -05:00
qazal	ab2d4d8d29	Fix cl import in the copy_speed test and cifar example (#2586 ) * fix CL import * update test to only run on GPU * update hlb_cifar too	2023-12-03 09:22:07 -08:00
chenyu	3226b3d96b	enable the jit random test (#2580 )	2023-12-02 20:25:23 -05:00
chenyu	09c9794f3f	clean external_test_opt.py (#2578 )	2023-12-02 19:51:08 -05:00
George Hotz	171543fc8d	cleanups to save lines and files (#2577 ) * runtime/graph -> features/graph * put all the cstyle renderers in cstyle * same line for those * how did that pass mypy	2023-12-02 16:29:56 -08:00
George Hotz	d6b404ac11	No dtype alloc (#2570 ) * fix all allocs * improve docs * ugh fix fake alloc	2023-12-02 13:29:40 -08:00
chenyu	c8774713c5	lazy cleanup (#2567 )	2023-12-02 13:21:43 -05:00
George Hotz	5068e99d18	refactor to remove extra kernel params (#2563 ) * refactor to have compiled kernel * bugfixes * docs/beautiful.py * revert that * fix tests	2023-12-02 00:32:25 -08:00
George Hotz	27481b9206	Switch ops_gpu -> gpuctypes (#2532 ) * ops_gpu is go * fix size 0 * fix image, and add more tests * nerf openpilot test, doesn't test thneed * run the schedule * better * oops, new inputs * delete pyopencl * Update ops_gpu.py	2023-12-01 22:30:21 -08:00
George Hotz	6733425095	lower schedule (#2559 ) * lower schedule * remove RAND, and don't put load in the JIT yet * better fix for that test	2023-12-01 19:17:46 -08:00
Christopher Mauri Milan	077567f62d	Remove as_buffer for TORCH (#2554 ) * remove as_buffer for torch * enable torch zerocopy if on cpu * remove as_buffer even on torch:cpu	2023-12-01 18:51:38 -08:00
chenyu	86fbd413f3	update test_real_world configs (#2557 )	2023-12-01 20:03:52 -05:00
andresgit	00523d5656	New fix accessing elements created by padding (#2529 ) * pad slice test cases, many failing * fix failing test cases check mask if we are outside the base buffer also create a multi-view if in that case we reshape to an empty shape * real_offset calculation more readable --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2023-12-01 19:08:10 -05:00
chenyu	67f4e03724	rewrite 0 size loadop into a CONST (#2556 ) * rewrite 0 size loadop into a CONST * check alloc size * EMPTY is better * Revert "EMPTY is better" This reverts commit 574fe0f9ed28f1b97da5a81afdfd2cd5d9a94ff9. * no ast is created * fix test	2023-12-01 18:29:06 -05:00
George Hotz	4447188051	gate METAL_FAST_LOAD	2023-12-01 15:28:40 -08:00
chenyu	e9426f4fe4	simpler get_contraction (#2552 ) * simpler get_contraction * and test	2023-12-01 18:02:52 -05:00
George Hotz	f5de21e753	fast path for copy (#2548 ) * fast copy * ruff first * flat_mv on malloc * order + webgpu test	2023-12-01 11:34:47 -08:00
George Hotz	12fa846122	zero copy (#2531 ) * zero copy * zero copy test * loads coder in milliseconds * zero copy for cpu and torch * src_from_buffer is None * SLOW_METAL_COPY there	2023-11-30 18:38:41 -08:00
George Hotz	2c363b5f0b	new style device (#2530 ) * cpu tests pass * torch works * works * metal works * fix ops_disk * metal jit works * fix openpilot * llvm and clang work * fix webgpu * docs are rly broken * LRU works on metal * delete comment * revert name to ._buf. LRU only on Compiled * changes * allocator * allocator, getting closer * lru alloc * LRUAllocator * all pass * metal * cuda * test examples * linearizer * test fixes * fix custom + clean realize * fix hip * skip tests * fix tests * fix size=0 * fix MOCKHIP * fix thneed * copy better * simple * old style metal copy * fix thneed * np reshape * give cuda a device	2023-11-30 17:07:16 -08:00
chenyu	7d26452305	call ruff with --preview (#2522 ) some checks are ignored without --preview	2023-11-30 13:59:00 -05:00
chenyu	5db0cdfbd3	support list of ints (or other Tensorable) in tensor indices (#2520 ) * support list of ints (or other Tensorable) in tensor indices * enable some index test cases	2023-11-30 12:46:33 -05:00
chenyu	bd941a0df1	first version of test_indexing (#2515 ) * first version of test_indexing * move to test/imported	2023-11-30 00:03:59 -05:00
qazal	370cfbb957	Cleanup vectorized hip renders (#2497 ) * add typedefs and make_dtypen functions use ext_vector_type for half16 kernels * remove the old test_render because we just use whatever cstyle has * align vectors	2023-11-29 14:02:12 -08:00
George Hotz	065aff747e	make webgpu test reliable (#2502 ) * remove retry that doesn't work * fix cleanup * process exit in cleanup * add space	2023-11-29 10:02:24 -08:00
George Hotz	6707f2588e	use copyin (#2500 ) * it's always copyin * all RawBuffer are RawBufferCopyIn * cleanups * this fixes it * requirements='C' * more correct	2023-11-29 09:34:00 -08:00
chenyu	3eb3c74675	metal ci tests everything (#2499 ) * metal ci tests everything * pretty good * METAL	2023-11-29 12:04:37 -05:00
George Hotz	889acefe85	Support weird loads in Image (#2498 ) * image support weird loads * umm, that was always wrong * openpilot compile fails with a weird error * image test passes * we have valids now * clean that up * no more required opts * add fastvits test, fix bug * minor cleanups	2023-11-29 08:30:46 -08:00
George Hotz	5629fc368c	Use Buffer.STORE at the end of ASTs (#2494 ) * work * store broken * interpreteds work * this passes * symbolic cpu * fix tests * fix opt tests * images fail * fix InterpretedFlopCounter * stupid hack for images	2023-11-28 20:11:37 -08:00
Liam	cf0c9096a9	Removing METAL Skips as CI works (#2488 ) * Test metal CI * remove metal and CI restrictions * enable dtype tests for metal ci	2023-11-28 19:46:59 -08:00
George Hotz	d87a246439	move to new cached fetch (#2493 ) * move to new cached fetch * extra.utils is over * loads * bump download cache * bump timeout	2023-11-28 17:36:55 -08:00
George Hotz	ab5d14d4ba	MEM -> LOAD (#2492 ) * MEM -> LOAD * keep legacy working	2023-11-28 16:46:37 -08:00
chenyu	847f0a02b1	non-simplifiable mod should result in ModNode (#2490 ) * non-simplifiable mod should result in ModNode * space	2023-11-28 16:52:19 -05:00
mmmkkaaayy	ddb6a33ae5	improve test assertions for jit cache len with graph executor (#2476 ) * improve test assertions for jit cache len with graph executor * delete newline * unused import * another unused import	2023-11-27 23:02:45 -08:00
chenyu	28a67106ca	enable symbolic ops tests for hip (#2485 )	2023-11-27 22:33:41 -08:00


4433 Commits