Commit Graph

119 Commits

chenyu
5235cdee3d remove _arg_int32 internal type (#2767)
in DEFINE_GLOBAL, PtrDtype(int32) is buffer and int32 is int
2023-12-14 14:17:14 -05:00
George Hotz
7e5b3e53fe changes to prep for new lazy (#2748)
* changes to prep for new lazy

* put those back
2023-12-13 10:28:22 -08:00
Umut Zengin
8ad7cfeeb1 More simplification in to_image_idx and symbolic (#2679)
* less valid

* add test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-13 12:30:44 -05:00
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
Guy Leroy
ee9e1d3662 Extend available types for safe_save (#2720)
* Extend available types to save with

* Linter fix
2023-12-11 14:50:35 -08:00
George Hotz
0fd44259cd bf16 fix + cleanups from mixtral (#2698)
* bf16 fix + cleanups from mixtral

* generic bf16 cast
2023-12-10 16:31:52 -08:00
qazal
73b067f5ce Bitcast p2 bfloat16 tests + clang fix (#2635)
* add bf16 test support

this model takes me almost a minute to download though:

https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded/resolve/main/pytorch_model-00001-of-00014.bin (981M, ~40s at 24.2MB/s)

* ensure we first load if it is bitcast to avoid taking the address of an rvalue

* tiny bf16 in the cloud

skip GPU

* should skip torch

lint

* Revert "ensure we first load if it is bitcast to avoid taking the address of an rvalue"

This reverts commit b86a28ab84.

* break the kernel

* skip LLVM and GPU in CI

* skip CUDA
2023-12-08 10:30:10 -08:00
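The bitcast fix above leans on the fact that bfloat16 is just the high 16 bits of a float32, so the upcast is a shift into the top half of a uint32 followed by a bit-level reinterpretation. A minimal pure-Python sketch of that cast (illustrative helper names, not tinygrad's code):

```python
import struct

def bf16_to_float32(h: int) -> float:
    # bfloat16 is the high half of a float32: shift back up and bitcast
    return struct.unpack('<f', struct.pack('<I', h << 16))[0]

def float32_to_bf16(f: float) -> int:
    # truncating downcast: keep only the high 16 bits (no rounding, for brevity)
    return struct.unpack('<I', struct.pack('<f', f))[0] >> 16

print(bf16_to_float32(float32_to_bf16(1.5)))  # 1.5
```

1.5 survives the round trip exactly because its mantissa fits in bfloat16's 7 bits; values with longer mantissas would be truncated.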
chenyu
b931a20882 minor shapetracker cleanup (#2652) 2023-12-06 11:43:52 -05:00
Amrit Sahu
71d989b476 adding test to cover #2644 failure (#2645) 2023-12-06 11:00:30 -05:00
George Hotz
232ed2af3f more test cleanups (#2631)
* more test cleanups

* move test example back
2023-12-05 16:17:57 -08:00
George Hotz
35b5e95097 parallel beam search (#2610)
* better print

* fix beam search with vars

* cleanups

* parallel is not default

* restore that

* bugfix

* cleanups

* bugfix
2023-12-05 10:09:45 -08:00
chenyu
dd8b4632a4 regression test for reshape fix #2616 (#2620) 2023-12-05 11:46:33 -05:00
chenyu
c257a0dd99 minor reshape cleanups (#2619)
* minor reshape cleanups

* mea culpa
2023-12-05 11:23:17 -05:00
Amrit Sahu
e8d6a6ef2e view.reshape without symbolic (#2218)
* handle reshape of contiguous subparts with explicit mask

* remove the add/remove ones logic in reshape

* accommodate ones in accumulate logic

* make multiply commutative

* fix linting

* make mypy happy

* add test for commutative mul

* merge dimensions in shape_strides for 1 range masks

* add offsets for merging

* fix linting

* add back explicit 1 reshapes

* fix mypy errors

* fix accumulate by including state

* include non-zero stride dimension in acc

* small cleanup

* more compact to_shape_strides

* more logical cleanup

* compress more

* compress reshape mask

* adding some comments

* small bug fix

* improve test coverage

* remove explicit add remove ones

* small bug in test

* enable test_reshape_splitting_combining

* small fix

* 10 lines less to_shape_strides

* shorten reshape mask

* some more cleanup

* more cleanup

* introduce some symbols for compactness

* more symbols

* even cleaner

* lessen symbols, it became less readable

* remove merge_views from view.reshape

* change to_shape_strides to _merge_dims

* improve readability

* fix corner case

* cleanup

* better handling of 1 <= Variable('i',1,10) & new_dim = Variable('i',1,10)

* rewrite _reshape_mask for readability

* fix white space

* add comment

* nice shorthands for readability

* add proof in docs

* small nit

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-04 12:46:53 -05:00
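The `_merge_dims` step this PR introduces collapses adjacent dimensions that are laid out contiguously in memory, which is what makes the reshape-without-symbolic approach tractable. A simplified sketch of the idea (no masks or offsets, unlike the real `_merge_dims`):

```python
def merge_dims(shape, strides):
    # merge adjacent dims i, i+1 when strides[i] == strides[i+1] * shape[i+1],
    # i.e. the outer dim steps exactly over the inner dim's extent
    out = [(shape[0], strides[0])]
    for s, st in zip(shape[1:], strides[1:]):
        prev_s, prev_st = out[-1]
        if prev_st == st * s:          # contiguous with the previous dim
            out[-1] = (prev_s * s, st)
        else:
            out.append((s, st))
    return out

print(merge_dims((2, 3, 4), (12, 4, 1)))  # [(24, 1)]   fully contiguous
print(merge_dims((2, 3, 4), (24, 4, 1)))  # [(2, 24), (12, 1)]  gap after dim 0
```

Once dims are merged, a reshape only has to split or combine these contiguous runs, which is where the reshape mask logic takes over.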
chenyu
e9426f4fe4 simpler get_contraction (#2552)
* simpler get_contraction

* and test
2023-12-01 18:02:52 -05:00
George Hotz
2c363b5f0b new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
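The LRUAllocator landed here keeps freed device buffers cached by size so a same-size allocation can skip the driver entirely. A toy sketch of that pattern (names are illustrative, not tinygrad's actual API):

```python
class LRUAllocator:
    """Freed buffers are parked in a size-keyed cache and handed back on
    the next same-size alloc, avoiding a fresh device allocation."""
    def __init__(self, alloc_fn):
        self.alloc_fn, self.cache = alloc_fn, {}
    def alloc(self, size):
        if self.cache.get(size):
            return self.cache[size].pop()   # cache hit: reuse a freed buffer
        return self.alloc_fn(size)          # cache miss: really allocate
    def free(self, buf, size):
        self.cache.setdefault(size, []).append(buf)  # keep for reuse

calls = []
a = LRUAllocator(lambda sz: (calls.append(sz), bytearray(sz))[1])
b1 = a.alloc(16); a.free(b1, 16); b2 = a.alloc(16)
print(b1 is b2, calls)  # True [16] -- second alloc served from the cache
```

A real allocator would also evict under memory pressure; as the commit notes, this caching only applies to Compiled backends.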
George Hotz
d87a246439 move to new cached fetch (#2493)
* move to new cached fetch

* extra.utils is over

* loads

* bump download cache

* bump timeout
2023-11-28 17:36:55 -08:00
George Hotz
ab5d14d4ba MEM -> LOAD (#2492)
* MEM -> LOAD

* keep legacy working
2023-11-28 16:46:37 -08:00
chenyu
847f0a02b1 non-simplifiable mod should result in ModNode (#2490)
* non-simplifiable mod should result in ModNode

* space
2023-11-28 16:52:19 -05:00
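The point of this fix is that a mod that cannot be folded must survive as a symbolic ModNode instead of being silently simplified away. A toy illustration of the dividing line (the tuple stands in for tinygrad's ModNode):

```python
def fold_mod(coeff, b):
    # (x*coeff) % b folds to 0 when b divides coeff; otherwise the mod is
    # not simplifiable and must remain a symbolic node
    if coeff % b == 0:
        return 0
    return ("ModNode", coeff, b)   # placeholder for the unsimplified node

print(fold_mod(4, 2))  # 0
print(fold_mod(3, 7))  # ('ModNode', 3, 7)
```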
Christopher Mauri Milan
7f01dd04f0 Apply ruff linting rules to tests (#2473)
* everything except F821

* enable F821 with noqa

* dumb fix

* fix remaining imports and (former) lambdas

* replace _ with noqa to avoid gc
2023-11-27 21:24:06 -08:00
Paul Gustafson
98cd9e8926 Add assertion to prevent nonsense mod values (#2474) 2023-11-27 18:37:44 -08:00
chenyu
61a80a0675 asserts LtNodes of SumNode with MulNode of Nodes (#2465) 2023-11-27 12:56:59 -05:00
Paul Gustafson
1d89c018fa Add isinstance check before gcd call in SumNode.__lt__ (#2450)
* Add isinstance check before gcd call

* Delete blank lines

* Fix unit test typo

* Delete blank lines again

---------

Co-authored-by: Paul Gustafson <paul.gustafson@theambrusgroup.com>
2023-11-26 13:05:04 -08:00
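The guard added here matters because `math.gcd` only accepts integers, while a SumNode's terms can carry symbolic coefficients. A hedged sketch of the shape of the fix (not the real `SumNode.__lt__`): divide an integer inequality by the gcd of its coefficients, but bail out when any coefficient is not an int.

```python
import math

def simplify_lt(coeffs, c):
    # gcd is only defined for ints; symbolic coefficients skip the
    # simplification instead of crashing math.gcd
    if not all(isinstance(a, int) for a in coeffs):
        return coeffs, c
    g = math.gcd(*coeffs)
    if g <= 1:
        return coeffs, c
    # for integer s:  g*s < c  <=>  s < ceil(c/g)
    return [a // g for a in coeffs], -(-c // g)

print(simplify_lt([4, 8], 12))    # ([1, 2], 3)
print(simplify_lt([4, "x"], 12))  # ([4, 'x'], 12) -- left untouched
```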
George Hotz
8e9cdef61f clean up the buffers (#2447)
* clean up the buffers

* remove allocate_output

* functools.lru_cache is methodcache

* add TestShapeTrackerSize

* cache_clear

* no 0 sz buffer, add _ on functions that shouldn't be imported

* fix size

* if -> while
2023-11-26 11:02:29 -08:00
George Hotz
095e2ced61 add name support to fetch (#2407)
* add name support

* use fetch in gpt2

* remove requests from main lib, networkx also optional

* umm, keep that assert

* updates to fetch

* i love the walrus so much

* stop bundling mnist with tinygrad

* err, https

* download cache names

* add DOWNLOAD_CACHE_VERSION

* need env.

* ugh, wrong path

* replace get_child
2023-11-23 14:16:17 -08:00
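The name support added to `fetch` lets callers pick the cache filename instead of always hashing the URL. A minimal sketch of that cached-fetch pattern, with the downloader injected so the example stays offline (illustrative signature, not tinygrad's actual `fetch`):

```python
import hashlib, pathlib, tempfile

def fetch(url, name=None, download=None, cache_dir=None):
    # file is keyed by `name` (if given) or the url's hash inside the cache
    # dir; `download` runs only on a cache miss
    cache_dir = pathlib.Path(cache_dir or tempfile.mkdtemp())
    cache_dir.mkdir(parents=True, exist_ok=True)
    fn = cache_dir / (name or hashlib.md5(url.encode()).hexdigest())
    if not fn.exists():
        fn.write_bytes(download(url))   # miss: actually download
    return fn

hits = []
dl = lambda u: (hits.append(u), b"data")[1]
d = tempfile.mkdtemp()
p1 = fetch("http://example.com/f", download=dl, cache_dir=d)
p2 = fetch("http://example.com/f", download=dl, cache_dir=d)
print(p1 == p2, len(hits))  # True 1 -- second call never downloads
```

Bumping a DOWNLOAD_CACHE_VERSION, as a later bullet mentions, amounts to folding a version number into the cache path so stale files are ignored.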
George Hotz
a0890f4e6c move fetch to helpers (#2363)
* switch datasets to new fetch

* add test_helpers

* fix convnext and delete old torch load
2023-11-19 12:29:51 -08:00
chenyu
d7d078c7f9 Node.vars() returns a set and properly dedup (#2356)
* dedup RedNode.vars()

* vars returns a set

* fix more vars

* unused import

* update to_movement_ops

* comment
2023-11-18 17:44:52 -05:00
chenyu
f02e17a967 Variable.num -> NumNode (#2354) 2023-11-18 15:45:52 -05:00
George Hotz
40246d35bc ops_shm removed (#2351)
* ops_shm removed

* buf.cast

* err, forgot those
2023-11-18 11:41:58 -08:00
George Hotz
3baaf298d6 two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
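A two-stage cumsum splits the axis into blocks: stage one does an independent cumsum inside each block, stage two adds the running total of all preceding blocks. That turns one long sequential scan into two shallower passes, which is why it maps to fewer, wider kernels. A pure-Python sketch of the idea:

```python
import itertools

def two_stage_cumsum(xs, block=4):
    # stage 1: independent cumsum inside each block
    blocks = [xs[i:i + block] for i in range(0, len(xs), block)]
    partial = [list(itertools.accumulate(b)) for b in blocks]
    # stage 2: add the total of every preceding block to each element
    offset, out = 0, []
    for b in partial:
        out += [v + offset for v in b]
        offset += b[-1]
    return out

xs = list(range(1, 11))
print(two_stage_cumsum(xs))  # [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```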
George Hotz
0cbf6c1811 move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
qazal
e2428b63a6 external (#2191) 2023-10-31 13:57:24 -07:00
chenyu
3c88af5071 use unique table name for each disk_cache test (#2184) 2023-10-30 13:49:49 -07:00
George Hotz
cea2bc7964 Add dictionary keys to reduce db size (#2131)
* work

* ignore beam cache

* dictionary keys are generic

* minor db cleanups

* fix baseline and extract dataset

* fix training

* log likelihood
2023-10-24 10:49:22 -04:00

George Hotz
6dc8eb5bfd universal disk cache (#2130)
* caching infra for tinygrad

* non-str key

* fix linter

* no shelve in beam search

* beam search caching

* check tensor cores with beam too

* pretty print

* LATEBEAM in stable diffusion
2023-10-22 10:56:57 -07:00
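Replacing shelve with a universal disk cache comes down to a small key/value store on disk that every subsystem (beam search, compiled kernels) can share. A minimal sqlite-backed sketch of the pattern (table and function names are illustrative, not tinygrad's schema):

```python
import os, pickle, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "cache.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, val BLOB)")

def cache_get(key):
    row = db.execute("SELECT val FROM cache WHERE key=?", (key,)).fetchone()
    return pickle.loads(row[0]) if row else None

def cache_put(key, val):
    db.execute("REPLACE INTO cache VALUES (?, ?)", (key, pickle.dumps(val)))
    db.commit()

cache_put("beam:kernel0", [2, 4, 8])
print(cache_get("beam:kernel0"))  # [2, 4, 8]
```

Because sqlite handles locking and atomicity, this survives concurrent runs far better than a shelve file.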
Umut Zengin
01b98b7f42 MulNode.__lt__ rule (#2086)
* Added the rule

* Added tests

* flake8

* self.b == -1 shortcut
2023-10-17 13:18:35 -07:00
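The rule added here rewrites a comparison on a MulNode into a bound on the variable alone: for integer x and b > 0, x*b < c is equivalent to x < ceil(c/b), and the `self.b == -1` shortcut flips the inequality instead. A sketch of both cases (tuples stand in for symbolic nodes):

```python
def mul_lt(b, c):
    # rewrite (x*b) < c as a bound on integer x alone
    if b > 0:
        return ("lt", -(-c // b))   # x < ceil(c/b)
    if b == -1:
        return ("gt", -c)           # -x < c  <=>  x > -c
    raise NotImplementedError

# exhaustive spot-check of the b > 0 case over a small range
kind, bound = mul_lt(3, 10)
for x in range(-20, 20):
    assert (x * 3 < 10) == (x < bound)
print(mul_lt(3, 10))  # ('lt', 4)
```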
Umut Zengin
776605f2fc O(1) VALIDHACKS (#2072)
* first refactoring

* O(1) validhacks

* O(1) validhacks

* Some cleaning

* mypy

* flake8

* Trim trim

* flake8

* clean

* less chaotic

* less chaotic

* flake8

* Symbolic, SumNode include mulnode for gcd

* fix tests

* small optim

* revert

* clean

* clean

* flake8

* small fix

* Add symbolic test
2023-10-15 11:26:41 -07:00
Umut Zengin
6b7ac5c431 ModNode __mod__ rule (#2039)
* Implement mod rule

* mypy

* feat: New test added
2023-10-12 11:30:10 -07:00
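The `ModNode.__mod__` rule being implemented is the classic nested-mod rewrite: (x % a) % b reduces to x % b whenever b divides a. A quick sketch with a brute-force check:

```python
def mod_mod(x, a, b):
    # rewrite (x % a) % b -> x % b, valid whenever a % b == 0
    assert a % b == 0
    return x % b

# verify the rewrite against direct evaluation
for x in range(100):
    assert (x % 8) % 4 == mod_mod(x, 8, 4)
print("(x % 8) % 4 == x % 4 for all checked x")
```

When b does not divide a, no such fold exists (e.g. (5 % 4) % 3 = 1 but 5 % 3 = 2), so the node must stay nested.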
qazal
e40f141203 Refactor and add more unit tests for disktensors (#2022)
* testing with the test_ops pattern

* add assign test

* flake8 complaining about single line fn

* slice 2d and minor cleanup

* make assign_slice a one-liner

* we don't need to repeat the same lambda twice, default tinygrad_fxn to be np_fxn

* back assign fn for np array

* implement __setitem__ in tensor.py

* don't re-slice the ret tensor

* one liner assign

* drop the permute test
2023-10-09 18:46:29 -07:00
George Hotz
ffa33d743a good changes from openpilot_compile2 (#2000)
* good changes from openpilot_compile2

* float32 image type was wrong

* cleaner way to write that + a test
2023-10-06 13:33:24 -07:00
George Hotz
22b8576887 more lazy cleanup (#1938)
* small lazy cleanups

* a few more

* cleanups

* no more realizing in the scheduler test

* a few more minor things

* that was just wrong

* fix graph. the graph test was completely useless

* make graph usable

* fix op graph
2023-09-29 00:53:29 -07:00
George Hotz
c907efbf4a reorder a few things (#1915)
* reorder a few things

* huh, that has to be there

* move apply shapetracker

* BufferOps

* only for type checking
2023-09-25 10:17:21 +08:00
George Hotz
20059dc55b Make ShapeTracker Immutable (#1909)
* ugh

* ops test pass

* fix shapetracker tests

* sym shapetracker

* shapetracker is a tuple of views now

* from_shape

* fix has variable shape

* key isn't needed

* post init assert
2023-09-24 21:09:03 +08:00
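"ShapeTracker is a tuple of views now" plus a post-init assert is exactly the shape of a frozen dataclass. A toy sketch of that structure (mirroring the change's shape, not tinygrad's real classes):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class View:
    shape: tuple
    strides: tuple

@dataclass(frozen=True)
class ShapeTracker:
    views: tuple                     # immutable: a tuple of View
    def __post_init__(self):
        # the post-init assert from the PR: must hold at least one view
        assert isinstance(self.views, tuple) and len(self.views) > 0

st = ShapeTracker(views=(View((2, 3), (3, 1)),))
try:
    st.views = ()                    # frozen: mutation raises
except Exception as e:
    print(type(e).__name__)          # FrozenInstanceError
```

Immutability also makes the tracker hashable for free, which is what lets it serve as a cache key without a hand-written `key`.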
George Hotz
7ff7aacdb4 LazyOp out of Linearizer (#1908)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx

* dedup buffers

* hmm, bugfix

* fix tensor cores

* opts device
2023-09-24 14:30:53 +08:00
George Hotz
97dc813329 Revert "All LazyOps in the Linearizer (#1905)" (#1907)
This reverts commit a5820390db.
2023-09-24 11:51:22 +08:00
George Hotz
a5820390db All LazyOps in the Linearizer (#1905)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx
2023-09-24 11:50:00 +08:00
Umut Zengin
3987280daf Fix VALIDHACKS for Images and make it default (#1832)
* valid hacks

* valid hacks

* valid hacks

* new method

* new method

* handtune

* is gate load breaking?

* lint

ruff

less junk

new approach?

maybe this?

* Make it more clear

* Make it more clear

* Will deal with the linter later

* hack for linter

* subs the idx but dont touch the valid

* Updated the mod rules

* lint hack

* I believe bug fix lets see

* Mod Node left

* revert

* Maybe this wont break?

* revert

* implemented "handtuned garbage"

* revert and use VALIDHACKS

* Lets see the CI

* still broken?

* currently its jungle

* maybe this jungle ?

* This works for everything somehow

* Added test for symbolic

* lint

* final touch

* This still works

* lint

* midway clean

* less garbage

* lint

* final form

* Slow but working way

* lint and other stuff

* lint

* mypy

* Make sure CI test Openpilot valid checks

* test if CI break

* Convert back

* refactor

* refactor

* Managed to reduce openpilot time from 30 secs to 5 secs

* Refactor

* Substitute a node with variable

* flake8

* Comment and refactor

* More comprehensive mod

* refactor

* bug fix

* More shave off

* remove not sure part
2023-09-23 07:34:43 +08:00
George Hotz
78576915de Add needed contiguous to DiskBuffer. SHM support on OSX (#1891)
* add some contiguous

* remove second contig

* Revert "remove second contig"

This reverts commit fc164f7dca1ad75b1e466e4e45a05eca58b7e0e0.

* shm on osx

* can repro bug

* don't contig zeros and ones
2023-09-22 09:16:42 +08:00
chenyu
a5090f0ee9 remove NumNode.int() (#1876) 2023-09-21 10:29:16 +08:00
chenyu
1b46de1a3e fix type of helpers.prod, add test cases (#1859) 2023-09-14 05:16:55 +08:00