tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 14:58:46 -05:00

Author	SHA1	Message	Date
chenyu	25a767cd5d	Remove LtNode.__mul__ and AndNode.__mul__ (#1913 )	2023-09-25 07:03:59 +08:00
chenyu	eaa8d343d8	Remove str type from map_buffers (#1912 )	2023-09-25 07:03:22 +08:00
Dat D. Nguyen	ae9529e678	chore: remove redundant noise in stable diffusion example (#1910 )	2023-09-24 21:33:45 +08:00
George Hotz	6d9065ed1c	Minor cleanups (#1911 ) * cleanups * remove that simplify	2023-09-24 21:32:50 +08:00
George Hotz	20059dc55b	Make ShapeTracker Immutable (#1909 ) * ugh * ops test pass * fix shapetracker tests * sym shapetracker * shapetracker is a tuple of views now * from_shape * fix has variable shape * key isn't needed * post init assert	2023-09-24 21:09:03 +08:00
nimlgen	45f02393f0	HipGraph support (#1880 ) * init hip graph * optimize args update * cache symbolic in jit * remove NOSTAT * init BasicBatchExecutor * symbolic infer cache per jit instance * basicbatchexec is defualt for compiled * batch_exec is taken from ASTRunner * no infer cache * batched execution of hip graph * add comment about hip graph batches * readable hip graph	2023-09-24 20:14:36 +08:00
George Hotz	7ff7aacdb4	LazyOp out of Linearizer (#1908 ) * loadop buffer on cpu * works for GPU * sort of working * has bugs * gpu tests pass * fix some tests * fix tensor cores * fix test linearizer * fix symbolic * fix has_variable_shape * non symbolic size * disable weird test * simple cache fix * fix custom function * fix kopt * cleanups * a bit broken on the assign * contig check * only buffer * need that order * idx * dedup buffers * hmm, bugfix * fix tensor cores * opts device	2023-09-24 14:30:53 +08:00
qazal	2201b46bce	Refactor Conv2d/ConvTranspose2d into a single parent class (#1906 ) * refactor Conv2d/ConvTranspose2d * raise in __call__ for the parent class * use ABC * drop ABC it's just syntactic sugar * use conv2d as base for the transposed version	2023-09-24 14:23:41 +08:00
George Hotz	97dc813329	Revert "All LazyOps in the Linearizer (#1905 )" (#1907 ) This reverts commit `a5820390db`.	2023-09-24 11:51:22 +08:00
George Hotz	a5820390db	All LazyOps in the Linearizer (#1905 ) * loadop buffer on cpu * works for GPU * sort of working * has bugs * gpu tests pass * fix some tests * fix tensor cores * fix test linearizer * fix symbolic * fix has_variable_shape * non symbolic size * disable weird test * simple cache fix * fix custom function * fix kopt * cleanups * a bit broken on the assign * contig check * only buffer * need that order * idx	2023-09-24 11:50:00 +08:00
George Hotz	0f373b8b47	cache more uops (#1904 ) * cache more uops * fix cacheable	2023-09-23 16:50:13 +08:00
George Hotz	1e15fdaee7	disable flaky triton test	2023-09-23 14:59:36 +08:00
George Hotz	0571dd7627	move all int (#1903 )	2023-09-23 14:43:45 +08:00
nimlgen	41aea3ad36	require C-contiguous array for hip._copyin (#1902 )	2023-09-23 14:36:59 +08:00
Szymon Ożóg	58296c079d	Make Triton work again (#1547 ) * Move ops_triton to runtime and remove errors from deprecated code * Remove deprecated AST Kernel * Remove deprecated buffer * Add TritonProgram * Triton Buffer * Use RawCUDABuffer * triton_compile * Added new parameter * pass _buf to program * remove deprecated include * Added triton tests * Deprecated includes removed * remove double print * Disable float4 support * Disable float4 support * variable load fix * Track local size * Add pycuda to triton dependencies * Merge test.yml * install cuda packages for testing * merge double package install * remove emulated from triton tests * upscale local index to power of 2 and add masking * cuda envs * Add TernaryOps * ConstOp loading * proper function name * remove deprecated variables * get global program from name * const ops match local shape * Enable test_nn * remove deprecated import * fix linter error * Add wait logic * Add local size override * accumulate local shapes instead of using max shape * Merge triton tests into global tests * fix envs in testing * Old testing routine * split file into renderer and program * remove print and starting whitespace * pretty ptx print on debug 5 * linter errors * ignore triton saturation tests * ignore test example * remove pytorch cpu extra index * Add triton to existing testing routine * use triton tests * disable cuda backend in triton tests * use cudacpu in tests * print used device * Print device default * Remove print * ensure we are running triton backend * update variable signatures * update dtypes for load * infinity render fixed * limit global size * negative infinity now properly rendered * split chain with parentheses for and node * Add option to disable shared memory, disable for triton * missing import * Properly index and mask conditional load * use mask only if not loading a block pointer * nan support * fix symbolic tests to include chain split * proper masking for stores * Implemented bool dtype * Add mod * fix loads for variables with valid range * merge triton with cuda runtime * merge from master * run triton tests with cuda * Correct target when running from triton * conftest with triton compiler config * use triton nightly * verbose tests for triton * capture stdout * fix function depth when exiting multiple loops * add render valid function for readabilty * fix mask for local loops * add _arg_int32 datatype * fix dims for conditional loads * enable non float stores * correct variable dtypes * fix type for arg_int32 * remove junk * Added get max function for range based var.max * remove deprecated code * Fix triton ptxas path * Fix testing for CI * clamp local size by max local size instead of always running max * Disable matmul test in triton cpu * rerun tests * Disable broken test in triton cpu * whitespace removed * rerun tests again * Disable TestSymbolicOps for triton * update to new uops * linter fix * ignore test/extra * linting fix * Update tinygrad/renderer/triton.py Co-authored-by: Gijs Koning <gijs-koning@live.nl> * remove deprecated line * quotes type fix * linter * Remove unnecesary lines * UnaryOps.NEG * dont define constants * Linting fix * Disable tests that are broken in ocelot * remove trailing whitespace * reduce line count * linting fix * update to new uast * New looping style * Update to new uast * make AST runner work with triton * linting fix * set renderer var for testing * disable local for ocelot * reenable all tests for ocelot * Pass shared to cuda * Don't group if the backend doesn't support shared mem * use working gpuocelot branch * enable all tests * enable local for ocelot * cleanup * Update test.yml * update cache key * reenable test symbolic and extra * Update test.yml * Revert "Update test.yml" (rerun tests) This reverts commit `98c0630ee5`. * Revert "fix symbolic tests to include chain split" This reverts commit `22a9a4c9cd`. * Revert "split chain with parentheses for and node" This reverts commit `7499a7004e`. * use global size from linearizer * rename newvar to dtype to match other renderers * join program start lines * simplify code that adds axis to local dims * assign r[u] in ssa * We no longer need to replace target in src * we no longer need to cast indices to int by hand * Update triton.py(rerun tests) * Update triton.py(rerun tests) * Update triton.py(rerun tests) --------- Co-authored-by: Gijs Koning <gijs-koning@live.nl> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-09-23 14:17:12 +08:00
George Hotz	6fb8b3bb60	move symbolic functions to shapetracker (#1901 )	2023-09-23 11:45:08 +08:00
George Hotz	9cf13bd055	rename reduce_op (#1900 ) * rename reduce_op * more design v2	2023-09-23 11:27:36 +08:00
George Hotz	73a6ed7862	Apply ShapeTracker in interpreted backends (#1846 ) * applying st * tests pass * minor cleanups * torch too * hack * contiguous * move mops * contig in BN * tests should pass * make torch fast * make zeros and ones contig by default * no contig there * fix padding with expanding * might fix tests * still doesn't fix bug, but should be there * Revert "still doesn't fix bug, but should be there" This reverts commit `8ea92f3e07`. * minor cleanups	2023-09-23 10:05:13 +08:00
Umut Zengin	3987280daf	Fix VALIDHACKS for Images and make it default (#1832 ) * valid hacks * valid hacks * valid hacks * new method * new method * handtune * is gate load breaking? * lint ruff less junk new approach? maybe this? * Make it more clear * Make it more clear * Will deal with the linter later * hack for linter * subs the idx but dont touch the valid * Updated the mod rules * lint hack * I believe bug fix lets see * Mod Node left * revert * Maybe this wont break? * revert * implemented "handtuned garbage" * revert and use VALIDHACKS * Lets see the CI * still broken? * currently its jungle * maybe this jungle ? * This works for everything somehow * Added test for symbolic * lint * final touch * This still works * lint * midway clean * less garbage * lint * final form * Slow but working way * lint and other stuff * lint * mypy * Make sure CI test Openpilot valid checks * test if CI break * Convert back * refactor * refactor * Managed to reduce openpilot time from 30 secs to 5 secs * Refactor * Substitute a node with variable * flake8 * Comment and refactor * More comprehensive mod * refactor * bug fix * More shave off * remove not sure part	2023-09-23 07:34:43 +08:00
Gijs Koning	767bb35903	Enable symbolic ops tests for LLVM (#1898 ) * Enable symbolic tests for HIP and LLVM * Only llvm	2023-09-23 07:30:26 +08:00
Gijs Koning	b8ff20ffe4	Gpt2 (#1896 ) * small helps * got something working * faster? * faster yes * cleanup * cleanup * cleanup * Fix non jit * Fix fp16 and some cleanup * Fix fp16 and some cleanup * cleanup * similar to master * cleanup	2023-09-22 20:14:47 +08:00
chenyu	b89ee1ac83	lazy type annotation and cleanups (#1897 )	2023-09-22 14:20:23 +08:00
George Hotz	78576915de	Add needed contiguous to DiskBuffer. SHM support on OSX (#1891 ) * add some contiguous * remove second contig * Revert "remove second contig" This reverts commit fc164f7dca1ad75b1e466e4e45a05eca58b7e0e0. * shm on osx * can repro bug * don't contig zeros and ones	2023-09-22 09:16:42 +08:00
qazal	d0e752003d	fixes (#1893 )	2023-09-22 07:20:27 +08:00
wozeparrot	009a99a0b1	feat: way cleaner hip wrapper (#1895 )	2023-09-22 07:20:03 +08:00
Yixiang Gao	cb5d6576cb	cifar step time 65ms while stay above 94% (#1888 ) * change reduceop heruistics * add model ema and jit hack * add ema eval * have to create a duplicate eval function for jit * remove manual seed * 94% achieveable with normal eval * ema is outputting the same results as normal * fix ema bug * ema achieves 94% with fix seed * multigpu tested * constant fold decay, fix jit, adjust message for multigpu * pull SpeedyResNet out of train_cifar()	2023-09-21 11:19:32 +08:00
kormann	864746d6aa	polish print_tree (#1868 ) * fix * isinstance	2023-09-21 11:13:10 +08:00
chenyu	a5090f0ee9	remove NumNode.int() (#1876 )	2023-09-21 10:29:16 +08:00
Gijs Koning	9eb6310686	Fix gpt optimization (#1885 ) * fix for gpt * the actual fix * Remove change in symbolic * small comment	2023-09-21 10:28:18 +08:00
Szymon Ożóg	bd3444797b	make ssa assign r[u] (#1887 )	2023-09-21 10:20:20 +08:00
nimlgen	9450e41f70	no import when Python is shutting down (#1875 )	2023-09-20 12:47:02 -04:00
Yixiang Gao	84ab47a90a	add branch up-to-date check (#1879 )	2023-09-20 12:41:51 -04:00
nimlgen	504bb6d0ea	support symbolic jit in HIP (#1877 )	2023-09-20 01:44:26 -04:00
chenyu	cd66c9e249	no numnode in shape (#1871 )	2023-09-17 07:49:45 +08:00
Yixiang Gao	18ec5a9e09	add comment bot to CI (#1873 )	2023-09-16 12:22:06 -04:00
Yixiang Gao	a27f6c7d62	add diff mode to sz.py (#1872 )	2023-09-16 00:43:47 -04:00
nimlgen	4c31dfafb3	add seed to gpt-2 (#1869 )	2023-09-15 17:34:14 -04:00
wozeparrot	c870764940	Revert "add line changes diff bot to CI (#1863 )" (#1870 )	2023-09-15 16:56:42 -04:00
Yixiang Gao	789c84a7a3	add line changes diff bot to CI (#1863 )	2023-09-15 16:29:58 -04:00
chenyu	29ac8293d7	run gpt2 in CI (#1866 )	2023-09-15 04:37:02 +08:00
chenyu	1b46de1a3e	fix type of helpers.prod, add test cases (#1859 )	2023-09-14 05:16:55 +08:00
chenyu	e67306ba04	symbolic shape type with TypeGuard (#1852 )	2023-09-13 05:27:22 +08:00
Roelof van Dijk	c91b44f7bf	refactor: move size to view (#1848 ) * refactor: move size to view * fix: pylint --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-09-11 07:16:04 -07:00
chenyu	9e9ea20784	Fix view, CI cpu test with python 3.8 (#1845 )	2023-09-10 22:37:58 -04:00
chenyu	3ec301c2d7	apply view.py patch (#1844 )	2023-09-10 17:32:15 -07:00
Yixiang Gao	a32951a001	add test_tensor_copy (#1840 ) * add test_tensor_copy * fix whitespace * add value check	2023-09-10 16:01:58 -07:00
Roelof van Dijk	1bc52c60df	fix: minor tweaks to view (#1842 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-09-10 15:55:57 -07:00
George Hotz	47e602f717	view: do not trade complexity for speed (#1839 ) * view: do not trade complexity for speed * staticmethods * view create	2023-09-10 11:29:53 -07:00
chenyu	c0bc4cfbaf	DivNode.b is int (#1833 )	2023-09-10 09:04:29 -07:00
nimlgen	13790b1e20	cast types in render_load (#1837 )	2023-09-10 07:58:13 -07:00


10417 Commits