Commit Graph

10417 Commits

Author SHA1 Message Date
David Hou
e74a6ca7e4 expand in terms of substitute (#1827) 2023-09-09 14:43:00 -07:00
George Hotz
0e3e2bac13 amd wino: upload results 2023-09-09 13:57:14 -07:00
George Hotz
6f95c5f284 winograd speed test for AMD (#1826) 2023-09-09 13:56:33 -07:00
George Hotz
0f2bd10d00 add winograd CIFAR to mac tests (#1825)
* add winograd CIFAR to mac tests

* symlink already done
2023-09-09 13:45:24 -07:00
nimlgen
31fca43706 kopt works with local+grouped reduce and tests (#1824) 2023-09-09 13:22:09 -07:00
chenyu
9da40c8448 move Node.__lt__ SumNode special case to SumNode (#1823) 2023-09-09 13:20:38 -07:00
Francis Lam
651205fa5c linearizer: support local and group_for_reduce dimensions together (#1821)
also minor changes to test_speed_v_torch.py and size of UOps.SPECIAL
2023-09-08 12:39:27 -07:00
segf00lt
9e8c1dbf34 patch to remove hack from stable_diffusion.py (#1814)
* patch to remove hack from stable_diffusion.py

* sorry linter

* realize after assign?

* float16 broken in llvmlite use float64 for now

* int32

* idiot forgot to change test array dtype
2023-09-08 09:26:50 -07:00
chenyu
ebcda8a714 Move var_vals from ShapeTracker to LazyBuffer (#1819) 2023-09-08 09:25:10 -07:00
kormann
7ac65a93b4 utils.printtree (#1816)
* utils.printtree

* linter compliance

* rename to print_tree
2023-09-07 23:08:57 -07:00
George Hotz
4613c9e77c add tvm example, formatting (#1813)
* add tvm example

* no realize
2023-09-07 11:50:41 -07:00
nimlgen
5b15a972b5 no functions with same names in test/ (#1811) 2023-09-07 11:27:31 -07:00
George Hotz
722823dee1 stable diffusion: force fp16 free 2023-09-06 15:11:05 -07:00
chenyu
928cb1a64a AndNode.substitute short circuit (#1800)
* AndNode substitute short circuit

* Node.__bool__ is faster than Node.__eq__
2023-09-06 14:58:49 -07:00
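The short-circuit this commit describes can be illustrated with a minimal sketch (the `Num`, `Var`, and `And` classes below are stand-ins for illustration, not tinygrad's actual `Node` hierarchy): once any conjunct substitutes to a falsy node, the whole conjunction is known to be false and the remaining substitutions can be skipped. The second bullet's point shows up in the `if not sub` check, which uses `__bool__` rather than an equality comparison.

```python
# Minimal sketch of an AndNode-style substitute() with short-circuiting.
# These classes are illustrative stand-ins, not tinygrad's real symbolic nodes.

class Num:
    """A constant node; falsy when its value is 0."""
    def __init__(self, b): self.b = b
    def substitute(self, var_vals): return self
    def __bool__(self): return self.b != 0   # cheaper than building an __eq__ comparison

class Var:
    """A variable node; substitution replaces it with a constant."""
    def __init__(self, name): self.name = name
    def substitute(self, var_vals): return Num(var_vals[self.name])

class And:
    """A conjunction of nodes."""
    def __init__(self, nodes): self.nodes = nodes
    def substitute(self, var_vals):
        subed = []
        for node in self.nodes:
            sub = node.substitute(var_vals)
            if not sub: return Num(0)   # short circuit: one false conjunct decides the result
            subed.append(sub)
        return And(subed)
```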
nimlgen
a78a1fa499 fix jit buffer reuse when freed (#1802)
* fix jit buffer reuse when freed

* Forbid output_buffer reuse
2023-09-06 14:41:57 -07:00
Yixiang Gao
22cf15e9d0 convert function into tinygrad (#1803) 2023-09-06 14:41:26 -07:00
Pavol Rusnak
52a92bf95d use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():

* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
badcc
fd25792c8b Ensure freqs as type float32 in freqs_cis (#1798) 2023-09-06 10:24:15 -07:00
chenyu
35072877ef sym_infer is noop for int input (#1795) 2023-09-06 09:17:20 -07:00
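The behavior in this commit title can be sketched as follows (`Variable` here is a hypothetical stand-in for tinygrad's symbolic node, used only to make the sketch runnable): when the input is already a plain `int`, there is nothing to infer and the value passes through untouched.

```python
class Variable:
    """Hypothetical stand-in for a symbolic node, for illustration only."""
    def __init__(self, name): self.name = name
    def substitute(self, var_vals): return var_vals[self.name]

def sym_infer(n, var_vals):
    # the commit's point: an int input is a no-op fast path, no substitution needed
    if isinstance(n, int): return n
    return n.substitute(var_vals)
```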
George Hotz
f67638b27a delete broken DDPG example 2023-09-06 08:01:12 -07:00
George Hotz
78a43ad2c7 add uop fixup (#1793) 2023-09-06 07:55:22 -07:00
geohotstan
1bbf26d7fd fix try except not catching fxn() in benchmark (#1783)
* have function raise notimplementederror

* more lines

* revert back to 2 lines :D

* aahhhhhhhh shoooot im stupid

* keep it minimal?
2023-09-06 07:36:43 -07:00
chenyu
09e78a9d07 Node does not need to subclass ABC (#1792)
* Node does not need to subclass ABC

* class Node:
2023-09-06 07:35:45 -07:00
badcc
ee9ac20752 Use correct dtype in Tensor when data is an ndarray (#1785)
* use correct dtype in Tensor when data is an ndarray

* attempt 2

* add assert to be consistent

* Add test case for ndarray

* Add test case for list

* remove whitespace
2023-09-06 07:35:32 -07:00
nimlgen
130cd55942 fix gpu compilation of const GEP (#1788) 2023-09-06 07:34:46 -07:00
George Hotz
e10a9692ec Revert "fix attn_mask None issue" (#1787)
* Revert "fix attn_mask None issue (#1786)"

This reverts commit bd06d88c73.

* Update tensor.py
2023-09-05 21:18:55 -07:00
David Hou
343b256deb PoC fast winograd compile (#1771)
* proof of concept for variable replace global load

* small hacks to make faster

* clean up a little?

* linter

* allow substituting with an expression

* clean up a little

* fix everything

* try to fix bug?

* type annotation

* typing

* typing
2023-09-05 21:14:40 -07:00
Pavol Rusnak
a50a7ef6f2 revert typo in external_multi_gpu.py (#1777)
introduced by fb1cc6bf4b
2023-09-05 20:46:28 -07:00
George Hotz
bd06d88c73 fix attn_mask None issue (#1786) 2023-09-05 20:45:54 -07:00
Francis Lam
0379b64ac4 add seed option to stable_diffusion (#1784)
useful for testing correctness of model runs
2023-09-05 19:45:15 -07:00
George Hotz
6100d7425f add 2 to locals, uops debug 5 (#1782) 2023-09-05 19:44:43 -07:00
Roelof van Dijk
2a11669e1d perf: faster and more readable merge_dicts (#1775)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-05 14:42:19 -07:00
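A `merge_dicts` along the lines this commit title suggests might look like the sketch below (an assumption inferred from the title, not the verbatim implementation): merge several dicts in one pass while asserting that no key maps to two different values.

```python
def merge_dicts(ds):
    """Merge an iterable of dicts, asserting no key has conflicting values."""
    kvs = set((k, v) for d in ds for k, v in d.items())
    # if two dicts disagree on a key, the key appears twice in kvs
    assert len(kvs) == len(set(k for k, _ in kvs)), "cannot merge, conflicting values"
    return dict(kvs)
```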
George Hotz
89a8a02697 disable openpilot model in model benchmark 2023-09-05 13:32:30 -07:00
geohotstan
9af5645ba3 onnx full passing (#1076)
* 1

* 83 failed

* learning how git works

* lol idk

* zero shape aaaa

* space lol

* aaa

* test check

* haha

* fixed gather

* 73 failing

* 71 failing

* 68 failing

* added some debug

* fking resize

* lol

* 62 failing

* 58 failing fucking did nearest resize hell yeah

* clean up

* 56 failing

* janitor duty

* lol

* 53 failing

* hi mom

* 50 failing

* added linear interp, but coord_trans is wrong

* did lin interpolation woohoo

* 43 failing

* 40 failing

* temporary Gather fix

* 39 failing

* fixed slice onnxver<10

* 37 failing

* 35 failing

* excluded tests that use float64

* 32 failing with hacks

* added _batchnorm() for 3D 5D batchnorm, 29 failing

* changed ALLOWED_KERNEL_COUNT from 199 to 207

* added improved Gather op, reverted ALLOWED_KERNEL_COUNT commit

* support Round op

* added storage_order/indices maxpool, 27 failing

* support maxunpool, 25 failures

* support Gradient, 23 failures

* merged new where

* added Adam

* cleanups

* added Momentum and Nesterov Momentum

* added Adagrad

* support sequence_type, 20 failing

* ugh git

* I give up on cubic interp :D, 9 failing

* sexy 1 liner gather, much improved, wow

* polished gather to make it shine bright like a diamond

* clean 1 liner for gather

* improved readability of gather

* uhh

* clean up

* more clean up

* WHITEspace

* implemented SoftmaxCrossEntropyLoss op

* added comments and cleaned up if statements

* update

* thank based wozeparrot for pow and new GatherElements

* CPU and TORCH all pass | cast float64 -> float32 for all fromCPU()

* _nearest_gather() failing on yolo

* reverted ops_cpu change and added assert in Resize

* added comments for resize for multiple channels

* oops

* merge

* test

* switched np.pad to Tensor.pad for constant padding

* gah

* gah2

* sexy reflect pad with movementops -> add

* delete commented out lines

* edge mode pad sexy as well

* trying out model_benchmark

* revert gitignore change lol

* init

* Revert "init"

This reverts commit 682bf2073a.

* wrote cast workaround for CPU, CPU and TORCH all pass

* wrote cast workaround for CPU, CPU and TORCH all pass

* skipped tests w/ 0 shape for METAL and GPU

* excluded tests for CLANG, CPU, TORCH, CLANG pass

* fixed hacky ConvTranspose

* gotta figure out autopad

* UOps.STORE support cast bool -> float

* small fix for fast gather

* reverted 0 shape skipped tests

* oops missed a file

* added comment

* fixed slice op hack

* First commit to pr

* More trig ops

* More trig ops

* format

* isinf support

* More ops

* changed onnx_ops to use our new gather :D

* Det op bug fix

* rebase

* fixed some tests

* det broken and slow

* fixed compress to use new gather

* implemented argmax argmin

* support variable types in type_proto

* support Upsample and Identity sequence

* we support float64 now and tinygrad support automatic broadcasting

* added EyeLike op

* resize does support multiple channels now actually

* yolov8 onnx runs successfully

* added batch size 1

* oops

* finally fixed type_proto I think

* fixed some llvm bugs

* del whitespaces

* added ZenginU Format PR

* test

* oops

* added float64 exclude tests back

* more skipped tests

* try

* ok openpilot pass

* flake8 pass

* woooooohooo

* revert external_model_benchmark changes

* perf tested gather

* removed promote types from ops_cpu

* numerical errors from 1681 is fixed

---------

Co-authored-by: ZenginU <umutzengin00@gmail.com>
2023-09-05 13:23:32 -07:00
George Hotz
fb1cc6bf4b llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec

* jit not default in CI
2023-09-05 10:12:16 -07:00
Roelof van Dijk
f6e6a1a4d7 perf: avoid cast, restore isinstance (#1772)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-05 09:07:04 -04:00
geohotstan
671101e6b8 Metal stuff pip install on default when on Darwin (#1770)
* added to setup

* split lines for Darwin stuff
2023-09-04 21:59:54 -07:00
George Hotz
10305bfc0a tuples only (#1769) 2023-09-04 16:35:11 -07:00
George Hotz
63c46e0287 Parens and gls (#1768)
* more paren stripping

* remove global and local size from renderers

* complex strip parens

* extra helpers + minor webgpu fix

* fix test uops

* one more parens test
2023-09-04 16:09:01 -07:00
Adrian Kretz
3473c9e88d Metal conv tensor cores (#1696)
* Benchmark 5x5 conv kernel which is optimized

* Use Metal tensor cores in 2d convs
2023-09-04 15:14:46 -07:00
George Hotz
b32ed8e6e9 removing loop (#1764)
* removing loop

* fix llvm

* remove unused

* strip parens

* with side effects

* define global has side effects
2023-09-04 14:47:46 -07:00
tomtom-95
7344f7c2d1 KeyError fixed. (#1763) 2023-09-04 15:36:16 -04:00
Roelof van Dijk
fd8e14c07a fix: unused function (#1759)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-04 11:39:50 -07:00
Roelof van Dijk
c826854e48 fix: remove unused function (#1760)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-04 11:39:34 -07:00
Roelof van Dijk
2aaecc1ce4 fix: remove unused function (#1761)
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-09-04 11:39:27 -07:00
nimlgen
f863c12610 test kopt correctness (#1756)
* test kopt correctness

* bump BUDGET to 20

* kopt hooks as setUp/tearDown
2023-09-04 10:55:00 -07:00
George Hotz
c6d5d45a2b Remove MemOp (#1750)
* start removing memop

* locals

* support both stores

* might be correct

* remove parens on shape ish

* fix metal ops

* render load and render store

* fix image

* maybe fix asm

* fix test uops

* revert asm

* remove memop itself
2023-09-04 09:58:33 -07:00
George Hotz
56abe04e4b disable assembly (#1755) 2023-09-04 09:41:20 -07:00
chenyu
b8fde6bb0f Test KOPT in CI (#1744)
* test kopt in ci

* getenv takes dtype from default
2023-09-03 14:37:20 -07:00
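The second bullet, `getenv takes dtype from default`, suggests an environment-variable helper that casts the raw string to the type of its default argument; a minimal sketch (hypothetical, inferred from the commit message rather than copied from the repo):

```python
import functools
import os

@functools.lru_cache(maxsize=None)   # cached: the same flag is often read repeatedly
def getenv(key, default=0):
    # the raw environment string is cast to the type of the default,
    # so an int default yields an int and a str default yields a str
    return type(default)(os.environ.get(key, default))
```

With this shape, `getenv("KOPT", 0)` returns an `int` while `getenv("LOG", "")` returns a `str`, with no explicit dtype argument needed.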
George Hotz
ed194a1d3b zero fold (#1748)
* add constant fold

* err, it's just zero folding

* self store fold + caching

* prints and more folds

* simpler winograd kernels

* remove childless uops
2023-09-03 13:48:11 -07:00