Commit Graph

4433 Commits

Author SHA1 Message Date
chenyu
3ec301c2d7 apply view.py patch (#1844) 2023-09-10 17:32:15 -07:00
Yixiang Gao
a32951a001 add test_tensor_copy (#1840)
* add test_tensor_copy

* fix whitespace

* add value check
2023-09-10 16:01:58 -07:00
George Hotz
47e602f717 view: do not trade complexity for speed (#1839)
* view: do not trade complexity for speed

* staticmethods

* view create
2023-09-10 11:29:53 -07:00
David Hou
e74a6ca7e4 expand in terms of substitute (#1827) 2023-09-09 14:43:00 -07:00
nimlgen
31fca43706 kopt works with local+grouped reduce and tests (#1824) 2023-09-09 13:22:09 -07:00
Francis Lam
651205fa5c linearizer: support local and group_for_reduce dimensions together (#1821)
also minor changes to test_speed_v_torch.py and size of UOps.SPECIAL
2023-09-08 12:39:27 -07:00
segf00lt
9e8c1dbf34 patch to remove hack from stable_diffusion.py (#1814)
* patch to remove hack from stable_diffusion.py

* sorry linter

* realize after assign?

* float16 broken in llvmlite, use float64 for now

* int32

* idiot forgot to change test array dtype
2023-09-08 09:26:50 -07:00
chenyu
ebcda8a714 Move var_vals from ShapeTracker to LazyBuffer (#1819) 2023-09-08 09:25:10 -07:00
nimlgen
5b15a972b5 no functions with same names in test/ (#1811) 2023-09-07 11:27:31 -07:00
nimlgen
a78a1fa499 fix jit buffer reuse when freed (#1802)
* fix jit buffer reuse when freed

* Forbid output_buffer reuse
2023-09-06 14:41:57 -07:00
Pavol Rusnak
52a92bf95d use class Foo: instead of class Foo(): (#1797)
* use class Foo: instead of class Foo():

* add ruff linter, copy settings from .flake8 to ruff.toml
2023-09-06 12:20:25 -07:00
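
A minimal sketch of the style rule 52a92bf95d enforces (the class body here is a placeholder): empty parentheses after a class name are redundant in Python 3, where every class is already new-style.

```python
# Both definitions produce identical classes; the parens add nothing.
class Foo():  # old spelling, flagged by the linter
  pass

class Foo:    # preferred spelling from this commit
  pass
```
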
geohotstan
1bbf26d7fd fix try except not catching fxn() in benchmark (#1783)
* have function raise NotImplementedError

* more lines

* revert back to 2 lines :D

* aahhhhhhhh shoooot im stupid

* keep it minimal?
2023-09-06 07:36:43 -07:00
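
The commit messages above suggest the classic shape of this bug; a hedged sketch follows (fxn and the surrounding code are illustrative, not the repo's actual benchmark): an exception is only caught if the call that raises it happens inside the try block.

```python
def fxn():
  raise NotImplementedError("backend op not supported")

# broken: fxn() runs while building the argument list, before try is entered
# result = benchmark(fxn())

# fixed: the call itself is made inside the try, so except can see it
try:
  result = fxn()
except NotImplementedError:
  result = None  # skip this benchmark entry
```
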
badcc
ee9ac20752 Use correct dtype in Tensor when data is an ndarray (#1785)
* use correct dtype in Tensor when data is an ndarray

* attempt 2

* add assert to be consistent

* Add test case for ndarray

* Add test case for list

* remove whitespace
2023-09-06 07:35:32 -07:00
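
What ee9ac20752's tests presumably pin down, sketched here (the Tensor lines are assumptions about tinygrad's constructor and are left as comments): a Tensor built from an ndarray should inherit that ndarray's dtype rather than fall back to the float32 default.

```python
import numpy as np

data = np.arange(6, dtype=np.int32)
print(data.dtype)  # int32

# assumed tinygrad behavior after this commit:
# from tinygrad.tensor import Tensor
# t = Tensor(data)
# assert t.dtype == dtypes.int32   # not the float32 default
```
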
Pavol Rusnak
a50a7ef6f2 revert typo in external_multi_gpu.py (#1777)
introduced by fb1cc6bf4b
2023-09-05 20:46:28 -07:00
George Hotz
89a8a02697 disable openpilot model in model benchmark 2023-09-05 13:32:30 -07:00
geohotstan
9af5645ba3 onnx full passing (#1076)
* 1

* 83 failed

* learning how git works

* lol idk

* zero shape aaaa

* space lol

* aaa

* test check

* haha

* fixed gather

* 73 failing

* 71 failing

* 68 failing

* added some debug

* fking resize

* lol

* 62 failing

* 58 failing, did nearest resize hell yeah

* clean up

* 56 failing

* janitor duty

* lol

* 53 failing

* hi mom

* 50 failing

* added linear interp, but coord_trans is wrong

* did lin interpolation woohoo

* 43 failing

* 40 failing

* temporary Gather fix

* 39 failing

* fixed slice onnxver<10

* 37 failing

* 35 failing

* excluded tests that use float64

* 32 failing with hacks

* added _batchnorm() for 3D/5D batchnorm, 29 failing

* changed ALLOWED_KERNEL_COUNT from 199 to 207

* added improved Gather op, reverted ALLOWED_KERNEL_COUNT commit

* support Round op

* added storage_order/indices maxpool, 27 failing

* support maxunpool, 25 failures

* support Gradient, 23 failures

* merged new where

* added Adam

* cleanups

* added Momentum and Nesterov Momentum

* added Adagrad

* support sequence_type, 20 failing

* ugh git

* I give up on cubic interp :D, 9 failing

* sexy 1 liner gather, much improved, wow

* polished gather to make it shine bright like a diamond

* clean 1 liner for gather

* improved readability of gather

* uhh

* clean up

* more clean up

* WHITEspace

* implemented SoftmaxCrossEntropyLoss op

* added comments and cleaned up if statements

* update

* thank based wozeparrot for pow and new GatherElements

* CPU and TORCH all pass | cast float64 -> float32 for all fromCPU()

* _nearest_gather() failing on yolo

* reverted ops_cpu change and added assert in Resize

* added comments for resize for multiple channels

* oops

* merge

* test

* switched np.pad to Tensor.pad for constant padding

* gah

* gah2

* sexy reflect pad with movementops -> add

* delete commented out lines

* edge mode pad sexy as well

* trying out model_benchmark

* revert gitignore change lol

* init

* Revert "init"

This reverts commit 682bf2073a.

* wrote cast workaround for CPU, CPU and TORCH all pass

* wrote cast workaround for CPU, CPU and TORCH all pass

* skipped tests w/ 0 shape for METAL and GPU

* excluded tests for CLANG, CPU, TORCH, CLANG pass

* fixed hacky ConvTranspose

* gotta figure out autopad

* UOps.STORE support cast bool -> float

* small fix for fast gather

* reverted 0 shape skipped tests

* oops missed a file

* added comment

* fixed slice op hack

* First commit to pr

* More trig ops

* More trig ops

* format

* isinf support

* More ops

* changed onnx_ops to use our new gather :D

* Det op bug fix

* rebase

* fixed some tests

* det broken and slow

* fixed compress to use new gather

* implemented argmax argmin

* support variable types in type_proto

* support Upsample and Identity sequence

* we support float64 now and tinygrad supports automatic broadcasting

* added EyeLike op

* resize does support multiple channels now actually

* yolov8 onnx runs successfully

* added batch size 1

* oops

* finally fixed type_proto I think

* fixed some llvm bugs

* del whitespaces

* added ZenginU Format PR

* test

* oops

* added float64 exclude tests back

* more skipped tests

* try

* ok openpilot pass

* flake8 pass

* woooooohooo

* revert external_model_benchmark changes

* perf tested gather

* removed promote types from ops_cpu

* numerical errors from #1681 are fixed

---------

Co-authored-by: ZenginU <umutzengin00@gmail.com>
2023-09-05 13:23:32 -07:00
George Hotz
fb1cc6bf4b llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec

* jit not default in CI
2023-09-05 10:12:16 -07:00
George Hotz
63c46e0287 Parens and gls (#1768)
* more paren stripping

* remove global and local size from renderers

* complex strip parens

* extra helpers + minor webgpu fix

* fix test uops

* one more parens test
2023-09-04 16:09:01 -07:00
Adrian Kretz
3473c9e88d Metal conv tensor cores (#1696)
* Benchmark 5x5 conv kernel which is optimized

* Use Metal tensor cores in 2d convs
2023-09-04 15:14:46 -07:00
tomtom-95
7344f7c2d1 KeyError fixed. (#1763) 2023-09-04 15:36:16 -04:00
nimlgen
f863c12610 test kopt correctness (#1756)
* test kopt correctness

* bump BUDGET to 20

* kopt hooks as setUp/tearDown
2023-09-04 10:55:00 -07:00
George Hotz
c6d5d45a2b Remove MemOp (#1750)
* start removing memop

* locals

* support both stores

* might be correct

* remove parens on shape ish

* fix metal ops

* render load and render store

* fix image

* maybe fix asm

* fix test uops

* revert asm

* remove memop itself
2023-09-04 09:58:33 -07:00
chenyu
b8fde6bb0f Test KOPT in CI (#1744)
* test kopt in ci

* getenv takes dtype from default
2023-09-03 14:37:20 -07:00
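
A sketch of what "getenv takes dtype from default" plausibly means (this helper is an assumption, not necessarily tinygrad's exact code): the environment string is cast to the type of the supplied default, so integer and float flags parse consistently.

```python
import os

def getenv(key, default=0):
  # cast the env string to whatever type the default has
  return type(default)(os.getenv(key, default))

os.environ["KOPT"] = "2"
print(getenv("KOPT", 0))      # 2 (int, because the default is an int)
print(getenv("BUDGET", 1.0))  # 1.0 (float default -> float result)
```
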
George Hotz
ed194a1d3b zero fold (#1748)
* add constant fold

* err, it's just zero folding

* self store fold + caching

* prints and more folds

* simpler winograd kernels

* remove childless uops
2023-09-03 13:48:11 -07:00
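
A toy illustration of zero folding in the spirit of ed194a1d3b (the tuple mini-IR is made up, not tinygrad's UOps): identities like x*0 = 0 and x+0 = x let the optimizer delete work before any code is generated.

```python
def zero_fold(op):
  kind, a, b = op
  if kind == "mul" and (a == 0 or b == 0): return 0  # x*0 folds to the constant 0
  if kind == "add" and a == 0: return b              # 0+x folds to x
  if kind == "add" and b == 0: return a              # x+0 folds to x
  return op                                          # nothing to fold

print(zero_fold(("mul", "x", 0)))    # 0
print(zero_fold(("add", "x", 0)))    # x
print(zero_fold(("add", "x", "y")))  # ('add', 'x', 'y')
```
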
George Hotz
e17b1af160 UnaryOps.NEG (#1749) 2023-09-03 12:44:26 -07:00
David Hou
3151d91f6e 3x3 winograd convs (#1675)
* winograd

* simplify local groups code

* comment

* respects self.opts.has_local

* always simplify ones

* make mypy happy

* move reshape, WINO flag

* wino flag, simple forward backward test for wino

* extra wino test

* merge oops

* comments

* axis_needs_valid -> axis_is_masked

* don't delete needs_valid (it's unused though)

* make linter happy

* make linter happy

* smaller test

* change number

* make wino tests very small
2023-09-03 07:29:43 -07:00
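
A numerical sanity check of Winograd F(2,3), the 1D building block behind 3x3 winograd convs like those in 3151d91f6e (matrices from Lavin & Gray; the data is arbitrary): two outputs of a 3-tap filter cost 4 elementwise multiplies instead of 6.

```python
import numpy as np

Bt = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G  = np.array([[1, 0, 0], [.5, .5, .5], [.5, -.5, .5], [0, 0, 1]], dtype=float)
At = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

d = np.array([1., 2., 3., 4.])  # input tile of 4 samples
g = np.array([.5, 1., -1.])     # 3-tap filter

m = (G @ g) * (Bt @ d)          # 4 multiplies in the transform domain
y = At @ m                      # 2 filter outputs
ref = np.correlate(d, g, mode="valid")  # direct method: 6 multiplies
assert np.allclose(y, ref)
```
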
geohotstan
e36148b1ce Make __getitem__ TINYer (#1661) 2023-09-02 23:01:01 -04:00
Yixiang Gao
66a6bbd029 codellama (#1702)
* add codellama with pre-downloaded weights

* add rope_theta, fix param

* fix test

* add 7B-Python

* add 7B-Instruct

* replace single quotes with double

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-02 08:45:12 -07:00
chenyu
a2745819f6 faster gpt2 jit path and gpt2 in test_real_world (#1738) 2023-09-02 08:39:12 -07:00
George Hotz
91258aa67f render const (#1736)
* render const

* remove constop

* fix llvm and webgpu

* disable consts in llvm again

* assembly special

* fix const rendering

* fix arm64

* imms are int

* fix ptx

* fix arm64
2023-09-01 19:01:43 -07:00
George Hotz
cd844ec4b2 remove Token class (#1723)
* no fusion

* no float4 grouping

* mulacc fusion is fine. remove uop_alu

* fully remove get_grouped_maybe_float4

* removed that test

* that's not float4 anymore

* disable failing arm64

* metal ops pass tokenless

* fix wmma

* update test_uops with new style

* fix gep

* fix float4 store

* fix float4 store more

* cuda tests pass

* disable broadcast pow

* fix ptx

* reenable arm64

* bring cse back

* don't cache the acc

* fix ptx bug
2023-09-01 12:53:07 -07:00
George Hotz
458eb89463 minor changes from prerender (#1734) 2023-09-01 10:04:47 -07:00
chenyu
f964b9e5ee visitor pattern for sym_infer and unit tests (#1733)
* visitor pattern for sym_infer and unit tests

* comments
2023-09-01 09:47:45 -07:00
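
A minimal sketch of a visitor-style sym_infer (the node classes are illustrative stand-ins for tinygrad's symbolic AST): dispatch on node type through a handler table, recursing until every symbol is a concrete int.

```python
class Variable:
  def __init__(self, name): self.name = name

class SumNode:
  def __init__(self, nodes): self.nodes = nodes

def sym_infer(node, var_vals):
  handlers = {
    int: lambda n: n,                                             # literals are done
    Variable: lambda n: var_vals[n],                              # look up bound value
    SumNode: lambda n: sum(sym_infer(x, var_vals) for x in n.nodes),
  }
  return handlers[type(node)](node)

i = Variable("i")
print(sym_infer(SumNode([i, 2]), {i: 5}))  # 7
```
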
JaSpa99
024dd690fa Reactivate commavq/gpt2m benchmark (#1731)
* get commavq/gpt2m from huggingface

* increase tols
2023-09-01 06:45:08 -07:00
George Hotz
5c403d43b9 New >3 indexing (#1729)
* move reindexing into linearizer

* get_grouped_dims

* don't limit for clang
2023-08-31 21:24:15 -07:00
George Hotz
e3a062ad17 real matvec test 2023-08-31 17:27:25 -07:00
Karan Handa
a8aa13dc91 [ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib

* safe convert dirnames to pathlib

* replace all os.path.join

* fix cuda error

* change main chunk

* Reviewer fixes

* fix vgg

* Fixed everything

* Final fixes

* ensure consistency

* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
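
A hedged sketch of the os.path -> pathlib migration in a8aa13dc91 (the paths are invented): joins, dirnames, and parent chains map one-to-one onto Path operators.

```python
import os.path
from pathlib import Path

# before: string plumbing with os.path
old = os.path.join(os.path.dirname(__file__), "weights", "model.bin")

# after: the identical path built with pathlib
new = Path(__file__).parent / "weights" / "model.bin"
assert str(new) == old

# "parent.parent" chains collapse to parents, per the last bullet above
assert Path(__file__).parents[1] == Path(__file__).parent.parent
```
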
nimlgen
355b02dc3f allow zerosized tensors (#1659)
* allow zerosized tensors

* works with numpy
2023-08-30 10:39:24 -07:00
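
What "allow zerosized tensors" enables, shown with the numpy side of 355b02dc3f (the tinygrad lines are assumptions and left as comments): a shape containing 0 is legal and carries zero elements.

```python
import numpy as np

x = np.empty((0, 4), dtype=np.float32)
print(x.shape, x.size)  # (0, 4) 0

# assumed tinygrad equivalent after this commit:
# from tinygrad.tensor import Tensor
# t = Tensor(x)           # previously rejected, now a valid zero-sized buffer
# print(t.numpy().shape)  # (0, 4)
```
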
Max Hahn
f9cb31fdc2 added visitor pattern (#1669)
* added visitor pattern

* pylint bug workaround

* added tests, made abstract OpNode inherit from ABC

* fixed assert

* fix check of abstract classes in negative test

* remove assert False
2023-08-30 09:03:44 -07:00
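
A minimal sketch of the visitor pattern as the commit describes it, with an abstract OpNode inheriting from ABC (the concrete node and visitor names are invented for illustration): each node forwards to a type-specific visit method, so new traversals need no changes to the node classes.

```python
from abc import ABC, abstractmethod

class OpNode(ABC):
  @abstractmethod
  def accept(self, visitor): ...

class Const(OpNode):
  def __init__(self, val): self.val = val
  def accept(self, visitor): return visitor.visit_const(self)

class Add(OpNode):
  def __init__(self, a, b): self.a, self.b = a, b
  def accept(self, visitor): return visitor.visit_add(self)

class Evaluator:  # one traversal; a printer or type-checker would be another
  def visit_const(self, node): return node.val
  def visit_add(self, node): return node.a.accept(self) + node.b.accept(self)

print(Add(Const(2), Const(3)).accept(Evaluator()))  # 5
```
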
chenyu
ac183568be llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock

* idea

* move 2 reshapes to jitted function

shrink inside jitted too, 6.3ms

remove back reshapes, 5.5ms

isinstance -> __class__ 4.99ms

* think

revert ops_gpu.py

revert symbolic.py too

PYOPENCL_COMPILER_OUTPUT=1

* cleanup

* fix cache shape for conversational model

only reshape if start_pos > 0

* small cleanup

* include var_vals.keys() to st.key

* add comments

* llama small update

* everything jitted again, similar structure to gpt2

* fix typing

* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
nimlgen
8844a0a822 llvm jitted (#1652) 2023-08-28 20:22:44 -07:00
nimlgen
1c0449e190 add cache collector (#1595)
* init cache collector

* add test_cache_collector.py

* switch GlobalCounters.cache to CacheCollector

* init jit models test

* jitted SD

* add debug msg to print loaded bufs count

* moved cache collector to jit

* clearer SD

* no double device import
2023-08-28 19:59:55 -07:00
qazal
3515ba4f23 add dtypes test (#1682) 2023-08-28 08:12:15 -07:00
chenyu
66fbf4800b fix symbolic_ops tests with Tensor.training=True (#1686) 2023-08-26 23:19:56 -04:00
chenyu
b5d700adae update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4

* update tests for the new model

* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
Jordan Wright
25be7f745d Tensor.uniform with dtype=int bug fix (#1593) 2023-08-26 01:59:53 -04:00
George Hotz
1b8c40234f Uast start (#1650)
* work

* more tests

* more tests 2

* don't break it
2023-08-23 12:00:06 -07:00
George Hotz
a6d842af7a move device to ops (#1646)
* move device to ops

* mlops types

* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b do replace div->mul for non-floats (#1644) 2023-08-23 07:34:31 -07:00
George Hotz
c831218139 Optional: Reduce line count and simplify the LazyBuffer interface (#1642)
* less lines in lazybuffer, def e

* custom function

* cast

* reorder functions

* lb type
2023-08-22 21:01:10 -07:00