George Hotz
fb1cc6bf4b
llama jit is default, print tok/sec (#1774)
* llama jit is default, print tok/sec
* jit not default in CI
2023-09-05 10:12:16 -07:00
George Hotz
63c46e0287
Parens and gls (#1768)
* more paren stripping
* remove global and local size from renderers
* complex strip parens
* extra helpers + minor webgpu fix
* fix test uops
* one more parens test
2023-09-04 16:09:01 -07:00
Adrian Kretz
3473c9e88d
Metal conv tensor cores (#1696)
* Benchmark 5x5 conv kernel which is optimized
* Use Metal tensor cores in 2d convs
2023-09-04 15:14:46 -07:00
tomtom-95
7344f7c2d1
KeyError fixed. (#1763)
2023-09-04 15:36:16 -04:00
nimlgen
f863c12610
test kopt correctness (#1756)
* test kopt correctness
* bump BUDGET to 20
* kopt hooks as setUp/tearDown
2023-09-04 10:55:00 -07:00
George Hotz
c6d5d45a2b
Remove MemOp (#1750)
* start removing memop
* locals
* support both stores
* might be correct
* remove parens on shape ish
* fix metal ops
* render load and render store
* fix image
* maybe fix asm
* fix test uops
* revert asm
* remove memop itself
2023-09-04 09:58:33 -07:00
chenyu
b8fde6bb0f
Test KOPT in CI (#1744)
* test kopt in ci
* getenv takes dtype from default
2023-09-03 14:37:20 -07:00
George Hotz
ed194a1d3b
zero fold (#1748)
* add constant fold
* err, it's just zero folding
* self store fold + caching
* prints and more folds
* simpler winograd kernels
* remove childless uops
2023-09-03 13:48:11 -07:00
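A toy sketch of the zero folding this PR adds (illustrative, not tinygrad's code): an ALU op with a zero operand can be folded away at graph-build time, and uops left childless after folding can then be dropped entirely.

```python
def zero_fold(op: str, srcs: tuple):
    # x * 0 -> 0: the multiply never has to be emitted
    if op == "MUL" and 0 in srcs: return 0
    # x + 0 -> x: hand back the surviving operand (or 0 if both were zero)
    if op == "ADD" and 0 in srcs: return max(srcs, key=lambda s: s != 0)
    return None  # no fold applies; the caller emits the op as usual

assert zero_fold("MUL", (42, 0)) == 0
assert zero_fold("ADD", (0, 7)) == 7
```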
George Hotz
e17b1af160
UnaryOps.NEG (#1749)
2023-09-03 12:44:26 -07:00
David Hou
3151d91f6e
3x3 winograd convs (#1675)
* winograd
* simplify local groups code
* comment
* respects self.opts.has_local
* always simplify ones
* make mypy happy
* move reshape, WINO flag
* wino flag, simple forward backward test for wino
* extra wino test
* merge oops
* comments
* axis_needs_valid -> axis_is_masked
* don't delete needs_valid (it's unused though)
* make linter happy
* make linter happy
* smaller test
* change number
* make wino tests very small
2023-09-03 07:29:43 -07:00
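This PR gates the kernels behind the WINO flag. The construction underneath is the standard Winograd F(2,3) transform; here is a reference check of the 1D building block (a sketch, not the PR's kernels): two outputs of a 3-tap convolution from 4 multiplies instead of the naive 6. Nesting it in 2D gives F(2x2,3x3), 16 multiplies against 36.

```python
import numpy as np

def winograd_f23(d, g):
    # F(2,3): 4 multiplies for 2 outputs of a 1D 3-tap convolution
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d, g = np.random.randn(4), np.random.randn(3)
naive = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                  d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), naive)
```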
geohotstan
e36148b1ce
Make __getitem__ TINYer (#1661)
2023-09-02 23:01:01 -04:00
Yixiang Gao
66a6bbd029
codellama (#1702)
* add codellama with pre-downloaded weights
* add rope_theta, fix param
* fix test
* add 7B-Python
* add 7B-Instruct
* replace single quotes with double
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-09-02 08:45:12 -07:00
chenyu
a2745819f6
faster gpt2 jit path and gpt2 in test_real_world (#1738)
2023-09-02 08:39:12 -07:00
George Hotz
91258aa67f
render const (#1736)
* render const
* remove constop
* fix llvm and webgpu
* disable consts in llvm again
* assembly special
* fix const rendering
* fix arm64
* imms are int
* fix ptx
* fix arm64
2023-09-01 19:01:43 -07:00
George Hotz
cd844ec4b2
remove Token class (#1723)
* no fusion
* no float4 grouping
* mulacc fusion is fine. remove uop_alu
* fully remove get_grouped_maybe_float4
* removed that test
* that's not float4 anymore
* disable failing arm64
* metal ops pass tokenless
* fix wmma
* update test_uops with new style
* fix gep
* fix float4 store
* fix float4 store more
* cuda tests pass
* disable broadcast pow
* fix ptx
* reenable arm64
* bring cse back
* don't cache the acc
* fix ptx bug
2023-09-01 12:53:07 -07:00
George Hotz
458eb89463
minor changes from prerender (#1734)
2023-09-01 10:04:47 -07:00
chenyu
f964b9e5ee
visitor pattern for sym_infer and unit tests (#1733)
* visitor pattern for sym_infer and unit tests
* comments
2023-09-01 09:47:45 -07:00
JaSpa99
024dd690fa
Reactivate commavq/gpt2m benchmark (#1731)
* get commavq/gpt2m from huggingface
* increase tols
2023-09-01 06:45:08 -07:00
George Hotz
5c403d43b9
New >3 indexing (#1729)
* move reindexing into linearizer
* get_grouped_dims
* don't limit for clang
2023-08-31 21:24:15 -07:00
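Backends like Metal and CUDA launch at most 3D grids, so when a kernel has more than three output dims the extras get folded together and recovered in-kernel with div/mod. A toy of the grouping step (the group_dims helper below is hypothetical, not the real get_grouped_dims):

```python
def group_dims(dims: list[int], limit: int = 3) -> list[int]:
    # greedily fold leading dims until at most `limit` remain; the kernel
    # splits a grouped index i back out as (i // inner, i % inner)
    dims = list(dims)
    while len(dims) > limit:
        dims[:2] = [dims[0] * dims[1]]
    return dims

assert group_dims([2, 3, 4, 5]) == [6, 4, 5]
```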
George Hotz
e3a062ad17
real matvec test
2023-08-31 17:27:25 -07:00
Karan Handa
a8aa13dc91
[ready] Replacing os with pathlib (#1708)
* replace os.path with pathlib
* safe convert dirnames to pathlib
* replace all os.path.join
* fix cuda error
* change main chunk
* Reviewer fixes
* fix vgg
* Fixed everything
* Final fixes
* ensure consistency
* Change all parent.parent... to parents
2023-08-30 10:41:08 -07:00
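The flavor of the conversion, with illustrative paths rather than actual repo paths:

```python
import os
from pathlib import Path

# before: string surgery through os.path
ckpt = os.path.join(os.path.dirname(__file__), "weights", "model.bin")

# after: the pathlib equivalents this PR swaps in
ckpt = Path(__file__).parent / "weights" / "model.bin"
two_up = Path(__file__).parents[1]  # "parent.parent" collapsed to parents[1]
```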
nimlgen
355b02dc3f
allow zerosized tensors (#1659)
* allow zerosized tensors
* works with numpy
2023-08-30 10:39:24 -07:00
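A minimal sketch of what this enables, assuming the Tensor constructor accepts numpy arrays and .numpy() round-trips as elsewhere in the repo:

```python
import numpy as np
from tinygrad.tensor import Tensor

t = Tensor(np.empty((0, 3), dtype=np.float32))  # zero-sized first dimension
assert t.shape == (0, 3)
assert t.numpy().shape == (0, 3)  # "works with numpy": shape survives the round trip
```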
Max Hahn
f9cb31fdc2
added visitor pattern (#1669)
* added visitor pattern
* pylint bug workaround
* added tests, made abstract OpNode inherit from ABC
* fixed assert
* fix check of abstract classes in negative test
* remove assert False
2023-08-30 09:03:44 -07:00
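The commit mentions an abstract OpNode inheriting from ABC; the generic shape of that pattern looks like this (everything besides the OpNode name is hypothetical):

```python
from abc import ABC, abstractmethod

class OpNode(ABC):
    @abstractmethod
    def accept(self, visitor): ...

class Const(OpNode):
    def __init__(self, val): self.val = val
    def accept(self, visitor): return visitor.visit_const(self)

class Add(OpNode):
    def __init__(self, a, b): self.a, self.b = a, b
    def accept(self, visitor): return visitor.visit_add(self)

class Evaluator:  # one visitor; a renderer or printer would be another
    def visit_const(self, node): return node.val
    def visit_add(self, node): return node.a.accept(self) + node.b.accept(self)

assert Add(Const(2), Const(3)).accept(Evaluator()) == 5
```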
chenyu
ac183568be
llama JIT python runtime speedup (#1633)
* no JIT call in TransformerBlock
* idea
* move 2 reshapes to jitted function
  shrink inside jitted too, 6.3ms
  remove back reshapes, 5.5ms
  isinstance -> __class__ 4.99ms
* think
  revert ops_gpu.py
  revert symbolic.py too
  PYOPENCL_COMPILER_OUTPUT=1
* cleanup
* fix cache shape for conversational model
  only reshape if start_pos > 0
* small cleanup
* include var_vals.keys() to st.key
* add comments
* llama small update
* everything jitted again, similar structure to gpt2
* fix typing
* add TODO for in place update cache
2023-08-30 07:51:05 -07:00
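The "isinstance -> __class__" line in the body above refers to a CPython micro-optimization: an exact-type check skips the subclass walk isinstance does. A quick way to compare (the win is interpreter- and version-dependent):

```python
import timeit

class Node: pass
n = Node()

# isinstance() also matches subclasses; `is` on __class__ is an exact-type
# check, which can be cheaper in a per-token dispatch loop
print(timeit.timeit(lambda: isinstance(n, Node)))
print(timeit.timeit(lambda: n.__class__ is Node))
```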
nimlgen
8844a0a822
llvm jitted (#1652)
2023-08-28 20:22:44 -07:00
nimlgen
1c0449e190
add cache collector (#1595)
* init cache collector
* add test_cache_collector.py
* switch GlobalCounters.cache to CacheCollector
* init jit models test
* jitted SD
* add debug msg to print loaded bufs count
* moved cache collector to jit
* clearer SD
* no double device import
2023-08-28 19:59:55 -07:00
qazal
3515ba4f23
add dtypes test (#1682)
2023-08-28 08:12:15 -07:00
chenyu
66fbf4800b
fix symbolic_ops tests with Tensor.training=True (#1686)
2023-08-26 23:19:56 -04:00
chenyu
b5d700adae
update openpilot supercombo.onnx to 0.9.4 (#1681)
* update openpilot supercombo.onnx to 0.9.4
* update tests for the new model
* comment out comma models from external_model_benchmark
2023-08-26 19:16:08 -04:00
Jordan Wright
25be7f745d
Tensor.uniform with dtype=int bug fix (#1593)
2023-08-26 01:59:53 -04:00
George Hotz
1b8c40234f
Uast start (#1650)
* work
* more tests
* more tests 2
* don't break it
2023-08-23 12:00:06 -07:00
George Hotz
a6d842af7a
move device to ops (#1646)
* move device to ops
* mlops types
* 2 lines
2023-08-23 08:30:17 -07:00
nimlgen
a65ae1198b
do replace div->mul for non-floats (#1644)
2023-08-23 07:34:31 -07:00
George Hotz
c831218139
Optional: Reduce line count and simplify the LazyBuffer interface (#1642)
* less lines in lazybuffer, def e
* custom function
* cast
* reorder functions
* lb type
2023-08-22 21:01:10 -07:00
George Hotz
d25046e66a
matvec tests (#1634)
* matvec tests
* f16
* f16 is broken
2023-08-22 17:33:58 -07:00
George Hotz
643cbdfd50
make embedding and GPT-2 fast (#1631)
* make embedding fast
* jit more, variable shape support
* print mem bw
2023-08-22 15:14:38 -07:00
George Hotz
db8344ab83
add noalias to llvm (#1622)
2023-08-22 09:26:01 -07:00
chenyu
89e13f2f04
support symbols in shrink (#1611)
2023-08-22 09:08:21 -07:00
George Hotz
718ced296c
move state to nn/state (#1619)
2023-08-22 07:36:24 -07:00
George Hotz
86a32ffb1a
lt sum (#1617)
2023-08-21 21:19:16 -07:00
George Hotz
c64c47a6ae
test arange simple
2023-08-21 20:16:17 -07:00
Yixiang Gao
4f02491cd4
add cpu if torch tensor (#1609)
2023-08-21 16:57:59 -07:00
Yixiang Gao
4d54afb6df
sparse cat cross entropy (#1597)
* add sparse cat cross entropy
* minor fix
* add log_softmax into loss function
* add test
* update docs
* fix training loss
* add device
2023-08-21 14:14:54 -07:00
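A numpy reference for what the op computes, with log_softmax folded into the loss as the bullets describe (a sketch, not the PR's code):

```python
import numpy as np

def sparse_categorical_crossentropy(logits: np.ndarray, labels: np.ndarray) -> float:
    # stable log-softmax: subtracting the row max keeps exp() from overflowing
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # "sparse": labels are class indices, not one-hot rows
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, 0.1], [0.2, 3.0, 0.3]])
print(sparse_categorical_crossentropy(logits, np.array([0, 1])))  # both rows correct -> small loss
```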
George Hotz
2e60920317
Revert "sparse cat cross entropy ( #1591 )" ( #1596 )
...
This reverts commit f0ee850e98 .
2023-08-21 10:04:26 -07:00
Yixiang Gao
f0ee850e98
sparse cat cross entropy (#1591)
* add sparse cat cross entropy
* minor fix
* add log_softmax into loss function
* add test
* update docs
2023-08-21 09:56:41 -07:00
Yixiang Gao
8d6662a741
.cpu().numpy() -> .numpy() (#1594)
* .cpu().numpy() -> .numpy()
* restore ops_torch
* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Umut Zengin
35bf21276f
Argmax/Argmin Feature (#1576)
* implemented argmax and argmin
* lint
* lint
* match torch behaviour
* format
* removed flip
2023-08-20 18:46:46 -07:00
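One way to get argmax out of the reductions a tensor frontend already has, with no flip (a sketch of the technique, not the PR's exact code): weight positions in reverse so ties resolve to the first maximal index, as in numpy.

```python
import numpy as np

def argmax_from_reductions(x: np.ndarray, axis: int) -> np.ndarray:
    n = x.shape[axis]
    shape = [1] * x.ndim
    shape[axis] = n
    rev = np.arange(n - 1, -1, -1).reshape(shape)  # n-1 ... 0 along `axis`
    mask = x == x.max(axis=axis, keepdims=True)    # 1 wherever a max sits
    return n - 1 - (mask * rev).max(axis=axis)     # first max gets the largest weight

x = np.array([[3, 5, 5], [9, 1, 9]])
assert (argmax_from_reductions(x, 1) == np.argmax(x, 1)).all()  # -> [1, 0]
```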
George Hotz
012ee7d162
not worth the speed (#1584)
* not worth the speed
* no slots
* uops comments
* bump to python 3.11 for speed
* add critical slots back
2023-08-20 10:24:58 -07:00
George Hotz
739f327d2d
Shorter (#1582)
* deleting lines
* remove insert dims
* if statement is never hit
* bug fixes
2023-08-20 08:12:16 -07:00
David Hou
4fbce972d7
CSE at uop level (#1483)
* uop-level cse
* add test
* don't cache reduce alu ops
* types
* rename variable
* fix
* delete lines
2023-08-19 23:40:40 -07:00
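The shape of uop-level CSE (a toy, not the PR's code): key each candidate on its identity and reuse the cached uop when the same key recurs, skipping anything that must stay distinct, mirroring "don't cache reduce alu ops".

```python
cache: dict = {}

def add_uop(op, dtype, srcs: tuple, args, cacheable=True):
    # identical (op, dtype, srcs, args) means an identical value: reuse it
    key = (op, dtype, srcs, args)
    if cacheable and key in cache:
        return cache[key]
    uop = (op, dtype, srcs, args)  # stand-in for a real UOp object
    if cacheable:
        cache[key] = uop
    return uop

a = add_uop("ALU", "float", ("x", "y"), "ADD")
b = add_uop("ALU", "float", ("x", "y"), "ADD")
assert a is b  # the second request got the cached uop, not a new one
```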