tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 14:58:46 -05:00

Author	SHA1	Message	Date
George Hotz	e17b1af160	UnaryOps.NEG (#1749 )	2023-09-03 12:44:26 -07:00
George Hotz	9f1a54acee	pretty kernel in cstyle (#1746 ) * pretty kernel in cstyle * fix mem estimate * that made it slower * Revert "that made it slower" This reverts commit `faa4cd0187`.	2023-09-03 10:21:02 -07:00
George Hotz	e910e0e62c	folding mul by 0 (#1743 ) * why doesn't this work * zero mlop * explicit fold in winograd	2023-09-03 09:04:12 -07:00
David Hou	3151d91f6e	3x3 winograd convs (#1675 ) * winograd * simplify local groups code * comment * respects self.opts.has_local * always simplify ones * make mypy happy * move reshape, WINO flag * wino flag, simple forward backward test for wino * extra wino test * merge oops * comments * axis_needs_valid -> axis_is_masked * don't delete needs_valid (it's unused though) * make linter happy * make linter happy * smaller test * change number * make wino tests very small	2023-09-03 07:29:43 -07:00
crankygrumpster	c8025c319c	Remove Token from abstractions.py (#1741 ) * Remove Token from abstractions.py, update output string * add dtype	2023-09-02 21:56:11 -07:00
geohotstan	e36148b1ce	Make __getitem__ TINYer (#1661 )	2023-09-02 23:01:01 -04:00
Roelof van Dijk	60590cf8b5	perf: create buffer only when needed (#1684 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-09-02 17:43:29 -07:00
Yixiang Gao	66a6bbd029	codellama (#1702 ) * add codellama with pre-downloaded weights * add rope_theta, fix param * fix test * add 7B-Python * add 7B-Instruct * replace single quotes with doulbe --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-09-02 08:45:12 -07:00
chenyu	a2745819f6	faster gpt2 jit path and gpt2 in test_real_world (#1738 )	2023-09-02 08:39:12 -07:00
George Hotz	89cd380bfc	add nvidia CI (#1737 ) * add nvidia * speed(nvidia)	2023-09-01 22:02:30 -07:00
George Hotz	91258aa67f	render const (#1736 ) * render const * remove constop * fix llvm and webgpu * disable consts in llvm again * assembly special * fix const rendering * fix arm64 * imms are int * fix ptx * fix arm64	2023-09-01 19:01:43 -07:00
nimlgen	a96e54d8bb	search for grouped reduces (#1732 )	2023-09-01 14:21:10 -07:00
George Hotz	cd844ec4b2	remove Token class (#1723 ) * no fusion * no float4 grouping * mulacc fusion is fine. remove uop_alu * fully remove get_grouped_maybe_float4 * removed that test * that's not float4 anymore * disable failing arm64 * metal ops pass tokenless * fix wmma * update test_uops with new style * fix gep * fix float4 store * fix float4 store more * cuda tests pass * disable broadcast pow * fix ptx * reenable arm64 * bring cse back * don't cache the acc * fix ptx bug	2023-09-01 12:53:07 -07:00
George Hotz	458eb89463	minor changes from prerender (#1734 )	2023-09-01 10:04:47 -07:00
chenyu	f964b9e5ee	visitor pattern for sym_infer and unit tests (#1733 ) * visitor pattern for sym_infer and unit tests * comments	2023-09-01 09:47:45 -07:00
wozeparrot	bf05534c6e	hip multidevice (#1728 ) * feat: hip multidevice support + p2p * feat: default device	2023-09-01 06:46:13 -07:00
JaSpa99	024dd690fa	Reactivate commavq/gpt2m benchmark (#1731 ) * get commavq/gpt2m from huggingface * increase tols	2023-09-01 06:45:08 -07:00
George Hotz	7780eb3c5a	minor dimensions (#1730 )	2023-09-01 06:42:00 -07:00
George Hotz	5c403d43b9	New >3 indexing (#1729 ) * move reindexing into linearizer * get_grouped_dims * don't limit for clang	2023-08-31 21:24:15 -07:00
George Hotz	e3a062ad17	real matvec test	2023-08-31 17:27:25 -07:00
George Hotz	453e437598	move stuff in the linearizer (#1726 ) * move stuff in linearizer * move stuff in linearizer * minor * fix opts import	2023-08-31 14:42:09 -07:00
George Hotz	c18a497dde	minor global dim cleanup (#1724 )	2023-08-31 12:23:39 -07:00
geohotstan	94b1257f5e	Changed DEVICE to Device.DEFAULT in deep_determinist_policy_gradient (#1715 ) * added device in optim and deep * oops forgot to del print code * use Device.DEFAULT instead * removed device	2023-08-31 07:08:51 -07:00
nimlgen	b5cf274da3	remove memory peak for quantized llama (#1720 )	2023-08-30 16:32:30 -04:00
chenyu	e4eb5d55c7	critical realize for unjitted llama (#1718 )	2023-08-30 14:52:32 -04:00
George Hotz	cd7ceed914	gpt2: print total instead of sync time	2023-08-30 10:59:42 -07:00
Roelof van Dijk	62536d6000	perf: use enumerate where possible (#1692 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-30 10:41:51 -07:00
Karan Handa	a8aa13dc91	[ready] Replacing os with pathlib (#1708 ) * replace os.path with pathlib * safe convert dirnames to pathlib * replace all os.path.join * fix cuda error * change main chunk * Reviewer fixes * fix vgg * Fixed everything * Final fixes * ensure consistency * Change all parent.parent... to parents	2023-08-30 10:41:08 -07:00
nimlgen	355b02dc3f	allow zerosized tensors (#1659 ) * allow zerosized tensors * works with numpy	2023-08-30 10:39:24 -07:00
Max Hahn	f9cb31fdc2	added visitor pattern (#1669 ) * added visitor pattern * pylint bug workaround * added tests, made abstract OpNode inherit from ABC * fixed assert * fix check of abstract classes in negative test * remove assert False	2023-08-30 09:03:44 -07:00
George Hotz	fdd7f282cb	Reenable tensor cores for self-hosted Mac CI (#1717 ) * debug 5 matmul * allow tensor cores in CI * tensor cores on arm64 * put debug back	2023-08-30 07:53:04 -07:00
chenyu	ac183568be	llama JIT python runtime speedup (#1633 ) * no JIT call in TransformerBlock * idea * move 2 reshapes to jitted function shrink inside jitted too, 6.3ms remove back reshapes, 5.5ms isinstance -> __class__ 4.99ms * think revert ops_gpu.py revert symbolic.py too PYOPENCL_COMPILER_OUTPUT=1 * cleanup * fix cache shape for conversational model only reshape if start_pos > 0 * small cleanup * include var_vals.keys() to st.key * add comments * llama small update * everything jitted again, similar structure to gpt2 * fix typing * add TODO for in place update cache	2023-08-30 07:51:05 -07:00
Umut Zengin	1682e9a38a	Fix: Stable Diffusion index (#1713 )	2023-08-30 00:21:10 -04:00
wozeparrot	2f768e386d	stable diffusion benchmark artifact (#1714 )	2023-08-29 21:08:40 -04:00
George Hotz	0ea22bf249	remove DEBUG=1 from stable diffusion AMD since jit cache is fixed	2023-08-29 12:46:12 -07:00
George Hotz	ab9b9ff3e2	pipefail benchmark (#1709 ) (#1710 ) * feat: specify shell * feat: specify shell for mac Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2023-08-29 08:15:02 -07:00
George Hotz	aa7c98722b	sd timing (#1706 )	2023-08-28 20:22:57 -07:00
nimlgen	8844a0a822	llvm jitted (#1652 )	2023-08-28 20:22:44 -07:00
nimlgen	1c0449e190	add cache collector (#1595 ) * init cache collector * add test_cache_collector.py * switch GlobalCounters.cache to CacheCollector * init jit models test * jitted SD * add debug msg to print loaded bufs count * moved cache collctor to jit * clearer SD * no double device import	2023-08-28 19:59:55 -07:00
George Hotz	f5f8b09c13	allow manual release (#1704 )	2023-08-28 17:54:25 -07:00
George Hotz	715047a1e4	fix release publish (#1703 )	2023-08-28 17:48:00 -07:00
Olivier Chafik	ee6d8de2dc	Llama: load models in HuggingFace format (incl. indexed, safetensors) (#1583 )	2023-08-28 15:11:40 -04:00
qazal	3515ba4f23	add dtypes test (#1682 )	2023-08-28 08:12:15 -07:00
Roelof van Dijk	50f669e43b	[ready] perf: simpler Tensor init (#1679 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-27 22:18:03 -04:00
Roelof van Dijk	b66f54e379	perf: avoid reshaping if not necessary (#1683 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-27 14:17:04 -04:00
Roelof van Dijk	328cf2e86a	perf: remove cast and revert back to isinstance (#1694 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-27 14:15:52 -04:00
wozeparrot	8b354b3f73	feat: version bump! (#1687 ) v0.7.0	2023-08-27 12:38:58 -04:00
Roelof van Dijk	abaa605f71	[ready] perf: start enumerate at 1 instead of checking all i (#1691 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-27 12:00:32 -04:00
Roelof van Dijk	2730ed657f	perf: faster lazyop eq (#1693 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-27 11:17:02 -04:00
Roelof van Dijk	6ca509a485	perf: constant in while in for in busy func (#1688 ) Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>	2023-08-27 11:13:16 -04:00

10417 Commits