* Allow multi-input model export
* Add model export unit test
* Fix efficientnet compilation
* Only run model export test on JIT supported devices
* Skip export model test if not EXPORT_SUPPORTED_DEVICE
* simplify gpt2 example
* kernel_jitted_count and jit tests
* Revert "kernel_jitted_count and jit tests"
This reverts commit 31a3c26dd0.
* all_jitted test in test_real_world
* added missing colon
* bug fixes for cifar10 dataset loading
needed a reshape to work with the conv layers, and the fetched tensor must be resolved to numpy since downstream code expects a numpy array
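The fix amounts to a couple of lines; a minimal sketch, assuming the batch arrives as a flat (N, 3072) tensor (the function and variable names here are illustrative, not the repo's):

```python
import numpy as np
from tinygrad.tensor import Tensor

def fetch_cifar_batch(X: Tensor, Y: Tensor, bs: int = 512) -> tuple[np.ndarray, np.ndarray]:
  # reshape flat (N, 3072) rows to NCHW (N, 3, 32, 32) so the conv layers accept them
  x = X[:bs].reshape(-1, 3, 32, 32)
  # resolve the fetched tensors to numpy, since the downstream training code expects ndarrays
  return x.numpy(), Y[:bs].numpy()
```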
* lazy cleanups
* ast functions take in LazyOps
* op instead of self.op
* _base for mops
* fix contiguous
* start schedule
* test_schedule
* fix openpilot
* more tests
* bugfix and test skip
* work
* make sure things get freed
* fix zerosized tensors
* fix failing test
* fix ceil and friends
* fix openpilot
* disable training
* disable test collectives
* small helps
* got something working
* faster?
* faster yes
* cleanup
* cleanup
* cleanup
* Fix non jit
* Fix fp16 and some cleanup
* cleanup
* similar to master
* cleanup
* change reduceop heuristics
* add model ema and jit hack
* add ema eval
* have to create a duplicate eval function for jit
* remove manual seed
* 94% achievable with normal eval
* ema outputs the same results as the normal eval
* fix ema bug
* ema achieves 94% with fixed seed
* multigpu tested
* constant fold decay, fix jit, adjust message for multigpu
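The EMA commits describe a standard exponential moving average over the weights. A rough sketch of the idea, with decay kept as a plain Python float so it constant-folds into the jitted kernels (ModelEMA and its methods are hypothetical names, not the actual class in train_cifar):

```python
from tinygrad.tensor import Tensor

class ModelEMA:
  def __init__(self, params: list[Tensor], decay: float = 0.999):
    # decay is a Python float, not a Tensor: the JIT can bake it into the
    # generated kernels instead of treating it as a runtime input
    self.decay = decay
    self.shadow = [p.detach() for p in params]

  def update(self, params: list[Tensor]):
    for s, p in zip(self.shadow, params):
      # classic EMA step: shadow <- decay * shadow + (1 - decay) * param;
      # assign updates the buffer in place, which plays nicely with the JIT
      s.assign(s.detach() * self.decay + p.detach() * (1.0 - self.decay)).realize()
```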
* pull SpeedyResNet out of train_cifar()
* patch to remove hack from stable_diffusion.py
* sorry linter
* realize after assign?
* float16 broken in llvmlite; use float64 for now
* int32
* idiot forgot to change test array dtype
* no JIT call in TransformerBlock
* idea
* move 2 reshapes to jitted function
shrink inside jitted too, 6.3ms
remove back reshapes, 5.5ms
isinstance -> __class__: 4.99ms (see the sketch below)
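The isinstance -> __class__ change is a well-known CPython micro-optimization: an exact-type identity comparison skips the subclass-check machinery, which adds up in hot loops. A toy illustration of the trade-off (the timings above come from the commit, not this snippet):

```python
class LazyOp: pass

def check_isinstance(x) -> bool:
  return isinstance(x, LazyOp)    # general: also true for subclasses

def check_class(x) -> bool:
  return x.__class__ is LazyOp    # exact-type identity check: faster, but misses subclasses
```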
* think
revert ops_gpu.py
revert symbolic.py too
PYOPENCL_COMPILER_OUTPUT=1
* cleanup
* fix cache shape for conversational model
only reshape if start_pos > 0
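A hedged sketch of the cache-shape fix for the conversational path (the names and cache layout here are assumptions, not the actual model code):

```python
from tinygrad.tensor import Tensor

def update_kv_cache(cache_k: Tensor, new_k: Tensor, start_pos: int,
                    bsz: int, n_heads: int, head_dim: int) -> Tensor:
  # on the first call (start_pos == 0) there is no cached prefix, so only
  # reshape and concatenate when start_pos > 0
  if start_pos > 0:
    cache_k = cache_k.reshape(bsz, start_pos, n_heads, head_dim)
    return cache_k.cat(new_k, dim=1)
  return new_k
```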
* small cleanup
* include var_vals.keys() in st.key
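A simplified illustration of why the variable names belong in the key (View and its fields are hypothetical stand-ins; the real change extends st.key):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class View:
  shape: tuple
  var_vals: dict  # symbolic variable name -> bound value

  @property
  def key(self):
    # include the variable names: two views with the same shape but different
    # bound variables must not collide in a cache keyed on this tuple
    return (self.shape, tuple(sorted(self.var_vals.keys())))
```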
* add comments
* llama small update
* everything jitted again, similar structure to gpt2
* fix typing
* add TODO for in-place cache update
* update mask function
* kept 94% with the new fetcher
clean up batch fetcher
* 94.04% without cutmix
* 94.04% with cutmix
* move batch fetcher to avoid fetching an additional batch on the last STEP
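A minimal sketch of the restructured loop (fetch_batch and train_step are illustrative names): the prefetch is guarded so the final step never pulls a batch that would go unused.

```python
def train(steps: int, fetch_batch, train_step):
  batch = fetch_batch()  # prefetch the first batch before the loop
  for step in range(steps):
    # only kick off the next fetch if another step remains
    next_batch = fetch_batch() if step + 1 < steps else None
    train_step(batch)
    batch = next_batch
```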