George Hotz
643cbdfd50
make embedding and GPT-2 fast (#1631)
...
* make embedding fast
* jit more, variable shape support
* print mem bw
2023-08-22 15:14:38 -07:00
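For context on "make embedding fast" in #1631 above: one common way to speed up embedding lookups in a lazy tensor library is to express the gather as a one-hot matmul, which the existing matmul path already handles well. A minimal NumPy sketch of that idea (illustrative only, with stand-in sizes; not the tinygrad code from the commit):

```python
import numpy as np

def embedding_as_matmul(weight: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Illustrative only: express an embedding lookup as a one-hot matmul."""
    vocab_size, embed_dim = weight.shape
    one_hot = (np.arange(vocab_size) == idx[..., None]).astype(weight.dtype)  # (..., vocab)
    return one_hot @ weight  # (..., embed_dim), same result as weight[idx]

weight = np.random.randn(1000, 64).astype(np.float32)  # small stand-in embedding table
tokens = np.array([17, 421])                            # arbitrary token ids
assert np.allclose(embedding_as_matmul(weight, tokens), weight[tokens])
```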
Niklas D
a7752ad65d
Fix link to state.py in quickstart (#1632)
2023-08-22 17:39:30 -04:00
c143
c9c40bb16f
Import whole math module in tensor.py (#1628)
2023-08-22 17:07:46 -04:00
Roelof van Dijk
6fcfa50b35
[ready] perf: no noop cast just to make mypy happy (#1626)
...
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-22 17:07:22 -04:00
Roelof van Dijk
f04a6d7882
perf: faster partition (#1625)
...
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-22 11:56:41 -07:00
George Hotz
d3c401ba3c
llama quantize: scale uses mul, not div
2023-08-22 11:48:56 -07:00
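On the "scale uses mul, not div" wording above: in absmax-style int8 quantization, dequantization is a multiply by the scale, and quantization can also avoid a per-element division by precomputing the reciprocal. A hedged NumPy sketch of that pattern (not the llama quantization code itself):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    scale = np.abs(x).max() / 127.0          # one scalar scale (could also be per row/group)
    inv_scale = 1.0 / scale                  # precompute so quantization is a mul, not a div
    q = np.clip(np.round(x * inv_scale), -128, 127).astype(np.int8)
    return q, np.float32(scale)

def dequantize_int8(q: np.ndarray, scale: np.float32) -> np.ndarray:
    return q.astype(np.float32) * scale      # dequantization is also just a multiply

x = np.random.randn(16).astype(np.float32)
q, s = quantize_int8(x)
print(np.abs(dequantize_int8(q, s) - x).max())   # small quantization error
```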
George Hotz
696e4d20a1
fix KOPT=2 with variable shape
2023-08-22 11:34:34 -07:00
George Hotz
de1fcc418f
no more toCPU path (#1624)
2023-08-22 11:07:26 -07:00
George Hotz
463dece63e
auto arg dtypes (#1623)
2023-08-22 10:22:40 -07:00
George Hotz
db8344ab83
add noalias to llvm (#1622)
2023-08-22 09:26:01 -07:00
chenyu
89e13f2f04
support symbols in shrink (#1611)
2023-08-22 09:08:21 -07:00
George Hotz
718ced296c
move state to nn/state (#1619)
2023-08-22 07:36:24 -07:00
Umut Zengin
1e93fd5449
Readability for unreadable functions (#1610)
...
* cleaned
* typing
* typing
* if format
* if format
* mypy
* update argmax
* argmax more readable
* More stable def pad
* lint
2023-08-22 07:09:08 -07:00
George Hotz
86a32ffb1a
lt sum (#1617)
2023-08-21 21:19:16 -07:00
George Hotz
c64c47a6ae
test arange simple
2023-08-21 20:16:17 -07:00
George Hotz
4f459841bc
Symbolic JIT for GPT2 (#1613)
...
* not fast yet
* simpler
* symbolic jit
* fp16 GOPS and GB
2023-08-21 19:44:57 -07:00
Yixiang Gao
4f02491cd4
add cpu if torch tensor (#1609)
2023-08-21 16:57:59 -07:00
Umut Zengin
f720682beb
np.argmax to Tensor.argmax (#1608)
...
* to tensor argmax
* removed keepdim
* training update
2023-08-21 15:22:29 -07:00
George Hotz
4ea00bad38
track down llama bug
2023-08-21 15:14:21 -07:00
Roelof van Dijk
b02f77b354
perf: faster broadcasted (#1601)
...
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:21:46 -07:00
Yixiang Gao
4d54afb6df
sparse cat cross entropy (#1597)
...
* add sparse cat cross entropy
* minor fix
* add log_softmax into loss function
* add test
* update docs
* fix training loss
* add device
2023-08-21 14:14:54 -07:00
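The loss added in #1597 above combines log-softmax with an index lookup on integer class labels, so no one-hot tensor is materialized. A minimal NumPy sketch of sparse categorical cross entropy (illustrative, not the PR's tinygrad implementation):

```python
import numpy as np

def sparse_categorical_crossentropy(logits: np.ndarray, targets: np.ndarray) -> float:
    # numerically stable log-softmax folded into the loss
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # pick the log-probability of the correct class per sample, no one-hot needed
    return float(-log_probs[np.arange(targets.shape[0]), targets].mean())

logits = np.random.randn(4, 10).astype(np.float32)   # batch of 4, 10 classes
targets = np.array([3, 1, 9, 0])                      # integer labels, not one-hot
print(sparse_categorical_crossentropy(logits, targets))
```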
Roelof van Dijk
109100656f
refactor: no len if it is not needed (#1598)
...
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:06:32 -07:00
Roelof van Dijk
2c8f8ac611
perf: no ret needed (#1604)
...
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:05:13 -07:00
Roelof van Dijk
750714c386
perf: namedtuples are hashable, don't need a key (#1607)
...
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 14:01:10 -07:00
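The point of #1607 above: a NamedTuple of hashable fields is itself hashable, so it can be used directly as an lru_cache key with no hand-rolled key function. A small illustration with made-up names (ViewKey and render are hypothetical, not tinygrad classes):

```python
from functools import lru_cache
from typing import NamedTuple, Tuple

class ViewKey(NamedTuple):        # hypothetical example type
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]

@lru_cache(maxsize=None)          # the NamedTuple itself works as the cache key
def render(v: ViewKey) -> str:
    return f"view{v.shape}@{v.strides}"

print(render(ViewKey((4, 4), (4, 1))))
print(render.cache_info())        # a second identical call would be a cache hit
```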
George Hotz
aaa6fdf347
this was unused code (#1600)
2023-08-21 12:02:58 -07:00
Roelof van Dijk
8e8724d3a8
perf: if argument order (mops) (#1599)
...
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-21 11:20:39 -07:00
George Hotz
2e60920317
Revert "sparse cat cross entropy ( #1591 )" ( #1596 )
...
This reverts commit f0ee850e98.
2023-08-21 10:04:26 -07:00
Yixiang Gao
f0ee850e98
sparse cat cross entropy (#1591)
...
* add sparse cat cross entropy
* minor fix
* add log_softmax into loss function
* add test
* update docs
2023-08-21 09:56:41 -07:00
Yixiang Gao
8d6662a741
.cpu().numpy() -> .numpy() (#1594)
...
* .cpu().numpy() -> .numpy()
* restore ops_torch
* restore test_speed_v_torch
2023-08-21 09:53:29 -07:00
Umut Zengin
35bf21276f
Argmax/Argmin Feature (#1576)
...
* implemented argmax and argmin
* lint
* lint
* match torch behaviour
* format
* removed flip
2023-08-20 18:46:46 -07:00
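Argmax can be built from reductions and elementwise ops that already exist, which is the usual route in a lazy tensor library. One common formulation, shown here as an illustrative NumPy sketch (not necessarily the exact expression used in #1576): mask the maxima, weight by a reversed index ramp, and reduce with max.

```python
import numpy as np

def argmax_from_ops(x: np.ndarray) -> np.ndarray:
    """Argmax along the last axis using only compare / arange / mul / max."""
    n = x.shape[-1]
    is_max = (x == x.max(axis=-1, keepdims=True)).astype(x.dtype)
    rev_ramp = np.arange(n - 1, -1, -1, dtype=x.dtype)   # reversed so ties pick the first index
    return (n - 1 - (is_max * rev_ramp).max(axis=-1)).astype(np.int64)

x = np.array([[3., 7., 7., 1.], [0., -1., 2., 2.]])
print(argmax_from_ops(x))          # [1 2]
print(np.argmax(x, axis=-1))       # matches
```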
Roelof van Dijk
1900acda09
[READY] ci: setup venv cache (#1475)
...
* ci: cache installed packages
* ci: trigger jobs
* ci: fix hashfiles argument
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-20 18:43:16 -07:00
Umut Zengin
3fc7e984f0
__getitem__ refactoring (#1586)
...
* dene
* dene
* form
* form
* form
* form
* lint
* small change
* preserve old
* revert to explicit reshape
2023-08-20 18:42:30 -07:00
George Hotz
d627349af0
teeny changes (#1589)
...
* teeny changes
* import order
2023-08-20 13:38:38 -07:00
George Hotz
012ee7d162
not worth the speed (#1584)
...
* not worth the speed
* no slots
* uops comments
* bump to python 3.11 for speed
* add critical slots back
2023-08-20 10:24:58 -07:00
George Hotz
739f327d2d
Shorter (#1582)
...
* deleting lines
* remove insert dims
* if statement is never hit
* bug fixes
2023-08-20 08:12:16 -07:00
David Hou
4fbce972d7
CSE at uop level (#1483)
...
* uop-level cse
* add test
* don't cache reduce alu ops
* types
* rename variable
* fix
* delete lines
2023-08-19 23:40:40 -07:00
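Common subexpression elimination at the uop level amounts to keying each candidate uop by its opcode, dtype, sources, and argument, and returning the cached instance when an identical one was already emitted. A toy sketch of that cache (hypothetical ToyUOp structure, not tinygrad's UOp class; per the commit body, reduce ops are left uncached):

```python
from typing import Any, Dict, Tuple

class ToyUOp:
    def __init__(self, op: str, dtype: str, src: Tuple, arg: Any = None):
        self.op, self.dtype, self.src, self.arg = op, dtype, src, arg

cse_cache: Dict[Tuple, ToyUOp] = {}

def emit(op: str, dtype: str, src: Tuple = (), arg: Any = None) -> ToyUOp:
    key = (op, dtype, src, arg)
    cacheable = op not in ("REDUCE",)       # e.g. don't cache reduce ALU ops
    if cacheable and key in cse_cache:
        return cse_cache[key]               # reuse the identical uop instead of emitting again
    uop = ToyUOp(op, dtype, src, arg)
    if cacheable:
        cse_cache[key] = uop
    return uop

a = emit("LOAD", "float", arg=("buf0", 0))
b = emit("LOAD", "float", arg=("buf0", 0))
print(a is b)   # True: the duplicate load was eliminated
```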
George Hotz
b9feb1b743
fp16 support in stable diffusion
2023-08-20 05:37:21 +00:00
George Hotz
ad7d26c393
fix __launch_bounds__ and benchmark TC MATMUL (#1575)
...
* fix
* benchmark matmul
2023-08-19 10:54:39 -07:00
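For reference on how matmul benchmark figures like the TFLOPS numbers in these commits are derived: an M×N×K matmul performs 2·M·N·K floating-point operations (one multiply and one add per inner-product element), and throughput is that count divided by wall time. A tiny CPU-side NumPy sketch of the arithmetic (the real benchmark times the GPU kernel, of course):

```python
import time
import numpy as np

N = 1024
a = np.random.randn(N, N).astype(np.float32)
b = np.random.randn(N, N).astype(np.float32)

t0 = time.perf_counter()
c = a @ b
dt = time.perf_counter() - t0

flops = 2 * N * N * N                          # one mul + one add per inner-product element
print(f"{flops / dt / 1e9:.1f} GFLOPS in {dt*1e3:.1f} ms")
```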
David Hou
92754e177c
cache buffer loads across multiple bufs (#1482)
...
* cache loads across buffers (since they may share rawbufs)
* typing
* add test
* fix test
* small changes to test
* fix test
* one big cache
* whitespace
* golf a line?
* invalid is RawBuffer(0)[0], valid 1.
2023-08-19 09:09:58 -07:00
George Hotz
e464442adf
WMMA for 7900XTX (#1563)
...
* go
* hip no LRU
* work
* works
* 16 TFLOPS
* 29 TFLOPS
* 30 TFLOPS
* never mind, it's 60 TFLOPS
* fix metal WMMA
* put hip alloc back
2023-08-19 09:07:23 -07:00
nimlgen
faa521bcab
fix usage of arm64 regs according to CC (#1570)
2023-08-18 21:40:32 -07:00
corranr
68ebbd2954
for issue #1555, int64 and int8 in CI=1 ARM64=1 CLANG=1 (#1572)
...
* fixed for int8,int64, added dtype broadcasting test, passing all CI,ARM64,CLANG tests
* remove shifts
2023-08-18 21:40:13 -07:00
chenyu
ae39cf84ab
Symbolic Shape JIT main PR (#1353)
...
* Symbolic Shape JIT
update tests
2 variables symbolic ops, adding more tests
test passing
cleanup
* more test cases
* single flag
* review update
* jit attention one piece
* realize
* symbolic_jit test for cuda
* old artifact
* works with cuda gpu but failed ci
* CUDACPU
2023-08-18 14:39:55 -07:00
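The core idea behind a symbolic-shape JIT is to key the kernel cache on a shape signature in which the varying dimension is a named variable rather than a concrete number, and to pass the concrete value in at call time, so a growing sequence length reuses one compiled kernel instead of recompiling per step. A purely conceptual sketch (toy names, not tinygrad's API):

```python
from typing import Callable, Dict, List, Tuple

kernel_cache: Dict[Tuple, Callable] = {}
compile_count = 0

def compile_kernel(sig: Tuple) -> Callable:
    """Stand-in for an expensive codegen + compile step."""
    global compile_count
    compile_count += 1
    return lambda xs: sum(xs) / len(xs)          # toy "kernel": mean over the symbolic axis

def run_mean(xs: List[float], symbolic: bool = True) -> float:
    # with symbolic=True, every length shares the signature ("mean", "seqlen")
    sig = ("mean", "seqlen" if symbolic else len(xs))
    if sig not in kernel_cache:
        kernel_cache[sig] = compile_kernel(sig)
    return kernel_cache[sig](xs)

for t in range(1, 6):                            # e.g. growing context length during generation
    run_mean([1.0] * t)
print(compile_count)                             # 1: one compiled kernel served every length
```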
Roelof van Dijk
84e6693915
fix: apt-get to apt, no recommends, clean up (#1571)
...
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-08-18 13:48:59 -07:00
wozeparrot
50decf0d45
train cifar using multigpu (#1529)
...
* feat: train cifar using multigpu
* feat: split eval batch across 5
* feat: cleaner allreduce
* feat: 93.88%
* feat: cleaner batch chunking from bert
* feat: cleaner grad sync
* feat: tinygrad argmax
* feat: make it work with different gpu counts
* feat: move some stuff into the normal __init__
* feat: autodetect gpu count
* feat: move import inside
2023-08-18 09:35:44 -07:00
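"Cleaner grad sync" and "cleaner allreduce" in #1529 above refer to data-parallel training: every GPU computes gradients on its own shard of the batch, and those copies are averaged so all replicas take the same step. A minimal NumPy sketch of that synchronization step (the PR does this with tinygrad tensors and device-to-device copies, not NumPy):

```python
import numpy as np

def allreduce_mean(grads_per_gpu):
    """grads_per_gpu: list over GPUs, each a list of per-parameter gradient arrays."""
    averaged = [np.mean(np.stack(copies), axis=0) for copies in zip(*grads_per_gpu)]
    # every replica receives the same averaged gradients
    return [list(averaged) for _ in grads_per_gpu]

# two "GPUs", each holding gradients for two parameters
g0 = [np.array([1.0, 2.0]), np.array([[1.0]])]
g1 = [np.array([3.0, 4.0]), np.array([[3.0]])]
synced = allreduce_mean([g0, g1])
print(synced[0][0])   # [2. 3.]  -- identical on every replica
```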
chenyu
be50b2fe8f
more symbolic symbolic ops (#1564)
...
* more symbolic symbolic ops
* handle NumNode in __mul__
2023-08-18 09:21:41 -07:00
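On "handle NumNode in __mul__": a wrapped integer constant should multiply like a plain int so products of symbolic expressions fold instead of failing. A toy symbolic-node sketch, loosely modeled on the idea only (these classes are not tinygrad's symbolic module):

```python
class Node:
    def __mul__(self, other):
        if isinstance(other, NumNode): other = other.b        # unwrap the constant node
        if isinstance(other, int):
            if other == 0: return NumNode(0)
            if other == 1: return self
            return MulNode(self, other)
        return NotImplemented

class Variable(Node):
    def __init__(self, name, lo, hi): self.name, self.lo, self.hi = name, lo, hi
    def __repr__(self): return self.name

class NumNode(Node):
    def __init__(self, b): self.b = b
    def __repr__(self): return str(self.b)

class MulNode(Node):
    def __init__(self, a, b): self.a, self.b = a, b
    def __repr__(self): return f"({self.a}*{self.b})"

i = Variable("i", 1, 10)
print(i * 4)              # (i*4)
print(i * NumNode(4))     # (i*4) as well, thanks to the NumNode branch in __mul__
```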
chenyu
dfec16cc83
Support arg int for CUDA kernel (#1565)
2023-08-18 09:19:40 -07:00
wozeparrot
15150d60c4
fix: small fix for lru on hip (#1567)
2023-08-18 09:18:38 -07:00
wozeparrot
c65ad43a93
cleanup ops_gpu (#1566)
2023-08-17 23:43:08 -04:00
nimlgen
bd111411bf
init allocator for compiled backends (#1467)
...
* init allocator for compiled backends
* Update ops_webgpu.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-08-17 10:33:32 -07:00
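An allocator for compiled backends typically keeps freed device buffers in a size-keyed pool so the next allocation of the same size can reuse one instead of going back to the driver. A minimal sketch of that pattern (illustrative class, not the interface added in #1467):

```python
from collections import defaultdict
from typing import DefaultDict, List

class CachingAllocator:
    """Recycle freed buffers by size to avoid repeated backend allocations."""
    def __init__(self):
        self._pool: DefaultDict[int, List[bytearray]] = defaultdict(list)

    def alloc(self, size: int) -> bytearray:
        if self._pool[size]:
            return self._pool[size].pop()       # reuse a previously freed buffer
        return bytearray(size)                  # stand-in for a real device allocation

    def free(self, buf: bytearray) -> None:
        self._pool[len(buf)].append(buf)        # keep it around instead of releasing it

alloc = CachingAllocator()
a = alloc.alloc(4096)
alloc.free(a)
b = alloc.alloc(4096)
print(a is b)   # True: the second request was served from the pool
```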