tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 06:18:01 -05:00

Author	SHA1	Message	Date
nimlgen	dc10187fc0	am: add am_smi (#8739) * am: start monitor * cleanups * fixes * hmm * progress * cleanup	2025-01-24 20:16:19 +03:00
George Hotz	7a2223a6c6	add merge views to ops_folding [pr] (#8051) Co-authored-by: qazal <qazal.software@gmail.com>	2025-01-24 17:45:11 +02:00
qazal	0814a79cb4	cleanup the merge_views upats [pr] (#8738)	2025-01-24 16:49:54 +02:00
qazal	07069b9988	rename to tensor_uop [pr] (#8737)	2025-01-24 13:42:25 +02:00
George Hotz	e82ba1454b	MultiLazyBuffer is UOp [pr] (#8662) * MultiLazyBuffer is UOp [pr] * this is new mlb * this is the idea * progress * multitensor works * more movement ops * this * MultiLazyBuffer is UOp * cleanups * multi axis * fix more tests * work * not that * add multi grad and move shard to ops * mops not views * no double contig * sweet, all mt tests passing * port old logic * remove lbs * fix realized * whitespace * assign tweak * test_assign_kv_cache_multi passes * fix is_realized * fix JIT for multi * just a few more lines i'll pay them back soon i swear please bro just a few more * no split reduceop for multi	2025-01-24 13:28:55 +09:00
chenyu	eb77488f85	update llama3 70B to use R1 (#8733)	2025-01-23 19:06:05 -05:00
George Hotz	3e987fc856	add device print with -m tinygrad.device [pr] (#8729) * add device print with -m tinygrad.device [pr] * fix linter	2025-01-24 05:46:27 +09:00
geohotstan	04846b91aa	reorder and categorize onnx_ops (#8731) * new order * remove a todo * constant node is definitely requires_grad false * one new line spacing * property and graph * oops linter	2025-01-23 13:18:54 -05:00
qazal	8e5bd0cd7a	fix buffer init and skip test_swizzle_failure_permute [pr] (#8732) * fix buffer init and skip test_swizzle_failure_permute [pr] * replace preload with just load * add	2025-01-23 17:21:38 +02:00
nimlgen	e4512baea4	am: cleanup mm (#8730) * am: cleanup mm * cle * ops * entries	2025-01-23 15:49:37 +03:00
qazal	07ec99001a	keep VIEW in big_sink + copy of buffer view spec [pr] (#8727) * keep views in sink [pr] * tests * things from the gpt2 bug	2025-01-23 11:29:30 +02:00
qazal	6cb74bb630	fix using clone with shrink [pr] (#8724) * fix using clone with shrink [pr] * remove extra arg, add test_clone_with_shrink_realized	2025-01-23 08:28:07 +02:00
chenyu	af65331b76	update beam params for bert green [pr] (#8726) increase BEAM_UPCAST_MAX and BEAM_LOCAL_MAX to default and matched red. 3% faster step	2025-01-22 22:00:05 -05:00
qazal	907dfa0e82	image buffer realization spec [pr] (#8420) * image buffer realization spec [pr] * redo the spec * work	2025-01-22 20:25:22 +02:00
chenyu	49b914ee69	simpler bert acc [pr] (#8714) logit.log_softmax().argmax(-1) is equivalent to logit.argmax(-1)	2025-01-22 10:32:19 -05:00
nimlgen	93fb50ce77	allreduce: add flags (#8713)	2025-01-22 17:44:31 +03:00
qazal	891436853d	remove buffer size check in schedule item [pr] (#8712)	2025-01-22 13:36:30 +02:00
qazal	2dae467b75	scheduler + process_replay import cleanup (#8711)	2025-01-22 12:44:07 +02:00
qazal	e3d1464ba4	move assign preload out of schedule item [pr] (#8710) * move assign preload out of schedule item [pr] * fix that	2025-01-22 12:43:57 +02:00
chenyu	9a9079118e	envvar BERT_LAYERS [pr] (#8709) default is 24 for large	2025-01-21 22:49:19 -05:00
chenyu	9f6d545a16	bert log global_norm in training step [pr] (#8708) * bert log global_norm in training step [pr] and minor cleanups * .item()	2025-01-21 20:36:27 -05:00
nimlgen	c5e46c5eee	am: recover from any boot interrupt (#8703) * am: recover from any load interrupt * add fuzzer * nu	2025-01-21 22:22:23 +03:00
chenyu	1e283c33d3	remove realize in bert model init [pr] (#8707)	2025-01-21 14:11:03 -05:00
George Hotz	018edd934b	don't use view in copy [pr] (#8704) * don't use view in copy [pr] * oh, remove double contig * fix reps	2025-01-21 09:57:47 -08:00
qazal	d6bf1feaab	remove the "no copy" line from copy_to_device (#8702) * delete the no copy one * add tests	2025-01-21 17:09:33 +02:00
nimlgen	3628f89929	fix deallocate for subbuffers (#8701) * fix deallocate for subbuffers * forgot this * rm name * hmm	2025-01-21 16:34:19 +03:00
nimlgen	6733a3a96b	am: fix typo (#8700)	2025-01-21 14:35:15 +03:00
qazal	f0d424ecdf	Tensor UOps can become a buffer or const after scheduling (#8698) * spec * work * update test_viewed_consts_do_not_realize * remove	2025-01-21 12:33:19 +02:00
qazal	e2008c98c3	allow symbolic shape in tensor const parents [pr] (#8699)	2025-01-21 12:01:25 +02:00
nimlgen	2b239db5d2	temp() with usernames (#8697)	2025-01-21 12:26:43 +03:00
qazal	66ac0087e8	more high level contiguous tests + scheduler deletions [pr] (#8695) * delete those * move the upat too * rename ops_folding to just sym * keep that	2025-01-21 01:52:58 +02:00
qazal	08eb1f1f56	simplify tensors before scheduling [pr] (#8580) * delete forced_realize * put that back * work * remove forced_realize * expectedFailures * contiguous(buffer) * multi * expectedFailures * cleaner create_subbuffer * more comments * remove that * note * realizes * work * one upat and image is back * remove * cleaner * fix test_complex_backward for now --------- Co-authored-by: George Hotz <geohot@gmail.com>	2025-01-20 23:42:42 +02:00
qazal	02ad450e22	add failing assert for gradient realization [pr] (#8692)	2025-01-20 22:50:09 +02:00
qazal	b14c9848cc	small changes to make the tensor_map_simple diff cleaner [pr] (#8691)	2025-01-20 22:25:59 +02:00
Sieds Lykles	1a15c0e89d	Move define_acc down an unrolled add chain (#8404) * Move define_acc down an unrolled add chain * Prevent possible infinite recursion * Add test * Fix typo in test * Move mulacc_unrolled to devoctorize + load_store_indexing pass * Add test for mulacc_unrolled by itself * undo formatter * import from ops, not rewriter * Add a const version --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-20 14:56:27 -05:00
geohotstan	dd82b4c913	make onnx runner a class (#8647) * this * clean up * more clean ups and improve debug msg * more correct training toggler * remove manual training toggling * change some variable names * actually just add the training toggle for LIMIT envvar too * more refinement * __call__ and OnnxRunner * fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later * ahhhh found another mistake * remove limit from __call__ --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-01-20 10:11:05 -08:00
George Hotz	46a8c5e1e5	delete forced_realize (#8615) * delete forced_realize * put that back * expectedFailures * cleaner create_subbuffer * more comments --------- Co-authored-by: qazal <qazal.software@gmail.com> Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-01-20 09:40:36 -08:00
chenyu	679b1ad058	move softmax upcast to after subtracting max (#8684) * move softmax upcast to after subtracting max max can always be done in the same dtype without any numerical loss, so this is better when explicitly upcasting in softmax * skipUnless half	2025-01-20 12:16:32 -05:00
nimlgen	08ca871d77	am: remove pm block (#8688) * am: remove pm block * hm * oops	2025-01-20 18:05:22 +03:00
nimlgen	9d3c40601f	am: fast memory manager (#8654) * start * progress * fixes * smth * mini fixes * fix2 * ugh, need this for now * faster * cleanups * tiny linters * make mypy happier * test & free pts * ops * linter * cleanup vm * fix * remove map_from * tiny fixes * add test to ci	2025-01-20 16:58:22 +03:00
qazal	9e55495b4d	fold double contiguous [pr] (#8687)	2025-01-20 14:38:33 +02:00
qazal	ed63ff2372	Remove contiguous on buffer (#8676) * remove contiguous on buffer * spec * make things that can't be images not images	2025-01-20 13:48:33 +02:00
qazal	3499a2c72d	start moving image things to rewrite rules (#8678) * start moving image things to rewrite rules [pr] * that too * as expected * fix * Revert "fix" This reverts commit `fd03c9464b`.	2025-01-20 13:34:29 +02:00
qazal	b1847d561f	smaller do_realize and some cleanups [pr] (#8685) * do_realize cleanups [pr] * cleanup assign * unwrap ShapeTracker as we expect it to exist	2025-01-20 12:47:01 +02:00
qazal	689bf68cfc	remove GroupOp.Meta [pr] (#8686)	2025-01-20 12:24:19 +02:00
George Hotz	4198bce150	_apply_map_to_tensors [pr] (#8683)	2025-01-19 17:56:04 -08:00
George Hotz	98d01a059d	rename uopgraph to rewriter [pr] (#8682)	2025-01-19 17:03:12 -08:00
Ignacio Sica	f532c78889	minor space hotfix (#8679)	2025-01-19 17:00:24 -08:00
chenyu	2d0842386d	fix parse_valid for float uop (#8681) x < c -> X <= c-1 only works for int	2025-01-19 18:15:49 -05:00
George Hotz	168c16646a	change create_schedule_with_vars api to big_sink [pr] (#8677)	2025-01-19 13:30:26 -08:00
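
Two of the commit messages above make small claims that can be checked in isolation: 49b914ee69 states that logit.log_softmax().argmax(-1) is equivalent to logit.argmax(-1), and 2d0842386d states that the rewrite x < c -> x <= c-1 only works for int. The following is a minimal pure-Python sketch of both claims. It does not use tinygrad, and the helper names log_softmax and argmax are defined here for illustration only; they are not taken from the repository.

```python
import math

def log_softmax(logits):
    # Hypothetical helper: numerically stable log-softmax over a plain list.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    # Every element is shifted by the same constant, so ordering is preserved.
    return [x - lse for x in logits]

def argmax(xs):
    # Hypothetical helper: index of the first maximal element.
    return max(range(len(xs)), key=lambda i: xs[i])

# Claim 1: taking argmax before or after log_softmax picks the same index.
logits = [1.5, -0.3, 4.2, 0.0]
same_argmax = argmax(log_softmax(logits)) == argmax(logits)

# Claim 2: for integers, x < c and x <= c - 1 agree for every x.
c = 3
int_equivalent = all((x < c) == (x <= c - 1) for x in range(-10, 11))

# For floats the rewrite can change the result: 2.5 < 3 holds, 2.5 <= 2 does not.
x_float = 2.5
float_mismatch = (x_float < c) != (x_float <= c - 1)
```

The first claim holds because log_softmax subtracts the same value from every logit, which cannot change which element is largest. The second relies on there being no integer strictly between c-1 and c, which is not true of floats.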

7608 Commits