Commit Graph

1033 Commits

Francis Lata
335d11281c add multi device support to retinanet eval 2025-01-29 10:00:46 -08:00
Francis Lata
fc957e7377 update validation dataloader and more cleanups 2025-01-28 03:02:17 -08:00
Francis Lata
3b9e5a3ed4 debug test 2025-01-27 08:35:36 -08:00
Francis Lata
2177053076 Merge branch 'master' into retinanet_mlperf 2025-01-27 08:07:19 -08:00
Francis Lata
e91733baae refactor training loop and start work on the val loop 2025-01-27 07:27:19 -08:00
George Hotz
e82ba1454b MultiLazyBuffer is UOp [pr] (#8662)
* MultiLazyBuffer is UOp [pr]

* this is new mlb

* this is the idea

* progress

* multitensor works

* more movement ops

* this

* MultiLazyBuffer is UOp

* cleanups

* multi axis

* fix more tests

* work

* not that

* add multi grad and move shard to ops

* mops not views

* no double contig

* sweet, all mt tests passing

* port old logic

* remove lbs

* fix realized

* whitespace

* assign tweak

* test_assign_kv_cache_multi passes

* fix is_realized

* fix JIT for multi

* just a few more lines i'll pay them back soon i swear please bro just a few more

* no split reduceop for multi
2025-01-24 13:28:55 +09:00
chenyu
eb77488f85 update llama3 70B to use R1 (#8733) 2025-01-23 19:06:05 -05:00
chenyu
af65331b76 update beam params for bert green [pr] (#8726)
increase BEAM_UPCAST_MAX and BEAM_LOCAL_MAX to the defaults, matching red. 3% faster per step
2025-01-22 22:00:05 -05:00
Francis Lata
6fdcaa178b add checkpointing and training resume capabilities 2025-01-22 14:20:17 -08:00
Francis Lata
95cdbbf237 add jit to the training loop 2025-01-22 12:31:29 -08:00
Francis Lata
efe64ebeaf enable lr scheduler and fix benchmark timing 2025-01-22 09:56:38 -08:00
chenyu
9a9079118e envvar BERT_LAYERS [pr] (#8709)
default is 24 for large
2025-01-21 22:49:19 -05:00
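
A minimal sketch of how an envvar knob like this typically reads in tinygrad (the surrounding code is illustrative, not the commit's exact diff):

    # illustrative: read BERT_LAYERS with tinygrad's getenv helper, default 24 (BERT-large)
    from tinygrad.helpers import getenv

    BERT_LAYERS = getenv("BERT_LAYERS", 24)
    print(f"building a model with {BERT_LAYERS} encoder layers")

Running with BERT_LAYERS=4 in the environment then gives a shallower model, e.g. for debugging.
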
chenyu
9f6d545a16 bert log global_norm in training step [pr] (#8708)
* bert log global_norm in training step [pr]

and minor cleanups

* .item()
2025-01-21 20:36:27 -05:00
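
A hedged sketch of the logged quantity; the gradient list below is a stand-in for the model's real gradients:

    # sketch: global gradient norm, pulled to the host with .item() for logging
    from tinygrad import Tensor

    grads = [Tensor.randn(128, 128), Tensor.randn(128)]   # stand-ins for real grads
    global_norm = sum(g.square().sum() for g in grads).sqrt()
    print(f"global_norm: {global_norm.item():.4f}")       # .item() -> python float
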
Francis Lata
d1bc4aef94 do not realize when sharding model weights 2025-01-21 13:45:35 -08:00
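
A minimal sketch of the idea, assuming two devices (the weight is a stand-in): shard lazily and let first use realize the tensor.

    # sketch: shard a weight across devices without forcing a .realize()
    from tinygrad import Tensor, Device

    GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(2))
    weight = Tensor.empty(1024, 1024)   # stand-in for a model weight
    weight.shard_(GPUS, axis=0)         # in-place shard; no .realize() here
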
Francis Lata
7f331d8836 fix dataloader script 2025-01-21 13:43:59 -08:00
Francis Lata
1bf5ee286b Revert "debug dataset test failure"
This reverts commit 1b2f9d7f50.
2025-01-21 13:30:12 -08:00
Francis Lata
1b2f9d7f50 debug dataset test failure 2025-01-21 13:23:50 -08:00
Francis Lata
7815d3ddff Merge branch 'master' into retinanet_mlperf 2025-01-21 13:06:04 -08:00
chenyu
1e283c33d3 remove realize in bert model init [pr] (#8707) 2025-01-21 14:11:03 -05:00
Francis Lata
bf36006ff0 set seed 2025-01-20 22:54:54 -08:00
Francis Lata
5d9a604963 add support for BENCHMARK 2025-01-20 22:47:23 -08:00
Francis Lata
be2e97260d fix dtype for anchor inside dataloader and fix horizontal flip transformation 2025-01-20 22:45:25 -08:00
Francis Lata
cd511384e2 move anchors as part of dataloader 2025-01-20 13:13:16 -08:00
geohotstan
dd82b4c913 make onnx runner a class (#8647)
* this

* clean up

* more clean ups and improve debug msg

* more correct training toggler

* remove manual training toggling

* change some variable names

* actually just add the training toggle for LIMIT envvar too

* more refinement

* __call__ and OnnxRunner

* fix half of the pylint issues; the other half is importing from onnx while this file is onnx.py, figure out later

* ahhhh found another mistake

* remove limit from __call__

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
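
A hedged usage sketch of the class-based runner; the import path and constructor shape here are assumptions, check extra/onnx.py in this tree for the actual API:

    # assumed import path and constructor; __call__ runs inference (per this PR)
    import onnx
    from extra.onnx import OnnxRunner
    from tinygrad import Tensor

    runner = OnnxRunner(onnx.load("model.onnx"))           # build the graph once
    out = runner({"input": Tensor.ones(1, 3, 224, 224)})   # reuse per call
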
Francis Lata
575c748d94 fix wandb resuming feature 2025-01-20 07:22:16 -08:00
Francis Lata
a90a6e624d add wandb 2025-01-20 07:07:51 -08:00
Francis Lata
9402872d90 Merge branch 'master' into retinanet_mlperf 2025-01-20 06:51:12 -08:00
chenyu
c49e0fca60 GlobalCounters.reset() in sdxl step [pr] (#8664) 2025-01-17 21:10:28 -05:00
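
What the reset buys, as a minimal sketch (the step function is a placeholder):

    # sketch: reset counters so each step's op/memory totals are measured alone
    from tinygrad.helpers import GlobalCounters

    def run_step(): ...   # placeholder for one sdxl step

    GlobalCounters.reset()
    run_step()
    print(f"{GlobalCounters.global_ops*1e-9:.2f} GOPS, {GlobalCounters.global_mem*1e-9:.2f} GB")
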
Francis Lata
4bc762120e Merge branch 'master' into retinanet_mlperf 2025-01-15 02:45:21 -08:00
chenyu
930728c069 bert BS 72->66 [pr] (#8621)
72 does not fit now
2025-01-14 18:41:41 -05:00
Francis Lata
b957b023fc Merge branch 'master' into retinanet_mlperf 2025-01-13 09:33:38 -08:00
geohotstan
4abe631b56 fix onnx mobilenetv2-7-quantized.onnx (#8574)
* is 67% considered fixed?

* move test up

* share function

* add qgemm too

* make sure qgemm comes out as int

* actually that note is not right

* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
Francis Lata
aeecce1d18 Merge branch 'master' into retinanet_mlperf 2025-01-13 07:02:19 -08:00
chenyu
994944920b simpler batch_load_train_bert [pr] (#8582)
that buffer doesn't seem beneficial: without it, data_time is 5% faster and each step is 1ms faster.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
4ac4c1415a free intermediate buffers in the jit [pr] (#8581)
* free intermediate buffers in the jit [pr]

* intermediates_freed

* deallocate if not allocated

* self._first_run is simpler
2025-01-12 15:41:41 -08:00
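
For context, a minimal TinyJit sketch (shapes illustrative); h below is the kind of intermediate buffer this change lets the JIT free:

    from tinygrad import Tensor, TinyJit

    @TinyJit
    def f(x: Tensor) -> Tensor:
      h = (x @ x.T).relu()     # intermediate: only alive between kernels
      return h.sum().realize()

    x = Tensor.rand(64, 64).realize()
    for _ in range(3): f(x)    # early calls capture, later calls replay
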
chenyu
def90b22f6 EVAL_BS=36 for bert [pr] (#8576)
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
Francis Lata
f7537e4db2 Merge branch 'master' into retinanet_mlperf 2025-01-10 05:46:04 -08:00
George Hotz
9833fe83d8 more work on onnx imagenet [pr] (#8552)
* more work on onnx imagenet [pr]

* working quantization

* static quant

* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0 more working (#8550) 2025-01-09 18:40:08 -08:00
chenyu
b6be407bc6 fix handcode_opt bert [pr] (#8509)
* fix handcode_opt bert [pr]

* too slow
2025-01-05 19:14:12 -05:00
George Hotz
24de25b52f example to benchmark onnx [pr] (#8459)
* example to benchmark onnx [pr]

* reset global count
2024-12-31 11:38:33 -05:00
qazal
866dfa1f23 create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
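
The migration in miniature (old call shown in a comment):

    from tinygrad import Tensor

    x = Tensor.ones(4) + 1
    # before: create_schedule([x.lazydata])
    sched = x.schedule()   # the list of ScheduleItems needed to realize x
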
Francis Lata
40d6752854 adjust regression loss to mask after L1 loss is calculated 2024-12-27 17:41:12 +00:00
Francis Lata
cc4a673aa9 Merge branch 'master' into retinanet_mlperf 2024-12-26 21:11:55 +00:00
Calum
d8b08790b9 Fix examples/conversation.py (#8425)
* fix: conversation example

* remove slice func

* remove unused import

* use Tensor.split
2024-12-26 12:45:19 -05:00
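
A hedged sketch of the Tensor.split swap (sizes illustrative):

    from tinygrad import Tensor

    t = Tensor.arange(10)
    a, b = t.split([6, 4])   # replaces hand-rolled slicing like t[:6], t[6:]
    print(a.numpy(), b.numpy())
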
chenyu
4712847766 make self_tokenize output more like a python file (#8411)
use a comment for the file name and join with newlines instead of null bytes when exporting to a file
2024-12-25 14:16:30 -05:00
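
A minimal sketch of the described output format; the function and names are illustrative, not the script's actual code:

    # illustrative: comment header per file, newline-joined (no NUL separators)
    def bundle(files: dict[str, str]) -> str:
      return "\n".join(f"# {name}\n{src}" for name, src in files.items())

    print(bundle({"tinygrad/tensor.py": "class Tensor: ...", "tinygrad/ops.py": "..."}))
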
chenyu
a35eef8d58 optionally output to file in self_tokenize.py (#8399)
the whole tinygrad repo can be pasted into Gemini this way
2024-12-24 21:09:26 -05:00
Francis Lata
c1a18e13ef make training work 2024-12-23 21:48:55 +00:00
Francis Lata
d1627d0b1b start re-enabling training step 2024-12-23 19:43:20 +00:00
Francis Lata
44abfbcacb Merge branch 'master' into retinanet_mlperf 2024-12-23 05:36:37 +00:00