chenyu
81597ddd96
increase lr for bert ( #9098 )
...
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
chenyu
b58e7b1898
zero out the weight in bert init run ( #9076 )
...
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer OOMs. I think the buffers of randomly initialized weights caused the OOM.
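A minimal sketch of the idea, not the training script's actual code (names and shapes are illustrative): under the commit's hypothesis, zero-init avoids materializing random-fill buffers that would otherwise stay resident until the checkpoint load overwrites them.

```python
# Illustrative sketch only: zeros instead of a random fill, so no random-init
# buffer has to survive until the real weights are loaded.
from tinygrad import Tensor

# instead of, e.g., Tensor.normal(1024, 1024) at model init:
weight = Tensor.zeros(1024, 1024)  # shape is hypothetical
```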
2025-02-14 08:40:41 -05:00
chenyu
9e91898941
bert eval at the end of training ( #9070 )
...
always eval at the last epoch
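A hedged sketch of the control flow (all names and values are illustrative stand-ins, not the script's real ones):

```python
# Sketch: eval on the usual cadence, and unconditionally on the final step.
EVAL_FREQ, train_steps = 100, 1000  # illustrative values
def train_step(): pass              # placeholder
def evaluate(): pass                # placeholder

for step in range(train_steps):
    train_step()
    if (step + 1) % EVAL_FREQ == 0 or (step + 1) == train_steps:
        evaluate()
```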
2025-02-13 16:29:44 -05:00
chenyu
f4f56d7c15
move time_linearizer to extra.optimization.helpers [pr] ( #9048 )
...
no longer used in tinygrad
2025-02-12 15:49:58 -05:00
chenyu
7b5ac2c15e
free_intermediates in bert ( #9040 )
...
also re-enable dropout and update EVAL_BS
2025-02-12 10:00:39 -05:00
Ahmed Harmouche
916d5e7f08
WebGPU f16 support (f16 bounty part 2) ( #8653 )
...
* WebGPU f16 support
* Don't enable f16 yet
* dtype tests passing after bitcast fix
* Maybe all WebGPU green?
* Require shader-f16 in examples
* Minor wgsl touchup
* 1 line shorter
* Simpler
* Add transcendental support
* log2 NaN location mismatch on Vulkan
* NaN skips
2025-02-12 19:46:53 +08:00
George Hotz
0568720a68
delete revectorize ( #9000 )
...
* delete revectorize
* test vectorized LLVM/CLANG
* idk about that
* was that the segfault?
2025-02-10 18:32:35 +08:00
George Hotz
4de084a835
cleanup ci, split docs/autogen, testing_minimal, LLVM Speed [pr] ( #8952 )
...
* cleanup ci [pr]
* testing_minimal
* add hypothesis to minimal
* fail tiktoken import okay
* add LLVM speed test
* llvm speed w/o beam
2025-02-07 19:01:59 +08:00
chenyu
a092b6395d
Tuple -> tuple, List -> list [pr] ( #8936 )
2025-02-06 14:21:19 -05:00
George Hotz
8b16c65bca
add compile3 benchmark [pr] ( #8929 )
2025-02-06 22:49:31 +08:00
geohotstan
6fb0e5751b
hotfix test_onnx_imagenet ( #8897 )
...
* start
* log severity
* only change this
* change abstraction so it's more usable for huggingface
* WHOOPS
* actually this is more correct
2025-02-05 14:39:55 +08:00
geohotstan
057c70b05f
add onnx_helpers to extra and add ort validate to benchmark_onnx ( #8890 )
...
* start
* log severity
* only change this
* change abstraction so it's more usable for huggingface
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-04 16:36:01 -05:00
George Hotz
f484db0e63
dsp cleanups [pr] ( #8866 )
2025-02-03 15:18:53 +08:00
George Hotz
42d7c800a1
hotfix: add missing tinychat fonts + other assets
2025-02-01 09:34:44 +08:00
chenyu
c7ca7959e6
set DISABLE_DROPOUT=1 in bert script for now ( #8799 )
2025-01-29 10:51:29 -05:00
chenyu
c99ae81f63
update default resnet LOSS_SCALER to 256 [pr] ( #8774 )
2025-01-27 16:59:05 -05:00
George Hotz
e82ba1454b
MultiLazyBuffer is UOp [pr] ( #8662 )
...
* MultiLazyBuffer is UOp [pr]
* this is new mlb
* this is the idea
* progress
* multitensor works
* more movement ops
* this
* MultiLazyBuffer is UOp
* cleanups
* multi axis
* fix more tests
* work
* not that
* add multi grad and move shard to ops
* mops not views
* no double contig
* sweet, all mt tests passing
* port old logic
* remove lbs
* fix realized
* whitespace
* assign tweak
* test_assign_kv_cache_multi passes
* fix is_realized
* fix JIT for multi
* just a few more lines i'll pay them back soon i swear please bro just a few more
* no split reduceop for multi
2025-01-24 13:28:55 +09:00
chenyu
eb77488f85
update llama3 70B to use R1 ( #8733 )
2025-01-23 19:06:05 -05:00
chenyu
af65331b76
update beam params for bert green [pr] ( #8726 )
...
increase BEAM_UPCAST_MAX and BEAM_LOCAL_MAX to default, matching red. 3% faster step
2025-01-22 22:00:05 -05:00
chenyu
9a9079118e
envvar BERT_LAYERS [pr] ( #8709 )
...
default is 24 for BERT-large
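Such knobs are typically read with tinygrad's `getenv`; a sketch of the likely shape (only the envvar name comes from the commit, the surrounding code is illustrative):

```python
from tinygrad.helpers import getenv

BERT_LAYERS = getenv("BERT_LAYERS", 24)  # 24 matches BERT-large
```

e.g. something like `BERT_LAYERS=2 MODEL=bert python3 examples/mlperf/model_train.py` would then give a shallow model for quick smoke tests.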
2025-01-21 22:49:19 -05:00
chenyu
9f6d545a16
bert log global_norm in training step [pr] ( #8708 )
...
* bert log global_norm in training step [pr]
and minor cleanups
* .item()
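A hedged sketch of such a metric (the helper body is illustrative; only the global_norm name and `.item()` come from the commit):

```python
from tinygrad import Tensor

def global_norm(params: list[Tensor]) -> Tensor:
    # sqrt of the summed squared gradient elements across every parameter
    return sum(((p.grad * p.grad).sum() for p in params), start=Tensor(0.0)).sqrt()

# .item() realizes the scalar so it can go to a logger as a plain float:
# log({"global_norm": global_norm(params).item()})
```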
2025-01-21 20:36:27 -05:00
chenyu
1e283c33d3
remove realize in bert model init [pr] ( #8707 )
2025-01-21 14:11:03 -05:00
geohotstan
dd82b4c913
make onnx runner a class ( #8647 )
...
* this
* clean up
* more clean ups and improve debug msg
* more correct training toggler
* remove manual training toggling
* change some variable names
* actually just add the training toggle for LIMIT envvar too
* more refinement
* __call__ and OnnxRunner
* fix half of pylint; the other half is importing from onnx while this file is onnx.py, figure out later
* ahhhh found another mistake
* remove limit from __call__
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
chenyu
c49e0fca60
GlobalCounters.reset() in sdxl step [pr] ( #8664 )
2025-01-17 21:10:28 -05:00
chenyu
930728c069
bert BS 72->66 [pr] ( #8621 )
...
BS=72 no longer fits in memory
2025-01-14 18:41:41 -05:00
geohotstan
4abe631b56
fix onnx mobilenetv2-7-quantized.onnx ( #8574 )
...
* is 67% considered fixed?
* move test up
* share function
* add qgemm too
* make sure qgemm comes out as int
* actually that note is not right
* remove qgemm (I did it wrong) and add it later lol.
2025-01-13 09:25:06 -08:00
chenyu
994944920b
simpler batch_load_train_bert [pr] ( #8582 )
...
don't think that buffer was really beneficial; removing it gives 5% faster data_time and 1ms faster per step.
https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/69c9lx8y/overview
2025-01-12 20:25:05 -05:00
George Hotz
4ac4c1415a
free intermediate buffers in the jit [pr] ( #8581 )
...
* free intermediate buffers in the jit [pr]
* intermediates_freed
* deallocate if not allocated
* self._first_run is simpler
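A self-contained toy of the idea, not tinygrad's actual JIT internals (every name here is hypothetical): after capture, buffers that are neither JIT inputs nor outputs can be freed between runs, with a guard so nothing unallocated gets deallocated.

```python
# Toy illustration only; not tinygrad code.
class Buffer:
    def __init__(self, size): self.size, self.data = size, None
    def allocate(self): self.data = bytearray(self.size); return self
    def deallocate(self): self.data = None
    @property
    def allocated(self): return self.data is not None

def free_intermediates(captured: list, inputs: set, outputs: set):
    for buf in captured:
        # the .allocated check covers the "deallocate if not allocated" case
        if buf not in inputs and buf not in outputs and buf.allocated:
            buf.deallocate()
```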
2025-01-12 15:41:41 -08:00
chenyu
def90b22f6
EVAL_BS=36 for bert [pr] ( #8576 )
...
3X faster eval compared to BS=6.
green https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/ka5p5sm9/overview
red https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/a7maxsxd/overview
2025-01-12 09:43:56 -05:00
George Hotz
9833fe83d8
more work on onnx imagenet [pr] ( #8552 )
...
* more work on onnx imagenet [pr]
* working quantization
* static quant
* benchmark onnx 0 dim
2025-01-09 20:28:18 -08:00
George Hotz
e172b759f0
more working ( #8550 )
2025-01-09 18:40:08 -08:00
chenyu
b6be407bc6
fix handcode_opt bert [pr] ( #8509 )
...
* fix handcode_opt bert [pr]
* too slow
2025-01-05 19:14:12 -05:00
George Hotz
24de25b52f
example to benchmark onnx [pr] ( #8459 )
...
* example to benchmark onnx [pr]
* reset global count
2024-12-31 11:38:33 -05:00
qazal
866dfa1f23
create_schedule([x.lazydata]) -> x.schedule() in tests ( #8449 )
2024-12-31 03:15:52 +08:00
Calum
d8b08790b9
Fix examples/conversation.py ( #8425 )
...
* fix: conversation example
* remove slice func
* remove unused import
* use Tensor.split
2024-12-26 12:45:19 -05:00
chenyu
4712847766
make self_tokenize output more like a python file ( #8411 )
...
use a comment for the file name and join with newlines instead of null bytes when exporting to a file
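A hedged sketch of the described output format (the function body is illustrative):

```python
def self_tokenize(paths: list[str]) -> str:
    # prefix each file with a comment naming it, then join with newlines
    # (previously a null byte) so the result reads like one big .py file
    chunks = []
    for p in paths:
        with open(p) as f:
            chunks.append(f"# {p}\n{f.read()}")
    return "\n".join(chunks)
```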
2024-12-25 14:16:30 -05:00
chenyu
a35eef8d58
optionally output to file in self_tokenize.py ( #8399 )
...
can paste the whole tinygrad repo into Gemini this way
2024-12-24 21:09:26 -05:00
Harald Schäfer
7059459648
Openpilot compile: fix for openpilot use ( #8338 )
...
* compile3 changes
* merge conflict
* merge conflict
* give dm npy for now
* Revert "give dm npy for now"
This reverts commit bfd980da7d2c2bab5b073127442c361922032ba1.
* updates
* Always float32 floats
* Update compile3.py
* Update compile3.py
---------
Co-authored-by: ZwX1616 <zwx1616@gmail.com>
2024-12-19 19:43:15 -05:00
George Hotz
8f95b578f6
use Estimates class [pr] ( #8319 )
...
* use Estimates class [pr]
* frozen dataclass
2024-12-18 10:19:32 -08:00
George Hotz
37fa38d272
Revert "switch beautiful_mnist to use new optimizer [pr] ( #8231 )" ( #8233 )
...
This reverts commit e9ee39df22.
2024-12-13 19:07:09 -08:00
George Hotz
e9ee39df22
switch beautiful_mnist to use new optimizer [pr] ( #8231 )
...
* switch beautiful_mnist to use new optimizer [pr]
* fix abstractions3 + docs
* fix OptimizerGroup with schedule_step api
2024-12-13 18:27:16 -08:00
Ahmed Harmouche
651f72442c
encapsulate the exported webgpu model ( #8203 )
2024-12-13 10:55:37 +01:00
chenyu
64a917b7eb
remove LAZYCACHE ContextVar [pr] ( #8175 )
...
also removed it from the latest resnet script
2024-12-11 22:02:52 -05:00
chenyu
26e049ab40
add ALLOWED_READ_IMAGE=2131 to openpilot ( #8166 )
...
added as an exact-number check for now, since it's not clear whether more or fewer than allowed is any better
2024-12-11 12:14:17 -08:00
Maxim Zakharov
e53a5bf0c3
Stable Diffusion UI - convenient send via Enter ( #8160 )
2024-12-11 19:05:24 +01:00
George Hotz
f83d715f41
move checks into compile3, delete compile2 [pr] ( #8127 )
...
* move checks into compile3 [pr]
* test_vs_onnx
* test v torch works
* float16 won't compile on compile3
* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
a773c5a571
hotfix: default llama3 is 1B with download_model
2024-12-09 07:23:35 -08:00
Ahmed Harmouche
c6277fce09
Remove f16 decompression lib from SD compile.py ( #8121 )
...
* Remove f16-to-f32-gpu lib, use tinygrad exported decompression
* No need to create new instance
2024-12-09 14:09:00 +01:00
George Hotz
00ac0db9d4
np tensors have the memory from numpy in compile3 [pr] ( #8098 )
2024-12-07 14:01:51 +08:00
George Hotz
22feb3a2f1
move copy into the JIT for openpilot compile3 ( #7937 )
...
* move copy into the JIT, test fails
* ahh, prune was the issue
2024-12-07 13:26:26 +08:00