tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 15:08:02 -05:00

Author	SHA1	Message	Date
George Hotz	8ff03806e8	add llama layers (#11460 ) * add llama layers * add contig bw for speed	2025-07-31 16:28:04 -07:00
wozeparrot	6252f7770e	feat: fake data (#11447 )	2025-07-30 17:18:20 -07:00
chenyu	e300451f3a	update llama3 (#11446 ) `LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7	2025-07-30 19:34:21 -04:00
wozeparrot	5fb975351a	feat: flag for training on val (#11441 )	2025-07-30 14:29:45 -07:00
wozeparrot	825b6a2505	feat: llama3 dataloader (#11340 )	2025-07-30 13:27:55 -07:00
George Hotz	842184a1ab	rename kernelize to schedule, try 2 (#11305 )	2025-07-21 11:18:36 -07:00
nimlgen	cc3c1e4c14	hcq: move cpu to hcq (#11262 ) * hcq: move cpu to hcq * import time * upd * fix * windows support * hm * cleaner * fix timer * fix timing * std is ns * skip profiler * mypy * cleaner * cleanups * after merge * default is back	2025-07-21 15:10:38 +03:00
chenyu	85ddd72038	simpler grouptop in hcopt (#11219 ) * simpler grouptop in hcopt keep the only perf relevant conditions and the rest is handled by try except * update openpilot read image count	2025-07-13 16:06:09 -04:00
chenyu	a0438012af	remove Kernel.get_program [pr] (#11203 )	2025-07-12 20:50:29 -04:00
geohotstan	5ce278b245	OnnxRunner file as input (#10789 ) * file path as input and have parse be in OnnxRunner.__init__ * modelproto_to_onnxrunner -> modelproto_to_runner * whoops, fix import * oh flakiness again, is it because it's getting gc-ed? * small changes * CI flaky so just move compile4 fix in * copy typing of onnx_load * actually can just import onnx_load instead of onnx.load * fix external_benchmark_openpilot * fix onnx_runner test to use onnx_helper * rerun CI * try run_modelproto * spam CI a few times * revert run_modelproto since that's flaky also * no external onnx_load usage except onnx.py * cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why? * model_benchmark 193s -> 80s, add OnnxRunner.to()... * minimize diff and clean up * device can be None, weird but eh --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-12 14:27:46 -04:00
chenyu	b072be0e2d	hotfix whisper main script (#11184 )	2025-07-11 12:34:00 -04:00
Nino Risteski	bc15e98f5c	clean up unused imports in examples and update CI linting (#11024 ) * clean up unused imports in examples * enable unused import checking in examples * lint * ignore F541 and F841 - focus on unused imports only * clean up * restore tinygrad.frontend.torch for TINY_BACKEND * tiny change	2025-06-30 08:21:27 -07:00
chenyu	c14c9a8eff	llama3 grad clip (#11003 )	2025-06-27 19:14:12 -04:00
chenyu	f2548afeb5	bert grad clipping start with const 0 (#11008 ) saved the init kernels	2025-06-27 18:02:23 -04:00
chenyu	6ab5a5cb6c	llama3 mlperf train (#10983 ) work in progress. now it can overfit small examples and vram roughly matches	2025-06-26 20:24:27 -04:00
geohotstan	50936b4a18	ONNX real float16 (#10694 ) * squash commits * temp fix for const tensor * actually realizing float16 can only happen in raw_data * .float -> cast(float) to rerun CI --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-06-26 14:05:12 -04:00
chenyu	8751d47985	CosineAnnealingLRWithWarmup (#10981 )	2025-06-25 17:45:21 -04:00
chenyu	efad567ebd	ruff check whole `examples/mlperf/` (#10979 )	2025-06-25 12:57:48 -04:00
Alexey Zaytsev	230ad3a460	[bounty] Don't use numpy inside hlb_cifar10 training loop (#10777 ) * Don't use numpy inside hlb_cifar10 training loop * Lint it * jit it * Drop the last half-batch * Use gather for random_crop and reuse perms * Wrap train_cifar in FUSE_ARANGE context * No need to pass FUSE_ARANGE=1 to hlb_cifar10.py * Add cutmix to jittable augmentations * Remove .contiguous() from fetch_batches * Fix indexing boundary --------- Co-authored-by: Irwin1138 <irwin1139@gmail.com>	2025-06-23 17:24:56 -07:00
chenyu	3699d1d3ba	hotfix llama3 temperature is float (#10938 )	2025-06-23 15:20:56 -04:00
chenyu	0480139def	log_perplexity metrics (#10912 )	2025-06-21 10:44:47 -04:00
George Hotz	b41e0563a3	move stuff to kernelize folder (#10902 ) * move stuff to kernelize folder * oops, forgot that	2025-06-20 16:10:20 -07:00
George Hotz	92678e59ee	move kernel to opt (#10899 )	2025-06-20 15:22:28 -07:00
chenyu	62a540066e	remove DEBUG=2 in mi300x bert setup (#10886 ) seems fine now, not sure what the issue was	2025-06-19 13:28:53 -04:00
Nino Risteski	5a56710ff4	small fix replacing download_file with fetch (#10877 ) * imported a missing os and replaced download_file with fetch from tg helpers * use fetch directly * Remove if not os.path.isfile	2025-06-19 12:12:09 -04:00
chenyu	8d721a4ead	add 405B params to llama3.py (#10884 ) tested with `python examples/llama3.py --model /raid/weights/llama31_405b/ --size 405B --shard 8 --benchmark` on tinyamd2	2025-06-19 11:45:37 -04:00
chenyu	f377cc19cd	use AM for bert (#10882 ) have trained 3 runs and all seem fine	2025-06-19 09:48:54 -04:00
chenyu	b70c7d3631	bert grad accumulation (#10863 ) * bert grad accumulation * realize grad	2025-06-18 12:17:07 -04:00
George Hotz	cba6e15937	split grouper and kernelize [pr] (#10854 )	2025-06-17 17:54:20 -07:00
chenyu	075a74cf25	add global_batch_size to mlperf bert (#10852 ) global_batch_size = grad_acc_steps * batch_size. no-op change to prep grad acc for bert	2025-06-17 17:54:15 -04:00
chenyu	7d5c769c6b	fix compile4 (#10797 )	2025-06-12 22:28:56 -04:00
chenyu	81e296d7b8	remove Tensor.test() in retinanet (#10770 ) test was removed	2025-06-10 22:14:57 -04:00
George Hotz	acf72872b3	move view left to the outer graph prereqs + testing (#10725 ) * move view left to the outer graph * global view right * dont need that one * remove comment * test kernelize * simple * split onnx, test sdxl null * fix testing * ugh, wrong one * Update test.yml	2025-06-09 20:43:25 -07:00
b1tg	24d328e313	onnx parser (#10435 ) * onnx parser * fix compile, lint * onnx.load -> onnx_load * compatible with ModelProto * fix test external_test_onnx_ops.py * fix tests * fix signed int * reduce to 261 lines * fix TypeProto.Optional * debug for _parse_message, add TypeProto.Sequence, cleanup * onnx_load from Tensor * remove BufferedReader * 174 lines and reduce tensor copy * cleanup * use onnx_load in external_model_benchmark.py * fix qcom test * [onnx] parser support external data --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-06-09 12:44:28 -04:00
Sieds Lykles	cfa65bea05	Subtract 1 from Variable upper bound (#10715 )	2025-06-09 09:25:53 -07:00
George Hotz	32e9949052	rename lazydata to uop (#10698 )	2025-06-08 08:42:22 -07:00
chenyu	e88fe41d37	update vits vctk model to use download from huggingface (#10688 ) google drive points to a warning page that does not work	2025-06-07 20:47:28 -04:00
Sieds Lykles	c29a56dd51	Fix whisper OOB (#10685 ) * fix whisper and test * remove import	2025-06-07 20:23:50 -04:00
Sieds Lykles	2f605eadf7	fix oob (#10666 )	2025-06-07 11:32:03 -04:00
wozeparrot	0d86f8d375	fix failed threefry (#10646 )	2025-06-05 17:17:42 -07:00
chenyu	4ab3391e6f	`set -o pipefail` for mlperf run_and_time (#10577 ) also run the 5.1 script in ci cron job	2025-05-30 16:36:44 -04:00
chenyu	baf482d314	copy mlperf stuff to 5.1 (#10576 ) 5.0 is finalized, new changes go to 5.1	2025-05-30 16:12:39 -04:00
George Hotz	b3b43a82c4	remove Tensor.no_grad, it's meaningless now [pr] (#10556 )	2025-05-28 22:20:02 -07:00
George Hotz	e4e7b5d7e1	continue work on beautiful cifar (#10555 )	2025-05-28 21:42:01 -07:00
George Hotz	871df1436a	more beautiful cifar (#10551 ) * enumerate cases of Tensors in the JIT * optional fused optimizers * add fused optimizer test * move that there * ugh * work on beautiful_cifar * speed close to hlb_cifar * schedule to corealize all * one line sched step * less lines	2025-05-28 20:48:20 -07:00
chenyu	74cf5dbd9e	mlperf system updates (#10550 ) standardized processor and accelerator names	2025-05-28 16:15:46 -04:00
chenyu	51dc7eedb0	correct use AM for resnet run_and_time (#10524 )	2025-05-26 15:33:11 -04:00
chenyu	c1919ad55f	use AM for resnet run_and_time (#10523 )	2025-05-26 14:50:49 -04:00
chenyu	2d50efb92b	`set -e` on mlperf run_and_time scripts (#10519 )	2025-05-26 09:22:30 -04:00
chenyu	dc6309242d	WallTimeEvent for mlperf ci (#10506 )	2025-05-24 10:56:03 -04:00

1138 Commits