tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 06:48:22 -05:00

Author	SHA1	Message	Date
George Hotz	5d28a202b5	make tinychat local (#7871 )	2024-11-24 14:45:48 +08:00
chenyu	22d5def113	download llama3 70B (#7868 ) use "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF". ``` PYTHONPATH=. JITBEAM=2 python3 examples/llama3.py --download_model --size 70B --quantize int8 --benchmark ``` on M4 Max, 40 sec to load the model and ``` enqueue in 165.15 ms total 328.54 ms, 3.04 tok/s, 247.46 GB/s, param 221.20 GB/s enqueue in 5.31 ms total 168.48 ms, 5.94 tok/s, 482.54 GB/s, param 431.34 GB/s enqueue in 5.32 ms total 168.77 ms, 5.93 tok/s, 481.71 GB/s, param 430.60 GB/s enqueue in 5.69 ms total 169.51 ms, 5.90 tok/s, 479.61 GB/s, param 428.72 GB/s enqueue in 5.41 ms total 168.60 ms, 5.93 tok/s, 482.20 GB/s, param 431.04 GB/s enqueue in 5.18 ms total 168.98 ms, 5.92 tok/s, 481.12 GB/s, param 430.08 GB/s enqueue in 5.43 ms total 168.82 ms, 5.92 tok/s, 481.59 GB/s, param 430.49 GB/s enqueue in 5.27 ms total 168.94 ms, 5.92 tok/s, 481.23 GB/s, param 430.17 GB/s ```	2024-11-23 12:18:31 -05:00
George Hotz	144e9f00df	viz is local, new test, and new quantize [pr] (#7859 ) * viz is local, new test, and new quantize [pr] * fix mime types * remove font * after index	2024-11-23 14:27:10 +08:00
qazal	9828277c03	view doesn't have buffer, fix the tests [pr] (#7841 ) * view doesn't have buffer, fix the tests [pr] * need assigns	2024-11-22 20:41:55 +08:00
chenyu	69e382216d	fix wino conv output dtype for half inputs (#7829 )	2024-11-21 12:13:54 -05:00
George Hotz	fbb4099b3c	add test for compile3 [pr] (#7783 ) Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-11-19 19:26:51 +08:00
chenyu	73ea913050	really not using numpy in gpt2 example (#7779 )	2024-11-18 23:21:16 -05:00
chenyu	e6debda5c4	remove numpy from gpt2 and llama examples (#7778 )	2024-11-18 22:48:17 -05:00
ignaciosica	597a239e28	Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725 ) * remove unaryops * remove ternaryops * remove metaops * hotfix * remove binaryops * hotfix: test_pattern_matcher --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2024-11-16 20:56:56 +08:00
geohotstan	f8056a74d6	combine pad2d with pad (#7677 ) * I have pad2d, I have pad, uuh~, pad2dpad~ * fix some small things * strategically placed cast hack * fix more * fix more more * tests * periods	2024-11-14 17:56:02 +08:00
chenyu	4c5f7ddf1f	flux set model path in args (#7660 ) in addition to default downloading through fetch, add an arg to pass model path directly	2024-11-12 22:11:40 -05:00
Harald Schäfer	e7cbc29f48	openpilot benchmark: add cast from numpy to benchmark (#7593 ) * openpilot benchmark: add cast from numpy to benchmark * whitespace * comment	2024-11-08 19:31:00 +08:00
Anthony DeMattos	953ef1b57e	tinychat ui +/- 20 lines (#7471 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-11-06 14:23:55 +08:00
George Hotz	c8bf09b7d4	s/UOps/Ops (#7500 ) * s/UOps/Ops [pr] * fix	2024-11-03 11:26:10 +08:00
George Hotz	72a9ac27e9	support image dtype in cloud [pr] (#7482 ) * support image dtype in cloud [pr] * remove outdated osx hack * unused imports	2024-11-02 23:54:27 +08:00
Tobias Fischer	7c9a1d69f9	sdxl gen fix (#7459 )	2024-11-01 13:57:01 -04:00
gonutz	e7cbc6dc23	Fix ValueError in Yolo 8 example (#7387 ) Calling python3 examples/yolov8.py ./test/models/efficientnet/Chicken.jpg used to result in this error ValueError: Calling nonzero on 0d arrays is not allowed. Using np.atleast_1d makes sure we avoid a zero-dimension array. Co-authored-by: gonutz <gonutz@fake.mail>	2024-10-30 10:18:39 +08:00
George Hotz	3989bd2682	idiv + reciprocal [pr] (#7354 ) * idiv + reciprocal * remove upcast from div * fix docs	2024-10-29 15:54:19 +08:00
chenyu	4a03e00aa1	fix llama3 download_model assert (#7320 ) false positive if download_model and model are not provided	2024-10-27 11:20:24 -04:00
eliotgolding	e920f1d663	Llama 3.2 1B load from GGUF (#7295 ) * gguf 1b-instruct * not needed	2024-10-27 09:29:02 +08:00
George Hotz	dc3148c677	hotfix: minor speed increase + stable diffusion relax	2024-10-25 16:27:21 +08:00
leopf	87877d7a91	GGUF cleanup (#7192 ) * cleanup * remove vocab size hard code	2024-10-21 10:44:54 -04:00
leopf	b6d9b276bb	GGUF support (#7046 ) * basic loader, untested * testing * remove utils import in test * q8_0 * q4_1 * end to end testing * minor cleanup * fix casting * moved to state * move tests * move dequant to fn * fix lint elif * remove gguf from extra * fix dict union * q6_k simpler * naming and spacing * gpt2-gguf example * cleanup * move gguf example * minor cleanup --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-10-21 16:15:34 +08:00
qazal	30989fb459	changes from the big graph branch [pr] (#7160 ) * metaops srcs * delete multioutput ctx var * always has metadata * shorter path for realized * this still needs inputs This reverts commit `a59cbb2886`.	2024-10-19 16:22:37 +03:00
Francis Lata	90eff347e2	tinytqdm write support (#6359 ) * add write support * add test * update test case to compare write outputs * assert final write output * flush when using write * update write logic * Revert "update write logic" This reverts commit `5e0e611b46`. --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-10-16 14:51:41 -04:00
George Hotz	3169cb386d	remove graph [pr] (#7085 )	2024-10-16 11:40:07 +08:00
George Hotz	26df50cf43	move memory_planner to memory.py [pr] (#7079 )	2024-10-16 10:04:35 +08:00
chenyu	ed1ed9e4ff	bert use BS=72 (#7015 ) memory 131 -> 138 green tflops 201 -> 209 red tflops 160 -> 169	2024-10-12 09:41:56 -04:00
George Hotz	a71bb09ec3	remove symbolic file [pr] (#7012 )	2024-10-12 18:44:44 +08:00
George Hotz	5c9f76e274	hotfix: openpilot compile3 compare to i==1	2024-10-12 09:44:24 +08:00
chenyu	36056e0760	update mlperf systems and copy 4.1 to 5.0 (#7004 )	2024-10-11 16:20:34 -04:00
chenyu	0e42662f2a	log seed at the right place for bert (#7000 )	2024-10-11 10:39:40 -04:00
nimlgen	5496a36536	update red mlperf bert readme (#6969 )	2024-10-11 13:08:06 +03:00
Friedrich Carl Eichenroth	859d6d0407	Fix mypy examples/beautiful_.py (#6978 ) fix mypy examples/beautiful_.py backwards * add test * Revert "add test" This reverts commit `4d88845ba3`. --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-10-10 11:34:29 -04:00
Kinvert	960c495755	added beautiful fashion mnist and example (#6961 ) * added beautiful fashion mnist and example * fixing whitespace * refactor Fashion MNIST to fewer lines * fix newline to reduce diff * Update beautiful_mnist.py * Update beautiful_mnist.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-10-10 12:01:07 +08:00
chenyu	b5546912e2	10% more TRAIN_STEPS for bert (#6971 ) got two very close run, adding more steps for buffer	2024-10-09 19:21:43 -04:00
chenyu	35cf48659b	limit beam param for bert on green (#6966 ) seems to mitigate the crash	2024-10-09 11:48:18 -04:00
chenyu	1ff2c98f8a	fix logfile name for bert red (#6952 )	2024-10-08 05:37:52 -04:00
chenyu	a78c96273a	update bert epoch logging (#6940 ) * update bert epoch logging epoch for bert is simply number of examples seen (which is used for RCP check) * update total steps too * more changes	2024-10-08 00:34:06 -04:00
chenyu	102dfe5510	back to 210 for bert loss scaler (#6934 ) getting 2 NaN for this, revert back to 210	2024-10-07 10:17:21 -04:00
chenyu	0cf815a93a	bert use BS=66 and update hparams (#6932 ) with dropout memory improvement, we can fit BS=66 now. revert back to the hparams in #5891 too	2024-10-07 05:08:27 -04:00
chenyu	718b959349	log epoch start and stop for bert (#6912 )	2024-10-06 06:39:46 -04:00
chenyu	16c1fa4208	use BEAM=3 for red box bert runs (#6904 ) BEAM=4 slightly exceeded 30 minutes setup	2024-10-05 09:21:12 -04:00
chenyu	0e706227a2	add seed to bert result log filename (#6903 ) * add seed to bert result log filename * different name for different benchmark	2024-10-05 09:15:24 -04:00
George Hotz	f4ec39fe58	switch symbolic from old to uops, final PR (#6872 ) * switch symbolic from old to uops, final PR * two wrong answers * not needed resolves * symbolic ops passes * symbolic ops passes * progress * tests pass (almost) * fix last test * fix some tests * global binding and unbinding * Revert "global binding and unbinding" This reverts commit `9456725630`. * that test works now * vars on uop doesn't recurse * fix fuzzer * update * fix type * fix gpt, it's UOp now * ssimplify symbolics	2024-10-04 16:42:27 +08:00
chenyu	7391376528	update bert hparams (#6876 ) 4h32m with this https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/q99frv1l/overview. loss scaler 213->210. matched the closest submission, no nan for ~10 runs. increased lr and total step a bit. `PARALLEL=0` after setup, same as resnet.	2024-10-04 00:39:06 -04:00
chenyu	5f77217772	bert default CKPT to 0 (#6840 ) not required	2024-10-01 21:55:56 -04:00
George Hotz	547733e57c	stunning_mnist [run_process_replay] (#6828 ) * stunning_mnist [run_process_replay] * add loss to stunning mnist	2024-10-01 15:00:48 +08:00
chenyu	f59517754e	add RESET_STEP in bert to control reset (#6818 ) same as resnet	2024-09-30 09:39:04 -04:00
George Hotz	2ed94e447f	gpt2: corealize opt and loss	2024-09-30 09:11:20 +08:00

897 Commits