tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-12 00:18:43 -05:00

Author	SHA1	Message	Date
George Hotz	f6b661eb3c	imports	2025-09-05 15:34:49 -07:00
George Hotz	5cf42dc4db	add Scheduler to replace Kernel with POSTOPT=2 (#11924) * ** simple kernel to replace Kernel for postopt * support old * fix beam * beaming * beam on old * bring tensor cores back * raise * postbeam * test ops passes on mac * skip that * postopt default * gate that * fix tensor cores * a few test fixes * dsp fix * tc fix * loop * support swap * test_gemv * fix beam for variable * test opts from high level stuff * range annoying * compile slow * metal slow * better beam * no POSTBEAM * fix nolocals * hc opt mostly works * put that back * lil * some work * fix that * POSTOPT 2 * fix tests * no postopt 2 * work * back * padded tensors cores * shift_to * postopt 0 passes? * write PADTO * fix padded tensor cores * compare hcopt * 18000 lines * should pass tests * fix rangeify * put types back	2025-09-03 19:23:30 -07:00
George Hotz	a5f2b4872a	use_tensor_cores is a heuristic (#11989) * use_tensor_cores is a heuristic * context	2025-09-03 17:05:10 -07:00
George Hotz	394c2d1db1	update Kernel API in tests + move optimize_local_size (#11907)	2025-08-28 15:12:47 -07:00
geohotstan	4e8370309c	Support onnx If OP (#11648) * start * tiny clean up * whoops, didn't mean to accidentally fix this * fix .to(device), kinda hacky and this fix makes it slower? * merge properly * FINALLY figured out slowness, also hack pylint for now * add DEBUGONNX print for subgraph * oops * WOOOOOOOO SHAPE CACHE 50% SPEED INCREASE * small fix, but maybe all deterministic Tensor creation in fp should be cached * cache condition * sliiiightly cleaner * better abstraction? * remove sam from model_benchmark * remove shape cache speed up for now * less lines * isinstance fix --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-08-28 10:17:35 -04:00
Sieds Lykles	d39365809a	add ctx to z3_renderer arg (#11867) * add ctx to z3_renderer arg * update symbolic fuzzer * rewrite u1,u2,u3 * update fuzz_fast_idiv * remove imports	2025-08-27 03:38:15 +02:00
George Hotz	12ab3f8b06	correct row_count in process replay (#11748)	2025-08-19 22:21:07 -07:00
George Hotz	8af8808c61	cleanup tests, bump caches (#11746)	2025-08-19 21:21:07 -07:00
George Hotz	1d307f568c	move device tests to test/device + test cleanups (#11735) * move device tests to test/device * test speedups * test device * linalg to unit * upd * so pytest just works * more divide and skip * speed * test devectorize * add pillow	2025-08-19 16:02:20 -07:00
chenyu	d0d39885c3	onnx in tinygrad (#11675)	2025-08-14 19:57:21 -04:00
George Hotz	d2521d828a	transcendental+idiv+threefry are uop decompositions (#11636) * transcendental+idiv+threefry are uop decompositions [pr] * threefry decomp * fix randomness tests * fix webgpu * unneeded now * fix * move prematcher * all cast should probably be cast_vec	2025-08-13 09:37:12 -07:00
geohotstan	925555b62a	Fix onnx Domain bug (#11650)	2025-08-13 08:20:50 -07:00
Sieds Lykles	4d6e407eb0	Extend fast_idiv to negative ints (#11632) * fast idiv for signed ints * Add rule and test * fix tests * redo fuzz_fast_idiv to do negative ints as well * adjust comments * remove unused imports	2025-08-12 19:34:49 +02:00
geohotstan	ad9dec25b3	combine onnx parser and onnx (#11485) * start * more * fix onnx_runner test * pass * patch for disk and add domains from huggingface * simpler docs * revert domain changes * rerun ci * revert onnx ops test change * add fix from strenum stuff * correct way * revert correct way to leave the fix for another PR * test segfault * Revert "test segfault" This reverts commit `4e1aaf41e7`. * remove some unnecessary documentation * test segfault again * Revert "test segfault again" This reverts commit `56fc5f03e7`. * try gemini suggested patch for sys._getframe * keep trying with gemini * revert not working gemini suggestions and try faulthandler * remove pythonfaulthandler * trigger CI a few times * minimize diff --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-08-12 12:56:39 -04:00
geohotstan	27bcb9fd1c	Support cubic mode for ONNX Resize OP (#11612) * start * add reference * this is so much slower * this makes sense but differs from official impl, but results are still correct..? * add a comment * Just keep it simple for now since I don't fully get it yet * address comments * correct * teeny clean up * another small comment improvement lol	2025-08-11 11:49:30 -04:00
geohotstan	b0dab6a4cd	onnx Resize OP clean up (#11603) * start * slight clean up	2025-08-10 14:10:39 -04:00
Sieds Lykles	10d388499d	Refactor optional.py (#11578) * move fast_idiv to transcendental * move optional.py * adjust comment * change import * mypy needs this?	2025-08-09 17:35:05 +02:00
qazal	16f0edbe90	pass opts arg in get_program process replay [pr] (#11571) * fix ptx process replay * keyword arg * renderer is also optional [pr] * test_linearizer fixup * name function order is args,ret,kwargs * can use opts_to_apply * pass through p.applied_opts * sink_arg * now it opens devices too	2025-08-08 03:05:09 +03:00
George Hotz	82be8abfd2	move opt under codegen (#11569)	2025-08-07 14:19:17 -07:00
George Hotz	80d9cced07	more test cleanups (#11544) * more test cleanups * revert that	2025-08-06 15:05:21 -07:00
leopf	4f0ee4e982	BPE tokenizer (#11415) * BPE works * refactor tok * oops * basic tests * fix eval * smaller diff * fix error * proper vocab decoding * use regex for splitting * escape ucatrange * full compat --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-08-04 09:52:38 -07:00
b1tg	06af9f9236	fix double exception + add name,loc in error msg (#11487) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-04 13:41:23 +03:00
nimlgen	d38d285489	ci: add h machines (#11416) * ci: add h machines * more * fix names * names not collide * 20 * 10	2025-07-29 19:21:51 +03:00
George Hotz	7f0a41df4d	move optional out of devectorize [pr] (#11350) * move optional out of devectorize [pr] * fast idiv	2025-07-23 11:26:05 -07:00
chenyu	960da9319d	Remove StrEnum in onnx for python 3.10 (#11345) some training tests failed looks like parsing error?	2025-07-23 11:52:25 -04:00
nimlgen	304eb9cecb	allocate less memory in am tests (#11342)	2025-07-23 11:11:26 +03:00
geohotstan	445ff8de56	ONNX onnx_parser and buffer_parse clean up (#11000) * start * remove onnx.load from compile4 and move np to dropout * clean up and enable test * clean up * move WebGPU ONNX test into MacOS (WebGPU) * leave test in ONNX (CPU) * fix raw_data init None, and simplify onnx_runner test a little? * THESE TESTS ARE SO UGLY UGHH * need to really think about how to structure the test * wow LLMs are quite something * not always on disk now * also add external data loading test * cleaner tests * minimize diff and add const folding tests * add external data loading too * whoops add webgpu back.. but why was it not needed in the first place? * better comment * move webgpu test to macos(webgpu)? * llm english so much better than me wow * trigger CI to check flakiness --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-21 15:10:25 -04:00
George Hotz	842184a1ab	rename kernelize to schedule, try 2 (#11305)	2025-07-21 11:18:36 -07:00
qazal	3002c63b1e	process replay: optionally pass tinygrad import error (#11289) * process replay: optionally pass tinygrad import error * gate all tinygrad internals * s/getenv/os.getenv pre import * diff	2025-07-20 22:57:56 +03:00
geohotstan	536b254df4	Bump onnx to 1.18.0 (#11266) * bump * thou hast implement functions * hacked in domain support * some clean ups * hack quantize_onnx_test too * add helper lol, why onnx tests why * better dispatcher, but need tests and better naming * flaky ci * change some names * small clean ups * make it easier to clean up tests once ORT supports 1.18.0 * nits * fix bug of Softmax_1 being registered in onnx_ops * need a default value * resolve_const is better name * fix OnnxRunner.to * use proper domain names	2025-07-17 15:35:41 -04:00
leopf	557ca7d757	testing SimpleTokenizer against OASST1 (#11214)	2025-07-14 17:09:31 -07:00
chenyu	a0438012af	remove Kernel.get_program [pr] (#11203)	2025-07-12 20:50:29 -04:00
chenyu	73caa5dd1b	remove Kernel.membufs [pr] (#11200)	2025-07-12 14:48:47 -04:00
geohotstan	5ce278b245	OnnxRunner file as input (#10789) * file path as input and have parse be in OnnxRunner.__init__ * modelproto_to_onnxrunner -> modelproto_to_runner * whoops, fix import * oh flakiness again, is it because it's getting gc-ed? * small changes * CI flaky so just move compile4 fix in * copy typing of onnx_load * actually can just import onnx_load instead of onnx.load * fix external_benchmark_openpilot * fix onnx_runner test to use onnx_helper * rerun CI * try run_modelproto * spam CI a few times * revert run_modelproto since that's flaky also * no external onnx_load usage except onnx.py * cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why? * model_benchmark 193s -> 80s, add OnnxRunner.to()... * minimize diff and clean up * device can be None, weird but eh --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-12 14:27:46 -04:00
chenyu	6283d50224	DEPRECATED_linearize -> to_program [pr] (#11198)	2025-07-12 13:46:20 -04:00
nimlgen	b6981404ed	memory: use page shifts in memory manager (#11149) * memory: use page shifts in memory manager * fix	2025-07-09 22:05:00 +03:00
George Hotz	2893feb9f6	cleanups for kernel.py (#11143) * cleanups for kernel.py * fixups	2025-07-08 18:10:25 -07:00
chenyu	dada3f5bf3	skip some new onnx tests (#11135) these fails on master with latest onnx	2025-07-08 16:12:48 -04:00
George Hotz	f7d4638e05	start LLM app, tons of clean up required. target is 200 line ollama (#11068) * start LLM app, tons of clean up required. target is 200 line ollama * kind of works * simpler * add k/v cache * with SYM=1, it loops * no rope cache * simpler * more cleanups * cleanups * works * argparse and comments * from gguf * generate is a function * no copy from cpu * fix max context pass in * test * improve test * ai2_arc * fix 8B, use less ram * 136 lines	2025-07-07 17:09:46 -07:00
nimlgen	01f3c4f44d	memory: simpler paddr allocation logic (#11090) * memory: new paddr allocation logic * am fix * am refactrros * fix * mypy * use it * am	2025-07-04 17:00:36 +03:00
qazal	ad155f5454	print inputs to get_program in process replay [pr] (#11051) * print inputs to get_program in process replay [pr] * colors * keep dataclass default escapes * Revert "keep dataclass default escapes" This reverts commit `c6db7e8a7a`. * note for ast_repr * add that back	2025-07-02 20:20:01 +03:00
qazal	452b22c9b6	fix process replay diff in PYTHON device [pr] (#11052) * fix process replay diff in PYTHON device [pr] The PYTHON backend pickles and encodes UOps, the encoded binary can't be directly diffed in process replay. * note	2025-07-02 11:06:46 +03:00
geohotstan	8ebf0abaae	ONNX external_test_onnx_backend use PYTHON device for model (#10915) * try * ruff check --fix * no skip test * hmmmmmmm I don't get this D: * run CI again * why is PYTHON device faster than CPU? * run ci again and fix lint * actually doesn't PYTHON device make sense here? * see cpu speed again * Revert "see cpu speed again" This reverts commit `1e366f2256`. * trigger CI * pretty good --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-01 12:11:17 -04:00
qazal	712980e167	fix extract_dataset + add tests to CI (#10995) * fix extract_dataset + tests * add CI * sops.gz itself is same as master * yml + gzip -c + ge * don't commit that * bump limit to 1000 * axis=7 * test_tiny	2025-06-27 01:51:36 +03:00
geohotstan	50936b4a18	ONNX real float16 (#10694) * squash commits * temp fix for const tensor * actually realizing float16 can only happen in raw_data * .float -> cast(float) to rerun CI --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-06-26 14:05:12 -04:00
chenyu	8751d47985	CosineAnnealingLRWithWarmup (#10981)	2025-06-25 17:45:21 -04:00
Ignacio Sica	21f1c4cc09	remove some linearize calls from tests [pr] (#10978) * remove some linearize calls from tests speed_compare_cuda_ptx test_uop_spec test_linearizer test_uops test_winograd * more clear assert message	2025-06-25 12:37:17 -07:00
qazal	de4b9bf53b	add opts_to_apply option to AST KernelInfo (#10950) * proposal: add option to override opts in the get_program API * update test_linearizer_rewrite * state in uops * update process_replay and names * empty isn't none * fix process replay	2025-06-24 18:55:39 +03:00
qazal	7a5e4e0bf1	fix unittests process replay [pr] (#10947)	2025-06-24 10:30:23 +03:00
George Hotz	ae4d2d71b4	bump line count to 14500	2025-06-23 15:32:27 -07:00

837 Commits