chenyu
6283d50224
DEPRECATED_linearize -> to_program [pr] (#11198)
2025-07-12 13:46:20 -04:00
nimlgen
b6981404ed
memory: use page shifts in memory manager (#11149)
...
* memory: use page shifts in memory manager
* fix
2025-07-09 22:05:00 +03:00
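The page-shift change above uses a standard trick: with power-of-two pages, page numbers and offsets come from shifts and masks instead of division. A minimal illustrative sketch (not tinygrad's actual MemoryManager code; the 4 KiB page size is an assumption):

```python
PAGE_SHIFT = 12             # hypothetical 4 KiB pages: 1 << 12 == 4096
PAGE_SIZE = 1 << PAGE_SHIFT

def page_number(addr: int) -> int:
  # equivalent to addr // PAGE_SIZE, expressed as a bit shift
  return addr >> PAGE_SHIFT

def page_offset(addr: int) -> int:
  # equivalent to addr % PAGE_SIZE, expressed as a bit mask
  return addr & (PAGE_SIZE - 1)
```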
George Hotz
2893feb9f6
cleanups for kernel.py (#11143)
...
* cleanups for kernel.py
* fixups
2025-07-08 18:10:25 -07:00
chenyu
dada3f5bf3
skip some new onnx tests (#11135)
...
these fail on master with latest onnx
2025-07-08 16:12:48 -04:00
George Hotz
f7d4638e05
start LLM app, tons of clean up required. target is 200 line ollama (#11068)
...
* start LLM app, tons of clean up required. target is 200 line ollama
* kind of works
* simpler
* add k/v cache
* with SYM=1, it loops
* no rope cache
* simpler
* more cleanups
* cleanups
* works
* argparse and comments
* from gguf
* generate is a function
* no copy from cpu
* fix max context pass in
* test
* improve test
* ai2_arc
* fix 8B, use less ram
* 136 lines
2025-07-07 17:09:46 -07:00
nimlgen
01f3c4f44d
memory: simpler paddr allocation logic (#11090)
...
* memory: new paddr allocation logic
* am fix
* am refactors
* fix
* mypy
* use it
* am
2025-07-04 17:00:36 +03:00
qazal
ad155f5454
print inputs to get_program in process replay [pr] (#11051)
...
* print inputs to get_program in process replay [pr]
* colors
* keep dataclass default escapes
* Revert "keep dataclass default escapes"
This reverts commit c6db7e8a7a.
* note for ast_repr
* add that back
2025-07-02 20:20:01 +03:00
qazal
452b22c9b6
fix process replay diff in PYTHON device [pr] (#11052)
...
* fix process replay diff in PYTHON device [pr]
The PYTHON backend pickles and encodes UOps, the encoded binary can't be
directly diffed in process replay.
* note
2025-07-02 11:06:46 +03:00
geohotstan
8ebf0abaae
ONNX external_test_onnx_backend use PYTHON device for model (#10915)
...
* try
* ruff check --fix
* no skip test
* hmmmmmmm I don't get this D:
* run CI again
* why is PYTHON device faster than CPU?
* run ci again and fix lint
* actually doesn't PYTHON device make sense here?
* see cpu speed again
* Revert "see cpu speed again"
This reverts commit 1e366f2256.
* trigger CI
* pretty good
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-01 12:11:17 -04:00
qazal
712980e167
fix extract_dataset + add tests to CI (#10995)
...
* fix extract_dataset + tests
* add CI
* sops.gz itself is same as master
* yml + gzip -c + ge
* don't commit that
* bump limit to 1000
* axis=7
* test_tiny
2025-06-27 01:51:36 +03:00
geohotstan
50936b4a18
ONNX real float16 (#10694)
...
* squash commits
* temp fix for const tensor
* actually realizing float16 can only happen in raw_data
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
chenyu
8751d47985
CosineAnnealingLRWithWarmup (#10981)
2025-06-25 17:45:21 -04:00
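The schedule named above combines two well-known pieces: a linear warmup to the base learning rate, then cosine annealing down to a floor. A hedged sketch of the general recipe (function name and arguments here are illustrative assumptions, not tinygrad's API):

```python
import math

def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int,
          end_lr: float = 0.0) -> float:
  """Cosine annealing with linear warmup (illustrative sketch)."""
  if step < warmup_steps:
    # linear warmup: ramps from base_lr/warmup_steps up to base_lr
    return base_lr * (step + 1) / warmup_steps
  # cosine decay from base_lr at progress=0 down to end_lr at progress=1
  progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
  return end_lr + 0.5 * (base_lr - end_lr) * (1 + math.cos(math.pi * progress))
```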
Ignacio Sica
21f1c4cc09
remove some linearize calls from tests [pr] (#10978)
...
* remove some linearize calls from tests
speed_compare_cuda_ptx
test_uop_spec
test_linearizer
test_uops
test_winograd
* more clear assert message
2025-06-25 12:37:17 -07:00
qazal
de4b9bf53b
add opts_to_apply option to AST KernelInfo (#10950)
...
* proposal: add option to override opts in the get_program API
* update test_linearizer_rewrite
* state in uops
* update process_replay and names
* empty isn't none
* fix process replay
2025-06-24 18:55:39 +03:00
qazal
7a5e4e0bf1
fix unittests process replay [pr] (#10947)
2025-06-24 10:30:23 +03:00
George Hotz
ae4d2d71b4
bump line count to 14500
2025-06-23 15:32:27 -07:00
George Hotz
e15754db28
remove (some) kernelize from llama and test schedule speed (#10939)
...
* remove kernelize from llama
* 405B
* space
2025-06-23 15:07:31 -07:00
chenyu
42b1c9625b
skip test TestKiTS19Dataset::test_training_set (#10936)
...
flaky
2025-06-23 14:27:24 -04:00
patrini32
9e9fd44987
refactor test/external/external_llama_eval.py (#10567)
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-23 10:43:20 -07:00
qazal
7820aeca8e
update codegen process replay to use get_program [pr] (#10921)
...
* update codegen process replay to get_program [pr]
* precommit
* try str replace
* +to_function_name
* fixup tc
* local2.sh
* fix openpilot NOLOCALS
* new local.sh
* correct merge
* beam cache
* back
* revert beam thing
* adding opts_override and name_override makes output of get_program
reproducible
* min diff
2025-06-23 17:31:41 +03:00
alpharush
22f9696522
Fix/hcqfuzz harness bug (#10923)
...
* update command so extra module is found
* fix empty range in randrange errors
* lint
2025-06-23 11:22:30 +03:00
geohotstan
4ab7d792cc
ONNX improve dtype fallback (#10800)
...
* fix
* add early verbose demo test
* is this how to write tests :s
* is definition drift even a thing? gemini says it is
* clean up
* better
* even better
* try add to CI
* doesn't work quite yet
* much more work to be done
* whoops
* partition the test heh
* skipif
* some nits for better names
* add webgpu test for onnxrunner
* fix reference links
* flush for now
2025-06-21 19:29:45 -04:00
chenyu
0480139def
log_perplexity metrics (#10912)
2025-06-21 10:44:47 -04:00
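Log perplexity is the standard language-model metric: the mean negative log-probability the model assigns to each correct next token (perplexity itself is its exponential). A minimal sketch of the textbook definition (illustrative only, not the exact code from the PR):

```python
import math

def log_perplexity(token_probs: list[float]) -> float:
  """Mean negative log-probability of the target tokens (illustrative sketch).

  token_probs: probability the model assigned to each correct next token.
  """
  return -sum(math.log(p) for p in token_probs) / len(token_probs)
```

Perplexity follows as `math.exp(log_perplexity(...))`; a perfect model (probability 1 on every target) scores 0.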
nimlgen
0e7bd9fd03
factor out generic MemoryManager (#10910)
...
* allocator -> memory
* just moveout it
* mm is abstracted
* need entry abstraction
* fix
* mypy
2025-06-21 16:18:33 +03:00
George Hotz
7636d2cdc5
flip order of get_program args (#10905)
2025-06-20 17:23:23 -07:00
George Hotz
b41e0563a3
move stuff to kernelize folder (#10902)
...
* move stuff to kernelize folder
* oops, forgot that
2025-06-20 16:10:20 -07:00
George Hotz
92678e59ee
move kernel to opt (#10899)
2025-06-20 15:22:28 -07:00
chenyu
a3dae51085
lower test_gemm_8192 on red (#10883)
2025-06-19 10:01:25 -04:00
George Hotz
18593c9800
one less rewrite on schedule [pr] (#10872)
...
* one less rewrite on schedule [pr]
* verify in ebs
2025-06-18 17:06:17 -07:00
wozeparrot
bdbf121285
fix: contigous -> contiguous (#10868)
2025-06-18 13:09:51 -07:00
George Hotz
cba6e15937
split grouper and kernelize [pr] (#10854)
2025-06-17 17:54:20 -07:00
uuuvn
a51f18f8f9
CI flakiness (#10851)
...
https://github.com/tinygrad/tinygrad/actions/runs/15718103629/job/44292845140?pr=10753#step:4:161
2025-06-17 14:46:30 -07:00
nimlgen
c0329148c7
am: check va is aligned to page size (#10815)
...
* am: check va is aligned to page size
* swap them
* is this faster
2025-06-15 22:51:09 +03:00
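The alignment check referenced above is usually a single mask test: for a power-of-two page size, an address is page-aligned iff its low offset bits are all zero. An illustrative sketch (the actual am driver code may differ):

```python
def is_page_aligned(va: int, page_size: int = 4096) -> bool:
  """True iff va is a multiple of page_size (page_size must be a power of two)."""
  assert page_size > 0 and page_size & (page_size - 1) == 0, \
    "page_size must be a power of two"
  # va % page_size == 0, expressed as a bit mask
  return va & (page_size - 1) == 0
```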
George Hotz
5dc1bc6070
switch get_kernel -> get_program [pr] (#10817)
...
* switch get_kernel -> get_program [pr]
* fix tests
2025-06-15 12:26:50 -07:00
wozeparrot
eb739bb96a
hotfix: lower threshold (#10786)
2025-06-11 19:36:20 -04:00
chenyu
612cdf5146
move fuzz_shape_ops to run with other fuzzer (#10767)
...
* move fuzz_shape_ops to run with other fuzzer
* don't skip CPU
2025-06-10 17:43:04 -04:00
b1tg
52c49dd4f3
fix onnx ci (#10762)
...
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-06-10 14:28:40 -04:00
George Hotz
f84c320548
better external_benchmark_schedule [pr] (#10722)
2025-06-09 10:26:11 -07:00
b1tg
24d328e313
onnx parser (#10435)
...
* onnx parser
* fix compile, lint
* onnx.load -> onnx_load
* compatible with ModelProto
* fix test external_test_onnx_ops.py
* fix tests
* fix signed int
* reduce to 261 lines
* fix TypeProto.Optional
* debug for _parse_message, add TypeProto.Sequence, cleanup
* onnx_load from Tensor
* remove BufferedReader
* 174 lines and reduce tensor copy
* cleanup
* use onnx_load in external_model_benchmark.py
* fix qcom test
* [onnx] parser support external data
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
81b9c04574
move high level stuff to unit tests [pr] (#10708)
...
* move high level stuff to unit tests [pr]
* process replay on unit tests
* fix pr, less compute
* set omp num threads
* set 200MB buffer size limit
* delete junk
* fix tests
* faster
* move test_indexing to unit
* faster
2025-06-08 14:05:56 -07:00
George Hotz
32e9949052
rename lazydata to uop (#10698)
2025-06-08 08:42:22 -07:00
leopf
eb7305e6a4
Tensor.keccak("sha3_256") (#7186)
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-06 15:24:05 -07:00
wozeparrot
0d86f8d375
fix failed threefry (#10646)
2025-06-05 17:17:42 -07:00
chenyu
46811d0d3c
minor external_model_benchmark cleanup (#10644)
2025-06-05 14:13:28 -04:00
chenyu
80ebce421d
remove metal buffer limit in external_model_benchmark [pr] (#10642)
...
not needed anymore
2025-06-05 13:00:51 -04:00
wozeparrot
4d1686f767
clean: becnhmark -> benchmark (#10620)
2025-06-03 19:28:18 -07:00
qazal
910cabb081
add kernel count to grouper process replay differ [pr] (#10611)
2025-06-03 15:21:27 +03:00
qazal
3cc73a0172
simpler process replay main loop [pr] (#10588)
...
* simpler process replay main loop [pr]
* use logging
* default to 1
2025-06-01 15:03:21 +03:00
qazal
dc882d3d7d
merge process replay and viz captures [pr] (#10581)
...
* refactoring
* test script
* work
* more work
* diff
* repr splits lines correctly
* that
* add location
* add location
* also don't need name_override
* k.copy
* [pr]
* name_override 2
* err
2025-06-01 12:30:10 +03:00
George Hotz
b3b43a82c4
remove Tensor.no_grad, it's meaningless now [pr] (#10556)
2025-05-28 22:20:02 -07:00