tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-14 09:28:04 -05:00

Author	SHA1	Message	Date
geohotstan	536b254df4	Bump onnx to 1.18.0 (#11266) * bump * thou hast implement functions * hacked in domain support * some clean ups * hack quantize_onnx_test too * add helper lol, why onnx tests why * better dispatcher, but need tests and better naming * flaky ci * change some names * small clean ups * make it easier to clean up tests once ORT supports 1.18.0 * nits * fix bug of Softmax_1 being registered in onnx_ops * need a default value * resolve_const is better name * fix OnnxRunner.to * use proper domain names	2025-07-17 15:35:41 -04:00
qazal	e68af3b336	disable flaky assert in test_cpu_profile (#11270)	2025-07-17 06:50:39 +03:00
chenyu	522dc72f08	remove Kernel.local_dims [pr] (#11268) * remove Kernel.local_dims [pr] also not needed * fix test_matvec	2025-07-16 17:46:19 -04:00
uuuvn	6f0ddcc24c	Remote cross-host graph (#11229)	2025-07-16 13:27:54 -07:00
quortus	924bc7c9ae	Fix test_uop_spec (#11259)	2025-07-16 11:02:31 +03:00
chenyu	c8e5c4d7c3	insert_before -> insert_at [pr] (#11257) more precise	2025-07-15 17:44:34 -04:00
leopf	557ca7d757	testing SimpleTokenizer against OASST1 (#11214)	2025-07-14 17:09:31 -07:00
wozeparrot	5878b189b8	don't const fold shape changing bitcast (#11236)	2025-07-14 16:42:16 -07:00
chenyu	b6662096cb	remove more first_reduce [pr] (#11239)	2025-07-14 19:13:44 -04:00
chenyu	eb8e17ef59	remove most of the first_upcast [pr] (#11238)	2025-07-14 16:54:24 -04:00
chenyu	674dc28505	remove Kernel.full_unupcasted_shape [pr] (#11215) decomp to shape_len and first_upcast to get the last upcast-able dim	2025-07-13 13:56:23 -04:00
Alisher Zhubanyshev	4ef6b46b34	hcq: reduce launch overhead (#11193) * nv: improve mmio creation speed * add memoryview test * fix indents * move mv bench to `test_helpers`, remove comparison	2025-07-13 19:25:50 +03:00
chenyu	2b48b961be	fix a few broken AMX tests (#11204)	2025-07-12 21:42:38 -04:00
chenyu	a0438012af	remove Kernel.get_program [pr] (#11203)	2025-07-12 20:50:29 -04:00
chenyu	73caa5dd1b	remove Kernel.membufs [pr] (#11200)	2025-07-12 14:48:47 -04:00
geohotstan	5ce278b245	OnnxRunner file as input (#10789) * file path as input and have parse be in OnnxRunner.__init__ * modelproto_to_onnxrunner -> modelproto_to_runner * whoops, fix import * oh flakiness again, is it because it's getting gc-ed? * small changes * CI flaky so just move compile4 fix in * copy typing of onnx_load * actually can just import onnx_load instead of onnx.load * fix external_benchmark_openpilot * fix onnx_runner test to use onnx_helper * rerun CI * try run_modelproto * spam CI a few times * revert run_modelproto since that's flaky also * no external onnx_load usage except onnx.py * cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why? * model_benchmark 193s -> 80s, add OnnxRunner.to()... * minimize diff and clean up * device can be None, weird but eh --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-12 14:27:46 -04:00
nimlgen	110cff3f2e	fix device arg to Tensor.randn (#11194) * fix device arg to Tensor.randn * simpler test * self.assertEqual	2025-07-12 13:51:59 -04:00
chenyu	6283d50224	DEPRECATED_linearize -> to_program [pr] (#11198)	2025-07-12 13:46:20 -04:00
nimlgen	ea7f2f779c	hcq: p2p nv-amd (#11195) * hcq: p2p between diff devices * fix	2025-07-12 18:53:34 +03:00
qazal	d3ec63a5c3	viz: add base class for unittests (#11178)	2025-07-11 13:58:03 +03:00
nimlgen	fb278c6a02	do not recreate Compiled.profile_events in helper_collect_profile (#11171)	2025-07-10 23:55:12 +03:00
qazal	bde80c0cdf	record GraphEvents in metal graph (#11145) * record GraphEvents in metal graph * add TestProfiler.test_graph, revert old stuff * move profile capture to MetalGraph * comment * don't double record graph command buffers * wait_check * explicit delete	2025-07-10 21:32:06 +03:00
chenyu	7db07e5f2c	don't narrow range of CAST on bool/unsigned (#11156)	2025-07-09 22:20:09 -04:00
George Hotz	4156baee93	break swizzle into three chunks [pr] (#11153) * break swizzle into three chunks [pr] * test failed	2025-07-09 15:30:34 -07:00
George Hotz	53ae153404	tc should be in opt (#11148) * tc should be in opt [pr] * fix import	2025-07-09 14:12:21 -07:00
nimlgen	b6981404ed	memory: use page shifts in memory manager (#11149) * memory: use page shifts in memory manager * fix	2025-07-09 22:05:00 +03:00
qazal	5c1d215b41	viz: add Graph stream (#11144) * viz: stack an event for the entire batch * multi * whitespace * work * multi graph, Graph gets its own row	2025-07-09 20:56:46 +03:00
George Hotz	2893feb9f6	cleanups for kernel.py (#11143) * cleanups for kernel.py * fixups	2025-07-08 18:10:25 -07:00
George Hotz	359bed74f8	axis type tracking [pr] (#11137) * axis type tracking [pr] * keep update_info * keep legacy colors * update tests to apply_opt	2025-07-08 14:16:25 -07:00
chenyu	dada3f5bf3	skip some new onnx tests (#11135) these fails on master with latest onnx	2025-07-08 16:12:48 -04:00
qazal	3dfc0ff887	move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126) * move cpu_profile and shared ProfileEvents to helpers [pr] * TestProfiler.test_cpu_profile * update test_viz.py * TestProfiler.test_profile_multiops ordering, it's different streams now	2025-07-08 12:14:03 +03:00
George Hotz	f7d4638e05	start LLM app, tons of clean up required. target is 200 line ollama (#11068) * start LLM app, tons of clean up required. target is 200 line ollama * kind of works * simpler * add k/v cache * with SYM=1, it loops * no rope cache * simpler * more cleanups * cleanups * works * argparse and comments * from gguf * generate is a function * no copy from cpu * fix max context pass in * test * improve test * ai2_arc * fix 8B, use less ram * 136 lines	2025-07-07 17:09:46 -07:00
chenyu	341a686799	Tensor.diagonal (#11122) only implemented main diagonal for 2-D tensors. with diagonal and qr, we can get determinant	2025-07-07 16:21:26 -04:00
Sieds Lykles	584fd6af5a	Fix division by zero and mask bug in add views (#11088) * merge view infinite loop test * adjust condition in `x//d -> x//(-d)-1` Fix division by zero in add views * adjust offset end * fix typo in comment * add target to test_merge_views_variable * fix view incorrectly being masked * ssimplify strides and offset of the new view to canonicalize * remove print in test --------- Co-authored-by: qazal <qazal.software@gmail.com>	2025-07-07 10:05:47 -07:00
Nino Risteski	a1a146a499	adding enable_gqa in SDPA (#11097) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2025-07-06 23:25:33 -07:00
chenyu	7468959f4b	Tensor.argsort (#11112)	2025-07-06 13:56:35 -04:00
kevvz	b7af9cf849	clean svd tests, set full_matrices false in torch backend (#11113) * clean tests, set full_matrices false * add more shape asserts	2025-07-06 13:55:49 -04:00
chenyu	ba88ec3ad0	pipe linalg svd to torch (#11109) and found a bug in svd	2025-07-06 08:37:25 -04:00
chenyu	845a4d32bc	Tensor.diag (#11108) also updated Tensor.eye to use it	2025-07-05 23:03:02 -04:00
ttomsa	4905af4ae0	remove invalid int div test (#11106) * rm test * also rm this	2025-07-05 18:57:55 -04:00
qazal	81781dc12b	viz: renames and spacing changes to tracing (#11102)	2025-07-05 18:40:39 +03:00
qazal	7619bf35e7	cleanup: remove disabled TestIndexingOrdering (#11101) * cleanup: remove disabled TestIndexingOrdering * don't import kernelize internals	2025-07-05 18:14:37 +03:00
qazal	4fcfaa0ef7	viz: switch to TracingKey (#11100) * viz: switch to TracingKey * tuple * order is name, keys, fmt * add test_tracing_key	2025-07-05 17:46:18 +03:00
qazal	3d8569f6d8	hotfix: infinite loop in tracking pattern matcher (#11094) * failing test * fix that * given matchers	2025-07-04 19:55:26 +03:00
nimlgen	01f3c4f44d	memory: simpler paddr allocation logic (#11090) * memory: new paddr allocation logic * am fix * am refactrros * fix * mypy * use it * am	2025-07-04 17:00:36 +03:00
qazal	988540f401	support capturing cpu_profile on error (#11078) * support capturing cpu_profile on error * spacing * pylint complains	2025-07-04 11:53:12 +03:00
chenyu	a2f5a54458	move sparse_categorical_crossentropy to test_ops (#11083) also flattened the tests	2025-07-03 21:40:54 -04:00
chenyu	7c8ccb0267	sparse_categorical_crossentropy cleanup [pr] (#11082)	2025-07-03 18:32:52 -04:00
chenyu	678cabc6f2	use argfix in Tensor.stack (#11077) works for multiple Tensor args or single tuple/list of Tensors, but not the mixed	2025-07-03 12:15:11 -04:00
qazal	b695e8c4d6	viz: remove support for naming with self (#11076)	2025-07-03 17:29:14 +03:00

4667 Commits