tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-24 06:18:01 -05:00

Author	SHA1	Message	Date
George Hotz	09431d4ad1	make DEFINE_REG behave like the others (#11273 ) * simpler define reg * cast * PTRCAT define_acc * cleanups * fix uops stats * fix linearizer tests * llvm * define reg sets const * define reg sets const * no assign * collapse that * fix test_max_pool2d_bigger_stride_dilation * use index, fix webgpu * devec * fix tests * fix webgpu * fix llvm * threads for python * fix ops_python * only for reg * acc_half is real now in the emulator * fix llvm * fix webgpu init * fix wgpu test * fix some tests * fix ptx * fix ptx bool acc * cleanups * broken, meh. will fix with ENDRANGE * line count	2025-07-22 13:53:56 -07:00
chenyu	4535908679	update keccak test_long (#11331 ) it should compare with arg "shake_128"	2025-07-22 16:08:01 -04:00
George Hotz	affd83961c	small changes from define_reg (#11327 ) * small changes from define_reg * fix webgpu	2025-07-22 11:11:48 -07:00
chenyu	2d7c28de6a	clean up dup lambdas in helper_test_exception (#11325 )	2025-07-22 12:21:57 -04:00
chenyu	c6aa8e58ca	fix TestDropoutProbabilityEdgeCases (#11322 )	2025-07-22 11:13:56 -04:00
chenyu	fb42c84365	merge TestRollEdgeCases into test_ops (#11321 )	2025-07-22 10:55:57 -04:00
chenyu	1d8b3e9d1c	movementop only Tensor.roll (#11317 ) * movementop only Tensor.roll * fixed	2025-07-22 10:34:15 -04:00
chenyu	a41140241b	truncate unsigned const in cstyle (#11318 ) it can be a warning or a hard error in clang PTX and PYTHON also need fix, skipping for now	2025-07-22 08:02:12 -04:00
qazal	6668d6d241	fix word_wrap with newlines in input string [pr] (#11319 )	2025-07-22 12:03:13 +03:00
George Hotz	3b674df34b	generic changes from define_reg_2 (#11315 ) * generic changes from define_reg_2 * fix for ptx * ugh, that one	2025-07-21 15:14:06 -07:00
chenyu	6e9506e6fd	Tensor.roll supports dims=None (#11313 )	2025-07-21 17:29:23 -04:00
George Hotz	108aac8af4	use AddrSpace instead of local (#11314 ) * use AddrSpace instead of local * addrspace in test	2025-07-21 14:00:06 -07:00
chenyu	d3a93185a6	clean up test_roll (#11312 )	2025-07-21 16:00:50 -04:00
George Hotz	532b52fcef	store has a dtype, like assign (#11309 ) * store has a dtype, like assign * fix upat * fix test	2025-07-21 12:50:01 -07:00
geohotstan	445ff8de56	ONNX onnx_parser and buffer_parse clean up (#11000 ) * start * remove onnx.load from compile4 and move np to dropout * clean up and enable test * clean up * move WebGPU ONNX test into MacOS (WebGPU) * leave test in ONNX (CPU) * fix raw_data init None, and simplify onnx_runner test a little? * THESE TESTS ARE SO UGLY UGHH * need to really think about how to structure the test * wow LLMs are quite something * not always on disk now * also add external data loading test * cleaner tests * minimize diff and add const folding tests * add external data loading too * whoops add webgpu back.. but why was it not needed in the first place? * better comment * move webgpu test to macos(webgpu)? * llm english so much better than me wow * trigger CI to check flakiness --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-21 15:10:25 -04:00
George Hotz	842184a1ab	rename kernelize to schedule, try 2 (#11305 )	2025-07-21 11:18:36 -07:00
wozeparrot	30ce16a424	feat: failing test for long keccak (#11292 )	2025-07-21 12:49:23 -04:00
uuuvn	178dbf3f66	Remote scheduler changes (#11177 )	2025-07-21 09:29:44 -07:00
nimlgen	cc3c1e4c14	hcq: move cpu to hcq (#11262 ) * hcq: move cpu to hcq * import time * upd * fix * windows support * hm * cleaner * fix timer * fix timing * std is ns * skip profiler * mypy * cleaner * cleanups * after merge * default is back	2025-07-21 15:10:38 +03:00
qazal	3002c63b1e	process replay: optionally pass tinygrad import error (#11289 ) * process replay: optionally pass tinygrad import error * gate all tinygrad internals * s/getenv/os.getenv pre import * diff	2025-07-20 22:57:56 +03:00
chenyu	54924f9969	type remove Union and Optional [pr] (#11283 ) use `|` for consistency	2025-07-19 14:05:52 -04:00
nimlgen	188ed38315	replace from_mv with lightweight mv_address (#11280 )	2025-07-19 13:50:51 +03:00
chenyu	ec3efd2919	move upcast before reduce (#11250 ) * move upcast before reduce upcast goes to end of global+local+upcast * r_196_32_4_24_8	2025-07-18 14:42:15 -04:00
nimlgen	9a88bd841c	hcq: refactor into peer_groups (#11277 ) * hcq: refactor into peer_groups * fix fors * fixes * ooops * mypy * tiny fixes	2025-07-18 16:34:18 +03:00
chenyu	c5a5d74642	Revert "image_dot of 2 half inputs returns half (#11007 )" (#11274 ) This reverts commit `fa8e08f922`.	2025-07-17 17:34:18 -04:00
Utkarsh Gill	fa8e08f922	image_dot of 2 half inputs returns half (#11007 ) * cast after sum * comment out skipif * minor fix * only test IMAGE * IMAGE is supported now * simpler * simplerr * only cast if dtype is None * dont need to change base_imaeg_type * only cast when dtype is half * add explicit test * actually no, workflow seems better * actually, keep both * move test * fix indent --------- Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>	2025-07-17 13:47:22 -07:00
geohotstan	536b254df4	Bump onnx to 1.18.0 (#11266 ) * bump * thou hast implement functions * hacked in domain support * some clean ups * hack quantize_onnx_test too * add helper lol, why onnx tests why * better dispatcher, but need tests and better naming * flaky ci * change some names * small clean ups * make it easier to clean up tests once ORT supports 1.18.0 * nits * fix bug of Softmax_1 being registered in onnx_ops * need a default value * resolve_const is better name * fix OnnxRunner.to * use proper domain names	2025-07-17 15:35:41 -04:00
qazal	e68af3b336	disable flaky assert in test_cpu_profile (#11270 )	2025-07-17 06:50:39 +03:00
chenyu	522dc72f08	remove Kernel.local_dims [pr] (#11268 ) * remove Kernel.local_dims [pr] also not needed * fix test_matvec	2025-07-16 17:46:19 -04:00
uuuvn	6f0ddcc24c	Remote cross-host graph (#11229 )	2025-07-16 13:27:54 -07:00
quortus	924bc7c9ae	Fix test_uop_spec (#11259 )	2025-07-16 11:02:31 +03:00
chenyu	c8e5c4d7c3	insert_before -> insert_at [pr] (#11257 ) more precise	2025-07-15 17:44:34 -04:00
leopf	557ca7d757	testing SimpleTokenizer against OASST1 (#11214 )	2025-07-14 17:09:31 -07:00
wozeparrot	5878b189b8	don't const fold shape changing bitcast (#11236 )	2025-07-14 16:42:16 -07:00
chenyu	b6662096cb	remove more first_reduce [pr] (#11239 )	2025-07-14 19:13:44 -04:00
chenyu	eb8e17ef59	remove most of the first_upcast [pr] (#11238 )	2025-07-14 16:54:24 -04:00
chenyu	674dc28505	remove Kernel.full_unupcasted_shape [pr] (#11215 ) decomp to shape_len and first_upcast to get the last upcast-able dim	2025-07-13 13:56:23 -04:00
Alisher Zhubanyshev	4ef6b46b34	hcq: reduce launch overhead (#11193 ) * nv: improve mmio creation speed * add memoryview test * fix indents * move mv bench to `test_helpers`, remove comparison	2025-07-13 19:25:50 +03:00
chenyu	2b48b961be	fix a few broken AMX tests (#11204 )	2025-07-12 21:42:38 -04:00
chenyu	a0438012af	remove Kernel.get_program [pr] (#11203 )	2025-07-12 20:50:29 -04:00
chenyu	73caa5dd1b	remove Kernel.membufs [pr] (#11200 )	2025-07-12 14:48:47 -04:00
geohotstan	5ce278b245	OnnxRunner file as input (#10789 ) * file path as input and have parse be in OnnxRunner.__init__ * modelproto_to_onnxrunner -> modelproto_to_runner * whoops, fix import * oh flakiness again, is it because it's getting gc-ed? * small changes * CI flaky so just move compile4 fix in * copy typing of onnx_load * actually can just import onnx_load instead of onnx.load * fix external_benchmark_openpilot * fix onnx_runner test to use onnx_helper * rerun CI * try run_modelproto * spam CI a few times * revert run_modelproto since that's flaky also * no external onnx_load usage except onnx.py * cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why? * model_benchmark 193s -> 80s, add OnnxRunner.to()... * minimize diff and clean up * device can be None, weird but eh --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-12 14:27:46 -04:00
nimlgen	110cff3f2e	fix device arg to Tensor.randn (#11194 ) * fix device arg to Tensor.randn * simpler test * self.assertEqual	2025-07-12 13:51:59 -04:00
chenyu	6283d50224	DEPRECATED_linearize -> to_program [pr] (#11198 )	2025-07-12 13:46:20 -04:00
nimlgen	ea7f2f779c	hcq: p2p nv-amd (#11195 ) * hcq: p2p between diff devices * fix	2025-07-12 18:53:34 +03:00
qazal	d3ec63a5c3	viz: add base class for unittests (#11178 )	2025-07-11 13:58:03 +03:00
nimlgen	fb278c6a02	do not recreate Compiled.profile_events in helper_collect_profile (#11171 )	2025-07-10 23:55:12 +03:00
qazal	bde80c0cdf	record GraphEvents in metal graph (#11145 ) * record GraphEvents in metal graph * add TestProfiler.test_graph, revert old stuff * move profile capture to MetalGraph * comment * don't double record graph command buffers * wait_check * explicit delete	2025-07-10 21:32:06 +03:00
chenyu	7db07e5f2c	don't narrow range of CAST on bool/unsigned (#11156 )	2025-07-09 22:20:09 -04:00
George Hotz	4156baee93	break swizzle into three chunks [pr] (#11153 ) * break swizzle into three chunks [pr] * test failed	2025-07-09 15:30:34 -07:00

4043 Commits