tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-04-07 03:00:26 -04:00

Author	SHA1	Message	Date
George Hotz	dfeee63d30	uop matmul work (#11388 ) * uop matmul work * works with locals	2025-07-26 21:23:55 -07:00
George Hotz	2c70eaf18c	fix load / barrier (#11386 ) * fix load / barrier * cleanups * fix CI	2025-07-26 10:27:37 -07:00
George Hotz	466ab5a3f2	store/load not pass through index (#11381 ) * noop * fix noop * store cat is NOOP * store dtype is void * stores aren't passed through anymore * meh, skip those for ptx * correct ptx skip * hl runs	2025-07-25 21:01:47 -07:00
chenyu	3d68feb67d	minor onnx Gather cleanup (#11375 ) removed a type ignore and one error code skip	2025-07-25 21:08:08 -04:00
George Hotz	490a93902c	define reg doesn't have init anymore (#11365 ) * define reg doesn't have init anymore * remove that * no special logic for dr * fix amd uop matmul	2025-07-24 19:15:49 -07:00
George Hotz	0602b22086	kernel spec (#11359 ) * kernel spec * ops.VIEW * work	2025-07-24 12:45:38 -07:00
George Hotz	b0dc97d1f7	write out kernel 3 in uops (#11352 ) * write out kernel 3 in uops * matmul is correct * gemm passes spec * bugfix to match speed * cleanups	2025-07-23 17:32:38 -07:00
chenyu	86e7504111	mypy check extra/onnx.py (#11348 ) instead of running test with 3.10, add onnx to mypy which would have caught StrEnum regression. Several type annotation failed mypy now that does not affect running the code and were skipped for now	2025-07-23 12:42:59 -04:00
chenyu	960da9319d	Remove StrEnum in onnx for python 3.10 (#11345 ) some training tests failed looks like parsing error?	2025-07-23 11:52:25 -04:00
George Hotz	108aac8af4	use AddrSpace instead of local (#11314 ) * use AddrSpace instead of local * addrspace in test	2025-07-21 14:00:06 -07:00
geohotstan	445ff8de56	ONNX onnx_parser and buffer_parse clean up (#11000 ) * start * remove onnx.load from compile4 and move np to dropout * clean up and enable test * clean up * move WebGPU ONNX test into MacOS (WebGPU) * leave test in ONNX (CPU) * fix raw_data init None, and simplify onnx_runner test a little? * THESE TESTS ARE SO UGLY UGHH * need to really think about how to structure the test * wow LLMs are quite something * not always on disk now * also add external data loading test * cleaner tests * minimize diff and add const folding tests * add external data loading too * whoops add webgpu back.. but why was it not needed in the first place? * better comment * move webgpu test to macos(webgpu)? * llm english so much better than me wow * trigger CI to check flakiness --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-21 15:10:25 -04:00
George Hotz	842184a1ab	rename kernelize to schedule, try 2 (#11305 )	2025-07-21 11:18:36 -07:00
वेदांत	e368628736	Add amin support to Tensor operations in Torch backend (#11290 ) * intiger div mod fix * Revert "intiger div mod fix" This reverts commit `d5d2f201bf`. * feat arg_min support * tets update * test fix	2025-07-21 09:14:08 -04:00
nimlgen	cc3c1e4c14	hcq: move cpu to hcq (#11262 ) * hcq: move cpu to hcq * import time * upd * fix * windows support * hm * cleaner * fix timer * fix timing * std is ns * skip profiler * mypy * cleaner * cleanups * after merge * default is back	2025-07-21 15:10:38 +03:00
nimlgen	2f72be5055	nv_smi: init basic insmod/rmmod/reset cmds (#11282 )	2025-07-19 15:43:03 +03:00
qazal	577e581943	fix typo in sqtt/readme (#11281 )	2025-07-19 15:10:24 +03:00
geohotstan	536b254df4	Bump onnx to 1.18.0 (#11266 ) * bump * thou hast implement functions * hacked in domain support * some clean ups * hack quantize_onnx_test too * add helper lol, why onnx tests why * better dispatcher, but need tests and better naming * flaky ci * change some names * small clean ups * make it easier to clean up tests once ORT supports 1.18.0 * nits * fix bug of Softmax_1 being registered in onnx_ops * need a default value * resolve_const is better name * fix OnnxRunner.to * use proper domain names	2025-07-17 15:35:41 -04:00
chenyu	c8e5c4d7c3	insert_before -> insert_at [pr] (#11257 ) more precise	2025-07-15 17:44:34 -04:00
chenyu	a0438012af	remove Kernel.get_program [pr] (#11203 )	2025-07-12 20:50:29 -04:00
George Hotz	d67c8e7b42	local metal on metal in uop syntax (#11185 ) * local metal on metal in uop syntax * TODO: just put the axis_info in the kernelinfo * local * amd_matmul works @ 28 TFLOPS * clean up matmul * kernel8 works * remove that * locals * axistype innovation * work * cleanup * kernel3 regs * cleanup kernel3 * work * why is it broken * no beam * reenable * permutes	2025-07-12 16:31:19 -07:00
geohotstan	5ce278b245	OnnxRunner file as input (#10789 ) * file path as input and have parse be in OnnxRunner.__init__ * modelproto_to_onnxrunner -> modelproto_to_runner * whoops, fix import * oh flakiness again, is it because it's getting gc-ed? * small changes * CI flaky so just move compile4 fix in * copy typing of onnx_load * actually can just import onnx_load instead of onnx.load * fix external_benchmark_openpilot * fix onnx_runner test to use onnx_helper * rerun CI * try run_modelproto * spam CI a few times * revert run_modelproto since that's flaky also * no external onnx_load usage except onnx.py * cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why? * model_benchmark 193s -> 80s, add OnnxRunner.to()... * minimize diff and clean up * device can be None, weird but eh --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-12 14:27:46 -04:00
chenyu	6283d50224	DEPRECATED_linearize -> to_program [pr] (#11198 )	2025-07-12 13:46:20 -04:00
George Hotz	2893feb9f6	cleanups for kernel.py (#11143 ) * cleanups for kernel.py * fixups	2025-07-08 18:10:25 -07:00
chenyu	7ce9e45474	mypy onnx_parser (#11141 )	2025-07-08 19:50:28 -04:00
chenyu	ffcc557986	lint onnx and onnx_parser (#11134 )	2025-07-08 15:28:35 -04:00
qazal	3dfc0ff887	move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126 ) * move cpu_profile and shared ProfileEvents to helpers [pr] * TestProfiler.test_cpu_profile * update test_viz.py * TestProfiler.test_profile_multiops ordering, it's different streams now	2025-07-08 12:14:03 +03:00
nimlgen	71377cd233	nv: parse falcon app descs (#11118 )	2025-07-07 18:14:14 +03:00
kevvz	b7af9cf849	clean svd tests, set full_matrices false in torch backend (#11113 ) * clean tests, set full_matrices false * add more shape asserts	2025-07-06 13:55:49 -04:00
chenyu	ba88ec3ad0	pipe linalg svd to torch (#11109 ) and found a bug in svd	2025-07-06 08:37:25 -04:00
nimlgen	4dccb2ea49	am_smi: increase kill retries (#11099 )	2025-07-05 16:23:50 +03:00
0xSG	17119b0f23	hip_ioctl: platform.machine added (#11084 )	2025-07-04 17:20:24 +03:00
nimlgen	2d138c6cf1	am: factor out init_sw (#11070 )	2025-07-03 11:01:17 +03:00
chenyu	425d5f55c4	generate kernel dataset and upload artifact (#11063 )	2025-07-02 17:21:25 -04:00
chenyu	4626e9c172	is_numpy_ndarray helper [pr] (#11050 )	2025-07-02 09:12:53 -04:00
chenyu	126fcf4129	clean up AMD_LLVM in tests (#11021 )	2025-06-28 22:45:47 -04:00
chenyu	a6485d00c8	very tiny generate_dataset (#11013 ) one minute to gen on my mac	2025-06-27 17:10:45 -04:00
George Hotz	be53ef4f0a	rename DEFINE_ACC -> DEFINE_REG (#11006 ) * rename DEFINE_ACC -> DEFINE_REG * add CMPEQ to groupops	2025-06-27 11:09:25 -07:00
George Hotz	b4eb876d5a	kernel.py no longer permutes reduce axis [pr] (#10968 ) * kernel.py no longer permutes reduce axis [pr] * delete tests that handcode uops * regen of sops is broken... * put import back * just remove that * disable those tests	2025-06-26 17:44:58 -07:00
George Hotz	856759c79c	add halide example (#10980 ) * add halide example * upd halide gemm * partial works * touchups	2025-06-26 16:14:57 -07:00
qazal	1127302c46	move perfetto to extra (#10994 ) * move perfetto to extra * update TestViz and fix tests * remove perfetto.html from viz directory * work * mypy	2025-06-27 01:53:54 +03:00
qazal	712980e167	fix extract_dataset + add tests to CI (#10995 ) * fix extract_dataset + tests * add CI * sops.gz itself is same as master * yml + gzip -c + ge * don't commit that * bump limit to 1000 * axis=7 * test_tiny	2025-06-27 01:51:36 +03:00
geohotstan	50936b4a18	ONNX real float16 (#10694 ) * squash commits * temp fix for const tensor * actually realizing float16 can only happen in raw_data * .float -> cast(float) to rerun CI --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-06-26 14:05:12 -04:00
chenyu	49bba2f0a0	improve test_nll_loss (#10986 ) build target and weight tensors outside so it tests backward too.	2025-06-26 02:46:55 -04:00
nimlgen	1c45b9f7fb	start nvpci (#10521 ) * start nvpci * talk to fsp * boot args * riscv core bootted * q * agen * got gsp init msg * some fixes * set registry, stuck aft lockdown( * start ga/ad port * gsp init on ada * more classes allocated * more * mm * fixes and progress * no huge pages for now * mm seems workin, but switch to 512mb page for simplicity * working state * not cleaned * claned * nvd=1 * start gr ctx * compute * clean 1 * cleanup 2 * cleanup 3 * cleaner 4 * cleaner 6 * add iface to nv * save before reboot * merged into NV * moveout mm * post merge * cleaner 7 * merge and rebase * pciiface abstraction + reset * download fw from web * print logs * minor changes + p2p * cleaner 8 * cleaner 9 * cleaner 10 * delete * delete this as well * linter 1 * oops * priv_client -> priv_root * fix mypy * mypy? * mypy? * small changes * shorter * ops * remove this * do not allocate paddr for reserve * nodiff * unified script * ops * dif ver * add lock * setup	2025-06-25 00:37:34 +03:00
chenyu	ffb032e31d	test_diagonal touchup (#10962 )	2025-06-24 15:51:19 -04:00
Utkarsh Gill	7f9958b632	Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945 ) * fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend * cleanup * generic fix * tests * cmp with diagonal too * oops * move tests * fix test * remove unnecessary import * fix assert * compare against numpy --------- Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>	2025-06-24 15:36:06 -04:00
chenyu	18e264a449	Tensor.logsigmoid (#10955 )	2025-06-24 11:16:14 -04:00
George Hotz	e15754db28	remove (some) kernelize from llama and test schedule speed (#10939 ) * remove kernelize from llama * 405B * space	2025-06-23 15:07:31 -07:00
alpharush	22f9696522	Fix/hcqfuzz harnesss bug (#10923 ) * update command so extra module is found * fix empty range in randrange errors * lint	2025-06-23 11:22:30 +03:00
geohotstan	4ab7d792cc	ONNX improve dtype fallback (#10800 ) * fix * add early verbose demo test * is this how to write tests :s * is definition drift even a thing? gemini says it is * clean up * better * even better * try add to CI * doesn't work quite yet * much more work to be done * whoops * partition the test heh * skipif * some nits for better names * add webgpu test for onnxrunner * fix reference links * flush for now	2025-06-21 19:29:45 -04:00

1184 Commits