tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-20 20:38:03 -05:00

Author	SHA1	Message	Date
chenyu	702e38dc19	remove FUSE_ARANGE_UINT (#11567 ) also add IGNORE_OOB=1 to bert runs. lowered BS on tinybox to 90 since 96 oom during eval without reset	2025-08-07 16:49:06 -04:00
geohotstan	1163292759	move onnx_parser into onnx (#11530 )	2025-08-06 10:46:27 -04:00
nimlgen	eafc7fda12	upd perfetto (#11528 )	2025-08-06 14:00:34 +03:00
nimlgen	4877aa965a	ast seems to probe nv as well (#11494 )	2025-08-04 11:47:07 +03:00
George Hotz	8ff03806e8	add llama layers (#11460 ) * add llama layers * add contig bw for speed	2025-07-31 16:28:04 -07:00
George Hotz	474ee9daa5	hotfix: add contiguous_backward to llama	2025-07-31 15:07:12 -07:00
kevvz	c3cfcb50cb	Add linalg_det and test for torch backend (#11405 ) * add linalg_det and test * space --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-30 22:04:44 -04:00
wozeparrot	825b6a2505	feat: llama3 dataloader (#11340 )	2025-07-30 13:27:55 -07:00
nimlgen	5fc5bb5237	ci: clear processes (#11434 ) * unified hcq_smi for managment * fix * fix * no reset for amd	2025-07-30 22:15:18 +03:00
George Hotz	4f26a9ad32	check elements_per_thread in tensorcore [pr] (#11435 )	2025-07-30 11:55:48 -07:00
George Hotz	1bef2d80c1	unrolls are all in the same scope (#11429 ) * unrolls are all in the same scope * fix that import	2025-07-29 16:55:37 -07:00
George Hotz	03909f2772	permute locals for HL uop matmul (#11412 ) * permute locals for HL uop matmul * parens fix that * permutes * 20 TFLOPS	2025-07-29 08:19:59 -07:00
George Hotz	735ad5f10d	kernel4 and 5 in uops (#11411 ) * move simplify views to merge views * add amd kernel 4 * Revert "move simplify views to merge views" This reverts commit `1e07dff384`. * k4 in python * kernel4 written in uops * k5 support * cleanups	2025-07-28 19:35:48 -07:00
George Hotz	fddc645668	HL=2 top matmul (#11406 ) * HL=2 top matmul * top colored	2025-07-28 12:32:38 -07:00
George Hotz	dfeee63d30	uop matmul work (#11388 ) * uop matmul work * works with locals	2025-07-26 21:23:55 -07:00
George Hotz	2c70eaf18c	fix load / barrier (#11386 ) * fix load / barrier * cleanups * fix CI	2025-07-26 10:27:37 -07:00
George Hotz	466ab5a3f2	store/load not pass through index (#11381 ) * noop * fix noop * store cat is NOOP * store dtype is void * stores aren't passed through anymore * meh, skip those for ptx * correct ptx skip * hl runs	2025-07-25 21:01:47 -07:00
chenyu	3d68feb67d	minor onnx Gather cleanup (#11375 ) removed a type ignore and one error code skip	2025-07-25 21:08:08 -04:00
George Hotz	490a93902c	define reg doesn't have init anymore (#11365 ) * define reg doesn't have init anymore * remove that * no special logic for dr * fix amd uop matmul	2025-07-24 19:15:49 -07:00
George Hotz	0602b22086	kernel spec (#11359 ) * kernel spec * ops.VIEW * work	2025-07-24 12:45:38 -07:00
George Hotz	b0dc97d1f7	write out kernel 3 in uops (#11352 ) * write out kernel 3 in uops * matmul is correct * gemm passes spec * bugfix to match speed * cleanups	2025-07-23 17:32:38 -07:00
chenyu	86e7504111	mypy check extra/onnx.py (#11348 ) instead of running test with 3.10, add onnx to mypy which would have caught StrEnum regression. Several type annotation failed mypy now that does not affect running the code and were skipped for now	2025-07-23 12:42:59 -04:00
chenyu	960da9319d	Remove StrEnum in onnx for python 3.10 (#11345 ) some training tests failed looks like parsing error?	2025-07-23 11:52:25 -04:00
George Hotz	108aac8af4	use AddrSpace instead of local (#11314 ) * use AddrSpace instead of local * addrspace in test	2025-07-21 14:00:06 -07:00
geohotstan	445ff8de56	ONNX onnx_parser and buffer_parse clean up (#11000 ) * start * remove onnx.load from compile4 and move np to dropout * clean up and enable test * clean up * move WebGPU ONNX test into MacOS (WebGPU) * leave test in ONNX (CPU) * fix raw_data init None, and simplify onnx_runner test a little? * THESE TESTS ARE SO UGLY UGHH * need to really think about how to structure the test * wow LLMs are quite something * not always on disk now * also add external data loading test * cleaner tests * minimize diff and add const folding tests * add external data loading too * whoops add webgpu back.. but why was it not needed in the first place? * better comment * move webgpu test to macos(webgpu)? * llm english so much better than me wow * trigger CI to check flakiness --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-21 15:10:25 -04:00
George Hotz	842184a1ab	rename kernelize to schedule, try 2 (#11305 )	2025-07-21 11:18:36 -07:00
वेदांत	e368628736	Add amin support to Tensor operations in Torch backend (#11290 ) * intiger div mod fix * Revert "intiger div mod fix" This reverts commit `d5d2f201bf`. * feat arg_min support * tets update * test fix	2025-07-21 09:14:08 -04:00
nimlgen	cc3c1e4c14	hcq: move cpu to hcq (#11262 ) * hcq: move cpu to hcq * import time * upd * fix * windows support * hm * cleaner * fix timer * fix timing * std is ns * skip profiler * mypy * cleaner * cleanups * after merge * default is back	2025-07-21 15:10:38 +03:00
nimlgen	2f72be5055	nv_smi: init basic insmod/rmmod/reset cmds (#11282 )	2025-07-19 15:43:03 +03:00
qazal	577e581943	fix typo in sqtt/readme (#11281 )	2025-07-19 15:10:24 +03:00
geohotstan	536b254df4	Bump onnx to 1.18.0 (#11266 ) * bump * thou hast implement functions * hacked in domain support * some clean ups * hack quantize_onnx_test too * add helper lol, why onnx tests why * better dispatcher, but need tests and better naming * flaky ci * change some names * small clean ups * make it easier to clean up tests once ORT supports 1.18.0 * nits * fix bug of Softmax_1 being registered in onnx_ops * need a default value * resolve_const is better name * fix OnnxRunner.to * use proper domain names	2025-07-17 15:35:41 -04:00
chenyu	c8e5c4d7c3	insert_before -> insert_at [pr] (#11257 ) more precise	2025-07-15 17:44:34 -04:00
chenyu	a0438012af	remove Kernel.get_program [pr] (#11203 )	2025-07-12 20:50:29 -04:00
George Hotz	d67c8e7b42	local metal on metal in uop syntax (#11185 ) * local metal on metal in uop syntax * TODO: just put the axis_info in the kernelinfo * local * amd_matmul works @ 28 TFLOPS * clean up matmul * kernel8 works * remove that * locals * axistype innovation * work * cleanup * kernel3 regs * cleanup kernel3 * work * why is it broken * no beam * reenable * permutes	2025-07-12 16:31:19 -07:00
geohotstan	5ce278b245	OnnxRunner file as input (#10789 ) * file path as input and have parse be in OnnxRunner.__init__ * modelproto_to_onnxrunner -> modelproto_to_runner * whoops, fix import * oh flakiness again, is it because it's getting gc-ed? * small changes * CI flaky so just move compile4 fix in * copy typing of onnx_load * actually can just import onnx_load instead of onnx.load * fix external_benchmark_openpilot * fix onnx_runner test to use onnx_helper * rerun CI * try run_modelproto * spam CI a few times * revert run_modelproto since that's flaky also * no external onnx_load usage except onnx.py * cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why? * model_benchmark 193s -> 80s, add OnnxRunner.to()... * minimize diff and clean up * device can be None, weird but eh --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-12 14:27:46 -04:00
chenyu	6283d50224	DEPRECATED_linearize -> to_program [pr] (#11198 )	2025-07-12 13:46:20 -04:00
George Hotz	2893feb9f6	cleanups for kernel.py (#11143 ) * cleanups for kernel.py * fixups	2025-07-08 18:10:25 -07:00
chenyu	7ce9e45474	mypy onnx_parser (#11141 )	2025-07-08 19:50:28 -04:00
chenyu	ffcc557986	lint onnx and onnx_parser (#11134 )	2025-07-08 15:28:35 -04:00
qazal	3dfc0ff887	move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126 ) * move cpu_profile and shared ProfileEvents to helpers [pr] * TestProfiler.test_cpu_profile * update test_viz.py * TestProfiler.test_profile_multiops ordering, it's different streams now	2025-07-08 12:14:03 +03:00
nimlgen	71377cd233	nv: parse falcon app descs (#11118 )	2025-07-07 18:14:14 +03:00
kevvz	b7af9cf849	clean svd tests, set full_matrices false in torch backend (#11113 ) * clean tests, set full_matrices false * add more shape asserts	2025-07-06 13:55:49 -04:00
chenyu	ba88ec3ad0	pipe linalg svd to torch (#11109 ) and found a bug in svd	2025-07-06 08:37:25 -04:00
nimlgen	4dccb2ea49	am_smi: increase kill retries (#11099 )	2025-07-05 16:23:50 +03:00
0xSG	17119b0f23	hip_ioctl: platform.machine added (#11084 )	2025-07-04 17:20:24 +03:00
nimlgen	2d138c6cf1	am: factor out init_sw (#11070 )	2025-07-03 11:01:17 +03:00
chenyu	425d5f55c4	generate kernel dataset and upload artifact (#11063 )	2025-07-02 17:21:25 -04:00
chenyu	4626e9c172	is_numpy_ndarray helper [pr] (#11050 )	2025-07-02 09:12:53 -04:00
chenyu	126fcf4129	clean up AMD_LLVM in tests (#11021 )	2025-06-28 22:45:47 -04:00
chenyu	a6485d00c8	very tiny generate_dataset (#11013 ) one minute to gen on my mac	2025-06-27 17:10:45 -04:00

1248 Commits