tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-15 01:48:23 -05:00

Author	SHA1	Message	Date
George Hotz	15ee742afa	add get_children_map to uop (#9470) * add get_children_map to uop * update_children * fix new children	2025-03-17 14:36:13 +08:00
George Hotz	cb7a7f69c7	quantization preprocessor from DSP, should be universal (#9437) * quantization preprocessor from DSP, should be universal * touchups * fix tests	2025-03-15 07:49:37 +08:00
chenyu	99b0287e4e	add GROUP and GROUPTOP to test_arange (#9432) it does not grow quadratically, but it's not 0 ops now	2025-03-13 11:28:38 -04:00
qazal	90ffa9bd45	swizzle without buffer ops try 2 [pr] (#9427) * add DONT_PUSH_VIEWS to matchers * swizzle without buffer ops try 2 [pr] * swizzle reduceop * simple failing test * fix failing test * s/on/for	2025-03-13 10:00:40 +01:00
chenyu	22fc0a2e36	bert sum acc in half (#9412) also BS=96	2025-03-11 23:03:15 -04:00
George Hotz	e174c6c3bc	new devectorizer (#9331) * new devectorizer * lidx * test linearizer passes * fix images * fix unfoldable image load * delete unused * improve fix_unfoldable_image_load * working for image * fixup types * fixup transcendental * cast_vec * cleaner transcendental * skip failing test * err, flip that * not devec * sqrt	2025-03-11 18:47:56 +08:00
George Hotz	2780e2027e	devectorize prereqs [pr] (#9404)	2025-03-11 12:33:29 +08:00
chenyu	01e8b60911	acc_dtype -> dtype (#9402) matched numpy and torch	2025-03-10 16:05:30 -04:00
qazal	59dfb234eb	replace hardcoded ast with tensors in TestSwizzle [pr] (#9401)	2025-03-10 19:33:57 +01:00
geohotstan	1d64c12f2b	add Topk to tensor (#9343) * terrible but somewhat working impl * linux behaves differently than macos? * slightly better impl * small clean up; haven't figured this out yet * better * torch has different behavior on linux and macos for duplicated values * add sum docs * fix test * add torch return_type test * add an exception test * wrap_fxn instead, and move op lower in order * better repeated values test * rerun ci	2025-03-09 20:01:42 -04:00
qazal	a1f41fadf6	test_schedule cleanups + add DONT_GROUP_REDUCES [pr] (#9392) * test_schedule cleanups + add DONT_GROUP_REDUCES [pr] * replace with test_swizzle_reduceop * delete duplicate tests * test_allow_push_permutes * one kernel tests	2025-03-09 15:01:08 +01:00
qazal	286b480f82	do not replace assign with the offset buffer [pr] (#9387)	2025-03-08 11:57:44 +01:00
qazal	0d2762c010	prep refactor for adding buffer ops last [pr] (#9383) * prep refactor for adding buffer ops last [pr] * freeze buffers * add swizzle_reduceop * shape for reduceop_view_right * simpler elementwise_view_right * add shapetracker to const * only const * from process replay	2025-03-08 08:00:14 +01:00
nimlgen	243078dda9	am: optimize tlb usage (#9049) * am: optimize tlb usage * fxies * comments * tiny	2025-03-07 19:37:29 +03:00
geohotstan	088d86691b	fix onnx gather and onnx auto_pad VALID mode (#9375) * fix gather and auto_pad * long -> int64	2025-03-07 10:27:23 -05:00
hooved	136cf7b8b1	hotfix: load >2 GiB from disk on macOS (#9361) * enable loading >2 GiB buffer from disk on macOS * handle None case raised by mypy * add test * revert fix to repro bug in CI * tell CI to run a unit test for macOS * reapply fix	2025-03-07 14:51:58 +08:00
nimlgen	9bd13de44c	lower test_gemv_4096_16384 to 750 for red (#9367)	2025-03-05 22:44:48 +03:00
uuuvn	b75f307234	amd: autogen ip bases (#9360)	2025-03-05 22:30:38 +03:00
chenyu	2cb2fce8d9	lower test_gemm_8192 amd_tflops to 65 (#9364)	2025-03-05 14:06:11 -05:00
nimlgen	14c88abf27	add some options to allreduce bench (#9348)	2025-03-04 23:46:36 +03:00
Anish Umale	bafa40fe12	Tiny backend test_ops fix part1 (#9338) * extract name methods from https://github.com/tinygrad/tinygrad/pull/9302 * t.grad.numpy() -> t.grad.cpu().numpy() * revert TORCH_DEBUG change * revert dtype change in aten.sum	2025-03-03 12:36:51 -05:00
George Hotz	0d4ba7dd87	import tinygrad.frontend.torch (#9337) * import tinygrad.frontend.torch * type ignore	2025-03-04 00:15:29 +08:00
qazal	23084fd850	merge merge_views and remove_movement_ops [pr] (#9333) * merge merge_views and remove_movement_ops [pr] * fix that assert	2025-03-03 12:38:59 +01:00
George Hotz	ece0a0f305	use empty for test instead of rand (#9332)	2025-03-03 16:19:06 +08:00
George Hotz	2cc4cb74f0	reorder binops (#9328) * reorder binops * test improvements + fix string tests * ugh, okay this	2025-03-03 14:58:18 +08:00
chenyu	146eb73790	fix Tensor.view with a tuple arg (#9330)	2025-03-02 23:35:23 -05:00
chenyu	ba4b8c2c23	Tensor.copysign (#9329)	2025-03-02 21:33:49 -05:00
nimlgen	8cae00833c	flaky test in ci (#9321)	2025-03-02 16:27:22 +03:00
Ali Ladjevardi	00028e87bb	Failing test for not realizing intermediate expand in multi-GPU (#9320)	2025-03-02 12:54:48 +01:00
George Hotz	ba97fd0b9c	hotfix: add test/external/external_benchmark_disk_raw	2025-03-02 02:32:15 +00:00
chenyu	cc2bbb0bf1	Tensor.isfinite (#9316)	2025-03-01 19:58:56 -05:00
geohotstan	d9ec05cea6	Test Onnx quantization behavior (#9301) * add DynamicDequantizeLinear and corresponding tests * wow qlinearops are round away from zero * this passes locally... * again * try * try separate test * round to even again * also add QLinearMul --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-01 19:21:58 -05:00
chenyu	fe0f860209	update test_ops for tensors from torch (#9308) a few detach().numpy() -> detach().cpu().numpy()	2025-02-28 15:57:25 -05:00
chenyu	38d7aae3b7	onnx fmod (#9307)	2025-02-28 14:09:22 -05:00
chenyu	7c7db78feb	support float mod (#9306) also added spec check on Ops.MOD to be ints only	2025-02-28 13:33:58 -05:00
chenyu	90808e2dd0	div rounding_mode (#9304)	2025-02-28 11:38:25 -05:00
chenyu	3ae66e59a3	least_upper_float is at least default_float (#9303) * least_upper_float is at least default_float en route for div rounding mode. dtype of true int division would change from int32 to default_float, which matches torch too. * fix bert acc	2025-02-28 10:41:56 -05:00
Eitan Turok	d657d5f754	[Bounty] Vectorize Transcendental (#9058) * init * cast everythig right * more casting * install pillow in test * quick tests * simplify * quick tests * delete test * tests * fix import error * add vec to ldexp3k * vec for bitcast * some helper tests * high level tests * clean tests * change tolerance so cuda passes * ruff passes * remove tests for transcendental helpers * ruff passes * make exponent in power vectorized * fix pow test * add newline * add vec dtype to ilogb2k * comment + clean up * ruff --------- Co-authored-by: chenyu <chenyu@fastmail.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-28 15:47:25 +08:00
qazal	cdf66cc67f	test: recompute expanded CAST (#9286) * those views should merge * diff cleanup * gpu * put it behind CAST_AFTER_EXPAND	2025-02-27 19:22:17 +01:00
chenyu	4342300eff	lower test_gemm_8192 amd to 70 (#9277) flaky	2025-02-26 16:32:08 -05:00
Francis Lata	86b737a120	leakyrelu to leaky_relu (#9270)	2025-02-26 13:22:08 -05:00
chenyu	cd822bbe11	hotfix torch_grad.detach().cpu().numpy() in test_ops (#9268)	2025-02-26 12:27:35 -05:00
chenyu	49ca90df75	update test_ops backward tests (#9267) instead of `(out+1).square().mean().backward()`, use forward.sum().gradient to get closer to the gradients	2025-02-26 12:09:24 -05:00
chenyu	aaf0a8069f	xor -> bitwise_xor (#9264)	2025-02-26 10:21:14 -05:00
qazal	e162aa862d	is_realized only if buffer is allocated (#9253) * is_realized only if the buffer is allocated * fix the image check too * assert test_lil_model after ExecItems run	2025-02-26 08:58:08 +01:00
George Hotz	3f4eb9006a	test for device mismatch [pr] (#9250) * test for device mismatch [pr] * fix bert	2025-02-26 13:06:33 +08:00
Sieds Lykles	9c4d9d9f10	Acc first (#9232) * put acc in front of the add chain * handle the other case * Make loop collapse more generic * Remove mulacc_unrolled * Actually remove it --------- Co-authored-by: George Hotz <geohot@gmail.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-25 22:10:15 -05:00
nimlgen	70db8c3003	hcq: dyn alloc signals (#9238) * hcq: dyn alloc signals * types and uniqueue devs * typing * mypy * mypy one more time * test * make fds to not intersect in mockgpu between drivers	2025-02-25 17:22:24 +03:00
nimlgen	b4c3780df0	hotfix: interop example (#9237) * hotfix: interop example * rm this * fix * fix ci mps * atol rtol * no uaf	2025-02-25 10:32:00 +03:00
Sieds Lykles	990c240b82	Stable pow gradient (#9226) * Stable gradient * More efficient * Fix and test for +-inf * cleaner * skip webgpu test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-24 20:54:26 -05:00


4667 Commits
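The "div rounding_mode" entry (#9304) and the note in #9303 that true int division now returns default_float "which matches torch" refer to the three division conventions torch.div exposes. The following is a minimal pure-Python sketch of those semantics on scalars, for illustration only. It is not tinygrad's implementation, and the helper name div_rounding is hypothetical.

```python
import math
from typing import Optional, Union

Number = Union[int, float]

def div_rounding(a: Number, b: Number, rounding_mode: Optional[str] = None) -> Number:
    # Hypothetical scalar helper mirroring torch.div's rounding_mode convention
    # (illustrative sketch, not tinygrad code).
    if rounding_mode is None:
        # True division: always a float, even for two ints (int / int -> float).
        return a / b
    if rounding_mode == "trunc":
        q = math.trunc(a / b)   # round toward zero
    elif rounding_mode == "floor":
        q = math.floor(a / b)   # round toward negative infinity
    else:
        raise ValueError(f"unknown rounding_mode: {rounding_mode!r}")
    # Integer inputs keep an integer result; float inputs stay float.
    # (a / b goes through a float, so very large ints may lose precision.)
    return q if isinstance(a, int) and isinstance(b, int) else float(q)
```

For -7 and 2, true division gives -3.5, "trunc" gives -3, and "floor" gives -4. The "floor" result matches Python's own -7 // 2.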