* fixes from chatgpt for torch backend
* shrink support
* add stride support
* comment cleanup
* a few more
* work
* import the stream hack
* llvm multi auto
* rig up torch's testing framework [pr]
* support more movement ops
* dec on expand
* fix tests
* work
* fix tests
* a few more
* decomps + opt hook
* installed pytest
* put acc in front of the add chain
* handle the other case
* Make loop collapse more generic
* Remove mulacc_unrolled
* Actually remove it
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
`LLVM=1 BERT_SIZE="tiny" DEFAULT_FLOAT=HALF BENCHMARK=5 MODEL="bert" python3 examples/mlperf/model_train.py` runs for me with this. it should not fail with single device shard though
* move cast to before softmax in attention
saves some memory because exp (whose output is used for backward) is done in half. training bert seems fine and can fit BS=78 now (up from 66)
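Roughly, the change amounts to the following. This is a minimal sketch of the idea against tinygrad's public Tensor API; the standalone `attention` function and the names `q`, `k`, `v` are illustrative, not the actual model code:

```python
from tinygrad import Tensor, dtypes

def attention(q: Tensor, k: Tensor, v: Tensor) -> Tensor:
  scores = q.matmul(k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
  # before: scores.softmax(-1).cast(dtypes.half) -- the exp output saved for
  # backward stays in float. casting first keeps that buffer in half:
  return scores.cast(dtypes.half).softmax(-1).matmul(v)
```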
* test
* boom
* fix webgpu
* use exact variable names in tests so that AI can read them more easily
* add tag for a specific test name, e.g. to test a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimal while still testing correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
* fix edge cases in memsize_to_str()
Inputs <= 1 now return "0.00 B" for 0 and "1.00 B" for 1, avoiding an
IndexError. Also, memsize_to_str(1000) now returns "1.00 KB" instead of
"1000.00 B".
Replaced the list comprehension with next(...) over a generator expression for
conciseness and efficiency.
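A sketch consistent with the behavior described above (the real helper lives in tinygrad's helpers; the unit table and thresholds here are assumptions matching the stated examples):

```python
def memsize_to_str(size: int) -> str:
  # walk from the largest unit down; next(...) picks the first unit the size
  # reaches, and the "B" entry catches everything below 1 KB (including 0 and 1)
  return next(f"{size/scale:.2f} {unit}" for unit, scale in
              [("TB", 1e12), ("GB", 1e9), ("MB", 1e6), ("KB", 1e3), ("B", 1)]
              if size >= scale or scale == 1)

assert memsize_to_str(0) == "0.00 B"
assert memsize_to_str(1) == "1.00 B"
assert memsize_to_str(1000) == "1.00 KB"
```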
* simplify code using idiomatic python (examples sketched after this list)
- Remove the unused `memsize_to_str()` function in helpers.
- Use a tuple for checking multiple string prefixes/suffixes.
- Avoid unnecessary list construction by using iterables directly.
- Check None in @diskcache to ensure proper caching of falsy values.
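A small self-contained sketch of these idioms; the names and the `diskcache` body are illustrative, not the actual helpers code:

```python
# tuple argument to startswith instead of several or-ed checks
names = ["test_add", "bench_mul", "helper"]
tests = [n for n in names if n.startswith(("test_", "bench_"))]

# pass a generator straight to sum() instead of building a list first
total = sum(len(n) for n in names)

# check `is None` rather than truthiness so falsy results (0, "", False)
# are cached instead of being recomputed on every call
def diskcache(func):
  cache: dict = {}
  def wrapper(*args):
    if (ret := cache.get(args)) is None:
      ret = cache[args] = func(*args)
    return ret
  return wrapper
```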
* revert generators back to list comprehension
Sometimes building the list first can be faster. Keep it as is.