chenyu
184030168d
fix aten.reflection_pad2d ( #9289 )
...
tested the torch doc example
2025-02-27 15:53:46 -05:00
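The reflection-pad semantics being fixed here can be illustrated with NumPy's `np.pad(..., mode="reflect")`, which follows the same convention as `aten.reflection_pad2d` (interior values are mirrored across each edge, the edge value itself is not repeated). A minimal sketch, not the actual backend code:

```python
import numpy as np

# Reflection padding mirrors interior values across each edge without
# repeating the edge value itself (same convention as aten.reflection_pad2d).
x = np.arange(9, dtype=np.float32).reshape(3, 3)
padded = np.pad(x, 1, mode="reflect")
print(padded.shape)  # (5, 5)
print(padded[0])     # top row is row 1 of x reflected: [4. 3. 4. 5. 4.]
```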
chenyu
0de6585df0
fix aten.normal_ arg ( #9288 )
...
should be mean and std.
2025-02-27 15:36:25 -05:00
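For context on the arg order: the mean comes first, then the standard deviation. A quick NumPy sketch (the `normal_` helper here is hypothetical, not tinygrad code) checks that order statistically:

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_(shape, mean=0.0, std=1.0):
    # arg order matters: loc is the mean, scale is the std
    return rng.normal(loc=mean, scale=std, size=shape)

s = normal_((100_000,), mean=3.0, std=2.0)
print(s.mean(), s.std())  # close to 3.0 and 2.0
```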
chenyu
8ee2b460ee
Tensor.var_mean ( #9287 )
2025-02-27 15:15:31 -05:00
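`Tensor.var_mean` presumably mirrors `torch.var_mean`: one call returning both the variance (with Bessel's correction by default) and the mean. A NumPy sketch of the assumed semantics:

```python
import numpy as np

def var_mean(x, correction=1):
    # torch.var_mean-style: returns (variance, mean); correction=1 gives
    # the unbiased sample variance (divide by N - 1)
    return x.var(ddof=correction), x.mean()

v, m = var_mean(np.array([1.0, 2.0, 3.0, 4.0]))
print(v, m)  # 1.666... 2.5
```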
qazal
cdf66cc67f
test: recompute expanded CAST ( #9286 )
...
* those views should merge
* diff cleanup
* gpu
* put it behind CAST_AFTER_EXPAND
2025-02-27 19:22:17 +01:00
nimlgen
43e60914f3
init torch hooking ( #9284 )
...
* smth
* mv
* prof wk
* revert and move
* fix
* nvprof
* fix and no print much
2025-02-27 19:36:55 +03:00
George Hotz
387ea41e99
increase speed of torch mnist: use gradient api ( #9282 )
2025-02-27 11:57:41 +08:00
Priyank Patel
a0764f0dc0
(bounty) Make mnist training run with torch backend ( #9233 )
...
* yml changes
* torch backend remove meta decomps and add test
* torch backend bump timeout for tests
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-27 11:32:25 +08:00
George Hotz
67ba073c55
hotfix: test accuracy in beautiful_mnist_torch
2025-02-27 11:18:59 +08:00
George Hotz
9088125a6a
a lil more torch ( #9280 )
2025-02-27 11:12:20 +08:00
George Hotz
b6a14911c8
start torch.compile support ( #9279 )
2025-02-27 10:29:51 +08:00
chenyu
4342300eff
lower test_gemm_8192 amd to 70 ( #9277 )
...
flaky
2025-02-26 16:32:08 -05:00
nimlgen
c4c29c8acc
nv: parse elf attrs ( #9275 )
...
* better
* hm
* hm
* fixed
2025-02-26 23:21:57 +03:00
chenyu
6350725e2d
simpler leaky_relu ( #9271 )
...
rendered as `*(data0+alu0) = ((val0<0.0f)?(0.01f*val0):val0);` instead of two wheres.
possible to update rewrite rules too
2025-02-26 13:43:48 -05:00
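The single-conditional form behind that rendered C expression can be sketched in NumPy (illustrative only, not tinygrad's implementation):

```python
import numpy as np

def leaky_relu(x, neg_slope=0.01):
    # one conditional per element, matching the rendered
    # ((val<0.0f)?(0.01f*val):val) instead of two wheres
    return np.where(x < 0, neg_slope * x, x)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))  # -0.02, 0.0, 3.0
```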
Francis Lata
86b737a120
leakyrelu to leaky_relu ( #9270 )
2025-02-26 13:22:08 -05:00
chenyu
cd822bbe11
hotfix torch_grad.detach().cpu().numpy() in test_ops ( #9268 )
2025-02-26 12:27:35 -05:00
chenyu
49ca90df75
update test_ops backward tests ( #9267 )
...
instead of `(out+1).square().mean().backward()`, use forward.sum().gradient to get closer to the gradients
2025-02-26 12:09:24 -05:00
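The reasoning behind testing via the gradient of a summed forward pass: for an elementwise op, the gradient of `f(x).sum()` with respect to each input element is exactly the per-element derivative, so tests can compare against it directly. A finite-difference sketch in pure NumPy (not the tinygrad API):

```python
import numpy as np

def grad_of_sum(f, x, eps=1e-6):
    # central finite differences of f(x).sum() with respect to each element
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp.flat[i] += eps
        xm.flat[i] -= eps
        g.flat[i] = (f(xp).sum() - f(xm).sum()) / (2 * eps)
    return g

x = np.array([1.0, 2.0, 3.0])
print(grad_of_sum(np.square, x))  # close to [2. 4. 6.], the elementwise derivative 2*x
```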
chenyu
aaf0a8069f
xor -> bitwise_xor ( #9264 )
2025-02-26 10:21:14 -05:00
George Hotz
2158dc4849
full fix for as_strided in torch backend ( #9257 )
...
* fixes from chatgpt for torch backend
* shrink support
* add stride support
* comment cleanup
* a few more
* work
* import the stream hack
* llvm multi auto
2025-02-26 22:34:05 +08:00
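`as_strided` reinterprets an existing buffer with an arbitrary shape and strides, so a view can alias the same elements repeatedly. NumPy's `np.lib.stride_tricks.as_strided` shows the semantics the torch backend has to reproduce (a sketch of the concept, unrelated to the actual fix):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# A sliding window over a 1-D buffer: both dims step by one element,
# so rows overlap and alias the same memory.
x = np.arange(5)
s = x.strides[0]  # bytes between consecutive elements
win = as_strided(x, shape=(3, 3), strides=(s, s))
print(win)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```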
qazal
f60f997bf7
viz ui fixes [pr] ( #9261 )
2025-02-26 14:52:18 +01:00
qazal
bfd1e55bda
show zoom to fit button in VIZ if graph isn't in view [pr] ( #9258 )
...
* show zoom to fit button in VIZ if graph isn't in view [pr]
* select #render
2025-02-26 14:20:39 +01:00
qazal
f70bad42ce
minor becomes_map cleanup + comments [pr] ( #9256 )
...
* substitute assign source for KERNEL + comments [pr]
* minor becomes_map cleanup + comments [pr]
2025-02-26 12:36:27 +01:00
George Hotz
7780393460
rig up torch's testing framework [pr] ( #9254 )
...
* rig up torch's testing framework [pr]
* support more movement ops
* dec on expand
* fix tests
* work
* fix tests
* a few more
* decomps + opt hook
* installed pytest
2025-02-26 18:46:22 +08:00
qazal
b3755370ae
substitute assign source for KERNEL + comments [pr] ( #9255 )
2025-02-26 11:44:29 +01:00
qazal
941559098b
do not lockup VIZ when rendering big graphs [pr] ( #8795 )
...
* new viz renderer
* aesthetics
* progress message
* pruning + timeout at 2s
2025-02-26 09:15:26 +01:00
qazal
e162aa862d
is_realized only if buffer is allocated ( #9253 )
...
* is_realized only if the buffer is allocated
* fix the image check too
* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
George Hotz
b603af373e
run some tests from torch [pr] ( #9252 )
...
* run some tests from torch [pr]
* yml
* wrap_out
* clean up for the new people
* a lil more
2025-02-26 15:42:22 +08:00
George Hotz
3f4eb9006a
test for device mismatch [pr] ( #9250 )
...
* test for device mismatch [pr]
* fix bert
2025-02-26 13:06:33 +08:00
Sieds Lykles
9c4d9d9f10
Acc first ( #9232 )
...
* put acc in front of the add chain
* handle the other case
* Make loop collapse more generic
* Remove mulacc_unrolled
* Actually remove it
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 22:10:15 -05:00
chenyu
979e84f30e
RESET_STEP in bert setup and beam ( #9248 )
...
running dev_beam might OOM without it, but it runs fine in a real run.
2025-02-25 19:15:10 -05:00
nimlgen
2676c9d46e
dsp: raise exec errors as RuntimeError for beam ( #9246 )
2025-02-25 19:22:35 +03:00
nimlgen
70db8c3003
hcq: dyn alloc signals ( #9238 )
...
* hcq: dyn alloc signals
* types and unique devs
* typing
* mypy
* mypy one more time
* test
* make fds not intersect between drivers in mockgpu
2025-02-25 17:22:24 +03:00
chenyu
6610ad58ab
hotfix bert no shard with only one device ( #9243 )
...
`LLVM=1 BERT_SIZE="tiny" DEFAULT_FLOAT=HALF BENCHMARK=5 MODEL="bert" python3 examples/mlperf/model_train.py` runs for me with this. It should not have failed with a single-device shard, though.
2025-02-25 09:05:11 -05:00
qazal
bba9c22f53
implement the new subbuffer spec for DISK [pr] ( #9241 )
2025-02-25 13:36:23 +01:00
qazal
48dfed064a
remove const/var from the kernel graph [pr] ( #9240 )
2025-02-25 12:21:55 +01:00
nimlgen
b4c3780df0
hotfix: interop example ( #9237 )
...
* hotfix: interop example
* rm this
* fix
* fix ci mps
* atol rtol
* no uaf
2025-02-25 10:32:00 +03:00
chenyu
8c7be428e5
update bert BS to 78 ( #9236 )
...
fits 78 now. about 215 tflops on green
2025-02-24 22:47:35 -05:00
Sieds Lykles
990c240b82
Stable pow gradient ( #9226 )
...
* Stable gradient
* More efficient
* Fix and test for +-inf
* cleaner
* skip webgpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00
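One way pow gradients blow up: the derivative of `x**y` with respect to `y` is `x**y * log(x)`, which produces `nan` once `x <= 0` even when the limit is finite. A masking sketch in NumPy (illustrative only; the actual rewrite may differ):

```python
import numpy as np

def pow_grad_wrt_y(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # np.where evaluates both branches, so feed log() a safe input first
    safe_x = np.where(x > 0, x, 1.0)
    return np.where(x > 0, x**y * np.log(safe_x), 0.0)

print(float(pow_grad_wrt_y(0.0, 3.0)))  # 0.0, where the naive formula gives nan
print(float(pow_grad_wrt_y(2.0, 3.0)))  # 8 * ln(2), about 5.545
```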
chenyu
731d14e718
hotfix bump testmetal2 timeout-minutes to 20 ( #9235 )
...
setup is taking too long
2025-02-24 20:23:56 -05:00
qazal
cbfe95d306
bring cast before view back ( #9230 )
...
* bring cast before view back
* tune it to only trigger on expands
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
chenyu
90c3ed17c5
move cast to before softmax in attention ( #9213 )
...
* move cast to before softmax in attention
saves some memory because exp (which is kept for backward) is done in half. training bert seems fine and can fit BS=78 now (up from 66)
* test
2025-02-24 17:24:59 -05:00
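The memory saving comes from the exp activations kept for backward being stored at half width once the input is cast before softmax. A NumPy sketch of the idea (not the attention code itself):

```python
import numpy as np

def softmax(x):
    # stable softmax; intermediates (including the saved exp) stay in x's dtype
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = np.random.default_rng(0).standard_normal((4, 128)).astype(np.float32)
out_full = softmax(x)                     # float32 intermediates
out_half = softmax(x.astype(np.float16))  # cast first: half the bytes

print(out_half.nbytes, out_full.nbytes)   # 1024 2048
```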
geohotstan
f0b24d230c
add test_onnx_ops.py ( #8569 )
...
* boom
* fix webgpu
* use exact variable names in test so that AI can read easier
* add tag for specific test name like test a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
nimlgen
56288243e6
metal PyTorch interop ( #9229 )
...
* add from_blob support to mps cuda
* objc_id
* metal pytorch interop
* fix comments
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
qazal
687d157906
delete cast early folding from ops [pr] ( #9228 )
2025-02-24 19:00:51 +01:00
George Hotz
c9493e41a6
reorder expand ( #9051 )
...
* reorder expand
* symbolic ops needs resolve here
* s/arg/st + whitespace
* viz
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] ( #9210 )
...
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
nimlgen
1d06d61b16
from_blob for cuda ( #9223 )
...
* from_blob for cuda
* maybe docs?
* minor docs
* example
* waiting 9224
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
George Hotz
fc32ff80d6
torch and numpy dtype interop [pr] ( #9224 )
...
* torch and numpy dtype interop [pr]
* less lines
* order
2025-02-24 18:26:49 +08:00
George Hotz
24615db5f5
hotfix: torch cuda interop example
2025-02-24 09:02:48 +00:00
George Hotz
fd731e740a
hotfix: add note on backend2.py
2025-02-24 11:23:03 +08:00
albanD
f2dd9c1562
simplify c++ code ( #9221 )
2025-02-24 11:04:41 +08:00