nimlgen
b4c3780df0
hotfix: interop example (#9237)
* hotfix: interop example
* rm this
* fix
* fix ci mps
* atol rtol
* no uaf
2025-02-25 10:32:00 +03:00
chenyu
8c7be428e5
update bert BS to 78 (#9236)
fits BS=78 now. about 215 TFLOPS on green
2025-02-24 22:47:35 -05:00
Sieds Lykles
990c240b82
Stable pow gradient (#9226)
* Stable gradient
* More efficient
* Fix and test for +-inf
* cleaner
* skip webgpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 20:54:26 -05:00
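The instability being fixed is easy to see in the gradient of the base: writing it as y*x**y/x divides 0 by 0 at x == 0. A minimal sketch of the idea (not the PR's actual rewrite, which also has to fix and test the +-inf cases):

```python
from tinygrad import Tensor

x, y = Tensor([0.0, 2.0]), 3.0
naive = y * x.pow(y) / x     # 0/0 at x == 0 gives nan, though the true gradient there is 0
stable = y * x.pow(y - 1.0)  # d/dx x**y = y * x**(y-1), well-defined at x == 0
print(naive.numpy(), stable.numpy())
```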
chenyu
731d14e718
hotfix bump testmetal2 timeout-minutes to 20 (#9235)
setup is taking too long
2025-02-24 20:23:56 -05:00
qazal
cbfe95d306
bring cast before view back (#9230)
* bring cast before view back
* tune it to only trigger on expands
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
chenyu
90c3ed17c5
move cast to before softmax in attention (#9213)
* move cast to before softmax in attention
saved some memory because exp (which is kept for backward) is done in half. training bert seems fine and can fit BS=78 now (up from 66)
* test
2025-02-24 17:24:59 -05:00
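The reordering amounts to something like this sketch (shapes illustrative, not the PR's code):

```python
from tinygrad import Tensor, dtypes

scores = Tensor.randn(8, 16, 64, 64)   # illustrative attention scores
# before: scores.softmax(-1).cast(dtypes.half) -- the exp saved for backward stays in float32
# after: cast first, so the saved exp is half and activation memory shrinks
attn = scores.cast(dtypes.half).softmax(-1)
```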
geohotstan
f0b24d230c
add test_onnx_ops.py (#8569)
* boom
* fix webgpu
* use exact variable names in test so that AI can read it more easily
* add tag for specific test name like test a specific dtype
* fix ruff
* astype everything
* dtype in array creation
* just arange
* is 67% considered fixed?
* move test up
* small cleanups
* share function
* add qgemm as well
* add qgemm too
* make sure qgemm comes out as int
* take out qgemm for now
* fixed test
* add correct qgemm
* addressing feedback here too, early naive fix for now
* simplify bias and c to be minimalistic enough to test correctness
* refactored qlinearops
* maybe these asserts aren't the best..
* fix test
* updated tests to cover new ops
* try to add to CI
* move test_onnx_ops into testextra/
* more attention tests
* qlinear_add atol=1
* attention still not fullllllly correct
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-24 16:15:22 -05:00
nimlgen
56288243e6
metal PyTorch interop (#9229)
* add from_blob support to mps cuda
* objc_id
* metal pytorch interop
* fix comments
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-02-24 22:36:08 +03:00
qazal
687d157906
delete cast early folding from ops [pr] (#9228)
2025-02-24 19:00:51 +01:00
George Hotz
c9493e41a6
reorder expand (#9051)
* reorder expand
* symbolic ops needs resolve here
* s/arg/st + whitespace
* viz
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
nimlgen
1d06d61b16
from_blob for cuda (#9223)
* from_blob for cuda
* maybe docs?
* minor docs
* example
* waiting 9224
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-24 14:02:06 +03:00
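Together with the metal interop above, the pattern these changes enable looks roughly like this (assuming a Tensor.from_blob(ptr, shape, dtype=..., device=...) signature; see the docs added in this PR):

```python
import torch
from tinygrad import Tensor, dtypes

t = torch.zeros(4, 4, device="cuda")
# wrap the torch allocation without copying; t must stay alive while `a` is used (no use-after-free)
a = Tensor.from_blob(t.data_ptr(), tuple(t.shape), dtype=dtypes.float32, device="CUDA")
print((a + 1).numpy())
```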
George Hotz
fc32ff80d6
torch and numpy dtype interop [pr] (#9224)
* torch and numpy dtype interop [pr]
* less lines
* order
2025-02-24 18:26:49 +08:00
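Conceptually this is a small lookup between the three dtype systems; a hypothetical illustration (these table names are not the PR's):

```python
import numpy as np, torch
from tinygrad import dtypes

# hypothetical mapping tables, for illustration only
_TORCH_TO_TINY = {torch.float32: dtypes.float32, torch.float16: dtypes.float16, torch.int32: dtypes.int32}
_NP_TO_TINY = {np.dtype(np.float32): dtypes.float32, np.dtype(np.int32): dtypes.int32}
assert _TORCH_TO_TINY[torch.float16] == dtypes.float16
```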
George Hotz
24615db5f5
hotfix: torch cuda interop example
2025-02-24 09:02:48 +00:00
George Hotz
fd731e740a
hotfix: add note on backend2.py
2025-02-24 11:23:03 +08:00
albanD
f2dd9c1562
simplify c++ code (#9221)
2025-02-24 11:04:41 +08:00
qazal
d12efc95d4
support custom name function in viz [pr] (#9219)
* support custom name function in viz [pr]
* title case
* assert name count in test_track_rewrites_name_fxn
2025-02-24 03:03:25 +02:00
chenyu
b3ae664d5d
fix gradient of pow(t, int) (#9217)
semi-revert some pow logic back to tensor. added a direct gradient check because the backward in test_ops passed by luck
2025-02-23 17:42:09 -05:00
qazal
12b5b83821
set TRACK_MATCH_STATS=0 for real_strides [pr] (#9216)
2025-02-23 23:26:31 +02:00
qazal
9db0ec46a7
simpler buf_uop [pr] (#9215)
* simpler buf_uop [pr]
* assert after realize it's buffer
2025-02-23 19:23:14 +01:00
qazal
898aafe6fd
move split_reduceop to scheduler + enable it for multi (#9214)
* move split_reduceop to scheduler + enable it for multi
* merge r and _reduceop
2025-02-23 17:30:04 +01:00
ShikChen
05e3202fba
remove unused memsize_to_str and minor cleanups [pr] (#9211)
* fix edge cases in memsize_to_str()
Inputs <= 1 now return "0.00 B" for 0 and "1.00 B" for 1, avoiding an
IndexError. Also, memsize_to_str(1000) now returns "1.00 KB" instead of
"1000.00 B".
Replaced the list comprehension with a next(...) generator for conciseness
and efficiency.
* simplify code using idiomatic python
- Remove the unused `memsize_to_str()` function in helpers.
- Use a tuple for checking multiple string prefixes/suffixes.
- Avoid unnecessary list construction by using iterables directly.
- Check None in @diskcache to ensure proper caching of falsy values.
* revert generators back to list comprehension
Sometimes building the list first can be faster. Keep it as is.
2025-02-23 09:58:37 -05:00
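The fixed helper described above looked roughly like this before it was removed (a sketch reconstructed from the commit message, not the exact code):

```python
def memsize_to_str(n: int) -> str:
  # next(...) takes the first unit that fits; the default covers n <= 1 without an IndexError
  return next((f"{n/d:.2f} {u}" for d, u in [(1e9, "GB"), (1e6, "MB"), (1e3, "KB")] if n >= d), f"{n:.2f} B")

assert memsize_to_str(0) == "0.00 B" and memsize_to_str(1000) == "1.00 KB"
```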
qazal
81a71ae0f6
hotfix: skip test_exclude_const_metadata (#9208)
2025-02-22 23:26:04 +02:00
chenyu
e0adb1fc76
really run test_ops with TINY_BACKEND in ci (#9206)
was failing with `line 1: pytest: command not found`
2025-02-22 15:51:24 -05:00
qazal
e6d20c47e3
simpler becomes_map update [pr] (#9201)
* simpler becomes_map update
* err, no metadata for device
* simpler tensor metadata mapping + tests [pr]
* remove kernel metadata
* don't map nones
* pruning
* linter
2025-02-22 20:50:58 +01:00
qazal
4578c3e8fd
simpler tensor metadata mapping + tests [pr] (#9203)
* simpler tensor metadata mapping + tests [pr]
* remove kernel metadata
* don't map nones
2025-02-22 20:18:46 +01:00
qazal
b711c6343a
no early return + allow childless const/bind/var in kernel graph [pr] (#9202)
2025-02-22 19:28:22 +01:00
George Hotz
97bc723538
torch backend works for ResNet-18 (#9200)
* torch backend progress, a few more functions
* resnet works
* pillow
* tv
2025-02-22 22:16:23 +08:00
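Usage at this point was along these lines (the backend module path and the "tiny" device name are assumptions based on the tree at the time):

```python
import torch, torchvision
import extra.torch_backend.backend  # noqa: F401  (assumed path; registers the tinygrad torch device)

model = torchvision.models.resnet18(weights="DEFAULT").eval().to("tiny")  # "tiny" device name assumed
x = torch.randn(1, 3, 224, 224, device="tiny")
print(model(x).argmax(-1).item())
```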
George Hotz
f92820d30d
torch backend tests (#9198)
* torch backend tests
* pythonpath
* install ninja
2025-02-22 16:01:49 +08:00
George Hotz
4e6665bda5
different way to write torch backend (#9197)
* different way to write torch backend
* both backends
* more work
* simpler code
* more work
* test both
* imply unwrap/wrap
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* ready to start making test_ops work in torch backend
* backward pass, TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_add works
* FORWARD_ONLY=1 TINY_BACKEND=1 python3 test/test_ops.py TestOps.test_simple_conv2d works
* matmul backward is broken with as_strided
2025-02-22 14:42:26 +08:00
nimlgen
041b6d5678
am: load fw in batches (#9185)
* am: load fw in batches
* am: 1mb less fw copies
* mypy
* list
2025-02-21 23:21:31 +03:00
qazal
1db4341e9f
move viz graph to lib/graph [pr] (#9196)
* move viz graph to lib/graph [pr]
* add package
* share with program
2025-02-21 21:04:07 +01:00
geohotstan
6587c7879b
simple fixes to onnx (#9195)
* uncontroversial changes
* cleaner _prepare_quantize
2025-02-21 13:10:06 -05:00
Simon R
2318d7ac51
Add missing tinygrad.runtime.autogen.am to packages (#9194)
2025-02-21 15:38:24 +02:00
qazal
8bb80b6e5e
reorder AST matchers + comments [pr] (#9193)
2025-02-21 14:31:15 +01:00
qazal
2eab8021fb
remove inputs+outputs attributes from ScheduleItem [pr] (#9192)
* remove inputs/outputs from ScheduleItem
* fix test_linearizer
* fix test_conv_shapetracker
* fix test_schedule + lint
* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
George Hotz
e87be0131e
torch backend start (#9191)
* start torch backend
* progress
* ugh, you need cpp crap
* 1+1 works
* 1+1 works
* becoming a real backend
* ready to merge?
2025-02-21 16:57:28 +08:00
George Hotz
d3a21cced2
hotfix: bump version to 0.10.2
v0.10.2
2025-02-21 10:43:49 +08:00
chenyu
2e7c2780a9
CLANG -> CPU (#9189)
2025-02-20 18:03:09 -05:00
nimlgen
f986e12f91
metal: choose compile spec based on macos (#9188)
* metal: choose compile spec based on macos
* correction
2025-02-21 00:43:39 +03:00
chenyu
3e22747799
run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]
* testing_unit + windows
2025-02-20 14:40:41 -05:00
chenyu
287de4ecc6
use torch in test_gradient (#9186)
used torch.autograd.grad, but not sure if it can be made a template like the jax one
2025-02-20 12:26:11 -05:00
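For reference, the torch side of such a check is a one-liner with torch.autograd.grad (a generic sketch, not the repo's test):

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
(g,) = torch.autograd.grad((x ** 3).sum(), x)
assert torch.allclose(g, 3 * x.detach() ** 2)  # analytic gradient of x**3
```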
qazal
574a905291
Fix running VIZ=1 after package installation + test (#9183)
* test running viz from pip install
* add pkg
* do 10 connection attempts
* include assets in package_data
* quiet curl
* better print
2025-02-20 15:02:00 +01:00
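"include assets in package_data" is the standard setuptools mechanism; a minimal sketch with assumed globs:

```python
from setuptools import setup

setup(
  name="tinygrad",
  packages=["tinygrad"],
  # ship the viz frontend in the wheel so VIZ=1 works from a pip install (globs assumed)
  package_data={"tinygrad": ["viz/index.html", "viz/assets/*"]},
)
```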
chenyu
1692087db5
_one_hot_along_dim input needs to be int (#9179)
* _one_hot_along_dim input needs to be int
indexing and one-hot compare against arange, and a non-int dtype is likely a bug
2025-02-20 09:00:43 -05:00
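The arange comparison referenced above is essentially this sketch (the idea, not the internal helper itself):

```python
from tinygrad import Tensor

idx = Tensor([2, 0, 1])                          # integer class ids
one_hot = Tensor.arange(3) == idx.unsqueeze(-1)  # broadcast compare along the class dim
print(one_hot.numpy())  # float ids would make this equality check silently wrong
```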
George Hotz
bf36967883
cuda hooking (#9180)
* cuda hooking
* progress
* more hook cuda
* fix params
* compile + cuMemHostAlloc hook
* work
* revert that
2025-02-20 19:20:01 +08:00
chenyu
3b37cc898b
add bert tiny config (#9177)
set with BERT_SIZE=tiny. easier to study embedding and fusion
2025-02-19 14:57:03 -05:00
qazal
5662c898f1
correctly step through bottom_up_rewrites in viz [pr] (#9176)
2025-02-19 19:20:57 +01:00
peppingdore
b1ddb2a1a6
fix win32 CPUProgram missing cache flush (#9171)
* win32: fix missing inst cache flush, rename ptr->self.mem for consistency with posix code
* fix types, remove assert
* fix memory leak
* rm whitespace
2025-02-19 21:38:51 +08:00
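The missing flush is the Win32 FlushInstructionCache call; a standalone sketch of the ingredient (not the PR's exact code):

```python
import ctypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

def flush_icache(addr: int, size: int) -> None:
  # without this, the CPU can execute stale bytes from freshly written JIT memory
  if not kernel32.FlushInstructionCache(kernel32.GetCurrentProcess(), ctypes.c_void_p(addr), ctypes.c_size_t(size)):
    raise ctypes.WinError(ctypes.get_last_error())
```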
qazal
1bb9d78c7a
hotfix: add output buffer back to kernel parents + comment [pr] (#9174)
2025-02-19 14:22:01 +01:00
chenyu
975c318dbc
bert use int32 for input ids (#9173)
original data was int32 for these. float might have caused precision issues
2025-02-19 08:17:27 -05:00