when testing ALU ops for bfloat16, inputs should be cast to bfloat16 first; otherwise the numpy reference mixes the error from casting the inputs with the error from the ALU op itself, which makes the comparison less accurate
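A rough numpy sketch of the idea (the `to_bf16` helper is hypothetical; it simulates bfloat16 by truncating the low mantissa bits, which is round-toward-zero rather than the round-to-nearest most hardware uses):
```python
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
  # simulate bfloat16 by zeroing the low 16 bits of a float32,
  # then widen back to float32 for the numpy reference computation
  return (x.astype(np.float32).view(np.uint32) & 0xFFFF0000).view(np.float32)

a = np.random.randn(8).astype(np.float32)
b = np.random.randn(8).astype(np.float32)
# cast the inputs first so the reference only measures the op's error,
# not the input-quantization error stacked on top of it
ref = to_bf16(a) + to_bf16(b)
```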
* Remote `.q(..., wait=True)`
Seems a bit cleaner than doing `.batch_request()` after `.q(...)` for
requests with a return value.
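Roughly the before/after shape (`SomeRequest` and `dev` are placeholders for a request type with a reply and a connected remote device):
```python
# before: queue the request, then flush the batch to collect the reply
dev.q(SomeRequest())
ret = dev.batch_request()

# after: wait=True queues, flushes, and returns the reply in one call
ret = dev.q(SomeRequest(), wait=True)
```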
* Remote finalize
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
* Make `dev` a property of `Allocator`
(this is a prereq refactor for #10285)
At least `BufferXfer.copy` accesses it assuming it's always present;
currently most devices just add this property on their own, repeating
the same code over and over again.
This is also a bit footgunny: see `RemoteAllocator`, which named this
property `device` instead of `dev`. I could obviously just change that
in one place, but doing it globally seems like a better solution (and it
reduces code duplication too).
`MallocAllocator` is a bit special, but passing `None` works just fine.
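A minimal sketch of the shape this gives (class bodies are illustrative, not tinygrad's actual definitions):
```python
class Allocator:
  def __init__(self, dev):
    # the device now lives on the base class, so BufferXfer.copy and
    # friends can rely on self.dev without every subclass re-adding it
    self.dev = dev

class RemoteAllocator(Allocator):
  def __init__(self, dev):
    # previously stored as self.device; unified to .dev via the base class
    super().__init__(dev)

malloc_alloc = Allocator(None)  # MallocAllocator has no real device, None is fine
```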
* typing
* ignore type instead of cast
This should be a small speed improvement, but the main reason it's needed
is to have a defined ordering of RemoteRequests within one host, so that
transfers won't require doing something like:
```python
src_dev.batch_submit()
dest_dev.q(Transfer(dest, src_dev.session, src))
dest_dev.batch_submit()
```
for correctness.
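Presumably, with ordering guaranteed within one host, the explicit flushes go away and the same transfer is just:
```python
dest_dev.q(Transfer(dest, src_dev.session, src))
```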
Not strictly required for anything, but soon there will be around 4 new
properties, and having it be a huge json just seems like bad taste.
It also seems right not to have a separate endpoint for this, just a
`GetProperties` request that returns a repr of this, similar to how
requests are sent in `BatchRequest`.
This will also make switching to anything other than HTTP much simpler
if it's ever required for any reason, e.g. just a TCP stream of
`BatchRequest`s
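A sketch of that repr round-trip, assuming a dataclass-style properties object (the field names here are made up):
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RemoteProperties:
  real_device: str       # hypothetical fields, just to show the
  offset_supported: bool # repr round-trip

# the server replies with repr(props); the client can eval it back,
# the same way requests are serialized for BatchRequest
props = RemoteProperties(real_device="METAL", offset_supported=True)
assert eval(repr(props)) == props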
* Basic remote multi support
The simplest thing that lets remote be used with multiple GPUs. Very slow
because there are no direct transfers yet (cross-device copies fall back
to copyin/copyout).
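The slow path presumably bounces through host memory, something like this (buffer/allocator attribute names are illustrative):
```python
def slow_cross_device_copy(dest, src, nbytes: int):
  # no direct device-to-device transfer yet: copyout from the source
  # device into host memory, then copyin to the destination device
  staging = memoryview(bytearray(nbytes))
  src.allocator.copyout(staging, src._buf)
  dest.allocator.copyin(dest._buf, staging)
```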
* tests