tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-25 06:48:22 -05:00

Author	SHA1	Message	Date
qazal	d342f7688d	remove some skips in test_schedule + use assertRaisesRegex [pr] (#10296 )	2025-05-14 14:54:07 +03:00
qazal	40f4ce3390	enable AMD CI for TestRandomness.test_multinomial [pr] (#10295 )	2025-05-14 14:32:22 +03:00
nimlgen	792853b9e2	usbgpu: enable cache for compute queue (#10294 )	2025-05-14 13:05:36 +03:00
nimlgen	1218fc2230	usbgpu: enable cache for 64bit addresses (#10293 )	2025-05-14 12:37:39 +03:00
qazal	1770e00c41	only CAPTURE_PROCESS_REPLAY=1 + add filterwarnings back [pr] (#10292 )	2025-05-14 11:58:42 +03:00
qazal	1c97338be5	enable process replay assert for schedule [pr] (#10280 ) * enable process replay assert for schedule * start at unique+1	2025-05-14 11:10:47 +03:00
George Hotz	f1130ab3d3	openpilot benchmark test (#10290 ) * openpilot benchmark test * that	2025-05-13 22:49:28 -07:00
uuuvn	f726f79a9e	Remote multi (transfer) (#10285 )	2025-05-13 18:26:32 -07:00
uuuvn	7bc4864bc4	Make `dev` a property of `Allocator` (#10286 ) * Make `dev` a property of `Allocator` (this is a prereq refactor for #10285) At least `BufferXfer.copy` accesses it assuming it's always present, currently most devices just add this property on their own repeating the same code over and over again. This is also a bit footguny, see `RemoteAllocator` that named this property `device` instead of `dev`, i could obviously just change that in one place but doing it globally seems like a better solution (and it reduces code duplication too). `MallocAllocator` is a bit special, but passing `None` works just fine. * typing * ignore type instead of cast	2025-05-13 17:01:01 -07:00
George Hotz	ec46f658d7	openpilot llvm test [pr] (#10288 )	2025-05-13 16:51:41 -07:00
uuuvn	453b268342	Factor out remote connection and cache it (#10282 ) Should be a small speed improvement but the main reason this is needed is to have a defined ordering of RemoteRequests within one host so that transfers won't require doing something like: `src_dev.batch_submit(); dest_dev.q(Transfer(dest, src_dev.session, src)); dest_dev.batch_submit()` for correctness.	2025-05-13 15:02:06 -07:00
uuuvn	ddff9857b8	Remote properties is a dataclass (#10283 ) Not strictly required for anything but soon there will be like 4 new properties and having it be a huge json just seems like a bad taste. It also seems right to not have a separate endpoint for this, just `GetProperties` request that returns a repr of this similar to how requests are sent in `BatchRequest`. This will also make a switch to anything other than http much simpler if it will be required for any reason, like just a tcp stream of `BatchRequest`s	2025-05-13 11:56:58 -07:00
uuuvn	ba87eca0f1	Remote multi (basic) (#10269 ) * Basic remote multi support Simplest thing to be able to use remote with multiple gpus, very slow because no transfers (copyin copyout for cross-device copies) * tests	2025-05-13 09:52:47 -07:00
George Hotz	5f64bbc63d	improve multi tests + add support for fixedvars [pr] (#10281 ) * improve multi tests + add support for fixedvars [pr] * add support for fixedvars	2025-05-13 09:27:00 -07:00
chenyu	8a906cb124	Tensor.randn_like (#10276 )	2025-05-13 11:53:59 -04:00
nimlgen	eab71d70ba	usbgpu: rescan pci bus every run (#10279 ) * usbgpu: rescan pci bus every run * ff	2025-05-13 18:31:42 +03:00
chenyu	c4988bc07b	only run test_u32_to_f16 if it supports fp16 (#10277 ) * only run test_u32_to_f16 if it supports fp16 * cleanup	2025-05-13 11:16:14 -04:00
nimlgen	9924c7d0e4	usbgpu: rebar (#10275 ) * usbgpu: rebar * cache back * revert this * fix * ugh * tt	2025-05-13 17:25:51 +03:00
uuuvn	1900c3c68a	Metal multi in ci is fine actually (#10274 ) Useful for testing remote multi stuff	2025-05-13 10:07:35 -04:00
nimlgen	6f42bf8b54	usbgpu: 10 steps in benchmark to hit cache (#10273 )	2025-05-13 17:06:50 +03:00
chenyu	ad5cb2717d	FUSE_ARANGE=1 in bert bench (#10263 ) still fails, something multi related maybe Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-05-13 09:12:19 -04:00
qazal	a2d6b0afe0	fix FUSE pushing through SHRINK (#10271 )	2025-05-13 11:38:53 +03:00
geohotstan	1c4ab6b991	ONNX add tests against ORT (#10270 ) * start * clean up * indicate file location too	2025-05-13 04:03:52 -04:00
nimlgen	bb31cc4582	usbgpu: check hash in patcher (#10266 )	2025-05-12 21:08:53 +03:00
uuuvn	94907d02c8	Move session to RemoteRequest (#10264 ) This is a prereq refactor for cloud multi which will make it possible to use multiple devices from cloud host instead of just one. I will do that via changing a session to be a `tuple[token, dev_idx]` Previously the session was in cookies, this is a problem because a single http request can contain many RemoteRequests with potentially different devices. The alternatives are either: - sending commands for different devices in separate http requests (slow) - only adding an idx in RemoteRequest in basically the same way i added session here, keeping session a cookie and concat in server. This is how i've done it previously and it looks just strictly worse than having it all be in the same place.	2025-05-12 10:06:09 -07:00
Sieds Lykles	02208565de	add check (#10257 )	2025-05-12 11:03:01 -04:00
nimlgen	08ab184dfd	usbgpu: copyin over 100mb/s (#10259 ) * usbgpu: over 100mb/s * align * h	2025-05-12 16:52:43 +03:00
Kirill R.	4c7c139102	Use cmod/cdiv in sym_infer (#10258 ) * Use cmod/cdiv in sym_infer * test --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-12 09:07:28 -04:00
chenyu	0015b3921f	sleep more in CI Remove amdgpu (#10261 ) see if this is less flaky	2025-05-12 08:13:44 -04:00
qazal	95c6a736a9	fix FUSE_ARANGE=1 for bert (#10255 )	2025-05-12 14:44:05 +03:00
Sieds Lykles	7c4b381fbf	Extra simplify valid test [pr] (#10256 ) * add test * Change the range * add todo test	2025-05-12 07:32:03 -04:00
b1tg	7eeb35ba6f	fix AMD LLVM compile error for bf16 cifar (#10254 ) * fix AMD LLVM compile error * remove llvm_bf16_cast --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-05-12 01:57:07 -04:00
uuuvn	a0ed1ec1ae	Faster remote server (#10235 )	2025-05-11 19:15:05 -07:00
b1tg	41f5ece877	add nsw flag (#10249 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-05-11 19:14:32 -07:00
George Hotz	8864ff894b	hotfix: that repeat_kv belongs outside the if	2025-05-11 18:43:01 -07:00
George Hotz	98c84a711d	min rectified flow example [pr] (#10252 ) * work on minrf example * more * jit sample * t is tensor not const * fixes * more convs * fix dropout * don't print * 504 * big patch * onehot * touch * use embeddings * dumb uses final layer * act * non fl * match * tp * 3 * of * ppsz * normal * add adln * no t * weird transformer * weird transformer * contig * actual speed fix * dumb * cb * 0 * t is 0 * mort-t * args * dumb days are over * readable * contig * no more t mask * mask_t * init to zero * clean * steps * work * tt * t * solid	2025-05-11 18:36:44 -07:00
chenyu	70c797b107	train bert tests (#10248 ) added a working bert tiny test, and a failed bert FUSE_ARANGE test	2025-05-11 08:42:08 -04:00
George Hotz	b2df4cb696	add support for amdgpu-flat-work-group-size to AMD LLVM IR (#10246 ) * add support for amdgpu-flat-work-group-size to AMD LLVM IR * don't spam llvm init --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-05-10 19:11:10 -07:00
qazal	9210280811	add v_fmac_f16 vop3 instruction to remu (#10247 ) * fmac vop3 * from the box	2025-05-10 23:48:25 +03:00
George Hotz	697259a8a1	amd_comgr_action_info_set_options was deprecated [pr] (#10245 ) * amd_comgr_action_info_set_options was deprecated [pr] * more standard	2025-05-10 11:59:04 -07:00
Kevin Buhler	2e0990c4e9	even spacing in viz nodes (#10168 ) * even spacing in viz nodes * precise dy value * dominant-baseline text-after-edge * add STROKE_WIDTH constant, delete dominant_baseline attr --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-05-10 10:35:10 +03:00
chenyu	d0e9b74f40	minor div_and_mod_folding cleanup [pr] (#10243 ) remove type ignore and one walrus	2025-05-09 22:42:01 -04:00
Adam Van Ymeren	a28ca0680f	update dead link (#10242 )	2025-05-09 19:59:52 -04:00
nimlgen	2145bce3f9	usbgpu: copyin size is 16k (#10240 ) * usbgpu: copyin size is 16k * ush	2025-05-09 22:12:54 +03:00
Sieds Lykles	74e40aafa0	use cdiv in div and mod folding (#10216 ) * use cdiv * use cdiv and cmod there as well * Add tests --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-09 12:37:24 -04:00
Sieds Lykles	8da9c070ca	take gcd out of trunc div (#10238 )	2025-05-09 12:08:10 -04:00
qazal	e2292f6663	TRACEMETA>=2 displays UOp metadata in VIZ (#10237 )	2025-05-09 17:42:00 +03:00
qazal	d5686f33a9	delete KernelContext dataclass [pr] (#10236 )	2025-05-09 17:36:21 +03:00
qazal	467daf8d4c	remap UOp metadata in graph_rewrite_map [pr] (#10234 ) * remap metadata in graph_rewrite_map [pr] * fix * merge loops * UOp.metadata returns Metadata|None * shorter	2025-05-09 17:20:53 +03:00
nimlgen	4c75b124b6	usb: copy into mv is faster (#10233 ) * usb: copy into mv is faster * missing * bytes	2025-05-09 14:53:36 +03:00

10490 Commits