Commit Graph

8770 Commits

Author SHA1 Message Date
chenyu
ad5cb2717d FUSE_ARANGE=1 in bert bench (#10263)
still fails, something multi related maybe

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-13 09:12:19 -04:00
qazal
a2d6b0afe0 fix FUSE pushing through SHRINK (#10271) 2025-05-13 11:38:53 +03:00
geohotstan
1c4ab6b991 ONNX add tests against ORT (#10270)
* start

* clean up

* indicate file location too
2025-05-13 04:03:52 -04:00
nimlgen
bb31cc4582 usbgpu: check hash in patcher (#10266) 2025-05-12 21:08:53 +03:00
uuuvn
94907d02c8 Move session to RemoteRequest (#10264)
This is a prereq refactor for cloud multi, which will make it possible to
use multiple devices from the cloud host instead of just one.

I will do that by changing a session to be a `tuple[token, dev_idx]`.

Previously the session was in cookies; this is a problem because a single
HTTP request can contain many RemoteRequests with potentially different
devices.

The alternatives are either:

- sending commands for different devices in separate HTTP requests (slow)

- only adding an idx to RemoteRequest in basically the same way I added the
session here, keeping the session a cookie and concatenating them in the
server. This is how I did it previously, and it is strictly worse than
having it all in the same place.
2025-05-12 10:06:09 -07:00
Sieds Lykles
02208565de add check (#10257) 2025-05-12 11:03:01 -04:00
nimlgen
08ab184dfd usbgpu: copyin over 100mb/s (#10259)
* usbgpu: over 100mb/s

* align

* h
2025-05-12 16:52:43 +03:00
Kirill R.
4c7c139102 Use cmod/cdiv in sym_infer (#10258)
* Use cmod/cdiv in sym_infer

* test

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-12 09:07:28 -04:00
chenyu
0015b3921f sleep more in CI Remove amdgpu (#10261)
see if this is less flaky
2025-05-12 08:13:44 -04:00
qazal
95c6a736a9 fix FUSE_ARANGE=1 for bert (#10255) 2025-05-12 14:44:05 +03:00
Sieds Lykles
7c4b381fbf Extra simplify valid test [pr] (#10256)
* add test

* Change the range

* add todo test
2025-05-12 07:32:03 -04:00
b1tg
7eeb35ba6f fix AMD LLVM compile error for bf16 cifar (#10254)
* fix AMD LLVM compile error

* remove llvm_bf16_cast

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-12 01:57:07 -04:00
uuuvn
a0ed1ec1ae Faster remote server (#10235) 2025-05-11 19:15:05 -07:00
b1tg
41f5ece877 add nsw flag (#10249)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-11 19:14:32 -07:00
George Hotz
8864ff894b hotfix: that repeat_kv belongs outside the if 2025-05-11 18:43:01 -07:00
George Hotz
98c84a711d min rectified flow example [pr] (#10252)
* work on minrf example

* more

* jit sample

* t is tensor not const

* fixes

* more convs

* fix dropout

* don't print

* 504

* big patch

* onehot

* touch

* use embeddings

* dumb uses final layer

* act

* non fl

* match

* tp

* 3

* of

* ppsz

* normal

* add adln

* no t

* weird transformer

* weird transformer

* contig

* actual speed fix

* dumb

* cb

* 0

* t is 0

* mort-t

* args

* dumb days are over

* readable

* contig

* no more t mask

* mask_t

* init to zero

* clean

* steps

* work

* tt

* t

* solid
2025-05-11 18:36:44 -07:00
chenyu
70c797b107 train bert tests (#10248)
added a working bert tiny test, and a failing bert FUSE_ARANGE test
2025-05-11 08:42:08 -04:00
George Hotz
b2df4cb696 add support for amdgpu-flat-work-group-size to AMD LLVM IR (#10246)
* add support for amdgpu-flat-work-group-size to AMD LLVM IR

* don't spam llvm init

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-10 19:11:10 -07:00
qazal
9210280811 add v_fmac_f16 vop3 instruction to remu (#10247)
* fmac vop3

* from the box
2025-05-10 23:48:25 +03:00
George Hotz
697259a8a1 amd_comgr_action_info_set_options was deprecated [pr] (#10245)
* amd_comgr_action_info_set_options was deprecated [pr]

* more standard
2025-05-10 11:59:04 -07:00
Kevin Buhler
2e0990c4e9 even spacing in viz nodes (#10168)
* even spacing in viz nodes

* precise dy value

* dominant-baseline text-after-edge

* add STROKE_WIDTH constant, delete dominant_baseline attr

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-10 10:35:10 +03:00
chenyu
d0e9b74f40 minor div_and_mod_folding cleanup [pr] (#10243)
remove type ignore and one walrus
2025-05-09 22:42:01 -04:00
Adam Van Ymeren
a28ca0680f update dead link (#10242) 2025-05-09 19:59:52 -04:00
nimlgen
2145bce3f9 usbgpu: copyin size is 16k (#10240)
* usbgpu: copyin size is 16k

* ush
2025-05-09 22:12:54 +03:00
Sieds Lykles
74e40aafa0 use cdiv in div and mod folding (#10216)
* use cdiv

* use cdiv and cmod there as well

* Add tests

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-09 12:37:24 -04:00
Sieds Lykles
8da9c070ca take gcd out of trunc div (#10238) 2025-05-09 12:08:10 -04:00
qazal
e2292f6663 TRACEMETA>=2 displays UOp metadata in VIZ (#10237) 2025-05-09 17:42:00 +03:00
qazal
d5686f33a9 delete KernelContext dataclass [pr] (#10236) 2025-05-09 17:36:21 +03:00
qazal
467daf8d4c remap UOp metadata in graph_rewrite_map [pr] (#10234)
* remap metadata in graph_rewrite_map [pr]

* fix

* merge loops

* UOp.metadata returns Metadata|None

* shorter
2025-05-09 17:20:53 +03:00
nimlgen
4c75b124b6 usb: copy into mv is faster (#10233)
* usb: copy into mv is faster

* missing

* bytes
2025-05-09 14:53:36 +03:00
nimlgen
d08ce62553 hcq: do not reread signal in wait (#10232) 2025-05-09 14:38:36 +03:00
nimlgen
0464a31000 usbgpu: no overrun check needed (#10231) 2025-05-09 14:20:24 +03:00
nimlgen
116390083f nvme speed write example (#10230) 2025-05-09 14:20:01 +03:00
chenyu
9846435c2e fix test_div_numerator_negative (#10229)
the simplification was wrong with negative const_factor
2025-05-09 06:19:59 -04:00
chenyu
cba508c8c3 update uop symbolic tests (#10228)
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319 better bound for mod negative number (#10227) 2025-05-09 01:19:47 -04:00
chenyu
99f6d89dfb tighter idiv bound for symbolic denominator (#10226) 2025-05-08 22:38:56 -04:00
uuuvn
82a6160ff7 Detect metal paravirtualization bug via device name instead of CI (#10225) 2025-05-08 19:31:47 -07:00
Xingyu
a21369d039 Enhance tensor random functions with dtype support (#10214)
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include a dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py

* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
2025-05-08 20:48:07 -04:00
qazal
b6904bbf83 Revert "split grouper into insert and finalize stages [pr] (#10222)" (#10224)
This reverts commit 2594e4db15.
2025-05-09 03:02:38 +03:00
qazal
2594e4db15 split grouper into insert and finalize stages [pr] (#10222) 2025-05-09 02:36:22 +03:00
George Hotz
0b7e3e86d0 single device copy [pr] (#10221)
* single device copy [pr]

* simpler
2025-05-08 15:23:22 -07:00
qazal
1d0f239df7 use Tensor.train() in schedule test + typo [pr] (#10220) 2025-05-08 23:46:42 +03:00
qazal
ff2aa6d0b2 buffer in create_kernel is optional [pr] (#10218)
* buffer in create_kernel is optional [pr]

* pylint
2025-05-08 22:35:55 +03:00
qazal
40560e77c2 minor grouper + viz fixup [pr] (#10217)
* minor grouper + viz fixup [pr]

* gitignore mypy_cache

* reorder create_kernels

* replace with realized

* use tensor_map + viz before spec

* lint

* add that back
2025-05-08 21:39:44 +03:00
George Hotz
0411b09763 small changes from new multi [pr] (#10213) 2025-05-08 07:04:27 -07:00
Sieds Lykles
a0580e8d3c Cleanup in div_and_mod_folding [pr] (#10178)
* Refactor binary var simplification

* Simplify the congruence logic

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-08 06:25:32 -07:00
nimlgen
267ba9b592 usbgpu: better names in copy speed benchmark (#10212) 2025-05-08 16:12:37 +03:00
hooved
7b4f05fd00 Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
e24fe1c746 usbgpu: pci cache (#10207) 2025-05-08 14:31:01 +03:00