nimlgen
08ab184dfd
usbgpu: copyin over 100mb/s (#10259)
* usbgpu: over 100mb/s
* align
* h
2025-05-12 16:52:43 +03:00
Kirill R.
4c7c139102
Use cmod/cdiv in sym_infer (#10258)
* Use cmod/cdiv in sym_infer
* test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-12 09:07:28 -04:00
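Several commits in this log (#10258, #10216, #10178) move symbolic div/mod folding onto cdiv/cmod helpers. As background, a minimal illustrative sketch of C-style truncated division and remainder versus Python's floor-based `//` and `%` — these are stand-in definitions for exposition, not tinygrad's actual helpers:

```python
def cdiv(x: int, y: int) -> int:
    # C-style division truncates toward zero; Python's // floors toward -inf,
    # so the two disagree whenever exactly one operand is negative.
    q = abs(x) // abs(y)
    return -q if (x < 0) != (y < 0) else q

def cmod(x: int, y: int) -> int:
    # C-style remainder takes the sign of the dividend,
    # chosen so that cdiv(x, y) * y + cmod(x, y) == x always holds.
    return x - cdiv(x, y) * y

# Python:  -7 // 2 == -4   and  -7 % 2 == 1
# C-style: cdiv(-7, 2) == -3  and  cmod(-7, 2) == -1
```

Folding rules derived with floor-division identities are not valid for truncated division around negative values, which is why the tests in these PRs exercise negative operands.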
chenyu
0015b3921f
sleep more in CI Remove amdgpu (#10261)
see if this is less flaky
2025-05-12 08:13:44 -04:00
qazal
95c6a736a9
fix FUSE_ARANGE=1 for bert (#10255)
2025-05-12 14:44:05 +03:00
Sieds Lykles
7c4b381fbf
Extra simplify valid test [pr] (#10256)
* add test
* Change the range
* add todo test
2025-05-12 07:32:03 -04:00
b1tg
7eeb35ba6f
fix AMD LLVM compile error for bf16 cifar (#10254)
* fix AMD LLVM compile error
* remove llvm_bf16_cast
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-12 01:57:07 -04:00
uuuvn
a0ed1ec1ae
Faster remote server (#10235)
2025-05-11 19:15:05 -07:00
b1tg
41f5ece877
add nsw flag (#10249)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-11 19:14:32 -07:00
George Hotz
8864ff894b
hotfix: that repeat_kv belongs outside the if
2025-05-11 18:43:01 -07:00
George Hotz
98c84a711d
min rectified flow example [pr] (#10252)
* work on minrf example
* more
* jit sample
* t is tensor not const
* fixes
* more convs
* fix dropout
* don't print
* 504
* big patch
* onehot
* touch
* use embeddings
* dumb uses final layer
* act
* non fl
* match
* tp
* 3
* of
* ppsz
* normal
* add adln
* no t
* weird transformer
* weird transformer
* contig
* actual speed fix
* dumb
* cb
* 0
* t is 0
* mort-t
* args
* dumb days are over
* readable
* contig
* no more t mask
* mask_t
* init to zero
* clean
* steps
* work
* tt
* t
* solid
2025-05-11 18:36:44 -07:00
chenyu
70c797b107
train bert tests (#10248)
added a working bert tiny test, and a failed bert FUSE_ARANGE test
2025-05-11 08:42:08 -04:00
George Hotz
b2df4cb696
add support for amdgpu-flat-work-group-size to AMD LLVM IR (#10246)
* add support for amdgpu-flat-work-group-size to AMD LLVM IR
* don't spam llvm init
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-10 19:11:10 -07:00
qazal
9210280811
add v_fmac_f16 vop3 instruction to remu (#10247)
* fmac vop3
* from the box
2025-05-10 23:48:25 +03:00
George Hotz
697259a8a1
amd_comgr_action_info_set_options was deprecated [pr] (#10245)
* amd_comgr_action_info_set_options was deprecated [pr]
* more standard
2025-05-10 11:59:04 -07:00
Kevin Buhler
2e0990c4e9
even spacing in viz nodes (#10168)
* even spacing in viz nodes
* precise dy value
* dominant-baseline text-after-edge
* add STROKE_WIDTH constant, delete dominant_baseline attr
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-10 10:35:10 +03:00
chenyu
d0e9b74f40
minor div_and_mod_folding cleanup [pr] (#10243)
remove type ignore and one walrus
2025-05-09 22:42:01 -04:00
Adam Van Ymeren
a28ca0680f
update dead link (#10242)
2025-05-09 19:59:52 -04:00
nimlgen
2145bce3f9
usbgpu: copyin size is 16k (#10240)
* usbgpu: copyin size is 16k
* ush
2025-05-09 22:12:54 +03:00
Sieds Lykles
74e40aafa0
use cdiv in div and mod folding (#10216)
* use cdiv
* use cdiv and cmod there as well
* Add tests
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-09 12:37:24 -04:00
Sieds Lykles
8da9c070ca
take gcd out of trunc div (#10238)
2025-05-09 12:08:10 -04:00
qazal
e2292f6663
TRACEMETA>=2 displays UOp metadata in VIZ (#10237)
2025-05-09 17:42:00 +03:00
qazal
d5686f33a9
delete KernelContext dataclass [pr] (#10236)
2025-05-09 17:36:21 +03:00
qazal
467daf8d4c
remap UOp metadata in graph_rewrite_map [pr] (#10234)
* remap metadata in graph_rewrite_map [pr]
* fix
* merge loops
* UOp.metadata returns Metadata|None
* shorter
2025-05-09 17:20:53 +03:00
nimlgen
4c75b124b6
usb: copy into mv is faster (#10233)
* usb: copy into mv is faster
* missing
* bytes
2025-05-09 14:53:36 +03:00
nimlgen
d08ce62553
hcq: do not reread signal in wait (#10232)
2025-05-09 14:38:36 +03:00
nimlgen
0464a31000
usbgpu: no overrun check needed (#10231)
2025-05-09 14:20:24 +03:00
nimlgen
116390083f
nvme speed write example (#10230)
2025-05-09 14:20:01 +03:00
chenyu
9846435c2e
fix test_div_numerator_negative (#10229)
the simplification was wrong with negative const_factor
2025-05-09 06:19:59 -04:00
chenyu
cba508c8c3
update uop symbolic tests (#10228)
clean up TODOs and update tests
2025-05-09 01:55:53 -04:00
chenyu
56def6c319
better bound for mod negative number (#10227)
2025-05-09 01:19:47 -04:00
chenyu
99f6d89dfb
tighter idiv bound for symbolic denominator (#10226)
2025-05-08 22:38:56 -04:00
uuuvn
82a6160ff7
Detect metal paravirtualization bug via device name instead of CI (#10225)
2025-05-08 19:31:47 -07:00
Xingyu
a21369d039
Enhance tensor random functions with dtype support (#10214)
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py
* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
2025-05-08 20:48:07 -04:00
qazal
b6904bbf83
Revert "split grouper into insert and finalize stages [pr] (#10222)" (#10224)
This reverts commit 2594e4db15.
2025-05-09 03:02:38 +03:00
qazal
2594e4db15
split grouper into insert and finalize stages [pr] (#10222)
2025-05-09 02:36:22 +03:00
George Hotz
0b7e3e86d0
single device copy [pr] (#10221)
* single device copy [pr]
* simpler
2025-05-08 15:23:22 -07:00
qazal
1d0f239df7
use Tensor.train() in schedule test + typo [pr] (#10220)
2025-05-08 23:46:42 +03:00
qazal
ff2aa6d0b2
buffer in create_kernel is optional [pr] (#10218)
* buffer in create_kernel is optional [pr]
* pylint
2025-05-08 22:35:55 +03:00
qazal
40560e77c2
minor grouper + viz fixup [pr] (#10217)
* minor grouper + viz fixup [pr]
* gitignore mypy_cache
* reorder create_kernels
* replace with realized
* use tensor_map + viz before spec
* lint
* add that back
2025-05-08 21:39:44 +03:00
George Hotz
0411b09763
small changes from new multi [pr] (#10213)
2025-05-08 07:04:27 -07:00
Sieds Lykles
a0580e8d3c
Cleanup in div_and_mod_folding [pr] (#10178)
* Refactor binary var simplification
* Simplify the congruence logic
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-08 06:25:32 -07:00
nimlgen
267ba9b592
usbgpu: better names in copy speed benchmark (#10212)
2025-05-08 16:12:37 +03:00
hooved
7b4f05fd00
Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
e24fe1c746
usbgpu: pci cache (#10207)
2025-05-08 14:31:01 +03:00
nimlgen
7d6ed1b1e9
hotfix: mac ci (#10210)
* fixed?
* cmnt
2025-05-08 14:13:23 +03:00
nimlgen
ba52fce4b2
usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark
* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
qazal
d0e3449992
remove view_supported_devices, check allocator instead [pr] (#10209)
2025-05-08 11:45:02 +03:00
nimlgen
5a7f6b4d8e
am: fix launch on rdna4 (#10206)
2025-05-08 09:46:12 +03:00
George Hotz
8d4c563c01
all COPY can be clone (#10205)
* match old behavior
* simple
* it means the naive thing before the multi
* fix
2025-05-07 20:31:39 -07:00
hooved
8e76c40aea
Refactor test: Enable generality in testing UOp alu expressions (#10200)
* use function for infinity instead of uniform
* test infinity math locally
* test infinity math in CI
* make pytest available to MacOS (WebGPU)
* revert to master except failing webgpu test
* isolate test refactor
2025-05-07 19:39:44 -07:00