Commit Graph

1242 Commits

Author SHA1 Message Date
nimlgen
0788659d08 usbgpu: fast cold boot (#10260)
* usbgpu: fast cold boot

* cleaner

* assert

* xx

* compat

* fix

* fix
2025-05-14 14:58:55 +03:00
geohotstan
1c4ab6b991 ONNX add tests against ORT (#10270)
* start

* clean up

* indicate file location too
2025-05-13 04:03:52 -04:00
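The entry above adds ONNX tests that compare outputs against onnxruntime (ORT). A hedged sketch of that comparison pattern, assuming a local `model.onnx` with a single float32 input named `x` of shape (1, 3, 224, 224); the `run_onnx_with_tinygrad` helper mentioned in the comments is hypothetical and stands in for the implementation under test:

```python
import numpy as np
import onnxruntime as ort

# reference run through onnxruntime; the model path and the input name "x" are assumptions
sess = ort.InferenceSession("model.onnx")
x = np.random.randn(1, 3, 224, 224).astype(np.float32)
ref_outputs = sess.run(None, {"x": x})

# the runtime under test produces its own outputs for the same feed, e.g.
# test_outputs = run_onnx_with_tinygrad("model.onnx", {"x": x})   # hypothetical helper
# and each output is then checked against the ORT reference within a tolerance:
# for got, ref in zip(test_outputs, ref_outputs):
#   np.testing.assert_allclose(got, ref, rtol=1e-3, atol=1e-4)
```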
nimlgen
bb31cc4582 usbgpu: check hash in patcher (#10266) 2025-05-12 21:08:53 +03:00
George Hotz
8864ff894b hotfix: that repeat_kv belongs outside the if 2025-05-11 18:43:01 -07:00
George Hotz
98c84a711d min rectified flow example [pr] (#10252)
* work on minrf example

* more

* jit sample

* t is tensor not const

* fixes

* more convs

* fix dropout

* don't print

* 504

* big patch

* onehot

* touch

* use embeddings

* dumb uses final layer

* act

* non fl

* match

* tp

* 3

* of

* ppsz

* normal

* add adln

* no t

* weird transformer

* weird transformer

* contig

* actual speed fix

* dumb

* cb

* 0

* t is 0

* mort-t

* args

* dumb days are over

* readable

* contig

* no more t mask

* mask_t

* init to zero

* clean

* steps

* work

* tt

* t

* solid
2025-05-11 18:36:44 -07:00
qazal
9210280811 add v_fmac_f16 vop3 instruction to remu (#10247)
* fmac vop3

* from the box
2025-05-10 23:48:25 +03:00
nimlgen
116390083f nvme speed write example (#10230) 2025-05-09 14:20:01 +03:00
Xingyu
a21369d039 Enhance tensor random functions with dtype support (#10214)
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py

* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
2025-05-08 20:48:07 -04:00
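A rough sketch of what the change above enables, assuming the tinygrad torch backend is importable as `extra.torch_backend.backend` and registers a `"tiny"` device (both assumptions here): in-place `uniform_`/`normal_` on tensors whose dtype is not the default float32.

```python
import torch
import extra.torch_backend.backend  # noqa: F401  # assumption: this import registers the "tiny" device

# uniform_/normal_ now respect the tensor's dtype rather than assuming the default float32
x = torch.empty(4, 4, dtype=torch.float16, device="tiny").uniform_(0.0, 1.0)
y = torch.empty(4, 4, dtype=torch.float16, device="tiny").normal_(mean=0.0, std=1.0)
print(x.dtype, y.dtype)  # torch.float16 torch.float16
```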
qazal
4ea3e373aa decode lds ops in remu (#10184) 2025-05-07 16:44:18 +08:00
Ignacio Sica
74c25bdc8b add support for ds_load_u8 in remu (#10180)
* add support for ds_load_u8 in remu

* add test for ds_load_u8
2025-05-06 20:31:00 +03:00
nimlgen
34d55857cf usbgpu: more devs in scan_pci (#10171) 2025-05-06 11:55:34 +03:00
nimlgen
30bd6a619f usb gpu (#8766)
* start gpu

* progress

* fixes

* read correct

* libusb

* libusb works

* support asm24

* hmm

* one access file

* fix extra

* start AMBar

* works on am

* back to usb

* patch fw

* full fast write into a bar

* ugh, minus one gpus, next please

* mute libusb for now

* usb for asm24

* 63

* hmm

* ops

* rescan

* and gpu should be there

* enumerate them?

* usbgpu bus 4, 100% reliable (draft)

* lil

* works

* comments

* add DEBUG

* cleaner

* simplest

* Revert "simplest"

This reverts commit 1d00354c16.

* Revert "cleaner"

This reverts commit c5662de956.

* assert we find gpu

* that's simpler

* this back

* simpler?

* correct

* work

* nonsense

* works with more checks

* this works

* the 6s in the right place

* reliable now

* fix after reboot

* set config

* 1s timeouts

* close to fw loading

* streams

* usbhub works

* endpoints

* fix

* want to test tiny10

* move to tiny 10

* fix gpu

* ugly speed

* smth

* mostly broken, but signals and dmas

* do not reset gpu every time

* changes to run kernels

* ugh, not working

* t10

* pg and sc files

* some prog

* um?

* somehow it works

* patched for 24

* some tries

* minimal

* moving

* back to working

* so sloooooow

* move to controller

* usb.py rewrite

* rework

* cleaner 1

* cleaner 2

* cleaner 3

* new abstractions

* aft merge

* init controller

* cleaner 4

* cleaner 5

* patcher + tiny changes

* ignore that

* cleaner 6

* after rebase

* cleaner 7

* bring it back

* start linter war

* linter 2

* autogen was missing

* fix autogen

* typing

* better?

* mypy

* extra/legacy rename and cleaner

* shuffle

* better printing

* tiny changes and tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-01 18:03:47 +03:00
chenyu
17d4d258ea simple symbolic slice in llama [pr] (#10112)
support slices that have step None and stop > start
2025-04-30 14:36:35 -04:00
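A minimal sketch of the kind of slice the entry above enables, assuming `Variable` is exported from the top-level `tinygrad` package and is used the way the llama kv-cache uses it (both assumptions; the shapes are made up):

```python
from tinygrad import Tensor, Variable

# a kv-cache-like buffer and a symbolic position, bound to a concrete value for this step
cache = Tensor.zeros(1, 128, 8, 64).contiguous().realize()
start_pos = Variable("start_pos", 1, 128).bind(32)

# slice with step None and stop > start: the stop is symbolic, so the same kernel
# can be reused for every value of start_pos instead of recompiling per step
window = cache[:, 0:start_pos]
print(window.shape)  # second dim is the symbolic start_pos, bound here to 32
```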
nimlgen
fcdda4fc09 am: move boot memory to vram start (#10115) 2025-04-30 19:12:19 +03:00
chenyu
573bbb9746 Revert "remove TransformerBlock contiguous in llama (#10104)" (#10108)
This reverts commit b8d07dcc54.
2025-04-29 15:28:38 -04:00
chenyu
b8d07dcc54 remove TransformerBlock contiguous in llama (#10104) 2025-04-29 14:15:39 -04:00
qazal
3b67f56c02 kernelize some llama realizes (#10098) 2025-04-29 18:39:56 +08:00
chenyu
3eba3d6ee9 don't pass model in convert_from_huggingface and convert_from_gguf (#10094)
it only needs n_layers
2025-04-28 20:11:19 -04:00
George Hotz
690dac79b5 don't modify the ranges on reduce rewrite (#10062)
* bug in div range folding

* simpler

* oh, this is right for indexing, but the div mod folding needs to be fixed

* reenable

* Passing test_complexity_w_unroll2 (#10068)

* Passing

* remove non_folded_divs

* Add check for negative term in div folding

* Add test

* bump that limit

* fix casted

---------

Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-04-28 12:01:19 -04:00
qazal
ac37510f60 remu: only write v_cmp result if exec is set (#10084) 2025-04-28 20:31:52 +08:00
qazal
d6b436a815 remu bugfix with -0.0 negation (#10082) 2025-04-28 15:46:42 +08:00
George Hotz
ea5dddc537 reduce collapse generic (#10045)
* reduce collapse generic

* new arange folder

* new range folding

* correct with sym

* all tests pass

* indexing ops passes

* failing tests

* fix tests, remove unused

* revert that

* torch indexing is fast

* skip on webgpu

* touchups

* comments
2025-04-26 09:13:24 -04:00
qazal
e1d2b64e92 remu new instructions (#10050)
* remu new instructions

* test_ds_store_half

* test_v_mul_f16
2025-04-26 02:04:12 +03:00
qazal
bba5d0a3e4 remu refactors (#10028)
* remu refactors

* scc is sgpr 253

* remove that

* rename to vcc_lo

* run cargo test in CI

* llvm-mc

* meh

* work

* work_group work 1

* seeded_lanes is dumb

* better than seeded_lanes

* does not need to be address

* 128 sgpr per wave

* scc is sgpr, we don't know which one

* null_src once more

* derive clone, wave init is cleaner

* init comes first
2025-04-26 04:31:10 +08:00
nimlgen
0fc85a2b0a hcqfuzz: init (#10049)
* hcqfuzz: init

* fix fuzz

* linter

* graph

* that test

* update readme
2025-04-25 23:19:21 +03:00
chenyu
74c6cf8be3 lint mlperf model_train (#10038) 2025-04-24 16:19:44 -04:00
Nishant Rajadhyaksha
55942a8d8e [Bounty] moved index_tensor off cpu in torch_backend (#9916)
* moved index tensor off cpu in torch_backend

* added support for None based indexing

* fix_to_pass_tests

* fix segfault tests
2025-04-24 14:12:37 -04:00
qazal
0b482fb824 add RDNA3 parser to remu (#10025)
* llvm ref

* work

* all of them

* salu

* cleaner

* start

* vector ops

* done

* replace SMEM

* vopd

* sop1

* SOPC

* null stays null_src

* sopp

* SOPK

* sop2

* vop1

* vop2

* remove allow(dead_code)

* vopc
2025-04-24 21:34:07 +08:00
Sieds Lykles
e75be6eafc [bounty] [pr] index validation with z3 (#9981)
* index validation with z3

* Change comment

* toposort -> toposort()

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
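The bounty above validates indices with z3. A hedged sketch of the underlying idea (the variable names and bounds below are made up for illustration): encode the loop variable's range, negate the in-bounds condition, and ask the solver whether any counterexample exists.

```python
import z3

ridx = z3.Int("ridx0")                       # loop index of a generated kernel
solver = z3.Solver()
solver.add(0 <= ridx, ridx < 32)             # range guaranteed for the index

# access pattern ridx*4 .. ridx*4+3 into a 128-element buffer; negate the safety condition
in_bounds = z3.And(0 <= ridx * 4, ridx * 4 + 3 < 128)
solver.add(z3.Not(in_bounds))

print(solver.check())  # unsat: no value of ridx0 can take the access out of bounds
```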
Park Jun
c3ad7b2a84 create randperm and support pytorch backend (#10019) 2025-04-24 07:29:02 -04:00
Matthew Daiter
b545338e59 isin_Tensor_out added (#10018) 2025-04-24 07:26:51 -04:00
nimlgen
1c5e353249 am: use mmio iface (#10012)
* am: use mmio iface

* linters

* fixes

* fixes + cleanups

* mute

* mypy

* style
2025-04-24 00:27:04 +03:00
Francis Lata
defa1e77f6 get the proper dataset count (#9962) 2025-04-21 12:11:37 -04:00
Francis Lata
d7e247f329 RetinaNet INITMLPERF support (#9950)
* fixes to make fake data work

* fix eval beam

* fix merge issue
2025-04-21 10:32:05 -04:00
akhuntsaria
2d423e6737 fix assertion message for supported device in export_model (#9957) 2025-04-21 09:23:44 -04:00
qazal
e20ef7196a Tensor.kernelize (#9845)
* add kernelize

* remove that

* kernelize returns self

* update abstractions2.py

* kernelize in test_schedule

* temp: assert BUFFER_VIEW's existence

* ASSIGN must have a buffer or subbuffer target

* assert and shrink

* fix

* padded setitem

* var

* toposort once

* extra

* base_buffer

* end with BUFFER_VIEW

* setitem for disk

* test_setitem_becomes_subbuffer

* mul slice test

* torch backend fix 1

* non-deterministic

* keep subbuffer
2025-04-20 20:53:49 +08:00
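A minimal sketch of the API added above, assuming the usual top-level import; per the entry, `kernelize` groups the pending lazy graph into kernels without executing it and returns the tensor itself, so calls can be chained.

```python
from tinygrad import Tensor

a, b = Tensor.rand(16), Tensor.rand(16)
out = ((a + b) * 2).kernelize()   # groups the pending graph into kernels; returns out itself

# realization later reuses the kernelized graph
print(out.numpy())
```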
chenyu
6c30948df6 hand_coded_optimizations returns list[Opt] [pr] (#9938)
new API looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
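A rough sketch of the API quoted above; the import paths follow the `heuristic.py` move described further down this log, but treat them as assumptions.

```python
from tinygrad import Tensor
from tinygrad.codegen.kernel import Kernel
from tinygrad.codegen.heuristic import hand_coded_optimizations

# build a kernel from the last scheduled item of a small matmul
out = Tensor.rand(64, 64) @ Tensor.rand(64, 64)
si = out.schedule()[-1]
k = Kernel(si.ast)

# hand_coded_optimizations now returns a list[Opt] that is applied explicitly
k.apply_opts(hand_coded_optimizations(k))
```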
chenyu
720f20865b remove required_optimizations (#9848) 2025-04-19 16:51:16 -04:00
qazal
16dfe0a902 upstream remu (#9921) 2025-04-18 01:57:36 +03:00
chenyu
f5256e0020 Kernel.apply_opts [pr] (#9917)
* Kernel.apply_opts [pr]

updated all `for opt in` loops. also updated a few test_linearizer tests to not implicitly depend on hand_coded_optimizations

* not you yet
2025-04-17 08:00:56 -04:00
Xingyu
047c8fd70d Add amax support to Tensor operations in Torch Backend (#9905)
* Add amax support to Tensor operations
- Implemented amax function in backend.py for tensor max operations.
- Added unit tests for amax in test.py to ensure correct functionality.

* Fix formatting in amax output function
- Adjusted spacing in the amax output lambda function in backend.py
- Improved code readability for better maintenance
2025-04-16 10:35:50 +01:00
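A short sketch of the op the entry above wires up, under the same assumptions as the earlier dtype sketch (the backend is importable as `extra.torch_backend.backend` and exposes a `"tiny"` device):

```python
import torch
import extra.torch_backend.backend  # noqa: F401  # assumption: registers the "tiny" device

x = torch.arange(12.0).reshape(3, 4).to("tiny")
print(torch.amax(x, dim=1))   # per-row maxima: [3., 7., 11.]
print(torch.amax(x))          # global maximum: 11.
```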
geohotstan
4e8f25109a Revert "ONNX add output shape validation (#9720)" (#9904)
This reverts commit ac713e04db.
2025-04-16 03:15:56 -04:00
nimlgen
7c466c24f7 am_smi: refactor to support arches (#9864)
* am_smi: refactor to support arches

* shorter
2025-04-12 20:37:01 +03:00
chenyu
8c6299bced move hand_coded_optimizations to heuristic.py [pr] (#9844)
* move hand_coded_optimizations to heuristic.py [pr]

also folded all long lines

* make a copy and rename self -> k

* fix test
2025-04-10 23:40:16 -04:00
Francis Lata
eb2e59db42 RetinaNet model type annotations and loss functions (#9822)
* add type annotations and loss functions for training

* combine sum of multiple dims inside loss functions
2025-04-10 00:31:37 -04:00
Francis Lata
7bb36d71b2 remove openimages iterate (#9820) 2025-04-09 22:54:12 -04:00
chenyu
c5db5b83b9 add SHOULD_USE_TC=1 check to simple_matmul (#9802)
* add SHOULD_USE_TC=1 check to simple_matmul

also zero-centered the random input and updated atol for tf32

* ATOL=2e-2 for HALF
2025-04-09 02:24:42 -04:00
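A hedged sketch of how the check added above might be exercised; the script path `extra/gemm/simple_matmul.py` is an assumption, while `SHOULD_USE_TC=1` and `HALF` come from this entry.

```python
import os, subprocess

# SHOULD_USE_TC=1 makes the matmul script verify that tensor cores were actually applied
env = {**os.environ, "SHOULD_USE_TC": "1", "HALF": "1"}
subprocess.run(["python3", "extra/gemm/simple_matmul.py"], env=env, check=True)
```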
George Hotz
78caf55154 Revert "FP8 support on NVIDIA (#8631)"
This reverts commit 2c8e4ea865.
2025-04-09 12:27:41 +08:00
George Hotz
14928fecff Revert "fix TF32 tensor core dropped in tc_sm89 (#9798)"
This reverts commit 7c9a96824f.
2025-04-09 12:27:39 +08:00
chenyu
7c9a96824f fix TF32 tensor core dropped in tc_sm89 (#9798)
also add `SHOULD_USE_TC=1` to verify TC is applied in simple_matmul
2025-04-08 23:20:50 -04:00