tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-13 17:08:11 -05:00

Author	SHA1	Message	Date
geohotstan	602a145f8f	Add Tensor.unfold (#10518) * yoinked 10272 * eitanturok's fixes * hmmm should size be sint? * add test	2025-05-26 11:15:44 -04:00
nimlgen	deb369417c	am_smi: print device usage (#10520) * am_smi: print device usage * tiny comments	2025-05-26 17:17:56 +03:00
geohotstan	fd9f236a82	move test over (#10508)	2025-05-25 21:51:51 -04:00
George Hotz	941cbd3471	hotfix: amd works on arch linux w/o rocm	2025-05-24 16:47:13 -07:00
nimlgen	d90ddcc365	nv: blackwell support (#10487) * nv: blackwell support * fixes * hm * h * fixes * mypy * xx * yy * arr * revert * oops * unrelated	2025-05-24 18:23:53 +03:00
chenyu	dc6309242d	WallTimeEvent for mlperf ci (#10506)	2025-05-24 10:56:03 -04:00
Panagiotis Kourouklidis	e21836952d	mmapeak implementation for 7900 XTX (#10417) * Add mmapeak implementation for 7900 XTX * Change identation * Use a template instead of multiple assebly files * Fix output formatting * Reduce register file bank conflicts * More accurate measurement for quick instructions * Add support for gfx1201 * RDNA4 wmma requires less VGRPs * RDNA4 does not have s_cmpk instructions * Add v_wmma_i32_16x16x32_iu4 for gfx1201 * Add sparse wmma instructions --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-05-23 16:26:12 -07:00
George Hotz	0a313d98a0	add rocm 6.4 support (#10491) * add rocm 6.4 support * update to newer amdcomgr, assert lang is right * fix aux-triple	2025-05-23 16:20:54 -07:00
Xingyu	1e0a59aca4	fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend (#10464)	2025-05-22 10:54:13 -07:00
George Hotz	577a0b4cfa	openpilot compile4 (wip) (#10407) * openpilot compile4 * add copies * remove junk	2025-05-22 10:47:34 -07:00
qazal	7720c1aef1	hotfix: remove viz_sz.py [pr] (#10446)	2025-05-21 14:17:42 +03:00
qazal	df4cbb69e9	move fuzz_schedule.py to extra [pr] (#10444)	2025-05-21 10:07:24 +03:00
qazal	8a6fb37560	move viz /prof to extra [pr] (#10401)	2025-05-18 23:25:59 +03:00
George Hotz	411392dfb7	move files into uop dir (#10399) * move files into uop dir [pr] * tinygrad.uop is a thing * fix uop docs, no pr * fix viz	2025-05-18 11:38:28 -07:00
qazal	17f0f5e764	add v_rcp_f32_e64 to remu (#10393) * tests from the box * add v_rcp_f32_e64 to remu * f32::from_bits utils * v_cndmask_b32 tests	2025-05-18 17:08:21 +03:00
Xingyu	286b0f4051	Add equal function implementation and corresponding test (#10351) - Implemented a new function `equal` in the torch backend to compare two tensors for equality. - Added unit tests for the `equal` function to verify its correctness with different tensor inputs.	2025-05-16 23:39:49 -07:00
Ignacio Sica	a54fd745c3	simpler barrier match in remu (#10339) * s_barrier * remove s_barrier from syncs	2025-05-16 14:40:58 +03:00
wozeparrot	1ed04f993b	move benchmark stat tracking to influxdb (#10185)	2025-05-15 16:14:56 -07:00
Ignacio Sica	3c453e96a9	add ds_load_b96 and ds_store_b96 instructions (#10338)	2025-05-15 18:11:08 +03:00
qazal	be8202b293	add s_abs_i32 instruction to remu (#10334)	2025-05-15 16:47:58 +03:00
nimlgen	e00679dc92	am_smi: fix layout with sleep mode (#10300)	2025-05-14 15:44:42 +03:00
nimlgen	0788659d08	usbgpu: fast cold boot (#10260) * usbgpu: fast cold boot * cleaner * assert * xx * compat * fix * fix	2025-05-14 14:58:55 +03:00
geohotstan	1c4ab6b991	ONNX add tests against ORT (#10270) * start * clean up * indicate file location too	2025-05-13 04:03:52 -04:00
nimlgen	bb31cc4582	usbgpu: check hash in patcher (#10266)	2025-05-12 21:08:53 +03:00
George Hotz	8864ff894b	hotfix: that repeat_kv belongs outside the if	2025-05-11 18:43:01 -07:00
George Hotz	98c84a711d	min rectified flow example [pr] (#10252) * work on minrf example * more * jit sample * t is tensor not const * fixes * more convs * fix dropout * don't print * 504 * big patch * onehot * touch * use embeddings * dumb uses final layer * act * non fl * match * tp * 3 * of * ppsz * normal * add adln * no t * weird transformer * weird transformer * contig * actual speed fix * dumb * cb * 0 * t is 0 * mort-t * args * dumb days are over * readable * contig * no more t mask * mask_t * init to zero * clean * steps * work * tt * t * solid	2025-05-11 18:36:44 -07:00
qazal	9210280811	add v_fmac_f16 vop3 instruction to remu (#10247) * fmac vop3 * from the box	2025-05-10 23:48:25 +03:00
nimlgen	116390083f	nvme speed write example (#10230)	2025-05-09 14:20:01 +03:00
Xingyu	a21369d039	Enhance tensor random functions with dtype support (#10214) * Enhance tensor random functions with dtype support - Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py - Added unit tests for uniform and normal tensor generation with specific dtypes in test.py * Refactor test name for clarity - Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py` - Aims to improve readability and better reflect the test's purpose	2025-05-08 20:48:07 -04:00
qazal	4ea3e373aa	decode lds ops in remu (#10184)	2025-05-07 16:44:18 +08:00
Ignacio Sica	74c25bdc8b	add support for `ds_load_u8` in remu (#10180) * add support for ds_load_u8 in remu * add test for ds_load_u8	2025-05-06 20:31:00 +03:00
nimlgen	34d55857cf	usbgpu: more devs in scan_pci (#10171)	2025-05-06 11:55:34 +03:00
nimlgen	30bd6a619f	usb gpu (#8766) * start gpu * progress * fixes * read correct * libusb * libusb works * support asm24 * hmm * one access file * fix extra * start AMBar * works on am * back to usb * patch fw * full fast write into a bar * ugh, minus one gpus, next please * mute libusb for now * usb for asm24 * 63 * hmm * ops * rescan * and gpu shoudl be there * enumerate them? * usbgpu bus 4, 100% reliable (draft) * lil * works * comments * add DEBUG * cleaner * simplest * Revert "simplest" This reverts commit `1d00354c16`. * Revert "cleaner" This reverts commit `c5662de956`. * assert we find gpu * that's simpler * this back * simpler? * correcT * work * nonsense * works with more checks * this works * the 6s in the right place * reliable now * fix after reboot * set config * 1s timeouts * close to fw loading * streams * usbhub works * endpoints * fix * want to test tiny10 * move to tiny 10 * fix gpu * ugly speed * smth * mostly broken, but signals and dmas * do not reset gpu every time * changes to run kernels * ugh, not working * t10 * pg and sc files * some prog * um? * somehow it works * patched for 24 * some tries * minimal * moving * back to working * so sloooooow * move to controller * usb.py rewrite * rework * cleaner 1 * cleaner 2 * cleaner 3 * new abstractions * aft merge * init controller * cleaner 4 * cleaner 5 * patcher + tiny changes * ignore that * cleaner 6 * after rebase * cleaner 7 * bring it back * start linter war * linter 2 * autogen was missing * fix autogen * typing * better? * mypy * extra/legacy rename and cleaner * shuffle * better printing * tiny changes and tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-05-01 18:03:47 +03:00
chenyu	17d4d258ea	simple symbolic slice in llama [pr] (#10112) support slice that has step None and stop > start	2025-04-30 14:36:35 -04:00
nimlgen	fcdda4fc09	am: move boot memory to vram start (#10115)	2025-04-30 19:12:19 +03:00
chenyu	573bbb9746	Revert "remove TransformerBlock contiguous in llama (#10104)" (#10108) This reverts commit `b8d07dcc54`.	2025-04-29 15:28:38 -04:00
chenyu	b8d07dcc54	remove TransformerBlock contiguous in llama (#10104)	2025-04-29 14:15:39 -04:00
qazal	3b67f56c02	kernelize some llama realizes (#10098)	2025-04-29 18:39:56 +08:00
chenyu	3eba3d6ee9	don't pass model in convert_from_huggingface and convert_from_gguf (#10094) it only needs n_layers	2025-04-28 20:11:19 -04:00
George Hotz	690dac79b5	don't modify the ranges on reduce rewrite (#10062) * bug in div range folding * simpler * oh, this is right for indexing, but the div mod folding needs to be fixed * reenable * Passing test_complexity_w_unroll2 (#10068) * Passing * remove non_folded_divs * Add check for negative tern in div folding * Add test * bump that limit * fix casted --------- Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>	2025-04-28 12:01:19 -04:00
qazal	ac37510f60	remu: only write v_cmp result if exec is set (#10084)	2025-04-28 20:31:52 +08:00
qazal	d6b436a815	remu bugfix with -0.0 negation (#10082)	2025-04-28 15:46:42 +08:00
George Hotz	ea5dddc537	reduce collapse generic (#10045) * reduce collapse generic * new arange folder * new range folding * correct with sym * all tests pass * indexing ops passes * failing tests * fix tests, remove unused * revert that * torch indexing is fast * skip on webgpu * touchups * comments	2025-04-26 09:13:24 -04:00
qazal	e1d2b64e92	remu new instructions (#10050) * remu new instructions * test_ds_store_half * test_v_mul_f16	2025-04-26 02:04:12 +03:00
qazal	bba5d0a3e4	remu refactors (#10028) * remu refactors * scc is sgpr 253 * remove that * rename to vcc_lo * run cargo test in CI * llvm-mc * meh * work * work_group work 1 * seeded_lanes is dumb * better than seeded_lanes * does not need to be address * 128 sgpr per wave * scc is sgpr, we don't know which one * null_src once more * derive clone, wave init is cleaner * init comes first	2025-04-26 04:31:10 +08:00
nimlgen	0fc85a2b0a	hcqfuzz: init (#10049) * hcqfuzz: init * fix fuzz * linter * graph * taht test * update readme	2025-04-25 23:19:21 +03:00
chenyu	74c6cf8be3	lint mlperf model_train (#10038)	2025-04-24 16:19:44 -04:00
Nishant Rajadhyaksha	55942a8d8e	[Bounty] moved index_tensor off cpu in torch_backend (#9916) * moved index tensor off cpu in torch_backend * added support for None based indexing * fix_to_pass_tests * fix segfault tests	2025-04-24 14:12:37 -04:00
qazal	0b482fb824	add RDNA3 parser to remu (#10025) * llvm ref * work * all of them * salu * cleaner * start * vector ops * done * replace SMEM * vopd * sop1 * SOPC * null stays null_src * sopp * SOPK * sop2 * vop1 * vop2 * remove allow(dead_code) * vopc	2025-04-24 21:34:07 +08:00
Sieds Lykles	e75be6eafc	[bounty] [pr] index validation with z3 (#9981) * index validation with z3 * Change comment * toposort -> toposort() --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 08:06:08 -04:00

1363 Commits