Commit Graph

1134 Commits

Author SHA1 Message Date
George Hotz
92678e59ee move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
chenyu
3f29c7edda minor onnx dropout cleanup (#10891)
we should consider removing numpy random and testing it similarly to test_randomness, unless how the seed works is part of the spec?
2025-06-20 10:18:34 -04:00
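A minimal sketch of the seed-free testing style suggested in the commit body: instead of pinning outputs to a specific numpy seed, assert statistical properties of the dropout mask. This is only an illustration of the idea, not tinygrad's test code.

```python
import numpy as np

# Assert a distributional property rather than seed-determined exact values:
# a dropout mask with drop probability p should keep roughly (1 - p) of elements.
rng = np.random.default_rng()
p = 0.5                                   # drop probability (illustrative)
mask = rng.random(10000) >= p             # True = kept element
keep_ratio = mask.mean()
assert abs(keep_ratio - (1 - p)) < 0.05   # ~50% kept, well within sampling noise
```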
qazal
000eb30f04 viz: remove prev profiler file (#10888)
The new profiler is integrated in the main VIZ tab.

Will also delete perfetto.html after matching [final features](https://github.com/tinygrad/tinygrad/pull/10763#issuecomment-2980543715) soon.
2025-06-19 23:05:46 +03:00
chenyu
7d5c769c6b fix compile4 (#10797) 2025-06-12 22:28:56 -04:00
geohotstan
806b68c2b3 Add fallback dtype to ONNX (#10788)
* start

* still need the float16 workaround in

* tiny nit for correctness

* idk hacks, I need to understand this device stuff better

* no-op?

* remove that assert for true nooooooop

* add fallback_context
2025-06-12 20:39:21 -04:00
chenyu
5e7ad70aae don't run linearize().uop tests in get_action_space test (#10766)
* don't run linearize().uop tests in get_action_space test

this part takes 2 minutes in CI and has nothing to do with action space. also not sure if the "for some reason" comment is still relevant

* -n=auto test/models
2025-06-10 17:23:53 -04:00
nimlgen
800d1796d5 am_smi: kill process group (#10750) 2025-06-10 15:23:39 +03:00
b1tg
24d328e313 onnx parser (#10435)
* onnx parser

* fix compile, lint

* onnx.load -> onnx_load

* compatible with ModelProto

* fix test external_test_onnx_ops.py

* fix tests

* fix signed int

* reduce to 261 lines

* fix TypeProto.Optional

* debug for _parse_message, add TypeProto.Sequence, cleanup

* onnx_load from Tensor

* remove BufferedReader

* 174 lines and reduce tensor copy

* cleanup

* use onnx_load in external_model_benchmark.py

* fix qcom test

* [onnx] parser support external data

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
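ONNX models are protobuf messages, so any hand-rolled parser like the one this PR adds has to start with base-128 varint decoding. Below is a generic varint decoder as a hedged sketch of that first step; it is not the code from the PR.

```python
def decode_varint(buf: bytes, pos: int = 0):
    """Decode one protobuf varint starting at `pos`; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift   # low 7 bits are payload, little-endian groups
        if not (b & 0x80):              # high bit clear marks the last byte
            return result, pos
        shift += 7

print(decode_varint(bytes([0x96, 0x01])))  # (150, 2), the classic protobuf example
```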
George Hotz
32e9949052 rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
George Hotz
3ece2e4bb5 hotfix: remove accel from extra 2025-06-08 08:20:34 -07:00
geohotstan
dedff0e96c fix run huggingface onnx debug (#10679) 2025-06-08 00:59:20 -04:00
nimlgen
85cea23557 nv: original bw qmd (#10672)
* nv: original bw qmd

* forgot
2025-06-07 01:43:22 +03:00
Sidharth N. Babu
ef14dfb277 compile fixes (#10442) 2025-06-06 18:38:37 -04:00
chenyu
4a6d84c4c3 hotfix llama start_pos vmax is max_context-1 (#10659)
* hotfix llama start_pos vmax is max_context-1

fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`

* hotfix: multitensor transformer test tests kv cache

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
Xingyu
7a1bfb668d Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend (#10612)
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend

* Add unit test for linalg.eigh function in TestTorchBackend

This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
2025-06-04 07:59:50 -04:00
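A sketch of what the added unit test verifies, using numpy's reference `eigh` rather than the torch-backend implementation from the commit: for a symmetric matrix, the eigenvectors and eigenvalues reconstruct the original.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # symmetric 2x2 input
w, v = np.linalg.eigh(A)              # eigenvalues ascending, eigenvectors in columns
recon = v @ np.diag(w) @ v.T          # for symmetric A: A = V diag(w) V^T
assert np.allclose(w, [1.0, 3.0])
assert np.allclose(recon, A)
```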
nimlgen
883bb4541c am: reserve address space (#10564)
* am: reserve address space

* f

* cc

* errno

* fix

* always has cpu mapping
2025-05-30 19:31:03 +03:00
qazal
5b59728c75 refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541)
* changes to core tinygrad

* fixups pt1

TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule

* more tests

* green now

* images stay images
2025-05-30 14:27:58 +03:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
George Hotz
871df1436a more beautiful cifar (#10551)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh

* work on beautiful_cifar

* speed close to hlb_cifar

* schedule to corealize all

* one line sched step

* less lines
2025-05-28 20:48:20 -07:00
nimlgen
d1d9e729fd am_smi: mem usage (#10547) 2025-05-28 16:53:31 +03:00
chenyu
76eb130d8c hotfix: BenchEvent MLPERF_RUN is mlperf_run (#10526) 2025-05-26 20:19:37 -04:00
geohotstan
602a145f8f Add Tensor.unfold (#10518)
* yoinked 10272

* eitanturok's fixes

* hmmm should size be sint?

* add test
2025-05-26 11:15:44 -04:00
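`Tensor.unfold` follows torch-style sliding-window semantics. A hedged numpy illustration of that behavior (not the tinygrad code): windows of length `size` taken every `step` elements along a dimension.

```python
import numpy as np

def unfold(a: np.ndarray, dim: int, size: int, step: int) -> np.ndarray:
    n = (a.shape[dim] - size) // step + 1            # number of windows
    idx = np.arange(n)[:, None] * step + np.arange(size)[None, :]
    return np.take(a, idx, axis=dim)                 # windows become a new axis

print(unfold(np.arange(8), 0, 3, 2))
# [[0 1 2]
#  [2 3 4]
#  [4 5 6]]
```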
nimlgen
deb369417c am_smi: print device usage (#10520)
* am_smi: print device usage

* tiny comments
2025-05-26 17:17:56 +03:00
geohotstan
fd9f236a82 move test over (#10508) 2025-05-25 21:51:51 -04:00
George Hotz
941cbd3471 hotfix: amd works on arch linux w/o rocm 2025-05-24 16:47:13 -07:00
nimlgen
d90ddcc365 nv: blackwell support (#10487)
* nv: blackwell support

* fixes

* hm

* h

* fixes

* mypy

* xx

* yy

* arr

* revert

* oops

* unrelated
2025-05-24 18:23:53 +03:00
chenyu
dc6309242d WallTimeEvent for mlperf ci (#10506) 2025-05-24 10:56:03 -04:00
Panagiotis Kourouklidis
e21836952d mmapeak implementation for 7900 XTX (#10417)
* Add mmapeak implementation for 7900 XTX

* Change indentation

* Use a template instead of multiple assembly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires fewer VGPRs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-23 16:26:12 -07:00
George Hotz
0a313d98a0 add rocm 6.4 support (#10491)
* add rocm 6.4 support

* update to newer amdcomgr, assert lang is right

* fix aux-triple
2025-05-23 16:20:54 -07:00
Xingyu
1e0a59aca4 fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend (#10464) 2025-05-22 10:54:13 -07:00
George Hotz
577a0b4cfa openpilot compile4 (wip) (#10407)
* openpilot compile4

* add copies

* remove junk
2025-05-22 10:47:34 -07:00
qazal
7720c1aef1 hotfix: remove viz_sz.py [pr] (#10446) 2025-05-21 14:17:42 +03:00
qazal
df4cbb69e9 move fuzz_schedule.py to extra [pr] (#10444) 2025-05-21 10:07:24 +03:00
qazal
8a6fb37560 move viz /prof to extra [pr] (#10401) 2025-05-18 23:25:59 +03:00
George Hotz
411392dfb7 move files into uop dir (#10399)
* move files into uop dir [pr]

* tinygrad.uop is a thing

* fix uop docs, no pr

* fix viz
2025-05-18 11:38:28 -07:00
qazal
17f0f5e764 add v_rcp_f32_e64 to remu (#10393)
* tests from the box

* add v_rcp_f32_e64 to remu

* f32::from_bits utils

* v_cndmask_b32 tests
2025-05-18 17:08:21 +03:00
Xingyu
286b0f4051 Add equal function implementation and corresponding test (#10351)
- Implemented a new function `equal` in the torch backend to compare two tensors for equality.
- Added unit tests for the `equal` function to verify its correctness with different tensor inputs.
2025-05-16 23:39:49 -07:00
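A minimal sketch of `torch.equal`-style semantics that the commit implements for the backend: true only when the shapes match exactly and every element agrees. numpy stands in here purely for illustration.

```python
import numpy as np

def equal(a: np.ndarray, b: np.ndarray) -> bool:
    # shape must match exactly before comparing elementwise
    return a.shape == b.shape and bool((a == b).all())

assert equal(np.array([1, 2]), np.array([1, 2]))
assert not equal(np.array([1, 2]), np.array([[1, 2]]))  # same data, different shape
```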
Ignacio Sica
a54fd745c3 simpler barrier match in remu (#10339)
* s_barrier

* remove s_barrier from syncs
2025-05-16 14:40:58 +03:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
Ignacio Sica
3c453e96a9 add ds_load_b96 and ds_store_b96 instructions (#10338) 2025-05-15 18:11:08 +03:00
qazal
be8202b293 add s_abs_i32 instruction to remu (#10334) 2025-05-15 16:47:58 +03:00
nimlgen
e00679dc92 am_smi: fix layout with sleep mode (#10300) 2025-05-14 15:44:42 +03:00
nimlgen
0788659d08 usbgpu: fast cold boot (#10260)
* usbgpu: fast cold boot

* cleaner

* assert

* xx

* compat

* fix

* fix
2025-05-14 14:58:55 +03:00
geohotstan
1c4ab6b991 ONNX add tests against ORT (#10270)
* start

* clean up

* indicate file location too
2025-05-13 04:03:52 -04:00
nimlgen
bb31cc4582 usbgpu: check hash in patcher (#10266) 2025-05-12 21:08:53 +03:00
George Hotz
8864ff894b hotfix: that repeat_kv belongs outside the if 2025-05-11 18:43:01 -07:00
George Hotz
98c84a711d min rectified flow example [pr] (#10252)
* work on minrf example

* more

* jit sample

* t is tensor not const

* fixes

* more convs

* fix dropout

* don't print

* 504

* big patch

* onehot

* touch

* use embeddings

* dumb uses final layer

* act

* non fl

* match

* tp

* 3

* of

* ppsz

* normal

* add adln

* no t

* weird transformer

* weird transformer

* contig

* actual speed fix

* dumb

* cb

* 0

* t is 0

* mort-t

* args

* dumb days are over

* readable

* contig

* no more t mask

* mask_t

* init to zero

* clean

* steps

* work

* tt

* t

* solid
2025-05-11 18:36:44 -07:00
qazal
9210280811 add v_fmac_f16 vop3 instruction to remu (#10247)
* fmac vop3

* from the box
2025-05-10 23:48:25 +03:00
nimlgen
116390083f nvme speed write example (#10230) 2025-05-09 14:20:01 +03:00
Xingyu
a21369d039 Enhance tensor random functions with dtype support (#10214)
* Enhance tensor random functions with dtype support
- Updated `aten.uniform_` and `aten.normal_` to include dtype parameter in backend.py
- Added unit tests for uniform and normal tensor generation with specific dtypes in test.py

* Refactor test name for clarity
- Renamed `test_normal_dtype` to `test_normal` in `extra/torch_backend/test.py`
- Aims to improve readability and better reflect the test's purpose
2025-05-08 20:48:07 -04:00
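A hedged sketch of the behavior under test: random fills that land in a requested dtype. numpy stands in for the `aten.uniform_`/`aten.normal_` paths touched by the commit.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=(2, 3)).astype(np.float16)  # uniform in [0, 1)
n = rng.normal(0.0, 1.0, size=(2, 3)).astype(np.float32)   # standard normal
assert u.dtype == np.float16 and (0 <= u).all() and (u <= 1).all()
assert n.dtype == np.float32
```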