tinygrad

Mirror of https://github.com/tinygrad/tinygrad.git, synced 2026-01-21 04:47:56 -05:00

Author	SHA1	Message	Date
chenyu	49bba2f0a0	improve test_nll_loss (#10986) build target and weight tensors outside so it tests backward too.	2025-06-26 02:46:55 -04:00
nimlgen	1c45b9f7fb	start nvpci (#10521) * start nvpci * talk to fsp * boot args * riscv core bootted * q * agen * got gsp init msg * some fixes * set registry, stuck aft lockdown( * start ga/ad port * gsp init on ada * more classes allocated * more * mm * fixes and progress * no huge pages for now * mm seems workin, but switch to 512mb page for simplicity * working state * not cleaned * claned * nvd=1 * start gr ctx * compute * clean 1 * cleanup 2 * cleanup 3 * cleaner 4 * cleaner 6 * add iface to nv * save before reboot * merged into NV * moveout mm * post merge * cleaner 7 * merge and rebase * pciiface abstraction + reset * download fw from web * print logs * minor changes + p2p * cleaner 8 * cleaner 9 * cleaner 10 * delete * delete this as well * linter 1 * oops * priv_client -> priv_root * fix mypy * mypy? * mypy? * small changes * shorter * ops * remove this * do not allocate paddr for reserve * nodiff * unified script * ops * dif ver * add lock * setup	2025-06-25 00:37:34 +03:00
chenyu	ffb032e31d	test_diagonal touchup (#10962)	2025-06-24 15:51:19 -04:00
Utkarsh Gill	7f9958b632	Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945) * fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend * cleanup * generic fix * tests * cmp with diagonal too * oops * move tests * fix test * remove unnecessary import * fix assert * compare against numpy --------- Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>	2025-06-24 15:36:06 -04:00
chenyu	18e264a449	Tensor.logsigmoid (#10955)	2025-06-24 11:16:14 -04:00
George Hotz	e15754db28	remove (some) kernelize from llama and test schedule speed (#10939) * remove kernelize from llama * 405B * space	2025-06-23 15:07:31 -07:00
alpharush	22f9696522	Fix/hcqfuzz harnesss bug (#10923) * update command so extra module is found * fix empty range in randrange errors * lint	2025-06-23 11:22:30 +03:00
geohotstan	4ab7d792cc	ONNX improve dtype fallback (#10800) * fix * add early verbose demo test * is this how to write tests :s * is definition drift even a thing? gemini says it is * clean up * better * even better * try add to CI * doesn't work quite yet * much more work to be done * whoops * partition the test heh * skipif * some nits for better names * add webgpu test for onnxrunner * fix reference links * flush for now	2025-06-21 19:29:45 -04:00
George Hotz	92678e59ee	move kernel to opt (#10899)	2025-06-20 15:22:28 -07:00
chenyu	3f29c7edda	minor onnx dropout cleanup (#10891) we should consider removing numpy random and test it similar to test_randomness, unless how seed works is part of spec?	2025-06-20 10:18:34 -04:00
qazal	000eb30f04	viz: remove prev profiler file (#10888) The new profiler is integrated in the main VIZ tab. Will also delete perfetto.html after matching [final features](https://github.com/tinygrad/tinygrad/pull/10763#issuecomment-2980543715) soon.	2025-06-19 23:05:46 +03:00
chenyu	7d5c769c6b	fix compile4 (#10797)	2025-06-12 22:28:56 -04:00
geohotstan	806b68c2b3	Add fallback dtype to ONNX (#10788) * start * still need the float16 workaround in * tiny nit for correctness * idk hacks, I need to understand this device stuff better * no-op? * remove that assert for true nooooooop * add fallback_context	2025-06-12 20:39:21 -04:00
chenyu	5e7ad70aae	don't run linearize().uop tests in get_action_space test (#10766) * don't run linearize().uop tests in get_action_space test this part takes 2 minutes in CI and has nothing to do with action space. also not sure if the "for some reason" comment is still relevant * -n=auto test/models	2025-06-10 17:23:53 -04:00
nimlgen	800d1796d5	am_smi: kill process group (#10750)	2025-06-10 15:23:39 +03:00
b1tg	24d328e313	onnx parser (#10435) * onnx parser * fix compile, lint * onnx.load -> onnx_load * compatible with ModelProto * fix test external_test_onnx_ops.py * fix tests * fix signed int * reduce to 261 lines * fix TypeProto.Optional * debug for _parse_message, add TypeProto.Sequence, cleanup * onnx_load from Tensor * remove BufferedReader * 174 lines and reduce tensor copy * cleanup * use onnx_load in external_model_benchmark.py * fix qcom test * [onnx] parser support external data --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-06-09 12:44:28 -04:00
George Hotz	32e9949052	rename lazydata to uop (#10698)	2025-06-08 08:42:22 -07:00
George Hotz	3ece2e4bb5	hotfix: remove accel from extra	2025-06-08 08:20:34 -07:00
geohotstan	dedff0e96c	fix run huggingface onnx debug (#10679)	2025-06-08 00:59:20 -04:00
nimlgen	85cea23557	nv: original bw qmd (#10672) * nv: original bw qmd * forgot	2025-06-07 01:43:22 +03:00
Sidharth N. Babu	ef14dfb277	compile fixes (#10442)	2025-06-06 18:38:37 -04:00
chenyu	4a6d84c4c3	hotfix llama start_pos vmax is max_context-1 (#10659) * hotfix llama start_pos vmax is max_context-1 fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0` * hotfix: multitensor transformer test tests kv cache --------- Co-authored-by: George Hotz <geohot@gmail.com>	2025-06-06 00:41:25 -04:00
Xingyu	7a1bfb668d	Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend (#10612) * Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend * Add unit test for linalg.eigh function in TestTorchBackend This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.	2025-06-04 07:59:50 -04:00
nimlgen	883bb4541c	am: reserve address space (#10564) * am: reserve address space * f * cc * errno * fix * always has cpu mapping	2025-05-30 19:31:03 +03:00
qazal	5b59728c75	refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541) * changes to core tinygrad * fixups pt1 TC=3 docs/abstractions2.py IMAGE=2 test_quantize_dsp test_schedule * more tests * green now * images stay images	2025-05-30 14:27:58 +03:00
George Hotz	b3b43a82c4	remove Tensor.no_grad, it's meaningless now [pr] (#10556)	2025-05-28 22:20:02 -07:00
George Hotz	871df1436a	more beautiful cifar (#10551) * enumerate cases of Tensors in the JIT * optional fused optimizers * add fused optimizer test * move that there * ugh * work on beautiful_cifar * speed close to hlb_cifar * schedule to corealize all * one line sched step * less lines	2025-05-28 20:48:20 -07:00
nimlgen	d1d9e729fd	am_smi: mem usage (#10547)	2025-05-28 16:53:31 +03:00
chenyu	76eb130d8c	hotfix: BenchEvent MLPERF_RUN is mlperf_run (#10526)	2025-05-26 20:19:37 -04:00
geohotstan	602a145f8f	Add Tensor.unfold (#10518) * yoinked 10272 * eitanturok's fixes * hmmm should size be sint? * add test	2025-05-26 11:15:44 -04:00
nimlgen	deb369417c	am_smi: print device usage (#10520) * am_smi: print device usage * tiny comments	2025-05-26 17:17:56 +03:00
geohotstan	fd9f236a82	move test over (#10508)	2025-05-25 21:51:51 -04:00
George Hotz	941cbd3471	hotfix: amd works on arch linux w/o rocm	2025-05-24 16:47:13 -07:00
nimlgen	d90ddcc365	nv: blackwell support (#10487) * nv: blackwell support * fixes * hm * h * fixes * mypy * xx * yy * arr * revert * oops * unrelated	2025-05-24 18:23:53 +03:00
chenyu	dc6309242d	WallTimeEvent for mlperf ci (#10506)	2025-05-24 10:56:03 -04:00
Panagiotis Kourouklidis	e21836952d	mmapeak implementation for 7900 XTX (#10417) * Add mmapeak implementation for 7900 XTX * Change identation * Use a template instead of multiple assebly files * Fix output formatting * Reduce register file bank conflicts * More accurate measurement for quick instructions * Add support for gfx1201 * RDNA4 wmma requires less VGRPs * RDNA4 does not have s_cmpk instructions * Add v_wmma_i32_16x16x32_iu4 for gfx1201 * Add sparse wmma instructions --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-05-23 16:26:12 -07:00
George Hotz	0a313d98a0	add rocm 6.4 support (#10491) * add rocm 6.4 support * update to newer amdcomgr, assert lang is right * fix aux-triple	2025-05-23 16:20:54 -07:00
Xingyu	1e0a59aca4	fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend (#10464)	2025-05-22 10:54:13 -07:00
George Hotz	577a0b4cfa	openpilot compile4 (wip) (#10407) * openpilot compile4 * add copies * remove junk	2025-05-22 10:47:34 -07:00
qazal	7720c1aef1	hotfix: remove viz_sz.py [pr] (#10446)	2025-05-21 14:17:42 +03:00
qazal	df4cbb69e9	move fuzz_schedule.py to extra [pr] (#10444)	2025-05-21 10:07:24 +03:00
qazal	8a6fb37560	move viz /prof to extra [pr] (#10401)	2025-05-18 23:25:59 +03:00
George Hotz	411392dfb7	move files into uop dir (#10399) * move files into uop dir [pr] * tinygrad.uop is a thing * fix uop docs, no pr * fix viz	2025-05-18 11:38:28 -07:00
qazal	17f0f5e764	add v_rcp_f32_e64 to remu (#10393) * tests from the box * add v_rcp_f32_e64 to remu * f32::from_bits utils * v_cndmask_b32 tests	2025-05-18 17:08:21 +03:00
Xingyu	286b0f4051	Add equal function implementation and corresponding test (#10351) - Implemented a new function `equal` in the torch backend to compare two tensors for equality. - Added unit tests for the `equal` function to verify its correctness with different tensor inputs.	2025-05-16 23:39:49 -07:00
Ignacio Sica	a54fd745c3	simpler barrier match in remu (#10339) * s_barrier * remove s_barrier from syncs	2025-05-16 14:40:58 +03:00
wozeparrot	1ed04f993b	move benchmark stat tracking to influxdb (#10185)	2025-05-15 16:14:56 -07:00
Ignacio Sica	3c453e96a9	add ds_load_b96 and ds_store_b96 instructions (#10338)	2025-05-15 18:11:08 +03:00
qazal	be8202b293	add s_abs_i32 instruction to remu (#10334)	2025-05-15 16:47:58 +03:00
nimlgen	e00679dc92	am_smi: fix layout with sleep mode (#10300)	2025-05-14 15:44:42 +03:00

1242 Commits