tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-20 20:38:03 -05:00

Author	SHA1	Message	Date
George Hotz	be53ef4f0a	rename DEFINE_ACC -> DEFINE_REG (#11006 ) * rename DEFINE_ACC -> DEFINE_REG * add CMPEQ to groupops	2025-06-27 11:09:25 -07:00
George Hotz	b4eb876d5a	kernel.py no longer permutes reduce axis [pr] (#10968 ) * kernel.py no longer permutes reduce axis [pr] * delete tests that handcode uops * regen of sops is broken... * put import back * just remove that * disable those tests	2025-06-26 17:44:58 -07:00
George Hotz	856759c79c	add halide example (#10980 ) * add halide example * upd halide gemm * partial works * touchups	2025-06-26 16:14:57 -07:00
qazal	1127302c46	move perfetto to extra (#10994 ) * move perfetto to extra * update TestViz and fix tests * remove perfetto.html from viz directory * work * mypy	2025-06-27 01:53:54 +03:00
qazal	712980e167	fix extract_dataset + add tests to CI (#10995 ) * fix extract_dataset + tests * add CI * sops.gz itself is same as master * yml + gzip -c + ge * don't commit that * bump limit to 1000 * axis=7 * test_tiny	2025-06-27 01:51:36 +03:00
geohotstan	50936b4a18	ONNX real float16 (#10694 ) * squash commits * temp fix for const tensor * actually realizing float16 can only happen in raw_data * .float -> cast(float) to rerun CI --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-06-26 14:05:12 -04:00
chenyu	49bba2f0a0	improve test_nll_loss (#10986 ) build target and weight tensors outside so it tests backward too.	2025-06-26 02:46:55 -04:00
nimlgen	1c45b9f7fb	start nvpci (#10521 ) * start nvpci * talk to fsp * boot args * riscv core bootted * q * agen * got gsp init msg * some fixes * set registry, stuck aft lockdown( * start ga/ad port * gsp init on ada * more classes allocated * more * mm * fixes and progress * no huge pages for now * mm seems workin, but switch to 512mb page for simplicity * working state * not cleaned * claned * nvd=1 * start gr ctx * compute * clean 1 * cleanup 2 * cleanup 3 * cleaner 4 * cleaner 6 * add iface to nv * save before reboot * merged into NV * moveout mm * post merge * cleaner 7 * merge and rebase * pciiface abstraction + reset * download fw from web * print logs * minor changes + p2p * cleaner 8 * cleaner 9 * cleaner 10 * delete * delete this as well * linter 1 * oops * priv_client -> priv_root * fix mypy * mypy? * mypy? * small changes * shorter * ops * remove this * do not allocate paddr for reserve * nodiff * unified script * ops * dif ver * add lock * setup	2025-06-25 00:37:34 +03:00
chenyu	ffb032e31d	test_diagonal touchup (#10962 )	2025-06-24 15:51:19 -04:00
Utkarsh Gill	7f9958b632	Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945 ) * fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend * cleanup * generic fix * tests * cmp with diagonal too * oops * move tests * fix test * remove unnecessary import * fix assert * compare against numpy --------- Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>	2025-06-24 15:36:06 -04:00
chenyu	18e264a449	Tensor.logsigmoid (#10955 )	2025-06-24 11:16:14 -04:00
George Hotz	e15754db28	remove (some) kernelize from llama and test schedule speed (#10939 ) * remove kernelize from llama * 405B * space	2025-06-23 15:07:31 -07:00
alpharush	22f9696522	Fix/hcqfuzz harnesss bug (#10923 ) * update command so extra module is found * fix empty range in randrange errors * lint	2025-06-23 11:22:30 +03:00
geohotstan	4ab7d792cc	ONNX improve dtype fallback (#10800 ) * fix * add early verbose demo test * is this how to write tests :s * is definition drift even a thing? gemini says it is * clean up * better * even better * try add to CI * doesn't work quite yet * much more work to be done * whoops * partition the test heh * skipif * some nits for better names * add webgpu test for onnxrunner * fix reference links * flush for now	2025-06-21 19:29:45 -04:00
George Hotz	92678e59ee	move kernel to opt (#10899 )	2025-06-20 15:22:28 -07:00
chenyu	3f29c7edda	minor onnx dropout cleanup (#10891 ) we should consider removing numpy random and test it similar to test_randomness, unless how seed works is part of spec?	2025-06-20 10:18:34 -04:00
qazal	000eb30f04	viz: remove prev profiler file (#10888 ) The new profiler is integrated in the main VIZ tab. Will also delete perfetto.html after matching [final features](https://github.com/tinygrad/tinygrad/pull/10763#issuecomment-2980543715) soon.	2025-06-19 23:05:46 +03:00
chenyu	7d5c769c6b	fix compile4 (#10797 )	2025-06-12 22:28:56 -04:00
geohotstan	806b68c2b3	Add fallback dtype to ONNX (#10788 ) * start * still need the float16 workaround in * tiny nit for correctness * idk hacks, I need to understand this device stuff better * no-op? * remove that assert for true nooooooop * add fallback_context	2025-06-12 20:39:21 -04:00
chenyu	5e7ad70aae	don't run linearize().uop tests in get_action_space test (#10766 ) * don't run linearize().uop tests in get_action_space test this part takes 2 minutes in CI and has nothing to do with action space. also not sure if the "for some reason" comment is still relevant * -n=auto test/models	2025-06-10 17:23:53 -04:00
nimlgen	800d1796d5	am_smi: kill process group (#10750 )	2025-06-10 15:23:39 +03:00
b1tg	24d328e313	onnx parser (#10435 ) * onnx parser * fix compile, lint * onnx.load -> onnx_load * compatible with ModelProto * fix test external_test_onnx_ops.py * fix tests * fix signed int * reduce to 261 lines * fix TypeProto.Optional * debug for _parse_message, add TypeProto.Sequence, cleanup * onnx_load from Tensor * remove BufferedReader * 174 lines and reduce tensor copy * cleanup * use onnx_load in external_model_benchmark.py * fix qcom test * [onnx] parser support external data --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-06-09 12:44:28 -04:00
George Hotz	32e9949052	rename lazydata to uop (#10698 )	2025-06-08 08:42:22 -07:00
George Hotz	3ece2e4bb5	hotfix: remove accel from extra	2025-06-08 08:20:34 -07:00
geohotstan	dedff0e96c	fix run huggingface onnx debug (#10679 )	2025-06-08 00:59:20 -04:00
nimlgen	85cea23557	nv: original bw qmd (#10672 ) * nv: original bw qmd * forgot	2025-06-07 01:43:22 +03:00
Sidharth N. Babu	ef14dfb277	compile fixes (#10442 )	2025-06-06 18:38:37 -04:00
chenyu	4a6d84c4c3	hotfix llama start_pos vmax is max_context-1 (#10659 ) * hotfix llama start_pos vmax is max_context-1 fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0` * hotfix: multitensor transformer test tests kv cache --------- Co-authored-by: George Hotz <geohot@gmail.com>	2025-06-06 00:41:25 -04:00
Xingyu	7a1bfb668d	Implement linalg_eigh function for tensor eigenvalue decomposition in torch backend (#10612 ) * Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend * Add unit test for linalg.eigh function in TestTorchBackend This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.	2025-06-04 07:59:50 -04:00
nimlgen	883bb4541c	am: reserve address space (#10564 ) * am: reserve address space * f * cc * errno * fix * always has cpu mapping	2025-05-30 19:31:03 +03:00
qazal	5b59728c75	refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541 ) * changes to core tinygrad * fixups pt1 TC=3 docs/abstractions2.py IMAGE=2 test_quantize_dsp test_schedule * more tests * green now * images stay images	2025-05-30 14:27:58 +03:00
George Hotz	b3b43a82c4	remove Tensor.no_grad, it's meaningless now [pr] (#10556 )	2025-05-28 22:20:02 -07:00
George Hotz	871df1436a	more beautiful cifar (#10551 ) * enumerate cases of Tensors in the JIT * optional fused optimizers * add fused optimizer test * move that there * ugh * work on beautiful_cifar * speed close to hlb_cifar * schedule to corealize all * one line sched step * less lines	2025-05-28 20:48:20 -07:00
nimlgen	d1d9e729fd	am_smi: mem usage (#10547 )	2025-05-28 16:53:31 +03:00
chenyu	76eb130d8c	hotfix: BenchEvent MLPERF_RUN is mlperf_run (#10526 )	2025-05-26 20:19:37 -04:00
geohotstan	602a145f8f	Add Tensor.unfold (#10518 ) * yoinked 10272 * eitanturok's fixes * hmmm should size be sint? * add test	2025-05-26 11:15:44 -04:00
nimlgen	deb369417c	am_smi: print device usage (#10520 ) * am_smi: print device usage * tiny comments	2025-05-26 17:17:56 +03:00
geohotstan	fd9f236a82	move test over (#10508 )	2025-05-25 21:51:51 -04:00
George Hotz	941cbd3471	hotfix: amd works on arch linux w/o rocm	2025-05-24 16:47:13 -07:00
nimlgen	d90ddcc365	nv: blackwell support (#10487 ) * nv: blackwell support * fixes * hm * h * fixes * mypy * xx * yy * arr * revert * oops * unrelated	2025-05-24 18:23:53 +03:00
chenyu	dc6309242d	WallTimeEvent for mlperf ci (#10506 )	2025-05-24 10:56:03 -04:00
Panagiotis Kourouklidis	e21836952d	mmapeak implementation for 7900 XTX (#10417 ) * Add mmapeak implementation for 7900 XTX * Change identation * Use a template instead of multiple assebly files * Fix output formatting * Reduce register file bank conflicts * More accurate measurement for quick instructions * Add support for gfx1201 * RDNA4 wmma requires less VGRPs * RDNA4 does not have s_cmpk instructions * Add v_wmma_i32_16x16x32_iu4 for gfx1201 * Add sparse wmma instructions --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-05-23 16:26:12 -07:00
George Hotz	0a313d98a0	add rocm 6.4 support (#10491 ) * add rocm 6.4 support * update to newer amdcomgr, assert lang is right * fix aux-triple	2025-05-23 16:20:54 -07:00
Xingyu	1e0a59aca4	fix: handle buffer size calculation in to_movement_ops and add scalar assignment test in torch_backend (#10464 )	2025-05-22 10:54:13 -07:00
George Hotz	577a0b4cfa	openpilot compile4 (wip) (#10407 ) * openpilot compile4 * add copies * remove junk	2025-05-22 10:47:34 -07:00
qazal	7720c1aef1	hotfix: remove viz_sz.py [pr] (#10446 )	2025-05-21 14:17:42 +03:00
qazal	df4cbb69e9	move fuzz_schedule.py to extra [pr] (#10444 )	2025-05-21 10:07:24 +03:00
qazal	8a6fb37560	move viz /prof to extra [pr] (#10401 )	2025-05-18 23:25:59 +03:00
George Hotz	411392dfb7	move files into uop dir (#10399 ) * move files into uop dir [pr] * tinygrad.uop is a thing * fix uop docs, no pr * fix viz	2025-05-18 11:38:28 -07:00
qazal	17f0f5e764	add v_rcp_f32_e64 to remu (#10393 ) * tests from the box * add v_rcp_f32_e64 to remu * f32::from_bits utils * v_cndmask_b32 tests	2025-05-18 17:08:21 +03:00


1248 Commits