* local metal on metal in uop syntax
* TODO: just put the axis_info in the kernelinfo
* local
* amd_matmul works @ 28 TFLOPS
* clean up matmul
* kernel8 works
* remove that
* locals
* axistype innovation
* work
* cleanup
* kernel3 regs
* cleanup kernel3
* work
* why is it broken
* no beam
* reenable
* permutes
* take a file path as input and have the parsing happen in OnnxRunner.__init__ (see the sketch after this commit block)
* modelproto_to_onnxrunner -> modelproto_to_runner
* whoops, fix import
* oh flakiness again, is it because it's getting gc-ed?
* small changes
* CI is flaky, so just move the compile4 fix in
* copy typing of onnx_load
* actually can just import onnx_load instead of onnx.load
* fix external_benchmark_openpilot
* fix onnx_runner test to use onnx_helper
* rerun CI
* try run_modelproto
* spam CI a few times
* revert run_modelproto since that's flaky also
* no external onnx_load usage except onnx.py
* cursor tab complete is evil, it snuck a darn sorted() in. But does the order change the result? Why?
* model_benchmark 193s -> 80s, add OnnxRunner.to()...
* minimize diff and clean up
* device can be None, weird but eh
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
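For the OnnxRunner changes above (take a file path, parse in `__init__`, add `OnnxRunner.to()` for moving weights), here is a minimal sketch of the interface being described. It is an assumption-laden illustration, not tinygrad's actual code: the `onnx_load` stand-in, the initializer loading, and the exact `to()` semantics are all guesses.

```python
# Hedged sketch of the OnnxRunner shape described in the commits above; not the real implementation.
from onnx import ModelProto, numpy_helper, load as onnx_load  # stand-in for tinygrad's own onnx_load helper
from tinygrad import Tensor

class OnnxRunner:
  def __init__(self, model):
    # accept either a file path or an already-parsed ModelProto; parsing now happens in __init__
    self.model: ModelProto = onnx_load(model) if isinstance(model, str) else model
    # initializers -> tensors (dtype handling such as the float16/raw_data case is omitted here)
    self.weights = {t.name: Tensor(numpy_helper.to_array(t)) for t in self.model.graph.initializer}

  def to(self, device):
    # device can be None ("weird but eh"); in that case leave the weights where they are
    if device is not None:
      self.weights = {name: w.to(device) for name, w in self.weights.items()}
    return self

  def __call__(self, **inputs):
    raise NotImplementedError  # running the graph is out of scope for this sketch
```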
* kernel.py no longer permutes reduce axis [pr]
* delete tests that handcode uops
* regen of sops is broken...
* put import back
* just remove that
* disable those tests
* fix extract_dataset + tests
* add CI
* sops.gz itself is the same as on master
* yml + gzip -c + ge
* don't commit that
* bump limit to 1000
* axis=7
* test_tiny
* squash commits
* temp fix for const tensor
* actually realized that float16 can only happen in raw_data (see the sketch after this commit block)
* .float -> cast(float) to rerun CI
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
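The "const tensor" / float16 commits above are about ONNX initializers: in the models involved, float16 data only shows up in a `TensorProto`'s `raw_data` bytes rather than in the typed repeated fields. A hedged sketch of that decode, using the public onnx and numpy APIs (the function name here is made up):

```python
# Illustrative only: decoding a float16 ONNX initializer, which arrives via raw_data.
import numpy as np
from onnx import TensorProto

def tensorproto_to_numpy(t: TensorProto) -> np.ndarray:
  shape = tuple(t.dims)
  if t.data_type == TensorProto.FLOAT16:
    # per the commit above, the float16 case only ever shows up as raw bytes
    return np.frombuffer(t.raw_data, dtype=np.float16).reshape(shape)
  if t.data_type == TensorProto.FLOAT:
    # float32 can come either as raw_data or as the repeated float_data field
    if t.raw_data: return np.frombuffer(t.raw_data, dtype=np.float32).reshape(shape)
    return np.asarray(t.float_data, dtype=np.float32).reshape(shape)
  raise NotImplementedError(f"data_type {t.data_type} not handled in this sketch")
```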
* fix
* add early verbose demo test
* is this how to write tests :s
* is definition drift even a thing? gemini says it is
* clean up
* better
* even better
* try add to CI
* doesn't work quite yet
* much more work to be done
* whoops
* partition the test heh
* skipif
* some nits for better names
* add webgpu test for onnxrunner
* fix reference links
* flush for now
* start
* still need the float16 workaround in place
* tiny nit for correctness
* idk hacks, I need to understand this device stuff better
* no-op?
* remove that assert for true nooooooop
* add fallback_context
* don't run linearize().uop tests in get_action_space test
This part takes 2 minutes in CI and has nothing to do with the action space. Also not sure if the "for some reason" comment is still relevant.
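Roughly, the split looks like the sketch below: the action-space assertion stays fast, and the expensive linearize step moves into its own opt-in test. Everything here (the fixture, the helper, the env-var gate) is a hypothetical stand-in, not the real tinygrad test code.

```python
# Hypothetical sketch of splitting the slow linearize() check out of the action-space test.
import os, unittest

def load_recorded_kernels():   # stand-in for the real dataset of (kernel, applied opts) pairs
  return []

def get_action_space(kernel):  # stand-in for the real action-space helper
  return set()

class TestKernelOpts(unittest.TestCase):
  def test_get_action_space(self):
    # fast part: only asserts that every recorded optimization is inside the search action space
    for kernel, opts in load_recorded_kernels():
      actions = get_action_space(kernel)
      for opt in opts: self.assertIn(opt, actions)

  @unittest.skipUnless(os.getenv("RUN_SLOW"), "takes ~2 minutes in CI and is unrelated to the action space")
  def test_linearize_each_kernel(self):
    # slow part: actually lowers every kernel; split out so the action-space test stays quick
    for kernel, _opts in load_recorded_kernels():
      kernel.linearize()       # stand-in for the real lowering call

if __name__ == "__main__":
  unittest.main()
```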
* -n=auto test/models
* Implement private _linalg_eigh function for tensor eigenvalue decomposition in torch backend
* Add unit test for linalg.eigh function in TestTorchBackend
This test verifies the eigenvalue decomposition of a 2x2 tensor using the linalg.eigh function, ensuring the computed eigenvalues and reconstructed tensor match the expected results.
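A self-contained sketch of what that check looks like, written against plain CPU torch for illustration; the real test presumably lives in TestTorchBackend and exercises the tinygrad torch backend's private `_linalg_eigh` path rather than stock torch.

```python
# Illustration of the linalg.eigh check described above, using stock torch on CPU.
import unittest
import torch

class TestLinalgEigh(unittest.TestCase):
  def test_eigh_2x2(self):
    a = torch.tensor([[2.0, 1.0], [1.0, 2.0]])   # symmetric 2x2 with eigenvalues 1 and 3
    w, v = torch.linalg.eigh(a)                  # w: eigenvalues (ascending), v: eigenvectors as columns
    torch.testing.assert_close(w, torch.tensor([1.0, 3.0]))
    # reconstruct A = V diag(w) V^T and compare against the input
    recon = v @ torch.diag(w) @ v.T
    torch.testing.assert_close(recon, a)

if __name__ == "__main__":
  unittest.main()
```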