tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-09 06:58:11 -05:00

Author	SHA1	Message	Date
nimlgen	e372c841ba	hevc: beam in decode (#14067) * hevc: beam in decode * fine * g	2026-01-08 15:47:16 +03:00
Christopher Milan	0120d69caa	autogen: avcodec (and simplify workflow) (#14031) * simplify autogen workflow and add avcodec verification - Consolidate all regeneration into single steps (delete + import) - Remove continue-on-error and individual diff checks - Use git diff at end to catch all differences - Show artifact URL in failure message - Add avcodec.py verification 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * patch avcodec --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-05 23:30:25 -05:00
George Hotz	20653d2996	assembly/amd: make pdf.py code shine (#14029) * assembly/amd: make pdf.py code shine * no merge * pdf2 is the future * something * regen enums * test * work * remove junk * write * pcode extraction * pdf2 passes all tests * simplify * simpler pdf * late filter * remove hacks * simplify pdf2.py * field type * remove defaults * don't export srcenum * simple pdf.py * simpler * cleaner * less hack in PDF	2026-01-05 18:49:40 -08:00
Christopher Milan	b2a0b9c551	autogen: dump patch in CI (#14010) * autogen: don't fast-fail, produce patch artifact on differences All verification steps now use continue-on-error to run completely. Each job generates a patch artifact containing all differences found. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * add gen from header test * fix tests * fail if diff * add forward decl autogen test * remove confusing/wrong comments * macos unittests set LIBCLANG_PATH --------- Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-01-04 22:38:12 -05:00
George Hotz	8328511808	assembly/amd: make the emu.py code shine (#13996) * assembly/amd: make the code shine * lil clean * reg back in pcode * cleanups * gen fma_mix * no writelane hacks * fn cleanup * dead vgpr_write * readable * smem * cleanup bench_emu * speedups * simpler and faster * direct inst._fn * split fxn * Revert "simpler and faster" This reverts commit `e85f6594b3`. * move lds to wavestate * dispatcher * pc in dispatch * literal isn't wavestate * cleanups + program * one readlane * exec_vop3sd in exec_vop * cleaner exec_vopd * fully merge VOP3P * no special paths * no SliceProxy * low=0 * no bigint * failing tests * fma on python 3.13	2026-01-03 20:33:09 -08:00
Christopher Milan	35c2870b1f	gate image_conv2d pitch hacks on IMAGE==1 (#13995) * gate image_conv2d pitch hacks on IMAGE==1 * fix opencl image copies * cleanup	2026-01-03 12:27:31 -05:00
Christopher Milan	9dc524536f	IMAGE=1 creates "dynamic" images (#13769) * remove image from BufferSpec * cl tiny_gemm (64) works * mypy * padding * openpilot CL * reshape properly * remove extra qcom checks * pad output * mypy * update compile test * move undo * TestImageCopy valid images * TestImageRealization valid images * TestImageDType valid images * cleanups * test_renderer_failures * ruff * mypy * simplify ops_qcom * bump step time * Revert "bump step time" This reverts commit `75a037c7d0`. * "dynamic textures" are optional * a start * IMAGE=1 works, no FLOAT16 * fast but wrong * mypy * some fixes * better * works * refactor * oops	2026-01-02 16:22:39 -05:00
Christopher Milan	61dc70f1a8	add driving_vision IMAGE=1 benchmark (#13979)	2026-01-02 13:58:27 -05:00
George Hotz	dfb813b760	assembly/amd: add pcode ds ops (#13939) * assembly/amd: add pcode ds ops * refactors * fix ds op * update autogen * fix flat bug * more tests * fix emu test * that's a hack * generic * fix all tests * two tests * fix test failure * better * remove __all__	2026-01-01 16:24:13 -05:00
chenyu	ce84a23142	remove tee in benchmark (#13954)	2026-01-01 10:55:36 -05:00
chenyu	e2987001ee	unify pre-commit mypy and ci mypy (#13940)	2025-12-31 17:51:51 -05:00
chenyu	a9a7b33404	IGNORE_OOB=0 in CI (#13903)	2025-12-31 12:56:59 -05:00
chenyu	ba9aa5cd6f	skip some PTX IGNORE_OOB validation (#13927)	2025-12-31 12:40:21 -05:00
chenyu	4968060ad4	fix IGNORE_OOB=0 for WEBGPU (#13926)	2025-12-31 10:41:28 -05:00
chenyu	404755bafd	merge ci ruff tests and update ruff version (#13922)	2025-12-31 09:53:49 -05:00
chenyu	dc27eb48ac	remove PYTHONPATH="." from test.yml (#13909)	2025-12-30 17:00:16 -05:00
George Hotz	efc99d0c55	assembly/amd: more refactors (#13907) * assembly/amd: more refactors * more refactors * more refactors * simpler emu * generate.py * regen all * cleanups * more * work * more readme * lil	2025-12-30 16:13:24 -05:00
George Hotz	69cdc8066d	assembly/amd: add dtype tests to AMD IDE CI (#13899) * add dtype tests to AMD IDE CI * more tests * add trig preop * regen done * split to amd autogen * simpler	2025-12-30 11:09:51 -05:00
George Hotz	2b838dc1d8	assembly/amd: fix AMD_LLVM=1 support in emulator (#13881) * fix AMD_LLVM=1 support in emulator * more llvm with dtype * work * more fixes * fix dtype	2025-12-30 09:09:57 -05:00
George Hotz	9d8397be11	add CDNA3+RDNA4 support (#13882) * fix CI * remove junk * rename lib to dsl * correct * cleanups	2025-12-29 15:51:29 -05:00
George Hotz	81cf9ea0ab	rename to extra.assembly.amd (#13879)	2025-12-29 14:10:55 -05:00
George Hotz	37f0fa11b6	rdna3 test cleanups (#13878) * rdna3 test cleanups * cleanups * ugh DONT SKIP	2025-12-29 13:41:59 -05:00
George Hotz	f1471a3b99	speed up rdna3 unit tests + add to CI (#13871) * speed up rdna3 unit tests * add test to CI * faster and simpler * speedups * bugfixes * use helper * fix CI maybe * test fixes * llvm-21 on 24.04 * upd * llvm-21 * fix test * bring that back * merge gen into lib * test generators	2025-12-29 10:26:48 -05:00
chenyu	f5090192c8	reorder AMD tensor core benchmark test (#13860) * reorder AMD tensor core benchmark test * disable that	2025-12-28 12:29:51 -05:00
chenyu	cba05acadf	re-enable TYPED=1 import test (#13858)	2025-12-28 11:49:06 -05:00
qazal	a1c1684b91	set .amdhsa_kernarg_size in asm test (#13826)	2025-12-25 13:08:14 +09:00
George Hotz	4702da41d5	hotfix: mkdir for extra/disassemblers	2025-12-19 17:18:37 -04:00
chenyu	80b84f5267	ruff lint tinykitten (#13762) deleted used import and double spaces. a few ignore to not change the real code	2025-12-19 14:31:00 -05:00
Christopher Milan	97103831c5	Revert "remove image from BufferSpec (#13636)" (#13761) This reverts commit `2571a1eb47`.	2025-12-19 13:54:36 -05:00
Christopher Milan	2571a1eb47	remove image from BufferSpec (#13636) * remove image from BufferSpec * cl tiny_gemm (64) works * mypy * padding * openpilot CL * reshape properly * remove extra qcom checks * pad output * mypy * update compile test * move undo * TestImageCopy valid images * TestImageRealization valid images * TestImageDType valid images * cleanups * test_renderer_failures * ruff * mypy * simplify ops_qcom * bump step time	2025-12-19 13:41:20 -05:00
George Hotz	4b741e893f	remove REMOTE=1 (#13722) * remove REMOTE=1 * leave ibverbs	2025-12-16 15:58:10 -04:00
George Hotz	e5a66ace80	multi custom kernel support (#13716) * multi custom kernel support * custom kernel xfrom * works * no SPEC=2 on ck * panic * touchups	2025-12-16 11:36:30 -04:00
George Hotz	7589c897b2	split usbgpu tests into their own benchmark [pr] (#13711)	2025-12-15 21:42:40 -04:00
qazal	6bafd90248	remove unused process replay input [pr] (#13712)	2025-12-16 09:29:35 +08:00
George Hotz	316da9f7ff	llm: add created/model fields, non-streaming support, and tests (#13660) * llm: add created/model fields, non-streaming support, and tests - Add `created` timestamp and `model` fields to response (required by OpenAI spec) - Add non-streaming mode support for /v1/chat/completions - Add `send_data` helper to HTTPRequestHandler for responses with Content-Length - Refactor viz/serve.py to use send_data - Add integration tests using real OpenAI client 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * add openai to testing * toml * Remove 'openai' from dependencies Removed 'openai' from the dependencies list. * bump cache --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-12 14:50:36 -05:00
George Hotz	f0fa9bcd98	openai api for llm (#13648) * openai api for llm * responds to simple request * schedule cache needs to unbind * stream works * share stream code * 20k * one print * cid	2025-12-12 08:25:33 -05:00
nimlgen	cbae33003d	ci: add usb4 (#13643) * ci: add usb4 * debug=3 * undef * revert	2025-12-11 19:41:41 +03:00
chenyu	2471b49e45	minor bert / llama change from grad acc branch (#13622) * minor bert / llama change from grad acc branch * revert those	2025-12-08 16:04:14 -05:00
Christopher Milan	cb3d756547	NAK compile-only test (#13621)	2025-12-08 15:53:46 -05:00
Christopher Milan	a4c3d48aa9	compile-only test for IR3 actually works (#13619)	2025-12-08 15:07:49 -05:00
Christopher Milan	1c16b6e082	Mesa: freedreno (#12746) * ir3 init * got a program * 1 + 1 works * use isa_disasm instead of shader_disasm * wip * matmul works * works on py3.14 * fix const loading * skip QCOM failing tests * cleanup * args actually work * add compile-only tests * fix typo and install tinymesa * IR3 NULL backend * (float32) images work * autogen fix * fix compile only test * typo * mypy happy * compile-only uses py3.14 * bump mesa * unify qcom disassembler * float16 works * disasm shows in viz * save a line * add real del * variable workgroup sizes * simplify diff * bump line count * properly set wgsz * regen mesa * no preamble * bump lines	2025-12-08 14:02:08 -05:00
chenyu	b981b6f89e	remove old llama grad_acc (#13611) * remove old llama grad_acc * GRADIENT_ACC_STEPS=1	2025-12-07 13:03:47 -05:00
Christopher Milan	4eae4b0ce6	unify adreno autogen with mesa (#13604) * unify adreno autogen with mesa * gen pm4 * TestTiny::test_plus works * add a6xx enums * IMAGE=2 TestTiny::test_gemm works * remove adreno from CI * cleanup	2025-12-06 15:17:36 -05:00
Christopher Milan	dec2f50aee	reenable process replay for lvp (#13592)	2025-12-05 12:36:35 -05:00
chenyu	ac1227575f	IMAGE=1 driving_vision in benchmark (#13587)	2025-12-05 10:20:54 -05:00
qazal	6d92e9ffbf	hotfix: skip process replay on lvp (#13585)	2025-12-05 19:25:23 +08:00
George Hotz	24ca8eeaa7	small fixups from schedule_cache (#13557)	2025-12-03 15:41:16 -08:00
Douglas Nyberg	f5abd38132	remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS (#13555)	2025-12-03 17:48:27 -05:00
chenyu	8902781dc1	enable more benchmarks (#13540) * enable more benchmarks * disable some * adjust ASSERT_MIN_STEP_TIME * mac NOCLANG=1	2025-12-02 20:31:14 -05:00
George Hotz	21184ae6b1	bump cache to 14 (#13530)	2025-12-02 08:02:19 -08:00

1147 Commits