tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-07 22:23:55 -05:00

Author	SHA1	Message	Date
George Hotz	b46229ca51	use shrink in amd_matmul_uop (#13026 ) * use shrink in amd_matmul_uop * colors	2025-10-31 10:43:41 +08:00
wozeparrot	78f7650eec	faster tk matmul (#13006 )	2025-10-30 19:09:27 -07:00
George Hotz	512513c403	cleanup amd uop matmul (#13025 ) * cleanup amd uop matmul * remove mod * move that out * better variable names * var names * more * render fallback * colors	2025-10-31 10:04:45 +08:00
nimlgen	629b177b66	amd: sqtt works in profile mode (#13019 )	2025-10-30 23:48:52 +08:00
nimlgen	4d7a7096c9	am: enable perfmon (#13013 ) * am: enable perfmon * try * msg	2025-10-30 22:28:36 +08:00
George Hotz	4a741e8364	modernize amd uop matmul (#13011 ) * modernize amd uop matmul * progress * comment * more comments * revert that * mac cleanups * fix estimates * format	2025-10-30 17:02:38 +08:00
wozeparrot	92a87e37e4	fix: fetch_file (#13010 )	2025-10-29 22:44:22 -07:00
nimlgen	a6f5b1482e	amd: perf counters (#12975 ) * amd: perf counters * sq * cleaner * fix * if enabled * ruff * mypy * counters * reset * fix * no cpu	2025-10-30 00:10:31 +08:00
wozeparrot	d66c997a39	feat: thunderkittens fa2 (#12955 )	2025-10-28 11:27:45 -07:00
wozeparrot	24884c6768	fix: don't use KITTENS_HOPPER for 4090 (#12954 )	2025-10-27 17:19:53 -07:00
George Hotz	25c2da1579	check SPEC=2 in CI (#12945 ) * check SPEC=2 in CI * split SPEC=2 * fast enough	2025-10-27 21:53:57 +08:00
nimlgen	f4da94af28	system: reset is a method of pcidevice (#12936 )	2025-10-27 16:21:10 +08:00
wozeparrot	6b54378eba	working kitten matmul (#12935 )	2025-10-26 23:40:49 -07:00
George Hotz	db5c918215	source extra/cl_android.sh to fix opencl on android	2025-10-26 15:27:51 +08:00
qazal	2f95c10702	remu new instructions / use volatile in emulator tests (#12862 ) * remu new instructions * start moving to volatile * test_simple works * test_exec_mov works and lid is still here * test_exec_cmp_vopc * clang did s_mov_b32 exec_lo, 1 * don't hardcode v1 * support volatile in tests * hw_test passes * only the volatile version * subrev saturating behavior	2025-10-23 11:13:43 +08:00
chenyu	c5cee74706	remove BLOCK_REORDER (#12854 ) not used	2025-10-21 19:10:14 -04:00
b1tg	60d7e232f2	cuda fp8 (#12782 ) * cuda fp8 * tensor core * tc test * clean * clean pm	2025-10-21 15:05:25 -04:00
chenyu	8baa61bd67	use torch 2.9 and its Muon in test (#12773 ) * use torch 2.9 and its Muon in test * relax and disable	2025-10-21 13:35:17 -04:00
chenyu	f51f9aaa16	muon ns_params -> ns_coefficients (#12850 ) match the official torch one	2025-10-21 12:35:52 -04:00
nimlgen	1ad6598963	amd: trace all instructions (#12831 )	2025-10-21 20:52:24 +08:00
George Hotz	cad3ada909	tinygpu: build with SIP off works	2025-10-20 09:11:09 +08:00
nimlgen	59784a5972	amd: ensure ts is written (#12794 )	2025-10-19 23:55:49 +08:00
George Hotz	89e7f2fa00	mmapeak: gfx1103 support	2025-10-19 16:57:28 +08:00
George Hotz	617614beb7	add mi350x support to mmapeak (#12784 )	2025-10-19 16:11:07 +08:00
nimlgen	037f6e8fa0	qcom: ioctl for 7xx (#12777 )	2025-10-18 20:33:14 +08:00
geohotstan	5d209ee7ec	onnx helper intermediate node output validation (#12740 ) * start * update comments * good * add comments and better printing * done	2025-10-16 11:17:47 -04:00
nimlgen	3aa2277b8f	nv: usb4 (#12696 ) * hackish * prog * match * l * simpler * refactor * not osx * apple things * tiny changes * fix mask * match fix * nn	2025-10-16 20:11:19 +08:00
wozeparrot	cc2dfe22f5	tinyfs: fetch file utility (#12719 )	2025-10-15 23:38:56 -07:00
George Hotz	4a151e7533	make xcode signing happy, waiting for entitlement (#12712 )	2025-10-16 10:20:34 +08:00
Daniel	d65bd669f8	update tiny torch backend hook (#12575 ) * update the backend to fix torch deprecation warning * use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients * fix indentation --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-10-15 14:02:33 -04:00
Christopher Milan	0aabc1e938	Mesa NIR backend (NAK/LLVMpipe) (#12089 ) * nak works * TestOps::test_add works * testop has no crashes * fix bool casts * fix typo * add disassemble * RANGE and locals/regs * simplify NAKCompiler * disass cleanup * cleanup nir codegen * almost all tests passing * cleanup notes in extra/ * old notes * only import nak if NIR=1 * fix new SPECIAL syntax * fix local/shared memory * more tests passing * add DEFINE_VAR support * llvmpipe kinda works * diskcache * some mypy stuff * lvp passing test_ops.py * fix imports * actually fix imports * remove 'stdout' * fix llvm import * fix mypy issues * nicer errors * simpler test_dtype skips * test lvp in CI * fix github action syntax * fix more actions typos * switch to mesa 25.1.0 * diskcache_put * better generation for lvp nir_options * b64encode shader blobs * Revert diskcache changes This reverts commits `930fa3de8a` and `8428c694b3`. * general cleanup * better error messages * fix llvm import * fix windows tests * link with libm and libgcc_s * fix some errors * dont check for 'float4' * NIR uses pointer arithmetic * use tinymesa * bump tinymesa * bump tinymesa again * update lvp nir_options * print nir shader with DEBUG * simplify LVPCompiler * more tests * "gated" STORE * NAK is cacheable * more tests * all tests pass locally for NAK * test autogen in CI * autogen deps * more deps * fix uop_gc * fix macos * mypy * save 2 lines * save two more lines * save 1 line * save 4 lines * save more lines * Revert "save more lines" This reverts commit `dd3a720c5a`. * save more lines * fix LVP on windows * refactor * reorganize some code * refactor lib_gpu * move LVP check * out of order loads * remove support.mesa * bump tinymesa version * simplify LVP jit * macos * macos ci * shell: bash * testing * more testing * compute brew prefix * stupid typo * actually fix * lib * stdout on macos * inline gallivm_compile_module * Revert "inline gallivm_compile_module" This reverts commit `b65983b151`. * elf macos * semicolon * inherit from CPULLVMCompiler * ruff * disas test * fix libm linking * default is fine actually * arm works * add elf loader link test * fix NAK beam * pylint is too smart by half --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-10-15 17:38:33 +08:00
nimlgen	aa81bde150	amd: usb4/thunderbolt on macs (#12641 ) * tbgpu * works * cleaner * this * zero size * h * fix * simpler * prio over usb * c * not needed * linter * this way * mappings * mypy * mypy * mypy 2 * nn	2025-10-15 13:02:01 +08:00
wozeparrot	f228c03f9f	fetch raid from cloud (#10799 ) * feat: initial tinyfs device * feat: don't allow compute on tinyfs device * feat: tensor helpers to load and store * feat: bufferview for tinyfs * fix: keep copy sizes correct * fix: recv large * clean: unneeded * feat: comment * clean: unneeded * clean: remove * clean: remove * feat: get request tag * feat: rename to cloud * feat: send request_id * feat: start computing tree * feat: compute store tree on this side * feat: jank chunked load * feat: more debugging * feat: rename to just load and store * feat: correct chunk count * fix: fix load for < 1mb * feat: comments * feat: don't truncate on block devices * feat: better way of testing block device * feat: don't need to pad that much * feat: connect to nodes directly on load * feat: cache connections * feat: don't hard code chunk size * feat: close mmap when closing file handle * feat: don't overwrite stuff on disk if storing from disk * clean: debug print * fix: close mmap * feat: await workers * feat: fast copy from tinyfs to disk * feat: don't copy to device on last * feat: use single socket per device * feat: raid in tinyfs * clean: remove import * clean: type * feat: maintain single event loop * feat: lower worker count * feat: use connection pool * feat: fetch mapping in its own process * fix: release lock * feat: don't fetch if exists * feat: req id only on stores * feat: always fetch * fix: rangeify * feat: allow specifying raid root * fix: dealloc buffer * feat: start support non 0 offset * clean: use cleaner * feat: don't pass to threadpool * clean: typing	2025-10-14 07:53:55 -07:00
George Hotz	fb61f3519f	remove assign contiguous hack (#12659 ) * remove assign contiguous hack * remove bad contiguous usage in torch backend * assign	2025-10-14 16:42:14 +08:00
qazal	cd6aeebfee	sqtt: osx decoder installer (#12637 )	2025-10-13 17:26:12 +08:00
nimlgen	89be3590aa	amd: sqtt on gfx12 (#12564 ) * amd: sqtt on gfx12 * cleaner * thi * and this * ops * ugh * back * rm this * rm	2025-10-10 17:54:14 +08:00
wozeparrot	f12e2a75db	feat: add thunderkittens (#12590 )	2025-10-10 00:32:33 -07:00
nimlgen	1309cea247	rocprof parser in extra (#12569 ) * rocprof parser * viewer * vw * skip	2025-10-10 14:56:42 +08:00
chenyu	c8dfd10257	ShapeTracker.real_strides -> is_expanded [pr] (#12579 ) only keep the used part	2025-10-09 22:52:45 -04:00
George Hotz	9b66c2b0b7	fix weekly commits table (i didn't know we linted extra)	2025-10-10 09:23:33 +08:00
George Hotz	658b96cbfb	weekly commits table	2025-10-10 09:15:41 +08:00
nimlgen	a11b686c71	amd: sqtt for all gfx11 (#12546 ) * amd: general sqtt for gfx11 * target * ops * no gfx12 here	2025-10-09 17:04:06 +08:00
chenyu	ae51bdd06a	remove trivial use of RANGEIFY flag (#12550 ) some tests need update still	2025-10-09 02:29:38 -04:00
George Hotz	2653147cb7	delete the lowerer (#12526 )	2025-10-08 21:58:18 +08:00
chenyu	e701106a64	remove FUSE_ARANGE (#12511 ) it was the default already	2025-10-08 04:54:07 -04:00
nimlgen	4a756a37d8	amd: support rocm7 (#12502 ) * amd: support rocm7 * mock	2025-10-08 14:30:39 +08:00
George Hotz	514d2a0774	merge tagless reshapes (#12474 ) * merge tagless reshapes * cleanup	2025-10-07 13:57:58 +08:00
George Hotz	b4509fba31	thundermittens (#12471 ) * thundermittens * give device a type	2025-10-07 11:47:39 +08:00
George Hotz	0f25b4b289	move frontend dir to nn [pr] (#12470 )	2025-10-07 10:42:22 +08:00
hooved	0f804c9a83	Stable Diffusion model init for mlperf (#12314 ) * include clip pr diff * updated unet and sd init * dehardcode default device * revert beam hang workaround --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-10-02 02:28:41 -04:00

1491 Commits