tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-01-08 22:48:25 -05:00

Author	SHA1	Message	Date
Christopher Milan	0aabc1e938	Mesa NIR backend (NAK/LLVMpipe) (#12089) * nak works * TestOps::test_add works * testop has no crashes * fix bool casts * fix typo * add disassemble * RANGE and locals/regs * simplify NAKCompiler * disass cleanup * cleanup nir codegen * almost all tests passing * cleanup notes in extra/ * old notes * only import nak if NIR=1 * fix new SPECIAL syntax * fix local/shared memory * more tests passing * add DEFINE_VAR support * llvmpipe kinda works * diskcache * some mypy stuff * lvp passing test_ops.py * fix imports * actually fix imports * remove 'stdout' * fix llvm import * fix mypy issues * nicer errors * simpler test_dtype skips * test lvp in CI * fix github action syntax * fix more actions typos * switch to mesa 25.1.0 * diskcache_put * better generation for lvp nir_options * b64encode shader blobs * Revert diskcache changes This reverts commits `930fa3de8a` and `8428c694b3`. * general cleanup * better error messages * fix llvm import * fix windows tests * link with libm and libgcc_s * fix some errors * dont check for 'float4' * NIR uses pointer arithmetic * use tinymesa * bump tinymesa * bump tinymesa again * update lvp nir_options * print nir shader with DEBUG * simplify LVPCompiler * more tests * "gated" STORE * NAK is cacheable * more tests * all tests pass locally for NAK * test autogen in CI * autogen deps * more deps * fix uop_gc * fix macos * mypy * save 2 lines * save two more lines * save 1 line * save 4 lines * save more lines * Revert "save more lines" This reverts commit `dd3a720c5a`. * save more lines * fix LVP on windows * refactor * reorganize some code * refactor lib_gpu * move LVP check * out of order loads * remove support.mesa * bump tinymesa version * simplify LVP jit * macos * macos ci * shell: bash * testing * more testing * compute brew prefix * stupid typo * actually fix * lib * stdout on macos * inline gallivm_compile_module * Revert "inline gallivm_compile_module" This reverts commit `b65983b151`. * elf macos * semicolon * inherit from CPULLVMCompiler * ruff * disas test * fix libm linking * default is fine actually * arm works * add elf loader link test * fix NAK beam * pylint is too smart by half --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com> Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-10-15 17:38:33 +08:00
qazal	cd6aeebfee	sqtt: osx decoder installer (#12637)	2025-10-13 17:26:12 +08:00
nimlgen	89be3590aa	amd: sqtt on gfx12 (#12564) * amd: sqtt on gfx12 * cleaner * thi * and this * ops * ugh * back * rm this * rm	2025-10-10 17:54:14 +08:00
nimlgen	1309cea247	rocprof parser in extra (#12569) * rocprof parser * viewer * vw * skip	2025-10-10 14:56:42 +08:00
nimlgen	9c9e337c78	amd: parse soc enums (#11727) * amd: parse soc enums * remove from mock * fix * minimal amd_gpu	2025-08-19 15:06:09 +03:00
uuuvn	052191eae4	Remote multihost (p2p with infiniband verbs) (#9746) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2025-07-27 14:44:32 -07:00
nimlgen	df3ba0a7c0	autogen: fix imports in libusb (#11294)	2025-07-21 13:04:27 +03:00
nimlgen	f9e4c4e57a	nv: nvpci blackwell support (#11127) * nv: start 5090 * gsp init 5090 * mmu * works * after merge * clenaer * rwk * x * fx * finish? * fix * unrelated * fix * commenbt	2025-07-11 17:02:09 +03:00
nimlgen	6067568087	nv: remove hardcoded CTRL_CMD_VASPACE_COPY_SERVER_RESERVED_PDES (#11057)	2025-07-02 20:41:10 +03:00
nimlgen	1c45b9f7fb	start nvpci (#10521) * start nvpci * talk to fsp * boot args * riscv core bootted * q * agen * got gsp init msg * some fixes * set registry, stuck aft lockdown( * start ga/ad port * gsp init on ada * more classes allocated * more * mm * fixes and progress * no huge pages for now * mm seems workin, but switch to 512mb page for simplicity * working state * not cleaned * claned * nvd=1 * start gr ctx * compute * clean 1 * cleanup 2 * cleanup 3 * cleaner 4 * cleaner 6 * add iface to nv * save before reboot * merged into NV * moveout mm * post merge * cleaner 7 * merge and rebase * pciiface abstraction + reset * download fw from web * print logs * minor changes + p2p * cleaner 8 * cleaner 9 * cleaner 10 * delete * delete this as well * linter 1 * oops * priv_client -> priv_root * fix mypy * mypy? * mypy? * small changes * shorter * ops * remove this * do not allocate paddr for reserve * nodiff * unified script * ops * dif ver * add lock * setup	2025-06-25 00:37:34 +03:00
nimlgen	b6e574fcdf	am: smu 14.0.3 is smu 14.0.2 (#10714)	2025-06-13 23:07:56 +03:00
George Hotz	0fbf3f5554	Revert "Revert "Update autogen ci runner to ubuntu 24.04 (#10736)" (#10757)" (#10758) This reverts commit `a6dba9b9d9`.	2025-06-10 09:32:27 -07:00
George Hotz	a6dba9b9d9	Revert "Update autogen ci runner to ubuntu 24.04 (#10736)" (#10757) This reverts commit `1d15374c7a`.	2025-06-10 09:31:51 -07:00
uuuvn	1d15374c7a	Update autogen ci runner to ubuntu 24.04 (#10736) For `kfd.AMDKFD_IOC_EXPORT_DMABUF`	2025-06-10 08:33:02 -07:00
George Hotz	7ff175c022	cache a venv to avoid pip usage (#10689) * try built in pip caching * try venv * export venv * set VIRTUAL_ENV * revert that * venv key * fix * ci cache hit? * fix windows	2025-06-07 20:13:41 -07:00
nimlgen	85cea23557	nv: original bw qmd (#10672) * nv: original bw qmd * forgot	2025-06-07 01:43:22 +03:00
nimlgen	883bb4541c	am: reserve address space (#10564) * am: reserve address space * f * cc * errno * fix * always has cpu mapping	2025-05-30 19:31:03 +03:00
nimlgen	d90ddcc365	nv: blackwell support (#10487) * nv: blackwell support * fixes * hm * h * fixes * mypy * xx * yy * arr * revert * oops * unrelated	2025-05-24 18:23:53 +03:00
nimlgen	2895198c36	am: download regs (#10419) * am: download regs * x * linter * mypy * after merge * raise * fixed name * fix * xx * remove * missing reg * missing reg * move to online * ops	2025-05-20 18:59:56 +03:00
nimlgen	30bd6a619f	usb gpu (#8766 ) * start gpu * progress * fixes * read correct * libusb * libusb works * support asm24 * hmm * one access file * fix extra * start AMBar * works on am * back to usb * patch fw * full fast write into a bar * ugh, minus one gpus, next please * mute libusb for now * usb for asm24 * 63 * hmm * ops * rescan * and gpu shoudl be there * enumerate them? * usbgpu bus 4, 100% reliable (draft) * lil * works * comments * add DEBUG * cleaner * simplest * Revert "simplest" This reverts commit `1d00354c16`. * Revert "cleaner" This reverts commit `c5662de956`. * assert we find gpu * that's simpler * this back * simpler? * correcT * work * nonsense * works with more checks * this works * the 6s in the right place * reliable now * fix after reboot * set config * 1s timeouts * close to fw loading * streams * usbhub works * endpoints * fix * want to test tiny10 * move to tiny 10 * fix gpu * ugly speed * smth * mostly broken, but signals and dmas * do not reset gpu every time * changes to run kernels * ugh, not working * t10 * pg and sc files * some prog * um? * somehow it works * patched for 24 * some tries * minimal * moving * back to working * so sloooooow * move to controller * usb.py rewrite * rework * cleaner 1 * cleaner 2 * cleaner 3 * new abstractions * aft merge * init controller * cleaner 4 * cleaner 5 * patcher + tiny changes * ignore that * cleaner 6 * after rebase * cleaner 7 * bring it back * start linter war * linter 2 * autogen was missing * fix autogen * typing * better? * mypy * extra/legacy rename and cleaner * shuffle * better printing * tiny changes and tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-05-01 18:03:47 +03:00
nimlgen	9caceda79a	amd: comgr is not required (#10128)	2025-05-01 13:41:44 +03:00
nimlgen	db51133537	rename HWInterface -> FileIOInterface (#9989) * rename HWInterface -> FileIOInterface * ugh	2025-04-22 22:18:57 +03:00
deftdawg	32bbff942c	amd: add nbio 7.2.0 for some rdna2 (#9964) * - Updated of #9700 which fixes #9665 but for the Steam Deck which was erroring on NBIO 7.2.0 * unrelated change --------- Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>	2025-04-22 12:10:48 +03:00
nimlgen	a9430b4118	am: fix metrics table for smu14_0_2 (#9863)	2025-04-12 19:07:22 +03:00
uuuvn	3ee317ffed	Fix kfd autogen and verify it in ci (#9818) Had to autogen newer uapi headers for #9746 (dmabuf export ioctl missing), submitting just the fix without updating to newer headers as they are only needed for infiniband stuff	2025-04-10 09:53:42 +08:00
nimlgen	3e2f42c2e8	autogen: remove am headers from extra (#9666)	2025-04-01 14:45:30 +07:00
uuuvn	962c0f65f8	Fix generate_am (#9626) This should be a comment	2025-03-31 01:15:44 +08:00
uuuvn	2a4247b8c2	RDNA 3.5 support (#9627)	2025-03-31 01:15:20 +08:00
nimlgen	54e1e59b44	am: rdna 4 support (#9621) * hm * fix * return this * fine * g * ruff * fix	2025-03-29 23:16:27 +07:00
uuuvn	5908b89f71	MI300X support (WIP) (#9585)	2025-03-29 19:46:42 +08:00
uuuvn	dd9aae02c3	Refactor ops_amd.py (MI300X prereq) (#9428)	2025-03-29 00:17:20 +07:00
nimlgen	edf9e1bf8d	am: move out soc21 to a sep module (#9551) * am: soc module is not part of am * am: soc module is not part of am	2025-03-24 14:17:42 +07:00
Ahmed Harmouche	7ce7fe0574	Refactor webgpu_dawn lib finding (#9547) * Refactor webgpu_dawn lib finding * Fix ruff	2025-03-23 08:23:29 -04:00
uuuvn	e85001b6ee	SQTT profiling (#9278) * sqtt * docs * multi-device * ProfileSQTTEvent * exec update * 256mb default * don't let people hang their gpus * bitfields from autogen * asic info from mesa * more bitfields from autogen * SQTT_ITRACE_SE_MASK --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-11 13:19:56 +08:00
uuuvn	b75f307234	amd: autogen ip bases (#9360)	2025-03-05 22:30:38 +03:00
nimlgen	993ef42bd5	am: hdp cg (#9346)	2025-03-04 20:44:09 +03:00
George Hotz	bf36967883	cuda hooking (#9180) * cuda hooking * progress * more hook cuda * fix params * compile + cuMemHostAlloc hook * work * revert that	2025-02-20 19:20:01 +08:00
nimlgen	c6c2373bc0	replace libpciaccess autogen with just pci regs (#8983) * replace libpciaccess autogen with just pci regs * add pci.py	2025-02-09 18:40:45 +03:00
nimlgen	79de980565	am: do not fork pci bars (#8969)	2025-02-08 19:03:17 +03:00
Ahmed Harmouche	133cacadde	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646 ) * Switch to dawn, all tests passing locally * Use dawn-python * Skip failing test * Skip midcast and fix timestamp on metal ci * Autogen webgpu * Try fetch dawn lib again * /usr/lib * Without lib prefix * Test autogen diff * Delete webgpu support, move everything to ops_webgpu * mypy fix * Simplify, refactor * Line savings * No ResultContainer * Type annotation for result * Some more simplifications * Why was this explicit sync used at all? * Refactor: delete functions that are only used once * Create shader module inline * Clear unit tests cache, maybe that solves it * That wasn't it * Try deleting cache to pass failing weight compare * weights_only=False for pytorch 2.6 * Simplify ctype array creation * Remove nanosecond precision timestamps * Simplify error handling * Refactor, add back type annotations * Deleted custom submit function, refactor * read_buffer simplify * Fix use after free, refactor * Simplify supported_features * Runtime docs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-07 15:16:59 +08:00
nimlgen	86feb98dcd	am: add support for 7600 (#8910) * am: start to add support for 7600 * test_tiny passes * mmhub 3 0 2 * cleaner	2025-02-06 14:04:07 +03:00
uuuvn	6dadb60c93	LLVM JIT (+autogen llvm instead of llvmlite) (#8486 ) * LLVM JIT * Autogen LLVM * Update autogen * Move things around * even more non-determinism * windows * more autogen weirdness * more windows stuff * blind windows development try 2 * more blind windows development * even more blind windows development * maybe i should just set up a windows vm... * why can't everyone just use sysv abi? * cleanup debugging stuff * unused import * icache flushing isn't required on x86 * merge jit_nt and jit_unix * more * Temporary hack to not segfault * better error * bad conflict resolution * Attempt to simplify support/llvm.py * More refactoring --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-02 19:52:42 +08:00
nimlgen	2f0856c1e2	qcom: use hwinterface (#8565) * qcom: use hwinterface * ops * not needed anymore	2025-01-11 17:11:23 +03:00
nimlgen	aa3d612df2	add script to install amd mockgpu on macOS (#8536) * upload artifact every time * hm * sh script * hm * hm2 * hm2 * hm2 * no sudo * def paths * small comments * text * try auth for bigger limits	2025-01-09 01:29:25 +03:00
patrini32	afef69a37d	MOCKGPU on mac os (#8520) * tweaks for macos * fix * fix * typo * remove nvidia changes * remove nv related changes * change address back	2025-01-07 20:27:43 +03:00
nimlgen	ab3ac2b58d	hw interface abstraction (#8524) * use HWInterface in autogen * mockgpu * HWInterface * more HWInterface * fix * fix * old code * fix * implicit field definition * add offset check to mockgpu too * refactor * forgot to pass flags + read rewrite * test * play with vfio * nv: this should be kept * try this * vfio * rm overwrite=True * linetr * do not reinit kfd * minor * mypy * mock * init them once --------- Co-authored-by: patrini32 <patrini23@proton.me>	2025-01-07 18:18:28 +03:00
nimlgen	c18307e749	AM driver (#6923 ) * connect to gpu * rlc init? * gfx comp start init * early init is hardoded, some progress with fw * gart * progress, next mqd * ring setup, still does not execute anything * ugh write correct reg * pci2: vm * pci2: start psp * vm seems to work * pci2: gfx start * pci2: fix psp ring resp * pci2: try ring * pci2: mes and some fixes * pci2: some progress * pci2: progress * pci2: mm * pci2: discovery * pci2: correct apertures * pci2: b * pci2: i * pci2: l * pci2: o * pci2: cmu * pci2: mes_kiq works * pci2: mes * pci2: kcq does not work( * pci2: unhalt gfx * ops_am * minor * check if amdgpu is there, or we will crash * bring back graph, it just works * less prints * do not init mes (not used) * remove unused files * ops_am: start move into core * ops_am: works * clcks, but still slower * faster + no mes_kiq * vm frags + remove mes * cleanup fw * gmc tiny cleanup * move to ops_amd * comment out what we dont really need * driverless * close in speed * am clean most of ips * gmc to ips * cleaner * new vm walker * comment old one * remove unsued autogens * last write ups * remove psp hardcoded values * more * add logs * ih * p2p and sdma * vfio hal and interrupts * smth * amd dev iface * minor after rebase * bind for sdma * Revert "bind for sdma" This reverts commit `a90766514d`. 
* tmp * debug new mm * ugh, allreduce hangs fixed * p1 * works * no pci.py * cleaner a bit * smth * tiny cleanups * cleaner a bit * pciiface * linter * linter 2 * linter 3 * linter * pylint * reverted unrelated changes * unrelated * cmp tool * ugh wrong fw * clockgating * unrelated * alloc smaller chunks * this * opt sigs * collect stat * ops * upd * proclogs * proclogs2 * vfio * ruff * linter pylint * oops * mypy p1 * mem fix * mypy p2 * mypy p3 * mypy p4 * correct * minor * more tests * linter in tests * pci_regs header * minor write up * setup * do not require libs --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-12-31 23:06:17 +03:00
nimlgen	81d415be03	amd pkt3 refactor (#7923) * amd pkt3 refactor * replace this * linter * fix * cmt * fast * simpler * linter * smth * missing	2024-11-28 11:06:37 +03:00
Jacky Lee	c8b59416d0	fix: find_library can be None (#7145)	2024-10-18 20:50:52 +03:00
nimlgen	8094340221	nv print info about faults (#7057) * nv print info about faults * unrelated changes * nv_gpu.GT200_DEBUGGER in mockgpu * regen with ocrrect version * spacing	2024-10-14 21:49:38 +03:00

75 Commits