Commit Graph

10606 Commits

Author SHA1 Message Date
George Hotz
4a151e7533 make xcode signing happy, waiting for entitlement (#12712) 2025-10-16 10:20:34 +08:00
chenyu
c3278e5622 clean up old tests (#12708) 2025-10-15 17:53:17 -04:00
chenyu
b8cf35fb77 print macOS version in CI (#12705) 2025-10-15 15:05:33 -04:00
Daniel
d65bd669f8 update tiny torch backend hook (#12575)
* update the backend to fix torch deprecation warning

* use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients

* fix indentation

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-15 14:02:33 -04:00
nimlgen
db5ae846aa nv: do not use va_addr for cpu accesses (#12697)
* nv: do not use va_addr for cpu accesses

* mypy
2025-10-15 22:48:12 +08:00
nimlgen
3ab23af829 nv: copy prog with copyin (#12701)
* nv: copy prog with copyin

* to bytes

* fix test
2025-10-15 22:48:01 +08:00
nimlgen
fafbf3daea memory: reserve ptable (#12702) 2025-10-15 22:47:50 +08:00
George Hotz
85a907605c hotfix: only 20 steps of beautiful_mnist_torch, some CI machines are slow 2025-10-15 22:29:34 +08:00
Christopher Milan
e1996d358c use RTLD_GLOBAL on macos (#12699) 2025-10-15 22:24:50 +08:00
chenyu
312c622d35 support None in pad_to and shrink_to (#12700) 2025-10-15 09:25:31 -04:00
George Hotz
612e3d6143 replace mop arg with vectorized index (#12695)
* replace mop arg with vectorized index

* tests passing

* better viz

* no compile4
2025-10-15 20:50:06 +08:00
wozeparrot
9ec4c06d7d feat: one request per device (#12698) 2025-10-15 05:22:07 -07:00
Sieds Lykles
99aa3bd5f9 reduce collapse reduce only the cut range (#12687) 2025-10-15 13:57:41 +02:00
Sieds Lykles
91ac4f1f92 late merging of where and load (#12694) 2025-10-15 13:33:06 +02:00
qazal
768dc952de viz ui cleanups / renaming (#12691)
* better viz names

* delete unused

* don't use opacity, it's multiplicative

* keep styles

* scrollbar coloring

* pyrender doesn't work here

beautiful_mnist r_64_16_32_36@lower all index dtypes
2025-10-15 18:40:22 +08:00
chenyu
2e50ed0767 increase timeout of resnet cron (#12693)
does not finish in 6 hours now
2025-10-15 06:08:58 -04:00
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
qazal
f0268d13f6 cleanup viz server (#12688) 2025-10-15 15:58:36 +08:00
nimlgen
aa81bde150 amd: usb4/thunderbolt on macs (#12641)
* tbgpu

* works

* cleaner

* this

* zero size

* h

* fix

* simpler

* prio over usb

* c

* not needed

* linter

* this way

* mappings

* mypy

* mypy

* mypy 2

* nn
2025-10-15 13:02:01 +08:00
George Hotz
236c4590c3 use margs as intermediate for new style mops (#12686)
* use marg to prepare for movement op change

* clean up forced reshape

* move marg

* more marg

* more
2025-10-15 12:43:00 +08:00
qazal
7597e1dcac pyrender in viz (#12682)
* pyrender in viz

* keep profile still print_tree

* keep special in render
2025-10-15 11:53:30 +08:00
qazal
60e03eec37 viz: add View Program option (#12683) 2025-10-15 11:37:51 +08:00
George Hotz
a59439d013 use UOp.shape property instead of UOp.st (#12664)
* work on shape property

* reshape causing issues

* more mops

* all mops

* need to cache it

* _shape is like _device

* mostly works

* shape is good

* const uses _shape

* fix tests

* size doesn't use st

* close

* test is broken

* one less st

* hack for 3 op assign

* oops, i didn't mean to change that

* support emulate in the NullDevice

* reproed failure in emulation

* fix wmma
2025-10-15 10:01:34 +08:00
chenyu
89df6f611d reenable sdxl mac benchmark (#12680)
also updated faster sd step times
2025-10-14 17:36:17 -04:00
chenyu
d25ceffe8d update padto opts tests (#12679) 2025-10-14 17:00:42 -04:00
chenyu
e8380968f2 add venv_sd_mlperf to gitignore (#12676)
training stable diffusion stuff
2025-10-14 12:51:36 -04:00
wozeparrot
f228c03f9f fetch raid from cloud (#10799)
* feat: initial tinyfs device

* feat: don't allow compute on tinyfs device

* feat: tensor helpers to load and store

* feat: bufferview for tinyfs

* fix: keep copy sizes correct

* fix: recv large

* clean: unneeded

* feat: comment

* clean: unneeded

* clean: remove

* clean: remove

* feat: get request tag

* feat: rename to cloud

* feat: send request_id

* feat: start computing tree

* feat: compute store tree on this side

* feat: jank chunked load

* feat: more debugging

* feat: rename to just load and store

* feat: correct chunk count

* fix: fix load for < 1mb

* feat: comments

* feat: don't truncate on block devices

* feat: better way of testing block device

* feat: don't need to pad that much

* feat: connect to nodes directly on load

* feat: cache connections

* feat: don't hard code chunk size

* feat: close mmap when closing file handle

* feat: don't overwrite stuff on disk if storing from disk

* clean: debug print

* fix: close mmap

* feat: await workers

* feat: fast copy from tinyfs to disk

* feat: don't copy to device on last

* feat: use single socket per device

* feat: raid in tinyfs

* clean: remove import

* clean: type

* feat: maintain single event loop

* feat: lower worker count

* feat: use connection pool

* feat: fetch mapping in its own process

* fix: release lock

* feat: don't fetch if exists

* feat: req id only on stores

* feat: always fetch

* fix: rangeify

* feat: allow specifying raid root

* fix: dealloc buffer

* feat: start support non 0 offset

* clean: use cleaner

* feat: don't pass to threadpool

* clean: typing
2025-10-14 07:53:55 -07:00
chenyu
70dd297a05 BS=96 for bert (#12675)
96 trains fine now
2025-10-14 09:07:43 -04:00
Sieds Lykles
852d80dff9 better where on load folding (#12651)
* move where clauses to load

* shorten line

* drop clauses if they are duplicated

* add rule for swapped where branch

* where on ungated load

* dont move clause if load is in the clause

* parse_valid returns None

* no data dependent branches

* fix rule

* enable swapped rule

* remove those
2025-10-14 13:30:47 +02:00
nimlgen
c7e63601fd gfx1200 tc for AMD_LLVM (#12673) 2025-10-14 19:17:48 +08:00
George Hotz
db4a359374 fix up some slow tests that launch python (#12672)
* fix up some slow tests that launch python

* svd nonfull in parallel

* split test_advancedindex
2025-10-14 19:13:55 +08:00
nimlgen
4918c827c2 amd: lib_gpu does not need cpu_access (#12670) 2025-10-14 18:34:34 +08:00
nimlgen
0c9d47deab hcq: add alignment to kernargs (#12669) 2025-10-14 18:33:12 +08:00
qazal
d3bfcd3277 minor patches for SQTT over usb on gfx12 (#12627)
* disable cpu_access in the sqtt buffer allocation

not sure if this is required, it results in a very slow call to
pcie_mem_write over USB GPU, removing it worked fine.

* fix itrace_se_mask on gfx12

on gfx11 it gave 6 se, on gfx12 this value is 2 so no instructions were
traced.

* Revert "fix itrace_se_mask on gfx12"

This reverts commit 0644adbcd1.
2025-10-14 18:07:46 +08:00
Sieds Lykles
1e6e5a0efd parse_valid returns None instead of raising (#12663)
* parse_valid returns None

* change there too
2025-10-14 11:57:38 +02:00
qazal
471bd30d16 cleanup viz/serve.py (#12665)
* use load_pickle

* update comment
2025-10-14 17:50:39 +08:00
George Hotz
fb61f3519f remove assign contiguous hack (#12659)
* remove assign contiguous hack

* remove bad contiguous usage in torch backend

* assign
2025-10-14 16:42:14 +08:00
George Hotz
30ee7c4c26 cleanup Device usage in Tensor (#12662) 2025-10-14 16:22:22 +08:00
Sieds Lykles
e06cbfcb8a combine pm_drop_and_clauses (#12660)
* combine those

* wino kernels decreased
2025-10-14 10:09:41 +02:00
George Hotz
84d4589ed4 remove pylint from pre-commit and CI (#12658)
* remove pylint from pre-commit and CI

* multidevice test is fast

* faster pre-commit

* 8 is faster than 4

* better name

* how did that typecheck?
2025-10-14 15:39:59 +08:00
qazal
8ecaf839e2 cleanup UOp tracing [pr] (#12657) 2025-10-14 14:50:59 +08:00
George Hotz
b9eb5b5d49 clean up the LLM tokenizer (#12653)
* clean up the LLM tokenizer

* simple tokenizer is actually simple

* ugh write good code
2025-10-14 14:22:01 +08:00
qazal
a9ef93176f viz: add colored text helper (#12654) 2025-10-14 13:05:26 +08:00
George Hotz
ecdc7539a2 add typing to MathTraits (#12650)
* add typing to MathTraits

* fix assign
2025-10-14 12:35:20 +08:00
qazal
9bf032de69 viz: keep focused shape in view (#12648) 2025-10-14 10:49:08 +08:00
chenyu
77b5e6774e fix bert training config (#12647)
FREE_INTERMEDIATE=0 REWRITE_STACK_LIMIT=500000
2025-10-13 15:03:47 -04:00
nimlgen
f1041dc0ac pylint 4.0.0 (#12642)
* cpu: fix spacing

* fix pylint

* fix pylint

* pylint 4.0.0

* lambda

* keep eval for now

* im so sorry
2025-10-13 23:28:36 +08:00
wozeparrot
47e0c43976 feat: Tensor.{load, store} (#12629) 2025-10-13 08:04:41 -07:00
chenyu
0f776c6e46 examples/mlperf/training_submission_v6.0 (#12644)
copied from v5.1
2025-10-13 09:58:25 -04:00
Sieds Lykles
e0139fafc1 UOp symbolic tests use eval to check against string (#12643) 2025-10-13 14:19:42 +02:00