Commit Graph

11094 Commits

Author SHA1 Message Date
Sieds Lykles
99aa3bd5f9 reduce collapse reduce only the cut range (#12687) 2025-10-15 13:57:41 +02:00
Sieds Lykles
91ac4f1f92 late merging of where and load (#12694) 2025-10-15 13:33:06 +02:00
qazal
768dc952de viz ui cleanups / renaming (#12691)
* better viz names

* delete unused

* don't use opacity, it's multiplicative

* keep styles

* scrollbar coloring

* pyrender doesn't work here

beautiful_mnist r_64_16_32_36@lower all index dtypes
2025-10-15 18:40:22 +08:00
chenyu
2e50ed0767 increase timeout of resnet cron (#12693)
does not finish in 6 hours now
2025-10-15 06:08:58 -04:00
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
qazal
f0268d13f6 cleanup viz server (#12688) 2025-10-15 15:58:36 +08:00
nimlgen
aa81bde150 amd: usb4/thunderbolt on macs (#12641)
* tbgpu

* works

* cleaner

* this

* zero size

* h

* fix

* simpler

* prio over usb

* c

* not needed

* linter

* this way

* mappings

* mypy

* mypy

* mypy 2

* nn
2025-10-15 13:02:01 +08:00
George Hotz
236c4590c3 use margs as intermediate for new style mops (#12686)
* use marg to prepare for movement op change

* clean up forced reshape

* move marg

* more marg

* more
2025-10-15 12:43:00 +08:00
qazal
7597e1dcac pyrender in viz (#12682)
* pyrender in viz

* keep profile still print_tree

* keep special in render
2025-10-15 11:53:30 +08:00
qazal
60e03eec37 viz: add View Program option (#12683) 2025-10-15 11:37:51 +08:00
George Hotz
a59439d013 use UOp.shape property instead of UOp.st (#12664)
* work on shape property

* reshape causing issues

* more mops

* all mops

* need to cache it

* _shape is like _device

* mostly works

* shape is good

* const uses _shape

* fix tests

* size doesn't use st

* close

* test is broken

* one less st

* hack for 3 op assign

* oops, i didn't mean to change that

* support emulate in the NullDevice

* reproed failure in emulation

* fix wmma
2025-10-15 10:01:34 +08:00
chenyu
89df6f611d reenable sdxl mac benchmark (#12680)
also updated faster sd step times
2025-10-14 17:36:17 -04:00
chenyu
d25ceffe8d update padto opts tests (#12679) 2025-10-14 17:00:42 -04:00
chenyu
e8380968f2 add venv_sd_mlperf to gitignore (#12676)
training stable diffusion stuff
2025-10-14 12:51:36 -04:00
wozeparrot
f228c03f9f fetch raid from cloud (#10799)
* feat: initial tinyfs device

* feat: don't allow compute on tinyfs device

* feat: tensor helpers to load and store

* feat: bufferview for tinyfs

* fix: keep copy sizes correct

* fix: recv large

* clean: unneeded

* feat: comment

* clean: unneeded

* clean: remove

* clean: remove

* feat: get request tag

* feat: rename to cloud

* feat: send request_id

* feat: start computing tree

* feat: compute store tree on this side

* feat: jank chunked load

* feat: more debugging

* feat: rename to just load and store

* feat: correct chunk count

* fix: fix load for < 1mb

* feat: comments

* feat: don't truncate on block devices

* feat: better way of testing block device

* feat: don't need to pad that much

* feat: connect to nodes directly on load

* feat: cache connections

* feat: don't hard code chunk size

* feat: close mmap when closing file handle

* feat: don't overwrite stuff on disk if storing from disk

* clean: debug print

* fix: close mmap

* feat: await workers

* feat: fast copy from tinyfs to disk

* feat: don't copy to device on last

* feat: use single socket per device

* feat: raid in tinyfs

* clean: remove import

* clean: type

* feat: maintain single event loop

* feat: lower worker count

* feat: use connection pool

* feat: fetch mapping in its own process

* fix: release lock

* feat: don't fetch if exists

* feat: req id only on stores

* feat: always fetch

* fix: rangeify

* feat: allow specifying raid root

* fix: dealloc buffer

* feat: start support non 0 offset

* clean: use cleaner

* feat: don't pass to threadpool

* clean: typing
2025-10-14 07:53:55 -07:00
chenyu
70dd297a05 BS=96 for bert (#12675)
96 trains fine now
2025-10-14 09:07:43 -04:00
Sieds Lykles
852d80dff9 better where on load folding (#12651)
* move where clauses to load

* shorten line

* drop clauses if they are duplicated

* add rule for swapped where branch

* where on ungated load

* dont move clause if load is in the clause

* parse_valid returns None

* no data dependent branches

* fix rule

* enable swapped rule

* remove those
2025-10-14 13:30:47 +02:00
nimlgen
c7e63601fd gfx1200 tc for AMD_LLVM (#12673) 2025-10-14 19:17:48 +08:00
George Hotz
db4a359374 fix up some slow tests that launch python (#12672)
* fix up some slow tests that launch python

* svd nonfull in parallel

* split test_advancedindex
2025-10-14 19:13:55 +08:00
nimlgen
4918c827c2 amd: lib_gpu does not need cpu_access (#12670) 2025-10-14 18:34:34 +08:00
nimlgen
0c9d47deab hcq: add alignment to kernargs (#12669) 2025-10-14 18:33:12 +08:00
qazal
d3bfcd3277 minor patches for SQTT over usb on gfx12 (#12627)
* disable cpu_access in the sqtt buffer allocation

not sure if this is required, it results in a very slow call to
pcie_mem_write over USB GPU, removing it worked fine.

* fix itrace_se_mask on gfx12

on gfx11 it gave 6 se, on gfx12 this value is 2 so no instructions were
traced.

* Revert "fix itrace_se_mask on gfx12"

This reverts commit 0644adbcd1.
2025-10-14 18:07:46 +08:00
Sieds Lykles
1e6e5a0efd parse_valid returns None instead of raising (#12663)
* parse_valid returns None

* change there too
2025-10-14 11:57:38 +02:00
qazal
471bd30d16 cleanup viz/serve.py (#12665)
* use load_pickle

* update comment
2025-10-14 17:50:39 +08:00
George Hotz
fb61f3519f remove assign contiguous hack (#12659)
* remove assign contiguous hack

* remove bad contiguous usage in torch backend

* assign
2025-10-14 16:42:14 +08:00
George Hotz
30ee7c4c26 cleanup Device usage in Tensor (#12662) 2025-10-14 16:22:22 +08:00
Sieds Lykles
e06cbfcb8a combine pm_drop_and_clauses (#12660)
* combine those

* wino kernels decreased
2025-10-14 10:09:41 +02:00
George Hotz
84d4589ed4 remove pylint from pre-commit and CI (#12658)
* remove pylint from pre-commit and CI

* multidevice test is fast

* faster pre-commit

* 8 is faster than 4

* better name

* how did that typecheck?
2025-10-14 15:39:59 +08:00
qazal
8ecaf839e2 cleanup UOp tracing [pr] (#12657) 2025-10-14 14:50:59 +08:00
George Hotz
b9eb5b5d49 clean up the LLM tokenizer (#12653)
* clean up the LLM tokenizer

* simple tokenizer is actually simple

* ugh write good code
2025-10-14 14:22:01 +08:00
qazal
a9ef93176f viz: add colored text helper (#12654) 2025-10-14 13:05:26 +08:00
George Hotz
ecdc7539a2 add typing to MathTraits (#12650)
* add typing to MathTraits

* fix assign
2025-10-14 12:35:20 +08:00
qazal
9bf032de69 viz: keep focused shape in view (#12648) 2025-10-14 10:49:08 +08:00
chenyu
77b5e6774e fix bert training config (#12647)
FREE_INTERMEDIATE=0 REWRITE_STACK_LIMIT=500000
2025-10-13 15:03:47 -04:00
nimlgen
f1041dc0ac pylint 4.0.0 (#12642)
* cpu: fix spacing

* fix pylint

* fix pylint

* pylint 4.0.0

* lambda

* keep eval for now

* im so sorry
2025-10-13 23:28:36 +08:00
wozeparrot
47e0c43976 feat: Tensor.{load, store} (#12629) 2025-10-13 08:04:41 -07:00
chenyu
0f776c6e46 examples/mlperf/training_submission_v6.0 (#12644)
copied from v5.1
2025-10-13 09:58:25 -04:00
Sieds Lykles
e0139fafc1 UOp symbolic tests use eval to check against string (#12643) 2025-10-13 14:19:42 +02:00
b1tg
218225e8d0 pylint error (#12630)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-10-13 05:05:12 -07:00
nimlgen
9096d7cc2e amd: support for rx9060 (#12640) 2025-10-13 19:44:15 +08:00
qazal
066d25f5fb refactor to trace_num property in buffers (#12638) 2025-10-13 18:06:55 +08:00
qazal
cd6aeebfee sqtt: osx decoder installer (#12637) 2025-10-13 17:26:12 +08:00
Sieds Lykles
e537e895b1 drop unused invalid conditions (#12635)
* drop where conditions if the ranges are not used inside the index

* remove allow_any_len
2025-10-13 10:52:21 +02:00
wozeparrot
9ab06dffad hotfix: block from env (#12628) 2025-10-12 08:07:32 -07:00
wozeparrot
12435a2dab actual tinyfs device (#12620) 2025-10-12 07:51:17 -07:00
chenyu
8f5f57c7d9 smaller CNT fuzz shapetracker (#12626) 2025-10-12 08:52:30 -04:00
George Hotz
1ecf403294 cleanup long lines [pr] (#12623)
* cleanup long lines

* more

* a few more

* all noqa fixed

* fix amd + cuda

* clean that up
2025-10-12 20:18:05 +08:00
qazal
fd51ecf983 process_replay for get_rangeify_map (#12624) 2025-10-12 15:14:40 +03:00
qazal
b5afa3848e viz: fix memory graph total nbytes (#12622)
* viz: fix memory graph total nbytes

* post increment

* simple regression test

* loop with markers + slightly off text baseline

* cpu events clear
2025-10-12 14:32:46 +03:00
nimlgen
822eab057f cpu: respect taskset + allow all cores (#12619)
* cpu: account taskset + allow all cores

* spaces
2025-10-12 14:31:40 +08:00