Commit Graph

10628 Commits

Author SHA1 Message Date
geohotstan
5d209ee7ec onnx helper intermediate node output validation (#12740)
* start

* update comments

* good

* add comments and better printing

* done
2025-10-16 11:17:47 -04:00
Christopher Milan
bce2bc0465 Revert "use RTLD_GLOBAL on macos" (#12738)
This reverts commit 89fe3e574d.
2025-10-16 10:07:21 -04:00
chenyu
f34f26bca0 fix gpt2 with benchmark (#12736)
`CPU=1 python3 examples/gpt2.py --benchmark 128` works now
2025-10-16 09:55:20 -04:00
Sieds Lykles
55db1b0e0e reduce where that is cut from two sides (#12733)
* better rule

* correct pattern

* shorten line
2025-10-16 15:25:15 +02:00
nimlgen
cf9baeea61 Revert "nv: check if jitlink is avail (#12731)" (#12735)
This reverts commit a069a45d14.
2025-10-16 20:41:49 +08:00
George Hotz
8be7844b2e use apply uop for assign to fix assign metadata (#12732)
* use apply uop for assign

* fix metadata for assign

* fix backward metadata

* those aren't real tests
2025-10-16 20:34:12 +08:00
nimlgen
3aa2277b8f nv: usb4 (#12696)
* hackish

* prog

* match

* l

* simpler

* refactor

* not osx

* apple things

* tiny changes

* fix mask

* match fix

* nn
2025-10-16 20:11:19 +08:00
nimlgen
a069a45d14 nv: check if jitlink is avail (#12731)
2025-10-16 19:58:50 +08:00
George Hotz
a498ec9c18 cleanup names of postrange + fast FUSE_OPTIM (#12730)
* cleanup names of postrange

* make FUSE_OPTIM not slow

* delete junk in def r
2025-10-16 19:38:31 +08:00
Sieds Lykles
8f740e07ff no broadcasting/vectors in reduce collapse (#12729)
2025-10-16 13:22:57 +02:00
qazal
533f18b22c viz: add trace data for inflight buffers (#12728)
* viz: add trace data for inflight buffers

* add test_inflight_buf

* temp stores the keys

* update tests / use Tensor.ones
2025-10-16 19:15:03 +08:00
George Hotz
af4479c169 faster stable diffusion load (#12725)
* faster stable diffusion load

* failing tests
2025-10-16 18:31:59 +08:00
nimlgen
e7c057d5dc system: alloc_sysmem return view (#12724)
* system: alloc_sysmem return view

* e
2025-10-16 17:55:01 +08:00
nimlgen
b86a33a312 ptx: support bw (#12722)
2025-10-16 15:38:08 +08:00
nimlgen
b8cd66c7a2 nv: support all gb20x and small bar (#12721)
2025-10-16 15:37:54 +08:00
George Hotz
1d1e1d9d88 delete the ShapeTracker (#12720)
* delete the ShapeTracker

* fix tests

* fix more

* fix gc test
2025-10-16 15:36:22 +08:00
George Hotz
592e86f6f5 remove UOp.st (#12716)
* remove UOp.st

* fix tests

* torch backend disable
2025-10-16 14:44:09 +08:00
wozeparrot
cc2dfe22f5 tinyfs: fetch file utility (#12719)
2025-10-15 23:38:56 -07:00
nimlgen
3ed543f956 system: reorder funcs + barrier on macos (#12714)
2025-10-16 14:38:01 +08:00
qazal
b77bdbbc62 viz: count unpickle in server startup time (#12715)
* viz: count unpickle in server startup time

* type checking
2025-10-16 13:07:46 +08:00
George Hotz
7c19db00f1 remove st from jit/split_reduceop (#12713)
* remove st from jit

* fix by merging reshapes

* no st usage in rangeify

* hmm, stop early works

* fix speed regressions
2025-10-16 12:50:58 +08:00
qazal
069177c1be trace buffer producer and consumers (#12639)
* trace buffer producer and consumers

* work

* generic colored util

* fix batched

* basic clicking works

* generic javascript that works for producer and consumers

* keep focused shape

* idle time

* timings for producer and consumers dedup

* from sd test

* tiny cleanups

* timeline

* work

* up to here

* assert

* list it

* work
2025-10-16 11:11:31 +08:00
George Hotz
4a151e7533 make xcode signing happy, waiting for entitlement (#12712)
2025-10-16 10:20:34 +08:00
chenyu
c3278e5622 clean up old tests (#12708)
2025-10-15 17:53:17 -04:00
chenyu
b8cf35fb77 print macOS version in CI (#12705)
2025-10-15 15:05:33 -04:00
Daniel
d65bd669f8 update tiny torch backend hook (#12575)
* update the backend to fix torch deprecation warning

* use param_hook to avoid full backward hook needlessly firing on inputs which do not require gradients

* fix indentation

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-15 14:02:33 -04:00
nimlgen
db5ae846aa nv: do not use va_addr for cpu accesses (#12697)
* nv: do not use va_addr for cpu accesses

* mypy
2025-10-15 22:48:12 +08:00
nimlgen
3ab23af829 nv: copy prog with copyin (#12701)
* nv: copy prog with copyin

* to bytes

* fix test
2025-10-15 22:48:01 +08:00
nimlgen
fafbf3daea memory: reserve ptable (#12702)
2025-10-15 22:47:50 +08:00
George Hotz
85a907605c hotfix: only 20 steps of beautiful_mnist_torch, some CI machines are slow
2025-10-15 22:29:34 +08:00
Christopher Milan
e1996d358c use RTLD_GLOBAL on macos (#12699)
2025-10-15 22:24:50 +08:00
chenyu
312c622d35 support None in pad_to and shrink_to (#12700)
2025-10-15 09:25:31 -04:00
George Hotz
612e3d6143 replace mop arg with vectorized index (#12695)
* replace mop arg with vectorized index

* tests passing

* better viz

* no compile4
2025-10-15 20:50:06 +08:00
wozeparrot
9ec4c06d7d feat: one request per device (#12698)
2025-10-15 05:22:07 -07:00
Sieds Lykles
99aa3bd5f9 reduce collapse reduce only the cut range (#12687)
2025-10-15 13:57:41 +02:00
Sieds Lykles
91ac4f1f92 late merging of where and load (#12694)
2025-10-15 13:33:06 +02:00
qazal
768dc952de viz ui cleanups / renaming (#12691)
* better viz names

* delete unused

* don't use opacity, it's multiplicative

* keep styles

* scrollbar coloring

* pyrender doesn't work here

beautiful_mnist r_64_16_32_36@lower all index dtypes
2025-10-15 18:40:22 +08:00
chenyu
2e50ed0767 increase timeout of resnet cron (#12693)
does not finish in 6 hours now
2025-10-15 06:08:58 -04:00
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
qazal
f0268d13f6 cleanup viz server (#12688)
2025-10-15 15:58:36 +08:00
nimlgen
aa81bde150 amd: usb4/thunderbolt on macs (#12641)
* tbgpu

* works

* cleaner

* this

* zero size

* h

* fix

* simpler

* prio over usb

* c

* not needed

* linter

* this way

* mappings

* mypy

* mypy

* mypy 2

* nn
2025-10-15 13:02:01 +08:00
George Hotz
236c4590c3 use margs as intermediate for new style mops (#12686)
* use marg to prepare for movement op change

* clean up forced reshape

* move marg

* more marg

* more
2025-10-15 12:43:00 +08:00
qazal
7597e1dcac pyrender in viz (#12682)
* pyrender in viz

* keep profile still print_tree

* keep special in render
2025-10-15 11:53:30 +08:00
qazal
60e03eec37 viz: add View Program option (#12683)
2025-10-15 11:37:51 +08:00
George Hotz
a59439d013 use UOp.shape property instead of UOp.st (#12664)
* work on shape property

* reshape causing issues

* more mops

* all mops

* need to cache it

* _shape is like _device

* mostly works

* shape is good

* const uses _shape

* fix tests

* size doesn't use st

* close

* test is broken

* one less st

* hack for 3 op assign

* oops, i didn't mean to change that

* support emulate in the NullDevice

* reproed failure in emulation

* fix wmma
2025-10-15 10:01:34 +08:00
chenyu
89df6f611d reenable sdxl mac benchmark (#12680)
also updated faster sd step times
2025-10-14 17:36:17 -04:00
chenyu
d25ceffe8d update padto opts tests (#12679)
2025-10-14 17:00:42 -04:00
chenyu
e8380968f2 add venv_sd_mlperf to gitignore (#12676)
training stable diffusion stuff
2025-10-14 12:51:36 -04:00
wozeparrot
f228c03f9f fetch raid from cloud (#10799)
* feat: initial tinyfs device

* feat: don't allow compute on tinyfs device

* feat: tensor helpers to load and store

* feat: bufferview for tinyfs

* fix: keep copy sizes correct

* fix: recv large

* clean: unneeded

* feat: comment

* clean: unneeded

* clean: remove

* clean: remove

* feat: get request tag

* feat: rename to cloud

* feat: send request_id

* feat: start computing tree

* feat: compute store tree on this side

* feat: jank chunked load

* feat: more debugging

* feat: rename to just load and store

* feat: correct chunk count

* fix: fix load for < 1mb

* feat: comments

* feat: don't truncate on block devices

* feat: better way of testing block device

* feat: don't need to pad that much

* feat: connect to nodes directly on load

* feat: cache connections

* feat: don't hard code chunk size

* feat: close mmap when closing file handle

* feat: don't overwrite stuff on disk if storing from disk

* clean: debug print

* fix: close mmap

* feat: await workers

* feat: fast copy from tinyfs to disk

* feat: don't copy to device on last

* feat: use single socket per device

* feat: raid in tinyfs

* clean: remove import

* clean: type

* feat: maintain single event loop

* feat: lower worker count

* feat: use connection pool

* feat: fetch mapping in its own process

* fix: release lock

* feat: don't fetch if exists

* feat: req id only on stores

* feat: always fetch

* fix: rangeify

* feat: allow specifying raid root

* fix: dealloc buffer

* feat: start support non 0 offset

* clean: use cleaner

* feat: don't pass to threadpool

* clean: typing
2025-10-14 07:53:55 -07:00
chenyu
70dd297a05 BS=96 for bert (#12675)
96 trains fine now
2025-10-14 09:07:43 -04:00