767 Commits

Author SHA1 Message Date
Christopher Milan
0aabc1e938 Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
George Hotz
a59439d013 use UOp.shape property instead of UOp.st (#12664)
* work on shape property

* reshape causing issues

* more mops

* all mops

* need to cache it

* _shape is like _device

* mostly works

* shape is good

* const uses _shape

* fix tests

* size doesn't use st

* close

* test is broken

* one less st

* hack for 3 op assign

* oops, i didn't mean to change that

* support emulate in the NullDevice

* reproed failure in emulation

* fix wmma
2025-10-15 10:01:34 +08:00
George Hotz
84d4589ed4 remove pylint from pre-commit and CI (#12658)
* remove pylint from pre-commit and CI

* multidevice test is fast

* faster pre-commit

* 8 is faster than 4

* better name

* how did that typecheck?
2025-10-14 15:39:59 +08:00
Sieds Lykles
e537e895b1 drop unused invalid conditions (#12635)
* drop where conditions if the ranges are not used inside the index

* remove allow_any_len
2025-10-13 10:52:21 +02:00
chenyu
8f5f57c7d9 smaller CNT fuzz shapetracker (#12626)
2025-10-12 08:52:30 -04:00
Sieds Lykles
772a8dfe31 reshape uses valid when simplifying (#12597)
* reshape uses valid when simplifying

* try with IGNORE_OOB=0

* is it this test?

* skipif gpuocelot
2025-10-11 17:02:54 +02:00
Sieds Lykles
cbdc13279d fix openpilot gated reads (#12570)
* fix gated image counts

* slice correctly
2025-10-10 04:52:57 +02:00
chenyu
a0cbbc35ad remove LLAMA_LAYERS in ci (#12562)
2025-10-09 04:46:41 -04:00
nimlgen
658c566e22 vars in gated_read_image_count (#12486)
* vars in gated_read_image_count

* nc
2025-10-09 14:54:15 +08:00
chenyu
942022c309 smaller LLAMA_LAYER in Test llama 3 training (#12516)
very slow now
2025-10-08 05:10:51 -04:00
chenyu
e701106a64 remove FUSE_ARANGE (#12511)
it was the default already
2025-10-08 04:54:07 -04:00
chenyu
da1f46ff3f remove RANGEIFY specific test jobs (#12507)
2025-10-08 04:12:04 -04:00
George Hotz
403fdfcfd4 check spec in test, cleanup vectorize render (#12484)
2025-10-07 17:05:50 +08:00
chenyu
8ad5f9e74f skip slow benchmarks (#12481)
* skip slow benchmarks

padded tc is already slow, rest are slow with rangeify (correct if run locally)

* relax more
2025-10-07 03:28:56 -04:00
chenyu
1823a5043f don't check MAX_BUFFER_SIZE on NULL (#12461)
2025-10-05 22:09:29 -04:00
chenyu
74b04f7dca test beautiful_mnist_multigpu (#12455)
* test beautiful_mnist_multigpu

another example that fails with RANGEIFY

* now i remember

* MAX_BUFFER_SIZE=0
2025-10-05 08:45:01 -04:00
chenyu
98163832e4 update RANGEIFY test_cast_padded (#12421)
* update RANGEIFY test_cast_padded

* update test
2025-10-02 04:37:35 -04:00
chenyu
37beef6de3 add null bert training test in ci (#12420)
fails with RANGEIFY `RuntimeError: children not making progress`
2025-10-02 04:05:19 -04:00
b1tg
ec177c80c2 rangeify: fix test_where_fold (llvm) (#12416)
* rangeify: fix test_where_fold (AMD_LLVM)

* rm comment
2025-10-02 02:57:49 -04:00
qazal
d1c868f990 fix limit_bufs with multi (#12414)
2025-10-02 05:51:56 +03:00
qazal
5b649616ff rangeify: detect and assert cycles (#12405)
* rangeify: assert cycles

* rng=2

* any
2025-10-02 03:39:43 +03:00
b1tg
ac3d457d5e rangeify: TestReduceOpsConstFolding (#12397)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-10-01 17:58:19 +08:00
chenyu
6c95b1f39d explicitly set device for CI unit test (#12399)
2025-10-01 05:16:54 -04:00
chenyu
689ab9151b more RANGEIFY tests (#12393)
would have caught the load alt regression without adding too many tests
2025-10-01 03:43:58 -04:00
b1tg
154d114364 rangeify: fix abstractions2.py (#12386)
* rangeify: fix abstractions2.py

* tests

* lint

* only abstractions2

* base
2025-10-01 09:58:56 +03:00
b1tg
da52006bde rangeify: fix test_scatter_reduce (#12380)
* rangeify: fix test_scatter_reduce

* ext_vector_type

* set alignment=1 on boolean
2025-09-30 23:26:36 -04:00
chenyu
8def8145e4 ALLOWED_KERNEL_COUNT openpilot 0.9.4 with RANGEIFY (#12381)
2025-09-30 22:58:59 -04:00
qazal
26247573e1 rangeify multi tests on gpu (#12376)
* rangeify multi tests on gpu

* fix limit_bufs
2025-10-01 04:53:04 +03:00
chenyu
b4a4817c9c fix rangeigy test_linalg (#12365)
2025-09-30 06:28:35 -04:00
b1tg
c9ef5d8fe5 rangeify: fix test_tensor_index_overflow (CPU_LLVM=1) (#12362)
* rangeify: fix test_tensor_index_overflow (CPU_LLVM=1)

* add test

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-30 05:55:15 -04:00
qazal
6a56d3c859 rangeify: only test correctness in multi (#12339)
* work

* more work

* back here

* skip tests

* work
2025-09-30 09:55:59 +03:00
George Hotz
ab6b0d3a21 enable cleanup_dead_axes (#12351)
* enable cleanup_dead_axes

* don't mess with user contig

* correct tag behavior

* double reshape isn't correct

* block on assign too

* skip messing with symbolic

* Fix tests

* disable RANGEIFY=2

* test w rangeify
2025-09-30 14:09:39 +08:00
qazal
2a7310ab59 rangeify: fix remaining multi correctness issue (#12354)
2025-09-30 08:08:27 +03:00
chenyu
881709cd33 don't skip rangeify test_instancenorm_3d (#12350)
seems fine now
2025-09-30 00:05:59 -04:00
hooved
39aae679e4 Support bfloat16 on NULL backend (#12340)
* add failing test

* move test

* only run test with NULL default

* add skip reason

* add fix
2025-09-30 00:02:30 -04:00
chenyu
af935e7d32 Revert "reduce const folding (#12344)" (#12349)
This reverts commit 8e508a9927.
2025-09-29 23:45:30 -04:00
qazal
05275c9ec3 rangeify: enable assign to mstack target (#12345)
2025-09-30 06:27:57 +03:00
chenyu
8e508a9927 reduce const folding (#12344)
2025-09-29 23:08:56 -04:00
qazal
32d69d07d7 rangeify: enable multitensor TestBatchNorm (#12342)
2025-09-30 06:05:00 +03:00
Sieds Lykles
c38f6ce140 unified_rewrite: use deque and dont add nodes to the stack multiple times (#12320)
* use deque instead of list

* increase ctx.progress and max stack_len

* add openpilot

* prevent placing uops on stack many times

* revert increasing ctx.progress and stack length limit

* dont block adding to the stack there

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-09-30 10:02:28 +08:00
hooved
c2689c505e Clip model updates for Stable Diffusion mlperf training (#12313)
* stable diffusion mlperf clip changes

* add clip tests

* set gelu as attribute

* add more tests

* factor out GPUS

* rerun CI

* add imports to if blocks

* remove unneeded axis

* add clip tests to CI

* move clip tests

* add deps, disable max buf size
2025-09-29 21:50:14 -04:00
qazal
250cb10e8f rangeify permuted assign (#12299)
* enable RANGEIFY=1 test_assign

* work

* rangeify=0 asserts this ast

* remove that

* beta test, it's correct though

* skip multi

* matches torch/np output

* memcopy without memcopy

* can remove this

* rangeify isn't silently wrong anymore

* diff cleanup

* use UOp toposort instead of global tags

* actual assert TestRangeifyAssign

* step

* work

* this isn't optimizing away now

* some todos

* test fusion schedule

* typo

* dedup idxs

* cleaner

* pre

* work

* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583 Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" (#12318)
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
6f1cf717de Bufferize early, fix "children not making progress" on big graphs (#12308)
* bufferize children early

* cleaner

* fix types

* lower number of reduceops

* test openpilot
2025-09-27 04:17:15 +02:00
qazal
8b2e0930d7 rangeify: enable passing multi test (#12301)
2025-09-26 08:31:13 +03:00
Sieds Lykles
74411984fc Rangeify IMAGE (#12304)
* add imagedtype to rangeify

* enable some image tests

* move the tests

* image upcast before locals

* add if statement

* rangeify image_dtype test

* decrease read_image count
2025-09-26 07:21:02 +02:00
chenyu
17cec8d645 RANGEIFY winograd test (#12297)
speed seems fine
2025-09-24 23:42:32 -04:00
qazal
38ecefaacb RANGEIFY=1 allreduce (#12260)
* ci

* extract mops

* work

* assert early

* port this?

* can realize shard

* allreduce passing

* notes

* better handling of shard

* err

* outerworld allreduce twice

* work

* don't tag movement ops

* don't tag movement ops

* delete old logic

* 19 failing + ram

* cleanup

* reset stuff

* simplest failing test

* diff

* test_ones

* allreduce work

* allreduce more work

* down to 22 failing tests

* port _device_num

* replace creates a new UOp here

* pour symbolic everywhere

* 7 failing

* focus on allreduce

* work

* cleanup

* more ci

* fix test_schedule_ring

* post index const shape

* much better

* diff cleanup
2025-09-24 18:13:08 +03:00
qazal
1400ce105f rangeify: fix sharding (#12288)
2025-09-24 14:33:56 +03:00
qazal
154c865966 rangeify: fix ram usage in multi (#12286)
2025-09-24 13:48:58 +03:00