Commit Graph

813 Commits

Author SHA1 Message Date
chenyu
efad567ebd ruff check whole examples/mlperf/ (#10979) 2025-06-25 12:57:48 -04:00
George Hotz
0f89660ce4 Revert "change clang -march flag to -mcpu on arm (#10841)" (#10942)
This reverts commit 897e42fd1b.
2025-06-23 16:48:28 -07:00
ttomsa
897e42fd1b change clang -march flag to -mcpu on arm (#10841)
* change clang -march flag to -mcpu with fp16 disassembly test

* fix

* add capstone to macos dependencies

* just check no cast in test

* rm import

* woops

* lets check

* move check

* llvm init before cpu check

* try this

* bump autogen llvm version

* also update libclang?

* revert

* add comment

* skip llvm test and add comment

* linter
2025-06-23 16:28:48 -07:00
George Hotz
ae4d2d71b4 bump line count to 14500 2025-06-23 15:32:27 -07:00
geohotstan
4ab7d792cc ONNX improve dtype fallback (#10800)
* fix

* add early verbose demo test

* is this how to write tests :s

* is definition drift even a thing? gemini says it is

* clean up

* better

* even better

* try add to CI

* doesn't work quite yet

* much more work to be done

* whoops

* partition the test heh

* skipif

* some nits for better names

* add webgpu test for onnxrunner

* fix reference links

* flush for now
2025-06-21 19:29:45 -04:00
George Hotz
e2907360b7 multi is one PM [pr] (#10838)
* multi is one PM [pr]

* disable flaky tests
2025-06-16 14:52:47 -07:00
uuuvn
18d936f981 Remote multihost (#10598) 2025-06-16 13:18:56 -07:00
George Hotz
27cf836958 split ocelot out for autogen, fix CI (#10819)
* split ocelot out for autogen, fix CI

* mac ocelot
2025-06-15 11:37:23 -07:00
chenyu
7d5c769c6b fix compile4 (#10797) 2025-06-12 22:28:56 -04:00
wozeparrot
53edd49a33 feat: bump to llvm20 (#10784) 2025-06-11 16:04:18 -07:00
George Hotz
9d0383634d bump cache and include full python version [pr] (#10768)
* bump cache and include full python version [pr]

* stupid windows

* really stupid windows
2025-06-10 15:07:30 -07:00
chenyu
612cdf5146 move fuzz_shape_ops to run with other fuzzer (#10767)
* move fuzz_shape_ops to run with other fuzzer

* don't skip CPU
2025-06-10 17:43:04 -04:00
chenyu
5e7ad70aae don't run linearize().uop tests in get_action_space test (#10766)
* don't run linearize().uop tests in get_action_space test

this part takes 2 minutes in CI and has nothing to do with action space. also not sure if the "for some reason" comment is still relevant

* -n=auto test/models
2025-06-10 17:23:53 -04:00
George Hotz
0fbf3f5554 Revert "Revert "Update autogen ci runner to ubuntu 24.04 (#10736)" (#10757)" (#10758)
This reverts commit a6dba9b9d9.
2025-06-10 09:32:27 -07:00
George Hotz
a6dba9b9d9 Revert "Update autogen ci runner to ubuntu 24.04 (#10736)" (#10757)
This reverts commit 1d15374c7a.
2025-06-10 09:31:51 -07:00
uuuvn
1d15374c7a Update autogen ci runner to ubuntu 24.04 (#10736)
For `kfd.AMDKFD_IOC_EXPORT_DMABUF`
2025-06-10 08:33:02 -07:00
George Hotz
acf72872b3 move view left to the outer graph prereqs + testing (#10725)
* move view left to the outer graph

* global view right

* dont need that one

* remove comment

* test kernelize

* simple

* split onnx, test sdxl null

* fix testing

* ugh, wrong one

* Update test.yml
2025-06-09 20:43:25 -07:00
George Hotz
ef58ab340a hotfix: remove n=auto from REMOTE=1 test 2025-06-09 09:19:36 -07:00
George Hotz
81b9c04574 move high level stuff to unit tests [pr] (#10708)
* move high level stuff to unit tests [pr]

* process replay on unit tests

* fix pr, less compute

* set omp num threads

* set 200MB buffer size limit

* delete junk

* fix tests

* faster

* move test_indexing to unit

* faster
2025-06-08 14:05:56 -07:00
George Hotz
4e2c3560b4 smaller tests are faster tests [pr] (#10704)
* remove del spam from CI

* more

* preconstruct default buffer spec

* ignore those errors

* check exception

* more exception check

* skip stuff

* smaller tests mean faster tests

* a few more
2025-06-08 10:54:19 -07:00
George Hotz
7ff175c022 cache a venv to avoid pip usage (#10689)
* try built in pip caching

* try venv

* export venv

* set VIRTUAL_ENV

* revert that

* venv key

* fix

* ci cache hit?

* fix windows
2025-06-07 20:13:41 -07:00
George Hotz
53ed64e133 ci speed work 1 (#10676)
* skip a few slow tests

* use a venv for python packages

* create venv

* no user, it's in venv

* ignore venv

* venv

* new cache key

* try that

* this

* version the python cache
2025-06-07 16:33:11 -07:00
qazal
7114b6ab31 viz browser tests (#10626)
* viz browser tests

* expect failure if js/ isn't included

* back green
2025-06-04 14:58:24 +03:00
George Hotz
ee12e801a3 optional fused optimizers (#10549)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh
2025-05-28 13:50:30 -07:00
Sieds Lykles
ae02a1e232 [bounty] Z3 symbolic fuzzer [pr] (#10514)
* First version, caught a bug?

* Nicely print failure to reproduce

* Remove that

* Put the assert back

* Change fuzzing to use testing_unit so it has z3

* Test key to match

* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* add condition

* add test

* update comment

* use Variable

* fuzzer using z3_renderer

* Cleaned up printing and debugging

* working new fuzzer

* change some comments and printing

* more formatting

* fuzz failures in separate file

* fix fstring

* more tests

* naming

* remove added line

* remove comment

* print number of skipped expressions

* use self.assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-28 16:28:37 -04:00
uuuvn
c29c46853f Very basic mock sqtt (#10512)
This mockgpu sqtt emulation will just ignore basically everything and end
up with a 0x1000 size trace full of zeroes, but just testing for things
like register rename is better than nothing i guess
2025-05-26 14:38:28 -07:00
b1tg
a1f64af92d ci: setup llvm for amdremote (#10507)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-25 21:52:27 -04:00
George Hotz
bf2a0907be gate the mockdsp behind MOCKDSP=1 [pr] (#10486) 2025-05-23 11:44:02 -07:00
George Hotz
f1fe1f93c1 hotfix: 14000 lines 2025-05-19 09:40:53 -07:00
uuuvn
0f825e12f2 Remote fixedvars (#10371)
* amd mockgpu graph support

For testing remote graph stuff (prompted by #10371) in ci

* Remote fixedvars

Somehow none of the existing tests failed when fixedvars were added; looking
into what to add as a regression test for this

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-18 09:57:13 -07:00
uuuvn
27c12be471 amd mockgpu graph support (#10385)
For testing remote graph stuff (prompted by #10371) in ci
2025-05-18 09:43:16 -07:00
qazal
0294bfe507 simpler can_pad (#10364)
* simpler can_pad [pr]

* 3 kernels

* tests

* less kernels
2025-05-18 10:00:07 +03:00
George Hotz
50181ab09f hotfix: bump to 13500 lines 2025-05-14 18:49:59 -07:00
George Hotz
ec46f658d7 openpilot llvm test [pr] (#10288) 2025-05-13 16:51:41 -07:00
uuuvn
ddff9857b8 Remote properties is a dataclass (#10283)
Not strictly required for anything, but soon there will be about 4 new
properties, and having it be a huge JSON just seems like bad taste.

It also seems right to not have a separate endpoint for this, just
`GetProperties` request that returns a repr of this similar to how
requests are sent in `BatchRequest`.

This will also make a switch to anything other than http much simpler
if it will be required for any reason, like just a tcp stream of
`BatchRequest`s
2025-05-13 11:56:58 -07:00
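The idea in ddff9857b8 (properties as a dataclass, sent as a repr rather than a JSON blob) could look roughly like the sketch below. This is not the code from the PR: the field names, the `parse_repr` helper, and the `ALLOWED` whitelist are all hypothetical, and only the names `GetProperties` and the "repr of a dataclass" approach come from the commit message.

```python
import ast
from dataclasses import dataclass

# Hypothetical property set; the real fields are not listed in the commit message.
@dataclass(frozen=True)
class RemoteProperties:
    device_count: int = 1
    supports_graph: bool = False
    compilers: tuple = ()

# Hypothetical request type: no payload, the server answers with repr(RemoteProperties(...)).
@dataclass(frozen=True)
class GetProperties:
    pass

# Only these class names may be reconstructed from received text.
ALLOWED = {"RemoteProperties": RemoteProperties, "GetProperties": GetProperties}

def parse_repr(text, allowed):
    """Rebuild a whitelisted dataclass from its repr without calling eval()."""
    node = ast.parse(text, mode="eval").body
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in allowed):
        raise ValueError(f"unexpected repr: {text!r}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only keyword arguments are expected")
    # literal_eval accepts AST nodes and only evaluates plain literals.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return allowed[node.func.id](**kwargs)

# Round trip: what a server might send, and what a client would rebuild.
props = RemoteProperties(device_count=2, supports_graph=True, compilers=("clang", "llvm"))
wire = repr(props)
restored = parse_repr(wire, ALLOWED)
```

Because the wire format is just text, the same repr could travel over HTTP or a plain TCP stream without changes, which is the flexibility the commit message points at.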
uuuvn
ba87eca0f1 Remote multi (basic) (#10269)
* Basic remote multi support

Simplest thing to be able to use remote with multiple gpus, very slow
because no transfers (copyin copyout for cross-device copies)

* tests
2025-05-13 09:52:47 -07:00
hooved
7b4f05fd00 Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
uuuvn
dba073e5c0 Less messy broken graph on paravirtualized metal workaround (#10182)
* Less messy broken graph on paravirtualized metal workaround

GitHub CI macOS runners use paravirtualized metal which is broken with
graph (some comments say that ICB in particular is broken but in my
testing it was fine sometimes, but other times hitting an assert inside
metal's code related to resources, so not sure).

> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.

This can be reproduced locally with any virtualization software (like utm)
that can create macOS VMs with apple's own virtualization framework.

* unused import
2025-05-06 20:41:02 +03:00
wozeparrot
10437904cd refactor: ops_cloud -> ops_remote [pr] (#10166) 2025-05-05 15:59:51 -07:00
George Hotz
e07d8b147a hotfix: don't OOM in the osx unit test 2025-05-04 17:53:55 -07:00
George Hotz
a0240d8c2b lil work on llvm speed (#10157)
* lil work on llvm speed

* llvm failing test

* 1e-4

* simpler failing test

* once is fine

* gpt suggests this syntax change

* bump that debug
2025-05-04 16:37:26 -07:00
George Hotz
fe0724eebf prebuild all rewrites [pr] (#10154)
* prebuild all rewrites [pr]

* fix that

* tests pass with linearizer
2025-05-04 13:01:18 -07:00
qazal
230a369708 remove some IGNORE_OOB [pr] (#10142)
* remove some IGNORE_OOB

* remove fuzz_schedule stuff

* test with global

* add for amd ci
2025-05-03 01:16:14 +03:00
nimlgen
16e5376ae8 line limit 12800 for usb (#10130) 2025-05-01 16:57:44 +03:00
George Hotz
ef011ff5f9 flip Ops.COPY order [pr] (#10122)
* flip Ops.COPY order [pr]

* fix copy and support multi device copy in _device
2025-05-01 00:26:24 -04:00
Ignacio Sica
bda116d773 fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores

* add use_tensor_core to arg in test and search

* bugfix

* get TC val from ContextVar in search

* revert minor space change

* add tc emulation test to ci and benchmark

* revert

* revert whitespace change

* remove test for ptx

* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
George Hotz
ea5dddc537 reduce collapse generic (#10045)
* reduce collapse generic

* new arange folder

* new range folding

* correct with sym

* all tests pass

* indexing ops passes

* failing tests

* fix tests, remove unused

* revert that

* torch indexing is fast

* skip on webgpu

* touchups

* comments
2025-04-26 09:13:24 -04:00
chenyu
74c6cf8be3 lint mlperf model_train (#10038) 2025-04-24 16:19:44 -04:00
Ignacio Sica
51ca19d061 set test_tensor_cores_padded_amd to expectedFailure (#10036)
* init

* add expected failure to correctly track progress

* hotfix

* skip for amd_llvm as well

* add skip

* add pr number

* move comment to amd test

* change reason
2025-04-24 17:11:40 -03:00
b1tg
914d89fa0b fix tensor cores for gfx1201 (#9838)
* fix tensor cores for gfx1201

* fix typo

* fix python wmma

* AMDLLVMRenderer with arch + AMDLLVM tensor_cores

* fix ci

* clean up

* more tensor cores for RDNA4

* fix half/half, bfloat16/float, bfloat16/bfloat16 for amd_llvm

---------

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:57:41 -04:00