Commit Graph

833 Commits

Author SHA1 Message Date
George Hotz
ef58ab340a hotfix: remove n=auto from REMOTE=1 test 2025-06-09 09:19:36 -07:00
chenyu
d93a0bee6b mlperf ci uses its own cache (#10705)
not to interfere with regular cache which is used by benchmark
2025-06-08 19:43:32 -04:00
George Hotz
81b9c04574 move high level stuff to unit tests [pr] (#10708)
* move high level stuff to unit tests [pr]

* process replay on unit tests

* fix pr, less compute

* set omp num threads

* set 200MB buffer size limit

* delete junk

* fix tests

* faster

* move test_indexing to unit

* faster
2025-06-08 14:05:56 -07:00
George Hotz
4305f532d9 clean up apt stuff (#10706)
* clean up apt stuff

* single apt install

* fixes

* fix opencl + ldconfig
2025-06-08 11:06:09 -07:00
George Hotz
4e2c3560b4 smaller tests are faster tests [pr] (#10704)
* remove del spam from CI

* more

* preconstruct default buffer spec

* ignore those errors

* check exception

* more exception check

* skip stuff

* smaller tests mean faster tests

* a few more
2025-06-08 10:54:19 -07:00
George Hotz
32141ec867 make apt CI faster (#10702) 2025-06-08 09:43:39 -07:00
chenyu
4f535641f7 add one huggingface_onnx test to mac benchmark ci (#10700)
this crashed for me on onnx parser pr but seems fine for the author. see if ci mac is fine
2025-06-08 12:26:12 -04:00
George Hotz
7ff175c022 cache a venv to avoid pip usage (#10689)
* try built in pip caching

* try venv

* export venv

* set VIRTUAL_ENV

* revert that

* venv key

* fix

* ci cache hit?

* fix windows
2025-06-07 20:13:41 -07:00
George Hotz
53ed64e133 ci speed work 1 (#10676)
* skip a few slow tests

* use a venv for python packages

* create venv

* no user, it's in venv

* ignore venv

* venv

* new cache key

* try that

* this

* version the python cache
2025-06-07 16:33:11 -07:00
wozeparrot
37e1ef1be3 feat: cleanup old AM processes (#10653) 2025-06-05 15:41:00 -07:00
qazal
7114b6ab31 viz browser tests (#10626)
* viz browser tests

* expect failure if js/ isn't included

* back green
2025-06-04 14:58:24 +03:00
chenyu
18e9ec3ea1 add wino cifar to search benchmark (#10615)
* add wino cifar to search benchmark

* FUSE_OPTIM=1

* revert those
2025-06-03 20:38:43 -04:00
chenyu
1c1f578490 DISABLE_COMPILER_CACHE in sdxl search (#10614) 2025-06-03 09:22:25 -04:00
chenyu
4ab3391e6f set -o pipefail for mlperf run_and_time (#10577)
also run the 5.1 script in ci cron job
2025-05-30 16:36:44 -04:00
wozeparrot
5e3c4a8431 fix: comma testsig (#10568) 2025-05-29 19:00:07 -07:00
George Hotz
ee12e801a3 optional fused optimizers (#10549)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh
2025-05-28 13:50:30 -07:00
Sieds Lykles
ae02a1e232 [bounty] Z3 symbolic fuzzer [pr] (#10514)
* First version, caught a bug?

* Nicely print failure to reproduce

* Remove that

* Put the assert back

* Change fuzzing to use testing_unit so it has z3

* Test key to match

* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* add condition

* add test

* update comment

* use Variable

* fuzzer using z3_renderer

* Cleaned up printing and debugging

* working new fuzzer

* change some comments and printing

* more formatting

* fuzz failures in seperate file

* fix fstring

* more tests

* naming

* remove added line

* remove comment

* print number of skipped expressions

* use self.assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-28 16:28:37 -04:00
chenyu
23e41f523a sdxl also run with cached search (#10546) 2025-05-28 06:51:56 -04:00
chenyu
fffdc4d31c workflow to run sdxl with search (#10543) 2025-05-27 17:25:41 -04:00
uuuvn
c29c46853f Very basic mock sqtt (#10512)
This mockgpu sqtt emulation will just ignore basically everything and end
up with a 0x1000 size trace full of zeroes, but just testing for things
like register rename is better than nothing i guess
2025-05-26 14:38:28 -07:00
chenyu
2eeea373af add BENCHMARK_LOG for mlperf resnet cron (#10516) 2025-05-25 22:00:29 -04:00
b1tg
a1f64af92d ci: setup llvm for amdremote (#10507)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-25 21:52:27 -04:00
wozeparrot
7c81f9f95e fix: gate mlperf workflow (#10515) 2025-05-25 17:06:21 -07:00
George Hotz
6b8eb5fec2 split mlperf to its own red benchmark run (#10492)
* Add mmapeak implementation for 7900 XTX

* Change identation

* Use a template instead of multiple assebly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires less VGRPs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

* split to tinybox red MLPerf Benchmark

---------

Co-authored-by: Panagiotis Kourouklidis <panagiotis.kourouklidis@gmail.com>
2025-05-23 17:12:41 -07:00
George Hotz
bf2a0907be gate the mockdsp behind MOCKDSP=1 [pr] (#10486) 2025-05-23 11:44:02 -07:00
uuuvn
3ca5680920 Test remote in benchmark (#10304)
hlb cifar is fast so added it, can add bert too if you think it's ok

6 real gpus to test multigraph and transfers + accuracy validation

should probably be added to tinystats too, i don't know how though

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-23 12:12:57 -04:00
chenyu
c5acb4e06e run mlperf resnet daily (#10482)
Runs at 08:05 UTC (12:05 AM Pacific Time)
2025-05-23 07:16:20 -04:00
chenyu
116d9e6306 run mlperf resnet on red box (#10413)
also made push to `update_mlperf` branch trigger
2025-05-19 12:48:36 -04:00
George Hotz
f1fe1f93c1 hotfix: 14000 lines 2025-05-19 09:40:53 -07:00
qazal
90eb3c0e5d add MobileNetV2 benchmark to comma CI (#10250)
* add MobileNetV2 to comma CI

* symlink imagenet

* also the signature

* comment that out

* need imagenetmock

* same train and test set

* quantize on CPU=1

* verbose

* need __hexagon_divsf3

* 0x858d6c15

* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
George Hotz
b06291077c no amdgpu kernel driver (#10408)
* no amdgpu kernel driver

* don't test hip

* lower req
2025-05-18 20:52:39 -07:00
chenyu
485e80da69 run_and_time for resnet ci (#10405) 2025-05-18 23:39:57 -04:00
uuuvn
0f825e12f2 Remote fixedvars (#10371)
* amd mockgpu graph support

For testing remote graph stuff (prompted by #10371) in ci

* Remote fixedvars

Somehow none of existing tests failed when fixedvars were added, looking
what to add as an regression test for this

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-18 09:57:13 -07:00
uuuvn
27c12be471 amd mockgpu graph support (#10385)
For testing remote graph stuff (prompted by #10371) in ci
2025-05-18 09:43:16 -07:00
chenyu
9b4e2a75cd symlink datasets in mlperf workflow (#10391) 2025-05-18 03:26:05 -04:00
qazal
0294bfe507 simpler can_pad (#10364)
* simpler can_pad [pr]

* 3 kernels

* tests

* less kernels
2025-05-18 10:00:07 +03:00
chenyu
efa8dfe7fb test cron job to run resnet (#10368) 2025-05-17 08:57:02 -04:00
chenyu
c798f2f427 brew --quiet to suppress already installed warnings (#10346)
example https://github.com/tinygrad/tinygrad/actions/runs/15057000247
2025-05-15 23:31:18 -04:00
wozeparrot
1ed04f993b move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
Ignacio Sica
47b3055fe2 set fail-fast behavior (#10336) 2025-05-15 11:24:45 -07:00
George Hotz
50181ab09f hotfix: bump to 13500 lines 2025-05-14 18:49:59 -07:00
George Hotz
7a3d4de59a hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test 2025-05-14 14:50:37 -07:00
George Hotz
f1130ab3d3 openpilot benchmark test (#10290)
* openpilot benchmark test

* that
2025-05-13 22:49:28 -07:00
George Hotz
ec46f658d7 openpilot llvm test [pr] (#10288) 2025-05-13 16:51:41 -07:00
uuuvn
ddff9857b8 Remote properties is a dataclass (#10283)
Not strictly required for anything but soon there will be like 4 new
properties and having it be a huge json just seems like a bad taste.

It also seems right to not have a separate endpoint for this, just
`GetProperties` request that returns a repr of this similar to how
requests are sent in `BatchRequest`.

This will also make a switch to anything other than http much simpler
if it will be required for any reason, like just a tcp stream of
`BatchRequest`s
2025-05-13 11:56:58 -07:00
uuuvn
ba87eca0f1 Remote multi (basic) (#10269)
* Basic remote multi support

Simplest thing to be able to use remote with multiple gpus, very slow
because no transfers (copyin copyout for cross-device copies)

* tests
2025-05-13 09:52:47 -07:00
chenyu
ad5cb2717d FUSE_ARANGE=1 in bert bench (#10263)
still fails, something multi related maybe

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-13 09:12:19 -04:00
chenyu
0015b3921f sleep more in CI Remove amdgpu (#10261)
see if this is less flaky
2025-05-12 08:13:44 -04:00
hooved
7b4f05fd00 Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
7d6ed1b1e9 hotfix: mac ci (#10210)
* fixed?

* cmnt
2025-05-08 14:13:23 +03:00