Commit Graph

1107 Commits

Author SHA1 Message Date
qazal
712980e167 fix extract_dataset + add tests to CI (#10995)
* fix extract_dataset + tests

* add CI

* sops.gz itself is same as master

* yml + gzip -c + ge

* don't commit that

* bump limit to 1000

* axis=7

* test_tiny
2025-06-27 01:51:36 +03:00
chenyu
efad567ebd ruff check whole examples/mlperf/ (#10979) 2025-06-25 12:57:48 -04:00
Alexey Zaytsev
230ad3a460 [bounty] Don't use numpy inside hlb_cifar10 training loop (#10777)
* Don't use numpy inside hlb_cifar10 training loop

* Lint it

* jit it

* Drop the last half-batch

* Use gather for random_crop and reuse perms

* Wrap train_cifar in FUSE_ARANGE context

* No need to pass FUSE_ARANGE=1 to hlb_cifar10.py

* Add cutmix to jittable augmentations

* Remove .contiguous() from fetch_batches

* Fix indexing boundary

---------

Co-authored-by: Irwin1138 <irwin1139@gmail.com>
2025-06-23 17:24:56 -07:00
George Hotz
0f89660ce4 Revert "change clang -march flag to -mcpu on arm (#10841)" (#10942)
This reverts commit 897e42fd1b.
2025-06-23 16:48:28 -07:00
ttomsa
897e42fd1b change clang -march flag to -mcpu on arm (#10841)
* change clang -march flag to -mcpu with fp16 disassembly test

* fix

* add capstone to macos dependencies

* just check no cast in test

* rm import

* woops

* lets check

* move check

* llvm init before cpu check

* try this

* bump autogen llvm version

* also update libclang?

* revert

* add comment

* skip llvm test and add comment

* linter
2025-06-23 16:28:48 -07:00
George Hotz
ae4d2d71b4 bump line count to 14500 2025-06-23 15:32:27 -07:00
geohotstan
4ab7d792cc ONNX improve dtype fallback (#10800)
* fix

* add early verbose demo test

* is this how to write tests :s

* is definition drift even a thing? gemini says it is

* clean up

* better

* even better

* try add to CI

* doesn't work quite yet

* much more work to be done

* whoops

* partition the test heh

* skipif

* some nits for better names

* add webgpu test for onnxrunner

* fix reference links

* flush for now
2025-06-21 19:29:45 -04:00
chenyu
d71bb6a7b2 remove comma 0.9.4 from benchmark (#10867) 2025-06-18 12:43:59 -04:00
George Hotz
e2907360b7 multi is one PM [pr] (#10838)
* multi is one PM [pr]

* disable flaky tests
2025-06-16 14:52:47 -07:00
uuuvn
18d936f981 Remote multihost (#10598) 2025-06-16 13:18:56 -07:00
George Hotz
27cf836958 split ocelot out for autogen, fix CI (#10819)
* split ocelot out for autogen, fix CI

* mac ocelot
2025-06-15 11:37:23 -07:00
chenyu
7d5c769c6b fix compile4 (#10797) 2025-06-12 22:28:56 -04:00
chenyu
4242b9874e remove AMD_LLVM=0 in mlperf and search ci (#10785)
tinybox updated to llvm 20
2025-06-11 21:10:31 -04:00
wozeparrot
53edd49a33 feat: bump to llvm20 (#10784) 2025-06-11 16:04:18 -07:00
chenyu
7d8939908f AMD_LLVM=0 for resnet cron (#10780)
similar pf on llvm19 and fine on 20
2025-06-11 16:28:40 -04:00
chenyu
d465ef4acb AMD_LLVM=0 for sdxl search (#10779)
hangs with llvm19 but seems fine with llvm20
2025-06-11 14:56:45 -04:00
George Hotz
9d0383634d bump cache and include full python version [pr] (#10768)
* bump cache and include full python version [pr]

* stupid windows

* really stupid windows
2025-06-10 15:07:30 -07:00
chenyu
612cdf5146 move fuzz_shape_ops to run with other fuzzer (#10767)
* move fuzz_shape_ops to run with other fuzzer

* don't skip CPU
2025-06-10 17:43:04 -04:00
chenyu
5e7ad70aae don't run linearize().uop tests in get_action_space test (#10766)
* don't run linearize().uop tests in get_action_space test

this part takes 2 minutes in CI and has nothing to do with action space. also not sure if the "for some reason" comment is still relevant

* -n=auto test/models
2025-06-10 17:23:53 -04:00
George Hotz
0fbf3f5554 Revert "Revert "Update autogen ci runner to ubuntu 24.04 (#10736)" (#10757)" (#10758)
This reverts commit a6dba9b9d9.
2025-06-10 09:32:27 -07:00
George Hotz
a6dba9b9d9 Revert "Update autogen ci runner to ubuntu 24.04 (#10736)" (#10757)
This reverts commit 1d15374c7a.
2025-06-10 09:31:51 -07:00
uuuvn
1d15374c7a Update autogen ci runner to ubuntu 24.04 (#10736)
For `kfd.AMDKFD_IOC_EXPORT_DMABUF`
2025-06-10 08:33:02 -07:00
George Hotz
acf72872b3 move view left to the outer graph prereqs + testing (#10725)
* move view left to the outer graph

* global view right

* dont need that one

* remove comment

* test kernelize

* simple

* split onnx, test sdxl null

* fix testing

* ugh, wrong one

* Update test.yml
2025-06-09 20:43:25 -07:00
George Hotz
58eebdb507 don't reassign metadata to the same uop + ignore oob in pr [pr] (#10737) 2025-06-09 18:43:39 -07:00
George Hotz
ef58ab340a hotfix: remove n=auto from REMOTE=1 test 2025-06-09 09:19:36 -07:00
chenyu
d93a0bee6b mlperf ci uses its own cache (#10705)
not to interfere with regular cache which is used by benchmark
2025-06-08 19:43:32 -04:00
George Hotz
81b9c04574 move high level stuff to unit tests [pr] (#10708)
* move high level stuff to unit tests [pr]

* process replay on unit tests

* fix pr, less compute

* set omp num threads

* set 200MB buffer size limit

* delete junk

* fix tests

* faster

* move test_indexing to unit

* faster
2025-06-08 14:05:56 -07:00
George Hotz
4305f532d9 clean up apt stuff (#10706)
* clean up apt stuff

* single apt install

* fixes

* fix opencl + ldconfig
2025-06-08 11:06:09 -07:00
George Hotz
4e2c3560b4 smaller tests are faster tests [pr] (#10704)
* remove del spam from CI

* more

* preconstruct default buffer spec

* ignore those errors

* check exception

* more exception check

* skip stuff

* smaller tests mean faster tests

* a few more
2025-06-08 10:54:19 -07:00
George Hotz
32141ec867 make apt CI faster (#10702) 2025-06-08 09:43:39 -07:00
chenyu
4f535641f7 add one huggingface_onnx test to mac benchmark ci (#10700)
this crashed for me on onnx parser pr but seems fine for the author. see if ci mac is fine
2025-06-08 12:26:12 -04:00
George Hotz
7ff175c022 cache a venv to avoid pip usage (#10689)
* try built in pip caching

* try venv

* export venv

* set VIRTUAL_ENV

* revert that

* venv key

* fix

* ci cache hit?

* fix windows
2025-06-07 20:13:41 -07:00
George Hotz
53ed64e133 ci speed work 1 (#10676)
* skip a few slow tests

* use a venv for python packages

* create venv

* no user, it's in venv

* ignore venv

* venv

* new cache key

* try that

* this

* version the python cache
2025-06-07 16:33:11 -07:00
wozeparrot
37e1ef1be3 feat: cleanup old AM processes (#10653) 2025-06-05 15:41:00 -07:00
qazal
7114b6ab31 viz browser tests (#10626)
* viz browser tests

* expect failure if js/ isn't included

* back green
2025-06-04 14:58:24 +03:00
chenyu
18e9ec3ea1 add wino cifar to search benchmark (#10615)
* add wino cifar to search benchmark

* FUSE_OPTIM=1

* revert those
2025-06-03 20:38:43 -04:00
chenyu
1c1f578490 DISABLE_COMPILER_CACHE in sdxl search (#10614) 2025-06-03 09:22:25 -04:00
chenyu
4ab3391e6f set -o pipefail for mlperf run_and_time (#10577)
also run the 5.1 script in ci cron job
2025-05-30 16:36:44 -04:00
wozeparrot
5e3c4a8431 fix: comma testsig (#10568) 2025-05-29 19:00:07 -07:00
George Hotz
ee12e801a3 optional fused optimizers (#10549)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh
2025-05-28 13:50:30 -07:00
Sieds Lykles
ae02a1e232 [bounty] Z3 symbolic fuzzer [pr] (#10514)
* First version, caught a bug?

* Nicely print failure to reproduce

* Remove that

* Put the assert back

* Change fuzzing to use testing_unit so it has z3

* Test key to match

* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* add condition

* add test

* update comment

* use Variable

* fuzzer using z3_renderer

* Cleaned up printing and debugging

* working new fuzzer

* change some comments and printing

* more formatting

* fuzz failures in separate file

* fix fstring

* more tests

* naming

* remove added line

* remove comment

* print number of skipped expressions

* use self.assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-28 16:28:37 -04:00
chenyu
23e41f523a sdxl also run with cached search (#10546) 2025-05-28 06:51:56 -04:00
chenyu
fffdc4d31c workflow to run sdxl with search (#10543) 2025-05-27 17:25:41 -04:00
uuuvn
c29c46853f Very basic mock sqtt (#10512)
This mockgpu sqtt emulation will just ignore basically everything and end
up with a 0x1000 size trace full of zeroes, but just testing for things
like register rename is better than nothing i guess
2025-05-26 14:38:28 -07:00
chenyu
2eeea373af add BENCHMARK_LOG for mlperf resnet cron (#10516) 2025-05-25 22:00:29 -04:00
b1tg
a1f64af92d ci: setup llvm for amdremote (#10507)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-25 21:52:27 -04:00
wozeparrot
7c81f9f95e fix: gate mlperf workflow (#10515) 2025-05-25 17:06:21 -07:00
George Hotz
6b8eb5fec2 split mlperf to its own red benchmark run (#10492)
* Add mmapeak implementation for 7900 XTX

* Change indentation

* Use a template instead of multiple assembly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires fewer VGPRs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

* split to tinybox red MLPerf Benchmark

---------

Co-authored-by: Panagiotis Kourouklidis <panagiotis.kourouklidis@gmail.com>
2025-05-23 17:12:41 -07:00
George Hotz
bf2a0907be gate the mockdsp behind MOCKDSP=1 [pr] (#10486) 2025-05-23 11:44:02 -07:00
uuuvn
3ca5680920 Test remote in benchmark (#10304)
hlb cifar is fast so added it, can add bert too if you think it's ok

6 real gpus to test multigraph and transfers + accuracy validation

should probably be added to tinystats too, i don't know how though

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-23 12:12:57 -04:00