Commit Graph

176 Commits

Author SHA1 Message Date
chenyu
0a98fd38b3 fix tests that failed locally on mac (#13872)
keccak output was silently broken without contiguous
2025-12-29 11:23:38 -05:00
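Editor's note: the fix above points at a general tinygrad pitfall — anything that consumes a buffer's raw bytes (such as a keccak hash) needs the tensor materialized in memory order, not left as a strided view. A minimal sketch of the pattern, using plain tensor ops (the keccak call site itself is not shown):

```python
from tinygrad import Tensor

# a permuted tensor is just a view: its bytes are not laid out in the order
# indexing suggests, so anything reading raw memory can see "wrong" data
x = Tensor.arange(16).reshape(4, 4).permute(1, 0)
y = x.contiguous().realize()  # force a real row-major buffer

# y's buffer now matches its logical order; a byte-oriented kernel (e.g. a
# keccak hash) run on y is safe, while running it on the view x was not
print(y.flatten().tolist())
```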
George Hotz
3dbde178c1 mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
George Hotz
9015a22523 make tests faster (#13734) 2025-12-17 09:39:44 -04:00
Douglas Nyberg
947c6eefc3 add Swish op (#13541)
* add Swish ONNX operator

* add Swish regression test

* remove trailing whitespace

* upgrade ONNX to 1.20, add excludes for unimplemented ops

* upgrade ONNX to 1.19, add Swish op

* upgrade ONNX to 1.19, TensorFlow to 2.18, add Swish op

* exclude attention_3d and attention_4d_gqa tests

* exclude attention fp16 tests

* exclude all attention tests

* retrigger CI

* retrigger CI - worker crash
2025-12-08 12:41:18 -05:00
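For reference, Swish is the same function tinygrad already ships as silu: Swish(x) = x · sigmoid(x). A minimal sketch of what the ONNX handler boils down to (registration machinery omitted; this standalone function is illustrative, not the actual handler):

```python
from tinygrad import Tensor

def Swish(x: Tensor) -> Tensor:
  # Swish(x) = x * sigmoid(x), identical to SiLU
  return x * x.sigmoid()

t = Tensor([-2.0, 0.0, 2.0])
assert (Swish(t) - t.silu()).abs().max().item() < 1e-6
```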
chenyu
42f6cf3a90 tighter test_real_world mem and kernel count bounds (#13573)
also check that actual usage is within 20% of the set limit; the old limits were too loose to be useful
2025-12-04 13:35:39 -05:00
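The two-sided check described above catches both regressions and stale limits. A sketch of the idea (names made up):

```python
def assert_within_limit(actual: int, limit: int, slack: float = 0.20):
  # fail on a regression past the limit...
  assert actual <= limit, f"over limit: {actual} > {limit}"
  # ...but also fail when the limit sits more than 20% above actual usage,
  # since a limit that loose no longer catches anything and should be tightened
  assert actual >= limit * (1 - slack), f"limit too loose: {actual} vs {limit}"
```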
George Hotz
6bd355fa26 add needs_second_gpu decorator (#13543)
* add needs_second_gpu decorator

* more skips

* two more fixes
2025-12-02 19:08:23 -08:00
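A decorator like this typically probes for a second device and skips the test otherwise. A hedged sketch of one way to write it — the probe via `Device[f"{Device.DEFAULT}:1"]` is an assumption, not the actual implementation:

```python
import unittest
from tinygrad import Device

def needs_second_gpu(fn):
  # assumed probe: tinygrad addresses extra devices as "<BACKEND>:1";
  # if opening it raises, there is no second GPU and the test is skipped
  try:
    Device[f"{Device.DEFAULT}:1"]
    has_second = True
  except Exception:
    has_second = False
  return unittest.skipUnless(has_second, "needs a second GPU")(fn)
```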
Douglas Nyberg
6a7c58abf1 fix(onnx): unwrap list/tuple value in Pad op (#13500)
* fix(onnx): unwrap list/tuple value in Pad op

* add regression test for Pad list value

* remove trailing whitespace

* use _resolve_const for Pad constant_value
2025-12-02 07:47:20 -08:00
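The Pad fix is the classic scalar-wrapped-in-a-sequence problem: ONNX can deliver constant_value as a one-element list or tuple where a scalar is expected. The shape of the fix, sketched:

```python
def unwrap_scalar(v):
  # ONNX attribute/initializer plumbing sometimes hands back [0.0] where
  # a bare 0.0 is expected; unwrap one level before using it as a scalar
  return v[0] if isinstance(v, (list, tuple)) else v

assert unwrap_scalar([1.5]) == 1.5 and unwrap_scalar(1.5) == 1.5
```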
George Hotz
fd373fea7a fix a few tests [pr] (#13498) 2025-11-29 13:43:45 -08:00
C T
2d53029be3 Whisper less flaky tests (#13435)
* use less flaky metric for whisper long transcription

* multiline long transcription 3 reference

* fix reference transcript

see https://homepage.ntu.edu.tw/~karchung/miniconversations/MC.htm
sanitized for whisper

* try lower wer threshold

* add test for wer metric

* extract TRANSCRIPTION_3_ALT

* rename test

* rename

* add tests for high WER difference

* move tests

* sync metric
2025-11-24 09:50:49 -08:00
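The "less flaky metric" referenced above is word error rate: word-level Levenshtein distance normalized by reference length, which tolerates small wording drift far better than exact string comparison. A minimal self-contained sketch:

```python
def wer(ref: str, hyp: str) -> float:
  # word error rate: word-level Levenshtein distance / reference word count
  r, h = ref.split(), hyp.split()
  d = list(range(len(h) + 1))          # one DP row over hypothesis prefixes
  for i, rw in enumerate(r, 1):
    prev, d[0] = d[0], i
    for j, hw in enumerate(h, 1):
      # deletion, insertion, substitution (or free match)
      prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (rw != hw))
  return d[len(h)] / max(len(r), 1)

assert wer("the cat sat", "the cat sat") == 0.0
assert abs(wer("the cat sat", "the bat sat") - 1 / 3) < 1e-9
```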
George Hotz
b0da173f2f add unique to const, fix longstanding bug (#12965)
* add unique to const, fix longstanding bug

* _force_unique=True

* fix tests

* fix more tests
2025-10-28 15:11:37 +08:00
George Hotz
3b0b3a2e64 fast RANGEIFY (#12504)
* rtoposort is fast, can replace rangeify with this

* fast rangeify

* work

* fast rangeify works for mnist

* should work

* progress

* pad fix

* FAST

* tests passing

* don't delete those shape ops

* put in rangeify map

* ending ranges fix

* tests

* mstack/mselect no hacks

* move to indexing.py

* touch up tests + add comments

* disable failing test

* actually make the file readable

* failing

* error
2025-10-08 19:38:06 +08:00
qazal
9448924d9e update gpt2 kernel count tests in CI=0 (#12523) 2025-10-08 14:29:11 +03:00
George Hotz
0f25b4b289 move frontend dir to nn [pr] (#12470) 2025-10-07 10:42:22 +08:00
chenyu
a1881b0c17 update test_chicken (#12466)
logits are close; the difference is just numerical noise
2025-10-06 03:58:44 -04:00
chenyu
b087663c35 RANGEIFY test_bert uses more RAM somehow (#12443) 2025-10-03 04:38:53 -04:00
chenyu
f203d8b221 update RANGEIFY kernel count and test_masked_select (#12435) 2025-10-03 00:41:34 -04:00
George Hotz
7129419500 fix cifar training in RANGEIFY (#12355)
* fix cifar training in RANGEIFY

* even more wino fuse

* bugfix

* test to show issue
2025-09-30 15:59:19 +08:00
chenyu
25091951ba update test/models (#12142)
minor fix and run more stuff in tinygrad for speed
2025-09-12 16:43:28 -04:00
chenyu
647965fb09 test_train cleanup (#12140)
* test_train cleanup

remove skipIf due to buffer sizes, runs locally

* those are slow
2025-09-12 13:21:30 -04:00
chenyu
20cd7177de delete test_bert_fuse_arange (#12121)
* delete test_bert_fuse_arange

it's the default now and we are not interested in the FUSE_ARANGE=0 version

* remove -v
2025-09-11 12:35:51 -04:00
chenyu
0e266f376c ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
nimlgen
1c6c42715f unify cpu and llvm (#11982)
* try unify cpu and llvm

* fixes

* fix

* ops

* no llvm

* fix

* rm

* lvmm is ot

* oops

* override

* no llvm

* ignore

* skip llvm

* ooops
2025-09-09 13:54:44 +03:00
chenyu
3b41a04b96 remove test_openpilot in test_onnx (#12037)
openpilot is tested in compile3
2025-09-05 16:20:03 -04:00
George Hotz
1d307f568c move device tests to test/device + test cleanups (#11735)
* move device tests to test/device

* test speedups

* test device

* linalg to unit

* upd

* so pytest just works

* more divide and skip

* speed

* test devectorize

* add pillow
2025-08-19 16:02:20 -07:00
geohotstan
1e904155e3 Add Onnx Huggingface to test/models/test_onnx.py (#11468)
* BOOM

* cache extra/huggingface/models/

* why max buffer size is not 0

* override MAX_BUFFER_SIZE

* less models

* remove more models and change cache dir to already cached dir

* only metal

* less is more?

* remove check ops

* why is this not setting the ENVVAR

* ughhhhh just test in models

* only cpu and gpu

* only cpu actually

* just override it idk

* final

* move extra dependencies up top

* simplification

* fix print

* make README better

* revert ops_disk fix for now

* clean up test_onnx

* remove testing fashion clip model cuz sloooowwwwww

* actually let METAL run this

* fix comment mistake

* fix download path in run_models

* does this work?

* cleanup setup and teardown

* contextvar like this?

* prove model is cached

* do I need to increment DOWNLOAD_CACHE_VERSION?

* see if cached with incremented DOWNLOAD_CACHE_VERSION

* use warnings to see if the model exists

* revert DOWNLOAD_CACHE_VERSION stuff and clean up

* add retry to download

* nit
2025-08-14 11:16:41 -04:00
geohotstan
5ce278b245 OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__

* modelproto_to_onnxrunner -> modelproto_to_runner

* whoops, fix import

* oh flakiness again, is it because it's getting gc-ed?

* small changes

* CI flaky so just move compile4 fix in

* copy typing of onnx_load

* actually can just import onnx_load instead of onnx.load

* fix external_benchmark_openpilot

* fix onnx_runner test to use onnx_helper

* rerun CI

* try run_modelproto

* spam CI a few times

* revert run_modelproto since that's flaky also

* no external onnx_load usage except onnx.py

* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?

* model_benchmark 193s -> 80s, add OnnxRunner.to()...

* minimize diff and clean up

* device can be None, weird but eh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
chenyu
bfa87f3490 clean up binary_crossentropy_logits (#10958) 2025-06-24 12:23:40 -04:00
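For context on the function above: binary cross-entropy from logits is usually computed in the numerically stable form max(x, 0) − x·y + log(1 + exp(−|x|)) rather than sigmoid-then-log. A hedged sketch of that identity in tinygrad terms (not the actual cleaned-up implementation):

```python
from tinygrad import Tensor

def bce_logits(x: Tensor, y: Tensor) -> Tensor:
  # stable BCE-with-logits: max(x, 0) - x*y + log(1 + exp(-|x|))
  return (x.maximum(0) - x * y + (1 + (-x.abs()).exp()).log()).mean()

x, y = Tensor([0.5, -1.0, 2.0]), Tensor([1.0, 0.0, 1.0])
naive = -(y * x.sigmoid().log() + (1 - y) * (1 - x.sigmoid()).log()).mean()
assert (bce_logits(x, y) - naive).abs().item() < 1e-5
```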
George Hotz
e2907360b7 multi is one PM [pr] (#10838)
* multi is one PM [pr]

* disable flaky tests
2025-06-16 14:52:47 -07:00
b1tg
24d328e313 onnx parser (#10435)
* onnx parser

* fix compile, lint

* onnx.load -> onnx_load

* compatible with ModelProto

* fix test external_test_onnx_ops.py

* fix tests

* fix signed int

* reduce to 261 lines

* fix TypeProto.Optional

* debug for _parse_message, add TypeProto.Sequence, cleanup

* onnx_load from Tensor

* remove BufferedReader

* 174 lines and reduce tensor copy

* cleanup

* use onnx_load in external_model_benchmark.py

* fix qcom test

* [onnx] parser support external data

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
81b9c04574 move high level stuff to unit tests [pr] (#10708)
* move high level stuff to unit tests [pr]

* process replay on unit tests

* fix pr, less compute

* set omp num threads

* set 200MB buffer size limit

* delete junk

* fix tests

* faster

* move test_indexing to unit

* faster
2025-06-08 14:05:56 -07:00
Sieds Lykles
c29a56dd51 Fix whisper OOB (#10685)
* fix whisper and test

* remove import
2025-06-07 20:23:50 -04:00
George Hotz
b3b43a82c4 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
qazal
95c6a736a9 fix FUSE_ARANGE=1 for bert (#10255) 2025-05-12 14:44:05 +03:00
chenyu
70c797b107 train bert tests (#10248)
added a working bert tiny test and a failing bert FUSE_ARANGE test
2025-05-11 08:42:08 -04:00
George Hotz
b6d2effaf5 assign is contiguous (#10066)
* assign is contiguous

* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
qazal
c990aac2b1 skip flaky test_transcribe_file1_OOB (#10026) 2025-04-24 21:09:43 +08:00
Sieds Lykles
e75be6eafc [bounty] [pr] index validation with z3 (#9981)
* index validation with z3

* Change comment

* toposort -> toposort()

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
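The z3 validation mentioned above amounts to asking the solver for a counterexample: loop variables inside their ranges that drive an index expression out of the buffer's bounds. Unsat means the access is proven safe. A toy sketch with a made-up index expression (requires the z3-solver package):

```python
import z3

# toy access: idx = 4*i + j into a 32-element buffer, 0 <= i < 8, 0 <= j < 4
i, j = z3.Ints("i j")
idx = 4 * i + j
in_range = z3.And(i >= 0, i < 8, j >= 0, j < 4)

s = z3.Solver()
s.add(in_range, z3.Or(idx < 0, idx >= 32))  # search for an OOB counterexample
assert s.check() == z3.unsat                # unsat: provably in bounds
```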
George Hotz
74d98eafb8 add onnx frontend stub [pr] (#9558) 2025-03-24 12:24:34 +08:00
qazal
14aa2395d0 allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]

* still reshapes

* update becomes_map tests

* bring copy folder to the scheduler

* lint

* only sgd left

* optimizer assign

* 13 kernels

* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
chenyu
2e7c2780a9 CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
qazal
1fce864a6d delete multi output support (#8822)
* delete multioutput for now

* test_schedule

* test_assign too

* linter

* 515 for sd

* update tests and ctx

* update that assign check
2025-01-30 22:45:50 -05:00
George Hotz
a9d9f98d05 hotfix: those tests fail locally on mac due to buffer count 2025-01-27 07:53:48 +09:00
George Hotz
b4bf6a7dea switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]

* set device correctly, dedup

* why does that fail?

* add noop cast

* simple backward

* fix beautiful_mnist

* touchups

* set in compute_gradient

* uop_count

* uop_count was wrong

* collections

* no note

* skip that test

* update sched kernel counts

* train mnist is 65

* fix metadata and gc

* fixes

* materialize_grads

* no pathlib stuff

* add contiguous_backward, fix bugs

* add some realize

* fix multi
2025-01-26 09:12:16 +09:00
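The gradient API that backward() moves onto here computes gradients functionally: loss.gradient(*tensors) returns the gradients with respect to the given tensors. A minimal usage sketch (illustrative; d(x·x)/dx = 2x):

```python
from tinygrad import Tensor

x = Tensor([2.0], requires_grad=True)
loss = (x * x).sum()
(dx,) = loss.gradient(x)   # functional: returns grads instead of mutating .grad
assert abs(dx.item() - 4.0) < 1e-6
```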
geohotstan
dd82b4c913 make onnx runner a class (#8647)
* this

* clean up

* more clean ups and improve debug msg

* more correct training toggler

* remove manual training toggling

* change some variable names

* actually just add the training toggle for LIMIT envvar too

* more refinement

* __call__ and OnnxRunner

* fix half pylint, other half is importing from onnx while this file is onnx.py, figure out later

* ahhhh found another mistake

* remove limit from __call__

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 10:11:05 -08:00
George Hotz
f29d6f54b8 support multilb gradient [pr] (#8624) 2025-01-14 18:33:33 -08:00
Francis Lata
5755ac1f72 Fix FC layer ResNet load_from_pretrained error (#8387)
* validate that FC exists before loading pretrained weights

* add test case for ResNet pretrained model without FC layer

* remove extra newline

* rename test case

* reraise exception if not handled by check
2024-12-26 18:11:27 -05:00
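The fix above handles the common feature-extraction setup where the classifier head is deleted before loading pretrained weights. A hedged sketch of the pattern (not the actual tinygrad loader):

```python
def load_filtered(model, state_dict: dict):
  # skip checkpoint entries whose target attribute no longer exists on the
  # model (e.g. resnet.fc deleted for feature extraction); per the commit,
  # errors not explained by a removed layer should be re-raised upstream
  for key, weight in state_dict.items():
    obj = model
    try:
      *parents, leaf = key.split(".")
      for p in parents: obj = getattr(obj, p)
      target = getattr(obj, leaf)
    except AttributeError:
      continue
    target.assign(weight)  # tinygrad Tensor supports in-place assign
```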
geohotstan
78cb47dfc5 docs and tests clean ups (#8383) 2024-12-23 11:12:13 -05:00
George Hotz
bd9c015b09 tests from grad uop path [pr] (#8313) 2024-12-18 09:25:05 -08:00
George Hotz
aa3b094334 changes from delete lazy [pr] (#8146)
* changes from delete lazy [pr]

* test tweak
2024-12-10 11:06:17 -08:00
chenyu
aa51f3c14e update kernel counts in test_real_world (#7960)
the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now.

if it's stable we could consider making the kernel count strict, which helps changes like #7940
2024-11-29 11:14:54 -05:00
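Kernel-count tests of this sort are typically written against tinygrad's GlobalCounters. A hedged sketch of the mechanism (the bound of 4 is made up):

```python
from tinygrad import Tensor
from tinygrad.helpers import GlobalCounters

start = GlobalCounters.kernel_count
(Tensor.rand(64, 64) @ Tensor.rand(64, 64)).realize()
used = GlobalCounters.kernel_count - start
# the two rand fills plus the matmul each launch kernels; assert no regression
assert used <= 4, f"kernel count regression: {used} kernels"
```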