Commit Graph

4667 Commits

Author SHA1 Message Date
geohotstan
ad9dec25b3 combine onnx parser and onnx (#11485)
* start

* more

* fix onnx_runner test

* pass

* patch for disk and add domains from huggingface

* simpler docs

* revert domain changes

* rerun ci

* revert onnx ops test change

* add fix from strenum stuff

* correct way

* revert correct way to leave the fix for another PR

* test segfault

* Revert "test segfault"

This reverts commit 4e1aaf41e7.

* remove some unnecessary documentation

* test segfault again

* Revert "test segfault again"

This reverts commit 56fc5f03e7.

* try gemini suggested patch for sys._getframe

* keep trying with gemini

* revert not working gemini suggestions and try faulthandler

* remove pythonfaulthandler

* trigger CI a few times

* minimize diff

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-12 12:56:39 -04:00
Sieds Lykles
4c3982c44e Take sign out of mod (#11631)
* Add rule and test

* fix tests
2025-08-12 18:44:36 +02:00
chenyu
0d7075f2de assign should broadcast input tensor (#11629)
fixed test_assign_broadcast
2025-08-11 23:36:35 -04:00
George Hotz
ca41b5e38b skip_0 in graph rewrite [pr] (#11627)
* skip_0 in graph rewrite [pr]

* no track_rewrites on test

* use dict instead of set
2025-08-11 18:29:04 -07:00
chenyu
0c97d6de1b don't round pow output for int pow int (#11625)
also added atol=0 and big pows for the tests
2025-08-11 20:57:47 -04:00
chenyu
d623f6d850 support int Tensor pow to const non-negative int (#11624)
matches torch
2025-08-11 19:50:19 -04:00
chenyu
857a830dcc fix test_arange_float_step (#11623) 2025-08-11 16:58:42 -04:00
ttomsa
ae0c3cfff6 change clang -march flag to -mcpu on arm (#10970)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-11 13:38:48 -04:00
geohotstan
27bcb9fd1c Support cubic mode for ONNX Resize OP (#11612)
* start

* add reference

* this is so much slower

* this makes sense but differs from official impl, but results are still correct..?

* add a comment

* Just keep it simple for now since I don't fully get it yet

* address comments

* correct

* teeny clean up

* another small comment improvement lol
2025-08-11 11:49:30 -04:00
nimlgen
d2bb1bcb97 cloud: a bit better err handling (#11616)
* cloud: err propagation to client

* fix

* print exc

* linter

* excs

* fix

* hm

* flaky
2025-08-11 15:51:22 +03:00
chenyu
a67e0917c3 list indexing can normalize in python (#11609)
* list indexing can normalize in python

list index does not need to be normalized in tensor

* update those
2025-08-10 20:02:38 -04:00
chenyu
1181ec0cd2 few more tensor indexing test cases (#11608) 2025-08-10 18:56:42 -04:00
George Hotz
996c907c0b rewrite not ready + children machinery (#11607)
* rewrite not ready + children machinery

* it doesn't like track rewrites
2025-08-10 15:28:30 -07:00
geohotstan
b0dab6a4cd onnx Resize OP clean up (#11603)
* start

* slight clean up
2025-08-10 14:10:39 -04:00
Sieds Lykles
10540414cd Add Ops.CMPEQ (#10431)
* Add op

* add to Groupop.ALU

* fix spec

* fix ptx

* temporary pickle by name to see process replay

* add Ops.EQ to binary ops

* Actuall rename properly

* add test to assert CMPEQ is being used

* Ops.CMPEQ is automatic cast to bool

* add Ops.CMPEQ to llvm

* add Ops.CMPEQ to llvm
2025-08-10 13:13:16 +02:00
chenyu
dfb702ef33 fix sort for small dim (#11601)
* fix sort for small dim

* fixed test_sort_empty
2025-08-10 01:17:41 -04:00
Sieds Lykles
01c770c77b Fix z3 float cast in indexing (#11590)
* adjust dtype of z3_renderer and add rule for cast

* dtypes.bool is also cast noop

* add regression test

* make embedding smaller

* even smaller test
2025-08-09 17:59:23 +02:00
Sieds Lykles
10d388499d Refactor optional.py (#11578)
* move fast_idiv to transcendental

* move optional.py

* adjust comment

* change import

* mypy needs this?
2025-08-09 17:35:05 +02:00
qazal
16f0edbe90 pass opts arg in get_program process replay [pr] (#11571)
* fix ptx process replay

* keyword arg

* renderer is also optional [pr]

* test_linearizer fixup

* name function order is args,ret,kwargs

* can use opts_to_apply

* pass through p.applied_opts

* sink_arg

* now it opens devices too
2025-08-08 03:05:09 +03:00
qazal
960cc6533a pass through name function args in track_rewrites (#11572) 2025-08-08 02:28:52 +03:00
George Hotz
82be8abfd2 move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00
George Hotz
6ed2dfd187 delete the arange dim mismatch restriction (#11568)
* delete the arange dim mismatch restriction

* skip that test race
2025-08-07 13:46:17 -07:00
chenyu
aa1a6f2132 support threshold in Tensor.softplus (#11564)
fix gradient for large input
2025-08-07 13:43:18 -04:00
chenyu
7ee3770961 FUSE_ARANGE=1 (#11427)
* FUSE_ARANGE=1

* fix test

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-07 13:32:34 -04:00
George Hotz
9764c6cdee fix mismatch reduce, try 2 (#11560)
* fix mismatch reduce, try 2

* fix heuristic

* delete that test

* don't start allowing ones
2025-08-07 07:57:58 -07:00
nimlgen
4f29a2c441 fix flaky test on macos (#11557) 2025-08-07 15:55:35 +03:00
George Hotz
a1aa5670aa Revert "fix mismatch reduce (#11547)" (#11549)
This reverts commit 49d21a9055.
2025-08-06 22:43:15 -07:00
George Hotz
49d21a9055 fix mismatch reduce (#11547)
* fix mismatch reduce

* cleanups

* fix shape

* fix mypy

* resolve
2025-08-06 21:12:51 -07:00
George Hotz
21570545d3 move view pushing to codegen, try 2 (#11534)
* move view pushing to codegen, try 2

* fix up some linearizer tests

* fix test search

* fix test schedule

* delete that test

* fix test arange

* fix a few tests

* update tests

* push views

* ebs cleanup

* fix local/reg

* test and lint

* fix more tests

* test cleanups

* skipped that one
2025-08-06 15:58:38 -07:00
George Hotz
80d9cced07 more test cleanups (#11544)
* more test cleanups

* revert that
2025-08-06 15:05:21 -07:00
George Hotz
6fd1332763 update some tests for less Kernel (#11543)
* update some tests for less Kernel

* get_program update
2025-08-06 14:19:59 -07:00
George Hotz
09dc7af8e9 move bind to big graph (#11539)
* move bind to big graph

* fix tests

* unbind inside kernel only

* merge views

* fix multitensor

* failure text change
2025-08-06 13:27:51 -07:00
George Hotz
7c5e115747 test_mismatch_reduce (#11538) 2025-08-06 10:02:14 -07:00
George Hotz
4fe11725c6 pass through sink arg, update linearizer test (#11536)
* pass through sink arg, update linearizer test

* get_program help

* bump line count

* use new api
2025-08-06 09:48:48 -07:00
chenyu
c9225d22ce only disable flaky test_jit_multidev_xfer (#11523) 2025-08-05 22:17:25 -04:00
George Hotz
07b0df0d86 hotfix: test tensor dims start at 1 2025-08-05 15:40:24 -07:00
George Hotz
4dabdf7c6d Revert "optimize in rewrite (#11516)" (#11517)
This reverts commit 3b777a9e05.
2025-08-05 15:39:07 -07:00
George Hotz
3b777a9e05 optimize in rewrite (#11516)
* changes

* fix test uops

* dim shouldn't be 0

* huh, why did that one not save
2025-08-05 15:33:26 -07:00
nimlgen
fc4e713d1c jit graph split tests (#11507)
* jit graph split tests

* fix

* one more test

* more tests

* fix

* xm

* rmeote
2025-08-05 21:32:37 +03:00
chenyu
ace8e9a706 fix test_conv2d_winograd (#11511) 2025-08-05 12:15:46 -04:00
chenyu
223aaa0492 clean up more conv tests (#11510) 2025-08-05 12:15:30 -04:00
Garret Castro
76e62a1c23 extract conv layer test logic (#11488)
* refactor: extract conv layer test logic

* tuple is unnecessary

* integrate _test_conv logic into all conv tests

* fix linter, forgot dilation

* undo winograd extraction

adds too many if statements for a single case
2025-08-05 11:15:54 -04:00
uuuvn
011ef8fa9d Fix incorrect jit current batch devs reset (#11505)
`current_batch_devs = []` (in `flush_batch()`) happens between
`new_batched_devs = ...` and `current_batch_devs = new_batched_devs` =>
doesn't actually reset anything leading to things not jitting properly

which 2xs remote bert step time (should have similar effects on any
non-hcq backend)
2025-08-05 08:16:16 +03:00
chenyu
f02720ca2d fix fuse gate_contiguous unique (#11504) 2025-08-04 23:43:31 -04:00
qazal
846a2826ab viz: remove TracingKey.fmt (#11482)
* viz: remove TracingKey.fmt

* remove from test too
2025-08-05 00:00:03 +03:00
leopf
4f0ee4e982 BPE tokenizer (#11415)
* BPE works

* refactor tok

* oops

* basic tests

* fix eval

* smaller diff

* fix error

* proper vocab decoding

* use regex for splitting

* escape ucatrange

* full compat

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-04 09:52:38 -07:00
b1tg
06af9f9236 fix double exception + add name,loc in error msg (#11487)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-04 13:41:23 +03:00
chenyu
e0106b6b25 1/(x*c) -> (1/c)*(1/x) (#11491)
example: 2*(2*a).reciprocal() -> a.reciprocal()

# TODO: bounds for reciprocal
# TODO: should z3 work?
2025-08-03 23:35:46 -04:00
chenyu
dbc7807c61 enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
chenyu
66be747908 few more dtype cast convinience methods (#11480) 2025-08-02 15:47:09 -04:00