Commit Graph

11106 Commits

nimlgen
4176b24264 amd: support xcc in regs (#11670)
* amd: support xcc in regs

* mockamd

* typo
2025-08-14 21:20:11 +03:00
Sieds Lykles
f399d0d75d Render mod in terms of idiv (#11668)
* Render mod in terms of idiv

* cvar -> var
2025-08-14 19:59:39 +02:00
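
The "Render mod in terms of idiv" commit above rests on the usual remainder identity; a minimal sketch, using Python's floor-rounding `//` and `%` purely as an illustration (a backend with truncating idiv pairs the same way with its own mod):

```python
# x % y can always be rendered as x - (x // y) * y, as long as the mod and the
# integer division share the same rounding convention (floor here, in Python).
for x in range(-20, 20):
  for y in (3, 7, -5):
    assert x % y == x - (x // y) * y
```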
nimlgen
d747eeed32 amd logs parser based on device (#11669) 2025-08-14 19:49:33 +03:00
geohotstan
1e904155e3 Add Onnx Huggingface to test/models/test_onnx.py (#11468)
* BOOM

* cache extra/huggingface/models/

* why max buffer size is not 0

* override MAX_BUFFER_SIZE

* less models

* remove more models and change cache dir to already cached dir

* only metal

* less is more?

* remove check ops

* why is this not setting the ENVVAR

* ughhhhh just test in models

* only cpu and gpu

* only cpu actually

* just override it idk

* final

* move extra dependencies up top

* simplification

* fix print

* make README better

* revert ops_disk fix for now

* clean up test_onnx

* remove testing fashion clip model cuz sloooowwwwww

* actually let METAL run this

* fix comment mistake

* fix download path in run_models

* does this work?

* cleanup setup and teardown

* contextvar like this?

* prove model is cached

* do I need to increment DOWNLOAD_CACHE_VERSION?

* see if cached with incremented DOWNLOAD_CACHE_VERSION

* use warnings to see if the model exists

* revert DOWNLOAD_CACHE_VERSION stuff and clean up

* add retry to download

* nit
2025-08-14 11:16:41 -04:00
Sieds Lykles
06beeb6e13 Nest div even if factor is negative (#11666) 2025-08-14 13:58:59 +02:00
Sieds Lykles
661e9a2d5d div_and_mod_folding refactor (#11585)
* divmod const folding is its own function

* split nested mod optimization out of div and mod folding

* make `fold_binary_numerator` its own function

* factor out `fold_divmod_congruence`

* check sign of numerator

* add tests

* assert int on vmin and vmax

* add type: ignore

* factor out more rules

* remove div_and_mod_folding

* cached_property to property

* remove import

* add returns

* restore old order

* check sign of x.vmin and newx.vmin

* check more signs

* add some test that would have caught bugs

* better test if the div simplified

* shorten line

* replace terms_factors_const with pop_const

* move that back

* minor cleanup

* remove comments

* some cleanup
2025-08-14 11:52:42 +02:00
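
For readers unfamiliar with what `div_and_mod_folding` does, here is a tiny worked example of the kind of simplification those rules perform; plain illustrative Python, not the UOp-level rewrite itself:

```python
# When every term of the numerator shares a factor with the divisor, the division and
# the remainder fold to closed forms: (4*a + 2) // 2 == 2*a + 1 and (4*a + 2) % 2 == 0.
for a in range(-50, 50):
  assert (4*a + 2) // 2 == 2*a + 1
  assert (4*a + 2) % 2 == 0
```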
chenyu
0fc43c2e54 fix test_const_tensor_index index (#11660)
indices should be ints
2025-08-13 19:50:16 -04:00
chenyu
4fe19eec72 Ops.TRUNC (#11659) 2025-08-13 18:40:48 -04:00
qazal
eb10a9c76a viz: always left align timeline values (#11658) 2025-08-13 23:55:28 +03:00
George Hotz
22bdf48cdd render ranges in viz, name gbufs with sizes. changes from rangeify (#11656)
* render ranges in viz, name gbufs with sizes. changes from rangeify

* fix unit test dtypes
2025-08-13 12:46:16 -07:00
George Hotz
9b4da590bb remove need for cast_vec (#11653)
* remove need for cast_vec

* fix amdllvm
2025-08-13 12:09:47 -07:00
kevvz
e2873a3a41 [bounty] Muon optim (#11414)
* newton schulz

* add muon + move newton schulz to tensor

* compact newton schulz

* better tests

* cleanup

* add comments for muon

* cleanup

* add export with tests

* match muon optim with test optim

* cleanup

* unused import

* correct comment

* whitespace

* move export

* muon test fix

* match reference impl + tests

* remove export by moving muon device

* add credit

* cleanup

* remove print

* spacing

* spacing

* comma

* cleanup

* removal

* fix tests + optim momentum

* consistent is not/ not

* more consistency

* fix test

* cleanup

* fix the nones

* remove comment

* cast

* comment

* comment

* muon teeny test

* muon flag beautiful mnist

* set steps

* steps as hyperparam

* match default test steps

* name

* large cleanup

* dont care about steps

* nesterov false default

* match each other impl

* steps

* switch nest

* swap defaults

* update docstring

* add no nesterov test

* ban fuse_optim

* prints

* classical momentum

* alternative condition

* recon

* pre + post wd

* false default

* detach

* signature changes

* context

* swap order

* big cleanup

* 0 step instead

* parity

* remove fuse

* remove fused

* better paper

* assert message

* correct shape check + eps

* multidim

* add eps

* cleanup

* correct assert message

* lint

* better tests

* naming

* ns_steps,ns_params

* update docstring

* docstring

* match sgd and muon together

* sandwich

* add back fused

* parity

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-13 14:27:55 -04:00
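
The Muon commit above centres on Newton-Schulz orthogonalization of the update matrix. A minimal NumPy sketch follows, using the coefficients from the public Muon reference implementation; the Tensor-based version and defaults added in this PR may differ:

```python
import numpy as np

def newton_schulz(G: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
  a, b, c = 3.4445, -4.7750, 2.0315        # quintic iteration coefficients (reference impl)
  X = G / (np.linalg.norm(G) + eps)        # normalize so the iteration converges
  transposed = X.shape[0] > X.shape[1]
  if transposed: X = X.T                   # iterate on the wide orientation
  for _ in range(steps):
    A = X @ X.T
    X = a * X + (b * A + c * A @ A) @ X
  return X.T if transposed else X

# singular values of the result are pushed toward 1, i.e. the update is near-orthogonal
X = newton_schulz(np.random.randn(4, 8))
print(np.linalg.svd(X, compute_uv=False))
```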
chenyu
94e6d84e32 rewrite Tensor.round to not use cast int (#11654) 2025-08-13 13:51:08 -04:00
George Hotz
d2521d828a transcendental+idiv+threefry are uop decompositions (#11636)
* transcendental+idiv+threefry are uop decompositions [pr]

* threefry decomp

* fix randomness tests

* fix webgpu

* unneeded now

* fix

* move prematcher

* all cast should probably be cast_vec
2025-08-13 09:37:12 -07:00
geohotstan
cf7224ce3e fully lint onnx.py (#11647)
* mypy

* ruff ruff ruff
2025-08-13 08:22:06 -07:00
geohotstan
925555b62a Fix onnx Domain bug (#11650) 2025-08-13 08:20:50 -07:00
Sieds Lykles
67df617fe1 add launch bounds to ptx (#11646) 2025-08-13 13:05:39 +02:00
qazal
88f95e9f59 viz: minor fixups for firefox (#11645)
* fix circle attr

* set fill color
2025-08-13 12:59:28 +03:00
qazal
6f88eac0fc viz: refactor node and edge tagging (#11644) 2025-08-13 12:41:01 +03:00
qazal
8140bf9778 viz: create layout once (#11643)
* start

* work

* works

* diff cleanup
2025-08-13 09:24:58 +03:00
chenyu
3fb79bb43a minor onnx cleanups (#11642) 2025-08-13 01:05:19 -04:00
chenyu
e9e5a08a04 simplify onnx cubic (#11641)
we can drop the double where and abs since we know which ranges the inputs map into
2025-08-12 19:57:31 -04:00
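
A small illustration of the reasoning in the commit message: the Keys cubic kernel is piecewise in |x|, but the four sample offsets produced by Resize have known signs and known branches, so the abs and the double where collapse. Illustrative Python only; the real code works on Tensors:

```python
def keys(x, a=-0.75):                      # cubic convolution kernel (ONNX default a=-0.75)
  x = abs(x)
  if x <= 1: return (a + 2)*x**3 - (a + 3)*x**2 + 1
  if x < 2:  return a*x**3 - 5*a*x**2 + 8*a*x - 4*a
  return 0.0

t = 0.3                                    # fractional sample position, known to be in [0, 1)
# distances to the four neighbours: t+1 in [1,2), t in [0,1), 1-t in (0,1], 2-t in [1,2],
# so each call lands in one known branch and abs() is a no-op.
w = [keys(t + 1), keys(t), keys(1 - t), keys(2 - t)]
print(w, sum(w))                           # the four weights sum to 1 (up to float rounding)
```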
George Hotz
18cdbec447 split decompositions pass (#11638)
* split decompositions pass

* fix ptx

* pack load store early

* restore that
2025-08-12 12:56:05 -07:00
chenyu
0d8a0d7a96 update test_multi_const_folding_tensor to include pow (#11635)
pow folds now
2025-08-12 13:35:37 -04:00
Sieds Lykles
4d6e407eb0 Extend fast_idiv to negative ints (#11632)
* fast idiv for signed ints

* Add rule and test

* fix tests

* redo fuzz_fast_idiv to do negative ints as well

* adjust comments

* remove unused imports
2025-08-12 19:34:49 +02:00
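
For context on fast_idiv: division by a known constant can be replaced by a multiply and a shift. A minimal brute-force sketch of the unsigned case follows (hypothetical helper name, exhaustively verified over a small range rather than derived); extending it to negative ints, as this PR does, additionally has to respect the rounding direction, which is not shown here:

```python
def find_magic(d: int, n_bits: int = 16) -> tuple[int, int]:
  # search for (mul, shift) such that (x * mul) >> shift == x // d for all 0 <= x < 2**n_bits
  for shift in range(1, 2 * n_bits + 1):
    mul = (1 << shift) // d + 1
    if all((x * mul) >> shift == x // d for x in range(1 << n_bits)):
      return mul, shift
  raise ValueError("no magic pair found in the searched range")

mul, shift = find_magic(7)
assert (12345 * mul) >> shift == 12345 // 7   # division replaced by mul + shift
print(mul, shift)
```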
qazal
17adbe86d8 hotfix: do not default to capturing args in track_rewrites (#11634) 2025-08-12 20:01:24 +03:00
geohotstan
ad9dec25b3 combine onnx parser and onnx (#11485)
* start

* more

* fix onnx_runner test

* pass

* patch for disk and add domains from huggingface

* simpler docs

* revert domain changes

* rerun ci

* revert onnx ops test change

* add fix from strenum stuff

* correct way

* revert correct way to leave the fix for another PR

* test segfault

* Revert "test segfault"

This reverts commit 4e1aaf41e7.

* remove some unnecessary documentation

* test segfault again

* Revert "test segfault again"

This reverts commit 56fc5f03e7.

* try gemini suggested patch for sys._getframe

* keep trying with gemini

* revert not working gemini suggestions and try faulthandler

* remove pythonfaulthandler

* trigger CI a few times

* minimize diff

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-12 12:56:39 -04:00
Sieds Lykles
4c3982c44e Take sign out of mod (#11631)
* Add rule and test

* fix tests
2025-08-12 18:44:36 +02:00
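
The arithmetic behind "Take sign out of mod", assuming truncated (C-style) division where the remainder takes the sign of the numerator; a quick check in plain Python, not the actual UOp rewrite:

```python
def trunc_mod(x: int, c: int) -> int:
  # remainder under truncated division (quotient rounds toward zero, like C)
  q = abs(x) // abs(c)
  if (x < 0) != (c < 0): q = -q
  return x - q * c

# for c > 0, x % c == sign(x) * (|x| % c), so the sign can be pulled out and the
# inner mod treated as non-negative by the simplifier
for x in range(-30, 30):
  for c in (3, 7, 12):
    assert trunc_mod(x, c) == (-1 if x < 0 else 1) * (abs(x) % c)
```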
qazal
e28605e324 rename profile point event fields [pr] (#11633) 2025-08-12 19:11:21 +03:00
nimlgen
8a7be0a747 metal: workaround for transfers sync issue (#11622)
* metal: workaround for transfers sync issue

* metal transfer sync is broken

* hm

* rm it?

* keep it
2025-08-12 16:16:34 +03:00
qazal
efe8b5611d move ProfilePointEvent out of device.py [pr] (#11630)
Generic profiling events exist in helpers so they can be imported from
everywhere in tinygrad.
2025-08-12 09:58:32 +03:00
chenyu
0d7075f2de assign should broadcast input tensor (#11629)
fixed test_assign_broadcast
2025-08-11 23:36:35 -04:00
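
A minimal sketch of the behaviour the assign fix targets, assuming the usual tinygrad Tensor API; exact realization details omitted:

```python
from tinygrad import Tensor

t = Tensor.zeros(2, 3).contiguous()
t.assign(Tensor.ones(3))     # the (3,) value is broadcast to the (2, 3) target
print(t.numpy())             # expected: a 2x3 array of ones
```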
Joshua Kissoon
c44760c89d torch backend: fix arange, add linalg.cross, add tests (#11628) 2025-08-11 23:34:41 -04:00
George Hotz
ca41b5e38b skip_0 in graph rewrite [pr] (#11627)
* skip_0 in graph rewrite [pr]

* no track_rewrites on test

* use dict instead of set
2025-08-11 18:29:04 -07:00
Sardor
ca7a641442 fix bugs at examples/yolov3.py (#11614)
* Update load_weight. Give valid model url

* Fix bug in iou function
2025-08-11 21:14:47 -04:00
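
For reference, the quantity the yolov3 example's iou function computes is the standard intersection-over-union of two axis-aligned boxes; a minimal standalone version (corner-format boxes, not the exact code touched in the PR):

```python
def iou(a, b):
  # a, b are (x1, y1, x2, y2) corner boxes
  iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
  ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
  inter = iw * ih
  union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
  return inter / union

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1 unit of overlap over a union of 7 -> ~0.143
```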
chenyu
0c97d6de1b don't round pow output for int pow int (#11625)
also added atol=0 and big pows for the tests
2025-08-11 20:57:47 -04:00
chenyu
d623f6d850 support int Tensor pow to const non-negative int (#11624)
matches torch
2025-08-11 19:50:19 -04:00
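
The two pow commits above describe int ** int semantics; a minimal sketch of the intended behaviour, assuming the usual tinygrad Tensor API:

```python
from tinygrad import Tensor, dtypes

x = Tensor([2, 3, 10], dtype=dtypes.int32)
print((x ** 3).numpy())   # expected [8, 27, 1000]: stays integer, matches torch, no rounding applied
```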
chenyu
857a830dcc fix test_arange_float_step (#11623) 2025-08-11 16:58:42 -04:00
chenyu
0806677b51 rewrite sort idx (#11613) 2025-08-11 16:20:56 -04:00
George Hotz
700c11597b switch contextvars.ContextVar to _ContextVar (#11621) 2025-08-11 12:20:09 -07:00
ttomsa
ae0c3cfff6 change clang -march flag to -mcpu on arm (#10970)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-11 13:38:48 -04:00
geohotstan
27bcb9fd1c Support cubic mode for ONNX Resize OP (#11612)
* start

* add reference

* this is so much slower

* this makes sense but differs from the official impl, yet results are still correct...?

* add a comment

* Just keep it simple for now since I don't fully get it yet

* address comments

* correct

* teeny clean up

* another small comment improvement lol
2025-08-11 11:49:30 -04:00
nimlgen
d2bb1bcb97 cloud: a bit better err handling (#11616)
* cloud: err propagation to client

* fix

* print exc

* linter

* excs

* fix

* hm

* flaky
2025-08-11 15:51:22 +03:00
qazal
6a232ccdac viz: add tiny range drawing helper (#11620)
* viz: add tiny range drawing helper

* less
2025-08-11 15:15:43 +03:00
qazal
e768773e13 viz: use colors helper (#11618) 2025-08-11 13:10:15 +03:00
qazal
7d6c0a8cc7 viz: refactor progress msg (#11617) 2025-08-11 13:01:36 +03:00
chenyu
630edcffd8 remove .float calls in olmoe (#11610)
still matches torch
2025-08-10 20:33:22 -04:00
chenyu
a67e0917c3 list indexing can normalize in python (#11609)
* list indexing can normalize in python

list index does not need to be normalized in tensor

* update those
2025-08-10 20:02:38 -04:00
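
The point of "list indexing can normalize in python": indices given as a Python list are concrete ints, so negative values can be folded in Python before any tensor code is emitted. A tiny illustration:

```python
dim_size = 5
idx = [0, -1, 2, -4]
normalized = [i if i >= 0 else i + dim_size for i in idx]
print(normalized)   # [0, 4, 2, 1] -- no tensor-side normalization needed
```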
chenyu
1181ec0cd2 few more tensor indexing test cases (#11608) 2025-08-10 18:56:42 -04:00
George Hotz
996c907c0b rewrite not ready + children machinery (#11607)
* rewrite not ready + children machinery

* it doesn't like track rewrites
2025-08-10 15:28:30 -07:00