nimlgen
bf0c45fd16
system: resource_resize might be unavail ( #11680 )
2025-08-15 22:03:23 +03:00
George Hotz
4ab9fb2edd
explicit fixed point rewrite ( #11685 )
* explicit fixed point rewrite
* local cache
* fix that
2025-08-15 11:08:41 -07:00
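A note on the commit above: an explicit fixed point rewrite keeps re-applying the rewrite rules until the graph stops changing, rather than assuming a single pass is enough (tinygrad's real rewriter works on UOp graphs with a PatternMatcher and, per the commit, a local cache). A toy string-based sketch of that loop, purely illustrative:

```python
def rewrite_fixed_point(expr: str, rules: list[tuple[str, str]], max_iters: int = 100) -> str:
  # keep applying the rules until the expression stops changing: an explicit fixed point,
  # instead of hoping one bottom-up pass already reached it
  for _ in range(max_iters):
    new = expr
    for pat, rep in rules: new = new.replace(pat, rep)
    if new == expr: return expr
    expr = new
  raise RuntimeError("rewrite did not reach a fixed point")

# constant folding that only terminates at a fixed point: folding 1+1 exposes a new 2+2
assert rewrite_fixed_point("1+1+1+1", [("2+2", "4"), ("1+1", "2")]) == "4"
```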
chenyu
5d6963c968
RuntimeError for unsupported dtype in PYTHON ( #11686 )
2025-08-15 13:59:27 -04:00
nimlgen
b970cd6895
am: fix psp ring completion ( #11679 )
* am: psp ring timeout + fix 0 fence_value
* no sleep
2025-08-15 20:15:49 +03:00
qazal
c8ba48b223
show rewrite errors in viz ( #11684 )
2025-08-15 19:09:47 +03:00
George Hotz
560984fd8d
small changes from rangeify ( #11682 )
* small changes from rangeify
* const like thing
* ksym
2025-08-15 08:45:52 -07:00
chenyu
d0d39885c3
onnx in tinygrad ( #11675 )
2025-08-14 19:57:21 -04:00
wozeparrot
71260a5ea4
feat: only bench openpilot 0.9.9 models ( #11664 )
2025-08-14 19:27:18 -04:00
chenyu
4ddefbccb4
update setup packages ( #11674 )
sorted the package list and added the missing 'tinygrad.frontend' and 'tinygrad.runtime.autogen.nv'
2025-08-14 19:24:57 -04:00
chenyu
48c4033ae1
fix pylint for onnx ( #11673 )
* fix pylint for onnx
* too long
2025-08-14 18:48:02 -04:00
chenyu
e9d0027591
llama MP realize weight after shard ( #11672 )
* llama MP realize weight after shard
prevents memory spike on device 0
* empty weight for FAKEDATA
2025-08-14 16:17:46 -04:00
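A sketch of the pattern behind the fix above: shard the weight across devices first and only then realize it, so the full tensor is never materialized on a single device. The device list and shape are illustrative, not taken from the llama example itself:

```python
from tinygrad import Tensor, Device

GPUS = tuple(f"{Device.DEFAULT}:{i}" for i in range(4))   # illustrative device list

w = Tensor.empty(4096, 4096)      # weight stays lazy here
# realizing before the shard would put the whole (4096, 4096) buffer on one device;
# sharding first and realizing after keeps only one slice resident per device
w.shard_(GPUS, axis=0).realize()
```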
nimlgen
4176b24264
amd: support xcc in regs ( #11670 )
* amd: support xcc in regs
* mockamd
* typo
2025-08-14 21:20:11 +03:00
Sieds Lykles
f399d0d75d
Render mod in terms of idiv ( #11668 )
* Render mod in terms of idiv
* cvar -> var
2025-08-14 19:59:39 +02:00
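The rewrite above leans on the standard identity x % y == x - (x // y) * y, which lets a backend without a modulo op emit it from division, multiply, and subtract. A minimal check of the identity under C-style truncated division; this shows the idea, not the renderer code itself:

```python
def trunc_div(x: int, y: int) -> int:
  # C-style integer division: round toward zero
  q = abs(x) // abs(y)
  return q if (x < 0) == (y < 0) else -q

def mod_via_idiv(x: int, y: int) -> int:
  # render x % y without a modulo instruction
  return x - trunc_div(x, y) * y

# C-style remainder takes the sign of the numerator
assert mod_via_idiv(7, 3) == 1 and mod_via_idiv(-7, 3) == -1 and mod_via_idiv(7, -3) == 1
```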
nimlgen
d747eeed32
amd logs parser based on device ( #11669 )
2025-08-14 19:49:33 +03:00
geohotstan
1e904155e3
Add Onnx Huggingface to test/models/test_onnx.py ( #11468 )
* BOOM
* cache extra/huggingface/models/
* why max buffer size is not 0
* override MAX_BUFFER_SIZE
* less models
* remove more models and change cache dir to already cached dir
* only metal
* less is more?
* remove check ops
* why is this not setting the ENVVAR
* ughhhhh just test in models
* only cpu and gpu
* only cpu actually
* just override it idk
* final
* move extra dependencies up top
* simplification
* fix print
* make README better
* revert ops_disk fix for now
* clean up test_onnx
* remove testing fashion clip model cuz sloooowwwwww
* actually let METAL run this
* fix comment mistake
* fix download path in run_models
* does this work?
* cleanup setup and teardown
* contextvar like this?
* prove model is cached
* do I need to increment DOWNLOAD_CACHE_VERSION?
* see if cached with incremented DOWNLOAD_CACHE_VERSION
* use warnings to see if the model exists
* revert DOWNLOAD_CACHE_VERSION stuff and clean up
* add retry to download
* nit
2025-08-14 11:16:41 -04:00
Sieds Lykles
06beeb6e13
Nest div even if factor is negative ( #11666 )
2025-08-14 13:58:59 +02:00
Sieds Lykles
661e9a2d5d
div_and_mod_folding refactor ( #11585 )
* divmod const folding is its own function
* split nested mod optimization out of div and mod folding
* make `fold_binary_numerator` its own function
* factor out `fold_divmod_congruence`
* check sign of numerator
* add tests
* assert int on vmin and vmax
* add type: ignore
* factor out more rules
* remove div_and_mod_folding
* cached_property to property
* remove import
* add returns
* restore old order
* check sign of x.vmin and newx.vmin
* check more signs
* add some test that would have caught bugs
* better test if the div simplified
* shorten line
* replace terms_factors_const with pop_const
* move that back
* minor cleanup
* remove comments
* some cleanup
2025-08-14 11:52:42 +02:00
chenyu
0fc43c2e54
fix test_const_tensor_index index ( #11660 )
indices should be ints
2025-08-13 19:50:16 -04:00
chenyu
4fe19eec72
Ops.TRUNC ( #11659 )
2025-08-13 18:40:48 -04:00
qazal
eb10a9c76a
viz: always left align timeline values ( #11658 )
2025-08-13 23:55:28 +03:00
George Hotz
22bdf48cdd
render ranges in viz, name gbufs with sizes. changes from rangeify ( #11656 )
* render ranges in viz, name gbufs with sizes. changes from rangeify
* fix unit test dtypes
2025-08-13 12:46:16 -07:00
George Hotz
9b4da590bb
remove need for cast_vec ( #11653 )
* remove need for cast_vec
* fix amdllvm
2025-08-13 12:09:47 -07:00
kevvz
e2873a3a41
[bounty] Muon optim ( #11414 )
* newton schulz
* add muon + move newton schulz to tensor
* compact newton schulz
* better tests
* cleanup
* add comments for muon
* cleanup
* add export with tests
* match muon optim with test optim
* cleanup
* unused import
* correct comment
* whitespace
* move export
* muon test fix
* match reference impl + tests
* remove export by moving muon device
* add credit
* cleanup
* remove print
* spacing
* spacing
* comma
* cleanup
* removal
* fix tests + optim momentum
* consistent is not/ not
* more consistency
* fix test
* cleanup
* fix the nones
* remove comment
* cast
* comment
* comment
* muon teeny test
* muon flag beautiful mnist
* set steps
* steps as hyperparam
* match default test steps
* name
* large cleanup
* dont care about steps
* nesterov false default
* match each other impl
* steps
* switch nest
* swap defaults
* update docstring
* add no nesterov test
* ban fuse_optim
* prints
* classical momentum
* alternative condition
* recon
* pre + post wd
* false default
* detach
* signature changes
* context
* swap order
* big cleanup
* 0 step instead
* parity
* remove fuse
* remove fused
* better paper
* assert message
* correct shape check + eps
* multidim
* add eps
* cleanup
* correct assert message
* lint
* better tests
* naming
* ns_steps,ns_params
* update docstring
* docstring
* match sgd and muon together
* sandwich
* add back fused
* parity
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-13 14:27:55 -04:00
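For reference, the core of Muon is orthogonalizing the momentum matrix with a few iterations of a quintic Newton-Schulz polynomial before applying the update. A numpy sketch using the coefficients from Keller Jordan's Muon write-up; the Tensor-based version added in this PR may differ in dtype handling and details:

```python
import numpy as np

def newton_schulz(G: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
  # approximately orthogonalize G: drive its singular values toward 1
  a, b, c = 3.4445, -4.7750, 2.0315          # quintic coefficients from the Muon write-up
  X = G / (np.linalg.norm(G) + eps)          # normalize so the iteration converges
  transposed = X.shape[0] > X.shape[1]
  if transposed: X = X.T                     # iterate on the wide orientation
  for _ in range(steps):
    A = X @ X.T
    X = a * X + (b * A + c * (A @ A)) @ X    # X <- a*X + b*(XX^T)X + c*(XX^T)^2 X
  return X.T if transposed else X
```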
chenyu
94e6d84e32
rewrite Tensor.round to not use cast int ( #11654 )
2025-08-13 13:51:08 -04:00
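Torch-style rounding is round-half-to-even, and it can be written with only floor, comparison, and float arithmetic, i.e. no intermediate int cast. A plain-Python sketch of that identity; the actual Tensor.round rewrite may be expressed differently:

```python
def round_half_even(x: float) -> float:
  # round-half-to-even (banker's rounding) using only float ops, no int cast
  b = x // 1.0                                  # floor as a float
  f = x - b                                     # fractional part in [0, 1)
  odd = (b - 2.0 * ((b / 2.0) // 1.0)) == 1.0   # is floor(x) odd?
  return b + (1.0 if f > 0.5 or (f == 0.5 and odd) else 0.0)

assert [round_half_even(v) for v in (2.5, 3.5, -2.5, -3.5)] == [2.0, 4.0, -2.0, -4.0]
```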
George Hotz
d2521d828a
transcendental+idiv+threefry are uop decompositions ( #11636 )
* transcendental+idiv+threefry are uop decompositions [pr]
* threefry decomp
* fix randomness tests
* fix webgpu
* unneeded now
* fix
* move prematcher
* all cast should probably be cast_vec
2025-08-13 09:37:12 -07:00
geohotstan
cf7224ce3e
fully lint onnx.py ( #11647 )
* mypy
* ruff ruff ruff
2025-08-13 08:22:06 -07:00
geohotstan
925555b62a
Fix onnx Domain bug ( #11650 )
2025-08-13 08:20:50 -07:00
Sieds Lykles
67df617fe1
add launch bounds to ptx ( #11646 )
2025-08-13 13:05:39 +02:00
qazal
88f95e9f59
viz: minor fixups for firefox ( #11645 )
* fix circle attr
* set fill color
2025-08-13 12:59:28 +03:00
qazal
6f88eac0fc
viz: refactor node and edge tagging ( #11644 )
2025-08-13 12:41:01 +03:00
qazal
8140bf9778
viz: create layout once ( #11643 )
* start
* work
* works
* diff cleanup
2025-08-13 09:24:58 +03:00
chenyu
3fb79bb43a
minor onnx cleanups ( #11642 )
2025-08-13 01:05:19 -04:00
chenyu
e9e5a08a04
simplify onnx cubic ( #11641 )
we can drop the double where and abs since we know which ranges the inputs map into
2025-08-12 19:57:31 -04:00
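For context: ONNX's cubic resize uses the Keys cubic convolution kernel (cubic_coeff_a defaults to -0.75), which is piecewise in |x| with one branch for |x| < 1 and another for 1 <= |x| < 2. The four taps around a sample with fractional offset f in [0, 1) sit at distances f, 1-f, 1+f and 2-f, so each tap's branch is known up front and the abs plus the nested where can be dropped. A sketch of the per-tap weights, not the exact code in onnx.py:

```python
def cubic_weights(f: float, a: float = -0.75) -> list[float]:
  # Keys cubic kernel; for offset f in [0, 1) the middle two taps always use the
  # inner branch and the outer two always use the outer branch
  def inner(d): return (a + 2) * d**3 - (a + 3) * d**2 + 1          # 0 <= d < 1
  def outer(d): return a * d**3 - 5 * a * d**2 + 8 * a * d - 4 * a  # 1 <= d < 2
  return [outer(1 + f), inner(f), inner(1 - f), outer(2 - f)]

# the weights of a valid interpolation kernel sum to 1
assert abs(sum(cubic_weights(0.25)) - 1.0) < 1e-9
```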
George Hotz
18cdbec447
split decompositions pass ( #11638 )
* split decompositions pass
* fix ptx
* pack load store early
* restore that
2025-08-12 12:56:05 -07:00
chenyu
0d8a0d7a96
update test_multi_const_folding_tensor to include pow ( #11635 )
pow folds now
2025-08-12 13:35:37 -04:00
Sieds Lykles
4d6e407eb0
Extend fast_idiv to negative ints ( #11632 )
* fast idiv for signed ints
* Add rule and test
* fix tests
* redo fuzz_fast_idiv to do negative ints as well
* adjust comments
* remove unused imports
2025-08-12 19:34:49 +02:00
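fast_idiv trades a division by a constant for a multiply by a precomputed magic number and a shift, valid whenever the operand range keeps the approximation exact. One plausible way to extend it to negative numerators, shown below, is to factor out the sign and reuse the non-negative path (assuming C-style truncated division); the magic constant is the classic one for dividing by 5 and only illustrates how such a rewrite can be validated:

```python
def trunc_div(x: int, d: int) -> int:
  # C-style integer division: round toward zero
  q = abs(x) // abs(d)
  return q if (x < 0) == (d < 0) else -q

def fast_udiv(x: int, m: int, s: int) -> int:
  # non-negative x only: (x * m) >> s with m ~= 2**s / d
  return (x * m) >> s

def fast_idiv(x: int, d: int, m: int, s: int) -> int:
  # extend to negative numerators by taking the sign out first
  return -fast_udiv(-x, m, s) if x < 0 else fast_udiv(x, m, s)

m, s, d = 0xCCCCCCCD, 34, 5   # classic magic for dividing by 5, exact for 0 <= x <= 2**32
assert all(fast_idiv(x, d, m, s) == trunc_div(x, d) for x in range(-10_000, 10_001))
```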
qazal
17adbe86d8
hotfix: do not default to capturing args in track_rewrites ( #11634 )
2025-08-12 20:01:24 +03:00
geohotstan
ad9dec25b3
combine onnx parser and onnx ( #11485 )
* start
* more
* fix onnx_runner test
* pass
* patch for disk and add domains from huggingface
* simpler docs
* revert domain changes
* rerun ci
* revert onnx ops test change
* add fix from strenum stuff
* correct way
* revert correct way to leave the fix for another PR
* test segfault
* Revert "test segfault"
This reverts commit 4e1aaf41e7.
* remove some unnecessary documentation
* test segfault again
* Revert "test segfault again"
This reverts commit 56fc5f03e7.
* try gemini suggested patch for sys._getframe
* keep trying with gemini
* revert not working gemini suggestions and try faulthandler
* remove pythonfaulthandler
* trigger CI a few times
* minimize diff
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-12 12:56:39 -04:00
Sieds Lykles
4c3982c44e
Take sign out of mod ( #11631 )
* Add rule and test
* fix tests
2025-08-12 18:44:36 +02:00
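Under truncated (C-style) semantics the remainder carries the numerator's sign, so a mod with a possibly negative numerator can be reduced to the non-negative case: for x < 0, x % d == -((-x) % d). A small check of that identity; whether this is the exact rule the PR adds is an assumption:

```python
def trunc_div(x: int, d: int) -> int:
  q = abs(x) // abs(d)                      # C-style: round toward zero
  return q if (x < 0) == (d < 0) else -q

def trunc_mod(x: int, d: int) -> int:
  return x - trunc_div(x, d) * d            # remainder takes the numerator's sign

# taking the sign out of mod: a negative numerator reduces to the non-negative case
assert all(trunc_mod(x, 7) == -trunc_mod(-x, 7) for x in range(-100, 0))
```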
qazal
e28605e324
rename profile point event fields [pr] ( #11633 )
2025-08-12 19:11:21 +03:00
nimlgen
8a7be0a747
metal: workaround for transfers sync issue ( #11622 )
* metal: workaround for transfers sync issue
* metal transfer sync is broken
* hm
* rm it?
* keep it
2025-08-12 16:16:34 +03:00
qazal
efe8b5611d
move ProfilePointEvent out of device.py [pr] ( #11630 )
Generic profiling events exist in helpers so they can be imported from
everywhere in tinygrad.
2025-08-12 09:58:32 +03:00
chenyu
0d7075f2de
assign should broadcast input tensor ( #11629 )
fixed test_assign_broadcast
2025-08-11 23:36:35 -04:00
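A minimal illustration of what the fix enables, assuming standard Tensor.assign usage; the shapes here are made up:

```python
from tinygrad import Tensor

buf = Tensor.zeros(4, 4).contiguous().realize()
# the (4,)-shaped right-hand side is broadcast up to (4, 4) before being assigned
buf.assign(Tensor.ones(4)).realize()
print(buf.numpy())
```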
Joshua Kissoon
c44760c89d
torch backend: fix arange, add linalg.cross, add tests ( #11628 )
2025-08-11 23:34:41 -04:00
George Hotz
ca41b5e38b
skip_0 in graph rewrite [pr] ( #11627 )
* skip_0 in graph rewrite [pr]
* no track_rewrites on test
* use dict instead of set
2025-08-11 18:29:04 -07:00
Sardor
ca7a641442
fix bugs at examples/yolov3.py ( #11614 )
* Update load_weight. Give valid model url
* Fix bug in iou function
2025-08-11 21:14:47 -04:00
chenyu
0c97d6de1b
don't round pow output for int pow int ( #11625 )
also added atol=0 and big pows for the tests
2025-08-11 20:57:47 -04:00
chenyu
d623f6d850
support int Tensor pow to const non-negative int ( #11624 )
matches torch
2025-08-11 19:50:19 -04:00
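Because the exponent is a compile-time constant, an integer pow can be lowered to a short chain of multiplies via exponentiation by squaring, which also keeps integer-in, integer-out semantics like torch. A generic sketch of that idea, not necessarily the decomposition tinygrad emits:

```python
def pow_const(x, n: int):
  # exponentiation by squaring for a constant exponent n >= 0;
  # for a known n this unrolls to O(log n) multiplies
  assert n >= 0
  result, base = 1, x
  while n:
    if n & 1: result = result * base
    base, n = base * base, n >> 1
  return result

assert pow_const(3, 0) == 1 and pow_const(-2, 5) == -32 and pow_const(7, 4) == 2401
```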
chenyu
857a830dcc
fix test_arange_float_step ( #11623 )
2025-08-11 16:58:42 -04:00
chenyu
0806677b51
rewrite sort idx ( #11613 )
2025-08-11 16:20:56 -04:00