qazal
b12d1d866c
count bytes per kernel in test_viz ( #11801 )
...
Currently at ~100 bytes/kernel with JSON.
2025-08-23 23:35:27 +03:00
Sieds Lykles
6a50ab6b87
adjust idiv min_max ( #11802 )
...
* change div min_max
* add tests
2025-08-23 22:25:51 +02:00
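A hedged sketch of what a min/max rule for integer division looks like (this is illustrative, not tinygrad's actual `idiv` code, and it uses Python's floor-division semantics): for `x` in `[lo, hi]` and a nonzero constant `c`, division by `c` is monotonic in `x`, so the interval endpoints bound the result.

```python
# Illustrative sketch (not tinygrad's implementation): bounds of x // c
# when x ranges over [lo, hi] and c is a nonzero constant. Floor
# division by c is monotonic in x (nondecreasing for c > 0,
# nonincreasing for c < 0), so the endpoints give the bounds.
def idiv_min_max(lo: int, hi: int, c: int) -> tuple[int, int]:
  assert c != 0 and lo <= hi
  a, b = lo // c, hi // c
  return (a, b) if a <= b else (b, a)

# brute-force check against every value in the interval
for lo in range(-6, 7):
  for hi in range(lo, 7):
    for c in [-3, -2, -1, 1, 2, 3]:
      vals = [x // c for x in range(lo, hi + 1)]
      assert idiv_min_max(lo, hi, c) == (min(vals), max(vals))
```

Note that tinygrad's `Ops.IDIV` truncates toward zero rather than flooring, so the real rule has to be adjusted for that; the endpoint-monotonicity idea is the same.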
chenyu
9d4cccd0f9
test_dtype_alu cleanups ( #11799 )
2025-08-23 15:11:17 -04:00
George Hotz
aefabaf774
add AxisType to range ( #11798 )
...
* add AxisType to range
* missed them
* fix that test
* fix that test
2025-08-23 11:15:00 -07:00
qazal
b975830424
add profile loader helper in test_viz ( #11797 )
2025-08-23 19:20:29 +03:00
chenyu
7123df3928
Use Tensor.logaddexp to implement Tensor.softplus ( #11796 )
...
instead of a piecewise-linear formulation, numerical stability is handled by logaddexp. jax does this and i think it's more elegant than torch's approach
2025-08-23 11:52:29 -04:00
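The idea behind the change can be sketched in a few lines: `softplus(x) = log(1 + e^x)` is exactly `logaddexp(x, 0)`, and logaddexp stays finite where the naive formula overflows, because it factors the max out before exponentiating (a minimal sketch, not tinygrad's tensor implementation):

```python
import math

# softplus(x) = log(1 + exp(x)) = logaddexp(x, 0).
# naive log(1 + exp(x)) overflows for large x; logaddexp pulls the max
# out first: logaddexp(a, b) = m + log1p(exp(n - m)) with m = max(a, b).
def logaddexp(a: float, b: float) -> float:
  m, n = max(a, b), min(a, b)
  return m + math.log1p(math.exp(n - m))

def softplus(x: float) -> float:
  return logaddexp(x, 0.0)

# for large x the exp underflows to 0 instead of overflowing:
# softplus(1000.0) == 1000.0, while math.exp(1000.0) raises OverflowError
```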
chenyu
fb8ee02424
Tensor.logaddexp ( #11793 )
2025-08-23 09:15:00 -04:00
Sieds Lykles
5a6817d5f8
Fix z3 rendering of floats in indexing ( #11740 )
...
* Fix floating point comparison in indexing
* wrap in noop
* update tests
* improve rules for loading and comparing floats
* add test cast to bool
2025-08-23 05:56:19 +02:00
chenyu
e39b25cd36
upcast float exp to at least float32 ( #11758 )
...
* upcast float exp to at least float32
* unlucky seed
2025-08-22 20:16:34 -04:00
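The motivation can be sketched with a toy experiment (a hypothetical truncated-series evaluation, not tinygrad's actual exp approximation): evaluating a polynomial with every intermediate rounded to half precision accumulates rounding error, while computing in wider precision and casting once at the end typically rounds only once.

```python
import math, struct

def to_half(x: float) -> float:
  # round a Python float to IEEE float16 and back (struct format 'e')
  return struct.unpack('e', struct.pack('e', x))[0]

# hypothetical truncated Taylor series for exp, NOT tinygrad's
# approximation — evaluated once with every intermediate rounded to
# half, once in full precision with a single final rounding
def exp_poly_half(x: float) -> float:
  acc = to_half(1.0)
  term = to_half(1.0)
  for k in range(1, 8):
    term = to_half(term * x / k)
    acc = to_half(acc + term)
  return acc

def exp_poly_upcast(x: float) -> float:
  acc = term = 1.0
  for k in range(1, 8):
    term = term * x / k
    acc = acc + term
  return to_half(acc)   # single rounding at the end

x = 0.5
err_half = abs(exp_poly_half(x) - math.exp(x))
err_up = abs(exp_poly_upcast(x) - math.exp(x))
```

The upcast version is bounded by half an fp16 ulp plus the series truncation error; the all-half pipeline can be several ulps off, which is the kind of drift the commit avoids.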
qazal
9ff03680ba
viz: store relative timestamps ( #11787 )
...
* viz: store relative timestamps
* err
* update test
2025-08-22 19:30:21 +03:00
geohotstan
1e679bd789
fix max_unpool2d inf ( #11784 )
...
* start
* add regression test for maxunpool2d
2025-08-22 08:31:24 -04:00
George Hotz
9832599c9e
test_vmap + permute isn't a sint ( #11783 )
...
* test_vmap + permute isn't a sint
* order
2025-08-21 22:39:35 -07:00
George Hotz
bb8de51e5f
remove unused early cleanups + contig w range [pr] ( #11780 )
...
* remove unused early cleanups [pr]
* contiguous with range
* woah, this works
2025-08-21 20:04:45 -07:00
chenyu
91a4de4ca7
fix getitem with inf in tensor ( #11781 )
2025-08-21 21:55:32 -04:00
George Hotz
5954a0975f
fix some assigns on rangeify ( #11774 )
...
* fix some assigns
* llvm test
* more tests
* upd test
2025-08-21 15:15:54 -07:00
qazal
2e0eb88549
viz: add metadata to UOp tracing ( #11772 )
...
* viz: add metadata to UOp tracing
* place after tag
* optional field
* err, refcount of root must be 0
2025-08-22 00:18:45 +03:00
George Hotz
9f94c25a25
fix symbolic usage. use shrink, not reshape ( #11762 )
...
* fix test_var
* revert those things
* fix the ones in test tiny
* use better syntax
* it's the same, but that's clearer
* fix pad
2025-08-20 18:35:42 -07:00
chenyu
5276fbc9c5
fix gather with inf values ( #11760 )
...
(mask * x) is wrong because 0*inf is nan. i feel we have a lot of those still...
2025-08-20 20:35:40 -04:00
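The failure mode is easy to reproduce in plain Python (hypothetical one-element `select` helpers for illustration): multiplying by a zero mask does not zero out an `inf`, it produces `nan`, so masked gathers must use a where-style select instead.

```python
import math

inf = float('inf')

# the mask-multiply style of select propagates nan when x holds inf,
# because 0 * inf is nan, not 0
def select_mul(mask: bool, x: float) -> float:
  return mask * x          # wrong: nan when mask is 0 and x is inf

# a where-style select never multiplies by the masked-out value
def select_where(mask: bool, x: float) -> float:
  return x if mask else 0.0
```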
George Hotz
9635592141
** rangeify, try 3 ( #11683 )
...
* ** rangeify, try 3
* bring that over
* bufferize, don't use contig tag
* work
* ish
* fix rangeify
* flash attention is back
* fix rangeify tests
* stuff passes
* fix test_log_softmax
* more stuff passes
* progress children
* new endrange solution
* progress
* progress counter
* basic assign
* contigs only
* symbolic in schedule
* unbind_kernel
* late children
* ops fixed
* beautiful mnist is close
* that seems to work
* mnist works
* improve names
* fix bmnist
* no pcontig
* testing backward
* work
* clone movement ops
* new_range helper
* MBLOCK/MERGE
* ops tests pass
* revert mblock stuff
* cleanups...but it breaks ops
* remove reindex
* hack for relu
* disable the hacks
* more hacks
* upd
* mostly works with cleanups disabled
* ndr
* ops tests pass
* terrible hacks for indexing to work
* context mismatch
* pcontig
* split pcontig v contig
* z3 trunc
* null
* no fuse in rangeify
* ops test passes
* lnorm
* fix assign
* nd rangeify
* both should work
* tests for rangeify
* cleanups
* stores pass the pointer through
* disable pcontig for now
* PARTIAL_CONTIG is a flag
2025-08-20 14:22:44 -07:00
chenyu
d7553721d1
clean up test_dtype_alu ( #11757 )
...
remove the check that looks into schedule, only test if output matches
2025-08-20 14:36:18 -04:00
qazal
de4cb722a4
viz: add metadata and var_vals tracing ( #11753 )
...
* viz: add metadata and var_vals tracing
* add test_trace_metadata
* set TRACEMETA=1
2025-08-20 18:39:51 +03:00
chenyu
be7b0b6970
TRANSCENDENTAL_SUPPORTED_DTYPES->TRANSCENDENTAL_DTYPES ( #11752 )
2025-08-20 10:29:36 -04:00
ttomsa
220a2a88d7
a*(1/b) -> a/b on LLVM, CPU ( #11743 )
...
* add fdiv rewrite
* :)
* use float_lop
* use reciprocal()
* revert
* move to decompositions
2025-08-20 09:35:10 -04:00
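A hedged note on why this rewrite is sound only at the ulp level: `a * (1/b)` rounds twice (the reciprocal, then the product), while `a / b` rounds once, so the two can differ by roughly one ulp. The classic double-precision example:

```python
# a * (1/b) rounds twice (reciprocal, then product); a / b rounds once.
# the results can differ by about one ulp, as with b = 49:
a, b = 49.0, 49.0
via_recip = a * (1.0 / b)   # two roundings
via_div = a / b             # one rounding; exactly 1.0 here
assert via_div == 1.0
assert abs(via_recip - 1.0) <= 2**-51   # within ~one ulp of 1.0
```

Presumably the win is that LLVM and CPU backends have a native fdiv, so decomposing division into `reciprocal()` then multiply is both slower and marginally less accurate there.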
George Hotz
12ab3f8b06
correct row_count in process replay ( #11748 )
2025-08-19 22:21:07 -07:00
George Hotz
8af8808c61
cleanup tests, bump caches ( #11746 )
2025-08-19 21:21:07 -07:00
George Hotz
00391db628
no ast for mem estimate ( #11744 )
...
* no ast for mem estimate
* skip for webgpu
2025-08-19 20:18:45 -07:00
ttomsa
70c3f1fb29
x.where(False, True) -> !x ( #11738 )
...
* add pat
* add test
2025-08-19 19:08:16 -04:00
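The pattern itself is a one-line boolean identity, checkable exhaustively (a toy `where` standing in for the UOp ternary):

```python
# selecting False when x is true and True when x is false is just
# boolean negation: where(x, False, True) == !x
def where(cond: bool, a: bool, b: bool) -> bool:
  return a if cond else b

for x in (False, True):
  assert where(x, False, True) == (not x)
```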
George Hotz
1d307f568c
move device tests to test/device + test cleanups ( #11735 )
...
* move device tests to test/device
* test speedups
* test device
* linalg to unit
* upd
* so pytest just works
* more divide and skip
* speed
* test devectorize
* add pillow
2025-08-19 16:02:20 -07:00
nimlgen
9c9e337c78
amd: parse soc enums ( #11727 )
...
* amd: parse soc enums
* remove from mock
* fix
* minimal amd_gpu
2025-08-19 15:06:09 +03:00
George Hotz
4b3fcb4064
Revert "REDUCE_AXIS keepdim=False ( #11311 )" ( #11718 )
...
This reverts commit b518a7378a.
2025-08-18 13:28:53 -07:00
b1tg
b518a7378a
REDUCE_AXIS keepdim=False ( #11311 )
...
* progress
* fix tests
* fix tests
* remove hack for test_symfold
* fix test_conv.py on llvm
* hack test_cache_speed
* lint
* remove hack for helper_linearizer_opt
* tests
* fix DSP
* clean up
* remove hack for kernelize.py
* hack for test/test_multitensor.py TestMultiTensor.test_matmul_shard_none
* clean
* uop.r need reshape?
* lower_store cause fail
* fix lower?
* avoid contiguous hack
* 2134
* conv2d count
* remove unused
* hack lower
* reduced and clean up
* fix TestMultiTensor.test_matmul_shard_none
* src sync + fix TestMultiTensor.test_matmul_shard_none
* remove excluded in mop
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-08-18 10:09:17 -07:00
chenyu
c30a113b2a
support bf16 and fp8 in Tensor.tolist ( #11704 )
...
memoryview does not support these dtypes, but casting works fine, so we cast first
2025-08-17 15:11:13 -04:00
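Why the cast is all that's needed: bfloat16 is just the top 16 bits of a float32, so widening is a 16-bit shift and every bf16 value is exactly representable in float32. A minimal sketch of that bit-level relationship (illustrative; tinygrad casts at the tensor level rather than per-value):

```python
import struct

# bfloat16 is the high 16 bits of a float32; widening is a left shift.
# memoryview has no bf16/fp8 format code, but the widened float32 does.
def bf16_bits_to_float(h: int) -> float:
  return struct.unpack('<f', struct.pack('<I', h << 16))[0]

assert bf16_bits_to_float(0x3F80) == 1.0       # bf16 one
assert bf16_bits_to_float(0x4049) == 3.140625  # bf16 pi, truncated fp32 pi
```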
qazal
d762edd694
viz: define tracks in python ( #11701 )
...
* viz: defines tracks in python
* update unittests
* figuring it out
* works
* diff cleanup
* math
* y axis is back
2025-08-17 18:19:13 +03:00
George Hotz
9366a23eb0
test backward in test_tiny ( #11697 )
...
* test backward in test_tiny
* empty
2025-08-16 20:29:39 -07:00
chenyu
4666df71c1
fix test_fuse_and_tc_opt ( #11699 )
2025-08-16 21:10:53 -04:00
geohotstan
3d7c35d615
add fuse and tc opt bug repro ( #11695 )
...
* FINALLY HAVE A SMALL REPRO OH BOY
* show failure in CI
* cleaner?
* 1 possible fix
* Revert "1 possible fix"
This reverts commit 9e0fd215dd.
2025-08-16 18:24:49 -04:00
qazal
c8ba48b223
show rewrite errors in viz ( #11684 )
2025-08-15 19:09:47 +03:00
George Hotz
560984fd8d
small changes from rangeify ( #11682 )
...
* small changes from rangeify
* const like thing
* ksym
2025-08-15 08:45:52 -07:00
chenyu
d0d39885c3
onnx in tinygrad ( #11675 )
2025-08-14 19:57:21 -04:00
nimlgen
4176b24264
amd: support xcc in regs ( #11670 )
...
* amd: support xcc in regs
* mockamd
* typo
2025-08-14 21:20:11 +03:00
geohotstan
1e904155e3
Add Onnx Huggingface to test/models/test_onnx.py ( #11468 )
...
* BOOM
* cache extra/huggingface/models/
* why max buffer size is not 0
* override MAX_BUFFER_SIZE
* less models
* remove more models and change cache dir to already cached dir
* only metal
* less is more?
* remove check ops
* why is this not setting the ENVVAR
* ughhhhh just test in models
* only cpu and gpu
* only cpu actually
* just override it idk
* final
* move extra dependencies up top
* simplification
* fix print
* make README better
* revert ops_disk fix for now
* clean up test_onnx
* remove testing fashion clip model cuz sloooowwwwww
* actually let METAL run this
* fix comment mistake
* fix download path in run_models
* does this work?
* cleanup setup and teardown
* contextvar like this?
* prove model is cached
* do I need to increment DOWNLOAD_CACHE_VERSION?
* see if cached with incremented DOWNLOAD_CACHE_VERSION
* use warnings to see if the model exists
* revert DOWNLOAD_CACHE_VERSION stuff and clean up
* add retry to download
* nit
2025-08-14 11:16:41 -04:00
Sieds Lykles
06beeb6e13
Nest div even if factor is negative ( #11666 )
2025-08-14 13:58:59 +02:00
Sieds Lykles
661e9a2d5d
div_and_mod_folding refactor ( #11585 )
...
* divmod const folding is its own function
* split nested mod optimization out of div and mod folding
* make `fold_binary_numerator` its own function
* factor out `fold_divmod_congruence`
* check sign of numerator
* add tests
* assert int on vmin and vmax
* add type: ignore
* factor out more rules
* remove div_and_mod_folding
* cached_property to property
* remove import
* add returns
* restore old order
* check sign of x.vmin and newx.vmin
* check more signs
* add some test that would have caught bugs
* better test if the div simplified
* shorten line
* replace terms_factors_const with pop_const
* move that back
* minor cleanup
* remove comments
* some cleanup
2025-08-14 11:52:42 +02:00
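The families of rules this refactor splits apart can be brute-force checked in plain Python (hedged examples of the rule shapes, not tinygrad's exact rule set; Python's floor-division semantics — the real rules also have to track vmin/vmax signs, which is what several of the bullets above are about):

```python
for x in range(-50, 51):
  # divmod const folding: a constant factor shared by every term and
  # the divisor cancels exactly
  assert (6*x + 4) // 2 == 3*x + 2
  # congruence folding: adding a multiple of the modulus is a no-op
  assert (x + 5*7) % 7 == x % 7
  # nested mod: an inner mod by a multiple of the outer modulus folds away
  assert (x % 21) % 7 == x % 7
```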
chenyu
0fc43c2e54
fix test_const_tensor_index index ( #11660 )
...
indices should be ints
2025-08-13 19:50:16 -04:00
chenyu
4fe19eec72
Ops.TRUNC ( #11659 )
2025-08-13 18:40:48 -04:00
George Hotz
22bdf48cdd
render ranges in viz, name gbufs with sizes. changes from rangeify ( #11656 )
...
* render ranges in viz, name gbufs with sizes. changes from rangeify
* fix unit test dtypes
2025-08-13 12:46:16 -07:00
kevvz
e2873a3a41
[bounty] Muon optim ( #11414 )
...
* newton schulz
* add muon + move newton schulz to tensor
* compact newton schulz
* better tests
* cleanup
* add comments for muon
* cleanup
* add export with tests
* match muon optim with test optim
* cleanup
* unsed import
* correct comment
* whitespace
* move export
* muon test fix
* match reference impl + tests
* remove export by moving muon device
* add credit
* cleanup
* remove print
* spacing
* spacing
* comma
* cleanup
* removal
* fix tests + optim momentum
* consistent is not/ not
* more consistency
* fix test
* cleanup
* fix the nones
* remove comment
* cast
* comment
* comment
* muon teeny test
* muon flag beautiful mnist
* set steps
* steps as hyperparam
* match default test steps
* name
* large cleanup
* dont care about steps
* nesterov false default
* match each other impl
* steps
* switch nest
* swap defaults
* update docstring
* add no nesterov test
* ban fuse_optim
* prints
* classical momentum
* alternative condition
* recon
* pre + post wd
* false default
* detach
* signature changes
* context
* swap order
* big cleanup
* 0 step instead
* parity
* remove fuse
* remove fused
* better paper
* assert message
* correct shape check + eps
* multidim
* add eps
* cleanup
* correct assert message
* lint
* better tests
* naming
* ns_steps,ns_params
* update docstring
* docstring
* match sgd and muon together
* sandwich
* add back fused
* parity
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-13 14:27:55 -04:00
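The core of Muon is Newton-Schulz orthogonalization: iterate a matrix polynomial that pushes the singular values of the update matrix toward 1, and step in the resulting approximately-orthogonal direction. A minimal pure-Python sketch of that idea (for clarity this uses the classic cubic iteration `X <- 1.5*X - 0.5*(X Xᵀ)X`; the actual Muon optimizer uses a tuned quintic with different coefficients, and tinygrad's implementation works on Tensors):

```python
def matmul(A, B):
  return [[sum(a*b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
  return [list(col) for col in zip(*A)]

def newton_schulz(G, steps=10):
  # normalize by the Frobenius norm so every singular value is <= 1,
  # which puts the cubic iteration inside its convergence region
  fro = sum(v*v for row in G for v in row) ** 0.5
  X = [[v / fro for v in row] for row in G]
  for _ in range(steps):
    XXtX = matmul(matmul(X, transpose(X)), X)
    X = [[1.5*x - 0.5*y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
  return X

G = [[3.0, 1.0], [1.0, 2.0]]
Q = newton_schulz(G)
# Q is now approximately orthogonal: Q @ Q^T ~ I
QQt = matmul(Q, transpose(Q))
```

In the optimizer, `G` is the (momentum-smoothed) gradient of a weight matrix, and the orthogonalized `Q` replaces it as the update direction.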
George Hotz
d2521d828a
transcendental+idiv+threefry are uop decompositions ( #11636 )
...
* transcendental+idiv+threefry are uop decompositions [pr]
* threefry decomp
* fix randomness tests
* fix webgpu
* unneeded now
* fix
* move prematcher
* all cast should probably be cast_vec
2025-08-13 09:37:12 -07:00
geohotstan
925555b62a
Fix onnx Domain bug ( #11650 )
2025-08-13 08:20:50 -07:00
chenyu
0d8a0d7a96
update test_multi_const_folding_tensor to include pow ( #11635 )
...
pow folds now
2025-08-12 13:35:37 -04:00