George Hotz
758a1888d6
make EMULATE a context var
2025-09-04 11:03:32 -07:00
George Hotz
09106e4aae
refactor and split test_linearizer ( #12001 )
* refactor and split test_linearizer
* forget that file
* imports
* remove from docs
* test gen float4
2025-09-04 10:53:07 -07:00
chenyu
fb71d1e5fd
delete some test_search tests ( #11998 )
TC_SEARCH_OVER_SHAPE was removed so should the tests
2025-09-04 11:19:49 -04:00
Sieds Lykles
572a3c15c6
Move Ops.SPECIAL arg to src ( #11918 )
* initial moving bound to src
* arg to src
* remove import
* fixup linearizer
* arg to src
* fix test_uop_graph
* fix more tests
* fix python renderer
* get const value from const uop
* ssimplify uop estimates
* fix webgpu locals
* fix old test
* gate Ops.SPECIAL in linearizer
* use ssimplify() for local/global_size
* remove toposort gate_parents_instead_of_self
* fix rendering in comment
* cleanup
* rename and add comments
* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
George Hotz
5cf42dc4db
add Scheduler to replace Kernel with POSTOPT=2 ( #11924 )
* ** simple kernel to replace Kernel for postopt
* support old
* fix beam
* beaming
* beam on old
* bring tensor cores back
* raise
* postbeam
* test ops passes on mac
* skip that
* postopt default
* gate that
* fix tensor cores
* a few test fixes
* dsp fix
* tc fix
* loop
* support swap
* test_gemv
* fix beam for variable
* test opts from high level stuff
* range annoying
* compile slow
* metal slow
* better beam
* no POSTBEAM
* fix nolocals
* hc opt mostly works
* put that back
* lil
* some work
* fix that
* POSTOPT 2
* fix tests
* no postopt 2
* work
* back
* padded tensors cores
* shift_to
* postopt 0 passes?
* write PADTO
* fix padded tensor cores
* compare hcopt
* 18000 lines
* should pass tests
* fix rangeify
* put types back
2025-09-03 19:23:30 -07:00
chenyu
b13e071463
move test_winograd to unit test ( #11993 )
2025-09-03 21:47:32 -04:00
chenyu
edc8b99853
more tests that pass PTX now ( #11992 )
2025-09-03 21:18:14 -04:00
chenyu
ed2f45712b
remove skip PTX in test_arange ( #11991 )
all passes now
2025-09-03 20:45:19 -04:00
George Hotz
a5f2b4872a
use_tensor_cores is a heuristic ( #11989 )
* use_tensor_cores is a heuristic
* context
2025-09-03 17:05:10 -07:00
George Hotz
63e930fec3
apply_tensor_cores is a heuristic ( #11988 )
* apply_tensor_cores is a heuristic
* delete extra_opts
2025-09-03 16:39:33 -07:00
chenyu
d0e739453e
update many einsum tests ( #11981 )
correct the exception testing, and raise ValueError instead of assert when checking args
2025-09-03 15:40:20 -04:00
b1tg
6d53cac457
dtype fuzz: log need input > 0 ( #11979 )
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-03 12:10:42 -04:00
Jordan Chalupka
68e83b850f
nbytes should raise an exception when size is unlimited ( #11928 )
* nbytes should raise an exception when size is unlimited
* adding a test
2025-09-03 07:06:20 -07:00
Sieds Lykles
86e908db57
cast parents of int64 alu to int32 if possible ( #11977 )
* add overflows helper
* add rules
* x -> y
* check overflow of u too
* cleaner
* use alu instead of replace to preserve vectorization
* just one rule
* add test
2025-09-03 11:05:04 +02:00
Sieds Lykles
033184b3cb
parse_valid with non const rhs ( #11957 )
* const to using vmin/vmax
* add test
* convert to int
* remove left over part of and
2025-09-03 08:08:46 +02:00
Sieds Lykles
53eff8970a
add Ops.GEP to _min_max ( #11976 )
2025-09-03 07:07:54 +02:00
Sieds Lykles
d1d0960e6e
remove intermediate cast using bounds - weaker pattern ( #11974 )
2025-09-03 06:24:40 +02:00
Sieds Lykles
8a2846b31a
assert embedding input is integer dtype ( #11963 )
* cast embedding input
* raise error if not using int for index embedding
2025-09-03 01:44:26 +02:00
George Hotz
1b73993521
pyrender to render uops ( #11968 )
* pyrender to render uops
* new pyrender style
* pyrender works
* list str
* store render
2025-09-02 15:44:01 -07:00
chenyu
69dd1817d0
raise RuntimeError in merge_dicts instead of assert [pr] ( #11965 )
2025-09-02 17:18:44 -04:00
qazal
f750c15965
viz: add python marker ( #11952 )
* viz: add python marker
* remove duplicate
2025-09-02 23:44:00 +03:00
George Hotz
550cf2ca7f
tests from postopt ( #11964 )
* tests from postopt
* reraise is fine
2025-09-02 13:34:17 -07:00
nimlgen
897254ad6c
ci: add dev<->cpu copy speeds ( #11959 )
2025-09-02 15:22:44 +03:00
George Hotz
0dfca4e74b
add failing test for rangeify setitem ( #11954 )
2025-09-01 16:24:35 -07:00
chenyu
6a40216724
correct bf16 fuzz input in test_dtype_alu ( #11933 )
it was using float16 inputs, now it's uint16 then convert to bf16
2025-09-01 10:52:26 -04:00
chenyu
965ea59b16
test_dtype_alu use AMD_LLVM from helpers ( #11950 )
2025-09-01 10:03:17 -04:00
b1tg
a9f07c31bc
fix amd llvm sqrt ( #11936 )
* fix amd llvm sqrt
* lint
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-09-01 09:31:14 -04:00
qazal
0a53e72f70
viz: fix trace duration in python test decoder ( #11949 )
2025-09-01 14:32:25 +03:00
qazal
27c9ed5a84
viz: more consistent naming of events ( #11948 )
* s/shapes/events in test_viz
* s/bufs/events in the memory packer
2025-09-01 14:16:47 +03:00
Sieds Lykles
d9560a631c
remove cast between ints if safe ( #11946 )
2025-09-01 05:56:49 +02:00
Sieds Lykles
a19d689481
fix vec dtype _min_max ( #11944 )
2025-09-01 03:24:07 +02:00
Sieds Lykles
f32f3464d6
Can safe cast from certain ints to floats ( #11941 )
* add rule
* add some tests
* prevent infinite loop with bfloat16
* add some ints to double and float can_safe_cast
* add tests
2025-09-01 00:51:24 +02:00
Sieds Lykles
1c6e43c203
Double cast is one cast if intermediate cast is safe ( #11939 )
* add rule
* add some tests
* prevent infinite loop with bfloat16
* prevent more infinite rewrite
2025-09-01 00:36:29 +02:00
b1tg
c1eeb3b99c
only skip AMD_LLVM ( #11934 )
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 18:15:47 +03:00
b1tg
75d380a77c
fix transcendentals in python renderer ( #11932 )
* fix transcendentals in python renderer
* add test
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 09:37:17 -04:00
Sieds Lykles
d3252ccd85
fix special vmax when arg is UOp ( #11930 )
2025-08-31 06:54:39 +02:00
chenyu
af89be317e
relax rtol for bfloat16 test_dtype_alu ( #11926 )
2025-08-30 17:16:08 -04:00
qazal
c27b99d68f
viz: refactor to indexed rewrite traces ( #11923 )
2025-08-30 20:01:47 +03:00
qazal
bf0d055b39
viz: color by name ( #11919 )
2025-08-30 16:04:58 +03:00
Sieds Lykles
0bc34c000f
simplify range mod its own upper bound ( #11917 )
* add rules
* add tests
2025-08-30 08:37:35 +02:00
chenyu
561318fea7
Tensor.cos in test_stype_alu ( #11916 )
* Tensor.cos in test_stype_alu
* need this fix anyway
2025-08-29 20:26:36 -04:00
nimlgen
c6e342cdac
mockgpu: no hang if gpuocelot failed ( #11915 )
2025-08-30 00:44:49 +03:00
chenyu
26d03a86a1
test_symbolic_ops.py cleanup ( #11895 )
2025-08-29 17:11:59 -04:00
b1tg
b2cc06218a
python bfloat16 ( #11912 )
* python bf16
* _to_torch_storage_type
---------
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-29 15:18:02 -04:00
George Hotz
afad7d0cd1
remove dtype from range, it will be dtypes.index soon [pr] ( #11914 )
* remove dtype from range, it will be dtypes.index soon [pr]
* a few more
2025-08-29 09:52:07 -07:00
George Hotz
394c2d1db1
update Kernel API in tests + move optimize_local_size ( #11907 )
2025-08-28 15:12:47 -07:00
nimlgen
fa695ac1ce
ci: mac gpuocelot ( #11906 )
* gm
* fix?
* ops
* imp
* xx
* add file
2025-08-28 23:29:43 +03:00
George Hotz
b9b438c516
small updates from postopt ( #11903 )
* tests from postopt
* modernize
* skip lin tests
* that's fixed?
* skip, not failure
2025-08-28 12:34:52 -07:00
Ben Waldron
ea1be2e4cd
[bounty] Remove using reshape to register symbolic shape ( #11771 )
* Modify tests and start work towards removing symbolic reshape
* Refactor symbolic reshape
* fix small error
* much cleaner + fix more tests
* Can remove this now
* Update test_symbolic_ops and test_tiny
* Couple more tests
* Unused import
* More tests and add EXPAND to Tensor.empty
* Fix test beam search
* all int
* Fix rangeify by adding shrink
* Remove OOB check and so fix test_symbolic_jit
* test_symbolic_jit doesn't need OOB Context anymore either
* Should remove that test now
* Cleanups part 1
* fix linters
* Final cleanups
* Don't reassign inside for loop
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
Ben Waldron
17ecaf4682
Add test_variable_empty ( #11889 )
* Add test_variable_empty
* Move test and add TODO
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 11:38:27 -04:00