Commit Graph

4246 Commits

Author SHA1 Message Date
George Hotz
758a1888d6 make EMULATE a context var 2025-09-04 11:03:32 -07:00
George Hotz
09106e4aae refactor and split test_linearizer (#12001)
* refactor and split test_linearizer

* forget that file

* imports

* remove from docs

* test gen float4
2025-09-04 10:53:07 -07:00
chenyu
fb71d1e5fd delete some test_search tests (#11998)
TC_SEARCH_OVER_SHAPE was removed, so the tests should be removed too
2025-09-04 11:19:49 -04:00
Sieds Lykles
572a3c15c6 Move Ops.SPECIAL arg to src (#11918)
* initial moving bound to src

* arg to src

* remove import

* fixup linearizer

* arg to src

* fix test_uop_graph

* fix more tests

* fix python renderer

* get const value from const uop

* ssimplify uop estimates

* fix webgpu locals

* fix old test

* gate Ops.SPECIAL in linearizer

* use ssimplify() for local/global_size

* remove toposort gate_parents_instead_of_self

* fix rendering in comment

* cleanup

* rename and add comments

* add BottomUpGate with test
2025-09-04 09:31:44 +02:00
George Hotz
5cf42dc4db add Scheduler to replace Kernel with POSTOPT=2 (#11924)
* ** simple kernel to replace Kernel for postopt

* support old

* fix beam

* beaming

* beam on old

* bring tensor cores back

* raise

* postbeam

* test ops passes on mac

* skip that

* postopt default

* gate that

* fix tensor cores

* a few test fixes

* dsp fix

* tc fix

* loop

* support swap

* test_gemv

* fix beam for variable

* test opts from high level stuff

* range annoying

* compile slow

* metal slow

* better beam

* no POSTBEAM

* fix nolocals

* hc opt mostly works

* put that back

* lil

* some work

* fix that

* POSTOPT 2

* fix tests

* no postopt 2

* work

* back

* padded tensors cores

* shift_to

* postopt 0 passes?

* write PADTO

* fix padded tensor cores

* compare hcopt

* 18000 lines

* should pass tests

* fix rangeify

* put types back
2025-09-03 19:23:30 -07:00
chenyu
b13e071463 move test_winograd to unit test (#11993) 2025-09-03 21:47:32 -04:00
chenyu
edc8b99853 more tests that pass PTX now (#11992) 2025-09-03 21:18:14 -04:00
chenyu
ed2f45712b remove skip PTX in test_arange (#11991)
all pass now
2025-09-03 20:45:19 -04:00
George Hotz
a5f2b4872a use_tensor_cores is a heuristic (#11989)
* use_tensor_cores is a heuristic

* context
2025-09-03 17:05:10 -07:00
George Hotz
63e930fec3 apply_tensor_cores is a heuristic (#11988)
* apply_tensor_cores is a heuristic

* delete extra_opts
2025-09-03 16:39:33 -07:00
chenyu
d0e739453e update many einsum tests (#11981)
correct the exception testing, and raise ValueError instead of assert when checking args
2025-09-03 15:40:20 -04:00
b1tg
6d53cac457 dtype fuzz: log need input > 0 (#11979)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-03 12:10:42 -04:00
Jordan Chalupka
68e83b850f nbytes should raise an exception when size is unlimited (#11928)
* nbytes should raise an exception when size is unlimited

* adding a test
2025-09-03 07:06:20 -07:00
Sieds Lykles
86e908db57 cast parents of int64 alu to int32 if possible (#11977)
* add overflows helper

* add rules

* x -> y

* check overflow of u too

* cleaner

* use alu instead of replace to preserve vectorization

* just one rule

* add test
2025-09-03 11:05:04 +02:00
Sieds Lykles
033184b3cb parse_valid with non const rhs (#11957)
* const to using vmin/vmax

* add test

* convert to int

* remove left over part of and
2025-09-03 08:08:46 +02:00
Sieds Lykles
53eff8970a add Ops.GEP to _min_max (#11976) 2025-09-03 07:07:54 +02:00
Sieds Lykles
d1d0960e6e remove intermediate cast using bounds - weaker pattern (#11974) 2025-09-03 06:24:40 +02:00
Sieds Lykles
8a2846b31a assert embedding input is integer dtype (#11963)
* cast embedding input

* raise error if not using int for index embedding
2025-09-03 01:44:26 +02:00
George Hotz
1b73993521 pyrender to render uops (#11968)
* pyrender to render uops

* new pyrender style

* pyrender works

* list str

* store render
2025-09-02 15:44:01 -07:00
chenyu
69dd1817d0 raise RuntimeError in merge_dicts instead of assert [pr] (#11965) 2025-09-02 17:18:44 -04:00
qazal
f750c15965 viz: add python marker (#11952)
* viz: add python marker

* remove duplicate
2025-09-02 23:44:00 +03:00
George Hotz
550cf2ca7f tests from postopt (#11964)
* tests from postopt

* reraise is fine
2025-09-02 13:34:17 -07:00
nimlgen
897254ad6c ci: add dev<->cpu copy speeds (#11959) 2025-09-02 15:22:44 +03:00
George Hotz
0dfca4e74b add failing test for rangeify setitem (#11954) 2025-09-01 16:24:35 -07:00
chenyu
6a40216724 correct bf16 fuzz input in test_dtype_alu (#11933)
it was using float16 inputs; now it uses uint16 inputs and then converts them to bf16
2025-09-01 10:52:26 -04:00
chenyu
965ea59b16 test_dtype_alu use AMD_LLVM from helpers (#11950) 2025-09-01 10:03:17 -04:00
b1tg
a9f07c31bc fix amd llvm sqrt (#11936)
* fix amd llvm sqrt

* lint

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-09-01 09:31:14 -04:00
qazal
0a53e72f70 viz: fix trace duration in python test decoder (#11949) 2025-09-01 14:32:25 +03:00
qazal
27c9ed5a84 viz: more consistent naming of events (#11948)
* s/shapes/events in test_viz

* s/bufs/events in the memory packer
2025-09-01 14:16:47 +03:00
Sieds Lykles
d9560a631c remove cast between ints if safe (#11946) 2025-09-01 05:56:49 +02:00
Sieds Lykles
a19d689481 fix vec dtype _min_max (#11944) 2025-09-01 03:24:07 +02:00
Sieds Lykles
f32f3464d6 Can safe cast from certain ints to floats (#11941)
* add rule

* add some tests

* prevent infinite loop with bfloat16

* add some ints to double and float can_safe_cast

* add tests
2025-09-01 00:51:24 +02:00
Sieds Lykles
1c6e43c203 Double cast is one cast if intermediate cast is safe (#11939)
* add rule

* add some tests

* prevent infinite loop with bfloat16

* prevent more infinite rewrite
2025-09-01 00:36:29 +02:00
b1tg
c1eeb3b99c only skip AMD_LLVM (#11934)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 18:15:47 +03:00
b1tg
75d380a77c fix transcendentals in python renderer (#11932)
* fix transcendentals in python renderer

* add test

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-31 09:37:17 -04:00
Sieds Lykles
d3252ccd85 fix special vmax when arg is UOp (#11930) 2025-08-31 06:54:39 +02:00
chenyu
af89be317e relax rtol for bfloat16 test_dtype_alu (#11926) 2025-08-30 17:16:08 -04:00
qazal
c27b99d68f viz: refactor to indexed rewrite traces (#11923) 2025-08-30 20:01:47 +03:00
qazal
bf0d055b39 viz: color by name (#11919) 2025-08-30 16:04:58 +03:00
Sieds Lykles
0bc34c000f simplify range mod its own upper bound (#11917)
* add rules

* add tests
2025-08-30 08:37:35 +02:00
chenyu
561318fea7 Tensor.cos in test_stype_alu (#11916)
* Tensor.cos in test_stype_alu

* need this fix anyway
2025-08-29 20:26:36 -04:00
nimlgen
c6e342cdac mockgpu: no hang if gpuocelot failed (#11915) 2025-08-30 00:44:49 +03:00
chenyu
26d03a86a1 test_symbolic_ops.py cleanup (#11895) 2025-08-29 17:11:59 -04:00
b1tg
b2cc06218a python bfloat16 (#11912)
* python bf16

* _to_torch_storage_type

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-29 15:18:02 -04:00
George Hotz
afad7d0cd1 remove dtype from range, it will be dtypes.index soon [pr] (#11914)
* remove dtype from range, it will be dtypes.index soon [pr]

* a few more
2025-08-29 09:52:07 -07:00
George Hotz
394c2d1db1 update Kernel API in tests + move optimize_local_size (#11907) 2025-08-28 15:12:47 -07:00
nimlgen
fa695ac1ce ci: mac gpuocelot (#11906)
* gm

* fix?

* ops

* imp

* xx

* add file
2025-08-28 23:29:43 +03:00
George Hotz
b9b438c516 small updates from postopt (#11903)
* tests from postopt

* modernize

* skip lin tests

* that's fixed?

* skip, not failure
2025-08-28 12:34:52 -07:00
Ben Waldron
ea1be2e4cd [bounty] Remove using reshape to register symbolic shape (#11771)
* Modify tests and start work towards removing symbolic reshape

* Refactor symbolic reshape

* fix small error

* much cleaner + fix more tests

* Can remove this now

* Update test_symbolic_ops and test_tiny

* Couple more tests

* Unused import

* More tests and add EXPAND to Tensor.empty

* Fix test beam search

* all int

* Fix rangeify by adding shrink

* Remove OOB check and so fix test_symbolic_jit

* test_symbolic_jit doesn't need OOB Context anymore either

* Should remove that test now

* Cleanups part 1

* fix linters

* Final cleanups

* Don't reassign inside for loop

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 12:30:49 -04:00
Ben Waldron
17ecaf4682 Add test_variable_empty (#11889)
* Add test_variable_empty

* Move test and add TODO

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-28 11:38:27 -04:00