qazal
ba1183314a
const_like can return a valid [pr] ( #8005 )
...
* const_like can return a valid [pr]
* fixup
2024-12-03 18:42:12 +08:00
qazal
4e91533419
test: don't ref until schedule ( #8004 )
2024-12-03 18:06:52 +08:00
George Hotz
b8bf5b2787
minor uop speedups [pr] ( #8002 )
...
* minor uop cleaner [pr]
* free uop creation speed by removing WeakValueDictionary
* a lil faster
* disable that test
* lines
* and it doesn't print non hit patterns
2024-12-03 17:04:48 +08:00
George Hotz
1028b34a20
add typing to basicblocks ( #7999 )
2024-12-03 15:05:11 +08:00
George Hotz
0905f87b68
hotfix: print only kernel time
2024-12-03 14:25:08 +08:00
chenyu
17d5719a38
add process replay to webgpu tests ( #7998 )
2024-12-02 20:27:29 -05:00
chenyu
c7bc75e634
alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) ( #7900 )
...
* alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1)
only do if at least one branch is const, so total alu won't increase
* tests and interesting TODO cases
2024-12-02 17:19:27 -05:00
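The rewrite named in the commit above can be sanity-checked with a small plain-Python sketch. `where`, `before`, `after`, and `rewrite_is_equivalent` are hypothetical stand-ins for illustration, not tinygrad APIs; `c ? t : f` is modelled as an ordinary conditional.

```python
import itertools
import operator

def where(c, t, f):
    # Scalar stand-in for the ternary c ? t : f.
    return t if c else f

def before(alu, c, t0, f0, t1, f1):
    # alu(c?t0:f0, c?t1:f1): one alu applied to two selects on the same condition.
    return alu(where(c, t0, f0), where(c, t1, f1))

def after(alu, c, t0, f0, t1, f1):
    # c?alu(t0,t1):alu(f0,f1): one select over two alu results.
    return where(c, alu(t0, t1), alu(f0, f1))

def rewrite_is_equivalent(alu, values=(-2, 0, 3)):
    # Exhaustively compare both forms over a small grid of inputs.
    return all(
        before(alu, c, *v) == after(alu, c, *v)
        for c in (False, True)
        for v in itertools.product(values, repeat=4)
    )

if __name__ == "__main__":
    for op in (operator.add, operator.mul, max):
        print(op.__name__, rewrite_is_equivalent(op))
```

The rewritten form contains two alu applications instead of one. On my reading of the commit note, requiring a constant branch means one of them (for example `alu(t0, t1)` with both operands constant) can be folded ahead of time, so the total alu count does not grow.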
chenyu
b91fa24387
script to run regressed sd conv on metal ( #7995 )
...
* script to run regressed sd conv on metal
this and other similar `conv2d + add` kernels contributed to most of the speed regression
* # ruff: noqa: E501
2024-12-02 15:34:27 -05:00
geohotstan
0a2e10be1d
add SELU to Tensor ( #7993 )
...
* add selu
* more clean ups
2024-12-02 10:04:01 -05:00
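For reference, SELU is conventionally defined as scale * x for x > 0 and scale * alpha * (exp(x) - 1) otherwise. The scalar sketch below uses the standard published constants; it illustrates the function only and is not the code added to Tensor in the commit above, whose signature is not shown in this log.

```python
import math

# Standard SELU constants (assumed to match the usual definition).
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x: float) -> float:
    # Positive inputs are scaled linearly.
    if x > 0:
        return SCALE * x
    # Non-positive inputs follow a scaled exponential; expm1 computes exp(x) - 1.
    return SCALE * ALPHA * math.expm1(x)

if __name__ == "__main__":
    for v in (-1.0, 0.0, 1.0):
        print(v, selu(v))
```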
Ahmed Harmouche
146e1caea3
Downgrade wgpu to prevent sd segfault ( #7969 )
2024-12-02 15:48:44 +01:00
wozeparrot
077e7e8ed2
fix: private segment sgpr on gfx103x ( #7987 )
...
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 20:54:50 +08:00
qazal
bb606e5bcf
process replayable ops.py changes from delete_lazy [pr] ( #7994 )
...
* process replayable ops.py changes from delete_lazy [pr]
* hotfix: seed tiny_jit
2024-12-02 19:38:31 +08:00
George Hotz
0c7477b108
no bool in range [pr] ( #7988 )
...
* no bool in range [pr]
* fix llvm
* add arg to range spec
* fix broken test
* forgot this one
* hotfix: test_tiny jit is a real test
2024-12-02 19:05:16 +08:00
Ahmed Harmouche
8909dbd82c
Remove wgpu specific checks from stable diffusion example ( #7991 )
2024-12-02 11:31:14 +01:00
qazal
e2916ff210
image dtype fixup refactor for delete_lazy [pr] ( #7989 )
2024-12-02 18:25:13 +08:00
Ahmed Harmouche
5340d3dedf
Merge pull request #7986 from tinygrad/atomics-in-smem-wgpu
...
Support packed types in smem on webgpu
2024-12-02 10:38:19 +01:00
Ahmed Harmouche
dfae038580
Simplify render_buf_dt
2024-12-02 10:27:59 +01:00
Ahmed Harmouche
1ea0925744
Support packed types in smem in webgpu
2024-12-02 10:13:25 +01:00
George Hotz
61b2cac507
basicblock is dataclass ( #7985 )
...
* basicblock is dataclass [pr]
* tiny cleanups
2024-12-02 16:48:39 +08:00
George Hotz
275951b730
clean up a few parents -> toposort [pr] ( #7984 )
...
* clean up a few parents -> toposort [pr]
* rename to old_parents + sched tests
* a few more
* that one
* second to last
* final
2024-12-02 15:59:31 +08:00
George Hotz
f17af70d17
replace all sparents with toposort ( #7983 )
2024-12-02 15:00:30 +08:00
George Hotz
b09310d8c2
add toposort method to uops, faster linearize [pr] ( #7982 )
...
* add toposort method to uops, faster linearize [pr]
* trust the toposort
* all toposort
* Revert "all toposort"
This reverts commit db123adfda.
2024-12-02 14:46:16 +08:00
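As background for the toposort commits above, here is a minimal sketch of a topological sort over a DAG whose nodes hold their inputs in a `src` tuple. The `Node` class and this `toposort` function are hypothetical and written for illustration; they are not the method added to uops in the commit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    # Hypothetical graph node: a name plus a tuple of input nodes.
    name: str
    src: tuple = ()

def toposort(root):
    # Iterative post-order DFS: every node is emitted after all of its inputs.
    order = {}  # dict used as an insertion-ordered set
    stack = [(root, False)]
    while stack:
        node, inputs_done = stack.pop()
        if node in order:
            continue
        if inputs_done:
            order[node] = None
        else:
            # Revisit this node once its inputs have been handled.
            stack.append((node, True))
            for s in reversed(node.src):
                stack.append((s, False))
    return list(order)

if __name__ == "__main__":
    a = Node("a")
    b = Node("b", (a,))
    c = Node("c", (a, b))
    print([n.name for n in toposort(c)])
```

The iterative form avoids Python's recursion limit on deep graphs, and a shared input such as `a` is emitted only once.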
qazal
b797aee720
uop global buf number tracking try 2 [pr] ( #7912 )
...
* uop buffer init small refactor [pr]
* add early
* this way it doesn't need late
* buffer_num
* itertools.count
* count from 0
* down to 380
2024-12-02 14:45:17 +08:00
George Hotz
cbcc1c20eb
second try at block linearize ( #7892 )
...
* second try at block linearize
* weeee, works for lil matmul
* it's so beautiful
* test tiny passes
* fix bugs
* combine matching BLOCKENDS
* wrapping
* test lin failures passes
* those failures were fake
* flip sort order
* fix ptx tests
* deal with store better
* dumb ptx fix
* expect less
* reduce lines
* reduce lines
* less lines and cleaner
* no defaultdict
* tighter
* simpler block_parent_count
2024-12-02 13:43:09 +08:00
George Hotz
9b0859d717
PYTHON device is okay to use everywhere [pr] ( #7981 )
2024-12-02 12:34:42 +08:00
mesozoic-egg
90e2b2d577
Remove gated store, put rewrite to uopgraph [pr] ( #7975 )
...
* update test for gated store
* put gated store rewrite to uopgraph, rm from ptx
* update test
update test
update test
* remove gated st rewrite in llvm
* lint
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 12:33:16 +08:00
George Hotz
d53cd92364
fix tests for delete lazy [pr] ( #7980 )
2024-12-02 12:00:48 +08:00
chenyu
254c86d712
ruff target-version "py38" -> "py310" ( #7978 )
2024-12-01 22:35:21 -05:00
George Hotz
6c1efb9a72
hotfix: amd gemv was flaky
2024-12-02 11:08:24 +08:00
chenyu
4e46c67327
small helpers cleanups ( #7977 )
...
less lines for ceildiv and partition, and removed one # noqa: E501
2024-12-01 21:50:47 -05:00
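The commit above mentions `ceildiv` and `partition` without showing them. The sketch below assumes the usual meanings of those names (ceiling division, and splitting items by a predicate); the bodies are illustrative and may differ from the repository's helpers.

```python
def ceildiv(num: int, amt: int) -> int:
    # Ceiling division using only integer floor division:
    # ceil(num / amt) == -(num // -amt).
    return -(num // -amt)

def partition(items, pred):
    # Split items into (those where pred is truthy, the rest), preserving order.
    yes, no = [], []
    for item in items:
        (yes if pred(item) else no).append(item)
    return yes, no

if __name__ == "__main__":
    print(ceildiv(7, 2))
    print(partition(range(5), lambda x: x % 2 == 0))
```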
qazal
aa2e7b11f8
more const folding infra from the delete_lazy branch [pr] ( #7976 )
...
* more const folding infra from the delete_lazy branch [pr]
* sink base
* limit
2024-12-01 23:20:30 +08:00
ignaciosica
509c4a573f
increase tolerance on test ( #7972 )
2024-11-30 11:50:10 -05:00
qazal
ca20f281df
late folding size 0 ops ( #7940 )
...
* fold st size=0
* fold 0 here
* ops folding
* update realize
2024-12-01 00:40:02 +08:00
chenyu
c068e8c242
fetch cleanup ( #7970 )
...
reordered a bit to minimize the stuff in the with blocks
test manually with TestFetch and `DISABLE_HTTP_CACHE=1` on some examples
2024-11-30 11:00:33 -05:00
qazal
bb8e319680
unset TRACK_MATCH_STATS while initing beam buffers [pr] ( #7971 )
2024-11-30 23:56:58 +08:00
qazal
d0735d6489
swizzle store [pr] ( #7964 )
...
* swizzle store [pr]
* assign extra swizzle
* now arg is optional
* extra
2024-11-30 21:32:50 +08:00
qazal
6f17eedaea
schedule sink folding try 2 [pr] ( #7968 )
2024-11-30 20:46:26 +08:00
qazal
293e0f8a8e
make ASSIGN arg optional [pr] ( #7966 )
2024-11-30 19:40:33 +08:00
qazal
5615e92df8
const folding tests [pr] ( #7967 )
2024-11-30 19:27:30 +08:00
qazal
8780818d04
Revert "schedule sink folding with graph_rewrite [pr] ( #7963 )" ( #7965 )
...
This reverts commit 4529c5d0da.
2024-11-30 19:02:06 +08:00
qazal
4529c5d0da
schedule sink folding with graph_rewrite [pr] ( #7963 )
...
* schedule sink folding with graph_rewrite [pr]
* x is reserved, use u
* match lazy const folding
2024-11-30 18:30:41 +08:00
nimlgen
10f431b96d
hcq replace update with sint ( #7899 )
...
* try sym hcq
* start with amd
* move to nv
* nv works
* cache and qcom
* fixes
* signals
* fix nv
* qcom fixes
* linter
* linter
* cache + typings
* fixes
* tiny fixes
* linter
* linter
* lntr
* ugh
* comments
2024-11-29 20:08:13 +03:00
chenyu
aa51f3c14e
update kernel counts in test_real_world ( #7960 )
...
the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now.
if it's stable we could consider making kernel count strict, which helps change like #7940
2024-11-29 11:14:54 -05:00
nimlgen
d3660ccc51
prereqs for hcq updates removal ( #7959 )
...
* hcq signals touch ups
* hcq compiled has device id
* helpers
* prreq hcq api
* oops
2024-11-29 18:20:07 +03:00
geohotstan
e1a85c262c
no type-tracker getitem refactor ( #6917 )
...
* newest newer than new refactor of getitem
* hmmm
* hmmmmmmmmmmmmmmmmm
* bro.
* ???
* small improvements
* cleaner, but why u gotta do this to me mypy
* fix, but still dunno about mypy
* even better
* try again? Passes locally
* use match
* fix mypy
* better
* broooooo check this out
* fix mypy
* bug fix
* fixed
* polish
2024-11-29 10:18:02 -05:00
Sieds Lykles
d267a2d9eb
Div mod recombine test for issue ( #7957 )
...
* Add test for failing div_mod recombine
* Add test case when there is gcd in div/mod
2024-11-29 08:47:50 -05:00
qazal
e54ff0d3af
conceptual uop st cleanup [pr] ( #7956 )
...
* conceptual uop st cleanup [pr]
* unwrap is fine here, better than arg
2024-11-29 19:35:46 +08:00
Ahmed Harmouche
2d11765295
Fix WebGPU atomic store ( #7954 )
2024-11-29 19:31:25 +08:00
nimlgen
309dcb1044
hcq signal add sleep ( #7955 )
...
* hcqsignal sleep
* fixes
* typing
* time ms is int
2024-11-29 14:04:45 +03:00
qazal
30f0e95fbd
don't lru_cache is_scheduled [pr] ( #7953 )
2024-11-29 17:03:55 +08:00