10633 Commits

Author SHA1 Message Date
qazal
ba1183314a const_like can return a valid [pr] (#8005)
* const_like can return a valid [pr]

* fixup
2024-12-03 18:42:12 +08:00
qazal
4e91533419 test: don't ref until schedule (#8004) 2024-12-03 18:06:52 +08:00
George Hotz
b8bf5b2787 minor uop speedups [pr] (#8002)
* minor uop cleaner [pr]

* free uop creation speed by removing WeakValueDictionary

* a lil faster

* disable that test

* lines

* and it doesn't print non hit patterns
2024-12-03 17:04:48 +08:00
George Hotz
1028b34a20 add typing to basicblocks (#7999) 2024-12-03 15:05:11 +08:00
George Hotz
0905f87b68 hotfix: print only kernel time 2024-12-03 14:25:08 +08:00
chenyu
17d5719a38 add process replay to webgpu tests (#7998) 2024-12-02 20:27:29 -05:00
chenyu
c7bc75e634 alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) (#7900)
* alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1)

only do if at least one branch is const, so total alu won't increase

* tests and interesting TODO cases
2024-12-02 17:19:27 -05:00
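
Note on c7bc75e634: the rewrite in this commit message can be checked with a small standalone sketch. This is plain Python on scalars and is not tinygrad code; `where`, `alu_before`, `alu_after`, and `forms_agree` are hypothetical names used only for illustration. It compares `alu(c?t0:f0, c?t1:f1)` against `c?alu(t0,t1):alu(f0,f1)` over a small grid of inputs.

```python
import operator
from itertools import product

def where(c, t, f):
    """Scalar ternary select, i.e. c ? t : f."""
    return t if c else f

def alu_before(op, c, t0, f0, t1, f1):
    # Original form: alu(c?t0:f0, c?t1:f1)
    return op(where(c, t0, f0), where(c, t1, f1))

def alu_after(op, c, t0, f0, t1, f1):
    # Rewritten form: c?alu(t0,t1):alu(f0,f1)
    return where(c, op(t0, t1), op(f0, f1))

def forms_agree(op, values=(-2, 0, 3)):
    """Check both forms on every combination of the sample values."""
    return all(
        alu_before(op, c, *v) == alu_after(op, c, *v)
        for c in (False, True)
        for v in product(values, repeat=4)
    )

if __name__ == "__main__":
    print(forms_agree(operator.add), forms_agree(operator.mul))
```

The rewritten form contains two `alu` applications instead of one. The commit message restricts the rewrite to cases where at least one branch is const so that the total alu count does not increase, presumably because an `alu` on constant operands can be folded away.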
chenyu
b91fa24387 script to run regressed sd conv on metal (#7995)
* script to run regressed sd conv on metal

this and other similar `conv2d + add` kernels contributed to most of the speed regression

* # ruff: noqa: E501
2024-12-02 15:34:27 -05:00
geohotstan
0a2e10be1d add SELU to Tensor (#7993)
* add selu

* more clean ups
2024-12-02 10:04:01 -05:00
Ahmed Harmouche
146e1caea3 Downgrade wgpu to prevent sd segfault (#7969) 2024-12-02 15:48:44 +01:00
wozeparrot
077e7e8ed2 fix: private segment sgpr on gfx103x (#7987)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 20:54:50 +08:00
qazal
bb606e5bcf process replayable ops.py changes from delete_lazy [pr] (#7994)
* process replayable ops.py changes from delete_lazy [pr]

* hotfix: seed tiny_jit
2024-12-02 19:38:31 +08:00
George Hotz
0c7477b108 no bool in range [pr] (#7988)
* no bool in range [pr]

* fix llvm

* add arg to range spec

* fix broken test

* forgot this one

* hotfix: test_tiny jit is a real test
2024-12-02 19:05:16 +08:00
Ahmed Harmouche
8909dbd82c Remove wgpu specific checks from stable diffusion example (#7991) 2024-12-02 11:31:14 +01:00
qazal
e2916ff210 image dtype fixup refactor for delete_lazy [pr] (#7989) 2024-12-02 18:25:13 +08:00
Ahmed Harmouche
5340d3dedf Merge pull request #7986 from tinygrad/atomics-in-smem-wgpu
Support packed types in smem on webgpu
2024-12-02 10:38:19 +01:00
Ahmed Harmouche
dfae038580 Simplify render_buf_dt 2024-12-02 10:27:59 +01:00
Ahmed Harmouche
1ea0925744 Support packed types in smem in webgpu 2024-12-02 10:13:25 +01:00
George Hotz
61b2cac507 basicblock is dataclass (#7985)
* basicblock is dataclass [pr]

* tiny cleanups
2024-12-02 16:48:39 +08:00
George Hotz
275951b730 clean up a few parents -> toposort [pr] (#7984)
* clean up a few parents -> toposort [pr]

* rename to old_parents + sched tests

* a few more

* that one

* second to last

* final
2024-12-02 15:59:31 +08:00
George Hotz
f17af70d17 replace all sparents with toposort (#7983) 2024-12-02 15:00:30 +08:00
George Hotz
b09310d8c2 add toposort method to uops, faster linearize [pr] (#7982)
* add toposort method to uops, faster linearize [pr]

* trust the toposort

* all toposort

* Revert "all toposort"

This reverts commit db123adfda.
2024-12-02 14:46:16 +08:00
qazal
b797aee720 uop global buf number tracking try 2 [pr] (#7912)
* uop buffer init small refactor [pr]

* add early

* this way it doesn't need late

* buffer_num

* itertools.count

* count from 0

* down to 380
2024-12-02 14:45:17 +08:00
George Hotz
cbcc1c20eb second try at block linearize (#7892)
* second try at block linearize

* weeee, works for lil matmul

* it's so beautiful

* test tiny passes

* fix bugs

* combine matching BLOCKENDS

* wrapping

* test lin failures passes

* those failures were fake

* flip sort order

* fix ptx tests

* deal with store better

* dumb ptx fix

* expect less

* reduce lines

* reduce lines

* less lines and cleaner

* no defaultdict

* tighter

* simpler block_parent_count
2024-12-02 13:43:09 +08:00
George Hotz
9b0859d717 PYTHON device is okay to use everywhere [pr] (#7981) 2024-12-02 12:34:42 +08:00
mesozoic-egg
90e2b2d577 Remove gated store, put rewrite to uopgraph [pr] (#7975)
* update test for gated store

* put gated store rewrite to uopgraph, rm from ptx

* update test

update test

update test

* remove gated st rewrite in llvm

* lint

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 12:33:16 +08:00
George Hotz
d53cd92364 fix tests for delete lazy [pr] (#7980) 2024-12-02 12:00:48 +08:00
chenyu
254c86d712 ruff target-version "py38" -> "py310" (#7978) 2024-12-01 22:35:21 -05:00
George Hotz
6c1efb9a72 hotfix: amd gemv was flaky 2024-12-02 11:08:24 +08:00
chenyu
4e46c67327 small helpers cleanups (#7977)
less lines for ceildiv and partition, and removed one # noqa: E501
2024-12-01 21:50:47 -05:00
qazal
aa2e7b11f8 more const folding infra from the delete_lazy branch [pr] (#7976)
* more const folding infra from the delete_lazy branch [pr]

* sink base

* limit
2024-12-01 23:20:30 +08:00
ignaciosica
509c4a573f increase tolerance on test (#7972) 2024-11-30 11:50:10 -05:00
qazal
ca20f281df late folding size 0 ops (#7940)
* fold st size=0

* fold 0 here

* ops folding

* update realize
2024-12-01 00:40:02 +08:00
chenyu
c068e8c242 fetch cleanup (#7970)
reordered a bit to minimize the stuff in the with blocks

test manually with TestFetch and `DISABLE_HTTP_CACHE=1` on some examples
2024-11-30 11:00:33 -05:00
qazal
bb8e319680 unset TRACK_MATCH_STATS while initing beam buffers [pr] (#7971) 2024-11-30 23:56:58 +08:00
qazal
d0735d6489 swizzle store [pr] (#7964)
* swizzle store [pr]

* assign extra swizzle

* now arg is optional

* extra
2024-11-30 21:32:50 +08:00
qazal
6f17eedaea schedule sink folding try 2 [pr] (#7968) 2024-11-30 20:46:26 +08:00
qazal
293e0f8a8e make ASSIGN arg optional [pr] (#7966) 2024-11-30 19:40:33 +08:00
qazal
5615e92df8 const folding tests [pr] (#7967) 2024-11-30 19:27:30 +08:00
qazal
8780818d04 Revert "schedule sink folding with graph_rewrite [pr] (#7963)" (#7965)
This reverts commit 4529c5d0da.
2024-11-30 19:02:06 +08:00
qazal
4529c5d0da schedule sink folding with graph_rewrite [pr] (#7963)
* schedule sink folding with graph_rewrite [pr]

* x is reserved, use u

* match lazy const folding
2024-11-30 18:30:41 +08:00
nimlgen
10f431b96d hcq replace update with sint (#7899)
* try sym hcq

* start with amd

* move to nv

* nv works

* cache and qcom

* fixes

* signals

* fix nv

* qcom fixes

* linter

* linter

* cache + typings

* fixes

* tiny fixes

* linter

* linter

* lntr

* ugh

* comments
2024-11-29 20:08:13 +03:00
chenyu
aa51f3c14e update kernel counts in test_real_world (#7960)
the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now.

if it's stable we could consider making kernel count strict, which helps change like #7940
2024-11-29 11:14:54 -05:00
nimlgen
d3660ccc51 prereqs for hcq updates removal (#7959)
* hcq signals touch ups

* hcq compiled has device id

* helpers

* prreq hcq api

* oops
2024-11-29 18:20:07 +03:00
geohotstan
e1a85c262c no type-tracker getitem refactor (#6917)
* newest newer than new refactor of getitem

* hmmm

* hmmmmmmmmmmmmmmmmm

* bro.

* ???

* small improvements

* cleaner, but why u gotta do this to me mypy

* fix, but still dunno about mypy

* even better

* try again? Passes locally

* use match

* fix mypy

* better

* broooooo check this out

* fix mypy

* bug fix

* fixed

* polish
2024-11-29 10:18:02 -05:00
Sieds Lykles
d267a2d9eb Div mod recombine test for issue (#7957)
* Add test for failing div_mod recombine

* Add test case when there is gcd in div/mod
2024-11-29 08:47:50 -05:00
qazal
e54ff0d3af conceptual uop st cleanup [pr] (#7956)
* conceptual uop st cleanup [pr]

* unwrap is fine here, better than arg
2024-11-29 19:35:46 +08:00
Ahmed Harmouche
2d11765295 Fix WebGPU atomic store (#7954) 2024-11-29 19:31:25 +08:00
nimlgen
309dcb1044 hcq signal add sleep (#7955)
* hcqsignal sleep

* fixes

* typing

* time ms is int
2024-11-29 14:04:45 +03:00
qazal
30f0e95fbd don't lru_cache is_scheduled [pr] (#7953) 2024-11-29 17:03:55 +08:00