qazal
aa2e7b11f8
more const folding infra from the delete_lazy branch [pr] ( #7976 )
...
* more const folding infra from the delete_lazy branch [pr]
* sink base
* limit
2024-12-01 23:20:30 +08:00
ignaciosica
509c4a573f
increase tolerance on test ( #7972 )
2024-11-30 11:50:10 -05:00
qazal
ca20f281df
late folding size 0 ops ( #7940 )
...
* fold st size=0
* fold 0 here
* ops folding
* update realize
2024-12-01 00:40:02 +08:00
chenyu
c068e8c242
fetch cleanup ( #7970 )
...
reordered a bit to minimize the stuff in the with blocks
test manually with TestFetch and `DISABLE_HTTP_CACHE=1` on some examples
2024-11-30 11:00:33 -05:00
qazal
bb8e319680
unset TRACK_MATCH_STATS while initing beam buffers [pr] ( #7971 )
2024-11-30 23:56:58 +08:00
qazal
d0735d6489
swizzle store [pr] ( #7964 )
...
* swizzle store [pr]
* assign extra swizzle
* now arg is optional
* extra
2024-11-30 21:32:50 +08:00
qazal
6f17eedaea
schedule sink folding try 2 [pr] ( #7968 )
2024-11-30 20:46:26 +08:00
qazal
293e0f8a8e
make ASSIGN arg optional [pr] ( #7966 )
2024-11-30 19:40:33 +08:00
qazal
5615e92df8
const folding tests [pr] ( #7967 )
2024-11-30 19:27:30 +08:00
qazal
8780818d04
Revert "schedule sink folding with graph_rewrite [pr] ( #7963 )" ( #7965 )
...
This reverts commit 4529c5d0da.
2024-11-30 19:02:06 +08:00
qazal
4529c5d0da
schedule sink folding with graph_rewrite [pr] ( #7963 )
...
* schedule sink folding with graph_rewrite [pr]
* x is reserved, use u
* match lazy const folding
2024-11-30 18:30:41 +08:00
nimlgen
10f431b96d
hcq replace update with sint ( #7899 )
...
* try sym hcq
* start with amd
* move to nv
* nv works
* cache and qcom
* fixes
* signals
* fix nv
* qcom fixes
* linter
* linter
* cache + typings
* fixes
* tiny fixes
* linter
* linter
* lntr
* ugh
* comments
2024-11-29 20:08:13 +03:00
chenyu
aa51f3c14e
update kernel counts in test_real_world ( #7960 )
...
the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now.
if it's stable we could consider making kernel count strict, which helps changes like #7940
2024-11-29 11:14:54 -05:00
nimlgen
d3660ccc51
prereqs for hcq updates removal ( #7959 )
...
* hcq signals touch ups
* hcq compiled has device id
* helpers
* prereq hcq api
* oops
2024-11-29 18:20:07 +03:00
geohotstan
e1a85c262c
no type-tracker getitem refactor ( #6917 )
...
* newest newer than new refactor of getitem
* hmmm
* hmmmmmmmmmmmmmmmmm
* bro.
* ???
* small improvements
* cleaner, but why u gotta do this to me mypy
* fix, but still dunno about mypy
* even better
* try again? Passes locally
* use match
* fix mypy
* better
* broooooo check this out
* fix mypy
* bug fix
* fixed
* polish
2024-11-29 10:18:02 -05:00
Sieds Lykles
d267a2d9eb
Div mod recombine test for issue ( #7957 )
...
* Add test for failing div_mod recombine
* Add test case when there is gcd in div/mod
2024-11-29 08:47:50 -05:00
qazal
e54ff0d3af
conceptual uop st cleanup [pr] ( #7956 )
...
* conceptual uop st cleanup [pr]
* unwrap is fine here, better than arg
2024-11-29 19:35:46 +08:00
Ahmed Harmouche
2d11765295
Fix WebGPU atomic store ( #7954 )
2024-11-29 19:31:25 +08:00
nimlgen
309dcb1044
hcq signal add sleep ( #7955 )
...
* hcqsignal sleep
* fixes
* typing
* time ms is int
2024-11-29 14:04:45 +03:00
qazal
30f0e95fbd
don't lru_cache is_scheduled [pr] ( #7953 )
2024-11-29 17:03:55 +08:00
qazal
f044271898
big graph do_realize cleanup and renames [pr] ( #7952 )
...
* scheduler do_realize cleanup and renames [pr]
* big graph is the better name
* more language
* append_kernel -> append_realize
2024-11-29 14:58:45 +08:00
ignaciosica
6e47dc8921
true tc swizzle [pr] ( #7951 )
...
* true tc swizzle
* cleanup
* fix linter
2024-11-29 14:39:46 +08:00
geohotstan
765096fe7d
fix Tensor._pool edge case ( #7581 )
...
* split into another branch
* polish
* try this
* Revert "try this"
This reverts commit 84f711b13e.
* try
* Revert "try"
This reverts commit 89c7a7649b.
* idk anymore
* it is what it is
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-28 23:17:13 -05:00
chenyu
70f052d2b8
flip IF and RANGE order ( #7947 )
...
this is the rest of #7919 prereqs for new block lin
2024-11-28 13:35:30 -05:00
chenyu
bb23469f93
lower conv threshold on red ( #7948 )
2024-11-28 13:31:06 -05:00
chenyu
e243e709a7
BLOCK ops in Ops ( #7945 )
...
did this break conv speed?
2024-11-28 12:44:22 -05:00
qazal
f39e9b4288
match lazy movement ops in uop [pr] ( #7944 )
2024-11-28 23:03:43 +08:00
chenyu
f54508549f
don't search conv weight init in speed_v_theoretical ( #7943 )
2024-11-28 10:03:18 -05:00
chenyu
3c8c98253a
BEAM_DEBUG=1 in speed_v_theoretical ( #7942 )
...
* DEBUG=3 in speed_v_theoretical
* BEAM_DEBUG=1
2024-11-28 08:30:55 -05:00
qazal
aa7e16744e
allow sinking childless consts and fold them [pr] ( #7941 )
2024-11-28 20:23:37 +08:00
qazal
3ab67d45b2
init changes from the global buffer branch [pr] ( #7939 )
2024-11-28 19:38:58 +08:00
nimlgen
81d415be03
amd pkt3 refactor ( #7923 )
...
* amd pkt3 refactor
* replace this
* linter
* fix
* cmt
* fast
* simpler
* linter
* smth
* missing
2024-11-28 11:06:37 +03:00
qazal
e3fe7023b0
move all VIEW -> LOAD rules to big graph rewrite [pr] ( #7936 )
...
* move all VIEW -> LOAD rules to big graph rewrite [pr]
* comments
2024-11-28 14:02:29 +08:00
qazal
e2eccdab43
swizzle upat consistency + assert it's base [pr] ( #7935 )
2024-11-28 13:35:55 +08:00
George Hotz
c5c3b05b5a
block lin: only the test changes ( #7933 )
2024-11-28 13:19:00 +08:00
George Hotz
32dbab945c
Revert "add block uops and modify tests ( #7931 )" ( #7932 )
...
This reverts commit 6f4519ff45.
2024-11-28 13:15:41 +08:00
George Hotz
6f4519ff45
add block uops and modify tests ( #7931 )
2024-11-28 13:11:18 +08:00
chenyu
336a9b6bf3
remove dtype from llama precompute_freqs_cis ( #7930 )
...
do the cast based on input in first forward call instead
2024-11-27 22:28:40 -05:00
chenyu
3e2430f822
use tqdm tqdm in mlperf training ( #7929 )
...
issue in benchmark dashboard logging, revert back to tqdm tqdm for now
2024-11-27 21:57:05 -05:00
Sieds Lykles
864758423e
Don't take const in gcd and change the "nothing_changed" condition ( #7926 )
...
* Don't take const in gcd and change the "nothing_changed" condition
Biggest difference is probably actually that I forgot to check if gcd changed if nothing else changed.
The TODO was fixed by not using the const in the gcd, and then taking it out.
* Fix more tests
2024-11-27 18:07:36 -05:00
chenyu
988d64900b
add TODO case to test_mod_congruence ( #7925 )
...
same alu count but better bounds
2024-11-27 15:23:21 -05:00
chenyu
57262c8e34
update Tensor.scatter doc examples ( #7924 )
...
same example from torch, i think it's much more useful
2024-11-27 11:42:36 -05:00
geohotstan
cea5853cfa
add Tensor.scatter ( #7737 )
...
* working I think
* where are my onnx scatter tests??
* forward_only for now
* try if nan hack fix NV
* looks like issue is different... CUDA WHY
* oops that was wrong. Try if this fixes CUDA
* simpler multiply
* actually finish this up tmrw morning :x
* fix tests?
* improve tests
* improve test and implementation
* fix ruff
* complete but lots of expected failure...
* reviewed tests
* add onnx tests
* is this a processing op?
* add return type to indicate that it's not in-place
* final cleanups
* use or and improve tests a little
* add masked_index_select
* call it masked_setitem instead
* try
* FIXED
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:52:04 -05:00
JaSpa99
38f34ca0cb
prepare mypy==1.13.0: legacy cast ( #7866 )
...
* use helper to narrow literal type
* narrow with asserts instead of cast
* remove parentheses
* tensor.item() calls tensor.data()
* no copy
* proper indexing
2024-11-27 10:33:35 -05:00
geohotstan
753f07e193
add circular pad mode to Tensor.pad ( #7918 )
...
* start
* send it
* no more neg circular pads
* quick fix onnx too
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:30:51 -05:00
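For readers unfamiliar with the circular pad mode added in #7918, here is a minimal sketch of what circular (wrap-around) padding means in one dimension. It uses plain Python lists rather than `Tensor.pad`, and `circular_pad_1d` is a hypothetical helper written for illustration, not code from the PR. Rejecting negative pads mirrors the "no more neg circular pads" bullet, but how the PR actually handles that case is an assumption.

```python
def circular_pad_1d(values: list, left: int, right: int) -> list:
    """Pad a 1-D sequence by wrapping around its own contents.

    Hypothetical illustration only: the left padding is taken from the
    end of the sequence and the right padding from its start.
    """
    n = len(values)
    if n == 0:
        raise ValueError("cannot circularly pad an empty sequence")
    if left < 0 or right < 0:
        # Assumption: negative pads are rejected for circular mode.
        raise ValueError("negative circular pads are not supported")
    # Output index i maps back to source index (i - left) modulo n.
    return [values[(i - left) % n] for i in range(n + left + right)]


print(circular_pad_1d([1, 2, 3], 2, 1))  # [2, 3, 1, 2, 3, 1]
```

Unlike some library implementations, this sketch keeps wrapping when a pad is larger than the sequence itself; whether tinygrad allows that is not stated in the log.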
chenyu
a58e289d77
Revert "prereqs for new block lin so PR works ( #7919 )" ( #7921 )
...
This reverts commit c53261b541.
2024-11-27 08:41:09 -05:00
George Hotz
c53261b541
prereqs for new block lin so PR works ( #7919 )
2024-11-27 15:07:54 +08:00
chenyu
a6171cbe71
add stable diffusion v2 to mac benchmark ( #7917 )
...
this caught #7902
2024-11-26 22:09:43 -05:00
Sieds Lykles
d318867776
Factoring gcd out of mod ( #7916 )
...
* Factoring gcd out of mod
Curious if this will be faster/better
* Update bounds on test
2024-11-26 21:17:22 -05:00
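The arithmetic behind "factoring gcd out of mod" (#7916) can be sketched as follows. This shows only the integer identity `(c*x) % m == g * (((c//g)*x) % (m//g))` with `g = gcd(c, m)` and `m > 0`; it is not tinygrad's rewrite rule, and `mod_with_gcd_factored` is a hypothetical name used for illustration.

```python
from math import gcd


def mod_with_gcd_factored(coeff: int, x: int, modulus: int) -> int:
    """Compute (coeff * x) % modulus after factoring out gcd(coeff, modulus).

    Hypothetical illustration of the identity, assuming modulus > 0 and
    Python's floor-style % operator.
    """
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    g = gcd(coeff, modulus)  # g >= 1 because modulus > 0
    # Both divisions are exact, so the inner mod works on smaller numbers.
    return g * (((coeff // g) * x) % (modulus // g))


# Brute-force check of the identity over a small range.
for c in range(-6, 7):
    for m in range(1, 13):
        for x in range(-10, 11):
            assert mod_with_gcd_factored(c, x, m) == (c * x) % m
```

The inner modulus `m // g` is smaller than `m`, which may give tighter bounds on the result; the commit's "Update bounds on test" bullet suggests bounds did change, though the log does not say how.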
nimlgen
84f96e48a1
hcq signal tiny refactor ( #7913 )
...
* hcq signal tiny refactor
* no mv
* fix
* fix2
* fix3
2024-11-26 21:48:38 +03:00