2979 Commits

Author SHA1 Message Date
Ahmed Harmouche
13eedd373b Run WebGPU tests on ubuntu (#8033) 2024-12-04 12:42:04 +01:00
George Hotz
08657cb7b0 hotfix: bump expectations in speed_v_theoretical 2024-12-04 19:00:33 +08:00
George Hotz
ea65c79ba2 hotfix: don't spam BEAM debug in speed_v_theoretical 2024-12-04 18:47:16 +08:00
George Hotz
09b00b1b04 hotfix: use kernel timings instead of python timings in speed_v_theoretical 2024-12-04 18:36:17 +08:00
leopf
f0401e14e8 tar_extract with Tensors (#7853)
* initial

* USTAR, PAX and GNU support + testing

* from_bytes byteorder

* use TarInfo.frombuf

* tensor only usage

* remove contextlib.suppress

* shorter ow,pax

* more tests

* testing length + move tests

* cleanup

* new approach: RawTensorIO

* fix fetch

* enable read test

* cleanup and ignore fix

* fix for python < 3.12

* make it RawIO

* functions

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 17:03:19 +08:00
uuuvn
e9c5b23ba1 Use MTLCompiler directly (v2) (#7920)
* Use MTLCompiler directly (v2)

* to_block_literal and REQUEST_TYPE_COMPILE

* Rewrite command encoding

* Revert to_block_literal

* Maybe that's more readable to some people?

* Typo and comment about stdlib caching

* Update ops_metal.py

* Update ops_metal.py

* Update ops_metal.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
chenyu
0c060fa040 update uop and tests to not use lt/gt/le/ge [pr] (#8023)
just use dunder methods, eventually remove those from ops
2024-12-03 21:02:52 -05:00
Ahmed Harmouche
db330a3110 Remove WebGL (#8012) 2024-12-03 16:02:53 +01:00
chenyu
ef3752625b add test case of realize_size with 0 in shape (#8011) 2024-12-03 09:19:50 -05:00
George Hotz
09eac42fd6 cache indexed uops in st [pr] (#8008)
* cache indexed uops in st [pr]

* remove arg from range
2024-12-03 21:27:07 +08:00
Sieds Lykles
e44183647f Improved div folding (#7996)
* First version of div_mod folding together

* Working version with old div folding behaviour

* Test is fixed

* Fix linting

* Happy mypy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-03 08:11:25 -05:00
qazal
5441127417 assert const folding return shape matches [pr] (#8006) 2024-12-03 19:31:06 +08:00
George Hotz
dddfb494d7 don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000)
* don't mutate the uop/lazybuffer, just the Buffer [pr]

* fix red test

* try different fix

* that

* that's the right fix

* test for fixed behavior

* bump to 3.12
2024-12-03 19:03:51 +08:00
George Hotz
b8bf5b2787 minor uop speedups [pr] (#8002)
* minor uop cleaner [pr]

* free uop creation speed by removing WeakValueDictionary

* a lil faster

* disable that test

* lines

* and it doesn't print non hit patterns
2024-12-03 17:04:48 +08:00
George Hotz
0905f87b68 hotfix: print only kernel time 2024-12-03 14:25:08 +08:00
chenyu
c7bc75e634 alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) (#7900)
* alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1)

only do if at least one branch is const, so total alu won't increase

* tests and interesting TODO cases
2024-12-02 17:19:27 -05:00
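
Note on c7bc75e634: the identity behind this rewrite can be checked in plain Python. This is only an illustrative sketch, not tinygrad's pattern-matcher code. The helper names (`where`, `alu_of_wheres`, `where_of_alus`, `check_identity`) are hypothetical, and scalars stand in for UOps. Per the commit message, the real rewrite is applied only when at least one branch is const, so the total ALU count does not increase; the sketch checks only that the two forms agree.

```python
import itertools
import operator


def where(c, t, f):
    """Scalar stand-in for the ternary c ? t : f."""
    return t if c else f


def alu_of_wheres(alu, c, t0, f0, t1, f1):
    # Left-hand side: two selects on the same condition, then one ALU op.
    return alu(where(c, t0, f0), where(c, t1, f1))


def where_of_alus(alu, c, t0, f0, t1, f1):
    # Right-hand side: one select over the ALU op applied per branch.
    return where(c, alu(t0, t1), alu(f0, f1))


def check_identity():
    # Exhaustively compare both forms over a small set of ops and values.
    ops = (operator.add, operator.sub, operator.mul, max)
    values = (-3, 0, 2, 7)
    for alu in ops:
        for c in (False, True):
            for t0, f0, t1, f1 in itertools.product(values, repeat=4):
                lhs = alu_of_wheres(alu, c, t0, f0, t1, f1)
                rhs = where_of_alus(alu, c, t0, f0, t1, f1)
                assert lhs == rhs
    return True


if __name__ == "__main__":
    print(check_identity())
```

When both operands in a branch are constants (for example t0 and t1), alu(t0, t1) can be folded ahead of time, which is presumably why the const condition keeps the ALU count from growing.
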
chenyu
b91fa24387 script to run regressed sd conv on metal (#7995)
* script to run regressed sd conv on metal

this and other similar `conv2d + add` kernels contributed to most of the speed regression

* # ruff: noqa: E501
2024-12-02 15:34:27 -05:00
geohotstan
0a2e10be1d add SELU to Tensor (#7993)
* add selu

* more clean ups
2024-12-02 10:04:01 -05:00
qazal
bb606e5bcf process replayable ops.py changes from delete_lazy [pr] (#7994)
* process replayable ops.py changes from delete_lazy [pr]

* hotfix: seed tiny_jit
2024-12-02 19:38:31 +08:00
George Hotz
0c7477b108 no bool in range [pr] (#7988)
* no bool in range [pr]

* fix llvm

* add arg to range spec

* fix broken test

* forgot this one

* hotfix: test_tiny jit is a real test
2024-12-02 19:05:16 +08:00
Ahmed Harmouche
1ea0925744 Support packed types in smem in webgpu 2024-12-02 10:13:25 +01:00
George Hotz
275951b730 clean up a few parents -> toposort [pr] (#7984)
* clean up a few parents -> toposort [pr]

* rename to old_parents + sched tests

* a few more

* that one

* second to last

* final
2024-12-02 15:59:31 +08:00
George Hotz
f17af70d17 replace all sparents with toposort (#7983) 2024-12-02 15:00:30 +08:00
qazal
b797aee720 uop global buf number tracking try 2 [pr] (#7912)
* uop buffer init small refactor [pr]

* add early

* this way it doesn't need late

* buffer_num

* itertools.count

* count from 0

* down to 380
2024-12-02 14:45:17 +08:00
George Hotz
cbcc1c20eb second try at block linearize (#7892)
* second try at block linearize

* weeee, works for lil matmul

* it's so beautiful

* test tiny passes

* fix bugs

* combine matching BLOCKENDS

* wrapping

* test lin failures passes

* those failures were fake

* flip sort order

* fix ptx tests

* deal with store better

* dumb ptx fix

* expect less

* reduce lines

* reduce lines

* less lines and cleaner

* no defaultdict

* tighter

* simpler block_parent_count
2024-12-02 13:43:09 +08:00
mesozoic-egg
90e2b2d577 Remove gated store, put rewrite to uopgraph [pr] (#7975)
* update test for gated store

* put gated store rewrite to uopgraph, rm from ptx

* update test

update test

update test

* remove gated st rewrite in llvm

* lint

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 12:33:16 +08:00
George Hotz
d53cd92364 fix tests for delete lazy [pr] (#7980) 2024-12-02 12:00:48 +08:00
George Hotz
6c1efb9a72 hotfix: amd gemv was flaky 2024-12-02 11:08:24 +08:00
ignaciosica
509c4a573f increase tolerance on test (#7972) 2024-11-30 11:50:10 -05:00
qazal
6f17eedaea schedule sink folding try 2 [pr] (#7968) 2024-11-30 20:46:26 +08:00
qazal
5615e92df8 const folding tests [pr] (#7967) 2024-11-30 19:27:30 +08:00
qazal
8780818d04 Revert "schedule sink folding with graph_rewrite [pr] (#7963)" (#7965)
This reverts commit 4529c5d0da.
2024-11-30 19:02:06 +08:00
qazal
4529c5d0da schedule sink folding with graph_rewrite [pr] (#7963)
* schedule sink folding with graph_rewrite [pr]

* x is reserved, use u

* match lazy const folding
2024-11-30 18:30:41 +08:00
nimlgen
10f431b96d hcq replace update with sint (#7899)
* try sym hcq

* start with amd

* move to nv

* nv works

* cache and qcom

* fixes

* signals

* fix nv

* qcom fixes

* linter

* linter

* cache + typings

* fixes

* tiny fixes

* linter

* linter

* lntr

* ugh

* comments
2024-11-29 20:08:13 +03:00
chenyu
aa51f3c14e update kernel counts in test_real_world (#7960)
the test was useless because it was looking at the jit graph counts. wrap with JIT=2 for now.

if it's stable we could consider making kernel count strict, which helps change like #7940
2024-11-29 11:14:54 -05:00
geohotstan
e1a85c262c no type-tracker getitem refactor (#6917)
* newest newer than new refactor of getitem

* hmmm

* hmmmmmmmmmmmmmmmmm

* bro.

* ???

* small improvements

* cleaner, but why u gotta do this to me mypy

* fix, but still dunno about mypy

* even better

* try again? Passes locally

* use match

* fix mypy

* better

* broooooo check this out

* fix mypy

* bug fix

* fixed

* polish
2024-11-29 10:18:02 -05:00
Sieds Lykles
d267a2d9eb Div mod recombine test for issue (#7957)
* Add test for failing div_mod recombine

* Add test case when there is gcd in div/mod
2024-11-29 08:47:50 -05:00
Ahmed Harmouche
2d11765295 Fix WebGPU atomic store (#7954) 2024-11-29 19:31:25 +08:00
geohotstan
765096fe7d fix Tensor._pool edge case (#7581)
* split into another branch

* polish

* try this

* Revert "try this"

This reverts commit 84f711b13e.

* try

* Revert "try"

This reverts commit 89c7a7649b.

* idk anymore

* it is what it is

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-28 23:17:13 -05:00
chenyu
bb23469f93 lower conv threshold on red (#7948) 2024-11-28 13:31:06 -05:00
qazal
f39e9b4288 match lazy movement ops in uop [pr] (#7944) 2024-11-28 23:03:43 +08:00
chenyu
f54508549f don't search conv weight init in speed_v_theoretical (#7943) 2024-11-28 10:03:18 -05:00
qazal
aa7e16744e allow sinking childless consts and fold them [pr] (#7941) 2024-11-28 20:23:37 +08:00
George Hotz
c5c3b05b5a block lin: only the test changes (#7933) 2024-11-28 13:19:00 +08:00
George Hotz
32dbab945c Revert "add block uops and modify tests (#7931)" (#7932)
This reverts commit 6f4519ff45.
2024-11-28 13:15:41 +08:00
George Hotz
6f4519ff45 add block uops and modify tests (#7931) 2024-11-28 13:11:18 +08:00
Sieds Lykles
864758423e Don't take const in gcd and change the "nothing_changed" condition (#7926)
* Don't take const in gcd and change the "nothing_changed" condition

Biggest difference is probably actually that I forgot to check if gcd
changed if nothing else changed
The TODO was fixed by not using the const in the gcd, and then taking it
out

* Fix more tests
2024-11-27 18:07:36 -05:00
chenyu
988d64900b add TODO case to test_mod_congruence (#7925)
same alu count but better bounds
2024-11-27 15:23:21 -05:00
geohotstan
cea5853cfa add Tensor.scatter (#7737)
* working I think

* where are my onnx scatter tests??

* forward_only for now

* try if nan hack fix NV

* looks like issue is different... CUDA WHY

* oops that was wrong. Try if this fixes CUDA

* simpler multiply

* actually finish this up tmrw morning :x

* fix tests?

* improve tests

* improve test and implementation

* fix ruff

* complete but lots of expected failure...

* reviewed tests

* add onnx tests

* is this a processing op?

* add return type to indicate that it's not in-place

* final cleanups

* use or and improve tests a little

* add masked_index_select

* call it masked_setitem instead

* try

* FIXED

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:52:04 -05:00
geohotstan
753f07e193 add circular pad mode to Tensor.pad (#7918)
* start

* send it

* no more neg circular pads

* quick fix onnx too

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-27 10:30:51 -05:00