chenyu
a77ee72d11
clean up reshape size check [pr] ( #8067 )
...
removed a resolve, and remove special case for 0 size assert since it's covered by generic size check
2024-12-06 07:51:19 -05:00
geohotstan
074a67a6eb
combine get inputs and type_parse function in onnx ( #8069 )
...
* 1 is simpler than 2
* variable name
* change error wording
* shapes for sequence type must be homogeneous
2024-12-06 07:42:35 -05:00
nimlgen
c0240855b9
qcom has not transfer ( #8075 )
...
* qcom alloc is not hcq alloc
* maybe base?
* test
2024-12-06 14:45:01 +03:00
Ahmed Harmouche
ce72fe1411
u32 to f16 in tinygrad ( #8074 )
...
* f16 decompression in tinygrad
* Typing and cleanup
2024-12-06 12:00:13 +01:00
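A rough illustration of what "u32 to f16" decompression can look like, assuming each 32-bit word packs two little-endian half-precision values, low half first. That layout is an assumption, not something the commit states, and `u32_to_f16_pair` is a hypothetical helper, not a tinygrad function:

```python
import struct

def u32_to_f16_pair(word):
    """Split one unsigned 32-bit word into two IEEE 754 half-precision floats.

    Assumed layout (hypothetical): low 16 bits hold the first value,
    high 16 bits hold the second, both little-endian.
    """
    raw = struct.pack("<I", word)      # 4 bytes, little-endian
    return struct.unpack("<2e", raw)   # reinterpret as two binary16 values

# 0x3C00 encodes 1.0 and 0x4000 encodes 2.0 in binary16
print(u32_to_f16_pair(0x40003C00))
```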
George Hotz
e37bff6c19
fix bug in jit prune with copy [pr] ( #8073 )
2024-12-06 18:38:23 +08:00
George Hotz
aae8557ada
test copy inside jit [pr] ( #8072 )
2024-12-06 17:51:50 +08:00
chenyu
e7d5fe4a32
improve idiv _min_max ( #8066 )
...
for the cases where we don't know the exact bounds, we might still know the sign. with this, we can remove some resolves for the symbolic shapetracker
2024-12-05 23:02:16 -05:00
Sieds Lykles
49c6dab74b
Add pattern for div mod recombine with gcd ( #8061 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-05 13:16:58 -05:00
chenyu
5c6ed5dba6
lower test_conv_3x3_256_32_32_256_256 expectation ( #8060 )
...
failed https://github.com/tinygrad/tinygrad/actions/runs/12182799887/job/33982676812#step:9:210
2024-12-05 10:30:56 -05:00
Ahmed Harmouche
ff9a89f714
Proper dtypes for input/output of exported WebGPU model ( #8053 )
...
* Respect input/output dtypes in exported WebGPU model
* Add some comments about skipped dtypes
2024-12-05 10:38:05 +01:00
qazal
435a51e10c
reduce folding simple tests [pr] ( #8040 )
...
* reduce folding simple tests [pr]
* test for view and realized src pattern
* realize / buffer behavior
2024-12-05 12:22:45 +08:00
George Hotz
20878be2af
lower test_gemv_4096_16384 expectations
2024-12-05 12:08:26 +08:00
George Hotz
df18e7cc37
accept filename decorator [pr] ( #8049 )
...
* accept filename decorator [pr]
* add test for safe_load
* bring old tar tests back
2024-12-05 11:40:59 +08:00
chenyu
b3220ca7b1
test cases of always True/False lt ( #8048 )
...
* test cases of always True/False lt
* one more
2024-12-04 20:38:40 -05:00
geohotstan
5ce8090d42
simple onnx_ops cleanups ( #8003 )
...
* simple clean ups first
* more work
* kinda have adam
* ooo momentum worked nicely
* almost there
* wow.. is the onnx test wrong
* nicer optim stuff
* just skip that test
* small comment changes
* use naming convention from other parts of codebase
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 15:33:03 -05:00
Sieds Lykles
70db1bab5c
Fold nested div with const ( #8010 )
...
* Rebase nested div and with const
* Update the ordering
* return None on vectors
Fixes cpu test
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 14:59:09 -05:00
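The arithmetic identity behind folding a nested division by constants can be checked with a small sketch. It uses Python's floor division and positive constants only; the function names are illustrative, and the actual rewrite rule in the commit may carry extra conditions not shown here:

```python
def nested_div(x, a, b):
    """Divide by a, then by b (the unfolded form)."""
    return (x // a) // b

def folded_div(x, a, b):
    """Single division by the product of the two constants."""
    return x // (a * b)

# Brute-force check over a small range, with positive divisors only
mismatches = [
    (x, a, b)
    for x in range(-60, 61)
    for a in range(1, 7)
    for b in range(1, 7)
    if nested_div(x, a, b) != folded_div(x, a, b)
]
print("mismatches:", len(mismatches))
```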
chenyu
0693158d28
lower v_theoretical gemv on red ( #8042 )
...
tiny7 is still slower https://github.com/tinygrad/tinygrad/actions/runs/12166149038/job/33931736130#step:8:209
2024-12-04 13:59:40 -05:00
qazal
b116e1511d
make device on uop optional [pr] ( #8034 )
2024-12-04 20:18:00 +08:00
Ahmed Harmouche
13eedd373b
Run WebGPU tests on ubuntu ( #8033 )
2024-12-04 12:42:04 +01:00
George Hotz
08657cb7b0
hotfix: bump expectations in speed_v_theoretical
2024-12-04 19:00:33 +08:00
George Hotz
ea65c79ba2
hotfix: don't spam BEAM debug in speed_v_theoretical
2024-12-04 18:47:16 +08:00
George Hotz
09b00b1b04
hotfix: use kernel timings instead of python timings in speed_v_theoretical
2024-12-04 18:36:17 +08:00
leopf
f0401e14e8
tar_extract with Tensors ( #7853 )
...
* initial
* USTAR, PAX and GNU support + testing
* from_bytes byteorder
* use TarInfo.frombuf
* tensor only usage
* remove contextlib.suppress
* shorter ow,pax
* more tests
* testing length + move tests
* cleanup
* new approach: RawTensorIO
* fix fetch
* enable read test
* cleanup and ignore fix
* fix for python < 3.12
* make it RawIO
* functions
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-04 17:03:19 +08:00
uuuvn
e9c5b23ba1
Use MTLCompiler directly (v2) ( #7920 )
...
* Use MTLCompiler directly (v2)
* to_block_literal and REQUEST_TYPE_COMPILE
* Rewrite command encoding
* Revert to_block_literal
* Maybe that's more readable to some people?
* Typo and comment about stdlib caching
* Update ops_metal.py
* Update ops_metal.py
* Update ops_metal.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
chenyu
0c060fa040
update uop and tests to not use lt/gt/le/ge [pr] ( #8023 )
...
just use dunder methods, eventually remove those from ops
2024-12-03 21:02:52 -05:00
Ahmed Harmouche
db330a3110
Remove WebGL ( #8012 )
2024-12-03 16:02:53 +01:00
chenyu
ef3752625b
add test case of realize_size with 0 in shape ( #8011 )
2024-12-03 09:19:50 -05:00
George Hotz
09eac42fd6
cache indexed uops in st [pr] ( #8008 )
...
* cache indexed uops in st [pr]
* remove arg from range
2024-12-03 21:27:07 +08:00
Sieds Lykles
e44183647f
Improved div folding ( #7996 )
...
* First version of div_mod folding together
* Working version with old div folding behaviour
* Test is fixed
* Fix linting
* Happy mypy
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-03 08:11:25 -05:00
qazal
5441127417
assert const folding return shape matches [pr] ( #8006 )
2024-12-03 19:31:06 +08:00
George Hotz
dddfb494d7
don't mutate the uop/lazybuffer, just the Buffer [pr] ( #8000 )
...
* don't mutate the uop/lazybuffer, just the Buffer [pr]
* fix red test
* try different fix
* that
* that's the right fix
* test for fixed behavior
* bump to 3.12
2024-12-03 19:03:51 +08:00
George Hotz
b8bf5b2787
minor uop speedups [pr] ( #8002 )
...
* minor uop cleaner [pr]
* free uop creation speed by removing WeakValueDictionary
* a lil faster
* disable that test
* lines
* and it doesn't print non hit patterns
2024-12-03 17:04:48 +08:00
George Hotz
0905f87b68
hotfix: print only kernel time
2024-12-03 14:25:08 +08:00
chenyu
c7bc75e634
alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) ( #7900 )
...
* alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1)
only do if at least one branch is const, so total alu won't increase
* tests and interesting TODO cases
2024-12-02 17:19:27 -05:00
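The rewrite alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) relies on both selects sharing the same condition c. A minimal sketch of that equivalence on plain Python scalars follows; `where`, `alu_of_wheres` and `where_of_alus` are illustrative names, not tinygrad APIs:

```python
import operator

def where(c, t, f):
    """Scalar select: t when c is true, otherwise f."""
    return t if c else f

def alu_of_wheres(alu, c, t0, f0, t1, f1):
    """Original form: two selects feed one ALU op."""
    return alu(where(c, t0, f0), where(c, t1, f1))

def where_of_alus(alu, c, t0, f0, t1, f1):
    """Rewritten form: one select chooses between two ALU results."""
    return where(c, alu(t0, t1), alu(f0, f1))

for c in (True, False):
    before = alu_of_wheres(operator.add, c, 1, 2, 3, 4)
    after = where_of_alus(operator.add, c, 1, 2, 3, 4)
    print(c, before, after)
```

The rewritten form contains two ALU ops instead of one, which is why the commit applies it only when at least one branch is const: that branch can then be folded, so the total ALU count does not grow.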
chenyu
b91fa24387
script to run regressed sd conv on metal ( #7995 )
...
* script to run regressed sd conv on metal
this and other similar `conv2d + add` kernels contributed to most of the speed regression
* # ruff: noqa: E501
2024-12-02 15:34:27 -05:00
geohotstan
0a2e10be1d
add SELU to Tensor ( #7993 )
...
* add selu
* more clean ups
2024-12-02 10:04:01 -05:00
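For reference, SELU is commonly defined as scale * (x if x > 0 else alpha * (exp(x) - 1)). The sketch below is a scalar version in plain Python using the standard constants; it is not the Tensor method added in the commit, whose signature and defaults are not shown in this log:

```python
import math

# Standard SELU constants from the usual definition
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x, alpha=SELU_ALPHA, scale=SELU_SCALE):
    """Scalar SELU: scaled identity for x > 0, scaled exponential curve otherwise."""
    return scale * (x if x > 0 else alpha * (math.exp(x) - 1.0))

for v in (-2.0, 0.0, 1.0):
    print(v, selu(v))
```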
qazal
bb606e5bcf
process replayable ops.py changes from delete_lazy [pr] ( #7994 )
...
* process replayable ops.py changes from delete_lazy [pr]
* hotfix: seed tiny_jit
2024-12-02 19:38:31 +08:00
George Hotz
0c7477b108
no bool in range [pr] ( #7988 )
...
* no bool in range [pr]
* fix llvm
* add arg to range spec
* fix broken test
* forgot this one
* hotfix: test_tiny jit is a real test
2024-12-02 19:05:16 +08:00
Ahmed Harmouche
1ea0925744
Support packed types in smem in webgpu
2024-12-02 10:13:25 +01:00
George Hotz
275951b730
clean up a few parents -> toposort [pr] ( #7984 )
...
* clean up a few parents -> toposort [pr]
* rename to old_parents + sched tests
* a few more
* that one
* second to last
* final
2024-12-02 15:59:31 +08:00
George Hotz
f17af70d17
replace all sparents with toposort ( #7983 )
2024-12-02 15:00:30 +08:00
qazal
b797aee720
uop global buf number tracking try 2 [pr] ( #7912 )
...
* uop buffer init small refactor [pr]
* add early
* this way it doesn't need late
* buffer_num
* itertools.count
* count from 0
* down to 380
2024-12-02 14:45:17 +08:00
George Hotz
cbcc1c20eb
second try at block linearize ( #7892 )
...
* second try at block linearize
* weeee, works for lil matmul
* it's so beautiful
* test tiny passes
* fix bugs
* combine matching BLOCKENDS
* wrapping
* test lin failures passes
* those failures were fake
* flip sort order
* fix ptx tests
* deal with store better
* dumb ptx fix
* expect less
* reduce lines
* reduce lines
* less lines and cleaner
* no defaultdict
* tighter
* simpler block_parent_count
2024-12-02 13:43:09 +08:00
mesozoic-egg
90e2b2d577
Remove gated store, put rewrite to uopgraph [pr] ( #7975 )
...
* update test for gated store
* put gated store rewrite to uopgraph, rm from ptx
* update test
update test
update test
* remove gated st rewrite in llvm
* lint
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 12:33:16 +08:00
George Hotz
d53cd92364
fix tests for delete lazy [pr] ( #7980 )
2024-12-02 12:00:48 +08:00
George Hotz
6c1efb9a72
hotfix: amd gemv was flaky
2024-12-02 11:08:24 +08:00
ignaciosica
509c4a573f
increase tolerance on test ( #7972 )
2024-11-30 11:50:10 -05:00
qazal
6f17eedaea
schedule sink folding try 2 [pr] ( #7968 )
2024-11-30 20:46:26 +08:00
qazal
5615e92df8
const folding tests [pr] ( #7967 )
2024-11-30 19:27:30 +08:00
qazal
8780818d04
Revert "schedule sink folding with graph_rewrite [pr] ( #7963 )" ( #7965 )
...
This reverts commit 4529c5d0da.
2024-11-30 19:02:06 +08:00