Ahmed Harmouche
133cacadde
Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
* Switch to dawn, all tests passing locally
* Use dawn-python
* Skip failing test
* Skip midcast and fix timestamp on metal ci
* Autogen webgpu
* Try fetch dawn lib again
* /usr/lib
* Without lib prefix
* Test autogen diff
* Delete webgpu support, move everything to ops_webgpu
* mypy fix
* Simplify, refactor
* Line savings
* No ResultContainer
* Type annotation for result
* Some more simplifications
* Why was this explicit sync used at all?
* Refactor: delete functions that are only used once
* Create shader module inline
* Clear unit tests cache, maybe that solves it
* That wasn't it
* Try deleting cache to pass failing weight compare
* weights_only=False for pytorch 2.6
* Simplify ctype array creation
* Remove nanosecond precision timestamps
* Simplify error handling
* Refactor, add back type annotations
* Deleted custom submit function, refactor
* read_buffer simplify
* Fix use after free, refactor
* Simplify supported_features
* Runtime docs
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-07 15:16:59 +08:00
chenyu
a092b6395d
Tuple -> tuple, List -> list [pr] (#8936)
2025-02-06 14:21:19 -05:00
Ignacio Sica
15f94ac964
TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793)
* squash search over search
* refactor assert
* init benchmark
* cleaner get_kernel_actions
* cleaner get_kernel_actions
* add comment
2025-02-05 11:03:46 -05:00
Ignacio Sica
260df1a17f
tc_select noop (#8801)
* tc_select noop
* revert changes in test
2025-01-29 13:53:23 -05:00
qazal
ba17786068
do not construct unmasked VALID (#8759)
* new lines that exist in codegen/ops
* update tests
* update sops.gz (13071 -> 13070 asts)
* fix viz too
* remove that TODO
* diff pruning
* mask assert + device
* work
* diff pruning
* re: fix viz too
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 20:51:21 +02:00
Ignacio Sica
b240f12593
[TIP-9] rename Opt's amt to arg 2 (#8770)
* rename Opt amt to arg
* ignore_beam_cache for test_tiny
* move ignore_beam_cache to test_tiny
* move to separate pr
* revert space change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-27 14:19:04 -05:00
George Hotz
3ed146a5ff
Revert "rename Opt amt to arg (#8767)" (#8769)
This reverts commit bf041659a5.
2025-01-27 23:46:37 +09:00
Ignacio Sica
bf041659a5
rename Opt amt to arg (#8767)
2025-01-27 23:36:47 +09:00
George Hotz
b4bf6a7dea
switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
2025-01-26 09:12:16 +09:00
George Hotz
46a8c5e1e5
delete forced_realize (#8615)
* delete forced_realize
* put that back
* expectedFailures
* cleaner create_subbuffer
* more comments
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
qazal
d957a4f108
add tests for div buffer collapsing in the scheduler [pr] (#8671)
* add tests for mul/div buffer collapsing in the scheduler [pr]
* lint
* merge with test_linearizer's version of this
* 4*3
2025-01-18 14:15:29 -05:00
ignaciosica
d2234e308a
tf32 tc for nv and ptx (#8635)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-17 17:43:57 -08:00
qazal
ae2229d727
assert kernel buffer limit at compile time [pr] (#8595)
* remove the BUF_LIMIT assert
* skip the base one
2025-01-13 16:32:07 -05:00
qazal
586e730d32
use UOp.st for kernel reduce axes (#8499)
* use UOp.st for kernel reduce axes [pr]
* do not return dict
2025-01-13 06:24:11 -05:00
qazal
866dfa1f23
create_schedule([x.lazydata]) -> x.schedule() in tests (#8449)
2024-12-31 03:15:52 +08:00
George Hotz
29c14f1cbf
hotfix: update tests for no uop mut
2024-12-30 10:05:37 -05:00
ignaciosica
ba0c844a83
special tol when f16 and bf16 are tc input dtypes (#8183)
2024-12-21 11:32:26 -05:00
George Hotz
bd9c015b09
tests from grad uop path [pr] (#8313)
2024-12-18 09:25:05 -08:00
Ahmed Harmouche
a73e3677d0
Test linearizer on webgpu (#8159)
* Test linearizer on wgpu
* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
qazal
6be388be86
failing test for const folding breaking indexing [pr] (#8103)
2024-12-07 19:55:02 +08:00
George Hotz
0c7477b108
no bool in range [pr] (#7988)
* no bool in range [pr]
* fix llvm
* add arg to range spec
* fix broken test
* forgot this one
* hotfix: test_tiny jit is a real test
2024-12-02 19:05:16 +08:00
George Hotz
f17af70d17
replace all sparents with toposort (#7983)
2024-12-02 15:00:30 +08:00
George Hotz
c5c3b05b5a
block lin: only the test changes (#7933)
2024-11-28 13:19:00 +08:00
George Hotz
32dbab945c
Revert "add block uops and modify tests (#7931)" (#7932)
This reverts commit 6f4519ff45.
2024-11-28 13:15:41 +08:00
George Hotz
6f4519ff45
add block uops and modify tests (#7931)
2024-11-28 13:11:18 +08:00
chenyu
a58e289d77
Revert "prereqs for new block lin so PR works (#7919)" (#7921)
This reverts commit c53261b541.
2024-11-27 08:41:09 -05:00
George Hotz
c53261b541
prereqs for new block lin so PR works (#7919)
2024-11-27 15:07:54 +08:00
ignaciosica
fc3154a7b3
metal bf16 tc support [pr] (#7408)
* add bf16 tc for metal
* hotfix: spacing
* fix tolerance and skip metal bf16 in ci
* hotfix: check for dtype_out
* hotfix: add check for tc.dtype_out is bf16 back
* hotfix: add parens
2024-11-20 14:39:08 -05:00
George Hotz
bc977fec53
dname -> device [pr] (#7804)
* dname -> device [pr]
* a few more
* only one left
2024-11-20 17:57:14 +08:00
geohotstan
8100109c9d
Add replicate mode to Tensor.pad (#7608)
* base implementation
* add tests
* actually remove the assertionerror test
* actually only have reflect for this pr
* change the 4 if-else one liner
* maybe use a lambda
* fix
* maybe a lil cleaner
* fix tests
* complete
* small change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-18 10:55:38 -05:00
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725)
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
George Hotz
0a411b4f68
replace llvm with new llvm (#7616)
* replace llvm with new llvm
* fix test_linearizer
* minor fixups
* fix alloca
* don't use alloca
* fix DEFINE_ACC
* lines
* comments and lines
* a little tighter
2024-11-10 11:28:52 +08:00
Ahmed Harmouche
e35226e698
Remove Ops.ALU (#7595)
2024-11-08 19:52:14 +08:00
Carl Basho
630a7f37cf
update tests (#7554)
Co-authored-by: John Doe <null@mail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-05 11:35:15 -05:00
George Hotz
99bd4372a5
Ops.ALU is no more, the arg is just an op (#7525)
* op arg alu [pr]
* more
* more passing
* fix more tests
* more tests passing
* fix single failing test
* so much cleaner
* noop to not have process replay trigger
* fix ptx
2024-11-05 00:22:22 +08:00
George Hotz
c8bf09b7d4
s/UOps/Ops (#7500)
* s/UOps/Ops [pr]
* fix
2024-11-03 11:26:10 +08:00
George Hotz
4cb236a495
index in cstyle (#7328)
* index only in cstyle
* fix prefix dtypes
* fix tests
* global indexing
* Revert "global indexing"
This reverts commit 4d507e8abb.
* fix image
* fix image
* ptx tests
* fix CUDA dtype rendering
2024-10-29 13:06:26 +08:00
George Hotz
4812801aa6
try for canonical order (#7286)
* try for canonical order
* cmp better
* disable bad tests
* flip const order
* fix test
* fix tests
* different fix for NOOP
* metaclass here
* fix tests
* narrower scope
2024-10-25 16:04:54 +08:00
qazal
d2b608233a
get outbufs by globals idxs [pr] (#7233)
2024-10-23 16:06:35 +03:00
George Hotz
b0a13896d7
PtrDType is dataclass [pr] (#7125)
* PtrDType is dataclass [pr]
* new dataset
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-18 09:40:33 -04:00
George Hotz
ded1b38b84
minor dtype cleanup [pr] (#7124)
* minor dtype cleanup [pr]
* use ptr() function
2024-10-17 17:41:23 +08:00
George Hotz
a71bb09ec3
remove symbolic file [pr] (#7012)
2024-10-12 18:44:44 +08:00
qazal
20d3c2d113
unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW (#6955)
* add UOps.VIEW
* update hardcoded asts
* update sops.gz
2024-10-09 02:00:17 +08:00
qazal
391497a311
schedule independent of Device [run_process_replay] (#6829)
2024-10-01 14:46:26 +08:00
George Hotz
50dd6bd951
move cmp tuple out [run_process_replay] (#6825)
* move cmp tuple out [run_process_replay]
* was unneeded
2024-10-01 10:38:28 +08:00
qazal
e7fcbe1a4d
refactor test_linearizer correctness asserts (#6812)
2024-09-30 15:31:02 +08:00
qazal
e0d8685c99
test_masked_upcast_wino check device buf_max (#6723)
2024-09-25 11:26:53 +08:00
George Hotz
7c38121280
load penalty (#6681)
* bias/bn loads after loops
* load penalty in fix_priority
* more generic test
2024-09-23 18:12:12 +08:00
qazal
982086f54c
UOps.VALID try 2 (#6623)
* make UOps.VALID compile
* fixable tests
* bufs dedup
* cleanup the CONST spec
* regenerate dataset with graph_rewrite
```py
def rewrite_const(const:UOp, st_src:UOp) -> UOp:
  st: ShapeTracker = st_src.arg
  return UOp(UOps.VALID, dtypes.bool, (st.to_uop(),)).where(UOp.const(const.dtype, const.arg), UOp.const(const.dtype, 0))
pm = PatternMatcher([(UPat(UOps.CONST, name="const", src=(UPat(UOps.SHAPETRACKER, name="st_src"),)), rewrite_const)])
```
* rm arg
* remove arg
* revert arg removal
This reverts commit 2c35c75c95.
* red test_pickle_define_var
2024-09-21 14:19:25 +08:00
George Hotz
42ba887daa
remove logic to vectorize reduces (#6536)
* remove logic to vectorize reduces
* fix tests
2024-09-16 14:04:48 +08:00