chenyu
e4c0743188
failed example for logcumsumexp ( #6936 )
...
need cummax for numerical stability
2024-10-07 10:55:45 -04:00
qazal
b82023c97e
process replay cleanup to generic _pmap [pr] ( #6929 )
...
* process replay cleanup to generic _pmap [pr]
* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
qazal
16312b4c59
rip out old scheduler process replay stuff, diff pure UOps [pr] ( #6927 )
2024-10-07 13:20:35 +08:00
wozeparrot
9eb6eef441
seed in tensor ( #6869 )
2024-10-06 14:46:58 -04:00
jeffzh4ng
19a7e41113
implement logcumsumexp ( #6921 )
...
* implement logcumsumexp
* change axis=None to axis=0
2024-10-06 10:45:36 -04:00
chenyu
75d9dcf000
support dtype in softmax and log_softmax ( #6914 )
...
matches torch. for mixed precision training, we would want to use float for softmax
2024-10-06 07:18:15 -04:00
chenyu
08414d7b7c
cleanup test_uop_symbolic.py ( #6894 )
...
no more test_symbolic for reference, so force expected output to be exact instead of a set
2024-10-04 20:53:10 -04:00
ignaciosica
555bcb5e54
static access for code_for_op ( #6889 )
2024-10-05 07:38:01 +08:00
vladov
5f6b6162b3
Suppress warnings in transcendental tests. ( #6891 )
2024-10-05 07:37:17 +08:00
George Hotz
4df5c7a4ef
move lazy to engine [pr] ( #6886 )
...
* move lazy to engine [pr]
* engine.lazy
2024-10-04 23:19:26 +08:00
George Hotz
6b063450df
move hcq device to runtime [pr] ( #6879 )
...
* things that are only used in one place don't belong in helpers [pr]
* start moving hcq device [pr]
* fix paths
2024-10-04 22:26:50 +08:00
George Hotz
8ca506ee37
remove the magic methods for moving between devices [pr] ( #6881 )
...
* remove the magic methods for moving between devices [pr]
* remove unneeded clang
2024-10-04 20:27:52 +08:00
George Hotz
a0cb16ac61
node cleanup + local metal test speed [pr] ( #6880 )
...
* node cleanup [pr]
* fix tests, including the double one on metal
* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz
cdff1d75b6
things that are only used in one place don't belong in helpers [pr] ( #6878 )
...
* things that are only used in one place don't belong in helpers [pr]
* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz
f4ec39fe58
switch symbolic from old to uops, final PR ( #6872 )
...
* switch symbolic from old to uops, final PR
* two wrong answers
* not needed resolves
* symbolic ops passes
* symbolic ops passes
* progress
* tests pass (almost)
* fix last test
* fix some tests
* global binding and unbinding
* Revert "global binding and unbinding"
This reverts commit 9456725630.
* that test works now
* vars on uop doesn't recurse
* fix fuzzer
* update
* fix type
* fix gpt, it's UOp now
* ssimplify symbolics
2024-10-04 16:42:27 +08:00
George Hotz
738a5794a9
last update for new symbolic [pr] ( #6877 )
2024-10-04 14:58:51 +08:00
qazal
17068410e6
give EXT schedules metadata [pr] ( #6865 )
2024-10-03 20:14:18 +08:00
George Hotz
e10245909a
explore global uop cache [pr] ( #6863 )
...
* explore global uop cache
* wvd uops
* remove useless lru caches
* key is is
* simpler rewriter
2024-10-03 13:08:13 +08:00
chenyu
c3c93f332a
symbolic bool raise ValueError when not sure [pr] ( #6853 )
2024-10-02 09:10:58 -04:00
George Hotz
7214450c23
little symbolic changes [pr] ( #6849 )
...
* little symbolic changes [pr]
* symbolic needs resolve too
* no resolve
* less change
2024-10-02 17:12:30 +08:00
George Hotz
be12409b51
changes for symbolic ( #6844 )
...
* changes for symbolic
* only for ints
* check int first
2024-10-02 12:57:16 +08:00
George Hotz
100ce7a684
hotfix: min/max on CMPNE was wrong
2024-10-02 10:15:03 +08:00
George Hotz
1ac83aaa4b
lil sym changes ( #6837 )
...
* lil sym changes [pr]
* fix inf crap
* Update ops.py
* remove that, it's wrong
2024-10-02 09:54:17 +08:00
George Hotz
84726e8855
good changes from symbolic removal [run_process_replay] ( #6835 )
...
* good changes from symbolic removal [run_process_replay]
* fix __ne__
2024-10-01 18:49:09 +08:00
qazal
c5b252cdb3
add pr alias [pr] ( #6834 )
2024-10-01 18:48:44 +08:00
George Hotz
e907b25792
move some pm rules to uopgraph.py [run_process_replay] ( #6831 )
...
* move some pm rules to uopgraph.py [run_process_replay]
* move more
* move lt and clean
* end maybe
* put back
2024-10-01 18:28:41 +08:00
vladov
501cfde7e6
Fix GPT2 with OpenCL backend. ( #6821 )
...
* Fix GPT2 with OpenCL backend.
* Add test for unaligned copies into OpenCL buffers.
2024-10-01 16:57:22 +08:00
qazal
a16a8c5958
color process replay stats [run_process_replay] ( #6830 )
2024-10-01 15:29:11 +08:00
George Hotz
547733e57c
stunning_mnist [run_process_replay] ( #6828 )
...
* stunning_mnist [run_process_replay]
* add loss to stunning mnist
2024-10-01 15:00:48 +08:00
qazal
391497a311
schedule independent of Device [run_process_replay] ( #6829 )
2024-10-01 14:46:26 +08:00
George Hotz
8a93c48901
pickle main pattern matcher [run_process_replay] ( #6827 )
...
* pickle main pattern matcher [run_process_replay]
* del line
2024-10-01 13:58:42 +08:00
George Hotz
d726eb6f48
uop resolve [run_process_replay] ( #6826 )
...
* uop bool and int and stuff [run_process_replay]
* add ne support
* can't even be None anymore
* BinaryOps.AND support
* less compare
2024-10-01 13:11:42 +08:00
George Hotz
50dd6bd951
move cmp tuple out [run_process_replay] ( #6825 )
...
* move cmp tuple out [run_process_replay]
* was unneeded
2024-10-01 10:38:28 +08:00
George Hotz
0f28e93224
add pickle support for pattern matchers [run_process_replay] ( #6816 )
...
* add pickle support for pattern matchers [run_process_replay]
* cleaner and all
* no closures
* fix tests
* revert that
* final
* cleaner
* python 3.8 fix
* add round trip back
* this
* waste lines on this. that's the final line count
* max print better
* more targeted fix
* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
qazal
0c24fec9f4
test current behavior of const schedule [run_process_replay] ( #6817 )
2024-09-30 21:02:01 +08:00
qazal
4a4aa69b84
add a better dedup test for DEFINE_VAR with CONST arg ( #6813 )
2024-09-30 15:43:55 +08:00
qazal
e7fcbe1a4d
refactor test_linearizer correctness asserts ( #6812 )
2024-09-30 15:31:02 +08:00
George Hotz
9dd9f71011
no global kernel stuff [run_process_replay] ( #6808 )
...
* use traceback instead of global metadata crap [run_process_replay]
* save the kernel
* correct, imports clean, no device
* UNPARENTED
* speed
* proudly unparented
* Update ops.py
* update tests for unparented
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-30 13:52:33 +08:00
qazal
2ec73d6f05
push swizzle through dim change ( #6801 )
...
* push swizzle through dim change
* can this be generic
* generic version
* cleanups
2024-09-30 09:04:59 +08:00
qazal
dab05ff070
match dataclass.replace in UOp.replace [run_process_replay] ( #6792 )
...
* UOp replace matching dataclass replace
* p2
* replace creates a copy
2024-09-28 16:28:49 +08:00
George Hotz
eaa1e0eeeb
rename constant_folder to sym [run_process_replay] ( #6780 )
2024-09-27 14:54:54 +08:00
George Hotz
c178dc1071
faster uops ci [run_process_replay] ( #6774 )
2024-09-26 20:15:01 +08:00
nimlgen
3c56aeee70
add Tensor.from_blob ( #6765 )
...
* draft tensor from pointer init
* some docs and types
* comment
* cleaner
* test
* malloc
* qcom cl interop
* jit example
* cleaner
* dealloc
* wording
* docs
2024-09-26 18:33:19 +08:00
George Hotz
14ad47b515
rewrite to use uops if ( #6764 )
...
* rewrite to use uops if
* does this pass
* careful penalty
* fix tests
* remove unused stuff
* that's a cstyle rewrite
* Update test_linearizer_dumb.py
2024-09-26 18:09:09 +08:00
wozeparrot
2b899164c6
no numpy ( #6751 )
2024-09-26 16:40:18 +08:00
qazal
ee4feedb77
delete test_variable_const [run_process_replay] ( #6757 )
...
* delete test_variable_const [run_process_replay]
* don't allow variable UPat
2024-09-26 12:27:11 +08:00
George Hotz
b199b699ed
use shl everywhere ( #6744 )
...
* use shl everywhere
* fix parens
* late patterns
* works as an extra pass
* ptx
2024-09-26 09:59:36 +08:00
qazal
12e4a4900a
hotfix: missing return in METAL dm benchmark ( #6749 )
2024-09-26 09:12:38 +08:00
qazal
8a15ccb414
start gc/mem usage tests for buffer schedule [run_process_replay] ( #6737 )
...
* gc tests for buffer schedule [run_process_replay]
* assert global counters, maybe del
* check init
* rm global counters
2024-09-26 08:26:31 +08:00
qazal
b629a7998d
early assert buffer count limit [run_process_replay] ( #6746 )
...
* better error message for buffer count limit [run_process_replay]
* 3.9 needs that
* assert ScheduleItem
* new _test_buf_cnt
2024-09-26 08:24:26 +08:00