Commit Graph

4433 Commits

Author SHA1 Message Date
chenyu
e4c0743188 failed example for logcumsumexp (#6936)
need cummax for numerical stability
2024-10-07 10:55:45 -04:00
qazal
b82023c97e process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]

* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
qazal
16312b4c59 rip out old scheduler process replay stuff, diff pure UOps [pr] (#6927) 2024-10-07 13:20:35 +08:00
wozeparrot
9eb6eef441 seed in tensor (#6869) 2024-10-06 14:46:58 -04:00
jeffzh4ng
19a7e41113 implement logcumsumexp (#6921)
* implement logcumsumexp

* change axis=None to axis=0
2024-10-06 10:45:36 -04:00
chenyu
75d9dcf000 support dtype in softmax and log_softmax (#6914)
matches torch. for mixed precision training, we would want to use float for softmax
2024-10-06 07:18:15 -04:00
chenyu
08414d7b7c cleanup test_uop_symbolic.py (#6894)
no more test_symbolic for reference, so force expected output to be exact instead of a set
2024-10-04 20:53:10 -04:00
ignaciosica
555bcb5e54 static access for code_for_op (#6889) 2024-10-05 07:38:01 +08:00
vladov
5f6b6162b3 Suppress warnings in transcendental tests. (#6891) 2024-10-05 07:37:17 +08:00
George Hotz
4df5c7a4ef move lazy to engine [pr] (#6886)
* move lazy to engine [pr]

* engine.lazy
2024-10-04 23:19:26 +08:00
George Hotz
6b063450df move hcq device to runtime [pr] (#6879)
* things that are only used in one place don't belong in helpers [pr]

* start moving hcq device [pr]

* fix paths
2024-10-04 22:26:50 +08:00
George Hotz
8ca506ee37 remove the magic methods for moving between devices [pr] (#6881)
* remove the magic methods for moving between devices [pr]

* remove unneeded clang
2024-10-04 20:27:52 +08:00
George Hotz
a0cb16ac61 node cleanup + local metal test speed [pr] (#6880)
* node cleanup [pr]

* fix tests, including the double one on metal

* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz
cdff1d75b6 things that are only used in one place don't belong in helpers [pr] (#6878)
* things that are only used in one place don't belong in helpers [pr]

* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz
f4ec39fe58 switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
George Hotz
738a5794a9 last update for new symbolic [pr] (#6877) 2024-10-04 14:58:51 +08:00
qazal
17068410e6 give EXT schedules metadata [pr] (#6865) 2024-10-03 20:14:18 +08:00
George Hotz
e10245909a explore global uop cache [pr] (#6863)
* explore global uop cache

* wvd uops

* remove useless lru caches

* key is is

* simpler rewriter
2024-10-03 13:08:13 +08:00
chenyu
c3c93f332a symbolic bool raise ValueError when not sure [pr] (#6853) 2024-10-02 09:10:58 -04:00
George Hotz
7214450c23 little symbolic changes [pr] (#6849)
* little symbolic changes [pr]

* symbolic needs resolve too

* no resolve

* less change
2024-10-02 17:12:30 +08:00
George Hotz
be12409b51 changes for symbolic (#6844)
* changes for symbolic

* only for ints

* check int first
2024-10-02 12:57:16 +08:00
George Hotz
100ce7a684 hotfix: min/max on CMPNE was wrong 2024-10-02 10:15:03 +08:00
George Hotz
1ac83aaa4b lil sym changes (#6837)
* lil sym changes [pr]

* fix inf crap

* Update ops.py

* remove that, it's wrong
2024-10-02 09:54:17 +08:00
George Hotz
84726e8855 good changes from symbolic removal [run_process_replay] (#6835)
* good changes from symbolic removal [run_process_replay]

* fix __ne__
2024-10-01 18:49:09 +08:00
qazal
c5b252cdb3 add pr alias [pr] (#6834) 2024-10-01 18:48:44 +08:00
George Hotz
e907b25792 move some pm rules to uopgraph.py [run_process_replay] (#6831)
* move some pm rules to uopgraph.py [run_process_replay]

* move more

* move lt and clean

* end maybe

* put back
2024-10-01 18:28:41 +08:00
vladov
501cfde7e6 Fix GPT2 with OpenCL backend. (#6821)
* Fix GPT2 with OpenCL backend.

* Add test for unaligned copies into OpenCL buffers.
2024-10-01 16:57:22 +08:00
qazal
a16a8c5958 color process replay stats [run_process_replay] (#6830) 2024-10-01 15:29:11 +08:00
George Hotz
547733e57c stunning_mnist [run_process_replay] (#6828)
* stunning_mnist [run_process_replay]

* add loss to stunning mnist
2024-10-01 15:00:48 +08:00
qazal
391497a311 schedule independent of Device [run_process_replay] (#6829) 2024-10-01 14:46:26 +08:00
George Hotz
8a93c48901 pickle main pattern matcher [run_process_replay] (#6827)
* pickle main pattern matcher [run_process_replay]

* del line
2024-10-01 13:58:42 +08:00
George Hotz
d726eb6f48 uop resolve [run_process_replay] (#6826)
* uop bool and int and stuff [run_process_replay]

* add ne support

* can't even be None anymore

* BinaryOps.AND support

* less compare
2024-10-01 13:11:42 +08:00
George Hotz
50dd6bd951 move cmp tuple out [run_process_replay] (#6825)
* move cmp tuple out [run_process_replay]

* was unneeded
2024-10-01 10:38:28 +08:00
George Hotz
0f28e93224 add pickle support for pattern matchers [run_process_replay] (#6816)
* add pickle support for pattern matchers [run_process_replay]

* cleaner and all

* no closures

* fix tests

* revert that

* final

* cleaner

* python 3.8 fix

* add round trip back

* this

* waste lines on this. that's the final line count

* max print better

* more targetted fix

* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
qazal
0c24fec9f4 test current behavior of const schedule [run_process_replay] (#6817) 2024-09-30 21:02:01 +08:00
qazal
4a4aa69b84 add a better dedup test for DEFINE_VAR with CONST arg (#6813) 2024-09-30 15:43:55 +08:00
qazal
e7fcbe1a4d refactor test_linearizer correctness asserts (#6812) 2024-09-30 15:31:02 +08:00
George Hotz
9dd9f71011 no global kernel stuff [run_process_replay] (#6808)
* use traceback instead of global metadata crap [run_process_replay]

* save the kernel

* correct, imports clean, no device

* UNPARENTED

* speed

* proudly unparented

* Update ops.py

* update tests for unparented

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-30 13:52:33 +08:00
qazal
2ec73d6f05 push swizzle through dim change (#6801)
* push swizzle through dim change

* can this be generic

* generic version

* cleanups
2024-09-30 09:04:59 +08:00
qazal
dab05ff070 match dataclass.replace in UOp.replace [run_process_replay] (#6792)
* UOp replace matching dataclass replace

* p2

* replace creates a copy
2024-09-28 16:28:49 +08:00
George Hotz
eaa1e0eeeb rename constant_folder to sym [run_process_replay] (#6780) 2024-09-27 14:54:54 +08:00
George Hotz
c178dc1071 faster uops ci [run_process_replay] (#6774) 2024-09-26 20:15:01 +08:00
nimlgen
3c56aeee70 add Tensor.from_blob (#6765)
* draft tensor from pointer init

* some docs and types

* comment

* cleaner

* test

* malloc

* qcom cl interop

* jit example

* cleaner

* dealoc

* wording

* docs
2024-09-26 18:33:19 +08:00
George Hotz
14ad47b515 rewrite to use uops if (#6764)
* rewrite to use uops if

* does this pass

* careful penalty

* fix tests

* remove unused stuff

* that's a cstyle rewrite

* Update test_linearizer_dumb.py
2024-09-26 18:09:09 +08:00
wozeparrot
2b899164c6 no numpy (#6751) 2024-09-26 16:40:18 +08:00
qazal
ee4feedb77 delete test_variable_const [run_process_replay] (#6757)
* delete test_variable_const [run_process_replay]

* don't allow variable UPat
2024-09-26 12:27:11 +08:00
George Hotz
b199b699ed use shl everywhere (#6744)
* use shl everywhere

* fix parens

* late patterns

* works as an extra pass

* ptx
2024-09-26 09:59:36 +08:00
qazal
12e4a4900a hotfix: missing return in METAL dm benchmark (#6749) 2024-09-26 09:12:38 +08:00
qazal
8a15ccb414 start gc/mem usage tests for buffer schedule [run_process_replay] (#6737)
* gc tests for buffer schedule [run_process_replay]

* assert global counters, maybe del

* check init

* rm global counters
2024-09-26 08:26:31 +08:00
qazal
b629a7998d early assert buffer count limit [run_process_replay] (#6746)
* better error message for buffer count limit [run_process_replay]

* 3.9 needs that

* assert ScheduleItem

* new _test_buf_cnt
2024-09-26 08:24:26 +08:00