6218 Commits

Author SHA1 Message Date
chenyu
4c3895744e type annotation for layernorm (#6883) 2024-10-04 09:03:56 -04:00
George Hotz
8ca506ee37 remove the magic methods for moving between devices [pr] (#6881)
* remove the magic methods for moving between devices [pr]

* remove unneeded clang
2024-10-04 20:27:52 +08:00
chenyu
7c8849010a fix var_vals in MCTS (#6882)
tested with JITBEAM=100 llama
2024-10-04 08:19:35 -04:00
George Hotz
a0cb16ac61 node cleanup + local metal test speed [pr] (#6880)
* node cleanup [pr]

* fix tests, including the double one on metal

* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz
cdff1d75b6 things that are only used in one place don't belong in helpers [pr] (#6878)
* things that are only used in one place don't belong in helpers [pr]

* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz
f4ec39fe58 switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
George Hotz
738a5794a9 last update for new symbolic [pr] (#6877) 2024-10-04 14:58:51 +08:00
chenyu
7391376528 update bert hparams (#6876)
4h32m with this https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/q99frv1l/overview.

loss scaler 2**13->2**10. matched the closest submission, no nan for ~10 runs.

increased lr and total step a bit.

`PARALLEL=0` after setup, same as resnet.
2024-10-04 00:39:06 -04:00
George Hotz
0dee49637e small symbolic changes [pr] (#6874)
* small symbolic changes [pr]

* need that unbind
2024-10-04 12:03:08 +08:00
George Hotz
c50d3c4979 move const mover to ops [pr] (#6873)
* move const mover to ops [pr]

* move more
2024-10-04 11:49:32 +08:00
Tim Becker
d42cb5596f Restore fast path for matching new_src in rewrite (#6870) 2024-10-04 11:22:24 +08:00
ignaciosica
8931f20765 CLANG fixed ops python [run_process_replay] (#6866)
* hotfix: fixed values in ops_python for AMX

* hotfix: remove unused import
2024-10-03 20:40:04 +08:00
George Hotz
4b6732c4f6 safe changes for new symbolic [pr] (#6864) 2024-10-03 20:39:15 +08:00
qazal
17068410e6 give EXT schedules metadata [pr] (#6865) 2024-10-03 20:14:18 +08:00
qazal
5517a07a09 viz late to_program and benchmarks [pr] (#6851)
* viz late to_program [pr]

* benchmark resnet

* delete all of checkStatus

* revert that

* fixup

* get from kernel
2024-10-03 18:29:04 +08:00
qazal
c7925414df don't default print the whole graph in buf limit error [pr] (#6861) 2024-10-03 18:02:19 +08:00
George Hotz
e10245909a explore global uop cache [pr] (#6863)
* explore global uop cache

* wvd uops

* remove useless lru caches

* key is is

* simpler rewriter
2024-10-03 13:08:13 +08:00
George Hotz
a26c6a0ad0 cleanup with smax [pr] (#6854)
* cleanup with smax [pr]

* add that resolve
2024-10-03 08:11:02 +08:00
nimlgen
8bbf6fb88c use mv_address in ops_gpu (#6856) 2024-10-02 22:31:51 +03:00
chenyu
c3c93f332a symbolic bool raise ValueError when not sure [pr] (#6853) 2024-10-02 09:10:58 -04:00
chenyu
08850da026 minor rand_like change [run_process_replay] (#6848) 2024-10-02 07:27:51 -04:00
George Hotz
7214450c23 little symbolic changes [pr] (#6849)
* little symbolic changes [pr]

* symbolic needs resolve too

* no resolve

* less change
2024-10-02 17:12:30 +08:00
qazal
fc78716d31 Buffer arg from big graph [pr] (#6847)
* Buffer arg from big graph [pr]

* x.dtype
2024-10-02 15:28:47 +08:00
qazal
29363fb85e add dtype.ptr() [pr] (#6839) 2024-10-02 15:03:05 +08:00
George Hotz
be12409b51 changes for symbolic (#6844)
* changes for symbolic

* only for ints

* check int first
2024-10-02 12:57:16 +08:00
qazal
1735f8ef1c viz rewrite part 1 [pr] (#6842)
* core viz spec

* leaaan

* refine docs

* .

* add rewrite_count back in ui
2024-10-02 11:56:25 +08:00
mesozoic-egg
d2e02b47e1 Construct c_ulong in blitCommandEncoder copy method (#6793)
* Construct c_ulong in blitCommandEncoder copy method

* line too long

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-10-02 11:09:37 +08:00
George Hotz
567e10efcb lil symbolic changes [pr] (#6841) 2024-10-02 10:56:22 +08:00
George Hotz
100ce7a684 hotfix: min/max on CMPNE was wrong 2024-10-02 10:15:03 +08:00
chenyu
5f77217772 bert default CKPT to 0 (#6840)
not required
2024-10-01 21:55:56 -04:00
George Hotz
1ac83aaa4b lil sym changes (#6837)
* lil sym changes [pr]

* fix inf crap

* Update ops.py

* remove that, it's wrong
2024-10-02 09:54:17 +08:00
Tobias Fischer
33f7599158 Compute FID Score (#6802)
* compute fid score code

* cleaner s1 and m1 loading
2024-10-01 19:47:58 -04:00
ignaciosica
6a73ad89a2 get global, local and shared max from cudarenderer (#6836) 2024-10-01 16:32:57 +03:00
George Hotz
84726e8855 good changes from symbolic removal [run_process_replay] (#6835)
* good changes from symbolic removal [run_process_replay]

* fix __ne__
2024-10-01 18:49:09 +08:00
qazal
c5b252cdb3 add pr alias [pr] (#6834) 2024-10-01 18:48:44 +08:00
George Hotz
e907b25792 move some pm rules to uopgraph.py [run_process_replay] (#6831)
* move some pm rules to uopgraph.py [run_process_replay]

* move more

* move lt and clean

* end maybe

* put back
2024-10-01 18:28:41 +08:00
qazal
0cb82f308c viz don't include graph_rewrites that return a non-UOp (#6832)
* viz don't include graph_rewrites that return a non-UOp

* delete bad things
2024-10-01 18:13:53 +08:00
George Hotz
2a540d87e7 don't use is_int [run_process_replay] (#6833) 2024-10-01 18:13:34 +08:00
vladov
501cfde7e6 Fix GPT2 with OpenCL backend. (#6821)
* Fix GPT2 with OpenCL backend.

* Add test for unaligned copies into OpenCL buffers.
2024-10-01 16:57:22 +08:00
qazal
a16a8c5958 color process replay stats [run_process_replay] (#6830) 2024-10-01 15:29:11 +08:00
George Hotz
547733e57c stunning_mnist [run_process_replay] (#6828)
* stunning_mnist [run_process_replay]

* add loss to stunning mnist
2024-10-01 15:00:48 +08:00
qazal
391497a311 schedule independent of Device [run_process_replay] (#6829) 2024-10-01 14:46:26 +08:00
George Hotz
8a93c48901 pickle main pattern matcher [run_process_replay] (#6827)
* pickle main pattern matcher [run_process_replay]

* del line
2024-10-01 13:58:42 +08:00
George Hotz
d726eb6f48 uop resolve [run_process_replay] (#6826)
* uop bool and int and stuff [run_process_replay]

* add ne support

* can't even be None anymore

* BinaryOps.AND support

* less compare
2024-10-01 13:11:42 +08:00
qazal
a42b177533 express CONST view as SWIZZLE, uop VALID only once [run_process_replay] (#6823)
* construct VALID once and SWIZZLE

* make Variable work

* image dtype

* test: merge views happens already
2024-10-01 11:44:26 +08:00
George Hotz
50dd6bd951 move cmp tuple out [run_process_replay] (#6825)
* move cmp tuple out [run_process_replay]

* was unneeded
2024-10-01 10:38:28 +08:00
qazal
a1dee0e532 early uop UOps.BUFFER (only once) [run_process_replay] (#6820)
* buf_uops lookup [run_process_replay]

* next diff will be this

* fix ImageDType
2024-10-01 08:46:05 +08:00
nimlgen
e213bea426 nv shorter (#6819) 2024-09-30 19:39:32 +03:00
George Hotz
0f28e93224 add pickle support for pattern matchers [run_process_replay] (#6816)
* add pickle support for pattern matchers [run_process_replay]

* cleaner and all

* no closures

* fix tests

* revert that

* final

* cleaner

* python 3.8 fix

* add round trip back

* this

* waste lines on this. that's the final line count

* max print better

* more targetted fix

* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
chenyu
f59517754e add RESET_STEP in bert to control reset (#6818)
same as resnet
2024-09-30 09:39:04 -04:00