chenyu
4c3895744e
type annotation for layernorm ( #6883 )
2024-10-04 09:03:56 -04:00
George Hotz
8ca506ee37
remove the magic methods for moving between devices [pr] ( #6881 )
...
* remove the magic methods for moving between devices [pr]
* remove unneeded clang
2024-10-04 20:27:52 +08:00
chenyu
7c8849010a
fix var_vals in MCTS ( #6882 )
...
tested with JITBEAM=100 llama
2024-10-04 08:19:35 -04:00
George Hotz
a0cb16ac61
node cleanup + local metal test speed [pr] ( #6880 )
...
* node cleanup [pr]
* fix tests, including the double one on metal
* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz
cdff1d75b6
things that are only used in one place don't belong in helpers [pr] ( #6878 )
...
* things that are only used in one place don't belong in helpers [pr]
* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz
f4ec39fe58
switch symbolic from old to uops, final PR ( #6872 )
...
* switch symbolic from old to uops, final PR
* two wrong answers
* not needed resolves
* symbolic ops passes
* symbolic ops passes
* progress
* tests pass (almost)
* fix last test
* fix some tests
* global binding and unbinding
* Revert "global binding and unbinding"
This reverts commit 9456725630.
* that test works now
* vars on uop doesn't recurse
* fix fuzzer
* update
* fix type
* fix gpt, it's UOp now
* ssimplify symbolics
2024-10-04 16:42:27 +08:00
George Hotz
738a5794a9
last update for new symbolic [pr] ( #6877 )
2024-10-04 14:58:51 +08:00
chenyu
7391376528
update bert hparams ( #6876 )
...
4h32m with this https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/q99frv1l/overview.
loss scaler 2**13->2**10. matched the closest submission, no nan for ~10 runs.
increased lr and total step a bit.
`PARALLEL=0` after setup, same as resnet.
2024-10-04 00:39:06 -04:00
George Hotz
0dee49637e
small symbolic changes [pr] ( #6874 )
...
* small symbolic changes [pr]
* need that unbind
2024-10-04 12:03:08 +08:00
George Hotz
c50d3c4979
move const mover to ops [pr] ( #6873 )
...
* move const mover to ops [pr]
* move more
2024-10-04 11:49:32 +08:00
Tim Becker
d42cb5596f
Restore fast path for matching new_src in rewrite ( #6870 )
2024-10-04 11:22:24 +08:00
ignaciosica
8931f20765
CLANG fixed ops python [run_process_replay] ( #6866 )
...
* hotfix: fixed values in ops_python for AMX
* hotfix: remove unused import
2024-10-03 20:40:04 +08:00
George Hotz
4b6732c4f6
safe changes for new symbolic [pr] ( #6864 )
2024-10-03 20:39:15 +08:00
qazal
17068410e6
give EXT schedules metadata [pr] ( #6865 )
2024-10-03 20:14:18 +08:00
qazal
5517a07a09
viz late to_program and benchmarks [pr] ( #6851 )
...
* viz late to_program [pr]
* benchmark resnet
* delete all of checkStatus
* revert that
* fixup
* get from kernel
2024-10-03 18:29:04 +08:00
qazal
c7925414df
don't default print the whole graph in buf limit error [pr] ( #6861 )
2024-10-03 18:02:19 +08:00
George Hotz
e10245909a
explore global uop cache [pr] ( #6863 )
...
* explore global uop cache
* wvd uops
* remove useless lru caches
* key is is
* simpler rewriter
2024-10-03 13:08:13 +08:00
George Hotz
a26c6a0ad0
cleanup with smax [pr] ( #6854 )
...
* cleanup with smax [pr]
* add that resolve
2024-10-03 08:11:02 +08:00
nimlgen
8bbf6fb88c
use mv_address in ops_gpu ( #6856 )
2024-10-02 22:31:51 +03:00
chenyu
c3c93f332a
symbolic bool raise ValueError when not sure [pr] ( #6853 )
2024-10-02 09:10:58 -04:00
chenyu
08850da026
minor rand_like change [run_process_replay] ( #6848 )
2024-10-02 07:27:51 -04:00
George Hotz
7214450c23
little symbolic changes [pr] ( #6849 )
...
* little symbolic changes [pr]
* symbolic needs resolve too
* no resolve
* less change
2024-10-02 17:12:30 +08:00
qazal
fc78716d31
Buffer arg from big graph [pr] ( #6847 )
...
* Buffer arg from big graph [pr]
* x.dtype
2024-10-02 15:28:47 +08:00
qazal
29363fb85e
add dtype.ptr() [pr] ( #6839 )
2024-10-02 15:03:05 +08:00
George Hotz
be12409b51
changes for symbolic ( #6844 )
...
* changes for symbolic
* only for ints
* check int first
2024-10-02 12:57:16 +08:00
qazal
1735f8ef1c
viz rewrite part 1 [pr] ( #6842 )
...
* core viz spec
* leaaan
* refine docs
* .
* add rewrite_count back in ui
2024-10-02 11:56:25 +08:00
mesozoic-egg
d2e02b47e1
Construct c_ulong in blitCommandEncoder copy method ( #6793 )
...
* Construct c_ulong in blitCommandEncoder copy method
* line too long
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-10-02 11:09:37 +08:00
George Hotz
567e10efcb
lil symbolic changes [pr] ( #6841 )
2024-10-02 10:56:22 +08:00
George Hotz
100ce7a684
hotfix: min/max on CMPNE was wrong
2024-10-02 10:15:03 +08:00
chenyu
5f77217772
bert default CKPT to 0 ( #6840 )
...
not required
2024-10-01 21:55:56 -04:00
George Hotz
1ac83aaa4b
lil sym changes ( #6837 )
...
* lil sym changes [pr]
* fix inf crap
* Update ops.py
* remove that, it's wrong
2024-10-02 09:54:17 +08:00
Tobias Fischer
33f7599158
Compute FID Score ( #6802 )
...
* compute fid score code
* cleaner s1 and m1 loading
2024-10-01 19:47:58 -04:00
ignaciosica
6a73ad89a2
get global, local and shared max from cudarenderer ( #6836 )
2024-10-01 16:32:57 +03:00
George Hotz
84726e8855
good changes from symbolic removal [run_process_replay] ( #6835 )
...
* good changes from symbolic removal [run_process_replay]
* fix __ne__
2024-10-01 18:49:09 +08:00
qazal
c5b252cdb3
add pr alias [pr] ( #6834 )
2024-10-01 18:48:44 +08:00
George Hotz
e907b25792
move some pm rules to uopgraph.py [run_process_replay] ( #6831 )
...
* move some pm rules to uopgraph.py [run_process_replay]
* move more
* move lt and clean
* end maybe
* put back
2024-10-01 18:28:41 +08:00
qazal
0cb82f308c
viz don't include graph_rewrites that return a non-UOp ( #6832 )
...
* viz don't include graph_rewrites that return a non-UOp
* delete bad things
2024-10-01 18:13:53 +08:00
George Hotz
2a540d87e7
don't use is_int [run_process_replay] ( #6833 )
2024-10-01 18:13:34 +08:00
vladov
501cfde7e6
Fix GPT2 with OpenCL backend. ( #6821 )
...
* Fix GPT2 with OpenCL backend.
* Add test for unaligned copies into OpenCL buffers.
2024-10-01 16:57:22 +08:00
qazal
a16a8c5958
color process replay stats [run_process_replay] ( #6830 )
2024-10-01 15:29:11 +08:00
George Hotz
547733e57c
stunning_mnist [run_process_replay] ( #6828 )
...
* stunning_mnist [run_process_replay]
* add loss to stunning mnist
2024-10-01 15:00:48 +08:00
qazal
391497a311
schedule independent of Device [run_process_replay] ( #6829 )
2024-10-01 14:46:26 +08:00
George Hotz
8a93c48901
pickle main pattern matcher [run_process_replay] ( #6827 )
...
* pickle main pattern matcher [run_process_replay]
* del line
2024-10-01 13:58:42 +08:00
George Hotz
d726eb6f48
uop resolve [run_process_replay] ( #6826 )
...
* uop bool and int and stuff [run_process_replay]
* add ne support
* can't even be None anymore
* BinaryOps.AND support
* less compare
2024-10-01 13:11:42 +08:00
qazal
a42b177533
express CONST view as SWIZZLE, uop VALID only once [run_process_replay] ( #6823 )
...
* construct VALID once and SWIZZLE
* make Variable work
* image dtype
* test: merge views happens already
2024-10-01 11:44:26 +08:00
George Hotz
50dd6bd951
move cmp tuple out [run_process_replay] ( #6825 )
...
* move cmp tuple out [run_process_replay]
* was unneeded
2024-10-01 10:38:28 +08:00
qazal
a1dee0e532
early uop UOps.BUFFER (only once) [run_process_replay] ( #6820 )
...
* buf_uops lookup [run_process_replay]
* next diff will be this
* fix ImageDType
2024-10-01 08:46:05 +08:00
nimlgen
e213bea426
nv shorter ( #6819 )
2024-09-30 19:39:32 +03:00
George Hotz
0f28e93224
add pickle support for pattern matchers [run_process_replay] ( #6816 )
...
* add pickle support for pattern matchers [run_process_replay]
* cleaner and all
* no closures
* fix tests
* revert that
* final
* cleaner
* python 3.8 fix
* add round trip back
* this
* waste lines on this. that's the final line count
* max print better
* more targetted fix
* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
chenyu
f59517754e
add RESET_STEP in bert to control reset ( #6818 )
...
same as resnet
2024-09-30 09:39:04 -04:00