Commit Graph

6249 Commits

qazal
9250452da4 no codegen import in ops [pr] (#6888)
* no codegen import in ops [pr]

* @track_rewrites

* all functions need this

* polish
2024-10-07 20:54:21 +08:00
George Hotz
f7f94cd62f bitcast cleanup [pr] (#6933) 2024-10-07 19:16:16 +08:00
chenyu
0cf815a93a bert use BS=66 and update hparams (#6932)
with dropout memory improvement, we can fit BS=66 now. revert to the hparams in #5891 too
2024-10-07 05:08:27 -04:00
ignaciosica
32ac24c45c Generic wmma rendering for cuda, ptx [run_process_replay] (#6838)
* generic wmma rendering for cuda, ptx

- also adds wmma generic shape ops_python support

* hotfix: fixed values in ops_python

* hotfix: more fixed values

* hotfix: revert changes in ops_python

* refactor wmma rendering

* hotfix: get n_args directly

* hotfix: use n_args[0] for a

* hotfix: simplify

* hotfix: add args_slices

* hotfix: rename args back to operands

* hotfix: fix spacing

* hotfix: rename upc to sz

* hotfix: rename args to operands in assembly

* hotfix: space

* hotfix: add comment for literal 4

* hotfix: rename some variables and change for clarity
2024-10-07 16:36:36 +08:00
qazal
b82023c97e process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]

* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
qazal
16312b4c59 rip out old scheduler process replay stuff, diff pure UOps [pr] (#6927) 2024-10-07 13:20:35 +08:00
chenyu
999e3780e9 dropout contiguous after >= p (#6892)
make it a bool buffer
2024-10-06 19:40:42 -04:00
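The "make it a bool buffer" note refers to storing the dropout keep-mask as booleans (from the `>= p` comparison) rather than as floats. A minimal pure-Python sketch of the idea — names and structure are hypothetical, not tinygrad's actual implementation:

```python
import random

def dropout(values: list[float], p: float = 0.5, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    # the comparison against p yields a bool buffer, cheaper than floats
    mask = [rng.random() >= p for _ in values]
    scale = 1.0 / (1.0 - p)  # rescale survivors to preserve the expectation
    return [v * scale if keep else 0.0 for v, keep in zip(values, mask)]
```

With `p=0.5` every surviving element is doubled, so the expected value of each output matches its input.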
wozeparrot
9eb6eef441 seed in tensor (#6869) 2024-10-06 14:46:58 -04:00
Tobias Fischer
f9e32f2bb2 clip device fix (#6924) 2024-10-07 00:47:32 +08:00
chenyu
01a2d7316d dtype=float in bert log_softmax for loss and accuracy (#6916) 2024-10-06 11:15:56 -04:00
jeffzh4ng
19a7e41113 implement logcumsumexp (#6921)
* implement logcumsumexp

* change axis=None to axis=0
2024-10-06 10:45:36 -04:00
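`logcumsumexp` computes the running log-sum-exp along an axis. A pure-Python sketch of the numerically stable recurrence (illustrative only, not the tinygrad kernel):

```python
import math

def logcumsumexp(xs: list[float]) -> list[float]:
    # out[i] = log(exp(x_0) + ... + exp(x_i)), kept stable by
    # factoring out the running max before exponentiating
    out: list[float] = []
    running = None
    for x in xs:
        if running is None:
            running = x
        else:
            m = max(running, x)
            running = m + math.log(math.exp(running - m) + math.exp(x - m))
        out.append(running)
    return out
```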
George Hotz
f588169fdc hotfix: ad for DEBUG=2 in the mnist tutorial 2024-10-06 21:05:48 +08:00
qazal
10ff1d6fb9 viz prep refactor for tracked scope decorator [pr] (#6920)
* viz prep refactor for tracked scope decorator [pr]

* fix fuzzer
2024-10-06 16:02:09 +03:00
qazal
837f9c6832 new viz fuzz tests, track multiple contexts (#6913)
* add FUZZ_VIZ option

* add FUZZ_VIZ=1 tests

* use .replace

* rewrites test

* add rewrite_stack

* add FUZZ_VIZ to ops

* what if FUZZ_VIZ was up there

* leave fuzz_viz for now
2024-10-06 14:58:15 +03:00
chenyu
75d9dcf000 support dtype in softmax and log_softmax (#6914)
matches torch. for mixed precision training, we would want to use float for softmax
2024-10-06 07:18:15 -04:00
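The point of a `dtype` argument here is that in mixed precision training the inputs may be half precision, but the exp/sum should accumulate in float. A plain-Python sketch with a hypothetical `dtype` parameter standing in for the tensor-level cast:

```python
import math

def log_softmax(xs: list[float], dtype=float) -> list[float]:
    # cast up front, so the reduction runs in the requested precision
    xs = [dtype(x) for x in xs]
    m = max(xs)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]
```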
chenyu
718b959349 log epoch start and stop for bert (#6912) 2024-10-06 06:39:46 -04:00
qazal
b066ef2282 small changes from the viz_rewrite branch [pr] (#6907)
* simpler replace

* dont show shapetracker consts

* changed_nodes shouldn't exist for the first sink
2024-10-06 12:00:55 +03:00
chenyu
16c1fa4208 use BEAM=3 for red box bert runs (#6904)
BEAM=4 slightly exceeded 30 minutes setup
2024-10-05 09:21:12 -04:00
chenyu
0e706227a2 add seed to bert result log filename (#6903)
* add seed to bert result log filename

* different name for different benchmark
2024-10-05 09:15:24 -04:00
George Hotz
8ed3a00c9c ceildiv helper [pr] (#6899) 2024-10-05 14:59:10 +08:00
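A `ceildiv` helper is commonly written branch-free via floor division on negated operands; a sketch of one such formulation (this may not match tinygrad's exact definition):

```python
def ceildiv(num: int, amt: int) -> int:
    # ceiling division without floats: Python's // floors, so
    # negating twice rounds toward positive infinity instead
    return -(num // -amt)
```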
chenyu
fd68b6dbc2 type annotation to round_up (#6898)
* type annotation to round_up

also cleaned up places where round_up was potentially called on symbolic

* fix
2024-10-04 23:27:23 -04:00
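`round_up(num, amt)` returns the smallest multiple of `amt` at or above `num`. A hedged sketch of such a helper with the integer annotation the commit describes (symbolic callers, per the commit body, would need separate handling):

```python
def round_up(num: int, amt: int) -> int:
    # smallest multiple of amt that is >= num
    return (num + amt - 1) // amt * amt
```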
chenyu
3c12244cfc remove DTypeLike from lazy (#6897)
keep only in tensor
2024-10-04 22:49:21 -04:00
George Hotz
0d6216aba1 bump the download cache (#6896) 2024-10-05 10:23:18 +08:00
George Hotz
4058a99275 symbolic in ops 2 [pr] (#6895)
* move symbolic to ops, simple [pr]

* fix for shapetracker
2024-10-05 10:20:07 +08:00
chenyu
08414d7b7c cleanup test_uop_symbolic.py (#6894)
no more test_symbolic for reference, so force expected output to be exact instead of a set
2024-10-04 20:53:10 -04:00
ignaciosica
555bcb5e54 static access for code_for_op (#6889) 2024-10-05 07:38:01 +08:00
vladov
5f6b6162b3 Suppress warnings in transcendental tests. (#6891) 2024-10-05 07:37:17 +08:00
nimlgen
707c805a68 nv set localmem sm count to max (#6890) 2024-10-04 23:29:46 +03:00
George Hotz
4df5c7a4ef move lazy to engine [pr] (#6886)
* move lazy to engine [pr]

* engine.lazy
2024-10-04 23:19:26 +08:00
George Hotz
6b063450df move hcq device to runtime [pr] (#6879)
* things that are only used in one place don't belong in helpers [pr]

* start moving hcq device [pr]

* fix paths
2024-10-04 22:26:50 +08:00
George Hotz
5be2bd18a6 use UOps.BIND instead of ASSIGN, it's different (#6885) 2024-10-04 22:26:33 +08:00
chenyu
4c3895744e type annotation for layernorm (#6883) 2024-10-04 09:03:56 -04:00
George Hotz
8ca506ee37 remove the magic methods for moving between devices [pr] (#6881)
* remove the magic methods for moving between devices [pr]

* remove unneeded clang
2024-10-04 20:27:52 +08:00
chenyu
7c8849010a fix var_vals in MCTS (#6882)
tested with JITBEAM=100 llama
2024-10-04 08:19:35 -04:00
George Hotz
a0cb16ac61 node cleanup + local metal test speed [pr] (#6880)
* node cleanup [pr]

* fix tests, including the double one on metal

* no time tqdm tests
2024-10-04 18:14:23 +08:00
George Hotz
cdff1d75b6 things that are only used in one place don't belong in helpers [pr] (#6878)
* things that are only used in one place don't belong in helpers [pr]

* pretty print moved
2024-10-04 17:27:38 +08:00
George Hotz
f4ec39fe58 switch symbolic from old to uops, final PR (#6872)
* switch symbolic from old to uops, final PR

* two wrong answers

* not needed resolves

* symbolic ops passes

* symbolic ops passes

* progress

* tests pass (almost)

* fix last test

* fix some tests

* global binding and unbinding

* Revert "global binding and unbinding"

This reverts commit 9456725630.

* that test works now

* vars on uop doesn't recurse

* fix fuzzer

* update

* fix type

* fix gpt, it's UOp now

* ssimplify symbolics
2024-10-04 16:42:27 +08:00
George Hotz
738a5794a9 last update for new symbolic [pr] (#6877) 2024-10-04 14:58:51 +08:00
chenyu
7391376528 update bert hparams (#6876)
4h32m with this https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/q99frv1l/overview.

loss scaler 2**13->2**10. matched the closest submission, no nan for ~10 runs.

increased lr and total step a bit.

`PARALLEL=0` after setup, same as resnet.
2024-10-04 00:39:06 -04:00
George Hotz
0dee49637e small symbolic changes [pr] (#6874)
* small symbolic changes [pr]

* need that unbind
2024-10-04 12:03:08 +08:00
George Hotz
c50d3c4979 move const mover to ops [pr] (#6873)
* move const mover to ops [pr]

* move more
2024-10-04 11:49:32 +08:00
Tim Becker
d42cb5596f Restore fast path for matching new_src in rewrite (#6870) 2024-10-04 11:22:24 +08:00
ignaciosica
8931f20765 CLANG fixed ops python [run_process_replay] (#6866)
* hotfix: fixed values in ops_python for AMX

* hotfix: remove unused import
2024-10-03 20:40:04 +08:00
George Hotz
4b6732c4f6 safe changes for new symbolic [pr] (#6864) 2024-10-03 20:39:15 +08:00
qazal
17068410e6 give EXT schedules metadata [pr] (#6865) 2024-10-03 20:14:18 +08:00
qazal
5517a07a09 viz late to_program and benchmarks [pr] (#6851)
* viz late to_program [pr]

* benchmark resnet

* delete all of checkStatus

* revert that

* fixup

* get from kernel
2024-10-03 18:29:04 +08:00
qazal
c7925414df don't default print the whole graph in buf limit error [pr] (#6861) 2024-10-03 18:02:19 +08:00
George Hotz
e10245909a explore global uop cache [pr] (#6863)
* explore global uop cache

* wvd uops

* remove useless lru caches

* key is is

* simpler rewriter
2024-10-03 13:08:13 +08:00
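The "global uop cache" with "key is is" suggests hash-consing: a global cache returns the same object for structurally identical nodes, so structural equality degrades to identity checks. A minimal sketch of the pattern — class and field names are hypothetical, not tinygrad's actual UOp:

```python
class UOp:
    # global intern table: identical (op, srcs) always yields the same object,
    # so the effective comparison key "is `is`" (object identity)
    _cache: dict[tuple, "UOp"] = {}

    def __new__(cls, op: str, *src: "UOp") -> "UOp":
        key = (op, *src)  # src nodes are already interned, so identity suffices
        if key not in cls._cache:
            obj = super().__new__(cls)
            obj.op, obj.src = op, src
            cls._cache[key] = obj
        return cls._cache[key]
```

With interning in place, a rewriter can memoize per-node results in an ordinary dict keyed by the node itself.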
George Hotz
a26c6a0ad0 cleanup with smax [pr] (#6854)
* cleanup with smax [pr]

* add that resolve
2024-10-03 08:11:02 +08:00
nimlgen
8bbf6fb88c use mv_address in ops_gpu (#6856) 2024-10-02 22:31:51 +03:00