Commit Graph

50 Commits

Author SHA1 Message Date
qazal
2800520dd5 even smaller process_replay.py [pr] (#6941)
* even smaller process_replay.py [pr]

* delete those tests

* dedup asts
2024-10-08 20:43:22 +08:00
qazal
b82023c97e process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]

* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
qazal
16312b4c59 rip out old scheduler process replay stuff, diff pure UOps [pr] (#6927) 2024-10-07 13:20:35 +08:00
qazal
c5b252cdb3 add pr alias [pr] (#6834) 2024-10-01 18:48:44 +08:00
qazal
a16a8c5958 color process replay stats [run_process_replay] (#6830) 2024-10-01 15:29:11 +08:00
qazal
5ad2f95d01 process replay diff stats (#6736)
* process replay diff stats

* fix tuples
2024-09-25 15:19:56 +08:00
qazal
6be1bf09f1 hotfix: bring COMPARE_SCHEDULE=0 back (#6657) 2024-09-23 10:39:43 +08:00
qazal
6b65d8c461 more process replay tracing work [run_process_replay] (#6650) 2024-09-22 16:16:58 +08:00
qazal
5bafed2f88 process replay traceback (#6642) 2024-09-21 16:53:34 +08:00
George Hotz
904f6a63fa Revert "Revert "cleanup process_replay/* namings [run_process_replay] (#6429)…" (#6442)
This reverts commit eda177da84.
2024-09-10 07:00:16 +08:00
George Hotz
eda177da84 Revert "cleanup process_replay/* namings [run_process_replay] (#6429)" (#6437)
This reverts commit f4e83b30b4.
2024-09-09 18:52:36 +08:00
qazal
f4e83b30b4 cleanup process_replay/* namings [run_process_replay] (#6429) 2024-09-09 16:59:04 +08:00
qazal
99018a4aa1 minor schedule differ utils [run_process_replay] (#6348)
* minor schedule differ utils [run_process_replay]

* rm
2024-09-04 03:41:38 +08:00
qazal
dd4e5f1c8d process replay rewrite (#6284)
* process replay rewrite

p2

* start some unittests + exceptions and exits

* shebang

* remove extra kernel init
2024-08-29 15:08:27 +03:00
qazal
d2f8eeed2e make [compare_schedule] the default [run_process_replay] (#6273)
* make [compare_schedule] the default

* capture ctx

* logging

* set capture to false
2024-08-26 21:40:03 +08:00
chenyu
6b3112d525 fix qcom process_replay for kernel diff (#6079)
* debug why qcom process_replay does not run

skipping the wrong exception?

* um-hum

* get_step_times was parsed incorrectly

* cleanup
2024-08-14 15:05:49 -04:00
qazal
30035df5a4 add metal process replay back (#6068)
test this new one
2024-08-14 12:29:56 +03:00
qazal
0e62076cf5 more process replay cleanups (#6013)
* more process replay cleanups

* comma benchmark missing
2024-08-10 17:29:10 +03:00
qazal
7373b05ee8 assert conv bw reduceops merge [compare_schedule] (#6001)
* assert conv bw reduceops merge [compare_schedule]

* diff with ref_commit_hash
2024-08-09 19:29:56 +03:00
qazal
a833f1a735 scheduler process replay with [compare_schedule] (#5997) 2024-08-09 16:58:22 +03:00
qazal
24c7c41ce0 diff LazyBuffer schedules in process replay (#5996)
* start diff printing

* this should be 2

* add to process_replay.py

* enable schedule capture

* arange diff is process replay
2024-08-09 14:16:43 +03:00
qazal
8611fa6c99 apply opts.extra_matcher in process replay [run_process_replay] (#5877) 2024-08-02 18:07:58 +03:00
qazal
b775db6b60 high-level benchmark timing diff (#5776)
* high level timings

benchmark times

fix defs

* use the name map

* skip last task
2024-07-28 23:42:57 +03:00
qazal
e0e7293b0a make process replay unique in retries [run_process_replay] (#5773) 2024-07-28 20:44:15 +03:00
qazal
3e49d86c01 process replay diffs 3 things now (#5731)
* github api infra

* process replay is 3 parts now

* parse benchmarks

* add gh_token

* complete diff

* move process replay tests

* last successful run

* add tempdir

* skip master
2024-07-27 12:52:20 +03:00
qazal
94d578396f separate process replay main loop (#5734)
* separate process replay main loop

* [run_process_replay]

* add kernel_changed

* test with [run_process_replay]

* revert temp [run_process_replay]
2024-07-26 21:43:08 +03:00
qazal
1c992de257 hotfix: compare_schedule defaults to false (#5707) 2024-07-25 17:08:28 +03:00
qazal
489cda827a more scheduler process replay tooling (#5706)
* more scheduler process replay tooling

* refactor to compare_schedule
2024-07-25 15:47:18 +03:00
qazal
b0fc5a4c6f start scheduler process replay (#5656) 2024-07-23 20:02:51 +03:00
qazal
5f394fc9c6 more work toward non-blocking process replay (#5653)
* non-blocking process replay

* more actionable

* test it

* revert the test

* %s/logging.warn/logging.warning
2024-07-23 14:26:31 +03:00
qazal
50aba32ea8 hotfix: don't assert process replay in master. (#5562)
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
if [run_process_replay] is green pre-merge, it's ok.
2024-07-18 22:05:00 +03:00
qazal
0259d76183 use Context only in replaying Kernel [run_process_replay] (#5535) 2024-07-18 03:46:14 +08:00
qazal
a7706e05f9 option to [skip_process_replay] (#5533) 2024-07-17 22:30:46 +03:00
qazal
ae4cb7994e run process replay with DEBUG=0 (#5491)
* process replay with DEBUG=0

* graceful shutdown

* use and
2024-07-15 16:30:57 +03:00
qazal
3c378efcb6 process replay docs improvements (#5481)
* minor cleanups

* docs and logs

* shorter

* comma

* s/print/logging.info [run_process_replay]

* use logging.warn

* process name is noise

* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
qazal
671779f280 limit process replay diff to ~20% of kernels (#5480)
* render lidx starting with 0

changed from
```
  int gidx0 = gid.x; /* 4096 */
  int lidx4 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx5 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx6 = lid.z; /* 2 */
```
to
```
  int gidx0 = gid.x; /* 4096 */
  int lidx0 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx1 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx2 = lid.z; /* 2 */
```

the existing one started numbering from the pre-limited global dims, which skips numbers if there are more than 3 global dims

* don't need start_dim

* add changed

* env var

* more early exit

* simpler?

* Revert "Merge branch 'lidx0' into process_replay_limit"

This reverts commit cbadcfa5e9, reversing
changes made to fc9bf37ee7.

* minor cleanup
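
The lidx renumbering in the first bullet above can be sketched in Python. This is a minimal illustration, not tinygrad's renderer: `render_index_decls` and its `lidx_start` parameter are hypothetical. It assumes the old start index equalled the number of global dims before limiting to 3, which is 4 in the example.

```python
def render_index_decls(global_dims, local_dims, lidx_start=0):
    """Render gidx/lidx declarations in the style of the kernel snippets above.

    Hypothetical helper for illustration only. lidx_start=0 mimics the new
    numbering; passing the pre-limited global dim count mimics the old one.
    """
    decls = []
    for i, axis in enumerate("xyz"):
        if i < len(global_dims):
            decls.append(f"int gidx{i} = gid.{axis}; /* {global_dims[i]} */")
        if i < len(local_dims):
            # local index names are offset by lidx_start, global ones are not
            decls.append(f"int lidx{lidx_start + i} = lid.{axis}; /* {local_dims[i]} */")
    return decls

# new behavior: local indices start at 0
print("\n".join(render_index_decls([4096, 7, 7], [8, 8, 2])))
# assumed old behavior: 4 global dims before limiting, so locals start at lidx4
print("\n".join(render_index_decls([4096, 7, 7], [8, 8, 2], lidx_start=4)))
```

Only the local names depend on `lidx_start`, which is why the old output skipped from gidx numbering straight to lidx4.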

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00
qazal
0b3a34e3b1 vectorize folding [run_process_replay] (#5470)
* test_gep_vec_fold

* remove that

* fix process replay

* lint
2024-07-14 09:41:48 +03:00
qazal
487ceff825 hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist (#5456) 2024-07-13 21:15:40 +03:00
qazal
40ec9410f9 simpler process replay (#5452)
* remove check_process_replay

* that can go to the top

* add assert back

* [run_process_replay]

* checkout code [run_process_replay]

* temp [run_process_replay]

* revert temp [run_process_replay]

* ahh this is why [run_process_replay]

* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
qazal
23b907efbb restore process replay runs by their id (#5453) 2024-07-13 19:32:34 +03:00
qazal
bb1a9ebf78 run process replay in parallel (#5443) 2024-07-13 11:29:36 +03:00
George Hotz
03c2dc8bd7 lowerer is kernel [run_process_replay] (#5437) 2024-07-12 18:50:55 -07:00
George Hotz
870dc8c350 s/Linearizer/Lowerer [run_process_replay] (#5428) 2024-07-12 15:54:07 -07:00
George Hotz
94599c0637 fixup ast in kernel to be MetaOps.SINK [run_process_replay] (#5424)
* fixup ast in kernel to be MetaOps.SINK [run_process_replay]

* fix tests

* fix more tests
2024-07-12 14:01:03 -07:00
qazal
31fcc516dc more process replay tooling (#5407)
* replays

* what's in there

* can it be up there

* sha is enough

* insert sha as the key

* fix str

* update reset utils

* that nested try/except was terrible

* github_context can go
2024-07-12 13:11:34 +03:00
qazal
9712d9ffb6 pass lowering errors if not asserting process replay (#5395)
* pass lowering errors if not asserting process replay

* ProcessReplayError
2024-07-11 19:09:12 -04:00
qazal
004366b193 context aware process replay [run_process_replay] (#5378)
* test tc as ctx var

* remove from opts

* process replay

* pop variable

* B -> Variable

* fix re-assign

* pop temp vars

* move TRANSCENDENTAL=2
2024-07-11 13:07:28 +03:00
qazal
6ca7b13ed1 limit pickled objects [run_process_replay] (#5154)
* limit pickled objects

* delete uop from the list

* debug metal

* need self.opts for TC

* dont need device

* [run_process_replay]

* minor
2024-06-26 13:51:32 +03:00
chenyu
8080298739 s/tinytqdm/tqdm (#5103)
except in the unit test, where tqdm is imported
2024-06-22 14:18:26 -04:00
qazal
8aa786232d docs for running process replay locally (#5083) 2024-06-21 09:55:08 -04:00