456 Commits

Author SHA1 Message Date
George Hotz
912f01ed4b UOpGraph -> linearize_uop [run_process_replay] (#6119) 2024-08-16 19:48:39 -07:00
George Hotz
74ee9febec remove iter from uopgraph (#6110)
* remove iter from uopgraph

* linearize returns uops

* fix tests

* linearize in linearize

* tests fix

* touchup

* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6 merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
d5e3217076 hotfix: scheduler differ (#6115)
* hotfix: scheduler differ

* add the test back

* track keys
2024-08-16 23:34:49 +03:00
qazal
c23d44c779 AST is UOp (#6030)
* most of the work from the uops2 branch

* schedule

* realize

* kernel

* lowerer

* search

* green

* merge uops with ops

* Revert "merge uops with ops"

This reverts commit 1408a59f12.

* fix benchmark

* remove extra dedup
2024-08-16 22:09:00 +03:00
chenyu
6b3112d525 fix qcom process_replay for kernel diff (#6079)
* debug why qcom process_replay does not run

skipping the wrong exception?

* um-hum

* get_step_times was parsed incorrectly

* cleanup
2024-08-14 15:05:49 -04:00
qazal
30035df5a4 add metal process replay back (#6068)
test this new one
2024-08-14 12:29:56 +03:00
qazal
0e62076cf5 more process replay cleanups (#6013)
* more process replay cleanups

* comma benchmark missing
2024-08-10 17:29:10 +03:00
qazal
7373b05ee8 assert conv bw reduceops merge [compare_schedule] (#6001)
* assert conv bw reduceops merge [compare_schedule]

* diff with ref_commit_hash
2024-08-09 19:29:56 +03:00
qazal
a833f1a735 scheduler process replay with [compare_schedule] (#5997) 2024-08-09 16:58:22 +03:00
qazal
24c7c41ce0 diff LazyBuffer schedules in process replay (#5996)
* start diff printing

* this should be 2

* add to process_replay.py

* enable schedule capture

* arange diff is process replay
2024-08-09 14:16:43 +03:00
qazal
d6f4a61c42 graph LBScheduleItem [run_process_replay] (#5960)
* add toposort key to LBScheduleItem

* use dedup

* graph LBScheduleItem

* make that comment beautiful again

* diff_schedule utils

* update fuzz_schedule
2024-08-07 19:59:11 +03:00
qazal
39dda3d042 rename prescheduled items to lsi [run_process_replay] (#5959)
* rename to lsi

* fuzz_schedule more typings

* rename fuzz_schedule
2024-08-07 14:31:50 +03:00
qazal
728b7e189e diff_schedule tests [run_process_replay] (#5958)
* diff_schedule tests [run_process_replay]

* ok to run serial
2024-08-07 13:50:27 +03:00
George Hotz
3a0515ea22 hotfix: process_replay/diff_schedule.py to LBScheduleItem 2024-08-06 17:01:05 -07:00
George Hotz
73d4d51845 add LBScheduleItem type [run_process_replay] (#5944)
* add LBScheduleItem type [run_process_replay]

* minor cleanups

* fix

* fix fuzz tests

* add group cache type
2024-08-06 14:49:40 -07:00
qazal
a7db4c3ee9 show timings for DIFF_ARANGE=1 (#5935)
* show timings for DIFF_ARANGE=1

* always with DEBUG=2
2024-08-06 17:20:38 +03:00
qazal
102a8c184b diff fused arange schedules with ARANGE_DIFF=1 (#5934)
* diff fused arange schedules with ARANGE_DIFF=1

* better llama diff
2024-08-06 16:52:26 +03:00
wozeparrot
acadccf344 comma benchmark (#5518) 2024-08-02 14:36:54 -07:00
Elias Wahl
4a114756f6 New BERT dataloader (#5881)
* One file == One topic

* update test

* new dataloader

* update train script

* get index is faster
2024-08-02 15:12:23 -04:00
qazal
8611fa6c99 apply opts.extra_matcher in process replay [run_process_replay] (#5877) 2024-08-02 18:07:58 +03:00
qazal
2a791f7924 fuzz uops is simpler with List[UOp] [run_process_replay] (#5875)
* remove from fuzz_uops

* update fuzz_uops.py

* add to realize.py
2024-08-02 17:28:15 +03:00
nimlgen
f768935be8 add RING_ALLREDUCE_THRESHOLD (#5835)
* add RING_ALLREDUCE_THRESHOLD

* becnhmark

* fixes

* fix n_gpus

* unused import

* remove debug=2
2024-07-31 16:13:09 +03:00
George Hotz
e6879035a0 work to make GEMV fast (#5824)
* work to make GEMV fast

* half8 cast

* align struct

* fix amd

* float8 is a later problem
2024-07-30 17:41:40 -07:00
Francis Lata
ce61be16f1 clean up how preprocessed folder is defined (#5813) 2024-07-30 12:35:26 -04:00
qazal
5e827e51d2 add llama3 BEAM=2 failures to test_linearizer_failures (#5553)
* skips

* opts.device

* benchmarks

* add to test_linearizer_failures

* remove hardcoded ones

* linter

* skip cpu
2024-07-30 00:37:32 +03:00
qazal
b775db6b60 high-level benchmark timing diff (#5776)
* high level timings

benchmark times

fix defs

* use the name map

* skip last task
2024-07-28 23:42:57 +03:00
qazal
e0e7293b0a make process replay unique in retries [run_process_replay] (#5773) 2024-07-28 20:44:15 +03:00
qazal
3e49d86c01 process replay diffs 3 things now (#5731)
* github api infra

* process replay is 3 parts now

* parse benchmarks

* add gh_token

* complete diff

* move process replay tests

* last successful run

* add tempdir

* skip master
2024-07-27 12:52:20 +03:00
qazal
57b4a8e98d assert process replay asserts (#5737)
* assert process replay asserts

* one ci job is fine

* test: Revert "separate process replay main loop (#5734)"

This reverts commit 94d578396f.

* mac sed needs that

* Revert "test: Revert "separate process replay main loop (#5734)""

This reverts commit e4ad7684d5.

* disable process replay capture

* save time

* amd is tiny

* send to /dev/null
2024-07-27 12:07:50 +03:00
George Hotz
f8972ace38 test flops (and allow wide ALU in UOps) [run_process_replay] (#5749)
* flops test in external_test_speed_theoretical.py

* test speed theo

* min SZMAX

* allow wide ALU for things that support it

* needed for mypy
2024-07-26 21:07:28 -07:00
George Hotz
2fde2d2914 hotfix: external_test_speed_theoretical works on 24GB 2024-07-26 18:41:52 -07:00
George Hotz
829262a5ee add external_test_speed_theoretical 2024-07-26 17:45:22 -07:00
qazal
94d578396f separate process replay main loop (#5734)
* separate process replay main loop

* [run_process_replay]

* add kernel_changed

* test with [run_process_replay]

* revert temp [run_process_replay]
2024-07-26 21:43:08 +03:00
qazal
1c992de257 hotfix: compare_schedule defaults to false (#5707) 2024-07-25 17:08:28 +03:00
qazal
489cda827a more scheduler process replay tooling (#5706)
* more scheduler process replay tooling

* refactor to compare_schedule
2024-07-25 15:47:18 +03:00
qazal
b0fc5a4c6f start scheduler process replay (#5656) 2024-07-23 20:02:51 +03:00
qazal
5f394fc9c6 more work toward non-blocking process replay (#5653)
* non-blocking process replay

* more actionable

* test it

* revert the test

* %s/logging.warn/logging.warning
2024-07-23 14:26:31 +03:00
George Hotz
dc21e63bd2 test: put conv in one reduce (#4441)
* test: put conv in one reduce

* put reduce at the end

* more expand

* generic, and that expand was breaking things

* ratio

* don't undo the expand

* arg 1

* strides

* warning, for resnet

* warning removed

* disable cast

* handle cast

* op

* err, that's right

* fixup

* fix that

* a test to play with

* add double_reduces

* working up to final reshape

* fold the last reshape

* moved to schedule

* fix axis

* ci, need to bring arange back

* FUSE_CONV_BW maybe

* valid in 3.9

* test_expand_reduce_is_folded_on_different_axes

* add FUSE_CONV_BW=1

* test_fold_batchnorm_backward

* test_sgd_4convs_fuse

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-22 12:16:13 +03:00
qazal
e7a057c20f retire replay_schedule (#5563) 2024-07-18 23:07:02 +03:00
qazal
50aba32ea8 hotfix: don't assert process replay in master. (#5562)
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
if [run_process_replay] is green pre merge it's ok.
2024-07-18 22:05:00 +03:00
kormann
2c4add6844 pretty print lazy op per default (#5505)
* pretty lop

* min diff

* walrus

* fix

* min diff

* simplify

* pretty helper function

* ws

* pretty uop upat

* tests

* stricter tests

* test passes

* ws

* stronger upat test

* delete print_tree

* min diff

* stricter exp test

* fix merge

* stronger uops eval test

* +readable and deep upat test

* +readable and deep upat test

* sort inv fix

* fix

* revert allowed_len
2024-07-18 09:34:08 -07:00
George Hotz
fa7e734b49 MetaOps.KERNEL (#5543) 2024-07-17 19:41:23 -07:00
qazal
0259d76183 use Context only in replaying Kernel [run_process_replay] (#5535) 2024-07-18 03:46:14 +08:00
qazal
a7706e05f9 option to [skip_process_replay] (#5533) 2024-07-17 22:30:46 +03:00
Francis Lam
2d53abb04a test/external/fuzz_linearizer: fix for new AST changes (#5519)
* test/external/fuzz_linearizer: fix for new AST changes

also add beautiful_mnist failures

* add CLANG and LLVM to test_failure_35 failed_platforms

* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
Edward Wang
9a7d5a148e move colorize_float to helpers.py (#5490)
* add colorize_float to helpers.py

* update references
2024-07-15 11:29:03 -07:00
qazal
ae4cb7994e run process replay with DEBUG=0 (#5491)
* process replay with DEBUG=0

* graceful shutdown

* use and
2024-07-15 16:30:57 +03:00
qazal
3c378efcb6 process replay docs improvements (#5481)
* minor cleanups

* docs and logs

* shorter

* comma

* s/print/logging.info [run_process_replay]

* use logging.warn

* process name is noise

* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
qazal
671779f280 limit process replay diff to ~20% of kernels (#5480)
* render lidx starting with 0

changed from
```
  int gidx0 = gid.x; /* 4096 */
  int lidx4 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx5 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx6 = lid.z; /* 2 */
```
to
```
  int gidx0 = gid.x; /* 4096 */
  int lidx0 = lid.x; /* 8 */
  int gidx1 = gid.y; /* 7 */
  int lidx1 = lid.y; /* 8 */
  int gidx2 = gid.z; /* 7 */
  int lidx2 = lid.z; /* 2 */
```

the existing one started from pre-limited global dims which skip number if there are more than 3 global dims

* don't need start_dim

* add changed

* env var

* more early exit

* simpler?

* Revert "Merge branch 'lidx0' into process_replay_limit"

This reverts commit cbadcfa5e9, reversing
changes made to fc9bf37ee7.

* minor cleanup

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-07-14 23:10:08 +03:00