qazal
b0fc5a4c6f
start scheduler process replay ( #5656 )
2024-07-23 20:02:51 +03:00
qazal
5f394fc9c6
more work toward non-blocking process replay ( #5653 )
...
* non-blocking process replay
* more actionable
* test it
* revert the test
* %s/logging.warn/logging.warning
2024-07-23 14:26:31 +03:00
George Hotz
dc21e63bd2
test: put conv in one reduce ( #4441 )
...
* test: put conv in one reduce
* put reduce at the end
* more expand
* generic, and that expand was breaking things
* ratio
* don't undo the expand
* arg 1
* strides
* warning, for resnet
* warning removed
* disable cast
* handle cast
* op
* err, that's right
* fixup
* fix that
* a test to play with
* add double_reduces
* working up to final reshape
* fold the last reshape
* moved to schedule
* fix axis
* ci, need to bring arange back
* FUSE_CONV_BW maybe
* valid in 3.9
* test_expand_reduce_is_folded_on_different_axes
* add FUSE_CONV_BW=1
* test_fold_batchnorm_backward
* test_sgd_4convs_fuse
---------
Co-authored-by: qazal <qazal.software@gmail.com >
2024-07-22 12:16:13 +03:00
qazal
e7a057c20f
retire replay_schedule ( #5563 )
2024-07-18 23:07:02 +03:00
qazal
50aba32ea8
hotfix: don't assert process replay in master. ( #5562 )
...
This is because https://github.com/tinygrad/tinygrad/actions/runs/9996754763/job/27631802686 ran exactly when master changed state, causing the diff to assert.
if [run_process_replay] is green pre merge it's ok.
2024-07-18 22:05:00 +03:00
kormann
2c4add6844
pretty print lazy op per default ( #5505 )
...
* pretty lop
* min diff
* walrus
* fix
* min diff
* simplify
* pretty helper function
* ws
* pretty uop upat
* tests
* stricter tests
* test passes
* ws
* stronger upat test
* delete print_tree
* min diff
* stricter exp test
* fix merge
* stronger uops eval test
* +readable and deep upat test
* +readable and deep upat test
* sort inv fix
* fix
* revert allowed_len
2024-07-18 09:34:08 -07:00
George Hotz
fa7e734b49
MetaOps.KERNEL ( #5543 )
2024-07-17 19:41:23 -07:00
qazal
0259d76183
use Context only in replaying Kernel [run_process_replay] ( #5535 )
2024-07-18 03:46:14 +08:00
qazal
a7706e05f9
option to [skip_process_replay] ( #5533 )
2024-07-17 22:30:46 +03:00
Francis Lam
2d53abb04a
test/external/fuzz_linearizer: fix for new AST changes ( #5519 )
...
* test/external/fuzz_linearizer: fix for new AST changes
also add beautiful_mnist failures
* add CLANG and LLVM to test_failure_35 failed_platforms
* fix test_linearizer_failure names
2024-07-17 00:08:07 -04:00
Edward Wang
9a7d5a148e
move colorize_float to helpers.py ( #5490 )
...
* add colorize_float to helpers.py
* update references
2024-07-15 11:29:03 -07:00
qazal
ae4cb7994e
run process replay with DEBUG=0 ( #5491 )
...
* process replay with DEBUG=0
* graceful shutdown
* use and
2024-07-15 16:30:57 +03:00
qazal
3c378efcb6
process replay docs improvements ( #5481 )
...
* minor cleanups
* docs and logs
* shorter
* comma
* s/print/logging.info [run_process_replay]
* use logging.warn
* process name is noise
* revert lowerer change [run_process_replay]
2024-07-15 00:09:28 +03:00
qazal
671779f280
limit process replay diff to ~20% of kernels ( #5480 )
...
* render lidx starting with 0
changed from
```
int gidx0 = gid.x; /* 4096 */
int lidx4 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx5 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx6 = lid.z; /* 2 */
```
to
```
int gidx0 = gid.x; /* 4096 */
int lidx0 = lid.x; /* 8 */
int gidx1 = gid.y; /* 7 */
int lidx1 = lid.y; /* 8 */
int gidx2 = gid.z; /* 7 */
int lidx2 = lid.z; /* 2 */
```
the existing one started from pre-limited global dims which skip number if there are more than 3 global dims
* don't need start_dim
* add changed
* env var
* more early exit
* simpler?
* Revert "Merge branch 'lidx0' into process_replay_limit"
This reverts commit cbadcfa5e9 , reversing
changes made to fc9bf37ee7 .
* minor cleanup
---------
Co-authored-by: chenyu <chenyu@fastmail.com >
2024-07-14 23:10:08 +03:00
qazal
0b3a34e3b1
vectorize folding [run_process_replay] ( #5470 )
...
* test_gep_vec_fold
* remove that
* fix process replay
* lint
2024-07-14 09:41:48 +03:00
chenyu
28972418c4
s/get_linearizer/get_kernel [run_process_replay] ( #5467 )
2024-07-13 20:32:22 -04:00
Francis Lata
0345577032
UNet3D dataloader shared memory fix ( #5465 )
...
* create separate SharedMemory between inputs and labels
* update path check for shared mem
* clean up unit test for dataset
2024-07-13 20:26:00 -04:00
qazal
487ceff825
hotfix: ASSERT_PROCESS_REPLAY sometimes doesn't exist ( #5456 )
2024-07-13 21:15:40 +03:00
qazal
40ec9410f9
simpler process replay ( #5452 )
...
* remove check_process_replay
* that can go to the top
* add assert back
* [run_process_replay]
* checkout code [run_process_replay]
* temp [run_process_replay]
* revert temp [run_process_replay]
* ahh this is why [run_process_replay]
* revert temp [run_process_replay]
2024-07-13 19:55:06 +03:00
qazal
23b907efbb
restore process replay runs by their id ( #5453 )
2024-07-13 19:32:34 +03:00
qazal
bb1a9ebf78
run process replay in parallel ( #5443 )
2024-07-13 11:29:36 +03:00
George Hotz
fb3011ac61
improve matcher speed [run_process_replay] ( #5438 )
...
* improve matcher speed [run_process_replay]
* don't use arg set in ptx
2024-07-12 20:02:19 -07:00
George Hotz
03c2dc8bd7
lowerer is kernel [run_process_replay] ( #5437 )
2024-07-12 18:50:55 -07:00
wozeparrot
b80fd7d23c
allow benchmarking forward only ( #5436 )
2024-07-12 17:37:49 -07:00
George Hotz
870dc8c350
s/Linearizer/Lowerer [run_process_replay] ( #5428 )
2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0
scheduleitem is not Tuple [run_process_replay] ( #5425 )
...
* scheduleitem is not Tuple [run_process_replay]
* fix tests
* fix op + fuzzers
* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
94599c0637
fixup ast in kernel to be MetaOps.SINK [run_process_replay] ( #5424 )
...
* fixup ast in kernel to be MetaOps.SINK [run_process_replay]
* fix tests
* fix more tests
2024-07-12 14:01:03 -07:00
George Hotz
f6ef283e6a
s/loadops/metaops [run_process_replay] ( #5421 )
2024-07-12 13:26:50 -07:00
uuuvn
3cb94a0a15
Rename tinygrad/runtime/driver to support ( #5413 )
2024-07-12 11:06:42 -07:00
qazal
31fcc516dc
more process replay tooling ( #5407 )
...
* replays
* what's in there
* can it be up there
* sha is enough
* insert sha as the key
* fix str
* update reset utils
* that nested try/except was terrible
* github_context can go
2024-07-12 13:11:34 +03:00
chenyu
6e0a523078
repro slow resnet kernel with 4 global dims ( #5402 )
...
* repro slow resnet kernel with 4 global dims
* fix ruff
2024-07-11 23:31:15 -04:00
George Hotz
01fbd18209
metal compile fail
2024-07-11 19:27:05 -07:00
qazal
9712d9ffb6
pass lowering errors if not asserting process replay ( #5395 )
...
* pass lowering errors if not asserting process replay
* ProcessReplayError
2024-07-11 19:09:12 -04:00
qazal
004366b193
context aware process replay [run_process_replay] ( #5378 )
...
* test tc as ctx var
* remove from opts
* process replay
* pop variable
* B -> Variable
* fix re-assign
* pop temp vars
* move TRANSCENDENTAL=2
2024-07-11 13:07:28 +03:00
George Hotz
d13654a820
move uopgraph to file [run_process_replay] ( #5364 )
...
* move uopgraph to file [run_process_replay]
* fix print tree test
2024-07-10 17:34:50 -07:00
Elias Wahl
097268fab3
Add layerwise performance bench for bert ( #5349 )
...
* add bert bench
* dont disable by defauöt
* remove lr
* linter
2024-07-09 15:03:25 -04:00
qazal
bee96a19ff
fuzz uop schedules ( #5345 )
...
* basic blocks + cleanups
* fixups
* elif is better for future me
* fuzz_schedule_max_paths
* fix linter
2024-07-09 15:24:56 +03:00
qazal
d813617742
prescheduling refactor ( #5300 )
...
* p1
* refactor tuple
2024-07-06 12:04:03 +03:00
qazal
b369e75ed0
refactor schedule creation ( #5297 )
2024-07-05 21:14:38 +03:00
chenyu
f1ff65e763
remove "no-nans-fp-math"="true" for LLVM ( #5282 )
...
fixed isnan for llvm (still have issue with < nan)
2024-07-03 17:52:50 -04:00
nimlgen
7be776f9af
add _alloc_signal/_free_signal to hcq ( #5264 )
...
* add _alloc_signal/_free_signal api
* oops, revert this
* linter
2024-07-02 23:35:39 +03:00
Tobias Fischer
8c9c1cf62f
Pulled CLIP and UNet into Seperate Files ( #5253 )
...
* pulled clip and unet into seperate files
* reference cleanup, lru cache fix
* better pool indexing
2024-07-01 22:33:01 -04:00
Roelof van Dijk
975b811ad9
names shadowing builtins ( #5179 )
...
Co-authored-by: chenyu <chenyu@fastmail.com >
2024-06-27 08:15:01 -04:00
Roelof van Dijk
f88f71d73a
ruff: unnecessary-comprehension ( #5174 )
...
* enable ruff C416 unnecessary-comprehension
* already a list
2024-06-27 07:45:29 -04:00
qazal
6ca7b13ed1
limit pickled objects [run_process_replay] ( #5154 )
...
* limit pickled objects
* delete uop from the list
* debug metal
* need self.opts for TC
* dont need device
* [run_process_replay]
* minor
2024-06-26 13:51:32 +03:00
George Hotz
63ba2d05d1
uops dfs cleanup ( #5147 )
...
* uops dfs cleanup
* Update uops.py
2024-06-25 18:51:42 -07:00
chenyu
e356807696
tinytqdm.set_description and tinytrange ( #5101 )
2024-06-22 14:45:06 -04:00
chenyu
8080298739
s/tinytqdm/tqdm ( #5103 )
...
except in unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
nimlgen
f1e758bacb
graph fuzzer ( #5082 )
...
* graph fuzzer
* more options
* mypy
* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal
8aa786232d
docs for running process replay locally ( #5083 )
2024-06-21 09:55:08 -04:00