Commit Graph

4433 Commits

Author SHA1 Message Date
hikettei
ad1ca7da64 [Feature] Added BinaryOps.AND/BinaryOps.OR (#5223)
* [Feature] Added BinaryOps.AND/BinaryOps.OR

* Add: __rand__, __ror__
2024-06-29 17:20:25 -07:00
chenyu
b2ea610df8 fix tqdm unit_scale and support hours in time (#5227)
* fix tqdm unit_scale and support hours in time

previously it only supports MM:SS.
more chars to unitscales, strip trailing "." and " " in formatting, and more tests

* simpler
2024-06-29 14:48:51 -04:00
qazal
f374fb77af assert bool dtype for valid [run_process_replay] (#5214)
* valid is always bool

* prevent NumNode to begin with

* part 2

* test: disable pattern matchers, asserts should pass

* test: store without cast

* test: if (0)

* cleanup time

* only pattern match bool literal

* better for upstream debug
2024-06-29 21:20:32 +03:00
qazal
3f4eeb8b54 late UOps.IF generation [run_process_replay] [no_assert] (#5027)
* find all places

* test gates

* test

* gate based on depths

* add ctx

* that cache was so wrong

* delete useless things

* dont double write if

* self.if_cond

* move UOps.IF to gated store

* test_padto_where_multioutput

* test_padto_group

* minor cleanup

* hmm this actually works?

* need a good barrier

* merge 2

* delete ctx

* p1

* maybe p2

* p3

* minor fixup

* fixup 2

* smart thing from the Lowerer branch

* refactoring

* refactoring 2

* maybe before graph_rewrite

* slightly more acceptable Linearizer diff

* more correct

* [run_process_replay] [no_assert]
2024-06-29 12:22:14 -04:00
chenyu
42d1f92fc1 simpler tqdm (#5221)
can do more, but many cases are not tested
2024-06-29 07:41:46 -04:00
George Hotz
80ac21200b hotfix: linearizer test fixup 2024-06-28 10:52:25 -07:00
kormann
6c456b6d66 remove uopgraph dedup + slight speedup (#5199)
* rm dedup

* rm dedup

* tests

* reduce diff

* oups

* reduce diff

* rm UOp.tuple
2024-06-28 09:26:32 -07:00
chenyu
73395b998b better error msg for TinyJit inside TinyJit (#5202)
it's possible to support TinyJit inside TinyJit, but there are edge cases like two TinyJit functions shared another TinyJit function. so just give a more precise error for now
2024-06-27 18:09:19 -04:00
George Hotz
345bcc2099 move graph_dedup out of class [run_process_replay] (#5197) 2024-06-27 12:04:00 -07:00
George Hotz
d094a6828f single pass rewrite (#5159)
* single pass rewrite

* claude cleanups

* claude cleanups

* skip those tests

* restrict that to ints

* comment

* asserts i don't expect to fail do fail

* simplest...rewrite...ever

* simplest...rewrite...ever

* add that rule back

* tests pass?

* only collapse reduce loops

* second SHL/SHR arg must be 4 bytes

* fix verify

* no SHL/SHR in ptx

* put that back

* skip them in PTX...bad tests
2024-06-27 11:36:05 -07:00
chenyu
ad91962dcf CACHECOLLECTING -> CAPTURING and don't capture clear_l2 (#5190)
fixed first time BEAM slowness
2024-06-27 12:32:28 -04:00
Roelof van Dijk
9704c7d4d4 ruff rule if-exp-instead-of-or-operator (FURB110) (#5178)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:22:19 -07:00
chenyu
5b8fda3c65 fix: JIT=0 means no JIT (#5188) 2024-06-27 10:31:37 -04:00
Roelof van Dijk
975b811ad9 names shadowing builtins (#5179)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-27 08:15:01 -04:00
Roelof van Dijk
f88f71d73a ruff: unnecessary-comprehension (#5174)
* enable ruff C416 unnecessary-comprehension

* already a list
2024-06-27 07:45:29 -04:00
kormann
3a04e518ec print_tree UPat +fix (#5132)
* fix and extend print_tree

* typing

* typing

* fix upat

* fix none

* ws

* rm prefix

* mv luop dag

* typo

* test print_tree
2024-06-26 15:02:19 -07:00
nimlgen
16405b973a fix hcq sync (#5062)
* fix hcq sync

* rewrite

* linter + comment

* fix profiler

* no default dict

* correct sync of unjitted transfer

* fix test
2024-06-26 17:50:37 +03:00
nimlgen
fd27f19e92 graph tests (#5153)
* graph tests

* add test

* cleanup
2024-06-26 16:31:20 +03:00
qazal
6ca7b13ed1 limit pickled objects [run_process_replay] (#5154)
* limit pickled objects

* delete uop from the list

* debug metal

* need self.opts for TC

* dont need device

* [run_process_replay]

* minor
2024-06-26 13:51:32 +03:00
David Hou
666a9c1448 don't view origin buffer when sharding (#5122)
* make buffer view optional with a flag

* do not view when sharding to save memory
2024-06-25 20:19:09 -07:00
George Hotz
c98ca23cb9 test pickle variable (#5150)
* test pickle variable

* fix process replay
2024-06-25 19:49:21 -07:00
George Hotz
63ba2d05d1 uops dfs cleanup (#5147)
* uops dfs cleanup

* Update uops.py
2024-06-25 18:51:42 -07:00
Jhenner Tigreros
fa78755f19 Add new patterns to unfold division (#5139)
* Add new patterns to unfold division

* Create regression test and fix pattern
2024-06-25 18:07:47 -07:00
qazal
c4fdb9c725 second iteration on verify_lazyop (#5140) 2024-06-25 09:44:32 +03:00
qazal
981afb114f safely fold NEG in lazy.py (#5135)
* safe

* add test
2024-06-24 19:40:37 -04:00
chenyu
7948b05738 fix uneven shard with shrink and pad args on sharded axis (#5131)
it's incorrect to assume all first (len(device)-1) shards would have the same size. e.g. size 2 shard 4 -> (1, 1, 0, 0)
2024-06-24 16:55:50 -04:00
qazal
18e70deec3 verify_lazyop (#5124)
* start verify_lazyop

* bfs order

* assert

* assert shapetrackers 2

* refactor

* more iteration

* skips

* that ast was wrong too
2024-06-24 13:45:35 -07:00
chenyu
4a7d403777 cleanup test_multitensor (#5118)
renamed d_zero, d0, d1, d2, ... to d0, d1, d2, d3 and reused some multi device tuples
2024-06-23 20:54:22 -04:00
chenyu
c0ba5e0dfb multi copy_to_device return the copy on same device if possible (#5117)
previously it always returns from the first device
2024-06-23 20:25:56 -04:00
Francis Lam
b563cd52ed linearizer: change globals to merge into left axis/gridDims.x first (#5033)
* linearizer: change order of collapse to be left-most

also fixes Variable max size to be correct and add docs for the off
parameter

* fix multiple global dim oversizes

* add passing variable test and reorganize tests

* use assert RuntimeError for failing test
2024-06-23 18:53:15 -04:00
qazal
28bf8d86d8 test_linearizer with multi output ASTs (#5115)
* ast is tuple

* run test_phi_simplification

* update reason

* more tc

* beam

* a few more

* use test_opt directly
2024-06-23 15:41:24 +03:00
chenyu
ee0c6dfc15 build Tensor._tri with movements only (#5110)
* build Tensor._tri with movements only

doesn't need arange, saved a kernel in attention mask

* simpler, more tests
2024-06-23 00:07:36 -04:00
chenyu
20fabd8a5b update Tensor.triu and Tensor.tril (#5109)
renamed arg to `diagonal` that matches torch api, and added document and examples
2024-06-22 21:59:50 -04:00
chenyu
33211f356b fix desc in tqdm (#5107)
per doc `https://tqdm.github.io/docs/tqdm/`, user does not need to put `: ` in desc, and `: ` is automatically removed after desc if the latter is empty.

updated test cases and added a test for set_description
2024-06-22 19:00:38 -04:00
chenyu
e356807696 tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
chenyu
8080298739 s/tinytqdm/tqdm (#5103)
except in unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
George Hotz
9f875123b6 small changes from lowerer. [run_process_replay] [no_assert] (#5102) 2024-06-22 11:09:35 -07:00
chenyu
ca021229e4 fix attention to always return in the same dtype as input (#5100)
mid cast to default_float does not work as intended when default is float32 and qkv is in half
2024-06-22 10:34:57 -04:00
chenyu
166a2b19b5 fix reduce axis of 0d tensors (#5089)
`x.sum(())` is fine, and `x.sum((1,))` should throw IndexError
2024-06-21 13:51:40 -04:00
chenyu
36b4a492a1 explicitly check getitem indices can have at most one ellipsis (#5087)
* explicitly check getitem indices can have at most one ellipsis

previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```

this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```

* oh we have that already

* test that

* test these
2024-06-21 12:33:18 -04:00
nimlgen
f1e758bacb graph fuzzer (#5082)
* graph fuzzer

* more options

* mypy

* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal
5717a54b28 don't use Tensor.empty in kernel opts tests (#5086) 2024-06-21 18:41:03 +03:00
qazal
8aa786232d docs for running process replay locally (#5083) 2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe io_uring for copies from disk (#5035)
* exp uring

* fixes and old version

* nv

* cleaner

* cmp vs aio

* fix

* no lib

* fix nv

* linter

* disk_speed_test now runs default

* fixes

* uring -> io_uring

* linter happy

* get_temp_buf comment added

* tiny nits

* put wait back

* test runs everywhere

* remove consts

* remove mmap consts

* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
chenyu
f6d6760f71 don't cast tuple to list before creating Tensor (#5071)
Tensor constructor supports creating from tuple now
2024-06-20 13:32:56 -04:00
George Hotz
6f6b3b10c9 import from uops, not linearizer (#5064) 2024-06-20 08:08:44 -07:00
chenyu
50700171ef minor cleanup to reshape arg handling (#5070)
moved None handle to be with argfix, and only resolve -1 if there's a -1
2024-06-20 10:27:27 -04:00
chenyu
f4355d0f1b check Tensor.permute input arg is a valid permutation (#5069)
also added support of negative axes
2024-06-20 10:01:28 -04:00
qazal
24c89a2a33 move assert_equiv_uops to helpers + use == for dtypes (#5067)
* dtypes should use ==

* use TestUOps

* should use assertIs
2024-06-20 16:39:34 +03:00
chenyu
e8f39fcaaa check arg to Tensor.flip can appear only once (#5068)
* check arg to Tensor.flip can appear only once

raise RuntimeError if there are multiple

* fix test
2024-06-20 09:33:42 -04:00