kormann
3a04e518ec
print_tree UPat +fix ( #5132 )
* fix and extend print_tree
* typing
* typing
* fix upat
* fix none
* ws
* rm prefix
* mv luop dag
* typo
* test print_tree
2024-06-26 15:02:19 -07:00
nimlgen
16405b973a
fix hcq sync ( #5062 )
* fix hcq sync
* rewrite
* linter + comment
* fix profiler
* no default dict
* correct sync of unjitted transfer
* fix test
2024-06-26 17:50:37 +03:00
nimlgen
fd27f19e92
graph tests ( #5153 )
* graph tests
* add test
* cleanup
2024-06-26 16:31:20 +03:00
qazal
6ca7b13ed1
limit pickled objects [run_process_replay] ( #5154 )
* limit pickled objects
* delete uop from the list
* debug metal
* need self.opts for TC
* dont need device
* [run_process_replay]
* minor
2024-06-26 13:51:32 +03:00
David Hou
666a9c1448
don't view origin buffer when sharding ( #5122 )
* make buffer view optional with a flag
* do not view when sharding to save memory
2024-06-25 20:19:09 -07:00
George Hotz
c98ca23cb9
test pickle variable ( #5150 )
* test pickle variable
* fix process replay
2024-06-25 19:49:21 -07:00
George Hotz
63ba2d05d1
uops dfs cleanup ( #5147 )
* uops dfs cleanup
* Update uops.py
2024-06-25 18:51:42 -07:00
Jhenner Tigreros
fa78755f19
Add new patterns to unfold division ( #5139 )
* Add new patterns to unfold division
* Create regression test and fix pattern
2024-06-25 18:07:47 -07:00
qazal
c4fdb9c725
second iteration on verify_lazyop ( #5140 )
2024-06-25 09:44:32 +03:00
qazal
981afb114f
safely fold NEG in lazy.py ( #5135 )
* safe
* add test
2024-06-24 19:40:37 -04:00
chenyu
7948b05738
fix uneven shard with shrink and pad args on sharded axis ( #5131 )
it's incorrect to assume that the first (len(device)-1) shards all have the same size, e.g. sharding size 2 across 4 devices gives (1, 1, 0, 0)
2024-06-24 16:55:50 -04:00
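A minimal sketch of the sizing described above (plain Python; `shard_sizes` is a hypothetical helper for illustration, not tinygrad's code):

```python
import math

def shard_sizes(dim: int, n: int) -> tuple:
  # each shard gets at most ceil(dim / n) elements; trailing shards may be smaller or empty
  per = math.ceil(dim / n)
  return tuple(min(per, max(0, dim - i * per)) for i in range(n))

assert shard_sizes(2, 4) == (1, 1, 0, 0)  # size 2 over 4 devices: the first 3 shards are NOT all equal
assert shard_sizes(7, 4) == (2, 2, 2, 1)  # the "equal leading shards" assumption only happens to hold here
```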
qazal
18e70deec3
verify_lazyop ( #5124 )
* start verify_lazyop
* bfs order
* assert
* assert shapetrackers 2
* refactor
* more iteration
* skips
* that ast was wrong too
2024-06-24 13:45:35 -07:00
chenyu
4a7d403777
cleanup test_multitensor ( #5118 )
renamed d_zero, d0, d1, d2, ... to d0, d1, d2, d3 and reused some multi device tuples
2024-06-23 20:54:22 -04:00
chenyu
c0ba5e0dfb
multi copy_to_device return the copy on same device if possible ( #5117 )
previously it always returned the copy from the first device
2024-06-23 20:25:56 -04:00
Francis Lam
b563cd52ed
linearizer: change globals to merge into left axis/gridDims.x first ( #5033 )
* linearizer: change order of collapse to be left-most
also fixes Variable max size to be correct and adds docs for the off parameter
* fix multiple global dim oversizes
* add passing variable test and reorganize tests
* use assert RuntimeError for failing test
2024-06-23 18:53:15 -04:00
qazal
28bf8d86d8
test_linearizer with multi output ASTs ( #5115 )
* ast is tuple
* run test_phi_simplification
* update reason
* more tc
* beam
* a few more
* use test_opt directly
2024-06-23 15:41:24 +03:00
chenyu
ee0c6dfc15
build Tensor._tri with movements only ( #5110 )
* build Tensor._tri with movements only
doesn't need arange; saves a kernel in the attention mask
* simpler, more tests
2024-06-23 00:07:36 -04:00
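The movement-only trick can be sketched in numpy, with broadcast/reshape/slice standing in for expand/reshape/shrink; this is an illustration of the idea, not tinygrad's exact `_tri` code:

```python
import numpy as np

def tri_lower(n: int) -> np.ndarray:
  row = np.concatenate([np.ones(n), np.zeros(n)])       # a row of n ones padded with n zeros
  flat = np.broadcast_to(row, (n, 2 * n)).reshape(-1)    # "expand" to n rows, then flatten
  # reading the flat buffer back with width 2n-1 shifts each row by one; keep the last n columns
  return flat[: n * (2 * n - 1)].reshape(n, 2 * n - 1)[:, n - 1:]

print(tri_lower(3))   # no arange anywhere
# [[1. 0. 0.]
#  [1. 1. 0.]
#  [1. 1. 1.]]
```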
chenyu
20fabd8a5b
update Tensor.triu and Tensor.tril ( #5109 )
renamed the arg to `diagonal` to match the torch API, and added documentation and examples
2024-06-22 21:59:50 -04:00
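A quick usage sketch of the renamed argument, assuming the torch-style semantics the commit describes:

```python
from tinygrad import Tensor

t = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(t.triu(diagonal=1).numpy())  # keep entries strictly above the main diagonal
# [[0 2 3]
#  [0 0 6]
#  [0 0 0]]
print(t.tril().numpy())            # diagonal defaults to 0, as in torch
# [[1 0 0]
#  [4 5 0]
#  [7 8 9]]
```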
chenyu
33211f356b
fix desc in tqdm ( #5107 )
per the tqdm docs (`https://tqdm.github.io/docs/tqdm/`), the user does not need to put `: ` in desc; the `: ` appended after desc is automatically removed when desc is empty.
updated test cases and added a test for set_description
2024-06-22 19:00:38 -04:00
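Illustrated with upstream tqdm, whose documented desc behavior the commit references (tinygrad's clone is expected to match):

```python
import time
from tqdm import tqdm

bar = tqdm(range(3), desc="epoch 0")   # no trailing ": " needed, it is appended automatically
for i in bar:
  bar.set_description(f"epoch {i}")    # prefix becomes "epoch i: "; an empty desc drops the ": "
  time.sleep(0.01)
```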
chenyu
e356807696
tinytqdm.set_description and tinytrange ( #5101 )
2024-06-22 14:45:06 -04:00
chenyu
8080298739
s/tinytqdm/tqdm ( #5103 )
except in the unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
George Hotz
9f875123b6
small changes from lowerer. [run_process_replay] [no_assert] ( #5102 )
2024-06-22 11:09:35 -07:00
chenyu
ca021229e4
fix attention to always return in the same dtype as input ( #5100 )
the mid-computation cast to default_float did not work as intended when the default is float32 and qkv are in half
2024-06-22 10:34:57 -04:00
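A minimal sketch of the guarantee, assuming half-precision q/k/v and the usual float32 default_float:

```python
from tinygrad import Tensor, dtypes

q = Tensor.rand(1, 4, 8, dtype=dtypes.half)
k = Tensor.rand(1, 4, 8, dtype=dtypes.half)
v = Tensor.rand(1, 4, 8, dtype=dtypes.half)
out = q.scaled_dot_product_attention(k, v)
assert out.dtype == dtypes.half  # output follows the input dtype, not default_float
```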
chenyu
166a2b19b5
fix reduce axis of 0d tensors ( #5089 )
`x.sum(())` is fine, and `x.sum((1,))` should throw IndexError
2024-06-21 13:51:40 -04:00
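The two cases from the commit message, as a small runnable sketch:

```python
from tinygrad import Tensor

x = Tensor(3.0)              # 0-d tensor
print(x.sum(()).item())      # reducing over no axes is fine -> 3.0
try:
  x.sum((1,))                # a 0-d tensor has no axis 1
except IndexError as e:
  print("IndexError:", e)
```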
chenyu
36b4a492a1
explicitly check getitem indices can have at most one ellipsis ( #5087 )
* explicitly check getitem indices can have at most one ellipsis
previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```
this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```
* oh we have that already
* test that
* test these
2024-06-21 12:33:18 -04:00
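Behavior sketch of the new check (the error text is the one quoted above):

```python
from tinygrad import Tensor

x = Tensor.ones(2, 3, 4)
print(x[..., 0].shape)   # (2, 3): a single ellipsis is fine
try:
  _ = x[..., 0, ...]     # two ellipses are ambiguous
except IndexError as e:
  print(e)               # an index can only have a single ellipsis ('...')
```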
nimlgen
f1e758bacb
graph fuzzer ( #5082 )
* graph fuzzer
* more options
* mypy
* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal
5717a54b28
don't use Tensor.empty in kernel opts tests ( #5086 )
2024-06-21 18:41:03 +03:00
qazal
8aa786232d
docs for running process replay locally ( #5083 )
2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe
io_uring for copies from disk ( #5035 )
* exp uring
* fixes and old version
* nv
* cleaner
* cmp vs aio
* fix
* no lib
* fix nv
* linter
* disk_speed_test now runs default
* fixes
* uring -> io_uring
* linter happy
* get_temp_buf comment added
* tiny nits
* put wait back
* test runs everywhere
* remove consts
* remove mmap consts
* do not require io_uring to run the tests, they are generic
2024-06-21 11:36:51 +03:00
chenyu
f6d6760f71
don't cast tuple to list before creating Tensor ( #5071 )
the Tensor constructor now supports creating from a tuple
2024-06-20 13:32:56 -04:00
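Usage sketch of what now works without the intermediate list(...) cast:

```python
from tinygrad import Tensor

t = Tensor((1, 2, 3))           # a plain tuple, no list(...) needed
u = Tensor(((1, 2), (3, 4)))    # nested tuples behave like nested lists
print(t.numpy(), u.shape)       # [1 2 3] (2, 2)
```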
George Hotz
6f6b3b10c9
import from uops, not linearizer ( #5064 )
2024-06-20 08:08:44 -07:00
chenyu
50700171ef
minor cleanup to reshape arg handling ( #5070 )
moved None handling to be with argfix, and only resolve -1 if there is a -1
2024-06-20 10:27:27 -04:00
chenyu
f4355d0f1b
check Tensor.permute input arg is a valid permutation ( #5069 )
also added support for negative axes
2024-06-20 10:01:28 -04:00
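A short sketch of both changes; the exact exception type for an invalid permutation is not stated in the commit message, so it is caught generically here:

```python
from tinygrad import Tensor

x = Tensor.ones(2, 3, 4)
print(x.permute(2, 0, 1).shape)    # (4, 2, 3)
print(x.permute(-1, 0, 1).shape)   # negative axes now resolve, also (4, 2, 3)
try:
  x.permute(0, 0, 1)               # not a valid permutation of (0, 1, 2)
except Exception as e:             # exception type not specified in the commit
  print(type(e).__name__, e)
```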
qazal
24c89a2a33
move assert_equiv_uops to helpers + use == for dtypes ( #5067 )
* dtypes should use ==
* use TestUOps
* should use assertIs
2024-06-20 16:39:34 +03:00
chenyu
e8f39fcaaa
check arg to Tensor.flip can appear only once ( #5068 )
* check arg to Tensor.flip can appear only once
raise RuntimeError if there are multiple
* fix test
2024-06-20 09:33:42 -04:00
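Sketch of the new check; the commit states a RuntimeError is raised for repeated axes:

```python
from tinygrad import Tensor

x = Tensor.ones(2, 3)
print(x.flip(0).shape)         # (2, 3): flipping a single axis is fine
print(x.flip((0, 1)).shape)    # (2, 3): each axis appears at most once
try:
  x.flip((0, 0))               # axis 0 repeated
except RuntimeError as e:
  print(e)
```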
qazal
55e02cdd84
generic gate folding ( #5061 )
* add assert
* fold truthy gates [run_process_replay]
* fold falsy gates [run_process_replay] [no_assert]
* redo asserts
* check both barriers
* spec start
* spec end
* assert srcs
* make test_fold_gated_load_local better
* [run_process_replay] [no_assert]
2024-06-20 16:10:08 +03:00
qazal
ee01e464e3
use process replay as a diff creator ( #4903 )
* add no_assert option [run_process_replay] [no_assert]
* test [run_process_replay] [no_assert]
* [run_process_replay]
* back to normal [run_process_replay]
* remove the log
2024-06-19 18:17:31 +03:00
chenyu
cc2be9064f
fix out of bound python list into numpy array ( #5043 )
numpy 2.0 does not allow out-of-bound python constants and recommends writing `np.array(value).astype(dtype)` instead
2024-06-18 18:05:21 -04:00
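A small illustration of the numpy 2.0 behavior and the recommended spelling from the commit message:

```python
import numpy as np

# on numpy >= 2.0 this raises OverflowError instead of silently wrapping:
#   np.array([255, 256], dtype=np.uint8)
# the recommended spelling casts after construction, so it wraps explicitly:
wrapped = np.array([255, 256]).astype(np.uint8)
print(wrapped)  # [255   0]
```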
chenyu
4e5add4d01
move test_tqdm to test/unit/ ( #5042 )
2024-06-18 17:41:39 -04:00
chenyu
2b2488f2e2
revert creating Tensor from a list without numpy ( #5041 )
the change was incomplete and broke creating a Tensor from a list of np arrays
2024-06-18 17:31:22 -04:00
chenyu
e2c5054bdd
update resnet.load_from_pretrained ( #5040 )
2024-06-18 16:29:22 -04:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples ( #5038 )
* use tinytqdm in active tests and examples
stress test this before 0.9.1
* no set_description
2024-06-18 16:01:19 -04:00
kormann
fe332464d2
src->vin [run_process_replay] ( #5036 )
2024-06-18 22:23:49 +03:00
reddyn12
f171006ded
Should this symbolic test fail? ( #4501 )
* add test
* skip test
* use expected failure decorator
---------
Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-06-18 15:21:26 -04:00
kormann
7c3b877216
rename uop [run_process_replay] ( #5031 )
* rename
* fix unittests
* rename vin
* fix test
* fix type [run_process_replay]
* rm pre commit hook change
2024-06-18 21:34:05 +03:00
chenyu
dc942bf1f6
jit sampling function in test_randomness.test_multinomial ( #5034 )
* jit sampling function in test_randomness.test_multinomial
`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec
* skip that
2024-06-18 14:21:05 -04:00
Francis Lam
8d33998e0d
[run_process_replay] linearizer: fix get_grouping_dims to respect global/local max ( #4855 )
* linearizer: fix get_grouping_dims to respect global/local max
* fix lidx variable index offset and unrestrict clang/llvm global len
* test reverse variable indexing when reverse_dims is true
* change the collapse axis to be the right most if reversed
2024-06-18 16:51:27 +03:00
Junjun Dong
c8cd6e725c
Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset ( #4977 )
* feat: remove BinaryOps.SUB
* remove SUB in test_early_end_local
* regenerate dataset. remove SUB in test_linearizer_*
* reenable overflow tests
* simplify tensor.sub function by returning a+(-b)
* remove whitespaces
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-18 09:06:13 -04:00
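The user-visible behavior is unchanged; per the bullets above, a - b is now expressed as a + (-b). A tiny sketch:

```python
from tinygrad import Tensor

a, b = Tensor([3.0, 5.0]), Tensor([1.0, 2.0])
print((a - b).numpy())       # [2. 4.]
print((a + (-b)).numpy())    # same result; SUB is built from ADD and NEG underneath
```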
chenyu
620fa6e5a2
check Tensor.reshape can have at most one -1 ( #5026 )
raise RuntimeError to match torch; on master it throws weird errors from the shapetracker
2024-06-18 08:17:12 -04:00
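Sketch of the check (RuntimeError to match torch, as stated above):

```python
from tinygrad import Tensor

x = Tensor.ones(2, 3, 4)
print(x.reshape(6, -1).shape)   # (6, 4): a single -1 is inferred
try:
  x.reshape(-1, -1, 4)          # two -1s are ambiguous
except RuntimeError as e:
  print(e)
```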
chenyu
acaf9a490d
RECIP(-0.0) should be -inf ( #5024 )
* RECIP(-0.0) should be -inf
added test_dtype_alu for the PYTHON backend
* catch that
* fix those two
2024-06-17 22:26:58 -04:00
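The IEEE 754 behavior being matched, shown with numpy (an illustration, not the PYTHON backend code):

```python
import numpy as np

with np.errstate(divide="ignore"):
  print(np.float32(1.0) / np.float32(-0.0))   # -inf: the sign of the zero is kept
  print(np.float32(1.0) / np.float32(0.0))    # inf
```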