Commit Graph

4821 Commits

Author SHA1 Message Date
Francis Lam
b563cd52ed linearizer: change globals to merge into left axis/gridDims.x first (#5033)
* linearizer: change order of collapse to be left-most

also fixes Variable max size to be correct and add docs for the off
parameter

* fix multiple global dim oversizes

* add passing variable test and reorganize tests

* use assert RuntimeError for failing test
2024-06-23 18:53:15 -04:00
nimlgen
69f116a7e1 nv/amd profiler (#4718)
* nv/amd profiler

* fix

* fix

* profile copies

* profile logger

* fixes

* more fixes

* less lines and fixes

* fixes

* some linter

* back sync, no related change

* fix gpu2cpu time def

* simpler

* linter

* linter

* docs

* add add_event api
2024-06-23 17:10:12 +03:00
qazal
64a3b7931e simplify render_ops ctx [run_process_replay] (#5116)
* new ctx

* delete DEFINE_VAR

* lt isnt static
2024-06-23 16:56:32 +03:00
qazal
28bf8d86d8 test_linearizer with multi output ASTs (#5115)
* ast is tuple

* run test_phi_simplification

* update reason

* more tc

* beam

* a few more

* use test_opt directly
2024-06-23 15:41:24 +03:00
chenyu
ee0c6dfc15 build Tensor._tri with movements only (#5110)
* build Tensor._tri with movements only

doesn't need arange, saved a kernel in attention mask

* simpler, more tests
2024-06-23 00:07:36 -04:00
chenyu
20fabd8a5b update Tensor.triu and Tensor.tril (#5109)
renamed arg to `diagonal` that matches torch api, and added document and examples
2024-06-22 21:59:50 -04:00
chenyu
8f6ae84e4a minor cleanup of conv_transpose2d (#5108)
* minor cleanup of conv_transpose2d

* that
2024-06-22 21:31:47 -04:00
chenyu
33211f356b fix desc in tqdm (#5107)
per doc `https://tqdm.github.io/docs/tqdm/`, user does not need to put `: ` in desc, and `: ` is automatically removed after desc if the latter is empty.

updated test cases and added a test for set_description
2024-06-22 19:00:38 -04:00
chenyu
055e616302 cleanup mnist data load in beautiful_mnist (#5106) 2024-06-22 18:31:51 -04:00
chenyu
5516b790ad hotfix append colon space to tqdm set_description (#5105) 2024-06-22 18:09:14 -04:00
chenyu
e356807696 tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
chenyu
8080298739 s/tinytqdm/tqdm (#5103)
except in unit test where tqdm is imported
2024-06-22 14:18:26 -04:00
George Hotz
9f875123b6 small changes from lowerer. [run_process_replay] [no_assert] (#5102) 2024-06-22 11:09:35 -07:00
chenyu
e468601226 update llama attention casting (#5096)
* update llama attention casting

updated scaled_dot_product_attention middle cast and removed hard-coded half in llama attention.

* fix that
2024-06-22 10:57:17 -04:00
chenyu
ca021229e4 fix attention to always return in the same dtype as input (#5100)
mid cast to default_float does not work as intended when default is float32 and qkv is in half
2024-06-22 10:34:57 -04:00
nimlgen
2dcef5a0d7 hcq spec (#5081)
* hcq spec

* small change

* not used import

* fixes

* fix

* signals into base class

* more into base class

* remove imports

* fix wrap timeline

* raise when not implemented

* simpler
2024-06-22 15:32:12 +03:00
chenyu
8bd6cb9511 update llama model RMSNorm casting (#5095)
following the original implementation, cast back to input dtype before multiplying weight. slightly faster
https://github.com/meta-llama/llama/blob/main/llama/model.py
2024-06-21 23:02:04 -04:00
chenyu
0c857ae2d6 some onnx_ops cleanups (#5094) 2024-06-21 22:01:32 -04:00
kormann
f4a041af16 Simplify graph_dedup [run_process_replay] (#5084)
* reset master

* remvpe double default
2024-06-21 22:12:30 +03:00
chenyu
00593d6095 clean the long lines in avg_pool2d and max_pool2d (#5091) 2024-06-21 14:46:56 -04:00
chenyu
a971dc6218 argmax(axis=None) is argmax.flatten().argmax(0) (#5090)
removed the alternative code path
2024-06-21 14:17:10 -04:00
chenyu
166a2b19b5 fix reduce axis of 0d tensors (#5089)
`x.sum(())` is fine, and `x.sum((1,))` should throw IndexError
2024-06-21 13:51:40 -04:00
chenyu
3ff048b68c type annotate reduce axis in tensor.py (#5088) 2024-06-21 13:06:10 -04:00
chenyu
36b4a492a1 explicitly check getitem indices can have at most one ellipsis (#5087)
* explicitly check getitem indices can have at most one ellipsis

previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```

this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```

* oh we have that already

* test that

* test these
2024-06-21 12:33:18 -04:00
nimlgen
f1e758bacb graph fuzzer (#5082)
* graph fuzzer

* more options

* mypy

* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal
5717a54b28 don't use Tensor.empty in kernel opts tests (#5086) 2024-06-21 18:41:03 +03:00
qazal
8aa786232d docs for running process replay locally (#5083) 2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe io_uring for copies from disk (#5035)
* exp uring

* fixes and old version

* nv

* cleaner

* cmp vs aio

* fix

* no lib

* fix nv

* linter

* disk_speed_test now runs default

* fixes

* uring -> io_uring

* linter happy

* get_temp_buf comment added

* tiny nits

* put wait back

* test runs everywhere

* remove consts

* remove mmap consts

* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
George Hotz
b69afc67d8 tinybox docs typo 2024-06-20 17:58:40 -07:00
George Hotz
6bc5e5f41c start tinybox docs 2024-06-20 17:04:45 -07:00
chenyu
f6d6760f71 don't cast tuple to list before creating Tensor (#5071)
Tensor constructor supports creating from tuple now
2024-06-20 13:32:56 -04:00
qazal
97f1347dd9 fix check_process_replay for special characters (#5072)
* 'test' [run_process_replay] [no_assert]

* test with ( ) { } '' " "

* remove the log [run_process_replay] '' () { } '{

* helpful echos [run_process_replay] [no_assert] () ''

* test [run_process_replay] [no_assert]

* test2 [run_process_replay] [no_assert]

* test3 [run_process_replay] [no_assert]

* it's also correct this way [run_process_replay] [no_assert]

* remove extras [run_process_replay]
2024-06-20 20:23:29 +03:00
George Hotz
6f6b3b10c9 import from uops, not linearizer (#5064) 2024-06-20 08:08:44 -07:00
chenyu
50700171ef minor cleanup to reshape arg handling (#5070)
moved None handle to be with argfix, and only resolve -1 if there's a -1
2024-06-20 10:27:27 -04:00
chenyu
f4355d0f1b check Tensor.permute input arg is a valid permutation (#5069)
also added support of negative axes
2024-06-20 10:01:28 -04:00
qazal
24c89a2a33 move assert_equiv_uops to helpers + use == for dtypes (#5067)
* dtypes should use ==

* use TestUOps

* should use assertIs
2024-06-20 16:39:34 +03:00
chenyu
e8f39fcaaa check arg to Tensor.flip can appear only once (#5068)
* check arg to Tensor.flip can appear only once

raise RuntimeError if there are multiple

* fix test
2024-06-20 09:33:42 -04:00
qazal
55e02cdd84 generic gate folding (#5061)
* add assert

* fold truthy gates [run_process_replay]

* fold falsy gates [run_process_replay] [no_assert]

* redo asserts

* check both barriers

* spec start

* spec end

* assert srcs

* make test_fold_gated_load_local better

* [run_process_replay] [no_assert]
2024-06-20 16:10:08 +03:00
kormann
bdca2da2be typannos (#5059) 2024-06-20 09:02:31 -04:00
chenyu
5f7edc7a46 minor cleanup getting output shape of _pool (#5065)
math.ceil makes intent clear, also same formula works in both cases [run_process_replay]
2024-06-20 09:00:48 -04:00
qazal
a6a5dba637 Revert "UPat for has_valid in load/store (#5052)" (#5056)
* manually insert in the Linearizer

* fix process replay
2024-06-19 20:53:36 +03:00
qazal
ee01e464e3 use process replay as a diff creator (#4903)
* add no_assert option [run_process_replay] [no_assert]

* test [run_process_replay] [no_assert]

* [run_process_replay]

* back to normal [run_process_replay]

* remove the log
2024-06-19 18:17:31 +03:00
qazal
99fc275c27 UPat line savings [run_process_replay] (#5053)
* line savings

* move to new style
2024-06-19 12:43:20 +03:00
qazal
71194df1da UPat for has_valid in load/store [run_process_replay] (#5052)
* fold gated load/store [run_process_replay]

* handle temp loads

* direct store
2024-06-19 12:20:22 +03:00
chenyu
996788358d minor change to gelu (#5048)
used `math.sqrt(2 / math.pi)` instead of `0.7978845608`, and moved one mul self inside parentheses. this matched the paper and llm.c
2024-06-18 22:26:56 -04:00
chenyu
4c7e316ded update pylint for ops_python (#5046)
the two errors (cell-var-from-loop and arguments-out-of-order) does not apply as we use it as intended.
2024-06-18 20:15:34 -04:00
wozeparrot
acb715c64c fix: llama3 special tokens (#5045) 2024-06-18 17:08:44 -07:00
chenyu
a8e9307e0b pylint runtime/ and shape/ (#5044)
as pointed out by #4877, need to add `__init__.py` to trigger pylint. fixed some errors except ops_python (will do in a separate pr, it has a lot of errors), and sub-folders in runtime
2024-06-18 19:48:18 -04:00
chenyu
cc2be9064f fix out of bound python list into numpy array (#5043)
numpy 2.0 does not allow oob python const and recommends writing as `np.array(value).astype(dtype)`
2024-06-18 18:05:21 -04:00
chenyu
4e5add4d01 move test_tqdm to test/unit/ (#5042) 2024-06-18 17:41:39 -04:00