Commit Graph

5649 Commits

chenyu
7c9c8ce22f use TensorProto enum in onnx dtype mapping [run_process_replay] (#6151) 2024-08-17 17:58:40 -04:00
chenyu
f7950fc2b6 add E275 missing-whitespace-after-keyword linting rule (#6149)
requires space after keywords like `assert`, `not`, `return`, `else`
2024-08-17 16:44:34 -04:00
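The E275 rule added in this commit (pycodestyle's "missing whitespace after keyword") can be illustrated with a small before/after sketch; the function names here are hypothetical, not from the repo:

```python
# Before: triggers E275 -- the keyword `not` is written flush against a paren.
def is_odd_flagged(n):
    return not(n % 2 == 0)  # E275: missing whitespace after keyword

# After: a space after the keyword satisfies the linter; semantics are unchanged.
def is_odd_clean(n):
    return not (n % 2 == 0)

assert is_odd_flagged(3) == is_odd_clean(3) == True
assert is_odd_flagged(4) == is_odd_clean(4) == False
```

Both forms parse identically; the rule is purely stylistic, which is why the commit could land with `[run_process_replay]`-style confidence that behavior is unchanged.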
chenyu
da4fa77e92 move import cProfile and pstats inside Profiling class (#6148) 2024-08-17 16:08:53 -04:00
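Moving `cProfile`/`pstats` imports inside the class that uses them is a standard lazy-import pattern: the cost is only paid when profiling is actually requested, keeping module import time low. A minimal sketch (the `Profiling` name mirrors the commit message; the implementation details are assumptions, not tinygrad's actual code):

```python
class Profiling:
    """Context manager that defers the cProfile/pstats import cost until used."""
    def __init__(self, sort="cumtime"): self.sort = sort
    def __enter__(self):
        import cProfile  # imported lazily so importing this module stays cheap
        self.prof = cProfile.Profile()
        self.prof.enable()
        return self
    def __exit__(self, *exc):
        import pstats  # only needed when the profiled block finishes
        self.prof.disable()
        self.stats = pstats.Stats(self.prof).sort_stats(self.sort)
```

Usage is `with Profiling(): ...`; nothing related to profiling is imported unless the context manager is entered.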
George Hotz
88edc2902d axis_is_masked with graph_rewrite [run_process_replay] (#6144) 2024-08-17 10:28:49 -07:00
chenyu
039163e664 more tqdm touchup (#6143)
* more tqdm touchup

don't default iterable to None, and more text cleanups

* oh iterable can be None
2024-08-17 13:06:05 -04:00
qazal
5a266d5d0c type verify ImageDType and PtrDType [run_process_replay] (#6137)
* type verify ImageDType and PtrDType [run_process_replay]

* fix tests
2024-08-17 16:37:07 +03:00
qazal
d1d41130cd use membufs in ImageDType checks [run_process_replay] (#6136)
* use membufs in ImageDType checks

* set by key [run_process_replay]
2024-08-17 16:17:46 +03:00
qazal
41ac8bdd63 verify_ast prep refactor for intermediate uops type spec (#6135)
* refactor to ops

* refactor to two functions

* the uop's shape become local_reduce
2024-08-17 15:34:18 +03:00
qazal
d9ce664350 add test_verify_ast [run_process_replay] (#6134) 2024-08-17 14:14:30 +03:00
qazal
151a62ad32 hotfix: store dtype for ImageDType (#6133) 2024-08-17 13:44:53 +03:00
George Hotz
d0513087e1 hotfix: revert axis_is_masked for stable diffusion speed 2024-08-17 00:22:08 -07:00
George Hotz
4df4845b47 cache is_int [run_process_replay] (#6131)
* cache is_int [run_process_replay]

* functools.cached_property is pretty slow
2024-08-17 00:19:03 -07:00
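The note that `functools.cached_property` is "pretty slow" refers to its descriptor-machinery overhead; a common cheaper alternative is to compute the flag once at construction so later accesses are plain attribute lookups. How the repo actually caches `is_int` is not shown in this log, so the classes below are a hypothetical comparison of the two approaches:

```python
import functools

class NodeLazy:
    def __init__(self, value): self.value = value
    @functools.cached_property
    def is_int(self):  # computed on first access, then stored in __dict__
        return isinstance(self.value, int)

class NodeEager:
    def __init__(self, value):
        self.value = value
        # cache eagerly: every access is a plain instance-dict lookup
        self.is_int = isinstance(value, int)

assert NodeLazy(3).is_int and NodeEager(3).is_int
assert not NodeLazy("x").is_int and not NodeEager("x").is_int
```

The trade-off: the eager version pays the computation even if the flag is never read, but avoids descriptor overhead entirely.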
George Hotz
3a2d724cb2 extra matcher from renderer [run_process_replay] (#6130)
* extra matcher from renderer

* cache_pm [run_process_replay]
2024-08-16 23:53:11 -07:00
George Hotz
9bc81c6db4 UOps.SHAPETRACKER (#6129)
* UOps.SHAPETRACKER [run_process_replay]

* no process replay
2024-08-16 23:26:34 -07:00
George Hotz
5048066e79 st_arg, never -1 [run_process_replay] (#6128) 2024-08-16 22:46:56 -07:00
George Hotz
9e6ad4b40f hotfix: free minor speedup 2024-08-16 21:08:03 -07:00
George Hotz
d9cb45af09 only axis is masked [run_process_replay] (#6123) 2024-08-16 21:01:17 -07:00
George Hotz
94aa5f11b5 Revert "use vmax for real_size [run_process_replay] (#6120)" (#6122)
This reverts commit a6e3211444.
2024-08-16 20:33:19 -07:00
George Hotz
a6e3211444 use vmax for real_size [run_process_replay] (#6120)
* use vmax for real_size [run_process_replay]

* axis is masked
2024-08-16 20:17:23 -07:00
George Hotz
912f01ed4b UOpGraph -> linearize_uop [run_process_replay] (#6119) 2024-08-16 19:48:39 -07:00
George Hotz
7cae152aa2 move uop logic into shapetracker [run_process_replay] (#6118) 2024-08-16 17:47:15 -07:00
George Hotz
89c7989659 no shapetracker in ops [run_process_replay] (#6117) 2024-08-16 17:23:27 -07:00
George Hotz
74ee9febec remove iter from uopgraph (#6110)
* remove iter from uopgraph

* linearize returns uops

* fix tests

* linearize in linearize

* tests fix

* touchup

* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6 merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
chenyu
379d080e74 tqdm touchup (#6113)
more precise names and don't repeat set_description
2024-08-16 17:34:21 -04:00
nimlgen
5f1554b574 amd fix uaf in program (#6114)
* amd fix uaf in program

* keep it align

* sync before free
2024-08-17 00:22:46 +03:00
qazal
d5e3217076 hotfix: scheduler differ (#6115)
* hotfix: scheduler differ

* add the test back

* track keys
2024-08-16 23:34:49 +03:00
qazal
c23d44c779 AST is UOp (#6030)
* most of the work from the uops2 branch

* schedule

* realize

* kernel

* lowerer

* search

* green

* merge uops with ops

* Revert "merge uops with ops"

This reverts commit 1408a59f12.

* fix benchmark

* remove extra dedup
2024-08-16 22:09:00 +03:00
George Hotz
d6f64c0c1f do better image indexing [run_process_replay] (#6109)
* do better image indexing [run_process_replay]

* fix tests
2024-08-16 09:55:22 -07:00
CaltropHungerton
38fb1e14a2 Intel XMX Tensor Core Support (#5622)
* fixed xmx demo

* i think i'm invoking the DPAS but it's slow

* compiler build arg to stop register spilling, indicated where to fix flop counter

* don't mind this

* do NOT mind me

* do not mind me

* do not view

* i will add bf16 later

* in process of figuring out tc fields

* we figured out the fields!!!

* added check for cl device vendor, added seperate IntelRenderer

* remove tc thread_local_aliases

* cleaning debris before draft pr

* edits for linter

* deduping and checking device extensions

* i will find more line reductions in other places

* before merge upstream

* double grf size in compiler to fix register spilling (bandaid), device checking changes

* tc python emulation

* fixed emulation

* tests for emulated intel tensor core

* TC=0, 1 working on upstream, fixed perf

* test

* debris

* check for specialized cl device when we canonicalize device

* bf16 support, tc=3 test added

* address tests

* revert half2 loads on intel tc, cleanup

* linter

* fold_expanded revert

* lint, whitespace fix

* cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too

* make line shorter, no need for noqa E501

* removed device intel

* fix python emulation

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-16 09:19:21 -07:00
George Hotz
f82ecd8802 remove uop symobilc rendering [run_process_replay] (#6108) 2024-08-16 09:02:15 -07:00
George Hotz
e8ae9af962 bump line count to 9000. we should be here a while 2024-08-16 08:46:36 -07:00
chenyu
7d46fb0c83 load balance NV benchmark ci (#6107) 2024-08-16 10:08:08 -04:00
qazal
1ff6c7c519 add more types to search [run_process_replay] (#6096)
* add more types to search [run_process_replay]

* bufs_from_lin
2024-08-16 13:19:25 +03:00
chenyu
e5da88873b enable UOP_IS_SYMBOLIC (#5954) 2024-08-16 00:15:46 -04:00
George Hotz
553ae9ebc0 bilinear interp uint8 fails (#6103)
* new test for e2e compile failures

* fix bug

* bilinear interp uint8 fails

* better tests
2024-08-15 19:34:39 -07:00
George Hotz
c850e03758 new test for e2e compile failures (#6101)
* new test for e2e compile failures

* fix bug
2024-08-15 18:56:22 -07:00
chenyu
e4a7869893 move cancel mod pattern into mod_folding (#6100)
changed some kernel in a good way because x does not go through add chain
2024-08-15 19:04:18 -04:00
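The "cancel mod" pattern referenced above rests on the identity that terms divisible by the modulus drop out of a sum: `(a + k*n) % n == a % n`. A quick exhaustive check (illustrative only, not tinygrad's rewrite code):

```python
# Multiples of the divisor cancel out of a mod; holds for Python's
# floored-mod semantics even with negative operands.
for a in range(-20, 20):
    for k in range(-3, 4):
        for n in range(1, 8):
            assert (a + k * n) % n == a % n
print("cancel-mod identity verified")
```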
qazal
11d62668a3 refactor ast ops dtype access [run_process_replay] (#6093)
* refactor ast ops dtype access [run_process_replay]

* fix assert message
2024-08-15 19:13:33 +03:00
chenyu
9ef82e1f2b UOp pattern DEFINE_VAR with min==max is also CONST (#6095)
* UOp pattern DEFINE_VAR with min==max is also CONST

* fix tests
2024-08-15 12:09:44 -04:00
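The DEFINE_VAR pattern above is simple interval reasoning: a variable whose lower and upper bounds coincide can only ever take one value, so it folds to a constant. A toy sketch of the idea (the `Var`/`Const` classes here are hypothetical stand-ins, not tinygrad's UOp):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str
    vmin: int
    vmax: int

@dataclass(frozen=True)
class Const:
    value: int

def fold_var(v: Var):
    # a variable with min == max has exactly one possible value: it's a constant
    return Const(v.vmin) if v.vmin == v.vmax else v

assert fold_var(Var("i", 4, 4)) == Const(4)
assert fold_var(Var("i", 0, 4)) == Var("i", 0, 4)
```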
chenyu
a41c9dd12c test py.typed as a package (#6094)
* test py.typed as a package

* try this?

* and this

* try that?

* add this back

* cleanup
2024-08-15 11:19:08 -04:00
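The `py.typed` commits relate to PEP 561: a package advertises inline type annotations by shipping an empty `py.typed` marker file and declaring it as package data. A hedged sketch of the relevant `setup.py` fragment (the exact layout tinygrad uses is assumed, not taken from this log):

```python
# setup.py fragment (hypothetical layout): ship the PEP 561 marker so
# type checkers like mypy read the package's inline annotations.
from setuptools import setup

setup(
    name="tinygrad",
    packages=["tinygrad"],
    package_data={"tinygrad": ["py.typed"]},
)
```

Without the marker in the built distribution, type checkers treat the installed package as untyped, which is what the "test py.typed as a package" commit guards against.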
qazal
25dffb2079 kernel.py more typing [run_process_replay] (#6092) 2024-08-15 17:59:24 +03:00
qazal
4d38fec8c1 rename lazyops to parents [run_process_replay] (#6091) 2024-08-15 17:27:32 +03:00
chenyu
5accfe26a0 rewrite bool ADD to OR and MUL to AND (#6084)
* rewrite bool ADD to OR and MUL to AND

fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.

only can repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure

* fold those, and fix tests

* only for bool

* move dtypes.bool
2024-08-15 10:11:57 -04:00
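The bool rewrite above is sound because, for boolean operands, addition (truncated back to bool) behaves exactly like OR and multiplication like AND. An exhaustive check over both operands:

```python
for a in (False, True):
    for b in (False, True):
        # ADD on bools, saturated back to bool, is logical OR
        assert bool(a + b) == (a or b)
        # MUL on bools is logical AND
        assert bool(a * b) == (a and b)
print("bool ADD==OR, MUL==AND verified")
```

This is why the commit restricts the pattern to `dtypes.bool`: for wider integer dtypes `a + b` and `a | b` diverge (e.g. `1 + 1 == 2` but `1 | 1 == 1`).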
nimlgen
b765996d54 hcq remove offset from progs (#6090) 2024-08-15 17:02:54 +03:00
chenyu
df03dca6e3 move % inside UOp mod_folding and remove deprecated tests (#6085)
[run_process_replay]
2024-08-14 23:25:10 -04:00
George Hotz
c6e117c899 add a single py.typed (#6083) 2024-08-14 17:31:46 -07:00
qazal
2bf7b56485 minor test fixups from the AST is UOp diff (#6081)
* add assert_equiv_uops cache

* dont expect lowering and schedule errors
2024-08-14 23:58:04 +03:00
chenyu
95aa6d8ccd remove redundant x/c pattern [run_process_replay] (#6082)
there's no div and 1/c is const folded
2024-08-14 16:57:39 -04:00
chenyu
a61cb1ff7c move mod mod pattern into generic mod folding (#6077) 2024-08-14 16:24:21 -04:00
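The "mod mod" pattern folded here relies on the identity `(x % n) % n == x % n`, and more generally `(x % (k*n)) % n == x % n` for non-negative `x`, since an inner modulus that is a multiple of `n` leaves the residue modulo `n` unchanged. A quick exhaustive check (illustrative, not the repo's rewrite code):

```python
for x in range(100):
    for n in range(1, 10):
        # a nested mod by the same divisor is redundant
        assert (x % n) % n == x % n
        for k in range(1, 5):
            # an inner modulus that is a multiple of n also folds away
            assert (x % (k * n)) % n == x % n
print("mod-mod folding identities verified")
```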