10633 Commits

Author SHA1 Message Date
qazal
dae3615008 replace viz graph when it's sink (#6541) 2024-09-16 16:00:27 +08:00
qazal
2a5a53c3db remove extra scheduler graph call, VIZ does this [run_process_replay] (#6540) 2024-09-16 14:52:50 +08:00
George Hotz
c1b2472dea reorder alu/vectorize (#6538) 2024-09-16 14:28:14 +08:00
George Hotz
42ba887daa remove logic to vectorize reduces (#6536)
* remove logic to vectorize reduces

* fix tests
2024-09-16 14:04:48 +08:00
qazal
607113fcdf fix vectorized dtype repr [run_process_replay] (#6535) 2024-09-16 13:42:55 +08:00
qazal
9b9b83b8b0 viz tests (#6532)
* viz fuzz tests

* caching

* print timings

* hotfix: update currentRewrite onClick

* import from typing

* indent into __main__
2024-09-16 13:08:42 +08:00
George Hotz
07bd6e070d add more uops tests for vmin/vmax/const_factor/divides (#6533) 2024-09-16 13:06:31 +08:00
ignaciosica
c447ec2190 Fix amx shape [run_process_replay] (#6524)
* fix amx shape (sz,sz,sz) -> (sz,sz,1)

* revert check
2024-09-16 09:49:55 +08:00
chenyu
1683b274b6 main example we want the valid removed (#6527)
* main example we want the valid removed

* ast lines are long
2024-09-15 21:49:10 -04:00
George Hotz
e1b21879a7 minor changes from new expand [run_process_replay] (#6528)
* minor changes from new expand [run_process_replay]

* explain that
2024-09-16 09:48:37 +08:00
Tim Becker
3450382a77 Don't re-check patterns when uop.arg is None (#6525) 2024-09-16 09:46:59 +08:00
qazal
a104ecf79b refactor for SWIZZLE with different st dims [run_process_replay] (#6526)
* refactor for supporting swizzles with different shape dims [run_process_replay]

* rename
2024-09-16 09:41:03 +08:00
George Hotz
21835fc08c more graph rewrite tests (#6521) 2024-09-16 09:20:54 +08:00
chenyu
6be0cc387c _get_add_chain(x) -> _get_chain(x, BinaryOps.ADD) (#6523)
need MUL for valid [run_process_replay]
2024-09-15 10:54:13 -04:00
chenyu
b2c286f567 fix typing for test_ops (#6520)
mostly passed TYPED=1 python3 -m pytest -n=auto test/test_ops.py.

one last test specifically set an invalid value to test the exception, and to ignore that we need to import typeguard. And to get a working version of typeguard, we would need to get rid of dependency on tensorflow_addons because it requires a very old version of typeguard
2024-09-15 06:18:36 -04:00
George Hotz
cd90092f14 graph rewrite tests (#6519)
* more graph rewrite tests

* more complex test cases

* more tests

* more tests

* cleanups

* 9600 lines

* cleanups
2024-09-15 17:29:16 +08:00
qazal
89b950c6b3 viz more work (#6517)
* infra

* actually replace the UOp

* extra per rewrite

* dont allow pyint
2024-09-15 16:42:17 +08:00
qazal
f69251c6b4 assert pyint in linearize_uop [run_process_replay] (#6518) 2024-09-15 16:29:05 +08:00
George Hotz
5132bab48d hotfix: add TYPED=1 support 2024-09-15 14:44:26 +08:00
qazal
2d53e47b14 refactor viz saved context (prereq for tree view) (#6516)
* more styling

* warns

* refactor viz ctx to dataclass

* meh, fine for now

* name ctx

* allow smaller zooms

* more work

* fixup ctx.diffs
2024-09-15 14:08:55 +08:00
qazal
893a24f60f viz minor stuff (#6515)
* some style cleanups

* wrap rewrites
2024-09-15 12:05:31 +08:00
qazal
d0262ac6ab make ScheduleItem hashable [run_process_replay] (#6512) 2024-09-14 18:31:33 +08:00
qazal
4ffb722d4e var_vals prereq for deleting LBScheduleItem [run_process_replay] (#6511) 2024-09-14 17:00:30 +08:00
George Hotz
9188245677 Viz (#6502)
* start viz tool

* start work

* more readme

* graceful shutdown that reloader

* add VIZ=1

* aesthetics

* typings

* more work

* work left

* more work on rewrites saving

* maybe try zoom

* add some metadata

* generic extra, show code and ast

* more tooling

* add rewritten graphs

* show graph_rewrites

* small details

* more diff cleanups

* differ as the cherry on top

* no useless styles

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-14 16:15:29 +08:00
nimlgen
052bf43ed4 dsp check buffers count (#6509) 2024-09-14 10:16:58 +03:00
qazal
ee5902d347 hotfix: remove rewrite.py from ops [run_process_replay] (#6508) 2024-09-14 10:02:47 +08:00
nimlgen
81a4a9623c add qcom dsp runtime (#6112)
* calling qualcomm dsp from python

* include so files

* add include file

* adsprpc.py

* running with adsprpc

* work

* 32-bit support in elf

* compilation works

* ion

* msm_ion

* working DSP backend

* getting 500 MFLOPS on matmul

* beam works with timing

* move to autogen

* disasm

* progress

* simple tests pass

* qcom_dsp

* more dsp autogen

* progress

* some progress

* works w/o lib

* checkpoint

* no lib

* ugh, better

* cleaner, but with lib. test good, but with the hack

* remove autogens

* small

* push

* simpler

* revert this

* run_3

* simpler

* android

* handle

* run it

* why?

* run2

* to gen

* cc

* cleaner

* elf

* part of autogen

* comment

* no lib

* autogen

* linter

* bug reproducer

* cleaner

* this repro is almost empty and doesn't work!!!!

* with this test_ops passes, no crashes anymore

* cleaner

* linter

* renames

* shorter

* remove contextlib

* ugh

* myoy

* cleaner

* cleaner

* remove import

* conn

* import

* revert this

* remove heavy .so

* shorter alloc

* not true anymore

---------

Co-authored-by: Comma Device <device@comma.ai>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <george@comma.ai>
2024-09-13 21:01:33 +03:00
nimlgen
ca63207d23 clang compiler args (#6505) 2024-09-13 19:22:27 +03:00
George Hotz
774bf39f85 saving rewrites [run_process_replay] (#6501)
* save rewrites with TRACK_MATCH_STATS=2 [run_process_replay]

* cleaner
2024-09-13 15:02:27 +08:00
Tim Becker
7c078191ce Misc rewrite perf improvements (#6500)
* Make UOp a normal class and use __slots__

* Use __slots__ in UPat

* Cache dtypes.{min,max}

* Use faster iterables in ops.py

* extend is a lot faster than nested listcomp

Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>

---------

Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>
2024-09-13 11:31:50 +08:00
Tim Becker
8c4cab8d6e Even faster enums (#6483)
* Even faster enums

* simpler _generate_next_value impl

* FastEnum in ops only

* Better uniqueness for FastEnum
2024-09-12 20:08:02 +08:00
George Hotz
9543e4c92e more expand prereqs [run_process_replay] (#6499) 2024-09-12 17:46:12 +08:00
George Hotz
327eb12600 folding for vectorized consts [run_process_replay] (#6498)
* folding for vectorized consts [run_process_replay]

* remove that if statement

* inf loop
2024-09-12 17:29:37 +08:00
George Hotz
a532d59bbd gep tuple [run_process_replay] (#6495)
* gep tuple [run_process_replay]

* no inf loop, that goes in expander

* fix ops python

* unbreak gep 0

* fix tests

* fix tests

* VECTORIZE/GEP

* oops, broken
2024-09-12 16:37:31 +08:00
George Hotz
6dfa63cb21 more vconst stuff + gep tuple [run_process_replay] (#6494)
* more vconst stuff [run_process_replay]

* revert that

* fix inf loop
2024-09-12 14:58:14 +08:00
qazal
4507ab8016 more upat styling changes [run_process_replay] (#6492)
* more upat styling

* single to double quotes

* wrap line

* comments
2024-09-12 14:40:16 +08:00
qazal
63ea446339 s/None/dtypes.void in docs [run_process_replay] (#6493)
* s/None/dtypes.void in docs [run_process_replay]

* not arg

* now the asts in docs

* more fixup
2024-09-12 14:27:37 +08:00
George Hotz
119b0ea4af add UOps.VCONST [run_process_replay] (#6487)
* add UOps.VCONST [run_process_replay]

* VCONST folding

* simpler devectorize

* alu

* revert that type
2024-09-12 14:03:39 +08:00
qazal
4dc9436d63 use more UPat.var and UPat.cvar [run_process_replay] (#6491) 2024-09-12 13:52:41 +08:00
qazal
e5e14fc4ef all UOp methods need dtype [run_process_replay] (#6490)
* all UOp methods need dtype [run_process_replay]

* delete all type: ignores yay
2024-09-12 13:38:14 +08:00
George Hotz
76487a3533 remove nop, use upat [run_process_replay] (#6489)
* remove nop, use upat [run_process_replay]

* mypy passes

* no wonder nothing worked

* fixes
2024-09-12 12:16:19 +08:00
George Hotz
f12f0857d8 add UOps.VCONST (just the uop) [run_process_replay] (#6488)
* empty branch process replay

* add VCONST
2024-09-12 11:16:20 +08:00
qazal
00d4bf16d8 new utils for scheduler graph rewrite [run_process_replay] (#6485) 2024-09-12 10:01:24 +08:00
qazal
a17ea53340 delete USE_COPY_KERNEL from the scheduler [run_process_replay] (#6482) 2024-09-12 07:45:31 +08:00
nimlgen
eac046ea55 hcq check queue size before submit (#6481) 2024-09-11 23:13:13 +03:00
qazal
dda5c63f4a things we can delete after dtypes.void [run_process_replay] (#6480) 2024-09-11 19:21:41 +08:00
qazal
bce73c9a54 more scheduler graph_rewrite cleanups [run_process_replay] (#6479) 2024-09-11 18:26:35 +08:00
George Hotz
bdd0c06f29 add void type to uop (#6471)
* unwrap_dtype maybe

* uopgraph stuff that hardcoded None

* test_ops passes

* dtypes.py fixups

* update test_linearizer and friends

* more ast updates

* test_beam and test_schedule too

* add void type to uop [run_process_replay]

* remove dumb casts

* start making it green

* more cast cleanups

* more cls methods to fix

* regenerate dataset

* split UOp and NOp const

* maybe that too

* fix docs

* update test_uop_symbolic

* test_verify_ast

* new sops with no diff

* meh, type_ignore is alright

* remove that assert

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
George Hotz
1b4d1823b7 add pyint to DTYPES_DICT [run_process_replay] (#6477)
* add pyint to DTYPES_DICT [run_process_replay]

* also fix uop alu bug

* exclude pyint there too

* ne ne

* force explicit dtype
2024-09-11 17:31:59 +08:00
qazal
5cc142c8b8 add uop.swizzle(st) (#6476) 2024-09-11 16:52:42 +08:00