kormann
f5dd25d376
enable whisper batch for long sequences ( #6458 )
...
* long batch +test
* long batch +test
* cleanup
* rollback syntactic changes
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-17 00:42:10 -04:00
chenyu
7c942418a1
other side of simple out of bound valid case ( #6552 )
...
462 -> 455
2024-09-16 23:57:15 -04:00
chenyu
aeaf7894a7
more generic version of #6548 ( #6549 )
...
x*(-1)<0 can be generalized to x*(-1)<c, 473 -> 462 valids
2024-09-16 23:17:16 -04:00
chenyu
596f41eb46
simple drop image valid case ( #6548 )
...
* simple drop image valid case
started unit test, 530 -> 473 valids
* cleanup
2024-09-16 22:54:07 -04:00
chenyu
798be6bb74
add gated read_image count in openpilot compile2 ( #6546 )
...
530 to go
2024-09-16 21:17:00 -04:00
nimlgen
665b4203f8
dsp power management ( #6544 )
...
* dsp power management
* not needed
* oops
2024-09-16 23:34:01 +08:00
nimlgen
25d8f3046a
dsp do not flush libs to ds ( #6531 )
...
* dsp use sc
* no flush to fs
* ruff
* tiny nit
* shorter
2024-09-16 16:42:15 +08:00
qazal
dae3615008
replace viz graph when it's sink ( #6541 )
2024-09-16 16:00:27 +08:00
qazal
2a5a53c3db
remove extra scheduler graph call, VIZ does this [run_process_replay] ( #6540 )
2024-09-16 14:52:50 +08:00
George Hotz
c1b2472dea
reorder alu/vectorize ( #6538 )
2024-09-16 14:28:14 +08:00
George Hotz
42ba887daa
remove logic to vectorize reduces ( #6536 )
...
* remove logic to vectorize reduces
* fix tests
2024-09-16 14:04:48 +08:00
qazal
607113fcdf
fix vectorized dtype repr [run_process_replay] ( #6535 )
2024-09-16 13:42:55 +08:00
qazal
9b9b83b8b0
viz tests ( #6532 )
...
* viz fuzz tests
* caching
* print timings
* hotfix: update currentRewrite onClick
* import from typing
* indent into __main__
2024-09-16 13:08:42 +08:00
George Hotz
07bd6e070d
add more uops tests for vmin/vmax/const_factor/divides ( #6533 )
2024-09-16 13:06:31 +08:00
ignaciosica
c447ec2190
Fix amx shape [run_process_replay] ( #6524 )
...
* fix amx shape (sz,sz,sz) -> (sz,sz,1)
* revert check
2024-09-16 09:49:55 +08:00
chenyu
1683b274b6
main example we want the valid removed ( #6527 )
...
* main example we want the valid removed
* ast lines are long
2024-09-15 21:49:10 -04:00
George Hotz
e1b21879a7
minor changes from new expand [run_process_replay] ( #6528 )
...
* minor changes from new expand [run_process_replay]
* explain that
2024-09-16 09:48:37 +08:00
Tim Becker
3450382a77
Don't re-check patterns when uop.arg is None ( #6525 )
2024-09-16 09:46:59 +08:00
qazal
a104ecf79b
refactor for SWIZZLE with different st dims [run_process_replay] ( #6526 )
...
* refactor for supporting swizzles with different shape dims [run_process_replay]
* rename
2024-09-16 09:41:03 +08:00
George Hotz
21835fc08c
more graph rewrite tests ( #6521 )
2024-09-16 09:20:54 +08:00
chenyu
6be0cc387c
_get_add_chain(x) -> _get_chain(x, BinaryOps.ADD) ( #6523 )
...
need MUL for valid [run_process_replay]
2024-09-15 10:54:13 -04:00
chenyu
b2c286f567
fix typing for test_ops ( #6520 )
...
Mostly passes TYPED=1 python3 -m pytest -n=auto test/test_ops.py.
One last test specifically sets an invalid value to test the exception; to ignore that we would need to import typeguard. To get a working version of typeguard, we would need to drop the dependency on tensorflow_addons, because it requires a very old version of typeguard.
2024-09-15 06:18:36 -04:00
George Hotz
cd90092f14
graph rewrite tests ( #6519 )
...
* more graph rewrite tests
* more complex test cases
* more tests
* more tests
* cleanups
* 9600 lines
* cleanups
2024-09-15 17:29:16 +08:00
qazal
89b950c6b3
viz more work ( #6517 )
...
* infra
* actually replace the UOp
* extra per rewrite
* dont allow pyint
2024-09-15 16:42:17 +08:00
qazal
f69251c6b4
assert pyint in linearize_uop [run_process_replay] ( #6518 )
2024-09-15 16:29:05 +08:00
George Hotz
5132bab48d
hotfix: add TYPED=1 support
2024-09-15 14:44:26 +08:00
qazal
2d53e47b14
refactor viz saved context (prereq for tree view) ( #6516 )
...
* more styling
* warns
* refactor viz ctx to dataclass
* meh, fine for now
* name ctx
* allow smaller zooms
* more work
* fixup ctx.diffs
2024-09-15 14:08:55 +08:00
qazal
893a24f60f
viz minor stuff ( #6515 )
...
* some style cleanups
* wrap rewrites
2024-09-15 12:05:31 +08:00
qazal
d0262ac6ab
make ScheduleItem hashable [run_process_replay] ( #6512 )
2024-09-14 18:31:33 +08:00
qazal
4ffb722d4e
var_vals prereq for deleting LBScheduleItem [run_process_replay] ( #6511 )
2024-09-14 17:00:30 +08:00
George Hotz
9188245677
Viz ( #6502 )
...
* start viz tool
* start work
* more readme
* graceful shutdown that reloader
* add VIZ=1
* aesthetics
* typings
* more work
* work left
* more work on rewrites saving
* maybe try zoom
* add some metadata
* generic extra, show code and ast
* more tooling
* add rewritten graphs
* show graph_rewrites
* small details
* more diff cleanups
* differ as the cherry on top
* no useless styles
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-14 16:15:29 +08:00
nimlgen
052bf43ed4
dsp check buffers count ( #6509 )
2024-09-14 10:16:58 +03:00
qazal
ee5902d347
hotfix: remove rewrite.py from ops [run_process_replay] ( #6508 )
2024-09-14 10:02:47 +08:00
nimlgen
81a4a9623c
add qcom dsp runtime ( #6112 )
...
* calling qualcomm dsp from python
* include so files
* add include file
* adsprpc.py
* running with adsprpc
* work
* 32-bit support in elf
* compilation works
* ion
* msm_ion
* working DSP backend
* getting 500 MFLOPS on matmul
* beam works with timing
* move to autogen
* disasm
* progress
* simple tests pass
* qcom_dsp
* more dsp autogen
* progress
* some progress
* works w/o lib
* checkpoint
* no lib
* ugh, better
* cleaner, but with lib. test good, but with the hack
* remove autogens
* small
* push
* simpler
* revert this
* run_3
* simpler
* android
* handle
* run it
* why?
* run2
* to gen
* cc
* cleaner
* elf
* part of autogen
* comment
* no lib
* autogen
* linter
* bug reproducer
* cleaner
* this repro is almost empty and doesn't work!!!!
* with this test_ops passes, no crashes anymore
* cleaner
* linter
* renames
* shorter
* remove contextlib
* ugh
* myoy
* cleaner
* cleaner
* remove import
* conn
* import
* revert this
* remove heavy .so
* shorter alloc
* not true anymore
---------
Co-authored-by: Comma Device <device@comma.ai>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <george@comma.ai>
2024-09-13 21:01:33 +03:00
nimlgen
ca63207d23
clang compiler args ( #6505 )
2024-09-13 19:22:27 +03:00
George Hotz
774bf39f85
saving rewrites [run_process_replay] ( #6501 )
...
* save rewrites with TRACK_MATCH_STATS=2 [run_process_replay]
* cleaner
2024-09-13 15:02:27 +08:00
Tim Becker
7c078191ce
Misc rewrite perf improvements ( #6500 )
...
* Make UOp a normal class and use __slots__
* Use __slots__ in UPat
* Cache dtypes.{min,max}
* Use faster iterables in ops.py
* extend is a lot faster than nested listcomp
Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>
---------
Co-authored-by: Roelof van Dijk <3604013+roelofvandijk@users.noreply.github.com>
2024-09-13 11:31:50 +08:00
Tim Becker
8c4cab8d6e
Even faster enums ( #6483 )
...
* Even faster enums
* simpler _generate_next_value impl
* FastEnum in ops only
* Better uniqueness for FastEnum
2024-09-12 20:08:02 +08:00
George Hotz
9543e4c92e
more expand prereqs [run_process_replay] ( #6499 )
2024-09-12 17:46:12 +08:00
George Hotz
327eb12600
folding for vectorized consts [run_process_replay] ( #6498 )
...
* folding for vectorized consts [run_process_replay]
* remove that if statement
* inf loop
2024-09-12 17:29:37 +08:00
George Hotz
a532d59bbd
gep tuple [run_process_replay] ( #6495 )
...
* gep tuple [run_process_replay]
* no inf loop, that goes in expander
* fix ops python
* unbreak gep 0
* fix tests
* fix tests
* VECTORIZE/GEP
* oops, broken
2024-09-12 16:37:31 +08:00
George Hotz
6dfa63cb21
more vconst stuff + gep tuple [run_process_replay] ( #6494 )
...
* more vconst stuff [run_process_replay]
* revert that
* fix inf loop
2024-09-12 14:58:14 +08:00
qazal
4507ab8016
more upat styling changes [run_process_replay] ( #6492 )
...
* more upat styling
* single to double quotes
* wrap line
* comments
2024-09-12 14:40:16 +08:00
qazal
63ea446339
s/None/dtypes.void in docs [run_process_replay] ( #6493 )
...
* s/None/dtypes.void in docs [run_process_replay]
* not arg
* now the asts in docs
* more fixup
2024-09-12 14:27:37 +08:00
George Hotz
119b0ea4af
add UOps.VCONST [run_process_replay] ( #6487 )
...
* add UOps.VCONST [run_process_replay]
* VCONST folding
* simpler devectorize
* alu
* revert that type
2024-09-12 14:03:39 +08:00
qazal
4dc9436d63
use more UPat.var and UPat.cvar [run_process_replay] ( #6491 )
2024-09-12 13:52:41 +08:00
qazal
e5e14fc4ef
all UOp methods need dtype [run_process_replay] ( #6490 )
...
* all UOp methods need dtype [run_process_replay]
* delete all type: ignores yay
2024-09-12 13:38:14 +08:00
George Hotz
76487a3533
remove nop, use upat [run_process_replay] ( #6489 )
...
* remove nop, use upat [run_process_replay]
* mypy passes
* no wonder nothing worked
* fixes
2024-09-12 12:16:19 +08:00
George Hotz
f12f0857d8
add UOps.VCONST (just the uop) [run_process_replay] ( #6488 )
...
* empty branch process replay
* add VCONST
2024-09-12 11:16:20 +08:00
qazal
00d4bf16d8
new utils for scheduler graph rewrite [run_process_replay] ( #6485 )
2024-09-12 10:01:24 +08:00