George Hotz
07b350a8f4
new uops is an actual graph (#4560)
...
* new uops is an actual graph
* it's way slower
* simpler
* fix define acc
* render_loop unique
* ops test pass
* add pattern matcher back, there's bugs
* rewrite
* use priority queue
* recursive children
* fix tests
* fix tests with SINK
* fix abstractions
* fix assembly
* simpler
* link define_acc
* fix DEFINE_ACC placement
* type verify
* full cmp
* fix cmp
* ACCESS_ACC
* insert DEFINE_ACC
* fix PHI
* recursive rewrite
* fix many tests
* sum collapse
* more patterns
* correct change
* fold arange
* fix that lin test
* space
* big folding rule works
* close
* has more maxes, meh
* cached node replace
* set changed
* simplest folding yet
* works
* works
* DIV
* all tests pass
* del
* fuzz linearizer fails
* sum_collapse
* test depth 2 cf
* fix lin test 14
* fix clang depth
* disable that
* failure 14 is fixed
* fix ptx
* failure 27 is fixed
* fix llama
* run_cnt
* Revert "Optimize PTX gated loads index calculation (#4304)"
This reverts commit d97d5a7689.
* fix uops loop
* fix ptx bugs
* add barrier
* print
* mem_type in ptx direct
* bypass tests that fail in CI but pass locally
* ptx remove ptr_ar
* more ptx passing
* fix ptx tests
* assert compile support
* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
George Hotz
68ca4d4276
split to schedule.py (#3949)
...
* split to schedule.py
* split
2024-03-26 21:02:46 -07:00
George Hotz
150ea2eb76
create engine folder and move code (#3948)
...
* retry
* older tf
* that
2024-03-26 20:38:03 -07:00
qazal
337cd53444
multioutput ScheduleItem (#3699)
...
* refactor realize.py
* update docs
* update test_sched
* update runners and devices
* update openpilot and unit tests
* cleanup runner lowering
* update more tests
2024-03-13 08:59:38 -07:00
George Hotz
1b6e890ef2
uops flop counter (#3373)
...
* factor out winograd functions
* test counter
* uops flop counter
* more correct
* ish
* correct
* cleanup
* tests for uops flop counter
* tests still fail
* fix symbolic uops flop cnt
* fix symbolic uops flop cnt
* hmm, it's an alu
* uops alu resolve
* relax that
2024-02-20 09:36:30 +01:00
George Hotz
2e60012bcf
move create schedule and delete old API (#3377)
...
* move create schedule and delete old API
* fix test multitensor
2024-02-12 18:10:45 +01:00
George Hotz
0f6cde243d
import from wino_cleanup (#3374)
2024-02-12 16:26:50 +01:00
David Hou
aebaab011f
faster wino compile by catting consts across data expand dim (#3293)
...
* PoC faster wino compile by catting consts across data expand dim
* fix fusions
* faster + golf it
* noqa 501
* implicit broadcast
* Revert "implicit broadcast"
This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666.
* shorter
* shorter
* oops
* 216 upcasts is probably fine
* wino kernel count test
* test winograd number of sts
* specify device for apply_matrix mat elements
2024-02-02 03:47:45 -05:00
George Hotz
09f2952dc3
reintroduce merge views in update benchmark (#3279)
...
* Reapply "take merge views from corsix branch" (#3278)
This reverts commit d298916232.
* reintroduce merge views
2024-01-30 09:47:20 -08:00
George Hotz
d298916232
Revert "take merge views from corsix branch" (#3278)
2024-01-30 09:34:28 -08:00
George Hotz
b57a16aa89
take merge views from corsix branch (#3273)
...
* take merge views from corsix branch
* better DEBUG
* max views
* remove view.py change
* Revert "remove view.py change"
This reverts commit f3025f4f39.
* only allow filter on non symbolic
* oops, correct fix
* comment to explain
2024-01-30 09:25:16 -08:00
George Hotz
085dc87bed
winograd should be 4 kernels (#3268)
2024-01-28 09:21:26 -08:00
chenyu
e52a609240
make WINO a context var, and LATEWINO in hlb_cifar (#3161)
2024-01-17 20:21:26 -05:00
George Hotz
7da2325dc7
get_lazyops() -> lazyops (#2884)
...
* get_lazyops() -> lazyops
* don't compare empty mem
2023-12-20 18:04:49 -08:00
Friedrich Carl Eichenroth
75676ab8e1
Profiling-helper (#2321)
...
* change profiler
* remove unused imports
* remove unused imports
* change lazybuffer references
* remove unused line
* remove unused import
* remove unused stuff
* add types
* typing
* typing
* typing
* trigger actions
* -1 loc
* fixup
* trigger actions
* revert lazy typing changes
* WIP profiler helper
* replace old start & stop profiler
* fixup
* linting
* Update llama.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
George Hotz
15da96f393
print test durations and add speed (#2107)
...
* print test durations
* decrease sizes to increase speed
* faster
* GPU/CLANG onnx in separate runner
* test split, move ONNX CPU CI
* simpler tests
* simpler uops test
* faster
* less cuda apt
* running ninja install
* apt install
* split fancy indexing
2023-10-18 13:46:42 -07:00
George Hotz
121f7aa8c5
Schedule item (#2012)
...
* ScheduleItem
* put var_vals in the schedule
* fix tests, wow that proliferated quickly
* not ready to be in the schedule
2023-10-07 08:59:25 -07:00
George Hotz
f54959e5cd
move print tree into graph (#2003)
...
* move print tree into graph
* add winograd profiling test
* change pre-commit to run ruff first
2023-10-07 04:39:21 -07:00
George Hotz
a677a1e2cd
winograd test prints op count
2023-09-29 05:41:29 -07:00
George Hotz
81cb120b0f
winograd speed test (#1942)
2023-09-29 04:40:35 -07:00