Comma Device
|
76aa6416d7
|
qcom: add disassembler with DEBUG >= 5
|
2024-09-20 07:04:28 +00:00 |
|
chenyu
|
036c2f5b26
|
validhack use the new style ge for upper bound valid (#6612)
also relaxed the bound check to check vmin/vmax instead of just const.
valids 482 -> 283
|
2024-09-19 23:45:42 -04:00 |
|
George Hotz
|
c4d5575c61
|
beat mlx at resnet 18 (#6611)
* work to beat mlx at resnet18 [run_process_replay]
* pruning
* wino sometimes
* shorter
* comment
|
2024-09-20 11:28:01 +08:00 |
|
qazal
|
785aaec67c
|
make VIZ more responsive for big graphs (#6610)
|
2024-09-20 08:56:13 +08:00 |
|
George Hotz
|
78699d9924
|
15% more folder speed [run_process_replay] (#6607)
* 15% more folder speed [run_process_replay]
* gep cleanups
|
2024-09-19 22:34:42 +08:00 |
|
chenyu
|
a37e92081a
|
fix unrolled arange folding (#6606)
* fix unrolled arange folding
also added flop test to test_arange to make sure it's 0 flop
* skip PTX
|
2024-09-19 09:03:01 -04:00 |
|
qazal
|
eebd23155c
|
move scheduler rewrites into full_ast_rewrite [run_process_replay] (#6609)
|
2024-09-19 20:03:28 +08:00 |
|
qazal
|
31748c72c4
|
refactor viz to parse_qs (#6608)
|
2024-09-19 19:51:41 +08:00 |
|
nimlgen
|
944cc46e11
|
qcom fix image pitch (#6600)
* qcom fix image pitch
* correct
|
2024-09-19 18:50:02 +08:00 |
|
George Hotz
|
a1a882b006
|
arange folding with new ge (#6604)
* arange folding with new ge
* bump allowed gated
* bump allowed speed
|
2024-09-19 18:01:28 +08:00 |
|
George Hotz
|
224151a958
|
update indexing with UPat.any [run_process_replay] (#6605)
|
2024-09-19 17:40:17 +08:00 |
|
chenyu
|
d148a62f8d
|
more generic simplify_valid_image_load (#6603)
use graph_rewrite to simplify the expression with narrowed variables, and check boundary conditions on a monotonically increasing function to drop valid.
|
2024-09-19 05:33:37 -04:00 |
|
George Hotz
|
718ecad2ee
|
add UPat.any support [run_process_replay] (#6602)
* add UPat.any support [run_process_replay]
* single arange pattern
* no loop_start and loop_end
|
2024-09-19 17:11:24 +08:00 |
|
qazal
|
d06b36e527
|
viz open UPat links in editor (#6601)
* move the reloader
* open links in editor
* less things in ui
|
2024-09-19 16:48:09 +08:00 |
|
qazal
|
94effe2a71
|
simple VIZ=1 and get_location changes (#6599)
* simpler replace
* this get_location is fine?
* python things
* ctx location
|
2024-09-19 15:58:33 +08:00 |
|
chenyu
|
eeee032b14
|
tiny cleanup of test_image_valid (#6597)
* tiny cleanup of test_image_valid
Special and Variable to set up UOp
* typo
|
2024-09-19 03:09:47 -04:00 |
|
George Hotz
|
012a2c449a
|
fix lt_folding VCONST issue [run_process_replay] (#6424)
* le and ge [run_process_replay]
* bugfix
* fix divides bug
* fix lt_folding issue
|
2024-09-19 14:59:20 +08:00 |
|
qazal
|
309ea63c03
|
include cached replaces in VIZ=1 (#6596)
* pick some work from vizmore branch
* fix the ctx location
* fix that loc
|
2024-09-19 14:48:31 +08:00 |
|
qazal
|
44c18a39a5
|
fix upat .location for the type verifier (#6592)
* fix upat .location for the type verifier
* get the last tinygrad file
|
2024-09-19 14:13:12 +08:00 |
|
chenyu
|
496806ce75
|
another example of openpilot conv with valid (#6595)
|
2024-09-19 01:54:01 -04:00 |
|
qazal
|
0c9b7c9167
|
more detailed UPat view in VIZ (#6594)
|
2024-09-19 13:18:11 +08:00 |
|
nimlgen
|
5e358cf179
|
qcom set ctx prio (#6593)
|
2024-09-19 12:30:00 +08:00 |
|
chenyu
|
7f9fd556b0
|
_min_max for WHERE (#6564)
prereq to gated load simplification
just for int
|
2024-09-18 23:47:48 -04:00 |
|
chenyu
|
1b6eee02ad
|
failed test case for openpilot validhack conv (#6590)
* failed test case for openpilot validhack conv
can save 2ms once this is fixed
* fix order
|
2024-09-18 23:12:30 -04:00 |
|
George Hotz
|
dfcc9c9aa3
|
remove unused view.expr [run_process_replay] (#6591)
|
2024-09-19 11:09:42 +08:00 |
|
George Hotz
|
e015b41ce9
|
remove e( function just alu( [run_process_replay] (#6589)
* remove e( function just alu( [run_process_replay]
* missed two
|
2024-09-19 10:24:02 +08:00 |
|
George Hotz
|
fa0f678d5a
|
use the PatternMatcher to validate UOps type [run_process_replay] (#6583)
* use the PatternMatcher to validate UOps type [run_process_replay]
* type check tests pass
* DEFINE_VAR
* fix precommit
* fix tests
* ptx
* type check tests pass
* ptx test
* int64
* ptx barrier
* delete old stuff
|
2024-09-19 09:59:06 +08:00 |
|
qazal
|
d01e011a8c
|
start multi graph VIZ=1 (#6587)
* add all rewrites
* add a picker
* drop this here
* more work
* reset that
* start multigraph
|
2024-09-19 08:31:56 +08:00 |
|
nimlgen
|
5a7cb8d5a5
|
qcom set power to max (#6578)
|
2024-09-18 18:27:06 +08:00 |
|
chenyu
|
bd40a26b8b
|
image valid test case where the current approach does not work (#6584)
|
2024-09-18 06:06:03 -04:00 |
|
chenyu
|
1ec6bd5125
|
restructure simplify_valid_image_load [run_process_replay] (#6581)
* restructure simplify_valid_image_load [run_process_replay]
separated parsing valid / idx and simplification
* space
* type
|
2024-09-18 04:46:41 -04:00 |
|
George Hotz
|
d02bb270b7
|
add copyin copyout for image on GPU [run_process_replay] (#6580)
* add copyin copyout for image on GPU [run_process_replay]
* add timing
* enqueue vs total run
* it's failing but that's fine
|
2024-09-18 16:06:20 +08:00 |
|
chenyu
|
162ead02a9
|
remove LOAD where valid is an empty set (#6579)
356 -> 354 valids
|
2024-09-18 03:49:41 -04:00 |
|
George Hotz
|
d4b662c318
|
new openpilot compile (#6573)
* new openpilot compile
* note, copyout doesn't work for images
|
2024-09-18 14:22:50 +08:00 |
|
chenyu
|
c3a70dbf0d
|
20 jitted steps in openpilot benchmark (#6577)
|
2024-09-18 02:15:16 -04:00 |
|
chenyu
|
a72d51e277
|
brute force VALIDHACK matching (#6575)
* brute force VALIDHACK matching
* cleanup
* 9700
|
2024-09-18 01:59:50 -04:00 |
|
qazal
|
d8e5d5c663
|
move VIZ=1 tests to fuzzers (#6574)
|
2024-09-18 12:12:03 +08:00 |
|
ethanreidel
|
ca8bad90a1
|
Fix typo in ops.py (#6572)
|
2024-09-17 21:25:00 -04:00 |
|
nimlgen
|
9894f20684
|
dsp offset buffer (#6570)
* dsp offset buffer
* view
|
2024-09-17 23:34:21 +08:00 |
|
George Hotz
|
28e565dc0d
|
prune independent kernels for openpilot [run_process_replay] (#6569)
* prune independent kernels for openpilot [run_process_replay]
* new pruning
* prune first, then memory plan
|
2024-09-17 20:02:38 +08:00 |
|
qazal
|
9295bc0189
|
viz more work [run_process_replay] (#6568)
* infra
* found it
* real work
* bring those back
* cleanup test_viz
* comment that out
|
2024-09-17 19:27:09 +08:00 |
|
qazal
|
455a27dd43
|
start viz unittests (#6550)
* test_viz
* more tests
|
2024-09-17 18:58:23 +08:00 |
|
George Hotz
|
67a03e72bb
|
remove expr_idxs [run_process_replay] (#6567)
* remove expr_idxs [run_process_replay]
* goodbye that test
|
2024-09-17 18:34:51 +08:00 |
|
George Hotz
|
9ebbedc37f
|
hotfix: remove expr_idxs from graph
|
2024-09-17 18:02:01 +08:00 |
|
chenyu
|
b947db3de1
|
don't fold mul mod for common factor (#6566)
it makes valid pattern more annoying
|
2024-09-17 06:01:27 -04:00 |
|
qazal
|
a2f446653e
|
add swizzle_st [run_process_replay] (#6561)
* add swizzle_st [run_process_replay]
* reduceop arg can stay
|
2024-09-17 15:37:39 +08:00 |
|
Gaétan Lepage
|
f214bb140d
|
test: relax tolerance of test_broadcastdot (#6560)
|
2024-09-17 03:26:39 -04:00 |
|
chenyu
|
5fb877c78c
|
generic valid match criteria of #6552 (#6558)
455 -> 364 valids.
generalize `idx < image bound` to `idx < image bound + c` for some `c`
|
2024-09-17 02:40:36 -04:00 |
|
George Hotz
|
0ab06d5840
|
push geps through wmma (#6559)
* push geps through wmma
* update tests
|
2024-09-17 14:38:40 +08:00 |
|
qazal
|
5a30a32af8
|
small viz fixups from the swizzle pads branch [run_process_replay] (#6557)
* small viz fixups from the swizzle pads branch [run_process_replay]
* handle indexed ones
|
2024-09-17 14:37:53 +08:00 |
|