10633 Commits

Author SHA1 Message Date
chenyu
5c240c34aa split validhack into simplify idx and drop valids (#6719)
* split validhack into simplify idx and drop valids

will be using the simplify idx for non-image buffer
[run_process_replay]

* shorter
2024-09-24 09:40:27 -04:00
qazal
cefc3e9382 make all schedules immutable [run_process_replay] (#6718)
* compute inputs and outputs in LBScheduleItem [run_process_replay]

* simpler metadata, delete __hash__

* no dynamic field

* test_diff_schedule
2024-09-24 21:08:16 +08:00
qazal
29330014ab give FUZZ_SCHEDULE views a base (#6717)
* memoryview to bytes

* give FUZZ_SCHEDULE views a base
2024-09-24 19:20:37 +08:00
nimlgen
f0019ad29c bump ci test timeout for test_speed_exec_time (#6715)
* bump ci test timeout for test_speed_exec_time

* more
2024-09-24 18:44:09 +08:00
qazal
1c03fb69c9 viz dedup assert groupby ctx [run_process_replay] (#6714) 2024-09-24 18:17:21 +08:00
chenyu
8d75326cb5 do not fold var with min==max (#6713)
not really used, want it to keep as a var for valid simplification
[run_process_replay]
2024-09-24 06:16:34 -04:00
chenyu
9e51879019 fix idx setup in image_valid test_openpilot_conv3 (#6710)
* fix idx setup in image_valid test_openpilot_conv3

* corrected output and sad
2024-09-24 05:49:04 -04:00
qazal
ae3f3fec38 refactor DEFINE_GLOBAL inputs to list [run_process_replay] (#6711) 2024-09-24 17:43:24 +08:00
wozeparrot
f932116e05 feat: small things from default_threefry (#6708) 2024-09-24 17:00:47 +08:00
chenyu
f2700ac58a construct a candidate set to attempt valid idx rewrite (#6706)
preparation for the brute force attempt for some valids
2024-09-24 04:12:21 -04:00
wozeparrot
2be0b26a1f rand only supports single device (#6682) 2024-09-24 16:07:44 +08:00
nimlgen
75b7627db7 qcom do not recreate memoryviews on updates (#6701) 2024-09-24 15:36:22 +08:00
chenyu
a6078c099f simpler idx rewrite structure in simplify_valid_image_load (#6704)
express valid into things to check when rewriting idx. it's the same for single clause or a simplex
[run_process_replay]
2024-09-24 03:35:39 -04:00
nimlgen
d3ed50c769 fix typo in 'Too many resources requested for launch' (#6705) 2024-09-24 15:33:01 +08:00
wozeparrot
ef7a74bfa0 feat: use /raid/downloads on tinybox (#6702) 2024-09-24 15:26:31 +08:00
nimlgen
ca66b11e07 qcom fix disasm (#6703) 2024-09-24 15:23:43 +08:00
nimlgen
a473bf4ba9 do not always update float dims (#6699)
* do not always update float dims

* linter

* isinsatcen
2024-09-24 14:40:45 +08:00
qazal
048483ee0b viz fold const nodes and UOp/float4 syntax highlight (#6695)
* fold const nodes

* show rewrite count

* hotfix: cpp

* more syntax highlight

* custom language definitions

* only cpp

* small fixups for UPat

* extend python

* cleanups

* rewrites helper

* better message
2024-09-24 14:36:59 +08:00
chenyu
4bb1694f49 more tests about bounds of UOp divs (#6700) 2024-09-24 00:41:43 -04:00
chenyu
79aef64d70 update tests in test_image_valid (#6698) 2024-09-24 00:04:21 -04:00
Anurag Lamsal
568757e087 fix model_eval.py in the mlperf folder searching for bert vocab in the wrong directory (#6649) 2024-09-24 11:20:44 +08:00
chenyu
4a2fa0b627 clean up apply OptOps.PADTO [run_process_replay] (#6694) 2024-09-23 23:13:50 -04:00
chenyu
f703180356 hotfix missed cast in cstyle code_for_workitem (#6693)
`NOLOCALS=1 python -c "from tinygrad import Tensor; Tensor.randn((5, 5)).realize()"` works on green box with this fix #6687
2024-09-23 22:18:18 -04:00
samm393
19c11792fd Flux.1 (#6334)
* initial commit

* whitespace

* get rid of torch import

* indentation

* less hardcoding

* add flux.1-dev

* jit

* no double

* t5 tidy up

* validation image

* reuse sdxl autoencoder

* typing changes

* empty lines

* remove unneeded comments

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 10:08:04 +08:00
chenyu
31b9c74c77 tiny import cleanup and fix typo (#6692) 2024-09-23 21:48:23 -04:00
qazal
02c0c09fb9 VIZ syntax highlighting and new colors (#6686)
* VIZ syntax highlighting

* more work
2024-09-24 09:41:07 +08:00
ignaciosica
0ffbd75af8 Refactor TC [run_process_replay] (#6456)
* unify _apply_tc_opt

* refactor tc pt2

* hotfix: remove blank line

* refactor upcast_axes

* simplify check before using tensor_cores

* rename upcast_axes

* fix amx and remove counting hack

* AMX cleanup

* hotfix: bug

* skip hand-coded TC opts if AMX to also skip if emulating

* hotfix: AMX bug

* hotfix: AMX tests

* minor format change

* hotfix: minor var name change

* hotfix: minor refactor

* hotfix: hand-coded tc bug

* hotfix: simple change

* fix comment

* hotfix: refactor attempt to local N

* hotfix: AMD TC spacing

* refactor tensor core options in kernel.py to include opt order

* hotfix: add comments to TensorCore dataclass

* hotfix: improve comment on TC dataclas

* hotfix: refactor opt_seq loop

* hotfix: add comments in hand-coded TC opts

* hotfix: upcast_axes comment

* hotfix: remove unroll from opt_seq

* hotfix: bug + remove unroll from opt_seq

* hotfix: rename opt_seq into opts_seq

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-09-24 09:05:29 +08:00
George Hotz
b9e6d42a1f Revert "gated native math in OpenCL (#6683)" (#6691)
This reverts commit 2fe3eeed17.
2024-09-24 08:48:10 +08:00
Harald Schäfer
382938ab41 Add command to show default backend in README (#6688)
* Update README.md

* Update README.md

* Update README.md
2024-09-24 08:42:18 +08:00
George Hotz
46fab1f185 hotfix: curved edges in viz 2024-09-23 19:45:35 +08:00
qazal
ee050d31d7 viz more touchups (#6685)
* dont print if we're running VIZ

* 242424
2024-09-23 19:44:28 +08:00
George Hotz
2fe3eeed17 gated native math in OpenCL (#6683)
* gated native math

* Update cstyle.py
2024-09-23 19:22:13 +08:00
George Hotz
84072166db move mul consts like add consts (#6684) 2024-09-23 19:21:53 +08:00
George Hotz
de259e3f09 hotfix: add compile3 to comma CI 2024-09-23 18:25:49 +08:00
George Hotz
7c38121280 load penalty (#6681)
* bias/bn loads after loops

* load penalty in fix_priority

* more generic test
2024-09-23 18:12:12 +08:00
George Hotz
431ffc4254 hotfix: delete float16 failing 2024-09-23 17:42:57 +08:00
qazal
aad7c9c883 viz adjustable metadata (#6679)
* move from grid to flexbox

* viz adjustable metadata

* w-size
2024-09-23 17:31:51 +08:00
George Hotz
2f2f933e50 fix buffer shape regression from onnx (#6678) 2024-09-23 16:58:42 +08:00
qazal
b438e3cc19 viz bugfix click in middle of UOps (#6676) 2024-09-23 16:44:19 +08:00
chenyu
f55459c98e failed validhack test for a 0.9.7 conv (#6677) 2024-09-23 04:43:47 -04:00
nimlgen
94cbb1cd32 qcom image copyout (#6667)
* qcom copyout

* copyin

* linter

* fix

* linter

* myoy
2024-09-23 16:11:43 +08:00
George Hotz
417a19a292 uop priority inversion (#6670)
* make checks simpler [run_process_replay]

* reorder uops

* fix inversion [run_process_replay]

* no need to move SPECIALs

* Update uopgraph.py
2024-09-23 15:53:53 +08:00
qazal
49bf92afa2 schedule UOps.ASSIGN (#6661) 2024-09-23 15:44:12 +08:00
George Hotz
9f1f445a5f reorder uops (#6672) 2024-09-23 15:21:59 +08:00
qazal
e2d6e10ddf hotfix: reset benchmarks cache for process replay (#6671) 2024-09-23 15:13:02 +08:00
chenyu
0362dbbbe8 relax idx simplification given valid (#6669)
apply to kernels in op 0.9.7.
if a valid has a complicated expr, we cannot drop valid but it's possible to simplify idx given valid
2024-09-23 03:04:57 -04:00
qazal
7ca9ffa494 misc UOp st cleanups (#6668) 2024-09-23 14:16:42 +08:00
chenyu
26ebb7cab4 don't use div_folding in lt_folding (#6666)
* don't use div_folding in lt_folding

valids 35 -> 13

* fails the same as before
2024-09-23 01:50:18 -04:00
qazal
e9248b9e27 viz highlight new nodes (#6665)
* p2

* ret adds and dels

* maybe that way

* add additions

* simpler test_viz
2024-09-23 13:46:18 +08:00
chenyu
da5b741656 removed valid in openpilot conv (#6619)
35 valids left
2024-09-23 00:30:18 -04:00