nimlgen
|
a473bf4ba9
|
do not always update float dims (#6699)
* do not always update float dims
* linter
* isinstance
|
2024-09-24 14:40:45 +08:00 |
|
qazal
|
048483ee0b
|
viz fold const nodes and UOp/float4 syntax highlight (#6695)
* fold const nodes
* show rewrite count
* hotfix: cpp
* more syntax highlight
* custom language definitions
* only cpp
* small fixups for UPat
* extend python
* cleanups
* rewrites helper
* better message
|
2024-09-24 14:36:59 +08:00 |
|
chenyu
|
4bb1694f49
|
more tests about bounds of UOp divs (#6700)
|
2024-09-24 00:41:43 -04:00 |
|
chenyu
|
79aef64d70
|
update tests in test_image_valid (#6698)
|
2024-09-24 00:04:21 -04:00 |
|
Anurag Lamsal
|
568757e087
|
fix model_eval.py in the mlperf folder searching for bert vocab in the wrong directory (#6649)
|
2024-09-24 11:20:44 +08:00 |
|
chenyu
|
4a2fa0b627
|
clean up apply OptOps.PADTO [run_process_replay] (#6694)
|
2024-09-23 23:13:50 -04:00 |
|
chenyu
|
f703180356
|
hotfix missed cast in cstyle code_for_workitem (#6693)
`NOLOCALS=1 python -c "from tinygrad import Tensor; Tensor.randn((5, 5)).realize()"` works on green box with this fix #6687
|
2024-09-23 22:18:18 -04:00 |
|
samm393
|
19c11792fd
|
Flux.1 (#6334)
* initial commit
* whitespace
* get rid of torch import
* indentation
* less hardcoding
* add flux.1-dev
* jit
* no double
* t5 tidy up
* validation image
* reuse sdxl autoencoder
* typing changes
* empty lines
* remove unneeded comments
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
|
2024-09-24 10:08:04 +08:00 |
|
chenyu
|
31b9c74c77
|
tiny import cleanup and fix typo (#6692)
|
2024-09-23 21:48:23 -04:00 |
|
qazal
|
02c0c09fb9
|
VIZ syntax highlighting and new colors (#6686)
* VIZ syntax highlighting
* more work
|
2024-09-24 09:41:07 +08:00 |
|
ignaciosica
|
0ffbd75af8
|
Refactor TC [run_process_replay] (#6456)
* unify _apply_tc_opt
* refactor tc pt2
* hotfix: remove blank line
* refactor upcast_axes
* simplify check before using tensor_cores
* rename upcast_axes
* fix amx and remove counting hack
* AMX cleanup
* hotfix: bug
* skip hand-coded TC opts if AMX to also skip if emulating
* hotfix: AMX bug
* hotfix: AMX tests
* minor format change
* hotfix: minor var name change
* hotfix: minor refactor
* hotfix: hand-coded tc bug
* hotfix: simple change
* fix comment
* hotfix: refactor attempt to local N
* hotfix: AMD TC spacing
* refactor tensor core options in kernel.py to include opt order
* hotfix: add comments to TensorCore dataclass
* hotfix: improve comment on TC dataclass
* hotfix: refactor opt_seq loop
* hotfix: add comments in hand-coded TC opts
* hotfix: upcast_axes comment
* hotfix: remove unroll from opt_seq
* hotfix: bug + remove unroll from opt_seq
* hotfix: rename opt_seq into opts_seq
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
|
2024-09-24 09:05:29 +08:00 |
|
George Hotz
|
b9e6d42a1f
|
Revert "gated native math in OpenCL (#6683)" (#6691)
This reverts commit 2fe3eeed17.
|
2024-09-24 08:48:10 +08:00 |
|
Harald Schäfer
|
382938ab41
|
Add command to show default backend in README (#6688)
* Update README.md
* Update README.md
* Update README.md
|
2024-09-24 08:42:18 +08:00 |
|
George Hotz
|
46fab1f185
|
hotfix: curved edges in viz
|
2024-09-23 19:45:35 +08:00 |
|
qazal
|
ee050d31d7
|
viz more touchups (#6685)
* dont print if we're running VIZ
* 242424
|
2024-09-23 19:44:28 +08:00 |
|
George Hotz
|
2fe3eeed17
|
gated native math in OpenCL (#6683)
* gated native math
* Update cstyle.py
|
2024-09-23 19:22:13 +08:00 |
|
George Hotz
|
84072166db
|
move mul consts like add consts (#6684)
|
2024-09-23 19:21:53 +08:00 |
|
George Hotz
|
de259e3f09
|
hotfix: add compile3 to comma CI
|
2024-09-23 18:25:49 +08:00 |
|
George Hotz
|
7c38121280
|
load penalty (#6681)
* bias/bn loads after loops
* load penalty in fix_priority
* more generic test
|
2024-09-23 18:12:12 +08:00 |
|
George Hotz
|
431ffc4254
|
hotfix: delete float16 failing
|
2024-09-23 17:42:57 +08:00 |
|
qazal
|
aad7c9c883
|
viz adjustable metadata (#6679)
* move from grid to flexbox
* viz adjustable metadata
* w-size
|
2024-09-23 17:31:51 +08:00 |
|
George Hotz
|
2f2f933e50
|
fix buffer shape regression from onnx (#6678)
|
2024-09-23 16:58:42 +08:00 |
|
qazal
|
b438e3cc19
|
viz bugfix click in middle of UOps (#6676)
|
2024-09-23 16:44:19 +08:00 |
|
chenyu
|
f55459c98e
|
failed validhack test for a 0.9.7 conv (#6677)
|
2024-09-23 04:43:47 -04:00 |
|
nimlgen
|
94cbb1cd32
|
qcom image copyout (#6667)
* qcom copyout
* copyin
* linter
* fix
* linter
* mypy
|
2024-09-23 16:11:43 +08:00 |
|
George Hotz
|
417a19a292
|
uop priority inversion (#6670)
* make checks simpler [run_process_replay]
* reorder uops
* fix inversion [run_process_replay]
* no need to move SPECIALs
* Update uopgraph.py
|
2024-09-23 15:53:53 +08:00 |
|
qazal
|
49bf92afa2
|
schedule UOps.ASSIGN (#6661)
|
2024-09-23 15:44:12 +08:00 |
|
George Hotz
|
9f1f445a5f
|
reorder uops (#6672)
|
2024-09-23 15:21:59 +08:00 |
|
qazal
|
e2d6e10ddf
|
hotfix: reset benchmarks cache for process replay (#6671)
|
2024-09-23 15:13:02 +08:00 |
|
chenyu
|
0362dbbbe8
|
relax idx simplification given valid (#6669)
applies to kernels in openpilot 0.9.7.
if a valid has a complicated expr, we cannot drop the valid, but it is still possible to simplify idx given the valid
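A toy illustration of the idea, not tinygrad code: the index expression, the valid condition, and every name below are hypothetical. The sketch shows that an index can be replaced by a simpler one as long as the two agree wherever the valid holds, even though the valid itself stays in place.

```python
def valid(x: int) -> bool:
    # Hypothetical validity condition guarding the load.
    return 0 <= x < 4

def idx_original(x: int) -> int:
    # Hypothetical index expression before simplification.
    return (x % 4) + (x // 4) * 16

def idx_simplified(x: int) -> int:
    # Under valid(x), x % 4 == x and x // 4 == 0, so the index is just x.
    return x

def agree_where_valid(lo: int = -8, hi: int = 64) -> bool:
    # Brute-force check: the two indices only need to match where valid holds.
    return all(idx_original(x) == idx_simplified(x)
               for x in range(lo, hi) if valid(x))

print(agree_where_valid())
```

Outside the valid region the two expressions differ (for example at x = 5), which is why the valid cannot simply be dropped.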
|
2024-09-23 03:04:57 -04:00 |
|
qazal
|
7ca9ffa494
|
misc UOp st cleanups (#6668)
|
2024-09-23 14:16:42 +08:00 |
|
chenyu
|
26ebb7cab4
|
don't use div_folding in lt_folding (#6666)
* don't use div_folding in lt_folding
valids 35 -> 13
* fails the same as before
|
2024-09-23 01:50:18 -04:00 |
|
qazal
|
e9248b9e27
|
viz highlight new nodes (#6665)
* p2
* ret adds and dels
* maybe that way
* add additions
* simpler test_viz
|
2024-09-23 13:46:18 +08:00 |
|
chenyu
|
da5b741656
|
removed valid in openpilot conv (#6619)
35 valids left
|
2024-09-23 00:30:18 -04:00 |
|
George Hotz
|
52c2c4df9c
|
fix match of sz 0 + dedup kernel ast [run_process_replay] (#6663)
* fix match of sz 0 [run_process_replay]
* empty graph rewrite to dedup st
|
2024-09-23 11:56:53 +08:00 |
|
chenyu
|
2d4d594994
|
validhack is_irreducible helper (#6664)
[run_process_replay]
|
2024-09-22 23:42:47 -04:00 |
|
chenyu
|
1923932339
|
canonicalize simplex lt (#6658)
(X := a0*x0 + a1*x1 + ...) > 0 is equivalent to x0 + x1 + ... > 0 if xi >= 0 and ai > 0 for ints
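The equivalence above can be sanity-checked by brute force. This is a minimal sketch independent of the tinygrad implementation; the helper names and the search bound are assumptions chosen for illustration.

```python
from itertools import product

def weighted_positive(coeffs, xs):
    # a0*x0 + a1*x1 + ... > 0
    return sum(a * x for a, x in zip(coeffs, xs)) > 0

def plain_positive(xs):
    # x0 + x1 + ... > 0
    return sum(xs) > 0

def check_equivalence(coeffs, max_x=4):
    # Requires ai > 0; enumerates every xi in [0, max_x], so xi >= 0 holds.
    assert all(a > 0 for a in coeffs)
    return all(weighted_positive(coeffs, xs) == plain_positive(xs)
               for xs in product(range(max_x + 1), repeat=len(coeffs)))

print(check_equivalence((3, 5, 7)))
```

The non-negativity condition matters: with coefficients (1, 3) and x = (-2, 1), the weighted sum is positive while the plain sum is not.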
|
2024-09-22 23:04:47 -04:00 |
|
wozeparrot
|
46e360fdc0
|
check bfloat16 range with threefry (#6660)
|
2024-09-23 10:48:44 +08:00 |
|
qazal
|
d24e4b1042
|
viz more kernel view work (#6659)
|
2024-09-23 10:48:35 +08:00 |
|
qazal
|
6be1bf09f1
|
hotfix: bring COMPARE_SCHEDULE=0 back (#6657)
|
2024-09-23 10:39:43 +08:00 |
|
George Hotz
|
e945fa9c5c
|
put local on the PtrDtype [run_process_replay] (#6656)
* put local on the PtrDtype [run_process_replay]
* those are local too
|
2024-09-23 10:29:17 +08:00 |
|
chenyu
|
90c1ccc402
|
simpler drop valid check in simplify_valid_image_load (#6653)
* simpler drop valid check in simplify_valid_image_load
* update tests
|
2024-09-22 21:46:39 -04:00 |
|
qazal
|
99ed9fb75e
|
simpler verify_ast [run_process_replay] (#6654)
|
2024-09-23 09:40:09 +08:00 |
|
nimlgen
|
8a9195d86e
|
qcom texs refactor (#6613)
* qcom texs refactor
* fix
* linter
* qcombuf
* linter
|
2024-09-23 09:03:17 +08:00 |
|
qazal
|
d1bae42d35
|
viz lowerer and graph_rewrite dedup try 2 (#6652)
|
2024-09-22 21:09:46 +08:00 |
|
qazal
|
6b65d8c461
|
more process replay tracing work [run_process_replay] (#6650)
|
2024-09-22 16:16:58 +08:00 |
|
George Hotz
|
4fc5a34fe7
|
lowerer is just a graph rewrite, not a class [run_process_replay] (#6648)
|
2024-09-22 14:15:33 +08:00 |
|
George Hotz
|
0eb710de84
|
move WMMA out of lowerer [run_process_replay] (#6647)
|
2024-09-22 14:05:51 +08:00 |
|
George Hotz
|
84703d5b77
|
replace the lowerer with a contextual PatternMatcher [run_process_replay] (#6646)
* replace the lowerer with a contextual PatternMatcher [run_process_replay]
* todo
* it's REDUCE by the time it's in lowerer
|
2024-09-22 13:22:26 +08:00 |
|
qazal
|
4751159139
|
second iteration on viz/serve.py (#6643)
* small detail in checkStatus
* better abstractions for the api
* update test_viz
* ui updates
|
2024-09-22 08:49:44 +08:00 |
|