Commit Graph

10417 Commits

Author SHA1 Message Date
imaolo
228b310478 align cpu buffer before copy into cl buffer (#2135) 2023-10-23 21:04:35 -04:00
imaolo
6ee0435263 added from unaligned np test (#2134) 2023-10-23 11:38:57 -04:00
George Hotz
3c56c181f6 string formatting 25 -> 30 to fit 2023-10-22 10:57:34 -07:00
George Hotz
6dc8eb5bfd universal disk cache (#2130)
* caching infra for tinygrad

* non str key

* fix linter

* no shelve in beam search

* beam search caching

* check tensor cores with beam too

* pretty print

* LATEBEAM in stable diffusion
2023-10-22 10:56:57 -07:00
Francis Lam
ace6b2a151 optimizer: add test for correctness of opts (#2124)
* optimizer: add test for correctness of opts

Also added OptOps.UPCASTMID to constrain valid axes for opts with
group_for_reduce.

* llvm: fix LinearizerOptions to correctly not has_shared

* optimizer: remove premature test scaffold for TC opts

* search: fix the action space
2023-10-22 08:02:22 -07:00
George Hotz
abeba8f1fc optimization: get actions in CI (#2125)
* get actions in CI

* actually run the test

* pythonpath
2023-10-20 12:22:01 -07:00
qazal
14625721e9 minor triton casting refactor (#2118)
* minor refactor

* render_cast taking an x like cstyle

* fix fmt strings

* tl.where

* fix alu render

* use dtype

* newline eof

* better diff
2023-10-20 12:11:55 -07:00
George Hotz
cb508e6923 uops graphing + phi (#2120)
* uops graphing

* add_phi_node

* less phi nodes

* where graph uops should live

* naming

* move it to external

* fix triton yolo

* fix clang and preserve behavior
2023-10-19 22:26:28 -07:00
20kdc
bedd028061 waifu2x vgg7: testcase, auto-RGBA->RGB, function to grab pretrained models, training "fix" (#2117) 2023-10-19 22:07:15 -07:00
Szymon Ożóg
e0b2bf46b4 Improve triton generated code quality (#2119) 2023-10-19 22:06:19 -07:00
qazal
36d4001b4f add test coverage for search (#2104)
* add test coverage for search

* only in compiled backends

* dont use device.default in decorator

* time_til is the other way around xd
2023-10-19 17:06:47 -07:00
Szymon Ożóg
7268b3c6fb make triton not write to disk (#2116) 2023-10-18 23:06:47 -07:00
David Hou
95e17ff0d4 fix wino mask upcast calculation (#2057)
* fix wino mask upcast calculation

* add tests for wino upcast hcopt

* add info to note

* real world wino hcopt test

* wino backward test

* whitespace
2023-10-18 16:54:48 -07:00
George Hotz
5cfec59abc hlb cifar touchups (#2113)
* types and cnt and EVAL_STEPS

* eval time + always print eval
2023-10-18 16:26:15 -07:00
chenyu
5d5921d2c8 small doc env update (#2112) 2023-10-18 14:49:25 -07:00
George Hotz
4526891db7 parallel apt (#2111) 2023-10-18 14:49:00 -07:00
George Hotz
87b714b8cb split test_conv2d 2023-10-18 14:00:50 -07:00
George Hotz
15da96f393 print test durations and add speed (#2107)
* print test durations

* decrease sizes to increase speed

* faster

* GPU/CLANG onnx in separate runner

* test split, move ONNX CPU CI

* simpler tests

* simpler uops test

* faster

* less cuda apt

* running ninja install

* apt install

* split fancy indexing
2023-10-18 13:46:42 -07:00
George Hotz
e2a1c2aaa6 force ruff reinstall 2023-10-18 11:40:46 -07:00
George Hotz
0d2b3a9d33 full path for ruff 2023-10-18 11:27:49 -07:00
George Hotz
8940c89d13 tests: remove 2 runners, make cache reliable (#2106)
* remove 2 runners

* device.DEFAULT printing

* explain rebuild

* disable ocelot rebuild

* try again to fix workflow

* this? fix cache hash

* force no rebuild

* fix pylint
2023-10-18 11:10:41 -07:00
George Hotz
b3afe0106b typo, src printing, and no verbose on triton (#2105) 2023-10-18 09:44:36 -07:00
20kdc
967a88a505 examples/waifu2x: Cleanup waifu2x vgg7 model format (now uses safetensors) (#2082) 2023-10-18 09:20:11 -07:00
George Hotz
881fd7c141 add mops to graph, refactor IMAGE (#2100)
* add mops to graph, refactor IMAGE

* no reshape pushing

* add todo

* fix openpilot model alt

* push reshapes reduces kernels in new op

* IMAGE=2 is a first class citizen now
2023-10-17 21:27:51 -07:00
George Hotz
2498802b46 fix beam search for llvm, this needs tests (#2101) 2023-10-17 20:09:42 -07:00
wozeparrot
4d1e59abfd fix: only when distributed (#2102) 2023-10-17 20:09:04 -07:00
Sean D'Souza
999c95ea29 fix: hlb cifar types (#2099) 2023-10-17 19:23:50 -07:00
George Hotz
9b1c3cd9ca hlb_cifar: support EVAL_STEPS=1000, print when dataset is shuffled 2023-10-18 01:11:08 +00:00
Ahmed Harmouche
2b5ea7d9cb Fix output Float32Array size in webgpu export (#2096) 2023-10-17 15:28:19 -07:00
Umut Zengin
01b98b7f42 MulNode.__lt__ rule (#2086)
* Added the rule

* Added tests

* flake8

* self.b == -1 shortcut
2023-10-17 13:18:35 -07:00
Szymon Ożóg
f76fbd23e9 cleanup triton (#2092)
* Revert "disable flaky triton test"

This reverts commit 1e15fdaee7.

* Update test.yml

* check if has shared for matvec

* disable ocelot cache for triton

* disable ocelot cache

* disable ocelot cache

* pass shared to triton uops tests

* temporary debugs for CI crash

* Revert "temporary debugs for CI crash"

This reverts commit fee3ea96c8.

* Revert "triton isn't tested, and allows this refactor (#2007)"

This reverts commit dea8bb0938.

* add runtime_args to every renderer, move triton local size override to runtime args

* Add binary to args, correct type returned

* update to new loops

* Update test.yml

* cleanup triton
2023-10-17 12:49:44 -07:00
Szymon Ożóg
4bef1591f0 Disable ocelot cache + fix matvec in triton (#2010)
* Revert "disable flaky triton test"

This reverts commit 1e15fdaee7.

* Update test.yml

* check if has shared for matvec

* disable ocelot cache for triton

* disable ocelot cache

* disable ocelot cache

* pass shared to triton uops tests

* temporary debugs for CI crash

* Revert "temporary debugs for CI crash"

This reverts commit fee3ea96c8.

* Revert "triton isn't tested, and allows this refactor (#2007)"

This reverts commit dea8bb0938.

* add runtime_args to every renderer, move triton local size override to runtime args

* Add binary to args, correct type returned

* update to new loops

* Update test.yml
2023-10-17 10:33:32 -07:00
geohotstan
5ed630204b Add ONNX to CI for other backends (#2069)
* some cleanup

* move continue back

* more more more

* added to CI

* try

* try intentionally break some tests

* wtf

* del True for test

* yay tests broke, now pls no break

* try AGAIN

* gahy

* lol

* try

* move over constant

* moved over MORE

* move shrink over

* trailing lines

* try CUDA CI

* try again

* boom

* oops

* improved comments

* try: disable some flags and disable CUDA

* try breaking tests

* traceback has too much info so add --tb=no

* revert forced CI failure

* add comments and del unused imports

* oooooooo using regular debug try enable tb

* intentionally break tests

* added tb back. Maybe not too verbose

* strip whitespace

* missed something

* Shape op int32 -> int64

* oops missed something

* add some types

* get rid of crazy 1 liners in pad op

* actually test Split this time LOL

* strip that whitespace
2023-10-17 09:33:54 -07:00
George Hotz
5a4a62ecae Disable logging in early compile2 and lower kernel counts (#2090)
* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.

* gate behind OPT >= 4

* disable_logging in schedule

* simple

* from master

* more images

* revert that

* 206 kernels
2023-10-16 20:15:24 -07:00
George Hotz
442a27db8a shouldn't do anything (#2091) 2023-10-16 18:18:34 -07:00
George Hotz
1bf4aef0f5 fix image dtype cmp (#2089)
* fix image dtype cmp

* print that with debug 3
2023-10-16 17:52:38 -07:00
George Hotz
e4846771b2 Revert "limit metal buffers and revert the 207 fix (try 2) (#2088)"
This reverts commit 5e24dc5a95.
2023-10-16 17:50:11 -07:00
George Hotz
d0aaf7d83b Revert "Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)""
This reverts commit f22a7cf656.
2023-10-16 17:47:00 -07:00
George Hotz
5e24dc5a95 limit metal buffers and revert the 207 fix (try 2) (#2088)
* limit metal buffers

* look at the base, not the srcs

* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.

* add a test for that
2023-10-16 14:52:16 -07:00
George Hotz
e8fcd2f3db Revert "limit metal buffers and revert the 207 fix (#2087)"
This reverts commit 2fb10f6a19.
2023-10-16 14:32:22 -07:00
George Hotz
2fb10f6a19 limit metal buffers and revert the 207 fix (#2087)
* limit metal buffers

* Revert "Revert "openpilot kernel fix from 209 to 207 (#2006)" (#2065)"

This reverts commit 924ecc4d6a.
2023-10-16 14:26:32 -07:00
George Hotz
a7b18ac325 try beam search on device (#2085)
* try beam search on device

* fix beam with nolocals

* ops too

---------

Co-authored-by: Comma Device <device@comma.ai>
2023-10-16 12:52:42 -07:00
George Hotz
c36d306606 KOPT is over, BEAM is upstream (#2071)
* create cache for q learning

* make linter happy

* global beam

* where it belongs

* bugfix

* ditch the kopt, use the beam

* faster lin and DEBUG=2 okay

* remove kopt, move search to features
2023-10-16 09:46:03 -07:00
nimlgen
e4660b024f mute hip warnings (#2081) 2023-10-16 07:09:10 -07:00
George Hotz
5472a14544 openpilot compile2 (#1977)
* start compile2

* tweak

* why are there two more kernels?

* minor cleanups

* don't break onnx tests

* add __metadata__ support to safetensors

* no early realize in onnx

* cleanups

* bugfix

* clean up image type, add optimize

* opt to match old

* try that

* opt work

* run compile2

* optimizer

* prt more

* prerealize

* imp

* NOLOCALS works

* no locals means no locals

* support fractional globals

* all locals welcome

* int that

* cleanups

* show gemv regression

* clean up diff

* use idx for the cond

* nolocals

---------

Co-authored-by: Comma Device <device@comma.ai>
2023-10-15 20:39:46 -07:00
George Hotz
566660675c bugfix for cuda warning (#2078) 2023-10-15 18:35:35 -07:00
Ahmed Harmouche
0d3410d93f Stable diffusion: Make guidance modifiable (#2077) 2023-10-15 14:36:43 -07:00
Umut Zengin
776605f2fc O(1) VALIDHACKS (#2072)
* first refactoring

* O(1) validhacks

* O(1) validhacks

* Some cleaning

* mypy

* flake8

* Trim trim

* flake8

* clean

* less chaotic

* less chaotic

* flake8

* Symbolic, SumNode include mulnode for gcd

* fix tests

* small optim

* revert

* clean

* clean

* flake8

* small fix

* Add symbolic test
2023-10-15 11:26:41 -07:00
George Hotz
30933d5bd0 if support (#2076)
* if support

* bugfix

* fix wgsl if

* more correct wgsl fix
2023-10-15 07:17:37 -07:00
nimlgen
cb9309bee6 remove temp files (#2075) 2023-10-15 06:45:36 -07:00