Commit Graph

2740 Commits

Author · SHA1 · Message · Date
chenyu
10d642e174 fuzz linearizer transformation (#2188)
* fuzz linearizer transformation

* no standard normal for fp16

* work

* Interpreted start

* CPU and TORCH work

* fix MemBuffer with same idx

* id for failed kernels

* no image and variable for Interpreted

* symbolic shape

* IMAGE only for GPU

* Interpreted almost all good

* cleanup

* fix bufs_from_lin

* zero size

* some failed examples

* just Exception

* just test not pass
2023-11-09 08:03:27 -08:00
chenyu
794122781d Merge pull request #2242 from chenyuxyz/mypy-casts
mypy check warn_redundant_casts
2023-11-08 20:04:46 -05:00
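The warn_redundant_casts setting referenced above is a mypy flag that reports typing.cast calls whose target type the value already has. A minimal, illustrative Python example (not code from this repository) of what the check flags:

```python
# With warn_redundant_casts enabled in the mypy config, a cast to a type the
# value already has is reported and can simply be deleted.
from typing import cast

def scale(x: int) -> int:
    y = cast(int, x)  # mypy: Redundant cast to "int"
    return y * 2
```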
George Hotz
38b7f5a7fd less phi, proper phi (#2241)
* less phi, proper phi

* disable flaky whisper test
2023-11-08 16:13:43 -08:00
chenyu
b9fe133af8 mypy check warn_redundant_casts 2023-11-08 15:06:55 -08:00
wozeparrot
4c44d1344b feat: remove cache_id (#2236) 2023-11-08 08:09:21 -08:00
Rory Clear
553688f12a update metal matmul and matvec for compile api (#2238) 2023-11-08 08:08:35 -08:00
George Hotz
3042450b4d diskcache touchups (#2235) 2023-11-07 18:00:04 -08:00
George Hotz
09bdd55acc update debug prints 2023-11-07 17:47:25 -08:00
George Hotz
c0a033f01d remove real_offset (#2234)
* remove real_offset

* pass in numnode

* remove that real_offset

* sample only for variable
2023-11-07 17:30:53 -08:00
George Hotz
4d95e6d070 move cache out of tmp (#2232) 2023-11-07 11:41:00 -08:00
George Hotz
a48ccdb359 cleanup deps, no pyyaml, pillow to testing (#2231) 2023-11-07 10:32:23 -08:00
nimlgen
ae5d1407ee Fix mmaped in jit (#2225)
* fix reuse for mmaped buffers in jit

* comment
2023-11-06 14:54:21 -08:00
George Hotz
0c9b4ab885 no to_underlying (#2222)
* no to_underlying

* context is no longer used

* no more optimizing

* update docs
2023-11-05 21:34:20 -08:00
George Hotz
fbe7f0c62b metal: unwrap lib write 2023-11-05 21:02:31 -08:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz
c60c3b467a clean up symlinking in benchmark (#2219)
* clean up symlinking

* make torch deterministic
2023-11-05 16:46:05 -08:00
George Hotz
baeb77a403 Make the JIT simple (no batch exec, no cache collector) (#2215)
* remove batch exec

* simple cachecollector

* remove cache collector test

* less lr
2023-11-05 16:23:43 -08:00
chenyu
719a97b337 fix IMAGE=2 failed with NOOPT=1 (#2209)
* IMAGE=2 failed with NOOPT=1

* fix it
2023-11-05 13:16:37 -08:00
chenyu
680cbfdba4 less broken limit_dims_to_max (#2214) 2023-11-04 08:38:06 -07:00
Ahmed Harmouche
265304e7fd Stable diffusion WebGPU port (#1370)
* WIP: Stable diffusion WebGPU port

* Load whole model: split safetensor to avoid Chrome allocation limit

* Gitignore .DS_Store, remove debug print

* Clip tokenizer in JS

* WIP: Compile model in parts (text model, diffusor, get_x_prev_and_pred_x0, decoder), and recreate forward logic in JS

* e2e stable diffusion flow

* Create initial random latent tensor in JS

* SD working e2e

* Log if some weights were not loaded properly

* Remove latent_tensor.npy used for debugging

* Cleanup, remove useless logs

* Improve UI

* Add progress bar

* Remove .npy files used for debugging

* Add clip tokenizer as external dependency

* Remove alphas_cumprod.js and load it from safetensors

* Refactor

* Simplify a lot

* Dedup base when limiting elementwise merge (webgpu)

* Add return type to safe_load_metadata

* Do not allow run when webgpu is not supported

* Add progress bar, refactor, fix special names

* Add option to choose from local vs huggingface weights

* lowercase tinygrad :)

* fp16 model dl, decompression client side

* Cache f16 model in browser, better progress

* Cache miss recovery

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-03 18:29:16 -07:00
chenyu
f582ec56d5 Replace (getenv("CI", "") != "") with helpers.CI (#2213) 2023-11-03 15:20:44 -07:00
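The commit above swaps the repeated environment check for a single helpers.CI constant. A minimal sketch of that pattern, using a stand-in getenv rather than tinygrad's actual helpers module:

```python
# Illustrative only: compute the CI flag once at import time so call sites can
# read helpers.CI instead of repeating the getenv("CI", "") != "" comparison.
import os

def getenv(key: str, default=""):
    # stand-in for a helpers.getenv-style accessor
    return os.environ.get(key, default)

CI = getenv("CI", "") != ""  # evaluated once at import

if CI:
    print("running under CI")
```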
George Hotz
f17bc16f46 simple runtime args (#2211)
* simple runtime args

* fix some tests

* fix abstractions and triton

* fix search
2023-11-03 12:31:29 -07:00
George Hotz
9ea0448103 compile interpreted to python code (#2208)
* sort of works

* interpreted

* fix flopcounter

* interpreted

* simpler

* type

* functools compile ast

* lose a line

* delete extra file

* no self.method_cache
2023-11-03 09:16:12 -07:00
George Hotz
ddbc6eecaf some refactors in the realization (#2206)
* some refactors

* delete old kernel search
2023-11-02 19:51:28 -07:00
George Hotz
51fd993f1f pin onnx to 1.14.1 2023-11-02 18:03:21 -07:00
George Hotz
6621d2eb98 Revert "Modernize setup.py (#2187)"
This reverts commit 7e8c5f1a0f.
2023-11-03 01:01:15 +00:00
nimlgen
6e06adcb95 fix hip segfault (#2204) 2023-11-02 08:40:56 -07:00
George Hotz
03cf0afa4f move all to compile api (#2203)
* move metal+clang to compile api

* all to the new style

* remove binary arg

* fix triton

* fixup tests

* fix clang

* diskcache is generic

* __wrapped__

* compile_gpu

* fix thneed

* keep the src in the ASTRunner

* lib

* move compile_gpu

* compile_gpu in device

* put compiler in astrunner

* test reverts

* triton compiler

* ugh, that too
2023-11-01 23:01:32 -07:00
George Hotz
8932816816 remove arm64, caching for cuda (#2201)
* remove arm64, caching for cuda

* caching in llvm

* switch cache_compiled to new cache

* fix clang

* caching for metal

* fix pylint

* cleanups

* perf_counter and binary
2023-11-01 18:44:00 -07:00
George Hotz
7103b716c4 merge kernel and optimizer (#2200)
* merge kernel and optimizer

* linearize is reentrant

* move global/local size

* clean up linearizer copy

* remove unneeded lin copies

* stop linearizing twice

* oops, that should be None
2023-11-01 15:20:01 -07:00
George Hotz
33bb650e94 use mad in opencl (#2198)
Co-authored-by: Comma Device <device@comma.ai>
2023-11-01 10:40:08 -07:00
George Hotz
c8b6a811ea no locals as opt action (#2196)
* switch barrier, add clear_l2

* no locals can be searched

* revert barrier

* fix ci

* put it there
2023-11-01 09:47:44 -07:00
Comma Device
2e9982fe2d fastvits example that's 10% faster 2023-10-31 21:48:23 -07:00
George Hotz
8ba7ced7f9 extract const if it's const (#2193)
* extract const if it's const

* fix if statement

* fast math issue

* fix graphing and casting

* disable flaky copyout test
2023-10-31 18:52:35 -07:00
George Hotz
b245f1307e add exp2 (#2192) 2023-10-31 17:48:42 -07:00
qazal
e2428b63a6 external (#2191) 2023-10-31 13:57:24 -07:00
Elias Wahl
7e8c5f1a0f Modernize setup.py (#2187)
* Added pyproject.toml

* Pin onnx
2023-10-31 13:55:45 -07:00
nimlgen
8c07c73a9b Fix cl map buffer (#2190)
* fix gpu enqueue_map_buffer out of space

* add test
2023-10-31 12:02:46 -07:00
George Hotz
c59ea32f90 prevent over-unrolling in optimizer 2023-10-31 11:45:18 -07:00
George Hotz
5aaa8a0cc1 fix shape 2023-10-31 11:36:19 -07:00
George Hotz
a27c9f9de5 openpilot compile2 (#2189)
* try compile2

* pass to thneed

* fix tanh onnx
2023-10-31 11:08:58 -07:00
qazal
be5f185ac0 Higher test coverage for dtypes (#2156)
* refactor unit tests for dtypes

* add missing dtypes in llvmir.py and lib.py

* skip torch tests

* webgpu

* cleaner skips

* fix llvm bool casting issue using compare

* llvm 100% passing

* llvm segfault

* TEMP decrease timeout mins to 11

debug

* add bf16 to setup

* skip half tests in cuda cpu

* check for CUDACPU instead

* add int16 to triton dtypes

* u16 for triton

* remove debug - diff is still hard to read

* derive from base class TestDType

* enhance test_upcast and downcast by running on every possible version

* dummy commit to rerun the flakey test

* skip the correct tests for CUDA

* bf16 should be skipped in the common TestDType cases

* re-enable bf16

* more consistent structure

* tiny changes to is_dtype_supported 1

* tiny changes 2

add reason

* fuzz

* fuzzer p2

* run fp32 twice

* remove duplicate fp32 run

* clang: use stdbool

* skip triton on bool casts

* merge and resolve conflicts
2023-10-30 22:38:42 -07:00
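Among the items in the dtype-coverage PR above are "cleaner skips" and an is_dtype_supported helper. A hedged, self-contained sketch of that skip pattern (the capability check here is illustrative, not tinygrad's actual helper):

```python
# Shared base test class that skips itself when the backend does not support
# the dtype under test; subclasses only override DTYPE.
import unittest

SUPPORTED = {"float32", "int32"}  # illustrative per-backend capability set

def is_dtype_supported(dtype: str) -> bool:
    return dtype in SUPPORTED

class TestDType(unittest.TestCase):
    DTYPE = "float32"
    def setUp(self):
        if not is_dtype_supported(self.DTYPE):
            raise unittest.SkipTest(f"{self.DTYPE} not supported on this backend")

class TestHalfDType(TestDType):
    DTYPE = "half"  # skipped unless the backend supports half
```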
forcefieldsovereign
f294bdd681 fixed imports (#2185) 2023-10-30 22:07:17 -07:00
Akshay Kashyap
018bd29e37 Enable Multi-Output Export (#2179)
* Enable Multi-Output Export

* Add test

* Update examples and lint

* fix padding

* test ops

* dummy commit to rerun test

* revert cuda lint

* Enforce tuple/list of tensors

* subscripted generics

* put back webgpu test

* Re-enable WebGPU Efficientnet test
2023-10-30 18:42:26 -07:00
qazal
a7439af786 Fix llvm int->bool cast (#2164)
* add to ir

* add test case

* minimize diff

* todo

* enable fast math

* added both False and True case
2023-10-30 15:28:23 -07:00
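The int->bool cast fix above lowers the cast as a compare against zero rather than a truncation, so any nonzero integer becomes True. A minimal llvmlite sketch of that idea, not tinygrad's actual IR renderer:

```python
# Build an i32 -> i1 conversion as `icmp ne x, 0` instead of truncating to 1 bit.
from llvmlite import ir

module = ir.Module(name="int_to_bool_example")
func = ir.Function(module, ir.FunctionType(ir.IntType(1), [ir.IntType(32)]), name="int_to_bool")
builder = ir.IRBuilder(func.append_basic_block())
(x,) = func.args
as_bool = builder.icmp_unsigned("!=", x, ir.Constant(ir.IntType(32), 0))  # i1 result
builder.ret(as_bool)
print(module)  # dumps the generated IR
```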
George Hotz
94cf652b6b don't use locals applies to GROUP also 2023-10-30 13:56:43 -07:00
George Hotz
5cc536bcc0 don't use locals applies to LASTLOCAL 2023-10-30 13:53:42 -07:00
chenyu
3c88af5071 use unique table name for each disk_cache test (#2184) 2023-10-30 13:49:49 -07:00
George Hotz
608e3ee800 fix no locals search and search both (#2171)
* fix no locals search and search both

* pretty print

* nolocals default no other search
2023-10-30 10:22:50 -07:00
George Hotz
194e4ad6f8 Revert "optimizer: simplify GROUP and LOCAL to have one of each (#2162)" (#2182)
This reverts commit 8cf0bb9351.
2023-10-30 10:22:26 -07:00