2771 Commits

Author SHA1 Message Date
George Hotz
0cbf6c1811 move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz
b1f7f29525 metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
chenyu
d86ea188dd support symbolic shape in Interpreted (#2289)
* support symbolic shape in Interpreted

* simpler

* no InterpretedFlopCounter

* tragic NumNode

* regex is hard
2023-11-13 20:13:18 -05:00
George Hotz
6960bcded0 back to 6.54GB for stable diffusion (#2288)
* back to 6.54GB for stable diffusion

* cleanups

* only outputs, not inputs

* err, restore hack for world
2023-11-13 16:50:04 -08:00
nimlgen
960535dfb8 get_linearizer_actions does not return illegal actions (#2287)
* fix some linearizer failures

* linter happy

* no new test class
2023-11-13 11:48:54 -05:00
rodfer
53c5baa8b6 add dilation to avg_pool2d (#2270)
* add dilation to avg_pool2d

* avg_pool_fix

* avg_pool_fix

* woo

* oops

* force it correct

---------

Co-authored-by: rodfer0x80 <rodfer0x80@proton.me>
Co-authored-by: zibokapi <zibokapi@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-13 08:47:56 -08:00
chenyu
a72b370066 llama take int and convert to Variable internally (#2284) 2023-11-12 17:11:37 -05:00
valar
123ea051e6 refactor/ci: delete many # type: ignore (#2281)
* refactor/ci: delete many `# type: ignore`

* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag

refs #2240

* ci: move `--warn-unused-ignores` flag to mypy config

refs #2240
2023-11-12 11:04:20 -08:00
George Hotz
2e2154ae4f bad hotfix for optimize_local_size, try again 2023-11-12 10:41:11 -08:00
George Hotz
270f747065 hotfix optimize_local_size (TODO: add regression test) 2023-11-12 10:29:00 -08:00
chenyu
f5a62a1b42 fix some tests related to JitItem (#2279) 2023-11-11 23:00:35 -05:00
chenyu
5ef8d682e3 clean up attentions in stable diffusion (#2275) 2023-11-11 14:25:36 -05:00
chenyu
453f48ce02 pad None means (0,0) (#2273) 2023-11-11 09:50:26 -08:00
jxdv
c5d70c1871 typo (#2271) 2023-11-11 07:18:04 -08:00
chenyu
880e693207 fix llama n_kv_heads in kvcache (#2267)
* fix llama n_kv_heads in kvcache

* trigger ci
2023-11-10 21:44:39 -05:00
George Hotz
78623ba204 two simple tests 2023-11-10 16:16:06 -08:00
George Hotz
70fb8a259d hotfix mypy 2023-11-10 15:43:30 -08:00
George Hotz
6ceea02e65 hotfix of onnx 2023-11-10 15:40:30 -08:00
geohotstan
b853e9bb8c Onnx 1.15.0 gogogo (#2217)
* lol

* lol

* add GELULULULUL

* onnx 1.50

* fuk torch bool neg

* exclude regex tests

* exclude dequantizelinear for now

* is sunny in philly

* damn it affinegrid

* fixed auto_pad VALID

* skip 0 shape tests

* add temporary cast in Reduces

* tests should pass now

* added comments and cleanup

* try moving dequantizelinear to onnx.py

* fixed dequantizedlinear?

* cleanup

* try?

* float16 segfaults LLVM CI..???

* cleanup comments

* pin to 1.50.0

* remove use of -np.inf cuz numpy is kill

* 1.50? lol I'm actually retarded

* thx for review, muhbad

* moved Gelu higher up
2023-11-10 15:36:48 -08:00
George Hotz
85d26ddc36 uops loop removal (#2262)
* remove the loop

* cleanups

* tests failing still

* global_loop_ctx wasn't needed

* replace_op is cleaner

* minor opt

* cast opt was wrong

* uop_num

* uop num was dumb

* tuplize_uops

* torch tests

* fix test_uops
2023-11-10 15:24:47 -08:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
qazal
b6aaf12df7 Internal cast 2 with more tests (#2257)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3.

* Revert "render phi as the dtype"

This reverts commit d08cb270b4.

* reenable triton tests

* no vstore_half if dtype is already half

* upcast max
2023-11-10 10:42:39 -08:00
George Hotz
c0f447d6f7 Inline barrier (#2255)
* put barrier inline for locals

* fix pre-commit on m3

* gate if through barrier
2023-11-10 08:17:10 -08:00
chenyu
75f6e9ab54 one more fuzz linearizer failed example (#2260) 2023-11-10 09:17:37 -05:00
George Hotz
330484c072 Revert "Internal casting support (#2046)" (#2256)
This reverts commit 7e1d08b2ae.
2023-11-09 21:27:13 -08:00
qazal
7e1d08b2ae Internal casting support (#2046)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3.

* Revert "render phi as the dtype"

This reverts commit d08cb270b4.

* reenable triton tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-09 21:02:32 -08:00
vish-pr
6051f0ce82 For cuda get current free space from device, and retry alloc failures (#2197)
* For cuda get current free space from device, and retry alloc failures

* type ignore for mypy

* add init to get free mem in cuda

* Move retry logic in common lib.

Fix typo in override _get_cur_free_space

* linter error fix in test file

* Not catch all, as it will catch KeyboardInterrupt

* fix unintended line changes
2023-11-09 15:53:50 -08:00
qazal
2465d5d267 fix ops tests in test_dtype (#2237)
* fix test ops

* decompose the err from test_ops

* skipTest skips the entire test, we dont want that

* handle cases with the same priority

* add int16 to torch map
2023-11-09 15:17:43 -08:00
George Hotz
80bf0b8586 proper wmma (#2245)
* proper wmma

* hip cast

* bugfixes

* bugfix

* that bug is fixed

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2023-11-09 15:15:18 -08:00
wozeparrot
b7a31fb708 remove tokei badge from readme (#2251) 2023-11-09 13:53:31 -05:00
2-5
50bf0703aa fix sqlite cache path on Windows (#2250) 2023-11-09 10:32:34 -08:00
chenyu
10d642e174 fuzz linearizer transformation (#2188)
* fuzz linearizer transformation

* no standard normal for fp16

* work

* Interpreted start

* CPU and TORCH work

* fix MemBuffer with same idx

* id for failed kernels

* no image and variable for Interpreted

* symbolic shape

* IMAGE only for GPU

* Interpreted almost all good

* cleanup

* fix bufs_from_lin

* zero size

* some failed examples

* just Exception

* just test not pass
2023-11-09 08:03:27 -08:00
chenyu
794122781d Merge pull request #2242 from chenyuxyz/mypy-casts
mypy check warn_redundant_casts
2023-11-08 20:04:46 -05:00
George Hotz
38b7f5a7fd less phi, proper phi (#2241)
* less phi, proper phi

* disable flaky whisper test
2023-11-08 16:13:43 -08:00
chenyu
b9fe133af8 mypy check warn_redundant_casts 2023-11-08 15:06:55 -08:00
wozeparrot
4c44d1344b feat: remove cache_id (#2236) 2023-11-08 08:09:21 -08:00
Rory Clear
553688f12a update metal matmul and matvec for compile api (#2238) 2023-11-08 08:08:35 -08:00
George Hotz
3042450b4d diskcache touchups (#2235) 2023-11-07 18:00:04 -08:00
George Hotz
09bdd55acc update debug prints 2023-11-07 17:47:25 -08:00
George Hotz
c0a033f01d remove real_offset (#2234)
* remove real_offset

* pass in numnode

* remove that real_offset

* sample only for variable
2023-11-07 17:30:53 -08:00
George Hotz
4d95e6d070 move cache out of tmp (#2232) 2023-11-07 11:41:00 -08:00
George Hotz
a48ccdb359 cleanup deps, no pyyaml, pillow to testing (#2231) 2023-11-07 10:32:23 -08:00
nimlgen
ae5d1407ee Fix mmaped in jit (#2225)
* fix reuse for mmaped buffers in jit

* comment
2023-11-06 14:54:21 -08:00
George Hotz
0c9b4ab885 no to_underlying (#2222)
* no to_underlying

* context is no longer used

* no more optimizing

* update docs
2023-11-05 21:34:20 -08:00
George Hotz
fbe7f0c62b metal: unwrap lib write 2023-11-05 21:02:31 -08:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00
George Hotz
c60c3b467a clean up symlinking in benchmark (#2219)
* clean up symlinking

* make torch deterministic
2023-11-05 16:46:05 -08:00
George Hotz
baeb77a403 Make the JIT simple (no batch exec, no cache collector) (#2215)
* remove batch exec

* simple cachecollector

* remove cache collector test

* less lr
2023-11-05 16:23:43 -08:00
chenyu
719a97b337 fix IMAGE=2 failed with NOOPT=1 (#2209)
* IMAGE=2 failed with NOOPT=1

* fix it
2023-11-05 13:16:37 -08:00
chenyu
680cbfdba4 less broken limit_dims_to_max (#2214) 2023-11-04 08:38:06 -07:00