Commit Graph

992 Commits

chenyu
f02e17a967 Variable.num -> NumNode (#2354) 2023-11-18 15:45:52 -05:00
George Hotz
40246d35bc ops_shm removed (#2351)
* ops_shm removed

* buf.cast

* err, forgot those
2023-11-18 11:41:58 -08:00
chenyu
6e44a798df update fixed linearizer test (#2347)
* update fixed linearizer test

* except CLANG
2023-11-17 23:46:37 -05:00
George Hotz
c7b38b324b A beautiful MNIST training example (#2272)
* beautiful mnist

* beautiful mnist example

* from tinygrad import Tensor

* more beautiful

* the jit is super core tinygrad

* globalcounters reset on jit run

* symlinks and exclude

* beautiful_cartpole

* evaluate is its own function

* no symlinks

* more beautiful

* jit reset for double speed

* type hinting for JIT

* beautiful_mnist gets 98%

* beautiful_mnist < 4s with BEAM=2

* better cartpole

* use actor critic

* zero_grad got lost

* delete double relu

* stable cartpole with PPO

* beautiful_cartpole is more beautiful

* REPLAY_BUFFER

* beautiful stuff typechecks

* None support in shape

* hp tuning
2023-11-17 19:42:43 -08:00
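
The example this commit adds pairs a small model with a TinyJit-wrapped training step. Below is a minimal sketch of that pattern, assuming the tinygrad API of this era; the import paths, layer sizes, and helper names are my guesses, not the contents of the actual example file.

    from tinygrad import Tensor            # the commit makes this top-level import work
    from tinygrad.jit import TinyJit       # assumed import path at this point in history
    from tinygrad.nn.optim import Adam

    class TinyNet:
      def __init__(self):
        self.l1 = Tensor.scaled_uniform(784, 128)
        self.l2 = Tensor.scaled_uniform(128, 10)
      def __call__(self, x: Tensor) -> Tensor:
        return x.matmul(self.l1).relu().matmul(self.l2)

    model = TinyNet()
    opt = Adam([model.l1, model.l2], lr=3e-4)

    @TinyJit
    def train_step(x: Tensor, y: Tensor) -> Tensor:
      with Tensor.train():                 # enable training mode for the step
        opt.zero_grad()                    # per the commit, zero_grad is easy to lose
        loss = model(x).sparse_categorical_crossentropy(y).backward()
        opt.step()
        return loss.realize()              # jitted functions return realized tensors
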
chenyu
d2c0035c73 add back as_strided, move rebuilt mops to extra (#2344)
* add back as_strided, move rebuilt mops to extra

* negative stride for ops_cpu

* Revert "negative stride for ops_cpu"

This reverts commit a13b6815ac.

* skip that

* style
2023-11-17 14:34:30 -05:00
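
For reference, the as_strided semantics these rebuilt movement ops emulate, shown with numpy; this is an illustration of the op, not the PR's code.

    import numpy as np
    a = np.arange(10)
    # four overlapping length-4 windows over one buffer, advancing one element per row
    windows = np.lib.stride_tricks.as_strided(
        a, shape=(4, 4), strides=(a.itemsize, a.itemsize))
    print(windows[0], windows[1])   # [0 1 2 3] [1 2 3 4]
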
chenyu
ad3d7428fa good line shaves in st and faster (#2343) 2023-11-17 11:00:26 -05:00
chenyu
8e22c0d95c everything can jit now (#2338) 2023-11-16 23:54:57 -05:00
George Hotz
1d5501594e force rebuild of ocelot (#2334)
* force rebuild of ocelot

* SzymonOzog gpuocelot

* delete that

* downgrade that

* non parallel

* force rebuild

* use llvm

* nauto

* less mem maybe

* print test

* helper_test_exception skip CUDACPU

* helper_test_exception

* shippable
2023-11-16 20:44:14 -08:00
imaolo
0d0c74bac9 Assert for memory allocation failures (#2337)
* assert adequate memory has been freed

* cleaned up runtime error message

* improved metal buffer alloc error catching and reporting

* decreased lines and altered messages

* removed unnecessary _get_cur_free_space() call

* improved assert message

* added allocate massive buffer test

* added test_lru_allocator_metal_max_buffer_length

* split into two asserts and removed walrus assignment from assert expression

* update assert message and use byte data type for clarity
2023-11-16 20:14:16 -08:00
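
A sketch of the kind of guard this commit adds. Apart from _get_cur_free_space, which the commit body names, every identifier here is invented; the real checks live in the Metal allocator.

    class CheckedAllocator:
      def __init__(self, total_bytes: int):
        self.total_bytes, self.used_bytes = total_bytes, 0
      def _get_cur_free_space(self) -> int:
        return self.total_bytes - self.used_bytes
      def alloc(self, size: int) -> int:
        free = self._get_cur_free_space()   # called once, per the cleanup bullet
        # two separate asserts, with sizes reported in bytes for clarity
        assert size <= self.total_bytes, f"buffer of {size} bytes exceeds device maximum"
        assert size <= free, f"out of memory: need {size} bytes, only {free} bytes free"
        self.used_bytes += size
        return self.used_bytes - size       # stand-in for a real buffer handle
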
chenyu
3971259832 fix test_real_world llama (#2335) 2023-11-16 19:50:08 -05:00
Friedrich Carl Eichenroth
75676ab8e1 Profiling-helper (#2321)
* change profiler

* remove unused imports

* remove unused imports

* change lazybuffer references

* remove unused line

* remove unused import

* remove unused stuff

* add types

* typing

* typing

* typing

* trigger actions

* -1 loc

* fixup

* trigger actions

* revert lazy typing changes

* WIP profiler helper

* replace old start & stop profiler

* fixup

* linting

* Update llama.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
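
The PR replaces a paired start/stop profiler with a single helper. A generic context-manager version of that shape, with names of my own invention:

    import time
    from contextlib import contextmanager

    @contextmanager
    def profile_block(name: str):
      st = time.perf_counter()
      try:
        yield
      finally:
        print(f"{name}: {(time.perf_counter() - st) * 1e3:.2f} ms")

    with profile_block("llama forward"):   # wraps a region instead of manual start/stop calls
      sum(i * i for i in range(100_000))   # stand-in workload
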
mmmkkaaayy
8235da11dd whisper: support batch inference, add librispeech WER test (#2074)
* whisper: support batch inference, add librispeech WER test, add kv caching and JIT

* remove JIT_SUPPORTED_DEVICE

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 13:50:08 -08:00
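
The WER the librispeech test measures is word-level Levenshtein distance divided by reference length. A small self-contained version of the metric (not the test's code):

    def wer(ref: str, hyp: str) -> float:
      r, h = ref.split(), hyp.split()
      # d[i][j] = edit distance between first i reference words and first j hypothesis words
      d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
      for i in range(len(r) + 1): d[i][0] = i
      for j in range(len(h) + 1): d[0][j] = j
      for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
          d[i][j] = min(d[i-1][j] + 1,                       # deletion
                        d[i][j-1] + 1,                       # insertion
                        d[i-1][j-1] + (r[i-1] != h[j-1]))    # substitution
      return d[len(r)][len(h)] / len(r)

    print(wer("the cat sat", "the cat sat down"))   # one insertion over three words ~ 0.33
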
George Hotz
3baaf298d6 two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
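
The two-stage trick, illustrated with numpy: scan inside fixed-size blocks, then add the exclusive scan of the per-block sums. This trades one long sequential reduction for two much shallower kernels. A sketch of the idea, not tensor.py's implementation:

    import numpy as np
    x = np.arange(1, 17, dtype=np.float32)      # 16 elements, block size 4
    blocks = x.reshape(4, 4)
    local = blocks.cumsum(axis=1)               # stage 1: independent per-block scans
    offsets = np.concatenate([[0.], blocks.sum(axis=1).cumsum()[:-1]])  # stage 2: block offsets
    result = (local + offsets[:, None]).reshape(-1)
    assert np.allclose(result, x.cumsum())      # matches the single-pass scan
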
chenyu
27f4c26312 fix getitem slice when end < start (#2329) 2023-11-16 11:20:27 -05:00
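
The fixed behavior matches python and numpy slicing, where end < start yields an empty result rather than an error; a zero-size tensor, per the zero-in-shape work below.

    import numpy as np
    a = np.arange(6)
    print(a[4:2].shape)   # (0,): an empty slice, not an exception
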
chenyu
a98511561c fuzz_linearizer same api for interpreted and compiled (#2320) 2023-11-15 17:40:22 -05:00
Marcello Fuschi
b8d460d203 Add Tensor.multinomial (#2295)
* add Tensor.multinomial only with replacement

* add support for 2D input in Tensor.multinomial

* fix multinomial output shape

* allow passing replacement=False to Tensor.multinomial when num_samples=1

* improve tests for Tensor.multinomial

* fix edge case in Tensor.multinomial

* Tensor.multinomial no more staticmethod
2023-11-15 11:38:39 -08:00
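
Assumed usage, mirroring the torch.multinomial semantics the PR describes; the exact signature is my guess from the bullet points.

    from tinygrad.tensor import Tensor
    weights = Tensor([[0.1, 0.8, 0.1],
                      [0.5, 0.5, 0.0]])          # 2D input: one distribution per row
    samples = weights.multinomial(num_samples=4, replacement=True)
    print(samples.shape)                         # expected (2, 4): four draws per row
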
George Hotz
70a65c201e JIT support in Interpreted (#2314)
* factor that out

* jit is supported everywhere

* fix some tests

* there's no jit supported device, the jit is everywhere

* fix test uops
2023-11-15 11:13:38 -08:00
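
With this change TinyJit also works on interpreted backends (CPU, TORCH), not just compiled ones. A minimal usage sketch, assuming the import path of this era:

    from tinygrad.tensor import Tensor
    from tinygrad.jit import TinyJit

    @TinyJit
    def double_relu(x: Tensor) -> Tensor:
      return (x * 2).relu().realize()   # jitted functions should return realized tensors

    for _ in range(5):                  # early calls trace; later calls replay captured kernels
      out = double_relu(Tensor.randn(4, 4))
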
chenyu
9a20bc08d6 Tensor(None) is Tensor([]) (#2316) 2023-11-15 13:49:18 -05:00
chenyu
f1f863c953 allow 0-dim array to broadcast into zero shape tensor (#2315)
* allow 0-dim array to broadcast into zero shape tensor

* not in
2023-11-15 13:12:21 -05:00
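
Taken with the Tensor(None) commit just above, the empty-tensor edge cases as I read these two changes; illustrative asserts, not the repo's tests.

    from tinygrad.tensor import Tensor
    assert Tensor(None).shape == (0,)   # Tensor(None) behaves like Tensor([])
    t = Tensor.ones(0, 3) * 2.0         # a 0-dim scalar broadcasts into a zero-shape tensor
    assert t.shape == (0, 3)
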
George Hotz
4da2ddea6e Interpreted cleanups (#2312)
* move the compiler out of ops

* don't return realized

* var_vals filter, fix custom

* typing
2023-11-15 09:02:23 -08:00
chenyu
123a0b86b2 support zero in shape (#2303)
* zero in shape start

* no assert for that

* if output size is 0, return without exec

* tweak

* strides

* reduce over non-zero

* shrink and expand

* fix import

* test_elementwise where

* cannot reshape from size 0 to size 1

* compiled backend reduce over 0

* zeros for numpy

* reduce over 0 and keepdim resulted in 1

* reduce empty set default values

* compare with same input

* pad test case

* cat test case

* torch does not support that?
2023-11-15 11:57:48 -05:00
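
What the change permits, shown with the numpy semantics tinygrad is brought in line with: reducing over an empty axis returns the reduction's default value, and reshapes are legal as long as the total size stays zero.

    import numpy as np
    z = np.zeros((4, 0))
    print(z.sum(axis=1))          # [0. 0. 0. 0.]: reduce over 0 elements gives the identity
    print(z.reshape(0, 4).shape)  # (0, 4): size stays 0, so the reshape is allowed
    # reshaping from size 0 to size 1 remains an error, as the commit notes
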
geohotstan
3c5a51fb3a aaaaaaa finally (#2310) 2023-11-15 07:12:38 -08:00
kormann
cff8375aa2 make self referential AST fast too (#2278)
* cleanup

* linter

* linter

* linter

* rm .buffers

* linter

* linter

* huh?

* cleanup

* typo

* min diff

* property

* rev

* linter

* no metal hack

* minimal properties

* line

* checkout master

* copy_to_device

* idk

* revert

* type

* type

* faast

* speed test

* cleanup test

* softer test

* monotonic

* harder test

* clean code

* cleanup
2023-11-15 07:12:07 -08:00
chenyu
175cdbe815 fix pad None with value (#2308) 2023-11-14 23:57:05 -05:00
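
My reading of the fix: a None entry in pad's per-axis argument means no padding on that axis, and it must still honor the fill value. Signature assumed, not taken from the PR:

    from tinygrad.tensor import Tensor
    t = Tensor.ones(2, 2)
    p = t.pad((None, (1, 1)), value=-1.0)   # pad only the last axis, filling with -1
    print(p.shape)                          # expected (2, 4)
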
chenyu
fac8633ba8 explicit opts for test_linearizer_failures (#2299)
* explicit opts for test_linearizer_failures

* typo

* update the invalid check
2023-11-14 11:52:38 -05:00
George Hotz
0cbf6c1811 move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz
b1f7f29525 metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
chenyu
d86ea188dd support symbolic shape in Interpreted (#2289)
* support symbolic shape in Interpreted

* simpler

* no InterpretedFlopCounter

* tragic NumNode

* regex is hard
2023-11-13 20:13:18 -05:00
nimlgen
960535dfb8 get_linearizer_actions does not return illegal actions (#2287)
* fix some linearizer failures

* linter happy

* no new test class
2023-11-13 11:48:54 -05:00
rodfer
53c5baa8b6 add dilation to avg_pool2d (#2270)
* add dilation to avg_pool2d

* avg_pool_fix

* avg_pool_fix

* woo

* oops

* force it correct

---------

Co-authored-by: rodfer0x80 <rodfer0x80@proton.me>
Co-authored-by: zibokapi <zibokapi@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-13 08:47:56 -08:00
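
Assumed usage of the new parameter: dilation spaces out the taps of the pooling window, so a 2x2 kernel with dilation=2 covers a 3x3 footprint.

    from tinygrad.tensor import Tensor
    x = Tensor.randn(1, 1, 8, 8)                       # NCHW
    y = x.avg_pool2d(kernel_size=(2, 2), dilation=2)   # signature assumed from the PR title
    print(y.shape)
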
chenyu
a72b370066 llama take int and convert to Variable internally (#2284) 2023-11-12 17:11:37 -05:00
chenyu
f5a62a1b42 fix some tests related to JitItem (#2279) 2023-11-11 23:00:35 -05:00
George Hotz
78623ba204 two simple tests 2023-11-10 16:16:06 -08:00
George Hotz
6ceea02e65 hotfix of onnx 2023-11-10 15:40:30 -08:00
geohotstan
b853e9bb8c Onnx 1.15.0 gogogo (#2217)
* lol

* lol

* add GELULULULUL

* onnx 1.50

* fuk torch bool neg

* exclude regex tests

* exclude dequantizelinear for now

* is sunny in philly

* damn it affinegrid

* fixed auto_pad VALID

* skip 0 shape tests

* add temporary cast in Reduces

* tests should pass now

* added comments and cleanup

* try moving dequantizelinear to onnx.py

* fixed dequantizedlinear?

* cleanup

* try?

* float16 segfaults LLVM CI..???

* cleanup comments

* pin to 1.50.0

* remove use of -np.inf cuz numpy is kill

* 1.50? lol I'm actually retarded

* thx for review, muhbad

* moved Gelu higher up
2023-11-10 15:36:48 -08:00
George Hotz
85d26ddc36 uops loop removal (#2262)
* remove the loop

* cleanups

* tests failing still

* global_loop_ctx wasn't needed

* replace_op is cleaner

* minor opt

* cast opt was wrong

* uop_num

* uop num was dumb

* tuplize_uops

* torch tests

* fix test_uops
2023-11-10 15:24:47 -08:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
qazal
b6aaf12df7 Internal cast 2 with more tests (#2257)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3.

* Revert "render phi as the dtype"

This reverts commit d08cb270b4.

* reenable triton tests

* no vstore_half if dtype is already half

* upcast max
2023-11-10 10:42:39 -08:00
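
At the user level, the feature means casts become part of the generated kernel rather than a separate copy. An illustrative truncating cast, assuming dtypes still lives in tinygrad.helpers at this point in history:

    from tinygrad.tensor import Tensor
    from tinygrad.helpers import dtypes
    x = Tensor([1.7, -2.3, 3.0])
    print(x.cast(dtypes.int32).numpy())   # C-style truncating float-to-int cast
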
chenyu
75f6e9ab54 one more fuzz linearizer failed example (#2260) 2023-11-10 09:17:37 -05:00
George Hotz
330484c072 Revert "Internal casting support (#2046)" (#2256)
This reverts commit 7e1d08b2ae.
2023-11-09 21:27:13 -08:00
qazal
7e1d08b2ae Internal casting support (#2046)
* Change linearizer to parse CAST

* Oneliner renders for cstyle and triton

* LLVM cast and ALU implementation

* pylint fixes

* cast in gep

* remove printbufs

* use cast for post-load ops

* get rid of parse_cast

* partially supported vectorized dtypes for initial dev

* render phi as the dtype

* Revert "partially supported vectorized dtypes for initial dev"

This reverts commit 1bf1a818a3.

* Revert "render phi as the dtype"

This reverts commit d08cb270b4.

* reenable triton tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-09 21:02:32 -08:00
vish-pr
6051f0ce82 For cuda get current free space from device, and retry alloc failures (#2197)
* For cuda get current free space from device, and retry alloc failures

* type ignore for mypy

* add init to get free mem in cuda

* Move retry logic in common lib.

Fix typo in override _get_cur_free_space

* linter error fix in test file

* Not catch all, as it will catch KeyboardInterrupt

* fix unintended line changes
2023-11-09 15:53:50 -08:00
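
The retry shape the PR describes, moved "in common lib", as a generic sketch with invented names. Catching MemoryError specifically (rather than a bare except) matches the KeyboardInterrupt bullet above.

    def alloc_with_retry(alloc_fn, size: int, free_cached_buffers, retries: int = 1):
      for attempt in range(retries + 1):
        try:
          return alloc_fn(size)
        except MemoryError:                # deliberately not catch-all
          if attempt == retries:
            raise                          # out of retries, surface the failure
          free_cached_buffers()            # reclaim cached buffers, then try again

    # usage with stand-ins for a real device allocator and buffer cache:
    cache = [bytearray(1 << 20) for _ in range(4)]
    buf = alloc_with_retry(bytearray, 1 << 20, cache.clear)
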
qazal
2465d5d267 fix ops tests in test_dtype (#2237)
* fix test ops

* decompose the err from test_ops

* skipTest skips the entire test, we don't want that

* handle cases with the same priority

* add int16 to torch map
2023-11-09 15:17:43 -08:00
George Hotz
80bf0b8586 proper wmma (#2245)
* proper wmma

* hip cast

* bugfixes

* bugfix

* that bug is fixed

---------

Co-authored-by: George Hotz <george@tinygrad.org>
2023-11-09 15:15:18 -08:00
chenyu
10d642e174 fuzz linearizer transformation (#2188)
* fuzz linearizer transformation

* no standard normal for fp16

* work

* Interpreted start

* CPU and TORCH work

* fix MemBuffer with same idx

* id for failed kernels

* no image and variable for Interpreted

* symbolic shape

* IMAGE only for GPU

* Interpreted almost all good

* cleanup

* fix bufs_from_lin

* zero size

* some failed examples

* just Exception

* just test not pass
2023-11-09 08:03:27 -08:00
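
The fuzzing idea in miniature: apply a transformation and check that the transformed kernel still computes what the original did. A generic property-test sketch, nothing like the real harness:

    import random

    def fuzz_transform(program, transforms, inputs, runs=100, seed=0):
      rng = random.Random(seed)
      for _ in range(runs):
        t = rng.choice(transforms)        # pick a random transformation
        transformed = t(program)
        for x in inputs:                  # compare against the untransformed reference
          assert transformed(x) == program(x), f"{t.__name__} changed behavior on {x}"

    # stand-in program and a behavior-preserving "transform":
    def square(x): return x * x
    def wrap(p):
      def wrapped(x): return p(x)
      return wrapped
    fuzz_transform(square, [wrap], inputs=[0, 1, 2, 3])
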
George Hotz
38b7f5a7fd less phi, proper phi (#2241)
* less phi, proper phi

* disable flaky whisper test
2023-11-08 16:13:43 -08:00
wozeparrot
4c44d1344b feat: remove cache_id (#2236) 2023-11-08 08:09:21 -08:00
George Hotz
c0a033f01d remove real_offset (#2234)
* remove real_offset

* pass in numnode

* remove that real_offset

* sample only for variable
2023-11-07 17:30:53 -08:00
nimlgen
ae5d1407ee Fix mmaped in jit (#2225)
* fix reuse for mmaped buffers in jit

* comment
2023-11-06 14:54:21 -08:00
George Hotz
2f7aab3d13 move optimize_local_size (#2221)
* move optimize_local_size

* interpret_ast
2023-11-05 21:00:52 -08:00