Commit Graph

2555 Commits

Author SHA1 Message Date
George Hotz
cbb8486779 ResNet training changes (update benchmark) (#2390)
* default arg for chunk

* bring back to_

* good changes

* new set

* unused hash

* fix optim

* new torch loader

* fix test lr scheduler
2023-11-22 17:41:12 -08:00
mmmkkaaayy
7f0cc4a4e8 whisper: support audio >30s (#2378)
* whisper: support audio >30s

* make prompt indexing consistent with reference repo

* fix online
2023-11-21 14:37:51 -08:00
chenyu
d0f966b320 add a segfault linearizer test case (#2383)
* add a segfault linearizer test case

* another interesting one
2023-11-21 15:06:41 -05:00
chenyu
9eeba968cd fix the variable arg order (#2382) 2023-11-21 12:02:31 -05:00
nimlgen
c5f429a40a Fix linearizer cache (#2371)
* fix linearizer cache

* better comments

* a bit cleaner
2023-11-21 07:58:35 -08:00
chenyu
c4cc4966ed update some test_tensor.py cases with 0 in shape (#2368) 2023-11-19 20:35:05 -05:00
chenyu
6add808f6a support tuple shape input for rand and empty (#2367) 2023-11-19 20:20:39 -05:00
chenyu
e9847be790 remove whisper +1-1 hack (#2360)
* remove whisper +1-1 hack

* Revert "remove whisper +1-1 hack"

This reverts commit 5db3800f09.

* update whisper tests

* comment context
2023-11-19 17:56:36 -05:00
George Hotz
a0890f4e6c move fetch to helpers (#2363)
* switch datasets to new fetch

* add test_helpers

* fix convnext and delete old torch load
2023-11-19 12:29:51 -08:00
chenyu
03968622a2 Pretty multinomial (#2365)
* pretty multinomial

p, cdf_normalized -> weight, cdf
symmetric unsqueeze / squeeze
check num_sample > 0

TODO: how do we want to handle 0/0 in general?

* no 0-dim input

* single sum
2023-11-19 15:10:10 -05:00
chenyu
f203d37258 retry test_webgpu.js 3 times (#2362) 2023-11-18 21:24:47 -05:00
mmmkkaaayy
08d09eb666 Enable whisper test in CI for more backends (#2355) 2023-11-18 17:52:50 -05:00
chenyu
d7d078c7f9 Node.vars() returns a set and properly dedup (#2356)
* dedup RedNode.vars()

* vars returns a set

* fix more vars

* unused import

* update to_movement_ops

* comment
2023-11-18 17:44:52 -05:00
chenyu
f02e17a967 Variable.num -> NumNode (#2354) 2023-11-18 15:45:52 -05:00
George Hotz
40246d35bc ops_shm removed (#2351)
* ops_shm removed

* buf.cast

* err, forgot those
2023-11-18 11:41:58 -08:00
chenyu
6e44a798df update fixed linearizer test (#2347)
* update fixed linearizer test

* except CLANG
2023-11-17 23:46:37 -05:00
George Hotz
c7b38b324b A beautiful MNIST training example (#2272)
* beautiful mnist

* beautiful mnist example

* from tinygrad import Tensor

* more beautiful

* the jit is super core tinygrad

* globalcounters reset on jit run

* symlinks and exclude

* beautiful_cartpole

* evaluate is it's own function

* no symlinks

* more beautiful

* jit reset for double speed

* type hinting for JIT

* beautiful_mnist gets 98%

* beautiful_mnist < 4s with BEAM=2

* better cartpole

* use actor critic

* zero_grad got lost

* delete double relu

* stable cartpole with PPO

* beautiful_cartpole is more beautiful

* REPLAY_BUFFER

* beautiful stuff typechecks

* None support in shape

* hp tuning
2023-11-17 19:42:43 -08:00
chenyu
d2c0035c73 add back as_strided, move rebuilt mops to extra (#2344)
* add back as_strided, move rebuilt mops to extra

* negative stride for ops_cpu

* Revert "negative stride for ops_cpu"

This reverts commit a13b6815ac.

* skip that

* style
2023-11-17 14:34:30 -05:00
chenyu
ad3d7428fa good line shaves in st and faster (#2343) 2023-11-17 11:00:26 -05:00
chenyu
8e22c0d95c everything can jit now (#2338) 2023-11-16 23:54:57 -05:00
George Hotz
1d5501594e force rebuild of ocelot (#2334)
* force rebuild of ocelot

* SzymonOzog gpuocelot

* delete that

* downgrade that

* non parallel

* force rebuild

* use llvm

* nauto

* less mem maybe

* print test

* helper_test_exception skip CUDACPU

* helper_test_exception

* shippable
2023-11-16 20:44:14 -08:00
imaolo
0d0c74bac9 Assert for memory allocation failures (#2337)
* assert adequate memory has been freed

* cleaned up runtime error message

* improved metal buffer alloc error catching and reporting

* decreased lines and altered messages

* removed unnecessary  _get_cur_free_space() call

* improved assert message

* added allocate massive buffer test

* added test_lru_allocator_metal_max_buffer_length

* split into two asserts and removed walrus assignment from assert expression

* update assert message and use byte data type for clarity
2023-11-16 20:14:16 -08:00
chenyu
3971259832 fix test_real_world llama (#2335) 2023-11-16 19:50:08 -05:00
Friedrich Carl Eichenroth
75676ab8e1 Profiling-helper (#2321)
* change profiler

* remove unused imports

* remove unused imports

* change lazybuffer references

* remove unused line

* remove unused import

* remove unused stuff

* add types

* typing

* typing

* typing

* trigger actions

* -1 loc

* fixup

* trigger actions

* revert lazy typing changes

* WIP profiler helper

* replace old start & stop profiler

* fixup

* linting

* Update llama.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
mmmkkaaayy
8235da11dd whisper: support batch inference, add librispeech WER test (#2074)
* whisper: support batch inference, add librispeech WER test, add kv caching and JIT

* remove JIT_SUPPORTED_DEVICE

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 13:50:08 -08:00
George Hotz
3baaf298d6 two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
chenyu
27f4c26312 fix getitem slice when end < start (#2329) 2023-11-16 11:20:27 -05:00
chenyu
a98511561c fuzz_linearizer same api for interpreted and compiled (#2320) 2023-11-15 17:40:22 -05:00
Marcello Fuschi
b8d460d203 Add Tensor.multinomial (#2295)
* add Tensor.multinomial only with replacement

* add support for 2D input in Tensor.multinomial

* fix multinomial output shape

* allow passing replacement=False to Tensor.multinomial when num_samples=1

* improve tests for Tensor.multinomial

* fix edge case in Tensor.multinomial

* Tensor.multinomial no more staticmethod
2023-11-15 11:38:39 -08:00
George Hotz
70a65c201e JIT support in Interpreted (#2314)
* factor that out

* jit is supported everywhere

* fix some tests

* there's no jit supported device, the jit is everywhere

* fix test uops
2023-11-15 11:13:38 -08:00
chenyu
9a20bc08d6 Tensor(None) is Tensor([]) (#2316) 2023-11-15 13:49:18 -05:00
chenyu
f1f863c953 allow 0-dim array to broadcast into zero shape tensor (#2315)
* allow 0-dim array to broadcast into zero shape tensor

* not in
2023-11-15 13:12:21 -05:00
George Hotz
4da2ddea6e Interpreted cleanups (#2312)
* move the compiler out of ops

* don't return realized

* var_vals filter, fix custom

* typing
2023-11-15 09:02:23 -08:00
chenyu
123a0b86b2 support zero in shape (#2303)
* zero in shape start

* no assert for that

* if output size is 0, return without exec

* tweak

* strides

* reduce over non-zero

* shrink and expand

* fix import

* test_elementwise where

* cannot reshape from size 0 to size 1

* compiled backend reduce over 0

* zeros for numpy

* reduce over 0 and keepdim resulted in 1

* reduce empty set default values

* compare with same input

* pad test case

* cat test case

* torch does not support that?
2023-11-15 11:57:48 -05:00
geohotstan
3c5a51fb3a aaaaaaa finally (#2310) 2023-11-15 07:12:38 -08:00
kormann
cff8375aa2 make self referential AST fast too (#2278)
* cleanup

* linter

* linter

* linter

* rm .buffers

* linter

* linter

* huh?

* cleanup

* typo

* min diff

* property

* rev

* linter

* no matel hack

* minimal properties

* line

* checkout master

* copy_to_device

* idk

* revert

* type

* type

* faast

* speed test

* cleanup test

* softer test

* monotonic

* harder test

* clean code

* cleanup
2023-11-15 07:12:07 -08:00
chenyu
175cdbe815 fix pad None will value (#2308) 2023-11-14 23:57:05 -05:00
chenyu
fac8633ba8 explicit opts for test_linearizer_failures (#2299)
* explicit opts for test_linearizer_failures

* typo

* update the invalid check
2023-11-14 11:52:38 -05:00
George Hotz
0cbf6c1811 move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz
b1f7f29525 metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
chenyu
d86ea188dd support symbolic shape in Interpreted (#2289)
* support symbolic shape in Interpreted

* simpler

* no InterpretedFlopCounter

* tragic NumNode

* regex is hard
2023-11-13 20:13:18 -05:00
nimlgen
960535dfb8 get_linearizer_actions does not return illegal actions (#2287)
* fix some linearizer failures

* linter happy

* no new test class
2023-11-13 11:48:54 -05:00
rodfer
53c5baa8b6 add dilation to avg_pool2d (#2270)
* add dilation to avg_pool2d

* avg_pool_fix

* avg_pool_fix

* woo

* oops

* force it correct

---------

Co-authored-by: rodfer0x80 <rodfer0x80@proton.me>
Co-authored-by: zibokapi <zibokapi@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-13 08:47:56 -08:00
chenyu
a72b370066 llama take int and convert to Variable internally (#2284) 2023-11-12 17:11:37 -05:00
chenyu
f5a62a1b42 fix some tests related to JitItem (#2279) 2023-11-11 23:00:35 -05:00
George Hotz
78623ba204 two simple tests 2023-11-10 16:16:06 -08:00
George Hotz
6ceea02e65 hotfix of onnx 2023-11-10 15:40:30 -08:00
geohotstan
b853e9bb8c Onnx 1.15.0 gogogo (#2217)
* lol

* lol

* add GELULULULUL

* onnx 1.50

* fuk torch bool neg

* exclude regex tests

* exclude dequantizelinear for now

* is sunny in philly

* damn it affinegrid

* fixed auto_pad VALID

* skip 0 shape tests

* add temporary cast in Reduces

* tests should pass now

* added comments and cleanup

* try moving dequantizelinear to onnx.py

* fixed dequantizedlinear?

* cleanup

* try?

* float16 segfaults LLVM CI..???

* cleanup comments

* pin to 1.50.0

* remove use of -np.inf cuz numpy is kill

* 1.50? lol I'm actually retarded

* thx for review, muhbad

* moved Gelu higher up
2023-11-10 15:36:48 -08:00
George Hotz
85d26ddc36 uops loop removal (#2262)
* remove the loop

* cleanups

* tests failing still

* global_loop_ctx wasn't needed

* replace_op is cleaner

* minor opt

* cast opt was wrong

* uop_num

* uop num was dumb

* tuplize_uops

* torch tests

* fix test_uops
2023-11-10 15:24:47 -08:00
chenyu
a753c8e071 examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00