Commit Graph

10417 Commits

Author SHA1 Message Date
George Hotz
9b58d4cb37 cleanup unused movement ops (#2353)
* cleanup_mops

* no expand

* nothing

* revert that

* add comment

* add correctness check to disk tensor
2023-11-18 09:19:02 -08:00
chenyu
c4d97bba8c simplify Node.sum, remove factorize method (#2352) 2023-11-18 11:55:48 -05:00
George Hotz
e35c31c8e5 xid for hip, device in time linearizer (#2348)
Co-authored-by: Tiny Box <tinybox@tinygrad.org>
2023-11-17 20:50:07 -08:00
chenyu
6e44a798df update fixed linearizer test (#2347)
* update fixed linearizer test

* except CLANG
2023-11-17 23:46:37 -05:00
George Hotz
c8c5212dce a lil more beautiful_mnist 2023-11-17 19:53:06 -08:00
George Hotz
c7b38b324b A beautiful MNIST training example (#2272)
* beautiful mnist

* beautiful mnist example

* from tinygrad import Tensor

* more beautiful

* the jit is super core tinygrad

* globalcounters reset on jit run

* symlinks and exclude

* beautiful_cartpole

* evaluate is its own function

* no symlinks

* more beautiful

* jit reset for double speed

* type hinting for JIT

* beautiful_mnist gets 98%

* beautiful_mnist < 4s with BEAM=2

* better cartpole

* use actor critic

* zero_grad got lost

* delete double relu

* stable cartpole with PPO

* beautiful_cartpole is more beautiful

* REPLAY_BUFFER

* beautiful stuff typechecks

* None support in shape

* hp tuning
2023-11-17 19:42:43 -08:00
chenyu
74e6b6c9fc types (#2346) 2023-11-17 18:49:24 -05:00
chenyu
d2c0035c73 add back as_strided, move rebuilt mops to extra (#2344)
* add back as_strided, move rebuilt mops to extra

* negative stride for ops_cpu

* Revert "negative stride for ops_cpu"

This reverts commit a13b6815ac.

* skip that

* style
2023-11-17 14:34:30 -05:00
nimlgen
064034c42c hip free event + a bit faster cpu time (#2342)
* free hip events

* hip faster
2023-11-17 09:50:49 -08:00
chenyu
ad3d7428fa good line shaves in st and faster (#2343) 2023-11-17 11:00:26 -05:00
George Hotz
652d2de256 wow how did i think that was okay (#2339) 2023-11-16 21:21:11 -08:00
chenyu
8e22c0d95c everything can jit now (#2338) 2023-11-16 23:54:57 -05:00
Friedrich Carl Eichenroth
a8875bd770 add types to lazy (#2327)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 20:48:41 -08:00
George Hotz
1d5501594e force rebuild of ocelot (#2334)
* force rebuild of ocelot

* SzymonOzog gpuocelot

* delete that

* downgrade that

* non parallel

* force rebuild

* use llvm

* nauto

* less mem maybe

* print test

* helper_test_exception skip CUDACPU

* helper_test_exception

* shippable
2023-11-16 20:44:14 -08:00
imaolo
0d0c74bac9 Assert for memory allocation failures (#2337)
* assert adequate memory has been freed

* cleaned up runtime error message

* improved metal buffer alloc error catching and reporting

* decreased lines and altered messages

* removed unnecessary _get_cur_free_space() call

* improved assert message

* added allocate massive buffer test

* added test_lru_allocator_metal_max_buffer_length

* split into two asserts and removed walrus assignment from assert expression

* update assert message and use byte data type for clarity
2023-11-16 20:14:16 -08:00
chenyu
aa01a63b3f cleanup of lines / unused / types (#2336) 2023-11-16 21:15:32 -05:00
chenyu
3971259832 fix test_real_world llama (#2335) 2023-11-16 19:50:08 -05:00
chenyu
3b9dd3330c add device to beam search cache key (#2333) 2023-11-16 18:35:08 -05:00
Friedrich Carl Eichenroth
75676ab8e1 Profiling-helper (#2321)
* change profiler

* remove unused imports

* remove unused imports

* change lazybuffer references

* remove unused line

* remove unused import

* remove unused stuff

* add types

* typing

* typing

* typing

* trigger actions

* -1 loc

* fixup

* trigger actions

* revert lazy typing changes

* WIP profiler helper

* replace old start & stop profiler

* fixup

* linting

* Update llama.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
mmmkkaaayy
8235da11dd whisper: support batch inference, add librispeech WER test (#2074)
* whisper: support batch inference, add librispeech WER test, add kv caching and JIT

* remove JIT_SUPPORTED_DEVICE

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 13:50:08 -08:00
George Hotz
3baaf298d6 two stage cumsum in tensor.py (#2331)
* two stage cumsum in tensor.py

* 2 more kernels for llama cumsum

* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
chenyu
163b2bc26a wgpu.utils._device -> wgpu.utils.device (#2330)
* wgpu.utils._device -> wgpu.utils.device

* can i do this?

* no need to specify metal
2023-11-16 12:52:13 -05:00
chenyu
27f4c26312 fix getitem slice when end < start (#2329) 2023-11-16 11:20:27 -05:00
chenyu
822d6e6f18 Simpler mops verify (#2325)
* rewrite the to_movement_ops check using symbolic

* tweak
2023-11-15 21:47:18 -05:00
George Hotz
ef67d7ff5d shapetracker whitespace 2023-11-15 15:24:09 -08:00
chenyu
a98511561c fuzz_linearizer same api for interpreted and compiled (#2320) 2023-11-15 17:40:22 -05:00
George Hotz
294e71de15 remove lines (unused code) (#2319)
* remove lines

* uhh, i'm tired

* that function never worked

* types for ast_parse
2023-11-15 14:36:11 -08:00
George Hotz
628365eab6 JIT cleanups (#2317)
* cleanup cleanup

* dedup update_stats
2023-11-15 13:34:52 -08:00
forcefieldsovereign
b64738e1d6 Remove AS_STRIDED from shapetracker (#2216)
* very close

* remove comment

* negative strides working

* almost everything passes

* calculate offset with list comprehension

* some cleanup

* got disk load working

* review suggestions

* fix after merge

* overlap working

* did it

* clean

* fixed disk load

* lint

* mypy

* removed as_strided

* trying without simplify

* added back simplify

* make sure expanding to smaller shape

* cleanup

* removed comment

* removed env file

* trying whisper test again

* onnx test sqlite issue

* working on test

* finished test

* eliminate unnecessary shrink-then-pad

* don't shrink buffer

* added strides check

* added to ci under linters

* switch issue

* allow symbolic stride

* removed .env

* isinstance

* adjust strides for double expand

* cleanup

* needed to add type hint for mypy

* set pythonpath
2023-11-15 15:50:17 -05:00
Marcello Fuschi
b8d460d203 Add Tensor.multinomial (#2295)
* add Tensor.multinomial only with replacement

* add support for 2D input in Tensor.multinomial

* fix multinomial output shape

* allow passing replacement=False to Tensor.multinomial when num_samples=1

* improve tests for Tensor.multinomial

* fix edge case in Tensor.multinomial

* Tensor.multinomial no more staticmethod
2023-11-15 11:38:39 -08:00
taher
cb6cfcc8f8 add icb support check for metal device (#2313) 2023-11-15 11:37:28 -08:00
George Hotz
70a65c201e JIT support in Interpreted (#2314)
* factor that out

* jit is supported everywhere

* fix some tests

* there's no jit supported device, the jit is everywhere

* fix test uops
2023-11-15 11:13:38 -08:00
chenyu
9a20bc08d6 Tensor(None) is Tensor([]) (#2316) 2023-11-15 13:49:18 -05:00
chenyu
f1f863c953 allow 0-dim array to broadcast into zero shape tensor (#2315)
* allow 0-dim array to broadcast into zero shape tensor

* not in
2023-11-15 13:12:21 -05:00
George Hotz
4da2ddea6e Interpreted cleanups (#2312)
* move the compiler out of ops

* don't return realized

* var_vals filter, fix custom

* typing
2023-11-15 09:02:23 -08:00
chenyu
123a0b86b2 support zero in shape (#2303)
* zero in shape start

* no assert for that

* if output size is 0, return without exec

* tweak

* strides

* reduce over non-zero

* shrink and expand

* fix import

* test_elementwise where

* cannot reshape from size 0 to size 1

* compiled backend reduce over 0

* zeros for numpy

* reduce over 0 and keepdim resulted in 1

* reduce empty set default values

* compare with same input

* pad test case

* cat test case

* torch does not support that?
2023-11-15 11:57:48 -05:00
qazal
f113a0b83b dtype promotion priorities (#2311) 2023-11-15 07:19:52 -08:00
geohotstan
3c5a51fb3a aaaaaaa finally (#2310) 2023-11-15 07:12:38 -08:00
kormann
cff8375aa2 make self referential AST fast too (#2278)
* cleanup

* linter

* linter

* linter

* rm .buffers

* linter

* linter

* huh?

* cleanup

* typo

* min diff

* property

* rev

* linter

* no metal hack

* minimal properties

* line

* checkout master

* copy_to_device

* idk

* revert

* type

* type

* faast

* speed test

* cleanup test

* softer test

* monotonic

* harder test

* clean code

* cleanup
2023-11-15 07:12:07 -08:00
George Hotz
4f7b1ac0d2 cleanups before interpreted jit (#2306)
* jit mnist

* InterpretedFlopCounter doesn't rely on Interpreted

* allocator for cpu and torch

* types for exec_ast

* fix type issues

* fix onnx, remove print

* always self.from_underlying
2023-11-14 21:44:25 -08:00
mmmkkaaayy
91546225f4 Add cache step for model weights in CI, re-enable whisper test (#2307) 2023-11-14 21:16:04 -08:00
chenyu
175cdbe815 fix pad None will value (#2308) 2023-11-14 23:57:05 -05:00
George Hotz
01f8781c26 fix CI (#2300)
* might work

* might work 2

* might work 3

* sneak that in to llama too

* pin them all
2023-11-14 11:02:59 -08:00
nimlgen
4e0d47533e beam works with var vals (#2296)
* beam works with var vals

* test passes now

* better comment

* linter happy
2023-11-14 13:03:19 -05:00
chenyu
fac8633ba8 explicit opts for test_linearizer_failures (#2299)
* explicit opts for test_linearizer_failures

* typo

* update the invalid check
2023-11-14 11:52:38 -05:00
George Hotz
8916028ddd move BatchExecutor (#2297)
* move BatchExecutor

* refactor to get_optimized_program

* that changed
2023-11-14 08:08:51 -08:00
George Hotz
0cbf6c1811 move things, clean up extra (#2292)
* move things

* idk why pylint needs that now

* delete unused
2023-11-13 20:18:40 -08:00
George Hotz
b1f7f29525 metal indirect command buffers (#2285)
* metal indirect command buffers

* sub 1ms gpt

* metal batch exec is good

* remove whitespace

* input_replace

* fix ci

* useResources

* very simple cacheallocator

* update_stats

* fix CI

* minor

* remove that from jit
2023-11-13 17:58:26 -08:00
chenyu
d86ea188dd support symbolic shape in Interpreted (#2289)
* support symbolic shape in Interpreted

* simpler

* no InterpretedFlopCounter

* tragic NumNode

* regex is hard
2023-11-13 20:13:18 -05:00
George Hotz
6960bcded0 back to 6.54GB for stable diffusion (#2288)
* back to 6.54GB for stable diffusion

* cleanups

* only outputs, not inputs

* err, restore hack for world
2023-11-13 16:50:04 -08:00