chenyu
aa01a63b3f
cleanup of lines / unused / types ( #2336 )
2023-11-16 21:15:32 -05:00
chenyu
3971259832
fix test_real_world llama ( #2335 )
2023-11-16 19:50:08 -05:00
chenyu
3b9dd3330c
add device to beam search cache key ( #2333 )
2023-11-16 18:35:08 -05:00
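The device is now part of the BEAM-search cache key, so tuned kernel options found on one backend are not reused on another. A minimal sketch of why the device belongs in the key (cache layout and names hypothetical, not tinygrad's actual code):

    # hypothetical beam cache keyed on (device, ast) instead of the ast alone
    beam_cache: dict = {}
    def cached_beam_search(device: str, ast_key: str, search):
      key = (device, ast_key)   # without the device, METAL results could leak into CUDA runs
      if key not in beam_cache: beam_cache[key] = search(device, ast_key)
      return beam_cache[key]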
Friedrich Carl Eichenroth
75676ab8e1
Profiling-helper ( #2321 )
...
* change profiler
* remove unused imports
* remove unused imports
* change lazybuffer references
* remove unused line
* remove unused import
* remove unused stuff
* add types
* typing
* typing
* typing
* trigger actions
* -1 loc
* fixup
* trigger actions
* revert lazy typing changes
* WIP profiler helper
* replace old start & stop profiler
* fixup
* linting
* Update llama.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 14:15:56 -08:00
mmmkkaaayy
8235da11dd
whisper: support batch inference, add librispeech WER test ( #2074 )
...
* whisper: support batch inference, add librispeech WER test, add kv caching and JIT
* remove JIT_SUPPORTED_DEVICE
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-16 13:50:08 -08:00
George Hotz
3baaf298d6
two stage cumsum in tensor.py ( #2331 )
...
* two stage cumsum in tensor.py
* 2 more kernels for llama cumsum
* gpt-2 and llama use fast multinomial
2023-11-16 12:09:53 -08:00
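A two-stage cumsum splits the input into blocks, scans each block independently, then scans the per-block totals and adds those offsets back, replacing one long serial kernel with two smaller ones. A NumPy sketch of the idea (illustration only, not the tensor.py implementation):

    import numpy as np

    def two_stage_cumsum(x: np.ndarray, block: int = 256) -> np.ndarray:
      n = len(x)
      xb = np.pad(x, (0, (-n) % block)).reshape(-1, block)
      local = np.cumsum(xb, axis=1)          # stage 1: independent per-block scans
      offsets = np.cumsum(local[:, -1])      # stage 2: scan of the block totals
      local[1:] += offsets[:-1, None]        # shift each block by the preceding totals
      return local.reshape(-1)[:n]

    x = np.random.rand(1000)
    assert np.allclose(two_stage_cumsum(x), np.cumsum(x))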
chenyu
163b2bc26a
wgpu.utils._device -> wgpu.utils.device ( #2330 )
...
* wgpu.utils._device -> wgpu.utils.device
* can i do this?
* no need to specify metal
2023-11-16 12:52:13 -05:00
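The rename follows wgpu-py moving this helper module from a private to a public path. A hedged compatibility shim (assuming get_default_device is the helper being imported; check your installed wgpu version):

    try:
      from wgpu.utils.device import get_default_device   # newer wgpu-py layout
    except ImportError:
      from wgpu.utils._device import get_default_device  # older private path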
chenyu
27f4c26312
fix getitem slice when end < start ( #2329 )
2023-11-16 11:20:27 -05:00
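With the fix, a slice whose stop index is below its start yields an empty (zero-length) dimension, matching Python/NumPy semantics. Illustration of the intended behavior (tinygrad API of that era assumed):

    from tinygrad.tensor import Tensor

    t = Tensor.arange(10)
    assert t[3:7].shape == (4,)
    assert t[7:3].shape == (0,)   # end < start -> empty slice, like numpy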
chenyu
822d6e6f18
Simpler mops verify ( #2325 )
...
* rewrite the to_movement_ops check using symbolic
* tweak
2023-11-15 21:47:18 -05:00
George Hotz
ef67d7ff5d
shapetracker whitespace
2023-11-15 15:24:09 -08:00
chenyu
a98511561c
fuzz_linearizer same api for interpreted and compiled ( #2320 )
2023-11-15 17:40:22 -05:00
George Hotz
294e71de15
remove lines (unused code) ( #2319 )
...
* remove lines
* uhh, i'm tired
* that function never worked
* types for ast_parse
2023-11-15 14:36:11 -08:00
George Hotz
628365eab6
JIT cleanups ( #2317 )
...
* cleanup cleanup
* dedup update_stats
2023-11-15 13:34:52 -08:00
forcefieldsovereign
b64738e1d6
Remove AS_STRIDED from shapetracker ( #2216 )
...
* very close
* remove comment
* negative strides working
* almost everything passes
* calculate offset with list comprehension
* some cleanup
* got disk load working
* review suggestions
* fix after merge
* overlap working
* did it
* clean
* fixed disk load
* lint
* mypy
* removed as_strided
* trying without simplify
* added back simplify
* make sure expanding to smaller shape
* cleanup
* removed comment
* removed env file
* trying whisper test again
* onnx test sqlite issue
* working on test
* finished test
* eliminate unnecessary shrink-then-pad
* don't shrink buffer
* added strides check
* added to ci under linters
* switch issue
* allow symbolic stride
* removed .env
* isinstance
* adjust strides for double expand
* cleanup
* needed to add type hint for mypy
* set pythonpath
2023-11-15 15:50:17 -05:00
Marcello Fuschi
b8d460d203
Add Tensor.multinomial ( #2295 )
...
* add Tensor.multinomial only with replacement
* add support for 2D input in Tensor.multinomial
* fix multinomial output shape
* allow passing replacement=False to Tensor.multinomial when num_samples=1
* improve tests for Tensor.multinomial
* fix edge case in Tensor.multinomial
* Tensor.multinomial no more staticmethod
2023-11-15 11:38:39 -08:00
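Per the log, Tensor.multinomial is an instance method that samples indices from a 1D or 2D (per-row) weight distribution, with replacement=False only allowed when num_samples=1. A hedged usage sketch (signature assumed from the commit messages):

    from tinygrad.tensor import Tensor

    weights = Tensor([[0.1, 0.2, 0.7],
                      [0.5, 0.5, 0.0]])
    samples = weights.multinomial(num_samples=4, replacement=True)
    print(samples.shape)   # expected (2, 4): four draws per row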
taher
cb6cfcc8f8
add icb support check for metal device ( #2313 )
2023-11-15 11:37:28 -08:00
George Hotz
70a65c201e
JIT support in Interpreted ( #2314 )
...
* factor that out
* jit is supported everywhere
* fix some tests
* there's no jit supported device, the jit is everywhere
* fix test uops
2023-11-15 11:13:38 -08:00
chenyu
9a20bc08d6
Tensor(None) is Tensor([]) ( #2316 )
2023-11-15 13:49:18 -05:00
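Constructing a Tensor from None now behaves like constructing from an empty list: a zero-length tensor rather than an error. Illustration of the behavior the title describes:

    from tinygrad.tensor import Tensor

    assert Tensor(None).shape == Tensor([]).shape == (0,)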
chenyu
f1f863c953
allow 0-dim array to broadcast into zero shape tensor ( #2315 )
...
* allow 0-dim array to broadcast into zero shape tensor
* not in
2023-11-15 13:12:21 -05:00
George Hotz
4da2ddea6e
Interpreted cleanups ( #2312 )
...
* move the compiler out of ops
* don't return realized
* var_vals filter, fix custom
* typing
2023-11-15 09:02:23 -08:00
chenyu
123a0b86b2
support zero in shape ( #2303 )
...
* zero in shape start
* no assert for that
* if output size is 0, return without exec
* tweak
* strides
* reduce over non-zero
* shrink and expand
* fix import
* test_elementwise where
* cannot reshape from size 0 to size 1
* compiled backend reduce over 0
* zeros for numpy
* reduce over 0 and keepdim resulted in 1
* reduce empty set default values
* compare with same input
* pad test case
* cat test case
* torch does not support that?
2023-11-15 11:57:48 -05:00
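With zero-size dimensions allowed, elementwise ops keep the empty shape, reducing over an empty axis falls back to the op's identity value, and cat/pad must accept empty operands. A short sketch of the expected semantics (NumPy shown as the reference behavior the tests compare against):

    import numpy as np

    x = np.zeros((0, 4), dtype=np.float32)
    assert (x + 1).shape == (0, 4)                   # elementwise on empty stays empty
    assert np.array_equal(x.sum(axis=0), np.zeros(4, dtype=np.float32))  # reduce over 0 -> identity
    assert np.concatenate([x, np.ones((2, 4), dtype=np.float32)]).shape == (2, 4)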
qazal
f113a0b83b
dtype promotion priorities ( #2311 )
2023-11-15 07:19:52 -08:00
geohotstan
3c5a51fb3a
aaaaaaa finally ( #2310 )
2023-11-15 07:12:38 -08:00
kormann
cff8375aa2
make self referential AST fast too ( #2278 )
...
* cleanup
* linter
* linter
* linter
* rm .buffers
* linter
* linter
* huh?
* cleanup
* typo
* min diff
* property
* rev
* linter
* no metal hack
* minimal properties
* line
* checkout master
* copy_to_device
* idk
* revert
* type
* type
* faast
* speed test
* cleanup test
* softer test
* monotonic
* harder test
* clean code
* cleanup
2023-11-15 07:12:07 -08:00
George Hotz
4f7b1ac0d2
cleanups before interpreted jit ( #2306 )
...
* jit mnist
* InterpretedFlopCounter doesn't rely on Interpreted
* allocator for cpu and torch
* types for exec_ast
* fix type issues
* fix onnx, remove print
* always self.from_underlying
2023-11-14 21:44:25 -08:00
mmmkkaaayy
91546225f4
Add cache step for model weights in CI, re-enable whisper test ( #2307 )
2023-11-14 21:16:04 -08:00
chenyu
175cdbe815
fix pad None with value ( #2308 )
2023-11-14 23:57:05 -05:00
George Hotz
01f8781c26
fix CI ( #2300 )
...
* might work
* might work 2
* might work 3
* sneak that in to llama too
* pin them all
2023-11-14 11:02:59 -08:00
nimlgen
4e0d47533e
beam works with var vals ( #2296 )
...
* beam works with var vals
* test passes now
* better comment
* linter happy
2023-11-14 13:03:19 -05:00
chenyu
fac8633ba8
explicit opts for test_linearizer_failures ( #2299 )
...
* explicit opts for test_linearizer_failures
* typo
* update the invalid check
2023-11-14 11:52:38 -05:00
George Hotz
8916028ddd
move BatchExecutor ( #2297 )
...
* move BatchExecutor
* refactor to get_optimized_program
* that changed
2023-11-14 08:08:51 -08:00
George Hotz
0cbf6c1811
move things, clean up extra ( #2292 )
...
* move things
* idk why pylint needs that now
* delete unused
2023-11-13 20:18:40 -08:00
George Hotz
b1f7f29525
metal indirect command buffers ( #2285 )
...
* metal indirect command buffers
* sub 1ms gpt
* metal batch exec is good
* remove whitespace
* input_replace
* fix ci
* useResources
* very simple cacheallocator
* update_stats
* fix CI
* minor
* remove that from jit
2023-11-13 17:58:26 -08:00
chenyu
d86ea188dd
support symbolic shape in Interpreted ( #2289 )
...
* support symbolic shape in Interpreted
* simpler
* no InterpretedFlopCounter
* tragic NumNode
* regex is hard
2023-11-13 20:13:18 -05:00
George Hotz
6960bcded0
back to 6.54GB for stable diffusion ( #2288 )
...
* back to 6.54GB for stable diffusion
* cleanups
* only outputs, not inputs
* err, restore hack for world
2023-11-13 16:50:04 -08:00
nimlgen
960535dfb8
get_linearizer_actions does not return illegal actions ( #2287 )
...
* fix some linearizer failures
* linter happy
* no new test class
2023-11-13 11:48:54 -05:00
rodfer
53c5baa8b6
add dilation to avg_pool2d ( #2270 )
...
* add dilation to avg_pool2d
* avg_pool_fix
* avg_pool_fix
* woo
* oops
* force it correct
---------
Co-authored-by: rodfer0x80 <rodfer0x80@proton.me>
Co-authored-by: zibokapi <zibokapi@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-11-13 08:47:56 -08:00
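Dilation spaces out the taps of the pooling window, so a 2x2 kernel with dilation=2 spans a 3x3 region while still averaging four values. Hedged usage sketch (parameter names assumed to follow the usual pool2d convention, stride defaulting to kernel_size):

    from tinygrad.tensor import Tensor

    x = Tensor.rand(1, 1, 8, 8)                       # NCHW
    y = x.avg_pool2d(kernel_size=(2, 2), dilation=2)
    print(y.shape)   # expected (1, 1, 3, 3): effective window 3x3, stride 2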
chenyu
a72b370066
llama take int and convert to Variable internally ( #2284 )
2023-11-12 17:11:37 -05:00
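Callers now pass a plain int position and the llama example binds it to a symbolic Variable itself, so the JIT can reuse one kernel across KV-cache positions. A rough sketch of the pattern (names and bounds hypothetical, not the example's exact code):

    from tinygrad.shape.symbolic import Variable

    MAX_CONTEXT = 1024  # hypothetical bound

    def symbolic_start_pos(start_pos: int):
      # bind the concrete int to a Variable; position 0 (prefill) stays a plain int
      return Variable("start_pos", 1, MAX_CONTEXT).bind(start_pos) if start_pos > 0 else 0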
valar
123ea051e6
refactor/ci: delete many # type: ignore ( #2281 )
...
* refactor/ci: delete many `# type: ignore`
* replace `axis.__class__ is int` with `isinstance(axis, int)` to make mypy happy
* add `--warn-unused-ignores` to mypy flag
refs #2240
* ci: move `--warn-unused-ignores` flag to mypy config
refs #2240
2023-11-12 11:04:20 -08:00
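The exact-class check `axis.__class__ is int` was swapped for `isinstance(axis, int)`, which mypy can narrow and which also accepts int subclasses. A tiny illustration of the difference:

    axis = True                      # bool is a subclass of int
    print(axis.__class__ is int)     # False: exact-class check rejects subclasses
    print(isinstance(axis, int))     # True, and mypy narrows the type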
George Hotz
2e2154ae4f
bad hotfix for optimize_local_size, try again
2023-11-12 10:41:11 -08:00
George Hotz
270f747065
hotfix optimize_local_size (TODO: add regression test)
2023-11-12 10:29:00 -08:00
chenyu
f5a62a1b42
fix some tests related to JitItem ( #2279 )
2023-11-11 23:00:35 -05:00
chenyu
5ef8d682e3
clean up attentions in stable diffusion ( #2275 )
2023-11-11 14:25:36 -05:00
chenyu
453f48ce02
pad None means (0,0) ( #2273 )
2023-11-11 09:50:26 -08:00
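A None entry in the per-axis padding argument is shorthand for (0, 0), i.e. leave that axis alone. Hedged illustration (Tensor.pad argument layout of that era assumed):

    from tinygrad.tensor import Tensor

    x = Tensor.ones(2, 3)
    a = x.pad((None, (1, 1)))        # None on axis 0 -> (0, 0); pad axis 1 by one on each side
    b = x.pad(((0, 0), (1, 1)))
    assert a.shape == b.shape == (2, 5)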
jxdv
c5d70c1871
typo ( #2271 )
2023-11-11 07:18:04 -08:00
chenyu
880e693207
fix llama n_kv_heads in kvcache ( #2267 )
...
* fix llama n_kv_heads in kvcache
* trigger ci
2023-11-10 21:44:39 -05:00
George Hotz
78623ba204
two simple tests
2023-11-10 16:16:06 -08:00
George Hotz
70fb8a259d
hotfix mypy
2023-11-10 15:43:30 -08:00
George Hotz
6ceea02e65
hotfix of onnx
2023-11-10 15:40:30 -08:00
geohotstan
b853e9bb8c
Onnx 1.15.0 gogogo ( #2217 )
...
* lol
* lol
* add GELULULULUL
* onnx 1.50
* fuk torch bool neg
* exclude regex tests
* exclude dequantizelinear for now
* is sunny in philly
* damn it affinegrid
* fixed auto_pad VALID
* skip 0 shape tests
* add temporary cast in Reduces
* tests should pass now
* added comments and cleanup
* try moving dequantizelinear to onnx.py
* fixed dequantizedlinear?
* cleanup
* try?
* float16 segfaults LLVM CI..???
* cleanup comments
* pin to 1.50.0
* remove use of -np.inf cuz numpy is kill
* 1.50? lol I'm actually retarded
* thx for review, muhbad
* moved Gelu higher up
2023-11-10 15:36:48 -08:00