Commit Graph

4618 Commits

chenyu
2ef33abd20 some unary functions cast int input into float (#2740)
* some unary functions cast int input into float

* precision

* image dtype
2023-12-13 00:10:29 -05:00
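
A hedged sketch of the behavior #2740 describes, assuming the import paths of this era (dtypes lived in tinygrad.helpers before later moving to tinygrad.dtype):

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumption: era-specific import path

t = Tensor([1, 2, 3], dtype=dtypes.int32)
# float-only unary functions (exp, log, sin, sqrt, ...) cast integer inputs
# to a float dtype instead of computing in integer math
print(t.exp().dtype)  # expected: a float dtype such as dtypes.float32
```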
Shawn Hagler
51afe938f1 update onnx model links (#2737) 2023-12-12 19:11:11 -08:00
chenyu
0869e7a301 update onnx benchmark urls (#2735)
onnx is remapping the models, old ones are in archive/
2023-12-12 20:46:01 -05:00
George Hotz
6d6eb9302d ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
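
For reference, ruff's line-length cap lives in the project config; a minimal sketch of the relevant stanza (the repo's actual config file and surrounding settings may differ):

```toml
[tool.ruff]
line-length = 150
```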
chenyu
00b611c156 simplify type promotion - remove weak types (#2730) 2023-12-12 16:12:57 -05:00
chenyu
ef6e942a23 dtype promotion helpers (#2724)
* dtype promotion helpers

* better tests

* space
2023-12-11 23:14:23 -05:00
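
A small illustration of the promotion rules these helpers back (#2724, with weak types removed in #2730); treat the exact promotion table as an assumption:

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumption: era-specific import path

a = Tensor([1, 2], dtype=dtypes.int32)
b = Tensor([1.0, 2.0], dtype=dtypes.float32)
# mixing int32 and float32 promotes to their least upper dtype, float32
print((a + b).dtype)
```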
Christopher Mauri Milan
0232db294d fix tolist issue (#2723) 2023-12-11 19:14:00 -08:00
chenyu
4075208127 some dtype creation spec test cases (#2722) 2023-12-11 19:33:49 -05:00
Guy Leroy
ee9e1d3662 Extend available types for safe_save (#2720)
* Extend available types to save with

* Linter fix
2023-12-11 14:50:35 -08:00
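
A minimal usage sketch of the safetensors helpers #2720 extends (safe_save/safe_load are tinygrad's API in tinygrad.nn.state; the PR widens the set of dtypes they accept):

```python
from tinygrad.tensor import Tensor
from tinygrad.nn.state import safe_save, safe_load

state = {"weight": Tensor([[1.0, 2.0], [3.0, 4.0]])}
safe_save(state, "/tmp/model.safetensors")    # write a safetensors file
loaded = safe_load("/tmp/model.safetensors")  # dict of name -> Tensor
print(loaded["weight"].numpy())
```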
qazal
a43bc78804 fix dtypes helpers for integers (#2716)
* scalar

* maybe do this instead

* Revert "scalar"

everything is a scalar

* add tests in test_dtype

* fuzz testing + fix unsigned ints

* fuzz everything
2023-12-11 09:28:19 -08:00
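
The helpers in question classify dtypes; a hedged sketch of their use (is_int/is_unsigned/is_float are real dtypes helpers, though their exact behavior at this commit is what the PR fixes):

```python
from tinygrad.helpers import dtypes  # assumption: era-specific import path

for d in (dtypes.int8, dtypes.uint32, dtypes.float16):
    # classify each dtype: integer? unsigned integer? floating point?
    print(d, dtypes.is_int(d), dtypes.is_unsigned(d), dtypes.is_float(d))
```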
chenyu
2ee6f689c5 simpler einsum (#2700) 2023-12-10 21:24:44 -05:00
George Hotz
0fd44259cd bf16 fix + cleanups from mixtral (#2698)
* bf16 fix + cleanups from mixtral

* generic bf16 cast
2023-12-10 16:31:52 -08:00
Davi Silva
7fbebb3df6 Implement einsum (#2686)
* hopeful impl for Tensor.einsum

* satisfy mypy by having less typing. :(

* a few simple tests

* even more tests

* permute tests

* xfails for improper usage

* fix LLVM test fail

* use argfix

* more helpful error message on shape mismatch
2023-12-10 15:56:01 -08:00
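
A usage sketch of the Tensor.einsum API #2686 adds (a subscript formula plus operand tensors, numpy-style; details beyond that are assumptions):

```python
from tinygrad.tensor import Tensor

a = Tensor([[1.0, 2.0], [3.0, 4.0]])
b = Tensor([[5.0, 6.0], [7.0, 8.0]])

# matrix multiply: contract over the repeated index k
print(Tensor.einsum("ik,kj->ij", a, b).numpy())
# transpose: permute the output subscripts
print(Tensor.einsum("ij->ji", a).numpy())
```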
chenyu
2d0e38e201 fix jit input_rawbuffers check wrt consts (#2689)
* fix jit input_rawbuffers check wrt consts

* .numpy()
2023-12-09 15:59:03 -05:00
geohotstan
67ff2b2b18 Formatted test_indexing (#2688)
* added tensor.clone() for more correct cloning behavior

* some work and randint issue

* formatted

* final cleanups

* oops, bug fix
2023-12-09 11:38:36 -05:00
chenyu
0fb1d47aa0 two linearizer fuzzer failed test case for webgpu (#2685)
* add a linearizer fuzzer failed for webgpu

* CI specific
2023-12-08 22:52:34 -05:00
qazal
73b067f5ce Bitcast p2 bfloat16 tests + clang fix (#2635)
* add bf16 test support

this model takes me almost a minute to download though:

https://huggingface.co/TinyPixel/Llama-2-7B-bf16-sharded/resolve/main/pytorch_model-00001-of-00014.bin?download=true: 100%|█████████████████████████████| 981M/981M [00:40<00:00, 24.2MB/s]

* ensure we first load if it is bitcast to avoid taking the address of an rvalue

* tiny bf16 in the cloud

skip GPU

* should skip torch

lint

* Revert "ensure we first load if it is bitcast to avoid taking the address of an rvalue"

This reverts commit b86a28ab84.

* break the kernel

* skip LLVM and GPU in CI

* skip CUDA
2023-12-08 10:30:10 -08:00
qazal
a29538a094 green more dtypes tests (#2656)
* universal test cast

* disable div

* midcast fixup

* add 64-bit types

* hack maximum

* use Metal precise::sin instead of default

This is because the default sin function uses single-precision math: https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#page=164

* LLVM code_for_op support for var_dtype

* comment out maximum for now with a TODO explaining it

* Revert "hack maximum"

This reverts commit d170048c5f.

* make the comment more specific

* slightly more forgiving

* ok does this fail in all backends?

* weird its only Metal CI

* add graph

* skip sin of nan for CUDACPU

This is only happening in the CUDACPU runtime and not CUDA itself. https://github.com/tinygrad/tinygrad/actions/runs/7128973726/job/19412000385#step:16:36

* METAL and CUDACPU behave differently in overflows with numpy running on CI

* that skip is wrong

* skip fp16 tests on LLVM similar to test_dtype

original commit that skipped LLVM in CI 1826ff6b89

* remove all of sin from CUDACPU

* limit range of values in CUDACPU and METAL CI

* Revert "use Metal precise::sin instead of default"

This reverts commit d960094d4a.

* change atol and rtol for Metal sin

* METAL CI is more imprecise

* cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-12-08 10:29:20 -08:00
George Hotz
4164d0ebbd multitensor start (#2676)
* multitensor work

* early gen fixes the tests

* atol for flaky test
2023-12-07 17:07:05 -08:00
Ahmed Harmouche
4b01839774 support vals on WebGPU, run more tests (#2668)
* Vals on webgpu, run more tests

* Skip slow tests, run symbolic ops tests

* Balance out tests
2023-12-07 16:45:21 -08:00
geohotstan
d02ff21f1a enable test_index and test_advancedindex (#2648)
* enable test_index and test_advancedindex with pretty diff

* removed contig

* created set_ helper function

* comment change

* del empty line

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-07 19:44:39 -05:00
George Hotz
00d9eda961 FROM -> COPY, move vars_from_ast (#2675) 2023-12-07 16:32:30 -08:00
chenyu
51af99367f fix fuzz_linearizer using new device Buffer (#2674) 2023-12-07 19:21:47 -05:00
nimlgen
650117a8f6 split large jit into several graphs (#2650)
* jit graph split

* update

* that's fine, not all buffers are there now

* use logarithmic tho, seems good

* no keep it simple

* add test

* simplify

* split graph when jit item cannot be graphed
2023-12-07 10:58:25 -08:00
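
For context, a minimal TinyJit usage sketch (the import path is era-specific; the graph splitting #2650 adds happens inside the JIT, invisible to this API):

```python
from tinygrad.tensor import Tensor
from tinygrad.jit import TinyJit  # assumption: era-specific import path

@TinyJit
def step(x: Tensor) -> Tensor:
    # after warm-up, the kernels launched here are captured and replayed;
    # #2650 splits very large captures into several graphs internally
    return (x * 2).sum().realize()

for _ in range(3):  # early calls warm up and capture, later calls replay
    print(step(Tensor.ones(4)).numpy())
```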
chenyu
fd21eced74 reduce gpt2 kernel count in test_real_world (#2663) 2023-12-06 21:57:04 -05:00
chenyu
371005cb2d use one kvcache tensor in gpt2 instead of two separate caches (#2662)
* use one kvcache tensor in gpt2

* test case

* is None

* better test cases
2023-12-06 20:59:17 -05:00
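
The shape of the idea in #2662, with hypothetical dimensions: one tensor whose leading axis of size 2 holds k and v, instead of two cache tensors. tinygrad's gpt2 example manages its cache differently (preallocation and assignment), so this is only a sketch:

```python
from tinygrad.tensor import Tensor

def append_kv(cache, k, v):
    # axis 0 of the single cache holds k (index 0) and v (index 1)
    step = k.unsqueeze(0).cat(v.unsqueeze(0), dim=0)  # (2, batch, heads, 1, head_dim)
    return step if cache is None else cache.cat(step, dim=3)

cache = None
for _ in range(3):
    k, v = Tensor.randn(1, 12, 1, 64), Tensor.randn(1, 12, 1, 64)
    cache = append_kv(cache, k, v)
keys, values = cache[0], cache[1]  # both views share the one cache tensor
print(cache.shape)  # (2, 1, 12, 3, 64)
```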
George Hotz
5a7b2ff1b2 masked shapetrackers (#2657) 2023-12-06 11:22:26 -08:00
chenyu
b931a20882 minor shapetracker cleanup (#2652) 2023-12-06 11:43:52 -05:00
qazal
c704a77ca0 green dtypes ALU tests (#2617)
* dtypes alu test

* those types don't exist in torch

* floats

* more tests

* disable those

* a couple unary tests

* skip float16 tests in CI for GPU

* fix LLVM bool add True+True=1+1=2 which truncates to False in native LLVM

* remove hardcoded float for LLVM ALU fns

* less sensitive atol for fp32; 1e-10 is flaky and sometimes failed even after reverting the merge commit for non-fp32 math, since nothing has changed in our kernels for fp32.

* return on overflows

* fix CUDA exp2

* compute results of op regardless of bounds in a python backend

* skip fp16 in GPU and CUDACPU

* fuzz a smaller range in the float_midcast_int32 test

I sampled this and we overflow ~70% of the time.
Because numpy behaves differently on different devices for overflows, and Metal seems to do the same, I'm opting to eliminate the non-determinism here

* remove CUDA exp2 overload it's already there now

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-12-06 08:15:46 -08:00
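
The LLVM bool fix called out above is the classic 1-bit truncation pitfall; a plain-Python/numpy illustration of the two semantics (not tinygrad code):

```python
import numpy as np

print(np.bool_(True) + np.bool_(True))  # saturating bool add: True
print(bool((1 + 1) & 1))  # add in an int register, truncate to 1 bit: False
```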
Amrit Sahu
71d989b476 adding test to cover #2644 failure (#2645) 2023-12-06 11:00:30 -05:00
Ahmed Harmouche
50dcd532d5 Get all WEBGPU test_ops passing (#2646)
* Get all WEBGPU tests passing

* Custom render store is not needed in wgsl
2023-12-06 07:40:37 -08:00
qazal
be09cc87c1 Bitcast support / fast bf16 load (#2011)
* bitcast renderers

* fast llama load

* make it one kernel

* regression testing p1: re-enable test_dtype for all backends

fix GPU

* regression testing p2: fuzz all possible cases against numpy

remove hardcoded tests since the fuzzer covers them

* define ushort

* fix indent, probably need flake8 back for CI to catch

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-12-05 16:19:28 -08:00
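
The fast bf16 load rests on a standard trick: a bfloat16 is the top 16 bits of a float32, so the cast is a 16-bit shift plus a bitcast. A numpy sketch of the idea (not tinygrad's actual kernel):

```python
import numpy as np

def bf16_to_f32(raw: np.ndarray) -> np.ndarray:
    # raw holds bfloat16 bit patterns as uint16; shift them into the high
    # half of a uint32, then reinterpret the bits as float32
    return (raw.astype(np.uint32) << 16).view(np.float32)

bits = np.array([0x3F80, 0x4000, 0xC040], dtype=np.uint16)  # 1.0, 2.0, -3.0
print(bf16_to_f32(bits))  # [ 1.  2. -3.]
```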
George Hotz
232ed2af3f more test cleanups (#2631)
* more test cleanups

* move test example back
2023-12-05 16:17:57 -08:00
wozeparrot
6d58c19736 binaryops xor (#2627)
* feat: initial xor

* feat: numpy xor

* feat: llvm xor

* feat: quick test for xor

* feat: slightly working xor in torch

* feat: xor in tensor

* feat: slightly better test
2023-12-05 13:21:42 -08:00
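
A usage sketch of the xor op this PR wires through the backends (elementwise ^ on integer tensors; per-backend availability at this commit is an assumption):

```python
from tinygrad.tensor import Tensor
from tinygrad.helpers import dtypes  # assumption: era-specific import path

a = Tensor([0b1100, 0b1010], dtype=dtypes.int32)
b = Tensor([0b1010, 0b0110], dtype=dtypes.int32)
print((a ^ b).numpy())  # elementwise bitwise xor: [6, 12]
```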
George Hotz
c53e854687 cast image doesn't work on nvidia (#2626)
* cast image doesn't work on nvidia

* hmm, interpreted backends use buffer size 0

* fix type

* no lru
2023-12-05 12:48:19 -08:00
George Hotz
8c67eb1c92 GPT bugfixes (#2624)
* simple fixes

* fix exp2

* fixed

* parallel beam for CUDA

* fix image dtypes
2023-12-05 11:42:28 -08:00
chenyu
8903a40541 update the onnx test so cuda local run passes (#2623) 2023-12-05 14:04:17 -05:00
George Hotz
35b5e95097 parallel beam search (#2610)
* better print

* fix beam search with vars

* cleanups

* parallel is not default

* restore that

* bugfix

* cleanups

* bugfix
2023-12-05 10:09:45 -08:00
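
Beam search is driven by the BEAM environment variable (the beam width); a hedged sketch of invoking it, noting that whether the search itself runs in parallel is internal to this PR:

```python
import os
os.environ["BEAM"] = "2"  # beam width; unset or 0 disables the search

from tinygrad.tensor import Tensor

# with BEAM set, each newly compiled kernel is tuned by beam search
out = (Tensor.randn(256, 256) @ Tensor.randn(256, 256)).realize()
```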
chenyu
dd8b4632a4 regression test for reshape fix #2616 (#2620) 2023-12-05 11:46:33 -05:00
chenyu
c257a0dd99 minor reshape cleanups (#2619)
* minor reshape cleanups

* mea culpa
2023-12-05 11:23:17 -05:00
geohotstan
fc00da538d helper functions for test_indexing.py (#2615)
* add some helpers

* I think it should all work..

* fixed get_set_tensor

* done

* del import

* bye bye typing

* style

* remove empty lines lol

* deleted dtype arg

* del trailing space
2023-12-05 02:00:41 -05:00
chenyu
7322ab8dfd onnx tests with different dtypes (#2612) 2023-12-05 00:04:08 -05:00
geohotstan
f12bcccb87 [ready] refactor getitem round 2 :D (#2568)
* new getitem

* go

* add temporary simple tests

* better

* comments

* WOW that took a while

* save 1 line lol

* work

* still need to add comprehensive tests, but i think getitem looks nice :D

* GIMME GREEN CI CHECKMARK PLS

* try..

* k idk

* added tests for errors

* fixed small hack

* added tests

* almost good

* try no contig?

* yay no more contig + comments and spacing

* finishing touches (comments)

* revert regex unittests lol

* add suggested change

* oops I fell asleep yesterday
2023-12-04 22:36:32 -05:00
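
A few of the getitem forms the refactor covers, as a usage sketch (standard Tensor indexing; the exact combinations supported at this commit are assumptions):

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(12).reshape(3, 4)
print(t[1].numpy())               # int index: row 1
print(t[:, 2].numpy())            # slice: column 2
print(t[Tensor([0, 2])].numpy())  # tensor (advanced) indexing: rows 0 and 2
```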
George Hotz
09b6e254a3 hip compile speed (#2606) 2023-12-04 13:47:40 -08:00
Amrit Sahu
e8d6a6ef2e view.reshape without symbolic (#2218)
* handle reshape of contiguous subparts with explicit mask

* remove the add/remove ones logic in reshape

* accommodate ones in accumulate logic

* make multiply commutative

* fix linting

* make mypy happy

* add test for commutative mul

* merge dimensions in shape_strides for 1 range masks

* add offsets for merging

* fix linting

* add back explicit 1 reshapes

* fix mypy errors

* fix accumulate by including state

* include non-zero stride dimension in acc

* small cleanup

* more compact to_shape_strides

* more logical cleanup

* compress more

* compress reshape mask

* adding some comments

* small bug fix

* improve test coverage

* remove explicit add remove ones

* small bug in test

* enable test_reshape_splitting_combining

* small fix

* 10 lines less to_shape_strides

* shorten reshape mask

* some more cleanup

* more cleanup

* introduce some symbols for compactness

* more symbols

* more cleaner

* lessen symbols, it became less readable

* remove merge_views from view.reshape

* change to_shape_strides to _merge_dims

* improve readability

* fix corner case

* cleanup

* better handling of 1 <= Variable('i',1,10) & new_dim = Variable('i',1,10)

* rewrite _reshape_mask for readability

* fix white space

* add comment

* nice shorthands for readability

* add proof in docs

* small nit

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-04 12:46:53 -05:00
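
For intuition, the underlying question in view.reshape is whether a strided view can be re-expressed for a new shape without materializing the data; numpy exhibits the same dichotomy (an analogy only, not tinygrad's algorithm, which instead builds a mask-aware view where possible):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
v = a.reshape(6, 4)            # merging contiguous dims: strides permit a view
print(np.shares_memory(a, v))  # True

t = a.transpose(1, 0, 2)       # non-contiguous view
w = t.reshape(6, 4)            # no stride pattern fits, so numpy copies
print(np.shares_memory(a, w))  # False
```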
George Hotz
664475f247 vals is an argument (#2599)
* vals is an argument

* don't even know how that's legal python
2023-12-03 21:50:43 -08:00
George Hotz
fcd0b2ee6c fix multigpu on tinybox (#2595)
* fix multigpu on tinybox

* fixed multigpu
2023-12-03 16:48:07 -08:00
George Hotz
61c0113928 test external_multi_gpu.py (and works in CUDA) 2023-12-03 15:57:13 -08:00
George Hotz
bbeba8ec85 use default dict for external_model_benchmark (#2592)
* device default

* Device.DEFAULT

* half max for cuda

* CUDA_INCLUDE_PATH

* closer to working

* cuda fixups

* Update ops_cuda.py
2023-12-03 15:25:43 -08:00
chenyu
550817389a enable test_sample for all backend (#2593) 2023-12-03 17:20:27 -05:00