Commit Graph

3196 Commits

Author SHA1 Message Date
chenyu
d424babe2c tensor.py cleanup around Tensor.slice (#2921)
use None for no-op slice and pad
2023-12-22 19:46:39 -05:00
chenyu
089703a390 cleanup test_dtype_alu (#2919)
wrapped long lines and lowered atol for METAL.sin to 2 since atol of two sins are bounded by 2
2023-12-22 17:29:31 -05:00
chenyu
3ba591c3fd less outdated abstraction.py (#2917)
removed some old terms and updated types and code pointers
2023-12-22 15:31:02 -05:00
chenyu
50927defad s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
chenyu
2783e1b50d bugfix Tensor.item when it's unbased (#2913)
it's possible for numel 1 tensor lazydata to be unbased and should call lazydata.base.realized
2023-12-22 13:50:06 -05:00
Oleg Rybalko
c3133adb8c Disk shm refactor (#2912)
* better support for platform dependent flags

* osx test support

* removed unused import and made line length <150

* changed osx ci shm

* lstrip in case SharedMemory._name is passed
2023-12-22 09:23:37 -08:00
chenyu
3855432265 don't use numpy to create Tensor(None) (#2909)
* don't use numpy to create Tensor(None)

empty suffices

* parentheses
2023-12-22 01:07:44 -05:00
chenyu
50cfb1fb3a update onnx model links (#2908)
updated in https://github.com/onnx/models/pull/644
2023-12-22 00:19:41 -05:00
chenyu
1bbeb3fe2f remove the different rtol / atol for openpilot CUDA in benchmark (#2907)
not sure what the issue was but seems to be fixed on master
2023-12-21 22:23:39 -05:00
chenyu
a543d8bea8 fuzz default dtypes for some test_dtype tests (#2906)
* fuzz default dtypes for some test_dtype tests

* ocd

* setUp and tearDown
2023-12-21 22:00:21 -05:00
wozeparrot
5f3d5cfb02 catch cycles in print_tree (#2891)
* feat: smaller tree on references

* fix: shorter line

* fix: huh

* fix: should be all

* feat: cleaner

* fix: extra imports

* fix: pass by reference
2023-12-21 18:40:37 -08:00
George Hotz
4432cb17bb minor cleanups / remove that op (#2905) 2023-12-21 18:24:20 -08:00
chenyu
fd0ba33b38 onnx_ops formatting cleanup (#2904)
also removed a case in safe_numpy that always convert 0-dim array to 1-dim
2023-12-21 20:06:06 -05:00
George Hotz
5cac6338a4 apply the multitensor optimizations in lazy.py (#2901)
* apply the multitensor optimizations in lazy.py

* less lines

* hack for webgpu

* save a line
2023-12-21 13:55:49 -08:00
chenyu
5bf43c9634 reenable one onnx test failed due to dtype (#2902) 2023-12-21 15:50:02 -05:00
chenyu
677ae7673d use np.less and torch.lt for CMPLT (#2899)
also removed one unused output_type
2023-12-21 14:37:24 -05:00
qazal
d2e9245de8 render_locals takes a dtype (#2873)
Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-21 14:15:28 -05:00
chenyu
6116039f7b don't match dtype with first input in where (#2898)
* don't match dtype with first input in where

`Tensor([1, 2, 3]).where(1.2, 2.3)` the first `[1, 2, 3]` can directly cast into bool without casting float (in broadcasted) first

* cast in one place
2023-12-21 13:02:15 -05:00
chenyu
7dc3352877 increase stable diffusion validation threshold 1e-4 -> 3e-4 (#2897)
saw a flaky CI failure with 1.1e-4, and 3e-4 is a good number
2023-12-21 11:45:25 -05:00
qazal
24e79e0f53 Move the webgpu CMPLT hack to one place (#2895)
* move hacks to one place

* no casting in mlops, move to tensor

* ruff fix
2023-12-21 11:14:56 -05:00
George Hotz
852ef57ba4 fix readme typo 2023-12-21 08:06:24 -08:00
George Hotz
193109a88c hotfix: compare on ids 2023-12-20 23:47:50 -08:00
George Hotz
f6c7833f9f fast compare for lazyop (#2893) 2023-12-20 23:32:27 -08:00
chenyu
1500aca43d remove output_type in ops_cpu and ops_torch (#2892)
now the input types are matched and checked in lazy, we can remove these output_type.
also remove the usage of least_upper_dtype in ops.py since we can just use the input type
2023-12-21 02:11:27 -05:00
chenyu
2d2c4980fe assert for elementwise dtypes in lazy (#2888)
* assert for elementwise dtypes in lazy

* no image hack

* check dtype of scalar for IMAGE=2
2023-12-21 01:42:32 -05:00
George Hotz
41b2a25be6 Fix exponential behavior in lazyops (#2890)
* add cache to ast_parse and lazyop builder

* add caches
2023-12-20 22:06:50 -08:00
George Hotz
8c4a0f8e15 Fix int child count (#2882)
* pad ops broke coder

* that contiguous fixes it

* Update lazy.py

* recursive add

* fix all

* revert that

* todo test
2023-12-20 21:06:27 -08:00
chenyu
8a04107d30 move the op casting logic from mlops to tensor try 2 (#2887)
* unary works

* where works

* add sub mul

* xor div

* CMPLT

* sparse_categorical_crossentropy

* image const

* sparse_categorical_crossentropy
2023-12-20 23:50:37 -05:00
George Hotz
7da2325dc7 get_lazyops() -> lazyops (#2884)
* get_lazyops() -> lazyops

* don't compare empty mem
2023-12-20 18:04:49 -08:00
George Hotz
64dded27f0 pad ops broke coder (#2881)
* pad ops broke coder

* that contiguous fixes it

* Update lazy.py
2023-12-20 17:03:41 -08:00
George Hotz
e1861ab65e remove realize from optimizer (#2880)
* remove realize from optimizer

* one still needed

* opt realize
2023-12-20 16:42:41 -08:00
George Hotz
1765849937 new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
Peter Cawley
dae8976889 Fix reshape merging with masks (#2877) 2023-12-20 14:00:58 -08:00
George Hotz
8fe24038d8 Revert "mulacc fusion cleanup (#2871)" (#2876)
This reverts commit 863c5b26ed.
2023-12-20 13:26:25 -08:00
qazal
863c5b26ed mulacc fusion cleanup (#2871)
* add mulacc fusion tests

* cleanup the implementation

* fix indent in the test utility

* less verbose
2023-12-20 15:39:54 -05:00
chenyu
e13b4964d7 remove the all_int(shape) check in Tensor._loadop (#2874)
* remove the all_int(shape) check in Tensor._loadop

we can support jittable symbolic shape random with custom rand now, and we can formalize it in the test after threefry is ready

* MOCKHIP false positive
2023-12-20 15:04:50 -05:00
qazal
5f07ef455e update dtypes (#2872) 2023-12-20 15:04:02 -05:00
chenyu
857c35d256 make gpt2 decode output just once at the end (#2869)
also updated function name from greedy_until to generate, as it's not greedy nor until
2023-12-20 12:14:55 -05:00
chenyu
e92069fb1c remove unused symbolic.is_sym_int (#2868) 2023-12-20 11:37:54 -05:00
George Hotz
ca59054463 fix shapetracker math (#2861)
* proper test

* all st math good now

* fix real_strides bug
2023-12-19 22:17:34 -08:00
chenyu
5a739e8c20 update one skipped pad_reshape test that was fine (#2860)
* update one skipped pad_reshape test that was fine

had a typo

* this one passed
2023-12-19 23:25:52 -05:00
chenyu
39af93ed7c minor tensor.py function cleanup (#2859)
* minor tensor.py function cleanup

* where outputs not aligned yet
2023-12-19 22:39:39 -05:00
George Hotz
94f71fe238 random and empty shouldn't reshape 2023-12-19 18:09:03 -08:00
George Hotz
637879af78 add direct install to readme 2023-12-19 18:04:00 -08:00
chenyu
ad233d557f disable reshape merging with masks (#2858)
fuzzer found a bug, and it's not complete
2023-12-19 19:06:16 -05:00
chenyu
1231ec5a02 run the sz.py line count at the end of linter ci (#2857) 2023-12-19 16:33:12 -05:00
George Hotz
ac6ec936cd update contributing 2023-12-19 12:19:14 -08:00
George Hotz
e477cc2f45 hotfix: README is ~25 ops to stop getting PRs about it 2023-12-19 11:53:35 -08:00
Oleg Rybalko
42a038c83f More readable torch_load ext check (#2853)
* more readable extension check

* enable tarfile test

* detach tensor if requires grad in torch
2023-12-19 14:53:15 -05:00
chenyu
172a88e719 skip slow test_indexing on METAL (#2852)
LLVM still runs and is a lot faster, would be curious to know why.
also reworded some error messages and remove regex check
2023-12-19 12:00:54 -05:00