chenyu
ff5399f053
move one last dtype test from test_helpers to test_dtype (#2975)
2024-01-02 12:37:56 -05:00
Kevin Herro
bd6a0c90a0
add Tensor.split (#2750)
* add Tensor.split (#2677)
* fix mypy errors
* add list support for Tensor.split
* fix ruff comments
* match tensor.split api
* simplify split and test_split
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-01 22:09:04 -08:00
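A hedged sketch of the torch-style API the bullets above say this matches (shapes and names here are illustrative, not from the PR):

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(10).reshape(5, 2)
a, b, c = t.split(2)      # chunk size 2 along dim 0 -> shapes (2,2), (2,2), (1,2)
x, y = t.split([1, 4])    # a list gives explicit chunk sizes, torch.Tensor.split style
print([p.shape for p in (a, b, c)], x.shape, y.shape)
```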
George Hotz
e7a432b479
search refactor (#2969)
* minor search cleanup
* now that saves lines
* fix
2024-01-01 17:39:26 -08:00
chenyu
58d3d5030b
vars_from_ast -> LazyOp.vars ( #2965 )
2024-01-01 18:12:38 -05:00
George Hotz
a280cfe169
move dtypes to dtype.py (#2964)
* move dtypes to dtype.py
* fix urllib
2024-01-01 14:58:48 -08:00
George Hotz
c81ce9643d
move globalcounters to ops (#2960)
* move globalcounters to ops
* missed a few
* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
8291986959
Variable.sum -> Node.sum, Variable.ands -> Node.ands (#2961)
2024-01-01 16:21:28 -05:00
chenyu
3d720b5761
move expand_idx, iter_idxs and expand_node from symbolic to linearizer (#2959)
2024-01-01 14:41:21 -05:00
George Hotz
56f44bd10e
move the compiler cache to be global (#2957)
* move the compiler cache to be global
* remove non robust test
* remove dead code
2024-01-01 10:59:56 -08:00
George Hotz
063f465604
simpler webgpu (#2956)
* simpler webgpu
* skip that test
2024-01-01 10:28:59 -08:00
chenyu
50f2e31d26
cleanup float4 grouping in global_load and global_store (#2942)
* cleanup float4 grouping in global_load and global_store
* fix test decorator
2023-12-27 14:10:04 -05:00
chenyu
54629b56d2
minor cleanup in kernel and linearizer (#2937)
* minor cleanup in kernel and linearizer
shortened long lines, fixed spacing, and colocated variables
* no deadline in hypothesis test
2023-12-26 12:05:32 -05:00
chenyu
820f2e054e
fix PADTO optimization (#2935)
the correct condition is that PADTO cannot be applied to a reduce axis at all, not just when the reduce is Reduce.MAX in ops.
even for Reduce.SUM, the reduce axis may have had a div applied before it, so a padded 0 becomes inf and the sum over it is incorrect.
2023-12-25 22:52:49 -05:00
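A numpy sketch of the failure mode described above (illustration only, not tinygrad code):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
correct = (1.0 / x).sum()      # 1.75

# PADTO pads the reduce axis before the elementwise ops run, so the
# padded 0 goes through the div and becomes inf before the SUM:
padded = np.pad(x, (0, 1))     # [1., 2., 4., 0.]
wrong = (1.0 / padded).sum()   # inf -> the padded sum is incorrect
print(correct, wrong)
```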
qazal
dca5e4fe74
tensor == tensor should be bool (#2916)
* return bool
* add tests to the type spec
* fix multinomial
* fix tril
* fix round
* fix NegativeLogLikelihoodLoss
* rm debug
* webgpu
* more webgpu
* bitwise or for adding two bools
* onnx ops dont need to cast anymore
* Revert "bitwise or for adding two bools"
This reverts commit b413babffa.
* workaround for metal neg
* just the tests in the type spec
2023-12-25 12:38:47 -05:00
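A minimal sketch of the new behavior; the dtypes import path follows #2964 above:

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

a, b = Tensor([1, 2, 3]), Tensor([1, 0, 3])
mask = (a == b)    # comparisons now return a bool tensor...
print(mask.dtype)  # ...dtypes.bool, not the operands' dtype
```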
chenyu
8a8aed23d2
test dtypes of return values of cumsum, argmax/min, multinomial (#2933)
* test dtypes of return values of cumsum, argmax/min, multinomial
cumsum behaves like sum, and functions that return an index return in dtypes.default_int
* because webgpu is different
2023-12-25 11:33:17 -05:00
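A small sketch of the dtype rules this commit tests (a hedged illustration, assuming the behavior stated in the message):

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

t = Tensor([1, 2, 3], dtype=dtypes.int8)
print(t.cumsum(0).dtype)  # promoted like sum's output dtype
print(t.argmax().dtype)   # index-returning ops use dtypes.default_int
```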
chenyu
1fb815e77e
hotfix fix coder. RMSNorm cannot have float16 input (#2932)
* hotfix fix coder. RMSNorm cannot have float16 input
* update real world test due to new kernels
* more type casts
2023-12-25 02:28:11 -05:00
Will
016aebcd84
Fixed Tensor.randint() not accepting tuple shapes (#2923)
* ww/Fixed Tensor.randint() to accept shape tuples ()
* ww/Wrote a test to cover this typo
* ww/Updated Tensor random objects to optionally take (,) or *() to be more consistent
* ww/no lint no worries
* ww/Made peace with linter
* ww/Added new line; can't reduce line size without reducing readability
* ww/reverted to using .mul
2023-12-24 20:32:26 -05:00
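A sketch of the two call forms the fix allows (a minimal illustration, not the PR's test):

```python
from tinygrad.tensor import Tensor

a = Tensor.randint(2, 3)    # *shape form, already worked
b = Tensor.randint((2, 3))  # tuple form, accepted after this fix
print(a.shape == b.shape == (2, 3))  # True
```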
Isalia20
8de1fc2539
Einsum space fix (#2927)
* space removal in formula and a single test to cover it
* space in torch einsum as well
* replacing spaces in a var formula to support truncating all the spaces
2023-12-24 01:23:27 -05:00
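A sketch of the fixed behavior, assuming spaces anywhere in the formula are now stripped before parsing:

```python
from tinygrad.tensor import Tensor

a, b = Tensor.ones(2, 3), Tensor.ones(3, 4)
c = Tensor.einsum("i j , j k -> i k", a, b)  # spaces no longer break parsing
print(c.shape)  # (2, 4)
```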
chenyu
b55b55d56e
use at least int32 and uint32 for sum output (#2926)
* use at least int32 and uint32 for sum output
* use the correct type for acc
* fix opencl
* llvm mulacc
2023-12-24 01:14:54 -05:00
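A minimal sketch of the promotion rule in the title (dtypes import per #2964):

```python
from tinygrad.tensor import Tensor
from tinygrad.dtype import dtypes

print(Tensor([1, 2, 3], dtype=dtypes.int8).sum().dtype)   # int32, not int8
print(Tensor([1, 2, 3], dtype=dtypes.uint8).sum().dtype)  # uint32
```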
chenyu
089703a390
cleanup test_dtype_alu (#2919)
wrapped long lines and lowered atol for METAL.sin to 2, since the difference of any two sin values is bounded by 2
2023-12-22 17:29:31 -05:00
chenyu
50927defad
s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g
* not that
2023-12-22 14:45:13 -05:00
chenyu
2783e1b50d
bugfix Tensor.item when it's unbased (#2913)
a numel-1 tensor's lazydata can be unbased (a view), so Tensor.item should call lazydata.base.realized
2023-12-22 13:50:06 -05:00
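One plausible way to hit the fixed path (a hedged sketch; indexing yields a view, so the numel-1 lazydata is unbased):

```python
from tinygrad.tensor import Tensor

t = Tensor([1, 2, 3])[1]  # a 1-element view: its lazydata has a separate base
print(t.item())           # 2; item() must read lazydata.base.realized
```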
Oleg Rybalko
c3133adb8c
Disk shm refactor (#2912)
* better support for platform dependent flags
* osx test support
* removed unused import and made line length <150
* changed osx ci shm
* lstrip in case SharedMemory._name is passed
2023-12-22 09:23:37 -08:00
chenyu
3855432265
don't use numpy to create Tensor(None) (#2909)
* don't use numpy to create Tensor(None)
empty suffices
* parentheses
2023-12-22 01:07:44 -05:00
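A minimal sketch of what the change touches; the empty-buffer allocation is internal, so only the construction is shown:

```python
from tinygrad.tensor import Tensor

t = Tensor(None)  # now backed by an empty buffer, with no numpy round-trip
print(t.shape)    # still behaves as an (empty) tensor for the caller
```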
chenyu
50cfb1fb3a
update onnx model links (#2908)
updated in https://github.com/onnx/models/pull/644
2023-12-22 00:19:41 -05:00
chenyu
1bbeb3fe2f
remove the different rtol / atol for openpilot CUDA in benchmark (#2907)
not sure what the issue was but seems to be fixed on master
2023-12-21 22:23:39 -05:00
chenyu
a543d8bea8
fuzz default dtypes for some test_dtype tests (#2906)
* fuzz default dtypes for some test_dtype tests
* ocd
* setUp and tearDown
2023-12-21 22:00:21 -05:00
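A sketch of the setUp/tearDown pattern the last bullet refers to: save and restore the global default dtypes so fuzzed values can't leak between tests (a hedged outline, not the PR's test):

```python
import unittest
from tinygrad.dtype import dtypes

class TestFuzzedDefaults(unittest.TestCase):
  def setUp(self):
    # remember the process-wide defaults before a test fuzzes them
    self.old = (dtypes.default_int, dtypes.default_float)
  def tearDown(self):
    # restore them so later tests see the stock defaults
    dtypes.default_int, dtypes.default_float = self.old
```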
George Hotz
5cac6338a4
apply the multitensor optimizations in lazy.py (#2901)
* apply the multitensor optimizations in lazy.py
* less lines
* hack for webgpu
* save a line
2023-12-21 13:55:49 -08:00
chenyu
5bf43c9634
reenable one onnx test that failed due to dtype (#2902)
2023-12-21 15:50:02 -05:00
George Hotz
193109a88c
hotfix: compare on ids
2023-12-20 23:47:50 -08:00
George Hotz
f6c7833f9f
fast compare for lazyop (#2893)
2023-12-20 23:32:27 -08:00
George Hotz
41b2a25be6
Fix exponential behavior in lazyops (#2890)
* add cache to ast_parse and lazyop builder
* add caches
2023-12-20 22:06:50 -08:00
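A generic sketch of why the caches help (stand-in types, not tinygrad's code): walking an op DAG with shared subtrees revisits each node once per path, which is exponential in depth, while memoizing by node identity makes it linear.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:                    # hypothetical stand-in for a LazyOp node
  name: str
  src: tuple = ()

def ops_in(op, visited=None):
  if visited is None: visited = {}
  if id(op) not in visited:  # each shared node is expanded only once
    visited[id(op)] = op
    for s in op.src: ops_in(s, visited)
  return list(visited.values())

# a diamond: both ADDs share the same LOAD, but it is walked only once
load = Op("LOAD")
graph = Op("MUL", (Op("ADD", (load,)), Op("ADD", (load,))))
print([o.name for o in ops_in(graph)])
```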
George Hotz
8c4a0f8e15
Fix int child count (#2882)
* pad ops broke coder
* that contiguous fixes it
* Update lazy.py
* recursive add
* fix all
* revert that
* todo test
2023-12-20 21:06:27 -08:00
George Hotz
7da2325dc7
get_lazyops() -> lazyops (#2884)
* get_lazyops() -> lazyops
* don't compare empty mem
2023-12-20 18:04:49 -08:00
George Hotz
e1861ab65e
remove realize from optimizer (#2880)
* remove realize from optimizer
* one still needed
* opt realize
2023-12-20 16:42:41 -08:00
George Hotz
1765849937
new lazy, benchmark (#2878)
* lazy rewrite, try 2
* min fix tests
* pass contig test
* put broken pads back
* move that to realize
* no contig child fixes array packing
* so wrong
* now that's correct
* base children
* fix bind issues
* disable to_image_idx
* fix tests
* that failure shouldn't break other tests
* more fixes
* fix torch
* skip failing tests in CI
* 1e-7
* half is broken
* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
Peter Cawley
dae8976889
Fix reshape merging with masks (#2877)
2023-12-20 14:00:58 -08:00
George Hotz
8fe24038d8
Revert "mulacc fusion cleanup ( #2871 )" ( #2876 )
This reverts commit 863c5b26ed.
2023-12-20 13:26:25 -08:00
qazal
863c5b26ed
mulacc fusion cleanup (#2871)
* add mulacc fusion tests
* cleanup the implementation
* fix indent in the test utility
* less verbose
2023-12-20 15:39:54 -05:00
chenyu
e13b4964d7
remove the all_int(shape) check in Tensor._loadop (#2874)
* remove the all_int(shape) check in Tensor._loadop
we can now support jittable symbolic-shape random with a custom rand, and we can formalize it in the test after threefry is ready
* MOCKHIP false positive
2023-12-20 15:04:50 -05:00
qazal
5f07ef455e
update dtypes (#2872)
2023-12-20 15:04:02 -05:00
George Hotz
ca59054463
fix shapetracker math (#2861)
* proper test
* all st math good now
* fix real_strides bug
2023-12-19 22:17:34 -08:00
chenyu
5a739e8c20
update one skipped pad_reshape test that was fine (#2860)
* update one skipped pad_reshape test that was fine
had a typo
* this one passed
2023-12-19 23:25:52 -05:00
chenyu
ad233d557f
disable reshape merging with masks (#2858)
the fuzzer found a bug, and the merging logic isn't complete
2023-12-19 19:06:16 -05:00
Oleg Rybalko
42a038c83f
More readable torch_load ext check (#2853)
* more readable extension check
* enable tarfile test
* detach tensor if requires grad in torch
2023-12-19 14:53:15 -05:00
chenyu
172a88e719
skip slow test_indexing on METAL (#2852)
LLVM still runs it and is a lot faster; it would be interesting to know why.
also reworded some error messages and removed the regex check
2023-12-19 12:00:54 -05:00
geohotstan
fec8e9060c
Add simple fancy indexing exceptions (#2706)
* fancy indexing raise error
* updated error message
* improved error check
* oops
* fixed onnx
* oops typo
* merge
* add full_flatten
* try
* merged and updated some tests
* more cleaning
* done
* temp fix onnx
* try
* add todo in onnx_test
* reword
* gah
2023-12-19 11:23:51 -05:00
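A hedged sketch of the kind of input that now fails fast; the exact rejected cases and messages are per the PR, and the float-index example is an assumption:

```python
from tinygrad.tensor import Tensor

t = Tensor.arange(12).reshape(3, 4)
print(t[Tensor([0, 2])].shape)  # int-tensor indexing is supported: (2, 4)
try:
  t[Tensor([0.5, 1.5])]         # a non-int index tensor should now raise
except (IndexError, TypeError) as e:
  print("rejected:", e)
```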
George Hotz
90fb09b55c
remove unused _device_extra_args
2023-12-18 22:14:58 -08:00
George Hotz
b2192b5400
minor improvements (#2845)
2023-12-18 22:09:08 -08:00
George Hotz
d086325b1b
hotfix: failing tests
2023-12-18 21:12:42 -08:00