leopf
5d92efb121
[BUGFIX] Tensor([]).data() ( #7884 )
* added test, fix
* fix only for (0,) shape
* Revert "fix only for (0,) shape"
* test_data_empty_multi_dim
2024-11-24 16:42:57 -05:00
geohotstan
ad9df26fba
add test for inconsistent behavior in float to int casting ( #7870 )
* found teeny bug
* no healthcheck
* change function name
2024-11-24 07:31:34 -05:00
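The inconsistency such a test targets usually comes down to rounding direction when converting float to int; a minimal stdlib-only illustration (the function names here are mine, not tinygrad's):

```python
import math

# Illustrative only: float->int conversion can truncate toward zero
# (C-style) or floor toward -inf; backends that disagree on this
# produce exactly the kind of inconsistency a cast test checks for.
def cast_trunc(x: float) -> int:
    return math.trunc(x)   # drops the fractional part, toward zero

def cast_floor(x: float) -> int:
    return math.floor(x)   # rounds toward negative infinity
```

The two agree for non-negative values and differ only on negative non-integers, e.g. -1.5 truncates to -1 but floors to -2.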
qazal
5aee78a0a6
fix uop swizzle on BUFFER, new tests ( #7875 )
* fix uop swizzle on BUFFER, new tests
* can have view of view
2024-11-24 17:11:09 +08:00
George Hotz
8c3d3181dd
bottom up rewrite fixes substitute [pr] ( #7862 )
* single pass rewrite fixes substitute [pr]
* caching for single_pass_rewrite
* allow multiple rewrites
* a simple test
* bottom_up_rewrite is fully flexible
2024-11-23 20:53:37 +08:00
George Hotz
144e9f00df
viz is local, new test, and new quantize [pr] ( #7859 )
* viz is local, new test, and new quantize [pr]
* fix mime types
* remove font
* after index
2024-11-23 14:27:10 +08:00
chenyu
c07daf40e7
move attention upcast ( #7830 )
still upcast before softmax, but faster because the intermediate buffer can be stored in half (as long as qk is within half range).
2024-11-22 17:10:51 -05:00
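For context on the half-range caveat above, a stdlib-only sketch of round-tripping a value through IEEE-754 half precision (illustrative of the dtype's limits, not the kernel change itself):

```python
import struct

def to_half(x: float) -> float:
    # round-trip through IEEE-754 binary16 ('e' format): values keep
    # roughly 3 decimal digits of precision, and the largest finite
    # half value is 65504 -- hence "as long as qk is within half range"
    return struct.unpack('e', struct.pack('e', x))[0]
```

Small integers and powers of two survive exactly, while something like 0.1 is rounded to the nearest representable half value.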
chenyu
5c5b1b994c
less flaky benchmarks ( #7855 )
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
3b26e51fce
Tensor.cummax ( #7854 )
generalized the existing cumsum to take Ops.MAX in addition to Ops.ADD
2024-11-22 15:55:02 -05:00
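The generalization described above can be sketched in plain Python over lists (illustrative only; tinygrad expresses it with Ops.ADD / Ops.MAX inside the kernel machinery, not with this helper):

```python
from itertools import accumulate
import operator

# One cumulative-scan helper parameterized by the binary op:
# add gives cumsum, max gives cummax.
def cumreduce(xs, op):
    return list(accumulate(xs, op))

def cumsum(xs):
    return cumreduce(xs, operator.add)

def cummax(xs):
    return cumreduce(xs, max)
```

For example, `cummax([1, 3, 2, 5, 4])` yields `[1, 3, 3, 5, 5]`: each position holds the running maximum so far.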
chenyu
40d7535eeb
clean up DTYPES_DICT [pr] ( #7845 )
2024-11-22 10:01:34 -05:00
qazal
9828277c03
view doesn't have buffer, fix the tests [pr] ( #7841 )
* view doesn't have buffer, fix the tests [pr]
* need assigns
2024-11-22 20:41:55 +08:00
chenyu
69e382216d
fix wino conv output dtype for half inputs ( #7829 )
2024-11-21 12:13:54 -05:00
geohotstan
cf1ec90ad4
add inverse trig functions to Tensor ( #7805 )
* implement inverse trig functions
* guess we should still test nans?
* magnitude as variable name :D
* reorder onnx_ops ops
* approximation -> x for consistency
* address feedback
* simpler acos
* improvement?
* actually just have asin depend on atan
* actually this is nicer
* remove a comment
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-21 09:13:36 -05:00
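The bullets "simpler acos" and "actually just have asin depend on atan" refer to standard identities; a quick check in plain Python (function names are mine, illustrative only):

```python
import math

# Identities behind deriving asin/acos from atan:
#   asin(x) = atan(x / sqrt(1 - x^2))   for |x| < 1
#   acos(x) = pi/2 - asin(x)
def asin_via_atan(x: float) -> float:
    return math.atan(x / math.sqrt(1.0 - x * x))

def acos_via_asin(x: float) -> float:
    return math.pi / 2 - asin_via_atan(x)
```

Both match the library functions to machine precision on the open interval (-1, 1); the endpoints need separate handling since the denominator vanishes.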
qazal
5399ff6d06
add UOp.const_with_shape [pr] ( #7825 )
* add UOp.const_with_shape [pr]
* lines
2024-11-21 21:13:23 +08:00
qazal
e378aeb94e
assert view degrade to const tests post scheduler graph_rewrite [pr] ( #7822 )
* assert view degrade to const tests post scheduler graph_rewrite [pr]
* low pri, probably tricky, todo
2024-11-21 19:00:41 +08:00
qazal
75c082b883
move CONST/BIND -> VALID to matchers ( #7818 )
* delete special const
* move CONST/BIND -> VALID to matchers
* unittests
* fix FUSE_ARANGE=1
* split into two upats
* the right way to access view
2024-11-21 16:07:01 +08:00
George Hotz
e9ae2ccd09
_prg to match _buf [pr] ( #7816 )
2024-11-21 12:44:48 +08:00
George Hotz
c5d458ce02
BufferSpec and ProgramSpec [pr] ( #7814 )
* BufferSpec and ProgramSpec [pr]
* delete preallocate, it's unused
* Revert "delete preallocate, it's unused"
This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
9df5a62c5e
unify to HWQueue [pr] ( #7812 )
* unify to HWCommandQueue [pr]
* all is HWQueue
2024-11-21 10:33:08 +08:00
chenyu
11cea00090
lower vs_theoretical conv tflops threshold for nv ( #7811 )
less flaky
2024-11-20 20:03:49 -05:00
ignaciosica
fc3154a7b3
metal bf16 tc support [pr] ( #7408 )
* add bf16 tc for metal
* hotfix: spacing
* fix tolerance and skip metal bf16 in ci
* hotfix: check for dtype_out
* hotfix: add check for tc.dtype_out is bf16 back
* hotfix: add parens
2024-11-20 14:39:08 -05:00
geohotstan
66a069ee25
add replicate mode to Tensor.pad ( #7802 )
* base implementation
* add tests
* actually remove the assertionerror test
* good
2024-11-20 08:39:58 -05:00
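Replicate (edge) padding repeats the border value instead of filling with a constant; a 1-D plain-Python sketch of the semantics (illustrative only, not tinygrad's implementation):

```python
def pad_replicate(xs, left, right):
    # "replicate" padding repeats the edge elements, e.g.
    # [1, 2, 3] padded (left=2, right=1) -> [1, 1, 1, 2, 3, 3]
    if not xs:
        raise ValueError("cannot replicate-pad an empty sequence")
    return [xs[0]] * left + list(xs) + [xs[-1]] * right
```

This contrasts with constant padding (fill with a value like 0) and reflect padding (mirror the interior without repeating the edge).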
George Hotz
eb0bb7dc0b
final dname to device [pr] ( #7806 )
* final dname to device [pr]
* oops, fix nv
2024-11-20 20:20:28 +08:00
George Hotz
bc977fec53
dname -> device [pr] ( #7804 )
* dname -> device [pr]
* a few more
* only one left
2024-11-20 17:57:14 +08:00
ttomsa
9adeb1041c
fix advanced setitem with 1 in shape ( #7797 )
* fix advanced setitem with 1 in shape
* linter
2024-11-19 20:04:59 -05:00
ttomsa
170ece6605
fix advanced setitem overlap with 0 ( #7793 )
* fix advanced setitem overlap with 0
* fix comment
2024-11-19 16:03:55 -05:00
Gaétan Lepage
159c0bf25e
test_kernel_cache_in_action: fix test ( #7792 )
2024-11-19 13:34:56 -05:00
Eitan Turok
56017c52a0
Raise error when model architecture does not match state dict ( #7772 )
* init
* style
* style
* style
* fix test
2024-11-20 00:11:54 +08:00
George Hotz
d71fe7faa5
rename allocator methods to not conflict [pr] ( #7788 )
* rename allocator methods to not conflict [pr]
* forgot those
* transfer + offset
2024-11-20 00:10:29 +08:00
geohotstan
aeaf574a05
add failure test for setitem bug ( #7786 )
* add failure test
* rename
* improve tests
* improve tests and no need numpy
2024-11-19 08:54:21 -05:00
qazal
1e31b5ba6b
hotfix: ctx doesn't impact process replay [pr] ( #7785 )
2024-11-19 20:17:01 +08:00
chenyu
26200574dc
load_state_dict test cases when model and data shard differently ( #7774 )
current behavior is weird... when the model is sharded and the state_dict is not, load shards the state_dict and the model's shard axis does not change.
but if the model and the state_dict are sharded differently, the model's shard axis becomes the state_dict's axis after load.
it should either always use the model's shard axis or always use the state_dict's shard axis
2024-11-18 16:08:24 -05:00
Francis Lata
a1c1b9547f
Context manager support for tqdm ( #7770 )
* add context manager support
* add test case for context manager usage
2024-11-18 14:12:03 -05:00
geohotstan
8100109c9d
Add replicate mode to Tensor.pad ( #7608 )
* base implementation
* add tests
* actually remove the assertionerror test
* actually only have reflect for this pr
* change the 4 if-else one liner
* maybe use a lambda
* fix
* maybe a lil cleaner
* fix tests
* complete
* small change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-18 10:55:38 -05:00
chenyu
66d7d5af50
fix Tensor(MultiLazyBuffer) with different dtype should fail ( #7757 )
similar to Tensor(LazyBuffer) as we don't cast implicitly
2024-11-17 21:05:45 -05:00
chenyu
df817297b6
fix passing acc_dtype="" to Tensor.prod should fail ( #7750 )
similar to sum
2024-11-17 11:38:13 -05:00
chenyu
55707fd00d
fix passing sum_acc_dtype="" to Tensor.sum should fail ( #7748 )
2024-11-17 10:58:41 -05:00
qazal
99024b922b
to_uop one path for all ops part 1 ( #7745 )
* flat meta ops
* one path for everything
* add tests
* view is always base
* just run
2024-11-17 20:12:44 +08:00
chenyu
a15a900415
fix Tensor.meshgrid for 1D input and check indexing ( #7740 )
2024-11-16 23:39:30 -05:00
geohotstan
72a41095bc
add Tensor.meshgrid ( #7714 )
* initial implementation and test
* some other places that can use meshgrid
* revert the onnx_ops change
* add to docs
* revert interpolate too
* update
* improve edge case test
* might as well test grad
* add to test can improve docs
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-16 23:06:47 -05:00
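For reference, meshgrid semantics for two 1-D inputs, sketched in plain Python (illustrative; this mirrors numpy-style "ij"/"xy" indexing rather than tinygrad's actual code):

```python
def meshgrid(xs, ys, indexing="ij"):
    # with "ij" the first output varies along rows (matrix indexing);
    # "xy" transposes both grids (cartesian indexing)
    if indexing not in ("ij", "xy"):
        raise ValueError(f"unknown indexing {indexing!r}")
    X = [[x for _ in ys] for x in xs]
    Y = [[y for y in ys] for _ in xs]
    if indexing == "xy":
        X = [list(row) for row in zip(*X)]
        Y = [list(row) for row in zip(*Y)]
    return X, Y
```

With "ij", `meshgrid([1, 2], [3, 4, 5])` gives a 2x3 grid where X repeats each x down a row and Y repeats the y vector in every row.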
chenyu
f1efd84c92
fix repeat_interleave with negative dim ( #7734 )
2024-11-16 10:15:29 -05:00
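Negative-dim bugs like this one are typically fixed by normalizing the dim before use; a hedged sketch of that normalization (the helper name is mine, not tinygrad's):

```python
def normalize_dim(dim: int, ndim: int) -> int:
    # negative dims count from the end: -1 is the last axis, -ndim the
    # first; normalize once up front so later code sees 0 <= dim < ndim
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} out of range for {ndim} dims")
    return dim % ndim
```

So for a 3-D tensor, dim=-1 resolves to axis 2 and dim=-3 to axis 0.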
chenyu
e3105675fb
cond.where(True, False) is cond ( #7733 )
2024-11-16 09:44:17 -05:00
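The simplification in the title, sketched as a toy rewrite over tuples (illustrative only; tinygrad performs this on its UOp graph, not on Python tuples):

```python
# Selecting True where cond holds and False elsewhere is the
# identity on cond, so the where node can be rewritten away.
def simplify_where(cond, if_true, if_false):
    if if_true is True and if_false is False:
        return cond
    return ("where", cond, if_true, if_false)
```

Any other branch values leave the where node in place unchanged.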
ignaciosica
597a239e28
Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] ( #7725 )
* remove unaryops
* remove ternaryops
* remove metaops
* hotfix
* remove binaryops
* hotfix: test_pattern_matcher
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-16 20:56:56 +08:00
chenyu
22da31b223
clean up Tensor.dot ( #7728 )
more docs (similar to numpy) and removed many confusing `-min(n2, 2)`
2024-11-15 18:21:15 -05:00
chenyu
4338c450ac
fix max_pool2d for int tensor with padding ( #7726 )
padding with inf messed up the output dtype
2024-11-15 16:22:11 -05:00
chenyu
aeb1301bab
enable a few tests that work now ( #7721 )
should mark the ones that are expected to work with expectedFailure, and delete the ones that are not expected to work
2024-11-15 14:30:52 -05:00
qazal
bddee26114
Ops.VALID cleanup, move recursive tests [pr] ( #7713 )
2024-11-15 20:22:46 +08:00
qazal
703a255301
use the method_cache in test_schedule [pr] ( #7712 )
* use the method_cache in test_schedule [pr]
* need half
2024-11-15 19:20:47 +08:00
qazal
88f760cc32
test_two_sum doesn't need del ( #7711 )
2024-11-15 18:50:08 +08:00
George Hotz
9b1605eef9
Revert "objdump intel syntax ( #7605 )" ( #7707 )
This reverts commit 8f8e375f27.
2024-11-15 12:13:04 +08:00
ttomsa
8f8e375f27
objdump intel syntax ( #7605 )
* objdump intel syntax
* test for objdump intel syntax
* add disassemble to ClangCompiler and LLVMCompiler. Use just llvm-objdump
* linter
2024-11-15 11:32:23 +08:00