qazal
7e8777eee9
faster assign scheduling [pr] ( #7839 )
...
* baseline 87 ms
* 86 ms, only PRELOAD assigns
* refactor to assign_adjacents
* ops_folding
2024-11-22 19:23:59 +08:00
chenyu
6229d87f45
simpler reshape symbolic shape check [pr] ( #7837 )
2024-11-21 22:53:57 -05:00
George Hotz
1d6d842887
move DSP to extra (room for webgpu) [pr] ( #7836 )
2024-11-22 11:32:57 +08:00
chenyu
8ff6cba9f0
simpler swizzle_r new_axis [pr] ( #7835 )
...
new axes are the ones permuted to the end
2024-11-21 22:26:41 -05:00
George Hotz
6fc7013463
put all DSP in dsp file [pr] ( #7833 )
2024-11-22 11:22:59 +08:00
George Hotz
e39af63156
no loop assert in ops_python [pr] ( #7834 )
2024-11-22 11:17:36 +08:00
George Hotz
d18b948f48
ptxcompiler isn't a cudacompiler [pr] ( #7832 )
...
* ptxcompiler isn't a cudacompiler [pr]
* hcq types
2024-11-22 10:57:22 +08:00
mesozoic-egg
855f9a767a
add restype for msg method for type annotation and consistency ( #7828 )
...
* no need to explicitly set objc_id as restype
* add restype for type annotation
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-11-22 09:17:58 +08:00
chenyu
d5c9fafff5
default run stable diffusion benchmark with fp16 ( #7831 )
...
and keep the non-fp16 one in mac
2024-11-21 15:58:17 -05:00
chenyu
69e382216d
fix wino conv output dtype for half inputs ( #7829 )
2024-11-21 12:13:54 -05:00
geohotstan
cf1ec90ad4
add inverse trig functions to Tensor ( #7805 )
...
* implement inverse trig functions
* guess we should still test nans?
* magnitude as variable name :D
* reorder onnx_ops ops
* approximation -> x for consistency
* address feedback
* simpler acos
* improvement?
* actually just have asin depend on atan
* actually this is nicer
* remove a comment
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-21 09:13:36 -05:00
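The bullet "actually just have asin depend on atan" in the entry above suggests a standard identity, asin(x) = atan(x / sqrt(1 - x^2)). The following is a minimal Python sketch of that identity only. The helper name `asin_via_atan` is hypothetical, the use of `math.atan2` to stay finite at x = ±1 is an assumption of this sketch, and none of it is taken from the tinygrad code.

```python
import math

def asin_via_atan(x: float) -> float:
    """Hypothetical helper: asin(x) computed through the arctangent."""
    if not -1.0 <= x <= 1.0:
        return math.nan  # outside the domain of asin
    # atan2(x, sqrt(1 - x^2)) equals atan(x / sqrt(1 - x^2)) for |x| < 1
    # and avoids the division by zero at x = +1 and x = -1.
    return math.atan2(x, math.sqrt(1.0 - x * x))
```

Comparing the result against `math.asin` over a few points in [-1, 1] is a quick way to sanity-check the identity.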
qazal
5399ff6d06
add UOp.const_with_shape [pr] ( #7825 )
...
* add UOp.const_with_shape [pr]
* lines
2024-11-21 21:13:23 +08:00
qazal
2f884b2384
good suggestions from mypy lineprecision-report for schedule.py [pr] ( #7823 )
...
* good suggestions from mypy lineprecision-report [pr]
* ok if metadata doesn't exist
* same for store
* that's buf_uop
2024-11-21 19:59:51 +08:00
qazal
e378aeb94e
assert view degrade to const tests post scheduler graph_rewrite [pr] ( #7822 )
...
* assert view degrade to const tests post scheduler graph_rewrite [pr]
* low pri, probably tricky, todo
2024-11-21 19:00:41 +08:00
qazal
cdc431803f
early mark uops as realized [pr] ( #7821 )
...
* early mark uops as realized [pr]
* merge with metadata
* aesthetics
2024-11-21 18:02:59 +08:00
qazal
4542c0f000
get buffer size just from Ops.BUFFER [pr] ( #7820 )
2024-11-21 17:00:18 +08:00
qazal
877b440fde
derive device (dname) from UOp [pr] ( #7819 )
2024-11-21 16:38:22 +08:00
qazal
75c082b883
move CONST/BIND -> VALID to matchers ( #7818 )
...
* delete special const
* move CONST/BIND -> VALID to matchers
* unittests
* fix FUSE_ARANGE=1
* split into two upats
* the right way to access view
2024-11-21 16:07:01 +08:00
George Hotz
df6f1815ad
remove jit_cache from self in GraphRunner [pr] ( #7817 )
...
* remove jit_cache from self in GraphRunner [pr]
* add back unused
2024-11-21 13:26:37 +08:00
George Hotz
e9ae2ccd09
_prg to match _buf [pr] ( #7816 )
2024-11-21 12:44:48 +08:00
George Hotz
439911b2e6
disable disable_abstract_method [pr] ( #7815 )
2024-11-21 12:28:57 +08:00
George Hotz
c5d458ce02
BufferSpec and ProgramSpec [pr] ( #7814 )
...
* BufferSpec and ProgramSpec [pr]
* delete preallocate, it's unused
* Revert "delete preallocate, it's unused"
This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
490a6130af
more hcq typing [pr] ( #7813 )
...
* more hcq typing [pr]
* minor
* less generic
2024-11-21 11:23:07 +08:00
George Hotz
9df5a62c5e
unify to HWQueue [pr] ( #7812 )
...
* unify to HWCommandQueue [pr]
* all is HWQueue
2024-11-21 10:33:08 +08:00
chenyu
11cea00090
lower vs_theoretical conv tflops threshold for nv ( #7811 )
...
less flaky
2024-11-20 20:03:49 -05:00
chenyu
46aa23539f
generate and print mypy lineprecision report ( #7809 )
2024-11-20 16:53:17 -05:00
chenyu
c815d7b56e
run bfloat16 tensor core in metal benchmark ( #7808 )
...
* run bfloat16 tensor core in metal benchmark
* separate task
2024-11-20 15:34:07 -05:00
chenyu
33a496279b
load_state_dict check v.shape instead of v.lazydata.shape ( #7807 )
2024-11-20 14:39:30 -05:00
ignaciosica
fc3154a7b3
metal bf16 tc support [pr] ( #7408 )
...
* add bf16 tc for metal
* hotfix: spacing
* fix tolerance and skip metal bf16 in ci
* hotfix: check for dtype_out
* hotfix: add check for tc.dtype_out is bf16 back
* hotfix: add parens
2024-11-20 14:39:08 -05:00
geohotstan
66a069ee25
add replicate mode to Tensor.pad ( #7802 )
...
* base implementation
* add tests
* actually remove the assertionerror test
* good
2024-11-20 08:39:58 -05:00
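For readers unfamiliar with the term, "replicate" padding conventionally fills the padded region by repeating the nearest edge value. The sketch below shows that behavior on a plain one-dimensional Python list. The function name `pad_replicate` and its signature are hypothetical and do not reflect the `Tensor.pad` interface.

```python
def pad_replicate(row, left, right):
    """Hypothetical 1-D illustration: repeat the edge values outward."""
    if not row:
        raise ValueError("cannot replicate-pad an empty sequence")
    # The first element fills the left padding, the last fills the right.
    return [row[0]] * left + list(row) + [row[-1]] * right

padded = pad_replicate([1, 2, 3], left=2, right=1)
```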
George Hotz
eb0bb7dc0b
final dname to device [pr] ( #7806 )
...
* final dname to device [pr]
* oops, fix nv
2024-11-20 20:20:28 +08:00
George Hotz
bc977fec53
dname -> device [pr] ( #7804 )
...
* dname -> device [pr]
* a few more
* only one left
2024-11-20 17:57:14 +08:00
George Hotz
0a74acd90e
add proper typing to HCQ [pr] ( #7803 )
...
* add proper typing to HCQ [pr]
* more types
* and qcom
* HCQProgram has device type
* typed allocator
2024-11-20 17:20:39 +08:00
George Hotz
6688539bc9
rename device to dev so Buffer can be Allocator [pr] ( #7799 )
...
* rename device to dev so Buffer can be Allocator [pr]
* missed those
* update the Program classes also
* more renames
* oops
2024-11-20 15:47:26 +08:00
ttomsa
9adeb1041c
fix advanced setitem with 1 in shape ( #7797 )
...
* fix advanced setitem with 1 in shape
* linter
2024-11-19 20:04:59 -05:00
chenyu
d800a79112
use "signed char" for int8 ( #7796 )
...
* use "signed char" for int8
"char" might be unsigned depending on the platform.
fixed `python -m pytest test/test_ops.py::TestOpsUint8::test_interpolate_bilinear` on arm64 linux
* opencl does not have "signed char"
2024-11-19 19:29:54 -05:00
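The entry above turns on the fact that C leaves the signedness of plain `char` to the implementation, so the same byte can read back as a different integer. A small Python sketch of the difference follows, using the standard `struct` format codes `b` (signed char) and `B` (unsigned char). It illustrates the two interpretations only and is not the rendering code the commit changed.

```python
import struct

raw = b"\xff"  # one byte with every bit set

# Interpreted as a signed 8-bit value (what int8 requires).
as_signed = struct.unpack("b", raw)[0]

# Interpreted as an unsigned 8-bit value (what an unsigned plain char yields).
as_unsigned = struct.unpack("B", raw)[0]
```

Here `as_signed` is -1 and `as_unsigned` is 255, which is the kind of mismatch an int8 kernel would see where plain `char` is unsigned.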
chenyu
f16122f9c4
update README so it runs with just tinygrad ( #7795 )
2024-11-19 17:25:12 -05:00
ttomsa
170ece6605
fix advanced setitem overlap with 0 ( #7793 )
...
* fix advanced setitem overlap with 0
* fix comment
2024-11-19 16:03:55 -05:00
Gaétan Lepage
159c0bf25e
test_kernel_cache_in_action: fix test ( #7792 )
2024-11-19 13:34:56 -05:00
George Hotz
913a27ee27
from_buffer on metal was never called [pr] ( #7791 )
2024-11-20 00:35:17 +08:00
Eitan Turok
56017c52a0
Raise error when model architecture does not match state dict ( #7772 )
...
* init
* style
* style
* style
* fix test
2024-11-20 00:11:54 +08:00
George Hotz
d71fe7faa5
rename allocator methods to not conflict [pr] ( #7788 )
...
* rename allocator methods to not conflict [pr]
* forgot those
* transfer + offset
2024-11-20 00:10:29 +08:00
chenyu
d5f76462c8
fix CI beautiful_mnist dir ( #7790 )
...
fixed `fatal: not a git repository (or any of the parent directories): .git` because $HOME is not $GITHUB_WORKSPACE
2024-11-19 09:59:02 -05:00
geohotstan
aeaf574a05
add failure test for setitem bug ( #7786 )
...
* add failure test
* rename
* improve tests
* improve tests; no need for numpy
2024-11-19 08:54:21 -05:00
qazal
1e31b5ba6b
hotfix: ctx doesn't impact process replay [pr] ( #7785 )
2024-11-19 20:17:01 +08:00
qazal
8360bbd88d
faster assign view check [pr] ( #7781 )
2024-11-19 19:42:51 +08:00
George Hotz
3daa376107
remove numpy from assign [pr] ( #7784 )
...
* remove numpy from assign [pr]
* cast not required
2024-11-19 19:34:53 +08:00
George Hotz
fbb4099b3c
add test for compile3 [pr] ( #7783 )
...
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-19 19:26:51 +08:00
qazal
4f6071d919
capture the schedule context in process replay [pr] ( #7782 )
2024-11-19 19:12:00 +08:00
qazal
f493d480e3
metadata appending to graph_rewrite ( #7780 )
2024-11-19 18:05:42 +08:00