qazal
e823de3828
viz with bottom_up=True ( #7894 )
...
* add failing test
* single pass it
* linter
2024-11-25 17:56:48 +08:00
qazal
2ca41d6a44
ops metadata map try 2, early fuse [pr] ( #7893 )
...
* make this return early
* delete that
* ops metadata map try 2, early fuse [pr]
2024-11-25 17:08:38 +08:00
qazal
9295c86ddc
delete base op cast [pr] ( #7891 )
2024-11-25 16:38:32 +08:00
qazal
26784c45c6
delete cast arg 2 [pr] ( #7881 )
2024-11-25 16:15:57 +08:00
George Hotz
9d0038bccb
small changes from block linearizer [pr] ( #7888 )
...
* small changes from block linearizer [pr]
* fix test_gc
2024-11-25 15:27:04 +08:00
mesozoic-egg
9e958f2b10
Ptx simplify [pr] ( #7877 )
...
* simplify render_kernel
* cvar in const
* Revert "simplify render_kernel"
This reverts commit 1c8817bea2.
* CMPNE src match
* src match in cast
* cvar in define_acc
* simplify render_store
* simplify render_kernel
* whitespace
* render_kernel fix fstring
* render newline
* do not embed newline in Ops.WHERE render
* WHERE op fix
* missed a comma
* whitespace
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
2024-11-25 15:01:47 +08:00
nib9888
e9c681c839
fix missing final rewrite in viz ( #7883 )
2024-11-25 14:13:33 +08:00
Sieds Lykles
a49a7c4784
Improved mod folding ( #7887 )
...
* Remove unnecessary if statement
In all paths where something_changed was set to True, remainder is
appended so the list can't be empty
* Working version of improved mod folding
* Fix offset calculation
Passing fuzz_symbolic.py to 130_000 so far
Added an extra test
* Cleaner offset calculation
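As a rough illustration of what mod folding means in general (not the code from this PR), the sketch below reduces every coefficient and the constant of a linear expression modulo `m`, which leaves the value of the whole expression modulo `m` unchanged. The names `fold_mod` and `evaluate` and the dict-based expression format are assumptions made for this example only.

```python
def fold_mod(terms, const, m):
    """Fold (sum(c * var) + const) % m by reducing each coefficient mod m.

    `terms` maps a variable name to its integer coefficient (hypothetical format).
    Terms whose coefficient is a multiple of m vanish entirely.
    """
    folded = {}
    for var, coeff in terms.items():
        reduced = coeff % m
        if reduced:  # drop terms that are multiples of m
            folded[var] = reduced
    return folded, const % m


def evaluate(terms, const, env):
    """Evaluate sum(c * var) + const for concrete variable values."""
    return sum(coeff * env[var] for var, coeff in terms.items()) + const


# (6*x + 5*y + 7) % 4 folds to (2*x + 1*y + 3) % 4
folded_terms, folded_const = fold_mod({"x": 6, "y": 5}, 7, 4)
```

Checking the folded form against the original over a range of inputs, in the spirit of the fuzzing mentioned above, is a cheap way to catch offset mistakes.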
2024-11-24 22:21:34 -05:00
leopf
5d92efb121
[BUGFIX] Tensor([]).data() ( #7884 )
...
* added test, fix
* fix only for (0,) shape
* Revert "fix only for (0,) shape"
* test_data_empty_multi_dim
2024-11-24 16:42:57 -05:00
chenyu
ac57d82a13
test_tiny on real NV/CUDA/AMD/HIP ( #7886 )
...
simple tests that run on real CUDA and HIP
2024-11-24 16:34:54 -05:00
qazal
06a28d83f5
delete extra dtype check in uop const [pr] ( #7880 )
2024-11-25 00:06:52 +08:00
chenyu
31337b49e3
cleanup Embedding call [pr] ( #7869 )
...
reshape on self.weight is a no-op, and numel 0 doesn't need a special case.
2024-11-24 07:32:26 -05:00
geohotstan
ad9df26fba
add test for inconsistent behavior in float to int casting ( #7870 )
...
* found teeny bug
* no healthcheck
* change function name
2024-11-24 07:31:34 -05:00
qazal
6b8a657085
cleanup group_realizes [pr] ( #7878 )
2024-11-24 18:16:46 +08:00
qazal
5aee78a0a6
fix uop swizzle on BUFFER, new tests ( #7875 )
...
* fix uop swizzle on BUFFER, new tests
* can have view of view
2024-11-24 17:11:09 +08:00
George Hotz
5d28a202b5
make tinychat local ( #7871 )
2024-11-24 14:45:48 +08:00
chenyu
22d5def113
download llama3 70B ( #7868 )
...
use "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF".
```
PYTHONPATH=. JITBEAM=2 python3 examples/llama3.py --download_model --size 70B --quantize int8 --benchmark
```
on M4 Max, 40 sec to load the model and
```
enqueue in 165.15 ms
total 328.54 ms, 3.04 tok/s, 247.46 GB/s, param 221.20 GB/s
enqueue in 5.31 ms
total 168.48 ms, 5.94 tok/s, 482.54 GB/s, param 431.34 GB/s
enqueue in 5.32 ms
total 168.77 ms, 5.93 tok/s, 481.71 GB/s, param 430.60 GB/s
enqueue in 5.69 ms
total 169.51 ms, 5.90 tok/s, 479.61 GB/s, param 428.72 GB/s
enqueue in 5.41 ms
total 168.60 ms, 5.93 tok/s, 482.20 GB/s, param 431.04 GB/s
enqueue in 5.18 ms
total 168.98 ms, 5.92 tok/s, 481.12 GB/s, param 430.08 GB/s
enqueue in 5.43 ms
total 168.82 ms, 5.92 tok/s, 481.59 GB/s, param 430.49 GB/s
enqueue in 5.27 ms
total 168.94 ms, 5.92 tok/s, 481.23 GB/s, param 430.17 GB/s
```
2024-11-23 12:18:31 -05:00
qazal
6a8be3ca1e
don't change lazy state in schedule [pr] ( #7867 )
2024-11-24 00:18:50 +08:00
JaSpa99
28e83e662e
least controversial ( #7863 )
2024-11-23 21:23:30 +08:00
George Hotz
8c3d3181dd
bottom up rewrite fixes substitute [pr] ( #7862 )
...
* single pass rewrite fixes substitute [pr]
* caching for single_pass_rewrite
* allow multiple rewrites
* a simple test
* bottom_up_rewrite is fully flexible
2024-11-23 20:53:37 +08:00
mesozoic-egg
54d8f75d0c
vectorized define_acc does not seem to get used ( #7858 )
...
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
2024-11-23 19:46:34 +08:00
qazal
40be9177ba
move swizzle upats to ops, prereq for swizzle tc [pr] ( #7861 )
2024-11-23 18:34:45 +08:00
qazal
27a6cd7822
cleanup swizzle upats [pr] ( #7860 )
...
* cleanup swizzle upats [pr]
* match the rest
2024-11-23 15:19:06 +08:00
qazal
5b2c03e865
defer realize folding to kernel splitting [pr] ( #7849 )
...
* defer realize folding to schedule breaking [pr]
* this is init
* p2
* need to lookup edges
* refactor image cast folding [pr]
* Ops.LOAD diff
* image works
* refactor can_pad
* fix fold_img_cast
2024-11-23 14:29:14 +08:00
George Hotz
144e9f00df
viz is local, new test, and new quantize [pr] ( #7859 )
...
* viz is local, new test, and new quantize [pr]
* fix mime types
* remove font
* after index
2024-11-23 14:27:10 +08:00
qazal
d43613e113
refactor image cast folding [pr] ( #7852 )
...
* refactor image cast folding [pr]
* Ops.LOAD diff
2024-11-23 13:59:21 +08:00
chenyu
c07daf40e7
move attention upcast ( #7830 )
...
still upcast before softmax, but faster because intermediate buffer can be stored in half (as long as qk is within half range).
2024-11-22 17:10:51 -05:00
chenyu
5c5b1b994c
less flaky benchmarks ( #7855 )
...
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
3b26e51fce
Tensor.cummax ( #7854 )
...
generalized the existing cumsum to take Ops.MAX in addition to Ops.ADD
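The idea of one cumulative routine parameterized by the reduction op can be sketched in plain Python. This is only an illustration using the standard library, not tinygrad's implementation; `cumulative` is a hypothetical helper name.

```python
import operator
from itertools import accumulate


def cumulative(values, op):
    """Running reduction of `values` under the binary operation `op`."""
    return list(accumulate(values, op))


data = [1, 3, 2, 5, 4]
cumsum = cumulative(data, operator.add)  # running sum
cummax = cumulative(data, max)           # running maximum
```

The only difference between the two results is the binary op passed in, which is what makes a shared code path possible.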
2024-11-22 15:55:02 -05:00
ignaciosica
fb10ea563e
typedef bf16 amd ( #7850 )
2024-11-22 14:29:01 -05:00
chenyu
a352a6938f
simplify group_for_reduces in get_index [pr] ( #7851 )
...
what was that
2024-11-22 11:53:21 -05:00
chenyu
af5d77f684
move sint_to_uop from view.py to ops.py [pr] ( #7848 )
...
both sint and uop are in ops.py
2024-11-22 11:15:02 -05:00
chenyu
f6d1201c48
variable_to_uop -> sint_to_uop [pr] ( #7847 )
...
and added type to it
2024-11-22 10:54:59 -05:00
chenyu
40d7535eeb
clean up DTYPES_DICT [pr] ( #7845 )
2024-11-22 10:01:34 -05:00
chenyu
4453ab51e1
use ceildiv in View.stride [pr] ( #7844 )
2024-11-22 08:38:05 -05:00
qazal
9828277c03
view doesn't have buffer, fix the tests [pr] ( #7841 )
...
* view doesn't have buffer, fix the tests [pr]
* need assigns
2024-11-22 20:41:55 +08:00
qazal
7e8777eee9
faster assign scheduling [pr] ( #7839 )
...
* baseline 87 ms
* 86 ms, only PRELOAD assigns
* refactor to assign_adjacents
* ops_folding
2024-11-22 19:23:59 +08:00
chenyu
6229d87f45
simpler reshape symbolic shape check [pr] ( #7837 )
2024-11-21 22:53:57 -05:00
George Hotz
1d6d842887
move DSP to extra (room for webgpu) [pr] ( #7836 )
2024-11-22 11:32:57 +08:00
chenyu
8ff6cba9f0
simpler swizzle_r new_axis [pr] ( #7835 )
...
the new axes are the ones permuted to the end
2024-11-21 22:26:41 -05:00
George Hotz
6fc7013463
put all DSP in dsp file [pr] ( #7833 )
2024-11-22 11:22:59 +08:00
George Hotz
e39af63156
no loop assert in ops_python [pr] ( #7834 )
2024-11-22 11:17:36 +08:00
George Hotz
d18b948f48
ptxcompiler isn't a cudacompiler [pr] ( #7832 )
...
* ptxcompiler isn't a cudacompiler [pr]
* hcq types
2024-11-22 10:57:22 +08:00
mesozoic-egg
855f9a767a
add restype for msg method for type annotation and consistency ( #7828 )
...
* no need to explicitly set objc_id as restype
* add restype for type annotation
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-11-22 09:17:58 +08:00
chenyu
d5c9fafff5
default run stable diffusion benchmark with fp16 ( #7831 )
...
and keep the non-fp16 one in mac
2024-11-21 15:58:17 -05:00
chenyu
69e382216d
fix wino conv output dtype for half inputs ( #7829 )
2024-11-21 12:13:54 -05:00
geohotstan
cf1ec90ad4
add inverse trig functions to Tensor ( #7805 )
...
* implement inverse trig functions
* guess we should still test nans?
* magnitude as variable name :D
* reorder onnx_ops ops
* approximation -> x for consistency
* address feedback
* simpler acos
* improvement?
* actually just have asin depend on atan
* actually this is nicer
* remove a comment
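For readers unfamiliar with the identity behind "have asin depend on atan", here is a minimal scalar sketch using Python's `math` module. It assumes the standard identities asin(x) = atan(x / sqrt(1 - x^2)) and acos(x) = pi/2 - asin(x); the function names are hypothetical, and the Tensor version in the PR may use a different formulation or approximation.

```python
import math


def asin_via_atan(x):
    """asin expressed through atan; returns NaN outside [-1, 1]."""
    if abs(x) > 1:
        return math.nan
    if abs(x) == 1:
        # the identity divides by zero at the endpoints, so handle them directly
        return math.copysign(math.pi / 2, x)
    return math.atan(x / math.sqrt(1 - x * x))


def acos_via_asin(x):
    """acos derived from asin via acos(x) = pi/2 - asin(x)."""
    return math.pi / 2 - asin_via_atan(x)
```

Inputs outside [-1, 1] yield NaN in this sketch, which is the case the "still test nans" bullet refers to in spirit.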
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-11-21 09:13:36 -05:00
qazal
5399ff6d06
add UOp.const_with_shape [pr] ( #7825 )
...
* add UOp.const_with_shape [pr]
* lines
2024-11-21 21:13:23 +08:00
qazal
2f884b2384
good suggestions from mypy lineprecision-report for schedule.py [pr] ( #7823 )
...
* good suggestions from mypy lineprecision-report [pr]
* ok if metadata doesn't exist
* same for store
* that's buf_uop
2024-11-21 19:59:51 +08:00
qazal
e378aeb94e
assert view degrade to const tests post scheduler graph_rewrite [pr] ( #7822 )
...
* assert view degrade to const tests post scheduler graph_rewrite [pr]
* low pri, probably tricky, todo
2024-11-21 19:00:41 +08:00