qazal
23084fd850
merge merge_views and remove_movement_ops [pr] (#9333)
* merge merge_views and remove_movement_ops [pr]
* fix that assert
2025-03-03 12:38:59 +01:00
qazal
cdf66cc67f
test: recompute expanded CAST (#9286)
* those views should merge
* diff cleanup
* gpu
* put it behind CAST_AFTER_EXPAND
2025-02-27 19:22:17 +01:00
qazal
e162aa862d
is_realized only if buffer is allocated (#9253)
* is_realized only if the buffer is allocated
* fix the image check too
* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
George Hotz
3f4eb9006a
test for device mismatch [pr] (#9250)
* test for device mismatch [pr]
* fix bert
2025-02-26 13:06:33 +08:00
qazal
cbfe95d306
bring cast before view back (#9230)
* bring cast before view back
* tune it to only trigger on expands
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 01:50:39 +02:00
George Hotz
c9493e41a6
reorder expand (#9051)
* reorder expand
* symbolic ops needs resolve here
* s/arg/st + whitespace
* viz
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-02-24 13:55:47 +01:00
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
qazal
2eab8021fb
remove inputs+outputs attributes from ScheduleItem [pr] (#9192)
* remove inputs/outputs from ScheduleItem
* fix test_linearizer
* fix test_conv_shapetracker
* fix test_schedule + lint
* test_image_dtype + multitensor + search
2025-02-21 13:48:11 +01:00
chenyu
2e7c2780a9
CLANG -> CPU (#9189)
2025-02-20 18:03:09 -05:00
George Hotz
1bf66d62cf
symbolic gets its own file [pr] (#9132)
2025-02-17 18:55:21 +08:00
qazal
2b9ce1235a
simple failing case for reorder expand + keep views in tensor_map [pr] (#9057)
2025-02-13 11:22:55 +01:00
Ahmed Harmouche
916d5e7f08
WebGPU f16 support (f16 bounty part 2) (#8653)
* WebGPU f16 support
* Don't enable f16 yet
* dtype tests passing after bitcast fix
* Maybe all WebGPU green?
* Require shader-f16 in examples
* Minor wgsl touchup
* 1 line shorter
* Simpler
* Add transcendental support
* log2 nan location mismatch on Vulkan
* Nan skips
2025-02-12 19:46:53 +08:00
qazal
cd77e51810
fix tensor realization bug in #8975 (#8984)
* fix tensor realization bug in #8975
* that's a reshape now
* work
* works
* give those tests better names
* test when multiple mops result in the same ShapeTracker
* test_become_existing_buf_complex is enough
* that too
2025-02-10 13:51:30 +01:00
qazal
fd9f9ec772
realized base tensors become RESHAPE(BUFFER) [pr] (#8994)
2025-02-10 10:17:54 +01:00
qazal
7eba5fb413
Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)
* eh
* add test_empty_buf
* can we unsupport this
* linter
* Revert "can we unsupport this"
This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
qazal
55351ebb31
minimal failing test for #8975 [pr] (#8982)
2025-02-09 14:10:37 +01:00
chenyu
cfd28517df
move pow folding tests to test_schedule [pr] (#8955)
it doesn't really belong in test_const_folding
2025-02-07 12:51:43 -05:00
chenyu
488200f16c
move more pow const to rewrite (#8916)
* move more pow const to rewrite
one less use of _to_const_val
* fix
2025-02-05 20:30:12 -05:00
qazal
af4f9d1aa9
use matchers to verify AST shape [pr] (#8828)
* use matchers to verify kernel AST [pr]
* work
* use swizzle_cnt
* add comment
* imports
* modified_ast comment
* brief
2025-01-31 09:17:42 +02:00
George Hotz
643c09a6c6
tensor uop spec should be in spec.py [pr] (#8827)
* tensor uop spec should be in spec.py [pr]
* err, spec.py
* print uops can stay
2025-01-31 13:54:04 +08:00
qazal
a78f0f85d3
remove support for checking tensor uops in FUSE_ARANGE [pr] (#8829)
2025-01-31 07:48:28 +02:00
qazal
1fce864a6d
delete multi output support (#8822)
* delete multioutput for now
* test_schedule
* test_assign too
* linter
* 515 for sd
* update tests and ctx
* update that assign check
2025-01-30 22:45:50 -05:00
qazal
530961f7d5
realized only exists on base (#8815)
* realized only exists on base [pr]
* shorter
* update that too
2025-01-30 23:02:25 +02:00
qazal
5643429c17
give BUFFER UOp a ShapeTracker [pr] (#8811)
* give BUFFER UOp a ShapeTracker [pr]
* move that
* update contiguous
* test_advancedindex should use movement ops
2025-01-30 22:33:32 +02:00
qazal
ba17786068
do not construct unmasked VALID (#8759)
* new lines that exist in codegen/ops
* update tests
* update sops.gz (13071 -> 13070 asts)
* fix viz too
* remove that TODO
* diff pruning
* mask assert + device
* work
* diff pruning
* re: fix viz too
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 20:51:21 +02:00
qazal
3417bc1814
fix ShapeTracker spec for const [pr] (#8791)
2025-01-28 19:53:36 +02:00
George Hotz
96bff0b4f7
contiguous is no longer needed in SGD [pr] (#8760)
* contiguous is no longer needed in SGD [pr]
* add allow condition
2025-01-27 15:19:11 +09:00
qazal
ac70f63d4b
tensor_map cleanups [pr] (#8754)
* tensor_map cleanups [pr]
* update test_schedule too
2025-01-26 11:41:54 +02:00
George Hotz
b4bf6a7dea
switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
2025-01-26 09:12:16 +09:00
qazal
8e5bd0cd7a
fix buffer init and skip test_swizzle_failure_permute [pr] (#8732)
* fix buffer init and skip test_swizzle_failure_permute [pr]
* replace preload with just load
* add
2025-01-23 17:21:38 +02:00
qazal
07ec99001a
keep VIEW in big_sink + copy of buffer view spec [pr] (#8727)
* keep views in sink [pr]
* tests
* things from the gpt2 bug
2025-01-23 11:29:30 +02:00
qazal
e3d1464ba4
move assign preload out of schedule item [pr] (#8710)
* move assign preload out of schedule item [pr]
* fix that
2025-01-22 12:43:57 +02:00
qazal
d6bf1feaab
remove the "no copy" line from copy_to_device (#8702)
* delete the no copy one
* add tests
2025-01-21 17:09:33 +02:00
qazal
f0d424ecdf
Tensor UOps can become a buffer or const after scheduling (#8698)
* spec
* work
* update test_viewed_consts_do_not_realize
* remove
2025-01-21 12:33:19 +02:00
qazal
e2008c98c3
allow symbolic shape in tensor const parents [pr] (#8699)
2025-01-21 12:01:25 +02:00
qazal
66ac0087e8
more high level contiguous tests + scheduler deletions [pr] (#8695)
* delete those
* move the upat too
* rename ops_folding to just sym
* keep that
2025-01-21 01:52:58 +02:00
qazal
08eb1f1f56
simplify tensors before scheduling [pr] (#8580)
* delete forced_realize
* put that back
* work
* remove forced_realize
* expectedFailures
* contiguous(buffer)
* multi
* expectedFailures
* cleaner create_subbuffer
* more comments
* remove that
* note
* realizes
* work
* one upat and image is back
* remove
* cleaner
* fix test_complex_backward for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-01-20 23:42:42 +02:00
chenyu
679b1ad058
move softmax upcast to after subtracting max (#8684)
* move softmax upcast to after subtracting max
max can always be done in the same dtype without any numerical loss, so this is better when explicitly upcasting in softmax
* skipUnless half
2025-01-20 12:16:32 -05:00
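The ordering described in the commit body above can be illustrated with a minimal NumPy sketch. This is not tinygrad's `Tensor.softmax`. The function name `softmax_upcast_after_max` and its `acc_dtype` parameter are hypothetical. The sketch only shows the order of operations: the max is taken in the input dtype, the subtraction also stays in the input dtype, and the upcast happens just before `exp` and the sum.

```python
import numpy as np

def softmax_upcast_after_max(x: np.ndarray, acc_dtype=np.float32) -> np.ndarray:
    # The max only selects an existing element, so computing it in the
    # input dtype (e.g. float16) loses no precision.
    m = x.max(axis=-1, keepdims=True)
    # The subtraction stays in the input dtype. The explicit upcast happens
    # only afterwards, so exp and the normalizing sum run in acc_dtype.
    shifted = (x - m).astype(acc_dtype)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

x = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]], dtype=np.float16)
print(softmax_upcast_after_max(x))
```

Upcasting after the max (rather than before it) means the reduction never has to run in the wider dtype.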
qazal
9e55495b4d
fold double contiguous [pr] (#8687)
2025-01-20 14:38:33 +02:00
qazal
ed63ff2372
Remove contiguous on buffer (#8676)
* remove contiguous on buffer
* spec
* make things that can't be images not images
2025-01-20 13:48:33 +02:00
George Hotz
168c16646a
change create_schedule_with_vars api to big_sink [pr] (#8677)
2025-01-19 13:30:26 -08:00
chenyu
beba490ba8
update mask in scaled_dot_product_attention (#8674)
built the is_causal mask with ones_like, starting from a boolean, and reversed the mask -inf order
2025-01-19 15:19:23 -05:00
chenyu
5842ee56c6
raise if attn_mask is set when is_causal=True in sdpa [pr] (#8675)
matches torch, also fixed incorrect usage in tests
2025-01-19 12:55:04 -05:00
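The two scaled_dot_product_attention commits above (#8674 and #8675) can be sketched in plain NumPy, assuming PyTorch's convention that a boolean mask entry of True means "may attend". The `sdpa` helper here is hypothetical and is not tinygrad's `Tensor.scaled_dot_product_attention`. It shows three behaviours:

- the causal mask is built as a boolean `ones_like` array cut to its lower triangle;
- the boolean mask is converted to an additive 0/-inf mask;
- an explicit `attn_mask` combined with `is_causal=True` raises an error.

```python
import numpy as np

def sdpa(q, k, v, attn_mask=None, is_causal=False):
    # Like torch: an explicit mask together with is_causal=True is an error.
    if is_causal and attn_mask is not None:
        raise ValueError("attn_mask must be None when is_causal=True")
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if is_causal:
        # Start from a boolean ones_like mask and keep the lower triangle:
        # True means query i may attend to key j (j <= i).
        attn_mask = np.tril(np.ones_like(scores, dtype=bool))
    if attn_mask is not None:
        if attn_mask.dtype == np.bool_:
            # Boolean -> additive mask: 0 where attending, -inf where masked.
            attn_mask = np.where(attn_mask, 0.0, -np.inf)
        scores = scores + attn_mask
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = sdpa(q, k, v, is_causal=True)
```

With a causal mask, the first query can only see the first key, so the first output row equals `v[0]`.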
qazal
2faf8774fe
replace DEVICE of CONST after copy folding (#8673)
2025-01-19 11:33:39 -05:00
qazal
d957a4f108
add tests for div buffer collapsing in the scheduler [pr] (#8671)
* add tests for mul/div buffer collapsing in the scheduler [pr]
* lint
* merge with test_linearizer's version of this
* 4*3
2025-01-18 14:15:29 -05:00
qazal
2b7db9b45d
delete unused cast/bitcast lines from ops.py [pr] (#8651)
* move cast and bitcast out
* more deletion of bitcast arg
* fix test_bitcast_fuses
* update tests
* work
2025-01-17 03:04:18 -05:00
qazal
81a84aa85a
remove is_unrealized_unmasked_const [pr] (#8644)
2025-01-16 05:27:47 -05:00
qazal
a1f70ce7d0
only use BUFFER_VIEW in disk [pr] (#8629)
* only use BUFFER_VIEW in disk [pr]
* delete can_view
* BUFFER_VIEW op on DISK
* remove that allow_buffer_view=False
* notes
* bitcast is a low-level op too
* this passes on AMD and LLVM
2025-01-15 12:34:15 -05:00
George Hotz
504ad08e73
hotfix: add test_example_matmul_same
2025-01-14 19:03:17 -08:00
George Hotz
bfbe81df71
remove cast before view (#8613)
* remove cast before view
* greener
* indexing
* that passes too
* openpilot too
* ack
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00