qazal
3417bc1814
fix ShapeTracker spec for const [pr] ( #8791 )
2025-01-28 19:53:36 +02:00
George Hotz
96bff0b4f7
contiguous is no longer needed in SGD [pr] ( #8760 )
* contiguous is no longer needed in SGD [pr]
* add allow condition
2025-01-27 15:19:11 +09:00
qazal
ac70f63d4b
tensor_map cleanups [pr] ( #8754 )
* tensor_map cleanups [pr]
* update test_schedule too
2025-01-26 11:41:54 +02:00
George Hotz
b4bf6a7dea
switch backward to use gradient [pr] ( #8235 )
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
2025-01-26 09:12:16 +09:00
qazal
8e5bd0cd7a
fix buffer init and skip test_swizzle_failure_permute [pr] ( #8732 )
* fix buffer init and skip test_swizzle_failure_permute [pr]
* replace preload with just load
* add
2025-01-23 17:21:38 +02:00
qazal
07ec99001a
keep VIEW in big_sink + copy of buffer view spec [pr] ( #8727 )
* keep views in sink [pr]
* tests
* things from the gpt2 bug
2025-01-23 11:29:30 +02:00
qazal
e3d1464ba4
move assign preload out of schedule item [pr] ( #8710 )
* move assign preload out of schedule item [pr]
* fix that
2025-01-22 12:43:57 +02:00
qazal
d6bf1feaab
remove the "no copy" line from copy_to_device ( #8702 )
* delete the no copy one
* add tests
2025-01-21 17:09:33 +02:00
qazal
f0d424ecdf
Tensor UOps can become a buffer or const after scheduling ( #8698 )
* spec
* work
* update test_viewed_consts_do_not_realize
* remove
2025-01-21 12:33:19 +02:00
qazal
e2008c98c3
allow symbolic shape in tensor const parents [pr] ( #8699 )
2025-01-21 12:01:25 +02:00
qazal
66ac0087e8
more high level contiguous tests + scheduler deletions [pr] ( #8695 )
* delete those
* move the upat too
* rename ops_folding to just sym
* keep that
2025-01-21 01:52:58 +02:00
qazal
08eb1f1f56
simplify tensors before scheduling [pr] ( #8580 )
* delete forced_realize
* put that back
* work
* remove forced_realize
* expectedFailures
* contiguous(buffer)
* multi
* expectedFailures
* cleaner create_subbuffer
* more comments
* remove that
* note
* realizes
* work
* one upat and image is back
* remove
* cleaner
* fix test_complex_backward for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-01-20 23:42:42 +02:00
chenyu
679b1ad058
move softmax upcast to after subtracting max ( #8684 )
* move softmax upcast to after subtracting max
the max can always be computed in the same dtype without any numerical loss, so when explicitly upcasting in softmax it is better to subtract it first and upcast only afterwards
* skipUnless half
2025-01-20 12:16:32 -05:00
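The ordering described in this commit can be sketched in plain NumPy (a hypothetical sketch with a made-up function name, not tinygrad's actual implementation):

```python
import numpy as np

def softmax_upcast_after_max(x: np.ndarray) -> np.ndarray:
    # subtract the row max in the original dtype (e.g. float16);
    # this is exact, so no precision is lost before the upcast
    shifted = x - x.max(axis=-1, keepdims=True)
    # upcast only afterwards, where exp and the sum actually need precision
    e = np.exp(shifted.astype(np.float32))
    return e / e.sum(axis=-1, keepdims=True)
```

Subtracting the max first keeps that intermediate in the cheaper dtype while the exp and normalization still run in float32.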
qazal
9e55495b4d
fold double contiguous [pr] ( #8687 )
2025-01-20 14:38:33 +02:00
qazal
ed63ff2372
Remove contiguous on buffer ( #8676 )
* remove contiguous on buffer
* spec
* make things that can't be images not images
2025-01-20 13:48:33 +02:00
George Hotz
168c16646a
change create_schedule_with_vars api to big_sink [pr] ( #8677 )
2025-01-19 13:30:26 -08:00
chenyu
beba490ba8
update mask in scaled_dot_product_attention ( #8674 )
built the is_causal mask with ones_like, starting from boolean, and reversed the mask/-inf order
2025-01-19 15:19:23 -05:00
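A rough sketch of the masking order described above, in plain NumPy rather than tinygrad Tensors (hypothetical helper name):

```python
import numpy as np

def apply_causal_mask(scores: np.ndarray) -> np.ndarray:
    T = scores.shape[-1]
    # start from a boolean mask of ones: True where attention is allowed
    allowed = np.tril(np.ones((T, T), dtype=bool))
    # keep the allowed scores, set the rest to -inf before the softmax
    return np.where(allowed, scores, -np.inf)
```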
chenyu
5842ee56c6
raise if attn_mask is set when is_causal=True in sdpa [pr] ( #8675 )
matches torch; also fixed incorrect usage in tests
2025-01-19 12:55:04 -05:00
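The guard amounts to something like the following (hypothetical sketch; the real scaled_dot_product_attention signature has more parameters):

```python
def check_sdpa_args(attn_mask=None, is_causal=False):
    # an explicit attn_mask and is_causal=True are mutually exclusive,
    # matching torch's behavior as described in the commit above
    if is_causal and attn_mask is not None:
        raise RuntimeError("attn_mask must be None when is_causal=True")
```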
qazal
2faf8774fe
replace DEVICE of CONST after copy folding ( #8673 )
2025-01-19 11:33:39 -05:00
qazal
d957a4f108
add tests for div buffer collapsing in the scheduler [pr] ( #8671 )
* add tests for mul/div buffer collapsing in the scheduler [pr]
* lint
* merge with test_linearizer's version of this
* 4*3
2025-01-18 14:15:29 -05:00
qazal
2b7db9b45d
delete unused cast/bitcast lines from ops.py [pr] ( #8651 )
* move cast and bitcast out
* more deletion of bitcast arg
* fix test_bitcast_fuses
* update tests
* work
2025-01-17 03:04:18 -05:00
qazal
81a84aa85a
remove is_unrealized_unmasked_const [pr] ( #8644 )
2025-01-16 05:27:47 -05:00
qazal
a1f70ce7d0
only use BUFFER_VIEW in disk [pr] ( #8629 )
* only use BUFFER_VIEW in disk [pr]
* delete can_view
* BUFFER_VIEW op on DISK
* remove that allow_buffer_view=False
* notes
* bitcast is a low-level op too
* this passes on AMD and LLVM
2025-01-15 12:34:15 -05:00
George Hotz
504ad08e73
hotfix: add test_example_matmul_same
2025-01-14 19:03:17 -08:00
George Hotz
bfbe81df71
remove cast before view ( #8613 )
* remove cast before view
* greener
* indexing
* that passes too
* openpilot too
* ack
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-01-14 15:04:58 -05:00
qazal
ae2229d727
assert kernel buffer limit at compile time [pr] ( #8595 )
* remove the BUF_LIMIT assert
* skip the base one
2025-01-13 16:32:07 -05:00
qazal
7562cc0399
better test for reduce swizzle + don't use double dtype [pr] ( #8586 )
* better test_permute_rewrite
* use float32
2025-01-13 05:02:21 -05:00
qazal
cff1ee9038
add SINK folding from the tensor_map branch [pr] ( #8562 )
* delete is_constant from the scheduler
* add sink folding
* always give BUFFER uops Buffers [pr]
* spec for view, var (bind) and const
* add test_buffer_only_after_realize
* work
* 3 lines
* more work
2025-01-12 03:39:34 -05:00
qazal
87cbff3ac0
always give BUFFER uops Buffers [pr] ( #8572 )
* always give BUFFER uops Buffers [pr]
* add test_buffer_only_after_realize
2025-01-11 23:17:09 +02:00
chenyu
d09897c2aa
allow double copy [pr] ( #8559 )
fixed the ring allreduce pattern and recovered most of the bert step time regression (10% faster); will double-check all benchmarks
2025-01-10 18:21:01 -05:00
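For context, the ring allreduce pattern mentioned above moves data in two phases, reduce-scatter then all-gather, with each device copying only to its ring neighbor; a pure-Python simulation (a hypothetical sketch, not tinygrad's multi.py):

```python
def ring_allreduce(per_device):
    # per_device: n lists of n chunks; every device should end with the sum
    n = len(per_device)
    bufs = [list(v) for v in per_device]
    # reduce-scatter: after n-1 steps each device owns one fully summed chunk
    for step in range(n - 1):
        sends = [(d, (d - step) % n, bufs[d][(d - step) % n]) for d in range(n)]
        for d, c, val in sends:
            bufs[(d + 1) % n][c] += val
    # all-gather: circulate the summed chunks around the ring
    for step in range(n - 1):
        sends = [(d, (d + 1 - step) % n, bufs[d][(d + 1 - step) % n]) for d in range(n)]
        for d, c, val in sends:
            bufs[(d + 1) % n][c] = val
    return bufs
```

Every transfer in this pattern is a device-to-device copy, which is why allowing a double copy matters for it.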
qazal
2fd068ffc0
delete empty op ( #8544 )
* simple delete EMPTY op
* there's no schedule for empty
2025-01-09 14:10:15 -05:00
qazal
f6eb0574f2
start tests for putting the tensor graph in a single kernel [pr] ( #8542 )
* start tests for putting the tensor graph in a single kernel [pr]
* parallel actually
* better view_left test
* test a softmax
* put all that in sym
2025-01-09 13:33:21 -05:00
qazal
947de23cac
add VIEW(DEVICE) to tensor variable [pr] ( #8529 )
* add VIEW(DEVICE) to tensor variable [pr]
* bind 2
* restrict shapetracker
* move var and bind closer
* one less line
2025-01-08 01:39:42 -05:00
qazal
b22494b710
restrict tensor const ShapeTracker in spec [pr] ( #8447 )
* restrict tensor const ShapeTracker in spec [pr]
* pass sink srcs
* reject if any of the specs disagree
* deceive mypy
* viz
* default to float
* just check the view
* create_schedule is gone
* test_verify_arg is flaky
2025-01-07 19:05:11 -05:00
qazal
0e97f807e0
test fixup prereqs for delete_buffer_view [pr] ( #8523 )
2025-01-07 11:52:18 +02:00
qazal
ed618a72e7
do not use subbuffer for bitcast ( #8514 )
* do not use subbuffer for bitcast
* edit that test
* explicit test for ptx
* ptx
2025-01-06 18:40:46 +02:00
qazal
ed121d235c
spec for CAST_BEFORE_VIEW=1 [pr] ( #8512 )
2025-01-06 10:43:58 +02:00
qazal
eb7df92136
dedup COPY UOp [pr] ( #8506 )
2025-01-06 10:37:20 +02:00
qazal
bd4d7dc4eb
return becomes_map from the scheduler ( #8483 )
* return becomes_map from the scheduler
* fix test_schedule
* fix abstractions2
* s/becomes/becomes_map
2025-01-03 22:47:21 +08:00
qazal
0d33391038
delete unused allow_buffer_view=True arg from bitcast [pr] ( #8462 )
2025-01-03 22:20:46 +08:00
qazal
08c9d980dc
use const_like in uop zero folding [pr] ( #8470 )
2025-01-03 01:05:09 +08:00
qazal
f2bee34197
tests for symbolic_simple failing tensor const spec [pr] ( #8469 )
* tests for symbolic_simple failing tensor const spec [pr]
* mul is correct
2025-01-02 19:13:16 +08:00
qazal
c7ec0ab674
delete unused View lt support (2) ( #8451 )
* delete lt on view (2)
* the scheduler uses symbolic_simple
2024-12-31 07:01:25 +08:00
qazal
866dfa1f23
create_schedule([x.lazydata]) -> x.schedule() in tests ( #8449 )
2024-12-31 03:15:52 +08:00
qazal
7499139239
scheduler renames from the buffer_shape branch [pr] ( #8444 )
* scheduler refactors and renames from the buffer_shape branch [pr]
* all unmasked sts are allowed here
* only renames
2024-12-30 16:33:38 +08:00
George Hotz
b71c51191b
tests from remove uop mutability [pr] ( #8442 )
* tests from remove uop mutability [pr]
* more test fix
* simpler test fix
* remove that
2024-12-29 12:14:10 -05:00
qazal
34987a03af
const copy folding spec + multi.py behavior [pr] ( #8436 )
* const copy folding spec + multi behavior [pr]
* copy from clang, move multi test
2024-12-29 23:12:13 +08:00
qazal
a44cd1e6f7
add collapse_view to the scheduler [pr] ( #8440 )
2024-12-29 21:30:29 +08:00
qazal
b5820a5209
deletions from an ops.py "instant rule" audit [pr] ( #8424 )
* UOp.st cleanup 2 [pr]
* deletions from an ops.py instant rule audit [pr]
* note
2024-12-27 00:49:04 +08:00
qazal
9defbc7d54
add symbolic_simple to the scheduler [pr] ( #8419 )
2024-12-26 20:05:08 +08:00