chenyu
5358b0904b
update uop_given_valid if a node becomes const (#9604)
* update uop_given_valid if a node becomes const
* cleanup
2025-03-27 14:57:46 -04:00
qazal
bf94924d5a
fix viz with nested graph_rewrite (#9595)
2025-03-27 13:14:28 +08:00
qazal
e5ff7b23d7
refactor to @track_matches + add failing test_nested_rewrite (#9592)
* test_nested_rewrite
* refactor to track_matches
* positional arg
2025-03-27 11:11:56 +08:00
George Hotz
3c5161b4cb
add validation of the bounds of Ops.INDEX (#9503)
* add validation of the bounds of Ops.INDEX
* do mask properly
* more validation
* correct
* fix gated
* add CAST support to vmin/vmax
* fix ptx and image
* ptx no diff
* upat.index also stays
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2025-03-20 12:15:55 +08:00
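Validating the bounds of an index is essentially interval reasoning: if the smallest and largest values an index expression can take (what UOps expose as vmin/vmax) both fall inside [0, size), every access through it is in bounds. A minimal sketch of that idea, assuming a toy Interval class that only supports addition and scaling by a constant (hypothetical, not tinygrad's implementation, and ignoring gates/masks, where the check would only need to hold when the gate is true):

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Interval:
    """Hypothetical [lo, hi] range of an integer index expression."""
    lo: int
    hi: int

    def __add__(self, other: Union["Interval", int]) -> "Interval":
        o = other if isinstance(other, Interval) else Interval(other, other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __mul__(self, k: int) -> "Interval":
        # scaling by a negative constant swaps the endpoints
        a, b = self.lo * k, self.hi * k
        return Interval(min(a, b), max(a, b))

def index_in_bounds(idx: Interval, size: int) -> bool:
    # every value idx can take must lie in [0, size)
    return 0 <= idx.lo and idx.hi < size

gidx = Interval(0, 15)   # e.g. an index over 16 iterations
idx = gidx * 4 + 3       # Interval(lo=3, hi=63)
print(index_in_bounds(idx, 64), index_in_bounds(idx, 63))  # True False
```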
qazal
0b20f91ce7
remove move_mask from the devectorizer (#9511)
* remove move_mask from the devectorizer
* add (wrong) ptx
* reason
* enable index addition in PTX, we won't have the INDEX anyways
* space
2025-03-20 11:53:12 +08:00
chenyu
189f62d44f
add rounding to tqdm unit scale (#9507)
fixed `AssertionError: ' 1.00/10.0 1000it/s]' != ' 1.00/10.0 1.00kit/s]'`
2025-03-19 12:08:46 -04:00
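The assertion shows the failure mode: a rate just below 1000 (for instance something like 999.7) passes a plain "< 1000" check, stays in the base unit, and then rounds to "1000" when printed with no decimals. One way to avoid this, sketched below with a hypothetical helper similar in spirit to upstream tqdm's format_sizeof, is to compare against rounding-aware thresholds (999.5, 99.95, 9.995) so the value is promoted to the next prefix before it can round up:

```python
def format_si(num: float, suffix: str = "") -> str:
    """Hypothetical SI formatter; thresholds account for rounding at each precision."""
    for unit in ["", "k", "M", "G", "T"]:
        if abs(num) < 999.5:          # 999.5+ would print as "1000", so promote instead
            if abs(num) < 99.95:      # 99.95+ would print as "100.0"
                if abs(num) < 9.995:  # 9.995+ would print as "10.00"
                    return f"{num:1.2f}{unit}{suffix}"
                return f"{num:2.1f}{unit}{suffix}"
            return f"{num:3.0f}{unit}{suffix}"
        num /= 1000
    return f"{num:3.1f}P{suffix}"

print(format_si(999.7, "it/s"))  # 1.00kit/s
```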
hooved
136cf7b8b1
hotfix: load >2 GiB from disk on macOS (#9361)
* enable loading >2 GiB buffer from disk on macOS
* handle None case raised by mypy
* add test
* revert fix to repro bug in CI
* tell CI to run a unit test for macOS
* reapply fix
2025-03-07 14:51:58 +08:00
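Background that is likely relevant: macOS read() is known to reject single requests larger than INT_MAX bytes (EINVAL), and CPython caps each raw read on macOS at INT_MAX, so one readinto() into a buffer over 2 GiB can return fewer bytes than requested. A common remedy, sketched here as a hypothetical helper rather than the actual fix in this commit, is to loop over bounded chunks until the buffer is full; the None check mirrors the mypy note, since readinto() is typed as possibly returning None:

```python
import os
import tempfile

def read_exact(path: str, size: int, chunk: int = 1 << 30) -> bytearray:
    """Hypothetical helper: fill a `size`-byte buffer using reads of at most `chunk` bytes."""
    buf = bytearray(size)
    mv = memoryview(buf)
    pos = 0
    with open(path, "rb", buffering=0) as f:
        while pos < size:
            n = f.readinto(mv[pos:pos + chunk])
            if not n:  # None (no data available) or 0 (EOF before `size` bytes)
                raise EOFError(f"expected {size} bytes, got {pos}")
            pos += n
    return buf

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(10)))
print(read_exact(tmp.name, 10, chunk=3) == bytes(range(10)))  # True
os.remove(tmp.name)
```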
George Hotz
2cc4cb74f0
reorder binops (#9328)
* reorder binops
* test improvements + fix string tests
* ugh, okay this
2025-03-03 14:58:18 +08:00
qazal
e162aa862d
is_realized only if buffer is allocated (#9253)
* is_realized only if the buffer is allocated
* fix the image check too
* assert test_lil_model after ExecItems run
2025-02-26 08:58:08 +01:00
Sieds Lykles
9c4d9d9f10
Acc first (#9232)
* put acc in front of the add chain
* handle the other case
* Make loop collapse more generic
* Remove mulacc_unrolled
* Actually remove it
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-25 22:10:15 -05:00
chenyu
90c3ed17c5
move cast to before softmax in attention (#9213)
* move cast to before softmax in attention
saves some memory because exp (which is used for backward) is now done in half. training bert seems fine and can now fit BS=78 (up from 66)
* test
2025-02-24 17:24:59 -05:00
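To give a sense of scale for the saving: the exp/softmax output in attention holds one value per (batch, head, query, key) position and is kept for backward, so storing it in half instead of float32 halves that activation. A back-of-the-envelope sketch, using BERT-large-like dimensions (16 heads, sequence length 512, 24 layers) as assumptions rather than the actual training config; it ignores every other tensor in memory and does not by itself account for the exact batch-size change:

```python
def attn_probs_bytes(batch: int, heads: int, seq: int, layers: int, itemsize: int) -> int:
    """Bytes to keep one (batch, heads, seq, seq) attention-probability tensor per layer."""
    return batch * heads * seq * seq * layers * itemsize

MiB = 1 << 20
fp32 = attn_probs_bytes(1, 16, 512, 24, 4)  # float32: 4 bytes per element
fp16 = attn_probs_bytes(1, 16, 512, 24, 2)  # half: 2 bytes per element
print(fp32 // MiB, fp16 // MiB)  # 384 192 (MiB per sample)
```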
qazal
14aa2395d0
allow VIEW(BUFFER) in Tensor UOps [pr] (#9210)
* allow VIEW(BUFFER) in Tensor UOps [pr]
* still reshapes
* update becomes_map tests
* bring copy folder to the scheduler
* lint
* only sgd left
* optimizer assign
* 13 kernels
* rename to test_reorder_expand + assert VIEW
2025-02-24 13:06:15 +01:00
qazal
d12efc95d4
support custom name function in viz [pr] (#9219)
* support custom name function in viz [pr]
* title case
* assert name count in test_track_rewrites_name_fxn
2025-02-24 03:03:25 +02:00
chenyu
2e7c2780a9
CLANG -> CPU (#9189)
2025-02-20 18:03:09 -05:00
chenyu
3e22747799
run unit test on windows ci (#9187)
* factor out testing_minimal in setup.py [pr]
* testing_unit + windows
2025-02-20 14:40:41 -05:00
chenyu
287de4ecc6
use torch in test_gradient (#9186)
used torch.autograd.grad, but not sure if it can be a template like jax
2025-02-20 12:26:11 -05:00
George Hotz
df3b320f46
rewriter -> devectorizer [pr] (#9147)
2025-02-18 12:42:08 +08:00
Ali Ladjevardi
35e9c4657b
Use proper units when printing beam time (#9103)
* use proper units when printing beam time
* refactor DEBUG=2
2025-02-17 23:41:38 +08:00
George Hotz
4dd10d03b7
move is_increasing to ops [pr] (#9134)
2025-02-17 19:27:48 +08:00
George Hotz
1bf66d62cf
symbolic gets its own file [pr] (#9132)
2025-02-17 18:55:21 +08:00
quortus
638d925e4e
Prevent const folding in test_payne_hanek_reduction (#9088)
* Prevent const folding in test_payne_hanek_reduction
* Do not use list as a default parameter
2025-02-17 17:31:10 +08:00
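The second bullet refers to a classic Python pitfall: a default value is evaluated once, when the function is defined, so a mutable default such as a list is shared by every call that omits the argument. A minimal illustration with hypothetical functions (not the test's actual code):

```python
def collect_bad(x, acc=[]):
    # the same list object is reused by every call that omits `acc`
    acc.append(x)
    return acc

def collect_good(x, acc=None):
    # create a fresh list per call instead
    if acc is None:
        acc = []
    acc.append(x)
    return acc

print(collect_bad(1), collect_bad(2))    # [1, 2] [1, 2]
print(collect_good(1), collect_good(2))  # [1] [2]
```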
qazal
2d04a75a40
start tracking bottom_up_rewrite in viz [pr] (#9071)
* start tracking bottom_up_rewrite in viz [pr]
* use the tracking matcher in test_viz
2025-02-14 00:28:10 +01:00
gg
19ae829bd1
test float uop in sym_infer (#7456)
* float uop in sym_infer
* break line :(
* rerun mypy
* update GlobalCounters types
* revert type change and cast assignments to mem and ops
* cast inferred value to UOp in reshape
* cast hcq, update view reshape to handle inferred float
* rm extra space
* update error
* no type updates
2025-02-13 12:55:28 +08:00
qazal
fd9f9ec772
realized base tensors become RESHAPE(BUFFER) [pr] (#8994)
2025-02-10 10:17:54 +01:00
qazal
7eba5fb413
Tensor.empty is RESHAPE(BUFFER) (#8987)
* empty is RESHAPE(BUFFER)
* eh
* add test_empty_buf
* can we unsupport this
* linter
* Revert "can we unsupport this"
This reverts commit 0f71e1aadb.
2025-02-09 18:42:51 +01:00
uuuvn
09ec33a578
Better errors when relocating against undefined symbol (#8902)
2025-02-06 10:13:44 +08:00
George Hotz
af2c2837f6
hotfix: skip broken test, add KERNEL Op
2025-02-03 14:02:55 +08:00
chenyu
5b1fc4dcb2
push cast to branches in UOp where (#8850)
2025-02-01 13:55:24 -05:00
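Pushing a cast into the branches of a where relies on the identity cast(where(c, a, b)) == where(c, cast(a), cast(b)): the condition picks one branch either way, so the cast can be applied per branch. One plausible benefit is that a cast on a constant branch can then be folded. A toy sketch on tuple-encoded expressions (a hypothetical encoding, not tinygrad's UOp/UPat API), with a small evaluator to check that both forms agree:

```python
from typing import Any, Dict

def push_cast(e: Any) -> Any:
    """("cast", dtype, ("where", c, a, b)) -> ("where", c, ("cast", dtype, a), ("cast", dtype, b))"""
    if isinstance(e, tuple) and e[0] == "cast" and isinstance(e[2], tuple) and e[2][0] == "where":
        _, c, a, b = e[2]
        return ("where", c, ("cast", e[1], a), ("cast", e[1], b))
    return e

CASTS = {"int": int, "float": float}

def evaluate(e: Any, env: Dict[str, Any]) -> Any:
    if isinstance(e, str):
        return env[e]   # variable lookup
    if not isinstance(e, tuple):
        return e        # constant
    if e[0] == "cast":
        return CASTS[e[1]](evaluate(e[2], env))
    if e[0] == "where":
        return evaluate(e[2], env) if evaluate(e[1], env) else evaluate(e[3], env)
    raise ValueError(f"unknown op {e[0]}")

expr = ("cast", "int", ("where", "c", 2.7, "y"))
pushed = push_cast(expr)  # ("where", "c", ("cast", "int", 2.7), ("cast", "int", "y"))
for c in (True, False):
    env = {"c": c, "y": 5.9}
    assert evaluate(expr, env) == evaluate(pushed, env)
```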
Ahmed Harmouche
07d3676019
weights_only=False (#8839)
2025-01-31 17:16:47 -05:00
qazal
af4f9d1aa9
use matchers to verify AST shape [pr] (#8828)
* use matchers to verify kernel AST [pr]
* work
* use swizzle_cnt
* add comment
* imports
* modified_ast comment
* brief
2025-01-31 09:17:42 +02:00
Ankit Avinash
7647cd8428
[bounty] Stride is flip (#8792)
* replace stride with flip
* Complete replacing stride with flip
clean flip function in view.py
fix tests
* fix tests for multi shapetracker
* fix tests for fuzz shapetracker
* fix tests for fuzz shapetracker
* debug
* debug
* fix
* fix
* fix
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-31 11:34:10 +09:00
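In a strided view described by (shape, strides, offset), similar to tinygrad's View (which also carries a mask), a stride of -1 along an axis is exactly a flip: negate that axis's stride and move the offset to the axis's last element. A minimal sketch with hypothetical helper names:

```python
from itertools import product
from typing import List, Tuple

def view_indices(shape: Tuple[int, ...], strides: Tuple[int, ...], offset: int = 0) -> List[int]:
    """Flat buffer index for every element of the view, in row-major order."""
    return [offset + sum(i * st for i, st in zip(idx, strides))
            for idx in product(*(range(s) for s in shape))]

def flip_axis(shape: Tuple[int, ...], strides: Tuple[int, ...], offset: int, axis: int):
    """Reverse one axis: start at its last element and walk backwards."""
    new_strides = list(strides)
    offset += (shape[axis] - 1) * new_strides[axis]
    new_strides[axis] = -new_strides[axis]
    return tuple(shape), tuple(new_strides), offset

base = ((2, 3), (3, 1), 0)                       # contiguous 2x3 buffer
print(view_indices(*base))                       # [0, 1, 2, 3, 4, 5]
print(view_indices(*flip_axis(*base, axis=1)))   # [2, 1, 0, 5, 4, 3]
```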
Sieds Lykles
7cdc607544
add max as associative (#8816)
2025-01-30 16:01:42 -05:00
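Associativity of max means max(max(x, c1), c2) == max(x, max(c1, c2)), which is what makes a rewrite like folding two constants in a max chain into one legal. A toy sketch on tuple-encoded expressions (hypothetical, not tinygrad's rewrite rules), where ints are constants, strings are variables, and only constants on the right-hand side are handled:

```python
from typing import Union

Expr = Union[int, str, tuple]  # int constant, str variable, or ("max", lhs, rhs)

def fold_max(e: Expr) -> Expr:
    if not isinstance(e, tuple):
        return e
    a, b = fold_max(e[1]), fold_max(e[2])
    if isinstance(a, int) and isinstance(b, int):
        return max(a, b)                        # both constant: evaluate directly
    if isinstance(b, int) and isinstance(a, tuple) and isinstance(a[2], int):
        return ("max", a[1], max(a[2], b))      # max(max(x, c1), c2) -> max(x, max(c1, c2))
    return ("max", a, b)

print(fold_max(("max", ("max", "x", 3), 5)))  # ('max', 'x', 5)
```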
qazal
5643429c17
give BUFFER UOp a ShapeTracker [pr] (#8811)
* give BUFFER UOp a ShapeTracker [pr]
* move that
* update contiguous
* test_advancedindex should use movement ops
2025-01-30 22:33:32 +02:00
qazal
ba17786068
do not construct unmasked VALID (#8759)
* new lines that exist in codegen/ops
* update tests
* update sops.gz (13071 -> 13070 asts)
* fix viz too
* remove that TODO
* diff pruning
* mask assert + device
* work
* diff pruning
* re: fix viz too
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 20:51:21 +02:00
qazal
e8be8a5835
support lowering CONST(VIEW) in lowerer (#8785)
2025-01-28 12:04:41 +02:00
George Hotz
80089536e5
Revert "move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)" (#8786)
This reverts commit af0452f116.
2025-01-28 18:59:02 +09:00
mesozoic-egg
af0452f116
move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)
* handle bf16 via bitcasting for CLANG and LLVM
* On LLVM, skip float16 cast
* float32 on llvm lite, float32 elsewhere
* code format
* trigger pr
* move to rewriter
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 18:16:43 +09:00
qazal
aefbc2637f
test fixups from unmasked valid deletion [pr] (#8776)
2025-01-28 09:23:30 +02:00
George Hotz
b4bf6a7dea
switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
2025-01-26 09:12:16 +09:00
qazal
0e42befc6e
viz cleanups 2 [pr] (#8748)
* viz cleanups 2 [pr]
* test_viz updates
2025-01-25 19:41:57 +02:00
qazal
a037201168
test_viz cleanups + move to /unit directory (#8746)
* test_viz cleanups + move to /unit directory
* lint
2025-01-25 14:33:31 +02:00
George Hotz
018edd934b
don't use view in copy [pr] (#8704)
* don't use view in copy [pr]
* oh, remove double contig
* fix reps
2025-01-21 09:57:47 -08:00
qazal
f0d424ecdf
Tensor UOps can become a buffer or const after scheduling (#8698)
* spec
* work
* update test_viewed_consts_do_not_realize
* remove
2025-01-21 12:33:19 +02:00
Sieds Lykles
1a15c0e89d
Move define_acc down an unrolled add chain (#8404)
* Move define_acc down an unrolled add chain
* Prevent possible infinite recursion
* Add test
* Fix typo in test
* Move mulacc_unrolled to devectorize + load_store_indexing pass
* Add test for mulacc_unrolled by itself
* undo formatter
* import from ops, not rewriter
* Add a const version
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-20 14:56:27 -05:00
George Hotz
46a8c5e1e5
delete forced_realize (#8615)
* delete forced_realize
* put that back
* expectedFailures
* cleaner create_subbuffer
* more comments
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-20 09:40:36 -08:00
George Hotz
98d01a059d
rename uopgraph to rewriter [pr] (#8682)
2025-01-19 17:03:12 -08:00
qazal
2b7db9b45d
delete unused cast/bitcast lines from ops.py [pr] (#8651)
* move cast and bitcast out
* more deletion of bitcast arg
* fix test_bitcast_fuses
* update tests
* work
2025-01-17 03:04:18 -05:00
eliotgolding
0289fbb1c2
limit real_size to the size of first View of ShapeTracker (#8628)
* fix real_size
* add fuzzer; typing
* spacing
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-16 16:27:39 -05:00
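For a single view, the number of buffer elements it can touch follows from its shape, strides and offset: the largest flat index is reached by taking the last position along every axis with a positive stride and the first position along the rest. A minimal sketch (hypothetical helper, assuming the smallest reachable index is non-negative and ignoring masks; not the code from this commit):

```python
from typing import Tuple

def view_real_size(shape: Tuple[int, ...], strides: Tuple[int, ...], offset: int = 0) -> int:
    """Hypothetical: buffer elements needed to cover every index the view can produce."""
    if 0 in shape:
        return 0  # an empty view touches nothing
    # largest reachable index: last element along positive-stride axes, first along the others
    max_idx = offset + sum((s - 1) * st for s, st in zip(shape, strides) if st > 0)
    return max_idx + 1

print(view_real_size((2, 3), (3, 1)))      # 6: contiguous 2x3
print(view_real_size((2, 3), (3, -1), 2))  # 6: same buffer, last axis flipped
print(view_real_size((4, 3), (0, 1)))      # 3: broadcast along the first axis
```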
George Hotz
c85737c200
assert to prepare for grad uop [pr] (#8280)
* assert to prepare for grad uop [pr]
* fix test_nn
* fix most of test_tensor
* few more tests
* fix multi
* uniform gradient
* acc_dtype
* any for multi
* fix typing
* fix assert, CAST_BEFORE_VIEW is still the issue
* explicit test for CAST_BEFORE_VIEW
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 13:26:56 -08:00
George Hotz
fdd46c9f28
delete view instant rule (#8616)
* remove cast before view
* greener
* indexing
* delete view instant rule
* that passes too
* openpilot too
* ack
* base on cast_before_view
* add it as a rewrite rule
* VIEW(DEVICE) is also fine
* test_shard_memory depends on forced_realize removal
* put that back, will go soon
* UOp representations change once we don't instantly fold things
* do not duplicate tests
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-01-14 16:15:13 -05:00