Ankit Avinash
7647cd8428
[bounty] Stride is flip (#8792)
* replace stride with flip
* Complete replacing stride with flip
clean flip function in view.py
fix tests
* fix tests for multi shapetracker
* fix tests for fuzz shapetracker
* fix tests for fuzz shapetracker
* debug
* debug
* fix
* fix
* fix
---------
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-31 11:34:10 +09:00
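The replacement above rests on a simple equivalence: reading an axis with stride -1 selects the same elements as flipping that axis. A minimal pure-Python sketch of that identity (illustrative only, not tinygrad's ShapeTracker code):

```python
# A reversed slice (negative "stride") and an explicit flip produce the
# same sequence -- the identity behind replacing stride with flip.
a = list(range(6))
neg_stride = a[::-1]        # step -1: walk the buffer backwards
flipped = list(reversed(a)) # explicit flip of the same axis
assert neg_stride == flipped == [5, 4, 3, 2, 1, 0]
```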
chenyu
0513b0c17d
lower green test_gemm_8192 tflops to 125 [pr] (#8820)
flaky
2025-01-30 17:30:08 -05:00
Ignacio Sica
f0924e0857
fix and test (#8814)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-30 16:35:53 -05:00
qazal
530961f7d5
realized only exists on base (#8815)
* realized only exists on base [pr]
* shorter
* update that too
2025-01-30 23:02:25 +02:00
Sieds Lykles
7cdc607544
add max as associative (#8816)
2025-01-30 16:01:42 -05:00
qazal
5643429c17
give BUFFER UOp a ShapeTracker [pr] (#8811)
* give BUFFER UOp a ShapeTracker [pr]
* move that
* update contiguous
* test_advancedindex should use movement ops
2025-01-30 22:33:32 +02:00
chenyu
5527f86a8f
skip tests in test_indexing that set stride with lazydata.view [pr] (#8813)
2025-01-30 15:17:35 -05:00
nimlgen
a2faa5e49b
am: fix pt free (#8810)
2025-01-30 15:14:55 +03:00
Sieds Lykles
78c0455c7a
Better stable sigmoid (#8806)
Uses `1/(x*x) -> 1/x * 1/x` together with `x/(1+x) -> 1-1/(1+x)` to
rewrite sigmoid instead of `x/((x+1)(x+1)) -> 1/(x+1)*(1-1/(x+1))`
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-29 16:08:53 -05:00
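The two rewrite rules this commit uses are plain algebraic identities, so they can be sanity-checked numerically. A quick pure-Python sketch (not the tinygrad rewrite code itself):

```python
import math

# The identities behind the stable sigmoid rewrite:
#   1/(x*x)  ->  (1/x) * (1/x)
#   x/(1+x)  ->  1 - 1/(1+x)
for x in (0.5, 2.0, 10.0, 1e6):
    assert math.isclose(1 / (x * x), (1 / x) * (1 / x), rel_tol=1e-12)
    assert math.isclose(x / (1 + x), 1 - 1 / (1 + x), rel_tol=1e-12)
```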
Ignacio Sica
260df1a17f
tc_select noop (#8801)
* tc_select noop
* revert changes in test
2025-01-29 13:53:23 -05:00
qazal
ba17786068
do not construct unmasked VALID (#8759)
* new lines that exist in codegen/ops
* update tests
* update sops.gz (13071 -> 13070 asts)
* fix viz too
* remove that TODO
* diff pruning
* mask assert + device
* work
* diff pruning
* re: fix viz too
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 20:51:21 +02:00
qazal
3417bc1814
fix ShapeTracker spec for const [pr] (#8791)
2025-01-28 19:53:36 +02:00
qazal
e8be8a5835
support lowering CONST(VIEW) in lowerer (#8785)
2025-01-28 12:04:41 +02:00
George Hotz
80089536e5
Revert "move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)" (#8786)
This reverts commit af0452f116.
2025-01-28 18:59:02 +09:00
mesozoic-egg
af0452f116
move llvm_bf16_cast to renderer for CLANG and LLVM [pr] (#8720)
* handle bf16 via bitcasting for CLANG and LLVM
* On LLVM, skip float16 cast
* float32 on llvm lite, float32 elsewhere
* code format
* trigger pr
* move to rewriter
---------
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-01-28 18:16:43 +09:00
qazal
aefbc2637f
test fixups from unmasked valid deletion [pr] (#8776)
2025-01-28 09:23:30 +02:00
qazal
ed672881b0
remove additions/deletion in pr + check uops are equal [pr] (#8779)
* use warnings there [pr]
* remove those + move assert_diff [pr]
* warn after log
* remove
* back
2025-01-28 08:57:34 +02:00
George Hotz
62655e4999
move multi into engine [pr] (#8778)
* move multi into engine [pr]
* all runtime is one sz
2025-01-28 09:15:29 +09:00
Ignacio Sica
b240f12593
[TIP-9] rename Opt's amt to arg 2 (#8770)
* rename Opt amt to arg
* ignore_beam_cache for test_tiny
* move ignore_beam_cache to test_tiny
* move to separate pr
* revert space change
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-27 14:19:04 -05:00
Ignacio Sica
ed1b573868
ignore beam cache in test_tiny for stateless beam (#8771)
2025-01-27 12:56:30 -05:00
George Hotz
3ed146a5ff
Revert "rename Opt amt to arg (#8767)" (#8769)
This reverts commit bf041659a5.
2025-01-27 23:46:37 +09:00
Ignacio Sica
bf041659a5
rename Opt amt to arg (#8767)
2025-01-27 23:36:47 +09:00
George Hotz
96bff0b4f7
contiguous is no longer needed in SGD [pr] (#8760)
* contiguous is no longer needed in SGD [pr]
* add allow condition
2025-01-27 15:19:11 +09:00
George Hotz
a9d9f98d05
hotfix: those tests fail locally on mac due to buffer count
2025-01-27 07:53:48 +09:00
qazal
ac70f63d4b
tensor_map cleanups [pr] (#8754)
* tensor_map cleanups [pr]
* update test_schedule too
2025-01-26 11:41:54 +02:00
George Hotz
b53fe7c2fc
remove unused ctx [pr] (#8751)
* remove unused ctx [pr]
* fix test
2025-01-26 17:59:15 +09:00
George Hotz
b4bf6a7dea
switch backward to use gradient [pr] (#8235)
* switch backward to use gradient [pr]
* set device correctly, dedup
* why does that fail?
* add noop cast
* simple backward
* fix beautiful_mnist
* touchups
* set in compute_gradient
* uop_count
* uop_count was wrong
* collections
* no note
* skip that test
* update sched kernel counts
* train mnist is 65
* fix metadata and gc
* fixes
* materialize_grads
* no pathlib stuff
* add contiguous_backward, fix bugs
* add some realize
* fix multi
2025-01-26 09:12:16 +09:00
George Hotz
0ffd572e1e
fix multi with no real srcs (#8749)
2025-01-26 08:41:00 +09:00
qazal
0e42befc6e
viz cleanups 2 [pr] (#8748)
* viz cleanups 2 [pr]
* test_viz updates
2025-01-25 19:41:57 +02:00
qazal
a037201168
test_viz cleanups + move to /unit directory (#8746)
* test_viz cleanups + move to /unit directory
* lint
2025-01-25 14:33:31 +02:00
chenyu
e2b380b743
make UOp.multi real a tuple instead of list [pr] (#8744)
tuple is immutable. also updated test_rand_like_from_alu test
2025-01-24 20:47:27 -05:00
chenyu
e0e176efbc
failed test case for multi rand_like [pr] (#8740)
new multi broke multi device dropout
2025-01-24 13:56:51 -05:00
nimlgen
dc10187fc0
am: add am_smi (#8739)
* am: start monitor
* cleanups
* fixes
* hmm
* progress
* cleanup
2025-01-24 20:16:19 +03:00
George Hotz
e82ba1454b
MultiLazyBuffer is UOp [pr] (#8662)
* MultiLazyBuffer is UOp [pr]
* this is new mlb
* this is the idea
* progress
* multitensor works
* more movement ops
* this
* MultiLazyBuffer is UOp
* cleanups
* multi axis
* fix more tests
* work
* not that
* add multi grad and move shard to ops
* mops not views
* no double contig
* sweet, all mt tests passing
* port old logic
* remove lbs
* fix realized
* whitespace
* assign tweak
* test_assign_kv_cache_multi passes
* fix is_realized
* fix JIT for multi
* just a few more lines i'll pay them back soon i swear please bro just a few more
* no split reduceop for multi
2025-01-24 13:28:55 +09:00
qazal
8e5bd0cd7a
fix buffer init and skip test_swizzle_failure_permute [pr] (#8732)
* fix buffer init and skip test_swizzle_failure_permute [pr]
* replace preload with just load
* add
2025-01-23 17:21:38 +02:00
nimlgen
e4512baea4
am: cleanup mm (#8730)
* am: cleanup mm
* cle
* ops
* entries
2025-01-23 15:49:37 +03:00
qazal
07ec99001a
keep VIEW in big_sink + copy of buffer view spec [pr] (#8727)
* keep views in sink [pr]
* tests
* things from the gpt2 bug
2025-01-23 11:29:30 +02:00
qazal
6cb74bb630
fix using clone with shrink [pr] (#8724)
* fix using clone with shrink [pr]
* remove extra arg, add test_clone_with_shrink_realized
2025-01-23 08:28:07 +02:00
qazal
907dfa0e82
image buffer realization spec [pr] (#8420)
* image buffer realization spec [pr]
* redo the spec
* work
2025-01-22 20:25:22 +02:00
nimlgen
93fb50ce77
allreduce: add flags (#8713)
2025-01-22 17:44:31 +03:00
qazal
2dae467b75
scheduler + process_replay import cleanup (#8711)
2025-01-22 12:44:07 +02:00
qazal
e3d1464ba4
move assign preload out of schedule item [pr] (#8710)
* move assign preload out of schedule item [pr]
* fix that
2025-01-22 12:43:57 +02:00
nimlgen
c5e46c5eee
am: recover from any boot interrupt (#8703)
* am: recover from any load interrupt
* add fuzzer
* nu
2025-01-21 22:22:23 +03:00
George Hotz
018edd934b
don't use view in copy [pr] (#8704)
* don't use view in copy [pr]
* oh, remove double contig
* fix reps
2025-01-21 09:57:47 -08:00
qazal
d6bf1feaab
remove the "no copy" line from copy_to_device (#8702)
* delete the no copy one
* add tests
2025-01-21 17:09:33 +02:00
nimlgen
3628f89929
fix deallocate for subbuffers (#8701)
* fix deallocate for subbuffers
* forgot this
* rm name
* hmm
2025-01-21 16:34:19 +03:00
qazal
f0d424ecdf
Tensor UOps can become a buffer or const after scheduling (#8698)
* spec
* work
* update test_viewed_consts_do_not_realize
* remove
2025-01-21 12:33:19 +02:00
qazal
e2008c98c3
allow symbolic shape in tensor const parents [pr] (#8699)
2025-01-21 12:01:25 +02:00
qazal
66ac0087e8
more high level contiguous tests + scheduler deletions [pr] (#8695)
* delete those
* move the upat too
* rename ops_folding to just sym
* keep that
2025-01-21 01:52:58 +02:00
qazal
08eb1f1f56
simplify tensors before scheduling [pr] (#8580)
* delete forced_realize
* put that back
* work
* remove forced_realize
* expectedFailures
* contiguous(buffer)
* multi
* expectedFailures
* cleaner create_subbuffer
* more comments
* remove that
* note
* realizes
* work
* one upat and image is back
* remove
* cleaner
* fix test_complex_backward for now
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2025-01-20 23:42:42 +02:00