wozeparrot
2b899164c6
no numpy (#6751)
2024-09-26 16:40:18 +08:00
qazal
8a15ccb414
start gc/mem usage tests for buffer schedule [run_process_replay] (#6737)
* gc tests for buffer schedule [run_process_replay]
* assert global counters, maybe del
* check init
* rm global counters
2024-09-26 08:26:31 +08:00
qazal
b629a7998d
early assert buffer count limit [run_process_replay] (#6746)
* better error message for buffer count limit [run_process_replay]
* 3.9 needs that
* assert ScheduleItem
* new _test_buf_cnt
2024-09-26 08:24:26 +08:00
wozeparrot
c100f3d406
default threefry (#6116)
2024-09-25 17:45:13 +08:00
George Hotz
cb22ef379a
truncate consts early (#6741)
* truncate consts early
* ptx still fails
* Update dtype.py
2024-09-25 16:49:51 +08:00
qazal
3bf25aae78
start work on global buffer count limit [run_process_replay] (#6722)
* add a bufs_max option
* simple spec
2024-09-25 09:51:56 +08:00
George Hotz
e015b41ce9
remove e( function just alu( [run_process_replay] (#6589)
* remove e( function just alu( [run_process_replay]
* missed two
2024-09-19 10:24:02 +08:00
George Hotz
bdd0c06f29
add void type to uop (#6471)
* unwrap_dtype maybe
* uopgraph stuff that hardcoded None
* test_ops passes
* dtypes.py fixups
* update test_linearizer and friends
* more ast updates
* test_beam and test_schedule too
* add void type to uop [run_process_replay]
* remove dumb casts
* start making it green
* more cast cleanups
* more cls methods to fix
* regenerate dataset
* split UOp and NOp const
* maybe that too
* fix docs
* update test_uop_symbolic
* test_verify_ast
* new sops with no diff
* meh, type_ignore is alright
* remove that assert
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-09-11 18:16:28 +08:00
qazal
3cde1503ce
enable graph rewrite in the scheduler (#6249)
* test: enable
* skip those
* skip pads tests
2024-09-11 14:30:04 +08:00
qazal
262569a3eb
green conv bw AST_REWRITE=1 (#6466)
* green conv bw AST_REWRITE=1
* new strides and dtype fix
2024-09-11 10:51:24 +08:00
qazal
4259311006
merge views in conv swizzle (#6464)
2024-09-11 10:11:01 +08:00
qazal
803b8b9313
conv bw schedule and correctness tests to iterate on (#6461)
first to fix AST_REWRITE=1, then to implement the same fusion for dtypes.half.
2024-09-11 08:47:07 +08:00
qazal
f4f705a07c
can push SWIZZLE through reduce both ways (#6453)
2024-09-10 16:00:50 +08:00
qazal
1347e49e82
second iteration on UOps.SWIZZLE (#6451)
* new swizzle
* fix the failing tests
* test a double swizzle
* ci
2024-09-10 14:43:21 +08:00
qazal
95c9fe841e
UOp.st infra for the new SWIZZLE (#6449)
2024-09-10 09:39:45 +08:00
qazal
29e63097a0
st is a cached_property on UOp [run_process_replay] (#6433)
2024-09-10 08:30:35 +08:00
George Hotz
90fb17304f
put rewrite back in ops [run_process_replay] (#6421)
2024-09-09 13:53:51 +08:00
qazal
442150a8df
more ast_const for hardcoding consts [run_process_replay] (#6418)
2024-09-09 11:35:08 +08:00
Tim Becker
dfb818788e
Support reduction parameter in more loss functions (#6302)
2024-09-07 05:11:20 +08:00
George Hotz
c88329244b
create rewrite.py [run_process_replay] (#6379)
* create rewrite.py [run_process_replay]
* fix tests
* not in rewrite or ops
* skip flaky test
2024-09-06 10:51:01 +08:00
qazal
e7f6b654ad
cleanup uop eq asserts for swizzle [run_process_replay] (#6362)
* cleanup uop eq asserts for swizzle [run_process_replay]
* more stuff
2024-09-05 13:36:36 +08:00
qazal
2f00bf0c78
conv bw in one kernel with graph_rewrite (#6330)
* double reduce merger
* add test_fold_conv_relu_backward_ast_rewrite
* a correctness test to iterate on
* merge axes the other way around
* better
2024-09-03 03:53:53 +08:00
qazal
539654fbe1
graph_rewrite complexity tests [run_process_replay] (#6317)
2024-08-29 22:39:08 +03:00
qazal
07942ef361
Proposal: Better UOps.SWIZZLE (#6309)
* better UOps.SWIZZLE
* test_swizzle_rewrite
* add it to docs
* show a diff
* a lil more verbose
* two teeny notes
* hotfix: sink
2024-08-29 15:39:48 +03:00
qazal
f0cc8ca5f2
generic st_fixup in scheduler graph rewrite [compare_schedule] (#6278)
2024-08-25 11:02:17 +03:00
qazal
78d6bd8b41
start graph rewrite in the scheduler (#6248)
* start graph rewrite in the scheduler
* test: enable it
* test timings
* only fails in multi reduce
* more isolated tests
2024-08-23 13:15:55 +03:00
chenyu
3fc8203475
remove NEG from handwritten ast in tests (#6234)
* remove NEG from handwritten ast in tests
* test_linearizer_failures
2024-08-22 09:06:59 -04:00
madt2709
4bb98d8882
Fix track_running_stats in batchnorm (#6200)
* Fix track_running_stats in batchnorm
* Fix linter
* Update test_fold_conv_batchnorm_notrain to keep allowed at 1
* Add test_fold_conv_batchnorm_notrain_no_running_stats
* Save 1 line
2024-08-20 14:01:22 -07:00
qazal
1ba83cc7fa
split test_sgd_4convs_fuse [run_process_replay] (#6158)
2024-08-18 18:35:42 +03:00
George Hotz
89c7989659
no shapetracker in ops [run_process_replay] (#6117)
2024-08-16 17:23:27 -07:00
qazal
28c75bf2a6
merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
c23d44c779
AST is UOp (#6030)
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
2024-08-16 22:09:00 +03:00
qazal
4d38fec8c1
rename lazyops to parents [run_process_replay] (#6091)
2024-08-15 17:27:32 +03:00
qazal
7d1f118731
use assertIs in test_schedule (#6035)
* use self.assertIs in test_schedule
* test_lazybuffer
2024-08-11 19:19:18 +03:00
chenyu
5820940d98
more relax rtol for test_arange_fuse_grouped_children (#6027)
one more https://github.com/chenyuxyz/tinygrad/actions/runs/10334072657/job/28607120462
2024-08-10 16:10:03 -04:00
chenyu
10374a2741
relax rtol for test_arange_fuse_grouped_children (#6026)
flaky https://github.com/tinygrad/tinygrad/actions/runs/10333939631/job/28606831006?pr=6023
2024-08-10 15:49:11 -04:00
qazal
3ef2788c4f
hotfix: run the entire test_conv_bw schedule (#6014)
2024-08-10 17:55:41 +03:00
qazal
b67d521a07
assert test_conv_bw correctness (#6000)
* assert test_conv_bw correctness
* reorder half
* metal and clang still red
2024-08-09 18:30:36 +03:00
qazal
45b1761175
smaller test_llama_embedding + assert correctness (#5986)
* smaller test_llama_embedding in CI
* test correctness
2024-08-08 22:11:29 +03:00
George Hotz
bf8ec23b00
hotfix: contiguous on precompute_freqs_cis
2024-08-07 14:40:56 -07:00
qazal
7677361d90
test pushing through different expands in 1 kernel (#5963)
* test pushing through different expands in 1 kernel
* realize eye
* back to test_example_matmul
2024-08-07 19:33:18 +03:00
qazal
d5d7f4e7b8
more TestIndexing correctness asserts [run_process_replay] (#5948)
* use torch in test_mnist_val
* more asserts
2024-08-07 01:50:42 +03:00
qazal
7b6496f2e6
fix the reduceops cache breaking beautiful_mnist (#5938)
* fix the reduceops cache breaking beautiful_mnist
* test_sparse_categorical_crossentropy_simple
* starting tests
* atol from test_nn
* test_sparse_categorical_crossentropy_alt
* dont use torch
2024-08-07 00:02:54 +03:00
qazal
3d4742dd2e
override output shape in fused assign (#5930)
* override output shape in fused assign
This makes
```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.
* merge asserts
2024-08-06 13:28:50 +03:00
George Hotz
5d17f54e3c
fast mnist indexing (#5921)
* fast mnist indexing
* more tests
* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00
qazal
e0c6520138
check arange fusing with VIEW and COPY (#5912)
* check arange fusing with VIEW and COPY
* gpu and clang
2024-08-05 17:09:21 +03:00
qazal
aad9234e52
test fused precompute_freqs_cis (#5900)
* test_precompute_freqs_cis
* tiny for ci
2024-08-04 21:01:05 +03:00
qazal
4c5ef2cc4f
setitem with arange fusion 1 (#5898)
2024-08-04 16:09:21 +03:00
qazal
56ef9e453e
pad reduceops to the max of each dimension (#5889)
* early verify
* pad reduceops to the max of each dim
* remove the function
2024-08-03 14:03:30 +03:00
qazal
65fa86901a
indexing fusion 2 (#5888)
* arange fusion
* kernels that fuse
* tests
2024-08-03 13:13:39 +03:00