qazal
bcb2f1caa3
init REDUCE_AXIS with BinaryOps ( #6256 )
...
* REDUCE_AXIS arg with BinaryOps
* more work in kernel.py
fixup sops.gz
* fix TestGraphRewriteEfficiency
2024-08-24 11:28:41 +03:00
chenyu
da5cf11859
fix acc init value for MUL ( #6263 )
2024-08-23 23:19:44 -04:00
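As a hedged illustration of why the accumulator's initial value matters for a product reduce: the accumulator has to start at the identity element of the reduce op (0 for a sum, 1 for a product). The sketch below is plain Python, not tinygrad code. The `IDENTITY` table, the `OPS` table, and `reduce_with_acc` are hypothetical names introduced only for this example.

```python
from functools import reduce
import operator

# Hypothetical lookup tables for illustration; these are NOT tinygrad internals.
IDENTITY = {"ADD": 0, "MUL": 1, "MAX": float("-inf")}
OPS = {"ADD": operator.add, "MUL": operator.mul, "MAX": max}

def reduce_with_acc(op, values):
    # Start the accumulator at the op's identity so that an empty
    # reduction, or the first step of a loop, yields the right result.
    return reduce(OPS[op], values, IDENTITY[op])

print(reduce_with_acc("MUL", [2, 3, 4]))
print(reduce_with_acc("MUL", []))
```

Starting a MUL accumulator at 0 instead would make every product collapse to 0, which is the class of bug an init-value fix addresses.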
George Hotz
26498b322e
add BEAM to external_benchmark_schedule.py
2024-08-23 18:10:46 -07:00
George Hotz
53a73038e3
hotfix: TestGraphRewriteEfficiency.test_create_many_uops
2024-08-23 15:51:57 -07:00
chenyu
590c0922b6
Tensor.prod ( #6250 )
...
* Tensor.prod
a new reduce op!
* onnx ReduceProd
2024-08-23 10:06:32 -04:00
qazal
78d6bd8b41
start graph rewrite in the scheduler ( #6248 )
...
* start graph rewrite in the scheduler
* test: enable it
* test timings
* only fails in multi reduce
* more isolated tests
2024-08-23 13:15:55 +03:00
George Hotz
238896ca02
looking into graph rewrite speed ( #6239 )
...
* looking into graph rewrite speed
* track, replace is slow
* if all same, no permutations [run_process_replay]
* types so compile works
* no implied comprehension
* TRACK_MATCH_STATS=2
2024-08-22 13:17:55 -07:00
chenyu
e745e16441
remove UnaryOps.NEG ( #6238 )
...
* Remove UnaryOps.NEG
generated new dataset with
```
time JIT=2 PYTHONPATH=. ./extra/optimization/generate_dataset.sh
gzip /tmp/sops
mv /tmp/sops.gz extra/datasets/
```
* fix that
2024-08-22 14:21:39 -04:00
nimlgen
6c4ddd6260
hcq skip tests when no multidev ( #6235 )
...
* hcq skip tests when no multidev
* linter
* a bit higher timeout
2024-08-22 18:27:16 +03:00
chenyu
08539f08b0
fix UOp repr with Variable in arg ( #6236 )
2024-08-22 11:06:33 -04:00
chenyu
3fc8203475
remove NEG from handwritten ast in tests ( #6234 )
...
* remove NEG from handwritten ast in tests
* test_linearizer_failures
2024-08-22 09:06:59 -04:00
chenyu
1c5ef5b793
format test_linearizer_failure ( #6231 )
...
made it easier to remove NEG
2024-08-21 21:10:56 -04:00
nimlgen
78c94abe9c
raise time limit for ci in test_profile_multidev_transfer ( #6227 )
2024-08-21 22:42:03 +03:00
gswangg
c74b318458
migrate test_linearizer.py to UOp AST, pt. 2 ( #6228 )
2024-08-21 22:16:11 +03:00
George Hotz
c3168952f0
wip: tracking pattern matcher [run_process_replay] ( #6225 )
...
* wip: tracking pattern matcher
* better
* proper dedup
* timing
* early reject
* mergeable match stats
* TrackedPattenMatcher
* fix TrackedPattenMatcher
* cleanups
* clean that too
* remove early_reject
* Revert "remove early_reject"
This reverts commit dc2aef14b8f5da58f5ec9566daf252513cac394c.
* total
* sort by time
* match_stats cleanup
2024-08-21 11:57:26 -07:00
chenyu
a666450e4d
UOp pattern x + x -> x * 2 ( #6224 )
...
* UOp pattern x + x -> x * 2
now there's no NEG, with this it covers all kinds of a*x+b*x
* can remove x-x
2024-08-21 12:06:19 -04:00
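A minimal sketch of the `x + x -> x * 2` rewrite named in the commit above, assuming expressions are modelled as nested tuples such as `("ADD", lhs, rhs)`. This is not tinygrad's `UOp` or `PatternMatcher` API; `rewrite_add_same` is a hypothetical helper written only to show the idea.

```python
def rewrite_add_same(expr):
    """Rewrite ("ADD", x, x) into ("MUL", x, 2), bottom-up.

    Leaves are plain strings or numbers; inner nodes are tuples
    of the form (op, src0, src1). Purely illustrative.
    """
    if not isinstance(expr, tuple):
        return expr
    op, *srcs = expr
    # Rewrite the children first so nested matches are found too.
    srcs = [rewrite_add_same(s) for s in srcs]
    if op == "ADD" and len(srcs) == 2 and srcs[0] == srcs[1]:
        return ("MUL", srcs[0], 2)
    return (op, *srcs)

print(rewrite_add_same(("ADD", ("ADD", "x", "x"), "y")))
```

Because the rule fires on structural equality of the two sources, it applies to any repeated subexpression, not only to a bare variable.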
chenyu
c9a9631818
no UnaryOps.NEG in generated UOp patterns ( #6209 )
...
* no UnaryOps.NEG in generated UOp patterns
removed pattern `x * (-1) -> -x` and `x != True`
* those are fine because NEG became CMPNE and True
* fix sd validation L2 norm
2024-08-21 11:08:22 -04:00
qazal
3b8cc5a3e0
more multireduce tests prep for neg removal [run_process_replay] ( #6220 )
2024-08-21 12:45:24 +03:00
qazal
f03e5a4b3b
test_multireduce const has a shape ( #6218 )
2024-08-21 11:02:45 +03:00
George Hotz
2c42e9c2c6
faster rewrite, no folder in expand/reduce [run_process_replay] ( #6216 )
...
* faster rewrite, no folder in expand/reduce [run_process_replay]
* is removing the expander there okay
* parens
* don't reconstruct exact match uop
* fast do_reduce
* expand pyint
* most of the parents gains with less lines
2024-08-20 23:36:58 -07:00
George Hotz
16f420f7a7
split full_graph_rewrite and linearize_uop [run_process_replay] ( #6215 )
...
* split full_graph_rewrite and linearize_uop
* fix tests
* graph rewrite in test uops
* add types
2024-08-20 20:12:33 -07:00
George Hotz
9faf205601
CIFAR trainer + various bugfixes / improvements ( #6146 )
...
* move cifar into datasets
* support for pathlib Tensors, tar_extract, and fetch gunzip
* too early for Device.DEFAULT
* simpler hlb_cifar + .to(None) is default
* new compiler failure, start beautiful_cifar
* beautiful cifar runs but is broken
* jit train step
* cleaner
* std_mean, not mean_std
* more correct
* fast indexing
* don't print that
* torch load broken
* add eval
* nicer bar
* decorators are the way to do this
* bounds check the load
* a few ops
* batchnorm bugfix, if track_running_stats is False, use online estimate
* full timing
* fix fusion
* unneeded realize
* master tensor
2024-08-20 16:58:46 -07:00
madt2709
4bb98d8882
Fix track_running_stats in batchnorm ( #6200 )
...
* Fix track_running_stats in batchnorm
* Fix linter
* Update test_fold_conv_batchnorm_notrain to keep allowed at 1
* Add test_fold_conv_batchnorm_notrain_no_running_stats
* Save 1 line
2024-08-20 14:01:22 -07:00
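The batchnorm behaviour described above ("if track_running_stats is False, use online estimate") can be sketched roughly as follows. This is a plain-Python illustration under the assumption that "online estimate" means the statistics of the current batch; `batchnorm_stats` and its parameters are hypothetical and do not mirror tinygrad's `BatchNorm` implementation.

```python
def batchnorm_stats(batch, training, track_running_stats,
                    running_mean=0.0, running_var=1.0):
    """Pick the (mean, var) a batchnorm layer would normalize with."""
    # Without running statistics there is nothing stored to fall back on,
    # so the batch statistics are used even outside of training.
    if training or not track_running_stats:
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        return mean, var
    return running_mean, running_var

print(batchnorm_stats([1.0, 3.0], training=False, track_running_stats=False))
print(batchnorm_stats([1.0, 3.0], training=False, track_running_stats=True))
```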
George Hotz
a5d79688db
fix indexing out of bounds ( #6208 )
...
* fix indexing out of bounds
* 5 ops per access is fine
2024-08-20 11:34:56 -07:00
chenyu
4451bcaf95
update test_arange test_llama_embedding_opt ( #6207 )
...
non CI uses larger embedding, still same orders of magnitude
2024-08-20 13:58:43 -04:00
qazal
074cf780dd
add option to only benchmark schedule [run_process_replay] ( #6204 )
2024-08-20 16:51:27 +03:00
gswangg
0e6f057eae
migrate test_linearizer.py to UOP AST (pt. 1) ( #6150 )
...
* migrate test_multioutput to UOP AST
* inline buf declarations
* migrate test_multireduce to UOp AST
* update test_mid_dim_multireduce to UOp AST
* update test_triple_multireduce with UOp AST
* make global definitions more concise
* update test_double_reduce_multireduce with UOp AST
* update test_multireduce_with_parallel with UOp AST
* update test_multiout_multireduce to UOp AST
* make gidx style consistent across updated tests
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-20 10:02:20 +03:00
chenyu
10330a41c7
add CMPNE tests in test_uops ( #6196 )
...
fixed the output_dtype for CMPNE and match the tests for CMPLT
2024-08-19 19:41:21 -04:00
chenyu
21d6739237
remove UnaryOps.NEG from lazy.py ( #6193 )
...
* remove UnaryOps.NEG from lazy.py
* neg is no longer unary
2024-08-19 18:41:28 -04:00
Gabe Caldwell
bdd6325f31
default num_classes value for one_hot ( #6182 )
...
* num_classes=-1
If num_classes set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor.
* num_classes desc
comment to explain num_classes default and what that means.
* replacing ' with `
2024-08-19 12:07:14 -07:00
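A small sketch of the `num_classes=-1` default described in the commit above, where the class count is inferred as one greater than the largest class value in the input. It uses plain Python lists rather than tinygrad tensors, and this `one_hot` is a stand-in written for illustration, not the library method.

```python
def one_hot(labels, num_classes=-1):
    # -1 means "infer": one more than the largest label seen.
    if num_classes == -1:
        num_classes = max(labels) + 1
    return [[1 if i == label else 0 for i in range(num_classes)]
            for label in labels]

print(one_hot([0, 2]))
print(one_hot([1], num_classes=3))
```

One consequence of inferring the count is that the output width depends on the data, so two batches with different maximum labels produce differently shaped encodings unless `num_classes` is passed explicitly.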
Alessandro Benetti
9328248610
support for std_mean and cross_entropy ( #6181 )
...
* support for std_mean and cross_entropy (#3 )
* Cross entropy and std mean support
* remove extra examples
2024-08-19 12:06:44 -07:00
Max-We
53b20afa3f
Write tar_extract ( #6180 )
...
* Add tar_extract
* Add tar_extract tests
* Fix dtype for initialization from path
* Tests for path initialization
* rm print
---------
Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>
2024-08-19 12:06:17 -07:00
Eitan Turok
8556d0c642
Support gunzip in fetch ( #6176 )
...
* init
* update
* clean
* add type
* clean
* fix import order
* shorten variable names
2024-08-19 12:04:40 -07:00
samm393
5d742f7fe3
Missing features from rearrange ( #6184 )
...
* fixes and tests
* typo in test
2024-08-19 11:19:07 -07:00
qazal
2242ff84be
type verify intermediate UOps [run_process_replay] ( #6140 )
...
* type verify intermediate UOps [run_process_replay]
* merge asserts
* variable const
2024-08-19 20:59:01 +03:00
qazal
478145cb8e
lowering error in diff_schedule is fine [run_process_replay] ( #6185 )
2024-08-19 20:51:12 +03:00
chenyu
00578a021b
re:6125 switch real_size to use uops [run_process_replay] ( #6138 )
...
* switch real_size to use uops [run_process_replay]
* enough to pass
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2024-08-19 13:20:24 -04:00
qazal
e28d29641f
more scheduler process replay tooling [run_process_replay] ( #6178 )
2024-08-19 15:35:51 +03:00
chenyu
b36a7273c6
RUF018 assignment-in-assert [run_process_replay] ( #6172 )
...
assertion should not have side effect or `-O` breaks.
initially just wanted to fix the one in rearrange, but it also made some long lines less long
2024-08-19 00:34:52 -04:00
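To illustrate why an assignment inside an `assert` breaks under `-O`: optimized mode strips assert statements, so any walrus assignment inside them never runs. The sketch below reproduces that with the built-in `compile(..., optimize=...)` argument instead of launching `python -O`. `SRC` and `run` are names made up for this example, and it assumes Python 3.8+ for the `:=` operator.

```python
# Source with a side effect (the := assignment) hidden inside an assert.
SRC = "assert (n := 41 + 1) > 0\nresult = n\n"

def run(optimize):
    """Execute SRC compiled at the given optimization level."""
    ns = {}
    # optimize=0 keeps asserts; optimize=1 drops them, like `python -O`.
    exec(compile(SRC, "<demo>", "exec", optimize=optimize), ns)
    return ns["result"]

print(run(0))
try:
    run(1)
except NameError as err:
    print("with asserts stripped:", err)
```

The fix the lint rule pushes toward is to hoist the assignment onto its own line and keep the `assert` free of side effects.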
chenyu
9c60a27ece
lower float64 sin fuzzer threshold ( #6173 )
...
139216373.71875 failed
https://github.com/tinygrad/tinygrad/actions/runs/10446960642/job/28925156240
2024-08-19 00:25:42 -04:00
samm393
fd7c84c1c8
Rearrange ( #6106 )
...
* rearrange and tests
* tidy
* whitespace
* remove line
* -5 lines
* test fix
* static -> instance
* fix () & add more tests
* remove flags
* -1 line
* match einops
* whitespace
* repeated names
2024-08-18 20:22:28 -07:00
chenyu
2de174677a
threefry touchup [run_process_replay] ( #6169 )
...
also why is test_gc testing _rng_counter is allocated??
2024-08-18 23:01:24 -04:00
David González Martínez
724e408736
add support for retain_graph in backward ( #6145 )
...
* add support for retain_graph in backward
* fix: dont accumulate grad on non-leaf tensors
* fix order
* fix: do not delete grad on leafs
* fix linter
* fix: can't exactly match torch behaviour internally
* allow numerical room for test
* refactor
2024-08-18 16:08:31 -07:00
wozeparrot
0c5189de25
threefry half ( #6154 )
2024-08-18 15:23:12 -07:00
Timmy
e3d14d1ccc
Lowerer Multireduce Grouping ( #6097 )
...
* grouping changes to codegen
* linters + tests
* fix identical store issue on PTX
* comment in grouping multireduce tests
* cleaning up diff
* cleaning up diff
* comments
* linters
* hotfix: dont change kernels
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-18 19:57:51 +03:00
qazal
1ba83cc7fa
split test_sgd_4convs_fuse [run_process_replay] ( #6158 )
2024-08-18 18:35:42 +03:00
qazal
be6dda4093
hotfix: more lazyop rename to uop [run_process_replay] ( #6157 )
2024-08-18 17:28:44 +03:00
George Hotz
17a043edad
tensor inference ( #6156 )
...
* tensor inference
* test is even better name
2024-08-18 00:19:28 -07:00
chenyu
f7950fc2b6
add E275 missing-whitespace-after-keyword linting rule ( #6149 )
...
requires space after keywords like `assert`, `not`, `return`, `else`
2024-08-17 16:44:34 -04:00
George Hotz
88edc2902d
axis_is_masked with graph_rewrite [run_process_replay] ( #6144 )
2024-08-17 10:28:49 -07:00