105 Commits

qazal
7d1f118731 use assertIs in test_schedule (#6035)
* use self.assertIs in test_schedule

* test_lazybuffer
2024-08-11 19:19:18 +03:00
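For reference, the assertIs idiom these tests adopt checks object identity rather than equality; a minimal sketch (not code from the PR):

```
import unittest

class TestIdentity(unittest.TestCase):
    def test_same_object(self):
        a = object()
        b = a
        # assertIs asserts `a is b` (identity), catching cases where two
        # distinct buffers merely compare equal under ==
        self.assertIs(a, b)
```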
chenyu
5820940d98 more relax rtol for test_arange_fuse_grouped_children (#6027)
one more https://github.com/chenyuxyz/tinygrad/actions/runs/10334072657/job/28607120462
2024-08-10 16:10:03 -04:00
chenyu
10374a2741 relax rtol for test_arange_fuse_grouped_children (#6026)
flaky https://github.com/tinygrad/tinygrad/actions/runs/10333939631/job/28606831006?pr=6023
2024-08-10 15:49:11 -04:00
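Both rtol bumps above trade a little precision for CI stability; a sketch of how rtol gates a numpy comparison (values illustrative, not the test's):

```
import numpy as np

out, expected = np.float32(1.0001), np.float32(1.0)
np.testing.assert_allclose(out, expected, rtol=2e-4)  # passes
# with rtol=1e-5 the same comparison would raise AssertionError
```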
qazal
3ef2788c4f hotfix: run the entire test_conv_bw schedule (#6014)
2024-08-10 17:55:41 +03:00
qazal
b67d521a07 assert test_conv_bw correctness (#6000)
* assert test_conv_bw correctness

* reorder half

* metal and clang still red
2024-08-09 18:30:36 +03:00
qazal
45b1761175 smaller test_llama_embedding + assert correctness (#5986)
* smaller test_llama_embedding in CI

* test correctness
2024-08-08 22:11:29 +03:00
George Hotz
bf8ec23b00 hotfix: contiguous on precompute_freqs_cis
2024-08-07 14:40:56 -07:00
qazal
7677361d90 test pushing through different expands in 1 kernel (#5963)
* test pushing through different expands in 1 kernel

* realize eye

* back to test_example_matmul
2024-08-07 19:33:18 +03:00
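The expands in question come from writing a reduction like matmul as broadcast + elementwise + sum; an illustrative decomposition (shapes arbitrary):

```
from tinygrad import Tensor

M, K, N = 4, 8, 3
a, b = Tensor.rand(M, K), Tensor.rand(K, N)
# two different expands feeding one mul+sum, the shape of graph the test checks
out = (a.reshape(M, 1, K).expand(M, N, K) *
       b.permute(1, 0).reshape(1, N, K).expand(M, N, K)).sum(axis=2)
assert out.shape == (M, N)
```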
qazal
d5d7f4e7b8 more TestIndexing correctness asserts [run_process_replay] (#5948)
* use torch in test_mnist_val

* more asserts
2024-08-07 01:50:42 +03:00
qazal
7b6496f2e6 fix the reduceops cache breaking beautiful_mnist (#5938)
* fix the reduceops cache breaking beautiful_mnist

* test_sparse_categorical_crossentropy_simple

* starting tests

* atol from test_nn

* test_sparse_categorical_crossentropy_alt

* dont use torch
2024-08-07 00:02:54 +03:00
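For context, the loss under test is called directly on logits with integer labels; minimal usage (random data, purely illustrative):

```
from tinygrad import Tensor

logits = Tensor.randn(4, 10)   # batch of 4, 10 classes
labels = Tensor([1, 0, 3, 9])
loss = logits.sparse_categorical_crossentropy(labels)
print(loss.numpy())
```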
qazal
3d4742dd2e override output shape in fused assign (#5930)
* override output shape in fused assign

This makes

```
FUSE_ARANGE=1 JIT=0 python3 examples/llama.py --gen 1 --prompt "Hello." --count 10 --temperature 0 --timing
```
work. In general we should assert ASSIGN doesn't change shape.

* merge asserts
2024-08-06 13:28:50 +03:00
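The invariant mentioned above, in miniature (a sketch, not the PR's assert):

```
from tinygrad import Tensor

t = Tensor.zeros(4).contiguous().realize()
t.assign(Tensor.ones(4))    # fine: ASSIGN preserves the shape
# t.assign(Tensor.ones(5))  # the shape change that should be asserted against
```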
George Hotz
5d17f54e3c fast mnist indexing (#5921)
* fast mnist indexing

* more tests

* remove those tests, new indexing rule
2024-08-05 13:55:15 -07:00
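The indexing being sped up is the dataset-sampling pattern from beautiful_mnist, roughly (shapes illustrative):

```
from tinygrad import Tensor

X_train = Tensor.rand(60000, 28, 28)
samples = Tensor.randint(512, high=X_train.shape[0])  # random row ids
batch = X_train[samples]    # tensor-of-indices indexing
assert batch.shape == (512, 28, 28)
```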
qazal
e0c6520138 check arange fusing with VIEW and COPY (#5912)
* check arange fusing with VIEW and COPY

* gpu and clang
2024-08-05 17:09:21 +03:00
qazal
aad9234e52 test fused precompute_freqs_cis (#5900)
* test_precompute_freqs_cis

* tiny for ci
2024-08-04 21:01:05 +03:00
qazal
4c5ef2cc4f setitem with arange fusion 1 (#5898)
2024-08-04 16:09:21 +03:00
qazal
56ef9e453e pad reduceops to the max of each dimension (#5889)
* early verify

* pad reduceops to the max of each dim

* remove the function
2024-08-03 14:03:30 +03:00
qazal
65fa86901a indexing fusion 2 (#5888)
* arange fusion

* kernels that fuse

* tests
2024-08-03 13:13:39 +03:00
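Arange fusion targets the way tensor indexing lowers: an arange compared against the indices builds a one-hot mask that is multiplied and reduced. A rough sketch of the pattern (not the scheduler's code):

```
from tinygrad import Tensor

X, idx = Tensor.rand(6, 5), 2
onehot = (Tensor.arange(6).reshape(6, 1) == idx).where(1.0, 0.0)
row = (X * onehot).sum(axis=0)   # equals X[idx] without a gather
```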
qazal
af59b2eea9 tests from the indexing fusion branch (#5886)
2024-08-03 11:56:48 +03:00
qazal
26d0265d66 test schedule of LazyBuffers [run_process_replay] (#5859)
2024-08-01 19:06:29 +03:00
qazal
1b53207b4f revert isolated dags scheduling (#5724)
2024-07-25 19:45:12 -04:00
qazal
9ceb3a3d1f beautiful_mnist -4.3% kernels (#5709)
* add is_complete

* partially delete forced_realized

* p2

* start

* refactor to can_group

* remove steps

* _get_inputs is nicer

* fix the cache

* cache is dict now

* rename to group
2024-07-25 20:30:49 +03:00
George Hotz
dc21e63bd2 test: put conv in one reduce (#4441)
* test: put conv in one reduce

* put reduce at the end

* more expand

* generic, and that expand was breaking things

* ratio

* don't undo the expand

* arg 1

* strides

* warning, for resnet

* warning removed

* disable cast

* handle cast

* op

* err, that's right

* fixup

* fix that

* a test to play with

* add double_reduces

* working up to final reshape

* fold the last reshape

* moved to schedule

* fix axis

* ci, need to bring arange back

* FUSE_CONV_BW maybe

* valid in 3.9

* test_expand_reduce_is_folded_on_different_axes

* add FUSE_CONV_BW=1

* test_fold_batchnorm_backward

* test_sgd_4convs_fuse

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-07-22 12:16:13 +03:00
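The flag lands as an ordinary env var; an illustrative invocation in the style of the FUSE_ARANGE command above (the script pairing is ours, not the PR's):

```
FUSE_CONV_BW=1 python3 examples/beautiful_mnist.py
```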
kormann
2c4add6844 pretty print lazy op per default (#5505)
* pretty lop

* min diff

* walrus

* fix

* min diff

* simplify

* pretty helper function

* ws

* pretty uop upat

* tests

* stricter tests

* test passes

* ws

* stronger upat test

* delete print_tree

* min diff

* stricter exp test

* fix merge

* stronger uops eval test

* +readable and deep upat test

* +readable and deep upat test

* sort inv fix

* fix

* revert allowed_len
2024-07-18 09:34:08 -07:00
George Hotz
fa7e734b49 MetaOps.KERNEL (#5543)
2024-07-17 19:41:23 -07:00
wozeparrot
90f0e2fc49 db in wal mode (#5388)
2024-07-12 20:43:36 -07:00
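WAL mode lets readers proceed concurrently with a writer, which suits a shared on-disk cache; in sqlite it is a single pragma (path illustrative):

```
import sqlite3

conn = sqlite3.connect("cache.db")
conn.execute("PRAGMA journal_mode=WAL")
```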
chenyu
4df63da190 clean up rest of the loadop [run_process_replay] (#5440)
to metaop and filter_sink
2024-07-12 23:38:51 -04:00
George Hotz
03c2dc8bd7 lowerer is kernel [run_process_replay] (#5437)
2024-07-12 18:50:55 -07:00
George Hotz
870dc8c350 s/Linearizer/Lowerer [run_process_replay] (#5428)
2024-07-12 15:54:07 -07:00
George Hotz
6707c778d0 scheduleitem is not Tuple [run_process_replay] (#5425)
* scheduleitem is not Tuple [run_process_replay]

* fix tests

* fix op + fuzzers

* fix mop test
2024-07-12 15:13:19 -07:00
George Hotz
f6ef283e6a s/loadops/metaops [run_process_replay] (#5421)
2024-07-12 13:26:50 -07:00
qazal
e22b377839 generalize FUSE_AS_ONE_KERNEL in the scheduler (#5397)
* test: use const

* hotfix: base

* asserts

* dont push through reshape

* cleanup

* dont need the cache

* test_reduceop_reshape_dont_push and test_index_fused are next
2024-07-12 10:23:16 +03:00
Timmy
bb7746985f multireduce scheduler tests (#5141)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-08 20:28:55 +03:00
qazal
5292d37db6 LoadOps.VIEW in the scheduler spec (#5296)
* refactor to allow_buffer_view

* tests

* fix multi
2024-07-05 19:43:50 +03:00
hikettei
1ab7a4cff0 Handling Multiple UnaryOps.BITCAST in Function for Proper Kernel Fusion [run_process_replay] (#5172)
* [Patch] added an option not to ignore view replacing when doing bitcast

* added the testcase

* [Add] reproduced bitcast cannot be fused into a single kernel in the unittest

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-07-05 19:16:44 +03:00
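For reference, bitcast reinterprets the underlying bits without converting values; minimal usage (illustrative):

```
from tinygrad import Tensor, dtypes

x = Tensor([1.0], dtype=dtypes.float32)
y = x.bitcast(dtypes.uint32)   # same bytes, new dtype
print(y.numpy())               # [1065353216] == 0x3f800000
```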
chenyu
ee0c6dfc15 build Tensor._tri with movements only (#5110)
* build Tensor._tri with movements only

doesn't need arange, saved a kernel in attention mask

* simpler, more tests
2024-06-23 00:07:36 -04:00
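The flavor of the movement-only trick: stagger rows of ones by reshaping through a padded buffer, so no arange kernel is needed. A sketch of the idea, not necessarily the _tri implementation:

```
from tinygrad import Tensor

n = 4
t = (Tensor.ones(n, n).pad(((0, 0), (0, n)))  # rows of n ones then n zeros
       .flatten().pad(((0, n),))              # shift the pattern by one per row
       .reshape(n, 2 * n + 1)
       .shrink(((0, n), (0, n)))              # keep the first n columns
       .flip(1))                              # upper-triangular mask of ones
print(t.numpy())
```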
chenyu
2b2488f2e2 revert creating Tensor from a list without numpy (#5041)
the change was incomplete and broke creating Tensor from a list of np arrays
2024-06-18 17:31:22 -04:00
qazal
04feeb37e6 look for unsafe pad ops in multiview ShapeTrackers (#5002)
2024-06-17 00:28:12 +03:00
Timmy
01b26756d6 Multireduce Scheduler Tests (#4972)
* scheduler tests

* linters

* cleaning up tests

* fixing tests

* syntax

* fixing metal
2024-06-16 16:30:22 +03:00
chenyu
5eee974b2a construct Tensor from python list/tuple directly (#4947)
* construct Tensor from python list/tuple directly

no numpy. annoying that half memoryview is a 3.12 feature...

* simpler, and test

* flat already

* simpler

* cute

* 10% faster

* 5%
2024-06-14 11:36:05 -04:00
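Usage after this change (illustrative):

```
from tinygrad import Tensor

t = Tensor([[1.0, 2.0], [3.0, 4.0]])  # straight from python lists, no numpy
u = Tensor((1, 2, 3))
print(t.shape, u.shape)               # (2, 2) (3,)
```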
chenyu
286b4dbdf2 compile raise CompileError and skip only RuntimeError in multiprocess… (#4646)
* compile raise CompileError and skip only RuntimeError in multiprocess beam

renderer error with multiprocess should not be skipped by beam

* use `==` for dtype to dtype comparison

* that needs to be is

* typo
2024-05-19 00:25:25 -04:00
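On the `==` vs `is` bullets above: dtype objects are module-level singletons, so identity comparison is the dependable check. A sketch of the distinction:

```
from tinygrad import dtypes

# `is` checks identity; `==` routes through __eq__, which a type may overload
assert dtypes.float32 is dtypes.float32
assert dtypes.float32 is not dtypes.float16
```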
qazal
bf8f855838 assert kernel counts in unsupported fusions (#4643)
* replace with comments

* not relevant

* update comment

* custom exception maybe

* fix LoadOps.VIEW
2024-05-18 20:14:37 +03:00
qazal
f3f2b96583 pick schedule tests from external_test_opt (#4615)
* conv tests

* misc

* that shouldnt const fold
2024-05-16 15:43:41 +03:00
qazal
13200c6894 check simple_pads in all views (#4614)
2024-05-16 14:34:39 +03:00
qazal
0b464df605 base change scheduling spec (#4613)
* spec and kernel cnt

* dont use half

* skip half
2024-05-16 13:30:49 +03:00
George Hotz
ff64bcab69 move graph/search to engine (#4596)
2024-05-14 23:12:59 -07:00
George Hotz
fd02ab1e8b move disassemblers and openpilot (#4592)
* move disassemblers and openpilot

* delete junk

* put that in pre-commit

* fixup readme
2024-05-14 19:30:02 -07:00
qazal
355e1c135c pad fusion tests (#4570)
* what breaks

* Revert "what breaks"

This reverts commit e79f679283.

* simplest case

* one unsafe op

* expand+pad, shrink+pad

* safe case

* refactor
2024-05-14 20:34:46 +03:00
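Why some ops are "unsafe" around pad: padding fills with zeros, and any op with f(0) != 0 pollutes the border if applied after the pad. A small demonstration (values illustrative):

```
from tinygrad import Tensor

x = Tensor([1.0, 2.0])
safe = (x * 2).pad(((1, 1),))     # border stays 0 either way
unsafe = x.pad(((1, 1),)).exp()   # border becomes exp(0) == 1
print(safe.numpy(), unsafe.numpy())
```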
chenyu
7afca52796 replace pow in LAMB by tracking b1**t and b2**t per step (#4582)
* replace pow in LAMB by tracking b1**t and b2**t per step

* remove t, add [self.b1_t, self.b2_t] to return

* adam has one less kernel
2024-05-14 13:08:22 -04:00
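The LAMB change in miniature: a running product replaces a pow every step (plain python, illustrative):

```
b1, b1_t = 0.9, 1.0
for t in range(1, 5):
    b1_t *= b1                  # equals b1 ** t with no pow kernel
    bias_correction = 1.0 - b1_t
```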
George Hotz
17faae091b optimizer shouldn't be run without training (#4460)
* optimizer shouldn't be run without training

* set training in relevant tests

* fix multitensor

* that too
2024-05-06 15:34:12 -07:00
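With this change, training mode must be on before stepping an optimizer; a minimal loop using Tensor.train() (model and data illustrative):

```
from tinygrad import Tensor, nn

model = nn.Linear(10, 2)
opt = nn.optim.SGD(nn.state.get_parameters(model), lr=0.01)
x, y = Tensor.randn(4, 10), Tensor([0, 1, 0, 1])
with Tensor.train():            # sets Tensor.training for the block
    opt.zero_grad()
    model(x).sparse_categorical_crossentropy(y).backward()
    opt.step()
```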
qazal
6dbe5585b0 batchnorm + conv backward in test_schedule (#4420)
* test both optims

* batchnorm_backward
2024-05-06 16:40:17 +03:00