George Hotz
f82ecd8802
remove uop symbolic rendering [run_process_replay] ( #6108 )
2024-08-16 09:02:15 -07:00
George Hotz
e8ae9af962
bump line count to 9000. we should be here a while
2024-08-16 08:46:36 -07:00
chenyu
7d46fb0c83
load balance NV benchmark ci ( #6107 )
2024-08-16 10:08:08 -04:00
qazal
1ff6c7c519
add more types to search [run_process_replay] ( #6096 )
...
* add more types to search [run_process_replay]
* bufs_from_lin
2024-08-16 13:19:25 +03:00
chenyu
e5da88873b
enable UOP_IS_SYMBOLIC ( #5954 )
2024-08-16 00:15:46 -04:00
George Hotz
553ae9ebc0
bilinear interp uint8 fails ( #6103 )
...
* new test for e2e compile failures
* fix bug
* bilinear interp uint8 fails
* better tests
2024-08-15 19:34:39 -07:00
George Hotz
c850e03758
new test for e2e compile failures ( #6101 )
...
* new test for e2e compile failures
* fix bug
2024-08-15 18:56:22 -07:00
chenyu
e4a7869893
move cancel mod pattern into mod_folding ( #6100 )
...
changed some kernel in a good way because x does not go through add chain
2024-08-15 19:04:18 -04:00
qazal
11d62668a3
refactor ast ops dtype access [run_process_replay] ( #6093 )
...
* refactor ast ops dtype access [run_process_replay]
* fix assert message
2024-08-15 19:13:33 +03:00
chenyu
9ef82e1f2b
UOp pattern DEFINE_VAR with min==max is also CONST ( #6095 )
...
* UOp pattern DEFINE_VAR with min==max is also CONST
* fix tests
2024-08-15 12:09:44 -04:00
chenyu
a41c9dd12c
test py.typed as a package ( #6094 )
...
* test py.typed as a package
* try this?
* and this
* try that?
* add this back
* cleanup
2024-08-15 11:19:08 -04:00
qazal
25dffb2079
kernel.py more typing [run_process_replay] ( #6092 )
2024-08-15 17:59:24 +03:00
qazal
4d38fec8c1
rename lazyops to parents [run_process_replay] ( #6091 )
2024-08-15 17:27:32 +03:00
chenyu
5accfe26a0
rewrite bool ADD to OR and MUL to AND ( #6084 )
...
* rewrite bool ADD to OR and MUL to AND
fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.
only can repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure
* fold those, and fix tests
* only for bool
* move dtypes.bool
2024-08-15 10:11:57 -04:00
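Note on the bool ADD/MUL rewrite above: on boolean values, addition behaves like OR and multiplication like AND, provided the result is cast back to bool. The sketch below checks this over the full truth table in plain Python. It does not use tinygrad's UOp pattern matcher; the helper names `bool_add` and `bool_mul` are hypothetical, and the cast-back-to-bool semantics is an assumption.

```python
from itertools import product

def bool_add(a: bool, b: bool) -> bool:
    # Assumption: ADD on bools yields a bool, so True + True stays True.
    return bool(int(a) + int(b))

def bool_mul(a: bool, b: bool) -> bool:
    # MUL on bools is True only when both operands are True.
    return bool(int(a) * int(b))

# Check every combination of two boolean inputs.
for a, b in product([False, True], repeat=2):
    assert bool_add(a, b) == (a or b)
    assert bool_mul(a, b) == (a and b)
```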
nimlgen
b765996d54
hcq remove offset from progs ( #6090 )
2024-08-15 17:02:54 +03:00
chenyu
df03dca6e3
move % inside UOp mod_folding and remove deprecated tests ( #6085 )
...
[run_process_replay]
2024-08-14 23:25:10 -04:00
George Hotz
c6e117c899
add a single py.typed ( #6083 )
2024-08-14 17:31:46 -07:00
qazal
2bf7b56485
minor test fixups from the AST is UOp diff ( #6081 )
...
* add assert_equiv_uops cache
* dont expect lowering and schedule errors
2024-08-14 23:58:04 +03:00
chenyu
95aa6d8ccd
remove redundant x/c pattern [run_process_replay] ( #6082 )
...
there's no div and 1/c is const folded
2024-08-14 16:57:39 -04:00
chenyu
a61cb1ff7c
move mod mod pattern into generic mod folding ( #6077 )
2024-08-14 16:24:21 -04:00
George Hotz
64563abc90
add LSTMCell to nn ( #6080 )
...
* add LSTMCell to nn
* lstmcell works with no input on first
* fix no bias 0
* simpler
2024-08-14 12:08:42 -07:00
chenyu
6b3112d525
fix qcom process_replay for kernel diff ( #6079 )
...
* debug why qcom process_replay does not run
skipping the wrong exception?
* um-hum
* get_step_times was parsed incorrectly
* cleanup
2024-08-14 15:05:49 -04:00
chenyu
2fe9d62451
increase test_recursive_add time from 1s to 2s ( #6078 )
...
flaky https://github.com/chenyuxyz/tinygrad/actions/runs/10392144818/job/28776666700
2024-08-14 13:52:02 -04:00
nimlgen
7ab531aede
autogen cleanup ( #6064 )
...
* start autogen cleanup
* nvgpu
* better?
* better
* amd part
* gpu regen
* fix mockgpu amd
* nv
* amd fix linter
* remove import
* ugh
* nv on master
* amd on master
2024-08-14 20:20:35 +03:00
chenyu
de773b593e
remove redundant div gcd patterns [run_process_replay] ( #6076 )
...
covered by generic div_folding
2024-08-14 13:18:28 -04:00
samm393
2dc586ffe5
Shape change bitcast for more dtypes ( #6047 )
...
* bitcast & tests
* use to_dtype
* put disk tensor tests back
* tests
* bitmask
* no bitmask
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-14 10:03:34 -07:00
qazal
83a2543c74
spec for in order LOAD/STORE indexing ( #6073 )
...
* test_unaligns_idxs
* spec for in order LOAD/STORE indexing
* test UOps.SPECIAL
* check for supports_float4
2024-08-14 19:18:00 +03:00
chenyu
5048f9a4d5
test linearizer failure 49 ( #6074 )
...
with UOP_IS_SYMBOLIC=1 on METAL, it breaks store fusion and treats A+B and B+A as two different UOps
2024-08-14 11:29:10 -04:00
qazal
30035df5a4
add metal process replay back ( #6068 )
...
test this new one
2024-08-14 12:29:56 +03:00
wozeparrot
518c022c29
feat: tag 0.9.2 ( #6067 )
v0.9.2
2024-08-13 16:15:36 -07:00
George Hotz
97c3563109
hotfix: clamp in docs
2024-08-13 16:06:30 -07:00
George Hotz
e039b2a920
add Tensor.clamp and fix bool loading ( #6069 )
2024-08-13 15:26:40 -07:00
chenyu
1782e4f64d
use div folding to do lt folding ( #6065 )
2024-08-13 16:59:05 -04:00
chenyu
e3af273fa1
touchup cl_errors ( #6058 )
...
* touchup cl_errors
* update test
2024-08-13 13:06:59 -04:00
qazal
9145ad52ff
revert UOps eq, this needs to be isolated in realize.py ( #6063 )
...
This reverts commit dccca7f227.
2024-08-13 18:02:34 +03:00
nimlgen
fa84e6ec48
init hcq args state ( #6046 )
...
* init hcq args state
* cleaner
* amd
* fillargs
* fixes
* myoy
* docs
* fix
* not needed
* spacing
2024-08-13 17:11:58 +03:00
qazal
9d2ea94fe9
temp: disable process replay on metal ( #6062 )
2024-08-13 16:31:55 +03:00
andredaprato
18192079a9
Apply transcendental rewrite rules only if required by backend ( #6061 )
...
* Apply transcendental folding for missing ops
* Remove comments
* Remove Final type
2024-08-12 22:38:45 -07:00
Tobias Fischer
6e3eb50fd1
added fix and reg tests ( #6060 )
2024-08-12 21:00:48 -04:00
chenyu
45bd667a78
rewrite pyint after first rewrite [run_process_replay] ( #6059 )
2024-08-12 19:18:13 -04:00
qazal
2ac80576e4
remove BufferOps from log_lazybuffer ( #6057 )
...
this is fine because lazy.py doesn't create BufferOps.
2024-08-13 02:06:47 +03:00
qazal
dccca7f227
test: uop and lazyop have the same compare ( #6053 )
...
* test: uop and lazyop have the same compare
* typings
* self.assert_equiv_uops -> assertEqual
* hash dtype
* test nop too
* TestPatternMatcher never used this compare anyway
* nop eq and ne tests
2024-08-13 00:33:19 +03:00
qazal
8c501272f3
proposal: MetaOps.EXT ( #6054 )
...
`MetaOps.CUSTOM, MetaOps.COPY, MetaOps.EMPTY, MetaOps.VIEW` don't fit into any of our existing UOps.
MetaOps.KERNEL and MetaOps.EXT can be the two paths in realize.py
after AST is UOp:
MetaOps.KERNEL -> UOps.SINK
MetaOps.EXT -> UOps.EXT
2024-08-13 00:29:29 +03:00
wozeparrot
059cf2a90d
feat: autogen from kernel register offset headers ( #6056 )
2024-08-12 14:08:35 -07:00
chenyu
3f2d24a6ec
test_failure_48 for wrong truncation in idx on NV ( #6055 )
...
also added `RAWAST` to print pre-modified AST in DEBUG=3
2024-08-12 16:17:42 -04:00
chenyu
6ed9711898
UOps pattern (x%c)+(x//c)*c = x ( #6051 )
...
pretty cool that this is very easy to write now
2024-08-12 14:58:48 -04:00
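Note on the pattern above: the identity (x%c)+(x//c)*c = x holds for Python's floor division and modulo with any nonzero c. The sketch below checks it on plain integers. It is not the UOp rewrite rule itself, and the function name `recombine` is hypothetical.

```python
def recombine(x: int, c: int) -> int:
    # Left-hand side of the pattern: remainder plus quotient times divisor.
    return (x % c) + (x // c) * c

# Exhaustive check over a small range, including negative x and negative c.
assert all(
    recombine(x, c) == x
    for x in range(-50, 51)
    for c in range(-7, 8)
    if c != 0
)
```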
qazal
71c5901fc1
refactor ast arg and op [compare_schedule] ( #6052 )
2024-08-12 21:51:00 +03:00
wozeparrot
dc2617bffd
feat: use more correct reg for local dims ( #6048 )
2024-08-12 11:15:37 -07:00
qazal
529832d223
refactor ast creation [compare_schedule] ( #6050 )
...
* refactor scheduler lazyop creation [compare_schedule]
* helpful prints
* this will become the default
2024-08-12 21:15:11 +03:00
nimlgen
8f787785d9
fix openpilot benchmark ( #6049 )
2024-08-12 21:12:32 +03:00