George Hotz
3a2d724cb2
extra matcher from renderer [run_process_replay] ( #6130 )
...
* extra matcher from renderer
* cache_pm [run_process_replay]
2024-08-16 23:53:11 -07:00
George Hotz
5048066e79
st_arg, never -1 [run_process_replay] ( #6128 )
2024-08-16 22:46:56 -07:00
George Hotz
d9cb45af09
only axis is masked [run_process_replay] ( #6123 )
2024-08-16 21:01:17 -07:00
George Hotz
94aa5f11b5
Revert "use vmax for real_size [run_process_replay] ( #6120 )" ( #6122 )
...
This reverts commit a6e3211444.
2024-08-16 20:33:19 -07:00
George Hotz
a6e3211444
use vmax for real_size [run_process_replay] ( #6120 )
...
* use vmax for real_size [run_process_replay]
* axis is masked
2024-08-16 20:17:23 -07:00
George Hotz
912f01ed4b
UOpGraph -> linearize_uop [run_process_replay] ( #6119 )
2024-08-16 19:48:39 -07:00
George Hotz
89c7989659
no shapetracker in ops [run_process_replay] ( #6117 )
2024-08-16 17:23:27 -07:00
George Hotz
74ee9febec
remove iter from uopgraph ( #6110 )
...
* remove iter from uopgraph
* linearize returns uops
* fix tests
* linearize in linearize
* tests fix
* touchup
* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6
merge uops with ops ( #6111 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
d5e3217076
hotfix: scheduler differ ( #6115 )
...
* hotfix: scheduler differ
* add the test back
* track keys
2024-08-16 23:34:49 +03:00
qazal
c23d44c779
AST is UOp ( #6030 )
...
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
2024-08-16 22:09:00 +03:00
CaltropHungerton
38fb1e14a2
Intel XMX Tensor Core Support ( #5622 )
...
* fixed xmx demo
* i think i'm invoking the DPAS but it's slow
* compiler build arg to stop register spilling, indicated where to fix flop counter
* don't mind this
* do NOT mind me
* do not mind me
* do not view
* i will add bf16 later
* in process of figuring out tc fields
* we figured out the fields!!!
* added check for cl device vendor, added separate IntelRenderer
* remove tc thread_local_aliases
* cleaning debris before draft pr
* edits for linter
* deduping and checking device extensions
* i will find more line reductions in other places
* before merge upstream
* double grf size in compiler to fix register spilling (bandaid), device checking changes
* tc python emulation
* fixed emulation
* tests for emulated intel tensor core
* TC=0, 1 working on upstream, fixed perf
* test
* debris
* check for specialized cl device when we canonicalize device
* bf16 support, tc=3 test added
* address tests
* revert half2 loads on intel tc, cleanup
* linter
* fold_expanded revert
* lint, whitespace fix
* cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too
* make line shorter, no need for noqa E501
* removed device intel
* fix python emulation
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-16 09:19:21 -07:00
George Hotz
553ae9ebc0
bilinear interp uint8 fails ( #6103 )
...
* new test for e2e compile failures
* fix bug
* bilinear interp uint8 fails
* better tests
2024-08-15 19:34:39 -07:00
George Hotz
c850e03758
new test for e2e compile failures ( #6101 )
...
* new test for e2e compile failures
* fix bug
2024-08-15 18:56:22 -07:00
chenyu
9ef82e1f2b
UOp pattern DEFINE_VAR with min==max is also CONST ( #6095 )
...
* UOp pattern DEFINE_VAR with min==max is also CONST
* fix tests
2024-08-15 12:09:44 -04:00
qazal
4d38fec8c1
rename lazyops to parents [run_process_replay] ( #6091 )
2024-08-15 17:27:32 +03:00
chenyu
5accfe26a0
rewrite bool ADD to OR and MUL to AND ( #6084 )
...
* rewrite bool ADD to OR and MUL to AND
fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.
can only repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure
* fold those, and fix tests
* only for bool
* move dtypes.bool
2024-08-15 10:11:57 -04:00
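Note on "rewrite bool ADD to OR and MUL to AND": the rewrite is sound because, on boolean operands, addition and multiplication agree with logical OR and AND once the result is read back as a bool. The following is a minimal truth-table sketch in plain Python, not tinygrad code. The helper names `bool_add` and `bool_mul` are hypothetical, and it assumes the sum of two bools is collapsed to a bool (nonzero means True).

```python
from itertools import product

def bool_add(a: bool, b: bool) -> bool:
    # Assumption: ADD on bools yields a bool, so any nonzero sum is True.
    return bool(int(a) + int(b))

def bool_mul(a: bool, b: bool) -> bool:
    # MUL on bools is nonzero only when both operands are True.
    return bool(int(a) * int(b))

# Each row: (a, b, ADD matches OR, MUL matches AND)
table = [
    (a, b, bool_add(a, b) == (a or b), bool_mul(a, b) == (a and b))
    for a, b in product([False, True], repeat=2)
]

for row in table:
    print(row)
```

All four input combinations agree, which is why the rewrite can be restricted to the bool dtype without changing results.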
chenyu
df03dca6e3
move % inside UOp mod_folding and remove deprecated tests ( #6085 )
...
[run_process_replay]
2024-08-14 23:25:10 -04:00
qazal
2bf7b56485
minor test fixups from the AST is UOp diff ( #6081 )
...
* add assert_equiv_uops cache
* dont expect lowering and schedule errors
2024-08-14 23:58:04 +03:00
George Hotz
64563abc90
add LSTMCell to nn ( #6080 )
...
* add LSTMCell to nn
* lstmcell works with no input on first
* fix no bias 0
* simpler
2024-08-14 12:08:42 -07:00
chenyu
6b3112d525
fix qcom process_replay for kernel diff ( #6079 )
...
* debug why qcom process_replay does not run
skipping the wrong exception?
* um-hum
* get_step_times was parsed incorrectly
* cleanup
2024-08-14 15:05:49 -04:00
chenyu
2fe9d62451
increase test_recursive_add time from 1s to 2s ( #6078 )
...
flaky https://github.com/chenyuxyz/tinygrad/actions/runs/10392144818/job/28776666700
2024-08-14 13:52:02 -04:00
samm393
2dc586ffe5
Shape change bitcast for more dtypes ( #6047 )
...
* bitcast & tests
* use to_dtype
* put disk tensor tests back
* tests
* bitmask
* no bitmask
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-14 10:03:34 -07:00
qazal
83a2543c74
spec for in order LOAD/STORE indexing ( #6073 )
...
* test_unaligns_idxs
* spec for in order LOAD/STORE indexing
* test UOps.SPECIAL
* check for supports_float4
2024-08-14 19:18:00 +03:00
chenyu
5048f9a4d5
test linearizer failure 49 ( #6074 )
...
with UOP_IS_SYMBOLIC=1, on METAL it breaks store fusion and has A+B and B+A as two different UOps
2024-08-14 11:29:10 -04:00
qazal
30035df5a4
add metal process replay back ( #6068 )
...
test this new one
2024-08-14 12:29:56 +03:00
chenyu
1782e4f64d
use div folding to do lt folding ( #6065 )
2024-08-13 16:59:05 -04:00
chenyu
e3af273fa1
touchup cl_errors ( #6058 )
...
* touchup cl_errors
* update test
2024-08-13 13:06:59 -04:00
qazal
9145ad52ff
revert UOps eq, this needs to be isolated in realize.py ( #6063 )
...
This reverts commit dccca7f227.
2024-08-13 18:02:34 +03:00
Tobias Fischer
6e3eb50fd1
added fix and reg tests ( #6060 )
2024-08-12 21:00:48 -04:00
qazal
dccca7f227
test: uop and lazyop have the same compare ( #6053 )
...
* test: uop and lazyop have the same compare
* typings
* self.assert_equiv_uops -> assertEqual
* hash dtype
* test nop too
* TestPatternMatcher never used this compare anyway
* nop eq and ne tests
2024-08-13 00:33:19 +03:00
chenyu
3f2d24a6ec
test_failure_48 for wrong truncation in idx on NV ( #6055 )
...
also added `RAWAST` to print pre-modified AST in DEBUG=3
2024-08-12 16:17:42 -04:00
chenyu
6ed9711898
UOps pattern (x%c)+(x//c)*c = x ( #6051 )
...
pretty cool that this is very easy to write now
2024-08-12 14:58:48 -04:00
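Note on the pattern `(x%c)+(x//c)*c = x`: this is the usual division identity, which holds whenever the quotient and remainder come from the same division convention. The following is a minimal brute-force sketch in plain Python, not the tinygrad pattern matcher. The helper names `recombine` and `holds_everywhere` are hypothetical, and it uses Python's floor-division semantics, which may differ from the backend's rounding for negative operands.

```python
def recombine(x: int, c: int) -> int:
    # Remainder plus quotient times divisor reconstructs x.
    return (x % c) + (x // c) * c

def holds_everywhere(xs, cs) -> bool:
    # Check the identity for every pair, skipping c == 0 (division undefined).
    return all(recombine(x, c) == x for x in xs for c in cs if c != 0)

print(holds_everywhere(range(-50, 51), range(-7, 8)))
```

Because the identity holds for every nonzero `c`, a rewrite rule can replace the whole left-hand expression with `x` without inspecting the value of `c`.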
ignaciosica
777d6b3349
Fix compile error for max with inline const ( #5840 )
2024-08-12 23:40:39 +08:00
ignaciosica
164ca5632e
split tensor core tests ( #6041 )
2024-08-12 09:42:02 -04:00
chenyu
7ce716b3a0
bigint -> pyint [run_process_replay] ( #6040 )
...
it's a python int. priority should be higher than bool, but we are not using it in type promo now.
2024-08-12 09:12:23 -04:00
Timmy
a00994b423
Lowerer Multireduce Uopgraph ( #6007 )
...
* uopgraph changes
* fixing for non-reducing ranges
* multireduce tests
* linters
* linters
* removing comments
* removing arg[1]
* linters
* prettier
* linters
* more linters
* use any instead of intersection
2024-08-12 15:16:07 +03:00
qazal
7d1f118731
use assertIs in test_schedule ( #6035 )
...
* use self.assertIs in test_schedule
* test_lazybuffer
2024-08-11 19:19:18 +03:00
qazal
b918e3c255
cache assert_equiv_uops ( #6033 )
2024-08-11 12:17:05 +03:00
George Hotz
1b3443902c
don't use tgmath with clang ( #6029 )
...
* don't use tgmath with clang
* fix tests
* nostdlib for clang
* needs ffreestanding on OSX
2024-08-10 13:58:19 -07:00
chenyu
5820940d98
more relax rtol for test_arange_fuse_grouped_children ( #6027 )
...
one more https://github.com/chenyuxyz/tinygrad/actions/runs/10334072657/job/28607120462
2024-08-10 16:10:03 -04:00
chenyu
10374a2741
relax rtol for test_arange_fuse_grouped_children ( #6026 )
...
flaky https://github.com/tinygrad/tinygrad/actions/runs/10333939631/job/28606831006?pr=6023
2024-08-10 15:49:11 -04:00
George Hotz
cf7d3c1eb8
fix tests locally on metal ( #6025 )
...
* remove contiguous child, it was breaking tests locally
* hmm, it's still needed
* include NOOPT in method cache key
2024-08-10 12:36:22 -07:00
chenyu
e6c7c3e499
update pylint path to check indent/space for all ( #6022 )
...
also fixed many errors. it was not checking nested dirs. exclude autogen for now.
can we use ruff for this?
2024-08-10 14:41:09 -04:00
George Hotz
cfb04c67d1
run unit tests separate from others (and only once) ( #6020 )
...
* run unit tests separate from others
* ignore unit tests elsewhere
2024-08-10 11:17:56 -07:00
uuuvn
ee3b015407
ELF loader strtab fix and tests ( #6011 )
...
* ELF loader strtab fix and tests
* ruff
* typos
* only one test
2024-08-10 10:13:16 -07:00
Jun Zhang
54e176fb4f
Ignore non-computational backends when overwriting the default ( #5770 )
2024-08-10 09:23:29 -07:00
qazal
3ef2788c4f
hotfix: run the entire test_conv_bw schedule ( #6014 )
2024-08-10 17:55:41 +03:00
qazal
0e62076cf5
more process replay cleanups ( #6013 )
...
* more process replay cleanups
* comma benchmark missing
2024-08-10 17:29:10 +03:00
chenyu
63a8bc29d4
addition divisor in UOp div_folding ( #6002 )
...
in addition to trying the gcd of all terms, also try the least common divisor of all MULs
2024-08-09 20:09:05 -04:00