madt2709
4bb98d8882
Fix track_running_stats in batchnorm ( #6200 )
...
* Fix track_running_stats in batchnorm
* Fix linter
* Update test_fold_conv_batchnorm_notrain to keep allowed at 1
* Add test_fold_conv_batchnorm_notrain_no_running_stats
* Save 1 line
2024-08-20 14:01:22 -07:00
George Hotz
a5d79688db
fix indexing out of bounds ( #6208 )
...
* fix indexing out of bounds
* 5 ops per access is fine
2024-08-20 11:34:56 -07:00
chenyu
4451bcaf95
update test_arange test_llama_embedding_opt ( #6207 )
...
non CI uses larger embedding, still same orders of magnitude
2024-08-20 13:58:43 -04:00
qazal
074cf780dd
add option to only benchmark schedule [run_process_replay] ( #6204 )
2024-08-20 16:51:27 +03:00
gswangg
0e6f057eae
migrate test_linearizer.py to UOP AST (pt. 1) ( #6150 )
...
* migrate test_multioutput to UOP AST
* inline buf declarations
* migrate test_multireduce to UOp AST
* update test_mid_dim_multireduce to UOp AST
* update test_triple_multireduce with UOp AST
* make global definitions more concise
* update test_double_reduce_multireduce with UOp AST
* update test_multireduce_with_parallel with UOp AST
* update test_multiout_multireduce to UOp AST
* make gidx style consistent across updated tests
---------
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-20 10:02:20 +03:00
chenyu
10330a41c7
add CMPNE tests in test_uops ( #6196 )
...
fixed the output_dtype for CMPNE and match the tests for CMPLT
2024-08-19 19:41:21 -04:00
chenyu
21d6739237
remove UnaryOps.NEG from lazy.py ( #6193 )
...
* remove UnaryOps.NEG from lazy.py
* neg is no longer unary
2024-08-19 18:41:28 -04:00
Gabe Caldwell
bdd6325f31
default num_classes value for one_hot ( #6182 )
...
* num_classes=-1
If num_classes is set to -1, the number of classes is inferred as one greater than the largest class value in the input tensor.
* num_classes desc
comment to explain num_classes default and what that means.
* replacing ' with `
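The inference rule described in this commit can be sketched in plain Python. This is an illustration only, not tinygrad's implementation: the helper name `one_hot` and its list-based input are assumptions made for the example.

```python
def one_hot(indices, num_classes=-1):
    """One-hot encode a flat list of non-negative integer class labels."""
    if num_classes == -1:
        # Infer the class count as one greater than the largest label.
        num_classes = max(indices) + 1
    return [[1 if i == c else 0 for c in range(num_classes)] for i in indices]

# Largest label is 2, so three classes are inferred.
inferred = one_hot([0, 2, 1])
# An explicit num_classes overrides the inference.
explicit = one_hot([0, 2, 1], num_classes=4)
```

With the default, the row width follows the data; passing `num_classes` explicitly keeps the width fixed even when the largest label is absent from a given batch.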
2024-08-19 12:07:14 -07:00
Alessandro Benetti
9328248610
support for std_mean and cross_entropy ( #6181 )
...
* support for std_mean and cross_entropy (#3 )
* Cross entropy and std mean support
* remove extra examples
2024-08-19 12:06:44 -07:00
Max-We
53b20afa3f
Write tar_extract ( #6180 )
...
* Add tar_extract
* Add tar_extract tests
* Fix dtype for initialization from path
* Tests for path initialization
* rm print
---------
Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>
2024-08-19 12:06:17 -07:00
Eitan Turok
8556d0c642
Support gunzip in fetch ( #6176 )
...
* init
* update
* clean
* add type
* clean
* fix import order
* shorten variable names
2024-08-19 12:04:40 -07:00
samm393
5d742f7fe3
Missing features from rearrange ( #6184 )
...
* fixes and tests
* typo in test
2024-08-19 11:19:07 -07:00
qazal
2242ff84be
type verify intermediate UOps [run_process_replay] ( #6140 )
...
* type verify intermediate UOps [run_process_replay]
* merge asserts
* variable const
2024-08-19 20:59:01 +03:00
qazal
478145cb8e
lowering error in diff_schedule is fine [run_process_replay] ( #6185 )
2024-08-19 20:51:12 +03:00
chenyu
00578a021b
re:6125 switch real_size to use uops [run_process_replay] ( #6138 )
...
* switch real_size to use uops [run_process_replay]
* enough to pass
---------
Co-authored-by: George Hotz <geohot@gmail.com>
2024-08-19 13:20:24 -04:00
qazal
e28d29641f
more scheduler process replay tooling [run_process_replay] ( #6178 )
2024-08-19 15:35:51 +03:00
chenyu
b36a7273c6
RUF018 assignment-in-assert [run_process_replay] ( #6172 )
...
assertions should not have side effects, or running with `-O` breaks.
initially just wanted to fix the one in rearrange, but it also made some long lines less long
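A minimal sketch of why an assignment inside an assert is fragile, assuming a standard CPython interpreter (3.8 or later for the walrus operator). The snippet and the helper `run_snippet` are hypothetical and are not taken from the tinygrad code this commit changed.

```python
import subprocess
import sys

# Hypothetical snippet: the walrus assignment lives inside the assert.
SNIPPET = "data = [3, 1, 2]\nassert (n := len(data)) > 0\nprint(n)\n"

def run_snippet(*flags):
    """Run SNIPPET in a fresh interpreter and return (exit code, stdout)."""
    proc = subprocess.run([sys.executable, *flags, "-c", SNIPPET],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()

normal = run_snippet()         # the assert runs, so n is bound
optimized = run_snippet("-O")  # the assert is stripped, so n is never bound
```

Under `-O` the whole assert statement is removed, including the assignment, so the later use of `n` raises `NameError`. Moving the assignment onto its own line before the assert avoids the problem.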
2024-08-19 00:34:52 -04:00
chenyu
9c60a27ece
lower float64 sin fuzzer threshold ( #6173 )
...
139216373.71875 failed
https://github.com/tinygrad/tinygrad/actions/runs/10446960642/job/28925156240
2024-08-19 00:25:42 -04:00
samm393
fd7c84c1c8
Rearrange ( #6106 )
...
* rearrange and tests
* tidy
* whitespace
* remove line
* -5 lines
* test fix
* static -> instance
* fix () & add more tests
* remove flags
* -1 line
* match einops
* whitespace
* repeated names
2024-08-18 20:22:28 -07:00
chenyu
2de174677a
threefry touchup [run_process_replay] ( #6169 )
...
also why is test_gc testing _rng_counter is allocated??
2024-08-18 23:01:24 -04:00
David González Martínez
724e408736
add support for retain_graph in backward ( #6145 )
...
* add support for retain_graph in backward
* fix: dont accumulate grad on non-leaf tensors
* fix order
* fix: do not delete grad on leafs
* fix linter
* fix: can't exactly match torch behaviour internally
* allow numerical room for test
* refactor
2024-08-18 16:08:31 -07:00
wozeparrot
0c5189de25
threefry half ( #6154 )
2024-08-18 15:23:12 -07:00
Timmy
e3d14d1ccc
Lowerer Multireduce Grouping ( #6097 )
...
* grouping changes to codegen
* linters + tests
* fix identical store issue on PTX
* comment in grouping multireduce tests
* cleaning up diff
* cleaning up diff
* comments
* linters
* hotfix: dont change kernels
---------
Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-18 19:57:51 +03:00
qazal
1ba83cc7fa
split test_sgd_4convs_fuse [run_process_replay] ( #6158 )
2024-08-18 18:35:42 +03:00
qazal
be6dda4093
hotfix: more lazyop rename to uop [run_process_replay] ( #6157 )
2024-08-18 17:28:44 +03:00
George Hotz
17a043edad
tensor inference ( #6156 )
...
* tensor inference
* test is even better name
2024-08-18 00:19:28 -07:00
chenyu
f7950fc2b6
add E275 missing-whitespace-after-keyword linting rule ( #6149 )
...
requires space after keywords like `assert`, `not`, `return`, `else`
2024-08-17 16:44:34 -04:00
George Hotz
88edc2902d
axis_is_masked with graph_rewrite [run_process_replay] ( #6144 )
2024-08-17 10:28:49 -07:00
qazal
5a266d5d0c
type verify ImageDType and PtrDType [run_process_replay] ( #6137 )
...
* type verify ImageDType and PtrDType [run_process_replay]
* fix tests
2024-08-17 16:37:07 +03:00
qazal
d1d41130cd
use membufs in ImageDType checks [run_process_replay] ( #6136 )
...
* use membufs in ImageDType checks
* set by key [run_process_replay]
2024-08-17 16:17:46 +03:00
qazal
d9ce664350
add test_verify_ast [run_process_replay] ( #6134 )
2024-08-17 14:14:30 +03:00
George Hotz
3a2d724cb2
extra matcher from renderer [run_process_replay] ( #6130 )
...
* extra matcher from renderer
* cache_pm [run_process_replay]
2024-08-16 23:53:11 -07:00
George Hotz
5048066e79
st_arg, never -1 [run_process_replay] ( #6128 )
2024-08-16 22:46:56 -07:00
George Hotz
d9cb45af09
only axis is masked [run_process_replay] ( #6123 )
2024-08-16 21:01:17 -07:00
George Hotz
94aa5f11b5
Revert "use vmax for real_size [run_process_replay] ( #6120 )" ( #6122 )
...
This reverts commit a6e3211444.
2024-08-16 20:33:19 -07:00
George Hotz
a6e3211444
use vmax for real_size [run_process_replay] ( #6120 )
...
* use vmax for real_size [run_process_replay]
* axis is masked
2024-08-16 20:17:23 -07:00
George Hotz
912f01ed4b
UOpGraph -> linearize_uop [run_process_replay] ( #6119 )
2024-08-16 19:48:39 -07:00
George Hotz
89c7989659
no shapetracker in ops [run_process_replay] ( #6117 )
2024-08-16 17:23:27 -07:00
George Hotz
74ee9febec
remove iter from uopgraph ( #6110 )
...
* remove iter from uopgraph
* linearize returns uops
* fix tests
* linearize in linearize
* tests fix
* touchup
* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6
merge uops with ops ( #6111 )
...
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
d5e3217076
hotfix: scheduler differ ( #6115 )
...
* hotfix: scheduler differ
* add the test back
* track keys
2024-08-16 23:34:49 +03:00
qazal
c23d44c779
AST is UOp ( #6030 )
...
* most of the work from the uops2 branch
* schedule
* realize
* kernel
* lowerer
* search
* green
* merge uops with ops
* Revert "merge uops with ops"
This reverts commit 1408a59f12.
* fix benchmark
* remove extra dedup
2024-08-16 22:09:00 +03:00
CaltropHungerton
38fb1e14a2
Intel XMX Tensor Core Support ( #5622 )
...
* fixed xmx demo
* i think i'm invoking the DPAS but it's slow
* compiler build arg to stop register spilling, indicated where to fix flop counter
* don't mind this
* do NOT mind me
* do not mind me
* do not view
* i will add bf16 later
* in process of figuring out tc fields
* we figured out the fields!!!
* added check for cl device vendor, added separate IntelRenderer
* remove tc thread_local_aliases
* cleaning debris before draft pr
* edits for linter
* deduping and checking device extensions
* i will find more line reductions in other places
* before merge upstream
* double grf size in compiler to fix register spilling (bandaid), device checking changes
* tc python emulation
* fixed emulation
* tests for emulated intel tensor core
* TC=0, 1 working on upstream, fixed perf
* test
* debris
* check for specialized cl device when we canonicalize device
* bf16 support, tc=3 test added
* address tests
* revert half2 loads on intel tc, cleanup
* linter
* fold_expanded revert
* lint, whitespace fix
* cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too
* make line shorter, no need for noqa E501
* removed device intel
* fix python emulation
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-16 09:19:21 -07:00
George Hotz
553ae9ebc0
bilinear interp uint8 fails ( #6103 )
...
* new test for e2e compile failures
* fix bug
* bilinear interp uint8 fails
* better tests
2024-08-15 19:34:39 -07:00
George Hotz
c850e03758
new test for e2e compile failures ( #6101 )
...
* new test for e2e compile failures
* fix bug
2024-08-15 18:56:22 -07:00
chenyu
9ef82e1f2b
UOp pattern DEFINE_VAR with min==max is also CONST ( #6095 )
...
* UOp pattern DEFINE_VAR with min==max is also CONST
* fix tests
2024-08-15 12:09:44 -04:00
qazal
4d38fec8c1
rename lazyops to parents [run_process_replay] ( #6091 )
2024-08-15 17:27:32 +03:00
chenyu
5accfe26a0
rewrite bool ADD to OR and MUL to AND ( #6084 )
...
* rewrite bool ADD to OR and MUL to AND
fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.
can only repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure
* fold those, and fix tests
* only for bool
* move dtypes.bool
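The identity behind this rewrite can be checked exhaustively in plain Python. This sketch assumes that arithmetic on booleans casts the result back to bool (so True + True is True); the helpers `bool_add` and `bool_mul` are hypothetical and are not tinygrad's rewrite rules.

```python
from itertools import product

def bool_add(a, b):
    # Addition of two booleans, with the result cast back to bool.
    return bool(int(a) + int(b))

def bool_mul(a, b):
    # Multiplication of two booleans, with the result cast back to bool.
    return bool(int(a) * int(b))

pairs = list(product([False, True], repeat=2))
# ADD agrees with logical OR on every input pair.
add_matches_or = all(bool_add(a, b) == (a or b) for a, b in pairs)
# MUL agrees with logical AND on every input pair.
mul_matches_and = all(bool_mul(a, b) == (a and b) for a, b in pairs)
```

Since there are only four input pairs, the truth table is a complete proof that the substitution preserves results for bool operands, which is why the commit restricts it to bool.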
2024-08-15 10:11:57 -04:00
chenyu
df03dca6e3
move % inside UOp mod_folding and remove deprecated tests ( #6085 )
...
[run_process_replay]
2024-08-14 23:25:10 -04:00
qazal
2bf7b56485
minor test fixups from the AST is UOp diff ( #6081 )
...
* add assert_equiv_uops cache
* dont expect lowering and schedule errors
2024-08-14 23:58:04 +03:00