Commit Graph

4433 Commits

Author SHA1 Message Date
madt2709
4bb98d8882 Fix track_running_stats in batchnorm (#6200)
* Fix track_running_stats in batchnorm

* Fix linter

* Update test_fold_conv_batchnorm_notrain to keep allowed at 1

* Add test_fold_conv_batchnorm_notrain_no_running_stats

* Save 1 line
2024-08-20 14:01:22 -07:00
George Hotz
a5d79688db fix indexing out of bounds (#6208)
* fix indexing out of bounds

* 5 ops per access is fine
2024-08-20 11:34:56 -07:00
chenyu
4451bcaf95 update test_arange test_llama_embedding_opt (#6207)
non CI uses larger embedding, still same orders of magnitude
2024-08-20 13:58:43 -04:00
qazal
074cf780dd add option to only benchmark schedule [run_process_replay] (#6204) 2024-08-20 16:51:27 +03:00
gswangg
0e6f057eae migrate test_linearizer.py to UOP AST (pt. 1) (#6150)
* migrate test_multioutput to UOP AST

* inline buf declarations

* migrate test_multireduce to UOp AST

* update test_mid_dim_multireduce to UOp AST

* update test_triple_multireduce with UOp AST

* make global definitions more concise

* update test_double_reduce_multireduce with UOp AST

* update test_multireduce_with_parallel with UOp AST

* update test_multiout_multireduce to UOp AST

* make gidx style consistent across updated tests

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-20 10:02:20 +03:00
chenyu
10330a41c7 add CMPNE tests in test_uops (#6196)
fixed the output_dtype for CMPNE and match the tests for CMPLT
2024-08-19 19:41:21 -04:00
chenyu
21d6739237 remove UnaryOps.NEG from lazy.py (#6193)
* remove UnaryOps.NEG from lazy.py

* neg is no longer unary
2024-08-19 18:41:28 -04:00
Gabe Caldwell
bdd6325f31 default num_classes value for one_hot (#6182)
* num_classes=-1

If num_classes is set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor.

* num_classes desc

comment to explain num_classes default and what that means.

* replacing ' with `
2024-08-19 12:07:14 -07:00
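The inference rule described in the entry above (num_classes=-1 means "one greater than the largest class value") can be sketched in plain Python. This is only an illustration of the rule: `one_hot_sketch` is a hypothetical helper operating on lists, not the tensor method touched by the commit.

```python
def one_hot_sketch(labels, num_classes=-1):
    """Hypothetical list-based helper; NOT the library implementation."""
    if num_classes == -1:
        # Infer the class count as one greater than the largest label.
        num_classes = max(labels) + 1
    return [[1 if c == label else 0 for c in range(num_classes)]
            for label in labels]


inferred = one_hot_sketch([0, 2, 1])            # class count inferred as 3
explicit = one_hot_sketch([1], num_classes=4)   # class count given explicitly
```

With labels `[0, 2, 1]` the largest value is 2, so three columns are produced; passing `num_classes` explicitly skips the inference.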
Alessandro Benetti
9328248610 support for std_mean and cross_entropy (#6181)
* support for std_mean and cross_entropy (#3)

* Cross entropy and std mean support

* remove extra examples
2024-08-19 12:06:44 -07:00
Max-We
53b20afa3f Write tar_extract (#6180)
* Add tar_extract

* Add tar_extract tests

* Fix dtype for initialization from path

* Tests for path initialization

* rm print

---------

Co-authored-by: Maximilian Weichart <maximilian.weichart@icloud.com>
2024-08-19 12:06:17 -07:00
Eitan Turok
8556d0c642 Support gunzip in fetch (#6176)
* init

* update

* clean

* add type

* clean

* fix import order

* shorten variable names
2024-08-19 12:04:40 -07:00
samm393
5d742f7fe3 Missing features from rearrange (#6184)
* fixes and tests

* typo in test
2024-08-19 11:19:07 -07:00
qazal
2242ff84be type verify intermediate UOps [run_process_replay] (#6140)
* type verify intermediate UOps [run_process_replay]

* merge asserts

* variable const
2024-08-19 20:59:01 +03:00
qazal
478145cb8e lowering error in diff_schedule is fine [run_process_replay] (#6185) 2024-08-19 20:51:12 +03:00
chenyu
00578a021b re:6125 switch real_size to use uops [run_process_replay] (#6138)
* switch real_size to use uops [run_process_replay]

* enough to pass

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2024-08-19 13:20:24 -04:00
qazal
e28d29641f more scheduler process replay tooling [run_process_replay] (#6178) 2024-08-19 15:35:51 +03:00
chenyu
b36a7273c6 RUF018 assignment-in-assert [run_process_replay] (#6172)
assertion should not have side effect or `-O` breaks.

initially just wanted to fix the one in rearrange, but it also made some long lines less long
2024-08-19 00:34:52 -04:00
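Why an assignment inside an assert breaks under `-O` can be shown with a small sketch. It uses the built-in `compile(..., optimize=1)`, which strips assert statements the same way `python -O` does. The function `head` and the helpers `load` and `breaks` are made up for illustration and are not taken from the commit.

```python
# The pattern flagged by RUF018: the walrus assignment lives inside the assert.
BAD = '''
def head(xs):
    assert (n := len(xs)) > 0, "empty input"
    return xs[0], n
'''

# The fix: assign first, then assert on the already-bound name.
GOOD = '''
def head(xs):
    n = len(xs)
    assert n > 0, "empty input"
    return xs[0], n
'''


def load(src, optimize):
    """Compile src at the given optimization level and return its head()."""
    namespace = {}
    exec(compile(src, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["head"]


def breaks(fn):
    """Return True if calling fn([7, 8]) hits an unbound name."""
    try:
        fn([7, 8])
    except NameError:  # UnboundLocalError is a subclass of NameError
        return True
    return False
```

At optimization level 0 both versions behave the same. At level 1 the assert in `BAD` is removed together with its assignment, so `n` is never bound, while `GOOD` keeps working.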
chenyu
9c60a27ece lower float64 sin fuzzer threshold (#6173)
139216373.71875 failed
https://github.com/tinygrad/tinygrad/actions/runs/10446960642/job/28925156240
2024-08-19 00:25:42 -04:00
samm393
fd7c84c1c8 Rearrange (#6106)
* rearrange and tests

* tidy

* whitespace

* remove line

* -5 lines

* test fix

* static -> instance

* fix () & add more tests

* remove flags

* -1 line

* match einops

* whitespace

* repeated names
2024-08-18 20:22:28 -07:00
chenyu
2de174677a threefry touchup [run_process_replay] (#6169)
also why is test_gc testing _rng_counter is allocated??
2024-08-18 23:01:24 -04:00
David González Martínez
724e408736 add support for retain_graph in backward (#6145)
* add support for retain_graph in backward

* fix: dont accumulate grad on non-leaf tensors

* fix order

* fix: do not delete grad on leafs

* fix linter

* fix: can't exactly match torch behaviour internally

* allow numerical room for test

* refactor
2024-08-18 16:08:31 -07:00
wozeparrot
0c5189de25 threefry half (#6154) 2024-08-18 15:23:12 -07:00
Timmy
e3d14d1ccc Lowerer Multireduce Grouping (#6097)
* grouping changes to codegen

* linters + tests

* fix identical store issue on PTX

* comment in grouping multireduce tests

* cleaning up diff

* cleaning up diff

* comments

* linters

* hotfix: dont change kernels

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-08-18 19:57:51 +03:00
qazal
1ba83cc7fa split test_sgd_4convs_fuse [run_process_replay] (#6158) 2024-08-18 18:35:42 +03:00
qazal
be6dda4093 hotfix: more lazyop rename to uop [run_process_replay] (#6157) 2024-08-18 17:28:44 +03:00
George Hotz
17a043edad tensor inference (#6156)
* tensor inference

* test is even better name
2024-08-18 00:19:28 -07:00
chenyu
f7950fc2b6 add E275 missing-whitespace-after-keyword linting rule (#6149)
requires space after keywords like `assert`, `not`, `return`, `else`
2024-08-17 16:44:34 -04:00
George Hotz
88edc2902d axis_is_masked with graph_rewrite [run_process_replay] (#6144) 2024-08-17 10:28:49 -07:00
qazal
5a266d5d0c type verify ImageDType and PtrDType [run_process_replay] (#6137)
* type verify ImageDType and PtrDType [run_process_replay]

* fix tests
2024-08-17 16:37:07 +03:00
qazal
d1d41130cd use membufs in ImageDType checks [run_process_replay] (#6136)
* use membufs in ImageDType checks

* set by key [run_process_replay]
2024-08-17 16:17:46 +03:00
qazal
d9ce664350 add test_verify_ast [run_process_replay] (#6134) 2024-08-17 14:14:30 +03:00
George Hotz
3a2d724cb2 extra matcher from renderer [run_process_replay] (#6130)
* extra matcher from renderer

* cache_pm [run_process_replay]
2024-08-16 23:53:11 -07:00
George Hotz
5048066e79 st_arg, never -1 [run_process_replay] (#6128) 2024-08-16 22:46:56 -07:00
George Hotz
d9cb45af09 only axis is masked [run_process_replay] (#6123) 2024-08-16 21:01:17 -07:00
George Hotz
94aa5f11b5 Revert "use vmax for real_size [run_process_replay] (#6120)" (#6122)
This reverts commit a6e3211444.
2024-08-16 20:33:19 -07:00
George Hotz
a6e3211444 use vmax for real_size [run_process_replay] (#6120)
* use vmax for real_size [run_process_replay]

* axis is masked
2024-08-16 20:17:23 -07:00
George Hotz
912f01ed4b UOpGraph -> linearize_uop [run_process_replay] (#6119) 2024-08-16 19:48:39 -07:00
George Hotz
89c7989659 no shapetracker in ops [run_process_replay] (#6117) 2024-08-16 17:23:27 -07:00
George Hotz
74ee9febec remove iter from uopgraph (#6110)
* remove iter from uopgraph

* linearize returns uops

* fix tests

* linearize in linearize

* tests fix

* touchup

* test failures
2024-08-16 15:58:29 -07:00
qazal
28c75bf2a6 merge uops with ops (#6111)
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-08-16 18:17:57 -04:00
qazal
d5e3217076 hotfix: scheduler differ (#6115)
* hotfix: scheduler differ

* add the test back

* track keys
2024-08-16 23:34:49 +03:00
qazal
c23d44c779 AST is UOp (#6030)
* most of the work from the uops2 branch

* schedule

* realize

* kernel

* lowerer

* search

* green

* merge uops with ops

* Revert "merge uops with ops"

This reverts commit 1408a59f12.

* fix benchmark

* remove extra dedup
2024-08-16 22:09:00 +03:00
CaltropHungerton
38fb1e14a2 Intel XMX Tensor Core Support (#5622)
* fixed xmx demo

* i think i'm invoking the DPAS but it's slow

* compiler build arg to stop register spilling, indicated where to fix flop counter

* don't mind this

* do NOT mind me

* do not mind me

* do not view

* i will add bf16 later

* in process of figuring out tc fields

* we figured out the fields!!!

* added check for cl device vendor, added separate IntelRenderer

* remove tc thread_local_aliases

* cleaning debris before draft pr

* edits for linter

* deduping and checking device extensions

* i will find more line reductions in other places

* before merge upstream

* double grf size in compiler to fix register spilling (bandaid), device checking changes

* tc python emulation

* fixed emulation

* tests for emulated intel tensor core

* TC=0, 1 working on upstream, fixed perf

* test

* debris

* check for specialized cl device when we canonicalize device

* bf16 support, tc=3 test added

* address tests

* revert half2 loads on intel tc, cleanup

* linter

* fold_expanded revert

* lint, whitespace fix

* cuda bf16 (only one with bf16) is skipped in test tensor cores, so i will skip for intel bf16 too

* make line shorter, no need for noqa E501

* removed device intel

* fix python emulation

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-08-16 09:19:21 -07:00
George Hotz
553ae9ebc0 bilinear interp uint8 fails (#6103)
* new test for e2e compile failures

* fix bug

* bilinear interp uint8 fails

* better tests
2024-08-15 19:34:39 -07:00
George Hotz
c850e03758 new test for e2e compile failures (#6101)
* new test for e2e compile failures

* fix bug
2024-08-15 18:56:22 -07:00
chenyu
9ef82e1f2b UOp pattern DEFINE_VAR with min==max is also CONST (#6095)
* UOp pattern DEFINE_VAR with min==max is also CONST

* fix tests
2024-08-15 12:09:44 -04:00
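The idea in the entry above, that a variable whose lower and upper bounds coincide can only take one value and so behaves as a constant, can be sketched as follows. `Var`, `Const`, and `fold_var` are hypothetical stand-ins invented for this illustration; they are not the UOp classes or pattern matcher from the repository.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    """Hypothetical bounded symbolic variable (illustration only)."""
    name: str
    vmin: int
    vmax: int


@dataclass(frozen=True)
class Const:
    """Hypothetical constant node (illustration only)."""
    value: int


def fold_var(v: Var) -> Union[Var, Const]:
    # If min == max the variable has exactly one possible value,
    # so it can be treated as that constant.
    return Const(v.vmin) if v.vmin == v.vmax else v
```

For example, `fold_var(Var("i", 3, 3))` gives `Const(3)`, while `fold_var(Var("i", 0, 3))` is returned unchanged.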
qazal
4d38fec8c1 rename lazyops to parents [run_process_replay] (#6091) 2024-08-15 17:27:32 +03:00
chenyu
5accfe26a0 rewrite bool ADD to OR and MUL to AND (#6084)
* rewrite bool ADD to OR and MUL to AND

fixed running `tinyphysics.onnx`, which contains a getitem from a boolean tensor.

only can repro through BEAM_COMPARE, which i think is a different bug in test_linearizer_failure

* fold those, and fix tests

* only for bool

* move dtypes.bool
2024-08-15 10:11:57 -04:00
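The rewrite named in the entry above relies on ADD and MUL over booleans agreeing with OR and AND. A quick exhaustive check of that equivalence, assuming the arithmetic result is cast back to bool (nonzero becomes True), might look like this. The helper names are illustrative and do not come from the commit.

```python
from itertools import product


def bool_add(a: bool, b: bool) -> bool:
    # Assumption: the integer sum is cast back to bool.
    return bool(int(a) + int(b))


def bool_mul(a: bool, b: bool) -> bool:
    # Assumption: the integer product is cast back to bool.
    return bool(int(a) * int(b))


def rewrite_holds() -> bool:
    """Check ADD == OR and MUL == AND over all four boolean input pairs."""
    return all(
        bool_add(a, b) == (a or b) and bool_mul(a, b) == (a and b)
        for a, b in product([False, True], repeat=2)
    )
```

The equivalence only holds for the bool dtype, which matches the "only for bool" item in the commit body; for wider integer types `1 + 1` is `2`, not `1`.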
chenyu
df03dca6e3 move % inside UOp mod_folding and remove deprecated tests (#6085)
[run_process_replay]
2024-08-14 23:25:10 -04:00
qazal
2bf7b56485 minor test fixups from the AST is UOp diff (#6081)
* add assert_equiv_uops cache

* dont expect lowering and schedule errors
2024-08-14 23:58:04 +03:00