Commit Graph

5567 Commits

Author SHA1 Message Date
ignaciosica
164ca5632e split tensor core tests (#6041) 2024-08-12 09:42:02 -04:00
chenyu
7ce716b3a0 bigint -> pyint [run_process_replay] (#6040)
it's a python int. priority should be higher than bool, but we are not using it in type promo now.
2024-08-12 09:12:23 -04:00
qazal
059dd35985 minor UOps.RANGE ordering cleanup [run_process_replay] (#6039)
* range srcs has an extra phi filter

* inline comments _tend_ to be harder to read
2024-08-12 15:33:26 +03:00
Timmy
a00994b423 Lowerer Multireduce Uopgraph (#6007)
* uopgraph changes

* fixing for non-reducing ranges

* multireduce tests

* linters

* linters

* removing comments

* removing arg[1]

* linters

* prettier

* linters

* more linters

* use any instead of intersection
2024-08-12 15:16:07 +03:00
qazal
bb653fa0a5 use dict for UOp parents (#6036) 2024-08-12 15:06:45 +03:00
qazal
7d1f118731 use assertIs in test_schedule (#6035)
* use self.assertIs in test_schedule

* test_lazybuffer
2024-08-11 19:19:18 +03:00
qazal
b918e3c255 cache assert_equiv_uops (#6033) 2024-08-11 12:17:05 +03:00
George Hotz
8d108f65a4 add cast to TC=2 (#6032) 2024-08-10 18:47:12 -07:00
George Hotz
14b613e281 add STEPS to beautiful_mnist 2024-08-10 15:23:44 -07:00
George Hotz
1b3443902c don't use tgmath with clang (#6029)
* don't use tgmath with clang

* fix tests

* nostdlib for clang

* needs ffreestanding on OSX
2024-08-10 13:58:19 -07:00
gswangg
e05a1d6113 string -> fstring (#6028) 2024-08-10 13:28:31 -07:00
chenyu
5820940d98 more relax rtol for test_arange_fuse_grouped_children (#6027)
one more https://github.com/chenyuxyz/tinygrad/actions/runs/10334072657/job/28607120462
2024-08-10 16:10:03 -04:00
chenyu
d82370f6ef docs: fix broken links and update is_floating_point (#6023)
* docs: fix broken links and update is_floating_point

broken links would only show as INFO and not an error.

* make doc anchors warn
2024-08-10 15:58:48 -04:00
gswangg
d58ae17771 Cleanup cstyle render_const [run_process_replay] (#6024)
* simplify render_const since CONST is always scalar

* assert dtype is scalar in render_const
2024-08-10 12:58:02 -07:00
chenyu
10374a2741 relax rtol for test_arange_fuse_grouped_children (#6026)
flaky https://github.com/tinygrad/tinygrad/actions/runs/10333939631/job/28606831006?pr=6023
2024-08-10 15:49:11 -04:00
George Hotz
cf7d3c1eb8 fix tests locally on metal (#6025)
* remove contiguous child, it was breaking tests locally

* hmm, it's still needed

* include NOOPT in method cache key
2024-08-10 12:36:22 -07:00
chenyu
e6c7c3e499 update pylint path to check indent/space for all (#6022)
also fixed many errors. it was not checking nested dirs. exclude autogen for now.

can we use ruff for this?
2024-08-10 14:41:09 -04:00
George Hotz
cfb04c67d1 run unit tests separate from others (and only once) (#6020)
* run unit tests separate from others

* ignore unit tests elsewhere
2024-08-10 11:17:56 -07:00
George Hotz
8302dd6ea4 remove noqa: E501 from transcendental 2024-08-10 10:30:05 -07:00
chenyu
350276e947 simpler device_from_env [run_process_replay] (#6015)
* simpler device_from_env [run_process_replay]

fixed type ignore too

* combine
2024-08-10 13:17:46 -04:00
uuuvn
ee3b015407 ELF loader strtab fix and tests (#6011)
* ELF loader strtab fix and tests

* ruff

* typos

* only one test
2024-08-10 10:13:16 -07:00
Jun Zhang
54e176fb4f Ignore non-computational backends when overwriting the default (#5770) 2024-08-10 09:23:29 -07:00
qazal
3ef2788c4f hotfix: run the entire test_conv_bw schedule (#6014) 2024-08-10 17:55:41 +03:00
qazal
0e62076cf5 more process replay cleanups (#6013)
* more process replay cleanups

* comma benchmark missing
2024-08-10 17:29:10 +03:00
qazal
266afad8ed hotfix: skip schedule capture in benchmarks (#6012) 2024-08-10 17:13:53 +03:00
chenyu
63a8bc29d4 addition divisor in UOp div_folding (#6002)
in addition to trying the gcd of all terms, also try the least common divisor of all MULs
2024-08-09 20:09:05 -04:00
chenyu
52a74e1d6f recursive div_folding to simplify logic (#6008)
* recursive div_folding to simplify logic

* fix mypy
2024-08-09 19:23:06 -04:00
chenyu
a4ec4e890a merge mul div pattern into div_folding (#6006) 2024-08-09 18:49:36 -04:00
chenyu
5961faa4be minor change to UOp div_fold (#6004)
remove an unnecessary gcd and swap the quo/rem order to minimize the diff for the divisor PR
2024-08-09 17:09:59 -04:00
qazal
7373b05ee8 assert conv bw reduceops merge [compare_schedule] (#6001)
* assert conv bw reduceops merge [compare_schedule]

* diff with ref_commit_hash
2024-08-09 19:29:56 +03:00
qazal
b67d521a07 assert test_conv_bw correctness (#6000)
* assert test_conv_bw correctness

* reorder half

* metal and clang still red
2024-08-09 18:30:36 +03:00
nimlgen
ce066fd754 nv do not recalc mv_address (#5998) 2024-08-09 17:16:34 +03:00
qazal
a833f1a735 scheduler process replay with [compare_schedule] (#5997) 2024-08-09 16:58:22 +03:00
qazal
24c7c41ce0 diff LazyBuffer schedules in process replay (#5996)
* start diff printing

* this should be 2

* add to process_replay.py

* enable schedule capture

* arange diff is process replay
2024-08-09 14:16:43 +03:00
wozeparrot
d269bc95fa faster tinychat (#5993) 2024-08-08 19:16:26 -07:00
chenyu
1f1eb46af6 more failed simplified UOp div test case (#5992)
this speculative div was handled by "divisor" in symbolic.
2024-08-08 18:39:25 -04:00
gswangg
94dc61aa1f typeguard fixes for beautiful-mnist JIT=2 METAL=1 [run_process_replay] (#5984)
* fix 'npdtype is not a class' typeguard error

* list -> tuple to fix Tensor.shrink typeguard errors

* list -> tuple to pass global_size typeguard check

* convert src -> list to fix partition typeguard error

* add type annotation and dtype assert to do_reduce

* update partition instead of caller
2024-08-08 17:53:33 -04:00
chenyu
c3e1ae2535 add failed simplified UOp div test case (#5990)
more cases!
2024-08-08 17:37:48 -04:00
nimlgen
38d5eecc68 hcq profiler support args (#5989)
* hcq profiler support args

* bytes -> _bytes

* fix

* add test

* mypy

* not f strings

* precision
2024-08-09 00:18:36 +03:00
qazal
45b1761175 smaller test_llama_embedding + assert correctness (#5986)
* smaller test_llama_embedding in CI

* test correctness
2024-08-08 22:11:29 +03:00
Timmy
8c99bdab08 More Multireduce Tests (#5968)
* multireduce tests

* linters

* more linters

* more linters

* seeing how it works with parallel
2024-08-08 22:04:08 +03:00
chenyu
3c0924cac4 UOp int alu patterns match all int (#5987)
instead of just dtypes.int
2024-08-08 14:50:58 -04:00
gswangg
df44a4e861 Make vectorization of CONST explicit (#5322)
* remove test_const_vectorize_fold

* remove const folding UPat for VECTORIZE

* refactor cstyle render_const

* remove calls to dtype.scalar() in render_const

* add assert

* add vectorized const to UOp.const

* add UPat GEP-VECTORIZE-CONST -> CONST

* render_vectorize for DEFINE_ACC in cstyle

* add back missing render_cast in render_const

* generate vectorized consts as UOps for DEFINE_ACC

* update asserts for DEFINE_ACC with VECTORIZE src

* add UPats for PHI with VECTORIZE src

* use prev rendered vectorize in DEFINE_ACC render

* update DEFINE_ACC in python runtime

* update vectorized DEFINE_ACC in PTXRenderer

* rebase DEFINE_ACC changes on lowerer

* verbose rewrite of bad UPats

* simplify UOps.CONST implementation in ops_python

* update sum_collapse UPats for DEFINE_ACC-VECTORIZE

* revert linearizer to TOT

* fix DEFINE_ACC implementation in ops_python

* simplify DEFINE_ACC in cstyle

* Fix linter error

* support VECTORIZE in fold gated load/store UPat

* support VECTORIZE in other fold gated load UPats

* rewrite VECTORIZE in UPat for no input DEFINE_ACC

* simplify DEFINE_ACC render in cstyle

* make VECTORIZE rules more concise

* add more vectorize fold tests

* inline VECTORIZE-CONSTs in cstyle render

* revert VECTORIZE/GEP rule refactor

* revert cstyle render_const refactor

* inline VECTORIZE-CONSTs in cstyle render

* implicitly vectorized const rendering -> explicit

* WMMA VECTORIZE CONST process replay hacks

* VECTORIZE CONST NAN process_replay hacks

* more VECTORIZE CONST NAN hacks

* cleanup process_replay hacks

* isnan() -> not isfinite() cstyle VECTORIZE CONST

* tweak isnan and isfinite checks VECTORIZE CONST

* tweak for positive vs negative infinity VECTORIZE CONST

* add assert to PTX CONST render

* process_replay VECTORIZE CONST render parity for PTX STORE

* vmin/vmax for VECTORIZE'd CONST

* update WMMA folding rules

* add tests for WMMA VECTORIZE fold

* hack for cstyle half4 CONST zero process_replay parity

* revert PTX backend changes

* add back minimal DEFINE_ACC PTX change

* remove cstyle process_replay hacks

* remove dead code in PTX CONST render

* cleanup vmin/vmax logic for VECTORIZE'd CONSTs

* update vectorize fold tests to use DEFINE_VAR

* fix long line formatting in test

* remove unwanted merge artifact

* more vmin/vmax cleanup

* remove unnecessary asserts

* yet more vmin/vmax cleanup

* get rid of explicit VECTORIZE CONST logic in _min_max

* reuse CONST instead of creating a new one

* remove unneeded cast

* handle DType correctly in sconst

* improve readability of tests

* save a line

* save another line

* tuplize pats in src

* remove GEP-VECTORIZE pats

* add vec +0 fold

* HACK: fold only vec8 +0

* remove vectorized ALU fold hack

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-08-08 20:59:05 +03:00
chenyu
62c77a2831 trim const in UOp div_folding (#5982)
simplify `(4*x+4*y+7)//16` to `(x+y+1)//4`.
fixed `GPU=1 UOP_IS_SYMBOLIC=1 IMAGE=2 python -m pytest test/test_ops.py -k conv`
2024-08-08 12:49:05 -04:00
qazal
e6d41b0ce7 hotfix: adjust test_backward_pass_diamond_model thresholds (#5981) 2024-08-09 00:20:53 +08:00
gswangg
08d22066ee simplify ALU vmin==vmax fold (#5962) 2024-08-08 11:29:16 -04:00
Elias Wahl
c9b4602854 no load in INITMLPERF (#5957) 2024-08-08 11:28:24 -04:00
nimlgen
183c4c91a3 fix non-jitted transfers in profile (#5980)
* fix transfers in profile

* fix linter

* sync to be sure everything is recorded
2024-08-08 17:58:08 +03:00
nimlgen
76eca0d27e nv fix host mem mappings (#5979) 2024-08-08 17:03:44 +03:00
nimlgen
e89eff11a6 amd raise when not supported arch (#5978) 2024-08-08 14:46:14 +03:00