Commit Graph

6319 Commits

Louis Novy
2ac5aec66b Fix exponential complexity in _is_padding_okay [pr] (#7008)
* preliminary test

* missed Optional

* don't check for cache during recursion

* match style from st_fixup... may be marginally faster?

* pathological test case: strongly connected DAG

* move to test_schedule as this isn't really a fusion

* oops this shouldn't be edited

* Revert "oops this shouldn't be edited"

This reverts commit 487cb027dc.

* Revert "move to test_schedule as this isn't really a fusion"

This reverts commit 48d8c550ce.

* move to test_schedule as this isn't really a fusion

* ok no more merge error funny business
2024-10-14 02:34:47 +03:00
chenyu
bd8ecf7fd6 remove NumNode (#7035) 2024-10-13 16:42:19 -04:00
chenyu
c4c806a210 generate new kernel dataset (#7034)
* generate new kernel dataset

prereq to remove NumNode
```
extra/optimization/generate_dataset.sh
gzip -k /tmp/sops
mv /tmp/sops.gz extra/datasets/
```

* fix var range in fuzz_linearizer
2024-10-13 16:19:41 -04:00
chenyu
1a27417262 remove arbitrary multiplication case (#7033)
adds the wrongly simplified kernel in test_linearizer_failures
#7019
2024-10-13 15:06:05 -04:00
chenyu
13575f080a remove bitcast backward in function.py (#7031)
bitcast cannot backward
2024-10-13 10:08:27 -04:00
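A minimal pure-Python sketch (using the stdlib `struct` module, not tinygrad code) of why "bitcast cannot backward": a bitcast reinterprets raw bits rather than computing a function of the value, so its output jumps discontinuously and has no usable derivative.

```python
import struct

def bitcast_f32_to_i32(x: float) -> int:
    # Reinterpret the 4 bytes of a float32 as an int32 -- no numeric conversion.
    return struct.unpack("<i", struct.pack("<f", x))[0]

# A tiny nudge in the float flips low-order bits in the integer view:
# the output is not a smooth function of the input, so there is no gradient.
print(bitcast_f32_to_i32(1.0))        # 1065353216 (0x3F800000)
print(bitcast_f32_to_i32(1.0000001))  # 1065353217 (0x3F800001)
```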
Harsh Natuskar
ace834ef7b docs update (#7027) 2024-10-13 19:39:06 +08:00
qazal
13846930cd hotfix: extract_dataset.py (#7029) 2024-10-13 11:18:23 +03:00
nimlgen
942a17109a qcom use QCOMBuffer for all allocated buffers (#7023)
* qcom use QCOMBuffer for all allocated buffers

* checks
2024-10-12 23:44:36 +03:00
chenyu
04d9b46d51 derivative of softmax is independent of max (#7009)
* derivative of softmax is independent of max

* update test
2024-10-12 15:59:23 -04:00
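A quick numeric check (plain Python, not the tinygrad kernel) of the identity behind this commit: `softmax(x - c) == softmax(x)` for any constant `c`, so subtracting the max for numerical stability changes neither the output nor its derivative, and the backward pass need not differentiate through the max.

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

xs = [1.0, 2.0, 3.0]
shifted = [x - max(xs) for x in xs]  # the usual stability trick
# The constant shift cancels in the ratio exp(x_i - c) / sum_j exp(x_j - c).
assert all(abs(a - b) < 1e-12 for a, b in zip(softmax(xs), softmax(shifted)))
```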
chenyu
cae1c41755 test case of softmax backward kernel count (#7022) 2024-10-12 15:46:32 -04:00
George Hotz
5ce224ceb3 handle arbitrary multiplication case (#7019)
* handle arbitrary multiplication case

* remove count restriction
2024-10-12 23:16:27 +08:00
chenyu
23faeacb23 remove outdated comments (#7018) 2024-10-12 10:51:07 -04:00
George Hotz
85a45164fb remove pyint [pr] (#7016)
* remove pyint

* bump time on tp [pr]

* dont truncate in const fold

* remove dead code

* Revert "dont truncate in const fold"

This reverts commit 29c81db0f7.

* remove define_var
2024-10-12 22:36:24 +08:00
George Hotz
38d45dfba5 hotfix: no rng in test/external/external_benchmark_schedule.py 2024-10-12 22:03:04 +08:00
chenyu
ed1ed9e4ff bert use BS=72 (#7015)
memory 131 -> 138
green tflops 201 -> 209
red tflops 160 -> 169
2024-10-12 09:41:56 -04:00
George Hotz
cba4b9a058 clean up ops file [pr] (#7013) 2024-10-12 19:53:52 +08:00
qazal
746a1f8c86 prep uoping diff for big graph [pr] (#7014) 2024-10-12 14:09:32 +03:00
ignaciosica
334f499e6a consistent render of recip in cuda with CStyleLanguage (#6980) 2024-10-12 18:56:47 +08:00
George Hotz
a71bb09ec3 remove symbolic file [pr] (#7012) 2024-10-12 18:44:44 +08:00
George Hotz
16271189ea hotfix: don't spend lines on a (broken) favicon 2024-10-12 18:21:10 +08:00
George Hotz
b737ee5bac move to_indexed_uops to uops (#7011)
* move to_indexed_uops to uops

* UOp.range
2024-10-12 18:20:57 +08:00
George Hotz
5ae2de9845 UOp.variable (#7010)
* UOp.variable [pr]

* fix tests

* clean

* improve name rendering

* last bug
2024-10-12 18:20:44 +08:00
Bhavya Gada
f79e05cac0 add types in all nn/init.py classes (#7002)
* add types in batchnorm class

* fix lint error in batchnorm types

* add types to conv1d function

* add types to convtranspose1d func and conv2d, convtranspose2d classes

* add types to all remaining classes

* change conv1d padding type to also accept str

* less is more; only keep non-obvious types

* mkdocs need types
2024-10-12 14:42:14 +08:00
ignaciosica
2bb6b95e9f refactor _make_hip_code_for_op into pm rules (#7001) 2024-10-12 12:46:22 +08:00
George Hotz
5c9f76e274 hotfix: openpilot compile3 compare to i==1 2024-10-12 09:44:24 +08:00
chenyu
36056e0760 update mlperf systems and copy 4.1 to 5.0 (#7004) 2024-10-11 16:20:34 -04:00
Markiian Novosad
8831c691e2 Add slice parameter type checking to disallow Tensor usage for slices (#6967)
* add support for single el tensors for slices

* rm trailing spaces

* cleanup long lines

* remove tensor in slice support, add comprehensive err msg

* cleanup getitem, add slice type check

* Edit err message
2024-10-11 16:20:21 -04:00
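A hypothetical sketch of the kind of check this commit adds (names and structure are illustrative, not tinygrad's actual `__getitem__`): reject Tensor-valued slice bounds up front with a readable error instead of failing deep inside the indexing logic.

```python
class Tensor:  # illustrative stand-in, not the real class
    def __getitem__(self, indices):
        if not isinstance(indices, tuple):
            indices = (indices,)
        for i in indices:
            if isinstance(i, slice):
                # Disallow Tensors as slice bounds with a comprehensive error.
                for bound in (i.start, i.stop, i.step):
                    if isinstance(bound, Tensor):
                        raise TypeError("slice bounds must be plain ints, not Tensor")
        return "indexed"  # real indexing logic elided

t = Tensor()
t[0:2]  # plain int bounds: fine
try:
    t[Tensor():2]  # Tensor bound: rejected early with a clear message
except TypeError as e:
    print(e)
```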
Francis Lam
b0dd407cdd ops_cuda: add optional dynamic smem parameter (#6956)
* ops_cuda: add optional dynamic smem parameter

This is required to enable larger-than-48KB shared memory usage on
a per-kernel basis.

* move setting max dynamic smem size to init
2024-10-11 21:51:06 +03:00
chenyu
0e42662f2a log seed at the right place for bert (#7000) 2024-10-11 10:39:40 -04:00
nimlgen
5496a36536 update red mlperf bert readme (#6969) 2024-10-11 13:08:06 +03:00
nimlgen
feb0bcb58b qcom bench bind to perf cluster (#6996) 2024-10-11 12:21:52 +03:00
qazal
7451812bbf delete AST_REWRITE ctx var (#6995) 2024-10-11 11:33:16 +03:00
qazal
7988547df2 start changes from big graph (#6993)
* start changes from big graph [pr]

* space

* still capture ctx
2024-10-11 11:13:46 +03:00
George Hotz
e7a0ffe46a break out linearization [pr] (#6994) 2024-10-11 15:27:33 +08:00
George Hotz
f319530191 don't track simplify [pr] (#6992) 2024-10-11 15:03:03 +08:00
George Hotz
e441794c4b remove custom op support, we waste time maintaining this (#6991)
* remove custom op support, we waste time maintaining this

* customop is over
2024-10-11 14:31:09 +08:00
George Hotz
c08521e823 minor cleanups from toonygrad (#6990) 2024-10-11 14:19:10 +08:00
George Hotz
f50d0e0ee0 cloud device [pr] (#6964)
* first try at cloud device [pr]

* real separation

* we're free

* clang works

* unhappy with timeout

* better timeouts and free

* unrelated

* use http verbs + add test

* lines + better test

* fix DELETE

* shorter cloud

* split key

* fix sending renderer

* PTXRenderer serialization

* add sessions

* http.client

* minor timeout bump

* fix keep-alive

* inc server timeout

* real fix timeout

* that one too
2024-10-11 12:24:06 +08:00
Bhavya Gada
23c09f4b4c add support for padding='same' in nn.conv (#6975)
* add support for padding='same' in nn.conv

* express concisely

* simplify loop

* test same padding with dilation and conv1d

* fix bad indentation

* make loop one liner
2024-10-11 11:39:07 +08:00
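The arithmetic behind `padding='same'` can be sketched in a few lines of plain Python (illustrative, not tinygrad's implementation): for stride 1, each spatial dim needs `dilation * (kernel - 1)` total padding so the output size equals the input size, split as evenly as possible between the two sides.

```python
def same_padding(kernel_sizes, dilations):
    # For a stride-1 conv, out = in + total_pad - dilation*(k-1),
    # so total_pad = dilation*(k-1) keeps the output the same size.
    pads = []
    for k, d in zip(kernel_sizes, dilations):
        total = d * (k - 1)
        pads.append((total // 2, total - total // 2))  # asymmetric when total is odd
    return pads

# A 3x3 kernel with dilation 2 needs 2 pixels of padding on each side.
print(same_padding([3, 3], [2, 2]))  # [(2, 2), (2, 2)]
```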
qazal
54dcea235d viz auto recenter on out of view graph [pr] (#6986) 2024-10-11 02:40:06 +03:00
nimlgen
159ee04489 include qcom in view_supported_devices (#6985)
* include qcom in view_supported_devices

* ignore images
2024-10-11 01:10:51 +03:00
nimlgen
f9d454aed5 correct kernargs alignment (#6984) 2024-10-11 00:06:28 +03:00
qazal
2b17279d4e viz don't default open the browser [pr] (#6983)
* viz don't default open the browser [pr]

* move st

* scale down
2024-10-10 22:12:18 +03:00
qazal
4f60252210 reduce scheduler process replay overhead [pr] (#6981) 2024-10-10 20:03:38 +03:00
Friedrich Carl Eichenroth
859d6d0407 Fix mypy examples/beautiful_*.py (#6978)
* fix mypy examples/beautiful_*.py

* backwards

* add test

* Revert "add test"

This reverts commit 4d88845ba3.

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-10 11:34:29 -04:00
qazal
4ef5310039 track viz context even if rewrite errors [pr] (#6976) 2024-10-10 18:33:15 +03:00
chenyu
592e5f1df2 skip test_viz test_no_dedup_different_opts (#6979) 2024-10-10 11:10:24 -04:00
chenyu
e3dc10f8f6 improve fold_unrolled_divs (#6977)
addressed #6935
The first few terms in fold_unrolled_divs might have been folded already, so the check should first try to add those terms back. There is also a case where every term but one is folded, which is no longer an add chain; for now that is just added as a failing test case.
2024-10-10 10:52:05 -04:00
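The identity this simplification targets can be checked numerically (a pure-Python check, not the tinygrad rewrite rule itself): the add chain an n-way unroll produces, `(x+0)//n + (x+1)//n + ... + (x+n-1)//n`, folds back to `x`.

```python
def unrolled_div_chain(x: int, n: int) -> int:
    # The add chain left behind by an n-way unrolled division.
    return sum((x + i) // n for i in range(n))

# By Hermite's identity, the chain sums to x for non-negative integers,
# which is what fold_unrolled_divs exploits to collapse it.
assert all(unrolled_div_chain(x, n) == x for x in range(100) for n in range(1, 8))
```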
qazal
3481468702 bring viz to core (#6970)
* move viz to core

* pathfix

* move test_viz to core

* cleanup test_viz diff

* use contextvars
2024-10-10 16:56:26 +03:00
nimlgen
fad575ec76 qcom tiny cleanups (#6973) 2024-10-10 12:26:41 +03:00