chenyu
ed1ed9e4ff
bert use BS=72 ( #7015 )
memory 131 -> 138
green tflops 201 -> 209
red tflops 160 -> 169
2024-10-12 09:41:56 -04:00
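The BS=72 entry logs absolute before/after figures only. This small sketch turns them into relative changes. The log does not give a unit for the memory figure, so the sketch treats all three as plain numbers.

```python
# Before/after figures copied from the "bert use BS=72" commit message.
metrics = {
    "memory": (131, 138),
    "green tflops": (201, 209),
    "red tflops": (160, 169),
}

def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

for name, (before, after) in metrics.items():
    print(f"{name}: {before} -> {after} ({pct_change(before, after):+.1f}%)")
```

By this arithmetic, throughput rises by roughly 4 to 6 percent on both machines, while the memory figure grows by a similar proportion.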
George Hotz
cba4b9a058
clean up ops file [pr] ( #7013 )
2024-10-12 19:53:52 +08:00
qazal
746a1f8c86
prep uoping diff for big graph [pr] ( #7014 )
2024-10-12 14:09:32 +03:00
ignaciosica
334f499e6a
consistent render of recip in cuda with CStyleLanguage ( #6980 )
2024-10-12 18:56:47 +08:00
George Hotz
a71bb09ec3
remove symbolic file [pr] ( #7012 )
2024-10-12 18:44:44 +08:00
George Hotz
16271189ea
hotfix: don't spend lines on a (broken) favicon
2024-10-12 18:21:10 +08:00
George Hotz
b737ee5bac
move to_indexed_uops to uops ( #7011 )
* move to_indexed_uops to uops
* UOp.range
2024-10-12 18:20:57 +08:00
George Hotz
5ae2de9845
UOp.variable ( #7010 )
* UOp.variable [pr]
* fix tests
* clean
* improve name rendering
* last bug
2024-10-12 18:20:44 +08:00
Bhavya Gada
f79e05cac0
add types in all nn/init.py classes ( #7002 )
* add types in batchnorm class
* fix lint error in batchnorm types
* add types to conv1d function
* add types to convtranspose1d func and conv2d, convtranspose2d classes
* add types to all remaining classes
* change conv1d padding type to also accept str
* less is more; only keep non-obvious types
* mkdocs need types
2024-10-12 14:42:14 +08:00
ignaciosica
2bb6b95e9f
refactor _make_hip_code_for_op into pm rules ( #7001 )
2024-10-12 12:46:22 +08:00
George Hotz
5c9f76e274
hotfix: openpilot compile3 compare to i==1
2024-10-12 09:44:24 +08:00
chenyu
36056e0760
update mlperf systems and copy 4.1 to 5.0 ( #7004 )
2024-10-11 16:20:34 -04:00
Markiian Novosad
8831c691e2
Add slice parameter type checking to disallow Tensor usage for slices ( #6967 )
* add support for single el tensors for slices
* rm trailing spaces
* cleanup long lines
* remove tensor in slice support, add comprehensive err msg
* cleanup getitem, add slice type check
* Edit err message
2024-10-11 16:20:21 -04:00
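The slice commit rejects Tensor values used as slice bounds. The following is a minimal sketch of that kind of check. It assumes that only `int` or `None` are acceptable bounds. `check_slice` and `FakeTensor` are hypothetical stand-ins, so this is not tinygrad's actual code or error message.

```python
class FakeTensor:
    """Stand-in for a Tensor object so the sketch runs without tinygrad."""

def check_slice(s: slice) -> slice:
    """Hypothetical check: every slice bound must be an int or None."""
    for part in ("start", "stop", "step"):
        value = getattr(s, part)
        # Anything else (e.g. a Tensor) is rejected with a descriptive message.
        if value is not None and not isinstance(value, int):
            raise TypeError(f"slice {part} must be int or None, got {type(value).__name__}")
    return s
```

With this check, `check_slice(slice(1, 5, 2))` passes unchanged, and `check_slice(slice(FakeTensor(), 5))` raises `TypeError` before any indexing happens.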
Francis Lam
b0dd407cdd
ops_cuda: add optional dynamic smem parameter ( #6956 )
* ops_cuda: add optional dynamic smem parameter
This is required to enable more than 48 KB of shared memory on a per-kernel basis.
* move setting max dynamic smem size to init
2024-10-11 21:51:06 +03:00
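CUDA only allows a kernel to use more than 48 KB of shared memory per block if it uses dynamic shared memory and explicitly opts in. In the driver API, the opt-in is `cuFuncSetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES`, and the byte count is passed as `sharedMemBytes` to `cuLaunchKernel`. The sketch below models that call order without a GPU. `KernelPlan` is hypothetical and does not reproduce tinygrad's `ops_cuda` code. Only the named driver functions and attribute are real.

```python
from dataclasses import dataclass, field

# Above this many bytes per block, a kernel must opt in to larger dynamic smem.
SMEM_OPT_IN_THRESHOLD = 48 * 1024


@dataclass
class KernelPlan:
    """Hypothetical record of the CUDA driver calls a program object would issue.

    Nothing here touches a GPU; the tuples only name real driver API entry
    points so the ordering can be inspected.
    """
    name: str
    dynamic_smem: int = 0  # bytes of dynamic shared memory requested per block
    init_calls: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Done once at init (cf. "move setting max dynamic smem size to init"):
        # raise the kernel's cap only when the request exceeds 48 KB.
        if self.dynamic_smem > SMEM_OPT_IN_THRESHOLD:
            self.init_calls.append((
                "cuFuncSetAttribute",
                "CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES",
                self.dynamic_smem,
            ))

    def launch_call(self, grid: tuple, block: tuple) -> tuple:
        # Every launch still passes the byte count as sharedMemBytes.
        return ("cuLaunchKernel", {"grid": grid, "block": block,
                                   "sharedMemBytes": self.dynamic_smem})
```

The attribute is set once at init rather than on every launch, so each launch only has to pass `sharedMemBytes`.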
chenyu
0e42662f2a
log seed at the right place for bert ( #7000 )
2024-10-11 10:39:40 -04:00
nimlgen
5496a36536
update red mlperf bert readme ( #6969 )
2024-10-11 13:08:06 +03:00
nimlgen
feb0bcb58b
qcom bench bind to perf cluster ( #6996 )
2024-10-11 12:21:52 +03:00
qazal
7451812bbf
delete AST_REWRITE ctx var ( #6995 )
2024-10-11 11:33:16 +03:00
qazal
7988547df2
start changes from big graph ( #6993 )
* start changes from big graph [pr]
* space
* still capture ctx
2024-10-11 11:13:46 +03:00
George Hotz
e7a0ffe46a
break out linearization [pr] ( #6994 )
2024-10-11 15:27:33 +08:00
George Hotz
f319530191
don't track simplify [pr] ( #6992 )
2024-10-11 15:03:03 +08:00
George Hotz
e441794c4b
remove custom op support, we waste time maintaining this ( #6991 )
* remove custom op support, we waste time maintaining this
* customop is over
2024-10-11 14:31:09 +08:00
George Hotz
c08521e823
minor cleanups from toonygrad ( #6990 )
2024-10-11 14:19:10 +08:00
George Hotz
f50d0e0ee0
cloud device [pr] ( #6964 )
* first try at cloud device [pr]
* real separation
* we're free
* clang works
* unhappy with timeout
* better timeouts and free
* unrelated
* use http verbs + add test
* lines + better test
* fix DELETE
* shorter cloud
* split key
* fix sending renderer
* PTXRenderer serialization
* add sessions
* http.client
* minor timeout bump
* fix keep-alive
* inc server timeout
* real fix timeout
* that one too
2024-10-11 12:24:06 +08:00
Bhavya Gada
23c09f4b4c
add support for padding='same' in nn.conv ( #6975 )
* add support for padding='same' in nn.conv
* express concisely
* simplify loop
* test same padding with dilation and conv1d
* fix bad indentation
* make loop one liner
2024-10-11 11:39:07 +08:00
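`padding='same'` has to choose pads that preserve each spatial length, including when dilation is used. The sketch below assumes stride 1, the only stride PyTorch accepts for `'same'`, and puts any odd leftover pad on the right, as PyTorch does. The helper names are hypothetical, and the sketch does not reproduce tinygrad's pad-tuple ordering.

```python
def same_padding(kernel_size, dilation):
    """Per-dimension (left, right) pads so a stride-1 conv keeps its input length."""
    pads = []
    for k, d in zip(kernel_size, dilation):
        total = d * (k - 1)                 # dilated kernel extent minus one
        left = total // 2
        pads.append((left, total - left))   # an odd extra unit goes on the right
    return pads

def conv_out_len(n, k, d, pad_left, pad_right, stride=1):
    """Standard conv output-length formula, used to check the pads."""
    return (n + pad_left + pad_right - d * (k - 1) - 1) // stride + 1
```

For example, a kernel of 4 with dilation 1 gets pads `(1, 2)`, and a kernel of 3 with dilation 2 gets `(2, 2)`. In both cases `conv_out_len` returns the input length.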
qazal
54dcea235d
viz auto recenter on out of view graph [pr] ( #6986 )
2024-10-11 02:40:06 +03:00
nimlgen
159ee04489
include qcom in view_supported_devices ( #6985 )
* include qcom in view_supported_devices
* ignore images
2024-10-11 01:10:51 +03:00
nimlgen
f9d454aed5
correct kernargs alignment ( #6984 )
2024-10-11 00:06:28 +03:00
qazal
2b17279d4e
viz don't default open the browser [pr] ( #6983 )
* viz don't default open the browser [pr]
* move st
* scale down
2024-10-10 22:12:18 +03:00
qazal
4f60252210
reduce scheduler process replay overhead [pr] ( #6981 )
2024-10-10 20:03:38 +03:00
Friedrich Carl Eichenroth
859d6d0407
Fix mypy examples/beautiful_*.py ( #6978 )
* fix mypy examples/beautiful_*.py
* backwards
* add test
* Revert "add test"
This reverts commit 4d88845ba3.
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2024-10-10 11:34:29 -04:00
qazal
4ef5310039
track viz context even if rewrite errors [pr] ( #6976 )
2024-10-10 18:33:15 +03:00
chenyu
592e5f1df2
skip test_viz test_no_dedup_different_opts ( #6979 )
2024-10-10 11:10:24 -04:00
chenyu
e3dc10f8f6
improve fold_unrolled_divs ( #6977 )
addressed #6935
the first few terms in fold_unrolled_divs might have been folded already, so the check should first try to add those terms back. there is a case where all but one term is folded, which is no longer an add chain, so it is just added as a failing test case for now
2024-10-10 10:52:05 -04:00
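This sketch assumes that fold_unrolled_divs targets sums of the form `x//d + (x+1)//d + ... + (x+d-1)//d`, which the log does not spell out. Under that assumption, the rewrite relies on Hermite's identity: for any integer x, those d floor-division terms sum to exactly x. With C-style truncating division the identity needs x ≥ 0. The pattern check `covers_full_unroll` is hypothetical. It does not model the "already folded" or "all but one term folded" cases mentioned above.

```python
def sum_of_unrolled_divs(x: int, d: int) -> int:
    # x//d + (x+1)//d + ... + (x+d-1)//d, using Python's floor division
    return sum((x + k) // d for k in range(d))

def covers_full_unroll(offsets, d):
    """Hypothetical pattern check: offsets must be exactly 0..d-1, each once,
    for the sum of (x+offset)//d terms to be replaceable by x."""
    return sorted(offsets) == list(range(d))
```

If a term is missing, for example offsets `[1, 2, 3]` with `d=4`, the check fails and the sum cannot be rewritten to x. That is why terms folded away earlier have to be recovered first.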
qazal
3481468702
bring viz to core ( #6970 )
...
* move viz to core
* pathfix
* move test_viz to core
* cleanup test_viz diff
* use contextvars
2024-10-10 16:56:26 +03:00
nimlgen
fad575ec76
qcom tiny cleanups ( #6973 )
2024-10-10 12:26:41 +03:00
qazal
3724a66716
move test_viz to test/, prereq for tinygrad/viz [pr] ( #6972 )
2024-10-10 11:40:46 +03:00
Kinvert
960c495755
added beautiful fashion mnist and example ( #6961 )
* added beautiful fashion mnist and example
* fixing whitespace
* refactor Fashion MNIST to fewer lines
* fix newline to reduce diff
* Update beautiful_mnist.py
* Update beautiful_mnist.py
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-10-10 12:01:07 +08:00
chenyu
b5546912e2
10% more TRAIN_STEPS for bert ( #6971 )
got two very close runs, adding more steps as a buffer
2024-10-09 19:21:43 -04:00
nimlgen
f90d8493cc
add HCQDEV_WAIT_TIMEOUT_MS ( #6968 )
2024-10-09 19:50:00 +03:00
chenyu
35cf48659b
limit beam param for bert on green ( #6966 )
seems to mitigate the crash
2024-10-09 11:48:18 -04:00
mesozoic-egg
0e8bcda07e
get readable error from wait_check ( #6965 )
...
Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-10-09 17:28:58 +03:00
qazal
20d3c2d113
unify UOps.SHAPETRACKER and UOps.SWIZZLE with UOps.VIEW ( #6955 )
* add UOps.VIEW
* update hardcoded asts
* update sops.gz
2024-10-09 02:00:17 +08:00
nimlgen
137ad5519f
amd fix cwsr for gfx11 ( #6950 )
* amd cwsr
* ()
2024-10-08 17:44:29 +03:00
nimlgen
0d526e251e
nv sync on gpu before local update ( #6954 )
2024-10-08 17:43:58 +03:00
qazal
2800520dd5
even smaller process_replay.py [pr] ( #6941 )
* even smaller process_replay.py [pr]
* delete those tests
* dedup asts
2024-10-08 20:43:22 +08:00
qazal
851f39653a
rename to BUFFER_VIEW + MetaOps cleanup ( #6953 )
2024-10-08 20:09:22 +08:00
chenyu
1ff2c98f8a
fix logfile name for bert red ( #6952 )
2024-10-08 05:37:52 -04:00
czhu
08bfa8632b
embedding shape ( #6930 )
2024-10-08 14:42:20 +08:00
vladov
20a9683403
Make self.fd Optional. ( #6855 )
* Make self.fd Optional.
* Fix io_uring when missing fd.
* Compress io_uring fast path code.
2024-10-08 13:25:34 +08:00