Commit Graph

1003 Commits

Author SHA1 Message Date
chenyu
ac57d82a13 test_tiny on real NV/CUDA/AMD/HIP (#7886)
simple tests that run on real CUDA and HIP
2024-11-24 16:34:54 -05:00
chenyu
5c5b1b994c less flaky benchmarks (#7855)
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
d5c9fafff5 default run stable diffusion benchmark with fp16 (#7831)
and keep the non-fp16 one on mac
2024-11-21 15:58:17 -05:00
chenyu
46aa23539f generate and print mypy lineprecision report (#7809) 2024-11-20 16:53:17 -05:00
chenyu
c815d7b56e run bfloat16 tensor core in metal benchmark (#7808)
* run bfloat16 tensor core in metal benchmark

* separate task
2024-11-20 15:34:07 -05:00
chenyu
d5f76462c8 fix CI beautiful_mnist dir (#7790)
fixed `fatal: not a git repository (or any of the parent directories): .git` because $HOME is not $GITHUB_WORKSPACE
2024-11-19 09:59:02 -05:00
George Hotz
fbb4099b3c add test for compile3 [pr] (#7783)
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2024-11-19 19:26:51 +08:00
chenyu
9fb396f660 test_ops maxpool2d -> max_pool2d (#7696)
and avgpool2d -> avg_pool2d, to make the tests easier to grep
2024-11-14 10:39:12 -05:00
chenyu
e6cfaaa496 metal benchmark JIT=2 -> JIT=1 (#7661) 2024-11-12 22:55:27 -05:00
chenyu
1884f021e3 add conv3x3 to speed_v_theoretical (#7658)
* add conv3x3 to speed_v_theoretical

* show test duration
2024-11-12 16:41:56 -05:00
chenyu
a88a15c7e8 setup perflevel in red CI (#7645)
runs v4.1 bert setup.
```
rocm-smi --setprofile compute
rocm-smi --setmclk 3
rocm-smi --setperflevel high
```
2024-11-11 18:44:55 -05:00
chenyu
773d5b60bf beam benchmark tests (#7638)
* beam benchmark tests

* lower AMD number somehow

* less flaky
2024-11-11 18:11:18 -05:00
chenyu
bfab03288d fix HALF=1 in test_speed_v_torch (#7642)
* fix HALF=1 in test_speed_v_torch

"operation cache defeats" adds 1 to all arg, which were centered around 0. adding 1 makes big matmul and matvec go inf.

fixed by subtract 1 after and bumpped tolerance for half input

* bigger tol for BIG=2, update CI too

* bigger tol
2024-11-11 14:29:37 -05:00
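The overflow described in that fix is easy to reproduce outside tinygrad. A minimal numpy sketch (not part of the commit) showing why shifting half-precision inputs away from zero breaks large reductions: fp16's largest finite value is 65504, so partial sums over many values near 1.0 can blow up, while zero-centered inputs keep them small.

```python
import numpy as np

# fp16 (half precision) has a maximum finite value of 65504
assert np.finfo(np.float16).max == 65504

# any single add past that range overflows to inf
big = np.float16(60000) + np.float16(60000)
assert np.isinf(big)

# a large reduction over values centered around 1 (e.g. after the
# "add 1 to all args" cache-defeat) heads toward 2^20 and can overflow
# when accumulated in fp16:
s = np.ones(1 << 20, dtype=np.float16).sum()
print(s)  # far outside fp16 range

# zero-centered values keep the running sum small, which is why
# subtracting the 1 back out before the op avoids the inf
```

This is why the fix recenters the args after defeating the cache instead of leaving the +1 shift in place.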
George Hotz
b4cb6b89f9 hotfix: CI mac uses python 3.11 2024-11-11 23:42:35 +08:00
George Hotz
9648372ee6 hotfix: mac uses python 3.12 2024-11-11 23:23:48 +08:00
George Hotz
d40673505f new cloud is cloudy [pr] (#7631)
* new cloud is cloudy [pr]

* waste lines to add security

* safety, with speed and less lines

* timing and del

* lines

* cleanups

* restore CloudSession

* bump to 3.10

* quotes

* renderer security
2024-11-11 20:18:04 +08:00
chenyu
e7b18cf5c0 fix load_worlds filter_novariable (#7564)
filter based on "DEFINE_VAR" instead of "Variable". also added a unit test to make sure dataset includes image and variable kernels
2024-11-05 16:06:39 -05:00
chenyu
207bca6cea set PAGE_SIZE=1 and generate new dataset (#7559)
13080 rows in total. both generating and loading this are pretty broken now; the filters are wrong, for example
2024-11-05 11:25:01 -05:00
George Hotz
6f93e91deb hotfix: lower mnist threshold for non determinism 2024-11-03 11:05:12 +08:00
George Hotz
72a9ac27e9 support image dtype in cloud [pr] (#7482)
* support image dtype in cloud [pr]

* remove outdated osx hack

* unused imports
2024-11-02 23:54:27 +08:00
George Hotz
133fe81cc5 Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)" (#7407)
* Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)"

This reverts commit ea5654a9bc.

* test padded in emulation too

* bring back early folding
2024-10-30 23:25:45 +08:00
George Hotz
d9d4dd6756 faster ci [pr] (#7348) 2024-10-29 14:01:44 +08:00
George Hotz
a5e0f59e41 move autogen to different CI runner [pr] (#7346)
* move autogen to different CI runner [pr]

* balance a bit

* readme back there

* compile enet in autogen
2024-10-29 13:35:22 +08:00
George Hotz
f55c3dcff8 hotfix: bump ocelot 2024-10-29 12:46:24 +08:00
George Hotz
4fed358511 hotfix: timeouts to 20 minutes. better no stats update than a red x 2024-10-25 16:31:52 +08:00
chenyu
d4c94d0d32 disable llama 1 4gpu and 6gpu benchmark (#7276)
having llama3 4gpu and 6gpu should be good enough
2024-10-24 14:19:22 -04:00
chenyu
e6929f2402 RUN_PROCESS_REPLAY=0 on llama 70B and resnet training (#7272)
* RUN_PROCESS_REPLAY=0 on llama 70B and resnet training

also added a 15-minute total timeout; this cannot grow indefinitely

* add a few more

* a few more just for NV
2024-10-24 12:09:54 -04:00
qazal
4cf7cca91a delete fuzz_schedule [pr] (#7144) 2024-10-18 15:09:39 +03:00
George Hotz
9f4ca88218 hotfix: relax target pct for beautiful_mnist 2024-10-17 12:36:07 +08:00
chenyu
d12c87dc8e use ubuntu-22.04 in CI (#7068)
ubuntu-latest points to 24.04 now, maybe it's this?
2024-10-15 09:44:59 -04:00
chenyu
fbaab30fe3 add timing to fuzz_linearizer (#7056)
and applied smaller FUZZ_MAX_SIZE. this is getting quite slow in CI
2024-10-14 11:57:41 -04:00
nimlgen
feb0bcb58b qcom bench bind to perf cluster (#6996) 2024-10-11 12:21:52 +03:00
George Hotz
f50d0e0ee0 cloud device [pr] (#6964)
* first try at cloud device [pr]

* real separation

* we're free

* clang works

* unhappy with timeout

* better timeouts and free

* unrelated

* use http verbs + add test

* lines + better test

* fix DELETE

* shorter cloud

* split key

* fix sending renderer

* PTXRenderer serialization

* add sessions

* http.client

* minor timeout bump

* fix keep-alive

* inc server timeout

* real fix timeout

* that one too
2024-10-11 12:24:06 +08:00
nimlgen
f9d454aed5 correct kernargs alignment (#6984) 2024-10-11 00:06:28 +03:00
qazal
3724a66716 move test_viz to test/, prereq for tinygrad/viz [pr] (#6972) 2024-10-10 11:40:46 +03:00
qazal
b82023c97e process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]

* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
George Hotz
0d6216aba1 bump the download cache (#6896) 2024-10-05 10:23:18 +08:00
George Hotz
0f28e93224 add pickle support for pattern matchers [run_process_replay] (#6816)
* add pickle support for pattern matchers [run_process_replay]

* cleaner and all

* no closures

* fix tests

* revert that

* final

* cleaner

* python 3.8 fix

* add round trip back

* this

* waste lines on this. that's the final line count

* max print better

* more targetted fix

* regrettably add 3.8 support
2024-09-30 21:54:46 +08:00
wozeparrot
2b899164c6 no numpy (#6751) 2024-09-26 16:40:18 +08:00
wozeparrot
c100f3d406 default threefry (#6116) 2024-09-25 17:45:13 +08:00
George Hotz
dd575da7ee real minimum cstyle change (#6709)
* real minimum cstyle change

* make it match

* bring back DEFINE_GLOBAL store marking writable

* bump line count to 9800

* closer

* precompute don't render

* cast/bitcast too

* smem_align

* vectorize

* more pr match

* remove that test

* less PR diff
2024-09-25 12:40:46 +08:00
George Hotz
f45d178a55 hotfix: support JIT_BATCH_SIZE=0, make that the default 2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108 add new model CI 2024-09-25 10:23:06 +08:00
George Hotz
b0ffe2452b bump line count to 9800 2024-09-25 09:15:30 +08:00
George Hotz
de259e3f09 hotfix: add compile3 to comma CI 2024-09-23 18:25:49 +08:00
qazal
e2d6e10ddf hotfix: reset benchmarks cache for process replay (#6671) 2024-09-23 15:13:02 +08:00
chenyu
26ebb7cab4 don't use div_folding in lt_folding (#6666)
* don't use div_folding in lt_folding

valids 35 -> 13

* fails the same as before
2024-09-23 01:50:18 -04:00
chenyu
da5b741656 removed valid in openpilot conv (#6619)
35 valids left
2024-09-23 00:30:18 -04:00
chenyu
1923932339 canonicalize simplex lt (#6658)
(X := a0*x0 + a1*x1 + ...) > 0 is equivalent to x0 + x1 + ... > 0 if xi >= 0 and ai > 0 for ints
2024-09-22 23:04:47 -04:00
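The equivalence stated in that message can be sanity-checked by brute force. A small standalone sketch (not tinygrad code) enumerating non-negative integer xi and positive coefficients ai: since every term ai*xi is non-negative, the weighted sum is positive exactly when some xi is positive, which is exactly when the plain sum is positive.

```python
import itertools

# check: for ints xi >= 0 and coefficients ai > 0,
#   a0*x0 + a1*x1 + a2*x2 > 0  <=>  x0 + x1 + x2 > 0
for a in itertools.product(range(1, 4), repeat=3):      # ai > 0
    for x in itertools.product(range(0, 4), repeat=3):  # xi >= 0
        weighted = sum(ai * xi for ai, xi in zip(a, x))
        assert (weighted > 0) == (sum(x) > 0)
print("equivalence holds on all sampled cases")
```

The canonicalization drops the coefficients, so structurally different simplex expressions simplify to the same comparison.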
chenyu
5707503048 x//a<b -> x <a*b for positive a (#6622)
openpilot valids 47 -> 37
2024-09-20 04:38:47 -04:00
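The rewrite rule above can likewise be checked exhaustively over a small range. A standalone sketch using Python's floor division to mirror integer //: for integer b and a > 0, floor(x/a) < b holds exactly when x/a < b, i.e. x < a*b, including for negative x.

```python
# check: for a > 0,  x // a < b  <=>  x < a * b  (floor division)
for a in range(1, 6):
    for b in range(-5, 6):
        for x in range(-30, 31):
            assert (x // a < b) == (x < a * b)
print("rewrite verified on all sampled cases")
```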