ignaciosica
0a00187dce
add real AMX tests to benchmark ( #8216 )
...
* add real amx to benchmark
* add debug=2 to check tc are triggered
2024-12-13 14:03:41 -05:00
chenyu
d462f8ace0
use HALF in cifar wino benchmarks ( #8153 )
...
more representative as it hits tensor cores on tinyboxes
2024-12-10 20:21:00 -05:00
George Hotz
f83d715f41
move checks into compile3, delete compile2 [pr] ( #8127 )
...
* move checks into compile3 [pr]
* test_vs_onnx
* test v torch works
* float16 won't compile on compile3
* actually delete compile2
2024-12-09 14:21:42 -08:00
George Hotz
87c360c4b5
hotfix: add --size 8B to llama3
2024-12-09 07:53:20 -08:00
chenyu
3c8c98253a
BEAM_DEBUG=1 in speed_v_theoretical ( #7942 )
...
* DEBUG=3 in speed_v_theoretical
* BEAM_DEBUG=1
2024-11-28 08:30:55 -05:00
chenyu
a6171cbe71
add stable diffusion v2 to mac benchmark ( #7917 )
...
this caught #7902
2024-11-26 22:09:43 -05:00
chenyu
ac57d82a13
test_tiny on real NV/CUDA/AMD/HIP ( #7886 )
...
simple tests that run on real CUDA and HIP
2024-11-24 16:34:54 -05:00
chenyu
5c5b1b994c
less flaky benchmarks ( #7855 )
...
JIT=2 for metal cifar with HALF, and lower tflops for nv test_gemm_4096. failures in https://github.com/tinygrad/tinygrad/actions/runs/11980239535/job/33404098428?pr=7830
2024-11-22 16:39:39 -05:00
chenyu
d5c9fafff5
default run stable diffusion benchmark with fp16 ( #7831 )
...
and keep the non-fp16 one in mac
2024-11-21 15:58:17 -05:00
chenyu
c815d7b56e
run bfloat16 tensor core in metal benchmark ( #7808 )
...
* run bfloat16 tensor core in metal benchmark
* separate task
2024-11-20 15:34:07 -05:00
chenyu
e6cfaaa496
metal benchmark JIT=2 -> JIT=1 ( #7661 )
2024-11-12 22:55:27 -05:00
chenyu
1884f021e3
add conv3x3 to speed_v_theoretical ( #7658 )
...
* add conv3x3 to speed_v_theoretical
* show test duration
2024-11-12 16:41:56 -05:00
chenyu
a88a15c7e8
setup perflevel in red CI ( #7645 )
...
runs v4.1 bert setup.
```
rocm-smi --setprofile compute
rocm-smi --setmclk 3
rocm-smi --setperflevel high
```
2024-11-11 18:44:55 -05:00
chenyu
773d5b60bf
beam benchmark tests ( #7638 )
...
* beam benchmark tests
* lower AMD number somehow
* less flaky
2024-11-11 18:11:18 -05:00
chenyu
bfab03288d
fix HALF=1 in test_speed_v_torch ( #7642 )
...
* fix HALF=1 in test_speed_v_torch
"operation cache defeats" adds 1 to all args, which were centered around 0. adding 1 makes big matmul and matvec go inf.
fixed by subtracting 1 afterwards and bumping the tolerance for half input
* bigger tol for BIG=2, update CI too
* bigger tol
2024-11-11 14:29:37 -05:00
George Hotz
b4cb6b89f9
hotfix: CI mac uses python 3.11
2024-11-11 23:42:35 +08:00
George Hotz
9648372ee6
hotfix: mac uses python 3.12
2024-11-11 23:23:48 +08:00
George Hotz
6f93e91deb
hotfix: lower mnist threshold for non determinism
2024-11-03 11:05:12 +08:00
George Hotz
4fed358511
hotfix: timeouts to 20 minutes. better no stats update than a red x
2024-10-25 16:31:52 +08:00
chenyu
d4c94d0d32
disable llama 1 4gpu and 6gpu benchmark ( #7276 )
...
having llama3 4gpu and 6gpu should be good enough
2024-10-24 14:19:22 -04:00
chenyu
e6929f2402
RUN_PROCESS_REPLAY=0 on llama 70B and resnet training ( #7272 )
...
* RUN_PROCESS_REPLAY=0 on llama 70B and resnet training
also added a 15-minute total timeout, this cannot grow indefinitely
* add a few more
* a few more just for NV
2024-10-24 12:09:54 -04:00
George Hotz
9f4ca88218
hotfix: relax target pct for beautiful_mnist
2024-10-17 12:36:07 +08:00
nimlgen
feb0bcb58b
qcom bench bind to perf cluster ( #6996 )
2024-10-11 12:21:52 +03:00
nimlgen
f9d454aed5
correct kernargs alignment ( #6984 )
2024-10-11 00:06:28 +03:00
qazal
b82023c97e
process replay cleanup to generic _pmap [pr] ( #6929 )
...
* process replay cleanup to generic _pmap [pr]
* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
George Hotz
f45d178a55
hotfix: support JIT_BATCH_SIZE=0, make that the default
2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108
add new model CI
2024-09-25 10:23:06 +08:00
George Hotz
de259e3f09
hotfix: add compile3 to comma CI
2024-09-23 18:25:49 +08:00
qazal
e2d6e10ddf
hotfix: reset benchmarks cache for process replay ( #6671 )
2024-09-23 15:13:02 +08:00
nimlgen
d22b46a2ac
qcom in benchmarks ( #6337 )
2024-09-02 19:59:11 +03:00
chenyu
7d46fb0c83
load balance NV benchmark ci ( #6107 )
2024-08-16 10:08:08 -04:00
nimlgen
8f787785d9
fix openpilot benchmark ( #6049 )
2024-08-12 21:12:32 +03:00
qazal
266afad8ed
hotfix: skip schedule capture in benchmarks ( #6012 )
2024-08-10 17:13:53 +03:00
chenyu
adba5efc64
enable llama 2 70B in tinybox green CI ( #5905 )
...
runnable with MAX_CONTEXT=256
2024-08-04 18:48:46 -04:00
wozeparrot
acadccf344
comma benchmark ( #5518 )
2024-08-02 14:36:54 -07:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark ( #5806 )
2024-07-30 12:05:36 -07:00
qazal
3e49d86c01
process replay diffs 3 things now ( #5731 )
...
* github api infra
* process replay is 3 parts now
* parse benchmarks
* add gh_token
* complete diff
* move process replay tests
* last successful run
* add tempdir
* skip master
2024-07-27 12:52:20 +03:00
George Hotz
db1d093b29
reenable LLaMA-3 8B BEAM on NV ( #5746 )
2024-07-26 16:56:41 -07:00
wozeparrot
6ccb2390c3
feat: update_benchmark_staging ( #5529 )
2024-07-17 20:40:57 -07:00
wozeparrot
218e157f00
benchmark on update_benchmark_staging ( #5541 )
2024-07-17 17:11:52 -07:00
chenyu
b17e4adb3a
add -c advice.detachedHead=false to process replay git checkout ( #5419 )
...
remove the noisy `Note: switching to 'origin/master'.
You are in 'detached HEAD' state. You can look around, make experimental
changes...` in log
2024-07-12 15:13:26 -04:00
qazal
31fcc516dc
more process replay tooling ( #5407 )
...
* replays
* what's in there
* can it be up there
* sha is enough
* insert sha as the key
* fix str
* update reset utils
* that nested try/except was terrible
* github_context can go
2024-07-12 13:11:34 +03:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples ( #5350 )
...
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively made gpt2 default jit
* fix test_gpt2
2024-07-09 15:04:43 -04:00
chenyu
191463a919
add timing to SDXL ( #5273 )
2024-07-02 23:29:54 -04:00
chenyu
5808c37302
hotfix disable flaky llama3 beam benchmark on green ( #5249 )
2024-07-01 15:00:47 -04:00
chenyu
b9122ecdaf
revert stable diffusion validation with threefry ( #5248 )
...
* Revert "use threefry in stable diffusion benchmark (#4988)"
This reverts commit 44dfa37c70.
* sdxl and validation fix
* relax threshold
2024-07-01 14:43:47 -04:00
chenyu
88763eb9ff
fix stable_diffusion with fp16 ( #5239 )
2024-06-30 12:59:31 -04:00
nimlgen
6b08cb5e38
ptx runs on nv in benchmarks ( #5224 )
2024-06-29 11:06:44 +03:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark ( #5211 )
...
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
d8dc43ad06
remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark ( #5198 )
...
this no longer helps
2024-06-27 15:20:34 -04:00