George Hotz
4fed358511
hotfix: timeouts to 20 minutes. better no stats update than a red x
2024-10-25 16:31:52 +08:00
chenyu
d4c94d0d32
disable llama 1 4gpu and 6gpu benchmark (#7276)
having llama3 4gpu and 6gpu should be good enough
2024-10-24 14:19:22 -04:00
chenyu
e6929f2402
RUN_PROCESS_REPLAY=0 on llama 70B and resnet training (#7272)
* RUN_PROCESS_REPLAY=0 on llama 70B and resnet training
also added a 15 minute total timeout; this cannot grow indefinitely
* add a few more
* a few more just for NV
2024-10-24 12:09:54 -04:00
George Hotz
9f4ca88218
hotfix: relax target pct for beautiful_mnist
2024-10-17 12:36:07 +08:00
nimlgen
feb0bcb58b
qcom bench bind to perf cluster (#6996)
2024-10-11 12:21:52 +03:00
nimlgen
f9d454aed5
correct kernargs alignment (#6984)
2024-10-11 00:06:28 +03:00
qazal
b82023c97e
process replay cleanup to generic _pmap [pr] (#6929)
* process replay cleanup to generic _pmap [pr]
* delete `COMPARE_SCHEDULE`
2024-10-07 13:57:05 +08:00
George Hotz
f45d178a55
hotfix: support JIT_BATCH_SIZE=0, make that the default
2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108
add new model CI
2024-09-25 10:23:06 +08:00
George Hotz
de259e3f09
hotfix: add compile3 to comma CI
2024-09-23 18:25:49 +08:00
qazal
e2d6e10ddf
hotfix: reset benchmarks cache for process replay (#6671)
2024-09-23 15:13:02 +08:00
nimlgen
d22b46a2ac
qcom in benchmarks (#6337)
2024-09-02 19:59:11 +03:00
chenyu
7d46fb0c83
load balance NV benchmark ci (#6107)
2024-08-16 10:08:08 -04:00
nimlgen
8f787785d9
fix openpilot benchmark (#6049)
2024-08-12 21:12:32 +03:00
qazal
266afad8ed
hotfix: skip schedule capture in benchmarks (#6012)
2024-08-10 17:13:53 +03:00
chenyu
adba5efc64
enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256 (see the sketch below)
2024-08-04 18:48:46 -04:00
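For context, MAX_CONTEXT here is an environment knob that caps the generation context so the 70B model fits in CI. A minimal sketch of that pattern; the 4096 default and the trim helper are assumptions, not taken from this commit:

```python
# hedged sketch: cap generation context via the MAX_CONTEXT env var, as the
# commit body describes. The default and helper below are assumptions.
from tinygrad.helpers import getenv

MAX_CONTEXT = getenv("MAX_CONTEXT", 4096)  # CI sets MAX_CONTEXT=256 so 70B fits

def trim_context(tokens: list[int]) -> list[int]:
  # keep only the most recent MAX_CONTEXT tokens before each forward pass
  return tokens[-MAX_CONTEXT:]
```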
wozeparrot
acadccf344
comma benchmark (#5518)
2024-08-02 14:36:54 -07:00
wozeparrot
eebb1b9922
feat: temperature 0 llama3 benchmark (#5806)
2024-07-30 12:05:36 -07:00
qazal
3e49d86c01
process replay diffs 3 things now (#5731)
* github api infra
* process replay is 3 parts now
* parse benchmarks
* add gh_token
* complete diff
* move process replay tests
* last successful run
* add tempdir
* skip master
2024-07-27 12:52:20 +03:00
George Hotz
db1d093b29
reenable LLaMA-3 8B BEAM on NV (#5746)
2024-07-26 16:56:41 -07:00
wozeparrot
6ccb2390c3
feat: update_benchmark_staging (#5529)
2024-07-17 20:40:57 -07:00
wozeparrot
218e157f00
benchmark on update_benchmark_staging (#5541)
2024-07-17 17:11:52 -07:00
chenyu
b17e4adb3a
add -c advice.detachedHead=false to process replay git checkout (#5419)
remove the noisy `Note: switching to 'origin/master'.
You are in 'detached HEAD' state. You can look around, make experimental
changes...` in log
2024-07-12 15:13:26 -04:00
qazal
31fcc516dc
more process replay tooling (#5407)
* replays
* what's in there
* can it be up there
* sha is enough
* insert sha as the key
* fix str
* update reset utils
* that nested try/except was terrible
* github_context can go
2024-07-12 13:11:34 +03:00
chenyu
322c37e621
use helpers.JIT in llama and gpt2 examples (#5350)
* use helpers.JIT in llama and gpt2 examples
replaced getenv("JIT"), effectively making gpt2 default to JIT (see the sketch below)
* fix test_gpt2
2024-07-09 15:04:43 -04:00
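A minimal before/after sketch of the change described above, assuming helpers exposes a JIT ContextVar as the title says; the defaults shown are assumptions:

```python
# hedged sketch of the described change, not the exact diff from the PR.
from tinygrad.helpers import JIT, getenv

# before: each example rolled its own env lookup, so gpt2 defaulted to no jit
use_jit = bool(getenv("JIT", 0))
# after: the shared helpers.JIT context variable decides, making gpt2
# jitted by default while JIT=0 on the command line still disables it
use_jit = bool(JIT.value)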
chenyu
191463a919
add timing to SDXL (#5273)
2024-07-02 23:29:54 -04:00
chenyu
5808c37302
hotfix: disable flaky llama3 beam benchmark on green (#5249)
2024-07-01 15:00:47 -04:00
chenyu
b9122ecdaf
revert stable diffusion validation with threefry (#5248)
* Revert "use threefry in stable diffusion benchmark (#4988 )"
This reverts commit 44dfa37c70 .
* sdxl and validation fix
* relax threshold
2024-07-01 14:43:47 -04:00
chenyu
88763eb9ff
fix stable_diffusion with fp16 (#5239)
2024-06-30 12:59:31 -04:00
nimlgen
6b08cb5e38
ptx runs on nv in benchmarks (#5224)
2024-06-29 11:06:44 +03:00
chenyu
7090eac8cb
validate sdxl output and put it in benchmark (#5211)
* validate sdxl output and put it in benchmark
* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
d8dc43ad06
remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark (#5198)
this no longer helps
2024-06-27 15:20:34 -04:00
chenyu
83da8b3558
use NV instead of CUDA in benchmark (#5192)
also reenabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b
CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark (#5191)
ignoring the beam cache while keeping the compile cache should be fine, and it saves some benchmark time.
also updated `beam_search` to check the flag value before accessing the diskcache (see the sketch below)
2024-06-27 13:15:18 -04:00
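A minimal sketch of the flag-gating pattern that commit describes: consult IGNORE_BEAM_CACHE before touching the diskcache at all. The wrapper below is hypothetical, not tinygrad's actual `beam_search`:

```python
# hedged sketch: gate every diskcache access behind the IGNORE_BEAM_CACHE flag.
from tinygrad.helpers import getenv, diskcache_get, diskcache_put

def beam_search_cached(key: str, run_beam_search):
  use_cache = not getenv("IGNORE_BEAM_CACHE", 0)    # check the flag first
  if use_cache and (hit := diskcache_get("beam_search", key)) is not None:
    return hit                                      # reuse a cached beam result
  result = run_beam_search()                        # the expensive search itself
  if use_cache: diskcache_put("beam_search", key, result)
  return result
```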
chenyu
c12de4f47d
benchmark use JITBEAM for llama and gpt2 (#5189)
2024-06-27 12:56:02 -04:00
chenyu
e9c6a36894
remove CACHELEVEL=0 in llama3 benchmark (#5025)
2024-06-17 22:43:16 -04:00
George Hotz
bee8fc29ee
add GPT2 half/half+beam to AMD (#5000)
* add GPT2 half/half+beam to AMD
* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
44dfa37c70
use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10. easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9
more tinychat fixes (#4971)
2024-06-15 16:29:39 -07:00
qazal
ff8e9eefc3
hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]
* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06
Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title
* idk how this thing works btw
* test if it will work
* try 2
* Revert "idk how this thing works btw"
This reverts commit 580da51b07.
* Revert "try 2"
This reverts commit 7ff1e86d5d.
* test if it works
* meh
* Reapply "idk how this thing works btw"
This reverts commit dd33ad7c14.
* revert
2024-06-15 16:21:00 +03:00
George Hotz
f42183ba28
hotfix: relax cifar to 93.2
2024-06-09 13:09:21 +02:00
nimlgen
6327b50e51
amd in benchmarks (#4861)
* amd in benchmarks
* remove all hsa
2024-06-08 23:24:46 +03:00
qazal
240d6b5bc0
process replay benchmarks (#4668)
2024-06-01 14:36:21 +03:00
chenyu
38bc38cdff
fix llama example quantize (#4699)
* fix llama example quantize
import quantize layers from new example llama3
add to mac benchmark
* fix that
* save the files
2024-05-23 15:35:26 -04:00
chenyu
72560e30fe
add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM
* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
wozeparrot
00432496d7
feat: tinyboxgreen (#4366)
* feat: tinyboxgreen
* feat: tinyboxgreenv2
* fix symlink weights
* fix: remove llama 2 70b for now
* feat: naming
* fix: remove extra cifar steps
* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
chenyu
8a0d1ca7bb
CI test timeout 20 min -> 10 min (#4645)
if it takes more than 10 minutes, setup usually fails anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
George Hotz
07b350a8f4
new uops is an actual graph (#4560)
* new uops is an actual graph
* it's way slower
* simpler
* fix define acc
* render_loop unique
* ops test pass
* add pattern matcher back, there's bugs
* rewrite
* use priority queue
* recursive children
* fix tests
* fix tests with SINK
* fix abstractions
* fix assembly
* simpler
* link define_acc
* fix DEFINE_ACC placement
* type verify
* full cmp
* fix cmp
* ACCESS_ACC
* insert DEFINE_ACC
* fix PHI
* recursive rewrite
* fix many tests
* sum collapse
* more patterns
* correct change
* fold arange
* fix that lin test
* space
* big folding rule works
* close
* has more maxes, meh
* cached node replace
* set changed
* simplest folding yet
* works
* works
* DIV
* all tests pass
* del
* fuzz linearizer fails
* sum_collapse
* test depth 2 cf
* fix lin test 14
* fix clang depth
* disable that
* failure 14 is fixed
* fix ptx
* failure 27 is fixed
* fix llama
* run_cnt
* Revert "Optimize PTX gated loads index calculation (#4304 )"
This reverts commit d97d5a7689 .
* fix uops loop
* fix ptx bugs
* add barrier
* print
* mem_type in ptx direct
* bypass tests that fail in CI but pass locally
* ptx remove ptr_ar
* more ptx passing
* fix ptx tests
* assert compile support
* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
chenyu
ca1df20fa9
benchmark name fix - resnet eval is on eval data (#4628)
2024-05-17 12:56:12 -04:00