Commit Graph

148 Commits

Author SHA1 Message Date
George Hotz
f45d178a55 hotfix: support JIT_BATCH_SIZE=0, make that the default 2024-09-25 10:36:04 +08:00
George Hotz
52e7f1c108 add new model CI 2024-09-25 10:23:06 +08:00
George Hotz
de259e3f09 hotfix: add compile3 to comma CI 2024-09-23 18:25:49 +08:00
qazal
e2d6e10ddf hotfix: reset benchmarks cache for process replay (#6671) 2024-09-23 15:13:02 +08:00
nimlgen
d22b46a2ac qcom in benchmarks (#6337) 2024-09-02 19:59:11 +03:00
chenyu
7d46fb0c83 load balance NV benchmark ci (#6107) 2024-08-16 10:08:08 -04:00
nimlgen
8f787785d9 fix openpilot benchmark (#6049) 2024-08-12 21:12:32 +03:00
qazal
266afad8ed hotfix: skip schedule capture in benchmarks (#6012) 2024-08-10 17:13:53 +03:00
chenyu
adba5efc64 enable llama 2 70B in tinybox green CI (#5905)
runnable with MAX_CONTEXT=256
2024-08-04 18:48:46 -04:00
wozeparrot
acadccf344 comma benchmark (#5518) 2024-08-02 14:36:54 -07:00
wozeparrot
eebb1b9922 feat: temperature 0 llama3 benchmark (#5806) 2024-07-30 12:05:36 -07:00
qazal
3e49d86c01 process replay diffs 3 things now (#5731)
* github api infra

* process replay is 3 parts now

* parse benchmarks

* add gh_token

* complete diff

* move process replay tests

* last successful run

* add tempdir

* skip master
2024-07-27 12:52:20 +03:00
George Hotz
db1d093b29 reenable LLaMA-3 8B BEAM on NV (#5746) 2024-07-26 16:56:41 -07:00
wozeparrot
6ccb2390c3 feat: update_benchmark_staging (#5529) 2024-07-17 20:40:57 -07:00
wozeparrot
218e157f00 benchmark on update_benchmark_staging (#5541) 2024-07-17 17:11:52 -07:00
chenyu
b17e4adb3a add -c advice.detachedHead=false to process replay git checkout (#5419)
remove the noisy `Note: switching to 'origin/master'.

You are in 'detached HEAD' state. You can look around, make experimental
changes...` in log
2024-07-12 15:13:26 -04:00
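The `-c advice.detachedHead=false` trick in the commit above is plain git: `-c key=value` sets a configuration entry for a single invocation, so the noisy advice disappears from CI logs without touching any global config. A minimal reproduction (throwaway repo; the user identity and commit message are illustrative only):

```python
import shutil, subprocess, tempfile

def checkout_stderr(extra_flags=()):
    """Create a throwaway repo, commit once, and check out that commit by SHA,
    returning git's stderr (where the detached-HEAD advice is printed)."""
    repo = tempfile.mkdtemp()
    def git(*args):
        return subprocess.run(("git",) + args, cwd=repo,
                              capture_output=True, text=True)
    git("init", "-q")
    git("-c", "user.name=ci", "-c", "user.email=ci@example.com",
        "commit", "-q", "--allow-empty", "-m", "initial")
    sha = git("rev-parse", "HEAD").stdout.strip()
    # Checking out a bare SHA detaches HEAD; with the advice config disabled,
    # stderr only contains the short "HEAD is now at ..." line.
    return git(*extra_flags, "checkout", sha).stderr

if shutil.which("git"):
    quiet = checkout_stderr(("-c", "advice.detachedHead=false"))
    assert "detached HEAD" not in quiet
```

The same one-shot `-c` form works for any `advice.*` key, which is why it is a convenient fit for scripted checkouts like process replay.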
qazal
31fcc516dc more process replay tooling (#5407)
* replays

* what's in there

* can it be up there

* sha is enough

* insert sha as the key

* fix str

* update reset utils

* that nested try/except was terrible

* github_context can go
2024-07-12 13:11:34 +03:00
chenyu
322c37e621 use helpers.JIT in llama and gpt2 examples (#5350)
* use helpers.JIT in llama and gpt2 examples

replaced getenv("JIT"), effectively making gpt2 default to JIT

* fix test_gpt2
2024-07-09 15:04:43 -04:00
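The `helpers.JIT` change above replaces scattered `getenv("JIT")` calls with a single module-level flag. The pattern can be sketched roughly like this (a generic illustration, not tinygrad's actual helpers module; `HYPOTHETICAL_FLAG` is made up):

```python
import functools, os

@functools.lru_cache(maxsize=None)
def getenv(key: str, default=0):
    # Read a feature flag from the environment once, coerced to the type
    # of the default, and cache the result for later lookups.
    return type(default)(os.getenv(key, default))

# Module-level flag: consumers import JIT instead of re-reading the
# environment at each call site, which is what "use helpers.JIT" buys.
JIT = getenv("JIT", 1)
```

Centralizing the flag also means every example agrees on the default (here JIT on), rather than each call site choosing its own.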
chenyu
191463a919 add timing to SDXL (#5273) 2024-07-02 23:29:54 -04:00
chenyu
5808c37302 hotfix disable flaky llama3 beam benchmark on green (#5249) 2024-07-01 15:00:47 -04:00
chenyu
b9122ecdaf revert stable diffusion validation with threefry (#5248)
* Revert "use threefry in stable diffusion benchmark (#4988)"

This reverts commit 44dfa37c70.

* sdxl and validation fix

* relax threshold
2024-07-01 14:43:47 -04:00
chenyu
88763eb9ff fix stable_diffusion with fp16 (#5239) 2024-06-30 12:59:31 -04:00
nimlgen
6b08cb5e38 ptx runs on nv in benchmarks (#5224) 2024-06-29 11:06:44 +03:00
chenyu
7090eac8cb validate sdxl output and put it in benchmark (#5211)
* validate sdxl output and put it in benchmark

* don't print fetch progress_bar in CI
2024-06-28 11:40:52 -04:00
chenyu
d8dc43ad06 remove JIT_BATCH_SIZE=4 from gpt2 NV benchmark (#5198)
this no longer helps
2024-06-27 15:20:34 -04:00
chenyu
83da8b3558 use NV instead of CUDA in benchmark (#5192)
also reenabled mixtral on green
2024-06-27 13:52:58 -04:00
chenyu
0c6c7c5f7b CACHELEVEL=0 -> IGNORE_BEAM_CACHE=1 in benchmark (#5191)
ignoring the beam cache while still using the compile cache should be fine, and saves some benchmark time.

also updated `beam_search` to check flag value before accessing diskcache
2024-06-27 13:15:18 -04:00
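The `beam_search` change described above (consult the flag before touching the disk cache at all) follows a common guard pattern; a hedged sketch, with hypothetical in-memory stand-ins for tinygrad's diskcache helpers:

```python
import os

_cache: dict = {}  # stand-in for a persistent disk cache

def diskcache_get(key):
    return _cache.get(key)

def diskcache_put(key, value):
    _cache[key] = value
    return value

def beam_search(key, compute):
    # Check the flag *before* any cache access, so IGNORE_BEAM_CACHE=1
    # skips both the read and the write instead of only the read.
    if os.getenv("IGNORE_BEAM_CACHE", "0") == "1":
        return compute()
    cached = diskcache_get(key)
    if cached is not None:
        return cached
    return diskcache_put(key, compute())
```

With the flag set, every call recomputes; with it unset, repeated calls for the same key are served from the cache.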
chenyu
c12de4f47d benchmark use JITBEAM for llama and gpt2 (#5189) 2024-06-27 12:56:02 -04:00
chenyu
e9c6a36894 remove CACHELEVEL=0 in llama3 benchmark (#5025) 2024-06-17 22:43:16 -04:00
George Hotz
bee8fc29ee add GPT2 half/half+beam to AMD (#5000)
* add GPT2 half/half+beam to AMD

* winograd in training. half and half/beam file upload
2024-06-16 14:07:14 -07:00
chenyu
44dfa37c70 use threefry in stable diffusion benchmark (#4988)
also updated default steps to 10, making it easier to tell the image is following the prompt.
2024-06-15 20:25:29 -04:00
wozeparrot
ce1ed374c9 more tinychat fixes (#4971) 2024-06-15 16:29:39 -07:00
qazal
ff8e9eefc3 hotfix: don't use ASSERT_COMPILE for benchmarks process replay (#4981)
* use replay_codegen [run_process_replay]

* disable for now [run_process_replay]
2024-06-15 16:57:47 +03:00
uuuvn
92f49efd06 Trigger process replay from pull request title [run_process_replay] (#4980)
* Trigger process replay from pull request title

* idk how this thing works btw

* test if it will work

* try 2

* Revert "idk how this thing works btw"

This reverts commit 580da51b07.

* Revert "try 2"

This reverts commit 7ff1e86d5d.

* test if it works

* meh

* Reapply "idk how this thing works btw"

This reverts commit dd33ad7c14.

* revert
2024-06-15 16:21:00 +03:00
George Hotz
f42183ba28 hotfix: relax cifar to 93.2 2024-06-09 13:09:21 +02:00
nimlgen
6327b50e51 amd in benchmarks (#4861)
* amd in benchmarks

* remove all hsa
2024-06-08 23:24:46 +03:00
qazal
240d6b5bc0 process replay benchmarks (#4668) 2024-06-01 14:36:21 +03:00
chenyu
38bc38cdff fix llama example quantize (#4699)
* fix llama example quantize

import quantize layers from new example llama3

add to mac benchmark

* fix that

* save the files
2024-05-23 15:35:26 -04:00
chenyu
72560e30fe add CACHELEVEL=0 to tinybox green GEMM BEAM (#4693)
* add CACHELEVEL=0 to tinybox green GEMM BEAM

* BEAM=4 is more stable
2024-05-22 23:59:50 -04:00
wozeparrot
00432496d7 feat: tinyboxgreen (#4366)
* feat: tinyboxgreen

* feat: tinyboxgreenv2

* fix symlink weights

* fix: remove llama 2 70b for now

* feat: naming

* fix: remove extra cifar steps

* feat: disable mixtral on nvidia
2024-05-20 22:39:34 -04:00
chenyu
8a0d1ca7bb CI test timeout 20 min -> 10 min (#4645)
if a run takes more than 10 minutes, setup usually failed anyway. also updated matmul_kfd -> matmul_amd in benchmark
2024-05-18 13:58:28 -04:00
George Hotz
07b350a8f4 new uops is an actual graph (#4560)
* new uops is an actual graph

* it's way slower

* simpler

* fix define acc

* render_loop unique

* ops test pass

* add pattern matcher back, there's bugs

* rewrite

* use priority queue

* recursive children

* fix tests

* fix tests with SINK

* fix abstractions

* fix assembly

* simpler

* link define_acc

* fix DEFINE_ACC placement

* type verify

* full cmp

* fix cmp

* ACCESS_ACC

* insert DEFINE_ACC

* fix PHI

* recursive rewrite

* fix many tests

* sum collapse

* more patterns

* correct change

* fold arange

* fix that lin test

* space

* big folding rule works

* close

* has more maxes, meh

* cached node replace

* set changed

* simplest folding yet

* works

* works

* DIV

* all tests pass

* del

* fuzz linearizer fails

* sum_collapse

* test depth 2 cf

* fix lin test 14

* fix clang depth

* disable that

* failure 14 is fixed

* fix ptx

* failure 27 is fixed

* fix llama

* run_cnt

* Revert "Optimize PTX gated loads index calculation (#4304)"

This reverts commit d97d5a7689.

* fix uops loop

* fix ptx bugs

* add barrier

* print

* mem_type in ptx direct

* bypass tests that fail in CI but pass locally

* ptx remove ptr_ar

* more ptx passing

* fix ptx tests

* assert compile support

* remove model inference benchmark from red
2024-05-17 18:00:18 -07:00
chenyu
ca1df20fa9 benchmark name fix - resnet eval is on eval data (#4628) 2024-05-17 12:56:12 -04:00
chenyu
e5d4e6a8aa BEAM=2 in green CI for 100 TFLOPS (#4624) 2024-05-16 23:28:28 -04:00
George Hotz
fd02ab1e8b move disassemblers and openpilot (#4592)
* move disassemblers and openpilot

* delete junk

* put that in pre-commit

* fixup readme
2024-05-14 19:30:02 -07:00
chenyu
5de4a46f10 re-enable gpt2 half/beam mac benchmark (#4496)
* re-enable gpt2 half/beam mac benchmark

from the fuzzer it seems to be flaky due to a numerical issue, not a kernel bug. we used to have half in the split reduce.

run this in M1 Max for 20 loops and it's fine

* that should be jitted
2024-05-09 19:15:32 -04:00
chenyu
c508eb7425 revert the removal of CAST_BEFORE_VIEW (#4471)
this brings most of the memory gain for resnet back.
2024-05-08 00:14:29 -04:00
chenyu
d4062cb6fc NV tensor_cores in kernel.py (#4399) 2024-05-02 22:33:08 -04:00
chenyu
dce7ac0160 NOCLANG=1 for tinybox green ci. (#4378)
CLANG was disabled for tinybox red for speed
2024-05-01 13:31:01 -04:00
wozeparrot
4a26718ca9 feat: tinyboxgreen (#4365) 2024-04-30 19:05:37 -04:00