A starting point for SWE-Bench Evaluation with docker (#60)

* a starting point for SWE-Bench evaluation with docker
* fix the swe-bench uid issue
* typo fixed
* fix conda missing issue
* move files based on new PR
* Update doc and gitignore using devin prediction file from #81
* fix typo
* add a sentence
* fix typo in path
* fix path

---------

Co-authored-by: Binyuan Hui <binyuan.hby@alibaba-inc.com>
2026-01-09 14:57:59 -05:00 · 2024-03-22 12:43:49 +08:00
parent dc88dac296
commit 5ff96111f0
9 changed files with 172 additions and 2 deletions
--- a/evaluation/README.md
+++ b/evaluation/README.md
@@ -19,4 +19,6 @@ all the preprocessing/evaluation/analysis scripts.
 - resources
  - Devin's outputs processed for evaluations is available on [Huggingface](https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output)
    - get predictions that passed the test: `wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_passed.json`
-    - get all predictions`wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_outputs.json`
+    - get all predictions `wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_outputs.json`
+
+See [`SWE-bench/README.md`](./SWE-bench/README.md) for more details on how to run SWE-Bench for evaluation.