A starting point for SWE-Bench Evaluation with docker (#60)

* a starting point for SWE-Bench evaluation with docker

* fix the swe-bench uid issue

* typo fixed

* fix conda missing issue

* move files based on new PR

* Update doc and gitignore using devin prediction file from #81

* fix typo

* add a sentence

* fix typo in path

* fix path

---------

Co-authored-by: Binyuan Hui <binyuan.hby@alibaba-inc.com>
This commit is contained in:
Xingyao Wang
2024-03-22 12:43:49 +08:00
committed by GitHub
parent dc88dac296
commit 5ff96111f0
9 changed files with 172 additions and 2 deletions

View File

@@ -19,4 +19,6 @@ all the preprocessing/evaluation/analysis scripts.
- resources
- Devin's outputs processed for evaluations is available on [Huggingface](https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output)
- get predictions that passed the test: `wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_passed.json`
- get all predictions`wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_outputs.json`
- get all predictions `wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_outputs.json`
See [`SWE-bench/README.md`](./SWE-bench/README.md) for more details on how to run SWE-Bench for evaluation.