adding a script to fetch and convert devin's output for evaluation (#81)

* adding code to fetch and convert devin's output for evaluation

* update README.md

* update code for fetching and processing devin's outputs

* update code for fetching and processing devin's outputs
Jiaxin Pei
2024-03-21 13:33:01 -04:00
committed by GitHub
parent b84463f512
commit dc88dac296
2 changed files with 88 additions and 2 deletions

@@ -11,5 +11,12 @@ all the preprocessing/evaluation/analysis scripts.
 ## Tasks
 ### SWE-bench
-- analysis
-  - devin_eval_analysis.ipynb: notebook analyzing devin's outputs
+- notebooks
+  - `devin_eval_analysis.ipynb`: notebook analyzing Devin's outputs
+- scripts
+  - `prepare_devin_outputs_for_evaluation.py`: script that fetches [Devin's outputs](https://github.com/CognitionAI/devin-swebench-results/tree/main) and converts them into the JSON file used for evaluation.
+    - usage: `python prepare_devin_outputs_for_evaluation.py <setting>`, where `<setting>` is one of `passed`, `failed`, or `all`
+- resources
+  - Devin's outputs, processed for evaluation, are available on [Hugging Face](https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output)
+    - get the predictions that passed the tests: `wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_passed.json`
+    - get all predictions: `wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_outputs.json`
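The conversion step described in the README can be sketched as follows. This is a minimal illustration, not the actual `prepare_devin_outputs_for_evaluation.py`. It assumes the diffs have already been downloaded from the Devin results repository into a local directory with hypothetical `pass/` and `fail/` subfolders, each holding one `<instance_id>-diff.txt` file per task. It writes records with the `instance_id`, `model_name_or_path`, and `model_patch` fields that the SWE-bench evaluation harness reads from a predictions file. The function names, the folder layout, and the `"devin"` model label are all hypothetical. `load_predictions` could also sanity-check a file downloaded from Hugging Face, assuming that file is a JSON list of such records.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical local layout (an assumption, not confirmed by the README):
#   <root>/pass/<instance_id>-diff.txt  and  <root>/fail/<instance_id>-diff.txt
SETTING_DIRS = {
    "passed": ["pass"],
    "failed": ["fail"],
    "all": ["pass", "fail"],
}
DIFF_SUFFIX = "-diff.txt"
MODEL_NAME = "devin"  # hypothetical value for model_name_or_path

# Fields the SWE-bench harness reads from each prediction record.
REQUIRED_KEYS = {"instance_id", "model_name_or_path", "model_patch"}


def convert_diffs(root, setting):
    """Turn per-task diff files into SWE-bench-style prediction records."""
    if setting not in SETTING_DIRS:
        raise ValueError(f"setting must be one of {sorted(SETTING_DIRS)}, got {setting!r}")
    predictions = []
    for subdir in SETTING_DIRS[setting]:
        directory = Path(root, subdir)
        # Fail loudly instead of silently producing an empty predictions file.
        if not directory.is_dir():
            raise FileNotFoundError(f"expected a directory of diffs at {directory}")
        for path in sorted(directory.glob(f"*{DIFF_SUFFIX}")):
            predictions.append({
                "instance_id": path.name[: -len(DIFF_SUFFIX)],
                "model_name_or_path": MODEL_NAME,
                "model_patch": path.read_text(encoding="utf-8"),
            })
    return predictions


def write_predictions(predictions, out_path):
    """Write the records as a single JSON list."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, indent=2)


def load_predictions(path):
    """Load a JSON list of predictions and check that every record has the required keys."""
    with open(path, encoding="utf-8") as f:
        predictions = json.load(f)
    for i, pred in enumerate(predictions):
        missing = REQUIRED_KEYS - set(pred)
        if missing:
            raise ValueError(f"prediction {i} is missing keys: {sorted(missing)}")
    return predictions


def count_by_repo(predictions):
    """Count predictions per repository, e.g. 'django__django-11099' -> 'django__django'."""
    return Counter(p["instance_id"].rsplit("-", 1)[0] for p in predictions)
```

For example, `write_predictions(convert_diffs("output_diffs", "passed"), "devin_swe_passed.json")` would mirror the `passed` setting, where `output_diffs` is again a hypothetical local path. Mapping `all` to both subfolders keeps the three settings consistent with the README's usage line.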