Evaluation
This folder contains code and resources to run experiments and evaluations.
Logistics
To better organize the evaluation folder, we should follow the rules below:
- Each subfolder contains a specific benchmark or experiment. For example, evaluation/SWE-bench should contain all the preprocessing, evaluation, and analysis scripts for SWE-bench.
- Raw data and experimental records should not be stored within this repo; host them externally instead (e.g. Google Drive or Hugging Face Datasets).
- Important data files of manageable size and analysis scripts (e.g., Jupyter notebooks) can be directly uploaded to this repo.
Tasks
SWE-bench
- notebooks
  - devin_eval_analysis.ipynb: notebook analyzing Devin's outputs
- scripts
  - prepare_devin_outputs_for_evaluation.py: script fetching and converting Devin's output into the desired JSON file for evaluation.
    - usage: python prepare_devin_outputs_for_evaluation.py <setting>
      where setting can be passed, failed, or all
- resources
  - Devin's outputs processed for evaluation are available on Hugging Face
    - get predictions that passed the test:
      wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_passed.json
    - get all predictions:
      wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_outputs.json
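
For those who would rather fetch the processed outputs from Python than with wget, the following is a minimal sketch. It is not part of this repo: the helper names (prediction_url, download_predictions, load_predictions) are hypothetical, and the only assumption made about the files is that they are valid JSON. Only the two URLs listed above are used; no URL is listed for the failed setting, so the sketch rejects it rather than guessing one.

```python
import json
import urllib.request
from pathlib import Path

# Base URL taken from the wget commands in this README.
BASE_URL = "https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main"

# Only the files named in this README. "failed" is deliberately absent
# because the README does not list a URL for it.
FILES = {
    "passed": "devin_swe_passed.json",
    "all": "devin_swe_outputs.json",
}


def prediction_url(setting: str) -> str:
    """Return the download URL for a setting (hypothetical helper)."""
    if setting not in FILES:
        raise ValueError(
            f"no download URL known for setting {setting!r}; "
            f"expected one of {sorted(FILES)}"
        )
    return f"{BASE_URL}/{FILES[setting]}"


def download_predictions(setting: str, dest_dir: str = ".") -> Path:
    """Download the predictions file for a setting and return its local path."""
    dest = Path(dest_dir) / FILES[setting] if setting in FILES else None
    url = prediction_url(setting)  # raises ValueError for unknown settings
    urllib.request.urlretrieve(url, dest)
    return dest


def load_predictions(path):
    """Parse a downloaded predictions file; assumes only that it is valid JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling download_predictions("passed") followed by load_predictions on the returned path mirrors the first wget command above; the structure of the parsed data is not documented here, so inspect it before relying on specific fields.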