docs: Add visualizer instruction for SWE-Bench (#2529)

* Update README.md for visualizer instruction

* Polish the visualization guidance (#2531)

* fix conda create error

* fix and polish the readme for visualization

* Update README.md

---------

Co-authored-by: Haofei Yu <haofeiy@cs.cmu.edu>
Author: Xingyao Wang
Date: 2024-06-20 04:41:09 +08:00 (committed by GitHub)
parent 0a0f78f2fb
commit b569ba710d


@@ -154,6 +154,33 @@ The final results will be saved to `evaluation/evaluation_outputs/outputs/swe_be
- `report.json`: a JSON file that contains keys like `"resolved"` pointing to instance IDs that are resolved by the agent.
- `summary.json`: a JSON file that contains more fine-grained information for each test instance.
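For a quick sanity check, the `"resolved"` key in `report.json` can be read with a one-liner (the key name comes from the description above; the sample report written here is fabricated purely for illustration):

```bash
# Write an illustrative report.json (sample data only, not real results).
cat > report.json <<'EOF'
{"resolved": ["astropy__astropy-12907", "django__django-11099"]}
EOF
# Count how many instance IDs the agent resolved.
python3 -c "import json; print(len(json.load(open('report.json'))['resolved']), 'resolved')"
```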
## Visualize Results
First, clone `https://huggingface.co/spaces/OpenDevin/evaluation` and add your own OpenDevin run results to the `outputs` directory of the cloned repo.
```bash
git clone https://huggingface.co/spaces/OpenDevin/evaluation
```
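Copying the results over might look like the following; the source path is illustrative (it mimics the output layout described earlier), so adjust it to wherever your run actually wrote its results:

```bash
# Illustrative only: create a stand-in for your local run outputs
# (replace this with your real evaluation output directory).
mkdir -p evaluation_outputs/outputs/swe_bench
# Copy the results into the cloned space's `outputs` directory.
mkdir -p evaluation/outputs
cp -r evaluation_outputs/outputs/swe_bench evaluation/outputs/
```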
**(optional) set up a streamlit environment with conda**:
```bash
conda create -n streamlit python=3.10
conda activate streamlit
pip install streamlit altair st_pages
```
**run the visualizer**:
Then, in a Python environment with the `streamlit` library installed, run the following:
```bash
# Make sure you are inside the cloned `evaluation` repo
conda activate streamlit # if you follow the optional conda env setup above
streamlit run 0_📊_OpenDevin_Benchmark.py --server.port 8501 --server.address 0.0.0.0
```
You can then access the SWE-Bench trajectory visualizer at `http://localhost:8501`.
## View Result Summary