mirror of
https://github.com/All-Hands-AI/OpenHands.git
synced 2026-01-08 22:38:05 -05:00
Docs: Clarify config.toml usage in evaluation harness (#6828)
Co-authored-by: openhands <openhands@all-hands.dev>
This commit is contained in:
@@ -20,6 +20,8 @@ To evaluate an agent, you can provide the agent's name to the `run_infer.py` pro
|
||||
### Evaluating Different LLMs
|
||||
|
||||
OpenHands in development mode uses `config.toml` to keep track of most configuration.
|
||||
**IMPORTANT: For evaluation, only the LLM section in `config.toml` will be used. Other configurations, such as `save_trajectory_path`, are not applied during evaluation.**
|
||||
|
||||
Here's an example configuration file you can use to define and use multiple LLMs:
|
||||
|
||||
```toml
|
||||
@@ -40,6 +42,8 @@ api_key = "XXX"
|
||||
temperature = 0.0
|
||||
```
|
||||
|
||||
For other configurations specific to evaluation, such as `save_trajectory_path`, these are typically set in the `get_config` function of the respective `run_infer.py` file for each benchmark.
|
||||
|
||||
## Supported Benchmarks
|
||||
|
||||
The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), [miscellaneous assistance](#misc-assistance), and [real-world](#real-world) tasks.
|
||||
|
||||
Reference in New Issue
Block a user