feat: Support Tau-Bench and BFCL evaluation benchmarks (#11953)
Co-authored-by: openhands <openhands@all-hands.dev>
evaluation/benchmarks/bfcl/README.md (new file, 25 lines)
@@ -0,0 +1,25 @@
# BFCL (Berkeley Function-Calling Leaderboard) Evaluation

This directory contains the evaluation scripts for BFCL, a benchmark that measures an agent's function/tool-calling ability.

## Setup

You may need to clone the official BFCL repository, or install the evaluation package if one is available for your version:
```bash
# Example setup (adjust as needed for your environment)
git clone https://github.com/ShishirPatil/gorilla.git
cd gorilla/berkeley-function-call-leaderboard
pip install -r requirements.txt
```
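If you cloned the repository above, the benchmark's test data usually ships inside it. As a rough sketch (the `data/` location is an assumption and varies across BFCL versions), you can look for candidate dataset files like this:

```bash
# Assumption: some BFCL versions keep data elsewhere (e.g. inside the installed
# package), so treat this path as a starting point, not a contract.
ls gorilla/berkeley-function-call-leaderboard/data/
```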
## Running Evaluation

To run the evaluation, provide your LLM config and the path to the BFCL dataset:
```bash
python evaluation/benchmarks/bfcl/run_infer.py \
  --agent-cls CodeActAgent \
  --llm-config <your_llm_config> \
  --dataset-path /path/to/bfcl_dataset.json
```
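As a concrete sketch, suppose your `config.toml` defines an LLM group named `eval_gpt4` (the name is illustrative, not something this benchmark requires):

```bash
# "eval_gpt4" stands in for an [llm.eval_gpt4] section in your config.toml;
# use whichever group holds your model name and API credentials.
python evaluation/benchmarks/bfcl/run_infer.py \
  --agent-cls CodeActAgent \
  --llm-config eval_gpt4 \
  --dataset-path /path/to/bfcl_dataset.json
```

Check `run_infer.py --help` for any additional flags your version supports.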