feat: Support Tau-Bench and BFCL evaluation benchmarks (#11953)

Co-authored-by: openhands <openhands@all-hands.dev>
Aaron Sequeira
2025-12-31 06:12:50 +03:00
committed by GitHub
parent 82e0aa7924
commit 4c0f0a1e9b
6 changed files with 469 additions and 2 deletions


@@ -0,0 +1,25 @@
# BFCL (Berkeley Function-Calling Leaderboard) Evaluation

This directory contains the evaluation scripts for BFCL.
## Setup

Scoring relies on the official BFCL tooling: you may need to clone the BFCL repository (part of the gorilla project) or install its evaluation package, if one is published.
```bash
# Example setup (adjust as needed)
# git clone https://github.com/ShishirPatil/gorilla.git
# cd gorilla/berkeley-function-call-leaderboard
# pip install -r requirements.txt
```
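
If you cloned the gorilla repository, the raw BFCL test cases are normally shipped inside that checkout. The path below is an assumption based on the upstream layout, so adjust it to match your clone:

```bash
# Assumed location of the BFCL test data inside the gorilla checkout;
# verify against the upstream README before relying on it.
ls gorilla/berkeley-function-call-leaderboard/data/
```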
## Running Evaluation

To run the evaluation, provide the agent class, your LLM config, and the path to a local copy of the BFCL dataset:
```bash
python evaluation/benchmarks/bfcl/run_infer.py \
  --agent-cls CodeActAgent \
  --llm-config <your_llm_config> \
  --dataset-path /path/to/bfcl_dataset.json
```
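
As with other benchmarks in this repository, `<your_llm_config>` is expected to name an LLM configuration group defined in your `config.toml`. Before launching a full run, a quick sanity check that the dataset file parses as JSON can save time; this is a minimal sketch assuming a standard single-file JSON export (the exact record schema is defined by the upstream BFCL repository):

```bash
# Sanity check: confirm the dataset file is valid JSON before running inference.
python3 -m json.tool /path/to/bfcl_dataset.json > /dev/null \
  && echo "dataset parses as JSON"
```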