# BFCL (Berkeley Function-Calling Leaderboard) Evaluation

This directory contains the evaluation scripts for BFCL.

## Setup

You may need to clone the official BFCL repository (part of the Gorilla project) or install its evaluation package, if one is available:

```bash
# Example setup (adjust as needed)
git clone https://github.com/ShishirPatil/gorilla.git
cd gorilla/berkeley-function-call-leaderboard
pip install -r requirements.txt
```

## Running Evaluation

To run the evaluation, you need to provide the path to the BFCL dataset:

```bash
python evaluation/benchmarks/bfcl/run_infer.py \
  --agent-cls CodeActAgent \
  --llm-config <your_llm_config> \
  --dataset-path /path/to/bfcl_dataset.json
```
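The `--llm-config` value refers to a named LLM configuration; in OpenHands these are typically defined as `[llm.<name>]` sections in `config.toml`. The snippet below is a minimal sketch assuming that convention; the section name `eval_gpt4o` and the model, key, and temperature values are placeholders to adjust for your provider:

```bash
# Hypothetical example: append a named LLM config section to config.toml.
# OpenHands looks up --llm-config by this section name; all values below
# are placeholders, not a recommended configuration.
cat >> config.toml <<'EOF'
[llm.eval_gpt4o]
model = "gpt-4o"
api_key = "sk-..."
temperature = 0.0
EOF
```

With a section like that in place, pass its name to the run script, e.g. `--llm-config eval_gpt4o`.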