feat: Support Tau-Bench and BFCL evaluation benchmarks (#11953)

Co-authored-by: openhands <openhands@all-hands.dev>
Aaron Sequeira
2025-12-31 06:12:50 +03:00
committed by GitHub
parent 82e0aa7924
commit 4c0f0a1e9b
6 changed files with 469 additions and 2 deletions


@@ -0,0 +1,25 @@
# BFCL (Berkeley Function-Calling Leaderboard) Evaluation

This directory contains the evaluation scripts for BFCL.
## Setup

Scoring relies on the official BFCL tooling: you may need to clone the BFCL repository (part of the gorilla project) or install its evaluation package, if one is published.
```bash
# Example setup (adjust as needed)
# git clone https://github.com/ShishirPatil/gorilla.git
# cd gorilla/berkeley-function-call-leaderboard
# pip install -r requirements.txt
```
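
If you cloned the gorilla repository, the raw BFCL test cases are normally shipped inside that checkout. The path below is an assumption based on the upstream layout, so adjust it to match your clone:

```bash
# Assumed location of the BFCL test data inside the gorilla checkout;
# verify against the upstream README before relying on it.
ls gorilla/berkeley-function-call-leaderboard/data/
```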
## Running Evaluation

To run the evaluation, provide the agent class, your LLM config, and the path to a local copy of the BFCL dataset:
```bash
python evaluation/benchmarks/bfcl/run_infer.py \
  --agent-cls CodeActAgent \
  --llm-config <your_llm_config> \
  --dataset-path /path/to/bfcl_dataset.json
```
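
As with other benchmarks in this repository, `<your_llm_config>` is expected to name an LLM configuration group defined in your `config.toml`. Before launching a full run, a quick sanity check that the dataset file parses as JSON can save time; this is a minimal sketch assuming a standard single-file JSON export (the exact record schema is defined by the upstream BFCL repository):

```bash
# Sanity check: confirm the dataset file is valid JSON before running inference.
python3 -m json.tool /path/to/bfcl_dataset.json > /dev/null \
  && echo "dataset parses as JSON"
```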