response filter (#1039)

* response filter (see the usage sketch after this list)

* rewrite `implement` based on the filter

* multi responses

* abs path

* code handling

* option to not use docker

* context

* eval_only -> raise_error

* notebook

* utils

* utils

* separate tests

* test

* test

* test

* test

* test

* test

* test

* test

* `**config` in `test()`

* test

* test

* filename
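
The main feature here is the response filter. Below is a minimal usage sketch; the `filter_func` and `config_list` keyword names and the filter signature are assumptions for illustration, not confirmed by this commit message:

```python
import json

from flaml import oai


# Hypothetical filter: accept a response only if one of its completions
# parses as valid JSON.
def valid_json_filter(response, **_):
    for text in oai.Completion.extract_text(response):
        try:
            json.loads(text)
            return True
        except ValueError:
            pass
    return False


# Assumed usage: when the filter returns False, the next configuration in
# `config_list` is tried until a response passes (or the list is exhausted).
response = oai.Completion.create(
    config_list=[{"model": "text-ada-001"}, {"model": "gpt-3.5-turbo"}],
    prompt="Return a JSON request body for a web search for 'latest AI news'.",
    filter_func=valid_json_filter,
)
```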
Chi Wang
2023-05-21 15:22:29 -07:00
committed by GitHub
parent 7de4eb347d
commit e463146cb8
21 changed files with 2253 additions and 1820 deletions


@@ -128,10 +128,10 @@ print(eval_with_generated_assertions(oai.Completion.extract_text(response), **tu
 You can use flaml's `oai.Completion.test` to evaluate the performance of an entire dataset with the tuned config.
 ```python
-result = oai.Completion.test(test_data, config)
+result = oai.Completion.test(test_data, **config)
 print("performance on test data with the tuned config:", result)
 ```
 The result will vary with the inference budget and optimization budget.
-[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_openai.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_openai.ipynb)
+[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb)
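
For context on the `config` → `**config` change in the hunk above: `oai.Completion.tune` returns the tuned settings as a plain dict of keyword arguments, so `test` now receives them unpacked rather than as a single positional argument. A minimal sketch, with toy data, prompt template, and metric standing in for a real task:

```python
from flaml import oai

# Toy placeholders; a real application supplies its own data and metric.
tune_data = [{"problem": "1 + 1 = ?", "solution": "2"}]
test_data = [{"problem": "2 + 2 = ?", "solution": "4"}]

def eval_func(responses, solution, **_):
    # Success if any sampled response contains the expected answer.
    return {"success": any(solution in r for r in responses)}

config, analysis = oai.Completion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=eval_func,
    prompt="{problem}",
    inference_budget=0.05,
    optimization_budget=1,
)

# `config` is a dict of keyword arguments (model, prompt, max_tokens, ...),
# which is why it is unpacked with ** when evaluating on the test split.
result = oai.Completion.test(test_data, **config)
print("performance on test data with the tuned config:", result)
```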