10 Commits

Author SHA1 Message Date
afourney
f8b4b4259b Adds the GAIA benchmark to the Testbed. This PR depends on #792 (#810)
* Re-added completion logging when using older versions of autogen.

* Extended scenario definitions and templating to include folders.

* Prepare collate_human_eval.py for working with group chat scenarios.

* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.

* Fixed the default termination message.

* Fixed another termination condition.

* Updated compatible autogen versions.

* Added initial support for GAIA benchmark.

* Fixed a bug in executing the finalize scripts.

* Generalized the template further to support multiple folder copy operations.

* Refined GAIA support, and broke scenarios down by difficulty.

* Added some experimental scripts for computing metrics over GAIA. This is a first version, and will likely need refinement.

* Added instructions for cloning GAIA

* Updated README to fix some typos.

* Added a script to format GAIA results for the leaderboard.

* Update samples/tools/testbed/scenarios/GAIA/Templates/BasicTwoAgents/scenario.py

Co-authored-by: LeoLjl <3110503618@qq.com>

---------

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: LeoLjl <3110503618@qq.com>
2023-12-06 01:46:10 +00:00
afourney
a107233e23 Testbed can now read the OPENAI_API_KEY in addition to the OAI_CONFIG_LIST (#848)
Co-authored-by: Victor Dibia <victordibia@microsoft.com>
2023-12-04 22:14:00 +00:00
Victor Dibia
5547e3b919 Improvements to AutoGen Assistant (#828)
* improve template for files, integrate files in db

* ui update, improvements to file display grid

* add new global skill for image generation

* update readme to address #739

* utils.py refactor, separate db utils for ease of development

* db utils

* add support for sessions both in backend api and ui

* improve implementation for session support

* add early v1 support for a gallery and publishing to a gallery

* rewrite logic for file storage representation. Store only file references in db

* update generate image logic

* update ui layout

* fix light dark mode bug

* v1 support for showing items added to gallery

* remove viewer as it is merged in gallery

* formatting updates

* QOL refactoring

* readme and general updates

* add example notebook on assistant api

* improve naming conventions and formatting

* readme update

* Update samples/apps/autogen-assistant/pyproject.toml

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update samples/apps/autogen-assistant/notebooks/tutorial.ipynb

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2023-12-02 00:22:02 +00:00
afourney
45c2a78970 Testbed folders (#792)
* Re-added completion logging when using older versions of autogen.

* Extended scenario definitions and templating to include folders.

* Prepare collate_human_eval.py for working with group chat scenarios.

* Converted HumanEval to the folder-based approach, and added GroupChat scenarios.

* Fixed the default termination message.

* Fixed another termination condition.

* Updated compatible autogen versions.

* Fixed a bug in executing the finalize scripts.

* Generalized the template further to support multiple folder copy operations.

* Add tests from AutoGPT.

* Update README.md

* Fix typo

* Update samples/tools/testbed/README.md

---------

Co-authored-by: LeoLjl <3110503618@qq.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
2023-11-30 16:43:03 +00:00
Victor Dibia
143e49c6e8 Sample Web Application Built with AutoGen (#695)
* Adding research assistant code

* Adding research assistant code

* checking in RA files

* Remove used text file

* Update README.md to include Saleema's name to the Contributors list.

* remove extraneous files

* update gitignore

* improve structure on global skills

* fix linting error

* readme update

* readme update

* fix wrong function bug

* readme update

* update ui build

* cleanup, remove unused modules

* readme and docs updates

* set default user

* ui build update

* add screenshot to improve instructions

* remove logout behaviour, replace with note to developers to add their own logout logic

* Create blog and edit ARA README

* Added the stock prices example in the readme for ARA

* Include edits from review with Saleema

* fix format issues

* Cosmetic changes for better debug messages

* edit authors

* remove references to request_timeout to support autogen v0.0.2

* update bg color for UI

* readme update

* update research assistant blog post

* omit samples folder from codecov

* ui build update + precommit refactor

* formatting updates from pre-commit

* readme update

* remove compiled source files

* update gitignore

* refactor, file removals

* refactor for improved structure - datamodel, chat and db helper

* update gitignore

* refactor, file removals

* refactor for improved structure - datamodel, chat and db helper

* refactor skills view

* general refactor

* gitignore update and general refactor

* skills update

* general refactor

* ui folder structure refactor

* improve support for skills loading

* add fetch profile default skill

* refactor chat to autogenchat

* qol refactor

* improve metadata display

* early support for autogenflow in ui

* docs update general refactor

* general refactor

* readme update

* readme update

* readme and cli update

* pre-commit updates

* precommit update

* readme update

* add setup.py for older python build versions

* add manifest.in, update app icon

* in-progress changes to agent specification

* remove use_cache refs

* update datamodel, and fix for default serverurl

* request_timeout

* readme update, fix autogen values

* fix pyautogen version

* precommit formatting and other qol items

* update folder structure

* req update

* readme and docs update

* docs update

* remove duplicate in yaml file

* add support for explicit skills addition

* readme and documentation updates

* general refactor

* remove blog post, schedule for future PR

* readme update, add info on llmconfig

* make use_cache False by default unless set

* minor ui updates

* upgrade ui to use latest autogen lib version 0.2.0b5

* Ui refactor, support for adding arbitrary model specifications

* formatting/precommit checks

* update readme, utils default skill

---------

Co-authored-by: Piali Choudhury <pialic@microsoft.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2023-11-20 18:40:30 +00:00
afourney
f790109271 Re-added completion logging when using older versions of autogen. (#701)
2023-11-18 17:11:25 +00:00
afourney
b0a6d72b8c Addresses issue 635, relating to newlines in Windows. (#678)
2023-11-15 22:27:10 +00:00
afourney
72f488e4d7 Allows users to specify a different requirements.txt file to install in Docker, to test other versions or branches of Autogen. Closes #662 (#671)
2023-11-15 00:33:09 +00:00
afourney
c37453735a Sets the umask before executing the task in Docker. (#593)
* Sets the umask before executing the task in Docker.

* Added version backward compatibility for disabling cache and setting timeouts.
2023-11-14 21:14:38 +00:00
afourney
1c4a5e6a1a Added a simple Testbed tool for repeatedly running templated Autogen scenarios with tightly-controlled initial conditions. (#455)
* Initial commit of the autogen testbed environment.

* Fixed some typos in the Testbed README.md

* Added some stricter termination logic to the two_agent scenario, and switched the logo task from finding Autogen's logo to finding Microsoft's (it's easier)

* Added documentation to testbed code in preparation for PR

* Added a variation of HumanEval to the Testbed. It is also a reasonable example of how to integrate other benchmarks.

* Removed ChatCompletion.start_logging and related features. Added an explicit TERMINATE output to HumanEval to save 1 turn in each conversation.

* Added metrics utils script for HumanEval

* Updated the requirements in the README.

* Added documentation for HumanEval csv schemas

* Standardized on how the OAI_CONFIG_LIST is handled.

* Removed dot-slash from 'includes' path for cross-platform compatibility

* Missed a file.

* Updated readme to include known-working versions.
2023-11-04 10:38:43 +00:00