AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-01-10 07:38:04 -05:00

Author	SHA1	Message	Date
Reinier van der Leer	a5de79beb6	ci(benchmark): Add nightly benchmark workflow Added autogpts-benchmark-nightly.yml, which will run every night at 02:00 UTC with a selection of challenges.	2024-02-16 17:41:58 +01:00
Reinier van der Leer	483c01b681	lint(benchmark): Remove unnecessary `pass` statement in __main__.py	2024-02-16 17:27:56 +01:00
Reinier van der Leer	992b8874fc	chore: Update `agbenchmark` dependency for agent and forge	2024-02-16 17:22:58 +01:00
Reinier van der Leer	2a55efb322	fix(benchmark): Include `WebArenaSiteInfo.additional_info` (e.g. credentials) in task input Without the `additional_info`, it is impossible to get past the login page on challenges where that is necessary.	2024-02-16 17:20:44 +01:00
Reinier van der Leer	23d58a3cc0	feat(benchmark/cli): Add `challenge list`, `challenge info` subcommands - Add `challenge list` command with options `--all`, `--names`, `--json` - Add `tabular` dependency - Add `.utils.utils.sorted_by_enum_index` function to easily sort lists by an enum value/property based on the order of the enum's definition - Add `challenge info [name]` command with option `--json` - Add `.utils.utils.pretty_print_model` routine to pretty-print Pydantic models - Refactor `config` subcommand to use `pretty_print_model`	2024-02-16 15:17:11 +01:00
Reinier van der Leer	70e345b2ce	refactor(benchmark): `load_webarena_challenges` - Reduce duplicate and nested statements - Add `skip_unavailable` parameter Related changes: - Add `available` and `unavailable_reason` attributes to `ChallengeInfo` and `WebArenaChallengeSpec` - Add `pytest.skip` statement to `WebArenaChallenge.test_method` to make sure unavailable challenges are not run	2024-02-16 15:11:48 +01:00
Reinier van der Leer	650a701317	chore: Update `agbenchmark` dependency for agent and forge	2024-02-15 18:19:06 +01:00
Reinier van der Leer	679339d00c	feat(benchmark): Make report output folder configurable - Make `AgentBenchmarkConfig.reports_folder` directly configurable (through `REPORTS_FOLDER` env variable). The default is still `./agbenchmark_config/reports`. - Change all mentions of `REPORT_LOCATION` (which fulfilled the same function at some point in the past) to `REPORTS_FOLDER`.	2024-02-15 18:07:45 +01:00
Reinier van der Leer	fd5730b04a	feat(agent/telemetry): Distinguish between `production` and `dev` environment based on VCS state - Added a helper function `.app.utils.vcs_state_diverges_from_master()`. This function determines whether the relevant part of the codebase diverges from our `master`. - Updated `.app.telemetry._setup_sentry()` to determine the default environment name using `vcs_state_diverges_from_master`.	2024-02-15 16:00:30 +01:00
Reinier van der Leer	b7f08cd0f7	feat(agent/telemetry): Enable performance tracing & update opt-in prompt accordingly	2024-02-15 14:46:36 +01:00
Reinier van der Leer	8762f7ab3d	fix(forge): Make `watchfiles` pattern more specific to prevent unwanted (breaking) reloads This fixes the issue of changes in artifacts triggering an application reload (which caused connection errors for in-progress requests).	2024-02-15 13:42:38 +01:00
Reinier van der Leer	a9b7b175ff	fix(agent/profile_generator): Improve robustness by leveraging `create_chat_completion`'s parse handling	2024-02-15 11:48:07 +01:00
Reinier van der Leer	52b93dd84e	fix(cli/agent start): Wait for applications to finish starting before returning - Added a helper function `wait_until_conn_ready(port)` to wait for the benchmark and agent applications to finish starting - Improved the CLI's own logging (within the `agent start` command)	2024-02-15 11:26:26 +01:00
Reinier van der Leer	6a09a44ef7	lint(agent): Fix telemetry.py linting error & formatting	2024-02-14 23:31:35 +01:00
Toran Bruce Richards	32a627eda9	Add Privacy Policy link to telemetry opt-in.	2024-02-14 16:42:34 +00:00
Reinier van der Leer	67bafa6302	fix(autogpt/llm): `AssistantChatMessage.tool_calls` default `[]` instead of `None` OpenAI ChatCompletion calls fail when `tool_calls = None`. This issue came to light after `22aba6d`.	2024-02-14 14:34:04 +01:00
Reinier van der Leer	6017eefb32	ci: Enable telemetry in CI runs on `master`	2024-02-14 12:03:54 +01:00
Reinier van der Leer	ae197fc85f	feat(agent/telemetry): Distinguish between users This allows us to get a much better sense of how many users actually experience issues, and how issue occurrence is distributed among users.	2024-02-14 11:50:45 +01:00
Reinier van der Leer	22aba6dd8a	fix(agent/llm): Include bad response in parse-fix prompt in `OpenAIProvider.create_chat_completion` Apparently I forgot to also append the response that caused the parse error before throwing it back to the LLM and letting it fix its mistake(s).	2024-02-14 11:20:31 +01:00
Reinier van der Leer	88bbdfc7fc	ci: Pick 3 challenges to run with `--mock` in smoke test CI	2024-02-14 02:30:03 +01:00
Reinier van der Leer	d0c9b7c405	lint(benchmark): Remove unused imports	2024-02-14 01:34:30 +01:00
Reinier van der Leer	e7698a4610	chore(agent): Update `forge` and `agbenchmark` dependencies	2024-02-14 01:32:28 +01:00
Reinier van der Leer	ab05b7ae70	chore(forge): Update `agbenchmark` dependency	2024-02-14 01:27:07 +01:00
Reinier van der Leer	327fb1f916	fix(benchmark): Mock mode, python evals, `--attempts` flag, challenge definitions - Fixed `--mock` mode - Moved interrupt to beginning of the step iterator pipeline (from `BuiltinChallenge` to `agent_api_interface.py:run_api_agent`). This ensures that any finish-up code is properly executed after executing a single step. - Implemented mock mode in `WebArenaChallenge` - Fixed `fixture 'i_attempt' not found` error when `--attempts`/`-N` is omitted - Fixed handling of `python`/`pytest` evals in `BuiltinChallenge` - Disabled left-over Helicone code (see `056163e`) - Fixed a couple of challenge definitions - WebArena task 107: fix spelling of months (Sepetember, Octorbor lmao) - synthesize/1_basic_content_gen (SynthesizeInfo): remove empty string from `should_contain` list - Added some debug logging in agent_api_interface.py and challenges/builtin.py	2024-02-14 01:05:34 +01:00
Reinier van der Leer	bb7f5abc6c	fix(agent/text_processing): Fix `extract_information` LLM response parsing OpenAI's newest models return JSON with markdown fences around it, breaking the `json.loads` parser. This commit adds an `extract_list_from_response` function to json_utils/utilities.py and uses this function to replace `json.loads` in `_process_text`.	2024-02-13 18:28:17 +01:00
Reinier van der Leer	393d6b97e6	feat(agent): Add Sentry integration for telemetry * Add Sentry integration for telemetry - Add `sentry_sdk` dependency - Add setup logic and config flow using `TELEMETRY_OPT_IN` environment variable - Add app/telemetry.py with `setup_telemetry` helper routine - Call `setup_telemetry` in `cli()` in app/cli.py - Add `TELEMETRY_OPT_IN` to .env.template - Add helper function `env_file_exists` and routine `set_env_config_value` to app/utils.py - Add unit tests for `set_env_config_value` in test_utils.py - Add prompt to startup to ask whether the user wants to enable telemetry if the env variable isn't set * Add `capture_exception` statements for LLM parsing errors and command failures	2024-02-13 18:10:52 +01:00
Reinier van der Leer	3b8d63dfb6	chore(agent): Update autogpt-forge and agbenchmark dependencies to propagate dependency updates This also indirectly updates `python-multipart` and fixes "python-multipart vulnerable to Content-Type Header ReDoS" https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/57	2024-02-13 13:24:24 +01:00
Reinier van der Leer	6763196d78	chore(forge): Update agbenchmark dependency	2024-02-13 12:44:17 +01:00
Reinier van der Leer	e1da58da02	chore(forge): Update aiohttp, fastapi, and python-multipart dependencies to mitigate vulnerabilities Addressed vulnerabilities: - python-multipart vulnerable to Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/56 Dependants: - FastAPI Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/52 - Starlette Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/49 - aiohttp is vulnerable to directory traversal - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/45 - aiohttp's HTTP parser (the python one, not llhttp) still overly lenient about separators - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/42	2024-02-13 12:38:36 +01:00
Reinier van der Leer	91cec515d4	chore(benchmark): Update `python-multipart` dependency to mitigate vulnerability - python-multipart vulnerable to Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/55	2024-02-13 12:36:00 +01:00
Reinier van der Leer	cc585a014f	chore(agent): Update aiohttp and fastapi dependencies to mitigate vulnerabilities Addressed vulnerabilities: - python-multipart vulnerable to Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/57 Dependants: - FastAPI Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/54 - Starlette Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/50 - aiohttp is vulnerable to directory traversal - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/44 - aiohttp's HTTP parser (the python one, not llhttp) still overly lenient about separators - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/41	2024-02-13 12:30:12 +01:00
Reinier van der Leer	e641cccb42	chore(benchmark): Update `aiohttp` and `fastapi` dependencies to mitigate vulnerabilities Addressed vulnerabilities: - python-multipart vulnerable to Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/55 Dependants: - FastAPI Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/53 - Starlette Content-Type Header ReDoS - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/48 - aiohttp is vulnerable to directory traversal - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/46 - aiohttp's HTTP parser (the python one, not llhttp) still overly lenient about separators - https://github.com/Significant-Gravitas/AutoGPT/security/dependabot/43	2024-02-13 12:21:52 +01:00
Mahdi Karami	cc73d4104b	fix(forge): incorrect import 'sdk' in .actions.finish (#6822)	2024-02-13 11:02:03 +01:00
Reinier van der Leer	250552cb3d	fix(agent/tests): Update test_config.py:test_initial_values	2024-02-12 13:26:47 +01:00
Reinier van der Leer	1d653973e9	feat(agent/llm): Use new OpenAI models as default `SMART_LLM`, `FAST_LLM`, and `EMBEDDING_MODEL` - Change default `SMART_LLM` from `gpt-4` to `gpt-4-turbo-preview` - Change default `FAST_LLM` from `gpt-3.5-turbo-16k` to `gpt-3.5-turbo-0125` - Change default `EMBEDDING_MODEL` from `text-embedding-ada-002` to `text-embedding-3-small` - Update .env.template, azure.yaml.template, and documentation accordingly	2024-02-12 13:19:37 +01:00
Reinier van der Leer	7bf9ba5502	chore(agent/llm): Update OpenAI model info - Add `text-embedding-3-small` and `text-embedding-3-large` as `EMBEDDING_v3_S` and `EMBEDDING_v3_L` respectively - Add `gpt-3.5-turbo-0125` as `GPT3_v4` - Add `gpt-4-1106-vision-preview` as `GPT4_v3_VISION` - Add GPT-4V models to info map - Change chat model info mapping to derive info for aliases (e.g. `gpt-3.5-turbo`) from specific versions instead of the other way around	2024-02-12 13:17:20 +01:00
Reinier van der Leer	14c9773890	ci(agent): Add `GIT_REVISION` label to Docker builds	2024-02-12 12:31:04 +01:00
Reinier van der Leer	39fddb1214	fix(agent): Fix application of `extra_request_headers` in `OpenAIProvider`	2024-02-12 12:21:30 +01:00
Reinier van der Leer	fe0923ba6c	feat(agent/web): Add browser extensions to deal with cookie walls and ads (#6778) * Add `_sideload_chrome_extensions` subroutine to `open_page_in_browser` in web_selenium.py * Sideloads uBlock Origin and I Still Don't Care About Cookies, downloading them if necessary * Add 2-second delay to `open_page_in_browser` to allow time for handling cookie walls	2024-02-02 18:30:37 +01:00
Reinier van der Leer	dfaeda7cd5	lint(agent/tests): Fix line length in test_utils.py	2024-02-02 18:29:28 +01:00
Reinier van der Leer	9b7fee673e	fix(agent/tests): Update `test_utils.py:test_extract_json_from_response*` in accordance with `956cdc7` Commit `956cdc7` "fix(agent/json_utils): Decode as JSON rather than Python objects" broke these unit tests because they generated "JSON" by stringifying a Python object.	2024-02-02 18:21:19 +01:00
Reinier van der Leer	925269d17b	lint(agent): Fix line length in docstring of `EpisodicActionHistory.handle_compression`	2024-02-02 17:43:42 +01:00
Fernando Navarro Páez	266fe3a3f7	fix(forge): Fix "no module named 'forge.sdk.abilities'" (#6571) Fixes #6537	2024-02-01 11:23:35 +01:00
Reinier van der Leer	66e0c87894	feat(agent): Add history compression to increase longevity and efficiency * Compress steps in the prompt to reduce token usage, and to increase longevity when using models with limited context windows * Move multiple copies of step formatting code to `Episode.format` method * Add `EpisodicActionHistory.handle_compression` method to handle compression of new steps	2024-01-31 17:51:45 +01:00
Reinier van der Leer	55433f468a	feat(agent/web): Improve `read_webpage` information extraction abilities * Implement `extract_information` function in `autogpt.processing.text` module. This function extracts pieces of information from a body of text based on a list of topics of interest. * Add `topics_of_interest` and `get_raw_content` parameters to `read_webpage` command * Limit maximum content length if `get_raw_content=true` is specified	2024-01-31 15:08:08 +01:00
Reinier van der Leer	956cdc77fa	fix(agent/json_utils): Decode as JSON rather than Python objects * Replace `ast.literal_eval` with `json.loads` in `extract_dict_from_response` This fixes a bug where boolean values could not be decoded because of their required capitalization in Python.	2024-01-31 14:15:02 +01:00
Reinier van der Leer	83a0b03523	fix(agent/prompting): Fix representation of (optional) command parameters in prompt	2024-01-31 14:10:22 +01:00
Reinier van der Leer	25b9e290a5	fix(agent/json_utils): Make `extract_dict_from_response` more robust * Accommodate for both ```json and ```JSON blocks in responses	2024-01-29 15:03:09 +01:00
Reinier van der Leer	ab860981d8	feat(agent/llm): Add support for `gpt-4-0125-preview` * Add `gpt-4-0125-preview` model to OpenAI model list * Add `gpt-4-turbo-preview` alias to OpenAI model list	2024-01-29 11:22:32 +01:00
Reinier van der Leer	a0cae78ba3	feat(benchmark): Add `-N`, `--attempts` option for multiple attempts per challenge LLMs are probabilistic systems. Reproducibility of completions is not guaranteed. It only makes sense to account for this, by running challenges multiple times to obtain a success ratio rather than a boolean success/failure result. Changes: - Add `-N`, `--attempts` option to CLI and `attempts_per_challenge` parameter to `main.py:run_benchmark`. - Add dynamic `i_attempt` fixture through `pytest_generate_tests` hook in conftest.py to achieve multiple runs per challenge. - Modify `pytest_runtest_makereport` hook in conftest.py to handle multiple reporting calls per challenge. - Refactor report_types.py, reports.py, process_report.py to allow multiple results per challenge. - Calculate `success_percentage` from results of the current run, rather than all known results ever. - Add docstrings to a number of models in report_types.py. - Allow `None` as a success value, e.g. for runs that did not render any results before being cut off. - Make SingletonReportManager thread-safe.	2024-01-22 17:16:55 +01:00
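
Note on a0cae78ba3: the commit message describes replacing a boolean pass/fail result with a success ratio over several attempts, where an attempt may also be recorded as `None`. A minimal sketch of that calculation might look like the following. It is not taken from the repository: the function name, its signature, and the choice to leave `None` attempts out of the denominator are assumptions made for illustration, and the actual `success_percentage` logic in agbenchmark may differ.

```python
from typing import Optional, Sequence


def success_percentage(results: Sequence[Optional[bool]]) -> Optional[float]:
    """Return the share of successful attempts as a percentage.

    Hypothetical helper, not the agbenchmark implementation.
    Assumption: attempts recorded as None (no result was produced) are
    excluded from the denominator. If no attempt produced a result,
    the function returns None instead of a percentage.
    """
    decided = [r for r in results if r is not None]
    if not decided:
        return None
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(decided) / len(decided)


if __name__ == "__main__":
    # Four attempts at one challenge; the last one was cut off early.
    attempts = [True, False, True, None]
    print(success_percentage(attempts))
```

Returning `None` when nothing was decided keeps "no data" distinct from "0% success", which mirrors the commit's stated reason for allowing `None` as a success value.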


5277 Commits