* [eval] increase timeout for swebench eval init/complete
* allow CmdRunAction to optionally block when .timeout is setted
* fix unit test for serialization
* fix unit tests for security analyzer
* fix integration tests
* add more timeout
* only check P2P when instances are non-empty;
convert P2P and F2P columns to string instead of list
---------
Co-authored-by: Graham Neubig <neubig@gmail.com>
* [eval] increase timeout for swebench eval init/complete
* allow CmdRunAction to optionally block when .timeout is setted
* fix unit test for serialization
* fix unit tests for security analyzer
* fix integration tests
* add more timeout
* Fail the Runtime tests check if the previous jobs were cancelled or failed
* Check only for failures
* Add test exit
* Test cancel
* fix conditions
* fix condition again
* Try another way
* Now test success
* move filematching logic into server
* wait until ready before returning
* show loading message instead of empty
* logspam
* delint
* fix type
* add a few more default ignores
* Reduce runtime tests duration by running them across CPUs
* fix hardcoded image name
* test two cpus
* Test folder change
* Up the CPU to 4 again to test
* Change to 3 CPUs
* Down to 2
* Add param to remove all openhands containers
* Add comment
* Add reruns just in case
* Fix ordering of if
* Add Handling of Cache Prompt When Formatting Messages
* Fix Value for Cache Control
* Fix Value for Cache Control
* Update openhands/core/message.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* Fix lint error
* Serialize Messages if Propt Caching Is Enabled
* Remove formatting message change
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com>
* dummy test change
* regen yml: 1st install python 3.11, then poetry
* fix caching for poetry; old entry for python was rather useless
* fix steps order (cache before poetry)
* add poetry caching to ghcr_runtime; fix fork conditions
* ghcr_runtime: more caching actions; condition fixes
* fix interim action error (order of steps)
* cache@v4 instead of v3
* fixed interim typo for 2 fork conditions
* runtime/test_env_vars: compacted multiple tests into one to reduce time
* ugh if fork condition changes again
* (feat) making prompt caching optional instead of enabled default
At present, only the Claude models support prompt caching as a experimental feature, therefore, this feature should be implemented as an optional setting rather than being enabled by default.
Signed-off-by: Yi Lin <teroincn@gmail.com>
* handle the conflict
* fix unittest mock return value
* fix lint error in whitespace
---------
Signed-off-by: Yi Lin <teroincn@gmail.com>
* Update docs on LLM providers for consistency
* Update headless command
* minor tweaks based on feedback
---------
Co-authored-by: Robert Brennan <contact@rbren.io>
Co-authored-by: Robert Brennan <accounts@rbren.io>
* feat: add SWE-bench fullset support
* fix instance image list
* update eval script and documentation
* increase timeout for remote runtime
* add push script
* handle the case when ret push is an generator
* update pbar
* set SWE-Bench default to run SWE-Bench lite
* add script to cleanup remote runtime
* fix the cases when tag is too long
* update README
* update readme for cleanup
* rename od to oh
* Update evaluation/swe_bench/README.md
Co-authored-by: Graham Neubig <neubig@gmail.com>
* Update evaluation/swe_bench/README.md
Co-authored-by: Graham Neubig <neubig@gmail.com>
* Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh
Co-authored-by: Graham Neubig <neubig@gmail.com>
* Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh
Co-authored-by: Graham Neubig <neubig@gmail.com>
* Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh
Co-authored-by: Graham Neubig <neubig@gmail.com>
* gets API key and Runtime from env var
---------
Co-authored-by: Graham Neubig <neubig@gmail.com>
* update badges
* fix badges
* better badges
* move credits
* more badge work
* add gh logo
* update some copy
* update logo
* fix height
* update text
* emdash
* remove cruft
* move title
* update links
* add hr
* white logo
* move some stuff to getting-started
* revert logo
* more copy changes
* minor tweaks
* fix sidebar
* explicit sidebar
* words
* fix tag
* fix how-to
* more docs work
* update styles
* fix up custom sandbox docs
* change eval title
* fix up getting-started
* fix getting started
* update to 0.9.2
* update screenshot
* add company link
* fix dark mode
* minor fixes
* update image
* update headless and cli docs
* update readme
* fix links
* revert package
* rename links
* fix links
* fix link
* chagne to claude
* Add documentation for CLI mode
Fixes#3703
Add documentation for CLI mode in OpenHands.
* **New Documentation**: Add `docs/modules/usage/how-to/cli-mode.md` to document CLI mode.
- Include instructions on starting an interactive OpenHands session via the command line.
- Explain the difference between CLI mode and headless mode.
- Provide examples of CLI commands and expected outputs.
* **Update Existing Documentation**: Modify `docs/modules/usage/how-to/headless-mode.md`.
- Clarify the difference between headless mode and CLI mode.
- Add a reference to the new CLI mode documentation.
* Update cli-mode.md
* Update headless-mode.md
---------
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: tofarr <tofarr@gmail.com>
* update badges
* fix badges
* better badges
* move credits
* more badge work
* add gh logo
* update some copy
* update logo
* fix height
* update text
* emdash
* remove cruft
* move title
* update links
* add hr
* white logo
* move some stuff to getting-started
* revert logo
* more copy changes
* minor tweaks
* words
* fix getting started
* update to 0.9.2
* update screenshot
* minor fixes
* CodeActAgent: fix message prep if prompt caching is not supported
* fix python version in regen tests workflow
* fix in conftest "mock_completion" method
* add disable_vision to LLMConfig; revert change in message parsing in llm.py
* format messages in several files for completion
* refactored message(s) formatting (llm.py); added vision_is_active()
* fix a unit test
* regenerate: added LOG_TO_FILE and FORCE_REGENERATE env flags
* try to fix path to logs folder in workflow
* llm: prevent index error
* try FORCE_USE_LLM in regenerate
* tweaks everywhere...
* fix 2 random unit test errors :(
* added FORCE_REGENERATE_TESTS=true to regenerate CLI
* fix test_lint_file_fail_typescript again
* double-quotes for env vars in workflow; llm logger set to debug
* fix typo in regenerate
* regenerate iterations now 20; applied iteration counter fix by Li
* regenerate: pass FORCE_REGENERATE flag into env
* fixes for int tests. several mock files updated.
* browsing_agent: fix response_parser.py adding ) to empty response
* test_browse_internet: fix skipif and revert obsolete mock files
* regenerate: fi bracketing for http server start/kill conditions
* disable test_browse_internet for CodeAct*Agents; mock files updated after merge
* missed to include more mock files earlier
* reverts after review feedback from Li
* forgot one
* browsing agent test, partial fixes and updated mock files
* test_browse_internet works in my WSL now!
* adapt unit test test_prompt_caching.py
* add DEBUG to regenerate workflow command
* convert regenerate workflow params to inputs
* more integration test mock files updated
* more files
* test_prompt_caching: restored test_prompt_caching_headers purpose
* file_ops: fix potential exception, like "cross device copy"; fixed mock files accordingly
* reverts/changes wrt feedback from xingyao
* updated docs and config template
* code cleanup wrt review feedback
* Catch exception and return finish action with an exception message in case of exception in llm completion
* Remove exception logs
* Raise llm response error for any exception in llm completion
* Raise LLMResponseError from async completion and async streaming completion as well
* feat: add SWE-bench fullset support
* fix instance image list
* update eval script and documentation
* increase timeout for remote runtime
* add push script
* handle the case when ret push is an generator
* update pbar
* set SWE-Bench default to run SWE-Bench lite
* feat: add SWE-bench fullset support
* fix instance image list
* update eval script and documentation
* add push script
* handle the case when ret push is an generator
* update pbar
Fix a potential issue that might lead to file corruption when edit linting is enabled
#3124 introduces a feature for editing: running linter twice before and after the change and only extract new errors introduced by the agent. This has some potential issues and I am working on #3649 to address them, but I feel like I am not gonna finish it in the next few days, and that PR has become harder and harder to review, thus this PR, which only focuses on a small improvement.
So what's the issue? When we run linters on the original file before our edits, we need to copy the original file and use a temporary file to lint, because linting may have side-effect (e.g. modifying the file in-place). I used the word "may" because:
Flake8 has no side-effect, so not a problem as of now.
We don't enforce this or document this "no side-effect" as a requirement for linter implementation, so side-effect is allowed.
Regardless, the "after-edit-linting" uses the same approach: backup the file before linting to avoid data corruption. We should keep our "before-edit-linting" consistent.
Why no new unittest that reproduces the issue? Well, as I have mentioned earlier, flake8 has no side-effect, so technically it's not a bug but a flaw. Therefore, there's no way to write a test that reproduces the issue.
* improve file editing prompts and unit test
converted most raise calls to a _output_error call in file_ops.py
* tweaks in test_agent_skill.py wrt to SEP separator
* tweaked the separator
* remove server runtime remnants and TEST_RUNTIME references
* restore use of TEST_RUNTIME args and variables
* fix integration tests
* added hint to properly escape docstrings
* revert latest prompt change
---------
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
value:Thank you for taking the time to fill out this bug report. We greatly appreciate your effort to complete this template fully. Please provide as much information as possible to help us understand and address the issue effectively.
value:Thank you for taking the time to fill out this bug report. Please provide as much information as possible to help us understand and address the issue effectively.
- type:checkboxes
attributes:
label:Is there an existing issue for the same bug?
description:Please check if an issue already exists for the bug you encountered.
options:
- label:I have checked the troubleshooting document at https://docs.all-hands.dev/modules/usage/troubleshooting
required:true
- label:I have checked the existing issues.
required:true
- type:textarea
id:bug-description
attributes:
label:Describe the bug
description:Provide a short description of the problem.
label:Describe the bug and reproduction steps
description:Provide a description of the issue along with any reproduction steps.
validations:
required:true
- type:textarea
id:current-version
- type:dropdown
id:installation
attributes:
label:Current OpenHands version
description:What version of OpenHands are you using? If you're running in docker, tell us the tag you're using (e.g. ghcr.io/all-hands-ai/openhands:0.3.1).
render:bash
validations:
required:true
label:OpenHands Installation
description:How are you running OpenHands?
options:
- Docker command in README
- GitHub resolver
- Development workflow
- app.all-hands.dev
- Other
default:0
- type:textarea
id:config
- type:input
id:openhands-version
attributes:
label:Installation and Configuration
description:Please provide any commands you ran and any configuration (redacting API keys)
render:bash
validations:
required:true
label:OpenHands Version
description:What version of OpenHands are you using?
placeholder:ex. 0.9.8, main, etc.
- type:textarea
id:model-agent
attributes:
label:Model and Agent
description:What model and agent are you using? You can see these settings in the UI by clicking the settings wheel.
placeholder:|
- Model:
- Agent:
- type:textarea
id:os-version
- type:dropdown
id:os
attributes:
label:Operating System
description:What Operating System are you using? Linux, Mac OS, WSL on Windows
- type:textarea
id:repro-steps
attributes:
label:Reproduction Steps
description:Please list the steps to reproduce the issue.
placeholder:|
1.
2.
3.
options:
- MacOS
- Linux
- WSL on Windows
- type:textarea
id:additional-context
attributes:
label:Logs, Errors, Screenshots, and Additional Context
description:If you want to share the chat history you can click the thumbs-down (👎) button above the input field and you will get a shareable link (you can also click thumbs up when things are going well of course!). LLM logs will be stored in the `logs/llm/default` folder. Please add any additional context about the problem here.
description:Please provide any additional information you think might help. If you want to share the chat history
you can click the thumbs-down (👎) button above the input field and you will get a shareable link
(you can also click thumbs up when things are going well of course!). LLM logs will be stored in the
`logs/llm/default` folder. Please add any additional context about the problem here.
poetry run ./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval HEAD CodeActAgent 300 30 $N_PROCESSES "princeton-nlp/SWE-bench_Lite" test
OUTPUT_FOLDER=$(find evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Lite-test/CodeActAgent -name "deepseek-chat_maxiter_50_N_*-no-hint-run_1" -type d | head -n 1)
echo "OUTPUT_FOLDER for SWE-bench evaluation: $OUTPUT_FOLDER"
poetry run ./evaluation/benchmarks/swe_bench/scripts/eval_infer_remote.sh $OUTPUT_FOLDER/output.jsonl $N_PROCESSES "princeton-nlp/SWE-bench_Lite" test
poetry run ./evaluation/benchmarks/swe_bench/scripts/eval/summarize_outputs.py $OUTPUT_FOLDER/output.jsonl > summarize_outputs.log 2>&1
echo "SWEBENCH_REPORT<<EOF" >> $GITHUB_ENV
cat summarize_outputs.log >> $GITHUB_ENV
echo "EOF" >> $GITHUB_ENV
- name:Create tar.gz of evaluation outputs
run:|
TIMESTAMP=$(date +'%y-%m-%d-%H-%M')
tar -czvf evaluation_outputs_${TIMESTAMP}.tar.gz evaluation/evaluation_outputs/outputs
poetry run ./evaluation/integration_tests/scripts/run_infer.sh llm.eval HEAD CodeActAgent '' 10 $N_PROCESSES '' 'haiku_run'
# get integration tests report
REPORT_FILE_HAIKU=$(find evaluation/evaluation_outputs/outputs/integration_tests/CodeActAgent/*haiku*_maxiter_10_N* -name "report.md" -type f | head -n 1)
- name:Run integration test evaluation for DeepSeek
env:
SANDBOX_FORCE_REBUILD_RUNTIME:True
run:|
poetry run ./evaluation/integration_tests/scripts/run_infer.sh llm.eval HEAD CodeActAgent '' 10 $N_PROCESSES '' 'deepseek_run'
# get integration tests report
REPORT_FILE_DEEPSEEK=$(find evaluation/evaluation_outputs/outputs/integration_tests/CodeActAgent/deepseek*_maxiter_10_N* -name "report.md" -type f | head -n 1)
- name:Run integration test evaluation for VisualBrowsingAgent (DeepSeek)
env:
SANDBOX_FORCE_REBUILD_RUNTIME:True
run:|
poetry run ./evaluation/integration_tests/scripts/run_infer.sh llm.eval HEAD VisualBrowsingAgent '' 15 $N_PROCESSES "t05_simple_browsing,t06_github_pr_browsing.py" 'visualbrowsing_deepseek_run'
# Find and export the visual browsing agent test results
REPORT_FILE_VISUALBROWSING_DEEPSEEK=$(find evaluation/evaluation_outputs/outputs/integration_tests/VisualBrowsingAgent/deepseek*_maxiter_15_N* -name "report.md" -type f | head -n 1)
cd evaluation/evaluation_outputs/outputs # Change to the outputs directory
tar -czvf ../../../integration_tests_${TIMESTAMP}.tar.gz integration_tests/CodeActAgent/* integration_tests/VisualBrowsingAgent/* # Only include the actual result directories
body: `[OpenHands](https://github.com/All-Hands-AI/OpenHands) started fixing the ${issueType}! You can monitor the progress [here](https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}).`
body: `A potential fix has been generated and a draft PR #${prNumber} has been created. Please review the changes.`
});
process.env.AGENT_RESPONDED = 'true';
} else if (!success && branchName) {
let commentBody = `An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named '${branchName}' has been created with the attempted changes. You can view the branch [here](https://github.com/${context.repo.owner}/${context.repo.repo}/tree/${branchName}). Manual intervention may be required.`;
if (resultExplanation) {
commentBody += `\n\nAdditional details about the failure:\n${resultExplanation}`;
}
github.rest.issues.createComment({
issue_number: issueNumber,
owner: context.repo.owner,
repo: context.repo.repo,
body: commentBody
});
process.env.AGENT_RESPONDED = 'true';
}
# Leave error comment when both PR/Issue comment handling fail
- name:Fallback Error Comment
uses:actions/github-script@v7
if:${{ env.AGENT_RESPONDED == 'false' }}# Only run if no conditions were met in previous steps
body: `The workflow to fix this issue encountered an error. Please check the [workflow logs](https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}) for more information.`
echo "Your coworker wants to apply a pull request to this project." > task.txt
echo "Read and review ${{ github.event.pull_request.number }}.diff file. Create a review-${{ github.event.pull_request.number }}.txt and write your concise comments and suggestions there." >> task.txt
echo "Do not ask me for confirmation at any point." >> task.txt
The core AI entity in OpenHands that can perform software development tasks by interacting with tools, browsing the web, and modifying code.
#### Agent Controller
A component that manages the agent's lifecycle, handles its state, and coordinates interactions between the agent and various tools.
#### Agent Delegation
The ability of an agent to hand off specific tasks to other specialized agents for better task completion.
#### Agent Hub
A central registry of different agent types and their capabilities, allowing for easy agent selection and instantiation.
#### Agent Skill
A specific capability or function that an agent can perform, such as file manipulation, web browsing, or code editing.
#### Agent State
The current context and status of an agent, including its memory, active tools, and ongoing tasks.
#### CodeAct Agent
[A generalist agent in OpenHands](https://arxiv.org/abs/2407.16741) designed to perform tasks by editing and executing code.
### Browser
A system for web-based interactions and tasks.
#### Browser Gym
A testing and evaluation environment for browser-based agent interactions and tasks.
#### Web Browser Tool
A tool that enables agents to interact with web pages and perform web-based tasks.
### Commands
Terminal and execution related functionality.
#### Bash Session
A persistent terminal session that maintains state and history for bash command execution.
This uses tmux under the hood.
### Configuration
System-wide settings and options.
#### Agent Configuration
Settings that define an agent's behavior, capabilities, and limitations, including available tools and runtime settings.
#### Configuration Options
Settings that control various aspects of OpenHands behavior, including runtime, security, and agent settings.
#### LLM Config
Configuration settings for language models used by agents, including model selection and parameters.
#### LLM Draft Config
Settings for draft mode operations with language models, typically used for faster, lower-quality responses.
#### Runtime Configuration
Settings that define how the runtime environment should be set up and operated.
#### Security Options
Configuration settings that control security features and restrictions.
### Conversation
A sequence of interactions between a user and an agent, including messages, actions, and their results.
#### Conversation Info
Metadata about a conversation, including its status, participants, and timeline.
#### Conversation Manager
A component that handles the creation, storage, and retrieval of conversations.
#### Conversation Metadata
Additional information about conversations, such as tags, timestamps, and related resources.
#### Conversation Status
The current state of a conversation, including whether it's active, completed, or failed.
#### Conversation Store
A storage system for maintaining conversation history and related data.
### Events
#### Event
Every Conversation comprises a series of Events. Each Event is either an Action or an Observation.
#### Event Stream
A continuous flow of events that represents the ongoing activities and interactions in the system.
#### Action
A specific operation or command that an agent executes through available tools, such as running a command or editing a file.
#### Observation
The response or result returned by a tool after an agent's action, providing feedback about the action's outcome.
### Interface
Different ways to interact with OpenHands.
#### CLI Mode
A command-line interface mode for interacting with OpenHands agents without a graphical interface.
#### GUI Mode
A graphical user interface mode for interacting with OpenHands agents through a web interface.
#### Headless Mode
A mode of operation where OpenHands runs without a user interface, suitable for automation and scripting.
### Agent Memory
The system that decides which parts of the Event Stream (i.e. the conversation history) should be passed into each LLM prompt.
#### Memory Store
A storage system for maintaining agent memory and context across sessions.
#### Condenser
A component that processes and summarizes conversation history to maintain context while staying within token limits.
#### Truncation
A very simple Condenser strategy. Reduces conversation history or content to stay within token limits.
### Microagent
A specialized prompt that enhances OpenHands with domain-specific knowledge, repository-specific context, and task-specific workflows.
#### Microagent Registry
A central repository of available microagents and their configurations.
#### Public Microagent
A general-purpose microagent available to all OpenHands users, triggered by specific keywords.
#### Repository Microagent
A type of microagent that provides repository-specific context and guidelines, stored in the `.openhands/microagents/` directory.
### Prompt
Components for managing and processing prompts.
#### Prompt Caching
A system for caching and reusing common prompts to improve performance.
#### Prompt Manager
A component that handles the loading, processing, and management of prompts used by agents, including microagents.
#### Response Parsing
The process of interpreting and structuring responses from language models and tools.
### Runtime
The execution environment where agents perform their tasks, which can be local, remote, or containerized.
#### Action Execution Server
A REST API that receives agent actions (e.g. bash commands, python code, browsing actions), executes them in the runtime environment, and returns the results.
#### Action Execution Client
A component that handles the execution of actions in the runtime environment, managing the communication between the agent and the runtime.
#### Docker Runtime
A containerized runtime environment that provides isolation and reproducibility for agent operations.
#### E2B Runtime
A specialized runtime environment built on E2B for secure and isolated code execution.
#### Local Runtime
A runtime environment that executes on the local machine, suitable for development and testing.
#### Modal Runtime
A runtime environment built on Modal for scalable and distributed agent operations.
#### Remote Runtime
A sandboxed environment that executes code and commands remotely, providing isolation and security for agent operations.
#### Runtime Builder
A component that builds a Docker image for the Action Execution Server based on a user-specified base image.
### Security
Security-related components and features.
#### Security Analyzer
A component that checks agent actions for potential security risks.
@@ -18,24 +18,24 @@ diverse, inclusive, and healthy community.
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Demonstrating empathy and kindness toward other people.
* Being respectful of differing opinions, viewpoints, and experiences.
* Giving and gracefully accepting constructive feedback.
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
and learning from the experience.
* Focusing on what is best not just for us as individuals, but for the overall
community
community.
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
any kind.
* Trolling, insulting or derogatory comments, and personal or political attacks.
* Public or private harassment.
* Publishing others' private information, such as a physical or email address,
without their explicit permission
without their explicit permission.
* Other conduct which could reasonably be considered inappropriate in a
professional setting
professional setting.
## Enforcement Responsibilities
@@ -61,7 +61,7 @@ representative at an online or offline event.
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
contact@all-hands.dev
contact@all-hands.dev.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
@@ -113,6 +113,20 @@ individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
### Slack and Discord Etiquettes
These Slack and Discord etiquette guidelines are designed to foster an inclusive, respectful, and productive environment for all community members. By following these best practices, we ensure effective communication and collaboration while minimizing disruptions. Let’s work together to build a supportive and welcoming community!
- Communicate respectfully and professionally, avoiding sarcasm or harsh language, and remember that tone can be difficult to interpret in text.
- Use threads for specific discussions to keep channels organized and easier to follow.
- Tag others only when their input is critical or urgent, and use @here, @channel or @everyone sparingly to minimize disruptions.
- Be patient, as open-source contributors and maintainers often have other commitments and may need time to respond.
- Post questions or discussions in the most relevant channel (e.g., for [slack - #general](https://app.slack.com/client/T06P212QSEA/C06P5NCGSFP) for general topics, [slack - #questions](https://openhands-ai.slack.com/archives/C06U8UTKSAD) for queries/questions, [discord - #general](https://discord.com/channels/1222935860639563850/1222935861386018885)).
- When asking for help or raising issues, include necessary details like links, screenshots, or clear explanations to provide context.
- Keep discussions in public channels whenever possible to allow others to benefit from the conversation, unless the matter is sensitive or private.
- Always adhere to [our standards](https://github.com/All-Hands-AI/OpenHands/blob/main/CODE_OF_CONDUCT.md#our-standards) to ensure a welcoming and collaborative environment.
- If you choose to mute a channel, consider setting up alerts for topics that still interest you to stay engaged. For Slack, Go to Settings → Notifications → My Keywords to add specific keywords that will notify you when mentioned. For example, if you're here for discussions about LLMs, mute the channel if it’s too busy, but set notifications to alert you only when “LLMs” appears in messages. Also for Discord, go to the channel notifications and choose the option that best describes your need.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
Thanks for your interest in contributing to OpenHands! We welcome and appreciate contributions.
## Understanding OpenHands's CodeBase
To understand the codebase, please refer to the README in each module:
- [frontend](./frontend/README.md)
- [evaluation](./evaluation/README.md)
- [openhands](./openhands/README.md)
- [agenthub](./openhands/agenthub/README.md)
- [server](./openhands/server/README.md)
## Setting up Your Development Environment
We have a separate doc [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) that tells you how to set up a development workflow.
## How Can I Contribute?
There are many ways that you can contribute:
1.**Download and use** OpenHands, and send [issues](https://github.com/All-Hands-AI/OpenHands/issues) when you encounter something that isn't working or a feature that you'd like to see.
2.**Send feedback** after each session by [clicking the thumbs-up thumbs-down buttons](https://docs.all-hands.dev/modules/usage/feedback), so we can see where things are working and failing, and also build an open dataset for training code agents.
3.**Improve the Codebase** by sending PRs (see details below). In particular, we have some [good first issue](https://github.com/All-Hands-AI/OpenHands/labels/good%20first%20issue) issues that may be ones to start on.
3.**Improve the Codebase** by sending [PRs](#sending-pull-requests-to-openhands) (see details below). In particular, we have some [good first issues](https://github.com/All-Hands-AI/OpenHands/labels/good%20first%20issue) that may be ones to start on.
## Understanding OpenHands's CodeBase
## What Can I Build?
Here are a few ways you can help improve the codebase.
To understand the codebase, please refer to the README in each module:
- [frontend](./frontend/README.md)
- [agenthub](./agenthub/README.md)
- [evaluation](./evaluation/README.md)
- [openhands](./openhands/README.md)
- [server](./openhands/server/README.md)
#### UI/UX
We're always looking to improve the look and feel of the application. If you've got a small fix
for something that's bugging you, feel free to open up a PR that changes the [`./frontend`](./frontend) directory.
When you write code, it is also good to write tests. Please navigate to the `tests` folder to see existing test suites.
At the moment, we have two kinds of tests: `unit` and `integration`. Please refer to the README for each test suite. These tests also run on GitHub's continuous integration to ensure quality of the project.
If you're looking to make a bigger change, add a new UI element, or significantly alter the style
of the application, please open an issue first, or better, join the #frontend channel in our Slack
to gather consensus from our design team first.
#### Improving the agent
Our main agent is the CodeAct agent. You can [see its prompts here](https://github.com/All-Hands-AI/OpenHands/tree/main/openhands/agenthub/codeact_agent).
Changes to these prompts, and to the underlying behavior in Python, can have a huge impact on user experience.
You can try modifying the prompts to see how they change the behavior of the agent as you use the app
locally, but we will need to do an end-to-end evaluation of any changes here to ensure that the agent
is getting better over time.
We use the [SWE-bench](https://www.swebench.com/) benchmark to test our agent. You can join the #evaluation
channel in Slack to learn more.
#### Adding a new agent
You may want to experiment with building new types of agents. You can add an agent to [`openhands/agenthub`](./openhands/agenthub)
to help expand the capabilities of OpenHands.
#### Adding a new runtime
The agent needs a place to run code and commands. When you run OpenHands on your laptop, it uses a Docker container
to do this by default. But there are other ways of creating a sandbox for the agent.
If you work for a company that provides a cloud-based runtime, you could help us add support for that runtime
by implementing the [interface specified here](https://github.com/All-Hands-AI/OpenHands/blob/main/openhands/runtime/base.py).
#### Testing
When you write code, it is also good to write tests. Please navigate to the [`./tests`](./tests) folder to see existing test suites.
At the moment, we have two kinds of tests: [`unit`](./tests/unit) and [`integration`](./evaluation/integration_tests). Please refer to the README for each test suite. These tests also run on GitHub's continuous integration to ensure quality of the project.
## Sending Pull Requests to OpenHands
### 1. Fork the Official Repository
Fork the [OpenHands repository](https://github.com/All-Hands-AI/OpenHands) into your own account.
Clone your own forked repository into your local environment:
You'll need to fork our repository to send us a Pull Request. You can learn more
about how to fork a GitHub repo and open a PR with your changes in [this article](https://medium.com/swlh/forks-and-pull-requests-how-to-contribute-to-github-repos-8843fac34ce8).
Set the official repository as your [upstream](https://www.atlassian.com/git/tutorials/git-forks-and-upstreams) to synchronize with the latest update in the official repository.
You should see both `origin` and `upstream` in the output.
### 3. Synchronize with Official Repository
Synchronize latest commit with official repository before coding:
```shell
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
```
### 4. Set up the Development Environment
We have a separate doc [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) that tells you how to set up a development workflow.
### 5. Write Code and Commit It
Once you have done this, you can write code, test it, and commit it to a branch (replace `my_branch` with an appropriate name):
```shell
git checkout -b my_branch
git add .
git commit
git push origin my_branch
```
### 6. Open a Pull Request
* On GitHub, go to the page of your forked repository, and create a Pull Request:
- Click on `Branches`
- Click on the `...` beside your branch and click on `New pull request`
- Set `base repository` to `All-Hands-AI/OpenHands`
- Set `base` to `main`
- Click `Create pull request`
The PR should appear in [OpenHands PRs](https://github.com/All-Hands-AI/OpenHands/pulls).
Then the OpenHands team will review your code.
## PR Rules
### 1. Pull Request title
### Pull Request title
As described [here](https://github.com/commitizen/conventional-commit-types/blob/master/index.json), a valid PR title should begin with one of the following prefixes:
-`feat`: A new feature
@@ -111,6 +86,38 @@ For example, a PR title could be:
You may also check out previous PRs in the [PR list](https://github.com/All-Hands-AI/OpenHands/pulls).
### 2. Pull Request description
### Pull Request description
- If your PR is small (such as a typo fix), you can go brief.
- If it contains a lot of changes, it's better to write more details.
If your changes are user-facing (e.g. a new feature in the UI, a change in behavior, or a bugfix)
please include a short message that we can add to our changelog.
## How to Make Effective Contributions
### Opening Issues
If you notice any bugs or have any feature requests please open them via the [issues page](https://github.com/All-Hands-AI/OpenHands/issues). We will triage based on how critical the bug is or how potentially useful the improvement is, discuss, and implement the ones that the community has interest/effort for.
Further, if you see an issue you like, please leave a "thumbs-up" or a comment, which will help us prioritize.
### Making Pull Requests
We're generally happy to consider all pull requests with the evaluation process varying based on the type of change:
#### For Small Improvements
Small improvements with few downsides are typically reviewed and approved quickly.
One thing to check when making changes is to ensure that all continuous integration tests pass, which you can check before getting a review.
#### For Core Agent Changes
We need to be more careful with changes to the core agent, as it is imperative to maintain high quality. These PRs are evaluated based on three key metrics:
1.**Accuracy**
2.**Efficiency**
3.**Code Complexity**
If it improves accuracy, efficiency, or both with only a minimal change to code quality, that's great we're happy to merge it in!
If there are bigger tradeoffs (e.g. helping efficiency a lot and hurting accuracy a little) we might want to put it behind a feature flag.
Either way, please feel free to discuss on github issues or slack, and we will give guidance and preliminary feedback.
We would like to thank all the [contributors](https://github.com/All-Hands-AI/OpenHands/graphs/contributors) who have helped make OpenHands possible. Your dedication and hard work are greatly appreciated.
We would like to thank all the [contributors](https://github.com/All-Hands-AI/OpenHands/graphs/contributors) who have helped make OpenHands possible. We greatly appreciate your dedication and hard work.
## Open Source Projects
@@ -10,7 +10,7 @@ OpenHands includes and adapts the following open source projects. We are gratefu
@@ -3,14 +3,16 @@ This guide is for people working on OpenHands and editing the source code.
If you wish to contribute your changes, check out the [CONTRIBUTING.md](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md) on how to clone and setup the project initially before moving on.
Otherwise, you can clone the OpenHands project directly.
## Start the server for development
## Start the Server for Development
### 1. Requirements
* Linux, Mac OS, or [WSL on Windows](https://learn.microsoft.com/en-us/windows/wsl/install) [Ubuntu <= 22.04]
* Linux, Mac OS, or [WSL on Windows](https://learn.microsoft.com/en-us/windows/wsl/install) [Ubuntu >= 22.04]
* [Docker](https://docs.docker.com/engine/install/) (For those on MacOS, make sure to allow the default Docker socket to be used from advanced settings!)
OpenHands supports a diverse array of Language Models (LMs) through the powerful [litellm](https://docs.litellm.ai) library. By default, we've chosen the mighty GPT-4 from OpenAI as our go-to model, but the world is your oyster! You can unleash the potential of Anthropic's suave Claude, the enigmatic Llama, or any other LM that piques your interest.
OpenHands supports a diverse array of Language Models (LMs) through the powerful [litellm](https://docs.litellm.ai) library.
By default, we've chosen Claude Sonnet 3.5 as our go-to model, but the world is your oyster! You can unleash the
potential of any other LM that piques your interest.
To configure the LM of your choice, run:
@@ -50,14 +54,11 @@ To configure the LM of your choice, run:
Some alternative models may prove more challenging to tame than others. Fear not, brave adventurer! We shall soon unveil LLM-specific documentation to guide you on your quest.
And if you've already mastered the art of wielding a model other than OpenAI's GPT, we encourage you to share your setup instructions with us by creating instructions and adding it [to our documentation](https://github.com/All-Hands-AI/OpenHands/tree/main/docs/modules/usage/llms).
For a full list of the LM providers and models available, please consult the [litellm documentation](https://docs.litellm.ai/docs/providers).
See [our documentation](https://docs.all-hands.dev/modules/usage/llms) for recommended models.
### 4. Running the application
#### Option A: Run the Full Application
Once the setup is complete, launching OpenHands is as simple as running a single command. This command starts both the backend and frontend servers seamlessly, allowing you to interact with OpenHands:
Once the setup is complete, this command starts both the backend and frontend servers, allowing you to interact with OpenHands:
```bash
make run
```
@@ -74,11 +75,11 @@ make run
```
### 6. LLM Debugging
If you encounter any issues with the Language Model (LM) or you're simply curious, you can inspect the actual LLM prompts and responses. To do so, export DEBUG=1 in the environment and restart the backend.
OpenHands will then log the prompts and responses in the logs/llm/CURRENT_DATE directory, allowing you to identify the causes.
If you encounter any issues with the Language Model (LM) or you're simply curious, export DEBUG=1 in the environment and restart the backend.
OpenHands will log the prompts and responses in the logs/llm/CURRENT_DATE directory, allowing you to identify the causes.
### 7. Help
Need assistance or information on available targets and commands? The help command provides all the necessary guidance to ensure a smooth experience with OpenHands.
Need help or info on available targets and commands? Use the help command for all the guidance you need with OpenHands.
```bash
make help
```
@@ -91,9 +92,37 @@ To run tests, refer to the following:
poetry run pytest ./tests/unit/test_*.py
```
#### Integration tests
Please refer to [this README](./tests/integration/README.md) for details.
### 9. Add or update dependency
1. Add your dependency in `pyproject.toml` or use `poetry add xxx`
2. Update the poetry.lock file via `poetry lock --no-update`
1. Add your dependency in `pyproject.toml` or use `poetry add xxx`.
2. Update the poetry.lock file via `poetry lock --no-update`.
### 9. Use existing Docker image
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker container image by
setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
These are the procedures and guidelines on how issues are triaged in this repo by the maintainers.
## General
*Most issues must be tagged with **enhancement** or **bug**
* Issues may be tagged with what it relates to (**backend**, **frontend**, **agent quality**, etc.)
*All issues must be tagged with **enhancement**, **bug** or **troubleshooting/help**.
* Issues may be tagged with what it relates to (**agent quality**, **frontend**, **resolver**, etc.).
## Severity
* **Low**: Minor issues, single user report
* **Medium**: Affecting multiple users
* **Critical**: Affecting all users or potential security issues
* **Low**: Minor issues or affecting single user.
* **Medium**: Affecting multiple users.
* **High**: High visibility issues or affecting many users.
* **Critical**: Affecting all users or potential security issues.
## Effort
* Issues may be estimated with effort required (**small effort**, **medium effort**, **large effort**)
* Issues may be estimated with effort required (**small effort**, **medium effort**, **large effort**).
## Difficulty
* Issues with low implementation difficulty may be tagged with **good first issue**
* Issues with low implementation difficulty may be tagged with **good first issue**.
## Not Enough Information
* User is asked to provide more information (logs, how to reproduce, etc.) when the issue is not clear
* If an issue is unclear and the author does not provide more information or respond to a request, the issue may be closed as **not planned** (Usually after a week)
* User is asked to provide more information (logs, how to reproduce, etc.) when the issue is not clear.
* If an issue is unclear and the author does not provide more information or respond to a request,
the issue may be closed as **not planned** (Usually after a week).
## Multiple Requests/Fixes in One Issue
* These issues will be narrowed down to one request/fix so the issue is more easily tracked and fixed
* Issues may be broken down into multiple issues if required
* These issues will be narrowed down to one request/fix so the issue is more easily tracked and fixed.
* Issues may be broken down into multiple issues if required.
## Stale and Auto Closures
* In order to keep a maintainable backlog, issues that have no activity within 30 days are automatically marked as **Stale**.
* If issues marked as **Stale** continue to have no activity for 7 more days, they will automatically be closed as not planned.
* Issues may be reopened by maintainers if deemed important.
<h1 align="center">OpenHands: Code Less, Make More</h1>
<a href="https://docs.all-hands.dev/modules/usage/intro"><img src="https://img.shields.io/badge/Documentation-OpenHands-blue?logo=googledocs&logoColor=white&style=for-the-badge" alt="Check out the documentation"></a>
<a href="https://arxiv.org/abs/2407.16741"><img src="https://img.shields.io/badge/Paper-%20on%20Arxiv-red?logo=arxiv&style=for-the-badge" alt="Paper on Arxiv"></a>
<a href="https://docs.all-hands.dev/modules/usage/getting-started"><img src="https://img.shields.io/badge/Documentation-000?logo=googledocs&logoColor=FFE165&style=for-the-badge" alt="Check out the documentation"></a>
<a href="https://arxiv.org/abs/2407.16741"><img src="https://img.shields.io/badge/Paper%20on%20Arxiv-000?logoColor=FFE165&logo=arxiv&style=for-the-badge" alt="Paper on Arxiv"></a>
Welcome to OpenHands, a platform for autonomous software engineers, powered by AI and LLMs (previously called "OpenDevin").
Welcome to OpenHands (formerly OpenDevin), a platform for software development agents powered by AI.
OpenHands agents collaborate with human developers to write code, fix bugs, and ship features.
OpenHands agents can do anything a human developer can: modify code, run commands, browse the web,
call APIs, and yes—even copy code snippets from StackOverflow.
Learn more at [docs.all-hands.dev](https://docs.all-hands.dev), or jump to the [Quick Start](#-quick-start).
> [!IMPORTANT]
> Using OpenHands for work? We'd love to chat! Fill out
> [this short form](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform)
> to join our Design Partner program, where you'll get early access to commercial features and the opportunity to provide input on our product roadmap.
> This command pulls the `0.9` tag, which represents the most recent stable release of OpenHands. You have other options as well:
> - For a specific release version, use `ghcr.io/all-hands-ai/openhands:<OpenHands_version>` (replace <OpenHands_version> with the desired version number).
> - For the most up-to-date development version, use `ghcr.io/all-hands-ai/openhands:main`. This version may be **(unstable!)** and is recommended for testing or development purposes only.
You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!
Finally, you'll need a model provider and API key.
[Anthropic's Claude 3.5 Sonnet](https://www.anthropic.com/api) (`anthropic/claude-3-5-sonnet-20241022`)
works best, but you have [many options](https://docs.all-hands.dev/modules/usage/llms).
---
You can also [connect OpenHands to your local filesystem](https://docs.all-hands.dev/modules/usage/runtimes#connecting-to-your-filesystem),
run OpenHands in a scriptable [headless mode](https://docs.all-hands.dev/modules/usage/how-to/headless-mode),
interact with it via a [friendly CLI](https://docs.all-hands.dev/modules/usage/how-to/cli-mode),
or run it on tagged issues with [a github action](https://docs.all-hands.dev/modules/usage/how-to/github-action).
Visit [Running OpenHands](https://docs.all-hands.dev/modules/usage/installation) for more information and setup instructions.
> [!CAUTION]
> OpenHands is meant to be run by a single user on their local workstation.
> It is not appropriate for multi-tenant deployments where multiple users share the same instance. There is no built-in isolation or scalability.
>
> Choose the tag that best suits your needs based on stability requirements and desired features.
> If you're interested in running OpenHands in a multi-tenant environment, please
> [get in touch with us](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform)
> for advanced deployment options.
You'll find OpenHands running at [http://localhost:3000](http://localhost:3000) with access to `./workspace`. To have OpenHands operate on your code, place it in `./workspace`.
OpenHands will only have access to this workspace folder. The rest of your system will not be affected as it runs in a secured docker sandbox.
If you want to modify the OpenHands source code, check out [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md).
Upon opening OpenHands, you must select the appropriate `Model` and enter the `API Key` within the settings that should pop up automatically. These can be set at any time by selecting
the `Settings` button (gear icon) in the UI. If the required `Model` does not exist in the list, you can manually enter it in the text box.
Having issues? The [Troubleshooting Guide](https://docs.all-hands.dev/modules/usage/troubleshooting) can help.
For the development workflow, see [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md).
Are you having trouble? Check out our [Troubleshooting Guide](https://docs.all-hands.dev/modules/usage/troubleshooting).
## 🚀 Documentation
## 📖 Documentation
To learn more about the project, and for tips on using OpenHands,
**check out our [documentation](https://docs.all-hands.dev/modules/usage/intro)**.
check out our [documentation](https://docs.all-hands.dev/modules/usage/getting-started).
There you'll find resources on how to use different LLM providers (like ollama and Anthropic's Claude),
There you'll find resources on how to use different LLM providers,
troubleshooting resources, and advanced configuration options.
## 🤝 How to Contribute
## 🤝 How to Join the Community
OpenHands is a community-driven project, and we welcome contributions from everyone.
Whether you're a developer, a researcher, or simply enthusiastic about advancing the field of
software engineering with AI, there are many ways to get involved:
OpenHands is a community-driven project, and we welcome contributions from everyone. We do most of our communication
through Slack, so this is the best place to start, but we also are happy to have you contact us on Discord or Github:
-**Code Contributions:** Help us develop new agents, core functionality, the frontend and other interfaces, or sandboxing solutions.
-**Research and Evaluation:** Contribute to our understanding of LLMs in software engineering, participate in evaluating the models, or suggest improvements.
-**Feedback and Testing:** Use the OpenHands toolset, report bugs, suggest features, or provide feedback on usability.
-[Join our Slack workspace](https://join.slack.com/t/openhands-ai/shared_invite/zt-2ypg5jweb-d~6hObZDbXi_HEL8PDrbHg) - Here we talk about research, architecture, and future development.
-[Join our Discord server](https://discord.gg/ESHStjSjD4) - This is a community-run server for general discussion, questions, and feedback.
-[Read or post Github Issues](https://github.com/All-Hands-AI/OpenHands/issues) - Check out the issues we're working on, or add your own ideas.
For details, please check [CONTRIBUTING.md](./CONTRIBUTING.md).
## 🤖 Join Our Community
Whether you're a developer, a researcher, or simply enthusiastic about OpenHands, we'd love to have you in our community.
Let's make software engineering better together!
- [Slack workspace](https://join.slack.com/t/opendevin/shared_invite/zt-2oikve2hu-UDxHeo8nsE69y6T7yFX_BA) - Here we talk about research, architecture, and future development.
- [Discord server](https://discord.gg/ESHStjSjD4) - This is a community-run server for general discussion, questions, and feedback.
See more about the community in [COMMUNITY.md](./COMMUNITY.md) or find details on contributing in [CONTRIBUTING.md](./CONTRIBUTING.md).
## 📈 Progress
See the monthly OpenHands roadmap [here](https://github.com/orgs/All-Hands-AI/projects/1) (updated at the maintainer's meeting at the end of each month).
OpenHands is built by a large number of contributors, and every contribution is greatly appreciated! We also build upon other open source projects, and we are deeply thankful for their work.
@@ -147,8 +125,8 @@ For a list of open source projects and licenses used in OpenHands, please see ou
## 📚 Cite
```
@misc{opendevin,
title={{OpenDevin: An Open Platform for AI Software Developers as Generalist Agents}},
@misc{openhands,
title={{OpenHands: An Open Platform for AI Software Developers as Generalist Agents}},
author={Xingyao Wang and Boxuan Li and Yufan Song and Frank F. Xu and Xiangru Tang and Mingchen Zhuge and Jiayi Pan and Yueqi Song and Bowen Li and Jaskirat Singh and Hoang H. Tran and Fuqiang Li and Ren Ma and Mingzhang Zheng and Bill Qian and Yanjun Shao and Niklas Muennighoff and Yizhe Zhang and Binyuan Hui and Junyang Lin and Robert Brennan and Hao Peng and Heng Ji and Graham Neubig},
This folder implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’**act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
The conceptual idea is illustrated below. At each turn, the agent can:
1.**Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
2.**CodeAct**: Choose to perform the task by executing code
- Execute any valid Linux `bash` command
- Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenHands's plugin system:
- [Jupyter plugin](https://github.com/All-Hands-AI/OpenHands/tree/main/openhands/runtime/plugins/jupyter): for IPython execution via bash command
- [Agent Skills plugin](https://github.com/All-Hands-AI/OpenHands/tree/main/openhands/runtime/plugins/agent_skills): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
### Overview
This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
The conceptual idea is illustrated below. At each turn, the agent can:
1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
2. **CodeAct**: Choose to perform the task by executing code
- Execute any valid Linux `bash` command
- Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
# there should not have two consecutive messages from the same role
ifmessagesandmessages[-1].role==message.role:
messages[-1].content.extend(message.content)
else:
messages.append(message)
# Add caching to the last 2 user messages
ifself.llm.supports_prompt_caching:
user_turns_processed=0
formessageinreversed(messages):
ifmessage.role=='user'anduser_turns_processed<2:
message.content[
-1
].cache_prompt=True# Last item inside the message content
user_turns_processed+=1
# the latest user message is important:
# we want to remind the agent of the environment constraints
latest_user_message=next(
(
m
forminreversed(messages)
ifm.role=='user'
andany(isinstance(c,TextContent)forcinm.content)
),
None,
)
iflatest_user_message:
reminder_text=f'\n\nENVIRONMENT REMINDER: You have {state.max_iterations-state.iteration} turns left to complete the task. When finished reply with <finish></finish>.'
SANDBOX_ENV_GITHUB_TOKEN: "Create a GitHub Personal Access Token (https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) and set it as SANDBOX_GITHUB_TOKEN in your environment variables."
---
# How to Interact with Github
## Environment Variable Available
-`GITHUB_TOKEN`: A read-only token for Github.
## Using GitHub's RESTful API
Use `curl` with the `GITHUB_TOKEN` to interact with GitHub's API. Here are some common operations:
Here's a template for API calls:
```sh
curl -H "Authorization: token $GITHUB_TOKEN"\
"https://api.github.com/{endpoint}"
```
First replace `{endpoint}` with the specific API path. Common operations:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
The assistant can use an interactive Python (Jupyter Notebook) environment, executing code with <execute_ipython>.
<execute_ipython>
print("Hello World!")
</execute_ipython>
The assistant can execute bash commands on behalf of the user by wrapping them with <execute_bash> and </execute_bash>.
For example, you can list the files in the current directory by <execute_bash> ls </execute_bash>.
Important, however: do not run interactive commands. You do not have access to stdin.
Also, you need to handle commands that may run indefinitely and not return a result. For such cases, you should redirect the output to a file and run the command in the background to avoid blocking the execution.
For example, to run a Python script that might run indefinitely without returning immediately, you can use the following format: <execute_bash> python3 app.py > server.log 2>&1 & </execute_bash>
Also, if a command execution result saying like: Command: "npm start" timed out. Sending SIGINT to the process, you should also retry with running the command in the background.
{% endset %}
{% set BROWSING_PREFIX %}
The assistant can browse the Internet with <execute_browse> and </execute_browse>.
For example, <execute_browse> Tell me the usa's president using google search </execute_browse>.
Or <execute_browse> Tell me what is in http://example.com </execute_browse>.
{% endset %}
{% set PIP_INSTALL_PREFIX %}
The assistant can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them.
{% endset %}
{% set SYSTEM_PREFIX = MINIMAL_SYSTEM_PREFIX + BROWSING_PREFIX + PIP_INSTALL_PREFIX %}
{% set COMMAND_DOCS %}
Apart from the standard Python library, the assistant can also use the following functions (already imported) in <execute_ipython> environment:
{{ agent_skills_docs }}
Please note that THE `edit_file_by_replace`, `append_file` and `insert_content_at_line` FUNCTIONS REQUIRE PROPER INDENTATION. If the assistant would like to add the line ' print(x)', it must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
{% endset %}
{% set SYSTEM_SUFFIX %}
Responses should be concise.
The assistant should attempt fewer things at a time instead of putting too many commands OR too much code in one "execute" block.
Include ONLY ONE <execute_ipython>, <execute_bash>, or <execute_browse> per response, unless the assistant is finished with the task or needs more input or action from the user in order to proceed.
If the assistant is finished with the task you MUST include <finish></finish> in your response.
IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, or <execute_browse> whenever possible.
The assistant should utilize full file paths and the 'pwd' command to prevent path-related errors. The assistant should refrain from excessive apologies in its responses.
{% endset %}
{# Combine all parts without newlines between them #}
USER: Create a list of numbers from 1 to 10, and display them in a web page at port 5000.
ASSISTANT:
Sure! Let me create a Python file `app.py`:
<execute_ipython>
create_file('app.py')
</execute_ipython>
USER:
OBSERVATION:
[File: /workspace/app.py (1 lines total)]
(this is the beginning of the file)
1|
(this is the end of the file)
[File app.py created.]
ASSISTANT:
Now I will write the Python code for starting a web server and save it to the file `app.py`:
<execute_ipython>
EDITED_CODE="""from flask import Flask
app = Flask(__name__)
@app.route('/')
def index():
numbers = list(range(1, 11))
return str(numbers)
if __name__ == '__main__':
app.run(port=5000)"""
insert_content_at_line(
'app.py',
1,
EDITED_CODE,
)
</execute_ipython>
USER:
OBSERVATION:
(this is the beginning of the file)
1|from flask import Flask
2|app = Flask(__name__)
3|
4|@app.route('/')
5|def index():
6| numbers = list(range(1, 11))
7| return str(numbers)
8|
9|if __name__ == '__main__':
10| app.run(port=5000)
(this is the end of the file)
[File updated (edited at line 1). Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.]
ASSISTANT:
I have created a Python file `app.py` that will display a list of numbers from 1 to 10 when you run it. Let me run the Python file for you:
Now that Flask is installed, let me run the Python file again:
<execute_bash>
python3 app.py > server.log 2>&1 &
</execute_bash>
USER:
OBSERVATION:
[1] 124
ASSISTANT:
Let me check the server log again:
<execute_bash>
cat server.log
</execute_bash>
USER:
OBSERVATION:
* Serving Flask app 'app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
ASSISTANT:
The server is running on port 5000 with PID 124. You can access the list of numbers by visiting http://127.0.0.1:5000. If you have any further questions, feel free to ask!
USER: Now browse the newly started server's homepage and show me the content.
ASSISTANT:
Sure! Let me browse the server's homepage at http://127.0.0.1:5000:
<execute_browse>
Get the content on "http://127.0.0.1:5000"
</execute_browse>
USER:
OBSERVATION:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ASSISTANT:
The content of the server's homepage is "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]". If you have any further questions, feel free to ask!
USER: Now kill the server, make it display the numbers in a table format.
ASSISTANT:
Sure! Let me stop the server first:
<execute_bash>
kill 124
</execute_bash>
USER:
OBSERVATION:
[1]+ Terminated python3 app.py > server.log 2>&1
ASSISTANT:
The server has been stopped. Let me open the Python file and modify it.
<execute_ipython>
open_file('app.py')
</execute_ipython>
USER:
[File: /workspace/app.py (10 lines total)]
(this is the beginning of the file)
1|from flask import Flask
2|app = Flask(__name__)
3|
4|@app.route('/')
5|def index():
6| numbers = list(range(1, 11))
7| return str(numbers)
8|
9|if __name__ == '__main__':
10| app.run(port=5000)
(this is the end of the file)
ASSISTANT:
I should edit the file to display the numbers in a table format. I should include correct indentation. Let me update the file:
<execute_ipython>
edit_file_by_replace(
'app.py',
to_replace=" return str(numbers)",
new_content=" return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'",
)
</execute_ipython>
USER:
Observation:
[File: /workspace/app.py (10 lines total after edit)]
(this is the beginning of the file)
1|from flask import Flask
2|app = Flask(__name__)
3|
4|@app.route('/')
5|def index():
6| numbers = list(range(1, 11))
7| return '<table>' + ''.join([f'<tr><td>{i}</td></tr>' for i in numbers]) + '</table>'
8|
9|if __name__ == '__main__':
10| app.run(port=5000)
(this is the end of the file)
[File updated (edited at line 7). Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.]
ASSISTANT:
Running the updated file:
<execute_bash>
python3 app.py > server.log 2>&1 &
</execute_bash>
USER:
Observation:
[1] 126
ASSISTANT:
The server is running on port 5000 with PID 126. You can access the list of numbers in a table format by visiting http://127.0.0.1:5000. Let me know if you have any further requests!
<finish></finish>
--- END OF EXAMPLE ---
{% endset %}
Here is an example of how you can interact with the environment for task solving:
{{ DEFAULT_EXAMPLE }}
{% if micro_agent %}
--- BEGIN OF GUIDELINE ---
The following information may assist you in completing your task:
This agent is an adaptation of the original [SWE Agent](https://swe-agent.com/) based on CodeAct 1.5 using the `agentskills` library of OpenHands.
It is intended use is **solving Github issues**.
It removes web-browsing and Github capability from the original CodeAct agent to avoid confusion to the agent.
"""
sandbox_plugins:list[PluginRequirement]=[
# NOTE: AgentSkillsRequirement need to go before JupyterRequirement, since
# AgentSkillsRequirement provides a lot of Python functions,
# and it needs to be initialized before Jupyter for Jupyter to use those functions.
AgentSkillsRequirement(),
JupyterRequirement(),
]
system_message:str=get_system_message()
in_context_example:str=f"Here is an example of how you can interact with the environment for task solving:\n{get_in_context_example()}\n\nNOW, LET'S START!"
response_parser=CodeActSWEResponseParser()
def__init__(
self,
llm:LLM,
config:AgentConfig,
)->None:
"""Initializes a new instance of the CodeActSWEAgent class.
# there should not have two consecutive messages from the same role
ifmessagesandmessages[-1].role==message.role:
messages[-1].content.extend(message.content)
else:
messages.append(message)
# the latest user message is important:
# we want to remind the agent of the environment constraints
latest_user_message=next(
(mforminreversed(messages)ifm.role=='user'),None
)
# Get the last user text inside content
iflatest_user_message:
latest_user_message_text=next(
(
t
fortinreversed(latest_user_message.content)
ifisinstance(t,TextContent)
)
)
# add a reminder to the prompt
reminder_text=f'\n\nENVIRONMENT REMINDER: You have {state.max_iterations-state.iteration} turns left to complete the task. When finished reply with <finish></finish>.'
'\nApart from the standard Python library, the assistant can also use the following functions (already imported) in <execute_ipython> environment:\n'
f'{_AGENT_SKILLS_DOCS}'
"Please note that THE `edit_file` FUNCTION REQUIRES PROPER INDENTATION. If the assistant would like to add the line ' print(x)', it must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run."
)
# ======= SYSTEM MESSAGE =======
MINIMAL_SYSTEM_PREFIX="""A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
The assistant can interact with an interactive Python (Jupyter Notebook) environment and receive the corresponding output when needed. The code should be enclosed using "<execute_ipython>" tag, for example:
<execute_ipython>
print("Hello World!")
</execute_ipython>
The assistant can execute bash commands on behalf of the user by wrapping them with <execute_bash> and </execute_bash>.
For example, you can list the files in the current directory by <execute_bash> ls </execute_bash>.
"""
PIP_INSTALL_PREFIX="""The assistant can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them."""
SYSTEM_SUFFIX="""The assistant's response should be concise.
The assistant should include ONLY ONE <execute_ipython> or <execute_bash> in every one of the responses, unless the assistant is finished with the task or need more input or action from the user in order to proceed.
IMPORTANT: Whenever possible, execute the code for the user using <execute_ipython> or <execute_bash> instead of providing it.
"""
SWE_EXAMPLE="""
--- START OF EXAMPLE ---
USER: We are currently solving the following issue within our repository. Here is the issue text:
--- BEGIN ISSUE ---
TimeDelta serialization precision
Hi there!
I just found quite strange behaviour of `TimeDelta` field serialization
```python3
from marshmallow.fields import TimeDelta
from datetime import timedelta
td_field = TimeDelta(precision="milliseconds")
obj = dict()
obj["td_field"] = timedelta(milliseconds=345)
print(td_field.serialize("td_field", obj))
```
Output of this snippet is `344`, but it seems that `345` is correct.
Looks like a rounding issue here: https://github.com/marshmallow-code/marshmallow/blob/dev/src/marshmallow/fields.py#L1474
Let's first start by reproducing the results of the issue. The issue includes some example code for reproduction, which we can use. We'll create a new file called `reproduce.py` and paste the example code into it.
[File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.]
ASSISTANT:
Now let's run the code to see if we see the same output as the issue.
<execute_bash>
python reproduce.py
</execute_bash>
USER:
OBSERVATION:
344
ASSISTANT:
We are indeed seeing the same output as the issue. The issue suggests that we should look at line 1474 of the `fields.py` file to see if there is a rounding issue. Let's open the `fields.py` file and navigate to line 1474 to see the relevant code. First we should find the `fields.py` file in the repository. It is likely to be in the `src/marshmallow` directory, but we should check to be sure.
<execute_bash>
ls -F
</execute_bash>
USER:
OBSERVATION:
AUTHORS.rst
CHANGELOG.rst
CODE_OF_CONDUCT.md
CONTRIBUTING.rst
LICENSE
MANIFEST.in
NOTICE
README.rst
RELEASING.md
azure-pipelines.yml
docs/
examples/
performance/
pyproject.toml
reproduce.py
setup.cfg
setup.py
src/
tests/
tox.ini
ASSISTANT:
It looks like the `src` directory is present, which suggests that the `fields.py` file is likely to be in the `src` directory. Let's use find_file to see where it is.
It looks like the `fields.py` file is present in the `./src/marshmallow/` directory. The issue also points to a specific URL with line number 1474. We should navigate to that line in fields.py to see the relevant code for the `TimeDelta` serialization.
1557| if not self.value_field and not self.key_field:
1558| return self.mapping_type(value)
(439 more lines below)
ASSISTANT:
We are now looking at the relevant section of the `fields.py` file where the `TimeDelta` serialization occurs. The issue suggests that there is a rounding problem near line 1474. The current implementation uses integer division, which truncates the decimal part and can lead to incorrect rounding on line 1475. To fix this issue, we should ensure that the division results in a float, and then round the result to the nearest integer before converting it to an int. This should preserve the milliseconds precision as expected. Let's make the necessary edit to the code.
1522| except FieldInstanceResolutionError as error:
1523| raise ValueError(
1524| '"keys" must be a subclass or instance of '
(473 more lines below)
[File updated. Please review the changes and make sure they are correct (correct indentation, no duplicate lines, etc). Edit the file again if necessary.]
ASSISTANT:
The code has been updated to use the `round` function, which should fix the rounding issue. Before submitting the changes, it would be prudent to run the reproduce.py code again to ensure that this change has the desired effect. I see that "Current directory" is still `/marshmallow-code__marshmallow`, so we can just run `reproduce.py` directly.
<execute_bash>
python reproduce.py
</execute_bash>
USER:
OBSERVATION:
345
ASSISTANT:
The output has changed from 344 to 345, which suggests that the rounding issue has been fixed. Let's remove the reproduce.py file since it is no longer needed.
<execute_bash>
rm reproduce.py
</execute_bash>
USER:
OBSERVATION:
ASSISTANT:
rm doesn't have any output when it deletes successfully, so that must have worked. Now that we have fixed the issue, we can exit the current task.
You are a proofreader tasked with fixing typos in the files in your current working directory.
{% if state.inputs.task %}
Specifically, your task is:
{{ state.inputs.task }}
{% endif %}
To achieve this goal, you should:
1. Scan the files for typos
2. Overwrite the files with the typos fixed
3. Provide a summary of the typos fixed
## Available Actions
{{ instructions.actions.read }}
{{ instructions.actions.write }}
{{ instructions.actions.run }}
{{ instructions.actions.message }}
{{ instructions.actions.finish }}
To complete this task:
1. Use the `read` action to read the contents of the files in your current working directory. Make sure to provide the file path in the format `'./file_name.ext'`.
2. Use the `message` action to analyze the contents and identify typos.
3. Use the `write` action to create new versions of the files with the typos fixed.
- Overwrite the original files with the corrected content. Make sure to provide the file path in the format `'./file_name.ext'`.
4. Use the `message` action to generate a summary of the typos fixed, including the original and fixed versions of each typo, and the file(s) they were found in.
5. Use the `finish` action to return the summary in the `outputs.summary` field.
Do NOT finish until you have fixed all the typos and generated a summary.
You're a diligent software engineer AI. You can't see, draw, or interact with a
browser, but you can read and write files, and you can run commands, and you can think.
You've been given the following task:
%(task)s
## Plan
As you complete this task, you're building a plan and keeping
track of your progress. Here's a JSON representation of your plan:
%(plan)s
%(plan_status)s
You're responsible for managing this plan and the status of tasks in
it, by using the `add_task` and `modify_task` actions described below.
If the History below contradicts the state of any of these tasks, you
MUST modify the task using the `modify_task` action described below.
Be sure NOT to duplicate any tasks. Do NOT use the `add_task` action for
a task that's already represented. Every task must be represented only once.
Tasks that are sequential MUST be siblings. They must be added in order
to their parent task.
If you mark a task as 'completed', 'verified', or 'abandoned',
all non-abandoned subtasks will be marked the same way.
So before closing a task this way, you MUST not only be sure that it has
been completed successfully--you must ALSO be sure that all its subtasks
are ready to be marked the same way.
If, and only if, ALL tasks have already been marked verified,
you MUST respond with the `finish` action.
## History
Here is a recent history of actions you've taken in service of this plan,
as well as observations you've made. This only includes the MOST RECENT
ten actions--more happened before that.
%(history)s
Your most recent action is at the bottom of that history.
## Action
What is your next thought or action? Your response must be in JSON format.
It must be an object, and it must contain two fields:
* `action`, which is one of the actions below
* `args`, which is a map of key-value pairs, specifying the arguments for that action
* `read` - reads the content of a file. Arguments:
* `path` - the path of the file to read
* `write` - writes the content to a file. Arguments:
* `path` - the path of the file to write
* `content` - the content to write to the file
* `run` - runs a command on the command line in a Linux shell. Arguments:
* `command` - the command to run
* `browse` - opens a web page. Arguments:
* `url` - the URL to open
* `message` - make a plan, set a goal, record your thoughts, or ask for more input from the user. Arguments:
* `content` - the message to record
* `wait_for_response` - set to `true` to wait for the user to respond before proceeding
* `add_task` - add a task to your plan. Arguments:
* `parent` - the ID of the parent task (leave empty if it should go at the top level)
* `goal` - the goal of the task
* `subtasks` - a list of subtasks, each of which is a map with a `goal` key.
* `modify_task` - close a task. Arguments:
* `task_id` - the ID of the task to close
* `state` - set to 'in_progress' to start the task, 'completed' to finish it, 'verified' to assert that it was successful, 'abandoned' to give up on it permanently, or `open` to stop working on it for now.
* `finish` - if ALL of your tasks and subtasks have been verified or abandoned, and you're absolutely certain that you've completed your task and have tested your work, use the finish action to stop working.
You MUST take time to think in between read, write, run, and browse actions--do this with the `message` action.
You should never act twice in a row without thinking. But if your last several
actions are all `message` actions, you should consider taking a different action.
What is your next thought or action? Again, you must reply with JSON, and only with JSON.
%(hint)s
"""
defget_hint(latest_action_id:str)->str:
"""Returns action type hint based on given action_id"""
hints={
'':"You haven't taken any actions yet. Start by using `ls` to check out what files you're working with.",
ActionType.RUN:'You should think about the command you just ran, what output it gave, and how that affects your plan.',
ActionType.READ:'You should think about the file you just read, what you learned from it, and how that affects your plan.',
ActionType.WRITE:'You just changed a file. You should think about how it affects your plan.',
ActionType.BROWSE:'You should think about the page you just visited, and what you learned from it.',
ActionType.MESSAGE:"Look at your last thought in the history above. What does it suggest? Don't think anymore--take action.",
ActionType.ADD_TASK:'You should think about the next action to take.',
ActionType.MODIFY_TASK:'You should think about the next action to take.',
ActionType.SUMMARIZE:'',
ActionType.FINISH:'',
}
returnhints.get(latest_action_id,'')
defget_prompt_and_images(
state:State,max_message_chars:int
)->tuple[str,list[str]]:
"""Gets the prompt for the planner agent.
Formatted with the most recent action-observation pairs, current task, and hint based on last action
Parameters:
- state (State): The state of the current agent
Returns:
- str: The formatted string prompt with historical values
# Whether to use native tool calling if supported by the model. Can be true, false, or None by default, which chooses the model's default behavior based on the evaluation.
# ATTENTION: Based on evaluation, enabling native function calling may lead to worse results
# in some scenarios. Use with caution and consider testing with your specific use case.
> This is not officially supported and may not work.
Install [Docker](https://docs.docker.com/engine/install/) on your host machine and run:
```bash
make docker-dev
# same as:
cd ./containers/dev
./dev.sh
```
It could take some time if you are running for the first time as Docker will pull all the tools required for building OpenHands. The next time you run again, it should be instant.
## Build and run
If everything goes well, you should be inside a container after Docker finishes building the `openhands:dev` image similar to the following:
```bash
Build and run in Docker ...
root@93fc0005fcd2:/app#
```
You may now proceed with the normal [build and run](../../Development.md) workflow as if you were on the host.
## Make changes
The source code on the host is mounted as `/app` inside docker. You may edit the files as usual either inside the Docker container or on your host with your favorite IDE/editors.
The following are also mapped as readonly from your host:
This folder builds runtime image (sandbox), which will use a `Dockerfile` that is dynamically generated depends on the `base_image` AND a [Python source distribution](https://docs.python.org/3.10/distutils/sourcedist.html) that's based on the current commit of `openhands`.
This folder builds a runtime image (sandbox), which will use a dynamically generated `Dockerfile`
that depends on the `base_image`**AND** a [Python source distribution](https://docs.python.org/3.10/distutils/sourcedist.html) that is based on the current commit of `openhands`.
The following command will generate Dockerfile for `ubuntu:22.04` and the source distribution `.tar` into `containers/runtime`.
The following command will generate a `Dockerfile` file for `nikolaik/python-nodejs:python3.12-nodejs22` (the default base image), an updated `config.sh` and the runtime source distribution files/folders into `containers/runtime`:
```bash
poetry run python3 openhands/runtime/utils/runtime_build.py \
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.