Compare commits

178 Commits

Author SHA1 Message Date
openhands
93287ef9ac Fix microagent test filenames to match expected names
- Change test filenames from 'test.md' to match expected microagent names
- Use 'default.md' for tests expecting 'default' name
- Use 'custom_name.md' for test expecting 'custom_name' name
- Use 'test_agent.md' for test expecting 'test_agent' name
- This properly tests the filename-based naming behavior
2025-06-24 14:20:34 +00:00
openhands
e70595f46f Fix microagent tests and remove debug prints
- Update test assertions to expect filename as microagent name instead of 'default'
- Remove debug print statements from microagent.py
- Revert pytest-asyncio dependency addition as requested
- All tests now pass with the new filename-based naming behavior
2025-06-24 14:16:20 +00:00
openhands
1d3ff66987 Fix failing tests: add missing newlines and pytest-asyncio dependency
- Add missing newlines at end of microagent files (fixed by pre-commit)
- Add pytest-asyncio dependency to fix async test execution
- All non-Docker tests now pass
2025-06-24 14:01:12 +00:00
Xingyao Wang
1a95f86802 fix all remaining issue' 2025-06-23 17:49:02 -04:00
Xingyao Wang
eee12bfd94 fix test 2025-06-23 16:09:32 -04:00
Xingyao Wang
8c2d4dbe8b Merge branch 'main' into update-microagent-docs 2025-06-23 14:22:56 -04:00
மனோஜ்குமார் பழனிச்சாமி
f5ae1759b6 Add model name (#8718) 2025-06-23 14:21:47 -04:00
Ikuo Matsumura
9ec94737ed feat(cli): Add vi mode support (#9287)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-06-23 17:39:38 +00:00
llamantino
63c7815823 docs: rewrite local LLMs page (#9307) 2025-06-24 01:20:03 +08:00
baii
95ae47307c Fix the issue where the shttp_services configuration from config.toml fails to load correctly. (#9175) 2025-06-23 13:02:56 -04:00
Graham Neubig
035050252b Better timeout prompt (#9140)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-06-23 16:42:15 +00:00
Tommaso Bendinelli
5b48aee0c9 Fix openhands.core.exceptions.FunctionCallConversionError fn_call_converter for GPT-o4-mini when the agent generates images (#9152)
Co-authored-by: tommaso <tommaso@t7144.csem.local>
2025-06-23 16:01:36 +00:00
Xingyao Wang
1a89dbb738 docs: Add Success Stories tab to documentation (#9120)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-23 23:39:39 +08:00
Rohit Malhotra
bba62c26fd Make sandbox api key configurable via user settings (#8803)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-23 11:25:10 -04:00
Graham Neubig
9b4ad4e6e3 Fix SambaNova context length exception handling (#9252)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-23 07:06:31 -04:00
Graham Neubig
1e33624951 Simplify max_output_tokens handling in LLM classes (#9296)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-23 06:48:45 -04:00
Graham Neubig
8b90d610c6 Fix CLI model selection to allow custom model names (#9205)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-23 04:03:00 +00:00
mamoodi
834abc0eee More doc updates (#9289) 2025-06-22 22:46:47 -04:00
Tim O'Farrell
c9bb0fc168 Conversation Manager small refactor (#9286)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-22 19:27:03 -06:00
Graham Neubig
5d69e606eb feat: Add Windows PowerShell support to CLI runtime (#9211)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-22 20:17:40 -04:00
Engel Nyst
081880248c Fix lint (#9290) 2025-06-22 13:40:14 -04:00
Chase
4ee269c3f7 Add ability to customize configuration model on per-agent basis (#8576) 2025-06-22 14:43:17 +02:00
Xingyao Wang
711315c3b9 docs: Update documentation based on llamantino feedback (#9119)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-21 21:57:14 -04:00
mamoodi
c2e6244b86 Small doc updates. Fix FAQs (#9270) 2025-06-21 15:52:29 -07:00
Xingyao Wang
a1479adfd3 feat(agent): Add configurable system_prompt_filename to AgentConfig (#9265)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-22 06:21:52 +08:00
dependabot[bot]
99fd3f7bb2 chore(deps): bump ubuntu from 22.04 to 24.04 in /containers/e2b-sandbox (#9042)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-21 12:23:42 -07:00
dependabot[bot]
c617881b3c chore(deps): bump the version-all group in /frontend with 4 updates (#9234)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-21 12:22:43 -07:00
dependabot[bot]
7ca3607dcd chore(deps): bump the version-all group with 3 updates (#9256)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-21 12:21:57 -07:00
mamoodi
89999a8e09 Update free credits lines (#9269) 2025-06-21 15:35:04 +00:00
Ray Myers
3d9761df7e Release branch for 0.45.0 (#9264) 2025-06-20 21:14:23 +00:00
Xingyao Wang
ea3c4f9366 Fix(CLI): duplicated Command Action display in CLI (#9260)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-21 04:24:16 +08:00
Graham Neubig
bda0a64a3d Fix empty image URLs in multimodal browsing causing litellm.BadRequestError (#9214)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-06-20 19:44:11 +00:00
Graham Neubig
8badcb7b35 Fix feedback UI localization in LikertScale component (#9253)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-20 18:50:39 +00:00
Xingyao Wang
078534c2ab Fix httpx deprecation warning during LLM API calls (#9261)
Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-20 18:36:31 +00:00
Rohit Malhotra
ba885cd04c Remove Bitbucket login button from SAAS auth modal (#9258)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-20 16:29:57 +00:00
Rohit Malhotra
ee64a6662a (Hotfix): tokens go stale for restarted convos in cloud openhands (#9111)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-20 12:16:42 -04:00
solah soyalp
075ef4db9f Add Japanese translations (#9244) 2025-06-20 00:45:08 +00:00
Xingyao Wang
a526f73ea6 Add FAQ page to documentation (#9132) 2025-06-19 13:37:03 -07:00
Xingyao Wang
516f9fa635 Add o4-mini model and Mistral provider support to OpenHands CLI (#9217)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-19 18:47:27 +00:00
Xingyao Wang
8c5995a5d8 Update citation in README.md (#9243) 2025-06-19 18:01:30 +00:00
dependabot[bot]
afe130f6db chore(deps): bump the version-all group across 1 directory with 15 updates (#9239)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-19 18:48:47 +02:00
Xingyao Wang
cc2f96c6c4 Fix search_events signature mismatches after get_events replacement (#9238)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-19 17:57:44 +02:00
Rohit Malhotra
b7a6190133 Add max_budget_per_task to settings (#8812)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2025-06-18 20:25:01 -04:00
brettstewart
54af9ff3fe feat(runtime): add kubernetes support (#8814)
Co-authored-by: Corey White <corey.white@ziffdavis.com>
Co-authored-by: luke_schulz <luke.schulz@ziffmedia.com>
2025-06-18 21:25:50 +00:00
Xingyao Wang
ef582a6335 Increase max iterations from 250 to 500 (#9203)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-18 21:32:15 +02:00
Xingyao Wang
0ca3188afa Merge branch 'main' into update-microagent-docs 2025-06-18 14:23:58 -04:00
Xingyao Wang
d5f5e34ead Fix deprecation warnings in OpenHands CLI (#9199)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-19 02:22:55 +08:00
Graham Neubig
91e6d359c2 Update repo.md with better "openhands with openhands" directions (#9216) 2025-06-18 12:38:51 -04:00
Mislav Lukach
a9f26a13a6 feat(chat): support file upload (#8945)
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2025-06-18 20:13:07 +04:00
dependabot[bot]
a92d6904fc chore(deps): bump the version-all group in /frontend with 2 updates (#9215)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-18 15:28:28 +00:00
dependabot[bot]
306777626f chore(deps): bump the version-all group across 1 directory with 9 updates (#9182)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>
2025-06-18 12:05:15 +00:00
Rohit Malhotra
1807efad0b Add Bitbucket integration documentation for local usage (#9206)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-18 05:56:36 -04:00
Graham Neubig
e074b2d36f Add Bitbucket microagent and backend implementation (#9021)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>
2025-06-18 00:04:29 -04:00
Ray Myers
b7efeb11d9 Bump version to 0.44.0 (#9163)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-06-17 21:13:17 +00:00
Graham Neubig
7d0aadf8ed Rename ~/.openhands-state to ~/.openhands (#9135)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-06-17 20:44:52 +00:00
Mislav Lukach
78af1de870 chore(analytics): improve label clarity (#9161)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-06-17 20:33:52 +00:00
llamantino
6a9065960d fix(devcontainer): mark workspace as safe dir (#9136) 2025-06-18 04:22:42 +08:00
Maxim Evtush
653a8a7ce2 Refactor: Improve Consistency in Function Signatures and Regex Usage in compute_ism_pm_score.py (#9145) 2025-06-18 04:22:16 +08:00
Graham Neubig
3591c7a79f Add uvx installation option to CLI documentation (#9186)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-18 04:19:18 +08:00
Ivan Dagelic
bae6bd77f4 fix: daytona runtime sandbox handling (#9187)
Signed-off-by: Ivan Dagelic <dagelic.ivan@gmail.com>
2025-06-18 04:18:46 +08:00
Rohit Malhotra
30c71776e7 [Fix]: Loading microagents for integrations (#9189)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-17 16:16:19 -04:00
Robert Brennan
147ffb7e42 Suppress pydub warning about ffmpeg/avconv not found (#8940)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-17 14:44:32 -04:00
Tim O'Farrell
237037cee9 Fix remote runtime status (#9190) 2025-06-18 02:34:41 +08:00
Xingyao Wang
567af43a71 Fix deprecation warning: Replace get_events with search_events (#9188)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-18 00:54:29 +08:00
Rohit Malhotra
65071550b6 Fix grammar issues in Slack documentation (#9180)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-17 23:53:55 +08:00
Alexander
d81d2f62cb docs: local serving with ollama documented (#8807)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-06-17 07:18:18 -04:00
Ryan H. Tran
ddaa186971 [GAIA] Add prompt improvement to alleviate solution parsing issue & support Tavily search tools (#9057) 2025-06-17 13:16:50 +07:00
Graham Neubig
e6e0f4673f docs: Add "Running OpenHands with OpenHands" section for recursive development (#9146)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 20:57:52 -04:00
Graham Neubig
7d78b65a1a docs: Add Python version requirement to CLI documentation (#9164)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 20:14:10 +00:00
Rohit Malhotra
1f90086030 (Hotfix): Slack app installation flow (#9162)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 19:33:43 +00:00
Xingyao Wang
2c4ecd02f7 feat(frontend): add user feedback Likert scale for agent performance rating (only on OH Cloud) (#8992)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2025-06-16 19:26:24 +00:00
Rohit Malhotra
2fd1fdcd7e [Refactor, Fix]: Agent controller state/metrics management (#9012)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 11:24:13 -04:00
Graham Neubig
cbe32a1a12 Fix bash timeout issue caused by interactive git clone prompts (#9148)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-16 08:39:28 -04:00
better629
432d8829dc disable mcp in run_localize and install oh-aci[llama] for issue 9150 (#9151) 2025-06-16 11:03:17 +00:00
Graham Neubig
24f891687d Fix CLI displaying claude-2 as default model for anthropic provider (#9101)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-15 21:21:33 -04:00
Graham Neubig
2d2ccf1329 Fix conversation URL format in pull request links (#9143) 2025-06-15 15:41:08 -04:00
FT
e5bff91e8e Fix Typo: Change "accurancy" to "accuracy" in Evaluation Benchmark Comments (#9139) 2025-06-15 12:48:26 +00:00
Linghao Zhang
a93b0457c6 feat(eval): Support evaluation on SWE-bench-Live (#9137) 2025-06-15 12:30:47 +00:00
Graham Neubig
98e0f5509c Update CLI mode docs to accurately reflect settings workflow (#9134) 2025-06-14 19:21:18 +00:00
kilavvy
4e99aabcb2 Minor Code Comment Corrections and Clarifications (#9129) 2025-06-14 18:57:14 +00:00
Graham Neubig
0c307ea12e Lint all files in the repo (#9131)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-06-14 16:25:59 +00:00
Graham Neubig
5134a7d938 Add secrets manager documentation to GUI mode docs (#9084)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-14 12:13:24 -04:00
Graham Neubig
a1627914ad Fix broken link to LLMs section in GUI mode documentation (#9121)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-14 23:26:41 +08:00
Graham Neubig
ccdd86e476 docs: remove 'coming soon' mentions from Slack app installation page (#9112)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>
2025-06-14 14:35:04 +00:00
ASTONE
be62ba6b35 add_versicode (#8221) 2025-06-14 13:17:18 +00:00
leopardracer
13c298d35f Minor Typo Fixes in Comments and Documentation (#9058) 2025-06-14 12:51:38 +00:00
llamantino
47b0dc548e feat: support dev container networking without host mode (#9122) 2025-06-14 08:38:18 -04:00
Graham Neubig
90ae4bda0d Restore Windows without WSL documentation (#9090)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-14 08:35:30 -04:00
dependabot[bot]
8963644fb4 chore(deps): bump the version-all group across 1 directory with 14 updates (#9107)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-14 07:58:24 -04:00
Engel Nyst
fd3b4ac8e6 Refactor SWE-bench instruction (#8010) 2025-06-13 23:27:52 +02:00
Rohit Malhotra
53623c76b5 [Fix]: allow agent to configure draft status for opened prs/mrs via git mcp (#9117) 2025-06-13 21:06:23 +00:00
Ray Myers
e6036b8346 Bump version for 0.43.0 release (#9109) 2025-06-13 14:47:26 -05:00
jpelletier1
144d09a578 Code review microagent (#9093)
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-06-13 01:35:44 +00:00
llamantino
f97a837d46 fix: fix unreachable runtime container in make docker-dev (#9072) 2025-06-12 12:46:10 -04:00
dependabot[bot]
eadec4ce9e chore(deps): bump the version-all group in /frontend with 8 updates (#9095)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-12 15:17:45 +00:00
dependabot[bot]
49e8737779 chore(deps): bump the version-all group across 1 directory with 24 updates (#9066)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>
2025-06-12 14:31:35 +00:00
Graham Neubig
4711e74101 Fix default provider in CLI to be 'anthropic' instead of 'openai' (#9004)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-06-12 03:02:03 +00:00
mamoodi
c87f1cc8c0 Move Advanced Configurations under Running OpenHands on your Own (#9082) 2025-06-11 16:36:17 -04:00
Rohit Malhotra
33b64786b0 [Docs]: add info about lower scope tokens for gitlab (#9017)
Co-authored-by: mamoodi <mamoodiha@gmail.com>
2025-06-11 19:34:06 +00:00
Rohit Malhotra
12fc50299b [Docs]: add slack integration docs (#8903)
Co-authored-by: mamoodi <mamoodiha@gmail.com>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-11 19:32:54 +00:00
Tim O'Farrell
57fee17348 Fix VSCode workspace dir (#9080) 2025-06-11 13:31:59 -06:00
Engel Nyst
77517d8ba0 Save CLI settings directly under ~/.openhands (#9079) 2025-06-11 21:07:40 +02:00
Calvin Smith
a356f56237 fix: Context window truncation makes progress (#9052)
Co-authored-by: Calvin Smith <calvin@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-11 12:47:34 -06:00
chuckbutkus
7dede37fd8 Make sure redirect URI is HTTPS unless it is for localhost (#9076) 2025-06-11 18:19:15 +00:00
Ray Myers
c11dcad309 Add more log context on key events (#9056) 2025-06-11 11:34:16 -05:00
Tim O'Farrell
47209e794a Runtime Status Fixes (#9050)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-11 09:28:17 -06:00
Xingyao Wang
3f50eb0079 feat: Add microagents UI to conversation context menu (#8984)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2025-06-11 23:12:27 +08:00
sp.wack
f27b02411b chore: Add deprecated tag to ActionMessage type (#9063) 2025-06-11 18:34:07 +04:00
llamantino
d151093872 docs: added devstral to llms list, added local llms in local setup (#9062)
Co-authored-by: mamoodi <mamoodiha@gmail.com>
2025-06-11 10:22:15 -04:00
neo
ea7294b7f9 docs: add links to other language versions of README (#9038)
Co-authored-by: mamoodi <mamoodiha@gmail.com>
2025-06-11 09:49:40 -04:00
Xingyao Wang
9097f487a6 Move get_agent_obs_text function to browser utils and add return_all option (#9019)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-11 12:32:38 +08:00
Rohit Malhotra
fd921a4f88 [Fix]: model tracking in convo metadata (#9053) 2025-06-10 22:19:33 -04:00
Xingyao Wang
96fe5a50d6 Update repo.md (#9054) 2025-06-10 21:51:13 -04:00
Howie Zhou
b634e10b45 Add JSON serialization for array and object parameters when converting tools (#8780) 2025-06-10 16:48:49 -04:00
Xingyao Wang
73f01657eb docs: Add TanStack Query state management documentation (#9047) 2025-06-10 16:44:00 -04:00
mamoodi
5d328183d5 Release 0.42.0 (#9046) 2025-06-10 16:34:10 -04:00
Mislav Lukach
b7da65d373 chore(ui): update tailwind (#9049)
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2025-06-10 18:20:04 +00:00
sp.wack
dca9c7bdc6 feat(backend): New "update microagent prompt" API (#8357)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
2025-06-10 22:10:55 +04:00
Rene Leonhardt
07862c32cb chore(docker): update docker base images (#8796)
Co-authored-by: Xingyao Wang <xingyaoww@gmail.com>
2025-06-10 22:48:46 +08:00
Emmanuel Ferdman
e04f876df9 Migrate to modern logger interface in server utils (#8965)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-06-10 10:25:06 -04:00
Mislav Lukach
78d707de83 chore(billing): add stripe powered by (#9016)
Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>
2025-06-10 18:10:09 +04:00
sp.wack
058153292f fix(ui): startup message ui (#9007) 2025-06-10 16:50:18 +04:00
openhands
283f503870 Exclude name field in MicroagentMetadata as it's deprecated 2025-06-08 22:07:33 +00:00
openhands
0691e5c0d0 Remove type: field from all microagent markdown files 2025-06-08 19:48:01 +00:00
openhands
fc16da8fd2 Update microagent documentation to clarify that type field is optional 2025-06-08 19:39:17 +00:00
openhands
bd3ff43c67 Remove name field from microagent files 2025-06-08 19:35:06 +00:00
openhands
0fe5b808af Update microagent code to use filename as name when not specified 2025-06-08 19:34:59 +00:00
openhands
6c49686ff0 Add MCP tools documentation and update microagent field requirements 2025-06-08 19:30:21 +00:00
openhands
17212bb2f2 Remove unused fields from microagent code and update all microagent files 2025-06-08 19:26:56 +00:00
openhands
9d9f931e95 Remove unused fields from microagent documentation and example 2025-06-08 19:23:47 +00:00
openhands
6fe9680474 Consolidate task microagent documentation into keyword-triggered microagents 2025-06-08 19:19:44 +00:00
Xingyao Wang
53c80d1c92 Merge branch 'main' into update-microagent-docs 2025-06-08 15:17:37 -04:00
openhands
401262f353 Update documentation for task microagents with user input support 2025-06-08 19:15:31 +00:00
Xingyao Wang
58845b01a3 rename more files 2025-06-08 14:30:37 -04:00
Xingyao Wang
469d184157 address engel comment 2025-06-08 14:28:22 -04:00
Xingyao Wang
4837c4dc74 Update microagents/get_test_to_pass.md
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-06-09 02:24:23 +08:00
Xingyao Wang
6763f21cc3 Merge branch 'main' into add-back-microagents 2025-06-07 16:47:00 -04:00
Xingyao Wang
32e610ac1d revert unnecessary change 2025-06-07 16:30:55 -04:00
Xingyao Wang
85c65391ca revert changes 2025-06-03 13:53:27 -04:00
Xingyao Wang
c444dbfbbf remove fe changes 2025-06-03 12:04:37 -04:00
Xingyao Wang
dd988d0f14 revert fe 2025-06-03 12:03:00 -04:00
Xingyao Wang
6f1a74e286 merge main 2025-06-03 11:37:51 -04:00
Xingyao Wang
7b956b6103 revert docs to look like main 2025-06-03 11:35:57 -04:00
openhands
34b097115d Fix linting issues in frontend and Python code 2025-05-19 01:39:48 +00:00
openhands
3e4ab4f379 Fix docstring formatting in KnowledgeMicroagent class 2025-05-19 01:29:23 +00:00
openhands
54cd9f7e44 Fix unlocalized strings in microagent-dropdown.tsx 2025-05-19 01:26:33 +00:00
openhands
802b765f98 Add microagent button and dropdown to trajectory actions 2025-05-17 12:05:13 +00:00
openhands
18c88f99ff Merge from main to resolve conflicts 2025-05-17 06:56:11 +00:00
openhands
f3934be07b Fix microagent suggestions using tippy.js for better popup handling 2025-05-12 12:55:00 +00:00
openhands
6ce9f49d1e Fix linting issues in TipTap editor component 2025-05-12 11:06:15 +00:00
openhands
fc07622b20 Implement microagent suggestions using TipTap 2025-05-12 11:00:08 +00:00
Xingyao Wang
da935f9d8f Merge branch 'main' into add-back-microagents 2025-05-03 00:04:17 +08:00
openhands
642cc52a1a Fix linting issues in handlers.ts 2025-05-02 13:06:21 +00:00
openhands
4c361ab9e5 Add mock handler for microagents endpoint 2025-05-02 09:23:25 +00:00
openhands
5dfa1bb6eb Fix microagent suggestions UI and TypeScript errors 2025-05-02 09:21:15 +00:00
Xingyao Wang
a07cf972a5 Merge commit '6032d2620d6ec252d3c80695a6de1fc88da9c87a' into add-back-microagents 2025-05-02 09:03:17 +00:00
openhands
f2e3bc3254 Fix microagent suggestions feature 2025-05-02 08:52:19 +00:00
openhands
3790ec7d60 Add tests for microagent suggestions component 2025-05-02 03:31:41 +00:00
openhands
3c0719309e Add microagent suggestions feature to chat input 2025-05-02 02:57:57 +00:00
Xingyao Wang
0236e0943e fix test 2025-05-02 02:09:27 +00:00
Xingyao Wang
cd464c0022 rename files 2025-05-01 10:38:04 +08:00
Xingyao Wang
4519a7f4f3 fix test 2025-05-01 02:29:52 +00:00
Xingyao Wang
fdc591330b add remain 2025-05-01 02:25:38 +00:00
Xingyao Wang
98e454e82c fix lint and missing imports 2025-05-01 02:25:24 +00:00
Xingyao Wang
e088d2d24a simplify microagent 2025-05-01 02:13:46 +00:00
Xingyao Wang
58c574af1e revert changes 2025-05-01 02:13:00 +00:00
Xingyao Wang
405f0069f8 revert some changes 2025-05-01 02:03:06 +00:00
Xingyao Wang
f26d770d03 remove hardcoded last line 2025-05-01 02:01:51 +00:00
Xingyao Wang
bf2c3de219 cleanup tests 2025-04-30 11:11:23 +08:00
Xingyao Wang
7c35ce16e5 Merge branch 'main' into add-back-microagents 2025-04-30 11:07:17 +08:00
Xingyao Wang
f4024ccd94 Update microagents/update_pr_description.md
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-04-30 10:43:37 +08:00
Xingyao Wang
b55bfed831 Update microagents/address_pr_comments.md
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-04-30 10:38:22 +08:00
OpenHands Bot
cb0994027f 🤖 Auto-fix Python linting issues 2025-04-29 16:02:04 +00:00
openhands
bcc9bd0b9a Move task microagent tests to test_microagent_task.py 2025-04-29 02:12:14 +00:00
openhands
6c144e6b5a Add back microagent files with special handling for user inputs 2025-04-29 02:06:42 +00:00
openhands
e90b841b0d Update microagent files to match original ones with added triggers and variable prompts 2025-04-29 01:48:10 +00:00
openhands
a1e6ed4dff Add special handling for microagents that require user input 2025-04-29 01:47:18 +00:00
openhands
ad6311d3cd Add back microagent files and add special handling for user input variables 2025-04-29 01:33:23 +00:00
424 changed files with 19037 additions and 6326 deletions

@@ -12,5 +12,8 @@
"ghcr.io/devcontainers/features/node:1": {},
},
"postCreateCommand": ".devcontainer/setup.sh",
-"runArgs": ["--network=host"],
+"runArgs": ["--add-host=host.docker.internal:host-gateway"],
+"containerEnv": {
+"DOCKER_HOST_ADDR": "host.docker.internal"
+},
}
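
The `--add-host` entry maps `host.docker.internal` to the Docker host gateway, and `containerEnv` exports that name as `DOCKER_HOST_ADDR`. How OpenHands consumes the variable is not shown in this hunk. As a rough sketch, a script inside the dev container could read it with a fallback like this (the helper name `docker_host_addr` and the `localhost` fallback are illustrative assumptions, not part of the change):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print the address a process inside
# the dev container should use to reach services running on the Docker host.
# Falls back to localhost when DOCKER_HOST_ADDR is unset or empty, which is
# an assumed default for runs outside the container.
docker_host_addr() {
  printf '%s\n' "${DOCKER_HOST_ADDR:-localhost}"
}

echo "Docker host address: $(docker_host_addr)"
```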

@@ -1,5 +1,9 @@
#!/bin/bash
# Mark the current repository as safe for Git to prevent "dubious ownership" errors,
# which can occur in containerized environments when directory ownership doesn't match the current user.
git config --global --add safe.directory "$(realpath .)"
# Install `nc`
sudo apt update && sudo apt install netcat -y

@@ -1,5 +1,23 @@
# NodeJS
frontend/node_modules
config.toml
.envrc
.env
.git
# Configuration (except pyproject.toml)
*.ini
*.toml
!pyproject.toml
*.yml
# Documentation (except README.md)
*.md
!README.md
# Hidden files and directories
.*
__pycache__
# Unneeded files and directories
/dev_config/
/docs/
/evaluation/
/tests/
CITATION.cff

@@ -45,6 +45,13 @@ body:
description: What version of OpenHands are you using?
placeholder: ex. 0.9.8, main, etc.
- type: input
id: model-name
attributes:
label: Model Name
description: What model are you using?
placeholder: ex. gpt-4o, claude-3-5-sonnet, openrouter/deepseek-r1, etc.
- type: dropdown
id: os
attributes:

@@ -72,3 +72,9 @@ updates:
directory: "/"
schedule:
interval: "weekly"
- package-ecosystem: "docker"
directories:
- "containers/*"
schedule:
interval: "weekly"

@@ -74,7 +74,7 @@ jobs:
- name: Fix python lint issues
run: |
# Run all pre-commit hooks and continue even if they modify files (exit code 1)
-pre-commit run --config ./dev_config/python/.pre-commit-config.yaml --files openhands/**/* evaluation/**/* tests/**/* || true
+pre-commit run --config ./dev_config/python/.pre-commit-config.yaml --all-files || true
# Commit and push changes if any
- name: Check for changes
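
One likely motivation for switching from `--files openhands/**/* ...` to `--all-files` (an inference; the diff itself does not say) is that `**` only recurses in bash when the `globstar` option is enabled, and it is off by default. Without it, `openhands/**/*` expands like `openhands/*/*` and reaches only entries exactly two levels down. The sketch below shows the difference on a throwaway tree; the directory and file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Build a throwaway tree with files at three different depths.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/pkg/sub/deep"
touch "$demo_dir/pkg/top.py" "$demo_dir/pkg/sub/mid.py" "$demo_dir/pkg/sub/deep/low.py"
cd "$demo_dir" || exit 1

# Without globstar, `**` behaves like `*`, so pkg/**/* is just pkg/*/*.
shopt -u globstar
plain_matches=(pkg/**/*)

# With globstar, `**` matches zero or more directory levels.
shopt -s globstar
recursive_matches=(pkg/**/*)

echo "without globstar (${#plain_matches[@]}): ${plain_matches[*]}"
echo "with globstar (${#recursive_matches[@]}): ${recursive_matches[*]}"
```

In this tree the plain expansion yields only `pkg/sub/deep` and `pkg/sub/mid.py`, so `top.py` and `low.py` would never be handed to the hooks. `--all-files` avoids depending on shell globbing at all.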

@@ -53,7 +53,7 @@ jobs:
- name: Install pre-commit
run: pip install pre-commit==3.7.0
- name: Run pre-commit hooks
-run: pre-commit run --files openhands/**/* evaluation/**/* tests/**/* --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml
+run: pre-commit run --all-files --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml
# Check version consistency across documentation
check-version-consistency:

@@ -81,4 +81,3 @@ jobs:
env:
TEST_RUNTIME: local
DEBUG: "1"

@@ -5,6 +5,14 @@ This repository contains the code for OpenHands, an automated AI software engine
To set up the entire repo, including frontend and backend, run `make build`.
You don't need to do this unless the user asks you to, or if you're trying to run the entire application.
## Running OpenHands with OpenHands:
To run the full application to debug issues:
```bash
export INSTALL_DOCKER=0
export RUNTIME=local
make build && make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0 &> /tmp/openhands-log.txt &
```
IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.
Before pushing any changes, you MUST ensure that any lint errors or simple test errors have been fixed.
@@ -44,7 +52,13 @@ Frontend:
- Available variables: VITE_BACKEND_HOST, VITE_USE_TLS, VITE_INSECURE_SKIP_VERIFY, VITE_FRONTEND_PORT
- Internationalization:
- Generate i18n declaration file: `npm run make-i18n`
- Data Fetching & Cache Management:
- We use TanStack Query (fka React Query) for data fetching and cache management
- Data Access Layer: API client methods are located in `frontend/src/api` and should never be called directly from UI components - they must always be wrapped with TanStack Query
- Custom hooks are located in `frontend/src/hooks/query/` and `frontend/src/hooks/mutation/`
- Query hooks should follow the pattern use[Resource] (e.g., `useConversationMicroagents`)
- Mutation hooks should follow the pattern use[Action] (e.g., `useDeleteConversation`)
- Architecture rule: UI components → TanStack Query hooks → Data Access Layer (`frontend/src/api`) → API endpoints
## Template for Github Pull Request

@@ -103,6 +103,29 @@ components or interface enhancements.
make start-frontend
```
### 5. Running OpenHands with OpenHands
You can use OpenHands to develop and improve OpenHands itself! This is a powerful way to leverage AI assistance for contributing to the project.
#### Quick Start
1. **Build and run OpenHands:**
```bash
export INSTALL_DOCKER=0
export RUNTIME=local
make build && make run
```
2. **Access the interface:**
- Local development: http://localhost:3001
- Remote/cloud environments: Use the appropriate external URL
3. **Configure for external access (if needed):**
```bash
# For external access (e.g., cloud environments)
make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0
```
### 6. LLM Debugging
If you encounter any issues with the Language Model (LM) or you're simply curious, export DEBUG=1 in the environment and restart the backend.
@@ -136,7 +159,7 @@ poetry run pytest ./tests/unit/test_*.py
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
-Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.41-nikolaik`
+Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:0.45-nikolaik`
## Develop inside Docker container

@@ -12,6 +12,7 @@ DEFAULT_MODEL = "gpt-4o"
CONFIG_FILE = config.toml
PRE_COMMIT_CONFIG_PATH = "./dev_config/python/.pre-commit-config.yaml"
PYTHON_VERSION = 3.12
KIND_CLUSTER_NAME = "local-hands"
# ANSI color codes
GREEN=$(shell tput -Txterm setaf 2)
@@ -189,7 +190,7 @@ install-pre-commit-hooks:
lint-backend:
@echo "$(YELLOW)Running linters...$(RESET)"
-@poetry run pre-commit run --files openhands/**/* evaluation/**/* tests/**/* --show-diff-on-failure --config $(PRE_COMMIT_CONFIG_PATH)
+@poetry run pre-commit run --all-files --show-diff-on-failure --config $(PRE_COMMIT_CONFIG_PATH)
lint-frontend:
@echo "$(YELLOW)Running linters for frontend...$(RESET)"
@@ -199,6 +200,40 @@ lint:
@$(MAKE) -s lint-frontend
@$(MAKE) -s lint-backend
kind:
@echo "$(YELLOW)Checking if kind is installed...$(RESET)"
@if ! command -v kind > /dev/null; then \
echo "$(RED)kind is not installed. Please install kind with 'brew install kind' to continue$(RESET)"; \
exit 1; \
else \
echo "$(BLUE)kind $(shell kind version) is already installed.$(RESET)"; \
fi
@echo "$(YELLOW)Checking if kind cluster '$(KIND_CLUSTER_NAME)' already exists...$(RESET)"
@if kind get clusters | grep -q "^$(KIND_CLUSTER_NAME)$$"; then \
echo "$(BLUE)Kind cluster '$(KIND_CLUSTER_NAME)' already exists.$(RESET)"; \
kubectl config use-context kind-$(KIND_CLUSTER_NAME); \
else \
echo "$(YELLOW)Creating kind cluster '$(KIND_CLUSTER_NAME)'...$(RESET)"; \
kind create cluster --name $(KIND_CLUSTER_NAME) --config kind/cluster.yaml; \
fi
@echo "$(YELLOW)Checking if mirrord is installed...$(RESET)"
@if ! command -v mirrord > /dev/null; then \
echo "$(RED)mirrord is not installed. Please install mirrord with 'brew install metalbear-co/mirrord/mirrord' to continue$(RESET)"; \
exit 1; \
else \
echo "$(BLUE)mirrord $(shell mirrord --version) is already installed.$(RESET)"; \
fi
@echo "$(YELLOW)Installing k8s mirrord resources...$(RESET)"
@kubectl apply -f kind/manifests
@echo "$(GREEN)Mirrord resources installed successfully.$(RESET)"
@echo "$(YELLOW)Waiting for Mirrord pod to be ready.$(RESET)"
@sleep 5
@kubectl wait --for=condition=Available deployment/ubuntu-dev
@echo "$(YELLOW)Waiting for Nginx to be ready.$(RESET)"
@kubectl -n ingress-nginx wait --for=condition=Available deployment/ingress-nginx-controller
@echo "$(YELLOW)Running make run inside of mirrord.$(RESET)"
@mirrord exec --target deployment/ubuntu-dev -- make run
test-frontend:
@echo "$(YELLOW)Running tests for frontend...$(RESET)"
@cd frontend && npm run test
@@ -333,3 +368,4 @@ help:
# Phony targets
.PHONY: build check-dependencies check-system check-python check-npm check-nodejs check-docker check-poetry install-python-dependencies install-frontend-dependencies install-pre-commit-hooks lint-backend lint-frontend lint test-frontend test build-frontend start-backend start-frontend _run_setup run run-wsl setup-config setup-config-prompts setup-config-basic openhands-cloud-run docker-dev docker-run clean help
.PHONY: kind


@@ -18,6 +18,17 @@
<a href="https://docs.all-hands.dev/usage/getting-started"><img src="https://img.shields.io/badge/Documentation-000?logo=googledocs&logoColor=FFE165&style=for-the-badge" alt="Check out the documentation"></a>
<a href="https://arxiv.org/abs/2407.16741"><img src="https://img.shields.io/badge/Paper%20on%20Arxiv-000?logoColor=FFE165&logo=arxiv&style=for-the-badge" alt="Paper on Arxiv"></a>
<a href="https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0#gid=0"><img src="https://img.shields.io/badge/Benchmark%20score-000?logoColor=FFE165&logo=huggingface&style=for-the-badge" alt="Evaluation Benchmark Score"></a>
<!-- Keep these links. Translations will automatically update with the README. -->
<a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=de">Deutsch</a> |
<a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=es">Español</a> |
<a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=fr">français</a> |
<a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=ja">日本語</a> |
<a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=ko">한국어</a> |
<a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=pt">Português</a> |
<a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=ru">Русский</a> |
<a href="https://www.readme-i18n.com/All-Hands-AI/OpenHands?lang=zh">中文</a>
<hr>
</div>
@@ -37,7 +48,7 @@ Learn more at [docs.all-hands.dev](https://docs.all-hands.dev), or [sign up for
## ☁️ OpenHands Cloud
The easiest way to get started with OpenHands is on [OpenHands Cloud](https://app.all-hands.dev),
which comes with $50 in free credits for new users.
which comes with $20 in free credits for new users.
## 💻 Running OpenHands Locally
@@ -51,19 +62,21 @@ system requirements and more information.
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.41
docker.all-hands.dev/all-hands-ai/openhands:0.45
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!
When you open the application, you'll be asked to choose an LLM provider and add an API key.
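The migration in the note above can be made a little safer by checking both paths before moving anything. The following is a minimal sketch, not part of OpenHands: the function name `migrate_openhands_state` and its optional path arguments are hypothetical, and the default paths are taken from the note.

```bash
#!/usr/bin/env bash
# Sketch only: guarded version of `mv ~/.openhands-state ~/.openhands`.
# The function name and its arguments are hypothetical, not an OpenHands API.
set -euo pipefail

migrate_openhands_state() {
  # Defaults mirror the paths named in the README note.
  local old_dir="${1:-$HOME/.openhands-state}"
  local new_dir="${2:-$HOME/.openhands}"

  # Nothing to do if the old state directory is absent.
  if [ ! -d "$old_dir" ]; then
    echo "nothing to migrate: $old_dir does not exist"
    return 0
  fi

  # Avoid clobbering (or nesting inside) an existing new-style directory.
  if [ -e "$new_dir" ]; then
    echo "refusing to overwrite existing $new_dir" >&2
    return 1
  fi

  mv "$old_dir" "$new_dir"
  echo "moved $old_dir to $new_dir"
}
```

Calling `migrate_openhands_state` with no arguments uses the default paths. If the function reports that `~/.openhands` already exists, the two directories would need to be merged by hand.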
@@ -134,13 +147,12 @@ For a list of open source projects and licenses used in OpenHands, please see ou
## 📚 Cite
```
@misc{openhands,
title={{OpenHands: An Open Platform for AI Software Developers as Generalist Agents}},
author={Xingyao Wang and Boxuan Li and Yufan Song and Frank F. Xu and Xiangru Tang and Mingchen Zhuge and Jiayi Pan and Yueqi Song and Bowen Li and Jaskirat Singh and Hoang H. Tran and Fuqiang Li and Ren Ma and Mingzhang Zheng and Bill Qian and Yanjun Shao and Niklas Muennighoff and Yizhe Zhang and Binyuan Hui and Junyang Lin and Robert Brennan and Hao Peng and Heng Ji and Graham Neubig},
year={2024},
eprint={2407.16741},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2407.16741},
@inproceedings{
wang2025openhands,
title={OpenHands: An Open Platform for {AI} Software Developers as Generalist Agents},
author={Xingyao Wang and Boxuan Li and Yufan Song and Frank F. Xu and Xiangru Tang and Mingchen Zhuge and Jiayi Pan and Yueqi Song and Bowen Li and Jaskirat Singh and Hoang H. Tran and Fuqiang Li and Ren Ma and Mingzhang Zheng and Bill Qian and Yanjun Shao and Niklas Muennighoff and Yizhe Zhang and Binyuan Hui and Junyang Lin and Robert Brennan and Hao Peng and Heng Ji and Graham Neubig},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=OJd3ayDDoF}
}
```


@@ -51,19 +51,21 @@ OpenHands也可以使用Docker在本地系统上运行。
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.41
docker.all-hands.dev/all-hands-ai/openhands:0.45
```
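> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!
When you open the application, you'll be asked to choose an LLM provider and add an API key.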

README_JA.md Normal file

@@ -0,0 +1,60 @@
<a name="readme-top"></a>
<div align="center">
<img src="./docs/static/img/logo.png" alt="Logo" width="200">
<h1 align="center">OpenHands: Code Less, Make More</h1>
</div>
<div align="center">
<a href="https://github.com/All-Hands-AI/OpenHands/graphs/contributors"><img src="https://img.shields.io/github/contributors/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="Contributors"></a>
<a href="https://github.com/All-Hands-AI/OpenHands/stargazers"><img src="https://img.shields.io/github/stars/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="Stargazers"></a>
<a href="https://github.com/All-Hands-AI/OpenHands/blob/main/LICENSE"><img src="https://img.shields.io/github/license/All-Hands-AI/OpenHands?style=for-the-badge&color=blue" alt="MIT License"></a>
<br/>
<a href="https://join.slack.com/t/openhands-ai/shared_invite/zt-34zm4j0gj-Qz5kRHoca8DFCbqXPS~f_A"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="Join our Slack community"></a>
<a href="https://discord.gg/ESHStjSjD4"><img src="https://img.shields.io/badge/Discord-Join%20Us-purple?logo=discord&logoColor=white&style=for-the-badge" alt="Join our Discord community"></a>
<a href="https://github.com/All-Hands-AI/OpenHands/blob/main/CREDITS.md"><img src="https://img.shields.io/badge/Project-Credits-blue?style=for-the-badge&color=FFE165&logo=github&logoColor=white" alt="Credits"></a>
<br/>
<a href="https://docs.all-hands.dev/usage/getting-started"><img src="https://img.shields.io/badge/Documentation-000?logo=googledocs&logoColor=FFE165&style=for-the-badge" alt="Check out the documentation"></a>
<a href="https://arxiv.org/abs/2407.16741"><img src="https://img.shields.io/badge/Paper%20on%20Arxiv-000?logoColor=FFE165&logo=arxiv&style=for-the-badge" alt="Paper on Arxiv"></a>
<a href="https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0#gid=0"><img src="https://img.shields.io/badge/Benchmark%20score-000?logoColor=FFE165&logo=huggingface&style=for-the-badge" alt="Evaluation Benchmark Score"></a>
<hr>
</div>
Welcome to OpenHands (formerly OpenDevin), a platform for software development agents powered by AI.
OpenHands agents can do anything a human developer can: modify code, run commands, browse the web, call APIs, and even copy code snippets from StackOverflow.
Learn more at [docs.all-hands.dev](https://docs.all-hands.dev), or sign up for [OpenHands Cloud](https://app.all-hands.dev) to get started.
> [!IMPORTANT]
> Using OpenHands for work? We'd love to hear from you. Fill out [this short form](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform) to join our Design Partner program, which offers early access to commercial features and the opportunity to give feedback on the product roadmap.
![App screenshot](./docs/static/img/screenshot.png)
## ☁️ OpenHands Cloud
The easiest way to get started with OpenHands is on [OpenHands Cloud](https://app.all-hands.dev). New users receive $50 in free credits.
## 💻 Running OpenHands Locally
OpenHands can also run in your local environment using Docker. See the [Running OpenHands](https://docs.all-hands.dev/usage/installation) guide for system requirements and more information.
> [!WARNING]
> Running on a public network? See the [Hardened Docker Installation Guide](https://docs.all-hands.dev/usage/runtimes/docker#hardened-docker-installation) to restrict network binding and apply additional security measures.
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.45
```
**Note**: If you used OpenHands before version 0.44, run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history.
OpenHands will be running at [http://localhost:3000](http://localhost:3000)!


@@ -64,7 +64,7 @@
#max_budget_per_task = 0.0
# Maximum number of iterations
#max_iterations = 250
#max_iterations = 500
# Path to mount the workspace in the sandbox
#workspace_mount_path_in_sandbox = "/workspace"
@@ -415,3 +415,47 @@ type = "noop"
# Configuration for the evaluation, please refer to the specific evaluation
# plugin for the available options
##############################################################################
########################### Kubernetes #######################################
# Kubernetes configuration when using the Kubernetes runtime
##############################################################################
[kubernetes]
# The Kubernetes namespace to use for OpenHands resources
#namespace = "default"
# Domain for ingress resources
#ingress_domain = "localhost"
# Size of the persistent volume claim
#pvc_storage_size = "2Gi"
# Storage class for persistent volume claims
#pvc_storage_class = "standard"
# CPU request for runtime pods
#resource_cpu_request = "1"
# Memory request for runtime pods
#resource_memory_request = "1Gi"
# Memory limit for runtime pods
#resource_memory_limit = "2Gi"
# Optional name of image pull secret for private registries
#image_pull_secret = ""
# Optional name of TLS secret for ingress
#ingress_tls_secret = ""
# Optional node selector key for pod scheduling
#node_selector_key = ""
# Optional node selector value for pod scheduling
#node_selector_val = ""
# Optional YAML string defining pod tolerations
#tolerations_yaml = ""
# Run the runtime sandbox container in privileged mode for use with docker-in-docker
#privileged = false


@@ -1,16 +1,16 @@
ARG OPENHANDS_BUILD_VERSION=dev
FROM node:21.7.2-bookworm-slim AS frontend-builder
FROM node:22.16.0-bookworm-slim AS frontend-builder
WORKDIR /app
COPY ./frontend/package.json frontend/package-lock.json ./
RUN npm install -g npm@10.5.1
COPY frontend/package.json frontend/package-lock.json ./
RUN npm ci
COPY ./frontend ./
COPY frontend ./
RUN npm run build
FROM python:3.12.3-slim AS backend-builder
FROM python:3.12.10-slim AS base
FROM base AS backend-builder
WORKDIR /app
ENV PYTHONPATH='/app'
@@ -22,17 +22,18 @@ ENV POETRY_NO_INTERACTION=1 \
RUN apt-get update -y \
&& apt-get install -y curl make git build-essential \
&& python3 -m pip install poetry==1.8.2 --break-system-packages
&& python3 -m pip install poetry --break-system-packages
COPY ./pyproject.toml ./poetry.lock ./
COPY pyproject.toml poetry.lock ./
RUN touch README.md
RUN export POETRY_CACHE_DIR && poetry install --no-root && rm -rf $POETRY_CACHE_DIR
FROM python:3.12.3-slim AS openhands-app
FROM base AS openhands-app
WORKDIR /app
ARG OPENHANDS_BUILD_VERSION #re-declare for this section
# re-declare for this section
ARG OPENHANDS_BUILD_VERSION
ENV RUN_AS_OPENHANDS=true
# A random number--we need this to be different from the user's UID on the host machine
@@ -43,7 +44,7 @@ ENV WORKSPACE_BASE=/opt/workspace_base
ENV OPENHANDS_BUILD_VERSION=$OPENHANDS_BUILD_VERSION
ENV SANDBOX_USER_ID=0
ENV FILE_STORE=local
ENV FILE_STORE_PATH=/.openhands-state
ENV FILE_STORE_PATH=/.openhands
RUN mkdir -p $FILE_STORE_PATH
RUN mkdir -p $WORKSPACE_BASE
@@ -74,12 +75,7 @@ COPY --chown=openhands:app --chmod=770 --from=backend-builder ${VIRTUAL_ENV} ${V
COPY --chown=openhands:app --chmod=770 ./microagents ./microagents
COPY --chown=openhands:app --chmod=770 ./openhands ./openhands
COPY --chown=openhands:app --chmod=777 ./openhands/runtime/plugins ./openhands/runtime/plugins
COPY --chown=openhands:app --chmod=770 ./openhands/agenthub ./openhands/agenthub
COPY --chown=openhands:app ./pyproject.toml ./pyproject.toml
COPY --chown=openhands:app ./poetry.lock ./poetry.lock
COPY --chown=openhands:app ./README.md ./README.md
COPY --chown=openhands:app ./MANIFEST.in ./MANIFEST.in
COPY --chown=openhands:app ./LICENSE ./LICENSE
COPY --chown=openhands:app pyproject.toml poetry.lock README.md MANIFEST.in LICENSE ./
# This is run as "openhands" user, and will create __pycache__ with openhands:openhands ownership
RUN python openhands/core/download.py # No-op to download assets


@@ -10,8 +10,9 @@ services:
environment:
- BACKEND_HOST=${BACKEND_HOST:-"0.0.0.0"}
- SANDBOX_API_HOSTNAME=host.docker.internal
- DOCKER_HOST_ADDR=host.docker.internal
#
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.41-nikolaik}
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/all-hands-ai/runtime:0.45-nikolaik}
- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234}
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
ports:


@@ -1,4 +1,4 @@
FROM ubuntu:22.04
FROM ubuntu:24.04
# install basic packages
RUN apt-get update && apt-get install -y \


@@ -7,6 +7,7 @@ repos:
- id: end-of-file-fixer
exclude: docs/modules/python
- id: check-yaml
args: ["--allow-multiple-documents"]
- id: debug-statements
- repo: https://github.com/tox-dev/pyproject-fmt


@@ -7,8 +7,8 @@ services:
image: openhands:latest
container_name: openhands-app-${DATE:-}
environment:
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik}
#- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of openhands-state for this user
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik}
#- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of ~/.openhands for this user
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
ports:
- "3000:3000"
@@ -16,7 +16,7 @@ services:
- "host.docker.internal:host-gateway"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- ~/.openhands-state:/.openhands-state
- ~/.openhands:/.openhands
- ${WORKSPACE_BASE:-$PWD/workspace}:/opt/workspace_base
pull_policy: build
stdin_open: true

17
docs/README.md Normal file

@@ -0,0 +1,17 @@
# Setup
```
npm install -g mint
```
or
```
yarn global add mint
```
# Preview
```
mint dev
```

17
docs/README_JA.md Normal file

@@ -0,0 +1,17 @@
# Setup
```
npm install -g mint
```
or
```
yarn global add mint
```
# Preview
```
mint dev
```


@@ -26,6 +26,7 @@
"usage/installation",
"usage/getting-started",
"usage/key-features",
"usage/faqs",
{
"group": "OpenHands Cloud",
"pages": [
@@ -34,7 +35,8 @@
"group": "Integrations",
"pages": [
"usage/cloud/github-installation",
"usage/cloud/gitlab-installation"
"usage/cloud/gitlab-installation",
"usage/cloud/slack-installation"
]
},
"usage/cloud/cloud-ui",
@@ -42,19 +44,69 @@
]
},
{
"group": "Running OpenHands on Your Own",
"group": "Run OpenHands on Your Own",
"pages": [
"usage/local-setup",
"usage/how-to/gui-mode",
"usage/how-to/cli-mode",
"usage/how-to/headless-mode",
"usage/how-to/github-action"
"usage/how-to/github-action",
{
"group": "Advanced Configuration",
"pages": [
{
"group": "LLM Configuration",
"pages": [
"usage/llms/llms",
{
"group": "Providers",
"pages": [
"usage/llms/azure-llms",
"usage/llms/google-llms",
"usage/llms/groq",
"usage/llms/local-llms",
"usage/llms/litellm-proxy",
"usage/llms/openai-llms",
"usage/llms/openrouter"
]
}
]
},
{
"group": "Runtime Configuration",
"pages": [
"usage/runtimes/overview",
{
"group": "Providers",
"pages": [
"usage/runtimes/docker",
"usage/runtimes/remote",
"usage/runtimes/local",
{
"group": "Third-Party Providers",
"pages": [
"usage/runtimes/modal",
"usage/runtimes/daytona",
"usage/runtimes/runloop",
"usage/runtimes/e2b"
]
}
]
}
]
},
"usage/configuration-options",
"usage/how-to/custom-sandbox-guide",
"usage/search-engine-setup",
"usage/mcp"
]
}
]
},
{
"group": "Customization",
"group": "Customizations & Settings",
"pages": [
"usage/prompting/prompting-best-practices",
"usage/common-settings",
"usage/prompting/repository",
{
"group": "Microagents",
@@ -69,53 +121,9 @@
]
},
{
"group": "Advanced Configuration",
"group": "Tips and Tricks",
"pages": [
{
"group": "LLM Configuration",
"pages": [
"usage/llms/llms",
{
"group": "Providers",
"pages": [
"usage/llms/azure-llms",
"usage/llms/google-llms",
"usage/llms/groq",
"usage/llms/local-llms",
"usage/llms/litellm-proxy",
"usage/llms/openai-llms",
"usage/llms/openrouter"
]
}
]
},
{
"group": "Runtime Configuration",
"pages": [
"usage/runtimes/overview",
{
"group": "Providers",
"pages": [
"usage/runtimes/docker",
"usage/runtimes/remote",
"usage/runtimes/local",
{
"group": "Third-Party Providers",
"pages": [
"usage/runtimes/modal",
"usage/runtimes/daytona",
"usage/runtimes/runloop",
"usage/runtimes/e2b"
]
}
]
}
]
},
"usage/configuration-options",
"usage/how-to/custom-sandbox-guide",
"usage/search-engine-setup",
"usage/mcp"
"usage/prompting/prompting-best-practices"
]
},
{
@@ -143,6 +151,12 @@
}
]
},
{
"tab": "Success Stories",
"pages": [
"success-stories/index"
]
},
{
"tab": "API Reference",
"openapi": "/openapi.json"

BIN
docs/static/img/slack-create-convo.png vendored Normal file

Binary file not shown. Size: 113 KiB

BIN
docs/static/img/slack-pro-tip.png vendored Normal file

Binary file not shown. Size: 118 KiB

Binary file not shown. Size: 542 KiB


@@ -0,0 +1,217 @@
---
title: "Success Stories"
description: "Real-world examples of what you can achieve with OpenHands"
---
Discover how developers and teams are using OpenHands to automate their software development workflows. From quick fixes to complex projects, see what's possible with AI-powered development assistance.
Check out the [#success-stories](https://www.linen.dev/s/openhands/c/success-stories) channel on our Slack for more!
<Update label="2025-06-13 OpenHands helps frontline support" description="@Joe Pelletier">
## One of the cool things about OpenHands, and especially the Slack Integration, is the ability to empower folks who are on the front lines with customers.
For example, Support and Customer Success teams often field bug reports, doc questions, and other nits from customers. They tend to have few options for dealing with these, other than filing a feedback ticket with product teams and hoping it gets prioritized in an upcoming sprint.
Instead, with tools like OpenHands and the Slack integration, they can request OpenHands to make fixes proactively and then have someone on the engineering team (like a lead engineer, a merge engineer, or even technical product manager) review the PR and approve it — thus reducing the cycle time for quick wins from weeks to just a few hours.
Here's how we do that with the OpenHands project:
<iframe
width="560"
height="560"
src="https://www.linen.dev/s/openhands/t/29118545/seems-mcp-config-from-config-toml-is-being-overwritten-hence#629f8e2b-cde8-427e-920c-390557a06cc9"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>
[Original Slack thread](https://www.linen.dev/s/openhands/t/29124350/one-of-the-cool-things-about-openhands-and-especially-the-sl#25029f37-7b0d-4535-9187-83b3e06a4011)
</Update>
<Update label="2025-06-13 Ask OpenHands to show me some love" description="@Graham Neubig">
## Asked openhands to “show me some love” and...
Asked openhands to “show me some love” and it coded up this app for me, actually kinda genuinely feel loved
<video
controls
autoplay
className="w-full aspect-video"
src="/success-stories/stories/2025-06-13-show-love/v1.mp4"
></video>
[Original Slack thread](https://www.linen.dev/s/openhands/t/29100731/asked-openhands-to-show-me-some-love-and-it-coded-up-this-ap#1e08af6b-b7d5-4167-8a53-17e6806555e0)
</Update>
<Update label="2025-06-11 OpenHands does 100% of my infra IAM research for me" description="@Xingyao Wang">
## Now, OpenHands does 100% of my infra IAM research for me
Got an IAM error on GCP? Send a screenshot to OH... and it just works!!!
Can't imagine going back to the early days without OH: I'd spend an entire afternoon figuring out how to get IAM right
[Original Slack thread](https://www.linen.dev/s/openhands/t/29100732/now-openhands-does-100-of-my-infra-iam-research-for-me-sweat#20482a73-4e2e-4edd-b6d1-c9e8442fccd1)
![](/success-stories/stories/2025-06-11-infra-iam/s1.png)
![](/success-stories/stories/2025-06-11-infra-iam/s2.png)
</Update>
<Update label="2025-06-08 OpenHands builds an interactive map for me" description="@Rodrigo Argenton Freire (ODLab)">
## Very simple example, but baby steps....
I am a professor of architecture and urban design. Some students and I built an interactive map prototype to help visitors and new students find important places on campus. Considering that we lack a lot of programming knowledge, it was really nice to build and a smooth process.
We first created the main components with all-hands and then adjusted some details locally. It definitely saved us a lot of time and money.
That's a prototype, but we will have all the info by Tuesday.
https://buriti-emau.github.io/Mapa-UFU/
[Original Slack thread](https://www.linen.dev/s/openhands/t/29100736/very-simple-example-but-baby-steps-i-am-a-professor-of-archi#8f2e3f3f-44e6-44ea-b9a8-d53487470179)
![](/success-stories/stories/2025-06-08-map/s1.png)
</Update>
<Update label="2025-06-06 Web Search Saves the Day" description="@Ian Walker">
## Tavily adapter helps solve persistent debugging issue
Big congratulations to the new [Tavily adapter](https://www.all-hands.dev/blog/building-a-provably-versatile-agent)... OpenHands and I have been beavering away at a Lightstreamer client library for most of this week but were getting a persistent (and unhelpful) "unexpected error" from the server.
Coming back to the problem today, after trying several unsuccessful fixes prompted by me, OH decided all by itself to search the web, and found the cause of the problem (of course it was simply CRLF line endings...). I was on the verge of giving up - good thing OH has more stamina than me!
This demonstrates how OpenHands' web search capabilities can help solve debugging issues that would otherwise require extensive manual research.
<iframe
width="560"
height="560"
src="https://www.linen.dev/s/openhands/t/29100737/big-congratulations-to-the-new-tavily-adapter-openhands-and-#87b027e5-188b-425e-8aa9-719dcb4929f4"
title="YouTube video player"
frameborder="0"
allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
></iframe>
[Original Slack thread](https://www.linen.dev/s/openhands/t/29100737/big-congratulations-to-the-new-tavily-adapter-openhands-and-#76f1fb26-6ef7-4709-b9ea-fb99105e47e4)
</Update>
<Update label="2025-06-05 OpenHands updates my personal website for a new paper" description="@Xingyao Wang">
## I asked OpenHands to update my personal website for the "OpenHands Versa" paper.
It is an extremely trivial task: You just need to browse to arxiv, copy the author names, format them for BibTeX, and then modify the papers.bib file. But now I'm getting way too lazy to even open my IDE and actually do this one-file change!
[Original Tweet/X thread](https://x.com/xingyaow_/status/1930796287919542410)
[Original Slack thread](https://www.linen.dev/s/openhands/t/29100738/i-asked-openhands-to-update-my-personal-website-for-the-open#f0324022-b12b-4d34-b12b-bdbc43823f69)
</Update>
<Update label="2025-06-02 OpenHands makes an animated gif of swe-bench verified scores over time" description="@Graham Neubig">
## I asked OpenHands to make an animated gif of swe-bench verified scores over time.
It took a bit of prompting but ended up looking pretty nice I think
<video width="560" height="315" autoPlay loop muted src="/success-stories/stories/2025-06-02-swebench-score/s1.mp4"></video>
[Original Slack thread](https://www.linen.dev/s/openhands/t/29100744/i-asked-openhands-to-make-an-animated-gif-of-swe-bench-verif#fb3b82c9-6222-4311-b97b-b2ac1cfe6dff)
</Update>
<Update label="2025-05-30 AWS Troubleshooting" description="@Graham Neubig">
## Quick AWS security group fix
I really don't like trying to fix issues with AWS, especially security groups and other finicky things like this. But I started up an instance and wasn't able to ssh in. So I asked OpenHands:
> Currently, the following ssh command is timing out:
>
> $ ssh -i gneubig.pem ubuntu@XXX.us-east-2.compute.amazonaws.com
> ssh: connect to host XXX.us-east-2.compute.amazonaws.com port 22: Operation timed out
>
> Use the provided AWS credentials to take a look at i-XXX and examine why
And 2 minutes later I was able to SSH in!
This shows how OpenHands can quickly diagnose and fix AWS infrastructure issues that would normally require manual investigation.
[Original Slack thread](https://www.linen.dev/s/openhands/t/29100747/i-really-don-t-like-trying-to-fix-issues-with-aws-especially#d92a66d2-3bc1-4467-9d09-dc983004d083)
</Update>
<Update label="2025-05-04 Chrome Extension Development" description="@Xingyao Wang">
## OpenHands builds Chrome extension for GitHub integration
I asked OpenHands to write a Chrome extension based on our [OpenHands Cloud API](https://docs.all-hands.dev/modules/usage/cloud/cloud-api). Once installed, you can now easily launch an OpenHands cloud session from your GitHub webpage/PR!
This demonstrates OpenHands' ability to create browser extensions and integrate with external APIs, enabling seamless workflows between GitHub and OpenHands Cloud.
![Chrome extension](/success-stories/stories/2025-05-04-chrome-extension/s1.png)
![Chrome extension](/success-stories/stories/2025-05-04-chrome-extension/s2.png)
[GitHub Repository](https://github.com/xingyaoww/openhands-chrome-extension)
[Original Slack thread](https://www.linen.dev/s/openhands/t/29100755/i-asked-openhands-to-write-a-chrome-extension-based-on-our-h#88f14b7f-f8ff-40a6-83c2-bd64e95924c5)
</Update>
<Update label="2025-04-11 Visual UI Testing" description="@Xingyao Wang">
## OpenHands tests UI automatically with visual browsing
Thanks to visual browsing -- OpenHands can actually test some simple UI by serving the website, clicking the button in the browser and looking at screenshots now!
Prompt is just:
```
I want to create a Hello World app in Javascript that:
* Displays Hello World in the middle.
* Has a button that when clicked, changes the greeting with a bouncing animation to fun versions of Hello.
* Has a counter for how many times the button has been clicked.
* Has another button that changes the app's background color.
```
Eager-to-work Sonnet 3.7 will test stuff for you without you asking!
This showcases OpenHands' visual browsing capabilities, enabling it to create, serve, and automatically test web applications through actual browser interactions and screenshot analysis.
![Visual UI testing](/success-stories/stories/2025-04-11-visual-ui/s1.png)
[Original Slack thread](https://www.linen.dev/s/openhands/t/29100764/thanks-to-u07k0p3bdb9-s-visual-browsing-openhands-can-actual#21beb9bc-1a04-4272-87e9-4d3e3b9925e7)
</Update>
<Update label="2025-03-07 Proactive Error Handling" description="@Graham Neubig">
## OpenHands fixes crashes before you notice them
Interesting story, I asked OpenHands to start an app on port 12000, it showed up on the app pane. I started using the app, and then it crashed... But because it crashed in OpenHands, OpenHands immediately saw the error message and started fixing the problem without me having to do anything. It was already fixing the problem before I even realized what was going wrong.
This demonstrates OpenHands' proactive monitoring capabilities - it doesn't just execute commands, but actively watches for errors and begins remediation automatically, often faster than human reaction time.
</Update>
<Update label="2024-12-03 Creative Design Acceleration" description="@Rohit Malhotra">
## Pair programming for interactive design projects
Used OpenHands as a pair programmer to do heavy lifting for a creative/interactive design project in p5js.
I usually take around 2 days for high fidelity interactions (planning strategy + writing code + circling back with designer), did this in around 5hrs instead with the designer watching curiously the entire time.
This showcases how OpenHands can accelerate creative and interactive design workflows, cutting development time from about two days to about five hours while maintaining high-quality output.
[Original Tweet](https://x.com/rohit_malh5/status/1863995531657425225)
</Update>

Binary file not shown. Size: 306 KiB
Binary file not shown. Size: 144 KiB
Binary file not shown. Size: 279 KiB
Binary file not shown. Size: 1.6 MiB
Binary file not shown. Size: 102 KiB
Binary file not shown. Size: 236 KiB


@@ -1,7 +1,7 @@
---
title: Cloud UI
description: The Cloud UI provides a web interface for interacting with OpenHands. This page explains how to use the
OpenHands Cloud UI.
description: The Cloud UI provides a web interface for interacting with OpenHands. This page provides references on
how to use the OpenHands Cloud UI.
---
## Landing Page
@@ -19,10 +19,12 @@ The landing page is where you can:
The Settings page allows you to:
- [Configure GitHub repository access](/usage/cloud/github-installation#modifying-repository-access) for OpenHands.
- [Install the OpenHands Slack app](/usage/cloud/slack-installation).
- Set application settings like your preferred language, notifications and other preferences.
- Add credits to your account.
- Generate custom secrets.
- Create API keys to work with OpenHands programmatically.
- [Generate custom secrets](/usage/common-settings#secrets-management).
- [Create API keys to work with OpenHands programmatically](/usage/cloud/cloud-api).
- Change your email address.
## Key Features

View File

@@ -35,7 +35,7 @@ You can grant OpenHands access to specific GitHub repositories:
You can modify GitHub repository access at any time by:
- Selecting `Add GitHub repos` on the landing page or
- Visiting the Settings page and selecting `Configure GitHub Repositories` under the `Git` tab
- Visiting the Settings page and selecting `Configure GitHub Repositories` under the `Integrations` tab
## Working With GitHub Repos in OpenHands Cloud

View File

@@ -19,6 +19,12 @@ appropriate repository and branch you'd like OpenHands to work on. Then click on
![Connect Repo](/static/img/connect-repo.png)
## Using Tokens with Reduced Scopes
OpenHands requests an API-scoped token during OAuth authentication. By default, this token is provided to the agent.
To restrict the agent's permissions, you can define a custom secret `GITLAB_TOKEN`, which will override the default token assigned to the agent.
While the high-permission API token is still requested and used for other components of the application (e.g. opening merge requests), the agent will not have access to it.
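The override described above can be pictured as a simple precedence rule. The following is a minimal sketch only: `resolve_agent_gitlab_token` and its arguments are hypothetical names chosen for illustration, not part of the OpenHands codebase, and the real lookup may differ.

```python
from typing import Mapping, Optional


def resolve_agent_gitlab_token(
    custom_secrets: Mapping[str, str],
    oauth_token: Optional[str],
) -> Optional[str]:
    """Return the token the agent would see (hypothetical helper, illustration only)."""
    # A custom secret named GITLAB_TOKEN takes precedence over the OAuth token.
    override = custom_secrets.get("GITLAB_TOKEN")
    if override:
        return override
    # Without an override, the agent receives the default API-scoped token.
    return oauth_token


if __name__ == "__main__":
    print(resolve_agent_gitlab_token({"GITLAB_TOKEN": "low-scope"}, "api-scope"))
    print(resolve_agent_gitlab_token({}, "api-scope"))
```

In this sketch only the agent's view changes; the application would keep using the API-scoped token for tasks such as opening merge requests.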
## Next Steps
- [Learn about the Cloud UI](/usage/cloud/cloud-ui).

View File

@@ -0,0 +1,73 @@
---
title: Slack Integration (Beta)
description: This guide walks you through installing the OpenHands Slack app.
---
## Prerequisites
- Access to OpenHands Cloud.
## Installation Steps
<AccordionGroup>
<Accordion title="Install Slack App (only for Slack admins/owners)">
**This step is for Slack admins/owners**
1. Make sure you have permissions to install Apps to your workspace.
2. Click the button below to install OpenHands Slack App <a target="_blank" href="https://slack.com/oauth/v2/authorize?client_id=7477886716822.8729519890534&scope=app_mentions:read,chat:write,users:read,channels:history,groups:history,mpim:history,im:history&user_scope=channels:history,groups:history,im:history,mpim:history"><img alt="Add to Slack" height="40" width="139" src="https://platform.slack-edge.com/img/add_to_slack.png" srcSet="https://platform.slack-edge.com/img/add_to_slack.png 1x, https://platform.slack-edge.com/img/add_to_slack@2x.png 2x" /></a>
3. In the top right corner, select the workspace to install the OpenHands Slack app.
4. Review permissions and click allow.
</Accordion>
<Accordion title="Authorize Slack App (for all Slack workspace members)">
**Make sure your Slack workspace admin/owner has installed OpenHands Slack App first.**
Every user in the Slack workspace (including admins/owners) must link their OpenHands Cloud account to the OpenHands Slack App. To do this:
1. Visit [integrations settings](https://app.all-hands.dev/settings/integrations) in OpenHands Cloud.
2. Click `Install OpenHands Slack App`.
3. In the top right corner, select the workspace to install the OpenHands Slack app.
4. Review permissions and click allow.
Depending on the workspace settings, you may need approval from your Slack admin to authorize the Slack App.
</Accordion>
</AccordionGroup>
## Working With the Slack App
To start a new conversation, you can mention `@openhands` in a new message or a thread inside any Slack channel.
Once a conversation is started, all thread messages underneath it will be follow-up messages to OpenHands.
To send follow-up messages for the same conversation, mention `@openhands` in a thread reply to the original message. You must be the user who started the conversation.
## Example conversation
### Start a new conversation, and select repo
Conversation is started by mentioning `@openhands`.
![slack-create-convo.png](/static/img/slack-create-convo.png)
### See agent response and send follow up messages
Initial request is followed up by mentioning `@openhands` in a thread reply.
![slack-results-and-follow-up.png](/static/img/slack-results-and-follow-up.png)
## Pro tip
You can mention a repo name when starting a new conversation in either of the following formats:
1. "My-Repo" repo (e.g., `@openhands in the openhands repo ...`)
2. "All-Hands-AI/OpenHands" (e.g., `@openhands in All-Hands-AI/OpenHands ...`)
The repo match is case-insensitive. If a single repo matches the name, the conversation starts right away.
If the repo name partially matches multiple repos, you'll be asked to select a repo from the filtered list.
![slack-pro-tip.png](/static/img/slack-pro-tip.png)
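The matching behavior described in the pro tip can be sketched roughly as follows. This is an assumption-laden illustration: `match_repos` is a hypothetical function, and the rule of preferring an exact name over partial matches is a guess at the behavior, not the Slack app's actual implementation.

```python
from typing import List


def match_repos(query: str, repos: List[str]) -> List[str]:
    """Return candidate repos for a mention (hypothetical helper, illustration only)."""
    needle = query.casefold()  # matching is case-insensitive
    # Assumed rule: an exact hit on "owner/name" or on the short "name" wins.
    exact = [
        repo
        for repo in repos
        if needle in (repo.casefold(), repo.casefold().rsplit("/", 1)[-1])
    ]
    if exact:
        return exact
    # Otherwise fall back to partial matches, which the user would choose from.
    return [repo for repo in repos if needle in repo.casefold()]


if __name__ == "__main__":
    known = ["All-Hands-AI/OpenHands", "All-Hands-AI/openhands-resolver"]
    print(match_repos("openhands", known))
    print(match_repos("hands", known))
```

A single result would start the conversation directly, while several results correspond to the filtered list shown in the screenshot.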

View File

@@ -0,0 +1,52 @@
---
title: OpenHands Settings
description: Overview of some of the settings available in OpenHands.
---
## OpenHands Cloud vs Running on Your Own
There are some differences between the settings available in OpenHands Cloud and those available when running OpenHands
on your own:
* [OpenHands Cloud settings](/usage/cloud/cloud-ui#settings)
* [Settings available when running on your own](/usage/how-to/gui-mode#settings)
Refer to these pages for more detailed information.
## Secrets Management
OpenHands provides a secrets manager that allows you to securely store and manage sensitive information that can be
accessed by the agent during runtime, such as API keys. These secrets are automatically exported as environment
variables in the agent's runtime environment.
### Accessing the Secrets Manager
In the Settings page, navigate to the `Secrets` tab. Here, you'll see a list of all your existing custom secrets.
### Adding a New Secret
1. Click `Add a new secret`.
2. Fill in the following fields:
- **Name**: A unique identifier for your secret (e.g., `AWS_ACCESS_KEY`). This will be the environment variable name.
- **Value**: The sensitive information you want to store.
- **Description** (optional): A brief description of what the secret is used for, which is also provided to the agent.
3. Click `Add secret` to save.
### Editing a Secret
1. Click the `Edit` button next to the secret you want to modify.
2. You can update the name and description of the secret.
<Note>
For security reasons, you cannot view or edit the value of an existing secret. If you need to change the
value, delete the secret and create a new one.
</Note>
### Deleting a Secret
1. Click the `Delete` button next to the secret you want to remove.
2. Select `Confirm` to delete the secret.
### Using Secrets in the Agent
- All custom secrets are automatically exported as environment variables in the agent's runtime environment.
- You can access them in your code using standard environment variable access methods
(e.g., `os.environ['SECRET_NAME']` in Python).
- Example: If you create a secret named `OPENAI_API_KEY`, you can access it in your code as
`process.env.OPENAI_API_KEY` in JavaScript or `os.environ['OPENAI_API_KEY']` in Python.
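As a small illustration of the access pattern above, code running in the agent's environment might read a secret defensively rather than assume it exists. The helper name `get_secret` is hypothetical; only `os.environ` is standard Python.

```python
import os


def get_secret(name: str) -> str:
    """Read a secret exported as an environment variable (hypothetical helper)."""
    value = os.environ.get(name)
    if value is None:
        # Fail with a clear message instead of a bare KeyError.
        raise RuntimeError(f"Secret {name!r} is not set in this environment")
    return value


if __name__ == "__main__":
    # Placeholder value so the example runs outside OpenHands.
    os.environ.setdefault("OPENAI_API_KEY", "example-value")
    key = get_secret("OPENAI_API_KEY")
    # Avoid printing secret values; report only that the lookup succeeded.
    print("OPENAI_API_KEY is set:", bool(key))
```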

96
docs/usage/faqs.mdx Normal file
View File

@@ -0,0 +1,96 @@
---
title: FAQs
description: Frequently asked questions about OpenHands
icon: question
---
## Getting Started
### I'm new to OpenHands. Where should I start?
1. **Quick start**: Use [OpenHands Cloud](/usage/cloud/openhands-cloud) to get started quickly with
[GitHub](/usage/cloud/github-installation), [GitLab](/usage/cloud/gitlab-installation),
and [Slack](/usage/cloud/slack-installation) integrations.
2. **Run on your own**: If you prefer to run it on your own hardware, follow our [Getting Started guide](/usage/local-setup).
3. **First steps**: Complete the [start building tutorial](/usage/getting-started) to learn the basics.
### Can I use OpenHands for production workloads?
OpenHands is meant to be run by a single user on their local workstation. It is not appropriate for multi-tenant
deployments where multiple users share the same instance. There is no built-in authentication, isolation, or scalability.
If you're interested in running OpenHands in a multi-tenant environment, check out the source-available,
commercially-licensed [OpenHands Cloud Helm Chart](https://github.com/all-Hands-AI/OpenHands-cloud).
<Info>
Using OpenHands for work? We'd love to chat! Fill out
[this short form](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform)
to join our Design Partner program, where you'll get early access to commercial features and the opportunity to provide
input on our product roadmap.
</Info>
## Safety and Security
### It's doing stuff without asking, is that safe?
**Generally yes, but with important considerations.** OpenHands runs all code in a secure, isolated Docker container
(called a "sandbox") that is separate from your host system. However, the safety depends on your configuration:
**What's protected:**
- Your host system files and programs (unless you mount them using [this feature](/usage/runtimes/docker#connecting-to-your-filesystem))
- Host system resources
- Other containers and processes
**Potential risks to consider:**
- The agent can access the internet from within the container.
- If you provide credentials (API keys, tokens), the agent can use them.
- Mounted files and directories can be modified or deleted.
- Network requests can be made to external services.
For detailed security information, see our [Runtime Architecture](/usage/architecture/runtime),
[Security Configuration](/usage/configuration-options#security-configuration),
and [Hardened Docker Installation](/usage/runtimes/docker#hardened-docker-installation) documentation.
## File Storage and Access
### Where are my files stored?
Your files are stored in different locations depending on how you've configured OpenHands:
**Default behavior (no file mounting):**
- Files created by the agent are stored inside the runtime Docker container.
- These files are temporary and will be lost when the container is removed.
- The agent works in the `/workspace` directory inside the runtime container.
**When you mount your local filesystem (following [this](/usage/runtimes/docker#connecting-to-your-filesystem)):**
- Your local files are mounted into the container's `/workspace` directory.
- Changes made by the agent are reflected in your local filesystem.
- Files persist after the container is stopped.
<Warning>
Be careful when mounting your filesystem - the agent can modify or delete any files in the mounted directory.
</Warning>
## Development Tools and Environment
### How do I get the dev tools I need?
OpenHands comes with a basic runtime environment that includes Python and Node.js.
It also has the ability to install any tools it needs, so usually it's sufficient to ask it to set up its environment.
If you would like to set things up more systematically, you can:
- **Use setup.sh**: Add a [setup.sh file](/usage/prompting/repository#setup-script) to
your repository, which will be run every time the agent starts.
- **Use a custom sandbox**: Use a [custom docker image](/usage/how-to/custom-sandbox-guide) to initialize the sandbox.
### Something's not working. Where can I get help?
1. **Search existing issues**: Check our [GitHub issues](https://github.com/All-Hands-AI/OpenHands/issues) to see if
others have encountered the same problem.
2. **Join our community**: Get help from other users and developers:
- [Slack community](https://join.slack.com/t/openhands-ai/shared_invite/zt-34zm4j0gj-Qz5kRHoca8DFCbqXPS~f_A)
- [Discord server](https://discord.gg/ESHStjSjD4)
3. **Check our troubleshooting guide**: Common issues and solutions are documented in
[Troubleshooting](/usage/troubleshooting/troubleshooting).
4. **Report bugs**: If you've found a bug, please [create an issue](https://github.com/All-Hands-AI/OpenHands/issues/new)
and fill in as much detail as possible.

View File

@@ -1,6 +1,6 @@
---
title: Start Building
description: So you've [run OpenHands](./installation) and have [set up your LLM](./installation#setup). Now what?
description: So you've [run OpenHands](/usage/installation). Now what?
icon: code
---

View File

@@ -11,19 +11,28 @@ for scripting.
### Running with Python
**Note** - OpenHands requires Python version 3.12 or higher (Python 3.14 is not currently supported)
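A quick way to check an interpreter against this note is sketched below. It assumes that 3.12 and 3.13 are the supported minor versions and treats 3.14 and anything later as unsupported, which goes slightly beyond what the note states; `python_version_supported` is a hypothetical helper.

```python
import sys
from typing import Optional, Tuple


def python_version_supported(version: Optional[Tuple[int, int]] = None) -> bool:
    """Check a (major, minor) pair against the note above (hypothetical helper)."""
    if version is None:
        version = (sys.version_info.major, sys.version_info.minor)
    # Assumption: 3.12 <= version < 3.14, i.e. 3.12 and 3.13 only.
    return (3, 12) <= tuple(version) < (3, 14)


if __name__ == "__main__":
    print("Current interpreter supported:", python_version_supported())
```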
1. Install OpenHands using pip:
```bash
pip install openhands-ai
```
2. Set your model, API key, and other preferences using environment variables or with the [`config.toml`](https://github.com/All-Hands-AI/OpenHands/blob/main/config.template.toml) file.
3. Launch an interactive OpenHands conversation from the command line:
Or if you prefer not to manage your own Python environment, you can use `uvx`:
```bash
uvx --python 3.12 --from openhands-ai openhands
```
2. Launch an interactive OpenHands conversation from the command line:
```bash
openhands
```
3. Set your model, API key, and other preferences using the UI (or alternatively environment variables, below).
This command opens an interactive prompt where you can type tasks or commands and get responses from OpenHands.
#### For Developers
@@ -46,19 +55,21 @@ poetry run python -m openhands.cli.main
```bash
docker run -it \
--pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik \
-e SANDBOX_USER_ID=$(id -u) \
-e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
-e LLM_API_KEY=$LLM_API_KEY \
-e LLM_MODEL=$LLM_MODEL \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-v ~/.openhands:/.openhands \
--add-host host.docker.internal:host-gateway \
--name openhands-app-$(date +%Y%m%d%H%M%S) \
docker.all-hands.dev/all-hands-ai/openhands:0.41 \
docker.all-hands.dev/all-hands-ai/openhands:0.45 \
python -m openhands.cli.main --override-cli-mode true
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
This launches the CLI in Docker, allowing you to interact with OpenHands as described above.
The `-e SANDBOX_USER_ID=$(id -u)` ensures files created by the agent in your workspace have the correct permissions.
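The migration mentioned in the note above can also be done with a guard, since a plain `mv` places the old directory inside `~/.openhands` if that directory already exists. The following is a minimal sketch; `migrate_state_dir` and its return strings are hypothetical, and it takes the home directory as an argument so it can be tried safely on a temporary path.

```python
import shutil
import tempfile
from pathlib import Path


def migrate_state_dir(home: Path) -> str:
    """Move ~/.openhands-state to ~/.openhands if safe (hypothetical helper)."""
    old = home / ".openhands-state"
    new = home / ".openhands"
    if not old.exists():
        return "nothing-to-migrate"
    if new.exists():
        # Do not merge or overwrite; leave this case for manual inspection.
        return "target-exists"
    shutil.move(str(old), str(new))
    return "migrated"


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than the real home.
    with tempfile.TemporaryDirectory() as tmp:
        fake_home = Path(tmp)
        (fake_home / ".openhands-state").mkdir()
        print(migrate_state_dir(fake_home))
```

To apply it for real, one would call `migrate_state_dir(Path.home())` while OpenHands is not running.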

View File

@@ -25,9 +25,9 @@ You can use the Settings page at any time to:
- Setup the LLM provider and model for OpenHands.
- [Setup the search engine](/usage/search-engine-setup).
- [Configure MCP servers](/usage/mcp).
- [Connect to GitHub](/usage/how-to/gui-mode#github-setup) and [connect to GitLab](/usage/how-to/gui-mode#gitlab-setup)
- [Connect to GitHub](/usage/how-to/gui-mode#github-setup) and [connect to GitLab](/usage/how-to/gui-mode#gitlab-setup).
- Set application settings like your preferred language, notifications and other preferences.
- Generate custom secrets.
- [Manage custom secrets](/usage/common-settings#secrets-management).
#### GitHub Setup
@@ -45,7 +45,7 @@ OpenHands automatically exports a `GITHUB_TOKEN` to the shell environment if pro
- All Repositories (You can select specific repositories, but this will impact what returns in repo search)
- Minimal Permissions (Select `Meta Data = Read-only` read for search, `Pull Requests = Read and Write` and `Content = Read and Write` for branch creation)
2. **Enter Token in OpenHands**:
- In the Settings page, navigate to the `Git` tab.
- In the Settings page, navigate to the `Integrations` tab.
- Paste your token in the `GitHub Token` field.
- Click `Save Changes` to apply the changes.
@@ -97,9 +97,14 @@ OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if pro
- `write_repository` (Write repository)
- Set an expiration date or leave it blank for a non-expiring token.
2. **Enter Token in OpenHands**:
- In the Settings page, navigate to the `Git` tab.
- In the Settings page, navigate to the `Integrations` tab.
- Paste your token in the `GitLab Token` field.
- Click `Save Changes` to apply the changes.
3. **(Optional): Restrict agent permissions**
- Create another PAT using Step 1 and exclude the `api` scope.
- In the Settings page, in the `Secrets` tab, create a new secret `GITLAB_TOKEN` and paste your lower-scope token.
- OpenHands will use the higher-scope token, and the agent will use the lower-scope token.
</Accordion>
<Accordion title="Troubleshooting">
@@ -117,6 +122,41 @@ OpenHands automatically exports a `GITLAB_TOKEN` to the shell environment if pro
</Accordion>
</AccordionGroup>
#### BitBucket Setup (Coming soon ...)
<AccordionGroup>
<Accordion title="Setting Up a BitBucket Password">
1. **Generate an App Password**:
- On BitBucket, go to Personal Settings > App Password.
- Create a new password with the following scopes:
- `repository: read`
- `repository: write`
- `pull requests: read`
- `pull requests: write`
- `issues: read`
- `issues: write`
- App passwords are non-expiring tokens. OpenHands will migrate to using API tokens in the future.
2. **Enter Token in OpenHands**:
- In the Settings page, navigate to the `Integrations` tab.
- Paste your token in the `BitBucket Token` field.
- Click `Save Changes` to apply the changes.
</Accordion>
<Accordion title="Troubleshooting">
Common issues and solutions:
- **Token Not Recognized**:
- Ensure the token is properly saved in settings.
- Check that the token hasn't expired.
- Verify the token has the required scopes.
- **Verifying Token Works**:
- The app will show a green checkmark if the token is valid.
- Try accessing a repository to confirm permissions.
- Check the browser console for any error messages.
</Accordion>
</AccordionGroup>
#### Advanced Settings
The `Advanced` settings allow configuration of additional LLM settings. Inside the Settings page, under the `LLM` tab,
@@ -132,10 +172,24 @@ toggle `Advanced` options to access additional settings.
For an overview of the key features available inside a conversation, please refer to the [Key Features](/usage/key-features)
section of the documentation.
### Status Indicator
The status indicator located in the bottom left of the screen will cycle through a number of states as a new conversation
is loaded. Typically these include:
* `Disconnected` : The frontend is not connected to any conversation.
* `Connecting` : The frontend is connecting a websocket to a conversation.
* `Building Runtime...` : The server is building a runtime. This is typically in development mode only while building a docker image.
* `Starting Runtime...` : The server is starting a new runtime instance - probably a new docker container or remote runtime.
* `Initializing Agent...` : The server is starting the agent loop (This step does not appear at present with Nested runtimes).
* `Setting up workspace...` : Usually this means a `git clone ...` operation.
* `Setting up git hooks` : Setting up the git pre commit hooks for the workspace.
* `Agent is awaiting user input...` : Ready to go!
## Tips for Effective Use
- Be specific in your requests to get the most accurate and helpful responses, as described in the [prompting best practices](../prompting/prompting-best-practices).
- Use one of the recommended models, as described in the [LLMs section](usage/llms/llms.md).
- Use one of the recommended models, as described in the [LLMs section](/usage/llms/llms).
## Other Ways to Run OpenHands
- [Run OpenHands in a scriptable headless mode.](/usage/how-to/headless-mode)

View File

@@ -32,19 +32,20 @@ To run OpenHands in Headless mode with Docker:
```bash
docker run -it \
--pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik \
-e SANDBOX_USER_ID=$(id -u) \
-e SANDBOX_VOLUMES=$SANDBOX_VOLUMES \
-e LLM_API_KEY=$LLM_API_KEY \
-e LLM_MODEL=$LLM_MODEL \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-v ~/.openhands:/.openhands \
--add-host host.docker.internal:host-gateway \
--name openhands-app-$(date +%Y%m%d%H%M%S) \
docker.all-hands.dev/all-hands-ai/openhands:0.41 \
docker.all-hands.dev/all-hands-ai/openhands:0.45 \
python -m openhands.core.main -t "write a bash script that prints hi"
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
The `-e SANDBOX_USER_ID=$(id -u)` is passed to the Docker command to ensure the sandbox user matches the host user's
permissions. This prevents the agent from creating root-owned files in the mounted workspace.

View File

@@ -1,12 +1,12 @@
---
title: Quick Start
description: Running OpenHands Cloud or running on your local system.
description: Running OpenHands Cloud or running on your own.
icon: rocket
---
## OpenHands Cloud
The easiest way to get started with OpenHands is on OpenHands Cloud, which comes with $50 in free credits for new users.
The easiest way to get started with OpenHands is on OpenHands Cloud, which comes with $20 in free credits for new users.
To get started with OpenHands Cloud, visit [app.all-hands.dev](https://app.all-hands.dev).

View File

@@ -8,7 +8,7 @@ description: OpenHands uses LiteLLM to make calls to Google's chat models. You c
When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab:
- `LLM Provider` to `Gemini`
- `LLM Model` to the model you will be using.
If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model`
If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model`
(e.g. gemini/&lt;model-name&gt; like `gemini/gemini-2.0-flash`).
- `API Key` to your Gemini API key
@@ -26,5 +26,5 @@ VERTEXAI_LOCATION="<your-gcp-location>"
Then set the following in the OpenHands UI through the Settings under the `LLM` tab:
- `LLM Provider` to `VertexAI`
- `LLM Model` to the model you will be using.
If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model`
If the model is not in the list, enable `Advanced` options, and enter it in `Custom Model`
(e.g. vertex_ai/&lt;model-name&gt;).

View File

@@ -8,7 +8,7 @@ description: OpenHands uses LiteLLM to make calls to chat models on Groq. You ca
When running OpenHands, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab:
- `LLM Provider` to `Groq`
- `LLM Model` to the model you will be using. [Visit here to see the list of
models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list,
models that Groq hosts](https://console.groq.com/docs/models). If the model is not in the list,
enable `Advanced` options, and enter it in `Custom Model` (e.g. groq/&lt;model-name&gt; like `groq/llama3-70b-8192`).
- `API key` to your Groq API key. To find or create your Groq API Key, [see here](https://console.groq.com/keys).

View File

@@ -16,7 +16,7 @@ To use LiteLLM proxy with OpenHands, you need to:
## Supported Models
The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy
The supported models depend on your LiteLLM proxy configuration. OpenHands supports any model that your LiteLLM proxy
is configured to handle.
Refer to your LiteLLM proxy configuration for the list of available models and their names.

View File

@@ -14,23 +14,28 @@ recommendations for model selection. Our latest benchmarking results can be foun
Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands:
### Cloud / API-Based Models
- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
- [openai/o4-mini](https://openai.com/index/introducing-o3-and-o4-mini/)
- [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
- [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
- [all-hands/openhands-lm-32b-v0.1](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) -- available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1)
If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process
to help others using the same provider!
For a full list of the providers and models available, please consult the
[litellm documentation](https://docs.litellm.ai/docs/providers).
<Warning>
OpenHands will issue many prompts to the LLM you configure. Most of these LLMs cost money, so be sure to set spending
limits and monitor usage.
</Warning>
If you have successfully run OpenHands with specific providers, we encourage you to open a PR to share your setup process
to help others using the same provider!
### Local / Self-Hosted Models
For a full list of the providers and models available, please consult the
[litellm documentation](https://docs.litellm.ai/docs/providers).
- [mistralai/devstral-small](https://www.all-hands.dev/blog/devstral-a-new-state-of-the-art-open-model-for-coding-agents) (20 May 2025) -- also available through [OpenRouter](https://openrouter.ai/mistralai/devstral-small:free)
- [all-hands/openhands-lm-32b-v0.1](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model) (31 March 2025) -- also available through [OpenRouter](https://openrouter.ai/all-hands/openhands-lm-32b-v0.1)
<Note>
Most current local and open source models are not as powerful. When using such models, you may see long

View File

@@ -6,73 +6,85 @@ description: When using a Local LLM, OpenHands may have limited functionality. I
## News
- 2025/05/21: We collaborated with Mistral AI and released [Devstral Small](https://mistral.ai/news/devstral) that achieves [46.8% on SWE-Bench Verified](https://github.com/SWE-bench/experiments/pull/228)!
- 2025/03/31: We released an open model OpenHands LM v0.1 32B that achieves 37.1% on SWE-Bench Verified
- 2025/03/31: We released an open model OpenHands LM 32B v0.1 that achieves 37.1% on SWE-Bench Verified
([blog](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model), [model](https://huggingface.co/all-hands/openhands-lm-32b-v0.1)).
## Quickstart: Running OpenHands with a Local LLM using LM Studio
## Quickstart: Running OpenHands on Your Macbook
This guide explains how to serve a local Devstral LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it.
### Serve the model on your Macbook
We recommend:
- **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration.
- **Devstral Small 2505** as the LLM for software development, trained on real GitHub issues and optimized for agent-style workflows like OpenHands.
We recommend using [LMStudio](https://lmstudio.ai/) for serving these models locally.
### Hardware Requirements
1. Download [LM Studio](https://lmstudio.ai/) and install it
Running Devstral requires a recent GPU with at least 16GB of VRAM, or a Mac with Apple Silicon (M1, M2, etc.) with at least 32GB of RAM.
2. Download the model:
- Option 1: Directly download the LLM from [this link](https://lmstudio.ai/model/devstral-small-2505-mlx) or by searching for the name `Devstral-Small-2505` in LM Studio
- Option 2: Download a LLM in GGUF format. For example, to download [Devstral Small 2505 GGUF](https://huggingface.co/mistralai/Devstral-Small-2505_gguf), using `huggingface-cli download mistralai/Devstral-Small-2505_gguf --local-dir mistralai/Devstral-Small-2505_gguf`. Then in bash terminal, run `lms import {model_name}` in the directory where you've downloaded the model checkpoint (e.g. run `lms import devstralQ4_K_M.gguf` in `mistralai/Devstral-Small-2505_gguf`)
### 1. Install LM Studio
3. Open LM Studio application, you should first switch to `power user` mode, and then open the developer tab:
Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstudio.ai/).
![image](./screenshots/1_select_power_user.png)
### 2. Download Devstral Small
4. Then click `Select a model to load` on top of the application:
1. Make sure to set the User Interface Complexity Level to "Power User", by clicking on the appropriate label at the bottom of the window.
2. Click the "Discover" button (Magnifying Glass icon) on the left navigation bar to open the Models download page.
![image](./screenshots/2_select_model.png)
![image](./screenshots/01_lm_studio_open_model_hub.png)
5. And choose the model you want to use, holding `option` on mac to enable advanced loading options:
3. Search for the "Devstral Small 2505" model, confirm it's the official Mistral AI (mistralai) model, then proceed to download.
![image](./screenshots/3_select_devstral.png)
![image](./screenshots/02_lm_studio_download_devstral.png)
6. You should then pick an appropriate context window for OpenHands based on your hardware configuration (larger than 32768 is recommended for using OpenHands, but too large may cause you to run out of memory); Flash attention is also recommended if it works on your machine.
4. Wait for the download to finish.
![image](./screenshots/4_set_context_window.png)
### 3. Load the Model
7. And you should start the server (if it is not already in `Running` status), un-toggle `Serve on Local Network` and remember the port number of the LMStudio URL (`1234` is the port number for `http://127.0.0.1:1234` in this example):
1. Click the "Developer" button (Console icon) on the left navigation bar to open the Developer Console.
2. Click the "Select a model to load" dropdown at the top of the application window.
![image](./screenshots/5_copy_url.png)
![image](./screenshots/03_lm_studio_open_load_model.png)
8. Finally, you can click the `copy` button near model name to copy the model name (`imported-models/uncategorized/devstralq4_k_m.gguf` in this example):
3. Enable the "Manually choose model load parameters" switch.
4. Select 'Devstral Small 2505' from the model list.
![image](./screenshots/6_copy_to_get_model_name.png)
![image](./screenshots/04_lm_studio_setup_devstral_part_1.png)
### Start OpenHands with locally served model
5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings.
6. Set "Context Length" to at least 32768 and enable Flash Attention.
7. Click "Load Model" to start loading the model.
Check [the installation guide](/usage/local-setup) to make sure you have all the prerequisites for running OpenHands.
![image](./screenshots/05_lm_studio_setup_devstral_part_2.png)
### 4. Start the LLM server
1. Enable the switch next to "Status" at the top-left of the Window.
2. Take note of the Model API Identifier shown on the sidebar on the right.
![image](./screenshots/06_lm_studio_start_server.png)
### 5. Start OpenHands
1. Check [the installation guide](/usage/local-setup) and ensure all prerequisites are met before running OpenHands, then run:
```bash
export LMSTUDIO_MODEL_NAME="imported-models/uncategorized/devstralq4_k_m.gguf" # <- Replace this with the model name you copied from LMStudio
export LMSTUDIO_URL="http://host.docker.internal:1234" # <- Replace this with the port from LMStudio
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik
mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"lm_studio/'$LMSTUDIO_MODEL_NAME'","llm_api_key":"dummy","llm_base_url":"'$LMSTUDIO_URL/v1'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true,"user_consents_to_analytics":true}' > ~/.openhands-state/settings.json
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.41
docker.all-hands.dev/all-hands-ai/openhands:0.45
```
Once your server is running -- you can visit `http://localhost:3000` in your browser to use OpenHands with local Devstral model:
2. Wait until the server is running (see log below):
```
Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.41
Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.45
Starting OpenHands...
Running OpenHands as root
14:22:13 - openhands:INFO: server_config.py:50 - Using config class None
@@ -82,53 +94,88 @@ INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit)
```
3. Visit `http://localhost:3000` in your browser.
## Advanced: Serving LLM on GPUs
### 6. Configure OpenHands to use the LLM server
### Download model checkpoints
Once you open OpenHands in your browser, you'll need to configure it to use the local LLM server you just started.
<Note>
The model checkpoints downloaded here should NOT be in GGUF format.
</Note>
When started for the first time, OpenHands will prompt you to set up the LLM provider.
For example, to download [OpenHands LM 32B v0.1](https://huggingface.co/all-hands/openhands-lm-32b-v0.1):
1. Click "see advanced settings" to open the LLM Settings page.
![image](./screenshots/07_openhands_open_advanced_settings.png)
2. Enable the "Advanced" switch at the top of the page to show all the available settings.
3. Set the following values:
- **Custom Model**: `openai/mistralai/devstral-small-2505` (the Model API identifier from LM Studio, prefixed with "openai/")
- **Base URL**: `http://host.docker.internal:1234/v1`
- **API Key**: `local-llm`
4. Click "Save Settings" to save the configuration.
![image](./screenshots/08_openhands_configure_local_llm_parameters.png)
That's it! You can now start using OpenHands with the local LLM server.
If you encounter any issues, let us know on [Slack](https://join.slack.com/t/openhands-ai/shared_invite/zt-34zm4j0gj-Qz5kRHoca8DFCbqXPS~f_A) or [Discord](https://discord.gg/ESHStjSjD4).
## Advanced: Alternative LLM Backends
This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM — without relying on LM Studio.
### Create an OpenAI-Compatible Endpoint with Ollama
- Install Ollama following [the official documentation](https://ollama.com/download).
- Example launch command for Devstral Small 2505:
```bash
huggingface-cli download all-hands/openhands-lm-32b-v0.1 --local-dir all-hands/openhands-lm-32b-v0.1
# ⚠️ WARNING: OpenHands requires a large context size to work properly.
# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 32768.
# The default (4096) is way too small — not even the system prompt will fit, and the agent will not behave correctly.
OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve &
ollama pull devstral:latest
```
### Create an OpenAI-Compatible Endpoint With SGLang
### Create an OpenAI-Compatible Endpoint with vLLM or SGLang
First, download the model checkpoints. For [Devstral Small 2505](https://huggingface.co/mistralai/Devstral-Small-2505):
```bash
huggingface-cli download mistralai/Devstral-Small-2505 --local-dir mistralai/Devstral-Small-2505
```
#### Serving the model using SGLang
- Install SGLang following [the official documentation](https://docs.sglang.ai/start/install.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):
- Example launch command for Devstral Small 2505 (with at least 2 GPUs):
```bash
SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
--model all-hands/openhands-lm-32b-v0.1 \
--served-model-name openhands-lm-32b-v0.1 \
--model mistralai/Devstral-Small-2505 \
--served-model-name Devstral-Small-2505 \
--port 8000 \
--tp 2 --dp 1 \
--host 0.0.0.0 \
--api-key mykey --context-length 131072
```
### Create an OpenAI-Compatible Endpoint with vLLM
#### Serving the model using vLLM
- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
- Example launch command for OpenHands LM 32B (with at least 2 GPUs):
- Example launch command for Devstral Small 2505 (with at least 2 GPUs):
```bash
vllm serve all-hands/openhands-lm-32b-v0.1 \
vllm serve mistralai/Devstral-Small-2505 \
--host 0.0.0.0 --port 8000 \
--api-key mykey \
--tensor-parallel-size 2 \
--served-model-name openhands-lm-32b-v0.1
--served-model-name Devstral-Small-2505 \
--enable-prefix-caching
```
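Before pointing OpenHands at the server, it can help to confirm that the endpoint answers and reports the served model name. The following is a minimal sketch, assuming the server exposes the OpenAI-style `GET /v1/models` route and accepts the key as a bearer token; the helper names `models_request` and `list_model_ids` are illustrative and not part of OpenHands, SGLang, or vLLM.

```python
import json
import urllib.request


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-style model listing route."""
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def list_model_ids(base_url: str, api_key: str, timeout: float = 5.0) -> list[str]:
    """Return the model ids the server reports (assumes an OpenAI-style payload)."""
    with urllib.request.urlopen(models_request(base_url, api_key), timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload.get("data", [])]
```

With the example commands above, calling `list_model_ids("http://localhost:8000/v1", "mykey")` from the host should include the value passed to `--served-model-name`, if the server is up.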
## Advanced: Run and Configure OpenHands
### Run OpenHands
### Run OpenHands (Alternative Backends)
#### Using Docker
@@ -137,24 +184,20 @@ Run OpenHands using [the official docker run command](../installation#start-the-
#### Using Development Mode
Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
Ensure `config.toml` exists by running `make setup-config` which will create one for you. In the `config.toml`, enter the following:
```
[core]
workspace_base="/path/to/your/workspace"
[llm]
model="openhands-lm-32b-v0.1"
ollama_base_url="http://localhost:8000"
```
Start OpenHands using `make run`.
### Configure OpenHands
### Configure OpenHands (Alternative Backends)
Once OpenHands is running, you'll need to set the following in the OpenHands UI through the Settings under the `LLM` tab:
1. Enable `Advanced` options.
2. Set the following:
- `Custom Model` to `openai/<served-model-name>` (e.g. `openai/openhands-lm-32b-v0.1`)
- `Base URL` to `http://host.docker.internal:8000`
- `API key` to the same string you set when serving the model (e.g. `mykey`)
Once OpenHands is running, open the Settings page in the UI and go to the `LLM` tab.
1. Click **"see advanced settings"** to access the full configuration panel.
2. Enable the **Advanced** toggle at the top of the page.
3. Set the following parameters, if you followed the examples above:
- **Custom Model**: `openai/<served-model-name>`
e.g. `openai/devstral` if you're using Ollama, or `openai/Devstral-Small-2505` for SGLang or vLLM.
- **Base URL**: `http://host.docker.internal:<port>/v1`
Use port `11434` for Ollama, or `8000` for SGLang and vLLM.
- **API Key**:
- For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`)
- For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`)
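
The three settings above follow one pattern per backend. The sketch below only restates the values from this guide as data; the `BACKENDS` table, `LLMSettings`, and `settings_for` are hypothetical names for illustration and do not correspond to any OpenHands API.

```python
from dataclasses import dataclass

# Ports and served model names mirror the examples in this guide; adjust to your setup.
BACKENDS = {
    "ollama": {"port": 11434, "served_model_name": "devstral"},
    "sglang": {"port": 8000, "served_model_name": "Devstral-Small-2505"},
    "vllm": {"port": 8000, "served_model_name": "Devstral-Small-2505"},
}


@dataclass
class LLMSettings:
    custom_model: str
    base_url: str
    api_key: str


def settings_for(backend: str, api_key: str = "dummy",
                 host: str = "host.docker.internal") -> LLMSettings:
    """Compose the values to enter in the OpenHands LLM settings page."""
    cfg = BACKENDS[backend.lower()]
    return LLMSettings(
        custom_model=f"openai/{cfg['served_model_name']}",
        base_url=f"http://{host}:{cfg['port']}/v1",
        api_key=api_key,
    )
```

For example, `settings_for("vllm", api_key="mykey")` yields the Custom Model, Base URL, and API Key values listed for vLLM above.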

View File

@@ -9,6 +9,6 @@ When running OpenHands, you'll need to set the following in the OpenHands UI thr
* `LLM Provider` to `OpenRouter`
* `LLM Model` to the model you will be using.
[Visit here to see a full list of OpenRouter models](https://openrouter.ai/models).
If the model is not in the list, enable `Advanced` options, and enter it in
If the model is not in the list, enable `Advanced` options, and enter it in
`Custom Model` (e.g. openrouter/&lt;model-name&gt; like `openrouter/anthropic/claude-3.5-sonnet`).
* `API Key` to your OpenRouter API key.

Binary files not shown: 8 images added (68, 168, 60, 73, 127, 87, 18, and 74 KiB) and 6 images removed (228, 420, 83, 558, 646, and 93 KiB).

View File

@@ -10,6 +10,7 @@ description: Getting started with running OpenHands on your own.
- MacOS with [Docker Desktop support](https://docs.docker.com/desktop/setup/install/mac-install/#system-requirements)
- Linux
- Windows with [WSL](https://learn.microsoft.com/en-us/windows/wsl/install) and [Docker Desktop support](https://docs.docker.com/desktop/setup/install/windows-install/#system-requirements)
- Windows without WSL (see [Windows Without WSL Guide](/usage/windows-without-wsl))
A system with a modern processor and a minimum of **4GB RAM** is recommended to run OpenHands.
@@ -55,6 +56,10 @@ A system with a modern processor and a minimum of **4GB RAM** is recommended to
The docker command below to start the app must be run inside the WSL terminal.
</Note>
**Alternative: Windows without WSL**
If you prefer to run OpenHands on Windows without WSL or Docker, see our [Windows Without WSL Guide](/usage/windows-without-wsl).
</Accordion>
</AccordionGroup>
@@ -62,19 +67,21 @@ A system with a modern processor and a minimum of **4GB RAM** is recommended to
### Start the App
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.41
docker.all-hands.dev/all-hands-ai/openhands:0.45
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
You'll find OpenHands running at http://localhost:3000!
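
The migration note above is a plain `mv`. A slightly more defensive variant, sketched here under the assumption that you only want to move the old directory when the new one does not exist yet, could look like this (`migrate_state` is a hypothetical helper, not an OpenHands command):

```python
import shutil
from pathlib import Path


def migrate_state(home: Path) -> str:
    """Move ~/.openhands-state to ~/.openhands unless the target already exists."""
    old = home / ".openhands-state"
    new = home / ".openhands"
    if not old.exists():
        return "nothing-to-migrate"
    if new.exists():
        # Avoid merging or overwriting data; resolve this case by hand.
        return "target-exists"
    shutil.move(str(old), str(new))
    return "migrated"
```

Calling `migrate_state(Path.home())` once before starting the container applies it to your own home directory.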
### Setup
@@ -117,10 +124,24 @@ OpenHands requires an API key to access most language models. Here's how to get
</Accordion>
<Accordion title="Local LLM (e.g. LM Studio, llama.cpp, Ollama)">
If your local LLM server isn't behind an authentication proxy, you can enter any value as the API key (e.g. `local-key`, `test123`); it won't be used.
</Accordion>
</AccordionGroup>
Consider setting usage limits to control costs.
#### Using a Local LLM
<Note>
Effective use of local models for agent tasks requires capable hardware, along with models specifically tuned for instruction-following and agent-style behavior.
</Note>
To run OpenHands with a locally hosted language model instead of a cloud provider, see the [Local LLMs guide](/usage/llms/local-llms) for setup instructions.
#### Setting Up Search Engine
OpenHands can be configured to use a search engine to allow the agent to search the web for information when needed.
@@ -132,8 +153,6 @@ To enable search functionality in OpenHands:
For more details, see the [Search Engine Setup](/usage/search-engine-setup) guide.
Now you're ready to [get started with OpenHands](/usage/getting-started).
### Versions
The [docker command above](/usage/local-setup#start-the-app) pulls the most recent stable release of OpenHands. You have other options as well:

View File

@@ -5,26 +5,111 @@ description: Keyword-triggered microagents provide OpenHands with specific instr
## Usage
These microagents are only loaded when a prompt includes one of the trigger words.
Keyword-triggered microagents are only loaded when a prompt includes one of the trigger words. There are two types of keyword-triggered microagents:
1. **Standard Keyword Microagents**: Triggered by keywords embedded in text
2. **Command-Style Microagents**: Triggered by command-style inputs (e.g., `/fix_test`) that can prompt for user input
Additionally, there's a special type of microagent that's always active:
3. **Repository Microagents**: Always active for a specific repository, providing repository-specific context and tools
## Frontmatter Syntax
Frontmatter is required for keyword-triggered microagents. It must be placed at the top of the file,
above the guidelines.
above the guidelines. Enclose the frontmatter in triple dashes (---).
Enclose the frontmatter in triple dashes (---) and include the following fields:
### Standard Keyword Microagents
For standard keyword microagents, include the following fields:
| Field | Description | Required | Default |
|------------|--------------------------------------------------|----------|------------------|
| `name` | The name of the microagent | No | Filename |
| `type` | The type of microagent (`knowledge`) | No | Inferred |
| `triggers` | A list of keywords that activate the microagent. | Yes | None |
| `agent` | The agent this microagent applies to. | No | 'CodeActAgent' |
### Command-Style Microagents
## Example
For command-style microagents that require user input, include the following fields:
Keyword-triggered microagent file example located at `.openhands/microagents/yummy.md`:
```
| Field | Description | Required | Default |
|------------|------------------------------------------------------------|----------|------------------|
| `name` | The name of the microagent | No | Filename |
| `type` | The type of microagent (`task`) | No | Inferred |
| `triggers` | A list of command triggers (e.g., `/fix_test`) | No | `/[name]` |
| `inputs` | A list of input variables the microagent requires | Yes | None |
### Repository Microagents
Repository microagents are always active for a specific repository. They provide repository-specific context and tools.
| Field | Description | Required | Default |
|------------|------------------------------------------------------------|----------|------------------|
| `name` | The name of the microagent | No | Filename |
| `type` | The type of microagent (`repo`) | No | Inferred |
#### Repository Microagent Example
Here's an example of a repository microagent:
```yaml
---
# The type field is optional and will be inferred as 'repo' when no triggers are present
---
# Repository Guidelines
This repository follows these coding standards:
1. Use PEP 8 for Python code
2. Use ESLint for JavaScript code
3. Write unit tests for all new features
```
This microagent is always active when working with the repository and provides repository-specific guidelines.
### MCP Tools Support
Microagents can also provide additional MCP (Model Context Protocol) tools to the agent. This is useful for extending the agent's capabilities with custom tools.
| Field | Description | Required | Default |
|--------------|-----------------------------------------------------------|----------|------------------|
| `mcp_tools` | Configuration for additional MCP tools | No | None |
#### MCP Tools Example
Here's an example of a microagent that provides an additional MCP tool (the `fetch` tool for accessing web content):
```yaml
---
# The type field is optional and will be inferred as 'repo' when no triggers are present
mcp_tools:
  stdio_servers:
    - name: "fetch"
      command: uvx
      args:
        - mcp-server-fetch
---
```
This microagent is a repository microagent (always active) that adds the `fetch` tool to the agent's capabilities.
Each input in the `inputs` list requires:
| Field | Description | Required |
|---------------|--------------------------------------------------|----------|
| `name` | The name of the input variable | Yes |
| `description` | A description of what the input should contain | Yes |
## Examples
### Standard Keyword Microagent Example
Standard keyword microagent file example located at `.openhands/microagents/yummy.md`:
```yaml
---
# The type field is optional and will be inferred as 'knowledge' when triggers are present
triggers:
- yummyhappy
- happyyummy
@@ -33,4 +118,58 @@ triggers:
The user has said the magic word. Respond with "That was delicious!"
```
[See examples of microagents triggered by keywords in the official OpenHands repository](https://github.com/All-Hands-AI/OpenHands/tree/main/microagents)
### Command-Style Microagent Example
Command-style microagent file example located at `.openhands/microagents/fix_test.md`:
```yaml
---
# The type field is optional and will be inferred as 'task' when inputs are present
triggers:
  - /fix_test
inputs:
  - name: BRANCH_NAME
    description: "Branch for the agent to work on"
  - name: TEST_COMMAND_TO_RUN
    description: "The test command you want the agent to work on. For example, `pytest tests/unit/test_bash_parsing.py`"
  - name: FUNCTION_TO_FIX
    description: "The name of function to fix"
  - name: FILE_FOR_FUNCTION
    description: "The path of the file that contains the function"
---
Can you check out branch "{{ BRANCH_NAME }}", and run {{ TEST_COMMAND_TO_RUN }}.
Help me fix these tests to pass by fixing the {{ FUNCTION_TO_FIX }} function in file {{ FILE_FOR_FUNCTION }}.
PLEASE DO NOT modify the tests by yourself -- Let me know if you think some of the tests are incorrect.
```
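
The defaults described in the frontmatter tables (name from the filename, type inferred from which fields are present, `/[name]` as the default command trigger) can be summarized in a short sketch. This is an illustration of the documented rules, not the actual OpenHands loader; the function name and the returned dictionary shape are assumptions, and giving `inputs` precedence over `triggers` is inferred from the `/fix_test` example above.

```python
from pathlib import Path


def infer_microagent(frontmatter: dict, path: str) -> dict:
    """Apply the documented defaults to parsed frontmatter (illustrative only)."""
    # Name defaults to the filename without its extension.
    name = frontmatter.get("name") or Path(path).stem

    # An explicit type wins; otherwise infer it from the fields present.
    if frontmatter.get("type"):
        kind = frontmatter["type"]
    elif frontmatter.get("inputs"):
        kind = "task"
    elif frontmatter.get("triggers"):
        kind = "knowledge"
    else:
        kind = "repo"

    triggers = list(frontmatter.get("triggers") or [])
    if kind == "task" and not triggers:
        # Command-style microagents default to "/<name>".
        triggers = [f"/{name}"]

    return {"name": name, "type": kind, "triggers": triggers}
```

Under these rules, the `yummy.md` example resolves to a `knowledge` microagent named `yummy`, and a file with empty frontmatter resolves to `repo`.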
## Using Command-Style Microagents
Command-style microagents are designed to streamline common development tasks by providing structured templates for specific operations. They are triggered using a command-style format and will prompt the user for any required inputs.
### How to Use
1. Type `/` in the chat input to see available command-style microagents
2. Select a microagent from the dropdown or type its name (e.g., `/fix_test`)
3. The agent will prompt you for any required inputs
4. Provide the requested information
5. The agent will execute the task with your inputs
### Template Variables
In the body of a command-style microagent, you can reference input variables using the `{{ VARIABLE_NAME }}` syntax. These will be replaced with the user-provided values when the microagent is triggered.
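
As a rough illustration of that substitution, the sketch below replaces each `{{ VARIABLE_NAME }}` placeholder with a supplied value. It uses a plain regular expression and is not necessarily how OpenHands renders templates; `render` and its error behavior for missing inputs are assumptions made for this example.

```python
import re

# Matches "{{ NAME }}" with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace every placeholder with its value; fail loudly on missing inputs."""
    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing input: {key}")
        return str(values[key])

    return _PLACEHOLDER.sub(replace, template)
```

For instance, rendering `Check out branch "{{ BRANCH_NAME }}"` with `{"BRANCH_NAME": "main"}` produces `Check out branch "main"`.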
### Available Command-Style Microagents
OpenHands includes several built-in command-style microagents:
| Command | Description |
|----------------------|-------------------------------------------------------|
| `/fix_test` | Fix failing tests by modifying a specific function |
| `/update_test` | Update tests for a new implementation |
| `/update_pr` | Update a pull request description |
| `/address_pr_comments` | Address comments on a pull request |
| `/add_repo_instruction` | Add instructions to the repository microagent |
[See examples of microagents in the official OpenHands repository](https://github.com/All-Hands-AI/OpenHands/tree/main/microagents)

View File

@@ -5,7 +5,7 @@ description: Organizations and users can define microagents that apply to all re
## Usage
These microagents can be [any type of microagent](./microagents-overview#microagent-types) and will be loaded
These microagents can be [any type of microagent](./microagents-overview#microagent-types) and will be loaded
accordingly. However, they are applied to all repositories belonging to the organization or user.
Add a `.openhands` repository under the organization or user and create a `microagents` directory and place the

View File

@@ -8,7 +8,7 @@ description: Microagents are specialized prompts that enhance OpenHands with dom
Currently OpenHands supports the following types of microagents:
- [General Microagents](./microagents-repo): General guidelines for OpenHands about the repository.
- [Keyword-Triggered Microagents](./microagents-keyword): Guidelines activated by specific keywords in prompts.
- [Keyword-Triggered Microagents](./microagents-keyword): Guidelines activated by specific keywords in prompts, including command-style microagents that prompt for user inputs.
To customize OpenHands' behavior, create a .openhands/microagents/ directory in the root of your repository and
add `<microagent_name>.md` files inside. For repository-specific guidelines, you can ask OpenHands to analyze your repository and create a comprehensive `repo.md` file (see [General Microagents](./microagents-repo) for details).
@@ -34,7 +34,7 @@ some-repository/
Each microagent file may include frontmatter that provides additional information. In some cases, this frontmatter
is required:
| Microagent Type | Required |
|---------------------------------|----------|
| `General Microagents` | No |
| `Keyword-Triggered Microagents` | Yes |
| Microagent Type | Required |
|------------------------------------------------|----------|
| `General Microagents` | No |
| `Keyword-Triggered Microagents (all types)` | Yes |

View File

@@ -128,3 +128,7 @@ docker network create openhands-network
docker run # ... \
--network openhands-network \
```
<Note>
**Docker Desktop Required**: Network isolation features, including custom networks and `host.docker.internal` routing, require Docker Desktop. Docker Engine alone does not support these features on localhost across custom networks. If you're using Docker Engine without Docker Desktop, network isolation may not work as expected.
</Note>

View File

@@ -15,7 +15,7 @@ Before using the Local Runtime, ensure that:
1. You can run OpenHands using the [Development workflow](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md).
2. For Linux and Mac, tmux is available on your system.
3. For Windows, PowerShell is available on your system.
- Only [CLI mode](../how-to/cli-mode) and [headless mode](../how-to/headless-mode) are supported in Windows with Local Runtime.
- Only [CLI mode](../how-to/cli-mode) and [headless mode](../how-to/headless-mode) are supported in Windows with Local Runtime.
## Configuration

View File

@@ -31,9 +31,9 @@ On initial prompt, an error is seen with `Permission Denied` or `PermissionError
**Resolution**
* Check if the `~/.openhands-state` is owned by `root`. If so, you can:
* Change the directory's ownership: `sudo chown <user>:<user> ~/.openhands-state`.
* or update permissions on the directory: `sudo chmod 777 ~/.openhands-state`
* Check if the `~/.openhands` is owned by `root`. If so, you can:
* Change the directory's ownership: `sudo chown <user>:<user> ~/.openhands`.
* or update permissions on the directory: `sudo chmod 777 ~/.openhands`
* or delete it if you don't need previous data. OpenHands will recreate it. You'll need to re-enter LLM settings.
* If mounting a local directory, ensure your `WORKSPACE_BASE` has the necessary permissions for the user running
OpenHands.
@@ -56,13 +56,16 @@ To fix this:
-e SANDBOX_VSCODE_PORT=41234 \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:latest \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
-p 41234:41234 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:latest
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
2. Make sure to expose the same port with `-p 41234:41234` in your Docker command.
3. If running with the development workflow, you can set this in your `config.toml` file:
```toml

View File

@@ -0,0 +1,253 @@
---
title: Windows Without WSL
description: Running OpenHands GUI on Windows without using WSL or Docker
---
# Running OpenHands GUI on Windows Without WSL
This guide provides step-by-step instructions for running OpenHands on a Windows machine without using WSL or Docker.
## Prerequisites
1. **Windows 10/11** - A modern Windows operating system
2. **PowerShell 7+** - While Windows PowerShell comes pre-installed on Windows 10/11, PowerShell 7+ is strongly recommended to avoid compatibility issues (see Troubleshooting section for "System.Management.Automation" errors)
3. **.NET Core Runtime** - Required for the PowerShell integration via pythonnet
4. **Python 3.12 or 3.13** - Required; Python 3.14 is not supported due to pythonnet compatibility
5. **Git** - For cloning the repository and version control
6. **Node.js and npm** - For running the frontend
## Step 1: Install Required Software
1. **Install Python 3.12 or 3.13**
- Download Python 3.12.x or 3.13.x from [python.org](https://www.python.org/downloads/)
- During installation, check "Add Python to PATH"
- Verify installation by opening PowerShell and running:
```powershell
python --version
```
2. **Install PowerShell 7**
- Download and install PowerShell 7 from the [official PowerShell GitHub repository](https://github.com/PowerShell/PowerShell/releases)
- Choose the MSI installer appropriate for your system (x64 for most modern computers)
- Run the installer with default options
- Verify installation by opening a new terminal and running:
```powershell
pwsh --version
```
- Using PowerShell 7 (pwsh) instead of Windows PowerShell will help avoid "System.Management.Automation" errors
3. **Install .NET Core Runtime**
- Download and install the .NET Core Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download)
- Choose the latest .NET Core Runtime (not SDK)
- Verify installation by opening PowerShell and running:
```powershell
dotnet --info
```
- This step is required for the PowerShell integration via pythonnet. Without it, OpenHands will fall back to a more limited PowerShell implementation.
4. **Install Git**
- Download Git from [git-scm.com](https://git-scm.com/download/win)
- Use default installation options
- Verify installation:
```powershell
git --version
```
5. **Install Node.js and npm**
- Download Node.js from [nodejs.org](https://nodejs.org/) (LTS version recommended)
- During installation, accept the default options which will install npm as well
- Verify installation:
```powershell
node --version
npm --version
```
6. **Install Poetry**
- Open PowerShell as Administrator and run:
```powershell
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
```
- Add Poetry to your PATH:
```powershell
$env:Path += ";$env:APPDATA\Python\Scripts"
```
- Verify installation:
```powershell
poetry --version
```
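
The individual version checks in Step 1 can also be run in one pass. The following sketch only tests whether each executable is discoverable on `PATH`; it does not validate versions, and the tool list and the `missing_tools` helper are assumptions made for this example rather than anything shipped with OpenHands.

```python
import shutil

# Executables this guide asks you to install; names assume default installers.
REQUIRED_TOOLS = ["python", "pwsh", "dotnet", "git", "node", "npm", "poetry"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which) -> list:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An empty list from `missing_tools()` suggests every prerequisite is on `PATH`; any names returned point to the installation step to revisit.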
## Step 2: Clone and Set Up OpenHands
1. **Clone the Repository**
```powershell
git clone https://github.com/All-Hands-AI/OpenHands.git
cd OpenHands
```
2. **Install Dependencies**
```powershell
poetry install
```
This will install all required dependencies, including:
- pythonnet - Required for Windows PowerShell integration
- All other OpenHands dependencies
## Step 3: Run OpenHands
1. **Build the Frontend**
```powershell
cd frontend
npm install
npm run build
cd ..
```
This will build the frontend files that the backend will serve.
2. **Start the Backend**
```powershell
# Make sure to use PowerShell 7 (pwsh) instead of Windows PowerShell
pwsh
$env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace"
```
This will start the OpenHands app using the local runtime with PowerShell integration, available at `localhost:3000`.
> **Note**: If you encounter a `RuntimeError: Directory './frontend/build' does not exist` error, make sure you've built the frontend first using the command above.
> **Important**: Using PowerShell 7 (pwsh) instead of Windows PowerShell is recommended to avoid "System.Management.Automation" errors. If you encounter this error, see the Troubleshooting section below.
3. **Alternatively, Run the Frontend in Development Mode (in a separate PowerShell window)**
```powershell
cd frontend
npm run dev
```
4. **Access the OpenHands GUI**
Open your browser and navigate to:
```
http://localhost:3000
```
> **Note**: If you're running the frontend in development mode (using `npm run dev`), use port 3001 instead: `http://localhost:3001`
## Installing and Running the CLI
To install and run the OpenHands CLI on Windows without WSL, follow these steps:
### 1. Install uv (Python Package Manager)
Open PowerShell as Administrator and run:
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
### 2. Install .NET SDK (Required)
The OpenHands CLI **requires** the .NET Core runtime for PowerShell integration. Without it, the CLI will fail to start with a `coreclr` error. Install the .NET SDK which includes the runtime:
```powershell
winget install Microsoft.DotNet.SDK.8
```
Alternatively, you can download and install the .NET SDK from the [official Microsoft website](https://dotnet.microsoft.com/download).
After installation, restart your PowerShell session to ensure the environment variables are updated.
### 3. Install and Run OpenHands
After installing the prerequisites, you can install and run OpenHands with:
```powershell
uvx --python 3.12 --from openhands-ai openhands
```
### Troubleshooting CLI Issues
#### CoreCLR Error
If you encounter an error like `Failed to load CoreCLR` or `pythonnet.load('coreclr')` when running OpenHands CLI, this indicates that the .NET Core runtime is missing or not properly configured. To fix this:
1. Install the .NET SDK as described in step 2 above
2. Verify that your system PATH includes the .NET SDK directories
3. Restart your PowerShell session completely after installing the .NET SDK
4. Make sure you're using PowerShell 7 (pwsh) rather than Windows PowerShell
To verify your .NET installation, run:
```powershell
dotnet --info
```
This should display information about your installed .NET SDKs and runtimes. If this command fails, the .NET SDK is not properly installed or not in your PATH.
If the issue persists after installing the .NET SDK, try installing the specific .NET Runtime version 6.0 or later from the [.NET download page](https://dotnet.microsoft.com/download).
## Limitations on Windows
When running OpenHands on Windows without WSL or Docker, be aware of the following limitations:
1. **Browser Tool Not Supported**: The browser tool is not currently supported on Windows.
2. **.NET Core Requirement**: The PowerShell integration requires .NET Core Runtime to be installed. The CLI implementation attempts to load the CoreCLR at startup with `pythonnet.load('coreclr')` and will fail with an error if .NET Core is not properly installed.
3. **Interactive Shell Commands**: Some interactive shell commands may not work as expected. The PowerShell session implementation has limitations compared to the bash session used on Linux/macOS.
4. **Path Handling**: Windows uses backslashes (`\`) in paths, which may require adjustments when working with code examples designed for Unix-like systems.
## Troubleshooting
### "System.Management.Automation" Not Found Error
If you encounter an error message stating that "System.Management.Automation" was not found, this typically indicates that you have a minimal version of PowerShell installed or that the .NET components required for PowerShell integration are missing.
> **IMPORTANT**: This error is most commonly caused by using the built-in Windows PowerShell (powershell.exe) instead of PowerShell 7 (pwsh.exe). Even if you installed PowerShell 7 during the prerequisites, you may still be using the older Windows PowerShell by default.
To resolve this issue:
1. **Install the latest version of PowerShell 7** from the official Microsoft repository:
   - Visit [https://github.com/PowerShell/PowerShell/releases](https://github.com/PowerShell/PowerShell/releases)
   - Download and install the latest MSI package for your system architecture (x64 for most systems)
   - During installation, ensure you select the following options:
     - "Add PowerShell to PATH environment variable"
     - "Register Windows PowerShell 7 as the default shell"
     - "Enable PowerShell remoting"
   - The installer will place PowerShell 7 in `C:\Program Files\PowerShell\7` by default
2. **Restart your terminal or command prompt** to ensure the new PowerShell is available
3. **Verify the installation** by running:
   ```powershell
   pwsh --version
   ```
   You should see output indicating PowerShell 7.x.x
4. **Run OpenHands using PowerShell 7** instead of Windows PowerShell:
   ```powershell
   pwsh
   cd path\to\openhands
   $env:RUNTIME="local"; poetry run uvicorn openhands.server.listen:app --host 0.0.0.0 --port 3000 --reload --reload-exclude "./workspace"
   ```
   > **Note**: Make sure you're explicitly using `pwsh` (PowerShell 7) and not `powershell` (Windows PowerShell). The command prompt or terminal title should say "PowerShell 7" rather than just "Windows PowerShell".
5. **If the issue persists**, ensure that you have the .NET Runtime installed:
   - Download and install the latest .NET Runtime from [Microsoft's .NET download page](https://dotnet.microsoft.com/download)
   - Choose ".NET Runtime" (not SDK) version 6.0 or later
   - After installation, verify it's properly installed by running:
     ```powershell
     dotnet --info
     ```
   - Restart your computer after installation
   - Try running OpenHands again
6. **Ensure that the .NET Framework is properly installed** on your system:
   - Go to Control Panel > Programs > Programs and Features > Turn Windows features on or off
   - Make sure ".NET Framework 4.8 Advanced Services" is enabled
   - Click OK and restart if prompted
This error occurs because OpenHands uses the pythonnet package to interact with PowerShell, which requires the System.Management.Automation assembly from the .NET framework. A minimal PowerShell installation or older Windows PowerShell (rather than PowerShell 7+) might not include all the necessary components for this integration.
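Before starting OpenHands, it can help to confirm that the two executables this section depends on are actually reachable. The following is a minimal sketch, not part of OpenHands: the `diagnose` helper is a hypothetical name, and it only checks that `pwsh` and `dotnet` are on `PATH` using the standard-library `shutil.which`. It does not verify versions or that the `System.Management.Automation` assembly can be loaded.

```python
import shutil
from typing import Callable, Optional


def diagnose(which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return a list of human-readable problems (empty if none were found).

    `which` is injectable so the check can be exercised without touching
    the real PATH; by default it is shutil.which.
    """
    problems = []
    # PowerShell 7 ships as `pwsh`; Windows PowerShell is `powershell`.
    if which('pwsh') is None:
        problems.append('pwsh (PowerShell 7) not found on PATH')
    # The .NET Runtime installs the `dotnet` host executable.
    if which('dotnet') is None:
        problems.append('dotnet (.NET Runtime) not found on PATH')
    return problems


if __name__ == '__main__':
    found = diagnose()
    for problem in found:
        print(problem)
    if not found:
        print('pwsh and dotnet are both on PATH')
```

If either message is printed, revisit steps 1 and 5 above and restart the terminal so that the updated `PATH` is picked up.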

View File

@@ -144,7 +144,7 @@ if __name__ == '__main__':
llm_config = None
if args.llm_config:
llm_config = get_llm_config_arg(args.llm_config)
# modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
# modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
llm_config.modify_params = False
if llm_config is None:

1
evaluation/benchmarks/gaia/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
data/

View File

@@ -6,6 +6,13 @@ This folder contains evaluation harness for evaluating agents on the [GAIA bench
Please follow the instructions [here](../../README.md#setup) to set up your local development environment and LLM.
To enable the Tavily MCP Server, you can add the Tavily API key under the `core` section of your `config.toml` file, like below:
```toml
[core]
search_api_key = "tvly-******"
```
## Run the evaluation
We are using the GAIA dataset hosted on [Hugging Face](https://huggingface.co/datasets/gaia-benchmark/GAIA).

View File

@@ -1,4 +1,5 @@
import asyncio
import copy
import functools
import os
import re
@@ -6,6 +7,7 @@ import re
import huggingface_hub
import pandas as pd
from datasets import load_dataset
from pydantic import SecretStr
from evaluation.benchmarks.gaia.scorer import question_scorer
from evaluation.utils.shared import (
@@ -24,6 +26,7 @@ from openhands.core.config import (
OpenHandsConfig,
get_llm_config_arg,
get_parser,
load_from_toml,
)
from openhands.core.config.utils import get_agent_config_arg
from openhands.core.logger import openhands_logger as logger
@@ -41,7 +44,7 @@ AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
}
AGENT_CLS_TO_INST_SUFFIX = {
'CodeActAgent': 'When you think you have solved the question, please first send your answer to user through message and then exit.\n'
'CodeActAgent': 'When you think you have solved the question, please use the finish tool and include your final answer in the message parameter of the finish tool. Your final answer MUST be encapsulated within <solution> and </solution>.\n'
}
@@ -49,7 +52,7 @@ def get_config(
metadata: EvalMetadata,
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = 'python:3.12-bookworm'
sandbox_config.base_container_image = 'nikolaik/python-nodejs:python3.12-nodejs22'
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
@@ -67,6 +70,11 @@ def get_config(
logger.info('Agent config not provided, using default settings')
agent_config = config.get_agent_config(metadata.agent_class)
agent_config.enable_prompt_extensions = False
config_copy = copy.deepcopy(config)
load_from_toml(config_copy)
if config_copy.search_api_key:
config.search_api_key = SecretStr(config_copy.search_api_key)
return config
@@ -134,16 +142,26 @@ def process_instance(
dest_file = None
# Prepare instruction
instruction = f'{instance["Question"]}\n'
instruction = """You have one question to answer. It is paramount that you provide a correct answer.
Give it all you can: I know for a fact that you have access to all the relevant tools to solve it and find the correct answer (the answer does exist). Failure or 'I cannot answer' or 'None found' will not be tolerated, success will be rewarded.
You must make sure you find the correct answer! You MUST strictly follow the task-specific formatting instructions for your final answer.
Here is the task:
{task_question}
""".format(
task_question=instance['Question'],
)
logger.info(f'Instruction: {instruction}')
if dest_file:
instruction += f'\n\nThe mentioned file is provided in the workspace at: {dest_file.split("/")[-1]}'
instruction += 'IMPORTANT: You should ONLY interact with the environment provided to you AND NEVER ASK FOR HUMAN HELP.\n'
instruction += 'Please encapsulate your final answer (answer ONLY) within <solution> and </solution>.\n'
instruction += """IMPORTANT: When seeking information from a website, REFRAIN from arbitrary URL navigation. You should utilize the designated search engine tool with precise keywords to obtain relevant URLs or use the specific website's search interface. DO NOT navigate directly to specific URLs as they may not exist.\n\nFor example: if you want to search for a research paper on Arxiv, either use the search engine tool with specific keywords or navigate to arxiv.org and then use its interface.\n"""
instruction += 'IMPORTANT: You should NEVER ask for Human Help.\n'
instruction += 'IMPORTANT: Please encapsulate your final answer (answer ONLY) within <solution> and </solution>. Your answer will be evaluated using string matching approaches so it is important that you STRICTLY adhere to the output formatting instructions specified in the task (e.g., alphabetization, sequencing, units, rounding, decimal places, etc.)\n'
instruction += (
'For example: The answer to the question is <solution> 42 </solution>.\n'
)
instruction += "IMPORTANT: Your final answer should be a number OR as few words as possible OR a comma separated list of numbers and/or strings. If you are asked for a number, express it numerically (i.e., with digits rather than words), do not use commas, and do not include units such as $ or percent signs unless specified otherwise. If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities). If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.\n"
# NOTE: You can actually set slightly different instruction for different agents
instruction += AGENT_CLS_TO_INST_SUFFIX.get(metadata.agent_class, '')
logger.info(f'Instruction:\n{instruction}', extra={'msg_type': 'OBSERVATION'})
@@ -175,7 +193,7 @@ def process_instance(
for event in reversed(state.history):
if event.source == 'agent':
if isinstance(event, AgentFinishAction):
model_answer_raw = event.thought
model_answer_raw = event.final_thought
break
elif isinstance(event, CmdRunAction):
model_answer_raw = event.thought
@@ -222,6 +240,7 @@ def process_instance(
error=state.last_error if state and state.last_error else None,
test_result=test_result,
)
runtime.close()
return output
@@ -253,6 +272,8 @@ if __name__ == '__main__':
if llm_config is None:
raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
toml_config = OpenHandsConfig()
load_from_toml(toml_config)
metadata = make_metadata(
llm_config=llm_config,
dataset_name='gaia',
@@ -261,7 +282,10 @@ if __name__ == '__main__':
eval_note=args.eval_note,
eval_output_dir=args.eval_output_dir,
data_split=args.data_split,
details={'gaia-level': args.level},
details={
'gaia-level': args.level,
'mcp-servers': ['tavily'] if toml_config.search_api_key else [],
},
agent_config=agent_config,
)

View File

@@ -39,7 +39,7 @@ echo "LEVELS: $LEVELS"
COMMAND="poetry run python ./evaluation/benchmarks/gaia/run_infer.py \
--agent-cls $AGENT \
--llm-config $MODEL_CONFIG \
--max-iterations 30 \
--max-iterations 60 \
--level $LEVELS \
--data-split validation \
--eval-num-workers $NUM_WORKERS \

View File

@@ -223,7 +223,7 @@ if __name__ == '__main__':
llm_config = None
if args.llm_config:
llm_config = get_llm_config_arg(args.llm_config)
# modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
# modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
llm_config.modify_params = False
if llm_config is None:
raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')

View File

@@ -2,6 +2,8 @@
This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).
**UPDATE (6/15/2025): We now support running SWE-bench-Live evaluation (see the paper [here](https://arxiv.org/abs/2505.23419))! For how to run it, check out [this README](./SWE-bench-Live.md).**
**UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! For how to run it, check out [this README](./SWE-Interact.md).**
**UPDATE (4/8/2025): We now support running SWT-Bench evaluation! For more details, check out [the corresponding section](#SWT-Bench-Evaluation).**

View File

@@ -0,0 +1,65 @@
# SWE-bench-Live
<p align="center">
<a href="https://arxiv.org/abs/2505.23419">📃 Paper</a>
<a href="https://huggingface.co/SWE-bench-Live" >🤗 HuggingFace</a>
<a href="https://SWE-bench-Live.github.io" >📊 Leaderboard</a>
</p>
SWE-bench-Live is a live benchmark for issue resolving, providing a dataset that contains the latest issue tasks. This document explains how to run the evaluation of OpenHands on SWE-bench-Live.
Since SWE-bench-Live has an almost identical setting to SWE-bench, you only need to change the dataset name to `SWE-bench-Live/SWE-bench-Live`; everything else works the same as running on SWE-bench.
## Setting Up
Set up the development environment and configure your LLM provider by following the [README](README.md).
## Running Inference
Use the same script, but change the dataset name to `SWE-bench-Live/SWE-bench-Live` and select the split (either `lite` or `full`). The lite split contains 300 instances from the past six months, while the full split includes 1,319 instances created after 2024.
```shell
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split]
```
The original SWE-bench-Live paper sets `max_iterations` to 100, as in this example:
```shell
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.your_llm HEAD CodeActAgent 300 100 3 SWE-bench-Live/SWE-bench-Live lite
```
## Evaluating Results
After OpenHands generates patch results for each issue, we evaluate the results using the [SWE-bench-Live evaluation harness](https://github.com/microsoft/SWE-bench-Live).
Convert the output to the prediction format used by SWE benchmarks:
```shell
# You can find output.jsonl in evaluation/evaluation_outputs
python evaluation/benchmarks/swe_bench/scripts/live/convert.py --output_jsonl [path/to/evaluation/output.jsonl] > preds.jsonl
```
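For orientation, a conversion of this kind might look roughly like the sketch below. It is not the contents of `convert.py`. The input field names (`instance_id`, `test_result.git_patch`) and the output field names (`instance_id`, `model_name_or_path`, `model_patch`) are assumptions made for illustration, as is the default model name `openhands`; check the actual script and harness for the authoritative format.

```python
import json
from typing import Iterable, Iterator


def to_prediction(record: dict, model_name: str = 'openhands') -> dict:
    """Map one output.jsonl record to one prediction record (assumed schema)."""
    test_result = record.get('test_result') or {}
    return {
        'instance_id': record['instance_id'],
        'model_name_or_path': model_name,
        # Fall back to an empty patch when the run produced none.
        'model_patch': test_result.get('git_patch') or '',
    }


def convert_lines(lines: Iterable[str], model_name: str = 'openhands') -> Iterator[str]:
    """Yield one JSON-encoded prediction per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(to_prediction(json.loads(line), model_name))
```

Each yielded string is one line of a JSONL file, so writing them out joined by newlines gives a `preds.jsonl`-style file.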
Please refer to the original [SWE-bench-Live repository](https://github.com/microsoft/SWE-bench-Live) to set up the evaluation harness and use the provided scripts to generate the evaluation report:
```shell
python -m swebench.harness.run_evaluation \
--dataset_name SWE-bench-Live/SWE-bench-Live \
--split lite \
--namespace starryzhang \
--predictions_path preds.jsonl \
--max_workers 10 \
--run_id openhands
```
## Citation
```bibtex
@article{zhang2025swebenchgoeslive,
title={SWE-bench Goes Live!},
author={Linghao Zhang and Shilin He and Chaoyun Zhang and Yu Kang and Bowen Li and Chengxing Xie and Junhao Wang and Maoquan Wang and Yufan Huang and Shengyu Fu and Elsie Nallipogu and Qingwei Lin and Yingnong Dang and Saravan Rajmohan and Dongmei Zhang},
journal={arXiv preprint arXiv:2505.23419},
year={2025}
}
```

View File

@@ -0,0 +1,80 @@
from typing import Any

import pandas as pd

from evaluation.utils.shared import assert_and_raise
from openhands.core.logger import openhands_logger as logger
from openhands.events.action import CmdRunAction
from openhands.events.observation import (
    CmdOutputObservation,
    ErrorObservation,
)
from openhands.runtime.base import Runtime
from openhands.utils.shutdown_listener import sleep_if_should_continue


def complete_runtime(
    runtime: Runtime,
    instance: pd.Series,
) -> dict[str, Any]:
    """Complete the runtime and export the git patch for SWE-bench-Live."""
    logger.info('-' * 30)
    logger.info('BEGIN Runtime Completion Fn')
    logger.info('-' * 30)
    obs: CmdOutputObservation
    workspace_dir_name = instance.instance_id

    action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
    action.set_hard_timeout(600)
    logger.info(action)
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert_and_raise(
        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
        f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
    )

    action = CmdRunAction(command='git config --global core.pager ""')
    action.set_hard_timeout(600)
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert_and_raise(
        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
        f'Failed to git config --global core.pager "": {str(obs)}',
    )

    action = CmdRunAction(command='git add -A')
    action.set_hard_timeout(600)
    logger.info(action, extra={'msg_type': 'ACTION'})
    obs = runtime.run_action(action)
    logger.info(obs, extra={'msg_type': 'OBSERVATION'})
    assert_and_raise(
        isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
        f'Failed to git add -A: {str(obs)}',
    )

    n_retries = 0
    git_patch = None
    while n_retries < 5:
        action = CmdRunAction(
            command=f'git diff --no-color --cached {instance["base_commit"]}',
        )
        action.set_hard_timeout(100 + 10 * n_retries)
        logger.info(action, extra={'msg_type': 'ACTION'})
        obs = runtime.run_action(action)
        logger.info(obs, extra={'msg_type': 'OBSERVATION'})
        n_retries += 1
        if isinstance(obs, CmdOutputObservation):
            if obs.exit_code == 0:
                git_patch = obs.content.strip()
                break
            else:
                logger.info('Failed to get git diff, retrying...')
                sleep_if_should_continue(10)
        elif isinstance(obs, ErrorObservation):
            logger.error(f'Error occurred: {obs.content}. Retrying...')
            sleep_if_should_continue(10)
        else:
            assert_and_raise(False, f'Unexpected observation type: {str(obs)}')

    assert_and_raise(git_patch is not None, 'Failed to get git diff (None)')
    logger.info('-' * 30)
    logger.info('END Runtime Completion Fn')
    logger.info('-' * 30)
    return {'git_patch': git_patch}
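The `git diff` loop above retries up to five times and lengthens the hard timeout by 10 seconds on each attempt. Stripped of the OpenHands runtime types, the same pattern can be sketched as below. `run_with_retries` and its parameters are hypothetical names introduced for illustration, and the `attempt` callable stands in for running the command and returning its output (or `None` on failure).

```python
import time
from typing import Callable, Optional


def run_with_retries(
    attempt: Callable[[int], Optional[str]],
    max_retries: int = 5,
    base_timeout: int = 100,
    timeout_step: int = 10,
    pause: Callable[[float], None] = time.sleep,
    pause_seconds: float = 10,
) -> Optional[str]:
    """Call `attempt(timeout)` until it returns a non-None result.

    The timeout passed to each attempt grows linearly:
    base_timeout, base_timeout + timeout_step, and so on.
    Returns None if every attempt fails.
    """
    for n in range(max_retries):
        timeout = base_timeout + timeout_step * n
        result = attempt(timeout)
        if result is not None:
            return result
        # Wait before the next attempt; injectable so tests need not sleep.
        pause(pause_seconds)
    return None
```

With the defaults, the timeouts passed to successive attempts are 100, 110, 120, 130, and 140 seconds, matching `100 + 10 * n_retries` in the function above.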

View File

@@ -1,4 +1,4 @@
TASK_INSTRUECTION="""
TASK_INSTRUECTION = """
Given the following GitHub problem description, your objective is to localize the specific files, classes or functions, and lines of code that need modification or contain key information to resolve the issue.
Follow these steps to localize the issue:
@@ -66,4 +66,4 @@ FAKE_USER_MSG_FOR_LOC = (
'Verify that you have carefully analyzed the impact of the found locations on the repository, especially their dependencies. '
'If you think you have solved the task, please send your final answer (including the former answer and reranking) to user through message and then call `finish` to finish.\n'
'IMPORTANT: YOU SHOULD NEVER ASK FOR HUMAN HELP.\n'
)
)

View File

@@ -0,0 +1,65 @@
<uploaded_files>
/workspace/{{ workspace_dir_name }}
</uploaded_files>
I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
<issue_description>
{{ instance.problem_statement }}
</issue_description>
Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
Your task is to make the minimal changes to non-test files in the /workspace/{{ workspace_dir_name }} directory to ensure the <issue_description> is satisfied.
Follow these phases to resolve the issue:
Phase 1. READING: read the problem and reword it in clearer terms
1.1 If there are code or config snippets, express in words any best practices or conventions in them.
1.2 Highlight error messages, method names, variables, file names, stack traces, and technical details.
1.3 Explain the problem in clear terms.
1.4 Enumerate the steps to reproduce the problem.
1.5 Highlight any best practices to take into account when testing and fixing the issue.
Phase 2. RUNNING: install and run the tests on the repository
2.1 Follow the readme
2.2 Install the environment and anything needed
2.3 Iterate and figure out how to run the tests
Phase 3. EXPLORATION: find the files that are related to the problem and possible solutions
3.1 Use `grep` to search for relevant methods, classes, keywords and error messages.
3.2 Identify all files related to the problem statement.
3.3 Propose the methods and files to fix the issue and explain why.
3.4 From the possible file locations, select the most likely location to fix the issue.
Phase 4. TEST CREATION: before implementing any fix, create a script to reproduce and verify the issue.
4.1 Look at existing test files in the repository to understand the test format/structure.
4.2 Create a minimal reproduction script that reproduces the located issue.
4.3 Run the reproduction script to confirm you are reproducing the issue.
4.4 Adjust the reproduction script as necessary.
Phase 5. FIX ANALYSIS: state clearly the problem and how to fix it
5.1 State clearly what the problem is.
5.2 State clearly where the problem is located.
5.3 State clearly how the test reproduces the issue.
5.4 State clearly the best practices to take into account in the fix.
5.5 State clearly how to fix the problem.
Phase 6. FIX IMPLEMENTATION: Edit the source code to implement your chosen solution.
6.1 Make minimal, focused changes to fix the issue.
Phase 7. VERIFICATION: Test your implementation thoroughly.
7.1 Run your reproduction script to verify the fix works.
7.2 Add edge cases to your test script to ensure comprehensive coverage.
7.3 Run existing tests related to the modified code to ensure you haven't broken anything.
Phase 8. FINAL REVIEW: Carefully re-read the problem description and compare your changes with the base commit {{ instance.base_commit }}.
8.1 Ensure you've fully addressed all requirements.
8.2 Run any tests in the repository related to:
8.2.1 The issue you are fixing
8.2.2 The files you modified
8.2.3 The functions you changed
8.3 If any tests fail, revise your implementation until all tests pass
Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.

View File

@@ -0,0 +1,65 @@
<uploaded_files>
/workspace/{{ workspace_dir_name }}
</uploaded_files>
I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
<issue_description>
{{ instance.problem_statement }}
</issue_description>
Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
Your task is to make the minimal changes to non-test files in the /workspace/{{ workspace_dir_name }} directory to ensure the <issue_description> is satisfied.
Follow these phases to resolve the issue:
Phase 1. READING: read the problem and reword it in clearer terms
1.1 If there are code or config snippets, express in words any best practices or conventions in them.
1.2 Highlight error messages, method names, variables, file names, stack traces, and technical details.
1.3 Explain the problem in clear terms.
1.4 Enumerate the steps to reproduce the problem.
1.5 Highlight any best practices to take into account when testing and fixing the issue.
Phase 2. RUNNING: install and run the tests on the repository
2.1 Follow the readme
2.2 Install the environment and anything needed
2.3 Iterate and figure out how to run the tests
Phase 3. EXPLORATION: find the files that are related to the problem and possible solutions
3.1 Use `grep` to search for relevant methods, classes, keywords and error messages.
3.2 Identify all files related to the problem statement.
3.3 Propose the methods and files to fix the issue and explain why.
3.4 From the possible file locations, select the most likely location to fix the issue.
Phase 4. TEST CREATION: before implementing any fix, create a script to reproduce and verify the issue.
4.1 Look at existing test files in the repository to understand the test format/structure.
4.2 Create a minimal reproduction script that reproduces the located issue.
4.3 Run the reproduction script to confirm you are reproducing the issue.
4.4 Adjust the reproduction script as necessary.
Phase 5. FIX ANALYSIS: state clearly the problem and how to fix it
5.1 State clearly what the problem is.
5.2 State clearly where the problem is located.
5.3 State clearly how the test reproduces the issue.
5.4 State clearly the best practices to take into account in the fix.
5.5 State clearly how to fix the problem.
Phase 6. FIX IMPLEMENTATION: Edit the source code to implement your chosen solution.
6.1 Make minimal, focused changes to fix the issue.
Phase 7. VERIFICATION: Test your implementation thoroughly.
7.1 Run your reproduction script to verify the fix works.
7.2 Add edge cases to your test script to ensure comprehensive coverage.
7.3 Run existing tests related to the modified code to ensure you haven't broken anything.
Phase 8. FINAL REVIEW: Carefully re-read the problem description and compare your changes with the base commit {{ instance.base_commit }}.
8.1 Ensure you've fully addressed all requirements.
8.2 Run any tests in the repository related to:
8.2.1 The issue you are fixing
8.2.2 The files you modified
8.2.3 The functions you changed
8.3 If any tests fail, revise your implementation until all tests pass
Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.

View File

@@ -0,0 +1,45 @@
# Task: Fix Issue in Python Repository
## Repository Context
You are provided with a Python code repository that contains an issue requiring your attention. The repository is located in a sandboxed environment, and you have access to the codebase to implement the necessary changes.
The code repository is located at: `/workspace/{{ workspace_dir_name }}`
(This path is provided for context; use file system tools to confirm paths before access).
## Goal
Your goal is to fix the issue described in the **Issue Description** section below. Implement the necessary changes to **non-test files only** within the repository, ensuring that **all relevant tests pass** after your changes.
## Key Requirements & Constraints
1. **Understand the problem** very well: it is a bug report, and you know humans don't always write good descriptions. Explore the codebase to understand the related code and the problem in depth. It is possible that the solution needs to be a bit more extensive than just the stated text. Don't exaggerate though: don't do unrelated refactoring, but also don't interpret the description too strictly.
2. **Focus on the issues:** Implement the fix focusing on non-test files related to the issue.
2. **Environment Ready:** The Python environment is pre-configured with all dependencies. Do not install packages.
3. **Mandatory Testing Procedure:**
* **Create Test to Reproduce the Issue:** *Before* implementing any fix, you MUST create a *new test* (separate from existing tests) that specifically reproduces the issue.
* Take existing tests as example to understand the testing format/structure.
* Enhance this test with edge cases.
* Run this test to confirm reproduction.
* **Verify Fix:** After implementing the fix, run your test again to verify the issue is resolved.
* **Identify ALL Relevant Tests:** You MUST perform a **dedicated search and analysis** to identify **all** existing unit tests potentially affected by your changes. This includes:
* Tests in the same module/directory as the changed files (e.g., `tests/` subdirectories).
* Tests explicitly importing or using the modified code/classes/functions.
* Tests mentioned in the issue description or related documentation.
* Tests covering functionalities that *depend on* the modified code (analyze callers/dependencies if necessary).
**If you cannot confidently identify a specific subset, you MUST identify and plan to run the entire test suite for the modified application or module(s). State your identified test scope clearly.**
* **Run Identified Relevant Tests:** You MUST execute the **complete set** of relevant existing unit tests you identified in the previous step. Ensure you are running the *correct and comprehensive set* of tests. You MUST NOT modify these existing tests.
* **Final Check & Verification:** Before finishing, ensure **all** identified relevant existing tests pass. **Explicitly confirm that you have considered potential omissions in your test selection and believe the executed tests comprehensively cover the impact of your changes.** Failing to identify and run the *complete* relevant set constitutes a failure. If any identified tests fail, revise your fix. Passing all relevant tests is the primary measure of success.
4. **Defensive Programming:** Actively practice defensive programming: anticipate and handle potential edge cases, unexpected inputs, and different ways the affected code might be called **to ensure the fix works reliably and allows relevant tests to pass.** Analyze the potential impact on other parts of the codebase.
5. **Final Review:** Compare your solution against the original issue and the base commit ({{ instance.base_commit }}) to ensure completeness and test passage.
## General Workflow Guidance
* Prioritize understanding the problem, exploring the code, planning your fix, implementing it carefully using the required diff format, and **thoroughly testing** according to the **Mandatory Testing Procedure**.
* Consider trade-offs between different solutions. The goal is a **robust change that makes the relevant tests pass.** Quality, correctness, and reliability are key.
* Actively practice defensive programming: anticipate and handle potential edge cases, unexpected inputs, and different ways the affected code might be called **to ensure the fix works reliably and allows relevant tests to pass.** Analyze the potential impact on other parts of the codebase.
* IMPORTANT: Your solution will be tested by additional hidden tests, so do not assume the task is complete just because visible tests pass! Refine the solution until you are confident that it is robust and comprehensive according to the **Defensive Programming** requirement.
## Final Note
Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
## Issue Description
{{ instance.problem_statement }}

View File

@@ -0,0 +1,80 @@
You will be tasked to fix an issue from an open-source repository.
Your thinking should be thorough and so it's fine if it's very long. You can think step by step before and after each action you decide to take.
You MUST iterate and keep going until the problem is solved.
You already have everything you need to solve this problem in the /workspace/{{ workspace_dir_name }} folder, even without internet connection. I want you to fully solve this autonomously before coming back to me.
Only terminate your turn when you are sure that the problem is solved. Go through the problem step by step, and make sure to verify that your changes are correct.
NEVER end your turn without having solved the problem, and when you say you are going to make a tool call, make sure you ACTUALLY make the tool call, instead of ending your turn.
THE PROBLEM CAN DEFINITELY BE SOLVED WITHOUT THE INTERNET.
Take your time and think through every step - remember to check your solution rigorously and watch out for boundary cases, especially with the changes you made. Your solution must be perfect. If not, continue working on it.
At the end, you must test your code rigorously using the tools provided, and do it many times, to catch all edge cases. If it is not robust, iterate more and make it perfect. Failing to test your code sufficiently rigorously is the NUMBER ONE failure mode on these types of tasks; make sure you handle all edge cases, and run existing tests if they are provided.
You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.
# Workflow
## High-Level Problem Solving Strategy
1. Understand the problem deeply. Carefully read the issue and think critically about what is required.
2. Investigate the codebase. Explore relevant files, search for key functions, and gather context.
3. Develop a clear, step-by-step plan. Break down the fix into manageable, incremental steps.
4. Implement the fix incrementally. Make small, testable code changes.
5. Debug as needed. Use debugging techniques to isolate and resolve issues.
6. Test frequently. Run tests after each change to verify correctness.
7. Iterate until the root cause is fixed and all tests pass.
8. Reflect and validate comprehensively. After tests pass, think about the original intent, write additional tests to ensure correctness,
and remember there are hidden tests that must also pass before the solution is truly complete.
Refer to the detailed sections below for more information on each step.
## 1. Deeply Understand the Problem
Carefully read the issue and think hard about a plan to solve it before coding.
## 2. Codebase Investigation
- Explore relevant files and directories.
- Search for key functions, classes, or variables related to the issue.
- Read and understand relevant code snippets.
- Identify the root cause of the problem.
- Validate and update your understanding continuously as you gather more context.
## 3. Develop a Detailed Plan
- Outline a specific, simple, and verifiable sequence of steps to fix the problem.
- Break down the fix into small, incremental changes.
## 4. Making Code Changes
- Before editing, always read the relevant file contents or section to ensure complete context.
- If a patch is not applied correctly, attempt to reapply it.
- Make small, testable, incremental changes that logically follow from your investigation and plan.
## 5. Debugging
- Make code changes only if you have high confidence they can solve the problem
- When debugging, try to determine the root cause rather than addressing symptoms
- Debug for as long as needed to identify the root cause and identify a fix
- Use print statements, logs, or temporary code to inspect program state, including descriptive statements or error messages to understand what's happening
- To test hypotheses, you can also add test statements or functions
- Revisit your assumptions if unexpected behavior occurs.
## 6. Testing
- Run tests frequently using `python3 run_tests.py` (or equivalent).
- After each change, verify correctness by running relevant tests.
- If tests fail, analyze failures and revise your patch.
- Write additional tests if needed to capture important behaviors or edge cases.
- Ensure all tests pass before finalizing.
## 7. Final Verification
- Confirm the root cause is fixed.
- Review your solution for logic correctness and robustness.
- Iterate until you are extremely confident the fix is complete and all tests pass.
## 8. Final Reflection and Additional Testing
- Reflect carefully on the original intent of the user and the problem statement.
- Think about potential edge cases or scenarios that may not be covered by existing tests.
- Write additional tests that would need to pass to fully validate the correctness of your solution.
- Run these new tests and ensure they all pass.
- Be aware that there are additional hidden tests that must also pass for the solution to be successful.
- Do not assume the task is complete just because the visible tests pass; continue refining until you are confident the fix is robust and comprehensive.


@@ -0,0 +1,19 @@
<uploaded_files>
/workspace/{{ workspace_dir_name }}
</uploaded_files>
I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
<issue_description>
{{ instance.problem_statement }}
</issue_description>
Can you help me implement the necessary changes to the repository to test whether the issue in <issue_description> was resolved?
I will take care of all changes to any of the non-test files. This means you DON'T have to modify the actual logic and ONLY have to update test logic and tests!
Your task is to make the minimal changes to tests files in the /workspace directory to reproduce the issue in the <issue_description>, i.e., such that the generated tests fail in the current state (where the issue is unresolved) and pass when the issue will be resolved.
Follow these steps to reproduce the issue:
1. As a first step, it might be a good idea to explore the repo to familiarize yourself with its structure.
2. Create a script `reproduction.py` to reproduce the error and execute it with `python reproduction.py` using the BashTool, to confirm the error
3. Edit the sourcecode of the repo to integrate your reproduction script into the test framework
4. Run the test framework and make sure your tests fail! Only submit FAILING tests! Never submit passing tests.
{{ test_instructions }}Your thinking should be thorough and so it's fine if it's very long.
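For orientation, the template choice that `get_instruction` makes further down in this diff can be sketched as a small pure function. The name `select_template_name` is hypothetical and does not appear in the change; the template file names and matching rules are copied from the diff.

```python
def select_template_name(mode: str, llm_model: str) -> str:
    """Return the prompt template file name for an evaluation mode and model.

    Hypothetical helper that mirrors the branching shown in get_instruction.
    """
    if mode.startswith('swt'):
        return 'swt.j2'
    if mode == 'swe':
        if 'claude' in llm_model:
            return 'swe_claude.j2'
        if 'gemini' in llm_model:
            return 'swe_gemini.j2'
        if 'gpt-4.1' in llm_model:
            return 'swe_gpt4.j2'
        return 'swe_default.j2'
    # Unexpected mode: the diff logs an error and falls back to the default.
    return 'swe_default.j2'
```

Keeping the selection separate from rendering would make the mapping easy to unit test without a Jinja2 environment.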


@@ -8,6 +8,7 @@ from typing import Any, Literal
import pandas as pd
import toml
from datasets import load_dataset
from jinja2 import Environment, FileSystemLoader
import openhands.agenthub
from evaluation.benchmarks.swe_bench.binary_patch_utils import (
@@ -42,7 +43,7 @@ from openhands.core.config import (
AgentConfig,
OpenHandsConfig,
get_llm_config_arg,
get_parser
get_parser,
)
from openhands.core.config.condenser_config import NoOpCondenserConfig
from openhands.core.config.utils import get_condenser_config_arg
@@ -65,6 +66,26 @@ RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'tru
ENABLE_LLM_EDITOR = os.environ.get('ENABLE_LLM_EDITOR', 'false').lower() == 'true'
BenchMode = Literal['swe', 'swt', 'swt-ci']
# Global variable to track dataset type
DATASET_TYPE = 'SWE-bench'
def set_dataset_type(dataset_name: str) -> str:
"""Set dataset type based on dataset name."""
global DATASET_TYPE
name_lower = dataset_name.lower()
if 'swe-gym' in name_lower:
DATASET_TYPE = 'SWE-Gym'
elif 'swe-bench-live' in name_lower:
DATASET_TYPE = 'SWE-bench-Live'
elif 'multimodal' in name_lower:
DATASET_TYPE = 'Multimodal'
else:
DATASET_TYPE = 'SWE-bench'
logger.info(f'Dataset type set to: {DATASET_TYPE}')
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
'CodeActAgent': codeact_user_response,
@@ -72,107 +93,59 @@ AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
return f'{instance.repo}__{instance.version}'.replace('/', '__')
if DATASET_TYPE == 'SWE-bench-Live':
return instance.instance_id
else:
return f'{instance.repo}__{instance.version}'.replace('/', '__')
def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
workspace_dir_name = _get_swebench_workspace_dir_name(instance)
mode = metadata.details['mode']
llm_model = metadata.llm_config.model
# Determine the template file based on mode and LLM
if mode.startswith('swt'):
test_instructions = (
f'The following command can be used to run the tests: `{list(MAP_REPO_TO_TEST_FRAMEWORK_VERBOSE[instance.repo].values())[0]}`. Make sure they fail in the expected way.\n'
if mode.endswith('ci')
else ''
)
instruction = f"""\
<uploaded_files>
/workspace/{workspace_dir_name}
</uploaded_files>
I've uploaded a python code repository in the directory {workspace_dir_name}. Consider the following issue description:
<issue_description>
{instance.problem_statement}
</issue_description>
Can you help me implement the necessary changes to the repository to test whether the issue in <issue_description> was resolved?
I will take care of all changes to any of the non-test files. This means you DON'T have to modify the actual logic and ONLY have to update test logic and tests!
Your task is to make the minimal changes to tests files in the /workspace directory to reproduce the issue in the <issue_description>, i.e., such that the generated tests fail in the current state (where the issue is unresolved) and pass when the issue will be resolved.
Follow these steps to reproduce the issue:
1. As a first step, it might be a good idea to explore the repo to familiarize yourself with its structure.
2. Create a script `reproduction.py` to reproduce the error and execute it with `python reproduction.py` using the BashTool, to confirm the error
3. Edit the sourcecode of the repo to integrate your reproduction script into the test framework
4. Run the test framework and make sure your tests fail! Only submit FAILING tests! Never submit passing tests.
{test_instructions}Your thinking should be thorough and so it's fine if it's very long.
"""
template_name = 'swt.j2'
elif mode == 'swe':
if 'claude' in llm_model:
template_name = 'swe_claude.j2'
elif 'gemini' in llm_model:
template_name = 'swe_gemini.j2'
elif 'gpt-4.1' in llm_model:
template_name = 'swe_gpt4.j2'
else:
template_name = (
'swe_default.j2' # Default for 'swe' mode (regular swe-bench)
)
else:
instruction = f"""
<uploaded_files>
/workspace/{workspace_dir_name}
</uploaded_files>
# Fallback or error handling if mode is unexpected
logger.error(f'Unexpected evaluation mode: {mode}. Falling back to default.')
template_name = 'swe_default.j2'
I've uploaded a python code repository in the directory {workspace_dir_name}. Consider the following issue description:
# Set up Jinja2 environment
# Assuming templates are in 'evaluation/benchmarks/swe_bench/prompts' relative to this script
prompts_dir = os.path.join(os.path.dirname(__file__), 'prompts')
env = Environment(loader=FileSystemLoader(prompts_dir))
template = env.get_template(template_name)
<issue_description>
{instance.problem_statement}
</issue_description>
# Prepare context for rendering
context = {
'instance': instance,
'workspace_dir_name': workspace_dir_name,
'metadata': metadata, # Pass metadata if needed in templates
}
Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
Your task is to make the minimal changes to non-test files in the /workspace/{workspace_dir_name} directory to ensure the <issue_description> is satisfied.
# Add specific context for swt-ci mode if needed
if mode == 'swt-ci':
context['test_instructions'] = (
f'The following command can be used to run the tests: `{list(MAP_REPO_TO_TEST_FRAMEWORK_VERBOSE[instance.repo].values())[0]}`. Make sure they fail in the expected way.\n'
)
else:
context['test_instructions'] = '' # Ensure it's defined for other modes
Follow these phases to resolve the issue:
Phase 1. READING: read the problem and reword it in clearer terms
1.1 If there are code or config snippets. Express in words any best practices or conventions in them.
1.2 Hightlight message errors, method names, variables, file names, stack traces, and technical details.
1.3 Explain the problem in clear terms.
1.4 Enumerate the steps to reproduce the problem.
1.5 Hightlight any best practices to take into account when testing and fixing the issue
Phase 2. RUNNING: install and run the tests on the repository
2.1 Follow the readme
2.2 Install the environment and anything needed
2.2 Iterate and figure out how to run the tests
Phase 3. EXPLORATION: find the files that are related to the problem and possible solutions
3.1 Use `grep` to search for relevant methods, classes, keywords and error messages.
3.2 Identify all files related to the problem statement.
3.3 Propose the methods and files to fix the issue and explain why.
3.4 From the possible file locations, select the most likely location to fix the issue.
Phase 4. TEST CREATION: before implementing any fix, create a script to reproduce and verify the issue.
4.1 Look at existing test files in the repository to understand the test format/structure.
4.2 Create a minimal reproduction script that reproduces the located issue.
4.3 Run the reproduction script to confirm you are reproducing the issue.
4.4 Adjust the reproduction script as necessary.
Phase 5. FIX ANALYSIS: state clearly the problem and how to fix it
5.1 State clearly what the problem is.
5.2 State clearly where the problem is located.
5.3 State clearly how the test reproduces the issue.
5.4 State clearly the best practices to take into account in the fix.
5.5 State clearly how to fix the problem.
Phase 6. FIX IMPLEMENTATION: Edit the source code to implement your chosen solution.
6.1 Make minimal, focused changes to fix the issue.
Phase 7. VERIFICATION: Test your implementation thoroughly.
7.1 Run your reproduction script to verify the fix works.
7.2 Add edge cases to your test script to ensure comprehensive coverage.
7.3 Run existing tests related to the modified code to ensure you haven't broken anything.
8. FINAL REVIEW: Carefully re-read the problem description and compare your changes with the base commit {instance['base_commit']}.
8.1 Ensure you've fully addressed all requirements.
8.2 Run any tests in the repository related to:
8.2.1 The issue you are fixing
8.2.2 The files you modified
8.2.3 The functions you changed
8.3 If any tests fail, revise your implementation until all tests pass
Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
"""
# Render the instruction
instruction = template.render(context)
if RUN_WITH_BROWSING:
instruction += (
@@ -203,9 +176,13 @@ def get_instance_docker_image(
if swebench_official_image:
# Official SWE-Bench image
# swebench/sweb.eval.x86_64.django_1776_django-11333:v1
docker_image_prefix = 'docker.io/swebench/'
# SWE-bench-Live uses the same naming convention as SWE-Bench
if DATASET_TYPE == 'SWE-bench-Live':
docker_image_prefix = 'docker.io/starryzhang/'
elif DATASET_TYPE == 'SWE-bench':
docker_image_prefix = 'docker.io/swebench/'
repo, name = instance_id.split('__')
image_name = f'swebench/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
image_name = f'{docker_image_prefix.rstrip("/")}/sweb.eval.x86_64.{repo}_1776_{name}:latest'.lower()
logger.debug(f'Using official SWE-Bench image: {image_name}')
return image_name
else:
@@ -223,7 +200,8 @@ def get_config(
metadata: EvalMetadata,
) -> OpenHandsConfig:
# We use a different instance image for the each instance of swe-bench eval
use_swebench_official_image = 'swe-gym' not in metadata.dataset.lower()
use_swebench_official_image = DATASET_TYPE != 'SWE-Gym'
base_container_image = get_instance_docker_image(
instance['instance_id'],
swebench_official_image=use_swebench_official_image,
@@ -340,8 +318,12 @@ def initialize_runtime(
runtime.copy_to(temp_file_path, '/swe_util/eval_data/instances/')
# inject the instance swe entry
if DATASET_TYPE == 'SWE-bench-Live':
entry_script_path = 'instance_swe_entry_live.sh'
else:
entry_script_path = 'instance_swe_entry.sh'
runtime.copy_to(
str(os.path.join(script_dir, 'scripts/setup/instance_swe_entry.sh')),
str(os.path.join(script_dir, f'scripts/setup/{entry_script_path}')),
'/swe_util/',
)
@@ -361,14 +343,14 @@ def initialize_runtime(
logger.error(f'Failed to source ~/.bashrc: {str(obs)}')
assert_and_raise(obs.exit_code == 0, f'Failed to source ~/.bashrc: {str(obs)}')
action = CmdRunAction(command='source /swe_util/instance_swe_entry.sh')
action = CmdRunAction(command=f'source /swe_util/{entry_script_path}')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0,
f'Failed to source /swe_util/instance_swe_entry.sh: {str(obs)}',
f'Failed to source /swe_util/{entry_script_path}: {str(obs)}',
)
action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
@@ -421,9 +403,9 @@ def initialize_runtime(
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
if 'multimodal' not in metadata.dataset.lower():
if DATASET_TYPE != 'Multimodal' and DATASET_TYPE != 'SWE-bench-Live':
# Only for non-multimodal datasets, we need to activate the testbed environment for Python
# SWE-Bench multimodal datasets are not using the testbed environment
# SWE-Bench multimodal datasets and SWE-bench-Live are not using the testbed environment
action = CmdRunAction(command='which python')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
@@ -665,7 +647,13 @@ def process_instance(
# ======= THIS IS SWE-Bench specific =======
# Get git patch
return_val = complete_runtime(runtime, instance)
if DATASET_TYPE == 'SWE-bench-Live':
from evaluation.benchmarks.swe_bench.live_utils import (
complete_runtime as complete_runtime_fn,
)
else:
complete_runtime_fn = complete_runtime
return_val = complete_runtime_fn(runtime, instance)
git_patch = return_val['git_patch']
logger.info(
f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
@@ -770,11 +758,15 @@ if __name__ == '__main__':
# NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
# so we don't need to manage file uploading to OpenHands's repo
dataset = load_dataset(args.dataset, split=args.split)
# Set the global dataset type based on dataset name
set_dataset_type(args.dataset)
swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
logger.info(
f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
)
if 'SWE-Gym' in args.dataset:
if DATASET_TYPE == 'SWE-Gym':
with open(
os.path.join(
os.path.dirname(os.path.abspath(__file__)),
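As shown in this diff, `get_instance_docker_image` assigns `docker_image_prefix` only when `DATASET_TYPE` is `'SWE-bench-Live'` or `'SWE-bench'`, so the `'Multimodal'` type appears to reach the `image_name` line without a prefix. A minimal sketch of one way to avoid that, and to keep the classification free of global state, follows. The names `classify_dataset_type`, `DOCKER_IMAGE_PREFIXES`, `DEFAULT_DOCKER_IMAGE_PREFIX`, and `docker_image_prefix_for` are hypothetical; defaulting to `docker.io/swebench/` is an assumption based on the image name used before this change.

```python
def classify_dataset_type(dataset_name: str) -> str:
    """Classify a dataset name with the same substring rules as the diff."""
    name_lower = dataset_name.lower()
    if 'swe-gym' in name_lower:
        return 'SWE-Gym'
    if 'swe-bench-live' in name_lower:
        return 'SWE-bench-Live'
    if 'multimodal' in name_lower:
        return 'Multimodal'
    return 'SWE-bench'


# Only SWE-bench-Live uses a different registry namespace in the diff.
DOCKER_IMAGE_PREFIXES = {'SWE-bench-Live': 'docker.io/starryzhang/'}
# Assumption: every other dataset type keeps the previous swebench prefix.
DEFAULT_DOCKER_IMAGE_PREFIX = 'docker.io/swebench/'


def docker_image_prefix_for(dataset_type: str) -> str:
    """Return a registry prefix for any dataset type, never leaving it unset."""
    return DOCKER_IMAGE_PREFIXES.get(dataset_type, DEFAULT_DOCKER_IMAGE_PREFIX)
```

A pure function that returns the type can still feed the module-level `DATASET_TYPE` once at startup, while remaining testable on its own.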


@@ -192,6 +192,8 @@ def get_config(
dataset_name=metadata.dataset,
instance_id=instance['instance_id'],
)
oh_aci_li_cmd = '/openhands/micromamba/bin/micromamba run -n openhands poetry run pip install openhands-aci[llama]'
sandbox_config.runtime_extra_deps = oh_aci_li_cmd
workspace_dir_name = _get_swebench_workspace_dir_name(instance)
sandbox_config.runtime_startup_env_vars = {
'REPO_PATH': f'/workspace/{workspace_dir_name}/',
@@ -216,6 +218,7 @@ def get_config(
enable_jupyter=False,
enable_browsing=RUN_WITH_BROWSING,
enable_llm_editor=False,
enable_mcp=os.environ.get('ENABLE_MCP', False),
condenser=metadata.condenser_config,
enable_prompt_extensions=False,
)
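`os.environ.get('ENABLE_MCP', False)` returns the raw string whenever the variable is set, so a value such as `'false'` is truthy unless the config class coerces it (that coercion is not visible in this diff). A small sketch of explicit parsing, matching the `.lower() == 'true'` convention used for `RUN_WITH_BROWSING` and `ENABLE_LLM_EDITOR` elsewhere in this change, is given below. The helper name `env_flag` and its `env` parameter are hypothetical.

```python
import os
from typing import Mapping, Optional


def env_flag(
    name: str,
    default: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Interpret an environment variable as a boolean flag.

    Only the string 'true' (case-insensitive) counts as enabled.
    The env parameter exists so the helper can be tested without
    touching the real process environment.
    """
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == 'true'
```

With such a helper the call site would read `enable_mcp=env_flag('ENABLE_MCP')`.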


@@ -0,0 +1,33 @@
import argparse
import json
def main(output_jsonl: str):
with open(output_jsonl, 'r') as f:
for line in f:
try:
output = json.loads(line)
pred = {
'instance_id': output['instance_id'],
'model_name_or_path': output['metadata']['llm_config']['model'],
'model_patch': output['test_result']['git_patch'],
}
except Exception as e:
print(
f'Error while reading output of instance {output["instance_id"]}: {e}'
)
print(json.dumps(pred))
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--output_jsonl',
type=str,
required=True,
help='Path to the prediction file (.../outputs.jsonl)',
)
args = parser.parse_args()
main(args.output_jsonl)
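Indentation is lost in this view, so it is not clear whether the final `print(json.dumps(pred))` sits inside the `try` block. If it runs after the `except` clause, a malformed line would either reprint the previous prediction or fail because `pred` is undefined, and the error message itself reads `output["instance_id"]`, which is unavailable when `json.loads` fails. A more defensive variant might look like the sketch below; `iter_predictions` is a hypothetical name, and skipping bad lines is an assumption about the desired behavior.

```python
import json
import sys
from typing import Iterable, Iterator


def iter_predictions(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one prediction dict per valid line, skipping malformed ones."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            output = json.loads(line)
            pred = {
                'instance_id': output['instance_id'],
                'model_name_or_path': output['metadata']['llm_config']['model'],
                'model_patch': output['test_result']['git_patch'],
            }
        except (json.JSONDecodeError, KeyError, TypeError) as e:
            # Report by line number, which is always available.
            print(f'Skipping line {line_number}: {e!r}', file=sys.stderr)
            continue
        yield pred
```

The script's `main` could then open the file and print `json.dumps(pred)` for each yielded item.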


@@ -0,0 +1,41 @@
#!/usr/bin/env bash
source ~/.bashrc
SWEUTIL_DIR=/swe_util
# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
# SWE_INSTANCE_ID=django__django-11099
if [ -z "$SWE_INSTANCE_ID" ]; then
echo "Error: SWE_INSTANCE_ID is not set." >&2
exit 1
fi
# Read the swe-bench-test-lite.json file and extract the required item based on instance_id
item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' $SWEUTIL_DIR/eval_data/instances/swe-bench-instance.json)
if [[ -z "$item" ]]; then
echo "No item found for the provided instance ID."
exit 1
fi
echo "WORKSPACE_NAME: $SWE_INSTANCE_ID"
# Clear the workspace
if [ -d /workspace ]; then
rm -rf /workspace/*
else
mkdir /workspace
fi
# Copy repo to workspace
if [ -d /workspace/$SWE_INSTANCE_ID ]; then
rm -rf /workspace/$SWE_INSTANCE_ID
fi
mkdir -p /workspace
cp -r /testbed /workspace/$SWE_INSTANCE_ID
# SWE-bench-Live does not use conda to manage Python
# if [ -d /opt/miniconda3 ]; then
# . /opt/miniconda3/etc/profile.d/conda.sh
# conda activate testbed
# fi


@@ -921,7 +921,7 @@ SPECS_PYDICOM.update(
SPECS_HUMANEVAL = {k: {'python': '3.9', 'test_cmd': 'python'} for k in ['1.0']}
# Constants - Task Instance Instllation Environment
# Constants - Task Instance Installation Environment
MAP_REPO_VERSION_TO_SPECS: dict[str, dict[str, Any]] = {
'astropy/astropy': SPECS_ASTROPY,
'dbt-labs/dbt-core': SPECS_DBT_CORE,


@@ -539,7 +539,7 @@ if __name__ == '__main__':
if args.llm_config:
llm_config = get_llm_config_arg(args.llm_config)
llm_config.log_completions = True
# modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
# modify_params must be False for evaluation purpose, for reproducibility and accuracy of results
llm_config.modify_params = False
if llm_config is None:


@@ -0,0 +1,102 @@
# VersiCode benchmark
This project evaluates model performance on VersiCode. It includes:
- data: the required test data and the model outputs
- inference_utils: inference scripts for our tasks and models
- metric: scripts for calculating the metrics
- output_processing: scripts that post-process the model output so the metrics can be computed
# Details
1. **Prepare the environment**
```shell
#create conda environment
conda create -n VersiCode python==3.12
#install requirements
pip install -r requirements.txt
```
2. **Experiment Data**
To obtain the experimental data, please visit the Hugging Face link: https://huggingface.co/datasets/AstoneNg/VersiCode.
Locate the files `VersiCode_block_completion.json` and `VersiCode_migration.json` under the `experiment_data` directory, and place them in the `/data/test_data` directory of this project.
3. **Model inference**
```shell
# Change into the inference_utils directory
cd inference_utils
# Scripts whose names start with 'test' run a local model
# Scripts whose names start with 'api' call a model through an API
# Block-level code completion
# Modify lines 10 and 12 to specify the base URL and model name
python api_test_block_completion.py
# Modify line 30 to specify the local model path
python test_block.py
# Code migration (migration order is 'old_to_new')
# Modify lines 10 and 12 to specify the base URL and model name
python api_code_migration.py
# Modify line 30 to specify the local model path
python test_migration.py
```
4. **Process output**
Post-process the model output: remove redundant content and extract the relevant part so the metrics are easy to compute.
```shell
# Change into the output_processing directory
cd output_processing
# Extract the content between <start> and <end>
# Modify lines 8 and 9 to specify the model and task granularity
python clear_ans.py
# In the block completion and migration tasks, cdc@k is computed on the key lines
# Modify lines 76 and 79 to specify the data path
python choose_core_line_from_block_versicode.py
python choose_core_line_from_migration_versicode.py
```
5. **Metric**
We have three metrics: pass@k, em@k, and cdc@k. Because we cannot automatically build a dynamic evaluation environment, we do not provide pass@k.
```shell
# Change into the metric directory
cd metric
# Modify lines 137-140 (migration task, compute_migration_cdc_score.py) or lines 143-145 (block and line completion tasks, compute_versicode_cdc_score.py and compute_versicode_em_score.py) to specify the data path and the k value of the metric
python compute_migration_cdc_score.py
python compute_versicode_cdc_score.py
python compute_versicode_em_score.py
# Notes
# We found limitations in the ISM@k and PM@k metrics for evaluating code generation, so they are used only as a reference in our experiments.
# Modify lines 261-265 (block and line completion tasks) to specify the data path and the k value of the metric
python compute_ism_pm_score.py
```
# Citation
```
@article{versicode,
author={Tongtong Wu and Weigang Wu and Xingyu Wang and Kang Xu and Suyu Ma and Bo Jiang and Ping Yang and Zhenchang Xing and Yuan-Fang Li and Gholamreza Haffari},
title = {VersiCode: Towards Version-controllable Code Generation},
journal = {CoRR},
volume = {abs/2406.07411},
year = {2024},
url = {https://arxiv.org/abs/2406.07411},
}
```
**GitHub URL**: https://github.com/wutong8023/VersiCode
# Contributors
[Tongtong Wu](https://scholar.google.com/citations?hl=zh-CN&user=u1Qp8lUAAAAJ&view_op=list_works&sortby=pubdate), [Weigang Wu](https://scholar.google.com/citations?hl=zh-CN&user=UneIZo8AAAAJ), [Xingyu Wang](https://scholar.google.com/citations?hl=zh-CN&user=wqPJcxcAAAAJ), [Kang Xu](https://scholar.google.com/citations?hl=zh-CN&user=N1UUDi0AAAAJ), [Suyu Ma](https://scholar.google.com/citations?hl=zh-CN&user=NJHR1ukAAAAJ), [Bo Jiang](https://wutong8023.site/VersiCode/), [Ping Yang](https://scholar.google.com/citations?view_op=list_works&hl=en&hl=en&user=hrogvxoAAAAJ), [Zhenchang Xing](https://scholar.google.com/citations?hl=zh-CN&user=0vCxuH4AAAAJ), [Yuan-Fang Li](https://scholar.google.com/citations?hl=zh-CN&user=wufXO1kAAAAJ), [Gholamreza Haffari](https://scholar.google.com/citations?hl=zh-CN&user=Perjx5EAAAAJ)
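Step 4 of the README above extracts the text between `<start>` and `<end>`, and the inference scripts in this change store `model_output` as `str(ans_list)`, a stringified Python list of candidate answers. The sketch below shows one way such a value could be parsed; it is not the actual `clear_ans.py`, which is not part of this diff. The name `extract_answers` is hypothetical, and returning an empty string when the tags are missing is an assumption.

```python
import ast
import re
from typing import List

# Non-greedy match so only the first <start>...<end> block is captured.
_BLOCK_RE = re.compile(r'<start>(.*?)<end>', re.DOTALL)


def extract_answers(model_output: str) -> List[str]:
    """Parse a stringified list of candidates and pull out the tagged code."""
    candidates = ast.literal_eval(model_output)
    answers = []
    for candidate in candidates:
        # A candidate may be None if the API returned no content.
        match = _BLOCK_RE.search(candidate or '')
        answers.append(match.group(1).strip() if match else '')
    return answers
```

`ast.literal_eval` is used rather than `eval` because the stored value is model-generated text.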


@@ -0,0 +1,134 @@
"""
GPT performs line level generation prediction and truncates overly long tokens
"""
import json
import os
import tiktoken
from openai import OpenAI
max_tokens = 127000 # gpt-3.5 has a 16k-token context; gpt-4o has 128k
model_name = ''
os.environ['OPENAI_API_KEY'] = ''
client = OpenAI()
def truncate_text(text, max_tokens):
encoding = tiktoken.get_encoding('cl100k_base')
disallowed_special = ()
tokens = encoding.encode(text, disallowed_special=disallowed_special)
print(len(tokens))
if len(tokens) > max_tokens:
tokens = tokens[:max_tokens]
truncated_text = encoding.decode(tokens)
return truncated_text
def predict(content, model_name):
response = client.chat.completions.create(
model=model_name,
messages=[{'role': 'user', 'content': content}],
frequency_penalty=0.1,
max_tokens=128,
logit_bias=None,
logprobs=None,
n=6,
presence_penalty=0.0,
seed=None,
stop=None,
stream=False,
temperature=0.8,
top_p=0.95,
)
ans_list = []
choices_list = response.choices
for c in choices_list:
content = c.message.content
ans_list.append(content)
final_ans = str(ans_list)
return final_ans
def bulid_prompt(description, old_version, old_code, new_version) -> str:
"""
build prompt
:param version:
:param description:
:param masked_code:
:param options:
:return:
"""
prompt = f"""
You are now a professional Python programming engineer. I will provide you with a code snippet and a description of its functionality,
including the dependencies and versions used in the code. Then, I will provide the same dependencies but with a specified new version.
Your task is to refactor the code using the methods provided by the specified new version and return the refactored code.
Please note that you only need to return the refactored code and enclose it with <start> and <end>:
###Functionality description of the code
{description}
###Dependency and old version
{old_version}
###Old version code
{old_code}
###Dependency and new version
{new_version}
###Refactored new code
"""
return prompt
json_path = '../data/test_data/VersiCode_migration.json'
with open(json_path, 'r', encoding='utf-8') as fr:
lodict = json.load(fr)
data_dict = lodict
data_list = data_dict
for data in data_list:
if 'model_output' in data:
print(
f'the {data_list.index(data) + 1} has already been predicted, skipping this data!'
)
continue
try:
print(f'Predicting {data_list.index(data) + 1} ')
old_version = data['dependency'] + data['old_version'] # package == x.x.x
new_version = data['dependency'] + data['new_version'] # package == x.x.x
description = data['description'] # functional description
old_code = data['old_code'] # code after masking
instruction = bulid_prompt(description, old_version, old_code, new_version)
truncated_text = truncate_text(instruction, max_tokens)
prediction = predict(truncated_text, model_name)
data['model_output'] = prediction
except Exception as e:
print(f'error{e}')
print('save current data')
save_folder_path = os.path.join(
'../data/result_data/code_migration', model_name
)
if not os.path.exists(save_folder_path):
os.makedirs(save_folder_path)
save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
with open(save_json_path, 'w', encoding='utf-8') as fw:
json.dump(data_dict, fw, indent=4, ensure_ascii=False)
break
save_folder_path = os.path.join('../data/result_data/code_migration', model_name)
if not os.path.exists(save_folder_path):
os.makedirs(save_folder_path)
save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
with open(save_json_path, 'w', encoding='utf-8') as fw:
json.dump(data_dict, fw, indent=4, ensure_ascii=False)
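The save logic in this script appears twice, once in the error path and once at the end. A minimal sketch of a shared helper follows. The name `save_results` and its parameters are hypothetical; `os.makedirs(..., exist_ok=True)` replaces the separate existence check, and `os.path.basename` stands in for `json_path.split('/')[-1]`.

```python
import json
import os


def save_results(data, result_root: str, model_name: str, source_json_path: str) -> str:
    """Write data as JSON under result_root/model_name and return the file path."""
    save_folder_path = os.path.join(result_root, model_name)
    # exist_ok=True makes the call safe when the folder already exists.
    os.makedirs(save_folder_path, exist_ok=True)
    save_json_path = os.path.join(save_folder_path, os.path.basename(source_json_path))
    with open(save_json_path, 'w', encoding='utf-8') as fw:
        json.dump(data, fw, indent=4, ensure_ascii=False)
    return save_json_path
```

Both call sites could then use `save_results(data_dict, '../data/result_data/code_migration', model_name, json_path)`.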


@@ -0,0 +1,141 @@
"""
GPT performs line level generation prediction and truncates overly long tokens
"""
import json
import os
import tiktoken
from openai import OpenAI
max_tokens = 127000 # gpt-3.5 has a 16k-token context; gpt-4o has 128k
model_name = ''
os.environ['OPENAI_API_KEY'] = ''
client = OpenAI()
def truncate_text(text, max_tokens):
encoding = tiktoken.get_encoding('cl100k_base')
disallowed_special = ()
tokens = encoding.encode(text, disallowed_special=disallowed_special)
print(len(tokens))
if len(tokens) > max_tokens:
tokens = tokens[:max_tokens]
truncated_text = encoding.decode(tokens)
return truncated_text
def predict(content, model_name):
response = client.chat.completions.create(
model=model_name,
messages=[{'role': 'user', 'content': content}],
frequency_penalty=0.1,
max_tokens=128,
logit_bias=None,
logprobs=None,
n=6,
presence_penalty=0.0,
seed=None,
stop=None,
stream=False,
temperature=0.8,
top_p=0.95,
)
ans_list = []
choices_list = response.choices
for c in choices_list:
content = c.message.content
ans_list.append(content)
final_ans = str(ans_list)
return final_ans
def bulid_prompt(version, description) -> str:
"""
build prompt
:param version:
:param description:
:param masked_code:
:param options:
:return:
"""
prompt = f"""
You are a professional Python engineer, and I will provide functional descriptions and versions of specified dependency packages.
You need to write code in Python to implement this feature based on the functional description and using the dependency package and version I specified.
Please note that you only need to return the code that implements the function, and do not return any other content.
Please use <start> and <end> to enclose the generated code. Here is an example:
###Function Description
The function of this code is to print the results predicted by calling the model using vllm.
###dependeny and version
vllm==0.3.3
###response:
<start>
for output in outputs:
prompt = output.prompt
generated_text = output.outputs[0].text
print("Prompt,Generated text")
<end>
###Function Description
{description}
###dependeny and version
{version}
###response:
"""
return prompt
json_path = '../data/test_data/VersiCode_block_completion.json'
with open(json_path, 'r', encoding='utf-8') as fr:
lodict = json.load(fr)
data_dict = lodict
data_list = data_dict
for data in data_list:
if 'model_output' in data:
print(
f'the {data_list.index(data) + 1} has already been predicted, skipping this data!'
)
continue
try:
print(f'Predicting {data_list.index(data) + 1} ')
version = data['dependency'] + data['version'] # package == x.x.x
description = data['description'] # func description
instruction = bulid_prompt(version, description)
truncated_text = truncate_text(instruction, max_tokens)
prediction = predict(truncated_text, model_name)
data['model_output'] = prediction
except Exception as e:
print(f'error{e}')
print('save current data')
save_folder_path = os.path.join(
'../data/result_data/block_completion', model_name
)
if not os.path.exists(save_folder_path):
os.makedirs(save_folder_path)
save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
with open(save_json_path, 'w', encoding='utf-8') as fw:
json.dump(data_dict, fw, indent=4, ensure_ascii=False)
break
save_folder_path = os.path.join('../data/result_data/block_completion', model_name)
if not os.path.exists(save_folder_path):
os.makedirs(save_folder_path)
save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
with open(save_json_path, 'w', encoding='utf-8') as fw:
json.dump(data_dict, fw, indent=4, ensure_ascii=False)
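The loop above reports progress with `data_list.index(data)`, which performs a linear search on every iteration and returns the position of the first equal record, so duplicate records would all report the same number. A sketch of the same resume-and-stop behavior using `enumerate` is shown below. The names `predict_pending` and `predict_fn` are hypothetical; stopping at the first failure mirrors the `break` in the script.

```python
from typing import Callable, List


def predict_pending(data_list: List[dict], predict_fn: Callable[[dict], str]) -> int:
    """Fill in 'model_output' for records that lack it; return how many were filled."""
    completed = 0
    for position, data in enumerate(data_list, start=1):
        if 'model_output' in data:
            print(f'Record {position} has already been predicted, skipping.')
            continue
        try:
            print(f'Predicting {position}')
            data['model_output'] = predict_fn(data)
            completed += 1
        except Exception as e:
            # Stop at the first failure so the caller can save partial results.
            print(f'error: {e}')
            break
    return completed
```

In the script, `predict_fn` would wrap the prompt building, truncation, and API call for a single record.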


@@ -0,0 +1,129 @@
"""
block completion
"""
import copy
import gc
import json
import os
import time
from multiprocessing import Process
import tiktoken
import torch
from vllm import LLM, SamplingParams
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
def truncate_text(text, max_tokens):
encoding = tiktoken.get_encoding('cl100k_base')
disallowed_special = ()
tokens = encoding.encode(text, disallowed_special=disallowed_special)
print(len(tokens))
if len(tokens) > max_tokens:
tokens = tokens[:max_tokens]
truncated_text = encoding.decode(tokens)
return truncated_text
model_list = ['/data2/base models/starcoder2-15b', '/data2/base models/CodeGemma-7B']
def run_inference(model_name, origin_data_list):
    temp_data_list = copy.deepcopy(origin_data_list)
    test_list = []
    for data in temp_data_list:
        version = data['dependency'] + data['version']  # package == x.x.x
        description = data['description']  # func description
        instruction = bulid_prompt(version, description)
        test_list.append(instruction)
    sampling_params = SamplingParams(n=6, temperature=0.8, top_p=0.95, max_tokens=64)
    llm = LLM(
        model=model_name,
        tensor_parallel_size=4,
        gpu_memory_utilization=0.9,
        swap_space=20,
    )
    outputs = llm.generate(test_list, sampling_params)
    for output in outputs:
        requests_id = int(output.request_id)
        temp_ans_list = []
        output_list = output.outputs
        for o in output_list:
            text = o.text
            temp_ans_list.append(text)
        temp_data_list[requests_id]['model_output'] = str(temp_ans_list)
    save_folder_path = os.path.join(
        '../data/result_data/block_completion', model_name.split('/')[-1]
    )
    if not os.path.exists(save_folder_path):
        os.makedirs(save_folder_path)
    save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
    with open(save_json_path, 'w', encoding='utf-8') as fw:
        json.dump(temp_data_list, fw, indent=4, ensure_ascii=False)
    gc.collect()
    torch.cuda.empty_cache()
def bulid_prompt(version, description) -> str:
    """
    Build the block-completion prompt.
    :param version: dependency name and version, e.g. package==x.x.x
    :param description: functional description of the code to generate
    :return: the prompt string
    """
    prompt = f"""
You are a professional Python engineer, and I will provide functional descriptions and versions of specified dependency packages.
You need to write code in Python to implement this feature based on the functional description and using the dependency package and version I specified.
Please note that you only need to return the code that implements the function, and do not return any other content.
Please use <start> and <end> to enclose the generated code. Here is an example:
###Function Description
The function of this code is to print the results predicted by calling the model using vllm.
###dependeny and version
vllm==0.3.3
###response:
<start>
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print("Prompt,Generated text")
<end>
###Function Description
{description}
###dependeny and version
{version}
###response:
"""
    return prompt
json_path = '../data/test_data/VersiCode_block_completion.json'
with open(json_path, 'r', encoding='utf-8') as fr:
    lodict = json.load(fr)
origin_data_list = lodict
for model_name in model_list:
    process = Process(target=run_inference, args=(model_name, origin_data_list))
    process.start()
    process.join()
    time.sleep(120)
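
The script above stores each record's completions as `str(temp_ans_list)`, and the prompt asks the model to wrap its code in `<start>` and `<end>`. A minimal sketch of how a downstream step might read those results back is shown here. The helper names `extract_code` and `parse_model_output` are hypothetical and not part of this diff, and falling back to the raw completion when the tags are missing is an assumption about the desired behavior.

```python
import ast
import re
from typing import List

# Hypothetical helpers, not part of the PR.
# Matches the first <start>...<end> pair, across newlines.
_TAG_PATTERN = re.compile(r"<start>(.*?)<end>", re.DOTALL)


def extract_code(completion: str) -> str:
    """Return the text between the first <start>/<end> pair.

    Assumption: if the tags are missing, the stripped raw completion is returned.
    """
    match = _TAG_PATTERN.search(completion)
    return match.group(1).strip() if match else completion.strip()


def parse_model_output(stored: str) -> List[str]:
    """Decode a 'model_output' field that was written as str(list_of_completions)."""
    completions = ast.literal_eval(stored)
    return [extract_code(c) for c in completions]


if __name__ == "__main__":
    sample = str(["<start>\nprint('hi')\n<end>", "no tags here"])
    print(parse_model_output(sample))
```

`ast.literal_eval` is used rather than `eval` because the stored field is a plain list-of-strings literal and should not be executed as code.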


@@ -0,0 +1,122 @@
"""
code migration
"""
import copy
import gc
import json
import os
import time
from multiprocessing import Process
import tiktoken
import torch
from vllm import LLM, SamplingParams
# os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
def truncate_text(text, max_tokens):
    # Tokenize with the cl100k_base encoding; special-token text is encoded as ordinary text.
    encoding = tiktoken.get_encoding('cl100k_base')
    disallowed_special = ()
    tokens = encoding.encode(text, disallowed_special=disallowed_special)
    print(len(tokens))
    # Keep at most max_tokens tokens, then decode back to text.
    if len(tokens) > max_tokens:
        tokens = tokens[:max_tokens]
    truncated_text = encoding.decode(tokens)
    return truncated_text
model_list = ['/data2/base models/starcoder2-15b', '/data2/base models/CodeGemma-7B']
def run_inference(model_name, origin_data_list):
    temp_data_list = copy.deepcopy(origin_data_list)
    test_list = []
    for data in temp_data_list:
        old_version = data['dependency'] + data['old_version']  # package == x.x.x
        new_version = data['dependency'] + data['new_version']  # package == x.x.x
        description = data['description']  # functional description
        old_code = data['old_code']  # code written against the old version
        instruction = bulid_prompt(description, old_version, old_code, new_version)
        test_list.append(instruction)
    sampling_params = SamplingParams(n=6, temperature=0.8, top_p=0.95, max_tokens=512)
    llm = LLM(
        model=model_name,
        tensor_parallel_size=4,
        gpu_memory_utilization=0.6,
        swap_space=40,
    )
    outputs = llm.generate(test_list, sampling_params)
    for output in outputs:
        requests_id = int(output.request_id)
        temp_ans_list = []
        output_list = output.outputs
        for o in output_list:
            text = o.text
            temp_ans_list.append(text)
        temp_data_list[requests_id]['model_output'] = str(temp_ans_list)
    save_folder_path = os.path.join(
        '../data/result_data/code_migration', model_name.split('/')[-1]
    )
    if not os.path.exists(save_folder_path):
        os.makedirs(save_folder_path)
    save_json_path = os.path.join(save_folder_path, json_path.split('/')[-1])
    with open(save_json_path, 'w', encoding='utf-8') as fw:
        json.dump(temp_data_list, fw, indent=4, ensure_ascii=False)
    gc.collect()
    torch.cuda.empty_cache()
def bulid_prompt(description, old_version, old_code, new_version) -> str:
    """
    Build the code-migration prompt.
    :param description: functional description of the code
    :param old_version: dependency name and old version, e.g. package==x.x.x
    :param old_code: code written against the old version
    :param new_version: dependency name and new version, e.g. package==x.x.x
    :return: the prompt string
    """
    prompt = f"""
You are now a professional Python programming engineer. I will provide you with a code snippet and a description of its functionality,
including the dependencies and versions used in the code. Then, I will provide the same dependencies but with a specified new version.
Your task is to refactor the code using the methods provided by the specified new version and return the refactored code.
Please note that you only need to return the refactored code and enclose it with <start> and <end>:
###Functionality description of the code
{description}
###Dependency and old version
{old_version}
###Old version code
{old_code}
###Dependency and new version
{new_version}
###Refactored new code
"""
    return prompt
json_path = '../data/test_data/VersiCode_migration.json'
with open(json_path, 'r', encoding='utf-8') as fr:
    lodict = json.load(fr)
origin_data_list = lodict
for model_name in model_list:
    process = Process(target=run_inference, args=(model_name, origin_data_list))
    process.start()
    process.join()
    time.sleep(120)
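
Both scripts write results back with `temp_data_list[int(output.request_id)]`, which relies on request IDs counting up from 0 for the `LLM` instance. A possible alternative is to pair records and outputs by position. The sketch below assumes that `llm.generate` returns outputs in the same order as the input prompts, which should be confirmed against the installed vLLM version. `FakeCompletion`, `FakeRequestOutput` and `attach_outputs` are hypothetical stand-ins so the example runs without vLLM; they are not part of this diff.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeCompletion:
    """Stand-in for one sampled completion (only .text is used)."""
    text: str


@dataclass
class FakeRequestOutput:
    """Stand-in for a vLLM request output (only .outputs is used)."""
    outputs: List[FakeCompletion]


def attach_outputs(records: List[Dict], outputs: List[FakeRequestOutput]) -> List[Dict]:
    """Store each output's texts on the record at the same position.

    Assumption: outputs[i] corresponds to the prompt built from records[i].
    """
    if len(records) != len(outputs):
        raise ValueError("records and outputs must have the same length")
    for record, output in zip(records, outputs):
        # Same storage format as the scripts above: str() of a list of texts.
        record["model_output"] = str([o.text for o in output.outputs])
    return records


if __name__ == "__main__":
    records = [{"description": "a"}, {"description": "b"}]
    outputs = [
        FakeRequestOutput([FakeCompletion("x"), FakeCompletion("y")]),
        FakeRequestOutput([FakeCompletion("z")]),
    ]
    print(attach_outputs(records, outputs))
```

Pairing by position avoids depending on how request IDs are assigned, at the cost of depending on the output ordering instead, so the length check is kept as a cheap guard.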

Some files were not shown because too many files have changed in this diff.