Compare commits

..

181 Commits

Author SHA1 Message Date
openhands
635a29a47c Revise Scenario 1 design to align with engineer's sandbox spec approach
Major changes to align with existing OpenHands infrastructure:
- Use enhanced sandbox specifications instead of parallel CustomAgentSpec system
- Implement permissions as core M1 requirement (not M4 afterthought)
- Support both pre-built image registration and secure user upload workflows
- Extend existing SandboxSpecService rather than creating separate infrastructure
- Integrate with existing conversation creation API via sandbox_spec_id parameter
- Address V0 security issues with comprehensive validation and approval workflows
- Maintain backward compatibility with existing sandbox system

This approach is more pragmatic and leverages proven OpenHands infrastructure
while addressing the engineer's specific requirements for permissions and security.

Co-authored-by: openhands <openhands@all-hands.dev>
2025-12-02 19:16:00 +00:00
openhands
9f04705b30 Enhance Scenario 2 design doc with detailed V1 architecture explanation
- Add comprehensive explanation of current V1 agent instantiation and execution flow
- Detail how agents are currently created, managed, and executed in agent server containers
- Explain the distributed service model and communication patterns
- Clarify integration points and how dynamic loading would modify existing processes
- Provide better context for understanding the technical changes required

Co-authored-by: openhands <openhands@all-hands.dev>
2025-12-02 19:07:17 +00:00
openhands
d9b6901785 Add design document for Dynamic Custom Agent Package Loading (Scenario 2)
- Comprehensive design for dynamic loading of custom agent packages within existing agent server container
- Supports Git repositories, PyPI packages, and ZIP archives without requiring custom Docker images
- Maintains full compatibility with existing V1 HTTP API and infrastructure
- Includes security validation, error handling, and fallback mechanisms
- Covers 4-milestone implementation plan with detailed technical specifications

Co-authored-by: openhands <openhands@all-hands.dev>
2025-12-01 17:58:44 +00:00
openhands
39c240527c Add design document for Custom Agent Packages with Custom Runtime Images (Scenario 1)
- Comprehensive design for supporting custom agent packages that require custom system dependencies
- Extends V1 architecture to support custom Docker images with specialized tools and runtimes
- Maintains compatibility with existing HTTP API and tooling ecosystem
- Includes detailed technical design, implementation plan, and user interface specifications

Co-authored-by: openhands <openhands@all-hands.dev>
2025-12-01 17:37:35 +00:00
Hiep Le
6c821ab73e fix(frontend): the content of the FinishObservation event is not being rendered correctly. (#11846) 2025-12-01 09:29:18 -05:00
sp.wack
96f13b15e7 Revert "chore(backend): Add better PostHog tracking" (#11749) 2025-12-01 13:58:03 +00:00
Hiep Le
d9731b6850 feat(frontend): show plan content in the planning tab (#11807) 2025-12-01 08:42:44 -05:00
Hiep Le
e7e49c9110 fix(frontend): AppConversationStartTask timezone display in ui (#11847) 2025-12-01 08:13:54 -05:00
Ray Myers
27590497d5 chore: update posthog-js from 1.290.0 to 1.298.1 (#11830)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-12-01 17:03:44 +04:00
adshrc
991f1a242c feat(llm): added Claude Opus 4.5 model and corresponding test (#11841) 2025-12-01 11:09:33 +00:00
Marco Dalalba
6d8cca43a8 fix: add Azure GPT-5 family to stop words unsupported patterns (#11842) 2025-12-01 01:32:34 +01:00
Hiep Le
d62bb81c3b feat(backend): implement API to fetch contents of PLAN.md (#11795) 2025-11-30 13:29:13 +07:00
Hiep Le
156d0686c4 fix(frontend): the content of the BrowserObservation event is not being rendered correctly (#11832) 2025-11-28 23:16:34 +07:00
Hiep Le
d0b1d29379 fix(backend): the SaaS codebase is currently non-functional. (#11834) 2025-11-28 09:12:02 -07:00
Jeffrey Ma
974bcdfd0b SWE-fficiency benchmark implementation (#11716)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: enyst <engel.nyst@gmail.com>
2025-11-27 09:13:15 +01:00
Rohit Malhotra
ed094b6a97 Fix v1_enabled migration failures by making column nullable (#11829)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-26 21:41:03 +00:00
Rohit Malhotra
49624219ed fix(migration): add server_default to v1_enabled column migration (#11828)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-26 20:21:12 +00:00
Rohit Malhotra
9906a1d49a V1: Support v1 conversations in github resolver (#11773)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-26 13:11:05 -05:00
Hiep Le
014884333d fix(frontend): Remove azure devops integration button from cloud settings (#11826) 2025-11-27 00:41:28 +07:00
Hiep Le
865ddaabdf fix(backend): unable to start a new V0 conversation (#11824) 2025-11-26 23:49:52 +07:00
Hiep Le
3219834e35 fix(frontend): resolve issue preventing cost from displaying (V1) (#11798) 2025-11-26 19:39:07 +07:00
Hiep Le
2e295073ae fix(frontend): fileeditorobservationevent rendering issue (#11820) 2025-11-26 18:40:28 +07:00
Hiep Le
5ef45cfec2 refactor(frontend): support TerminalObservation event (#11819) 2025-11-26 17:53:47 +07:00
Tim O'Farrell
d737141efa SDK Fixes (#11813)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-26 10:44:17 +00:00
Hiep Le
b532a5e7fe fix(backend): github token not working for v1 conversations (#11814) 2025-11-26 01:04:45 +07:00
Hiep Le
c58e2157ea feat(frontend): display skill ready for v1 conversations (#11815) 2025-11-25 23:37:54 +07:00
mamoodi
9cc8687271 fix: handle None return from version_info.get('Components') in docker builder (#11816)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-25 15:35:40 +00:00
aoi127
f6e4d00df1 fix: prevent newline accumulation in XML parameter serialization (#11767)
Co-authored-by: Lai Jinyi <laijinyi@tp-link.com.cn>
2025-11-25 11:56:35 +01:00
Engel Nyst
7782f2afe9 Fix links in readme (#11802) 2025-11-24 19:58:55 +01:00
Hiep Le
639de8114f feat(frontend): add blue border to Planning Agent events (#11788) 2025-11-24 21:36:30 +07:00
Hiep Le
b830d1c513 fix(frontend): hide api key field for openhands provider and auto-populate the key (#11791) 2025-11-24 20:44:15 +07:00
Wan Arif
3504ca7752 feat: add Azure DevOps integration support (#11243)
Co-authored-by: Graham Neubig <neubig@gmail.com>
2025-11-22 14:00:24 -05:00
Graham Neubig
1e513ad63f feat: Add configurable stuck/loop detection (#11799)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: chuckbutkus <chuck@all-hands.dev>
2025-11-21 22:27:38 +00:00
chuckbutkus
b9b8d27135 Add config option to check if roles are present (#11414) 2025-11-21 16:56:19 -05:00
mamoodi
da8a4b1179 remove unused workflows (#11793) 2025-11-20 16:21:37 -05:00
Hiep Le
d1d08bc490 feat(frontend): integration of events from execution and planning agents within a single conversation (#11786) 2025-11-20 21:21:46 +07:00
Tim O'Farrell
c82e183066 Fix Docker hostname issues in HTTP requests (#11787)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-20 11:59:58 +00:00
Rohit Malhotra
26e7d8060f fix(migrations): make SETTING_UP_SKILLS enum migration idempotent (#11782)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Tim O'Farrell <tofarr@gmail.com>
2025-11-20 11:21:40 +00:00
Tim O'Farrell
ba883ffeca Feat sandbox skills (#11785) 2025-11-20 10:52:13 +00:00
Rodney A.
77b565ce08 fix(frontend): fix duplicate React Aria IDs by updating @heroui/react to v2.8.5 (#11783) 2025-11-20 11:48:11 +07:00
Hiep Le
151c2895e0 feat(frontend): disable change-agent button until WebSocket connection is ready (#11781) 2025-11-20 01:28:17 +07:00
Tim O'Farrell
9538c7bd89 fix(migrations): add SETTING_UP_SKILLS to appconversationstarttaskstatus enum (#11780)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-19 18:14:24 +00:00
Boxuan Li
790b7c6e39 Add grok-code-fast-1 to FUNCTION_CALLING_PATTERNS (#11775)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-19 08:38:57 -05:00
Daniel Foguelman
4c57a98660 Remove inconsistent parameters in claude sonnet (#11719) 2025-11-19 08:38:19 -05:00
Hiep Le
28af600c16 fix(frontend): display LLM configuration errors to the user (#11776) 2025-11-19 20:15:42 +07:00
Hiep Le
36cf4e161a fix(backend): ensure microagents are loaded for V1 conversations (#11772)
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
2025-11-19 18:54:08 +07:00
Engel Nyst
bede37fdb6 feat: Enable native tool calling for gemini-3-pro-preview (#11774) 2025-11-18 23:29:54 +01:00
Rohit Malhotra
1a33606987 Chore: move CLI code its own repo (#11724) 2025-11-18 19:59:12 +00:00
Robert Brennan
494eba094f Update fundraising amount in COMMUNITY.md (#11771) 2025-11-18 17:31:34 +00:00
Tim O'Farrell
84c62c4f23 Bumped Software Agent SDK and fixed V1 Delete (#11768) 2025-11-18 15:52:23 +00:00
Hiep Le
f5611c2188 fix(frontend): terminal output not appearing in v1 (#11769) 2025-11-18 22:03:28 +07:00
Robert Brennan
492c12693d Update README and COMMUNITY.md for v1 (#11747)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-18 09:37:30 -05:00
Graham Neubig
5345716340 Fix the favicon (#11766) 2025-11-18 07:30:46 -05:00
Hiep Le
b43f7439a7 feat(backend): enable deletion of sub-conversations when removing a parent conversation (#11757) 2025-11-18 17:53:04 +07:00
Tim O'Farrell
192a8e6de4 Fix for docker regression (#11759) 2025-11-17 18:18:40 +00:00
Hiep Le
cd87987037 feat(frontend): add functionality to fetch sub-conversation data (#11758) 2025-11-18 00:49:54 +07:00
Graham Neubig
0dbf09f954 Update OpenHands logos with new branding (#11741)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-17 12:47:36 -05:00
Tim O'Farrell
871cc932d7 APP-155 Made all version tags the same color to reduce confusion (#11753) 2025-11-17 16:05:27 +00:00
மனோஜ்குமார் பழனிச்சாமி
60c4d9a23f Add Groq models to function calling patterns (#11745) 2025-11-17 09:19:39 -05:00
Tim O'Farrell
6c121bde74 APP-159 Fix Docker container networking for agent server URLs (#11751)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-17 06:09:21 -07:00
sp.wack
6dcf27dbc0 feat(frontend): move PostHog trackers to the frontend (#11748) 2025-11-17 14:55:29 +04:00
Tim O'Farrell
1f6ef8175b Enhance Docker image pull logging with periodic progress updates (#11750)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-17 03:15:21 -07:00
Hiep Le
d6fab190bf feat(frontend): integrate with the API to create a sub-conversation for the planning agent (#11730) 2025-11-15 09:43:21 +07:00
Hiep Le
833aae1833 feat(backend): exclude sub-conversations when searching for conversations (#11733) 2025-11-15 00:21:27 +07:00
Tim O'Farrell
2841e35f24 Do not get live status updates when they are not required (#11727)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-14 07:55:43 -07:00
Tim O'Farrell
8115d82f96 feat: add created_at__gte filter to search_app_conversation_start_tasks (#11740)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-14 07:08:34 -07:00
Hiep Le
7263657937 feat(backend): include sub-conversation ids when fetching conversation details (#11734) 2025-11-14 11:34:30 +07:00
jpelletier1
34fcc50350 Update to include llms.txt (#11737) 2025-11-13 21:42:50 +00:00
jpelletier1
24a9758434 Adding an Agent Builder Skill/Microagent (#11720) 2025-11-13 16:10:00 -05:00
Tim O'Farrell
f24d2a61e6 Fix for wrong column name (#11735) 2025-11-13 17:55:23 +00:00
Hiep Le
e3d0380c2e feat(frontend): add support for the shift + tab shortcut to cycle through conversation modes (#11731) 2025-11-14 00:10:25 +07:00
Hiep Le
8c3f93ddc4 feat(frontend): set descriptive text for all options in the change agent button (#11732) 2025-11-14 00:10:15 +07:00
Hiep Le
bc86796a67 feat(backend): enable sub-conversation creation using a different agent (#11715) 2025-11-13 23:06:44 +07:00
sp.wack
d5b2d2ebc5 fix(frontend): Sync client PostHog opt-in status with server setting (#11728) 2025-11-13 13:22:05 +00:00
Rohit Malhotra
b605c96796 Hotfix: rm max condenser size override (#11713) 2025-11-12 20:13:16 -05:00
sp.wack
8192184d3e chore(backend): Add better PostHog tracking (#11655)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-12 16:47:21 +00:00
Hiep Le
8e75f25108 feat(frontend): implement new task tracker interface (#11692) 2025-11-12 22:59:45 +07:00
Neha Prasad
73fe865c7e feat: queue chat messages during runtime connection (#11687)
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2025-11-12 13:20:09 +00:00
Rohit Malhotra
95a44f4248 CLI release 1.0.7 (#11712) 2025-11-11 16:46:30 -05:00
Rohit Malhotra
0a6b76ca2d CLI: bump agent-sdk (#11710)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-11 20:29:18 +00:00
Tim O'Farrell
8b6521de62 Fix for issue where conversation does not start (#11695) 2025-11-11 20:23:18 +00:00
mamoodi
11636edf15 Release 0.62.0 (#11706) 2025-11-11 14:57:13 -05:00
Hiep Le
915c180ba7 feat(frontend): disable change agent button while agent is running (#11691) 2025-11-12 00:46:12 +07:00
sp.wack
cdd8aace86 refactor(frontend): migrate from direct posthog imports to usePostHog hook (#11703) 2025-11-11 15:48:56 +00:00
Hiep Le
a2c312d108 feat(frontend): add plan preview component (#11676) 2025-11-11 21:59:23 +07:00
sp.wack
5ad3572810 chore(frontend): Remove user_activated PostHog capture event (#11704) 2025-11-11 14:35:04 +00:00
John Eismeier
967e9e1891 Propose fix some typos and ignore emacs backup files (#11701)
Signed-off-by: John E <jeis4wpi@outlook.com>
2025-11-11 09:20:42 -05:00
sp.wack
f8a41d3ffe fix(frontend): Properly reflect default user analytics setting (#11702) 2025-11-11 18:19:37 +04:00
John-Mason P. Shackelford
6e9e7547e5 Add Documentation link to profile context menu (#11583)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-11 09:16:32 -05:00
Hiep Le
9b4f1c365b feat(frontend): add change agent button (#11675) 2025-11-11 20:28:48 +07:00
Engel Nyst
f4dcc136d0 tests: remove Windows-only tests and clean up Windows conditionals (#11697) 2025-11-10 21:34:55 +01:00
Rohit Malhotra
36a8cbbfe4 Add GitHub CI workflow to check package versions (#11637)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-10 19:39:49 +00:00
Engel Nyst
83a3c2c5bf Add invisible AI-only guidance to Checklist: humans must fill (#11688) 2025-11-10 18:13:18 +00:00
Engel Nyst
63c9e6403f ci: remove flaky Windows Python tests workflow (#11694) 2025-11-10 12:43:48 -05:00
Hiep Le
bff734070c feat(frontend): update data-placeholder when switching to plan mode (#11674) 2025-11-10 21:30:29 +04:00
mamoodi
5db6bffaf6 Add some notes to the README for things that are not officially suppo… (#11663) 2025-11-10 20:16:41 +04:00
Engel Nyst
14807ed273 ci: remove outdated integration runner (#11653) 2025-11-10 15:51:40 +01:00
Rohit Malhotra
e0d26c1f4e CLI: custom visualizer (#11677) 2025-11-07 19:45:01 +00:00
Rohit Malhotra
27c8c330f4 CLI release 1.0.6 (#11672) 2025-11-07 14:10:04 -05:00
sp.wack
0c927b19d2 fix(frontend): agent loading condition update logic (#11673) 2025-11-07 18:04:27 +00:00
Hiep Le
a660321d55 feat(frontend): display plan content within the planner tab (#11658) 2025-11-08 00:54:15 +07:00
Tim O'Farrell
0e94833d5b Now removing V1 sandboxes in the V0 endpoint (#11671) 2025-11-07 10:51:46 -07:00
Engel Nyst
b83e2877ec CLI: align with agent-sdk renames (#11643)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: rohitvinodmalhotra@gmail.com <rohitvinodmalhotra@gmail.com>
2025-11-07 11:30:37 -05:00
sp.wack
7acee16de5 fix(frontend): Consider start task job error status for loading indicators (#11670) 2025-11-07 19:24:29 +04:00
sp.wack
1e3f1de773 fix(frontend): Add translations for error status' (#11669) 2025-11-07 13:51:58 +00:00
sp.wack
bfe60d3bbf chore(frontend): Disable /feedback/conversation/{conversationId}/batch for V1 conversations (#11668) 2025-11-07 13:50:09 +00:00
sp.wack
ad75cd05d8 chore(frontend): Add better PostHog tracking (#11645) 2025-11-07 16:35:54 +04:00
Hiep Le
955f87561b feat(frontend): enable pinning and unpinning of conversation tabs (#11659) 2025-11-07 13:38:30 +07:00
Hiep Le
1e5bff82f2 feat(frontend): visually highlight chat input container in plan mode (#11647) 2025-11-07 13:14:28 +07:00
Tim O'Farrell
ddf58da995 Fix V1 callbacks (#11654)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-06 16:05:58 -07:00
Hiep Le
b678d548c2 feat(frontend): create new planner tab in the interface (#11646) 2025-11-06 23:56:35 +07:00
Hiep Le
a1d4d62f68 feat(frontend): show server status menu when hovering over the status indicator (#11635) 2025-11-06 16:23:08 +04:00
Yakshith
75e54e3552 fix(llm): remove default reasoning_effort; fix Gemini special case (#11567) 2025-11-05 23:30:46 +01:00
Yuxiao Cheng
6b211f3b29 Fix stuck after incorrect TaskTrackingAction (#11436)
Co-authored-by: jarrycyx <dzdzzd@126.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2025-11-05 22:09:51 +00:00
mamoodi
e208b64a95 Update free credits statement to $10 (#11651) 2025-11-05 20:57:56 +00:00
mamoodi
555444f239 Release 0.61.0 (#11618)
Co-authored-by: rohitvinodmalhotra@gmail.com <rohitvinodmalhotra@gmail.com>
2025-11-05 15:11:22 -05:00
Tim O'Farrell
d99c7827d8 More updates of agent_status to execution_status (#11642)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-05 19:19:34 +00:00
mamoodi
5a8f08b4ef Remove obsolete workflow (#11650) 2025-11-05 19:56:34 +01:00
Hiep Le
44fbd6c1b9 refactor(backend): the delete_app_conversation_info function (#11648) 2025-11-05 23:45:16 +07:00
sp.wack
7e824ca5dc fix(frontend): V1 Loading UI (#11630) 2025-11-05 14:23:10 +00:00
sp.wack
9a7002d817 fix(frontend): V1 resume conversation / agent (#11627) 2025-11-05 14:16:46 +00:00
Hiep Le
6411d4df94 feat(frontend): display text label when items are selected across all canvas views (#11636) 2025-11-05 16:47:22 +07:00
eddierichter-amd
c544ea1187 localhost base_url fixup when running in a docker container (#11474)
Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>
2025-11-04 17:57:25 -05:00
Graham Neubig
308d0e62ab Change error logging to info for missing config files (#11639) 2025-11-04 21:27:13 +01:00
Ray Myers
9abd1714b9 fix - Speed up runtime tests (#11570)
Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-04 11:17:55 -06:00
sp.wack
f1abe6c6af fix(ci): Lint Python (#11634) 2025-11-04 16:24:24 +00:00
Tim O'Farrell
30b5ad1768 Fix for issue where conversations won't start (#11633) 2025-11-04 08:51:22 -07:00
Hiep Le
4ea3e4b1fd refactor(frontend): break down conversation service into smaller services (#11594) 2025-11-04 20:52:44 +07:00
Hiep Le
7049a3e918 chore(frontend): add feature flag for planning agent (#11616) 2025-11-04 20:32:45 +07:00
Hiep Le
fa431fb956 refactor(backend): update get_microagent_management_conversations API to support V1 (#11313)
Co-authored-by: Tim O'Farrell <tofarr@gmail.com>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-11-04 17:44:44 +07:00
Tim O'Farrell
2fc8ab2601 Bumped Software Agent SDK (#11626) 2025-11-03 14:53:12 -07:00
mamoodi
8e119c68ab Create CNAME 2025-11-03 15:43:34 -05:00
Hiep Le
8893f9364d refactor: update delete_app_conversation to accept ID instead of object (#11486)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Tim O'Farrell <tofarr@gmail.com>
2025-11-03 13:26:33 -07:00
Tim O'Farrell
727520f6ce V1 CORS Fix (#11586)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-03 12:14:02 -07:00
Tim O'Farrell
898c3501dd Update initial from $20 to $10 (#11624) 2025-11-03 19:11:18 +00:00
Jessica Kerr
4c81965c61 build(devcontainer): add uvx installation (#11610) 2025-11-03 19:37:54 +01:00
Hiep Le
0f054c740c fix(frontend): the width of the branch dropdown appears inconsistent on medium-sized screens. (#11620) 2025-11-04 01:30:11 +07:00
Yuxiao Cheng
9bcf80dba5 Adding error logging when config file is not found. (#11419)
Co-authored-by: jarrycyx <dzdzzd@126.com>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
2025-11-03 13:19:48 -05:00
மனோஜ்குமார் பழனிச்சாமி
2a98cd9338 Fix import order for Windows PowerShell support (#11557) 2025-11-03 13:14:23 -05:00
Rohit Malhotra
b31dbfc21a CLI: make sure MCP server doesn't persist even after removal (#11602)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-03 12:45:47 -05:00
Tim O'Farrell
5d711d5576 Exclude V1 conversations from V0 (#11595) 2025-11-03 09:57:34 -07:00
Rohit Malhotra
3eb73de924 CLI: lazy load conversation for /new command (#11601)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-03 16:30:08 +00:00
Rohit Malhotra
2e49f07451 CLI: Rm loading context (#11603)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-11-03 16:15:47 +00:00
Hiep Le
e51685dab4 fix(frontend): there is insufficient padding below the code block. (#11615) 2025-11-03 21:34:01 +07:00
Aphix
b85cc0c716 fix: Autodetect pwsh.exe & DLL path (Win/non-WSL) (#11044) 2025-11-03 08:27:30 -05:00
Hiep Le
7ef1720b5d fix(frontend): correct handling of OBSERVATION_MESSAGE messages for task events (#11613) 2025-11-03 18:57:11 +07:00
Hiep Le
a6385b4059 fix(frontend): agent status shows “Disconnected” when starting a new conversation until sandbox initializes (#11612) 2025-11-03 18:56:52 +07:00
sp.wack
7cfe667a3f fix(frontend): V1 event rendering to display thought + action, then thought + observation (#11596) 2025-11-03 14:07:35 +04:00
Engel Nyst
6e8be827b8 Fix deprecated links (#11605) 2025-11-01 12:37:32 -04:00
Tim O'Farrell
2ccc611e7c Regenerated poetry lock to update dependencies (#11593) 2025-10-31 20:25:01 +00:00
Rohit Malhotra
1f7dec4d94 CLI: patch release 1.0.5 (#11598) 2025-10-31 19:57:39 +00:00
sp.wack
966e4ae990 APP-125: Reset V1 terminal state when switching conversations by forcing remount (#11592)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-10-31 18:41:19 +00:00
Rohit Malhotra
231019974c CLI: fix binary build (#11591) 2025-10-31 18:01:29 +00:00
Rohit Malhotra
d246ab1a21 Hotfix(CLI): make settings page available even when conversation hasn't been created (#11588)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-10-31 17:19:53 +00:00
jpelletier1
15c207c401 Disables Copilot icon by default (#11589) 2025-10-31 17:06:15 +00:00
Rohit Malhotra
cf21cfed6c Hotfix(CLI): make sure to update condenser credentials (#11587)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-10-31 16:37:59 +00:00
Rohit Malhotra
12d57df6ac CLI Patch release 1.0.4 (#11585) 2025-10-31 14:59:39 +00:00
Rohit Malhotra
3239eb4027 Hotfix(CLI): Update README to use V1 CLI for serve command and point to new docker image artifacts (#11584) 2025-10-31 09:34:19 -04:00
Rohit Malhotra
9be673d553 CLI: Create conversation last minute (#11576)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-10-30 23:04:41 +00:00
Tim O'Farrell
7272eae758 Fix remote sandbox permissions (#11582) 2025-10-30 22:13:02 +00:00
mamoodi
ec670cd130 Rename LLM API Key to OpenHands LLM Key in settings (#11577)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-10-30 16:52:31 -04:00
Hiep Le
31702bf46b fix(frontend): delays in updating conversation titles before they are reflected in the user interface. (#11558)
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2025-10-30 18:06:18 +00:00
Tim O'Farrell
5894d2675e V1 IDs without hyphens (#11564) 2025-10-30 16:33:16 +00:00
Hiep Le
59a992c0fb feat(frontend): allow all users to access the LLM page and disable Pro subscription functionality (#11573) 2025-10-30 22:01:30 +07:00
Rohit Malhotra
1939bd0fda CLI Release 1.0.3 (#11574) 2025-10-30 14:39:42 +00:00
Ray Myers
58e690ef75 Fix flaky test_condenser_metrics_included by creating new action objects (#11555)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-10-30 09:20:06 -05:00
Rohit Malhotra
97403dfbdb CLI: rename deprecated args (#11568) 2025-10-30 09:20:27 -04:00
sp.wack
2fc31e96d0 chore(frontend): Add V1 git service API with unified hooks for git changes and diffs (#11565) 2025-10-30 13:03:25 +00:00
Rohit Malhotra
6558b4f97d CLI: bump agent-sdk version (#11566)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-10-30 03:38:36 +00:00
Kevin Musgrave
12d6da8130 feat(evaluation): Filter task ids by difficulty for SWE Gym rollouts (#11490)
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-10-30 02:30:19 +00:00
mamoodi
38f2728cfa Release 0.60.0 (#11544)
Co-authored-by: rohitvinodmalhotra@gmail.com <rohitvinodmalhotra@gmail.com>
2025-10-29 16:17:46 -04:00
sp.wack
fab48fe864 chore(frontend): Remove Jupyter tab and features (#11563) 2025-10-29 17:57:48 +00:00
sp.wack
a196881ab0 chore(frontend): Make terminal read-only by removing user input handlers (#11546) 2025-10-29 21:30:10 +04:00
Rohit Malhotra
ca2c9546ad CLI: add unit test for default agent (#11562)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-10-29 13:11:06 -04:00
sp.wack
704fc6dd69 chore(frontend): Add history loading state for V1 conversations (#11536) 2025-10-29 16:11:25 +00:00
Hiep Le
6630d5dc4e fix(frontend): display error content when FileEditorAction encounters an error (#11560) 2025-10-29 20:03:25 +04:00
Hiep Le
0e7fefca7e fix(frontend): displaying observation result statuses (#11559) 2025-10-29 20:02:32 +04:00
sp.wack
4020448d64 chore(frontend): Add unified hooks for V1 sandbox URLs (VSCode and served hosts) (#11511)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-10-29 14:52:31 +00:00
Hiep Le
2fdd4d084a feat(frontend): display “waiting for user confirmation” when agent status is “awaiting_user_confirmation” (#11539) 2025-10-29 17:31:05 +04:00
Hiep Le
aba5d54a86 feat(frontend): V1 confirmation's call the right API (#11542) 2025-10-29 17:29:27 +04:00
sp.wack
6710a39621 hotfix(frontend): add unified conversation config hook with V1 support (#11547) 2025-10-29 17:26:37 +04:00
502 changed files with 24944 additions and 12678 deletions

1
.devcontainer/README.md Normal file
View File

@@ -0,0 +1 @@
This way of running OpenHands is not officially supported. It is maintained by the community.

View File

@@ -7,5 +7,8 @@ git config --global --add safe.directory "$(realpath .)"
# Install `nc`
sudo apt update && sudo apt install netcat -y
# Install `uv` and `uvx`
wget -qO- https://astral.sh/uv/install.sh | sh
# Do common setup tasks
source .openhands/setup.sh

View File

@@ -13,6 +13,7 @@
- [ ] Other (dependency update, docs, typo fixes, etc.)
## Checklist
<!-- AI/LLM AGENTS: This checklist is for a human author to complete. Do NOT check either of the two boxes below. Leave them unchecked until a human has personally reviewed and tested the changes. -->
- [ ] I have read and reviewed the code and I understand what the code is doing.
- [ ] I have tested the code to the best of my ability and ensured it works as expected.

View File

@@ -1,73 +0,0 @@
#!/usr/bin/env python3
import os
import re
import sys
def find_version_references(directory: str) -> tuple[set[str], set[str]]:
openhands_versions = set()
runtime_versions = set()
version_pattern_openhands = re.compile(r'openhands:(\d{1})\.(\d{2})')
version_pattern_runtime = re.compile(r'runtime:(\d{1})\.(\d{2})')
for root, _, files in os.walk(directory):
# Skip .git directory and docs/build directory
if '.git' in root or 'docs/build' in root:
continue
for file in files:
if file.endswith(
('.md', '.yml', '.yaml', '.txt', '.html', '.py', '.js', '.ts')
):
file_path = os.path.join(root, file)
try:
with open(file_path, 'r', encoding='utf-8') as f:
content = f.read()
# Find all openhands version references
matches = version_pattern_openhands.findall(content)
if matches:
print(f'Found openhands version {matches} in {file_path}')
openhands_versions.update(matches)
# Find all runtime version references
matches = version_pattern_runtime.findall(content)
if matches:
print(f'Found runtime version {matches} in {file_path}')
runtime_versions.update(matches)
except Exception as e:
print(f'Error reading {file_path}: {e}', file=sys.stderr)
return openhands_versions, runtime_versions
def main():
repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..'))
print(f'Checking version consistency in {repo_root}')
openhands_versions, runtime_versions = find_version_references(repo_root)
print(f'Found openhands versions: {sorted(openhands_versions)}')
print(f'Found runtime versions: {sorted(runtime_versions)}')
exit_code = 0
if len(openhands_versions) > 1:
print('Error: Multiple openhands versions found:', file=sys.stderr)
print('Found versions:', sorted(openhands_versions), file=sys.stderr)
exit_code = 1
elif len(openhands_versions) == 0:
print('Warning: No openhands version references found', file=sys.stderr)
if len(runtime_versions) > 1:
print('Error: Multiple runtime versions found:', file=sys.stderr)
print('Found versions:', sorted(runtime_versions), file=sys.stderr)
exit_code = 1
elif len(runtime_versions) == 0:
print('Warning: No runtime version references found', file=sys.stderr)
sys.exit(exit_code)
if __name__ == '__main__':
main()

View File

@@ -13,12 +13,9 @@ DOCKER_RUN_COMMAND="docker run -it --rm \
-p 3000:3000 \
-v /var/run/docker.sock:/var/run/docker.sock \
--add-host host.docker.internal:host-gateway \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/openhands/runtime:${SHORT_SHA}-nikolaik \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.openhands.dev/openhands/runtime:${SHORT_SHA}-nikolaik \
--name openhands-app-${SHORT_SHA} \
docker.all-hands.dev/openhands/openhands:${SHORT_SHA}"
# Define the uvx command
UVX_RUN_COMMAND="uvx --python 3.12 --from git+https://github.com/OpenHands/OpenHands@${BRANCH_NAME}#subdirectory=openhands-cli openhands"
docker.openhands.dev/openhands/openhands:${SHORT_SHA}"
# Get the current PR body
PR_BODY=$(gh pr view "$PR_NUMBER" --json body --jq .body)
@@ -37,11 +34,6 @@ GUI with Docker:
\`\`\`
${DOCKER_RUN_COMMAND}
\`\`\`
CLI with uvx:
\`\`\`
${UVX_RUN_COMMAND}
\`\`\`
EOF
)
else
@@ -57,11 +49,6 @@ GUI with Docker:
\`\`\`
${DOCKER_RUN_COMMAND}
\`\`\`
CLI with uvx:
\`\`\`
${UVX_RUN_COMMAND}
\`\`\`
EOF
)
fi

View File

@@ -0,0 +1,65 @@
name: Check Package Versions
on:
push:
branches: [main]
pull_request:
workflow_dispatch:
jobs:
check-package-versions:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Check for any 'rev' fields in pyproject.toml
run: |
python - <<'PY'
import sys, tomllib, pathlib
path = pathlib.Path("pyproject.toml")
if not path.exists():
print("❌ ERROR: pyproject.toml not found")
sys.exit(1)
try:
data = tomllib.loads(path.read_text(encoding="utf-8"))
except Exception as e:
print(f"❌ ERROR: Failed to parse pyproject.toml: {e}")
sys.exit(1)
poetry = data.get("tool", {}).get("poetry", {})
sections = {
"dependencies": poetry.get("dependencies", {}),
}
errors = []
print("🔍 Checking for any dependencies with 'rev' fields...\n")
for section_name, deps in sections.items():
if not isinstance(deps, dict):
continue
for pkg_name, cfg in deps.items():
if isinstance(cfg, dict) and "rev" in cfg:
msg = f" ✖ {pkg_name} in [{section_name}] uses rev='{cfg['rev']}' (NOT ALLOWED)"
print(msg)
errors.append(msg)
else:
print(f" • {pkg_name}: OK")
if errors:
print("\n❌ FAILED: Found dependencies using 'rev' fields:\n" + "\n".join(errors))
print("\nPlease use versioned releases instead, e.g.:")
print(' my-package = "1.0.0"')
sys.exit(1)
print("\n✅ SUCCESS: No 'rev' fields found. All dependencies are using proper versioned releases.")
PY

View File

@@ -1,69 +0,0 @@
# Workflow that cleans up outdated and old workflows to prevent out of disk issues
name: Delete old workflow runs
# This workflow is currently only triggered manually
on:
workflow_dispatch:
inputs:
days:
description: 'Days-worth of runs to keep for each workflow'
required: true
default: '30'
minimum_runs:
description: 'Minimum runs to keep for each workflow'
required: true
default: '10'
delete_workflow_pattern:
description: 'Name or filename of the workflow (if not set, all workflows are targeted)'
required: false
delete_workflow_by_state_pattern:
description: 'Filter workflows by state: active, deleted, disabled_fork, disabled_inactivity, disabled_manually'
required: true
default: "ALL"
type: choice
options:
- "ALL"
- active
- deleted
- disabled_inactivity
- disabled_manually
delete_run_by_conclusion_pattern:
description: 'Remove runs based on conclusion: action_required, cancelled, failure, skipped, success'
required: true
default: 'ALL'
type: choice
options:
- 'ALL'
- 'Unsuccessful: action_required,cancelled,failure,skipped'
- action_required
- cancelled
- failure
- skipped
- success
dry_run:
description: 'Logs simulated changes, no deletions are performed'
required: false
jobs:
del_runs:
runs-on: blacksmith-4vcpu-ubuntu-2204
permissions:
actions: write
contents: read
steps:
- name: Delete workflow runs
uses: Mattraks/delete-workflow-runs@v2
with:
token: ${{ github.token }}
repository: ${{ github.repository }}
retain_days: ${{ github.event.inputs.days }}
keep_minimum_runs: ${{ github.event.inputs.minimum_runs }}
delete_workflow_pattern: ${{ github.event.inputs.delete_workflow_pattern }}
delete_workflow_by_state_pattern: ${{ github.event.inputs.delete_workflow_by_state_pattern }}
delete_run_by_conclusion_pattern: >-
${{
startsWith(github.event.inputs.delete_run_by_conclusion_pattern, 'Unsuccessful:')
&& 'action_required,cancelled,failure,skipped'
|| github.event.inputs.delete_run_by_conclusion_pattern
}}
dry_run: ${{ github.event.inputs.dry_run }}

View File

@@ -1,114 +0,0 @@
# Workflow that builds and tests the CLI binary executable
name: CLI - Build binary and optionally release
# Run on pushes to main branch and CLI tags, and on pull requests when CLI files change
on:
push:
branches:
- main
tags:
- "*-cli"
pull_request:
paths:
- "openhands-cli/**"
permissions:
contents: write # needed to create releases or upload assets
# Cancel previous runs if a new commit is pushed
concurrency:
group: ${{ github.workflow }}-${{ (github.head_ref && github.ref) || github.run_id }}
cancel-in-progress: true
jobs:
build-binary:
name: Build binary executable
strategy:
matrix:
include:
# Build on Ubuntu 22.04 for maximum GLIBC compatibility (GLIBC 2.31)
- os: ubuntu-22.04
platform: linux
artifact_name: openhands-cli-linux
# Build on macOS for macOS users
- os: macos-15
platform: macos
artifact_name: openhands-cli-macos
runs-on: ${{ matrix.os }}
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: 3.12
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
version: "latest"
- name: Install dependencies
working-directory: openhands-cli
run: |
uv sync
- name: Build binary executable
working-directory: openhands-cli
run: |
./build.sh --install-pyinstaller | tee output.log
echo "Full output:"
cat output.log
if grep -q "❌" output.log; then
echo "❌ Found failure marker in output"
exit 1
fi
echo "✅ Build & test finished without ❌ markers"
- name: Upload binary artifact
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.artifact_name }}
path: openhands-cli/dist/openhands*
retention-days: 30
create-github-release:
name: Create GitHub Release
runs-on: ubuntu-latest
needs: build-binary
if: startsWith(github.ref, 'refs/tags/')
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Download all artifacts
uses: actions/download-artifact@v4
with:
path: artifacts
- name: Prepare release assets
run: |
mkdir -p release-assets
# Copy binaries with appropriate names for release
if [ -f artifacts/openhands-cli-linux/openhands ]; then
cp artifacts/openhands-cli-linux/openhands release-assets/openhands-linux
fi
if [ -f artifacts/openhands-cli-macos/openhands ]; then
cp artifacts/openhands-cli-macos/openhands release-assets/openhands-macos
fi
ls -la release-assets/
- name: Create GitHub Release
uses: softprops/action-gh-release@v2
with:
files: release-assets/*
draft: true
prerelease: false
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -1,23 +0,0 @@
name: Dispatch to docs repo
on:
push:
branches: [main]
paths:
- 'docs/**'
workflow_dispatch:
jobs:
dispatch:
runs-on: ubuntu-latest
strategy:
matrix:
repo: ["OpenHands/docs"]
steps:
- name: Push to docs repo
uses: peter-evans/repository-dispatch@v3
with:
token: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
repository: ${{ matrix.repo }}
event-type: update
client-payload: '{"ref": "${{ github.ref }}", "sha": "${{ github.sha }}", "module": "openhands", "branch": "main"}'

View File

@@ -86,7 +86,7 @@ jobs:
# Builds the runtime Docker images
ghcr_build_runtime:
name: Build Image
name: Build Runtime Image
runs-on: blacksmith-8vcpu-ubuntu-2204
if: "!(github.event_name == 'push' && startsWith(github.ref, 'refs/tags/ext-v'))"
permissions:
@@ -256,7 +256,7 @@ jobs:
test_runtime_root:
name: RT Unit Tests (Root)
needs: [ghcr_build_runtime, define-matrix]
runs-on: blacksmith-8vcpu-ubuntu-2204
runs-on: blacksmith-4vcpu-ubuntu-2404
strategy:
fail-fast: false
matrix:
@@ -298,7 +298,7 @@ jobs:
# We install pytest-xdist in order to run tests across CPUs
poetry run pip install pytest-xdist
# Install to be able to retry on failures for flaky tests
# Install to be able to retry on failures for flakey tests
poetry run pip install pytest-rerunfailures
image_name=ghcr.io/${{ env.REPO_OWNER }}/runtime:${{ env.RELEVANT_SHA }}-${{ matrix.base_image.tag }}
@@ -311,14 +311,14 @@ jobs:
SANDBOX_RUNTIME_CONTAINER_IMAGE=$image_name \
TEST_IN_CI=true \
RUN_AS_OPENHANDS=false \
poetry run pytest -n 0 -raRs --reruns 2 --reruns-delay 5 -s ./tests/runtime --ignore=tests/runtime/test_browsergym_envs.py --durations=10
poetry run pytest -n 5 -raRs --reruns 2 --reruns-delay 3 -s ./tests/runtime --ignore=tests/runtime/test_browsergym_envs.py --durations=10
env:
DEBUG: "1"
# Run unit tests with the Docker runtime Docker images as openhands user
test_runtime_oh:
name: RT Unit Tests (openhands)
runs-on: blacksmith-8vcpu-ubuntu-2204
runs-on: blacksmith-4vcpu-ubuntu-2404
needs: [ghcr_build_runtime, define-matrix]
strategy:
matrix:
@@ -370,7 +370,7 @@ jobs:
SANDBOX_RUNTIME_CONTAINER_IMAGE=$image_name \
TEST_IN_CI=true \
RUN_AS_OPENHANDS=true \
poetry run pytest -n 0 -raRs --reruns 2 --reruns-delay 5 -s ./tests/runtime --ignore=tests/runtime/test_browsergym_envs.py --durations=10
poetry run pytest -n 5 -raRs --reruns 2 --reruns-delay 3 -s ./tests/runtime --ignore=tests/runtime/test_browsergym_envs.py --durations=10
env:
DEBUG: "1"

View File

@@ -1,199 +0,0 @@
name: Run Integration Tests
on:
pull_request:
types: [labeled]
workflow_dispatch:
inputs:
reason:
description: 'Reason for manual trigger'
required: true
default: ''
schedule:
- cron: '30 22 * * *' # Runs at 10:30pm UTC every day
env:
N_PROCESSES: 10 # Global configuration for number of parallel processes for evaluation
jobs:
run-integration-tests:
if: github.event.label.name == 'integration-test' || github.event_name == 'workflow_dispatch' || github.event_name == 'schedule'
runs-on: blacksmith-4vcpu-ubuntu-2204
permissions:
contents: "read"
id-token: "write"
pull-requests: "write"
issues: "write"
strategy:
matrix:
python-version: ["3.12"]
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Install poetry via pipx
run: pipx install poetry
- name: Set up Python
uses: useblacksmith/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
cache: "poetry"
- name: Setup Node.js
uses: useblacksmith/setup-node@v5
with:
node-version: '22.x'
- name: Comment on PR if 'integration-test' label is present
if: github.event_name == 'pull_request' && github.event.label.name == 'integration-test'
uses: KeisukeYamashita/create-comment@v1
with:
unique: false
comment: |
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
- name: Install Python dependencies using Poetry
run: poetry install --with dev,test,runtime,evaluation
- name: Configure config.toml for testing with Haiku
env:
LLM_MODEL: "litellm_proxy/claude-3-5-haiku-20241022"
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_BASE_URL: ${{ secrets.LLM_BASE_URL }}
MAX_ITERATIONS: 10
run: |
echo "[llm.eval]" > config.toml
echo "model = \"$LLM_MODEL\"" >> config.toml
echo "api_key = \"$LLM_API_KEY\"" >> config.toml
echo "base_url = \"$LLM_BASE_URL\"" >> config.toml
echo "temperature = 0.0" >> config.toml
- name: Build environment
run: make build
- name: Run integration test evaluation for Haiku
env:
SANDBOX_FORCE_REBUILD_RUNTIME: True
run: |
poetry run ./evaluation/integration_tests/scripts/run_infer.sh llm.eval HEAD CodeActAgent '' 10 $N_PROCESSES '' 'haiku_run'
# get integration tests report
REPORT_FILE_HAIKU=$(find evaluation/evaluation_outputs/outputs/integration_tests/CodeActAgent/*haiku*_maxiter_10_N* -name "report.md" -type f | head -n 1)
echo "REPORT_FILE: $REPORT_FILE_HAIKU"
echo "INTEGRATION_TEST_REPORT_HAIKU<<EOF" >> $GITHUB_ENV
cat $REPORT_FILE_HAIKU >> $GITHUB_ENV
echo >> $GITHUB_ENV
echo "EOF" >> $GITHUB_ENV
- name: Wait a little bit
run: sleep 10
- name: Configure config.toml for testing with DeepSeek
env:
LLM_MODEL: "litellm_proxy/deepseek-chat"
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_BASE_URL: ${{ secrets.LLM_BASE_URL }}
MAX_ITERATIONS: 10
run: |
echo "[llm.eval]" > config.toml
echo "model = \"$LLM_MODEL\"" >> config.toml
echo "api_key = \"$LLM_API_KEY\"" >> config.toml
echo "base_url = \"$LLM_BASE_URL\"" >> config.toml
echo "temperature = 0.0" >> config.toml
- name: Run integration test evaluation for DeepSeek
env:
SANDBOX_FORCE_REBUILD_RUNTIME: True
run: |
poetry run ./evaluation/integration_tests/scripts/run_infer.sh llm.eval HEAD CodeActAgent '' 10 $N_PROCESSES '' 'deepseek_run'
# get integration tests report
REPORT_FILE_DEEPSEEK=$(find evaluation/evaluation_outputs/outputs/integration_tests/CodeActAgent/deepseek*_maxiter_10_N* -name "report.md" -type f | head -n 1)
echo "REPORT_FILE: $REPORT_FILE_DEEPSEEK"
echo "INTEGRATION_TEST_REPORT_DEEPSEEK<<EOF" >> $GITHUB_ENV
cat $REPORT_FILE_DEEPSEEK >> $GITHUB_ENV
echo >> $GITHUB_ENV
echo "EOF" >> $GITHUB_ENV
# -------------------------------------------------------------
# Run VisualBrowsingAgent tests for DeepSeek, limited to t05 and t06
- name: Wait a little bit (again)
run: sleep 5
- name: Configure config.toml for testing VisualBrowsingAgent (DeepSeek)
env:
LLM_MODEL: "litellm_proxy/deepseek-chat"
LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
LLM_BASE_URL: ${{ secrets.LLM_BASE_URL }}
MAX_ITERATIONS: 15
run: |
echo "[llm.eval]" > config.toml
echo "model = \"$LLM_MODEL\"" >> config.toml
echo "api_key = \"$LLM_API_KEY\"" >> config.toml
echo "base_url = \"$LLM_BASE_URL\"" >> config.toml
echo "temperature = 0.0" >> config.toml
- name: Run integration test evaluation for VisualBrowsingAgent (DeepSeek)
env:
SANDBOX_FORCE_REBUILD_RUNTIME: True
run: |
poetry run ./evaluation/integration_tests/scripts/run_infer.sh llm.eval HEAD VisualBrowsingAgent '' 15 $N_PROCESSES "t05_simple_browsing,t06_github_pr_browsing.py" 'visualbrowsing_deepseek_run'
# Find and export the visual browsing agent test results
REPORT_FILE_VISUALBROWSING_DEEPSEEK=$(find evaluation/evaluation_outputs/outputs/integration_tests/VisualBrowsingAgent/deepseek*_maxiter_15_N* -name "report.md" -type f | head -n 1)
echo "REPORT_FILE_VISUALBROWSING_DEEPSEEK: $REPORT_FILE_VISUALBROWSING_DEEPSEEK"
echo "INTEGRATION_TEST_REPORT_VISUALBROWSING_DEEPSEEK<<EOF" >> $GITHUB_ENV
cat $REPORT_FILE_VISUALBROWSING_DEEPSEEK >> $GITHUB_ENV
echo >> $GITHUB_ENV
echo "EOF" >> $GITHUB_ENV
- name: Create archive of evaluation outputs
run: |
TIMESTAMP=$(date +'%y-%m-%d-%H-%M')
cd evaluation/evaluation_outputs/outputs # Change to the outputs directory
tar -czvf ../../../integration_tests_${TIMESTAMP}.tar.gz integration_tests/CodeActAgent/* integration_tests/VisualBrowsingAgent/* # Only include the actual result directories
- name: Upload evaluation results as artifact
uses: actions/upload-artifact@v4
id: upload_results_artifact
with:
name: integration-test-outputs-${{ github.run_id }}-${{ github.run_attempt }}
path: integration_tests_*.tar.gz
- name: Get artifact URLs
run: |
echo "ARTIFACT_URL=${{ steps.upload_results_artifact.outputs.artifact-url }}" >> $GITHUB_ENV
- name: Set timestamp and trigger reason
run: |
echo "TIMESTAMP=$(date +'%Y-%m-%d-%H-%M')" >> $GITHUB_ENV
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
echo "TRIGGER_REASON=pr-${{ github.event.pull_request.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
echo "TRIGGER_REASON=manual-${{ github.event.inputs.reason }}" >> $GITHUB_ENV
else
echo "TRIGGER_REASON=nightly-scheduled" >> $GITHUB_ENV
fi
- name: Comment with results and artifact link
id: create_comment
uses: KeisukeYamashita/create-comment@v1
with:
# if triggered by PR, use PR number, otherwise use 9745 as fallback issue number for manual triggers
number: ${{ github.event_name == 'pull_request' && github.event.pull_request.number || 9745 }}
unique: false
comment: |
Trigger by: ${{ github.event_name == 'pull_request' && format('Pull Request (integration-test label on PR #{0})', github.event.pull_request.number) || (github.event_name == 'workflow_dispatch' && format('Manual Trigger: {0}', github.event.inputs.reason)) || 'Nightly Scheduled Run' }}
Commit: ${{ github.sha }}
**Integration Tests Report (Haiku)**
Haiku LLM Test Results:
${{ env.INTEGRATION_TEST_REPORT_HAIKU }}
---
**Integration Tests Report (DeepSeek)**
DeepSeek LLM Test Results:
${{ env.INTEGRATION_TEST_REPORT_DEEPSEEK }}
---
**Integration Tests Report VisualBrowsing (DeepSeek)**
${{ env.INTEGRATION_TEST_REPORT_VISUALBROWSING_DEEPSEEK }}
---
Download testing outputs (includes both Haiku and DeepSeek results): [Download](${{ steps.upload_results_artifact.outputs.artifact-url }})

View File

@@ -72,34 +72,3 @@ jobs:
- name: Run pre-commit hooks
working-directory: ./enterprise
run: pre-commit run --all-files --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml
lint-cli-python:
name: Lint CLI python
runs-on: blacksmith-4vcpu-ubuntu-2204
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up python
uses: useblacksmith/setup-python@v6
with:
python-version: 3.12
cache: "pip"
- name: Install pre-commit
run: pip install pre-commit==4.2.0
- name: Run pre-commit hooks
working-directory: ./openhands-cli
run: pre-commit run --all-files --config ./dev_config/python/.pre-commit-config.yaml
# Check version consistency across documentation
check-version-consistency:
name: Check version consistency
runs-on: blacksmith-4vcpu-ubuntu-2204
steps:
- uses: actions/checkout@v4
- name: Set up python
uses: useblacksmith/setup-python@v6
with:
python-version: 3.12
- name: Run version consistency check
run: .github/scripts/check_version_consistency.py

View File

@@ -1,70 +0,0 @@
# Workflow that checks MDX format in docs/ folder
name: MDX Lint
# Run on pushes to main and on pull requests that modify docs/ files
on:
push:
branches:
- main
paths:
- 'docs/**/*.mdx'
pull_request:
paths:
- 'docs/**/*.mdx'
# If triggered by a PR, it will be in the same group. However, each commit on main will be in its own unique group
concurrency:
group: ${{ github.workflow }}-${{ (github.head_ref && github.ref) || github.run_id }}
cancel-in-progress: true
jobs:
mdx-lint:
name: Lint MDX files
runs-on: blacksmith-4vcpu-ubuntu-2204
steps:
- uses: actions/checkout@v4
- name: Install Node.js 22
uses: useblacksmith/setup-node@v5
with:
node-version: 22
- name: Install MDX dependencies
run: |
npm install @mdx-js/mdx@3 glob@10
- name: Validate MDX files
run: |
node -e "
const {compile} = require('@mdx-js/mdx');
const fs = require('fs');
const path = require('path');
const glob = require('glob');
async function validateMDXFiles() {
const files = glob.sync('docs/**/*.mdx');
console.log('Found', files.length, 'MDX files to validate');
let hasErrors = false;
for (const file of files) {
try {
const content = fs.readFileSync(file, 'utf8');
await compile(content);
console.log('✅ MDX parsing successful for', file);
} catch (err) {
console.error('❌ MDX parsing failed for', file, ':', err.message);
hasErrors = true;
}
}
if (hasErrors) {
console.error('\\n❌ Some MDX files have parsing errors. Please fix them before merging.');
process.exit(1);
} else {
console.log('\\n✅ All MDX files are valid!');
}
}
validateMDXFiles();
"

View File

@@ -48,7 +48,10 @@ jobs:
python-version: ${{ matrix.python-version }}
cache: "poetry"
- name: Install Python dependencies using Poetry
run: poetry install --with dev,test,runtime
run: |
poetry install --with dev,test,runtime
poetry run pip install pytest-xdist
poetry run pip install pytest-rerunfailures
- name: Build Environment
run: make build
- name: Run Unit Tests
@@ -56,7 +59,7 @@ jobs:
env:
COVERAGE_FILE: ".coverage.${{ matrix.python_version }}"
- name: Run Runtime Tests with CLIRuntime
run: PYTHONPATH=".:$PYTHONPATH" TEST_RUNTIME=cli poetry run pytest -s tests/runtime/test_bash.py --cov=openhands --cov-branch
run: PYTHONPATH=".:$PYTHONPATH" TEST_RUNTIME=cli poetry run pytest -n 5 --reruns 2 --reruns-delay 3 -s tests/runtime/test_bash.py --cov=openhands --cov-branch
env:
COVERAGE_FILE: ".coverage.runtime.${{ matrix.python_version }}"
- name: Store coverage file
@@ -67,37 +70,7 @@ jobs:
.coverage.${{ matrix.python_version }}
.coverage.runtime.${{ matrix.python_version }}
include-hidden-files: true
# Run specific Windows python tests
test-on-windows:
name: Python Tests on Windows
runs-on: windows-latest
strategy:
matrix:
python-version: ["3.12"]
steps:
- uses: actions/checkout@v4
- name: Install pipx
run: pip install pipx
- name: Install poetry via pipx
run: pipx install poetry
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: "poetry"
- name: Install Python dependencies using Poetry
run: poetry install --with dev,test,runtime
- name: Run Windows unit tests
run: poetry run pytest -svv tests/unit/runtime/utils/test_windows_bash.py
env:
PYTHONPATH: ".;$env:PYTHONPATH"
DEBUG: "1"
- name: Run Windows runtime tests with LocalRuntime
run: $env:TEST_RUNTIME="local"; poetry run pytest -svv tests/runtime/test_bash.py
env:
PYTHONPATH: ".;$env:PYTHONPATH"
TEST_RUNTIME: local
DEBUG: "1"
test-enterprise:
name: Enterprise Python Unit Tests
runs-on: blacksmith-4vcpu-ubuntu-2404
@@ -128,56 +101,11 @@ jobs:
path: ".coverage.enterprise.${{ matrix.python_version }}"
include-hidden-files: true
# Run CLI unit tests
test-cli-python:
name: CLI Unit Tests
runs-on: blacksmith-4vcpu-ubuntu-2404
strategy:
matrix:
python-version: ["3.12"]
steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Python
uses: useblacksmith/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
version: "latest"
- name: Install dependencies
working-directory: ./openhands-cli
run: |
uv sync --group dev
- name: Run CLI unit tests
working-directory: ./openhands-cli
env:
# write coverage to repo root so the merge step finds it
COVERAGE_FILE: "${{ github.workspace }}/.coverage.openhands-cli.${{ matrix.python-version }}"
run: |
uv run pytest --forked -n auto -s \
-p no:ddtrace -p no:ddtrace.pytest_bdd -p no:ddtrace.pytest_benchmark \
tests --cov=openhands_cli --cov-branch
- name: Store coverage file
uses: actions/upload-artifact@v4
with:
name: coverage-openhands-cli
path: ".coverage.openhands-cli.${{ matrix.python-version }}"
include-hidden-files: true
coverage-comment:
name: Coverage Comment
if: github.event_name == 'pull_request'
runs-on: ubuntu-latest
needs: [test-on-linux, test-enterprise, test-cli-python]
needs: [test-on-linux, test-enterprise]
permissions:
pull-requests: write
@@ -191,27 +119,9 @@ jobs:
pattern: coverage-*
merge-multiple: true
- name: Create symlink for CLI source files
run: ln -sf openhands-cli/openhands_cli openhands_cli
- name: Coverage comment
# In PR mode leaves a comment, otherwise records coverage in branch python-coverage-comment-action-data.
id: coverage_comment
uses: py-cov-action/python-coverage-comment-action@v3
with:
GITHUB_TOKEN: ${{ github.token }}
MERGE_COVERAGE_FILES: true
- name: Enforce coverage
# Fail if on PR AND there are uncovered lines AND diff coverage is less than total coverage.
# To debug, try a step to log outputs like: `echo ${{ toJSON(steps.coverage_comment.outputs) }}`
# Once we track base branch, reference_percent_covered will be better to use than new_percent_covered.
if: ${{ github.event_name == 'pull_request' && fromJSON(steps.coverage_comment.outputs.diff_total_num_violations) > 0 && steps.coverage_comment.outputs.diff_total_percent_covered < steps.coverage_comment.outputs.new_percent_covered }}
run: |
echo "Coverage decreased, which is not allowed."
echo "Please add some unit tests for the modified code."
echo
echo " diff_total_num_violations: ${{ steps.coverage_comment.outputs.diff_total_num_violations }}"
echo " diff_total_percent_covered: ${{ steps.coverage_comment.outputs.diff_total_percent_covered}}"
echo " new_percent_covered: ${{ steps.coverage_comment.outputs.new_percent_covered}}"
exit 1

View File

@@ -10,7 +10,6 @@ on:
type: choice
options:
- app server
- cli
default: app server
push:
tags:
@@ -39,36 +38,3 @@ jobs:
run: ./build.sh
- name: publish
run: poetry publish -u __token__ -p ${{ secrets.PYPI_TOKEN }}
release-cli:
name: Publish CLI to PyPI
runs-on: ubuntu-latest
# Run when manually dispatched for "cli" OR for tag pushes that contain '-cli'
if: |
(github.event_name == 'workflow_dispatch' && github.event.inputs.reason == 'cli')
|| (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/') && contains(github.ref, '-cli'))
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: 3.12
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
version: "latest"
- name: Build CLI package
working-directory: openhands-cli
run: |
# Clean dist directory to avoid conflicts with binary builds
rm -rf dist/
uv build
- name: Publish CLI to PyPI
working-directory: openhands-cli
run: |
uv publish --token ${{ secrets.PYPI_TOKEN_OPENHANDS }}

View File

@@ -1,135 +0,0 @@
# Run evaluation on a PR, after releases, or manually
name: Run Eval
# Runs when a PR is labeled with one of the "run-eval-" labels, after releases, or manually triggered
on:
pull_request:
types: [labeled]
release:
types: [published]
workflow_dispatch:
inputs:
branch:
description: 'Branch to evaluate'
required: true
default: 'main'
eval_instances:
description: 'Number of evaluation instances'
required: true
default: '50'
type: choice
options:
- '1'
- '2'
- '50'
- '100'
reason:
description: 'Reason for manual trigger'
required: false
default: ''
env:
# Environment variable for the master GitHub issue number where all evaluation results will be commented
# This should be set to the issue number where you want all evaluation results to be posted
MASTER_EVAL_ISSUE_NUMBER: ${{ vars.MASTER_EVAL_ISSUE_NUMBER || '0' }}
jobs:
trigger-job:
name: Trigger remote eval job
if: ${{ (github.event_name == 'pull_request' && (github.event.label.name == 'run-eval-1' || github.event.label.name == 'run-eval-2' || github.event.label.name == 'run-eval-50' || github.event.label.name == 'run-eval-100')) || github.event_name == 'release' || github.event_name == 'workflow_dispatch' }}
runs-on: blacksmith-4vcpu-ubuntu-2204
steps:
- name: Checkout branch
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'pull_request' && github.head_ref || (github.event_name == 'workflow_dispatch' && github.event.inputs.branch) || github.ref }}
- name: Set evaluation parameters
id: eval_params
run: |
REPO_URL="https://github.com/${{ github.repository }}"
echo "Repository URL: $REPO_URL"
# Determine branch based on trigger type
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
EVAL_BRANCH="${{ github.head_ref }}"
echo "PR Branch: $EVAL_BRANCH"
elif [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
EVAL_BRANCH="${{ github.event.inputs.branch }}"
echo "Manual Branch: $EVAL_BRANCH"
else
# For release events, use the tag name or main branch
EVAL_BRANCH="${{ github.ref_name }}"
echo "Release Branch/Tag: $EVAL_BRANCH"
fi
# Determine evaluation instances based on trigger type
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
if [[ "${{ github.event.label.name }}" == "run-eval-1" ]]; then
EVAL_INSTANCES="1"
elif [[ "${{ github.event.label.name }}" == "run-eval-2" ]]; then
EVAL_INSTANCES="2"
elif [[ "${{ github.event.label.name }}" == "run-eval-50" ]]; then
EVAL_INSTANCES="50"
elif [[ "${{ github.event.label.name }}" == "run-eval-100" ]]; then
EVAL_INSTANCES="100"
fi
elif [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
EVAL_INSTANCES="${{ github.event.inputs.eval_instances }}"
else
# For release events, default to 50 instances
EVAL_INSTANCES="50"
fi
echo "Evaluation instances: $EVAL_INSTANCES"
echo "repo_url=$REPO_URL" >> $GITHUB_OUTPUT
echo "eval_branch=$EVAL_BRANCH" >> $GITHUB_OUTPUT
echo "eval_instances=$EVAL_INSTANCES" >> $GITHUB_OUTPUT
- name: Trigger remote job
run: |
# Determine PR number for the remote evaluation system
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
PR_NUMBER="${{ github.event.pull_request.number }}"
else
# For non-PR triggers, use the master issue number as PR number
PR_NUMBER="${{ env.MASTER_EVAL_ISSUE_NUMBER }}"
fi
curl -X POST \
-H "Authorization: Bearer ${{ secrets.PAT_TOKEN }}" \
-H "Accept: application/vnd.github+json" \
-d "{\"ref\": \"main\", \"inputs\": {\"github-repo\": \"${{ steps.eval_params.outputs.repo_url }}\", \"github-branch\": \"${{ steps.eval_params.outputs.eval_branch }}\", \"pr-number\": \"${PR_NUMBER}\", \"eval-instances\": \"${{ steps.eval_params.outputs.eval_instances }}\"}}" \
https://api.github.com/repos/OpenHands/evaluation/actions/workflows/create-branch.yml/dispatches
# Send Slack message
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
TRIGGER_URL="https://github.com/${{ github.repository }}/pull/${{ github.event.pull_request.number }}"
slack_text="PR $TRIGGER_URL has triggered evaluation on ${{ steps.eval_params.outputs.eval_instances }} instances..."
elif [[ "${{ github.event_name }}" == "release" ]]; then
TRIGGER_URL="https://github.com/${{ github.repository }}/releases/tag/${{ github.ref_name }}"
slack_text="Release $TRIGGER_URL has triggered evaluation on ${{ steps.eval_params.outputs.eval_instances }} instances..."
else
TRIGGER_URL="https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}"
slack_text="Manual trigger (${{ github.event.inputs.reason || 'No reason provided' }}) has triggered evaluation on ${{ steps.eval_params.outputs.eval_instances }} instances for branch ${{ steps.eval_params.outputs.eval_branch }}..."
fi
curl -X POST -H 'Content-type: application/json' --data '{"text":"'"$slack_text"'"}' \
https://hooks.slack.com/services/${{ secrets.SLACK_TOKEN }}
- name: Comment on issue/PR
uses: KeisukeYamashita/create-comment@v1
with:
# For PR triggers, comment on the PR. For other triggers, comment on the master issue
number: ${{ github.event_name == 'pull_request' && github.event.pull_request.number || env.MASTER_EVAL_ISSUE_NUMBER }}
unique: false
comment: |
**Evaluation Triggered**
**Trigger:** ${{ github.event_name == 'pull_request' && format('Pull Request #{0}', github.event.pull_request.number) || (github.event_name == 'release' && 'Release') || format('Manual Trigger: {0}', github.event.inputs.reason || 'No reason provided') }}
**Branch:** ${{ steps.eval_params.outputs.eval_branch }}
**Instances:** ${{ steps.eval_params.outputs.eval_instances }}
**Commit:** ${{ github.sha }}
Running evaluation on the specified branch. Once eval is done, the results will be posted here.

3
.gitignore vendored
View File

@@ -185,6 +185,9 @@ cython_debug/
.repomix
repomix-output.txt
# Emacs backup
*~
# evaluation
evaluation/evaluation_outputs
evaluation/outputs

1
CNAME Normal file
View File

@@ -0,0 +1 @@
docs.all-hands.dev

View File

@@ -1,43 +1,45 @@
# 🙌 The OpenHands Community
# The OpenHands Community
The OpenHands community is built around the belief that (1) AI and AI agents are going to fundamentally change the way
we build software, and (2) if this is true, we should do everything we can to make sure that the benefits provided by
such powerful technology are accessible to everyone.
OpenHands is a community of engineers, academics, and enthusiasts reimagining software development for an AI-powered world.
If this resonates with you, we'd love to have you join us in our quest!
## Mission
## 🤝 How to Join
Its very clear that AI is changing software development. We want the developer community to drive that change organically, through open source.
Check out our [How to Join the Community section.](https://github.com/OpenHands/OpenHands?tab=readme-ov-file#-how-to-join-the-community)
So were not just building friendly interfaces for AI-driven development. Were publishing _building blocks_ that empower developers to create new experiences, tailored to your own habits, needs, and imagination.
## 💪 Becoming a Contributor
## Ethos
We welcome contributions from everyone! Whether you're a developer, a researcher, or simply enthusiastic about advancing
the field of software engineering with AI, there are many ways to get involved:
We have two core values: **high openness** and **high agency**. While we dont expect everyone in the community to embody these values, we want to establish them as norms.
- **Code Contributions:** Help us develop new core functionality, improve our agents, improve the frontend and other
interfaces, or anything else that would help make OpenHands better.
- **Research and Evaluation:** Contribute to our understanding of LLMs in software engineering, participate in
evaluating the models, or suggest improvements.
- **Feedback and Testing:** Use the OpenHands toolset, report bugs, suggest features, or provide feedback on usability.
### High Openness
For details, please check [CONTRIBUTING.md](./CONTRIBUTING.md).
We welcome anyone and everyone into our community by default. You dont have to be a software developer to help us build. You dont have to be pro-AI to help us learn.
## Code of Conduct
Our plans, our work, our successes, and our failures are all public record. We want the world to see not just the fruits of our work, but the whole process of growing it.
We have a [Code of Conduct](./CODE_OF_CONDUCT.md) that we expect all contributors to adhere to.
Long story short, we are aiming for an open, welcoming, diverse, inclusive, and healthy community.
All contributors are expected to contribute to building this sort of community.
We welcome thoughtful criticism, whether its a comment on a PR or feedback on the community as a whole.
## 🛠️ Becoming a Maintainer
### High Agency
For contributors who have made significant and sustained contributions to the project, there is a possibility of joining
the maintainer team. The process for this is as follows:
Everyone should feel empowered to contribute to OpenHands. Whether its by making a PR, hosting an event, sharing feedback, or just asking a question, dont hold back!
1. Any contributor who has made sustained and high-quality contributions to the codebase can be nominated by any
maintainer. If you feel that you may qualify you can reach out to any of the maintainers that have reviewed your PRs and ask if you can be nominated.
2. Once a maintainer nominates a new maintainer, there will be a discussion period among the maintainers for at least 3 days.
3. If no concerns are raised the nomination will be accepted by acclamation, and if concerns are raised there will be a discussion and possible vote.
OpenHands gives everyone the building blocks to create state-of-the-art developer experiences. We experiment constantly and love building new things.
Note that just making many PRs does not immediately imply that you will become a maintainer. We will be looking
at sustained high-quality contributions over a period of time, as well as good teamwork and adherence to our [Code of Conduct](./CODE_OF_CONDUCT.md).
Coding, development practices, and communities are changing rapidly. We wont hesitate to change direction and make big bets.
## Relationship to All Hands
OpenHands is supported by the for-profit organization [All Hands AI, Inc](https://www.all-hands.dev/).
All Hands was founded by three of the first major contributors to OpenHands:
- Xingyao Wang, a UIUC PhD candidate who got OpenHands to the top of the SWE-bench leaderboards
- Graham Neubig, a CMU Professor who rallied the academic community around OpenHands
- Robert Brennan, a software engineer who architected the user-facing features of OpenHands
All Hands is an important part of the OpenHands ecosystem. Weve raised over $20M--mainly to hire developers and researchers who can work on OpenHands full-time, and to provide them with expensive infrastructure. ([Join us!](https://allhandsai.applytojob.com/apply/))
But we see OpenHands as much larger, and ultimately more important, than All Hands. When our financial responsibility to investors is at odds with our social responsibility to the community—as it inevitably will be, from time to time—we promise to navigate that conflict thoughtfully and transparently.
At some point, we may transfer custody of OpenHands to an open source foundation. But for now, the [Benevolent Dictator approach](http://www.catb.org/~esr/writings/cathedral-bazaar/homesteading/ar01s16.html) helps us move forward with speed and intention. If we ever forget the “benevolent” part, please: fork us.

View File

@@ -58,7 +58,7 @@ by implementing the [interface specified here](https://github.com/OpenHands/Open
#### Testing
When you write code, it is also good to write tests. Please navigate to the [`./tests`](./tests) folder to see existing test suites.
At the moment, we have two kinds of tests: [`unit`](./tests/unit) and [`integration`](./evaluation/integration_tests). Please refer to the README for each test suite. These tests also run on GitHub's continuous integration to ensure quality of the project.
At the moment, we have these kinds of tests: [`unit`](./tests/unit), [`runtime`](./tests/runtime), and [`end-to-end (e2e)`](./tests/e2e). Please refer to the README for each test suite. These tests also run on GitHub's continuous integration to ensure quality of the project.
## Sending Pull Requests to OpenHands

View File

@@ -91,14 +91,14 @@ make run
#### Option B: Individual Server Startup
- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on
backend-related tasks or configurations.
backend-related tasks or configurations.
```bash
make start-backend
```
- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related
components or interface enhancements.
components or interface enhancements.
```bash
make start-frontend
```
@@ -110,6 +110,7 @@ You can use OpenHands to develop and improve OpenHands itself! This is a powerfu
#### Quick Start
1. **Build and run OpenHands:**
```bash
export INSTALL_DOCKER=0
export RUNTIME=local
@@ -117,6 +118,7 @@ You can use OpenHands to develop and improve OpenHands itself! This is a powerfu
```
2. **Access the interface:**
- Local development: http://localhost:3001
- Remote/cloud environments: Use the appropriate external URL
@@ -159,7 +161,7 @@ poetry run pytest ./tests/unit/test_*.py
To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/openhands/runtime:0.59-nikolaik`
Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/openhands/runtime:0.62-nikolaik`
## Develop inside Docker container
@@ -199,6 +201,6 @@ Here's a guide to the important documentation files in the repository:
- [/containers/README.md](./containers/README.md): Information about Docker containers and deployment
- [/tests/unit/README.md](./tests/unit/README.md): Guide to writing and running unit tests
- [/evaluation/README.md](./evaluation/README.md): Documentation for the evaluation framework and benchmarks
- [/microagents/README.md](./microagents/README.md): Information about the microagents architecture and implementation
- [/skills/README.md](./skills/README.md): Information about the skills architecture and implementation
- [/openhands/server/README.md](./openhands/server/README.md): Server implementation details and API documentation
- [/openhands/runtime/README.md](./openhands/runtime/README.md): Documentation for the runtime environment and execution model

186
README.md
View File

@@ -1,22 +1,18 @@
<a name="readme-top"></a>
<div align="center">
<img src="https://raw.githubusercontent.com/All-Hands-AI/docs/main/openhands/static/img/logo.png" alt="Logo" width="200">
<h1 align="center">OpenHands: Code Less, Make More</h1>
<img src="https://raw.githubusercontent.com/OpenHands/docs/main/openhands/static/img/logo.png" alt="Logo" width="200">
<h1 align="center" style="border-bottom: none">OpenHands: AI-Driven Development</h1>
</div>
<div align="center">
<a href="https://github.com/OpenHands/OpenHands/graphs/contributors"><img src="https://img.shields.io/github/contributors/OpenHands/OpenHands?style=for-the-badge&color=blue" alt="Contributors"></a>
<a href="https://github.com/OpenHands/OpenHands/stargazers"><img src="https://img.shields.io/github/stars/OpenHands/OpenHands?style=for-the-badge&color=blue" alt="Stargazers"></a>
<a href="https://github.com/OpenHands/OpenHands/blob/main/LICENSE"><img src="https://img.shields.io/github/license/OpenHands/OpenHands?style=for-the-badge&color=blue" alt="MIT License"></a>
<a href="https://github.com/OpenHands/OpenHands/blob/main/LICENSE"><img src="https://img.shields.io/badge/LICENSE-MIT-20B2AA?style=for-the-badge" alt="MIT License"></a>
<a href="https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=811504672#gid=811504672"><img src="https://img.shields.io/badge/SWEBench-72.8-00cc00?logoColor=FFE165&style=for-the-badge" alt="Benchmark Score"></a>
<br/>
<a href="https://all-hands.dev/joinslack"><img src="https://img.shields.io/badge/Slack-Join%20Us-red?logo=slack&logoColor=white&style=for-the-badge" alt="Join our Slack community"></a>
<a href="https://github.com/OpenHands/OpenHands/blob/main/CREDITS.md"><img src="https://img.shields.io/badge/Project-Credits-blue?style=for-the-badge&color=FFE165&logo=github&logoColor=white" alt="Credits"></a>
<br/>
<a href="https://docs.all-hands.dev/usage/getting-started"><img src="https://img.shields.io/badge/Documentation-000?logo=googledocs&logoColor=FFE165&style=for-the-badge" alt="Check out the documentation"></a>
<a href="https://arxiv.org/abs/2407.16741"><img src="https://img.shields.io/badge/Paper%20on%20Arxiv-000?logoColor=FFE165&logo=arxiv&style=for-the-badge" alt="Paper on Arxiv"></a>
<a href="https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0#gid=0"><img src="https://img.shields.io/badge/Benchmark%20score-000?logoColor=FFE165&logo=huggingface&style=for-the-badge" alt="Evaluation Benchmark Score"></a>
<a href="https://docs.openhands.dev/sdk"><img src="https://img.shields.io/badge/Documentation-000?logo=googledocs&logoColor=FFE165&style=for-the-badge" alt="Check out the documentation"></a>
<a href="https://arxiv.org/abs/2511.03690"><img src="https://img.shields.io/badge/Paper-000?logoColor=FFE165&logo=arxiv&style=for-the-badge" alt="Tech Report"></a>
<!-- Keep these links. Translations will automatically update with the README. -->
<a href="https://www.readme-i18n.com/OpenHands/OpenHands?lang=de">Deutsch</a> |
@@ -28,157 +24,63 @@
<a href="https://www.readme-i18n.com/OpenHands/OpenHands?lang=ru">Русский</a> |
<a href="https://www.readme-i18n.com/OpenHands/OpenHands?lang=zh">中文</a>
<hr>
</div>
Welcome to OpenHands (formerly OpenDevin), a platform for software development agents powered by AI.
<hr>
OpenHands agents can do anything a human developer can: modify code, run commands, browse the web,
call APIs, and yes—even copy code snippets from StackOverflow.
🙌 Welcome to OpenHands, a [community](COMMUNITY.md) focused on AI-driven development. Wed love for you to [join us on Slack](https://dub.sh/openhands).
Learn more at [docs.all-hands.dev](https://docs.all-hands.dev), or [sign up for OpenHands Cloud](https://app.all-hands.dev) to get started.
There are a few ways to work with OpenHands:
### OpenHands Software Agent SDK
The SDK is a composable Python library that contains all of our agentic tech. It's the engine that powers everything else below.
> [!IMPORTANT]
> **Upcoming change**: We are renaming our GitHub Org from `All-Hands-AI` to `OpenHands` on October 20th, 2025.
> Check the [tracking issue](https://github.com/All-Hands-AI/OpenHands/issues/11376) for more information.
Define agents in code, then run them locally, or scale to 1000s of agents in the cloud.
[Check out the docs](https://docs.openhands.dev/sdk) or [view the source](https://github.com/OpenHands/software-agent-sdk/)
> [!IMPORTANT]
> Using OpenHands for work? We'd love to chat! Fill out
> [this short form](https://docs.google.com/forms/d/e/1FAIpQLSet3VbGaz8z32gW9Wm-Grl4jpt5WgMXPgJ4EDPVmCETCBpJtQ/viewform)
> to join our Design Partner program, where you'll get early access to commercial features and the opportunity to provide input on our product roadmap.
### OpenHands CLI
The CLI is the easiest way to start using OpenHands. The experience will be familiar to anyone who has worked
with e.g. Claude Code or Codex. You can power it with Claude, GPT, or any other LLM.
## ☁️ OpenHands Cloud
The easiest way to get started with OpenHands is on [OpenHands Cloud](https://app.all-hands.dev),
which comes with $20 in free credits for new users.
[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/cli-mode) or [view the source](https://github.com/OpenHands/OpenHands-CLI)
## 💻 Running OpenHands Locally
### OpenHands Local GUI
Use the Local GUI for running agents on your laptop. It comes with a REST API and a single-page React application.
The experience will be familiar to anyone who has used Devin or Jules.
### Option 1: CLI Launcher (Recommended)
[Check out the docs](https://docs.openhands.dev/openhands/usage/run-openhands/local-setup) or view the source in this repo.
The easiest way to run OpenHands locally is using the CLI launcher with [uv](https://docs.astral.sh/uv/). This provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers.
### OpenHands Cloud
This is a deployment of OpenHands GUI, running on hosted infrastructure.
**Install uv** (if you haven't already):
You can try it with a free $10 credit by [signing in with your GitHub account](https://app.all-hands.dev).
See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform.
OpenHands Cloud comes with source-available features and integrations:
- Integrations with Slack, Jira, and Linear
- Multi-user support
- RBAC and permissions
- Collaboration features (e.g., conversation sharing)
**Launch OpenHands**:
```bash
# Launch the GUI server
uvx --python 3.12 --from openhands-ai openhands serve
### OpenHands Enterprise
Large enterprises can work with us to self-host OpenHands Cloud in their own VPC, via Kubernetes.
OpenHands Enterprise can also work with the CLI and SDK above.
# Or launch the CLI
uvx --python 3.12 --from openhands-ai openhands
```
OpenHands Enterprise is source-available--you can see all the source code here in the enterprise/ directory,
but you'll need to purchase a license if you want to run it for more than one month.
You'll find OpenHands running at [http://localhost:3000](http://localhost:3000) (for GUI mode)!
Enterprise contracts also come with extended support and access to our research team.
### Option 2: Docker
Learn more at [openhands.dev/enterprise](https://openhands.dev/enterprise)
<details>
<summary>Click to expand Docker command</summary>
### Everything Else
You can also run OpenHands directly with Docker:
Check out our [Product Roadmap](https://github.com/orgs/openhands/projects/1), and feel free to
[open up an issue](https://github.com/OpenHands/OpenHands/issues) if there's something you'd like to see!
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.59-nikolaik
You might also be interested in our [evaluation infrastructure](https://github.com/OpenHands/benchmarks), our [chrome extension](https://github.com/OpenHands/openhands-chrome-extension/), or our [Theory-of-Mind module](https://github.com/OpenHands/ToM-SWE).
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.59-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands:/.openhands \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.59
```
All our work is available under the MIT license, except for the `enterprise/` directory in this repository (see the [enterprise license](enterprise/LICENSE) for details).
The core `openhands` and `agent-server` Docker images are fully MIT-licensed as well.
</details>
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
> [!WARNING]
> On a public network? See our [Hardened Docker Installation Guide](https://docs.all-hands.dev/usage/runtimes/docker#hardened-docker-installation)
> to secure your deployment by restricting network binding and implementing additional security measures.
### Getting Started
When you open the application, you'll be asked to choose an LLM provider and add an API key.
[Anthropic's Claude Sonnet 4.5](https://www.anthropic.com/api) (`anthropic/claude-sonnet-4-5-20250929`)
works best, but you have [many options](https://docs.all-hands.dev/usage/llms).
See the [Running OpenHands](https://docs.all-hands.dev/usage/installation) guide for
system requirements and more information.
## 💡 Other ways to run OpenHands
> [!WARNING]
> OpenHands is meant to be run by a single user on their local workstation.
> It is not appropriate for multi-tenant deployments where multiple users share the same instance. There is no built-in authentication, isolation, or scalability.
>
> If you're interested in running OpenHands in a multi-tenant environment, check out the source-available, commercially-licensed
> [OpenHands Cloud Helm Chart](https://github.com/openHands/OpenHands-cloud)
You can [connect OpenHands to your local filesystem](https://docs.all-hands.dev/usage/runtimes/docker#connecting-to-your-filesystem),
interact with it via a [friendly CLI](https://docs.all-hands.dev/usage/how-to/cli-mode),
run OpenHands in a scriptable [headless mode](https://docs.all-hands.dev/usage/how-to/headless-mode),
or run it on tagged issues with [a github action](https://docs.all-hands.dev/usage/how-to/github-action).
Visit [Running OpenHands](https://docs.all-hands.dev/usage/installation) for more information and setup instructions.
If you want to modify the OpenHands source code, check out [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md).
Having issues? The [Troubleshooting Guide](https://docs.all-hands.dev/usage/troubleshooting) can help.
## 📖 Documentation
To learn more about the project, and for tips on using OpenHands,
check out our [documentation](https://docs.all-hands.dev/usage/getting-started).
There you'll find resources on how to use different LLM providers,
troubleshooting resources, and advanced configuration options.
## 🤝 How to Join the Community
OpenHands is a community-driven project, and we welcome contributions from everyone. We do most of our communication
through Slack, so this is the best place to start, but we also are happy to have you contact us on Github:
- [Join our Slack workspace](https://all-hands.dev/joinslack) - Here we talk about research, architecture, and future development.
- [Read or post Github Issues](https://github.com/OpenHands/OpenHands/issues) - Check out the issues we're working on, or add your own ideas.
See more about the community in [COMMUNITY.md](./COMMUNITY.md) or find details on contributing in [CONTRIBUTING.md](./CONTRIBUTING.md).
## 📈 Progress
See the monthly OpenHands roadmap [here](https://github.com/orgs/OpenHands/projects/1) (updated at the maintainer's meeting at the end of each month).
<p align="center">
<a href="https://star-history.com/#OpenHands/OpenHands&Date">
<img src="https://api.star-history.com/svg?repos=OpenHands/OpenHands&type=Date" width="500" alt="Star History Chart">
</a>
</p>
## 📜 License
Distributed under the MIT License, with the exception of the `enterprise/` folder. See [`LICENSE`](./LICENSE) for more information.
## 🙏 Acknowledgements
OpenHands is built by a large number of contributors, and every contribution is greatly appreciated! We also build upon other open source projects, and we are deeply thankful for their work.
For a list of open source projects and licenses used in OpenHands, please see our [CREDITS.md](./CREDITS.md) file.
## 📚 Cite
```
@inproceedings{
wang2025openhands,
title={OpenHands: An Open Platform for {AI} Software Developers as Generalist Agents},
author={Xingyao Wang and Boxuan Li and Yufan Song and Frank F. Xu and Xiangru Tang and Mingchen Zhuge and Jiayi Pan and Yueqi Song and Bowen Li and Jaskirat Singh and Hoang H. Tran and Fuqiang Li and Ren Ma and Mingzhang Zheng and Bill Qian and Yanjun Shao and Niklas Muennighoff and Yizhe Zhang and Binyuan Hui and Junyang Lin and Robert Brennan and Hao Peng and Heng Ji and Graham Neubig},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=OJd3ayDDoF}
}
```
If you need help with anything, or just want to chat, [come find us on Slack](https://dub.sh/openhands).

View File

@@ -73,7 +73,7 @@ ENV VIRTUAL_ENV=/app/.venv \
COPY --chown=openhands:openhands --chmod=770 --from=backend-builder ${VIRTUAL_ENV} ${VIRTUAL_ENV}
COPY --chown=openhands:openhands --chmod=770 ./microagents ./microagents
COPY --chown=openhands:openhands --chmod=770 ./skills ./skills
COPY --chown=openhands:openhands --chmod=770 ./openhands ./openhands
COPY --chown=openhands:openhands --chmod=777 ./openhands/runtime/plugins ./openhands/runtime/plugins
COPY --chown=openhands:openhands pyproject.toml poetry.lock README.md MANIFEST.in LICENSE ./

View File

@@ -1,7 +1,7 @@
# Develop in Docker
> [!WARNING]
> This is not officially supported and may not work.
> This way of running OpenHands is not officially supported. It is maintained by the community and may not work.
Install [Docker](https://docs.docker.com/engine/install/) on your host machine and run:

View File

@@ -12,7 +12,7 @@ services:
- SANDBOX_API_HOSTNAME=host.docker.internal
- DOCKER_HOST_ADDR=host.docker.internal
#
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/openhands/runtime:0.59-nikolaik}
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-ghcr.io/openhands/runtime:0.62-nikolaik}
- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234}
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
ports:

View File

@@ -3,9 +3,9 @@ repos:
rev: v5.0.0
hooks:
- id: trailing-whitespace
exclude: ^(docs/|modules/|python/|openhands-ui/|third_party/|enterprise/|openhands-cli/)
exclude: ^(docs/|modules/|python/|openhands-ui/|third_party/|enterprise/)
- id: end-of-file-fixer
exclude: ^(docs/|modules/|python/|openhands-ui/|third_party/|enterprise/|openhands-cli/)
exclude: ^(docs/|modules/|python/|openhands-ui/|third_party/|enterprise/)
- id: check-yaml
args: ["--allow-multiple-documents"]
- id: debug-statements
@@ -28,12 +28,12 @@ repos:
entry: ruff check --config dev_config/python/ruff.toml
types_or: [python, pyi, jupyter]
args: [--fix, --unsafe-fixes]
exclude: ^(third_party/|enterprise/|openhands-cli/)
exclude: ^(third_party/|enterprise/)
# Run the formatter.
- id: ruff-format
entry: ruff format --config dev_config/python/ruff.toml
types_or: [python, pyi, jupyter]
exclude: ^(third_party/|enterprise/|openhands-cli/)
exclude: ^(third_party/|enterprise/)
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.15.0

View File

@@ -0,0 +1,836 @@
# Custom Agent Packages with Custom Runtime Images (Scenario 1)
## 1. Introduction
### 1.1 Problem Statement
OpenHands currently supports agent customization through the software-agent-sdk, but users who need custom system dependencies, specialized tools, or non-Python runtime environments cannot easily deploy their agents. The current V1 architecture uses a fixed agent server image (`ghcr.io/openhands/agent-server:5f62cee-python`) that may not contain the required dependencies for specialized agents.
Users building agents that require:
- Custom system packages (e.g., specialized compilers, databases, ML frameworks)
- Non-Python tools and runtimes (e.g., Node.js, Go, Rust toolchains)
- Custom Docker base images with specific OS configurations
- Proprietary or licensed software installations
Currently have no supported path to deploy their agents to OpenHands Enterprise.
### 1.2 Proposed Solution
We propose extending the existing **Sandbox Specification System** to support custom agent runtime images with proper permissions and security controls. This approach builds directly on OpenHands' current sandbox infrastructure rather than creating parallel systems.
Users will be able to:
1. Create custom Docker images containing their agent code and dependencies
2. Register these images as enhanced sandbox specifications with rich metadata
3. Deploy conversations using their custom sandbox specs (with proper permissions)
4. Maintain full compatibility with existing sandbox management and API infrastructure
The solution extends the current `SandboxSpecService` with:
- **Permission-based access control** to limit custom specs to authorized users
- **Enhanced sandbox specifications** that include agent-specific metadata and requirements
- **Secure image management** with validation and approval workflows
- **Integrated deployment** through existing conversation creation APIs
**Trade-offs**: This approach requires users to build and maintain Docker images, increasing complexity compared to simple Python package deployment. However, it provides the necessary isolation and dependency management for complex agent requirements while leveraging proven sandbox infrastructure.
## 2. User Interface
### 2.1 Custom Agent Image Creation
Users create a custom agent image by extending the base agent server image:
```dockerfile
# Dockerfile for custom agent
FROM ghcr.io/openhands/agent-server:5f62cee-python
# Install custom system dependencies
RUN apt-get update && apt-get install -y \
nodejs \
npm \
golang-go \
&& rm -rf /var/lib/apt/lists/*
# Install custom Python packages
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt
# Copy custom agent code
COPY my_custom_agent/ /app/my_custom_agent/
COPY agent_config.json /app/config/
# Set custom agent as default
ENV CUSTOM_AGENT_MODULE=my_custom_agent
ENV CUSTOM_AGENT_CLASS=MySpecializedAgent
```
### 2.2 Enhanced Sandbox Spec Registration
Users register their custom agent image as an enhanced sandbox specification:
```yaml
# enhanced-sandbox-spec.yaml
apiVersion: openhands.ai/v1
kind: SandboxSpec
metadata:
name: specialized-ml-agent
version: "1.0.0"
owner: user@company.com
permissions:
users: ["user@company.com", "team-lead@company.com"]
groups: ["ml-team", "data-science"]
spec:
image: "myregistry/specialized-ml-agent:v1.0.0"
description: "ML agent with TensorFlow and custom data processing tools"
# Agent-specific metadata
agent:
capabilities:
- machine_learning
- data_analysis
- custom_visualization
type: "custom"
module: "agents.specialized_ml_agent"
class: "SpecializedMLAgent"
requirements:
memory: "4Gi"
cpu: "2"
environment:
TENSORFLOW_VERSION: "2.15.0"
CUSTOM_MODEL_PATH: "/app/models"
# Agent server configuration
CUSTOM_AGENT_MODULE: "agents.specialized_ml_agent"
CUSTOM_AGENT_CLASS: "SpecializedMLAgent"
ports:
- name: agent-server
port: 8000
- name: tensorboard
port: 6006
```
### 2.3 Conversation Creation with Custom Sandbox Spec
Users create conversations using their custom sandbox specs through the existing API:
```bash
# Create conversation with custom sandbox spec
curl -X POST "https://api.openhands.ai/api/conversations" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"sandbox_spec_id": "specialized-ml-agent:v1.0.0",
"initial_message": "Analyze this dataset and create a predictive model",
"workspace": {
"type": "local",
"working_dir": "/workspace/ml-project"
}
}'
```
### 2.4 Image Management Workflows
#### 2.4.1 Pre-built Image Approach
For organizations that want to manage custom agent images centrally:
```bash
# Admin registers pre-built image as sandbox spec
curl -X POST "https://api.openhands.ai/api/sandbox-specs" \
-H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "company-ml-agent",
"version": "1.0.0",
"image": "company-registry/ml-agent:v1.0.0",
"permissions": {
"groups": ["ml-team", "data-science"]
},
"agent": {
"type": "custom",
"capabilities": ["machine_learning", "data_analysis"]
}
}'
```
#### 2.4.2 User Upload Approach
For users who want to upload their own custom images:
```bash
# User uploads custom image (with security validation)
curl -X POST "https://api.openhands.ai/api/sandbox-specs/upload" \
-H "Authorization: Bearer $API_KEY" \
-F "dockerfile=@Dockerfile" \
-F "context=@agent-context.tar.gz" \
-F "spec=@sandbox-spec.yaml"
```
## 3. Other Context
### 3.1 Current Sandbox Specification System
OpenHands V1 uses a sandbox specification system to manage container deployments:
- **Single Default Spec**: Currently only one sandbox spec exists, shared by all users
- **SandboxSpecService**: Manages sandbox specifications and container creation
- **SandboxSpecInfo**: Contains image, environment, and resource configuration
- **No Permissions**: Current system lacks user-based access control
The existing system provides the foundation but needs enhancement for custom agents:
- **Permission Layer**: Required to control access to custom specs
- **Rich Metadata**: Need agent-specific information beyond basic container config
- **Image Management**: Need secure workflows for custom image registration
### 3.2 Enhanced Sandbox Specification Architecture
Our proposal extends the existing system with:
#### 3.2.1 Permission-Based Access Control
- **User Permissions**: Individual user access to specific sandbox specs
- **Group Permissions**: Team-based access control for organizational specs
- **Owner Management**: Spec ownership and delegation capabilities
- **Admin Override**: Administrative access for spec management
#### 3.2.2 Agent-Specific Metadata
- **Agent Configuration**: Module, class, and capability information
- **Resource Requirements**: Memory, CPU, and storage specifications
- **Environment Variables**: Agent-specific configuration and secrets
- **Port Mappings**: Additional ports for agent services (e.g., TensorBoard)
#### 3.2.3 Image Management Integration
- **Registry Support**: Integration with Docker registries for image storage
- **Security Validation**: Image scanning and approval workflows
- **Version Management**: Support for multiple versions of custom specs
- **Build Integration**: Optional image building from Dockerfile uploads
### 3.3 Existing Container Orchestration Integration
The enhanced system leverages existing OpenHands infrastructure:
- **Sandbox Service**: Extended to support permission checks and enhanced specs
- **Container Management**: Same lifecycle management with additional metadata
- **Network Isolation**: Maintains existing security boundaries
- **Resource Enforcement**: Enhanced with custom resource requirements
- **Health Monitoring**: Extended to track custom agent-specific metrics
## 4. Technical Design
### 4.1 Enhanced Sandbox Specification Model
#### 4.1.1 Extended SandboxSpecInfo Structure
The existing `SandboxSpecInfo` model is enhanced to support custom agents:
```python
# openhands/app_server/sandbox/sandbox_spec_models.py (enhanced)
from pydantic import BaseModel, Field
from typing import Dict, List, Optional
class AgentMetadata(BaseModel):
"""Agent-specific metadata for custom agents."""
type: str = Field(default="default", description="Agent type (default|custom)")
capabilities: List[str] = Field(default_factory=list, description="Agent capabilities")
module: Optional[str] = Field(description="Python module containing agent class")
class_name: Optional[str] = Field(description="Agent class name")
class PermissionSpec(BaseModel):
"""Permission specification for sandbox spec access."""
users: List[str] = Field(default_factory=list, description="Authorized user emails")
groups: List[str] = Field(default_factory=list, description="Authorized group names")
owner: Optional[str] = Field(description="Spec owner")
class EnhancedSandboxSpecInfo(BaseModel):
"""Enhanced sandbox specification with agent metadata and permissions."""
# Existing fields from SandboxSpecInfo
id: str = Field(description="Docker image identifier")
command: List[str] = Field(default_factory=lambda: ['--port', '8000'])
initial_env: Dict[str, str] = Field(default_factory=dict)
working_dir: str = Field(default="/workspace/project")
# Enhanced fields
name: str = Field(description="Human-readable spec name")
version: str = Field(description="Spec version")
description: Optional[str] = Field(description="Spec description")
# Agent-specific metadata
agent: AgentMetadata = Field(default_factory=AgentMetadata)
# Permission and access control
permissions: PermissionSpec = Field(default_factory=PermissionSpec)
# Resource requirements
memory_limit: Optional[str] = Field(description="Memory limit (e.g., '4Gi')")
cpu_limit: Optional[str] = Field(description="CPU limit (e.g., '2')")
# Additional ports for custom services
ports: List[Dict[str, any]] = Field(
default_factory=lambda: [{"name": "agent-server", "port": 8000}]
)
```
#### 4.1.2 Custom Agent Image Structure
Custom agent images extend the base agent server with this structure:
```
/app/
├── config/
│ ├── agent_config.json # Agent configuration
│ └── tool_registry.json # Custom tool definitions (optional)
├── agents/
│ └── custom_agent.py # Agent implementation
├── tools/ # Custom tools (optional)
│ ├── __init__.py
│ └── custom_tools.py
└── startup/
└── init_agent.py # Agent initialization script
```
### 4.2 Agent Implementation Interface
#### 4.2.1 Custom Agent Base Class
```python
# agents/custom_agent.py
from openhands.sdk.agent.base import AgentBase
from openhands.sdk.llm import LLM
from openhands.sdk.tool import Tool
from typing import List, Dict, Any
class SpecializedMLAgent(AgentBase):
"""Custom ML agent with TensorFlow capabilities."""
def __init__(
self,
llm: LLM,
tools: List[Tool],
config: Dict[str, Any] = None
):
super().__init__(llm=llm, tools=tools)
self.config = config or {}
self.model_cache = self.config.get('MODEL_CACHE_DIR', '/app/models')
async def initialize(self) -> None:
"""Initialize custom agent resources."""
# Load pre-trained models
await self._load_models()
# Initialize custom tools
await self._setup_custom_tools()
async def _load_models(self) -> None:
"""Load TensorFlow models from cache."""
import tensorflow as tf
# Custom model loading logic
pass
async def _setup_custom_tools(self) -> None:
"""Initialize custom tools with agent context."""
# Custom tool setup logic
pass
```
#### 4.2.2 Custom Tool Implementation
```python
# tools/custom_tools.py
from openhands.sdk.tool import Tool, ToolExecutor, register_tool
from openhands.sdk import Action, Observation
from pydantic import Field
import tensorflow as tf
class TensorFlowAnalysisAction(Action):
dataset_path: str = Field(description="Path to dataset file")
model_type: str = Field(description="Type of ML model to create")
target_column: str = Field(description="Target column for prediction")
class TensorFlowAnalysisObservation(Observation):
model_accuracy: float = Field(description="Model accuracy score")
feature_importance: Dict[str, float] = Field(description="Feature importance scores")
model_path: str = Field(description="Path to saved model")
class TensorFlowToolExecutor(ToolExecutor[TensorFlowAnalysisAction, TensorFlowAnalysisObservation]):
def __call__(self, action: TensorFlowAnalysisAction, conversation=None) -> TensorFlowAnalysisObservation:
# Custom TensorFlow analysis logic
model = self._create_model(action.dataset_path, action.model_type, action.target_column)
accuracy = self._evaluate_model(model)
importance = self._get_feature_importance(model)
model_path = self._save_model(model)
return TensorFlowAnalysisObservation(
model_accuracy=accuracy,
feature_importance=importance,
model_path=model_path
)
# Register the custom tool
register_tool(
Tool(
name="TensorFlowTool",
executor=TensorFlowToolExecutor(),
definition=ToolDefinition(
name="tensorflow_analysis",
description="Perform machine learning analysis using TensorFlow",
parameters=TensorFlowAnalysisAction.model_json_schema()
)
)
)
```
### 4.3 Runtime Integration
#### 4.3.1 Custom Agent Loader
```python
# startup/init_agent.py
import json
import importlib
from pathlib import Path
from openhands.sdk.agent.base import AgentBase
from openhands.sdk.llm import LLM
from openhands.sdk.tool import Tool, resolve_tool
class CustomAgentLoader:
"""Loads custom agents from configuration."""
def __init__(self, config_path: str = "/app/config/agent_config.json"):
self.config_path = Path(config_path)
self.config = self._load_config()
def _load_config(self) -> dict:
"""Load agent configuration from JSON file."""
with open(self.config_path) as f:
return json.load(f)
def create_agent(self, llm: LLM) -> AgentBase:
"""Create custom agent instance."""
agent_config = self.config["agent"]
# Import custom agent class
module = importlib.import_module(agent_config["module"])
agent_class = getattr(module, agent_config["class"])
# Load custom tools
tools = self._load_tools()
# Create agent instance
agent = agent_class(
llm=llm,
tools=tools,
config=self.config.get("environment", {})
)
return agent
def _load_tools(self) -> List[Tool]:
"""Load and resolve custom tools."""
tools = []
for tool_config in self.config.get("tools", []):
if "module" in tool_config:
# Import custom tool module to register it
importlib.import_module(tool_config["module"])
tool = resolve_tool(tool_config["name"])
tools.append(tool)
return tools
```
#### 4.3.2 Agent Server Startup Integration
```python
# Modified agent server startup in software-agent-sdk
import os
from openhands.agent_server.api import app
from openhands.agent_server.conversation_service import ConversationService
from startup.init_agent import CustomAgentLoader
@app.on_event("startup")
async def startup_event():
"""Initialize custom agent during server startup."""
# Check for custom agent configuration
custom_agent_module = os.getenv('CUSTOM_AGENT_MODULE')
custom_agent_class = os.getenv('CUSTOM_AGENT_CLASS')
if custom_agent_module and custom_agent_class:
# Load custom agent
loader = CustomAgentLoader()
app.state.agent_factory = loader.create_agent
print(f"Loaded custom agent: {custom_agent_class}")
else:
# Use default agent
from openhands.sdk.agent import Agent
app.state.agent_factory = lambda llm: Agent(llm=llm, tools=get_default_tools())
print("Using default OpenHands agent")
```
### 4.4 Enhanced Sandbox Service Integration
#### 4.4.1 Permission-Aware Sandbox Service
```python
# openhands/app_server/sandbox/enhanced_sandbox_spec_service.py
from openhands.app_server.sandbox.sandbox_spec_service import SandboxSpecService
from openhands.app_server.sandbox.sandbox_spec_models import SandboxSpecInfo, EnhancedSandboxSpecInfo
from typing import Dict, List, Optional
class EnhancedSandboxSpecService(SandboxSpecService):
"""Enhanced sandbox service with permissions and custom agent support."""
def __init__(self, spec_registry: Dict[str, EnhancedSandboxSpecInfo]):
super().__init__()
self.spec_registry = spec_registry
def get_available_sandbox_specs(self, user_email: str, user_groups: List[str]) -> List[str]:
"""Get sandbox specs available to the user based on permissions."""
available_specs = []
for spec_key, spec in self.spec_registry.items():
if self._has_permission(spec, user_email, user_groups):
available_specs.append(spec_key)
return available_specs
def get_sandbox_spec_by_id(
self,
spec_id: str,
user_email: str,
user_groups: List[str]
) -> SandboxSpecInfo:
"""Get sandbox spec by ID with permission check."""
if spec_id not in self.spec_registry:
# Fall back to default specs for backward compatibility
return super().get_default_sandbox_specs()[0]
enhanced_spec = self.spec_registry[spec_id]
# Check permissions
if not self._has_permission(enhanced_spec, user_email, user_groups):
raise PermissionError(f"User {user_email} does not have access to spec {spec_id}")
# Convert to SandboxSpecInfo for existing infrastructure
return self._convert_to_sandbox_spec_info(enhanced_spec)
def _has_permission(
self,
spec: EnhancedSandboxSpecInfo,
user_email: str,
user_groups: List[str]
) -> bool:
"""Check if user has permission to use the sandbox spec."""
# Owner always has access
if spec.permissions.owner == user_email:
return True
# Check user permissions
if user_email in spec.permissions.users:
return True
# Check group permissions
for group in user_groups:
if group in spec.permissions.groups:
return True
return False
def _convert_to_sandbox_spec_info(self, enhanced_spec: EnhancedSandboxSpecInfo) -> SandboxSpecInfo:
"""Convert enhanced spec to standard SandboxSpecInfo."""
# Build environment variables including agent configuration
env_vars = {
'OPENVSCODE_SERVER_ROOT': '/openhands/.openvscode-server',
'OH_ENABLE_VNC': '0',
'LOG_JSON': 'true',
'OH_CONVERSATIONS_PATH': '/workspace/conversations',
'OH_BASH_EVENTS_DIR': '/workspace/bash_events',
'PYTHONUNBUFFERED': '1',
'ENV_LOG_LEVEL': '20',
**enhanced_spec.initial_env
}
# Add custom agent configuration if specified
if enhanced_spec.agent.type == "custom":
env_vars.update({
'CUSTOM_AGENT_MODULE': enhanced_spec.agent.module,
'CUSTOM_AGENT_CLASS': enhanced_spec.agent.class_name,
})
return SandboxSpecInfo(
id=enhanced_spec.id,
command=enhanced_spec.command,
initial_env=env_vars,
working_dir=enhanced_spec.working_dir,
)
def register_sandbox_spec(
self,
spec: EnhancedSandboxSpecInfo,
admin_user: str
) -> str:
"""Register a new sandbox spec (admin only)."""
spec_key = f"{spec.name}:{spec.version}"
# Validate spec
self._validate_sandbox_spec(spec)
# Store in registry
self.spec_registry[spec_key] = spec
return spec_key
def _validate_sandbox_spec(self, spec: EnhancedSandboxSpecInfo) -> None:
"""Validate sandbox spec for security and correctness."""
# Image validation
if not spec.id or not spec.id.strip():
raise ValueError("Image ID cannot be empty")
# Permission validation
if not spec.permissions.owner:
raise ValueError("Sandbox spec must have an owner")
# Agent validation for custom agents
if spec.agent.type == "custom":
if not spec.agent.module or not spec.agent.class_name:
raise ValueError("Custom agents must specify module and class_name")
```
### 4.5 Enhanced API Integration
#### 4.5.1 Enhanced Conversation Creation
```python
# openhands/server/routes/conversation_routes.py (enhanced)
from fastapi import APIRouter, HTTPException, Depends
from pydantic import BaseModel
from typing import Optional, Dict, Any, List
from uuid import UUID
from openhands.app_server.sandbox.enhanced_sandbox_spec_service import EnhancedSandboxSpecService
from openhands.server.session.agent_session import AgentSession
from openhands.server.auth import get_current_user, get_user_groups
# Enhanced conversation creation request
class CreateConversationRequest(BaseModel):
initial_message: str
workspace_config: Optional[Dict[str, Any]] = None
# New field for custom sandbox spec
sandbox_spec_id: Optional[str] = None
@router.post("/conversations")
async def create_conversation(
request: CreateConversationRequest,
current_user: str = Depends(get_current_user),
user_groups: List[str] = Depends(get_user_groups),
sandbox_service: EnhancedSandboxSpecService = Depends(get_enhanced_sandbox_service)
) -> ConversationResponse:
"""Create conversation with optional custom sandbox spec."""
try:
if request.sandbox_spec_id:
# Use custom sandbox spec with permission check
sandbox_spec = sandbox_service.get_sandbox_spec_by_id(
request.sandbox_spec_id,
current_user,
user_groups
)
else:
# Use default sandbox spec
sandbox_spec = sandbox_service.get_default_sandbox_specs()[0]
# Create sandbox and conversation
sandbox = await sandbox_service.create_sandbox(sandbox_spec)
await wait_for_agent_server_ready(sandbox)
conversation = await create_conversation_with_sandbox(
sandbox=sandbox,
initial_message=request.initial_message,
workspace_config=request.workspace_config
)
return ConversationResponse(
conversation_id=conversation.id,
status="created",
sandbox_spec_id=request.sandbox_spec_id or "default"
)
except PermissionError as e:
raise HTTPException(status_code=403, detail=str(e))
except ValueError as e:
raise HTTPException(status_code=404, detail=str(e))
```
#### 4.5.2 Sandbox Spec Management API
```python
# openhands/server/routes/sandbox_spec_routes.py (new)
from fastapi import APIRouter, HTTPException, Depends, UploadFile, File
from pydantic import BaseModel
from typing import List, Optional
import yaml
from openhands.app_server.sandbox.enhanced_sandbox_spec_service import EnhancedSandboxSpecService
from openhands.app_server.sandbox.sandbox_spec_models import EnhancedSandboxSpecInfo
from openhands.server.auth import get_current_user, get_user_groups, require_admin
router = APIRouter(prefix="/api/sandbox-specs", tags=["Sandbox Specs"])
@router.get("/")
async def list_available_sandbox_specs(
current_user: str = Depends(get_current_user),
user_groups: List[str] = Depends(get_user_groups),
sandbox_service: EnhancedSandboxSpecService = Depends(get_enhanced_sandbox_service)
) -> List[str]:
"""List sandbox specs available to the current user."""
return sandbox_service.get_available_sandbox_specs(current_user, user_groups)
@router.post("/")
async def register_sandbox_spec(
spec_data: EnhancedSandboxSpecInfo,
current_user: str = Depends(require_admin),
sandbox_service: EnhancedSandboxSpecService = Depends(get_enhanced_sandbox_service)
) -> Dict[str, str]:
"""Register a new sandbox spec (admin only)."""
try:
spec_key = sandbox_service.register_sandbox_spec(spec_data, current_user)
return {"spec_id": spec_key, "status": "registered"}
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
@router.post("/upload")
async def upload_custom_image(
dockerfile: UploadFile = File(...),
context: UploadFile = File(...),
spec: UploadFile = File(...),
current_user: str = Depends(get_current_user),
sandbox_service: EnhancedSandboxSpecService = Depends(get_enhanced_sandbox_service)
) -> Dict[str, str]:
"""Upload custom image with Dockerfile and context (with security validation)."""
try:
# Parse spec file
spec_content = await spec.read()
spec_data = yaml.safe_load(spec_content)
# Validate user has permission to create specs
if not _can_user_create_specs(current_user):
raise HTTPException(status_code=403, detail="User not authorized to create custom specs")
# Security validation of Dockerfile
dockerfile_content = await dockerfile.read()
_validate_dockerfile_security(dockerfile_content)
# Build image (implementation depends on build system)
image_id = await _build_custom_image(dockerfile_content, context, current_user)
# Create enhanced spec
enhanced_spec = EnhancedSandboxSpecInfo(**spec_data)
enhanced_spec.id = image_id
enhanced_spec.permissions.owner = current_user
# Register the spec
spec_key = sandbox_service.register_sandbox_spec(enhanced_spec, current_user)
return {"spec_id": spec_key, "image_id": image_id, "status": "uploaded"}
except Exception as e:
raise HTTPException(status_code=400, detail=f"Upload failed: {str(e)}")
```
## 5. Implementation Plan
All implementation must pass existing lints and tests. New functionality requires comprehensive test coverage including unit tests, integration tests, and end-to-end scenarios.
### 5.1 Enhanced Sandbox Models and Permissions (M1)
#### 5.1.1 Enhanced Sandbox Specification Models
* `openhands/app_server/sandbox/sandbox_spec_models.py` (enhanced)
* `tests/unit/app_server/sandbox/test_enhanced_sandbox_spec_models.py`
Extend existing `SandboxSpecInfo` with `EnhancedSandboxSpecInfo` including agent metadata, permissions, and resource requirements. This is the **core requirement** identified by the engineer.
#### 5.1.2 Permission System Foundation
* `openhands/server/auth/permissions.py`
* `tests/unit/server/auth/test_permissions.py`
Implement user and group-based permission system for sandbox spec access control. This addresses the **security concerns** from V0 mentioned by the engineer.
**Demo**: Create enhanced sandbox specs with permission restrictions and verify access control works correctly.
### 5.2 Enhanced Sandbox Service (M2)
#### 5.2.1 Permission-Aware Sandbox Service
* `openhands/app_server/sandbox/enhanced_sandbox_spec_service.py`
* `tests/unit/app_server/sandbox/test_enhanced_sandbox_spec_service.py`
Extend existing `SandboxSpecService` with permission checks and enhanced spec management. This **builds on existing infrastructure** as the engineer suggested.
#### 5.2.2 Agent Server Startup Integration
* `openhands-agent-server/openhands/agent_server/custom_agent_loader.py`
* `tests/unit/agent_server/test_custom_agent_loader.py`
Implement custom agent loading mechanism in agent server startup process with configuration-driven agent instantiation.
**Demo**: Deploy custom agents using enhanced sandbox specs and verify permission-based access control works end-to-end.
### 5.3 Image Management and API Integration (M3)
#### 5.3.1 Secure Image Management
* `openhands/app_server/sandbox/image_builder.py`
* `openhands/app_server/security/dockerfile_validator.py`
* `tests/unit/app_server/sandbox/test_image_builder.py`
* `tests/unit/app_server/security/test_dockerfile_validator.py`
Implement both **pre-built image registration** and **secure user upload** workflows as identified by the engineer. This addresses the security issues from V0.
#### 5.3.2 Enhanced Conversation API
* `openhands/server/routes/conversation_routes.py` (enhanced)
* `openhands/server/routes/sandbox_spec_routes.py` (new)
* `tests/unit/server/routes/test_enhanced_conversation_routes.py`
* `tests/unit/server/routes/test_sandbox_spec_routes.py`
Enhance existing conversation creation API to support `sandbox_spec_id` parameter and add new sandbox spec management endpoints.
**Demo**: Create conversations with custom sandbox specs through existing API endpoints and demonstrate both pre-built and user-uploaded image workflows.
### 5.4 Advanced Security and Management (M4)
#### 5.4.1 Image Security Validation
* `openhands/app_server/security/image_scanner.py`
* `openhands/app_server/security/security_policies.py`
* `tests/unit/app_server/security/test_image_scanner.py`
Implement comprehensive security validation including image vulnerability scanning, Dockerfile analysis, and approval workflows.
#### 5.4.2 Spec Registry and Lifecycle Management
* `openhands/app_server/sandbox/spec_registry.py`
* `openhands/app_server/sandbox/spec_lifecycle.py`
* `tests/unit/app_server/sandbox/test_spec_registry.py`
Add persistent storage for enhanced sandbox specs, version management, and lifecycle policies (deprecation, cleanup).
**Demo**: Deploy multiple custom agents with different permission levels, demonstrate security validation workflows, and show proper spec lifecycle management.
---
## Key Alignment with Engineer's Approach
This revised implementation plan directly addresses the engineer's requirements:
1. **✅ Uses existing sandbox specs system** - Enhanced rather than replaced
2. **✅ Permissions as core requirement** - Moved to M1 instead of M4
3. **✅ Two image management approaches** - Pre-built registration and secure user uploads
4. **✅ Security-first design** - Addresses V0 security issues with comprehensive validation
5. **✅ Minimal infrastructure changes** - Builds on existing `SandboxSpecService` and conversation APIs

View File

@@ -0,0 +1,934 @@
# Dynamic Custom Agent Package Loading (Scenario 2)
## 1. Introduction
### 1.1 Problem Statement
OpenHands V1 architecture uses a fixed agent server image (`ghcr.io/openhands/agent-server:5f62cee-python`) that contains the default agent implementation. Users who want to customize agent behavior with pure Python packages that don't require additional system dependencies currently have no supported mechanism to deploy their custom agents without building entirely new Docker images.
This creates unnecessary complexity for the common use case where users simply want to:
- Customize agent prompts and reasoning logic
- Add new Python-based tools and capabilities
- Integrate with Python APIs and libraries already available in the base image
- Deploy agents with different LLM configurations or specialized workflows
The current approach forces all customization through the heavyweight Scenario 1 path (custom Docker images), even when the base agent server image already contains all necessary dependencies.
### 1.2 Proposed Solution
We propose implementing **Dynamic Custom Agent Package Loading** within the existing V1 agent server container. This allows users to deploy custom agents by providing Python packages that are downloaded, installed, and instantiated at runtime without requiring custom Docker images.
Users will be able to:
1. Package their custom agents as standard Python packages (pip-installable)
2. Specify agent package URLs (Git repositories, PyPI packages, or ZIP archives) in conversation creation
3. Have the agent server dynamically download and install the package at startup
4. Instantiate their custom agent within the existing container environment
5. Maintain full compatibility with the existing HTTP API (`/ask_agent` endpoint)
The solution leverages the existing V1 architecture's agent server container but extends the startup process to support dynamic agent loading based on environment configuration.
**Trade-offs**: This approach is limited to Python packages that can run within the existing agent server environment. Users needing custom system dependencies, non-Python tools, or different base images must use Scenario 1. However, this covers the majority of agent customization use cases with significantly reduced complexity.
## 2. User Interface
### 2.1 Custom Agent Package Structure
Users create a standard Python package with the required interface:
```python
# my_custom_agent/
setup.py
requirements.txt # Optional additional dependencies
my_custom_agent/
__init__.py
agent.py # Main agent implementation
tools.py # Custom tools (optional)
config.py # Agent configuration
```
### 2.2 Agent Implementation
```python
# my_custom_agent/agent.py
from openhands.sdk.agent.base import AgentBase
from openhands.sdk.llm import LLM
from openhands.sdk.tool import Tool
from typing import List, Dict, Any
class MyCustomAgent(AgentBase):
"""Custom agent with specialized behavior."""
def __init__(self, llm: LLM, tools: List[Tool], config: Dict[str, Any] = None):
super().__init__(llm=llm, tools=tools)
self.config = config or {}
async def initialize(self) -> None:
"""Initialize custom agent resources."""
# Custom initialization logic
pass
# Factory function for agent creation
def create_agent(llm: LLM, tools: List[Tool], config: Dict[str, Any] = None) -> AgentBase:
"""Factory function to create the custom agent."""
return MyCustomAgent(llm=llm, tools=tools, config=config)
```
### 2.3 Package Entry Point
```python
# my_custom_agent/__init__.py
from .agent import create_agent, MyCustomAgent
__all__ = ['create_agent', 'MyCustomAgent']
```
### 2.4 Setup Configuration
```python
# setup.py
from setuptools import setup, find_packages
setup(
name="my-custom-agent",
version="1.0.0",
packages=find_packages(),
install_requires=[
# Only additional dependencies beyond base image
"requests>=2.25.0",
"beautifulsoup4>=4.9.0",
],
entry_points={
'openhands.agents': [
'my_custom_agent = my_custom_agent:create_agent',
],
},
)
```
### 2.5 Conversation Creation with Dynamic Agent Loading
Users create conversations by specifying the agent package URL:
```bash
# Create conversation with custom agent package
curl -X POST "https://api.openhands.ai/api/conversations" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"agent_package_url": "git+https://github.com/user/my-custom-agent.git",
"initial_message": "Help me analyze this codebase",
"workspace": {
"type": "local",
"working_dir": "/workspace/project"
}
}'
```
Alternative package sources:
```bash
# PyPI package
"agent_package_url": "my-custom-agent==1.0.0"
# ZIP archive
"agent_package_url": "https://example.com/agents/my-custom-agent.zip"
# Private Git repository
"agent_package_url": "git+https://token@github.com/private/agent.git"
```
## 3. Other Context
### 3.1 Current V1 Architecture Overview
The OpenHands V1 architecture follows a distributed service model with clear separation between the main application server and agent execution environment. Understanding this architecture is crucial for implementing dynamic agent loading.
#### 3.1.1 Service Separation
The V1 system consists of three primary components:
1. **Main Server** (`openhands/app_server/`): Handles user requests, conversation management, and sandbox orchestration
2. **Agent Server** (`software-agent-sdk/openhands-agent-server/`): Executes agent logic and manages conversation state
3. **Action Execution Server**: Handles tool execution (bash commands, file operations) within sandboxed environments
#### 3.1.2 Communication Flow
The current communication pattern follows this sequence:
```
User Request → Main Server → HTTP API → Agent Server → Agent Instance → Tools → Response
```
This separation allows for:
- **Isolation**: Agent execution is isolated from the main application
- **Scalability**: Multiple agent servers can be spawned for different conversations
- **Security**: Sandboxed execution prevents agent actions from affecting the host system
- **Flexibility**: Different agent configurations can be deployed without affecting the main server
#### 3.1.3 Container Orchestration
The main server uses `DockerSandboxSpecService` to create and manage agent server containers:
- **Image Selection**: Currently hardcoded to `ghcr.io/openhands/agent-server:5f62cee-python`
- **Environment Configuration**: Passed via `initial_env` in `SandboxSpecInfo`
- **Network Isolation**: Each conversation gets its own container instance
- **Resource Management**: Memory and CPU limits enforced at container level
### 3.2 Agent Server Internal Architecture
#### 3.2.1 FastAPI Application Structure
The agent server is built as a FastAPI application with these key components:
- **Conversation Router** (`conversation_router.py`): Handles HTTP endpoints for agent interaction
- **Conversation Service** (`conversation_service.py`): Manages conversation lifecycle and state
- **Event Service** (`event_service.py`): Processes agent actions and observations
- **Dependencies** (`dependencies.py`): Provides dependency injection for services
#### 3.2.2 Agent Instantiation Pattern
Currently, agents are instantiated during server startup using a fixed pattern:
```python
# Simplified current pattern
agent = Agent(
llm=LLM(model="default-model", api_key="..."),
tools=[TerminalTool(), FileEditorTool(), ...]
)
```
This creates a single agent instance that serves all requests to that container.
#### 3.2.3 Request Processing Flow
When the `/ask_agent` endpoint receives a request:
1. **Request Validation**: `AskAgentRequest` is validated and parsed
2. **Conversation Lookup**: Conversation state is retrieved or created
3. **Agent Invocation**: The fixed agent instance processes the question
4. **Response Formatting**: Result is wrapped in `AskAgentResponse`
5. **HTTP Response**: JSON response sent back to main server
### 3.3 Software Agent SDK Integration Points
#### 3.3.1 Agent Interface Requirements
The `software-agent-sdk` defines the contract that all agents must follow:
- **AgentBase**: Abstract base class requiring `llm` and `tools` parameters
- **Tool Integration**: Agents must work with the standardized tool system
- **Event Handling**: Agents process events through the conversation framework
- **State Management**: Agents maintain conversation context through event streams
#### 3.3.2 Tool System Architecture
The tool system provides the foundation for agent capabilities:
- **Tool Registration**: Tools are registered globally and resolved by name
- **Execution Framework**: `ToolExecutor` classes handle action execution
- **Built-in Tools**: Standard tools (Terminal, FileEditor, Browser) are always available
- **Custom Tools**: Additional tools can be registered through the plugin system
#### 3.3.3 LLM Integration
Agents interact with language models through the SDK's LLM abstraction:
- **Provider Agnostic**: Supports multiple LLM providers through unified interface
- **Configuration**: LLM settings (model, API keys, parameters) are configurable
- **Response Processing**: Structured handling of LLM responses and tool calls
### 3.4 Dynamic Loading Technical Foundation
#### 3.4.1 Python Package Management
Our dynamic loading approach leverages Python's built-in package management:
- **pip install**: Supports Git repositories, PyPI packages, and archive files
- **importlib**: Enables runtime module importing and class instantiation
- **entry_points**: Provides standardized plugin discovery mechanism
- **sys.path**: Allows dynamic modification of Python module search paths
#### 3.4.2 Container Environment Considerations
The agent server container provides a controlled environment for dynamic loading:
- **Python Runtime**: Pre-installed Python 3.x with pip and common libraries
- **Network Access**: Required for downloading packages from external sources
- **File System**: Writable areas for package installation and caching
- **Security Context**: Isolated from host system with appropriate permissions
#### 3.4.3 State Management Implications
Dynamic agent loading affects conversation state management:
- **Agent Persistence**: Custom agents must maintain state across requests
- **Configuration Isolation**: Different conversations can use different agent configurations
- **Resource Cleanup**: Proper cleanup of agent resources when conversations end
- **Error Recovery**: Fallback mechanisms when custom agents fail to load or execute
## 4. Technical Design
### 4.1 Current V1 Agent Instantiation Flow
To understand how our proposal integrates with the existing system, it's important to first examine how agents are currently instantiated and executed in the V1 architecture.
#### 4.1.1 Current Agent Server Startup Process
In the current V1 flow, agent instantiation follows this sequence:
1. **Main Server Request**: When a user creates a conversation, the main server (`openhands/app_server`) creates a sandbox specification via `DockerSandboxSpecService.get_default_sandbox_specs()`
2. **Container Launch**: The sandbox service launches the agent server container using the hardcoded image `ghcr.io/openhands/agent-server:5f62cee-python`
3. **Agent Server Initialization**: The agent server container starts with the command `['--port', '8000']` and initializes a FastAPI application
4. **Default Agent Creation**: During startup, the agent server creates a default agent instance (typically from the software-agent-sdk) with standard tools and configuration
5. **HTTP API Ready**: The agent server exposes the `/api/conversations/{id}/ask_agent` endpoint, routing requests to the default agent instance
#### 4.1.2 Current Agent Execution Flow
When a user sends a message through the V1 API:
1. **HTTP Request**: Main server makes POST request to `http://agent-server:8000/api/conversations/{id}/ask_agent`
2. **Agent Router**: `conversation_router.py` receives the request and extracts the `AskAgentRequest`
3. **Conversation Service**: `ConversationService.ask_agent()` method is called with the user's question
4. **Event Service**: The request is forwarded to `EventService.ask_agent()` which manages the conversation state
5. **Agent Execution**: The default agent processes the question using its configured LLM and tools
6. **Response Return**: The agent's response is returned through the same HTTP chain back to the main server
#### 4.1.3 Limitations of Current Approach
The current system has several limitations for custom agent deployment:
- **Fixed Agent Implementation**: The agent server container contains a single, hardcoded agent implementation
- **Static Configuration**: Agent behavior cannot be modified without rebuilding the entire container
- **No Runtime Customization**: Users cannot specify different agent types or configurations per conversation
- **Deployment Complexity**: Any agent customization requires building and maintaining custom Docker images
### 4.2 Proposed Dynamic Agent Loading Architecture
Our proposal extends the current V1 flow by introducing dynamic agent loading capabilities while maintaining full backward compatibility with existing APIs and infrastructure.
#### 4.2.1 Enhanced Agent Server Startup Process
The modified startup process introduces agent selection based on environment configuration:
1. **Environment Detection**: During agent server startup, check for `CUSTOM_AGENT_PACKAGE_URL` environment variable
2. **Conditional Loading**: If custom agent URL is present, download and install the package; otherwise use default agent
3. **Agent Factory Creation**: Create an agent factory function that can instantiate either custom or default agents
4. **HTTP API Registration**: Register the same `/ask_agent` endpoint, but route to the dynamically selected agent
#### 4.2.2 Dynamic Package Installation Process
When a custom agent package URL is detected, the system performs these steps:
1. **Package Download**: Use `pip install` to download the package from Git, PyPI, or ZIP sources
2. **Dependency Resolution**: Install any additional Python dependencies specified in the package
3. **Module Import**: Use `importlib` to dynamically import the custom agent module
4. **Agent Instantiation**: Call the package's `create_agent()` factory function with LLM and tools
5. **Initialization**: Execute any custom initialization logic defined by the agent
6. **Caching**: Cache the agent instance for reuse across multiple requests
#### 4.2.3 Modified Execution Flow
The execution flow remains largely unchanged from the user's perspective, but internally:
1. **Same HTTP API**: The `/ask_agent` endpoint signature and behavior remain identical
2. **Dynamic Routing**: Requests are routed to either custom or default agent based on startup configuration
3. **Transparent Operation**: The main server is unaware of whether it's communicating with a custom or default agent
4. **Consistent Response Format**: All agents return responses in the same `AskAgentResponse` format
### 4.3 Integration Points and Modifications
#### 4.3.1 Sandbox Service Modifications
The main server's sandbox service requires minimal changes to support dynamic agent loading:
```python
# Current: Fixed environment for all conversations
def get_default_sandbox_specs():
return [SandboxSpecInfo(
id=AGENT_SERVER_IMAGE,
command=['--port', '8000'],
initial_env={...} # Standard environment
)]
# Enhanced: Dynamic environment based on conversation requirements
def create_dynamic_agent_sandbox_spec(agent_package_url: str):
return SandboxSpecInfo(
id=AGENT_SERVER_IMAGE, # Same base image
command=['--port', '8000'],
initial_env={
...standard_env,
'CUSTOM_AGENT_PACKAGE_URL': agent_package_url # New variable
}
)
```
#### 4.3.2 Agent Server Startup Modifications
The agent server startup process is enhanced to detect and load custom agents:
```python
# Current: Fixed agent creation
@app.on_event("startup")
async def startup_event():
app.state.agent = DefaultAgent(llm=default_llm, tools=default_tools)
# Enhanced: Dynamic agent creation
@app.on_event("startup")
async def startup_event():
custom_agent_url = os.getenv('CUSTOM_AGENT_PACKAGE_URL')
if custom_agent_url:
loader = DynamicAgentLoader()
app.state.agent = await loader.load_agent_from_url(custom_agent_url, ...)
else:
app.state.agent = DefaultAgent(llm=default_llm, tools=default_tools)
```
#### 4.3.3 Conversation Service Integration
The conversation service routing logic is updated to use the dynamically loaded agent:
```python
# Current: Direct agent usage
async def ask_agent(self, conversation_id: UUID, question: str) -> str:
event_service = self.event_services[conversation_id]
return await event_service.ask_agent(question)
# Enhanced: Dynamic agent resolution
async def ask_agent(self, conversation_id: UUID, question: str) -> str:
event_service = self.event_services[conversation_id]
# Agent is now dynamically determined at startup
return await event_service.ask_agent(question)
```
### 4.4 Dynamic Agent Loading Implementation
#### 4.4.1 Agent Package Loader
```python
# openhands/agent_server/dynamic_agent_loader.py
import subprocess
import importlib
import tempfile
import os
import sys
from typing import Dict, Any, Optional
from urllib.parse import urlparse
from pathlib import Path
from openhands.sdk.agent.base import AgentBase
from openhands.sdk.llm import LLM
from openhands.sdk.tool import Tool
from openhands.sdk.logger import get_logger
logger = get_logger(__name__)
class DynamicAgentLoader:
"""Loads custom agents from package URLs at runtime."""
def __init__(self):
self.installed_packages: Dict[str, str] = {}
async def load_agent_from_url(
self,
package_url: str,
llm: LLM,
tools: list[Tool],
config: Optional[Dict[str, Any]] = None
) -> AgentBase:
"""Load and instantiate agent from package URL."""
# Check if already installed
if package_url in self.installed_packages:
package_name = self.installed_packages[package_url]
return await self._create_agent_instance(package_name, llm, tools, config)
# Install the package
package_name = await self._install_package(package_url)
self.installed_packages[package_url] = package_name
# Create agent instance
return await self._create_agent_instance(package_name, llm, tools, config)
async def _install_package(self, package_url: str) -> str:
"""Install package from URL and return package name."""
logger.info(f"Installing custom agent package: {package_url}")
try:
# Install package using pip
result = subprocess.run([
sys.executable, "-m", "pip", "install", package_url
], capture_output=True, text=True, check=True)
logger.info(f"Package installation successful: {result.stdout}")
# Extract package name from URL
package_name = self._extract_package_name(package_url)
return package_name
except subprocess.CalledProcessError as e:
logger.error(f"Failed to install package {package_url}: {e.stderr}")
raise RuntimeError(f"Package installation failed: {e.stderr}")
def _extract_package_name(self, package_url: str) -> str:
"""Extract package name from various URL formats."""
if package_url.startswith('git+'):
# Git URL: extract repo name
url = package_url.replace('git+', '')
return Path(urlparse(url).path).stem
elif '==' in package_url:
# PyPI with version: extract package name
return package_url.split('==')[0]
elif package_url.endswith('.zip'):
# ZIP file: extract filename
return Path(urlparse(package_url).path).stem
else:
# Assume it's a simple package name
return package_url
async def _create_agent_instance(
self,
package_name: str,
llm: LLM,
tools: list[Tool],
config: Optional[Dict[str, Any]] = None
) -> AgentBase:
"""Create agent instance from installed package."""
try:
# Import the package
module = importlib.import_module(package_name)
# Look for create_agent function
if hasattr(module, 'create_agent'):
create_agent_func = getattr(module, 'create_agent')
agent = create_agent_func(llm=llm, tools=tools, config=config)
else:
# Fallback: look for agent class
agent_classes = [
attr for attr in dir(module)
if (isinstance(getattr(module, attr), type) and
issubclass(getattr(module, attr), AgentBase) and
getattr(module, attr) != AgentBase)
]
if not agent_classes:
raise RuntimeError(f"No agent class found in package {package_name}")
agent_class = getattr(module, agent_classes[0])
agent = agent_class(llm=llm, tools=tools, config=config)
# Initialize the agent
if hasattr(agent, 'initialize'):
await agent.initialize()
logger.info(f"Successfully created agent from package: {package_name}")
return agent
except Exception as e:
logger.error(f"Failed to create agent from package {package_name}: {e}")
raise RuntimeError(f"Agent instantiation failed: {e}")
```
#### 4.1.2 Agent Server Integration
```python
# Modified openhands/agent_server/conversation_service.py
import os
from typing import Optional
from openhands.agent_server.dynamic_agent_loader import DynamicAgentLoader
from openhands.sdk.agent.base import AgentBase
from openhands.sdk.agent import Agent # Default agent
class ConversationService:
"""Enhanced conversation service with dynamic agent loading."""
def __init__(self, config: Config):
self.config = config
self.agent_loader = DynamicAgentLoader()
self._default_agent_factory = None
self._custom_agent_cache: Dict[str, AgentBase] = {}
async def _get_or_create_agent(
self,
conversation_id: UUID,
llm: LLM,
tools: list[Tool]
) -> AgentBase:
"""Get or create agent for conversation."""
# Check for custom agent package URL in environment
custom_agent_url = os.getenv('CUSTOM_AGENT_PACKAGE_URL')
if custom_agent_url:
# Use custom agent
if custom_agent_url not in self._custom_agent_cache:
agent = await self.agent_loader.load_agent_from_url(
package_url=custom_agent_url,
llm=llm,
tools=tools,
config=self._get_agent_config()
)
self._custom_agent_cache[custom_agent_url] = agent
return self._custom_agent_cache[custom_agent_url]
else:
# Use default agent
if not self._default_agent_factory:
self._default_agent_factory = Agent(llm=llm, tools=tools)
return self._default_agent_factory
def _get_agent_config(self) -> Dict[str, Any]:
"""Extract agent configuration from environment."""
config = {}
# Parse JSON config if provided
config_json = os.getenv('CUSTOM_AGENT_CONFIG')
if config_json:
import json
config.update(json.loads(config_json))
return config
```
### 4.2 Sandbox Service Integration
#### 4.2.1 Enhanced Sandbox Specification
```python
# openhands/app_server/sandbox/docker_sandbox_spec_service.py
from typing import Optional
class DockerSandboxSpecService(SandboxSpecService):
"""Enhanced sandbox service supporting dynamic agent loading."""
def create_dynamic_agent_sandbox_spec(
self,
agent_package_url: str,
agent_config: Optional[Dict[str, Any]] = None
) -> SandboxSpecInfo:
"""Create sandbox spec with dynamic agent loading configuration."""
# Base environment from existing implementation
base_env = {
'OPENVSCODE_SERVER_ROOT': '/openhands/.openvscode-server',
'OH_ENABLE_VNC': '0',
'LOG_JSON': 'true',
'OH_CONVERSATIONS_PATH': '/workspace/conversations',
'OH_BASH_EVENTS_DIR': '/workspace/bash_events',
'PYTHONUNBUFFERED': '1',
'ENV_LOG_LEVEL': '20',
}
# Add dynamic agent configuration
dynamic_env = {
**base_env,
'CUSTOM_AGENT_PACKAGE_URL': agent_package_url,
}
# Add agent configuration as JSON if provided
if agent_config:
import json
dynamic_env['CUSTOM_AGENT_CONFIG'] = json.dumps(agent_config)
return SandboxSpecInfo(
id=AGENT_SERVER_IMAGE, # Same base image
command=['--port', '8000'],
initial_env=dynamic_env,
working_dir='/workspace/project',
)
```
#### 4.2.2 Conversation Creation API Enhancement
```python
# openhands/server/routes/conversation_routes.py
from pydantic import BaseModel
from typing import Optional, Dict, Any
class CreateConversationRequest(BaseModel):
"""Enhanced conversation creation request."""
initial_message: str
workspace_config: Optional[Dict[str, Any]] = None
# New field for dynamic agent loading
agent_package_url: Optional[str] = None
agent_config: Optional[Dict[str, Any]] = None
@router.post("/conversations")
async def create_conversation(
request: CreateConversationRequest,
sandbox_service: DockerSandboxSpecService = Depends(get_sandbox_service)
) -> ConversationResponse:
"""Create conversation with optional dynamic agent loading."""
if request.agent_package_url:
# Create sandbox with dynamic agent loading
sandbox_spec = sandbox_service.create_dynamic_agent_sandbox_spec(
agent_package_url=request.agent_package_url,
agent_config=request.agent_config
)
else:
# Use default sandbox specification
sandbox_spec = sandbox_service.get_default_sandbox_specs()[0]
# Create sandbox and conversation
sandbox = await sandbox_service.create_sandbox(sandbox_spec)
await wait_for_agent_server_ready(sandbox)
conversation = await create_conversation_with_sandbox(
sandbox=sandbox,
initial_message=request.initial_message,
workspace_config=request.workspace_config
)
return ConversationResponse(
conversation_id=conversation.id,
status="created",
agent_type="custom" if request.agent_package_url else "default"
)
```
### 4.3 Agent Server Startup Process
#### 4.3.1 Enhanced Agent Server Initialization
```python
# openhands/agent_server/api.py startup modification
import os
from openhands.agent_server.dynamic_agent_loader import DynamicAgentLoader
@app.on_event("startup")
async def startup_event():
"""Enhanced startup with dynamic agent loading support."""
# Initialize dynamic agent loader
app.state.agent_loader = DynamicAgentLoader()
# Check for custom agent package URL
custom_agent_url = os.getenv('CUSTOM_AGENT_PACKAGE_URL')
if custom_agent_url:
logger.info(f"Dynamic agent loading enabled: {custom_agent_url}")
# Pre-validate package URL (optional)
try:
await app.state.agent_loader._install_package(custom_agent_url)
logger.info("Custom agent package pre-installed successfully")
except Exception as e:
logger.warning(f"Custom agent pre-installation failed: {e}")
# Continue with startup - will retry on first conversation
else:
logger.info("Using default agent configuration")
```
### 4.4 Error Handling and Fallback
#### 4.4.1 Robust Error Handling
```python
# openhands/agent_server/dynamic_agent_loader.py (enhanced)
class DynamicAgentLoader:
"""Enhanced loader with comprehensive error handling."""
async def load_agent_with_fallback(
self,
package_url: str,
llm: LLM,
tools: list[Tool],
config: Optional[Dict[str, Any]] = None
) -> AgentBase:
"""Load custom agent with fallback to default agent."""
try:
return await self.load_agent_from_url(package_url, llm, tools, config)
except Exception as e:
logger.error(f"Custom agent loading failed: {e}")
logger.info("Falling back to default agent")
# Import default agent
from openhands.sdk.agent import Agent
return Agent(llm=llm, tools=tools)
async def _validate_package_url(self, package_url: str) -> bool:
"""Validate package URL accessibility."""
try:
if package_url.startswith('git+'):
# Validate Git repository access
import subprocess
result = subprocess.run([
'git', 'ls-remote', package_url.replace('git+', '')
], capture_output=True, timeout=30)
return result.returncode == 0
elif package_url.startswith('http'):
# Validate HTTP URL accessibility
import httpx
async with httpx.AsyncClient() as client:
response = await client.head(package_url, timeout=30)
return response.status_code == 200
else:
# Assume PyPI package - always return True
return True
except Exception:
return False
```
### 4.5 Security and Isolation
#### 4.5.1 Package Security Validation
```python
# openhands/agent_server/security/package_validator.py
import re
from typing import List, Set
from urllib.parse import urlparse
class PackageSecurityValidator:
"""Validates custom agent packages for security compliance."""
ALLOWED_DOMAINS: Set[str] = {
'github.com',
'gitlab.com',
'bitbucket.org',
'pypi.org',
'files.pythonhosted.org'
}
BLOCKED_PACKAGES: Set[str] = {
# Add known malicious packages
}
def validate_package_url(self, package_url: str) -> bool:
"""Validate package URL against security policies."""
# Check blocked packages
if self._is_blocked_package(package_url):
return False
# Validate domain for HTTP/Git URLs
if package_url.startswith(('http', 'git+')):
parsed = urlparse(package_url.replace('git+', ''))
if parsed.hostname not in self.ALLOWED_DOMAINS:
return False
# Additional security checks
return self._validate_package_name(package_url)
def _is_blocked_package(self, package_url: str) -> bool:
"""Check if package is in blocklist."""
for blocked in self.BLOCKED_PACKAGES:
if blocked in package_url.lower():
return True
return False
def _validate_package_name(self, package_url: str) -> bool:
"""Validate package name format."""
# Basic validation for malicious patterns
malicious_patterns = [
r'\.\./', # Path traversal
r'[;&|`$]', # Command injection
r'<script', # XSS attempts
]
for pattern in malicious_patterns:
if re.search(pattern, package_url, re.IGNORECASE):
return False
return True
```
## 5. Implementation Plan
All implementation must pass existing lints and tests. New functionality requires comprehensive test coverage including unit tests, integration tests, and end-to-end scenarios.
### 5.1 Dynamic Agent Loading Foundation (M1)
#### 5.1.1 Dynamic Agent Loader Implementation
* `openhands/agent_server/dynamic_agent_loader.py`
* `tests/unit/agent_server/test_dynamic_agent_loader.py`
Implement core dynamic agent loading functionality with package installation, module importing, and agent instantiation.
#### 5.1.2 Package Security Validation
* `openhands/agent_server/security/package_validator.py`
* `tests/unit/agent_server/security/test_package_validator.py`
Add security validation for custom agent packages including domain allowlists and malicious pattern detection.
**Demo**: Load a simple custom agent from a Git repository and verify it responds to basic queries through the existing `/ask_agent` HTTP API.
### 5.2 Sandbox Service Integration (M2)
#### 5.2.1 Enhanced Sandbox Specification
* `openhands/app_server/sandbox/docker_sandbox_spec_service.py` (modifications)
* `tests/unit/app_server/sandbox/test_docker_sandbox_spec_service.py` (enhancements)
Extend existing sandbox service to support dynamic agent loading configuration through environment variables.
#### 5.2.2 Agent Server Startup Integration
* `openhands/agent_server/conversation_service.py` (modifications)
* `openhands/agent_server/api.py` (startup enhancements)
* `tests/unit/agent_server/test_conversation_service.py` (enhancements)
Integrate dynamic agent loading into agent server startup and conversation management processes.
**Demo**: Create conversations with custom agents specified via environment variables and demonstrate proper agent instantiation and tool execution.
### 5.3 API Integration (M3)
#### 5.3.1 Enhanced Conversation Creation API
* `openhands/server/routes/conversation_routes.py` (modifications)
* `tests/unit/server/routes/test_conversation_routes.py` (enhancements)
Extend conversation creation API to accept agent package URLs and configuration parameters.
#### 5.3.2 Error Handling and Fallback
* `openhands/agent_server/dynamic_agent_loader.py` (enhancements)
* `tests/unit/agent_server/test_dynamic_agent_fallback.py`
Implement comprehensive error handling with fallback to default agents when custom agent loading fails.
**Demo**: Create conversations through API endpoints with various package URL formats (Git, PyPI, ZIP) and demonstrate proper error handling and fallback behavior.
### 5.4 Advanced Features and Optimization (M4)
#### 5.4.1 Agent Caching and Performance
* `openhands/agent_server/agent_cache.py`
* `tests/unit/agent_server/test_agent_cache.py`
Implement agent instance caching to avoid repeated package installation and improve performance for multiple conversations with the same custom agent.
#### 5.4.2 Package Management and Cleanup
* `openhands/agent_server/package_manager.py`
* `tests/unit/agent_server/test_package_manager.py`
Add package lifecycle management including cleanup of unused packages and version management for package updates.
**Demo**: Deploy multiple conversations with different custom agents simultaneously and demonstrate proper resource management, caching, and cleanup behavior.
---
## References
This design document is based on analysis of the following source materials:
1. **OpenHands V1 Architecture**: Analysis of `openhands/app_server/sandbox/docker_sandbox_spec_service.py` and `openhands/app_server/event_callback/github_v1_callback_processor.py` for understanding the V1 flow and agent server integration.
2. **Software Agent SDK**: Analysis of the `software-agent-sdk` repository, specifically:
- `openhands-agent-server/openhands/agent_server/conversation_router.py` for HTTP API patterns
- `openhands-sdk/openhands/sdk/agent/base.py` for agent interface requirements
- `examples/01_standalone_sdk/02_custom_tools.py` for custom agent implementation patterns
3. **Agent Server Models**: Analysis of `openhands.agent_server.models` imports in the main OpenHands codebase for understanding the API contract between main server and agent server.
4. **Container Architecture**: Analysis of `AGENT_SERVER_IMAGE` constant usage in `openhands/app_server/sandbox/sandbox_spec_service.py` for understanding the current container deployment model.
All technical specifications and implementation details are derived from examination of the existing codebase and established patterns within the OpenHands ecosystem.

View File

@@ -7,7 +7,7 @@ services:
image: openhands:latest
container_name: openhands-app-${DATE:-}
environment:
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.all-hands.dev/openhands/runtime:0.59-nikolaik}
- SANDBOX_RUNTIME_CONTAINER_IMAGE=${SANDBOX_RUNTIME_CONTAINER_IMAGE:-docker.openhands.dev/openhands/runtime:0.62-nikolaik}
#- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of ~/.openhands for this user
- WORKSPACE_MOUNT_PATH=${WORKSPACE_BASE:-$PWD/workspace}
ports:

View File

@@ -5,12 +5,8 @@ from experiments.constants import (
EXPERIMENT_SYSTEM_PROMPT_EXPERIMENT,
)
from experiments.experiment_versions import (
handle_condenser_max_step_experiment,
handle_system_prompt_experiment,
)
from experiments.experiment_versions._004_condenser_max_step_experiment import (
handle_condenser_max_step_experiment__v1,
)
from openhands.core.config.openhands_config import OpenHandsConfig
from openhands.core.logger import openhands_logger as logger
@@ -31,10 +27,6 @@ class SaaSExperimentManager(ExperimentManager):
)
return agent
agent = handle_condenser_max_step_experiment__v1(
user_id, conversation_id, agent
)
if EXPERIMENT_SYSTEM_PROMPT_EXPERIMENT:
agent = agent.model_copy(
update={'system_prompt_filename': 'system_prompt_long_horizon.j2'}
@@ -60,20 +52,7 @@ class SaaSExperimentManager(ExperimentManager):
"""
logger.debug(
'experiment_manager:run_conversation_variant_test:started',
extra={'user_id': user_id},
)
# Skip all experiment processing if the experiment manager is disabled
if not ENABLE_EXPERIMENT_MANAGER:
logger.info(
'experiment_manager:run_conversation_variant_test:skipped',
extra={'reason': 'experiment_manager_disabled'},
)
return conversation_settings
# Apply conversation-scoped experiments
conversation_settings = handle_condenser_max_step_experiment(
user_id, conversation_id, conversation_settings
extra={'user_id': user_id, 'conversation_id': conversation_id},
)
return conversation_settings

View File

@@ -292,18 +292,26 @@ class GithubManager(Manager):
f'[GitHub] Created conversation {conversation_id} for user {user_info.username}'
)
# Create a GithubCallbackProcessor
processor = GithubCallbackProcessor(
github_view=github_view,
send_summary_instruction=True,
)
from openhands.server.shared import ConversationStoreImpl, config
# Register the callback processor
register_callback_processor(conversation_id, processor)
logger.info(
f'[Github] Registered callback processor for conversation {conversation_id}'
conversation_store = await ConversationStoreImpl.get_instance(
config, github_view.user_info.keycloak_user_id
)
metadata = await conversation_store.get_metadata(conversation_id)
if metadata.conversation_version != 'v1':
# Create a GithubCallbackProcessor
processor = GithubCallbackProcessor(
github_view=github_view,
send_summary_instruction=True,
)
# Register the callback processor
register_callback_processor(conversation_id, processor)
logger.info(
f'[Github] Registered callback processor for conversation {conversation_id}'
)
# Send message with conversation link
conversation_link = CONVERSATION_URL.format(conversation_id)

View File

@@ -1,4 +1,4 @@
from uuid import uuid4
from uuid import UUID, uuid4
from github import Github, GithubIntegration
from github.Issue import Issue
@@ -26,10 +26,22 @@ from storage.proactive_conversation_store import ProactiveConversationStore
from storage.saas_secrets_store import SaasSecretsStore
from storage.saas_settings_store import SaasSettingsStore
from openhands.agent_server.models import SendMessageRequest
from openhands.app_server.app_conversation.app_conversation_models import (
AppConversationStartRequest,
AppConversationStartTaskStatus,
)
from openhands.app_server.config import get_app_conversation_service
from openhands.app_server.services.injector import InjectorState
from openhands.app_server.user.specifiy_user_context import USER_CONTEXT_ATTR
from openhands.app_server.user.user_context import UserContext
from openhands.app_server.user.user_models import UserInfo
from openhands.core.logger import openhands_logger as logger
from openhands.integrations.github.github_service import GithubServiceImpl
from openhands.integrations.provider import PROVIDER_TOKEN_TYPE, ProviderType
from openhands.integrations.service_types import Comment
from openhands.sdk import TextContent
from openhands.sdk.conversation.secret_source import SecretSource
from openhands.server.services.conversation_service import (
initialize_conversation,
start_conversation,
@@ -43,6 +55,49 @@ from openhands.utils.async_utils import call_sync_from_async
OH_LABEL, INLINE_OH_LABEL = get_oh_labels(HOST)
class GithubUserContext(UserContext):
"""User context for GitHub integration that provides user info without web request."""
def __init__(self, keycloak_user_id: str, git_provider_tokens: PROVIDER_TOKEN_TYPE):
self.keycloak_user_id = keycloak_user_id
self.git_provider_tokens = git_provider_tokens
self.settings_store = SaasSettingsStore(
user_id=self.keycloak_user_id,
session_maker=session_maker,
config=get_config(),
)
self.secrets_store = SaasSecretsStore(
self.keycloak_user_id, session_maker, get_config()
)
async def get_user_id(self) -> str | None:
return self.keycloak_user_id
async def get_user_info(self) -> UserInfo:
user_settings = await self.settings_store.load()
return UserInfo(
id=self.keycloak_user_id,
**user_settings.model_dump(context={'expose_secrets': True}),
)
async def get_authenticated_git_url(self, repository: str) -> str:
# This would need to be implemented based on the git provider tokens
# For now, return a basic HTTPS URL
return f'https://github.com/{repository}.git'
async def get_latest_token(self, provider_type: ProviderType) -> str | None:
# Return the appropriate token from git_provider_tokens
if provider_type == ProviderType.GITHUB and self.git_provider_tokens:
return self.git_provider_tokens.get(ProviderType.GITHUB)
return None
async def get_secrets(self) -> dict[str, SecretSource]:
# Return empty dict for now - GitHub integration handles secrets separately
user_secrets = await self.secrets_store.load()
return dict(user_secrets.custom_secrets) if user_secrets else {}
async def get_user_proactive_conversation_setting(user_id: str | None) -> bool:
"""Get the user's proactive conversation setting.
@@ -76,6 +131,35 @@ async def get_user_proactive_conversation_setting(user_id: str | None) -> bool:
return settings.enable_proactive_conversation_starters
async def get_user_v1_enabled_setting(user_id: str | None) -> bool:
"""Get the user's V1 conversation API setting.
Args:
user_id: The keycloak user ID
Returns:
True if V1 conversations are enabled for this user, False otherwise
"""
# If no user ID is provided, we can't check user settings
if not user_id:
return False
config = get_config()
settings_store = SaasSettingsStore(
user_id=user_id, session_maker=session_maker, config=config
)
settings = await call_sync_from_async(
settings_store.get_user_settings_by_keycloak_id, user_id
)
if not settings or settings.v1_enabled is None:
return False
return settings.v1_enabled
# =================================================
# SECTION: Github view types
# =================================================
@@ -159,6 +243,31 @@ class GithubIssue(ResolverViewInterface):
git_provider_tokens: PROVIDER_TOKEN_TYPE,
conversation_metadata: ConversationMetadata,
):
v1_enabled = await get_user_v1_enabled_setting(self.user_info.keycloak_user_id)
if v1_enabled:
try:
# Use V1 app conversation service
await self._create_v1_conversation(
jinja_env, git_provider_tokens, conversation_metadata
)
return
except Exception as e:
logger.warning(f'Error checking V1 settings, falling back to V0: {e}')
# Use existing V0 conversation service
await self._create_v0_conversation(
jinja_env, git_provider_tokens, conversation_metadata
)
async def _create_v0_conversation(
self,
jinja_env: Environment,
git_provider_tokens: PROVIDER_TOKEN_TYPE,
conversation_metadata: ConversationMetadata,
):
"""Create conversation using the legacy V0 system."""
custom_secrets = await self._get_user_secrets()
user_instructions, conversation_instructions = await self._get_instructions(
@@ -177,6 +286,77 @@ class GithubIssue(ResolverViewInterface):
conversation_instructions=conversation_instructions,
)
async def _create_v1_conversation(
self,
jinja_env: Environment,
git_provider_tokens: PROVIDER_TOKEN_TYPE,
conversation_metadata: ConversationMetadata,
):
"""Create conversation using the new V1 app conversation system."""
user_instructions, conversation_instructions = await self._get_instructions(
jinja_env
)
# Create the initial message request
initial_message = SendMessageRequest(
role='user', content=[TextContent(text=user_instructions)]
)
# Create the GitHub V1 callback processor
github_callback_processor = self._create_github_v1_callback_processor()
# Get the app conversation service and start the conversation
injector_state = InjectorState()
# Create the V1 conversation start request with the callback processor
start_request = AppConversationStartRequest(
conversation_id=UUID(conversation_metadata.conversation_id),
system_message_suffix=conversation_instructions,
initial_message=initial_message,
selected_repository=self.full_repo_name,
git_provider=ProviderType.GITHUB,
title=f'GitHub Issue #{self.issue_number}: {self.title}',
trigger=ConversationTrigger.RESOLVER,
processors=[
github_callback_processor
], # Pass the callback processor directly
)
# Set up the GitHub user context for the V1 system
github_user_context = GithubUserContext(
keycloak_user_id=self.user_info.keycloak_user_id,
git_provider_tokens=git_provider_tokens,
)
setattr(injector_state, USER_CONTEXT_ATTR, github_user_context)
async with get_app_conversation_service(
injector_state
) as app_conversation_service:
async for task in app_conversation_service.start_app_conversation(
start_request
):
if task.status == AppConversationStartTaskStatus.ERROR:
logger.error(f'Failed to start V1 conversation: {task.detail}')
raise RuntimeError(
f'Failed to start V1 conversation: {task.detail}'
)
def _create_github_v1_callback_processor(self):
"""Create a V1 callback processor for GitHub integration."""
from openhands.app_server.event_callback.github_v1_callback_processor import (
GithubV1CallbackProcessor,
)
# Create and return the GitHub V1 callback processor
return GithubV1CallbackProcessor(
github_view_data={
'issue_number': self.issue_number,
'full_repo_name': self.full_repo_name,
'installation_id': self.installation_id,
},
send_summary_instruction=self.send_summary_instruction,
)
@dataclass
class GithubIssueComment(GithubIssue):
@@ -292,6 +472,24 @@ class GithubInlinePRComment(GithubPRComment):
return user_instructions, conversation_instructions
def _create_github_v1_callback_processor(self):
"""Create a V1 callback processor for GitHub integration."""
from openhands.app_server.event_callback.github_v1_callback_processor import (
GithubV1CallbackProcessor,
)
# Create and return the GitHub V1 callback processor
return GithubV1CallbackProcessor(
github_view_data={
'issue_number': self.issue_number,
'full_repo_name': self.full_repo_name,
'installation_id': self.installation_id,
'comment_id': self.comment_id,
},
inline_pr_comment=True,
send_summary_instruction=self.send_summary_instruction,
)
@dataclass
class GithubFailingAction:

View File

@@ -0,0 +1,71 @@
"""add status and updated_at to callback
Revision ID: 080
Revises: 079
Create Date: 2025-11-05 00:00:00.000000
"""
from enum import Enum
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = '080'
down_revision: Union[str, None] = '079'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
class EventCallbackStatus(Enum):
ACTIVE = 'ACTIVE'
DISABLED = 'DISABLED'
COMPLETED = 'COMPLETED'
ERROR = 'ERROR'
def upgrade() -> None:
"""Upgrade schema."""
status = sa.Enum(EventCallbackStatus, name='eventcallbackstatus')
status.create(op.get_bind(), checkfirst=True)
op.add_column(
'event_callback',
sa.Column('status', status, nullable=False, server_default='ACTIVE'),
)
op.add_column(
'event_callback',
sa.Column(
'updated_at', sa.DateTime, nullable=False, server_default=sa.func.now()
),
)
op.drop_index('ix_event_callback_result_event_id')
op.drop_column('event_callback_result', 'event_id')
op.add_column(
'event_callback_result', sa.Column('event_id', sa.String, nullable=True)
)
op.create_index(
op.f('ix_event_callback_result_event_id'),
'event_callback_result',
['event_id'],
unique=False,
)
def downgrade() -> None:
"""Downgrade schema."""
op.drop_column('event_callback', 'status')
op.drop_column('event_callback', 'updated_at')
op.drop_index('ix_event_callback_result_event_id')
op.drop_column('event_callback_result', 'event_id')
op.add_column(
'event_callback_result', sa.Column('event_id', sa.UUID, nullable=True)
)
op.create_index(
op.f('ix_event_callback_result_event_id'),
'event_callback_result',
['event_id'],
unique=False,
)
op.execute('DROP TYPE eventcallbackstatus')

View File

@@ -0,0 +1,41 @@
"""add parent_conversation_id to conversation_metadata
Revision ID: 081
Revises: 080
Create Date: 2025-11-06 00:00:00.000000
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = '081'
down_revision: Union[str, None] = '080'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
"""Upgrade schema."""
op.add_column(
'conversation_metadata',
sa.Column('parent_conversation_id', sa.String(), nullable=True),
)
op.create_index(
op.f('ix_conversation_metadata_parent_conversation_id'),
'conversation_metadata',
['parent_conversation_id'],
unique=False,
)
def downgrade() -> None:
"""Downgrade schema."""
op.drop_index(
op.f('ix_conversation_metadata_parent_conversation_id'),
table_name='conversation_metadata',
)
op.drop_column('conversation_metadata', 'parent_conversation_id')

View File

@@ -0,0 +1,51 @@
"""Add SETTING_UP_SKILLS to appconversationstarttaskstatus enum
Revision ID: 082
Revises: 081
Create Date: 2025-11-19 12:00:00.000000
"""
from typing import Sequence, Union
from alembic import op
from sqlalchemy import text
# revision identifiers, used by Alembic.
revision: str = '082'
down_revision: Union[str, Sequence[str], None] = '081'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
"""Add SETTING_UP_SKILLS enum value to appconversationstarttaskstatus."""
# Check if the enum value already exists before adding it
# This handles the case where the enum was created with the value already included
connection = op.get_bind()
result = connection.execute(
text(
"SELECT 1 FROM pg_enum WHERE enumlabel = 'SETTING_UP_SKILLS' "
"AND enumtypid = (SELECT oid FROM pg_type WHERE typname = 'appconversationstarttaskstatus')"
)
)
if not result.fetchone():
# Add the new enum value only if it doesn't already exist
op.execute(
"ALTER TYPE appconversationstarttaskstatus ADD VALUE 'SETTING_UP_SKILLS'"
)
def downgrade() -> None:
"""Remove SETTING_UP_SKILLS enum value from appconversationstarttaskstatus.
Note: PostgreSQL doesn't support removing enum values directly.
This would require recreating the enum type and updating all references.
For safety, this downgrade is not implemented.
"""
# PostgreSQL doesn't support removing enum values directly
# This would require a complex migration to recreate the enum
# For now, we'll leave this as a no-op since removing enum values
# is rarely needed and can be dangerous
pass

View File

@@ -0,0 +1,35 @@
"""Add v1_enabled column to user_settings
Revision ID: 083
Revises: 082
Create Date: 2025-11-18 00:00:00.000000
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
# revision identifiers, used by Alembic.
revision: str = '083'
down_revision: Union[str, None] = '082'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
"""Add v1_enabled column to user_settings table."""
op.add_column(
'user_settings',
sa.Column(
'v1_enabled',
sa.Boolean(),
nullable=True,
),
)
def downgrade() -> None:
"""Remove v1_enabled column from user_settings table."""
op.drop_column('user_settings', 'v1_enabled')

348
enterprise/poetry.lock generated
View File

@@ -201,19 +201,20 @@ files = [
[[package]]
name = "anthropic"
version = "0.65.0"
version = "0.72.0"
description = "The official Python library for the anthropic API"
optional = false
python-versions = ">=3.8"
groups = ["main"]
files = [
{file = "anthropic-0.65.0-py3-none-any.whl", hash = "sha256:ba9d9f82678046c74ddf5698ca06d9f5b0f599cfac922ab0d5921638eb448d98"},
{file = "anthropic-0.65.0.tar.gz", hash = "sha256:6b6b6942574e54342050dfd42b8d856a8366b171daec147df3b80be4722733b9"},
{file = "anthropic-0.72.0-py3-none-any.whl", hash = "sha256:0e9f5a7582f038cab8efbb4c959e49ef654a56bfc7ba2da51b5a7b8a84de2e4d"},
{file = "anthropic-0.72.0.tar.gz", hash = "sha256:8971fe76dcffc644f74ac3883069beb1527641115ae0d6eb8fa21c1ce4082f7a"},
]
[package.dependencies]
anyio = ">=3.5.0,<5"
distro = ">=1.7.0,<2"
docstring-parser = ">=0.15,<1"
google-auth = {version = ">=2,<3", extras = ["requests"], optional = true, markers = "extra == \"vertex\""}
httpx = ">=0.25.0,<1"
jiter = ">=0.4.0,<1"
@@ -222,7 +223,7 @@ sniffio = "*"
typing-extensions = ">=4.10,<5"
[package.extras]
aiohttp = ["aiohttp", "httpx-aiohttp (>=0.1.8)"]
aiohttp = ["aiohttp", "httpx-aiohttp (>=0.1.9)"]
bedrock = ["boto3 (>=1.28.57)", "botocore (>=1.31.57)"]
vertex = ["google-auth[requests] (>=2,<3)"]
@@ -681,31 +682,34 @@ crt = ["awscrt (==0.27.6)"]
[[package]]
name = "browser-use"
version = "0.7.10"
version = "0.9.5"
description = "Make websites accessible for AI agents"
optional = false
python-versions = "<4.0,>=3.11"
groups = ["main"]
files = [
{file = "browser_use-0.7.10-py3-none-any.whl", hash = "sha256:669e12571a0c0c4c93e5fd26abf9e2534eb9bacbc510328aedcab795bd8906a9"},
{file = "browser_use-0.7.10.tar.gz", hash = "sha256:f93ce59e06906c12d120360dee4aa33d83618ddf7c9a575dd0ac517d2de7ccbc"},
{file = "browser_use-0.9.5-py3-none-any.whl", hash = "sha256:4a2e92847204d1ded269026a99cb0cc0e60e38bd2751fa3f58aedd78f00b4e67"},
{file = "browser_use-0.9.5.tar.gz", hash = "sha256:f8285fe253b149d01769a7084883b4cf4db351e2f38e26302c157bcbf14a703f"},
]
[package.dependencies]
aiohttp = "3.12.15"
anthropic = ">=0.58.2,<1.0.0"
anthropic = ">=0.68.1,<1.0.0"
anyio = ">=4.9.0"
authlib = ">=1.6.0"
bubus = ">=1.5.6"
cdp-use = ">=1.4.0"
click = ">=8.1.8"
cloudpickle = ">=3.1.1"
google-api-core = ">=2.25.0"
google-api-python-client = ">=2.174.0"
google-auth = ">=2.40.3"
google-auth-oauthlib = ">=1.2.2"
google-genai = ">=1.29.0,<2.0.0"
groq = ">=0.30.0"
html2text = ">=2025.4.15"
httpx = ">=0.28.1"
inquirerpy = ">=0.3.4"
markdownify = ">=1.2.0"
mcp = ">=1.10.1"
ollama = ">=0.5.1"
openai = ">=1.99.2,<2.0.0"
@@ -720,16 +724,20 @@ pypdf = ">=5.7.0"
python-dotenv = ">=1.0.1"
reportlab = ">=4.0.0"
requests = ">=2.32.3"
rich = ">=14.0.0"
screeninfo = {version = ">=0.8.1", markers = "platform_system != \"darwin\""}
typing-extensions = ">=4.12.2"
uuid7 = ">=0.1.0"
[package.extras]
all = ["agentmail (>=0.0.53)", "boto3 (>=1.38.45)", "botocore (>=1.37.23)", "click (>=8.1.8)", "imgcat (>=0.6.0)", "langchain-openai (>=0.3.26)", "rich (>=14.0.0)", "textual (>=3.2.0)"]
all = ["agentmail (==0.0.59)", "boto3 (>=1.38.45)", "botocore (>=1.37.23)", "imgcat (>=0.6.0)", "langchain-openai (>=0.3.26)", "oci (>=2.126.4)", "textual (>=3.2.0)"]
aws = ["boto3 (>=1.38.45)"]
cli = ["click (>=8.1.8)", "rich (>=14.0.0)", "textual (>=3.2.0)"]
eval = ["anyio (>=4.9.0)", "browserbase (==1.4.0)", "datamodel-code-generator (>=0.26.0)", "hyperbrowser (==0.47.0)", "lmnr[all] (==0.7.10)", "psutil (>=7.0.0)"]
examples = ["agentmail (>=0.0.53)", "botocore (>=1.37.23)", "imgcat (>=0.6.0)", "langchain-openai (>=0.3.26)"]
cli = ["textual (>=3.2.0)"]
cli-oci = ["oci (>=2.126.4)", "textual (>=3.2.0)"]
code = ["matplotlib (>=3.9.0)", "numpy (>=2.3.2)", "pandas (>=2.2.0)", "tabulate (>=0.9.0)"]
eval = ["anyio (>=4.9.0)", "datamodel-code-generator (>=0.26.0)", "lmnr[all] (==0.7.17)", "psutil (>=7.0.0)"]
examples = ["agentmail (==0.0.59)", "botocore (>=1.37.23)", "imgcat (>=0.6.0)", "langchain-openai (>=0.3.26)"]
oci = ["oci (>=2.126.4)"]
video = ["imageio[ffmpeg] (>=2.37.0)", "numpy (>=2.3.2)"]
[[package]]
@@ -3525,6 +3533,25 @@ files = [
{file = "iniconfig-2.1.0.tar.gz", hash = "sha256:3abbd2e30b36733fee78f9c7f7308f2d0050e88f0087fd25c2645f63c773e1c7"},
]
[[package]]
name = "inquirerpy"
version = "0.3.4"
description = "Python port of Inquirer.js (A collection of common interactive command-line user interfaces)"
optional = false
python-versions = ">=3.7,<4.0"
groups = ["main"]
files = [
{file = "InquirerPy-0.3.4-py3-none-any.whl", hash = "sha256:c65fdfbac1fa00e3ee4fb10679f4d3ed7a012abf4833910e63c295827fe2a7d4"},
{file = "InquirerPy-0.3.4.tar.gz", hash = "sha256:89d2ada0111f337483cb41ae31073108b2ec1e618a49d7110b0d7ade89fc197e"},
]
[package.dependencies]
pfzy = ">=0.3.1,<0.4.0"
prompt-toolkit = ">=3.0.1,<4.0.0"
[package.extras]
docs = ["Sphinx (>=4.1.2,<5.0.0)", "furo (>=2021.8.17-beta.43,<2022.0.0)", "myst-parser (>=0.15.1,<0.16.0)", "sphinx-autobuild (>=2021.3.14,<2022.0.0)", "sphinx-copybutton (>=0.4.0,<0.5.0)"]
[[package]]
name = "installer"
version = "0.7.0"
@@ -4580,6 +4607,62 @@ files = [
{file = "llvmlite-0.44.0.tar.gz", hash = "sha256:07667d66a5d150abed9157ab6c0b9393c9356f229784a4385c02f99e94fc94d4"},
]
[[package]]
name = "lmnr"
version = "0.7.20"
description = "Python SDK for Laminar"
optional = false
python-versions = "<4,>=3.10"
groups = ["main"]
files = [
{file = "lmnr-0.7.20-py3-none-any.whl", hash = "sha256:5f9fa7444e6f96c25e097f66484ff29e632bdd1de0e9346948bf5595f4a8af38"},
{file = "lmnr-0.7.20.tar.gz", hash = "sha256:1f484cd618db2d71af65f90a0b8b36d20d80dc91a5138b811575c8677bf7c4fd"},
]
[package.dependencies]
grpcio = ">=1"
httpx = ">=0.24.0"
opentelemetry-api = ">=1.33.0"
opentelemetry-exporter-otlp-proto-grpc = ">=1.33.0"
opentelemetry-exporter-otlp-proto-http = ">=1.33.0"
opentelemetry-instrumentation = ">=0.54b0"
opentelemetry-instrumentation-threading = ">=0.57b0"
opentelemetry-sdk = ">=1.33.0"
opentelemetry-semantic-conventions = ">=0.54b0"
opentelemetry-semantic-conventions-ai = ">=0.4.13"
orjson = ">=3.0.0"
packaging = ">=22.0"
pydantic = ">=2.0.3,<3.0.0"
python-dotenv = ">=1.0"
tenacity = ">=8.0"
tqdm = ">=4.0"
[package.extras]
alephalpha = ["opentelemetry-instrumentation-alephalpha (>=0.47.1)"]
all = ["opentelemetry-instrumentation-alephalpha (>=0.47.1)", "opentelemetry-instrumentation-bedrock (>=0.47.1)", "opentelemetry-instrumentation-chromadb (>=0.47.1)", "opentelemetry-instrumentation-cohere (>=0.47.1)", "opentelemetry-instrumentation-crewai (>=0.47.1)", "opentelemetry-instrumentation-haystack (>=0.47.1)", "opentelemetry-instrumentation-lancedb (>=0.47.1)", "opentelemetry-instrumentation-langchain (>=0.47.1)", "opentelemetry-instrumentation-llamaindex (>=0.47.1)", "opentelemetry-instrumentation-marqo (>=0.47.1)", "opentelemetry-instrumentation-mcp (>=0.47.1)", "opentelemetry-instrumentation-milvus (>=0.47.1)", "opentelemetry-instrumentation-mistralai (>=0.47.1)", "opentelemetry-instrumentation-ollama (>=0.47.1)", "opentelemetry-instrumentation-pinecone (>=0.47.1)", "opentelemetry-instrumentation-qdrant (>=0.47.1)", "opentelemetry-instrumentation-replicate (>=0.47.1)", "opentelemetry-instrumentation-sagemaker (>=0.47.1)", "opentelemetry-instrumentation-together (>=0.47.1)", "opentelemetry-instrumentation-transformers (>=0.47.1)", "opentelemetry-instrumentation-vertexai (>=0.47.1)", "opentelemetry-instrumentation-watsonx (>=0.47.1)", "opentelemetry-instrumentation-weaviate (>=0.47.1)"]
bedrock = ["opentelemetry-instrumentation-bedrock (>=0.47.1)"]
chromadb = ["opentelemetry-instrumentation-chromadb (>=0.47.1)"]
cohere = ["opentelemetry-instrumentation-cohere (>=0.47.1)"]
crewai = ["opentelemetry-instrumentation-crewai (>=0.47.1)"]
haystack = ["opentelemetry-instrumentation-haystack (>=0.47.1)"]
lancedb = ["opentelemetry-instrumentation-lancedb (>=0.47.1)"]
langchain = ["opentelemetry-instrumentation-langchain (>=0.47.1)"]
llamaindex = ["opentelemetry-instrumentation-llamaindex (>=0.47.1)"]
marqo = ["opentelemetry-instrumentation-marqo (>=0.47.1)"]
mcp = ["opentelemetry-instrumentation-mcp (>=0.47.1)"]
milvus = ["opentelemetry-instrumentation-milvus (>=0.47.1)"]
mistralai = ["opentelemetry-instrumentation-mistralai (>=0.47.1)"]
ollama = ["opentelemetry-instrumentation-ollama (>=0.47.1)"]
pinecone = ["opentelemetry-instrumentation-pinecone (>=0.47.1)"]
qdrant = ["opentelemetry-instrumentation-qdrant (>=0.47.1)"]
replicate = ["opentelemetry-instrumentation-replicate (>=0.47.1)"]
sagemaker = ["opentelemetry-instrumentation-sagemaker (>=0.47.1)"]
together = ["opentelemetry-instrumentation-together (>=0.47.1)"]
transformers = ["opentelemetry-instrumentation-transformers (>=0.47.1)"]
vertexai = ["opentelemetry-instrumentation-vertexai (>=0.47.1)"]
watsonx = ["opentelemetry-instrumentation-watsonx (>=0.47.1)"]
weaviate = ["opentelemetry-instrumentation-weaviate (>=0.47.1)"]
[[package]]
name = "lxml"
version = "6.0.1"
@@ -5737,13 +5820,15 @@ llama = ["llama-index (>=0.12.29,<0.13.0)", "llama-index-core (>=0.12.29,<0.13.0
[[package]]
name = "openhands-agent-server"
version = "1.0.0a4"
version = "1.3.0"
description = "OpenHands Agent Server - REST/WebSocket interface for OpenHands AI Agent"
optional = false
python-versions = ">=3.12"
groups = ["main"]
files = []
develop = false
files = [
{file = "openhands_agent_server-1.3.0-py3-none-any.whl", hash = "sha256:2f87f790c740dc3fb81821c5f9fa375af875fbb937ebca3baa6dc5c035035b3c"},
{file = "openhands_agent_server-1.3.0.tar.gz", hash = "sha256:0a83ae77373f5c41d0ba0e22d8f0f6144d54d55784183a50b7c098c96cd5135c"},
]
[package.dependencies]
aiosqlite = ">=0.19"
@@ -5756,16 +5841,9 @@ uvicorn = ">=0.31.1"
websockets = ">=12"
wsproto = ">=1.2.0"
[package.source]
type = "git"
url = "https://github.com/OpenHands/agent-sdk.git"
reference = "ce0a71af55dfce101f7419fbdb0116178f01e109"
resolved_reference = "ce0a71af55dfce101f7419fbdb0116178f01e109"
subdirectory = "openhands-agent-server"
[[package]]
name = "openhands-ai"
version = "0.59.0"
version = "0.62.0"
description = "OpenHands: Code Less, Make More"
optional = false
python-versions = "^3.12,<3.14"
@@ -5782,6 +5860,7 @@ bashlex = "^0.18"
boto3 = "*"
browsergym-core = "0.13.3"
deprecated = "*"
deprecation = "^2.1.0"
dirhash = "*"
docker = "*"
fastapi = "*"
@@ -5801,13 +5880,14 @@ jupyter_kernel_gateway = "*"
kubernetes = "^33.1.0"
libtmux = ">=0.46.2"
litellm = ">=1.74.3, <1.78.0, !=1.64.4, !=1.67.*"
lmnr = "^0.7.20"
memory-profiler = "^0.61.0"
numpy = "*"
openai = "1.99.9"
openhands-aci = "0.3.2"
openhands-agent-server = {git = "https://github.com/OpenHands/agent-sdk.git", rev = "ce0a71af55dfce101f7419fbdb0116178f01e109", subdirectory = "openhands-agent-server"}
openhands-sdk = {git = "https://github.com/OpenHands/agent-sdk.git", rev = "ce0a71af55dfce101f7419fbdb0116178f01e109", subdirectory = "openhands-sdk"}
openhands-tools = {git = "https://github.com/OpenHands/agent-sdk.git", rev = "ce0a71af55dfce101f7419fbdb0116178f01e109", subdirectory = "openhands-tools"}
openhands-agent-server = "1.3.0"
openhands-sdk = "1.3.0"
openhands-tools = "1.3.0"
opentelemetry-api = "^1.33.1"
opentelemetry-exporter-otlp-proto-grpc = "^1.33.1"
pathspec = "^0.12.1"
@@ -5863,18 +5943,22 @@ url = ".."
[[package]]
name = "openhands-sdk"
version = "1.0.0a4"
version = "1.3.0"
description = "OpenHands SDK - Core functionality for building AI agents"
optional = false
python-versions = ">=3.12"
groups = ["main"]
files = []
develop = false
files = [
{file = "openhands_sdk-1.3.0-py3-none-any.whl", hash = "sha256:feee838346f8e60ea3e4d3391de7cb854314eb8b3c9e3dbbb56f98a784aadc56"},
{file = "openhands_sdk-1.3.0.tar.gz", hash = "sha256:2d060803a78de462121b56dea717a66356922deb02276f37b29fae8af66343fb"},
]
[package.dependencies]
deprecation = ">=2.1.0"
fastmcp = ">=2.11.3"
httpx = ">=0.27.0"
litellm = ">=1.77.7.dev9"
lmnr = ">=0.7.20"
pydantic = ">=2.11.7"
python-frontmatter = ">=1.1.0"
python-json-logger = ">=3.3.0"
@@ -5884,40 +5968,28 @@ websockets = ">=12"
[package.extras]
boto3 = ["boto3 (>=1.35.0)"]
[package.source]
type = "git"
url = "https://github.com/OpenHands/agent-sdk.git"
reference = "ce0a71af55dfce101f7419fbdb0116178f01e109"
resolved_reference = "ce0a71af55dfce101f7419fbdb0116178f01e109"
subdirectory = "openhands-sdk"
[[package]]
name = "openhands-tools"
version = "1.0.0a4"
version = "1.3.0"
description = "OpenHands Tools - Runtime tools for AI agents"
optional = false
python-versions = ">=3.12"
groups = ["main"]
files = []
develop = false
files = [
{file = "openhands_tools-1.3.0-py3-none-any.whl", hash = "sha256:f31056d87c3058ac92709f9161c7c602daeee3ed0cb4439097b43cda105ed03e"},
{file = "openhands_tools-1.3.0.tar.gz", hash = "sha256:3da46f09e28593677d3e17252ce18584fcc13caab1a73213e66bd7edca2cebe0"},
]
[package.dependencies]
bashlex = ">=0.18"
binaryornot = ">=0.4.4"
browser-use = ">=0.7.7"
browser-use = ">=0.8.0"
cachetools = "*"
func-timeout = ">=4.3.5"
libtmux = ">=0.46.2"
openhands-sdk = "*"
pydantic = ">=2.11.7"
[package.source]
type = "git"
url = "https://github.com/OpenHands/agent-sdk.git"
reference = "ce0a71af55dfce101f7419fbdb0116178f01e109"
resolved_reference = "ce0a71af55dfce101f7419fbdb0116178f01e109"
subdirectory = "openhands-tools"
[[package]]
name = "openpyxl"
version = "3.1.5"
@@ -5988,6 +6060,62 @@ opentelemetry-proto = "1.36.0"
opentelemetry-sdk = ">=1.36.0,<1.37.0"
typing-extensions = ">=4.6.0"
[[package]]
name = "opentelemetry-exporter-otlp-proto-http"
version = "1.36.0"
description = "OpenTelemetry Collector Protobuf over HTTP Exporter"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "opentelemetry_exporter_otlp_proto_http-1.36.0-py3-none-any.whl", hash = "sha256:3d769f68e2267e7abe4527f70deb6f598f40be3ea34c6adc35789bea94a32902"},
{file = "opentelemetry_exporter_otlp_proto_http-1.36.0.tar.gz", hash = "sha256:dd3637f72f774b9fc9608ab1ac479f8b44d09b6fb5b2f3df68a24ad1da7d356e"},
]
[package.dependencies]
googleapis-common-protos = ">=1.52,<2.0"
opentelemetry-api = ">=1.15,<2.0"
opentelemetry-exporter-otlp-proto-common = "1.36.0"
opentelemetry-proto = "1.36.0"
opentelemetry-sdk = ">=1.36.0,<1.37.0"
requests = ">=2.7,<3.0"
typing-extensions = ">=4.5.0"
[[package]]
name = "opentelemetry-instrumentation"
version = "0.57b0"
description = "Instrumentation Tools & Auto Instrumentation for OpenTelemetry Python"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "opentelemetry_instrumentation-0.57b0-py3-none-any.whl", hash = "sha256:9109280f44882e07cec2850db28210b90600ae9110b42824d196de357cbddf7e"},
{file = "opentelemetry_instrumentation-0.57b0.tar.gz", hash = "sha256:f2a30135ba77cdea2b0e1df272f4163c154e978f57214795d72f40befd4fcf05"},
]
[package.dependencies]
opentelemetry-api = ">=1.4,<2.0"
opentelemetry-semantic-conventions = "0.57b0"
packaging = ">=18.0"
wrapt = ">=1.0.0,<2.0.0"
[[package]]
name = "opentelemetry-instrumentation-threading"
version = "0.57b0"
description = "Thread context propagation support for OpenTelemetry"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "opentelemetry_instrumentation_threading-0.57b0-py3-none-any.whl", hash = "sha256:adfd64857c8c78d6111cf80552311e1713bad64272dd81abdd61f07b892a161b"},
{file = "opentelemetry_instrumentation_threading-0.57b0.tar.gz", hash = "sha256:06fa4c98d6bfe4670e7532497670ac202db42afa647ff770aedce0e422421c6e"},
]
[package.dependencies]
opentelemetry-api = ">=1.12,<2.0"
opentelemetry-instrumentation = "0.57b0"
wrapt = ">=1.0.0,<2.0.0"
[[package]]
name = "opentelemetry-proto"
version = "1.36.0"
@@ -6036,6 +6164,115 @@ files = [
opentelemetry-api = "1.36.0"
typing-extensions = ">=4.5.0"
[[package]]
name = "opentelemetry-semantic-conventions-ai"
version = "0.4.13"
description = "OpenTelemetry Semantic Conventions Extension for Large Language Models"
optional = false
python-versions = "<4,>=3.9"
groups = ["main"]
files = [
{file = "opentelemetry_semantic_conventions_ai-0.4.13-py3-none-any.whl", hash = "sha256:883a30a6bb5deaec0d646912b5f9f6dcbb9f6f72557b73d0f2560bf25d13e2d5"},
{file = "opentelemetry_semantic_conventions_ai-0.4.13.tar.gz", hash = "sha256:94efa9fb4ffac18c45f54a3a338ffeb7eedb7e1bb4d147786e77202e159f0036"},
]
[[package]]
name = "orjson"
version = "3.11.4"
description = "Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy"
optional = false
python-versions = ">=3.9"
groups = ["main"]
files = [
{file = "orjson-3.11.4-cp310-cp310-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:e3aa2118a3ece0d25489cbe48498de8a5d580e42e8d9979f65bf47900a15aba1"},
{file = "orjson-3.11.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:a69ab657a4e6733133a3dca82768f2f8b884043714e8d2b9ba9f52b6efef5c44"},
{file = "orjson-3.11.4-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:3740bffd9816fc0326ddc406098a3a8f387e42223f5f455f2a02a9f834ead80c"},
{file = "orjson-3.11.4-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:65fd2f5730b1bf7f350c6dc896173d3460d235c4be007af73986d7cd9a2acd23"},
{file = "orjson-3.11.4-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9fdc3ae730541086158d549c97852e2eea6820665d4faf0f41bf99df41bc11ea"},
{file = "orjson-3.11.4-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e10b4d65901da88845516ce9f7f9736f9638d19a1d483b3883dc0182e6e5edba"},
{file = "orjson-3.11.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fb6a03a678085f64b97f9d4a9ae69376ce91a3a9e9b56a82b1580d8e1d501aff"},
{file = "orjson-3.11.4-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:2c82e4f0b1c712477317434761fbc28b044c838b6b1240d895607441412371ac"},
{file = "orjson-3.11.4-cp310-cp310-musllinux_1_2_armv7l.whl", hash = "sha256:d58c166a18f44cc9e2bad03a327dc2d1a3d2e85b847133cfbafd6bfc6719bd79"},
{file = "orjson-3.11.4-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:94f206766bf1ea30e1382e4890f763bd1eefddc580e08fec1ccdc20ddd95c827"},
{file = "orjson-3.11.4-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:41bf25fb39a34cf8edb4398818523277ee7096689db352036a9e8437f2f3ee6b"},
{file = "orjson-3.11.4-cp310-cp310-win32.whl", hash = "sha256:fa9627eba4e82f99ca6d29bc967f09aba446ee2b5a1ea728949ede73d313f5d3"},
{file = "orjson-3.11.4-cp310-cp310-win_amd64.whl", hash = "sha256:23ef7abc7fca96632d8174ac115e668c1e931b8fe4dde586e92a500bf1914dcc"},
{file = "orjson-3.11.4-cp311-cp311-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:5e59d23cd93ada23ec59a96f215139753fbfe3a4d989549bcb390f8c00370b39"},
{file = "orjson-3.11.4-cp311-cp311-macosx_15_0_arm64.whl", hash = "sha256:5c3aedecfc1beb988c27c79d52ebefab93b6c3921dbec361167e6559aba2d36d"},
{file = "orjson-3.11.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:da9e5301f1c2caa2a9a4a303480d79c9ad73560b2e7761de742ab39fe59d9175"},
{file = "orjson-3.11.4-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:8873812c164a90a79f65368f8f96817e59e35d0cc02786a5356f0e2abed78040"},
{file = "orjson-3.11.4-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:5d7feb0741ebb15204e748f26c9638e6665a5fa93c37a2c73d64f1669b0ddc63"},
{file = "orjson-3.11.4-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:01ee5487fefee21e6910da4c2ee9eef005bee568a0879834df86f888d2ffbdd9"},
{file = "orjson-3.11.4-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:3d40d46f348c0321df01507f92b95a377240c4ec31985225a6668f10e2676f9a"},
{file = "orjson-3.11.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:95713e5fc8af84d8edc75b785d2386f653b63d62b16d681687746734b4dfc0be"},
{file = "orjson-3.11.4-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:ad73ede24f9083614d6c4ca9a85fe70e33be7bf047ec586ee2363bc7418fe4d7"},
{file = "orjson-3.11.4-cp311-cp311-musllinux_1_2_armv7l.whl", hash = "sha256:842289889de515421f3f224ef9c1f1efb199a32d76d8d2ca2706fa8afe749549"},
{file = "orjson-3.11.4-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:3b2427ed5791619851c52a1261b45c233930977e7de8cf36de05636c708fa905"},
{file = "orjson-3.11.4-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:3c36e524af1d29982e9b190573677ea02781456b2e537d5840e4538a5ec41907"},
{file = "orjson-3.11.4-cp311-cp311-win32.whl", hash = "sha256:87255b88756eab4a68ec61837ca754e5d10fa8bc47dc57f75cedfeaec358d54c"},
{file = "orjson-3.11.4-cp311-cp311-win_amd64.whl", hash = "sha256:e2d5d5d798aba9a0e1fede8d853fa899ce2cb930ec0857365f700dffc2c7af6a"},
{file = "orjson-3.11.4-cp311-cp311-win_arm64.whl", hash = "sha256:6bb6bb41b14c95d4f2702bce9975fda4516f1db48e500102fc4d8119032ff045"},
{file = "orjson-3.11.4-cp312-cp312-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:d4371de39319d05d3f482f372720b841c841b52f5385bd99c61ed69d55d9ab50"},
{file = "orjson-3.11.4-cp312-cp312-macosx_15_0_arm64.whl", hash = "sha256:e41fd3b3cac850eaae78232f37325ed7d7436e11c471246b87b2cd294ec94853"},
{file = "orjson-3.11.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:600e0e9ca042878c7fdf189cf1b028fe2c1418cc9195f6cb9824eb6ed99cb938"},
{file = "orjson-3.11.4-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:7bbf9b333f1568ef5da42bc96e18bf30fd7f8d54e9ae066d711056add508e415"},
{file = "orjson-3.11.4-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:4806363144bb6e7297b8e95870e78d30a649fdc4e23fc84daa80c8ebd366ce44"},
{file = "orjson-3.11.4-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:ad355e8308493f527d41154e9053b86a5be892b3b359a5c6d5d95cda23601cb2"},
{file = "orjson-3.11.4-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:c8a7517482667fb9f0ff1b2f16fe5829296ed7a655d04d68cd9711a4d8a4e708"},
{file = "orjson-3.11.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:97eb5942c7395a171cbfecc4ef6701fc3c403e762194683772df4c54cfbb2210"},
{file = "orjson-3.11.4-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:149d95d5e018bdd822e3f38c103b1a7c91f88d38a88aada5c4e9b3a73a244241"},
{file = "orjson-3.11.4-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:624f3951181eb46fc47dea3d221554e98784c823e7069edb5dbd0dc826ac909b"},
{file = "orjson-3.11.4-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:03bfa548cf35e3f8b3a96c4e8e41f753c686ff3d8e182ce275b1751deddab58c"},
{file = "orjson-3.11.4-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:525021896afef44a68148f6ed8a8bf8375553d6066c7f48537657f64823565b9"},
{file = "orjson-3.11.4-cp312-cp312-win32.whl", hash = "sha256:b58430396687ce0f7d9eeb3dd47761ca7d8fda8e9eb92b3077a7a353a75efefa"},
{file = "orjson-3.11.4-cp312-cp312-win_amd64.whl", hash = "sha256:c6dbf422894e1e3c80a177133c0dda260f81428f9de16d61041949f6a2e5c140"},
{file = "orjson-3.11.4-cp312-cp312-win_arm64.whl", hash = "sha256:d38d2bc06d6415852224fcc9c0bfa834c25431e466dc319f0edd56cca81aa96e"},
{file = "orjson-3.11.4-cp313-cp313-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:2d6737d0e616a6e053c8b4acc9eccea6b6cce078533666f32d140e4f85002534"},
{file = "orjson-3.11.4-cp313-cp313-macosx_15_0_arm64.whl", hash = "sha256:afb14052690aa328cc118a8e09f07c651d301a72e44920b887c519b313d892ff"},
{file = "orjson-3.11.4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:38aa9e65c591febb1b0aed8da4d469eba239d434c218562df179885c94e1a3ad"},
{file = "orjson-3.11.4-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:f2cf4dfaf9163b0728d061bebc1e08631875c51cd30bf47cb9e3293bfbd7dcd5"},
{file = "orjson-3.11.4-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:89216ff3dfdde0e4070932e126320a1752c9d9a758d6a32ec54b3b9334991a6a"},
{file = "orjson-3.11.4-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9daa26ca8e97fae0ce8aa5d80606ef8f7914e9b129b6b5df9104266f764ce436"},
{file = "orjson-3.11.4-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:5c8b2769dc31883c44a9cd126560327767f848eb95f99c36c9932f51090bfce9"},
{file = "orjson-3.11.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1469d254b9884f984026bd9b0fa5bbab477a4bfe558bba6848086f6d43eb5e73"},
{file = "orjson-3.11.4-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:68e44722541983614e37117209a194e8c3ad07838ccb3127d96863c95ec7f1e0"},
{file = "orjson-3.11.4-cp313-cp313-musllinux_1_2_armv7l.whl", hash = "sha256:8e7805fda9672c12be2f22ae124dcd7b03928d6c197544fe12174b86553f3196"},
{file = "orjson-3.11.4-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:04b69c14615fb4434ab867bf6f38b2d649f6f300af30a6705397e895f7aec67a"},
{file = "orjson-3.11.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:639c3735b8ae7f970066930e58cf0ed39a852d417c24acd4a25fc0b3da3c39a6"},
{file = "orjson-3.11.4-cp313-cp313-win32.whl", hash = "sha256:6c13879c0d2964335491463302a6ca5ad98105fc5db3565499dcb80b1b4bd839"},
{file = "orjson-3.11.4-cp313-cp313-win_amd64.whl", hash = "sha256:09bf242a4af98732db9f9a1ec57ca2604848e16f132e3f72edfd3c5c96de009a"},
{file = "orjson-3.11.4-cp313-cp313-win_arm64.whl", hash = "sha256:a85f0adf63319d6c1ba06fb0dbf997fced64a01179cf17939a6caca662bf92de"},
{file = "orjson-3.11.4-cp314-cp314-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:42d43a1f552be1a112af0b21c10a5f553983c2a0938d2bbb8ecd8bc9fb572803"},
{file = "orjson-3.11.4-cp314-cp314-macosx_15_0_arm64.whl", hash = "sha256:26a20f3fbc6c7ff2cb8e89c4c5897762c9d88cf37330c6a117312365d6781d54"},
{file = "orjson-3.11.4-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6e3f20be9048941c7ffa8fc523ccbd17f82e24df1549d1d1fe9317712d19938e"},
{file = "orjson-3.11.4-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:aac364c758dc87a52e68e349924d7e4ded348dedff553889e4d9f22f74785316"},
{file = "orjson-3.11.4-cp314-cp314-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d5c54a6d76e3d741dcc3f2707f8eeb9ba2a791d3adbf18f900219b62942803b1"},
{file = "orjson-3.11.4-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f28485bdca8617b79d44627f5fb04336897041dfd9fa66d383a49d09d86798bc"},
{file = "orjson-3.11.4-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:bfc2a484cad3585e4ba61985a6062a4c2ed5c7925db6d39f1fa267c9d166487f"},
{file = "orjson-3.11.4-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e34dbd508cb91c54f9c9788923daca129fe5b55c5b4eebe713bf5ed3791280cf"},
{file = "orjson-3.11.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:b13c478fa413d4b4ee606ec8e11c3b2e52683a640b006bb586b3041c2ca5f606"},
{file = "orjson-3.11.4-cp314-cp314-musllinux_1_2_armv7l.whl", hash = "sha256:724ca721ecc8a831b319dcd72cfa370cc380db0bf94537f08f7edd0a7d4e1780"},
{file = "orjson-3.11.4-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:977c393f2e44845ce1b540e19a786e9643221b3323dae190668a98672d43fb23"},
{file = "orjson-3.11.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:1e539e382cf46edec157ad66b0b0872a90d829a6b71f17cb633d6c160a223155"},
{file = "orjson-3.11.4-cp314-cp314-win32.whl", hash = "sha256:d63076d625babab9db5e7836118bdfa086e60f37d8a174194ae720161eb12394"},
{file = "orjson-3.11.4-cp314-cp314-win_amd64.whl", hash = "sha256:0a54d6635fa3aaa438ae32e8570b9f0de36f3f6562c308d2a2a452e8b0592db1"},
{file = "orjson-3.11.4-cp314-cp314-win_arm64.whl", hash = "sha256:78b999999039db3cf58f6d230f524f04f75f129ba3d1ca2ed121f8657e575d3d"},
{file = "orjson-3.11.4-cp39-cp39-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl", hash = "sha256:405261b0a8c62bcbd8e2931c26fdc08714faf7025f45531541e2b29e544b545b"},
{file = "orjson-3.11.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:af02ff34059ee9199a3546f123a6ab4c86caf1708c79042caf0820dc290a6d4f"},
{file = "orjson-3.11.4-cp39-cp39-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:0b2eba969ea4203c177c7b38b36c69519e6067ee68c34dc37081fac74c796e10"},
{file = "orjson-3.11.4-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:0baa0ea43cfa5b008a28d3c07705cf3ada40e5d347f0f44994a64b1b7b4b5350"},
{file = "orjson-3.11.4-cp39-cp39-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:80fd082f5dcc0e94657c144f1b2a3a6479c44ad50be216cf0c244e567f5eae19"},
{file = "orjson-3.11.4-cp39-cp39-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:1e3704d35e47d5bee811fb1cbd8599f0b4009b14d451c4c57be5a7e25eb89a13"},
{file = "orjson-3.11.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:caa447f2b5356779d914658519c874cf3b7629e99e63391ed519c28c8aea4919"},
{file = "orjson-3.11.4-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:bba5118143373a86f91dadb8df41d9457498226698ebdf8e11cbb54d5b0e802d"},
{file = "orjson-3.11.4-cp39-cp39-musllinux_1_2_armv7l.whl", hash = "sha256:622463ab81d19ef3e06868b576551587de8e4d518892d1afab71e0fbc1f9cffc"},
{file = "orjson-3.11.4-cp39-cp39-musllinux_1_2_i686.whl", hash = "sha256:3e0a700c4b82144b72946b6629968df9762552ee1344bfdb767fecdd634fbd5a"},
{file = "orjson-3.11.4-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:6e18a5c15e764e5f3fc569b47872450b4bcea24f2a6354c0a0e95ad21045d5a9"},
{file = "orjson-3.11.4-cp39-cp39-win32.whl", hash = "sha256:fb1c37c71cad991ef4d89c7a634b5ffb4447dbd7ae3ae13e8f5ee7f1775e7ab1"},
{file = "orjson-3.11.4-cp39-cp39-win_amd64.whl", hash = "sha256:e2985ce8b8c42d00492d0ed79f2bd2b6460d00f2fa671dfde4bf2e02f49bf5c6"},
{file = "orjson-3.11.4.tar.gz", hash = "sha256:39485f4ab4c9b30a3943cfe99e1a213c4776fb69e8abd68f66b83d5a0b0fdc6d"},
]
[[package]]
name = "packaging"
version = "25.0"
@@ -6252,6 +6489,21 @@ files = [
[package.dependencies]
ptyprocess = ">=0.5"
[[package]]
name = "pfzy"
version = "0.3.4"
description = "Python port of the fzy fuzzy string matching algorithm"
optional = false
python-versions = ">=3.7,<4.0"
groups = ["main"]
files = [
{file = "pfzy-0.3.4-py3-none-any.whl", hash = "sha256:5f50d5b2b3207fa72e7ec0ef08372ef652685470974a107d0d4999fc5a903a96"},
{file = "pfzy-0.3.4.tar.gz", hash = "sha256:717ea765dd10b63618e7298b2d98efd819e0b30cd5905c9707223dceeb94b3f1"},
]
[package.extras]
docs = ["Sphinx (>=4.1.2,<5.0.0)", "furo (>=2021.8.17-beta.43,<2022.0.0)", "myst-parser (>=0.15.1,<0.16.0)", "sphinx-autobuild (>=2021.3.14,<2022.0.0)", "sphinx-copybutton (>=0.4.0,<0.5.0)"]
[[package]]
name = "pg8000"
version = "1.31.5"

View File

@@ -30,3 +30,11 @@ JIRA_DC_CLIENT_SECRET = os.getenv('JIRA_DC_CLIENT_SECRET', '').strip()
JIRA_DC_BASE_URL = os.getenv('JIRA_DC_BASE_URL', '').strip()
JIRA_DC_ENABLE_OAUTH = os.getenv('JIRA_DC_ENABLE_OAUTH', '1') in ('1', 'true')
AUTH_URL = os.getenv('AUTH_URL', '').rstrip('/')
ROLE_CHECK_ENABLED = os.getenv('ROLE_CHECK_ENABLED', 'false').lower() in (
'1',
'true',
't',
'yes',
'y',
'on',
)

View File

@@ -50,7 +50,7 @@ SUBSCRIPTION_PRICE_DATA = {
},
}
DEFAULT_INITIAL_BUDGET = float(os.environ.get('DEFAULT_INITIAL_BUDGET', '20'))
DEFAULT_INITIAL_BUDGET = float(os.environ.get('DEFAULT_INITIAL_BUDGET', '10'))
STRIPE_API_KEY = os.environ.get('STRIPE_API_KEY', None)
STRIPE_WEBHOOK_SECRET = os.environ.get('STRIPE_WEBHOOK_SECRET', None)
REQUIRE_PAYMENT = os.environ.get('REQUIRE_PAYMENT', '0') in ('1', 'true')

View File

@@ -12,6 +12,7 @@ from server.auth.constants import (
KEYCLOAK_CLIENT_ID,
KEYCLOAK_REALM_NAME,
KEYCLOAK_SERVER_URL_EXT,
ROLE_CHECK_ENABLED,
)
from server.auth.gitlab_sync import schedule_gitlab_repo_sync
from server.auth.saas_user_auth import SaasUserAuth
@@ -132,6 +133,12 @@ async def keycloak_callback(
user_info = await token_manager.get_user_info(keycloak_access_token)
logger.debug(f'user_info: {user_info}')
if ROLE_CHECK_ENABLED and 'roles' not in user_info:
return JSONResponse(
status_code=status.HTTP_401_UNAUTHORIZED,
content={'error': 'Missing required role'},
)
if 'sub' not in user_info or 'preferred_username' not in user_info:
return JSONResponse(
status_code=status.HTTP_400_BAD_REQUEST,

View File

@@ -35,6 +35,7 @@ class SaasConversationStore(ConversationStore):
session.query(StoredConversationMetadata)
.filter(StoredConversationMetadata.user_id == self.user_id)
.filter(StoredConversationMetadata.conversation_id == conversation_id)
.filter(StoredConversationMetadata.conversation_version == 'V0')
)
def _to_external_model(self, conversation_metadata: StoredConversationMetadata):
@@ -59,6 +60,7 @@ class SaasConversationStore(ConversationStore):
kwargs.pop('reasoning_tokens', None)
kwargs.pop('context_window', None)
kwargs.pop('per_turn_token', None)
kwargs.pop('parent_conversation_id', None)
return ConversationMetadata(**kwargs)
@@ -123,6 +125,7 @@ class SaasConversationStore(ConversationStore):
conversations = (
session.query(StoredConversationMetadata)
.filter(StoredConversationMetadata.user_id == self.user_id)
.filter(StoredConversationMetadata.conversation_version == 'V0')
.order_by(StoredConversationMetadata.created_at.desc())
.offset(offset)
.limit(limit + 1)

View File

@@ -97,6 +97,10 @@ class SaasSettingsStore(SettingsStore):
return settings
async def store(self, item: Settings):
# Check if provider is OpenHands and generate API key if needed
if item and self._is_openhands_provider(item):
await self._ensure_openhands_api_key(item)
with self.session_maker() as session:
existing = None
kwargs = {}
@@ -368,6 +372,30 @@ class SaasSettingsStore(SettingsStore):
def _should_encrypt(self, key: str) -> bool:
return key in ('llm_api_key', 'llm_api_key_for_byor', 'search_api_key')
def _is_openhands_provider(self, item: Settings) -> bool:
"""Check if the settings use the OpenHands provider."""
return bool(item.llm_model and item.llm_model.startswith('openhands/'))
async def _ensure_openhands_api_key(self, item: Settings) -> None:
"""Generate and set the OpenHands API key for the given settings.
First checks if an existing key with the OpenHands alias exists,
and reuses it if found. Otherwise, generates a new key.
"""
# Generate new key if none exists
generated_key = await self._generate_openhands_key()
if generated_key:
item.llm_api_key = SecretStr(generated_key)
logger.info(
'saas_settings_store:store:generated_openhands_key',
extra={'user_id': self.user_id},
)
else:
logger.warning(
'saas_settings_store:store:failed_to_generate_openhands_key',
extra={'user_id': self.user_id},
)
async def _create_user_in_lite_llm(
self, client: httpx.AsyncClient, email: str | None, max_budget: int, spend: int
):
@@ -390,3 +418,55 @@ class SaasSettingsStore(SettingsStore):
},
)
return response
async def _generate_openhands_key(self) -> str | None:
"""Generate a new OpenHands provider key for a user."""
if not (LITE_LLM_API_KEY and LITE_LLM_API_URL):
logger.warning(
'saas_settings_store:_generate_openhands_key:litellm_config_not_found',
extra={'user_id': self.user_id},
)
return None
try:
async with httpx.AsyncClient(
verify=httpx_verify_option(),
headers={
'x-goog-api-key': LITE_LLM_API_KEY,
},
) as client:
response = await client.post(
f'{LITE_LLM_API_URL}/key/generate',
json={
'user_id': self.user_id,
'metadata': {'type': 'openhands'},
},
)
response.raise_for_status()
response_json = response.json()
key = response_json.get('key')
if key:
logger.info(
'saas_settings_store:_generate_openhands_key:success',
extra={
'user_id': self.user_id,
'key_length': len(key) if key else 0,
'key_prefix': (
key[:10] + '...' if key and len(key) > 10 else key
),
},
)
return key
else:
logger.error(
'saas_settings_store:_generate_openhands_key:no_key_in_response',
extra={'user_id': self.user_id, 'response_json': response_json},
)
return None
except Exception as e:
logger.exception(
'saas_settings_store:_generate_openhands_key:error',
extra={'user_id': self.user_id, 'error': str(e)},
)
return None

View File

@@ -38,3 +38,4 @@ class UserSettings(Base): # type: ignore
email_verified = Column(Boolean, nullable=True)
git_user_name = Column(String, nullable=True)
git_user_email = Column(String, nullable=True)
v1_enabled = Column(Boolean, nullable=True)

View File

@@ -92,11 +92,8 @@ def test_unknown_variant_returns_original_agent_without_changes(monkeypatch):
assert getattr(result, 'condenser', None) is None
@patch('experiments.experiment_manager.handle_condenser_max_step_experiment__v1')
@patch('experiments.experiment_manager.ENABLE_EXPERIMENT_MANAGER', False)
def test_run_agent_variant_tests_v1_noop_when_manager_disabled(
mock_handle_condenser,
):
def test_run_agent_variant_tests_v1_noop_when_manager_disabled():
"""If ENABLE_EXPERIMENT_MANAGER is False, the method returns the exact same agent and does not call the handler."""
agent = make_agent()
conv_id = uuid4()
@@ -109,8 +106,6 @@ def test_run_agent_variant_tests_v1_noop_when_manager_disabled(
# Same object returned (no copy)
assert result is agent
# Handler should not have been called
mock_handle_condenser.assert_not_called()
@patch('experiments.experiment_manager.ENABLE_EXPERIMENT_MANAGER', True)
@@ -131,7 +126,3 @@ def test_run_agent_variant_tests_v1_calls_handler_and_sets_system_prompt(monkeyp
# Should be a different instance than the original (copied after handler runs)
assert result is not agent
assert result.system_prompt_filename == 'system_prompt_long_horizon.j2'
# The condenser returned by the handler must be preserved after the system-prompt override copy
assert isinstance(result.condenser, LLMSummarizingCondenser)
assert result.condenser.max_size == 80

View File

@@ -1,7 +1,9 @@
from unittest import TestCase, mock
from unittest.mock import MagicMock, patch
from integrations.github.github_view import GithubFactory, get_oh_labels
from integrations.github.github_view import GithubFactory, GithubIssue, get_oh_labels
from integrations.models import Message, SourceType
from integrations.types import UserData
class TestGithubLabels(TestCase):
@@ -75,3 +77,128 @@ class TestGithubCommentCaseInsensitivity(TestCase):
self.assertTrue(GithubFactory.is_issue_comment(message_lower))
self.assertTrue(GithubFactory.is_issue_comment(message_upper))
self.assertTrue(GithubFactory.is_issue_comment(message_mixed))
class TestGithubV1ConversationRouting(TestCase):
"""Test V1 conversation routing logic in GitHub integration."""
def setUp(self):
"""Set up test fixtures."""
# Create a proper UserData instance instead of MagicMock
user_data = UserData(
user_id=123, username='testuser', keycloak_user_id='test-keycloak-id'
)
# Create a mock raw_payload
raw_payload = Message(
source=SourceType.GITHUB,
message={
'payload': {
'action': 'opened',
'issue': {'number': 123},
}
},
)
self.github_issue = GithubIssue(
user_info=user_data,
full_repo_name='test/repo',
issue_number=123,
installation_id=456,
conversation_id='test-conversation-id',
should_extract=True,
send_summary_instruction=False,
is_public_repo=True,
raw_payload=raw_payload,
uuid='test-uuid',
title='Test Issue',
description='Test issue description',
previous_comments=[],
)
@patch('integrations.github.github_view.get_user_v1_enabled_setting')
@patch.object(GithubIssue, '_create_v0_conversation')
@patch.object(GithubIssue, '_create_v1_conversation')
async def test_create_new_conversation_routes_to_v0_when_disabled(
self, mock_create_v1, mock_create_v0, mock_get_v1_setting
):
"""Test that conversation creation routes to V0 when v1_enabled is False."""
# Mock v1_enabled as False
mock_get_v1_setting.return_value = False
mock_create_v0.return_value = None
mock_create_v1.return_value = None
# Mock parameters
jinja_env = MagicMock()
git_provider_tokens = MagicMock()
conversation_metadata = MagicMock()
# Call the method
await self.github_issue.create_new_conversation(
jinja_env, git_provider_tokens, conversation_metadata
)
# Verify V0 was called and V1 was not
mock_create_v0.assert_called_once_with(
jinja_env, git_provider_tokens, conversation_metadata
)
mock_create_v1.assert_not_called()
@patch('integrations.github.github_view.get_user_v1_enabled_setting')
@patch.object(GithubIssue, '_create_v0_conversation')
@patch.object(GithubIssue, '_create_v1_conversation')
async def test_create_new_conversation_routes_to_v1_when_enabled(
self, mock_create_v1, mock_create_v0, mock_get_v1_setting
):
"""Test that conversation creation routes to V1 when v1_enabled is True."""
# Mock v1_enabled as True
mock_get_v1_setting.return_value = True
mock_create_v0.return_value = None
mock_create_v1.return_value = None
# Mock parameters
jinja_env = MagicMock()
git_provider_tokens = MagicMock()
conversation_metadata = MagicMock()
# Call the method
await self.github_issue.create_new_conversation(
jinja_env, git_provider_tokens, conversation_metadata
)
# Verify V1 was called and V0 was not
mock_create_v1.assert_called_once_with(
jinja_env, git_provider_tokens, conversation_metadata
)
mock_create_v0.assert_not_called()
@patch('integrations.github.github_view.get_user_v1_enabled_setting')
@patch.object(GithubIssue, '_create_v0_conversation')
@patch.object(GithubIssue, '_create_v1_conversation')
async def test_create_new_conversation_fallback_on_v1_setting_error(
self, mock_create_v1, mock_create_v0, mock_get_v1_setting
):
"""Test that conversation creation falls back to V0 when _create_v1_conversation fails."""
# Mock v1_enabled as True so V1 is attempted
mock_get_v1_setting.return_value = True
# Mock _create_v1_conversation to raise an exception
mock_create_v1.side_effect = Exception('V1 conversation creation failed')
mock_create_v0.return_value = None
# Mock parameters
jinja_env = MagicMock()
git_provider_tokens = MagicMock()
conversation_metadata = MagicMock()
# Call the method
await self.github_issue.create_new_conversation(
jinja_env, git_provider_tokens, conversation_metadata
)
# Verify V1 was attempted first, then V0 was called as fallback
mock_create_v1.assert_called_once_with(
jinja_env, git_provider_tokens, conversation_metadata
)
mock_create_v0.assert_called_once_with(
jinja_env, git_provider_tokens, conversation_metadata
)

View File

@@ -243,7 +243,7 @@ async def test_update_settings_with_litellm_default(
# Check that the URL and most of the JSON payload match what we expect
assert call_args['json']['user_email'] == 'testy@tester.com'
assert call_args['json']['models'] == []
assert call_args['json']['max_budget'] == 20.0
assert call_args['json']['max_budget'] == 10.0
assert call_args['json']['user_id'] == 'user-id'
assert call_args['json']['teams'] == ['test_team']
assert call_args['json']['auto_create_key'] is True

View File

@@ -15,7 +15,7 @@ python evaluation/benchmarks/multi_swe_bench/scripts/data/data_change.py
## Docker image download
Please download the multi-swe-bench dokcer images from [here](https://github.com/multi-swe-bench/multi-swe-bench?tab=readme-ov-file#run-evaluation).
Please download the multi-swe-bench docker images from [here](https://github.com/multi-swe-bench/multi-swe-bench?tab=readme-ov-file#run-evaluation).
## Generate patch
@@ -47,7 +47,7 @@ For debugging purposes, you can set `export EVAL_SKIP_MAXIMUM_RETRIES_EXCEEDED=t
The results will be generated in evaluation/evaluation_outputs/outputs/XXX/CodeActAgent/YYY/output.jsonl, you can refer to the [example](examples/output.jsonl).
## Runing evaluation
## Running evaluation
First, install [multi-swe-bench](https://github.com/multi-swe-bench/multi-swe-bench).

View File

@@ -0,0 +1,79 @@
import argparse
import fnmatch
import json
from collections import Counter
from pathlib import Path
def find_final_reports(base_dir, pattern=None):
base_path = Path(base_dir)
if not base_path.exists():
raise FileNotFoundError(f'Base directory does not exist: {base_dir}')
# Find all final_report.json files
all_reports = list(base_path.rglob('final_report.json'))
if pattern is None:
return all_reports
# Filter by pattern
filtered_reports = []
for report in all_reports:
# Get relative path from base_dir for matching
rel_path = report.relative_to(base_path)
if fnmatch.fnmatch(str(rel_path), pattern):
filtered_reports.append(report)
return filtered_reports
def collect_resolved_ids(report_files):
id_counter = Counter()
for report_file in report_files:
with open(report_file, 'r') as f:
data = json.load(f)
if 'resolved_ids' not in data:
raise KeyError(f"'resolved_ids' key not found in {report_file}")
resolved_ids = data['resolved_ids']
id_counter.update(resolved_ids)
return id_counter
def get_skip_ids(id_counter, threshold):
return [id_str for id_str, count in id_counter.items() if count >= threshold]
def main():
parser = argparse.ArgumentParser(
description='Compute SKIP_IDS from resolved IDs in final_report.json files'
)
parser.add_argument(
'threshold',
type=int,
help='Minimum number of times an ID must be resolved to be skipped',
)
parser.add_argument(
'--base-dir',
default='evaluation/evaluation_outputs/outputs',
help='Base directory to search for final_report.json files (default: evaluation/evaluation_outputs/outputs)',
)
parser.add_argument(
'--pattern',
default=None,
help='Glob pattern to filter paths (e.g., "*Multi-SWE-RL*/**/*gpt*")',
)
args = parser.parse_args()
report_files = find_final_reports(args.base_dir, args.pattern)
id_counter = collect_resolved_ids(report_files)
skip_ids = get_skip_ids(id_counter, args.threshold)
skip_ids = [s.replace('/', '__').replace(':pr-', '-') for s in skip_ids]
skip_ids = ','.join(sorted(skip_ids))
print(skip_ids)
if __name__ == '__main__':
main()

View File

@@ -747,10 +747,14 @@ def filter_dataset(dataset: pd.DataFrame, filter_column: str) -> pd.DataFrame:
subset = dataset[dataset[filter_column].isin(selected_ids)]
logger.info(f'Retained {subset.shape[0]} tasks after filtering')
return subset
skip_ids = os.environ.get('SKIP_IDS', '').split(',')
skip_ids = [id for id in os.environ.get('SKIP_IDS', '').split(',') if id]
if len(skip_ids) > 0:
logger.info(f'Dataset size before filtering: {dataset.shape[0]} tasks')
logger.info(f'Filtering {len(skip_ids)} tasks from "SKIP_IDS"...')
return dataset[~dataset[filter_column].isin(skip_ids)]
logger.info(f'SKIP_IDS:\n{skip_ids}')
filtered_dataset = dataset[~dataset[filter_column].isin(skip_ids)]
logger.info(f'Dataset size after filtering: {filtered_dataset.shape[0]} tasks')
return filtered_dataset
return dataset
@@ -768,6 +772,11 @@ if __name__ == '__main__':
default='test',
help='split to evaluate on',
)
parser.add_argument(
'--filter_dataset_after_sampling',
action='store_true',
help='if provided, filter dataset after sampling instead of before',
)
args, _ = parser.parse_known_args()
# NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
@@ -777,10 +786,24 @@ if __name__ == '__main__':
logger.info(f'Loading dataset {args.dataset} with split {args.split} ')
dataset = load_dataset('json', data_files=args.dataset)
dataset = dataset[args.split]
swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
logger.info(
f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
)
swe_bench_tests = dataset.to_pandas()
# Determine filter strategy based on flag
filter_func = None
if args.filter_dataset_after_sampling:
# Pass filter as callback to apply after sampling
def filter_func(df):
return filter_dataset(df, 'instance_id')
logger.info(
f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks (filtering will occur after sampling)'
)
else:
# Apply filter before sampling
swe_bench_tests = filter_dataset(swe_bench_tests, 'instance_id')
logger.info(
f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
)
llm_config = None
if args.llm_config:
@@ -810,7 +833,9 @@ if __name__ == '__main__':
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
print(f'### OUTPUT FILE: {output_file} ###')
instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
instances = prepare_dataset(
swe_bench_tests, output_file, args.eval_n_limit, filter_func=filter_func
)
if len(instances) > 0 and not isinstance(
instances['FAIL_TO_PASS'][instances['FAIL_TO_PASS'].index[0]], str

View File

@@ -8,8 +8,14 @@
MODEL=$1 # eg your llm config name in config.toml (eg: "llm.claude-3-5-sonnet-20241022-t05")
EXP_NAME=$2 # "train-t05"
EVAL_DATASET=$3 # path to original dataset (jsonl file)
N_WORKERS=${4:-64}
N_RUNS=${5:-1}
MAX_ITER=$4
N_WORKERS=${5:-64}
N_RUNS=${6:-1}
EVAL_LIMIT=${7:-}
SKIP_IDS_THRESHOLD=$8
SKIP_IDS_PATTERN=$9
INPUT_SKIP_IDS=${10}
FILTER_DATASET_AFTER_SAMPLING=${11:-}
export EXP_NAME=$EXP_NAME
# use 2x resources for rollout since some codebases are pretty resource-intensive
@@ -17,6 +23,7 @@ export DEFAULT_RUNTIME_RESOURCE_FACTOR=2
echo "MODEL: $MODEL"
echo "EXP_NAME: $EXP_NAME"
echo "EVAL_DATASET: $EVAL_DATASET"
echo "INPUT_SKIP_IDS: $INPUT_SKIP_IDS"
# Generate DATASET path by adding _with_runtime_ before .jsonl extension
DATASET="${EVAL_DATASET%.jsonl}_with_runtime_.jsonl" # path to converted dataset
@@ -35,9 +42,6 @@ else
export SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev"
fi
#EVAL_LIMIT=3000
MAX_ITER=100
# ===== Run inference =====
source "evaluation/utils/version_control.sh"
@@ -69,17 +73,52 @@ function run_eval() {
--dataset $DATASET \
--split $SPLIT"
# Conditionally add filter flag
if [ "$FILTER_DATASET_AFTER_SAMPLING" = "true" ]; then
COMMAND="$COMMAND --filter_dataset_after_sampling"
fi
echo "Running command: $COMMAND"
if [ -n "$EVAL_LIMIT" ]; then
echo "EVAL_LIMIT: $EVAL_LIMIT"
COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
fi
# Run the command
eval $COMMAND
}
for run_idx in $(seq 1 $N_RUNS); do
if [ -n "$SKIP_IDS_THRESHOLD" ]; then
echo "Computing SKIP_IDS for run $run_idx..."
SKIP_CMD="poetry run python evaluation/benchmarks/multi_swe_bench/compute_skip_ids.py $SKIP_IDS_THRESHOLD"
if [ -n "$SKIP_IDS_PATTERN" ]; then
SKIP_CMD="$SKIP_CMD --pattern \"$SKIP_IDS_PATTERN\""
fi
COMPUTED_SKIP_IDS=$(eval $SKIP_CMD)
SKIP_STATUS=$?
if [ $SKIP_STATUS -ne 0 ]; then
echo "ERROR: Skip IDs computation failed with exit code $SKIP_STATUS"
exit $SKIP_STATUS
fi
echo "COMPUTED_SKIP_IDS: $COMPUTED_SKIP_IDS"
else
echo "SKIP_IDS_THRESHOLD not provided, skipping SKIP_IDS computation"
COMPUTED_SKIP_IDS=""
fi
# Concatenate COMPUTED_SKIP_IDS and INPUT_SKIP_IDS
if [ -n "$COMPUTED_SKIP_IDS" ] && [ -n "$INPUT_SKIP_IDS" ]; then
export SKIP_IDS="${COMPUTED_SKIP_IDS},${INPUT_SKIP_IDS}"
elif [ -n "$COMPUTED_SKIP_IDS" ]; then
export SKIP_IDS="$COMPUTED_SKIP_IDS"
elif [ -n "$INPUT_SKIP_IDS" ]; then
export SKIP_IDS="$INPUT_SKIP_IDS"
else
unset SKIP_IDS
fi
echo "FINAL SKIP_IDS: $SKIP_IDS"
echo ""
while true; do
echo "### Running inference... ###"

View File

@@ -0,0 +1,65 @@
# SWE-fficiency Evaluation
This folder contains the OpenHands inference generation of the [SWE-fficiency benchmark](https://swefficiency.com/) ([paper](https://arxiv.org/pdf/2507.12415v1)).
The evaluation consists of three steps:
1. Environment setup: [install python environment](../../README.md#development-environment) and [configure LLM config](../../README.md#configure-openhands-and-your-llm).
2. [Run inference](#running-inference-locally-with-docker): Generate a edit patch for each Github issue
3. [Evaluate patches](#evaluate-generated-patches)
## Setup Environment and LLM Configuration
Please follow instruction [here](../../README.md#setup) to setup your local development environment and LLM.
## Running inference Locally with Docker
Make sure your Docker daemon is running, and you have ample disk space (at least 200-500GB, depends on the SWE-PErf set you are running on) for the instance-level docker image.
When the `run_infer.sh` script is started, it will automatically pull the relevant SWE-Perf images.
For example, for instance ID `scikit-learn_scikit-learn-11674`, it will try to pull our pre-build docker image `betty1202/sweb.eval.x86_64.scikit-learn_s_scikit-learn-11674` from DockerHub.
This image will be used create an OpenHands runtime image where the agent will operate on.
```bash
./evaluation/benchmarks/swefficiency/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split] [n_runs] [mode]
# Example
./evaluation/benchmarks/swefficiency/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 500 100 1 swefficiency/swefficiency test
```
where `model_config` is mandatory, and the rest are optional.
- `model_config`, e.g. `eval_gpt4_1106_preview`, is the config group name for your
LLM settings, as defined in your `config.toml`.
- `git-version`, e.g. `HEAD`, is the git commit hash of the OpenHands version you would
like to evaluate. It could also be a release tag like `0.6.2`.
- `agent`, e.g. `CodeActAgent`, is the name of the agent for benchmarks, defaulting
to `CodeActAgent`.
- `eval_limit`, e.g. `10`, limits the evaluation to the first `eval_limit` instances. By
default, the script evaluates the entire SWE-Perf test set (140 issues). Note:
in order to use `eval_limit`, you must also set `agent`.
- `max_iter`, e.g. `20`, is the maximum number of iterations for the agent to run. By
default, it is set to 100.
- `num_workers`, e.g. `3`, is the number of parallel workers to run the evaluation. By
default, it is set to 1.
- `dataset`, a huggingface dataset name. e.g. `SWE-Perf/SWE-Perf`, specifies which dataset to evaluate on.
- `dataset_split`, split for the huggingface dataset. e.g., `test`, `dev`. Default to `test`.
- `n_runs`, e.g. `3`, is the number of times to run the evaluation. Default is 1.
- `mode`, e.g. `swt`, `swt-ci`, or `swe`, specifies the evaluation mode. Default is `swe`.
> [!CAUTION]
> Setting `num_workers` larger than 1 is not officially tested, YMMV.
Let's say you'd like to run 10 instances using `llm.eval_gpt4_1106_preview` and CodeActAgent,
then your command would be:
```bash
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 10
```
### 2. Run the SWE-fficiency benchmark official evaluation
Once the output is converted, use the [official SWE-fficiency benchmark evaluation](https://github.com/swefficiency/swefficiency) to evaluate it.

View File

@@ -0,0 +1,52 @@
"""
Utilities for handling binary files and patch generation in SWE-bench evaluation.
"""
def remove_binary_diffs(patch_text):
"""
Remove binary file diffs from a git patch.
Args:
patch_text (str): The git patch text
Returns:
str: The cleaned patch text with binary diffs removed
"""
lines = patch_text.splitlines()
cleaned_lines = []
block = []
is_binary_block = False
for line in lines:
if line.startswith('diff --git '):
if block and not is_binary_block:
cleaned_lines.extend(block)
block = [line]
is_binary_block = False
elif 'Binary files' in line:
is_binary_block = True
block.append(line)
else:
block.append(line)
if block and not is_binary_block:
cleaned_lines.extend(block)
return '\n'.join(cleaned_lines)
def remove_binary_files_from_git():
"""
Generate a bash command to remove binary files from git staging.
Returns:
str: A bash command that removes binary files from git staging
"""
return """
for file in $(git status --porcelain | grep -E "^(M| M|\\?\\?|A| A)" | cut -c4-); do
if [ -f "$file" ] && (file "$file" | grep -q "executable" || git check-attr binary "$file" | grep -q "binary: set"); then
git rm -f "$file" 2>/dev/null || rm -f "$file"
echo "Removed: $file"
fi
done
""".strip()

View File

@@ -0,0 +1,960 @@
import asyncio
import copy
import functools
import json
import multiprocessing
import os
import tempfile
from typing import Any, Literal
import pandas as pd
import toml
from datasets import load_dataset
import openhands.agenthub
from evaluation.benchmarks.swe_bench.binary_patch_utils import (
remove_binary_diffs,
remove_binary_files_from_git,
)
from evaluation.utils.shared import (
EvalException,
EvalMetadata,
EvalOutput,
assert_and_raise,
codeact_user_response,
get_default_sandbox_config_for_eval,
get_metrics,
is_fatal_evaluation_error,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
run_evaluation,
update_llm_config_for_completions_logging,
)
from openhands.controller.state.state import State
from openhands.core.config import (
AgentConfig,
OpenHandsConfig,
get_evaluation_parser,
get_llm_config_arg,
)
from openhands.core.config.condenser_config import NoOpCondenserConfig
from openhands.core.config.utils import get_condenser_config_arg
from openhands.core.logger import openhands_logger as logger
from openhands.core.main import create_runtime, run_controller
from openhands.critic import AgentFinishedCritic
from openhands.events.action import CmdRunAction, FileReadAction, MessageAction
from openhands.events.observation import (
CmdOutputObservation,
ErrorObservation,
FileReadObservation,
)
from openhands.events.serialization.event import event_from_dict, event_to_dict
from openhands.runtime.base import Runtime
from openhands.utils.async_utils import call_async_from_sync
from openhands.utils.shutdown_listener import sleep_if_should_continue
USE_HINT_TEXT = os.environ.get('USE_HINT_TEXT', 'false').lower() == 'true'
RUN_WITH_BROWSING = os.environ.get('RUN_WITH_BROWSING', 'false').lower() == 'true'
BenchMode = Literal['swe', 'swt', 'swt-ci']
AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
'CodeActAgent': codeact_user_response,
}
def _get_swebench_workspace_dir_name(instance: pd.Series) -> str:
return f'{instance.repo}__{instance.version}'.replace('/', '__')
def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
workspace_dir_name = _get_swebench_workspace_dir_name(instance)
# TODO: Change to testbed?
instruction = f"""
<uploaded_files>
/workspace/{workspace_dir_name}
</uploaded_files>
Ive uploaded a python code repository in the directory workspace_dir_name. Consider the following performance workload and `workload()` function showing an specific usage of the repository:
<performance_workload>
{instance.workload}
</performance_workload>
Can you help me implement the necessary changes to the repository so that the runtime of the `workload()` function is faster? Basic guidelines:
1. Your task is to make changes to non-test files in the /workspace directory to improve the performance of the code running in `workload()`. Please do not directly change the implementation of the `workload()` function to optimize things: I want you to focus on making the workload AS IS run faster by only editing the repository containing code that the `workload()` function calls.
2. Make changes while ensuring the repository is functionally equivalent to the original: your changes should not introduce new bugs or cause already-passing tests to begin failing after your changes. However, you do not need to worry about tests that already fail without any changes made. For relevant test files you find in the repository, you can run them via the bash command `{instance.test_cmd} <test_file>` to check for correctness. Note that running all the tests may take a long time, so you need to determine which tests are relevant to your changes.
3. Make sure the `workload()` function improves in performance after you make changes to the repository. The workload can potentially take some time to run, so please allow it to finish and be generous with setting your timeout parameter (a timeout value of 3600 or larger here is encouraged): for faster iteration, you should adjust the workload script to use fewer iterations. Before you complete your task, please make sure to check that the **original performance workload** and `workload()` function runs successfully and the performance is improved.
4. You may need to reinstall/rebuild the repo for your changes to take effect before testing if you made non-Python changes. Reinstalling may take a long time to run (a timeout value of 3600 or larger here is encouraged), so please be patient with running it and allow it to complete if possible. You can reinstall the repository by running the bash command `{instance.rebuild_cmd}` in the workspace directory.
5. All the dependencies required to run the `workload()` function are already installed in the environment. You should not install or upgrade any dependencies.
Follow these steps to improve performance:
1. As a first step, explore the repository structure.
2. Create a Python script to reproduce the performance workload, execute it with python <workload_file>, and examine the printed output metrics.
3. Edit the source code of the repository to improve performance. Please do not change the contents of the `workload()` function itself, but focus on optimizing the code in the repository that the original `workload()` function uses.
4. If non-Python changes were made, rebuild the repo to make sure the changes take effect.
5. Rerun your script to confirm that performance has improved.
6. If necessary, identify any relevant test files in the repository related to your changes and verify that test statuses did not change after your modifications.
7. After each attempted change, please reflect on the changes attempted and the performance impact observed. If the performance did not improve, consider alternative approaches or optimizations.
8. Once you are satisfied, please use the finish command to complete your task.
Please remember that you should not change the implementation of the `workload()` function. The performance improvement should solely come from editing the source files in the code repository.
"""
if RUN_WITH_BROWSING:
instruction += (
'<IMPORTANT!>\nYou SHOULD NEVER attempt to browse the web. </IMPORTANT!>\n'
)
return MessageAction(content=instruction)
def get_instance_docker_image(
instance_id: str,
) -> str:
return f'ghcr.io/swefficiency/swefficiency-images:{instance_id}'
def get_config(
instance: pd.Series,
metadata: EvalMetadata,
cpu_group: list[int] | None = None,
) -> OpenHandsConfig:
# We use a different instance image for the each instance of swe-bench eval
base_container_image = get_instance_docker_image(
instance['instance_id'],
)
logger.info(
f'Using instance container image: {base_container_image}. '
f'Please make sure this image exists. '
f'Submit an issue on https://github.com/All-Hands-AI/OpenHands if you run into any issues.'
)
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.base_container_image = base_container_image
sandbox_config.enable_auto_lint = True
sandbox_config.use_host_network = False
sandbox_config.timeout = 3600
# Control container cleanup behavior via environment variable
# Default to False for multiprocessing stability to prevent cascade failures
sandbox_config.rm_all_containers = True
sandbox_config.platform = 'linux/amd64'
sandbox_config.remote_runtime_resource_factor = 4.0
sandbox_config.runtime_startup_env_vars.update(
{
'NO_CHANGE_TIMEOUT_SECONDS': '900', # 15 minutes
}
)
if cpu_group is not None:
print(f'Configuring Docker runtime with CPU group: {cpu_group}')
sandbox_config.docker_runtime_kwargs = {
# HACK: Use the cpu_group if provided, otherwise use all available CPUs
'cpuset_cpus': ','.join(map(str, cpu_group)),
'nano_cpus': int(1e9 * len(cpu_group)), # optional: hard cap to vCPU count
'mem_limit': '16g',
}
# Note: We keep rm_all_containers = False for worker process safety
config = OpenHandsConfig(
default_agent=metadata.agent_class,
run_as_openhands=False,
max_iterations=metadata.max_iterations,
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox=sandbox_config,
# do not mount workspace
workspace_base=None,
workspace_mount_path=None,
)
config.set_llm_config(
update_llm_config_for_completions_logging(
metadata.llm_config, metadata.eval_output_dir, instance['instance_id']
)
)
agent_config = AgentConfig(
enable_jupyter=False,
enable_browsing=RUN_WITH_BROWSING,
enable_llm_editor=False,
enable_mcp=False,
condenser=metadata.condenser_config,
enable_prompt_extensions=False,
)
config.set_agent_config(agent_config)
return config
def initialize_runtime(
runtime: Runtime,
instance: pd.Series, # this argument is not required
metadata: EvalMetadata,
):
"""Initialize the runtime for the agent.
This function is called before the runtime is used to run the agent.
"""
logger.info('-' * 30)
logger.info('BEGIN Runtime Initialization Fn')
logger.info('-' * 30)
workspace_dir_name = _get_swebench_workspace_dir_name(instance)
obs: CmdOutputObservation
# Set instance id and git configuration
action = CmdRunAction(
command=f"""echo 'export SWE_INSTANCE_ID={instance['instance_id']}' >> ~/.bashrc && echo 'export PIP_CACHE_DIR=~/.cache/pip' >> ~/.bashrc && echo "alias git='git --no-pager'" >> ~/.bashrc && git config --global core.pager "" && git config --global diff.binary false"""
)
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0,
f'Failed to export SWE_INSTANCE_ID and configure git: {str(obs)}',
)
action = CmdRunAction(command="""export USER=$(whoami); echo USER=${USER} """)
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(obs.exit_code == 0, f'Failed to export USER: {str(obs)}')
# inject the init script
script_dir = os.path.dirname(__file__)
# inject the instance info
action = CmdRunAction(command='mkdir -p /swe_util/eval_data/instances')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0,
f'Failed to create /swe_util/eval_data/instances: {str(obs)}',
)
swe_instance_json_name = 'swe-bench-instance.json'
with tempfile.TemporaryDirectory() as temp_dir:
# Construct the full path for the desired file name within the temporary directory
temp_file_path = os.path.join(temp_dir, swe_instance_json_name)
# Write to the file with the desired name within the temporary directory
with open(temp_file_path, 'w') as f:
if not isinstance(instance, dict):
json.dump([instance.to_dict()], f)
else:
json.dump([instance], f)
# Copy the file to the desired location
runtime.copy_to(temp_file_path, '/swe_util/eval_data/instances/')
# inject the instance swe entry
runtime.copy_to(
str(os.path.join(script_dir, 'scripts/setup/instance_swe_entry.sh')),
'/swe_util/',
)
action = CmdRunAction(command='cat ~/.bashrc')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(obs.exit_code == 0, f'Failed to cat ~/.bashrc: {str(obs)}')
action = CmdRunAction(command='source ~/.bashrc')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
if isinstance(obs, ErrorObservation):
logger.error(f'Failed to source ~/.bashrc: {str(obs)}')
assert_and_raise(obs.exit_code == 0, f'Failed to source ~/.bashrc: {str(obs)}')
action = CmdRunAction(command='source /swe_util/instance_swe_entry.sh')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0,
f'Failed to source /swe_util/instance_swe_entry.sh: {str(obs)}',
)
action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0,
f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
)
action = CmdRunAction(command='git reset --hard')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(obs.exit_code == 0, f'Failed to git reset --hard: {str(obs)}')
action = CmdRunAction(
command='for remote_name in $(git remote); do git remote remove "${remote_name}"; done'
)
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(obs.exit_code == 0, f'Failed to remove git remotes: {str(obs)}')
action = CmdRunAction(command='which python')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
obs.exit_code == 0 and 'testbed' in obs.content,
f'Expected to find python interpreter from testbed, but got: {str(obs)}',
)
logger.info('-' * 30)
logger.info('END Runtime Initialization Fn')
logger.info('-' * 30)
def complete_runtime(
runtime: Runtime,
instance: pd.Series, # this argument is not required, but it is used to get the workspace_dir_name
) -> dict[str, Any]:
"""Complete the runtime for the agent.
This function is called before the runtime is used to run the agent.
If you need to do something in the sandbox to get the correctness metric after
the agent has run, modify this function.
"""
logger.info('-' * 30)
logger.info('BEGIN Runtime Completion Fn')
logger.info('-' * 30)
obs: CmdOutputObservation
workspace_dir_name = _get_swebench_workspace_dir_name(instance)
action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
if obs.exit_code == -1:
# The previous command is still running
# We need to kill previous command
logger.info('The previous command is still running, trying to kill it...')
action = CmdRunAction(command='C-c')
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
# Then run the command again
action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
if obs.exit_code == -1:
# The previous command is still running
# We need to kill previous command
logger.info('The previous command is still running, trying to ctrl+z it...')
action = CmdRunAction(command='C-z')
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
# Then run the command again
action = CmdRunAction(command=f'cd /workspace/{workspace_dir_name}')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to cd to /workspace/{workspace_dir_name}: {str(obs)}',
)
action = CmdRunAction(command='git config --global core.pager ""')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to git config --global core.pager "": {str(obs)}',
)
# First check for any git repositories in subdirectories
action = CmdRunAction(command='find . -type d -name .git -not -path "./.git"')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to find git repositories: {str(obs)}',
)
git_dirs = [p for p in obs.content.strip().split('\n') if p]
if git_dirs:
# Remove all .git directories in subdirectories
for git_dir in git_dirs:
action = CmdRunAction(command=f'rm -rf "{git_dir}"')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to remove git directory {git_dir}: {str(obs)}',
)
# add all files
action = CmdRunAction(command='git add -A')
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to git add -A: {str(obs)}',
)
# Remove binary files from git staging
action = CmdRunAction(command=remove_binary_files_from_git())
action.set_hard_timeout(600)
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
assert_and_raise(
isinstance(obs, CmdOutputObservation) and obs.exit_code == 0,
f'Failed to remove binary files: {str(obs)}',
)
n_retries = 0
git_patch = None
while n_retries < 5:
action = CmdRunAction(
command=f'git diff --no-color --cached {instance["base_commit"]} > patch.diff'
)
action.set_hard_timeout(max(300 + 100 * n_retries, 600))
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
n_retries += 1
if isinstance(obs, CmdOutputObservation):
if obs.exit_code == 0:
# Read the patch file
action = FileReadAction(path='patch.diff')
action.set_hard_timeout(max(300 + 100 * n_retries, 600))
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
if isinstance(obs, FileReadObservation):
git_patch = obs.content
break
elif isinstance(obs, ErrorObservation):
# Fall back to cat "patch.diff" to get the patch
assert 'File could not be decoded as utf-8' in obs.content
action = CmdRunAction(command='cat patch.diff')
action.set_hard_timeout(max(300 + 100 * n_retries, 600))
logger.info(action, extra={'msg_type': 'ACTION'})
obs = runtime.run_action(action)
assert isinstance(obs, CmdOutputObservation) and obs.exit_code == 0
logger.info(obs, extra={'msg_type': 'OBSERVATION'})
git_patch = obs.content
break
else:
assert_and_raise(False, f'Unexpected observation type: {str(obs)}')
else:
logger.info('Failed to get git diff, retrying...')
sleep_if_should_continue(10)
elif isinstance(obs, ErrorObservation):
logger.error(f'Error occurred: {obs.content}. Retrying...')
sleep_if_should_continue(10)
else:
assert_and_raise(False, f'Unexpected observation type: {str(obs)}')
assert_and_raise(git_patch is not None, 'Failed to get git diff (None)')
# Remove binary diffs from the patch
git_patch = remove_binary_diffs(git_patch)
logger.info('-' * 30)
logger.info('END Runtime Completion Fn')
logger.info('-' * 30)
return {'git_patch': git_patch}
class CPUGroupManager:
def __init__(self, cpu_groups_queue: multiprocessing.Queue):
self.cpu_groups_queue = cpu_groups_queue
def __enter__(self):
# Get the current CPU group for this worker]
if self.cpu_groups_queue is not None:
self.cpu_group = self.cpu_groups_queue.get()
logger.info(f'Worker started with CPU group: {self.cpu_group}')
return self.cpu_group
return None
def __exit__(self, exc_type, exc_value, traceback):
# Put the CPU group back into the queue for other workers to use
if self.cpu_groups_queue is not None:
self.cpu_groups_queue.put(self.cpu_group)
logger.info(f'Worker finished with CPU group: {self.cpu_group}')
def cleanup_docker_resources_for_worker():
"""Clean up Docker resources specific to this worker process.
This prevents cascade failures when one worker's container crashes.
Note: This only cleans up stale locks, not containers, to avoid
interfering with other workers. Container cleanup is handled
by the DockerRuntime.close() method based on configuration.
"""
# Clean up any stale port locks from crashed processes
try:
from openhands.runtime.utils.port_lock import cleanup_stale_locks
cleanup_stale_locks(max_age_seconds=300) # Clean up locks older than 5 minutes
except Exception as e:
logger.debug(f'Error cleaning up stale port locks: {e}')
def process_instance(
instance: pd.Series,
metadata: EvalMetadata,
reset_logger: bool = True,
runtime_failure_count: int = 0,
cpu_groups_queue: multiprocessing.Queue = None,
) -> EvalOutput:
# Clean up any Docker resources from previous failed runs
cleanup_docker_resources_for_worker()
# HACK: Use the global and get the cpu group for this worker.
with CPUGroupManager(cpu_groups_queue) as cpu_group:
config = get_config(instance, metadata, cpu_group=cpu_group)
# Setup the logger properly, so you can run multi-processing to parallelize the evaluation
if reset_logger:
log_dir = os.path.join(metadata.eval_output_dir, 'infer_logs')
reset_logger_for_multiprocessing(logger, instance.instance_id, log_dir)
else:
logger.info(f'Starting evaluation for instance {instance.instance_id}.')
metadata = copy.deepcopy(metadata)
metadata.details['runtime_failure_count'] = runtime_failure_count
metadata.details['remote_runtime_resource_factor'] = (
config.sandbox.remote_runtime_resource_factor
)
runtime = create_runtime(config, sid=None)
call_async_from_sync(runtime.connect)
try:
initialize_runtime(runtime, instance, metadata)
message_action = get_instruction(instance, metadata)
# Here's how you can run the agent (similar to the `main` function) and get the final task state
state: State | None = asyncio.run(
run_controller(
config=config,
initial_user_action=message_action,
runtime=runtime,
fake_user_response_fn=AGENT_CLS_TO_FAKE_USER_RESPONSE_FN[
metadata.agent_class
],
)
)
# if fatal error, throw EvalError to trigger re-run
if is_fatal_evaluation_error(state.last_error):
raise EvalException('Fatal error detected: ' + state.last_error)
# ======= THIS IS SWE-Bench specific =======
# Get git patch
return_val = complete_runtime(runtime, instance)
git_patch = return_val['git_patch']
logger.info(
f'Got git diff for instance {instance.instance_id}:\n--------\n{git_patch}\n--------'
)
except Exception as e:
# Log the error but don't let it crash other workers
logger.error(
f'Error in worker processing instance {instance.instance_id}: {str(e)}'
)
raise
finally:
# Ensure runtime is properly closed to prevent cascade failures
try:
runtime.close()
except Exception as e:
logger.warning(
f'Error closing runtime for {instance.instance_id}: {str(e)}'
)
# Don't re-raise - we want to continue cleanup
# ==========================================
# ======= Attempt to evaluate the agent's edits =======
# we use eval_infer.sh to evaluate the agent's edits, not here
# because the agent may alter the environment / testcases
test_result = {
'git_patch': git_patch,
}
# If you are working on some simpler benchmark that only evaluates the final model output (e.g., in a MessageAction)
# You can simply get the LAST `MessageAction` from the returned `state.history` and parse it for evaluation.
if state is None:
raise ValueError('State should not be None.')
# NOTE: this is NO LONGER the event stream, but an agent history that includes delegate agent's events
histories = [event_to_dict(event) for event in state.history]
metrics = get_metrics(state)
# Save the output
instruction = message_action.content
if message_action.image_urls:
instruction += (
'\n\n<image_urls>'
+ '\n'.join(message_action.image_urls)
+ '</image_urls>'
)
output = EvalOutput(
instance_id=instance.instance_id,
instruction=instruction,
instance=instance.to_dict(), # SWE Bench specific
test_result=test_result,
metadata=metadata,
history=histories,
metrics=metrics,
error=state.last_error if state and state.last_error else None,
)
return output
def filter_dataset(dataset: pd.DataFrame, filter_column: str) -> pd.DataFrame:
file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.toml')
if os.path.exists(file_path):
with open(file_path, 'r') as file:
data = toml.load(file)
if 'selected_ids' in data:
selected_ids = data['selected_ids']
logger.info(
f'Filtering {len(selected_ids)} tasks from "selected_ids"...'
)
subset = dataset[dataset[filter_column].isin(selected_ids)]
logger.info(f'Retained {subset.shape[0]} tasks after filtering')
return subset
if 'selected_repos' in data:
# repos for the swe-bench instances:
# ['astropy/astropy', 'django/django', 'matplotlib/matplotlib', 'mwaskom/seaborn', 'pallets/flask', 'psf/requests', 'pydata/xarray', 'pylint-dev/pylint', 'pytest-dev/pytest', 'scikit-learn/scikit-learn', 'sphinx-doc/sphinx', 'sympy/sympy']
selected_repos = data['selected_repos']
if isinstance(selected_repos, str):
selected_repos = [selected_repos]
assert isinstance(selected_repos, list)
logger.info(
f'Filtering {selected_repos} tasks from "selected_repos"...'
)
subset = dataset[dataset['repo'].isin(selected_repos)]
logger.info(f'Retained {subset.shape[0]} tasks after filtering')
return subset
skip_ids = os.environ.get('SKIP_IDS', '').split(',')
if len(skip_ids) > 0:
logger.info(f'Filtering {len(skip_ids)} tasks from "SKIP_IDS"...')
return dataset[~dataset[filter_column].isin(skip_ids)]
return dataset
def divide_cpus_among_workers(num_workers, num_cpus_per_worker=4, num_to_skip=0):
"""Divide CPUs among workers, with better error handling for multiprocessing."""
try:
current_cpus = list(os.sched_getaffinity(0))
except AttributeError:
# os.sched_getaffinity not available on all platforms
import multiprocessing
current_cpus = list(range(multiprocessing.cpu_count()))
num_cpus = len(current_cpus)
if num_workers <= 0:
raise ValueError('Number of workers must be greater than 0')
# Chec that num worers and num_cpus_per_worker fit into available CPUs
total_cpus_needed = num_workers * num_cpus_per_worker + num_to_skip
if total_cpus_needed > num_cpus:
raise ValueError(
f'Not enough CPUs available. Requested {total_cpus_needed} '
f'CPUs (num_workers={num_workers}, num_cpus_per_worker={num_cpus_per_worker}, '
f'num_to_skip={num_to_skip}), but only {num_cpus} CPUs are available.'
)
# Divide this into groups, skipping the first `num_to_skip` CPUs.
available_cpus = current_cpus[num_to_skip:]
cpu_groups = [
available_cpus[i * num_cpus_per_worker : (i + 1) * num_cpus_per_worker]
for i in range(num_workers)
]
print(
f'Divided {num_cpus} CPUs into {num_workers} groups, each with {num_cpus_per_worker} CPUs.'
)
print(f'CPU groups: {cpu_groups}')
return cpu_groups
if __name__ == '__main__':
parser = get_evaluation_parser()
parser.add_argument(
'--dataset',
type=str,
default=None,
help='data set to evaluate on, for now use local.',
)
parser.add_argument(
'--split',
type=str,
default='test',
help='split to evaluate on',
)
parser.add_argument(
'--mode',
type=str,
default='swe',
help='mode to evaluate on',
)
args, _ = parser.parse_known_args()
# NOTE: It is preferable to load datasets from huggingface datasets and perform post-processing
# so we don't need to manage file uploading to OpenHands's repo
# dataset = load_dataset(args.dataset, split=args.split)
# swe_bench_tests = filter_dataset(dataset.to_pandas(), 'instance_id')
dataset = load_dataset(args.dataset, split=args.split)
# Convert dataset to pandas DataFrame if it is not already.
if not isinstance(dataset, pd.DataFrame):
dataset = dataset.to_pandas()
dataset['version'] = dataset['version'].astype(str)
# Convert created_at column to string.
dataset['created_at'] = dataset['created_at'].astype(str)
swe_bench_tests = filter_dataset(dataset, 'instance_id')
logger.info(
f'Loaded dataset {args.dataset} with split {args.split}: {len(swe_bench_tests)} tasks'
)
llm_config = None
if args.llm_config:
llm_config = get_llm_config_arg(args.llm_config)
llm_config.log_completions = True
# modify_params must be False for evaluation purpose, for reproducibility and accurancy of results
llm_config.modify_params = False
if llm_config is None:
raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
# Get condenser config from environment variable
condenser_name = os.environ.get('EVAL_CONDENSER')
if condenser_name:
condenser_config = get_condenser_config_arg(condenser_name)
if condenser_config is None:
raise ValueError(
f'Could not find Condenser config: EVAL_CONDENSER={condenser_name}'
)
else:
# If no specific condenser config is provided via env var, default to NoOpCondenser
condenser_config = NoOpCondenserConfig()
logger.debug(
'No Condenser config provided via EVAL_CONDENSER, using NoOpCondenser.'
)
details = {'mode': args.mode}
_agent_cls = openhands.agenthub.Agent.get_cls(args.agent_cls)
dataset_descrption = (
args.dataset.replace('/', '__') + '-' + args.split.replace('/', '__')
)
metadata = make_metadata(
llm_config,
dataset_descrption,
args.agent_cls,
args.max_iterations,
args.eval_note,
args.eval_output_dir,
details=details,
condenser_config=condenser_config,
)
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
print(f'### OUTPUT FILE: {output_file} ###')
# Run evaluation in iterative mode:
# If a rollout fails to output AgentFinishAction, we will try again until it succeeds OR total 3 attempts have been made.
ITERATIVE_EVAL_MODE = (
os.environ.get('ITERATIVE_EVAL_MODE', 'false').lower() == 'true'
)
ITERATIVE_EVAL_MODE_MAX_ATTEMPTS = int(
os.environ.get('ITERATIVE_EVAL_MODE_MAX_ATTEMPTS', '3')
)
# Get all CPUs and divide into groups of num_workers and put them into a multiprocessing.Queue.
cpu_groups_queue = None
cpu_groups_list = divide_cpus_among_workers(args.eval_num_workers, num_to_skip=8)
cpu_groups_queue = multiprocessing.Manager().Queue()
for cpu_group in cpu_groups_list:
cpu_groups_queue.put(cpu_group)
if not ITERATIVE_EVAL_MODE:
# load the dataset
instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
process_instance_with_cpu_groups = functools.partial(
process_instance,
cpu_groups_queue=cpu_groups_queue,
)
config = get_config(
instances.iloc[0], # Use the first instance to get the config
metadata,
cpu_group=None, # We will use the cpu_groups_queue to get the cpu group later
)
run_evaluation(
instances,
metadata,
output_file,
args.eval_num_workers,
process_instance_with_cpu_groups,
timeout_seconds=8
* 60
* 60, # 8 hour PER instance should be more than enough
max_retries=3,
)
else:
critic = AgentFinishedCritic()
def get_cur_output_file_path(attempt: int) -> str:
return (
f'{output_file.removesuffix(".jsonl")}.critic_attempt_{attempt}.jsonl'
)
eval_ids = None
for attempt in range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1):
cur_output_file = get_cur_output_file_path(attempt)
logger.info(
f'Running evaluation with critic {critic.__class__.__name__} for attempt {attempt} of {ITERATIVE_EVAL_MODE_MAX_ATTEMPTS}.'
)
# For deterministic eval, we set temperature to 0.1 for (>1) attempt
# so hopefully we get slightly different results
if attempt > 1 and metadata.llm_config.temperature == 0:
logger.info(
f'Detected temperature is 0 for (>1) attempt {attempt}. Setting temperature to 0.1...'
)
metadata.llm_config.temperature = 0.1
# Load instances - at first attempt, we evaluate all instances
# On subsequent attempts, we only evaluate the instances that failed the previous attempt determined by critic
instances = prepare_dataset(
swe_bench_tests, cur_output_file, args.eval_n_limit, eval_ids=eval_ids
)
if len(instances) > 0 and not isinstance(
instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
):
for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
instances[col] = instances[col].apply(lambda x: str(x))
# Run evaluation - but save them to cur_output_file
logger.info(
f'Evaluating {len(instances)} instances for attempt {attempt}...'
)
run_evaluation(
instances,
metadata,
cur_output_file,
args.eval_num_workers,
process_instance,
timeout_seconds=8
* 60
* 60, # 8 hour PER instance should be more than enough
max_retries=1,
)
# When eval is done, we update eval_ids to the instances that failed the current attempt
instances_failed = []
logger.info(
f'Use critic {critic.__class__.__name__} to check {len(instances)} instances for attempt {attempt}...'
)
with open(cur_output_file, 'r') as f:
for line in f:
instance = json.loads(line)
try:
history = [
event_from_dict(event) for event in instance['history']
]
critic_result = critic.evaluate(
history, instance['test_result'].get('git_patch', '')
)
if not critic_result.success:
instances_failed.append(instance['instance_id'])
except Exception as e:
logger.error(
f'Error loading history for instance {instance["instance_id"]}: {e}'
)
instances_failed.append(instance['instance_id'])
logger.info(
f'{len(instances_failed)} instances failed the current attempt {attempt}: {instances_failed}'
)
eval_ids = instances_failed
# If no instances failed, we break
if len(instances_failed) == 0:
break
# Then we should aggregate the results from all attempts into the original output file
# and remove the intermediate files
logger.info(
'Aggregating results from all attempts into the original output file...'
)
fout = open(output_file, 'w')
added_instance_ids = set()
for attempt in reversed(range(1, ITERATIVE_EVAL_MODE_MAX_ATTEMPTS + 1)):
cur_output_file = get_cur_output_file_path(attempt)
if not os.path.exists(cur_output_file):
logger.warning(
f'Intermediate output file {cur_output_file} does not exist. Skipping...'
)
continue
with open(cur_output_file, 'r') as f:
for line in f:
instance = json.loads(line)
# Also make sure git_patch is not empty - otherwise we fall back to previous attempt (empty patch is worse than anything else)
if (
instance['instance_id'] not in added_instance_ids
and instance['test_result'].get('git_patch', '').strip()
):
fout.write(line)
added_instance_ids.add(instance['instance_id'])
logger.info(
f'Aggregated instances from {cur_output_file}. Total instances added so far: {len(added_instance_ids)}'
)
fout.close()
logger.info(
f'Done! Total {len(added_instance_ids)} instances added to {output_file}'
)

View File

@@ -0,0 +1,148 @@
#!/usr/bin/env bash
set -eo pipefail
source "evaluation/utils/version_control.sh"
MODEL_CONFIG=$1
COMMIT_HASH=$2
AGENT=$3
EVAL_LIMIT=$4
MAX_ITER=$5
NUM_WORKERS=$6
DATASET=$7
SPLIT=$8
N_RUNS=$9
MODE=${10}
if [ -z "$NUM_WORKERS" ]; then
NUM_WORKERS=1
echo "Number of workers not specified, use default $NUM_WORKERS"
fi
checkout_eval_branch
if [ -z "$AGENT" ]; then
echo "Agent not specified, use default CodeActAgent"
AGENT="CodeActAgent"
fi
if [ -z "$MAX_ITER" ]; then
echo "MAX_ITER not specified, use default 100"
MAX_ITER=100
fi
if [ -z "$RUN_WITH_BROWSING" ]; then
echo "RUN_WITH_BROWSING not specified, use default false"
RUN_WITH_BROWSING=false
fi
if [ -z "$DATASET" ]; then
echo "DATASET not specified, use default princeton-nlp/SWE-bench_Lite"
DATASET="swefficiency/swefficiency"
fi
if [ -z "$SPLIT" ]; then
echo "SPLIT not specified, use default test"
SPLIT="test"
fi
if [ -z "$MODE" ]; then
MODE="swe"
echo "MODE not specified, use default $MODE"
fi
if [ -n "$EVAL_CONDENSER" ]; then
echo "Using Condenser Config: $EVAL_CONDENSER"
else
echo "No Condenser Config provided via EVAL_CONDENSER, use default (NoOpCondenser)."
fi
export RUN_WITH_BROWSING=$RUN_WITH_BROWSING
echo "RUN_WITH_BROWSING: $RUN_WITH_BROWSING"
get_openhands_version
echo "AGENT: $AGENT"
echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
echo "MODEL_CONFIG: $MODEL_CONFIG"
echo "DATASET: $DATASET"
echo "SPLIT: $SPLIT"
echo "MAX_ITER: $MAX_ITER"
echo "NUM_WORKERS: $NUM_WORKERS"
echo "COMMIT_HASH: $COMMIT_HASH"
echo "MODE: $MODE"
echo "EVAL_CONDENSER: $EVAL_CONDENSER"
# Default to NOT use Hint
if [ -z "$USE_HINT_TEXT" ]; then
export USE_HINT_TEXT=false
fi
echo "USE_HINT_TEXT: $USE_HINT_TEXT"
EVAL_NOTE="$OPENHANDS_VERSION"
# if not using Hint, add -no-hint to the eval note
if [ "$USE_HINT_TEXT" = false ]; then
EVAL_NOTE="$EVAL_NOTE-no-hint"
fi
if [ "$RUN_WITH_BROWSING" = true ]; then
EVAL_NOTE="$EVAL_NOTE-with-browsing"
fi
if [ -n "$EXP_NAME" ]; then
EVAL_NOTE="$EVAL_NOTE-$EXP_NAME"
fi
# if mode != swe, add mode to the eval note
if [ "$MODE" != "swe" ]; then
EVAL_NOTE="${EVAL_NOTE}-${MODE}"
fi
# Add condenser config to eval note if provided
if [ -n "$EVAL_CONDENSER" ]; then
EVAL_NOTE="${EVAL_NOTE}-${EVAL_CONDENSER}"
fi
# export RUNTIME="remote"
# export SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev"
export NO_CHANGE_TIMEOUT_SECONDS=900 # 15 minutes
function run_eval() {
local eval_note="${1}"
COMMAND="poetry run python evaluation/benchmarks/swefficiency/run_infer.py \
--agent-cls $AGENT \
--llm-config $MODEL_CONFIG \
--max-iterations $MAX_ITER \
--eval-num-workers $NUM_WORKERS \
--eval-note $eval_note \
--dataset $DATASET \
--split $SPLIT \
--mode $MODE"
if [ -n "$EVAL_LIMIT" ]; then
echo "EVAL_LIMIT: $EVAL_LIMIT"
COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
fi
# Run the command
eval $COMMAND
}
unset SANDBOX_ENV_GITHUB_TOKEN # prevent the agent from using the github token to push
if [ -z "$N_RUNS" ]; then
N_RUNS=1
echo "N_RUNS not specified, use default $N_RUNS"
fi
# Skip runs if the run number is in the SKIP_RUNS list
# read from env variable SKIP_RUNS as a comma separated list of run numbers
SKIP_RUNS=(${SKIP_RUNS//,/ })
for i in $(seq 1 $N_RUNS); do
if [[ " ${SKIP_RUNS[@]} " =~ " $i " ]]; then
echo "Skipping run $i"
continue
fi
current_eval_note="$EVAL_NOTE-run_$i"
echo "EVAL_NOTE: $current_eval_note"
run_eval $current_eval_note
done
checkout_original_branch

View File

@@ -0,0 +1,43 @@
#!/usr/bin/env bash
source ~/.bashrc
SWEUTIL_DIR=/swe_util
# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
# SWE_INSTANCE_ID=django__django-11099
if [ -z "$SWE_INSTANCE_ID" ]; then
echo "Error: SWE_INSTANCE_ID is not set." >&2
exit 1
fi
# Read the swe-bench-test-lite.json file and extract the required item based on instance_id
item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' $SWEUTIL_DIR/eval_data/instances/swe-bench-instance.json)
if [[ -z "$item" ]]; then
echo "No item found for the provided instance ID."
exit 1
fi
WORKSPACE_NAME=$(echo "$item" | jq -r '(.repo | tostring) + "__" + (.version | tostring) | gsub("/"; "__")')
echo "WORKSPACE_NAME: $WORKSPACE_NAME"
# Clear the workspace
if [ -d /workspace ]; then
rm -rf /workspace/*
else
mkdir /workspace
fi
# Copy repo to workspace
if [ -d /workspace/$WORKSPACE_NAME ]; then
rm -rf /workspace/$WORKSPACE_NAME
fi
mkdir -p /workspace
cp -r /testbed /workspace/$WORKSPACE_NAME
# Activate instance-specific environment
if [ -d /opt/miniconda3 ]; then
. /opt/miniconda3/etc/profile.d/conda.sh
conda activate testbed
fi

View File

@@ -0,0 +1,27 @@
#!/usr/bin/env bash
set -e
EVAL_WORKSPACE="evaluation/benchmarks/swe_bench/eval_workspace"
mkdir -p $EVAL_WORKSPACE
# 1. Prepare REPO
echo "==== Prepare SWE-bench repo ===="
OH_SWE_BENCH_REPO_PATH="https://github.com/All-Hands-AI/SWE-bench.git"
OH_SWE_BENCH_REPO_BRANCH="eval"
git clone -b $OH_SWE_BENCH_REPO_BRANCH $OH_SWE_BENCH_REPO_PATH $EVAL_WORKSPACE/OH-SWE-bench
# 2. Prepare DATA
echo "==== Prepare SWE-bench data ===="
EVAL_IMAGE=ghcr.io/all-hands-ai/eval-swe-bench:builder_with_conda
EVAL_WORKSPACE=$(realpath $EVAL_WORKSPACE)
chmod +x $EVAL_WORKSPACE/OH-SWE-bench/swebench/harness/prepare_data.sh
if [ -d $EVAL_WORKSPACE/eval_data ]; then
rm -r $EVAL_WORKSPACE/eval_data
fi
docker run \
-v $EVAL_WORKSPACE:/workspace \
-w /workspace \
-u $(id -u):$(id -g) \
-e HF_DATASETS_CACHE="/tmp" \
--rm -it $EVAL_IMAGE \
bash -c "cd OH-SWE-bench/swebench/harness && /swe_util/miniforge3/bin/conda run -n swe-bench-eval ./prepare_data.sh && mv eval_data /workspace/"

View File

@@ -0,0 +1,96 @@
#!/usr/bin/env bash
set -e
# assert user name is `root`
if [ "$USER" != "root" ]; then
echo "Error: This script is intended to be run by the 'root' user only." >&2
exit 1
fi
source ~/.bashrc
SWEUTIL_DIR=/swe_util
# Create logs directory
LOG_DIR=/openhands/logs
mkdir -p $LOG_DIR && chmod 777 $LOG_DIR
# FIXME: Cannot read SWE_INSTANCE_ID from the environment variable
# SWE_INSTANCE_ID=django__django-11099
if [ -z "$SWE_INSTANCE_ID" ]; then
echo "Error: SWE_INSTANCE_ID is not set." >&2
exit 1
fi
# Read the swe-bench-test-lite.json file and extract the required item based on instance_id
item=$(jq --arg INSTANCE_ID "$SWE_INSTANCE_ID" '.[] | select(.instance_id == $INSTANCE_ID)' $SWEUTIL_DIR/eval_data/instances/swe-bench-test-lite.json)
if [[ -z "$item" ]]; then
echo "No item found for the provided instance ID."
exit 1
fi
CONDA_ENV_NAME=$(echo "$item" | jq -r '.repo + "__" + .version | gsub("/"; "__")')
echo "CONDA_ENV_NAME: $CONDA_ENV_NAME"
SWE_TASK_DIR=/openhands/swe_tasks
mkdir -p $SWE_TASK_DIR
# Dump test_patch to /workspace/test.patch
echo "$item" | jq -r '.test_patch' > $SWE_TASK_DIR/test.patch
# Dump patch to /workspace/gold.patch
echo "$item" | jq -r '.patch' > $SWE_TASK_DIR/gold.patch
# Dump the item to /workspace/instance.json except for the "test_patch" and "patch" fields
echo "$item" | jq 'del(.test_patch, .patch)' > $SWE_TASK_DIR/instance.json
# Clear the workspace
rm -rf /workspace/*
# Copy repo to workspace
if [ -d /workspace/$CONDA_ENV_NAME ]; then
rm -rf /workspace/$CONDA_ENV_NAME
fi
cp -r $SWEUTIL_DIR/eval_data/testbeds/$CONDA_ENV_NAME /workspace
# Reset swe-bench testbed and install the repo
. $SWEUTIL_DIR/miniforge3/etc/profile.d/conda.sh
conda config --set changeps1 False
conda config --append channels conda-forge
conda activate swe-bench-eval
mkdir -p $SWE_TASK_DIR/reset_testbed_temp
mkdir -p $SWE_TASK_DIR/reset_testbed_log_dir
SWE_BENCH_DIR=/swe_util/OH-SWE-bench
output=$(
export PYTHONPATH=$SWE_BENCH_DIR && \
cd $SWE_BENCH_DIR && \
python swebench/harness/reset_swe_env.py \
--swe_bench_tasks $SWEUTIL_DIR/eval_data/instances/swe-bench-test.json \
--temp_dir $SWE_TASK_DIR/reset_testbed_temp \
--testbed /workspace \
--conda_path $SWEUTIL_DIR/miniforge3 \
--instance_id $SWE_INSTANCE_ID \
--log_dir $SWE_TASK_DIR/reset_testbed_log_dir \
--timeout 900 \
--verbose
)
REPO_PATH=$(echo "$output" | awk -F': ' '/repo_path:/ {print $2}')
TEST_CMD=$(echo "$output" | awk -F': ' '/test_cmd:/ {print $2}')
echo "Repo Path: $REPO_PATH"
echo "Test Command: $TEST_CMD"
echo "export SWE_BENCH_DIR=\"$SWE_BENCH_DIR\"" >> ~/.bashrc
echo "export REPO_PATH=\"$REPO_PATH\"" >> ~/.bashrc
echo "export TEST_CMD=\"$TEST_CMD\"" >> ~/.bashrc
if [[ "$REPO_PATH" == "None" ]]; then
echo "Error: Failed to retrieve repository path. Tests may not have passed or output was not as expected." >&2
exit 1
fi
# Activate instance-specific environment
. $SWEUTIL_DIR/miniforge3/etc/profile.d/conda.sh
conda activate $CONDA_ENV_NAME
set +e

View File

@@ -1,69 +0,0 @@
# Integration tests
This directory implements integration tests that [was running in CI](https://github.com/OpenHands/OpenHands/tree/23d3becf1d6f5d07e592f7345750c314a826b4e9/tests/integration).
[PR 3985](https://github.com/OpenHands/OpenHands/pull/3985) introduce LLM-based editing, which requires access to LLM to perform edit. Hence, we remove integration tests from CI and intend to run them as nightly evaluation to ensure the quality of OpenHands softwares.
## To add new tests
Each test is a file named like `tXX_testname.py` where `XX` is a number.
Make sure to name the file for each test to start with `t` and ends with `.py`.
Each test should be structured as a subclass of [`BaseIntegrationTest`](./tests/base.py), where you need to implement `initialize_runtime` that setup the runtime enviornment before test, and `verify_result` that takes in a `Runtime` and history of `Event` and return a `TestResult`. See [t01_fix_simple_typo.py](./tests/t01_fix_simple_typo.py) and [t05_simple_browsing.py](./tests/t05_simple_browsing.py) for two representative examples.
```python
class TestResult(BaseModel):
success: bool
reason: str | None = None
class BaseIntegrationTest(ABC):
"""Base class for integration tests."""
INSTRUCTION: str
@classmethod
@abstractmethod
def initialize_runtime(cls, runtime: Runtime) -> None:
"""Initialize the runtime for the test to run."""
pass
@classmethod
@abstractmethod
def verify_result(cls, runtime: Runtime, histories: list[Event]) -> TestResult:
"""Verify the result of the test.
This method will be called after the agent performs the task on the runtime.
"""
pass
```
## Setup Environment and LLM Configuration
Please follow instruction [here](../README.md#setup) to setup your local
development environment and LLM.
## Start the evaluation
```bash
./evaluation/integration_tests/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [eval-num-workers] [eval_ids]
```
- `model_config`, e.g. `eval_gpt4_1106_preview`, is the config group name for
your LLM settings, as defined in your `config.toml`.
- `git-version`, e.g. `HEAD`, is the git commit hash of the OpenHands version
you would like to evaluate. It could also be a release tag like `0.9.0`.
- `agent`, e.g. `CodeActAgent`, is the name of the agent for benchmarks,
defaulting to `CodeActAgent`.
- `eval_limit`, e.g. `10`, limits the evaluation to the first `eval_limit`
instances. By default, the script evaluates the entire Exercism test set
(133 issues). Note: in order to use `eval_limit`, you must also set `agent`.
- `eval-num-workers`: the number of workers to use for evaluation. Default: `1`.
- `eval_ids`, e.g. `"1,3,10"`, limits the evaluation to instances with the
given IDs (comma separated).
Example:
```bash
./evaluation/integration_tests/scripts/run_infer.sh llm.claude-35-sonnet-eval HEAD CodeActAgent
```

View File

@@ -1,251 +0,0 @@
import asyncio
import importlib.util
import os
import pandas as pd
from evaluation.integration_tests.tests.base import BaseIntegrationTest, TestResult
from evaluation.utils.shared import (
EvalMetadata,
EvalOutput,
get_default_sandbox_config_for_eval,
get_metrics,
get_openhands_config_for_eval,
make_metadata,
prepare_dataset,
reset_logger_for_multiprocessing,
run_evaluation,
update_llm_config_for_completions_logging,
)
from evaluation.utils.shared import (
codeact_user_response as fake_user_response,
)
from openhands.controller.state.state import State
from openhands.core.config import (
AgentConfig,
OpenHandsConfig,
get_evaluation_parser,
get_llm_config_arg,
)
from openhands.core.logger import openhands_logger as logger
from openhands.core.main import create_runtime, run_controller
from openhands.events.action import MessageAction
from openhands.events.serialization.event import event_to_dict
from openhands.runtime.base import Runtime
from openhands.utils.async_utils import call_async_from_sync
FAKE_RESPONSES = {
'CodeActAgent': fake_user_response,
'VisualBrowsingAgent': fake_user_response,
}
def get_config(
metadata: EvalMetadata,
instance_id: str,
) -> OpenHandsConfig:
sandbox_config = get_default_sandbox_config_for_eval()
sandbox_config.platform = 'linux/amd64'
config = get_openhands_config_for_eval(
metadata=metadata,
runtime=os.environ.get('RUNTIME', 'docker'),
sandbox_config=sandbox_config,
)
config.debug = True
config.set_llm_config(
update_llm_config_for_completions_logging(
metadata.llm_config, metadata.eval_output_dir, instance_id
)
)
agent_config = AgentConfig(
enable_jupyter=True,
enable_browsing=True,
enable_llm_editor=False,
)
config.set_agent_config(agent_config)
return config
def process_instance(
instance: pd.Series,
metadata: EvalMetadata,
reset_logger: bool = True,
) -> EvalOutput:
config = get_config(metadata, instance.instance_id)
# Setup the logger properly, so you can run multi-processing to parallelize the evaluation
if reset_logger:
log_dir = os.path.join(metadata.eval_output_dir, 'infer_logs')
reset_logger_for_multiprocessing(logger, str(instance.instance_id), log_dir)
else:
logger.info(
f'\nStarting evaluation for instance {str(instance.instance_id)}.\n'
)
# =============================================
# import test instance
# =============================================
instance_id = instance.instance_id
spec = importlib.util.spec_from_file_location(instance_id, instance.file_path)
test_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(test_module)
assert hasattr(test_module, 'Test'), (
f'Test module {instance_id} does not have a Test class'
)
test_class: type[BaseIntegrationTest] = test_module.Test
assert issubclass(test_class, BaseIntegrationTest), (
f'Test class {instance_id} does not inherit from BaseIntegrationTest'
)
instruction = test_class.INSTRUCTION
# =============================================
# create sandbox and run the agent
# =============================================
runtime: Runtime = create_runtime(config)
call_async_from_sync(runtime.connect)
try:
test_class.initialize_runtime(runtime)
# Here's how you can run the agent (similar to the `main` function) and get the final task state
state: State | None = asyncio.run(
run_controller(
config=config,
initial_user_action=MessageAction(content=instruction),
runtime=runtime,
fake_user_response_fn=FAKE_RESPONSES[metadata.agent_class],
)
)
if state is None:
raise ValueError('State should not be None.')
# # =============================================
# # result evaluation
# # =============================================
histories = state.history
# some basic check
logger.info(f'Total events in history: {len(histories)}')
assert len(histories) > 0, 'History should not be empty'
test_result: TestResult = test_class.verify_result(runtime, histories)
metrics = get_metrics(state)
finally:
runtime.close()
# Save the output
output = EvalOutput(
instance_id=str(instance.instance_id),
instance=instance.to_dict(),
instruction=instruction,
metadata=metadata,
history=[event_to_dict(event) for event in histories],
metrics=metrics,
error=state.last_error if state and state.last_error else None,
test_result=test_result.model_dump(),
)
return output
def load_integration_tests() -> pd.DataFrame:
"""Load tests from python files under ./tests"""
cur_dir = os.path.dirname(os.path.abspath(__file__))
test_dir = os.path.join(cur_dir, 'tests')
test_files = [
os.path.join(test_dir, f)
for f in os.listdir(test_dir)
if f.startswith('t') and f.endswith('.py')
]
df = pd.DataFrame(test_files, columns=['file_path'])
df['instance_id'] = df['file_path'].apply(
lambda x: os.path.basename(x).rstrip('.py')
)
return df
if __name__ == '__main__':
parser = get_evaluation_parser()
args, _ = parser.parse_known_args()
integration_tests = load_integration_tests()
llm_config = None
if args.llm_config:
llm_config = get_llm_config_arg(args.llm_config)
if llm_config is None:
raise ValueError(f'Could not find LLM config: --llm_config {args.llm_config}')
metadata = make_metadata(
llm_config,
'integration_tests',
args.agent_cls,
args.max_iterations,
args.eval_note,
args.eval_output_dir,
)
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
# Parse dataset IDs if provided
eval_ids = None
if args.eval_ids:
eval_ids = str(args.eval_ids).split(',')
logger.info(f'\nUsing specific dataset IDs: {eval_ids}\n')
instances = prepare_dataset(
integration_tests,
output_file,
args.eval_n_limit,
eval_ids=eval_ids,
)
run_evaluation(
instances,
metadata,
output_file,
args.eval_num_workers,
process_instance,
)
df = pd.read_json(output_file, lines=True, orient='records')
# record success and reason
df['success'] = df['test_result'].apply(lambda x: x['success'])
df['reason'] = df['test_result'].apply(lambda x: x['reason'])
logger.info('-' * 100)
logger.info(
f'Success rate: {df["success"].mean():.2%} ({df["success"].sum()}/{len(df)})'
)
logger.info(
'\nEvaluation Results:'
+ '\n'
+ df[['instance_id', 'success', 'reason']].to_string(index=False)
)
logger.info('-' * 100)
# record cost for each instance, with 3 decimal places
# we sum up all the "costs" from the metrics array
df['cost'] = df['metrics'].apply(
lambda m: round(sum(c['cost'] for c in m['costs']), 3)
if m and 'costs' in m
else 0.0
)
# capture the top-level error if present, per instance
df['error_message'] = df.get('error', None)
logger.info(f'Total cost: USD {df["cost"].sum():.2f}')
report_file = os.path.join(metadata.eval_output_dir, 'report.md')
with open(report_file, 'w') as f:
f.write(
f'Success rate: {df["success"].mean():.2%}'
f' ({df["success"].sum()}/{len(df)})\n'
)
f.write(f'\nTotal cost: USD {df["cost"].sum():.2f}\n')
f.write(
df[
['instance_id', 'success', 'reason', 'cost', 'error_message']
].to_markdown(index=False)
)

View File

@@ -1,62 +0,0 @@
#!/usr/bin/env bash
set -eo pipefail
source "evaluation/utils/version_control.sh"
MODEL_CONFIG=$1
COMMIT_HASH=$2
AGENT=$3
EVAL_LIMIT=$4
MAX_ITERATIONS=$5
NUM_WORKERS=$6
EVAL_IDS=$7
if [ -z "$NUM_WORKERS" ]; then
NUM_WORKERS=1
echo "Number of workers not specified, use default $NUM_WORKERS"
fi
checkout_eval_branch
if [ -z "$AGENT" ]; then
echo "Agent not specified, use default CodeActAgent"
AGENT="CodeActAgent"
fi
get_openhands_version
echo "AGENT: $AGENT"
echo "OPENHANDS_VERSION: $OPENHANDS_VERSION"
echo "MODEL_CONFIG: $MODEL_CONFIG"
EVAL_NOTE=$OPENHANDS_VERSION
# Default to NOT use unit tests.
if [ -z "$USE_UNIT_TESTS" ]; then
export USE_UNIT_TESTS=false
fi
echo "USE_UNIT_TESTS: $USE_UNIT_TESTS"
# If use unit tests, set EVAL_NOTE to the commit hash
if [ "$USE_UNIT_TESTS" = true ]; then
EVAL_NOTE=$EVAL_NOTE-w-test
fi
# export PYTHONPATH=evaluation/integration_tests:\$PYTHONPATH
COMMAND="poetry run python evaluation/integration_tests/run_infer.py \
--agent-cls $AGENT \
--llm-config $MODEL_CONFIG \
--max-iterations ${MAX_ITERATIONS:-10} \
--eval-num-workers $NUM_WORKERS \
--eval-note $EVAL_NOTE"
if [ -n "$EVAL_LIMIT" ]; then
echo "EVAL_LIMIT: $EVAL_LIMIT"
COMMAND="$COMMAND --eval-n-limit $EVAL_LIMIT"
fi
if [ -n "$EVAL_IDS" ]; then
echo "EVAL_IDS: $EVAL_IDS"
COMMAND="$COMMAND --eval-ids $EVAL_IDS"
fi
# Run the command
eval $COMMAND

View File

@@ -1,32 +0,0 @@
from abc import ABC, abstractmethod
from pydantic import BaseModel
from openhands.events.event import Event
from openhands.runtime.base import Runtime
class TestResult(BaseModel):
success: bool
reason: str | None = None
class BaseIntegrationTest(ABC):
"""Base class for integration tests."""
INSTRUCTION: str
@classmethod
@abstractmethod
def initialize_runtime(cls, runtime: Runtime) -> None:
"""Initialize the runtime for the test to run."""
pass
@classmethod
@abstractmethod
def verify_result(cls, runtime: Runtime, histories: list[Event]) -> TestResult:
"""Verify the result of the test.
This method will be called after the agent performs the task on the runtime.
"""
pass

View File

@@ -1,39 +0,0 @@
import os
import tempfile
from evaluation.integration_tests.tests.base import BaseIntegrationTest, TestResult
from openhands.events.action import CmdRunAction
from openhands.events.event import Event
from openhands.runtime.base import Runtime
class Test(BaseIntegrationTest):
INSTRUCTION = 'Fix typos in bad.txt.'
@classmethod
def initialize_runtime(cls, runtime: Runtime) -> None:
# create a file with a typo in /workspace/bad.txt
with tempfile.TemporaryDirectory() as temp_dir:
temp_file_path = os.path.join(temp_dir, 'bad.txt')
with open(temp_file_path, 'w') as f:
f.write('This is a stupid typoo.\nReally?\nNo mor typos!\nEnjoy!')
# Copy the file to the desired location
runtime.copy_to(temp_file_path, '/workspace')
@classmethod
def verify_result(cls, runtime: Runtime, histories: list[Event]) -> TestResult:
# check if the file /workspace/bad.txt has been fixed
action = CmdRunAction(command='cat /workspace/bad.txt')
obs = runtime.run_action(action)
if obs.exit_code != 0:
return TestResult(
success=False, reason=f'Failed to run command: {obs.content}'
)
# check if the file /workspace/bad.txt has been fixed
if (
obs.content.strip().replace('\r\n', '\n')
== 'This is a stupid typo.\nReally?\nNo more typos!\nEnjoy!'
):
return TestResult(success=True)
return TestResult(success=False, reason=f'File not fixed: {obs.content}')

View File

@@ -1,40 +0,0 @@
from evaluation.integration_tests.tests.base import BaseIntegrationTest, TestResult
from evaluation.utils.shared import assert_and_raise
from openhands.events.action import CmdRunAction
from openhands.events.event import Event
from openhands.runtime.base import Runtime
class Test(BaseIntegrationTest):
INSTRUCTION = "Write a shell script '/workspace/hello.sh' that prints 'hello'."
@classmethod
def initialize_runtime(cls, runtime: Runtime) -> None:
action = CmdRunAction(command='mkdir -p /workspace')
obs = runtime.run_action(action)
assert_and_raise(obs.exit_code == 0, f'Failed to run command: {obs.content}')
@classmethod
def verify_result(cls, runtime: Runtime, histories: list[Event]) -> TestResult:
# check if the file /workspace/hello.sh exists
action = CmdRunAction(command='cat /workspace/hello.sh')
obs = runtime.run_action(action)
if obs.exit_code != 0:
return TestResult(
success=False,
reason=f'Failed to cat /workspace/hello.sh: {obs.content}.',
)
# execute the script
action = CmdRunAction(command='bash /workspace/hello.sh')
obs = runtime.run_action(action)
if obs.exit_code != 0:
return TestResult(
success=False,
reason=f'Failed to execute /workspace/hello.sh: {obs.content}.',
)
if obs.content.strip() != 'hello':
return TestResult(
success=False, reason=f'Script did not print "hello": {obs.content}.'
)
return TestResult(success=True)

View File

@@ -1,43 +0,0 @@
from evaluation.integration_tests.tests.base import BaseIntegrationTest, TestResult
from evaluation.utils.shared import assert_and_raise
from openhands.events.action import CmdRunAction
from openhands.events.event import Event
from openhands.runtime.base import Runtime
class Test(BaseIntegrationTest):
INSTRUCTION = "Use Jupyter IPython to write a text file containing 'hello world' to '/workspace/test.txt'."
@classmethod
def initialize_runtime(cls, runtime: Runtime) -> None:
action = CmdRunAction(command='mkdir -p /workspace')
obs = runtime.run_action(action)
assert_and_raise(obs.exit_code == 0, f'Failed to run command: {obs.content}')
@classmethod
def verify_result(cls, runtime: Runtime, histories: list[Event]) -> TestResult:
# check if the file /workspace/hello.sh exists
action = CmdRunAction(command='cat /workspace/test.txt')
obs = runtime.run_action(action)
if obs.exit_code != 0:
return TestResult(
success=False,
reason=f'Failed to cat /workspace/test.txt: {obs.content}.',
)
# execute the script
action = CmdRunAction(command='cat /workspace/test.txt')
obs = runtime.run_action(action)
if obs.exit_code != 0:
return TestResult(
success=False,
reason=f'Failed to cat /workspace/test.txt: {obs.content}.',
)
if 'hello world' not in obs.content.strip():
return TestResult(
success=False,
reason=f'File did not contain "hello world": {obs.content}.',
)
return TestResult(success=True)

View File

@@ -1,57 +0,0 @@
from evaluation.integration_tests.tests.base import BaseIntegrationTest, TestResult
from evaluation.utils.shared import assert_and_raise
from openhands.events.action import CmdRunAction
from openhands.events.event import Event
from openhands.runtime.base import Runtime
class Test(BaseIntegrationTest):
INSTRUCTION = 'Write a git commit message for the current staging area and commit the changes.'
@classmethod
def initialize_runtime(cls, runtime: Runtime) -> None:
action = CmdRunAction(command='mkdir -p /workspace')
obs = runtime.run_action(action)
assert_and_raise(obs.exit_code == 0, f'Failed to run command: {obs.content}')
# git init
action = CmdRunAction(command='git init')
obs = runtime.run_action(action)
assert_and_raise(obs.exit_code == 0, f'Failed to run command: {obs.content}')
# create file
action = CmdRunAction(command='echo \'print("hello world")\' > hello.py')
obs = runtime.run_action(action)
assert_and_raise(obs.exit_code == 0, f'Failed to run command: {obs.content}')
# git add
cmd_str = 'git add hello.py'
action = CmdRunAction(command=cmd_str)
obs = runtime.run_action(action)
assert_and_raise(obs.exit_code == 0, f'Failed to run command: {obs.content}')
@classmethod
def verify_result(cls, runtime: Runtime, histories: list[Event]) -> TestResult:
# check if the file /workspace/hello.py exists
action = CmdRunAction(command='cat /workspace/hello.py')
obs = runtime.run_action(action)
if obs.exit_code != 0:
return TestResult(
success=False,
reason=f'Failed to cat /workspace/hello.py: {obs.content}.',
)
# check if the staging area is empty
action = CmdRunAction(command='git status')
obs = runtime.run_action(action)
if obs.exit_code != 0:
return TestResult(
success=False, reason=f'Failed to git status: {obs.content}.'
)
if 'nothing to commit, working tree clean' in obs.content.strip():
return TestResult(success=True)
return TestResult(
success=False,
reason=f'Failed to check for "nothing to commit, working tree clean": {obs.content}.',
)

View File

@@ -1,145 +0,0 @@
import os
import tempfile
from evaluation.integration_tests.tests.base import BaseIntegrationTest, TestResult
from evaluation.utils.shared import assert_and_raise
from openhands.events.action import AgentFinishAction, CmdRunAction, MessageAction
from openhands.events.event import Event
from openhands.events.observation import AgentDelegateObservation
from openhands.runtime.base import Runtime
HTML_FILE = """
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>The Ultimate Answer</title>
<style>
body {
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
margin: 0;
background: linear-gradient(to right, #1e3c72, #2a5298);
color: #fff;
font-family: 'Arial', sans-serif;
text-align: center;
}
.container {
text-align: center;
padding: 20px;
background: rgba(255, 255, 255, 0.1);
border-radius: 10px;
box-shadow: 0 0 10px rgba(0, 0, 0, 0.2);
}
h1 {
font-size: 36px;
margin-bottom: 20px;
}
p {
font-size: 18px;
margin-bottom: 30px;
}
#showButton {
padding: 10px 20px;
font-size: 16px;
color: #1e3c72;
background: #fff;
border: none;
border-radius: 5px;
cursor: pointer;
transition: background 0.3s ease;
}
#showButton:hover {
background: #f0f0f0;
}
#result {
margin-top: 20px;
font-size: 24px;
}
</style>
</head>
<body>
<div class="container">
<h1>The Ultimate Answer</h1>
<p>Click the button to reveal the answer to life, the universe, and everything.</p>
<button id="showButton">Click me</button>
<div id="result"></div>
</div>
<script>
document.getElementById('showButton').addEventListener('click', function() {
document.getElementById('result').innerText = 'The answer is OpenHands is all you need!';
});
</script>
</body>
</html>
"""
class Test(BaseIntegrationTest):
INSTRUCTION = 'Browse localhost:8000, and tell me the ultimate answer to life.'
@classmethod
def initialize_runtime(cls, runtime: Runtime) -> None:
action = CmdRunAction(command='mkdir -p /workspace')
obs = runtime.run_action(action)
assert_and_raise(obs.exit_code == 0, f'Failed to run command: {obs.content}')
action = CmdRunAction(command='mkdir -p /tmp/server')
obs = runtime.run_action(action)
assert_and_raise(obs.exit_code == 0, f'Failed to run command: {obs.content}')
# create a file with a typo in /workspace/bad.txt
with tempfile.TemporaryDirectory() as temp_dir:
temp_file_path = os.path.join(temp_dir, 'index.html')
with open(temp_file_path, 'w') as f:
f.write(HTML_FILE)
# Copy the file to the desired location
runtime.copy_to(temp_file_path, '/tmp/server')
# create README.md
action = CmdRunAction(
command='cd /tmp/server && nohup python3 -m http.server 8000 &'
)
obs = runtime.run_action(action)
@classmethod
def verify_result(cls, runtime: Runtime, histories: list[Event]) -> TestResult:
from openhands.core.logger import openhands_logger as logger
# check if the "The answer is OpenHands is all you need!" is in any message
message_actions = [
event
for event in histories
if isinstance(
event, (MessageAction, AgentFinishAction, AgentDelegateObservation)
)
]
logger.debug(f'Total message-like events: {len(message_actions)}')
for event in message_actions:
try:
if isinstance(event, AgentDelegateObservation):
content = event.content
elif isinstance(event, AgentFinishAction):
content = event.outputs.get('content', '')
elif isinstance(event, MessageAction):
content = event.content
else:
logger.warning(f'Unexpected event type: {type(event)}')
continue
if 'OpenHands is all you need!' in content:
return TestResult(success=True)
except Exception as e:
logger.error(f'Error processing event: {e}')
logger.debug(
f'Total messages: {len(message_actions)}. Messages: {message_actions}'
)
return TestResult(
success=False,
reason=f'The answer is not found in any message. Total messages: {len(message_actions)}.',
)

View File

@@ -1,58 +0,0 @@
from evaluation.integration_tests.tests.base import BaseIntegrationTest, TestResult
from openhands.events.action import AgentFinishAction, MessageAction
from openhands.events.event import Event
from openhands.events.observation import AgentDelegateObservation
from openhands.runtime.base import Runtime
class Test(BaseIntegrationTest):
INSTRUCTION = 'Look at https://github.com/OpenHands/OpenHands/pull/8, and tell me what is happening there and what did @asadm suggest.'
@classmethod
def initialize_runtime(cls, runtime: Runtime) -> None:
pass
@classmethod
def verify_result(cls, runtime: Runtime, histories: list[Event]) -> TestResult:
from openhands.core.logger import openhands_logger as logger
# check if the license information is in any message
message_actions = [
event
for event in histories
if isinstance(
event, (MessageAction, AgentFinishAction, AgentDelegateObservation)
)
]
logger.info(f'Total message-like events: {len(message_actions)}')
for event in message_actions:
try:
if isinstance(event, AgentDelegateObservation):
content = event.content
elif isinstance(event, AgentFinishAction):
content = event.outputs.get('content', '')
if event.thought:
content += f'\n\n{event.thought}'
elif isinstance(event, MessageAction):
content = event.content
else:
logger.warning(f'Unexpected event type: {type(event)}')
continue
if (
'non-commercial' in content
or 'MIT' in content
or 'Apache 2.0' in content
):
return TestResult(success=True)
except Exception as e:
logger.error(f'Error processing event: {e}')
logger.debug(
f'Total messages: {len(message_actions)}. Messages: {message_actions}'
)
return TestResult(
success=False,
reason=f'The answer is not found in any message. Total messages: {len(message_actions)}.',
)

View File

@@ -1,73 +0,0 @@
import hashlib
from evaluation.integration_tests.tests.base import BaseIntegrationTest, TestResult
from openhands.events.action import (
AgentFinishAction,
FileWriteAction,
MessageAction,
)
from openhands.events.event import Event
from openhands.events.observation import AgentDelegateObservation
from openhands.runtime.base import Runtime
class Test(BaseIntegrationTest):
INSTRUCTION = 'Execute the python script /workspace/python_script.py with input "John" and "25" and tell me the secret number.'
SECRET_NUMBER = int(hashlib.sha256(str(25).encode()).hexdigest()[:8], 16) % 1000
@classmethod
def initialize_runtime(cls, runtime: Runtime) -> None:
from openhands.core.logger import openhands_logger as logger
action = FileWriteAction(
path='/workspace/python_script.py',
content=(
'name = input("Enter your name: "); age = input("Enter your age: "); '
'import hashlib; secret = int(hashlib.sha256(str(age).encode()).hexdigest()[:8], 16) % 1000; '
'print(f"Hello {name}, you are {age} years old. Tell you a secret number: {secret}")'
),
)
logger.info(action, extra={'msg_type': 'ACTION'})
observation = runtime.run_action(action)
logger.info(observation, extra={'msg_type': 'OBSERVATION'})
@classmethod
def verify_result(cls, runtime: Runtime, histories: list[Event]) -> TestResult:
from openhands.core.logger import openhands_logger as logger
# check if the license information is in any message
message_actions = [
event
for event in histories
if isinstance(
event, (MessageAction, AgentFinishAction, AgentDelegateObservation)
)
]
logger.info(f'Total message-like events: {len(message_actions)}')
for event in message_actions:
try:
if isinstance(event, AgentDelegateObservation):
content = event.content
elif isinstance(event, AgentFinishAction):
content = event.outputs.get('content', '')
if event.thought:
content += f'\n\n{event.thought}'
elif isinstance(event, MessageAction):
content = event.content
else:
logger.warning(f'Unexpected event type: {type(event)}')
continue
if str(cls.SECRET_NUMBER) in content:
return TestResult(success=True)
except Exception as e:
logger.error(f'Error processing event: {e}')
logger.debug(
f'Total messages: {len(message_actions)}. Messages: {message_actions}'
)
return TestResult(
success=False,
reason=f'The answer is not found in any message. Total messages: {len(message_actions)}.',
)

View File

@@ -9,7 +9,7 @@ import time
import traceback
from contextlib import contextmanager
from inspect import signature
from typing import Any, Awaitable, Callable, TextIO
from typing import Any, Awaitable, Callable, Optional, TextIO
import pandas as pd
from pydantic import BaseModel
@@ -222,6 +222,7 @@ def prepare_dataset(
eval_n_limit: int,
eval_ids: list[str] | None = None,
skip_num: int | None = None,
filter_func: Optional[Callable[[pd.DataFrame], pd.DataFrame]] = None,
):
assert 'instance_id' in dataset.columns, (
"Expected 'instance_id' column in the dataset. You should define your own unique identifier for each instance and use it as the 'instance_id' column."
@@ -265,6 +266,12 @@ def prepare_dataset(
f'Randomly sampling {eval_n_limit} unique instances with random seed 42.'
)
if filter_func is not None:
dataset = filter_func(dataset)
logger.info(
f'Applied filter after sampling: {len(dataset)} instances remaining'
)
def make_serializable(instance_dict: dict) -> dict:
import numpy as np

View File

@@ -33,9 +33,24 @@ describe("AccountSettingsContextMenu", () => {
expect(
screen.getByTestId("account-settings-context-menu"),
).toBeInTheDocument();
expect(screen.getByText("SIDEBAR$DOCS")).toBeInTheDocument();
expect(screen.getByText("ACCOUNT_SETTINGS$LOGOUT")).toBeInTheDocument();
});
it("should render Documentation link with correct attributes", () => {
renderWithRouter(
<AccountSettingsContextMenu
onLogout={onLogoutMock}
onClose={onCloseMock}
/>,
);
const documentationLink = screen.getByText("SIDEBAR$DOCS").closest("a");
expect(documentationLink).toHaveAttribute("href", "https://docs.openhands.dev");
expect(documentationLink).toHaveAttribute("target", "_blank");
expect(documentationLink).toHaveAttribute("rel", "noopener noreferrer");
});
it("should call onLogout when the logout option is clicked", async () => {
renderWithRouter(
<AccountSettingsContextMenu

View File

@@ -8,6 +8,13 @@ vi.mock("#/hooks/use-auth-url", () => ({
useAuthUrl: () => "https://gitlab.com/oauth/authorize",
}));
// Mock the useTracking hook
vi.mock("#/hooks/use-tracking", () => ({
useTracking: () => ({
trackLoginButtonClick: vi.fn(),
}),
}));
describe("AuthModal", () => {
beforeEach(() => {
vi.stubGlobal("location", { href: "" });

View File

@@ -8,10 +8,11 @@ vi.mock("react-i18next", () => ({
useTranslation: () => ({
t: (key: string) => {
const translations: Record<string, string> = {
"TASK_TRACKING_OBSERVATION$TASK_LIST": "Task List",
"TASK_TRACKING_OBSERVATION$TASK_ID": "ID",
"TASK_TRACKING_OBSERVATION$TASK_NOTES": "Notes",
"TASK_TRACKING_OBSERVATION$RESULT": "Result",
TASK_TRACKING_OBSERVATION$TASK_LIST: "Task List",
TASK_TRACKING_OBSERVATION$TASK_ID: "ID",
TASK_TRACKING_OBSERVATION$TASK_NOTES: "Notes",
TASK_TRACKING_OBSERVATION$RESULT: "Result",
COMMON$TASKS: "Tasks",
};
return translations[key] || key;
},
@@ -61,19 +62,26 @@ describe("TaskTrackingObservationContent", () => {
it("renders task list when command is 'plan' and tasks exist", () => {
render(<TaskTrackingObservationContent event={mockEvent} />);
expect(screen.getByText("Task List (3 items)")).toBeInTheDocument();
expect(screen.getByText("Tasks")).toBeInTheDocument();
expect(screen.getByText("Implement feature A")).toBeInTheDocument();
expect(screen.getByText("Fix bug B")).toBeInTheDocument();
expect(screen.getByText("Deploy to production")).toBeInTheDocument();
});
it("displays correct status icons and badges", () => {
render(<TaskTrackingObservationContent event={mockEvent} />);
const { container } = render(
<TaskTrackingObservationContent event={mockEvent} />,
);
// Check for status text (the icons are emojis)
expect(screen.getByText("todo")).toBeInTheDocument();
expect(screen.getByText("in progress")).toBeInTheDocument();
expect(screen.getByText("done")).toBeInTheDocument();
// Status is represented by icons, not text. Verify task items are rendered with their titles
// which indicates the status icons are present (status affects icon rendering)
expect(screen.getByText("Implement feature A")).toBeInTheDocument();
expect(screen.getByText("Fix bug B")).toBeInTheDocument();
expect(screen.getByText("Deploy to production")).toBeInTheDocument();
// Verify task items are present (they contain the status icons)
const taskItems = container.querySelectorAll('[data-name="item"]');
expect(taskItems).toHaveLength(3);
});
it("displays task IDs and notes", () => {
@@ -84,14 +92,9 @@ describe("TaskTrackingObservationContent", () => {
expect(screen.getByText("ID: task-3")).toBeInTheDocument();
expect(screen.getByText("Notes: This is a test task")).toBeInTheDocument();
expect(screen.getByText("Notes: Completed successfully")).toBeInTheDocument();
});
it("renders result section when content exists", () => {
render(<TaskTrackingObservationContent event={mockEvent} />);
expect(screen.getByText("Result")).toBeInTheDocument();
expect(screen.getByText("Task tracking operation completed successfully")).toBeInTheDocument();
expect(
screen.getByText("Notes: Completed successfully"),
).toBeInTheDocument();
});
it("does not render task list when command is not 'plan'", () => {
@@ -105,7 +108,7 @@ describe("TaskTrackingObservationContent", () => {
render(<TaskTrackingObservationContent event={eventWithoutPlan} />);
expect(screen.queryByText("Task List")).not.toBeInTheDocument();
expect(screen.queryByText("Tasks")).not.toBeInTheDocument();
});
it("does not render task list when task list is empty", () => {
@@ -119,17 +122,6 @@ describe("TaskTrackingObservationContent", () => {
render(<TaskTrackingObservationContent event={eventWithEmptyTasks} />);
expect(screen.queryByText("Task List")).not.toBeInTheDocument();
});
it("does not render result section when content is empty", () => {
const eventWithoutContent = {
...mockEvent,
content: "",
};
render(<TaskTrackingObservationContent event={eventWithoutContent} />);
expect(screen.queryByText("Result")).not.toBeInTheDocument();
expect(screen.queryByText("Tasks")).not.toBeInTheDocument();
});
});

View File

@@ -13,34 +13,6 @@ vi.mock("#/hooks/use-agent-state", () => ({
useAgentState: vi.fn(),
}));
// Mock the custom hooks
const mockStartConversationMutate = vi.fn();
const mockStopConversationMutate = vi.fn();
vi.mock("#/hooks/mutation/use-unified-start-conversation", () => ({
useUnifiedStartConversation: () => ({
mutate: mockStartConversationMutate,
}),
}));
vi.mock("#/hooks/mutation/use-unified-stop-conversation", () => ({
useUnifiedStopConversation: () => ({
mutate: mockStopConversationMutate,
}),
}));
vi.mock("#/hooks/use-conversation-id", () => ({
useConversationId: () => ({
conversationId: "test-conversation-id",
}),
}));
vi.mock("#/hooks/use-user-providers", () => ({
useUserProviders: () => ({
providers: [],
}),
}));
vi.mock("#/hooks/query/use-task-polling", () => ({
useTaskPolling: () => ({
isTask: false,
@@ -66,8 +38,12 @@ vi.mock("react-i18next", async () => {
COMMON$SERVER_STOPPED: "Server Stopped",
COMMON$ERROR: "Error",
COMMON$STARTING: "Starting",
COMMON$STOPPING: "Stopping...",
COMMON$STOP_RUNTIME: "Stop Runtime",
COMMON$START_RUNTIME: "Start Runtime",
CONVERSATION$ERROR_STARTING_CONVERSATION:
"Error starting conversation",
CONVERSATION$READY: "Ready",
};
return translations[key] || key;
},
@@ -79,10 +55,6 @@ vi.mock("react-i18next", async () => {
});
describe("ServerStatus", () => {
// Mock functions for handlers
const mockHandleStop = vi.fn();
const mockHandleResumeAgent = vi.fn();
// Helper function to mock agent state with specific state
const mockAgentStore = (agentState: AgentState) => {
vi.mocked(useAgentState).mockReturnValue({
@@ -94,248 +66,91 @@ describe("ServerStatus", () => {
vi.clearAllMocks();
});
it("should render server status with different conversation statuses", () => {
// Mock agent store to return RUNNING state
it("should render server status with RUNNING conversation status", () => {
mockAgentStore(AgentState.RUNNING);
// Test RUNNING status
const { rerender } = renderWithProviders(
<ServerStatus
conversationStatus="RUNNING"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
expect(screen.getByText("Running")).toBeInTheDocument();
renderWithProviders(<ServerStatus conversationStatus="RUNNING" />);
// Test STOPPED status
rerender(
<ServerStatus
conversationStatus="STOPPED"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.getByText("Running")).toBeInTheDocument();
});
it("should render server status with STOPPED conversation status", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(<ServerStatus conversationStatus="STOPPED" />);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.getByText("Server Stopped")).toBeInTheDocument();
// Test STARTING status (shows "Running" due to agent state being RUNNING)
rerender(
<ServerStatus
conversationStatus="STARTING"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
expect(screen.getByText("Running")).toBeInTheDocument();
// Test null status (shows "Running" due to agent state being RUNNING)
rerender(
<ServerStatus
conversationStatus={null}
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
expect(screen.getByText("Running")).toBeInTheDocument();
});
it("should show context menu when clicked with RUNNING status", async () => {
const user = userEvent.setup();
it("should render STARTING status when agent state is LOADING", () => {
mockAgentStore(AgentState.LOADING);
// Mock agent store to return RUNNING state
renderWithProviders(<ServerStatus conversationStatus="STARTING" />);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.getByText("Starting")).toBeInTheDocument();
});
it("should render STARTING status when agent state is INIT", () => {
mockAgentStore(AgentState.INIT);
renderWithProviders(<ServerStatus conversationStatus="STARTING" />);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.getByText("Starting")).toBeInTheDocument();
});
it("should render ERROR status when agent state is ERROR", () => {
mockAgentStore(AgentState.ERROR);
renderWithProviders(<ServerStatus conversationStatus="RUNNING" />);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.getByText("Error")).toBeInTheDocument();
});
it("should render STOPPING status when isPausing is true", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatus
conversationStatus="RUNNING"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
<ServerStatus conversationStatus="RUNNING" isPausing={true} />,
);
const statusContainer = screen.getByText("Running").closest("div");
expect(statusContainer).toBeInTheDocument();
await user.click(statusContainer!);
// Context menu should appear
expect(
screen.getByTestId("server-status-context-menu"),
).toBeInTheDocument();
expect(screen.getByTestId("stop-server-button")).toBeInTheDocument();
});
it("should show context menu when clicked with STOPPED status", async () => {
const user = userEvent.setup();
// Mock agent store to return STOPPED state
mockAgentStore(AgentState.STOPPED);
renderWithProviders(
<ServerStatus
conversationStatus="STOPPED"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
const statusContainer = screen.getByText("Server Stopped").closest("div");
expect(statusContainer).toBeInTheDocument();
await user.click(statusContainer!);
// Context menu should appear
expect(
screen.getByTestId("server-status-context-menu"),
).toBeInTheDocument();
expect(screen.getByTestId("start-server-button")).toBeInTheDocument();
});
it("should not show context menu when clicked with other statuses", async () => {
const user = userEvent.setup();
// Mock agent store to return RUNNING state
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatus
conversationStatus="STARTING"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
const statusContainer = screen.getByText("Running").closest("div");
expect(statusContainer).toBeInTheDocument();
await user.click(statusContainer!);
// Context menu should not appear
expect(
screen.queryByTestId("server-status-context-menu"),
).not.toBeInTheDocument();
});
it("should call stop conversation mutation when stop server is clicked", async () => {
const user = userEvent.setup();
// Clear previous calls
mockHandleStop.mockClear();
// Mock agent store to return RUNNING state
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatus
conversationStatus="RUNNING"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
const statusContainer = screen.getByText("Running").closest("div");
await user.click(statusContainer!);
const stopButton = screen.getByTestId("stop-server-button");
await user.click(stopButton);
expect(mockHandleStop).toHaveBeenCalledTimes(1);
});
it("should call start conversation mutation when start server is clicked", async () => {
const user = userEvent.setup();
// Clear previous calls
mockHandleResumeAgent.mockClear();
// Mock agent store to return STOPPED state
mockAgentStore(AgentState.STOPPED);
renderWithProviders(
<ServerStatus
conversationStatus="STOPPED"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
const statusContainer = screen.getByText("Server Stopped").closest("div");
await user.click(statusContainer!);
const startButton = screen.getByTestId("start-server-button");
await user.click(startButton);
expect(mockHandleResumeAgent).toHaveBeenCalledTimes(1);
});
it("should close context menu after stop server action", async () => {
const user = userEvent.setup();
// Mock agent store to return RUNNING state
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatus
conversationStatus="RUNNING"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
const statusContainer = screen.getByText("Running").closest("div");
await user.click(statusContainer!);
const stopButton = screen.getByTestId("stop-server-button");
await user.click(stopButton);
// Context menu should be closed (handled by the component)
expect(mockHandleStop).toHaveBeenCalledTimes(1);
});
it("should close context menu after start server action", async () => {
const user = userEvent.setup();
// Mock agent store to return STOPPED state
mockAgentStore(AgentState.STOPPED);
renderWithProviders(
<ServerStatus
conversationStatus="STOPPED"
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
);
const statusContainer = screen.getByText("Server Stopped").closest("div");
await user.click(statusContainer!);
const startButton = screen.getByTestId("start-server-button");
await user.click(startButton);
// Context menu should be closed
expect(
screen.queryByTestId("server-status-context-menu"),
).not.toBeInTheDocument();
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.getByText("Stopping...")).toBeInTheDocument();
});
it("should handle null conversation status", () => {
// Mock agent store to return RUNNING state
mockAgentStore(AgentState.RUNNING);
renderWithProviders(<ServerStatus conversationStatus={null} />);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.getByText("Running")).toBeInTheDocument();
});
it("should apply custom className", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatus
conversationStatus={null}
handleStop={mockHandleStop}
handleResumeAgent={mockHandleResumeAgent}
/>,
<ServerStatus conversationStatus="RUNNING" className="custom-class" />,
);
const statusText = screen.getByText("Running");
expect(statusText).toBeInTheDocument();
const container = screen.getByTestId("server-status");
expect(container).toHaveClass("custom-class");
});
});
describe("ServerStatusContextMenu", () => {
// Helper function to mock agent state with specific state
const mockAgentStore = (agentState: AgentState) => {
vi.mocked(useAgentState).mockReturnValue({
curAgentState: agentState,
});
};
const defaultProps = {
onClose: vi.fn(),
conversationStatus: "RUNNING" as ConversationStatus,
@@ -346,6 +161,8 @@ describe("ServerStatusContextMenu", () => {
});
it("should render stop server button when status is RUNNING", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
{...defaultProps}
@@ -354,11 +171,14 @@ describe("ServerStatusContextMenu", () => {
/>,
);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.getByTestId("stop-server-button")).toBeInTheDocument();
expect(screen.getByText("Stop Runtime")).toBeInTheDocument();
});
it("should render start server button when status is STOPPED", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
{...defaultProps}
@@ -367,11 +187,14 @@ describe("ServerStatusContextMenu", () => {
/>,
);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.getByTestId("start-server-button")).toBeInTheDocument();
expect(screen.getByText("Start Runtime")).toBeInTheDocument();
});
it("should not render stop server button when onStopServer is not provided", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
{...defaultProps}
@@ -379,10 +202,13 @@ describe("ServerStatusContextMenu", () => {
/>,
);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.queryByTestId("stop-server-button")).not.toBeInTheDocument();
});
it("should not render start server button when onStartServer is not provided", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
{...defaultProps}
@@ -390,12 +216,14 @@ describe("ServerStatusContextMenu", () => {
/>,
);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.queryByTestId("start-server-button")).not.toBeInTheDocument();
});
it("should call onStopServer when stop button is clicked", async () => {
const user = userEvent.setup();
const onStopServer = vi.fn();
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
@@ -414,6 +242,7 @@ describe("ServerStatusContextMenu", () => {
it("should call onStartServer when start button is clicked", async () => {
const user = userEvent.setup();
const onStartServer = vi.fn();
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
@@ -430,6 +259,8 @@ describe("ServerStatusContextMenu", () => {
});
it("should render correct text content for stop server button", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
{...defaultProps}
@@ -444,6 +275,8 @@ describe("ServerStatusContextMenu", () => {
});
it("should render correct text content for start server button", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
{...defaultProps}
@@ -459,6 +292,7 @@ describe("ServerStatusContextMenu", () => {
it("should call onClose when context menu is closed", () => {
const onClose = vi.fn();
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
@@ -475,6 +309,8 @@ describe("ServerStatusContextMenu", () => {
});
it("should not render any buttons for other conversation statuses", () => {
mockAgentStore(AgentState.RUNNING);
renderWithProviders(
<ServerStatusContextMenu
{...defaultProps}
@@ -482,6 +318,7 @@ describe("ServerStatusContextMenu", () => {
/>,
);
expect(screen.getByTestId("server-status")).toBeInTheDocument();
expect(screen.queryByTestId("stop-server-button")).not.toBeInTheDocument();
expect(screen.queryByTestId("start-server-button")).not.toBeInTheDocument();
});

View File

@@ -71,6 +71,7 @@ beforeEach(() => {
provider_tokens_set: {
github: "some-token",
gitlab: null,
azure_devops: null,
},
});
});

View File

@@ -23,6 +23,7 @@ const MOCK_RESPOSITORIES: GitRepository[] = [
{ id: "2", full_name: "repo2", git_provider: "github", is_public: true },
{ id: "3", full_name: "repo3", git_provider: "gitlab", is_public: true },
{ id: "4", full_name: "repo4", git_provider: "gitlab", is_public: true },
{ id: "5", full_name: "repo5", git_provider: "azure_devops", is_public: true },
];
const renderTaskCard = (task = MOCK_TASK_1) => {

View File

@@ -21,6 +21,7 @@ const mockUseConfig = vi.fn();
const mockUseRepositoryMicroagents = vi.fn();
const mockUseMicroagentManagementConversations = vi.fn();
const mockUseSearchRepositories = vi.fn();
const mockUseCreateConversationAndSubscribeMultiple = vi.fn();
vi.mock("#/hooks/use-user-providers", () => ({
useUserProviders: () => mockUseUserProviders(),
@@ -47,6 +48,17 @@ vi.mock("#/hooks/query/use-search-repositories", () => ({
useSearchRepositories: () => mockUseSearchRepositories(),
}));
vi.mock("#/hooks/use-tracking", () => ({
useTracking: () => ({
trackEvent: vi.fn(),
}),
}));
vi.mock("#/hooks/use-create-conversation-and-subscribe-multiple", () => ({
useCreateConversationAndSubscribeMultiple: () =>
mockUseCreateConversationAndSubscribeMultiple(),
}));
describe("MicroagentManagement", () => {
const RouterStub = createRoutesStub([
{
@@ -309,6 +321,16 @@ describe("MicroagentManagement", () => {
isError: false,
});
mockUseCreateConversationAndSubscribeMultiple.mockReturnValue({
createConversationAndSubscribe: vi.fn(({ onSuccessCallback }) => {
// Immediately call the success callback to close the modal
if (onSuccessCallback) {
onSuccessCallback();
}
}),
isPending: false,
});
// Mock the search repositories hook to return repositories with OpenHands suffixes
const mockSearchResults =
getRepositoriesWithOpenHandsSuffix(mockRepositories);

View File

@@ -188,172 +188,4 @@ describe("PaymentForm", () => {
expect(mockMutate).not.toHaveBeenCalled();
});
});
describe("Cancel Subscription", () => {
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
const cancelSubscriptionSpy = vi.spyOn(
BillingService,
"cancelSubscription",
);
beforeEach(() => {
// Mock active subscription
getSubscriptionAccessSpy.mockResolvedValue({
start_at: "2024-01-01T00:00:00Z",
end_at: "2024-12-31T23:59:59Z",
created_at: "2024-01-01T00:00:00Z",
});
});
it("should render cancel subscription button when user has active subscription", async () => {
renderPaymentForm();
await waitFor(() => {
const cancelButton = screen.getByTestId("cancel-subscription-button");
expect(cancelButton).toBeInTheDocument();
expect(cancelButton).toHaveTextContent("PAYMENT$CANCEL_SUBSCRIPTION");
});
});
it("should not render cancel subscription button when user has no subscription", async () => {
getSubscriptionAccessSpy.mockResolvedValue(null);
renderPaymentForm();
await waitFor(() => {
const cancelButton = screen.queryByTestId("cancel-subscription-button");
expect(cancelButton).not.toBeInTheDocument();
});
});
it("should show confirmation modal when cancel subscription button is clicked", async () => {
const user = userEvent.setup();
renderPaymentForm();
const cancelButton = await screen.findByTestId(
"cancel-subscription-button",
);
await user.click(cancelButton);
// Should show confirmation modal
expect(
screen.getByTestId("cancel-subscription-modal"),
).toBeInTheDocument();
expect(
screen.getByText("PAYMENT$CANCEL_SUBSCRIPTION_TITLE"),
).toBeInTheDocument();
// The message should be rendered (either with Trans component or regular text)
const modalContent = screen.getByTestId("cancel-subscription-modal");
expect(modalContent).toBeInTheDocument();
expect(screen.getByTestId("confirm-cancel-button")).toBeInTheDocument();
expect(screen.getByTestId("modal-cancel-button")).toBeInTheDocument();
});
it("should close modal when cancel button in modal is clicked", async () => {
const user = userEvent.setup();
renderPaymentForm();
const cancelButton = await screen.findByTestId(
"cancel-subscription-button",
);
await user.click(cancelButton);
// Modal should be visible
expect(
screen.getByTestId("cancel-subscription-modal"),
).toBeInTheDocument();
// Click cancel in modal
const modalCancelButton = screen.getByTestId("modal-cancel-button");
await user.click(modalCancelButton);
// Modal should be closed
expect(
screen.queryByTestId("cancel-subscription-modal"),
).not.toBeInTheDocument();
});
it("should call cancel subscription API when confirm button is clicked", async () => {
const user = userEvent.setup();
renderPaymentForm();
const cancelButton = await screen.findByTestId(
"cancel-subscription-button",
);
await user.click(cancelButton);
// Click confirm in modal
const confirmButton = screen.getByTestId("confirm-cancel-button");
await user.click(confirmButton);
// Should call the cancel subscription API
expect(cancelSubscriptionSpy).toHaveBeenCalled();
});
it("should close modal after successful cancellation", async () => {
const user = userEvent.setup();
cancelSubscriptionSpy.mockResolvedValue({
status: "success",
message: "Subscription cancelled successfully",
});
renderPaymentForm();
const cancelButton = await screen.findByTestId(
"cancel-subscription-button",
);
await user.click(cancelButton);
const confirmButton = screen.getByTestId("confirm-cancel-button");
await user.click(confirmButton);
// Wait for API call to complete and modal to close
await waitFor(() => {
expect(
screen.queryByTestId("cancel-subscription-modal"),
).not.toBeInTheDocument();
});
});
it("should show next billing date for active subscription", async () => {
// Mock active subscription with end_at as next billing date
getSubscriptionAccessSpy.mockResolvedValue({
start_at: "2024-01-01T00:00:00Z",
end_at: "2025-01-01T00:00:00Z",
created_at: "2024-01-01T00:00:00Z",
cancelled_at: null,
stripe_subscription_id: "sub_123",
});
renderPaymentForm();
await waitFor(() => {
const nextBillingInfo = screen.getByTestId("next-billing-date");
expect(nextBillingInfo).toBeInTheDocument();
// Check that it contains some date-related content (translation key or actual date)
expect(nextBillingInfo).toHaveTextContent(
/2025|PAYMENT.*BILLING.*DATE/,
);
});
});
it("should not show next billing date when subscription is cancelled", async () => {
// Mock cancelled subscription
getSubscriptionAccessSpy.mockResolvedValue({
start_at: "2024-01-01T00:00:00Z",
end_at: "2025-01-01T00:00:00Z",
created_at: "2024-01-01T00:00:00Z",
cancelled_at: "2024-06-15T10:30:00Z",
stripe_subscription_id: "sub_123",
});
renderPaymentForm();
await waitFor(() => {
const nextBillingInfo = screen.queryByTestId("next-billing-date");
expect(nextBillingInfo).not.toBeInTheDocument();
});
});
});
});

View File

@@ -30,7 +30,7 @@ describe("ImagePreview", () => {
expect(onRemoveMock).toHaveBeenCalledOnce();
});
it("shoud not display the close button when onRemove is not provided", () => {
it("should not display the close button when onRemove is not provided", () => {
render(<ImagePreview src="https://example.com/image.jpg" />);
expect(screen.queryByRole("button")).not.toBeInTheDocument();
});

View File

@@ -1,47 +0,0 @@
import { render, screen } from "@testing-library/react";
import { JupyterEditor } from "#/components/features/jupyter/jupyter";
import { vi, describe, it, expect, beforeEach } from "vitest";
import { AgentState } from "#/types/agent-state";
import { useAgentState } from "#/hooks/use-agent-state";
import { useJupyterStore } from "#/state/jupyter-store";
// Mock the agent state hook
vi.mock("#/hooks/use-agent-state", () => ({
useAgentState: vi.fn(),
}));
// Mock react-i18next
vi.mock("react-i18next", () => ({
useTranslation: () => ({
t: (key: string) => key,
}),
}));
describe("JupyterEditor", () => {
beforeEach(() => {
// Reset the Zustand store before each test
useJupyterStore.setState({
cells: Array(20).fill({
content: "Test cell content",
type: "input",
imageUrls: undefined,
}),
});
});
it("should have a scrollable container", () => {
// Mock agent state to return RUNNING state (not in RUNTIME_INACTIVE_STATES)
vi.mocked(useAgentState).mockReturnValue({
curAgentState: AgentState.RUNNING,
});
render(
<div style={{ height: "100vh" }}>
<JupyterEditor maxWidth={800} />
</div>,
);
const container = screen.getByTestId("jupyter-container");
expect(container).toHaveClass("flex-1 overflow-y-auto");
});
});

View File

@@ -11,6 +11,7 @@ const renderTerminal = (commands: Command[] = []) => {
};
describe.skip("Terminal", () => {
// Terminal is now read-only - no user input functionality
global.ResizeObserver = vi.fn().mockImplementation(() => ({
observe: vi.fn(),
disconnect: vi.fn(),
@@ -21,8 +22,6 @@ describe.skip("Terminal", () => {
write: vi.fn(),
writeln: vi.fn(),
dispose: vi.fn(),
onKey: vi.fn(),
attachCustomKeyEventHandler: vi.fn(),
loadAddon: vi.fn(),
};

View File

@@ -1,6 +1,7 @@
import { describe, it, expect, beforeAll, afterAll, afterEach } from "vitest";
import { screen, waitFor, render, cleanup } from "@testing-library/react";
import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
import { http, HttpResponse } from "msw";
import { useOptimisticUserMessageStore } from "#/stores/optimistic-user-message-store";
import {
createMockMessageEvent,
@@ -13,8 +14,12 @@ import {
OptimisticUserMessageStoreComponent,
ErrorMessageStoreComponent,
} from "./helpers/websocket-test-components";
import { ConversationWebSocketProvider } from "#/contexts/conversation-websocket-context";
import {
ConversationWebSocketProvider,
useConversationWebSocket,
} from "#/contexts/conversation-websocket-context";
import { conversationWebSocketTestSetup } from "./helpers/msw-websocket-setup";
import { useEventStore } from "#/stores/use-event-store";
// MSW WebSocket mock setup
const { wsLink, server: mswServer } = conversationWebSocketTestSetup();
@@ -417,7 +422,206 @@ describe("Conversation WebSocket Handler", () => {
it.todo("should handle send attempts when disconnected");
});
// 8. Terminal I/O Tests (ExecuteBashAction and ExecuteBashObservation)
// 8. History Loading State Tests
describe("History Loading State", () => {
it("should track history loading state using event count from API", async () => {
const conversationId = "test-conversation-with-history";
// Mock the event count API to return 3 events
const expectedEventCount = 3;
// Create 3 mock events to simulate history
const mockHistoryEvents = [
createMockUserMessageEvent({ id: "history-event-1" }),
createMockMessageEvent({ id: "history-event-2" }),
createMockMessageEvent({ id: "history-event-3" }),
];
// Set up MSW to mock both the HTTP API and WebSocket connection
mswServer.use(
http.get("/api/v1/events/count", ({ request }) => {
const url = new URL(request.url);
const conversationIdParam = url.searchParams.get(
"conversation_id__eq",
);
if (conversationIdParam === conversationId) {
return HttpResponse.json(expectedEventCount);
}
return HttpResponse.json(0);
}),
wsLink.addEventListener("connection", ({ client, server }) => {
server.connect();
// Send all history events
mockHistoryEvents.forEach((event) => {
client.send(JSON.stringify(event));
});
}),
);
// Create a test component that displays loading state
const HistoryLoadingComponent = () => {
const context = useConversationWebSocket();
const { events } = useEventStore();
return (
<div>
<div data-testid="is-loading-history">
{context?.isLoadingHistory ? "true" : "false"}
</div>
<div data-testid="events-received">{events.length}</div>
<div data-testid="expected-event-count">{expectedEventCount}</div>
</div>
);
};
// Render with WebSocket context
renderWithWebSocketContext(
<HistoryLoadingComponent />,
conversationId,
`http://localhost:3000/api/conversations/${conversationId}`,
);
// Initially should be loading history
expect(screen.getByTestId("is-loading-history")).toHaveTextContent("true");
// Wait for all events to be received
await waitFor(() => {
expect(screen.getByTestId("events-received")).toHaveTextContent("3");
});
// Once all events are received, loading should be complete
await waitFor(() => {
expect(screen.getByTestId("is-loading-history")).toHaveTextContent(
"false",
);
});
});
it("should handle empty conversation history", async () => {
const conversationId = "test-conversation-empty";
// Set up MSW to mock both the HTTP API and WebSocket connection
mswServer.use(
http.get("/api/v1/events/count", ({ request }) => {
const url = new URL(request.url);
const conversationIdParam = url.searchParams.get(
"conversation_id__eq",
);
if (conversationIdParam === conversationId) {
return HttpResponse.json(0);
}
return HttpResponse.json(0);
}),
wsLink.addEventListener("connection", ({ server }) => {
server.connect();
// No events sent for empty history
}),
);
// Create a test component that displays loading state
const HistoryLoadingComponent = () => {
const context = useConversationWebSocket();
return (
<div>
<div data-testid="is-loading-history">
{context?.isLoadingHistory ? "true" : "false"}
</div>
</div>
);
};
// Render with WebSocket context
renderWithWebSocketContext(
<HistoryLoadingComponent />,
conversationId,
`http://localhost:3000/api/conversations/${conversationId}`,
);
// Should quickly transition from loading to not loading when count is 0
await waitFor(() => {
expect(screen.getByTestId("is-loading-history")).toHaveTextContent(
"false",
);
});
});
it("should handle history loading with large event count", async () => {
const conversationId = "test-conversation-large-history";
// Create 50 mock events to simulate large history
const expectedEventCount = 50;
const mockHistoryEvents = Array.from({ length: 50 }, (_, i) =>
createMockMessageEvent({ id: `history-event-${i + 1}` }),
);
// Set up MSW to mock both the HTTP API and WebSocket connection
mswServer.use(
http.get("/api/v1/events/count", ({ request }) => {
const url = new URL(request.url);
const conversationIdParam = url.searchParams.get(
"conversation_id__eq",
);
if (conversationIdParam === conversationId) {
return HttpResponse.json(expectedEventCount);
}
return HttpResponse.json(0);
}),
wsLink.addEventListener("connection", ({ client, server }) => {
server.connect();
// Send all history events
mockHistoryEvents.forEach((event) => {
client.send(JSON.stringify(event));
});
}),
);
// Create a test component that displays loading state
const HistoryLoadingComponent = () => {
const context = useConversationWebSocket();
const { events } = useEventStore();
return (
<div>
<div data-testid="is-loading-history">
{context?.isLoadingHistory ? "true" : "false"}
</div>
<div data-testid="events-received">{events.length}</div>
</div>
);
};
// Render with WebSocket context
renderWithWebSocketContext(
<HistoryLoadingComponent />,
conversationId,
`http://localhost:3000/api/conversations/${conversationId}`,
);
// Initially should be loading history
expect(screen.getByTestId("is-loading-history")).toHaveTextContent("true");
// Wait for all events to be received
await waitFor(() => {
expect(screen.getByTestId("events-received")).toHaveTextContent("50");
});
// Once all events are received, loading should be complete
await waitFor(() => {
expect(screen.getByTestId("is-loading-history")).toHaveTextContent(
"false",
);
});
});
});
// 9. Terminal I/O Tests (ExecuteBashAction and ExecuteBashObservation)
describe("Terminal I/O Integration", () => {
it("should append command to store when ExecuteBashAction event is received", async () => {
const { createMockExecuteBashActionEvent } = await import(

View File

@@ -38,8 +38,7 @@ export const createWebSocketTestSetup = (
/**
* Standard WebSocket test setup for conversation WebSocket handler tests
* Updated to use the V1 WebSocket URL pattern: /sockets/events/{conversationId}
* Uses a wildcard pattern to match any conversation ID
*/
export const conversationWebSocketTestSetup = () =>
createWebSocketTestSetup(
"ws://localhost:3000/sockets/events/test-conversation-default",
);
createWebSocketTestSetup("ws://localhost:3000/sockets/events/*");

View File

@@ -35,13 +35,12 @@ function TestTerminalComponent() {
}
describe("useTerminal", () => {
// Terminal is read-only - no longer tests user input functionality
const mockTerminal = vi.hoisted(() => ({
loadAddon: vi.fn(),
open: vi.fn(),
write: vi.fn(),
writeln: vi.fn(),
onKey: vi.fn(),
attachCustomKeyEventHandler: vi.fn(),
dispose: vi.fn(),
}));

View File

@@ -268,7 +268,7 @@ describe("useWebSocket", () => {
});
// onError handler should have been called
expect(onErrorSpy).toHaveBeenCalledOnce();
expect(onErrorSpy).toHaveBeenCalled();
});
it("should provide sendMessage function to send messages to WebSocket", async () => {

View File

@@ -0,0 +1,233 @@
import {
describe,
it,
expect,
beforeAll,
afterAll,
afterEach,
vi,
} from "vitest";
import { screen, waitFor, render, cleanup } from "@testing-library/react";
import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
import { createMockAgentErrorEvent } from "#/mocks/mock-ws-helpers";
import { ConversationWebSocketProvider } from "#/contexts/conversation-websocket-context";
import { conversationWebSocketTestSetup } from "./helpers/msw-websocket-setup";
import { ConnectionStatusComponent } from "./helpers/websocket-test-components";
// Mock the tracking function
const mockTrackCreditLimitReached = vi.fn();
// Mock useTracking hook
vi.mock("#/hooks/use-tracking", () => ({
useTracking: () => ({
trackCreditLimitReached: mockTrackCreditLimitReached,
trackLoginButtonClick: vi.fn(),
trackConversationCreated: vi.fn(),
trackPushButtonClick: vi.fn(),
trackPullButtonClick: vi.fn(),
trackCreatePrButtonClick: vi.fn(),
trackGitProviderConnected: vi.fn(),
trackUserSignupCompleted: vi.fn(),
trackCreditsPurchased: vi.fn(),
}),
}));
// Mock useActiveConversation hook
vi.mock("#/hooks/query/use-active-conversation", () => ({
useActiveConversation: () => ({
data: null,
isLoading: false,
error: null,
}),
}));
// MSW WebSocket mock setup
const { wsLink, server: mswServer } = conversationWebSocketTestSetup();
beforeAll(() => {
// The global MSW server from vitest.setup.ts is already running
// We just need to start our WebSocket-specific server
mswServer.listen({ onUnhandledRequest: "bypass" });
});
afterEach(() => {
// Clear all mocks before each test
mockTrackCreditLimitReached.mockClear();
mswServer.resetHandlers();
// Clean up any React components
cleanup();
});
afterAll(async () => {
// Close the WebSocket MSW server
mswServer.close();
// Give time for any pending WebSocket connections to close. This is very important to prevent serious memory leaks
await new Promise((resolve) => {
setTimeout(resolve, 500);
});
});
// Helper function to render components with all necessary providers
function renderWithProviders(
children: React.ReactNode,
conversationId = "test-conversation-123",
conversationUrl = "http://localhost:3000/api/conversations/test-conversation-123",
) {
const queryClient = new QueryClient({
defaultOptions: {
queries: { retry: false },
mutations: { retry: false },
},
});
return render(
<QueryClientProvider client={queryClient}>
<ConversationWebSocketProvider
conversationId={conversationId}
conversationUrl={conversationUrl}
sessionApiKey={null}
>
{children}
</ConversationWebSocketProvider>
</QueryClientProvider>,
);
}
describe("PostHog Analytics Tracking", () => {
describe("Credit Limit Tracking", () => {
it("should track credit_limit_reached when AgentErrorEvent contains budget error", async () => {
// Create a mock AgentErrorEvent with budget-related error message
const mockBudgetErrorEvent = createMockAgentErrorEvent({
error: "ExceededBudget: Task exceeded maximum budget of $10.00",
});
// Set up MSW to send the budget error event when connection is established
mswServer.use(
wsLink.addEventListener("connection", ({ client, server }) => {
server.connect();
// Send the mock budget error event after connection
client.send(JSON.stringify(mockBudgetErrorEvent));
}),
);
// Render with all providers
renderWithProviders(<ConnectionStatusComponent />);
// Wait for connection to be established
await waitFor(() => {
expect(screen.getByTestId("connection-state")).toHaveTextContent(
"OPEN",
);
});
// Wait for the tracking event to be captured
await waitFor(() => {
expect(mockTrackCreditLimitReached).toHaveBeenCalledWith(
expect.objectContaining({
conversationId: "test-conversation-123",
}),
);
});
});
it("should track credit_limit_reached when AgentErrorEvent contains 'credit' keyword", async () => {
// Create error with "credit" keyword (case-insensitive)
const mockCreditErrorEvent = createMockAgentErrorEvent({
error: "Insufficient CREDIT to complete this operation",
});
mswServer.use(
wsLink.addEventListener("connection", ({ client, server }) => {
server.connect();
client.send(JSON.stringify(mockCreditErrorEvent));
}),
);
renderWithProviders(<ConnectionStatusComponent />);
await waitFor(() => {
expect(screen.getByTestId("connection-state")).toHaveTextContent(
"OPEN",
);
});
await waitFor(() => {
expect(mockTrackCreditLimitReached).toHaveBeenCalledWith(
expect.objectContaining({
conversationId: "test-conversation-123",
}),
);
});
});
it("should NOT track credit_limit_reached for non-budget errors", async () => {
// Create a regular error without budget/credit keywords
const mockRegularErrorEvent = createMockAgentErrorEvent({
error: "Failed to execute command: Permission denied",
});
mswServer.use(
wsLink.addEventListener("connection", ({ client, server }) => {
server.connect();
client.send(JSON.stringify(mockRegularErrorEvent));
}),
);
renderWithProviders(<ConnectionStatusComponent />);
// Wait for connection and error to be processed
await waitFor(() => {
expect(screen.getByTestId("connection-state")).toHaveTextContent(
"OPEN",
);
});
// Verify that credit_limit_reached was NOT tracked
expect(mockTrackCreditLimitReached).not.toHaveBeenCalled();
});
it("should only track credit_limit_reached once per error event", async () => {
const mockBudgetErrorEvent = createMockAgentErrorEvent({
error: "Budget exceeded: $10.00 limit reached",
});
mswServer.use(
wsLink.addEventListener("connection", ({ client, server }) => {
server.connect();
// Send the same error event twice
client.send(JSON.stringify(mockBudgetErrorEvent));
client.send(
JSON.stringify({ ...mockBudgetErrorEvent, id: "different-id" }),
);
}),
);
renderWithProviders(<ConnectionStatusComponent />);
await waitFor(() => {
expect(screen.getByTestId("connection-state")).toHaveTextContent(
"OPEN",
);
});
await waitFor(() => {
expect(mockTrackCreditLimitReached).toHaveBeenCalledTimes(2);
});
// Both calls should be for credit_limit_reached (once per event)
expect(mockTrackCreditLimitReached).toHaveBeenNthCalledWith(
1,
expect.objectContaining({
conversationId: "test-conversation-123",
}),
);
expect(mockTrackCreditLimitReached).toHaveBeenNthCalledWith(
2,
expect.objectContaining({
conversationId: "test-conversation-123",
}),
);
});
});
});

View File

@@ -1,10 +1,9 @@
import { render, screen } from "@testing-library/react";
import { it, describe, expect, vi, beforeEach, afterEach } from "vitest";
import userEvent from "@testing-library/user-event";
import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
import AcceptTOS from "#/routes/accept-tos";
import * as CaptureConsent from "#/utils/handle-capture-consent";
import * as ToastHandlers from "#/utils/custom-toast-handlers";
import { QueryClient, QueryClientProvider } from "@tanstack/react-query";
import { openHands } from "#/api/open-hands-axios";
// Mock the react-router hooks
@@ -44,9 +43,13 @@ const createWrapper = () => {
},
});
return ({ children }: { children: React.ReactNode }) => (
<QueryClientProvider client={queryClient}>{children}</QueryClientProvider>
);
function Wrapper({ children }: { children: React.ReactNode }) {
return (
<QueryClientProvider client={queryClient}>{children}</QueryClientProvider>
);
}
return Wrapper;
};
describe("AcceptTOS", () => {
@@ -106,7 +109,10 @@ describe("AcceptTOS", () => {
// Wait for the mutation to complete
await new Promise(process.nextTick);
expect(handleCaptureConsentSpy).toHaveBeenCalledWith(true);
expect(handleCaptureConsentSpy).toHaveBeenCalledWith(
expect.anything(),
true,
);
expect(openHands.post).toHaveBeenCalledWith("/api/accept_tos", {
redirect_url: "/dashboard",
});

View File

@@ -46,6 +46,21 @@ describe("Content", () => {
});
});
it("should render analytics toggle as enabled when server returns null (opt-in by default)", async () => {
const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
getSettingsSpy.mockResolvedValue({
...MOCK_DEFAULT_USER_SETTINGS,
user_consents_to_analytics: null,
});
renderAppSettingsScreen();
await waitFor(() => {
const analytics = screen.getByTestId("enable-analytics-switch");
expect(analytics).toBeChecked();
});
});
it("should render the language options", async () => {
renderAppSettingsScreen();
@@ -163,7 +178,10 @@ describe("Form submission", () => {
await userEvent.click(submit);
await waitFor(() =>
expect(handleCaptureConsentsSpy).toHaveBeenCalledWith(true),
expect(handleCaptureConsentsSpy).toHaveBeenCalledWith(
expect.anything(),
true,
),
);
});
@@ -188,7 +206,10 @@ describe("Form submission", () => {
await userEvent.click(submit);
await waitFor(() =>
expect(handleCaptureConsentsSpy).toHaveBeenCalledWith(false),
expect(handleCaptureConsentsSpy).toHaveBeenCalledWith(
expect.anything(),
false,
),
);
});

View File

@@ -124,6 +124,9 @@ describe("Content", () => {
await screen.findByTestId("bitbucket-token-input");
await screen.findByTestId("bitbucket-token-help-anchor");
await screen.findByTestId("azure-devops-token-input");
await screen.findByTestId("azure-devops-token-help-anchor");
getConfigSpy.mockResolvedValue(VALID_SAAS_CONFIG);
queryClient.invalidateQueries();
rerender();
@@ -149,6 +152,13 @@ describe("Content", () => {
expect(
screen.queryByTestId("bitbucket-token-help-anchor"),
).not.toBeInTheDocument();
expect(
screen.queryByTestId("azure-devops-token-input"),
).not.toBeInTheDocument();
expect(
screen.queryByTestId("azure-devops-token-help-anchor"),
).not.toBeInTheDocument();
});
});
@@ -287,6 +297,7 @@ describe("Form submission", () => {
github: { token: "test-token", host: "" },
gitlab: { token: "", host: "" },
bitbucket: { token: "", host: "" },
azure_devops: { token: "", host: "" },
});
});
@@ -308,6 +319,7 @@ describe("Form submission", () => {
github: { token: "", host: "" },
gitlab: { token: "test-token", host: "" },
bitbucket: { token: "", host: "" },
azure_devops: { token: "", host: "" },
});
});
@@ -329,6 +341,29 @@ describe("Form submission", () => {
github: { token: "", host: "" },
gitlab: { token: "", host: "" },
bitbucket: { token: "test-token", host: "" },
azure_devops: { token: "", host: "" },
});
});
it("should save the Azure DevOps token", async () => {
const saveProvidersSpy = vi.spyOn(SecretsService, "addGitProvider");
saveProvidersSpy.mockImplementation(() => Promise.resolve(true));
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(VALID_OSS_CONFIG);
renderGitSettingsScreen();
const azureDevOpsInput = await screen.findByTestId("azure-devops-token-input");
const submit = await screen.findByTestId("submit-button");
await userEvent.type(azureDevOpsInput, "test-token");
await userEvent.click(submit);
expect(saveProvidersSpy).toHaveBeenCalledWith({
github: { token: "", host: "" },
gitlab: { token: "", host: "" },
bitbucket: { token: "", host: "" },
azure_devops: { token: "test-token", host: "" },
});
});

View File

@@ -4,14 +4,12 @@ import { beforeEach, describe, expect, it, vi } from "vitest";
import { QueryClientProvider, QueryClient } from "@tanstack/react-query";
import LlmSettingsScreen from "#/routes/llm-settings";
import SettingsService from "#/settings-service/settings-service.api";
import OptionService from "#/api/option-service/option-service.api";
import {
MOCK_DEFAULT_USER_SETTINGS,
resetTestHandlersMockSettings,
} from "#/mocks/handlers";
import * as AdvancedSettingsUtlls from "#/utils/has-advanced-settings-set";
import * as ToastHandlers from "#/utils/custom-toast-handlers";
import BillingService from "#/api/billing-service/billing-service.api";
// Mock react-router hooks
const mockUseSearchParams = vi.fn();
@@ -25,12 +23,6 @@ vi.mock("#/hooks/query/use-is-authed", () => ({
useIsAuthed: () => mockUseIsAuthed(),
}));
// Mock useIsAllHandsSaaSEnvironment hook
const mockUseIsAllHandsSaaSEnvironment = vi.fn();
vi.mock("#/hooks/use-is-all-hands-saas-environment", () => ({
useIsAllHandsSaaSEnvironment: () => mockUseIsAllHandsSaaSEnvironment(),
}));
const renderLlmSettingsScreen = () =>
render(<LlmSettingsScreen />, {
wrapper: ({ children }) => (
@@ -54,9 +46,6 @@ beforeEach(() => {
// Default mock for useIsAuthed - returns authenticated by default
mockUseIsAuthed.mockReturnValue({ data: true, isLoading: false });
// Default mock for useIsAllHandsSaaSEnvironment - returns true for SaaS environment
mockUseIsAllHandsSaaSEnvironment.mockReturnValue(true);
});
describe("Content", () => {
@@ -605,9 +594,14 @@ describe("Form submission", () => {
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Component automatically shows advanced view when advanced settings exist
// Switch to basic view to test clearing advanced settings
const advancedSwitch = screen.getByTestId("advanced-settings-switch");
await userEvent.click(advancedSwitch);
// Now we should be in basic view
await screen.findByTestId("llm-settings-form-basic");
const provider = screen.getByTestId("llm-provider-input");
const model = screen.getByTestId("llm-model-input");
@@ -731,405 +725,3 @@ describe("Status toasts", () => {
});
});
});
describe("SaaS mode", () => {
describe("SaaS subscription", () => {
// Common mock configurations
const MOCK_SAAS_CONFIG = {
APP_MODE: "saas" as const,
GITHUB_CLIENT_ID: "fake-github-client-id",
POSTHOG_CLIENT_KEY: "fake-posthog-client-key",
FEATURE_FLAGS: {
ENABLE_BILLING: true,
HIDE_LLM_SETTINGS: false,
ENABLE_JIRA: false,
ENABLE_JIRA_DC: false,
ENABLE_LINEAR: false,
},
};
const MOCK_ACTIVE_SUBSCRIPTION = {
start_at: "2024-01-01",
end_at: "2024-12-31",
created_at: "2024-01-01",
};
it("should show upgrade banner and prevent all interactions for unsubscribed SaaS users", async () => {
// Mock SaaS mode without subscription
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access to return null (no subscription)
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(null);
// Mock saveSettings to ensure it's not called
const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Should show upgrade banner
expect(screen.getByTestId("upgrade-banner")).toBeInTheDocument();
// Should have a clickable upgrade button
const upgradeButton = screen.getByRole("button", { name: /upgrade/i });
expect(upgradeButton).toBeInTheDocument();
expect(upgradeButton).not.toBeDisabled();
// Form should be disabled
const form = screen.getByTestId("llm-settings-form-basic");
expect(form).toHaveAttribute("aria-disabled", "true");
// All form inputs should be disabled or non-interactive
const providerInput = screen.getByTestId("llm-provider-input");
const modelInput = screen.getByTestId("llm-model-input");
const apiKeyInput = screen.getByTestId("llm-api-key-input");
const advancedSwitch = screen.getByTestId("advanced-settings-switch");
const submitButton = screen.getByTestId("submit-button");
// Inputs should be disabled
expect(providerInput).toBeDisabled();
expect(modelInput).toBeDisabled();
expect(apiKeyInput).toBeDisabled();
expect(advancedSwitch).toBeDisabled();
expect(submitButton).toBeDisabled();
// Confirmation mode switch is in advanced view, so it's not visible in basic view
expect(
screen.queryByTestId("enable-confirmation-mode-switch"),
).not.toBeInTheDocument();
// Try to interact with inputs - they should not respond
await userEvent.click(providerInput);
await userEvent.type(apiKeyInput, "test-key");
// Values should not change
expect(apiKeyInput).toHaveValue("");
// Try to submit form - should not call API
await userEvent.click(submitButton);
expect(saveSettingsSpy).not.toHaveBeenCalled();
});
it("should call subscription checkout API when upgrade button is clicked", async () => {
// Mock SaaS mode without subscription
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access to return null (no subscription)
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(null);
// Mock the subscription checkout API call
const createSubscriptionCheckoutSessionSpy = vi.spyOn(
BillingService,
"createSubscriptionCheckoutSession",
);
createSubscriptionCheckoutSessionSpy.mockResolvedValue({});
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Click the upgrade button
const upgradeButton = screen.getByRole("button", { name: /upgrade/i });
await userEvent.click(upgradeButton);
// Should call the subscription checkout API
expect(createSubscriptionCheckoutSessionSpy).toHaveBeenCalled();
});
it("should disable upgrade button for unauthenticated users in SaaS mode", async () => {
// Mock SaaS mode without subscription
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access to return null (no subscription)
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(null);
// Mock subscription checkout API
const createSubscriptionCheckoutSessionSpy = vi.spyOn(
BillingService,
"createSubscriptionCheckoutSession",
);
// Mock authentication to return false (unauthenticated) from the start
mockUseIsAuthed.mockReturnValue({ data: false, isLoading: false });
// Mock settings to return default settings even when unauthenticated
// This is necessary because the useSettings hook is disabled when user is not authenticated
const getSettingsSpy = vi.spyOn(SettingsService, "getSettings");
getSettingsSpy.mockResolvedValue(MOCK_DEFAULT_USER_SETTINGS);
renderLlmSettingsScreen();
// Wait for either the settings screen or skeleton to appear
await waitFor(() => {
const settingsScreen = screen.queryByTestId("llm-settings-screen");
const skeleton = screen.queryByTestId("app-settings-skeleton");
expect(settingsScreen || skeleton).toBeInTheDocument();
});
// If we get the skeleton, the test scenario isn't valid - skip the rest
if (screen.queryByTestId("app-settings-skeleton")) {
// For unauthenticated users, the settings don't load, so no upgrade banner is shown
// This is the expected behavior - unauthenticated users see a skeleton loading state
expect(screen.queryByTestId("upgrade-banner")).not.toBeInTheDocument();
return;
}
await screen.findByTestId("llm-settings-screen");
// Should show upgrade banner
expect(screen.getByTestId("upgrade-banner")).toBeInTheDocument();
// Upgrade button should be disabled for unauthenticated users
const upgradeButton = screen.getByRole("button", { name: /upgrade/i });
expect(upgradeButton).toBeInTheDocument();
expect(upgradeButton).toBeDisabled();
// Clicking disabled button should not call the API
await userEvent.click(upgradeButton);
expect(createSubscriptionCheckoutSessionSpy).not.toHaveBeenCalled();
});
it("should not show upgrade banner and allow form interaction for subscribed SaaS users", async () => {
// Mock SaaS mode with subscription
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access to return active subscription
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(MOCK_ACTIVE_SUBSCRIPTION);
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Wait for subscription data to load
await waitFor(() => {
expect(getSubscriptionAccessSpy).toHaveBeenCalled();
});
// Should NOT show upgrade banner
expect(screen.queryByTestId("upgrade-banner")).not.toBeInTheDocument();
// Form should NOT be disabled
const form = screen.getByTestId("llm-settings-form-basic");
expect(form).not.toHaveAttribute("aria-disabled", "true");
});
it("should not call save settings API when making changes in disabled form for unsubscribed users", async () => {
// Mock SaaS mode without subscription
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access to return null (no subscription)
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(null);
// Mock saveSettings to track calls
const saveSettingsSpy = vi.spyOn(SettingsService, "saveSettings");
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Verify that basic form elements are disabled for unsubscribed users
const advancedSwitch = screen.getByTestId("advanced-settings-switch");
const submitButton = screen.getByTestId("submit-button");
expect(advancedSwitch).toBeDisabled();
expect(submitButton).toBeDisabled();
// Confirmation mode switch is in advanced view, which can't be accessed when form is disabled
expect(
screen.queryByTestId("enable-confirmation-mode-switch"),
).not.toBeInTheDocument();
// Try to submit the form - button should remain disabled
await userEvent.click(submitButton);
// Should NOT call save settings API for unsubscribed users
expect(saveSettingsSpy).not.toHaveBeenCalled();
});
it("should show backdrop overlay for unsubscribed users", async () => {
// Mock SaaS mode without subscription
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access to return null (no subscription)
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(null);
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Wait for subscription data to load
await waitFor(() => {
expect(getSubscriptionAccessSpy).toHaveBeenCalled();
});
// Should show upgrade banner
expect(screen.getByTestId("upgrade-banner")).toBeInTheDocument();
// Should show backdrop overlay
const backdrop = screen.getByTestId("settings-backdrop");
expect(backdrop).toBeInTheDocument();
});
it("should not show backdrop overlay for subscribed users", async () => {
// Mock SaaS mode with subscription
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access to return active subscription
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(MOCK_ACTIVE_SUBSCRIPTION);
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Wait for subscription data to load
await waitFor(() => {
expect(getSubscriptionAccessSpy).toHaveBeenCalled();
});
// Should NOT show backdrop overlay
expect(screen.queryByTestId("settings-backdrop")).not.toBeInTheDocument();
});
it("should display success toast when redirected back with ?checkout=success parameter", async () => {
// Mock SaaS mode
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(MOCK_ACTIVE_SUBSCRIPTION);
// Mock toast handler
const displaySuccessToastSpy = vi.spyOn(
ToastHandlers,
"displaySuccessToast",
);
// Mock URL search params with ?checkout=success
mockUseSearchParams.mockReturnValue([
{
get: (param: string) => (param === "checkout" ? "success" : null),
},
vi.fn(),
]);
// Render component with checkout=success parameter
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Verify success toast is displayed with correct message
expect(displaySuccessToastSpy).toHaveBeenCalledWith(
"SUBSCRIPTION$SUCCESS",
);
});
it("should display error toast when redirected back with ?checkout=cancel parameter", async () => {
// Mock SaaS mode
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(MOCK_ACTIVE_SUBSCRIPTION);
// Mock toast handler
const displayErrorToastSpy = vi.spyOn(ToastHandlers, "displayErrorToast");
// Mock URL search params with ?checkout=cancel
mockUseSearchParams.mockReturnValue([
{
get: (param: string) => (param === "checkout" ? "cancel" : null),
},
vi.fn(),
]);
// Render component with checkout=cancel parameter
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Verify error toast is displayed with correct message
expect(displayErrorToastSpy).toHaveBeenCalledWith("SUBSCRIPTION$FAILURE");
});
it("should show upgrade banner when subscription is expired or disabled", async () => {
// Mock SaaS mode
const getConfigSpy = vi.spyOn(OptionService, "getConfig");
getConfigSpy.mockResolvedValue(MOCK_SAAS_CONFIG);
// Mock subscription access to return null (expired/disabled subscriptions return null from backend)
// The backend only returns active subscriptions within their validity period
const getSubscriptionAccessSpy = vi.spyOn(
BillingService,
"getSubscriptionAccess",
);
getSubscriptionAccessSpy.mockResolvedValue(null);
renderLlmSettingsScreen();
await screen.findByTestId("llm-settings-screen");
// Wait for subscription data to load
await waitFor(() => {
expect(getSubscriptionAccessSpy).toHaveBeenCalled();
});
// Should show upgrade banner for expired/disabled subscriptions (when API returns null)
expect(screen.getByTestId("upgrade-banner")).toBeInTheDocument();
// Form should be disabled
const form = screen.getByTestId("llm-settings-form-basic");
expect(form).toHaveAttribute("aria-disabled", "true");
// All form inputs should be disabled
const providerInput = screen.getByTestId("llm-provider-input");
const modelInput = screen.getByTestId("llm-model-input");
const apiKeyInput = screen.getByTestId("llm-api-key-input");
const advancedSwitch = screen.getByTestId("advanced-settings-switch");
expect(providerInput).toBeDisabled();
expect(modelInput).toBeDisabled();
expect(apiKeyInput).toBeDisabled();
expect(advancedSwitch).toBeDisabled();
// Confirmation mode switch is in advanced view, which can't be accessed when form is disabled
expect(
screen.queryByTestId("enable-confirmation-mode-switch"),
).not.toBeInTheDocument();
});
});
});

View File

@@ -5,7 +5,6 @@ import { ActionMessage } from "#/types/message";
// Mock the store and actions
const mockDispatch = vi.fn();
const mockAppendInput = vi.fn();
const mockAppendJupyterInput = vi.fn();
vi.mock("#/store", () => ({
default: {
@@ -21,14 +20,6 @@ vi.mock("#/state/command-store", () => ({
},
}));
vi.mock("#/state/jupyter-store", () => ({
useJupyterStore: {
getState: () => ({
appendJupyterInput: mockAppendJupyterInput,
}),
},
}));
vi.mock("#/state/metrics-slice", () => ({
setMetrics: vi.fn(),
}));
@@ -63,10 +54,9 @@ describe("handleActionMessage", () => {
// Check that appendInput was called with the command
expect(mockAppendInput).toHaveBeenCalledWith("ls -la");
expect(mockDispatch).not.toHaveBeenCalled();
expect(mockAppendJupyterInput).not.toHaveBeenCalled();
});
it("should handle RUN_IPYTHON actions by adding input to Jupyter", async () => {
it("should handle RUN_IPYTHON actions as no-op (Jupyter removed)", async () => {
const { handleActionMessage } = await import("#/services/actions");
const ipythonAction: ActionMessage = {
@@ -84,10 +74,7 @@ describe("handleActionMessage", () => {
// Handle the action
handleActionMessage(ipythonAction);
// Check that appendJupyterInput was called with the code
expect(mockAppendJupyterInput).toHaveBeenCalledWith(
"print('Hello from Jupyter!')",
);
// Jupyter functionality has been removed, so nothing should be called
expect(mockAppendInput).not.toHaveBeenCalled();
});
@@ -112,6 +99,5 @@ describe("handleActionMessage", () => {
// Check that nothing was dispatched or called
expect(mockDispatch).not.toHaveBeenCalled();
expect(mockAppendInput).not.toHaveBeenCalled();
expect(mockAppendJupyterInput).not.toHaveBeenCalled();
});
});

View File

@@ -55,7 +55,7 @@ const mockObservationEvent: ObservationEvent = {
tool_call_id: "call_123",
observation: {
kind: "ExecuteBashObservation",
output: "hello\n",
content: [{ type: "text", text: "hello\n" }],
command: "echo hello",
exit_code: 0,
error: false,

View File

@@ -7,6 +7,7 @@ describe("convertRawProvidersToList", () => {
const example1: Partial<Record<Provider, string | null>> | undefined = {
github: "test-token",
gitlab: "test-token",
azure_devops: "test-token",
};
const example2: Partial<Record<Provider, string | null>> | undefined = {
github: "",
@@ -14,9 +15,13 @@ describe("convertRawProvidersToList", () => {
const example3: Partial<Record<Provider, string | null>> | undefined = {
gitlab: null,
};
const example4: Partial<Record<Provider, string | null>> | undefined = {
azure_devops: "test-token",
};
expect(convertRawProvidersToList(example1)).toEqual(["github", "gitlab"]);
expect(convertRawProvidersToList(example1)).toEqual(["github", "gitlab", "azure_devops"]);
expect(convertRawProvidersToList(example2)).toEqual(["github"]);
expect(convertRawProvidersToList(example3)).toEqual(["gitlab"]);
expect(convertRawProvidersToList(example4)).toEqual(["azure_devops"]);
});
});

View File

@@ -32,6 +32,7 @@ describe("Error Handler", () => {
const error = {
message: "Test error",
source: "test",
posthog,
};
trackError(error);
@@ -52,6 +53,7 @@ describe("Error Handler", () => {
extra: "info",
details: { foo: "bar" },
},
posthog,
};
trackError(error);
@@ -73,6 +75,7 @@ describe("Error Handler", () => {
const error = {
message: "Toast error",
source: "toast-test",
posthog,
};
showErrorToast(error);
@@ -94,6 +97,7 @@ describe("Error Handler", () => {
message: "Toast error",
source: "toast-test",
metadata: { context: "testing" },
posthog,
};
showErrorToast(error);
@@ -113,6 +117,7 @@ describe("Error Handler", () => {
message: "Agent error",
source: "agent-status",
metadata: { id: "error.agent" },
posthog,
});
expect(posthog.captureException).toHaveBeenCalledWith(
@@ -127,6 +132,7 @@ describe("Error Handler", () => {
message: "Server error",
source: "server",
metadata: { error_code: 500, details: "Internal error" },
posthog,
});
expect(posthog.captureException).toHaveBeenCalledWith(
@@ -145,6 +151,7 @@ describe("Error Handler", () => {
message: error.message,
source: "feedback",
metadata: { conversationId: "123", error },
posthog,
});
expect(posthog.captureException).toHaveBeenCalledWith(
@@ -164,6 +171,7 @@ describe("Error Handler", () => {
message: "Chat error",
source: "chat-test",
msgId: "123",
posthog,
};
showChatError(error);

View File

@@ -13,14 +13,14 @@ describe("handleCaptureConsent", () => {
});
it("should opt out of of capturing", () => {
handleCaptureConsent(false);
handleCaptureConsent(posthog, false);
expect(optOutSpy).toHaveBeenCalled();
expect(optInSpy).not.toHaveBeenCalled();
});
it("should opt in to capturing if the user consents", () => {
handleCaptureConsent(true);
handleCaptureConsent(posthog, true);
expect(optInSpy).toHaveBeenCalled();
expect(optOutSpy).not.toHaveBeenCalled();
@@ -28,7 +28,7 @@ describe("handleCaptureConsent", () => {
it("should not opt in to capturing if the user is already opted in", () => {
hasOptedInSpy.mockReturnValueOnce(true);
handleCaptureConsent(true);
handleCaptureConsent(posthog, true);
expect(optInSpy).not.toHaveBeenCalled();
expect(optOutSpy).not.toHaveBeenCalled();
@@ -36,7 +36,7 @@ describe("handleCaptureConsent", () => {
it("should not opt out of capturing if the user is already opted out", () => {
hasOptedOutSpy.mockReturnValueOnce(true);
handleCaptureConsent(false);
handleCaptureConsent(posthog, false);
expect(optOutSpy).not.toHaveBeenCalled();
expect(optInSpy).not.toHaveBeenCalled();

View File

@@ -17,7 +17,7 @@ describe("handleEventForUI", () => {
tool_call_id: "call_123",
observation: {
kind: "ExecuteBashObservation",
output: "hello\n",
content: [{ type: "text", text: "hello\n" }],
command: "echo hello",
exit_code: 0,
error: false,

View File

@@ -1,5 +1,5 @@
import { describe, it, expect } from "vitest";
import { getStatusCode, getIndicatorColor, IndicatorColor } from "../status";
import { getStatusCode, getIndicatorColor, IndicatorColor } from "#/utils/status";
import { AgentState } from "#/types/agent-state";
import { I18nKey } from "#/i18n/declaration";
@@ -87,6 +87,36 @@ describe("getStatusCode", () => {
// Should return runtime status since no agent state
expect(result).toBe("STATUS$STARTING_RUNTIME");
});
it("should prioritize task ERROR status over websocket CONNECTING state", () => {
// Test case: Task has errored but websocket is still trying to connect
const result = getStatusCode(
{ id: "", message: "", type: "info", status_update: true }, // statusMessage
"CONNECTING", // webSocketStatus (stuck connecting)
null, // conversationStatus
null, // runtimeStatus
AgentState.LOADING, // agentState
"ERROR", // taskStatus (ERROR)
);
// Should return error message, not "Connecting..."
expect(result).toBe(I18nKey.AGENT_STATUS$ERROR_OCCURRED);
});
it("should show Connecting when task is working and websocket is connecting", () => {
// Test case: Task is in progress and websocket is connecting normally
const result = getStatusCode(
{ id: "", message: "", type: "info", status_update: true }, // statusMessage
"CONNECTING", // webSocketStatus
null, // conversationStatus
null, // runtimeStatus
AgentState.LOADING, // agentState
"WORKING", // taskStatus (in progress)
);
// Should show connecting message since task hasn't errored
expect(result).toBe(I18nKey.CHAT_INTERFACE$CONNECTING);
});
});
describe("getIndicatorColor", () => {

File diff suppressed because it is too large Load Diff

View File

@@ -1,16 +1,17 @@
{
"name": "openhands-frontend",
"version": "0.59.0",
"version": "0.62.0",
"private": true,
"type": "module",
"engines": {
"node": ">=22.0.0"
},
"dependencies": {
"@heroui/react": "^2.8.4",
"@heroui/react": "2.8.5",
"@heroui/use-infinite-scroll": "^2.2.11",
"@microlink/react-json-view": "^1.26.2",
"@monaco-editor/react": "^4.7.0-rc.0",
"@posthog/react": "^1.4.0",
"@react-router/node": "^7.9.3",
"@react-router/serve": "^7.9.3",
"@react-types/shared": "^3.32.0",
@@ -37,7 +38,7 @@
"jose": "^6.1.0",
"lucide-react": "^0.544.0",
"monaco-editor": "^0.53.0",
"posthog-js": "^1.268.8",
"posthog-js": "^1.298.1",
"react": "^19.1.1",
"react-dom": "^19.1.1",
"react-highlight": "^0.15.0",

Some files were not shown because too many files have changed in this diff Show More