- Server route already returns a list[VSCodeInstanceInfo]
- VsCodeRuntime now expects a list and validates shape
- Updated tests to mock list responses consistently
Co-authored-by: OpenHands-GPT-5 <openhands@all-hands.dev>
- IterationControlFlag.reached_limit now compares current_value >= max_value
so tests expecting limit detection and extensions pass
- VsCodeRuntime._get_available_vscode_instances accepts both list and
{"instances": [...]} responses from server for backward/forward compatibility
Co-authored-by: OpenHands-GPT-5 <openhands@all-hands.dev>
- Include VSCode API routes only when AppMode is OSS, aligning with app-mode gating
alongside Git routes.
- Conflicts reconciled with main: kept OSS-gated inclusion to match current server
composition and PR intent.
Co-authored-by: OpenHands-GPT-5 <openhands@all-hands.dev>
Fixes mypy error in VsCodeRuntime by aligning status_callback signature with Runtime and importing RuntimeStatus.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Merge VSCode extension ignore and test-results entries in .gitignore.
- In openhands/server/app.py import server_config and AppMode and conditionally include git routes for OSS mode; also include vscode routes.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
Resolved conflicts by taking vscode-runtime versions for VSCode-related files:
- package.json: Kept runtime features (testConnection command, serverUrl config)
- extension.ts: Kept runtime services and connection logic
- README.md: Kept unified launcher + runtime documentation
- test/suite/index.ts: Kept modern async/await glob usage
Took main version for:
- local_runtime.py: Use sys.executable instead of poetry for Jupyter check
The file was causing an import collision on Windows,
where the would try to import from this
file instead of the installed library. This was causing
the server process to crash and the tests to fail.
This commit renames to to avoid this
name collision and updates all references to the old filename.
Co-authored-by: Gemini
The previous implementation of the jupyter dependency check in the
LocalRuntime used with . This was
causing the server process to die on Windows, leading to test failures.
This change refactors the subprocess call to avoid using the shell,
making it more robust and secure, especially on Windows. This resolves
the CI failures for LocalRuntime tests on the Windows platform.
Co-authored-by: Gemini
- Fix trailing whitespace in test_coverage_analysis.md
- Fix end of file issues
- Apply ruff fixes to ensure all Python code passes linting
All pre-commit hooks now pass successfully.
✅ ALL TESTS NOW PASSING (31/31)
FIXES APPLIED:
🔧 Updated subprocess call count expectations (0→1 for --list-extensions)
🔧 Fixed Windsurf command detection (windsurf→surf)
🔧 Updated error message expectations (attempt→success flag)
🔧 Fixed flag creation behavior (no flag on failure = retry logic)
🔧 Updated bundled installation test patterns (1→2 subprocess calls)
BEHAVIORAL CHANGES VALIDATED:
✅ Extension detection via --list-extensions (always called first)
✅ Success-only flag creation (no flag on failure allows retry)
✅ Proper error handling and user messaging
✅ Windsurf vs VS Code command detection
✅ GitHub + bundled installation fallback patterns
COVERAGE STATUS:
📊 67% coverage (42 lines missing)
🎯 All critical new functionality fully tested
🧪 31 comprehensive tests covering all scenarios
The test suite now accurately reflects the new user-friendly
retry logic and success-based flagging behavior.
MAJOR UX IMPROVEMENT:
- Only create flag file on SUCCESS, not on failure
- Check if extension is already installed before attempting installation
- Allow automatic retry if previous installation failed
- No more manual flag file deletion needed
NEW BEHAVIOR:
- ✅ Extension already installed → detect and mark as successful
- ✅ Installation succeeds → create flag, don't retry
- ✅ Installation fails → no flag, will retry next time
- ✅ User installs VS Code later → automatic retry works
- ✅ User fixes PATH/permissions → automatic retry works
TECHNICAL CHANGES:
- Add _is_extension_installed() to check via --list-extensions
- Add _mark_installation_successful() helper
- Change flag file name from _install_attempted to _installed
- Update tests for new subprocess call patterns
- Add test for extension already installed detection
This makes the installation much more user-friendly and follows
standard practices used by package managers and IDE extensions.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Extract GitHub installation logic into _attempt_github_install()
- Extract bundled VSIX installation logic into _attempt_bundled_install()
- Improve code readability and maintainability
- Each method now has clear responsibility and return values
- Main function is now much cleaner and easier to follow
- All existing functionality preserved, all tests still pass
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Fixed quote consistency (double to single quotes)
- Applied line wrapping for long argument lists
- Improved code formatting per ruff standards
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Updated all tests to expect no marketplace installation attempts
- Simplified error message expectations to match new behavior
- All 24 tests now pass with marketplace installation disabled
- Applied linter formatting fixes
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Fixed control flow bug where return statement prevented finally block execution
- Ensured temporary GitHub VSIX files are always cleaned up after installation
- Updated test to properly mock os.path.exists for cleanup verification
The issue was that when GitHub installation succeeded, the function would return
immediately before the finally block could execute to clean up the downloaded
temporary file. Now we use a success flag and return after cleanup.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Convert single quotes to double quotes for consistency
- Clean up if-else structure formatting
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
Use npx vsce instead of global vsce command to ensure the tool is available
from node_modules/.bin on Windows CI environments where global packages
may not be properly configured.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Remove unnecessary try-catch around fs.existsSync() which doesn't throw exceptions
- Fix Windows virtual environment activation to use PowerShell syntax with Activate.ps1
- Improve cross-platform path handling using path.join() instead of string concatenation
- Reorganize code for better separation of platform-specific logic
- Add detailed comments explaining Windows activation approach and limitations
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
Implement contextual messaging for saved files in 'Start Conversation with File Context' command:
- Saved files now use contextual task messages instead of --file flag
- Message format: 'The user has tagged a file [path]. Please read and understand...'
- Maintains original natural language for untitled files: 'User opened an untitled file...'
- Updated tests to verify new contextual messaging behavior
- Follows same pattern as selection context for consistent user experience
This addresses reviewer feedback to provide contextual messaging for file operations
similar to the Python CLI implementation.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Implemented contextual messaging with createFileContextMessage() and createSelectionContextMessage() helpers
- Added Shell Integration support for better command tracking when available
- Conservative terminal reuse approach - only reuses terminals known to be idle to avoid interrupting user processes
- Idle terminal tracking through Shell Integration execution events
- Proper fallback to sendText when Shell Integration unavailable
- Fixed TypeScript compilation errors in Shell Integration tests with proper mock object properties
- Updated test setup for Mocha compatibility (setup() instead of beforeEach())
- All 16 tests now passing including contextual messaging and Shell Integration functionality
- Verified line number conversion (+1) is correct per VSCode API documentation (0-based to 1-based for human readability)
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
Updated Node.js requirement from >=16 to >=18 to match the frontend's
actual usage (18.20.1 via Volta), ensuring consistency across the project.
Changes:
- package.json: Added Node.js >=18.0.0 engine requirement
- build.py: Updated version check to require Node.js >=18
- README.md: Updated documentation to reflect >=18 requirement
- Error messages: Updated to show correct version requirement
This aligns with the frontend's practical Node.js version while
maintaining the optional build fallback for older versions.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
The previous logic incorrectly skipped building if a .vsix file existed,
which prevented rebuilding during development. Now the logic is:
1. Always try to build if Node.js >= 16 is available
2. Only use pre-built .vsix as fallback when Node.js < 16 or missing
3. Only skip building when SKIP_VSCODE_BUILD is explicitly set
This ensures:
- Developers can rebuild extensions during development
- Users with old Node.js get the pre-built fallback
- The build process works correctly for fresh installs
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
Based on the reviewer's error output showing many VSCode extension
dependencies require Node.js >= 16 or >= 18, update the version check
from >= 14 to >= 16 for more accurate compatibility.
This addresses the specific error with Node.js v12.22.9 that was failing
due to dependencies requiring newer Node.js versions.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Add Node.js version check (requires >= 14)
- Use pre-built .vsix file when Node.js is too old
- Add SKIP_VSCODE_BUILD environment variable option
- Gracefully handle build failures
- Update documentation with build options
This fixes installation issues on systems with Node.js < 14 by falling back
to the pre-built extension instead of failing the entire installation.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Correct Task 2 status from completed to in-progress
- Maintain focus on VSCode Runtime refinement rather than moving to Task 3
- Update next steps to show rebase/integration completed but runtime work ongoing
- Accurately reflect that we're working on making VSCode Runtime robust and reliable
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Fix async Promise executor pattern in test suite
- Fix class method reference issues (RuntimeActionHandler -> VSCodeRuntimeActionHandler)
- Fix explicit any types to unknown in error handlers
- Remove unused mockSocket variable in tests
- Fix trailing whitespace in markdown file
All TypeScript compilation errors resolved, extension now compiles successfully.
Co-authored-by: OpenHands-Claude <openhands-claude@all-hands.dev>
Since we now use openhands-types directly from GitHub repository
via git dependency, the local packages/types copy is no longer needed.
Co-authored-by: OpenHands-Claude <openhands-claude@all-hands.dev>
- Add openhands-types as git dependency from GitHub repository
- Install openhands-types package with full TypeScript declarations
- Fix test/suite/index.ts to use modern glob API with named import
- Verify all type imports work correctly (OpenHandsParsedEvent, isOpenHandsAction)
- Confirm extension compiles and packages successfully
- Add comprehensive analysis document for openhands-types integration
This resolves the missing openhands-types dependency that was blocking
VSCode Runtime (Task 2) development. The extension can now properly
validate and handle OpenHands events and actions.
Co-authored-by: OpenHands-Claude <openhands-claude@all-hands.dev>
BREAKTHROUGH: Solved the @openhands/types package issue that was blocking VSCode extension testing!
## Problem Solved:
- Module resolution failure: 'Cannot find module packages/types/dist/core/base'
- File-based package linking failed in VSCode test environment
- Module format mismatch between ES modules and CommonJS
## Solution Implemented:
1. **Package Renamed**: @openhands/types → openhands-types (npm compatible)
2. **Dual-Format Package**: Support both CommonJS (.cjs) and ES modules (.js)
3. **npm link**: Established proper symlink between packages/types and extension
4. **Import Path Fixes**: Fixed CommonJS require statements to use .cjs extensions
5. **Build Automation**: Scripts handle dual builds and file renaming
## Technical Changes:
- packages/types/package.json: Dual exports with proper file extensions
- packages/types/tsconfig.cjs.json: CommonJS build configuration
- packages/types/fix-cjs-imports.js: Script to fix import paths
- VSCode extension: Updated dependency to 'openhands-types': '^0.1.0'
- Import statements: Updated in socket-service.ts and runtime-action-handler.ts
## Verification:
✅ Extension compiles successfully without errors
✅ Tests run properly (20 tests passing)
✅ Module resolution working in both dev and test environments
✅ npm link functioning with proper symlink
## Status:
- Module resolution issue: COMPLETELY SOLVED
- Extension testing: UNBLOCKED
- Remaining test failures: Unrelated network/mocking issues
This resolves the core TypeScript types package issue that was preventing
VSCode extension testing and development.
Co-authored-by: openhands <openhands@all-hands.dev>
Auto-fixed formatting and style issues in VSCode extension source files
using eslint and prettier.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Add socket-service.test.ts with 3 passing tests for basic functionality, VSCode API access, and fetch mocking
- Add runtime-action-handler.test.ts with 3 passing tests for basic functionality, workspace API, and workspace mocking
- Establish TypeScript test framework for VSCode extension services
- Implement proper mocking patterns for VSCode APIs
- Create test infrastructure ready for future service testing expansion
- All new tests compile and run successfully (7/7 passing)
- Update task.md to mark Phase 4.3 as completed
Technical achievements:
- Successfully created TypeScript test framework avoiding complex import issues
- Validated VSCode API mocking capabilities for future comprehensive testing
- Established foundation for testing SocketService and RuntimeActionHandler classes
OpenHands-Claude
✅ PHASE 3 COMPLETED: VsCodeRuntime Discovery & Error Handling
Dynamic Discovery System:
- Removed constructor dependencies for sio_server/socket_connection_id
- Added _get_available_vscode_instances() to query /api/vscode/instances
- Added _validate_vscode_connection() for health checking
- Added _discover_and_connect() for automatic VSCode instance discovery
- Gets sio_server from shared.py automatically (no injection needed)
Smart Connection Management:
- Lazy connection: only connects when actions need to be sent
- Connection validation before every action
- Automatic reconnection if VSCode instance becomes inactive
- Failover to alternative VSCode instances when available
- Comprehensive error handling with user-friendly messages
Enhanced Runtime Features:
- Works with standard AgentSession parameters (no special constructor args)
- Logs workspace path and capabilities on connection
- Continuous health monitoring of connections
- Graceful handling of disconnections and network issues
- Clear error messages when no VSCode instances available
Architecture Achievement:
- Complete end-to-end lazy connection pattern implementation
- VSCode Extension registers → Server tracks → Runtime discovers → Actions flow
- Eliminated timing issues between extension connection and runtime creation
- Robust connection lifecycle management with automatic recovery
- Foundation ready for Phase 4 integration testing
Technical Details:
- Fixed mypy type errors for None checks and union types
- Added proper validation for socket_connection_id before use
- Enhanced error handling for sio_server None cases
- Maintained backward compatibility with existing test injection patterns
Next: Phase 4 - Integration testing and final validation of complete system
Co-authored-by: enyst <enyst@users.noreply.github.com>
Co-authored-by: OpenHands-Claude <openhands-claude@all-hands.dev>
✅ COMPLETED: Extension Lazy Connection Implementation
- Remove immediate initializeRuntime() call from activate()
- Add ConnectionStatus enum for tracking connection state
- Implement ensureConnected() function with lazy connection logic
- Modify all user commands to trigger connection on-demand
- Add openhands.testConnection command for manual testing
- Replace eager connection with user-triggered connection flow
🔧 TECHNICAL CHANGES:
- Extension now activates without connecting to server
- Connection only happens when user runs OpenHands commands
- Comprehensive error handling with user-friendly messages
- Retry and configuration options in error dialogs
- Connection status tracking prevents duplicate attempts
🎯 BENEFITS:
- Eliminates timing dependency (server doesn't need to be running on VSCode start)
- Matches user mental model (connect when using OpenHands)
- Better error handling and user feedback
- Resource efficient (no background connections)
📋 NEXT: Phase 2 - Server Registration System
Co-authored-by: OpenHands-Claude <openhands-claude@all-hands.dev>
BREAKTHROUGH: Identified fundamental timing issue with immediate connection:
- VSCode Extension activates when VSCode starts
- But OpenHands server might not be running yet!
- Extension fails to connect and becomes unusable
NEW APPROACH: Lazy Connection Pattern
- Extension activates but doesn't connect immediately
- Only connects when user runs OpenHands commands
- Matches user mental model and eliminates timing dependencies
- Simpler, more resource-efficient implementation
Next: Implement lazy connection in extension activation
Co-authored-by: OpenHands-Claude <openhands-claude@all-hands.dev>
Removed migration section since code consolidation is done.
Now focused on implementing the Runtime Registration Pattern:
- VSCode registration API endpoint
- Extension registration after Socket.IO connection
- VsCodeRuntime connection discovery
- End-to-end coordination testing
- Architecture breakthrough: Socket.IO approach is brilliant, not hallucinated
- Identified real problems: connection coordination, not fundamental architecture
- Proposed solution: Runtime Registration Pattern for connection discovery
- Migration plan: consolidate extension code from old scaffolding to main extension
- Next steps: migrate files, implement registration API, test coordination
Key findings:
- Socket.IO architecture is actually brilliant and correct
- VSCode Extension acts like another frontend client (like web UI)
- Main issue: VsCodeRuntime needs socket_connection_id but has no way to get it
- AgentSession only passes standard runtime params, missing VSCode-specific ones
Proposed solution: Runtime Registration Pattern
- VSCode Extension registers itself with OpenHands server after connecting
- Server maintains registry: socket_connection_id → VSCode instance info
- VsCodeRuntime queries registry to find available connections
- Clean separation: Extension handles connection, Runtime handles execution
This solves the coordination problem without changing core architecture!
- VSCode Extension acts like another frontend client (like web UI)
- Main Socket.IO server acts as message broker
- VsCodeRuntime routes events via socket_connection_id
- Architecture reuses existing OpenHands infrastructure elegantly
- Real issues are connection timing and coordination, not architecture
Document the successful completion of the VSCode runtime migration:
- All 4 phases completed successfully
- Extension functionality unified (launcher + runtime)
- Old extension cleanly removed
- Documentation updated
- Ready for testing and deployment
This summary provides a comprehensive overview of what was accomplished
during the migration process.
Major integration milestone:
- Add imports for SocketService and VSCodeRuntimeActionHandler
- Add runtime initialization function with server URL configuration
- Integrate runtime startup in activate() function
- Add proper cleanup in deactivate() function
- Successfully compile and package unified extension
The extension now combines:
1. Launcher functionality (context menu commands)
2. Runtime functionality (backend communication and action execution)
Testing results:
- ✅ TypeScript compilation successful (npm run compile)
- ✅ Extension packaging successful (npm run package-vsix)
- 🔄 Manual testing in VSCode pending
Next: Phase 4 cleanup of old runtime extension files.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Add ~/.openhands/microagents/ as a microagent source directory
- User microagents are loaded after global ones, allowing overrides
- Automatically create user microagents directory if it doesn't exist
- Add comprehensive unit tests for user microagent functionality
- Handle errors gracefully when loading user microagents
This allows users to store personal/local microagents in their user
directory instead of keeping uncommitted files in repository working
directories, preventing accidental loss during git operations.
Co-authored-by: openhands <openhands@all-hands.dev>
- Fix VsCodeRuntime constructor to match standard runtime interface
- Add missing abstract methods with correct signatures: connect, copy_from, copy_to, get_mcp_config, list_files
- Add VSCode runtime to test framework in conftest.py
- Add VSCode runtime tests to CI workflow
- Create comprehensive task analysis in vscode_runtime_task.md
- Update vscode.md with current implementation status
The VSCode runtime now properly integrates with the existing test infrastructure
and returns appropriate errors when no VSCode extension is connected.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
✅ Added event serialization support:
- Import event_to_dict and event_from_dict from openhands.events.serialization
- Replace manual event payload creation with proper event_to_dict()
- Replace manual observation construction with event_from_dict()
✅ Benefits:
- Ensures consistent JSON serialization format across all runtimes
- Handles all action/observation types automatically
- Proper handling of complex fields (timestamps, enums, metadata)
- Maintains compatibility with existing event stream format
- Reduces code duplication and potential serialization bugs
✅ Socket.IO communication now uses:
- Outgoing: event_to_dict(action) → JSON → VSCode extension
- Incoming: JSON → event_from_dict(observation_event) → Observation
This makes the VSCode runtime fully compatible with OpenHands event
serialization standards and ready for production use.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
Major fixes applied:
✅ Removed hallucinated actions:
- Deleted mkdir(), rmdir(), rm() methods - these action types don't exist
- Directory operations should use CmdRunAction or FileEditAction
✅ Added missing required abstract methods:
- edit() for FileEditAction
- browse_interactive() for BrowseInteractiveAction
- call_tool_mcp() for MCPAction
✅ Fixed method signatures:
- All methods now match Runtime base class exactly
- Added _run_async_action() helper for async operations in sync context
✅ Removed non-standard methods:
- Deleted recall(), finish(), send_message() - these are agent-level actions
✅ Fixed imports and observations:
- Added missing Action import and all required action/observation types
- Added support for FileEditObservation, BrowserOutputObservation, etc.
- Fixed observation constructors with correct parameters
✅ Fixed event payload and logging:
- Use action.__class__.__name__ and action.__dict__
- Fixed logger.warn() to logger.warning()
- Fixed mypy type errors with proper type assertions
The runtime now correctly implements all required abstract methods with only
actual OpenHands actions. Socket.IO architecture remains sound. Ready for
integration testing with VSCode extension.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Identified hallucinated actions: mkdir, rmdir, rm don't exist in OpenHands
- Directory operations should use CmdRunAction or FileEditAction
- Missing required abstract methods: edit, browse_interactive, call_tool_mcp
- Wrong method signatures: some async methods should be sync
- Scope issues: implementing agent-level actions instead of execution actions
- Socket.IO architecture is correct, but action handling needs fixes
- Documented actual OpenHands actions vs hallucinated ones
The runtime needs to implement only the actions that actually exist in openhands.events.
- Corrected analysis to recognize existing Socket.IO infrastructure
- Removed incorrect assumptions about missing infrastructure
- Updated architecture documentation to show proper event flow
- Changed assessment from 'fundamental issues' to 'implementation details'
- Documented proper integration with existing OpenHands Socket.IO server
The VSCode runtime approach is architecturally sound and leverages existing infrastructure correctly.
- Add ~/.openhands/microagents/ as a microagent source directory
- User microagents are loaded after global ones, allowing overrides
- Automatically create user microagents directory if it doesn't exist
- Add comprehensive unit tests for user microagent functionality
- Handle errors gracefully when loading user microagents
This allows users to store personal/local microagents in their user
directory instead of keeping uncommitted files in repository working
directories, preventing accidental loss during git operations.
Co-authored-by: openhands <openhands@all-hands.dev>
- Fix VsCodeRuntime constructor to match standard runtime interface
- Add missing abstract methods with correct signatures: connect, copy_from, copy_to, get_mcp_config, list_files
- Add VSCode runtime to test framework in conftest.py
- Add VSCode runtime tests to CI workflow
- Create comprehensive task analysis in vscode_runtime_task.md
- Update vscode.md with current implementation status
The VSCode runtime now properly integrates with the existing test infrastructure
and returns appropriate errors when no VSCode extension is connected.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Fix VsCodeRuntime constructor to match standard runtime interface
- Add missing abstract methods with correct signatures: connect, copy_from, copy_to, get_mcp_config, list_files
- Add VSCode runtime to test framework in conftest.py
- Add VSCode runtime tests to CI workflow
- Create comprehensive task analysis in vscode_runtime_task.md
- Update vscode.md with current implementation status
The VSCode runtime now properly integrates with the existing test infrastructure
and returns appropriate errors when no VSCode extension is connected.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
✅ Added event serialization support:
- Import event_to_dict and event_from_dict from openhands.events.serialization
- Replace manual event payload creation with proper event_to_dict()
- Replace manual observation construction with event_from_dict()
✅ Benefits:
- Ensures consistent JSON serialization format across all runtimes
- Handles all action/observation types automatically
- Proper handling of complex fields (timestamps, enums, metadata)
- Maintains compatibility with existing event stream format
- Reduces code duplication and potential serialization bugs
✅ Socket.IO communication now uses:
- Outgoing: event_to_dict(action) → JSON → VSCode extension
- Incoming: JSON → event_from_dict(observation_event) → Observation
This makes the VSCode runtime fully compatible with OpenHands event
serialization standards and ready for production use.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
Major fixes applied:
✅ Removed hallucinated actions:
- Deleted mkdir(), rmdir(), rm() methods - these action types don't exist
- Directory operations should use CmdRunAction or FileEditAction
✅ Added missing required abstract methods:
- edit() for FileEditAction
- browse_interactive() for BrowseInteractiveAction
- call_tool_mcp() for MCPAction
✅ Fixed method signatures:
- All methods now match Runtime base class exactly
- Added _run_async_action() helper for async operations in sync context
✅ Removed non-standard methods:
- Deleted recall(), finish(), send_message() - these are agent-level actions
✅ Fixed imports and observations:
- Added missing Action import and all required action/observation types
- Added support for FileEditObservation, BrowserOutputObservation, etc.
- Fixed observation constructors with correct parameters
✅ Fixed event payload and logging:
- Use action.__class__.__name__ and action.__dict__
- Fixed logger.warn() to logger.warning()
- Fixed mypy type errors with proper type assertions
The runtime now correctly implements all required abstract methods with only
actual OpenHands actions. Socket.IO architecture remains sound. Ready for
integration testing with VSCode extension.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Identified hallucinated actions: mkdir, rmdir, rm don't exist in OpenHands
- Directory operations should use CmdRunAction or FileEditAction
- Missing required abstract methods: edit, browse_interactive, call_tool_mcp
- Wrong method signatures: some async methods should be sync
- Scope issues: implementing agent-level actions instead of execution actions
- Socket.IO architecture is correct, but action handling needs fixes
- Documented actual OpenHands actions vs hallucinated ones
The runtime needs to implement only the actions that actually exist in openhands.events.
- Corrected analysis to recognize existing Socket.IO infrastructure
- Removed incorrect assumptions about missing infrastructure
- Updated architecture documentation to show proper event flow
- Changed assessment from 'fundamental issues' to 'implementation details'
- Documented proper integration with existing OpenHands Socket.IO server
The VSCode runtime approach is architecturally sound and leverages existing infrastructure correctly.
- Add OpenHands submenu to context menu for cleaner organization
- Group 'Start with File Content' and 'Start with Selected Text' commands
- Use shorter titles in context menu while preserving full descriptive names in Command Palette
- Leverage category field to automatically prefix commands with 'OpenHands:' in Ctrl+Shift+P
Co-authored-by: openhands <openhands@all-hands.dev>
- Change from 'OpenHands 14:32:45' to 'OpenHands 14:32'
- More human-friendly and cleaner terminal tab names
- Minute precision is sufficient for terminal identification
- VSCode handles duplicate names gracefully if needed
Co-authored-by: openhands <openhands@all-hands.dev>
- build.py is essential: runs npm install and npm run package-vsix
- Creates the .vsix file during Poetry build process
- Without it, there would be no .vsix file to include in package
- This is a necessary part of VSCode extension integration
Co-authored-by: openhands <openhands@all-hands.dev>
- Keep only essential change: include .vsix file in package
- Revert unnecessary changes to packages structure and dependencies
- Remove pytest from main dependencies (belongs in dev.dependencies)
- Remove custom build script (not needed for this PR)
- Cleaner, focused changes for VSCode extension integration
Co-authored-by: openhands <openhands@all-hands.dev>
- Moved development planning document to ~/.openhands/microagents/plan-vscode-integration.md
- PLAN.md was useful during development but doesn't belong in production extension
- Keeps repository clean for end users while preserving development history
Co-authored-by: openhands <openhands@all-hands.dev>
- Remove error messages for missing editor/file/selection contexts
- All commands now gracefully fallback to starting OpenHands without task
- Better user experience: clicking any command always starts OpenHands
- Commands behavior:
* startConversation: no task (unchanged)
* startConversationWithFileContext: file content as task, or no task if no file/empty
* startConversationWithSelectionContext: selected text as task, or no task if no selection
Co-authored-by: openhands <openhands@all-hands.dev>
- Replace last DEBUG showErrorMessage with output channel logging
- Keep legitimate user-facing error messages as popups
- All debug info now goes to 'OpenHands Debug' output channel
Co-authored-by: openhands <openhands@all-hands.dev>
- Remove vscode.window.showErrorMessage() calls for debug information
- Add dedicated 'OpenHands Debug' output channel for development logging
- Debug messages now appear in Output panel instead of popup notifications
- Users won't be bothered by debug messages, but developers can still access them
- Follows VSCode extension best practices for logging
Co-authored-by: openhands <openhands@all-hands.dev>
This development-time analysis file has been moved to user microagents
directory (~/.openhands/microagents/) as it's not needed by other developers.
The analysis was useful during development but doesn't belong in the PR.
- Uncomment .vscode-test/ in .gitignore to prevent accidental commits
- These files are generated during extension testing and shouldn't be in version control
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Add location info for public microagents in glossary
- Add comprehensive Microagents section to repo.md with:
- Types (public vs repository microagents)
- Loading behavior (frontmatter triggers vs always-loaded)
- Structure example with YAML frontmatter
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Add comprehensive VSCode API documentation references as comments
- Include Shell Integration requirements and compatibility notes
- Preserve important development references in the codebase for future maintainers
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Move TERMINAL_REUSE_ANALYSIS.md to .openhands/microagents/vscode-terminal-reuse-analysis.md
- Update README.md with essential user-facing terminal management info
- Remove detailed development analysis from PR, keeping it for future reference
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Updated package.json engines.vscode from ^1.80.0 to ^1.98.2
- Updated @types/vscode dependency to ^1.98.2
- Updated README.md requirements section
- Updated PLAN.md documentation
- Regenerated package-lock.json automatically via npm install
This aligns our main VSCode extension with the runtime extensions
which already require VSCode 1.98.2+, ensuring consistency across
all VSCode integrations in the project.
Co-authored-by: openhands <openhands@all-hands.dev>
- Add VSCode extension linting command to pre-push checklist
- Document VSCode extension structure, setup, and commands
- Include linting, building, and testing commands for the extension
Co-authored-by: OpenHands <openhands@all-hands.dev>
- Add comprehensive linting setup adapted from frontend configuration
- Configure ESLint with airbnb-base rules for Node.js/VSCode extensions
- Add Prettier configuration matching frontend standards
- Include linting scripts in package.json (lint, lint:fix, typecheck)
- Add development dependencies for linting tools
- Update documentation with linting workflow and development guidelines
- Apply automatic formatting to all source files
- Configure special rules for test files and VSCode extension patterns
This ensures code quality consistency with the main OpenHands codebase.
Co-authored-by: OpenHands <openhands@all-hands.dev>
The previous implementation used probing to detect terminal status, which
could interrupt running CLI processes. This fix implements safe state
tracking that only reuses terminals where OpenHands commands have completed.
Key changes:
- Remove intrusive terminal probing that interrupted running processes
- Add safe state tracking using Set to track idle terminals
- Only reuse terminals that we know are safe (completed our commands)
- Use Shell Integration API for monitoring command completion
- Create new terminals when terminal state is unknown (safe fallback)
- Clean up terminal state tracking when terminals are closed
This ensures that running CLIs and other processes in terminals are never
interrupted when sending new tasks to OpenHands.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Add Shell Integration API support for smart terminal detection
- Implement terminal probing to check if terminals are idle
- Add graceful fallback to new terminal creation when Shell Integration unavailable
- Refactor code into modular functions for better maintainability
- Add comprehensive tests for new terminal reuse functionality
- Update README with new features and requirements
- Support cross-shell compatibility (bash, zsh, PowerShell, fish)
This implements the advanced terminal handling described in TERMINAL_REUSE_ANALYSIS.md,
providing intelligent terminal reuse while maintaining backward compatibility.
Co-authored-by: OpenHands-Gemini <openhands@all-hands.dev>
- Add comprehensive analysis of VSCode's Shell Integration capabilities
- Document intelligent terminal probing with execution.read() and executeCommand()
- Update recommendations to use Shell Integration with graceful fallback
- Replace outdated API limitations with current 2024/2025 capabilities
- Add implementation strategy with phases and code examples
- Include proper references to VSCode API documentation
Co-authored-by: Claude 3.5 Sonnet <claude-3-5-sonnet@anthropic.com>
The build script was trying to copy the VSIX file to the same location,
causing a SameFileError. Since the VSIX is already built in the correct
location (openhands/integrations/vscode/) and pyproject.toml includes
it from there, no copying is needed.
Changes:
- Remove unnecessary copy operation from build_vscode_extension()
- Remove unused shutil import and RESOURCES_DIR variable
- Simplify to just build and verify the VSIX exists
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
- Move VS Code extension from root-level openhands-vscode/ to openhands/integrations/vscode/
- Update pyproject.toml to include VSIX from new location: openhands/integrations/vscode/*.vsix
- Update CLI code to load VSIX from new path: integrations/vscode/
- Update build.py to build extension in new location
- Preserve file history using git mv operations
- Maintain VSIX bundling in PyPI package for CLI auto-installation
This reorganization improves architectural consistency by placing the VS Code
integration alongside other integrations rather than at the root level.
The VSIX file is excluded as it's a build artifact generated by build.py.
Co-authored-by: OpenHands-Claude <openhands@all-hands.dev>
PR_BODY=$(gh pr view $PR_NUMBER --json body --jq .body)
PR_BODY=$(gh pr view "$PR_NUMBER" --json body --jq .body)
# Prepare the new PR body with both commands
ifecho"$PR_BODY"| grep -q "To run this PR locally, use the following command:";then
# For existing PR descriptions, replace the command section
NEW_PR_BODY=$(echo"$PR_BODY"| sed "s|To run this PR locally, use the following command:.*\`\`\`|To run this PR locally, use the following command:\n\nGUI with Docker:\n\`\`\`\n$DOCKER_RUN_COMMAND\n\`\`\`\n\nCLI with uvx:\n\`\`\`\n$UVX_RUN_COMMAND\n\`\`\`|s")
# For existing PR descriptions, use a more robust approach
# Split the PR body at the "To run this PR locally" section and replace everything after it
BEFORE_SECTION=$(echo"$PR_BODY"| sed '/To run this PR locally, use the following command:/,$d')
NEW_PR_BODY=$(cat <<EOF
${BEFORE_SECTION}
To run this PR locally, use the following command:
GUI with Docker:
\`\`\`
${DOCKER_RUN_COMMAND}
\`\`\`
CLI with uvx:
\`\`\`
${UVX_RUN_COMMAND}
\`\`\`
EOF
)
else
# For new PR descriptions
NEW_PR_BODY="${PR_BODY}
# For new PR descriptions: use heredoc safely without indentation
NEW_PR_BODY=$(cat <<EOF
$PR_BODY
---
@@ -35,15 +55,17 @@ To run this PR locally, use the following command:
GUI with Docker:
\`\`\`
$DOCKER_RUN_COMMAND
${DOCKER_RUN_COMMAND}
\`\`\`
CLI with uvx:
\`\`\`
$UVX_RUN_COMMAND
\`\`\`"
${UVX_RUN_COMMAND}
\`\`\`
EOF
)
fi
# Update the PR description
echo"Updating PR description with Docker and uvx commands"
stale-issue-message:'This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.'
stale-pr-message:'This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.'
days-before-stale:30
stale-issue-message:'This issue is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.'
stale-pr-message:'This PR is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.'
days-before-stale:40
exempt-issue-labels:'roadmap'
close-issue-message:'This issue was closed because it has been stalled for over 30 days with no activity.'
close-pr-message:'This PR was closed because it has been stalled for over 30 days with no activity.'
days-before-close:7
close-issue-message:'This issue was automatically closed due to 50 days of inactivity. We do this to help keep the issues somewhat manageable and focus on active issues.'
close-pr-message:'This PR was closed because it had no activity for 50 days. If you feel this was closed in error, and you would like to continue the PR, please resubmit or let us know.'
@@ -52,37 +52,63 @@ which comes with $20 in free credits for new users.
## 💻 Running OpenHands Locally
OpenHands can also run on your local system using Docker.
See the [Running OpenHands](https://docs.all-hands.dev/usage/installation) guide for
system requirements and more information.
### Option 1: CLI Launcher (Recommended)
> [!WARNING]
> On a public network? See our [Hardened Docker Installation Guide](https://docs.all-hands.dev/usage/runtimes/docker#hardened-docker-installation)
> to secure your deployment by restricting network binding and implementing additional security measures.
The easiest way to run OpenHands locally is using the CLI launcher with [uv](https://docs.astral.sh/uv/). This provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers.
**Install uv** (if you haven't already):
See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform.
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!
> [!WARNING]
> On a public network? See our [Hardened Docker Installation Guide](https://docs.all-hands.dev/usage/runtimes/docker#hardened-docker-installation)
> to secure your deployment by restricting network binding and implementing additional security measures.
### Getting Started
When you open the application, you'll be asked to choose an LLM provider and add an API key.
[Anthropic's Claude Sonnet 4](https://www.anthropic.com/api) (`anthropic/claude-sonnet-4-20250514`)
works best, but you have [many options](https://docs.all-hands.dev/usage/llms).
See the [Running OpenHands](https://docs.all-hands.dev/usage/installation) guide for
system requirements and more information.
## 💡 Other ways to run OpenHands
> [!WARNING]
@@ -93,8 +119,8 @@ works best, but you have [many options](https://docs.all-hands.dev/usage/llms).
#- SANDBOX_USER_ID=${SANDBOX_USER_ID:-1234} # enable this only if you want a specific non-root sandbox user but you will have to manually adjust permissions of ~/.openhands for this user
Here the workspace name is **jira.all-hands.dev**.
</Note>
3. **Complete OAuth Flow**
- You'll be redirected to Jira Data Center to complete OAuth verification
- Grant the necessary permissions to verify your workspace access. If you have access to multiple workspaces, select the correct one that you initially provided
@@ -101,18 +109,18 @@ description: Complete guide for setting up Jira Data Center integration with Ope
@@ -28,7 +28,7 @@ description: Complete guide for setting up Linear integration with OpenHands Clo
1. **Access API Settings**
- Log in as the service account
- Go to **Settings** > **API**
- Go to **Settings** > **Security & access**
2. **Create Personal API Key**
- Click **Create new key**
@@ -82,6 +82,14 @@ description: Complete guide for setting up Linear integration with OpenHands Clo
- **Service Account API Key**: The API key from Step 2 above
- Ensure **Active** toggle is enabled
<Note>
Workspace name is the identifier after the host name when accessing a resource in Linear.
Eg: https://linear.app/allhands/issue/OH-37
Here the workspace name is **allhands**.
</Note>
3. **Complete OAuth Flow**
- You'll be redirected to Linear to complete OAuth verification
- Grant the necessary permissions to verify your workspace access. If you have access to multiple workspaces, select the correct one that you initially provided
@@ -105,15 +113,15 @@ description: Complete guide for setting up Linear integration with OpenHands Clo
@@ -58,17 +58,18 @@ The OpenHands agent needs to identify which Git repository to work with when pro
### Platform Configuration Issues
- **Webhook not triggering**: Verify the webhook URL is correct and the proper event types are selected (Comment, Issue updated)
- **API authentication failing**: Check API key/token validity and ensure required scopes are granted
- **API authentication failing**: Check API key/token validity and ensure required scopes are granted. If your current API token is expired, make sure to update it in the respective integration settings
- **Permission errors**: Ensure the service account has access to relevant projects/teams and appropriate permissions
### Workspace Integration Issues
- **Workspace linking requests credentials**: If there are no active workspace integrations for the workspace you specified, you need to configure it first. Contact your platform administrator that you want to integrate with (eg: Jira, Linear)
- **OAuth flow fails**: Ensure you're signing in with the same Git provider account that contains the repositories you want OpenHands to work on
- **Integration not found**: Verify the workspace name matches exactly and that platform configuration was completed first
- **OAuth flow fails**: Make sure that you're authorizing with the correct account with proper workspace access
### General Issues
- **Agent not responding**: Check webhook logs in your platform settings and verify service account status
- **Agent fails to identify git repo**: Ensure you're signing in with the same Git provider account that contains the repositories you want OpenHands to work on
- **Partial functionality**: Ensure both platform configuration and workspace integration are properly completed
**Note** - OpenHands requires Python version 3.12 or higher (Python 3.14 is not currently supported) and `uvx` for the default `fetch` MCP server (more details below).
**Note** - OpenHands requires Python version 3.12 or higher (Python 3.14 is not currently supported) and `uv` for the default `fetch` MCP server (more details below).
1. Install OpenHands using pip:
```bash
pip install openhands-ai
```
#### Recommended: Using uv
Or if you prefer not to manage your own Python environment, you can use `uvx`:
We recommend using [uv](https://docs.astral.sh/uv/) for the best OpenHands experience. uv provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers.
1. **Install uv** (if you haven't already):
See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform.
2. **Launch OpenHands CLI**:
```bash
uvx --python 3.12 --from openhands-ai openhands
```
<AccordionGroup>
<Accordion title="Alternative: Traditional pip installation">
If you prefer to use pip:
```bash
# Install OpenHands
pip install openhands-ai
```
Note that you'll still need `uv` installed for the default MCP servers to work properly.
</Accordion>
<Accordion title="Create shell aliases for easy access across environments">
Add the following to your shell configuration file (`.bashrc`, `.zshrc`, etc.):
```bash
# Add OpenHands aliases
# Add OpenHands aliases (recommended)
alias openhands="uvx --python 3.12 --from openhands-ai openhands"
alias oh="uvx --python 3.12 --from openhands-ai openhands"
```
@@ -72,18 +87,19 @@ source ~/.bashrc # or source ~/.zshrc
</AccordionGroup>
2. Launch an interactive OpenHands conversation from the command line:
3. Launch an interactive OpenHands conversation from the command line:
```bash
openhands
# If using uvx (recommended)
uvx --python 3.12 --from openhands-ai openhands
```
<Note>
If you have cloned the repository, you can also run the CLI directly using Poetry:
poetry run python -m openhands.cli.main
poetry run openhands
</Note>
3. Set your model, API key, and other preferences using the UI (or alternatively environment variables, below).
4. Set your model, API key, and other preferences using the UI (or alternatively environment variables, below).
This command opens an interactive prompt where you can type tasks or commands and get responses from OpenHands.
The first time you run the CLI, it will take you through configuring the required LLM
@@ -103,7 +119,7 @@ The conversation history will be saved in `~/.openhands/sessions`.
@@ -7,6 +7,67 @@ description: High level overview of the Graphical User Interface (GUI) in OpenHa
- [OpenHands is running](/usage/local-setup)
## Launching the GUI Server
### Using the CLI Command
You can launch the OpenHands GUI server directly from the command line using the `serve` command:
<Callout type="info">
**Prerequisites**: You need to have the [OpenHands CLI installed](/usage/how-to/cli-mode) first, OR have `uv` installed and run `uvx --python 3.12 --from openhands-ai openhands serve`. Otherwise, you'll need to use Docker directly (see the [Docker section](#using-docker-directly) below).
</Callout>
```bash
openhands serve
```
This command will:
- Check that Docker is installed and running
- Pull the required Docker images
- Launch the OpenHands GUI server at http://localhost:3000
- Use the same configuration directory (`~/.openhands`) as the CLI mode
#### Mounting Your Current Directory
To mount your current working directory into the GUI server container, use the `--mount-cwd` flag:
```bash
openhands serve --mount-cwd
```
This is useful when you want to work on files in your current directory through the GUI. The directory will be mounted at `/workspace` inside the container.
#### Using GPU Support
If you have NVIDIA GPUs and want to make them available to the OpenHands container, use the `--gpu` flag:
```bash
openhands serve --gpu
```
This will enable GPU support via nvidia-docker, mounting all available GPUs into the container. You can combine this with other flags:
```bash
openhands serve --gpu --mount-cwd
```
**Prerequisites for GPU support:**
- NVIDIA GPU drivers must be installed on your host system
- [NVIDIA Container Toolkit (nvidia-docker2)](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) must be installed and configured
#### Requirements
Before using the `openhands serve` command, ensure that:
- Docker is installed and running on your system
- You have internet access to pull the required Docker images
- Port 3000 is available on your system
The CLI will automatically check these requirements and provide helpful error messages if anything is missing.
### Using Docker Directly
Alternatively, you can run the GUI server using Docker directly. See the [local setup guide](/usage/local-setup) for detailed Docker instructions.
@@ -32,4 +32,4 @@ When running OpenHands, you'll need to set the following in the OpenHands UI thr
Pricing follows official API provider rates. [You can view model prices here.](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
For `qwen3-coder-480b`, we charge the cheapest FP8 rate available on openrouter: $0.4 per million input tokens and $1.6 per million output tokens.
For `qwen3-coder-480b`, we charge the cheapest FP8 rate available on openrouter: \$0.4 per million input tokens and \$1.6 per million output tokens.
@@ -66,20 +66,64 @@ A system with a modern processor and a minimum of **4GB RAM** is recommended to
### Start the App
#### Option 1: Using the CLI Launcher with uv (Recommended)
We recommend using [uv](https://docs.astral.sh/uv/) for the best OpenHands experience. uv provides better isolation from your current project's virtual environment and is required for OpenHands' default MCP servers (like the [fetch MCP server](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)).
**Install uv** (if you haven't already):
See the [uv installation guide](https://docs.astral.sh/uv/getting-started/installation/) for the latest installation instructions for your platform.
This will automatically handle Docker requirements checking, image pulling, and launching the GUI server. The `--gpu` flag enables GPU support via nvidia-docker, and `--mount-cwd` mounts your current directory into the container.
<Accordion title="Alternative: Traditional pip installation">
If you prefer to use pip and have Python 3.12+ installed:
```bash
# Install OpenHands
pip install openhands-ai
# Launch the GUI server
openhands serve
```
Note that you'll still need `uv` installed for the default MCP servers to work properly.
</Accordion>
#### Option 2: Using Docker Directly
<Accordion title="Docker Command (Click to expand)">
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
You'll find OpenHands running at http://localhost:3000!
This folder contains the evaluation harness that we built on top of the original [SWE-Bench benchmark](https://www.swebench.com/) ([paper](https://arxiv.org/abs/2310.06770)).
**UPDATE (8/12/2025): We now support running SWE-rebench evaluation (see the paper [here](https://arxiv.org/abs/2505.20411))! For how to run it, checkout [this README](./SWE-rebench.md).**
**UPDATE (6/15/2025): We now support running SWE-bench-Live evaluation (see the paper [here](https://arxiv.org/abs/2505.23419))! For how to run it, checkout [this README](./SWE-bench-Live.md).**
**UPDATE (5/26/2025): We now support running interactive SWE-Bench evaluation (see the paper [here](https://arxiv.org/abs/2502.13069))! For how to run it, checkout [this README](./SWE-Interact.md).**
SWE-rebench is a large-scale dataset for verifiable software engineering tasks.
It comes in **two datasets**:
* **[`nebius/SWE-rebench-leaderboard`](https://huggingface.co/datasets/nebius/SWE-rebench-leaderboard)** – updatable benchmark used for [leaderboard evaluation](https://swe-rebench.com/leaderboard).
* **[`nebius/SWE-rebench`](https://huggingface.co/datasets/nebius/SWE-rebench)** – full dataset with **21,302 tasks**, suitable for training or large-scale offline evaluation.
This document explains how to run OpenHands on SWE-rebench, using the leaderboard split as the main example.
To run on the full dataset, simply replace the dataset name.
## Setting Up
Set up your development environment and configure your LLM provider by following the [SWE-bench README](README.md) in this directory.
## Running Inference
Use the existing SWE-bench inference script, changing the dataset to `nebius/SWE-rebench-leaderboard` and selecting the split (`test` for leaderboard submission):
2. Clone the SWE-bench-fork repo (https://github.com/SWE-rebench/SWE-bench-fork) and follow its README to install dependencies.
3. Run the evaluation using the fork:
```bash
python -m swebench.harness.run_evaluation \
--dataset_name nebius/SWE-rebench-leaderboard \
--split test\
--predictions_path preds.jsonl \
--max_workers 10\
--run_id openhands
```
## Citation
```bibtex
@article{badertdinov2025swerebench,
title={SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents},
author={Badertdinov, Ibragim and Golubev, Alexander and Nekrashevich, Maksim and Shevtsov, Anton and Karasik, Simon and Andriushchenko, Andrei and Trofimova, Maria and Litvintseva, Daria and Yangel, Boris},
I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:
<issue_description>
{{ instance.problem_statement }}
</issue_description>
Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
Your task is to make the minimal changes to non-test files in the /workspace/{{ workspace_dir_name }} directory to ensure the <issue_description> is satisfied.
Follow these phases to resolve the issue:
Phase 1. READING: read the problem and reword it in clearer terms
1.1 If there are code or config snippets. Express in words any best practices or conventions in them.
1.5 Hightlight any best practices to take into account when testing and fixing the issue
Phase 2. RUNNING: install and run the tests on the repository
2.1 Follow the readme
2.2 Install the environment and anything needed
2.2 Iterate and figure out how to run the tests
Phase 3. EXPLORATION: find the files that are related to the problem and possible solutions
3.1 Use `grep` to search for relevant methods, classes, keywords and error messages.
3.2 Identify all files related to the problem statement.
3.3 Propose the methods and files to fix the issue and explain why.
3.4 From the possible file locations, select the most likely location to fix the issue.
Phase 4. TEST CREATION: before implementing any fix, create a script to reproduce and verify the issue.
4.1 Look at existing test files in the repository to understand the test format/structure.
4.2 Create a minimal reproduction script that reproduces the located issue.
4.3 Run the reproduction script to confirm you are reproducing the issue.
4.4 Adjust the reproduction script as necessary.
Phase 5. FIX ANALYSIS: state clearly the problem and how to fix it
5.1 State clearly what the problem is.
5.2 State clearly where the problem is located.
5.3 State clearly how the test reproduces the issue.
5.4 State clearly the best practices to take into account in the fix.
5.5 State clearly how to fix the problem.
Phase 6. FIX IMPLEMENTATION: Edit the source code to implement your chosen solution.
6.1 Make minimal, focused changes to fix the issue.
Phase 7. VERIFICATION: Test your implementation thoroughly.
7.1 Run your reproduction script to verify the fix works.
7.2 Add edge cases to your test script to ensure comprehensive coverage.
7.3 Run existing tests related to the modified code to ensure you haven't broken anything.
8. FINAL REVIEW: Carefully re-read the problem description and compare your changes with the base commit {{ instance.base_commit }}.
8.1 Ensure you've fully addressed all requirements.
8.2 Run any tests in the repository related to:
8.2.1 The issue you are fixing
8.2.2 The files you modified
8.2.3 The functions you changed
8.3 If any tests fail, revise your implementation until all tests pass
Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
You are provided with a Python code repository that contains an issue requiring your attention. The repository is located in a sandboxed environment, and you have access to the codebase to implement the necessary changes.
The code repository is located at: `/workspace/{{ workspace_dir_name }}`
(This path is provided for context; use file system tools to confirm paths before access).
## Goal
Your goal is to fix the issue described in the **Issue Description** section below. Implement the necessary changes to **non-test files only** within the repository, ensuring that **all relevant tests pass** after your changes.
## Key Requirements & Constraints
1. **Understand the problem** very well: it is a bug report, and you know humans don't always write good descriptions. Explore the codebase to understand the related code and the problem in depth. It is possible that the solution needs to be a bit more extensive than just the stated text. Don't exagerate though: don't do unrelated refactoring, but also don't interpret the description too strictly.
2. **Focus on the issues:** Implement the fix focusing on non-test files related to the issue.
2. **Environment Ready:** The Python environment is pre-configured with all dependencies. Do not install packages.
3. **Mandatory Testing Procedure:**
* **Create Test to Reproduce the Issue:** *Before* implementing any fix, you MUST create a *new test* (separate from existing tests) that specifically reproduces the issue.
* Take existing tests as example to understand the testing format/structure.
* Enhance this test with edge cases.
* Run this test to confirm reproduction.
* **Verify Fix:** After implementing the fix, run your test again to verify the issue is resolved.
* **Identify ALL Relevant Tests:** You MUST perform a **dedicated search and analysis** to identify **all** existing unit tests potentially affected by your changes. This includes:
* Tests in the same module/directory as the changed files (e.g., `tests/` subdirectories).
* Tests explicitly importing or using the modified code/classes/functions.
* Tests mentioned in the issue description or related documentation.
* Tests covering functionalities that *depend on* the modified code (analyze callers/dependencies if necessary).
**If you cannot confidently identify a specific subset, you MUST identify and plan to run the entire test suite for the modified application or module(s). State your identified test scope clearly.**
* **Run Identified Relevant Tests:** You MUST execute the **complete set** of relevant existing unit tests you identified in the previous step. Ensure you are running the *correct and comprehensive set* of tests. You MUST NOT modify these existing tests.
* **Final Check & Verification:** Before finishing, ensure **all** identified relevant existing tests pass. **Explicitly confirm that you have considered potential omissions in your test selection and believe the executed tests comprehensively cover the impact of your changes.** Failing to identify and run the *complete* relevant set constitutes a failure. If any identified tests fail, revise your fix. Passing all relevant tests is the primary measure of success.
4. **Defensive Programming:** Actively practice defensive programming: anticipate and handle potential edge cases, unexpected inputs, and different ways the affected code might be called **to ensure the fix works reliably and allows relevant tests to pass.** Analyze the potential impact on other parts of the codebase.
5. **Final Review:** Compare your solution against the original issue and the base commit ({{ instance.base_commit }}) to ensure completeness and test passage.
## General Workflow Guidance
* Prioritize understanding the problem, exploring the code, planning your fix, implementing it carefully using the required diff format, and **thoroughly testing** according to the **Mandatory Testing Procedure**.
* Consider trade-offs between different solutions. The goal is a **robust change that makes the relevant tests pass.** Quality, correctness, and reliability are key.
* Actively practice defensive programming: anticipate and handle potential edge cases, unexpected inputs, and different ways the affected code might be called **to ensure the fix works reliably and allows relevant tests to pass.** Analyze the potential impact on other parts of the codebase.
* IMPORTANT: Your solution will be tested by additional hidden tests, so do not assume the task is complete just because visible tests pass! Refine the solution until you are confident that it is robust and comprehensive according to the **Defensive Programming** requirement.
## Final Note
Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.