Compare commits

...

1 Commits

Author SHA1 Message Date
Engel Nyst
3ffc2ff345 refactor: replace resolver's legacy 2-step cloning with standard pattern
- Remove subprocess git clone from resolve_issue() method
- Remove shutil.copytree from process_issue() method
- Use initialize_repository_for_runtime() for direct workspace cloning
- Move base_commit capture and .openhands_instructions reading after cloning
- Move branch checkout logic to process_issue() after repo cloning
- Resolver now follows same cloning pattern as other OpenHands entry points

This eliminates the legacy clone-to-temp-then-copy workflow in favor of
the standard clone-directly-to-workspace pattern used throughout OpenHands.

Co-authored-by: openhands <openhands@all-hands.dev>
2025-06-29 20:45:45 +02:00
2 changed files with 200 additions and 60 deletions

156
REFACTOR_PLAN.md Normal file
View File

@@ -0,0 +1,156 @@
# Resolver Runtime Refactoring Plan
## Task Overview
Refactor the resolver component to reuse setup.py functions for runtime initialization, connection, and completion instead of reinventing the wheel.
## Repository Cloning Patterns Analysis
### Repository Cloning Patterns Across OpenHands Entry Points
#### 1. **Resolver (issue_resolver.py)** - DIFFERENT PATTERN (Legacy)
```python
# Step 1: Clone to separate location
subprocess.check_output(['git', 'clone', url, f'{output_dir}/repo'])
# Step 2: Later, copy repo to workspace
shutil.copytree(os.path.join(self.output_dir, 'repo'), self.workspace_base)
# Step 3: Create and connect runtime
runtime = create_runtime(config)
await runtime.connect()
# Step 4: Initialize runtime (git config, setup scripts)
self.initialize_runtime(runtime)
```
#### 2. **Main.py** - STANDARD PATTERN
```python
# Step 1: Create and connect runtime
runtime = create_runtime(config)
await runtime.connect()
# Step 2: Clone directly into runtime workspace + setup
repo_directory = initialize_repository_for_runtime(runtime, selected_repository)
```
#### 3. **Server/Session** - STANDARD PATTERN
```python
# Step 1: Create and connect runtime
# Step 2: Clone directly into runtime workspace
await runtime.clone_or_init_repo(tokens, repo, branch)
# Step 3: Run setup scripts
await runtime.maybe_run_setup_script()
await runtime.maybe_setup_git_hooks()
```
#### 4. **Setup.py's initialize_repository_for_runtime()** - STANDARD PATTERN
```python
# Calls runtime.clone_or_init_repo() + setup scripts
repo_directory = runtime.clone_or_init_repo(tokens, repo, branch)
runtime.maybe_run_setup_script()
runtime.maybe_setup_git_hooks()
```
### The Issue
The **resolver is the odd one out** - it uses a 2-step process (clone to temp location, then copy to workspace) due to **legacy reasons** (it was originally developed as a separate app built on OH, not a component of OH). All other entry points use the standard pattern (clone directly into runtime workspace).
## Current State Analysis
### ✅ What Resolver Already Does Right:
- [x] Uses `create_runtime()` from setup.py for runtime creation
### ❌ What Needs to be Fixed:
- [ ] **Resolver uses legacy 2-step cloning instead of standard runtime.clone_or_init_repo()**
- [ ] Resolver has custom `initialize_runtime()` method that duplicates setup.py logic
- [ ] Resolver has custom `complete_runtime()` method with no setup.py equivalent
- [ ] Resolver doesn't follow proper runtime cleanup patterns like main.py
- [ ] Runtime connection pattern is inconsistent across codebase
## Refactoring Steps
### Phase 1: Fix Repository Cloning Pattern (PRIORITY)
**Goal**: Make resolver use the same repository cloning pattern as all other OpenHands entry points.
- [ ] **Step 1.1**: Replace resolver's legacy 2-step cloning with standard pattern
- Remove `subprocess.check_output(['git', 'clone', ...])` from `resolve_issue()`
- Remove `shutil.copytree()` from `process_issue()`
- Use `initialize_repository_for_runtime()` instead
- This will clone directly into runtime workspace AND run setup scripts
- [ ] **Step 1.2**: Update resolver workflow to match standard pattern
- Create and connect runtime first
- Then call `initialize_repository_for_runtime()` for cloning + setup
- Remove the manual repo copying step entirely
- Ensure base_commit is still captured correctly
### Phase 2: Refactor Runtime Initialization and Completion
**Goal**: Remove code duplication between resolver and setup.py for runtime operations.
- [ ] **Step 2.1**: Create missing functions in setup.py
- Create `setup_runtime_environment()` for git config and platform-specific setup
- Create `complete_runtime_session()` for git patch generation
- Create `cleanup_runtime()` for proper resource cleanup
- [ ] **Step 2.2**: Replace resolver's `initialize_runtime()`
- Use setup.py's `setup_runtime_environment()` instead
- Remove duplicate git configuration code
- Maintain platform-specific behavior (GitLab CI)
- [ ] **Step 2.3**: Replace resolver's `complete_runtime()`
- Use setup.py's `complete_runtime_session()` instead
- Move git patch generation logic to setup.py
- Ensure return values match resolver's expectations
- [ ] **Step 2.4**: Add proper runtime cleanup to resolver
- Use setup.py's `cleanup_runtime()` function
- Ensure resources are properly released in try/finally blocks
### Phase 3: Testing and Validation
- [ ] **Step 3.1**: Test resolver functionality with refactored code
- Verify git operations work correctly
- Verify setup scripts are executed
- Verify git hooks are set up
- [ ] **Step 3.2**: Test runtime lifecycle (create → connect → clone → initialize → complete → cleanup)
- Ensure no resource leaks
- Verify proper error handling
- [ ] **Step 3.3**: Verify resolver output remains consistent
- Git patches are generated correctly
- Issue resolution works as before
- No regression in functionality
### Phase 4: Code Quality and Documentation
- [ ] **Step 4.1**: Add proper documentation to new setup.py functions
- Document parameters and return values
- Add usage examples
- Document platform-specific behavior
- [ ] **Step 4.2**: Remove obsolete code from resolver
- Delete old `initialize_runtime()` method
- Delete old `complete_runtime()` method
- Clean up imports and unused code
- [ ] **Step 4.3**: Update any other components that might benefit from these functions
- Check if other entry points could use the same patterns
- Ensure consistency across the codebase
## Success Criteria
- [ ] **Resolver uses standard repository cloning pattern (runtime.clone_or_init_repo)**
- [ ] Resolver uses setup.py functions for all runtime operations
- [ ] No code duplication between resolver and setup.py
- [ ] Proper runtime lifecycle management (connect → initialize → complete → cleanup)
- [ ] All existing resolver functionality preserved
- [ ] Consistent patterns across all OpenHands entry points
- [ ] Proper error handling and resource cleanup
## Files to Modify
1. `/openhands/core/setup.py` - Add new runtime management functions
2. `/openhands/resolver/issue_resolver.py` - Refactor to use setup.py functions
3. Any tests related to resolver functionality
## Risk Mitigation
- Maintain backward compatibility during refactoring
- Test thoroughly before removing old code
- Keep git patch generation logic identical to avoid breaking issue resolution
- Ensure platform-specific behavior (GitLab CI) is preserved

View File

@@ -5,7 +5,6 @@ import dataclasses
import json
import os
import pathlib
import shutil
import subprocess
from argparse import Namespace
from typing import Any
@@ -394,7 +393,7 @@ class IssueResolver:
async def process_issue(
self,
issue: Issue,
base_commit: str,
branch_to_checkout: str | None,
issue_handler: ServiceContextIssue | ServiceContextPR,
reset_logger: bool = False,
) -> ResolverOutput:
@@ -405,14 +404,45 @@ class IssueResolver:
else:
logger.info(f'Starting fixing issue {issue.number}.')
# write the repo to the workspace
if os.path.exists(self.workspace_base):
shutil.rmtree(self.workspace_base)
shutil.copytree(os.path.join(self.output_dir, 'repo'), self.workspace_base)
# create runtime and clone repo using standard pattern
runtime = create_runtime(self.app_config)
await runtime.connect()
# clone repo directly into runtime workspace
from openhands.core.setup import initialize_repository_for_runtime
initialize_repository_for_runtime(runtime, self.issue_handler.get_clone_url())
# checkout to PR branch if needed
if branch_to_checkout:
logger.info(f'Checking out to PR branch {branch_to_checkout}')
# Fetch the branch first to ensure it exists locally
fetch_cmd = ['git', 'fetch', 'origin', branch_to_checkout]
subprocess.check_output(fetch_cmd, cwd=runtime.workspace_root) # noqa: ASYNC101
# Checkout the branch
checkout_cmd = ['git', 'checkout', branch_to_checkout]
subprocess.check_output(checkout_cmd, cwd=runtime.workspace_root) # noqa: ASYNC101
# get the commit id of current repo for reproducibility
base_commit = (
subprocess.check_output(
['git', 'rev-parse', 'HEAD'], cwd=runtime.workspace_root
) # noqa: ASYNC101
.decode('utf-8')
.strip()
)
logger.info(f'Base commit: {base_commit}')
# Check for .openhands_instructions file in the workspace directory
if self.repo_instruction is None:
openhands_instructions_path = os.path.join(
runtime.workspace_root, '.openhands_instructions'
)
if os.path.exists(openhands_instructions_path):
with open(openhands_instructions_path, 'r') as f: # noqa: ASYNC101
self.repo_instruction = f.read()
def on_event(evt: Event) -> None:
logger.info(evt)
@@ -565,36 +595,10 @@ class IssueResolver:
)
logger.info(f'Using output directory: {self.output_dir}')
# checkout the repo
repo_dir = os.path.join(self.output_dir, 'repo')
if not os.path.exists(repo_dir):
checkout_output = subprocess.check_output( # noqa: ASYNC101
[
'git',
'clone',
self.issue_handler.get_clone_url(),
f'{self.output_dir}/repo',
]
).decode('utf-8')
if 'fatal' in checkout_output:
raise RuntimeError(f'Failed to clone repository: {checkout_output}')
# repo will be cloned later in process_issue using standard pattern
# base_commit will be captured after cloning
# get the commit id of current repo for reproducibility
base_commit = (
subprocess.check_output(['git', 'rev-parse', 'HEAD'], cwd=repo_dir) # noqa: ASYNC101
.decode('utf-8')
.strip()
)
logger.info(f'Base commit: {base_commit}')
if self.repo_instruction is None:
# Check for .openhands_instructions file in the workspace directory
openhands_instructions_path = os.path.join(
repo_dir, '.openhands_instructions'
)
if os.path.exists(openhands_instructions_path):
with open(openhands_instructions_path, 'r') as f: # noqa: ASYNC101
self.repo_instruction = f.read()
# .openhands_instructions will be read after repo is cloned in process_issue
# OUTPUT FILE
output_file = os.path.join(self.output_dir, 'output.jsonl')
@@ -618,39 +622,19 @@ class IssueResolver:
)
try:
# checkout to pr branch if needed
# determine branch to use for PR
branch_to_use = None
if self.issue_type == 'pr':
branch_to_use = issue.head_branch
logger.info(
f'Checking out to PR branch {branch_to_use} for issue {issue.number}'
f'Will checkout to PR branch {branch_to_use} for issue {issue.number}'
)
if not branch_to_use:
raise ValueError('Branch name cannot be None')
# Fetch the branch first to ensure it exists locally
fetch_cmd = ['git', 'fetch', 'origin', branch_to_use]
subprocess.check_output( # noqa: ASYNC101
fetch_cmd,
cwd=repo_dir,
)
# Checkout the branch
checkout_cmd = ['git', 'checkout', branch_to_use]
subprocess.check_output( # noqa: ASYNC101
checkout_cmd,
cwd=repo_dir,
)
base_commit = (
subprocess.check_output(['git', 'rev-parse', 'HEAD'], cwd=repo_dir) # noqa: ASYNC101
.decode('utf-8')
.strip()
)
output = await self.process_issue(
issue,
base_commit,
branch_to_use, # pass branch instead of base_commit
self.issue_handler,
reset_logger,
)