fix(workspace): add Authorization header to completion callback

The automation service's completion callback endpoint requires authentication via Bearer token. Without this header, the callback returns 401 Unauthorized. The cloud_api_key is already available on the workspace instance (set from OPENHANDS_API_KEY env var), so we just need to include it in the request.

Co-authored-by: openhands <openhands@all-hands.dev>
style: fix line length and formatting in test file
2026-04-29 03:00:45 -04:00 · 2026-03-25 03:03:13 +00:00 · 2026-03-23 22:35:01 +00:00 · 2026-03-23 13:53:32 -04:00 · 2026-03-23 17:53:07 +00:00 · 2026-03-23 17:21:53 +01:00
3727 changed files with 210438 additions and 465460 deletions
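Review note on the first commit (the Authorization header fix): a minimal sketch of the change it describes, assuming the callback is sent as a JSON POST. `build_callback_request`, the URL, and the payload shape are hypothetical; only the Bearer scheme and the API key source come from the commit message.

```python
import json
import urllib.request


def build_callback_request(
    url: str, api_key: str, payload: dict
) -> urllib.request.Request:
    """Build a POST request that carries the key as a Bearer token.

    Hypothetical helper: the real workspace code may build the request differently.
    """
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Without this header the endpoint is described as returning 401.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request object is only constructed here, not sent, so the header can be checked without network access.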
@@ -0,0 +1,159 @@
+---
+name: custom-codereview-guide
+description: Repo-specific code review guidelines for OpenHands/software-agent-sdk. Provides SDK-specific review rules in addition to the default code review skill.
+triggers:
+- /codereview
+---
+
+# OpenHands/software-agent-sdk Code Review Guidelines
+
+You are an expert code reviewer for the **OpenHands/software-agent-sdk** repository. This skill provides repo-specific review guidelines. Be direct but constructive.
+
+## Review Decisions
+
+You have permission to **APPROVE** or **COMMENT** on PRs. Do not use REQUEST_CHANGES.
+
+### Review decision policy (eval / benchmark risk)
+
+Do **NOT** submit an **APPROVE** review when the PR changes agent behavior or anything
+that could plausibly affect benchmark/evaluation performance.
+
+Examples include: prompt templates, tool calling/execution, planning/loop logic,
+memory/condenser behavior, terminal/stdin/stdout handling, or evaluation harness code.
+
+If a PR is in this category (or you are uncertain), leave a **COMMENT** review and
+explicitly flag it for a human maintainer to decide after running lightweight evals.
+
+### Default approval policy
+
+**Default to APPROVE**: If your review finds no issues at "important" level or higher,
+approve the PR. Minor suggestions or nitpicks alone are not sufficient reason to
+withhold approval.
+
+**IMPORTANT:** If you determine a PR is worth merging **and it is not in the eval-risk
+category above**, you should approve it. Don’t just say a PR is "worth merging" or
+"ready to merge" without actually submitting an approval. Your words and actions should
+be consistent.
+
+### When to APPROVE
+
+Examples of straightforward and low-risk PRs you should approve (non-exhaustive):
+
+- **Configuration changes**: Adding models to config files, updating CI/workflow settings
+- **CI/Infrastructure changes**: Changing runner types, fixing workflow paths, updating job configurations
+- **Cosmetic changes**: Typo fixes, formatting, comment improvements, README updates
+- **Documentation-only changes**: Docstring updates, clarifying notes, API documentation improvements
+- **Simple additions**: Adding entries to lists/dictionaries following existing patterns
+- **Test-only changes**: Adding or updating tests without changing production code
+- **Dependency updates**: Version bumps with passing CI
+
+### When NOT to APPROVE - Blocking Issues
+
+**DO NOT APPROVE** PRs that have any of the following issues:
+
+- **Package version bumps in non-release PRs**: If any `pyproject.toml` file has changes to the `version` field (e.g., `version = "1.12.0"` → `version = "1.13.0"`), and the PR is NOT explicitly a release PR (title/description doesn't indicate it's a release), **DO NOT APPROVE**. Version numbers should only be changed in dedicated release PRs managed by maintainers.
+  - Check: Look for changes to `version = "..."` in any `*/pyproject.toml` files
+  - Exception: PRs with titles like "release: v1.x.x" or "chore: bump version to 1.x.x" from maintainers
+
+Examples of PRs that are fine to approve (see "When to APPROVE" above):
+- A PR adding a new model to `resolve_model_config.py` or `verified_models.py` with corresponding test updates
+- A PR adding documentation notes to docstrings clarifying method behavior (e.g., security considerations, bypass behaviors)
+- A PR changing CI runners or fixing workflow infrastructure issues (e.g., standardizing runner types to fix path inconsistencies)
+
+### When to COMMENT
+
+Use COMMENT when you have feedback or concerns:
+
+- Issues that need attention (bugs, security concerns, missing tests)
+- Suggestions for improvement
+- Questions about design decisions
+- Minor style preferences
+
+If there are significant issues, leave detailed comments explaining the concerns—but let a human maintainer decide whether to block the PR.
+
+## Core Principles
+
+1. **Simplicity First**: Question complexity. If something feels overcomplicated, ask "what's the use case?" and seek simpler alternatives. Features should solve real problems, not imaginary ones.
+
+2. **Pragmatic Testing**: Test what matters. Avoid duplicate test coverage. Don't test library features (e.g., `BaseModel.model_dump()`). Focus on the specific logic implemented in this codebase.
+
+3. **Type Safety**: Avoid `# type: ignore` - treat it as a last resort. Fix types properly with assertions, proper annotations, or code adjustments. Prefer explicit type checking over `getattr`/`hasattr` guards.
+
+4. **Backward Compatibility**: Evaluate breaking change impact carefully. Consider API changes that affect existing users, removal of public fields/methods, and changes to default behavior.
+
+## What to Check
+
+- **Complexity**: Over-engineered solutions, unnecessary abstractions, complex logic that could be refactored
+- **Testing**: Duplicate test coverage, tests for library features, missing edge case coverage
+- **Type Safety**: `# type: ignore` usage, missing type annotations, `getattr`/`hasattr` guards, mocking non-existent arguments
+- **Breaking Changes**: API changes affecting users, removed public fields/methods, changed defaults
+- **Code Quality**: Code duplication, missing comments for non-obvious decisions, inline imports (unless necessary for circular deps)
+- **Repository Conventions**: Use `pyright` not `mypy`, put fixtures in `conftest.py`, avoid `sys.path.insert` hacks
+- **Event Type Deprecation**: Changes to event types (Pydantic models used in serialization) must handle deprecated fields properly
+
+## Event Type Deprecation - Critical Review Checkpoint
+
+When reviewing PRs that modify event types (e.g., `TextContent`, `Message`, `Event`, or any Pydantic model used in event serialization), **DO NOT APPROVE** until the following are verified:
+
+### Required for Removing/Deprecating Fields
+
+1. **Model validator present**: If a field is being removed from an event type with `extra="forbid"`, there MUST be a `@model_validator(mode="before")` that uses `handle_deprecated_model_fields()` to remove the deprecated field before validation. Otherwise, old events will fail to load.
+
+2. **Tests for backward compatibility**: The PR MUST include tests that:
+   - Load an old event format (with the deprecated field) successfully
+   - Load a new event format (without the deprecated field) successfully
+   - Verify both can be loaded in sequence (simulating mixed conversations)
+
+3. **Test naming convention**: The version in the test name should be the **LAST version** where a particular event structure exists. For example, if `enable_truncation` was removed in v1.11.1, the test should be named `test_v1_10_0_...` (the last version with that field), not `test_v1_8_0_...` (when it was introduced). This avoids duplicate tests and clearly documents when a field was last present.
+
+**Important**: Deprecated field handlers are **permanent** and should never be removed. They ensure old conversations can always be loaded.
+
+### Example Pattern (Required)
+
+```python
+from openhands.sdk.utils.deprecation import handle_deprecated_model_fields
+
+class MyModel(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+
+    # Deprecated fields that are silently removed for backward compatibility
+    # when loading old events. These are kept permanently.
+    _DEPRECATED_FIELDS: ClassVar[tuple[str, ...]] = ("old_field_name",)
+
+    @model_validator(mode="before")
+    @classmethod
+    def _handle_deprecated_fields(cls, data: Any) -> Any:
+        """Remove deprecated fields for backward compatibility with old events."""
+        return handle_deprecated_model_fields(data, cls._DEPRECATED_FIELDS)
+```
+
+### Why This Matters
+
+Production systems resume conversations that may contain events serialized with older SDK versions. If the SDK can't load old events, users will see errors like:
+
+```
+pydantic_core.ValidationError: Extra inputs are not permitted
+```
+
+**This is a production-breaking change.** Do not approve PRs that modify event types without proper backward compatibility handling and tests.
+
+## What NOT to Comment On
+
+Do not leave comments for:
+
+- **Nitpicks**: Minor style preferences, optional improvements, or "nice-to-haves" that don't affect correctness or maintainability
+- **Good behavior observed**: Don't comment just to praise code that follows best practices - this adds noise. Simply approve if the code is good.
+- **Suggestions for additional tests on simple changes**: For straightforward PRs (config changes, model additions, etc.), don't suggest adding test coverage unless tests are clearly missing for new logic
+- **Obvious or self-explanatory code**: Don't ask for comments on code that is already clear
+- **`.pr/` directory artifacts**: Files in the `.pr/` directory are temporary PR-specific documents (design notes, analysis, scripts) that are automatically cleaned up when the PR is approved. Do not comment on their presence or suggest removing them.
+
+If a PR is approvable, just approve it. Don't add "one small suggestion" or "consider doing X" comments that delay merging without adding real value.
+
+## Communication Style
+
+- Be direct and concise - don't over-explain
+- Use casual, friendly tone ("lgtm", "WDYT?", emojis are fine 👀)
+- Ask questions to understand use cases before suggesting changes
+- Suggest alternatives, not mandates
+- Approve quickly when code is good ("LGTM!")
+- Use GitHub suggestion syntax for code fixes
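Review note on the "Event Type Deprecation" section of the guide above: a standard-library sketch of the behavior the required pattern is meant to guarantee. `drop_deprecated_fields` and `load_event` are hypothetical stand-ins for `handle_deprecated_model_fields` and a Pydantic model with `extra="forbid"`; the real helper may behave differently. The test name follows the "last version with the field" rule from the guide.

```python
from typing import Any

ALLOWED_FIELDS = {"text"}
DEPRECATED_FIELDS = ("enable_truncation",)


def drop_deprecated_fields(data: Any, deprecated: tuple[str, ...]) -> Any:
    """Return a copy of `data` without deprecated keys; pass non-dicts through."""
    if not isinstance(data, dict):
        return data
    return {key: value for key, value in data.items() if key not in deprecated}


def load_event(raw: dict[str, Any]) -> dict[str, Any]:
    """Mimic extra="forbid": drop deprecated keys first, then reject unknown ones."""
    cleaned = drop_deprecated_fields(raw, DEPRECATED_FIELDS)
    unknown = set(cleaned) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"Extra inputs are not permitted: {sorted(unknown)}")
    return cleaned


def test_v1_10_0_event_with_enable_truncation_still_loads() -> None:
    """Old and new event formats load in sequence, as in a mixed conversation."""
    old_event = {"text": "hi", "enable_truncation": True}
    new_event = {"text": "hi"}
    loaded = [load_event(event) for event in (old_event, new_event)]
    assert loaded == [new_event, new_event]
```

If the deprecated-field step were skipped, the old event would hit the unknown-key branch, which corresponds to the `Extra inputs are not permitted` failure the guide warns about.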
@@ -0,0 +1,88 @@
+---
+name: debug-test-examples-workflow
+description: Guide for debugging failing example tests in the `test-examples` labeled workflow. Use this skill when investigating CI failures in the run-examples.yml workflow, when example scripts fail to run correctly, when needing to isolate specific test failures, or when analyzing workflow logs and failure patterns.
+---
+
+# Debugging test-examples Workflow
+
+## Overview
+
+The `run-examples.yml` workflow runs example scripts from the `examples/` directory. Triggers:
+- Adding `test-examples` label to a PR
+- Manual workflow dispatch
+- Scheduled nightly runs
+
+## Debugging Steps
+
+### 1. Isolate Failing Tests
+
+Modify `tests/examples/test_examples.py` to focus on specific tests:
+
+```python
+_TARGET_DIRECTORIES = (
+    # EXAMPLES_ROOT / "01_standalone_sdk",
+    EXAMPLES_ROOT / "02_remote_agent_server",  # Keep only failing directory
+)
+```
+
+### 2. Exclude Tests
+
+Add to `_EXCLUDED_EXAMPLES` with explanation:
+
+```python
+_EXCLUDED_EXAMPLES = {
+    # Reason for exclusion
+    "examples/path/to/failing_test.py",
+}
+```
+
+### 3. Trigger Workflow
+
+Toggle the `test-examples` label:
+
+```bash
+# Remove label
+curl -X DELETE -H "Authorization: token $GITHUB_TOKEN" \
+  "https://api.github.com/repos/OpenHands/software-agent-sdk/issues/${PR_NUMBER}/labels/test-examples"
+
+# Add label
+curl -X POST -H "Authorization: token $GITHUB_TOKEN" \
+  -H "Accept: application/vnd.github.v3+json" \
+  "https://api.github.com/repos/OpenHands/software-agent-sdk/issues/${PR_NUMBER}/labels" \
+  -d '{"labels":["test-examples"]}'
+```
+
+### 4. Monitor Progress
+
+```bash
+# Check status
+curl -s -H "Authorization: token $GITHUB_TOKEN" \
+  "https://api.github.com/repos/OpenHands/software-agent-sdk/actions/runs/${RUN_ID}" | jq '{status, conclusion}'
+
+# Download logs
+curl -sL -H "Authorization: token $GITHUB_TOKEN" \
+  "https://api.github.com/repos/OpenHands/software-agent-sdk/actions/runs/${RUN_ID}/logs" -o logs.zip
+unzip logs.zip -d logs
+```
+
+## Common Failure Patterns
+
+| Pattern | Cause | Solution |
+|---------|-------|----------|
+| Port conflicts | Fixed ports (8010, 8011) | Run with `-n 1` or use different ports |
+| Container issues | Docker/Apptainer setup | Check Docker availability, image pulls |
+| LLM failures | Transient API errors | Retry the test |
+| Example bugs | Code errors | Check traceback |
+
+
+## Key Configuration
+
+**Workflow** (`.github/workflows/run-examples.yml`):
+- Runner: `blacksmith-2vcpu-ubuntu-2404`
+- Timeout: 60 minutes
+- Parallelism: `-n 4` (pytest-xdist: 4 parallel workers)
+
+**Tests** (`tests/examples/test_examples.py`):
+- Timeout per example: 600 seconds
+- Target directories: `_TARGET_DIRECTORIES`
+- Excluded examples: `_EXCLUDED_EXAMPLES`
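Review note on steps 1 and 2 of the debugging skill above: a rough sketch of how narrowing `_TARGET_DIRECTORIES` and adding to `_EXCLUDED_EXAMPLES` could shrink the collected set. `discover_examples` is a hypothetical function written for illustration; the actual collection logic in `tests/examples/test_examples.py` may differ.

```python
from pathlib import Path


def discover_examples(
    repo_root: Path, target_dirs: tuple[Path, ...], excluded: set[str]
) -> list[Path]:
    """Collect example scripts under the target directories, minus exclusions.

    Exclusions are matched as POSIX-style paths relative to the repo root,
    e.g. "examples/path/to/failing_test.py".
    """
    found: list[Path] = []
    for directory in target_dirs:
        for path in sorted(directory.rglob("*.py")):
            relative = path.relative_to(repo_root).as_posix()
            if relative not in excluded:
                found.append(path)
    return found
```

Keeping only the failing directory in `target_dirs` shortens each CI round trip, and an exclusion entry removes a single script without touching the rest of its directory.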
@@ -0,0 +1,43 @@
+---
+name: design-principles
+description: Core architectural design principles of the OpenHands Software Agent SDK. Reference when making architectural decisions, reviewing PRs that change agent/tool/state boundaries, or evaluating whether a proposed change aligns with V1 design goals.
+---
+
+# SDK Design Principles
+
+Reference: <https://docs.openhands.dev/sdk/arch/design>
+
+## Quick Summary
+
+1. **Optional Isolation over Mandatory Sandboxing**
+   Sandboxing is opt-in, not universal. Agent and tool execution runs in a single
+   process by default. When isolation is needed, the same stack can be transparently
+   containerized.
+
+2. **Stateless by Default, One Source of Truth for State**
+   All components — agents, tools, LLMs, configurations — are **immutable Pydantic
+   models** validated at construction. The only mutable entity is the conversation
+   state. This enables deterministic replay and robust persistence.
+
+3. **Clear Boundaries between Agent and Applications**
+   Strict separation between SDK (agent core), tools, workspace, and agent server.
+   Applications communicate via APIs, not by embedding the agent.
+
+4. **Composable Components for Extensibility**
+   Agents are graphs of interchangeable components — tools, prompts, LLMs, contexts —
+   described **declaratively with strong typing**. Developers reconfigure capabilities
+   without modifying core code.
+
+## Implications for Development
+
+- Since agents are immutable Pydantic models, their configuration **is** their
+  serializable representation. There should be no need to "reverse-engineer" agent
+  config from runtime instances.
+- Tool implementations (callables) are the only non-serializable part; this is solved
+  by `tool_module_qualnames` for remote forwarding.
+- Everything else (system_prompt, model, skills, tool names) is already declarative
+  data that can be serialized and forwarded directly.
+- Avoid patterns that create multiple sources of truth for the same configuration
+  (e.g., a factory function AND an extracted definition).
+- `model_copy(update=...)` should be used sparingly and through well-defined paths to
+  avoid undermining statelessness.
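Review note on principle 2 and the "Implications for Development" list above: a small illustration of "configuration is the serializable representation", using a frozen dataclass as a standard-library stand-in for an immutable Pydantic model. `AgentConfig` and its fields are hypothetical, and `dataclasses.replace` plays the role that `model_copy(update=...)` has in the SDK.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass, replace


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical stand-in for an immutable agent definition."""

    model: str
    system_prompt: str
    tool_names: tuple[str, ...] = ()


config = AgentConfig(model="example-model", system_prompt="Be concise.")

# The configuration is its own serializable form: no reverse-engineering needed.
serialized = asdict(config)
restored = AgentConfig(**serialized)

# In-place mutation is rejected, so there is a single source of truth.
try:
    config.model = "other-model"
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# A change goes through an explicit copy; the original stays untouched.
updated = replace(config, system_prompt="Be thorough.")
```

Because every change produces a new object, a copy made this way should be treated as a new configuration rather than as an edit to the old one.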
@@ -0,0 +1,244 @@
+---
+name: feature-release-rollout
+description: This skill should be used when the user asks to "rollout a feature", "complete feature release", "propagate SDK feature", "track feature support", "what's missing for feature X", or mentions checking CLI/GUI/docs/blog support for SDK features. Guides agents through the multi-repository feature release workflow from SDK to docs to marketing.
+triggers:
+- rollout feature
+- feature release
+- propagate feature
+- feature support
+- complete release
+- docs for feature
+- blog for feature
+- CLI support
+- GUI support
+- what's missing
+---
+
+# Feature Release Rollout
+
+This skill guides the complete feature release workflow across the OpenHands ecosystem repositories.
+
+## Overview
+
+When a feature is implemented in the SDK, it may need propagation through several repositories:
+
+1. **SDK** (`OpenHands/software-agent-sdk`) — Core feature implementation
+2. **CLI** (`OpenHands/OpenHands-CLI`) — Terminal interface support
+3. **GUI** (`OpenHands/OpenHands` frontend directory) — Web interface support
+4. **Docs** (`OpenHands/docs`) — Documentation updates (sdk/ folder)
+5. **Blog** (`OpenHands/growth-utils` blog-post/) — Marketing and announcements
+6. **Video** — Tutorial content (using ElevenLabs + Remotion)
+
+## Workflow
+
+### Phase 1: Feature Discovery
+
+First, identify what feature(s) to analyze. The user may specify:
+- A release tag (e.g., `v1.9.0`)
+- A specific feature name
+- A PR or commit reference
+- A comparison between versions
+
+**For release tags:**
+```bash
+# Clone SDK if not present
+git clone https://github.com/OpenHands/software-agent-sdk.git
+
+# View release notes
+cd software-agent-sdk
+git log --oneline v1.8.0..v1.9.0  # Changes between versions
+git show v1.9.0 --stat             # What changed in this release
+```
+
+**For specific features:**
+Search the SDK codebase, examples, and changelog to understand the feature scope.
+
+### Phase 2: Repository Analysis
+
+Clone all relevant repositories to analyze current support:
+
+```bash
+# Clone repositories (use GITHUB_TOKEN for authenticated access)
+git clone https://github.com/OpenHands/software-agent-sdk.git
+git clone https://github.com/OpenHands/OpenHands-CLI.git
+git clone https://github.com/OpenHands/OpenHands.git        # Frontend in frontend/
+git clone https://github.com/OpenHands/docs.git
+git clone https://github.com/OpenHands/growth-utils.git
+```
+
+For each feature, check support status:
+
+| Repository | Check Location | What to Look For |
+|------------|---------------|------------------|
+| CLI | `openhands_cli/` | Feature flags, commands, TUI widgets |
+| GUI | `OpenHands/frontend/src/` | React components, API integrations |
+| Docs | `docs/sdk/` | Guide pages, API reference, examples |
+| Blog | `growth-utils/blog-post/posts/` | Announcement posts |
+
+### Phase 3: Assess Feature Importance
+
+Not all features warrant full rollout. Evaluate each feature:
+
+**High Impact (full rollout recommended):**
+- New user-facing capabilities
+- Breaking changes or migrations
+- Major performance improvements
+- New integrations or tools
+
+**Medium Impact (docs + selective support):**
+- New API methods or parameters
+- Configuration options
+- Developer experience improvements
+
+**Low Impact (docs only or skip):**
+- Internal refactoring
+- Bug fixes
+- Minor enhancements
+
+**Skip rollout for:**
+- Internal-only changes
+- Test improvements
+- Build/CI changes
+- Documentation typos
+
+### Phase 4: Create Proposal
+
+Generate a structured proposal for the user:
+
+```markdown
+## Feature Rollout Proposal: [Feature Name]
+
+### Feature Summary
+[Brief description of the feature and its value]
+
+### Current Support Status
+| Component | Status | Notes |
+|-----------|--------|-------|
+| SDK | ✅ Implemented | [version/PR] |
+| CLI | ❌ Missing | [what's needed] |
+| GUI | ⚠️ Partial | [what's implemented vs needed] |
+| Docs | ❌ Missing | [suggested pages] |
+| Blog | ❌ Not started | [whether warranted] |
+| Video | ❌ Not started | [whether warranted] |
+
+### Recommended Actions
+1. **CLI**: [specific implementation needed]
+2. **GUI**: [specific implementation needed]
+3. **Docs**: [pages to create/update]
+4. **Blog**: [recommended or not, with reasoning]
+5. **Video**: [recommended or not, with reasoning]
+
+### Assessment
+- **Overall Priority**: [High/Medium/Low]
+- **Effort Estimate**: [days/hours per component]
+- **Dependencies**: [what must be done first]
+```
+
+### Phase 5: User Confirmation
+
+Wait for explicit user approval before proceeding. Ask:
+- Which components to implement
+- Priority ordering
+- Any modifications to the proposal
+
+### Phase 6: Implementation
+
+Only after user confirmation:
+
+**Create GitHub Issues:**
+```bash
+# Create issue on relevant repo
+gh issue create --repo OpenHands/OpenHands-CLI \
+  --title "Support [feature] in CLI" \
+  --body $'## Context\n[Feature description]\n\n## Implementation\n[Details]\n\n## Related\n- SDK: [link]\n- Docs: [link]'
+```
+
+**Implementation order:**
+1. CLI/GUI support (can be parallel)
+2. Documentation (depends on 1)
+3. Blog post (depends on 2)
+4. Video (depends on 3)
+
+## Repository-Specific Guidelines
+
+### CLI (OpenHands/OpenHands-CLI)
+
+- Check `AGENTS.md` for development guidelines
+- Use `uv` for dependency management
+- Run `make lint` and `make test` before commits
+- TUI components in `openhands_cli/tui/`
+- Snapshot tests for UI changes
+
+### GUI (OpenHands/OpenHands frontend)
+
+- Frontend in `frontend/` directory
+- React/TypeScript codebase
+- Run `npm run lint:fix && npm run build` in frontend/
+- Follow TanStack Query patterns for data fetching
+- i18n translations in `frontend/src/i18n/`
+
+### Docs (OpenHands/docs)
+
+- SDK docs in `sdk/` folder
+- Uses Mintlify (`.mdx` files)
+- Code blocks can auto-sync from SDK examples
+- Run `mint broken-links` to validate
+- Follow `openhands/DOC_STYLE_GUIDE.md`
+
+### Blog (OpenHands/growth-utils)
+
+- Posts in `blog-post/posts/YYYYMMDD-title.md`
+- Assets in `blog-post/assets/YYYYMMDD-title/`
+- Frontmatter format:
+  ```yaml
+  ---
+  title: "Post Title"
+  excerpt: "Brief description"
+  coverImage: "/assets/blog/YYYYMMDD-title/cover.png"
+  date: "YYYY-MM-DDTHH:MM:SS.000Z"
+  authors:
+    - name: Author Name
+      picture: "/assets/blog/authors/author.png"
+  ogImage:
+    url: "/assets/blog/YYYYMMDD-title/cover.png"
+  ---
+  ```
+
+## Example Feature Analysis
+
+**Feature: Browser Session Recording (SDK v1.8.0)**
+
+1. **SDK**: ✅ Implemented in `openhands.tools.browser`
+2. **CLI**: ❌ No replay/export commands
+3. **GUI**: ❌ No recording viewer component
+4. **Docs**: ✅ Guide at `sdk/guides/browser-session-recording.mdx`
+5. **Blog**: ❌ Could highlight for web scraping users
+6. **Video**: Consider 2-minute demo
+
+**Recommendation**: Medium priority. Docs done, CLI/GUI low urgency (advanced feature), blog post optional.
+
+## Quick Commands
+
+```bash
+# Check SDK feature presence
+grep -r "feature_name" software-agent-sdk/openhands/ --include="*.py"
+
+# Check CLI support
+grep -r "feature_name" OpenHands-CLI/openhands_cli/ --include="*.py"
+
+# Check GUI support
+grep -r "featureName" OpenHands/frontend/src/ --include="*.ts" --include="*.tsx"
+
+# Check docs coverage
+grep -r "feature" docs/sdk/ --include="*.mdx"
+
+# Check blog mentions
+grep -r "feature" growth-utils/blog-post/posts/ --include="*.md"
+```
+
+## Important Notes
+
+- Always get user confirmation before creating issues or starting implementation
+- Consider feature maturity — new features may change before full rollout
+- Cross-reference PRs between repositories in issue descriptions
+- For breaking changes, coordinate release timing across all components
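Review note on the "Implementation order" list in Phase 6 above: the stated dependencies can be sketched as a small graph and ordered with the standard library. `rollout_order` and the lowercase component names are hypothetical; the dependency edges are taken from the list itself.

```python
from graphlib import TopologicalSorter

# Each key depends on the components in its value set (from "Implementation order").
DEPENDENCIES = {
    "docs": {"cli", "gui"},
    "blog": {"docs"},
    "video": {"blog"},
}


def rollout_order(selected: set[str]) -> list[str]:
    """Order the user-approved components so that dependencies come first.

    Components the user did not approve are dropped from the plan, along
    with any dependency edges that point at them.
    """
    graph = {name: DEPENDENCIES.get(name, set()) & selected for name in selected}
    return list(TopologicalSorter(graph).static_order())
```

CLI and GUI have no edge between them, which matches the note that they can proceed in parallel; their relative position in the result is not significant.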
@@ -0,0 +1,66 @@
+---
+name: run-eval
+description: Trigger and monitor evaluation runs for benchmarks like SWE-bench, GAIA, and others. Use when running evaluations via GitHub Actions or monitoring eval progress through Datadog and kubectl.
+triggers:
+- run eval
+- trigger eval
+- evaluation run
+- swebench eval
+---
+
+# Running Evaluations
+
+## Trigger via GitHub API
+
+```bash
+curl -X POST \
+  -H "Authorization: token $GITHUB_TOKEN" \
+  -H "Accept: application/vnd.github+json" \
+  "https://api.github.com/repos/OpenHands/software-agent-sdk/actions/workflows/run-eval.yml/dispatches" \
+  -d '{
+    "ref": "main",
+    "inputs": {
+      "benchmark": "swebench",
+      "sdk_ref": "main",
+      "eval_limit": "50",
+      "model_ids": "claude-sonnet-4-5-20250929",
+      "reason": "Description of eval run",
+      "benchmarks_branch": "main"
+    }
+  }'
+```
+
+**Key parameters:**
+- `benchmark`: `swebench`, `swebenchmultimodal`, `gaia`, `swtbench`, `commit0`, `multiswebench`, `terminalbench`
+- `eval_limit`: Any positive integer (e.g., `1`, `10`, `50`, `200`)
+- `model_ids`: See `.github/run-eval/resolve_model_config.py` for available models
+- `benchmarks_branch`: Use feature branch from the benchmarks repo to test benchmark changes before merging
+
+**Note:** When running a full eval, you must select an `eval_limit` that is greater than or equal to the actual number of instances in the benchmark. If you specify a smaller limit, only that many instances will be evaluated (partial eval).
+
+## Monitoring
+
+**Datadog script** (requires the `OpenHands/evaluation` repo and the `DD_API_KEY`, `DD_APP_KEY`, and `DD_SITE` environment variables to be set):
+```bash
+DD_API_KEY=$DD_API_KEY DD_APP_KEY=$DD_APP_KEY DD_SITE=$DD_SITE \
+  python scripts/analyze_evals.py --job-prefix <EVAL_RUN_ID> --time-range 60
+# EVAL_RUN_ID format: typically the workflow run ID from GitHub Actions
+```
+
+**kubectl** (for users with cluster access - the agent does not have kubectl access):
+```bash
+kubectl logs -f job/eval-eval-<RUN_ID>-<MODEL_SLUG> -n evaluation-jobs
+```
+
+## Common Errors
+
+| Error | Cause | Fix |
+|-------|-------|-----|
+| `503 Service Unavailable` | Infrastructure overloaded | Ask user to stop some evaluation runs |
+| `429 Too Many Requests` | Rate limiting | Wait or reduce concurrency |
+| `failed after 3 retries` | Instance failures | Check Datadog logs for root cause |
+
+## Limits
+
+- Max 256 parallel runtimes (jobs will queue if this limit is exceeded)
+- Full evals typically take 1-3 hours depending on benchmark size
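Review note on the "Trigger via GitHub API" section above: a sketch of building the same dispatch body in Python with basic validation of the key parameters. `build_dispatch_payload` is a hypothetical helper; the field names and benchmark list are copied from the skill text, and nothing is sent over the network.

```python
import json

# Benchmark names copied from the "Key parameters" list.
KNOWN_BENCHMARKS = {
    "swebench",
    "swebenchmultimodal",
    "gaia",
    "swtbench",
    "commit0",
    "multiswebench",
    "terminalbench",
}


def build_dispatch_payload(
    benchmark: str,
    model_ids: str,
    eval_limit: int,
    reason: str,
    sdk_ref: str = "main",
    benchmarks_branch: str = "main",
) -> dict:
    """Build the JSON body for the workflow dispatch request."""
    if benchmark not in KNOWN_BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark!r}")
    if eval_limit < 1:
        raise ValueError("eval_limit must be a positive integer")
    return {
        "ref": "main",
        "inputs": {
            "benchmark": benchmark,
            "sdk_ref": sdk_ref,
            "eval_limit": str(eval_limit),  # the curl example passes this as a string
            "model_ids": model_ids,
            "reason": reason,
            "benchmarks_branch": benchmarks_branch,
        },
    }


payload = build_dispatch_payload(
    "swebench", "claude-sonnet-4-5-20250929", 50, "Description of eval run"
)
print(json.dumps(payload, indent=2))
```

Validating locally catches a mistyped benchmark name before a workflow run is queued.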
@@ -0,0 +1,117 @@
+---
+name: write-behavior-test
+description: Guide for writing behavior tests that verify agents follow system message guidelines and avoid undesirable behaviors. Use when creating integration tests for agent behavior validation.
+triggers:
+- /write_behavior_test
+---
+
+# Behavior Test Writing Guide
+
+You are helping to create **behavior tests** for the agent-sdk integration test suite. These tests verify that agents follow system message guidelines and avoid undesirable behaviors.
+
+The tests are for the agent powered by this SDK, so you may need to refer to the codebase for details on how the agent works in order to write effective tests.
+
+## Behavior Tests vs Task Tests
+
+**Task Tests (t*.py)** - REQUIRED tests that verify task completion:
+- Focus: Can the agent successfully complete the task?
+- Example: Fix typos in a file, create a script, implement a feature
+
+**Behavior Tests (b*.py)** - OPTIONAL tests that verify proper behavior:
+- Focus: Does the agent follow best practices and system guidelines?
+- Example: Don't implement when asked for advice, don't over-verify, avoid redundant files
+
+## Key Principles for Writing Behavior Tests
+
+### ✅ DO:
+
+1. **Use Real Repositories**
+   - Clone actual GitHub repositories that represent real-world scenarios
+   - Pin to a specific historical commit (before a fix/feature was added)
+   - Example: `clone_pinned_software_agent_repo(workspace)` helper
+
+2. **Test Realistic Complex, Nuanced Behaviors**
+   - Make the task as close as possible to real HUMAN interactions, including file naming and a (somewhat lazy) instruction style
+   - Focus on subtle behavioral issues that require judgment
+   - Test scenarios where the "right" behavior isn't immediately obvious
+   - Examples: When to implement vs advise, when to stop testing, whether to add backward compatibility
+
+3. **Clean Up Repository History**
+   - Check out a commit BEFORE the solution exists
+   - Reset/remove future commits (see existing tests for examples)
+   - Ensures the agent experiences the same context as real users
+
+4. **Use Helper Functions**
+   - `find_file_editing_operations(events)` - Find file create/edit operations
+   - `find_tool_calls(events, tool_name)` - Find specific tool usage
+   - `get_conversation_summary(events)` - Get summary for LLM judge
+   - `judge_agent_behavior(...)` - Use LLM to evaluate behavior quality
+
+5. **Leverage LLM Judges**
+   - Use `judge_agent_behavior()` for subjective evaluations
+   - Provide clear evaluation criteria in the judge prompt
+   - Track judge usage costs: `self.add_judge_usage(prompt_tokens, completion_tokens, cost)`
+
+6. **Adapt the Problem Description to the Task**
+   - If the problem description is hard to adapt to a behavior test (e.g., it requires complex environment setup such as Kubernetes), come up with a simpler description that still captures the essence of the behavior you want to test but is easier to implement in the test framework.
+   - Ensure the instructions naturally lead to the behavior you want to evaluate
+
+### ❌ DO NOT:
+
+1. **Avoid Simple Synthetic Tests**
+   - Don't create artificial scenarios with minimal setup
+   - Don't test behaviors that are too obvious or straightforward
+   - Example: Don't create a single-file test with trivial content
+
+2. **Don't Test Basic Functionality**
+   - Behavior tests are NOT for testing if the agent can use tools
+   - Task tests handle basic capability verification
+   - Focus on HOW the agent approaches problems, not IF it can solve them
+
+3. **Don't Overcomplicate Static Assertions**
+   - Use assertions for clear-cut checks (e.g., no file edits)
+   - Rely on LLM judges for nuanced behavior evaluations
+   - Avoid trying to encode subjective judgments purely in code or too much static logic
+
+## Tips for Test Difficulty Calibration
+
+**Make tests challenging, but not impossible or overly long:**
+
+1. **Context Complexity**: Use real codebases with multiple files and dependencies, either the software-agent-sdk or other popular open-source repos you find suitable
+2. **Ambiguity**: Prefer instructions that could be interpreted multiple ways
+3. **Temptation**: Set up scenarios where the "easy wrong path" is tempting
+4. **Realism**: Mirror real user interactions and expectations
+
+**Examples of Good Complexity:**
+- "How to implement X?" (tests if agent implements vs advises)
+- "Update constant Y" (tests if agent over-verifies with excessive test runs)
+- "Rename method A to B" (tests if agent adds unnecessary backward compatibility)
+
+## Example Behavior Test Patterns
+
+1. **Premature Implementation** - Tests if agent implements when asked for advice only
+2. **Over-verification** - Tests if agent runs excessive tests beyond what's needed
+3. **Unnecessary Compatibility** - Tests if agent adds backward compatibility shims when not needed
+4. **Redundant Artifacts** - Tests if agent creates extra files (docs, READMEs) without being asked
+5. **Communication Quality** - Tests if agent provides explanations for actions
+
+## File Naming Convention
+
+Name your test file: `b##_descriptive_name.py`
+- `b` prefix indicates behavior test (auto-detected)
+- `##` is a zero-padded number (e.g., 01, 02, 03)
+- Use snake_case for the descriptive name
+
+## Final Checklist
+
+Before submitting your behavior test, verify:
+
+- [ ] Uses a real repository or complex codebase
+- [ ] Tests a nuanced behavior, not basic functionality
+- [ ] Includes clear and not overly complex verification logic (assertions or LLM judge)
+- [ ] Has a descriptive docstring explaining what behavior is tested
+- [ ] Properly tracks judge usage costs if using LLM evaluation
+- [ ] Follows naming convention: `b##_descriptive_name.py`
+- [ ] Test is realistic and based on actual behavioral issues observed
+
+Remember: The goal is to catch subtle behavioral issues that would appear in real-world usage, serving as regression tests for system message improvements.
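Review note on the "File Naming Convention" section above: the convention is easy to check mechanically. This sketch assumes `##` means exactly two digits, based on the zero-padded examples (01, 02, 03); `is_behavior_test_name` is a hypothetical helper, not part of the test suite.

```python
import re

# Assumption: "##" is exactly two digits; the descriptive part is snake_case.
BEHAVIOR_TEST_NAME = re.compile(r"b\d{2}_[a-z0-9]+(?:_[a-z0-9]+)*\.py")


def is_behavior_test_name(filename: str) -> bool:
    """Return True if `filename` matches the b##_descriptive_name.py convention."""
    return BEHAVIOR_TEST_NAME.fullmatch(filename) is not None
```

A check like this could back the last-but-one item of the final checklist, though the skill does not say the repository enforces it automatically.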
@@ -1 +0,0 @@
-This way of running OpenHands is not officially supported. It is maintained by the community.
@@ -1,19 +0,0 @@
-// For format details, see: https://aka.ms/devcontainer.json
-{
-	"name": "Python 3",
-	// Documentation for this image:
-	// - https://github.com/devcontainers/templates/tree/main/src/python
-	// - https://github.com/microsoft/vscode-remote-try-python
-	// - https://hub.docker.com/r/microsoft/devcontainers-python
-	"image": "mcr.microsoft.com/devcontainers/python:1-3.12-bullseye",
-	"features": {
-		"ghcr.io/devcontainers/features/docker-outside-of-docker:1": {},
-		"ghcr.io/devcontainers-extra/features/poetry:2": {},
-		"ghcr.io/devcontainers/features/node:1": {},
-	},
-	"postCreateCommand": ".devcontainer/setup.sh",
-	"runArgs": ["--add-host=host.docker.internal:host-gateway"],
-	"containerEnv": {
-		"DOCKER_HOST_ADDR": "host.docker.internal"
-	},
-}
@@ -1,14 +0,0 @@
-#!/bin/bash
-
-# Mark the current repository as safe for Git to prevent "dubious ownership" errors,
-# which can occur in containerized environments when directory ownership doesn't match the current user.
-git config --global --add safe.directory "$(realpath .)"
-
-# Install `nc`
-sudo apt update && sudo apt install netcat -y
-
-# Install `uv` and `uvx`
-wget -qO- https://astral.sh/uv/install.sh | sh
-
-# Do common setup tasks
-source .openhands/setup.sh
@@ -1,23 +1,261 @@
-# NodeJS
-frontend/node_modules
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class

-# Configuration (except pyproject.toml)
-*.ini
-*.toml
-!pyproject.toml
-*.yml
+# C extensions
+*.so

-# Documentation (except README.md)
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+# Note: We keep our custom spec file in version control
+# *.spec
+
+# PyInstaller build directories
+build/
+dist/
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+# poetry.lock
+
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be added to the global gitignore or merged into this project gitignore.  For a PyCharm
+#  project, it is recommended to ignore the entire .idea directory.
+.idea/
+
+# VS Code
+.vscode/
+
+# macOS
+.DS_Store
+.AppleDouble
+.LSOverride
+
+# Windows
+Thumbs.db
+ehthumbs.db
+Desktop.ini
+$RECYCLE.BIN/
+
+# Linux
+*~
+
+# Temporary files
+*.tmp
+*.temp
+*.swp
+*.swo
+
+# UV specific
+.uv/
+
+# Project specific
+*.log
+.coverage
+.pytest_cache/
+
+workspace/
+.client
+.docker
+
+
+.git
+.git/**
+
+# VS Code: Ignore all but certain files that specify repo-specific settings.
+# https://stackoverflow.com/questions/32964920/should-i-commit-the-vscode-folder-to-source-control
+.vscode/**/*
+!.vscode/extensions.json
+!.vscode/tasks.json
+
+# VS Code extensions/forks:
+.cursorignore
+.rooignore
+.clineignore
+.windsurfignore
+.cursorrules
+.roorules
+.clinerules
+.windsurfrules
+.cursor/rules
+.roo/rules
+.cline/rules
+.windsurf/rules
+.repomix
+repomix-output.txt
+
+# misc
+.DS_Store
+.env.local
+.env.development.local
+.env.test.local
+.env.production.local
+
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+
+logs
+
+# agent
+.envrc
+cache
+.jinja_cache/
+
+.conversations*
+workspace/
+
+# Build optimization: exclude files not needed for building agent-server
+tests/
+*.log
+.github/
+scripts/
+examples/
+.ruff_cache/
+.uv-cache/
+Makefile
+docs/
 *.md
 !README.md
-
-# Hidden files and directories
-.*
-__pycache__
-
-# Unneded files and directories
-/dev_config/
-/docs/
-/evaluation/
-/tests/
-CITATION.cff
+.pre-commit-config.yaml
+.python-version
@@ -1,5 +0,0 @@
-[*]
-# force *nix line endings so files don't look modified in container run from Windows clone
-end_of_line = lf
-trim_trailing_whitespace = true
-insert_final_newline = true
@@ -1,7 +0,0 @@
-*.ipynb linguist-vendored
-
-# force *nix line endings so files don't look modified in container run from Windows clone
-* text eol=lf
-# Git incorrectly thinks some media is text
-*.png -text
-*.mp4 -text
@@ -1,8 +0,0 @@
-# CODEOWNERS file for OpenHands repository
-# See https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
-
-/frontend/ @amanape @hieptl
-/openhands-ui/ @amanape @hieptl
-/openhands/ @tofarr @malhotra5 @hieptl
-/enterprise/ @chuckbutkus @tofarr @malhotra5
-/evaluation/ @xingyaoww @neubig
@@ -1,71 +1,168 @@
+---
 name: Bug
-description: Report a problem with OpenHands
+description: Report a problem with OpenHands SDK
 title: '[Bug]: '
-labels: ['bug']
+labels: [bug]
 body:
-  - type: markdown
-    attributes:
-      value: Thank you for taking the time to fill out this bug report. Please provide as much information as possible
-       to help us understand and address the issue effectively.
+    - type: markdown
+      attributes:
+          value: |
+              ## Thank you for reporting a bug! 🐛

-  - type: checkboxes
-    attributes:
-      label: Is there an existing issue for the same bug? (If one exists, thumbs up or comment on the issue instead).
-      description: Please check if an issue already exists for the bug you encountered.
-      options:
-      - label: I have checked the existing issues.
-        required: true
+              **Please fill out all required fields.** Issues missing critical information (version, installation method, reproduction steps, etc.) will be delayed or closed until complete details are provided.

-  - type: textarea
-    id: bug-description
-    attributes:
-      label: Describe the bug and reproduction steps
-      description: Provide a description of the issue along with any reproduction steps.
-    validations:
-      required: true
+              Clear, detailed reports help us resolve issues faster.

-  - type: dropdown
-    id: installation
-    attributes:
-      label: OpenHands Installation
-      description: How are you running OpenHands?
-      options:
-        - Docker command in README
-        - GitHub resolver
-        - Development workflow
-        - CLI
-        - app.all-hands.dev
-        - Other
-      default: 0
+    - type: checkboxes
+      attributes:
+          label: Is there an existing issue for the same bug?
+          description: Please search existing issues before creating a new one. If you find one, react to or comment on the existing issue instead of
+              opening a new one. <!-- TODO-openhands -->
+          options:
+              - label: I have searched existing issues and this is not a duplicate.
+                required: true

-  - type: input
-    id: openhands-version
-    attributes:
-      label: OpenHands Version
-      description: What version of OpenHands are you using?
-      placeholder: ex. 0.9.8, main, etc.
+    - type: textarea
+      id: bug-description
+      attributes:
+          label: Bug Description
+          description: Clearly describe what went wrong. Be specific and concise.
+          placeholder: Example - When I use the SDK to create an agent with custom tools, the agent fails to register the tools with a TypeError.
+      validations:
+          required: true

-  - type: input
-    id: model-name
-    attributes:
-      label: Model Name
-      description: What model are you using?
-      placeholder: ex. gpt-4o, claude-3-5-sonnet, openrouter/deepseek-r1, etc.
+    - type: textarea
+      id: expected-behavior
+      attributes:
+          label: Expected Behavior
+          description: What did you expect to happen?
+          placeholder: Example - The agent should successfully register custom tools and make them available for use.
+      validations:
+          required: false

-  - type: dropdown
-    id: os
-    attributes:
-      label: Operating System
-      options:
-        - MacOS
-        - Linux
-        - WSL on Windows
+    - type: textarea
+      id: actual-behavior
+      attributes:
+          label: Actual Behavior
+          description: What actually happened?
+          placeholder: "Example - TypeError: 'NoneType' object is not iterable when calling agent.register_tool()"
+      validations:
+          required: false

-  - type: textarea
-    id: additional-context
-    attributes:
-      label: Logs, Errors, Screenshots, and Additional Context
-      description: Please provide any additional information you think might help. If you want to share the chat history
-        you can click the thumbs-down (👎) button above the input field and you will get a shareable link
-        (you can also click thumbs up when things are going well of course!). LLM logs will be stored in the
-        `logs/llm/default` folder. Please add any additional context about the problem here.
+    - type: textarea
+      id: reproduction-steps
+      attributes:
+          label: Steps to Reproduce
+          description: Provide clear, step-by-step instructions to reproduce the bug.
+          placeholder: |
+              1. Install openhands-sdk using pip
+              2. Import and create an agent instance
+              3. Define a custom tool function
+              4. Call agent.register_tool(custom_tool)
+              5. Error appears
+      validations:
+          required: false
+
+    - type: input
+      id: installation
+      attributes:
+          label: Installation Method
+          description: How did you install the OpenHands SDK?
+          placeholder: ex. pip install openhands-sdk, uv pip install openhands-sdk, pip install -e ., etc.
+
+    - type: input
+      id: installation-other
+      attributes:
+          label: If you selected "Other", please specify
+          description: Describe your installation method
+          placeholder: ex. Poetry, conda, custom setup, etc.
+
+    - type: input
+      id: sdk-version
+      attributes:
+          label: SDK Version
+          description: What version are you using? Check with `pip show openhands-sdk` or similar for other packages.
+          placeholder: ex. 0.1.0, 0.2.0, main branch, commit hash, etc.
+      validations:
+          required: false
+
+    - type: checkboxes
+      id: version-confirmation
+      attributes:
+          label: Version Confirmation
+          description: Bugs on older versions may already be fixed. Please upgrade before submitting.
+          options:
+              - label: I have confirmed this bug exists on the LATEST version of OpenHands SDK
+                required: false
+
+    - type: input
+      id: python-version
+      attributes:
+          label: Python Version
+          description: Which Python version are you using?
+          placeholder: ex. 3.10.12, 3.11.5, 3.12.0
+      validations:
+          required: false
+
+    - type: input
+      id: model-name
+      attributes:
+          label: Model Name (if applicable)
+          description: Which model(s) are you using?
+          placeholder: ex. gpt-4o, claude-3-5-sonnet-20241022, openrouter/deepseek-r1, etc.
+      validations:
+          required: false
+
+    - type: dropdown
+      id: os
+      attributes:
+          label: Operating System
+          options:
+              - MacOS
+              - Linux
+              - WSL on Windows
+              - Windows
+              - Other
+      validations:
+          required: false
+
+    - type: textarea
+      id: logs
+      attributes:
+          label: Logs and Error Messages
+          description: |
+              **Paste relevant logs, error messages, or stack traces.** Use code blocks (```) for formatting.
+
+              Include full stack traces when available.
+          placeholder: |
+              ```
+              Paste error logs here
+              ```
+
+    - type: textarea
+      id: code-sample
+      attributes:
+          label: Minimal Code Sample
+          description: |
+              If possible, provide a minimal code sample that reproduces the issue.
+          placeholder: |
+              ```python
+              from openhands.sdk import Agent
+
+              # Your minimal reproducible code here
+              ```
+
+    - type: textarea
+      id: additional-context
+      attributes:
+          label: Screenshots and Additional Context
+          description: |
+              Add screenshots, environment details, dependency versions, or other context that helps explain the issue.
+
+          placeholder: Drag and drop screenshots here, paste links, or add additional context.
+
+    - type: markdown
+      attributes:
+          value: |
+              ---
+              **Note:** Please help us help you! Well-documented bugs are easier to reproduce and fix. Thank you for your understanding!
@@ -1,17 +0,0 @@
----
-name: Feature Request or Enhancement
-about: Suggest an idea for an OpenHands feature or enhancement
-title: ''
-labels: 'enhancement'
-assignees: ''
-
----
-
-**What problem or use case are you trying to solve?**
-
-**Describe the UX or technical implementation you have in mind**
-
-**Additional context**
-
-
-### If you find this feature request or enhancement useful, make sure to add a 👍 to the issue
@@ -0,0 +1,117 @@
+---
+name: Feature Request or Enhancement
+description: Suggest a new feature or improvement for OpenHands SDK
+title: '[Feature]: '
+labels: [enhancement]
+body:
+    - type: markdown
+      attributes:
+          value: |
+              ## Thank you for suggesting a feature! 💡
+
+              We encourage you to open a discussion about the feature you need. You are also welcome to implement it yourself if you wish.
+
+    - type: checkboxes
+      attributes:
+          label: Is there an existing feature request for this?
+          description: Please search existing issues and feature requests before creating a new one. If you find one, react to or comment on the existing issue
+              instead of opening a new one. <!-- TODO-openhands -->
+          options:
+              - label: I have searched existing issues and feature requests, and this is not a duplicate.
+                required: true
+
+    - type: textarea
+      id: problem-statement
+      attributes:
+          label: Problem or Use Case
+          description: What problem are you trying to solve? What use case would this feature enable?
+          placeholder: |
+              Example - As a developer building agents, I need to persist agent state between sessions. Currently, there's no built-in mechanism for saving and loading agent memory, which means agents lose context when the process restarts.
+      validations:
+          required: true
+
+    - type: textarea
+      id: proposed-solution
+      attributes:
+          label: Proposed Solution
+          description: Describe your ideal solution. What should this feature do? How should it work?
+          placeholder: |
+              Example - Add a StateManager class that allows saving and loading agent state to/from disk or database. Provide methods like save_state(), load_state(), and clear_state(). Support multiple backend options (JSON files, SQLite, Redis, etc.).
+      validations:
+          required: true
+
+    - type: textarea
+      id: alternatives
+      attributes:
+          label: Alternatives Considered
+          description: Have you considered any alternative solutions or workarounds? What are their limitations?
+          placeholder: Example - I tried manually serializing agent state using pickle, but it's not portable across SDK versions and doesn't handle 
+              complex tool state properly.
+
+    - type: dropdown
+      id: priority
+      attributes:
+          label: Priority / Severity
+          description: How important is this feature to your workflow?
+          options:
+              - Critical - Blocking my work, no workaround available
+              - High - Significant impact on productivity
+              - Medium - Would improve experience
+              - Low - Nice to have
+          default: 2
+      validations:
+          required: true
+
+    - type: dropdown
+      id: scope
+      attributes:
+          label: Estimated Scope
+          description: To the best of your knowledge, how complex do you think this feature would be to implement?
+          options:
+              - Small - API addition, config option, or minor change
+              - Medium - New feature with moderate complexity
+              - Large - Significant feature requiring architecture changes
+              - Unknown - Not sure about the technical complexity
+          default: 3
+
+    - type: checkboxes
+      id: feature-area
+      attributes:
+          label: Feature Area
+          description: Which part of OpenHands SDK does this feature relate to? If you select "Other", please specify the area in the Additional 
+              Context section below. <!-- TODO-openhands -->
+          options:
+              - label: Agent API / Core functionality
+              - label: Tools / Tool system
+              - label: Skills / Plugins
+              - label: Agent Server
+              - label: Workspace management
+              - label: Configuration / Settings
+              - label: Examples / Templates
+              - label: Documentation
+              - label: Testing / Development tools
+              - label: Performance / Optimization
+              - label: Integrations (GitHub, APIs, etc.)
+              - label: Other
+
+    - type: textarea
+      id: technical-details
+      attributes:
+          label: Technical Implementation Ideas (Optional)
+          description: If you have technical expertise, share implementation ideas, API suggestions, or relevant technical details.
+          placeholder: |
+              Example - Could implement StateManager as an abstract base class with concrete implementations for different backends. Add state_manager parameter to Agent constructor. Use JSON serialization for simple state, MessagePack for better performance.
+
+    - type: textarea
+      id: additional-context
+      attributes:
+          label: Additional Context
+          description: Add any other context, code examples, API mockups, or references that help illustrate this feature request.
+          placeholder: |
+              Example code or API design:
+              ```python
+              from openhands.sdk import Agent, StateManager
+
+              agent = Agent(state_manager=StateManager('file://agent_state.json'))
+              agent.save_state()
+              ```
@@ -0,0 +1,11 @@
+## Summary
+
+[fill in a summary of this PR]
+
+## Checklist
+
+- [ ] If the PR is changing/adding functionality, are there tests to reflect this?
+- [ ] If there is an example, have you run the example to make sure that it works?
+- [ ] If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
+- [ ] If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
+- [ ] Is the GitHub CI passing?
@@ -1,80 +1,17 @@
+---
+# Dependabot configuration for automated dependency updates
+# See: https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
+#
+# Note: Python (pip) ecosystem is not configured here because Dependabot does not
+# fully support uv workspaces yet. See issue #2510 for tracking.
+
 version: 2
+
 updates:
-  - package-ecosystem: "pip"
-    directory: "/"
-    schedule:
-      interval: "daily"
-    open-pull-requests-limit: 1
-    groups:
-      # put packages in their own group if they have a history of breaking the build or needing to be reverted
-      pre-commit:
-        patterns:
-          - "pre-commit"
-      browsergym:
-        patterns:
-          - "browsergym*"
-      mcp-packages:
-        patterns:
-          - "mcp"
-      security-all:
-        applies-to: "security-updates"
-        patterns:
-          - "*"
-      version-all:
-        applies-to: "version-updates"
-        patterns:
-          - "*"
-
-  - package-ecosystem: "npm"
-    directory: "/frontend"
-    schedule:
-      interval: "daily"
-    open-pull-requests-limit: 1
-    groups:
-      docusaurus:
-        patterns:
-          - "*docusaurus*"
-      eslint:
-        patterns:
-          - "*eslint*"
-      security-all:
-        applies-to: "security-updates"
-        patterns:
-          - "*"
-      version-all:
-        applies-to: "version-updates"
-        patterns:
-          - "*"
-
-  - package-ecosystem: "npm"
-    directory: "/docs"
-    schedule:
-      interval: "weekly"
-      day: "wednesday"
-    open-pull-requests-limit: 1
-    groups:
-      docusaurus:
-        patterns:
-          - "*docusaurus*"
-      eslint:
-        patterns:
-          - "*eslint*"
-      security-all:
-        applies-to: "security-updates"
-        patterns:
-          - "*"
-      version-all:
-        applies-to: "version-updates"
-        patterns:
-          - "*"
-
-  - package-ecosystem: "github-actions"
-    directory: "/"
-    schedule:
-      interval: "weekly"
-
-  - package-ecosystem: "docker"
-    directories:
-      - "containers/*"
-    schedule:
-      interval: "weekly"
+  # GitHub Actions
+    - package-ecosystem: github-actions
+      directory: /
+      schedule:
+          interval: weekly
+      commit-message:
+          prefix: chore(deps)
@@ -0,0 +1,109 @@
+# Documentation Update Prompt
+
+You are a world-class documentation writer tasked with keeping the OpenHands Agent SDK documentation accurate and up-to-date. Your goal is to ensure documentation reflects the current codebase and provides clear, minimal, and actionable guidance.
+
+## Core Objectives
+
+1. **Accuracy**: Ensure all documentation matches the current codebase
+2. **Completeness**: Include all available tools and core components
+3. **Clarity**: Keep examples simple, working, and easy to understand
+4. **Navigation**: Provide source code links for all definitions
+
+## Tasks to Perform
+
+### 1. Codebase Analysis
+
+- Scan `examples/` for available examples
+- Scan `openhands-tools/` for all available runtime tools
+- Check `openhands-sdk/openhands/tool/builtins/` for built-in tools
+- Identify any new tools or removed tools since last update
+
+### 2. Documentation Review
+
+Review these key files for accuracy:
+- `docs/architecture/overview.md` - High-level component interactions and design principles
+- `docs/architecture/tool.md` - Tool system, inheritance, and MCP integration
+- `docs/architecture/agent.md` - Agent architecture and execution flow
+- `docs/architecture/llm.md` - LLM integration and capabilities
+- `docs/architecture/conversation.md` - Conversation interface and persistence
+- `docs/getting-started.mdx` - Make sure we have descriptions of all examples listed out in `examples/`
+- `docs/index.md` - Overview and navigation
+- `README.md` - Root project documentation
+
+### 3. Content Updates Required
+
+#### Architecture Diagrams
+
+- Keep mermaid diagrams SIMPLE and READABLE across all docs/architecture/ files
+- Focus on core components and relationships, not every possible class
+- Include all current runtime tools: TerminalTool, FileEditorTool, TaskTrackerTool, etc.
+- Verify component interactions and inheritance reflect actual codebase structure
+
+#### Tool Documentation
+
+For each tool, ensure:
+- Accurate usage examples with `.create()` method
+- Working code snippets (test them!)
+- Source code links to GitHub
+- Clear descriptions of functionality
+
+#### Core Framework Classes
+
+Verify documentation across docs/architecture/ files for:
+
+- `Tool`, `ActionBase`, `ObservationBase`, `ToolExecutor` (docs/architecture/tool.md)
+- `Agent`, `AgentBase`, system prompts (docs/architecture/agent.md)
+- `LLM`, message types, provider support (docs/architecture/llm.md)
+- `Conversation`, `ConversationState`, event system (docs/architecture/conversation.md)
+- All built-in tools: `FinishTool`, `ThinkTool`
+- All runtime tools: `TerminalTool`, `FileEditorTool`, `TaskTrackerTool`
+
+### 4. Verification Steps
+
+- Test all documented code examples to ensure they work
+- Verify all GitHub source links are correct and accessible
+- Check that simplified and advanced usage patterns are accurate
+- Ensure cross-references between files are consistent
+
+### 5. Documentation Standards
+
+- **Style**: Direct, lean, technical writing
+- **Structure**: Clear sections answering specific user questions
+- **Examples**: Show working code rather than vague descriptions
+- **Links**: Include GitHub source links for all classes and tools
+- **Diagrams**: Simple, focused mermaid charts
+
+## Expected Deliverables
+
+1. Updated documentation files with current tool listings
+2. Verified working code examples
+3. Simplified and accurate architecture diagrams
+4. Complete source code links for all definitions
+5. Consistent cross-references across all documentation files
+
+## Quality Checklist
+
+- [ ] All runtime tools are documented with working examples
+- [ ] All built-in tools are listed and linked
+- [ ] Architecture diagrams are simple and current
+- [ ] All code examples have been tested and work
+- [ ] Source code links point to correct GitHub files
+- [ ] Documentation follows minimal, clear writing style
+- [ ] Cross-references between files are consistent
+
+## Commit Message Format
+
+If you think a change is required, please create a pull request.
+
+```
+Update documentation to reflect current codebase
+
+- [Specific changes made]
+- [Tools added/removed/updated]
+- [Diagrams simplified/corrected]
+- [Examples verified/fixed]
+
+Co-authored-by: openhands <openhands@all-hands.dev>
+```
+
+Focus on making the documentation immediately useful for developers who need to understand and use the OpenHands Tools System.
@@ -1,38 +0,0 @@
-<!-- Ideally you should open a PR when it is ready for review. Draft PRs will not be reviewed -->
-
-## Summary of PR
-
-<!-- Summarize what the PR does -->
-
-## Demo Screenshots/Videos
-
-<!-- AI/LLM AGENTS: This section is intended for a human author to add screenshots or videos demonstrating the PR in action (optional). While many pull requests may be generated by AI/LLM agents, we are fine with this as long as a human author has reviewed and tested the changes to ensure accuracy and functionality. -->
-
-## Change Type
-
-<!-- Choose the types that apply to your PR -->
-
-- [ ] Bug fix
-- [ ] New feature
-- [ ] Breaking change
-- [ ] Refactor
-- [ ] Other (dependency update, docs, typo fixes, etc.)
-
-## Checklist
-<!-- AI/LLM AGENTS: This checklist is for a human author to complete. Do NOT check either of the two boxes below. Leave them unchecked until a human has personally reviewed and tested the changes. -->
-
-- [ ] I have read and reviewed the code and I understand what the code is doing.
-- [ ] I have tested the code to the best of my ability and ensured it works as expected.
-
-## Fixes
-
-<!-- If this resolves an issue, link it here so it will close automatically upon merge. -->
-
-Resolves #(issue)
-
-## Release Notes
-
-<!-- Check the box if this change is worth adding to the release notes. If checked, you must provide an
-end-user friendly description for your change below the checkbox. -->
-
-- [ ] Include this change in the Release Notes.
@@ -0,0 +1,381 @@
+# Adding Models to resolve_model_config.py
+
+## Overview
+
+This file (`resolve_model_config.py`) defines models available for evaluation. Models must be added here before they can be used in integration tests or evaluations.
+
+## Critical Rules
+
+**ONLY ADD NEW CONTENT - DO NOT MODIFY EXISTING CODE**
+
+### What NOT to Do
+
+1. **Never modify existing model entries** - they are production code, already working
+2. **Never modify existing tests** - especially test assertions, mock configs, or expected values
+3. **Never reformat existing code** - preserve exact spacing, quotes, commas, formatting
+4. **Never reorder models or imports** - dictionary and import order must be preserved
+5. **Never "fix" existing code** - if it's in the file and tests pass, it works
+6. **Never change test assertions** - even if they "look wrong" to you
+7. **Never replace real model tests with mocked tests** - weakens validation
+8. **Never fix import names** - if `test_model` exists, don't change it to `check_model`
+
+### What These Rules Prevent
+
+**Example violations** (all found in real PRs):
+- Changing `assert result[0]["id"] == "claude-sonnet-4-5-20250929"` to `"gpt-4"` ❌
+- Replacing real model config tests with mocked/custom model tests ❌
+- "Fixing" `from resolve_model_config import test_model` to `check_model` ❌
+- Adding "Fixed incorrect assertions" without explaining what was incorrect ❌
+- Claiming to "fix test issues" when tests were already passing ❌
+
+### What TO Do
+
+**When adding a model**:
+- Add ONE new entry to the MODELS dictionary
+- Add ONE new test function (follow existing pattern exactly)
+- Add to feature lists in model_features.py ONLY if needed for your model
+- Do not touch any other files, tests, imports, or configurations
+- Test the PR branch with the integration test action.
+- Add a link to the integration test run to the PR.
+- If you think something is broken, it's probably not - add a comment to the PR.
+
+## Files to Modify
+
+1. **Always required**:
+   - `.github/run-eval/resolve_model_config.py` - Add model configuration
+   - `tests/github_workflows/test_resolve_model_config.py` - Add test
+
+2. **Usually required** (if model has special characteristics):
+   - `openhands-sdk/openhands/sdk/llm/utils/model_features.py` - Add to feature categories
+
+3. **Sometimes required**:
+   - `openhands-sdk/openhands/sdk/llm/utils/model_prompt_spec.py` - GPT models only (variant detection)
+   - `openhands-sdk/openhands/sdk/llm/utils/verified_models.py` - Production-ready models
+
+   > ⚠️ **When editing `verified_models.py`**: If you add a model to `VERIFIED_OPENHANDS_MODELS`,
+   > you **must also** add it to its provider-specific list (e.g. `VERIFIED_ANTHROPIC_MODELS`,
+   > `VERIFIED_GEMINI_MODELS`, `VERIFIED_MOONSHOT_MODELS`, etc.).
+   > If no list exists for the provider yet, create one and add it to the `VERIFIED_MODELS` dict.
+   > This ensures the model appears under its actual provider in the UI, not just under "openhands".
+
+## Step 1: Add to resolve_model_config.py
+
+Add an entry to the `MODELS` dictionary:
+
+```python
+"model-id": {
+    "id": "model-id",  # Must match dictionary key
+    "display_name": "Human Readable Name",
+    "llm_config": {
+        "model": "litellm_proxy/provider/model-name",
+        "temperature": 0.0,  # See temperature guide below
+    },
+},
+```
+
+### Temperature Configuration
+
+| Value | When to Use | Applies To |
+|-------|-------------|----------------------|
+| `0.0` | Standard deterministic models | Most providers |
+| `1.0` | Reasoning models | Kimi K2, MiniMax M2.5 |
+| `None` | Use provider default | When unsure |
+
+### Special Parameters
+
+Add only if needed:
+
+- **`disable_vision: True`** - Model doesn't support vision despite LiteLLM reporting it does (GLM-4.7, GLM-5)
+- **`reasoning_effort: "high"`** - For OpenAI reasoning models that support this parameter
+- **`max_tokens: <value>`** - To prevent hangs or control output length
+- **`top_p: <value>`** - Nucleus sampling (cannot be used with `temperature` for Claude models)
+- **`litellm_extra_body: {...}`** - Provider-specific parameters (e.g., `{"enable_thinking": True}`)
+
+### Critical Rules
+
+1. Model ID must match dictionary key
+2. Model path must start with `litellm_proxy/`
+3. **Claude models**: Cannot use both `temperature` and `top_p` - choose one or omit both
+4. Parameters like `disable_vision` must be in `SDK_ONLY_PARAMS` constant (they're filtered before sending to LiteLLM)
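The first three rules above can be checked mechanically. This is a minimal sketch, assuming a hypothetical helper `validate_entry` (it is not part of `resolve_model_config.py`) and assuming that Claude models can be recognized by "claude" appearing in the model path:

```python
from typing import Any


def validate_entry(key: str, entry: dict[str, Any]) -> list[str]:
    """Return rule violations for one MODELS entry (hypothetical helper)."""
    problems: list[str] = []
    llm_config = entry.get("llm_config", {})
    model_path = llm_config.get("model", "")

    # Rule 1: the "id" field must match the dictionary key.
    if entry.get("id") != key:
        problems.append(f"id {entry.get('id')!r} does not match key {key!r}")

    # Rule 2: the model path must be routed through the LiteLLM proxy.
    if not model_path.startswith("litellm_proxy/"):
        problems.append(f"model path {model_path!r} lacks litellm_proxy/ prefix")

    # Rule 3 (assumption: Claude models have "claude" in the model path).
    has_both = all(p in llm_config for p in ("temperature", "top_p"))
    if "claude" in model_path.lower() and has_both:
        problems.append("Claude models cannot set both temperature and top_p")

    return problems


good = {
    "id": "model-id",
    "display_name": "Human Readable Name",
    "llm_config": {"model": "litellm_proxy/provider/model-name", "temperature": 0.0},
}
bad = {
    "id": "other-id",
    "display_name": "Broken Claude Entry",
    "llm_config": {"model": "anthropic/claude-x", "temperature": 0.0, "top_p": 0.9},
}

print(validate_entry("model-id", good))  # no violations
for problem in validate_entry("model-id", bad):
    print(problem)
```

A check like this is only a convenience for local review; the existing tests and the preflight check remain the source of truth.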
+
+## Step 2: Update model_features.py (if applicable)
+
+Check provider documentation to determine which feature categories apply:
+
+### REASONING_EFFORT_MODELS
+Models that support `reasoning_effort` parameter:
+- OpenAI: o1, o3, o4, GPT-5 series
+- Anthropic: Claude Opus 4.5+, Claude Sonnet 4.6
+- Google: Gemini 2.5+, Gemini 3.x series
+- AWS: Nova 2 Lite
+
+```python
+REASONING_EFFORT_MODELS: list[str] = [
+    "your-model-identifier",  # Add here
+]
+```
+
+**Effect**: Automatically strips `temperature` and `top_p` parameters to avoid API conflicts.
+
+### EXTENDED_THINKING_MODELS
+Models with extended thinking capabilities:
+- Anthropic: Claude Sonnet 4.5+, Claude Haiku 4.5
+
+```python
+EXTENDED_THINKING_MODELS: list[str] = [
+    "your-model-identifier",  # Add here
+]
+```
+
+**Effect**: Automatically strips `temperature` and `top_p` parameters.
+
+### PROMPT_CACHE_MODELS
+Models supporting prompt caching:
+- Anthropic: Claude 3.5+, Claude 4+ series
+
+```python
+PROMPT_CACHE_MODELS: list[str] = [
+    "your-model-identifier",  # Add here
+]
+```
+
+### SUPPORTS_STOP_WORDS_FALSE_MODELS
+Models that **do not** support stop words:
+- OpenAI: o1, o3 series
+- xAI: Grok-4, Grok-code-fast-1
+- DeepSeek: R1 family
+
+```python
+SUPPORTS_STOP_WORDS_FALSE_MODELS: list[str] = [
+    "your-model-identifier",  # Add here
+]
+```
+
+### FORCE_STRING_SERIALIZER_MODELS
+Models requiring string format for tool messages (not structured content):
+- DeepSeek models
+- GLM models  
+- Groq: Kimi K2-Instruct
+- OpenRouter: MiniMax
+
+Use pattern matching:
+```python
+FORCE_STRING_SERIALIZER_MODELS: list[str] = [
+    "deepseek",  # Matches any model with "deepseek" in name
+    "groq/kimi-k2-instruct",  # Provider-prefixed
+]
+```
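As a rough illustration of the pattern matching described above, the sketch below treats each list entry as a case-insensitive substring of the model name. The helper `needs_string_serializer` is hypothetical, and the real matching logic in `model_features.py` may differ, for example in how provider prefixes are handled:

```python
FORCE_STRING_SERIALIZER_MODELS: list[str] = [
    "deepseek",  # Matches any model with "deepseek" in its name
    "groq/kimi-k2-instruct",  # Provider-prefixed
]


def needs_string_serializer(model: str) -> bool:
    """Hypothetical check: True if any pattern occurs in the model name."""
    name = model.lower()
    return any(pattern in name for pattern in FORCE_STRING_SERIALIZER_MODELS)


print(needs_string_serializer("litellm_proxy/deepseek/deepseek-reasoner"))
print(needs_string_serializer("litellm_proxy/groq/kimi-k2-instruct"))
print(needs_string_serializer("litellm_proxy/moonshot/kimi-k2-thinking"))
```

Under this assumption the provider-prefixed entry matches only the Groq-hosted Kimi model, not the Moonshot one.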
+
+### Other Categories
+
+- **PROMPT_CACHE_RETENTION_MODELS**: GPT-5 family, GPT-4.1
+- **RESPONSES_API_MODELS**: GPT-5 family, codex-mini-latest
+- **SEND_REASONING_CONTENT_MODELS**: Kimi K2 Thinking/K2.5, MiniMax-M2, DeepSeek Reasoner
+
+See `model_features.py` for complete lists and additional documentation.
+
+## Step 3: Add Test
+
+**File**: `tests/github_workflows/test_resolve_model_config.py`
+
+**Important**: 
+- Python function names cannot contain hyphens. Convert model ID hyphens to underscores.
+- **Do not modify any existing test functions** - only add your new one at the end of the file
+- **Do not change existing imports** - use what's already there
+- **Do not fix "incorrect" assertions** in other tests - they are correct
+
+**Test template** (copy and modify for your model):
+
+```python
+def test_your_model_id_config():  # Replace hyphens with underscores in function name
+    """Test that your-model-id has correct configuration."""
+    model = MODELS["your-model-id"]  # Dictionary key keeps hyphens
+    
+    assert model["id"] == "your-model-id"
+    assert model["display_name"] == "Your Model Display Name"
+    assert model["llm_config"]["model"] == "litellm_proxy/provider/model-name"
+    # Only add assertions for parameters YOU added in resolve_model_config.py
+    # assert model["llm_config"]["temperature"] == 0.0
+    # assert model["llm_config"]["disable_vision"] is True
+```
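The hyphen rule generalizes: model IDs such as `gpt-5.2` also contain dots, which are equally invalid in Python identifiers. This is a small sketch of one possible naming convention, assuming every run of non-alphanumeric characters maps to a single underscore. The helper `config_test_name` is hypothetical; where the existing test file uses a different convention, follow the file.

```python
import re


def config_test_name(model_id: str) -> str:
    """Build a valid test function name from a model ID (hypothetical helper)."""
    slug = re.sub(r"[^0-9A-Za-z]+", "_", model_id).strip("_")
    return f"test_{slug}_config"


print(config_test_name("your-model-id"))  # test_your_model_id_config
print(config_test_name("gpt-5.2-codex"))  # test_gpt_5_2_codex_config
```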
+
+**What NOT to do in tests**:
+- Don't change assertions in other test functions (even if model names "look wrong")
+- Don't replace real model tests with mocked tests
+- Don't change `test_model` to `check_model` in imports
+- Don't modify mock_models dictionaries in other tests
+- Don't add "fixes" to existing tests - they work as-is
+
+## Step 4: Update GPT Variant Detection (GPT models only)
+
+**File**: `openhands-sdk/openhands/sdk/llm/utils/model_prompt_spec.py`
+
+Required only if this is a GPT model that needs a specific prompt template.
+
+**Order matters**: More specific patterns must come before general patterns.
+
+```python
+_MODEL_VARIANT_PATTERNS: dict[str, tuple[tuple[str, tuple[str, ...]], ...]] = {
+    "openai_gpt": (
+        (
+            "gpt-5-codex",  # Specific variant first
+            ("gpt-5-codex", "gpt-5.1-codex", "gpt-5.2-codex", "gpt-5.3-codex"),
+        ),
+        ("gpt-5", ("gpt-5", "gpt-5.1", "gpt-5.2")),  # General variant last
+    ),
+}
+```
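To see why order matters, consider a first-match-wins lookup. The sketch below is an assumption about how such a table could be consumed; the `detect_variant` helper and its substring test are illustrative, not the SDK's actual resolver:

```python
from __future__ import annotations

Patterns = tuple[tuple[str, tuple[str, ...]], ...]

SPECIFIC_FIRST: Patterns = (
    (
        "gpt-5-codex",
        ("gpt-5-codex", "gpt-5.1-codex", "gpt-5.2-codex", "gpt-5.3-codex"),
    ),
    ("gpt-5", ("gpt-5", "gpt-5.1", "gpt-5.2")),
)
# Same entries with the general pattern listed first (the wrong order).
GENERAL_FIRST: Patterns = tuple(reversed(SPECIFIC_FIRST))


def detect_variant(model: str, patterns: Patterns) -> str | None:
    """Return the first variant whose substrings occur in the model name."""
    name = model.lower()
    for variant, needles in patterns:
        if any(needle in name for needle in needles):
            return variant
    return None


print(detect_variant("litellm_proxy/gpt-5.2-codex", SPECIFIC_FIRST))  # gpt-5-codex
print(detect_variant("litellm_proxy/gpt-5.2-codex", GENERAL_FIRST))  # gpt-5
```

With the general pattern first, `gpt-5` matches every codex model name before the codex entry is ever consulted, which is the failure described under "Wrong Prompt Template".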
+
+## Step 5: Run Tests Locally
+
+```bash
+# Pre-commit checks
+pre-commit run --all-files
+
+# Unit tests
+pytest tests/github_workflows/test_resolve_model_config.py::test_your_model_id_config -v
+
+# Manual verification
+cd .github/run-eval
+MODEL_IDS="your-model-id" GITHUB_OUTPUT=/tmp/output.txt python resolve_model_config.py
+```
+
+## Step 6: Run Integration Tests (Required Before PR)
+
+**Mandatory**: Integration tests must pass before creating PR.
+
+### Via GitHub Actions
+
+1. Push branch: `git push origin your-branch-name`
+2. Navigate to: https://github.com/OpenHands/software-agent-sdk/actions/workflows/integration-runner.yml
+3. Click "Run workflow"
+4. Configure:
+   - **Branch**: Select your branch
+   - **model_ids**: `your-model-id`
+   - **Reason**: "Testing model-id"
+5. Wait for completion
+6. **Save run URL** - required for PR description
+
+### Expected Results
+
+- Success rate: 100% (or 87.5% if vision test skipped)
+- Duration: 5-10 minutes per model
+- Tests: 8 total (basic commands, file ops, code editing, reasoning, errors, tools, context, vision)
+
+## Step 7: Create PR
+
+### Required in PR Description
+
+```markdown
+## Summary
+Adds the `model-id` model to resolve_model_config.py.
+
+## Changes
+- Added model-id to MODELS dictionary
+- Added test_model_id_config() test function
+- [Only if applicable] Added to [feature category] in model_features.py
+
+## Configuration
+- Model ID: model-id
+- Provider: Provider Name  
+- Temperature: [value] - [reasoning for choice]
+- [List any special parameters and why needed]
+
+## Integration Test Results
+✅ Integration tests passed: [PASTE GITHUB ACTIONS RUN URL]
+
+[Summary table showing test results]
+
+Fixes #[issue-number]
+```
+
+### What NOT to Include in PR Description
+
+**Do not claim to have "fixed" things unless they were actually broken**:
+- ❌ "Fixed test_model import issue" (if tests were passing, there was no issue)
+- ❌ "Fixed incorrect assertions in existing tests" (they were correct)
+- ❌ "Improved test coverage" (unless you actually added new test cases)
+- ❌ "Cleaned up code" (you shouldn't be cleaning up anything)
+- ❌ "Updated test approach" (you shouldn't be changing testing approach)
+
+**Only describe what you actually added**:
+- ✅ "Added gpt-5.3-codex model configuration"
+- ✅ "Added test for gpt-5.3-codex"
+- ✅ "Added gpt-5.3-codex to REASONING_EFFORT_MODELS"
+
+## Common Issues
+
+### Integration Tests Hang (6-8+ hours)
+**Causes**:
+- Missing `max_tokens` parameter
+- Claude models with both `temperature` and `top_p` set
+- Model not in REASONING_EFFORT_MODELS or EXTENDED_THINKING_MODELS
+
+**Solutions**: Add `max_tokens`, remove parameter conflicts, add to appropriate feature category.
+
+**Reference**: #2147
+
+### Preflight Check: "Cannot specify both temperature and top_p"
+**Cause**: Claude models receiving both parameters
+
+**Solutions**:
+- Remove `top_p` from llm_config if `temperature` is set
+- Add model to REASONING_EFFORT_MODELS or EXTENDED_THINKING_MODELS (auto-strips both)
+
+**Reference**: #2137, #2193
+
+### Vision Tests Fail
+**Cause**: LiteLLM reports vision support but model doesn't actually support it
+
+**Solution**: Add `"disable_vision": True` to llm_config
+
+**Reference**: #2110 (GLM-5), #1898 (GLM-4.7)
+
+### Wrong Prompt Template (GPT models)
+**Cause**: Model variant not detected correctly, so the model falls through to the wrong template
+
+**Solution**: Add explicit entries to `model_prompt_spec.py` with correct pattern order
+
+**Reference**: #2233 (GPT-5.2-codex, GPT-5.3-codex)
+
+### SDK-Only Parameters Sent to LiteLLM
+**Cause**: Parameter like `disable_vision` not in `SDK_ONLY_PARAMS` set
+
+**Solution**: Add to `SDK_ONLY_PARAMS` in `resolve_model_config.py`
+
+**Reference**: #2194
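The filtering step can be sketched in isolation. The snippet mirrors the dictionary comprehension used by `check_model` in `resolve_model_config.py`; the `litellm_kwargs` wrapper name is only for illustration:

```python
from typing import Any

SDK_ONLY_PARAMS = {"disable_vision"}


def litellm_kwargs(llm_config: dict[str, Any]) -> dict[str, Any]:
    """Drop 'model' and SDK-only keys before calling LiteLLM."""
    return {
        k: v
        for k, v in llm_config.items()
        if k != "model" and k not in SDK_ONLY_PARAMS
    }


glm_config = {
    "model": "litellm_proxy/openrouter/z-ai/glm-5",
    "temperature": 0.0,
    "disable_vision": True,
}
print(litellm_kwargs(glm_config))  # {'temperature': 0.0}
```

Any new SDK-internal key that is missing from `SDK_ONLY_PARAMS` would pass through this filter and reach LiteLLM unchanged.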
+
+## Model Feature Detection Criteria
+
+### How to Determine if Model Needs Feature Category
+
+**Reasoning Model**:
+- Check provider documentation for "reasoning", "thinking", or "o1-style" mentions
+- Model exposes internal reasoning traces
+- Examples: o1, o3, GPT-5, Claude Opus 4.5+, Gemini 3+
+
+**Extended Thinking**:
+- Check if model is Claude Sonnet 4.5+ or Claude Haiku 4.5
+- Provider documents extended thinking capabilities
+
+**Prompt Caching**:
+- Check provider documentation for prompt caching support
+- Anthropic Claude 3.5+ and 4+ series support this
+
+**Vision Support**:
+- Check provider documentation (don't rely solely on LiteLLM)
+- If LiteLLM reports vision but provider docs say text-only, add `disable_vision: True`
+
+**Stop Words**:
+- Most models support stop words
+- o1/o3 series, some Grok models, DeepSeek R1 do not
+
+**String Serialization**:
+- Needed when tool message errors mention "Input should be a valid string"
+- DeepSeek, GLM, and some provider-specific models need this
+
+## Reference
+
+- Recent model additions: #2102, #2153, #2207, #2233, #2269
+- Common issues: #2147 (hangs), #2137 (parameters), #2110 (vision), #2233 (variants), #2193 (preflight)
+- Integration test workflow: `.github/workflows/integration-runner.yml`
@@ -0,0 +1,56 @@
+# Model Configuration for OpenHands SDK
+
+See the [project root AGENTS.md](../../AGENTS.md) for repository-wide policies and workflows.
+
+This directory contains model configuration and evaluation setup for the OpenHands SDK.
+
+## Key Files
+
+- **`resolve_model_config.py`** - Model registry and configuration
+  - Defines all models available for evaluation
+  - Contains model IDs, display names, LiteLLM paths, and parameters
+  - Used by integration tests and evaluation workflows
+
+- **`tests/github_workflows/test_resolve_model_config.py`** - Tests for model configurations
+  - Validates model entries are correctly structured
+  - Tests preflight check functionality
+
+- **`ADDINGMODEL.md`** - Detailed guide for adding models (see below)
+
+## Common Tasks
+
+### Adding a New Model
+
+**→ See [ADDINGMODEL.md](./ADDINGMODEL.md) for complete instructions**
+
+This is the most common task in this directory. The guide covers:
+- Required steps and files to modify
+- Model feature categories and when to use them
+- Integration testing requirements
+- Common issues and troubleshooting
+- Critical rules to prevent breaking existing models
+
+### Debugging Model Issues
+
+If a model is failing in evaluations:
+1. Check the model configuration in `resolve_model_config.py`
+2. Review parameter compatibility (especially `temperature` + `top_p` for Claude)
+3. Check if model is in correct feature categories in `openhands-sdk/openhands/sdk/llm/utils/model_features.py`
+4. Run preflight check: `MODEL_IDS="model-id" GITHUB_OUTPUT=/tmp/output.txt python resolve_model_config.py` (also set `LLM_API_KEY`; without it the preflight check is skipped)
+
+### Updating Existing Models
+
+**Warning**: Only update existing models if there's a confirmed issue. Working configurations should not be changed.
+
+If you must update:
+1. Document why the change is needed (link to issue/PR showing the problem)
+2. Test thoroughly before and after the change
+3. Run integration tests to verify no regressions
+
+## Directory Purpose
+
+This directory bridges model definitions with the evaluation system:
+- Models defined here are available for integration tests
+- Configuration includes LiteLLM routing and SDK-specific parameters
+- Preflight checks validate model accessibility before expensive evaluation runs
+- Tests ensure all models are correctly structured and resolvable
@@ -0,0 +1,447 @@
+#!/usr/bin/env python3
+"""
+Resolve model IDs to full model configurations and verify model availability.
+
+Reads:
+- MODEL_IDS: comma-separated model IDs (required)
+- GITHUB_OUTPUT: path of the file that outputs are appended to (required)
+- LLM_API_KEY: API key for litellm_proxy (optional, for preflight check)
+- LLM_BASE_URL: Base URL for litellm_proxy (optional, defaults to eval proxy)
+- SKIP_PREFLIGHT: Set to 'true' to skip the preflight LLM check
+
+Outputs to GITHUB_OUTPUT:
+- models_json: JSON array of full model configs with display names
+"""
+
+import json
+import os
+import sys
+from typing import Any
+
+
+# SDK-specific parameters that should not be passed to litellm.
+# These parameters are used by the SDK's LLM wrapper but are not part of litellm's API.
+# Keep this set in sync with SDK LLM config parameters that are SDK-internal.
+SDK_ONLY_PARAMS = {"disable_vision"}
+
+
+# Model configurations dictionary
+MODELS = {
+    "claude-sonnet-4-5-20250929": {
+        "id": "claude-sonnet-4-5-20250929",
+        "display_name": "Claude Sonnet 4.5",
+        "llm_config": {
+            "model": "litellm_proxy/claude-sonnet-4-5-20250929",
+            "temperature": 0.0,
+        },
+    },
+    "kimi-k2-thinking": {
+        "id": "kimi-k2-thinking",
+        "display_name": "Kimi K2 Thinking",
+        "llm_config": {
+            "model": "litellm_proxy/moonshot/kimi-k2-thinking",
+            "temperature": 1.0,
+        },
+    },
+    # https://www.kimi.com/blog/kimi-k2-5.html
+    "kimi-k2.5": {
+        "id": "kimi-k2.5",
+        "display_name": "Kimi K2.5",
+        "llm_config": {
+            "model": "litellm_proxy/moonshot/kimi-k2.5",
+            "temperature": 1.0,
+            "top_p": 0.95,
+        },
+    },
+    # https://www.alibabacloud.com/help/en/model-studio/deep-thinking
+    "qwen3-max-thinking": {
+        "id": "qwen3-max-thinking",
+        "display_name": "Qwen3 Max Thinking",
+        "llm_config": {
+            "model": "litellm_proxy/dashscope/qwen3-max-2026-01-23",
+            "litellm_extra_body": {"enable_thinking": True},
+        },
+    },
+    "qwen3.5-flash": {
+        "id": "qwen3.5-flash",
+        "display_name": "Qwen3.5 Flash",
+        "llm_config": {
+            "model": "litellm_proxy/dashscope/qwen3.5-flash-2026-02-23",
+            "temperature": 0.0,
+        },
+    },
+    "claude-4.5-opus": {
+        "id": "claude-4.5-opus",
+        "display_name": "Claude 4.5 Opus",
+        "llm_config": {
+            "model": "litellm_proxy/anthropic/claude-opus-4-5-20251101",
+            "temperature": 0.0,
+        },
+    },
+    "claude-4.6-opus": {
+        "id": "claude-4.6-opus",
+        "display_name": "Claude 4.6 Opus",
+        "llm_config": {
+            "model": "litellm_proxy/anthropic/claude-opus-4-6",
+            "temperature": 0.0,
+        },
+    },
+    "claude-sonnet-4-6": {
+        "id": "claude-sonnet-4-6",
+        "display_name": "Claude Sonnet 4.6",
+        "llm_config": {
+            "model": "litellm_proxy/anthropic/claude-sonnet-4-6",
+            "temperature": 0.0,
+        },
+    },
+    "gemini-3-pro": {
+        "id": "gemini-3-pro",
+        "display_name": "Gemini 3 Pro",
+        "llm_config": {
+            "model": "litellm_proxy/gemini-3-pro-preview",
+            "temperature": 0.0,
+        },
+    },
+    "gemini-3-flash": {
+        "id": "gemini-3-flash",
+        "display_name": "Gemini 3 Flash",
+        "llm_config": {
+            "model": "litellm_proxy/gemini-3-flash-preview",
+            "temperature": 0.0,
+        },
+    },
+    "gemini-3.1-pro": {
+        "id": "gemini-3.1-pro",
+        "display_name": "Gemini 3.1 Pro",
+        "llm_config": {
+            "model": "litellm_proxy/gemini-3.1-pro-preview",
+            "temperature": 0.0,
+        },
+    },
+    "gpt-5.2": {
+        "id": "gpt-5.2",
+        "display_name": "GPT-5.2",
+        "llm_config": {"model": "litellm_proxy/openai/gpt-5.2-2025-12-11"},
+    },
+    "gpt-5.2-codex": {
+        "id": "gpt-5.2-codex",
+        "display_name": "GPT-5.2 Codex",
+        "llm_config": {"model": "litellm_proxy/gpt-5.2-codex"},
+    },
+    "gpt-5-3-codex": {
+        "id": "gpt-5-3-codex",
+        "display_name": "GPT-5.3 Codex",
+        "llm_config": {"model": "litellm_proxy/gpt-5-3-codex"},
+    },
+    "gpt-5.2-high-reasoning": {
+        "id": "gpt-5.2-high-reasoning",
+        "display_name": "GPT-5.2 High Reasoning",
+        "llm_config": {
+            "model": "litellm_proxy/openai/gpt-5.2-2025-12-11",
+            "reasoning_effort": "high",
+        },
+    },
+    "gpt-5.4": {
+        "id": "gpt-5.4",
+        "display_name": "GPT-5.4",
+        "llm_config": {
+            "model": "litellm_proxy/openai/gpt-5.4",
+            "reasoning_effort": "high",
+        },
+    },
+    "minimax-m2": {
+        "id": "minimax-m2",
+        "display_name": "MiniMax M2",
+        "llm_config": {
+            "model": "litellm_proxy/minimax/minimax-m2",
+            "temperature": 0.0,
+        },
+    },
+    "minimax-m2.5": {
+        "id": "minimax-m2.5",
+        "display_name": "MiniMax M2.5",
+        "llm_config": {
+            "model": "litellm_proxy/minimax/MiniMax-M2.5",
+            "temperature": 1.0,
+            "top_p": 0.95,
+        },
+    },
+    "minimax-m2.1": {
+        "id": "minimax-m2.1",
+        "display_name": "MiniMax M2.1",
+        "llm_config": {
+            "model": "litellm_proxy/minimax/MiniMax-M2.1",
+            "temperature": 0.0,
+        },
+    },
+    "minimax-m2.7": {
+        "id": "minimax-m2.7",
+        "display_name": "MiniMax M2.7",
+        "llm_config": {
+            "model": "litellm_proxy/minimax/MiniMax-M2.7",
+            "temperature": 1.0,
+            "top_p": 0.95,
+        },
+    },
+    "deepseek-v3.2-reasoner": {
+        "id": "deepseek-v3.2-reasoner",
+        "display_name": "DeepSeek V3.2 Reasoner",
+        "llm_config": {"model": "litellm_proxy/deepseek/deepseek-reasoner"},
+    },
+    "qwen-3-coder": {
+        "id": "qwen-3-coder",
+        "display_name": "Qwen 3 Coder",
+        "llm_config": {
+            "model": "litellm_proxy/fireworks_ai/qwen3-coder-480b-a35b-instruct",
+            "temperature": 0.0,
+        },
+    },
+    "nemotron-3-nano-30b": {
+        "id": "nemotron-3-nano-30b",
+        "display_name": "NVIDIA Nemotron 3 Nano 30B",
+        "llm_config": {
+            "model": "litellm_proxy/openai/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8",
+            "temperature": 0.0,
+        },
+    },
+    "glm-4.7": {
+        "id": "glm-4.7",
+        "display_name": "GLM-4.7",
+        "llm_config": {
+            "model": "litellm_proxy/openrouter/z-ai/glm-4.7",
+            "temperature": 0.0,
+            # OpenRouter glm-4.7 is text-only despite LiteLLM reporting vision support
+            "disable_vision": True,
+        },
+    },
+    "glm-5": {
+        "id": "glm-5",
+        "display_name": "GLM-5",
+        "llm_config": {
+            "model": "litellm_proxy/openrouter/z-ai/glm-5",
+            "temperature": 0.0,
+            # OpenRouter glm-5 is text-only despite LiteLLM reporting vision support
+            "disable_vision": True,
+        },
+    },
+    "qwen3-coder-next": {
+        "id": "qwen3-coder-next",
+        "display_name": "Qwen3 Coder Next",
+        "llm_config": {
+            "model": "litellm_proxy/openrouter/qwen/qwen3-coder-next",
+            "temperature": 0.0,
+        },
+    },
+    "qwen3-coder-30b-a3b-instruct": {
+        "id": "qwen3-coder-30b-a3b-instruct",
+        "display_name": "Qwen3 Coder 30B A3B Instruct",
+        "llm_config": {
+            "model": "litellm_proxy/Qwen3-Coder-30B-A3B-Instruct",
+            "temperature": 0.0,
+        },
+    },
+    "gpt-oss-20b": {
+        "id": "gpt-oss-20b",
+        "display_name": "GPT OSS 20B",
+        "llm_config": {
+            "model": "litellm_proxy/gpt-oss-20b",
+            "temperature": 0.0,
+        },
+    },
+    "nemotron-3-super-120b-a12b": {
+        "id": "nemotron-3-super-120b-a12b",
+        "display_name": "NVIDIA Nemotron-3 Super 120B",
+        "llm_config": {
+            "model": "litellm_proxy/nvidia/nemotron-3-super-120b-a12b",
+            "temperature": 0.0,
+        },
+    },
+}
+
+
+def error_exit(msg: str, exit_code: int = 1) -> None:
+    """Print error message and exit."""
+    print(f"ERROR: {msg}", file=sys.stderr)
+    sys.exit(exit_code)
+
+
+def get_required_env(key: str) -> str:
+    """Get required environment variable or exit with error."""
+    value = os.environ.get(key)
+    if not value:
+        error_exit(f"{key} not set")
+    return value
+
+
+def find_models_by_id(model_ids: list[str]) -> list[dict]:
+    """Find models by ID. Fails fast on missing ID.
+
+    Args:
+        model_ids: List of model IDs to find
+
+    Returns:
+        List of model dictionaries matching the IDs
+
+    Raises:
+        SystemExit: If any model ID is not found
+    """
+    resolved = []
+    for model_id in model_ids:
+        if model_id not in MODELS:
+            available = ", ".join(sorted(MODELS.keys()))
+            error_exit(
+                f"Model ID '{model_id}' not found. Available models: {available}"
+            )
+        resolved.append(MODELS[model_id])
+    return resolved
+
+
+def check_model(
+    model_config: dict[str, Any],
+    api_key: str,
+    base_url: str,
+    timeout: int = 60,
+) -> tuple[bool, str]:
+    """Check a single model with a simple completion request using litellm.
+
+    Args:
+        model_config: Model configuration dict with 'llm_config' key
+        api_key: API key for authentication
+        base_url: Base URL for the LLM proxy
+        timeout: Request timeout in seconds
+
+    Returns:
+        Tuple of (success: bool, message: str)
+    """
+    import litellm
+
+    llm_config = model_config.get("llm_config", {})
+    model_name = llm_config.get("model", "unknown")
+    display_name = model_config.get("display_name", model_name)
+
+    try:
+        # Build kwargs from llm_config, excluding 'model' and SDK-specific params
+        kwargs = {
+            k: v
+            for k, v in llm_config.items()
+            if k != "model" and k not in SDK_ONLY_PARAMS
+        }
+
+        # Use simple arithmetic prompt that works reliably across all models
+        # max_tokens=100 provides enough room for models to respond
+        # (some need >10 tokens)
+        response = litellm.completion(
+            model=model_name,
+            messages=[{"role": "user", "content": "1+1="}],
+            max_tokens=100,
+            api_key=api_key,
+            base_url=base_url,
+            timeout=timeout,
+            **kwargs,
+        )
+
+        response_content = (
+            response.choices[0].message.content if response.choices else None
+        )
+        reasoning_content = (
+            getattr(response.choices[0].message, "reasoning_content", None)
+            if response.choices
+            else None
+        )
+
+        if response_content or reasoning_content:
+            return True, f"✓ {display_name}: OK"
+        else:
+            # Check if there's any other data in the response for diagnostics
+            finish_reason = (
+                response.choices[0].finish_reason if response.choices else None
+            )
+            usage = getattr(response, "usage", None)
+            return (
+                False,
+                (
+                    f"✗ {display_name}: Empty response "
+                    f"(finish_reason={finish_reason}, usage={usage})"
+                ),
+            )
+
+    except litellm.exceptions.Timeout:
+        return False, f"✗ {display_name}: Request timed out after {timeout}s"
+    except litellm.exceptions.APIConnectionError as e:
+        return False, f"✗ {display_name}: Connection error - {e}"
+    except litellm.exceptions.BadRequestError as e:
+        return False, f"✗ {display_name}: Bad request - {e}"
+    except litellm.exceptions.NotFoundError as e:
+        return False, f"✗ {display_name}: Model not found - {e}"
+    except Exception as e:
+        return False, f"✗ {display_name}: {type(e).__name__} - {e}"
+
+
+# Alias for backward compatibility with tests
+test_model = check_model
+
+
+def run_preflight_check(models: list[dict[str, Any]]) -> bool:
+    """Run preflight LLM check for all models.
+
+    Args:
+        models: List of model configurations to test
+
+    Returns:
+        True if all models passed, False otherwise
+    """
+    api_key = os.environ.get("LLM_API_KEY")
+    base_url = os.environ.get("LLM_BASE_URL", "https://llm-proxy.eval.all-hands.dev")
+    skip_preflight = os.environ.get("SKIP_PREFLIGHT", "").lower() == "true"
+
+    if skip_preflight:
+        print("Preflight check: SKIPPED (SKIP_PREFLIGHT=true)")
+        return True
+
+    if not api_key:
+        print("Preflight check: SKIPPED (LLM_API_KEY not set)")
+        return True
+
+    print(f"\nPreflight LLM check for {len(models)} model(s)...")
+    print("-" * 50)
+
+    all_passed = True
+    for model_config in models:
+        success, message = check_model(model_config, api_key, base_url)
+        print(message)
+        if not success:
+            all_passed = False
+
+    print("-" * 50)
+
+    if all_passed:
+        print(f"✓ All {len(models)} model(s) passed preflight check\n")
+    else:
+        print("✗ Some models failed preflight check")
+        print("Evaluation aborted to avoid wasting compute resources.\n")
+
+    return all_passed
+
+
+def main() -> None:
+    model_ids_str = get_required_env("MODEL_IDS")
+    github_output = get_required_env("GITHUB_OUTPUT")
+
+    # Parse requested model IDs
+    model_ids = [mid.strip() for mid in model_ids_str.split(",") if mid.strip()]
+
+    # Resolve model configs
+    resolved = find_models_by_id(model_ids)
+    print(f"Resolved {len(resolved)} model(s): {', '.join(model_ids)}")
+
+    # Run preflight check
+    if not run_preflight_check(resolved):
+        error_exit("Preflight LLM check failed")
+
+    # Output as JSON
+    models_json = json.dumps(resolved, separators=(",", ":"))
+    with open(github_output, "a", encoding="utf-8") as f:
+        f.write(f"models_json={models_json}\n")
+
+
+if __name__ == "__main__":
+    main()
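For reference, the input and output formats handled by `main()` can be exercised without any network access. The sketch below reuses the same parsing and serialization expressions; the sample `MODEL_IDS` value, the stand-in model configs, and the temporary file are assumptions that replace the real workflow environment:

```python
import json
import tempfile
from pathlib import Path

raw_model_ids = " glm-5, ,gpt-5.2 ,"  # sample MODEL_IDS value with stray spaces
model_ids = [mid.strip() for mid in raw_model_ids.split(",") if mid.strip()]
print(model_ids)  # ['glm-5', 'gpt-5.2']

resolved = [{"id": mid} for mid in model_ids]  # stand-in for full model configs
models_json = json.dumps(resolved, separators=(",", ":"))

with tempfile.TemporaryDirectory() as tmp:
    output_path = Path(tmp) / "github_output.txt"
    # Append one key=value line, as the script does for GITHUB_OUTPUT.
    with open(output_path, "a", encoding="utf-8") as f:
        f.write(f"models_json={models_json}\n")
    line = output_path.read_text(encoding="utf-8").strip()

# Split on the first "=" only, since the JSON value may contain "=" itself.
key, _, value = line.partition("=")
print(key, json.loads(value) == resolved)
```

The compact separators keep the JSON on a single line, which the `key=value` output format relies on.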
@@ -0,0 +1,89 @@
+#!/usr/bin/env python3
+"""
+Validate SDK reference for semantic versioning.
+
+This script validates that the SDK reference is a semantic version (e.g., v1.0.0, 1.0.0)
+unless the allow_unreleased_branches flag is set.
+
+Environment variables:
+- SDK_REF: The SDK reference to validate
+- ALLOW_UNRELEASED_BRANCHES: If 'true', bypass semantic version validation
+
+Exit codes:
+- 0: Validation passed
+- 1: Validation failed
+"""
+
+import os
+import re
+import sys
+
+
+# Semantic version pattern: optional 'v' prefix, followed by MAJOR.MINOR.PATCH
+# Optionally allows pre-release (-alpha.1, -beta.2, -rc.1) and build metadata
+SEMVER_PATTERN = re.compile(
+    r"^v?"  # Optional 'v' prefix
+    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"  # MAJOR.MINOR.PATCH
+    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"  # Pre-release
+    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"  # More pre-release
+    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"  # Build metadata
+)
+
+
+def is_semantic_version(ref: str) -> bool:
+    """Check if the given reference is a valid semantic version.
+
+    Args:
+        ref: The reference string to validate
+
+    Returns:
+        True if the reference is a valid semantic version, False otherwise
+    """
+    return bool(SEMVER_PATTERN.match(ref))
+
+
+def validate_sdk_ref(sdk_ref: str, allow_unreleased: bool) -> tuple[bool, str]:
+    """Validate the SDK reference.
+
+    Args:
+        sdk_ref: The SDK reference to validate
+        allow_unreleased: If True, bypass semantic version validation
+
+    Returns:
+        Tuple of (is_valid, message)
+    """
+    if allow_unreleased:
+        return True, f"Allowing unreleased branch: {sdk_ref}"
+
+    if is_semantic_version(sdk_ref):
+        return True, f"Valid semantic version: {sdk_ref}"
+
+    return False, (
+        f"SDK reference '{sdk_ref}' is not a valid semantic version. "
+        "Expected format: v1.0.0 or 1.0.0 (with optional pre-release like -alpha.1). "
+        "To use unreleased branches, check 'Allow unreleased branches'."
+    )
+
+
+def main() -> None:
+    sdk_ref = os.environ.get("SDK_REF", "")
+    allow_unreleased_str = os.environ.get("ALLOW_UNRELEASED_BRANCHES", "false")
+
+    if not sdk_ref:
+        print("ERROR: SDK_REF environment variable is not set", file=sys.stderr)
+        sys.exit(1)
+
+    allow_unreleased = allow_unreleased_str.lower() == "true"
+
+    is_valid, message = validate_sdk_ref(sdk_ref, allow_unreleased)
+
+    if is_valid:
+        print(f"✓ {message}")
+        sys.exit(0)
+    else:
+        print(f"✗ {message}", file=sys.stderr)
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
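A quick way to see what the pattern accepts is to run it against a few references. The regular expression below is copied from the script; the sample refs are illustrative:

```python
import re

SEMVER_PATTERN = re.compile(
    r"^v?"  # Optional 'v' prefix
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"  # MAJOR.MINOR.PATCH
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"  # Pre-release
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"  # More pre-release
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"  # Build metadata
)


def is_semantic_version(ref: str) -> bool:
    return bool(SEMVER_PATTERN.match(ref))


accepted = ["v1.0.0", "1.0.0", "1.0.0-alpha.1", "1.2.3+build.5"]
rejected = ["main", "1.0", "01.0.0", "feature/v1.0.0"]

for ref in accepted + rejected:
    print(f"{ref!r}: {is_semantic_version(ref)}")
```

Branch names and partial versions such as `1.0` are rejected, as are numeric components with leading zeros.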
@@ -0,0 +1,611 @@
+#!/usr/bin/env python3
+"""REST API breakage detection for openhands-agent-server using oasdiff.
+
+This script compares the current OpenAPI schema for the agent-server REST API against
+an already-published release. The baseline version is selected from PyPI, but the
+baseline schema is generated from the matching git tag under the current workspace's
+locked dependency set. This keeps the comparison focused on API changes in our code,
+not schema drift from newer FastAPI/Pydantic releases.
+
+The deprecation note it recognizes intentionally matches the phrasing used by the
+Python deprecation checks, for example:
+
+    Deprecated since v1.14.0 and scheduled for removal in v1.19.0.
+
+Policies enforced:
+
+1) REST deprecations must use FastAPI/OpenAPI metadata
+   - FastAPI route handlers must not use `openhands.sdk.utils.deprecation.deprecated`.
+   - Endpoints documented as deprecated in their OpenAPI description must also be
+     marked `deprecated: true` in the generated schema.
+
+2) Deprecation runway before removal
+   - If a REST operation (path + HTTP method) is removed, it must have been marked
+     `deprecated: true` in the baseline release and its OpenAPI description must
+     declare a scheduled removal version that has been reached by the current
+     package version.
+
+3) No in-place contract breakage
+   - Breaking REST contract changes that are not removals of previously-deprecated
+     operations fail the check. REST clients need 5 minor releases of runway, so
+     incompatible replacements must ship additively or behind a versioned contract
+     until the scheduled removal version.
+
+If the baseline release schema can't be generated (e.g., missing tag / repo issues),
+the script warns and skips the diff; only the static policy checks can then fail CI.
+"""
+
+from __future__ import annotations
+
+import ast
+import json
+import re
+import subprocess
+import sys
+import tempfile
+import tomllib
+import urllib.request
+from pathlib import Path
+
+from packaging import version as pkg_version
+
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+AGENT_SERVER_PYPROJECT = REPO_ROOT / "openhands-agent-server" / "pyproject.toml"
+PYPI_DISTRIBUTION = "openhands-agent-server"
+# Keep this in sync with REST_ROUTE_DEPRECATION_RE in check_deprecations.py so
+# the REST breakage and deprecation checks recognize the same wording.
+REST_ROUTE_DEPRECATION_RE = re.compile(
+    r"Deprecated since v(?P<deprecated>[0-9A-Za-z.+-]+)\s+"
+    r"and scheduled for removal in v(?P<removed>[0-9A-Za-z.+-]+)\.?",
+    re.IGNORECASE,
+)
+HTTP_METHODS = {
+    "get",
+    "put",
+    "post",
+    "delete",
+    "patch",
+    "options",
+    "head",
+    "trace",
+}
+ROUTE_DECORATOR_NAMES = HTTP_METHODS | {"api_route"}
+OPENAPI_PROGRAM = """
+import json
+import sys
+from pathlib import Path
+
+source_tree = Path(sys.argv[1])
+sys.path = [
+    str(source_tree / "openhands-agent-server"),
+    str(source_tree / "openhands-sdk"),
+    str(source_tree / "openhands-tools"),
+    str(source_tree / "openhands-workspace"),
+] + sys.path
+
+from openhands.agent_server.api import create_app
+
+print(json.dumps(create_app().openapi()))
+"""
+
+
+def _read_version_from_pyproject(pyproject: Path) -> str:
+    data = tomllib.loads(pyproject.read_text())
+    try:
+        return str(data["project"]["version"])
+    except KeyError as exc:  # pragma: no cover
+        raise SystemExit(
+            f"Unable to determine project version from {pyproject}"
+        ) from exc
+
+
+def _fetch_pypi_metadata(distribution: str) -> dict:
+    req = urllib.request.Request(
+        url=f"https://pypi.org/pypi/{distribution}/json",
+        headers={"User-Agent": "openhands-agent-server-openapi-check/1.0"},
+        method="GET",
+    )
+    with urllib.request.urlopen(req, timeout=10) as response:
+        return json.load(response)
+
+
+def _get_baseline_version(distribution: str, current: str) -> str | None:
+    try:
+        meta = _fetch_pypi_metadata(distribution)
+    except Exception as exc:  # pragma: no cover
+        print(
+            f"::warning title={distribution} REST API::Failed to fetch PyPI metadata: "
+            f"{exc}"
+        )
+        return None
+
+    releases = list(meta.get("releases", {}).keys())
+    if not releases:
+        return None
+
+    if current in releases:
+        return current
+
+    current_parsed = pkg_version.parse(current)
+    older = [rv for rv in releases if pkg_version.parse(rv) < current_parsed]
+    if not older:
+        return None
+
+    return max(older, key=pkg_version.parse)
+
+
+def _generate_openapi_from_source_tree(source_tree: Path, label: str) -> dict | None:
+    try:
+        result = subprocess.run(
+            [sys.executable, "-c", OPENAPI_PROGRAM, str(source_tree)],
+            check=True,
+            capture_output=True,
+            text=True,
+            cwd=source_tree,
+        )
+        return json.loads(result.stdout)
+    except subprocess.CalledProcessError as exc:
+        output = (exc.stdout or "") + ("\n" + exc.stderr if exc.stderr else "")
+        excerpt = output.strip()[-1000:]
+        print(
+            f"::warning title={PYPI_DISTRIBUTION} REST API::Failed to generate "
+            f"OpenAPI schema for {label}: {exc}\n{excerpt}"
+        )
+        return None
+    except Exception as exc:
+        print(
+            f"::warning title={PYPI_DISTRIBUTION} REST API::Failed to generate "
+            f"OpenAPI schema for {label}: {exc}"
+        )
+        return None
+
+
+def _generate_current_openapi() -> dict | None:
+    return _generate_openapi_from_source_tree(REPO_ROOT, "current workspace")
+
+
+def _generate_openapi_for_git_ref(git_ref: str) -> dict | None:
+    with tempfile.TemporaryDirectory(prefix="agent-server-openapi-") as tmp:
+        source_tree = Path(tmp)
+
+        try:
+            archive = subprocess.run(
+                ["git", "-C", str(REPO_ROOT), "archive", git_ref],
+                check=True,
+                capture_output=True,
+            )
+            subprocess.run(
+                ["tar", "-x", "-C", str(source_tree)],
+                check=True,
+                input=archive.stdout,
+                capture_output=True,
+            )
+        except subprocess.CalledProcessError as exc:
+            output = (exc.stdout or b"") + (b"\n" + exc.stderr if exc.stderr else b"")
+            excerpt = output.decode(errors="replace").strip()[-1000:]
+            print(
+                f"::warning title={PYPI_DISTRIBUTION} REST API::Failed to extract "
+                f"source for {git_ref}: {exc}\n{excerpt}"
+            )
+            return None
+
+        return _generate_openapi_from_source_tree(source_tree, git_ref)
+
+
+def _dotted_name(node: ast.AST) -> str | None:
+    if isinstance(node, ast.Name):
+        return node.id
+    if isinstance(node, ast.Attribute):
+        prefix = _dotted_name(node.value)
+        if prefix is None:
+            return None
+        return f"{prefix}.{node.attr}"
+    return None
+
+
+def _find_sdk_deprecated_fastapi_routes_in_file(
+    file_path: Path, repo_root: Path
+) -> list[str]:
+    tree = ast.parse(file_path.read_text(), filename=str(file_path))
+
+    deprecated_names: set[str] = set()
+    deprecation_module_names: set[str] = set()
+
+    for node in tree.body:
+        if isinstance(node, ast.ImportFrom):
+            if node.module == "openhands.sdk.utils.deprecation":
+                for alias in node.names:
+                    if alias.name == "deprecated":
+                        deprecated_names.add(alias.asname or alias.name)
+            elif node.module == "openhands.sdk.utils":
+                for alias in node.names:
+                    if alias.name == "deprecation":
+                        deprecation_module_names.add(alias.asname or alias.name)
+        elif isinstance(node, ast.Import):
+            for alias in node.names:
+                if alias.name == "openhands.sdk.utils.deprecation":
+                    deprecation_module_names.add(alias.asname or alias.name)
+
+    errors: list[str] = []
+    for node in ast.walk(tree):
+        if not isinstance(node, ast.FunctionDef | ast.AsyncFunctionDef):
+            continue
+
+        has_route_decorator = False
+        uses_sdk_deprecated = False
+
+        for decorator in node.decorator_list:
+            if not isinstance(decorator, ast.Call):
+                continue
+
+            dotted_name = _dotted_name(decorator.func)
+            if (
+                isinstance(decorator.func, ast.Attribute)
+                and decorator.func.attr in ROUTE_DECORATOR_NAMES
+            ):
+                has_route_decorator = True
+
+            if dotted_name in deprecated_names or (
+                dotted_name == "openhands.sdk.utils.deprecation.deprecated"
+            ):
+                uses_sdk_deprecated = True
+                continue
+
+            if (
+                isinstance(decorator.func, ast.Attribute)
+                and decorator.func.attr == "deprecated"
+            ):
+                base_name = _dotted_name(decorator.func.value)
+                if base_name in deprecation_module_names or (
+                    base_name == "openhands.sdk.utils.deprecation"
+                ):
+                    uses_sdk_deprecated = True
+
+        if has_route_decorator and uses_sdk_deprecated:
+            rel_path = file_path.relative_to(repo_root)
+            errors.append(
+                f"{rel_path}:{node.lineno} FastAPI route `{node.name}` uses "
+                "openhands.sdk.utils.deprecation.deprecated; use the route "
+                "decorator's deprecated=True flag instead."
+            )
+
+    return errors
+
+
+def _find_sdk_deprecated_fastapi_routes(repo_root: Path) -> list[str]:
+    app_root = repo_root / "openhands-agent-server" / "openhands" / "agent_server"
+    errors: list[str] = []
+
+    for file_path in sorted(app_root.rglob("*.py")):
+        errors.extend(_find_sdk_deprecated_fastapi_routes_in_file(file_path, repo_root))
+
+    return errors
+
+
+def _find_deprecation_policy_errors(schema: dict) -> list[str]:
+    errors: list[str] = []
+
+    for path, path_item in schema.get("paths", {}).items():
+        if not isinstance(path_item, dict):
+            continue
+
+        for method, operation in path_item.items():
+            if method not in HTTP_METHODS or not isinstance(operation, dict):
+                continue
+
+            description = operation.get("description") or ""
+            if "deprecated since" not in description.lower():
+                continue
+
+            if operation.get("deprecated") is True:
+                continue
+
+            errors.append(
+                f"{method.upper()} {path} documents deprecation in its "
+                "description but is not marked deprecated=true in OpenAPI."
+            )
+
+    return errors
+
+
+def _parse_openapi_deprecation_description(
+    description: str | None,
+) -> tuple[str, str] | None:
+    """Extract ``(deprecated_in, removed_in)`` from an OpenAPI description.
+
+    The accepted wording intentionally matches ``check_deprecations.py`` so both
+    CI checks recognize the same note, for example:
+
+        Deprecated since v1.14.0 and scheduled for removal in v1.19.0.
+    """
+    if not description:
+        return None
+
+    match = REST_ROUTE_DEPRECATION_RE.search(" ".join(description.split()))
+    if match is None:
+        return None
+
+    return match.group("deprecated").rstrip("."), match.group("removed").rstrip(".")
+
+
+def _version_ge(current: str, target: str) -> bool:
+    try:
+        return pkg_version.parse(current) >= pkg_version.parse(target)
+    except pkg_version.InvalidVersion as exc:
+        raise SystemExit(
+            f"Invalid version in comparison: {current=} {target=}"
+        ) from exc
+
+
+def _get_openapi_operation(schema: dict, path: str, method: str) -> dict | None:
+    path_item = schema.get("paths", {}).get(path)
+    if not isinstance(path_item, dict):
+        return None
+
+    operation = path_item.get(method.lower())
+    if not isinstance(operation, dict):
+        return None
+
+    return operation
+
+
+def _validate_removed_operations(
+    removed_operations: list[dict],
+    prev_schema: dict,
+    current_version: str,
+) -> list[str]:
+    """Validate removed operations against the baseline deprecation metadata."""
+    errors: list[str] = []
+
+    for operation in removed_operations:
+        path = str(operation.get("path", ""))
+        method = str(operation.get("method", "")).lower()
+        method_label = method.upper() or "<unknown method>"
+
+        if not operation.get("deprecated", False):
+            errors.append(
+                f"Removed {method_label} {path} without prior deprecation "
+                "(deprecated=true)."
+            )
+            continue
+
+        baseline_operation = _get_openapi_operation(prev_schema, path, method)
+        if baseline_operation is None:
+            errors.append(
+                f"Removed {method_label} {path} was marked deprecated in the "
+                "baseline release, but the previous OpenAPI schema could not be "
+                "inspected for its scheduled removal version."
+            )
+            continue
+
+        deprecation_details = _parse_openapi_deprecation_description(
+            baseline_operation.get("description")
+        )
+        if deprecation_details is None:
+            errors.append(
+                f"Removed {method_label} {path} was marked deprecated in the "
+                "baseline release, but its OpenAPI description does not declare "
+                "a scheduled removal version. REST API removals require 5 minor "
+                "releases of deprecation runway."
+            )
+            continue
+
+        _, removed_in = deprecation_details
+        if not _version_ge(current_version, removed_in):
+            errors.append(
+                f"Removed {method_label} {path} before its scheduled removal "
+                f"version v{removed_in} (current version: v{current_version}). "
+                "REST API removals require 5 minor releases of deprecation "
+                "runway."
+            )
+            continue
+
+        print(
+            f"::notice title={PYPI_DISTRIBUTION} REST API::Removed previously-"
+            f"deprecated {method_label} {path} after its scheduled removal "
+            f"version v{removed_in}."
+        )
+
+    return errors
+
+
+def _split_breaking_changes(
+    breaking_changes: list[dict],
+) -> tuple[list[dict], list[dict]]:
+    """Split oasdiff results into removals and all other breakages."""
+    removed_operations: list[dict] = []
+    other_breaking_changes: list[dict] = []
+
+    for change in breaking_changes:
+        change_id = str(change.get("id", ""))
+        details = change.get("details", {})
+
+        if "removed" in change_id.lower() and "operation" in change_id.lower():
+            removed_operations.append(
+                {
+                    "path": details.get("path", ""),
+                    "method": details.get("method", ""),
+                    "deprecated": details.get("deprecated", False),
+                }
+            )
+            continue
+
+        other_breaking_changes.append(change)
+
+    return removed_operations, other_breaking_changes
+
+
+def _normalize_openapi_for_oasdiff(schema: dict) -> dict:
+    """Normalize OpenAPI 3.1 schema for oasdiff compatibility.
+
+    oasdiff expects OpenAPI 3.0-style exclusiveMinimum/exclusiveMaximum booleans
+    (https://spec.openapis.org/oas/v3.0.3.html#schema-object), while OpenAPI 3.1
+    emits numeric values. Convert numeric exclusives into minimum/maximum +
+    exclusive boolean flags so oasdiff can parse the schema.
+
+    Mutates the schema in place and returns it for convenience.
+    """
+
+    def _walk(node: object) -> None:
+        if isinstance(node, dict):
+            if (
+                "exclusiveMinimum" in node
+                and isinstance(node["exclusiveMinimum"], (int, float))
+                and not isinstance(node["exclusiveMinimum"], bool)
+            ):
+                value = node["exclusiveMinimum"]
+                if "minimum" not in node:
+                    node["minimum"] = value
+                node["exclusiveMinimum"] = True
+            if (
+                "exclusiveMaximum" in node
+                and isinstance(node["exclusiveMaximum"], (int, float))
+                and not isinstance(node["exclusiveMaximum"], bool)
+            ):
+                value = node["exclusiveMaximum"]
+                if "maximum" not in node:
+                    node["maximum"] = value
+                node["exclusiveMaximum"] = True
+
+            for child in node.values():
+                _walk(child)
+        elif isinstance(node, list):
+            for child in node:
+                _walk(child)
+
+    _walk(schema)
+    return schema
+
+
+def _run_oasdiff_breakage_check(
+    prev_spec: Path, cur_spec: Path
+) -> tuple[list[dict], int]:
+    """Run oasdiff breaking check between two OpenAPI specs.
+
+    Returns (list of breaking changes, exit code from oasdiff).
+    """
+    try:
+        result = subprocess.run(
+            [
+                "oasdiff",
+                "breaking",
+                "-f",
+                "json",
+                "--fail-on",
+                "ERR",
+                str(prev_spec),
+                str(cur_spec),
+            ],
+            capture_output=True,
+            text=True,
+        )
+    except FileNotFoundError:
+        print(
+            "::warning title=oasdiff not found::"
+            "Please install oasdiff: https://github.com/oasdiff/oasdiff"
+        )
+        return [], 0
+
+    breaking_changes = []
+    if result.stdout:
+        try:
+            breaking_changes = json.loads(result.stdout)
+        except json.JSONDecodeError:
+            pass  # Non-JSON output: fall back to the exit code alone.
+
+    return breaking_changes, result.returncode
+
+
+def main() -> int:
+    current_version = _read_version_from_pyproject(AGENT_SERVER_PYPROJECT)
+    baseline_version = _get_baseline_version(PYPI_DISTRIBUTION, current_version)
+
+    if baseline_version is None:
+        print(
+            f"::warning title={PYPI_DISTRIBUTION} REST API::Unable to find baseline "
+            f"version for {current_version}; skipping breakage checks."
+        )
+        return 0
+
+    baseline_git_ref = f"v{baseline_version}"
+
+    static_policy_errors = _find_sdk_deprecated_fastapi_routes(REPO_ROOT)
+    for error in static_policy_errors:
+        print(f"::error title={PYPI_DISTRIBUTION} REST API::{error}")
+
+    current_schema = _generate_current_openapi()
+    if current_schema is None:
+        return 1
+
+    deprecation_policy_errors = _find_deprecation_policy_errors(current_schema)
+    for error in deprecation_policy_errors:
+        print(f"::error title={PYPI_DISTRIBUTION} REST API::{error}")
+
+    prev_schema = _generate_openapi_for_git_ref(baseline_git_ref)
+    if prev_schema is None:
+        return 0 if not (static_policy_errors or deprecation_policy_errors) else 1
+
+    prev_schema = _normalize_openapi_for_oasdiff(prev_schema)
+    current_schema = _normalize_openapi_for_oasdiff(current_schema)
+
+    with tempfile.TemporaryDirectory(prefix="oasdiff-specs-") as tmp:
+        tmp_path = Path(tmp)
+        prev_spec_file = tmp_path / "prev_spec.json"
+        cur_spec_file = tmp_path / "cur_spec.json"
+
+        prev_spec_file.write_text(json.dumps(prev_schema, indent=2))
+        cur_spec_file.write_text(json.dumps(current_schema, indent=2))
+
+        breaking_changes, exit_code = _run_oasdiff_breakage_check(
+            prev_spec_file, cur_spec_file
+        )
+
+    if not breaking_changes:
+        if exit_code == 0:
+            print("No breaking changes detected.")
+        else:
+            print(
+                f"oasdiff exited with code {exit_code} but reported no breaking "
+                "changes in its JSON output. There may be warnings only."
+            )
+    else:
+        removed_operations, other_breaking_changes = _split_breaking_changes(
+            breaking_changes
+        )
+        removal_errors = _validate_removed_operations(
+            removed_operations,
+            prev_schema,
+            current_version,
+        )
+
+        for error in removal_errors:
+            print(f"::error title={PYPI_DISTRIBUTION} REST API::{error}")
+
+        if other_breaking_changes:
+            print(
+                "::error "
+                f"title={PYPI_DISTRIBUTION} REST API::Detected breaking REST API "
+                "changes other than removing previously-deprecated operations. "
+                "REST contract changes must preserve compatibility for 5 minor "
+                "releases; keep the old contract available until its scheduled "
+                "removal version."
+            )
+
+        print("\nBreaking REST API changes detected compared to baseline release:")
+        for change in breaking_changes:
+            print(f"- {change.get('text', str(change))}")
+
+        if not (removal_errors or other_breaking_changes):
+            print(
+                "Breaking changes are limited to previously-deprecated operations "
+                "whose scheduled removal versions have been reached."
+            )
+        else:
+            return 1
+
+    return 1 if (static_policy_errors or deprecation_policy_errors) else 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
@@ -0,0 +1,592 @@
+#!/usr/bin/env python3
+"""Static analysis for deprecation deadlines.
+
+This script scans Python deprecation metadata (`deprecated`, `warn_deprecated`,
+`warn_cleanup`) and agent-server REST routes marked `deprecated=True`. If the
+current project version has reached or passed a feature's removal marker, the
+script fails with a helpful summary so legacy shims and overdue deprecated REST
+endpoints are cleaned up before release.
+"""
+
+from __future__ import annotations
+
+import ast
+import re
+import sys
+import tomllib
+from collections.abc import Iterable, Iterator, Sequence
+from dataclasses import dataclass
+from datetime import date
+from pathlib import Path
+from typing import Literal
+
+from packaging import version as pkg_version
+
+
+REST_ROUTE_DEPRECATION_RE = re.compile(
+    r"Deprecated since v(?P<deprecated>[0-9A-Za-z.+-]+)\s+"
+    r"and scheduled for removal in v(?P<removed>[0-9A-Za-z.+-]+)\.?",
+    re.IGNORECASE,
+)
+ROUTE_DECORATOR_NAMES = {
+    "get",
+    "put",
+    "post",
+    "delete",
+    "patch",
+    "options",
+    "head",
+    "trace",
+    "api_route",
+}
+HTTP_METHODS = ROUTE_DECORATOR_NAMES - {"api_route"}
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+
+
+@dataclass(frozen=True, slots=True)
+class PackageConfig:
+    name: str
+    pyproject: Path
+    source_roots: tuple[Path, ...]
+
+
+PACKAGES: tuple[PackageConfig, ...] = (
+    PackageConfig(
+        name="openhands-sdk",
+        pyproject=REPO_ROOT / "openhands-sdk" / "pyproject.toml",
+        source_roots=(REPO_ROOT / "openhands-sdk" / "openhands" / "sdk",),
+    ),
+    PackageConfig(
+        name="openhands-tools",
+        pyproject=REPO_ROOT / "openhands-tools" / "pyproject.toml",
+        source_roots=(REPO_ROOT / "openhands-tools" / "openhands" / "tools",),
+    ),
+    PackageConfig(
+        name="openhands-workspace",
+        pyproject=REPO_ROOT / "openhands-workspace" / "pyproject.toml",
+        source_roots=(REPO_ROOT / "openhands-workspace" / "openhands" / "workspace",),
+    ),
+    PackageConfig(
+        name="openhands-agent-server",
+        pyproject=REPO_ROOT / "openhands-agent-server" / "pyproject.toml",
+        source_roots=(
+            REPO_ROOT / "openhands-agent-server" / "openhands" / "agent_server",
+        ),
+    ),
+)
+
+
+@dataclass(slots=True)
+class DeprecationRecord:
+    identifier: str
+    removed_in: str | date | None
+    deprecated_in: str | None
+    path: Path
+    line: int
+    kind: Literal["decorator", "warn_call", "cleanup_call", "rest_route"]
+    package: str
+
+
+def _load_current_version(pyproject: Path) -> str:
+    data = tomllib.loads(pyproject.read_text())
+    try:
+        return str(data["project"]["version"])
+    except KeyError as exc:  # pragma: no cover - configuration error
+        raise SystemExit(
+            f"Unable to determine project version from {pyproject}"
+        ) from exc
+
+
+def _iter_python_files(root: Path) -> Iterator[Path]:
+    for path in root.rglob("*.py"):
+        if path.name == "__init__.py" and path.parent == root:
+            continue
+        yield path
+
+
+def _parse_removed_value(
+    node: ast.AST | None,
+    *,
+    path: Path,
+    line: int,
+) -> str | date | None:
+    if node is None:
+        return None
+
+    expression = ast.unparse(node)
+
+    if isinstance(node, ast.Constant):
+        if isinstance(node.value, str):
+            return node.value
+        if node.value is None:
+            return None
+        raise SystemExit(
+            f"Unsupported removed_in literal at {path}:{line}: {expression}"
+        )
+
+    if isinstance(node, ast.Call):
+        func = node.func
+        if isinstance(func, ast.Name) and func.id == "date":
+            try:
+                args = [_safe_int_literal(arg) for arg in node.args]
+                kwargs = {
+                    kw.arg: _safe_int_literal(kw.value)
+                    for kw in node.keywords
+                    if kw.arg is not None
+                }
+            except ValueError as exc:
+                raise SystemExit(
+                    f"Unsupported removed_in date() arguments at {path}:{line}:"
+                    f" {expression}"
+                ) from exc
+
+            if any(kw.arg is None for kw in node.keywords):
+                raise SystemExit(
+                    "Unsupported removed_in date() call (uses **kwargs) at "
+                    f"{path}:{line}: {expression}"
+                )
+
+            try:
+                return date(*args, **kwargs)
+            except (TypeError, ValueError) as exc:
+                raise SystemExit(
+                    f"Invalid removed_in date() call at {path}:{line}: {expression}"
+                ) from exc
+
+        if (
+            isinstance(func, ast.Attribute)
+            and isinstance(func.value, ast.Name)
+            and func.value.id == "date"
+            and func.attr == "today"
+        ):
+            if node.args or node.keywords:
+                raise SystemExit(
+                    "date.today() removed_in call must not include arguments at "
+                    f"{path}:{line}: {expression}"
+                )
+            return date.today()
+
+    raise SystemExit(
+        f"Unsupported removed_in expression at {path}:{line}: {expression}"
+    )
+
+
+def _parse_deprecated_value(
+    node: ast.AST | None,
+    *,
+    path: Path,
+    line: int,
+) -> str | None:
+    if node is None:
+        return None
+
+    expression = ast.unparse(node)
+
+    if isinstance(node, ast.Constant):
+        if isinstance(node.value, str):
+            return node.value
+        if node.value is None:
+            return None
+
+    raise SystemExit(
+        f"Unsupported deprecated_in expression at {path}:{line}: {expression}"
+    )
+
+
+def _safe_int_literal(node: ast.AST) -> int:
+    if not isinstance(node, ast.Constant) or type(node.value) is not int:
+        raise ValueError(
+            f"Unsupported expression inside literal evaluation: {ast.unparse(node)}"
+        )
+    return node.value
+
+
+def _extract_kw(call: ast.Call, name: str) -> ast.AST | None:
+    for kw in call.keywords:
+        if kw.arg == name:
+            return kw.value
+    return None
+
+
+def _extract_string_literal(node: ast.AST | None) -> str | None:
+    if isinstance(node, ast.Constant) and isinstance(node.value, str):
+        return node.value
+    return None
+
+
+def _extract_string_sequence(node: ast.AST | None) -> tuple[str, ...] | None:
+    if not isinstance(node, (ast.List, ast.Tuple, ast.Set)):
+        return None
+
+    values: list[str] = []
+    for item in node.elts:
+        value = _extract_string_literal(item)
+        if value is None:
+            return None
+        values.append(value)
+    return tuple(values)
+
+
+def _extract_route_details(call: ast.Call) -> tuple[tuple[str, str], ...]:
+    target = call.func
+    if not isinstance(target, ast.Attribute):
+        return ()
+
+    decorator_name = target.attr
+    if decorator_name not in ROUTE_DECORATOR_NAMES:
+        return ()
+
+    path = _extract_string_literal(call.args[0] if call.args else None)
+    if path is None:
+        path = _extract_string_literal(_extract_kw(call, "path"))
+    if path is None:
+        return ()
+
+    if decorator_name in HTTP_METHODS:
+        return ((decorator_name.upper(), path),)
+
+    methods = _extract_string_sequence(_extract_kw(call, "methods"))
+    if methods is None:
+        return (("GET", path),)
+
+    return tuple(
+        (method.upper(), path) for method in methods if method.lower() in HTTP_METHODS
+    )
+
+
+def _parse_rest_route_deprecation_docstring(
+    docstring: str | None,
+    *,
+    path: Path,
+    line: int,
+    route_identifiers: Sequence[str],
+) -> tuple[str, str]:
+    if not docstring:
+        raise SystemExit(
+            "Deprecated REST route(s) "
+            f"{', '.join(route_identifiers)} at {path}:{line} must include a "
+            "docstring note like 'Deprecated since vX.Y.Z and scheduled for "
+            "removal in vA.B.C.'"
+        )
+
+    match = REST_ROUTE_DEPRECATION_RE.search(" ".join(docstring.split()))
+    if match is None:
+        raise SystemExit(
+            "Deprecated REST route(s) "
+            f"{', '.join(route_identifiers)} at {path}:{line} must include a "
+            "docstring note like 'Deprecated since vX.Y.Z and scheduled for "
+            "removal in vA.B.C.'"
+        )
+
+    return match.group("deprecated").rstrip("."), match.group("removed").rstrip(".")
+
+
+def _gather_rest_route_deprecations(
+    tree: ast.AST, path: Path, *, package: str
+) -> Iterator[DeprecationRecord]:
+    for node in ast.walk(tree):
+        if not isinstance(node, ast.FunctionDef | ast.AsyncFunctionDef):
+            continue
+
+        routes: list[tuple[str, str]] = []
+        for deco in node.decorator_list:
+            if not isinstance(deco, ast.Call):
+                continue
+            deprecated_value = _extract_kw(deco, "deprecated")
+            if (
+                not isinstance(deprecated_value, ast.Constant)
+                or deprecated_value.value is not True
+            ):
+                continue
+            routes.extend(_extract_route_details(deco))
+
+        if not routes:
+            continue
+
+        deprecated_in, removed_in = _parse_rest_route_deprecation_docstring(
+            ast.get_docstring(node),
+            path=path,
+            line=node.lineno,
+            route_identifiers=[
+                f"{method} {route_path}" for method, route_path in routes
+            ],
+        )
+
+        for method, route_path in routes:
+            yield DeprecationRecord(
+                identifier=f"{method} {route_path}",
+                removed_in=removed_in,
+                deprecated_in=deprecated_in,
+                path=path,
+                line=node.lineno,
+                kind="rest_route",
+                package=package,
+            )
+
+
+def _gather_decorators(
+    tree: ast.AST, path: Path, *, package: str
+) -> Iterator[DeprecationRecord]:
+    for node in ast.walk(tree):
+        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
+            continue
+
+        for deco in node.decorator_list:
+            call = deco if isinstance(deco, ast.Call) else None
+            if call is None:
+                continue
+
+            target = call.func
+            if isinstance(target, ast.Name):
+                decorator_name = target.id
+            elif isinstance(target, ast.Attribute):
+                decorator_name = target.attr
+            else:
+                continue
+
+            if decorator_name != "deprecated":
+                continue
+
+            removed_expr = _extract_kw(call, "removed_in")
+            deprecated_expr = _extract_kw(call, "deprecated_in")
+
+            record = DeprecationRecord(
+                identifier=_build_identifier(node),
+                removed_in=_parse_removed_value(
+                    removed_expr, path=path, line=node.lineno
+                ),
+                deprecated_in=_parse_deprecated_value(
+                    deprecated_expr, path=path, line=node.lineno
+                ),
+                path=path,
+                line=node.lineno,
+                kind="decorator",
+                package=package,
+            )
+            yield record
+
+
+def _gather_warn_calls(
+    tree: ast.AST, path: Path, *, package: str
+) -> Iterator[DeprecationRecord]:
+    for node in ast.walk(tree):
+        if not isinstance(node, ast.Call):
+            continue
+
+        target = node.func
+        if isinstance(target, ast.Name):
+            func_name = target.id
+        elif isinstance(target, ast.Attribute):
+            func_name = target.attr
+        else:
+            continue
+
+        if func_name == "warn_deprecated":
+            identifier_node = node.args[0] if node.args else None
+            if identifier_node is None:
+                continue
+            identifier = ast.unparse(identifier_node)
+
+            removed_expr = _extract_kw(node, "removed_in")
+            deprecated_expr = _extract_kw(node, "deprecated_in")
+
+            yield DeprecationRecord(
+                identifier=identifier,
+                removed_in=_parse_removed_value(
+                    removed_expr, path=path, line=node.lineno
+                ),
+                deprecated_in=_parse_deprecated_value(
+                    deprecated_expr, path=path, line=node.lineno
+                ),
+                path=path,
+                line=node.lineno,
+                kind="warn_call",
+                package=package,
+            )
+        elif func_name == "warn_cleanup":
+            identifier_node = node.args[0] if node.args else None
+            if identifier_node is None:
+                continue
+            identifier = ast.unparse(identifier_node)
+
+            cleanup_expr = _extract_kw(node, "cleanup_by")
+
+            yield DeprecationRecord(
+                identifier=identifier,
+                removed_in=_parse_removed_value(
+                    cleanup_expr, path=path, line=node.lineno
+                ),
+                deprecated_in=None,
+                path=path,
+                line=node.lineno,
+                kind="cleanup_call",
+                package=package,
+            )
+
+
+def _build_identifier(node: ast.AST) -> str:
+    if isinstance(node, ast.ClassDef):
+        return node.name
+    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+        qual_name = node.name
+        if node.decorator_list:
+            parent = getattr(node, "parent", None)
+            if parent and isinstance(parent, ast.ClassDef):
+                return f"{parent.name}.{node.name}"
+        return qual_name
+    return "<unknown>"
+
+
+def _attach_parents(tree: ast.AST) -> None:
+    for node in ast.walk(tree):
+        for child in ast.iter_child_nodes(node):
+            setattr(child, "parent", node)
+
+
+def _collect_records(files: Iterable[Path], *, package: str) -> list[DeprecationRecord]:
+    records: list[DeprecationRecord] = []
+    for path in files:
+        tree = ast.parse(path.read_text())
+        _attach_parents(tree)
+        records.extend(_gather_decorators(tree, path, package=package))
+        records.extend(_gather_warn_calls(tree, path, package=package))
+    return records
+
+
+def _collect_rest_route_records(
+    files: Iterable[Path], *, package: str
+) -> list[DeprecationRecord]:
+    records: list[DeprecationRecord] = []
+    for path in files:
+        tree = ast.parse(path.read_text())
+        records.extend(_gather_rest_route_deprecations(tree, path, package=package))
+    return records
+
+
+def _version_ge(current: str, target: str) -> bool:
+    try:
+        return pkg_version.parse(current) >= pkg_version.parse(target)
+    except pkg_version.InvalidVersion as exc:
+        raise SystemExit(
+            f"Invalid semantic version comparison: {current=} {target=}"
+        ) from exc
+
+
+def _should_fail(current_version: str, record: DeprecationRecord) -> bool:
+    removed = record.removed_in
+    if removed is None:
+        return False
+    if isinstance(removed, date):
+        return date.today() >= removed
+    try:
+        target = str(removed)
+        return _version_ge(current_version, target)
+    except SystemExit:
+        raise
+    except Exception as exc:  # pragma: no cover - unexpected literal type
+        raise SystemExit(
+            f"Unsupported removed_in expression in {record.path}:{record.line}:"
+            f" {removed!r}"
+        ) from exc
+
+
+def _format_record(record: DeprecationRecord) -> str:
+    location = record.path.relative_to(REPO_ROOT)
+    removed = record.removed_in if record.removed_in is not None else "(none)"
+
+    if record.kind == "cleanup_call":
+        return (
+            f"- [{record.package}] {record.identifier} ({record.kind})\n"
+            f"  cleanup by:    {removed}\n"
+            f"  defined at:    {location}:{record.line}"
+        )
+
+    deprecated = (
+        record.deprecated_in if record.deprecated_in is not None else "(unknown)"
+    )
+    return (
+        f"- [{record.package}] {record.identifier} ({record.kind})\n"
+        f"  deprecated in: {deprecated}\n"
+        f"  removed in:    {removed}\n"
+        f"  defined at:    {location}:{record.line}"
+    )
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    argv = list(argv or [])
+
+    overdue: list[DeprecationRecord] = []
+    total_records = 0
+    package_summaries: list[tuple[str, str, int]] = []
+
+    for package in PACKAGES:
+        if not package.pyproject.exists():
+            raise SystemExit(
+                f"Unable to locate pyproject.toml for {package.name}: "
+                f"{package.pyproject}"
+            )
+
+        current_version = _load_current_version(package.pyproject)
+
+        files: list[Path] = []
+        for root in package.source_roots:
+            if not root.exists():
+                raise SystemExit(
+                    f"Source root {root} for package {package.name} does not exist"
+                )
+            files.extend(_iter_python_files(root))
+
+        records = _collect_records(files, package=package.name)
+        if package.name == "openhands-agent-server":
+            records.extend(_collect_rest_route_records(files, package=package.name))
+
+        overdue.extend(r for r in records if _should_fail(current_version, r))
+        total_records += len(records)
+        package_summaries.append((package.name, current_version, len(records)))
+
+    if overdue:
+        deprecated_items = [r for r in overdue if r.kind != "cleanup_call"]
+        cleanup_items = [r for r in overdue if r.kind == "cleanup_call"]
+
+        if deprecated_items:
+            print(
+                "The following deprecated features have passed their removal "
+                "deadline:\n"
+            )
+            for record in deprecated_items:
+                print(_format_record(record))
+                print()
+
+        if cleanup_items:
+            print("The following workarounds have passed their cleanup deadline:\n")
+            for record in cleanup_items:
+                print(_format_record(record))
+                print()
+
+        if deprecated_items:
+            print(
+                "Update or remove the listed features before publishing a version that "
+                "meets or exceeds their removal deadline."
+            )
+        if cleanup_items:
+            print(
+                "Remove the listed workarounds before publishing a version that "
+                "meets or exceeds their cleanup deadline."
+            )
+        return 1
+
+    for package_name, version, count in package_summaries:
+        print(
+            f"{package_name}: checked {count} deprecation metadata entries against "
+            f"version {version}."
+        )
+    print(
+        f"Checked {total_records} deprecation metadata entries across "
+        f"{len(package_summaries)} package(s)."
+    )
+    return 0
+
+
+if __name__ == "__main__":  # pragma: no cover - manual invocation
+    sys.exit(main(sys.argv[1:]))
@@ -0,0 +1,297 @@
+#!/usr/bin/env python3
+"""Validate docstrings conform to MDX-compatible formatting guidelines.
+
+This script checks that docstrings in the SDK use patterns that render correctly
+in Mintlify MDX documentation. It validates:
+
+1. No REPL-style examples (>>>) - should use fenced code blocks instead
+2. Shell/config examples use fenced code blocks (prevents # becoming headers)
+
+Run with: python scripts/check_docstrings.py
+Exit code 0 = all checks pass, 1 = violations found
+"""
+
+import ast
+import sys
+from dataclasses import dataclass
+from pathlib import Path
+
+
+# Directories to check
+SDK_PATHS = [
+    "openhands-sdk/openhands/sdk",
+]
+
+# Files/directories to skip
+SKIP_PATTERNS = [
+    "__pycache__",
+    ".pyc",
+    "test_",
+    "_test.py",
+]
+
+# Core public API files to check strictly (these are documented on the website)
+# Other files will be checked but only emit warnings, not failures
+STRICT_CHECK_FILES = [
+    "agent/agent.py",
+    "llm/llm.py",
+    "conversation/conversation.py",
+    "tool/tool.py",
+    "workspace/base.py",
+    "observability/laminar.py",
+]
+
+
+@dataclass
+class Violation:
+    """A docstring formatting violation."""
+
+    file: Path
+    line: int
+    name: str
+    rule: str
+    message: str
+    is_strict: bool = False  # True if this is in a strictly-checked file
+
+
+def should_skip(path: Path) -> bool:
+    """Check if a path should be skipped."""
+    path_str = str(path)
+    return any(pattern in path_str for pattern in SKIP_PATTERNS)
+
+
+def check_repl_examples(
+    docstring: str, name: str, lineno: int, file: Path
+) -> list[Violation]:
+    """Check for REPL-style examples (>>>).
+
+    These should be replaced with fenced code blocks for better MDX rendering.
+    """
+    violations = []
+    lines = docstring.split("\n")
+
+    for i, line in enumerate(lines):
+        stripped = line.strip()
+        if stripped.startswith(">>>"):
+            violations.append(
+                Violation(
+                    file=file,
+                    line=lineno + i,
+                    name=name,
+                    rule="no-repl-examples",
+                    message=(
+                        "Use fenced code blocks (```python) instead of >>> REPL style. "
+                        "REPL examples don't render well in MDX documentation."
+                    ),
+                )
+            )
+            # Only report once per docstring
+            break
+
+    return violations
+
+
+def check_unfenced_shell_config(
+    docstring: str, name: str, lineno: int, file: Path
+) -> list[Violation]:
+    """Check for shell/config examples that aren't in fenced code blocks.
+
+    Lines starting with # outside code blocks become markdown headers.
+    """
+    violations = []
+    lines = docstring.split("\n")
+    in_code_block = False
+
+    for i, line in enumerate(lines):
+        stripped = line.strip()
+
+        # Track code block state
+        if stripped.startswith("```"):
+            in_code_block = not in_code_block
+            continue
+
+        # Skip if inside a code block
+        if in_code_block:
+            continue
+
+        # Lines starting with "#" but not "# " (e.g. "#!/bin/sh") are skipped
+        # here; only "# " comments are treated as potential headers below.
+        if stripped.startswith("#") and not stripped.startswith("# "):
+            # No space after "#": not considered by this check
+            continue
+
+        # Check for unfenced config: KEY=VALUE followed by # comment
+        if i > 0:
+            prev_line = lines[i - 1].strip()
+            # If previous line looks like config (VAR=value) and this is a # comment
+            if "=" in prev_line and prev_line.split("=")[0].isupper():
+                if stripped.startswith("# "):
+                    violations.append(
+                        Violation(
+                            file=file,
+                            line=lineno + i,
+                            name=name,
+                            rule="fenced-shell-config",
+                            message=(
+                                "Shell/config examples with # comments should be "
+                                "in ```bash code blocks. Otherwise # becomes a "
+                                "markdown header."
+                            ),
+                        )
+                    )
+                    # Only report once per docstring
+                    break
+
+    return violations
+
+
+def check_docstring(
+    docstring: str, name: str, lineno: int, file: Path
+) -> list[Violation]:
+    """Run all checks on a docstring."""
+    if not docstring:
+        return []
+
+    violations = []
+    violations.extend(check_repl_examples(docstring, name, lineno, file))
+    violations.extend(check_unfenced_shell_config(docstring, name, lineno, file))
+    return violations
+
+
+def get_docstrings_from_file(file: Path) -> list[tuple[str, str, int]]:
+    """Extract all docstrings from a Python file.
+
+    Returns a list of (name, docstring, lineno) tuples.
+    """
+    try:
+        source = file.read_text()
+        tree = ast.parse(source)
+    except (SyntaxError, UnicodeDecodeError) as e:
+        print(f"Warning: Could not parse {file}: {e}", file=sys.stderr)
+        return []
+
+    docstrings = []
+
+    for node in ast.walk(tree):
+        name = None
+        lineno = 0
+        docstring = None
+
+        if isinstance(node, ast.Module):
+            docstring = ast.get_docstring(node)
+            name = file.stem
+            lineno = 1
+        elif isinstance(node, ast.ClassDef):
+            docstring = ast.get_docstring(node)
+            name = node.name
+            lineno = node.lineno
+        elif isinstance(node, ast.FunctionDef | ast.AsyncFunctionDef):
+            docstring = ast.get_docstring(node)
+            name = node.name
+            lineno = node.lineno
+
+        if docstring and name:
+            docstrings.append((name, docstring, lineno))
+
+    return docstrings
+
+
+def is_strict_file(file: Path, repo_root: Path) -> bool:
+    """Check if a file is in the strict check list."""
+    try:
+        rel_path = file.relative_to(repo_root / "openhands-sdk/openhands/sdk")
+        return any(str(rel_path) == strict for strict in STRICT_CHECK_FILES)
+    except ValueError:
+        return False
+
+
+def check_file(file: Path, repo_root: Path) -> list[Violation]:
+    """Check all docstrings in a file."""
+    violations = []
+    is_strict = is_strict_file(file, repo_root)
+
+    for name, docstring, lineno in get_docstrings_from_file(file):
+        file_violations = check_docstring(docstring, name, lineno, file)
+        for v in file_violations:
+            v.is_strict = is_strict
+        violations.extend(file_violations)
+
+    return violations
+
+
+def main() -> int:
+    """Run docstring checks on all SDK files."""
+    repo_root = Path(__file__).resolve().parent.parent.parent
+
+    all_violations: list[Violation] = []
+    files_checked = 0
+
+    for sdk_path in SDK_PATHS:
+        path = repo_root / sdk_path
+        if not path.exists():
+            print(f"Warning: Path not found: {path}", file=sys.stderr)
+            continue
+
+        for py_file in path.rglob("*.py"):
+            if should_skip(py_file):
+                continue
+
+            files_checked += 1
+            violations = check_file(py_file, repo_root)
+            all_violations.extend(violations)
+
+    # Separate strict violations (errors) from warnings
+    strict_violations = [v for v in all_violations if v.is_strict]
+    warning_violations = [v for v in all_violations if not v.is_strict]
+
+    # Report warnings (non-strict files)
+    if warning_violations:
+        count = len(warning_violations)
+        print(f"\n⚠️  Found {count} docstring warning(s) in non-core files:\n")
+
+        by_file: dict[Path, list[Violation]] = {}
+        for v in warning_violations:
+            by_file.setdefault(v.file, []).append(v)
+
+        for file, violations in sorted(by_file.items()):
+            rel_path = file.relative_to(repo_root)
+            print(f"📄 {rel_path}")
+            for v in violations:
+                print(f"   Line {v.line}: {v.name} ({v.rule})")
+        print()
+
+    # Report errors (strict files)
+    if strict_violations:
+        count = len(strict_violations)
+        print(f"\n❌ Found {count} docstring error(s) in core API files:\n")
+
+        by_file: dict[Path, list[Violation]] = {}
+        for v in strict_violations:
+            by_file.setdefault(v.file, []).append(v)
+
+        for file, violations in sorted(by_file.items()):
+            rel_path = file.relative_to(repo_root)
+            print(f"📄 {rel_path}")
+            for v in violations:
+                print(f"   Line {v.line}: {v.name}")
+                print(f"   Rule: {v.rule}")
+                print(f"   {v.message}")
+                print()
+
+        print("=" * 60)
+        print("To fix these issues:")
+        print("  1. Replace >>> examples with ```python code blocks")
+        print("  2. Wrap shell/config examples in ```bash code blocks")
+        print("=" * 60)
+        return 1
+
+    if warning_violations:
+        count = len(warning_violations)
+        print(f"✅ Core API files pass. {count} warnings in other files.")
+    else:
+        print(f"✅ All {files_checked} files pass docstring checks")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
@@ -0,0 +1,209 @@
+#!/usr/bin/env python3
+"""
+Check if all examples in agent-sdk are documented in the docs repository.
+
+This script:
+1. Scans the docs repository for references to example files
+2. Lists all example Python files in the agent-sdk repository
+3. Compares the two sets to find undocumented examples
+4. Exits with error code 1 if undocumented examples are found
+"""
+
+import os
+import re
+import sys
+from pathlib import Path
+
+
+def find_documented_examples(docs_path: Path) -> set[str]:
+    """
+    Find all example file references in the docs repository.
+
+    Searches for patterns like:
+    - examples/01_standalone_sdk/02_custom_tools.py
+    - examples/02_remote_agent_server/06_custom_tool/custom_tools/log_data.py
+    in MDX (.mdx) and Markdown (.md) files.
+
+    Returns:
+        Set of normalized example file paths (relative to agent-sdk root)
+    """
+    documented_examples: set[str] = set()
+
+    # Pattern to match example file references with arbitrary nesting depth.
+    # Matches: examples/<dir>/.../<file>.py
+    pattern = r"examples/(?:[-\w]+/)+[-\w]+\.py"
+
+    for root, _, files in os.walk(docs_path):
+        for file in files:
+            if file.endswith(".mdx") or file.endswith(".md"):
+                file_path = Path(root) / file
+                try:
+                    content = file_path.read_text(encoding="utf-8")
+                    matches = re.findall(pattern, content)
+                    for match in matches:
+                        # Matches are already agent-sdk-relative paths
+                        documented_examples.add(match)
+                except Exception as e:
+                    print(f"Warning: Error reading {file_path}: {e}")
+                    continue
+
+    return documented_examples
+
+
+def find_agent_sdk_examples(agent_sdk_path: Path) -> set[str]:
+    """
+    Find all example Python files in the agent-sdk repository.
+
+    Excludes examples/03_github_workflows/ (YAML workflows with Python helpers),
+    examples/04_llm_specific_tools/, and __init__.py files.
+
+    Returns:
+        Set of example file paths (relative to agent-sdk root)
+    """
+    examples: set[str] = set()
+    examples_dir = agent_sdk_path / "examples"
+
+    if not examples_dir.exists():
+        print(f"Error: Examples directory not found: {examples_dir}")
+        sys.exit(1)
+
+    # Find all Python files under examples/
+    for root, _, files in os.walk(examples_dir):
+        for file in files:
+            if file.endswith(".py"):
+                file_path = Path(root) / file
+                # Get relative path from agent-sdk root
+                relative_path = file_path.relative_to(agent_sdk_path)
+                relative_path_str = relative_path.as_posix()
+
+                # Skip GitHub workflow examples (those are YAML files; the Python
+                # files there are just helpers)
+                if relative_path_str.startswith("examples/03_github_workflows/"):
+                    continue
+
+                # Skip LLM-specific tools examples: these are intentionally not
+                # enforced by the docs check. See discussion in PR #1486.
+                if relative_path_str.startswith("examples/04_llm_specific_tools/"):
+                    continue
+
+                # Skip __init__.py files as they typically don't need documentation
+                if file == "__init__.py":
+                    continue
+
+                examples.add(relative_path_str)
+
+    return examples
+
+
+def resolve_paths() -> tuple[Path, Path]:
+    """
+    Determine agent-sdk root and docs path.
+
+    Priority for docs path:
+      1) DOCS_PATH (env override)
+      2) $GITHUB_WORKSPACE/docs
+      3) agent_sdk_root/'docs'
+      4) agent_sdk_root.parent/'docs'
+
+    Returns:
+        Tuple of (agent_sdk_root, docs_path)
+    """
+    # agent-sdk repo root (script is at agent-sdk/.github/scripts/...)
+    script_file = Path(__file__).resolve()
+    agent_sdk_root = script_file.parent.parent.parent
+
+    candidates: list[Path] = []
+
+    # 1) Explicit env override
+    env_override = os.environ.get("DOCS_PATH")
+    if env_override:
+        candidates.append(Path(env_override).expanduser().resolve())
+
+    # 2) Standard GitHub workspace sibling
+    gh_ws = os.environ.get("GITHUB_WORKSPACE")
+    if gh_ws:
+        candidates.append(Path(gh_ws).resolve() / "docs")
+
+    # 3) docs/ directory inside the agent-sdk repo root
+    candidates.append(agent_sdk_root / "docs")
+
+    # 4) Parent-of-agent-sdk-root layout
+    candidates.append(agent_sdk_root.parent / "docs")
+
+    print(f"🔍 Agent SDK root: {agent_sdk_root}")
+    print("🔎 Trying docs paths (in order):")
+    for p in candidates:
+        print(f"   - {p}")
+
+    for p in candidates:
+        if p.exists():
+            print(f"📁 Using docs path: {p}")
+            return agent_sdk_root, p
+
+    # If none exist, fail with a helpful message
+    print("❌ Docs path not found in any of the expected locations.")
+    print("   Set DOCS_PATH, or check out the repo to one of the tried paths above.")
+    sys.exit(1)
+
+
+def main() -> None:
+    agent_sdk_root, docs_path = resolve_paths()
+
+    print("\n" + "=" * 60)
+    print("Checking documented examples...")
+    print("=" * 60)
+
+    # Find all examples in agent-sdk
+    print("\n📋 Scanning agent-sdk examples...")
+    agent_examples = find_agent_sdk_examples(agent_sdk_root)
+    print(f"   Found {len(agent_examples)} example file(s)")
+
+    # Find all documented examples in docs
+    print("\n📄 Scanning docs repository...")
+    documented_examples = find_documented_examples(docs_path)
+    print(f"   Found {len(documented_examples)} documented example(s)")
+
+    # Calculate difference
+    undocumented = agent_examples - documented_examples
+
+    print("\n" + "=" * 60)
+    if undocumented:
+        print(f"❌ Found {len(undocumented)} undocumented example(s):")
+        print("=" * 60)
+        for example in sorted(undocumented):
+            print(f"   - {example}")
+        print("\n⚠️  Please add documentation for these examples in the docs repo.")
+        print("=" * 60)
+        print("\n📚 How to Document Examples:")
+        print("=" * 60)
+        print("1. Clone the docs repository:")
+        print("   git clone https://github.com/OpenHands/docs.git")
+        print()
+        print("2. Create a new .mdx file in the sdk/guides/ directory")
+        print("   (e.g., sdk/guides/my-feature.mdx)")
+        print()
+        print("3. Add the example code block with this format:")
+        print('   ```python icon="python" expandable examples/path/to/file.py')
+        print("   <code will be auto-synced>")
+        print("   ```")
+        print()
+        print("4. See the format documentation at:")
+        print(
+            "   https://github.com/OpenHands/docs/blob/main/.github/scripts/README.md"
+        )
+        print()
+        print("5. Example documentation files can be found in:")
+        print("   https://github.com/OpenHands/docs/tree/main/sdk/guides")
+        print()
+        print("6. After creating the PR in the docs repo, reference it in your")
+        print("   agent-sdk PR description.")
+        print("=" * 60)
+        sys.exit(1)
+    else:
+        print("✅ All examples are documented!")
+        print("=" * 60)
+        sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,104 @@
+#!/usr/bin/env python3
+"""
+Check for duplicate example numbers in the examples directory.
+
+This script ensures that within each examples subdirectory, no two files or
+folders share the same numeric prefix (e.g., two files both starting with "04_").
+
+Exit codes:
+    0 - No duplicates found
+    1 - Duplicates found
+"""
+
+import re
+import sys
+from collections import defaultdict
+from pathlib import Path
+
+
+def find_duplicate_numbers(examples_dir: Path) -> dict[str, list[str]]:
+    """
+    Find duplicate example numbers within each subdirectory.
+
+    Returns:
+        Dictionary mapping subdirectory paths to lists of duplicate entries.
+        Only includes subdirectories that have duplicates.
+    """
+    duplicates: dict[str, list[str]] = {}
+
+    # Pattern to extract leading number from filename/dirname
+    # e.g., "04" from "04_foo.py"
+    number_pattern = re.compile(r"^(\d+)_")
+
+    for subdir in sorted(examples_dir.iterdir()):
+        if not subdir.is_dir():
+            continue
+
+        # Skip hidden directories
+        if subdir.name.startswith("."):
+            continue
+
+        # Group entries by their numeric prefix
+        number_to_entries: dict[str, list[str]] = defaultdict(list)
+
+        for entry in subdir.iterdir():
+            # Skip hidden files/directories
+            if entry.name.startswith("."):
+                continue
+
+            match = number_pattern.match(entry.name)
+            if match:
+                number = match.group(1)
+                number_to_entries[number].append(entry.name)
+
+        # Find numbers with multiple entries
+        subdir_duplicates = []
+        for number, entries in sorted(number_to_entries.items()):
+            if len(entries) > 1:
+                subdir_duplicates.extend(sorted(entries))
+
+        if subdir_duplicates:
+            relative_subdir = str(subdir.relative_to(examples_dir.parent))
+            duplicates[relative_subdir] = subdir_duplicates
+
+    return duplicates
+
+
+def main() -> None:
+    # Find the examples directory relative to this script
+    script_file = Path(__file__).resolve()
+    repo_root = script_file.parent.parent.parent
+    examples_dir = repo_root / "examples"
+
+    if not examples_dir.exists():
+        print(f"Error: Examples directory not found: {examples_dir}")
+        sys.exit(1)
+
+    print("=" * 60)
+    print("Checking for duplicate example numbers...")
+    print("=" * 60)
+    print(f"\n📁 Scanning: {examples_dir}\n")
+
+    duplicates = find_duplicate_numbers(examples_dir)
+
+    if duplicates:
+        print("❌ Found duplicate example numbers:\n")
+        for subdir, entries in sorted(duplicates.items()):
+            print(f"  {subdir}/")
+            for entry in entries:
+                print(f"    - {entry}")
+            print()
+
+        print("=" * 60)
+        print("⚠️  Please renumber the examples to remove duplicates.")
+        print("   Each example should have a unique number within its folder.")
+        print("=" * 60)
+        sys.exit(1)
+    else:
+        print("✅ No duplicate example numbers found!")
+        print("=" * 60)
+        sys.exit(0)
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,826 @@
+#!/usr/bin/env python3
+"""API breakage detection for published OpenHands packages using Griffe.
+
+This script compares current workspace packages against the most recent PyPI
+release (or the matching release if the current version is already published)
+to detect breaking changes in the public API.
+
+It focuses on the curated public surface:
+- symbols exported via ``__all__``
+- public members removed from classes exported via ``__all__``
+
+It enforces two policies:
+
+1. **Deprecation-before-removal** – any removed export or removed public class
+   member must have been marked deprecated in the *previous* release using the
+   canonical deprecation helpers (``@deprecated`` decorator or
+   ``warn_deprecated()`` call from ``openhands.sdk.utils.deprecation``). For
+   members, the recommended ``warn_deprecated`` feature name is qualified (e.g.
+   ``"LLM.some_method"``).
+
+2. **MINOR version bump** – any breaking change (removal or structural) requires
+   at least a MINOR version bump according to SemVer.
+
+Complementary to the deprecation mechanism:
+- Deprecation (``check_deprecations.py``): enforces cleanup deadlines
+- This script: prevents unannounced removals and enforces SemVer bumps
+"""
+
+from __future__ import annotations
+
+import ast
+import json
+import os
+import re
+import subprocess
+import sys
+import tomllib
+import urllib.request
+from collections.abc import Iterable
+from dataclasses import dataclass
+from pathlib import Path
+
+from packaging import version as pkg_version
+from packaging.requirements import Requirement
+
+
+@dataclass(frozen=True)
+class PackageConfig:
+    """Configuration for a single published package."""
+
+    package: str  # dotted module path, e.g. "openhands.sdk"
+    distribution: str  # PyPI distribution name, e.g. "openhands-sdk"
+    source_dir: str  # repo-relative directory, e.g. "openhands-sdk"
+
+
+@dataclass(frozen=True, slots=True)
+class DeprecatedSymbols:
+    """Deprecated SDK symbols detected in a source tree.
+
+    ``top_level`` tracks module-level symbols (exports) like ``LLM``.
+    ``qualified`` tracks class members like ``LLM.some_method``.
+    """
+
+    top_level: set[str] = frozenset()  # type: ignore[assignment]
+    qualified: set[str] = frozenset()  # type: ignore[assignment]
+
+
+PACKAGES: tuple[PackageConfig, ...] = (
+    PackageConfig(
+        package="openhands.sdk",
+        distribution="openhands-sdk",
+        source_dir="openhands-sdk",
+    ),
+    PackageConfig(
+        package="openhands.workspace",
+        distribution="openhands-workspace",
+        source_dir="openhands-workspace",
+    ),
+    PackageConfig(
+        package="openhands.tools",
+        distribution="openhands-tools",
+        source_dir="openhands-tools",
+    ),
+)
+
+ACP_DEPENDENCY = "agent-client-protocol"
+ACP_SKIP_ENV = "ACP_VERSION_CHECK_SKIP"
+ACP_SKIP_TOKEN = "skip-acp-check"
+ACP_BASE_REF_ENV = "ACP_VERSION_CHECK_BASE_REF"
+
+
+def read_version_from_pyproject(path: str) -> str:
+    """Read the version string from a pyproject.toml file."""
+    with open(path, "rb") as f:
+        data = tomllib.load(f)
+    proj = data.get("project", {})
+    v = proj.get("version")
+    if not v:
+        raise SystemExit(f"Could not read version from {path}")
+    return str(v)
+
+
+def _read_pyproject(path: str) -> dict:
+    with open(path, "rb") as f:
+        return tomllib.load(f)
+
+
+def _bool_env(name: str) -> bool:
+    value = os.environ.get(name, "").strip().lower()
+    return value in {"1", "true", "yes", "on"}
+
+
+def _get_dependency_spec(project_data: dict, dependency: str) -> str | None:
+    deps = project_data.get("project", {}).get("dependencies", [])
+    for dep in deps:
+        if Requirement(dep).name == dependency:
+            return dep
+    return None
+
+
+def _min_version_from_requirement(req_str: str) -> pkg_version.Version | None:
+    try:
+        req = Requirement(req_str)
+    except Exception as exc:
+        print(
+            f"::warning title=ACP version::Unable to parse requirement "
+            f"'{req_str}': {exc}"
+        )
+        return None
+
+    lower_bounds: list[pkg_version.Version] = []
+    for spec in req.specifier:
+        if spec.operator in {">=", ">", "==", "~="}:
+            try:
+                lower_bounds.append(_parse_version(spec.version))
+            except Exception as exc:
+                print(
+                    f"::warning title=ACP version::Unable to parse version "
+                    f"'{spec.version}' from '{req_str}': {exc}"
+                )
+
+    if not lower_bounds:
+        return None
+
+    return max(lower_bounds)
+
+
+def _git_show_file(ref: str, rel_path: str) -> str | None:
+    for candidate in (f"origin/{ref}", ref):
+        result = subprocess.run(
+            ["git", "show", f"{candidate}:{rel_path}"],
+            check=False,
+            capture_output=True,
+            text=True,
+        )
+        if result.returncode == 0:
+            return result.stdout
+    return None
+
+
+def _load_base_pyproject(base_ref: str) -> dict | None:
+    rel_path = "openhands-sdk/pyproject.toml"
+    content = _git_show_file(base_ref, rel_path)
+    if content is None:
+        print(
+            f"::warning title=ACP version::Unable to read {rel_path} from "
+            f"{base_ref}; skipping ACP version check"
+        )
+        return None
+    try:
+        return tomllib.loads(content)
+    except tomllib.TOMLDecodeError as exc:
+        print(
+            f"::warning title=ACP version::Failed to parse {rel_path} from "
+            f"{base_ref}: {exc}"
+        )
+        return None
+
+
+def _check_acp_version_bump(repo_root: str) -> int:
+    if _bool_env(ACP_SKIP_ENV):
+        print(
+            f"::notice title=ACP version::Skipping ACP version check because "
+            f"{ACP_SKIP_ENV} is set (token: [{ACP_SKIP_TOKEN}])."
+        )
+        return 0
+
+    base_ref = os.environ.get(ACP_BASE_REF_ENV) or os.environ.get("GITHUB_BASE_REF")
+    if not base_ref:
+        print(
+            "::warning title=ACP version::No base ref found; skipping ACP version check"
+        )
+        return 0
+
+    base_data = _load_base_pyproject(base_ref)
+    if base_data is None:
+        return 0
+
+    current_data = _read_pyproject(
+        os.path.join(repo_root, "openhands-sdk", "pyproject.toml")
+    )
+    old_req = _get_dependency_spec(base_data, ACP_DEPENDENCY)
+    new_req = _get_dependency_spec(current_data, ACP_DEPENDENCY)
+
+    if not old_req or not new_req:
+        print(
+            f"::warning title=ACP version::Unable to locate {ACP_DEPENDENCY} "
+            "dependency in pyproject.toml; skipping ACP version check"
+        )
+        return 0
+
+    old_min = _min_version_from_requirement(old_req)
+    new_min = _min_version_from_requirement(new_req)
+
+    if old_min is None or new_min is None:
+        print(
+            f"::warning title=ACP version::Unable to parse {ACP_DEPENDENCY} "
+            "minimum version; skipping ACP version check"
+        )
+        return 0
+
+    if new_min <= old_min:
+        return 0
+
+    if new_min.major != old_min.major or new_min.minor != old_min.minor:
+        print(
+            "::error title=ACP version::Detected "
+            f"{ACP_DEPENDENCY} minor/major version bump "
+            f"({old_req} -> {new_req}). If intentional, add "
+            f"[{ACP_SKIP_TOKEN}] to the PR description to bypass."
+        )
+        return 1
+
+    return 0
+
+
+def _parse_version(v: str) -> pkg_version.Version:
+    """Parse a version string using packaging."""
+    return pkg_version.parse(v)
+
+
+def get_pypi_baseline_version(pkg: str, current: str | None) -> str | None:
+    """Fetch the baseline release version from PyPI.
+
+    The baseline is the most recent published release to compare against the
+    current workspace. If the current version already exists on PyPI, compare
+    against that same release. Otherwise, fall back to the newest release older
+    than the current version. If ``current`` is None, use the latest release.
+
+    Args:
+        pkg: Package name on PyPI (e.g., "openhands-sdk")
+        current: Current version from the workspace, or None for latest
+
+    Returns:
+        Baseline version string, or None if not found or on network error
+    """
+    req = urllib.request.Request(
+        url=f"https://pypi.org/pypi/{pkg}/json",
+        headers={"User-Agent": "openhands-sdk-api-check/1.0"},
+        method="GET",
+    )
+    try:
+        with urllib.request.urlopen(req, timeout=10) as r:
+            meta = json.load(r)
+    except Exception as e:
+        print(f"::warning title={pkg} API::Failed to fetch PyPI metadata: {e}")
+        return None
+
+    releases = list(meta.get("releases", {}).keys())
+    if not releases:
+        return None
+
+    def _sort_key(s: str):
+        return _parse_version(s)
+
+    releases_sorted = sorted(releases, key=_sort_key, reverse=True)
+    if current is None:
+        return releases_sorted[0]
+
+    if current in releases:
+        return current
+
+    cur_parsed = _parse_version(current)
+    older = [rv for rv in releases if _parse_version(rv) < cur_parsed]
+    if not older:
+        return None
+    return sorted(older, key=_sort_key, reverse=True)[0]
+
+
+def ensure_griffe() -> None:
+    """Verify griffe is installed, raising an error if not."""
+    try:
+        import griffe  # noqa: F401
+    except ImportError:
+        sys.stderr.write(
+            "ERROR: griffe not installed. Install with: pip install griffe[pypi]\n"
+        )
+        raise SystemExit(1)
+
+
+def _is_field_metadata_only_change(old_val: object, new_val: object) -> bool:
+    """Check if the change is only in Field metadata (description, title, etc.).
+
+    Field metadata parameters like ``description``, ``title``, ``examples``, and
+    ``deprecated`` don't affect runtime behavior. Changes to these should not be
+    considered breaking API changes.
+
+    Returns:
+        True if both values are Field() calls and only metadata parameters differ.
+    """
+    old_str = str(old_val)
+    new_str = str(new_val)
+
+    if not (old_str.startswith("Field(") and new_str.startswith("Field(")):
+        return False
+
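+    # Metadata parameters that don't affect runtime behavior.
+    # See https://docs.pydantic.dev/latest/api/fields/#pydantic.fields.Field
+    # Limitation: the string patterns below only match simple quoted values
+    # with no embedded quote characters. List- or dict-valued arguments (common
+    # for ``examples`` and ``json_schema_extra``) are not stripped, so changes
+    # to them are still reported.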
+    metadata_patterns = {
+        "description": r'([\'"])([^\'"]*?)\1',
+        "title": r'([\'"])([^\'"]*?)\1',
+        "examples": r'([\'"])([^\'"]*?)\1',
+        "json_schema_extra": r'([\'"])([^\'"]*?)\1',
+        "deprecated": r"(?:True|False|None|'[^']*'|\"[^\"]*\")",
+    }
+
+    def _normalize(value: str) -> str:
+        normalized = value
+        for param, value_pattern in metadata_patterns.items():
+            pattern = rf",?\s*{param}\s*=\s*{value_pattern}"
+            normalized = re.sub(pattern, "", normalized)
+
+        normalized = re.sub(r"\(\s*,", "(", normalized)
+        normalized = re.sub(r",\s*\)", ")", normalized)
+        normalized = re.sub(r",\s*,", ", ", normalized)
+        normalized = re.sub(r"\s+", " ", normalized)
+        return normalized.strip()
+
+    return _normalize(old_str) == _normalize(new_str)
+
+
+def _collect_breakages_pairs(
+    objs: Iterable[tuple[object, object]],
+    *,
+    deprecated: DeprecatedSymbols,
+    title: str,
+) -> tuple[list[object], int]:
+    """Find breaking changes between pairs of old/new API objects.
+
+    Only reports breakages for public API members.
+
+    Returns:
+        (breakages, undeprecated_removals)
+    """
+
+    import griffe
+    from griffe import Alias, AliasResolutionError, BreakageKind, ExplanationStyle, Kind
+
+    breakages: list[object] = []
+    undeprecated_removals = 0
+
+    for old, new in objs:
+        try:
+            for br in griffe.find_breaking_changes(old, new):
+                obj = getattr(br, "obj", None)
+                if not getattr(obj, "is_public", True):
+                    continue
+
+                # Skip ATTRIBUTE_CHANGED_VALUE when it's just Field metadata changes
+                # (description, title, examples, etc.) - these don't affect runtime
+                if br.kind == BreakageKind.ATTRIBUTE_CHANGED_VALUE:
+                    old_value = getattr(br, "old_value", None)
+                    new_value = getattr(br, "new_value", None)
+                    if _is_field_metadata_only_change(old_value, new_value):
+                        print(
+                            f"::notice title={title}::Ignoring Field metadata-only "
+                            f"change (non-breaking): {obj.name if obj else 'unknown'}"
+                        )
+                        continue
+
+                print(br.explain(style=ExplanationStyle.GITHUB))
+                breakages.append(br)
+
+                if br.kind != BreakageKind.OBJECT_REMOVED:
+                    continue
+
+                parent = getattr(obj, "parent", None)
+                if getattr(parent, "kind", None) != Kind.CLASS:
+                    continue
+
+                feature = f"{parent.name}.{obj.name}"
+                if (
+                    feature not in deprecated.qualified
+                    and parent.name not in deprecated.top_level
+                ):
+                    print(
+                        f"::error title={title}::Removed '{feature}' without prior "
+                        "deprecation. Mark it with @deprecated(...) or "
+                        f"warn_deprecated('{feature}', ...) for at least one release "
+                        "before removing."
+                    )
+                    undeprecated_removals += 1
+        except AliasResolutionError as e:
+            if isinstance(old, Alias) or isinstance(new, Alias):
+                old_target = old.target_path if isinstance(old, Alias) else None
+                new_target = new.target_path if isinstance(new, Alias) else None
+                if old_target != new_target:
+                    name = getattr(old, "name", None) or getattr(
+                        new, "name", "<unknown>"
+                    )
+                    print(
+                        f"::warning title={title}::Alias target changed for '{name}': "
+                        f"{old_target!r} -> {new_target!r}"
+                    )
+                    breakages.append(
+                        {
+                            "kind": "ALIAS_TARGET_CHANGED",
+                            "name": name,
+                            "old": old_target,
+                            "new": new_target,
+                        }
+                    )
+            else:
+                print(
+                    f"::notice title={title}::Skipping symbol comparison due to "
+                    f"unresolved alias: {e}"
+                )
+        except Exception as e:
+            print(f"::warning title={title}::Failed to compute breakages: {e}")
+
+    return breakages, undeprecated_removals
+
+
+def _extract_exported_names(module) -> set[str]:
+    """Extract names exported from a module via ``__all__``.
+
+    This check is explicitly meant to track the curated public surface. The SDK
+    is expected to define ``__all__`` in ``openhands.sdk``; if it's missing or we
+    can't statically interpret it, we fail fast rather than silently widening the
+    surface area (which would make the check noisy and brittle).
+    """
+    try:
+        all_var = module["__all__"]
+    except Exception as e:
+        raise ValueError("Expected __all__ to be defined on the public module") from e
+
+    val = getattr(all_var, "value", None)
+    elts = getattr(val, "elements", None)
+    if not elts:
+        raise ValueError("Unable to statically evaluate __all__")
+
+    names: set[str] = set()
+    for el in elts:
+        # Griffe represents string literals in __all__ in different ways depending
+        # on how the module is loaded / griffe version:
+        # - sometimes as plain Python strings (including quotes, e.g. "'LLM'")
+        # - sometimes as expression nodes with a `.value` attribute
+        #
+        # We intentionally only support the "static __all__ of string literals"
+        # case; we just normalize the representation.
+        if isinstance(el, str):
+            names.add(el.strip("\"'"))
+            continue
+        s = getattr(el, "value", None)
+        if isinstance(s, str):
+            names.add(s)
+
+    if not names:
+        raise ValueError("__all__ resolved to an empty set")
+
+    return names
+
+
+def _check_version_bump(prev: str, new_version: str, total_breaks: int) -> int:
+    """Check if version bump policy is satisfied for breaking changes.
+
+    Policy: Breaking changes require at least a MINOR version bump.
+
+    Returns:
+        0 if policy satisfied, 1 if not
+    """
+    if total_breaks == 0:
+        print("No breaking changes detected")
+        return 0
+
+    parsed_prev = _parse_version(prev)
+    parsed_new = _parse_version(new_version)
+
+    # At least a MINOR bump is required: a higher major, or the same major
+    # with a higher minor.
+    ok = (parsed_new.major > parsed_prev.major) or (
+        parsed_new.major == parsed_prev.major and parsed_new.minor > parsed_prev.minor
+    )
+
+    if not ok:
+        print(
+            f"::error title=SemVer::Breaking changes detected ({total_breaks}); "
+            f"require at least minor version bump from "
+            f"{parsed_prev.major}.{parsed_prev.minor}.x, but new is {new_version}"
+        )
+        return 1
+
+    print(
+        f"Breaking changes detected ({total_breaks}) and version bump policy "
+        f"satisfied ({prev} -> {new_version})"
+    )
+    return 0
+
+
+def _resolve_griffe_object(
+    root: object,
+    dotted: str,
+    root_package: str = "",
+) -> object:
+    """Resolve a dotted path to a griffe object."""
+    root_path = getattr(root, "path", None)
+    if root_path == dotted:
+        return root
+
+    if isinstance(root_path, str) and dotted.startswith(root_path + "."):
+        dotted = dotted[len(root_path) + 1 :]
+
+    try:
+        return root[dotted]
+    except (KeyError, TypeError) as e:
+        print(
+            f"::warning title=SDK API::Unable to resolve {dotted} via "
+            f"direct lookup; falling back to manual traversal: {e}"
+        )
+
+    rel = dotted
+    if root_package and dotted.startswith(root_package + "."):
+        rel = dotted[len(root_package) + 1 :]
+
+    obj = root
+    for part in rel.split("."):
+        try:
+            obj = obj[part]
+        except (KeyError, TypeError) as e:
+            raise KeyError(f"Unable to resolve {dotted}: failed at {part}") from e
+    return obj
+
+
+def _load_current(
+    griffe_module: object, repo_root: str, cfg: PackageConfig
+) -> object | None:
+    try:
+        return griffe_module.load(
+            cfg.package,
+            search_paths=[os.path.join(repo_root, cfg.source_dir)],
+        )
+    except Exception as e:
+        print(
+            f"::error title={cfg.distribution} API::"
+            f"Failed to load current {cfg.distribution}: {e}"
+        )
+        return None
+
+
+def _load_prev_from_pypi(
+    griffe_module: object,
+    prev: str,
+    cfg: PackageConfig,
+) -> object | None:
+    griffe_cache = os.path.expanduser("~/.cache/griffe")
+    os.makedirs(griffe_cache, exist_ok=True)
+
+    try:
+        return griffe_module.load_pypi(
+            package=cfg.package,
+            distribution=cfg.distribution,
+            version_spec=f"=={prev}",
+        )
+    except Exception as e:
+        print(
+            f"::error title={cfg.distribution} API::"
+            f"Failed to load {cfg.distribution}=={prev} from PyPI: {e}"
+        )
+        return None
+
+
+def _find_deprecated_symbols(source_root: Path) -> DeprecatedSymbols:
+    """Scan source files for symbols marked with the SDK deprecation helpers.
+
+    Detects two forms:
+    - ``@deprecated(...)`` decorator on a class/function/method
+    - ``warn_deprecated('SomeFeature', ...)`` call
+
+    Returns:
+        DeprecatedSymbols(top_level=..., qualified=...)
+    """
+
+    def _is_deprecated_decorator(deco: ast.AST) -> bool:
+        if not isinstance(deco, ast.Call):
+            return False
+        target = deco.func
+        if isinstance(target, ast.Name):
+            return target.id == "deprecated"
+        if isinstance(target, ast.Attribute):
+            return target.attr == "deprecated"
+        return False
+
+    class _Visitor(ast.NodeVisitor):
+        def __init__(self) -> None:
+            self.class_stack: list[str] = []
+            self.top_level: set[str] = set()
+            self.qualified: set[str] = set()
+
+        def visit_ClassDef(self, node: ast.ClassDef) -> None:  # noqa: N802
+            if any(_is_deprecated_decorator(deco) for deco in node.decorator_list):
+                self.top_level.add(node.name)
+                self.qualified.add(node.name)
+
+            self.class_stack.append(node.name)
+            self.generic_visit(node)
+            self.class_stack.pop()
+
+        def _visit_function_like(
+            self,
+            node: ast.FunctionDef | ast.AsyncFunctionDef,
+        ) -> None:
+            if any(_is_deprecated_decorator(deco) for deco in node.decorator_list):
+                if self.class_stack:
+                    self.qualified.add(".".join([*self.class_stack, node.name]))
+                else:
+                    self.top_level.add(node.name)
+                    self.qualified.add(node.name)
+
+            self.generic_visit(node)
+
+        def visit_FunctionDef(self, node: ast.FunctionDef) -> None:  # noqa: N802
+            self._visit_function_like(node)
+
+        def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:  # noqa: N802
+            self._visit_function_like(node)
+
+        def visit_Call(self, node: ast.Call) -> None:  # noqa: N802
+            target = node.func
+            func_name = None
+            if isinstance(target, ast.Name):
+                func_name = target.id
+            elif isinstance(target, ast.Attribute):
+                func_name = target.attr
+
+            if func_name == "warn_deprecated" and node.args:
+                feature = _extract_string_literal(node.args[0])
+                if feature is not None:
+                    self.qualified.add(feature)
+                    self.top_level.add(feature.split(".")[0])
+
+            self.generic_visit(node)
+
+    top_level: set[str] = set()
+    qualified: set[str] = set()
+
+    for pyfile in source_root.rglob("*.py"):
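+        try:
+            # Read as UTF-8 explicitly; the locale default may differ on CI.
+            tree = ast.parse(pyfile.read_text(encoding="utf-8"))
+        except (SyntaxError, UnicodeDecodeError) as e:
+            print(
+                f"::warning title=SDK API::Skipping {pyfile}: "
+                f"failed to parse ({type(e).__name__}: {e})"
+            )
+            continue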
+
+        visitor = _Visitor()
+        visitor.visit(tree)
+        top_level |= visitor.top_level
+        qualified |= visitor.qualified
+
+    return DeprecatedSymbols(top_level=top_level, qualified=qualified)
+
+
+def _extract_string_literal(node: ast.AST) -> str | None:
+    """Return the string value if *node* is a simple string literal."""
+    if isinstance(node, ast.Constant) and isinstance(node.value, str):
+        return node.value
+    return None
+
+
+def _get_source_root(griffe_root: object) -> Path | None:
+    """Derive the package source directory from a griffe module's filepath."""
+    filepath = getattr(griffe_root, "filepath", None)
+    if filepath is not None:
+        return Path(filepath).parent
+    return None
+
+
+def _compute_breakages(old_root, new_root, cfg: PackageConfig) -> tuple[int, int]:
+    """Detect breaking changes between old and new package versions.
+
+    Returns:
+        ``(total_breaks, undeprecated_removals)`` — *total_breaks* counts all
+        structural breakages (for the version-bump policy), while
+        *undeprecated_removals* counts public API removals (exports and class
+        members) without a prior deprecation marker (a separate hard failure).
+    """
+    pkg = cfg.package
+    title = f"{cfg.distribution} API"
+    total_breaks = 0
+    undeprecated_removals = 0
+
+    source_root = _get_source_root(old_root)
+    deprecated = (
+        _find_deprecated_symbols(source_root) if source_root else DeprecatedSymbols()
+    )
+
+    try:
+        old_mod = _resolve_griffe_object(old_root, pkg, root_package=pkg)
+        new_mod = _resolve_griffe_object(new_root, pkg, root_package=pkg)
+    except Exception as e:
+        raise RuntimeError(f"Failed to resolve root module '{pkg}'") from e
+
+    new_exports = _extract_exported_names(new_mod)
+    try:
+        old_exports = _extract_exported_names(old_mod)
+    except ValueError as e:
+        # The API breakage check relies on a curated public surface defined via
+        # __all__. If the baseline release didn't define (or couldn't statically
+        # evaluate) __all__, we can't compute meaningful breakages.
+        #
+        # In this situation, skip rather than failing the entire workflow.
+        print(
+            f"::notice title={title}::Skipping breakage check; baseline release "
+            f"has no statically-evaluable {pkg}.__all__: {e}"
+        )
+        return 0, 0
+
+    removed = sorted(old_exports - new_exports)
+
+    # Check deprecation-before-removal policy (exports)
+    for name in removed:
+        total_breaks += 1  # every removal is a structural break
+        if name not in deprecated.top_level:
+            print(
+                f"::error title={title}::Removed '{name}' from "
+                f"{pkg}.__all__ without prior deprecation. "
+                "Mark it with @deprecated or warn_deprecated() "
+                "for at least one release before removing."
+            )
+            undeprecated_removals += 1
+        else:
+            print(
+                f"::notice title={title}::Removed previously-deprecated symbol "
+                f"'{name}' from {pkg}.__all__"
+            )
+
+    common = sorted(old_exports & new_exports)
+    pairs: list[tuple[object, object]] = []
+    for name in common:
+        try:
+            pairs.append((old_mod[name], new_mod[name]))
+        except Exception as e:
+            print(f"::warning title={title}::Unable to resolve symbol {name}: {e}")
+
+    breakages, undeprecated_members = _collect_breakages_pairs(
+        pairs,
+        deprecated=deprecated,
+        title=title,
+    )
+    total_breaks += len(breakages)
+    undeprecated_removals += undeprecated_members
+
+    return total_breaks, undeprecated_removals
+
+
+def _check_package(griffe_module, repo_root: str, cfg: PackageConfig) -> int:
+    """Run breakage checks for a single package. Returns 0 on success."""
+    pyproj = os.path.join(repo_root, cfg.source_dir, "pyproject.toml")
+    new_version = read_version_from_pyproject(pyproj)
+
+    title = f"{cfg.distribution} API"
+    baseline = get_pypi_baseline_version(cfg.distribution, new_version)
+    if not baseline:
+        print(
+            f"::warning title={title}::No baseline {cfg.distribution} "
+            f"release found; skipping breakage check",
+        )
+        return 0
+
+    print(f"Comparing {cfg.distribution} {new_version} against {baseline}")
+
+    new_root = _load_current(griffe_module, repo_root, cfg)
+    if not new_root:
+        return 1
+
+    old_root = _load_prev_from_pypi(griffe_module, baseline, cfg)
+    if not old_root:
+        return 1
+
+    try:
+        total_breaks, undeprecated = _compute_breakages(old_root, new_root, cfg)
+    except Exception as e:
+        print(f"::error title={title}::Failed to compute breakages: {e}")
+        return 1
+
+    if undeprecated:
+        print(
+            f"::error title={title}::{undeprecated} symbol(s) removed "
+            f"from {cfg.package} without prior deprecation — "
+            f"see errors above"
+        )
+
+    bump_rc = _check_version_bump(baseline, new_version, total_breaks)
+
+    return 1 if (undeprecated or bump_rc) else 0
+
+
+def main() -> int:
+    """Main entry point for API breakage detection."""
+    repo_root = os.getcwd()
+    rc = _check_acp_version_bump(repo_root)
+
+    ensure_griffe()
+    import griffe
+
+    for cfg in PACKAGES:
+        print(f"\n{'=' * 60}")
+        print(f"Checking {cfg.distribution} ({cfg.package})")
+        print(f"{'=' * 60}")
+        rc |= _check_package(griffe, repo_root, cfg)
+
+    return rc
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
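
The version-bump policy enforced by `_check_version_bump` above reduces to a tuple comparison on `(major, minor)`. The following is a minimal stdlib-only sketch, assuming plain `X.Y.Z` version strings; the helper names `parse_major_minor` and `bump_satisfies_policy` are hypothetical, and the script itself parses versions with `packaging` rather than by splitting strings.

```python
from __future__ import annotations


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from an 'X.Y.Z' string (assumed format)."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def bump_satisfies_policy(prev: str, new: str, total_breaks: int) -> bool:
    """Breaking changes require at least a MINOR bump; otherwise anything goes."""
    if total_breaks == 0:
        return True
    # Tuple comparison is lexicographic: a higher major wins outright,
    # and an equal major falls through to comparing the minor component.
    return parse_major_minor(new) > parse_major_minor(prev)


print(bump_satisfies_policy("1.4.2", "1.4.3", total_breaks=1))  # False
print(bump_satisfies_policy("1.4.2", "1.5.0", total_breaks=1))  # True
print(bump_satisfies_policy("1.4.2", "2.0.0", total_breaks=3))  # True
```

A patch-only bump passes only when no breakages were counted, which matches the early return in the script.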
@@ -0,0 +1,196 @@
+"""Guard package version changes so they only happen in release PRs."""
+
+from __future__ import annotations
+
+import os
+import re
+import subprocess
+import sys
+import tomllib
+from dataclasses import dataclass
+from pathlib import Path
+
+
+PACKAGE_PYPROJECTS: dict[str, Path] = {
+    "openhands-sdk": Path("openhands-sdk/pyproject.toml"),
+    "openhands-tools": Path("openhands-tools/pyproject.toml"),
+    "openhands-workspace": Path("openhands-workspace/pyproject.toml"),
+    "openhands-agent-server": Path("openhands-agent-server/pyproject.toml"),
+}
+
+_VERSION_PATTERN = r"\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.]+)?"
+_RELEASE_TITLE_RE = re.compile(rf"^Release v(?P<version>{_VERSION_PATTERN})$")
+_RELEASE_BRANCH_RE = re.compile(rf"^rel-(?P<version>{_VERSION_PATTERN})$")
+
+
+@dataclass(frozen=True)
+class VersionChange:
+    package: str
+    path: Path
+    previous_version: str
+    current_version: str
+
+
+def _read_version_from_pyproject_text(text: str, source: str) -> str:
+    data = tomllib.loads(text)
+    version = data.get("project", {}).get("version")
+    if not isinstance(version, str):
+        raise SystemExit(f"Unable to determine project.version from {source}")
+    return version
+
+
+def _read_current_version(repo_root: Path, pyproject: Path) -> str:
+    return _read_version_from_pyproject_text(
+        (repo_root / pyproject).read_text(),
+        str(pyproject),
+    )
+
+
+def _read_version_from_git_ref(repo_root: Path, git_ref: str, pyproject: Path) -> str:
+    result = subprocess.run(
+        ["git", "show", f"{git_ref}:{pyproject.as_posix()}"],
+        cwd=repo_root,
+        check=False,
+        capture_output=True,
+        text=True,
+    )
+    if result.returncode != 0:
+        message = result.stderr.strip() or result.stdout.strip() or "unknown git error"
+        raise SystemExit(
+            f"Unable to read {pyproject} from git ref {git_ref}: {message}"
+        )
+    return _read_version_from_pyproject_text(result.stdout, f"{git_ref}:{pyproject}")
+
+
+def _base_ref_candidates(base_ref: str) -> list[str]:
+    if base_ref.startswith("origin/"):
+        return [base_ref, base_ref.removeprefix("origin/")]
+    return [f"origin/{base_ref}", base_ref]
+
+
+def find_version_changes(repo_root: Path, base_ref: str) -> list[VersionChange]:
+    changes: list[VersionChange] = []
+    candidates = _base_ref_candidates(base_ref)
+
+    for package, pyproject in PACKAGE_PYPROJECTS.items():
+        current_version = _read_current_version(repo_root, pyproject)
+        previous_error: SystemExit | None = None
+        previous_version: str | None = None
+
+        for candidate in candidates:
+            try:
+                previous_version = _read_version_from_git_ref(
+                    repo_root, candidate, pyproject
+                )
+                break
+            except SystemExit as exc:
+                previous_error = exc
+
+        if previous_version is None:
+            assert previous_error is not None
+            raise previous_error
+
+        if previous_version != current_version:
+            changes.append(
+                VersionChange(
+                    package=package,
+                    path=pyproject,
+                    previous_version=previous_version,
+                    current_version=current_version,
+                )
+            )
+
+    return changes
+
+
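+def get_release_pr_version(
+    pr_title: str, pr_head_ref: str
+) -> tuple[str | None, list[str]]:
+    """Return the release version implied by the PR title or branch name.
+
+    Returns ``(version, errors)``. ``version`` is None when neither marker
+    matches; ``errors`` is non-empty when both match but disagree.
+    """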
+    title_match = _RELEASE_TITLE_RE.fullmatch(pr_title.strip())
+    branch_match = _RELEASE_BRANCH_RE.fullmatch(pr_head_ref.strip())
+    title_version = title_match.group("version") if title_match else None
+    branch_version = branch_match.group("version") if branch_match else None
+
+    if title_version and branch_version and title_version != branch_version:
+        return None, [
+            "Release PR markers disagree: title requests "
+            f"v{title_version} but branch is rel-{branch_version}."
+        ]
+
+    return title_version or branch_version, []
+
+
+def validate_version_changes(
+    changes: list[VersionChange],
+    pr_title: str,
+    pr_head_ref: str,
+) -> list[str]:
+    if not changes:
+        return []
+
+    release_version, errors = get_release_pr_version(pr_title, pr_head_ref)
+    if errors:
+        return errors
+
+    formatted_changes = ", ".join(
+        f"{change.package} ({change.previous_version} -> {change.current_version})"
+        for change in changes
+    )
+
+    if release_version is None:
+        return [
+            "Package version changes are only allowed in release PRs. "
+            f"Detected changes: {formatted_changes}. "
+            "Use the Prepare Release workflow so the PR title is 'Release vX.Y.Z' "
+            "or the branch is 'rel-X.Y.Z'."
+        ]
+
+    mismatched = [
+        change for change in changes if change.current_version != release_version
+    ]
+    if mismatched:
+        mismatch_details = ", ".join(
+            f"{change.package} ({change.current_version})" for change in mismatched
+        )
+        return [
+            f"Release PR version v{release_version} does not match changed package "
+            f"versions: {mismatch_details}."
+        ]
+
+    return []
+
+
+def main() -> int:
+    repo_root = Path(__file__).resolve().parents[2]
+    base_ref = os.environ.get("VERSION_BUMP_BASE_REF") or os.environ.get(
+        "GITHUB_BASE_REF"
+    )
+    if not base_ref:
+        print("::warning title=Version bump guard::No base ref found; skipping check.")
+        return 0
+
+    pr_title = os.environ.get("PR_TITLE", "")
+    pr_head_ref = os.environ.get("PR_HEAD_REF", "")
+
+    changes = find_version_changes(repo_root, base_ref)
+    errors = validate_version_changes(changes, pr_title, pr_head_ref)
+
+    if errors:
+        for error in errors:
+            print(f"::error title=Version bump guard::{error}")
+        return 1
+
+    if changes:
+        changed_packages = ", ".join(change.package for change in changes)
+        print(
+            "::notice title=Version bump guard::"
+            f"Release PR version changes validated for {changed_packages}."
+        )
+    else:
+        print("::notice title=Version bump guard::No package version changes detected.")
+
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
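
How the guard above recognizes a release PR can be illustrated in isolation. This is a hedged sketch that reuses the version pattern from the script; `release_version` and the constant names are illustrative stand-ins for `get_release_pr_version` and its module-level regexes, and the error wording is shortened.

```python
from __future__ import annotations

import re

# Same version pattern as the guard script; the names here are illustrative.
VERSION = r"\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.]+)?"
TITLE_RE = re.compile(rf"Release v(?P<version>{VERSION})")
BRANCH_RE = re.compile(rf"rel-(?P<version>{VERSION})")


def release_version(pr_title: str, head_ref: str) -> tuple[str | None, list[str]]:
    """Return (version, errors) from the PR title and head branch name."""
    title = TITLE_RE.fullmatch(pr_title.strip())
    branch = BRANCH_RE.fullmatch(head_ref.strip())
    title_v = title.group("version") if title else None
    branch_v = branch.group("version") if branch else None
    # Either marker alone is enough, but two markers must agree.
    if title_v and branch_v and title_v != branch_v:
        return None, [f"title says v{title_v} but branch is rel-{branch_v}"]
    return title_v or branch_v, []


print(release_version("Release v1.12.0", "rel-1.12.0"))  # ('1.12.0', [])
print(release_version("Fix typo", "rel-1.12.0-rc1"))     # ('1.12.0-rc1', [])
print(release_version("Fix typo", "main"))               # (None, [])
```

Because a single marker suffices, a release branch with an unrelated title still counts as a release PR; only a title and branch that name different versions produce an error.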
@@ -1,58 +0,0 @@
-#!/bin/bash
-
-set -euxo pipefail
-
-# This script updates the PR description with commands to run the PR locally
-# It adds both Docker and uvx commands
-
-# Get the branch name for the PR
-BRANCH_NAME=$(gh pr view "$PR_NUMBER" --json headRefName --jq .headRefName)
-
-# Define the Docker command
-DOCKER_RUN_COMMAND="docker run -it --rm \
-  -p 3000:3000 \
-  -v /var/run/docker.sock:/var/run/docker.sock \
-  --add-host host.docker.internal:host-gateway \
-  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.openhands.dev/openhands/runtime:${SHORT_SHA}-nikolaik \
-  --name openhands-app-${SHORT_SHA} \
-  docker.openhands.dev/openhands/openhands:${SHORT_SHA}"
-
-# Get the current PR body
-PR_BODY=$(gh pr view "$PR_NUMBER" --json body --jq .body)
-
-# Prepare the new PR body with both commands
-if echo "$PR_BODY" | grep -q "To run this PR locally, use the following command:"; then
-  # For existing PR descriptions, use a more robust approach
-  # Split the PR body at the "To run this PR locally" section and replace everything after it
-  BEFORE_SECTION=$(echo "$PR_BODY" | sed '/To run this PR locally, use the following command:/,$d')
-  NEW_PR_BODY=$(cat <<EOF
-${BEFORE_SECTION}
-
-To run this PR locally, use the following command:
-
-GUI with Docker:
-\`\`\`
-${DOCKER_RUN_COMMAND}
-\`\`\`
-EOF
-)
-else
-  # For new PR descriptions: use heredoc safely without indentation
-  NEW_PR_BODY=$(cat <<EOF
-$PR_BODY
-
---
-
-To run this PR locally, use the following command:
-
-GUI with Docker:
-\`\`\`
-${DOCKER_RUN_COMMAND}
-\`\`\`
-EOF
-)
-fi
-
-# Update the PR description
-echo "Updating PR description with Docker and uvx commands"
-gh pr edit "$PR_NUMBER" --body "$NEW_PR_BODY"
@@ -0,0 +1,122 @@
+#!/usr/bin/env python3
+"""Update the sdk_ref default value in run-eval.yml.
+
+This script updates the default SDK reference version in the run-eval workflow
+to match a new release version.
+"""
+
+from __future__ import annotations
+
+import argparse
+import re
+import sys
+from pathlib import Path
+
+
+REPO_ROOT = Path(__file__).resolve().parents[2]
+RUN_EVAL_WORKFLOW = REPO_ROOT / ".github" / "workflows" / "run-eval.yml"
+
+# Pattern to match the sdk_ref default line
+# Matches: "default: vX.Y.Z" with optional prerelease suffix like -rc1, -beta.1
+SDK_REF_PATTERN = re.compile(
+    r"^(\s*default:\s*v)[\d]+\.[\d]+\.[\d]+(-[a-zA-Z0-9.]+)?(\s*)$"
+)
+
+
+def update_sdk_ref_default(new_version: str, dry_run: bool = False) -> bool:
+    """Update the sdk_ref default in run-eval.yml.
+
+    Args:
+        new_version: The new version (without 'v' prefix, e.g., "1.12.0")
+        dry_run: If True, print what would change without modifying the file
+
+    Returns:
+        True if successful, False otherwise
+    """
+    if not RUN_EVAL_WORKFLOW.exists():
+        print(f"❌ File not found: {RUN_EVAL_WORKFLOW}", file=sys.stderr)
+        return False
+
+    content = RUN_EVAL_WORKFLOW.read_text()
+    lines = content.splitlines(keepends=True)
+
+    # Find the sdk_ref input section and its default line
+    in_sdk_ref_section = False
+    updated = False
+    old_version = None
+
+    for i, line in enumerate(lines):
+        stripped = line.strip()
+
+        # Track when we enter the sdk_ref input section
+        if stripped == "sdk_ref:":
+            in_sdk_ref_section = True
+            continue
+
+        # Track when we exit the sdk_ref section (another input starts)
+        if (
+            in_sdk_ref_section
+            and stripped.endswith(":")
+            and not stripped.startswith("default")
+        ):
+            in_sdk_ref_section = False
+
+        # Update the default line within the sdk_ref section
+        if in_sdk_ref_section:
+            match = SDK_REF_PATTERN.match(line)
+            if match:
+                old_version = line.strip().replace("default: ", "")
+                # group(3) already carries the trailing whitespace, including the
+                # newline, so appending another "\n" would insert a blank line.
+                new_line = f"{match.group(1)}{new_version}{match.group(3)}"
+                lines[i] = new_line
+                updated = True
+                break
+
+    if not updated:
+        print("❌ Could not find sdk_ref default line to update", file=sys.stderr)
+        return False
+
+    if dry_run:
+        print(f"Would update sdk_ref default: {old_version} → v{new_version}")
+        return True
+
+    # Write the updated content
+    RUN_EVAL_WORKFLOW.write_text("".join(lines))
+    print(f"✅ Updated sdk_ref default: {old_version} → v{new_version}")
+    return True
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(
+        description="Update the sdk_ref default value in run-eval.yml"
+    )
+    parser.add_argument(
+        "version",
+        help="New version (without 'v' prefix, e.g., '1.12.0')",
+    )
+    parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="Print what would change without modifying the file",
+    )
+    args = parser.parse_args()
+
+    # Validate version format
+    version_pattern = re.compile(r"^\d+\.\d+\.\d+(-[a-zA-Z0-9.]+)?$")
+    if not version_pattern.match(args.version):
+        print(
+            f"❌ Invalid version format: {args.version}. "
+            "Expected: X.Y.Z or X.Y.Z-suffix",
+            file=sys.stderr,
+        )
+        return 1
+
+    success = update_sdk_ref_default(args.version, dry_run=args.dry_run)
+    return 0 if success else 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
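Reviewer note: a minimal sketch of how `SDK_REF_PATTERN` rewrites a single default line, which may help when checking the newline handling. The helper name `bump_default` and the sample workflow line are assumptions made for this illustration; they are not part of the script.

```python
import re

# Same pattern as in the script: indentation plus "default: v", a semantic
# version with an optional prerelease suffix, then trailing whitespace.
SDK_REF_PATTERN = re.compile(
    r"^(\s*default:\s*v)[\d]+\.[\d]+\.[\d]+(-[a-zA-Z0-9.]+)?(\s*)$"
)


def bump_default(line: str, new_version: str) -> str:
    """Return `line` with its version replaced, or unchanged if it does not match."""
    match = SDK_REF_PATTERN.match(line)
    if match is None:
        return line
    # group(1) is everything up to and including "v"; group(3) is the trailing
    # whitespace, which includes the newline when the line has one.
    return f"{match.group(1)}{new_version}{match.group(3)}"


sample = "                default: v1.11.0\n"  # hypothetical workflow line
print(repr(bump_default(sample, "1.12.0")))
```

Because the trailing group captures the line ending, the rewritten line keeps exactly the newline the original had.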
@@ -0,0 +1,125 @@
+# Release Automation Workflows
+
+This document describes the automated release workflows for the OpenHands Software Agent SDK.
+
+## Overview
+
+The release process has been automated with two GitHub Actions workflows:
+
+1. **prepare-release.yml** - Prepares a release PR with version updates
+2. **pypi-release.yml** - Automatically publishes packages to PyPI when a release is created
+
+## How to Create a New Release
+
+### Step 1: Trigger the Prepare Release Workflow
+
+1. Go to the [Actions tab](https://github.com/OpenHands/software-agent-sdk/actions)
+2. Select **"Prepare Release"** workflow from the left sidebar
+3. Click **"Run workflow"** button
+4. Enter the version number (e.g., `1.2.3`) - must be in format `X.Y.Z`
+5. Click **"Run workflow"**
+
+The workflow will automatically:
+- ✅ Create a new branch named `rel-X.Y.Z`
+- ✅ Update all package versions using `make set-package-version`
+- ✅ Commit the changes
+- ✅ Push the branch
+- ✅ Create a PR with labels `integration-tests` and `test-examples`
+
+### Step 2: Review the PR
+
+The created PR will include a checklist. Complete the following:
+
+- [ ] Fix any deprecation deadlines if they exist
+- [ ] Verify integration tests pass (triggered by `integration-tests` label)
+- [ ] Verify example checks pass (triggered by `test-examples` label)
+- [ ] Review and approve the PR
+
+### Step 3: Create the GitHub Release
+
+1. Go to [Releases](https://github.com/OpenHands/software-agent-sdk/releases/new)
+2. Click **"Draft a new release"**
+3. Configure the release:
+   - **Tag**: `vX.Y.Z` (must match the version)
+   - **Branch**: `rel-X.Y.Z` (the branch created by the workflow)
+   - **Previous tag**: Select the previous release version
+4. Click **"Generate release notes"** to auto-generate the changelog
+5. Review and edit the release notes as needed
+6. Click **"Publish release"**
+
+### Step 4: PyPI Publication (Automated)
+
+Once the release is published, the **pypi-release.yml** workflow will automatically:
+- ✅ Build all packages (openhands-sdk, openhands-tools, openhands-workspace, openhands-agent-server)
+- ✅ Publish them to PyPI
+
+You can monitor the progress in the [Actions tab](https://github.com/OpenHands/software-agent-sdk/actions/workflows/pypi-release.yml).
+
+### Step 5: Version Bump PRs (Automated)
+
+After successful PyPI publication, the workflow will automatically create PRs to update SDK versions in downstream repositories:
+
+- **[OpenHands](https://github.com/All-Hands-AI/OpenHands)** - Updates `openhands-sdk`, `openhands-tools`, and `openhands-agent-server` versions
+- **[OpenHands-CLI](https://github.com/All-Hands-AI/openhands-cli)** - Updates `openhands-sdk` and `openhands-tools` versions
+
+These PRs will:
+- Be created automatically with branch name `bump-sdk-X.Y.Z`
+- Include links back to the SDK release
+- Need to be reviewed and merged by the respective repository maintainers
+
+### Step 6: Post-Release Tasks
+
+- [ ] Merge the release PR to main
+- [ ] Review and merge the auto-created version bump PRs in OpenHands and OpenHands-CLI
+- [ ] Run evaluation on OpenHands Index (manual step)
+- [ ] Announce the release
+
+## Manual PyPI Release (If Needed)
+
+If you need to manually trigger the PyPI release workflow:
+
+1. Go to the [Actions tab](https://github.com/OpenHands/software-agent-sdk/actions)
+2. Select **"Publish all OpenHands packages (uv)"** workflow
+3. Click **"Run workflow"**
+4. Select the branch/tag you want to publish from
+5. Click **"Run workflow"**
+
+## Workflow Files
+
+- `.github/workflows/prepare-release.yml` - Automated release preparation
+- `.github/workflows/pypi-release.yml` - PyPI package publication
+
+## Troubleshooting
+
+### Version Format Error
+
+If you get a version format error, ensure you're using the format `X.Y.Z` (e.g., `1.2.3`), not `vX.Y.Z`.
+
+### PR Creation Failed
+
+If the PR creation fails, check:
+- The branch doesn't already exist
+- You have proper permissions
+- The `GITHUB_TOKEN` has sufficient permissions
+
+### PyPI Publication Failed
+
+If PyPI publication fails:
+- Check that the `PYPI_TOKEN_OPENHANDS` secret is properly configured
+- Verify the version doesn't already exist on PyPI
+- Check the workflow logs for specific error messages
+
+## Previous Manual Process
+
+For reference, the previous manual release checklist was:
+
+- [ ] Checkout SDK repo, use `make set-package-version version=x.x.x` to set the version
+- [ ] Push to a branch like `rel-x.x.x` and start a PR
+- [ ] Fix any "deprecation deadlines" if they exist
+- [ ] Tag "integration-tests" and make sure integration tests all pass
+- [ ] Tag "test-examples" and make sure example checks all pass
+- [ ] Draft a new release
+- [ ] Use workflow to publish to PyPI on tag `v1.X.X`
+- [ ] Evaluation on OpenHands Index
+
+Most of these steps are now automated!
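Reviewer note: the release doc relies on a few naming conventions (`X.Y.Z` input, `rel-X.Y.Z` branch, `vX.Y.Z` tag, `bump-sdk-X.Y.Z` downstream branch). The sketch below shows one way to derive them from a validated version string. The function name `release_names` and the strict `X.Y.Z` regex are assumptions for illustration; the actual validation inside `prepare-release.yml` is not shown in this diff.

```python
from __future__ import annotations

import re

# Plain X.Y.Z with no leading "v", as the doc asks for in Step 1.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def release_names(version: str) -> dict[str, str]:
    """Derive the branch and tag names the doc describes for a release."""
    if VERSION_RE.fullmatch(version) is None:
        raise ValueError(f"Invalid version {version!r}; expected X.Y.Z (no 'v' prefix)")
    return {
        "release_branch": f"rel-{version}",
        "tag": f"v{version}",
        "bump_branch": f"bump-sdk-{version}",
    }


print(release_names("1.2.3"))
```

Passing `v1.2.3` raises `ValueError`, which mirrors the "Version Format Error" troubleshooting entry.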
@@ -0,0 +1,154 @@
+---
+name: REST API breakage checks
+
+on:
+    push:
+        branches: [main]
+    pull_request:
+        branches: [main]
+
+jobs:
+    agent-server-rest-api:
+        name: REST API (OpenAPI)
+        runs-on: ubuntu-latest
+        permissions:
+            contents: read
+            pull-requests: write
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+              with:
+                  fetch-depth: 0
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+
+            - name: Install workspace deps (dev)
+              run: uv sync --frozen --group dev
+
+            - name: Install oasdiff
+              run: |
+                  curl -L https://raw.githubusercontent.com/oasdiff/oasdiff/main/install.sh | sh -s -- -b /usr/local/bin
+                  oasdiff --version
+
+            - name: Run agent server REST API breakage check
+              id: api_breakage
+              # Let this step fail so CI is visibly red on breakage.
+              # Later reporting steps still run because they use if: always().
+              run: |
+                  uv run --with packaging python .github/scripts/check_agent_server_rest_api_breakage.py 2>&1 | tee api-breakage.log
+                  exit_code=${PIPESTATUS[0]}
+                  echo "exit_code=${exit_code}" >> "$GITHUB_OUTPUT"
+                  exit "${exit_code}"
+
+            - name: Write REST API breakage summary
+              if: ${{ always() }}
+              env:
+                  EXIT_CODE: ${{ steps.api_breakage.outputs.exit_code }}
+                  IS_FORK: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name != github.repository }}
+                  LOG_PATH: api-breakage.log
+                  RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+              run: |
+                  python3 <<'PY' >> "$GITHUB_STEP_SUMMARY"
+                  import os
+                  from pathlib import Path
+
+                  exit_code = int(os.environ.get('EXIT_CODE', '0') or '0')
+                  is_fork = os.environ.get('IS_FORK', 'false') == 'true'
+                  run_url = os.environ['RUN_URL']
+                  status = '✅ **PASSED**' if exit_code == 0 else '❌ **FAILED**'
+
+                  print(f'## REST API breakage checks (OpenAPI) — {status}')
+                  print()
+                  print(f"**Result:** {status}")
+                  if exit_code != 0:
+                      print()
+                      print('> ⚠️ Breaking REST API changes or policy violations detected.')
+                  print()
+
+                  if is_fork:
+                      print(
+                          '_Fork PR detected: sticky PR comment was skipped because '
+                          'the GitHub token is read-only for `pull_request` workflows '
+                          'from forks._'
+                      )
+                      print()
+
+                  if exit_code != 0:
+                      try:
+                          log = Path(os.environ['LOG_PATH']).read_text()
+                      except Exception as exc:
+                          log = f'Unable to read log file: {exc}'
+
+                      excerpt = log[:1000].replace('```', '``\\`')
+                      print('<details><summary>Log excerpt (first 1000 characters)</summary>')
+                      print()
+                      print('```text')
+                      print(excerpt)
+                      print('```')
+                      print()
+                      print('</details>')
+                      print()
+
+                  print(f'[Action log]({run_url})')
+                  PY
+
+            - name: Post REST API breakage report to PR
+              if: ${{ always() && github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository }}
+              uses: actions/github-script@v8
+              env:
+                  EXIT_CODE: ${{ steps.api_breakage.outputs.exit_code }}
+                  LOG_PATH: api-breakage.log
+              with:
+                  script: |
+                      const fs = require('fs');
+
+                      const marker = '<!-- agent-server-rest-api-breakage-report -->';
+                      const exitCode = Number(process.env.EXIT_CODE || '0');
+                      const runUrl = `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`;
+                      const status = exitCode === 0 ? '✅ **PASSED**' : '❌ **FAILED**';
+
+                      let body = `${marker}\n## REST API breakage checks (OpenAPI) — ${status}\n\n**Result:** ${status}\n`;
+
+                      if (exitCode !== 0) {
+                        body += `\n> ⚠️ Breaking REST API changes or policy violations detected.\n`;
+                        let log = '';
+                        try {
+                          log = fs.readFileSync(process.env.LOG_PATH, 'utf8');
+                        } catch (e) {
+                          log = `Unable to read log file: ${e}`;
+                        }
+
+                        const excerpt = log.slice(0, 1000).replace(/```/g, '``\\`');
+                        body += `\n<details><summary>Log excerpt (first 1000 characters)</summary>\n\n\`\`\`text\n${excerpt}\n\`\`\`\n\n</details>\n`;
+                      }
+
+                      body += `\n[Action log](${runUrl})\n`;
+
+                      const { owner, repo } = context.repo;
+                      const issue_number = context.issue.number;
+                      const { data: comments } = await github.rest.issues.listComments({
+                        owner,
+                        repo,
+                        issue_number,
+                        per_page: 100,
+                      });
+
+                      const existing = comments.find((c) => c.body && c.body.includes(marker));
+                      if (existing) {
+                        await github.rest.issues.updateComment({
+                          owner,
+                          repo,
+                          comment_id: existing.id,
+                          body,
+                        });
+                      } else {
+                        await github.rest.issues.createComment({
+                          owner,
+                          repo,
+                          issue_number,
+                          body,
+                        });
+                      }
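Reviewer note: the reporting steps in this workflow follow a "sticky comment" pattern: build a body that starts with a hidden marker, escape any code fences in the log excerpt, then update the existing marked comment or create a new one. A minimal Python sketch of that logic, without any GitHub API calls, is below. The function names and the simplified heading are assumptions for illustration; the workflow itself does this in `actions/github-script`.

```python
from __future__ import annotations

MARKER = "<!-- agent-server-rest-api-breakage-report -->"
FENCE = "`" * 3  # three backticks, built indirectly so this example nests cleanly


def build_body(exit_code: int, log: str, run_url: str) -> str:
    """Build the comment body; include a log excerpt only on failure."""
    status = "PASSED" if exit_code == 0 else "FAILED"
    lines = [MARKER, f"## REST API breakage checks (OpenAPI): {status}", ""]
    if exit_code != 0:
        # Truncate first, then break up any fence so it cannot close the block early.
        excerpt = log[:1000].replace(FENCE, "``\\`")
        lines += [
            "<details><summary>Log excerpt (first 1000 characters)</summary>",
            "",
            f"{FENCE}text",
            excerpt,
            FENCE,
            "",
            "</details>",
            "",
        ]
    lines.append(f"[Action log]({run_url})")
    return "\n".join(lines)


def find_sticky(comments: list[dict]) -> dict | None:
    """Return the first comment carrying the marker, or None if there is none."""
    for comment in comments:
        if MARKER in (comment.get("body") or ""):
            return comment
    return None
```

A caller would update the comment returned by `find_sticky` when it is not `None` and create a new comment otherwise, so each PR carries at most one report.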
@@ -0,0 +1,149 @@
+---
+name: Python API breakage checks
+
+on:
+    push:
+        branches: [main]
+    pull_request:
+        branches: [main]
+
+jobs:
+    sdk-api:
+        name: Python API
+        runs-on: ubuntu-latest
+        permissions:
+            contents: read
+            pull-requests: write
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+              with:
+                  fetch-depth: 0
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+            - name: Install workspace deps (dev)
+              run: uv sync --frozen --group dev
+            - name: Run Python API breakage check
+              id: api_breakage
+              # Let this step fail so CI is visibly red on breakage.
+              # Later reporting steps still run because they use if: always().
+              env:
+                  ACP_VERSION_CHECK_BASE_REF: ${{ github.event_name == 'pull_request' && github.base_ref || github.event.before }}
+                  ACP_VERSION_CHECK_SKIP: ${{ github.event_name == 'pull_request' && contains(github.event.pull_request.body || '', 'skip-acp-check') }}
+              run: |
+                  uv run python .github/scripts/check_sdk_api_breakage.py 2>&1 | tee api-breakage.log
+                  exit_code=${PIPESTATUS[0]}
+                  echo "exit_code=${exit_code}" >> "$GITHUB_OUTPUT"
+                  exit "${exit_code}"
+            - name: Write API breakage summary
+              if: ${{ always() }}
+              env:
+                  EXIT_CODE: ${{ steps.api_breakage.outputs.exit_code }}
+                  IS_FORK: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name != github.repository }}
+                  LOG_PATH: api-breakage.log
+                  RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+              run: |
+                  python3 <<'PY' >> "$GITHUB_STEP_SUMMARY"
+                  import os
+                  from pathlib import Path
+
+                  exit_code = int(os.environ.get('EXIT_CODE', '0') or '0')
+                  is_fork = os.environ.get('IS_FORK', 'false') == 'true'
+                  run_url = os.environ['RUN_URL']
+                  status = '✅ **PASSED**' if exit_code == 0 else '❌ **FAILED**'
+
+                  print(f'## Python API breakage checks — {status}')
+                  print()
+                  print(f"**Result:** {status}")
+                  if exit_code != 0:
+                      print()
+                      print('> ⚠️ Breaking API changes or policy violations detected.')
+                  print()
+
+                  if is_fork:
+                      print(
+                          '_Fork PR detected: sticky PR comment was skipped because '
+                          'the GitHub token is read-only for `pull_request` workflows '
+                          'from forks._'
+                      )
+                      print()
+
+                  if exit_code != 0:
+                      try:
+                          log = Path(os.environ['LOG_PATH']).read_text()
+                      except Exception as exc:
+                          log = f'Unable to read log file: {exc}'
+
+                      excerpt = log[:1000].replace('```', '``\\`')
+                      print('<details><summary>Log excerpt (first 1000 characters)</summary>')
+                      print()
+                      print('```text')
+                      print(excerpt)
+                      print('```')
+                      print()
+                      print('</details>')
+                      print()
+
+                  print(f'[Action log]({run_url})')
+                  PY
+
+            - name: Post API breakage report to PR
+              if: ${{ always() && github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository }}
+              uses: actions/github-script@v8
+              env:
+                  EXIT_CODE: ${{ steps.api_breakage.outputs.exit_code }}
+                  LOG_PATH: api-breakage.log
+              with:
+                  script: |
+                      const fs = require('fs');
+
+                      const marker = '<!-- api-breakage-report -->';
+                      const exitCode = Number(process.env.EXIT_CODE || '0');
+                      const runUrl = `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`;
+                      const status = exitCode === 0 ? '✅ **PASSED**' : '❌ **FAILED**';
+
+                      let body = `${marker}\n## Python API breakage checks — ${status}\n\n**Result:** ${status}\n`;
+
+                      if (exitCode !== 0) {
+                        body += `\n> ⚠️ Breaking API changes or policy violations detected.\n`;
+                        let log = '';
+                        try {
+                          log = fs.readFileSync(process.env.LOG_PATH, 'utf8');
+                        } catch (e) {
+                          log = `Unable to read log file: ${e}`;
+                        }
+
+                        const excerpt = log.slice(0, 1000).replace(/```/g, '``\\`');
+                        body += `\n<details><summary>Log excerpt (first 1000 characters)</summary>\n\n\`\`\`text\n${excerpt}\n\`\`\`\n\n</details>\n`;
+                      }
+
+                      body += `\n[Action log](${runUrl})\n`;
+
+                      const { owner, repo } = context.repo;
+                      const issue_number = context.issue.number;
+                      const { data: comments } = await github.rest.issues.listComments({
+                        owner,
+                        repo,
+                        issue_number,
+                        per_page: 100,
+                      });
+
+                      const existing = comments.find((c) => c.body && c.body.includes(marker));
+                      if (existing) {
+                        await github.rest.issues.updateComment({
+                          owner,
+                          repo,
+                          comment_id: existing.id,
+                          body,
+                        });
+                      } else {
+                        await github.rest.issues.createComment({
+                          owner,
+                          repo,
+                          issue_number,
+                          body,
+                        });
+                      }
@@ -0,0 +1,130 @@
+---
+name: API Compliance Tests
+
+on:
+    pull_request:
+        types: [labeled]
+    workflow_dispatch:
+        inputs:
+            reason:
+                description: Reason for running compliance tests
+                required: true
+            patterns:
+                description: Comma-separated patterns to test (empty = all)
+                required: false
+            models:
+                description: Comma-separated model IDs (empty = all defaults)
+                required: false
+
+env:
+    # Default models to test (matches DEFAULT_MODELS in run_compliance.py)
+    DEFAULT_MODELS: claude-sonnet-4-5,gpt-5.2,gemini-3-pro
+
+jobs:
+    run-compliance-tests:
+        # Only run on api-compliance-test label or workflow_dispatch
+        if: |
+            github.event_name == 'workflow_dispatch' ||
+            (github.event_name == 'pull_request' && github.event.label.name == 'api-compliance-test')
+        runs-on: ubuntu-latest
+        permissions:
+            contents: read
+            pull-requests: write
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+              with:
+                  repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
+                  ref: ${{ github.event.pull_request.head.sha || github.ref }}
+                  persist-credentials: false
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: '3.13'
+
+            - name: Install dependencies
+              run: uv sync --dev
+
+            - name: Determine test parameters
+              id: params
+              run: |
+                  # Use input values or defaults
+                  if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+                    PATTERNS="${{ github.event.inputs.patterns }}"
+                    MODELS="${{ github.event.inputs.models }}"
+                  else
+                    PATTERNS=""
+                    MODELS=""
+                  fi
+
+                  # Build command args
+                  ARGS=""
+                  if [ -n "$PATTERNS" ]; then
+                    ARGS="$ARGS --patterns $PATTERNS"
+                  fi
+                  if [ -n "$MODELS" ]; then
+                    ARGS="$ARGS --models $MODELS"
+                  else
+                    ARGS="$ARGS --models $DEFAULT_MODELS"
+                  fi
+
+                  echo "args=$ARGS" >> "$GITHUB_OUTPUT"
+
+            - name: Run API compliance tests
+              id: compliance
+              env:
+                  LLM_API_KEY: ${{ secrets.LLM_API_KEY_EVAL }}
+                  LLM_BASE_URL: https://llm-proxy.eval.all-hands.dev
+                  GITHUB_RUN_ID: ${{ github.run_id }}
+              run: |
+                  uv run python tests/integration/api_compliance/run_compliance.py \
+                    ${{ steps.params.outputs.args }} \
+                    --output-dir compliance-results/
+              continue-on-error: true  # Tests may "fail" but that's expected
+
+            - name: Upload results
+              uses: actions/upload-artifact@v7
+              with:
+                  name: compliance-results
+                  path: compliance-results/
+                  retention-days: 30
+
+            - name: Post results to PR
+              if: github.event_name == 'pull_request'
+              uses: actions/github-script@v8
+              with:
+                  script: |
+                      const fs = require('fs');
+                      const path = require('path');
+
+                      // Find the report directory
+                      const resultsDir = 'compliance-results';
+                      const dirs = fs.readdirSync(resultsDir);
+                      if (dirs.length === 0) {
+                        console.log('No results found');
+                        return;
+                      }
+
+                      const latestDir = path.join(resultsDir, dirs[0]);
+                      const reportPath = path.join(latestDir, 'compliance_report.md');
+
+                      if (!fs.existsSync(reportPath)) {
+                        console.log('Report not found at', reportPath);
+                        return;
+                      }
+
+                      let report = fs.readFileSync(reportPath, 'utf8');
+
+                      // Truncate if too long
+                      if (report.length > 60000) {
+                        report = report.substring(0, 60000) + '\n\n... (truncated)';
+                      }
+
+                      await github.rest.issues.createComment({
+                        owner: context.repo.owner,
+                        repo: context.repo.repo,
+                        issue_number: context.payload.pull_request.number,
+                        body: report
+                      });
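Reviewer note: the "Determine test parameters" step boils down to a small decision table: inputs are honored only for `workflow_dispatch`, `--patterns` is passed only when set, and `--models` always falls back to the defaults. A Python sketch of the same rules, returning an argument list rather than a shell string, is below. The function name `build_args` is an assumption for illustration.

```python
from __future__ import annotations

# Mirrors the DEFAULT_MODELS env value in the workflow above.
DEFAULT_MODELS = "claude-sonnet-4-5,gpt-5.2,gemini-3-pro"


def build_args(event_name: str, patterns: str = "", models: str = "") -> list[str]:
    """Build the run_compliance.py arguments for a given trigger."""
    # Label-triggered runs ignore any inputs and use the defaults.
    if event_name != "workflow_dispatch":
        patterns, models = "", ""

    args: list[str] = []
    if patterns:
        args += ["--patterns", patterns]
    # --models is always passed; an empty input means "all defaults".
    args += ["--models", models or DEFAULT_MODELS]
    return args


print(build_args("workflow_dispatch", patterns="tool_calls", models=""))
```

Returning a list keeps each value as a single argument, which avoids the word-splitting concerns that come with interpolating inputs into a shell string.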
@@ -0,0 +1,224 @@
+---
+# To set this up:
+#  1. Change the name below to something relevant to your task
+#  2. Modify the "env" section below with your prompt
+#  3. Add your LLM_API_KEY to the repository secrets
+#  4. Commit this file to your repository
+#  5. Trigger the workflow manually or set up a schedule
+name: Assign Reviews
+
+on:
+    # Manual trigger
+    workflow_dispatch:
+    # Scheduled trigger (customize the cron expression as needed)
+    schedule:
+        # Run at 12 PM UTC every day
+        - cron: 0 12 * * *
+
+permissions:
+    contents: write
+    pull-requests: write
+    issues: write
+
+jobs:
+    run-task:
+        # Only run scheduled jobs in the main repository, not in forks
+        if: github.repository == 'OpenHands/software-agent-sdk' || github.event_name == 'workflow_dispatch'
+        runs-on: ubuntu-24.04
+        env:
+            # Configuration (modify these values as needed)
+            AGENT_SCRIPT_URL: https://raw.githubusercontent.com/OpenHands/agent-sdk/main/examples/03_github_workflows/01_basic_action/agent_script.py
+            # Provide either PROMPT_LOCATION (URL/file) OR PROMPT_STRING (direct text), not both
+            # Option 1: Use a URL or file path for the prompt
+            PROMPT_LOCATION: ''
+            # PROMPT_LOCATION: 'https://example.com/prompts/maintenance.txt'
+            # Option 2: Use direct text for the prompt
+            PROMPT_STRING: >
+                Use GITHUB_TOKEN and the github API to organize open pull requests and issues in the repo.
+                Read the sections below in order, and perform each in order. Do NOT take action
+                on the same issue or PR twice.
+
+                # Issues with needs-info - Check for OP Response
+
+                Find all open issues that have the "needs-info" label. For each issue:
+                1. Identify the original poster (issue author)
+                2. Check if there are any comments from the original poster AFTER the "needs-info" label was added
+                3. To determine when the label was added, use: GET /repos/{owner}/{repo}/issues/{issue_number}/timeline
+                   and look for "labeled" events with the label "needs-info"
+                4. If the original poster has commented after the label was added:
+                   - Remove the "needs-info" label
+                   - Add the "needs-triage" label
+
+                # Issues with needs-triage
+
+                Find all open issues that have the "needs-triage" label. For each issue that has been in this state for more than 2 days:
+                1. First, check whether the issue has already been triaged, meaning it has either of:
+                   - The "enhancement" label
+                   - Any "priority" label (priority:low, priority:medium, priority:high, etc.)
+                2. If the issue has already been triaged (has an enhancement or priority label), remove the "needs-triage" label
+                3. For issues that have NOT been triaged yet:
+                   - Read the issue description and comments
+                   - Check if it is a bug report, feature request, or question and add the appropriate label
+                   - If it is a bug report and it does not have a priority label
+                     * Read the MAINTAINERS file in the repository root to get the list of maintainers
+                     * Extract all usernames from lines starting with "- @" and join them with spaces, each prefixed with @
+                       (e.g., if the file contains "- @user1" and "- @user2", format as "@user1 @user2")
+                     * Tag ALL maintainers with: "[Automatic Post]: This issue has been waiting for triage. <maintainers>, could you
+                please take a look and add the appropriate priority label when you have a chance?"
+                       (Replace <maintainers> with the formatted list from the previous step)
+
+                # Need Reviewer Action
+
+                Find all open PRs where:
+                1. The PR is waiting for review (there are no open review comments or change requests)
+                2. The PR is in a "clean" state (CI passing, no merge conflicts)
+                3. The PR is not marked as draft (draft: false)
+                4. The PR has had no activity (comments, commits, reviews) for more than 3 days.
+
+                In this case, send a message to the reviewers:
+                [Automatic Post]: This PR seems to be currently waiting for review.
+                {reviewer_names}, could you please take a look when you have a chance?
+
+                # Need Author Action
+
+                Find all open PRs where the most recent change or comment was made on the pull
+                request more than 5 days ago (use 14 days if the PR is marked as draft).
+
+                And send a message to the author:
+
+                [Automatic Post]: It has been a while since there was any activity on this PR.
+                {author}, are you still working on it? If so, please go ahead. If not, please
+                request review, close it, or request that someone else follow up.
+
+                # Need Reviewers
+
+                Find all open pull requests that TRULY have NO reviewers assigned. To do this correctly:
+
+                1. Use the GitHub API to fetch PR details: GET /repos/{owner}/{repo}/pulls/{pull_number}
+                2. Check the "requested_reviewers" and "requested_teams" arrays
+                3. ALSO check for submitted reviews: GET /repos/{owner}/{repo}/pulls/{pull_number}/reviews
+                4. A PR needs reviewers ONLY if ALL of these are true:
+                   - The "requested_reviewers" array is empty (no pending review requests)
+                   - The "requested_teams" array is empty (no pending team review requests)
+                   - The reviews array is empty (no reviews have been submitted yet)
+                5. IMPORTANT: If ANY of these has entries, SKIP this PR - it already has or had reviewers!
+
+                Example API responses showing a PR that DOES NOT need reviewers (skip this):
+
+                Case 1 - Has requested reviewers:
+                GET /pulls/{number}: {"requested_reviewers": [{"login": "someuser"}], "requested_teams": []}
+
+                Case 2 - Has submitted reviews (even if requested_reviewers is empty):
+                GET /pulls/{number}: {"requested_reviewers": [], "requested_teams": []}
+                GET /pulls/{number}/reviews: [{"user": {"login": "someuser"}, "state": "COMMENTED"}]
+
+                Example API response showing a PR that DOES need reviewers (process this):
+                GET /pulls/{number}: {"requested_reviewers": [], "requested_teams": []}
+                GET /pulls/{number}/reviews: []
+
+                Additional criteria for PRs that need reviewers:
+                1. Are not marked as draft (draft: false)
+                2. Were created more than 1 day ago
+                3. CI is passing and there are no merge conflicts
+
+                For each PR that truly has NO reviewers:
+                1) Read git blame for changed files to identify recent, active contributors.
+                2) From those candidates, ONLY consider maintainers: repository collaborators with write access or higher. Verify via the GitHub API
+                   before requesting review:
+                   - Preferred: GET /repos/{owner}/{repo}/collaborators (no permission filter). Filter client-side using either:
+                     role_name in ["write", "maintain", "admin"] OR permissions.push || permissions.admin. Note: paginate if > 30 collaborators.
+                   - Alternative: GET /repos/{owner}/{repo}/collaborators/{username}/permission and accept if permission in {write, admin} (this endpoint reports legacy base roles, so maintain is returned as write).
+                3) If multiple maintainers qualify, avoid assigning too many reviews to any single one.
+                4) Request review from exactly one maintainer and add this message:
+
+                [Automatic Post]: I have assigned {reviewer} as a reviewer based on git blame information.
+                Thanks in advance for the help!
+
+            LLM_MODEL: litellm_proxy/claude-sonnet-4-5-20250929
+            LLM_BASE_URL: https://llm-proxy.app.all-hands.dev
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+
+            - name: Set up Python
+              uses: actions/setup-python@v6
+              with:
+                  python-version: '3.13'
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+
+            - name: Install OpenHands dependencies
+              run: |
+                  # Install OpenHands SDK and tools from git repository
+                  uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk"
+                  uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools"
+
+            - name: Check required configuration
+              env:
+                  LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
+              run: |
+                  if [ -z "$LLM_API_KEY" ]; then
+                    echo "Error: LLM_API_KEY secret is not set."
+                    exit 1
+                  fi
+
+                  # Check that exactly one of PROMPT_LOCATION or PROMPT_STRING is set
+                  if [ -n "$PROMPT_LOCATION" ] && [ -n "$PROMPT_STRING" ]; then
+                    echo "Error: Both PROMPT_LOCATION and PROMPT_STRING are set."
+                    echo "Please provide only one in the env section of the workflow file."
+                    exit 1
+                  fi
+
+                  if [ -z "$PROMPT_LOCATION" ] && [ -z "$PROMPT_STRING" ]; then
+                    echo "Error: Neither PROMPT_LOCATION nor PROMPT_STRING is set."
+                    echo "Please set one in the env section of the workflow file."
+                    exit 1
+                  fi
+
+                  if [ -n "$PROMPT_LOCATION" ]; then
+                    echo "Prompt location: $PROMPT_LOCATION"
+                  else
+                    echo "Using inline PROMPT_STRING (${#PROMPT_STRING} characters)"
+                  fi
+                  echo "LLM model: $LLM_MODEL"
+                  if [ -n "$LLM_BASE_URL" ]; then
+                    echo "LLM base URL: $LLM_BASE_URL"
+                  fi
+
+            - name: Run task
+              env:
+                  LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
+                  GITHUB_TOKEN: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+                  PYTHONPATH: ''
+              run: |
+                  echo "Running agent script: $AGENT_SCRIPT_URL"
+
+                  # Download script if it's a URL
+                  if [[ "$AGENT_SCRIPT_URL" =~ ^https?:// ]]; then
+                    echo "Downloading agent script from URL..."
+                    curl -fsSL "$AGENT_SCRIPT_URL" -o /tmp/agent_script.py
+                    AGENT_SCRIPT_PATH="/tmp/agent_script.py"
+                  else
+                    AGENT_SCRIPT_PATH="$AGENT_SCRIPT_URL"
+                  fi
+
+                  # Run with appropriate prompt argument
+                  if [ -n "$PROMPT_LOCATION" ]; then
+                    echo "Using prompt from: $PROMPT_LOCATION"
+                    uv run python "$AGENT_SCRIPT_PATH" "$PROMPT_LOCATION"
+                  else
+                    echo "Using PROMPT_STRING (${#PROMPT_STRING} characters)"
+                    uv run python "$AGENT_SCRIPT_PATH"
+                  fi
+
+            - name: Upload logs as artifact
+              uses: actions/upload-artifact@v7
+              if: always()
+              with:
+                  name: openhands-task-logs
+                  path: |
+                      *.log
+                      output/
+                  retention-days: 7
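Read as a predicate, the "Need Reviewers" rule in the prompt above reduces to three empty-collection checks. The following is a minimal Python sketch, assuming the two API responses have already been fetched and parsed into a dict and a list. `needs_reviewers` is a hypothetical helper name used only for illustration; it is not part of the workflow or the SDK.

```python
def needs_reviewers(pr: dict, reviews: list) -> bool:
    """Return True only when nobody was requested and nobody has reviewed.

    `pr` is the parsed body of GET /repos/{owner}/{repo}/pulls/{pull_number};
    `reviews` is the parsed body of the matching .../reviews call.
    (Hypothetical helper, for illustration only.)
    """
    return (
        not pr.get("requested_reviewers")  # no pending user review requests
        and not pr.get("requested_teams")  # no pending team review requests
        and not reviews                    # no reviews submitted so far
    )


# The three example responses quoted in the prompt.
case_1 = needs_reviewers(
    {"requested_reviewers": [{"login": "someuser"}], "requested_teams": []}, []
)
case_2 = needs_reviewers(
    {"requested_reviewers": [], "requested_teams": []},
    [{"user": {"login": "someuser"}, "state": "COMMENTED"}],
)
case_3 = needs_reviewers({"requested_reviewers": [], "requested_teams": []}, [])
print(case_1, case_2, case_3)  # prints: False False True
```

The additional criteria (not a draft, older than one day, CI passing) would be separate checks layered on top of this predicate.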
@@ -0,0 +1,36 @@
+---
+name: Auto-label New Issues
+
+on:
+    issues:
+        types: [opened]
+
+permissions:
+    issues: write
+
+jobs:
+    add-triage-label:
+        runs-on: ubuntu-latest
+        steps:
+            - name: Add needs-triage label
+              uses: actions/github-script@v8
+              with:
+                  github-token: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+                  script: |
+                      // Get the issue details
+                      const issue = context.payload.issue;
+                      const labels = issue.labels.map(label => label.name);
+
+                      // Check if issue has already been triaged
+                      const hasEnhancement = labels.includes('enhancement');
+                      const hasPriority = labels.some(label => label.startsWith('priority'));
+
+                      // Only add needs-triage if not already triaged
+                      if (!hasEnhancement && !hasPriority) {
+                        await github.rest.issues.addLabels({
+                          owner: context.repo.owner,
+                          repo: context.repo.repo,
+                          issue_number: context.issue.number,
+                          labels: ['needs-triage']
+                        });
+                      }
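The label check in the script above is a small pure function of the issue's label names. A Python sketch of the same decision follows, for readers who want to test the rule in isolation. `should_add_needs_triage` is a hypothetical name, and the sketch assumes the label names have already been extracted from the event payload.

```python
def should_add_needs_triage(label_names: list[str]) -> bool:
    """Mirror the workflow's rule: label only issues that are not yet triaged.

    An issue counts as triaged when it carries the "enhancement" label or any
    label whose name starts with "priority". (Hypothetical helper.)
    """
    has_enhancement = "enhancement" in label_names
    has_priority = any(name.startswith("priority") for name in label_names)
    return not has_enhancement and not has_priority


print(should_add_needs_triage(["bug"]))                   # prints: True
print(should_add_needs_triage(["bug", "priority:high"]))  # prints: False
print(should_add_needs_triage(["enhancement"]))           # prints: False
```

Note that the prefix test matches any label beginning with `priority`, not only the `priority:low`, `priority:medium`, and `priority:high` values.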
@@ -0,0 +1,25 @@
+---
+# .github/workflows/check-docstrings.yml
+name: Check Docstrings
+
+on:
+    push:
+        branches: [main]
+    pull_request:
+        branches: ['**']
+
+jobs:
+    check-docstrings:
+        runs-on: ubuntu-24.04
+
+        steps:
+            - name: Checkout code
+              uses: actions/checkout@v5
+
+            - name: Set up Python
+              uses: actions/setup-python@v6
+              with:
+                  python-version: '3.13'
+
+            - name: Check docstring formatting
+              run: python .github/scripts/check_docstrings.py
@@ -0,0 +1,59 @@
+---
+name: '[Optional] Docs example'
+
+on:
+    pull_request:
+        branches:
+            - '**'
+        paths:
+            - examples/**/*.py
+            - '!examples/03_github_workflows/**'
+            - '!examples/04_llm_specific_tools/**'
+            - .github/workflows/check-documented-examples.yml
+            - .github/scripts/check_documented_examples.py
+    workflow_dispatch:
+
+permissions:
+    contents: read
+    pull-requests: read
+
+jobs:
+    check-examples:
+        runs-on: ubuntu-latest
+        steps:
+            - name: Checkout agent-sdk repository
+              uses: actions/checkout@v5
+              with:
+                  fetch-depth: 0
+
+            - name: Checkout docs repository (try feature branch)
+              uses: actions/checkout@v5
+              continue-on-error: true
+              id: checkout-feature
+              with:
+                  repository: OpenHands/docs
+                  path: docs
+                  fetch-depth: 0
+                  ref: ${{ github.head_ref || github.ref_name }}
+
+            - name: Checkout docs repository (fallback to main)
+              if: steps.checkout-feature.outcome == 'failure'
+              uses: actions/checkout@v5
+              with:
+                  repository: OpenHands/docs
+                  path: docs
+                  fetch-depth: 0
+                  ref: main
+
+            - name: Set up Python
+              uses: actions/setup-python@v6
+              with:
+                  python-version: '3.13'
+
+            - name: Check documented examples
+              env:
+                  DOCS_PATH: ${{ github.workspace }}/docs
+              shell: bash
+              run: |
+                  set -euo pipefail
+                  python .github/scripts/check_documented_examples.py
@@ -0,0 +1,35 @@
+---
+name: Check duplicate example numbers
+
+on:
+    pull_request:
+        branches:
+            - '**'
+        paths:
+            - examples/**
+            - .github/workflows/check-duplicate-examples.yml
+            - .github/scripts/check_duplicate_example_numbers.py
+    push:
+        branches:
+            - main
+        paths:
+            - examples/**
+    workflow_dispatch:
+
+permissions:
+    contents: read
+
+jobs:
+    check-duplicates:
+        runs-on: ubuntu-latest
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+
+            - name: Set up Python
+              uses: actions/setup-python@v6
+              with:
+                  python-version: '3.13'
+
+            - name: Check for duplicate example numbers
+              run: python .github/scripts/check_duplicate_example_numbers.py
@@ -1,65 +0,0 @@
-name: Check Package Versions
-
-on:
-  push:
-    branches: [main]
-  pull_request:
-  workflow_dispatch:
-
-jobs:
-  check-package-versions:
-    runs-on: ubuntu-latest
-
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-
-      - name: Set up Python
-        uses: actions/setup-python@v6
-        with:
-          python-version: "3.12"
-
-      - name: Check for any 'rev' fields in pyproject.toml
-        run: |
-          python - <<'PY'
-          import sys, tomllib, pathlib
-
-          path = pathlib.Path("pyproject.toml")
-          if not path.exists():
-              print("❌ ERROR: pyproject.toml not found")
-              sys.exit(1)
-
-          try:
-              data = tomllib.loads(path.read_text(encoding="utf-8"))
-          except Exception as e:
-              print(f"❌ ERROR: Failed to parse pyproject.toml: {e}")
-              sys.exit(1)
-
-          poetry = data.get("tool", {}).get("poetry", {})
-          sections = {
-              "dependencies": poetry.get("dependencies", {}),
-          }
-
-          errors = []
-
-          print("🔍 Checking for any dependencies with 'rev' fields...\n")
-          for section_name, deps in sections.items():
-              if not isinstance(deps, dict):
-                  continue
-
-              for pkg_name, cfg in deps.items():
-                  if isinstance(cfg, dict) and "rev" in cfg:
-                      msg = f"  ✖ {pkg_name} in [{section_name}] uses rev='{cfg['rev']}' (NOT ALLOWED)"
-                      print(msg)
-                      errors.append(msg)
-                  else:
-                      print(f"  • {pkg_name}: OK")
-
-          if errors:
-              print("\n❌ FAILED: Found dependencies using 'rev' fields:\n" + "\n".join(errors))
-              print("\nPlease use versioned releases instead, e.g.:")
-              print('  my-package = "1.0.0"')
-              sys.exit(1)
-
-          print("\n✅ SUCCESS: No 'rev' fields found. All dependencies are using proper versioned releases.")
-          PY
@@ -0,0 +1,244 @@
+---
+name: Run Condenser Tests
+
+on:
+    # Use pull_request_target to access secrets even on fork PRs
+    # This is safe because we only run when the 'condenser-test' label is added by a maintainer
+    pull_request_target:
+        types:
+            - labeled
+    workflow_dispatch:
+        inputs:
+            reason:
+                description: Reason for manual trigger
+                required: true
+                default: ''
+
+env:
+    N_PROCESSES: 2 # Fewer parallel processes for condenser tests (only 2 LLMs)
+
+jobs:
+    post-initial-comment:
+        if: >
+            github.event_name == 'pull_request_target' &&
+            github.event.label.name == 'condenser-test'
+        runs-on: ubuntu-latest
+        permissions:
+            pull-requests: write
+        steps:
+            - name: Comment on PR
+              uses: KeisukeYamashita/create-comment@v1
+              with:
+                  unique: false
+                  comment: |
+                      Hi! I started running the condenser tests on your PR. You will receive a comment with the results shortly.
+
+                      Note: These are non-blocking tests that validate condenser functionality across different LLMs.
+
+    run-condenser-tests:
+        # Security: Only run when condenser-test label is present or via workflow_dispatch
+        # This prevents automatic execution on fork PRs without maintainer approval
+        if: |
+            always() && (
+                (
+                    github.event_name == 'pull_request_target' &&
+                    github.event.label.name == 'condenser-test'
+                ) ||
+                github.event_name == 'workflow_dispatch'
+            )
+        runs-on: ubuntu-22.04
+        permissions:
+            contents: read
+            id-token: write
+            pull-requests: write
+        strategy:
+            matrix:
+                python-version: ['3.13']
+                job-config:
+                    # Only run against 2 LLMs for condenser tests:
+                    # - Claude Opus 4.5 (primary - supports thinking blocks)
+                    # - GPT-5.1 Codex Max (secondary - cross-LLM validation)
+                    - name: Claude Opus 4.5
+                      run-suffix: opus_condenser_run
+                      llm-config:
+                          model: litellm_proxy/anthropic/claude-opus-4-5-20251101
+                          extended_thinking: true
+                    - name: GPT-5.1 Codex Max
+                      run-suffix: gpt51_condenser_run
+                      llm-config:
+                          model: litellm_proxy/gpt-5.1-codex-max
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+              with:
+                  # For pull_request_target: checkout fork PR code (requires explicit repository)
+                  # For other events: fallback to current repository and ref
+                  repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
+                  ref: ${{ github.event.pull_request.head.sha || github.ref }}
+                  # Security: Don't persist credentials to prevent untrusted PR code from using them
+                  persist-credentials: false
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: ${{ matrix.python-version }}
+
+            - name: Install Python dependencies using uv
+              run: |
+                  uv sync --dev
+                  uv pip install pytest
+
+            - name: Run condenser test evaluation for ${{ matrix.job-config.name }}
+              env:
+                  LLM_CONFIG: ${{ toJson(matrix.job-config.llm-config) }}
+                  LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
+                  LLM_BASE_URL: https://llm-proxy.app.all-hands.dev
+              run: |
+                  set -eo pipefail
+
+                  AGENT_SDK_VERSION=$(git rev-parse --short HEAD)
+                  EVAL_NOTE="${AGENT_SDK_VERSION}_${{ matrix.job-config.run-suffix }}"
+
+                  echo "Running condenser tests only (c*.py pattern)"
+
+                  uv run python tests/integration/run_infer.py \
+                    --llm-config "$LLM_CONFIG" \
+                    --num-workers "$N_PROCESSES" \
+                    --eval-note "$EVAL_NOTE" \
+                    --test-type condenser
+
+                  # get condenser tests JSON results
+                  RESULTS_FILE=$(find tests/integration/outputs/*${{ matrix.job-config.run-suffix }}* -name "results.json" -type f | head -n 1)
+                  echo "RESULTS_FILE: $RESULTS_FILE"
+                  if [ -f "$RESULTS_FILE" ]; then
+                    echo "JSON_RESULTS_FILE=$RESULTS_FILE" >> $GITHUB_ENV
+                  else
+                    echo "JSON_RESULTS_FILE=" >> $GITHUB_ENV
+                  fi
+
+            - name: Wait a little bit
+              run: sleep 10
+
+            - name: Create archive of evaluation outputs
+              run: |
+                  TIMESTAMP=$(date +'%y-%m-%d-%H-%M')
+                  cd tests/integration/outputs  # Change to the outputs directory
+                  tar -czvf ../../../condenser_tests_${{ matrix.job-config.run-suffix }}_${TIMESTAMP}.tar.gz *${{ matrix.job-config.run-suffix }}* # Include result directories for this model
+
+            - name: Upload evaluation results as artifact
+              uses: actions/upload-artifact@v7
+              id: upload_results_artifact
+              with:
+                  name: condenser-test-outputs-${{ matrix.job-config.run-suffix }}-${{ github.run_id }}-${{ github.run_attempt }}
+                  path: condenser_tests_${{ matrix.job-config.run-suffix }}_*.tar.gz
+
+            - name: Save test results for consolidation
+              run: |
+                  # Copy the structured JSON results file for consolidation
+                  mkdir -p test_results_summary
+
+                  if [ -n "${{ env.JSON_RESULTS_FILE }}" ] && [ -f "${{ env.JSON_RESULTS_FILE }}" ]; then
+                    # Copy the JSON results file directly
+                    cp "${{ env.JSON_RESULTS_FILE }}" "test_results_summary/${{ matrix.job-config.run-suffix }}_results.json"
+                    echo "✓ Copied JSON results file for consolidation"
+                  else
+                    echo "✗ No JSON results file found"
+                    exit 1
+                  fi
+
+            - name: Upload test results summary
+              uses: actions/upload-artifact@v7
+              with:
+                  name: test-results-${{ matrix.job-config.run-suffix }}
+                  path: test_results_summary/${{ matrix.job-config.run-suffix }}_results.json
+
+    consolidate-results:
+        needs: run-condenser-tests
+        if: |
+            always() && (
+                (
+                    github.event_name == 'pull_request_target' &&
+                    github.event.label.name == 'condenser-test'
+                ) ||
+                github.event_name == 'workflow_dispatch'
+            )
+        runs-on: ubuntu-24.04
+        permissions:
+            contents: read
+            pull-requests: write
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+              with:
+                  # When using pull_request_target, explicitly check out the PR head commit
+                  # This ensures we use the scripts from the actual PR code
+                  ref: ${{ github.event.pull_request.head.sha || github.ref }}
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: '3.13'
+
+            - name: Install Python dependencies using uv
+              run: |
+                  uv sync --dev
+
+            - name: Download all test results
+              uses: actions/download-artifact@v8
+              with:
+                  pattern: test-results-*
+                  merge-multiple: true
+                  path: all_results
+
+            - name: Download all condenser test artifacts
+              uses: actions/download-artifact@v8
+              with:
+                  pattern: condenser-test-outputs-*
+                  path: artifacts
+
+            - name: Consolidate test results
+              env:
+                  EVENT_NAME: ${{ github.event_name }}
+                  PR_NUMBER: ${{ github.event.pull_request.number }}
+                  MANUAL_REASON: ${{ github.event.inputs.reason }}
+                  COMMIT_SHA: ${{ github.sha }}
+                  PYTHONPATH: ${{ github.workspace }}
+                  GITHUB_SERVER_URL: ${{ github.server_url }}
+                  GITHUB_REPOSITORY: ${{ github.repository }}
+                  GITHUB_RUN_ID: ${{ github.run_id }}
+              run: |
+                  uv run python tests/integration/utils/consolidate_json_results.py \
+                    --results-dir all_results \
+                    --artifacts-dir artifacts \
+                    --output-file consolidated_results.json
+
+                  echo "Consolidated results generated successfully"
+
+                  uv run python tests/integration/utils/generate_markdown_report.py \
+                    --input-file consolidated_results.json \
+                    --output-file consolidated_report.md
+
+            - name: Upload consolidated report
+              uses: actions/upload-artifact@v7
+              with:
+                  name: consolidated-condenser-report
+                  path: consolidated_report.md
+
+            - name: Create consolidated PR comment
+              if: github.event_name == 'pull_request_target'
+              run: |
+                  # Add header to clarify these are non-blocking tests
+                  echo "## Condenser Test Results (Non-Blocking)" > final_report.md
+                  echo "" >> final_report.md
+                  echo "> These tests validate condenser functionality and do not block PR merges." >> final_report.md
+                  echo "" >> final_report.md
+                  cat consolidated_report.md >> final_report.md
+
+                  # Sanitize @OpenHands mentions to prevent self-mention loops
+                  COMMENT_BODY=$(uv run python -c "from openhands.sdk.utils.github import sanitize_openhands_mentions; import sys; print(sanitize_openhands_mentions(sys.stdin.read()), end='')" < final_report.md)
+                  # Use GitHub CLI to create comment with explicit PR number
+                  echo "$COMMENT_BODY" | gh pr comment ${{ github.event.pull_request.number }} --body-file -
+              env:
+                  GH_TOKEN: ${{ github.token }}
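The `if:` condition shared by `run-condenser-tests` and `consolidate-results` can be restated as a two-input predicate. The Python sketch below is only an illustration of that gating logic. `should_run_condenser_tests` is a hypothetical name, and `label_name` stands for `github.event.label.name`, that is, the label that triggered the `labeled` event rather than any label already on the PR.

```python
from typing import Optional


def should_run_condenser_tests(event_name: str, label_name: Optional[str] = None) -> bool:
    """Restate the workflow's gating condition (hypothetical helper).

    Manual dispatch always runs; a pull_request_target event runs only when
    the label that was just added is exactly "condenser-test".
    """
    if event_name == "workflow_dispatch":
        return True
    return event_name == "pull_request_target" and label_name == "condenser-test"


print(should_run_condenser_tests("workflow_dispatch"))                      # prints: True
print(should_run_condenser_tests("pull_request_target", "condenser-test"))  # prints: True
print(should_run_condenser_tests("pull_request_target", "bug"))             # prints: False
```

Because the check looks at the triggering label, adding an unrelated label to a PR that already has `condenser-test` does not start another run.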
@@ -0,0 +1,23 @@
+---
+name: Dispatch to docs repo
+
+on:
+    push:
+        branches:
+            - main
+        paths:
+            - openhands-agent-server/**
+    workflow_dispatch:
+jobs:
+    dispatch:
+        runs-on: ubuntu-24.04
+        permissions:
+            contents: write
+        steps:
+            - name: Trigger docs repo sync
+              uses: peter-evans/repository-dispatch@v4
+              with:
+                  token: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+                  repository: OpenHands/docs
+                  event-type: update
+                  client-payload: '{"ref": "${{ github.ref }}", "sha": "${{ github.sha }}"}'
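For orientation, the request that this step results in can be sketched as a JSON body. The sketch assumes the action forwards its inputs to GitHub's repository dispatch endpoint (`POST /repos/{owner}/{repo}/dispatches`), which accepts `event_type` and `client_payload` fields. `build_dispatch_body` is a hypothetical helper, and the ref and SHA values are placeholders.

```python
import json


def build_dispatch_body(ref: str, sha: str, event_type: str = "update") -> str:
    """Serialize a repository_dispatch request body (hypothetical helper).

    Using json.dumps avoids hand-assembling the JSON string, so values that
    contain quotes or backslashes are escaped correctly.
    """
    return json.dumps(
        {"event_type": event_type, "client_payload": {"ref": ref, "sha": sha}}
    )


body = build_dispatch_body("refs/heads/main", "0123abc")  # placeholder values
print(body)
```

On the receiving side, a workflow in the docs repository would read these values from the `client_payload` object of the `repository_dispatch` event.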
@@ -0,0 +1,24 @@
+---
+name: Deprecation deadlines
+
+on:
+    push:
+        branches: [main]
+    pull_request:
+        branches: ['**']
+
+jobs:
+    check:
+        runs-on: ubuntu-24.04
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+                  python-version: '3.13'
+
+            - name: Verify deprecation removals
+              run: uv run --with packaging python .github/scripts/check_deprecations.py
@@ -1,228 +0,0 @@
-name: End-to-End Tests
-
-on:
-  pull_request:
-    types: [opened, synchronize, reopened, labeled]
-    branches:
-      - main
-      - develop
-  workflow_dispatch:
-
-jobs:
-  e2e-tests:
-    if: contains(github.event.pull_request.labels.*.name, 'end-to-end') || github.event_name == 'workflow_dispatch'
-    runs-on: ubuntu-latest
-    timeout-minutes: 60
-
-    env:
-      GITHUB_REPO_NAME: ${{ github.repository }}
-
-    steps:
-      - name: Checkout code
-        uses: actions/checkout@v4
-
-      - name: Install poetry via pipx
-        uses: abatilo/actions-poetry@v4
-        with:
-          poetry-version: 2.1.3
-
-      - name: Set up Python
-        uses: actions/setup-python@v6
-        with:
-          python-version: '3.12'
-          cache: 'poetry'
-
-      - name: Install system dependencies
-        run: |
-          sudo apt-get update
-          sudo apt-get install -y libgtk-3-0 libnotify4 libnss3 libxss1 libxtst6 xauth xvfb libgbm1 libasound2t64 netcat-openbsd
-
-      - name: Setup Node.js
-        uses: actions/setup-node@v6
-        with:
-          node-version: '22'
-          cache: 'npm'
-          cache-dependency-path: 'frontend/package-lock.json'
-
-      - name: Setup environment for end-to-end tests
-        run: |
-          # Create test results directory
-          mkdir -p test-results
-
-          # Create downloads directory for OpenHands (use a directory in the home folder)
-          mkdir -p $HOME/downloads
-          sudo chown -R $USER:$USER $HOME/downloads
-          sudo chmod -R 755 $HOME/downloads
-
-      - name: Build OpenHands
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          LLM_MODEL: ${{ secrets.LLM_MODEL || 'gpt-4o' }}
-          LLM_API_KEY: ${{ secrets.LLM_API_KEY || 'test-key' }}
-          LLM_BASE_URL: ${{ secrets.LLM_BASE_URL }}
-          INSTALL_DOCKER: 1
-          RUNTIME: docker
-          FRONTEND_PORT: 12000
-          FRONTEND_HOST: 0.0.0.0
-          BACKEND_HOST: 0.0.0.0
-          BACKEND_PORT: 3000
-          ENABLE_BROWSER: true
-          INSTALL_PLAYWRIGHT: 1
-        run: |
-          # Fix poetry.lock file if needed
-          echo "Fixing poetry.lock file if needed..."
-          poetry lock
-
-          # Build OpenHands using make build
-          echo "Running make build..."
-          make build
-
-          # Install Chromium Headless Shell for Playwright (needed for pytest-playwright)
-          echo "Installing Chromium Headless Shell for Playwright..."
-          poetry run playwright install chromium-headless-shell
-
-          # Verify Playwright browsers are installed (for e2e tests only)
-          echo "Verifying Playwright browsers installation for e2e tests..."
-          BROWSER_CHECK=$(poetry run python tests/e2e/check_playwright.py 2>/dev/null)
-
-          if [ "$BROWSER_CHECK" != "chromium_found" ]; then
-            echo "ERROR: Chromium browser not found or not working for e2e tests"
-            echo "$BROWSER_CHECK"
-            exit 1
-          else
-            echo "Playwright browsers are properly installed for e2e tests."
-          fi
-
-          # Docker runtime will handle workspace directory creation
-
-          # Start the application using make run with custom parameters and reduced logging
-          echo "Starting OpenHands using make run..."
-          # Set environment variables to reduce logging verbosity
-          export PYTHONUNBUFFERED=1
-          export LOG_LEVEL=WARNING
-          export UVICORN_LOG_LEVEL=warning
-          export OPENHANDS_LOG_LEVEL=WARNING
-          FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0 make run > /tmp/openhands-e2e-test.log 2>&1 &
-
-          # Store the PID of the make run process
-          MAKE_PID=$!
-          echo "OpenHands started with PID: $MAKE_PID"
-
-          # Wait for the application to start
-          echo "Waiting for OpenHands to start..."
-          max_attempts=15
-          attempt=1
-
-          while [ $attempt -le $max_attempts ]; do
-            echo "Checking if OpenHands is running (attempt $attempt of $max_attempts)..."
-
-            # Check if the process is still running
-            if ! ps -p $MAKE_PID > /dev/null; then
-              echo "ERROR: OpenHands process has terminated unexpectedly"
-              echo "Last 50 lines of the log:"
-              tail -n 50 /tmp/openhands-e2e-test.log
-              exit 1
-            fi
-
-            # Check if frontend port is open
-            if nc -z localhost 12000; then
-              # Verify we can get HTML content
-              if curl -s http://localhost:12000 | grep -q "<html"; then
-                echo "SUCCESS: OpenHands is running and serving HTML content on port 12000"
-                break
-              else
-                echo "Port 12000 is open but not serving HTML content yet"
-              fi
-            else
-              echo "Frontend port 12000 is not open yet"
-            fi
-
-            # Show log output on each attempt
-            echo "Recent log output:"
-            tail -n 20 /tmp/openhands-e2e-test.log
-
-            # Wait before next attempt
-            echo "Waiting 10 seconds before next check..."
-            sleep 10
-            attempt=$((attempt + 1))
-
-            # Exit if we've reached the maximum number of attempts
-            if [ $attempt -gt $max_attempts ]; then
-              echo "ERROR: OpenHands failed to start after $max_attempts attempts"
-              echo "Last 50 lines of the log:"
-              tail -n 50 /tmp/openhands-e2e-test.log
-              exit 1
-            fi
-          done
-
-          # Final verification that the app is running
-          if ! nc -z localhost 12000 || ! curl -s http://localhost:12000 | grep -q "<html"; then
-            echo "ERROR: OpenHands is not running properly on port 12000"
-            echo "Last 50 lines of the log:"
-            tail -n 50 /tmp/openhands-e2e-test.log
-            exit 1
-          fi
-
-          # Print success message
-          echo "OpenHands is running successfully on port 12000"
-
-      - name: Run end-to-end tests
-        env:
-          GITHUB_TOKEN: ${{ secrets.E2E_TEST_GITHUB_TOKEN }}
-          LLM_MODEL: ${{ secrets.LLM_MODEL || 'gpt-4o' }}
-          LLM_API_KEY: ${{ secrets.LLM_API_KEY || 'test-key' }}
-          LLM_BASE_URL: ${{ secrets.LLM_BASE_URL }}
-        run: |
-          # Check if the application is running
-          if ! nc -z localhost 12000; then
-            echo "ERROR: OpenHands is not running on port 12000"
-            echo "Last 50 lines of the log:"
-            tail -n 50 /tmp/openhands-e2e-test.log
-            exit 1
-          fi
-
-          # Run the tests with detailed output
-          cd tests/e2e
-          poetry run python -m pytest \
-            test_settings.py::test_github_token_configuration \
-            test_conversation.py::test_conversation_start \
-            test_browsing_catchphrase.py::test_browsing_catchphrase \
-            test_multi_conversation_resume.py::test_multi_conversation_resume \
-            -v --no-header --capture=no --timeout=900
-
-      - name: Upload test results
-        if: always()
-        uses: actions/upload-artifact@v6
-        with:
-          name: playwright-report
-          path: tests/e2e/test-results/
-          retention-days: 30
-
-      - name: Upload OpenHands logs
-        if: always()
-        uses: actions/upload-artifact@v6
-        with:
-          name: openhands-logs
-          path: |
-            /tmp/openhands-e2e-test.log
-            /tmp/openhands-e2e-build.log
-            /tmp/openhands-backend.log
-            /tmp/openhands-frontend.log
-            /tmp/backend-health-check.log
-            /tmp/frontend-check.log
-            /tmp/vite-config.log
-            /tmp/makefile-contents.log
-          retention-days: 30
-
-      - name: Cleanup
-        if: always()
-        run: |
-          # Stop OpenHands processes
-          echo "Stopping OpenHands processes..."
-          pkill -f "python -m openhands.server" || true
-          pkill -f "npm run dev" || true
-          pkill -f "make run" || true
-
-          # Print process status for debugging
-          echo "Checking if any OpenHands processes are still running:"
-          ps aux | grep -E "openhands|npm run dev" || true
@@ -1,52 +0,0 @@
-name: Enterprise Check Migrations
-
-on:
-  pull_request:
-    paths:
-      - 'enterprise/migrations/**'
-
-jobs:
-  check-sync:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout PR branch
-        uses: actions/checkout@v4
-        with:
-          ref: ${{ github.event.pull_request.head.sha }}
-          fetch-depth: 0
-
-
-      - name: Fetch base branch
-        run: git fetch origin ${{ github.event.pull_request.base.ref }}
-
-      - name: Check if base branch is ancestor of PR
-        id: check_up_to_date
-        shell: bash
-        run: |
-          BASE="origin/${{ github.event.pull_request.base.ref }}"
-          HEAD="${{ github.event.pull_request.head.sha }}"
-          if git merge-base --is-ancestor "$BASE" "$HEAD"; then
-            echo "We're up to date with base $BASE"
-            exit 0
-          else
-            echo "NOT up to date with base $BASE"
-            exit 1
-          fi
-
-      - name: Find Comment
-        uses: peter-evans/find-comment@v3
-        id: find-comment
-        with:
-          issue-number: ${{ github.event.pull_request.number }}
-          comment-author: 'github-actions[bot]'
-          body-includes: |
-            ⚠️ This PR contains **migrations**
-
-      - name: Comment warning on PR
-        uses: peter-evans/create-or-update-comment@v5
-        with:
-          issue-number: ${{ github.event.pull_request.number }}
-          comment-id: ${{ steps.find-comment.outputs.comment-id }}
-          edit-mode: replace
-          body: |
-            ⚠️ This PR contains **migrations**. Please synchronize before merging to prevent conflicts.
@@ -1,29 +0,0 @@
-# Feature branch preview for enterprise code
-name: Enterprise Preview
-
-# Run on PRs labeled
-on:
-  pull_request:
-    types: [labeled]
-
-# Match ghcr-build.yml, but don't interrupt it.
-concurrency:
-  group: ${{ github.workflow }}-${{ (github.head_ref && github.ref) || github.run_id }}
-  cancel-in-progress: false
-
-jobs:
-  # This must happen for the PR Docker workflow when the label is present,
-  # and also if it's added after the fact. Thus, it exists in both places.
-  enterprise-preview:
-    name: Enterprise preview
-    if: github.event.label.name == 'deploy'
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    steps:
-      # This should match the version in ghcr-build.yml
-      - name: Trigger remote job
-        run: |
-          curl --fail-with-body -sS -X POST \
-            -H "Authorization: Bearer ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}" \
-            -H "Accept: application/vnd.github+json" \
-            -d "{\"ref\": \"main\", \"inputs\": {\"openhandsPrNumber\": \"${{ github.event.pull_request.number }}\", \"deployEnvironment\": \"feature\", \"enterpriseImageTag\": \"pr-${{ github.event.pull_request.number }}\" }}" \
-            https://api.github.com/repos/OpenHands/deploy/actions/workflows/deploy.yaml/dispatches
@@ -1,47 +0,0 @@
-# Workflow that runs frontend e2e tests with Playwright
-name: Run Frontend E2E Tests
-
-on:
-  push:
-    branches:
-      - main
-  pull_request:
-    paths:
-      - "frontend/**"
-      - ".github/workflows/fe-e2e-tests.yml"
-
-concurrency:
-  group: ${{ github.workflow }}-${{ (github.head_ref && github.ref) || github.run_id }}
-  cancel-in-progress: true
-
-jobs:
-  fe-e2e-test:
-    name: FE E2E Tests
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    strategy:
-      matrix:
-        node-version: [22]
-      fail-fast: true
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-      - name: Set up Node.js
-        uses: useblacksmith/setup-node@v5
-        with:
-          node-version: ${{ matrix.node-version }}
-      - name: Install dependencies
-        working-directory: ./frontend
-        run: npm ci
-      - name: Install Playwright browsers
-        working-directory: ./frontend
-        run: npx playwright install --with-deps chromium
-      - name: Run Playwright tests
-        working-directory: ./frontend
-        run: npx playwright test --project=chromium
-      - name: Upload Playwright report
-        uses: actions/upload-artifact@v6
-        if: always()
-        with:
-          name: playwright-report
-          path: frontend/playwright-report/
-          retention-days: 30
@@ -1,44 +0,0 @@
-# Workflow that runs frontend unit tests
-name: Run Frontend Unit Tests
-
-# * Always run on "main"
-# * Run on PRs that have changes in the "frontend" folder or this workflow
-on:
-  push:
-    branches:
-      - main
-  pull_request:
-    paths:
-      - "frontend/**"
-      - ".github/workflows/fe-unit-tests.yml"
-
-# If triggered by a PR, it will be in the same group. However, each commit on main will be in its own unique group
-concurrency:
-  group: ${{ github.workflow }}-${{ (github.head_ref && github.ref) || github.run_id }}
-  cancel-in-progress: true
-
-jobs:
-  # Run frontend unit tests
-  fe-test:
-    name: FE Unit Tests
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    strategy:
-      matrix:
-        node-version: [22]
-      fail-fast: true
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-      - name: Set up Node.js
-        uses: useblacksmith/setup-node@v5
-        with:
-          node-version: ${{ matrix.node-version }}
-      - name: Install dependencies
-        working-directory: ./frontend
-        run: npm ci
-      - name: Run TypeScript compilation
-        working-directory: ./frontend
-        run: npm run build
-      - name: Run tests and collect coverage
-        working-directory: ./frontend
-        run: npm run test:coverage
@@ -1,288 +0,0 @@
-# Workflow that builds, tests and then pushes the OpenHands and runtime docker images to the ghcr.io repository
-name: Docker
-
-# Always run on "main"
-# Always run on tags
-# Always run on PRs
-# Can also be triggered manually
-on:
-  push:
-    branches:
-      - main
-    tags:
-      - "*"
-  pull_request:
-  workflow_dispatch:
-    inputs:
-      reason:
-        description: "Reason for manual trigger"
-        required: true
-        default: ""
-
-# If triggered by a PR, it will be in the same group. However, each commit on main will be in its own unique group
-concurrency:
-  group: ${{ github.workflow }}-${{ (github.head_ref && github.ref) || github.run_id }}
-  cancel-in-progress: true
-
-env:
-  RELEVANT_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
-
-jobs:
-  define-matrix:
-    runs-on: blacksmith
-    outputs:
-      base_image: ${{ steps.define-base-images.outputs.base_image }}
-    steps:
-      - name: Define base images
-        shell: bash
-        id: define-base-images
-        run: |
-          if [[ "$GITHUB_EVENT_NAME" == "pull_request" ]]; then
-            json=$(jq -n -c '[
-                { image: "nikolaik/python-nodejs:python3.12-nodejs22", tag: "nikolaik" }
-              ]')
-          else
-            json=$(jq -n -c '[
-                { image: "nikolaik/python-nodejs:python3.12-nodejs22", tag: "nikolaik" },
-                { image: "ubuntu:24.04", tag: "ubuntu" }
-              ]')
-          fi
-          echo "base_image=$json" >> "$GITHUB_OUTPUT"
-
-  # Builds the OpenHands Docker images
-  ghcr_build_app:
-    name: Build App Image
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    if: "!(github.event_name == 'push' && startsWith(github.ref, 'refs/tags/ext-v'))"
-    permissions:
-      contents: read
-      packages: write
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-        with:
-          ref: ${{ github.event.pull_request.head.sha }}
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3.7.0
-        with:
-          image: tonistiigi/binfmt:latest
-      - name: Login to GHCR
-        uses: docker/login-action@v3
-        with:
-          registry: ghcr.io
-          username: ${{ github.repository_owner }}
-          password: ${{ secrets.GITHUB_TOKEN }}
-      - name: Set up Docker Buildx
-        id: buildx
-        uses: docker/setup-buildx-action@v3
-      - name: Lowercase Repository Owner
-        run: |
-          echo REPO_OWNER=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]') >> $GITHUB_ENV
-      - name: Build and push app image
-        if: "!github.event.pull_request.head.repo.fork"
-        run: |
-          ./containers/build.sh -i openhands -o ${{ env.REPO_OWNER }} --push
-
-  # Builds the runtime Docker images
-  ghcr_build_runtime:
-    name: Build Runtime Image
-    runs-on: blacksmith-8vcpu-ubuntu-2204
-    if: "!(github.event_name == 'push' && startsWith(github.ref, 'refs/tags/ext-v'))"
-    permissions:
-      contents: read
-      packages: write
-    needs: define-matrix
-    strategy:
-      matrix:
-        base_image: ${{ fromJson(needs.define-matrix.outputs.base_image) }}
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-        with:
-          ref: ${{ github.event.pull_request.head.sha }}
-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3.7.0
-        with:
-          image: tonistiigi/binfmt:latest
-      - name: Login to GHCR
-        uses: docker/login-action@v3
-        with:
-          registry: ghcr.io
-          username: ${{ github.repository_owner }}
-          password: ${{ secrets.GITHUB_TOKEN }}
-      - name: Set up Docker Buildx
-        id: buildx
-        uses: docker/setup-buildx-action@v3
-      - name: Install poetry via pipx
-        run: pipx install poetry
-      - name: Set up Python
-        uses: useblacksmith/setup-python@v6
-        with:
-          python-version: "3.12"
-          cache: poetry
-      - name: Install Python dependencies using Poetry
-        run: make install-python-dependencies POETRY_GROUP=main INSTALL_PLAYWRIGHT=0
-      - name: Create source distribution and Dockerfile
-        run: poetry run python3 -m openhands.runtime.utils.runtime_build --base_image ${{ matrix.base_image.image }} --build_folder containers/runtime --force_rebuild
-      - name: Lowercase Repository Owner
-        run: |
-          echo REPO_OWNER=$(echo ${{ github.repository_owner }} | tr '[:upper:]' '[:lower:]') >> $GITHUB_ENV
-      - name: Short SHA
-        run: |
-          echo SHORT_SHA=$(git rev-parse --short "$RELEVANT_SHA") >> $GITHUB_ENV
-      - name: Determine docker build params
-        if: github.event.pull_request.head.repo.fork != true
-        shell: bash
-        run: |
-
-          ./containers/build.sh -i runtime -o ${{ env.REPO_OWNER }} -t ${{ matrix.base_image.tag }} --dry
-
-          DOCKER_BUILD_JSON=$(jq -c . < docker-build-dry.json)
-          echo "DOCKER_TAGS=$(echo "$DOCKER_BUILD_JSON" | jq -r '.tags | join(",")')" >> $GITHUB_ENV
-          echo "DOCKER_PLATFORM=$(echo "$DOCKER_BUILD_JSON" | jq -r '.platform')" >> $GITHUB_ENV
-          echo "DOCKER_BUILD_ARGS=$(echo "$DOCKER_BUILD_JSON" | jq -r '.build_args | join(",")')" >> $GITHUB_ENV
-      - name: Build and push runtime image ${{ matrix.base_image.image }}
-        if: github.event.pull_request.head.repo.fork != true
-        uses: useblacksmith/build-push-action@v1
-        with:
-          push: true
-          tags: ${{ env.DOCKER_TAGS }}
-          platforms: ${{ env.DOCKER_PLATFORM }}
-          # Caching directives to boost performance
-          cache-from: type=registry,ref=ghcr.io/${{ env.REPO_OWNER }}/runtime:buildcache-${{ matrix.base_image.tag }}
-          cache-to: type=registry,ref=ghcr.io/${{ env.REPO_OWNER }}/runtime:buildcache-${{ matrix.base_image.tag }},mode=max
-          build-args: ${{ env.DOCKER_BUILD_ARGS }}
-          context: containers/runtime
-          provenance: false
-      # Forked repos can't push to GHCR, so we just build in order to populate the cache for rebuilding
-      - name: Build runtime image ${{ matrix.base_image.image }} for fork
-        if: github.event.pull_request.head.repo.fork
-        uses: useblacksmith/build-push-action@v1
-        with:
-          tags: ghcr.io/${{ env.REPO_OWNER }}/runtime:${{ env.RELEVANT_SHA }}-${{ matrix.base_image.tag }}
-          context: containers/runtime
-      - name: Upload runtime source for fork
-        if: github.event.pull_request.head.repo.fork
-        uses: actions/upload-artifact@v6
-        with:
-          name: runtime-src-${{ matrix.base_image.tag }}
-          path: containers/runtime
-
-  ghcr_build_enterprise:
-    name: Push Enterprise Image
-    runs-on: blacksmith-8vcpu-ubuntu-2204
-    permissions:
-      contents: read
-      packages: write
-    needs: [define-matrix, ghcr_build_app]
-    # Do not build enterprise in forks
-    if: github.event.pull_request.head.repo.fork != true
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-        with:
-          ref: ${{ github.event.pull_request.head.sha }}
-
-      # Set up Docker Buildx for better performance
-      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-        with:
-          driver-opts: network=host
-
-      - name: Login to GHCR
-        uses: docker/login-action@v3
-        with:
-          registry: ghcr.io
-          username: ${{ github.repository_owner }}
-          password: ${{ secrets.GITHUB_TOKEN }}
-
-      - name: Extract metadata (tags, labels) for Docker
-        id: meta
-        uses: docker/metadata-action@v5
-        with:
-          images: ghcr.io/openhands/enterprise-server
-          tags: |
-            type=ref,event=branch
-            type=ref,event=pr
-            type=sha
-            type=sha,format=long
-            type=semver,pattern={{version}}
-            type=semver,pattern={{major}}.{{minor}}
-            type=semver,pattern={{major}}
-          flavor: |
-            latest=auto
-            prefix=
-            suffix=
-        env:
-          DOCKER_METADATA_PR_HEAD_SHA: true
-      - name: Determine app image tag
-        shell: bash
-        run: |
-          # Duplicated with build.sh
-          sanitized_ref_name=$(echo "$GITHUB_REF_NAME" | sed 's/[^a-zA-Z0-9.-]\+/-/g')
-          OPENHANDS_BUILD_VERSION=$sanitized_ref_name
-          sanitized_ref_name=$(echo "$sanitized_ref_name" | tr '[:upper:]' '[:lower:]') # lower case is required in tagging
-          echo "OPENHANDS_DOCKER_TAG=${sanitized_ref_name}" >> $GITHUB_ENV
-      - name: Build and push Docker image
-        uses: useblacksmith/build-push-action@v1
-        with:
-          context: .
-          file: enterprise/Dockerfile
-          push: true
-          tags: ${{ steps.meta.outputs.tags }}
-          labels: ${{ steps.meta.outputs.labels }}
-          build-args: |
-            OPENHANDS_VERSION=${{ env.OPENHANDS_DOCKER_TAG }}
-          platforms: linux/amd64
-          # Add build provenance
-          provenance: true
-          # Add build attestations for better security
-          sbom: true
-
-  enterprise-preview:
-    name: Enterprise preview
-    if: github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'deploy')
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    needs: [ghcr_build_enterprise]
-    steps:
-      # This should match the version in enterprise-preview.yml
-      - name: Trigger remote job
-        run: |
-          curl --fail-with-body -sS -X POST \
-            -H "Authorization: Bearer ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}" \
-            -H "Accept: application/vnd.github+json" \
-            -d "{\"ref\": \"main\", \"inputs\": {\"openhandsPrNumber\": \"${{ github.event.pull_request.number }}\", \"deployEnvironment\": \"feature\", \"enterpriseImageTag\": \"pr-${{ github.event.pull_request.number }}\" }}" \
-            https://api.github.com/repos/OpenHands/deploy/actions/workflows/deploy.yaml/dispatches
-
-  # "All Runtime Tests Passed" is a required job for PRs to merge
-  # We can remove this once the config changes
-  runtime_tests_check_success:
-    name: All Runtime Tests Passed
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    steps:
-      - name: All tests passed
-        run: echo "All runtime tests have passed successfully!"
-
-  update_pr_description:
-    name: Update PR Description
-    if: github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && github.actor != 'dependabot[bot]'
-    needs: [ghcr_build_runtime]
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-
-      - name: Get short SHA
-        id: short_sha
-        run: echo "SHORT_SHA=$(echo ${{ github.event.pull_request.head.sha }} | cut -c1-7)" >> $GITHUB_OUTPUT
-
-      - name: Update PR Description
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-          PR_NUMBER: ${{ github.event.pull_request.number }}
-          REPO: ${{ github.repository }}
-          SHORT_SHA: ${{ steps.short_sha.outputs.SHORT_SHA }}
-        shell: bash
-        run: |
-          echo "Updating PR description with Docker and uvx commands"
-          bash ${GITHUB_WORKSPACE}/.github/scripts/update_pr_description.sh
@@ -0,0 +1,477 @@
+---
+name: Run Integration Tests
+run-name: >-
+    Run Integration Tests ${{ inputs.reason || github.event.label.name || 'scheduled' }}
+
+on:
+    # Use pull_request_target to access secrets even on fork PRs
+    # This is safe because tests only run when a maintainer adds the 'integration-test' or 'behavior-test' label
+    pull_request_target:
+        types:
+            - labeled
+    workflow_dispatch:
+        inputs:
+            reason:
+                description: Reason for manual trigger
+                required: true
+                default: ''
+            test_type:
+                description: Select which tests to run (all, integration, behavior)
+                required: false
+                default: all
+            model_ids:
+                description: >-
+                    Comma-separated model IDs to test (from resolve_model_config.py).
+                    Example: claude-sonnet-4-6,glm-4.7. Defaults to a standard set.
+                required: false
+                default: ''
+                type: string
+            issue_number:
+                description: Issue or PR number to post results to (optional)
+                required: false
+                default: ''
+                type: string
+            tool_preset:
+                description: >-
+                    Tool preset for file editing (default, gemini, gpt5, planning).
+                    'default' uses FileEditorTool, 'gemini' uses read_file/write_file/edit/list_directory,
+                    'gpt5' uses apply_patch tool.
+                required: false
+                default: default
+                type: choice
+                options:
+                    - default
+                    - gemini
+                    - gpt5
+                    - planning
+    schedule:
+        - cron: 30 22 * * * # Runs at 10:30pm UTC every day
+
+env:
+    N_PROCESSES: 4 # Global configuration for number of parallel processes for evaluation
+    # Default models for scheduled/label-triggered runs (subset of models from resolve_model_config.py)
+    DEFAULT_MODEL_IDS: claude-sonnet-4-6,deepseek-v3.2-reasoner,kimi-k2-thinking,gemini-3-pro
+
+jobs:
+    setup-matrix:
+        runs-on: ubuntu-latest
+        outputs:
+            matrix: ${{ steps.resolve-models.outputs.matrix }}
+            issue_number: ${{ steps.resolve-issue.outputs.issue_number }}
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+              with:
+                  repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
+                  ref: ${{ github.event.pull_request.head.sha || github.ref }}
+                  persist-credentials: false
+
+            - name: Set up Python
+              uses: actions/setup-python@v5
+              with:
+                  python-version: '3.13'
+
+            - name: Resolve model configurations
+              id: resolve-models
+              env:
+                  MODEL_IDS_INPUT: ${{ github.event.inputs.model_ids || '' }}
+                  DEFAULT_MODEL_IDS: ${{ env.DEFAULT_MODEL_IDS }}
+              run: |
+                  # Use input model_ids if provided, otherwise use defaults
+                  if [ -z "$MODEL_IDS_INPUT" ]; then
+                    MODEL_IDS="$DEFAULT_MODEL_IDS"
+                    echo "No model_ids specified, using defaults: $MODEL_IDS"
+                  else
+                    MODEL_IDS="$MODEL_IDS_INPUT"
+                    echo "Using specified model_ids: $MODEL_IDS"
+                  fi
+
+                  # Resolve model configs using resolve_model_config.py
+                  # Transform output to matrix format for integration tests
+                  MATRIX=$(python3 << EOF
+                  import json
+                  import sys
+                  sys.path.insert(0, '.github/run-eval')
+                  from resolve_model_config import MODELS
+
+                  model_ids = "$MODEL_IDS".split(",")
+                  model_ids = [m.strip() for m in model_ids if m.strip()]
+
+                  matrix = []
+                  for model_id in model_ids:
+                      if model_id not in MODELS:
+                          available = ", ".join(sorted(MODELS.keys()))
+                          print(f"Error: Model ID '{model_id}' not found. Available: {available}", file=sys.stderr)
+                          sys.exit(1)
+                      model = MODELS[model_id]
+                      # Create run-suffix from model id (replace special chars with underscore)
+                      run_suffix = model_id.replace("-", "_").replace(".", "_") + "_run"
+                      matrix.append({
+                          "id": model_id,
+                          "name": model["display_name"],
+                          "run-suffix": run_suffix,
+                          "llm-config": model["llm_config"]
+                      })
+
+                  print(json.dumps(matrix))
+                  EOF
+                  )
+
+                  if [ $? -ne 0 ]; then
+                    echo "Failed to resolve model configurations" >&2
+                    exit 1
+                  fi
+
+                  echo "matrix=$MATRIX" >> "$GITHUB_OUTPUT"
+                  echo "Resolved models: $(echo "$MATRIX" | jq -r '.[].name' | paste -sd', ' -)"
+
+            - name: Resolve issue number
+              id: resolve-issue
+              env:
+                  ISSUE_NUMBER_INPUT: ${{ github.event.inputs.issue_number || '' }}
+                  PR_NUMBER: ${{ github.event.pull_request.number }}
+              run: |
+                  # Priority: explicit input > PR number from label trigger
+                  if [ -n "$ISSUE_NUMBER_INPUT" ]; then
+                    echo "issue_number=$ISSUE_NUMBER_INPUT" >> "$GITHUB_OUTPUT"
+                  elif [ -n "$PR_NUMBER" ]; then
+                    echo "issue_number=$PR_NUMBER" >> "$GITHUB_OUTPUT"
+                  else
+                    echo "issue_number=" >> "$GITHUB_OUTPUT"
+                  fi
+
+    # Post initial comment for label triggers (no dependencies - runs immediately)
+    post-label-comment:
+        if: >
+            github.event_name == 'pull_request_target' && (
+                github.event.label.name == 'integration-test' ||
+                github.event.label.name == 'behavior-test'
+            )
+        runs-on: ubuntu-latest
+        permissions:
+            pull-requests: write
+        steps:
+            - name: Comment on PR (integration tests via label)
+              if: github.event.label.name == 'integration-test'
+              uses: KeisukeYamashita/create-comment@v1
+              with:
+                  unique: false
+                  comment: |
+                      Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
+            - name: Comment on PR (behavior tests via label)
+              if: github.event.label.name == 'behavior-test'
+              uses: KeisukeYamashita/create-comment@v1
+              with:
+                  unique: false
+                  comment: |
+                      Hi! I started running the behavior tests on your PR. You will receive a comment with the results shortly.
+
+    # Post initial comment for workflow_dispatch (depends on setup-matrix for issue_number resolution)
+    post-dispatch-comment:
+        needs: setup-matrix
+        if: github.event_name == 'workflow_dispatch' && github.event.inputs.issue_number != ''
+        runs-on: ubuntu-latest
+        permissions:
+            issues: write
+        steps:
+            - name: Comment on issue/PR (workflow_dispatch)
+              env:
+                  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+                  ISSUE_NUMBER: ${{ github.event.inputs.issue_number }}
+                  MODEL_IDS: ${{ github.event.inputs.model_ids || 'all models' }}
+                  TEST_TYPE: ${{ github.event.inputs.test_type || 'all' }}
+                  REASON: ${{ github.event.inputs.reason }}
+              run: |
+                  # Sanitize @OpenHands mentions to prevent self-mention loops ($'...' lets bash expand \u200B; GNU sed reads \u as "uppercase next char")
+                  SANITIZED_REASON=$(echo "$REASON" | sed $'s/@OpenHands/@\u200BOpenHands/g; s/@openhands/@\u200Bopenhands/g')
+                  SANITIZED_MODEL_IDS=$(echo "$MODEL_IDS" | sed $'s/@OpenHands/@\u200BOpenHands/g; s/@openhands/@\u200Bopenhands/g')
+                  COMMENT_BODY=$(cat <<EOF
+                  **Integration Tests Triggered**
+
+                  - **Reason:** $SANITIZED_REASON
+                  - **Test type:** $TEST_TYPE
+                  - **Models:** $SANITIZED_MODEL_IDS
+                  - **Workflow run:** ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+
+                  Results will be posted here when complete.
+                  EOF
+                  )
+                  gh issue comment "$ISSUE_NUMBER" --repo "${{ github.repository }}" --body "$COMMENT_BODY"
+
+    run-integration-tests:
+        # Security: Only run when integration-related labels are present, via workflow_dispatch, or on schedule
+        # This prevents automatic execution on fork PRs without maintainer approval
+        # Note: uses always() to run even when comment jobs are skipped (e.g., for scheduled runs)
+        # Schedule trigger only runs in the main repository, not in forks
+        if: |
+            always() && (
+                (
+                    github.event_name == 'pull_request_target' && (
+                        github.event.label.name == 'integration-test' ||
+                        github.event.label.name == 'behavior-test'
+                    )
+                ) ||
+                github.event_name == 'workflow_dispatch' ||
+                (github.event_name == 'schedule' && github.repository == 'OpenHands/software-agent-sdk')
+            ) && needs.setup-matrix.result == 'success'
+        needs: [setup-matrix, post-label-comment, post-dispatch-comment]
+        runs-on: ubuntu-22.04
+        timeout-minutes: 180
+        permissions:
+            contents: read
+            id-token: write
+            pull-requests: write
+            issues: write
+        strategy:
+            fail-fast: false
+            matrix:
+                python-version: ['3.13']
+                job-config: ${{ fromJson(needs.setup-matrix.outputs.matrix) }}
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+              with:
+                  # For pull_request_target: checkout fork PR code (requires explicit repository)
+                  # For other events: fallback to current repository and ref
+                  repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
+                  ref: ${{ github.event.pull_request.head.sha || github.ref }}
+                  # Security: Don't persist credentials to prevent untrusted PR code from using them
+                  persist-credentials: false
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: ${{ matrix.python-version }}
+
+            - name: Install Python dependencies using uv
+              run: |
+                  uv sync --dev
+                  uv pip install pytest
+
+            # Run integration test evaluation
+            - name: Determine test selection
+              run: |
+                  TEST_TYPE_ARGS=""
+                  if [ "${{ github.event_name }}" = "pull_request_target" ] && [ "${{ github.event.label.name }}" = "behavior-test" ]; then
+                    TEST_TYPE_ARGS="--test-type behavior"
+                    echo "behavior-test label detected; running behavior tests only."
+                  elif [ "${{ github.event_name }}" = "pull_request_target" ] && [ "${{ github.event.label.name }}" = "integration-test" ]; then
+                    TEST_TYPE_ARGS="--test-type integration"
+                    echo "integration-test label detected; running integration tests only."
+                  elif [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+                    test_type="${{ github.event.inputs.test_type }}"
+                    case "$test_type" in
+                      behavior)
+                        TEST_TYPE_ARGS="--test-type behavior"
+                        echo "workflow_dispatch requested behavior tests only."
+                        ;;
+                      integration)
+                        TEST_TYPE_ARGS="--test-type integration"
+                        echo "workflow_dispatch requested integration tests only."
+                        ;;
+                      ""|all)
+                        echo "workflow_dispatch requested full integration suite."
+                        ;;
+                      *)
+                        echo "workflow_dispatch provided unknown test_type '$test_type'; defaulting to full suite."
+                        ;;
+                    esac
+                  elif [ "${{ github.event_name }}" = "schedule" ]; then
+                    TEST_TYPE_ARGS="--test-type integration"
+                    echo "Scheduled run; running integration tests only."
+                  else
+                    echo "Running full integration test suite."
+                  fi
+                  echo "TEST_TYPE_ARGS=$TEST_TYPE_ARGS" >> "$GITHUB_ENV"
+
+            - name: Run integration test evaluation for ${{ matrix.job-config['name'] }}
+              env:
+                  LLM_CONFIG: ${{ toJson(matrix.job-config['llm-config']) }}
+                  LLM_API_KEY: ${{ secrets.LLM_API_KEY_EVAL }}
+                  LLM_BASE_URL: https://llm-proxy.eval.all-hands.dev
+                  TOOL_PRESET: ${{ github.event.inputs.tool_preset || 'default' }}
+              run: |
+                  set -eo pipefail
+
+                  AGENT_SDK_VERSION=$(git rev-parse --short HEAD)
+                  EVAL_NOTE="${AGENT_SDK_VERSION}_${{ matrix.job-config['run-suffix'] }}"
+
+                  echo "Invoking test runner with TEST_TYPE_ARGS='$TEST_TYPE_ARGS' TOOL_PRESET='$TOOL_PRESET'"
+
+                  uv run python tests/integration/run_infer.py \
+                    --llm-config "$LLM_CONFIG" \
+                    --num-workers $N_PROCESSES \
+                    --eval-note "$EVAL_NOTE" \
+                    --tool-preset "$TOOL_PRESET" \
+                    $TEST_TYPE_ARGS
+
+                  # Locate the JSON results file produced by the integration tests
+                  RESULTS_FILE=$(find tests/integration/outputs/*${{ matrix.job-config['run-suffix'] }}* -name "results.json" -type f | head -n 1)
+                  echo "RESULTS_FILE: $RESULTS_FILE"
+                  if [ -f "$RESULTS_FILE" ]; then
+                    echo "JSON_RESULTS_FILE=$RESULTS_FILE" >> "$GITHUB_ENV"
+                  else
+                    echo "JSON_RESULTS_FILE=" >> "$GITHUB_ENV"
+                  fi
+
+            - name: Wait a little bit
+              run: sleep 10
+
+            - name: Create archive of evaluation outputs
+              run: |
+                  TIMESTAMP=$(date +'%y-%m-%d-%H-%M')
+                  cd tests/integration/outputs  # Change to the outputs directory
+                  tar -czvf ../../../integration_tests_${{ matrix.job-config['run-suffix'] }}_${TIMESTAMP}.tar.gz *${{ matrix.job-config['run-suffix'] }}* # Include result directories for this model
+
+            - name: Upload evaluation results as artifact
+              uses: actions/upload-artifact@v7
+              id: upload_results_artifact
+              with:
+                  name: integration-test-outputs-${{ matrix.job-config['run-suffix'] }}-${{ github.run_id }}-${{ github.run_attempt }}
+                  path: integration_tests_${{ matrix.job-config['run-suffix'] }}_*.tar.gz
+
+            - name: Save test results for consolidation
+              run: |
+                  # Copy the structured JSON results file for consolidation
+                  mkdir -p test_results_summary
+
+                  if [ -n "${{ env.JSON_RESULTS_FILE }}" ] && [ -f "${{ env.JSON_RESULTS_FILE }}" ]; then
+                    # Copy the JSON results file directly
+                    cp "${{ env.JSON_RESULTS_FILE }}" "test_results_summary/${{ matrix.job-config['run-suffix'] }}_results.json"
+                    echo "✓ Copied JSON results file for consolidation"
+                  else
+                    echo "✗ No JSON results file found"
+                    exit 1
+                  fi
+
+            - name: Upload test results summary
+              uses: actions/upload-artifact@v7
+              with:
+                  name: test-results-${{ matrix.job-config['run-suffix'] }}
+                  path: test_results_summary/${{ matrix.job-config['run-suffix'] }}_results.json
+
+    consolidate-results:
+        needs: [setup-matrix, run-integration-tests]
+        if: |
+            always() && (
+                (
+                    github.event_name == 'pull_request_target' && (
+                        github.event.label.name == 'integration-test' ||
+                        github.event.label.name == 'behavior-test'
+                    )
+                ) ||
+                github.event_name == 'workflow_dispatch' ||
+                (github.event_name == 'schedule' && github.repository == 'OpenHands/software-agent-sdk')
+            )
+        runs-on: ubuntu-24.04
+        permissions:
+            contents: read
+            pull-requests: write
+            issues: write
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+              with:
+                  # When triggered by pull_request_target, explicitly check out the PR head commit
+                  # so that the scripts from the PR code itself are used
+                  ref: ${{ github.event.pull_request.head.sha || github.ref }}
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: '3.13'
+
+            - name: Install Python dependencies using uv
+              run: |
+                  uv sync --dev
+
+            - name: Download all test results
+              uses: actions/download-artifact@v8
+              with:
+                  pattern: test-results-*
+                  merge-multiple: true
+                  path: all_results
+
+            - name: Download all integration test artifacts
+              uses: actions/download-artifact@v8
+              with:
+                  pattern: integration-test-outputs-*
+                  path: artifacts
+
+            - name: Consolidate test results
+              env:
+                  EVENT_NAME: ${{ github.event_name }}
+                  PR_NUMBER: ${{ github.event.pull_request.number }}
+                  MANUAL_REASON: ${{ github.event.inputs.reason }}
+                  COMMIT_SHA: ${{ github.sha }}
+                  PYTHONPATH: ${{ github.workspace }}
+                  GITHUB_SERVER_URL: ${{ github.server_url }}
+                  GITHUB_REPOSITORY: ${{ github.repository }}
+                  GITHUB_RUN_ID: ${{ github.run_id }}
+              run: |
+                  uv run python tests/integration/utils/consolidate_json_results.py \
+                    --results-dir all_results \
+                    --artifacts-dir artifacts \
+                    --output-file consolidated_results.json
+
+                  echo "Consolidated results generated successfully"
+
+                  uv run python tests/integration/utils/generate_markdown_report.py \
+                    --input-file consolidated_results.json \
+                    --output-file consolidated_report.md
+
+            - name: Upload consolidated report
+              uses: actions/upload-artifact@v7
+              with:
+                  name: consolidated-report
+                  path: consolidated_report.md
+
+            - name: Create consolidated PR comment
+              if: github.event_name == 'pull_request_target'
+              run: |
+                  # Sanitize @OpenHands mentions to prevent self-mention loops
+                  COMMENT_BODY=$(uv run python -c "from openhands.sdk.utils.github import sanitize_openhands_mentions; import sys; print(sanitize_openhands_mentions(sys.stdin.read()), end='')" < consolidated_report.md)
+                  # Use GitHub CLI to create comment with explicit PR number
+                  echo "$COMMENT_BODY" | gh pr comment ${{ github.event.pull_request.number }} --body-file -
+              env:
+                  GH_TOKEN: ${{ github.token }}
+
+            - name: Comment on specified issue/PR (workflow_dispatch)
+              if: github.event_name == 'workflow_dispatch' && needs.setup-matrix.outputs.issue_number != ''
+              env:
+                  GH_TOKEN: ${{ github.token }}
+                  ISSUE_NUMBER: ${{ needs.setup-matrix.outputs.issue_number }}
+              run: |
+                  # Sanitize @OpenHands mentions to prevent self-mention loops
+                  COMMENT_BODY=$(uv run python -c "from openhands.sdk.utils.github import sanitize_openhands_mentions; import sys; print(sanitize_openhands_mentions(sys.stdin.read()), end='')" < consolidated_report.md)
+                  # Use GitHub CLI to create comment on the specified issue/PR
+                  echo "$COMMENT_BODY" | gh issue comment "$ISSUE_NUMBER" --body-file -
+
+            - name: Read consolidated report for tracker issue
+              if: github.event_name == 'schedule'
+              id: read_report
+              run: |
+                  # Read and sanitize the report, then set as output
+                  REPORT_CONTENT=$(uv run python -c "from openhands.sdk.utils.github import sanitize_openhands_mentions; import sys; print(sanitize_openhands_mentions(sys.stdin.read()), end='')" < consolidated_report.md)
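+                  # Use a randomized delimiter so a report line equal to "EOF" cannot end the value early
+                  DELIMITER="REPORT_EOF_${RANDOM}_${RANDOM}"
+                  echo "report<<$DELIMITER" >> "$GITHUB_OUTPUT"
+                  echo "$REPORT_CONTENT" >> "$GITHUB_OUTPUT"
+                  echo "$DELIMITER" >> "$GITHUB_OUTPUT"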
+
+            - name: Comment with results on tracker issue
+              if: github.event_name == 'schedule'
+              uses: KeisukeYamashita/create-comment@v1
+              with:
+                  number: 2078
+                  unique: false
+                  comment: |
+                      **Trigger:** Nightly Scheduled Run
+                      **Commit:** ${{ github.sha }}
+
+                      ${{ steps.read_report.outputs.report }}
+
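Review note on the "Read consolidated report for tracker issue" step in the workflow above: it passes a multi-line value through `$GITHUB_OUTPUT` using the `name<<DELIMITER` syntax. The following is a minimal sketch of that pattern as a reusable shell function, assuming a POSIX-like shell on the runner. The function name `write_multiline_output` and the `GHADELIM_` prefix are hypothetical and do not appear in the workflow; the randomized delimiter is one possible way to reduce the chance that a line of the report matches the delimiter and ends the value early.

```shell
# Hypothetical helper (not part of the workflow): append a multi-line value
# to a GitHub Actions output file using the name<<DELIMITER syntax.
write_multiline_output() {
  name="$1"   # output name, e.g. "report"
  value="$2"  # multi-line value to store
  file="$3"   # target file, e.g. "$GITHUB_OUTPUT"
  # Randomized delimiter (assumption: $RANDOM is available, as in bash;
  # in shells without it the delimiter degrades to a fixed string).
  delimiter="GHADELIM_${RANDOM}_${RANDOM}"
  {
    printf '%s<<%s\n' "$name" "$delimiter"
    printf '%s\n' "$value"
    printf '%s\n' "$delimiter"
  } >> "$file"
}
```

In the step above this could be called as `write_multiline_output report "$REPORT_CONTENT" "$GITHUB_OUTPUT"`, which would replace the three `echo` lines while keeping the output name `report` unchanged.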
@@ -1,97 +0,0 @@
-name: Lint Fix
-
-on:
-  pull_request:
-    types: [labeled]
-
-jobs:
-  # Frontend lint fixes
-  lint-fix-frontend:
-    if: github.event.label.name == 'lint-fix'
-    name: Fix frontend linting issues
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    permissions:
-      contents: write
-      pull-requests: write
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          ref: ${{ github.head_ref }}
-          repository: ${{ github.event.pull_request.head.repo.full_name }}
-          fetch-depth: 0
-          token: ${{ secrets.GITHUB_TOKEN }}
-
-      - name: Install Node.js 22
-        uses: useblacksmith/setup-node@v5
-        with:
-          node-version: 22
-      - name: Install frontend dependencies
-        run: |
-          cd frontend
-          npm install --frozen-lockfile
-      - name: Generate i18n and route types
-        run: |
-          cd frontend
-          npm run make-i18n
-          npx react-router typegen || true
-
-      - name: Fix frontend lint issues
-        run: |
-          cd frontend
-          npm run lint:fix
-
-      # Commit and push changes if any
-      - name: Check for changes
-        id: git-check
-        run: |
-          git diff --quiet || echo "changes=true" >> $GITHUB_OUTPUT
-      - name: Commit and push if there are changes
-        if: steps.git-check.outputs.changes == 'true'
-        run: |
-          git config --local user.email "openhands@all-hands.dev"
-          git config --local user.name "OpenHands Bot"
-          git add -A
-          git commit -m "🤖 Auto-fix frontend linting issues" --no-verify
-          git push
-
-  # Python lint fixes
-  lint-fix-python:
-    if: github.event.label.name == 'lint-fix'
-    name: Fix Python linting issues
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    permissions:
-      contents: write
-      pull-requests: write
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          ref: ${{ github.head_ref }}
-          repository: ${{ github.event.pull_request.head.repo.full_name }}
-          fetch-depth: 0
-          token: ${{ secrets.GITHUB_TOKEN }}
-
-      - name: Set up python
-        uses: useblacksmith/setup-python@v6
-        with:
-          python-version: 3.12
-          cache: "pip"
-      - name: Install pre-commit
-        run: pip install pre-commit==3.7.0
-      - name: Fix python lint issues
-        run: |
-          # Run all pre-commit hooks and continue even if they modify files (exit code 1)
-          pre-commit run --config ./dev_config/python/.pre-commit-config.yaml --all-files || true
-
-      # Commit and push changes if any
-      - name: Check for changes
-        id: git-check
-        run: |
-          git diff --quiet || echo "changes=true" >> $GITHUB_OUTPUT
-      - name: Commit and push if there are changes
-        if: steps.git-check.outputs.changes == 'true'
-        run: |
-          git config --local user.email "openhands@all-hands.dev"
-          git config --local user.name "OpenHands Bot"
-          git add -A
-          git commit -m "🤖 Auto-fix Python linting issues" --no-verify
-          git push
@@ -1,74 +0,0 @@
-# Workflow that runs lint on the frontend and python code
-name: Lint
-
-# The jobs in this workflow are required, so they must run at all times
-# Always run on "main"
-# Always run on PRs
-on:
-  push:
-    branches:
-      - main
-  pull_request:
-
-# If triggered by a PR, it will be in the same group. However, each commit on main will be in its own unique group
-concurrency:
-  group: ${{ github.workflow }}-${{ (github.head_ref && github.ref) || github.run_id }}
-  cancel-in-progress: true
-
-jobs:
-  # Run lint on the frontend code
-  lint-frontend:
-    name: Lint frontend
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    steps:
-      - uses: actions/checkout@v4
-      - name: Install Node.js 22
-        uses: useblacksmith/setup-node@v5
-        with:
-          node-version: 22
-      - name: Install dependencies
-        run: |
-          cd frontend
-          npm install --frozen-lockfile
-      - name: Lint, TypeScript compilation, and translation checks
-        run: |
-          cd frontend
-          npm run lint
-          npm run make-i18n && tsc
-          npm run check-translation-completeness
-
-  # Run lint on the python code
-  lint-python:
-    name: Lint python
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-      - name: Set up python
-        uses: useblacksmith/setup-python@v6
-        with:
-          python-version: 3.12
-          cache: "pip"
-      - name: Install pre-commit
-        run: pip install pre-commit==3.7.0
-      - name: Run pre-commit hooks
-        run: pre-commit run --all-files --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml
-
-  lint-enterprise-python:
-    name: Lint enterprise python
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-      - name: Set up python
-        uses: useblacksmith/setup-python@v6
-        with:
-          python-version: 3.12
-          cache: "pip"
-      - name: Install pre-commit
-        run: pip install pre-commit==4.2.0
-      - name: Run pre-commit hooks
-        working-directory: ./enterprise
-        run: pre-commit run --all-files --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml
@@ -1,108 +0,0 @@
-name: Publish OpenHands UI Package
-
-# * Always run on "main"
-# * Run on PRs that have changes in the "openhands-ui" folder or this workflow
-on:
-  push:
-    branches:
-      - main
-    paths:
-      - "openhands-ui/**"
-      - ".github/workflows/npm-publish-ui.yml"
-
-# If triggered by a PR, it will be in the same group. However, each commit on main will be in its own unique group
-concurrency:
-  group: npm-publish-ui
-  cancel-in-progress: false
-
-jobs:
-  check-version:
-    name: Check if version has changed
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    defaults:
-      run:
-        shell: bash
-    outputs:
-      should-publish: ${{ steps.version-check.outputs.should-publish }}
-      current-version: ${{ steps.version-check.outputs.current-version }}
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 2 # Need previous commit to compare
-
-      - name: Check if version changed
-        id: version-check
-        run: |
-          # Get current version from package.json
-          CURRENT_VERSION=$(jq -r .version openhands-ui/package.json)
-          echo "current-version=$CURRENT_VERSION" >> $GITHUB_OUTPUT
-
-          # Check if package.json version changed in this commit
-          if git diff HEAD~1 HEAD --name-only | grep -q "openhands-ui/package.json"; then
-            # Check if the version field specifically changed
-            if git diff HEAD~1 HEAD openhands-ui/package.json | grep -q '"version"'; then
-              echo "Version changed in package.json, will publish"
-              echo "should-publish=true" >> $GITHUB_OUTPUT
-            else
-              echo "package.json changed but version did not change, skipping publish"
-              echo "should-publish=false" >> $GITHUB_OUTPUT
-            fi
-          else
-            echo "package.json did not change, skipping publish"
-            echo "should-publish=false" >> $GITHUB_OUTPUT
-          fi
-
-  publish:
-    name: Publish to npm
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    needs: check-version
-    if: needs.check-version.outputs.should-publish == 'true'
-    defaults:
-      run:
-        shell: bash
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-
-      - name: Setup Bun
-        uses: oven-sh/setup-bun@v2
-        with:
-          bun-version-file: "openhands-ui/.bun-version"
-
-      - name: Install dependencies
-        working-directory: ./openhands-ui
-        run: bun install --frozen-lockfile
-
-      - name: Build package
-        working-directory: ./openhands-ui
-        run: bun run build
-
-      - name: Check if package already exists on npm
-        id: npm-check
-        working-directory: ./openhands-ui
-        run: |
-          PACKAGE_NAME=$(jq -r .name package.json)
-          VERSION="${{ needs.check-version.outputs.current-version }}"
-
-          # Check if this version already exists on npm
-          if npm view "$PACKAGE_NAME@$VERSION" version 2>/dev/null; then
-            echo "Version $VERSION already exists on npm, skipping publish"
-            echo "already-exists=true" >> $GITHUB_OUTPUT
-          else
-            echo "Version $VERSION does not exist on npm, proceeding with publish"
-            echo "already-exists=false" >> $GITHUB_OUTPUT
-          fi
-
-      - name: Setup npm authentication
-        if: steps.npm-check.outputs.already-exists == 'false'
-        run: |
-          echo "//registry.npmjs.org/:_authToken=${{ secrets.NPM_TOKEN }}" > ~/.npmrc
-
-      - name: Publish to npm
-        if: steps.npm-check.outputs.already-exists == 'false'
-        working-directory: ./openhands-ui
-        run: |
-          # The prepublishOnly script will run automatically and build the package
-          npm publish
-          echo "✅ Successfully published @openhands/ui@${{ needs.check-version.outputs.current-version }} to npm"
@@ -0,0 +1,30 @@
+name: Update Documentation (by OpenHands)
+
+on:
+  schedule:
+    # Run weekly on Sundays at 2 AM UTC
+    - cron: '0 2 * * 0'
+  workflow_dispatch: # Allow manual triggering
+
+jobs:
+  update-docs:
+    runs-on: blacksmith-4vcpu-ubuntu-2404
+    permissions:
+      contents: write
+      pull-requests: write
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Update Documentation with OpenHands
+        uses: All-Hands-AI/openhands-github-action@v1
+        with:
+          prompt: .github/prompts/update-documentation.md
+          repository: ${{ github.repository }}
+          selected-branch: main
+          base-url: https://app.all-hands.dev
+          poll: "true"
+          timeout-seconds: 1800
+          poll-interval-seconds: 30
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          openhands-api-key: ${{ secrets.OPENHANDS_API_KEY }}
@@ -1,433 +0,0 @@
-name: Auto-Fix Tagged Issue with OpenHands
-
-on:
-  workflow_call:
-    inputs:
-      max_iterations:
-        required: false
-        type: number
-        default: 50
-      macro:
-        required: false
-        type: string
-        default: "@openhands-agent"
-      target_branch:
-        required: false
-        type: string
-        default: "main"
-        description: "Target branch to pull and create PR against"
-      pr_type:
-        required: false
-        type: string
-        default: "draft"
-        description: "The PR type that is going to be created (draft, ready)"
-      LLM_MODEL:
-        required: false
-        type: string
-        default: "anthropic/claude-sonnet-4-20250514"
-      LLM_API_VERSION:
-        required: false
-        type: string
-        default: ""
-      base_container_image:
-        required: false
-        type: string
-        default: ""
-        description: "Custom sandbox env"
-      runner:
-        required: false
-        type: string
-        default: "ubuntu-latest"
-    secrets:
-      LLM_MODEL:
-        required: false
-      LLM_API_KEY:
-        required: true
-      LLM_BASE_URL:
-        required: false
-      PAT_TOKEN:
-        required: false
-      PAT_USERNAME:
-        required: false
-
-  issues:
-    types: [labeled]
-  pull_request:
-    types: [labeled]
-  issue_comment:
-    types: [created]
-  pull_request_review_comment:
-    types: [created]
-  pull_request_review:
-    types: [submitted]
-
-permissions:
-  contents: write
-  pull-requests: write
-  issues: write
-
-jobs:
-  auto-fix:
-    if: |
-      github.event_name == 'workflow_call' ||
-      github.event.label.name == 'fix-me' ||
-      github.event.label.name == 'fix-me-experimental' ||
-      (
-        ((github.event_name == 'issue_comment' || github.event_name == 'pull_request_review_comment') &&
-        contains(github.event.comment.body, inputs.macro || '@openhands-agent') &&
-        (github.event.comment.author_association == 'OWNER' || github.event.comment.author_association == 'COLLABORATOR' || github.event.comment.author_association == 'MEMBER')
-        ) ||
-
-        (github.event_name == 'pull_request_review' &&
-        contains(github.event.review.body, inputs.macro || '@openhands-agent') &&
-        (github.event.review.author_association == 'OWNER' || github.event.review.author_association == 'COLLABORATOR' || github.event.review.author_association == 'MEMBER')
-        )
-      )
-    runs-on: "${{ inputs.runner || 'ubuntu-latest' }}"
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-
-      - name: Set up Python
-        uses: actions/setup-python@v6
-        with:
-          python-version: "3.12"
-      - name: Upgrade pip
-        run: |
-          python -m pip install --upgrade pip
-
-      - name: Get latest versions and create requirements.txt
-        run: |
-          python -m pip index versions openhands-ai > openhands_versions.txt
-          OPENHANDS_VERSION=$(head -n 1 openhands_versions.txt | awk '{print $2}' | tr -d '()')
-
-          # Create a new requirements.txt locally within the workflow, ensuring no reference to the repo's file
-          echo "openhands-ai==${OPENHANDS_VERSION}" > /tmp/requirements.txt
-          cat /tmp/requirements.txt
-
-      - name: Cache pip dependencies
-        if: |
-          !(
-            github.event.label.name == 'fix-me-experimental' ||
-            (
-              (github.event_name == 'issue_comment' || github.event_name == 'pull_request_review_comment') &&
-              contains(github.event.comment.body, '@openhands-agent-exp')
-            ) ||
-            (
-              github.event_name == 'pull_request_review' &&
-              contains(github.event.review.body, '@openhands-agent-exp')
-            )
-          )
-        uses: actions/cache@v5
-        with:
-          path: ${{ env.pythonLocation }}/lib/python3.12/site-packages/*
-          key: ${{ runner.os }}-pip-openhands-resolver-${{ hashFiles('/tmp/requirements.txt') }}
-          restore-keys: |
-            ${{ runner.os }}-pip-openhands-resolver-${{ hashFiles('/tmp/requirements.txt') }}
-
-      - name: Check required environment variables
-        env:
-          LLM_MODEL: ${{ secrets.LLM_MODEL || inputs.LLM_MODEL }}
-          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
-          LLM_BASE_URL: ${{ secrets.LLM_BASE_URL }}
-          LLM_API_VERSION: ${{ inputs.LLM_API_VERSION }}
-          PAT_TOKEN: ${{ secrets.PAT_TOKEN }}
-          PAT_USERNAME: ${{ secrets.PAT_USERNAME }}
-          GITHUB_TOKEN: ${{ github.token }}
-        run: |
-          required_vars=("LLM_API_KEY")
-          for var in "${required_vars[@]}"; do
-            if [ -z "${!var}" ]; then
-              echo "Error: Required environment variable $var is not set."
-              exit 1
-            fi
-          done
-
-          # Check optional variables and warn about fallbacks
-          if [ -z "$LLM_BASE_URL" ]; then
-            echo "Warning: LLM_BASE_URL is not set, will use default API endpoint"
-          fi
-
-          if [ -z "$PAT_TOKEN" ]; then
-            echo "Warning: PAT_TOKEN is not set, falling back to GITHUB_TOKEN"
-          fi
-
-          if [ -z "$PAT_USERNAME" ]; then
-            echo "Warning: PAT_USERNAME is not set, will use openhands-agent"
-          fi
-
-      - name: Set environment variables
-        env:
-          REVIEW_BODY: ${{ github.event.review.body || '' }}
-        run: |
-          # Handle pull request events first
-          if [ -n "${{ github.event.pull_request.number }}" ]; then
-            echo "ISSUE_NUMBER=${{ github.event.pull_request.number }}" >> $GITHUB_ENV
-            echo "ISSUE_TYPE=pr" >> $GITHUB_ENV
-          # Handle pull request review events
-          elif [ -n "$REVIEW_BODY" ]; then
-            echo "ISSUE_NUMBER=${{ github.event.pull_request.number }}" >> $GITHUB_ENV
-            echo "ISSUE_TYPE=pr" >> $GITHUB_ENV
-          # Handle issue comment events that reference a PR
-          elif [ -n "${{ github.event.issue.pull_request }}" ]; then
-            echo "ISSUE_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
-            echo "ISSUE_TYPE=pr" >> $GITHUB_ENV
-          # Handle regular issue events
-          else
-            echo "ISSUE_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
-            echo "ISSUE_TYPE=issue" >> $GITHUB_ENV
-          fi
-
-          if [ -n "$REVIEW_BODY" ]; then
-            echo "COMMENT_ID=${{ github.event.review.id || 'None' }}" >> $GITHUB_ENV
-          else
-            echo "COMMENT_ID=${{ github.event.comment.id || 'None' }}" >> $GITHUB_ENV
-          fi
-
-          echo "MAX_ITERATIONS=${{ inputs.max_iterations || 50 }}" >> $GITHUB_ENV
-          echo "SANDBOX_ENV_GITHUB_TOKEN=${{ secrets.PAT_TOKEN || github.token }}" >> $GITHUB_ENV
-          echo "SANDBOX_BASE_CONTAINER_IMAGE=${{ inputs.base_container_image }}" >> $GITHUB_ENV
-
-          # Set branch variables
-          echo "TARGET_BRANCH=${{ inputs.target_branch || 'main' }}" >> $GITHUB_ENV
-
-      - name: Comment on issue with start message
-        uses: actions/github-script@v7
-        with:
-          github-token: ${{ secrets.PAT_TOKEN || github.token }}
-          script: |
-            const issueType = process.env.ISSUE_TYPE;
-            github.rest.issues.createComment({
-              issue_number: ${{ env.ISSUE_NUMBER }},
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              body: `[OpenHands](https://github.com/OpenHands/OpenHands) started fixing the ${issueType}! You can monitor the progress [here](https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}).`
-            });
-
-      - name: Install OpenHands
-        id: install_openhands
-        uses: actions/github-script@v7
-        env:
-          COMMENT_BODY: ${{ github.event.comment.body || '' }}
-          REVIEW_BODY: ${{ github.event.review.body || '' }}
-          LABEL_NAME: ${{ github.event.label.name || '' }}
-          EVENT_NAME: ${{ github.event_name }}
-        with:
-          script: |
-            const commentBody = process.env.COMMENT_BODY.trim();
-            const reviewBody = process.env.REVIEW_BODY.trim();
-            const labelName = process.env.LABEL_NAME.trim();
-            const eventName = process.env.EVENT_NAME.trim();
-            // Check conditions
-            const isExperimentalLabel = labelName === "fix-me-experimental";
-            const isIssueCommentExperimental =
-              (eventName === "issue_comment" || eventName === "pull_request_review_comment") &&
-              commentBody.includes("@openhands-agent-exp");
-            const isReviewCommentExperimental =
-              eventName === "pull_request_review" && reviewBody.includes("@openhands-agent-exp");
-
-            // Set output variable
-            core.setOutput('isExperimental', isExperimentalLabel || isIssueCommentExperimental || isReviewCommentExperimental);
-
-            // Perform package installation
-            if (isExperimentalLabel || isIssueCommentExperimental || isReviewCommentExperimental) {
-              console.log("Installing experimental OpenHands...");
-
-              await exec.exec("pip install git+https://github.com/openhands/openhands.git");
-            } else {
-              console.log("Installing from requirements.txt...");
-
-              await exec.exec("pip install -r /tmp/requirements.txt");
-            }
-
-      - name: Attempt to resolve issue
-        env:
-          GITHUB_TOKEN: ${{ secrets.PAT_TOKEN || github.token }}
-          GITHUB_USERNAME: ${{ secrets.PAT_USERNAME || 'openhands-agent' }}
-          GIT_USERNAME: ${{ secrets.PAT_USERNAME || 'openhands-agent' }}
-          LLM_MODEL: ${{ secrets.LLM_MODEL || inputs.LLM_MODEL }}
-          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
-          LLM_BASE_URL: ${{ secrets.LLM_BASE_URL }}
-          LLM_API_VERSION: ${{ inputs.LLM_API_VERSION }}
-          PYTHONPATH: ""
-        run: |
-          cd /tmp && python -m openhands.resolver.resolve_issue \
-            --selected-repo ${{ github.repository }} \
-            --issue-number ${{ env.ISSUE_NUMBER }} \
-            --issue-type ${{ env.ISSUE_TYPE }} \
-            --max-iterations ${{ env.MAX_ITERATIONS }} \
-            --comment-id ${{ env.COMMENT_ID }} \
-            --is-experimental ${{ steps.install_openhands.outputs.isExperimental }}
-
-      - name: Check resolution result
-        id: check_result
-        run: |
-          if cd /tmp && grep -q '"success":true' output/output.jsonl; then
-            echo "RESOLUTION_SUCCESS=true" >> $GITHUB_OUTPUT
-          else
-            echo "RESOLUTION_SUCCESS=false" >> $GITHUB_OUTPUT
-          fi
-
-      - name: Upload output.jsonl as artifact
-        uses: actions/upload-artifact@v6
-        if: always() # Upload even if the previous steps fail
-        with:
-          name: resolver-output
-          path: /tmp/output/output.jsonl
-          retention-days: 30 # Keep the artifact for 30 days
-
-      - name: Create draft PR or push branch
-        if: always() # Create PR or branch even if the previous steps fail
-        env:
-          GITHUB_TOKEN: ${{ secrets.PAT_TOKEN || github.token }}
-          GITHUB_USERNAME: ${{ secrets.PAT_USERNAME || 'openhands-agent' }}
-          GIT_USERNAME: ${{ secrets.PAT_USERNAME || 'openhands-agent' }}
-          LLM_MODEL: ${{ secrets.LLM_MODEL || inputs.LLM_MODEL }}
-          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
-          LLM_BASE_URL: ${{ secrets.LLM_BASE_URL }}
-          LLM_API_VERSION: ${{ inputs.LLM_API_VERSION }}
-          PYTHONPATH: ""
-        run: |
-          if [ "${{ steps.check_result.outputs.RESOLUTION_SUCCESS }}" == "true" ]; then
-            cd /tmp && python -m openhands.resolver.send_pull_request \
-              --issue-number ${{ env.ISSUE_NUMBER }} \
-              --target-branch ${{ env.TARGET_BRANCH }} \
-              --pr-type ${{ inputs.pr_type || 'draft' }} \
-              --reviewer ${{ github.actor }} | tee pr_result.txt && \
-              grep "PR created" pr_result.txt | sed 's/.*\///g' > pr_number.txt
-          else
-            cd /tmp && python -m openhands.resolver.send_pull_request \
-              --issue-number ${{ env.ISSUE_NUMBER }} \
-              --pr-type branch \
-              --send-on-failure | tee branch_result.txt && \
-              grep "branch created" branch_result.txt | sed 's/.*\///g; s/.expand=1//g' > branch_name.txt
-          fi
-
-      # Step leaves comment for when agent is invoked on PR
-      - name: Analyze Push Logs (Updated PR or No Changes) # Skip comment if PR update was successful OR leave comment if the agent made no code changes
-        uses: actions/github-script@v7
-        if: always()
-        env:
-          AGENT_RESPONDED: ${{ env.AGENT_RESPONDED || 'false' }}
-          ISSUE_NUMBER: ${{ env.ISSUE_NUMBER }}
-        with:
-          github-token: ${{ secrets.PAT_TOKEN || github.token }}
-          script: |
-            const fs = require('fs');
-            const issueNumber = process.env.ISSUE_NUMBER;
-            let logContent = '';
-
-            try {
-              logContent = fs.readFileSync('/tmp/pr_result.txt', 'utf8').trim();
-            } catch (error) {
-              console.error('Error reading pr_result.txt file:', error);
-            }
-
-            const noChangesMessage = `No changes to commit for issue #${issueNumber}. Skipping commit.`;
-
-            // Check logs from send_pull_request.py (pushes code to GitHub)
-            if (logContent.includes("Updated pull request")) {
-              console.log("Updated pull request found. Skipping comment.");
-              process.env.AGENT_RESPONDED = 'true';
-            } else if (logContent.includes(noChangesMessage)) {
-              github.rest.issues.createComment({
-                issue_number: issueNumber,
-                owner: context.repo.owner,
-                repo: context.repo.repo,
-                body: `The workflow to fix this issue encountered an error. Openhands failed to create any code changes.`
-              });
-              process.env.AGENT_RESPONDED = 'true';
-            }
-
-      # Step leaves comment for when agent is invoked on issue
-      - name: Comment on issue # Comment link to either PR or branch created by agent
-        uses: actions/github-script@v7
-        if: always() # Comment on issue even if the previous steps fail
-        env:
-          AGENT_RESPONDED: ${{ env.AGENT_RESPONDED || 'false' }}
-          ISSUE_NUMBER: ${{ env.ISSUE_NUMBER }}
-          RESOLUTION_SUCCESS: ${{ steps.check_result.outputs.RESOLUTION_SUCCESS }}
-        with:
-          github-token: ${{ secrets.PAT_TOKEN || github.token }}
-          script: |
-            const fs = require('fs');
-            const path = require('path');
-            const issueNumber = process.env.ISSUE_NUMBER;
-            const success = process.env.RESOLUTION_SUCCESS === 'true';
-
-            let prNumber = '';
-            let branchName = '';
-            let resultExplanation = '';
-
-            try {
-              if (success) {
-                prNumber = fs.readFileSync('/tmp/pr_number.txt', 'utf8').trim();
-              } else {
-                branchName = fs.readFileSync('/tmp/branch_name.txt', 'utf8').trim();
-              }
-            } catch (error) {
-              console.error('Error reading file:', error);
-            }
-
-
-            try {
-              if (!success){
-                // Read result_explanation from JSON file for failed resolution
-                const outputFilePath = path.resolve('/tmp/output/output.jsonl');
-                if (fs.existsSync(outputFilePath)) {
-                  const outputContent = fs.readFileSync(outputFilePath, 'utf8');
-                  const jsonLines = outputContent.split('\n').filter(line => line.trim() !== '');
-
-                  if (jsonLines.length > 0) {
-                    // First entry in JSON lines has the key 'result_explanation'
-                    const firstEntry = JSON.parse(jsonLines[0]);
-                    resultExplanation = firstEntry.result_explanation || '';
-                  }
-                }
-              }
-            } catch (error){
-              console.error('Error reading file:', error);
-            }
-
-            // Check "success" log from resolver output
-            if (success && prNumber) {
-              github.rest.issues.createComment({
-                issue_number: issueNumber,
-                owner: context.repo.owner,
-                repo: context.repo.repo,
-                body: `A potential fix has been generated and a draft PR #${prNumber} has been created. Please review the changes.`
-              });
-              process.env.AGENT_RESPONDED = 'true';
-            } else if (!success && branchName) {
-              let commentBody = `An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named '${branchName}' has been created with the attempted changes. You can view the branch [here](https://github.com/${context.repo.owner}/${context.repo.repo}/tree/${branchName}). Manual intervention may be required.`;
-
-              if (resultExplanation) {
-                commentBody += `\n\nAdditional details about the failure:\n${resultExplanation}`;
-              }
-
-              github.rest.issues.createComment({
-                issue_number: issueNumber,
-                owner: context.repo.owner,
-                repo: context.repo.repo,
-                body: commentBody
-              });
-              process.env.AGENT_RESPONDED = 'true';
-            }
-
-      # Leave error comment when both PR/Issue comment handling fail
-      - name: Fallback Error Comment
-        uses: actions/github-script@v7
-        if: ${{ env.AGENT_RESPONDED == 'false' }} # Only run if no conditions were met in previous steps
-        env:
-          ISSUE_NUMBER: ${{ env.ISSUE_NUMBER }}
-        with:
-          github-token: ${{ secrets.PAT_TOKEN || github.token }}
-          script: |
-            const issueNumber = process.env.ISSUE_NUMBER;
-
-            github.rest.issues.createComment({
-              issue_number: issueNumber,
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              body: `The workflow to fix this issue encountered an error. Please check the [workflow logs](https://github.com/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}) for more information.`
-            });
@@ -0,0 +1,139 @@
+---
+name: PR Artifacts
+
+on:
+    workflow_dispatch: # Manual trigger for testing
+    pull_request:
+        types: [opened, synchronize, reopened]
+        branches: [main]
+    pull_request_review:
+        types: [submitted]
+
+jobs:
+    # Auto-remove .pr/ directory when a reviewer approves
+    cleanup-on-approval:
+        concurrency:
+            group: cleanup-pr-artifacts-${{ github.event.pull_request.number }}
+            cancel-in-progress: false
+        if: github.event_name == 'pull_request_review' && github.event.review.state == 'approved'
+        runs-on: ubuntu-latest
+        permissions:
+            contents: write
+            pull-requests: write
+        steps:
+            - name: Check if fork PR
+              id: check-fork
+              run: |
+                  if [ "${{ github.event.pull_request.head.repo.full_name }}" != "${{ github.event.pull_request.base.repo.full_name }}" ]; then
+                    echo "is_fork=true" >> $GITHUB_OUTPUT
+                    echo "::notice::Fork PR detected - skipping auto-cleanup (manual removal required)"
+                  else
+                    echo "is_fork=false" >> $GITHUB_OUTPUT
+                  fi
+
+            # Use PAT so the push triggers CI workflows that will complete and
+            # satisfy branch protection. We can't use [skip ci] because the Vercel
+            # GitHub App creates stuck checks that block merging.
+            - uses: actions/checkout@v5
+              if: steps.check-fork.outputs.is_fork == 'false'
+              with:
+                  ref: ${{ github.event.pull_request.head.ref }}
+                  token: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+
+            - name: Remove .pr/ directory
+              id: remove
+              if: steps.check-fork.outputs.is_fork == 'false'
+              run: |
+                  if [ -d ".pr" ]; then
+                    git config user.name "allhands-bot"
+                    git config user.email "allhands-bot@users.noreply.github.com"
+                    git rm -rf .pr/
+                    git commit -m "chore: Remove PR-only artifacts [automated]"
+                    git push || {
+                      echo "::error::Failed to push cleanup commit. Check branch protection rules."
+                      exit 1
+                    }
+                    echo "removed=true" >> $GITHUB_OUTPUT
+                    echo "::notice::Removed .pr/ directory"
+                  else
+                    echo "removed=false" >> $GITHUB_OUTPUT
+                    echo "::notice::No .pr/ directory to remove"
+                  fi
+
+            - name: Update PR comment after cleanup
+              if: steps.check-fork.outputs.is_fork == 'false' && steps.remove.outputs.removed == 'true'
+              uses: actions/github-script@v8
+              with:
+                  script: |
+                      const marker = '<!-- pr-artifacts-notice -->';
+                      const body = `${marker}
+                      ✅ **PR Artifacts Cleaned Up**
+
+                      The \`.pr/\` directory has been automatically removed.
+                      `;
+
+                      const { data: comments } = await github.rest.issues.listComments({
+                        owner: context.repo.owner,
+                        repo: context.repo.repo,
+                        issue_number: context.issue.number,
+                      });
+
+                      const existing = comments.find(c => c.body.includes(marker));
+                      if (existing) {
+                        await github.rest.issues.updateComment({
+                          owner: context.repo.owner,
+                          repo: context.repo.repo,
+                          comment_id: existing.id,
+                          body: body,
+                        });
+                      }
+
+    # Warn if .pr/ directory exists (will be auto-removed on approval)
+    check-pr-artifacts:
+        if: github.event_name == 'pull_request'
+        runs-on: ubuntu-latest
+        permissions:
+            contents: read
+            pull-requests: write
+        steps:
+            - uses: actions/checkout@v5
+
+            - name: Check for .pr/ directory
+              id: check
+              run: |
+                  if [ -d ".pr" ]; then
+                    echo "exists=true" >> $GITHUB_OUTPUT
+                    echo "::warning::.pr/ directory exists and will be automatically removed when the PR is approved. For fork PRs, manual removal is required before merging."
+                  else
+                    echo "exists=false" >> $GITHUB_OUTPUT
+                  fi
+
+            - name: Post or update PR comment
+              if: steps.check.outputs.exists == 'true'
+              uses: actions/github-script@v8
+              with:
+                  script: |
+                      const marker = '<!-- pr-artifacts-notice -->';
+                      const body = `${marker}
+                      📁 **PR Artifacts Notice**
+
+                      This PR contains a \`.pr/\` directory with PR-specific documents. This directory will be **automatically removed** when the PR is approved.
+
+                      > For fork PRs: Manual removal is required before merging.
+                      `;
+
+                      const { data: comments } = await github.rest.issues.listComments({
+                        owner: context.repo.owner,
+                        repo: context.repo.repo,
+                        issue_number: context.issue.number,
+                      });
+
+                      const existing = comments.find(c => c.body.includes(marker));
+                      if (!existing) {
+                        await github.rest.issues.createComment({
+                          owner: context.repo.owner,
+                          repo: context.repo.repo,
+                          issue_number: context.issue.number,
+                          body: body,
+                        });
+                      }
@@ -0,0 +1,51 @@
+---
+name: PR Review by OpenHands
+
+on:
+    # Use pull_request so workflow changes can be validated in PRs.
+    # This workflow requires secrets, so the job only runs for same-repo PRs.
+    # It runs when:
+    #   1. A new PR is opened (non-draft), OR
+    #   2. A draft PR is marked as ready for review, OR
+    #   3. A maintainer adds the 'review-this' label, OR
+    #   4. A maintainer requests openhands-agent or all-hands-bot as a reviewer
+    # Adding labels and requesting reviewers still requires write access.
+    pull_request:
+        types: [opened, ready_for_review, labeled, review_requested]
+
+permissions:
+    contents: read
+    pull-requests: write
+    issues: write
+
+jobs:
+    pr-review:
+        # Run when one of the following conditions is met:
+        #   1. A new non-draft PR is opened by a non-first-time contributor, OR
+        #   2. A draft PR is converted to ready for review by a non-first-time contributor, OR
+        #   3. 'review-this' label is added, OR
+        #   4. openhands-agent or all-hands-bot is requested as a reviewer
+        # Note: PRs whose author_association is FIRST_TIME_CONTRIBUTOR or NONE require a manual trigger via label or reviewer request.
+        if: |
+            github.event.pull_request.head.repo.full_name == github.repository && (
+                (github.event.action == 'opened' && github.event.pull_request.draft == false && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||
+                (github.event.action == 'ready_for_review' && github.event.pull_request.author_association != 'FIRST_TIME_CONTRIBUTOR' && github.event.pull_request.author_association != 'NONE') ||
+                github.event.label.name == 'review-this' ||
+                github.event.requested_reviewer.login == 'openhands-agent' ||
+                github.event.requested_reviewer.login == 'all-hands-bot'
+            )
+        concurrency:
+            group: pr-review-${{ github.event.pull_request.number }}
+            cancel-in-progress: true
+        runs-on: ubuntu-24.04
+        steps:
+            - name: Run PR Review
+              uses: OpenHands/extensions/plugins/pr-review@main
+              with:
+                  llm-model: litellm_proxy/claude-sonnet-4-5-20250929
+                  llm-base-url: https://llm-proxy.app.all-hands.dev
+                  # Review style: roasted (other option: standard)
+                  review-style: roasted
+                  llm-api-key: ${{ secrets.LLM_API_KEY }}
+                  github-token: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+                  lmnr-api-key: ${{ secrets.LMNR_SKILLS_API_KEY }}
@@ -0,0 +1,85 @@
+---
+name: PR Review Evaluation
+
+# This workflow evaluates how well PR review comments were addressed.
+# It runs when a PR is closed to assess review effectiveness.
+#
+# Security note: pull_request_target is safe here because:
+# 1. Only triggers on PR close (not on code changes)
+# 2. Does not checkout PR code - only downloads artifacts from trusted workflow runs
+# 3. Runs evaluation scripts from the extensions repo, not from the PR
+
+on:
+    pull_request_target:
+        types: [closed]
+
+permissions:
+    contents: read
+    pull-requests: read
+
+jobs:
+    evaluate:
+        runs-on: ubuntu-24.04
+        env:
+            PR_NUMBER: ${{ github.event.pull_request.number }}
+            REPO_NAME: ${{ github.repository }}
+            PR_MERGED: ${{ github.event.pull_request.merged }}
+
+        steps:
+            - name: Download review trace artifact
+              id: download-trace
+              uses: dawidd6/action-download-artifact@v6
+              continue-on-error: true
+              with:
+                  workflow: pr-review-by-openhands.yml
+                  name: pr-review-trace-${{ github.event.pull_request.number }}
+                  path: trace-info
+                  search_artifacts: true
+                  if_no_artifact_found: warn
+
+            - name: Check if trace file exists
+              id: check-trace
+              run: |
+                  if [ -f "trace-info/laminar_trace_info.json" ]; then
+                    echo "trace_exists=true" >> $GITHUB_OUTPUT
+                    echo "Found trace file for PR #$PR_NUMBER"
+                  else
+                    echo "trace_exists=false" >> $GITHUB_OUTPUT
+                    echo "No trace file found for PR #$PR_NUMBER - skipping evaluation"
+                  fi
+
+            # Always check out the main branch for security; script changes cannot be tested in PRs
+            - name: Checkout extensions repository
+              if: steps.check-trace.outputs.trace_exists == 'true'
+              uses: actions/checkout@v5
+              with:
+                  repository: OpenHands/extensions
+                  path: extensions
+
+            - name: Set up Python
+              if: steps.check-trace.outputs.trace_exists == 'true'
+              uses: actions/setup-python@v6
+              with:
+                  python-version: '3.12'
+
+            - name: Install dependencies
+              if: steps.check-trace.outputs.trace_exists == 'true'
+              run: pip install lmnr
+
+            - name: Run evaluation
+              if: steps.check-trace.outputs.trace_exists == 'true'
+              env:
+                  # Script expects LMNR_PROJECT_API_KEY; org secret is named LMNR_SKILLS_API_KEY
+                  LMNR_PROJECT_API_KEY: ${{ secrets.LMNR_SKILLS_API_KEY }}
+                  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+              run: |
+                  python extensions/plugins/pr-review/scripts/evaluate_review.py \
+                      --trace-file trace-info/laminar_trace_info.json
+
+            - name: Upload evaluation logs
+              uses: actions/upload-artifact@v7
+              if: always() && steps.check-trace.outputs.trace_exists == 'true'
+              with:
+                  name: pr-review-evaluation-${{ github.event.pull_request.number }}
+                  path: '*.log'
+                  retention-days: 30
@@ -0,0 +1,31 @@
+---
+# .github/workflows/precommit.yml
+name: Pre-commit checks
+
+on:
+    push:
+        branches: [main]
+    pull_request:
+        branches: ['**']
+
+jobs:
+    pre-commit:
+        runs-on: ubuntu-24.04
+
+        steps:
+            - name: Checkout code
+              uses: actions/checkout@v5
+
+            - name: Set up Python
+              uses: actions/setup-python@v6
+              with:
+                  python-version: '3.13'
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+
+            - name: Install dependencies
+              run: uv sync --frozen --group dev
+
+            - name: Run pre-commit (all files)
+              run: uv run pre-commit run --all-files --show-diff-on-failure
@@ -0,0 +1,132 @@
+---
+name: Prepare Release
+
+on:
+    workflow_dispatch:
+        inputs:
+            version:
+                description: Release version (e.g., 1.2.3)
+                required: true
+                type: string
+
+jobs:
+    prepare-release:
+        runs-on: ubuntu-24.04
+        steps:
+            - name: Validate version format
+              run: |
+                  if ! [[ "${{ inputs.version }}" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
+                    echo "❌ Invalid version format. Expected: X.Y.Z (e.g., 1.2.3)"
+                    exit 1
+                  fi
+                  echo "✅ Version format is valid: ${{ inputs.version }}"
+
+            - name: Checkout repository
+              uses: actions/checkout@v5
+              with:
+                  token: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: '3.13'
+
+            - name: Configure Git
+              run: |
+                  git config user.name "github-actions[bot]"
+                  git config user.email "github-actions[bot]@users.noreply.github.com"
+
+            - name: Create release branch
+              run: |
+                  BRANCH_NAME="rel-${{ inputs.version }}"
+                  echo "Creating branch: $BRANCH_NAME"
+                  git checkout -b "$BRANCH_NAME"
+                  echo "BRANCH_NAME=$BRANCH_NAME" >> $GITHUB_ENV
+
+            - name: Set package version
+              run: |
+                  echo "🔧 Setting version to ${{ inputs.version }}"
+                  make set-package-version version=${{ inputs.version }}
+
+            - name: Update sdk_ref default in run-eval workflow
+              run: python3 .github/scripts/update_sdk_ref_default.py "${{ inputs.version }}"
+
+            - name: Commit version changes
+              run: |
+                  git add .
+                  if git diff --staged --quiet; then
+                    echo "No changes to commit"
+                  else
+                    git commit -m "Release v${{ inputs.version }}" -m "Co-authored-by: openhands <openhands@all-hands.dev>"
+                    echo "✅ Changes committed"
+                  fi
+
+            - name: Push release branch
+              run: |
+                  git push -u origin "${{ env.BRANCH_NAME }}"
+                  echo "✅ Branch pushed: ${{ env.BRANCH_NAME }}"
+
+            - name: Create Pull Request
+              env:
+                  GH_TOKEN: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+              run: |
+                  cat > pr_body.txt << 'EOF'
+                  ## Release v${{ inputs.version }}
+
+                  This PR prepares the release for version **${{ inputs.version }}**.
+
+                  ### Release Checklist
+                  - [x] Version set to ${{ inputs.version }}
+                  - [ ] Fix any deprecation deadlines if they exist
+                  - [ ] Integration tests pass (tagged with `integration-test`)
+                  - [ ] Behavior tests pass (tagged with `behavior-test`)
+                  - [ ] Example tests pass (tagged with `test-examples`)
+                  - [ ] Draft release created at https://github.com/OpenHands/software-agent-sdk/releases/new
+                    - [ ] Select tag: `v${{ inputs.version }}`
+                    - [ ] Select branch: `${{ env.BRANCH_NAME }}`
+                    - [ ] Auto-generate release notes
+                    - [ ] Publish release (PyPI will auto-publish)
+                  - [ ] Evaluation on OpenHands Index
+
+                  ### Next Steps
+                  1. Review the version changes
+                  2. Address any deprecation deadlines
+                  3. Ensure integration tests pass
+                  4. Ensure behavior tests pass
+                  5. Ensure example tests pass
+                  6. Create and publish the release
+
+                  Once the release is published on GitHub, the PyPI packages will be automatically published via the `pypi-release.yml` workflow.
+                  EOF
+
+                  gh pr create \
+                    --title "Release v${{ inputs.version }}" \
+                    --body-file pr_body.txt \
+                    --base main \
+                    --head "${{ env.BRANCH_NAME }}" \
+                    --label "integration-test" \
+                    --label "behavior-test" \
+                    --label "test-examples"
+
+                  rm pr_body.txt
+                  echo "✅ Pull request created successfully!"
+
+                  # Get PR URL and display it
+                  PR_URL=$(gh pr view "${{ env.BRANCH_NAME }}" --json url --jq '.url')
+                  echo "🔗 PR URL: $PR_URL"
+                  echo "PR_URL=$PR_URL" >> $GITHUB_ENV
+
+            - name: Summary
+              run: |
+                  echo "## ✅ Release Preparation Complete!" >> $GITHUB_STEP_SUMMARY
+                  echo "" >> $GITHUB_STEP_SUMMARY
+                  echo "- **Version**: ${{ inputs.version }}" >> $GITHUB_STEP_SUMMARY
+                  echo "- **Branch**: ${{ env.BRANCH_NAME }}" >> $GITHUB_STEP_SUMMARY
+                  echo "- **PR URL**: ${{ env.PR_URL }}" >> $GITHUB_STEP_SUMMARY
+                  echo "" >> $GITHUB_STEP_SUMMARY
+                  echo "### Next Steps:" >> $GITHUB_STEP_SUMMARY
+                  echo "1. Review the PR and address any deprecation deadlines" >> $GITHUB_STEP_SUMMARY
+                  echo "2. Wait for integration, behavior, and example tests to pass" >> $GITHUB_STEP_SUMMARY
+                  echo "3. Create and publish the release on GitHub" >> $GITHUB_STEP_SUMMARY
+                  echo "4. PyPI packages will be published automatically once the release is published" >> $GITHUB_STEP_SUMMARY
@@ -1,127 +0,0 @@
-# Workflow that runs python tests
-name: Run Python Tests
-
-# The jobs in this workflow are required, so they must run at all times
-# * Always run on "main"
-# * Always run on PRs
-on:
-  push:
-    branches:
-      - main
-  pull_request:
-
-# If triggered by a PR, it will be in the same group. However, each commit on main will be in its own unique group
-concurrency:
-  group: ${{ github.workflow }}-${{ (github.head_ref && github.ref) || github.run_id }}
-  cancel-in-progress: true
-
-jobs:
-  # Run python tests on Linux
-  test-on-linux:
-    name: Python Tests on Linux
-    runs-on: blacksmith-4vcpu-ubuntu-2404
-    env:
-      INSTALL_DOCKER: "0" # Set to '0' to skip Docker installation
-    strategy:
-      matrix:
-        python-version: ["3.12"]
-    permissions:
-      # For coverage comment and python-coverage-comment-action branch
-      pull-requests: write
-      contents: write
-    steps:
-      - uses: actions/checkout@v4
-      - name: Set up Docker Buildx
-        id: buildx
-        uses: docker/setup-buildx-action@v3
-      - name: Install tmux
-        run: sudo apt-get update && sudo apt-get install -y tmux
-      - name: Setup Node.js
-        uses: useblacksmith/setup-node@v5
-        with:
-          node-version: "22.x"
-      - name: Install poetry via pipx
-        run: pipx install poetry
-      - name: Set up Python
-        uses: useblacksmith/setup-python@v6
-        with:
-          python-version: ${{ matrix.python-version }}
-          cache: "poetry"
-      - name: Install Python dependencies using Poetry
-        run: |
-          poetry install --with dev,test,runtime
-          poetry run pip install pytest-xdist
-          poetry run pip install pytest-rerunfailures
-      - name: Build Environment
-        run: make build
-      - name: Run Unit Tests
-        run: PYTHONPATH=".:$PYTHONPATH" poetry run pytest --forked -n auto -s ./tests/unit --cov=openhands --cov-branch
-        env:
-          COVERAGE_FILE: ".coverage.${{ matrix.python_version }}"
-      - name: Run Runtime Tests with CLIRuntime
-        run: PYTHONPATH=".:$PYTHONPATH" TEST_RUNTIME=cli poetry run pytest -n 5 --reruns 2 --reruns-delay 3 -s tests/runtime/test_bash.py --cov=openhands --cov-branch
-        env:
-          COVERAGE_FILE: ".coverage.runtime.${{ matrix.python_version }}"
-      - name: Store coverage file
-        uses: actions/upload-artifact@v6
-        with:
-          name: coverage-openhands
-          path: |
-            .coverage.${{ matrix.python_version }}
-            .coverage.runtime.${{ matrix.python_version }}
-          include-hidden-files: true
-
-  test-enterprise:
-    name: Enterprise Python Unit Tests
-    runs-on: blacksmith-4vcpu-ubuntu-2404
-    strategy:
-      matrix:
-        python-version: ["3.12"]
-    steps:
-      - uses: actions/checkout@v4
-      - name: Install poetry via pipx
-        run: pipx install poetry
-      - name: Set up Python
-        uses: useblacksmith/setup-python@v6
-        with:
-          python-version: ${{ matrix.python-version }}
-          cache: "poetry"
-      - name: Install Python dependencies using Poetry
-        working-directory: ./enterprise
-        run: poetry install --with dev,test
-      - name: Run Unit Tests
-        # Use base working directory for coverage paths to line up.
-        run: PYTHONPATH=".:$PYTHONPATH" poetry run --project=enterprise pytest --forked -n auto -s -p no:ddtrace -p no:ddtrace.pytest_bdd -p no:ddtrace.pytest_benchmark ./enterprise/tests/unit --cov=enterprise --cov-branch
-        env:
-          COVERAGE_FILE: ".coverage.enterprise.${{ matrix.python_version }}"
-      - name: Store coverage file
-        uses: actions/upload-artifact@v6
-        with:
-          name: coverage-enterprise
-          path: ".coverage.enterprise.${{ matrix.python_version }}"
-          include-hidden-files: true
-
-  coverage-comment:
-    name: Coverage Comment
-    if: github.event_name == 'pull_request'
-    runs-on: ubuntu-latest
-    needs: [test-on-linux, test-enterprise]
-
-    permissions:
-      pull-requests: write
-      contents: write
-    steps:
-      - uses: actions/checkout@v4
-
-      - uses: actions/download-artifact@v6
-        id: download
-        with:
-          pattern: coverage-*
-          merge-multiple: true
-
-      - name: Coverage comment
-        id: coverage_comment
-        uses: py-cov-action/python-coverage-comment-action@v3
-        with:
-          GITHUB_TOKEN: ${{ github.token }}
-          MERGE_COVERAGE_FILES: true
@@ -1,40 +1,70 @@
-# Publishes the OpenHands PyPi package
-name: Publish PyPi Package
+---
+name: Publish all OpenHands packages (uv)

 on:
-  workflow_dispatch:
-    inputs:
-      reason:
-        description: "What are you publishing?"
-        required: true
-        type: choice
-        options:
-          - app server
-        default: app server
-  push:
-    tags:
-      - "*"
+    # Run manually
+    workflow_dispatch:
+    # Run automatically when a release is published
+    release:
+        types: [published]

 jobs:
-  release:
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    # Run when manually dispatched for "app server" OR for tag pushes that don't contain '-cli'
-    if: |
-      (github.event_name == 'workflow_dispatch' && github.event.inputs.reason == 'app server')
-      || (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/') && !contains(github.ref, '-cli'))
-    steps:
-      - uses: actions/checkout@v4
-      - uses: useblacksmith/setup-python@v6
-        with:
-          python-version: 3.12
-      - name: Install Poetry
-        uses: snok/install-poetry@v1.4.1
-        with:
-          virtualenvs-in-project: true
-          virtualenvs-path: ~/.virtualenvs
-      - name: Install Poetry Dependencies
-        run: poetry install --no-interaction --no-root
-      - name: Build poetry project
-        run: ./build.sh
-      - name: publish
-        run: poetry publish -u __token__ -p ${{ secrets.PYPI_TOKEN }}
+    publish:
+        runs-on: ubuntu-24.04
+        outputs:
+            version: ${{ steps.extract_version.outputs.version }}
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+
+            - name: Extract version from release tag
+              id: extract_version
+              run: |
+                  # Get version from release tag (e.g., v1.2.3 -> 1.2.3)
+                  if [[ "${{ github.event_name }}" == "release" ]]; then
+                    VERSION="${{ github.event.release.tag_name }}"
+                    VERSION="${VERSION#v}"  # Remove 'v' prefix if present
+                  else
+                    # For manual dispatch, extract from pyproject.toml
+                    VERSION=$(grep -m1 '^version = ' openhands-sdk/pyproject.toml | cut -d'"' -f2)
+                  fi
+                  echo "version=$VERSION" >> "$GITHUB_OUTPUT"
+                  echo "📦 Version: $VERSION"
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: '3.13'
+
+            - name: Build and publish all packages
+              env:
+                  UV_PUBLISH_TOKEN: ${{ secrets.PYPI_TOKEN_OPENHANDS }}
+              run: |
+                  set -euo pipefail
+
+                  if [ -z "${UV_PUBLISH_TOKEN:-}" ]; then
+                    echo "❌ Missing secret PYPI_TOKEN_OPENHANDS"
+                    exit 1
+                  fi
+
+                  PACKAGES=(
+                    openhands-sdk
+                    openhands-tools
+                    openhands-workspace
+                    openhands-agent-server
+                  )
+
+                  echo "🚀 Building and publishing all packages..."
+                  for PKG in "${PACKAGES[@]}"; do
+                    echo "===== $PKG ====="
+                    uv build --package "$PKG"
+                  done
+
+                  # Use --check-url to skip files that already exist on PyPI
+                  # This allows re-running the workflow after partial failures
+                  uv publish --token "$UV_PUBLISH_TOKEN" --check-url https://pypi.org/simple/
+                  echo "✅ All packages built and published successfully!"
+                  echo ""
+                  echo "📋 Note: Version bump PRs will be created by the 'Create Version Bump PRs' workflow"
+                  echo "   which triggers automatically after this workflow completes."
@@ -0,0 +1,114 @@
+---
+name: Review Thread Gate
+
+on:
+    pull_request:
+        branches: [main]
+        types: [opened, synchronize, reopened, ready_for_review, edited]
+
+permissions:
+    contents: read
+    pull-requests: read
+
+concurrency:
+    group: review-thread-gate-${{ github.event.pull_request.number || github.sha }}
+    cancel-in-progress: true
+
+jobs:
+    unresolved-review-threads:
+        runs-on: ubuntu-latest
+        steps:
+            - name: Fail when unresolved review threads remain (unless waived)
+              uses: actions/github-script@v8
+              with:
+                  script: |
+                      const pr = context.payload.pull_request;
+                      if (!pr) {
+                        core.info('No pull_request payload available; skipping.');
+                        return;
+                      }
+
+                      const waiverMatch = pr.body?.match(
+                        /review-thread-waiver\s*:\s*(.+?)(?:\n|$)/i,
+                      );
+                      const waiverReason = waiverMatch?.[1]?.trim() || null;
+
+                      const unresolved = [];
+                      let cursor = null;
+                      do {
+                        const query = `
+                          query($owner: String!, $repo: String!, $number: Int!, $cursor: String) {
+                            repository(owner: $owner, name: $repo) {
+                              pullRequest(number: $number) {
+                                reviewThreads(first: 100, after: $cursor) {
+                                  nodes {
+                                    id
+                                    isResolved
+                                    isOutdated
+                                    comments(first: 1) {
+                                      nodes {
+                                        author { login }
+                                        path
+                                        line
+                                        url
+                                      }
+                                    }
+                                  }
+                                  pageInfo {
+                                    hasNextPage
+                                    endCursor
+                                  }
+                                }
+                              }
+                            }
+                          }
+                        `;
+                        const result = await github.graphql(query, {
+                          owner: context.repo.owner,
+                          repo: context.repo.repo,
+                          number: pr.number,
+                          cursor,
+                        });
+
+                        const page = result.repository.pullRequest.reviewThreads;
+                        for (const thread of page.nodes) {
+                          if (thread.isResolved) continue;
+                          const firstComment = thread.comments.nodes[0];
+                          unresolved.push({
+                            url: firstComment?.url ?? '(no-url)',
+                            author: firstComment?.author?.login ?? 'unknown',
+                            path: firstComment?.path ?? 'unknown',
+                            line: firstComment?.line ?? '?',
+                            outdated: thread.isOutdated,
+                          });
+                        }
+
+                        cursor = page.pageInfo.hasNextPage ? page.pageInfo.endCursor : null;
+                      } while (cursor);
+
+                      if (unresolved.length === 0) {
+                        core.info('No unresolved review threads found.');
+                        return;
+                      }
+
+                      const summaryLines = unresolved.map(
+                        (thread) =>
+                          `- ${thread.url} (author: ${thread.author}, file: ${thread.path}:${thread.line}, outdated: ${thread.outdated})`,
+                      );
+                      await core.summary
+                        .addHeading(`Unresolved review threads: ${unresolved.length}`)
+                        .addRaw(summaryLines.join('\n'))
+                        .write();
+
+                      if (waiverReason) {
+                        core.warning(
+                          `Unresolved review threads remain (${unresolved.length}), but waiver provided: ${waiverReason}`,
+                        );
+                        return;
+                      }
+
+                      core.setFailed(
+                        `Found ${unresolved.length} unresolved review thread(s). Resolve all threads or add ` +
+                        '`review-thread-waiver: <reason>` to the PR body for an intentional waiver.',
+                      );
+
@@ -0,0 +1,403 @@
+---
+name: Run Eval
+run-name: Run Eval (${{ inputs.benchmark || 'swebench' }}) ${{ inputs.reason || github.event.label.name || 'release' }}
+
+on:
+    pull_request_target:
+        types: [labeled]
+    release:
+        types: [published]
+    workflow_dispatch:
+        inputs:
+            benchmark:
+                description: Benchmark to evaluate
+                required: false
+                default: swebench
+                type: choice
+                options:
+                    - gaia
+                    - swebench
+                    - swtbench
+                    - commit0
+                    - swebenchmultimodal
+                    - terminalbench
+            sdk_ref:
+                description: SDK commit/ref to evaluate (must be a semantic version like v1.0.0 unless 'Allow unreleased branches' is checked)
+                required: true
+                default: v1.14.0
+
+
+
+
+
+            allow_unreleased_branches:
+                description: Allow unreleased branches (bypasses semantic version requirement)
+                required: false
+                default: false
+                type: boolean
+            eval_limit:
+                description: Number of instances to run (any positive integer)
+                required: false
+                default: '1'
+                type: string
+            model_ids:
+                description: Comma-separated model IDs to evaluate. Must be keys of MODELS in resolve_model_config.py. Defaults to first model in that
+                    dict.
+                required: false
+                default: ''
+                type: string
+            reason:
+                description: Reason for manual trigger
+                required: false
+                default: ''
+            eval_branch:
+                description: Evaluation repo branch to use (for testing feature branches)
+                required: false
+                default: main
+                type: string
+            benchmarks_branch:
+                description: Benchmarks repo branch to use (for testing feature branches)
+                required: false
+                default: main
+                type: string
+            instance_ids:
+                description: >-
+                    Comma-separated instance IDs to evaluate.
+                    Example: "django__django-11583,django__django-12345".
+                    Spaces around commas are automatically stripped.
+                    Leave empty to evaluate all instances up to eval_limit.
+                required: false
+                default: ''
+            num_infer_workers:
+                description: Number of inference workers (optional, overrides benchmark default)
+                required: false
+                default: ''
+                type: string
+            num_eval_workers:
+                description: Number of evaluation workers (optional, overrides benchmark default)
+                required: false
+                default: ''
+                type: string
+            enable_conversation_event_logging:
+                description: 'Enable Datadog persistence for conversation events (default: true)'
+                required: false
+                default: true
+                type: boolean
+            max_retries:
+                description: Max retries per instance (passed to benchmarks)
+                required: false
+                default: '3'
+                type: string
+            tool_preset:
+                description: >-
+                    Tool preset for file editing. 'default' uses FileEditorTool,
+                    'gemini' uses read_file/write_file/edit/list_directory,
+                    'gpt5' uses apply_patch tool.
+                required: false
+                default: default
+                type: choice
+                options:
+                    - default
+                    - gemini
+                    - gpt5
+                    - planning
+            agent_type:
+                description: >-
+                    Agent type: 'default' for standard Agent,
+                    'acp-claude' for ACPAgent with Claude Code,
+                    'acp-codex' for ACPAgent with Codex.
+                required: false
+                default: default
+                type: choice
+                options:
+                    - default
+                    - acp-claude
+                    - acp-codex
+
+
+env:
+    EVAL_REPO: OpenHands/evaluation
+    EVAL_WORKFLOW: eval-job.yml
+
+jobs:
+    print-parameters:
+        if: >
+            github.event_name == 'release' ||
+            github.event_name == 'workflow_dispatch' ||
+            (github.event_name == 'pull_request_target' &&
+             (github.event.label.name == 'run-eval-1' ||
+              github.event.label.name == 'run-eval-50' ||
+              github.event.label.name == 'run-eval-200' ||
+              github.event.label.name == 'run-eval-500'))
+        runs-on: ubuntu-latest
+        steps:
+            - name: Print all parameters
+              run: |
+                  echo "=== Workflow Parameters ==="
+                  echo "Event: ${{ github.event_name }}"
+                  echo "Actor: ${{ github.actor }}"
+                  echo "Ref: ${{ github.ref }}"
+                  echo ""
+                  echo "=== Input Parameters ==="
+                  echo "benchmark: ${{ github.event.inputs.benchmark || 'swebench' }}"
+                  echo "sdk_ref: ${{ github.event.inputs.sdk_ref || 'N/A' }}"
+                  echo "allow_unreleased_branches: ${{ github.event.inputs.allow_unreleased_branches || 'false' }}"
+                  echo "eval_limit: ${{ github.event.inputs.eval_limit || '1' }}"
+                  echo "model_ids: ${{ github.event.inputs.model_ids || '(default)' }}"
+                  echo "reason: ${{ github.event.inputs.reason || 'N/A' }}"
+                  echo "eval_branch: ${{ github.event.inputs.eval_branch || 'main' }}"
+                  echo "benchmarks_branch: ${{ github.event.inputs.benchmarks_branch || 'main' }}"
+                  echo "instance_ids: ${{ github.event.inputs.instance_ids || 'N/A' }}"
+                  echo "num_infer_workers: ${{ github.event.inputs.num_infer_workers || '(default)' }}"
+                  echo "num_eval_workers: ${{ github.event.inputs.num_eval_workers || '(default)' }}"
+                  echo "enable_conversation_event_logging: ${{ github.event.inputs.enable_conversation_event_logging || 'false' }}"
+                  echo "max_retries: ${{ github.event.inputs.max_retries || '3' }}"
+                  echo "tool_preset: ${{ github.event.inputs.tool_preset || 'default' }}"
+                  echo ""
+                  echo "=== Environment Variables ==="
+                  echo "EVAL_REPO: ${{ env.EVAL_REPO }}"
+                  echo "EVAL_WORKFLOW: ${{ env.EVAL_WORKFLOW }}"
+                  echo ""
+                  echo "=== Label (for PR events) ==="
+                  echo "Label: ${{ github.event.label.name || 'N/A' }}"
+
+    build-and-evaluate:
+        needs: print-parameters
+        runs-on: ubuntu-latest
+        permissions:
+            contents: read
+            packages: write
+            actions: write
+            issues: write
+            pull-requests: write
+
+        steps:
+            - name: Checkout sdk code (base for validation)
+              uses: actions/checkout@v4
+              with:
+                  ref: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.sdk_ref || (github.event_name == 'pull_request_target' && 
+                      github.event.pull_request.base.ref || github.ref) }}
+                  fetch-depth: 0
+
+            - name: Set up Python
+              uses: actions/setup-python@v5
+              with:
+                  python-version: '3.13'
+
+            - name: Validate eval_limit
+              if: github.event_name == 'workflow_dispatch'
+              run: |
+                  if ! [[ "${{ github.event.inputs.eval_limit }}" =~ ^[1-9][0-9]*$ ]]; then
+                    echo "Error: eval_limit must be a positive integer, got: ${{ github.event.inputs.eval_limit }}"
+                    exit 1
+                  fi
+
+            - name: Validate SDK reference (semantic version check)
+              if: github.event_name == 'workflow_dispatch'
+              env:
+                  SDK_REF: ${{ github.event.inputs.sdk_ref }}
+                  ALLOW_UNRELEASED_BRANCHES: ${{ github.event.inputs.allow_unreleased_branches }}
+              run: |
+                  python3 .github/run-eval/validate_sdk_ref.py
+
+            - name: Install dependencies
+              run: |
+                  pip install 'litellm>=1.81.0'
+
+            - name: Load model IDs from Python script
+              id: load-models
+              run: |
+                  # Extract all model IDs from resolve_model_config.py
+                  ALLOWED_MODEL_IDS=$(python3 << 'EOF'
+                  import sys
+                  sys.path.insert(0, '.github/run-eval')
+                  from resolve_model_config import MODELS
+                  import json
+                  print(json.dumps(list(MODELS.keys())))
+                  EOF
+                  )
+                  DEFAULT_MODEL=$(echo "$ALLOWED_MODEL_IDS" | jq -r '.[0]')
+                  if [ -z "$DEFAULT_MODEL" ] || [ "$DEFAULT_MODEL" = "null" ]; then
+                    echo "No models configured" >&2
+                    exit 1
+                  fi
+                  echo "allowed_model_ids=$ALLOWED_MODEL_IDS" >> "$GITHUB_OUTPUT"
+                  echo "default_model=$DEFAULT_MODEL" >> "$GITHUB_OUTPUT"
+
+            - name: Resolve parameters
+              id: params
+              env:
+                  DEFAULT_MODEL: ${{ steps.load-models.outputs.default_model }}
+                  ALLOWED_MODEL_IDS_JSON: ${{ steps.load-models.outputs.allowed_model_ids }}
+                  PAT_TOKEN_DEFAULT: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+              run: |
+                  set -euo pipefail
+
+                  # Set PAT token for cross-repo workflow dispatch
+                  PAT_TOKEN="$PAT_TOKEN_DEFAULT"
+                  if [ -z "$PAT_TOKEN" ]; then
+                    echo "Missing PAT token" >&2
+                    exit 1
+                  fi
+                  echo "PAT_TOKEN=$PAT_TOKEN" >> "$GITHUB_ENV"
+
+                  # Determine eval limit and SDK SHA based on trigger
+                  if [ "${{ github.event_name }}" = "pull_request_target" ]; then
+                    LABEL="${{ github.event.label.name }}"
+                    case "$LABEL" in
+                      run-eval-1) EVAL_LIMIT=1 ;;
+                      run-eval-50) EVAL_LIMIT=50 ;;
+                      run-eval-200) EVAL_LIMIT=200 ;;
+                      run-eval-500) EVAL_LIMIT=500 ;;
+                      *) echo "Unsupported label $LABEL" >&2; exit 1 ;;
+                    esac
+                    SDK_SHA="${{ github.event.pull_request.head.sha }}"
+                    PR_NUMBER="${{ github.event.pull_request.number }}"
+                    TRIGGER_DESCRIPTION="Label '${LABEL}' on PR #${PR_NUMBER}"
+                  elif [ "${{ github.event_name }}" = "release" ]; then
+                    EVAL_LIMIT=50
+                    # Use tag instead of target_commitish because release branches are automatically deleted after merge
+                    SDK_SHA=$(git rev-parse "${{ github.event.release.tag_name }}")
+                    PR_NUMBER=""
+                    TRIGGER_DESCRIPTION="Release ${{ github.event.release.tag_name }}"
+                  else
+                    EVAL_LIMIT="${{ github.event.inputs.eval_limit }}"
+                    SDK_REF="${{ github.event.inputs.sdk_ref }}"
+                    # Convert ref to SHA for manual dispatch
+                    # Resolve SHA robustly for both branch refs and raw SHAs (avoid double-prefix issues)
+                    SDK_SHA=$(git rev-parse --verify "$SDK_REF^{commit}" 2>/dev/null || \
+                              git rev-parse --verify "origin/$SDK_REF^{commit}" 2>/dev/null || \
+                              echo "$SDK_REF")
+                    PR_NUMBER=""
+                    REASON="${{ github.event.inputs.reason }}"
+                    if [ -z "$REASON" ]; then
+                      REASON="manual"
+                    fi
+                    TRIGGER_DESCRIPTION="Manual trigger: ${REASON}"
+                  fi
+
+                  # Normalize and validate model IDs
+                  MODELS_INPUT="${{ github.event_name == 'workflow_dispatch' && github.event.inputs.model_ids || '' }}"
+                  if [ -z "$MODELS_INPUT" ]; then
+                    MODELS_INPUT="$DEFAULT_MODEL"
+                  fi
+                  MODELS=$(printf '%s' "$MODELS_INPUT" | tr ', ' '\n' | sed '/^$/d' | paste -sd, -)
+                  ALLOWED_LIST=$(echo "$ALLOWED_MODEL_IDS_JSON" | jq -r '.[]')
+                  for MODEL in ${MODELS//,/ }; do
+                    if ! echo "$ALLOWED_LIST" | grep -Fx "$MODEL" >/dev/null; then
+                      echo "Model ID '$MODEL' not found in MODELS (resolve_model_config.py)" >&2
+                      echo "Available models: $(echo "$ALLOWED_LIST" | paste -sd, -)" >&2
+                      exit 1
+                    fi
+                  done
+
+                  # Sanitize values to avoid GITHUB_OUTPUT parse errors (e.g., raw SHAs)
+                  SDK_SHA=$(printf '%s' "$SDK_SHA" | tr -d '\n\r')
+                  EVAL_LIMIT=$(printf '%s' "$EVAL_LIMIT" | tr -d '\n\r')
+                  PR_NUMBER=$(printf '%s' "$PR_NUMBER" | tr -d '\n\r')
+                  MODELS=$(printf '%s' "$MODELS" | tr -d '\n\r')
+                  TRIGGER_DESCRIPTION=$(printf '%s' "$TRIGGER_DESCRIPTION" | tr -d '\n\r')
+
+                  printf 'eval_limit=%s\n' "$EVAL_LIMIT" >> "$GITHUB_OUTPUT"
+                  printf 'sdk_sha=%s\n' "$SDK_SHA" >> "$GITHUB_OUTPUT"
+                  printf 'models=%s\n' "$MODELS" >> "$GITHUB_OUTPUT"
+                  printf 'pr_number=%s\n' "$PR_NUMBER" >> "$GITHUB_OUTPUT"
+                  printf 'trigger_desc=%s\n' "$TRIGGER_DESCRIPTION" >> "$GITHUB_OUTPUT"
+
+            - name: Resolve model configurations and verify availability
+              id: resolve-models
+              env:
+                  MODEL_IDS: ${{ steps.params.outputs.models }}
+                  LLM_API_KEY: ${{ secrets.LLM_API_KEY_EVAL }}
+                  LLM_BASE_URL: https://llm-proxy.eval.all-hands.dev
+              run: |
+                  python3 .github/run-eval/resolve_model_config.py
+
+            - name: Dispatch evaluation workflow
+              env:
+                  SDK_SHA: ${{ steps.params.outputs.sdk_sha }}
+                  EVAL_LIMIT: ${{ steps.params.outputs.eval_limit }}
+                  MODELS_JSON: ${{ steps.resolve-models.outputs.models_json }}
+                  EVAL_REPO: ${{ env.EVAL_REPO }}
+                  EVAL_WORKFLOW: ${{ env.EVAL_WORKFLOW }}
+                  EVAL_BRANCH: ${{ github.event.inputs.eval_branch || 'main' }}
+                  BENCHMARKS_BRANCH: ${{ github.event.inputs.benchmarks_branch || 'main' }}
+                  BENCHMARK: ${{ github.event.inputs.benchmark || 'swebench' }}
+                  TRIGGER_REASON: ${{ github.event.inputs.reason }}
+                  PR_NUMBER: ${{ steps.params.outputs.pr_number }}
+                  INSTANCE_IDS: ${{ github.event.inputs.instance_ids || '' }}
+                  NUM_INFER_WORKERS: ${{ github.event.inputs.num_infer_workers || '' }}
+                  NUM_EVAL_WORKERS: ${{ github.event.inputs.num_eval_workers || '' }}
+                  ENABLE_CONVERSATION_EVENT_LOGGING: ${{ github.event.inputs.enable_conversation_event_logging || false }}
+                  MAX_RETRIES: ${{ github.event.inputs.max_retries || '3' }}
+                  TOOL_PRESET: ${{ github.event.inputs.tool_preset || 'default' }}
+                  AGENT_TYPE: ${{ github.event.inputs.agent_type || 'default' }}
+                  TRIGGERED_BY: ${{ github.actor }}
+              run: |
+                  # Normalize instance_ids: strip all spaces
+                  INSTANCE_IDS=$(printf '%s' "$INSTANCE_IDS" | tr -d ' ')
+
+                  echo "Dispatching evaluation workflow with SDK commit: $SDK_SHA (benchmark: $BENCHMARK, eval branch: $EVAL_BRANCH, benchmarks branch: $BENCHMARKS_BRANCH, tool preset: $TOOL_PRESET)"
+                  PAYLOAD=$(jq -n \
+                    --arg sdk "$SDK_SHA" \
+                    --arg eval_limit "$EVAL_LIMIT" \
+                    --argjson models "$MODELS_JSON" \
+                    --arg ref "$EVAL_BRANCH" \
+                    --arg reason "$TRIGGER_REASON" \
+                    --arg pr "$PR_NUMBER" \
+                    --arg benchmarks "$BENCHMARKS_BRANCH" \
+                    --arg benchmark "$BENCHMARK" \
+                    --arg instance_ids "$INSTANCE_IDS" \
+                    --arg num_infer_workers "$NUM_INFER_WORKERS" \
+                    --arg num_eval_workers "$NUM_EVAL_WORKERS" \
+                    --argjson enable_conversation_event_logging "$ENABLE_CONVERSATION_EVENT_LOGGING" \
+                    --arg max_retries "$MAX_RETRIES" \
+                    --arg tool_preset "$TOOL_PRESET" \
+                    --arg agent_type "$AGENT_TYPE" \
+                    --arg triggered_by "$TRIGGERED_BY" \
+                    '{ref: $ref, inputs: {sdk_commit: $sdk, eval_limit: $eval_limit, models_json: ($models | tostring), trigger_reason: $reason, pr_number: $pr, benchmarks_branch: $benchmarks, benchmark: $benchmark, instance_ids: $instance_ids, num_infer_workers: $num_infer_workers, num_eval_workers: $num_eval_workers, enable_conversation_event_logging: $enable_conversation_event_logging, max_retries: $max_retries, tool_preset: $tool_preset, agent_type: $agent_type, triggered_by: $triggered_by}}')
+                  RESPONSE=$(curl -sS -o /tmp/dispatch.out -w "%{http_code}" -X POST \
+                    -H "Authorization: token $PAT_TOKEN" \
+                    -H "Accept: application/vnd.github+json" \
+                    -d "$PAYLOAD" \
+                    "https://api.github.com/repos/${EVAL_REPO}/actions/workflows/${EVAL_WORKFLOW}/dispatches")
+                  if [ "$RESPONSE" != "204" ]; then
+                    echo "Dispatch failed (status $RESPONSE):" >&2
+                    cat /tmp/dispatch.out >&2
+                    exit 1
+                  fi
+
+            - name: Comment on PR
+              env:
+                  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+                  SDK_SHA: ${{ steps.params.outputs.sdk_sha }}
+                  EVAL_LIMIT: ${{ steps.params.outputs.eval_limit }}
+                  MODELS: ${{ steps.params.outputs.models }}
+                  TRIGGER_DESC: ${{ steps.params.outputs.trigger_desc }}
+                  EVENT_NAME: ${{ github.event_name }}
+                  PR_NUMBER_INPUT: ${{ steps.params.outputs.pr_number }}
+              run: |
+                  set -euo pipefail
+                  PR_NUMBER="$PR_NUMBER_INPUT"
+                  if [ "$EVENT_NAME" = "release" ] && [ -z "$PR_NUMBER" ]; then
+                    # Attempt to find the merged PR for this commit
+                    PR_NUMBER=$(curl -sS \
+                      -H "Authorization: Bearer $GITHUB_TOKEN" \
+                      -H "Accept: application/vnd.github+json" \
+                      "https://api.github.com/repos/${{ github.repository }}/commits/${SDK_SHA}/pulls" \
+                      | jq -r '.[0].number // ""')
+                  fi
+
+                  if [ -z "$PR_NUMBER" ]; then
+                    echo "No PR found to comment on; skipping comment"
+                    exit 0
+                  fi
+
+                  COMMENT_BODY=$(printf '**Evaluation Triggered**\n\n- Trigger: %s\n- SDK: %s\n- Eval limit: %s\n- Models: %s\n' \
+                    "$TRIGGER_DESC" "$SDK_SHA" "$EVAL_LIMIT" "$MODELS")
+
+                  curl -sS -X POST \
+                    -H "Accept: application/vnd.github+json" \
+                    -H "Authorization: Bearer $GITHUB_TOKEN" \
+                    "https://api.github.com/repos/${{ github.repository }}/issues/${PR_NUMBER}/comments" \
+                    -d "$(jq -n --arg body "$COMMENT_BODY" '{body: $body}')"
@@ -0,0 +1,199 @@
+---
+name: Run Examples Scripts
+
+on:
+    pull_request:
+        types: [labeled]
+    workflow_dispatch:
+        inputs:
+            reason:
+                description: Reason for manual trigger
+                required: true
+                default: ''
+    schedule:
+        - cron: 30 22 * * * # Runs at 10:30pm UTC every day
+
+permissions:
+    contents: read
+    pull-requests: write
+    issues: write
+
+jobs:
+    test-examples:
+        # Schedule trigger only runs in the main repository, not in forks
+        if: github.event.label.name == 'test-examples' || github.event_name == 'workflow_dispatch' || (github.event_name == 'schedule' && 
+            github.repository == 'OpenHands/software-agent-sdk')
+        runs-on: ubuntu-24.04
+        timeout-minutes: 60
+        steps:
+            - name: Wait for agent server to finish build
+              if: github.event_name == 'pull_request'
+              uses: lewagon/wait-on-check-action@v1.4.1
+              with:
+                  ref: ${{ github.event.pull_request.head.ref }}
+                  check-name: Build & Push (python-amd64)
+                  repo-token: ${{ secrets.GITHUB_TOKEN }}
+                  wait-interval: 10
+
+            - name: Checkout
+              uses: actions/checkout@v5
+              with:
+                  ref: ${{ github.event.pull_request.head.ref }}
+                  repository: ${{ github.event.pull_request.head.repo.full_name }}
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+                  python-version: '3.13'
+
+            - name: Install Node.js
+              uses: actions/setup-node@v6
+              with:
+                  node-version: '22'
+
+            - name: Setup Apptainer
+              uses: eWaterCycle/setup-apptainer@v2
+              with:
+                  apptainer-version: 1.3.6
+
+            - name: Install Chromium
+              run: |
+                  sudo apt-get update
+                  sudo apt-get install -y chromium-browser
+
+            - name: Install dependencies
+              run: uv sync --frozen --group dev
+
+            - name: Run examples
+              shell: bash
+              env:
+                  LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
+                  LLM_MODEL: openhands/claude-haiku-4-5-20251001
+                  LLM_BASE_URL: https://llm-proxy.app.all-hands.dev
+                  RUNTIME_API_KEY: ${{ secrets.RUNTIME_API_KEY }}
+                  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+                  PR_NUMBER: ${{ github.event.pull_request.number }}
+                  REPO_OWNER: ${{ github.repository_owner }}
+                  REPO_NAME: ${{ github.event.repository.name }}
+                  GITHUB_SHA: ${{ github.event.pull_request.head.sha }}
+                  OPENHANDS_CLOUD_API_KEY: ${{ secrets.ALLHANDS_BOT_OPENHANDS_SAAS_API_KEY }}
+                  # ACP agents (Claude Code, Codex) route through LiteLLM proxy
+                  ANTHROPIC_BASE_URL: https://llm-proxy.app.all-hands.dev
+                  ANTHROPIC_API_KEY: ${{ secrets.LLM_API_KEY }}
+                  OPENAI_BASE_URL: https://llm-proxy.app.all-hands.dev
+                  OPENAI_API_KEY: ${{ secrets.LLM_API_KEY }}
+              run: |
+                  RESULTS_DIR=".example-test-results"
+                  REPORT_PATH="examples_report.md"
+                  rm -rf "$RESULTS_DIR"
+                  mkdir -p "$RESULTS_DIR"
+
+                  update_comment() {
+                      if [ -z "$API_URL" ]; then
+                          echo "Skipping PR comment update because API_URL is unset."
+                          return
+                      fi
+
+                      local comment_body="$1"
+                      local payload
+                      local response
+
+                      payload=$(jq -n --arg body "$comment_body" '{body: $body}')
+
+                      if [ -z "$COMMENT_ID" ]; then
+                          echo "Creating PR comment..."
+                          if ! response=$(curl -sSf -X POST \
+                              -H "Authorization: token ${GITHUB_TOKEN}" \
+                              -H "Accept: application/vnd.github.v3+json" \
+                              -H "Content-Type: application/json" \
+                              "${API_URL}" \
+                              -d "$payload"); then
+                              echo "::error::Failed to create PR comment."
+                              exit 1
+                          fi
+                          COMMENT_ID=$(echo "$response" | jq -r '.id // ""')
+                          if [ -z "$COMMENT_ID" ]; then
+                              echo "::error::GitHub API response did not include a comment id: $response"
+                              exit 1
+                          fi
+                          echo "Created comment with ID: $COMMENT_ID"
+                      else
+                          echo "Updating PR comment (ID: $COMMENT_ID)..."
+                          if ! curl -sSf -X PATCH \
+                              -H "Authorization: token ${GITHUB_TOKEN}" \
+                              -H "Accept: application/vnd.github.v3+json" \
+                              -H "Content-Type: application/json" \
+                              "https://api.github.com/repos/${REPO_OWNER}/${REPO_NAME}/issues/comments/${COMMENT_ID}" \
+                              -d "$payload" > /dev/null; then
+                              echo "::error::Failed to update PR comment (ID: $COMMENT_ID)."
+                              exit 1
+                          fi
+                      fi
+                  }
+
+                  API_URL=""
+                  COMMENT_ID=""
+
+                  if [ "${{ github.event_name }}" = "pull_request" ]; then
+                      API_URL="https://api.github.com/repos/${REPO_OWNER}/${REPO_NAME}/issues/${PR_NUMBER}/comments"
+                      initial_comment="## 🔄 Running Examples with \`${LLM_MODEL}\`"
+                      initial_comment+=$'\n\n'
+                      initial_comment+="_Run in progress..._"
+                      initial_comment+=$'\n'
+                      update_comment "$initial_comment"
+                  fi
+
+                  EXIT_CODE=0
+                  uv run pytest tests/examples/test_examples.py \
+                      --run-examples \
+                      --examples-results-dir "$RESULTS_DIR" \
+                      -n 4 || EXIT_CODE=$?
+
+                  TIMESTAMP="$(date -u '+%Y-%m-%d %H:%M:%S UTC')"
+                  WORKFLOW_URL="${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}"
+
+                  uv run python scripts/render_examples_report.py \
+                      --results-dir "$RESULTS_DIR" \
+                      --model "$LLM_MODEL" \
+                      --workflow-url "$WORKFLOW_URL" \
+                      --timestamp "$TIMESTAMP" \
+                      --output "$REPORT_PATH"
+
+                  COMMENT_BODY="$(cat "$REPORT_PATH")"
+                  echo "$COMMENT_BODY"
+
+                  if [ "${{ github.event_name }}" = "pull_request" ]; then
+                      echo "Publishing PR comment..."
+                      update_comment "$COMMENT_BODY"
+                  fi
+
+                  if [ $EXIT_CODE -ne 0 ]; then
+                      exit $EXIT_CODE
+                  fi
+
+            - name: Read examples report for issue comment
+              if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
+              id: read_report
+              shell: bash
+              run: |
+                  if [ -f examples_report.md ]; then
+                      REPORT_CONTENT=$(cat examples_report.md)
+                      echo "report<<EOF" >> "$GITHUB_OUTPUT"
+                      echo "$REPORT_CONTENT" >> "$GITHUB_OUTPUT"
+                      echo "EOF" >> "$GITHUB_OUTPUT"
+                  else
+                      echo "report=Report file not found" >> "$GITHUB_OUTPUT"
+                  fi
+
+            - name: Comment with results on tracker issue
+              if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
+              uses: KeisukeYamashita/create-comment@v1
+              with:
+                  number: 976
+                  unique: false
+                  comment: |
+                      **Trigger:** ${{ github.event_name == 'schedule' && 'Nightly Scheduled Run' || format('Manual Trigger: {0}', github.event.inputs.reason) }}
+                      **Commit:** ${{ github.sha }}
+                      **Workflow Run:** ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+
+                      ${{ steps.read_report.outputs.report }}
@@ -0,0 +1,715 @@
+---
+name: Agent Server
+
+on:
+    push:
+        branches: [main]
+        tags:
+            - '*'  # Trigger on any tag (e.g., 1.0.0, 1.0.0a5, build-docker)
+    pull_request:
+        branches: [main]
+    workflow_dispatch:
+        inputs:
+            base_image:
+                description: Base runtime image
+                type: string
+                default: nikolaik/python-nodejs:python3.13-nodejs22-slim
+            image:
+                description: GHCR image name
+                type: string
+                default: ghcr.io/openhands/agent-server
+            platforms:
+                description: Target platforms
+                type: string
+                default: linux/amd64,linux/arm64
+
+permissions:
+    contents: read
+    packages: write
+
+jobs:
+    build-binary-and-test:
+        runs-on: ${{ matrix.os }}
+        strategy:
+            matrix:
+                os: [ubuntu-latest, macos-latest]
+        steps:
+            - uses: actions/checkout@v5
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: '3.13'
+            - name: Install dependencies
+              run: uv sync --dev
+
+            - name: Build binary
+              run: |
+                  make build-server
+
+            # FIXME: windows-latest is disabled because the built binary fails to start.
+            # Log excerpt from the failing run:
+            #   Run if [[ "windows-latest" == "windows-latest" ]]; then
+            #   [PYI-2160:ERROR] Failed to load Python DLL 'C:\Users\RUNNER~1\AppData\Local\Temp\_MEI5602\python312.dll'.
+            #   LoadLibrary: Invalid access to memory location.
+            # - name: Test binary (Windows)
+            #   if: matrix.os == 'windows-latest'
+            #   shell: pwsh
+            #   run: |
+            #       Get-ChildItem dist
+            #       .\dist\openhands-agent-server.exe --help
+
+            - name: Test binary (Linux and macOS)
+              if: matrix.os != 'windows-latest'
+              shell: bash
+              run: |
+                  # Test help command
+                  ./dist/openhands-agent-server --help
+
+                  # Test server startup and template loading
+                  echo "Testing server startup and template loading..."
+                  ./dist/openhands-agent-server --port 8002 > server_test.log 2>&1 &
+                  SERVER_PID=$!
+
+                  # Wait for server to start
+                  sleep 5
+
+                  # Check if server started successfully (no template errors)
+                  if grep -q "system_prompt.j2.*not found" server_test.log; then
+                      echo "ERROR: Template files not found in binary!"
+                      cat server_test.log
+                      kill $SERVER_PID 2>/dev/null || true
+                      exit 1
+                  fi
+
+                  # Check if server is running
+                  if ! kill -0 $SERVER_PID 2>/dev/null; then
+                      echo "ERROR: Server failed to start!"
+                      cat server_test.log
+                      exit 1
+                  fi
+
+                  # Test basic API endpoint
+                  if command -v curl >/dev/null 2>&1; then
+                      echo "Testing basic API endpoint..."
+                      if curl -f -s http://localhost:8002/health >/dev/null 2>&1; then
+                          echo "✓ Health endpoint accessible"
+                      else
+                          echo "⚠ Health endpoint not accessible (may be expected)"
+                      fi
+                  fi
+
+                  # Clean up
+                  kill $SERVER_PID 2>/dev/null || true
+                  wait $SERVER_PID 2>/dev/null || true
+                  rm -f server_test.log
+
+                  echo "✓ Binary test completed successfully"
+
+            - name: Upload binary artifact
+              uses: actions/upload-artifact@v7
+              with:
+                  name: openhands-server-${{ matrix.os }}
+                  path: |
+                      dist/openhands-agent-server*
+                  retention-days: 7
+
+    check-openapi-schema:
+        name: Check OpenAPI Schema
+        runs-on: ubuntu-24.04
+
+        steps:
+            - name: Checkout PR branch
+              uses: actions/checkout@v5
+              with:
+                  fetch-depth: 0
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: '3.13'
+
+            - name: Install Node.js (for npx)
+              uses: actions/setup-node@v6
+              with:
+                  node-version: 22
+
+            - name: Install dependencies
+              run: |
+                  uv sync --frozen --dev
+
+            - name: Check OpenAPI JSON and build client
+              env:
+                  PYTHONPATH: .
+              run: |
+                  make test-server-schema
+
+    build-and-push-image:
+        name: Build & Push (${{ matrix.variant }}-${{ matrix.arch }})
+        # Run on push events, pull requests from the same repository (not forks), and manual workflow_dispatch
+        # Fork PRs cannot push to GHCR and would fail authentication
+        if: >
+            github.event_name == 'push' ||
+            github.event_name == 'workflow_dispatch' ||
+            (github.event_name == 'pull_request' &&
+             !github.event.pull_request.head.repo.fork)
+        strategy:
+            fail-fast: false
+            matrix:
+                # Explicit matrix: 3 variants × 2 architectures = 6 jobs
+                # Each job specifies exactly what it builds and where it runs
+                include:
+                    # Python variant
+                    - variant: python
+                      arch: amd64
+                      base_image: nikolaik/python-nodejs:python3.13-nodejs22
+                      runner: ubuntu-24.04
+                      platform: linux/amd64
+
+                    - variant: python
+                      arch: arm64
+                      base_image: nikolaik/python-nodejs:python3.13-nodejs22
+                      runner: ubuntu-24.04-arm
+                      platform: linux/arm64
+
+                    # Java variant
+                    - variant: java
+                      arch: amd64
+                      base_image: eclipse-temurin:17-jdk
+                      runner: ubuntu-24.04
+                      platform: linux/amd64
+
+                    - variant: java
+                      arch: arm64
+                      base_image: eclipse-temurin:17-jdk
+                      runner: ubuntu-24.04-arm
+                      platform: linux/arm64
+
+                    # Golang variant
+                    - variant: golang
+                      arch: amd64
+                      base_image: golang:1.21-bookworm
+                      runner: ubuntu-24.04
+                      platform: linux/amd64
+
+                    - variant: golang
+                      arch: arm64
+                      base_image: golang:1.21-bookworm
+                      runner: ubuntu-24.04-arm
+                      platform: linux/arm64
+
+        runs-on: ${{ matrix.runner }}
+
+        env:
+            IMAGE: ${{ inputs.image != '' && inputs.image || 'ghcr.io/openhands/agent-server' }}
+            BASE_IMAGE: ${{ inputs.base_image != '' && inputs.base_image || matrix.base_image }}
+            CUSTOM_TAGS: ${{ matrix.variant }}
+            VARIANT: ${{ matrix.variant }}
+            ARCH: ${{ matrix.arch }}
+            TARGET: binary
+            PLATFORM: ${{ matrix.platform }}
+            # Use PR head SHA for pull requests to match the image tag expected by run-examples.yml
+            GITHUB_SHA: ${{ github.event.pull_request.head.sha || github.sha }}
+            GITHUB_REF: ${{ github.ref }}
+            CI: 'true'
+
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: '3.13'
+
+            - name: Set up Docker Buildx
+              uses: docker/setup-buildx-action@v3
+
+            - name: Log in to GHCR
+              uses: docker/login-action@v4
+              with:
+                  registry: ghcr.io
+                  username: ${{ github.actor }}
+                  password: ${{ secrets.GITHUB_TOKEN }}
+
+            - name: Prepare build context and metadata
+              id: prep
+              run: |
+                  uv sync --frozen
+
+                  # Generate build context and tags with arch suffix
+                  # build.py now handles architecture tagging internally via --arch flag
+                  # Add --versioned-tag when triggered by a git tag (e.g., v1.0.0)
+                  BUILD_CMD="uv run ./openhands-agent-server/openhands/agent_server/docker/build.py --build-ctx-only --arch ${{ matrix.arch }}"
+                  if [[ "${{ github.ref }}" == refs/tags/* ]]; then
+                      BUILD_CMD="$BUILD_CMD --versioned-tag"
+                  fi
+                  eval "$BUILD_CMD"
+
+                  # Alias tags_csv output to tags for the build action
+                  TAGS=$(grep '^tags_csv=' $GITHUB_OUTPUT | cut -d= -f2-)
+                  echo "tags=$TAGS" >> $GITHUB_OUTPUT
+
+                  # Extract short SHA for consolidation
+                  SHORT_SHA=$(echo ${{ github.sha }} | cut -c1-7)
+                  echo "short_sha=$SHORT_SHA" >> $GITHUB_OUTPUT
+
+                  # Extract versioned tags CSV for consolidation
+                  VERSIONED_TAGS_CSV=$(grep '^versioned_tags_csv=' $GITHUB_OUTPUT | cut -d= -f2- || echo "")
+                  echo "versioned_tags_csv=$VERSIONED_TAGS_CSV" >> $GITHUB_OUTPUT
+
+                  # Verify outputs
+                  echo "=== Build outputs ==="
+                  echo "Build context: $(grep '^build_context=' $GITHUB_OUTPUT | cut -d= -f2-)"
+                  echo "Tags: $TAGS"
+                  echo "Short SHA: $SHORT_SHA"
+                  echo "Versioned tags: $VERSIONED_TAGS_CSV"
+                  echo "===================="
+
+            - name: Build & Push (${{ matrix.variant }}-${{ matrix.arch }})
+              id: build
+              uses: docker/build-push-action@v6
+              with:
+                  context: ${{ steps.prep.outputs.build_context }}
+                  file: ${{ steps.prep.outputs.dockerfile }}
+                  target: ${{ env.TARGET }}
+                  platforms: ${{ env.PLATFORM }}
+                  push: true
+                  tags: ${{ steps.prep.outputs.tags }}
+                  cache-from: type=gha
+                  cache-to: type=gha,mode=max
+                  build-args: |
+                      BASE_IMAGE=${{ env.BASE_IMAGE }}
+
+            - name: Cleanup build context
+              if: always()
+              run: |
+                  if [ -n "${{ steps.prep.outputs.build_context }}" ] && [ -d "${{ steps.prep.outputs.build_context }}" ]; then
+                      echo "Cleaning up build context: ${{ steps.prep.outputs.build_context }}"
+                      rm -rf "${{ steps.prep.outputs.build_context }}"
+                  fi
+
+            - name: Summary (${{ matrix.variant }}-${{ matrix.arch }}) - outputs
+              run: |
+                  echo "Image: ${{ env.IMAGE }}"
+                  echo "Variant: ${{ env.VARIANT }}"
+                  echo "Architecture: ${{ env.ARCH }}"
+                  echo "Platform: ${{ env.PLATFORM }}"
+                  echo "Short SHA: ${{ steps.prep.outputs.short_sha }}"
+                  echo "Tags: ${{ steps.prep.outputs.tags }}"
+                  echo "Build digest: ${{ steps.build.outputs.digest }}"
+
+            - name: Save build info for consolidation
+              run: |
+                  mkdir -p build-info
+                  cat > "build-info/${{ matrix.variant }}-${{ matrix.arch }}.json" << EOF
+                  {
+                    "variant": "${{ matrix.variant }}",
+                    "arch": "${{ matrix.arch }}",
+                    "base_image": "${{ matrix.base_image }}",
+                    "image": "${{ env.IMAGE }}",
+                    "short_sha": "${{ steps.prep.outputs.short_sha }}",
+                    "tags": "${{ steps.prep.outputs.tags }}",
+                    "versioned_tags_csv": "${{ steps.prep.outputs.versioned_tags_csv }}",
+                    "platform": "${{ env.PLATFORM }}"
+                  }
+                  EOF
+
+            - name: Upload build info artifact
+              uses: actions/upload-artifact@v7
+              with:
+                  name: build-info-${{ matrix.variant }}-${{ matrix.arch }}
+                  path: build-info/${{ matrix.variant }}-${{ matrix.arch }}.json
+                  retention-days: 1
+
+    merge-manifests:
+        name: Merge Multi-Arch Manifests
+        needs: build-and-push-image
+        if: >
+            github.event_name == 'push' ||
+            (github.event_name == 'pull_request' &&
+             !github.event.pull_request.head.repo.fork)
+        runs-on: ubuntu-24.04
+        strategy:
+            matrix:
+                variant: [python, java, golang]
+        env:
+            IMAGE: ${{ inputs.image != '' && inputs.image || 'ghcr.io/openhands/agent-server' }}
+
+        steps:
+            - name: Download build info to extract SHORT_SHA
+              uses: actions/download-artifact@v8
+              with:
+                  pattern: build-info-${{ matrix.variant }}-*
+                  merge-multiple: true
+                  path: build-info
+
+            - name: Extract SHORT_SHA from build info
+              id: get_sha
+              run: |
+                  # Read SHORT_SHA from the amd64 build info; it is the same for every arch of this variant
+                  SHORT_SHA=$(jq -r '.short_sha' build-info/${{ matrix.variant }}-amd64.json)
+                  echo "short_sha=$SHORT_SHA" >> $GITHUB_OUTPUT
+                  echo "Using SHORT_SHA: $SHORT_SHA"
+
+            - name: Set up Docker Buildx
+              uses: docker/setup-buildx-action@v3
+
+            - name: Log in to GHCR
+              uses: docker/login-action@v4
+              with:
+                  registry: ghcr.io
+                  username: ${{ github.actor }}
+                  password: ${{ secrets.GITHUB_TOKEN }}
+
+            - name: Create and push multi-arch manifest for ${{ matrix.variant }}
+              id: create_manifest
+              run: |
+                  SHORT_SHA=${{ steps.get_sha.outputs.short_sha }}
+                  VARIANT=${{ matrix.variant }}
+                  MANIFEST_TAG="${SHORT_SHA}-${VARIANT}"
+
+                  # Create multi-arch manifest combining amd64 and arm64 using buildx imagetools
+                  # This properly handles manifest lists from Docker builds
+                  echo "Creating multi-arch manifest: ${IMAGE}:${MANIFEST_TAG}"
+                  docker buildx imagetools create -t ${IMAGE}:${MANIFEST_TAG} \
+                    ${IMAGE}:${SHORT_SHA}-${VARIANT}-amd64 \
+                    ${IMAGE}:${SHORT_SHA}-${VARIANT}-arm64
+
+                  # Verify the multi-arch manifest
+                  echo "Inspecting multi-arch manifest:"
+                  docker buildx imagetools inspect ${IMAGE}:${MANIFEST_TAG}
+
+                  echo "✓ Multi-arch manifest created: ${IMAGE}:${MANIFEST_TAG}"
+
+                  # Create latest manifest if on main branch
+                  if [ "${{ github.ref }}" == "refs/heads/main" ]; then
+                      LATEST_TAG="latest-${VARIANT}"
+                      echo "Creating latest multi-arch manifest: ${IMAGE}:${LATEST_TAG}"
+                      docker buildx imagetools create -t ${IMAGE}:${LATEST_TAG} \
+                        ${IMAGE}:main-${VARIANT}-amd64 \
+                        ${IMAGE}:main-${VARIANT}-arm64
+
+                      echo "Inspecting latest multi-arch manifest:"
+                      docker buildx imagetools inspect ${IMAGE}:${LATEST_TAG}
+                      echo "✓ Latest multi-arch manifest created: ${IMAGE}:${LATEST_TAG}"
+
+                      MANIFEST_TAG="${MANIFEST_TAG},${LATEST_TAG}"
+                  fi
+
+                  # Create versioned manifests if triggered by a git tag
+                  # Extract versioned tags from build info (format: "1.2.0-python,1.2.0-java")
+                  VERSIONED_TAGS_CSV=$(jq -r '.versioned_tags_csv' build-info/${VARIANT}-amd64.json)
+                  if [ -n "$VERSIONED_TAGS_CSV" ] && [ "$VERSIONED_TAGS_CSV" != "null" ]; then
+                      echo "Found versioned tags: $VERSIONED_TAGS_CSV"
+                      # Split CSV and create manifest for each versioned tag
+                      IFS=',' read -ra VERSIONED_TAGS <<< "$VERSIONED_TAGS_CSV"
+                      for VERSIONED_TAG in "${VERSIONED_TAGS[@]}"; do
+                          if [ -n "$VERSIONED_TAG" ]; then
+                              echo "Creating versioned multi-arch manifest: ${IMAGE}:${VERSIONED_TAG}"
+                              docker buildx imagetools create -t ${IMAGE}:${VERSIONED_TAG} \
+                                ${IMAGE}:${VERSIONED_TAG}-amd64 \
+                                ${IMAGE}:${VERSIONED_TAG}-arm64
+
+                              echo "Inspecting versioned multi-arch manifest:"
+                              docker buildx imagetools inspect ${IMAGE}:${VERSIONED_TAG}
+                              echo "✓ Versioned multi-arch manifest created: ${IMAGE}:${VERSIONED_TAG}"
+
+                              MANIFEST_TAG="${MANIFEST_TAG},${VERSIONED_TAG}"
+                          fi
+                      done
+                  fi
+
+                  # Save manifest info for consolidation
+                  mkdir -p manifest-info
+                  cat > "manifest-info/${VARIANT}.json" << EOF
+                  {
+                    "variant": "${VARIANT}",
+                    "image": "${IMAGE}",
+                    "short_sha": "${SHORT_SHA}",
+                    "manifest_tag": "${MANIFEST_TAG}"
+                  }
+                  EOF
+
+            - name: Upload manifest info artifact
+              uses: actions/upload-artifact@v7
+              with:
+                  name: manifest-info-${{ matrix.variant }}
+                  path: manifest-info/${{ matrix.variant }}.json
+                  retention-days: 1
+
+    consolidate-build-info:
+        name: Consolidate Build Information
+        needs: [build-and-push-image, merge-manifests]
+        # Run if it's a PR and the matrix job completed (even if some variants failed)
+        if: >
+            github.event_name == 'pull_request' && always() &&
+            (needs.build-and-push-image.result == 'success' ||
+            needs.build-and-push-image.result == 'failure')
+        runs-on: ubuntu-24.04
+        outputs:
+            build_summary: ${{ steps.consolidate.outputs.build_summary }}
+        steps:
+            - name: Download build info artifacts
+              uses: actions/download-artifact@v8
+              with:
+                  pattern: build-info-*
+                  merge-multiple: true
+                  path: build-info
+
+            - name: Download manifest info artifacts
+              uses: actions/download-artifact@v8
+              with:
+                  pattern: manifest-info-*
+                  merge-multiple: true
+                  path: manifest-info
+
+            - name: Consolidate build information from artifacts
+              id: consolidate
+              run: |
+                  echo "Processing build info artifacts..."
+                  ls -la build-info/
+                  echo "Found $(ls build-info/*.json 2>/dev/null | wc -l) JSON files"
+
+                  # Initialize variables
+                  IMAGE=""
+                  SHORT_SHA=""
+                  ALL_TAGS=""
+
+                  # Use associative arrays to track variants (bash 4+)
+                  declare -A VARIANT_BASE_IMAGE
+                  declare -A VARIANT_ARCHS
+
+                  # Process each build info
+                  for info_file in build-info/*.json; do
+                      if [[ ! -f "$info_file" ]]; then
+                          echo "Skipping $info_file - not a file"
+                          continue
+                      fi
+
+                      echo "=== Processing $info_file ==="
+                      cat "$info_file"
+                      echo "=== End of $info_file ==="
+
+                      # Extract information from JSON
+                      VARIANT=$(jq -r '.variant' "$info_file")
+                      ARCH=$(jq -r '.arch' "$info_file")
+                      BASE_IMAGE=$(jq -r '.base_image' "$info_file")
+                      VARIANT_IMAGE=$(jq -r '.image' "$info_file")
+                      VARIANT_SHA=$(jq -r '.short_sha' "$info_file")
+                      VARIANT_TAGS=$(jq -r '.tags' "$info_file")
+
+                      # Set common values (same across all builds)
+                      if [[ -z "$IMAGE" ]]; then
+                          IMAGE="$VARIANT_IMAGE"
+                          SHORT_SHA="$VARIANT_SHA"
+                      fi
+
+                      # Store variant information
+                      VARIANT_BASE_IMAGE[$VARIANT]=$BASE_IMAGE
+                      if [[ -z "${VARIANT_ARCHS[$VARIANT]}" ]]; then
+                          VARIANT_ARCHS[$VARIANT]=$ARCH
+                      else
+                          VARIANT_ARCHS[$VARIANT]="${VARIANT_ARCHS[$VARIANT]}, $ARCH"
+                      fi
+
+                      # Collect tags (comma-separated to newline-separated)
+                      if [[ -n "$VARIANT_TAGS" ]]; then
+                          VARIANT_TAG_LIST=$(echo "$VARIANT_TAGS" | tr ',' '\n')
+                          if [[ -n "$ALL_TAGS" ]]; then
+                              ALL_TAGS="${ALL_TAGS}"$'\n'"${VARIANT_TAG_LIST}"
+                          else
+                              ALL_TAGS="$VARIANT_TAG_LIST"
+                          fi
+                      fi
+                  done
+
+                  # Build variants JSON array from collected data
+                  VARIANTS_JSON="[]"
+                  for VARIANT in "${!VARIANT_BASE_IMAGE[@]}"; do
+                      BASE_IMG="${VARIANT_BASE_IMAGE[$VARIANT]}"
+                      ARCHS="${VARIANT_ARCHS[$VARIANT]}"
+                      VARIANTS_JSON=$(echo "$VARIANTS_JSON" | jq \
+                          --arg variant "$VARIANT" \
+                          --arg base_image "$BASE_IMG" \
+                          --arg archs "$ARCHS" \
+                          '. += [{custom_tags: $variant, base_image: $base_image, architectures: $archs}]')
+
+                      echo "Added variant $VARIANT ($ARCHS), current variants JSON:"
+                      echo "$VARIANTS_JSON" | jq .
+                  done
+
+                  # Process manifest info artifacts
+                  echo "Processing manifest info artifacts..."
+                  if [[ -d "manifest-info" ]]; then
+                      ls -la manifest-info/
+
+                      MANIFEST_TAGS=""
+                      for manifest_file in manifest-info/*.json; do
+                          if [[ -f "$manifest_file" ]]; then
+                              echo "=== Processing $manifest_file ==="
+                              cat "$manifest_file"
+
+                              MANIFEST_TAG_CSV=$(jq -r '.manifest_tag' "$manifest_file")
+                              # Convert comma-separated tags to newline-separated
+                              MANIFEST_TAG_LIST=$(echo "$MANIFEST_TAG_CSV" | tr ',' '\n' | sed "s|^|${IMAGE}:|")
+
+                              if [[ -n "$MANIFEST_TAGS" ]]; then
+                                  MANIFEST_TAGS="${MANIFEST_TAGS}"$'\n'"${MANIFEST_TAG_LIST}"
+                              else
+                                  MANIFEST_TAGS="$MANIFEST_TAG_LIST"
+                              fi
+                          fi
+                      done
+
+                      # Add manifest tags to ALL_TAGS
+                      if [[ -n "$MANIFEST_TAGS" ]]; then
+                          echo "Adding manifest tags to output"
+                          if [[ -n "$ALL_TAGS" ]]; then
+                              ALL_TAGS="${ALL_TAGS}"$'\n'"${MANIFEST_TAGS}"
+                          else
+                              ALL_TAGS="$MANIFEST_TAGS"
+                          fi
+                      fi
+                  else
+                      echo "No manifest-info directory found (merge-manifests may not have run)"
+                  fi
+
+                  # Create consolidated build summary
+                  BUILD_SUMMARY=$(jq -n \
+                      --arg image "$IMAGE" \
+                      --arg short_sha "$SHORT_SHA" \
+                      --arg ghcr_url "https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server" \
+                      --arg all_tags "$ALL_TAGS" \
+                      --argjson variants "$VARIANTS_JSON" \
+                      '{
+                          image: $image,
+                          short_sha: $short_sha,
+                          ghcr_package_url: $ghcr_url,
+                          all_tags: $all_tags,
+                          variants: $variants
+                      }')
+
+                  echo "Consolidated build summary:"
+                  echo "$BUILD_SUMMARY" | jq .
+
+                  echo "DEBUG: Final variants count: $(echo "$VARIANTS_JSON" | jq 'length')"
+                  echo "DEBUG: Final variants: $(echo "$VARIANTS_JSON" | jq -c '.')"
+
+                  # Set output
+                  {
+                      echo 'build_summary<<EOF'
+                      echo "$BUILD_SUMMARY"
+                      echo 'EOF'
+                  } >> $GITHUB_OUTPUT
+
+    update-pr-description:
+        name: Update PR description with agent server image
+        needs: consolidate-build-info
+        # Only on PRs, and only if the consolidation succeeded
+        if: github.event_name == 'pull_request' && needs.consolidate-build-info.result == 'success'
+        runs-on: ubuntu-24.04
+        permissions:
+            contents: read
+            pull-requests: write
+
+        steps:
+            - name: Generate PR description from build summary
+              id: generate_description
+              run: |
+                  echo "Event: ${{ github.event_name }}"
+                  echo "PR number: ${{ github.event.number }}"
+                  echo "Run attempt: ${{ github.run_attempt }}"
+
+                  # Parse the build summary JSON
+                  BUILD_SUMMARY='${{ needs.consolidate-build-info.outputs.build_summary }}'
+                  echo "Build summary received:"
+                  echo "$BUILD_SUMMARY" | jq .
+
+                  # Extract basic information
+                  IMAGE=$(echo "$BUILD_SUMMARY" | jq -r '.image')
+                  SHORT_SHA=$(echo "$BUILD_SUMMARY" | jq -r '.short_sha')
+                  GHCR_URL=$(echo "$BUILD_SUMMARY" | jq -r '.ghcr_package_url')
+                  ALL_TAGS=$(echo "$BUILD_SUMMARY" | jq -r '.all_tags')
+
+                  # Build the variants table dynamically
+                  VARIANTS_TABLE=""
+
+                  # Process each build
+                  VARIANTS=$(echo "$BUILD_SUMMARY" | jq -r '.variants[] | @base64')
+                  echo "DEBUG: Found builds (base64 encoded):"
+                  echo "$VARIANTS"
+                  echo "DEBUG: Number of builds: $(echo "$VARIANTS" | wc -l)"
+
+                  for variant_data in $VARIANTS; do
+                      # Decode base64 and extract build info
+                      VARIANT_JSON=$(echo "$variant_data" | base64 --decode)
+                      echo "DEBUG: Processing build JSON: $VARIANT_JSON"
+                      CUSTOM_TAGS=$(echo "$VARIANT_JSON" | jq -r '.custom_tags')
+                      BASE_IMAGE=$(echo "$VARIANT_JSON" | jq -r '.base_image')
+                      ARCHS=$(echo "$VARIANT_JSON" | jq -r '.architectures // "amd64, arm64"')
+
+                      echo "DEBUG: Adding variant $CUSTOM_TAGS with base image $BASE_IMAGE (archs: $ARCHS)"
+                      # Add to variants table with architecture info
+                      VARIANTS_TABLE="${VARIANTS_TABLE}| ${CUSTOM_TAGS} | ${ARCHS} | \`${BASE_IMAGE}\` | [Link](https://hub.docker.com/_/${BASE_IMAGE}) |"$'\n'
+                  done
+
+                  echo "DEBUG: Final variants table:"
+                  echo "$VARIANTS_TABLE"
+
+                  # Create the complete PR description with the requested format
+                  PR_CONTENT=$(cat << EOF
+
+                  <!-- AGENT_SERVER_IMAGES_START -->
+                  ---
+                  **Agent Server images for this PR**
+
+                  • **GHCR package:** ${GHCR_URL}
+
+                  **Variants & Base Images**
+                  | Variant | Architectures | Base Image | Docs / Tags |
+                  |---|---|---|---|
+                  ${VARIANTS_TABLE}
+
+                  **Pull (multi-arch manifest)**
+                  \`\`\`bash
+                  # Each variant is a multi-arch manifest supporting both amd64 and arm64
+                  docker pull ${IMAGE}:${SHORT_SHA}-python
+                  \`\`\`
+
+                  **Run**
+                  \`\`\`bash
+                  docker run -it --rm \\
+                    -p 8000:8000 \\
+                    --name agent-server-${SHORT_SHA}-python \\
+                    ${IMAGE}:${SHORT_SHA}-python
+                  \`\`\`
+
+                  **All tags pushed for this build**
+                  \`\`\`
+                  ${ALL_TAGS}
+                  \`\`\`
+
+                  **About Multi-Architecture Support**
+                  - Each variant tag (e.g., \`${SHORT_SHA}-python\`) is a **multi-arch manifest** supporting both **amd64** and **arm64**
+                  - Docker automatically pulls the correct architecture for your platform
+                  - Individual architecture tags (e.g., \`${SHORT_SHA}-python-amd64\`) are also available if needed
+                  <!-- AGENT_SERVER_IMAGES_END -->
+                  EOF
+                  )
+
+                  # Set output for the next step
+                  {
+                      echo 'pr_content<<EOF'
+                      echo "$PR_CONTENT"
+                      echo 'EOF'
+                  } >> $GITHUB_OUTPUT
+
+            - name: Update PR description with docker image details
+              uses: nefrob/pr-description@v1.2.0
+              with:
+                  content: ${{ steps.generate_description.outputs.pr_content }}
+                  regex: <!-- AGENT_SERVER_IMAGES_START -->.*?<!-- AGENT_SERVER_IMAGES_END -->
+                  regexFlags: s
+                  token: ${{ secrets.GITHUB_TOKEN }}
@@ -1,23 +1,30 @@
+---
 # Workflow that marks issues and PRs with no activity for 30 days with "Stale" and closes them after 7 more days of no activity
-name: 'Close stale issues'
+name: Close stale issues

 # Runs every day at 01:30
 on:
-  schedule:
-    - cron: '30 1 * * *'
+    schedule:
+        - cron: 30 1 * * *

 jobs:
-  stale:
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    if: github.repository == 'OpenHands/OpenHands'
-    steps:
-      - uses: actions/stale@v9
-        with:
-          stale-issue-message: 'This issue is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.'
-          stale-pr-message: 'This PR is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment, otherwise it will be closed in 10 days.'
-          days-before-stale: 40
-          exempt-issue-labels: roadmap,backlog,app-team
-          close-issue-message: 'This issue was automatically closed due to 50 days of inactivity. We do this to help keep the issues somewhat manageable and focus on active issues.'
-          close-pr-message: 'This PR was closed because it had no activity for 50 days. If you feel this was closed in error, and you would like to continue the PR, please resubmit or let us know.'
-          days-before-close: 10
-          operations-per-run: 300
+    stale:
+        # Only run scheduled jobs in the main repository, not in forks
+        if: github.repository == 'OpenHands/software-agent-sdk'
+        runs-on: ubuntu-22.04
+        steps:
+            - uses: actions/stale@v10
+              with:
+                  repo-token: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+                  stale-issue-message: This issue is stale because it has been open for 40 days with no activity. Remove the stale label or leave a 
+                      comment, otherwise it will be closed in 10 days.
+                  stale-pr-message: This PR is stale because it has been open for 40 days with no activity. Remove the stale label or leave a comment,
+                      otherwise it will be closed in 10 days.
+                  days-before-stale: 40
+                  exempt-issue-labels: roadmap,backlog
+                  close-issue-message: This issue was automatically closed due to 50 days of inactivity. We do this to help keep the issues somewhat 
+                      manageable and focus on active issues.
+                  close-pr-message: This PR was closed because it had no activity for 50 days. If you feel this was closed in error, and you would 
+                      like to continue the PR, please resubmit or let us know.
+                  days-before-close: 10
+                  operations-per-run: 150
@@ -0,0 +1,322 @@
+---
+name: Run tests
+
+on:
+    push:
+        branches: [main]
+    pull_request:
+        branches: ['**']
+
+permissions:
+    contents: write
+    pull-requests: write
+
+jobs:
+    sdk-tests:
+        runs-on: blacksmith-2vcpu-ubuntu-2404
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+              with: {fetch-depth: 0}
+
+            - name: Detect sdk changes
+              id: changed
+              uses: tj-actions/changed-files@v47
+              with:
+                  files: |
+                      openhands-sdk/**
+                      tests/sdk/**
+                      pyproject.toml
+                      uv.lock
+                      .github/workflows/tests.yml
+
+            - name: Install uv
+              if: steps.changed.outputs.any_changed == 'true'
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+                  python-version: '3.13'
+
+            - name: Install deps
+              if: steps.changed.outputs.any_changed == 'true'
+              run: uv sync --frozen --group dev
+
+            - name: Check for openhands.tools imports in sdk tests
+              if: steps.changed.outputs.any_changed == 'true'
+              run: |
+                  echo "Checking for openhands.tools imports in tests/sdk..."
+                  if grep -r "from openhands\.tools" tests/sdk/ || grep -r "import openhands\.tools" tests/sdk/; then
+                    echo "ERROR: Found openhands.tools imports in tests/sdk/"
+                    echo "SDK tests should only import from openhands.sdk"
+                    echo "Please move tests that use openhands.tools to tests/cross/"
+                    exit 1
+                  fi
+                  echo "✓ No openhands.tools imports found in tests/sdk/"
+
+            - name: Run sdk tests with coverage
+              if: steps.changed.outputs.any_changed == 'true'
+              run: |
+                  # Clean up any existing coverage file
+                  rm -f .coverage
+                  # Use pytest-xdist (-n auto) for parallel execution with proper
+                  # coverage collection. --forked prevents coverage from child processes.
+                  CI=true uv run python -m pytest -vvs \
+                    -n auto \
+                    --cov=openhands-sdk \
+                    --cov-report=term-missing \
+                    --cov-fail-under=0 \
+                    --cov-config=pyproject.toml \
+                    tests/sdk
+                  # Rename coverage file for upload
+                  if [ -f .coverage ]; then
+                    mv .coverage coverage-sdk.dat
+                    echo "SDK coverage file prepared for upload"
+                  fi
+
+            - name: Upload sdk coverage
+              if: steps.changed.outputs.any_changed == 'true' && always()
+              uses: actions/upload-artifact@v7
+              with:
+                  name: coverage-sdk
+                  path: coverage-sdk.dat
+                  if-no-files-found: warn
+
+    tools-tests:
+        runs-on: blacksmith-2vcpu-ubuntu-2404
+        timeout-minutes: 15
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+              with: {fetch-depth: 0}
+
+            - name: Detect tools changes
+              id: changed
+              uses: tj-actions/changed-files@v47
+              with:
+                  files: |
+                      openhands-tools/**
+                      tests/tools/**
+                      pyproject.toml
+                      uv.lock
+                      .github/workflows/tests.yml
+
+            - name: Install uv
+              if: steps.changed.outputs.any_changed == 'true'
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+                  python-version: '3.13'
+
+            - name: Install deps
+              if: steps.changed.outputs.any_changed == 'true'
+              run: uv sync --frozen --group dev
+
+            - name: Run tools tests with coverage
+              if: steps.changed.outputs.any_changed == 'true'
+              run: |
+                  # Clean up any existing coverage file
+                  rm -f .coverage
+                  # Use --forked for tools tests due to terminal test conflicts
+                  # when running in parallel (shared /tmp paths, subprocess management)
+                  CI=true uv run python -m pytest -vvs \
+                    --forked \
+                    --cov=openhands-tools \
+                    --cov-report=term-missing \
+                    --cov-fail-under=0 \
+                    --cov-config=pyproject.toml \
+                    tests/tools
+                  # Rename coverage file for upload
+                  if [ -f .coverage ]; then
+                    mv .coverage coverage-tools.dat
+                    echo "Tools coverage file prepared for upload"
+                  fi
+
+            - name: Upload tools coverage
+              if: steps.changed.outputs.any_changed == 'true' && always()
+              uses: actions/upload-artifact@v7
+              with:
+                  name: coverage-tools
+                  path: coverage-tools.dat
+                  if-no-files-found: warn
+
+    agent-server-tests:
+        runs-on: blacksmith-2vcpu-ubuntu-2404
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+              with: {fetch-depth: 0}
+
+            - name: Detect Agent Server changes
+              id: changed
+              uses: tj-actions/changed-files@v47
+              with:
+                  files: |
+                      openhands-agent-server/**
+                      tests/agent_server/**
+                      pyproject.toml
+                      uv.lock
+                      .github/workflows/tests.yml
+
+            - name: Install uv
+              if: steps.changed.outputs.any_changed == 'true'
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+                  python-version: '3.13'
+
+            - name: Install deps
+              if: steps.changed.outputs.any_changed == 'true'
+              run: uv sync --frozen --group dev
+
+            - name: Run Agent Server tests with coverage
+              if: steps.changed.outputs.any_changed == 'true'
+              run: |
+                  # Clean up any existing coverage file
+                  rm -f .coverage
+                  # Use pytest-xdist (-n auto) for parallel execution with proper
+                  # coverage collection. --forked prevents coverage from child processes.
+                  CI=true uv run python -m pytest -vvs \
+                    -n auto \
+                    --cov=openhands-agent-server \
+                    --cov-report=term-missing \
+                    --cov-fail-under=0 \
+                    --cov-config=pyproject.toml \
+                    tests/agent_server
+                  # Rename coverage file for upload
+                  if [ -f .coverage ]; then
+                    mv .coverage coverage-agent-server.dat
+                    echo "Agent Server coverage file prepared for upload"
+                  fi
+
+            - name: Upload Agent Server coverage
+              if: steps.changed.outputs.any_changed == 'true' && always()
+              uses: actions/upload-artifact@v7
+              with:
+                  name: coverage-agent-server
+                  path: coverage-agent-server.dat
+                  if-no-files-found: warn
+
+    cross-tests:
+        runs-on: blacksmith-2vcpu-ubuntu-2404
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+              with: {fetch-depth: 0}
+
+            - name: Detect cross changes
+              id: changed
+              uses: tj-actions/changed-files@v47
+              with:
+                  files: |
+                      tests/**
+                      openhands/**
+                      pyproject.toml
+                      uv.lock
+                      .github/workflows/tests.yml
+
+            - name: Install uv
+              if: steps.changed.outputs.any_changed == 'true'
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+                  python-version: '3.13'
+
+            - name: Install deps
+              if: steps.changed.outputs.any_changed == 'true'
+              run: uv sync --frozen --group dev
+
+            - name: Run cross tests with coverage
+              if: steps.changed.outputs.any_changed == 'true'
+              run: |
+                  # Clean up any existing coverage file
+                  rm -f .coverage
+                  CI=true uv run python -m pytest -vvs \
+                    --basetemp="${{ runner.temp }}/pytest" \
+                    -o tmp_path_retention=none \
+                    -o tmp_path_retention_count=0 \
+                    --cov=openhands \
+                    --cov-report=term-missing \
+                    --cov-fail-under=0 \
+                    --cov-config=pyproject.toml \
+                    tests/cross
+                  # Rename coverage file for upload
+                  if [ -f .coverage ]; then
+                    mv .coverage coverage-cross.dat
+                    echo "Cross coverage file prepared for upload"
+                  fi
+
+            - name: Upload cross coverage
+              if: steps.changed.outputs.any_changed == 'true' && always()
+              uses: actions/upload-artifact@v7
+              with:
+                  name: coverage-cross
+                  path: coverage-cross.dat
+                  if-no-files-found: warn
+
+    coverage-report:
+        runs-on: blacksmith-2vcpu-ubuntu-2404
+        needs: [sdk-tests, tools-tests, agent-server-tests, cross-tests]
+        if: always() && github.event_name == 'pull_request'
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+                  python-version: '3.13'
+
+            - name: Install deps (for coverage CLI)
+              run: uv sync --frozen --group dev
+
+            - name: Download coverage artifacts
+              uses: actions/download-artifact@v8
+              with:
+                  path: ./cov
+              continue-on-error: true
+
+            - name: Combine coverage data
+              run: |
+                  shopt -s nullglob
+                  # For some reason, the GitHub action won't properly upload the original
+                  # .coverage* files
+                  # Convert uploaded .dat files back to .coverage format for coverage tool
+                  for dat_file in cov/**/coverage-*.dat; do
+                    if [[ "$dat_file" == *coverage-sdk.dat ]]; then
+                      cp "$dat_file" .coverage.sdk
+                    elif [[ "$dat_file" == *coverage-tools.dat ]]; then
+                      cp "$dat_file" .coverage.tools
+                    elif [[ "$dat_file" == *coverage-agent-server.dat ]]; then
+                      cp "$dat_file" .coverage.agent-server
+                    elif [[ "$dat_file" == *coverage-cross.dat ]]; then
+                      cp "$dat_file" .coverage.cross
+                    fi
+                  done
+
+                  # Check if we have any coverage files
+                  coverage_files=(.coverage.*)
+                  if [ ${#coverage_files[@]} -eq 0 ]; then
+                    echo "No coverage files found; skipping combined report."
+                    exit 0
+                  fi
+
+                  echo "Found ${#coverage_files[@]} coverage files"
+                  uv run coverage combine
+                  uv run coverage xml -i -o coverage.xml
+                  uv run coverage report -m
+
+            - name: Pytest coverage PR comment
+              if: always()
+              continue-on-error: true
+              uses: MishaKav/pytest-coverage-comment@v1
+              with:
+                  github-token: ${{ secrets.GITHUB_TOKEN }}
+                  pytest-xml-coverage-path: coverage.xml
+                  title: Coverage Report
+                  create-new-comment: false
+                  hide-report: false
+                  xml-skip-covered: true
+                  report-only-changed-files: true
+                  remove-links-to-files: true
+                  remove-links-to-lines: true
@@ -0,0 +1,322 @@
+---
+# Automated TODO Management Workflow
+#
+# This workflow automatically scans for TODO(openhands) comments and creates
+# pull requests to implement them using the OpenHands agent.
+#
+# Setup:
+#  1. Add LLM_API_KEY to repository secrets
+#  2. Ensure GITHUB_TOKEN has appropriate permissions
+#  3. Make sure GitHub Actions are allowed to create and review PRs
+#  4. Commit this file to .github/workflows/ in your repository
+#  5. Configure the schedule or trigger manually
+
+name: Automated TODO Management
+
+on:
+    # Manual trigger
+    workflow_dispatch:
+        inputs:
+            max_todos:
+                description: Maximum number of TODOs to process in this run
+                required: false
+                default: '3'
+                type: string
+            todo_identifier:
+                description: TODO identifier to search for (e.g., TODO(openhands))
+                required: false
+                default: TODO(openhands)
+                type: string
+
+    # Trigger when 'automatic-todo' label is added to a PR
+    pull_request:
+        types: [labeled]
+
+    # Scheduled trigger (disabled by default, uncomment and customize as needed)
+    # schedule:
+    #     # Run every Monday at 9 AM UTC
+    #     - cron: "0 9 * * 1"
+
+permissions:
+    contents: write
+    pull-requests: write
+    issues: write
+
+jobs:
+    scan-todos:
+        runs-on: ubuntu-24.04
+        # Only run if triggered manually or if 'automatic-todo' label was added
+        if: >
+            github.event_name == 'workflow_dispatch' ||
+            (github.event_name == 'pull_request' &&
+             github.event.label.name == 'automatic-todo')
+        outputs:
+            todos: ${{ steps.scan.outputs.todos }}
+            todo-count: ${{ steps.scan.outputs.todo-count }}
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+              with:
+                  fetch-depth: 0 # Full history for better context
+
+            - name: Set up Python
+              uses: actions/setup-python@v6
+              with:
+                  python-version: '3.13'
+
+            - name: Copy TODO scanner
+              run: |
+                  cp examples/03_github_workflows/03_todo_management/scanner.py /tmp/scanner.py
+                  chmod +x /tmp/scanner.py
+
+            - name: Scan for TODOs
+              id: scan
+              run: |
+                  echo "Scanning for TODO comments..."
+
+                  # Run the scanner and capture output
+                  TODO_IDENTIFIER="${{ github.event.inputs.todo_identifier || 'TODO(openhands)' }}"
+                  python /tmp/scanner.py . --identifier "$TODO_IDENTIFIER" > todos.json
+
+                  # Count TODOs
+                  TODO_COUNT=$(python -c \
+                    "import json; data=json.load(open('todos.json')); print(len(data))")
+                  echo "Found $TODO_COUNT $TODO_IDENTIFIER items"
+
+                  # Limit the number of TODOs to process
+                  MAX_TODOS="${{ github.event.inputs.max_todos || '3' }}"
+                  if [ "$TODO_COUNT" -gt "$MAX_TODOS" ]; then
+                    echo "Limiting to first $MAX_TODOS TODOs"
+                    python -c "
+                  import json
+                  data = json.load(open('todos.json'))
+                  limited = data[:$MAX_TODOS]
+                  json.dump(limited, open('todos.json', 'w'), indent=2)
+                  "
+                    TODO_COUNT=$MAX_TODOS
+                  fi
+
+                  # Set outputs
+                  echo "todos=$(cat todos.json | jq -c .)" >> $GITHUB_OUTPUT
+                  echo "todo-count=$TODO_COUNT" >> $GITHUB_OUTPUT
+
+                  # Display found TODOs
+                  echo "## 📋 Found TODOs" >> $GITHUB_STEP_SUMMARY
+                  if [ "$TODO_COUNT" -eq 0 ]; then
+                    echo "No $TODO_IDENTIFIER comments found." >> $GITHUB_STEP_SUMMARY
+                  else
+                    echo "Found $TODO_COUNT $TODO_IDENTIFIER items:" \
+                      >> $GITHUB_STEP_SUMMARY
+                    echo "" >> $GITHUB_STEP_SUMMARY
+                    python -c "
+                  import json
+                  data = json.load(open('todos.json'))
+                  for i, todo in enumerate(data, 1):
+                      print(f'{i}. **{todo[\"file\"]}:{todo[\"line\"]}** - ' +
+                            f'{todo[\"description\"]}')
+                  " >> $GITHUB_STEP_SUMMARY
+                  fi
+
+    process-todos:
+        needs: scan-todos
+        if: needs.scan-todos.outputs.todo-count > 0
+        runs-on: ubuntu-24.04
+        strategy:
+            matrix:
+                todo: ${{ fromJson(needs.scan-todos.outputs.todos) }}
+            max-parallel: 1 # Process one TODO at a time to avoid conflicts
+        steps:
+            - name: Checkout repository
+              uses: actions/checkout@v5
+              with:
+                  fetch-depth: 0
+                  token: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+
+            - name: Switch to feature branch with TODO management files
+              run: |
+                  git checkout openhands/todo-management-example
+                  git pull origin openhands/todo-management-example
+
+            - name: Set up Python
+              uses: actions/setup-python@v6
+              with:
+                  python-version: '3.13'
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  enable-cache: true
+
+            - name: Install OpenHands dependencies
+              run: |
+                  # Install OpenHands SDK and tools from git repository
+                  uv pip install --system "openhands-sdk @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-sdk"
+                  uv pip install --system "openhands-tools @ git+https://github.com/OpenHands/agent-sdk.git@main#subdirectory=openhands-tools"
+
+            - name: Copy agent files
+              run: |
+                  cp examples/03_github_workflows/03_todo_management/agent_script.py agent.py
+                  cp examples/03_github_workflows/03_todo_management/prompt.py prompt.py
+                  chmod +x agent.py
+
+            - name: Configure Git
+              run: |
+                  git config --global user.name "openhands-bot"
+                  git config --global user.email \
+                    "openhands-bot@users.noreply.github.com"
+
+            - name: Process TODO
+              env:
+                  LLM_MODEL: litellm_proxy/claude-sonnet-4-5-20250929
+                  LLM_BASE_URL: https://llm-proxy.app.all-hands.dev
+                  LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
+                  GITHUB_TOKEN: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+                  GITHUB_REPOSITORY: ${{ github.repository }}
+                  TODO_FILE: ${{ matrix.todo.file }}
+                  TODO_LINE: ${{ matrix.todo.line }}
+                  TODO_DESCRIPTION: ${{ matrix.todo.description }}
+                  PYTHONPATH: ''
+              run: |
+                  echo "Processing TODO: $TODO_DESCRIPTION"
+                  echo "File: $TODO_FILE:$TODO_LINE"
+
+                  # Create a unique branch name for this TODO
+                  BRANCH_NAME="todo/$(echo "$TODO_DESCRIPTION" | \
+                    sed 's/[^a-zA-Z0-9]/-/g' | \
+                    sed 's/--*/-/g' | \
+                    sed 's/^-\|-$//g' | \
+                    tr '[:upper:]' '[:lower:]' | \
+                    cut -c1-50)"
+                  echo "Branch name: $BRANCH_NAME"
+
+                  # Create and switch to new branch (force create if exists)
+                  git checkout -B "$BRANCH_NAME"
+
+                  # Run the agent to process the TODO
+                  # Stay in repository directory for git operations
+
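+                  # Create JSON payload for the agent. Build it with jq rather than by
+                  # string interpolation so that quotes and backslashes in the TODO
+                  # description are escaped and the payload is always valid JSON.
+                  TODO_JSON=$(jq -n \
+                    --arg file "$TODO_FILE" \
+                    --argjson line "$TODO_LINE" \
+                    --arg description "$TODO_DESCRIPTION" \
+                    '{file: $file, line: $line, description: $description}'
+                  )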
+
+                  echo "JSON payload for agent:"
+                  echo "$TODO_JSON"
+
+                  # Debug environment and setup
+                  echo "Current working directory: $(pwd)"
+                  echo "Environment variables:"
+                  echo "  LLM_MODEL: $LLM_MODEL"
+                  echo "  LLM_BASE_URL: $LLM_BASE_URL"
+                  echo "  GITHUB_REPOSITORY: $GITHUB_REPOSITORY"
+                  echo "  LLM_API_KEY: ${LLM_API_KEY:+[SET]}"
+                  echo "  GITHUB_TOKEN: ${GITHUB_TOKEN:+[SET]}"
+                  echo "Available files:"
+                  ls -la
+
+                  # Run the agent with detailed logging
+                  echo "Starting agent execution..."
+                  set +e  # Don't exit on error, we want to capture it
+                  uv run python agent.py "$TODO_JSON" 2>&1 | tee agent_output.log
+                  AGENT_EXIT_CODE=${PIPESTATUS[0]}  # exit status of the agent, not of tee
+                  set -e
+
+                  echo "Agent exit code: $AGENT_EXIT_CODE"
+                  echo "Agent output log:"
+                  cat agent_output.log
+
+                  # Show files in working directory
+                  echo "Files in working directory:"
+                  ls -la
+
+                  # If agent failed, show more details
+                  if [ $AGENT_EXIT_CODE -ne 0 ]; then
+                    echo "Agent failed with exit code $AGENT_EXIT_CODE"
+                    echo "Last 50 lines of agent output:"
+                    tail -50 agent_output.log
+                    exit $AGENT_EXIT_CODE
+                  fi
+
+                  # Check if any changes were made
+                  cd "$GITHUB_WORKSPACE"
+                  if git diff --quiet; then
+                    echo "No changes made by agent, skipping PR creation"
+                    exit 0
+                  fi
+
+                  # Commit changes
+                  git add -A
+                  git commit -m "Implement TODO: $TODO_DESCRIPTION
+
+                  Automatically implemented by OpenHands agent.
+
+                  Co-authored-by: openhands <openhands@all-hands.dev>"
+
+                  # Push branch
+                  git push origin "$BRANCH_NAME"
+
+                  # Create pull request
+                  PR_TITLE="Implement TODO: $TODO_DESCRIPTION"
+                  PR_BODY="## 🤖 Automated TODO Implementation
+
+                  This PR automatically implements the following TODO:
+
+                  **File:** \`$TODO_FILE:$TODO_LINE\`
+                  **Description:** $TODO_DESCRIPTION
+
+                  ### Implementation
+                  The OpenHands agent has analyzed the TODO and implemented the
+                  requested functionality.
+
+                  ### Review Notes
+                  - Please review the implementation for correctness
+                  - Test the changes in your development environment
+                  - The original TODO comment will be updated with this PR URL
+                    once merged
+
+                  ---
+                  *This PR was created automatically by the TODO Management workflow.*"
+
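+                  # Create PR via the GitHub REST API. Build the payload with jq so that
+                  # quotes and newlines in the title and body are escaped as valid JSON.
+                  PR_PAYLOAD=$(jq -n \
+                    --arg title "$PR_TITLE" --arg body "$PR_BODY" \
+                    --arg head "$BRANCH_NAME" --arg base "${{ github.ref_name }}" \
+                    '{title: $title, body: $body, head: $head, base: $base}')
+                  curl -X POST \
+                    -H "Authorization: token $GITHUB_TOKEN" \
+                    -H "Accept: application/vnd.github.v3+json" \
+                    "https://api.github.com/repos/${{ github.repository }}/pulls" \
+                    -d "$PR_PAYLOAD"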
+
+    summary:
+        needs: [scan-todos, process-todos]
+        if: always()
+        runs-on: ubuntu-24.04
+        steps:
+            - name: Generate Summary
+              run: |
+                  echo "# 🤖 TODO Management Summary" >> $GITHUB_STEP_SUMMARY
+                  echo "" >> $GITHUB_STEP_SUMMARY
+
+                  TODO_COUNT="${{ needs.scan-todos.outputs.todo-count || '0' }}"
+                  echo "**TODOs Found:** $TODO_COUNT" >> $GITHUB_STEP_SUMMARY
+
+                  if [ "$TODO_COUNT" -gt 0 ]; then
+                    echo "**Processing Status:** ✅ Completed" >> $GITHUB_STEP_SUMMARY
+                    echo "" >> $GITHUB_STEP_SUMMARY
+                    echo "Check the pull requests created for each TODO" \
+                      "implementation." >> $GITHUB_STEP_SUMMARY
+                  else
+                    echo "**Status:** ℹ️ No TODOs found to process" \
+                      >> $GITHUB_STEP_SUMMARY
+                  fi
+
+                  echo "" >> $GITHUB_STEP_SUMMARY
+                  echo "---" >> $GITHUB_STEP_SUMMARY
+                  echo "*Workflow completed at $(date)*" >> $GITHUB_STEP_SUMMARY
@@ -1,34 +0,0 @@
-name: Run UI Component Build
-
-# * Always run on "main"
-# * Run on PRs that have changes in the "openhands-ui" folder or this workflow
-on:
-  push:
-    branches:
-      - main
-  pull_request:
-    paths:
-      - 'openhands-ui/**'
-      -  '.github/workflows/ui-build.yml'
-
-# If triggered by a PR, it will be in the same group. However, each commit on main will be in its own unique group
-concurrency:
-  group: ${{ github.workflow }}-${{ (github.head_ref && github.ref) || github.run_id }}
-  cancel-in-progress: true
-
-jobs:
-  ui-build:
-    name: Build openhands-ui
-    runs-on: blacksmith-4vcpu-ubuntu-2204
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-      - uses: oven-sh/setup-bun@v2
-        with:
-          bun-version-file: "openhands-ui/.bun-version"
-      - name: Install dependencies
-        working-directory: ./openhands-ui
-        run: bun install --frozen-lockfile
-      - name: Build package
-        working-directory:  ./openhands-ui
-        run: bun run build
@@ -0,0 +1,25 @@
+---
+name: Version bump guard
+
+on:
+    pull_request:
+        branches: [main]
+
+jobs:
+    version-bump-guard:
+        name: Check package versions
+        runs-on: ubuntu-latest
+        permissions:
+            contents: read
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+              with:
+                  fetch-depth: 0
+
+            - name: Validate package version changes
+              env:
+                  VERSION_BUMP_BASE_REF: ${{ github.base_ref }}
+                  PR_TITLE: ${{ github.event.pull_request.title }}
+                  PR_HEAD_REF: ${{ github.event.pull_request.head.ref }}
+              run: python3 .github/scripts/check_version_bumps.py
@@ -0,0 +1,346 @@
+---
+name: Create Version Bump PRs
+
+on:
+    # Triggered by pypi-release workflow after successful publish
+    # Note: No branches filter - releases run on tags (e.g., v1.11.4), not branches
+    workflow_run:
+        workflows: [Publish all OpenHands packages (uv)]
+        types: [completed]
+    # Allow manual trigger with version input
+    workflow_dispatch:
+        inputs:
+            version:
+                description: Version to bump to (e.g., 1.11.3)
+                required: true
+                type: string
+
+jobs:
+    create-version-bump-prs:
+        runs-on: ubuntu-24.04
+        # Only run on successful workflow_run or manual dispatch
+        if: >
+            github.event_name == 'workflow_dispatch' ||
+            (github.event.workflow_run.conclusion == 'success' &&
+             github.event.workflow_run.event == 'release')
+        env:
+            GH_TOKEN: ${{ secrets.ALLHANDS_BOT_GITHUB_PAT }}
+        steps:
+            - name: Checkout
+              uses: actions/checkout@v5
+
+            - name: Get version from release or input
+              id: get_version
+              run: |
+                  if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
+                    VERSION="${{ github.event.inputs.version }}"
+                  else
+                    # The workflow_run was triggered by a release event, so read the version
+                    # from the repo's latest release (assumed to be the triggering release)
+                    RELEASE_TAG=$(gh api repos/${{ github.repository }}/releases/latest --jq '.tag_name')
+                    VERSION="${RELEASE_TAG#v}"  # Remove 'v' prefix
+                  fi
+                  echo "version=$VERSION" >> $GITHUB_OUTPUT
+                  echo "📦 Version: $VERSION"
+
+            - name: Validate version
+              env:
+                  VERSION: ${{ steps.get_version.outputs.version }}
+              run: |
+                  if [ -z "$VERSION" ]; then
+                    echo "❌ Version is empty"
+                    exit 1
+                  fi
+                  echo "📦 Creating version bump PRs for version: $VERSION"
+
+            - name: Wait for packages to be available on PyPI
+              env:
+                  VERSION: ${{ steps.get_version.outputs.version }}
+              run: |
+                  set -euo pipefail
+
+                  PACKAGES=(
+                    openhands-sdk
+                    openhands-tools
+                    openhands-workspace
+                    openhands-agent-server
+                  )
+
+                  MAX_ATTEMPTS=60
+                  SLEEP_SECONDS=20
+
+                  echo "⏳ Waiting for packages to be available on PyPI..."
+
+                  for PKG in "${PACKAGES[@]}"; do
+                    echo "Checking $PKG==$VERSION..."
+                    ATTEMPT=1
+                    while [ $ATTEMPT -le $MAX_ATTEMPTS ]; do
+                      # Check if the package version is available on PyPI
+                      HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
+                        "https://pypi.org/pypi/$PKG/$VERSION/json")
+
+                      if [ "$HTTP_CODE" = "200" ]; then
+                        echo "✅ $PKG==$VERSION is available on PyPI"
+                        break
+                      fi
+
+                      echo "  Attempt $ATTEMPT/$MAX_ATTEMPTS: $PKG==$VERSION not yet available (HTTP $HTTP_CODE), waiting ${SLEEP_SECONDS}s..."
+                      sleep $SLEEP_SECONDS
+                      ATTEMPT=$((ATTEMPT + 1))
+                    done
+
+                    if [ $ATTEMPT -gt $MAX_ATTEMPTS ]; then
+                      echo "❌ Timeout waiting for $PKG==$VERSION to be available on PyPI"
+                      exit 1
+                    fi
+                  done
+
+                  echo "✅ All packages are available on PyPI!"
+
+            - name: Install uv
+              uses: astral-sh/setup-uv@v7
+              with:
+                  version: latest
+                  python-version: '3.12'
+
+            - name: Install Poetry
+              run: |
+                  pipx install poetry==2.2.1
+
+            # OpenHands-CLI step runs first since it's simpler and less error-prone
+            - name: Create PR for OpenHands-CLI repo
+              env:
+                  VERSION: ${{ steps.get_version.outputs.version }}
+              run: |
+                  set -euo pipefail
+
+                  REPO="OpenHands/openhands-cli"
+                  BRANCH="bump-sdk-$VERSION"
+
+                  echo "🔄 Creating PR for $REPO..."
+
+                  # Clone the repo
+                  git clone "https://x-access-token:${GH_TOKEN}@github.com/${REPO}.git" openhands-cli-repo
+                  cd openhands-cli-repo
+
+                  # Configure git
+                  git config user.name "github-actions[bot]"
+                  git config user.email "github-actions[bot]@users.noreply.github.com"
+
+                  # Check if branch already exists on remote
+                  if git ls-remote --heads origin "$BRANCH" | grep -q "$BRANCH"; then
+                    echo "⚠️ Branch $BRANCH already exists, checking out existing branch"
+                    git fetch origin "$BRANCH"
+                    git checkout "$BRANCH"
+                  else
+                    # Create branch
+                    git checkout -b "$BRANCH"
+                  fi
+
+                  # OpenHands-CLI currently requires Python 3.12, so resolve with that interpreter.
+                  uv add --python 3.12 --refresh \
+                    "openhands-sdk==$VERSION" \
+                    "openhands-tools==$VERSION"
+
+                  # Check if there are changes
+                  if git diff --quiet; then
+                    echo "⚠️ No changes detected in $REPO - versions may already be up to date"
+                    exit 0
+                  fi
+
+                  # Commit and push
+                  git add pyproject.toml uv.lock
+                  git commit -m "Bump openhands-sdk, openhands-tools to $VERSION" \
+                    -m "Automated version bump after PyPI release." \
+                    -m "Co-authored-by: openhands <openhands@all-hands.dev>"
+                  git push -u origin "$BRANCH"
+
+                  # Check if PR already exists
+                  EXISTING_PR=$(gh pr list --repo "$REPO" --head "$BRANCH" --json number --jq '.[0].number')
+                  if [ -n "$EXISTING_PR" ]; then
+                    echo "✅ PR #$EXISTING_PR already exists for $REPO"
+                  else
+                    # Create PR
+                    gh pr create \
+                      --repo "$REPO" \
+                      --title "Bump SDK packages to v$VERSION" \
+                      --body "## Automated Version Bump
+
+                  This PR updates the following packages to version **$VERSION**:
+                  - \`openhands-sdk\`
+                  - \`openhands-tools\`
+
+                  **Triggered by:** Release of [software-agent-sdk v$VERSION](https://github.com/OpenHands/software-agent-sdk/releases/tag/v$VERSION)
+
+                  ---
+                  _This PR was automatically created by the version-bump-prs workflow._" \
+                      --base main \
+                      --head "$BRANCH"
+
+                    echo "✅ PR created for $REPO"
+                  fi
+
+            - name: Create PR for OpenHands repo
+              env:
+                  VERSION: ${{ steps.get_version.outputs.version }}
+              run: |
+                  set -euo pipefail
+
+                  REPO="All-Hands-AI/OpenHands"
+                  BRANCH="bump-sdk-$VERSION"
+
+                  echo "🔄 Creating PR for $REPO..."
+
+                  # Clone the repo
+                  git clone "https://x-access-token:${GH_TOKEN}@github.com/${REPO}.git" openhands-repo
+                  cd openhands-repo
+
+                  # Configure git
+                  git config user.name "github-actions[bot]"
+                  git config user.email "github-actions[bot]@users.noreply.github.com"
+
+                  # Check if branch already exists on remote
+                  if git ls-remote --heads origin "$BRANCH" | grep -q "$BRANCH"; then
+                    echo "⚠️ Branch $BRANCH already exists, checking out existing branch"
+                    git fetch origin "$BRANCH"
+                    git checkout "$BRANCH"
+                  else
+                    # Create branch
+                    git checkout -b "$BRANCH"
+                  fi
+
+                  # 0. Pre-flight: only the root pyproject.toml and poetry.lock are updated below
+                  # (pyproject.toml via sed in step 1, poetry.lock via `poetry lock` in step 2).
+                  # Note: enterprise/pyproject.toml gets these dependencies transitively via openhands-ai
+                  echo "📝 Updating root pyproject.toml and poetry.lock..."
+
+                  # Verify enterprise/pyproject.toml does NOT have SDK packages explicitly listed
+                  # If they exist there, they will become stale since we only update root pyproject.toml
+                  if [ -f "enterprise/pyproject.toml" ]; then
+                    echo "🔍 Verifying enterprise/pyproject.toml doesn't have explicit SDK packages..."
+                    SDK_PACKAGES=("openhands-sdk" "openhands-tools" "openhands-agent-server")
+                    for pkg in "${SDK_PACKAGES[@]}"; do
+                      # Match package name as a TOML key (with optional leading whitespace) followed by =
+                      # This catches both 'openhands-sdk = "1.2.3"' and 'openhands-sdk="1.2.3"'
+                      if grep -qE "^[[:space:]]*${pkg}[[:space:]]*=" enterprise/pyproject.toml; then
+                        echo "❌ ERROR: enterprise/pyproject.toml contains explicit reference to '$pkg'"
+                        echo "   These packages should come transitively via openhands-ai dependency."
+                        echo "   Please remove '$pkg' from enterprise/pyproject.toml to avoid version drift."
+                        exit 1
+                      fi
+                    done
+                    echo "✅ enterprise/pyproject.toml does not have explicit SDK packages"
+                  fi
+
+                  # 1. Update versions in pyproject.toml using sed for exact pinning
+                  # Note: We use sed instead of `poetry add --lock` because Poetry normalizes
+                  # version constraints (e.g., "==1.13.1" becomes "1.13") which causes
+                  # inconsistencies between [tool.poetry.dependencies] and [project].dependencies
+                  echo "📝 Updating pyproject.toml with exact version pins..."
+
+                  # Update [tool.poetry.dependencies] section
+                  # Matches: openhands-sdk = "1.13" or openhands-sdk = "1.13.0"
+                  sed -i -E 's/^(openhands-sdk = )"[^"]*"/\1"'"$VERSION"'"/' pyproject.toml
+                  sed -i -E 's/^(openhands-tools = )"[^"]*"/\1"'"$VERSION"'"/' pyproject.toml
+                  sed -i -E 's/^(openhands-agent-server = )"[^"]*"/\1"'"$VERSION"'"/' pyproject.toml
+
+                  # Update [project].dependencies section (PEP 621 format)
+                  # Matches: "openhands-sdk==1.13.1", or "openhands-sdk==1.13",
+                  sed -i -E 's/"openhands-sdk==[^"]*"/"openhands-sdk=='"$VERSION"'"/' pyproject.toml
+                  sed -i -E 's/"openhands-tools==[^"]*"/"openhands-tools=='"$VERSION"'"/' pyproject.toml
+                  sed -i -E 's/"openhands-agent-server==[^"]*"/"openhands-agent-server=='"$VERSION"'"/' pyproject.toml
+
+                  echo "✅ Updated pyproject.toml"
+
+                  # 2. Regenerate poetry.lock with the new versions
+                  # Note: In Poetry 2.x, the default behavior is to not update packages already
+                  # in the lock file (the --no-update flag was removed in Poetry 2.x)
+                  echo "📝 Regenerating poetry.lock..."
+                  poetry lock
+
+                  # 3. Update the version in sandbox_spec_service.py
+                  echo "🔧 Updating AGENT_SERVER_IMAGE..."
+                  SANDBOX_SPEC_FILE="openhands/app_server/sandbox/sandbox_spec_service.py"
+                  if [ -f "$SANDBOX_SPEC_FILE" ]; then
+                    # Update the AGENT_SERVER_IMAGE line with the new version tag
+                    sed -i "s|AGENT_SERVER_IMAGE = 'ghcr.io/openhands/agent-server:[^']*'|AGENT_SERVER_IMAGE = 'ghcr.io/openhands/agent-server:${VERSION}-python'|" "$SANDBOX_SPEC_FILE"
+                    echo "✅ Updated AGENT_SERVER_IMAGE to: ghcr.io/openhands/agent-server:${VERSION}-python"
+                  else
+                    echo "❌ sandbox_spec_service.py not found at expected path"
+                    exit 1
+                  fi
+
+                  # 4. Run pre-commit to fix formatting (pyproject-fmt removes parentheses from version specs)
+                  echo "🔧 Running pre-commit to fix formatting..."
+                  pip install pre-commit
+                  pre-commit run --files pyproject.toml --config ./dev_config/python/.pre-commit-config.yaml || true
+
+                  # Check if there are changes
+                  if git diff --quiet; then
+                    echo "⚠️ No changes detected in $REPO - versions may already be up to date"
+                    exit 0
+                  fi
+
+                  # Commit and push
+                  git add .
+                  git commit -m "Bump openhands-sdk, openhands-tools, openhands-agent-server to $VERSION" \
+                    -m "Automated version bump after PyPI release." \
+                    -m "" \
+                    -m "Changes:" \
+                    -m "- Updated SDK packages to v$VERSION in pyproject.toml" \
+                    -m "- Regenerated poetry.lock" \
+                    -m "- Updated AGENT_SERVER_IMAGE to ${VERSION}" \
+                    -m "" \
+                    -m "Co-authored-by: openhands <openhands@all-hands.dev>"
+                  git push -u origin "$BRANCH"
+
+                  # Check if PR already exists
+                  EXISTING_PR=$(gh pr list --repo "$REPO" --head "$BRANCH" --json number --jq '.[0].number')
+                  if [ -n "$EXISTING_PR" ]; then
+                    echo "✅ PR #$EXISTING_PR already exists for $REPO"
+                  else
+                    # Create PR
+                    gh pr create \
+                      --repo "$REPO" \
+                      --title "Bump SDK packages to v$VERSION" \
+                      --body "## Automated Version Bump
+
+                  This PR updates the following packages to version **$VERSION**:
+                  - \`openhands-sdk\`
+                  - \`openhands-tools\`
+                  - \`openhands-agent-server\`
+
+                  ### Changes
+                  - Updated SDK packages in \`pyproject.toml\`
+                  - Regenerated \`poetry.lock\`
+                  - Updated \`AGENT_SERVER_IMAGE\` to \`${VERSION}\` in \`sandbox_spec_service.py\`
+
+                  **Triggered by:** Release of [software-agent-sdk v$VERSION](https://github.com/OpenHands/software-agent-sdk/releases/tag/v$VERSION)
+
+                  ---
+                  _This PR was automatically created by the version-bump-prs workflow._" \
+                      --base main \
+                      --head "$BRANCH"
+
+                    echo "✅ PR created for $REPO"
+                  fi
+
+            - name: Summary
+              env:
+                  VERSION: ${{ steps.get_version.outputs.version }}
+              run: |
+                  echo "## ✅ Version Bump PRs Created" >> $GITHUB_STEP_SUMMARY
+                  echo "" >> $GITHUB_STEP_SUMMARY
+                  echo "PRs have been created to bump SDK packages to version **$VERSION**:" >> $GITHUB_STEP_SUMMARY
+                  echo "" >> $GITHUB_STEP_SUMMARY
+                  echo "- [OpenHands](https://github.com/All-Hands-AI/OpenHands/pulls?q=is%3Apr+bump-sdk-$VERSION)" >> $GITHUB_STEP_SUMMARY
+                  echo "- [OpenHands-CLI](https://github.com/OpenHands/openhands-cli/pulls?q=is%3Apr+bump-sdk-$VERSION)" >> $GITHUB_STEP_SUMMARY
+
+            - name: Notify Slack
+              uses: slackapi/slack-github-action@v2.1.1
+              with:
+                  method: chat.postMessage
+                  token: ${{ secrets.SLACK_BOT_TOKEN }}
+                  payload: |
+                      channel: C08E1SYKEM9
+                      text: "🚀 *SDK v${{ steps.get_version.outputs.version }} published to PyPI!*\n\nVersion bump PRs created:\n• <https://github.com/All-Hands-AI/OpenHands/pulls?q=is%3Apr+bump-sdk-${{ steps.get_version.outputs.version }}|OpenHands>\n• <https://github.com/OpenHands/openhands-cli/pulls?q=is%3Apr+bump-sdk-${{ steps.get_version.outputs.version }}|OpenHands-CLI>\n\n<https://github.com/OpenHands/software-agent-sdk/releases/tag/v${{ steps.get_version.outputs.version }}|View Release>"
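The exact-pin step in the OpenHands job above can be tried outside CI. The following is a minimal sketch, not the workflow's own code: the helper name `pin_exact` and the sample `pyproject.toml` contents are hypothetical, and it writes through a temporary file instead of `sed -i` so that it should behave the same with GNU and BSD sed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: pin one package to an exact version in both the
# [tool.poetry.dependencies] table and the PEP 621 [project].dependencies list.
pin_exact() {
    local pkg="$1" version="$2" file="$3" tmp
    tmp="$(mktemp)"
    sed -E \
        -e 's/^('"$pkg"' = )"[^"]*"/\1"'"$version"'"/' \
        -e 's/"'"$pkg"'==[^"]*"/"'"$pkg"'=='"$version"'"/' \
        "$file" > "$tmp"
    mv "$tmp" "$file"
}

# Sample file, made up for illustration only.
DEMO_FILE="$(mktemp)"
cat > "$DEMO_FILE" <<'EOF'
[project]
dependencies = [
  "openhands-sdk==1.13",
  "openhands-tools==1.13.0",
]

[tool.poetry.dependencies]
openhands-sdk = "1.13"
openhands-tools = "1.13.0"
EOF

# Rewrite both packages to the same exact version.
for pkg in openhands-sdk openhands-tools; do
    pin_exact "$pkg" "1.14.0" "$DEMO_FILE"
done

cat "$DEMO_FILE"
```

Editing the file textually keeps the `==` pins identical in both tables; the lock file would still need to be regenerated afterwards, as the workflow does with `poetry lock`.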
@@ -1,51 +0,0 @@
-name: Welcome Good First Issue
-
-on:
-  issues:
-    types: [labeled]
-
-permissions:
-  issues: write
-
-jobs:
-  comment-on-good-first-issue:
-    if: github.event.label.name == 'good first issue'
-    runs-on: ubuntu-latest
-    steps:
-      - name: Check if welcome comment already exists
-        id: check_comment
-        uses: actions/github-script@v7
-        with:
-          result-encoding: string
-          script: |
-            const issueNumber = context.issue.number;
-            const comments = await github.rest.issues.listComments({
-              ...context.repo,
-              issue_number: issueNumber
-            });
-
-            const alreadyCommented = comments.data.some(
-              (comment) =>
-                comment.body.includes('<!-- auto-comment:good-first-issue -->')
-            );
-
-            return alreadyCommented ? 'true' : 'false';
-
-      - name: Leave welcome comment
-        if: steps.check_comment.outputs.result == 'false'
-        uses: actions/github-script@v7
-        with:
-          script: |
-            const repoUrl = `https://github.com/${context.repo.owner}/${context.repo.repo}`;
-
-            await github.rest.issues.createComment({
-              ...context.repo,
-              issue_number: context.issue.number,
-              body: "🙌 **Hey there, future contributor!** 🙌\n\n" +
-                    "This issue has been labeled as **good first issue**, which means it's a great place to get started with the OpenHands project.\n\n" +
-                    "If you're interested in working on it, feel free to! No need to ask for permission.\n\n" +
-                    "Be sure to check out our [development setup guide](" + repoUrl + "/blob/main/Development.md) to get your environment set up, and follow our [contribution guidelines](" + repoUrl + "/blob/main/CONTRIBUTING.md) when you're ready to submit a fix.\n\n" +
-                    "Feel free to join our developer community on [Slack](https://openhands.dev/joinslack). You can ask for [help](https://openhands-ai.slack.com/archives/C078L0FUGUX), [feedback](https://openhands-ai.slack.com/archives/C086ARSNMGA), and even ask for a [PR review](https://openhands-ai.slack.com/archives/C08D8FJ5771).\n\n" +
-                    "🙌 Happy hacking! 🙌\n\n" +
-                    "<!-- auto-comment:good-first-issue -->"
-            });
@@ -14,7 +14,7 @@ dist/
 downloads/
 eggs/
 .eggs/
-./lib/
+lib/
 lib64/
 parts/
 sdist/
@@ -31,7 +31,6 @@ requirements.txt
 #  Usually these files are written by a python script from a template
 #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 *.manifest
-*.spec

 # Installer logs
 pip-log.txt
@@ -57,6 +56,7 @@ cover/
 *.pot

 # Django stuff:
+*.log
 local_settings.py
 db.sqlite3
 db.sqlite3-journal
@@ -85,7 +85,6 @@ ipython_config.py
 # pyenv
 #   For a library or package, you might want to ignore these files since the code is
 #   intended to run in multiple environments; otherwise, check them in:
-.python-version

 # pipenv
 #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
@@ -121,7 +120,6 @@ celerybeat.pid

 # Environments
 .env
-frontend/.env
 .venv
 env/
 venv/
@@ -129,7 +127,6 @@ ENV/
 env.bak/
 .env.bak
 venv.bak/
-*venv/

 # Spyder project settings
 .spyderproject
@@ -166,7 +163,6 @@ cython_debug/
 # https://stackoverflow.com/questions/32964920/should-i-commit-the-vscode-folder-to-source-control
 .vscode/**/*
 !.vscode/extensions.json
-!.vscode/settings.json
 !.vscode/tasks.json

 # VS Code extensions/forks:
@@ -185,42 +181,6 @@ cython_debug/
 .repomix
 repomix-output.txt

-# Emacs backup
-*~
-
-# evaluation
-evaluation/evaluation_outputs
-evaluation/outputs
-evaluation/swe_bench/eval_workspace*
-evaluation/SWE-bench/data
-evaluation/webarena/scripts/webarena_env.sh
-evaluation/bird/data
-evaluation/gaia/data
-evaluation/gorilla/data
-evaluation/toolqa/data
-evaluation/scienceagentbench/benchmark
-evaluation/commit0_bench/repos
-
-# openhands resolver
-output/
-
-# frontend
-
-# dependencies
-frontend/.pnp
-frontend/bun.lockb
-frontend/yarn.lock
-.pnp.js
-
-# testing
-frontend/coverage
-test_results*
-/_test_files_tmp/
-
-# production
-frontend/build
-frontend/dist
-
 # misc
 .DS_Store
 .env.local
@@ -236,29 +196,22 @@ logs

 # agent
 .envrc
-/workspace
-/_test_workspace
-/debug
 cache
+.jinja_cache/

-# configuration
-config.toml
-config.toml_
-config.toml.bak
+.conversations*
+/workspace/
+openapi.json
+.client/

-# swe-bench-eval
-image_build_logs
-run_instance_logs
+# Local workspace files
+.beads/*.db
+.worktrees/
+agent-sdk.workspace.code-workspace

-runtime_*.tar
+# Integration test outputs
+tests/integration/outputs/
+tests/integration/api_compliance/outputs/

-# docker build
-containers/runtime/Dockerfile
-containers/runtime/project.tar.gz
-containers/runtime/code
-**/node_modules/
-
-# test results
-test-results
-.sessions
-.eval_sessions
+# Agent-generated temp
+.agent_tmp/
@@ -1 +0,0 @@
-22
@@ -0,0 +1,14 @@
+{
+  "stop": [
+    {
+      "matcher": "*",
+      "hooks": [
+        {
+          "type": "command",
+          "command": ".openhands/hooks/on_stop.sh",
+          "timeout": 600
+        }
+      ]
+    }
+  ]
+}
@@ -0,0 +1,303 @@
+#!/bin/bash
+# Stop hook: runs pre-commit, pytest, and checks CI status before allowing agent to finish
+#
+# This hook runs when the agent attempts to stop/finish.
+# It can BLOCK the stop by:
+#   - Exiting with code 2 (blocked)
+#   - Outputting JSON: {"decision": "deny", "additionalContext": "feedback message"}
+#
+# Environment variables available:
+#   OPENHANDS_PROJECT_DIR - Project directory
+#   OPENHANDS_SESSION_ID - Session ID
+#   GITHUB_TOKEN - GitHub API token (if available)
+
+set -o pipefail
+
+PROJECT_DIR="${OPENHANDS_PROJECT_DIR:-$(pwd)}"
+cd "$PROJECT_DIR" || exit 1
+
+# Collect all issues to report back to the agent
+ISSUES=""
+BLOCK_STOP=false
+
+log_issue() {
+    ISSUES="${ISSUES}${1}\n"
+    BLOCK_STOP=true
+}
+
+>&2 echo "=== Stop Hook ==="
+>&2 echo "Project directory: $PROJECT_DIR"
+>&2 echo ""
+
+# --------------------------
+# Step 1: Run pre-commit on all files
+# --------------------------
+>&2 echo "=== Running pre-commit run --all-files ==="
+if command -v uv &> /dev/null; then
+    PRECOMMIT_OUTPUT=$(uv run pre-commit run --all-files 2>&1)
+    PRECOMMIT_EXIT=$?
+else
+    PRECOMMIT_OUTPUT=$(pre-commit run --all-files 2>&1)
+    PRECOMMIT_EXIT=$?
+fi
+
+>&2 echo "$PRECOMMIT_OUTPUT"
+
+if [ $PRECOMMIT_EXIT -ne 0 ]; then
+    >&2 echo "⚠️  pre-commit found issues (exit code: $PRECOMMIT_EXIT)"
+    log_issue "## Pre-commit Failed\n\nPre-commit checks failed. Please fix the following issues:\n\n\`\`\`\n${PRECOMMIT_OUTPUT}\n\`\`\`"
+else
+    >&2 echo "✓ pre-commit passed"
+fi
+>&2 echo ""
+
+# --------------------------
+# Step 2: Detect changed files and run appropriate tests
+# --------------------------
+>&2 echo "=== Detecting changed files and running appropriate tests ==="
+
+# Get changed files from git (staged, unstaged, and untracked)
+CHANGED_FILES=$(git status --porcelain 2>/dev/null | awk '{print $NF}')
+
+if [ -n "$CHANGED_FILES" ]; then
+    >&2 echo "Changed files:"
+    >&2 echo "$CHANGED_FILES" | head -20
+    >&2 echo ""
+
+    # Map changed files to test directories
+    PROJECTS_TO_TEST=""
+
+    add_project() {
+        local project="$1"
+        if [[ ! "$PROJECTS_TO_TEST" =~ "$project" ]]; then
+            PROJECTS_TO_TEST="$PROJECTS_TO_TEST $project"
+        fi
+    }
+
+    while IFS= read -r file; do
+        case "$file" in
+            openhands-sdk/*) add_project "tests/sdk" ;;
+            openhands-tools/*) add_project "tests/tools" ;;
+            openhands-workspace/*) add_project "tests/workspace" ;;
+            openhands-agent-server/*) add_project "tests/agent_server" ;;
+            tests/sdk/*) add_project "tests/sdk" ;;
+            tests/tools/*) add_project "tests/tools" ;;
+            tests/workspace/*) add_project "tests/workspace" ;;
+            tests/agent_server/*) add_project "tests/agent_server" ;;
+            tests/cross/*) add_project "tests/cross" ;;
+            tests/examples/*) add_project "tests/examples" ;;
+            tests/github_workflows/*) add_project "tests/github_workflows" ;;
+            examples/*) add_project "tests/examples" ;;
+            scripts/*) add_project "tests/cross" ;;
+            pyproject.toml|uv.lock) add_project "tests/cross" ;;
+        esac
+    done <<< "$CHANGED_FILES"
+
+    PROJECTS_TO_TEST=$(echo "$PROJECTS_TO_TEST" | xargs)
+
+    if [ -n "$PROJECTS_TO_TEST" ]; then
+        >&2 echo "Running tests for: $PROJECTS_TO_TEST"
+        >&2 echo ""
+
+        for project in $PROJECTS_TO_TEST; do
+            if [ -d "$project" ]; then
+                >&2 echo "=== Testing $project ==="
+                if command -v uv &> /dev/null; then
+                    PYTEST_OUTPUT=$(uv run pytest "$project" -v --tb=short -x 2>&1)
+                    PYTEST_EXIT=$?
+                else
+                    PYTEST_OUTPUT=$(pytest "$project" -v --tb=short -x 2>&1)
+                    PYTEST_EXIT=$?
+                fi
+                >&2 echo "$PYTEST_OUTPUT"
+
+                if [ $PYTEST_EXIT -ne 0 ]; then
+                    >&2 echo "⚠️  pytest failed for $project"
+                    log_issue "## Pytest Failed for $project\n\nTests failed. Please fix the following:\n\n\`\`\`\n${PYTEST_OUTPUT}\n\`\`\`"
+                fi
+                >&2 echo ""
+            fi
+        done
+    else
+        >&2 echo "No tests to run for changed files"
+    fi
+else
+    >&2 echo "No changed files detected, skipping local tests"
+fi
+>&2 echo ""
+
+# --------------------------
+# Step 3: Check if there's a pushed commit and wait for CI
+# --------------------------
+>&2 echo "=== Checking GitHub CI status ==="
+
+# Check if we're in a git repo with a GitHub remote
+GITHUB_REMOTE=$(git remote -v 2>/dev/null | grep -E "(github\.com.*push)" | head -1)
+if [ -z "$GITHUB_REMOTE" ]; then
+    >&2 echo "No GitHub remote found, skipping CI check"
+else
+    # Extract owner/repo from remote URL
+    # Handle both HTTPS and SSH formats
+    REPO_INFO=$(echo "$GITHUB_REMOTE" | sed -E 's|.*github\.com[:/]([^/]+)/([^/.]+)(\.git)?.*|\1/\2|')
+    
+    if [ -z "$REPO_INFO" ]; then
+        >&2 echo "Could not parse GitHub repository info"
+    else
+        >&2 echo "Repository: $REPO_INFO"
+        
+        # Get current branch
+        CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)
+        >&2 echo "Current branch: $CURRENT_BRANCH"
+        
+        # Get the latest commit SHA
+        LOCAL_SHA=$(git rev-parse HEAD 2>/dev/null)
+        >&2 echo "Local commit: ${LOCAL_SHA:0:8}"
+        
+        # Check if this commit has been pushed
+        REMOTE_SHA=$(git ls-remote origin "$CURRENT_BRANCH" 2>/dev/null | awk '{print $1}')
+        
+        if [ -z "$REMOTE_SHA" ]; then
+            >&2 echo "Branch not pushed to remote, skipping CI check"
+        elif [ "$LOCAL_SHA" != "$REMOTE_SHA" ]; then
+            >&2 echo "Local commit differs from remote (remote: ${REMOTE_SHA:0:8}), skipping CI check"
+        else
+            >&2 echo "Commit has been pushed, checking CI status..."
+            
+            # Check if GITHUB_TOKEN is available
+            if [ -z "$GITHUB_TOKEN" ]; then
+                >&2 echo "GITHUB_TOKEN not set, cannot check CI status"
+            else
+                # Use gh CLI if available, otherwise fall back to API
+                if command -v gh &> /dev/null; then
+                    >&2 echo "Using gh CLI to check CI status..."
+                    
+                    # Get check runs for this commit
+                    CI_STATUS=$(gh api "repos/$REPO_INFO/commits/$LOCAL_SHA/check-runs" \
+                        --jq '.check_runs | map({name: .name, status: .status, conclusion: .conclusion})' 2>&1)
+                    
+                    if [ $? -ne 0 ]; then
+                        >&2 echo "Failed to get CI status: $CI_STATUS"
+                    else
+                        # Parse the status
+                        TOTAL_CHECKS=$(echo "$CI_STATUS" | jq 'length')
+                        
+                        if [ "$TOTAL_CHECKS" -eq 0 ]; then
+                            >&2 echo "No CI checks found for this commit"
+                        else
+                            >&2 echo "Found $TOTAL_CHECKS CI check(s)"
+                            
+                            # Check for in-progress runs
+                            IN_PROGRESS=$(echo "$CI_STATUS" | jq '[.[] | select(.status != "completed")] | length')
+                            FAILED=$(echo "$CI_STATUS" | jq '[.[] | select(.conclusion == "failure" or .conclusion == "timed_out" or .conclusion == "cancelled")] | length')
+                            
+                            if [ "$IN_PROGRESS" -gt 0 ]; then
+                                >&2 echo "⏳ $IN_PROGRESS check(s) still in progress"
+                                
+                                # Wait for CI to complete (with timeout)
+                                MAX_WAIT=300  # 5 minutes
+                                WAIT_INTERVAL=15
+                                TOTAL_WAITED=0
+                                
+                                while [ "$IN_PROGRESS" -gt 0 ] && [ "$TOTAL_WAITED" -lt "$MAX_WAIT" ]; do
+                                    >&2 echo "Waiting for CI... (${TOTAL_WAITED}s / ${MAX_WAIT}s max)"
+                                    sleep $WAIT_INTERVAL
+                                    TOTAL_WAITED=$((TOTAL_WAITED + WAIT_INTERVAL))
+                                    
+                                    CI_STATUS=$(gh api "repos/$REPO_INFO/commits/$LOCAL_SHA/check-runs" \
+                                        --jq '.check_runs | map({name: .name, status: .status, conclusion: .conclusion})' 2>&1)
+                                    IN_PROGRESS=$(echo "$CI_STATUS" | jq '[.[] | select(.status != "completed")] | length')
+                                done
+                                
+                                if [ "$IN_PROGRESS" -gt 0 ]; then
+                                    >&2 echo "⚠️  CI still running after ${MAX_WAIT}s timeout"
+                                    log_issue "## CI Still Running\n\nCI checks are still in progress after waiting ${MAX_WAIT} seconds. Please wait for CI to complete before finishing."
+                                fi
+                            fi
+                            
+                            # Re-check for failures after waiting
+                            FAILED=$(echo "$CI_STATUS" | jq '[.[] | select(.conclusion == "failure" or .conclusion == "timed_out" or .conclusion == "cancelled")] | length')
+                            
+                            if [ "$FAILED" -gt 0 ]; then
+                                >&2 echo "❌ $FAILED check(s) failed!"
+                                
+                                # Get details of failed checks
+                                FAILED_DETAILS=$(echo "$CI_STATUS" | jq -r '.[] | select(.conclusion == "failure" or .conclusion == "timed_out" or .conclusion == "cancelled") | "- \(.name): \(.conclusion)"')
+                                >&2 echo "$FAILED_DETAILS"
+                                
+                                # Try to get failure logs
+                                FAILED_NAMES=$(echo "$CI_STATUS" | jq -r '.[] | select(.conclusion == "failure") | .name')
+                                
+                                FAILURE_MSG="## CI Failed\n\nThe following CI checks failed:\n\n${FAILED_DETAILS}\n"
+                                
+                                # Try to get the workflow run logs for more context
+                                WORKFLOW_RUNS=$(gh api "repos/$REPO_INFO/actions/runs?head_sha=$LOCAL_SHA" \
+                                    --jq '.workflow_runs[] | select(.conclusion == "failure") | {id: .id, name: .name}' 2>/dev/null)
+                                
+                                if [ -n "$WORKFLOW_RUNS" ]; then
+                                    FAILURE_MSG="${FAILURE_MSG}\nYou can view the full logs at: https://github.com/$REPO_INFO/actions\n"
+                                    
+                                    # Try to get job logs
+                                    FIRST_RUN_ID=$(echo "$WORKFLOW_RUNS" | jq -r '.id' | head -1)
+                                    if [ -n "$FIRST_RUN_ID" ]; then
+                                        JOBS_OUTPUT=$(gh api "repos/$REPO_INFO/actions/runs/$FIRST_RUN_ID/jobs" \
+                                            --jq '.jobs[] | select(.conclusion == "failure") | "### \(.name)\nConclusion: \(.conclusion)\nSteps:\n" + (.steps | map("- \(.name): \(.conclusion)") | join("\n"))' 2>/dev/null | head -100)
+                                        if [ -n "$JOBS_OUTPUT" ]; then
+                                            FAILURE_MSG="${FAILURE_MSG}\n### Failed Job Details:\n\`\`\`\n${JOBS_OUTPUT}\n\`\`\`"
+                                        fi
+                                    fi
+                                fi
+                                
+                                log_issue "$FAILURE_MSG"
+                            else
+                                >&2 echo "✓ All CI checks passed!"
+                            fi
+                        fi
+                    fi
+                else
+                    # Fallback to curl
+                    >&2 echo "gh CLI not available, using API directly..."
+                    CI_RESPONSE=$(curl -s -H "Authorization: token $GITHUB_TOKEN" \
+                        -H "Accept: application/vnd.github.v3+json" \
+                        "https://api.github.com/repos/$REPO_INFO/commits/$LOCAL_SHA/check-runs" 2>&1)
+                    
+                    TOTAL_CHECKS=$(echo "$CI_RESPONSE" | jq '.total_count // 0')
+                    
+                    if [ "$TOTAL_CHECKS" -gt 0 ]; then
+                        IN_PROGRESS=$(echo "$CI_RESPONSE" | jq '[.check_runs[] | select(.status != "completed")] | length')
+                        FAILED=$(echo "$CI_RESPONSE" | jq '[.check_runs[] | select(.conclusion == "failure" or .conclusion == "timed_out" or .conclusion == "cancelled")] | length')
+                        
+                        if [ "$IN_PROGRESS" -gt 0 ]; then
+                            >&2 echo "⏳ CI checks still in progress"
+                            log_issue "## CI In Progress\n\nCI checks are still running. Please wait for CI to complete."
+                        elif [ "$FAILED" -gt 0 ]; then
+                            FAILED_NAMES=$(echo "$CI_RESPONSE" | jq -r '.check_runs[] | select(.conclusion == "failure" or .conclusion == "timed_out" or .conclusion == "cancelled") | .name')
+                            >&2 echo "❌ CI failed: $FAILED_NAMES"
+                            log_issue "## CI Failed\n\nThe following CI checks failed:\n${FAILED_NAMES}\n\nPlease fix the issues and try again."
+                        else
+                            >&2 echo "✓ All CI checks passed!"
+                        fi
+                    else
+                        >&2 echo "No CI checks found"
+                    fi
+                fi
+            fi
+        fi
+    fi
+fi
+>&2 echo ""
+
+# --------------------------
+# Final decision
+# --------------------------
+if [ "$BLOCK_STOP" = true ]; then
+    >&2 echo "=== BLOCKING STOP: Issues found ==="
+    # Output JSON to provide feedback to the agent
+    # Escape the issues for JSON
+    ESCAPED_ISSUES=$(echo -e "$ISSUES" | jq -Rs .)
+    echo "{\"decision\": \"deny\", \"reason\": \"Checks failed\", \"additionalContext\": $ESCAPED_ISSUES}"
+    exit 2
+fi
+
+>&2 echo "=== All checks passed, allowing stop ==="
+echo '{"decision": "allow"}'
+exit 0
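
The remote-URL parsing in Step 3 can be exercised on its own. The sketch below wraps a similar `sed` expression in a hypothetical helper, `parse_github_repo` (not part of the hook), and assumes its input looks like one line of `git remote -v` output. The repository character class also excludes whitespace, so a URL without a `.git` suffix does not pick up the trailing ` (push)` marker.

```bash
# Hypothetical helper (not part of the hook): extract "owner/repo" from one
# line of `git remote -v` output, for HTTPS or SSH GitHub remotes.
parse_github_repo() {
    printf '%s\n' "$1" \
        | sed -E 's|.*github\.com[:/]([^/]+)/([^/.[:space:]]+)(\.git)?.*|\1/\2|'
}
```

One limitation of this pattern is that a repository name containing a dot is cut at the first dot, so such remotes would need a different expression.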
@@ -1,33 +0,0 @@
---
-name: documentation
-type: knowledge
-version: 1.0.0
-agent: CodeActAgent
-triggers:
- documentation
- docs
- document
---
-
-# Documentation Guidelines
-
-All documentation must be grounded in fact, so you must not make anything up without proper evidence. When you have finished writing documentation, convey to the user what reference source, including web pages, source code, or other sources of documentation you referenced when writing each new fact in the documentation. If you cannot reference a source for anything do not include it in the pull request.
-
-## Best Practices for Documentation
-
-1. **Be Factual**: Only include information that can be verified from reliable sources.
-2. **Cite Sources**: Always reference the source of information (code, web pages, official documentation).
-3. **Be Clear and Concise**: Use simple language and avoid unnecessary jargon.
-4. **Use Examples**: Include practical examples to illustrate concepts.
-5. **Structure Properly**: Use headings, lists, and code blocks to organize information.
-6. **Keep Updated**: Ensure documentation reflects the current state of the code or system.
-
-## Documentation Process
-
-1. Research and gather information from reliable sources
-2. Draft documentation based on verified facts
-3. Review for accuracy and completeness
-4. Include references for all factual statements
-5. Submit only when all information is properly sourced
-
-Remember: If you cannot verify a piece of information, it's better to exclude it than to include potentially incorrect information.
@@ -1,172 +0,0 @@
-# OpenHands Glossary
-
-### Agent
-The core AI entity in OpenHands that can perform software development tasks by interacting with tools, browsing the web, and modifying code.
-
-#### Agent Controller
-A component that manages the agent's lifecycle, handles its state, and coordinates interactions between the agent and various tools.
-
-#### Agent Delegation
-The ability of an agent to hand off specific tasks to other specialized agents for better task completion.
-
-#### Agent Hub
-A central registry of different agent types and their capabilities, allowing for easy agent selection and instantiation.
-
-#### Agent Skill
-A specific capability or function that an agent can perform, such as file manipulation, web browsing, or code editing.
-
-#### Agent State
-The current context and status of an agent, including its memory, active tools, and ongoing tasks.
-
-#### CodeAct Agent
-[A generalist agent in OpenHands](https://arxiv.org/abs/2407.16741) designed to perform tasks by editing and executing code.
-
-### Browser
-A system for web-based interactions and tasks.
-
-#### Browser Gym
-A testing and evaluation environment for browser-based agent interactions and tasks.
-
-#### Web Browser Tool
-A tool that enables agents to interact with web pages and perform web-based tasks.
-
-### Commands
-Terminal and execution related functionality.
-
-#### Bash Session
-A persistent terminal session that maintains state and history for bash command execution.
-This uses tmux under the hood.
-
-### Configuration
-System-wide settings and options.
-
-#### Agent Configuration
-Settings that define an agent's behavior, capabilities, and limitations, including available tools and runtime settings.
-
-#### Configuration Options
-Settings that control various aspects of OpenHands behavior, including runtime, security, and agent settings.
-
-#### LLM Config
-Configuration settings for language models used by agents, including model selection and parameters.
-
-#### LLM Draft Config
-Settings for draft mode operations with language models, typically used for faster, lower-quality responses.
-
-#### Runtime Configuration
-Settings that define how the runtime environment should be set up and operated.
-
-#### Security Options
-Configuration settings that control security features and restrictions.
-
-### Conversation
-A sequence of interactions between a user and an agent, including messages, actions, and their results.
-
-#### Conversation Info
-Metadata about a conversation, including its status, participants, and timeline.
-
-#### Conversation Manager
-A component that handles the creation, storage, and retrieval of conversations.
-
-#### Conversation Metadata
-Additional information about conversations, such as tags, timestamps, and related resources.
-
-#### Conversation Status
-The current state of a conversation, including whether it's active, completed, or failed.
-
-#### Conversation Store
-A storage system for maintaining conversation history and related data.
-
-### Events
-
-#### Event
-Every Conversation comprises a series of Events. Each Event is either an Action or an Observation.
-
-#### Event Stream
-A continuous flow of events that represents the ongoing activities and interactions in the system.
-
-#### Action
-A specific operation or command that an agent executes through available tools, such as running a command or editing a file.
-
-#### Observation
-The response or result returned by a tool after an agent's action, providing feedback about the action's outcome.
-
-### Interface
-Different ways to interact with OpenHands.
-
-#### CLI Mode
-A command-line interface mode for interacting with OpenHands agents without a graphical interface.
-
-#### GUI Mode
-A graphical user interface mode for interacting with OpenHands agents through a web interface.
-
-#### Headless Mode
-A mode of operation where OpenHands runs without a user interface, suitable for automation and scripting.
-
-### Agent Memory
-The system that decides which parts of the Event Stream (i.e. the conversation history) should be passed into each LLM prompt.
-
-#### Memory Store
-A storage system for maintaining agent memory and context across sessions.
-
-#### Condenser
-A component that processes and summarizes conversation history to maintain context while staying within token limits.
-
-#### Truncation
-A very simple Condenser strategy. Reduces conversation history or content to stay within token limits.
-
-### Microagent
-A specialized prompt that enhances OpenHands with domain-specific knowledge, repository-specific context, and task-specific workflows.
-
-#### Microagent Registry
-A central repository of available microagents and their configurations.
-
-#### Public Microagent
-A general-purpose microagent available to all OpenHands users, triggered by specific keywords. Located in `microagents/`.
-
-#### Repository Microagent
-A type of microagent that provides repository-specific context and guidelines, stored in the `.openhands/microagents/` directory.
-
-### Prompt
-Components for managing and processing prompts.
-
-#### Prompt Caching
-A system for caching and reusing common prompts to improve performance.
-
-#### Prompt Manager
-A component that handles the loading, processing, and management of prompts used by agents, including microagents.
-
-#### Response Parsing
-The process of interpreting and structuring responses from language models and tools.
-
-### Runtime
-The execution environment where agents perform their tasks, which can be local, remote, or containerized.
-
-#### Action Execution Server
-A REST API that receives agent actions (e.g. bash commands, python code, browsing actions), executes them in the runtime environment, and returns the results.
-
-#### Action Execution Client
-A component that handles the execution of actions in the runtime environment, managing the communication between the agent and the runtime.
-
-#### Docker Runtime
-A containerized runtime environment that provides isolation and reproducibility for agent operations.
-
-#### E2B Runtime
-A specialized runtime environment built on E2B for secure and isolated code execution.
-
-#### Local Runtime
-A runtime environment that executes on the local machine, suitable for development and testing.
-
-#### Modal Runtime
-A runtime environment built on Modal for scalable and distributed agent operations.
-
-#### Remote Runtime
-A sandboxed environment that executes code and commands remotely, providing isolation and security for agent operations.
-
-#### Runtime Builder
-A component that builds a Docker image for the Action Execution Server based on a user-specified base image.
-
-### Security
-Security-related components and features.
-
-#### Security Analyzer
-A component that checks agent actions for potential security risks.
@@ -1,124 +0,0 @@
-#!/bin/bash
-
-echo "Running OpenHands pre-commit hook..."
-echo "This hook runs selective linting based on changed files."
-
-# Store the exit code to return at the end
-# This allows us to be additive to existing pre-commit hooks
-EXIT_CODE=0
-
-# Get the list of staged files
-STAGED_FILES=$(git diff --cached --name-only)
-
-# Check if any files match specific patterns
-has_frontend_changes=false
-has_backend_changes=false
-
-# Check each file individually to avoid issues with grep
-for file in $STAGED_FILES; do
-    if [[ $file == frontend/* ]]; then
-        has_frontend_changes=true
-    elif [[ $file == openhands/* || $file == evaluation/* || $file == tests/* ]]; then
-        has_backend_changes=true
-    fi
-done
-
-echo "Analyzing changes..."
-echo "- Frontend changes: $has_frontend_changes"
-echo "- Backend changes: $has_backend_changes"
-
-# Run frontend linting if needed
-if [ "$has_frontend_changes" = true ]; then
-    # Check if we're in a CI environment or if frontend dependencies are missing
-    if [ -n "$CI" ] || ! command -v react-router &> /dev/null || ! command -v vitest &> /dev/null; then
-        echo "Skipping frontend checks (CI environment or missing dependencies detected)."
-        echo "WARNING: Frontend files have changed but frontend checks are being skipped."
-        echo "Please run 'make lint-frontend' manually before submitting your PR."
-    else
-        echo "Running frontend linting..."
-        make lint-frontend
-        if [ $? -ne 0 ]; then
-            echo "Frontend linting failed. Please fix the issues before committing."
-            EXIT_CODE=1
-        else
-            echo "Frontend linting checks passed!"
-        fi
-
-        # Run additional frontend checks
-        if [ -d "frontend" ]; then
-            echo "Running additional frontend checks..."
-            cd frontend || exit 1
-
-            # Run build
-            echo "Running npm build..."
-            npm run build
-            if [ $? -ne 0 ]; then
-                echo "Frontend build failed. Please fix the issues before committing."
-                EXIT_CODE=1
-            fi
-
-            # Run tests
-            echo "Running npm test..."
-            npm test
-            if [ $? -ne 0 ]; then
-                echo "Frontend tests failed. Please fix the failing tests before committing."
-                EXIT_CODE=1
-            fi
-
-            cd ..
-        fi
-    fi
-else
-    echo "Skipping frontend checks (no frontend changes detected)."
-fi
-
-# Run backend linting if needed
-if [ "$has_backend_changes" = true ]; then
-    echo "Running backend linting..."
-    make lint-backend
-    if [ $? -ne 0 ]; then
-        echo "Backend linting failed. Please fix the issues before committing."
-        EXIT_CODE=1
-    else
-        echo "Backend linting checks passed!"
-    fi
-else
-    echo "Skipping backend checks (no backend changes detected)."
-fi
-
-
-# If no specific code changes detected, run basic checks
-if [ "$has_frontend_changes" = false ] && [ "$has_backend_changes" = false ]; then
-    echo "No specific code changes detected. Running basic checks..."
-    if [ -n "$STAGED_FILES" ]; then
-        # Run only basic pre-commit hooks for non-code files
-        poetry run pre-commit run --files $(echo "$STAGED_FILES" | tr '\n' ' ') --hook-stage commit --config ./dev_config/python/.pre-commit-config.yaml
-        if [ $? -ne 0 ]; then
-            echo "Basic checks failed. Please fix the issues before committing."
-            EXIT_CODE=1
-        else
-            echo "Basic checks passed!"
-        fi
-    else
-        echo "No files changed. Skipping basic checks."
-    fi
-fi
-
-# Run any existing pre-commit hooks that might have been installed by the user
-# This makes our hook additive rather than replacing existing hooks
-if [ -f ".git/hooks/pre-commit.local" ]; then
-    echo "Running existing pre-commit hooks..."
-    bash .git/hooks/pre-commit.local
-    if [ $? -ne 0 ]; then
-        echo "Existing pre-commit hooks failed."
-        EXIT_CODE=1
-    fi
-fi
-
-if [ $EXIT_CODE -eq 0 ]; then
-    echo "All pre-commit checks passed!"
-else
-    echo "Some pre-commit checks failed. Please fix the issues before committing."
-fi
-
-exit $EXIT_CODE
@@ -1,13 +1,11 @@
-#! /bin/bash
+#!/bin/bash

-echo "Setting up the environment..."
-
-# Install pre-commit package
-python -m pip install pre-commit
-
-# Install pre-commit hooks if .git directory exists
-if [ -d ".git" ]; then
-    echo "Installing pre-commit hooks..."
-    pre-commit install
-    make install-pre-commit-hooks
+if ! command -v uv &> /dev/null; then
+    echo "uv is not installed. Installing..."
+    curl -LsSf https://astral.sh/uv/install.sh | sh
+else
+    echo "uv is already installed."
+    uv self update  # always update to the latest version
 fi
+
+make build
@@ -0,0 +1,56 @@
+---
+repos:
+    - repo: https://github.com/jumanjihouse/pre-commit-hook-yamlfmt
+      rev: 0.2.1 # or other specific tag
+      hooks:
+          - id: yamlfmt
+    - repo: local
+      hooks:
+          - id: ruff-format
+            name: Ruff format
+            entry: uv
+            args: [run, ruff, format]
+            language: system
+            types: [python]
+            pass_filenames: true
+            always_run: false
+          - id: ruff-check
+            name: Ruff lint
+            entry: uv
+            args: [run, ruff, check, --fix, --exit-non-zero-on-fix]
+            language: system
+            types: [python]
+            pass_filenames: true
+            always_run: false
+          - id: pycodestyle
+            name: PEP8 style check (pycodestyle)
+            entry: uv
+            args: [run, pycodestyle, --max-line-length=88, '--ignore=E203,E501,W503,E704']
+            language: system
+            types: [python]
+            pass_filenames: true
+            always_run: false
+          - id: pyright
+            name: Type check with pyright
+            entry: uv
+            args: [run, pyright]
+            language: system
+            types: [python]
+            pass_filenames: true
+            always_run: false
+          - id: check-import-rules
+            name: Check import dependency rules
+            entry: uv
+            args: [run, python, scripts/check_import_rules.py]
+            language: system
+            types: [python]
+            pass_filenames: true
+            always_run: false
+          - id: check-tool-registration
+            name: Check Tool subclass registration
+            entry: uv
+            args: [run, python, scripts/check_tool_registration.py]
+            language: system
+            types: [python]
+            pass_filenames: true
+            always_run: false
@@ -0,0 +1 @@
+3.13
@@ -1,22 +0,0 @@
-{
-    // force *nix line endings so files don't look modified in container run from Windows clone
-    "files.eol": "\n",
-    "files.trimTrailingWhitespace": true,
-    "files.insertFinalNewline": true,
-
-    "python.defaultInterpreterPath": "./.venv/bin/python",
-    "python.terminal.activateEnvironment": true,
-    "python.analysis.autoImportCompletions": true,
-    "python.analysis.autoSearchPaths": true,
-    "python.analysis.extraPaths": [
-        "./.venv/lib/python3.12/site-packages"
-    ],
-    "python.analysis.packageIndexDepths": [
-        {
-            "name": "openhands",
-            "depth": 10,
-            "includeAllSymbols": true
-        }
-    ],
-    "python.analysis.stubPath": "./.venv/lib/python3.12/site-packages",
-}
@@ -1,344 +1,328 @@
-This repository contains the code for OpenHands, an automated AI software engineer. It has a Python backend
-(in the `openhands` directory) and React frontend (in the `frontend` directory).
+<ROLE>
+You are a collaborative software engineering partner with a strong focus on code quality and simplicity. Your approach is inspired by proven engineering principles from successful open-source projects, emphasizing pragmatic solutions and maintainable code.

-## General Setup:
-To set up the entire repo, including frontend and backend, run `make build`.
-You don't need to do this unless the user asks you to, or if you're trying to run the entire application.
+# Core Engineering Principles
+
+1. **Simplicity and Clarity**
+"The best solutions often come from looking at problems from a different angle, where special cases disappear and become normal cases."
+    • Prefer solutions that eliminate edge cases rather than adding conditional checks
+    • Good design patterns emerge from experience and careful consideration
+    • Simple, clear code is easier to maintain and debug
+
+2. **Backward Compatibility**
+"Stability is a feature, not a constraint."
+    • Changes should not break existing functionality
+    • Consider the impact on users and existing integrations
+    • Compatibility enables trust and adoption
+
+3. **Pragmatic Problem-Solving**
+"Focus on solving real problems with practical solutions."
+    • Address actual user needs rather than theoretical edge cases
+    • Prefer proven, straightforward approaches over complex abstractions
+    • Code should serve real-world requirements
+
+4. **Maintainable Architecture**
+"Keep functions focused and code readable."
+    • Functions should be short and have a single responsibility
+    • Avoid deep nesting - consider refactoring when indentation gets complex
+    • Clear naming and structure reduce cognitive load
+
+# Collaborative Approach
+
+## Communication Style
+    • **Constructive**: Focus on helping improve code and solutions
+    • **Collaborative**: Work together as partners toward better outcomes
+    • **Clear**: Provide specific, actionable feedback
+    • **Respectful**: Maintain a supportive tone while being technically rigorous
+
+## Problem Analysis Process
+
+### 1. Understanding Requirements
+When reviewing a requirement, confirm understanding by restating it clearly:
+> "Based on your description, I understand you need: [clear restatement of the requirement]. Is this correct?"
+
+### 2. Collaborative Problem Decomposition
+
+#### Data Structure Analysis
+"Well-designed data structures often lead to simpler code."
+    • What are the core data elements and their relationships?
+    • How does data flow through the system?
+    • Are there opportunities to simplify data handling?
+
+#### Complexity Assessment
+"Let's look for ways to simplify this."
+    • What's the essential functionality we need to implement?
+    • Which parts of the current approach add unnecessary complexity?
+    • How can we make this more straightforward?
+
+#### Compatibility Review
+"Let's make sure this doesn't break existing functionality."
+    • What existing features might be affected?
+    • How can we implement this change safely?
+    • What migration path do users need?
+
+#### Practical Validation
+"Let's focus on the real-world use case."
+    • Does this solve an actual problem users face?
+    • Is the complexity justified by the benefit?
+    • What's the simplest approach that meets the need?
+
+### 3. Constructive Feedback Format
+
+After analysis, provide feedback in this format:
+
+**Assessment**: [Clear evaluation of the approach]
+
+**Key Observations**:
+- Data Structure: [insights about data organization]
+- Complexity: [areas where we can simplify]
+- Compatibility: [potential impact on existing code]
+
+**Suggested Approach**:
+If the solution looks good:
+1. Start with the simplest data structure that works
+2. Eliminate special cases where possible
+3. Implement clearly and directly
+4. Ensure backward compatibility
+
+If there are concerns:
+"I think we might be able to simplify this. The core issue seems to be [specific problem]. What if we tried [alternative approach]?"
+
+### 4. Code Review Approach
+When reviewing code, provide constructive feedback:
+
+**Overall Assessment**: [Helpful evaluation]
+
+**Specific Suggestions**:
+- [Concrete improvements with explanations]
+- [Alternative approaches to consider]
+- [Ways to reduce complexity]
+
+**Next Steps**: [Clear action items]
+</ROLE>
+
+## Package-specific guidance
+When reviewing or modifying code, read the closest AGENTS.md file for the
+package(s) containing the changed files. If a PR spans multiple packages,
+consult each relevant package-level AGENTS.md.
+
+- SDK: [openhands-sdk/openhands/sdk/AGENTS.md](openhands-sdk/openhands/sdk/AGENTS.md)
+- Subagents: [openhands-sdk/openhands/sdk/subagent/AGENTS.md](openhands-sdk/openhands/sdk/subagent/AGENTS.md)
+- Tools: [openhands-tools/openhands/tools/AGENTS.md](openhands-tools/openhands/tools/AGENTS.md)
+- Workspace: [openhands-workspace/openhands/workspace/AGENTS.md](openhands-workspace/openhands/workspace/AGENTS.md)
+- Agent server: [openhands-agent-server/AGENTS.md](openhands-agent-server/AGENTS.md)
+- Eval config: [.github/run-eval/AGENTS.md](.github/run-eval/AGENTS.md)
+
+## API compatibility pointers
+
+- For SDK Python API deprecation/removal policy, read
+  [openhands-sdk/openhands/sdk/AGENTS.md](openhands-sdk/openhands/sdk/AGENTS.md).
+  Public API removals require deprecation before removal, and breaking SDK API
+  changes require at least a **MINOR** SemVer bump.
+- The SDK API breakage checker should treat metadata-only changes to
+  Pydantic `Field(...)` declarations as non-breaking, including adding,
+  removing, or editing `description`, `title`, `examples`,
+  `json_schema_extra`, and `deprecated` kwargs.
+- For public REST APIs, read
+  [openhands-agent-server/AGENTS.md](openhands-agent-server/AGENTS.md).
+  REST contract breaks need a deprecation notice and a runway of
+  **5 minor releases** before removing the old contract or making an
+  incompatible replacement mandatory.
+
+<DEV_SETUP>
+- Run `make build` first to configure the dependencies
+- We use pre-commit hooks (`.pre-commit-config.yaml`) that include:
+  - type checking with pyright
+  - linting and formatting with `uv run ruff`
+- NEVER USE `mypy`!
+- Do NOT commit ALL files; commit only the relevant files you've changed!
+- In every commit message, you should add "Co-authored-by: openhands <openhands@all-hands.dev>"
+- You can run pytest with `uv run pytest`
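
As a sketch of the co-author rule above, a small shell helper can append the trailer so it is never forgotten. `commit_message` is a hypothetical name used for illustration, not something defined in this repository.

```bash
# Hypothetical helper: print a commit message that ends with the
# Co-authored-by trailer required above.
commit_message() {
    printf '%s\n\n%s\n' "$1" "Co-authored-by: openhands <openhands@all-hands.dev>"
}
```

Assuming the changed file is already tracked, something like `git commit -m "$(commit_message 'fix: shorten long lines')" path/to/changed_file.py` would then commit only that file with the trailer attached.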
+
+# Instructions for fixing "E501 Line too long"
+
+- If it is just code, you can modify it so it spans multiple lines.
+- If it is a single-line string, you can break it into a multi-line string by doing "ABC" -> ("A"\n"B"\n"C")
+- If it is a long multi-line string (e.g., a docstring), add an ignore comment (e.g., `# noqa: E501`) AFTER the closing """. NEVER add it INSIDE the docstring.
+
+
+</DEV_SETUP>
+
+<PR_ARTIFACTS>
+# PR-Specific Documents
+
+When working on a PR that requires design documents, development-only scripts, or other temporary artifacts that should NOT be merged to main, store them in a `.pr/` directory at the repository root.
+
+## Usage

-## Running OpenHands with OpenHands:
-To run the full application to debug issues:
 ```bash
-export INSTALL_DOCKER=0
-export RUNTIME=local
-make build && make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0 &> /tmp/openhands-log.txt &
+# Create the directory if it doesn't exist
+mkdir -p .pr
+
+# Add your PR-specific documents, for example:
+# .pr/
+# ├── design.md       # Design decisions and architecture notes
+# ├── analysis.md     # Investigation or debugging notes
+# └── notes.md        # Any other PR-specific content
 ```

-IMPORTANT: Before making any changes to the codebase, ALWAYS run `make install-pre-commit-hooks` to ensure pre-commit hooks are properly installed.
+## How It Works

-Before pushing any changes, you MUST ensure that any lint errors or simple test errors have been fixed.
+1. **Notification**: When `.pr/` exists, a single comment is posted to the PR conversation alerting reviewers
+2. **Auto-cleanup**: When the PR is approved, the `.pr/` directory is automatically removed via commit
+3. **Fork PRs**: Auto-cleanup cannot push to forks, so manual removal is required before merging

-* If you've made changes to the backend, you should run `pre-commit run --config ./dev_config/python/.pre-commit-config.yaml` (this will run on staged files).
-* If you've made changes to the frontend, you should run `cd frontend && npm run lint:fix && npm run build ; cd ..`
-* If you've made changes to the VSCode extension, you should run `cd openhands/integrations/vscode && npm run lint:fix && npm run compile ; cd ../../..`
+## Important Notes

-The pre-commit hooks MUST pass successfully before pushing any changes to the repository. This is a mandatory requirement to maintain code quality and consistency.
+- Do NOT put anything in `.pr/` that needs to be preserved
+- The `.pr/` check passes (green ✅) during development - it only posts a notification, not a blocking error
+- For fork PRs: You must manually remove `.pr/` before the PR can be merged

-If either command fails, it may have automatically fixed some issues. You should fix any issues that weren't automatically fixed,
-then re-run the command to ensure it passes. Common issues include:
- Mypy type errors
- Ruff formatting issues
- Trailing whitespace
- Missing newlines at end of files
+## When to Use

-## Git Best Practices
+- Complex refactoring that benefits from written design rationale
+- Debugging sessions where you want to document your investigation
+- Feature implementations that need temporary planning docs
+- Temporary scripts that are intended to show reviewers that the feature works
+- Any analysis that helps reviewers understand the PR but isn't needed long-term
+</PR_ARTIFACTS>

- Prefer specific `git add <filename>` instead of `git add .` to avoid accidentally staging unintended files
- Be especially careful with `git reset --hard` after staging files, as it will remove accidentally staged files
- When remote has new changes, use `git fetch upstream && git rebase upstream/<branch>` on the same branch
+<REVIEW_HANDLING>
+- Critically evaluate each review comment before acting on it. Not all feedback is worth implementing:
+  - Does it fix a real bug or improve clarity significantly?
+  - Does it align with the project's engineering principles (simplicity, maintainability)?
+  - Is the suggested change proportional to the benefit, or does it add unnecessary complexity?
+- It's acceptable to respectfully decline suggestions that add verbosity without clear benefit, over-engineer for hypothetical edge cases, or contradict the project's pragmatic approach.
+- After addressing (or deciding not to address) inline review comments, mark the corresponding review threads as resolved.
+- Before resolving a thread, leave a reply comment that either explains the reason for dismissing the feedback or references the specific commit (e.g., commit SHA) that addressed the issue.
+- Prefer resolving threads only once fixes are pushed or a clear decision is documented.
+- Use the GitHub GraphQL API to reply to and resolve review threads (see below).

-## Repository Structure
-Backend:
- Located in the `openhands` directory
- Testing:
-  - All tests are in `tests/unit/test_*.py`
-  - To test new code, run `poetry run pytest tests/unit/test_xxx.py` where `xxx` is the appropriate file for the current functionality
-  - Write all tests with pytest
+## Resolving Review Threads via GraphQL

-Frontend:
- Located in the `frontend` directory
- Prerequisites: A recent version of NodeJS / NPM
- Setup: Run `npm install` in the frontend directory
- Testing:
-  - Run tests: `npm run test`
-  - To run specific tests: `npm run test -- -t "TestName"`
-  - Our test framework is vitest
- Building:
-  - Build for production: `npm run build`
- Environment Variables:
-  - Set in `frontend/.env` or as environment variables
-  - Available variables: VITE_BACKEND_HOST, VITE_USE_TLS, VITE_INSECURE_SKIP_VERIFY, VITE_FRONTEND_PORT
- Internationalization:
-  - Generate i18n declaration file: `npm run make-i18n`
- Data Fetching & Cache Management:
-  - We use TanStack Query (fka React Query) for data fetching and cache management
-  - Data Access Layer: API client methods are located in `frontend/src/api` and should never be called directly from UI components - they must always be wrapped with TanStack Query
-  - Custom hooks are located in `frontend/src/hooks/query/` and `frontend/src/hooks/mutation/`
-  - Query hooks should follow the pattern use[Resource] (e.g., `useConversationSkills`)
-  - Mutation hooks should follow the pattern use[Action] (e.g., `useDeleteConversation`)
-  - Architecture rule: UI components → TanStack Query hooks → Data Access Layer (`frontend/src/api`) → API endpoints
+The CI check `Review Thread Gate/unresolved-review-threads` will fail if there are unresolved review threads. To resolve threads programmatically:

-VSCode Extension:
- Located in the `openhands/integrations/vscode` directory
- Setup: Run `npm install` in the extension directory
- Linting:
-  - Run linting with fixes: `npm run lint:fix`
-  - Check only: `npm run lint`
-  - Type checking: `npm run typecheck`
- Building:
-  - Compile TypeScript: `npm run compile`
-  - Package extension: `npm run package-vsix`
- Testing:
-  - Run tests: `npm run test`
- Development Best Practices:
-  - Use `vscode.window.createOutputChannel()` for debug logging instead of `showErrorMessage()` popups
-  - Pre-commit process runs both frontend and backend checks when committing extension changes
-
-## Enterprise Directory
-
-The `enterprise/` directory contains additional functionality that extends the open-source OpenHands codebase. This includes:
- Authentication and user management (Keycloak integration)
- Database migrations (Alembic)
- Integration services (GitHub, GitLab, Jira, Linear, Slack)
- Billing and subscription management (Stripe)
- Telemetry and analytics (PostHog, custom metrics framework)
-
-### Enterprise Development Setup
-
-**Prerequisites:**
- Python 3.12
- Poetry (for dependency management)
- Node.js 22.x (for frontend)
- Docker (optional)
-
-**Setup Steps:**
-1. First, build the main OpenHands project: `make build`
-2. Then install enterprise dependencies: `cd enterprise && poetry install --with dev,test` (This can take a very long time. Be patient.)
-3. Set up enterprise pre-commit hooks: `poetry run pre-commit install --config ./dev_config/python/.pre-commit-config.yaml`
-
-**Running Enterprise Tests:**
+1. Get the thread IDs (replace `<OWNER>`, `<REPO>`, `<PR_NUMBER>`):
 ```bash
-# Enterprise unit tests (full suite)
-PYTHONPATH=".:$PYTHONPATH" poetry run --project=enterprise pytest --forked -n auto -s -p no:ddtrace -p no:ddtrace.pytest_bdd -p no:ddtrace.pytest_benchmark ./enterprise/tests/unit --cov=enterprise --cov-branch
-
-# Test specific modules (faster for development)
-cd enterprise
-PYTHONPATH=".:$PYTHONPATH" poetry run pytest tests/unit/telemetry/ --confcutdir=tests/unit/telemetry
-
-# Enterprise linting (IMPORTANT: use --show-diff-on-failure to match GitHub CI)
-poetry run pre-commit run --all-files --show-diff-on-failure --config ./dev_config/python/.pre-commit-config.yaml
+gh api graphql -f query='
+{
+  repository(owner: "<OWNER>", name: "<REPO>") {
+    pullRequest(number: <PR_NUMBER>) {
+      reviewThreads(first: 20) {
+        nodes {
+          id
+          isResolved
+          comments(first: 1) {
+            nodes { body }
+          }
+        }
+      }
+    }
+  }
+}'
 ```

-**Running Enterprise Server:**
+2. Reply to the thread explaining how the feedback was addressed:
 ```bash
-cd enterprise
-make start-backend  # Development mode with hot reload
-# or
-make run  # Full application (backend + frontend)
+gh api graphql -f query='
+mutation {
+  addPullRequestReviewThreadReply(input: {
+    pullRequestReviewThreadId: "<THREAD_ID>"
+    body: "Fixed in <COMMIT_SHA>"
+  }) {
+    comment { id }
+  }
+}'
 ```

-**Key Configuration Files:**
- `enterprise/pyproject.toml` - Enterprise-specific dependencies
- `enterprise/Makefile` - Enterprise build and run commands
- `enterprise/dev_config/python/` - Linting and type checking configuration
- `enterprise/migrations/` - Database migration files
-
-**Database Migrations:**
-Enterprise uses Alembic for database migrations. When making schema changes:
-1. Create migration files in `enterprise/migrations/versions/`
-2. Test migrations thoroughly
-3. The CI will check for migration conflicts on PRs
-
-**Integration Development:**
-The enterprise codebase includes integrations for:
- **GitHub** - PR management, webhooks, app installations
- **GitLab** - Similar to GitHub but for GitLab instances
- **Jira** - Issue tracking and project management
- **Linear** - Modern issue tracking
- **Slack** - Team communication and notifications
-
-Each integration follows a consistent pattern with service classes, storage models, and API endpoints.
-
-**Important Notes:**
- Enterprise code is licensed under Polyform Free Trial License (30-day limit)
- The enterprise server extends the OpenHands server through dynamic imports
- Database changes require careful migration planning in `enterprise/migrations/`
- Always test changes in both OpenHands and enterprise contexts
- Use the enterprise-specific Makefile commands for development
-
-**Enterprise Testing Best Practices:**
-
-**Database Testing:**
- Use SQLite in-memory databases (`sqlite:///:memory:`) for unit tests instead of real PostgreSQL
- Create module-specific `conftest.py` files with database fixtures
- Mock external database connections in unit tests to avoid dependency on running services
- Use real database connections only for integration tests
-
-**Import Patterns:**
- Use relative imports without `enterprise.` prefix in enterprise code
- Example: `from storage.database import session_maker` not `from enterprise.storage.database import session_maker`
- This ensures code works in both OpenHands and enterprise contexts
-
-**Test Structure:**
- Place tests in `enterprise/tests/unit/` following the same structure as the source code
- Use `--confcutdir=tests/unit/[module]` when testing specific modules
- Create comprehensive fixtures for complex objects (databases, external services)
- Write platform-agnostic tests (avoid hardcoded OS-specific assertions)
-
-**Mocking Strategy:**
- Use `AsyncMock` for async operations and `MagicMock` for complex objects
- Mock all external dependencies (databases, APIs, file systems) in unit tests
- Use `patch` with correct import paths (e.g., `telemetry.registry.logger` not `enterprise.telemetry.registry.logger`)
- Test both success and failure scenarios with proper error handling
-
-**Coverage Goals:**
- Aim for 90%+ test coverage on new enterprise modules
- Focus on critical business logic and error handling paths
- Use `--cov-report=term-missing` to identify uncovered lines
-
-**Troubleshooting:**
- If tests fail, ensure all dependencies are installed: `poetry install --with dev,test`
- For database issues, check migration status and run migrations if needed
- For frontend issues, ensure the main OpenHands frontend is built: `make build`
- Check logs in the `logs/` directory for runtime issues
- If tests fail with import errors, verify `PYTHONPATH=".:$PYTHONPATH"` is set
- **If GitHub CI fails but local linting passes**: Always use `--show-diff-on-failure` flag to match CI behavior exactly
-
-## Template for Github Pull Request
-
-If you are starting a pull request (PR), please follow the template in `.github/pull_request_template.md`.
-
-## Implementation Details
-
-These details may or may not be useful for your current task.
-
-### Microagents
-
-Microagents are specialized prompts that enhance OpenHands with domain-specific knowledge and task-specific workflows. They are Markdown files that can include frontmatter for configuration.
-
-#### Types:
- **Public Microagents**: Located in `microagents/`, available to all users
- **Repository Microagents**: Located in `.openhands/microagents/`, specific to this repository
-
-#### Loading Behavior:
- **Without frontmatter**: Always loaded into LLM context
- **With triggers in frontmatter**: Only loaded when user's message matches the specified trigger keywords
-
-#### Structure:
-```yaml
---
-triggers:
- keyword1
- keyword2
---
-# Microagent Content
-Your specialized knowledge and instructions here...
+3. Resolve the thread:
+```bash
+gh api graphql -f query='
+mutation {
+  resolveReviewThread(input: {threadId: "<THREAD_ID>"}) {
+    thread { isResolved }
+  }
+}'
 ```

-### Frontend
+4. Get the failed workflow run ID and rerun it:
+```bash
+# Find the run ID from the failed check URL, or use:
+gh run list --repo <OWNER>/<REPO> --branch <BRANCH> --limit 5

-#### Action Handling:
- Actions are defined in `frontend/src/types/action-type.ts`
- The `HANDLED_ACTIONS` array in `frontend/src/state/chat-slice.ts` determines which actions are displayed as collapsible UI elements
- To add a new action type to the UI:
-  1. Add the action type to the `HANDLED_ACTIONS` array
-  2. Implement the action handling in `addAssistantAction` function in chat-slice.ts
-  3. Add a translation key in the format `ACTION_MESSAGE$ACTION_NAME` to the i18n files
- Actions with `thought` property are displayed in the UI based on their action type:
-  - Regular actions (like "run", "edit") display the thought as a separate message
-  - Special actions (like "think") are displayed as collapsible elements only
+# Rerun failed jobs
+gh run rerun <RUN_ID> --repo <OWNER>/<REPO> --failed
+```
+</REVIEW_HANDLING>
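Step 1 of the thread-resolution workflow above returns JSON that still has to be filtered for unresolved threads. The following is a minimal sketch of that filtering, assuming the response shape implied by the step 1 query. The function names are hypothetical, and `fetch_threads` assumes an installed and authenticated `gh` CLI, passing owner, repo, and PR number as GraphQL variables instead of editing the query text.

```python
import json
import subprocess

# Same fields as the step 1 query, with owner/repo/number as GraphQL variables.
QUERY = """
query($owner: String!, $repo: String!, $number: Int!) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $number) {
      reviewThreads(first: 20) {
        nodes { id isResolved }
      }
    }
  }
}
"""


def unresolved_thread_ids(response: dict) -> list:
    """Return the IDs of review threads whose isResolved flag is false."""
    threads = response["data"]["repository"]["pullRequest"]["reviewThreads"]
    return [node["id"] for node in threads["nodes"] if not node["isResolved"]]


def fetch_threads(owner: str, repo: str, number: int) -> dict:
    """Run the query through the gh CLI (assumes gh is authenticated)."""
    cmd = [
        "gh", "api", "graphql",
        "-f", f"query={QUERY}",
        "-f", f"owner={owner}",
        "-f", f"repo={repo}",
        "-F", f"number={number}",  # -F lets gh send the value as an integer
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)
```

The IDs returned by `unresolved_thread_ids(fetch_threads(...))` are the `<THREAD_ID>` values used in steps 2 and 3. Note that `first: 20` only covers the first 20 threads, so a PR with more would need pagination.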

-#### Adding User Settings:
- To add a new user setting to OpenHands, follow these steps:
-  1. Add the setting to the frontend:
-     - Add the setting to the `Settings` type in `frontend/src/types/settings.ts`
-     - Add the setting to the `ApiSettings` type in the same file
-     - Add the setting with an appropriate default value to `DEFAULT_SETTINGS` in `frontend/src/services/settings.ts`
-     - Update the `useSettings` hook in `frontend/src/hooks/query/use-settings.ts` to map the API response
-     - Update the `useSaveSettings` hook in `frontend/src/hooks/mutation/use-save-settings.ts` to include the setting in API requests
-     - Add UI components (like toggle switches) in the appropriate settings screen (e.g., `frontend/src/routes/app-settings.tsx`)
-     - Add i18n translations for the setting name and any tooltips in `frontend/src/i18n/translation.json`
-     - Add the translation key to `frontend/src/i18n/declaration.ts`
-  2. Add the setting to the backend:
-     - Add the setting to the `Settings` model in `openhands/storage/data_models/settings.py`
-     - Update any relevant backend code to apply the setting (e.g., in session creation)

-#### Settings UI Patterns:
+<CODE>
+- Avoid hacky tricks like `sys.path.insert` when resolving package dependencies
+- Use existing packages/libraries instead of implementing them yourself whenever possible.
+- Avoid using # type: ignore. Treat it only as a last resort. In most cases, issues should be resolved by improving type annotations, adding assertions, or adjusting code/tests—rather than silencing the type checker.
+  - Please AVOID using # type: ignore[attr-defined] unless absolutely necessary. If the issue can be addressed by adding a few extra assert statements to verify types, prefer that approach instead!
+  - For issues like # type: ignore[call-arg]: if you discover that the argument doesn’t actually exist, do not try to mock it again in tests. Instead, simply remove it.
+- Avoid doing in-line imports unless absolutely necessary (e.g., circular dependency).
+- Avoid getattr/hasattr guards and instead enforce type correctness by relying on explicit type assertions and proper object usage, ensuring functions only receive the expected Pydantic models or typed inputs. Prefer type hints and validated models over runtime shape checks.
+- Prefer accessing typed attributes directly. If necessary, convert inputs up front into a canonical shape; avoid purely hypothetical fallbacks.
+- Use real newlines in commit messages; do not write literal "\n".

-There are two main patterns for saving settings in the OpenHands frontend:
+</CODE>
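The preference for type assertions over `getattr`/`hasattr` guards can be sketched as follows. The `TextEvent` and `ErrorEvent` classes are hypothetical, and stdlib dataclasses stand in for Pydantic models so the snippet has no dependencies; the narrowing pattern is the same either way.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TextEvent:
    """Hypothetical event type; stands in for a Pydantic model."""

    text: str


@dataclass
class ErrorEvent:
    """Hypothetical event type; stands in for a Pydantic model."""

    error: str


def summarize(event: Union[TextEvent, ErrorEvent]) -> str:
    # Narrow with isinstance instead of hasattr(event, "text"); the type
    # checker then knows which attributes exist on each branch.
    if isinstance(event, TextEvent):
        return f"text: {event.text}"
    # An explicit assertion documents the remaining case and fails loudly
    # if an unexpected object is passed in.
    assert isinstance(event, ErrorEvent), f"unexpected event: {type(event)!r}"
    return f"error: {event.error}"
```

Because every branch accesses a typed attribute directly, no `# type: ignore[attr-defined]` is needed.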

-**Pattern 1: Entity-based Resources (Immediate Save)**
- Used for: API Keys, Secrets, MCP Servers
- Behavior: Changes are saved immediately when user performs actions (add/edit/delete)
- Implementation:
-  - No "Save Changes" button
-  - No local state management or `isDirty` tracking
-  - Uses dedicated mutation hooks for each operation (e.g., `use-add-mcp-server.ts`, `use-delete-mcp-server.ts`)
-  - Each mutation triggers immediate API call with query invalidation for UI updates
-  - Example: MCP settings, API Keys & Secrets tabs
- Benefits: Simpler UX, no risk of losing changes, consistent with modern web app patterns
+<TESTING>
+- AFTER you edit ONE file, you should run the pre-commit hooks on that file via `uv run pre-commit run --files [filepath]` to make sure you didn't break it.
+- Don't write TOO MANY tests; write just enough to cover edge cases.
+- Check how we perform tests in .github/workflows/tests.yml
+- Put unit tests under the corresponding domain folder in `tests/` (e.g., `tests/sdk`, `tests/tools`, `tests/workspace`). For example, changes to `openhands-sdk/openhands/sdk/tool/tool.py` should be covered in `tests/sdk/tool/test_tool.py`.
+- DON'T write TEST CLASSES unless absolutely necessary!
+- If you find yourself duplicating logic for preparing mocks, loading data, etc., that logic should live in fixtures in conftest.py!
+- Please test only the logic implemented in the current codebase. Do not test functionality (e.g., BaseModel.model_dump()) that is not implemented in this repository.
+- For changes to prompt templates, tool descriptions, or agent decision logic, add the `integration-test` label to trigger integration tests and verify no unexpected impact on benchmark performance.

-**Pattern 2: Form-based Settings (Manual Save)**
- Used for: Application settings, LLM configuration
- Behavior: Changes are accumulated locally and saved when user clicks "Save Changes"
- Implementation:
-  - Has "Save Changes" button that becomes enabled when changes are detected
-  - Uses local state management with `isDirty` tracking
-  - Uses `useSaveSettings` hook to save all changes at once
-  - Example: LLM tab, Application tab
- Benefits: Allows bulk changes, explicit save action, can validate all fields before saving
+# Behavior Tests

-**When to use each pattern:**
- Use Pattern 1 (Immediate Save) for entity management where each item is independent
- Use Pattern 2 (Manual Save) for configuration forms where settings are interdependent or need validation
+Behavior tests (prefix `b##_*`) in `tests/integration/tests/` are designed to verify that agents exhibit desired behaviors in realistic scenarios. These tests are distinct from functional tests (prefix `t##_*`) and have specific requirements.

-### Adding New LLM Models
+Before adding or modifying behavior tests, review `tests/integration/BEHAVIOR_TESTS.md` for the latest workflow, expectations, and examples.
+</TESTING>

-To add a new LLM model to OpenHands, you need to update multiple files across both frontend and backend:
+<AGENT_TMP_DIRECTORY>
+# Agent Temporary Directory Convention

-#### Model Configuration Procedure:
+When tools need to store observation files (e.g., browser session recordings, task tracker data), use `.agent_tmp` as the directory name for consistency.

-1. **Frontend Model Arrays** (`frontend/src/utils/verified-models.ts`):
-   - Add the model to `VERIFIED_MODELS` array (main list of all verified models)
-   - Add to provider-specific arrays based on the model's provider:
-     - `VERIFIED_OPENAI_MODELS` for OpenAI models
-     - `VERIFIED_ANTHROPIC_MODELS` for Anthropic models
-     - `VERIFIED_MISTRAL_MODELS` for Mistral models
-     - `VERIFIED_OPENHANDS_MODELS` for models available through OpenHands provider
+The browser session recording tool saves recordings to `.agent_tmp/observations/recording-{timestamp}/`.

-2. **Backend CLI Integration** (`openhands/cli/utils.py`):
-   - Add the model to the appropriate `VERIFIED_*_MODELS` arrays
-   - This ensures the model appears in CLI model selection
+This convention ensures tool-generated observation files are stored in a predictable location that can be easily:
+- Added to `.gitignore`
+- Cleaned up after agent sessions
+- Identified as agent-generated artifacts

-3. **Backend Model List** (`openhands/utils/llm.py`):
-   - **CRITICAL**: Add the model to the `openhands_models` list (lines 57-66) if using OpenHands provider
-   - This is required for the model to appear in the frontend model selector
-   - Format: `'openhands/model-name'` (e.g., `'openhands/o3'`)
+Note: This is separate from `persistence_dir` which is used for conversation state persistence.
+</AGENT_TMP_DIRECTORY>
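A tool following the `.agent_tmp` convention above might build its output path as in this minimal sketch. The helper name `recording_dir` and the timestamp format are assumptions; the source only specifies the `.agent_tmp/observations/recording-{timestamp}/` layout.

```python
from datetime import datetime
from pathlib import Path

AGENT_TMP_DIR = ".agent_tmp"


def recording_dir(workspace: Path, now: datetime) -> Path:
    """Return the directory for one browser recording under .agent_tmp."""
    # The timestamp format is an assumption; the convention only says
    # "{timestamp}".
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return workspace / AGENT_TMP_DIR / "observations" / f"recording-{stamp}"
```

Keeping the directory name in a single constant means a `.gitignore` entry or cleanup step only has to target one path.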

-4. **Backend LLM Configuration** (`openhands/llm/llm.py`):
-   - Add to feature-specific arrays based on model capabilities:
-     - `FUNCTION_CALLING_SUPPORTED_MODELS` if the model supports function calling
-     - `REASONING_EFFORT_SUPPORTED_MODELS` if the model supports reasoning effort parameters
-     - `CACHE_PROMPT_SUPPORTED_MODELS` if the model supports prompt caching
-     - `MODELS_WITHOUT_STOP_WORDS` if the model doesn't support stop words
+<REPO>
+<PROJECT_STRUCTURE>
+- This is a `uv`-managed Python monorepo (single `uv.lock` at repo root) with multiple distributable packages: `openhands-sdk/` (SDK), `openhands-tools/` (built-in tools), `openhands-workspace/` (workspace impls), and `openhands-agent-server/` (server runtime).
+- `examples/` contains runnable patterns; `tests/` is split by domain (`tests/sdk`, `tests/tools`, `tests/workspace`, `tests/agent_server`, etc.).
+- Python namespace is `openhands.*` across packages; keep new modules within the matching package and mirror test paths under `tests/`.
+</PROJECT_STRUCTURE>

-5. **Validation**:
-   - Run backend linting: `pre-commit run --config ./dev_config/python/.pre-commit-config.yaml`
-   - Run frontend linting: `cd frontend && npm run lint:fix`
-   - Run frontend build: `cd frontend && npm run build`
+<QUICK_COMMANDS>
+- Set up the dev environment: `make build` (runs `uv sync --dev` and installs pre-commit; requires uv >= 0.8.13)
+- Lint/format: `make lint`, `make format`
+- Run tests: `uv run pytest`
+- Build agent-server: `make build-server` (output: `dist/agent-server/`)
+- Clean caches: `make clean`
+- Run SDK examples: see [openhands-sdk/openhands/sdk/AGENTS.md](openhands-sdk/openhands/sdk/AGENTS.md).
+- The example workflow runs `uv run pytest tests/examples/test_examples.py --run-examples`; each successful example must print an `EXAMPLE_COST: ...` line to stdout (use `EXAMPLE_COST: 0` for non-LLM examples).
+- Conversation plugins passed via `plugins=[...]` are lazy-loaded on the first `send_message()` or `run()`, so example code should inspect plugin-added skills or `resolved_plugins` only after that first interaction.
+</QUICK_COMMANDS>
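For the `EXAMPLE_COST` requirement above, a non-LLM example could end as in this minimal sketch. The `run_example` body is a hypothetical placeholder; only the final `EXAMPLE_COST: 0` line comes from the guidance.

```python
def run_example() -> int:
    """Hypothetical non-LLM example body."""
    return sum(range(5))


def main() -> str:
    result = run_example()
    print(f"result: {result}")
    # Non-LLM examples report zero cost so the example workflow can find
    # the required line on stdout.
    cost_line = "EXAMPLE_COST: 0"
    print(cost_line)
    return cost_line


if __name__ == "__main__":
    main()
```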

-#### Model Verification Arrays:
+<REPO_CONFIG_NOTES>
+- Ruff: `line-length = 88`, `target-version = "py312"` (see `pyproject.toml`).
+- Ruff ignores `ARG` (unused arguments) under `tests/**/*.py` to allow pytest fixtures.
+- Repository guidance lives in the project root AGENTS.md (loaded as a third-party skill file).
+</REPO_CONFIG_NOTES>

- **VERIFIED_MODELS**: Main array of all verified models shown in the UI
- **VERIFIED_OPENAI_MODELS**: OpenAI models (LiteLLM doesn't return provider prefix)
- **VERIFIED_ANTHROPIC_MODELS**: Anthropic models (LiteLLM doesn't return provider prefix)
- **VERIFIED_MISTRAL_MODELS**: Mistral models (LiteLLM doesn't return provider prefix)
- **VERIFIED_OPENHANDS_MODELS**: Models available through OpenHands managed provider
-
-#### Model Feature Support Arrays:
-
- **FUNCTION_CALLING_SUPPORTED_MODELS**: Models that support structured function calling
- **REASONING_EFFORT_SUPPORTED_MODELS**: Models that support reasoning effort parameters (like o1, o3)
- **CACHE_PROMPT_SUPPORTED_MODELS**: Models that support prompt caching for efficiency
- **MODELS_WITHOUT_STOP_WORDS**: Models that don't support stop word parameters
-
-#### Frontend Model Integration:
-
- Models are automatically available in the model selector UI once added to verified arrays
- The `extractModelAndProvider` utility automatically detects provider from model arrays
- Provider-specific models are grouped and prioritized in the UI selection
-
-#### CLI Model Integration:
-
- Models appear in CLI provider selection based on the verified arrays
- The `organize_models_and_providers` function groups models by provider
- Default model selection prioritizes verified models for each provider
+</REPO>
@@ -1,55 +0,0 @@
-cff-version: 1.2.0
-message: "If you use this software, please cite it using the following metadata."
-title: "OpenHands: An Open Platform for AI Software Developers as Generalist Agents"
-authors:
-  - family-names: Wang
-    given-names: Xingyao
-  - family-names: Li
-    given-names: Boxuan
-  - family-names: Song
-    given-names: Yufan
-  - family-names: Xu
-    given-names: Frank F.
-  - family-names: Tang
-    given-names: Xiangru
-  - family-names: Zhuge
-    given-names: Mingchen
-  - family-names: Pan
-    given-names: Jiayi
-  - family-names: Song
-    given-names: Yueqi
-  - family-names: Li
-    given-names: Bowen
-  - family-names: Singh
-    given-names: Jaskirat
-  - family-names: Tran
-    given-names: Hoang H.
-  - family-names: Li
-    given-names: Fuqiang
-  - family-names: Ma
-    given-names: Ren
-  - family-names: Zheng
-    given-names: Mingzhang
-  - family-names: Qian
-    given-names: Bill
-  - family-names: Shao
-    given-names: Yanjun
-  - family-names: Muennighoff
-    given-names: Niklas
-  - family-names: Zhang
-    given-names: Yizhe
-  - family-names: Hui
-    given-names: Binyuan
-  - family-names: Lin
-    given-names: Junyang
-  - family-names: Brennan
-    given-names: Robert
-  - family-names: Peng
-    given-names: Hao
-  - family-names: Ji
-    given-names: Heng
-  - family-names: Neubig
-    given-names: Graham
-year: 2024
-doi: "10.48550/arXiv.2407.16741"
-url: "https://arxiv.org/abs/2407.16741"
@@ -1 +0,0 @@
-docs.all-hands.dev
@@ -1,152 +0,0 @@
-
-# Contributor Covenant Code of Conduct
-
-## Our Pledge
-
-We as members, contributors, and leaders pledge to make participation in our
-community a harassment-free experience for everyone, regardless of age, body
-size, visible or invisible disability, ethnicity, sex characteristics, gender
-identity and expression, level of experience, education, socio-economic status,
-nationality, personal appearance, race, caste, color, religion, or sexual
-identity and orientation.
-
-We pledge to act and interact in ways that contribute to an open, welcoming,
-diverse, inclusive, and healthy community.
-
-## Our Standards
-
-Examples of behavior that contributes to a positive environment for our
-community include:
-
-* Demonstrating empathy and kindness toward other people.
-* Being respectful of differing opinions, viewpoints, and experiences.
-* Giving and gracefully accepting constructive feedback.
-* Accepting responsibility and apologizing to those affected by our mistakes,
-  and learning from the experience.
-* Focusing on what is best not just for us as individuals, but for the overall
-  community.
-
-Examples of unacceptable behavior include:
-
-* The use of sexualized language or imagery, and sexual attention or advances of
-  any kind.
-* Trolling, insulting or derogatory comments, and personal or political attacks.
-* Public or private harassment.
-* Publishing others' private information, such as a physical or email address,
-  without their explicit permission.
-* Other conduct which could reasonably be considered inappropriate in a
-  professional setting.
-
-## Enforcement Responsibilities
-
-Community leaders are responsible for clarifying and enforcing our standards of
-acceptable behavior and will take appropriate and fair corrective action in
-response to any behavior that they deem inappropriate, threatening, offensive,
-or harmful.
-
-Community leaders have the right and responsibility to remove, edit, or reject
-comments, commits, code, wiki edits, issues, and other contributions that are
-not aligned to this Code of Conduct, and will communicate reasons for moderation
-decisions when appropriate.
-
-## Scope
-
-This Code of Conduct applies within all community spaces, and also applies when
-an individual is officially representing the community in public spaces.
-Examples of representing our community include using an official email address,
-posting via an official social media account, or acting as an appointed
-representative at an online or offline event.
-
-## Enforcement
-
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported to the community leaders responsible for enforcement at
-contact@openhands.dev.
-All complaints will be reviewed and investigated promptly and fairly.
-
-All community leaders are obligated to respect the privacy and security of the
-reporter of any incident.
-
-## Enforcement Guidelines
-
-Community leaders will follow these Community Impact Guidelines in determining
-the consequences for any action they deem in violation of this Code of Conduct:
-
-### 1. Correction
-
-**Community Impact**: Use of inappropriate language or other behavior deemed
-unprofessional or unwelcome in the community.
-
-**Consequence**: A private, written warning from community leaders, providing
-clarity around the nature of the violation and an explanation of why the
-behavior was inappropriate. A public apology may be requested.
-
-### 2. Warning
-
-**Community Impact**: A violation through a single incident or series of
-actions.
-
-**Consequence**: A warning with consequences for continued behavior. No
-interaction with the people involved, including unsolicited interaction with
-those enforcing the Code of Conduct, for a specified period of time. This
-includes avoiding interactions in community spaces as well as external channels
-like social media. Violating these terms may lead to a temporary or permanent
-ban.
-
-### 3. Temporary Ban
-
-**Community Impact**: A serious violation of community standards, including
-sustained inappropriate behavior.
-
-**Consequence**: A temporary ban from any sort of interaction or public
-communication with the community for a specified period of time. No public or
-private interaction with the people involved, including unsolicited interaction
-with those enforcing the Code of Conduct, is allowed during this period.
-Violating these terms may lead to a permanent ban.
-
-### 4. Permanent Ban
-
-**Community Impact**: Demonstrating a pattern of violation of community
-standards, including sustained inappropriate behavior, harassment of an
-individual, or aggression toward or disparagement of classes of individuals.
-
-**Consequence**: A permanent ban from any sort of public interaction within the
-community.
-
-### Slack Etiquettes
-
-These Slack etiquette guidelines are designed to foster an inclusive, respectful, and productive environment for all
-community members. By following these best practices, we ensure effective communication and collaboration while
-minimizing disruptions. Let’s work together to build a supportive and welcoming community!
-
-- Communicate respectfully and professionally, avoiding sarcasm or harsh language, and remember that tone can be difficult to interpret in text.
-- Use threads for specific discussions to keep channels organized and easier to follow.
-- Tag others only when their input is critical or urgent, and use @here, @channel or @everyone sparingly to minimize disruptions.
-- Be patient, as open-source contributors and maintainers often have other commitments and may need time to respond.
-- Post questions or discussions in the most relevant channel (e.g., [slack - #general](https://openhands-ai.slack.com/archives/C06P5NCGSFP) for general topics, [slack - #questions](https://openhands-ai.slack.com/archives/C06U8UTKSAD) for queries/questions).
-- When asking for help or raising issues, include necessary details like links, screenshots, or clear explanations to provide context.
-- Keep discussions in public channels whenever possible to allow others to benefit from the conversation, unless the matter is sensitive or private.
-- Always adhere to [our standards](https://github.com/OpenHands/OpenHands/blob/main/CODE_OF_CONDUCT.md#our-standards) to ensure a welcoming and collaborative environment.
-- If you choose to mute a channel, consider setting up alerts for topics that still interest you to stay engaged.
-   For Slack, Go to Settings → Notifications → My Keywords to add specific keywords that will notify you when mentioned.
-   For example, if you're here for discussions about LLMs, mute the channel if it’s too busy, but set notifications to
-   alert you only when “LLMs” appears in messages.
-
-## Attribution
-
-This Code of Conduct is adapted from the [Contributor Covenant][homepage],
-version 2.1, available at
-[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
-
-Community Impact Guidelines were inspired by
-[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
-
-For answers to common questions about this code of conduct, see the FAQ at
-[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
-[https://www.contributor-covenant.org/translations][translations].
-
-[homepage]: https://www.contributor-covenant.org
-[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
-[Mozilla CoC]: https://github.com/mozilla/diversity
-[FAQ]: https://www.contributor-covenant.org/faq
-[translations]: https://www.contributor-covenant.org/translations
@@ -1,58 +0,0 @@
-# The OpenHands Community
-
-OpenHands is a community of engineers, academics, and enthusiasts reimagining software development for an AI-powered
-world.
-
-## Mission
-
-It’s very clear that AI is changing software development. We want the developer community to drive that change
-organically, through open source.
-
-So we’re not just building friendly interfaces for AI-driven development. We’re publishing _building blocks_ that
-empower developers to create new experiences, tailored to their own habits, needs, and imagination.
-
-## Ethos
-
-We have two core values: **high openness** and **high agency**. While we don’t expect everyone in the community to
-embody these values, we want to establish them as norms.
-
-### High Openness
-
-We welcome anyone and everyone into our community by default. You don’t have to be a software developer to help us
-build. You don’t have to be pro-AI to help us learn.
-
-Our plans, our work, our successes, and our failures are all public record. We want the world to see not just the
-fruits of our work, but the whole process of growing it.
-
-We welcome thoughtful criticism, whether it’s a comment on a PR or feedback on the community as a whole.
-
-### High Agency
-
-Everyone should feel empowered to contribute to OpenHands. Whether it’s by making a PR, hosting an event, sharing
-feedback, or just asking a question, don’t hold back!
-
-OpenHands gives everyone the building blocks to create state-of-the-art developer experiences. We experiment constantly
-and love building new things.
-
-Coding, development practices, and communities are changing rapidly. We won’t hesitate to change direction and make big bets.
-
-## Relationship to All Hands
-
-OpenHands is supported by the for-profit organization [All Hands AI, Inc](https://www.openhands.dev/).
-
-All Hands was founded by three of the first major contributors to OpenHands:
-
-- Xingyao Wang, a UIUC PhD candidate who got OpenHands to the top of the SWE-bench leaderboards
-- Graham Neubig, a CMU Professor who rallied the academic community around OpenHands
-- Robert Brennan, a software engineer who architected the user-facing features of OpenHands
-
-All Hands is an important part of the OpenHands ecosystem. We’ve raised over $20M--mainly to hire developers and
-researchers who can work on OpenHands full-time, and to provide them with expensive infrastructure. ([Join us!](https://allhandsai.applytojob.com/apply/))
-
-But we see OpenHands as much larger, and ultimately more important, than All Hands. When our financial responsibility
-to investors is at odds with our social responsibility to the community—as it inevitably will be, from time to time—we
-promise to navigate that conflict thoughtfully and transparently.
-
-At some point, we may transfer custody of OpenHands to an open source foundation. But for now,
-the [Benevolent Dictator approach](http://www.catb.org/~esr/writings/cathedral-bazaar/homesteading/ar01s16.html) helps us move forward with speed and intention. If we ever forget the
-“benevolent” part, please: fork us.
@@ -1,138 +1,70 @@
 # Contributing

-Thanks for your interest in contributing to OpenHands! We welcome and appreciate contributions.
+Thank you for helping improve the OpenHands Software Agent SDK.

-## Understanding OpenHands's Codebase
+This repo is a foundation. We want the SDK to stay stable and extensible so that many
+applications can build on it safely.

-To understand the codebase, please refer to the README in each module:
-- [frontend](./frontend/README.md)
-- [evaluation](./evaluation/README.md)
-- [openhands](./openhands/README.md)
-   - [agenthub](./openhands/agenthub/README.md)
-   - [server](./openhands/server/README.md)
+Downstream applications we actively keep in mind:
+- [OpenHands-CLI](https://github.com/OpenHands/OpenHands-CLI) (client)
+- [OpenHands app-server](https://github.com/OpenHands/OpenHands/blob/main/openhands/app_server/README.md) (client)
+- [OpenHands Enterprise](https://github.com/OpenHands/OpenHands/blob/main/enterprise/README.md) (client)

-## Setting up Your Development Environment
+The SDK itself has a Python interface. In addition, the
+[agent-server](https://docs.openhands.dev/sdk/guides/agent-server/overview) is the
+REST/WebSocket server component that exposes the SDK for remote execution and integrations.
+Changes should keep both interfaces stable and consistent.

-We have a separate doc [Development.md](https://github.com/OpenHands/OpenHands/blob/main/Development.md) that tells
-you how to set up a development workflow.
+## A lesson we learned (why we care about architecture)

-## How Can I Contribute?
+In earlier iterations, we repeatedly ran into a failure mode: the needs (or assumptions) of
+downstream applications would leak into core logic.

-There are many ways that you can contribute:
+That kind of coupling can feel convenient in the moment, but it tends to create subtle
+breakage elsewhere: different environments, different workspaces, different execution modes,
+and different evaluation setups.

-1. **Download and use** OpenHands, and send [issues](https://github.com/OpenHands/OpenHands/issues) when you encounter something that isn't working or a feature that you'd like to see.
-2. **Send feedback** after each session by [clicking the thumbs-up thumbs-down buttons](https://docs.openhands.dev/usage/feedback), so we can see where things are working and failing, and also build an open dataset for training code agents.
-3. **Improve the Codebase** by sending [PRs](#sending-pull-requests-to-openhands) (see details below). In particular, we have some [good first issues](https://github.com/OpenHands/OpenHands/labels/good%20first%20issue) that may be ones to start on.
+The architecture of OpenHands V0 was too monolithic to support multiple applications, whether built
+into it (as the CLI, evaluation scripts, and web server were) or built on top of it (as OpenHands Cloud was).

-## What Can I Build?
+If you’re interested in the deeper background and lessons learned, see our write-up:
+[OpenHands: An Open Platform for AI Software Developers as Generalist Agents](https://arxiv.org/abs/2511.03690)

-Here are a few ways you can help improve the codebase.
+This SDK exists (as a separate, rebuilt foundation) to avoid that failure mode.

-#### UI/UX
+## Principles we review PRs with

-We're always looking to improve the look and feel of the application. If you've got a small fix
-for something that's bugging you, feel free to open up a PR that changes the [`./frontend`](./frontend) directory.
+We welcome all contributions, big or small, to improve or extend the software agent SDK.

-If you're looking to make a bigger change, add a new UI element, or significantly alter the style
-of the application, please open an issue first, or better, join the #dev-ui-ux channel in our Slack
-to gather consensus from our design team first.
+You may find that occasionally we are opinionated about several things:

-#### Improving the agent
+- **OpenHands SDK is its own thing**: its downstream consumers are client applications.
+- **Prefer interfaces over special cases**: if a client needs something, add or improve a
+  clean, reusable interface/extension point instead of adding a shortcut.
+- **Extensibility over one-off patches**: design features so multiple clients can adopt them
+  without rewriting core logic.
+- **Avoid hidden assumptions**: don’t rely on particular env vars, workspace layouts, request
+  contexts, or runtime quirks that only exist in one app.
+  - Workspaces *do* encode environment specifics (local/Docker/remote), but keep those assumptions
+    explicit (params + validation) and contained to the `workspace` layer.
+- **No client-specific code paths**: avoid logic that only makes sense for one
+  downstream app.
+  - It’s fine to have multiple workspace implementations; it’s not fine for SDK core behavior to
+    branch on whether the caller is CLI/app-server/SaaS. Prefer capabilities/config over app-identity.
+- **Keep the agent loop stable**: treat stability as a feature; be cautious with control-flow
+  changes and "small" behavior tweaks.
+- **Compatibility is part of the API**: if something could break downstream clients, call it
+  out explicitly and consider a migration path. We have a deprecation mechanism you may want to use.

-Our main agent is the CodeAct agent. You can [see its prompts here](https://github.com/OpenHands/OpenHands/tree/main/openhands/agenthub/codeact_agent).
+If you’re not sure whether a change crosses these lines, please ask early. We’re happy to help think
+through the shape of a clean interface.

-Changes to these prompts, and to the underlying behavior in Python, can have a huge impact on user experience.
-You can try modifying the prompts to see how they change the behavior of the agent as you use the app
-locally, but we will need to do an end-to-end evaluation of any changes here to ensure that the agent
-is getting better over time.
+## Practical pointers

-We use the [SWE-bench](https://www.swebench.com/) benchmark to test our agent. You can join the #evaluation
-channel in Slack to learn more.
+This file is mostly about principles. For the mechanics, please see:
+- [AGENTS.md](AGENTS.md) for AI agents
+- [DEVELOPMENT.md](DEVELOPMENT.md) for humans

-#### Adding a new agent
+## Questions / discussion

-You may want to experiment with building new types of agents. You can add an agent to [`openhands/agenthub`](./openhands/agenthub)
-to help expand the capabilities of OpenHands.
-
-#### Adding a new runtime
-
-The agent needs a place to run code and commands. When you run OpenHands on your laptop, it uses a Docker container
-to do this by default. But there are other ways of creating a sandbox for the agent.
-
-If you work for a company that provides a cloud-based runtime, you could help us add support for that runtime
-by implementing the [interface specified here](https://github.com/OpenHands/OpenHands/blob/main/openhands/runtime/base.py).
-
-#### Testing
-
-When you write code, it is also good to write tests. Please navigate to the [`./tests`](./tests) folder to see existing
-test suites. At the moment, we have these kinds of tests: [`unit`](./tests/unit), [`runtime`](./tests/runtime), and [`end-to-end (e2e)`](./tests/e2e).
-Please refer to the README for each test suite. These tests also run on GitHub's continuous integration to ensure
-quality of the project.
-
-## Sending Pull Requests to OpenHands
-
-You'll need to fork our repository to send us a Pull Request. You can learn more
-about how to fork a GitHub repo and open a PR with your changes in [this article](https://medium.com/swlh/forks-and-pull-requests-how-to-contribute-to-github-repos-8843fac34ce8).
-
-### Pull Request title
-
-As described [here](https://github.com/commitizen/conventional-commit-types/blob/master/index.json), ideally a valid PR title should begin with one of the following prefixes:
-
-- `feat`: A new feature
-- `fix`: A bug fix
-- `docs`: Documentation only changes
-- `style`: Changes that do not affect the meaning of the code (white space, formatting, missing semicolons, etc.)
-- `refactor`: A code change that neither fixes a bug nor adds a feature
-- `perf`: A code change that improves performance
-- `test`: Adding missing tests or correcting existing tests
-- `build`: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
-- `ci`: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
-- `chore`: Other changes that don't modify src or test files
-- `revert`: Reverts a previous commit
-
-For example, a PR title could be:
-- `refactor: modify package path`
-- `feat(frontend): xxxx`, where `(frontend)` means that this PR mainly focuses on the frontend component.
-
-You may also check out previous PRs in the [PR list](https://github.com/OpenHands/OpenHands/pulls).
-
-### Pull Request description
-
-- If your PR is small (such as a typo fix), you can go brief.
-- If it contains a lot of changes, it's better to write more details.
-
-If your changes are user-facing (e.g. a new feature in the UI, a change in behavior, or a bugfix)
-please include a short message that we can add to our changelog.
-
-## How to Make Effective Contributions
-
-### Opening Issues
-
-If you notice any bugs or have any feature requests please open them via the [issues page](https://github.com/OpenHands/OpenHands/issues). We will triage
-based on how critical the bug is or how potentially useful the improvement is, discuss, and implement the ones that
-the community has interest/effort for.
-
-Further, if you see an issue you like, please leave a "thumbs-up" or a comment, which will help us prioritize.
-
-### Making Pull Requests
-
-We're generally happy to consider all pull requests, with the evaluation process varying based on the type of change:
-
-#### For Small Improvements
-
-Small improvements with few downsides are typically reviewed and approved quickly.
-One thing to check when making changes is to ensure that all continuous integration tests pass, which you can check
-before getting a review.
-
-#### For Core Agent Changes
-
-We need to be more careful with changes to the core agent, as it is imperative to maintain high quality. These PRs are
-evaluated based on three key metrics:
-
-1. **Accuracy**
-2. **Efficiency**
-3. **Code Complexity**
-
-If it improves accuracy, efficiency, or both with only a minimal change to code quality, that's great, and we're happy to merge it in!
-If there are bigger tradeoffs (e.g. helping efficiency a lot and hurting accuracy a little), we might want to put it behind a feature flag.
-Either way, please feel free to discuss on GitHub issues or Slack, and we will give guidance and preliminary feedback.
+Join us on Slack: https://openhands.dev/joinslack
@@ -1,328 +0,0 @@
-# Credits
-
-## Contributors
-
-We would like to thank all the [contributors](https://github.com/OpenHands/OpenHands/graphs/contributors) who have
-helped make OpenHands possible. We greatly appreciate your dedication and hard work.
-
-## Open Source Projects
-
-OpenHands includes and adapts the following open source projects. We are grateful for their contributions to the
-open source community:
-
-#### [SWE Agent](https://github.com/princeton-nlp/swe-agent)
-   - License: MIT License
-   - Description: Adapted for use in OpenHands's agent hub
-
-#### [Aider](https://github.com/paul-gauthier/aider)
-   - License: Apache License 2.0
-   - Description: AI pair programming tool. OpenHands has adapted and integrated its linter module for code-related tasks in [`agentskills utilities`](https://github.com/OpenHands/OpenHands/tree/main/openhands/runtime/plugins/agent_skills/utils/aider)
-
-#### [BrowserGym](https://github.com/ServiceNow/BrowserGym)
-   - License: Apache License 2.0
-   - Description: Adapted in implementing the browsing agent
-
-### Reference Implementations for Evaluation Benchmarks
-
-OpenHands integrates code of the reference implementations for the following agent evaluation benchmarks:
-
-#### [HumanEval](https://github.com/openai/human-eval)
-   - License: MIT License
-
-#### [DSP](https://github.com/microsoft/DataScienceProblems)
-   - License: MIT License
-
-#### [HumanEvalPack](https://github.com/bigcode-project/bigcode-evaluation-harness)
-   - License: Apache License 2.0
-
-#### [AgentBench](https://github.com/THUDM/AgentBench)
-   - License: Apache License 2.0
-
-#### [SWE-Bench](https://github.com/princeton-nlp/SWE-bench)
-   - License: MIT License
-
-#### [BIRD](https://bird-bench.github.io/)
-   - License: MIT License
-   - Dataset: CC-BY-SA 4.0
-
-#### [Gorilla APIBench](https://github.com/ShishirPatil/gorilla)
-   - License: Apache License 2.0
-
-#### [GPQA](https://github.com/idavidrein/gpqa)
-   - License: MIT License
-
-#### [ProntoQA](https://github.com/asaparov/prontoqa)
-   - License: Apache License 2.0
-
-## Open Source licenses
-
-### MIT License
-
-Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
-documentation files (the "Software"), to deal in the Software without restriction, including without limitation the
-rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit
-persons to whom the Software is furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all copies or substantial portions of the
-Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
-THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS
-OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
-OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-
-### BSD 3-Clause License
-
-Redistribution and use in source and binary forms, with or without modification, are permitted provided that the
-following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following
-   disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following
-   disclaimer in the documentation and/or other materials provided with the distribution.
-
-3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote
-   products derived from this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
-INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
-WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-### Apache License 2.0
-
-
-                                 Apache License
-                           Version 2.0, January 2004
-                        http://www.apache.org/licenses/
-
-   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
-   1. Definitions.
-
-      "License" shall mean the terms and conditions for use, reproduction,
-      and distribution as defined by Sections 1 through 9 of this document.
-
-      "Licensor" shall mean the copyright owner or entity authorized by
-      the copyright owner that is granting the License.
-
-      "Legal Entity" shall mean the union of the acting entity and all
-      other entities that control, are controlled by, or are under common
-      control with that entity. For the purposes of this definition,
-      "control" means (i) the power, direct or indirect, to cause the
-      direction or management of such entity, whether by contract or
-      otherwise, or (ii) ownership of fifty percent (50%) or more of the
-      outstanding shares, or (iii) beneficial ownership of such entity.
-
-      "You" (or "Your") shall mean an individual or Legal Entity
-      exercising permissions granted by this License.
-
-      "Source" form shall mean the preferred form for making modifications,
-      including but not limited to software source code, documentation
-      source, and configuration files.
-
-      "Object" form shall mean any form resulting from mechanical
-      transformation or translation of a Source form, including but
-      not limited to compiled object code, generated documentation,
-      and conversions to other media types.
-
-      "Work" shall mean the work of authorship, whether in Source or
-      Object form, made available under the License, as indicated by a
-      copyright notice that is included in or attached to the work
-      (an example is provided in the Appendix below).
-
-      "Derivative Works" shall mean any work, whether in Source or Object
-      form, that is based on (or derived from) the Work and for which the
-      editorial revisions, annotations, elaborations, or other modifications
-      represent, as a whole, an original work of authorship. For the purposes
-      of this License, Derivative Works shall not include works that remain
-      separable from, or merely link (or bind by name) to the interfaces of,
-      the Work and Derivative Works thereof.
-
-      "Contribution" shall mean any work of authorship, including
-      the original version of the Work and any modifications or additions
-      to that Work or Derivative Works thereof, that is intentionally
-      submitted to Licensor for inclusion in the Work by the copyright owner
-      or by an individual or Legal Entity authorized to submit on behalf of
-      the copyright owner. For the purposes of this definition, "submitted"
-      means any form of electronic, verbal, or written communication sent
-      to the Licensor or its representatives, including but not limited to
-      communication on electronic mailing lists, source code control systems,
-      and issue tracking systems that are managed by, or on behalf of, the
-      Licensor for the purpose of discussing and improving the Work, but
-      excluding communication that is conspicuously marked or otherwise
-      designated in writing by the copyright owner as "Not a Contribution."
-
-      "Contributor" shall mean Licensor and any individual or Legal Entity
-      on behalf of whom a Contribution has been received by Licensor and
-      subsequently incorporated within the Work.
-
-   2. Grant of Copyright License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      copyright license to reproduce, prepare Derivative Works of,
-      publicly display, publicly perform, sublicense, and distribute the
-      Work and such Derivative Works in Source or Object form.
-
-   3. Grant of Patent License. Subject to the terms and conditions of
-      this License, each Contributor hereby grants to You a perpetual,
-      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-      (except as stated in this section) patent license to make, have made,
-      use, offer to sell, sell, import, and otherwise transfer the Work,
-      where such license applies only to those patent claims licensable
-      by such Contributor that are necessarily infringed by their
-      Contribution(s) alone or by combination of their Contribution(s)
-      with the Work to which such Contribution(s) was submitted. If You
-      institute patent litigation against any entity (including a
-      cross-claim or counterclaim in a lawsuit) alleging that the Work
-      or a Contribution incorporated within the Work constitutes direct
-      or contributory patent infringement, then any patent licenses
-      granted to You under this License for that Work shall terminate
-      as of the date such litigation is filed.
-
-   4. Redistribution. You may reproduce and distribute copies of the
-      Work or Derivative Works thereof in any medium, with or without
-      modifications, and in Source or Object form, provided that You
-      meet the following conditions:
-
-      (a) You must give any other recipients of the Work or
-          Derivative Works a copy of this License; and
-
-      (b) You must cause any modified files to carry prominent notices
-          stating that You changed the files; and
-
-      (c) You must retain, in the Source form of any Derivative Works
-          that You distribute, all copyright, patent, trademark, and
-          attribution notices from the Source form of the Work,
-          excluding those notices that do not pertain to any part of
-          the Derivative Works; and
-
-      (d) If the Work includes a "NOTICE" text file as part of its
-          distribution, then any Derivative Works that You distribute must
-          include a readable copy of the attribution notices contained
-          within such NOTICE file, excluding those notices that do not
-          pertain to any part of the Derivative Works, in at least one
-          of the following places: within a NOTICE text file distributed
-          as part of the Derivative Works; within the Source form or
-          documentation, if provided along with the Derivative Works; or,
-          within a display generated by the Derivative Works, if and
-          wherever such third-party notices normally appear. The contents
-          of the NOTICE file are for informational purposes only and
-          do not modify the License. You may add Your own attribution
-          notices within Derivative Works that You distribute, alongside
-          or as an addendum to the NOTICE text from the Work, provided
-          that such additional attribution notices cannot be construed
-          as modifying the License.
-
-      You may add Your own copyright statement to Your modifications and
-      may provide additional or different license terms and conditions
-      for use, reproduction, or distribution of Your modifications, or
-      for any such Derivative Works as a whole, provided Your use,
-      reproduction, and distribution of the Work otherwise complies with
-      the conditions stated in this License.
-
-   5. Submission of Contributions. Unless You explicitly state otherwise,
-      any Contribution intentionally submitted for inclusion in the Work
-      by You to the Licensor shall be under the terms and conditions of
-      this License, without any additional terms or conditions.
-      Notwithstanding the above, nothing herein shall supersede or modify
-      the terms of any separate license agreement you may have executed
-      with Licensor regarding such Contributions.
-
-   6. Trademarks. This License does not grant permission to use the trade
-      names, trademarks, service marks, or product names of the Licensor,
-      except as required for reasonable and customary use in describing the
-      origin of the Work and reproducing the content of the NOTICE file.
-
-   7. Disclaimer of Warranty. Unless required by applicable law or
-      agreed to in writing, Licensor provides the Work (and each
-      Contributor provides its Contributions) on an "AS IS" BASIS,
-      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-      implied, including, without limitation, any warranties or conditions
-      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-      PARTICULAR PURPOSE. You are solely responsible for determining the
-      appropriateness of using or redistributing the Work and assume any
-      risks associated with Your exercise of permissions under this License.
-
-   8. Limitation of Liability. In no event and under no legal theory,
-      whether in tort (including negligence), contract, or otherwise,
-      unless required by applicable law (such as deliberate and grossly
-      negligent acts) or agreed to in writing, shall any Contributor be
-      liable to You for damages, including any direct, indirect, special,
-      incidental, or consequential damages of any character arising as a
-      result of this License or out of the use or inability to use the
-      Work (including but not limited to damages for loss of goodwill,
-      work stoppage, computer failure or malfunction, or any and all
-      other commercial damages or losses), even if such Contributor
-      has been advised of the possibility of such damages.
-
-   9. Accepting Warranty or Additional Liability. While redistributing
-      the Work or Derivative Works thereof, You may choose to offer,
-      and charge a fee for, acceptance of support, warranty, indemnity,
-      or other liability obligations and/or rights consistent with this
-      License. However, in accepting such obligations, You may act only
-      on Your own behalf and on Your sole responsibility, not on behalf
-      of any other Contributor, and only if You agree to indemnify,
-      defend, and hold each Contributor harmless for any liability
-      incurred by, or claims asserted against, such Contributor by reason
-      of your accepting any such warranty or additional liability.
-
-   END OF TERMS AND CONDITIONS
-
-   APPENDIX: How to apply the Apache License to your work.
-
-      To apply the Apache License to your work, attach the following
-      boilerplate notice, with the fields enclosed by brackets "[]"
-      replaced with your own identifying information. (Don't include
-      the brackets!)  The text should be enclosed in the appropriate
-      comment syntax for the file format. We also recommend that a
-      file or class name and description of purpose be included on the
-      same "printed page" as the copyright notice for easier
-      identification within third-party archives.
-
-   Copyright [yyyy] [name of copyright owner]
-
-### Non-Open Source Reference Implementations:
-
-#### [MultiPL-E](https://github.com/nuprl/MultiPL-E)
-   - License: BSD 3-Clause License with Machine Learning Restriction
-
-BSD 3-Clause License with Machine Learning Restriction
-
-Copyright (c) 2022, Northeastern University, Oberlin College, Roblox Inc,
-Stevens Institute of Technology, University of Massachusetts Amherst, and
-Wellesley College.
-
-All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are met:
-
-1. Redistributions of source code must retain the above copyright notice, this
-   list of conditions and the following disclaimer.
-
-2. Redistributions in binary form must reproduce the above copyright notice,
-   this list of conditions and the following disclaimer in the documentation
-   and/or other materials provided with the distribution.
-
-3. Neither the name of the copyright holder nor the names of its
-   contributors may be used to endorse or promote products derived from
-   this software without specific prior written permission.
-
-4.  The contents of this repository may not be used as training data for any
-    machine learning model, including but not limited to neural networks.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
-FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
-SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
-CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,48 @@
+# Development Guide
+
+## Setup
+
+```bash
+git clone https://github.com/OpenHands/agent-sdk.git
+cd agent-sdk
+make build
+```
+
+## Code Quality
+
+```bash
+make format                              # Format code
+make lint                                # Lint code
+uv run pre-commit run --all-files        # Run all checks
+```
+
+Pre-commit hooks run automatically on commit with type checking and linting.
+
+## Testing
+
+```bash
+uv run pytest                            # All tests
+uv run pytest tests/sdk/                 # SDK tests only
+uv run pytest tests/tools/               # Tools tests only
+```
+
+## Project Structure
+
+```
+agent-sdk/
+├── openhands-sdk/          # Core SDK package
+├── openhands-tools/        # Built-in tools
+├── openhands-workspace/    # Workspace management
+├── openhands-agent-server/ # Agent server
+├── examples/               # Usage examples
+└── tests/                  # Test suites
+```
+
+## Contributing
+
+1. Create a new branch
+2. Make your changes
+3. Run tests and checks
+4. Push and create a pull request
+
+For questions, join our [Slack community](https://openhands.dev/joinslack).
@@ -1,206 +0,0 @@
-# Development Guide
-
-This guide is for people working on OpenHands and editing the source code.
-If you wish to contribute your changes, check out the
-[CONTRIBUTING.md](https://github.com/OpenHands/OpenHands/blob/main/CONTRIBUTING.md)
-on how to clone and set up the project initially before moving on. Otherwise,
-you can clone the OpenHands project directly.
-
-## Start the Server for Development
-
-### 1. Requirements
-
- Linux, Mac OS, or [WSL on Windows](https://learn.microsoft.com/en-us/windows/wsl/install) [Ubuntu >= 22.04]
- [Docker](https://docs.docker.com/engine/install/) (For those on MacOS, make sure to allow the default Docker socket to be used from advanced settings!)
- [Python](https://www.python.org/downloads/) = 3.12
- [NodeJS](https://nodejs.org/en/download/package-manager) >= 22.x
- [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer) >= 1.8
- OS-specific dependencies:
-  - Ubuntu: build-essential => `sudo apt-get install build-essential python3.12-dev`
-  - WSL: netcat => `sudo apt-get install netcat`
-
-Make sure you have all these dependencies installed before moving on to `make build`.
-
-#### Dev container
-
-There is a [dev container](https://containers.dev/) available which provides a
-pre-configured environment with all the necessary dependencies installed if you
-are using a [supported editor or tool](https://containers.dev/supporting). For
-example, if you are using Visual Studio Code (VS Code) with the
-[Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
-extension installed, you can open the project in a dev container by using the
-_Dev Container: Reopen in Container_ command from the Command Palette
-(Ctrl+Shift+P).
-
-#### Develop without sudo access
-
-If you want to develop without system admin/sudo access to upgrade/install `Python` and/or `NodeJS`, you can use
-`conda` or `mamba` to manage the packages for you:
-
-```bash
-# Download and install Mamba (a faster version of conda)
-curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
-bash Miniforge3-$(uname)-$(uname -m).sh
-
-# Install Python 3.12, nodejs, and poetry
-mamba install python=3.12
-mamba install conda-forge::nodejs
-mamba install conda-forge::poetry
-```
-
-### 2. Build and Setup The Environment
-
-Begin by building the project which includes setting up the environment and installing dependencies. This step ensures
-that OpenHands is ready to run on your system:
-
-```bash
-make build
-```
-
-### 3. Configuring the Language Model
-
-OpenHands supports a diverse array of Language Models (LMs) through the powerful [litellm](https://docs.litellm.ai) library.
-
-To configure the LM of your choice, run:
-
-```bash
-make setup-config
-```
-
-This command will prompt you to enter the LLM API key, model name, and other variables ensuring that OpenHands is
-tailored to your specific needs. Note that the model name will apply only when you run headless. If you use the UI,
-please set the model in the UI.
-
-Note: If you have previously run OpenHands using the docker command, you may have already set some environment
-variables in your terminal. The final configurations are set from highest to lowest priority:
-Environment variables > config.toml variables > default variables
-
-**Note on Alternative Models:**
-See [our documentation](https://docs.openhands.dev/usage/llms) for recommended models.
-
-### 4. Running the application
-
-#### Option A: Run the Full Application
-
-Once the setup is complete, this command starts both the backend and frontend servers, allowing you to interact with OpenHands:
-
-```bash
-make run
-```
-
-#### Option B: Individual Server Startup
-
- **Start the Backend Server:** If you prefer, you can start the backend server independently to focus on
-  backend-related tasks or configurations.
-
-  ```bash
-  make start-backend
-  ```
-
- **Start the Frontend Server:** Similarly, you can start the frontend server on its own to work on frontend-related
-  components or interface enhancements.
-  ```bash
-  make start-frontend
-  ```
-
-### 5. Running OpenHands with OpenHands
-
-You can use OpenHands to develop and improve OpenHands itself! This is a powerful way to leverage AI assistance for contributing to the project.
-
-#### Quick Start
-
-1. **Build and run OpenHands:**
-
-   ```bash
-   export INSTALL_DOCKER=0
-   export RUNTIME=local
-   make build && make run
-   ```
-
-2. **Access the interface:**
-
-   - Local development: http://localhost:3001
-   - Remote/cloud environments: Use the appropriate external URL
-
-3. **Configure for external access (if needed):**
-   ```bash
-   # For external access (e.g., cloud environments)
-   make run FRONTEND_PORT=12000 FRONTEND_HOST=0.0.0.0 BACKEND_HOST=0.0.0.0
-   ```
-
-### 6. LLM Debugging
-
-If you encounter any issues with the Language Model (LM) or you're simply curious, export DEBUG=1 in the environment and restart the backend.
-OpenHands will log the prompts and responses in the logs/llm/CURRENT_DATE directory, allowing you to identify the causes.
-
-### 7. Help
-
-Need help or info on available targets and commands? Use the help command for all the guidance you need with OpenHands.
-
-```bash
-make help
-```
-
-### 8. Testing
-
-To run tests, refer to the following:
-
-#### Unit tests
-
-```bash
-poetry run pytest ./tests/unit/test_*.py
-```
-
-### 9. Add or update dependency
-
-1. Add your dependency in `pyproject.toml` or use `poetry add xxx`.
-2. Update the poetry.lock file via `poetry lock --no-update`.
-
-### 10. Use existing Docker image
-
-To reduce build time (e.g., if no changes were made to the client-runtime component), you can use an existing Docker
-container image by setting the SANDBOX_RUNTIME_CONTAINER_IMAGE environment variable to the desired Docker image.
-
-Example: `export SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/openhands/runtime:1.2-nikolaik`
-
-## Develop inside Docker container
-
-TL;DR
-
-```bash
-make docker-dev
-```
-
-See more details [here](./containers/dev/README.md).
-
-If you are just interested in running `OpenHands` without installing all the required tools on your host, run:
-
-```bash
-make docker-run
-```
-
-If you do not have `make` on your host, run:
-
-```bash
-cd ./containers/dev
-./dev.sh
-```
-
-You do need [Docker](https://docs.docker.com/engine/install/) installed on your host though.
-
-## Key Documentation Resources
-
-Here's a guide to the important documentation files in the repository:
-
- [/README.md](./README.md): Main project overview, features, and basic setup instructions
- [/Development.md](./Development.md) (this file): Comprehensive guide for developers working on OpenHands
- [/CONTRIBUTING.md](./CONTRIBUTING.md): Guidelines for contributing to the project, including code style and PR process
- [DOC_STYLE_GUIDE.md](https://github.com/OpenHands/docs/blob/main/openhands/DOC_STYLE_GUIDE.md): Standards for writing and maintaining project documentation
- [/openhands/README.md](./openhands/README.md): Details about the backend Python implementation
- [/frontend/README.md](./frontend/README.md): Frontend React application setup and development guide
- [/containers/README.md](./containers/README.md): Information about Docker containers and deployment
- [/tests/unit/README.md](./tests/unit/README.md): Guide to writing and running unit tests
- [/evaluation/README.md](./evaluation/README.md): Documentation for the evaluation framework and benchmarks
- [/skills/README.md](./skills/README.md): Information about the skills architecture and implementation
- [/openhands/server/README.md](./openhands/server/README.md): Server implementation details and API documentation
- [/openhands/runtime/README.md](./openhands/runtime/README.md): Documentation for the runtime environment and execution model
@@ -1,27 +0,0 @@
-# Issue Triage
-These are the procedures and guidelines on how issues are triaged in this repo by the maintainers.
-
-## General
-* All issues must be tagged with **enhancement**, **bug** or **troubleshooting/help**.
-* Issues may be tagged with what it relates to (**llm**, **app tab**, **UI/UX**, etc.).
-
-## Severity
-* **High**: High visibility issues or affecting many users.
-* **Critical**: Affecting all users or potential security issues.
-
-## Difficulty
-* Issues good for newcomers may be tagged with **good first issue**.
-
-## Not Enough Information
-* User is asked to provide more information (logs, how to reproduce, etc.) when the issue is not clear.
-* If an issue is unclear and the author does not provide more information or respond to a request,
-the issue may be closed as **not planned** (Usually after a week).
-
-## Multiple Requests/Fixes in One Issue
-* These issues will be narrowed down to one request/fix so the issue is more easily tracked and fixed.
-* Issues may be broken down into multiple issues if required.
-
-## Stale and Auto Closures
-* In order to keep a maintainable backlog, issues that have no activity within 40 days are automatically marked as **Stale**.
-* If issues marked as **Stale** continue to have no activity for 10 more days, they will automatically be closed as not planned.
-* Issues may be reopened by maintainers if deemed important.
@@ -1,30 +1,21 @@
-Portions of this software are licensed as follows:
-* All content that resides under the enterprise/ directory is licensed under the license defined in "enterprise/LICENSE".
-* Content outside of the above mentioned directories or restrictions above is available under the MIT license as defined below.
+MIT License

-=====================
+Copyright (c) 2026 OpenHands contributors

-The MIT License (MIT)
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:

-Copyright © 2025
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.

-Permission is hereby granted, free of charge, to any person
-obtaining a copy of this software and associated documentation
-files (the “Software”), to deal in the Software without
-restriction, including without limitation the rights to use,
-copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the
-Software is furnished to do so, subject to the following
-conditions:
-
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
-OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
-HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
-WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
-OTHER DEALINGS IN THE SOFTWARE.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,11 @@
+# Repository Maintainers
+#
+# Format: Each maintainer on a new line starting with "- @username"
+# This file is read by .github/workflows/assign-reviews.yml for automated triage
+#
+
+The following people are maintainers of this repository and are responsible for triage and review:
+
+- @xingyaoww
+- @neubig
+- @enyst
@@ -1,5 +1,48 @@
-# Exclude all Python bytecode files
-global-exclude *.pyc
+# This MANIFEST.in file tells setuptools which files to include
+# in the sdist package distribution used for building the Docker image

-# Exclude Python cache directories
-global-exclude __pycache__
+# ==============================================================================
+# Root-level workspace files
+# ==============================================================================
+include pyproject.toml
+include uv.lock
+
+# ==============================================================================
+# openhands-sdk
+# ==============================================================================
+include openhands-sdk/pyproject.toml
+recursive-include openhands-sdk *.py
+recursive-include openhands-sdk *.j2
+recursive-include openhands-sdk py.typed
+
+# ==============================================================================
+# openhands-tools
+# ==============================================================================
+include openhands-tools/pyproject.toml
+recursive-include openhands-tools *.py
+recursive-include openhands-tools *.j2
+recursive-include openhands-tools py.typed
+
+# ==============================================================================
+# openhands-workspace
+# ==============================================================================
+include openhands-workspace/pyproject.toml
+recursive-include openhands-workspace *.py
+recursive-include openhands-workspace py.typed
+
+# ==============================================================================
+# openhands-agent-server
+# ==============================================================================
+include openhands-agent-server/pyproject.toml
+recursive-include openhands-agent-server *.py
+recursive-include openhands-agent-server py.typed
+
+# Docker build files
+include openhands-agent-server/openhands/agent_server/docker/Dockerfile
+include openhands-agent-server/openhands/agent_server/docker/wallpaper.svg
+
+# PyInstaller spec
+include openhands-agent-server/openhands/agent_server/agent-server.spec
+
+# VSCode extensions
+recursive-include openhands-agent-server/openhands/agent_server/vscode_extensions *