AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-04-08 03:00:28 -04:00

Files

Zamil Majdy 678ddde751 refactor(backend): unify context compression into compress_context() (#11937 )

## Background

This PR consolidates and unifies context window management for the
CoPilot backend.

### Problem
The CoPilot backend had **two separate implementations** of context
window management:

1. **`service.py` → `_manage_context_window()`** - Chat service
streaming/continuation
2. **`prompt.py` → `compress_prompt()`** - Sync LLM blocks

This duplication led to inconsistent behavior, maintenance burden, and
duplicate code.

---

## Solution: Unified `compress_context()`

A single async function that handles both use cases:

| Caller | Usage | Behavior |
|--------|-------|----------|
| **Chat service** | `compress_context(msgs, client=openai_client)` |
Summarization → Truncation |
| **LLM blocks** | `compress_context(msgs, client=None)` | Truncation
only (no API call) |

---

## Strategy Order

| Step | Description | Runs When |
|------|-------------|-----------|
| **1. LLM Summarization** | Summarize old messages into single context
message, keep recent 15 | Only if `client` provided |
| **2. Content Truncation** | Progressively truncate message content
(8192→4096→...→128 tokens) | If still over limit |
| **3. Middle-out Deletion** | Delete messages one at a time from center
outward | If still over limit |
| **4. First/Last Trim** | Truncate system prompt and last message
content | Last resort |

### Why This Order?

1. **Summarization first** (if available) - Preserves semantic meaning
of old messages
2. **Content truncation before deletion** - Keeps all conversation
turns, just shorter
3. **Middle-out deletion** - More granular than dropping all old
messages at once
4. **First/last trim** - Only touch system prompt as last resort

---

## Key Fixes

| Issue | Before | After |
|-------|--------|-------|
| **Socket leak** | `AsyncOpenAI` client never closed | `async with`
context manager |
| **Timeout ignored** | `timeout=30` passed to `create()` (invalid) |
`client.with_options(timeout=30)` |
| **OpenAI tool messages** | Not truncated | Properly truncated |
| **Tool pair integrity** | OpenAI format only | Both OpenAI + Anthropic
formats |

---

## Tool Format Support

`_ensure_tool_pairs_intact()` now supports both formats:

### OpenAI Format
```python
# Assistant with tool_calls
{"role": "assistant", "tool_calls": [{"id": "call_1", ...}]}
# Tool response
{"role": "tool", "tool_call_id": "call_1", "content": "result"}
```

### Anthropic Format
```python
# Assistant with tool_use
{"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_1", ...}]}
# Tool result
{"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_1", ...}]}
```

---

## Files Changed

| File | Change |
|------|--------|
| `backend/util/prompt.py` | +450 lines: Add `CompressResult`,
`compress_context()`, helpers |
| `backend/api/features/chat/service.py` | -380 lines: Remove duplicate,
use thin wrapper |
| `backend/blocks/llm.py` | Migrate `llm_call()` to use
`compress_context(client=None)` |
| `backend/util/prompt_test.py` | +400 lines: Comprehensive tests
(OpenAI + Anthropic) |

### Removed
- `compress_prompt()` - Replaced by `compress_context(client=None)`
- `_manage_context_window()` - Replaced by
`compress_context(client=openai_client)`

---

## API

```python
async def compress_context(
    messages: list[dict],
    target_tokens: int = 120_000,
    *,
    model: str = "gpt-4o",
    client: AsyncOpenAI | None = None,  # None = truncation only
    keep_recent: int = 15,
    reserve: int = 2_048,
    start_cap: int = 8_192,
    floor_cap: int = 128,
) -> CompressResult:
    ...

@dataclass
class CompressResult:
    messages: list[dict]
    token_count: int
    was_compacted: bool
    error: str | None = None
    original_token_count: int = 0
    messages_summarized: int = 0
    messages_dropped: int = 0
```

---

## Tests Added

| Test Class | Coverage |
|------------|----------|
| `TestMsgTokens` | Token counting for regular messages, OpenAI tool
calls, Anthropic tool_use |
| `TestTruncateToolMessageContent` | OpenAI + Anthropic tool message
truncation |
| `TestEnsureToolPairsIntact` | OpenAI format (3 tests), Anthropic
format (3 tests), edge cases (3 tests) |
| `TestCompressContext` | No compression, truncation-only, tool pair
preservation, error handling |

---

## Checklist

- [x] Code follows project conventions
- [x] Linting passes (`poetry run format`)
- [x] Type checking passes (`pyright`)
- [x] Tests added for all new functions
- [x] Both OpenAI and Anthropic tool formats supported
- [x] Backward compatible behavior preserved
- [x] All review comments addressed

2026-02-03 10:36:10 +00:00