mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-04-08 03:00:28 -04:00
fix(backend): preserve thinking blocks during transcript compaction (#12574)
AutoPilot users hit `invalid_request_error` ("thinking or
redacted_thinking blocks in the latest assistant message cannot be
modified") when sessions get long enough to trigger transcript
compaction. The Anthropic API requires thinking blocks in the last
assistant message to be byte-for-byte identical to the original response
— our compaction was flattening them to plain text, destroying the
cryptographic signatures.
Reported in Discord `#breakage` by John Ababseh with session
`31d3f08a-cb94-45eb-9fce-56b3f0287ef4`.
- **`compact_transcript`** now splits the transcript into a compressible
prefix and a preserved tail (last assistant entry + trailing entries).
Only the prefix is compressed; the tail is re-appended verbatim,
preserving thinking blocks exactly.
- **`_flatten_assistant_content`** now silently drops `thinking` and
`redacted_thinking` blocks instead of creating `[__thinking__]`
placeholders — they carry no useful context for compression summaries.
- **`response_adapter`** explicitly handles `ThinkingBlock` (skip
gracefully instead of silently falling through the isinstance chain).
- **`_format_sdk_content_blocks`** now passes through raw dict blocks
(e.g. `redacted_thinking` that the SDK may not have a typed class for)
verbatim to the transcript.
The key insight is the Anthropic API's asymmetric constraint:
- **Last assistant message**: thinking/redacted_thinking blocks must be
preserved byte-for-byte
- **Older assistant messages**: thinking blocks can be removed entirely
`compact_transcript` uses `_find_last_assistant_entry()` to split the
JSONL into two parts:
1. **Prefix** (everything before the last assistant): flattened and
compressed normally
2. **Tail** (last assistant + any trailing user message): preserved
verbatim and re-chained via `_rechain_tail()` to maintain the
`parentUuid` chain
This ensures the API always sees the original thinking blocks in the
last assistant message while still achieving meaningful compression on
older turns.
- [x] 25 new tests across `thinking_blocks_test.py` (TDD: written before
implementation)
- [x] `_find_last_assistant_entry` splits correctly at last assistant,
handles edges (no assistant, index 0, trailing user)
- [x] `_rechain_tail` patches parentUuid chain, handles empty tail
- [x] `_flatten_assistant_content` strips thinking/redacted_thinking
blocks, handles mixed content
- [x] `compact_transcript` preserves last assistant's thinking blocks
- [x] `compact_transcript` strips thinking from older assistant messages
- [x] Edge cases: trailing user message, single assistant, no thinking
blocks
- [x] `response_adapter` handles ThinkingBlock without crash
- [x] `_format_sdk_content_blocks` preserves thinking block format and
raw dict blocks
- [x] All existing copilot SDK tests pass
- [x] Pre-commit hooks (lint, format, typecheck) all pass
This commit is contained in:
@@ -689,6 +689,9 @@ COMPACT_MSG_ID_PREFIX = "msg_compact_"
|
||||
ENTRY_TYPE_MESSAGE = "message"
|
||||
|
||||
|
||||
_THINKING_BLOCK_TYPES = frozenset({"thinking", "redacted_thinking"})
|
||||
|
||||
|
||||
def _flatten_assistant_content(blocks: list) -> str:
|
||||
"""Flatten assistant content blocks into a single plain-text string.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user