Files
AutoGPT/autogpt_platform/backend/test
Tejas Dharani db1f034544 Fix Gmail body parsing for multipart messages (#9863) (#10071)
<!-- Clearly explain the need for these changes: -->

The `GmailReadBlock._get_email_body()` method was only inspecting the
top-level payload and a single `text/plain` part, causing it to return
the fallback string "This email does not contain a text body." for most
Gmail messages. This occurred because Gmail messages are typically
wrapped in `multipart/alternative` or other multipart containers, which
the original implementation couldn't handle.

This critical issue made the Gmail integration unusable for reading
email body content, as virtually every real Gmail message uses multipart
MIME structures.

<!-- Concisely describe all of the changes made in this pull request:
-->

### Changes 

#### Core Implementation:
- **Replaced simple `_get_email_body()` with recursive multipart
parser** that can walk through nested MIME structures
- **Added `_walk_for_body()` method** for recursive traversal of email
parts with depth limiting (max 10 levels)
- **Implemented safe base64 decoding** with automatic padding correction
in `_decode_base64()`
- **Added attachment body support** via `_download_attachment_body()`
for emails where body content is stored as attachments

#### Email Format Support:
- **HTML to text conversion** using `html2text` library for HTML-only
emails
- **Multipart/alternative handling** with preference for `text/plain`
over `text/html`
- **Nested multipart structure support** (e.g., `multipart/mixed`
containing `multipart/alternative`)
- **Single-part email support** (maintains backward compatibility)

#### Dependencies & Testing:
- **Added `html2text = "^2024.2.26"`** to `pyproject.toml` for HTML
conversion
- **Created comprehensive unit tests** in `test/blocks/test_gmail.py`
covering all email types and edge cases
- **Added error handling and graceful fallbacks** for malformed data and
missing dependencies

#### Security & Performance:
- **Recursion depth limiting** prevents infinite loops on malformed
email structures
- **Exception handling** ensures graceful degradation when API calls
fail
- **Efficient tree traversal** with early returns for better performance

### Checklist 

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:

<details>
<summary>Test Plan</summary>

- **Single-part text/plain emails** - Verified correct extraction of
plain text content
- **Multipart/alternative emails** - Tested preference for plain text
over HTML when both available
- **HTML-only emails** - Confirmed HTML to text conversion works
correctly
- **Nested multipart structures** - Tested deeply nested
`multipart/mixed` containing `multipart/alternative`
- **Attachment-based body content** - Verified downloading and decoding
of body stored as attachments
- **Base64 padding edge cases** - Tested malformed base64 data with
missing padding
- **Recursion depth limits** - Confirmed protection against infinite
recursion
- **Error handling scenarios** - Tested graceful fallbacks for API
failures and missing dependencies
- **Backward compatibility** - Ensured existing functionality remains
unchanged for edge cases
- **Integration testing** - Ran standalone verification script with 100%
test pass rate

</details>

#### For configuration changes:
- [x] `.env.example` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

<details>
<summary>Configuration Changes</summary>

- Added `html2text` dependency to `pyproject.toml` - no environment or
infrastructure changes required
- No changes to ports, services, secrets, or databases
- Fully backward compatible with existing Gmail API configuration

</details>

---------

Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2025-07-15 19:39:02 +00:00
..