Mirror of https://github.com/Significant-Gravitas/AutoGPT.git, synced 2026-04-08 03:00:28 -04:00
## Summary

Reduces Sentry error noise by ~90% by filtering out expected/transient errors and downgrading inappropriate error-level logs to warnings.

Most of the top Sentry issues are not actual bugs but expected conditions (user errors, transient infra, business logic) that were incorrectly logged at ERROR level, causing them to be captured as Sentry events.

## Changes

### 1. Sentry `before_send` filter (`metrics.py`)

Added a `before_send` hook to filter known expected errors before they reach Sentry:

- **AMQP/RabbitMQ connection errors**: transient during deploys/restarts
- **User credential errors**: invalid API keys, missing auth headers (user error, not a platform bug)
- **Insufficient balance**: expected business logic
- **Blocked IP access**: security check working as intended
- **Discord bot token errors**: misconfiguration, not a runtime error
- **Google metadata DNS errors**: expected in non-GCP environments
- **Inactive email recipients**: expected for bounced addresses
- **Unclosed client sessions/connectors**: resource cleanup noise

### 2. Connection retry log levels (`retry.py`)

- `conn_retry` final failure: `error` → `warning` (these are infra retries, not bugs)
- `conn_retry` wrapper final failure: `error` → `warning`
- Discord alert send failure: `error` → `warning`

### 3. Block execution Sentry capture (`manager.py`)

- Skip `sentry_sdk.capture_exception()` for `ValueError` subclasses (BlockExecutionError, BlockInputError, InsufficientBalanceError, etc.); these are user-caused errors, not platform bugs
- Downgrade executor shutdown/disconnect errors to warning

### 4. Scheduler log levels (`scheduler.py`)

- Graph validation failure: `error` → `warning` (expected for old/invalid graphs)
- Unable to unschedule graph: `error` → `warning`
- Job listener failure: `error` → `warning`
- Async operation failure: `error` → `warning`

### 5. Discord system alert (`notifications.py`)

- Wrapped the `discord_system_alert` endpoint in a try/except to prevent unhandled exceptions (fixes AUTOGPT-SERVER-743, AUTOGPT-SERVER-7MW)

### 6. Notification system log levels (`notifications.py`)

- All batch processing errors: `error` → `warning`
- User email not found: `error` → `warning`
- Notification parsing errors: `error` → `warning`
- Email sending failures: `error` → `warning`
- Summary data gathering failure: `error` → `warning`
- Cleaned up unprofessional error messages

### 7. Cloud storage cleanup (`cloud_storage.py`)

- Cleanup error: `error` → `warning`

## Sentry Issues Addressed

### AMQP/RabbitMQ (~3.4M events total)

- [AUTOGPT-SERVER-3H2](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H2): AMQPConnector ConnectionRefusedError (1.2M events)
- [AUTOGPT-SERVER-3H3](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H3): AMQPConnectionWorkflowFailed (770K events)
- [AUTOGPT-SERVER-3H4](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H4): AMQP connection workflow failed (770K events)
- [AUTOGPT-SERVER-3H5](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H5): AMQPConnectionWorkflow reporting failure (770K events)
- [AUTOGPT-SERVER-3H7](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H7): Socket failed to connect (514K events)
- [AUTOGPT-SERVER-3H8](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H8): TCP Connection attempt failed (514K events)
- [AUTOGPT-SERVER-3H6](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H6): AMQPConnectionError (93K events)
- [AUTOGPT-SERVER-7SX](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7SX): Error creating transport (69K events)
- [AUTOGPT-SERVER-1TN](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-1TN): ChannelInvalidStateError (39K events)
- [AUTOGPT-SERVER-6JC](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6JC): ConnectionClosedByBroker (2K events)
- [AUTOGPT-SERVER-6RJ/6RK/6RN/6RQ/6RP/6RR](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6RJ): Various connection failures (~15K events)
- [AUTOGPT-SERVER-4A5/6RM/7XN](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-4A5): Connection close/transport errors (~540 events)

### User Credential Errors (~15K events)

- [AUTOGPT-SERVER-6S5](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6S5): Incorrect OpenAI API key (9.2K events)
- [AUTOGPT-SERVER-7W4](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7W4): Incorrect API key in AIConditionBlock (3.4K events)
- [AUTOGPT-SERVER-83Y](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-83Y): AI condition invalid key (2.3K events)
- [AUTOGPT-SERVER-7ZP](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7ZP): Perplexity missing auth header (451 events)
- [AUTOGPT-SERVER-7XK/7XM](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7XK): Anthropic invalid key (125 events)
- [AUTOGPT-SERVER-82C](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-82C): Missing auth header (27 events)
- [AUTOGPT-SERVER-721](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-721): Ideogram invalid token (165 events)

### Business Logic / Validation (~120K events)

- [AUTOGPT-SERVER-7YQ](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7YQ): Disabled block used in graph (56K events)
- [AUTOGPT-SERVER-6W3](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6W3): Graph failed validation (46K events)
- [AUTOGPT-SERVER-6W2](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6W2): Unable to unschedule graph (46K events)
- [AUTOGPT-SERVER-83X](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-83X): Blocked IP access (15K events)
- [AUTOGPT-SERVER-6K9](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6K9): Insufficient balance (4K events)

### Discord Alert Failures (~24K events)

- [AUTOGPT-SERVER-743](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-743): Discord improper token (22K events)
- [AUTOGPT-SERVER-7MW](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7MW): Discord 403 Missing Access (1.5K events)

### Notification System (~16K events)

- [AUTOGPT-SERVER-550](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-550): Notification batch create error (8.3K events)
- [AUTOGPT-SERVER-58H](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-58H): ValidationError for NotificationEventModel (3K events)
- [AUTOGPT-SERVER-5C6](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-5C6): Get notification batch error (2.1K events)
- [AUTOGPT-SERVER-4BT](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-4BT): Notification batch create error (1.8K events)
- [AUTOGPT-SERVER-5E4](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-5E4): NotificationPreference validation (1.4K events)
- [AUTOGPT-SERVER-508](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-508): Inactive email recipients (702 events)

### Infrastructure / Transient (~20K events)

- [AUTOGPT-SERVER-6WJ](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6WJ): Unclosed client session (13K events)
- [AUTOGPT-SERVER-745](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-745): Unclosed connector (5.8K events)
- [AUTOGPT-SERVER-4V1](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-4V1): Google metadata DNS error (2.2K events)
- [AUTOGPT-SERVER-80J](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-80J): CloudStorage DNS error (35 events)

### Executor Shutdown

- [AUTOGPT-SERVER-55J](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-55J): Error disconnecting run client (118 events)

## Test plan

- [x] All pre-commit hooks pass (Ruff, isort, Black, Pyright typecheck)
- [x] All changed modules import successfully
- [ ] Deploy to staging and verify Sentry event volume drops significantly
- [ ] Verify legitimate errors still appear in Sentry
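Several of the changes above (section 2 in particular) only move a final-failure log from `error` to `warning`; the retry behaviour itself is unchanged. The sketch below illustrates that pattern with a hypothetical `retry_with_warning` decorator. It is not the `conn_retry` code from `retry.py`, which is not shown here; the decorator name, retrying only on `ConnectionError`, and the fixed `delay` are assumptions made for illustration.

```python
import functools
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("retry_sketch")


def retry_with_warning(max_attempts: int = 3, delay: float = 0.0):
    """Hypothetical retry decorator (not the real conn_retry).

    Retries on ConnectionError; on the final failure it logs at WARNING
    and re-raises, so callers still see the exception.
    """

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except ConnectionError as exc:
                    if attempt == max_attempts:
                        # Final failure: WARNING instead of ERROR, so a logging
                        # integration that only promotes ERROR records to
                        # events will not open an issue for it.
                        logger.warning(
                            "%s failed after %d attempts: %s",
                            func.__name__,
                            attempt,
                            exc,
                        )
                        raise
                    time.sleep(delay)  # back off before the next attempt
            # Only reachable when the loop never runs.
            raise ValueError("max_attempts must be >= 1")

        return wrapper

    return decorator
```

With this shape, a call that succeeds on its third attempt logs nothing, while one that never succeeds logs a single WARNING and re-raises the last `ConnectionError`.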
179 lines
6.2 KiB
Python
```python
import logging
from enum import Enum

import sentry_sdk
from pydantic import SecretStr
from sentry_sdk.integrations import DidNotEnable
from sentry_sdk.integrations.anthropic import AnthropicIntegration
from sentry_sdk.integrations.asyncio import AsyncioIntegration
from sentry_sdk.integrations.launchdarkly import LaunchDarklyIntegration
from sentry_sdk.integrations.logging import LoggingIntegration

from backend.util import feature_flag
from backend.util.settings import BehaveAs, Settings

settings = Settings()
logger = logging.getLogger(__name__)


class DiscordChannel(str, Enum):
    PLATFORM = "platform"  # For platform/system alerts
    PRODUCT = "product"  # For product alerts (low balance, zero balance, etc.)
```
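Because `DiscordChannel` mixes in `str`, its members compare equal to plain strings, and a raw value (for example one read from config) can be turned back into a member by calling the enum. A minimal standalone check, repeating the two members defined above:

```python
from enum import Enum


class DiscordChannel(str, Enum):
    PLATFORM = "platform"
    PRODUCT = "product"


# Members are real str instances, so they compare equal to plain strings.
assert isinstance(DiscordChannel.PRODUCT, str)
assert DiscordChannel.PRODUCT == "product"

# Calling the enum with a raw value returns the existing member.
assert DiscordChannel("platform") is DiscordChannel.PLATFORM

# .value is the portable way to get the bare string: f-string formatting of
# a (str, Enum) member has changed across Python versions.
assert DiscordChannel.PRODUCT.value == "product"
```

An unknown value such as `DiscordChannel("random")` raises `ValueError`, which is one reason `discord_send_alert` can rely on receiving a valid member.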
```python
def _before_send(event, hint):
    """Filter out expected/transient errors from Sentry to reduce noise."""
    if "exc_info" in hint:
        exc_type, exc_value, _ = hint["exc_info"]
        exc_msg = str(exc_value).lower() if exc_value else ""

        # AMQP/RabbitMQ transient connection errors: expected during deploys
        amqp_keywords = [
            "amqpconnection",
            "amqpconnector",
            "connection_forced",
            "channelinvalidstateerror",
            "no active transport",
        ]
        if any(kw in exc_msg for kw in amqp_keywords):
            return None

        # "connection refused" only for AMQP-related exceptions (not other services)
        if "connection refused" in exc_msg:
            exc_module = getattr(exc_type, "__module__", "") or ""
            exc_name = getattr(exc_type, "__name__", "") or ""
            amqp_indicators = ["aio_pika", "aiormq", "amqp", "pika", "rabbitmq"]
            if any(
                ind in exc_module.lower() or ind in exc_name.lower()
                for ind in amqp_indicators
            ) or any(kw in exc_msg for kw in ["amqp", "pika", "rabbitmq"]):
                return None

        # User-caused credential/auth errors: not platform bugs
        user_auth_keywords = [
            "incorrect api key",
            "invalid x-api-key",
            "missing authentication header",
            "invalid api token",
            "authentication_error",
        ]
        if any(kw in exc_msg for kw in user_auth_keywords):
            return None

        # Expected business logic: insufficient balance
        if "insufficient balance" in exc_msg or "no credits left" in exc_msg:
            return None

        # Expected security check: blocked IP access
        if "access to blocked or private ip" in exc_msg:
            return None

        # Discord bot token misconfiguration: not a platform error
        if "improper token has been passed" in exc_msg or (
            exc_type and exc_type.__name__ == "Forbidden" and "50001" in exc_msg
        ):
            return None

        # Google metadata DNS errors: expected in non-GCP environments
        if (
            "metadata.google.internal" in exc_msg
            and settings.config.behave_as != BehaveAs.CLOUD
        ):
            return None

        # Inactive email recipients: expected for bounced addresses
        if "marked as inactive" in exc_msg or "inactive addresses" in exc_msg:
            return None

    # Also filter log-based events for known noisy messages.
    # Sentry's LoggingIntegration stores log messages under "logentry", not "message".
    logentry = event.get("logentry") or {}
    log_msg = (
        logentry.get("formatted") or logentry.get("message") or event.get("message")
    )
    if event.get("logger") and log_msg:
        msg = log_msg.lower()
        noisy_patterns = [
            "amqpconnection",
            "connection_forced",
            "unclosed client session",
            "unclosed connector",
        ]
        if any(p in msg for p in noisy_patterns):
            return None
        # "connection refused" in logs only when AMQP-related context is present
        if "connection refused" in msg and any(
            ind in msg for ind in ("amqp", "pika", "rabbitmq", "aio_pika", "aiormq")
        ):
            return None

    return event
```
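One way to check rules like these without sending anything to Sentry is to call the hook directly with synthetic events. The sketch below is a trimmed, standalone copy of a few of the rules in `_before_send` (the keyword tuples are an illustrative subset, and the settings-dependent GCP check is left out); `before_send_sketch` and `exc_hint` are hypothetical helper names. It assumes that, for captured exceptions, Sentry passes a hint of the form `{"exc_info": (type, value, traceback)}`, which is the shape `_before_send` already unpacks.

```python
from typing import Any, Optional

# Illustrative subset of the rules in _before_send; not the full lists.
_EXC_KEYWORDS = ("amqpconnection", "incorrect api key", "insufficient balance")
_LOG_PATTERNS = ("unclosed client session", "unclosed connector")


def before_send_sketch(
    event: dict[str, Any], hint: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Return None to drop the event, or the event itself to keep it."""
    if "exc_info" in hint:
        _exc_type, exc_value, _tb = hint["exc_info"]
        exc_msg = str(exc_value).lower() if exc_value else ""
        if any(kw in exc_msg for kw in _EXC_KEYWORDS):
            return None  # expected error: never sent
    # Log-based events carry their text under "logentry".
    logentry = event.get("logentry") or {}
    log_msg = (
        logentry.get("formatted") or logentry.get("message") or event.get("message")
    )
    if event.get("logger") and log_msg:
        if any(p in log_msg.lower() for p in _LOG_PATTERNS):
            return None
    return event


def exc_hint(exc: BaseException) -> dict[str, Any]:
    """Build a hint shaped like the one passed for a captured exception."""
    return {"exc_info": (type(exc), exc, exc.__traceback__)}


# Usage: an auth error is dropped, an unrelated bug is kept unchanged.
event = {"level": "error"}
dropped = before_send_sketch(event, exc_hint(ValueError("Incorrect API key provided")))
kept = before_send_sketch(event, exc_hint(KeyError("user_id")))
```

Keeping each rule as a plain substring test, as the real hook does, makes cases like these cheap to add to a unit test when a new noisy pattern shows up.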
```python
def sentry_init():
    sentry_dsn = settings.secrets.sentry_dsn
    integrations = []
    if feature_flag.is_configured():
        try:
            integrations.append(LaunchDarklyIntegration(feature_flag.get_client()))
        except DidNotEnable as e:
            logger.error(f"Error enabling LaunchDarklyIntegration for Sentry: {e}")
    sentry_sdk.init(
        dsn=sentry_dsn,
        traces_sample_rate=1.0,
        profiles_sample_rate=1.0,
        environment=f"app:{settings.config.app_env.value}-behave:{settings.config.behave_as.value}",
        _experiments={"enable_logs": True},
        before_send=_before_send,
        integrations=[
            AsyncioIntegration(),
            LoggingIntegration(sentry_logs_level=logging.INFO),
            AnthropicIntegration(
                include_prompts=False,
            ),
        ]
        + integrations,
    )
```
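Most of the noise reduction in this change comes from log levels rather than from `_before_send`. As we understand the `sentry_sdk` defaults, `LoggingIntegration` only turns records at or above its `event_level` (`logging.ERROR` unless overridden, and `sentry_init` does not override it) into error events; lower levels can still surface as breadcrumbs or, with `sentry_logs_level=logging.INFO` and logs enabled as above, as Sentry log entries. The stdlib-only sketch below imitates just that event threshold with a hypothetical `EventThresholdHandler`; it does not use or reproduce `sentry_sdk` internals.

```python
import logging


class EventThresholdHandler(logging.Handler):
    """Collects records at or above `event_level`, imitating which log calls
    would become error events. Illustrative only; not sentry_sdk code."""

    def __init__(self, event_level: int = logging.ERROR) -> None:
        super().__init__(level=logging.DEBUG)
        self.event_level = event_level
        self.events: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        if record.levelno >= self.event_level:
            self.events.append(record)


log = logging.getLogger("scheduler_sketch")
log.setLevel(logging.INFO)
log.propagate = False  # keep the demo output off the root logger
handler = EventThresholdHandler()
log.addHandler(handler)

log.error("Graph failed validation")    # before this change: becomes an event
log.warning("Graph failed validation")  # after this change: below the threshold
print(len(handler.events))  # → 1
```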
```python
def sentry_capture_error(error: BaseException):
    sentry_sdk.capture_exception(error)
    sentry_sdk.flush()
```
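`sentry_capture_error` reports unconditionally and then calls `sentry_sdk.flush()`, which waits until queued events are sent or the flush timeout expires. The executor change described in the summary instead decides, before any capture call, whether an exception is worth reporting. A minimal sketch of such a guard, assuming (as the summary states) that user-caused errors subclass `ValueError`; `report_unexpected` and `UserInputError` are hypothetical stand-ins, and `capture` would be something like `sentry_capture_error` in practice.

```python
from typing import Callable


class UserInputError(ValueError):
    """Hypothetical stand-in for a user-caused error type."""


def report_unexpected(
    exc: BaseException, capture: Callable[[BaseException], None]
) -> bool:
    """Forward `exc` to `capture` unless it is a ValueError.

    ValueError and its subclasses are treated as user-caused and skipped.
    Returns True if the exception was reported.
    """
    if isinstance(exc, ValueError):
        return False
    capture(exc)
    return True


# Usage with a list standing in for the Sentry capture function.
captured: list[BaseException] = []
skipped = report_unexpected(UserInputError("missing field"), captured.append)
reported = report_unexpected(RuntimeError("db connection lost"), captured.append)
```

Passing `capture` in as a parameter keeps the decision logic testable without a Sentry DSN, and keeps the blocking `flush()` off the path for errors that are skipped anyway.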
```python
async def discord_send_alert(
    content: str, channel: DiscordChannel = DiscordChannel.PLATFORM
):
    from backend.blocks.discord.bot_blocks import SendDiscordMessageBlock
    from backend.data.model import APIKeyCredentials, CredentialsMetaInput, ProviderName

    creds = APIKeyCredentials(
        provider="discord",
        api_key=SecretStr(settings.secrets.discord_bot_token),
        title="Provide Discord Bot Token for the platform alert",
        expires_at=None,
    )

    # Select channel based on enum
    if channel == DiscordChannel.PLATFORM:
        channel_name = settings.config.platform_alert_discord_channel
    elif channel == DiscordChannel.PRODUCT:
        channel_name = settings.config.product_alert_discord_channel
    else:
        channel_name = settings.config.platform_alert_discord_channel

    return await SendDiscordMessageBlock().run_once(
        SendDiscordMessageBlock.Input(
            credentials=CredentialsMetaInput(
                id=creds.id,
                title=creds.title,
                type=creds.type,
                provider=ProviderName.DISCORD,
            ),
            message_content=content,
            channel_name=channel_name,
        ),
        "status",
        credentials=creds,
    )
```
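`discord_send_alert` has no error handling of its own, so a bad bot token or missing channel access propagates to the caller; per the summary, the fix for AUTOGPT-SERVER-743 and AUTOGPT-SERVER-7MW wraps the calling `discord_system_alert` endpoint instead. That endpoint lives in `notifications.py` and is not shown here. The sketch below only illustrates the pattern, using a hypothetical `send_alert_safely` helper that takes the sender as a parameter so it can run without a bot token.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("alert_sketch")


async def send_alert_safely(
    send: Callable[[str], Awaitable[object]], content: str
) -> bool:
    """Await `send(content)`; on failure log a WARNING instead of raising.

    `send` stands in for something like discord_send_alert.
    Returns True on success, False if the send failed.
    """
    try:
        await send(content)
        return True
    except Exception as exc:  # e.g. improper token, 403 Missing Access
        logger.warning("Failed to send Discord alert: %s", exc)
        return False


async def broken_send(content: str) -> None:
    # Simulates the Discord client rejecting the request.
    raise PermissionError("403 Missing Access")


ok = asyncio.run(send_alert_safely(broken_send, "deploy finished"))
```

Catching `Exception` rather than `BaseException` lets `asyncio.CancelledError` (a `BaseException` subclass since Python 3.8) keep propagating, so cancelling the surrounding task still works.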