When the FirecrawlExtractBlock receives an `output_schema`, we currently
declare the field as a `str`. Pydantic therefore serialises the
JSON-looking value into a string, and the Firecrawl API rejects the
request with:
`400 Bad Request – Invalid JSON schema. path: ['schema']`
Direct curl requests work because the same structure is sent as a proper
JSON object.
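A minimal sketch of the failure mode and fix, using a hypothetical pydantic model (field and class names are illustrative, not the exact block code):

```python
from pydantic import BaseModel

class ExtractInput(BaseModel):
    # Before: `output_schema: str` - the schema travels as a quoted string,
    # which Firecrawl's API rejects with a 400.
    # After: declaring it as a dict keeps it a real JSON object.
    output_schema: dict

payload = ExtractInput(
    output_schema={"type": "object", "properties": {"title": {"type": "string"}}}
)
# The schema now serialises as a JSON object, matching what direct curl sends:
print(payload.model_dump_json())
```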
### Changes 🏗️
- Changed the output_schema to dict instead of str
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Test firebase.extract(..., schema) works with dict rather than str
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
## 🚨 CRITICAL: Double Transaction Bug
**Critical Issue:** Top-up transactions were being applied TWICE to user
balances, causing severe accounting errors.
**Example:**
- User with $160 balance tops up $50
- Expected: $210 balance
- Actual: $260 balance (extra $50 incorrectly credited)
This compromises the financial integrity of our credit system and
requires an immediate fix.
### Changes 🏗️
1. **Added double-checked locking pattern in `_enable_transaction`**
(backend/data/credit.py)
- Added transaction re-check INSIDE the locked transaction block (lines
294-298)
- Prevents race condition when concurrent requests try to activate the
same transaction
- Ensures transaction can only be activated once, even with webhook
retries
2. **Enhanced error messages in Stripe webhook handler**
(backend/server/routers/v1.py)
- Added detailed error messages for better debugging of webhook failures
- Helps identify issues with payload validation or signature
verification
### Root Cause Analysis 🔍
**TOCTOU (Time-of-Check to Time-of-Use) Race Condition:**
The original code checked `transaction.isActive` outside the database
lock. Between this check and acquiring the lock, another concurrent
request (webhook retry or duplicate) could enter, causing both to
proceed with activation.
**Sequence:**
1. Request A: Checks `isActive=False` ✅
2. Request B: Checks `isActive=False` ✅ (webhook retry)
3. Request A: Acquires lock, activates transaction, adds $50
4. Request B: Waits for lock, then ALSO adds $50 ❌
**Contributing Factors:**
- Stripe webhook retry mechanism
- `@func_retry` decorator (up to 5 attempts)
- No database-level unique constraint on active transactions
- Missing atomicity between check and update
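A minimal sketch of the double-checked locking pattern, using an in-memory stand-in for the locked DB transaction in `credit.py` (names and structures are illustrative):

```python
import threading

_lock = threading.Lock()  # stands in for the DB-level locked transaction
_tx = {"is_active": False, "amount": 50}
_balance = {"user_1": 160}

def enable_transaction(user_id: str) -> None:
    if _tx["is_active"]:      # first check: cheap, outside the lock
        return
    with _lock:
        # Re-check INSIDE the lock: a concurrent webhook retry may have
        # activated the transaction while this request waited for the lock.
        if _tx["is_active"]:
            return
        _tx["is_active"] = True
        _balance[user_id] += _tx["amount"]  # credited exactly once: $210, not $260
```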
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified the double-check prevents duplicate transaction
activation
- [x] Tested concurrent webhook calls - only one succeeds in activating
transaction
- [x] Confirmed balance is only incremented once per transaction
- [x] Verified idempotency - multiple calls with same transaction_key
are safe
- [x] All existing credit system tests pass
- [x] Tested webhook error handling with invalid payloads/signatures
#### For configuration changes:
- [x] `.env.example` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
*Note: No configuration changes required - this is a code-only fix*
This should fix sub-agent execution issues with passed-in credentials after a crucial data path was removed in #10568.
Additionally, some of the changes are to ensure the `credentials_input_schema` gets refreshed correctly when saving a new version of a graph in the builder.
### Changes 🏗️
- Include `graph_credentials_inputs` in `nodes_input_masks` passed into sub-agent execution
- Fix credentials input schema in `update_graph` and `get_library_agent_by_graph_id` return
- Improve error message on sub-graph validation failure
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Import agent with sub-agent(s) with required credentials inputs & run it -> should work
## Summary
- Fixes scheduler pod crashes during peak scheduling periods (e.g.,
03:00:00)
- Reduces APScheduler ThreadPoolExecutor max_workers from 10 to 3
(matching scheduler_db_pool_size)
- Prevents event loop saturation that blocks health checks and causes
pod restarts
## Root Cause Analysis
During peak scheduling periods, multiple jobs execute simultaneously and
compete for the shared event loop through `run_async()`. This creates a
resource bottleneck where:
1. **ThreadPoolExecutor** runs up to 10 jobs concurrently
2. Each job calls `run_async()` which submits to the **same event loop**
that FastAPI health check needs
3. **Health check blocks** waiting for event loop availability
4. **Liveness probe fails** after 5 consecutive timeouts (50s)
5. **Pod gets killed** with SIGKILL (exit code 137)
6. **Executions orphaned** - created in DB but never published to
RabbitMQ
## Solution
Match `max_workers` to `scheduler_db_pool_size` (3) to prevent more
concurrent jobs than the system can handle without blocking critical
health checks.
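The change itself is a one-line executor configuration; a sketch using APScheduler's public API (the exact wiring in the scheduler service may differ):

```python
from apscheduler.executors.pool import ThreadPoolExecutor
from apscheduler.schedulers.blocking import BlockingScheduler

# Cap concurrent jobs at the DB pool size so a burst of jobs cannot
# saturate the shared event loop that health checks also depend on.
scheduler = BlockingScheduler(
    executors={"default": ThreadPoolExecutor(max_workers=3)}  # was 10
)
```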
## Evidence
- Pod restart at exactly 03:05:48 when executions
e47cd564-ed87-4a52-999b-40804c41537a and
eae69811-4c7c-4cd5-b084-41872293185b were created
- 7 scheduled jobs triggered simultaneously at 03:00:00
- Health check normally responds in 0.007s but times out during high
concurrency
- Exit code 137 indicates SIGKILL from liveness probe failure
## Test Plan
- [ ] Monitor scheduler pod stability during peak scheduling periods
- [ ] Verify no executions remain QUEUED without being published to
RabbitMQ
- [ ] Confirm health checks remain responsive under load
- [ ] Check that job execution still works correctly with reduced
concurrency
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
**HOTFIX for production** - Fixes executor being stuck in infinite retry
loop when RabbitMQ channels are closed
- Ensures proper reconnection by checking channel state before
attempting to consume messages
- Prevents accumulation of thousands of retry attempts (was seeing 7000+
retries)
## Changes
The executor was stuck repeatedly failing with "Channel is closed"
errors because the `continuous_retry` decorator was attempting to reuse
closed channels instead of creating new ones.
Added channel state checks (`is_ready`) before connecting in both:
- `_consume_execution_run()`
- `_consume_execution_cancel()`
When a channel is not ready (closed), the code now:
1. Disconnects the client (safe operation, checks if already
disconnected)
2. Establishes a fresh connection with new channel
3. Proceeds with message consumption
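A sketch of the reconnect guard; the client object and its methods mirror the description above and are not the exact executor code:

```python
async def _ensure_channel_ready(client) -> None:
    """Recreate the connection if the channel has gone stale."""
    if not client.is_ready:        # connection or channel is closed
        await client.disconnect()  # safe even if already disconnected
        await client.connect()     # fresh connection with a new channel

async def _consume_execution_run(client) -> None:
    await _ensure_channel_ready(client)
    # ...proceed with message consumption on a known-good channel
```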
## Test plan
- [x] Verified the disconnect() method is safe to call on already
disconnected clients
- [x] Confirmed is_ready property checks both connection and channel
state
- [ ] Deploy to environment and verify executors reconnect properly
after channel failures
- [ ] Monitor logs to ensure no more "Channel is closed" retry loops
## Related Issues
Fixes critical production issue where:
- Executor pods show repeated "Channel is closed" errors
- 757 messages stuck in `graph_execution_queue`
- 102,286 messages in `failed_notifications` queue
- RabbitMQ logs show connections being closed due to missed heartbeats
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
This PR adds support for v0 by Vercel's Model API to the AutoGPT
platform, enabling users to leverage v0's framework-aware AI models
optimized for React and Next.js code generation.
v0 provides OpenAI-compatible endpoints with models specifically trained
for frontend development, making them ideal for generating UI components
and web applications.
### Changes 🏗️
#### Backend Changes
- **Added v0 Provider**: Added `V0 = "v0"` to `ProviderName` enum in
`/backend/backend/integrations/providers.py`
- **Added v0 Models**: Added three v0 models to `LlmModel` enum in
`/backend/backend/blocks/llm.py`:
- `V0_1_5_MD = "v0-1.5-md"` - Everyday tasks and UI generation (128K
context, 64K output)
- `V0_1_5_LG = "v0-1.5-lg"` - Advanced reasoning (512K context, 64K
output)
- `V0_1_0_MD = "v0-1.0-md"` - Legacy model (128K context, 64K output)
- **Implemented v0 Provider**: Added v0 support in `llm_call()` function
using an OpenAI-compatible client with base URL `https://api.v0.dev/v1`
(see the sketch after this list)
- **Added Credentials Support**: Created `v0_credentials` in
`/backend/backend/integrations/credentials_store.py` with UUID
`c4e6d1a0-3b5f-4789-a8e2-9b123456789f`
- **Cost Configuration**: Added model costs in
`/backend/backend/data/block_cost_config.py`:
- v0-1.5-md: 1 credit
- v0-1.5-lg: 2 credits
- v0-1.0-md: 1 credit
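For illustration, the `llm_call()` addition boils down to pointing an OpenAI-compatible client at v0's endpoint (a sketch, not the exact code path):

```python
from openai import OpenAI

# Sketch: in the real code the key comes from v0_credentials / V0_API_KEY.
client = OpenAI(base_url="https://api.v0.dev/v1", api_key="<V0_API_KEY>")
response = client.chat.completions.create(
    model="v0-1.5-md",
    messages=[{"role": "user", "content": "Generate a React button component"}],
)
print(response.choices[0].message.content)
```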
#### Configuration Changes
- **Settings**: Added `v0_api_key` field to `Secrets` class in
`/backend/backend/util/settings.py`
- **Environment Variables**: Added `V0_API_KEY=` to
`/backend/.env.default`
### Features
- ✅ Full OpenAI-compatible API support
- ✅ Tool/function calling support
- ✅ JSON response format support
- ✅ Framework-aware completions optimized for React/Next.js
- ✅ Large context windows (up to 512K tokens)
- ✅ Integrated with platform credit system
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run existing block tests to ensure no regressions: `poetry run
pytest backend/blocks/test/test_block.py`
- [x] Verify AITextGeneratorBlock works with v0 models
- [x] Confirm all model metadata is correctly configured
- [x] Validate cost configuration is properly set up
- [x] Check that v0_credentials has a valid UUID4
#### For configuration changes:
- [x] `.env.example` is updated or already compatible with my changes
- Added `V0_API_KEY=` to `/backend/.env.default`
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- No changes needed - uses existing environment variable patterns
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
### Configuration Requirements
Users need to:
1. Obtain a v0 API key from [v0.app](https://v0.app) (requires Premium
or Team plan)
2. Add `V0_API_KEY=your-api-key` to their `.env` file
### API Documentation
- v0 API Docs: https://v0.app/docs/api
- Model API Docs: https://v0.app/docs/api/model
### Testing
All existing tests pass with the new v0 integration:
```bash
poetry run pytest backend/blocks/test/test_block.py::test_available_blocks -k "AITextGeneratorBlock" -xvs
# Result: PASSED
```
We want to support ~~proxy curl~~ enrichlayer as an integration, and
this is a baseline way to get there.
### Changes 🏗️
- Adds some subset of proxycurl blocks based on the API docs:
~~https://nubela.co/proxycurl/docs#people-api-person-profile-endpoint~~ https://enrichlayer.com/docs/pc/#people-api
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Manually test the blocks with an API key
- [x] Make sure the automated tests pass
---------
Co-authored-by: SwiftyOS <craigswift13@gmail.com>
Co-authored-by: Claude <claude@users.noreply.github.com>
Co-authored-by: majdyz <zamil@agpt.co>
Our weekly summary emails are currently broken, hard-coded, and ugly.
### Changes 🏗️
- Update the email template to look better
- Update the way we queue messages so it works after other recent changes
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Test by sending an email to yourself with the cron job set to every
minute, to preview what the email looks like
---------
Co-authored-by: Claude <claude@users.noreply.github.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Summary
This PR adds the frontend service to the Docker Compose configuration,
enabling `docker compose up` to run the complete stack, including the
frontend. It also implements comprehensive environment variable
improvements, unified .env file support, and fixes Docker networking
issues.
## Key Changes
### 🐳 Docker Compose Improvements
- **Added frontend service** to `docker-compose.yml` and
`docker-compose.platform.yml`
- **Production build**: Uses `pnpm build + serve` instead of dev server
for better stability and lower memory usage
- **Service dependencies**: Frontend now waits for backend services
(`rest_server`, `websocket_server`) to be ready
- **YAML anchors**: Implemented DRY configuration to avoid duplicating
environment values
### 📁 Unified .env File Support
- **Frontend .env loading**: Automatically loads `.env` file during
Docker build and runtime
- **Backend .env loading**: Optional `.env` file support with fallback
to sensible defaults in `settings.py`
- **Single source of truth**: All `NEXT_PUBLIC_*` and API keys can be
defined in respective `.env` files
- **Docker integration**: Updated `.dockerignore` to include `.env`
files in build context
- **Git tracking**: Frontend and backend `.env` files are now trackable
(removed from gitignore)
### 🔧 Environment Variable Architecture
- **Dual environment strategy**:
- Server-side code uses Docker service names
(`http://rest_server:8006/api`)
- Client-side code uses localhost URLs (`http://localhost:8006/api`)
- **Comprehensive config**: Added build args and runtime environment
variables
- **Network compatibility**: Fixes connection issues between frontend
and backend containers
- **Shared backend variables**: Common environment variables (service
hosts, auth settings) centralized using YAML anchors
### 🛠️ Code Improvements
- **Centralized env-config helper** (`/frontend/src/lib/env-config.ts`)
with server-side priority
- **Updated all frontend code** to use shared environment helpers
instead of direct `process.env` access
- **Consistent API**: All environment variable access now goes through
helper functions
- **Settings.py improvements**: Better defaults for CORS origins and
optional .env file loading
### 🔗 Files Changed
- `docker-compose.yml` & `docker-compose.platform.yml` - Added frontend
service and shared backend env vars
- `frontend/Dockerfile` - Simplified build process to use .env files
directly
- `backend/settings.py` - Optional .env loading and better defaults
- `frontend/src/lib/env-config.ts` - New centralized environment
configuration
- `.dockerignore` - Allow .env files in build context
- `.gitignore` - Updated to allow frontend/backend .env files
- Multiple frontend files - Updated to use env helpers
- Updates to both auto installer scripts to work with the latest setup!
## Benefits
- ✅ **Single command deployment**: `docker compose up` now runs
everything
- ✅ **Better reliability**: Production build reduces memory usage and
crashes
- ✅ **Network compatibility**: Proper container-to-container
communication
- ✅ **Maintainable config**: Centralized environment variable management
with .env files
- ✅ **Development friendly**: Works in both Docker and local development
- ✅ **API key management**: Easy configuration through .env files for
all services
- ✅ **No more manual env vars**: Frontend and backend automatically load
their respective .env files
## Testing
- ✅ Verified Docker service communication works correctly
- ✅ Frontend responds and serves content properly
- ✅ Environment variables are correctly resolved in both server and
client contexts
- ✅ No connection errors after implementing service dependencies
- ✅ .env file loading works correctly in both build and runtime phases
- ✅ Backend services work with and without .env files present
### Checklist 📋
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Lluis Agusti <hi@llu.lu>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Claude <claude@users.noreply.github.com>
Co-authored-by: Bentlybro <Github@bentlybro.com>
## Problem
After applying the CloudLoggingHandler fix to use
BackgroundThreadTransport (#10634), scheduler pods entered a new
deadlock during startup when uvicorn reconfigures logging.
## Root Cause
When uvicorn starts with a log_config parameter, it calls
`logging.config.dictConfig()` which:
1. Calls `_clearExistingHandlers()`
2. Which calls `logging.shutdown()`
3. Which tries to `flush()` all handlers including CloudLoggingHandler
4. CloudLoggingHandler with BackgroundThreadTransport tries to flush its
queue
5. The background worker thread tries to acquire the logging module lock
to check log levels
6. **Deadlock**: shutdown holds lock waiting for flush to complete,
worker thread needs lock to continue
## Thread Dump Evidence
From py-spy analysis of the stuck pod:
- **Thread 21 (FastAPI)**: Stuck in `flush()` waiting for background
thread to drain queue
- **Thread 13 (google.cloud.logging.Worker)**: Waiting for logging lock
in `isEnabledFor()`
- **Thread 1 (MainThread)**: Waiting for logging lock in `getLogger()`
during SQLAlchemy import
- **Threads 30, 31 (Sentry)**: Also waiting for logging lock
## Solution
Set `log_config=None` for all uvicorn servers. This prevents uvicorn
from calling `dictConfig()` and avoids the deadlock entirely.
**Trade-off**: Uvicorn will use its default logging configuration which
may produce duplicate log entries (one from uvicorn, one from the app),
but the application will start successfully without deadlocks.
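A sketch of the change (app, host, and port are illustrative):

```python
import uvicorn
from fastapi import FastAPI

app = FastAPI()

config = uvicorn.Config(
    app,
    host="0.0.0.0",
    port=8006,        # port is illustrative
    log_config=None,  # skip dictConfig() and the shutdown/flush deadlock
)
uvicorn.Server(config).run()
```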
## Changes
- Set `log_config=None` in all uvicorn.Config() calls
- Remove unused `generate_uvicorn_config` imports
## Testing
- [x] Verified scheduler pods can start and become healthy
- [x] Health checks respond properly
- [x] No deadlocks during startup
- [x] Application logs still appear (though may be duplicated)
## Related Issues
- Fixes the startup deadlock introduced after #10634
### Summary
Added a new documentation page and images for integrating AI/ML API with
AutoGPT, including step-by-step instructions. Updated LLM block to send
additional headers for requests to aimlapi.com. Improved provider
listing in index.md and added the new guide to mkdocs navigation. Builds
on and extends the integration work from
https://github.com/Significant-Gravitas/AutoGPT/pull/9996
### Changes 🏗️
This PR introduces official support and documentation for using **AI/ML
API** with the **AutoGPT platform**:
* 📄 **Added a new documentation page** `platform/aimlapi.md` with a
detailed step-by-step integration guide.
* 🖼️ **Added 12+ reference images** to `docs/content/imgs/aimlapi/` for
clear visual walkthrough.
* 🧠 **Updated the LLM block** (`llm.py`) to send extra headers
(`X-Project`, `X-Title`, `Referer`) in requests to `aimlapi.com` for
analytics and source attribution (see the sketch after this list).
* 📚 **Improved provider listing** in `index.md` — added section about
AI/ML API models and benefits.
* 🧭 **Added the new guide to the mkdocs navigation** via `mkdocs.yml`.
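For reference, the extra-headers change can be expressed with the OpenAI-compatible client's `default_headers` option; this is a sketch with illustrative header values and base URL, not the exact `llm.py` code:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.aimlapi.com/v1",  # illustrative endpoint
    api_key="<AIML_API_KEY>",
    default_headers={
        "X-Project": "AutoGPT",
        "X-Title": "AutoGPT",
        "Referer": "https://agpt.co",
    },
)
```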
---
### Checklist 📋
#### For code changes:
* [x] I have clearly listed my changes in the PR description
* [x] I have made a test plan
* [x] I have tested my changes according to the test plan:
* [x] Successfully authenticated against `api.aimlapi.com`
* [x] Verified requests use correct headers
* [x] Confirmed `AI Text Generator` block returns completions for all
supported models
* [x] End-to-end tested: created, saved, and ran agent with AI/ML API
successfully
* [x] Verified outputs render correctly in the Output panel
No breaking changes introduced. Let me know if you'd like this guide
cross-referenced from other onboarding pages. ✅
---------
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
### Changes 🏗️
Calls to the moderation API now strip whitespace from the API key before
including it in the 'X-API-Key' header, preventing authentication issues
due to accidental leading or trailing spaces.
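The fix is essentially one call; a minimal sketch (variable names are illustrative):

```python
api_key = "  secret-key \n"  # e.g. pasted with accidental whitespace
headers = {"X-API-Key": api_key.strip()}  # -> {"X-API-Key": "secret-key"}
```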
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Set up and run the platform with moderation enabled and test that it works
## Summary
This PR refactors all Gmail blocks to share a common base class
(`GmailBase`) and adds several improvements to email handling, including
proper HTML content support, async API calls, and fixing the
78-character line wrapping issue for plain text emails.
## Changes
### Architecture Improvements
- **Unified base class**: Created `GmailBase` abstract class that
consolidates common functionality across all Gmail blocks
- **Async API calls**: Converted all Gmail API calls to use
`asyncio.to_thread` for better performance and non-blocking operations
- **Code deduplication**: Moved shared methods like `_build_service`,
`_get_email_body`, `_get_attachments`, and `_get_label_id` to the base
class
### Email Content Handling
- **Smart content type detection**: Added automatic detection of HTML vs
plain text content
- **Fix 78-char line wrapping**: Plain text emails now use a no-wrap
policy (`max_line_length=0`) to prevent Gmail's default 78-character
hard line wrapping
- **Content type parameter**: Added optional `content_type` field to
Send, Draft, Reply, and Forward blocks allowing manual override ("auto",
"plain", or "html")
- **Proper MIME handling**: Created `_make_mime_text` helper function to
properly configure MIME types and policies
### New Features
- **Gmail Forward Block**: Added new `GmailForwardBlock` for forwarding
emails with proper thread preservation
- **Reply improvements**: Reply block now properly reads the original
email content when replying
### Bug Fixes
- Fixed issue where reply block wasn't reading the email it was replying
to
- Fixed attachment handling in multipart messages
- Improved error handling for base64 decoding
## Technical Details
The refactoring introduces:
- `NO_WRAP_POLICY = SMTP.clone(max_line_length=0)` to prevent line
wrapping in plain text emails
- UTF-8 charset support for proper Unicode/emoji handling
- Consistent async patterns using `asyncio.to_thread` for all Gmail API
calls
- Proper HTML to text conversion using html2text library when available
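A sketch of the no-wrap MIME construction described above (the helper name matches the description; surrounding block code is omitted):

```python
from email.mime.text import MIMEText
from email.policy import SMTP

# max_line_length=0 disables the 78-character hard wrapping that is
# otherwise applied when the message is serialised.
NO_WRAP_POLICY = SMTP.clone(max_line_length=0)

def _make_mime_text(body: str, content_type: str = "plain") -> MIMEText:
    # UTF-8 charset for proper Unicode/emoji handling.
    return MIMEText(body, content_type, _charset="utf-8", policy=NO_WRAP_POLICY)
```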
## Testing
All existing tests pass. The changes maintain backward compatibility
while adding new optional parameters.
## Breaking Changes
None - all changes are backward compatible. The new `content_type`
parameter is optional and defaults to "auto" detection.
---------
Co-authored-by: Claude <claude@users.noreply.github.com>
This PR consolidates LaunchDarkly feature flag management by moving it
from autogpt_libs to backend and fixing several issues with boolean
handling and configuration management.
### Changes 🏗️
**Code Structure:**
- Move LaunchDarkly client from `autogpt_libs/feature_flag` to
`backend/util/feature_flag.py`
- Delete redundant `config.py` file and merge LaunchDarkly settings into
`backend/util/settings.py`
- Update all imports throughout the codebase to use
`backend.util.feature_flag`
- Move test file to `backend/util/feature_flag_test.py`
**Bug Fixes:**
- Fix `is_feature_enabled` function to properly return boolean values
instead of arbitrary objects that were always evaluating to `True`
- Add proper async/await handling for all `is_feature_enabled` calls
- Add better error handling when LaunchDarkly client is not initialized
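A sketch of the boolean fix using the LaunchDarkly SDK's flag evaluation (names are simplified from the real helper):

```python
import logging

import ldclient

logger = logging.getLogger(__name__)

async def is_feature_enabled(flag_key: str, context, default: bool = False) -> bool:
    client = ldclient.get()
    if not client.is_initialized():
        logger.warning("LaunchDarkly client not initialized; using default")
        return default
    value = client.variation(flag_key, context, default)
    if not isinstance(value, bool):
        # Previously, non-boolean variation values leaked out and were
        # always truthy; now they fall back to the default with a warning.
        logger.warning("Flag %s returned non-boolean value %r", flag_key, value)
        return default
    return value
```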
**Performance & Architecture:**
- Load Settings at module level instead of creating new instances inside
functions
- Remove unnecessary `sdk_key` parameter from
`initialize_launchdarkly()` function
- Simplify initialization by using centralized settings management
**Configuration:**
- Add `launch_darkly_sdk_key` field to `Secrets` class in settings.py
with proper validation alias
- Remove environment variable fallback in favor of centralized settings
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All existing feature flag tests pass (6/6 tests passing)
- [x] LaunchDarkly initialization works correctly with settings
- [x] Boolean feature flags return correct values instead of objects
- [x] Non-boolean flag values are properly handled with warnings
- [x] Async/await calls work correctly in AutoMod and activity status
generator
- [x] Code formatting and imports are correct
#### For configuration changes:
- [x] `.env.example` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
**Configuration Changes:**
- LaunchDarkly SDK key is now managed through the centralized Settings
system instead of a separate config file
- Uses existing `LAUNCH_DARKLY_SDK_KEY` environment variable (no changes
needed to env files)
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
Fixed Smart Decision Maker's function signature generation to properly
handle dynamic fields (e.g., `values_#_*`, `items_$_*`) when connecting
to any block as a tool.
### Context
When Smart Decision Maker calls other blocks as tools, it needs to
generate OpenAI-compatible function signatures. Previously, when
connected to blocks via dynamic fields (which get merged by the executor
at runtime), the signature generation would fail because blocks don't
inherently know about these dynamic field patterns.
### Changes 🏗️
- **Modified
`SmartDecisionMakerBlock._create_block_function_signature()`** to detect
and handle dynamic fields:
- Detects fields containing `_#_` (dict merge), `_$_` (list merge), or
`_@_` (object merge)
- Provides generic string schema for dynamic fields (OpenAI API
compatible)
- Falls back gracefully for unknown fields (see the sketch after this list)
- **Added comprehensive tests** for dynamic field handling with both
dictionary and list patterns
- **No changes needed to individual blocks** - this solution works
universally
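A sketch of the dynamic-field handling as described; the function shape is illustrative, not the exact source:

```python
DYNAMIC_MARKERS = ("_#_", "_$_", "_@_")  # dict / list / object merge

def field_schema(field_name: str, block_schema: dict) -> dict:
    if any(marker in field_name for marker in DYNAMIC_MARKERS):
        # Dynamic fields are merged by the executor at runtime, so the
        # block schema has no entry for them; a generic string schema
        # keeps the signature OpenAI-compatible.
        return {"type": "string", "description": f"Dynamic value for {field_name}"}
    # Fall back gracefully for unknown fields.
    return block_schema.get(field_name, {"type": "string"})
```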
### Why This Approach
Instead of modifying every block to handle dynamic fields (original PR
approach), we handle it centrally in Smart Decision Maker where the
function signatures are generated. This is cleaner and more
maintainable.
### Test Plan 📋
- [x] Created test cases for Smart Decision Maker generating function
signatures with dynamic dict fields (`_#_`)
- [x] Created test cases for Smart Decision Maker generating function
signatures with dynamic list fields (`_$_`)
- [x] Verified Smart Decision Maker can successfully call blocks like
CreateDictionaryBlock via dynamic connections
- [x] All existing Smart Decision Maker tests pass
- [x] Linting and formatting pass
---------
Co-authored-by: Claude <claude@users.noreply.github.com>
We're forcing this note onto the end of the Smart Decision Maker (SDM)
block's system prompt: "Only provide EXACTLY one function call; multiple
tool calls are strictly prohibited." GPT-5 interprets this as "only call
one tool per task," which results in many agent runs that use a tool
only once (i.e., useless, low-effort answers).
### Changes 🏗️
Remove parallel tool-call system prompting entirely.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Automated tests pass.
Add py-spy for production-safe Python profiling across all backend services:
- Add py-spy dependency to pyproject.toml
- Grant SYS_PTRACE capability to Docker services for profiling access
- Enable low-overhead performance monitoring in development and production
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Copy of [feat(backend/AM): Integrate AutoMod content moderation - By
Bentlybro - PR
#10490](https://github.com/Significant-Gravitas/AutoGPT/pull/10490)
because I messed up the original 🤦
Adds AutoMod input and output moderation to the execution flow.
Introduces a new AutoMod manager and models, updates settings for
moderation configuration, and modifies execution result handling to
support moderation-cleared data. Moderation failures now clear sensitive
data and mark executions as failed.
<img width="921" height="816" alt="image"
src="https://github.com/user-attachments/assets/65c0fee8-d652-42bc-9553-ff507bc067c5"
/>
### Changes 🏗️
I have made some small changes to
``autogpt_platform/backend/backend/executor/manager.py`` to send the
needed info to the AutoMod system, which collects the data, combines it,
and makes the API call to AutoMod; based on the reply, it lets the
execution run or not.
I also had to make small changes to
``autogpt_platform/backend/backend/data/execution.py`` to add checks
that allow clearing the content from the blocks if it was flagged.
I am working on finalizing the AutoMod repo; it will be made public once
that's done.
Note: we will want to set this up behind LaunchDarkly first for testing
with the team before we roll it out any further.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Set up and run the platform with ``automod_enabled`` set to False
and verify it works normally
- [x] Set up and run the platform with ``automod_enabled`` set to True,
set the AutoMod URL and API key, and test that it runs safe blocks normally
- [x] Test AutoMod with content that would trigger it to flag, and watch
it stop execution and clear all the block outputs
Message @Bentlybro for the URL and an API key to AutoMod for local testing!
## Changes made to Settings.py
I have added a few new options to settings.py for the AutoMod config!
```python
# AutoMod configuration
automod_enabled: bool = Field(
    default=False,
    description="Whether AutoMod content moderation is enabled",
)
automod_api_url: str = Field(
    default="",
    description="AutoMod API base URL - Make sure it ends in /api",
)
automod_timeout: int = Field(
    default=30,
    description="Timeout in seconds for AutoMod API requests",
)
automod_retry_attempts: int = Field(
    default=3,
    description="Number of retry attempts for AutoMod API requests",
)
automod_retry_delay: float = Field(
    default=1.0,
    description="Delay between retries for AutoMod API requests in seconds",
)
automod_fail_open: bool = Field(
    default=False,
    description="If True, allow execution to continue if AutoMod fails",
)
automod_moderate_inputs: bool = Field(
    default=True,
    description="Whether to moderate block inputs",
)
automod_moderate_outputs: bool = Field(
    default=True,
    description="Whether to moderate block outputs",
)
```
and
```python
automod_api_key: str = Field(default="", description="AutoMod API key")
```
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Summary
Fix critical deadlock issue where scheduler pods would freeze completely
and become unresponsive to health checks, causing pod restarts and stuck
QUEUED executions.
## Root Cause Analysis
The scheduler was using `BlockingScheduler` which blocked the main
thread, and when concurrent jobs deadlocked in the async event loop, the
entire process would freeze - unable to respond to health checks or
process any requests.
From crash analysis:
- At 01:18:00, two jobs started executing concurrently
- At 01:18:01.482, last successful health check
- Process completely froze - no more logs until pod was killed at
01:18:46
- Execution `8174c459-c975-4308-bc01-331ba67f26ab` was created in DB but
never published to RabbitMQ
## Changes Made
### Core Deadlock Fix
- **Switch from BlockingScheduler to BackgroundScheduler**: Prevents
main thread blocking, allows health checks to work even if scheduler
jobs deadlock
- **Make all health_check methods async**: Makes health checks
completely independent of thread pools and more resilient to blocking
operations
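A sketch of the scheduler switch (illustrative; the actual setup also configures executors and job stores):

```python
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
scheduler.start()  # returns immediately; jobs run on worker threads
# The main thread stays free, so FastAPI health checks keep responding
# even if a scheduled job deadlocks in the event loop.
```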
### Enhanced Monitoring & Debugging
- **Add execution timing**: Track and log how long each graph execution
takes to create and publish
- **Warn on slow operations**: Alert when operations take >10 seconds,
indicating resource contention
- **Enhanced error logging**: Include elapsed time and exception types
in error messages
- **Better APScheduler event listeners**: Add listeners for missed jobs
and max instances with actionable messages
### Files Modified
- `backend/executor/scheduler.py` - Switch to BackgroundScheduler, async
health_check, timing monitoring
- `backend/util/service.py` - Base async health_check method
- `backend/executor/database.py` - Async health_check override
- `backend/notifications/notifications.py` - Async health_check override
## Test Plan
- [x] All existing tests pass (914 passed, 1 failed unrelated connection
issue)
- [x] Scheduler starts correctly with BackgroundScheduler
- [x] Health checks respond properly under load
- [x] Enhanced logging provides visibility into execution timing
## Impact
- **Prevents pod freezes**: Scheduler remains responsive even when jobs
deadlock
- **Better observability**: Clear visibility into slow operations and
failures
- **No dropped executions**: Jobs won't get stuck in QUEUED state due to
process freezes
- **Faster incident response**: Health checks and logs provide
actionable debugging info
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
# Enhanced Discord Integration Blocks
Introduces new blocks for sending DMs, embeds, files, and replies in
Discord, as well as blocks for retrieving user and channel information.
Enhances existing message blocks with additional metadata fields and
server/channel identification. Improves test coverage and input/output
schemas for all Discord-related blocks.
Co-Authored-By: Claude <claude@users.noreply.github.com>
## Why These Changes Are Needed 🎯
The existing Discord integration was limited to basic message sending
and reading. Users needed more sophisticated Discord functionality to
build comprehensive automation workflows:
1. **Limited messaging options** - Could only send plain text to
channels, no DMs, embeds, or file attachments
2. **Poor graph connectivity** - Blocks didn't output IDs needed for
chaining operations (e.g., couldn't reply to a message after sending it)
3. **No user management** - Couldn't get user information or send direct
messages
4. **Type safety issues** - Discord.py's incomplete type hints caused
linting errors
5. **No channel resolution** - Had to manually find channel IDs instead
of using names
### Changes 🏗️
#### New Blocks Added
- **SendDiscordDMBlock** - Send direct messages to users via their
Discord ID
- **SendDiscordEmbedBlock** - Create rich embedded messages with images,
fields, and formatting
- **SendDiscordFileBlock** - Upload any file type (images, PDFs, videos,
etc.) using MediaFileType
- **ReplyToDiscordMessageBlock** - Reply to specific messages in threads
- **DiscordUserInfoBlock** - Retrieve user profile information
(username, avatar, creation date, etc.)
- **DiscordChannelInfoBlock** - Resolve channel names to IDs and get
channel metadata
#### Enhanced Existing Blocks
- **ReadDiscordMessagesBlock**:
- Now outputs: `message_id`, `channel_id`, `user_id` (previously missing
all IDs)
- Enables workflows like: read message → reply to it, or read message →
DM the author
- **SendDiscordMessageBlock**:
- Now outputs: `message_id`, `channel_id` (previously had no outputs
except status)
- Enables tracking sent messages and replying to them later
#### Technical Improvements
- **MediaFileType Support**: SendDiscordFileBlock accepts data URIs,
URLs, or local paths
- **Defensive Programming**: Added runtime type checks for Discord.py's
incomplete typing (sketched after this list)
- **ID Passthrough**: DiscordUserInfoBlock passes through user_id for
chaining
- **Better Error Messages**: Clear feedback when operations fail (e.g.,
"Channel cannot receive messages")
- **Channel Flexibility**: Blocks accept both channel names and IDs
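A sketch of the defensive channel check; the discord.py calls are real, but the surrounding block code is simplified:

```python
import discord

async def send_to_channel(client: discord.Client, channel_id: int, content: str) -> None:
    channel = client.get_channel(channel_id)
    # get_channel() may return forums, categories, or None, and
    # discord.py's type hints don't guarantee a sendable channel.
    if not isinstance(channel, discord.abc.Messageable):
        raise ValueError("Channel cannot receive messages")
    await channel.send(content)
```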
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
#### Test Plan 🧪
- [x] **Import and initialization**: All 8 Discord blocks import and
initialize without errors
- [x] **Type checking**: `poetry run format` passes with no type errors
- [x] **Interface connectivity**: Verified blocks can chain together:
- [x] ReadDiscordMessages → ReplyToDiscordMessage (via message_id,
channel_id)
- [x] ReadDiscordMessages → SendDiscordDM (via user_id)
- [x] SendDiscordMessage → ReplyToDiscordMessage (via message_id,
channel_id)
- [x] DiscordUserInfo → SendDiscordDM (via user_id passthrough)
- [x] DiscordChannelInfo → SendDiscordEmbed/File (via channel_id)
- [x] **MediaFileType handling**: SendDiscordFileBlock correctly
processes:
- [x] Data URIs (base64 encoded files)
- [x] URLs (downloads from web)
- [x] Local paths (from other blocks)
- [x] **Defensive checks**: Verified error handling for:
- [x] Non-text channels (forums, categories)
- [x] Private/DM channels without guilds
- [x] Missing attributes on channel objects
- [x] **Mock test data**: All blocks have appropriate test
inputs/outputs defined
## Example Workflows Now Possible 🚀
1. **Auto-reply to mentions**: Read messages → Check if bot mentioned →
Reply in thread
2. **File distribution**: Generate report → Send as PDF to Discord
channel
3. **User notifications**: Get user info → Check if online → Send DM
with alert
4. **Cross-platform sync**: Receive email attachment → Forward to
Discord channel
5. **Rich notifications**: Create embed with thumbnail → Add fields →
Send to announcement channel
## Breaking Changes ⚠️
None - all changes are backward compatible. Existing workflows using
SendDiscordMessageBlock and ReadDiscordMessagesBlock will continue to
work, they just now have additional outputs available.
## Dependencies 📦
No new dependencies added. Uses existing:
- `discord.py` (already in project)
- `aiohttp` (already in project)
- Backend utilities: `MediaFileType`, `store_media_file` (already in
project)
---------
Co-authored-by: Claude <claude@users.noreply.github.com>
## Summary
Corrects the context window for GPT5_CHAT, fixes provider for
CLAUDE_4_1_OPUS from 'openai' to 'anthropic', and adds a 600s timeout to
the Anthropic client call in llm_call.
## Changes 🏗️
- Reduced GPT5_CHAT's context limit to 16k
- Changed CLAUDE_4_1_OPUS's provider from 'openai' to 'anthropic'
- Added a 600s timeout to the Anthropic client call
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Tested all affected models and verified they work
### Summary
Implemented 5 additional GitHub blocks on top of the existing GitHub
Integration to enhance CI/CD workflows and code review automation
capabilities.
[New Github
Blocks_v41.json](https://github.com/user-attachments/files/21684665/New.Github.Blocks_v41.json)
<img width="902" height="1073" alt="Screenshot 2025-08-08 at 15 09 40"
src="https://github.com/user-attachments/assets/ebb6d33b-f3cd-4a56-acc6-56ace5a01274"
/>
### Changes 🏗️
- Added **GitHub CI Results Block** (`github/ci.py`): Fetch and analyze
CI/CD check runs, workflow statuses, and logs
- Added **GitHub Review Blocks** (`github/reviews.py`):
- Create PR reviews with comments
- Approve/request changes on PRs
- Add review comments to specific lines
- Fetch existing reviews and comments
- Dismiss stale reviews
### Related Tickets
- SECRT-1423: GitHub CI Results Integration
- SECRT-1426: GitHub PR Review Creation
- SECRT-1425: GitHub Review Comments
- SECRT-1424: GitHub Review Approval/Changes
- SECRT-1427: GitHub Review Management
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Created and tested CI results block with various repositories
- [x] Tested PR review creation with comments
- [x] Verified review approval and change request functionality
- [x] Tested adding line-specific review comments
- [x] Confirmed fetching and dismissing reviews works correctly
## Summary
This PR fixes and enhances the Exa Websets implementation to resolve
issues with the expand_items parameter and improve the overall block
functionality. The changes address UI limitations with nested response
objects while providing a more comprehensive and user-friendly interface
for creating and managing Exa websets.
[Websets_v14.json](https://github.com/user-attachments/files/21596313/Websets_v14.json)
<img width="1335" height="949" alt="Screenshot 2025-08-05 at 11 45 07"
src="https://github.com/user-attachments/assets/3a9b3da0-3950-4388-96b2-e5dfa9df9b67"
/>
**Why these changes are necessary:**
1. **UI Compatibility**: The current implementation returns deeply
nested objects that cause the UI to crash. This PR flattens the input
parameters and returns simplified response objects to work around these
UI limitations.
2. **Expand Items Issue**: The `expand_items` toggle in the GetWebset
block was causing failures. This parameter has been removed as it's not
essential for the basic functionality.
3. **Missing SDK Integration**: The previous implementation used raw
HTTP requests instead of the official Exa SDK, making it harder to
maintain and more prone to errors.
4. **Limited Functionality**: The original implementation lacked support
for many Exa API features like imports, enrichments, and scope
configuration.
### Changes 🏗️
1. **Added Pydantic models** (`model.py`):
- Created comprehensive type definitions for all Exa webset objects
- Added proper enums for status values and types
- Structured models to match the Exa API response format
2. **Refactored websets.py**:
- Replaced raw HTTP requests with the official `exa-py` SDK
- Flattened nested input parameters to avoid UI issues with complex
objects
- Enhanced `ExaCreateWebsetBlock` with support for:
- Search configuration with entity types, criteria, exclude/scope
sources
- Import functionality from existing sources
- Enrichment configuration with multiple formats
- Removed problematic `expand_items` parameter from `ExaGetWebsetBlock`
- Updated response objects to use simplified `Webset` model that returns
dicts for nested objects
3. **Updated webhook_blocks.py**:
- Disabled the webhook block temporarily (`disabled=True`) as it needs
further testing
4. **Added exa-py dependency**:
- Added official Exa Python SDK to `pyproject.toml` and `poetry.lock`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Created a new webset using the ExaCreateWebsetBlock with basic
search parameters
- [x] Verified the webset was created successfully in the Exa dashboard
- [x] Listed websets using ExaListWebsetsBlock and confirmed pagination
works
- [x] Retrieved individual webset details using ExaGetWebsetBlock
without expand_items
- [x] Tested advanced features including entity types, criteria, and
exclude sources
- [x] Confirmed the UI no longer crashes when displaying webset
responses
- [x] Verified the Docker environment builds successfully with the new
exa-py dependency
#### For configuration changes:
- [x] `.env.example` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
- Added `exa-py` dependency to backend requirements
### Additional Notes
- The webhook functionality has been temporarily disabled pending
further testing and UI improvements
- The flattened parameter approach is a workaround for current UI
limitations with nested objects
- Future improvements could include re-enabling nested objects once the
UI supports them better
## Summary
- Enabled the TikTok posting block that was previously disabled
- The block provides comprehensive TikTok-specific posting options
## Changes 🏗️
- Removed `disabled=True` from TikTok posting block to enable
functionality
- Added full TikTok API integration with all supported options
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified the TikTok block is now available in the block list
---------
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
This PR standardizes health check error handling across all services by
introducing and using a consistent `UnhealthyServiceError` exception
type. This improves monitoring, debugging, and service reliability by
providing uniform error reporting when services are unhealthy.
### Changes 🏗️
- **Added `UnhealthyServiceError` class** in `backend/util/service.py` (sketched after this list):
- Custom exception for unhealthy service states
- Includes service name in error message
- Added to `EXCEPTION_MAPPING` for proper serialization
- **Updated health checks across services** to use
`UnhealthyServiceError`:
- **Database service** (`backend/executor/database.py`): Replace
`RuntimeError` with `UnhealthyServiceError` for database connection
failures
- **Scheduler service** (`backend/executor/scheduler.py`): Replace
`RuntimeError` with `UnhealthyServiceError` for scheduler initialization
and running state checks
- **Notification service** (`backend/notifications/notifications.py`):
- Replace `RuntimeError` with `UnhealthyServiceError` for RabbitMQ
configuration issues
- Added new `health_check()` method to verify RabbitMQ readiness
- **REST API** (`backend/server/rest_api.py`): Replace `RuntimeError`
with `UnhealthyServiceError` for database health checks
- **Updated imports** across all affected files to include
`UnhealthyServiceError`
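A sketch of the exception's shape; the real definition in `backend/util/service.py` may differ slightly:

```python
class UnhealthyServiceError(Exception):
    """Raised when a service fails its health check."""

    def __init__(self, service_name: str, detail: str = ""):
        message = f"Service '{service_name}' is unhealthy."
        if detail:
            message += f" {detail}"
        super().__init__(message)
```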
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified health check endpoints return appropriate errors when
services are unhealthy
- [x] Confirmed services start up properly and health checks pass when
healthy
- [x] Tested error serialization through API responses
- [x] Verified no breaking changes to existing functionality
#### For configuration changes:
- [x] `.env.example` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
No configuration changes were made in this PR - only code changes to
improve error handling consistency.
---------
Co-authored-by: Claude <noreply@anthropic.com>
I have added e2e tests for the agent dashboard page.
It includes tests like:
- dashboard page loads successfully
- submit agent button works correctly
- agent table displays data correctly
- agent table actions work correctly
I've also updated the e2e test script to include some static agent
submissions, so I can test that they load on the frontend.
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All tests are working perfectly locally
<img width="469" height="177" alt="Screenshot 2025-08-08 at 12 13 42 PM"
src="https://github.com/user-attachments/assets/5e37afc3-c151-476a-84de-0a06f44a0722"
/>
## Summary
- Create dedicated notification service entry point
(backend.notification:main)
- Remove NotificationManager from scheduler service for better
separation of concerns
- Update docker-compose to run notification service on dedicated port
8007
- Configure all services to communicate with separate notification
service
This refactoring separates the notification service from the scheduler
service, allowing them to run as independent microservices instead of
two processes in the same pod.
## Changes Made
- **New notification service entry point**: Created
`backend/backend/notification.py` with a dedicated main function
(sketched after this list)
- **Updated pyproject.toml**: Added notification service entry point
registration
- **Modified scheduler service**: Removed NotificationManager from
`backend/backend/scheduler.py`
- **Docker Compose updates**: Added notification_server service on port
8007, updated NOTIFICATIONMANAGER_HOST references
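A sketch of what the new entry point might look like; the service class and run call are assumptions based on the description, not the exact source:

```python
# backend/backend/notification.py (sketch; actual wiring may differ)
from backend.notifications.notifications import NotificationManager

def main():
    """Dedicated entry point so the notification service runs standalone."""
    NotificationManager().run_service()

if __name__ == "__main__":
    main()
```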
## Test plan
- [x] Verify notification service starts correctly with new entry point
- [x] Confirm scheduler service runs without notification manager
- [x] Test docker-compose configuration with separate services
- [x] Validate service discovery between microservices
- [x] Run linting and type checking
🤖 Generated with [Claude Code](https://claude.ai/code)
## Summary
This PR resolves unclosed HTTP client session errors that were occurring
in the backend, particularly during file uploads and service-to-service
communication.
### Key Changes
- **Fixed GCS storage operations**: Convert
`gcloud.aio.storage.Storage()` to use async context managers in
`media.py` and `cloud_storage.py`
- **Enhanced service client cleanup**: Added proper cleanup methods to
`DynamicClient` class in `service.py` with `__del__` fallback and
context manager support
- **Application shutdown cleanup**: Added cloud storage handler cleanup
to FastAPI application lifespan
- **Updated test mocks**: Fixed test fixtures to properly mock async
context manager behavior
### Root Cause Analysis
The "Unclosed client session" and "Unclosed connector" errors were
caused by:
1. **GCS storage clients** not using context managers (agent image
uploads)
2. **Service HTTP clients** (`httpx.Client`/`AsyncClient`) not being
properly cleaned up in the `DynamicClient` class
### Technical Details
- All `gcloud.aio.storage.Storage()` instances now use `async with`
context managers (sketched after this list)
- `DynamicClient` class now has proper cleanup methods and context
manager support
- Application shutdown hook ensures cloud storage handlers are properly
closed
- Test fixtures updated to mock async context manager protocol
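A sketch of the context-manager change; bucket and object names are placeholders:

```python
from gcloud.aio.storage import Storage

async def upload_media(bucket: str, object_name: str, data: bytes) -> None:
    # `async with` guarantees the underlying aiohttp session is closed,
    # eliminating the "Unclosed client session" warnings.
    async with Storage() as storage:
        await storage.upload(bucket, object_name, data)
```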
### Testing
- ✅ All media upload tests pass
- ✅ Service client tests pass
- ✅ Linting and formatting pass
## Test plan
- [ ] Deploy to staging environment
- [ ] Monitor logs for "Unclosed client session" errors (should be
eliminated)
- [ ] Verify file upload functionality works correctly
- [ ] Check service-to-service communication operates normally
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
This PR addresses the recurring job validation failures by adding graph
validation before scheduling jobs. Previously, validation errors only
occurred at runtime during job execution, making it difficult to
communicate errors to users for scheduled recurring jobs.
### Changes 🏗️
- **Extract validation logic**: Created
`validate_and_construct_node_execution_input` wrapper function that
centralizes graph fetching, credential mapping, and validation logic
- **Add pre-scheduling validation**: Modified
`add_graph_execution_schedule` to validate graphs before creating
scheduled jobs
- **Make construct function private**: Renamed
`construct_node_execution_input` to `_construct_node_execution_input` to
prevent direct usage and encourage use of the wrapper
- **Reduce code duplication**: Eliminated duplicate validation logic
between scheduler and execution paths
- **Improve scheduler lifecycle management**:
- Enhanced cleanup process with proper event loop shutdown sequence
- Added graceful event loop thread termination with timeout
- Fixed thread lifecycle management to prevent resource leaks
- **Add helper utilities**:
- Created `run_async` helper to reduce
`asyncio.run_coroutine_threadsafe` boilerplate
- Added `SCHEDULER_OPERATION_TIMEOUT_SECONDS` constant for consistent
timeout handling across all scheduler operations
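A sketch of the `run_async` helper and timeout constant from the last two bullets above; the event-loop handle is illustrative:

```python
import asyncio

SCHEDULER_OPERATION_TIMEOUT_SECONDS = 300

def run_async(coro, loop: asyncio.AbstractEventLoop):
    """Run a coroutine on the scheduler's event loop from a worker thread."""
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    return future.result(timeout=SCHEDULER_OPERATION_TIMEOUT_SECONDS)
```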
### Technical Details
**Validation Flow:**
The validation now happens in `add_graph_execution_schedule` before
calling `scheduler.add_job()`, ensuring that:
1. Graph exists and is accessible to the user
2. All credentials are valid and available
3. Graph structure and node configurations are valid
4. Starting nodes are present and properly configured
This uses the same validation logic as runtime execution, guaranteeing
consistency.
**Scheduler Lifecycle Improvements:**
- **Proper cleanup sequence**: Event loop is stopped before thread
termination
- **Thread management**: Added global tracking of event loop thread for
proper cleanup
- **Timeout consistency**: All scheduler operations now use the same
300-second timeout
- **Resource management**: Prevents potential memory leaks from unclosed
event loops
**Code Quality Improvements:**
- **DRY principle**: `run_async` helper eliminates repeated
`asyncio.run_coroutine_threadsafe` patterns
- **Single source of truth**: All timeout values use
`SCHEDULER_OPERATION_TIMEOUT_SECONDS` constant
- **Cleaner abstractions**: Direct utility function calls instead of
unnecessary wrapper methods
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified imports work correctly for both scheduler and utils
modules
- [x] Confirmed code passes all linting and type checking
- [x] Validated that existing functionality remains intact
- [x] Tested that validation logic is properly extracted and reused
- [x] Verified scheduler cleanup process works correctly
- [x] Confirmed thread lifecycle management improvements
#### For configuration changes:
- [x] `.env.example` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
*Note: No configuration changes were required for this fix.*
## Impact
- **Prevents runtime failures**: Invalid graphs are caught before
scheduling instead of failing silently during execution
- **Better error communication**: Validation errors surface immediately
when scheduling
- **Improved resource management**: Proper event loop and thread cleanup
prevents memory leaks
- **Enhanced maintainability**: Single source of truth for validation
logic and consistent timeout handling
- **Reduced code duplication**: Eliminated ~30+ lines of duplicate code
across validation and async execution patterns
- **Better developer experience**: Cleaner code with helper functions
and consistent patterns
Resolves the TODO comment: "We need to communicate this error to the
user somehow" in scheduler.py:107
Co-authored-by: Claude <noreply@anthropic.com>
The RabbitMQ connection is unreliable (fixing that is a separate issue)
and sometimes gets restarted. The scope of this PR is to keep operations
from breaking when they execute on a stale, broken connection.
### Changes 🏗️
Fix executor running RabbitMQ operations on closed/closing connection
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Manually kill RabbitMQ while executing an agent and observe that
execution recovers
Introduces discriminated unions for time, date, and date-time format
selection, supporting both strftime and ISO 8601 (with timezone and
microsecond options). Updates schemas, test cases, and block logic to
handle the new format types, improving flexibility and standards
compliance for time and date outputs.
### Why these changes are needed
Users need to output timestamps in ISO 8601/RFC 3339 format for API
integrations and standardized data exchange. The previous implementation
only supported strftime formatting, which made it difficult to generate
properly formatted timestamps with timezone information. This change
enables:
- **Standards compliance**: ISO 8601 and RFC 3339 compliant timestamps
- **Timezone support**: 38 timezone options covering all UTC offsets
globally
- **API compatibility**: Many APIs require RFC 3339 timestamps (e.g.,
"2011-06-03T10:00:00-07:00")
- **Backward compatibility**: Existing workflows continue to work with
default strftime format
### Changes 🏗️
<!-- Concisely describe all of the changes made in this pull request:
-->
- **Added discriminated union format types** for all time/date blocks (see the sketch after this list):
- `GetCurrentTimeBlock`: Now supports `TimeStrftimeFormat` and
`TimeISO8601Format`
- `GetCurrentDateBlock`: Now supports `DateStrftimeFormat` and
`DateISO8601Format`
- `GetCurrentDateAndTimeBlock`: Now supports `StrftimeFormat` and
`ISO8601Format`
- **Implemented shared timezone support**:
- Created `TimezoneLiteral` type with 38 timezone options (all UTC
offsets)
- Supports fractional offsets (e.g., India UTC+05:30, Nepal UTC+05:45)
- Deduplicated timezone lists across all format classes
- **Added ISO 8601 format features**:
- Timezone-aware timestamps with proper offset formatting
- Optional microseconds inclusion
- RFC 3339 compliance (subset of ISO 8601 with mandatory timezone)
- **Updated test cases** for all three blocks to verify:
- Default behavior unchanged (backward compatibility)
- Custom strftime formats still work
- ISO 8601 format produces correct output
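A minimal sketch of the discriminated-union idea (class and field names are simplified, and a raw UTC offset stands in for the real `TimezoneLiteral` of 38 named options):

```python
from datetime import datetime, timedelta, timezone
from typing import Literal, Union

from pydantic import BaseModel, Field


class StrftimeFormat(BaseModel):
    type: Literal["strftime"] = "strftime"
    format: str = "%H:%M:%S"


class ISO8601Format(BaseModel):
    type: Literal["iso8601"] = "iso8601"
    utc_offset_minutes: int = 0  # e.g. 330 for India (UTC+05:30)
    include_microseconds: bool = False


FormatType = Union[StrftimeFormat, ISO8601Format]


class TimeInput(BaseModel):
    # Pydantic selects the right class based on the "type" field
    format: FormatType = Field(StrftimeFormat(), discriminator="type")


def render(now: datetime, fmt: FormatType) -> str:
    if isinstance(fmt, StrftimeFormat):
        return now.strftime(fmt.format)
    stamped = now.astimezone(timezone(timedelta(minutes=fmt.utc_offset_minutes)))
    if not fmt.include_microseconds:
        stamped = stamped.replace(microsecond=0)
    # tz-aware isoformat() is RFC 3339, e.g. "2011-06-03T10:00:00-07:00"
    return stamped.isoformat()


print(render(datetime.now(timezone.utc), ISO8601Format(utc_offset_minutes=-420)))
```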
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Verified backward compatibility - default strftime format
unchanged
- [x] Tested ISO 8601 format with UTC timezone
- [x] Tested ISO 8601 format with various timezones (India, New York,
etc.)
- [x] Tested microseconds option for ISO formats
- [x] Verified all existing tests pass for GetCurrentTimeBlock
- [x] Verified all existing tests pass for GetCurrentDateBlock
- [x] Verified all existing tests pass for GetCurrentDateAndTimeBlock
- [x] Manually tested each block with different format configurations
- [x] Confirmed RFC 3339 compliance for timestamps with mandatory
timezone
---------
Co-authored-by: Claude <claude@users.noreply.github.com>
This adds the latest Claude Opus 4.1 model to the platform.
It adds the following model:
- claude-opus-4-1-20250805
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Test Claude Opus 4.1 to make sure it works
This adds the latest ChatGPT models, GPT-5, to the platform ahead of
their release. The prices and context limits are still to be properly
set; for now I set them to match GPT-4.1, with the price set at 5 until
we know more.
This adds the following models:
- gpt-5
- gpt-5-mini
- gpt-5-nano
- gpt-5-chat
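To make the placeholder values concrete, a hypothetical sketch (this mapping and its field names are illustrative, not the platform's actual model registry):

```python
# Hypothetical placeholder metadata: price pinned at 5 and context limit
# mirroring GPT-4.1's documented window until official numbers are published.
GPT5_PLACEHOLDERS = {
    name: {"price": 5, "max_context_tokens": 1_047_576}
    for name in ("gpt-5", "gpt-5-mini", "gpt-5-nano", "gpt-5-chat")
}
```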
### Changes 🏗️
<!-- Concisely describe all of the changes made in this pull request:
-->
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Test all of the models to make sure they work
## Summary
- **Remove background_executor from NotificationManager** to eliminate
event loop conflicts that were causing RabbitMQ "Connection reset by
peer" errors
- **Convert all notification processing to fully async** using async
database clients
- **Optimize Settings instantiation** to prevent file descriptor leaks
by moving to module level
- **Fix scheduler event loop management** to use single shared loop
instead of thread-cached approach
## Changes 🏗️
### 1. Remove ProcessPoolExecutor from NotificationManager
- Eliminated `background_executor` entirely from notification service
- Converted `queue_weekly_summary()` and `process_existing_batches()`
from sync to async
- Fixed the root cause: `asyncio.run()` was creating new event loops,
conflicting with existing RabbitMQ connections
### 2. Full Async Conversion
- Updated `_consume_queue` to only accept async functions:
`Callable[[str], Awaitable[bool]]` (see the sketch after this list)
- Replaced sync `DatabaseManagerClient` with
`DatabaseManagerAsyncClient` throughout notification service
- Added missing async methods to `DatabaseManagerAsyncClient`:
- `get_active_user_ids_in_timerange`
- `get_user_email_by_id`
- `get_user_email_verification`
- `get_user_notification_preference`
- `create_or_add_to_user_notification_batch`
- `empty_user_notification_batch`
- `get_all_batches_by_type`
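A minimal sketch of the async-only consumer contract, using an in-memory queue as a stand-in for RabbitMQ (the handler and queue contents are illustrative):

```python
import asyncio
from typing import Awaitable, Callable


async def _consume_queue(
    messages: "asyncio.Queue[str]",
    handler: Callable[[str], Awaitable[bool]],
) -> None:
    """Dispatch each message body to an async handler; the bool result
    signals whether processing succeeded (ack) or should be retried."""
    while not messages.empty():
        body = await messages.get()
        if not await handler(body):
            await messages.put(body)  # naive requeue, just for the sketch
            break


async def handle_notification(body: str) -> bool:
    # The real handlers use DatabaseManagerAsyncClient here; this stub succeeds.
    return True


async def main() -> None:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    await queue.put('{"type": "weekly_summary"}')
    await _consume_queue(queue, handle_notification)


asyncio.run(main())
```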
### 3. Settings Optimization
- Moved `Settings()` instantiation to module level in:
- `backend/util/metrics.py`
- `backend/blocks/google_calendar.py`
- `backend/blocks/gmail.py`
- `backend/blocks/slant3d.py`
- `backend/blocks/user.py`
- Prevents multiple file descriptor reads per process, reducing resource
usage
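A minimal before/after sketch, assuming a pydantic-settings-style `Settings` (the field shown is hypothetical):

```python
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    sendgrid_api_key: str = ""


# Before: `Settings()` was constructed inside each function, re-reading env
# and secret files (and opening file descriptors) on every call.
# After: construct once at import time and share the instance module-wide.
settings = Settings()


def send_email() -> None:
    api_key = settings.sendgrid_api_key  # reuse the module-level instance
```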
### 4. Scheduler Event Loop Fix
- **Simplified event loop initialization** in `Scheduler.run_service()`
to create single shared loop
- **Removed complex thread caching and locking** that could create
multiple connections
- **Fixed daemon thread lifecycle** by using non-daemon thread with
proper cleanup
- **Event loop runs in dedicated background thread** with graceful
shutdown handling
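A minimal sketch of the single-shared-loop pattern (simplified from what `Scheduler.run_service()` does; names are illustrative):

```python
import asyncio
import threading

loop = asyncio.new_event_loop()


def _run_loop() -> None:
    asyncio.set_event_loop(loop)
    loop.run_forever()


# Non-daemon thread: shutdown waits for a clean stop instead of dying mid-task.
loop_thread = threading.Thread(target=_run_loop, name="scheduler-loop")
loop_thread.start()


async def _ping() -> str:
    return "ok"


# Any thread can submit work to the one shared loop.
print(asyncio.run_coroutine_threadsafe(_ping(), loop).result(timeout=5))

# Graceful shutdown: stop the loop first, then join the thread and close.
loop.call_soon_threadsafe(loop.stop)
loop_thread.join(timeout=300)
loop.close()
```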
## Root Cause Analysis
The RabbitMQ "Connection reset by peer" errors were caused by:
1. **Event Loop Conflicts**: `asyncio.run()` in `queue_weekly_summary`
created new event loops, disrupting existing RabbitMQ heartbeat
connections
2. **Thread Resource Waste**: Thread-cached event loops in scheduler
created unnecessary connections
3. **File Descriptor Leaks**: Multiple Settings instantiations per
process increased resource pressure
## Why This Fixes the Issue
1. **Eliminates Event Loop Creation**: By using `asyncio.create_task()`
instead of `asyncio.run()`, we reuse the existing event loop (see the
sketch after this list)
2. **Maintains Heartbeat Connections**: Async RabbitMQ connections
remain stable without event loop disruption
3. **Reduces Resource Pressure**: Settings optimization and simplified
scheduler reduce file descriptor usage
4. **Ensures Connection Stability**: Single shared event loop prevents
connection multiplexing issues
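A minimal sketch of the difference (function names are illustrative):

```python
import asyncio


async def queue_weekly_summary() -> None:
    await asyncio.sleep(0)  # stand-in for the real async DB/queue work


async def on_schedule() -> None:
    # Before: asyncio.run(queue_weekly_summary()) here would spin up a
    # brand-new event loop, disrupting the loop that owns the RabbitMQ
    # heartbeat. After: schedule the coroutine on the already-running loop.
    await asyncio.create_task(queue_weekly_summary())


asyncio.run(on_schedule())
```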
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified RabbitMQ connection stability by checking heartbeat logs
- [x] Confirmed async conversion maintains all notification
functionality
- [x] Tested scheduler job execution with simplified event loop
- [x] Validated Settings optimization reduces file descriptor usage
- [x] Ensured notification processing works end-to-end
---------
Co-authored-by: Claude <noreply@anthropic.com>
Some non-node-execution errors and system failures (like credentials not
found, or a database failure) are neither logged nor exposed to the
user. This makes the node execution look like it failed without an error
message:
<img width="804" height="1141" alt="image"
src="https://github.com/user-attachments/assets/e81314a0-b9af-4a95-bba7-8df576911e96"
/>
### Changes 🏗️
Yield all non-interruption errors as node execution error output.
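A minimal sketch of the pattern (names are hypothetical; the real executor distinguishes more interruption cases than shown here):

```python
import asyncio
from typing import AsyncIterator, Callable, Tuple


async def execute_node(
    run: Callable[[], AsyncIterator[Tuple[str, str]]]
) -> AsyncIterator[Tuple[str, str]]:
    """Yield (output_name, value) pairs; surface non-interruption failures
    as an "error" output instead of failing silently."""
    try:
        async for name, value in run():
            yield name, value
    except asyncio.CancelledError:
        raise  # interruptions still propagate so runs can be stopped cleanly
    except Exception as exc:
        yield "error", str(exc)


async def failing_run() -> AsyncIterator[Tuple[str, str]]:
    raise RuntimeError("credentials not found")
    yield  # unreachable; marks this function as an async generator


async def main() -> None:
    async for name, value in execute_node(failing_run):
        print(name, value)  # -> error credentials not found


asyncio.run(main())
```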
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] CI
This adds the latest open-source models from OpenAI to the platform; we
are using OpenRouter to provide API access to them!
I added:
- openai/gpt-oss-20b
- openai/gpt-oss-120b
### Changes 🏗️
<!-- Concisely describe all of the changes made in this pull request:
-->
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Test both of the latest models from OpenAI (openai/gpt-oss-20b and
openai/gpt-oss-120b) and confirm they work
In this PR, I’ve added library page tests.
### Changes
I’ve added 9 tests: 8 for normal flows and 1 for checking edge cases.
The test names are:
- Library navigation is accessible from the navbar.
- The library page loads successfully.
- Agents are visible, and cards work correctly.
- Pagination works correctly.
- Sorting works correctly.
- Searching works correctly.
- Pagination while searching works correctly.
- Uploading an agent works correctly.
- Edge case: Search edge cases and error handling behave correctly.
In addition, I’ve added a new utility that uses the build page to create
users at the start of a run, which we can use to test the library page.
- All tests are passing locally
<img width="514" height="465" alt="Screenshot 2025-07-12 at 11 13 41 AM"
src="https://github.com/user-attachments/assets/7a46c437-7db5-458b-b99a-4fa0d479866f"
/>
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All library tests pass locally and on CI.