Commit Graph

7785 Commits

Author SHA1 Message Date
Nicholas Tindle
590f434d0a feat(backend/chat): update CoPilot execution context mapping
Update run_block.py with proper CoPilot-to-graph context mapping:
- graph_id = copilot-session-{session_id} (agent = session)
- graph_exec_id = copilot-session-{session_id} (run = session)
- graph_version = 1 (versions are 1-indexed)
- Pass workspace_id and session_id for file operations
- Each chat session is its own agent with one continuous run

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 20:15:26 -06:00
Nicholas Tindle
8f171a0537 feat(backend/chat): add CoPilot workspace tools
Add tools for CoPilot to manage workspace files:
- list_workspace_files: list files with session scoping
- read_workspace_file: read file content or metadata
- write_workspace_file: save content to workspace
- delete_workspace_file: remove files
- Session-aware operations (default to current session folder)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 20:15:14 -06:00
Nicholas Tindle
c814a43465 feat(backend/api): add workspace REST endpoints
Add API routes for workspace file management:
- GET /api/workspace - get workspace info
- POST /api/workspace/files - upload file
- GET /api/workspace/files - list files
- GET /api/workspace/files/{id} - get file info
- GET /api/workspace/files/{id}/download - download file
- DELETE /api/workspace/files/{id} - soft delete
- Stream file content when signed URLs unavailable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 20:15:00 -06:00
Nicholas Tindle
5923041fe8 feat(backend): integrate workspace storage into store_media_file
Update store_media_file to use workspace when available:
- Save files to workspace instead of temp exec_file dir
- Return workspace:// references instead of base64 data URIs
- Handle workspace:// input references (read from workspace)
- Pass session_id to WorkspaceManager for session scoping
- Prevents context bloat (100KB file = ~133KB as base64)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 20:14:44 -06:00
Nicholas Tindle
936a2d70db feat(backend): add workspace_id and session_id to ExecutionContext
Extend ExecutionContext with workspace fields:
- workspace_id: user's workspace for file persistence
- session_id: chat session for file isolation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 20:14:26 -06:00
Nicholas Tindle
80c54b7f46 feat(backend): add workspace storage backend abstraction
Add storage layer for workspace files:
- WorkspaceStorageBackend: abstract interface
- GCSWorkspaceStorage: Google Cloud Storage implementation
- LocalWorkspaceStorage: local filesystem for self-hosted
- WorkspaceManager: high-level file operations with session scoping
- Session-scoped virtual paths: /sessions/{session_id}/{filename}
- Fallback to API proxy when GCS signed URLs unavailable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 20:14:09 -06:00
Nicholas Tindle
28caf01ca7 feat(backend/db): add UserWorkspace and UserWorkspaceFile models
Add database schema for persistent user workspace storage:
- UserWorkspace: one-per-user workspace container
- UserWorkspaceFile: file metadata with virtual paths, checksums, and source tracking
- WorkspaceFileSource enum: UPLOAD, EXECUTION, COPILOT, IMPORT
- CRUD operations in data/workspace.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 20:13:49 -06:00
Ubbe
071b3bb5cd fix(frontend): more copilot refinements (#11858)
## Changes 🏗️

On the **Copilot** page:

- prevent unnecessary sidebar repaints 
- show a disclaimer when switching chats on the sidebar to terminate a
current stream
- handle loading better
- save streams better when disconnecting


### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run the app locally and test the above
2026-01-28 00:49:28 +07:00
Swifty
2134d777be fix(backend): exclude disabled blocks from chat search and indexing (#11854)
## Summary

Disabled blocks (e.g., webhook blocks without `platform_base_url`
configured) were being indexed and returned in chat tool search results.
This PR ensures they are properly filtered out.

### Changes 🏗️

- **find_block.py**: Skip disabled blocks when enriching search results
- **content_handlers.py**: 
  - Skip disabled blocks during embedding indexing
- Update `get_stats()` to only count enabled blocks for accurate
coverage metrics

### Why

Blocks can be disabled for various reasons (missing OAuth config, no
platform URL for webhooks, etc.). These blocks shouldn't appear in
search results since users cannot use them.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified disabled blocks are filtered from search results
  - [x] Verified disabled blocks are not indexed
  - [x] Verified stats accurately reflect enabled block count
2026-01-27 15:21:13 +00:00
Ubbe
962824c8af refactor(frontend): copilot session management stream updates (#11853)
## Changes 🏗️

- **Fix infinite loop in copilot page**
- use Zustand selectors instead of full store object to get stable
function references
- **Centralize chat streaming logic**
- move all streaming files from `providers/chat-stream/` to
`components/contextual/Chat/` for better colocation and reusability
- **Rename `copilot-store` → `copilot-page-store`**: Clarify scope
- **Fix message duplication**
- Only replay chunks from active streams (not completed ones) since
backend already provides persisted messages in `initialMessages`
- **Auto-focus chat input**
  - Focus textarea when streaming ends and input is re-enabled
- **Graceful error display**
- Render tool response errors in muted style (small text + warning icon)
instead of raw "Error: ..." text

## Checklist 📋

### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Navigate to copilot page - no infinite loop errors
  - [x] Start a new chat, send message, verify streaming works
- [x] Navigate away and back to a completed session - no duplicate
messages
  - [x] After stream completes, verify chat input receives focus
  - [x] Trigger a tool error - verify it displays with muted styling
2026-01-27 22:09:25 +07:00
Zamil Majdy
3e9d5d0d50 fix(backend): handle race condition in review processing gracefully (#11845)
## Summary
- Fixes race condition when multiple concurrent requests try to process
the same reviews (e.g., double-click, multiple browser tabs)
- Previously the second request would fail with "Reviews not found,
access denied, or not in WAITING status"
- Now handles this gracefully by treating already-processed reviews with
the same decision as success

## Changes
- Added `get_reviews_by_node_exec_ids()` function that fetches reviews
regardless of status
- Modified `process_all_reviews_for_execution()` to handle
already-processed reviews
- Updated route to use idempotent validation

## Test plan
- [x] Linter passes (`poetry run ruff check`)
- [x] Type checker passes (`poetry run pyright`)
- [x] Formatter passes (`poetry run format`)
- [ ] Manual testing: double-click approve button should not cause
errors

Fixes AUTOGPT-SERVER-7HE
2026-01-27 21:43:31 +07:00
Swifty
fac10c422b fix(backend): add SSE heartbeats to prevent tool execution timeouts (#11855)
## Summary

Long-running chat tools (like `create_agent` and `edit_agent`) were
timing out because no SSE data was sent during tool execution. GCP load
balancers and proxies have idle connection timeouts (~60 seconds), and
when the external Agent Generator service takes longer than this, the
connection would drop.

This PR adds SSE heartbeat comments during tool execution to keep
connections alive.

### Changes 🏗️

- **response_model.py**: Added `StreamHeartbeat` response type that
emits SSE comments (`: heartbeat\n\n`)
- **service.py**: Modified `_yield_tool_call()` to:
  - Run tool execution in a background asyncio task
  - Yield heartbeat events every 15 seconds while waiting
- Handle task failures with explicit error responses (no silent
failures)
  - Handle cancellation gracefully
- **create_agent.py**: Improved error messages with more context and
details
- **edit_agent.py**: Improved error messages with more context and
details

### How It Works

```
Tool Call → Background Task Started
     │
     ├── Every 15 seconds: yield `: heartbeat\n\n` (SSE comment)
     │
     └── Task Complete → yield tool result OR error response
```

SSE comments (`: heartbeat\n\n`) are:
- Ignored by SSE clients (don't trigger events)
- Keep TCP connections alive through proxies/load balancers
- Don't affect the AI SDK data protocol

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All chat service tests pass (17 tests)
  - [x] Verified heartbeats are sent during long tool execution
  - [x] Verified errors are properly reported to frontend
2026-01-27 15:41:58 +01:00
Bently
91c7896859 fix(backend): implement context window management for long chat sessions (#11848)
## Changes 🏗️

Implements automatic context window management to prevent chat failures
when conversations exceed token limits.

  ### Problem
- **Issue**: [SECRT-1800] Long chat conversations stop working when
context grows beyond model limits (~113k tokens observed)
- **Root Cause**: Chat service sends ALL messages to LLM without
token-aware compression, eventually exceeding Claude Opus 4.5's 200k
context window

  ### Solution
  Implements a sliding window with summarization strategy:
1. Monitors token count before sending to LLM (triggers at 120k tokens)
2. Keeps last 15 messages completely intact (preserves recent
conversation flow)
  3. Summarizes older messages using gpt-4o-mini (fast & cheap)
4. Rebuilds context: `[system_prompt] + [summary] +
[recent_15_messages]`
5. Full history preserved in database (only compresses when sending to
LLM)

  ### Changes Made
- **Added** `_summarize_messages()` helper function to create concise
summaries using gpt-4o-mini
- **Modified** `_stream_chat_chunks()` to implement token counting and
conditional summarization
- **Integrated** existing `estimate_token_count()` utility for accurate
token measurement
- **Added** graceful fallback - continues with original messages if
summarization fails

  ## Motivation and Context 🎯

Without context management, users with long chat sessions (250+
messages) experience:
  - Complete chat failure when hitting 200k token limit
  - Lost conversation context
  - Poor user experience

  This fix enables:
  -  Unlimited conversation length
  -  Transparent operation (no UX changes)
  -  Preserved conversation quality (recent messages intact)
  -  Cost-efficient (~$0.0001 per summarization)

  ## Testing 🧪

  ### Expected Behavior
  - Conversations < 120k tokens: No change (normal operation)
  - Conversations > 120k tokens:
- Log message: `Context summarized: {tokens} tokens, kept last 15
messages + summary`
    - Chat continues working smoothly
    - Recent context remains intact

  ### How to Verify
  1. Start a chat session in copilot
  2. Send 250-600 messages (or 50+ with large code blocks)
  3. Check logs for "Context summarized:" message
  4. Verify chat continues working without errors
  5. Verify conversation quality remains good

  ## Checklist 

  - [x] My code follows the style guidelines of this project
  - [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
  - [x] My changes generate no new warnings
  - [x] I have tested my changes and verified they work as expected
2026-01-27 15:37:17 +01:00
Swifty
bab436231a refactor(backend): remove Langfuse tracing from chat system (#11829)
We are removing Langfuse tracing from the chat/copilot system in favor
of using OpenRouter's broadcast feature, which keeps our codebase
simpler. Langfuse prompt management is retained for fetching system
prompts.

### Changes 🏗️

**Removed Langfuse tracing:**
- Removed `@observe` decorators from all 11 chat tool files
- Removed `langfuse.openai` wrapper (now using standard `openai` client)
- Removed `start_as_current_observation` and `propagate_attributes`
context managers from `service.py`
- Removed `update_current_trace()`, `update_current_span()`,
`span.update()` calls

**Retained Langfuse prompt management:**
- `langfuse.get_prompt()` for fetching system prompts
- `_is_langfuse_configured()` check for prompt availability
- Configuration for `langfuse_prompt_name`

**Files modified:**
- `backend/api/features/chat/service.py`
- `backend/api/features/chat/tools/*.py` (11 tool files)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified `poetry run format` passes
  - [x] Verified no `@observe` decorators remain in chat tools
- [x] Verified Langfuse prompt fetching is still functional (code
preserved)
2026-01-27 13:07:42 +01:00
Zamil Majdy
859f3f8c06 feat(frontend): implement clarification questions UI for agent generation (#11833)
## Summary
Add interactive UI to collect user answers when the agent-generator
service returns clarifying questions during agent creation/editing.

Previously, when the backend asked clarifying questions, the frontend
would just display them as text with no way for users to answer. This
caused the chat to keep retrying without the necessary context.

## Changes
- **ChatMessageData type**: Add `clarification_needed` variant with
questions field
- **ClarificationQuestionsWidget**: New component with interactive form
to collect answers
- **parseToolResponse**: Detect and parse `clarification_needed`
responses from backend
- **ChatMessage**: Render the widget when clarification is needed

## How It Works
1. User requests to create/edit agent
2. Backend returns `ClarificationNeededResponse` with list of questions
3. Frontend shows interactive form with text inputs for each question
4. User fills in answers and clicks "Submit Answers"
5. Answers are sent back as context to the tool
6. Backend receives full context and continues

## UI Features
- Shows all questions with examples (if provided)
- Input validation (all questions must be answered to submit)
- Visual feedback (checkmarks when answered)
- Numbered questions for clarity
- Submit button disabled until all answered
- Follows same design pattern as `credentials_needed` flow

## Related
- Backend support for clarification was added in #11819
- Fixes the issue shown in the screenshot where users couldn't answer
clarifying questions

## Test plan
- [ ] Test creating agent that requires clarifying questions
- [ ] Verify questions are displayed in interactive form
- [ ] Verify all questions must be answered before submitting
- [ ] Verify answers are sent back to backend as context
- [ ] Verify agent creation continues with full context
2026-01-27 09:22:30 +00:00
Swifty
d5c0f5b2df refactor(backend): remove page context from chat service (#11844)
### Background
The chat service previously supported including page context (URL and
content) in user messages. This functionality is being removed.

### Changes 🏗️

- Removed page context handling from `stream_chat_completion` in the
chat service
- User messages are now passed directly without URL/content context
injection
- Removed associated logging for page context

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verify chat functionality works without page context
  - [x] Confirm no regressions in basic chat message handling
2026-01-26 16:00:48 +00:00
Ubbe
fbc2da36e6 fix(analytics): only try to init Posthog when on cloud (#11843)
## Changes 🏗️

This prevents Posthog from being initialised locally, where we should
not be collecting analytics during local development.

## Checklist 📋

### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run locally and test the above
2026-01-26 22:54:19 +07:00
Swifty
75ecc4de92 fix(backend): enforce block disabled flag on execution endpoints (#11839)
## Summary
This PR adds security checks to prevent execution of disabled blocks
across all block execution endpoints.

- Add `disabled` flag check to main web API endpoint
(`/api/blocks/{block_id}/execute`)
- Add `disabled` flag check to external API endpoint
(`/api/blocks/{block_id}/execute`)
- Add `disabled` flag check to chat tool block execution

Previously, block execution endpoints only checked if a block existed
but did not verify the `disabled` flag, allowing any authenticated user
to execute disabled blocks.

## Test plan
- [x] Verify disabled blocks return 403 Forbidden on main API endpoint
- [x] Verify disabled blocks return 403 Forbidden on external API
endpoint
- [x] Verify disabled blocks return error response in chat tool
execution
- [x] Verify enabled blocks continue to execute normally
2026-01-26 13:56:24 +00:00
Abhimanyu Yadav
f0c2503608 feat(frontend): support multiple node execution results and accumulated data display (#11834)
### Changes 🏗️

- Refactored node execution results storage to maintain a history of
executions instead of just the latest result
- Added support for viewing accumulated output data across multiple
executions
- Implemented a cleaner UI for viewing historical execution results with
proper grouping
- Added functionality to clear execution results when starting a new run
- Created helper functions to normalize and process execution data
consistently
- Updated the NodeDataViewer component to display both latest and
historical execution data
- Added ability to view input data alongside output data in the
execution history

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Create and run a flow with multiple blocks that produce output
- [x] Verify that execution results are properly accumulated and
displayed
- [x] Run the same flow multiple times and confirm historical data is
preserved
- [x] Test the "View more data" functionality to ensure it displays all
execution history
- [x] Verify that execution results are properly cleared when starting a
new run
2026-01-26 12:33:22 +00:00
Swifty
cfb7dc5aca feat(backend): Add PostHog analytics and OpenRouter tracing to chat system (#11828)
Adds analytics tracking to the chat copilot system for better
observability of user interactions and agent operations.

### Changes 🏗️

**PostHog Analytics Integration:**
- Added `posthog` dependency (v7.6.0) to track chat events
- Created new tracking module (`backend/api/features/chat/tracking.py`)
with events:
  - `chat_message_sent` - When a user sends a message
  - `chat_tool_called` - When a tool is called (includes tool name)
  - `chat_agent_run_success` - When an agent runs successfully
  - `chat_agent_scheduled` - When an agent is scheduled
  - `chat_trigger_setup` - When a trigger is set up
- Added PostHog configuration to settings:
  - `POSTHOG_API_KEY` - API key for PostHog
- `POSTHOG_HOST` - PostHog host URL (defaults to
`https://us.i.posthog.com`)

**OpenRouter Tracing:**
- Added `user` and `session_id` fields to chat completion API calls for
OpenRouter tracing
- Added `posthogDistinctId` and `posthogProperties` (with environment)
to API calls

**Files Changed:**
- `backend/api/features/chat/tracking.py` - New PostHog tracking module
- `backend/api/features/chat/service.py` - Added user message tracking
and OpenRouter tracing
- `backend/api/features/chat/tools/__init__.py` - Added tool call
tracking
- `backend/api/features/chat/tools/run_agent.py` - Added agent
run/schedule tracking
- `backend/util/settings.py` - Added PostHog configuration fields
- `pyproject.toml` - Added posthog dependency

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified code passes linting and formatting
- [x] Verified PostHog client initializes correctly when API key is
provided
- [x] Verified tracking is gracefully skipped when PostHog is not
configured

#### For configuration changes:

- [ ] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

**New environment variables (optional):**
- `POSTHOG_API_KEY` - PostHog project API key
- `POSTHOG_HOST` - PostHog host URL (optional, defaults to US cloud)
2026-01-26 12:26:15 +00:00
Zamil Majdy
9a6e17ff52 feat(backend): add external Agent Generator service integration (#11819)
## Summary
- Add support for delegating agent generation to an external
microservice when `AGENTGENERATOR_HOST` is configured
- Falls back to built-in LLM-based implementation when not configured
(default behavior)
- Add comprehensive tests for the service client and core integration
(34 tests)

## Changes
- Add `agentgenerator_host`, `agentgenerator_port`,
`agentgenerator_timeout` settings to `backend/util/settings.py`
- Add `service.py` client for external Agent Generator API endpoints:
  - `/api/decompose-description` - Break down goals into steps
  - `/api/generate-agent` - Generate agent from instructions
  - `/api/update-agent` - Generate patches to update existing agents
  - `/api/blocks` - Get available blocks
  - `/health` - Health check
- Update `core.py` to delegate to external service when configured
- Export `is_external_service_configured` and
`check_external_service_health` from the module

## Related PRs
- Infrastructure repo:
https://github.com/Significant-Gravitas/AutoGPT-cloud-infrastructure/pull/273

## Test plan
- [x] All 34 new tests pass (`poetry run pytest test/agent_generator/
-v`)
- [ ] Deploy with `AGENTGENERATOR_HOST` configured and verify external
service is used
- [ ] Verify built-in implementation still works when
`AGENTGENERATOR_HOST` is empty
2026-01-25 04:08:56 +07:00
Zamil Majdy
fb58827c61 feat(backend;frontend): Implement node-specific auto-approval, safety popup, and race condition fixes (#11810)
## Summary

This PR implements comprehensive improvements to the human-in-the-loop
(HITL) review system, including safety features, architectural changes,
and bug fixes:

### Key Features
- **SECRT-1798: One-time safety popup** - Shows informational popup
before first run of AI-generated agents with sensitive actions/HITL
blocks
- **SECRT-1795: Auto-approval toggle UX** - Toggle in pending reviews
panel to auto-approve future actions from the same node
- **Node-specific auto-approval** - Changed from execution-specific to
node-specific using special key pattern
`auto_approve_{graph_exec_id}_{node_id}`
- **Consolidated approval checking** - Merged `check_auto_approval` into
`check_approval` using single OR query for better performance
- **Race condition prevention** - Added execution status check before
resuming to prevent duplicate execution when approving while graph is
running
- **Parallel auto-approval creation** - Uses `asyncio.gather` for better
performance when creating multiple auto-approval records

## Changes

### Backend Architecture
- **`human_review.py`**: 
- Added `check_approval()` function that checks both normal and
auto-approval in single query
- Added `create_auto_approval_record()` for node-specific auto-approval
using special key pattern
- Added `get_auto_approve_key()` helper to generate consistent
auto-approval keys
- **`review/routes.py`**: 
- Added execution status check before resuming to prevent race
conditions
- Refactored auto-approval record creation to use parallel execution
with `asyncio.gather`
  - Removed obvious comments for cleaner code
- **`review/model.py`**: Added `auto_approve_future_actions` field to
`ReviewRequest`
- **`blocks/helpers/review.py`**: Updated to use consolidated
`check_approval` via database manager client
- **`executor/database.py`**: Exposed `check_approval` through
DatabaseManager RPC for block execution context
- **`data/block.py`**: Fixed safe mode checks for sensitive action
blocks

### Frontend
- **New `AIAgentSafetyPopup`** component with localStorage-based
one-time display
- **`PendingReviewsList`**: 
  - Replaced "Approve all future actions" button with toggle
- Toggle resets data to original values and disables editing when
enabled
  - Shows warning message explaining auto-approval behavior
- **`RunAgentModal`**: Integrated safety popup before first run
- **`usePendingReviews`**: Added polling for real-time badge updates
- **`FloatingSafeModeToggle` & `SafeModeToggle`**: Simplified visibility
logic
- **`local-storage.ts`**: Added localStorage key for popup state
tracking

### Bug Fixes
- Fixed "Client is not connected to query engine" error by using
database manager client pattern
- Fixed race condition where approving reviews while graph is RUNNING
could queue execution twice
- Fixed migration to only drop FK constraint, not non-existent column
- Fixed card data reset when auto-approve toggle changes

### Code Quality
- Removed duplicate/obvious comments
- Moved imports to top-level instead of local scope in tests
- Used walrus operator for cleaner conditional assignments
- Parallel execution for auto-approval record creation

## Test plan
- [ ] Create an AI-generated agent with sensitive actions (e.g., email
sending)
- [ ] First run should show the safety popup before starting
- [ ] Subsequent runs should not show the popup
- [ ] Clear localStorage (`AI_AGENT_SAFETY_POPUP_SHOWN`) to verify popup
shows again
- [ ] Create an agent with human-in-the-loop blocks
- [ ] Run it and verify the pending reviews panel appears
- [ ] Enable the "Auto-approve all future actions" toggle
- [ ] Verify editing is disabled and shows warning message
- [ ] Click "Approve" and verify subsequent blocks from same node
auto-approve
- [ ] Verify auto-approval persists across multiple executions of same
graph
- [ ] Disable toggle and verify editing works normally
- [ ] Verify "Reject" button still works regardless of toggle state
- [ ] Test race condition: Approve reviews while graph is RUNNING
(should skip resume)
- [ ] Test race condition: Approve reviews while graph is REVIEW (should
resume)
- [ ] Verify pending reviews badge updates in real-time when new reviews
are created
2026-01-25 04:05:25 +07:00
Zamil Majdy
595f3508c1 refactor(backend): consolidate embedding error logging to prevent Sentry spam (#11832)
## Summary

Refactors error handling in the embedding service to prevent Sentry
alert spam. Previously, batch operations would log one error per failed
file, causing hundreds of duplicate alerts. Now, exceptions bubble up
from individual functions and are aggregated at the batch level,
producing a single log entry showing all unique error types with counts.

## Changes

### Removed Error Swallowing
- Removed try/except blocks from `generate_embedding()`,
`store_content_embedding()`, `ensure_content_embedding()`,
`get_content_embedding()`, and `ensure_embedding()`
- These functions now raise exceptions instead of returning None/False
on failure
- Added docstring notes: "Raises exceptions on failure - caller should
handle"

### Improved Batch Error Aggregation
- Updated `backfill_all_content_types()` to aggregate unique errors
- Collects all exceptions from batch results
- Groups by error type and message, shows counts
- Single log entry per content type instead of per-file

### Example Output
Before: 50 separate error logs for same issue
After: `BLOCK: 50/100 embeddings failed. Errors: PrismaError: type
vector does not exist (50x)`

## Motivation

This was triggered by the AUTOGPT-SERVER-7D2 Sentry issue where pgvector
errors created hundreds of duplicate alerts. Even after the root cause
was fixed (stale database connections), the error logging pattern would
create spam for any future issues.

## Impact

-  Reduces Sentry noise - single alert per batch instead of per-file
-  Better diagnostics - shows all unique error types with counts
-  Cleaner code - removed ~24 lines of unnecessary error swallowing
-  Proper exception propagation follows Python best practices

## Testing

- Existing tests should pass (error handling moved to batch level)
- Error aggregation logic tested via
asyncio.gather(return_exceptions=True)

## Related Issues

- Fixes Sentry alert spam from AUTOGPT-SERVER-7D2
2026-01-24 21:49:32 +07:00
Ubbe
7892590b12 feat(frontend): refine copilot loading states (#11827)
## Changes 🏗️

- Make the loading UX better when switching between chats or loading a
new chat
- Make session/chat management logic more manageable
- Improving "Deep thinking" loading states
- Fix bug that happened when returning to chat after navigating away

## Checklist 📋

### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run the app locally and test the above
2026-01-23 18:25:45 +07:00
Bently
82d7134fc6 feat(blocks): Add ClaudeCodeBlock for executing tasks via Claude Code in E2B sandbox (#11761)
Introduces a new ClaudeCodeBlock that enables execution of coding tasks
using Anthropic's Claude Code in an E2B sandbox. This block unlocks
powerful agentic coding capabilities - Claude Code can autonomously
create files, install packages, run commands, and build complete
applications within a secure sandboxed environment.

Changes 🏗️

- New file backend/blocks/claude_code.py:
  - ClaudeCodeBlock - Execute tasks using Claude Code in an E2B sandbox
- Dual credential support: E2B API key (sandbox) + Anthropic API key
(Claude Code)
- Session continuation support via session_id, sandbox_id, and
conversation_history
- Automatic file extraction with path, relative_path, name, and content
fields
  - Configurable timeout, setup commands, and working directory
- dispose_sandbox option to keep sandbox alive for multi-turn
conversations

Checklist 📋

For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create and execute ClaudeCodeBlock with a simple prompt ("Create a
hello world HTML file")
- [x] Verify files output includes correct path, relative_path, name,
and content
- [x] Test session continuation by passing session_id and sandbox_id
back
- [x] Build "Any API → Instant App" demo agent combining Firecrawl +
ClaudeCodeBlock + GitHub blocks
- [x] Verify generated files are pushed to GitHub with correct folder
structure using relative_path

Here are two example agents i made that can be used to test this agent,
they require github, anthropic and e2b access via api keys that are set
via the user/on the platform is testing on dev

The first agent is my

Any API → Instant App
"Transform any API documentation into a fully functional web
application. Just provide a docs URL and get a complete, ready-to-deploy
app pushed to a new GitHub repository."

[Any API → Instant
App_v36.json](https://github.com/user-attachments/files/24600326/Any.API.Instant.App_v36.json)


The second agent is my
Idea to project
"Simply enter your coding project's idea and this agent will make all of
the base initial code needed for you to start working on that project
and place it on github for you!"

[Idea to
project_v11.json](https://github.com/user-attachments/files/24600346/Idea.to.project_v11.json)

If you have any questions or issues let me know.

References
https://e2b.dev/blog/python-guide-run-claude-code-in-an-e2b-sandbox

https://github.com/e2b-dev/e2b-cookbook/tree/main/examples/anthropic-claude-code-in-sandbox-python
https://code.claude.com/docs/en/cli-reference

I tried to use E2b's "anthropic-claude-code" template but it kept
complaining it was out of date, so I make it manually spin up a E2b
instance and make it install the latest claude code and it uses that
2026-01-23 10:05:32 +00:00
Nicholas Tindle
90466908a8 refactor(docs): restructure platform docs for GitBook and remove MkDo… (#11825)
<!-- Clearly explain the need for these changes: -->
we met some reality when merging into the docs site but this fixes it
### Changes 🏗️
updates paths, adds some guides
<!-- Concisely describe all of the changes made in this pull request:
-->
update to match reality
### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
  - [x] deploy it and validate

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Aligns block integrations documentation with GitBook.
> 
> - Changes generator default output to
`docs/integrations/block-integrations` and writes overview `README.md`
and `SUMMARY.md` at `docs/integrations/`
> - Adds GitBook frontmatter and hint syntax to overview; prefixes block
links with `block-integrations/`
> - Introduces `generate_summary_md` to build GitBook navigation
(including optional `guides/`)
> - Preserves per-block manual sections and adds optional `extras` +
file-level `additional_content`
> - Updates sync checker to validate parent `README.md` and `SUMMARY.md`
> - Rewrites `docs/integrations/README.md` with GitBook frontmatter and
updated links; adds `docs/integrations/SUMMARY.md`
> - Adds new guides: `guides/llm-providers.md`,
`guides/voice-providers.md`
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
fdb7ff8111. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: bobby.gaffin <bobby.gaffin@agpt.co>
2026-01-23 06:18:16 +00:00
Zamil Majdy
f9f984a8f4 fix(db): Remove redundant migration and fix pgvector schema handling (#11822)
### Changes 🏗️

This PR includes two database migration fixes:

#### 1. Remove redundant Supabase extensions migration

Removes the `20260112173500_add_supabase_extensions_to_platform_schema`
migration which was attempting to manage Supabase-provided extensions
and schemas.

**What was removed:**
- Migration that created extensions (pgcrypto, uuid-ossp,
pg_stat_statements, pg_net, pgjwt, pg_graphql, pgsodium, supabase_vault)
- Schema creation for these extensions

**Why it was removed:**
- These extensions and schemas are pre-installed and managed by Supabase
automatically
- The migration was redundant and could cause schema drift warnings
- Attempting to manage Supabase-owned resources in our migrations is an
anti-pattern

#### 2. Fix pgvector extension schema handling

Improves the `20260109181714_add_docs_embedding` migration to handle
cases where pgvector exists in the wrong schema.

**Problem:**
- If pgvector was previously installed in `public` schema, `CREATE
EXTENSION IF NOT EXISTS` would succeed but not actually install it in
the `platform` schema
- This causes `type "vector" does not exist` errors because the type
isn't in the search_path

**Solution:**
- Detect if vector extension exists in a different schema than the
current one
- Drop it with CASCADE and reinstall in the correct schema (platform)
- Use dynamic SQL with `EXECUTE format()` to explicitly specify the
target schema
- Split exception handling: catch errors during removal, but let
installation fail naturally with clear PostgreSQL errors

**Impact:**
- No functional changes - Supabase continues to provide extensions as
before
- pgvector now correctly installs in the platform schema
- Cleaner migration history
- Prevents schema-related errors

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified migrations run successfully without the redundant file
  - [x] Confirmed Supabase extensions are still available
  - [x] Tested pgvector migration handles wrong-schema scenario
  - [x] No schema drift warnings

#### For configuration changes:
- [x] .env.default is updated or already compatible with my changes
- [x] docker-compose.yml is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
  - N/A - No configuration changes required
2026-01-22 12:06:00 +00:00
Abhimanyu Yadav
fc87ed4e34 feat(ci): add integration test job and rename e2e test job (#11820)
### Changes 🏗️

- Renamed the `test` job to `e2e_test` in the CI workflow for better
clarity
- Added a new `integration_test` job to the CI workflow that runs unit
tests using `pnpm test:unit`
- Created a basic integration test for the MainMarketplacePage component
to verify CI functionality

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified the CI workflow runs both e2e and integration tests
  - [x] Confirmed the integration test for MainMarketplacePage passes

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
2026-01-22 11:14:48 +00:00
Abhimanyu Yadav
b0953654d9 feat(frontend): add integration testing setup with Vitest, MSW, and RTL (#11813)
### Changes 🏗️

- Added Vitest and React Testing Library for frontend unit testing
- Configured MSW (Mock Service Worker) for API mocking in tests
- Created test utilities and setup files for integration tests
- Added comprehensive testing documentation in `AGENTS.md`
- Updated Orval configuration to generate MSW mock handlers
- Added mock server and browser implementations for development testing

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run `pnpm test:unit` to verify tests pass
  - [x] Verify MSW mock handlers are generated correctly
  - [x] Check that test utilities work with sample component tests

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2026-01-22 10:10:00 +00:00
Ubbe
c5069ca48f fix(frontend): chat UX improvements (#11804)
### Changes 🏗️

<img width="1920" height="998" alt="Screenshot 2026-01-19 at 22 14 51"
src="https://github.com/user-attachments/assets/ecd1c241-6f77-4702-9774-5e58806b0b64"
/>

This PR lays the groundwork for the new UX of AutoGPT Copilot. 
- moves the Copilot to its own route `/copilot`
- Makes the Copilot the homepage when enabled
- Updates the labelling of the homepage icons
- Makes the Library the homepage when Copilot is disabled
- Improves Copilot's:
  - session handling
  - styles and UX
  - message parsing
  
### Other improvements

- Improve the log out UX by adding a new `/logout` page and using a
re-direct

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run locally and test the above

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Launches the new Copilot experience and aligns API behavior with the
UI.
> 
> - **Routing/Home**: Add `/copilot` with `CopilotShell` (desktop
sidebar + mobile drawer), make homepage route flag-driven; update
login/signup/error redirects and root page to use `getHomepageRoute`.
> - **Chat UX**: Replace legacy chat with `components/contextual/Chat/*`
(new message list, bubbles, tool call/response formatting, stop button,
initial-prompt handling, refined streaming/error handling); remove old
platform chat components.
> - **Sessions**: Add paginated session list (infinite load),
auto-select/create logic, mobile/desktop navigation, and improved
session fetching/claiming guards.
> - **Auth/Logout**: New `/logout` flow with delayed redirect; gate
various queries on auth state and logout-in-progress.
> - **Backend**: `GET /api/chat/sessions/{id}` returns `null` instead of
404; service saves assistant message on `StreamFinish` to avoid loss and
prevents duplicate saves; OpenAPI updated accordingly.
> - **Misc**: Minor UI polish in library modals, loader styling, docs
(CONTRIBUTING) additions, and small formatting fixes in block docs
generator.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
1b4776dcf5. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
2026-01-22 16:43:42 +07:00
Zamil Majdy
5d0cd88d98 fix(backend): Use unqualified vector type for pgvector queries (#11818)
## Summary
- Remove explicit schema qualification (`{schema}.vector` and
`OPERATOR({schema}.<=>)`) from pgvector queries in `embeddings.py` and
`hybrid_search.py`
- Use unqualified `::vector` type cast and `<=>` operator which work
because pgvector is in the search_path on all environments

## Problem
The previous approach tried to explicitly qualify the vector type with
schema names, but this failed because:
- **CI environment**: pgvector is in `public` schema → `platform.vector`
doesn't exist
- **Dev (Supabase)**: pgvector is in `platform` schema → `public.vector`
doesn't exist

## Solution
Use unqualified `::vector` and `<=>` operator. PostgreSQL resolves these
via `search_path`, which includes the schema where pgvector is installed
on all environments.

Tested on both local and dev environments with a test script that
verified:
-  Unqualified `::vector` type cast
-  Unqualified `<=>` operator in ORDER BY
-  Unqualified `<=>` in SELECT (similarity calculation)
-  Combined query patterns matching actual usage

## Test plan
- [ ] CI tests pass
- [ ] Marketplace approval works on dev after deployment

Fixes: AUTOGPT-SERVER-763, AUTOGPT-SERVER-764, AUTOGPT-SERVER-76B
autogpt-platform-beta-v0.6.43
2026-01-21 18:11:58 +00:00
Zamil Majdy
033f58c075 fix(backend): Make Redis event bus gracefully handle connection failures (#11817)
## Summary
Adds graceful error handling to AsyncRedisEventBus and RedisEventBus so
that connection failures log exceptions with full traceback while
remaining non-breaking. This allows DatabaseManager to operate without
Redis connectivity.

## Problem
DatabaseManager was failing with "Authentication required" when trying
to publish notifications via AsyncRedisNotificationEventBus. The service
has no Redis credentials configured, causing `increment_onboarding_runs`
to fail.

## Root Cause
When `increment_onboarding_runs` publishes a notification:
1. Calls `AsyncRedisNotificationEventBus().publish()`
2. Attempts to connect to Redis via `get_redis_async()`
3. Connection fails due to missing credentials
4. Exception propagates, failing the entire DB operation

Previous fix (#11775) made the cache module lazy, but didn't address the
notification bus which also requires Redis.

## Solution
Wrap Redis operations in try-except blocks:
- `publish_event`: Logs exception with traceback, continues without
publishing
- `listen_events`: Logs exception with traceback, returns empty
generator
- `wait_for_event`: Returns None on connection failure

Using `logger.exception()` instead of `logger.warning()` ensures full
stack traces are captured for debugging while keeping operations
non-breaking.

This allows services to operate without Redis when only using event bus
for non-critical notifications.

## Changes
- Modified `backend/data/event_bus.py`:
- Added graceful error handling to `RedisEventBus` and
`AsyncRedisEventBus`
- All Redis operations now catch exceptions and log with
`logger.exception()`
- Added `backend/data/event_bus_test.py`:
  - Tests verify graceful degradation when Redis is unavailable
  - Tests verify normal operation when Redis is available

## Test Plan
- [x] New tests verify graceful degradation when Redis unavailable
- [x] Existing notification tests still pass
- [x] DatabaseManager can increment onboarding runs without Redis

## Related Issues
Fixes https://significant-gravitas.sentry.io/issues/7205834440/
(AUTOGPT-SERVER-76D)
2026-01-21 15:51:26 +00:00
Ubbe
40ef2d511f fix(frontend): auto-select credentials correctly in old builder (#11815)
## Changes 🏗️

On the **Old Builder**, when running an agent...

### Before

<img width="800" height="614" alt="Screenshot 2026-01-21 at 21 27 05"
src="https://github.com/user-attachments/assets/a3b2ec17-597f-44d2-9130-9e7931599c38"
/>

Credentials are there, but it is not recognising them, you need to click
on them to be selected

### After

<img width="1029" height="728" alt="Screenshot 2026-01-21 at 21 26 47"
src="https://github.com/user-attachments/assets/c6e83846-6048-439e-919d-6807674f2d5a"
/>

It uses the new credentials UI and correctly auto-selects existing ones.

### Other

Fixed a small timezone display glitch on the new library view.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run agent in old builder
- [x] Credentials are auto-selected and using the new collapsed system
credentials UI
2026-01-21 14:55:49 +00:00
Zamil Majdy
b714c0c221 fix(backend): handle null values in GraphSettings validation (#11812)
## Summary
- Fixes AUTOGPT-SERVER-76H - Error parsing LibraryAgent from database
due to null values in GraphSettings fields
- When parsing LibraryAgent settings from the database, null values for
`human_in_the_loop_safe_mode` and `sensitive_action_safe_mode` were
causing Pydantic validation errors
- Adds `BeforeValidator` annotations to coerce null values to their
defaults (True and False respectively)

## Test plan
- [x] Verified with unit tests that GraphSettings can now handle
None/null values
- [x] Backend tests pass
- [x] Manually tested with all scenarios (None, empty dict, explicit
values)
2026-01-21 08:40:38 -05:00
Krzysztof Czerwinski
ebabc4287e feat(platform): New LLM Picker UI (#11726)
Add new LLM Picker for the new Builder.

### Changes 🏗️

- Enrich `LlmModelMeta` (in `llm.py`) with human readable model, creator
and provider names and price tier (note: this is temporary measure and
all LlmModelMeta will be removed completely once LLM Registry is ready)
- Add provider icons
- Add custom input field `LlmModelField` and its components&helpers

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] LLM model picker works correctly in the new Builder
  - [x] Legacy LLM model picker works in the old Builder
2026-01-21 10:52:55 +00:00
Zamil Majdy
8b25e62959 feat(backend,frontend): add explicit safe mode toggles for HITL and sensitive actions (#11756)
## Summary

This PR introduces two explicit safe mode toggles for controlling agent
execution behavior, providing clearer and more granular control over
when agents should pause for human review.

### Key Changes

**New Safe Mode Settings:**
- **`human_in_the_loop_safe_mode`** (bool, default `true`) - Controls
whether human-in-the-loop (HITL) blocks pause for review
- **`sensitive_action_safe_mode`** (bool, default `false`) - Controls
whether sensitive action blocks pause for review

**New Computed Properties on LibraryAgent:**
- `has_human_in_the_loop` - Indicates if agent contains HITL blocks
- `has_sensitive_action` - Indicates if agent contains sensitive action
blocks

**Block Changes:**
- Renamed `requires_human_review` to `is_sensitive_action` on blocks for
clarity
- Blocks marked as `is_sensitive_action=True` pause only when
`sensitive_action_safe_mode=True`
- HITL blocks pause when `human_in_the_loop_safe_mode=True`

**Frontend Changes:**
- Two separate toggles in Agent Settings based on block types present
- Toggle visibility based on `has_human_in_the_loop` and
`has_sensitive_action` computed properties
- Settings cog hidden if neither toggle applies
- Proper state management for both toggles with defaults

**AI-Generated Agent Behavior:**
- AI-generated agents set `sensitive_action_safe_mode=True` by default
- This ensures sensitive actions are reviewed for AI-generated content

## Changes

**Backend:**
- `backend/data/graph.py` - Updated `GraphSettings` with two boolean
toggles (non-optional with defaults), added `has_sensitive_action`
computed property
- `backend/data/block.py` - Renamed `requires_human_review` to
`is_sensitive_action`, updated review logic
- `backend/data/execution.py` - Updated `ExecutionContext` with both
safe mode fields
- `backend/api/features/library/model.py` - Added
`has_human_in_the_loop` and `has_sensitive_action` to `LibraryAgent`
- `backend/api/features/library/db.py` - Updated to use
`sensitive_action_safe_mode` parameter
- `backend/executor/utils.py` - Simplified execution context creation

**Frontend:**
- `useAgentSafeMode.ts` - Rewritten to support two independent toggles
- `AgentSettingsModal.tsx` - Shows two separate toggles
- `SelectedSettingsView.tsx` - Shows two separate toggles
- Regenerated API types with new schema

## Test Plan

- [x] All backend tests pass (Python 3.11, 3.12, 3.13)
- [x] All frontend tests pass
- [x] Backend format and lint pass
- [x] Frontend format and lint pass
- [x] Pre-commit hooks pass

---------

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-01-21 00:56:02 +00:00
Zamil Majdy
35a13e3df5 fix(backend): Use explicit schema qualification for pgvector types (#11805)
## Summary
- Fix intermittent "type 'vector' does not exist" errors when using
PgBouncer in transaction mode
- The issue was that `SET search_path` and the actual query could run on
different backend connections
- Use explicit schema qualification (`{schema}.vector`,
`OPERATOR({schema}.<=>)`) instead of relying on search_path

## Test plan
- [x] Tested vector type cast on local: `'[1,2,3]'::platform.vector`
works
- [x] Tested OPERATOR syntax on local: `OPERATOR(platform.<=>)` works
- [x] Tested on dev via kubectl exec: both work correctly
- [ ] Deploy to dev and verify backfill_missing_embeddings endpoint no
longer errors

## Related Issues
Fixes: AUTOGPT-SERVER-763, AUTOGPT-SERVER-764

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 22:18:16 +00:00
Mewael Tsegay Desta
2169b433c9 feat(backend/blocks): add ConcatenateListsBlock (#11567)
# feat(backend/blocks): add ConcatenateListsBlock

## Description

This PR implements a new block `ConcatenateListsBlock` that concatenates
multiple lists into a single list. This addresses the "good first issue"
for implementing a list concatenation block in the platform/blocks area.

The block takes a list of lists as input and combines all elements in
order into a single concatenated list. This is useful for workflows that
need to merge data from multiple sources or combine results from
different operations.

### Changes 🏗️

- **Added `ConcatenateListsBlock` class** in
`autogpt_platform/backend/backend/blocks/data_manipulation.py`
- Input: `lists: List[List[Any]]` - accepts a list of lists to
concatenate
- Output: `concatenated_list: List[Any]` - returns a single concatenated
list
- Error output: `error: str` - provides clear error messages for invalid
input types
  - Block ID: `3cf9298b-5817-4141-9d80-7c2cc5199c8e`
- Category: `BlockCategory.BASIC` (consistent with other list
manipulation blocks)
  
- **Added comprehensive test suite** in
`autogpt_platform/backend/test/blocks/test_concatenate_lists.py`
  - Tests using built-in `test_input`/`test_output` validation
- Manual test cases covering edge cases (empty lists, single list, empty
input)
  - Error handling tests for invalid input types
  - Category consistency verification
  - All tests passing

- **Implementation details:**
  - Uses `extend()` method for efficient list concatenation
  - Preserves element order from all input lists
- **Runtime type validation**: Explicitly checks `isinstance(lst, list)`
before calling `extend()` to prevent:
- Strings being iterated character-by-character (e.g., `extend("abc")` →
`['a', 'b', 'c']`)
    - Non-iterable types causing `TypeError` (e.g., `extend(1)`)
  - Clear error messages indicating which index has invalid input
- Handles edge cases: empty lists, empty input, single list, None values
  - Follows existing block patterns and conventions

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run `poetry run pytest test/blocks/test_concatenate_lists.py -v` -
all tests pass
  - [x] Verified block can be imported and instantiated
  - [x] Tested with built-in test cases (4 test scenarios)
  - [x] Tested manual edge cases (empty lists, single list, empty input)
  - [x] Tested error handling for invalid input types
  - [x] Verified category is `BASIC` for consistency
  - [x] Verified no linting errors
- [x] Confirmed block follows same patterns as other blocks in
`data_manipulation.py`

#### Code Quality:
- [x] Code follows existing patterns and conventions
- [x] Type hints are properly used
- [x] Documentation strings are clear and descriptive
- [x] Runtime type validation implemented
- [x] Error handling with clear error messages
- [x] No linting errors
- [x] Prisma client generated successfully

### Testing

**Test Results:**
```
test/blocks/test_concatenate_lists.py::test_concatenate_lists_block_builtin_tests PASSED
test/blocks/test_concatenate_lists.py::test_concatenate_lists_manual PASSED

============================== 2 passed in 8.35s ==============================
```

**Test Coverage:**
- Basic concatenation: `[[1, 2, 3], [4, 5, 6]]` → `[1, 2, 3, 4, 5, 6]`
- Mixed types: `[["a", "b"], ["c"], ["d", "e", "f"]]` → `["a", "b", "c",
"d", "e", "f"]`
- Empty list handling: `[[1, 2], []]` → `[1, 2]`
- Empty input: `[]` → `[]`
- Single list: `[[1, 2, 3]]` → `[1, 2, 3]`
- Error handling: Invalid input types (strings, non-lists) produce clear
error messages
- Category verification: Confirmed `BlockCategory.BASIC` for consistency

### Review Feedback Addressed

- **Category Consistency**: Changed from `BlockCategory.DATA` to
`BlockCategory.BASIC` to match other list manipulation blocks
(`AddToListBlock`, `FindInListBlock`, etc.)
- **Type Robustness**: Added explicit runtime validation with
`isinstance(lst, list)` check before calling `extend()` to prevent:
  - Strings being iterated character-by-character
  - Non-iterable types causing `TypeError`
- **Error Handling**: Added `error` output field with clear, descriptive
error messages indicating which index has invalid input
- **Test Coverage**: Added test case for error handling with invalid
input types

### Related Issues

- Addresses: "Implement block to concatenate lists" (good first issue,
platform/blocks, hacktoberfest)

### Notes

- This is a straightforward data manipulation block that doesn't require
external dependencies
- The block will be automatically discovered by the block loading system
- No database or configuration changes required
- Compatible with existing workflow system
- All review feedback has been addressed and incorporated


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds a new list utility and updates docs.
> 
> - **New block**: `ConcatenateListsBlock` in
`backend/blocks/data_manipulation.py`
> - Input `lists: List[List[Any]]`; outputs `concatenated_list` or
`error`
> - Skips `None` entries; emits error for non-list items; preserves
order
> - **Docs**: Adds "Concatenate Lists" section to
`docs/integrations/basic.md` and links it in
`docs/integrations/README.md`
> - **Contributor guide**: New `docs/CLAUDE.md` with manual doc section
guidelines
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
4f56dd86c2. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 18:04:12 +00:00
Nicholas Tindle
fa0b7029dd fix(platform): make chat credentials type selection deterministic (#11795)
## Background

When using chat to run blocks/agents that support multiple credential
types (e.g., GitHub blocks support both `api_key` and `oauth2`), users
reported that the credentials setup UI would randomly show either "Add
API key" or "Connect account (OAuth)" - seemingly at random between
requests or server restarts.

## Root Cause

The bug was in how the backend selected which credential type to return
when building the missing credentials response:

```python
cred_type = next(iter(field_info.supported_types), "api_key")
```

The problem is that `supported_types` is a **frozenset**. When you call
`iter()` on a frozenset and take `next()`, the iteration order is
**non-deterministic** due to Python's hash randomization. This means:
- `frozenset({'api_key', 'oauth2'})` could iterate as either
`['api_key', 'oauth2']` or `['oauth2', 'api_key']`
- The order varies between Python process restarts and sometimes between
requests
- This caused the UI to randomly show different credential options

### Changes 🏗️

**Backend (`utils.py`, `run_block.py`, `run_agent.py`):**
- Added `_serialize_missing_credential()` helper that uses `sorted()`
for deterministic ordering
- Added `build_missing_credentials_from_graph()` and
`build_missing_credentials_from_field_info()` utilities
- Now returns both `type` (first sorted type, for backwards compat) and
`types` (full array with ALL supported types)

**Frontend (`helpers.ts`, `ChatCredentialsSetup.tsx`,
`useChatMessage.ts`):**
- Updated to read the `types` array from backend response
- Changed `credentialType` (single) to `credentialTypes` (array)
throughout the chat credentials flow
- Passes all supported types to `CredentialsInput` via
`credentials_types` schema field

### Result

Now `useCredentials.ts` correctly sets both `supportsApiKey=true` AND
`supportsOAuth2=true` when both are supported, ensuring:
1. **Deterministic behavior** - no more random type selection
2. **All saved credentials shown** - credentials of any supported type
appear in the selection list

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified GitHub block shows consistent credential options across
page reloads
- [x] Verified both OAuth and API key credentials appear in selection
when user has both saved
- [x] Verified backend returns `types: ["api_key", "oauth2"]` array
(checked via Python REPL)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Ensures deterministic credential type selection and surfaces all
supported types end-to-end.
> 
> - Backend: add `_serialize_missing_credential`,
`build_missing_credentials_from_graph/field_info`;
`run_agent`/`run_block` now return missing credentials with stable
ordering and both `type` (first) and `types` (all).
> - Frontend: chat helpers and UI (`helpers.ts`,
`ChatCredentialsSetup.tsx`, `useChatMessage.ts`) now read `types`,
switch from single `credentialType` to `credentialTypes`, and pass all
supported `credentials_types` in schemas.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
7d80f4f0e0. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
2026-01-20 16:19:57 +00:00
Abhimanyu Yadav
c20ca47bb0 feat(frontend): enhance RunGraph and RunInputDialog components with loading states and improved UI (#11808)
### Changes 🏗️

- Enhanced UI for the Run Graph button with improved loading states and
animations
- Added color-coded edges in the flow editor based on output data types
- Improved the layout of the Run Input Dialog with a two-column grid
design
- Refined the styling of flow editor controls with consistent icon sizes
and colors
- Updated tutorial icons with better color and size customization
- Fixed credential field display to show provider name with "credential"
suffix
- Optimized draft saving by excluding node position changes to prevent
excessive saves when dragging nodes

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified that the Run Graph button shows proper loading states
  - [x] Confirmed that edges display correct colors based on data types
- [x] Tested the Run Input Dialog layout with various input
configurations
  - [x] Checked that flow editor controls display consistently
  - [x] Verified that tutorial icons render properly
  - [x] Confirmed credential fields show proper provider names
- [x] Tested that dragging nodes doesn't trigger unnecessary draft saves
2026-01-20 15:50:23 +00:00
Abhimanyu Yadav
7756e2d12d refactor(frontend): refactor credentials input with unified CredentialsGroupedView component (#11801)
### Changes 🏗️

- Refactored the credentials input handling in the RunInputDialog to use
the shared CredentialsGroupedView component
- Moved CredentialsGroupedView from agent library to a shared component
location for reuse
- Fixed source name handling in edge creation to properly handle tool
source names
- Improved node output UI by replacing custom expand/collapse with
Accordion component
- Fixed timing of hardcoded values synchronization with handle IDs to
ensure proper loading
- Enabled NEW_FLOW_EDITOR and BUILDER_VIEW_SWITCH feature flags by
default

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified credentials input works in both agent run dialog and
builder run dialog
  - [x] Confirmed node output accordion works correctly
- [x] Tested flow editor with tools to ensure source name handling works
properly
  - [x] Verified hardcoded values sync correctly with handle IDs

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2026-01-20 12:20:25 +00:00
Swifty
bc75d70e7d refactor(backend): Improve Langfuse tracing with v3 SDK patterns and @observe decorators (#11803)
<!-- Clearly explain the need for these changes: -->

This PR improves the Langfuse tracing implementation in the chat feature
by adopting the v3 SDK patterns, resulting in cleaner code and better
observability.

### Changes 🏗️

- **Simplified Langfuse client usage**: Replace manual client
initialization with `langfuse.get_client()` global singleton
- **Use v3 context managers**: Switch to
`start_as_current_observation()` and `propagate_attributes()` for
automatic trace propagation
- **Auto-instrument OpenAI calls**: Use `langfuse.openai` wrapper for
automatic LLM call tracing instead of manual generation tracking
- **Add `@observe` decorators**: All chat tools now have
`@observe(as_type="tool")` decorators for automatic tool execution
tracing:
  - `add_understanding`
  - `view_agent_output` (renamed from `agent_output`)
  - `create_agent`
  - `edit_agent`
  - `find_agent`
  - `find_block`
  - `find_library_agent`
  - `get_doc_page`
  - `run_agent`
  - `run_block`
  - `search_docs`
- **Remove manual trace lifecycle**: Eliminated the verbose `finally`
block that manually ended traces/generations
- **Rename tool**: `agent_output` → `view_agent_output` for clarity

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified chat feature works with Langfuse tracing enabled
- [x] Confirmed traces appear correctly in Langfuse dashboard with tool
spans
  - [x] Tested tool execution flows show up as nested observations

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

No configuration changes required - uses existing Langfuse environment
variables.
2026-01-19 20:56:51 +00:00
Nicholas Tindle
c1a1767034 feat(docs): Add block documentation auto-generation system (#11707)
- Add generate_block_docs.py script that introspects block code to
generate markdown
- Support manual content preservation via <!-- MANUAL: --> markers
- Add migrate_block_docs.py to preserve existing manual content from git
HEAD
- Add CI workflow (docs-block-sync.yml) to fail if docs drift from code
- Add Claude PR review workflow (docs-claude-review.yml) for doc changes
- Add manual LLM enhancement workflow (docs-enhance.yml)
- Add GitBook configuration (.gitbook.yaml, SUMMARY.md)
- Fix non-deterministic category ordering (categories is a set)
- Add comprehensive test suite (32 tests)
- Generate docs for 444 blocks with 66 preserved manual sections

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

<!-- Clearly explain the need for these changes: -->

### Changes 🏗️

<!-- Concisely describe all of the changes made in this pull request:
-->

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
  - [x] Extensively test code generation for the docs pages



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces an automated documentation pipeline for blocks and
integrates it into CI.
> 
> - Adds `scripts/generate_block_docs.py` (+ tests) to introspect blocks
and generate `docs/integrations/**`, preserving `<!-- MANUAL: -->`
sections
> - New CI workflows: **docs-block-sync** (fails if docs drift),
**docs-claude-review** (AI review for block/docs PRs), and
**docs-enhance** (optional LLM improvements)
> - Updates existing Claude workflows to use `CLAUDE_CODE_OAUTH_TOKEN`
instead of `ANTHROPIC_API_KEY`
> - Improves numerous block descriptions/typos and links across backend
blocks to standardize docs output
> - Commits initial generated docs including
`docs/integrations/README.md` and many provider/category pages
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
631e53e0f6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 07:03:19 +00:00
Nicholas Tindle
1b56ff13d9 test 2026-01-18 15:32:10 -06:00
Zamil Majdy
f31c160043 feat(platform): add endedAt field and fix execution analytics timestamps (#11759)
## Summary

This PR adds proper execution end time tracking and fixes timestamp
handling throughout the execution analytics system.

### Key Changes

1. **Added `endedAt` field to database schema** - Executions now have a
dedicated field for tracking when they finish
2. **Fixed timestamp nullable handling** - `started_at` and `ended_at`
are now properly nullable in types
3. **Fixed chart aggregation** - Reduced threshold from ≥3 to ≥1
executions per day
4. **Improved timestamp display** - Moved timestamps to expandable
details section in analytics table
5. **Fixed nullable timestamp bugs** - Updated all frontend code to
handle null timestamps correctly

## Problem Statement

### Issue 1: Missing Execution End Times
Previously, executions used `updatedAt` (last DB update) as a proxy for
"end time". This broke when adding correctness scores retroactively -
the end time would change to whenever the score was added, not when the
execution actually finished.

### Issue 2: Chart Shows Only One Data Point
The accuracy trends chart showed only one data point despite having
executions across multiple days. Root cause: aggregation required ≥3
executions per day.

### Issue 3: Incorrect Type Definitions
Manually maintained types defined `started_at` and `ended_at` as
non-nullable `Date`, contradicting reality where QUEUED executions
haven't started yet.

## Solution

### Database Schema (`schema.prisma`)
```prisma
model AgentGraphExecution {
  // ...
  startedAt DateTime?
  endedAt   DateTime?  // NEW FIELD
  // ...
}
```

### Execution Lifecycle
- **QUEUED**: `startedAt = null`, `endedAt = null` (not started)
- **RUNNING**: `startedAt = set`, `endedAt = null` (in progress)  
- **COMPLETED/FAILED/TERMINATED**: `startedAt = set`, `endedAt = set`
(finished)

### Migration Strategy
```sql
-- Add endedAt column
ALTER TABLE "AgentGraphExecution" ADD COLUMN "endedAt" TIMESTAMP(3);

-- Backfill ONLY terminal executions (prevents marking RUNNING executions as ended)
UPDATE "AgentGraphExecution"
SET "endedAt" = "updatedAt"
WHERE "endedAt" IS NULL
  AND "executionStatus" IN ('COMPLETED', 'FAILED', 'TERMINATED');
```

## Changes by Component

### Backend

**`schema.prisma`**
- Added `endedAt` field to `AgentGraphExecution`

**`execution.py`**
- Made `started_at` and `ended_at` optional with Field descriptions
- Updated `from_db()` to use `endedAt` instead of `updatedAt`
- `update_graph_execution_stats()` sets `endedAt` when status becomes
terminal

**`execution_analytics_routes.py`**
- Removed `created_at`/`updated_at` from `ExecutionAnalyticsResult` (DB
metadata, not execution data)
- Kept only `started_at`/`ended_at` (actual execution runtime)
- Made settings global (avoid recreation)
- Moved OpenAI key validation to `_process_batch` (only check when LLM
actually runs)

**`analytics.py`**
- Fixed aggregation: `COUNT(*) >= 1` (was 3) - include all days with ≥1
execution
- Uses `createdAt` for chart grouping (when execution was queued)

**`late_execution_monitor.py`**
- Handle optional `started_at` with fallback to `datetime.min` for
sorting
- Display "Not started" when `started_at` is null

### Frontend

**Type Definitions**
- Fixed manually maintained `types.ts`: `started_at: Date | null` (was
non-nullable)
- Generated types were already correct

**Analytics Components**
- `AnalyticsResultsTable.tsx`: Show only `started_at`/`ended_at` in
2-column expandable grid
- `ExecutionAnalyticsForm.tsx`: Added filter explanation UI

**Monitoring Components** - Fixed null handling bugs:
- `OldAgentLibraryView.tsx`: Handle null in reduce function
- `agent-runs-selector-list.tsx`: Safe sorting with `?.getTime() ?? 0`
- `AgentFlowList.tsx`: Filter/sort with null checks
- `FlowRunsStatus.tsx`: Filter null timestamps
- `FlowRunsTimeline.tsx`: Filter executions with null timestamps before
rendering
- `monitoring/page.tsx`: Safe sorting
- `ActivityItem.tsx`: Fallback to "recently" for null timestamps

## Benefits

 **Accurate End Times**: `endedAt` is frozen when execution finishes,
not updated later
 **Type Safety**: Nullable types match reality, exposing real bugs  
 **Better UX**: Chart shows all days with data (not just days with ≥3
executions)
 **Bug Fixes**: 7+ frontend components now handle null timestamps
correctly
 **Documentation**: Field descriptions explain when timestamps are null

## Testing

### Backend
```bash
cd autogpt_platform/backend
poetry run format  #  All checks passed
poetry run lint    #  All checks passed
```

### Frontend  
```bash
cd autogpt_platform/frontend
pnpm format        #  All checks passed
pnpm lint          #  All checks passed
pnpm types         #  All type errors fixed
```

### Test Data Generation
Created script to generate 35 test executions across 7 days with
correctness scores:
```bash
poetry run python scripts/generate_test_analytics_data.py
```

## Migration Notes

⚠️ **Important**: The migration only backfills `endedAt` for executions
with terminal status (COMPLETED, FAILED, TERMINATED). Active executions
(QUEUED, RUNNING) correctly keep `endedAt = null`.

## Breaking Changes

None - this is backward compatible:
- `endedAt` is nullable, existing code that doesn't use it is unaffected
- Frontend already used generated types which were correct
- Migration safely backfills historical data

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces explicit execution end-time tracking and normalizes
timestamp handling across backend and frontend.
> 
> - Adds `endedAt` to `AgentGraphExecution` (schema + migration);
backfills terminal executions; sets `endedAt` on terminal status updates
> - Makes `GraphExecutionMeta.started_at/ended_at` optional; updates
`from_db()` to use DB `endedAt`; exposes timestamps in
`ExecutionAnalyticsResult`
> - Moves OpenAI key validation into batch processing; instantiates
`Settings` once
> - Accuracy trends: reduce daily aggregation threshold to `>= 1`;
optional historical series
> - Monitoring/analytics UI: results table shows/export
`started_at`/`ended_at`; adds chart filter explainer
> - Frontend null-safety: update types (`Date | null`) and fix
sorting/filtering/rendering for nullable timestamps across monitoring
and library views
> - Late execution monitor: safe sorting/display when `started_at` is
null
> - OpenAPI specs updated for new/nullable fields
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
1d987ca6e5. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-01-16 21:44:24 +00:00
Nicholas Tindle
06550a87eb feat(backend): add missed default credentials (#11760)
### Changes 🏗️

**Fixed missing default credentials and provider name mismatch in the
credentials store:**

1. **Provider name correction** (`credentials_store.py:97-103`)
- Changed `provider="unreal"` → `provider="unreal_speech"` to match the
existing `unreal_speech_api_key` setting and block usage
- Updated title from "Use Credits for Unreal" → "Use Credits for Unreal
Speech" for clarity

2. **Added missing OpenWeatherMap credentials**
(`credentials_store.py:219-226`)
- New `openweathermap_credentials` definition with `APIKeyCredentials`
- Uses existing `settings.secrets.openweathermap_api_key` setting that
was previously defined but had no credential object
   - Added to `DEFAULT_CREDENTIALS` list

3. **Fixed credentials not exposed in `get_all_creds()`**
(`credentials_store.py:343-354`)
- Added `llama_api_credentials` conditional append (was defined but not
returned to users)
- Added `v0_credentials` conditional append (was defined but not
returned to users)
   - Added `openweathermap_credentials` conditional append

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified provider name `unreal_speech` matches block usage in
`text_to_speech_block.py`
  - [x] Confirmed `openweathermap_api_key` setting exists in secrets
- [x] Confirmed `llama_api_key` and `v0_api_key` settings exist in
secrets

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Aligns backend credential definitions and exposes missing system
creds; updates frontend to hide new built-ins.
> 
> - Backend `credentials_store.py`:
>   - Corrects `provider` to `unreal_speech` and updates title
> - Adds `openweathermap_credentials`; includes in `DEFAULT_CREDENTIALS`
and `get_all_creds()` when key present
> - Ensures `llama_api_credentials` and `v0_credentials` are returned by
`get_all_creds()`
> - Frontend `integrations/page.tsx`:
> - Extends `hiddenCredentials` with IDs for `v0`, `webshare_proxy`, and
`openweathermap`
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
e7d46b76c6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
2026-01-16 21:18:12 +00:00
Nicholas Tindle
088b9998dc fix(frontend): Fix flaky agent-activity tests by targeting correct agent (#11790)
This PR fixes flaky agent-activity Playwright tests that were failing
intermittently in CI.

Closes #11789

### Changes 🏗️

- **Navigate to specific agent by name**: Replace
`LibraryPage.clickFirstAgent(page)` with
`LibraryPage.navigateToAgentByName(page, "Test Agent")` to ensure we're
testing the correct agent rather than relying on the first agent in the
list
- **Add retry mechanism for async data loading**: Replace direct
visibility check with `expect(...).toPass({ timeout: 15000 })` pattern
to properly handle asynchronous agent data fetching
- **Increase timeout**: Extended timeout from 8000ms to 15000ms to
accommodate slower CI environments

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified the test file syntax is correct
- [x] Changes target the correct file
(`autogpt_platform/frontend/src/tests/agent-activity.spec.ts`)
- [x] The retry mechanism follows Playwright best practices using
`toPass()`

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
(N/A - no config changes)
- [x] `docker-compose.yml` is updated or already compatible with my
changes (N/A - no config changes)
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**) (N/A - no config changes)

---------

Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
2026-01-16 20:33:47 +00:00
Nicholas Tindle
05c89fa5c0 feat(claude): add vercel-react-best-practices skill (#11777) 2026-01-16 09:40:58 -07:00
Swifty
8cc8295f14 feat(backend): add agent generator tools for chat copilot (#11781)
This PR adds the ability to create and edit agents from natural language
descriptions in the chat copilot.

### Changes 🏗️

- Added `agent_generator/` module with:
  - LLM client for OpenAI API calls
- Core generation logic for decomposing goals and generating agent JSON
  - Fixer module to correct common LLM generation errors
  - Validator to ensure generated agents are structurally valid
  - Prompts for goal decomposition and agent generation
  - Utility functions for blocks info and agent saving
- Added `CreateAgentTool` - creates new agents from natural language
descriptions
- Added `EditAgentTool` - edits existing agents using natural language
patches
- Added response models: `AgentPreviewResponse`, `AgentSavedResponse`,
`ClarificationNeededResponse`
- Registered new tools in the tools registry

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run `poetry run format` to ensure code passes linting
- [x] Test creating an agent via chat with a natural language
description
  - [x] Test editing an existing agent via chat
2026-01-16 17:11:57 +01:00
Swifty
e55f05c7a8 feat(backend): add chat search tools and BM25 reranking (#11782)
This PR adds new chat tools for searching blocks and documentation,
along with BM25 reranking for improved search relevance.

### Changes 🏗️

**New Chat Tools:**
- `find_block` - Search for available blocks by name/description using
hybrid search
- `run_block` - Execute a block directly with provided inputs and
credentials
- `search_docs` - Search documentation with section-level granularity  
- `get_doc_page` - Retrieve full documentation page content

**Search Improvements:**
- Added BM25 reranking to hybrid search for better lexical relevance
- Documentation handler now chunks markdown by headings (##) for
finer-grained embeddings
- Section-based content IDs (`doc_path::section_index`) for precise doc
retrieval
- Startup embedding backfill in scheduler for immediate searchability

**Other Changes:**
- New response models for block and documentation search results
- Updated orphan cleanup to handle section-based doc embeddings
- Added `rank-bm25` dependency for BM25 scoring
- Removed max message limit check in chat service

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run find_block tool to search for blocks (e.g., "current time")
  - [x] Run run_block tool to execute a found block
  - [x] Run search_docs tool to search documentation
  - [x] Run get_doc_page tool to retrieve full doc content
- [x] Verify BM25 reranking improves search relevance for exact term
matches
  - [x] Verify documentation sections are properly chunked and embedded

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

**Dependencies added:** `rank-bm25` for BM25 scoring algorithm
2026-01-16 16:18:10 +01:00