Compare commits


3 Commits

Nicholas Tindle
8892bcd230 docs: Add workspace and media file architecture documentation (#11989)
### Changes 🏗️

- Added comprehensive architecture documentation at `docs/platform/workspace-media-architecture.md` covering:
  - Database models (`UserWorkspace`, `UserWorkspaceFile`)
  - `WorkspaceManager` API with session scoping
  - `store_media_file()` media normalization pipeline (input types, return formats)
  - Virus scanning responsibility boundaries
  - Decision tree for choosing `WorkspaceManager` vs `store_media_file()`
  - Configuration reference including `clamav_max_concurrency` and `clamav_mark_failed_scans_as_clean`
  - Common patterns with error handling examples
- Updated `autogpt_platform/backend/CLAUDE.md` with a "Workspace & Media Files" section referencing the new docs
- Removed the duplicate `scan_content_safe()` call from `WriteWorkspaceFileTool`: `WorkspaceManager.write_file()` already scans internally, so the tool was double-scanning every file
- Replaced the removed comment in `workspace.py` with an explicit ownership comment clarifying that `WorkspaceManager` is the single scanning boundary

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Verified `scan_content_safe()` is called inside `WorkspaceManager.write_file()` (workspace.py:186)
  - [x] Verified `store_media_file()` scans all input branches including local paths (file.py:351)
  - [x] Verified documentation accuracy against current source code after merge with dev
  - [x] CI checks all passing

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Mostly adds documentation and internal developer guidance; the only
code change is a comment clarifying `WorkspaceManager.write_file()` as
the single virus-scanning boundary, with no behavior change.
> 
> **Overview**
> Adds a new `docs/platform/workspace-media-architecture.md` describing
the Workspace storage layer vs the `store_media_file()` media pipeline,
including session scoping and virus-scanning/persistence responsibility
boundaries.
> 
> Updates backend `CLAUDE.md` to point contributors to the new doc when
working on CoPilot uploads/downloads or
`WorkspaceManager`/`store_media_file()`, and clarifies in
`WorkspaceManager.write_file()` (comment-only) that callers should not
duplicate virus scanning.
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 18fcfa03f8.</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 06:12:26 +00:00
Zamil Majdy
48ff8300a4 Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev 2026-03-17 13:13:42 +07:00
Otto
0b594a219c feat(copilot): support prompt-in-URL for shareable prompt links (#12406)
Requested by @torantula

Add support for shareable AutoPilot URLs that contain a prompt in the
URL hash fragment, inspired by [Lovable's
implementation](https://docs.lovable.dev/integrations/build-with-url).

**URL format:**
- `/copilot#prompt=URL-encoded-text` — pre-fills the input for the user
to review before sending
- `/copilot?autosubmit=true#prompt=...` — auto-creates a session and
sends the prompt immediately

**Example:**
```
https://platform.agpt.co/copilot#prompt=Create%20a%20todo%20app
https://platform.agpt.co/copilot?autosubmit=true#prompt=Create%20a%20todo%20app
```
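The following is a minimal Python sketch showing how such a link can be built. The PR's actual code is TypeScript. The base URL comes from the examples above, and `build_copilot_link` is a hypothetical helper. The prompt is percent-encoded with `safe=""`, so characters such as `&`, `#` and `/` inside the prompt cannot break the fragment.

```python
from urllib.parse import quote

# Assumption: base URL taken from the examples above; adjust per environment.
BASE_URL = "https://platform.agpt.co/copilot"


def build_copilot_link(prompt: str, autosubmit: bool = False) -> str:
    """Build a shareable CoPilot link with the prompt in the URL fragment."""
    # safe="" also escapes "/", "?", "#" and "&" inside the prompt text
    encoded = quote(prompt, safe="")
    query = "?autosubmit=true" if autosubmit else ""
    return f"{BASE_URL}{query}#prompt={encoded}"


print(build_copilot_link("Create a todo app"))
# https://platform.agpt.co/copilot#prompt=Create%20a%20todo%20app
print(build_copilot_link("Create a todo app", autosubmit=True))
# https://platform.agpt.co/copilot?autosubmit=true#prompt=Create%20a%20todo%20app
```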

**Key design decisions:**
- Uses URL fragment (`#`) instead of query params — fragments never hit
the server, so prompts stay client-side only (better for privacy, no
backend URL length limits)
- URL is cleaned via `history.replaceState` immediately after extraction
to prevent re-triggering on navigation/reload
- Leverages existing `pendingMessage` + `createSession()` flow for
auto-submit — no new backend APIs needed
- For populate-only mode, passes `initialPrompt` down through component
tree to pre-fill the chat input
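The extraction and cleanup steps can be mirrored in Python for illustration. This is a sketch of the same logic as the TypeScript `extractPromptFromUrl`, not the shipped code, and `extract_prompt` is a hypothetical name. The fragment is parsed like a query string, as `URLSearchParams` does. The function returns the trimmed prompt, the `autosubmit` flag, and the path plus query to hand to `history.replaceState`, with the hash and only the `autosubmit` parameter removed.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def extract_prompt(href: str) -> Optional[tuple[str, bool, str]]:
    """Return (prompt, autosubmit, cleaned_path_and_query), or None."""
    parts = urlsplit(href)
    if not parts.fragment:
        return None
    # Parse the fragment as key=value pairs; "+" and %XX are decoded
    prompt = parse_qs(parts.fragment).get("prompt", [""])[0].strip()
    if not prompt:
        return None  # missing or whitespace-only prompt
    query = parse_qs(parts.query, keep_blank_values=True)
    autosubmit = query.get("autosubmit", [""])[0] == "true"
    # Drop the fragment and the autosubmit param; keep other query params
    kept = [(k, v) for k, values in query.items() if k != "autosubmit" for v in values]
    cleaned = urlunsplit(("", "", parts.path, urlencode(kept), ""))
    return prompt, autosubmit, cleaned


print(extract_prompt(
    "https://platform.agpt.co/copilot?autosubmit=true&ref=x#prompt=Create%20a%20todo%20app"
))
# ('Create a todo app', True, '/copilot?ref=x')
```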

**Files changed:**
- `useCopilotPage.ts` — URL hash extraction logic + `initialPrompt`
state
- `CopilotPage.tsx` — passes `initialPrompt` to `ChatContainer`
- `ChatContainer.tsx` — passes `initialPrompt` to `EmptySession`
- `EmptySession.tsx` — passes `initialPrompt` to `ChatInput`
- `ChatInput.tsx` / `useChatInput.ts` — accepts `initialValue` to
pre-fill the textarea

Fixes SECRT-2119

---
Co-authored-by: Toran Bruce Richards (@Torantulino) <toran@agpt.co>
2026-03-13 23:54:54 +07:00
16 changed files with 455 additions and 300 deletions


@@ -178,6 +178,16 @@ yield "image_url", result_url
3. Write tests alongside the route file
4. Run `poetry run test` to verify
## Workspace & Media Files
**Read [Workspace & Media Architecture](../../docs/platform/workspace-media-architecture.md) when:**
- Working on CoPilot file upload/download features
- Building blocks that handle `MediaFileType` inputs/outputs
- Modifying `WorkspaceManager` or `store_media_file()`
- Debugging file persistence or virus scanning issues
Covers: `WorkspaceManager` (persistent storage with session scoping), `store_media_file()` (media normalization pipeline), and responsibility boundaries for virus scanning and persistence.
## Security Implementation
### Cache Protection Middleware


@@ -453,9 +453,6 @@ async def create_library_agent(
sensitive_action_safe_mode=sensitive_action_safe_mode,
).model_dump()
),
topIntegrations=SafeJson(
library_model._compute_top_integrations(graph_entry)
),
**(
{"Folder": {"connect": {"id": folder_id}}}
if folder_id and graph_entry is graph
@@ -624,15 +621,6 @@ async def update_library_agent_version_and_settings(
user_id=user_id,
settings=updated_settings,
)
# Recompute top integrations on version update
top_integrations = library_model._compute_top_integrations(agent_graph)
await prisma.models.LibraryAgent.prisma().update(
where={"id": library.id},
data={"topIntegrations": SafeJson(top_integrations)},
)
library.top_integrations = top_integrations
return library


@@ -1,4 +1,3 @@
import collections
import datetime
from enum import Enum
from typing import TYPE_CHECKING, Any, Optional
@@ -7,7 +6,6 @@ import prisma.enums
import prisma.models
import pydantic
from backend.blocks._base import BlockCategory
from backend.data.graph import GraphModel, GraphSettings, GraphTriggerInfo
from backend.data.model import (
CredentialsMetaInput,
@@ -146,15 +144,6 @@ class RecentExecution(pydantic.BaseModel):
activity_summary: str | None = None
def _parse_top_integrations(
raw: object, graph: GraphModel
) -> list[dict[str, str]]:
"""Parse topIntegrations from database, falling back to on-the-fly computation."""
if raw and isinstance(raw, list) and len(raw) > 0:
return [dict(item) for item in raw]
return _compute_top_integrations(graph)
def _parse_settings(settings: dict | str | None) -> GraphSettings:
"""Parse settings from database, handling both dict and string formats."""
if settings is None:
@@ -167,62 +156,6 @@ def _parse_settings(settings: dict | str | None) -> GraphSettings:
return GraphSettings()
# Priority order for category-based integration entries
_CATEGORY_PRIORITY: list[BlockCategory] = [
BlockCategory.AI,
BlockCategory.SOCIAL,
BlockCategory.COMMUNICATION,
BlockCategory.DEVELOPER_TOOLS,
BlockCategory.DATA,
BlockCategory.CRM,
BlockCategory.PRODUCTIVITY,
BlockCategory.ISSUE_TRACKING,
BlockCategory.TEXT,
BlockCategory.SEARCH,
BlockCategory.MULTIMEDIA,
BlockCategory.MARKETING,
BlockCategory.LOGIC,
BlockCategory.BASIC,
BlockCategory.INPUT,
BlockCategory.OUTPUT,
]
def _compute_top_integrations(
graph: GraphModel,
) -> list[dict[str, str]]:
"""Compute the top integrations used by an agent's graph.
Returns up to 5 entries: providers first (by frequency), then categories.
"""
provider_counter: collections.Counter[str] = collections.Counter()
category_counter: collections.Counter[BlockCategory] = collections.Counter()
for g in [graph, *graph.sub_graphs]:
for node in g.nodes:
for info in node.block.input_schema.get_credentials_fields_info().values():
for provider in info.provider:
provider_counter[provider] += 1
if node.block.categories:
for cat in node.block.categories:
category_counter[cat] += 1
result: list[dict[str, str]] = [
{"name": name, "type": "provider"}
for name, _ in provider_counter.most_common(5)
]
if len(result) < 5:
for cat in _CATEGORY_PRIORITY:
if len(result) >= 5:
break
if category_counter.get(cat, 0) > 0:
result.append({"name": cat.name, "type": "category"})
return result
class LibraryAgent(pydantic.BaseModel):
"""
Represents an agent in the library, including metadata for display and
@@ -282,7 +215,6 @@ class LibraryAgent(pydantic.BaseModel):
recommended_schedule_cron: str | None = None
settings: GraphSettings = pydantic.Field(default_factory=GraphSettings)
top_integrations: list[dict[str, str]] = pydantic.Field(default_factory=list)
marketplace_listing: Optional["MarketplaceListing"] = None
@staticmethod
@@ -423,9 +355,6 @@ class LibraryAgent(pydantic.BaseModel):
folder_name=agent.Folder.name if agent.Folder else None,
recommended_schedule_cron=agent.AgentGraph.recommendedScheduleCron,
settings=_parse_settings(agent.settings),
top_integrations=_parse_top_integrations(
agent.topIntegrations, graph
),
marketplace_listing=marketplace_listing_data,
)


@@ -183,7 +183,8 @@ class WorkspaceManager:
f"{Config().max_file_size_mb}MB limit"
)
# Virus scan content before persisting (defense in depth)
# Scan here — callers must NOT duplicate this scan.
# WorkspaceManager owns virus scanning for all persisted files.
await scan_content_safe(content, filename=filename)
# Determine path with session scoping


@@ -1,2 +0,0 @@
-- AlterTable
ALTER TABLE "platform"."LibraryAgent" ADD COLUMN "topIntegrations" JSONB NOT NULL DEFAULT '[]';


@@ -440,8 +440,6 @@ model LibraryAgent {
settings Json @default("{}")
topIntegrations Json @default("[]")
@@unique([userId, agentGraphId, agentGraphVersion])
@@index([agentGraphId, agentGraphVersion])
@@index([creatorId])


@@ -1,3 +1,4 @@
import { useCopilotUIStore } from "@/app/(platform)/copilot/store";
import { ChangeEvent, FormEvent, useEffect, useState } from "react";
interface Args {
@@ -16,6 +17,16 @@ export function useChatInput({
}: Args) {
const [value, setValue] = useState("");
const [isSending, setIsSending] = useState(false);
const { initialPrompt, setInitialPrompt } = useCopilotUIStore();
useEffect(
function consumeInitialPrompt() {
if (!initialPrompt) return;
setValue((prev) => (prev.length === 0 ? initialPrompt : prev));
setInitialPrompt(null);
},
[initialPrompt, setInitialPrompt],
);
useEffect(
function focusOnMount() {


@@ -7,6 +7,10 @@ export interface DeleteTarget {
}
interface CopilotUIState {
/** Prompt extracted from URL hash (e.g. /copilot#prompt=...) for input prefill. */
initialPrompt: string | null;
setInitialPrompt: (prompt: string | null) => void;
sessionToDelete: DeleteTarget | null;
setSessionToDelete: (target: DeleteTarget | null) => void;
@@ -31,6 +35,9 @@ interface CopilotUIState {
}
export const useCopilotUIStore = create<CopilotUIState>((set) => ({
initialPrompt: null,
setInitialPrompt: (prompt) => set({ initialPrompt: prompt }),
sessionToDelete: null,
setSessionToDelete: (target) => set({ sessionToDelete: target }),


@@ -19,6 +19,42 @@ import { useCopilotStream } from "./useCopilotStream";
const TITLE_POLL_INTERVAL_MS = 2_000;
const TITLE_POLL_MAX_ATTEMPTS = 5;
/**
* Extract a prompt from the URL hash fragment.
* Supports: /copilot#prompt=URL-encoded-text
* Optionally auto-submits if ?autosubmit=true is in the query string.
* Returns null if no prompt is present.
*/
function extractPromptFromUrl(): {
prompt: string;
autosubmit: boolean;
} | null {
if (typeof window === "undefined") return null;
const hash = window.location.hash;
if (!hash) return null;
const hashParams = new URLSearchParams(hash.slice(1));
const prompt = hashParams.get("prompt");
if (!prompt || !prompt.trim()) return null;
const searchParams = new URLSearchParams(window.location.search);
const autosubmit = searchParams.get("autosubmit") === "true";
// Clean up hash + autosubmit param only (preserve other query params)
const cleanURL = new URL(window.location.href);
cleanURL.hash = "";
cleanURL.searchParams.delete("autosubmit");
window.history.replaceState(
null,
"",
`${cleanURL.pathname}${cleanURL.search}`,
);
return { prompt: prompt.trim(), autosubmit };
}
interface UploadedFile {
file_id: string;
name: string;
@@ -127,6 +163,28 @@ export function useCopilotPage() {
}
}, [sessionId, pendingMessage, sendMessage]);
// --- Extract prompt from URL hash on mount (e.g. /copilot#prompt=Hello) ---
const { setInitialPrompt } = useCopilotUIStore();
const hasProcessedUrlPrompt = useRef(false);
useEffect(() => {
if (hasProcessedUrlPrompt.current) return;
const urlPrompt = extractPromptFromUrl();
if (!urlPrompt) return;
hasProcessedUrlPrompt.current = true;
if (urlPrompt.autosubmit) {
setPendingMessage(urlPrompt.prompt);
void createSession().catch(() => {
setPendingMessage(null);
setInitialPrompt(urlPrompt.prompt);
});
} else {
setInitialPrompt(urlPrompt.prompt);
}
}, [createSession, setInitialPrompt]);
async function uploadFiles(
files: File[],
sid: string,


@@ -12,7 +12,6 @@ import Avatar, {
AvatarImage,
} from "@/components/atoms/Avatar/Avatar";
import { Link } from "@/components/atoms/Link/Link";
import { IntegrationLinkImage } from "@/components/molecules/IntegrationLinkImage/IntegrationLinkImage";
import { AgentCardMenu } from "./components/AgentCardMenu";
import { FavoriteButton } from "./components/FavoriteButton";
import { useLibraryAgentCard } from "./useLibraryAgentCard";
@@ -103,17 +102,20 @@ export function LibraryAgentCard({ agent, draggable = true }: Props) {
</Text>
{!image_url ? (
<IntegrationLinkImage
integrations={
"top_integrations" in agent
? (agent.top_integrations as Array<{
name: string;
type: "provider" | "category";
}>)
: []
}
size="sm"
className="h-[3.64rem] w-[6.70rem] flex-shrink-0 rounded-small"
<div
className={`h-[3.64rem] w-[6.70rem] flex-shrink-0 rounded-small ${
[
"bg-gradient-to-r from-green-200 to-blue-200",
"bg-gradient-to-r from-pink-200 to-purple-200",
"bg-gradient-to-r from-yellow-200 to-orange-200",
"bg-gradient-to-r from-blue-200 to-cyan-200",
"bg-gradient-to-r from-indigo-200 to-purple-200",
][parseInt(id.slice(0, 8), 16) % 5]
}`}
style={{
backgroundSize: "200% 200%",
animation: "gradient 15s ease infinite",
}}
/>
) : (
<Image


@@ -10,7 +10,6 @@ import {
} from "@/components/__legacy__/ui/card";
import { useState } from "react";
import { StoreAgent } from "@/app/api/__generated__/models/storeAgent";
import { IntegrationLinkImage } from "@/components/molecules/IntegrationLinkImage/IntegrationLinkImage";
interface FeaturedStoreCardProps {
agent: StoreAgent;
@@ -41,32 +40,15 @@ export const FeaturedAgentCard = ({
</CardHeader>
<CardContent className="flex-1 p-4">
<div className="relative aspect-[4/3] w-full overflow-hidden rounded-xl">
{agent.agent_image ? (
<Image
src={agent.agent_image}
alt={`${agent.agent_name} preview`}
fill
sizes="100%"
className={`object-cover transition-opacity duration-200 ${
isHovered ? "opacity-0" : "opacity-100"
}`}
/>
) : (
<IntegrationLinkImage
integrations={
"top_integrations" in agent
? (agent.top_integrations as Array<{
name: string;
type: "provider" | "category";
}>)
: []
}
size="lg"
className={`absolute inset-0 h-full w-full transition-opacity duration-200 ${
isHovered ? "opacity-0" : "opacity-100"
}`}
/>
)}
<Image
src={agent.agent_image || "/autogpt-logo-dark-bg.png"}
alt={`${agent.agent_name} preview`}
fill
sizes="100%"
className={`object-cover transition-opacity duration-200 ${
isHovered ? "opacity-0" : "opacity-100"
}`}
/>
<div
className={`absolute inset-0 overflow-y-auto p-4 transition-opacity duration-200 ${
isHovered ? "opacity-100" : "opacity-0"


@@ -4,7 +4,6 @@ import Avatar, {
AvatarFallback,
AvatarImage,
} from "@/components/atoms/Avatar/Avatar";
import { IntegrationLinkImage } from "@/components/molecules/IntegrationLinkImage/IntegrationLinkImage";
interface StoreCardProps {
agentName: string;
@@ -16,7 +15,6 @@ interface StoreCardProps {
avatarSrc: string;
hideAvatar?: boolean;
creatorName?: string;
topIntegrations?: Array<{ name: string; type: "provider" | "category" }>;
}
export const StoreCard: React.FC<StoreCardProps> = ({
@@ -29,7 +27,6 @@ export const StoreCard: React.FC<StoreCardProps> = ({
avatarSrc,
hideAvatar = false,
creatorName,
topIntegrations,
}) => {
const handleClick = () => {
onClick();
@@ -51,19 +48,13 @@ export const StoreCard: React.FC<StoreCardProps> = ({
>
{/* First Section: Image with Avatar */}
<div className="relative aspect-[2/1.2] w-full overflow-hidden rounded-3xl md:aspect-[2.17/1]">
{agentImage ? (
{agentImage && (
<Image
src={agentImage}
alt={`${agentName} preview image`}
fill
className="object-cover"
/>
) : (
<IntegrationLinkImage
integrations={topIntegrations ?? []}
size="md"
className="absolute inset-0 h-full w-full"
/>
)}
{!hideAvatar && (
<div className="absolute bottom-4 left-4">


@@ -1,112 +0,0 @@
"use client";
import Image from "next/image";
import { useState } from "react";
import {
getCategoryIcon,
getProviderIconPath,
RobotIcon,
PlugIcon,
} from "./helpers";
interface Props {
integrations: Array<{ name: string; type: "provider" | "category" }>;
size?: "sm" | "md" | "lg";
className?: string;
}
const SIZE_CONFIG = {
sm: { icon: 20, gap: 8, lineWidth: 12 },
md: { icon: 28, gap: 12, lineWidth: 16 },
lg: { icon: 36, gap: 16, lineWidth: 20 },
} as const;
function ProviderIcon({ name, iconSize }: { name: string; iconSize: number }) {
const [hasError, setHasError] = useState(false);
if (hasError) {
return <PlugIcon size={iconSize} className="text-zinc-400" />;
}
return (
<Image
src={getProviderIconPath(name)}
alt={name}
width={iconSize}
height={iconSize}
className="rounded-sm object-contain"
onError={() => setHasError(true)}
/>
);
}
function ConnectingLine({ width, height }: { width: number; height: number }) {
return (
<svg
width={width}
height={height}
viewBox={`0 0 ${width} ${height}`}
className="flex-shrink-0"
>
<line
x1={0}
y1={height / 2}
x2={width}
y2={height / 2}
stroke="currentColor"
strokeWidth={1.5}
className="text-zinc-300"
/>
<polygon
points={`${width - 4},${height / 2 - 3} ${width},${height / 2} ${width - 4},${height / 2 + 3}`}
fill="currentColor"
className="text-zinc-300"
/>
</svg>
);
}
export function IntegrationLinkImage({
integrations,
size = "sm",
className = "",
}: Props) {
const config = SIZE_CONFIG[size];
const items = integrations.slice(0, 3);
if (items.length === 0) {
return (
<div
className={`flex items-center justify-center rounded-small bg-zinc-50 ${className}`}
>
<RobotIcon size={config.icon} className="text-zinc-400" />
</div>
);
}
return (
<div
className={`flex items-center justify-center gap-0 rounded-small bg-zinc-50 ${className}`}
>
{items.map((item, i) => (
<div key={`${item.name}-${i}`} className="flex items-center">
{i > 0 && (
<ConnectingLine width={config.lineWidth} height={config.icon} />
)}
<div className="flex items-center justify-center">
{item.type === "provider" ? (
<ProviderIcon name={item.name} iconSize={config.icon} />
) : (
(() => {
const CategoryIcon = getCategoryIcon(item.name);
return (
<CategoryIcon size={config.icon} className="text-zinc-500" />
);
})()
)}
</div>
</div>
))}
</div>
);
}


@@ -1,50 +0,0 @@
import {
BrainIcon,
UsersThreeIcon,
ChatCircleIcon,
CodeIcon,
DatabaseIcon,
TextTIcon,
MagnifyingGlassIcon,
GitBranchIcon,
CubeIcon,
ArrowSquareInIcon,
ArrowSquareOutIcon,
AddressBookIcon,
FilmStripIcon,
CheckSquareIcon,
MegaphoneIcon,
BugIcon,
RobotIcon,
PlugIcon,
} from "@phosphor-icons/react";
import type { Icon } from "@phosphor-icons/react";
const CATEGORY_ICON_MAP: Record<string, Icon> = {
AI: BrainIcon,
SOCIAL: UsersThreeIcon,
COMMUNICATION: ChatCircleIcon,
DEVELOPER_TOOLS: CodeIcon,
DATA: DatabaseIcon,
TEXT: TextTIcon,
SEARCH: MagnifyingGlassIcon,
LOGIC: GitBranchIcon,
BASIC: CubeIcon,
INPUT: ArrowSquareInIcon,
OUTPUT: ArrowSquareOutIcon,
CRM: AddressBookIcon,
MULTIMEDIA: FilmStripIcon,
PRODUCTIVITY: CheckSquareIcon,
MARKETING: MegaphoneIcon,
ISSUE_TRACKING: BugIcon,
};
export function getCategoryIcon(categoryName: string): Icon {
return CATEGORY_ICON_MAP[categoryName] ?? CubeIcon;
}
export function getProviderIconPath(providerName: string): string {
return `/integrations/${providerName}.png`;
}
export { RobotIcon, PlugIcon };


@@ -525,7 +525,6 @@ export type LibraryAgent = {
is_favorite: boolean;
is_latest_version: boolean;
recommended_schedule_cron: string | null;
top_integrations: Array<{ name: string; type: "provider" | "category" }>;
} & (
| {
has_external_trigger: true;


@@ -0,0 +1,343 @@
# Workspace & Media File Architecture
This document describes the architecture for handling user files in the AutoGPT Platform, covering persistent user storage (Workspace) and ephemeral media processing pipelines.
## Overview
The platform has two distinct file-handling layers:
| Layer | Purpose | Persistence | Scope |
|-------|---------|-------------|-------|
| **Workspace** | Long-term user file storage | Persistent (DB + GCS/local) | Per-user, session-scoped access |
| **Media Pipeline** | Ephemeral file processing for blocks | Temporary (local disk) | Per-execution |
## Database Models
### UserWorkspace
Represents a user's file storage space. Created on-demand (one per user).
```prisma
model UserWorkspace {
  id        String   @id @default(uuid())
  createdAt DateTime @default(now())
  updatedAt DateTime @updatedAt
  userId    String   @unique
  Files     UserWorkspaceFile[]
}
```
**Key points:**
- One workspace per user (enforced by `@unique` on `userId`)
- Created lazily via `get_or_create_workspace()`
- Uses upsert to handle race conditions
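The race-safe "get or create" idea can be illustrated with SQLite. This is an analogue of the Prisma upsert, not the platform's code. The table and column names are simplified stand-ins, and `get_or_create_workspace` here is a toy version of the real function. The unique constraint on the user id makes a second concurrent insert a no-op instead of an error, and the follow-up read always returns the single row.

```python
import sqlite3
import uuid


def get_or_create_workspace(conn: sqlite3.Connection, user_id: str) -> str:
    # Insert-if-absent: whoever loses the race hits the UNIQUE(user_id)
    # conflict and does nothing, instead of raising.
    conn.execute(
        "INSERT INTO user_workspace (id, user_id) VALUES (?, ?) "
        "ON CONFLICT(user_id) DO NOTHING",
        (str(uuid.uuid4()), user_id),
    )
    row = conn.execute(
        "SELECT id FROM user_workspace WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_workspace (id TEXT PRIMARY KEY, user_id TEXT UNIQUE NOT NULL)"
)
first = get_or_create_workspace(conn, "user-123")
second = get_or_create_workspace(conn, "user-123")
print(first == second)  # True: one workspace per user
```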
### UserWorkspaceFile
Represents a file stored in a user's workspace.
```prisma
model UserWorkspaceFile {
  id          String    @id @default(uuid())
  workspaceId String
  name        String    // User-visible filename
  path        String    // Virtual path (e.g., "/sessions/abc123/image.png")
  storagePath String    // Actual storage path (gcs://... or local://...)
  mimeType    String
  sizeBytes   BigInt
  checksum    String?   // SHA256 for integrity
  isDeleted   Boolean   @default(false)
  deletedAt   DateTime?
  metadata    Json      @default("{}")

  @@unique([workspaceId, path]) // Enforce unique paths within workspace
}
```
**Key points:**
- `path` is a virtual path for organizing files (not an actual filesystem path)
- `storagePath` contains the actual GCS or local storage location
- Soft-delete pattern: `isDeleted` flag with `deletedAt` timestamp
- On delete, `path` is rewritten so the virtual path is freed for reuse
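The soft delete and the unique `(workspaceId, path)` constraint interact as shown in this minimal SQLite sketch. The exact rename scheme is not specified here, so the `.deleted-{id}` suffix is a hypothetical stand-in. Column names are simplified. The deleted row stays in the table, but its rewritten path no longer blocks a new file at the original path.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE workspace_file (
        id TEXT PRIMARY KEY,
        workspace_id TEXT NOT NULL,
        path TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0,
        deleted_at TEXT,
        UNIQUE (workspace_id, path)
    )"""
)


def add_file(workspace_id: str, path: str) -> str:
    file_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO workspace_file (id, workspace_id, path) VALUES (?, ?, ?)",
        (file_id, workspace_id, path),
    )
    return file_id


def soft_delete(file_id: str) -> None:
    # Hypothetical rename: suffix the path with the file id so the original
    # virtual path no longer collides with the unique constraint.
    conn.execute(
        "UPDATE workspace_file SET is_deleted = 1, deleted_at = ?, "
        "path = path || '.deleted-' || id WHERE id = ?",
        (datetime.now(timezone.utc).isoformat(), file_id),
    )


old = add_file("ws-456", "/sessions/abc123/image.png")
soft_delete(old)
new = add_file("ws-456", "/sessions/abc123/image.png")  # path is free again
```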
---
## WorkspaceManager
**Location:** `backend/util/workspace.py`
High-level API for workspace file operations. Combines storage backend operations with database record management.
### Initialization
```python
from backend.util.workspace import WorkspaceManager
# Basic usage
manager = WorkspaceManager(user_id="user-123", workspace_id="ws-456")
# With session scoping (CoPilot sessions)
manager = WorkspaceManager(
user_id="user-123",
workspace_id="ws-456",
session_id="session-789"
)
```
### Session Scoping
When `session_id` is provided, files are isolated to `/sessions/{session_id}/`:
```python
# With session_id="abc123" (inside an async function):
await manager.write_file(content, "image.png")
# → stored at /sessions/abc123/image.png
# Cross-session access is explicit:
await manager.read_file("/sessions/other-session/file.txt")  # Works
```
**Why session scoping?**
- CoPilot conversations need file isolation
- Prevents file collisions between concurrent sessions
- Allows session cleanup without affecting other sessions
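One way the scoping rule can be expressed is as a small path resolver. This is a sketch under assumptions and not `WorkspaceManager`'s actual code: `resolve_virtual_path` is hypothetical, and the root placement of unscoped files and the `..` check are illustrative choices. Bare filenames get the `/sessions/{session_id}/` prefix, while an explicit absolute path is used verbatim, which is what makes cross-session reads explicit.

```python
from typing import Optional


def resolve_virtual_path(name_or_path: str, session_id: Optional[str] = None) -> str:
    # Illustrative guard: refuse ".." segments so a name cannot escape its scope
    if ".." in name_or_path.split("/"):
        raise ValueError("path traversal is not allowed")
    if name_or_path.startswith("/"):
        return name_or_path  # explicit path, e.g. a cross-session read
    if session_id:
        return f"/sessions/{session_id}/{name_or_path}"
    return f"/{name_or_path}"  # assumption: unscoped files sit at the root


print(resolve_virtual_path("image.png", "abc123"))  # /sessions/abc123/image.png
```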
### Core Methods
| Method | Description |
|--------|-------------|
| `write_file(content, filename, path?, mime_type?, overwrite?)` | Write file to workspace |
| `read_file(path)` | Read file by virtual path |
| `read_file_by_id(file_id)` | Read file by ID |
| `list_files(path?, limit?, offset?, include_all_sessions?)` | List files |
| `delete_file(file_id)` | Soft-delete a file |
| `get_download_url(file_id, expires_in?)` | Get signed download URL |
| `get_file_info(file_id)` | Get file metadata |
| `get_file_info_by_path(path)` | Get file metadata by path |
| `get_file_count(path?, include_all_sessions?)` | Count files |
### Storage Backends
WorkspaceManager delegates to `WorkspaceStorageBackend`:
| Backend | When Used | Storage Path Format |
|---------|-----------|---------------------|
| `GCSWorkspaceStorage` | `media_gcs_bucket_name` is configured | `gcs://bucket/workspaces/{ws_id}/{file_id}/{filename}` |
| `LocalWorkspaceStorage` | No GCS bucket configured | `local://{ws_id}/{file_id}/{filename}` |
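The storage-path formats in the table can be reproduced with a tiny helper. `build_storage_path` is hypothetical. It only reproduces the documented formats and does not pick a backend the way the real code does. The configured bucket name plays the role of `media_gcs_bucket_name`.

```python
from typing import Optional


def build_storage_path(
    workspace_id: str, file_id: str, filename: str, gcs_bucket: Optional[str] = None
) -> str:
    # A configured bucket selects the GCS layout; otherwise fall back to local
    if gcs_bucket:
        return f"gcs://{gcs_bucket}/workspaces/{workspace_id}/{file_id}/{filename}"
    return f"local://{workspace_id}/{file_id}/{filename}"


print(build_storage_path("ws-456", "f-1", "image.png"))  # local://ws-456/f-1/image.png
```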
---
## store_media_file()
**Location:** `backend/util/file.py`
The media normalization pipeline. Handles various input types and normalizes them for processing or output.
### Purpose
Blocks receive files in many formats (URLs, data URIs, workspace references, local paths). `store_media_file()` normalizes these to a consistent format based on what the block needs.
### Input Types Handled
| Input Format | Example | How It's Processed |
|--------------|---------|-------------------|
| Data URI | `data:image/png;base64,iVBOR...` | Decoded, virus scanned, written locally |
| HTTP(S) URL | `https://example.com/image.png` | Downloaded, virus scanned, written locally |
| Workspace URI | `workspace://abc123` or `workspace:///path/to/file` | Read from workspace, virus scanned, written locally |
| Cloud path | `gcs://bucket/path` | Downloaded, virus scanned, written locally |
| Local path | `image.png` | Verified to exist in the exec_file directory, then virus scanned |
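The dispatch over these input formats can be sketched by prefix. This is an illustration, not `store_media_file()` itself. Both helper names are hypothetical, and only the `gcs://` cloud scheme from the table is recognized. The second helper distinguishes the two workspace forms in the table and drops an optional `#mime/type` suffix.

```python
def classify_media_input(value: str) -> str:
    """Hypothetical prefix-based classifier matching the table above."""
    if value.startswith("data:"):
        return "data_uri"
    if value.startswith(("http://", "https://")):
        return "http_url"
    if value.startswith("workspace://"):
        return "workspace_uri"
    if value.startswith("gcs://"):
        return "cloud_path"
    return "local_path"  # anything else is treated as relative to exec_file


def parse_workspace_ref(value: str) -> tuple[str, str]:
    """Split workspace://<file-id> vs workspace:///<path>, ignoring #mime."""
    ref = value[len("workspace://"):].split("#", 1)[0]
    return ("path", ref) if ref.startswith("/") else ("file_id", ref)


print(classify_media_input("https://example.com/image.png"))  # http_url
print(parse_workspace_ref("workspace:///path/to/file"))  # ('path', '/path/to/file')
```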
### Return Formats
The `return_format` parameter determines what you get back:
```python
from backend.util.file import store_media_file
# For local processing (ffmpeg, MoviePy, PIL)
local_path = await store_media_file(
file=input_file,
execution_context=ctx,
return_format="for_local_processing"
)
# Returns: "image.png" (relative path in exec_file dir)
# For external APIs (Replicate, OpenAI, etc.)
data_uri = await store_media_file(
file=input_file,
execution_context=ctx,
return_format="for_external_api"
)
# Returns: "data:image/png;base64,iVBOR..."
# For block output (adapts to execution context)
output = await store_media_file(
file=input_file,
execution_context=ctx,
return_format="for_block_output"
)
# In CoPilot: Returns "workspace://file-id#image/png"
# In graphs: Returns "data:image/png;base64,..."
```
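The two output shapes above can be produced by small encoders. This sketch is not the library code: the helper names are hypothetical, and `saved_file_id` stands in for "the file was persisted to a workspace". In CoPilot the block output references the persisted file, and in graph executions the content is inlined as a base64 data URI.

```python
import base64
from typing import Optional


def to_data_uri(content: bytes, mime_type: str) -> str:
    return f"data:{mime_type};base64,{base64.b64encode(content).decode('ascii')}"


def to_workspace_uri(file_id: str, mime_type: str) -> str:
    return f"workspace://{file_id}#{mime_type}"


def for_block_output(content: bytes, mime_type: str, saved_file_id: Optional[str]) -> str:
    # CoPilot: reference the persisted workspace file
    if saved_file_id is not None:
        return to_workspace_uri(saved_file_id, mime_type)
    # Graph execution: inline the bytes
    return to_data_uri(content, mime_type)


print(for_block_output(b"\x89PNG", "image/png", None))  # data:image/png;base64,iVBORw==
```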
### Execution Context
`store_media_file()` requires an `ExecutionContext` with:
- `graph_exec_id` - Required for temp file location
- `user_id` - Required for workspace access
- `workspace_id` - Optional; enables workspace features
- `session_id` - Optional; for session scoping in CoPilot
---
## Responsibility Boundaries
### Virus Scanning
| Component | Scans? | Notes |
|-----------|--------|-------|
| `store_media_file()` | ✅ Yes | Scans **all** content before writing to local disk |
| `WorkspaceManager.write_file()` | ✅ Yes | Scans content before persisting |
**Scanning happens at:**
1. `store_media_file()` — scans everything it downloads/decodes
2. `WorkspaceManager.write_file()` — scans before persistence
Callers such as `WriteWorkspaceFileTool` must not scan again. `WorkspaceManager.write_file()` already scans, so a second call would scan every file twice.
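The single-boundary rule can be demonstrated with fakes. Every class here is a hypothetical stand-in: `FakeWorkspace` plays `WorkspaceManager`, and the `EICAR` check is a toy detection rule. The tool delegates to `write_file()`, which is the only code that calls the scanner, so each write is scanned exactly once.

```python
import asyncio


class VirusDetected(Exception):
    """Stand-in for the real detection error."""


class FakeScanner:
    """Hypothetical scanner that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    async def scan(self, content: bytes, filename: str) -> None:
        self.calls += 1
        if b"EICAR" in content:  # toy detection rule for the demo
            raise VirusDetected(filename)


class FakeWorkspace:
    """Plays the role of WorkspaceManager: the one place that scans."""

    def __init__(self, scanner: FakeScanner) -> None:
        self.scanner = scanner
        self.files: dict[str, bytes] = {}

    async def write_file(self, content: bytes, filename: str) -> None:
        await self.scanner.scan(content, filename)  # scan before persisting
        self.files[filename] = content


async def write_workspace_file_tool(ws: FakeWorkspace, content: bytes, filename: str) -> None:
    # The tool only delegates; it does not call the scanner itself.
    await ws.write_file(content, filename)


scanner = FakeScanner()
ws = FakeWorkspace(scanner)
asyncio.run(write_workspace_file_tool(ws, b"hello", "notes.txt"))
print(scanner.calls)  # 1
```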
### Persistence
| Component | Persists To | Lifecycle |
|-----------|-------------|-----------|
| `store_media_file()` | Temp dir (`/tmp/exec_file/{exec_id}/`) | Cleaned after execution |
| `WorkspaceManager` | GCS or local storage + DB | Persistent until deleted |
**Automatic cleanup:** `clean_exec_files(graph_exec_id)` removes temp files after execution completes.
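The per-execution temp-dir lifecycle can be sketched as follows. The helpers are hypothetical stand-ins for the real `clean_exec_files()`, and the layout assumes the documented `/tmp/exec_file/{exec_id}/` directory. Everything for one execution lives under one directory, so cleanup is a single recursive delete that is safe to repeat.

```python
import shutil
import tempfile
from pathlib import Path

# Assumption: mirrors the documented /tmp/exec_file/{exec_id}/ layout.
EXEC_FILE_ROOT = Path(tempfile.gettempdir()) / "exec_file"


def exec_dir(graph_exec_id: str) -> Path:
    """Create (if needed) and return the temp dir for one execution."""
    path = EXEC_FILE_ROOT / graph_exec_id
    path.mkdir(parents=True, exist_ok=True)
    return path


def clean_exec_dir(graph_exec_id: str) -> None:
    """Remove everything written for one execution; safe to call twice."""
    shutil.rmtree(EXEC_FILE_ROOT / graph_exec_id, ignore_errors=True)


work = exec_dir("exec-demo-123")
(work / "image.png").write_bytes(b"fake image bytes")
clean_exec_dir("exec-demo-123")
print((work / "image.png").exists())  # False
```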
---
## Decision Tree: WorkspaceManager vs store_media_file
```text
┌─────────────────────────────────────────────────────┐
│        What do you need to do with the file?        │
└─────────────────────────────────────────────────────┘
                           │
             ┌─────────────┴─────────────┐
             ▼                           ▼
    Process in a block         Store for user access
    (ffmpeg, PIL, etc.)      (CoPilot files, uploads)
             │                           │
             ▼                           ▼
    store_media_file()           WorkspaceManager
     with appropriate
       return_format
             │
      ┌──────┴──────┐
      ▼             ▼
 "for_local_   "for_block_
 processing"     output"
      │             │
      ▼             ▼
  Get local   Auto-saves to
  path for    workspace in
    tools     CoPilot context

Store for user access
├── write_file() ─── Upload + persist (scans internally)
├── read_file() / get_download_url() ─── Retrieve
└── list_files() / delete_file() ─── Manage
```
### Quick Reference
| Scenario | Use |
|----------|-----|
| Block needs to process a file with ffmpeg | `store_media_file(..., return_format="for_local_processing")` |
| Block needs to send file to external API | `store_media_file(..., return_format="for_external_api")` |
| Block returning a generated file | `store_media_file(..., return_format="for_block_output")` |
| API endpoint handling file upload | `WorkspaceManager.write_file()` (handles virus scanning internally) |
| API endpoint serving file download | `WorkspaceManager.get_download_url()` |
| Listing user's files | `WorkspaceManager.list_files()` |
---
## Key Files Reference
| File | Purpose |
|------|---------|
| `backend/data/workspace.py` | Database CRUD operations for UserWorkspace and UserWorkspaceFile |
| `backend/util/workspace.py` | `WorkspaceManager` class - high-level workspace API |
| `backend/util/workspace_storage.py` | Storage backends (GCS, local) and `WorkspaceStorageBackend` interface |
| `backend/util/file.py` | `store_media_file()` and media processing utilities |
| `backend/util/virus_scanner.py` | `VirusScannerService` and `scan_content_safe()` |
| `schema.prisma` | Database model definitions |
---
## Common Patterns
### Block Processing a User's File
```python
from backend.util.file import store_media_file


async def run(self, input_data, *, execution_context, **kwargs):
    # Normalize input to a local path inside the execution's temp dir
    local_path = await store_media_file(
        file=input_data.video,
        execution_context=execution_context,
        return_format="for_local_processing",
    )
    # Process with local tools (process_video is a placeholder for your own logic)
    output_path = process_video(local_path)
    # Return (auto-saves to workspace in CoPilot)
    result = await store_media_file(
        file=output_path,
        execution_context=execution_context,
        return_format="for_block_output",
    )
    yield "output", result
```
### API Upload Endpoint
```python
from fastapi import HTTPException, UploadFile

from backend.util.virus_scanner import VirusDetectedError, VirusScanError
from backend.util.workspace import WorkspaceManager


async def upload_file(file: UploadFile, user_id: str, workspace_id: str):
    content = await file.read()
    # write_file handles virus scanning internally; do not scan again here
    manager = WorkspaceManager(user_id=user_id, workspace_id=workspace_id)
    try:
        workspace_file = await manager.write_file(
            content=content,
            filename=file.filename,
        )
    except VirusDetectedError:
        raise HTTPException(status_code=400, detail="File rejected: virus detected")
    except VirusScanError:
        raise HTTPException(status_code=503, detail="Virus scanning unavailable")
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    return {"file_id": workspace_file.id}
```
---
## Configuration
| Setting | Purpose | Default |
|---------|---------|---------|
| `media_gcs_bucket_name` | GCS bucket for workspace storage | None (uses local) |
| `workspace_storage_dir` | Local storage directory | `{app_data}/workspaces` |
| `max_file_size_mb` | Maximum file size in MB | 100 |
| `clamav_service_enabled` | Enable virus scanning | true |
| `clamav_service_host` | ClamAV daemon host | localhost |
| `clamav_service_port` | ClamAV daemon port | 3310 |
| `clamav_max_concurrency` | Max concurrent scans to ClamAV daemon | 5 |
| `clamav_mark_failed_scans_as_clean` | If true, scan failures pass content through instead of rejecting (⚠️ security risk if ClamAV is unreachable) | false |
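The two ClamAV settings interact as shown in this illustrative sketch. It is not the real `VirusScannerService`: the exception classes and `scan_with_policy` are hypothetical. A semaphore sized like `clamav_max_concurrency` caps in-flight scans. When the scanner itself fails, the content is rejected by default (fail closed) and passes through only when `clamav_mark_failed_scans_as_clean` is enabled (fail open).

```python
import asyncio
from typing import Awaitable, Callable


class ScanUnavailableError(Exception):
    """Hypothetical: the scanner could not be reached or errored."""


class InfectedContentError(Exception):
    """Hypothetical: the scanner reported a detection."""


Scanner = Callable[[bytes], Awaitable[bool]]  # returns True if infected


async def scan_with_policy(
    scan: Scanner,
    content: bytes,
    semaphore: asyncio.Semaphore,
    mark_failed_scans_as_clean: bool = False,
) -> None:
    # The semaphore caps in-flight scans (cf. clamav_max_concurrency)
    async with semaphore:
        try:
            infected = await scan(content)
        except ScanUnavailableError:
            if mark_failed_scans_as_clean:
                return  # fail open: content passes through unscanned
            raise  # fail closed: the documented default
    if infected:
        raise InfectedContentError("content rejected")


async def peak_concurrency(limit: int, jobs: int) -> int:
    """Run `jobs` fake clean scans and report the most that ran at once."""
    semaphore = asyncio.Semaphore(limit)  # created inside the running loop
    active = peak = 0

    async def slow_clean_scan(content: bytes) -> bool:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return False

    await asyncio.gather(
        *(scan_with_policy(slow_clean_scan, b"x", semaphore) for _ in range(jobs))
    )
    return peak


print(asyncio.run(peak_concurrency(limit=5, jobs=20)))  # never more than 5
```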