# feat(backend): Integrate GCS file storage with automatic expiration for Agent File Input (#10340)

Commit d33459ddb5 by Zamil Majdy (2025-07-18)
## Summary

This PR introduces a complete cloud storage infrastructure and file
upload system that agents can use instead of passing base64 data
directly in inputs, while maintaining backward compatibility for the
builder's node inputs.

### Problem Statement

Currently, when agents need to process files, they pass base64-encoded
data directly in the input, which has several limitations:
1. **Size limitations**: Base64 encoding increases file size by ~33%
(illustrated below), making large files impractical
2. **Memory usage**: Large base64 strings consume significant memory
during processing
3. **Network overhead**: Base64 data is sent repeatedly in API requests
4. **Performance impact**: Encoding/decoding base64 adds processing
overhead
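
A quick back-of-the-envelope check of the ~33% figure: base64 maps every 3 input bytes to 4 output characters, so the encoded payload is roughly 4/3 the size of the raw file. A minimal Node.js TypeScript illustration:

```typescript
// Illustration only: base64 encodes every 3 bytes as 4 characters,
// so the encoded string is ~4/3 the size of the raw data.
const raw = new Uint8Array(9 * 1024 * 1024); // 9 MiB of binary data
const encoded = Buffer.from(raw).toString("base64"); // Node.js Buffer API

console.log(raw.byteLength);                  // 9437184  (9 MiB)
console.log(encoded.length);                  // 12582912 (12 MiB)
console.log(encoded.length / raw.byteLength); // ≈ 1.33
```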

### Solution

This PR introduces a complete cloud storage infrastructure and a new file
upload workflow (sketched after the list):
1. **New cloud storage system**: Complete `CloudStorageHandler` with
async GCS operations
2. **New upload endpoint**: Agents upload files via `/files/upload` and
receive a `file_uri`
3. **GCS storage**: Files are stored in Google Cloud Storage with
user-scoped paths
4. **URI references**: Agents pass the `file_uri` instead of base64 data
5. **Block processing**: File blocks can retrieve actual file content
using the URI
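
A minimal sketch of this flow from the client's side. The endpoint path and the `file_uri` field come from this PR; the multipart field name and response handling are assumptions, not the actual client code:

```typescript
// Sketch of the agent file workflow; the multipart field name "file" and
// the response parsing are assumptions, and the deployed path prefix may differ.
async function runAgentWithFile(file: File): Promise<Record<string, string>> {
  // 1. Upload the file to the new endpoint instead of inlining base64 data.
  const form = new FormData();
  form.append("file", file);
  const res = await fetch("/files/upload", { method: "POST", body: form });

  // 2. The response carries a storage URI, e.g. "gcs://bucket/users/{user_id}/file.txt".
  const { file_uri } = (await res.json()) as { file_uri: string };

  // 3. Agents reference the upload by URI; file blocks resolve it server-side.
  return { file_input: file_uri };
}
```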

### Changes Made

#### New Files Introduced:
- **`backend/util/cloud_storage.py`** - Complete cloud storage
infrastructure (545 lines)
- **`backend/util/cloud_storage_test.py`** - Comprehensive test suite
(471 lines)

#### Backend Changes:
- **New cloud storage infrastructure** in
`backend/util/cloud_storage.py` (operations sketched below):
  - Complete `CloudStorageHandler` class with async GCS operations
  - Support for multiple cloud providers (GCS implemented, S3/Azure prepared)
  - User-scoped and execution-scoped file storage with proper authorization
  - Automatic file expiration with metadata-based cleanup
  - Path traversal protection and comprehensive security validation
  - Async file operations with proper error handling and logging
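
The capabilities above, rendered as a hypothetical TypeScript interface for orientation only; the real handler is a Python class, and the method names and signatures here are assumptions:

```typescript
// Hypothetical rendering of the described operations; not the actual
// CloudStorageHandler API, which lives in backend/util/cloud_storage.py.
interface CloudStorageHandlerShape {
  /** Store a file under a user-scoped path and return its "gcs://..." URI. */
  storeForUser(userId: string, fileName: string, data: Uint8Array, expiresInHours: number): Promise<string>;
  /** Store a file under an execution-scoped path. */
  storeForExecution(executionId: string, fileName: string, data: Uint8Array, expiresInHours: number): Promise<string>;
  /** Retrieve a file, rejecting callers not authorized for its path. */
  retrieve(uri: string, requesterUserId: string): Promise<Uint8Array>;
  /** Delete files whose expiration metadata has passed; returns the count removed. */
  cleanupExpired(): Promise<number>;
}
```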

- **New `UploadFileResponse` model** in `backend/server/model.py`
(response shape sketched below):
  - Returns `file_uri` (a GCS path like `gcs://bucket/users/{user_id}/file.txt`)
  - Includes `file_name`, `size`, `content_type`, `expires_in_hours`
  - Proper Pydantic schema instead of a dictionary response
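
The fields listed above, expressed as the TypeScript shape the frontend would consume; the field names come from this PR, while the exact types and units are assumptions:

```typescript
// Assumed client-side view of the upload response; types are illustrative.
interface UploadFileResponse {
  file_uri: string;        // e.g. "gcs://bucket/users/{user_id}/file.txt"
  file_name: string;
  size: number;            // assumed to be bytes
  content_type: string;
  expires_in_hours: number;
}
```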

- **New `upload_file` endpoint** in `backend/server/routers/v1.py`
(fallback behavior sketched below):
  - Complete new endpoint for file upload with cloud storage integration
  - Returns a GCS path URI directly as `file_uri`
  - Supports user-scoped file storage for proper isolation
  - Maintains a fallback to a base64 data URI when GCS is not configured
  - File size validation, virus scanning, and comprehensive error handling
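
Because the endpoint returns either a `gcs://` path or a base64 data URI fallback, callers can branch on the URI scheme. A small sketch of that handling (not code from this PR):

```typescript
// Branch on the two URI formats the upload endpoint can return.
function describeUpload(fileUri: string): string {
  if (fileUri.startsWith("gcs://")) {
    return `stored in cloud storage at ${fileUri}`;
  }
  if (fileUri.startsWith("data:")) {
    return "inlined as a base64 data URI (GCS not configured)";
  }
  return "unknown URI scheme";
}
```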

#### Frontend Changes:
- **Updated API client** in
`frontend/src/lib/autogpt-server-api/client.ts`:
  - Modified return type to expect `file_uri` instead of `signed_url`
  - Supports the new upload workflow (sketched below)
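
A hedged sketch of what the updated client method can look like after the return-type change; the `uploadFile` name matches the Usage section below, but the body is illustrative rather than the actual client code:

```typescript
// Illustrative only: the real method lives in
// frontend/src/lib/autogpt-server-api/client.ts and may differ in detail.
async function uploadFile(file: File): Promise<{ file_uri: string }> {
  const form = new FormData();
  form.append("file", file); // field name is an assumption
  const res = await fetch("/files/upload", { method: "POST", body: form });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res.json(); // previously resolved to { signed_url }, now { file_uri }
}
```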

- **Enhanced file input component** in
`frontend/src/components/type-based-input.tsx` (behavior sketched below):
  - **Builder nodes**: Still use base64 for immediate data retention without expiration
  - **Agent inputs**: Use the new upload endpoint and pass `file_uri` references
  - Maintains backward compatibility for existing workflows
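
A simplified sketch of that dual behavior, assuming a boolean flag that distinguishes builder nodes from agent inputs and an injected `uploadFile` helper (both names are hypothetical):

```typescript
// Builder nodes keep base64 data URIs; agent inputs upload the file and
// reference it by file_uri. `isBuilderNode` and `uploadFile` are hypothetical.
async function resolveFileInput(
  file: File,
  isBuilderNode: boolean,
  uploadFile: (f: File) => Promise<{ file_uri: string }>,
): Promise<string> {
  if (isBuilderNode) {
    // Builder nodes: inline the file as a data URI so it persists without expiring.
    return new Promise<string>((resolve, reject) => {
      const reader = new FileReader();
      reader.onload = () => resolve(reader.result as string);
      reader.onerror = () => reject(reader.error);
      reader.readAsDataURL(file);
    });
  }
  // Agent inputs: upload to cloud storage and pass the returned URI.
  const { file_uri } = await uploadFile(file);
  return file_uri;
}
```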

#### Test Updates:
- **New comprehensive test suite** in
`backend/util/cloud_storage_test.py`:
  - 27 test cases covering all cloud storage functionality
  - Tests for file storage, retrieval, authorization, and cleanup
  - Tests for path validation, security, and error handling
  - Coverage for user-scoped, execution-scoped, and system storage

- **New upload endpoint tests** in `backend/server/routers/v1_test.py`:
  - Tests for GCS path URI format (`gcs://bucket/path`)
  - Tests for base64 fallback when GCS not configured
  - Validates file upload, virus scanning, and size limits
  - Tests user-scoped file storage and access control

### Benefits

1. **New Infrastructure**: A complete cloud storage layer with user-scoped
storage, automatic expiration, and cleanup
2. **Scalability**: Supports larger files without base64 size penalties
3. **Performance**: Reduces memory usage and network overhead by passing
URI references instead of inline base64; storage operations run asynchronously
4. **Security**: User-scoped file storage with comprehensive access
control and path validation
5. **Flexibility**: Maintains base64 support for builder nodes while
providing URI-based approach for agents
6. **Extensibility**: Designed for multiple cloud providers (GCS, S3,
Azure)
7. **Reliability**: Automatic file expiration, cleanup, and robust error
handling
8. **Backward compatibility**: Existing builder workflows continue to
work unchanged

### Usage

**For Agent Inputs:**
```typescript
// 1. Upload file
const response = await api.uploadFile(file);
// 2. Pass file_uri to agent
const agentInput = { file_input: response.file_uri };
```

**For Builder Nodes (unchanged):**
```typescript
// Still uses base64 for immediate data retention
const nodeInput = { file_input: "data:image/jpeg;base64,..." };
```

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All new cloud storage tests pass (27/27)
  - [x] All upload file tests pass (7/7)
  - [x] Full v1 router test suite passes (21/21)
  - [x] All server tests pass (126/126)
  - [x] Backend formatting and linting pass
  - [x] Frontend TypeScript compilation succeeds
  - [x] Verified GCS path URI format (`gcs://bucket/path`)
  - [x] Tested fallback to base64 data URI when GCS not configured
  - [x] Confirmed file upload functionality works in UI
  - [x] Validated response schema matches Pydantic model
  - [x] Tested agent workflow with file_uri references
  - [x] Verified builder nodes still work with base64 data
  - [x] Tested user-scoped file access control
  - [x] Verified file expiration and cleanup functionality
  - [x] Tested security validation and path traversal protection

#### For configuration changes:
- [x] No new configuration changes required
- [x] `.env.example` remains compatible 
- [x] `docker-compose.yml` remains compatible
- [x] Uses existing GCS configuration from media storage

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude AI <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>