## Summary
This PR introduces a complete cloud storage infrastructure and file
upload system that agents can use instead of passing base64 data
directly in inputs, while maintaining backward compatibility for the
builder's node inputs.
### Problem Statement
Currently, when agents need to process files, they pass base64-encoded
data directly in the input, which has several limitations:
1. **Size limitations**: Base64 encoding increases file size by ~33%,
making large files impractical
2. **Memory usage**: Large base64 strings consume significant memory
during processing
3. **Network overhead**: Base64 data is sent repeatedly in API requests
4. **Performance impact**: Encoding/decoding base64 adds processing
overhead
### Solution
This PR adds a cloud storage infrastructure and a new file upload
workflow:
1. **New cloud storage system**: Complete `CloudStorageHandler` with
async GCS operations
2. **New upload endpoint**: Agents upload files via `/files/upload` and
receive a `file_uri`
3. **GCS storage**: Files are stored in Google Cloud Storage with
user-scoped paths
4. **URI references**: Agents pass the `file_uri` instead of base64 data
5. **Block processing**: File blocks can retrieve the actual file
content using the URI (a retrieval sketch follows this list)
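For step 5, here is a minimal sketch of how a block might resolve a
`gcs://bucket/path` URI back into file bytes, using the
`gcloud-aio-storage` client from the backend's dependencies; the helper
name and URI parsing are illustrative, not the PR's actual code:
```python
import aiohttp
from gcloud.aio.storage import Storage  # async GCS client (backend dependency)


async def read_file_uri(file_uri: str) -> bytes:
    """Illustrative helper: resolve a gcs://bucket/path URI to raw file bytes."""
    if not file_uri.startswith("gcs://"):
        raise ValueError(f"Unsupported file URI: {file_uri}")
    bucket, _, object_name = file_uri.removeprefix("gcs://").partition("/")
    async with aiohttp.ClientSession() as session:
        client = Storage(session=session)
        # Download the object's content from Google Cloud Storage.
        return await client.download(bucket, object_name)
```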
### Changes Made
#### New Files Introduced:
- **`backend/util/cloud_storage.py`** - Complete cloud storage
infrastructure (545 lines); a path-scoping sketch follows this list
- **`backend/util/cloud_storage_test.py`** - Comprehensive test suite
(471 lines)
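As a taste of the user-scoping and path traversal protection described
under Backend Changes below, here is a minimal sketch of the kind of
validation `CloudStorageHandler` performs; the function name and exact
prefix layout are assumptions, only the `users/{user_id}/` scoping comes
from this PR:
```python
import posixpath


def build_user_scoped_path(user_id: str, file_name: str) -> str:
    """Scope an object under users/{user_id}/ and reject traversal attempts."""
    # Normalize the client-supplied name, then refuse anything that could
    # escape the user's prefix (absolute paths or remaining ".." segments).
    normalized = posixpath.normpath(file_name)
    if normalized.startswith("/") or ".." in normalized.split("/"):
        raise ValueError(f"Invalid file name: {file_name}")
    return f"users/{user_id}/{normalized}"
```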
#### Backend Changes:
- **New cloud storage infrastructure** in
`backend/util/cloud_storage.py`:
- Complete `CloudStorageHandler` class with async GCS operations
- Support for multiple cloud providers (GCS implemented, S3/Azure
prepared)
- User-scoped and execution-scoped file storage with proper
authorization
- Automatic file expiration with metadata-based cleanup
- Path traversal protection and comprehensive security validation
- Async file operations with proper error handling and logging
- **New `UploadFileResponse` model** in `backend/server/model.py`:
- Returns `file_uri` (GCS path like
`gcs://bucket/users/{user_id}/file.txt`)
- Includes `file_name`, `size`, `content_type`, `expires_in_hours`
- Proper Pydantic schema instead of dictionary response
- **New `upload_file` endpoint** in `backend/server/routers/v1.py`:
- New endpoint for file upload with cloud storage integration (sketched
below)
- Returns the GCS path URI directly as `file_uri`
- Supports user-scoped file storage for proper isolation
- Falls back to a base64 data URI when GCS is not configured
- File size validation, virus scanning, and comprehensive error handling
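Put together, the response model and endpoint might look roughly like
the following; the `/files/upload` path and the response fields come
from this PR, while the storage helper, its signature, and the
expiration default are illustrative stand-ins:
```python
import uuid

from fastapi import APIRouter, UploadFile
from pydantic import BaseModel

router = APIRouter()


class UploadFileResponse(BaseModel):
    """Response schema for the upload endpoint (fields as listed above)."""

    file_uri: str  # e.g. gcs://bucket/users/{user_id}/file.txt
    file_name: str
    size: int
    content_type: str
    expires_in_hours: int


async def store_file(content: bytes, file_name: str) -> str:
    # Stand-in for the PR's CloudStorageHandler upload call; returns a fake URI.
    return f"gcs://example-bucket/users/demo-user/{file_name}"


@router.post("/files/upload")
async def upload_file(file: UploadFile) -> UploadFileResponse:
    content = await file.read()
    # The real endpoint also enforces size limits, runs a virus scan, and
    # falls back to a base64 data URI when GCS is not configured.
    file_name = file.filename or f"{uuid.uuid4()}.bin"
    file_uri = await store_file(content, file_name)
    return UploadFileResponse(
        file_uri=file_uri,
        file_name=file_name,
        size=len(content),
        content_type=file.content_type or "application/octet-stream",
        expires_in_hours=24,  # assumed default; not specified in the PR
    )
```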
#### Frontend Changes:
- **Updated API client** in
`frontend/src/lib/autogpt-server-api/client.ts`:
- Modified return type to expect `file_uri` instead of `signed_url`
- Supports the new upload workflow
- **Enhanced file input component** in
`frontend/src/components/type-based-input.tsx`:
- **Builder nodes**: Still use base64 for immediate data retention
without expiration
- **Agent inputs**: Use the new upload endpoint and pass `file_uri`
references
- Maintains backward compatibility for existing workflows
#### Test Updates:
- **New comprehensive test suite** in
`backend/util/cloud_storage_test.py`:
- 27 test cases covering all cloud storage functionality
- Tests for file storage, retrieval, authorization, and cleanup
- Tests for path validation, security, and error handling
- Coverage for user-scoped, execution-scoped, and system storage
- **New upload endpoint tests** in `backend/server/routers/v1_test.py`:
- Tests for GCS path URI format (`gcs://bucket/path`)
- Tests for the base64 fallback when GCS is not configured (illustrated
below)
- Validates file upload, virus scanning, and size limits
- Tests user-scoped file storage and access control
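To make the fallback concrete: when GCS is not configured, the upload
returns an inline data URI instead of a `gcs://` path. A self-contained
sketch of that encoding and the kind of assertion the tests make (the
helper is illustrative, not the suite's actual code):
```python
import base64


def to_data_uri(content: bytes, content_type: str) -> str:
    """Base64 fallback format returned when GCS is not configured."""
    return f"data:{content_type};base64,{base64.b64encode(content).decode()}"


def test_base64_fallback_data_uri():
    uri = to_data_uri(b"hello", "text/plain")
    assert uri == "data:text/plain;base64,aGVsbG8="
```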
### Benefits
1. **New Infrastructure**: A complete cloud storage layer with
user-scoped authorization, automatic expiration, and cleanup
2. **Scalability**: Supports larger files without base64 size penalties
3. **Performance**: Reduces memory usage and network overhead with async
operations
4. **Security**: User-scoped file storage with comprehensive access
control and path validation
5. **Flexibility**: Maintains base64 support for builder nodes while
providing URI-based approach for agents
6. **Extensibility**: Designed for multiple cloud providers (GCS, S3,
Azure)
7. **Reliability**: Automatic file expiration, cleanup, and robust error
handling
8. **Backward compatibility**: Existing builder workflows continue to
work unchanged
### Usage
**For Agent Inputs:**
```typescript
// 1. Upload file
const response = await api.uploadFile(file);
// 2. Pass file_uri to agent
const agentInput = { file_input: response.file_uri };
```
**For Builder Nodes (unchanged):**
```typescript
// Still uses base64 for immediate data retention
const nodeInput = { file_input: "data:image/jpeg;base64,..." };
```
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All new cloud storage tests pass (27/27)
- [x] All upload file tests pass (7/7)
- [x] Full v1 router test suite passes (21/21)
- [x] All server tests pass (126/126)
- [x] Backend formatting and linting pass
- [x] Frontend TypeScript compilation succeeds
- [x] Verified GCS path URI format (`gcs://bucket/path`)
- [x] Tested fallback to base64 data URI when GCS is not configured
- [x] Confirmed file upload functionality works in UI
- [x] Validated response schema matches Pydantic model
- [x] Tested agent workflow with file_uri references
- [x] Verified builder nodes still work with base64 data
- [x] Tested user-scoped file access control
- [x] Verified file expiration and cleanup functionality
- [x] Tested security validation and path traversal protection
#### For configuration changes:
- [x] No new configuration changes required
- [x] `.env.example` remains compatible
- [x] `docker-compose.yml` remains compatible
- [x] Uses existing GCS configuration from media storage
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude AI <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
**`pyproject.toml`** (131 lines, 3.2 KiB):
```toml
[tool.poetry]
name = "autogpt-platform-backend"
version = "0.4.9"
description = "A platform for building AI-powered agentic workflows"
authors = ["AutoGPT <info@agpt.co>"]
readme = "README.md"
packages = [{ include = "backend", format = "sdist" }]

[tool.poetry.dependencies]
python = ">=3.10,<3.13"
aio-pika = "^9.5.5"
aiodns = "^3.5.0"
anthropic = "^0.57.1"
apscheduler = "^3.11.0"
autogpt-libs = { path = "../autogpt_libs", develop = true }
bleach = { extras = ["css"], version = "^6.2.0" }
click = "^8.2.0"
cryptography = "^43.0"
discord-py = "^2.5.2"
e2b-code-interpreter = "^1.5.2"
fastapi = "^0.116.1"
feedparser = "^6.0.11"
flake8 = "^7.3.0"
google-api-python-client = "^2.176.0"
google-auth-oauthlib = "^1.2.2"
google-cloud-storage = "^3.2.0"
googlemaps = "^4.10.0"
gravitasml = "^0.1.3"
groq = "^0.30.0"
html2text = "^2024.2.26"
jinja2 = "^3.1.6"
jsonref = "^1.1.0"
jsonschema = "^4.22.0"
launchdarkly-server-sdk = "^9.12.0"
mem0ai = "^0.1.114"
moviepy = "^2.1.2"
ollama = "^0.5.1"
openai = "^1.97.0"
pika = "^1.3.2"
pinecone = "^7.3.0"
poetry = "2.1.1" # CHECK DEPENDABOT SUPPORT BEFORE UPGRADING
postmarker = "^1.0"
praw = "~7.8.1"
prisma = "^0.15.0"
prometheus-client = "^0.22.1"
psutil = "^7.0.0"
psycopg2-binary = "^2.9.10"
pydantic = { extras = ["email"], version = "^2.11.7" }
pydantic-settings = "^2.10.1"
pytest = "^8.4.1"
pytest-asyncio = "^1.1.0"
python-dotenv = "^1.1.1"
python-multipart = "^0.0.20"
redis = "^5.2.0"
replicate = "^1.0.6"
sentry-sdk = {extras = ["anthropic", "fastapi", "launchdarkly", "openai", "sqlalchemy"], version = "^2.33.0"}
sqlalchemy = "^2.0.40"
strenum = "^0.4.9"
stripe = "^11.5.0"
supabase = "2.16.0"
tenacity = "^9.1.2"
todoist-api-python = "^2.1.7"
tweepy = "^4.16.0"
uvicorn = { extras = ["standard"], version = "^0.35.0" }
websockets = "^15.0"
youtube-transcript-api = "^1.1.1"
zerobouncesdk = "^1.1.2"
# NOTE: please insert new dependencies in their alphabetical location
pytest-snapshot = "^0.9.0"
aiofiles = "^24.1.0"
tiktoken = "^0.9.0"
aioclamd = "^1.0.0"
setuptools = "^80.9.0"
gcloud-aio-storage = "^9.5.0"
pandas = "^2.3.1"

[tool.poetry.group.dev.dependencies]
aiohappyeyeballs = "^2.6.1"
black = "^24.10.0"
faker = "^37.4.2"
httpx = "^0.28.1"
isort = "^5.13.2"
poethepoet = "^0.36.0"
pre-commit = "^4.2.0"
pyright = "^1.1.403"
pytest-mock = "^3.14.0"
pytest-watcher = "^0.4.2"
requests = "^2.32.4"
ruff = "^0.12.3"
# NOTE: please insert new dependencies in their alphabetical location

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.poetry.scripts]
app = "backend.app:main"
rest = "backend.rest:main"
ws = "backend.ws:main"
scheduler = "backend.scheduler:main"
executor = "backend.exec:main"
cli = "backend.cli:main"
format = "linter:format"
lint = "linter:lint"
test = "run_tests:test"

[tool.isort]
profile = "black"

[tool.pytest-watcher]
now = false
clear = true
delay = 0.2
runner = "pytest"
runner_args = []
patterns = ["*.py"]
ignore_patterns = []

[tool.pytest.ini_options]
asyncio_mode = "auto"
asyncio_default_fixture_loop_scope = "session"
filterwarnings = [
    "ignore:'audioop' is deprecated:DeprecationWarning:discord.player",
    "ignore:invalid escape sequence:DeprecationWarning:tweepy.api",
]

[tool.ruff]
target-version = "py310"
```