## Summary
This PR introduces a complete cloud storage infrastructure and file
upload system that agents can use instead of passing base64 data
directly in inputs, while maintaining backward compatibility for the
builder's node inputs.
### Problem Statement
Currently, when agents need to process files, they pass base64-encoded
data directly in the input, which has several limitations:
1. **Size limitations**: Base64 encoding increases file size by ~33%,
making large files impractical
2. **Memory usage**: Large base64 strings consume significant memory
during processing
3. **Network overhead**: Base64 data is sent repeatedly in API requests
4. **Performance impact**: Encoding/decoding base64 adds processing
overhead
### Solution
This PR introduces a complete cloud storage infrastructure and new file
upload workflow:
1. **New cloud storage system**: Complete `CloudStorageHandler` with
async GCS operations
2. **New upload endpoint**: Agents upload files via `/files/upload` and
receive a `file_uri`
3. **GCS storage**: Files are stored in Google Cloud Storage with
user-scoped paths
4. **URI references**: Agents pass the `file_uri` instead of base64 data
5. **Block processing**: File blocks can retrieve actual file content
using the URI
### Changes Made
#### New Files Introduced:
- **`backend/util/cloud_storage.py`** - Complete cloud storage
infrastructure (545 lines)
- **`backend/util/cloud_storage_test.py`** - Comprehensive test suite
(471 lines)
#### Backend Changes:
- **New cloud storage infrastructure** in
`backend/util/cloud_storage.py`:
- Complete `CloudStorageHandler` class with async GCS operations
- Support for multiple cloud providers (GCS implemented, S3/Azure
prepared)
- User-scoped and execution-scoped file storage with proper
authorization
- Automatic file expiration with metadata-based cleanup
- Path traversal protection and comprehensive security validation
- Async file operations with proper error handling and logging
- **New `UploadFileResponse` model** in `backend/server/model.py`:
- Returns `file_uri` (GCS path like
`gcs://bucket/users/{user_id}/file.txt`)
- Includes `file_name`, `size`, `content_type`, `expires_in_hours`
- Proper Pydantic schema instead of dictionary response
- **New `upload_file` endpoint** in `backend/server/routers/v1.py`:
- Complete new endpoint for file upload with cloud storage integration
- Returns GCS path URI directly as `file_uri`
- Supports user-scoped file storage for proper isolation
- Maintains fallback to base64 data URI when GCS not configured
- File size validation, virus scanning, and comprehensive error handling
#### Frontend Changes:
- **Updated API client** in
`frontend/src/lib/autogpt-server-api/client.ts`:
- Modified return type to expect `file_uri` instead of `signed_url`
- Supports the new upload workflow
- **Enhanced file input component** in
`frontend/src/components/type-based-input.tsx`:
- **Builder nodes**: Still use base64 for immediate data retention
without expiration
- **Agent inputs**: Use the new upload endpoint and pass `file_uri`
references
- Maintains backward compatibility for existing workflows
#### Test Updates:
- **New comprehensive test suite** in
`backend/util/cloud_storage_test.py`:
- 27 test cases covering all cloud storage functionality
- Tests for file storage, retrieval, authorization, and cleanup
- Tests for path validation, security, and error handling
- Coverage for user-scoped, execution-scoped, and system storage
- **New upload endpoint tests** in `backend/server/routers/v1_test.py`:
- Tests for GCS path URI format (`gcs://bucket/path`)
- Tests for base64 fallback when GCS not configured
- Validates file upload, virus scanning, and size limits
- Tests user-scoped file storage and access control
### Benefits
1. **New Infrastructure**: Complete cloud storage system with
enterprise-grade features
2. **Scalability**: Supports larger files without base64 size penalties
3. **Performance**: Reduces memory usage and network overhead with async
operations
4. **Security**: User-scoped file storage with comprehensive access
control and path validation
5. **Flexibility**: Maintains base64 support for builder nodes while
providing URI-based approach for agents
6. **Extensibility**: Designed for multiple cloud providers (GCS, S3,
Azure)
7. **Reliability**: Automatic file expiration, cleanup, and robust error
handling
8. **Backward compatibility**: Existing builder workflows continue to
work unchanged
### Usage
**For Agent Inputs:**
```typescript
// 1. Upload file
const response = await api.uploadFile(file);
// 2. Pass file_uri to agent
const agentInput = { file_input: response.file_uri };
```
**For Builder Nodes (unchanged):**
```typescript
// Still uses base64 for immediate data retention
const nodeInput = { file_input: "data:image/jpeg;base64,..." };
```
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All new cloud storage tests pass (27/27)
- [x] All upload file tests pass (7/7)
- [x] Full v1 router test suite passes (21/21)
- [x] All server tests pass (126/126)
- [x] Backend formatting and linting pass
- [x] Frontend TypeScript compilation succeeds
- [x] Verified GCS path URI format (`gcs://bucket/path`)
- [x] Tested fallback to base64 data URI when GCS not configured
- [x] Confirmed file upload functionality works in UI
- [x] Validated response schema matches Pydantic model
- [x] Tested agent workflow with file_uri references
- [x] Verified builder nodes still work with base64 data
- [x] Tested user-scoped file access control
- [x] Verified file expiration and cleanup functionality
- [x] Tested security validation and path traversal protection
#### For configuration changes:
- [x] No new configuration changes required
- [x] `.env.example` remains compatible
- [x] `docker-compose.yml` remains compatible
- [x] Uses existing GCS configuration from media storage
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude AI <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
AutoGPT: Build, Deploy, and Run AI Agents
AutoGPT is a powerful platform that allows you to create, deploy, and manage continuous AI agents that automate complex workflows.
Hosting Options
- Download to self-host
- Join the Waitlist for the cloud-hosted beta
How to Setup for Self-Hosting
Note
Setting up and hosting the AutoGPT Platform yourself is a technical process. If you'd rather something that just works, we recommend joining the waitlist for the cloud-hosted beta.
System Requirements
Before proceeding with the installation, ensure your system meets the following requirements:
Hardware Requirements
- CPU: 4+ cores recommended
- RAM: Minimum 8GB, 16GB recommended
- Storage: At least 10GB of free space
Software Requirements
- Operating Systems:
- Linux (Ubuntu 20.04 or newer recommended)
- macOS (10.15 or newer)
- Windows 10/11 with WSL2
- Required Software (with minimum versions):
- Docker Engine (20.10.0 or newer)
- Docker Compose (2.0.0 or newer)
- Git (2.30 or newer)
- Node.js (16.x or newer)
- npm (8.x or newer)
- VSCode (1.60 or newer) or any modern code editor
Network Requirements
- Stable internet connection
- Access to required ports (will be configured in Docker)
- Ability to make outbound HTTPS connections
Updated Setup Instructions:
We've moved to a fully maintained and regularly updated documentation site.
👉 Follow the official self-hosting guide here
This tutorial assumes you have Docker, VSCode, git and npm installed.
⚡ Quick Setup with One-Line Script (Recommended for Local Hosting)
Skip the manual steps and get started in minutes using our automatic setup script.
For macOS/Linux:
curl -fsSL https://setup.agpt.co/install.sh -o install.sh && bash install.sh
For Windows (PowerShell):
powershell -c "iwr https://setup.agpt.co/install.bat -o install.bat; ./install.bat"
This will install dependencies, configure Docker, and launch your local instance — all in one go.
🧱 AutoGPT Frontend
The AutoGPT frontend is where users interact with our powerful AI automation platform. It offers multiple ways to engage with and leverage our AI agents. This is the interface where you'll bring your AI automation ideas to life:
Agent Builder: For those who want to customize, our intuitive, low-code interface allows you to design and configure your own AI agents.
Workflow Management: Build, modify, and optimize your automation workflows with ease. You build your agent by connecting blocks, where each block performs a single action.
Deployment Controls: Manage the lifecycle of your agents, from testing to production.
Ready-to-Use Agents: Don't want to build? Simply select from our library of pre-configured agents and put them to work immediately.
Agent Interaction: Whether you've built your own or are using pre-configured agents, easily run and interact with them through our user-friendly interface.
Monitoring and Analytics: Keep track of your agents' performance and gain insights to continually improve your automation processes.
Read this guide to learn how to build your own custom blocks.
💽 AutoGPT Server
The AutoGPT Server is the powerhouse of our platform This is where your agents run. Once deployed, agents can be triggered by external sources and can operate continuously. It contains all the essential components that make AutoGPT run smoothly.
Source Code: The core logic that drives our agents and automation processes.
Infrastructure: Robust systems that ensure reliable and scalable performance.
Marketplace: A comprehensive marketplace where you can find and deploy a wide range of pre-built agents.
🐙 Example Agents
Here are two examples of what you can do with AutoGPT:
-
Generate Viral Videos from Trending Topics
- This agent reads topics on Reddit.
- It identifies trending topics.
- It then automatically creates a short-form video based on the content.
-
Identify Top Quotes from Videos for Social Media
- This agent subscribes to your YouTube channel.
- When you post a new video, it transcribes it.
- It uses AI to identify the most impactful quotes to generate a summary.
- Then, it writes a post to automatically publish to your social media.
These examples show just a glimpse of what you can achieve with AutoGPT! You can create customized workflows to build agents for any use case.
Mission and Licencing
Our mission is to provide the tools, so that you can focus on what matters:
- 🏗️ Building - Lay the foundation for something amazing.
- 🧪 Testing - Fine-tune your agent to perfection.
- 🤝 Delegating - Let AI work for you, and have your ideas come to life.
Be part of the revolution! AutoGPT is here to stay, at the forefront of AI innovation.
📖 Documentation | 🚀 Contributing
Licensing:
MIT License: The majority of the AutoGPT repository is under the MIT License.
Polyform Shield License: This license applies to the autogpt_platform folder.
For more information, see https://agpt.co/blog/introducing-the-autogpt-platform
🤖 AutoGPT Classic
Below is information about the classic version of AutoGPT.
🛠️ Build your own Agent - Quickstart
🏗️ Forge
Forge your own agent! – Forge is a ready-to-go toolkit to build your own agent application. It handles most of the boilerplate code, letting you channel all your creativity into the things that set your agent apart. All tutorials are located here. Components from forge can also be used individually to speed up development and reduce boilerplate in your agent project.
🚀 Getting Started with Forge – This guide will walk you through the process of creating your own agent and using the benchmark and user interface.
📘 Learn More about Forge
🎯 Benchmark
Measure your agent's performance! The agbenchmark can be used with any agent that supports the agent protocol, and the integration with the project's CLI makes it even easier to use with AutoGPT and forge-based agents. The benchmark offers a stringent testing environment. Our framework allows for autonomous, objective performance evaluations, ensuring your agents are primed for real-world action.
📦 agbenchmark on Pypi
|
📘 Learn More about the Benchmark
💻 UI
Makes agents easy to use! The frontend gives you a user-friendly interface to control and monitor your agents. It connects to agents through the agent protocol, ensuring compatibility with many agents from both inside and outside of our ecosystem.
The frontend works out-of-the-box with all agents in the repo. Just use the CLI to run your agent of choice!
📘 Learn More about the Frontend
⌨️ CLI
To make it as easy as possible to use all of the tools offered by the repository, a CLI is included at the root of the repo:
$ ./run
Usage: cli.py [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
agent Commands to create, start and stop agents
benchmark Commands to start the benchmark and list tests and categories
setup Installs dependencies needed for your system.
Just clone the repo, install dependencies with ./run setup, and you should be good to go!
🤔 Questions? Problems? Suggestions?
Get help - Discord 💬
To report a bug or request a feature, create a GitHub Issue. Please ensure someone else hasn't created an issue for the same topic.
🤝 Sister projects
🔄 Agent Protocol
To maintain a uniform standard and ensure seamless compatibility with many current and future applications, AutoGPT employs the agent protocol standard by the AI Engineer Foundation. This standardizes the communication pathways from your agent to the frontend and benchmark.