# Data

Blocks for creating, reading, and manipulating data structures including lists, dictionaries, spreadsheets, and persistent storage.

## Create Dictionary

### What it is

Creates a dictionary with the specified key-value pairs. Use this when you know all the values you want to add upfront.

### How it works

This block creates a new dictionary from specified key-value pairs in a single operation. It's designed for cases where you know all the data upfront, rather than building the dictionary incrementally.

The block takes a dictionary input and outputs it as-is, making it useful as a starting point for workflows that need to pass structured data between blocks.
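
A rough sketch of this behaviour (illustrative only, not the block's actual implementation): the provided key-value pairs are emitted unchanged on the `dictionary` output.

```python
# Illustrative sketch of the Create Dictionary block's behaviour.
def create_dictionary(values: dict) -> dict:
    # The block passes the provided key-value pairs through unchanged.
    return dict(values)

request_body = create_dictionary({"title": "Weekly report", "recipients": ["ops@example.com"]})
# -> {'title': 'Weekly report', 'recipients': ['ops@example.com']}
```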

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| values | Key-value pairs to create the dictionary with | Dict[str, Any] | Yes |

### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if dictionary creation failed | str |
| dictionary | The created dictionary containing the specified key-value pairs | Dict[str, Any] |

### Possible use case

API Request Payloads: Create complete request body objects with all required fields before sending to an API.

Configuration Objects: Build settings dictionaries with predefined values for initializing services or workflows.

Data Mapping: Transform input data into a structured format with specific keys expected by downstream blocks.


## Create List

### What it is

Creates a list with the specified values. Use this when you know all the values you want to add upfront. This block can also yield the list in batches based on a maximum size or token limit.

### How it works

This block creates a list from provided values and can optionally chunk it into smaller batches. When max_size is set, the list is yielded in chunks of that size. When max_tokens is set, chunks are sized to fit within token limits for LLM processing.

This batching capability is particularly useful when processing large datasets that need to be split for API limits or memory constraints.
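
A minimal sketch of the size-based batching, assuming simple slicing (illustrative only; the real block also supports token-based limits via `max_tokens`):

```python
from typing import Any, Iterator

# Illustrative sketch of size-based batching; not the block's actual implementation.
def yield_list(values: list[Any], max_size: int | None = None) -> Iterator[list[Any]]:
    if not max_size:
        # No limit: emit the whole list at once.
        yield list(values)
        return
    # With max_size set, emit consecutive slices of at most max_size items.
    for start in range(0, len(values), max_size):
        yield values[start:start + max_size]

print(list(yield_list([1, 2, 3, 4, 5], max_size=2)))
# [[1, 2], [3, 4], [5]]
```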

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| values | A list of values to be combined into a new list. | List[Any] | Yes |
| max_size | Maximum size of the list. If provided, the list will be yielded in chunks of this size. | int | No |
| max_tokens | Maximum tokens for the list. If provided, the list will be yielded in chunks that fit within this token limit. | int | No |

### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the operation failed | str |
| list | The created list containing the specified values. | List[Any] |

### Possible use case

Batch Processing: Split large datasets into manageable chunks for API calls with rate limits.

LLM Token Management: Divide text content into token-limited batches for processing by language models.

Parallel Processing: Create batches of work items that can be processed concurrently by multiple blocks.


## File Read

### What it is

Reads a file and returns its content as a string, with optional chunking by delimiter and size limits.

### How it works

This block reads file content from various sources (URL, data URI, or local path) and returns it as a string. It supports chunking via delimiter (like newlines) or size limits, yielding content in manageable pieces.

Use skip_rows and skip_size to skip header content or initial bytes. When delimiter and limits are set, content is yielded chunk by chunk, enabling processing of large files without loading everything into memory.
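
A simplified sketch of delimiter-based chunking with the skip and limit options, assuming a local text file (illustrative only; the actual block also handles URLs, data URIs, and byte-size limits, and streams rather than reading everything at once):

```python
from typing import Iterator

# Illustrative sketch; file path and helper name are hypothetical.
def read_file_chunks(path: str, delimiter: str = "\n",
                     skip_rows: int = 0, row_limit: int = 0) -> Iterator[str]:
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    # Split on the delimiter, skip leading rows, and optionally cap the row count.
    rows = content.split(delimiter)[skip_rows:]
    if row_limit:
        rows = rows[:row_limit]
    for row in rows:
        # Each chunk is yielded separately, like the block's `content` output.
        yield row

# Example: process a log file line by line, skipping a header line.
# for line in read_file_chunks("app.log", delimiter="\n", skip_rows=1):
#     print(line)
```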

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| file_input | The file to read from (URL, data URI, or local path) | str (file) | Yes |
| delimiter | Delimiter to split the content into rows/chunks (e.g., '\n' for lines) | str | No |
| size_limit | Maximum size in bytes per chunk to yield (0 for no limit) | int | No |
| row_limit | Maximum number of rows to process (0 for no limit, requires delimiter) | int | No |
| skip_size | Number of characters to skip from the beginning of the file | int | No |
| skip_rows | Number of rows to skip from the beginning (requires delimiter) | int | No |

### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the operation failed | str |
| content | File content, yielded as individual chunks when delimiter or size limits are applied | str |

### Possible use case

Log File Processing: Read and process log files line by line, filtering or transforming each entry.

Large Document Analysis: Read large text files in chunks for summarization or analysis without memory issues.

Data Import: Read text-based data files and process them row by row for database import.


## Persist Information

### What it is

Persists key-value information for the current user.

### How it works

This block stores key-value data that persists across workflow runs. You can scope the persistence to either within_agent (available to all runs of this specific agent) or across_agents (available to all agents for this user).

The stored data remains available until explicitly overwritten, enabling state management and configuration persistence between workflow executions.
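
Conceptually, the store behaves like a per-user, scope-aware key-value map, roughly as sketched below (illustrative only; the platform persists this server-side, and the user and agent identifiers shown are hypothetical):

```python
# Illustrative in-memory model of scoped key-value persistence.
store: dict[tuple[str, str, str], object] = {}

def persist(user_id: str, key: str, value: object,
            scope: str = "within_agent", agent_id: str = "agent-123") -> object:
    # `within_agent` keys are namespaced by agent; `across_agents` keys are shared per user.
    namespace = agent_id if scope == "within_agent" else "*"
    store[(user_id, namespace, key)] = value
    return value

persist("user-1", "last_processed_id", 41287, scope="within_agent")
```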

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| key | Key to store the information under | str | Yes |
| value | Value to store | Value | Yes |
| scope | Scope of persistence: within_agent (shared across all runs of this agent) or across_agents (shared across all agents for this user) | "within_agent" \| "across_agents" | No |

### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the operation failed | str |
| value | Value that was stored | Value |

### Possible use case

User Preferences: Store user settings like preferred language or notification preferences for future runs.

Progress Tracking: Save the last processed item ID to resume batch processing where you left off.

API Token Caching: Store refreshed API tokens that can be reused across multiple workflow executions.


## Read Spreadsheet

### What it is

Reads CSV and Excel files and outputs the data as a list of dictionaries and individual rows. Excel files are automatically converted to CSV format.

### How it works

This block parses CSV and Excel files, converting each row into a dictionary with column headers as keys. Excel files are automatically converted to CSV format before processing.

Configure delimiter, quote character, and escape character for proper CSV parsing. Use skip_rows to ignore headers or initial rows, and skip_columns to exclude unwanted columns from the output.
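
A minimal sketch of how CSV rows become dictionaries, using Python's csv module with a header row (illustrative only; the actual block also handles Excel conversion, file inputs, skip_rows, and quoting/escaping options):

```python
import csv
import io

# Illustrative sketch: parse CSV text into a list of row dictionaries keyed by header.
def read_spreadsheet(contents: str, delimiter: str = ",",
                     skip_columns: list[str] | None = None) -> list[dict[str, str]]:
    reader = csv.DictReader(io.StringIO(contents), delimiter=delimiter)
    skip = set(skip_columns or [])
    return [
        {key: value.strip() for key, value in row.items() if key not in skip}
        for row in reader
    ]

rows = read_spreadsheet("sku,name,price\nA1,Widget,9.99\nB2,Gadget,14.50")
# -> [{'sku': 'A1', 'name': 'Widget', 'price': '9.99'},
#     {'sku': 'B2', 'name': 'Gadget', 'price': '14.50'}]
```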

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| contents | The contents of the CSV/spreadsheet data to read | str | No |
| file_input | CSV or Excel file to read from (URL, data URI, or local path). Excel files are automatically converted to CSV | str (file) | No |
| delimiter | The delimiter used in the CSV/spreadsheet data | str | No |
| quotechar | The character used to quote fields | str | No |
| escapechar | The character used to escape the delimiter | str | No |
| has_header | Whether the CSV file has a header row | bool | No |
| skip_rows | The number of rows to skip from the start of the file | int | No |
| strip | Whether to strip whitespace from the values | bool | No |
| skip_columns | The columns to skip from the start of the row | List[str] | No |
| produce_singular_result | If True, yield individual 'row' outputs only (can be slow). If False, yield both 'row' and 'rows' (all data) outputs | bool | No |

### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the operation failed | str |
| row | The data produced from each row in the spreadsheet | Dict[str, str] |
| rows | All the data in the spreadsheet as a list of rows | List[Dict[str, str]] |

### Possible use case

Data Import: Import product catalogs, contact lists, or inventory data from spreadsheet exports.

Report Processing: Parse generated CSV reports from other systems for analysis or transformation.

Bulk Operations: Process spreadsheets of email addresses, user records, or configuration data row by row.


## Retrieve Information

### What it is

Retrieves key-value information for the current user.

### How it works

This block retrieves previously stored key-value data for the current user. Specify the key and scope to fetch the corresponding value. If the key doesn't exist, the default_value is returned.

Use within_agent scope for agent-specific data or across_agents for data shared across all user agents.
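
The retrieval side mirrors the persistence sketch shown under Persist Information: look up the key in the scoped namespace and fall back to the default when it is missing (illustrative only; identifiers are hypothetical):

```python
# Illustrative counterpart to the persistence sketch above (same in-memory model).
store: dict[tuple[str, str, str], object] = {("user-1", "agent-123", "last_processed_id"): 41287}

def retrieve(user_id: str, key: str, default_value: object = None,
             scope: str = "within_agent", agent_id: str = "agent-123") -> object:
    namespace = agent_id if scope == "within_agent" else "*"
    # Missing keys fall back to `default_value` instead of raising an error.
    return store.get((user_id, namespace, key), default_value)

print(retrieve("user-1", "last_processed_id", default_value=0))  # 41287
print(retrieve("user-1", "unknown_key", default_value="n/a"))    # n/a
```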

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| key | Key to retrieve the information for | str | Yes |
| scope | Scope of persistence: within_agent (shared across all runs of this agent) or across_agents (shared across all agents for this user) | "within_agent" \| "across_agents" | No |
| default_value | Default value to return if key is not found | Default Value | No |

### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the operation failed | str |
| value | Retrieved value or default value | Value |

### Possible use case

Resume Processing: Retrieve the last processed item ID to continue batch operations from where you left off.

Load Preferences: Fetch stored user preferences at workflow start to customize behavior.

State Restoration: Retrieve workflow state saved from a previous run to maintain continuity.


## SQL Query

### What it is

Executes a SQL query. Read-only by default for safety; disable read-only mode to allow write operations. Supports PostgreSQL, MySQL, and MSSQL via SQLAlchemy.

### How it works

This block connects to a database using discrete host, port, and database fields and executes a SQL query via SQLAlchemy. It validates that the query is a single statement (using sqlparse to prevent SQL injection via multi-statement attacks), enforces SSRF protections on the database host, and returns results as a list of row dictionaries.

By default, only SELECT queries are allowed (read-only mode). The database session is set to read-only and the transaction is always rolled back. Disable the read_only option to allow write operations (INSERT, UPDATE, DELETE, CREATE, DROP, etc.).

Supported database types: PostgreSQL, MySQL, and MSSQL.
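
A rough sketch of the read-only validation and execution flow using sqlparse and SQLAlchemy (illustrative only; the actual block builds the connection from the discrete host, port, and database inputs, applies SSRF checks and dialect-specific read-only and timeout settings, and reads credentials from the platform credential system, none of which is shown here):

```python
import sqlparse
from sqlalchemy import create_engine, text

# Illustrative sketch of read-only query validation and execution.
def run_read_only_query(url: str, query: str, max_rows: int = 100) -> list[dict]:
    # Reject multi-statement input such as "SELECT 1; DROP TABLE users".
    statements = [s for s in sqlparse.parse(query) if str(s).strip()]
    if len(statements) != 1:
        raise ValueError("Exactly one SQL statement is allowed")
    # In read-only mode, only SELECT statements pass validation.
    if statements[0].get_type() != "SELECT":
        raise ValueError("Only SELECT queries are allowed in read-only mode")

    engine = create_engine(url)  # e.g. "postgresql+psycopg2://user:pass@db.example.com:5432/analytics"
    with engine.connect() as conn:
        result = conn.execute(text(query))
        rows = [dict(row) for row in result.mappings().fetchmany(max_rows)]
        conn.rollback()  # read-only mode: never commit
    return rows

# rows = run_read_only_query(url, "SELECT id, email FROM users LIMIT 10")
```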

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| database_type | Database engine | "postgres" \| "mysql" \| "mssql" | No |
| host | Database hostname or IP address. Treated as a secret to avoid leaking infrastructure details. Private/internal IPs are blocked (SSRF protection). | str (password) | Yes |
| port | Database port (leave empty for default: PostgreSQL: 5432, MySQL: 3306, MSSQL: 1433) | int | No |
| database | Name of the database to connect to | str | Yes |
| query | SQL query to execute | str | Yes |
| read_only | When enabled (default), only SELECT queries are allowed and the database session is set to read-only mode. Disable to allow write operations (INSERT, UPDATE, DELETE, etc.). | bool | No |
| timeout | Query timeout in seconds (max 120) | int | No |
| max_rows | Maximum number of rows to return (max 10000) | int | No |

### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the query failed | str |
| results | Query results as a list of row dictionaries | List[Dict[str, Any]] |
| columns | Column names from the query result | List[str] |
| row_count | Number of rows returned | int |
| truncated | True when the result set was capped by max_rows, indicating additional rows exist in the database | bool |
| affected_rows | Number of rows affected by a write query (INSERT/UPDATE/DELETE) | int |

### Possible use case

Analytics Dashboards: Query your PostgreSQL or MySQL analytics database to pull daily active user counts, revenue metrics, or funnel data directly into your workflow.

Data Management: Run INSERT, UPDATE, or DELETE queries to manage data in your databases as part of automated workflows.

Schema Management: Create or modify database tables and indexes as part of provisioning or migration workflows.

Cross-Database Reporting: Connect to multiple database types (PostgreSQL, MySQL) within a single workflow to aggregate data from different sources.


## Screenshot Web Page

### What it is

Takes a screenshot of a specified website using the ScreenshotOne API.

### How it works

This block uses the ScreenshotOne API to capture screenshots of web pages. Configure viewport dimensions, output format, and whether to capture the full page or just the visible area.

Optional features include blocking ads, cookie banners, and chat widgets for cleaner screenshots. Caching can be enabled to improve performance for repeated captures of the same page.
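
A minimal sketch of a ScreenshotOne-style capture request; the endpoint and parameter names below mirror the block's inputs and are assumptions, so consult the ScreenshotOne API documentation for the authoritative details:

```python
import requests

# Endpoint assumed for illustration; verify against the ScreenshotOne docs.
API_URL = "https://api.screenshotone.com/take"

def capture(url: str, access_key: str, full_page: bool = False, fmt: str = "png") -> bytes:
    # Parameter names mirror the block's inputs and may not match the API exactly.
    response = requests.get(API_URL, params={
        "access_key": access_key,
        "url": url,
        "full_page": str(full_page).lower(),
        "format": fmt,
        "block_ads": "true",
        "block_cookie_banners": "true",
    }, timeout=60)
    response.raise_for_status()
    return response.content  # raw image bytes

# with open("homepage.png", "wb") as f:
#     f.write(capture("https://example.com", access_key="YOUR_KEY"))
```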

### Inputs

| Input | Description | Type | Required |
|-------|-------------|------|----------|
| url | URL of the website to screenshot | str | Yes |
| viewport_width | Width of the viewport in pixels | int | No |
| viewport_height | Height of the viewport in pixels | int | No |
| full_page | Whether to capture the full page length | bool | No |
| format | Output format (png, jpeg, webp) | "png" \| "jpeg" \| "webp" | No |
| block_ads | Whether to block ads | bool | No |
| block_cookie_banners | Whether to block cookie banners | bool | No |
| block_chats | Whether to block chat widgets | bool | No |
| cache | Whether to enable caching | bool | No |

### Outputs

| Output | Description | Type |
|--------|-------------|------|
| error | Error message if the operation failed | str |
| image | The screenshot image data | str (file) |

### Possible use case

Visual Documentation: Capture screenshots of web pages for documentation, reports, or archives.

Competitive Monitoring: Regularly screenshot competitor websites to track design and content changes.

Visual Testing: Capture page renders for visual regression testing or design verification workflows.