mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-01-06 22:03:59 -05:00
BREAKING CHANGE: Removed the deprecated `use_auto_prompt` field from the
`Input` schema. Existing workflows that use this field must be updated to
set the `type` field to "auto" instead.
## Summary of Changes 📝
This PR comprehensively updates all Exa search blocks to match the
latest Exa API specification and adds significant new functionality
through the Websets API integration.
### Core API Updates 🔄
- **Migration to Exa SDK**: Replaced manual API calls with the official
`exa_py` AsyncExa SDK across all blocks for better reliability and
maintainability
- **Removed deprecated fields**: Eliminated
`use_auto_prompt`/`useAutoprompt` field (breaking change)
- **Fixed incomplete field definitions**: Corrected `user_location`
field definition
- **Added new input fields**: Added `moderation` and `context` fields
for enhanced content filtering
### Enhanced Content Settings 🛠️
- **Text field improvements**: Support both boolean and advanced object
configurations
- **New content options**:
- Added `livecrawl` settings (never, fallback, always, preferred)
- Added `subpages` support for deeper content retrieval
- Added `extras` settings for links and images
- Added `context` settings for additional contextual information
- **Updated settings**: Enhanced `highlight` and `summary`
configurations with new query and schema options
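
As a rough illustration of the "boolean or advanced object" pattern described above, the sketch below normalizes a `text` setting into a request value. `TextSettings`, its field names, the `maxCharacters` key, and `build_text_payload` are hypothetical stand-ins, not the actual models in `helpers.py`; only the four `livecrawl` values and the `includeHtmlTags` key appear in this change.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class LivecrawlMode(str, Enum):
    # The four livecrawl values listed in the description above.
    NEVER = "never"
    FALLBACK = "fallback"
    ALWAYS = "always"
    PREFERRED = "preferred"


@dataclass
class TextSettings:
    # Hypothetical advanced options; the real field names may differ.
    max_characters: Optional[int] = None
    include_html_tags: bool = False


def build_text_payload(
    text: Union[bool, TextSettings],
) -> Optional[Union[bool, dict]]:
    """Return the value to send for `text`, or None to omit the key."""
    if isinstance(text, bool):
        # Simple form: True requests text, False leaves the key out.
        return True if text else None
    # Advanced form: translate the settings object into a dict.
    payload: dict = {"includeHtmlTags": text.include_html_tags}
    if text.max_characters is not None:
        payload["maxCharacters"] = text.max_characters
    return payload
```

Accepting both forms lets simple workflows keep passing `True` while advanced ones opt into the object form.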
### Comprehensive Cost Tracking 💰
- Added detailed cost tracking models:
- `CostDollars` for monetary costs
- `CostCredits` for API credit tracking
- `CostDuration` for time-based costs
- New output fields: `request_id`, `resolved_search_type`,
`cost_dollars`
- Improved response handling to conditionally yield fields based on
availability
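
The conditional yielding mentioned above might look roughly like the following. This is a minimal sketch: `SearchResponse`, the `total` field on `CostDollars`, and `emit_outputs` are assumptions made for illustration, and the real blocks may test for truthiness rather than `None`.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Tuple


@dataclass
class CostDollars:
    # Hypothetical shape; the real model lives in helpers.py.
    total: float


@dataclass
class SearchResponse:
    # Hypothetical stand-in for an SDK response object.
    results: list
    request_id: Optional[str] = None
    resolved_search_type: Optional[str] = None
    cost_dollars: Optional[CostDollars] = None


def emit_outputs(response: SearchResponse) -> Iterator[Tuple[str, Any]]:
    """Yield (name, value) pairs, skipping optional fields that are absent."""
    # The result list is always emitted, even when empty.
    yield "results", response.results
    for name in ("request_id", "resolved_search_type", "cost_dollars"):
        value = getattr(response, name)
        if value is not None:
            yield name, value
```

Downstream blocks then only receive an output when the API actually returned that field.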
### New Websets API Integration 🚀
Added new specialized blocks for Exa's Websets API across seven modules:
- **`websets.py`**: Core webset management (create, get, list, delete)
- **`websets_search.py`**: Search operations within websets
- **`websets_items.py`**: Individual item management (add, get, update,
delete)
- **`websets_enrichment.py`**: Data enrichment operations
- **`websets_import_export.py`**: Bulk import/export functionality
- **`websets_monitor.py`**: Monitor and track webset changes
- **`websets_polling.py`**: Poll for updates and changes
### New Special-Purpose Blocks 🎯
- **`code_context.py`**: Code search capabilities for finding relevant
code snippets from open source repositories, documentation, and Stack
Overflow
- **`research.py`**: Asynchronous research capabilities that explore the
web, gather sources, synthesize findings, and return structured results
with citations
### Code Organization Improvements 📁
- **Removed legacy code**: Deleted `model.py` file containing deprecated
API models
- **Centralized helpers**: Consolidated shared models and utilities in
`helpers.py`
- **Improved modularity**: Each webset operation is now in its own
dedicated file
### Other Changes 🔧
- Updated `.gitignore` for better development workflow
- Updated `CLAUDE.md` with project-specific instructions
- Updated documentation in `docs/content/platform/new_blocks.md` with
error handling, data models, and file input guidelines
- Improved webhook block implementations with SDK integration
### Files Changed 📂
- **Modified (11 files)**:
- `.gitignore`
- `autogpt_platform/CLAUDE.md`
- `autogpt_platform/backend/backend/blocks/exa/answers.py`
- `autogpt_platform/backend/backend/blocks/exa/contents.py`
- `autogpt_platform/backend/backend/blocks/exa/helpers.py`
- `autogpt_platform/backend/backend/blocks/exa/search.py`
- `autogpt_platform/backend/backend/blocks/exa/similar.py`
- `autogpt_platform/backend/backend/blocks/exa/webhook_blocks.py`
- `autogpt_platform/backend/backend/blocks/exa/websets.py`
- `docs/content/platform/new_blocks.md`
- **Added (8 files)**:
- `autogpt_platform/backend/backend/blocks/exa/code_context.py`
- `autogpt_platform/backend/backend/blocks/exa/research.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_enrichment.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_import_export.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_items.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_monitor.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_polling.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_search.py`
- **Deleted (1 file)**:
- `autogpt_platform/backend/backend/blocks/exa/model.py`
### Migration Guide 🚦
For users with existing workflows using the deprecated `use_auto_prompt`
field:
1. Remove the `use_auto_prompt` field from your input configuration
2. Set the `type` field to `ExaSearchTypes.AUTO` (or "auto" in JSON) to
achieve the same behavior
3. Review any custom content settings as the structure has been enhanced
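
For stored workflow inputs, the first two steps above could be scripted along these lines. This is a sketch under stated assumptions: `migrate_search_input` is a hypothetical helper, the input is assumed to be a plain JSON-style dict, and `type` is only set to "auto" when the legacy flag was enabled.

```python
from typing import Any, Dict

# Both spellings of the removed field mentioned in this change.
LEGACY_KEYS = ("use_auto_prompt", "useAutoprompt")


def migrate_search_input(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` without the removed auto-prompt field."""
    migrated = dict(config)  # shallow copy; the caller's dict is untouched
    legacy_enabled = False
    for key in LEGACY_KEYS:
        # pop() removes the key if present and returns its old value.
        if migrated.pop(key, False):
            legacy_enabled = True
    if legacy_enabled:
        # Assumption: an enabled legacy flag maps to the "auto" search type.
        migrated["type"] = "auto"
    return migrated
```

Content settings (step 3) still need a manual review, since their structure changed in ways a key rename cannot capture.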
### Testing Recommendations ✅
- Test existing workflows to ensure they handle the breaking change
- Verify cost tracking fields are properly returned
- Test new content settings options (livecrawl, subpages, extras,
context)
- Validate websets functionality if using the new Websets API blocks
🤖 Generated with [Claude Code](https://claude.com/claude-code)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] made + ran a test agent for the blocks and flows between them
[Exa Tests_v44.json](https://github.com/user-attachments/files/23226143/Exa.Tests_v44.json)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Migrates Exa blocks to AsyncExa SDK, adds comprehensive
Websets/research/code-context blocks, updates existing
search/content/answers/similar, deletes legacy models, adjusts
tests/docs; breaking: remove `use_auto_prompt` in favor of
`type="auto"`.
>
> - **Backend — Exa integration (SDK migration & BREAKING)**:
> - Replace manual HTTP calls with `exa_py.AsyncExa` across `search`,
`similar`, `contents`, `answers`, and webhooks; richer outputs
(citations, context, costs, resolved search type).
> - BREAKING: remove `Input.use_auto_prompt`; use `type = "auto"`.
> - Centralize models/utilities in `exa/helpers.py` (content settings,
cost models, result mappers).
> - **New Blocks**:
> - **Websets**: management (`websets.py`), searches, items,
enrichments, imports/exports, monitors, polling (new files under
`exa/websets_*`).
> - **Research**: async research task create/get/wait/list
(`exa/research.py`).
> - **Code Context**: code snippet/context retrieval
(`exa/code_context.py`).
> - **Removals**:
> - Delete deprecated `exa/model.py`.
> - **Docs & DX**:
> - Update `docs/content/platform/new_blocks.md` (error handling, models, file input) and
`CLAUDE.md`; ignore backend logs in `.gitignore`.
> - **Frontend Tests**:
> - Split/extend “e” block tests and improve block add robustness in
Playwright (`build.spec.ts`, `build.page.ts`).
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
6e5e572322. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added multiple Exa research and webset management blocks for task
creation, monitoring, and completion tracking.
* Introduced new search capabilities including code context retrieval,
content search, and enhanced filtering options.
* Added webset enrichment, import/export, and item management
functionality.
* Expanded search with location-based and category filters.
* **Documentation**
* Updated guidance on error handling, data models, and file input
handling.
* **Refactor**
* Modernized backend API integration with improved response structure
and error reporting.
* Simplified configuration options for search operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Claude <noreply@anthropic.com>
226 lines
7.8 KiB
Python
from enum import Enum
from typing import Optional

from exa_py import AsyncExa
from pydantic import BaseModel

from backend.sdk import (
    APIKeyCredentials,
    Block,
    BlockCategory,
    BlockOutput,
    BlockSchemaInput,
    BlockSchemaOutput,
    CredentialsMetaInput,
    SchemaField,
)

from ._config import exa
from .helpers import (
    CostDollars,
    ExaSearchResults,
    ExtrasSettings,
    HighlightSettings,
    LivecrawlTypes,
    SummarySettings,
)


class ContentStatusTag(str, Enum):
    CRAWL_NOT_FOUND = "CRAWL_NOT_FOUND"
    CRAWL_TIMEOUT = "CRAWL_TIMEOUT"
    CRAWL_LIVECRAWL_TIMEOUT = "CRAWL_LIVECRAWL_TIMEOUT"
    SOURCE_NOT_AVAILABLE = "SOURCE_NOT_AVAILABLE"
    CRAWL_UNKNOWN_ERROR = "CRAWL_UNKNOWN_ERROR"


class ContentError(BaseModel):
    tag: Optional[ContentStatusTag] = SchemaField(
        default=None, description="Specific error type"
    )
    httpStatusCode: Optional[int] = SchemaField(
        default=None, description="The corresponding HTTP status code"
    )


class ContentStatus(BaseModel):
    id: str = SchemaField(description="The URL that was requested")
    status: str = SchemaField(
        description="Status of the content fetch operation (success or error)"
    )
    error: Optional[ContentError] = SchemaField(
        default=None, description="Error details, only present when status is 'error'"
    )


class ExaContentsBlock(Block):
    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        urls: list[str] = SchemaField(
            description="Array of URLs to crawl (preferred over 'ids')",
            default_factory=list,
            advanced=False,
        )
        ids: list[str] = SchemaField(
            description="[DEPRECATED - use 'urls' instead] Array of document IDs obtained from searches",
            default_factory=list,
            advanced=True,
        )
        text: bool = SchemaField(
            description="Retrieve text content from pages",
            default=True,
        )
        highlights: HighlightSettings = SchemaField(
            description="Text snippets most relevant from each page",
            default=HighlightSettings(),
        )
        summary: SummarySettings = SchemaField(
            description="LLM-generated summary of the webpage",
            default=SummarySettings(),
        )
        livecrawl: Optional[LivecrawlTypes] = SchemaField(
            description="Livecrawling options: never, fallback (default), always, preferred",
            default=LivecrawlTypes.FALLBACK,
            advanced=True,
        )
        livecrawl_timeout: Optional[int] = SchemaField(
            description="Timeout for livecrawling in milliseconds",
            default=10000,
            advanced=True,
        )
        subpages: Optional[int] = SchemaField(
            description="Number of subpages to crawl", default=0, ge=0, advanced=True
        )
        subpage_target: Optional[str | list[str]] = SchemaField(
            description="Keyword(s) to find specific subpages of search results",
            default=None,
            advanced=True,
        )
        extras: ExtrasSettings = SchemaField(
            description="Extra parameters for additional content",
            default=ExtrasSettings(),
            advanced=True,
        )

    class Output(BlockSchemaOutput):
        results: list[ExaSearchResults] = SchemaField(
            description="List of document contents with metadata"
        )
        result: ExaSearchResults = SchemaField(
            description="Single document content result"
        )
        context: str = SchemaField(
            description="A formatted string of the results ready for LLMs"
        )
        request_id: str = SchemaField(description="Unique identifier for the request")
        statuses: list[ContentStatus] = SchemaField(
            description="Status information for each requested URL"
        )
        cost_dollars: Optional[CostDollars] = SchemaField(
            description="Cost breakdown for the request"
        )
        error: str = SchemaField(description="Error message if the request failed")

    def __init__(self):
        super().__init__(
            id="c52be83f-f8cd-4180-b243-af35f986b461",
            description="Retrieves document contents using Exa's contents API",
            categories={BlockCategory.SEARCH},
            input_schema=ExaContentsBlock.Input,
            output_schema=ExaContentsBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        if not input_data.urls and not input_data.ids:
            raise ValueError("Either 'urls' or 'ids' must be provided")

        sdk_kwargs = {}

        # Prefer urls over ids
        if input_data.urls:
            sdk_kwargs["urls"] = input_data.urls
        elif input_data.ids:
            sdk_kwargs["ids"] = input_data.ids

        if input_data.text:
            sdk_kwargs["text"] = {"includeHtmlTags": True}

        # Handle highlights - only include if modified from defaults
        if input_data.highlights and (
            input_data.highlights.num_sentences != 1
            or input_data.highlights.highlights_per_url != 1
            or input_data.highlights.query is not None
        ):
            highlights_dict = {}
            highlights_dict["numSentences"] = input_data.highlights.num_sentences
            highlights_dict["highlightsPerUrl"] = (
                input_data.highlights.highlights_per_url
            )
            if input_data.highlights.query:
                highlights_dict["query"] = input_data.highlights.query
            sdk_kwargs["highlights"] = highlights_dict

        # Handle summary - only include if modified from defaults
        if input_data.summary and (
            input_data.summary.query is not None
            or input_data.summary.schema is not None
        ):
            summary_dict = {}
            if input_data.summary.query:
                summary_dict["query"] = input_data.summary.query
            if input_data.summary.schema:
                summary_dict["schema"] = input_data.summary.schema
            sdk_kwargs["summary"] = summary_dict

        if input_data.livecrawl:
            sdk_kwargs["livecrawl"] = input_data.livecrawl.value

        if input_data.livecrawl_timeout is not None:
            sdk_kwargs["livecrawl_timeout"] = input_data.livecrawl_timeout

        if input_data.subpages is not None:
            sdk_kwargs["subpages"] = input_data.subpages

        if input_data.subpage_target:
            sdk_kwargs["subpage_target"] = input_data.subpage_target

        # Handle extras - only include if modified from defaults
        if input_data.extras and (
            input_data.extras.links > 0 or input_data.extras.image_links > 0
        ):
            extras_dict = {}
            if input_data.extras.links:
                extras_dict["links"] = input_data.extras.links
            if input_data.extras.image_links:
                extras_dict["image_links"] = input_data.extras.image_links
            sdk_kwargs["extras"] = extras_dict

        # Always enable context for LLM-ready output
        sdk_kwargs["context"] = True

        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
        response = await aexa.get_contents(**sdk_kwargs)

        converted_results = [
            ExaSearchResults.from_sdk(sdk_result)
            for sdk_result in response.results or []
        ]

        yield "results", converted_results

        for result in converted_results:
            yield "result", result

        if response.context:
            yield "context", response.context

        if response.statuses:
            yield "statuses", response.statuses

        if response.cost_dollars:
            yield "cost_dollars", response.cost_dollars