mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-01-05 05:14:14 -05:00
feat(blocks)!: Update Exa search block to match latest API specification (#11185)
BREAKING CHANGE: Removed deprecated use_auto_prompt field from Input
schema. Existing workflows using this field will need to be updated to
use the type field set to "auto" instead.
## Summary of Changes 📝
This PR comprehensively updates all Exa search blocks to match the
latest Exa API specification and adds significant new functionality
through the Websets API integration.
### Core API Updates 🔄
- **Migration to Exa SDK**: Replaced manual API calls with the official
`exa_py` AsyncExa SDK across all blocks for better reliability and
maintainability
- **Removed deprecated fields**: Eliminated
`use_auto_prompt`/`useAutoprompt` field (breaking change)
- **Fixed incomplete field definitions**: Corrected `user_location`
field definition
- **Added new input fields**: Added `moderation` and `context` fields
for enhanced content filtering
### Enhanced Content Settings 🛠️
- **Text field improvements**: Support both boolean and advanced object
configurations
- **New content options**:
- Added `livecrawl` settings (never, fallback, always, preferred)
- Added `subpages` support for deeper content retrieval
- Added `extras` settings for links and images
- Added `context` settings for additional contextual information
- **Updated settings**: Enhanced `highlight` and `summary` configurations with new query and schema options (an illustrative configuration follows this list)
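As a rough illustration of the new surface (a hypothetical payload; the keys mirror the new `ContentSettings` fields and their camelCase API names from `helpers.py` below, while the values are made up):

```python
# Hypothetical advanced content settings in API payload form.
# Keys follow the new ContentSettings fields; values are illustrative only.
content_settings = {
    "text": {"maxCharacters": 2000, "includeHtmlTags": False},
    "highlights": {"numSentences": 2, "highlightsPerUrl": 3, "query": "key findings"},
    "summary": {"query": "main developments"},
    "livecrawl": "preferred",  # one of: never, fallback, always, preferred
    "subpages": 2,  # crawl up to two subpages per result
    "subpageTarget": "pricing",
    "extras": {"links": 5, "imageLinks": 2},
    "context": True,  # ask for an LLM-ready context string
}
```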
### Comprehensive Cost Tracking 💰
- Added detailed cost tracking models:
- `CostDollars` for monetary costs
- `CostCredits` for API credit tracking
- `CostDuration` for time-based costs
- New output fields: `request_id`, `resolved_search_type`,
`cost_dollars`
- Improved response handling to conditionally yield fields based on
availability (an example `cost_dollars` payload follows this list)
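For reference, a `cost_dollars` output follows the shape of the `CostDollars` model added in `helpers.py` (the totals below are illustrative; the per-request and per-page prices are the model's defaults):

```python
# Illustrative cost_dollars payload; the structure matches the CostDollars model.
cost_dollars = {
    "total": 0.006,  # made-up total for one small neural search with text
    "breakDown": [
        {
            "search": 0.005,
            "contents": 0.001,
            "breakdown": {
                "keywordSearch": 0.0,
                "neuralSearch": 0.005,
                "contentText": 0.001,
                "contentHighlight": 0.0,
                "contentSummary": 0.0,
            },
        }
    ],
    "perRequestPrices": {
        "neuralSearch_1_25_results": 0.005,
        "neuralSearch_26_100_results": 0.025,
        "neuralSearch_100_plus_results": 1.0,
        "keywordSearch_1_100_results": 0.0025,
        "keywordSearch_100_plus_results": 3.0,
    },
    "perPagePrices": {
        "contentText": 0.001,
        "contentHighlight": 0.001,
        "contentSummary": 0.001,
    },
}
```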
### New Websets API Integration 🚀
Added new specialized blocks for Exa's Websets API across seven modules:
- **`websets.py`**: Core webset management (create, get, list, delete)
- **`websets_search.py`**: Search operations within websets
- **`websets_items.py`**: Individual item management (add, get, update,
delete)
- **`websets_enrichment.py`**: Data enrichment operations
- **`websets_import_export.py`**: Bulk import/export functionality
- **`websets_monitor.py`**: Monitor and track webset changes
- **`websets_polling.py`**: Poll for updates and changes
### New Special-Purpose Blocks 🎯
- **`code_context.py`**: Code search capabilities for finding relevant
code snippets from open source repositories, documentation, and Stack
Overflow
- **`research.py`**: Asynchronous research capabilities that explore the
web, gather sources, synthesize findings, and return structured results
with citations (a minimal sketch of the underlying flow follows)
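As a minimal sketch of the flow these blocks wrap (endpoints and status values are taken from `research.py` below; the direct `httpx` calls are an illustrative stand-in for the platform's `Requests` helper):

```python
# Minimal sketch of the create-then-poll research flow, assuming httpx.
import asyncio

import httpx


async def run_research(api_key: str, instructions: str) -> dict:
    headers = {"Content-Type": "application/json", "x-api-key": api_key}
    async with httpx.AsyncClient() as client:
        # Create the research task (POST /research/v1).
        created = await client.post(
            "https://api.exa.ai/research/v1",
            headers=headers,
            json={"model": "exa-research", "instructions": instructions},
        )
        research_id = created.json()["researchId"]

        # Poll until the task reaches a terminal status (GET /research/v1/{id}).
        while True:
            task = await client.get(
                f"https://api.exa.ai/research/v1/{research_id}", headers=headers
            )
            data = task.json()
            if data["status"] in ("completed", "failed", "canceled"):
                return data
            await asyncio.sleep(10)
```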
### Code Organization Improvements 📁
- **Removed legacy code**: Deleted `model.py` file containing deprecated
API models
- **Centralized helpers**: Consolidated shared models and utilities in
`helpers.py`
- **Improved modularity**: Each webset operation is now in its own
dedicated file
### Other Changes 🔧
- Updated `.gitignore` for better development workflow
- Updated `CLAUDE.md` with project-specific instructions
- Updated documentation in `docs/content/platform/new_blocks.md` with
error handling, data models, and file input guidelines
- Improved webhook block implementations with SDK integration
### Files Changed 📂
- **Modified (11 files)**:
- `.gitignore`
- `autogpt_platform/CLAUDE.md`
- `autogpt_platform/backend/backend/blocks/exa/answers.py`
- `autogpt_platform/backend/backend/blocks/exa/contents.py`
- `autogpt_platform/backend/backend/blocks/exa/helpers.py`
- `autogpt_platform/backend/backend/blocks/exa/search.py`
- `autogpt_platform/backend/backend/blocks/exa/similar.py`
- `autogpt_platform/backend/backend/blocks/exa/webhook_blocks.py`
- `autogpt_platform/backend/backend/blocks/exa/websets.py`
- `docs/content/platform/new_blocks.md`
- **Added (8 files)**:
- `autogpt_platform/backend/backend/blocks/exa/code_context.py`
- `autogpt_platform/backend/backend/blocks/exa/research.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_enrichment.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_import_export.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_items.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_monitor.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_polling.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_search.py`
- **Deleted (1 file)**:
- `autogpt_platform/backend/backend/blocks/exa/model.py`
### Migration Guide 🚦
For users with existing workflows using the deprecated `use_auto_prompt`
field:
1. Remove the `use_auto_prompt` field from your input configuration
2. Set the `type` field to `ExaSearchTypes.AUTO` (or "auto" in JSON) to
achieve the same behavior
3. Review any custom content settings, as their structure has been enhanced (a before/after example follows)
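For example (a hypothetical block input, before and after):

```python
# Before (no longer accepted): the deprecated flag.
old_input = {"query": "latest SpaceX valuation", "use_auto_prompt": True}

# After: the same behavior via the search type field.
new_input = {"query": "latest SpaceX valuation", "type": "auto"}
```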
### Testing Recommendations ✅
- Test existing workflows to ensure they handle the breaking change
- Verify cost tracking fields are properly returned
- Test new content settings options (livecrawl, subpages, extras,
context)
- Validate websets functionality if using the new Websets API blocks
🤖 Generated with [Claude Code](https://claude.com/claude-code)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] made + ran a test agent for the blocks and flows between them: [Exa Tests_v44.json](https://github.com/user-attachments/files/23226143/Exa.Tests_v44.json)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Migrates Exa blocks to AsyncExa SDK, adds comprehensive
Websets/research/code-context blocks, updates existing
search/content/answers/similar, deletes legacy models, adjusts
tests/docs; breaking: remove `use_auto_prompt` in favor of
`type="auto"`.
>
> - **Backend — Exa integration (SDK migration & BREAKING)**:
> - Replace manual HTTP calls with `exa_py.AsyncExa` across `search`,
`similar`, `contents`, `answers`, and webhooks; richer outputs
(citations, context, costs, resolved search type).
> - BREAKING: remove `Input.use_auto_prompt`; use `type = "auto"`.
> - Centralize models/utilities in `exa/helpers.py` (content settings,
cost models, result mappers).
> - **New Blocks**:
> - **Websets**: management (`websets.py`), searches, items,
enrichments, imports/exports, monitors, polling (new files under
`exa/websets_*`).
> - **Research**: async research task create/get/wait/list
(`exa/research.py`).
> - **Code Context**: code snippet/context retrieval
(`exa/code_context.py`).
> - **Removals**:
> - Delete deprecated `exa/model.py`.
> - **Docs & DX**:
> - Update `docs/new_blocks.md` (error handling, models, file input) and
`CLAUDE.md`; ignore backend logs in `.gitignore`.
> - **Frontend Tests**:
> - Split/extend “e” block tests and improve block add robustness in
Playwright (`build.spec.ts`, `build.page.ts`).
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
6e5e572322. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added multiple Exa research and webset management blocks for task
creation, monitoring, and completion tracking.
* Introduced new search capabilities including code context retrieval,
content search, and enhanced filtering options.
* Added webset enrichment, import/export, and item management
functionality.
* Expanded search with location-based and category filters.
* **Documentation**
* Updated guidance on error handling, data models, and file input
handling.
* **Refactor**
* Modernized backend API integration with improved response structure
and error reporting.
* Simplified configuration options for search operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Claude <noreply@anthropic.com>
**`.gitignore`** (vendored, 1 line added):

```diff
@@ -178,3 +178,4 @@ autogpt_platform/backend/settings.py
 *.ign.*
 .test-contents
 .claude/settings.local.json
+/autogpt_platform/backend/logs
```
**`autogpt_platform/CLAUDE.md`**:

```diff
@@ -192,6 +192,8 @@ Quick steps:
 Note: when making many new blocks analyze the interfaces for each of these blocks and picture if they would go well together in a graph based editor or would they struggle to connect productively?
 ex: do the inputs and outputs tie well together?
 
+If you get any pushback or hit complex block conditions check the new_blocks guide in the docs.
+
 **Modifying the API:**
 
 1. Update route in `/backend/backend/server/routers/`
```
**`autogpt_platform/backend/backend/blocks/exa/_test.py`** (new file, 22 lines):

```python
"""
Test credentials and helpers for Exa blocks.
"""

from pydantic import SecretStr

from backend.data.model import APIKeyCredentials

TEST_CREDENTIALS = APIKeyCredentials(
    id="01234567-89ab-cdef-0123-456789abcdef",
    provider="exa",
    api_key=SecretStr("mock-exa-api-key"),
    title="Mock Exa API key",
    expires_at=None,
)

TEST_CREDENTIALS_INPUT = {
    "provider": TEST_CREDENTIALS.provider,
    "id": TEST_CREDENTIALS.id,
    "type": TEST_CREDENTIALS.type,
    "title": TEST_CREDENTIALS.title,
}
```
**`autogpt_platform/backend/backend/blocks/exa/answers.py`**:

```diff
@@ -1,52 +1,55 @@
 from typing import Optional
 
+from exa_py import AsyncExa
+from exa_py.api import AnswerResponse
+from pydantic import BaseModel
+
 from backend.sdk import (
     APIKeyCredentials,
-    BaseModel,
     Block,
     BlockCategory,
     BlockOutput,
     BlockSchemaInput,
     BlockSchemaOutput,
     CredentialsMetaInput,
-    Requests,
+    MediaFileType,
     SchemaField,
 )
 
 from ._config import exa
 
 
-class CostBreakdown(BaseModel):
-    keywordSearch: float
-    neuralSearch: float
-    contentText: float
-    contentHighlight: float
-    contentSummary: float
+class AnswerCitation(BaseModel):
+    """Citation model for answer endpoint."""
+
+    id: str = SchemaField(description="The temporary ID for the document")
+    url: str = SchemaField(description="The URL of the search result")
+    title: Optional[str] = SchemaField(description="The title of the search result")
+    author: Optional[str] = SchemaField(description="The author of the content")
+    publishedDate: Optional[str] = SchemaField(
+        description="An estimate of the creation date"
+    )
+    text: Optional[str] = SchemaField(description="The full text content of the source")
+    image: Optional[MediaFileType] = SchemaField(
+        description="The URL of the image associated with the result"
+    )
+    favicon: Optional[MediaFileType] = SchemaField(
+        description="The URL of the favicon for the domain"
+    )
 
-
-class SearchBreakdown(BaseModel):
-    search: float
-    contents: float
-    breakdown: CostBreakdown
-
-
-class PerRequestPrices(BaseModel):
-    neuralSearch_1_25_results: float
-    neuralSearch_26_100_results: float
-    neuralSearch_100_plus_results: float
-    keywordSearch_1_100_results: float
-    keywordSearch_100_plus_results: float
-
-
-class PerPagePrices(BaseModel):
-    contentText: float
-    contentHighlight: float
-    contentSummary: float
-
-
-class CostDollars(BaseModel):
-    total: float
-    breakDown: list[SearchBreakdown]
-    perRequestPrices: PerRequestPrices
-    perPagePrices: PerPagePrices
+    @classmethod
+    def from_sdk(cls, sdk_citation) -> "AnswerCitation":
+        """Convert SDK AnswerResult (dataclass) to our Pydantic model."""
+        return cls(
+            id=getattr(sdk_citation, "id", ""),
+            url=getattr(sdk_citation, "url", ""),
+            title=getattr(sdk_citation, "title", None),
+            author=getattr(sdk_citation, "author", None),
+            publishedDate=getattr(sdk_citation, "published_date", None),
+            text=getattr(sdk_citation, "text", None),
+            image=getattr(sdk_citation, "image", None),
+            favicon=getattr(sdk_citation, "favicon", None),
+        )
 
 
 class ExaAnswerBlock(Block):
@@ -59,31 +62,21 @@ class ExaAnswerBlock(Block):
             placeholder="What is the latest valuation of SpaceX?",
         )
         text: bool = SchemaField(
-            default=False,
-            description="If true, the response includes full text content in the search results",
-            advanced=True,
-        )
-        model: str = SchemaField(
-            default="exa",
-            description="The search model to use (exa or exa-pro)",
-            placeholder="exa",
-            advanced=True,
+            description="Include full text content in the search results used for the answer",
+            default=True,
         )
 
     class Output(BlockSchemaOutput):
         answer: str = SchemaField(
             description="The generated answer based on search results"
         )
-        citations: list[dict] = SchemaField(
-            description="Search results used to generate the answer",
-            default_factory=list,
+        citations: list[AnswerCitation] = SchemaField(
+            description="Search results used to generate the answer"
         )
-        cost_dollars: CostDollars = SchemaField(
-            description="Cost breakdown of the request"
-        )
-        error: str = SchemaField(
-            description="Error message if the request failed", default=""
+        citation: AnswerCitation = SchemaField(
+            description="Individual citation from the answer"
         )
+        error: str = SchemaField(description="Error message if the request failed")
 
     def __init__(self):
         super().__init__(
@@ -97,26 +90,24 @@ class ExaAnswerBlock(Block):
     async def run(
         self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
     ) -> BlockOutput:
-        url = "https://api.exa.ai/answer"
-        headers = {
-            "Content-Type": "application/json",
-            "x-api-key": credentials.api_key.get_secret_value(),
-        }
+        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
 
-        # Build the payload
-        payload = {
-            "query": input_data.query,
-            "text": input_data.text,
-            "model": input_data.model,
-        }
+        # Get answer using SDK (stream=False for blocks) - this IS async, needs await
+        response = await aexa.answer(
+            query=input_data.query, text=input_data.text, stream=False
+        )
 
-        try:
-            response = await Requests().post(url, headers=headers, json=payload)
-            data = response.json()
+        # this should remain true as long as they don't start defaulting to streaming only.
+        # provides a bit of safety for sdk updates.
+        assert type(response) is AnswerResponse
 
-            yield "answer", data.get("answer", "")
-            yield "citations", data.get("citations", [])
-            yield "cost_dollars", data.get("costDollars", {})
+        yield "answer", response.answer
 
-        except Exception as e:
-            yield "error", str(e)
+        citations = [
+            AnswerCitation.from_sdk(sdk_citation)
+            for sdk_citation in response.citations or []
+        ]
+
+        yield "citations", citations
+        for citation in citations:
+            yield "citation", citation
```
**`autogpt_platform/backend/backend/blocks/exa/code_context.py`** (new file, 118 lines):

```python
"""
Exa Code Context Block

Provides code search capabilities to find relevant code snippets and examples
from open source repositories, documentation, and Stack Overflow.
"""

from typing import Union

from pydantic import BaseModel

from backend.sdk import (
    APIKeyCredentials,
    Block,
    BlockCategory,
    BlockOutput,
    BlockSchemaInput,
    BlockSchemaOutput,
    CredentialsMetaInput,
    Requests,
    SchemaField,
)

from ._config import exa


class CodeContextResponse(BaseModel):
    """Stable output model for code context responses."""

    request_id: str
    query: str
    response: str
    results_count: int
    cost_dollars: str
    search_time: float
    output_tokens: int

    @classmethod
    def from_api(cls, data: dict) -> "CodeContextResponse":
        """Convert API response to our stable model."""
        return cls(
            request_id=data.get("requestId", ""),
            query=data.get("query", ""),
            response=data.get("response", ""),
            results_count=data.get("resultsCount", 0),
            cost_dollars=data.get("costDollars", ""),
            search_time=data.get("searchTime", 0.0),
            output_tokens=data.get("outputTokens", 0),
        )


class ExaCodeContextBlock(Block):
    """Get relevant code snippets and examples from open source repositories."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        query: str = SchemaField(
            description="Search query to find relevant code snippets. Describe what you're trying to do or what code you're looking for.",
            placeholder="how to use React hooks for state management",
        )
        tokens_num: Union[str, int] = SchemaField(
            default="dynamic",
            description="Token limit for response. Use 'dynamic' for automatic sizing, 5000 for standard queries, or 10000 for comprehensive examples.",
            placeholder="dynamic",
        )

    class Output(BlockSchemaOutput):
        request_id: str = SchemaField(description="Unique identifier for this request")
        query: str = SchemaField(description="The search query used")
        response: str = SchemaField(
            description="Formatted code snippets and contextual examples with sources"
        )
        results_count: int = SchemaField(
            description="Number of code sources found and included"
        )
        cost_dollars: str = SchemaField(description="Cost of this request in dollars")
        search_time: float = SchemaField(
            description="Time taken to search in milliseconds"
        )
        output_tokens: int = SchemaField(description="Number of tokens in the response")

    def __init__(self):
        super().__init__(
            id="8f9e0d1c-2b3a-4567-8901-23456789abcd",
            description="Search billions of GitHub repos, docs, and Stack Overflow for relevant code examples",
            categories={BlockCategory.SEARCH, BlockCategory.DEVELOPER_TOOLS},
            input_schema=ExaCodeContextBlock.Input,
            output_schema=ExaCodeContextBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        url = "https://api.exa.ai/context"
        headers = {
            "Content-Type": "application/json",
            "x-api-key": credentials.api_key.get_secret_value(),
        }

        payload = {
            "query": input_data.query,
            "tokensNum": input_data.tokens_num,
        }

        response = await Requests().post(url, headers=headers, json=payload)
        data = response.json()

        context = CodeContextResponse.from_api(data)

        yield "request_id", context.request_id
        yield "query", context.query
        yield "response", context.response
        yield "results_count", context.results_count
        yield "cost_dollars", context.cost_dollars
        yield "search_time", context.search_time
        yield "output_tokens", context.output_tokens
```
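The block wraps Exa's `/context` endpoint directly; roughly, it is equivalent to this request (an illustrative `httpx` call using the payload keys from `run()` above, not code from the PR):

```python
# Illustrative direct call to the endpoint ExaCodeContextBlock wraps.
import httpx

resp = httpx.post(
    "https://api.exa.ai/context",
    headers={"Content-Type": "application/json", "x-api-key": "YOUR_EXA_API_KEY"},
    json={"query": "how to use React hooks for state management", "tokensNum": "dynamic"},
)
print(resp.json()["response"])  # formatted code snippets with sources
```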
**`autogpt_platform/backend/backend/blocks/exa/contents.py`**:

```diff
@@ -1,3 +1,9 @@
+from enum import Enum
 from typing import Optional
 
+from exa_py import AsyncExa
+from pydantic import BaseModel
+
 from backend.sdk import (
     APIKeyCredentials,
     Block,
@@ -6,12 +12,45 @@ from backend.sdk import (
     BlockSchemaInput,
     BlockSchemaOutput,
     CredentialsMetaInput,
-    Requests,
     SchemaField,
 )
 
 from ._config import exa
-from .helpers import ContentSettings
+from .helpers import (
+    CostDollars,
+    ExaSearchResults,
+    ExtrasSettings,
+    HighlightSettings,
+    LivecrawlTypes,
+    SummarySettings,
+)
+
+
+class ContentStatusTag(str, Enum):
+    CRAWL_NOT_FOUND = "CRAWL_NOT_FOUND"
+    CRAWL_TIMEOUT = "CRAWL_TIMEOUT"
+    CRAWL_LIVECRAWL_TIMEOUT = "CRAWL_LIVECRAWL_TIMEOUT"
+    SOURCE_NOT_AVAILABLE = "SOURCE_NOT_AVAILABLE"
+    CRAWL_UNKNOWN_ERROR = "CRAWL_UNKNOWN_ERROR"
+
+
+class ContentError(BaseModel):
+    tag: Optional[ContentStatusTag] = SchemaField(
+        default=None, description="Specific error type"
+    )
+    httpStatusCode: Optional[int] = SchemaField(
+        default=None, description="The corresponding HTTP status code"
+    )
+
+
+class ContentStatus(BaseModel):
+    id: str = SchemaField(description="The URL that was requested")
+    status: str = SchemaField(
+        description="Status of the content fetch operation (success or error)"
+    )
+    error: Optional[ContentError] = SchemaField(
+        default=None, description="Error details, only present when status is 'error'"
+    )
 
 
 class ExaContentsBlock(Block):
@@ -19,22 +58,70 @@ class ExaContentsBlock(Block):
         credentials: CredentialsMetaInput = exa.credentials_field(
             description="The Exa integration requires an API Key."
         )
-        ids: list[str] = SchemaField(
-            description="Array of document IDs obtained from searches"
+        urls: list[str] = SchemaField(
+            description="Array of URLs to crawl (preferred over 'ids')",
+            default_factory=list,
+            advanced=False,
         )
-        contents: ContentSettings = SchemaField(
-            description="Content retrieval settings",
-            default=ContentSettings(),
+        ids: list[str] = SchemaField(
+            description="[DEPRECATED - use 'urls' instead] Array of document IDs obtained from searches",
+            default_factory=list,
+            advanced=True,
         )
+        text: bool = SchemaField(
+            description="Retrieve text content from pages",
+            default=True,
+        )
+        highlights: HighlightSettings = SchemaField(
+            description="Text snippets most relevant from each page",
+            default=HighlightSettings(),
+        )
+        summary: SummarySettings = SchemaField(
+            description="LLM-generated summary of the webpage",
+            default=SummarySettings(),
+        )
+        livecrawl: Optional[LivecrawlTypes] = SchemaField(
+            description="Livecrawling options: never, fallback (default), always, preferred",
+            default=LivecrawlTypes.FALLBACK,
+            advanced=True,
+        )
+        livecrawl_timeout: Optional[int] = SchemaField(
+            description="Timeout for livecrawling in milliseconds",
+            default=10000,
+            advanced=True,
+        )
+        subpages: Optional[int] = SchemaField(
+            description="Number of subpages to crawl", default=0, ge=0, advanced=True
+        )
+        subpage_target: Optional[str | list[str]] = SchemaField(
+            description="Keyword(s) to find specific subpages of search results",
+            default=None,
+            advanced=True,
+        )
+        extras: ExtrasSettings = SchemaField(
+            description="Extra parameters for additional content",
+            default=ExtrasSettings(),
+            advanced=True,
+        )
 
     class Output(BlockSchemaOutput):
-        results: list = SchemaField(
-            description="List of document contents", default_factory=list
+        results: list[ExaSearchResults] = SchemaField(
+            description="List of document contents with metadata"
         )
-        error: str = SchemaField(
-            description="Error message if the request failed", default=""
+        result: ExaSearchResults = SchemaField(
+            description="Single document content result"
         )
+        context: str = SchemaField(
+            description="A formatted string of the results ready for LLMs"
+        )
+        request_id: str = SchemaField(description="Unique identifier for the request")
+        statuses: list[ContentStatus] = SchemaField(
+            description="Status information for each requested URL"
+        )
+        cost_dollars: Optional[CostDollars] = SchemaField(
+            description="Cost breakdown for the request"
+        )
+        error: str = SchemaField(description="Error message if the request failed")
 
     def __init__(self):
         super().__init__(
@@ -48,23 +135,91 @@ class ExaContentsBlock(Block):
     async def run(
         self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
     ) -> BlockOutput:
-        url = "https://api.exa.ai/contents"
-        headers = {
-            "Content-Type": "application/json",
-            "x-api-key": credentials.api_key.get_secret_value(),
-        }
+        if not input_data.urls and not input_data.ids:
+            raise ValueError("Either 'urls' or 'ids' must be provided")
 
-        # Convert ContentSettings to API format
-        payload = {
-            "ids": input_data.ids,
-            "text": input_data.contents.text,
-            "highlights": input_data.contents.highlights,
-            "summary": input_data.contents.summary,
-        }
+        sdk_kwargs = {}
 
-        try:
-            response = await Requests().post(url, headers=headers, json=payload)
-            data = response.json()
-            yield "results", data.get("results", [])
-        except Exception as e:
-            yield "error", str(e)
+        # Prefer urls over ids
+        if input_data.urls:
+            sdk_kwargs["urls"] = input_data.urls
+        elif input_data.ids:
+            sdk_kwargs["ids"] = input_data.ids
+
+        if input_data.text:
+            sdk_kwargs["text"] = {"includeHtmlTags": True}
+
+        # Handle highlights - only include if modified from defaults
+        if input_data.highlights and (
+            input_data.highlights.num_sentences != 1
+            or input_data.highlights.highlights_per_url != 1
+            or input_data.highlights.query is not None
+        ):
+            highlights_dict = {}
+            highlights_dict["numSentences"] = input_data.highlights.num_sentences
+            highlights_dict["highlightsPerUrl"] = (
+                input_data.highlights.highlights_per_url
+            )
+            if input_data.highlights.query:
+                highlights_dict["query"] = input_data.highlights.query
+            sdk_kwargs["highlights"] = highlights_dict
+
+        # Handle summary - only include if modified from defaults
+        if input_data.summary and (
+            input_data.summary.query is not None
+            or input_data.summary.schema is not None
+        ):
+            summary_dict = {}
+            if input_data.summary.query:
+                summary_dict["query"] = input_data.summary.query
+            if input_data.summary.schema:
+                summary_dict["schema"] = input_data.summary.schema
+            sdk_kwargs["summary"] = summary_dict
+
+        if input_data.livecrawl:
+            sdk_kwargs["livecrawl"] = input_data.livecrawl.value
+
+        if input_data.livecrawl_timeout is not None:
+            sdk_kwargs["livecrawl_timeout"] = input_data.livecrawl_timeout
+
+        if input_data.subpages is not None:
+            sdk_kwargs["subpages"] = input_data.subpages
+
+        if input_data.subpage_target:
+            sdk_kwargs["subpage_target"] = input_data.subpage_target
+
+        # Handle extras - only include if modified from defaults
+        if input_data.extras and (
+            input_data.extras.links > 0 or input_data.extras.image_links > 0
+        ):
+            extras_dict = {}
+            if input_data.extras.links:
+                extras_dict["links"] = input_data.extras.links
+            if input_data.extras.image_links:
+                extras_dict["image_links"] = input_data.extras.image_links
+            sdk_kwargs["extras"] = extras_dict
+
+        # Always enable context for LLM-ready output
+        sdk_kwargs["context"] = True
+
+        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
+        response = await aexa.get_contents(**sdk_kwargs)
+
+        converted_results = [
+            ExaSearchResults.from_sdk(sdk_result)
+            for sdk_result in response.results or []
+        ]
+
+        yield "results", converted_results
+
+        for result in converted_results:
+            yield "result", result
+
+        if response.context:
+            yield "context", response.context
+
+        if response.statuses:
+            yield "statuses", response.statuses
+
+        if response.cost_dollars:
+            yield "cost_dollars", response.cost_dollars
```
**`autogpt_platform/backend/backend/blocks/exa/helpers.py`**:

```diff
@@ -1,51 +1,150 @@
-from typing import Optional
+from enum import Enum
+from typing import Any, Dict, Literal, Optional, Union
 
-from backend.sdk import BaseModel, SchemaField
+from backend.sdk import BaseModel, MediaFileType, SchemaField
 
 
-class TextSettings(BaseModel):
-    max_characters: int = SchemaField(
-        default=1000,
+class LivecrawlTypes(str, Enum):
+    NEVER = "never"
+    FALLBACK = "fallback"
+    ALWAYS = "always"
+    PREFERRED = "preferred"
+
+
+class TextEnabled(BaseModel):
+    discriminator: Literal["enabled"] = "enabled"
+
+
+class TextDisabled(BaseModel):
+    discriminator: Literal["disabled"] = "disabled"
+
+
+class TextAdvanced(BaseModel):
+    discriminator: Literal["advanced"] = "advanced"
+    max_characters: Optional[int] = SchemaField(
+        default=None,
         description="Maximum number of characters to return",
         placeholder="1000",
     )
     include_html_tags: bool = SchemaField(
         default=False,
-        description="Whether to include HTML tags in the text",
+        description="Include HTML tags in the response, helps LLMs understand text structure",
         placeholder="False",
     )
 
 
 class HighlightSettings(BaseModel):
     num_sentences: int = SchemaField(
-        default=3,
+        default=1,
         description="Number of sentences per highlight",
-        placeholder="3",
+        placeholder="1",
+        ge=1,
     )
     highlights_per_url: int = SchemaField(
-        default=3,
+        default=1,
         description="Number of highlights per URL",
-        placeholder="3",
+        placeholder="1",
+        ge=1,
     )
+    query: Optional[str] = SchemaField(
+        default=None,
+        description="Custom query to direct the LLM's selection of highlights",
+        placeholder="Key advancements",
+    )
 
 
 class SummarySettings(BaseModel):
     query: Optional[str] = SchemaField(
-        default="",
-        description="Query string for summarization",
-        placeholder="Enter query",
+        default=None,
+        description="Custom query for the LLM-generated summary",
+        placeholder="Main developments",
     )
+    schema: Optional[dict] = SchemaField(  # type: ignore
+        default=None,
+        description="JSON schema for structured output from summary",
+        advanced=True,
+    )
+
+
+class ExtrasSettings(BaseModel):
+    links: int = SchemaField(
+        default=0,
+        description="Number of URLs to return from each webpage",
+        placeholder="1",
+        ge=0,
+    )
+    image_links: int = SchemaField(
+        default=0,
+        description="Number of images to return for each result",
+        placeholder="1",
+        ge=0,
+    )
+
+
+class ContextEnabled(BaseModel):
+    discriminator: Literal["enabled"] = "enabled"
+
+
+class ContextDisabled(BaseModel):
+    discriminator: Literal["disabled"] = "disabled"
+
+
+class ContextAdvanced(BaseModel):
+    discriminator: Literal["advanced"] = "advanced"
+    max_characters: Optional[int] = SchemaField(
+        default=None,
+        description="Maximum character limit for context string",
+        placeholder="10000",
+    )
 
 
 class ContentSettings(BaseModel):
-    text: TextSettings = SchemaField(
-        default=TextSettings(),
+    text: Optional[Union[bool, TextEnabled, TextDisabled, TextAdvanced]] = SchemaField(
+        default=None,
+        description="Text content retrieval. Boolean for simple enable/disable or object for advanced settings",
     )
-    highlights: HighlightSettings = SchemaField(
-        default=HighlightSettings(),
+    highlights: Optional[HighlightSettings] = SchemaField(
+        default=None,
         description="Text snippets most relevant from each page",
     )
-    summary: SummarySettings = SchemaField(
-        default=SummarySettings(),
+    summary: Optional[SummarySettings] = SchemaField(
+        default=None,
         description="LLM-generated summary of the webpage",
     )
+    livecrawl: Optional[LivecrawlTypes] = SchemaField(
+        default=None,
+        description="Livecrawling options: never, fallback, always, preferred",
+        advanced=True,
+    )
+    livecrawl_timeout: Optional[int] = SchemaField(
+        default=None,
+        description="Timeout for livecrawling in milliseconds",
+        placeholder="10000",
+        advanced=True,
+    )
+    subpages: Optional[int] = SchemaField(
+        default=None,
+        description="Number of subpages to crawl",
+        placeholder="0",
+        ge=0,
+        advanced=True,
+    )
+    subpage_target: Optional[Union[str, list[str]]] = SchemaField(
+        default=None,
+        description="Keyword(s) to find specific subpages of search results",
+        advanced=True,
+    )
+    extras: Optional[ExtrasSettings] = SchemaField(
+        default=None,
+        description="Extra parameters for additional content",
+        advanced=True,
+    )
+    context: Optional[Union[bool, ContextEnabled, ContextDisabled, ContextAdvanced]] = (
+        SchemaField(
+            default=None,
+            description="Format search results into a context string for LLMs",
+            advanced=True,
+        )
+    )
```

The second hunk (`@@ -127,3 +226,225 @@`) follows the tail of `WebsetEnrichmentConfig` and is pure addition:

```python
# Shared result models
class ExaSearchExtras(BaseModel):
    links: list[str] = SchemaField(
        default_factory=list, description="Array of links from the search result"
    )
    imageLinks: list[str] = SchemaField(
        default_factory=list, description="Array of image links from the search result"
    )


class ExaSearchResults(BaseModel):
    title: str | None = None
    url: str | None = None
    publishedDate: str | None = None
    author: str | None = None
    id: str
    image: MediaFileType | None = None
    favicon: MediaFileType | None = None
    text: str | None = None
    highlights: list[str] = SchemaField(default_factory=list)
    highlightScores: list[float] = SchemaField(default_factory=list)
    summary: str | None = None
    subpages: list[dict] = SchemaField(default_factory=list)
    extras: ExaSearchExtras | None = None

    @classmethod
    def from_sdk(cls, sdk_result) -> "ExaSearchResults":
        """Convert SDK Result (dataclass) to our Pydantic model."""
        return cls(
            id=getattr(sdk_result, "id", ""),
            url=getattr(sdk_result, "url", None),
            title=getattr(sdk_result, "title", None),
            author=getattr(sdk_result, "author", None),
            publishedDate=getattr(sdk_result, "published_date", None),
            text=getattr(sdk_result, "text", None),
            highlights=getattr(sdk_result, "highlights", None) or [],
            highlightScores=getattr(sdk_result, "highlight_scores", None) or [],
            summary=getattr(sdk_result, "summary", None),
            subpages=getattr(sdk_result, "subpages", None) or [],
            image=getattr(sdk_result, "image", None),
            favicon=getattr(sdk_result, "favicon", None),
            extras=getattr(sdk_result, "extras", None),
        )


# Cost tracking models
class CostBreakdown(BaseModel):
    keywordSearch: float = SchemaField(default=0.0)
    neuralSearch: float = SchemaField(default=0.0)
    contentText: float = SchemaField(default=0.0)
    contentHighlight: float = SchemaField(default=0.0)
    contentSummary: float = SchemaField(default=0.0)


class CostBreakdownItem(BaseModel):
    search: float = SchemaField(default=0.0)
    contents: float = SchemaField(default=0.0)
    breakdown: CostBreakdown = SchemaField(default_factory=CostBreakdown)


class PerRequestPrices(BaseModel):
    neuralSearch_1_25_results: float = SchemaField(default=0.005)
    neuralSearch_26_100_results: float = SchemaField(default=0.025)
    neuralSearch_100_plus_results: float = SchemaField(default=1.0)
    keywordSearch_1_100_results: float = SchemaField(default=0.0025)
    keywordSearch_100_plus_results: float = SchemaField(default=3.0)


class PerPagePrices(BaseModel):
    contentText: float = SchemaField(default=0.001)
    contentHighlight: float = SchemaField(default=0.001)
    contentSummary: float = SchemaField(default=0.001)


class CostDollars(BaseModel):
    total: float = SchemaField(description="Total dollar cost for your request")
    breakDown: list[CostBreakdownItem] = SchemaField(
        default_factory=list, description="Breakdown of costs by operation type"
    )
    perRequestPrices: PerRequestPrices = SchemaField(
        default_factory=PerRequestPrices,
        description="Standard price per request for different operations",
    )
    perPagePrices: PerPagePrices = SchemaField(
        default_factory=PerPagePrices,
        description="Standard price per page for different content operations",
    )


# Helper functions for payload processing
def process_text_field(
    text: Union[bool, TextEnabled, TextDisabled, TextAdvanced, None]
) -> Optional[Union[bool, Dict[str, Any]]]:
    """Process text field for API payload."""
    if text is None:
        return None

    # Handle backward compatibility with boolean
    if isinstance(text, bool):
        return text
    elif isinstance(text, TextDisabled):
        return False
    elif isinstance(text, TextEnabled):
        return True
    elif isinstance(text, TextAdvanced):
        text_dict = {}
        if text.max_characters:
            text_dict["maxCharacters"] = text.max_characters
        if text.include_html_tags:
            text_dict["includeHtmlTags"] = text.include_html_tags
        return text_dict if text_dict else True
    return None


def process_contents_settings(contents: Optional[ContentSettings]) -> Dict[str, Any]:
    """Process ContentSettings into API payload format."""
    if not contents:
        return {}

    content_settings = {}

    # Handle text field (can be boolean or object)
    text_value = process_text_field(contents.text)
    if text_value is not None:
        content_settings["text"] = text_value

    # Handle highlights
    if contents.highlights:
        highlights_dict: Dict[str, Any] = {
            "numSentences": contents.highlights.num_sentences,
            "highlightsPerUrl": contents.highlights.highlights_per_url,
        }
        if contents.highlights.query:
            highlights_dict["query"] = contents.highlights.query
        content_settings["highlights"] = highlights_dict

    if contents.summary:
        summary_dict = {}
        if contents.summary.query:
            summary_dict["query"] = contents.summary.query
        if contents.summary.schema:
            summary_dict["schema"] = contents.summary.schema
        content_settings["summary"] = summary_dict

    if contents.livecrawl:
        content_settings["livecrawl"] = contents.livecrawl.value

    if contents.livecrawl_timeout is not None:
        content_settings["livecrawlTimeout"] = contents.livecrawl_timeout

    if contents.subpages is not None:
        content_settings["subpages"] = contents.subpages

    if contents.subpage_target:
        content_settings["subpageTarget"] = contents.subpage_target

    if contents.extras:
        extras_dict = {}
        if contents.extras.links:
            extras_dict["links"] = contents.extras.links
        if contents.extras.image_links:
            extras_dict["imageLinks"] = contents.extras.image_links
        content_settings["extras"] = extras_dict

    context_value = process_context_field(contents.context)
    if context_value is not None:
        content_settings["context"] = context_value

    return content_settings


def process_context_field(
    context: Union[bool, dict, ContextEnabled, ContextDisabled, ContextAdvanced, None]
) -> Optional[Union[bool, Dict[str, int]]]:
    """Process context field for API payload."""
    if context is None:
        return None

    # Handle backward compatibility with boolean
    if isinstance(context, bool):
        return context if context else None
    elif isinstance(context, dict) and "maxCharacters" in context:
        return {"maxCharacters": context["maxCharacters"]}
    elif isinstance(context, ContextDisabled):
        return None  # Don't send context field at all when disabled
    elif isinstance(context, ContextEnabled):
        return True
    elif isinstance(context, ContextAdvanced):
        if context.max_characters:
            return {"maxCharacters": context.max_characters}
        return True
    return None


def format_date_fields(
    input_data: Any, date_field_mapping: Dict[str, str]
) -> Dict[str, str]:
    """Format datetime fields for API payload."""
    formatted_dates = {}
    for input_field, api_field in date_field_mapping.items():
        value = getattr(input_data, input_field, None)
        if value:
            formatted_dates[api_field] = value.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return formatted_dates


def add_optional_fields(
    input_data: Any,
    field_mapping: Dict[str, str],
    payload: Dict[str, Any],
    process_enums: bool = False,
) -> None:
    """Add optional fields to payload if they have values."""
    for input_field, api_field in field_mapping.items():
        value = getattr(input_data, input_field, None)
        if value:  # Only add non-empty values
            if process_enums and hasattr(value, "value"):
                payload[api_field] = value.value
            else:
                payload[api_field] = value
```
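To show how these helpers compose (a minimal sketch; the import path assumes the package layout used by the files in this PR):

```python
# Sketch: building an API payload from the new ContentSettings models.
from backend.blocks.exa.helpers import (
    ContentSettings,
    HighlightSettings,
    LivecrawlTypes,
    process_contents_settings,
)

settings = ContentSettings(
    text=True,
    highlights=HighlightSettings(num_sentences=2, highlights_per_url=3),
    livecrawl=LivecrawlTypes.PREFERRED,
)

payload = process_contents_settings(settings)
# -> {"text": True,
#     "highlights": {"numSentences": 2, "highlightsPerUrl": 3},
#     "livecrawl": "preferred"}
```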
**`autogpt_platform/backend/backend/blocks/exa/model.py`** (deleted, 247 lines):

```python
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


# Enum definitions based on available options
class WebsetStatus(str, Enum):
    IDLE = "idle"
    PENDING = "pending"
    RUNNING = "running"
    PAUSED = "paused"


class WebsetSearchStatus(str, Enum):
    CREATED = "created"
    # Add more if known, based on example it's "created"


class ImportStatus(str, Enum):
    PENDING = "pending"
    # Add more if known


class ImportFormat(str, Enum):
    CSV = "csv"
    # Add more if known


class EnrichmentStatus(str, Enum):
    PENDING = "pending"
    # Add more if known


class EnrichmentFormat(str, Enum):
    TEXT = "text"
    # Add more if known


class MonitorStatus(str, Enum):
    ENABLED = "enabled"
    # Add more if known


class MonitorBehaviorType(str, Enum):
    SEARCH = "search"
    # Add more if known


class MonitorRunStatus(str, Enum):
    CREATED = "created"
    # Add more if known


class CanceledReason(str, Enum):
    WEBSET_DELETED = "webset_deleted"
    # Add more if known


class FailedReason(str, Enum):
    INVALID_FORMAT = "invalid_format"
    # Add more if known


class Confidence(str, Enum):
    HIGH = "high"
    # Add more if known


# Nested models


class Entity(BaseModel):
    type: str


class Criterion(BaseModel):
    description: str
    successRate: Optional[int] = None


class ExcludeItem(BaseModel):
    source: str = Field(default="import")
    id: str


class Relationship(BaseModel):
    definition: str
    limit: Optional[float] = None


class ScopeItem(BaseModel):
    source: str = Field(default="import")
    id: str
    relationship: Optional[Relationship] = None


class Progress(BaseModel):
    found: int
    analyzed: int
    completion: int
    timeLeft: int


class Bounds(BaseModel):
    min: int
    max: int


class Expected(BaseModel):
    total: int
    confidence: str = Field(default="high")  # Use str or Confidence enum
    bounds: Bounds


class Recall(BaseModel):
    expected: Expected
    reasoning: str


class WebsetSearch(BaseModel):
    id: str
    object: str = Field(default="webset_search")
    status: str = Field(default="created")  # Or use WebsetSearchStatus
    websetId: str
    query: str
    entity: Entity
    criteria: List[Criterion]
    count: int
    behavior: str = Field(default="override")
    exclude: List[ExcludeItem]
    scope: List[ScopeItem]
    progress: Progress
    recall: Recall
    metadata: Dict[str, Any] = Field(default_factory=dict)
    canceledAt: Optional[datetime] = None
    canceledReason: Optional[str] = Field(default=None)  # Or use CanceledReason
    createdAt: datetime
    updatedAt: datetime


class ImportEntity(BaseModel):
    type: str


class Import(BaseModel):
    id: str
    object: str = Field(default="import")
    status: str = Field(default="pending")  # Or use ImportStatus
    format: str = Field(default="csv")  # Or use ImportFormat
    entity: ImportEntity
    title: str
    count: int
    metadata: Dict[str, Any] = Field(default_factory=dict)
    failedReason: Optional[str] = Field(default=None)  # Or use FailedReason
    failedAt: Optional[datetime] = None
    failedMessage: Optional[str] = None
    createdAt: datetime
    updatedAt: datetime


class Option(BaseModel):
    label: str


class WebsetEnrichment(BaseModel):
    id: str
    object: str = Field(default="webset_enrichment")
    status: str = Field(default="pending")  # Or use EnrichmentStatus
    websetId: str
    title: str
    description: str
    format: str = Field(default="text")  # Or use EnrichmentFormat
    options: List[Option]
    instructions: str
    metadata: Dict[str, Any] = Field(default_factory=dict)
    createdAt: datetime
    updatedAt: datetime


class Cadence(BaseModel):
    cron: str
    timezone: str = Field(default="Etc/UTC")


class BehaviorConfig(BaseModel):
    query: Optional[str] = None
    criteria: Optional[List[Criterion]] = None
    entity: Optional[Entity] = None
    count: Optional[int] = None
    behavior: Optional[str] = Field(default=None)


class Behavior(BaseModel):
    type: str = Field(default="search")  # Or use MonitorBehaviorType
    config: BehaviorConfig


class MonitorRun(BaseModel):
    id: str
    object: str = Field(default="monitor_run")
    status: str = Field(default="created")  # Or use MonitorRunStatus
    monitorId: str
    type: str = Field(default="search")
    completedAt: Optional[datetime] = None
    failedAt: Optional[datetime] = None
    failedReason: Optional[str] = None
    canceledAt: Optional[datetime] = None
    createdAt: datetime
    updatedAt: datetime


class Monitor(BaseModel):
    id: str
    object: str = Field(default="monitor")
    status: str = Field(default="enabled")  # Or use MonitorStatus
    websetId: str
    cadence: Cadence
    behavior: Behavior
    lastRun: Optional[MonitorRun] = None
    nextRunAt: Optional[datetime] = None
    metadata: Dict[str, Any] = Field(default_factory=dict)
    createdAt: datetime
    updatedAt: datetime


class Webset(BaseModel):
    id: str
    object: str = Field(default="webset")
    status: WebsetStatus
    externalId: Optional[str] = None
    title: Optional[str] = None
    searches: List[WebsetSearch]
    imports: List[Import]
    enrichments: List[WebsetEnrichment]
    monitors: List[Monitor]
    streams: List[Any]
    createdAt: datetime
    updatedAt: datetime
    metadata: Dict[str, Any] = Field(default_factory=dict)


class ListWebsets(BaseModel):
    data: List[Webset]
    hasMore: bool
    nextCursor: Optional[str] = None
```
518
autogpt_platform/backend/backend/blocks/exa/research.py
Normal file
518
autogpt_platform/backend/backend/blocks/exa/research.py
Normal file
@@ -0,0 +1,518 @@
|
||||
"""
|
||||
Exa Research Task Blocks
|
||||
|
||||
Provides asynchronous research capabilities that explore the web, gather sources,
|
||||
synthesize findings, and return structured results with citations.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import time
|
||||
from enum import Enum
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.sdk import (
|
||||
APIKeyCredentials,
|
||||
Block,
|
||||
BlockCategory,
|
||||
BlockOutput,
|
||||
BlockSchemaInput,
|
||||
BlockSchemaOutput,
|
||||
CredentialsMetaInput,
|
||||
Requests,
|
||||
SchemaField,
|
||||
)
|
||||
|
||||
from ._config import exa
|
||||
|
||||
|
||||
class ResearchModel(str, Enum):
|
||||
"""Available research models."""
|
||||
|
||||
FAST = "exa-research-fast"
|
||||
STANDARD = "exa-research"
|
||||
PRO = "exa-research-pro"
|
||||
|
||||
|
||||
class ResearchStatus(str, Enum):
|
||||
"""Research task status."""
|
||||
|
||||
PENDING = "pending"
|
||||
RUNNING = "running"
|
||||
COMPLETED = "completed"
|
||||
CANCELED = "canceled"
|
||||
FAILED = "failed"
|
||||
|
||||
|
||||
class ResearchCostModel(BaseModel):
|
||||
"""Cost breakdown for a research request."""
|
||||
|
||||
total: float
|
||||
num_searches: int
|
||||
num_pages: int
|
||||
reasoning_tokens: int
|
||||
|
||||
@classmethod
|
||||
def from_api(cls, data: dict) -> "ResearchCostModel":
|
||||
"""Convert API response, rounding fractional counts to integers."""
|
||||
return cls(
|
||||
total=data.get("total", 0.0),
|
||||
num_searches=int(round(data.get("numSearches", 0))),
|
||||
num_pages=int(round(data.get("numPages", 0))),
|
||||
reasoning_tokens=int(round(data.get("reasoningTokens", 0))),
|
||||
)
|
||||
|
||||
|
||||
class ResearchOutputModel(BaseModel):
|
||||
"""Research output with content and optional structured data."""
|
||||
|
||||
content: str
|
||||
parsed: Optional[Dict[str, Any]] = None
|
||||
|
||||
|
||||
class ResearchTaskModel(BaseModel):
|
||||
"""Stable output model for research tasks."""
|
||||
|
||||
research_id: str
|
||||
created_at: int
|
||||
model: str
|
||||
instructions: str
|
||||
status: str
|
||||
output_schema: Optional[Dict[str, Any]] = None
|
||||
output: Optional[ResearchOutputModel] = None
|
||||
cost_dollars: Optional[ResearchCostModel] = None
|
||||
finished_at: Optional[int] = None
|
||||
error: Optional[str] = None
|
||||
|
||||
@classmethod
|
||||
def from_api(cls, data: dict) -> "ResearchTaskModel":
|
||||
"""Convert API response to our stable model."""
|
||||
output_data = data.get("output")
|
||||
output = None
|
||||
if output_data:
|
||||
output = ResearchOutputModel(
|
||||
content=output_data.get("content", ""),
|
||||
parsed=output_data.get("parsed"),
|
||||
)
|
||||
|
||||
cost_data = data.get("costDollars")
|
||||
cost = None
|
||||
if cost_data:
|
||||
cost = ResearchCostModel.from_api(cost_data)
|
||||
|
||||
return cls(
|
||||
research_id=data.get("researchId", ""),
|
||||
created_at=data.get("createdAt", 0),
|
||||
model=data.get("model", "exa-research"),
|
||||
instructions=data.get("instructions", ""),
|
||||
status=data.get("status", "pending"),
|
||||
output_schema=data.get("outputSchema"),
|
||||
output=output,
|
||||
cost_dollars=cost,
|
||||
finished_at=data.get("finishedAt"),
|
||||
error=data.get("error"),
|
||||
)
|
||||
|
||||
|
||||
class ExaCreateResearchBlock(Block):
|
||||
"""Create an asynchronous research task that explores the web and synthesizes findings."""
|
||||
|
||||
class Input(BlockSchemaInput):
|
||||
credentials: CredentialsMetaInput = exa.credentials_field(
|
||||
description="The Exa integration requires an API Key."
|
||||
)
|
||||
instructions: str = SchemaField(
|
||||
description="Research instructions - clearly define what information to find, how to conduct research, and desired output format.",
|
||||
placeholder="Research the top 5 AI coding assistants, their features, pricing, and user reviews",
|
||||
)
|
||||
model: ResearchModel = SchemaField(
|
||||
default=ResearchModel.STANDARD,
|
||||
description="Research model: 'fast' for quick results, 'standard' for balanced quality, 'pro' for thorough analysis",
|
||||
)
|
||||
output_schema: Optional[dict] = SchemaField(
|
||||
default=None,
|
||||
description="JSON Schema to enforce structured output. When provided, results are validated and returned as parsed JSON.",
|
||||
advanced=True,
|
||||
)
|
||||
wait_for_completion: bool = SchemaField(
|
||||
default=True,
|
||||
description="Wait for research to complete before returning. Ensures you get results immediately.",
|
||||
)
|
||||
polling_timeout: int = SchemaField(
|
||||
default=600,
|
||||
description="Maximum time to wait for completion in seconds (only if wait_for_completion is True)",
|
||||
advanced=True,
|
||||
ge=1,
|
||||
le=3600,
|
||||
)
|
||||
|
||||
class Output(BlockSchemaOutput):
|
||||
research_id: str = SchemaField(
|
||||
description="Unique identifier for tracking this research request"
|
||||
)
|
||||
status: str = SchemaField(description="Final status of the research")
|
||||
model: str = SchemaField(description="The research model used")
|
||||
instructions: str = SchemaField(
|
||||
description="The research instructions provided"
|
||||
)
|
||||
created_at: int = SchemaField(
|
||||
description="When the research was created (Unix timestamp in ms)"
|
||||
)
|
||||
output_content: Optional[str] = SchemaField(
|
||||
description="Research output as text (only if wait_for_completion was True and completed)"
|
||||
)
|
||||
output_parsed: Optional[dict] = SchemaField(
|
||||
description="Structured JSON output (only if wait_for_completion and outputSchema were provided)"
|
||||
)
|
||||
cost_total: Optional[float] = SchemaField(
|
||||
description="Total cost in USD (only if wait_for_completion was True and completed)"
|
||||
)
|
||||
elapsed_time: Optional[float] = SchemaField(
|
||||
description="Time taken to complete in seconds (only if wait_for_completion was True)"
|
||||
)
|
||||
|
||||
def __init__(self):
|
||||
super().__init__(
|
||||
id="a1f2e3d4-c5b6-4a78-9012-3456789abcde",
|
||||
description="Create research task with optional waiting - explores web and synthesizes findings with citations",
|
||||
categories={BlockCategory.SEARCH, BlockCategory.AI},
|
||||
input_schema=ExaCreateResearchBlock.Input,
|
||||
output_schema=ExaCreateResearchBlock.Output,
|
||||
)
|
||||
|
||||
    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        url = "https://api.exa.ai/research/v1"
        headers = {
            "Content-Type": "application/json",
            "x-api-key": credentials.api_key.get_secret_value(),
        }

        payload: Dict[str, Any] = {
            "model": input_data.model.value,
            "instructions": input_data.instructions,
        }

        if input_data.output_schema:
            payload["outputSchema"] = input_data.output_schema

        response = await Requests().post(url, headers=headers, json=payload)
        data = response.json()

        research_id = data.get("researchId", "")

        if input_data.wait_for_completion:
            start_time = time.time()
            get_url = f"https://api.exa.ai/research/v1/{research_id}"
            get_headers = {"x-api-key": credentials.api_key.get_secret_value()}
            check_interval = 10

            while time.time() - start_time < input_data.polling_timeout:
                poll_response = await Requests().get(url=get_url, headers=get_headers)
                poll_data = poll_response.json()

                status = poll_data.get("status", "")

                if status in ["completed", "failed", "canceled"]:
                    elapsed = time.time() - start_time
                    research = ResearchTaskModel.from_api(poll_data)

                    yield "research_id", research.research_id
                    yield "status", research.status
                    yield "model", research.model
                    yield "instructions", research.instructions
                    yield "created_at", research.created_at
                    yield "elapsed_time", elapsed

                    if research.output:
                        yield "output_content", research.output.content
                        yield "output_parsed", research.output.parsed

                    if research.cost_dollars:
                        yield "cost_total", research.cost_dollars.total
                    return

                await asyncio.sleep(check_interval)

            raise ValueError(
                f"Research did not complete within {input_data.polling_timeout} seconds"
            )
        else:
            yield "research_id", research_id
            yield "status", data.get("status", "pending")
            yield "model", data.get("model", input_data.model.value)
            yield "instructions", data.get("instructions", input_data.instructions)
            yield "created_at", data.get("createdAt", 0)


class ExaGetResearchBlock(Block):
    """Get the status and results of a research task."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        research_id: str = SchemaField(
            description="The ID of the research task to retrieve",
            placeholder="01jszdfs0052sg4jc552sg4jc5",
        )
        include_events: bool = SchemaField(
            default=False,
            description="Include detailed event log of research operations",
            advanced=True,
        )

    class Output(BlockSchemaOutput):
        research_id: str = SchemaField(description="The research task identifier")
        status: str = SchemaField(
            description="Current status: pending, running, completed, canceled, or failed"
        )
        instructions: str = SchemaField(
            description="The original research instructions"
        )
        model: str = SchemaField(description="The research model used")
        created_at: int = SchemaField(
            description="When research was created (Unix timestamp in ms)"
        )
        finished_at: Optional[int] = SchemaField(
            description="When research finished (Unix timestamp in ms, if completed/canceled/failed)"
        )
        output_content: Optional[str] = SchemaField(
            description="Research output as text (if completed)"
        )
        output_parsed: Optional[dict] = SchemaField(
            description="Structured JSON output matching outputSchema (if provided and completed)"
        )
        cost_total: Optional[float] = SchemaField(
            description="Total cost in USD (if completed)"
        )
        cost_searches: Optional[int] = SchemaField(
            description="Number of searches performed (if completed)"
        )
        cost_pages: Optional[int] = SchemaField(
            description="Number of pages crawled (if completed)"
        )
        cost_reasoning_tokens: Optional[int] = SchemaField(
            description="AI tokens used for reasoning (if completed)"
        )
        error_message: Optional[str] = SchemaField(
            description="Error message if research failed"
        )
        events: Optional[List[dict]] = SchemaField(
            description="Detailed event log (if include_events was True)"
        )

    def __init__(self):
        super().__init__(
            id="b2e3f4a5-6789-4bcd-9012-3456789abcde",
            description="Get status and results of a research task",
            categories={BlockCategory.SEARCH},
            input_schema=ExaGetResearchBlock.Input,
            output_schema=ExaGetResearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        url = f"https://api.exa.ai/research/v1/{input_data.research_id}"
        headers = {
            "x-api-key": credentials.api_key.get_secret_value(),
        }

        params = {}
        if input_data.include_events:
            params["events"] = "true"

        response = await Requests().get(url, headers=headers, params=params)
        data = response.json()

        research = ResearchTaskModel.from_api(data)

        yield "research_id", research.research_id
        yield "status", research.status
        yield "instructions", research.instructions
        yield "model", research.model
        yield "created_at", research.created_at
        yield "finished_at", research.finished_at

        if research.output:
            yield "output_content", research.output.content
            yield "output_parsed", research.output.parsed

        if research.cost_dollars:
            yield "cost_total", research.cost_dollars.total
            yield "cost_searches", research.cost_dollars.num_searches
            yield "cost_pages", research.cost_dollars.num_pages
            yield "cost_reasoning_tokens", research.cost_dollars.reasoning_tokens

        yield "error_message", research.error

        if input_data.include_events:
            yield "events", data.get("events", [])


class ExaWaitForResearchBlock(Block):
    """Wait for a research task to complete with progress tracking."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        research_id: str = SchemaField(
            description="The ID of the research task to wait for",
            placeholder="01jszdfs0052sg4jc552sg4jc5",
        )
        timeout: int = SchemaField(
            default=600,
            description="Maximum time to wait in seconds",
            ge=1,
            le=3600,
        )
        check_interval: int = SchemaField(
            default=10,
            description="Seconds between status checks",
            advanced=True,
            ge=1,
            le=60,
        )

    class Output(BlockSchemaOutput):
        research_id: str = SchemaField(description="The research task identifier")
        final_status: str = SchemaField(description="Final status when polling stopped")
        output_content: Optional[str] = SchemaField(
            description="Research output as text (if completed)"
        )
        output_parsed: Optional[dict] = SchemaField(
            description="Structured JSON output (if outputSchema was provided and completed)"
        )
        cost_total: Optional[float] = SchemaField(description="Total cost in USD")
        elapsed_time: float = SchemaField(description="Total time waited in seconds")
        timed_out: bool = SchemaField(
            description="Whether polling timed out before completion"
        )

    def __init__(self):
        super().__init__(
            id="c3d4e5f6-7890-4abc-9012-3456789abcde",
            description="Wait for a research task to complete with configurable timeout",
            categories={BlockCategory.SEARCH},
            input_schema=ExaWaitForResearchBlock.Input,
            output_schema=ExaWaitForResearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        start_time = time.time()
        url = f"https://api.exa.ai/research/v1/{input_data.research_id}"
        headers = {
            "x-api-key": credentials.api_key.get_secret_value(),
        }

        while time.time() - start_time < input_data.timeout:
            response = await Requests().get(url, headers=headers)
            data = response.json()

            status = data.get("status", "")

            if status in ["completed", "failed", "canceled"]:
                elapsed = time.time() - start_time
                research = ResearchTaskModel.from_api(data)

                yield "research_id", research.research_id
                yield "final_status", research.status
                yield "elapsed_time", elapsed
                yield "timed_out", False

                if research.output:
                    yield "output_content", research.output.content
                    yield "output_parsed", research.output.parsed

                if research.cost_dollars:
                    yield "cost_total", research.cost_dollars.total

                return

            await asyncio.sleep(input_data.check_interval)

        elapsed = time.time() - start_time
        response = await Requests().get(url, headers=headers)
        data = response.json()

        yield "research_id", input_data.research_id
        yield "final_status", data.get("status", "unknown")
        yield "elapsed_time", elapsed
        yield "timed_out", True


class ExaListResearchBlock(Block):
    """List all research tasks with pagination support."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        cursor: Optional[str] = SchemaField(
            default=None,
            description="Cursor for pagination through results",
            advanced=True,
        )
        limit: int = SchemaField(
            default=10,
            description="Number of research tasks to return (1-50)",
            ge=1,
            le=50,
            advanced=True,
        )

    class Output(BlockSchemaOutput):
        research_tasks: List[ResearchTaskModel] = SchemaField(
            description="List of research tasks ordered by creation time (newest first)"
        )
        research_task: ResearchTaskModel = SchemaField(
            description="Individual research task (yielded for each task)"
        )
        has_more: bool = SchemaField(
            description="Whether there are more tasks to paginate through"
        )
        next_cursor: Optional[str] = SchemaField(
            description="Cursor for the next page of results"
        )

    def __init__(self):
        super().__init__(
            id="d4e5f6a7-8901-4bcd-9012-3456789abcde",
            description="List all research tasks with pagination support",
            categories={BlockCategory.SEARCH},
            input_schema=ExaListResearchBlock.Input,
            output_schema=ExaListResearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        url = "https://api.exa.ai/research/v1"
        headers = {
            "x-api-key": credentials.api_key.get_secret_value(),
        }

        params: Dict[str, Any] = {
            "limit": input_data.limit,
        }
        if input_data.cursor:
            params["cursor"] = input_data.cursor

        response = await Requests().get(url, headers=headers, params=params)
        data = response.json()

        tasks = [ResearchTaskModel.from_api(task) for task in data.get("data", [])]

        yield "research_tasks", tasks

        for task in tasks:
            yield "research_task", task

        yield "has_more", data.get("hasMore", False)
        yield "next_cursor", data.get("nextCursor")
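
For orientation, here is a minimal standalone sketch of the create-then-poll flow the research blocks above wrap. It targets the same `https://api.exa.ai/research/v1` endpoints and terminal statuses shown in the code; the use of `httpx` and the `"standard"` model value (quoted in the field description above) are illustrative assumptions, not part of this PR.

```python
# Minimal sketch of the Exa research create-then-poll flow (assumes httpx).
import asyncio

import httpx


async def run_research(api_key: str, instructions: str, timeout: int = 600) -> dict:
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    async with httpx.AsyncClient() as client:
        # Create the research task (same endpoint the block POSTs to).
        created = await client.post(
            "https://api.exa.ai/research/v1",
            headers=headers,
            json={"model": "standard", "instructions": instructions},
        )
        research_id = created.json().get("researchId", "")

        # Poll until a terminal state, mirroring the block's loop.
        deadline = asyncio.get_event_loop().time() + timeout
        while asyncio.get_event_loop().time() < deadline:
            poll = await client.get(
                f"https://api.exa.ai/research/v1/{research_id}", headers=headers
            )
            data = poll.json()
            if data.get("status") in ("completed", "failed", "canceled"):
                return data
            await asyncio.sleep(10)
    raise TimeoutError(f"Research {research_id} did not finish within {timeout}s")
```
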
@@ -1,4 +1,8 @@
 from datetime import datetime
+from enum import Enum
+from typing import Optional

+from exa_py import AsyncExa
+
 from backend.sdk import (
     APIKeyCredentials,
@@ -8,12 +12,35 @@ from backend.sdk import (
     BlockSchemaInput,
     BlockSchemaOutput,
     CredentialsMetaInput,
     Requests,
     SchemaField,
 )

 from ._config import exa
-from .helpers import ContentSettings
+from .helpers import (
+    ContentSettings,
+    CostDollars,
+    ExaSearchResults,
+    process_contents_settings,
+)
+
+
+class ExaSearchTypes(Enum):
+    KEYWORD = "keyword"
+    NEURAL = "neural"
+    FAST = "fast"
+    AUTO = "auto"
+
+
+class ExaSearchCategories(Enum):
+    COMPANY = "company"
+    RESEARCH_PAPER = "research paper"
+    NEWS = "news"
+    PDF = "pdf"
+    GITHUB = "github"
+    TWEET = "tweet"
+    PERSONAL_SITE = "personal site"
+    LINKEDIN_PROFILE = "linkedin profile"
+    FINANCIAL_REPORT = "financial report"


 class ExaSearchBlock(Block):
@@ -22,12 +49,18 @@ class ExaSearchBlock(Block):
             description="The Exa integration requires an API Key."
         )
         query: str = SchemaField(description="The search query")
-        use_auto_prompt: bool = SchemaField(
-            description="Whether to use autoprompt", default=True, advanced=True
+        type: ExaSearchTypes = SchemaField(
+            description="Type of search", default=ExaSearchTypes.AUTO, advanced=True
         )
-        type: str = SchemaField(description="Type of search", default="", advanced=True)
-        category: str = SchemaField(
-            description="Category to search within", default="", advanced=True
+        category: ExaSearchCategories | None = SchemaField(
+            description="Category to search within: company, research paper, news, pdf, github, tweet, personal site, linkedin profile, financial report",
+            default=None,
+            advanced=True,
         )
+        user_location: str | None = SchemaField(
+            description="The two-letter ISO country code of the user (e.g., 'US')",
+            default=None,
+            advanced=True,
+        )
         number_of_results: int = SchemaField(
             description="Number of results to return", default=10, advanced=True
@@ -40,17 +73,17 @@ class ExaSearchBlock(Block):
             default_factory=list,
             advanced=True,
         )
-        start_crawl_date: datetime = SchemaField(
-            description="Start date for crawled content"
+        start_crawl_date: datetime | None = SchemaField(
+            description="Start date for crawled content", advanced=True, default=None
         )
-        end_crawl_date: datetime = SchemaField(
-            description="End date for crawled content"
+        end_crawl_date: datetime | None = SchemaField(
+            description="End date for crawled content", advanced=True, default=None
         )
-        start_published_date: datetime = SchemaField(
-            description="Start date for published content"
+        start_published_date: datetime | None = SchemaField(
+            description="Start date for published content", advanced=True, default=None
        )
-        end_published_date: datetime = SchemaField(
-            description="End date for published content"
+        end_published_date: datetime | None = SchemaField(
+            description="End date for published content", advanced=True, default=None
         )
         include_text: list[str] = SchemaField(
             description="Text patterns to include", default_factory=list, advanced=True
@@ -63,14 +96,30 @@ class ExaSearchBlock(Block):
             default=ContentSettings(),
             advanced=True,
         )
+        moderation: bool = SchemaField(
+            description="Enable content moderation to filter unsafe content from search results",
+            default=False,
+            advanced=True,
+        )

     class Output(BlockSchemaOutput):
-        results: list = SchemaField(
-            description="List of search results", default_factory=list
+        results: list[ExaSearchResults] = SchemaField(
+            description="List of search results"
         )
-        error: str = SchemaField(
-            description="Error message if the request failed",
-        )
+        result: ExaSearchResults = SchemaField(description="Single search result")
+        context: str = SchemaField(
+            description="A formatted string of the search results ready for LLMs."
+        )
+        search_type: str = SchemaField(
+            description="For auto searches, indicates which search type was selected."
+        )
+        resolved_search_type: str = SchemaField(
+            description="The search type that was actually used for this request (neural or keyword)"
+        )
+        cost_dollars: Optional[CostDollars] = SchemaField(
+            description="Cost breakdown for the request"
+        )
+        error: str = SchemaField(description="Error message if the request failed")

     def __init__(self):
         super().__init__(
@@ -84,51 +133,76 @@ class ExaSearchBlock(Block):
     async def run(
         self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
     ) -> BlockOutput:
-        url = "https://api.exa.ai/search"
-        headers = {
-            "Content-Type": "application/json",
-            "x-api-key": credentials.api_key.get_secret_value(),
-        }
-
-        payload = {
+        sdk_kwargs = {
             "query": input_data.query,
-            "useAutoprompt": input_data.use_auto_prompt,
-            "numResults": input_data.number_of_results,
-            "contents": input_data.contents.model_dump(),
+            "num_results": input_data.number_of_results,
         }

-        date_field_mapping = {
-            "start_crawl_date": "startCrawlDate",
-            "end_crawl_date": "endCrawlDate",
-            "start_published_date": "startPublishedDate",
-            "end_published_date": "endPublishedDate",
-        }
+        if input_data.type:
+            sdk_kwargs["type"] = input_data.type.value

-        # Add dates if they exist
-        for input_field, api_field in date_field_mapping.items():
-            value = getattr(input_data, input_field, None)
-            if value:
-                payload[api_field] = value.strftime("%Y-%m-%dT%H:%M:%S.000Z")
+        if input_data.category:
+            sdk_kwargs["category"] = input_data.category.value

-        optional_field_mapping = {
-            "type": "type",
-            "category": "category",
-            "include_domains": "includeDomains",
-            "exclude_domains": "excludeDomains",
-            "include_text": "includeText",
-            "exclude_text": "excludeText",
-        }
+        if input_data.user_location:
+            sdk_kwargs["user_location"] = input_data.user_location

-        # Add other fields
-        for input_field, api_field in optional_field_mapping.items():
-            value = getattr(input_data, input_field)
-            if value:  # Only add non-empty values
-                payload[api_field] = value
+        # Handle domains
+        if input_data.include_domains:
+            sdk_kwargs["include_domains"] = input_data.include_domains
+        if input_data.exclude_domains:
+            sdk_kwargs["exclude_domains"] = input_data.exclude_domains

-        try:
-            response = await Requests().post(url, headers=headers, json=payload)
-            data = response.json()
-            # Extract just the results array from the response
-            yield "results", data.get("results", [])
-        except Exception as e:
-            yield "error", str(e)
+        # Handle dates
+        if input_data.start_crawl_date:
+            sdk_kwargs["start_crawl_date"] = input_data.start_crawl_date.isoformat()
+        if input_data.end_crawl_date:
+            sdk_kwargs["end_crawl_date"] = input_data.end_crawl_date.isoformat()
+        if input_data.start_published_date:
+            sdk_kwargs["start_published_date"] = (
+                input_data.start_published_date.isoformat()
+            )
+        if input_data.end_published_date:
+            sdk_kwargs["end_published_date"] = input_data.end_published_date.isoformat()
+
+        # Handle text filters
+        if input_data.include_text:
+            sdk_kwargs["include_text"] = input_data.include_text
+        if input_data.exclude_text:
+            sdk_kwargs["exclude_text"] = input_data.exclude_text
+
+        if input_data.moderation:
+            sdk_kwargs["moderation"] = input_data.moderation
+
+        # check if we need to use search_and_contents
+        content_settings = process_contents_settings(input_data.contents)
+
+        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
+
+        if content_settings:
+            sdk_kwargs["text"] = content_settings.get("text", False)
+            if "highlights" in content_settings:
+                sdk_kwargs["highlights"] = content_settings["highlights"]
+            if "summary" in content_settings:
+                sdk_kwargs["summary"] = content_settings["summary"]
+            response = await aexa.search_and_contents(**sdk_kwargs)
+        else:
+            response = await aexa.search(**sdk_kwargs)
+
+        converted_results = [
+            ExaSearchResults.from_sdk(sdk_result)
+            for sdk_result in response.results or []
+        ]
+
+        yield "results", converted_results
+        for result in converted_results:
+            yield "result", result
+
+        if response.context:
+            yield "context", response.context
+
+        if response.resolved_search_type:
+            yield "resolved_search_type", response.resolved_search_type
+
+        if response.cost_dollars:
+            yield "cost_dollars", response.cost_dollars

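The run flow above hinges on `process_contents_settings`: when any page contents were requested, the block must call `search_and_contents` instead of plain `search`. A rough sketch of that dispatch, under the assumption that the helper returns a dict of only the requested content options (its exact shape lives in `helpers.py`, which this excerpt does not show):

```python
# Hedged sketch of the search vs. search_and_contents dispatch; the shape of
# the content-settings dict is an assumption based on the diff above.
from exa_py import AsyncExa


async def run_search(api_key: str, query: str, content_settings: dict | None):
    aexa = AsyncExa(api_key=api_key)
    kwargs = {"query": query, "num_results": 10}
    if content_settings:
        # Any contents request upgrades the call so text/highlights/summary
        # come back in the same response.
        kwargs["text"] = content_settings.get("text", False)
        for key in ("highlights", "summary"):
            if key in content_settings:
                kwargs[key] = content_settings[key]
        return await aexa.search_and_contents(**kwargs)
    # No contents requested: a plain search suffices.
    return await aexa.search(**kwargs)
```
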
@@ -1,5 +1,7 @@
 from datetime import datetime
-from typing import Any
+from typing import Optional
+
+from exa_py import AsyncExa

 from backend.sdk import (
     APIKeyCredentials,
@@ -9,12 +11,16 @@ from backend.sdk import (
     BlockSchemaInput,
     BlockSchemaOutput,
     CredentialsMetaInput,
     Requests,
     SchemaField,
 )

 from ._config import exa
-from .helpers import ContentSettings
+from .helpers import (
+    ContentSettings,
+    CostDollars,
+    ExaSearchResults,
+    process_contents_settings,
+)


 class ExaFindSimilarBlock(Block):
@@ -29,7 +35,7 @@ class ExaFindSimilarBlock(Block):
             description="Number of results to return", default=10, advanced=True
         )
         include_domains: list[str] = SchemaField(
-            description="Domains to include in search",
+            description="List of domains to include in the search. If specified, results will only come from these domains.",
             default_factory=list,
             advanced=True,
         )
@@ -38,17 +44,17 @@ class ExaFindSimilarBlock(Block):
             default_factory=list,
             advanced=True,
         )
-        start_crawl_date: datetime = SchemaField(
-            description="Start date for crawled content"
+        start_crawl_date: Optional[datetime] = SchemaField(
+            description="Start date for crawled content", advanced=True, default=None
         )
-        end_crawl_date: datetime = SchemaField(
-            description="End date for crawled content"
+        end_crawl_date: Optional[datetime] = SchemaField(
+            description="End date for crawled content", advanced=True, default=None
         )
-        start_published_date: datetime = SchemaField(
-            description="Start date for published content"
+        start_published_date: Optional[datetime] = SchemaField(
+            description="Start date for published content", advanced=True, default=None
        )
-        end_published_date: datetime = SchemaField(
-            description="End date for published content"
+        end_published_date: Optional[datetime] = SchemaField(
+            description="End date for published content", advanced=True, default=None
         )
         include_text: list[str] = SchemaField(
             description="Text patterns to include (max 1 string, up to 5 words)",
@@ -65,15 +71,27 @@ class ExaFindSimilarBlock(Block):
             default=ContentSettings(),
             advanced=True,
         )
+        moderation: bool = SchemaField(
+            description="Enable content moderation to filter unsafe content from search results",
+            default=False,
+            advanced=True,
+        )

     class Output(BlockSchemaOutput):
-        results: list[Any] = SchemaField(
-            description="List of similar documents with title, URL, published date, author, and score",
-            default_factory=list,
+        results: list[ExaSearchResults] = SchemaField(
+            description="List of similar documents with metadata and content"
         )
-        error: str = SchemaField(
-            description="Error message if the request failed", default=""
+        result: ExaSearchResults = SchemaField(
+            description="Single similar document result"
         )
+        context: str = SchemaField(
+            description="A formatted string of the results ready for LLMs."
+        )
+        request_id: str = SchemaField(description="Unique identifier for the request")
+        cost_dollars: Optional[CostDollars] = SchemaField(
+            description="Cost breakdown for the request"
+        )
+        error: str = SchemaField(description="Error message if the request failed")

     def __init__(self):
         super().__init__(
@@ -87,47 +105,65 @@ class ExaFindSimilarBlock(Block):
     async def run(
         self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
-        url = "https://api.exa.ai/findSimilar"
-        headers = {
-            "Content-Type": "application/json",
-            "x-api-key": credentials.api_key.get_secret_value(),
-        }
-
-        payload = {
+        sdk_kwargs = {
             "url": input_data.url,
-            "numResults": input_data.number_of_results,
-            "contents": input_data.contents.model_dump(),
+            "num_results": input_data.number_of_results,
         }

-        optional_field_mapping = {
-            "include_domains": "includeDomains",
-            "exclude_domains": "excludeDomains",
-            "include_text": "includeText",
-            "exclude_text": "excludeText",
-        }
+        # Handle domains
+        if input_data.include_domains:
+            sdk_kwargs["include_domains"] = input_data.include_domains
+        if input_data.exclude_domains:
+            sdk_kwargs["exclude_domains"] = input_data.exclude_domains

-        # Add optional fields if they have values
-        for input_field, api_field in optional_field_mapping.items():
-            value = getattr(input_data, input_field)
-            if value:  # Only add non-empty values
-                payload[api_field] = value
+        # Handle dates
+        if input_data.start_crawl_date:
+            sdk_kwargs["start_crawl_date"] = input_data.start_crawl_date.isoformat()
+        if input_data.end_crawl_date:
+            sdk_kwargs["end_crawl_date"] = input_data.end_crawl_date.isoformat()
+        if input_data.start_published_date:
+            sdk_kwargs["start_published_date"] = (
+                input_data.start_published_date.isoformat()
+            )
+        if input_data.end_published_date:
+            sdk_kwargs["end_published_date"] = input_data.end_published_date.isoformat()

-        date_field_mapping = {
-            "start_crawl_date": "startCrawlDate",
-            "end_crawl_date": "endCrawlDate",
-            "start_published_date": "startPublishedDate",
-            "end_published_date": "endPublishedDate",
-        }
+        # Handle text filters
+        if input_data.include_text:
+            sdk_kwargs["include_text"] = input_data.include_text
+        if input_data.exclude_text:
+            sdk_kwargs["exclude_text"] = input_data.exclude_text

-        # Add dates if they exist
-        for input_field, api_field in date_field_mapping.items():
-            value = getattr(input_data, input_field, None)
-            if value:
-                payload[api_field] = value.strftime("%Y-%m-%dT%H:%M:%S.000Z")
+        if input_data.moderation:
+            sdk_kwargs["moderation"] = input_data.moderation

-        try:
-            response = await Requests().post(url, headers=headers, json=payload)
-            data = response.json()
-            yield "results", data.get("results", [])
-        except Exception as e:
-            yield "error", str(e)
+        # check if we need to use find_similar_and_contents
+        content_settings = process_contents_settings(input_data.contents)
+
+        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
+
+        if content_settings:
+            # Use find_similar_and_contents when contents are requested
+            sdk_kwargs["text"] = content_settings.get("text", False)
+            if "highlights" in content_settings:
+                sdk_kwargs["highlights"] = content_settings["highlights"]
+            if "summary" in content_settings:
+                sdk_kwargs["summary"] = content_settings["summary"]
+            response = await aexa.find_similar_and_contents(**sdk_kwargs)
+        else:
+            response = await aexa.find_similar(**sdk_kwargs)
+
+        converted_results = [
+            ExaSearchResults.from_sdk(sdk_result)
+            for sdk_result in response.results or []
+        ]
+
+        yield "results", converted_results
+        for result in converted_results:
+            yield "result", result
+
+        if response.context:
+            yield "context", response.context
+
+        if response.cost_dollars:
+            yield "cost_dollars", response.cost_dollars

@@ -132,45 +132,33 @@ class ExaWebsetWebhookBlock(Block):

     async def run(self, input_data: Input, **kwargs) -> BlockOutput:
         """Process incoming Exa webhook payload."""
-        try:
-            payload = input_data.payload
+        payload = input_data.payload

-            # Extract event details
-            event_type = payload.get("eventType", "unknown")
-            event_id = payload.get("eventId", "")
+        # Extract event details
+        event_type = payload.get("eventType", "unknown")
+        event_id = payload.get("eventId", "")

-            # Get webset ID from payload or input
-            webset_id = payload.get("websetId", input_data.webset_id)
+        # Get webset ID from payload or input
+        webset_id = payload.get("websetId", input_data.webset_id)

-            # Check if we should process this event based on filter
-            should_process = self._should_process_event(
-                event_type, input_data.event_filter
-            )
+        # Check if we should process this event based on filter
+        should_process = self._should_process_event(event_type, input_data.event_filter)

-            if not should_process:
-                # Skip events that don't match our filter
-                return
+        if not should_process:
+            # Skip events that don't match our filter
+            return

-            # Extract event data
-            event_data = payload.get("data", {})
-            timestamp = payload.get("occurredAt", payload.get("createdAt", ""))
-            metadata = payload.get("metadata", {})
+        # Extract event data
+        event_data = payload.get("data", {})
+        timestamp = payload.get("occurredAt", payload.get("createdAt", ""))
+        metadata = payload.get("metadata", {})

-            yield "event_type", event_type
-            yield "event_id", event_id
-            yield "webset_id", webset_id
-            yield "data", event_data
-            yield "timestamp", timestamp
-            yield "metadata", metadata
-
-        except Exception as e:
-            # Handle errors gracefully
-            yield "event_type", "error"
-            yield "event_id", ""
-            yield "webset_id", input_data.webset_id
-            yield "data", {"error": str(e)}
-            yield "timestamp", ""
-            yield "metadata", {}
+        yield "event_type", event_type
+        yield "event_id", event_id
+        yield "webset_id", webset_id
+        yield "data", event_data
+        yield "timestamp", timestamp
+        yield "metadata", metadata

     def _should_process_event(
         self, event_type: str, event_filter: WebsetEventFilter

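The diff cuts off at `_should_process_event`, whose body is unchanged by this PR. For readers following along, a purely hypothetical sketch of what such a filter check typically looks like; the `WebsetEventFilter` members below are assumptions, not the real enum defined elsewhere in this file:

```python
# Hypothetical filter check; the real WebsetEventFilter enum and event names
# live in the unchanged part of this file and may differ.
from enum import Enum


class WebsetEventFilter(str, Enum):
    ALL = "all"
    ITEM_EVENTS = "item_events"
    SEARCH_EVENTS = "search_events"


def should_process_event(event_type: str, event_filter: WebsetEventFilter) -> bool:
    if event_filter == WebsetEventFilter.ALL:
        return True
    if event_filter == WebsetEventFilter.ITEM_EVENTS:
        return event_type.startswith("webset.item")
    if event_filter == WebsetEventFilter.SEARCH_EVENTS:
        return event_type.startswith("webset.search")
    return False
```
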
File diff suppressed because it is too large
@@ -0,0 +1,554 @@
"""
Exa Websets Enrichment Management Blocks

This module provides blocks for creating and managing enrichments on webset items,
allowing extraction of additional structured data from existing items.
"""

from enum import Enum
from typing import Any, Dict, List, Optional

from exa_py import AsyncExa
from exa_py.websets.types import WebsetEnrichment as SdkWebsetEnrichment
from pydantic import BaseModel

from backend.sdk import (
    APIKeyCredentials,
    Block,
    BlockCategory,
    BlockOutput,
    BlockSchemaInput,
    BlockSchemaOutput,
    CredentialsMetaInput,
    Requests,
    SchemaField,
)

from ._config import exa


# Mirrored model for stability
class WebsetEnrichmentModel(BaseModel):
    """Stable output model mirroring SDK WebsetEnrichment."""

    id: str
    webset_id: str
    status: str
    title: Optional[str]
    description: str
    format: str
    options: List[str]
    instructions: Optional[str]
    metadata: Dict[str, Any]
    created_at: str
    updated_at: str

    @classmethod
    def from_sdk(cls, enrichment: SdkWebsetEnrichment) -> "WebsetEnrichmentModel":
        """Convert SDK WebsetEnrichment to our stable model."""
        # Extract options
        options_list = []
        if enrichment.options:
            for option in enrichment.options:
                option_dict = option.model_dump(by_alias=True)
                options_list.append(option_dict.get("label", ""))

        return cls(
            id=enrichment.id,
            webset_id=enrichment.webset_id,
            status=(
                enrichment.status.value
                if hasattr(enrichment.status, "value")
                else str(enrichment.status)
            ),
            title=enrichment.title,
            description=enrichment.description,
            format=(
                enrichment.format.value
                if enrichment.format and hasattr(enrichment.format, "value")
                else "text"
            ),
            options=options_list,
            instructions=enrichment.instructions,
            metadata=enrichment.metadata if enrichment.metadata else {},
            created_at=(
                enrichment.created_at.isoformat() if enrichment.created_at else ""
            ),
            updated_at=(
                enrichment.updated_at.isoformat() if enrichment.updated_at else ""
            ),
        )

class EnrichmentFormat(str, Enum):
    """Format types for enrichment responses."""

    TEXT = "text"  # Free text response
    DATE = "date"  # Date/datetime format
    NUMBER = "number"  # Numeric value
    OPTIONS = "options"  # Multiple choice from provided options
    EMAIL = "email"  # Email address format
    PHONE = "phone"  # Phone number format


class ExaCreateEnrichmentBlock(Block):
    """Create a new enrichment to extract additional data from webset items."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        description: str = SchemaField(
            description="What data to extract from each item",
            placeholder="Extract the company's main product or service offering",
        )
        title: Optional[str] = SchemaField(
            default=None,
            description="Short title for this enrichment (auto-generated if not provided)",
            placeholder="Main Product",
        )
        format: EnrichmentFormat = SchemaField(
            default=EnrichmentFormat.TEXT,
            description="Expected format of the extracted data",
        )
        options: list[str] = SchemaField(
            default_factory=list,
            description="Available options when format is 'options'",
            placeholder='["B2B", "B2C", "Both", "Unknown"]',
            advanced=True,
        )
        apply_to_existing: bool = SchemaField(
            default=True,
            description="Apply this enrichment to existing items in the webset",
        )
        metadata: Optional[dict] = SchemaField(
            default=None,
            description="Metadata to attach to the enrichment",
            advanced=True,
        )
        wait_for_completion: bool = SchemaField(
            default=False,
            description="Wait for the enrichment to complete on existing items",
        )
        polling_timeout: int = SchemaField(
            default=300,
            description="Maximum time to wait for completion in seconds",
            advanced=True,
            ge=1,
            le=600,
        )

    class Output(BlockSchemaOutput):
        enrichment_id: str = SchemaField(
            description="The unique identifier for the created enrichment"
        )
        webset_id: str = SchemaField(
            description="The webset this enrichment belongs to"
        )
        status: str = SchemaField(description="Current status of the enrichment")
        title: str = SchemaField(description="Title of the enrichment")
        description: str = SchemaField(
            description="Description of what data is extracted"
        )
        format: str = SchemaField(description="Format of the extracted data")
        instructions: str = SchemaField(
            description="Generated instructions for the enrichment"
        )
        items_enriched: Optional[int] = SchemaField(
            description="Number of items enriched (if wait_for_completion was True)"
        )
        completion_time: Optional[float] = SchemaField(
            description="Time taken to complete in seconds (if wait_for_completion was True)"
        )

    def __init__(self):
        super().__init__(
            id="71146ae8-0cb1-4a15-8cde-eae30de71cb6",
            description="Create enrichments to extract additional structured data from webset items",
            categories={BlockCategory.AI, BlockCategory.SEARCH},
            input_schema=ExaCreateEnrichmentBlock.Input,
            output_schema=ExaCreateEnrichmentBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        import time

        # Build the payload
        payload: dict[str, Any] = {
            "description": input_data.description,
            "format": input_data.format.value,
        }

        # Add title if provided
        if input_data.title:
            payload["title"] = input_data.title

        # Add options for 'options' format
        if input_data.format == EnrichmentFormat.OPTIONS and input_data.options:
            payload["options"] = [{"label": opt} for opt in input_data.options]

        # Add metadata if provided
        if input_data.metadata:
            payload["metadata"] = input_data.metadata

        start_time = time.time()

        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        sdk_enrichment = aexa.websets.enrichments.create(
            webset_id=input_data.webset_id, params=payload
        )

        enrichment_id = sdk_enrichment.id
        status = (
            sdk_enrichment.status.value
            if hasattr(sdk_enrichment.status, "value")
            else str(sdk_enrichment.status)
        )

        # If wait_for_completion is True and apply_to_existing is True, poll for completion
        if input_data.wait_for_completion and input_data.apply_to_existing:
            import asyncio

            poll_interval = 5
            max_interval = 30
            poll_start = time.time()
            items_enriched = 0

            while time.time() - poll_start < input_data.polling_timeout:
                current_enrich = aexa.websets.enrichments.get(
                    webset_id=input_data.webset_id, id=enrichment_id
                )
                current_status = (
                    current_enrich.status.value
                    if hasattr(current_enrich.status, "value")
                    else str(current_enrich.status)
                )

                if current_status in ["completed", "failed", "cancelled"]:
                    # Estimate items from webset searches
                    webset = aexa.websets.get(id=input_data.webset_id)
                    if webset.searches:
                        for search in webset.searches:
                            if search.progress:
                                items_enriched += search.progress.found
                    completion_time = time.time() - start_time

                    yield "enrichment_id", enrichment_id
                    yield "webset_id", input_data.webset_id
                    yield "status", current_status
                    yield "title", sdk_enrichment.title
                    yield "description", input_data.description
                    yield "format", input_data.format.value
                    yield "instructions", sdk_enrichment.instructions
                    yield "items_enriched", items_enriched
                    yield "completion_time", completion_time
                    return

                await asyncio.sleep(poll_interval)
                poll_interval = min(poll_interval * 1.5, max_interval)

            # Timeout
            completion_time = time.time() - start_time
            yield "enrichment_id", enrichment_id
            yield "webset_id", input_data.webset_id
            yield "status", status
            yield "title", sdk_enrichment.title
            yield "description", input_data.description
            yield "format", input_data.format.value
            yield "instructions", sdk_enrichment.instructions
            yield "items_enriched", 0
            yield "completion_time", completion_time
        else:
            yield "enrichment_id", enrichment_id
            yield "webset_id", input_data.webset_id
            yield "status", status
            yield "title", sdk_enrichment.title
            yield "description", input_data.description
            yield "format", input_data.format.value
            yield "instructions", sdk_enrichment.instructions


class ExaGetEnrichmentBlock(Block):
    """Get the status and details of a webset enrichment."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        enrichment_id: str = SchemaField(
            description="The ID of the enrichment to retrieve",
            placeholder="enrichment-id",
        )

    class Output(BlockSchemaOutput):
        enrichment_id: str = SchemaField(
            description="The unique identifier for the enrichment"
        )
        status: str = SchemaField(description="Current status of the enrichment")
        title: str = SchemaField(description="Title of the enrichment")
        description: str = SchemaField(
            description="Description of what data is extracted"
        )
        format: str = SchemaField(description="Format of the extracted data")
        options: list[str] = SchemaField(
            description="Available options (for 'options' format)"
        )
        instructions: str = SchemaField(
            description="Generated instructions for the enrichment"
        )
        created_at: str = SchemaField(description="When the enrichment was created")
        updated_at: str = SchemaField(
            description="When the enrichment was last updated"
        )
        metadata: dict = SchemaField(description="Metadata attached to the enrichment")

    def __init__(self):
        super().__init__(
            id="b8c9d0e1-f2a3-4567-89ab-cdef01234567",
            description="Get the status and details of a webset enrichment",
            categories={BlockCategory.SEARCH},
            input_schema=ExaGetEnrichmentBlock.Input,
            output_schema=ExaGetEnrichmentBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        sdk_enrichment = aexa.websets.enrichments.get(
            webset_id=input_data.webset_id, id=input_data.enrichment_id
        )

        enrichment = WebsetEnrichmentModel.from_sdk(sdk_enrichment)

        yield "enrichment_id", enrichment.id
        yield "status", enrichment.status
        yield "title", enrichment.title
        yield "description", enrichment.description
        yield "format", enrichment.format
        yield "options", enrichment.options
        yield "instructions", enrichment.instructions
        yield "created_at", enrichment.created_at
        yield "updated_at", enrichment.updated_at
        yield "metadata", enrichment.metadata


class ExaUpdateEnrichmentBlock(Block):
    """Update an existing enrichment configuration."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        enrichment_id: str = SchemaField(
            description="The ID of the enrichment to update",
            placeholder="enrichment-id",
        )
        description: Optional[str] = SchemaField(
            default=None,
            description="New description for what data to extract",
        )
        format: Optional[EnrichmentFormat] = SchemaField(
            default=None,
            description="New format for the extracted data",
        )
        options: Optional[list[str]] = SchemaField(
            default=None,
            description="New options when format is 'options'",
        )
        metadata: Optional[dict] = SchemaField(
            default=None,
            description="New metadata to attach to the enrichment",
        )

    class Output(BlockSchemaOutput):
        enrichment_id: str = SchemaField(
            description="The unique identifier for the enrichment"
        )
        status: str = SchemaField(description="Current status of the enrichment")
        title: str = SchemaField(description="Title of the enrichment")
        description: str = SchemaField(description="Updated description")
        format: str = SchemaField(description="Updated format")
        success: str = SchemaField(description="Whether the update was successful")

    def __init__(self):
        super().__init__(
            id="c8d5c5fb-9684-4a29-bd2a-5b38d71776c9",
            description="Update an existing enrichment configuration",
            categories={BlockCategory.SEARCH},
            input_schema=ExaUpdateEnrichmentBlock.Input,
            output_schema=ExaUpdateEnrichmentBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        url = f"https://api.exa.ai/websets/v0/websets/{input_data.webset_id}/enrichments/{input_data.enrichment_id}"
        headers = {
            "Content-Type": "application/json",
            "x-api-key": credentials.api_key.get_secret_value(),
        }

        # Build the update payload
        payload = {}

        if input_data.description is not None:
            payload["description"] = input_data.description

        if input_data.format is not None:
            payload["format"] = input_data.format.value

        if input_data.options is not None:
            payload["options"] = [{"label": opt} for opt in input_data.options]

        if input_data.metadata is not None:
            payload["metadata"] = input_data.metadata

        try:
            response = await Requests().patch(url, headers=headers, json=payload)
            data = response.json()

            yield "enrichment_id", data.get("id", "")
            yield "status", data.get("status", "")
            yield "title", data.get("title", "")
            yield "description", data.get("description", "")
            yield "format", data.get("format", "")
            yield "success", "true"

        except ValueError as e:
            # Re-raise user input validation errors
            raise ValueError(f"Failed to update enrichment: {e}") from e
        # Let all other exceptions propagate naturally


class ExaDeleteEnrichmentBlock(Block):
    """Delete an enrichment from a webset."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        enrichment_id: str = SchemaField(
            description="The ID of the enrichment to delete",
            placeholder="enrichment-id",
        )

    class Output(BlockSchemaOutput):
        enrichment_id: str = SchemaField(description="The ID of the deleted enrichment")
        success: str = SchemaField(description="Whether the deletion was successful")

    def __init__(self):
        super().__init__(
            id="b250de56-2ca6-4237-a7b8-b5684892189f",
            description="Delete an enrichment from a webset",
            categories={BlockCategory.SEARCH},
            input_schema=ExaDeleteEnrichmentBlock.Input,
            output_schema=ExaDeleteEnrichmentBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        deleted_enrichment = aexa.websets.enrichments.delete(
            webset_id=input_data.webset_id, id=input_data.enrichment_id
        )

        yield "enrichment_id", deleted_enrichment.id
        yield "success", "true"


class ExaCancelEnrichmentBlock(Block):
    """Cancel a running enrichment operation."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        enrichment_id: str = SchemaField(
            description="The ID of the enrichment to cancel",
            placeholder="enrichment-id",
        )

    class Output(BlockSchemaOutput):
        enrichment_id: str = SchemaField(
            description="The ID of the canceled enrichment"
        )
        status: str = SchemaField(description="Status after cancellation")
        items_enriched_before_cancel: int = SchemaField(
            description="Approximate number of items enriched before cancellation"
        )
        success: str = SchemaField(
            description="Whether the cancellation was successful"
        )

    def __init__(self):
        super().__init__(
            id="7e1f8f0f-b6ab-43b3-bd1d-0c534a649295",
            description="Cancel a running enrichment operation",
            categories={BlockCategory.SEARCH},
            input_schema=ExaCancelEnrichmentBlock.Input,
            output_schema=ExaCancelEnrichmentBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        canceled_enrichment = aexa.websets.enrichments.cancel(
            webset_id=input_data.webset_id, id=input_data.enrichment_id
        )

        # Try to estimate how many items were enriched before cancellation
        items_enriched = 0
        items_response = aexa.websets.items.list(
            webset_id=input_data.webset_id, limit=100
        )

        for sdk_item in items_response.data:
            # Check if this enrichment is present
            for enrich_result in sdk_item.enrichments:
                if enrich_result.enrichment_id == input_data.enrichment_id:
                    items_enriched += 1
                    break

        status = (
            canceled_enrichment.status.value
            if hasattr(canceled_enrichment.status, "value")
            else str(canceled_enrichment.status)
        )

        yield "enrichment_id", canceled_enrichment.id
        yield "status", status
        yield "items_enriched_before_cancel", items_enriched
        yield "success", "true"
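
To make the enrichment payload concrete: the create block above submits a description, a format, and, for the `options` format, a list of `{"label": ...}` objects. A short sketch of driving the same SDK call directly, mirroring the block's own call style; the webset ID and option labels are placeholders:

```python
# Sketch of creating an "options" enrichment with the same payload shape
# ExaCreateEnrichmentBlock builds; the IDs and labels are placeholders.
from exa_py import AsyncExa

aexa = AsyncExa(api_key="YOUR_EXA_API_KEY")

payload = {
    "description": "Classify each company's primary customer base",
    "format": "options",
    # Each option becomes a {"label": ...} object, as in the block's run() above.
    "options": [{"label": opt} for opt in ["B2B", "B2C", "Both", "Unknown"]],
}

enrichment = aexa.websets.enrichments.create(
    webset_id="webset-id-or-external-id", params=payload
)
print(enrichment.id, enrichment.status)
```
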
@@ -0,0 +1,676 @@
"""
Exa Websets Import/Export Management Blocks

This module provides blocks for importing data into websets from CSV files
and exporting webset data in various formats.
"""

import csv
import json
from enum import Enum
from io import StringIO
from typing import Optional, Union

from exa_py import AsyncExa
from exa_py.websets.types import CreateImportResponse
from exa_py.websets.types import Import as SdkImport
from pydantic import BaseModel

from backend.sdk import (
    APIKeyCredentials,
    Block,
    BlockCategory,
    BlockOutput,
    BlockSchemaInput,
    BlockSchemaOutput,
    CredentialsMetaInput,
    SchemaField,
)

from ._config import exa
from ._test import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT


# Mirrored model for stability - don't use SDK types directly in block outputs
class ImportModel(BaseModel):
    """Stable output model mirroring SDK Import."""

    id: str
    status: str
    title: str
    format: str
    entity_type: str
    count: int
    upload_url: Optional[str]  # Only in CreateImportResponse
    upload_valid_until: Optional[str]  # Only in CreateImportResponse
    failed_reason: str
    failed_message: str
    metadata: dict
    created_at: str
    updated_at: str

    @classmethod
    def from_sdk(
        cls, import_obj: Union[SdkImport, CreateImportResponse]
    ) -> "ImportModel":
        """Convert SDK Import or CreateImportResponse to our stable model."""
        # Extract entity type from union (may be None)
        entity_type = "unknown"
        if import_obj.entity:
            entity_dict = import_obj.entity.model_dump(by_alias=True, exclude_none=True)
            entity_type = entity_dict.get("type", "unknown")

        # Handle status enum
        status_str = (
            import_obj.status.value
            if hasattr(import_obj.status, "value")
            else str(import_obj.status)
        )

        # Handle format enum
        format_str = (
            import_obj.format.value
            if hasattr(import_obj.format, "value")
            else str(import_obj.format)
        )

        # Handle failed_reason enum (may be None or enum)
        failed_reason_str = ""
        if import_obj.failed_reason:
            failed_reason_str = (
                import_obj.failed_reason.value
                if hasattr(import_obj.failed_reason, "value")
                else str(import_obj.failed_reason)
            )

        return cls(
            id=import_obj.id,
            status=status_str,
            title=import_obj.title or "",
            format=format_str,
            entity_type=entity_type,
            count=int(import_obj.count or 0),
            upload_url=getattr(
                import_obj, "upload_url", None
            ),  # Only in CreateImportResponse
            upload_valid_until=getattr(
                import_obj, "upload_valid_until", None
            ),  # Only in CreateImportResponse
            failed_reason=failed_reason_str,
            failed_message=import_obj.failed_message or "",
            metadata=import_obj.metadata or {},
            created_at=(
                import_obj.created_at.isoformat() if import_obj.created_at else ""
            ),
            updated_at=(
                import_obj.updated_at.isoformat() if import_obj.updated_at else ""
            ),
        )

class ImportFormat(str, Enum):
|
||||
"""Supported import formats."""
|
||||
|
||||
CSV = "csv"
|
||||
# JSON = "json" # Future support
|
||||
|
||||
|
||||
class ImportEntityType(str, Enum):
|
||||
"""Entity types for imports."""
|
||||
|
||||
COMPANY = "company"
|
||||
PERSON = "person"
|
||||
ARTICLE = "article"
|
||||
RESEARCH_PAPER = "research_paper"
|
||||
CUSTOM = "custom"
|
||||
|
||||
|
||||
class ExportFormat(str, Enum):
|
||||
"""Supported export formats."""
|
||||
|
||||
JSON = "json"
|
||||
CSV = "csv"
|
||||
JSON_LINES = "jsonl"
|
||||
|
||||
|
||||
class ExaCreateImportBlock(Block):
    """Create an import to load external data that can be used with websets."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        title: str = SchemaField(
            description="Title for this import",
            placeholder="Customer List Import",
        )
        csv_data: str = SchemaField(
            description="CSV data to import (as a string)",
            placeholder="name,url\nAcme Corp,https://acme.com\nExample Inc,https://example.com",
        )
        entity_type: ImportEntityType = SchemaField(
            default=ImportEntityType.COMPANY,
            description="Type of entities being imported",
        )
        entity_description: Optional[str] = SchemaField(
            default=None,
            description="Description for custom entity type",
            advanced=True,
        )
        identifier_column: int = SchemaField(
            default=0,
            description="Column index containing the identifier (0-based)",
            ge=0,
        )
        url_column: Optional[int] = SchemaField(
            default=None,
            description="Column index containing URLs (optional)",
            ge=0,
            advanced=True,
        )
        metadata: Optional[dict] = SchemaField(
            default=None,
            description="Metadata to attach to the import",
            advanced=True,
        )

    class Output(BlockSchemaOutput):
        import_id: str = SchemaField(
            description="The unique identifier for the created import"
        )
        status: str = SchemaField(description="Current status of the import")
        title: str = SchemaField(description="Title of the import")
        count: int = SchemaField(description="Number of items in the import")
        entity_type: str = SchemaField(description="Type of entities imported")
        upload_url: Optional[str] = SchemaField(
            description="Upload URL for CSV data (only if csv_data not provided in request)"
        )
        upload_valid_until: Optional[str] = SchemaField(
            description="Expiration time for upload URL (only if upload_url is provided)"
        )
        created_at: str = SchemaField(description="When the import was created")

    def __init__(self):
        super().__init__(
            id="020a35d8-8a53-4e60-8b60-1de5cbab1df3",
            description="Import CSV data to use with websets for targeted searches",
            categories={BlockCategory.DATA},
            input_schema=ExaCreateImportBlock.Input,
            output_schema=ExaCreateImportBlock.Output,
            test_input={
                "credentials": TEST_CREDENTIALS_INPUT,
                "title": "Test Import",
                "csv_data": "name,url\nAcme,https://acme.com",
                "entity_type": ImportEntityType.COMPANY,
                "identifier_column": 0,
            },
            test_output=[
                ("import_id", "import-123"),
                ("status", "pending"),
                ("title", "Test Import"),
                ("count", 1),
                ("entity_type", "company"),
                ("upload_url", None),
                ("upload_valid_until", None),
                ("created_at", "2024-01-01T00:00:00"),
            ],
            test_credentials=TEST_CREDENTIALS,
            test_mock=self._create_test_mock(),
        )

    @staticmethod
    def _create_test_mock():
        """Create test mocks for the AsyncExa SDK."""
        from datetime import datetime
        from unittest.mock import MagicMock

        # Create mock SDK import object
        mock_import = MagicMock()
        mock_import.id = "import-123"
        mock_import.status = MagicMock(value="pending")
        mock_import.title = "Test Import"
        mock_import.format = MagicMock(value="csv")
        mock_import.count = 1
        mock_import.upload_url = None
        mock_import.upload_valid_until = None
        mock_import.failed_reason = None
        mock_import.failed_message = ""
        mock_import.metadata = {}
        mock_import.created_at = datetime.fromisoformat("2024-01-01T00:00:00")
        mock_import.updated_at = datetime.fromisoformat("2024-01-01T00:00:00")

        # Mock entity
        mock_entity = MagicMock()
        mock_entity.model_dump = MagicMock(return_value={"type": "company"})
        mock_import.entity = mock_entity

        return {
            "_get_client": lambda *args, **kwargs: MagicMock(
                websets=MagicMock(
                    imports=MagicMock(create=lambda *args, **kwargs: mock_import)
                )
            )
        }

    def _get_client(self, api_key: str) -> AsyncExa:
        """Get Exa client (separated for testing)."""
        return AsyncExa(api_key=api_key)

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        aexa = self._get_client(credentials.api_key.get_secret_value())

        csv_reader = csv.reader(StringIO(input_data.csv_data))
        rows = list(csv_reader)
        count = len(rows) - 1 if len(rows) > 1 else 0

        size = len(input_data.csv_data.encode("utf-8"))

        payload = {
            "title": input_data.title,
            "format": ImportFormat.CSV.value,
            "count": count,
            "size": size,
            "csv": {
                "identifier": input_data.identifier_column,
            },
        }

        # Add URL column if specified
        if input_data.url_column is not None:
            payload["csv"]["url"] = input_data.url_column

        # Add entity configuration
        entity = {"type": input_data.entity_type.value}
        if (
            input_data.entity_type == ImportEntityType.CUSTOM
            and input_data.entity_description
        ):
            entity["description"] = input_data.entity_description
        payload["entity"] = entity

        # Add metadata if provided
        if input_data.metadata:
            payload["metadata"] = input_data.metadata

        sdk_import = aexa.websets.imports.create(
            params=payload, csv_data=input_data.csv_data
        )

        import_obj = ImportModel.from_sdk(sdk_import)

        yield "import_id", import_obj.id
        yield "status", import_obj.status
        yield "title", import_obj.title
        yield "count", import_obj.count
        yield "entity_type", import_obj.entity_type
        yield "upload_url", import_obj.upload_url
        yield "upload_valid_until", import_obj.upload_valid_until
        yield "created_at", import_obj.created_at

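
# --- Editor's illustrative sketch (not part of this diff) ---
# With the test_input above (csv_data "name,url\nAcme,https://acme.com",
# default entity type, identifier column 0), run() would assemble roughly
# this payload: count excludes the header row and size is the UTF-8 byte
# length of the CSV string.
#
#     {
#         "title": "Test Import",
#         "format": "csv",
#         "count": 1,
#         "size": 30,
#         "csv": {"identifier": 0},
#         "entity": {"type": "company"},
#     }
# --- End sketch ---
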
class ExaGetImportBlock(Block):
    """Get the status and details of an import."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        import_id: str = SchemaField(
            description="The ID of the import to retrieve",
            placeholder="import-id",
        )

    class Output(BlockSchemaOutput):
        import_id: str = SchemaField(description="The unique identifier for the import")
        status: str = SchemaField(description="Current status of the import")
        title: str = SchemaField(description="Title of the import")
        format: str = SchemaField(description="Format of the imported data")
        entity_type: str = SchemaField(description="Type of entities imported")
        count: int = SchemaField(description="Number of items imported")
        upload_url: Optional[str] = SchemaField(
            description="Upload URL for CSV data (if import not yet uploaded)"
        )
        upload_valid_until: Optional[str] = SchemaField(
            description="Expiration time for upload URL (if applicable)"
        )
        failed_reason: Optional[str] = SchemaField(
            description="Reason for failure (if applicable)"
        )
        failed_message: Optional[str] = SchemaField(
            description="Detailed failure message (if applicable)"
        )
        created_at: str = SchemaField(description="When the import was created")
        updated_at: str = SchemaField(description="When the import was last updated")
        metadata: dict = SchemaField(description="Metadata attached to the import")

    def __init__(self):
        super().__init__(
            id="236663c8-a8dc-45f7-a050-2676bb0a3dd2",
            description="Get the status and details of an import",
            categories={BlockCategory.DATA},
            input_schema=ExaGetImportBlock.Input,
            output_schema=ExaGetImportBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        sdk_import = aexa.websets.imports.get(import_id=input_data.import_id)

        import_obj = ImportModel.from_sdk(sdk_import)

        # Yield all fields
        yield "import_id", import_obj.id
        yield "status", import_obj.status
        yield "title", import_obj.title
        yield "format", import_obj.format
        yield "entity_type", import_obj.entity_type
        yield "count", import_obj.count
        yield "upload_url", import_obj.upload_url
        yield "upload_valid_until", import_obj.upload_valid_until
        yield "failed_reason", import_obj.failed_reason
        yield "failed_message", import_obj.failed_message
        yield "created_at", import_obj.created_at
        yield "updated_at", import_obj.updated_at
        yield "metadata", import_obj.metadata

class ExaListImportsBlock(Block):
    """List all imports with pagination."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        limit: int = SchemaField(
            default=25,
            description="Number of imports to return",
            ge=1,
            le=100,
        )
        cursor: Optional[str] = SchemaField(
            default=None,
            description="Cursor for pagination",
            advanced=True,
        )

    class Output(BlockSchemaOutput):
        imports: list[dict] = SchemaField(description="List of imports")
        import_item: dict = SchemaField(
            description="Individual import (yielded for each import)"
        )
        has_more: bool = SchemaField(
            description="Whether there are more imports to paginate through"
        )
        next_cursor: Optional[str] = SchemaField(
            description="Cursor for the next page of results"
        )

    def __init__(self):
        super().__init__(
            id="65323630-f7e9-4692-a624-184ba14c0686",
            description="List all imports with pagination support",
            categories={BlockCategory.DATA},
            input_schema=ExaListImportsBlock.Input,
            output_schema=ExaListImportsBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        response = aexa.websets.imports.list(
            cursor=input_data.cursor,
            limit=input_data.limit,
        )

        # Convert SDK imports to our stable models
        imports = [ImportModel.from_sdk(i) for i in response.data]

        yield "imports", [i.model_dump() for i in imports]

        for import_obj in imports:
            yield "import_item", import_obj.model_dump()

        yield "has_more", response.has_more
        yield "next_cursor", response.next_cursor

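
# --- Editor's illustrative sketch (not part of this diff) ---
# Outside the block framework, the same cursor-based pagination could be
# driven manually (the `aexa` client and the processing step are assumed):
#
#     cursor = None
#     while True:
#         page = aexa.websets.imports.list(cursor=cursor, limit=25)
#         for imp in page.data:
#             ...  # process each import
#         if not page.has_more:
#             break
#         cursor = page.next_cursor
# --- End sketch ---
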
class ExaDeleteImportBlock(Block):
    """Delete an import."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        import_id: str = SchemaField(
            description="The ID of the import to delete",
            placeholder="import-id",
        )

    class Output(BlockSchemaOutput):
        import_id: str = SchemaField(description="The ID of the deleted import")
        success: str = SchemaField(description="Whether the deletion was successful")

    def __init__(self):
        super().__init__(
            id="81ae30ed-c7ba-4b5d-8483-b726846e570c",
            description="Delete an import",
            categories={BlockCategory.DATA},
            input_schema=ExaDeleteImportBlock.Input,
            output_schema=ExaDeleteImportBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        deleted_import = aexa.websets.imports.delete(import_id=input_data.import_id)

        yield "import_id", deleted_import.id
        yield "success", "true"

class ExaExportWebsetBlock(Block):
    """Export all data from a webset in various formats."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset to export",
            placeholder="webset-id-or-external-id",
        )
        format: ExportFormat = SchemaField(
            default=ExportFormat.JSON,
            description="Export format",
        )
        include_content: bool = SchemaField(
            default=True,
            description="Include full content in export",
        )
        include_enrichments: bool = SchemaField(
            default=True,
            description="Include enrichment data in export",
        )
        max_items: int = SchemaField(
            default=100,
            description="Maximum number of items to export",
            ge=1,
            le=100,
        )

    class Output(BlockSchemaOutput):
        export_data: str = SchemaField(
            description="Exported data in the requested format"
        )
        item_count: int = SchemaField(description="Number of items exported")
        total_items: int = SchemaField(
            description="Total number of items in the webset"
        )
        truncated: bool = SchemaField(
            description="Whether the export was truncated due to max_items limit"
        )
        format: str = SchemaField(description="Format of the exported data")

    def __init__(self):
        super().__init__(
            id="5da9d0fd-4b5b-4318-8302-8f71d0ccce9d",
            description="Export webset data in JSON, CSV, or JSON Lines format",
            categories={BlockCategory.DATA},
            input_schema=ExaExportWebsetBlock.Input,
            output_schema=ExaExportWebsetBlock.Output,
            test_input={
                "credentials": TEST_CREDENTIALS_INPUT,
                "webset_id": "test-webset",
                "format": ExportFormat.JSON,
                "include_content": True,
                "include_enrichments": True,
                "max_items": 10,
            },
            test_output=[
                ("export_data", str),
                ("item_count", 2),
                ("total_items", 2),
                ("truncated", False),
                ("format", "json"),
            ],
            test_credentials=TEST_CREDENTIALS,
            test_mock=self._create_test_mock(),
        )

    @staticmethod
    def _create_test_mock():
        """Create test mocks for the AsyncExa SDK."""
        from unittest.mock import MagicMock

        # Create mock webset items
        mock_item1 = MagicMock()
        mock_item1.model_dump = MagicMock(
            return_value={
                "id": "item-1",
                "url": "https://example.com",
                "title": "Test Item 1",
            }
        )

        mock_item2 = MagicMock()
        mock_item2.model_dump = MagicMock(
            return_value={
                "id": "item-2",
                "url": "https://example.org",
                "title": "Test Item 2",
            }
        )

        # Items to be returned through the mocked list_all iterator
        mock_items = [mock_item1, mock_item2]

        return {
            "_get_client": lambda *args, **kwargs: MagicMock(
                websets=MagicMock(
                    items=MagicMock(list_all=lambda *args, **kwargs: iter(mock_items))
                )
            )
        }

    def _get_client(self, api_key: str) -> AsyncExa:
        """Get Exa client (separated for testing)."""
        return AsyncExa(api_key=api_key)

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = self._get_client(credentials.api_key.get_secret_value())

        try:
            all_items = []

            # Use SDK's list_all iterator to fetch items
            item_iterator = aexa.websets.items.list_all(
                webset_id=input_data.webset_id, limit=input_data.max_items
            )

            for sdk_item in item_iterator:
                if len(all_items) >= input_data.max_items:
                    break

                # Convert to dict for export
                item_dict = sdk_item.model_dump(by_alias=True, exclude_none=True)
                all_items.append(item_dict)

            # Calculate total and truncated
            total_items = len(all_items)  # SDK doesn't provide total count
            truncated = len(all_items) >= input_data.max_items

            # Process items based on include flags
            if not input_data.include_content:
                for item in all_items:
                    item.pop("content", None)

            if not input_data.include_enrichments:
                for item in all_items:
                    item.pop("enrichments", None)

            # Format the export data
            export_data = ""

            if input_data.format == ExportFormat.JSON:
                export_data = json.dumps(all_items, indent=2, default=str)

            elif input_data.format == ExportFormat.JSON_LINES:
                lines = [json.dumps(item, default=str) for item in all_items]
                export_data = "\n".join(lines)

            elif input_data.format == ExportFormat.CSV:
                # Extract all unique keys for CSV headers
                all_keys = set()
                for item in all_items:
                    all_keys.update(self._flatten_dict(item).keys())

                # Create CSV
                output = StringIO()
                writer = csv.DictWriter(output, fieldnames=sorted(all_keys))
                writer.writeheader()

                for item in all_items:
                    flat_item = self._flatten_dict(item)
                    writer.writerow(flat_item)

                export_data = output.getvalue()

            yield "export_data", export_data
            yield "item_count", len(all_items)
            yield "total_items", total_items
            yield "truncated", truncated
            yield "format", input_data.format.value

        except ValueError as e:
            # Re-raise user input validation errors
            raise ValueError(f"Failed to export webset: {e}") from e
        # Let all other exceptions propagate naturally

    def _flatten_dict(self, d: dict, parent_key: str = "", sep: str = "_") -> dict:
        """Flatten nested dictionaries for CSV export."""
        items = []
        for k, v in d.items():
            new_key = f"{parent_key}{sep}{k}" if parent_key else k
            if isinstance(v, dict):
                items.extend(self._flatten_dict(v, new_key, sep=sep).items())
            elif isinstance(v, list):
                # Convert lists to JSON strings for CSV
                items.append((new_key, json.dumps(v, default=str)))
            else:
                items.append((new_key, v))
        return dict(items)

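
# --- Editor's illustrative sketch (not part of this diff) ---
# _flatten_dict() turns nested structures into flat, CSV-friendly columns
# and serializes lists as JSON strings. For example:
#
#     {"id": "item-1", "entity": {"type": "company"}, "tags": ["a", "b"]}
#
# flattens to:
#
#     {"id": "item-1", "entity_type": "company", "tags": '["a", "b"]'}
# --- End sketch ---
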
autogpt_platform/backend/backend/blocks/exa/websets_items.py (new file, 591 lines)
@@ -0,0 +1,591 @@
"""
Exa Websets Item Management Blocks

This module provides blocks for managing items within Exa websets, including
retrieving, listing, deleting, and bulk operations on webset items.
"""

from typing import Any, Dict, List, Optional

from exa_py import AsyncExa
from exa_py.websets.types import WebsetItem as SdkWebsetItem
from exa_py.websets.types import (
    WebsetItemArticleProperties,
    WebsetItemCompanyProperties,
    WebsetItemCustomProperties,
    WebsetItemPersonProperties,
    WebsetItemResearchPaperProperties,
)
from pydantic import AnyUrl, BaseModel

from backend.sdk import (
    APIKeyCredentials,
    Block,
    BlockCategory,
    BlockOutput,
    BlockSchemaInput,
    BlockSchemaOutput,
    CredentialsMetaInput,
    SchemaField,
)

from ._config import exa


# Mirrored model for enrichment results
class EnrichmentResultModel(BaseModel):
    """Stable output model mirroring SDK EnrichmentResult."""

    enrichment_id: str
    format: str
    result: Optional[List[str]]
    reasoning: Optional[str]
    references: List[Dict[str, Any]]

    @classmethod
    def from_sdk(cls, sdk_enrich) -> "EnrichmentResultModel":
        """Convert SDK EnrichmentResult to our model."""
        format_str = (
            sdk_enrich.format.value
            if hasattr(sdk_enrich.format, "value")
            else str(sdk_enrich.format)
        )

        # Convert references to dicts
        references_list = []
        if sdk_enrich.references:
            for ref in sdk_enrich.references:
                references_list.append(ref.model_dump(by_alias=True, exclude_none=True))

        return cls(
            enrichment_id=sdk_enrich.enrichment_id,
            format=format_str,
            result=sdk_enrich.result,
            reasoning=sdk_enrich.reasoning,
            references=references_list,
        )


# Mirrored model for stability - don't use SDK types directly in block outputs
class WebsetItemModel(BaseModel):
    """Stable output model mirroring SDK WebsetItem."""

    id: str
    url: Optional[AnyUrl]
    title: str
    content: str
    entity_data: Dict[str, Any]
    enrichments: Dict[str, EnrichmentResultModel]
    created_at: str
    updated_at: str

    @classmethod
    def from_sdk(cls, item: SdkWebsetItem) -> "WebsetItemModel":
        """Convert SDK WebsetItem to our stable model."""
        # Extract properties from the union type
        properties_dict = {}
        url_value = None
        title = ""
        content = ""

        if hasattr(item, "properties") and item.properties:
            properties_dict = item.properties.model_dump(
                by_alias=True, exclude_none=True
            )

            # URL is always available on all property types
            url_value = item.properties.url

            # Extract title using isinstance checks on the union type
            if isinstance(item.properties, WebsetItemPersonProperties):
                title = item.properties.person.name
                content = ""  # Person type has no content
            elif isinstance(item.properties, WebsetItemCompanyProperties):
                title = item.properties.company.name
                content = item.properties.content or ""
            elif isinstance(item.properties, WebsetItemArticleProperties):
                title = item.properties.description
                content = item.properties.content or ""
            elif isinstance(item.properties, WebsetItemResearchPaperProperties):
                title = item.properties.description
                content = item.properties.content or ""
            elif isinstance(item.properties, WebsetItemCustomProperties):
                title = item.properties.description
                content = item.properties.content or ""
            else:
                # Fallback
                title = item.properties.description
                content = getattr(item.properties, "content", "")

        # Convert enrichments from list to dict keyed by enrichment_id using Pydantic models
        enrichments_dict: Dict[str, EnrichmentResultModel] = {}
        if hasattr(item, "enrichments") and item.enrichments:
            for sdk_enrich in item.enrichments:
                enrich_model = EnrichmentResultModel.from_sdk(sdk_enrich)
                enrichments_dict[enrich_model.enrichment_id] = enrich_model

        return cls(
            id=item.id,
            url=url_value,
            title=title,
            content=content or "",
            entity_data=properties_dict,
            enrichments=enrichments_dict,
            created_at=item.created_at.isoformat() if item.created_at else "",
            updated_at=item.updated_at.isoformat() if item.updated_at else "",
        )

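
# --- Editor's illustrative sketch (not part of this diff) ---
# from_sdk() re-keys the SDK's list of enrichment results by enrichment_id
# so downstream blocks can look a result up directly. Schematically:
#
#     [<result enrichment_id="e-1">, <result enrichment_id="e-2">]
#
# becomes:
#
#     {"e-1": EnrichmentResultModel(...), "e-2": EnrichmentResultModel(...)}
# --- End sketch ---
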
class ExaGetWebsetItemBlock(Block):
    """Get a specific item from a webset by its ID."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        item_id: str = SchemaField(
            description="The ID of the specific item to retrieve",
            placeholder="item-id",
        )

    class Output(BlockSchemaOutput):
        item_id: str = SchemaField(description="The unique identifier for the item")
        url: str = SchemaField(description="The URL of the original source")
        title: str = SchemaField(description="The title of the item")
        content: str = SchemaField(description="The main content of the item")
        entity_data: dict = SchemaField(description="Entity-specific structured data")
        enrichments: dict = SchemaField(description="Enrichment data added to the item")
        created_at: str = SchemaField(
            description="When the item was added to the webset"
        )
        updated_at: str = SchemaField(description="When the item was last updated")

    def __init__(self):
        super().__init__(
            id="c4a7d9e2-8f3b-4a6c-9d8e-a5b6c7d8e9f0",
            description="Get a specific item from a webset by its ID",
            categories={BlockCategory.SEARCH},
            input_schema=ExaGetWebsetItemBlock.Input,
            output_schema=ExaGetWebsetItemBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        sdk_item = aexa.websets.items.get(
            webset_id=input_data.webset_id, id=input_data.item_id
        )

        item = WebsetItemModel.from_sdk(sdk_item)

        yield "item_id", item.id
        yield "url", item.url
        yield "title", item.title
        yield "content", item.content
        yield "entity_data", item.entity_data
        yield "enrichments", item.enrichments
        yield "created_at", item.created_at
        yield "updated_at", item.updated_at

class ExaListWebsetItemsBlock(Block):
    """List items in a webset with pagination and optional filtering."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        limit: int = SchemaField(
            default=25,
            description="Number of items to return (1-100)",
            ge=1,
            le=100,
        )
        cursor: Optional[str] = SchemaField(
            default=None,
            description="Cursor for pagination through results",
            advanced=True,
        )
        wait_for_items: bool = SchemaField(
            default=False,
            description="Wait for items to be available if webset is still processing",
            advanced=True,
        )
        wait_timeout: int = SchemaField(
            default=60,
            description="Maximum time to wait for items in seconds",
            advanced=True,
            ge=1,
            le=300,
        )

    class Output(BlockSchemaOutput):
        items: list[WebsetItemModel] = SchemaField(
            description="List of webset items",
        )
        webset_id: str = SchemaField(
            description="The ID of the webset",
        )
        item: WebsetItemModel = SchemaField(
            description="Individual item (yielded for each item in the list)",
        )
        has_more: bool = SchemaField(
            description="Whether there are more items to paginate through",
        )
        next_cursor: Optional[str] = SchemaField(
            description="Cursor for the next page of results",
        )

    def __init__(self):
        super().__init__(
            id="7b5e8c9f-01a2-43c4-95e6-f7a8b9c0d1e2",
            description="List items in a webset with pagination support",
            categories={BlockCategory.SEARCH},
            input_schema=ExaListWebsetItemsBlock.Input,
            output_schema=ExaListWebsetItemsBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        if input_data.wait_for_items:
            import asyncio
            import time

            start_time = time.time()
            interval = 2
            response = None

            while time.time() - start_time < input_data.wait_timeout:
                response = aexa.websets.items.list(
                    webset_id=input_data.webset_id,
                    cursor=input_data.cursor,
                    limit=input_data.limit,
                )

                if response.data:
                    break

                await asyncio.sleep(interval)
                interval = min(interval * 1.2, 10)

            if not response:
                response = aexa.websets.items.list(
                    webset_id=input_data.webset_id,
                    cursor=input_data.cursor,
                    limit=input_data.limit,
                )
        else:
            response = aexa.websets.items.list(
                webset_id=input_data.webset_id,
                cursor=input_data.cursor,
                limit=input_data.limit,
            )

        items = [WebsetItemModel.from_sdk(item) for item in response.data]

        yield "items", items

        for item in items:
            yield "item", item

        yield "has_more", response.has_more
        yield "next_cursor", response.next_cursor
        yield "webset_id", input_data.webset_id

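
# --- Editor's illustrative sketch (not part of this diff) ---
# The wait_for_items loop above polls with a gentle exponential backoff:
# the delay starts at 2s and grows by a factor of 1.2 per attempt, capped
# at 10s, so successive sleeps are approximately
#
#     2.0, 2.4, 2.88, 3.46, 4.15, 4.98, 5.97, 7.17, 8.60, 10.0, 10.0, ...
#
# and the default 60s timeout allows roughly a dozen polls rather than
# thirty fixed-interval ones.
# --- End sketch ---
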
class ExaDeleteWebsetItemBlock(Block):
    """Delete a specific item from a webset."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        item_id: str = SchemaField(
            description="The ID of the item to delete",
            placeholder="item-id",
        )

    class Output(BlockSchemaOutput):
        item_id: str = SchemaField(description="The ID of the deleted item")
        success: str = SchemaField(description="Whether the deletion was successful")

    def __init__(self):
        super().__init__(
            id="12c57fbe-c270-4877-a2b6-d2d05529ba79",
            description="Delete a specific item from a webset",
            categories={BlockCategory.SEARCH},
            input_schema=ExaDeleteWebsetItemBlock.Input,
            output_schema=ExaDeleteWebsetItemBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        deleted_item = aexa.websets.items.delete(
            webset_id=input_data.webset_id, id=input_data.item_id
        )

        yield "item_id", deleted_item.id
        yield "success", "true"

class ExaBulkWebsetItemsBlock(Block):
    """Get all items from a webset in a single operation (with size limits)."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        max_items: int = SchemaField(
            default=100,
            description="Maximum number of items to retrieve (1-1000). Note: Large values may take longer.",
            ge=1,
            le=1000,
        )
        include_enrichments: bool = SchemaField(
            default=True,
            description="Include enrichment data for each item",
        )
        include_content: bool = SchemaField(
            default=True,
            description="Include full content for each item",
        )

    class Output(BlockSchemaOutput):
        items: list[WebsetItemModel] = SchemaField(
            description="All items from the webset"
        )
        item: WebsetItemModel = SchemaField(
            description="Individual item (yielded for each item)"
        )
        total_retrieved: int = SchemaField(
            description="Total number of items retrieved"
        )
        truncated: bool = SchemaField(
            description="Whether results were truncated due to max_items limit"
        )

    def __init__(self):
        super().__init__(
            id="dbd619f5-476e-4395-af9a-a7a7c0fb8c4e",
            description="Get all items from a webset in bulk (with configurable limits)",
            categories={BlockCategory.SEARCH},
            input_schema=ExaBulkWebsetItemsBlock.Input,
            output_schema=ExaBulkWebsetItemsBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        all_items: List[WebsetItemModel] = []
        item_iterator = aexa.websets.items.list_all(
            webset_id=input_data.webset_id, limit=input_data.max_items
        )

        for sdk_item in item_iterator:
            if len(all_items) >= input_data.max_items:
                break

            item = WebsetItemModel.from_sdk(sdk_item)

            if not input_data.include_enrichments:
                item.enrichments = {}
            if not input_data.include_content:
                item.content = ""

            all_items.append(item)

        yield "items", all_items

        for item in all_items:
            yield "item", item

        yield "total_retrieved", len(all_items)
        yield "truncated", len(all_items) >= input_data.max_items

class ExaWebsetItemsSummaryBlock(Block):
    """Get a summary of items in a webset without retrieving all data."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        sample_size: int = SchemaField(
            default=5,
            description="Number of sample items to include",
            ge=0,
            le=10,
        )

    class Output(BlockSchemaOutput):
        total_items: int = SchemaField(
            description="Total number of items in the webset"
        )
        entity_type: str = SchemaField(description="Type of entities in the webset")
        sample_items: list[WebsetItemModel] = SchemaField(
            description="Sample of items from the webset"
        )
        enrichment_columns: list[str] = SchemaField(
            description="List of enrichment columns available"
        )

    def __init__(self):
        super().__init__(
            id="db7813ad-10bd-4652-8623-5667d6fecdd5",
            description="Get a summary of webset items without retrieving all data",
            categories={BlockCategory.SEARCH},
            input_schema=ExaWebsetItemsSummaryBlock.Input,
            output_schema=ExaWebsetItemsSummaryBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        webset = aexa.websets.get(id=input_data.webset_id)

        entity_type = "unknown"
        if webset.searches:
            first_search = webset.searches[0]
            if first_search.entity:
                # The entity is a union type, extract type field
                entity_dict = first_search.entity.model_dump(by_alias=True)
                entity_type = entity_dict.get("type", "unknown")

        # Get enrichment columns
        enrichment_columns = []
        if webset.enrichments:
            enrichment_columns = [
                e.title if e.title else e.description for e in webset.enrichments
            ]

        # Get sample items if requested
        sample_items: List[WebsetItemModel] = []
        if input_data.sample_size > 0:
            items_response = aexa.websets.items.list(
                webset_id=input_data.webset_id, limit=input_data.sample_size
            )
            # Convert to our stable models
            sample_items = [
                WebsetItemModel.from_sdk(item) for item in items_response.data
            ]

        total_items = 0
        if webset.searches:
            for search in webset.searches:
                if search.progress:
                    total_items += search.progress.found

        yield "total_items", total_items
        yield "entity_type", entity_type
        yield "sample_items", sample_items
        yield "enrichment_columns", enrichment_columns

class ExaGetNewItemsBlock(Block):
    """Get items added to a webset since a specific cursor (incremental processing helper)."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        since_cursor: Optional[str] = SchemaField(
            default=None,
            description="Cursor from previous run - only items after this will be returned. Leave empty on first run.",
            placeholder="cursor-from-previous-run",
        )
        max_items: int = SchemaField(
            default=100,
            description="Maximum number of new items to retrieve",
            ge=1,
            le=1000,
        )

    class Output(BlockSchemaOutput):
        new_items: list[WebsetItemModel] = SchemaField(
            description="Items added since the cursor"
        )
        item: WebsetItemModel = SchemaField(
            description="Individual item (yielded for each new item)"
        )
        count: int = SchemaField(description="Number of new items found")
        next_cursor: Optional[str] = SchemaField(
            description="Save this cursor for the next run to get only newer items"
        )
        has_more: bool = SchemaField(
            description="Whether there are more new items beyond max_items"
        )

    def __init__(self):
        super().__init__(
            id="3ff9bdf5-9613-4d21-8a60-90eb8b69c414",
            description="Get items added since a cursor - enables incremental processing without reprocessing",
            categories={BlockCategory.SEARCH, BlockCategory.DATA},
            input_schema=ExaGetNewItemsBlock.Input,
            output_schema=ExaGetNewItemsBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        # Get items starting from cursor
        response = aexa.websets.items.list(
            webset_id=input_data.webset_id,
            cursor=input_data.since_cursor,
            limit=input_data.max_items,
        )

        # Convert SDK items to our stable models
        new_items = [WebsetItemModel.from_sdk(item) for item in response.data]

        # Yield the full list
        yield "new_items", new_items

        # Yield individual items for processing
        for item in new_items:
            yield "item", item

        # Yield metadata for next run
        yield "count", len(new_items)
        yield "next_cursor", response.next_cursor
        yield "has_more", response.has_more

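
# --- Editor's illustrative sketch (not part of this diff) ---
# Intended incremental-processing pattern: persist next_cursor between runs
# and feed it back in as since_cursor (the storage helpers are hypothetical):
#
#     cursor = load_saved_cursor()   # None on the very first run
#     # ... run ExaGetNewItemsBlock with since_cursor=cursor ...
#     save_cursor(next_cursor)       # from the block's next_cursor output
#
# On the next run only items added after the saved cursor are returned.
# --- End sketch ---
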
autogpt_platform/backend/backend/blocks/exa/websets_monitor.py (new file, 600 lines)
@@ -0,0 +1,600 @@
"""
Exa Websets Monitor Management Blocks

This module provides blocks for creating and managing monitors that automatically
keep websets updated with fresh data on a schedule.
"""

from enum import Enum
from typing import Optional

from exa_py import AsyncExa
from exa_py.websets.types import Monitor as SdkMonitor
from pydantic import BaseModel

from backend.sdk import (
    APIKeyCredentials,
    Block,
    BlockCategory,
    BlockOutput,
    BlockSchemaInput,
    BlockSchemaOutput,
    CredentialsMetaInput,
    SchemaField,
)

from ._config import exa
from ._test import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT


# Mirrored model for stability - don't use SDK types directly in block outputs
class MonitorModel(BaseModel):
    """Stable output model mirroring SDK Monitor."""

    id: str
    status: str
    webset_id: str
    behavior_type: str
    behavior_config: dict
    cron_expression: str
    timezone: str
    next_run_at: str
    last_run: dict
    metadata: dict
    created_at: str
    updated_at: str

    @classmethod
    def from_sdk(cls, monitor: SdkMonitor) -> "MonitorModel":
        """Convert SDK Monitor to our stable model."""
        # Extract behavior information
        behavior_dict = monitor.behavior.model_dump(by_alias=True, exclude_none=True)
        behavior_type = behavior_dict.get("type", "unknown")
        behavior_config = behavior_dict.get("config", {})

        # Extract cadence information
        cadence_dict = monitor.cadence.model_dump(by_alias=True, exclude_none=True)
        cron_expr = cadence_dict.get("cron", "")
        timezone = cadence_dict.get("timezone", "Etc/UTC")

        # Extract last run information
        last_run_dict = {}
        if monitor.last_run:
            last_run_dict = monitor.last_run.model_dump(
                by_alias=True, exclude_none=True
            )

        # Handle status enum
        status_str = (
            monitor.status.value
            if hasattr(monitor.status, "value")
            else str(monitor.status)
        )

        return cls(
            id=monitor.id,
            status=status_str,
            webset_id=monitor.webset_id,
            behavior_type=behavior_type,
            behavior_config=behavior_config,
            cron_expression=cron_expr,
            timezone=timezone,
            next_run_at=monitor.next_run_at.isoformat() if monitor.next_run_at else "",
            last_run=last_run_dict,
            metadata=monitor.metadata or {},
            created_at=monitor.created_at.isoformat() if monitor.created_at else "",
            updated_at=monitor.updated_at.isoformat() if monitor.updated_at else "",
        )

class MonitorStatus(str, Enum):
    """Status of a monitor."""

    ENABLED = "enabled"
    DISABLED = "disabled"
    PAUSED = "paused"


class MonitorBehaviorType(str, Enum):
    """Type of behavior for a monitor."""

    SEARCH = "search"  # Run new searches
    REFRESH = "refresh"  # Refresh existing items


class SearchBehavior(str, Enum):
    """How search results interact with existing items."""

    APPEND = "append"
    OVERRIDE = "override"

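
# --- Editor's illustrative sketch (not part of this diff) ---
# The 5-field cron expressions used by monitors follow the standard layout
# (minute, hour, day-of-month, month, day-of-week); per the field docs below,
# schedules may fire at most once per day. For example:
#
#     "0 9 * * 1"    ->  09:00 every Monday
#     "30 6 1 * *"   ->  06:30 on the 1st of every month
# --- End sketch ---
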
class ExaCreateMonitorBlock(Block):
    """Create a monitor to automatically keep a webset updated on a schedule."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset to monitor",
            placeholder="webset-id-or-external-id",
        )

        # Schedule configuration
        cron_expression: str = SchemaField(
            description="Cron expression for scheduling (5 fields, max once per day)",
            placeholder="0 9 * * 1",  # Every Monday at 9 AM
        )
        timezone: str = SchemaField(
            default="Etc/UTC",
            description="IANA timezone for the schedule",
            placeholder="America/New_York",
            advanced=True,
        )

        # Behavior configuration
        behavior_type: MonitorBehaviorType = SchemaField(
            default=MonitorBehaviorType.SEARCH,
            description="Type of monitor behavior (search for new items or refresh existing)",
        )

        # Search configuration (for SEARCH behavior)
        search_query: Optional[str] = SchemaField(
            default=None,
            description="Search query for finding new items (required for search behavior)",
            placeholder="AI startups that raised funding in the last week",
        )
        search_count: int = SchemaField(
            default=10,
            description="Number of items to find in each search",
            ge=1,
            le=100,
        )
        search_criteria: list[str] = SchemaField(
            default_factory=list,
            description="Criteria that items must meet",
            advanced=True,
        )
        search_behavior: SearchBehavior = SchemaField(
            default=SearchBehavior.APPEND,
            description="How new results interact with existing items",
            advanced=True,
        )
        entity_type: Optional[str] = SchemaField(
            default=None,
            description="Type of entity to search for (company, person, etc.)",
            advanced=True,
        )

        # Refresh configuration (for REFRESH behavior)
        refresh_content: bool = SchemaField(
            default=True,
            description="Refresh content from source URLs (for refresh behavior)",
            advanced=True,
        )
        refresh_enrichments: bool = SchemaField(
            default=True,
            description="Re-run enrichments on items (for refresh behavior)",
            advanced=True,
        )

        # Metadata
        metadata: Optional[dict] = SchemaField(
            default=None,
            description="Metadata to attach to the monitor",
            advanced=True,
        )

    class Output(BlockSchemaOutput):
        monitor_id: str = SchemaField(
            description="The unique identifier for the created monitor"
        )
        webset_id: str = SchemaField(description="The webset this monitor belongs to")
        status: str = SchemaField(description="Status of the monitor")
        behavior_type: str = SchemaField(description="Type of monitor behavior")
        next_run_at: Optional[str] = SchemaField(
            description="When the monitor will next run"
        )
        cron_expression: str = SchemaField(description="The schedule cron expression")
        timezone: str = SchemaField(description="The timezone for scheduling")
        created_at: str = SchemaField(description="When the monitor was created")

    def __init__(self):
        super().__init__(
            id="f8a9b0c1-d2e3-4567-890a-bcdef1234567",
            description="Create automated monitors to keep websets updated with fresh data on a schedule",
            categories={BlockCategory.SEARCH},
            input_schema=ExaCreateMonitorBlock.Input,
            output_schema=ExaCreateMonitorBlock.Output,
            test_input={
                "credentials": TEST_CREDENTIALS_INPUT,
                "webset_id": "test-webset",
                "cron_expression": "0 9 * * 1",
                "behavior_type": MonitorBehaviorType.SEARCH,
                "search_query": "AI startups",
                "search_count": 10,
            },
            test_output=[
                ("monitor_id", "monitor-123"),
                ("webset_id", "test-webset"),
                ("status", "enabled"),
                ("behavior_type", "search"),
                ("next_run_at", "2024-01-01T00:00:00"),
                ("cron_expression", "0 9 * * 1"),
                ("timezone", "Etc/UTC"),
                ("created_at", "2024-01-01T00:00:00"),
            ],
            test_credentials=TEST_CREDENTIALS,
            test_mock=self._create_test_mock(),
        )

    @staticmethod
    def _create_test_mock():
        """Create test mocks for the AsyncExa SDK."""
        from datetime import datetime
        from unittest.mock import MagicMock

        # Create mock SDK monitor object
        mock_monitor = MagicMock()
        mock_monitor.id = "monitor-123"
        mock_monitor.status = MagicMock(value="enabled")
        mock_monitor.webset_id = "test-webset"
        mock_monitor.next_run_at = datetime.fromisoformat("2024-01-01T00:00:00")
        mock_monitor.created_at = datetime.fromisoformat("2024-01-01T00:00:00")
        mock_monitor.updated_at = datetime.fromisoformat("2024-01-01T00:00:00")
        mock_monitor.metadata = {}
        mock_monitor.last_run = None

        # Mock behavior
        mock_behavior = MagicMock()
        mock_behavior.model_dump = MagicMock(
            return_value={"type": "search", "config": {}}
        )
        mock_monitor.behavior = mock_behavior

        # Mock cadence
        mock_cadence = MagicMock()
        mock_cadence.model_dump = MagicMock(
            return_value={"cron": "0 9 * * 1", "timezone": "Etc/UTC"}
        )
        mock_monitor.cadence = mock_cadence

        return {
            "_get_client": lambda *args, **kwargs: MagicMock(
                websets=MagicMock(
                    monitors=MagicMock(create=lambda *args, **kwargs: mock_monitor)
                )
            )
        }

    def _get_client(self, api_key: str) -> AsyncExa:
        """Get Exa client (separated for testing)."""
        return AsyncExa(api_key=api_key)

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        aexa = self._get_client(credentials.api_key.get_secret_value())

        # Build the payload
        payload = {
            "websetId": input_data.webset_id,
            "cadence": {
                "cron": input_data.cron_expression,
                "timezone": input_data.timezone,
            },
        }

        # Build behavior configuration based on type
        if input_data.behavior_type == MonitorBehaviorType.SEARCH:
            behavior_config = {
                "query": input_data.search_query or "",
                "count": input_data.search_count,
                "behavior": input_data.search_behavior.value,
            }

            if input_data.search_criteria:
                behavior_config["criteria"] = [
                    {"description": c} for c in input_data.search_criteria
                ]

            if input_data.entity_type:
                behavior_config["entity"] = {"type": input_data.entity_type}

            payload["behavior"] = {
                "type": "search",
                "config": behavior_config,
            }
        else:
            # REFRESH behavior
            payload["behavior"] = {
                "type": "refresh",
                "config": {
                    "content": input_data.refresh_content,
                    "enrichments": input_data.refresh_enrichments,
                },
            }

        # Add metadata if provided
        if input_data.metadata:
            payload["metadata"] = input_data.metadata

        sdk_monitor = aexa.websets.monitors.create(params=payload)

        monitor = MonitorModel.from_sdk(sdk_monitor)

        # Yield all fields
        yield "monitor_id", monitor.id
        yield "webset_id", monitor.webset_id
        yield "status", monitor.status
        yield "behavior_type", monitor.behavior_type
        yield "next_run_at", monitor.next_run_at
        yield "cron_expression", monitor.cron_expression
        yield "timezone", monitor.timezone
        yield "created_at", monitor.created_at

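
# --- Editor's illustrative sketch (not part of this diff) ---
# With the test_input above (search behavior, weekly schedule), run() builds
# roughly this payload for websets.monitors.create(); criteria and entity
# are omitted because they are empty:
#
#     {
#         "websetId": "test-webset",
#         "cadence": {"cron": "0 9 * * 1", "timezone": "Etc/UTC"},
#         "behavior": {
#             "type": "search",
#             "config": {"query": "AI startups", "count": 10, "behavior": "append"},
#         },
#     }
# --- End sketch ---
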
class ExaGetMonitorBlock(Block):
    """Get the details and status of a monitor."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        monitor_id: str = SchemaField(
            description="The ID of the monitor to retrieve",
            placeholder="monitor-id",
        )

    class Output(BlockSchemaOutput):
        monitor_id: str = SchemaField(
            description="The unique identifier for the monitor"
        )
        webset_id: str = SchemaField(description="The webset this monitor belongs to")
        status: str = SchemaField(description="Current status of the monitor")
        behavior_type: str = SchemaField(description="Type of monitor behavior")
        behavior_config: dict = SchemaField(
            description="Configuration for the monitor behavior"
        )
        cron_expression: str = SchemaField(description="The schedule cron expression")
        timezone: str = SchemaField(description="The timezone for scheduling")
        next_run_at: Optional[str] = SchemaField(
            description="When the monitor will next run"
        )
        last_run: Optional[dict] = SchemaField(
            description="Information about the last run"
        )
        created_at: str = SchemaField(description="When the monitor was created")
        updated_at: str = SchemaField(description="When the monitor was last updated")
        metadata: dict = SchemaField(description="Metadata attached to the monitor")

    def __init__(self):
        super().__init__(
            id="5c852a2d-d505-4a56-b711-7def8dd14e72",
            description="Get the details and status of a webset monitor",
            categories={BlockCategory.SEARCH},
            input_schema=ExaGetMonitorBlock.Input,
            output_schema=ExaGetMonitorBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        sdk_monitor = aexa.websets.monitors.get(monitor_id=input_data.monitor_id)

        monitor = MonitorModel.from_sdk(sdk_monitor)

        # Yield all fields
        yield "monitor_id", monitor.id
        yield "webset_id", monitor.webset_id
        yield "status", monitor.status
        yield "behavior_type", monitor.behavior_type
        yield "behavior_config", monitor.behavior_config
        yield "cron_expression", monitor.cron_expression
        yield "timezone", monitor.timezone
        yield "next_run_at", monitor.next_run_at
        yield "last_run", monitor.last_run
        yield "created_at", monitor.created_at
        yield "updated_at", monitor.updated_at
        yield "metadata", monitor.metadata

class ExaUpdateMonitorBlock(Block):
    """Update a monitor's configuration."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        monitor_id: str = SchemaField(
            description="The ID of the monitor to update",
            placeholder="monitor-id",
        )
        status: Optional[MonitorStatus] = SchemaField(
            default=None,
            description="New status for the monitor",
        )
        cron_expression: Optional[str] = SchemaField(
            default=None,
            description="New cron expression for scheduling",
        )
        timezone: Optional[str] = SchemaField(
            default=None,
            description="New timezone for the schedule",
            advanced=True,
        )
        metadata: Optional[dict] = SchemaField(
            default=None,
            description="New metadata for the monitor",
            advanced=True,
        )

    class Output(BlockSchemaOutput):
        monitor_id: str = SchemaField(
            description="The unique identifier for the monitor"
        )
        status: str = SchemaField(description="Updated status of the monitor")
        next_run_at: Optional[str] = SchemaField(
            description="When the monitor will next run"
        )
        updated_at: str = SchemaField(description="When the monitor was updated")
        success: str = SchemaField(description="Whether the update was successful")

    def __init__(self):
        super().__init__(
            id="245102c3-6af3-4515-a308-c2210b7939d2",
            description="Update a monitor's status, schedule, or metadata",
            categories={BlockCategory.SEARCH},
            input_schema=ExaUpdateMonitorBlock.Input,
            output_schema=ExaUpdateMonitorBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        # Build update payload
        payload = {}

        if input_data.status is not None:
            payload["status"] = input_data.status.value

        if input_data.cron_expression is not None or input_data.timezone is not None:
            cadence = {}
            if input_data.cron_expression:
                cadence["cron"] = input_data.cron_expression
            if input_data.timezone:
                cadence["timezone"] = input_data.timezone
            payload["cadence"] = cadence

        if input_data.metadata is not None:
            payload["metadata"] = input_data.metadata

        sdk_monitor = aexa.websets.monitors.update(
            monitor_id=input_data.monitor_id, params=payload
        )

        # Convert to our stable model
        monitor = MonitorModel.from_sdk(sdk_monitor)

        # Yield fields
        yield "monitor_id", monitor.id
        yield "status", monitor.status
        yield "next_run_at", monitor.next_run_at
        yield "updated_at", monitor.updated_at
        yield "success", "true"

class ExaDeleteMonitorBlock(Block):
    """Delete a monitor from a webset."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        monitor_id: str = SchemaField(
            description="The ID of the monitor to delete",
            placeholder="monitor-id",
        )

    class Output(BlockSchemaOutput):
        monitor_id: str = SchemaField(description="The ID of the deleted monitor")
        success: str = SchemaField(description="Whether the deletion was successful")

    def __init__(self):
        super().__init__(
            id="f16f9b10-0c4d-4db8-997d-7b96b6026094",
            description="Delete a monitor from a webset",
            categories={BlockCategory.SEARCH},
            input_schema=ExaDeleteMonitorBlock.Input,
            output_schema=ExaDeleteMonitorBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        deleted_monitor = aexa.websets.monitors.delete(monitor_id=input_data.monitor_id)

        yield "monitor_id", deleted_monitor.id
        yield "success", "true"

class ExaListMonitorsBlock(Block):
|
||||
"""List all monitors with pagination."""
|
||||
|
||||
class Input(BlockSchemaInput):
|
||||
credentials: CredentialsMetaInput = exa.credentials_field(
|
||||
description="The Exa integration requires an API Key."
|
||||
)
|
||||
webset_id: Optional[str] = SchemaField(
|
||||
default=None,
|
||||
description="Filter monitors by webset ID",
|
||||
placeholder="webset-id",
|
||||
)
|
||||
limit: int = SchemaField(
|
||||
default=25,
|
||||
description="Number of monitors to return",
|
||||
ge=1,
|
||||
le=100,
|
||||
)
|
||||
cursor: Optional[str] = SchemaField(
|
||||
default=None,
|
||||
description="Cursor for pagination",
|
||||
advanced=True,
|
||||
)
|
||||
|
||||
class Output(BlockSchemaOutput):
|
||||
monitors: list[dict] = SchemaField(description="List of monitors")
|
||||
monitor: dict = SchemaField(
|
||||
description="Individual monitor (yielded for each monitor)"
|
||||
)
|
||||
has_more: bool = SchemaField(
|
||||
description="Whether there are more monitors to paginate through"
|
||||
)
|
||||
next_cursor: Optional[str] = SchemaField(
|
||||
description="Cursor for the next page of results"
|
||||
)
|
||||
|
||||
def __init__(self):
|
||||
super().__init__(
|
||||
id="f06e2b38-5397-4e8f-aa85-491149dd98df",
|
||||
description="List all monitors with optional webset filtering",
|
||||
categories={BlockCategory.SEARCH},
|
||||
input_schema=ExaListMonitorsBlock.Input,
|
||||
output_schema=ExaListMonitorsBlock.Output,
|
||||
)
|
||||
|
||||
async def run(
|
||||
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
|
||||
) -> BlockOutput:
|
||||
# Use AsyncExa SDK
|
||||
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
|
||||
|
||||
response = aexa.websets.monitors.list(
|
||||
cursor=input_data.cursor,
|
||||
limit=input_data.limit,
|
||||
webset_id=input_data.webset_id,
|
||||
)
|
||||
|
||||
# Convert SDK monitors to our stable models
|
||||
monitors = [MonitorModel.from_sdk(m) for m in response.data]
|
||||
|
||||
# Yield the full list
|
||||
yield "monitors", [m.model_dump() for m in monitors]
|
||||
|
||||
# Yield individual monitors for graph chaining
|
||||
for monitor in monitors:
|
||||
yield "monitor", monitor.model_dump()
|
||||
|
||||
# Yield pagination metadata
|
||||
yield "has_more", response.has_more
|
||||
yield "next_cursor", response.next_cursor
|
||||
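The `has_more`/`next_cursor` outputs above follow a conventional cursor-pagination contract. As a rough sketch of how a consumer drains it (`list_page` is a hypothetical async callable standing in for the block's `monitors.list` call; it is not part of this diff):

```python
# Sketch: draining a cursor-paginated listing. `list_page` is a hypothetical
# async callable returning {"monitors": [...], "has_more": bool, "next_cursor": str}.
async def collect_all_monitors(list_page) -> list[dict]:
    monitors: list[dict] = []
    cursor = None
    while True:
        page = await list_page(cursor=cursor, limit=100)
        monitors.extend(page["monitors"])
        if not page["has_more"]:  # last page reached
            return monitors
        cursor = page["next_cursor"]  # feed the cursor back in
```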
autogpt_platform/backend/backend/blocks/exa/websets_polling.py (new file, 600 lines added)
@@ -0,0 +1,600 @@
"""
|
||||
Exa Websets Polling Blocks
|
||||
|
||||
This module provides dedicated polling blocks for waiting on webset operations
|
||||
to complete, with progress tracking and timeout management.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import time
|
||||
from enum import Enum
|
||||
from typing import Any, Dict
|
||||
|
||||
from exa_py import AsyncExa
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.sdk import (
|
||||
APIKeyCredentials,
|
||||
Block,
|
||||
BlockCategory,
|
||||
BlockOutput,
|
||||
BlockSchemaInput,
|
||||
BlockSchemaOutput,
|
||||
CredentialsMetaInput,
|
||||
SchemaField,
|
||||
)
|
||||
|
||||
from ._config import exa
|
||||
|
||||
# Import WebsetItemModel for use in enrichment samples
|
||||
# This is safe as websets_items doesn't import from websets_polling
|
||||
from .websets_items import WebsetItemModel
|
||||
|
||||
|
||||
# Model for sample enrichment data
|
||||
class SampleEnrichmentModel(BaseModel):
|
||||
"""Sample enrichment result for display."""
|
||||
|
||||
item_id: str
|
||||
item_title: str
|
||||
enrichment_data: Dict[str, Any]
|
||||
|
||||
|
||||
class WebsetTargetStatus(str, Enum):
|
||||
IDLE = "idle"
|
||||
COMPLETED = "completed"
|
||||
RUNNING = "running"
|
||||
PAUSED = "paused"
|
||||
ANY_COMPLETE = "any_complete" # Either idle or completed
|
||||
|
||||
|
||||
class ExaWaitForWebsetBlock(Block):
|
||||
"""Wait for a webset to reach a specific status with progress tracking."""
|
||||
|
||||
class Input(BlockSchemaInput):
|
||||
credentials: CredentialsMetaInput = exa.credentials_field(
|
||||
description="The Exa integration requires an API Key."
|
||||
)
|
||||
webset_id: str = SchemaField(
|
||||
description="The ID or external ID of the Webset to monitor",
|
||||
placeholder="webset-id-or-external-id",
|
||||
)
|
||||
target_status: WebsetTargetStatus = SchemaField(
|
||||
default=WebsetTargetStatus.IDLE,
|
||||
description="Status to wait for (idle=all operations complete, completed=search done, running=actively processing)",
|
||||
)
|
||||
timeout: int = SchemaField(
|
||||
default=300,
|
||||
description="Maximum time to wait in seconds",
|
||||
ge=1,
|
||||
le=1800, # 30 minutes max
|
||||
)
|
||||
check_interval: int = SchemaField(
|
||||
default=5,
|
||||
description="Initial interval between status checks in seconds",
|
||||
advanced=True,
|
||||
ge=1,
|
||||
le=60,
|
||||
)
|
||||
max_interval: int = SchemaField(
|
||||
default=30,
|
||||
description="Maximum interval between checks (for exponential backoff)",
|
||||
advanced=True,
|
||||
ge=5,
|
||||
le=120,
|
||||
)
|
||||
include_progress: bool = SchemaField(
|
||||
default=True,
|
||||
description="Include detailed progress information in output",
|
||||
)
|
||||
|
||||
class Output(BlockSchemaOutput):
|
||||
webset_id: str = SchemaField(description="The webset ID that was monitored")
|
||||
final_status: str = SchemaField(description="The final status of the webset")
|
||||
elapsed_time: float = SchemaField(description="Total time elapsed in seconds")
|
||||
item_count: int = SchemaField(description="Number of items found")
|
||||
search_progress: dict = SchemaField(
|
||||
description="Detailed search progress information"
|
||||
)
|
||||
enrichment_progress: dict = SchemaField(
|
||||
description="Detailed enrichment progress information"
|
||||
)
|
||||
timed_out: bool = SchemaField(description="Whether the operation timed out")
|
||||
|
||||
def __init__(self):
|
||||
super().__init__(
|
||||
id="619d71e8-b72a-434d-8bd4-23376dd0342c",
|
||||
description="Wait for a webset to reach a specific status with progress tracking",
|
||||
categories={BlockCategory.SEARCH},
|
||||
input_schema=ExaWaitForWebsetBlock.Input,
|
||||
output_schema=ExaWaitForWebsetBlock.Output,
|
||||
)
|
||||
|
||||
async def run(
|
||||
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
|
||||
) -> BlockOutput:
|
||||
start_time = time.time()
|
||||
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
|
||||
|
||||
try:
|
||||
if input_data.target_status in [
|
||||
WebsetTargetStatus.IDLE,
|
||||
WebsetTargetStatus.ANY_COMPLETE,
|
||||
]:
|
||||
final_webset = aexa.websets.wait_until_idle(
|
||||
id=input_data.webset_id,
|
||||
timeout=input_data.timeout,
|
||||
poll_interval=input_data.check_interval,
|
||||
)
|
||||
|
||||
elapsed = time.time() - start_time
|
||||
|
||||
status_str = (
|
||||
final_webset.status.value
|
||||
if hasattr(final_webset.status, "value")
|
||||
else str(final_webset.status)
|
||||
)
|
||||
|
||||
item_count = 0
|
||||
if final_webset.searches:
|
||||
for search in final_webset.searches:
|
||||
if search.progress:
|
||||
item_count += search.progress.found
|
||||
|
||||
# Extract progress if requested
|
||||
search_progress = {}
|
||||
enrichment_progress = {}
|
||||
if input_data.include_progress:
|
||||
webset_dict = final_webset.model_dump(
|
||||
by_alias=True, exclude_none=True
|
||||
)
|
||||
search_progress = self._extract_search_progress(webset_dict)
|
||||
enrichment_progress = self._extract_enrichment_progress(webset_dict)
|
||||
|
||||
yield "webset_id", input_data.webset_id
|
||||
yield "final_status", status_str
|
||||
yield "elapsed_time", elapsed
|
||||
yield "item_count", item_count
|
||||
if input_data.include_progress:
|
||||
yield "search_progress", search_progress
|
||||
yield "enrichment_progress", enrichment_progress
|
||||
yield "timed_out", False
|
||||
else:
|
||||
# For other status targets, manually poll
|
||||
interval = input_data.check_interval
|
||||
while time.time() - start_time < input_data.timeout:
|
||||
# Get current webset status
|
||||
webset = aexa.websets.get(id=input_data.webset_id)
|
||||
current_status = (
|
||||
webset.status.value
|
||||
if hasattr(webset.status, "value")
|
||||
else str(webset.status)
|
||||
)
|
||||
|
||||
# Check if target status reached
|
||||
if current_status == input_data.target_status.value:
|
||||
elapsed = time.time() - start_time
|
||||
|
||||
# Estimate item count from search progress
|
||||
item_count = 0
|
||||
if webset.searches:
|
||||
for search in webset.searches:
|
||||
if search.progress:
|
||||
item_count += search.progress.found
|
||||
|
||||
search_progress = {}
|
||||
enrichment_progress = {}
|
||||
if input_data.include_progress:
|
||||
webset_dict = webset.model_dump(
|
||||
by_alias=True, exclude_none=True
|
||||
)
|
||||
search_progress = self._extract_search_progress(webset_dict)
|
||||
enrichment_progress = self._extract_enrichment_progress(
|
||||
webset_dict
|
||||
)
|
||||
|
||||
yield "webset_id", input_data.webset_id
|
||||
yield "final_status", current_status
|
||||
yield "elapsed_time", elapsed
|
||||
yield "item_count", item_count
|
||||
if input_data.include_progress:
|
||||
yield "search_progress", search_progress
|
||||
yield "enrichment_progress", enrichment_progress
|
||||
yield "timed_out", False
|
||||
return
|
||||
|
||||
# Wait before next check with exponential backoff
|
||||
await asyncio.sleep(interval)
|
||||
interval = min(interval * 1.5, input_data.max_interval)
|
||||
|
||||
# Timeout reached
|
||||
elapsed = time.time() - start_time
|
||||
webset = aexa.websets.get(id=input_data.webset_id)
|
||||
final_status = (
|
||||
webset.status.value
|
||||
if hasattr(webset.status, "value")
|
||||
else str(webset.status)
|
||||
)
|
||||
|
||||
item_count = 0
|
||||
if webset.searches:
|
||||
for search in webset.searches:
|
||||
if search.progress:
|
||||
item_count += search.progress.found
|
||||
|
||||
search_progress = {}
|
||||
enrichment_progress = {}
|
||||
if input_data.include_progress:
|
||||
webset_dict = webset.model_dump(by_alias=True, exclude_none=True)
|
||||
search_progress = self._extract_search_progress(webset_dict)
|
||||
enrichment_progress = self._extract_enrichment_progress(webset_dict)
|
||||
|
||||
yield "webset_id", input_data.webset_id
|
||||
yield "final_status", final_status
|
||||
yield "elapsed_time", elapsed
|
||||
yield "item_count", item_count
|
||||
if input_data.include_progress:
|
||||
yield "search_progress", search_progress
|
||||
yield "enrichment_progress", enrichment_progress
|
||||
yield "timed_out", True
|
||||
|
||||
except asyncio.TimeoutError:
|
||||
raise ValueError(
|
||||
f"Polling timed out after {input_data.timeout} seconds"
|
||||
) from None
|
||||
|
||||
def _extract_search_progress(self, webset_data: dict) -> dict:
|
||||
"""Extract search progress information from webset data."""
|
||||
progress = {}
|
||||
searches = webset_data.get("searches", [])
|
||||
|
||||
for idx, search in enumerate(searches):
|
||||
search_id = search.get("id", f"search_{idx}")
|
||||
search_progress = search.get("progress", {})
|
||||
|
||||
progress[search_id] = {
|
||||
"status": search.get("status", "unknown"),
|
||||
"found": search_progress.get("found", 0),
|
||||
"analyzed": search_progress.get("analyzed", 0),
|
||||
"completion": search_progress.get("completion", 0),
|
||||
"time_left": search_progress.get("timeLeft", 0),
|
||||
}
|
||||
|
||||
return progress
|
||||
|
||||
def _extract_enrichment_progress(self, webset_data: dict) -> dict:
|
||||
"""Extract enrichment progress information from webset data."""
|
||||
progress = {}
|
||||
enrichments = webset_data.get("enrichments", [])
|
||||
|
||||
for idx, enrichment in enumerate(enrichments):
|
||||
enrich_id = enrichment.get("id", f"enrichment_{idx}")
|
||||
|
||||
progress[enrich_id] = {
|
||||
"status": enrichment.get("status", "unknown"),
|
||||
"title": enrichment.get("title", ""),
|
||||
"description": enrichment.get("description", ""),
|
||||
}
|
||||
|
||||
return progress
|
||||
|
||||
|
||||
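All three polling blocks in this file share the same loop shape: check status, sleep, grow the interval by 1.5x up to a cap. A minimal standalone sketch of that pattern (`check_status` is a hypothetical coroutine, not part of this diff):

```python
import asyncio
import time


# Sketch of the shared polling loop: exponential backoff capped at max_interval.
async def poll_until_done(check_status, timeout=300.0, interval=5.0, max_interval=30.0):
    start = time.time()
    status = None
    while time.time() - start < timeout:
        status = await check_status()  # hypothetical coroutine returning a status string
        if status in ("completed", "failed", "canceled"):
            return status, False  # finished before the deadline
        await asyncio.sleep(interval)
        interval = min(interval * 1.5, max_interval)  # back off, capped
    return status, True  # timed out; report last known status
```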
class ExaWaitForSearchBlock(Block):
    """Wait for a specific webset search to complete with progress tracking."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        search_id: str = SchemaField(
            description="The ID of the search to monitor",
            placeholder="search-id",
        )
        timeout: int = SchemaField(
            default=300,
            description="Maximum time to wait in seconds",
            ge=1,
            le=1800,
        )
        check_interval: int = SchemaField(
            default=5,
            description="Initial interval between status checks in seconds",
            advanced=True,
            ge=1,
            le=60,
        )

    class Output(BlockSchemaOutput):
        search_id: str = SchemaField(description="The search ID that was monitored")
        final_status: str = SchemaField(description="The final status of the search")
        items_found: int = SchemaField(
            description="Number of items found by the search"
        )
        items_analyzed: int = SchemaField(description="Number of items analyzed")
        completion_percentage: int = SchemaField(
            description="Completion percentage (0-100)"
        )
        elapsed_time: float = SchemaField(description="Total time elapsed in seconds")
        recall_info: dict = SchemaField(
            description="Information about expected results and confidence"
        )
        timed_out: bool = SchemaField(description="Whether the operation timed out")

    def __init__(self):
        super().__init__(
            id="14da21ae-40a1-41bc-a111-c8e5c9ef012b",
            description="Wait for a specific webset search to complete with progress tracking",
            categories={BlockCategory.SEARCH},
            input_schema=ExaWaitForSearchBlock.Input,
            output_schema=ExaWaitForSearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        start_time = time.time()
        interval = input_data.check_interval
        max_interval = 30
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        try:
            while time.time() - start_time < input_data.timeout:
                # Get current search status using SDK
                search = aexa.websets.searches.get(
                    webset_id=input_data.webset_id, id=input_data.search_id
                )

                # Extract status
                status = (
                    search.status.value
                    if hasattr(search.status, "value")
                    else str(search.status)
                )

                # Check if search is complete
                if status in ["completed", "failed", "canceled"]:
                    elapsed = time.time() - start_time

                    # Extract progress information
                    progress_dict = {}
                    if search.progress:
                        progress_dict = search.progress.model_dump(
                            by_alias=True, exclude_none=True
                        )

                    # Extract recall information
                    recall_info = {}
                    if search.recall:
                        recall_dict = search.recall.model_dump(
                            by_alias=True, exclude_none=True
                        )
                        expected = recall_dict.get("expected", {})
                        recall_info = {
                            "expected_total": expected.get("total", 0),
                            "confidence": expected.get("confidence", ""),
                            "min_expected": expected.get("bounds", {}).get("min", 0),
                            "max_expected": expected.get("bounds", {}).get("max", 0),
                            "reasoning": recall_dict.get("reasoning", ""),
                        }

                    yield "search_id", input_data.search_id
                    yield "final_status", status
                    yield "items_found", progress_dict.get("found", 0)
                    yield "items_analyzed", progress_dict.get("analyzed", 0)
                    yield "completion_percentage", progress_dict.get("completion", 0)
                    yield "elapsed_time", elapsed
                    yield "recall_info", recall_info
                    yield "timed_out", False

                    return

                # Wait before next check with exponential backoff
                await asyncio.sleep(interval)
                interval = min(interval * 1.5, max_interval)

            # Timeout reached
            elapsed = time.time() - start_time

            # Get last known status
            search = aexa.websets.searches.get(
                webset_id=input_data.webset_id, id=input_data.search_id
            )
            final_status = (
                search.status.value
                if hasattr(search.status, "value")
                else str(search.status)
            )

            progress_dict = {}
            if search.progress:
                progress_dict = search.progress.model_dump(
                    by_alias=True, exclude_none=True
                )

            yield "search_id", input_data.search_id
            yield "final_status", final_status
            yield "items_found", progress_dict.get("found", 0)
            yield "items_analyzed", progress_dict.get("analyzed", 0)
            yield "completion_percentage", progress_dict.get("completion", 0)
            yield "elapsed_time", elapsed
            yield "timed_out", True

        except asyncio.TimeoutError:
            raise ValueError(
                f"Search polling timed out after {input_data.timeout} seconds"
            ) from None

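The `recall_info` assembly above pulls nested fields out of the SDK's recall estimate. A worked example with invented values (the dict shape mirrors the block's use of `recall.model_dump(by_alias=True)`):

```python
# Illustrative only: the values are made up; the shape matches the block's usage.
recall_dict = {
    "expected": {"total": 120, "confidence": "high", "bounds": {"min": 90, "max": 150}},
    "reasoning": "Query targets a well-populated entity space.",
}
expected = recall_dict.get("expected", {})
recall_info = {
    "expected_total": expected.get("total", 0),
    "confidence": expected.get("confidence", ""),
    "min_expected": expected.get("bounds", {}).get("min", 0),
    "max_expected": expected.get("bounds", {}).get("max", 0),
    "reasoning": recall_dict.get("reasoning", ""),
}
assert recall_info["min_expected"] == 90  # defaults kick in when fields are absent
```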
class ExaWaitForEnrichmentBlock(Block):
    """Wait for a webset enrichment to complete with progress tracking."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        enrichment_id: str = SchemaField(
            description="The ID of the enrichment to monitor",
            placeholder="enrichment-id",
        )
        timeout: int = SchemaField(
            default=300,
            description="Maximum time to wait in seconds",
            ge=1,
            le=1800,
        )
        check_interval: int = SchemaField(
            default=5,
            description="Initial interval between status checks in seconds",
            advanced=True,
            ge=1,
            le=60,
        )
        sample_results: bool = SchemaField(
            default=True,
            description="Include sample enrichment results in output",
        )

    class Output(BlockSchemaOutput):
        enrichment_id: str = SchemaField(
            description="The enrichment ID that was monitored"
        )
        final_status: str = SchemaField(
            description="The final status of the enrichment"
        )
        items_enriched: int = SchemaField(
            description="Number of items successfully enriched"
        )
        enrichment_title: str = SchemaField(
            description="Title/description of the enrichment"
        )
        elapsed_time: float = SchemaField(description="Total time elapsed in seconds")
        sample_data: list[SampleEnrichmentModel] = SchemaField(
            description="Sample of enriched data (if requested)"
        )
        timed_out: bool = SchemaField(description="Whether the operation timed out")

    def __init__(self):
        super().__init__(
            id="a11865c3-ac80-4721-8a40-ac4e3b71a558",
            description="Wait for a webset enrichment to complete with progress tracking",
            categories={BlockCategory.SEARCH},
            input_schema=ExaWaitForEnrichmentBlock.Input,
            output_schema=ExaWaitForEnrichmentBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        start_time = time.time()
        interval = input_data.check_interval
        max_interval = 30
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        try:
            while time.time() - start_time < input_data.timeout:
                # Get current enrichment status using SDK
                enrichment = aexa.websets.enrichments.get(
                    webset_id=input_data.webset_id, id=input_data.enrichment_id
                )

                # Extract status
                status = (
                    enrichment.status.value
                    if hasattr(enrichment.status, "value")
                    else str(enrichment.status)
                )

                # Check if enrichment is complete
                if status in ["completed", "failed", "canceled"]:
                    elapsed = time.time() - start_time

                    # Get sample enriched items if requested
                    sample_data = []
                    items_enriched = 0

                    if input_data.sample_results and status == "completed":
                        sample_data, items_enriched = (
                            await self._get_sample_enrichments(
                                input_data.webset_id, input_data.enrichment_id, aexa
                            )
                        )

                    yield "enrichment_id", input_data.enrichment_id
                    yield "final_status", status
                    yield "items_enriched", items_enriched
                    yield "enrichment_title", enrichment.title or enrichment.description or ""
                    yield "elapsed_time", elapsed
                    if input_data.sample_results:
                        yield "sample_data", sample_data
                    yield "timed_out", False

                    return

                # Wait before next check with exponential backoff
                await asyncio.sleep(interval)
                interval = min(interval * 1.5, max_interval)

            # Timeout reached
            elapsed = time.time() - start_time

            # Get last known status
            enrichment = aexa.websets.enrichments.get(
                webset_id=input_data.webset_id, id=input_data.enrichment_id
            )
            final_status = (
                enrichment.status.value
                if hasattr(enrichment.status, "value")
                else str(enrichment.status)
            )
            title = enrichment.title or enrichment.description or ""

            yield "enrichment_id", input_data.enrichment_id
            yield "final_status", final_status
            yield "items_enriched", 0
            yield "enrichment_title", title
            yield "elapsed_time", elapsed
            yield "timed_out", True

        except asyncio.TimeoutError:
            raise ValueError(
                f"Enrichment polling timed out after {input_data.timeout} seconds"
            ) from None

    async def _get_sample_enrichments(
        self, webset_id: str, enrichment_id: str, aexa: AsyncExa
    ) -> tuple[list[SampleEnrichmentModel], int]:
        """Get sample enriched data and count."""
        # Get a few items to see enrichment results using SDK
        response = aexa.websets.items.list(webset_id=webset_id, limit=5)

        sample_data: list[SampleEnrichmentModel] = []
        enriched_count = 0

        for sdk_item in response.data:
            # Convert to our WebsetItemModel first
            item = WebsetItemModel.from_sdk(sdk_item)

            # Check if this item has the enrichment we're looking for
            if enrichment_id in item.enrichments:
                enriched_count += 1
                enrich_model = item.enrichments[enrichment_id]

                # Create sample using our typed model
                sample = SampleEnrichmentModel(
                    item_id=item.id,
                    item_title=item.title,
                    enrichment_data=enrich_model.model_dump(exclude_none=True),
                )
                sample_data.append(sample)

        return sample_data, enriched_count
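`_get_sample_enrichments` boils down to filtering the first few items for ones that carry the target enrichment id. The same logic on made-up plain dicts (the real block works on `WebsetItemModel` instances):

```python
# Illustrative data only.
items = [
    {"id": "item_1", "title": "Acme", "enrichments": {"enr_1": {"value": "a"}}},
    {"id": "item_2", "title": "Globex", "enrichments": {}},
]
samples = [i for i in items if "enr_1" in i["enrichments"]]
print(len(samples), samples[0]["id"])  # 1 item_1
```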
autogpt_platform/backend/backend/blocks/exa/websets_search.py (new file, 650 lines added)
@@ -0,0 +1,650 @@
"""
|
||||
Exa Websets Search Management Blocks
|
||||
|
||||
This module provides blocks for creating and managing searches within websets,
|
||||
including adding new searches, checking status, and canceling operations.
|
||||
"""
|
||||
|
||||
from enum import Enum
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
from exa_py import AsyncExa
|
||||
from exa_py.websets.types import WebsetSearch as SdkWebsetSearch
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.sdk import (
|
||||
APIKeyCredentials,
|
||||
Block,
|
||||
BlockCategory,
|
||||
BlockOutput,
|
||||
BlockSchemaInput,
|
||||
BlockSchemaOutput,
|
||||
CredentialsMetaInput,
|
||||
SchemaField,
|
||||
)
|
||||
|
||||
from ._config import exa
|
||||
|
||||
|
||||
# Mirrored model for stability
|
||||
class WebsetSearchModel(BaseModel):
|
||||
"""Stable output model mirroring SDK WebsetSearch."""
|
||||
|
||||
id: str
|
||||
webset_id: str
|
||||
status: str
|
||||
query: str
|
||||
entity_type: str
|
||||
criteria: List[Dict[str, Any]]
|
||||
count: int
|
||||
behavior: str
|
||||
progress: Dict[str, Any]
|
||||
recall: Optional[Dict[str, Any]]
|
||||
created_at: str
|
||||
updated_at: str
|
||||
canceled_at: Optional[str]
|
||||
canceled_reason: Optional[str]
|
||||
metadata: Dict[str, Any]
|
||||
|
||||
@classmethod
|
||||
def from_sdk(cls, search: SdkWebsetSearch) -> "WebsetSearchModel":
|
||||
"""Convert SDK WebsetSearch to our stable model."""
|
||||
# Extract entity type
|
||||
entity_type = "auto"
|
||||
if search.entity:
|
||||
entity_dict = search.entity.model_dump(by_alias=True)
|
||||
entity_type = entity_dict.get("type", "auto")
|
||||
|
||||
# Convert criteria
|
||||
criteria = [c.model_dump(by_alias=True) for c in search.criteria]
|
||||
|
||||
# Convert progress
|
||||
progress_dict = {}
|
||||
if search.progress:
|
||||
progress_dict = search.progress.model_dump(by_alias=True)
|
||||
|
||||
# Convert recall
|
||||
recall_dict = None
|
||||
if search.recall:
|
||||
recall_dict = search.recall.model_dump(by_alias=True)
|
||||
|
||||
return cls(
|
||||
id=search.id,
|
||||
webset_id=search.webset_id,
|
||||
status=(
|
||||
search.status.value
|
||||
if hasattr(search.status, "value")
|
||||
else str(search.status)
|
||||
),
|
||||
query=search.query,
|
||||
entity_type=entity_type,
|
||||
criteria=criteria,
|
||||
count=search.count,
|
||||
behavior=search.behavior.value if search.behavior else "override",
|
||||
progress=progress_dict,
|
||||
recall=recall_dict,
|
||||
created_at=search.created_at.isoformat() if search.created_at else "",
|
||||
updated_at=search.updated_at.isoformat() if search.updated_at else "",
|
||||
canceled_at=search.canceled_at.isoformat() if search.canceled_at else None,
|
||||
canceled_reason=(
|
||||
search.canceled_reason.value if search.canceled_reason else None
|
||||
),
|
||||
metadata=search.metadata if search.metadata else {},
|
||||
)
|
||||
|
||||
|
||||
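`from_sdk` normalizes every enum-or-string field with the `x.value if hasattr(x, "value") else str(x)` idiom, which recurs throughout these files. A possible consolidation, sketched here purely as a suggestion (nothing like it exists in this diff):

```python
from enum import Enum
from typing import Any


def as_status_str(status: Any) -> str:
    """Return a plain string whether `status` is an Enum member or already a string."""
    return status.value if isinstance(status, Enum) else str(status)


class Color(Enum):
    RED = "red"


assert as_status_str(Color.RED) == "red"
assert as_status_str("running") == "running"
```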
class SearchBehavior(str, Enum):
    """Behavior for how new search results interact with existing items."""

    OVERRIDE = "override"  # Replace existing items
    APPEND = "append"  # Add to existing items
    MERGE = "merge"  # Merge with existing items


class SearchEntityType(str, Enum):
    COMPANY = "company"
    PERSON = "person"
    ARTICLE = "article"
    RESEARCH_PAPER = "research_paper"
    CUSTOM = "custom"
    AUTO = "auto"


class ExaCreateWebsetSearchBlock(Block):
    """Add a new search to an existing webset."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        query: str = SchemaField(
            description="Search query describing what to find",
            placeholder="Engineering managers at Fortune 500 companies",
        )
        count: int = SchemaField(
            default=10,
            description="Number of items to find",
            ge=1,
            le=1000,
        )

        # Entity configuration
        entity_type: SearchEntityType = SchemaField(
            default=SearchEntityType.AUTO,
            description="Type of entity to search for",
        )
        entity_description: Optional[str] = SchemaField(
            default=None,
            description="Description for custom entity type",
            advanced=True,
        )

        # Criteria for verification
        criteria: list[str] = SchemaField(
            default_factory=list,
            description="List of criteria that items must meet. If not provided, auto-detected from query.",
            advanced=True,
        )

        # Advanced search options
        behavior: SearchBehavior = SchemaField(
            default=SearchBehavior.APPEND,
            description="How new results interact with existing items",
            advanced=True,
        )
        recall: bool = SchemaField(
            default=True,
            description="Enable recall estimation for expected results",
            advanced=True,
        )

        # Exclude sources
        exclude_source_ids: list[str] = SchemaField(
            default_factory=list,
            description="IDs of imports/websets to exclude from results",
            advanced=True,
        )
        exclude_source_types: list[str] = SchemaField(
            default_factory=list,
            description="Types of sources to exclude ('import' or 'webset')",
            advanced=True,
        )

        # Scope sources
        scope_source_ids: list[str] = SchemaField(
            default_factory=list,
            description="IDs of imports/websets to limit search scope to",
            advanced=True,
        )
        scope_source_types: list[str] = SchemaField(
            default_factory=list,
            description="Types of scope sources ('import' or 'webset')",
            advanced=True,
        )
        scope_relationships: list[str] = SchemaField(
            default_factory=list,
            description="Relationship definitions for hop searches",
            advanced=True,
        )
        scope_relationship_limits: list[int] = SchemaField(
            default_factory=list,
            description="Limits on related entities to find",
            advanced=True,
        )

        metadata: Optional[dict] = SchemaField(
            default=None,
            description="Metadata to attach to the search",
            advanced=True,
        )

        # Polling options
        wait_for_completion: bool = SchemaField(
            default=False,
            description="Wait for the search to complete before returning",
        )
        polling_timeout: int = SchemaField(
            default=300,
            description="Maximum time to wait for completion in seconds",
            advanced=True,
            ge=1,
            le=600,
        )

    class Output(BlockSchemaOutput):
        search_id: str = SchemaField(
            description="The unique identifier for the created search"
        )
        webset_id: str = SchemaField(description="The webset this search belongs to")
        status: str = SchemaField(description="Current status of the search")
        query: str = SchemaField(description="The search query")
        expected_results: dict = SchemaField(
            description="Recall estimation of expected results"
        )
        items_found: Optional[int] = SchemaField(
            description="Number of items found (if wait_for_completion was True)"
        )
        completion_time: Optional[float] = SchemaField(
            description="Time taken to complete in seconds (if wait_for_completion was True)"
        )

    def __init__(self):
        super().__init__(
            id="342ff776-2e2c-4cdb-b392-4eeb34b21d5f",
            description="Add a new search to an existing webset to find more items",
            categories={BlockCategory.SEARCH},
            input_schema=ExaCreateWebsetSearchBlock.Input,
            output_schema=ExaCreateWebsetSearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        import time

        # Build the payload
        payload = {
            "query": input_data.query,
            "count": input_data.count,
            "behavior": input_data.behavior.value,
            "recall": input_data.recall,
        }

        # Add entity configuration
        if input_data.entity_type != SearchEntityType.AUTO:
            entity = {"type": input_data.entity_type.value}
            if (
                input_data.entity_type == SearchEntityType.CUSTOM
                and input_data.entity_description
            ):
                entity["description"] = input_data.entity_description
            payload["entity"] = entity

        # Add criteria if provided
        if input_data.criteria:
            payload["criteria"] = [{"description": c} for c in input_data.criteria]

        # Add exclude sources
        if input_data.exclude_source_ids:
            exclude_list = []
            for idx, src_id in enumerate(input_data.exclude_source_ids):
                src_type = "import"
                if input_data.exclude_source_types and idx < len(
                    input_data.exclude_source_types
                ):
                    src_type = input_data.exclude_source_types[idx]
                exclude_list.append({"source": src_type, "id": src_id})
            payload["exclude"] = exclude_list

        # Add scope sources
        if input_data.scope_source_ids:
            scope_list: list[dict[str, Any]] = []
            for idx, src_id in enumerate(input_data.scope_source_ids):
                scope_item: dict[str, Any] = {"source": "import", "id": src_id}

                if input_data.scope_source_types and idx < len(
                    input_data.scope_source_types
                ):
                    scope_item["source"] = input_data.scope_source_types[idx]

                # Add relationship if provided
                if input_data.scope_relationships and idx < len(
                    input_data.scope_relationships
                ):
                    relationship: dict[str, Any] = {
                        "definition": input_data.scope_relationships[idx]
                    }
                    if input_data.scope_relationship_limits and idx < len(
                        input_data.scope_relationship_limits
                    ):
                        relationship["limit"] = input_data.scope_relationship_limits[
                            idx
                        ]
                    scope_item["relationship"] = relationship

                scope_list.append(scope_item)
            payload["scope"] = scope_list

        # Add metadata if provided
        if input_data.metadata:
            payload["metadata"] = input_data.metadata

        start_time = time.time()

        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        sdk_search = aexa.websets.searches.create(
            webset_id=input_data.webset_id, params=payload
        )

        search_id = sdk_search.id
        status = (
            sdk_search.status.value
            if hasattr(sdk_search.status, "value")
            else str(sdk_search.status)
        )

        # Extract expected results from recall
        expected_results = {}
        if sdk_search.recall:
            recall_dict = sdk_search.recall.model_dump(by_alias=True)
            expected = recall_dict.get("expected", {})
            expected_results = {
                "total": expected.get("total", 0),
                "confidence": expected.get("confidence", ""),
                "min": expected.get("bounds", {}).get("min", 0),
                "max": expected.get("bounds", {}).get("max", 0),
                "reasoning": recall_dict.get("reasoning", ""),
            }

        # If wait_for_completion is True, poll for completion
        if input_data.wait_for_completion:
            import asyncio

            poll_interval = 5
            max_interval = 30
            poll_start = time.time()

            while time.time() - poll_start < input_data.polling_timeout:
                current_search = aexa.websets.searches.get(
                    webset_id=input_data.webset_id, id=search_id
                )
                current_status = (
                    current_search.status.value
                    if hasattr(current_search.status, "value")
                    else str(current_search.status)
                )

                if current_status in ["completed", "failed", "cancelled"]:
                    items_found = 0
                    if current_search.progress:
                        items_found = current_search.progress.found
                    completion_time = time.time() - start_time

                    yield "search_id", search_id
                    yield "webset_id", input_data.webset_id
                    yield "status", current_status
                    yield "query", input_data.query
                    yield "expected_results", expected_results
                    yield "items_found", items_found
                    yield "completion_time", completion_time
                    return

                await asyncio.sleep(poll_interval)
                poll_interval = min(poll_interval * 1.5, max_interval)

            # Timeout - yield what we have
            yield "search_id", search_id
            yield "webset_id", input_data.webset_id
            yield "status", status
            yield "query", input_data.query
            yield "expected_results", expected_results
            yield "items_found", 0
            yield "completion_time", time.time() - start_time
        else:
            yield "search_id", search_id
            yield "webset_id", input_data.webset_id
            yield "status", status
            yield "query", input_data.query
            yield "expected_results", expected_results

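The exclude/scope assembly above pairs `*_ids`, `*_types`, and relationship settings by list index, defaulting the source type to `"import"` when the type list is shorter. The same pattern in isolation, with invented inputs:

```python
# Illustrative only: index-aligned lists zipped into scope entries.
scope_ids = ["imp_1", "ws_2", "imp_3"]
scope_types = ["import", "webset"]  # deliberately shorter than scope_ids

scope = []
for idx, src_id in enumerate(scope_ids):
    src_type = scope_types[idx] if idx < len(scope_types) else "import"  # block's default
    scope.append({"source": src_type, "id": src_id})

print(scope[-1])  # {'source': 'import', 'id': 'imp_3'}
```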
class ExaGetWebsetSearchBlock(Block):
    """Get the status and details of a webset search."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        search_id: str = SchemaField(
            description="The ID of the search to retrieve",
            placeholder="search-id",
        )

    class Output(BlockSchemaOutput):
        search_id: str = SchemaField(description="The unique identifier for the search")
        status: str = SchemaField(description="Current status of the search")
        query: str = SchemaField(description="The search query")
        entity_type: str = SchemaField(description="Type of entity being searched")
        criteria: list[dict] = SchemaField(description="Criteria used for verification")
        progress: dict = SchemaField(description="Search progress information")
        recall: dict = SchemaField(description="Recall estimation information")
        created_at: str = SchemaField(description="When the search was created")
        updated_at: str = SchemaField(description="When the search was last updated")
        canceled_at: Optional[str] = SchemaField(
            description="When the search was canceled (if applicable)"
        )
        canceled_reason: Optional[str] = SchemaField(
            description="Reason for cancellation (if applicable)"
        )
        metadata: dict = SchemaField(description="Metadata attached to the search")

    def __init__(self):
        super().__init__(
            id="4fa3e627-a0ff-485f-8732-52148051646c",
            description="Get the status and details of a webset search",
            categories={BlockCategory.SEARCH},
            input_schema=ExaGetWebsetSearchBlock.Input,
            output_schema=ExaGetWebsetSearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        sdk_search = aexa.websets.searches.get(
            webset_id=input_data.webset_id, id=input_data.search_id
        )

        search = WebsetSearchModel.from_sdk(sdk_search)

        # Extract progress information
        progress_info = {
            "found": search.progress.get("found", 0),
            "analyzed": search.progress.get("analyzed", 0),
            "completion": search.progress.get("completion", 0),
            "time_left": search.progress.get("timeLeft", 0),
        }

        # Extract recall information
        recall_data = {}
        if search.recall:
            expected = search.recall.get("expected", {})
            recall_data = {
                "expected_total": expected.get("total", 0),
                "confidence": expected.get("confidence", ""),
                "min_expected": expected.get("bounds", {}).get("min", 0),
                "max_expected": expected.get("bounds", {}).get("max", 0),
                "reasoning": search.recall.get("reasoning", ""),
            }

        yield "search_id", search.id
        yield "status", search.status
        yield "query", search.query
        yield "entity_type", search.entity_type
        yield "criteria", search.criteria
        yield "progress", progress_info
        yield "recall", recall_data
        yield "created_at", search.created_at
        yield "updated_at", search.updated_at
        yield "canceled_at", search.canceled_at
        yield "canceled_reason", search.canceled_reason
        yield "metadata", search.metadata


class ExaCancelWebsetSearchBlock(Block):
    """Cancel a running webset search."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        search_id: str = SchemaField(
            description="The ID of the search to cancel",
            placeholder="search-id",
        )

    class Output(BlockSchemaOutput):
        search_id: str = SchemaField(description="The ID of the canceled search")
        status: str = SchemaField(description="Status after cancellation")
        items_found_before_cancel: int = SchemaField(
            description="Number of items found before cancellation"
        )
        success: str = SchemaField(
            description="Whether the cancellation was successful"
        )

    def __init__(self):
        super().__init__(
            id="74ef9f1e-ae89-4c7f-9d7d-d217214815b4",
            description="Cancel a running webset search",
            categories={BlockCategory.SEARCH},
            input_schema=ExaCancelWebsetSearchBlock.Input,
            output_schema=ExaCancelWebsetSearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        canceled_search = aexa.websets.searches.cancel(
            webset_id=input_data.webset_id, id=input_data.search_id
        )

        # Extract items found before cancellation
        items_found = 0
        if canceled_search.progress:
            items_found = canceled_search.progress.found

        status = (
            canceled_search.status.value
            if hasattr(canceled_search.status, "value")
            else str(canceled_search.status)
        )

        yield "search_id", canceled_search.id
        yield "status", status
        yield "items_found_before_cancel", items_found
        yield "success", "true"


class ExaFindOrCreateSearchBlock(Block):
    """Find an existing search by query or create a new one (prevents duplicate searches)."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        query: str = SchemaField(
            description="Search query to find or create",
            placeholder="AI companies in San Francisco",
        )
        count: int = SchemaField(
            default=10,
            description="Number of items to find (only used if creating a new search)",
            ge=1,
            le=1000,
        )
        entity_type: SearchEntityType = SchemaField(
            default=SearchEntityType.AUTO,
            description="Entity type (only used if creating)",
            advanced=True,
        )
        behavior: SearchBehavior = SchemaField(
            default=SearchBehavior.OVERRIDE,
            description="Search behavior (only used if creating)",
            advanced=True,
        )

    class Output(BlockSchemaOutput):
        search_id: str = SchemaField(description="The search ID (existing or new)")
        webset_id: str = SchemaField(description="The webset ID")
        status: str = SchemaField(description="Current search status")
        query: str = SchemaField(description="The search query")
        was_created: bool = SchemaField(
            description="True if the search was newly created, False if it already existed"
        )
        items_found: int = SchemaField(
            description="Number of items found (0 if still running)"
        )

    def __init__(self):
        super().__init__(
            id="cbdb05ac-cb73-4b03-a493-6d34e9a011da",
            description="Find an existing search by query or create a new one - prevents duplicate searches in workflows",
            categories={BlockCategory.SEARCH},
            input_schema=ExaFindOrCreateSearchBlock.Input,
            output_schema=ExaFindOrCreateSearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        # Get webset to check existing searches
        webset = aexa.websets.get(id=input_data.webset_id)

        # Look for an existing search with the same query
        existing_search = None
        if webset.searches:
            for search in webset.searches:
                if search.query.strip().lower() == input_data.query.strip().lower():
                    existing_search = search
                    break

        if existing_search:
            # Found existing search
            search = WebsetSearchModel.from_sdk(existing_search)

            yield "search_id", search.id
            yield "webset_id", input_data.webset_id
            yield "status", search.status
            yield "query", search.query
            yield "was_created", False
            yield "items_found", search.progress.get("found", 0)
        else:
            # Create new search
            payload: Dict[str, Any] = {
                "query": input_data.query,
                "count": input_data.count,
                "behavior": input_data.behavior.value,
            }

            # Add entity if not auto
            if input_data.entity_type != SearchEntityType.AUTO:
                payload["entity"] = {"type": input_data.entity_type.value}

            sdk_search = aexa.websets.searches.create(
                webset_id=input_data.webset_id, params=payload
            )

            search = WebsetSearchModel.from_sdk(sdk_search)

            yield "search_id", search.id
            yield "webset_id", input_data.webset_id
            yield "status", search.status
            yield "query", search.query
            yield "was_created", True
            yield "items_found", 0  # Newly created, no items yet
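`ExaFindOrCreateSearchBlock` treats two queries as the same search when they match after `strip().lower()`. A two-line demonstration with invented strings:

```python
existing_query = "AI companies in San Francisco"
incoming_query = "  ai companies in san francisco "
# The block's duplicate check, in isolation:
print(existing_query.strip().lower() == incoming_query.strip().lower())  # True
```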
@@ -90,7 +90,13 @@ test.describe("Build", () => { //(1)!
   });
 
   test("user can add blocks starting with e", async () => {
-    await addBlocksStartingWithSplit("e", 1, 1);
+    test.setTimeout(60000); // Increase timeout for many Exa blocks
+    await addBlocksStartingWithSplit("e", 1, 2);
   });
 
+  test("user can add blocks starting with e pt 2", async () => {
+    test.setTimeout(60000); // Increase timeout for many Exa blocks
+    await addBlocksStartingWithSplit("e", 2, 2);
+  });
+
   test("user can add blocks starting with f", async () => {
@@ -113,17 +113,20 @@ export class BuildPage extends BasePage {
       const displayName = this.getDisplayName(block.name);
       await searchInput.clear();
       await searchInput.fill(displayName);
       await this.page.waitForTimeout(500);
 
       const blockCard = this.page.getByTestId(`block-name-${block.id}`);
-      if (await blockCard.isVisible()) {
 
+      try {
+        // Wait for the block card to be visible with a reasonable timeout
+        await blockCard.waitFor({ state: "visible", timeout: 10000 });
         await blockCard.click();
         const blockInEditor = this.page.getByTestId(block.id).first();
         expect(blockInEditor).toBeAttached();
-      } else {
+      } catch (error) {
         console.log(
           `❌ ❌ Block ${block.name} (display: ${displayName}) returned from the API but not found in block list`,
         );
+        console.log(`Error: ${error}`);
       }
     }
@@ -614,6 +614,18 @@ custom_requests = Requests(
)
```

### Error Handling

All blocks should have an error output. Catch all reasonable errors that a user can handle, wrap them in a `ValueError`, and re-raise; don't catch problems only the system admin can fix, such as the platform being out of money or an address being unreachable.
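A minimal sketch of that guideline in block form; every name below is hypothetical and not from this PR:

```python
class UpstreamRequestError(Exception):
    """Stand-in for a user-recoverable failure (bad input, 4xx response, ...)."""


async def fetch(url: str) -> str:
    raise UpstreamRequestError("404 Not Found")


async def run(input_url: str):
    try:
        result = await fetch(input_url)
    except UpstreamRequestError as e:
        # User-fixable, so wrap in ValueError and re-raise; the platform
        # routes it to the block's error output.
        raise ValueError(f"Request to {input_url} failed: {e}") from e
    yield "result", result
```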

### Data Models

Use pydantic `BaseModel`s over `dict` and `TypedDict` where possible. Avoid untyped models for block inputs and outputs as much as possible.
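As a small illustration (hypothetical model, not from this PR):

```python
from pydantic import BaseModel


class MonitorSummary(BaseModel):
    """Typed alternative to passing a raw dict between blocks."""

    id: str
    status: str
    item_count: int = 0


summary = MonitorSummary(id="mon_1", status="idle")
print(summary.model_dump())  # {'id': 'mon_1', 'status': 'idle', 'item_count': 0}
```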

### File Input

You can use MediaFileType to handle importing and exporting files in and out of the system. Explore how it's used throughout the system before using it in a block schema.

## Tips for Effective Block Testing

1. **Provide realistic test_input**: Ensure your test input covers typical use cases.

@@ -633,77 +645,3 @@ custom_requests = Requests(
6. **Update tests when changing block behavior**: If you modify your block, ensure the tests are updated accordingly.

By following these steps, you can create new blocks that extend the functionality of the AutoGPT Agent Server.

## Blocks we want to see

Below is a list of blocks that we would like to see implemented in the AutoGPT Agent Server. If you're interested in contributing, feel free to pick one of these blocks or choose your own.

If you would like to implement one of these blocks, open a pull request and we will start the review process.

### Consumer Services/Platforms

- Google sheets - [~~Read/Append~~](https://github.com/Significant-Gravitas/AutoGPT/pull/8236)
- Email - Read/Send with [~~Gmail~~](https://github.com/Significant-Gravitas/AutoGPT/pull/8236), Outlook, Yahoo, Proton, etc
- Calendar - Read/Write with Google Calendar, Outlook Calendar, etc
- Home Assistant - Call Service, Get Status
- Dominos - Order Pizza, Track Order
- Uber - Book Ride, Track Ride
- Notion - Create/Read Page, Create/Append/Read DB
- Google drive - read/write/overwrite file/folder

### Social Media

- Twitter - Post, Reply, Get Replies, Get Comments, Get Followers, Get Following, Get Tweets, Get Mentions
- Instagram - Post, Reply, Get Comments, Get Followers, Get Following, Get Posts, Get Mentions, Get Trending Posts
- TikTok - Post, Reply, Get Comments, Get Followers, Get Following, Get Videos, Get Mentions, Get Trending Videos
- LinkedIn - Post, Reply, Get Comments, Get Followers, Get Following, Get Posts, Get Mentions, Get Trending Posts
- YouTube - Transcribe Videos/Shorts, Post Videos/Shorts, Read/Reply/React to Comments, Update Thumbnails, Update Description, Update Tags, Update Titles, Get Views, Get Likes, Get Dislikes, Get Subscribers, Get Comments, Get Shares, Get Watch Time, Get Revenue, Get Trending Videos, Get Top Videos, Get Top Channels
- Reddit - Post, Reply, Get Comments, Get Followers, Get Following, Get Posts, Get Mentions, Get Trending Posts
- Treatwell (and related platforms) - Book, Cancel, Review, Get Recommendations
- Substack - Read/Subscribe/Unsubscribe, Post/Reply, Get Recommendations
- Discord - Read/Post/Reply, Moderation actions
- GoodReads - Read/Post/Reply, Get Recommendations

### E-commerce

- Airbnb - Book, Cancel, Review, Get Recommendations
- Amazon - Order, Track Order, Return, Review, Get Recommendations
- eBay - Order, Track Order, Return, Review, Get Recommendations
- Upwork - Post Jobs, Hire Freelancer, Review Freelancer, Fire Freelancer

### Business Tools

- External Agents - Call other agents similar to AutoGPT
- Trello - Create/Read/Update/Delete Cards, Lists, Boards
- Jira - Create/Read/Update/Delete Issues, Projects, Boards
- Linear - Create/Read/Update/Delete Issues, Projects, Boards
- Excel - Read/Write/Update/Delete Rows, Columns, Sheets
- Slack - Read/Post/Reply to Messages, Create Channels, Invite Users
- ERPNext - Create/Read/Update/Delete Invoices, Orders, Customers, Products
- Salesforce - Create/Read/Update/Delete Leads, Opportunities, Accounts
- HubSpot - Create/Read/Update/Delete Contacts, Deals, Companies
- Zendesk - Create/Read/Update/Delete Tickets, Users, Organizations
- Odoo - Create/Read/Update/Delete Sales Orders, Invoices, Customers
- Shopify - Create/Read/Update/Delete Products, Orders, Customers
- WooCommerce - Create/Read/Update/Delete Products, Orders, Customers
- Squarespace - Create/Read/Update/Delete Pages, Products, Orders

## Agent Templates we want to see

### Data/Information

- Summarize top news of today, this week, or this month via Apple News or other large media outlets (BBC, TechCrunch, Hacker News, etc.)
- Create, read, and summarize Substack newsletters or any newsletters (blog writer vs blog reader)
- Get/read/summarize the most viral Twitter, Instagram, TikTok (general social media) posts of the day, week, month
- Get/read any LinkedIn posts or profiles that mention AI Agents
- Read/summarize Discord (might not be able to do this because you need access)
- Read/get the most-read books in a given month, year, etc. from GoodReads, Amazon Books, etc.
- Get dates for specific shows across all streaming services
- Suggest/recommend/get the most-watched shows in a given month, year, etc. across all streaming platforms
- Data analysis from an xlsx data set
- Gather data via Excel or Google Sheets > sample the data (a sample block takes top X, bottom X, randomly, etc.) > pass that to an LLM block to generate a script for analysis of the full data > run the script in a Python block, looping back through an LLM fix block on error > create a chart/visualization (potentially in the code block?) > show the image as output (this may require frontend changes)
- TikTok video search and download

### Marketing

- Portfolio site design and enhancements