feat(blocks)!: Update Exa search block to match latest API specification (#11185)

BREAKING CHANGE: Removed the deprecated `use_auto_prompt` field from the Input
schema. Existing workflows that use this field must instead set the `type`
field to `"auto"`.

## Summary of Changes 📝

This PR comprehensively updates all Exa search blocks to match the
latest Exa API specification and adds significant new functionality
through the Websets API integration.

### Core API Updates 🔄

- **Migration to Exa SDK**: Replaced manual API calls with the official
`exa_py` AsyncExa SDK across all blocks for better reliability and
maintainability (see the sketch after this list)
- **Removed deprecated fields**: Eliminated
`use_auto_prompt`/`useAutoprompt` field (breaking change)
- **Fixed incomplete field definitions**: Corrected `user_location`
field definition
- **Added new input fields**: Added `moderation` and `context` fields
for enhanced content filtering
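
For reference, a minimal sketch of the new SDK call pattern (it mirrors the `answers.py` hunk below; the wrapper function name here is ours, not part of the PR):

```python
from exa_py import AsyncExa


async def get_answer(api_key: str, query: str) -> str:
    aexa = AsyncExa(api_key=api_key)
    # stream=False keeps the response a single AnswerResponse object,
    # which is what the block's output schema expects.
    response = await aexa.answer(query=query, text=True, stream=False)
    return response.answer
```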

### Enhanced Content Settings 🛠️

- **Text field improvements**: The `text` field now accepts either a simple
boolean or an advanced object configuration
- **New content options**: 
  - Added `livecrawl` settings (never, fallback, always, preferred)
  - Added `subpages` support for deeper content retrieval
  - Added `extras` settings for links and images
  - Added `context` settings for additional contextual information
- **Updated settings**: Enhanced `highlight` and `summary`
configurations with new query and schema options (a usage sketch follows this
list)
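
As a rough illustration (field names are taken from the `helpers.py` hunk below; the import path is assumed from the repo layout, and exact defaults may differ):

```python
from backend.blocks.exa.helpers import (
    ContentSettings,
    ExtrasSettings,
    HighlightSettings,
    LivecrawlTypes,
    SummarySettings,
    process_contents_settings,
)

settings = ContentSettings(
    text=True,  # boolean shorthand; TextAdvanced enables max_characters etc.
    highlights=HighlightSettings(
        num_sentences=2, highlights_per_url=3, query="key findings"
    ),
    summary=SummarySettings(query="main developments"),
    livecrawl=LivecrawlTypes.PREFERRED,
    subpages=2,
    extras=ExtrasSettings(links=3, image_links=1),
    context=True,
)

# Maps the settings to the camelCase API payload, e.g.
# {"text": True, "highlights": {"numSentences": 2, ...}, "livecrawl": "preferred", ...}
payload = process_contents_settings(settings)
```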

### Comprehensive Cost Tracking 💰

- Added detailed cost tracking models:
  - `CostDollars` for monetary costs
  - `CostCredits` for API credit tracking
  - `CostDuration` for time-based costs
- New output fields: `request_id`, `resolved_search_type`,
`cost_dollars`
- Improved response handling to conditionally yield fields based on
availability (see the sketch below)
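
A minimal sketch of consuming the new cost output (model fields from the `helpers.py` hunk below; the helper function is illustrative, and blocks yield `cost_dollars` only when the API reports it):

```python
from backend.blocks.exa.helpers import CostDollars


def summarize_cost(cost: CostDollars | None) -> str:
    if cost is None:  # cost_dollars is yielded conditionally
        return "cost not reported"
    search = sum(item.search for item in cost.breakDown)
    contents = sum(item.contents for item in cost.breakDown)
    return f"${cost.total:.4f} total (search ${search:.4f}, contents ${contents:.4f})"
```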

### New Websets API Integration 🚀

Added new specialized blocks for Exa's Websets API:
- **`websets.py`**: Core webset management (create, get, list, delete)
- **`websets_search.py`**: Search operations within websets
- **`websets_items.py`**: Individual item management (add, get, update,
delete)
- **`websets_enrichment.py`**: Data enrichment operations
- **`websets_import_export.py`**: Bulk import/export functionality
- **`websets_monitor.py`**: Monitor and track webset changes
- **`websets_polling.py`**: Poll for updates and changes

### New Special-Purpose Blocks 🎯

- **`code_context.py`**: Code search capabilities for finding relevant
code snippets from open source repositories, documentation, and Stack
Overflow
- **`research.py`**: Asynchronous research capabilities that explore the
web, gather sources, synthesize findings, and return structured results
with citations (see the sketch below)
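
A hedged illustration of the create-then-poll pattern the research blocks implement (endpoints, field names, and terminal statuses come from the `research.py` hunk below; plain `httpx` is used here only for brevity — the blocks themselves go through the platform's `Requests` helper):

```python
import asyncio

import httpx


async def run_research(api_key: str, instructions: str) -> dict:
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    async with httpx.AsyncClient(timeout=30) as client:
        resp = await client.post(
            "https://api.exa.ai/research/v1",
            headers=headers,
            json={"model": "exa-research", "instructions": instructions},
        )
        research_id = resp.json()["researchId"]
        # Poll until the task reaches a terminal state, as the
        # wait_for_completion path of ExaCreateResearchBlock does.
        while True:
            poll = await client.get(
                f"https://api.exa.ai/research/v1/{research_id}", headers=headers
            )
            data = poll.json()
            if data.get("status") in ("completed", "failed", "canceled"):
                return data
            await asyncio.sleep(10)
```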

### Code Organization Improvements 📁

- **Removed legacy code**: Deleted `model.py` file containing deprecated
API models
- **Centralized helpers**: Consolidated shared models and utilities in
`helpers.py`
- **Improved modularity**: Each webset operation is now in its own
dedicated file

### Other Changes 🔧

- Updated `.gitignore` for better development workflow
- Updated `CLAUDE.md` with project-specific instructions
- Updated documentation in `docs/content/platform/new_blocks.md` with
error handling, data models, and file input guidelines
- Improved webhook block implementations with SDK integration

### Files Changed 📂

- **Modified (11 files)**:
  - `.gitignore`
  - `autogpt_platform/CLAUDE.md`
  - `autogpt_platform/backend/backend/blocks/exa/answers.py`
  - `autogpt_platform/backend/backend/blocks/exa/contents.py`
  - `autogpt_platform/backend/backend/blocks/exa/helpers.py`
  - `autogpt_platform/backend/backend/blocks/exa/search.py`
  - `autogpt_platform/backend/backend/blocks/exa/similar.py`
  - `autogpt_platform/backend/backend/blocks/exa/webhook_blocks.py`
  - `autogpt_platform/backend/backend/blocks/exa/websets.py`
  - `docs/content/platform/new_blocks.md`

- **Added (8 files)**:
  - `autogpt_platform/backend/backend/blocks/exa/code_context.py`
  - `autogpt_platform/backend/backend/blocks/exa/research.py`
  - `autogpt_platform/backend/backend/blocks/exa/websets_enrichment.py`
  - `autogpt_platform/backend/backend/blocks/exa/websets_import_export.py`
  - `autogpt_platform/backend/backend/blocks/exa/websets_items.py`
  - `autogpt_platform/backend/backend/blocks/exa/websets_monitor.py`
  - `autogpt_platform/backend/backend/blocks/exa/websets_polling.py`
  - `autogpt_platform/backend/backend/blocks/exa/websets_search.py`

- **Deleted (1 file)**:
  - `autogpt_platform/backend/backend/blocks/exa/model.py`

### Migration Guide 🚦

For users with existing workflows using the deprecated `use_auto_prompt`
field:
1. Remove the `use_auto_prompt` field from your input configuration
2. Set the `type` field to `ExaSearchTypes.AUTO` (`"auto"` in JSON) to
achieve the same behavior; see the example below
3. Review any custom content settings, since their structure has changed
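
A minimal before/after of the breaking change (input values are illustrative):

```python
# Before (no longer accepted): the deprecated flag
old_input = {"query": "latest AI funding rounds", "use_auto_prompt": True}

# After: equivalent behavior via the search type field
new_input = {"query": "latest AI funding rounds", "type": "auto"}
```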

### Testing Recommendations 

- Test existing workflows to ensure they handle the breaking change
- Verify cost tracking fields are properly returned
- Test new content settings options (livecrawl, subpages, extras,
context)
- Validate websets functionality if using the new Websets API blocks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] made + ran a test agent for the blocks and flows between them: [Exa Tests_v44.json](https://github.com/user-attachments/files/23226143/Exa.Tests_v44.json)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Migrates Exa blocks to the AsyncExa SDK, adds comprehensive Websets/research/code-context blocks, updates the existing search/contents/answers/similar blocks, deletes legacy models, and adjusts tests/docs; breaking: removes `use_auto_prompt` in favor of `type="auto"`.
>
> - **Backend — Exa integration (SDK migration & BREAKING)**:
>   - Replace manual HTTP calls with `exa_py.AsyncExa` across `search`, `similar`, `contents`, `answers`, and webhooks; richer outputs (citations, context, costs, resolved search type).
>   - BREAKING: remove `Input.use_auto_prompt`; use `type = "auto"`.
>   - Centralize models/utilities in `exa/helpers.py` (content settings, cost models, result mappers).
> - **New Blocks**:
>   - **Websets**: management (`websets.py`), searches, items, enrichments, imports/exports, monitors, polling (new files under `exa/websets_*`).
>   - **Research**: async research task create/get/wait/list (`exa/research.py`).
>   - **Code Context**: code snippet/context retrieval (`exa/code_context.py`).
> - **Removals**:
>   - Delete deprecated `exa/model.py`.
> - **Docs & DX**:
>   - Update `docs/new_blocks.md` (error handling, models, file input) and `CLAUDE.md`; ignore backend logs in `.gitignore`.
> - **Frontend Tests**:
>   - Split/extend "e" block tests and improve block-add robustness in Playwright (`build.spec.ts`, `build.page.ts`).
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 6e5e572322. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Added multiple Exa research and webset management blocks for task creation, monitoring, and completion tracking.
  * Introduced new search capabilities, including code context retrieval, content search, and enhanced filtering options.
  * Added webset enrichment, import/export, and item management functionality.
  * Expanded search with location-based and category filters.

* **Documentation**
  * Updated guidance on error handling, data models, and file input handling.

* **Refactor**
  * Modernized backend API integration with improved response structure and error reporting.
  * Simplified configuration options for search operations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Author: Nicholas Tindle
Date: 2025-11-05 13:52:48 -06:00
Committed by: GitHub
Parent: de7c5b5c31
Commit: 37b3e4e82e
22 changed files with 6019 additions and 768 deletions

.gitignore

@@ -178,3 +178,4 @@ autogpt_platform/backend/settings.py
*.ign.*
.test-contents
.claude/settings.local.json
/autogpt_platform/backend/logs

autogpt_platform/CLAUDE.md

@@ -192,6 +192,8 @@ Quick steps:
Note: when making many new blocks analyze the interfaces for each of these blocks and picture if they would go well together in a graph based editor or would they struggle to connect productively?
ex: do the inputs and outputs tie well together?
If you get any pushback or hit complex block conditions check the new_blocks guide in the docs.
**Modifying the API:**
1. Update route in `/backend/backend/server/routers/`

New file: Exa blocks test credentials (path not shown)

@@ -0,0 +1,22 @@
"""
Test credentials and helpers for Exa blocks.
"""
from pydantic import SecretStr
from backend.data.model import APIKeyCredentials
TEST_CREDENTIALS = APIKeyCredentials(
id="01234567-89ab-cdef-0123-456789abcdef",
provider="exa",
api_key=SecretStr("mock-exa-api-key"),
title="Mock Exa API key",
expires_at=None,
)
TEST_CREDENTIALS_INPUT = {
"provider": TEST_CREDENTIALS.provider,
"id": TEST_CREDENTIALS.id,
"type": TEST_CREDENTIALS.type,
"title": TEST_CREDENTIALS.title,
}

autogpt_platform/backend/backend/blocks/exa/answers.py

@@ -1,52 +1,55 @@
from typing import Optional
from exa_py import AsyncExa
from exa_py.api import AnswerResponse
from pydantic import BaseModel
from backend.sdk import (
APIKeyCredentials,
BaseModel,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
Requests,
MediaFileType,
SchemaField,
)
from ._config import exa
class CostBreakdown(BaseModel):
keywordSearch: float
neuralSearch: float
contentText: float
contentHighlight: float
contentSummary: float
class AnswerCitation(BaseModel):
"""Citation model for answer endpoint."""
id: str = SchemaField(description="The temporary ID for the document")
url: str = SchemaField(description="The URL of the search result")
title: Optional[str] = SchemaField(description="The title of the search result")
author: Optional[str] = SchemaField(description="The author of the content")
publishedDate: Optional[str] = SchemaField(
description="An estimate of the creation date"
)
text: Optional[str] = SchemaField(description="The full text content of the source")
image: Optional[MediaFileType] = SchemaField(
description="The URL of the image associated with the result"
)
favicon: Optional[MediaFileType] = SchemaField(
description="The URL of the favicon for the domain"
)
class SearchBreakdown(BaseModel):
search: float
contents: float
breakdown: CostBreakdown
class PerRequestPrices(BaseModel):
neuralSearch_1_25_results: float
neuralSearch_26_100_results: float
neuralSearch_100_plus_results: float
keywordSearch_1_100_results: float
keywordSearch_100_plus_results: float
class PerPagePrices(BaseModel):
contentText: float
contentHighlight: float
contentSummary: float
class CostDollars(BaseModel):
total: float
breakDown: list[SearchBreakdown]
perRequestPrices: PerRequestPrices
perPagePrices: PerPagePrices
@classmethod
def from_sdk(cls, sdk_citation) -> "AnswerCitation":
"""Convert SDK AnswerResult (dataclass) to our Pydantic model."""
return cls(
id=getattr(sdk_citation, "id", ""),
url=getattr(sdk_citation, "url", ""),
title=getattr(sdk_citation, "title", None),
author=getattr(sdk_citation, "author", None),
publishedDate=getattr(sdk_citation, "published_date", None),
text=getattr(sdk_citation, "text", None),
image=getattr(sdk_citation, "image", None),
favicon=getattr(sdk_citation, "favicon", None),
)
class ExaAnswerBlock(Block):
@@ -59,31 +62,21 @@ class ExaAnswerBlock(Block):
placeholder="What is the latest valuation of SpaceX?",
)
text: bool = SchemaField(
default=False,
description="If true, the response includes full text content in the search results",
advanced=True,
)
model: str = SchemaField(
default="exa",
description="The search model to use (exa or exa-pro)",
placeholder="exa",
advanced=True,
description="Include full text content in the search results used for the answer",
default=True,
)
class Output(BlockSchemaOutput):
answer: str = SchemaField(
description="The generated answer based on search results"
)
citations: list[dict] = SchemaField(
description="Search results used to generate the answer",
default_factory=list,
citations: list[AnswerCitation] = SchemaField(
description="Search results used to generate the answer"
)
cost_dollars: CostDollars = SchemaField(
description="Cost breakdown of the request"
)
error: str = SchemaField(
description="Error message if the request failed", default=""
citation: AnswerCitation = SchemaField(
description="Individual citation from the answer"
)
error: str = SchemaField(description="Error message if the request failed")
def __init__(self):
super().__init__(
@@ -97,26 +90,24 @@ class ExaAnswerBlock(Block):
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
url = "https://api.exa.ai/answer"
headers = {
"Content-Type": "application/json",
"x-api-key": credentials.api_key.get_secret_value(),
}
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
# Build the payload
payload = {
"query": input_data.query,
"text": input_data.text,
"model": input_data.model,
}
# Get answer using SDK (stream=False for blocks) - this IS async, needs await
response = await aexa.answer(
query=input_data.query, text=input_data.text, stream=False
)
try:
response = await Requests().post(url, headers=headers, json=payload)
data = response.json()
# this should remain true as long as they don't start defaulting to streaming only.
# provides a bit of safety for sdk updates.
assert type(response) is AnswerResponse
yield "answer", data.get("answer", "")
yield "citations", data.get("citations", [])
yield "cost_dollars", data.get("costDollars", {})
yield "answer", response.answer
except Exception as e:
yield "error", str(e)
citations = [
AnswerCitation.from_sdk(sdk_citation)
for sdk_citation in response.citations or []
]
yield "citations", citations
for citation in citations:
yield "citation", citation

autogpt_platform/backend/backend/blocks/exa/code_context.py (new file)

@@ -0,0 +1,118 @@
"""
Exa Code Context Block
Provides code search capabilities to find relevant code snippets and examples
from open source repositories, documentation, and Stack Overflow.
"""
from typing import Union
from pydantic import BaseModel
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
Requests,
SchemaField,
)
from ._config import exa
class CodeContextResponse(BaseModel):
"""Stable output model for code context responses."""
request_id: str
query: str
response: str
results_count: int
cost_dollars: str
search_time: float
output_tokens: int
@classmethod
def from_api(cls, data: dict) -> "CodeContextResponse":
"""Convert API response to our stable model."""
return cls(
request_id=data.get("requestId", ""),
query=data.get("query", ""),
response=data.get("response", ""),
results_count=data.get("resultsCount", 0),
cost_dollars=data.get("costDollars", ""),
search_time=data.get("searchTime", 0.0),
output_tokens=data.get("outputTokens", 0),
)
class ExaCodeContextBlock(Block):
"""Get relevant code snippets and examples from open source repositories."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
query: str = SchemaField(
description="Search query to find relevant code snippets. Describe what you're trying to do or what code you're looking for.",
placeholder="how to use React hooks for state management",
)
tokens_num: Union[str, int] = SchemaField(
default="dynamic",
description="Token limit for response. Use 'dynamic' for automatic sizing, 5000 for standard queries, or 10000 for comprehensive examples.",
placeholder="dynamic",
)
class Output(BlockSchemaOutput):
request_id: str = SchemaField(description="Unique identifier for this request")
query: str = SchemaField(description="The search query used")
response: str = SchemaField(
description="Formatted code snippets and contextual examples with sources"
)
results_count: int = SchemaField(
description="Number of code sources found and included"
)
cost_dollars: str = SchemaField(description="Cost of this request in dollars")
search_time: float = SchemaField(
description="Time taken to search in milliseconds"
)
output_tokens: int = SchemaField(description="Number of tokens in the response")
def __init__(self):
super().__init__(
id="8f9e0d1c-2b3a-4567-8901-23456789abcd",
description="Search billions of GitHub repos, docs, and Stack Overflow for relevant code examples",
categories={BlockCategory.SEARCH, BlockCategory.DEVELOPER_TOOLS},
input_schema=ExaCodeContextBlock.Input,
output_schema=ExaCodeContextBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
url = "https://api.exa.ai/context"
headers = {
"Content-Type": "application/json",
"x-api-key": credentials.api_key.get_secret_value(),
}
payload = {
"query": input_data.query,
"tokensNum": input_data.tokens_num,
}
response = await Requests().post(url, headers=headers, json=payload)
data = response.json()
context = CodeContextResponse.from_api(data)
yield "request_id", context.request_id
yield "query", context.query
yield "response", context.response
yield "results_count", context.results_count
yield "cost_dollars", context.cost_dollars
yield "search_time", context.search_time
yield "output_tokens", context.output_tokens

autogpt_platform/backend/backend/blocks/exa/contents.py

@@ -1,3 +1,9 @@
from enum import Enum
from typing import Optional
from exa_py import AsyncExa
from pydantic import BaseModel
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -6,12 +12,45 @@ from backend.sdk import (
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
Requests,
SchemaField,
)
from ._config import exa
from .helpers import ContentSettings
from .helpers import (
CostDollars,
ExaSearchResults,
ExtrasSettings,
HighlightSettings,
LivecrawlTypes,
SummarySettings,
)
class ContentStatusTag(str, Enum):
CRAWL_NOT_FOUND = "CRAWL_NOT_FOUND"
CRAWL_TIMEOUT = "CRAWL_TIMEOUT"
CRAWL_LIVECRAWL_TIMEOUT = "CRAWL_LIVECRAWL_TIMEOUT"
SOURCE_NOT_AVAILABLE = "SOURCE_NOT_AVAILABLE"
CRAWL_UNKNOWN_ERROR = "CRAWL_UNKNOWN_ERROR"
class ContentError(BaseModel):
tag: Optional[ContentStatusTag] = SchemaField(
default=None, description="Specific error type"
)
httpStatusCode: Optional[int] = SchemaField(
default=None, description="The corresponding HTTP status code"
)
class ContentStatus(BaseModel):
id: str = SchemaField(description="The URL that was requested")
status: str = SchemaField(
description="Status of the content fetch operation (success or error)"
)
error: Optional[ContentError] = SchemaField(
default=None, description="Error details, only present when status is 'error'"
)
class ExaContentsBlock(Block):
@@ -19,22 +58,70 @@ class ExaContentsBlock(Block):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
ids: list[str] = SchemaField(
description="Array of document IDs obtained from searches"
urls: list[str] = SchemaField(
description="Array of URLs to crawl (preferred over 'ids')",
default_factory=list,
advanced=False,
)
contents: ContentSettings = SchemaField(
description="Content retrieval settings",
default=ContentSettings(),
ids: list[str] = SchemaField(
description="[DEPRECATED - use 'urls' instead] Array of document IDs obtained from searches",
default_factory=list,
advanced=True,
)
text: bool = SchemaField(
description="Retrieve text content from pages",
default=True,
)
highlights: HighlightSettings = SchemaField(
description="Text snippets most relevant from each page",
default=HighlightSettings(),
)
summary: SummarySettings = SchemaField(
description="LLM-generated summary of the webpage",
default=SummarySettings(),
)
livecrawl: Optional[LivecrawlTypes] = SchemaField(
description="Livecrawling options: never, fallback (default), always, preferred",
default=LivecrawlTypes.FALLBACK,
advanced=True,
)
livecrawl_timeout: Optional[int] = SchemaField(
description="Timeout for livecrawling in milliseconds",
default=10000,
advanced=True,
)
subpages: Optional[int] = SchemaField(
description="Number of subpages to crawl", default=0, ge=0, advanced=True
)
subpage_target: Optional[str | list[str]] = SchemaField(
description="Keyword(s) to find specific subpages of search results",
default=None,
advanced=True,
)
extras: ExtrasSettings = SchemaField(
description="Extra parameters for additional content",
default=ExtrasSettings(),
advanced=True,
)
class Output(BlockSchemaOutput):
results: list = SchemaField(
description="List of document contents", default_factory=list
results: list[ExaSearchResults] = SchemaField(
description="List of document contents with metadata"
)
error: str = SchemaField(
description="Error message if the request failed", default=""
result: ExaSearchResults = SchemaField(
description="Single document content result"
)
context: str = SchemaField(
description="A formatted string of the results ready for LLMs"
)
request_id: str = SchemaField(description="Unique identifier for the request")
statuses: list[ContentStatus] = SchemaField(
description="Status information for each requested URL"
)
cost_dollars: Optional[CostDollars] = SchemaField(
description="Cost breakdown for the request"
)
error: str = SchemaField(description="Error message if the request failed")
def __init__(self):
super().__init__(
@@ -48,23 +135,91 @@ class ExaContentsBlock(Block):
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
url = "https://api.exa.ai/contents"
headers = {
"Content-Type": "application/json",
"x-api-key": credentials.api_key.get_secret_value(),
}
if not input_data.urls and not input_data.ids:
raise ValueError("Either 'urls' or 'ids' must be provided")
# Convert ContentSettings to API format
payload = {
"ids": input_data.ids,
"text": input_data.contents.text,
"highlights": input_data.contents.highlights,
"summary": input_data.contents.summary,
}
sdk_kwargs = {}
try:
response = await Requests().post(url, headers=headers, json=payload)
data = response.json()
yield "results", data.get("results", [])
except Exception as e:
yield "error", str(e)
# Prefer urls over ids
if input_data.urls:
sdk_kwargs["urls"] = input_data.urls
elif input_data.ids:
sdk_kwargs["ids"] = input_data.ids
if input_data.text:
sdk_kwargs["text"] = {"includeHtmlTags": True}
# Handle highlights - only include if modified from defaults
if input_data.highlights and (
input_data.highlights.num_sentences != 1
or input_data.highlights.highlights_per_url != 1
or input_data.highlights.query is not None
):
highlights_dict = {}
highlights_dict["numSentences"] = input_data.highlights.num_sentences
highlights_dict["highlightsPerUrl"] = (
input_data.highlights.highlights_per_url
)
if input_data.highlights.query:
highlights_dict["query"] = input_data.highlights.query
sdk_kwargs["highlights"] = highlights_dict
# Handle summary - only include if modified from defaults
if input_data.summary and (
input_data.summary.query is not None
or input_data.summary.schema is not None
):
summary_dict = {}
if input_data.summary.query:
summary_dict["query"] = input_data.summary.query
if input_data.summary.schema:
summary_dict["schema"] = input_data.summary.schema
sdk_kwargs["summary"] = summary_dict
if input_data.livecrawl:
sdk_kwargs["livecrawl"] = input_data.livecrawl.value
if input_data.livecrawl_timeout is not None:
sdk_kwargs["livecrawl_timeout"] = input_data.livecrawl_timeout
if input_data.subpages is not None:
sdk_kwargs["subpages"] = input_data.subpages
if input_data.subpage_target:
sdk_kwargs["subpage_target"] = input_data.subpage_target
# Handle extras - only include if modified from defaults
if input_data.extras and (
input_data.extras.links > 0 or input_data.extras.image_links > 0
):
extras_dict = {}
if input_data.extras.links:
extras_dict["links"] = input_data.extras.links
if input_data.extras.image_links:
extras_dict["image_links"] = input_data.extras.image_links
sdk_kwargs["extras"] = extras_dict
# Always enable context for LLM-ready output
sdk_kwargs["context"] = True
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
response = await aexa.get_contents(**sdk_kwargs)
converted_results = [
ExaSearchResults.from_sdk(sdk_result)
for sdk_result in response.results or []
]
yield "results", converted_results
for result in converted_results:
yield "result", result
if response.context:
yield "context", response.context
if response.statuses:
yield "statuses", response.statuses
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars

autogpt_platform/backend/backend/blocks/exa/helpers.py

@@ -1,51 +1,150 @@
from typing import Optional
from enum import Enum
from typing import Any, Dict, Literal, Optional, Union
from backend.sdk import BaseModel, SchemaField
from backend.sdk import BaseModel, MediaFileType, SchemaField
class TextSettings(BaseModel):
max_characters: int = SchemaField(
default=1000,
class LivecrawlTypes(str, Enum):
NEVER = "never"
FALLBACK = "fallback"
ALWAYS = "always"
PREFERRED = "preferred"
class TextEnabled(BaseModel):
discriminator: Literal["enabled"] = "enabled"
class TextDisabled(BaseModel):
discriminator: Literal["disabled"] = "disabled"
class TextAdvanced(BaseModel):
discriminator: Literal["advanced"] = "advanced"
max_characters: Optional[int] = SchemaField(
default=None,
description="Maximum number of characters to return",
placeholder="1000",
)
include_html_tags: bool = SchemaField(
default=False,
description="Whether to include HTML tags in the text",
description="Include HTML tags in the response, helps LLMs understand text structure",
placeholder="False",
)
class HighlightSettings(BaseModel):
num_sentences: int = SchemaField(
default=3,
default=1,
description="Number of sentences per highlight",
placeholder="3",
placeholder="1",
ge=1,
)
highlights_per_url: int = SchemaField(
default=3,
default=1,
description="Number of highlights per URL",
placeholder="3",
placeholder="1",
ge=1,
)
query: Optional[str] = SchemaField(
default=None,
description="Custom query to direct the LLM's selection of highlights",
placeholder="Key advancements",
)
class SummarySettings(BaseModel):
query: Optional[str] = SchemaField(
default="",
description="Query string for summarization",
placeholder="Enter query",
default=None,
description="Custom query for the LLM-generated summary",
placeholder="Main developments",
)
schema: Optional[dict] = SchemaField( # type: ignore
default=None,
description="JSON schema for structured output from summary",
advanced=True,
)
class ExtrasSettings(BaseModel):
links: int = SchemaField(
default=0,
description="Number of URLs to return from each webpage",
placeholder="1",
ge=0,
)
image_links: int = SchemaField(
default=0,
description="Number of images to return for each result",
placeholder="1",
ge=0,
)
class ContextEnabled(BaseModel):
discriminator: Literal["enabled"] = "enabled"
class ContextDisabled(BaseModel):
discriminator: Literal["disabled"] = "disabled"
class ContextAdvanced(BaseModel):
discriminator: Literal["advanced"] = "advanced"
max_characters: Optional[int] = SchemaField(
default=None,
description="Maximum character limit for context string",
placeholder="10000",
)
class ContentSettings(BaseModel):
text: TextSettings = SchemaField(
default=TextSettings(),
text: Optional[Union[bool, TextEnabled, TextDisabled, TextAdvanced]] = SchemaField(
default=None,
description="Text content retrieval. Boolean for simple enable/disable or object for advanced settings",
)
highlights: HighlightSettings = SchemaField(
default=HighlightSettings(),
highlights: Optional[HighlightSettings] = SchemaField(
default=None,
description="Text snippets most relevant from each page",
)
summary: SummarySettings = SchemaField(
default=SummarySettings(),
summary: Optional[SummarySettings] = SchemaField(
default=None,
description="LLM-generated summary of the webpage",
)
livecrawl: Optional[LivecrawlTypes] = SchemaField(
default=None,
description="Livecrawling options: never, fallback, always, preferred",
advanced=True,
)
livecrawl_timeout: Optional[int] = SchemaField(
default=None,
description="Timeout for livecrawling in milliseconds",
placeholder="10000",
advanced=True,
)
subpages: Optional[int] = SchemaField(
default=None,
description="Number of subpages to crawl",
placeholder="0",
ge=0,
advanced=True,
)
subpage_target: Optional[Union[str, list[str]]] = SchemaField(
default=None,
description="Keyword(s) to find specific subpages of search results",
advanced=True,
)
extras: Optional[ExtrasSettings] = SchemaField(
default=None,
description="Extra parameters for additional content",
advanced=True,
)
context: Optional[Union[bool, ContextEnabled, ContextDisabled, ContextAdvanced]] = (
SchemaField(
default=None,
description="Format search results into a context string for LLMs",
advanced=True,
)
)
@@ -127,3 +226,225 @@ class WebsetEnrichmentConfig(BaseModel):
default=None,
description="Options for the enrichment",
)
# Shared result models
class ExaSearchExtras(BaseModel):
links: list[str] = SchemaField(
default_factory=list, description="Array of links from the search result"
)
imageLinks: list[str] = SchemaField(
default_factory=list, description="Array of image links from the search result"
)
class ExaSearchResults(BaseModel):
title: str | None = None
url: str | None = None
publishedDate: str | None = None
author: str | None = None
id: str
image: MediaFileType | None = None
favicon: MediaFileType | None = None
text: str | None = None
highlights: list[str] = SchemaField(default_factory=list)
highlightScores: list[float] = SchemaField(default_factory=list)
summary: str | None = None
subpages: list[dict] = SchemaField(default_factory=list)
extras: ExaSearchExtras | None = None
@classmethod
def from_sdk(cls, sdk_result) -> "ExaSearchResults":
"""Convert SDK Result (dataclass) to our Pydantic model."""
return cls(
id=getattr(sdk_result, "id", ""),
url=getattr(sdk_result, "url", None),
title=getattr(sdk_result, "title", None),
author=getattr(sdk_result, "author", None),
publishedDate=getattr(sdk_result, "published_date", None),
text=getattr(sdk_result, "text", None),
highlights=getattr(sdk_result, "highlights", None) or [],
highlightScores=getattr(sdk_result, "highlight_scores", None) or [],
summary=getattr(sdk_result, "summary", None),
subpages=getattr(sdk_result, "subpages", None) or [],
image=getattr(sdk_result, "image", None),
favicon=getattr(sdk_result, "favicon", None),
extras=getattr(sdk_result, "extras", None),
)
# Cost tracking models
class CostBreakdown(BaseModel):
keywordSearch: float = SchemaField(default=0.0)
neuralSearch: float = SchemaField(default=0.0)
contentText: float = SchemaField(default=0.0)
contentHighlight: float = SchemaField(default=0.0)
contentSummary: float = SchemaField(default=0.0)
class CostBreakdownItem(BaseModel):
search: float = SchemaField(default=0.0)
contents: float = SchemaField(default=0.0)
breakdown: CostBreakdown = SchemaField(default_factory=CostBreakdown)
class PerRequestPrices(BaseModel):
neuralSearch_1_25_results: float = SchemaField(default=0.005)
neuralSearch_26_100_results: float = SchemaField(default=0.025)
neuralSearch_100_plus_results: float = SchemaField(default=1.0)
keywordSearch_1_100_results: float = SchemaField(default=0.0025)
keywordSearch_100_plus_results: float = SchemaField(default=3.0)
class PerPagePrices(BaseModel):
contentText: float = SchemaField(default=0.001)
contentHighlight: float = SchemaField(default=0.001)
contentSummary: float = SchemaField(default=0.001)
class CostDollars(BaseModel):
total: float = SchemaField(description="Total dollar cost for your request")
breakDown: list[CostBreakdownItem] = SchemaField(
default_factory=list, description="Breakdown of costs by operation type"
)
perRequestPrices: PerRequestPrices = SchemaField(
default_factory=PerRequestPrices,
description="Standard price per request for different operations",
)
perPagePrices: PerPagePrices = SchemaField(
default_factory=PerPagePrices,
description="Standard price per page for different content operations",
)
# Helper functions for payload processing
def process_text_field(
text: Union[bool, TextEnabled, TextDisabled, TextAdvanced, None]
) -> Optional[Union[bool, Dict[str, Any]]]:
"""Process text field for API payload."""
if text is None:
return None
# Handle backward compatibility with boolean
if isinstance(text, bool):
return text
elif isinstance(text, TextDisabled):
return False
elif isinstance(text, TextEnabled):
return True
elif isinstance(text, TextAdvanced):
text_dict = {}
if text.max_characters:
text_dict["maxCharacters"] = text.max_characters
if text.include_html_tags:
text_dict["includeHtmlTags"] = text.include_html_tags
return text_dict if text_dict else True
return None
def process_contents_settings(contents: Optional[ContentSettings]) -> Dict[str, Any]:
"""Process ContentSettings into API payload format."""
if not contents:
return {}
content_settings = {}
# Handle text field (can be boolean or object)
text_value = process_text_field(contents.text)
if text_value is not None:
content_settings["text"] = text_value
# Handle highlights
if contents.highlights:
highlights_dict: Dict[str, Any] = {
"numSentences": contents.highlights.num_sentences,
"highlightsPerUrl": contents.highlights.highlights_per_url,
}
if contents.highlights.query:
highlights_dict["query"] = contents.highlights.query
content_settings["highlights"] = highlights_dict
if contents.summary:
summary_dict = {}
if contents.summary.query:
summary_dict["query"] = contents.summary.query
if contents.summary.schema:
summary_dict["schema"] = contents.summary.schema
content_settings["summary"] = summary_dict
if contents.livecrawl:
content_settings["livecrawl"] = contents.livecrawl.value
if contents.livecrawl_timeout is not None:
content_settings["livecrawlTimeout"] = contents.livecrawl_timeout
if contents.subpages is not None:
content_settings["subpages"] = contents.subpages
if contents.subpage_target:
content_settings["subpageTarget"] = contents.subpage_target
if contents.extras:
extras_dict = {}
if contents.extras.links:
extras_dict["links"] = contents.extras.links
if contents.extras.image_links:
extras_dict["imageLinks"] = contents.extras.image_links
content_settings["extras"] = extras_dict
context_value = process_context_field(contents.context)
if context_value is not None:
content_settings["context"] = context_value
return content_settings
def process_context_field(
context: Union[bool, dict, ContextEnabled, ContextDisabled, ContextAdvanced, None]
) -> Optional[Union[bool, Dict[str, int]]]:
"""Process context field for API payload."""
if context is None:
return None
# Handle backward compatibility with boolean
if isinstance(context, bool):
return context if context else None
elif isinstance(context, dict) and "maxCharacters" in context:
return {"maxCharacters": context["maxCharacters"]}
elif isinstance(context, ContextDisabled):
return None # Don't send context field at all when disabled
elif isinstance(context, ContextEnabled):
return True
elif isinstance(context, ContextAdvanced):
if context.max_characters:
return {"maxCharacters": context.max_characters}
return True
return None
def format_date_fields(
input_data: Any, date_field_mapping: Dict[str, str]
) -> Dict[str, str]:
"""Format datetime fields for API payload."""
formatted_dates = {}
for input_field, api_field in date_field_mapping.items():
value = getattr(input_data, input_field, None)
if value:
formatted_dates[api_field] = value.strftime("%Y-%m-%dT%H:%M:%S.000Z")
return formatted_dates
def add_optional_fields(
input_data: Any,
field_mapping: Dict[str, str],
payload: Dict[str, Any],
process_enums: bool = False,
) -> None:
"""Add optional fields to payload if they have values."""
for input_field, api_field in field_mapping.items():
value = getattr(input_data, input_field, None)
if value: # Only add non-empty values
if process_enums and hasattr(value, "value"):
payload[api_field] = value.value
else:
payload[api_field] = value

autogpt_platform/backend/backend/blocks/exa/model.py (deleted)

@@ -1,247 +0,0 @@
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Field
# Enum definitions based on available options
class WebsetStatus(str, Enum):
IDLE = "idle"
PENDING = "pending"
RUNNING = "running"
PAUSED = "paused"
class WebsetSearchStatus(str, Enum):
CREATED = "created"
# Add more if known, based on example it's "created"
class ImportStatus(str, Enum):
PENDING = "pending"
# Add more if known
class ImportFormat(str, Enum):
CSV = "csv"
# Add more if known
class EnrichmentStatus(str, Enum):
PENDING = "pending"
# Add more if known
class EnrichmentFormat(str, Enum):
TEXT = "text"
# Add more if known
class MonitorStatus(str, Enum):
ENABLED = "enabled"
# Add more if known
class MonitorBehaviorType(str, Enum):
SEARCH = "search"
# Add more if known
class MonitorRunStatus(str, Enum):
CREATED = "created"
# Add more if known
class CanceledReason(str, Enum):
WEBSET_DELETED = "webset_deleted"
# Add more if known
class FailedReason(str, Enum):
INVALID_FORMAT = "invalid_format"
# Add more if known
class Confidence(str, Enum):
HIGH = "high"
# Add more if known
# Nested models
class Entity(BaseModel):
type: str
class Criterion(BaseModel):
description: str
successRate: Optional[int] = None
class ExcludeItem(BaseModel):
source: str = Field(default="import")
id: str
class Relationship(BaseModel):
definition: str
limit: Optional[float] = None
class ScopeItem(BaseModel):
source: str = Field(default="import")
id: str
relationship: Optional[Relationship] = None
class Progress(BaseModel):
found: int
analyzed: int
completion: int
timeLeft: int
class Bounds(BaseModel):
min: int
max: int
class Expected(BaseModel):
total: int
confidence: str = Field(default="high") # Use str or Confidence enum
bounds: Bounds
class Recall(BaseModel):
expected: Expected
reasoning: str
class WebsetSearch(BaseModel):
id: str
object: str = Field(default="webset_search")
status: str = Field(default="created") # Or use WebsetSearchStatus
websetId: str
query: str
entity: Entity
criteria: List[Criterion]
count: int
behavior: str = Field(default="override")
exclude: List[ExcludeItem]
scope: List[ScopeItem]
progress: Progress
recall: Recall
metadata: Dict[str, Any] = Field(default_factory=dict)
canceledAt: Optional[datetime] = None
canceledReason: Optional[str] = Field(default=None) # Or use CanceledReason
createdAt: datetime
updatedAt: datetime
class ImportEntity(BaseModel):
type: str
class Import(BaseModel):
id: str
object: str = Field(default="import")
status: str = Field(default="pending") # Or use ImportStatus
format: str = Field(default="csv") # Or use ImportFormat
entity: ImportEntity
title: str
count: int
metadata: Dict[str, Any] = Field(default_factory=dict)
failedReason: Optional[str] = Field(default=None) # Or use FailedReason
failedAt: Optional[datetime] = None
failedMessage: Optional[str] = None
createdAt: datetime
updatedAt: datetime
class Option(BaseModel):
label: str
class WebsetEnrichment(BaseModel):
id: str
object: str = Field(default="webset_enrichment")
status: str = Field(default="pending") # Or use EnrichmentStatus
websetId: str
title: str
description: str
format: str = Field(default="text") # Or use EnrichmentFormat
options: List[Option]
instructions: str
metadata: Dict[str, Any] = Field(default_factory=dict)
createdAt: datetime
updatedAt: datetime
class Cadence(BaseModel):
cron: str
timezone: str = Field(default="Etc/UTC")
class BehaviorConfig(BaseModel):
query: Optional[str] = None
criteria: Optional[List[Criterion]] = None
entity: Optional[Entity] = None
count: Optional[int] = None
behavior: Optional[str] = Field(default=None)
class Behavior(BaseModel):
type: str = Field(default="search") # Or use MonitorBehaviorType
config: BehaviorConfig
class MonitorRun(BaseModel):
id: str
object: str = Field(default="monitor_run")
status: str = Field(default="created") # Or use MonitorRunStatus
monitorId: str
type: str = Field(default="search")
completedAt: Optional[datetime] = None
failedAt: Optional[datetime] = None
failedReason: Optional[str] = None
canceledAt: Optional[datetime] = None
createdAt: datetime
updatedAt: datetime
class Monitor(BaseModel):
id: str
object: str = Field(default="monitor")
status: str = Field(default="enabled") # Or use MonitorStatus
websetId: str
cadence: Cadence
behavior: Behavior
lastRun: Optional[MonitorRun] = None
nextRunAt: Optional[datetime] = None
metadata: Dict[str, Any] = Field(default_factory=dict)
createdAt: datetime
updatedAt: datetime
class Webset(BaseModel):
id: str
object: str = Field(default="webset")
status: WebsetStatus
externalId: Optional[str] = None
title: Optional[str] = None
searches: List[WebsetSearch]
imports: List[Import]
enrichments: List[WebsetEnrichment]
monitors: List[Monitor]
streams: List[Any]
createdAt: datetime
updatedAt: datetime
metadata: Dict[str, Any] = Field(default_factory=dict)
class ListWebsets(BaseModel):
data: List[Webset]
hasMore: bool
nextCursor: Optional[str] = None

autogpt_platform/backend/backend/blocks/exa/research.py (new file)

@@ -0,0 +1,518 @@
"""
Exa Research Task Blocks
Provides asynchronous research capabilities that explore the web, gather sources,
synthesize findings, and return structured results with citations.
"""
import asyncio
import time
from enum import Enum
from typing import Any, Dict, List, Optional
from pydantic import BaseModel
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
Requests,
SchemaField,
)
from ._config import exa
class ResearchModel(str, Enum):
"""Available research models."""
FAST = "exa-research-fast"
STANDARD = "exa-research"
PRO = "exa-research-pro"
class ResearchStatus(str, Enum):
"""Research task status."""
PENDING = "pending"
RUNNING = "running"
COMPLETED = "completed"
CANCELED = "canceled"
FAILED = "failed"
class ResearchCostModel(BaseModel):
"""Cost breakdown for a research request."""
total: float
num_searches: int
num_pages: int
reasoning_tokens: int
@classmethod
def from_api(cls, data: dict) -> "ResearchCostModel":
"""Convert API response, rounding fractional counts to integers."""
return cls(
total=data.get("total", 0.0),
num_searches=int(round(data.get("numSearches", 0))),
num_pages=int(round(data.get("numPages", 0))),
reasoning_tokens=int(round(data.get("reasoningTokens", 0))),
)
class ResearchOutputModel(BaseModel):
"""Research output with content and optional structured data."""
content: str
parsed: Optional[Dict[str, Any]] = None
class ResearchTaskModel(BaseModel):
"""Stable output model for research tasks."""
research_id: str
created_at: int
model: str
instructions: str
status: str
output_schema: Optional[Dict[str, Any]] = None
output: Optional[ResearchOutputModel] = None
cost_dollars: Optional[ResearchCostModel] = None
finished_at: Optional[int] = None
error: Optional[str] = None
@classmethod
def from_api(cls, data: dict) -> "ResearchTaskModel":
"""Convert API response to our stable model."""
output_data = data.get("output")
output = None
if output_data:
output = ResearchOutputModel(
content=output_data.get("content", ""),
parsed=output_data.get("parsed"),
)
cost_data = data.get("costDollars")
cost = None
if cost_data:
cost = ResearchCostModel.from_api(cost_data)
return cls(
research_id=data.get("researchId", ""),
created_at=data.get("createdAt", 0),
model=data.get("model", "exa-research"),
instructions=data.get("instructions", ""),
status=data.get("status", "pending"),
output_schema=data.get("outputSchema"),
output=output,
cost_dollars=cost,
finished_at=data.get("finishedAt"),
error=data.get("error"),
)
class ExaCreateResearchBlock(Block):
"""Create an asynchronous research task that explores the web and synthesizes findings."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
instructions: str = SchemaField(
description="Research instructions - clearly define what information to find, how to conduct research, and desired output format.",
placeholder="Research the top 5 AI coding assistants, their features, pricing, and user reviews",
)
model: ResearchModel = SchemaField(
default=ResearchModel.STANDARD,
description="Research model: 'fast' for quick results, 'standard' for balanced quality, 'pro' for thorough analysis",
)
output_schema: Optional[dict] = SchemaField(
default=None,
description="JSON Schema to enforce structured output. When provided, results are validated and returned as parsed JSON.",
advanced=True,
)
wait_for_completion: bool = SchemaField(
default=True,
description="Wait for research to complete before returning. Ensures you get results immediately.",
)
polling_timeout: int = SchemaField(
default=600,
description="Maximum time to wait for completion in seconds (only if wait_for_completion is True)",
advanced=True,
ge=1,
le=3600,
)
class Output(BlockSchemaOutput):
research_id: str = SchemaField(
description="Unique identifier for tracking this research request"
)
status: str = SchemaField(description="Final status of the research")
model: str = SchemaField(description="The research model used")
instructions: str = SchemaField(
description="The research instructions provided"
)
created_at: int = SchemaField(
description="When the research was created (Unix timestamp in ms)"
)
output_content: Optional[str] = SchemaField(
description="Research output as text (only if wait_for_completion was True and completed)"
)
output_parsed: Optional[dict] = SchemaField(
description="Structured JSON output (only if wait_for_completion and outputSchema were provided)"
)
cost_total: Optional[float] = SchemaField(
description="Total cost in USD (only if wait_for_completion was True and completed)"
)
elapsed_time: Optional[float] = SchemaField(
description="Time taken to complete in seconds (only if wait_for_completion was True)"
)
def __init__(self):
super().__init__(
id="a1f2e3d4-c5b6-4a78-9012-3456789abcde",
description="Create research task with optional waiting - explores web and synthesizes findings with citations",
categories={BlockCategory.SEARCH, BlockCategory.AI},
input_schema=ExaCreateResearchBlock.Input,
output_schema=ExaCreateResearchBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
url = "https://api.exa.ai/research/v1"
headers = {
"Content-Type": "application/json",
"x-api-key": credentials.api_key.get_secret_value(),
}
payload: Dict[str, Any] = {
"model": input_data.model.value,
"instructions": input_data.instructions,
}
if input_data.output_schema:
payload["outputSchema"] = input_data.output_schema
response = await Requests().post(url, headers=headers, json=payload)
data = response.json()
research_id = data.get("researchId", "")
if input_data.wait_for_completion:
start_time = time.time()
get_url = f"https://api.exa.ai/research/v1/{research_id}"
get_headers = {"x-api-key": credentials.api_key.get_secret_value()}
check_interval = 10
while time.time() - start_time < input_data.polling_timeout:
poll_response = await Requests().get(url=get_url, headers=get_headers)
poll_data = poll_response.json()
status = poll_data.get("status", "")
if status in ["completed", "failed", "canceled"]:
elapsed = time.time() - start_time
research = ResearchTaskModel.from_api(poll_data)
yield "research_id", research.research_id
yield "status", research.status
yield "model", research.model
yield "instructions", research.instructions
yield "created_at", research.created_at
yield "elapsed_time", elapsed
if research.output:
yield "output_content", research.output.content
yield "output_parsed", research.output.parsed
if research.cost_dollars:
yield "cost_total", research.cost_dollars.total
return
await asyncio.sleep(check_interval)
raise ValueError(
f"Research did not complete within {input_data.polling_timeout} seconds"
)
else:
yield "research_id", research_id
yield "status", data.get("status", "pending")
yield "model", data.get("model", input_data.model.value)
yield "instructions", data.get("instructions", input_data.instructions)
yield "created_at", data.get("createdAt", 0)
class ExaGetResearchBlock(Block):
"""Get the status and results of a research task."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
research_id: str = SchemaField(
description="The ID of the research task to retrieve",
placeholder="01jszdfs0052sg4jc552sg4jc5",
)
include_events: bool = SchemaField(
default=False,
description="Include detailed event log of research operations",
advanced=True,
)
class Output(BlockSchemaOutput):
research_id: str = SchemaField(description="The research task identifier")
status: str = SchemaField(
description="Current status: pending, running, completed, canceled, or failed"
)
instructions: str = SchemaField(
description="The original research instructions"
)
model: str = SchemaField(description="The research model used")
created_at: int = SchemaField(
description="When research was created (Unix timestamp in ms)"
)
finished_at: Optional[int] = SchemaField(
description="When research finished (Unix timestamp in ms, if completed/canceled/failed)"
)
output_content: Optional[str] = SchemaField(
description="Research output as text (if completed)"
)
output_parsed: Optional[dict] = SchemaField(
description="Structured JSON output matching outputSchema (if provided and completed)"
)
cost_total: Optional[float] = SchemaField(
description="Total cost in USD (if completed)"
)
cost_searches: Optional[int] = SchemaField(
description="Number of searches performed (if completed)"
)
cost_pages: Optional[int] = SchemaField(
description="Number of pages crawled (if completed)"
)
cost_reasoning_tokens: Optional[int] = SchemaField(
description="AI tokens used for reasoning (if completed)"
)
error_message: Optional[str] = SchemaField(
description="Error message if research failed"
)
events: Optional[List[dict]] = SchemaField(
description="Detailed event log (if include_events was True)"
)
def __init__(self):
super().__init__(
id="b2e3f4a5-6789-4bcd-9012-3456789abcde",
description="Get status and results of a research task",
categories={BlockCategory.SEARCH},
input_schema=ExaGetResearchBlock.Input,
output_schema=ExaGetResearchBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
url = f"https://api.exa.ai/research/v1/{input_data.research_id}"
headers = {
"x-api-key": credentials.api_key.get_secret_value(),
}
params = {}
if input_data.include_events:
params["events"] = "true"
response = await Requests().get(url, headers=headers, params=params)
data = response.json()
research = ResearchTaskModel.from_api(data)
yield "research_id", research.research_id
yield "status", research.status
yield "instructions", research.instructions
yield "model", research.model
yield "created_at", research.created_at
yield "finished_at", research.finished_at
if research.output:
yield "output_content", research.output.content
yield "output_parsed", research.output.parsed
if research.cost_dollars:
yield "cost_total", research.cost_dollars.total
yield "cost_searches", research.cost_dollars.num_searches
yield "cost_pages", research.cost_dollars.num_pages
yield "cost_reasoning_tokens", research.cost_dollars.reasoning_tokens
yield "error_message", research.error
if input_data.include_events:
yield "events", data.get("events", [])
class ExaWaitForResearchBlock(Block):
"""Wait for a research task to complete with progress tracking."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
research_id: str = SchemaField(
description="The ID of the research task to wait for",
placeholder="01jszdfs0052sg4jc552sg4jc5",
)
timeout: int = SchemaField(
default=600,
description="Maximum time to wait in seconds",
ge=1,
le=3600,
)
check_interval: int = SchemaField(
default=10,
description="Seconds between status checks",
advanced=True,
ge=1,
le=60,
)
class Output(BlockSchemaOutput):
research_id: str = SchemaField(description="The research task identifier")
final_status: str = SchemaField(description="Final status when polling stopped")
output_content: Optional[str] = SchemaField(
description="Research output as text (if completed)"
)
output_parsed: Optional[dict] = SchemaField(
description="Structured JSON output (if outputSchema was provided and completed)"
)
cost_total: Optional[float] = SchemaField(description="Total cost in USD")
elapsed_time: float = SchemaField(description="Total time waited in seconds")
timed_out: bool = SchemaField(
description="Whether polling timed out before completion"
)
def __init__(self):
super().__init__(
id="c3d4e5f6-7890-4abc-9012-3456789abcde",
description="Wait for a research task to complete with configurable timeout",
categories={BlockCategory.SEARCH},
input_schema=ExaWaitForResearchBlock.Input,
output_schema=ExaWaitForResearchBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
start_time = time.time()
url = f"https://api.exa.ai/research/v1/{input_data.research_id}"
headers = {
"x-api-key": credentials.api_key.get_secret_value(),
}
while time.time() - start_time < input_data.timeout:
response = await Requests().get(url, headers=headers)
data = response.json()
status = data.get("status", "")
if status in ["completed", "failed", "canceled"]:
elapsed = time.time() - start_time
research = ResearchTaskModel.from_api(data)
yield "research_id", research.research_id
yield "final_status", research.status
yield "elapsed_time", elapsed
yield "timed_out", False
if research.output:
yield "output_content", research.output.content
yield "output_parsed", research.output.parsed
if research.cost_dollars:
yield "cost_total", research.cost_dollars.total
return
await asyncio.sleep(input_data.check_interval)
elapsed = time.time() - start_time
response = await Requests().get(url, headers=headers)
data = response.json()
yield "research_id", input_data.research_id
yield "final_status", data.get("status", "unknown")
yield "elapsed_time", elapsed
yield "timed_out", True
class ExaListResearchBlock(Block):
"""List all research tasks with pagination support."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
cursor: Optional[str] = SchemaField(
default=None,
description="Cursor for pagination through results",
advanced=True,
)
limit: int = SchemaField(
default=10,
description="Number of research tasks to return (1-50)",
ge=1,
le=50,
advanced=True,
)
class Output(BlockSchemaOutput):
research_tasks: List[ResearchTaskModel] = SchemaField(
description="List of research tasks ordered by creation time (newest first)"
)
research_task: ResearchTaskModel = SchemaField(
description="Individual research task (yielded for each task)"
)
has_more: bool = SchemaField(
description="Whether there are more tasks to paginate through"
)
next_cursor: Optional[str] = SchemaField(
description="Cursor for the next page of results"
)
def __init__(self):
super().__init__(
id="d4e5f6a7-8901-4bcd-9012-3456789abcde",
description="List all research tasks with pagination support",
categories={BlockCategory.SEARCH},
input_schema=ExaListResearchBlock.Input,
output_schema=ExaListResearchBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
url = "https://api.exa.ai/research/v1"
headers = {
"x-api-key": credentials.api_key.get_secret_value(),
}
params: Dict[str, Any] = {
"limit": input_data.limit,
}
if input_data.cursor:
params["cursor"] = input_data.cursor
response = await Requests().get(url, headers=headers, params=params)
data = response.json()
tasks = [ResearchTaskModel.from_api(task) for task in data.get("data", [])]
yield "research_tasks", tasks
for task in tasks:
yield "research_task", task
yield "has_more", data.get("hasMore", False)
yield "next_cursor", data.get("nextCursor")

autogpt_platform/backend/backend/blocks/exa/search.py

@@ -1,4 +1,8 @@
from datetime import datetime
from enum import Enum
from typing import Optional
from exa_py import AsyncExa
from backend.sdk import (
APIKeyCredentials,
@@ -8,12 +12,35 @@ from backend.sdk import (
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
Requests,
SchemaField,
)
from ._config import exa
from .helpers import ContentSettings
from .helpers import (
ContentSettings,
CostDollars,
ExaSearchResults,
process_contents_settings,
)
class ExaSearchTypes(Enum):
KEYWORD = "keyword"
NEURAL = "neural"
FAST = "fast"
AUTO = "auto"
class ExaSearchCategories(Enum):
COMPANY = "company"
RESEARCH_PAPER = "research paper"
NEWS = "news"
PDF = "pdf"
GITHUB = "github"
TWEET = "tweet"
PERSONAL_SITE = "personal site"
LINKEDIN_PROFILE = "linkedin profile"
FINANCIAL_REPORT = "financial report"
class ExaSearchBlock(Block):
@@ -22,12 +49,18 @@ class ExaSearchBlock(Block):
description="The Exa integration requires an API Key."
)
query: str = SchemaField(description="The search query")
use_auto_prompt: bool = SchemaField(
description="Whether to use autoprompt", default=True, advanced=True
type: ExaSearchTypes = SchemaField(
description="Type of search", default=ExaSearchTypes.AUTO, advanced=True
)
type: str = SchemaField(description="Type of search", default="", advanced=True)
category: str = SchemaField(
description="Category to search within", default="", advanced=True
category: ExaSearchCategories | None = SchemaField(
description="Category to search within: company, research paper, news, pdf, github, tweet, personal site, linkedin profile, financial report",
default=None,
advanced=True,
)
user_location: str | None = SchemaField(
description="The two-letter ISO country code of the user (e.g., 'US')",
default=None,
advanced=True,
)
number_of_results: int = SchemaField(
description="Number of results to return", default=10, advanced=True
@@ -40,17 +73,17 @@ class ExaSearchBlock(Block):
default_factory=list,
advanced=True,
)
start_crawl_date: datetime = SchemaField(
description="Start date for crawled content"
start_crawl_date: datetime | None = SchemaField(
description="Start date for crawled content", advanced=True, default=None
)
end_crawl_date: datetime = SchemaField(
description="End date for crawled content"
end_crawl_date: datetime | None = SchemaField(
description="End date for crawled content", advanced=True, default=None
)
start_published_date: datetime = SchemaField(
description="Start date for published content"
start_published_date: datetime | None = SchemaField(
description="Start date for published content", advanced=True, default=None
)
end_published_date: datetime = SchemaField(
description="End date for published content"
end_published_date: datetime | None = SchemaField(
description="End date for published content", advanced=True, default=None
)
include_text: list[str] = SchemaField(
description="Text patterns to include", default_factory=list, advanced=True
@@ -63,14 +96,30 @@ class ExaSearchBlock(Block):
default=ContentSettings(),
advanced=True,
)
moderation: bool = SchemaField(
description="Enable content moderation to filter unsafe content from search results",
default=False,
advanced=True,
)
class Output(BlockSchemaOutput):
results: list = SchemaField(
description="List of search results", default_factory=list
results: list[ExaSearchResults] = SchemaField(
description="List of search results"
)
error: str = SchemaField(
description="Error message if the request failed",
result: ExaSearchResults = SchemaField(description="Single search result")
context: str = SchemaField(
description="A formatted string of the search results ready for LLMs."
)
search_type: str = SchemaField(
description="For auto searches, indicates which search type was selected."
)
resolved_search_type: str = SchemaField(
description="The search type that was actually used for this request (neural or keyword)"
)
cost_dollars: Optional[CostDollars] = SchemaField(
description="Cost breakdown for the request"
)
error: str = SchemaField(description="Error message if the request failed")
def __init__(self):
super().__init__(
@@ -84,51 +133,76 @@ class ExaSearchBlock(Block):
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
url = "https://api.exa.ai/search"
headers = {
"Content-Type": "application/json",
"x-api-key": credentials.api_key.get_secret_value(),
}
payload = {
sdk_kwargs = {
"query": input_data.query,
"useAutoprompt": input_data.use_auto_prompt,
"numResults": input_data.number_of_results,
"contents": input_data.contents.model_dump(),
"num_results": input_data.number_of_results,
}
date_field_mapping = {
"start_crawl_date": "startCrawlDate",
"end_crawl_date": "endCrawlDate",
"start_published_date": "startPublishedDate",
"end_published_date": "endPublishedDate",
}
if input_data.type:
sdk_kwargs["type"] = input_data.type.value
# Add dates if they exist
for input_field, api_field in date_field_mapping.items():
value = getattr(input_data, input_field, None)
if value:
payload[api_field] = value.strftime("%Y-%m-%dT%H:%M:%S.000Z")
if input_data.category:
sdk_kwargs["category"] = input_data.category.value
optional_field_mapping = {
"type": "type",
"category": "category",
"include_domains": "includeDomains",
"exclude_domains": "excludeDomains",
"include_text": "includeText",
"exclude_text": "excludeText",
}
if input_data.user_location:
sdk_kwargs["user_location"] = input_data.user_location
# Add other fields
for input_field, api_field in optional_field_mapping.items():
value = getattr(input_data, input_field)
if value: # Only add non-empty values
payload[api_field] = value
# Handle domains
if input_data.include_domains:
sdk_kwargs["include_domains"] = input_data.include_domains
if input_data.exclude_domains:
sdk_kwargs["exclude_domains"] = input_data.exclude_domains
try:
response = await Requests().post(url, headers=headers, json=payload)
data = response.json()
# Extract just the results array from the response
yield "results", data.get("results", [])
except Exception as e:
yield "error", str(e)
# Handle dates
if input_data.start_crawl_date:
sdk_kwargs["start_crawl_date"] = input_data.start_crawl_date.isoformat()
if input_data.end_crawl_date:
sdk_kwargs["end_crawl_date"] = input_data.end_crawl_date.isoformat()
if input_data.start_published_date:
sdk_kwargs["start_published_date"] = (
input_data.start_published_date.isoformat()
)
if input_data.end_published_date:
sdk_kwargs["end_published_date"] = input_data.end_published_date.isoformat()
# Handle text filters
if input_data.include_text:
sdk_kwargs["include_text"] = input_data.include_text
if input_data.exclude_text:
sdk_kwargs["exclude_text"] = input_data.exclude_text
if input_data.moderation:
sdk_kwargs["moderation"] = input_data.moderation
# check if we need to use search_and_contents
content_settings = process_contents_settings(input_data.contents)
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
if content_settings:
sdk_kwargs["text"] = content_settings.get("text", False)
if "highlights" in content_settings:
sdk_kwargs["highlights"] = content_settings["highlights"]
if "summary" in content_settings:
sdk_kwargs["summary"] = content_settings["summary"]
response = await aexa.search_and_contents(**sdk_kwargs)
else:
response = await aexa.search(**sdk_kwargs)
converted_results = [
ExaSearchResults.from_sdk(sdk_result)
for sdk_result in response.results or []
]
yield "results", converted_results
for result in converted_results:
yield "result", result
if response.context:
yield "context", response.context
if response.resolved_search_type:
yield "resolved_search_type", response.resolved_search_type
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars

View File

@@ -1,5 +1,7 @@
from datetime import datetime
from typing import Any
from typing import Optional
from exa_py import AsyncExa
from backend.sdk import (
APIKeyCredentials,
@@ -9,12 +11,16 @@ from backend.sdk import (
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
Requests,
SchemaField,
)
from ._config import exa
from .helpers import ContentSettings
from .helpers import (
ContentSettings,
CostDollars,
ExaSearchResults,
process_contents_settings,
)
class ExaFindSimilarBlock(Block):
@@ -29,7 +35,7 @@ class ExaFindSimilarBlock(Block):
description="Number of results to return", default=10, advanced=True
)
include_domains: list[str] = SchemaField(
description="Domains to include in search",
description="List of domains to include in the search. If specified, results will only come from these domains.",
default_factory=list,
advanced=True,
)
@@ -38,17 +44,17 @@ class ExaFindSimilarBlock(Block):
default_factory=list,
advanced=True,
)
start_crawl_date: datetime = SchemaField(
description="Start date for crawled content"
start_crawl_date: Optional[datetime] = SchemaField(
description="Start date for crawled content", advanced=True, default=None
)
end_crawl_date: datetime = SchemaField(
description="End date for crawled content"
end_crawl_date: Optional[datetime] = SchemaField(
description="End date for crawled content", advanced=True, default=None
)
start_published_date: datetime = SchemaField(
description="Start date for published content"
start_published_date: Optional[datetime] = SchemaField(
description="Start date for published content", advanced=True, default=None
)
end_published_date: datetime = SchemaField(
description="End date for published content"
end_published_date: Optional[datetime] = SchemaField(
description="End date for published content", advanced=True, default=None
)
include_text: list[str] = SchemaField(
description="Text patterns to include (max 1 string, up to 5 words)",
@@ -65,15 +71,27 @@ class ExaFindSimilarBlock(Block):
default=ContentSettings(),
advanced=True,
)
moderation: bool = SchemaField(
description="Enable content moderation to filter unsafe content from search results",
default=False,
advanced=True,
)
class Output(BlockSchemaOutput):
results: list[Any] = SchemaField(
description="List of similar documents with title, URL, published date, author, and score",
default_factory=list,
results: list[ExaSearchResults] = SchemaField(
description="List of similar documents with metadata and content"
)
error: str = SchemaField(
description="Error message if the request failed", default=""
result: ExaSearchResults = SchemaField(
description="Single similar document result"
)
context: str = SchemaField(
description="A formatted string of the results ready for LLMs."
)
request_id: str = SchemaField(description="Unique identifier for the request")
cost_dollars: Optional[CostDollars] = SchemaField(
description="Cost breakdown for the request"
)
error: str = SchemaField(description="Error message if the request failed")
def __init__(self):
super().__init__(
@@ -87,47 +105,65 @@ class ExaFindSimilarBlock(Block):
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
url = "https://api.exa.ai/findSimilar"
headers = {
"Content-Type": "application/json",
"x-api-key": credentials.api_key.get_secret_value(),
}
payload = {
sdk_kwargs = {
"url": input_data.url,
"numResults": input_data.number_of_results,
"contents": input_data.contents.model_dump(),
"num_results": input_data.number_of_results,
}
optional_field_mapping = {
"include_domains": "includeDomains",
"exclude_domains": "excludeDomains",
"include_text": "includeText",
"exclude_text": "excludeText",
}
# Handle domains
if input_data.include_domains:
sdk_kwargs["include_domains"] = input_data.include_domains
if input_data.exclude_domains:
sdk_kwargs["exclude_domains"] = input_data.exclude_domains
# Add optional fields if they have values
for input_field, api_field in optional_field_mapping.items():
value = getattr(input_data, input_field)
if value: # Only add non-empty values
payload[api_field] = value
# Handle dates
if input_data.start_crawl_date:
sdk_kwargs["start_crawl_date"] = input_data.start_crawl_date.isoformat()
if input_data.end_crawl_date:
sdk_kwargs["end_crawl_date"] = input_data.end_crawl_date.isoformat()
if input_data.start_published_date:
sdk_kwargs["start_published_date"] = (
input_data.start_published_date.isoformat()
)
if input_data.end_published_date:
sdk_kwargs["end_published_date"] = input_data.end_published_date.isoformat()
date_field_mapping = {
"start_crawl_date": "startCrawlDate",
"end_crawl_date": "endCrawlDate",
"start_published_date": "startPublishedDate",
"end_published_date": "endPublishedDate",
}
# Handle text filters
if input_data.include_text:
sdk_kwargs["include_text"] = input_data.include_text
if input_data.exclude_text:
sdk_kwargs["exclude_text"] = input_data.exclude_text
# Add dates if they exist
for input_field, api_field in date_field_mapping.items():
value = getattr(input_data, input_field, None)
if value:
payload[api_field] = value.strftime("%Y-%m-%dT%H:%M:%S.000Z")
if input_data.moderation:
sdk_kwargs["moderation"] = input_data.moderation
try:
response = await Requests().post(url, headers=headers, json=payload)
data = response.json()
yield "results", data.get("results", [])
except Exception as e:
yield "error", str(e)
# check if we need to use find_similar_and_contents
content_settings = process_contents_settings(input_data.contents)
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
if content_settings:
# Use find_similar_and_contents when contents are requested
sdk_kwargs["text"] = content_settings.get("text", False)
if "highlights" in content_settings:
sdk_kwargs["highlights"] = content_settings["highlights"]
if "summary" in content_settings:
sdk_kwargs["summary"] = content_settings["summary"]
response = await aexa.find_similar_and_contents(**sdk_kwargs)
else:
response = await aexa.find_similar(**sdk_kwargs)
converted_results = [
ExaSearchResults.from_sdk(sdk_result)
for sdk_result in response.results or []
]
yield "results", converted_results
for result in converted_results:
yield "result", result
if response.context:
yield "context", response.context
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars

View File

@@ -132,45 +132,33 @@ class ExaWebsetWebhookBlock(Block):
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
"""Process incoming Exa webhook payload."""
payload = input_data.payload
# Extract event details
event_type = payload.get("eventType", "unknown")
event_id = payload.get("eventId", "")
# Get webset ID from payload or input
webset_id = payload.get("websetId", input_data.webset_id)
# Check if we should process this event based on filter
should_process = self._should_process_event(event_type, input_data.event_filter)
if not should_process:
# Skip events that don't match our filter
return
# Extract event data
event_data = payload.get("data", {})
timestamp = payload.get("occurredAt", payload.get("createdAt", ""))
metadata = payload.get("metadata", {})
yield "event_type", event_type
yield "event_id", event_id
yield "webset_id", webset_id
yield "data", event_data
yield "timestamp", timestamp
yield "metadata", metadata
def _should_process_event(
self, event_type: str, event_filter: WebsetEventFilter
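
For reference, a hypothetical payload of the shape this handler reads; each key below corresponds to a .get() call in the run method, and all values are invented:

example_payload = {
    "eventType": "webset.item.created",  # matched against the event filter
    "eventId": "evt_123",
    "websetId": "ws_456",
    "data": {"itemId": "item_789"},  # yielded as "data"
    "occurredAt": "2024-01-01T00:00:00Z",
    "metadata": {"source": "exa"},
}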

File diff suppressed because it is too large

View File

@@ -0,0 +1,554 @@
"""
Exa Websets Enrichment Management Blocks
This module provides blocks for creating and managing enrichments on webset items,
allowing extraction of additional structured data from existing items.
"""
from enum import Enum
from typing import Any, Dict, List, Optional
from exa_py import AsyncExa
from exa_py.websets.types import WebsetEnrichment as SdkWebsetEnrichment
from pydantic import BaseModel
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
Requests,
SchemaField,
)
from ._config import exa
# Mirrored model for stability
class WebsetEnrichmentModel(BaseModel):
"""Stable output model mirroring SDK WebsetEnrichment."""
id: str
webset_id: str
status: str
title: Optional[str]
description: str
format: str
options: List[str]
instructions: Optional[str]
metadata: Dict[str, Any]
created_at: str
updated_at: str
@classmethod
def from_sdk(cls, enrichment: SdkWebsetEnrichment) -> "WebsetEnrichmentModel":
"""Convert SDK WebsetEnrichment to our stable model."""
# Extract options
options_list = []
if enrichment.options:
for option in enrichment.options:
option_dict = option.model_dump(by_alias=True)
options_list.append(option_dict.get("label", ""))
return cls(
id=enrichment.id,
webset_id=enrichment.webset_id,
status=(
enrichment.status.value
if hasattr(enrichment.status, "value")
else str(enrichment.status)
),
title=enrichment.title,
description=enrichment.description,
format=(
enrichment.format.value
if enrichment.format and hasattr(enrichment.format, "value")
else "text"
),
options=options_list,
instructions=enrichment.instructions,
metadata=enrichment.metadata if enrichment.metadata else {},
created_at=(
enrichment.created_at.isoformat() if enrichment.created_at else ""
),
updated_at=(
enrichment.updated_at.isoformat() if enrichment.updated_at else ""
),
)
class EnrichmentFormat(str, Enum):
"""Format types for enrichment responses."""
TEXT = "text" # Free text response
DATE = "date" # Date/datetime format
NUMBER = "number" # Numeric value
OPTIONS = "options" # Multiple choice from provided options
EMAIL = "email" # Email address format
PHONE = "phone" # Phone number format
class ExaCreateEnrichmentBlock(Block):
"""Create a new enrichment to extract additional data from webset items."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
description: str = SchemaField(
description="What data to extract from each item",
placeholder="Extract the company's main product or service offering",
)
title: Optional[str] = SchemaField(
default=None,
description="Short title for this enrichment (auto-generated if not provided)",
placeholder="Main Product",
)
format: EnrichmentFormat = SchemaField(
default=EnrichmentFormat.TEXT,
description="Expected format of the extracted data",
)
options: list[str] = SchemaField(
default_factory=list,
description="Available options when format is 'options'",
placeholder='["B2B", "B2C", "Both", "Unknown"]',
advanced=True,
)
apply_to_existing: bool = SchemaField(
default=True,
description="Apply this enrichment to existing items in the webset",
)
metadata: Optional[dict] = SchemaField(
default=None,
description="Metadata to attach to the enrichment",
advanced=True,
)
wait_for_completion: bool = SchemaField(
default=False,
description="Wait for the enrichment to complete on existing items",
)
polling_timeout: int = SchemaField(
default=300,
description="Maximum time to wait for completion in seconds",
advanced=True,
ge=1,
le=600,
)
class Output(BlockSchemaOutput):
enrichment_id: str = SchemaField(
description="The unique identifier for the created enrichment"
)
webset_id: str = SchemaField(
description="The webset this enrichment belongs to"
)
status: str = SchemaField(description="Current status of the enrichment")
title: str = SchemaField(description="Title of the enrichment")
description: str = SchemaField(
description="Description of what data is extracted"
)
format: str = SchemaField(description="Format of the extracted data")
instructions: str = SchemaField(
description="Generated instructions for the enrichment"
)
items_enriched: Optional[int] = SchemaField(
description="Number of items enriched (if wait_for_completion was True)"
)
completion_time: Optional[float] = SchemaField(
description="Time taken to complete in seconds (if wait_for_completion was True)"
)
def __init__(self):
super().__init__(
id="71146ae8-0cb1-4a15-8cde-eae30de71cb6",
description="Create enrichments to extract additional structured data from webset items",
categories={BlockCategory.AI, BlockCategory.SEARCH},
input_schema=ExaCreateEnrichmentBlock.Input,
output_schema=ExaCreateEnrichmentBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
import time
# Build the payload
payload: dict[str, Any] = {
"description": input_data.description,
"format": input_data.format.value,
}
# Add title if provided
if input_data.title:
payload["title"] = input_data.title
# Add options for 'options' format
if input_data.format == EnrichmentFormat.OPTIONS and input_data.options:
payload["options"] = [{"label": opt} for opt in input_data.options]
# Add metadata if provided
if input_data.metadata:
payload["metadata"] = input_data.metadata
start_time = time.time()
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
sdk_enrichment = await aexa.websets.enrichments.create(
webset_id=input_data.webset_id, params=payload
)
enrichment_id = sdk_enrichment.id
status = (
sdk_enrichment.status.value
if hasattr(sdk_enrichment.status, "value")
else str(sdk_enrichment.status)
)
# If wait_for_completion is True and apply_to_existing is True, poll for completion
if input_data.wait_for_completion and input_data.apply_to_existing:
import asyncio
poll_interval = 5
max_interval = 30
poll_start = time.time()
items_enriched = 0
while time.time() - poll_start < input_data.polling_timeout:
current_enrich = await aexa.websets.enrichments.get(
webset_id=input_data.webset_id, id=enrichment_id
)
current_status = (
current_enrich.status.value
if hasattr(current_enrich.status, "value")
else str(current_enrich.status)
)
if current_status in ["completed", "failed", "cancelled"]:
# Estimate items from webset searches
webset = await aexa.websets.get(id=input_data.webset_id)
if webset.searches:
for search in webset.searches:
if search.progress:
items_enriched += search.progress.found
completion_time = time.time() - start_time
yield "enrichment_id", enrichment_id
yield "webset_id", input_data.webset_id
yield "status", current_status
yield "title", sdk_enrichment.title
yield "description", input_data.description
yield "format", input_data.format.value
yield "instructions", sdk_enrichment.instructions
yield "items_enriched", items_enriched
yield "completion_time", completion_time
return
await asyncio.sleep(poll_interval)
poll_interval = min(poll_interval * 1.5, max_interval)
# Timeout
completion_time = time.time() - start_time
yield "enrichment_id", enrichment_id
yield "webset_id", input_data.webset_id
yield "status", status
yield "title", sdk_enrichment.title
yield "description", input_data.description
yield "format", input_data.format.value
yield "instructions", sdk_enrichment.instructions
yield "items_enriched", 0
yield "completion_time", completion_time
else:
yield "enrichment_id", enrichment_id
yield "webset_id", input_data.webset_id
yield "status", status
yield "title", sdk_enrichment.title
yield "description", input_data.description
yield "format", input_data.format.value
yield "instructions", sdk_enrichment.instructions
class ExaGetEnrichmentBlock(Block):
"""Get the status and details of a webset enrichment."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
enrichment_id: str = SchemaField(
description="The ID of the enrichment to retrieve",
placeholder="enrichment-id",
)
class Output(BlockSchemaOutput):
enrichment_id: str = SchemaField(
description="The unique identifier for the enrichment"
)
status: str = SchemaField(description="Current status of the enrichment")
title: str = SchemaField(description="Title of the enrichment")
description: str = SchemaField(
description="Description of what data is extracted"
)
format: str = SchemaField(description="Format of the extracted data")
options: list[str] = SchemaField(
description="Available options (for 'options' format)"
)
instructions: str = SchemaField(
description="Generated instructions for the enrichment"
)
created_at: str = SchemaField(description="When the enrichment was created")
updated_at: str = SchemaField(
description="When the enrichment was last updated"
)
metadata: dict = SchemaField(description="Metadata attached to the enrichment")
def __init__(self):
super().__init__(
id="b8c9d0e1-f2a3-4567-89ab-cdef01234567",
description="Get the status and details of a webset enrichment",
categories={BlockCategory.SEARCH},
input_schema=ExaGetEnrichmentBlock.Input,
output_schema=ExaGetEnrichmentBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
sdk_enrichment = await aexa.websets.enrichments.get(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
enrichment = WebsetEnrichmentModel.from_sdk(sdk_enrichment)
yield "enrichment_id", enrichment.id
yield "status", enrichment.status
yield "title", enrichment.title
yield "description", enrichment.description
yield "format", enrichment.format
yield "options", enrichment.options
yield "instructions", enrichment.instructions
yield "created_at", enrichment.created_at
yield "updated_at", enrichment.updated_at
yield "metadata", enrichment.metadata
class ExaUpdateEnrichmentBlock(Block):
"""Update an existing enrichment configuration."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
enrichment_id: str = SchemaField(
description="The ID of the enrichment to update",
placeholder="enrichment-id",
)
description: Optional[str] = SchemaField(
default=None,
description="New description for what data to extract",
)
format: Optional[EnrichmentFormat] = SchemaField(
default=None,
description="New format for the extracted data",
)
options: Optional[list[str]] = SchemaField(
default=None,
description="New options when format is 'options'",
)
metadata: Optional[dict] = SchemaField(
default=None,
description="New metadata to attach to the enrichment",
)
class Output(BlockSchemaOutput):
enrichment_id: str = SchemaField(
description="The unique identifier for the enrichment"
)
status: str = SchemaField(description="Current status of the enrichment")
title: str = SchemaField(description="Title of the enrichment")
description: str = SchemaField(description="Updated description")
format: str = SchemaField(description="Updated format")
success: str = SchemaField(description="Whether the update was successful")
def __init__(self):
super().__init__(
id="c8d5c5fb-9684-4a29-bd2a-5b38d71776c9",
description="Update an existing enrichment configuration",
categories={BlockCategory.SEARCH},
input_schema=ExaUpdateEnrichmentBlock.Input,
output_schema=ExaUpdateEnrichmentBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
url = f"https://api.exa.ai/websets/v0/websets/{input_data.webset_id}/enrichments/{input_data.enrichment_id}"
headers = {
"Content-Type": "application/json",
"x-api-key": credentials.api_key.get_secret_value(),
}
# Build the update payload
payload = {}
if input_data.description is not None:
payload["description"] = input_data.description
if input_data.format is not None:
payload["format"] = input_data.format.value
if input_data.options is not None:
payload["options"] = [{"label": opt} for opt in input_data.options]
if input_data.metadata is not None:
payload["metadata"] = input_data.metadata
try:
response = await Requests().patch(url, headers=headers, json=payload)
data = response.json()
yield "enrichment_id", data.get("id", "")
yield "status", data.get("status", "")
yield "title", data.get("title", "")
yield "description", data.get("description", "")
yield "format", data.get("format", "")
yield "success", "true"
except ValueError as e:
# Re-raise user input validation errors
raise ValueError(f"Failed to update enrichment: {e}") from e
# Let all other exceptions propagate naturally
class ExaDeleteEnrichmentBlock(Block):
"""Delete an enrichment from a webset."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
enrichment_id: str = SchemaField(
description="The ID of the enrichment to delete",
placeholder="enrichment-id",
)
class Output(BlockSchemaOutput):
enrichment_id: str = SchemaField(description="The ID of the deleted enrichment")
success: str = SchemaField(description="Whether the deletion was successful")
def __init__(self):
super().__init__(
id="b250de56-2ca6-4237-a7b8-b5684892189f",
description="Delete an enrichment from a webset",
categories={BlockCategory.SEARCH},
input_schema=ExaDeleteEnrichmentBlock.Input,
output_schema=ExaDeleteEnrichmentBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
deleted_enrichment = await aexa.websets.enrichments.delete(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
yield "enrichment_id", deleted_enrichment.id
yield "success", "true"
class ExaCancelEnrichmentBlock(Block):
"""Cancel a running enrichment operation."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
enrichment_id: str = SchemaField(
description="The ID of the enrichment to cancel",
placeholder="enrichment-id",
)
class Output(BlockSchemaOutput):
enrichment_id: str = SchemaField(
description="The ID of the canceled enrichment"
)
status: str = SchemaField(description="Status after cancellation")
items_enriched_before_cancel: int = SchemaField(
description="Approximate number of items enriched before cancellation"
)
success: str = SchemaField(
description="Whether the cancellation was successful"
)
def __init__(self):
super().__init__(
id="7e1f8f0f-b6ab-43b3-bd1d-0c534a649295",
description="Cancel a running enrichment operation",
categories={BlockCategory.SEARCH},
input_schema=ExaCancelEnrichmentBlock.Input,
output_schema=ExaCancelEnrichmentBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
canceled_enrichment = await aexa.websets.enrichments.cancel(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
# Try to estimate how many items were enriched before cancellation
items_enriched = 0
items_response = await aexa.websets.items.list(
webset_id=input_data.webset_id, limit=100
)
for sdk_item in items_response.data:
# Check if this enrichment is present
for enrich_result in sdk_item.enrichments:
if enrich_result.enrichment_id == input_data.enrichment_id:
items_enriched += 1
break
status = (
canceled_enrichment.status.value
if hasattr(canceled_enrichment.status, "value")
else str(canceled_enrichment.status)
)
yield "enrichment_id", canceled_enrichment.id
yield "status", status
yield "items_enriched_before_cancel", items_enriched
yield "success", "true"

View File

@@ -0,0 +1,676 @@
"""
Exa Websets Import/Export Management Blocks
This module provides blocks for importing data into websets from CSV files
and exporting webset data in various formats.
"""
import csv
import json
from enum import Enum
from io import StringIO
from typing import Optional, Union
from exa_py import AsyncExa
from exa_py.websets.types import CreateImportResponse
from exa_py.websets.types import Import as SdkImport
from pydantic import BaseModel
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import exa
from ._test import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT
# Mirrored model for stability - don't use SDK types directly in block outputs
class ImportModel(BaseModel):
"""Stable output model mirroring SDK Import."""
id: str
status: str
title: str
format: str
entity_type: str
count: int
upload_url: Optional[str] # Only in CreateImportResponse
upload_valid_until: Optional[str] # Only in CreateImportResponse
failed_reason: str
failed_message: str
metadata: dict
created_at: str
updated_at: str
@classmethod
def from_sdk(
cls, import_obj: Union[SdkImport, CreateImportResponse]
) -> "ImportModel":
"""Convert SDK Import or CreateImportResponse to our stable model."""
# Extract entity type from union (may be None)
entity_type = "unknown"
if import_obj.entity:
entity_dict = import_obj.entity.model_dump(by_alias=True, exclude_none=True)
entity_type = entity_dict.get("type", "unknown")
# Handle status enum
status_str = (
import_obj.status.value
if hasattr(import_obj.status, "value")
else str(import_obj.status)
)
# Handle format enum
format_str = (
import_obj.format.value
if hasattr(import_obj.format, "value")
else str(import_obj.format)
)
# Handle failed_reason enum (may be None or enum)
failed_reason_str = ""
if import_obj.failed_reason:
failed_reason_str = (
import_obj.failed_reason.value
if hasattr(import_obj.failed_reason, "value")
else str(import_obj.failed_reason)
)
return cls(
id=import_obj.id,
status=status_str,
title=import_obj.title or "",
format=format_str,
entity_type=entity_type,
count=int(import_obj.count or 0),
upload_url=getattr(
import_obj, "upload_url", None
), # Only in CreateImportResponse
upload_valid_until=getattr(
import_obj, "upload_valid_until", None
), # Only in CreateImportResponse
failed_reason=failed_reason_str,
failed_message=import_obj.failed_message or "",
metadata=import_obj.metadata or {},
created_at=(
import_obj.created_at.isoformat() if import_obj.created_at else ""
),
updated_at=(
import_obj.updated_at.isoformat() if import_obj.updated_at else ""
),
)
class ImportFormat(str, Enum):
"""Supported import formats."""
CSV = "csv"
# JSON = "json" # Future support
class ImportEntityType(str, Enum):
"""Entity types for imports."""
COMPANY = "company"
PERSON = "person"
ARTICLE = "article"
RESEARCH_PAPER = "research_paper"
CUSTOM = "custom"
class ExportFormat(str, Enum):
"""Supported export formats."""
JSON = "json"
CSV = "csv"
JSON_LINES = "jsonl"
class ExaCreateImportBlock(Block):
"""Create an import to load external data that can be used with websets."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
title: str = SchemaField(
description="Title for this import",
placeholder="Customer List Import",
)
csv_data: str = SchemaField(
description="CSV data to import (as a string)",
placeholder="name,url\nAcme Corp,https://acme.com\nExample Inc,https://example.com",
)
entity_type: ImportEntityType = SchemaField(
default=ImportEntityType.COMPANY,
description="Type of entities being imported",
)
entity_description: Optional[str] = SchemaField(
default=None,
description="Description for custom entity type",
advanced=True,
)
identifier_column: int = SchemaField(
default=0,
description="Column index containing the identifier (0-based)",
ge=0,
)
url_column: Optional[int] = SchemaField(
default=None,
description="Column index containing URLs (optional)",
ge=0,
advanced=True,
)
metadata: Optional[dict] = SchemaField(
default=None,
description="Metadata to attach to the import",
advanced=True,
)
class Output(BlockSchemaOutput):
import_id: str = SchemaField(
description="The unique identifier for the created import"
)
status: str = SchemaField(description="Current status of the import")
title: str = SchemaField(description="Title of the import")
count: int = SchemaField(description="Number of items in the import")
entity_type: str = SchemaField(description="Type of entities imported")
upload_url: Optional[str] = SchemaField(
description="Upload URL for CSV data (only if csv_data not provided in request)"
)
upload_valid_until: Optional[str] = SchemaField(
description="Expiration time for upload URL (only if upload_url is provided)"
)
created_at: str = SchemaField(description="When the import was created")
def __init__(self):
super().__init__(
id="020a35d8-8a53-4e60-8b60-1de5cbab1df3",
description="Import CSV data to use with websets for targeted searches",
categories={BlockCategory.DATA},
input_schema=ExaCreateImportBlock.Input,
output_schema=ExaCreateImportBlock.Output,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"title": "Test Import",
"csv_data": "name,url\nAcme,https://acme.com",
"entity_type": ImportEntityType.COMPANY,
"identifier_column": 0,
},
test_output=[
("import_id", "import-123"),
("status", "pending"),
("title", "Test Import"),
("count", 1),
("entity_type", "company"),
("upload_url", None),
("upload_valid_until", None),
("created_at", "2024-01-01T00:00:00"),
],
test_credentials=TEST_CREDENTIALS,
test_mock=self._create_test_mock(),
)
@staticmethod
def _create_test_mock():
"""Create test mocks for the AsyncExa SDK."""
from datetime import datetime
from unittest.mock import AsyncMock, MagicMock
# Create mock SDK import object
mock_import = MagicMock()
mock_import.id = "import-123"
mock_import.status = MagicMock(value="pending")
mock_import.title = "Test Import"
mock_import.format = MagicMock(value="csv")
mock_import.count = 1
mock_import.upload_url = None
mock_import.upload_valid_until = None
mock_import.failed_reason = None
mock_import.failed_message = ""
mock_import.metadata = {}
mock_import.created_at = datetime.fromisoformat("2024-01-01T00:00:00")
mock_import.updated_at = datetime.fromisoformat("2024-01-01T00:00:00")
# Mock entity
mock_entity = MagicMock()
mock_entity.model_dump = MagicMock(return_value={"type": "company"})
mock_import.entity = mock_entity
return {
"_get_client": lambda *args, **kwargs: MagicMock(
websets=MagicMock(
imports=MagicMock(create=AsyncMock(return_value=mock_import))
)
)
}
def _get_client(self, api_key: str) -> AsyncExa:
"""Get Exa client (separated for testing)."""
return AsyncExa(api_key=api_key)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
aexa = self._get_client(credentials.api_key.get_secret_value())
csv_reader = csv.reader(StringIO(input_data.csv_data))
rows = list(csv_reader)
count = len(rows) - 1 if len(rows) > 1 else 0
size = len(input_data.csv_data.encode("utf-8"))
payload = {
"title": input_data.title,
"format": ImportFormat.CSV.value,
"count": count,
"size": size,
"csv": {
"identifier": input_data.identifier_column,
},
}
# Add URL column if specified
if input_data.url_column is not None:
payload["csv"]["url"] = input_data.url_column
# Add entity configuration
entity = {"type": input_data.entity_type.value}
if (
input_data.entity_type == ImportEntityType.CUSTOM
and input_data.entity_description
):
entity["description"] = input_data.entity_description
payload["entity"] = entity
# Add metadata if provided
if input_data.metadata:
payload["metadata"] = input_data.metadata
sdk_import = await aexa.websets.imports.create(
params=payload, csv_data=input_data.csv_data
)
import_obj = ImportModel.from_sdk(sdk_import)
yield "import_id", import_obj.id
yield "status", import_obj.status
yield "title", import_obj.title
yield "count", import_obj.count
yield "entity_type", import_obj.entity_type
yield "upload_url", import_obj.upload_url
yield "upload_valid_until", import_obj.upload_valid_until
yield "created_at", import_obj.created_at
class ExaGetImportBlock(Block):
"""Get the status and details of an import."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
import_id: str = SchemaField(
description="The ID of the import to retrieve",
placeholder="import-id",
)
class Output(BlockSchemaOutput):
import_id: str = SchemaField(description="The unique identifier for the import")
status: str = SchemaField(description="Current status of the import")
title: str = SchemaField(description="Title of the import")
format: str = SchemaField(description="Format of the imported data")
entity_type: str = SchemaField(description="Type of entities imported")
count: int = SchemaField(description="Number of items imported")
upload_url: Optional[str] = SchemaField(
description="Upload URL for CSV data (if import not yet uploaded)"
)
upload_valid_until: Optional[str] = SchemaField(
description="Expiration time for upload URL (if applicable)"
)
failed_reason: Optional[str] = SchemaField(
description="Reason for failure (if applicable)"
)
failed_message: Optional[str] = SchemaField(
description="Detailed failure message (if applicable)"
)
created_at: str = SchemaField(description="When the import was created")
updated_at: str = SchemaField(description="When the import was last updated")
metadata: dict = SchemaField(description="Metadata attached to the import")
def __init__(self):
super().__init__(
id="236663c8-a8dc-45f7-a050-2676bb0a3dd2",
description="Get the status and details of an import",
categories={BlockCategory.DATA},
input_schema=ExaGetImportBlock.Input,
output_schema=ExaGetImportBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
sdk_import = await aexa.websets.imports.get(import_id=input_data.import_id)
import_obj = ImportModel.from_sdk(sdk_import)
# Yield all fields
yield "import_id", import_obj.id
yield "status", import_obj.status
yield "title", import_obj.title
yield "format", import_obj.format
yield "entity_type", import_obj.entity_type
yield "count", import_obj.count
yield "upload_url", import_obj.upload_url
yield "upload_valid_until", import_obj.upload_valid_until
yield "failed_reason", import_obj.failed_reason
yield "failed_message", import_obj.failed_message
yield "created_at", import_obj.created_at
yield "updated_at", import_obj.updated_at
yield "metadata", import_obj.metadata
class ExaListImportsBlock(Block):
"""List all imports with pagination."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
limit: int = SchemaField(
default=25,
description="Number of imports to return",
ge=1,
le=100,
)
cursor: Optional[str] = SchemaField(
default=None,
description="Cursor for pagination",
advanced=True,
)
class Output(BlockSchemaOutput):
imports: list[dict] = SchemaField(description="List of imports")
import_item: dict = SchemaField(
description="Individual import (yielded for each import)"
)
has_more: bool = SchemaField(
description="Whether there are more imports to paginate through"
)
next_cursor: Optional[str] = SchemaField(
description="Cursor for the next page of results"
)
def __init__(self):
super().__init__(
id="65323630-f7e9-4692-a624-184ba14c0686",
description="List all imports with pagination support",
categories={BlockCategory.DATA},
input_schema=ExaListImportsBlock.Input,
output_schema=ExaListImportsBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
response = await aexa.websets.imports.list(
cursor=input_data.cursor,
limit=input_data.limit,
)
# Convert SDK imports to our stable models
imports = [ImportModel.from_sdk(i) for i in response.data]
yield "imports", [i.model_dump() for i in imports]
for import_obj in imports:
yield "import_item", import_obj.model_dump()
yield "has_more", response.has_more
yield "next_cursor", response.next_cursor
class ExaDeleteImportBlock(Block):
"""Delete an import."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
import_id: str = SchemaField(
description="The ID of the import to delete",
placeholder="import-id",
)
class Output(BlockSchemaOutput):
import_id: str = SchemaField(description="The ID of the deleted import")
success: str = SchemaField(description="Whether the deletion was successful")
def __init__(self):
super().__init__(
id="81ae30ed-c7ba-4b5d-8483-b726846e570c",
description="Delete an import",
categories={BlockCategory.DATA},
input_schema=ExaDeleteImportBlock.Input,
output_schema=ExaDeleteImportBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
deleted_import = await aexa.websets.imports.delete(import_id=input_data.import_id)
yield "import_id", deleted_import.id
yield "success", "true"
class ExaExportWebsetBlock(Block):
"""Export all data from a webset in various formats."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset to export",
placeholder="webset-id-or-external-id",
)
format: ExportFormat = SchemaField(
default=ExportFormat.JSON,
description="Export format",
)
include_content: bool = SchemaField(
default=True,
description="Include full content in export",
)
include_enrichments: bool = SchemaField(
default=True,
description="Include enrichment data in export",
)
max_items: int = SchemaField(
default=100,
description="Maximum number of items to export",
ge=1,
le=100,
)
class Output(BlockSchemaOutput):
export_data: str = SchemaField(
description="Exported data in the requested format"
)
item_count: int = SchemaField(description="Number of items exported")
total_items: int = SchemaField(
description="Total number of items in the webset"
)
truncated: bool = SchemaField(
description="Whether the export was truncated due to max_items limit"
)
format: str = SchemaField(description="Format of the exported data")
def __init__(self):
super().__init__(
id="5da9d0fd-4b5b-4318-8302-8f71d0ccce9d",
description="Export webset data in JSON, CSV, or JSON Lines format",
categories={BlockCategory.DATA},
input_schema=ExaExportWebsetBlock.Input,
output_schema=ExaExportWebsetBlock.Output,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"webset_id": "test-webset",
"format": ExportFormat.JSON,
"include_content": True,
"include_enrichments": True,
"max_items": 10,
},
test_output=[
("export_data", str),
("item_count", 2),
("total_items", 2),
("truncated", False),
("format", "json"),
],
test_credentials=TEST_CREDENTIALS,
test_mock=self._create_test_mock(),
)
@staticmethod
def _create_test_mock():
"""Create test mocks for the AsyncExa SDK."""
from unittest.mock import MagicMock
# Create mock webset items
mock_item1 = MagicMock()
mock_item1.model_dump = MagicMock(
return_value={
"id": "item-1",
"url": "https://example.com",
"title": "Test Item 1",
}
)
mock_item2 = MagicMock()
mock_item2.model_dump = MagicMock(
return_value={
"id": "item-2",
"url": "https://example.org",
"title": "Test Item 2",
}
)
# Create mock iterator
mock_items = [mock_item1, mock_item2]
return {
"_get_client": lambda *args, **kwargs: MagicMock(
websets=MagicMock(
items=MagicMock(list_all=lambda *args, **kwargs: iter(mock_items))
)
)
}
def _get_client(self, api_key: str) -> AsyncExa:
"""Get Exa client (separated for testing)."""
return AsyncExa(api_key=api_key)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = self._get_client(credentials.api_key.get_secret_value())
try:
all_items = []
# Use SDK's list_all iterator to fetch items
item_iterator = aexa.websets.items.list_all(
webset_id=input_data.webset_id, limit=input_data.max_items
)
for sdk_item in item_iterator:
if len(all_items) >= input_data.max_items:
break
# Convert to dict for export
item_dict = sdk_item.model_dump(by_alias=True, exclude_none=True)
all_items.append(item_dict)
# Calculate total and truncated
total_items = len(all_items) # SDK doesn't provide total count
truncated = len(all_items) >= input_data.max_items
# Process items based on include flags
if not input_data.include_content:
for item in all_items:
item.pop("content", None)
if not input_data.include_enrichments:
for item in all_items:
item.pop("enrichments", None)
# Format the export data
export_data = ""
if input_data.format == ExportFormat.JSON:
export_data = json.dumps(all_items, indent=2, default=str)
elif input_data.format == ExportFormat.JSON_LINES:
lines = [json.dumps(item, default=str) for item in all_items]
export_data = "\n".join(lines)
elif input_data.format == ExportFormat.CSV:
# Extract all unique keys for CSV headers
all_keys = set()
for item in all_items:
all_keys.update(self._flatten_dict(item).keys())
# Create CSV
output = StringIO()
writer = csv.DictWriter(output, fieldnames=sorted(all_keys))
writer.writeheader()
for item in all_items:
flat_item = self._flatten_dict(item)
writer.writerow(flat_item)
export_data = output.getvalue()
yield "export_data", export_data
yield "item_count", len(all_items)
yield "total_items", total_items
yield "truncated", truncated
yield "format", input_data.format.value
except ValueError as e:
# Re-raise user input validation errors
raise ValueError(f"Failed to export webset: {e}") from e
# Let all other exceptions propagate naturally
def _flatten_dict(self, d: dict, parent_key: str = "", sep: str = "_") -> dict:
"""Flatten nested dictionaries for CSV export."""
items = []
for k, v in d.items():
new_key = f"{parent_key}{sep}{k}" if parent_key else k
if isinstance(v, dict):
items.extend(self._flatten_dict(v, new_key, sep=sep).items())
elif isinstance(v, list):
# Convert lists to JSON strings for CSV
items.append((new_key, json.dumps(v, default=str)))
else:
items.append((new_key, v))
return dict(items)
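
A quick worked example of _flatten_dict (input values invented): nested keys are joined with "_" and lists are serialized to JSON strings, so every CSV cell ends up scalar.

item = {
    "id": "item-1",
    "properties": {"company": {"name": "Acme"}},
    "tags": ["b2b", "saas"],
}
# _flatten_dict(item) returns:
# {
#     "id": "item-1",
#     "properties_company_name": "Acme",
#     "tags": '["b2b", "saas"]',
# }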

View File

@@ -0,0 +1,591 @@
"""
Exa Websets Item Management Blocks
This module provides blocks for managing items within Exa websets, including
retrieving, listing, deleting, and bulk operations on webset items.
"""
from typing import Any, Dict, List, Optional
from exa_py import AsyncExa
from exa_py.websets.types import WebsetItem as SdkWebsetItem
from exa_py.websets.types import (
WebsetItemArticleProperties,
WebsetItemCompanyProperties,
WebsetItemCustomProperties,
WebsetItemPersonProperties,
WebsetItemResearchPaperProperties,
)
from pydantic import AnyUrl, BaseModel
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import exa
# Mirrored model for enrichment results
class EnrichmentResultModel(BaseModel):
"""Stable output model mirroring SDK EnrichmentResult."""
enrichment_id: str
format: str
result: Optional[List[str]]
reasoning: Optional[str]
references: List[Dict[str, Any]]
@classmethod
def from_sdk(cls, sdk_enrich) -> "EnrichmentResultModel":
"""Convert SDK EnrichmentResult to our model."""
format_str = (
sdk_enrich.format.value
if hasattr(sdk_enrich.format, "value")
else str(sdk_enrich.format)
)
# Convert references to dicts
references_list = []
if sdk_enrich.references:
for ref in sdk_enrich.references:
references_list.append(ref.model_dump(by_alias=True, exclude_none=True))
return cls(
enrichment_id=sdk_enrich.enrichment_id,
format=format_str,
result=sdk_enrich.result,
reasoning=sdk_enrich.reasoning,
references=references_list,
)
# Mirrored model for stability - don't use SDK types directly in block outputs
class WebsetItemModel(BaseModel):
"""Stable output model mirroring SDK WebsetItem."""
id: str
url: Optional[AnyUrl]
title: str
content: str
entity_data: Dict[str, Any]
enrichments: Dict[str, EnrichmentResultModel]
created_at: str
updated_at: str
@classmethod
def from_sdk(cls, item: SdkWebsetItem) -> "WebsetItemModel":
"""Convert SDK WebsetItem to our stable model."""
# Extract properties from the union type
properties_dict = {}
url_value = None
title = ""
content = ""
if hasattr(item, "properties") and item.properties:
properties_dict = item.properties.model_dump(
by_alias=True, exclude_none=True
)
# URL is always available on all property types
url_value = item.properties.url
# Extract title using isinstance checks on the union type
if isinstance(item.properties, WebsetItemPersonProperties):
title = item.properties.person.name
content = "" # Person type has no content
elif isinstance(item.properties, WebsetItemCompanyProperties):
title = item.properties.company.name
content = item.properties.content or ""
elif isinstance(item.properties, WebsetItemArticleProperties):
title = item.properties.description
content = item.properties.content or ""
elif isinstance(item.properties, WebsetItemResearchPaperProperties):
title = item.properties.description
content = item.properties.content or ""
elif isinstance(item.properties, WebsetItemCustomProperties):
title = item.properties.description
content = item.properties.content or ""
else:
# Fallback
title = item.properties.description
content = getattr(item.properties, "content", "")
# Convert enrichments from list to dict keyed by enrichment_id using Pydantic models
enrichments_dict: Dict[str, EnrichmentResultModel] = {}
if hasattr(item, "enrichments") and item.enrichments:
for sdk_enrich in item.enrichments:
enrich_model = EnrichmentResultModel.from_sdk(sdk_enrich)
enrichments_dict[enrich_model.enrichment_id] = enrich_model
return cls(
id=item.id,
url=url_value,
title=title,
content=content or "",
entity_data=properties_dict,
enrichments=enrichments_dict,
created_at=item.created_at.isoformat() if item.created_at else "",
updated_at=item.updated_at.isoformat() if item.updated_at else "",
)
class ExaGetWebsetItemBlock(Block):
"""Get a specific item from a webset by its ID."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
item_id: str = SchemaField(
description="The ID of the specific item to retrieve",
placeholder="item-id",
)
class Output(BlockSchemaOutput):
item_id: str = SchemaField(description="The unique identifier for the item")
url: str = SchemaField(description="The URL of the original source")
title: str = SchemaField(description="The title of the item")
content: str = SchemaField(description="The main content of the item")
entity_data: dict = SchemaField(description="Entity-specific structured data")
enrichments: dict = SchemaField(description="Enrichment data added to the item")
created_at: str = SchemaField(
description="When the item was added to the webset"
)
updated_at: str = SchemaField(description="When the item was last updated")
def __init__(self):
super().__init__(
id="c4a7d9e2-8f3b-4a6c-9d8e-a5b6c7d8e9f0",
description="Get a specific item from a webset by its ID",
categories={BlockCategory.SEARCH},
input_schema=ExaGetWebsetItemBlock.Input,
output_schema=ExaGetWebsetItemBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
sdk_item = await aexa.websets.items.get(
webset_id=input_data.webset_id, id=input_data.item_id
)
item = WebsetItemModel.from_sdk(sdk_item)
yield "item_id", item.id
yield "url", item.url
yield "title", item.title
yield "content", item.content
yield "entity_data", item.entity_data
yield "enrichments", item.enrichments
yield "created_at", item.created_at
yield "updated_at", item.updated_at
class ExaListWebsetItemsBlock(Block):
"""List items in a webset with pagination and optional filtering."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
limit: int = SchemaField(
default=25,
description="Number of items to return (1-100)",
ge=1,
le=100,
)
cursor: Optional[str] = SchemaField(
default=None,
description="Cursor for pagination through results",
advanced=True,
)
wait_for_items: bool = SchemaField(
default=False,
description="Wait for items to be available if webset is still processing",
advanced=True,
)
wait_timeout: int = SchemaField(
default=60,
description="Maximum time to wait for items in seconds",
advanced=True,
ge=1,
le=300,
)
class Output(BlockSchemaOutput):
items: list[WebsetItemModel] = SchemaField(
description="List of webset items",
)
webset_id: str = SchemaField(
description="The ID of the webset",
)
item: WebsetItemModel = SchemaField(
description="Individual item (yielded for each item in the list)",
)
has_more: bool = SchemaField(
description="Whether there are more items to paginate through",
)
next_cursor: Optional[str] = SchemaField(
description="Cursor for the next page of results",
)
def __init__(self):
super().__init__(
id="7b5e8c9f-01a2-43c4-95e6-f7a8b9c0d1e2",
description="List items in a webset with pagination support",
categories={BlockCategory.SEARCH},
input_schema=ExaListWebsetItemsBlock.Input,
output_schema=ExaListWebsetItemsBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
if input_data.wait_for_items:
import asyncio
import time
            # Poll with gentle exponential backoff until items appear or we time out
            start_time = time.time()
interval = 2
response = None
while time.time() - start_time < input_data.wait_timeout:
response = aexa.websets.items.list(
webset_id=input_data.webset_id,
cursor=input_data.cursor,
limit=input_data.limit,
)
if response.data:
break
await asyncio.sleep(interval)
interval = min(interval * 1.2, 10)
            if not response:
                # Defensive fallback: the loop above runs at least once given
                # wait_timeout >= 1, but fetch once more if no response was set
                response = aexa.websets.items.list(
webset_id=input_data.webset_id,
cursor=input_data.cursor,
limit=input_data.limit,
)
else:
response = aexa.websets.items.list(
webset_id=input_data.webset_id,
cursor=input_data.cursor,
limit=input_data.limit,
)
items = [WebsetItemModel.from_sdk(item) for item in response.data]
yield "items", items
for item in items:
yield "item", item
yield "has_more", response.has_more
yield "next_cursor", response.next_cursor
yield "webset_id", input_data.webset_id
class ExaDeleteWebsetItemBlock(Block):
"""Delete a specific item from a webset."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
item_id: str = SchemaField(
description="The ID of the item to delete",
placeholder="item-id",
)
class Output(BlockSchemaOutput):
item_id: str = SchemaField(description="The ID of the deleted item")
success: str = SchemaField(description="Whether the deletion was successful")
def __init__(self):
super().__init__(
id="12c57fbe-c270-4877-a2b6-d2d05529ba79",
description="Delete a specific item from a webset",
categories={BlockCategory.SEARCH},
input_schema=ExaDeleteWebsetItemBlock.Input,
output_schema=ExaDeleteWebsetItemBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
deleted_item = aexa.websets.items.delete(
webset_id=input_data.webset_id, id=input_data.item_id
)
yield "item_id", deleted_item.id
yield "success", "true"
class ExaBulkWebsetItemsBlock(Block):
"""Get all items from a webset in a single operation (with size limits)."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
max_items: int = SchemaField(
default=100,
description="Maximum number of items to retrieve (1-1000). Note: Large values may take longer.",
ge=1,
le=1000,
)
include_enrichments: bool = SchemaField(
default=True,
description="Include enrichment data for each item",
)
include_content: bool = SchemaField(
default=True,
description="Include full content for each item",
)
class Output(BlockSchemaOutput):
items: list[WebsetItemModel] = SchemaField(
description="All items from the webset"
)
item: WebsetItemModel = SchemaField(
description="Individual item (yielded for each item)"
)
total_retrieved: int = SchemaField(
description="Total number of items retrieved"
)
truncated: bool = SchemaField(
description="Whether results were truncated due to max_items limit"
)
def __init__(self):
super().__init__(
id="dbd619f5-476e-4395-af9a-a7a7c0fb8c4e",
description="Get all items from a webset in bulk (with configurable limits)",
categories={BlockCategory.SEARCH},
input_schema=ExaBulkWebsetItemsBlock.Input,
output_schema=ExaBulkWebsetItemsBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
all_items: List[WebsetItemModel] = []
item_iterator = aexa.websets.items.list_all(
webset_id=input_data.webset_id, limit=input_data.max_items
)
for sdk_item in item_iterator:
if len(all_items) >= input_data.max_items:
break
item = WebsetItemModel.from_sdk(sdk_item)
if not input_data.include_enrichments:
item.enrichments = {}
if not input_data.include_content:
item.content = ""
all_items.append(item)
yield "items", all_items
for item in all_items:
yield "item", item
yield "total_retrieved", len(all_items)
yield "truncated", len(all_items) >= input_data.max_items
class ExaWebsetItemsSummaryBlock(Block):
"""Get a summary of items in a webset without retrieving all data."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
sample_size: int = SchemaField(
default=5,
description="Number of sample items to include",
ge=0,
le=10,
)
class Output(BlockSchemaOutput):
total_items: int = SchemaField(
description="Total number of items in the webset"
)
entity_type: str = SchemaField(description="Type of entities in the webset")
sample_items: list[WebsetItemModel] = SchemaField(
description="Sample of items from the webset"
)
enrichment_columns: list[str] = SchemaField(
description="List of enrichment columns available"
)
def __init__(self):
super().__init__(
id="db7813ad-10bd-4652-8623-5667d6fecdd5",
description="Get a summary of webset items without retrieving all data",
categories={BlockCategory.SEARCH},
input_schema=ExaWebsetItemsSummaryBlock.Input,
output_schema=ExaWebsetItemsSummaryBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
webset = aexa.websets.get(id=input_data.webset_id)
entity_type = "unknown"
if webset.searches:
first_search = webset.searches[0]
if first_search.entity:
# The entity is a union type, extract type field
entity_dict = first_search.entity.model_dump(by_alias=True)
entity_type = entity_dict.get("type", "unknown")
# Get enrichment columns
enrichment_columns = []
if webset.enrichments:
            enrichment_columns = [
                e.title or e.description or "" for e in webset.enrichments
            ]
# Get sample items if requested
sample_items: List[WebsetItemModel] = []
if input_data.sample_size > 0:
items_response = aexa.websets.items.list(
webset_id=input_data.webset_id, limit=input_data.sample_size
)
# Convert to our stable models
sample_items = [
WebsetItemModel.from_sdk(item) for item in items_response.data
]
total_items = 0
if webset.searches:
for search in webset.searches:
if search.progress:
total_items += search.progress.found
yield "total_items", total_items
yield "entity_type", entity_type
yield "sample_items", sample_items
yield "enrichment_columns", enrichment_columns
class ExaGetNewItemsBlock(Block):
"""Get items added to a webset since a specific cursor (incremental processing helper)."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
since_cursor: Optional[str] = SchemaField(
default=None,
description="Cursor from previous run - only items after this will be returned. Leave empty on first run.",
placeholder="cursor-from-previous-run",
)
max_items: int = SchemaField(
default=100,
description="Maximum number of new items to retrieve",
ge=1,
le=1000,
)
class Output(BlockSchemaOutput):
new_items: list[WebsetItemModel] = SchemaField(
description="Items added since the cursor"
)
item: WebsetItemModel = SchemaField(
description="Individual item (yielded for each new item)"
)
count: int = SchemaField(description="Number of new items found")
next_cursor: Optional[str] = SchemaField(
description="Save this cursor for the next run to get only newer items"
)
has_more: bool = SchemaField(
description="Whether there are more new items beyond max_items"
)
def __init__(self):
super().__init__(
id="3ff9bdf5-9613-4d21-8a60-90eb8b69c414",
description="Get items added since a cursor - enables incremental processing without reprocessing",
categories={BlockCategory.SEARCH, BlockCategory.DATA},
input_schema=ExaGetNewItemsBlock.Input,
output_schema=ExaGetNewItemsBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
# Get items starting from cursor
response = aexa.websets.items.list(
webset_id=input_data.webset_id,
cursor=input_data.since_cursor,
limit=input_data.max_items,
)
# Convert SDK items to our stable models
new_items = [WebsetItemModel.from_sdk(item) for item in response.data]
# Yield the full list
yield "new_items", new_items
# Yield individual items for processing
for item in new_items:
yield "item", item
# Yield metadata for next run
yield "count", len(new_items)
yield "next_cursor", response.next_cursor
yield "has_more", response.has_more


@@ -0,0 +1,600 @@
"""
Exa Websets Monitor Management Blocks
This module provides blocks for creating and managing monitors that automatically
keep websets updated with fresh data on a schedule.
"""
from enum import Enum
from typing import Optional
from exa_py import AsyncExa
from exa_py.websets.types import Monitor as SdkMonitor
from pydantic import BaseModel
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import exa
from ._test import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT
# Mirrored model for stability - don't use SDK types directly in block outputs
class MonitorModel(BaseModel):
"""Stable output model mirroring SDK Monitor."""
id: str
status: str
webset_id: str
behavior_type: str
behavior_config: dict
cron_expression: str
timezone: str
next_run_at: str
last_run: dict
metadata: dict
created_at: str
updated_at: str
@classmethod
def from_sdk(cls, monitor: SdkMonitor) -> "MonitorModel":
"""Convert SDK Monitor to our stable model."""
# Extract behavior information
behavior_dict = monitor.behavior.model_dump(by_alias=True, exclude_none=True)
behavior_type = behavior_dict.get("type", "unknown")
behavior_config = behavior_dict.get("config", {})
# Extract cadence information
cadence_dict = monitor.cadence.model_dump(by_alias=True, exclude_none=True)
cron_expr = cadence_dict.get("cron", "")
timezone = cadence_dict.get("timezone", "Etc/UTC")
# Extract last run information
last_run_dict = {}
if monitor.last_run:
last_run_dict = monitor.last_run.model_dump(
by_alias=True, exclude_none=True
)
# Handle status enum
status_str = (
monitor.status.value
if hasattr(monitor.status, "value")
else str(monitor.status)
)
return cls(
id=monitor.id,
status=status_str,
webset_id=monitor.webset_id,
behavior_type=behavior_type,
behavior_config=behavior_config,
cron_expression=cron_expr,
timezone=timezone,
next_run_at=monitor.next_run_at.isoformat() if monitor.next_run_at else "",
last_run=last_run_dict,
metadata=monitor.metadata or {},
created_at=monitor.created_at.isoformat() if monitor.created_at else "",
updated_at=monitor.updated_at.isoformat() if monitor.updated_at else "",
)
class MonitorStatus(str, Enum):
"""Status of a monitor."""
ENABLED = "enabled"
DISABLED = "disabled"
PAUSED = "paused"
class MonitorBehaviorType(str, Enum):
"""Type of behavior for a monitor."""
SEARCH = "search" # Run new searches
REFRESH = "refresh" # Refresh existing items
class SearchBehavior(str, Enum):
"""How search results interact with existing items."""
APPEND = "append"
OVERRIDE = "override"
class ExaCreateMonitorBlock(Block):
"""Create a monitor to automatically keep a webset updated on a schedule."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset to monitor",
placeholder="webset-id-or-external-id",
)
# Schedule configuration
cron_expression: str = SchemaField(
description="Cron expression for scheduling (5 fields, max once per day)",
placeholder="0 9 * * 1", # Every Monday at 9 AM
)
timezone: str = SchemaField(
default="Etc/UTC",
description="IANA timezone for the schedule",
placeholder="America/New_York",
advanced=True,
)
# Behavior configuration
behavior_type: MonitorBehaviorType = SchemaField(
default=MonitorBehaviorType.SEARCH,
description="Type of monitor behavior (search for new items or refresh existing)",
)
# Search configuration (for SEARCH behavior)
search_query: Optional[str] = SchemaField(
default=None,
description="Search query for finding new items (required for search behavior)",
placeholder="AI startups that raised funding in the last week",
)
search_count: int = SchemaField(
default=10,
description="Number of items to find in each search",
ge=1,
le=100,
)
search_criteria: list[str] = SchemaField(
default_factory=list,
description="Criteria that items must meet",
advanced=True,
)
search_behavior: SearchBehavior = SchemaField(
default=SearchBehavior.APPEND,
description="How new results interact with existing items",
advanced=True,
)
entity_type: Optional[str] = SchemaField(
default=None,
description="Type of entity to search for (company, person, etc.)",
advanced=True,
)
# Refresh configuration (for REFRESH behavior)
refresh_content: bool = SchemaField(
default=True,
description="Refresh content from source URLs (for refresh behavior)",
advanced=True,
)
refresh_enrichments: bool = SchemaField(
default=True,
description="Re-run enrichments on items (for refresh behavior)",
advanced=True,
)
# Metadata
metadata: Optional[dict] = SchemaField(
default=None,
description="Metadata to attach to the monitor",
advanced=True,
)
class Output(BlockSchemaOutput):
monitor_id: str = SchemaField(
description="The unique identifier for the created monitor"
)
webset_id: str = SchemaField(description="The webset this monitor belongs to")
status: str = SchemaField(description="Status of the monitor")
behavior_type: str = SchemaField(description="Type of monitor behavior")
next_run_at: Optional[str] = SchemaField(
description="When the monitor will next run"
)
cron_expression: str = SchemaField(description="The schedule cron expression")
timezone: str = SchemaField(description="The timezone for scheduling")
created_at: str = SchemaField(description="When the monitor was created")
def __init__(self):
super().__init__(
id="f8a9b0c1-d2e3-4567-890a-bcdef1234567",
description="Create automated monitors to keep websets updated with fresh data on a schedule",
categories={BlockCategory.SEARCH},
input_schema=ExaCreateMonitorBlock.Input,
output_schema=ExaCreateMonitorBlock.Output,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"webset_id": "test-webset",
"cron_expression": "0 9 * * 1",
"behavior_type": MonitorBehaviorType.SEARCH,
"search_query": "AI startups",
"search_count": 10,
},
test_output=[
("monitor_id", "monitor-123"),
("webset_id", "test-webset"),
("status", "enabled"),
("behavior_type", "search"),
("next_run_at", "2024-01-01T00:00:00"),
("cron_expression", "0 9 * * 1"),
("timezone", "Etc/UTC"),
("created_at", "2024-01-01T00:00:00"),
],
test_credentials=TEST_CREDENTIALS,
test_mock=self._create_test_mock(),
)
@staticmethod
def _create_test_mock():
"""Create test mocks for the AsyncExa SDK."""
from datetime import datetime
from unittest.mock import MagicMock
# Create mock SDK monitor object
mock_monitor = MagicMock()
mock_monitor.id = "monitor-123"
mock_monitor.status = MagicMock(value="enabled")
mock_monitor.webset_id = "test-webset"
mock_monitor.next_run_at = datetime.fromisoformat("2024-01-01T00:00:00")
mock_monitor.created_at = datetime.fromisoformat("2024-01-01T00:00:00")
mock_monitor.updated_at = datetime.fromisoformat("2024-01-01T00:00:00")
mock_monitor.metadata = {}
mock_monitor.last_run = None
# Mock behavior
mock_behavior = MagicMock()
mock_behavior.model_dump = MagicMock(
return_value={"type": "search", "config": {}}
)
mock_monitor.behavior = mock_behavior
# Mock cadence
mock_cadence = MagicMock()
mock_cadence.model_dump = MagicMock(
return_value={"cron": "0 9 * * 1", "timezone": "Etc/UTC"}
)
mock_monitor.cadence = mock_cadence
return {
"_get_client": lambda *args, **kwargs: MagicMock(
websets=MagicMock(
monitors=MagicMock(create=lambda *args, **kwargs: mock_monitor)
)
)
}
def _get_client(self, api_key: str) -> AsyncExa:
"""Get Exa client (separated for testing)."""
return AsyncExa(api_key=api_key)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
aexa = self._get_client(credentials.api_key.get_secret_value())
# Build the payload
payload = {
"websetId": input_data.webset_id,
"cadence": {
"cron": input_data.cron_expression,
"timezone": input_data.timezone,
},
}
# Build behavior configuration based on type
if input_data.behavior_type == MonitorBehaviorType.SEARCH:
behavior_config = {
"query": input_data.search_query or "",
"count": input_data.search_count,
"behavior": input_data.search_behavior.value,
}
if input_data.search_criteria:
behavior_config["criteria"] = [
{"description": c} for c in input_data.search_criteria
]
if input_data.entity_type:
behavior_config["entity"] = {"type": input_data.entity_type}
payload["behavior"] = {
"type": "search",
"config": behavior_config,
}
else:
# REFRESH behavior
payload["behavior"] = {
"type": "refresh",
"config": {
"content": input_data.refresh_content,
"enrichments": input_data.refresh_enrichments,
},
}
# Add metadata if provided
if input_data.metadata:
payload["metadata"] = input_data.metadata
sdk_monitor = aexa.websets.monitors.create(params=payload)
monitor = MonitorModel.from_sdk(sdk_monitor)
# Yield all fields
yield "monitor_id", monitor.id
yield "webset_id", monitor.webset_id
yield "status", monitor.status
yield "behavior_type", monitor.behavior_type
yield "next_run_at", monitor.next_run_at
yield "cron_expression", monitor.cron_expression
yield "timezone", monitor.timezone
yield "created_at", monitor.created_at
class ExaGetMonitorBlock(Block):
"""Get the details and status of a monitor."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
monitor_id: str = SchemaField(
description="The ID of the monitor to retrieve",
placeholder="monitor-id",
)
class Output(BlockSchemaOutput):
monitor_id: str = SchemaField(
description="The unique identifier for the monitor"
)
webset_id: str = SchemaField(description="The webset this monitor belongs to")
status: str = SchemaField(description="Current status of the monitor")
behavior_type: str = SchemaField(description="Type of monitor behavior")
behavior_config: dict = SchemaField(
description="Configuration for the monitor behavior"
)
cron_expression: str = SchemaField(description="The schedule cron expression")
timezone: str = SchemaField(description="The timezone for scheduling")
next_run_at: Optional[str] = SchemaField(
description="When the monitor will next run"
)
last_run: Optional[dict] = SchemaField(
description="Information about the last run"
)
created_at: str = SchemaField(description="When the monitor was created")
updated_at: str = SchemaField(description="When the monitor was last updated")
metadata: dict = SchemaField(description="Metadata attached to the monitor")
def __init__(self):
super().__init__(
id="5c852a2d-d505-4a56-b711-7def8dd14e72",
description="Get the details and status of a webset monitor",
categories={BlockCategory.SEARCH},
input_schema=ExaGetMonitorBlock.Input,
output_schema=ExaGetMonitorBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
sdk_monitor = aexa.websets.monitors.get(monitor_id=input_data.monitor_id)
monitor = MonitorModel.from_sdk(sdk_monitor)
# Yield all fields
yield "monitor_id", monitor.id
yield "webset_id", monitor.webset_id
yield "status", monitor.status
yield "behavior_type", monitor.behavior_type
yield "behavior_config", monitor.behavior_config
yield "cron_expression", monitor.cron_expression
yield "timezone", monitor.timezone
yield "next_run_at", monitor.next_run_at
yield "last_run", monitor.last_run
yield "created_at", monitor.created_at
yield "updated_at", monitor.updated_at
yield "metadata", monitor.metadata
class ExaUpdateMonitorBlock(Block):
"""Update a monitor's configuration."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
monitor_id: str = SchemaField(
description="The ID of the monitor to update",
placeholder="monitor-id",
)
status: Optional[MonitorStatus] = SchemaField(
default=None,
description="New status for the monitor",
)
cron_expression: Optional[str] = SchemaField(
default=None,
description="New cron expression for scheduling",
)
timezone: Optional[str] = SchemaField(
default=None,
description="New timezone for the schedule",
advanced=True,
)
metadata: Optional[dict] = SchemaField(
default=None,
description="New metadata for the monitor",
advanced=True,
)
class Output(BlockSchemaOutput):
monitor_id: str = SchemaField(
description="The unique identifier for the monitor"
)
status: str = SchemaField(description="Updated status of the monitor")
next_run_at: Optional[str] = SchemaField(
description="When the monitor will next run"
)
updated_at: str = SchemaField(description="When the monitor was updated")
success: str = SchemaField(description="Whether the update was successful")
def __init__(self):
super().__init__(
id="245102c3-6af3-4515-a308-c2210b7939d2",
description="Update a monitor's status, schedule, or metadata",
categories={BlockCategory.SEARCH},
input_schema=ExaUpdateMonitorBlock.Input,
output_schema=ExaUpdateMonitorBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
# Build update payload
payload = {}
if input_data.status is not None:
payload["status"] = input_data.status.value
if input_data.cron_expression is not None or input_data.timezone is not None:
cadence = {}
if input_data.cron_expression:
cadence["cron"] = input_data.cron_expression
if input_data.timezone:
cadence["timezone"] = input_data.timezone
payload["cadence"] = cadence
if input_data.metadata is not None:
payload["metadata"] = input_data.metadata
sdk_monitor = aexa.websets.monitors.update(
monitor_id=input_data.monitor_id, params=payload
)
# Convert to our stable model
monitor = MonitorModel.from_sdk(sdk_monitor)
# Yield fields
yield "monitor_id", monitor.id
yield "status", monitor.status
yield "next_run_at", monitor.next_run_at
yield "updated_at", monitor.updated_at
yield "success", "true"
class ExaDeleteMonitorBlock(Block):
"""Delete a monitor from a webset."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
monitor_id: str = SchemaField(
description="The ID of the monitor to delete",
placeholder="monitor-id",
)
class Output(BlockSchemaOutput):
monitor_id: str = SchemaField(description="The ID of the deleted monitor")
success: str = SchemaField(description="Whether the deletion was successful")
def __init__(self):
super().__init__(
id="f16f9b10-0c4d-4db8-997d-7b96b6026094",
description="Delete a monitor from a webset",
categories={BlockCategory.SEARCH},
input_schema=ExaDeleteMonitorBlock.Input,
output_schema=ExaDeleteMonitorBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
deleted_monitor = aexa.websets.monitors.delete(monitor_id=input_data.monitor_id)
yield "monitor_id", deleted_monitor.id
yield "success", "true"
class ExaListMonitorsBlock(Block):
"""List all monitors with pagination."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: Optional[str] = SchemaField(
default=None,
description="Filter monitors by webset ID",
placeholder="webset-id",
)
limit: int = SchemaField(
default=25,
description="Number of monitors to return",
ge=1,
le=100,
)
cursor: Optional[str] = SchemaField(
default=None,
description="Cursor for pagination",
advanced=True,
)
class Output(BlockSchemaOutput):
monitors: list[dict] = SchemaField(description="List of monitors")
monitor: dict = SchemaField(
description="Individual monitor (yielded for each monitor)"
)
has_more: bool = SchemaField(
description="Whether there are more monitors to paginate through"
)
next_cursor: Optional[str] = SchemaField(
description="Cursor for the next page of results"
)
def __init__(self):
super().__init__(
id="f06e2b38-5397-4e8f-aa85-491149dd98df",
description="List all monitors with optional webset filtering",
categories={BlockCategory.SEARCH},
input_schema=ExaListMonitorsBlock.Input,
output_schema=ExaListMonitorsBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
response = aexa.websets.monitors.list(
cursor=input_data.cursor,
limit=input_data.limit,
webset_id=input_data.webset_id,
)
# Convert SDK monitors to our stable models
monitors = [MonitorModel.from_sdk(m) for m in response.data]
# Yield the full list
yield "monitors", [m.model_dump() for m in monitors]
# Yield individual monitors for graph chaining
for monitor in monitors:
yield "monitor", monitor.model_dump()
# Yield pagination metadata
yield "has_more", response.has_more
yield "next_cursor", response.next_cursor


@@ -0,0 +1,600 @@
"""
Exa Websets Polling Blocks
This module provides dedicated polling blocks for waiting on webset operations
to complete, with progress tracking and timeout management.
"""
import asyncio
import time
from enum import Enum
from typing import Any, Dict
from exa_py import AsyncExa
from pydantic import BaseModel
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import exa
# Import WebsetItemModel for use in enrichment samples
# This is safe as websets_items doesn't import from websets_polling
from .websets_items import WebsetItemModel
# Model for sample enrichment data
class SampleEnrichmentModel(BaseModel):
"""Sample enrichment result for display."""
item_id: str
item_title: str
enrichment_data: Dict[str, Any]
class WebsetTargetStatus(str, Enum):
IDLE = "idle"
COMPLETED = "completed"
RUNNING = "running"
PAUSED = "paused"
ANY_COMPLETE = "any_complete" # Either idle or completed
class ExaWaitForWebsetBlock(Block):
"""Wait for a webset to reach a specific status with progress tracking."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset to monitor",
placeholder="webset-id-or-external-id",
)
target_status: WebsetTargetStatus = SchemaField(
default=WebsetTargetStatus.IDLE,
description="Status to wait for (idle=all operations complete, completed=search done, running=actively processing)",
)
timeout: int = SchemaField(
default=300,
description="Maximum time to wait in seconds",
ge=1,
le=1800, # 30 minutes max
)
check_interval: int = SchemaField(
default=5,
description="Initial interval between status checks in seconds",
advanced=True,
ge=1,
le=60,
)
max_interval: int = SchemaField(
default=30,
description="Maximum interval between checks (for exponential backoff)",
advanced=True,
ge=5,
le=120,
)
include_progress: bool = SchemaField(
default=True,
description="Include detailed progress information in output",
)
class Output(BlockSchemaOutput):
webset_id: str = SchemaField(description="The webset ID that was monitored")
final_status: str = SchemaField(description="The final status of the webset")
elapsed_time: float = SchemaField(description="Total time elapsed in seconds")
item_count: int = SchemaField(description="Number of items found")
search_progress: dict = SchemaField(
description="Detailed search progress information"
)
enrichment_progress: dict = SchemaField(
description="Detailed enrichment progress information"
)
timed_out: bool = SchemaField(description="Whether the operation timed out")
def __init__(self):
super().__init__(
id="619d71e8-b72a-434d-8bd4-23376dd0342c",
description="Wait for a webset to reach a specific status with progress tracking",
categories={BlockCategory.SEARCH},
input_schema=ExaWaitForWebsetBlock.Input,
output_schema=ExaWaitForWebsetBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
start_time = time.time()
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
try:
if input_data.target_status in [
WebsetTargetStatus.IDLE,
WebsetTargetStatus.ANY_COMPLETE,
]:
final_webset = aexa.websets.wait_until_idle(
id=input_data.webset_id,
timeout=input_data.timeout,
poll_interval=input_data.check_interval,
)
elapsed = time.time() - start_time
status_str = (
final_webset.status.value
if hasattr(final_webset.status, "value")
else str(final_webset.status)
)
item_count = 0
if final_webset.searches:
for search in final_webset.searches:
if search.progress:
item_count += search.progress.found
# Extract progress if requested
search_progress = {}
enrichment_progress = {}
if input_data.include_progress:
webset_dict = final_webset.model_dump(
by_alias=True, exclude_none=True
)
search_progress = self._extract_search_progress(webset_dict)
enrichment_progress = self._extract_enrichment_progress(webset_dict)
yield "webset_id", input_data.webset_id
yield "final_status", status_str
yield "elapsed_time", elapsed
yield "item_count", item_count
if input_data.include_progress:
yield "search_progress", search_progress
yield "enrichment_progress", enrichment_progress
yield "timed_out", False
else:
# For other status targets, manually poll
interval = input_data.check_interval
while time.time() - start_time < input_data.timeout:
# Get current webset status
webset = aexa.websets.get(id=input_data.webset_id)
current_status = (
webset.status.value
if hasattr(webset.status, "value")
else str(webset.status)
)
# Check if target status reached
if current_status == input_data.target_status.value:
elapsed = time.time() - start_time
# Estimate item count from search progress
item_count = 0
if webset.searches:
for search in webset.searches:
if search.progress:
item_count += search.progress.found
search_progress = {}
enrichment_progress = {}
if input_data.include_progress:
webset_dict = webset.model_dump(
by_alias=True, exclude_none=True
)
search_progress = self._extract_search_progress(webset_dict)
enrichment_progress = self._extract_enrichment_progress(
webset_dict
)
yield "webset_id", input_data.webset_id
yield "final_status", current_status
yield "elapsed_time", elapsed
yield "item_count", item_count
if input_data.include_progress:
yield "search_progress", search_progress
yield "enrichment_progress", enrichment_progress
yield "timed_out", False
return
# Wait before next check with exponential backoff
await asyncio.sleep(interval)
interval = min(interval * 1.5, input_data.max_interval)
# Timeout reached
elapsed = time.time() - start_time
webset = aexa.websets.get(id=input_data.webset_id)
final_status = (
webset.status.value
if hasattr(webset.status, "value")
else str(webset.status)
)
item_count = 0
if webset.searches:
for search in webset.searches:
if search.progress:
item_count += search.progress.found
search_progress = {}
enrichment_progress = {}
if input_data.include_progress:
webset_dict = webset.model_dump(by_alias=True, exclude_none=True)
search_progress = self._extract_search_progress(webset_dict)
enrichment_progress = self._extract_enrichment_progress(webset_dict)
yield "webset_id", input_data.webset_id
yield "final_status", final_status
yield "elapsed_time", elapsed
yield "item_count", item_count
if input_data.include_progress:
yield "search_progress", search_progress
yield "enrichment_progress", enrichment_progress
yield "timed_out", True
except asyncio.TimeoutError:
raise ValueError(
f"Polling timed out after {input_data.timeout} seconds"
) from None
def _extract_search_progress(self, webset_data: dict) -> dict:
"""Extract search progress information from webset data."""
progress = {}
searches = webset_data.get("searches", [])
for idx, search in enumerate(searches):
search_id = search.get("id", f"search_{idx}")
search_progress = search.get("progress", {})
progress[search_id] = {
"status": search.get("status", "unknown"),
"found": search_progress.get("found", 0),
"analyzed": search_progress.get("analyzed", 0),
"completion": search_progress.get("completion", 0),
"time_left": search_progress.get("timeLeft", 0),
}
return progress
def _extract_enrichment_progress(self, webset_data: dict) -> dict:
"""Extract enrichment progress information from webset data."""
progress = {}
enrichments = webset_data.get("enrichments", [])
for idx, enrichment in enumerate(enrichments):
enrich_id = enrichment.get("id", f"enrichment_{idx}")
progress[enrich_id] = {
"status": enrichment.get("status", "unknown"),
"title": enrichment.get("title", ""),
"description": enrichment.get("description", ""),
}
return progress
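# Backoff sketch: with check_interval=5 and max_interval=30, successive waits in
# the manual polling path are 5s, 7.5s, 11.25s, 16.875s, ~25.3s, then capped at
# 30s, since interval = min(interval * 1.5, max_interval).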
class ExaWaitForSearchBlock(Block):
"""Wait for a specific webset search to complete with progress tracking."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
search_id: str = SchemaField(
description="The ID of the search to monitor",
placeholder="search-id",
)
timeout: int = SchemaField(
default=300,
description="Maximum time to wait in seconds",
ge=1,
le=1800,
)
check_interval: int = SchemaField(
default=5,
description="Initial interval between status checks in seconds",
advanced=True,
ge=1,
le=60,
)
class Output(BlockSchemaOutput):
search_id: str = SchemaField(description="The search ID that was monitored")
final_status: str = SchemaField(description="The final status of the search")
items_found: int = SchemaField(
description="Number of items found by the search"
)
items_analyzed: int = SchemaField(description="Number of items analyzed")
completion_percentage: int = SchemaField(
description="Completion percentage (0-100)"
)
elapsed_time: float = SchemaField(description="Total time elapsed in seconds")
recall_info: dict = SchemaField(
description="Information about expected results and confidence"
)
timed_out: bool = SchemaField(description="Whether the operation timed out")
def __init__(self):
super().__init__(
id="14da21ae-40a1-41bc-a111-c8e5c9ef012b",
description="Wait for a specific webset search to complete with progress tracking",
categories={BlockCategory.SEARCH},
input_schema=ExaWaitForSearchBlock.Input,
output_schema=ExaWaitForSearchBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
start_time = time.time()
interval = input_data.check_interval
max_interval = 30
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
try:
while time.time() - start_time < input_data.timeout:
# Get current search status using SDK
search = aexa.websets.searches.get(
webset_id=input_data.webset_id, id=input_data.search_id
)
# Extract status
status = (
search.status.value
if hasattr(search.status, "value")
else str(search.status)
)
# Check if search is complete
if status in ["completed", "failed", "canceled"]:
elapsed = time.time() - start_time
# Extract progress information
progress_dict = {}
if search.progress:
progress_dict = search.progress.model_dump(
by_alias=True, exclude_none=True
)
# Extract recall information
recall_info = {}
if search.recall:
recall_dict = search.recall.model_dump(
by_alias=True, exclude_none=True
)
expected = recall_dict.get("expected", {})
recall_info = {
"expected_total": expected.get("total", 0),
"confidence": expected.get("confidence", ""),
"min_expected": expected.get("bounds", {}).get("min", 0),
"max_expected": expected.get("bounds", {}).get("max", 0),
"reasoning": recall_dict.get("reasoning", ""),
}
yield "search_id", input_data.search_id
yield "final_status", status
yield "items_found", progress_dict.get("found", 0)
yield "items_analyzed", progress_dict.get("analyzed", 0)
yield "completion_percentage", progress_dict.get("completion", 0)
yield "elapsed_time", elapsed
yield "recall_info", recall_info
yield "timed_out", False
return
# Wait before next check with exponential backoff
await asyncio.sleep(interval)
interval = min(interval * 1.5, max_interval)
# Timeout reached
elapsed = time.time() - start_time
# Get last known status
search = aexa.websets.searches.get(
webset_id=input_data.webset_id, id=input_data.search_id
)
final_status = (
search.status.value
if hasattr(search.status, "value")
else str(search.status)
)
progress_dict = {}
if search.progress:
progress_dict = search.progress.model_dump(
by_alias=True, exclude_none=True
)
yield "search_id", input_data.search_id
yield "final_status", final_status
yield "items_found", progress_dict.get("found", 0)
yield "items_analyzed", progress_dict.get("analyzed", 0)
yield "completion_percentage", progress_dict.get("completion", 0)
yield "elapsed_time", elapsed
yield "timed_out", True
except asyncio.TimeoutError:
raise ValueError(
f"Search polling timed out after {input_data.timeout} seconds"
) from None
class ExaWaitForEnrichmentBlock(Block):
"""Wait for a webset enrichment to complete with progress tracking."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
enrichment_id: str = SchemaField(
description="The ID of the enrichment to monitor",
placeholder="enrichment-id",
)
timeout: int = SchemaField(
default=300,
description="Maximum time to wait in seconds",
ge=1,
le=1800,
)
check_interval: int = SchemaField(
default=5,
description="Initial interval between status checks in seconds",
advanced=True,
ge=1,
le=60,
)
sample_results: bool = SchemaField(
default=True,
description="Include sample enrichment results in output",
)
class Output(BlockSchemaOutput):
enrichment_id: str = SchemaField(
description="The enrichment ID that was monitored"
)
final_status: str = SchemaField(
description="The final status of the enrichment"
)
items_enriched: int = SchemaField(
description="Number of items successfully enriched"
)
enrichment_title: str = SchemaField(
description="Title/description of the enrichment"
)
elapsed_time: float = SchemaField(description="Total time elapsed in seconds")
sample_data: list[SampleEnrichmentModel] = SchemaField(
description="Sample of enriched data (if requested)"
)
timed_out: bool = SchemaField(description="Whether the operation timed out")
def __init__(self):
super().__init__(
id="a11865c3-ac80-4721-8a40-ac4e3b71a558",
description="Wait for a webset enrichment to complete with progress tracking",
categories={BlockCategory.SEARCH},
input_schema=ExaWaitForEnrichmentBlock.Input,
output_schema=ExaWaitForEnrichmentBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
start_time = time.time()
interval = input_data.check_interval
max_interval = 30
# Use AsyncExa SDK
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
try:
while time.time() - start_time < input_data.timeout:
# Get current enrichment status using SDK
enrichment = aexa.websets.enrichments.get(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
# Extract status
status = (
enrichment.status.value
if hasattr(enrichment.status, "value")
else str(enrichment.status)
)
# Check if enrichment is complete
if status in ["completed", "failed", "canceled"]:
elapsed = time.time() - start_time
# Get sample enriched items if requested
sample_data = []
items_enriched = 0
if input_data.sample_results and status == "completed":
sample_data, items_enriched = (
await self._get_sample_enrichments(
input_data.webset_id, input_data.enrichment_id, aexa
)
)
yield "enrichment_id", input_data.enrichment_id
yield "final_status", status
yield "items_enriched", items_enriched
yield "enrichment_title", enrichment.title or enrichment.description or ""
yield "elapsed_time", elapsed
if input_data.sample_results:
yield "sample_data", sample_data
yield "timed_out", False
return
# Wait before next check with exponential backoff
await asyncio.sleep(interval)
interval = min(interval * 1.5, max_interval)
# Timeout reached
elapsed = time.time() - start_time
# Get last known status
enrichment = aexa.websets.enrichments.get(
webset_id=input_data.webset_id, id=input_data.enrichment_id
)
final_status = (
enrichment.status.value
if hasattr(enrichment.status, "value")
else str(enrichment.status)
)
title = enrichment.title or enrichment.description or ""
yield "enrichment_id", input_data.enrichment_id
yield "final_status", final_status
yield "items_enriched", 0
yield "enrichment_title", title
yield "elapsed_time", elapsed
yield "timed_out", True
except asyncio.TimeoutError:
raise ValueError(
f"Enrichment polling timed out after {input_data.timeout} seconds"
) from None
async def _get_sample_enrichments(
self, webset_id: str, enrichment_id: str, aexa: AsyncExa
) -> tuple[list[SampleEnrichmentModel], int]:
"""Get sample enriched data and count."""
# Get a few items to see enrichment results using SDK
response = aexa.websets.items.list(webset_id=webset_id, limit=5)
sample_data: list[SampleEnrichmentModel] = []
enriched_count = 0
for sdk_item in response.data:
# Convert to our WebsetItemModel first
item = WebsetItemModel.from_sdk(sdk_item)
# Check if this item has the enrichment we're looking for
if enrichment_id in item.enrichments:
enriched_count += 1
enrich_model = item.enrichments[enrichment_id]
# Create sample using our typed model
sample = SampleEnrichmentModel(
item_id=item.id,
item_title=item.title,
enrichment_data=enrich_model.model_dump(exclude_none=True),
)
sample_data.append(sample)
return sample_data, enriched_count
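# Usage sketch (illustrative): a typical graph wires these polling blocks after
# their create counterparts, e.g. ExaCreateWebsetSearchBlock -> ExaWaitForSearchBlock
# (search_id taken from the create output), then lists items once final_status
# is "completed".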


@@ -0,0 +1,650 @@
"""
Exa Websets Search Management Blocks
This module provides blocks for creating and managing searches within websets,
including adding new searches, checking status, and canceling operations.
"""
from enum import Enum
from typing import Any, Dict, List, Optional
from exa_py import AsyncExa
from exa_py.websets.types import WebsetSearch as SdkWebsetSearch
from pydantic import BaseModel
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import exa
# Mirrored model for stability
class WebsetSearchModel(BaseModel):
"""Stable output model mirroring SDK WebsetSearch."""
id: str
webset_id: str
status: str
query: str
entity_type: str
criteria: List[Dict[str, Any]]
count: int
behavior: str
progress: Dict[str, Any]
recall: Optional[Dict[str, Any]]
created_at: str
updated_at: str
canceled_at: Optional[str]
canceled_reason: Optional[str]
metadata: Dict[str, Any]
@classmethod
def from_sdk(cls, search: SdkWebsetSearch) -> "WebsetSearchModel":
"""Convert SDK WebsetSearch to our stable model."""
# Extract entity type
entity_type = "auto"
if search.entity:
entity_dict = search.entity.model_dump(by_alias=True)
entity_type = entity_dict.get("type", "auto")
# Convert criteria
criteria = [c.model_dump(by_alias=True) for c in search.criteria]
# Convert progress
progress_dict = {}
if search.progress:
progress_dict = search.progress.model_dump(by_alias=True)
# Convert recall
recall_dict = None
if search.recall:
recall_dict = search.recall.model_dump(by_alias=True)
return cls(
id=search.id,
webset_id=search.webset_id,
status=(
search.status.value
if hasattr(search.status, "value")
else str(search.status)
),
query=search.query,
entity_type=entity_type,
criteria=criteria,
count=search.count,
behavior=search.behavior.value if search.behavior else "override",
progress=progress_dict,
recall=recall_dict,
created_at=search.created_at.isoformat() if search.created_at else "",
updated_at=search.updated_at.isoformat() if search.updated_at else "",
canceled_at=search.canceled_at.isoformat() if search.canceled_at else None,
canceled_reason=(
search.canceled_reason.value if search.canceled_reason else None
),
metadata=search.metadata if search.metadata else {},
)
class SearchBehavior(str, Enum):
"""Behavior for how new search results interact with existing items."""
OVERRIDE = "override" # Replace existing items
APPEND = "append" # Add to existing items
MERGE = "merge" # Merge with existing items
class SearchEntityType(str, Enum):
COMPANY = "company"
PERSON = "person"
ARTICLE = "article"
RESEARCH_PAPER = "research_paper"
CUSTOM = "custom"
AUTO = "auto"
class ExaCreateWebsetSearchBlock(Block):
"""Add a new search to an existing webset."""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = exa.credentials_field(
description="The Exa integration requires an API Key."
)
webset_id: str = SchemaField(
description="The ID or external ID of the Webset",
placeholder="webset-id-or-external-id",
)
query: str = SchemaField(
description="Search query describing what to find",
placeholder="Engineering managers at Fortune 500 companies",
)
count: int = SchemaField(
default=10,
description="Number of items to find",
ge=1,
le=1000,
)
# Entity configuration
entity_type: SearchEntityType = SchemaField(
default=SearchEntityType.AUTO,
description="Type of entity to search for",
)
entity_description: Optional[str] = SchemaField(
default=None,
description="Description for custom entity type",
advanced=True,
)
# Criteria for verification
criteria: list[str] = SchemaField(
default_factory=list,
description="List of criteria that items must meet. If not provided, auto-detected from query.",
advanced=True,
)
# Advanced search options
behavior: SearchBehavior = SchemaField(
default=SearchBehavior.APPEND,
description="How new results interact with existing items",
advanced=True,
)
recall: bool = SchemaField(
default=True,
description="Enable recall estimation for expected results",
advanced=True,
)
# Exclude sources
exclude_source_ids: list[str] = SchemaField(
default_factory=list,
description="IDs of imports/websets to exclude from results",
advanced=True,
)
exclude_source_types: list[str] = SchemaField(
default_factory=list,
description="Types of sources to exclude ('import' or 'webset')",
advanced=True,
)
# Scope sources
scope_source_ids: list[str] = SchemaField(
default_factory=list,
description="IDs of imports/websets to limit search scope to",
advanced=True,
)
scope_source_types: list[str] = SchemaField(
default_factory=list,
description="Types of scope sources ('import' or 'webset')",
advanced=True,
)
scope_relationships: list[str] = SchemaField(
default_factory=list,
description="Relationship definitions for hop searches",
advanced=True,
)
scope_relationship_limits: list[int] = SchemaField(
default_factory=list,
description="Limits on related entities to find",
advanced=True,
)
metadata: Optional[dict] = SchemaField(
default=None,
description="Metadata to attach to the search",
advanced=True,
)
# Polling options
wait_for_completion: bool = SchemaField(
default=False,
description="Wait for the search to complete before returning",
)
polling_timeout: int = SchemaField(
default=300,
description="Maximum time to wait for completion in seconds",
advanced=True,
ge=1,
le=600,
)
class Output(BlockSchemaOutput):
search_id: str = SchemaField(
description="The unique identifier for the created search"
)
webset_id: str = SchemaField(description="The webset this search belongs to")
status: str = SchemaField(description="Current status of the search")
query: str = SchemaField(description="The search query")
expected_results: dict = SchemaField(
description="Recall estimation of expected results"
)
items_found: Optional[int] = SchemaField(
description="Number of items found (if wait_for_completion was True)"
)
completion_time: Optional[float] = SchemaField(
description="Time taken to complete in seconds (if wait_for_completion was True)"
)
def __init__(self):
super().__init__(
id="342ff776-2e2c-4cdb-b392-4eeb34b21d5f",
description="Add a new search to an existing webset to find more items",
categories={BlockCategory.SEARCH},
input_schema=ExaCreateWebsetSearchBlock.Input,
output_schema=ExaCreateWebsetSearchBlock.Output,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
import time
# Build the payload
payload = {
"query": input_data.query,
"count": input_data.count,
"behavior": input_data.behavior.value,
"recall": input_data.recall,
}
# Add entity configuration
if input_data.entity_type != SearchEntityType.AUTO:
entity = {"type": input_data.entity_type.value}
if (
input_data.entity_type == SearchEntityType.CUSTOM
and input_data.entity_description
):
entity["description"] = input_data.entity_description
payload["entity"] = entity
# Add criteria if provided
if input_data.criteria:
payload["criteria"] = [{"description": c} for c in input_data.criteria]
# Add exclude sources
if input_data.exclude_source_ids:
exclude_list = []
for idx, src_id in enumerate(input_data.exclude_source_ids):
src_type = "import"
if input_data.exclude_source_types and idx < len(
input_data.exclude_source_types
):
src_type = input_data.exclude_source_types[idx]
exclude_list.append({"source": src_type, "id": src_id})
payload["exclude"] = exclude_list
# Add scope sources
if input_data.scope_source_ids:
scope_list: list[dict[str, Any]] = []
for idx, src_id in enumerate(input_data.scope_source_ids):
scope_item: dict[str, Any] = {"source": "import", "id": src_id}
if input_data.scope_source_types and idx < len(
input_data.scope_source_types
):
scope_item["source"] = input_data.scope_source_types[idx]
# Add relationship if provided
if input_data.scope_relationships and idx < len(
input_data.scope_relationships
):
relationship: dict[str, Any] = {
"definition": input_data.scope_relationships[idx]
}
if input_data.scope_relationship_limits and idx < len(
input_data.scope_relationship_limits
):
relationship["limit"] = input_data.scope_relationship_limits[
idx
]
scope_item["relationship"] = relationship
scope_list.append(scope_item)
payload["scope"] = scope_list
# Add metadata if provided
if input_data.metadata:
payload["metadata"] = input_data.metadata
start_time = time.time()
aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())
sdk_search = aexa.websets.searches.create(
webset_id=input_data.webset_id, params=payload
)
search_id = sdk_search.id
status = (
sdk_search.status.value
if hasattr(sdk_search.status, "value")
else str(sdk_search.status)
)
# Extract expected results from recall
expected_results = {}
if sdk_search.recall:
recall_dict = sdk_search.recall.model_dump(by_alias=True)
expected = recall_dict.get("expected", {})
expected_results = {
"total": expected.get("total", 0),
"confidence": expected.get("confidence", ""),
"min": expected.get("bounds", {}).get("min", 0),
"max": expected.get("bounds", {}).get("max", 0),
"reasoning": recall_dict.get("reasoning", ""),
}
# If wait_for_completion is True, poll for completion
if input_data.wait_for_completion:
import asyncio
poll_interval = 5
max_interval = 30
poll_start = time.time()
while time.time() - poll_start < input_data.polling_timeout:
current_search = aexa.websets.searches.get(
webset_id=input_data.webset_id, id=search_id
)
current_status = (
current_search.status.value
if hasattr(current_search.status, "value")
else str(current_search.status)
)
                if current_status in ["completed", "failed", "canceled"]:
items_found = 0
if current_search.progress:
items_found = current_search.progress.found
completion_time = time.time() - start_time
yield "search_id", search_id
yield "webset_id", input_data.webset_id
yield "status", current_status
yield "query", input_data.query
yield "expected_results", expected_results
yield "items_found", items_found
yield "completion_time", completion_time
return
await asyncio.sleep(poll_interval)
poll_interval = min(poll_interval * 1.5, max_interval)
# Timeout - yield what we have
yield "search_id", search_id
yield "webset_id", input_data.webset_id
yield "status", status
yield "query", input_data.query
yield "expected_results", expected_results
yield "items_found", 0
yield "completion_time", time.time() - start_time
else:
yield "search_id", search_id
yield "webset_id", input_data.webset_id
yield "status", status
yield "query", input_data.query
yield "expected_results", expected_results


class ExaGetWebsetSearchBlock(Block):
    """Get the status and details of a webset search."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        search_id: str = SchemaField(
            description="The ID of the search to retrieve",
            placeholder="search-id",
        )

    class Output(BlockSchemaOutput):
        search_id: str = SchemaField(description="The unique identifier for the search")
        status: str = SchemaField(description="Current status of the search")
        query: str = SchemaField(description="The search query")
        entity_type: str = SchemaField(description="Type of entity being searched")
        criteria: list[dict] = SchemaField(description="Criteria used for verification")
        progress: dict = SchemaField(description="Search progress information")
        recall: dict = SchemaField(description="Recall estimation information")
        created_at: str = SchemaField(description="When the search was created")
        updated_at: str = SchemaField(description="When the search was last updated")
        canceled_at: Optional[str] = SchemaField(
            description="When the search was canceled (if applicable)"
        )
        canceled_reason: Optional[str] = SchemaField(
            description="Reason for cancellation (if applicable)"
        )
        metadata: dict = SchemaField(description="Metadata attached to the search")

    def __init__(self):
        super().__init__(
            id="4fa3e627-a0ff-485f-8732-52148051646c",
            description="Get the status and details of a webset search",
            categories={BlockCategory.SEARCH},
            input_schema=ExaGetWebsetSearchBlock.Input,
            output_schema=ExaGetWebsetSearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        sdk_search = await aexa.websets.searches.get(
            webset_id=input_data.webset_id, id=input_data.search_id
        )
        search = WebsetSearchModel.from_sdk(sdk_search)

        # Extract progress information
        progress_info = {
            "found": search.progress.get("found", 0),
            "analyzed": search.progress.get("analyzed", 0),
            "completion": search.progress.get("completion", 0),
            "time_left": search.progress.get("timeLeft", 0),
        }

        # Extract recall information (expected totals, confidence, and bounds)
        recall_data = {}
        if search.recall:
            expected = search.recall.get("expected", {})
            recall_data = {
                "expected_total": expected.get("total", 0),
                "confidence": expected.get("confidence", ""),
                "min_expected": expected.get("bounds", {}).get("min", 0),
                "max_expected": expected.get("bounds", {}).get("max", 0),
                "reasoning": search.recall.get("reasoning", ""),
            }

        yield "search_id", search.id
        yield "status", search.status
        yield "query", search.query
        yield "entity_type", search.entity_type
        yield "criteria", search.criteria
        yield "progress", progress_info
        yield "recall", recall_data
        yield "created_at", search.created_at
        yield "updated_at", search.updated_at
        yield "canceled_at", search.canceled_at
        yield "canceled_reason", search.canceled_reason
        yield "metadata", search.metadata


class ExaCancelWebsetSearchBlock(Block):
    """Cancel a running webset search."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        search_id: str = SchemaField(
            description="The ID of the search to cancel",
            placeholder="search-id",
        )

    class Output(BlockSchemaOutput):
        search_id: str = SchemaField(description="The ID of the canceled search")
        status: str = SchemaField(description="Status after cancellation")
        items_found_before_cancel: int = SchemaField(
            description="Number of items found before cancellation"
        )
        success: str = SchemaField(
            description="Whether the cancellation was successful"
        )

    def __init__(self):
        super().__init__(
            id="74ef9f1e-ae89-4c7f-9d7d-d217214815b4",
            description="Cancel a running webset search",
            categories={BlockCategory.SEARCH},
            input_schema=ExaCancelWebsetSearchBlock.Input,
            output_schema=ExaCancelWebsetSearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        canceled_search = await aexa.websets.searches.cancel(
            webset_id=input_data.webset_id, id=input_data.search_id
        )

        # Extract items found before cancellation
        items_found = 0
        if canceled_search.progress:
            items_found = canceled_search.progress.found

        # The SDK may return status as an enum or a plain string
        status = (
            canceled_search.status.value
            if hasattr(canceled_search.status, "value")
            else str(canceled_search.status)
        )

        yield "search_id", canceled_search.id
        yield "status", status
        yield "items_found_before_cancel", items_found
        yield "success", "true"


class ExaFindOrCreateSearchBlock(Block):
    """Find existing search by query or create new one (prevents duplicate searches)."""

    class Input(BlockSchemaInput):
        credentials: CredentialsMetaInput = exa.credentials_field(
            description="The Exa integration requires an API Key."
        )
        webset_id: str = SchemaField(
            description="The ID or external ID of the Webset",
            placeholder="webset-id-or-external-id",
        )
        query: str = SchemaField(
            description="Search query to find or create",
            placeholder="AI companies in San Francisco",
        )
        count: int = SchemaField(
            default=10,
            description="Number of items to find (only used if creating new search)",
            ge=1,
            le=1000,
        )
        entity_type: SearchEntityType = SchemaField(
            default=SearchEntityType.AUTO,
            description="Entity type (only used if creating)",
            advanced=True,
        )
        behavior: SearchBehavior = SchemaField(
            default=SearchBehavior.OVERRIDE,
            description="Search behavior (only used if creating)",
            advanced=True,
        )

    class Output(BlockSchemaOutput):
        search_id: str = SchemaField(description="The search ID (existing or new)")
        webset_id: str = SchemaField(description="The webset ID")
        status: str = SchemaField(description="Current search status")
        query: str = SchemaField(description="The search query")
        was_created: bool = SchemaField(
            description="True if search was newly created, False if already existed"
        )
        items_found: int = SchemaField(
            description="Number of items found (0 if still running)"
        )

    def __init__(self):
        super().__init__(
            id="cbdb05ac-cb73-4b03-a493-6d34e9a011da",
            description="Find existing search by query or create new - prevents duplicate searches in workflows",
            categories={BlockCategory.SEARCH},
            input_schema=ExaFindOrCreateSearchBlock.Input,
            output_schema=ExaFindOrCreateSearchBlock.Output,
        )

    async def run(
        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
    ) -> BlockOutput:
        # Use AsyncExa SDK
        aexa = AsyncExa(api_key=credentials.api_key.get_secret_value())

        # Get webset to check existing searches
        webset = await aexa.websets.get(id=input_data.webset_id)

        # Look for an existing search with the same query (case-insensitive)
        existing_search = None
        if webset.searches:
            for search in webset.searches:
                if search.query.strip().lower() == input_data.query.strip().lower():
                    existing_search = search
                    break

        if existing_search:
            # Found existing search
            search = WebsetSearchModel.from_sdk(existing_search)
            yield "search_id", search.id
            yield "webset_id", input_data.webset_id
            yield "status", search.status
            yield "query", search.query
            yield "was_created", False
            yield "items_found", search.progress.get("found", 0)
        else:
            # Create new search
            payload: Dict[str, Any] = {
                "query": input_data.query,
                "count": input_data.count,
                "behavior": input_data.behavior.value,
            }

            # Add entity if not auto
            if input_data.entity_type != SearchEntityType.AUTO:
                payload["entity"] = {"type": input_data.entity_type.value}

            sdk_search = await aexa.websets.searches.create(
                webset_id=input_data.webset_id, params=payload
            )
            search = WebsetSearchModel.from_sdk(sdk_search)

            yield "search_id", search.id
            yield "webset_id", input_data.webset_id
            yield "status", search.status
            yield "query", search.query
            yield "was_created", True
            yield "items_found", 0  # Newly created, no items yet


@@ -90,7 +90,13 @@ test.describe("Build", () => { //(1)!
   });

   test("user can add blocks starting with e", async () => {
-    await addBlocksStartingWithSplit("e", 1, 1);
+    test.setTimeout(60000); // Increase timeout for many Exa blocks
+    await addBlocksStartingWithSplit("e", 1, 2);
+  });
+
+  test("user can add blocks starting with e pt 2", async () => {
+    test.setTimeout(60000); // Increase timeout for many Exa blocks
+    await addBlocksStartingWithSplit("e", 2, 2);
   });

   test("user can add blocks starting with f", async () => {


@@ -113,17 +113,20 @@ export class BuildPage extends BasePage {
     const displayName = this.getDisplayName(block.name);

     await searchInput.clear();
     await searchInput.fill(displayName);
     await this.page.waitForTimeout(500);

     const blockCard = this.page.getByTestId(`block-name-${block.id}`);
-    if (await blockCard.isVisible()) {
+    try {
+      // Wait for the block card to be visible with a reasonable timeout
+      await blockCard.waitFor({ state: "visible", timeout: 10000 });
       await blockCard.click();

       const blockInEditor = this.page.getByTestId(block.id).first();
       expect(blockInEditor).toBeAttached();
-    } else {
+    } catch (error) {
       console.log(
         `❌ ❌ Block ${block.name} (display: ${displayName}) returned from the API but not found in block list`,
       );
+      console.log(`Error: ${error}`);
     }


@@ -614,6 +614,18 @@ custom_requests = Requests(
)
```
### Error Handling

All blocks should have an error output. Catch all reasonable errors that a user can handle, wrap them in a `ValueError`, and re-raise; don't catch problems only a system admin can fix, such as the platform being out of money or an unreachable address.
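
A minimal sketch of the pattern (the `fetch_user_page` helper and `InvalidURLError` are hypothetical stand-ins):

```python
class InvalidURLError(Exception):
    """Hypothetical error the user can fix by correcting their input."""


async def run(self, input_data: Input, **kwargs) -> BlockOutput:
    try:
        result = await fetch_user_page(input_data.url)  # hypothetical helper
    except InvalidURLError as e:
        # The user can act on this, so wrap it and re-raise as ValueError.
        raise ValueError(f"Invalid URL provided: {e}") from e
    # Deliberately not caught: out-of-credits or unreachable-host errors,
    # which only a system admin can fix.
    yield "result", result
```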
### Data Models

Use Pydantic base models over `dict` and `TypedDict` where possible. Avoid untyped models for block inputs and outputs as much as possible.
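
For example, a typed model documents and validates its shape where a raw dict accepts anything (the field names below are illustrative):

```python
from pydantic import BaseModel


class SearchProgress(BaseModel):
    found: int = 0
    analyzed: int = 0
    completion: float = 0.0


# Validates types on construction and gives IDE/type-checker support:
progress = SearchProgress(found=12, analyzed=10, completion=0.83)
# An untyped dict would silently accept typos like {"fuond": 12}.
```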
### File Input

You can use `MediaFileType` to handle importing files into and exporting files out of the system. Explore how it's used throughout the system before using it in a block schema.
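
A sketch of what that can look like in a block schema (the import path and the field are assumptions for illustration, not a prescription):

```python
from backend.util.type import MediaFileType  # assumed import path


class Input(BlockSchemaInput):
    # MediaFileType is string-like; the platform resolves it to a usable
    # file reference (URL, data URI, or workspace path) on input and output.
    attachment: MediaFileType = SchemaField(
        description="File to process",
    )
```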
## Tips for Effective Block Testing
1. **Provide realistic test_input**: Ensure your test input covers typical use cases.
@@ -633,77 +645,3 @@ custom_requests = Requests(
6. **Update tests when changing block behavior**: If you modify your block, ensure the tests are updated accordingly.
By following these steps, you can create new blocks that extend the functionality of the AutoGPT Agent Server.
## Blocks we want to see
Below is a list of blocks that we would like to see implemented in the AutoGPT Agent Server. If you're interested in contributing, feel free to pick one of these blocks or choose your own.
If you would like to implement one of these blocks, open a pull request and we will start the review process.
### Consumer Services/Platforms
- Google Sheets - [~~Read/Append~~](https://github.com/Significant-Gravitas/AutoGPT/pull/8236)
- Email - Read/Send with [~~Gmail~~](https://github.com/Significant-Gravitas/AutoGPT/pull/8236), Outlook, Yahoo, Proton, etc
- Calendar - Read/Write with Google Calendar, Outlook Calendar, etc
- Home Assistant - Call Service, Get Status
- Dominos - Order Pizza, Track Order
- Uber - Book Ride, Track Ride
- Notion - Create/Read Page, Create/Append/Read DB
- Google Drive - Read/Write/Overwrite File/Folder
### Social Media
- Twitter - Post, Reply, Get Replies, Get Comments, Get Followers, Get Following, Get Tweets, Get Mentions
- Instagram - Post, Reply, Get Comments, Get Followers, Get Following, Get Posts, Get Mentions, Get Trending Posts
- TikTok - Post, Reply, Get Comments, Get Followers, Get Following, Get Videos, Get Mentions, Get Trending Videos
- LinkedIn - Post, Reply, Get Comments, Get Followers, Get Following, Get Posts, Get Mentions, Get Trending Posts
- YouTube - Transcribe Videos/Shorts, Post Videos/Shorts, Read/Reply/React to Comments, Update Thumbnails, Update Description, Update Tags, Update Titles, Get Views, Get Likes, Get Dislikes, Get Subscribers, Get Comments, Get Shares, Get Watch Time, Get Revenue, Get Trending Videos, Get Top Videos, Get Top Channels
- Reddit - Post, Reply, Get Comments, Get Followers, Get Following, Get Posts, Get Mentions, Get Trending Posts
- Treatwell (and related Platforms) - Book, Cancel, Review, Get Recommendations
- Substack - Read/Subscribe/Unsubscribe, Post/Reply, Get Recommendations
- Discord - Read/Post/Reply, Moderation actions
- GoodReads - Read/Post/Reply, Get Recommendations
### E-commerce
- Airbnb - Book, Cancel, Review, Get Recommendations
- Amazon - Order, Track Order, Return, Review, Get Recommendations
- eBay - Order, Track Order, Return, Review, Get Recommendations
- Upwork - Post Jobs, Hire Freelancer, Review Freelancer, Fire Freelancer
### Business Tools
- External Agents - Call other agents similar to AutoGPT
- Trello - Create/Read/Update/Delete Cards, Lists, Boards
- Jira - Create/Read/Update/Delete Issues, Projects, Boards
- Linear - Create/Read/Update/Delete Issues, Projects, Boards
- Excel - Read/Write/Update/Delete Rows, Columns, Sheets
- Slack - Read/Post/Reply to Messages, Create Channels, Invite Users
- ERPNext - Create/Read/Update/Delete Invoices, Orders, Customers, Products
- Salesforce - Create/Read/Update/Delete Leads, Opportunities, Accounts
- HubSpot - Create/Read/Update/Delete Contacts, Deals, Companies
- Zendesk - Create/Read/Update/Delete Tickets, Users, Organizations
- Odoo - Create/Read/Update/Delete Sales Orders, Invoices, Customers
- Shopify - Create/Read/Update/Delete Products, Orders, Customers
- WooCommerce - Create/Read/Update/Delete Products, Orders, Customers
- Squarespace - Create/Read/Update/Delete Pages, Products, Orders
## Agent Templates we want to see
### Data/Information
- Summarize top news of the day, week, or month via Apple News or other large media outlets (BBC, TechCrunch, Hacker News, etc.)
- Create, read, and summarize substack newsletters or any newsletters (blog writer vs blog reader)
- Get/Read/Summarize the most viral Twitter, Instagram, and TikTok posts (general social media accounts) of the day, week, or month
- Get/Read any LinkedIn posts or profile that mention AI Agents
- Read/Summarize Discord (might not be possible because access is required)
- Read/Get the most-read books in a given month, year, etc. from GoodReads, Amazon Books, etc.
- Get dates for specific shows across all streaming services
- Suggest/Recommend/Get most watched shows in a given month, year, etc across all streaming platforms
- Data analysis from xlsx data set
- Gather data via Excel or Google Sheets > sample the data randomly (a sample block takes top X, bottom X, random, etc.) > pass that to an LLM block to generate a script for analyzing the full data > run the script in a Python block, looping back through an LLM fix block on error > create a chart/visualization (potentially in the code block?) > show the image as output (this may require frontend changes to show)
- Tiktok video search and download
### Marketing
- Portfolio site design and enhancements