mirror of
https://github.com/simstudioai/sim.git
synced 2026-04-06 03:00:16 -04:00
improvement(connectors): audit and harden all 30 knowledge base connectors (#3603)
* improvement(connectors): audit and harden all 30 knowledge base connectors
* fix(oauth): update Notion test to match Basic Auth + JSON body config
* fix(connectors): address PR review comments for hubspot, jira, salesforce
- HubSpot: revert to Search API (POST /search) to restore lastmodifieddate DESCENDING sorting
- Salesforce: restore ArticleBody field and add it to HTML_FIELDS for proper stripping
- Jira: add zero-remaining guard to prevent requesting 0 maxResults
* fix(salesforce): revert ArticleBody — not a standard KnowledgeArticleVersion field
ArticleBody is not a standard field on KnowledgeArticleVersion per Salesforce
API docs. Article body content lives in custom fields on org-specific __kav
objects. Including ArticleBody in the SOQL query would cause runtime errors.
* fix(connectors): address second round of PR review comments
- OneDrive: use Buffer.subarray for byte-accurate truncation instead of
character-count slice
- Reddit: deduplicate comment extraction — fetchPostComments now calls
extractComments instead of duplicating the logic
- Webflow: replace crude value.includes('<') with regex /<[a-z][^>]*>/i
to avoid false positives on plain text containing '<'
- Jira: add response.ok check in getJiraCloudId before parsing JSON to
surface real HTTP errors instead of misleading "No Jira resources found"
* fix(jira,outlook): replace raw fetch in downloadJiraAttachments, fix Outlook URL encoding
- Jira: replace bare fetch() with fetchWithRetry in downloadJiraAttachments
for retry logic on transient errors and rate limits
- Outlook: use URLSearchParams in validateConfig $search URL construction
to match buildInitialUrl and produce RFC 3986 compliant encoding
.claude/commands/validate-connector.md (new file, 316 lines)
@@ -0,0 +1,316 @@
---
description: Validate an existing knowledge base connector against its service's API docs
argument-hint: <service-name> [api-docs-url]
---

# Validate Connector Skill

You are an expert auditor for Sim knowledge base connectors. Your job is to thoroughly validate that an existing connector is correct, complete, and follows all conventions.

## Your Task

When the user asks you to validate a connector:

1. Read the service's API documentation (via Context7 or WebFetch)
2. Read the connector implementation, OAuth config, and registry entries
3. Cross-reference everything against the API docs and Sim conventions
4. Report all issues found, grouped by severity (critical, warning, suggestion)
5. Fix all issues after reporting them

## Step 1: Gather All Files

Read **every** file for the connector — do not skip any:

```
apps/sim/connectors/{service}/{service}.ts   # Connector implementation
apps/sim/connectors/{service}/index.ts       # Barrel export
apps/sim/connectors/registry.ts              # Connector registry entry
apps/sim/connectors/types.ts                 # ConnectorConfig interface, ExternalDocument, etc.
apps/sim/connectors/utils.ts                 # Shared utilities (computeContentHash, htmlToPlainText, etc.)
apps/sim/lib/oauth/oauth.ts                  # OAUTH_PROVIDERS — single source of truth for scopes
apps/sim/lib/oauth/utils.ts                  # getCanonicalScopesForProvider, getScopesForService, SCOPE_DESCRIPTIONS
apps/sim/lib/oauth/types.ts                  # OAuthService union type
apps/sim/components/icons.tsx                # Icon definition for the service
```

If the connector uses selectors, also read:

```
apps/sim/hooks/selectors/registry.ts         # Selector key definitions
apps/sim/hooks/selectors/types.ts            # SelectorKey union type
apps/sim/lib/workflows/subblocks/context.ts  # SELECTOR_CONTEXT_FIELDS
```

## Step 2: Pull API Documentation

Fetch the official API docs for the service. This is the **source of truth** for:

- Endpoint URLs, HTTP methods, and auth headers
- Required vs optional parameters
- Parameter types and allowed values
- Response shapes and field names
- Pagination patterns (cursor, offset, next token)
- Rate limits and error formats
- OAuth scopes and their meanings

Use Context7 (resolve-library-id → query-docs) or WebFetch to retrieve documentation. If both fail, note which claims are based on training knowledge vs verified docs.

## Step 3: Validate API Endpoints

For **every** API call in the connector (`listDocuments`, `getDocument`, `validateConfig`, and any helper functions), verify against the API docs:

### URLs and Methods

- [ ] Base URL is correct for the service's API version
- [ ] Endpoint paths match the API docs exactly
- [ ] HTTP method is correct (GET, POST, PUT, PATCH, DELETE)
- [ ] Path parameters are correctly interpolated and URI-encoded where needed
- [ ] Query parameters use correct names and formats per the API docs

### Headers

- [ ] Authorization header uses the correct format:
  - OAuth: `Authorization: Bearer ${accessToken}`
  - API Key: correct header name per the service's docs
- [ ] `Content-Type` is set for POST/PUT/PATCH requests
- [ ] Any service-specific headers are present (e.g., `Notion-Version`, `Dropbox-API-Arg`)
- [ ] No headers are sent that the API doesn't support or silently ignores

### Request Bodies

- [ ] POST/PUT body fields match API parameter names exactly
- [ ] Required fields are always sent
- [ ] Optional fields are conditionally included (not sent as `null` or empty unless the API expects that)
- [ ] Field value types match API expectations (string vs number vs boolean)

### Input Sanitization

- [ ] User-controlled values interpolated into query strings are properly escaped:
  - OData `$filter`: single quotes escaped with `''` (e.g., `externalId.replace(/'/g, "''")`)
  - SOQL: single quotes escaped with `\'`
  - GraphQL variables: passed as variables, not interpolated into query strings
  - URL path segments: `encodeURIComponent()` applied
- [ ] URL-type config fields (e.g., `siteUrl`, `instanceUrl`) are normalized:
  - Strip `https://` / `http://` prefix if the API expects bare domains
  - Strip trailing `/`
  - Apply `.trim()` before validation
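
The escaping and normalization rules above can be sketched as small helpers (function names are illustrative, not part of the Sim codebase):

```typescript
// Hypothetical helpers illustrating the sanitization rules; names are not from Sim.

// OData $filter: escape by doubling single quotes.
function escapeODataLiteral(value: string): string {
  return value.replace(/'/g, "''")
}

// SOQL: escape single quotes with a backslash.
function escapeSoqlLiteral(value: string): string {
  return value.replace(/'/g, "\\'")
}

// URL-type config fields: trim, strip protocol prefix and trailing slashes.
function normalizeDomain(raw: string): string {
  return raw.trim().replace(/^https?:\/\//, '').replace(/\/+$/, '')
}
```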

### Response Parsing

- [ ] Response structure is correctly traversed (e.g., `data.results` vs `data.items` vs `data`)
- [ ] Field names extracted match what the API actually returns
- [ ] Nullable fields are handled with `?? null` or `|| undefined`
- [ ] Error responses are checked before accessing data fields
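
As a minimal illustration of guarded traversal (the response shape here is hypothetical, not from any specific service):

```typescript
// Hypothetical list-response shape; real connectors match the service's documented schema.
type ListResponse = { results?: Array<{ id: string; title?: string | null }> }

function extractTitles(data: ListResponse): string[] {
  // Default the container and each nullable field instead of assuming presence.
  return (data.results ?? []).map((item) => item.title ?? 'Untitled')
}
```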

## Step 4: Validate OAuth Scopes (if OAuth connector)

Scopes must be correctly declared and sufficient for all API calls the connector makes.

### Connector requiredScopes

- [ ] `requiredScopes` in the connector's `auth` config lists all scopes needed by the connector
- [ ] Each scope in `requiredScopes` is a real, valid scope recognized by the service's API
- [ ] No invalid, deprecated, or made-up scopes are listed
- [ ] No unnecessary excess scopes beyond what the connector actually needs

### Scope Subset Validation (CRITICAL)

- [ ] Every scope in `requiredScopes` exists in the OAuth provider's `scopes` array in `lib/oauth/oauth.ts`
- [ ] Find the provider in `OAUTH_PROVIDERS[providerGroup].services[serviceId].scopes`
- [ ] Verify: `requiredScopes` ⊆ `OAUTH_PROVIDERS scopes` (every required scope is present in the provider config)
- [ ] If a required scope is NOT in the provider config, flag as **critical** — the connector will fail at runtime
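
The subset check itself is simple; a sketch (the scope strings in the test are examples only):

```typescript
// Returns the scopes a connector requires that the OAuth provider config never requests.
function findMissingScopes(requiredScopes: string[], providerScopes: string[]): string[] {
  const available = new Set(providerScopes)
  return requiredScopes.filter((scope) => !available.has(scope))
}
```

Any non-empty result is a critical finding: tokens issued through the provider config can never carry the missing scope.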

### Scope Sufficiency

For each API endpoint the connector calls:

- [ ] Identify which scopes are required per the API docs
- [ ] Verify those scopes are included in the connector's `requiredScopes`
- [ ] If the connector calls endpoints requiring scopes not in `requiredScopes`, flag as **warning**

### Token Refresh Config

- [ ] Check the `getOAuthTokenRefreshConfig` function in `lib/oauth/oauth.ts` for this provider
- [ ] `useBasicAuth` matches the service's token exchange requirements
- [ ] `supportsRefreshTokenRotation` matches whether the service issues rotating refresh tokens
- [ ] Token endpoint URL is correct

## Step 5: Validate Pagination

### listDocuments Pagination

- [ ] Cursor/pagination parameter name matches the API docs
- [ ] Response pagination field is correctly extracted (e.g., `next_cursor`, `nextPageToken`, `@odata.nextLink`, `offset`)
- [ ] `hasMore` is correctly determined from the response
- [ ] `nextCursor` is correctly passed back for the next page
- [ ] `maxItems` / `maxRecords` cap is correctly applied across pages using `syncContext.totalDocsFetched`
- [ ] Page size is within the API's allowed range (not exceeding max page size)
- [ ] Last page precision: when a `maxItems` cap exists, the final page request uses `Math.min(PAGE_SIZE, remaining)` to avoid fetching more records than needed
- [ ] No off-by-one errors in pagination tracking
- [ ] The connector does NOT hit known API pagination limits silently (e.g., HubSpot search 10k cap)
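
The last-page sizing rule can be sketched as follows (the `PAGE_SIZE` value and parameter names are illustrative, not Sim constants):

```typescript
const PAGE_SIZE = 100 // illustrative page size, not a Sim constant

// Computes how many records to request for the next page under an optional cap.
function nextPageSize(maxItems: number | undefined, totalDocsFetched: number): number {
  if (maxItems === undefined) return PAGE_SIZE
  const remaining = maxItems - totalDocsFetched
  // Never request 0 or negative maxResults; callers should stop paging instead.
  return Math.max(0, Math.min(PAGE_SIZE, remaining))
}
```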

### Pagination State Across Pages

- [ ] `syncContext` is used to cache state across pages (user names, field maps, instance URLs, portal IDs, etc.)
- [ ] Cached state in `syncContext` is correctly initialized on first page and reused on subsequent pages

## Step 6: Validate Data Transformation

### ExternalDocument Construction

- [ ] `externalId` is a stable, unique identifier from the source API
- [ ] `title` is extracted from the correct field and has a sensible fallback (e.g., `'Untitled'`)
- [ ] `content` is plain text — HTML content is stripped using `htmlToPlainText` from `@/connectors/utils`
- [ ] `mimeType` is `'text/plain'`
- [ ] `contentHash` is computed using `computeContentHash` from `@/connectors/utils`
- [ ] `sourceUrl` is a valid, complete URL back to the original resource (not relative)
- [ ] `metadata` contains all fields referenced by `mapTags` and `tagDefinitions`

### Content Extraction

- [ ] Rich text / HTML fields are converted to plain text before indexing
- [ ] Important content is not silently dropped (e.g., nested blocks, table cells, code blocks)
- [ ] Content is not silently truncated without logging a warning
- [ ] Empty/blank documents are properly filtered out
- [ ] Size checks use `Buffer.byteLength(text, 'utf8')` not `text.length` when comparing against byte-based limits (e.g., `MAX_FILE_SIZE` in bytes)
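
The byte-versus-character distinction matters for any non-ASCII content; a sketch with an illustrative limit:

```typescript
const MAX_FILE_SIZE = 1024 * 1024 // illustrative 1 MiB byte limit, not Sim's actual constant

function exceedsByteLimit(text: string): boolean {
  // text.length counts UTF-16 code units, which undercounts the UTF-8 payload
  // for multi-byte characters; compare bytes against a byte-based limit.
  return Buffer.byteLength(text, 'utf8') > MAX_FILE_SIZE
}
```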

## Step 7: Validate Tag Definitions and mapTags

### tagDefinitions

- [ ] Each `tagDefinition` has an `id`, `displayName`, and `fieldType`
- [ ] `fieldType` matches the actual data type: `'text'` for strings, `'number'` for numbers, `'date'` for dates, `'boolean'` for booleans
- [ ] Every `id` in `tagDefinitions` is returned by `mapTags`
- [ ] No `tagDefinition` references a field that `mapTags` never produces

### mapTags

- [ ] Return keys match `tagDefinition` `id` values exactly
- [ ] Date values are properly parsed using `parseTagDate` from `@/connectors/utils`
- [ ] Array values are properly joined using `joinTagArray` from `@/connectors/utils`
- [ ] Number values are validated (not `NaN`)
- [ ] Metadata field names accessed in `mapTags` match what `listDocuments`/`getDocument` store in `metadata`
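
A quick way to audit the pairing is to diff the declared ids against a sample `mapTags` result (types simplified for illustration):

```typescript
type TagDefinition = { id: string; displayName: string; fieldType: 'text' | 'number' | 'date' | 'boolean' }

// Declared tag ids that a sample mapTags output never produced.
function unmappedTagIds(tagDefinitions: TagDefinition[], sampleTags: Record<string, unknown>): string[] {
  return tagDefinitions.map((def) => def.id).filter((id) => !(id in sampleTags))
}
```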

## Step 8: Validate Config Fields and Validation

### configFields

- [ ] Every field has `id`, `title`, `type`
- [ ] `required` is set explicitly (not omitted)
- [ ] Dropdown fields have `options` with `label` and `id` for each option
- [ ] Selector fields follow the canonical pair pattern:
  - A `type: 'selector'` field with `selectorKey`, `canonicalParamId`, `mode: 'basic'`
  - A `type: 'short-input'` field with the same `canonicalParamId`, `mode: 'advanced'`
  - `required` is identical on both fields in the pair
- [ ] `selectorKey` values exist in the selector registry
- [ ] `dependsOn` references selector field `id` values, not `canonicalParamId`

### validateConfig

- [ ] Validates all required fields are present before making API calls
- [ ] Validates optional numeric fields (checks `Number.isNaN`, positive values)
- [ ] Makes a lightweight API call to verify access (e.g., fetch 1 record, get profile)
- [ ] Uses `VALIDATE_RETRY_OPTIONS` for retry budget
- [ ] Returns `{ valid: true }` on success
- [ ] Returns `{ valid: false, error: 'descriptive message' }` on failure
- [ ] Catches exceptions and returns user-friendly error messages
- [ ] Does NOT make expensive calls (full data listing, large queries)
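
The result contract can be sketched as a small synchronous mapper (the status handling shown is illustrative; real connectors derive it from their lightweight probe):

```typescript
type ValidationResult = { valid: true } | { valid: false; error: string }

// Maps a lightweight probe's outcome to the validateConfig contract.
function toValidationResult(status: number, foundResource: boolean): ValidationResult {
  if (status !== 200) {
    return { valid: false, error: `Validation request failed: ${status}` }
  }
  if (!foundResource) {
    return { valid: false, error: 'Configured resource not found' }
  }
  return { valid: true }
}
```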

## Step 9: Validate getDocument

- [ ] Fetches a single document by `externalId`
- [ ] Returns `null` for 404 / not found (does not throw)
- [ ] Returns the same `ExternalDocument` shape as `listDocuments`
- [ ] Handles all content types that `listDocuments` can produce (e.g., if `listDocuments` returns both pages and blogposts, `getDocument` must handle both — not hardcode one endpoint)
- [ ] Forwards `syncContext` if it needs cached state (user names, field maps, etc.)
- [ ] Error handling is graceful (catches, logs, returns null or throws with context)
- [ ] Does not redundantly re-fetch data already included in the initial API response (e.g., if comments come back with the post, don't fetch them again separately)

## Step 10: Validate General Quality

### fetchWithRetry Usage

- [ ] All external API calls use `fetchWithRetry` from `@/lib/knowledge/documents/utils`
- [ ] No raw `fetch()` calls to external APIs
- [ ] `VALIDATE_RETRY_OPTIONS` used in `validateConfig`
- [ ] If `validateConfig` calls a shared helper (e.g., `linearGraphQL`, `resolveId`), that helper must accept and forward `retryOptions` to `fetchWithRetry`
- [ ] Default retry options used in `listDocuments`/`getDocument`

### API Efficiency

- [ ] APIs that support field selection (e.g., `$select`, `sysparm_fields`, `fields`) should request only the fields the connector needs — in both `listDocuments` AND `getDocument`
- [ ] No redundant API calls: if a helper already fetches data (e.g., site metadata), callers should reuse the result instead of making a second call for the same information
- [ ] Sequential per-item API calls (fetching details for each document in a loop) should be batched with `Promise.all` and a concurrency limit of 3-5
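
Bounded-concurrency batching can be sketched as follows (a simple sequential-batch variant; a real worker pool would keep the window full):

```typescript
// Runs fn over items in batches of `limit`; each batch runs in parallel,
// batches run sequentially, so at most `limit` calls are in flight.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = []
  for (let i = 0; i < items.length; i += limit) {
    const batch = items.slice(i, i + limit)
    results.push(...(await Promise.all(batch.map(fn))))
  }
  return results
}
```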

### Error Handling

- [ ] Individual document failures are caught and logged without aborting the sync
- [ ] API error responses include status codes in error messages
- [ ] No unhandled promise rejections in concurrent operations

### Concurrency

- [ ] Concurrent API calls use reasonable batch sizes (3-5 is typical)
- [ ] No unbounded `Promise.all` over large arrays

### Logging

- [ ] Uses `createLogger` from `@sim/logger` (not `console.log`)
- [ ] Logs sync progress at `info` level
- [ ] Logs errors at `warn` or `error` level with context

### Registry

- [ ] Connector is exported from `connectors/{service}/index.ts`
- [ ] Connector is registered in `connectors/registry.ts`
- [ ] Registry key matches the connector's `id` field

## Step 11: Report and Fix

### Report Format

Group findings by severity:

**Critical** (will cause runtime errors, data loss, or auth failures):

- Wrong API endpoint URL or HTTP method
- Invalid or missing OAuth scopes (not in provider config)
- Incorrect response field mapping (accessing wrong path)
- SOQL/query fields that don't exist on the target object
- Pagination that silently hits undocumented API limits
- Missing error handling that would crash the sync
- `requiredScopes` not a subset of OAuth provider scopes
- Query/filter injection: user-controlled values interpolated into OData `$filter`, SOQL, or query strings without escaping

**Warning** (incorrect behavior, data quality issues, or convention violations):

- HTML content not stripped via `htmlToPlainText`
- `getDocument` not forwarding `syncContext`
- `getDocument` hardcoded to one content type when `listDocuments` returns multiple (e.g., only pages but not blogposts)
- Missing `tagDefinition` for metadata fields returned by `mapTags`
- Incorrect `useBasicAuth` or `supportsRefreshTokenRotation` in token refresh config
- Invalid scope names that the API doesn't recognize (even if silently ignored)
- Private resources excluded from name-based lookup despite scopes being available
- Silent data truncation without logging
- Size checks using `text.length` (character count) instead of `Buffer.byteLength` (byte count) for byte-based limits
- URL-type config fields not normalized (protocol prefix, trailing slashes cause API failures)
- `VALIDATE_RETRY_OPTIONS` not threaded through helper functions called by `validateConfig`

**Suggestion** (minor improvements):

- Missing incremental sync support despite API supporting it
- Overly broad scopes that could be narrowed (not wrong, but could be tighter)
- Source URL format could be more specific
- Missing `orderBy` for deterministic pagination
- Redundant API calls that could be cached in `syncContext`
- Sequential per-item API calls that could be batched with `Promise.all` (concurrency 3-5)
- API supports field selection but connector fetches all fields (e.g., missing `$select`, `sysparm_fields`, `fields`)
- `getDocument` re-fetches data already included in the initial API response (e.g., comments returned with post)
- Last page of pagination requests full `PAGE_SIZE` when fewer records remain (use `Math.min(PAGE_SIZE, remaining)`)

### Fix All Issues

After reporting, fix every **critical** and **warning** issue. Apply **suggestions** where they don't add unnecessary complexity.

### Validation Output

After fixing, confirm:

1. `bun run lint` passes
2. TypeScript compiles clean
3. Re-read all modified files to verify fixes are correct

## Checklist Summary

- [ ] Read connector implementation, types, utils, registry, and OAuth config
- [ ] Pulled and read official API documentation for the service
- [ ] Validated every API endpoint URL, method, headers, and body against API docs
- [ ] Validated input sanitization: no query/filter injection, URL fields normalized
- [ ] Validated OAuth scopes: `requiredScopes` ⊆ OAuth provider `scopes` in `oauth.ts`
- [ ] Validated each scope is real and recognized by the service's API
- [ ] Validated scopes are sufficient for all API endpoints the connector calls
- [ ] Validated token refresh config (`useBasicAuth`, `supportsRefreshTokenRotation`)
- [ ] Validated pagination: cursor names, page sizes, hasMore logic, no silent caps
- [ ] Validated data transformation: plain text extraction, HTML stripping, content hashing
- [ ] Validated tag definitions match mapTags output, correct fieldTypes
- [ ] Validated config fields: canonical pairs, selector keys, required flags
- [ ] Validated validateConfig: lightweight check, error messages, retry options
- [ ] Validated getDocument: null on 404, all content types handled, no redundant re-fetches, syncContext forwarding
- [ ] Validated fetchWithRetry used for all external calls (no raw fetch), VALIDATE_RETRY_OPTIONS threaded through helpers
- [ ] Validated API efficiency: field selection used, no redundant calls, sequential fetches batched
- [ ] Validated error handling: graceful failures, no unhandled rejections
- [ ] Validated logging: createLogger, no console.log
- [ ] Validated registry: correct export, correct key
- [ ] Reported all issues grouped by severity
- [ ] Fixed all critical and warning issues
- [ ] Ran `bun run lint` after fixes
- [ ] Verified TypeScript compiles clean
```diff
@@ -1,6 +1,6 @@
 import { createLogger } from '@sim/logger'
 import { ConfluenceIcon } from '@/components/icons'
-import { fetchWithRetry } from '@/lib/knowledge/documents/utils'
+import { fetchWithRetry, VALIDATE_RETRY_OPTIONS } from '@/lib/knowledge/documents/utils'
 import type { ConnectorConfig, ExternalDocument, ExternalDocumentList } from '@/connectors/types'
 import { computeContentHash, htmlToPlainText, joinTagArray, parseTagDate } from '@/connectors/utils'
 import { getConfluenceCloudId } from '@/tools/confluence/utils'
@@ -243,23 +243,31 @@ export const confluenceConnector: ConnectorConfig = {
     const domain = sourceConfig.domain as string
     const cloudId = await getConfluenceCloudId(domain, accessToken)

-    const url = `https://api.atlassian.com/ex/confluence/${cloudId}/wiki/api/v2/pages/${externalId}?body-format=storage`
-    const response = await fetchWithRetry(url, {
-      method: 'GET',
-      headers: {
-        Accept: 'application/json',
-        Authorization: `Bearer ${accessToken}`,
-      },
-    })
-
-    if (!response.ok) {
-      if (response.status === 404) return null
-      throw new Error(`Failed to get Confluence page: ${response.status}`)
+    // Try pages first, fall back to blogposts if not found
+    let page: Record<string, unknown> | null = null
+    for (const endpoint of ['pages', 'blogposts']) {
+      const url = `https://api.atlassian.com/ex/confluence/${cloudId}/wiki/api/v2/${endpoint}/${externalId}?body-format=storage`
+      const response = await fetchWithRetry(url, {
+        method: 'GET',
+        headers: {
+          Accept: 'application/json',
+          Authorization: `Bearer ${accessToken}`,
+        },
+      })
+
+      if (response.ok) {
+        page = await response.json()
+        break
+      }
+      if (response.status !== 404) {
+        throw new Error(`Failed to get Confluence content: ${response.status}`)
+      }
     }

-    const page = await response.json()
-    const rawContent = page.body?.storage?.value || ''
+    if (!page) return null
+    const body = page.body as Record<string, unknown> | undefined
+    const storage = body?.storage as Record<string, unknown> | undefined
+    const rawContent = (storage?.value as string) || ''
     const plainText = htmlToPlainText(rawContent)
     const contentHash = await computeContentHash(plainText)
@@ -267,19 +275,22 @@ export const confluenceConnector: ConnectorConfig = {
     const labelMap = await fetchLabelsForPages(cloudId, accessToken, [String(page.id)])
     const labels = labelMap.get(String(page.id)) ?? []

+    const links = page._links as Record<string, unknown> | undefined
+    const version = page.version as Record<string, unknown> | undefined
+
     return {
       externalId: String(page.id),
-      title: page.title || 'Untitled',
+      title: (page.title as string) || 'Untitled',
       content: plainText,
       mimeType: 'text/plain',
-      sourceUrl: page._links?.webui ? `https://${domain}/wiki${page._links.webui}` : undefined,
+      sourceUrl: links?.webui ? `https://${domain}/wiki${links.webui}` : undefined,
       contentHash,
       metadata: {
         spaceId: page.spaceId,
         status: page.status,
-        version: page.version?.number,
+        version: version?.number,
         labels,
-        lastModified: page.version?.createdAt,
+        lastModified: version?.createdAt,
       },
     }
   },
@@ -302,7 +313,25 @@ export const confluenceConnector: ConnectorConfig = {
     try {
       const cloudId = await getConfluenceCloudId(domain, accessToken)
-      await resolveSpaceId(cloudId, accessToken, spaceKey)
+      const spaceUrl = `https://api.atlassian.com/ex/confluence/${cloudId}/wiki/api/v2/spaces?keys=${encodeURIComponent(spaceKey)}&limit=1`
+      const response = await fetchWithRetry(
+        spaceUrl,
+        {
+          method: 'GET',
+          headers: {
+            Accept: 'application/json',
+            Authorization: `Bearer ${accessToken}`,
+          },
+        },
+        VALIDATE_RETRY_OPTIONS
+      )
+      if (!response.ok) {
+        return { valid: false, error: `Failed to validate space: ${response.status}` }
+      }
+      const data = await response.json()
+      if (!data.results?.length) {
+        return { valid: false, error: `Space "${spaceKey}" not found` }
+      }
       return { valid: true }
     } catch (error) {
       const message = error instanceof Error ? error.message : 'Failed to validate configuration'
```

```diff
@@ -2,7 +2,7 @@ import { createLogger } from '@sim/logger'
 import { GithubIcon } from '@/components/icons'
 import { fetchWithRetry, VALIDATE_RETRY_OPTIONS } from '@/lib/knowledge/documents/utils'
 import type { ConnectorConfig, ExternalDocument, ExternalDocumentList } from '@/connectors/types'
-import { computeContentHash } from '@/connectors/utils'
+import { computeContentHash, parseTagDate } from '@/connectors/utils'

 const logger = createLogger('GitHubConnector')
@@ -82,7 +82,7 @@ async function fetchTree(
   const data = await response.json()

   if (data.truncated) {
-    logger.error('GitHub tree was truncated — some files may be missing', { owner, repo, branch })
+    logger.warn('GitHub tree was truncated — some files may be missing', { owner, repo, branch })
   }

   return (data.tree || []).filter((item: TreeItem) => item.type === 'blob')
@@ -139,7 +139,7 @@ async function treeItemToDocument(
     title: item.path.split('/').pop() || item.path,
     content,
     mimeType: 'text/plain',
-    sourceUrl: `https://github.com/${owner}/${repo}/blob/${branch}/${item.path}`,
+    sourceUrl: `https://github.com/${owner}/${repo}/blob/${encodeURIComponent(branch)}/${item.path.split('/').map(encodeURIComponent).join('/')}`,
     contentHash,
     metadata: {
       path: item.path,
@@ -302,6 +302,7 @@ export const githubConnector: ConnectorConfig = {
       throw new Error(`Failed to fetch file ${path}: ${response.status}`)
     }

+    const lastModifiedHeader = response.headers.get('last-modified') || undefined
     const data = await response.json()
     const content =
       data.encoding === 'base64'
@@ -314,7 +315,7 @@ export const githubConnector: ConnectorConfig = {
       title: path.split('/').pop() || path,
       content,
       mimeType: 'text/plain',
-      sourceUrl: `https://github.com/${owner}/${repo}/blob/${branch}/${path}`,
+      sourceUrl: `https://github.com/${owner}/${repo}/blob/${encodeURIComponent(branch)}/${path.split('/').map(encodeURIComponent).join('/')}`,
       contentHash,
       metadata: {
         path,
@@ -322,6 +323,7 @@ export const githubConnector: ConnectorConfig = {
         size: data.size as number,
         branch,
         repository: `${owner}/${repo}`,
+        lastModified: lastModifiedHeader,
       },
     }
   } catch (error) {
@@ -400,6 +402,7 @@ export const githubConnector: ConnectorConfig = {
     { id: 'repository', displayName: 'Repository', fieldType: 'text' },
     { id: 'branch', displayName: 'Branch', fieldType: 'text' },
     { id: 'size', displayName: 'File Size', fieldType: 'number' },
+    { id: 'lastModified', displayName: 'Last Modified', fieldType: 'date' },
   ],

   mapTags: (metadata: Record<string, unknown>): Record<string, unknown> => {
@@ -414,6 +417,9 @@ export const githubConnector: ConnectorConfig = {
       if (!Number.isNaN(num)) result.size = num
     }

+    const lastModified = parseTagDate(metadata.lastModified)
+    if (lastModified) result.lastModified = lastModified
+
     return result
   },
 }
```

```diff
@@ -439,6 +439,8 @@ export const googleCalendarConnector: ConnectorConfig = {
     { id: 'attendeeCount', displayName: 'Attendee Count', fieldType: 'number' },
     { id: 'location', displayName: 'Location', fieldType: 'text' },
     { id: 'eventDate', displayName: 'Event Date', fieldType: 'date' },
+    { id: 'lastModified', displayName: 'Last Modified', fieldType: 'date' },
+    { id: 'createdAt', displayName: 'Created', fieldType: 'date' },
   ],

   mapTags: (metadata: Record<string, unknown>): Record<string, unknown> => {
@@ -459,6 +461,12 @@ export const googleCalendarConnector: ConnectorConfig = {
     const eventDate = parseTagDate(metadata.eventDate)
     if (eventDate) result.eventDate = eventDate

+    const lastModified = parseTagDate(metadata.updatedTime)
+    if (lastModified) result.lastModified = lastModified
+
+    const createdAt = parseTagDate(metadata.createdTime)
+    if (createdAt) result.createdAt = createdAt
+
     return result
   },
 }
```

```diff
@@ -162,7 +162,7 @@ function buildQuery(sourceConfig: Record<string, unknown>): string {
   const folderId = sourceConfig.folderId as string | undefined
   if (folderId?.trim()) {
-    parts.push(`'${folderId.trim()}' in parents`)
+    parts.push(`'${folderId.trim().replace(/'/g, "\\'")}' in parents`)
   }

   return parts.join(' and ')
@@ -112,7 +112,7 @@ function buildQuery(sourceConfig: Record<string, unknown>): string {
   const folderId = sourceConfig.folderId as string | undefined
   if (folderId?.trim()) {
-    parts.push(`'${folderId.trim()}' in parents`)
+    parts.push(`'${folderId.trim().replace(/'/g, "\\'")}' in parents`)
   }

   const fileType = (sourceConfig.fileType as string) || 'all'
```
|
||||
|
||||
@@ -2,11 +2,12 @@ import { createLogger } from '@sim/logger'
 import { GoogleSheetsIcon } from '@/components/icons'
 import { fetchWithRetry, VALIDATE_RETRY_OPTIONS } from '@/lib/knowledge/documents/utils'
 import type { ConnectorConfig, ExternalDocument, ExternalDocumentList } from '@/connectors/types'
-import { computeContentHash } from '@/connectors/utils'
+import { computeContentHash, parseTagDate } from '@/connectors/utils'

 const logger = createLogger('GoogleSheetsConnector')

 const SHEETS_API_BASE = 'https://sheets.googleapis.com/v4/spreadsheets'
+const DRIVE_API_BASE = 'https://www.googleapis.com/drive/v3/files'
 const MAX_ROWS = 10000
 const CONCURRENCY = 3

@@ -102,6 +103,38 @@ async function fetchSpreadsheetMetadata(
   return (await response.json()) as SpreadsheetMetadata
 }

+/**
+ * Fetches the spreadsheet's modifiedTime from the Drive API.
+ */
+async function fetchSpreadsheetModifiedTime(
+  accessToken: string,
+  spreadsheetId: string
+): Promise<string | undefined> {
+  try {
+    const url = `${DRIVE_API_BASE}/${encodeURIComponent(spreadsheetId)}?fields=modifiedTime&supportsAllDrives=true`
+    const response = await fetchWithRetry(url, {
+      method: 'GET',
+      headers: {
+        Authorization: `Bearer ${accessToken}`,
+        Accept: 'application/json',
+      },
+    })
+
+    if (!response.ok) {
+      logger.warn('Failed to fetch modifiedTime from Drive API', { status: response.status })
+      return undefined
+    }
+
+    const data = (await response.json()) as { modifiedTime?: string }
+    return data.modifiedTime
+  } catch (error) {
+    logger.warn('Error fetching modifiedTime from Drive API', {
+      error: error instanceof Error ? error.message : String(error),
+    })
+    return undefined
+  }
+}
+
 /**
  * Converts a single sheet tab into an ExternalDocument.
  */
@@ -109,7 +142,8 @@ async function sheetToDocument(
   accessToken: string,
   spreadsheetId: string,
   spreadsheetTitle: string,
-  sheet: SheetProperties
+  sheet: SheetProperties,
+  modifiedTime?: string
 ): Promise<ExternalDocument | null> {
   try {
     const values = await fetchSheetValues(accessToken, spreadsheetId, sheet.title)
@@ -151,6 +185,7 @@ async function sheetToDocument(
         sheetId: sheet.sheetId,
         rowCount,
         columnCount: headers.length,
+        ...(modifiedTime ? { modifiedTime } : {}),
       },
     }
   } catch (error) {
@@ -208,7 +243,10 @@ export const googleSheetsConnector: ConnectorConfig = {

     logger.info('Fetching spreadsheet metadata', { spreadsheetId })

-    const metadata = await fetchSpreadsheetMetadata(accessToken, spreadsheetId)
+    const [metadata, modifiedTime] = await Promise.all([
+      fetchSpreadsheetMetadata(accessToken, spreadsheetId),
+      fetchSpreadsheetModifiedTime(accessToken, spreadsheetId),
+    ])
     const sheetFilter = (sourceConfig.sheetFilter as string) || 'all'

     let sheets = metadata.sheets.map((s) => s.properties)
@@ -226,7 +264,13 @@ export const googleSheetsConnector: ConnectorConfig = {
       const batch = sheets.slice(i, i + CONCURRENCY)
       const results = await Promise.all(
         batch.map((sheet) =>
-          sheetToDocument(accessToken, spreadsheetId, metadata.properties.title, sheet)
+          sheetToDocument(
+            accessToken,
+            spreadsheetId,
+            metadata.properties.title,
+            sheet,
+            modifiedTime
+          )
         )
       )
       documents.push(...(results.filter(Boolean) as ExternalDocument[]))
@@ -257,7 +301,22 @@ export const googleSheetsConnector: ConnectorConfig = {
       return null
     }

-    const metadata = await fetchSpreadsheetMetadata(accessToken, spreadsheetId)
+    let metadata: SpreadsheetMetadata
+    let modifiedTime: string | undefined
+    try {
+      ;[metadata, modifiedTime] = await Promise.all([
+        fetchSpreadsheetMetadata(accessToken, spreadsheetId),
+        fetchSpreadsheetModifiedTime(accessToken, spreadsheetId),
+      ])
+    } catch (error) {
+      const message = error instanceof Error ? error.message : String(error)
+      if (message.includes('404')) {
+        logger.info('Spreadsheet not found (possibly deleted)', { spreadsheetId })
+        return null
+      }
+      throw error
+    }

     const sheetEntry = metadata.sheets.find((s) => s.properties.sheetId === sheetId)

     if (!sheetEntry) {
@@ -269,7 +328,8 @@ export const googleSheetsConnector: ConnectorConfig = {
       accessToken,
       spreadsheetId,
       metadata.properties.title,
-      sheetEntry.properties
+      sheetEntry.properties,
+      modifiedTime
     )
   },

@@ -325,6 +385,7 @@ export const googleSheetsConnector: ConnectorConfig = {
     { id: 'sheetTitle', displayName: 'Sheet Name', fieldType: 'text' },
     { id: 'rowCount', displayName: 'Row Count', fieldType: 'number' },
     { id: 'columnCount', displayName: 'Column Count', fieldType: 'number' },
+    { id: 'lastModified', displayName: 'Last Modified', fieldType: 'date' },
   ],

   mapTags: (metadata: Record<string, unknown>): Record<string, unknown> => {
@@ -342,6 +403,11 @@ export const googleSheetsConnector: ConnectorConfig = {
       result.columnCount = metadata.columnCount
     }

+    const lastModified = parseTagDate(metadata.modifiedTime)
+    if (lastModified) {
+      result.lastModified = lastModified
+    }
+
     return result
   },
 }

@@ -223,20 +223,15 @@ export const hubspotConnector: ConnectorConfig = {

     const portalId = await getPortalId(accessToken, syncContext)

-    const body: Record<string, unknown> = {
-      filterGroups: [],
-      sorts: [
-        {
-          propertyName: objectType === 'contacts' ? 'lastmodifieddate' : 'hs_lastmodifieddate',
-          direction: 'DESCENDING',
-        },
-      ],
-      properties: [...properties],
+    const sortProperty = objectType === 'contacts' ? 'lastmodifieddate' : 'hs_lastmodifieddate'
+
+    const searchBody: Record<string, unknown> = {
+      properties,
+      sorts: [{ propertyName: sortProperty, direction: 'DESCENDING' }],
       limit: PAGE_SIZE,
     }

     if (cursor) {
-      body.after = cursor
+      searchBody.after = cursor
     }

     logger.info(`Listing HubSpot ${objectType}`, { cursor })
@@ -248,16 +243,16 @@ export const hubspotConnector: ConnectorConfig = {
         Accept: 'application/json',
         Authorization: `Bearer ${accessToken}`,
       },
-      body: JSON.stringify(body),
+      body: JSON.stringify(searchBody),
     })

     if (!response.ok) {
       const errorText = await response.text()
-      logger.error(`Failed to search HubSpot ${objectType}`, {
+      logger.error(`Failed to list HubSpot ${objectType}`, {
         status: response.status,
         error: errorText,
       })
-      throw new Error(`Failed to search HubSpot ${objectType}: ${response.status}`)
+      throw new Error(`Failed to list HubSpot ${objectType}: ${response.status}`)
     }

     const data = await response.json()
@@ -294,12 +289,17 @@ export const hubspotConnector: ConnectorConfig = {
   getDocument: async (
     accessToken: string,
     sourceConfig: Record<string, unknown>,
-    externalId: string
+    externalId: string,
+    syncContext?: Record<string, unknown>
   ): Promise<ExternalDocument | null> => {
     const objectType = sourceConfig.objectType as string
     const properties = OBJECT_PROPERTIES[objectType] || []

-    const portalId = await getPortalId(accessToken)
+    let portalId = syncContext?.portalId as string | undefined
+    if (!portalId) {
+      portalId = await getPortalId(accessToken)
+      if (syncContext) syncContext.portalId = portalId
+    }

     const params = new URLSearchParams()
     for (const prop of properties) {
@@ -346,19 +346,13 @@ export const hubspotConnector: ConnectorConfig = {

     try {
       const response = await fetchWithRetry(
-        `${BASE_URL}/crm/v3/objects/${objectType}/search`,
+        `${BASE_URL}/crm/v3/objects/${objectType}?limit=1`,
         {
-          method: 'POST',
+          method: 'GET',
           headers: {
-            'Content-Type': 'application/json',
             Accept: 'application/json',
             Authorization: `Bearer ${accessToken}`,
           },
-          body: JSON.stringify({
-            filterGroups: [],
-            limit: 1,
-            properties: ['hs_object_id'],
-          }),
         },
         VALIDATE_RETRY_OPTIONS
       )

@@ -155,7 +155,11 @@ export const jiraConnector: ConnectorConfig = {
     const params = new URLSearchParams()
     params.append('jql', jql)
     params.append('startAt', String(startAt))
-    params.append('maxResults', String(PAGE_SIZE))
+    const remaining = maxIssues > 0 ? Math.max(0, maxIssues - startAt) : PAGE_SIZE
+    if (remaining === 0) {
+      return { documents: [], hasMore: false }
+    }
+    params.append('maxResults', String(Math.min(PAGE_SIZE, remaining)))
     params.append(
       'fields',
       'summary,description,comment,issuetype,status,priority,assignee,reporter,project,labels,created,updated'
@@ -203,10 +207,15 @@ export const jiraConnector: ConnectorConfig = {
   getDocument: async (
     accessToken: string,
     sourceConfig: Record<string, unknown>,
-    externalId: string
+    externalId: string,
+    syncContext?: Record<string, unknown>
   ): Promise<ExternalDocument | null> => {
     const domain = sourceConfig.domain as string
-    const cloudId = await getJiraCloudId(domain, accessToken)
+    let cloudId = syncContext?.cloudId as string | undefined
+    if (!cloudId) {
+      cloudId = await getJiraCloudId(domain, accessToken)
+      if (syncContext) syncContext.cloudId = cloudId
+    }

     const params = new URLSearchParams()
     params.append(

@@ -1,6 +1,7 @@
 import { createLogger } from '@sim/logger'
 import { LinearIcon } from '@/components/icons'
-import { fetchWithRetry } from '@/lib/knowledge/documents/utils'
+import type { RetryOptions } from '@/lib/knowledge/documents/utils'
+import { fetchWithRetry, VALIDATE_RETRY_OPTIONS } from '@/lib/knowledge/documents/utils'
 import type { ConnectorConfig, ExternalDocument, ExternalDocumentList } from '@/connectors/types'
 import { computeContentHash, joinTagArray, parseTagDate } from '@/connectors/utils'

@@ -35,16 +36,21 @@ function markdownToPlainText(md: string): string {
 async function linearGraphQL(
   accessToken: string,
   query: string,
-  variables?: Record<string, unknown>
+  variables?: Record<string, unknown>,
+  retryOptions?: RetryOptions
 ): Promise<Record<string, unknown>> {
-  const response = await fetchWithRetry(LINEAR_API, {
-    method: 'POST',
-    headers: {
-      'Content-Type': 'application/json',
-      Authorization: `Bearer ${accessToken}`,
+  const response = await fetchWithRetry(
+    LINEAR_API,
+    {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        Authorization: `Bearer ${accessToken}`,
+      },
+      body: JSON.stringify({ query, variables }),
     },
-    body: JSON.stringify({ query, variables }),
-  })
+    retryOptions
+  )

   if (!response.ok) {
     const errorText = await response.text()
@@ -374,7 +380,7 @@ export const linearConnector: ConnectorConfig = {

     try {
       // Verify the token works by fetching teams
-      const data = await linearGraphQL(accessToken, TEAMS_QUERY)
+      const data = await linearGraphQL(accessToken, TEAMS_QUERY, undefined, VALIDATE_RETRY_OPTIONS)
       const teamsConn = data.teams as Record<string, unknown>
       const teams = (teamsConn.nodes || []) as Record<string, unknown>[]

@@ -66,7 +66,6 @@ async function graphApiGet<T>(
       headers: {
         Authorization: `Bearer ${accessToken}`,
         Accept: 'application/json',
-        Prefer: 'outlook.body-content-type="text"',
       },
     },
     retryOptions
@@ -191,7 +190,7 @@ export const microsoftTeamsConnector: ConnectorConfig = {
   auth: {
     mode: 'oauth',
     provider: 'microsoft-teams',
-    requiredScopes: ['ChannelMessage.Read.All'],
+    requiredScopes: ['ChannelMessage.Read.All', 'Channel.ReadBasic.All'],
   },

   configFields: [

@@ -538,10 +538,8 @@ async function listFromParentPage(
   const data = await response.json()
   const blocks = (data.results || []) as Record<string, unknown>[]

-  // Filter to child_page and child_database blocks
-  const childPageIds = blocks
-    .filter((b) => b.type === 'child_page' || b.type === 'child_database')
-    .map((b) => b.id as string)
+  // Filter to child_page blocks only (child_database blocks cannot be fetched via the Pages API)
+  const childPageIds = blocks.filter((b) => b.type === 'child_page').map((b) => b.id as string)

   // Also include the root page itself on the first call (no cursor)
   const pageIdsToFetch = !cursor ? [rootPageId, ...childPageIds] : childPageIds

@@ -200,33 +200,47 @@ export const obsidianConnector: ConnectorConfig = {

     const documents: ExternalDocument[] = []

-    for (const filePath of pageFiles) {
-      try {
-        const note = await fetchNote(baseUrl, accessToken, filePath)
-        const content = note.content || ''
-        const contentHash = await computeContentHash(content)
+    const BATCH_SIZE = 5
+    for (let i = 0; i < pageFiles.length; i += BATCH_SIZE) {
+      const batch = pageFiles.slice(i, i + BATCH_SIZE)
+      const results = await Promise.all(
+        batch.map(async (filePath) => {
+          try {
+            const note = await fetchNote(baseUrl, accessToken, filePath)
+            const content = note.content || ''
+            const contentHash = await computeContentHash(content)

-        documents.push({
-          externalId: filePath,
-          title: titleFromPath(filePath),
-          content,
-          mimeType: 'text/plain',
-          sourceUrl: `${baseUrl}/vault/${filePath.split('/').map(encodeURIComponent).join('/')}`,
-          contentHash,
-          metadata: {
-            tags: note.tags,
-            frontmatter: note.frontmatter,
-            createdAt: note.stat?.ctime ? new Date(note.stat.ctime).toISOString() : undefined,
-            modifiedAt: note.stat?.mtime ? new Date(note.stat.mtime).toISOString() : undefined,
-            size: note.stat?.size,
-            folder: filePath.includes('/') ? filePath.substring(0, filePath.lastIndexOf('/')) : '',
-          },
-        })
-      } catch (error) {
-        logger.warn('Failed to fetch note', {
-          filePath,
-          error: error instanceof Error ? error.message : String(error),
+            return {
+              externalId: filePath,
+              title: titleFromPath(filePath),
+              content,
+              mimeType: 'text/plain' as const,
+              sourceUrl: `${baseUrl}/vault/${filePath.split('/').map(encodeURIComponent).join('/')}`,
+              contentHash,
+              metadata: {
+                tags: note.tags,
+                frontmatter: note.frontmatter,
+                createdAt: note.stat?.ctime ? new Date(note.stat.ctime).toISOString() : undefined,
+                modifiedAt: note.stat?.mtime ? new Date(note.stat.mtime).toISOString() : undefined,
+                size: note.stat?.size,
+                folder: filePath.includes('/')
+                  ? filePath.substring(0, filePath.lastIndexOf('/'))
+                  : '',
+              },
+            }
+          } catch (error) {
+            logger.warn('Failed to fetch note', {
+              filePath,
+              error: error instanceof Error ? error.message : String(error),
+            })
+            return null
+          }
         })
+      )
+      for (const doc of results) {
+        if (doc) {
+          documents.push(doc)
+        }
+      }
     }

@@ -71,8 +71,8 @@ async function downloadFileContent(accessToken: string, fileId: string): Promise
   }

   const text = await response.text()
-  if (text.length > MAX_FILE_SIZE) {
-    return text.slice(0, MAX_FILE_SIZE)
+  if (Buffer.byteLength(text, 'utf8') > MAX_FILE_SIZE) {
+    return Buffer.from(text, 'utf8').subarray(0, MAX_FILE_SIZE).toString('utf8')
   }
   return text
 }

@@ -464,7 +464,7 @@ export const outlookConnector: ConnectorConfig = {
     try {
       // Fetch messages for this conversation
      const params = new URLSearchParams({
-        $filter: `conversationId eq '${externalId}'`,
+        $filter: `conversationId eq '${externalId.replace(/'/g, "''")}'`,
        $select: MESSAGE_FIELDS,
        $top: '50',
      })
@@ -557,7 +557,12 @@ export const outlookConnector: ConnectorConfig = {
     // If a search query is specified, verify it's valid with a dry run
     const searchQuery = sourceConfig.query as string | undefined
     if (searchQuery?.trim()) {
-      const searchUrl = `${GRAPH_API_BASE}/messages?$search="${encodeURIComponent(searchQuery.trim())}"&$top=1&$select=id`
+      const searchParams = new URLSearchParams({
+        $search: `"${searchQuery.trim()}"`,
+        $top: '1',
+        $select: 'id',
+      })
+      const searchUrl = `${GRAPH_API_BASE}/messages?${searchParams.toString()}`
       const searchResponse = await fetchWithRetry(
         searchUrl,
         {

@@ -103,18 +103,7 @@ async function fetchPostComments(

     if (!Array.isArray(data) || data.length < 2) return []

-    const commentListing = data[1]
-    const comments: string[] = []
-
-    for (const child of commentListing.data.children) {
-      if (child.kind !== 't1') continue
-      const comment = child as RedditComment
-      if (!comment.data.body || comment.data.author === 'AutoModerator') continue
-      comments.push(`[${comment.data.author} | score: ${comment.data.score}]: ${comment.data.body}`)
-      if (comments.length >= maxComments) break
-    }
-
-    return comments
+    return extractComments(data[1], maxComments)
   } catch (error) {
     logger.warn('Failed to fetch comments for post', {
       postId,
@@ -124,13 +113,32 @@ async function fetchPostComments(
   }
 }

+/**
+ * Extracts formatted comment strings from a Reddit comment listing.
+ */
+function extractComments(commentListing: RedditListing, maxComments: number): string[] {
+  const comments: string[] = []
+
+  for (const child of commentListing.data.children) {
+    if (child.kind !== 't1') continue
+    const comment = child as RedditComment
+    if (!comment.data.body || comment.data.author === 'AutoModerator') continue
+    comments.push(`[${comment.data.author} | score: ${comment.data.score}]: ${comment.data.body}`)
+    if (comments.length >= maxComments) break
+  }
+
+  return comments
+}
+
 /**
  * Formats a Reddit post with its comments into a document content string.
+ * When `prefetchedComments` is provided, uses those directly instead of fetching.
  */
 async function formatPostContent(
   accessToken: string,
   post: RedditPost['data'],
-  maxComments: number
+  maxComments: number,
+  prefetchedComments?: string[]
 ): Promise<string> {
   const lines: string[] = []

@@ -153,7 +161,9 @@ async function formatPostContent(
   }

   if (maxComments > 0) {
-    const comments = await fetchPostComments(accessToken, post.subreddit, post.id, maxComments)
+    const comments =
+      prefetchedComments ??
+      (await fetchPostComments(accessToken, post.subreddit, post.id, maxComments))
     if (comments.length > 0) {
       lines.push('---')
       lines.push(`Top Comments (${comments.length}):`)
@@ -384,7 +394,9 @@ export const redditConnector: ConnectorConfig = {
     if (postChildren.length === 0) return null

     const post = postChildren[0].data
-    const content = await formatPostContent(accessToken, post, COMMENTS_PER_POST)
+    const comments =
+      data.length >= 2 ? extractComments(data[1] as RedditListing, COMMENTS_PER_POST) : []
+    const content = await formatPostContent(accessToken, post, COMMENTS_PER_POST, comments)
     const contentHash = await computeContentHash(content)

     return {

@@ -2,24 +2,17 @@ import { createLogger } from '@sim/logger'
 import { SalesforceIcon } from '@/components/icons'
 import { fetchWithRetry, VALIDATE_RETRY_OPTIONS } from '@/lib/knowledge/documents/utils'
 import type { ConnectorConfig, ExternalDocument, ExternalDocumentList } from '@/connectors/types'
-import { computeContentHash, parseTagDate } from '@/connectors/utils'
+import { computeContentHash, htmlToPlainText, parseTagDate } from '@/connectors/utils'

 const logger = createLogger('SalesforceConnector')

 const USERINFO_URL = 'https://login.salesforce.com/services/oauth2/userinfo'
-const API_VERSION = 'v59.0'
+const API_VERSION = 'v62.0'
 const PAGE_SIZE = 200

 /** SOQL field lists per object type. */
 const OBJECT_FIELDS: Record<string, string[]> = {
-  KnowledgeArticleVersion: [
-    'Id',
-    'Title',
-    'Summary',
-    'ArticleBody',
-    'LastModifiedDate',
-    'ArticleNumber',
-  ],
+  KnowledgeArticleVersion: ['Id', 'Title', 'Summary', 'LastModifiedDate', 'ArticleNumber'],
   Case: ['Id', 'Subject', 'Description', 'Status', 'LastModifiedDate', 'CaseNumber'],
   Account: ['Id', 'Name', 'Description', 'Industry', 'LastModifiedDate'],
   Opportunity: [
@@ -98,6 +91,9 @@ function buildRecordTitle(objectType: string, record: Record<string, unknown>):
   }
 }

+/** Fields that may contain HTML content and should be stripped to plain text. */
+const HTML_FIELDS = new Set(['Description', 'Summary'])
+
 /**
  * Builds plain-text content from a Salesforce record for indexing.
  */
@@ -112,7 +108,9 @@ function buildRecordContent(objectType: string, record: Record<string, unknown>)
     const value = record[field]
     if (value != null && value !== '') {
       const label = field.replace(/([A-Z])/g, ' $1').trim()
-      parts.push(`${label}: ${String(value)}`)
+      const text =
+        HTML_FIELDS.has(field) && typeof value === 'string' ? htmlToPlainText(value) : String(value)
+      parts.push(`${label}: ${text}`)
     }
   }

@@ -288,7 +286,8 @@ export const salesforceConnector: ConnectorConfig = {
   getDocument: async (
     accessToken: string,
     sourceConfig: Record<string, unknown>,
-    externalId: string
+    externalId: string,
+    syncContext?: Record<string, unknown>
   ): Promise<ExternalDocument | null> => {
     const objectType = sourceConfig.objectType as string
     const fields = OBJECT_FIELDS[objectType]
@@ -297,7 +296,11 @@ export const salesforceConnector: ConnectorConfig = {
       throw new Error(`Unsupported Salesforce object type: ${objectType}`)
     }

-    const instanceUrl = await resolveInstanceUrl(accessToken)
+    let instanceUrl = syncContext?.instanceUrl as string | undefined
+    if (!instanceUrl) {
+      instanceUrl = await resolveInstanceUrl(accessToken)
+      if (syncContext) syncContext.instanceUrl = instanceUrl
+    }

     const url = `${instanceUrl}sobjects/${objectType}/${externalId}?fields=${fields.join(',')}`

@@ -513,11 +513,16 @@ export const servicenowConnector: ConnectorConfig = {
     const isKB = contentType === 'kb_knowledge'
     const tableName = isKB ? 'kb_knowledge' : 'incident'

+    const fields = isKB
+      ? 'sys_id,short_description,text,wiki,workflow_state,kb_category,kb_knowledge_base,number,author,sys_created_by,sys_updated_by,sys_updated_on,sys_created_on'
+      : 'sys_id,number,short_description,description,state,priority,category,assigned_to,opened_by,close_notes,resolution_notes,sys_created_by,sys_updated_by,sys_updated_on,sys_created_on'
+
     try {
       const { result } = await serviceNowApiGet(instanceUrl, tableName, authHeader, {
         sysparm_query: `sys_id=${externalId}`,
         sysparm_limit: '1',
         sysparm_offset: '0',
+        sysparm_fields: fields,
         sysparm_display_value: 'all',
       })

@@ -61,7 +61,7 @@ async function resolveSiteId(
   accessToken: string,
   siteUrl: string,
   retryOptions?: Parameters<typeof fetchWithRetry>[2]
-): Promise<string> {
+): Promise<{ id: string; displayName: string }> {
   // Normalise: strip protocol, trailing slashes
   const cleaned = siteUrl.replace(/^https?:\/\//, '').replace(/\/+$/, '')

@@ -108,7 +108,7 @@ async function resolveSiteId(
     siteId: site.id,
     displayName: site.displayName,
   })
-  return site.id
+  return { id: site.id, displayName: site.displayName ?? '' }
 }

 /**
@@ -338,17 +338,9 @@ export const sharepointConnector: ConnectorConfig = {
       siteId = syncContext.siteId as string
      siteName = (syncContext.siteName as string) ?? ''
     } else {
-      siteId = await resolveSiteId(accessToken, siteUrl)
-
-      // Fetch site display name
-      const siteResponse = await fetchWithRetry(`${GRAPH_BASE}/sites/${siteId}`, {
-        method: 'GET',
-        headers: { Authorization: `Bearer ${accessToken}`, Accept: 'application/json' },
-      })
-      const siteData = siteResponse.ok
-        ? ((await siteResponse.json()) as { displayName?: string })
-        : {}
-      siteName = siteData.displayName ?? siteUrl
+      const site = await resolveSiteId(accessToken, siteUrl)
+      siteId = site.id
+      siteName = site.displayName || siteUrl

       if (syncContext) {
         syncContext.siteId = siteId
@@ -463,10 +455,22 @@ export const sharepointConnector: ConnectorConfig = {
   getDocument: async (
     accessToken: string,
     sourceConfig: Record<string, unknown>,
-    externalId: string
+    externalId: string,
+    syncContext?: Record<string, unknown>
   ): Promise<ExternalDocument | null> => {
     const siteUrl = sourceConfig.siteUrl as string
-    const siteId = await resolveSiteId(accessToken, siteUrl)

+    let siteId = syncContext?.siteId as string | undefined
+    let siteName = syncContext?.siteName as string | undefined
+    if (!siteId) {
+      const site = await resolveSiteId(accessToken, siteUrl)
+      siteId = site.id
+      siteName = site.displayName ?? siteUrl
+      if (syncContext) {
+        syncContext.siteId = siteId
+        syncContext.siteName = siteName
+      }
+    }

     const url = `${GRAPH_BASE}/sites/${siteId}/drive/items/${externalId}`
     const response = await fetchWithRetry(url, {
@@ -484,22 +488,11 @@ export const sharepointConnector: ConnectorConfig = {

     const item = (await response.json()) as DriveItem

     // Verify it is a supported text file
     if (!item.file || !isSupportedTextFile(item.name)) {
       return null
     }

-    // Fetch site display name for metadata
-    const siteResponse = await fetchWithRetry(`${GRAPH_BASE}/sites/${siteId}`, {
-      method: 'GET',
-      headers: { Authorization: `Bearer ${accessToken}`, Accept: 'application/json' },
-    })
-    const siteData = siteResponse.ok
-      ? ((await siteResponse.json()) as { displayName?: string })
-      : {}
-    const siteName = siteData.displayName ?? siteUrl
-
-    return itemToDocument(accessToken, siteId, item, siteName)
+    return itemToDocument(accessToken, siteId, item, siteName ?? siteUrl)
   },

   validateConfig: async (
@@ -517,7 +510,8 @@ export const sharepointConnector: ConnectorConfig = {
     }

     try {
-      const siteId = await resolveSiteId(accessToken, siteUrl, VALIDATE_RETRY_OPTIONS)
+      const site = await resolveSiteId(accessToken, siteUrl, VALIDATE_RETRY_OPTIONS)
+      const siteId = site.id

       // If a folder path is configured, verify it exists
       const folderPath = (sourceConfig.folderPath as string)?.trim()

@@ -213,11 +213,11 @@ async function resolveChannel(
     }
   }

-  // Search by name through conversations.list
+  // Search by name through conversations.list (include private channels the bot is in)
   let cursor: string | undefined
   do {
     const params: Record<string, string> = {
-      types: 'public_channel',
+      types: 'public_channel,private_channel',
       limit: '200',
       exclude_archived: 'true',
     }
@@ -248,7 +248,13 @@ export const slackConnector: ConnectorConfig = {
   auth: {
     mode: 'oauth',
     provider: 'slack',
-    requiredScopes: ['channels:read', 'channels:history', 'users:read'],
+    requiredScopes: [
+      'channels:read',
+      'channels:history',
+      'groups:read',
+      'groups:history',
+      'users:read',
+    ],
   },

   configFields: [
@@ -356,7 +362,8 @@ export const slackConnector: ConnectorConfig = {
   getDocument: async (
     accessToken: string,
     sourceConfig: Record<string, unknown>,
-    externalId: string
+    externalId: string,
+    syncContext?: Record<string, unknown>
   ): Promise<ExternalDocument | null> => {
     const maxMessages = sourceConfig.maxMessages
       ? Number(sourceConfig.maxMessages)
@@ -372,7 +379,7 @@ export const slackConnector: ConnectorConfig = {
       maxMessages
     )

-    const content = await formatMessages(accessToken, messages)
+    const content = await formatMessages(accessToken, messages, syncContext)
     if (!content.trim()) return null

     const contentHash = await computeContentHash(content)
@@ -441,11 +448,11 @@ export const slackConnector: ConnectorConfig = {
       return { valid: true }
     }

-    // Otherwise search by name
+    // Otherwise search by name (include private channels the bot is in)
     let cursor: string | undefined
     do {
       const params: Record<string, string> = {
-        types: 'public_channel',
+        types: 'public_channel,private_channel',
         limit: '200',
         exclude_archived: 'true',
       }

@@ -2,7 +2,7 @@ import { createLogger } from '@sim/logger'
 import { WebflowIcon } from '@/components/icons'
 import { fetchWithRetry, VALIDATE_RETRY_OPTIONS } from '@/lib/knowledge/documents/utils'
 import type { ConnectorConfig, ExternalDocument, ExternalDocumentList } from '@/connectors/types'
-import { computeContentHash, parseTagDate } from '@/connectors/utils'
+import { computeContentHash, htmlToPlainText, parseTagDate } from '@/connectors/utils'

 const logger = createLogger('WebflowConnector')

@@ -60,6 +60,8 @@ function itemToPlainText(item: WebflowItem, collectionName: string): string {
       lines.push(`${key}: ${items.join(', ')}`)
     } else if (typeof value === 'object') {
       lines.push(`${key}: ${JSON.stringify(value)}`)
+    } else if (typeof value === 'string' && /<[a-z][^>]*>/i.test(value)) {
+      lines.push(`${key}: ${htmlToPlainText(value)}`)
     } else {
       lines.push(`${key}: ${String(value)}`)
     }

@@ -7,6 +7,15 @@ import { computeContentHash, htmlToPlainText, joinTagArray, parseTagDate } from
|
||||
const logger = createLogger('WordPressConnector')
|
||||
|
||||
const WP_API_BASE = 'https://public-api.wordpress.com/rest/v1.1/sites'
|
||||
|
||||
/**
|
||||
* Strips protocol prefix and trailing slashes from a site URL so the
|
||||
* WordPress.com API receives a bare domain (e.g. "mysite.wordpress.com").
|
||||
*/
|
||||
function normalizeSiteUrl(raw: string): string {
|
||||
return raw.replace(/^https?:\/\//, '').replace(/\/+$/, '')
|
||||
}
|
||||
|
||||
const POSTS_PER_PAGE = 20
|
||||
const DEFAULT_MAX_POSTS = 100
|
||||
|
||||
@@ -135,10 +144,11 @@ export const wordpressConnector: ConnectorConfig = {
     cursor?: string,
     syncContext?: Record<string, unknown>
   ): Promise<ExternalDocumentList> => {
-    const siteUrl = (sourceConfig.siteUrl as string)?.trim()
-    if (!siteUrl) {
+    const rawSiteUrl = (sourceConfig.siteUrl as string)?.trim()
+    if (!rawSiteUrl) {
       throw new Error('Site URL is required')
     }
+    const siteUrl = normalizeSiteUrl(rawSiteUrl)

     const maxPosts = sourceConfig.maxPosts ? Number(sourceConfig.maxPosts) : DEFAULT_MAX_POSTS
     const type = resolvePostType(sourceConfig.postType as string | undefined)
@@ -193,10 +203,11 @@ export const wordpressConnector: ConnectorConfig = {
     sourceConfig: Record<string, unknown>,
     externalId: string
   ): Promise<ExternalDocument | null> => {
-    const siteUrl = (sourceConfig.siteUrl as string)?.trim()
-    if (!siteUrl) {
+    const rawSiteUrl = (sourceConfig.siteUrl as string)?.trim()
+    if (!rawSiteUrl) {
       throw new Error('Site URL is required')
     }
+    const siteUrl = normalizeSiteUrl(rawSiteUrl)

     const url = `${WP_API_BASE}/${encodeURIComponent(siteUrl)}/posts/${externalId}`
@@ -229,12 +240,13 @@ export const wordpressConnector: ConnectorConfig = {
     accessToken: string,
     sourceConfig: Record<string, unknown>
   ): Promise<{ valid: boolean; error?: string }> => {
-    const siteUrl = (sourceConfig.siteUrl as string)?.trim()
+    const rawSiteUrl = (sourceConfig.siteUrl as string)?.trim()
     const maxPosts = sourceConfig.maxPosts as string | undefined

-    if (!siteUrl) {
+    if (!rawSiteUrl) {
       return { valid: false, error: 'Site URL is required' }
     }
+    const siteUrl = normalizeSiteUrl(rawSiteUrl)

     if (maxPosts && (Number.isNaN(Number(maxPosts)) || Number(maxPosts) <= 0)) {
       return { valid: false, error: 'Max posts must be a positive number' }
@@ -391,10 +391,21 @@ export const zendeskConnector: ConnectorConfig = {
       )
       logger.info(`Fetched ${tickets.length} tickets from Zendesk`)

-      for (const ticket of tickets) {
-        const comments = await fetchTicketComments(subdomain, accessToken, sourceConfig, ticket.id)
-        const doc = await ticketToDocument(ticket, comments, subdomain)
-        documents.push(doc)
+      const BATCH_SIZE = 5
+      for (let i = 0; i < tickets.length; i += BATCH_SIZE) {
+        const batch = tickets.slice(i, i + BATCH_SIZE)
+        const batchResults = await Promise.all(
+          batch.map(async (ticket) => {
+            const comments = await fetchTicketComments(
+              subdomain,
+              accessToken,
+              sourceConfig,
+              ticket.id
+            )
+            return ticketToDocument(ticket, comments, subdomain)
+          })
+        )
+        documents.push(...batchResults)
       }
     }
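The Zendesk change above replaces a strictly sequential per-ticket loop with batched concurrency: up to 5 tickets are fetched in parallel, then the next batch starts. A generic sketch of the same pattern (`processInBatches` and `processOne` are illustrative names, not part of the codebase):

```typescript
// Process items in fixed-size batches so at most `batchSize` async
// operations are in flight at once — faster than a sequential loop,
// but still bounded so the upstream API is not flooded.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  processOne: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = []
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize)
    // All items in a batch run concurrently; batches run back to back.
    results.push(...(await Promise.all(batch.map(processOne))))
  }
  return results
}
```

Compared with the old one-ticket-at-a-time loop, this cuts wall-clock time by roughly the batch size while keeping result order stable.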
@@ -180,7 +180,6 @@ describe('OAuth Token Refresh', () => {
       providerId: 'outlook',
       endpoint: 'https://login.microsoftonline.com/common/oauth2/v2.0/token',
     },
-    { name: 'Notion', providerId: 'notion', endpoint: 'https://api.notion.com/v1/oauth/token' },
     { name: 'Slack', providerId: 'slack', endpoint: 'https://slack.com/api/oauth.v2.access' },
     {
       name: 'Dropbox',
@@ -274,6 +273,44 @@ describe('OAuth Token Refresh', () => {
     )
   })

+  it.concurrent('should send Notion request with Basic Auth header and JSON body', async () => {
+    const mockFetch = createMockFetch(defaultOAuthResponse)
+    const refreshToken = 'test_refresh_token'
+
+    await withMockFetch(mockFetch, () => refreshOAuthToken('notion', refreshToken))
+
+    expect(mockFetch).toHaveBeenCalledWith(
+      'https://api.notion.com/v1/oauth/token',
+      expect.objectContaining({
+        method: 'POST',
+        headers: expect.objectContaining({
+          'Content-Type': 'application/json',
+          Authorization: expect.stringMatching(/^Basic /),
+        }),
+        body: expect.any(String),
+      })
+    )
+
+    const [, requestOptions] = mockFetch.mock.calls[0] as [
+      string,
+      { headers: Record<string, string>; body: string },
+    ]
+
+    const authHeader = requestOptions.headers.Authorization
+    const base64Credentials = authHeader.replace('Basic ', '')
+    const credentials = Buffer.from(base64Credentials, 'base64').toString('utf-8')
+    const [clientId, clientSecret] = credentials.split(':')
+
+    expect(clientId).toBe('notion_client_id')
+    expect(clientSecret).toBe('notion_client_secret')
+
+    const bodyParams = JSON.parse(requestOptions.body)
+    expect(bodyParams).toEqual({
+      grant_type: 'refresh_token',
+      refresh_token: refreshToken,
+    })
+  })
+
   it.concurrent('should include User-Agent header for Reddit requests', async () => {
     const mockFetch = createMockFetch(defaultOAuthResponse)
     const refreshToken = 'test_refresh_token'
@@ -866,7 +866,7 @@ export const OAUTH_PROVIDERS: Record<string, OAuthProviderConfig> = {
       providerId: 'salesforce',
       icon: SalesforceIcon,
       baseProviderIcon: SalesforceIcon,
-      scopes: ['api', 'refresh_token', 'openid', 'offline_access'],
+      scopes: ['api', 'refresh_token', 'openid'],
     },
   },
   defaultService: 'salesforce',
@@ -960,6 +960,11 @@ interface ProviderAuthConfig {
    * instead of in the request body. Used by Cal.com.
    */
   refreshTokenInAuthHeader?: boolean
+  /**
+   * If true, the token endpoint expects a JSON body with Content-Type: application/json
+   * instead of the default application/x-www-form-urlencoded. Used by Notion.
+   */
+  useJsonBody?: boolean
 }

 /**
/**
|
||||
@@ -1056,8 +1061,9 @@ function getProviderAuthConfig(provider: string): ProviderAuthConfig {
|
||||
tokenEndpoint: 'https://api.notion.com/v1/oauth/token',
|
||||
clientId,
|
||||
clientSecret,
|
||||
useBasicAuth: false,
|
||||
useBasicAuth: true,
|
||||
supportsRefreshTokenRotation: true,
|
||||
useJsonBody: true,
|
||||
}
|
||||
}
|
||||
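With `useBasicAuth: true`, the Notion client credentials travel in an `Authorization: Basic …` header rather than in the request body. A minimal sketch of how such a header is typically built and decoded (the helper name is hypothetical; the actual construction lives elsewhere in this file):

```typescript
// Basic auth encodes "clientId:clientSecret" as base64 per RFC 7617.
function basicAuthHeader(clientId: string, clientSecret: string): string {
  return `Basic ${Buffer.from(`${clientId}:${clientSecret}`).toString('base64')}`
}

const header = basicAuthHeader('notion_client_id', 'notion_client_secret')
console.log(header.startsWith('Basic ')) // true
// Decoding the base64 payload recovers the original credential pair,
// which is exactly what the new Notion test below verifies.
console.log(Buffer.from(header.slice(6), 'base64').toString('utf-8'))
// notion_client_id:notion_client_secret
```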
case 'microsoft':
|
||||
@@ -1295,9 +1301,9 @@ function getProviderAuthConfig(provider: string): ProviderAuthConfig {
|
||||
function buildAuthRequest(
|
||||
config: ProviderAuthConfig,
|
||||
refreshToken: string
|
||||
): { headers: Record<string, string>; bodyParams: Record<string, string> } {
|
||||
): { headers: Record<string, string>; bodyParams: Record<string, string>; useJsonBody?: boolean } {
|
||||
const headers: Record<string, string> = {
|
||||
'Content-Type': 'application/x-www-form-urlencoded',
|
||||
'Content-Type': config.useJsonBody ? 'application/json' : 'application/x-www-form-urlencoded',
|
||||
...config.additionalHeaders,
|
||||
}
|
||||
|
||||
@@ -1326,7 +1332,7 @@ function buildAuthRequest(
|
||||
}
|
||||
}
|
||||
|
||||
return { headers, bodyParams }
|
||||
return { headers, bodyParams, useJsonBody: config.useJsonBody }
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -1361,12 +1367,12 @@ export async function refreshOAuthToken(

   const config = getProviderAuthConfig(provider)

-  const { headers, bodyParams } = buildAuthRequest(config, refreshToken)
+  const { headers, bodyParams, useJsonBody } = buildAuthRequest(config, refreshToken)

   const response = await fetch(config.tokenEndpoint, {
     method: 'POST',
     headers,
-    body: new URLSearchParams(bodyParams).toString(),
+    body: useJsonBody ? JSON.stringify(bodyParams) : new URLSearchParams(bodyParams).toString(),
   })

   if (!response.ok) {
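The refresh request now serializes the same parameters two ways depending on `useJsonBody`. A minimal standalone sketch of the switch, mirroring the ternary in the hunk above:

```typescript
// The same key/value params become a JSON string for Notion-style
// endpoints and a form-encoded string (the OAuth 2.0 default) otherwise.
function encodeBody(params: Record<string, string>, useJsonBody: boolean): string {
  return useJsonBody ? JSON.stringify(params) : new URLSearchParams(params).toString()
}

const params = { grant_type: 'refresh_token', refresh_token: 'abc' }
console.log(encodeBody(params, true)) // {"grant_type":"refresh_token","refresh_token":"abc"}
console.log(encodeBody(params, false)) // grant_type=refresh_token&refresh_token=abc
```

Keeping the branch on a single field returned by `buildAuthRequest` means callers never have to know which providers are JSON-only.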
@@ -1,4 +1,5 @@
 import { createLogger } from '@sim/logger'
+import { fetchWithRetry } from '@/lib/knowledge/documents/utils'

 const logger = createLogger('JiraUtils')
@@ -67,7 +68,7 @@ export async function downloadJiraAttachments(
       continue
     }
     try {
-      const response = await fetch(att.content, {
+      const response = await fetchWithRetry(att.content, {
         headers: {
           Authorization: `Bearer ${accessToken}`,
           Accept: '*/*',
@@ -97,13 +98,21 @@ export async function downloadJiraAttachments(
 }

 export async function getJiraCloudId(domain: string, accessToken: string): Promise<string> {
-  const response = await fetch('https://api.atlassian.com/oauth/token/accessible-resources', {
-    method: 'GET',
-    headers: {
-      Authorization: `Bearer ${accessToken}`,
-      Accept: 'application/json',
-    },
-  })
+  const response = await fetchWithRetry(
+    'https://api.atlassian.com/oauth/token/accessible-resources',
+    {
+      method: 'GET',
+      headers: {
+        Authorization: `Bearer ${accessToken}`,
+        Accept: 'application/json',
+      },
+    }
+  )
+
+  if (!response.ok) {
+    const errorText = await response.text()
+    throw new Error(`Failed to fetch Jira accessible resources: ${response.status} - ${errorText}`)
+  }

   const resources = await response.json()
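For context, `fetchWithRetry` is imported from '@/lib/knowledge/documents/utils' but its body is not shown in this diff. A hypothetical sketch of what such a helper commonly does — retry rate limits and server errors with exponential backoff — the real implementation may differ:

```typescript
// Retries transient failures (HTTP 429 / 5xx and network errors) up to
// maxAttempts times with exponential backoff; returns any other response
// (including 4xx) immediately so callers can inspect it.
async function fetchWithRetrySketch(
  url: string,
  init: RequestInit,
  maxAttempts = 3
): Promise<Response> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await fetch(url, init)
      if (response.status !== 429 && response.status < 500) return response
      lastError = new Error(`HTTP ${response.status}`)
    } catch (error) {
      lastError = error // network-level failure — retry
    }
    // 500ms, 1s, 2s, … between attempts.
    await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 500))
  }
  throw lastError
}
```

Swapping bare `fetch` calls for a wrapper like this is what makes the Jira attachment and cloud-ID lookups above resilient to transient Atlassian errors.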