Waleed aac9e74283 feat(knowledge): add 10 new knowledge base connectors (#3430)
* feat(knowledge): add 10 new knowledge base connectors

Add connectors for Dropbox, OneDrive, SharePoint, Slack, Google Docs,
Asana, HubSpot, Salesforce, WordPress, and Webflow. Each connector
implements listDocuments, getDocument, validateConfig with proper
pagination, content hashing, and tag definitions.
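As a rough sketch of what each connector exposes (method names are taken from this commit; the document shape and the hashing helper are assumptions for illustration, not the repo's actual definitions):

```typescript
import { createHash } from 'node:crypto'

// Hypothetical connector surface inferred from the method names above;
// the real interfaces in the repo will differ in detail.
interface DocumentMeta {
  id: string
  title: string
  tags: Record<string, string> // populated from the connector's tag definitions
}

interface ListResult {
  documents: DocumentMeta[]
  nextCursor?: string // present while more pages remain
}

// Content hashing lets the sync layer detect and skip unchanged documents.
function contentHash(content: string): string {
  return createHash('sha256').update(content).digest('hex')
}
```

Under this scheme a document is re-ingested only when its content hash changes between syncs.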

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(connectors): address audit findings across 5 connectors

OneDrive: fix encodeURIComponent breaking folder paths with slashes,
add recursive folder traversal via folder queue in cursor state.
Slack: add missing requiredScopes.
Asana: pass retryOptions as 3rd arg to fetchWithRetry instead of
spreading into RequestInit; add missing requiredScopes.
HubSpot: add missing requiredScopes; fix sort property to use
hs_lastmodifieddate for non-contact object types.
Google Docs: remove orphaned title tag that was never populated.
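A minimal sketch of the folder-queue traversal described for OneDrive: each listing step dequeues one folder, returns its files, and pushes any subfolders back onto the queue carried in the cursor. All names here are illustrative, not the connector's actual code.

```typescript
// The pagination cursor carries the folders still to visit, so traversal
// state survives across listDocuments calls without recursion.
interface FolderCursor {
  pending: string[] // folder ids not yet expanded
}

type Entry = { id: string; name: string; isFolder: boolean }

function traverseStep(
  cursor: FolderCursor,
  listChildren: (folderId: string) => Entry[]
): { files: Entry[]; next?: FolderCursor } {
  const pending = [...cursor.pending]
  const folderId = pending.shift()
  if (folderId === undefined) return { files: [] }
  const files: Entry[] = []
  for (const entry of listChildren(folderId)) {
    if (entry.isFolder) pending.push(entry.id) // visit on a later step
    else files.push(entry)
  }
  return { files, next: pending.length ? { pending } : undefined }
}
```

When `next` is undefined the queue is empty and the sync is complete.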

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(connectors): add missing requiredScopes to OneDrive and HubSpot

OneDrive: add requiredScopes: ['Files.Read']
HubSpot: add missing crm.objects.tickets.read scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore(connectors): lint fixes

* fix(connectors): slice documents to respect max limit on last page
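The fix amounts to trimming the final page so the combined result never exceeds the caller's maximum; a sketch with assumed names:

```typescript
// When the last API page returns more items than the remaining budget,
// trim before returning (variable names are illustrative).
function capPage<T>(accumulated: T[], page: T[], maxDocuments: number): T[] {
  return [...accumulated, ...page].slice(0, maxDocuments)
}
```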

* fix(connectors): use per-segment encodeURIComponent for SharePoint folder paths

encodeURI does not encode #, ?, &, + or =, which are valid in folder
names but break the Microsoft Graph URL. Apply the same per-segment
encoding fix already used in the OneDrive connector.
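The per-segment fix can be sketched as follows (the helper name is illustrative): each path segment is encoded with encodeURIComponent, which does escape #, ?, &, + and =, while the '/' separators are preserved by splitting first.

```typescript
// Encode each segment of a drive path separately so reserved characters
// inside folder names are escaped but the path separators survive.
function encodeDrivePath(path: string): string {
  return path
    .split('/')
    .map((segment) => encodeURIComponent(segment))
    .join('/')
}
```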

* fix(connectors): address PR review findings

- Slack: remove private_channel from conversations.list types param
  since requiredScopes only cover public channels (channels:read,
  channels:history). Adding groups:read/groups:history would force
  all users to grant private channel access unnecessarily.
- OneDrive/SharePoint: add .htm to supported extensions and handle
  it in content processing (htmlToPlainText), matching Dropbox.
- Salesforce: guard getDocument for KnowledgeArticleVersion to skip
  records that are no longer PublishStatus='Online', preventing
  un-published articles from being re-synced.
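The Salesforce guard reduces to a status check before a record is returned for sync; the record shape below is an assumption for illustration.

```typescript
// Skip KnowledgeArticleVersion records whose PublishStatus is no longer
// 'Online', so archived or draft versions are not re-synced.
interface ArticleVersion {
  Id: string
  PublishStatus: 'Online' | 'Draft' | 'Archived'
}

function shouldSync(record: ArticleVersion): boolean {
  return record.PublishStatus === 'Online'
}
```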

* fix(connectors): pre-download size check and remove dead parameter

- OneDrive/SharePoint: add file size check against MAX_FILE_SIZE before
  downloading, matching Dropbox's behavior. Prevents OOM on large files.
- Slack: remove unused syncContext parameter from fetchChannelMessages.
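A minimal sketch of the pre-download size guard (the MAX_FILE_SIZE value and helper name are assumptions, not the repo's actual constants):

```typescript
// Compare the size reported by the listing API against the limit before
// fetching content, so oversized files are skipped instead of being read
// into memory.
const MAX_FILE_SIZE = 20 * 1024 * 1024 // assumed 20 MB limit for illustration

function isDownloadable(sizeBytes: number | undefined): boolean {
  // Files with an unknown size are allowed through here; the download
  // path would still need its own enforcement.
  if (sizeBytes === undefined) return true
  return sizeBytes <= MAX_FILE_SIZE
}
```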

* fix(connectors): slack getDocument user cache & wordpress scope reduction

- Slack: pass a local syncContext to formatMessages in getDocument so
  resolveUserName caches user lookups across messages. Without this,
  every message triggered a fresh users.info API call.
- WordPress: replace 'global' scope with 'posts' and 'sites' following
  principle of least privilege. The connector only reads posts and
  validates site existence.

* fix(connectors): revert wordpress scope and slack local cache changes

- WordPress: revert requiredScopes to ['global'] — the scope check
  does literal string matching, so ['posts', 'sites'] would always
  fail since auth.ts requests 'global' from WordPress.com OAuth.
  Reducing scope requires changing both auth.ts and the connector.
- Slack: remove local syncContext from getDocument — the perf impact
  of uncached users.info calls is negligible for typical channels
  (bounded by unique users, not message count).

* fix(connectors): align requiredScopes with auth.ts registrations

The scope check in getMissingRequiredScopes does literal string matching
against the OAuth token's granted scopes. requiredScopes must match what
auth.ts actually requests (since that's what the provider returns).

- HubSpot: use 'tickets' (legacy scope in auth.ts) instead of
  'crm.objects.tickets.read' (v3 granular scope not requested)
- Google Docs: use 'drive' (what auth.ts requests) instead of
  'documents.readonly' and 'drive.readonly' (never requested,
  so never in the granted set)
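Literal matching as described means a required scope is satisfied only by an exact string match in the granted set; a sketch (the real function's signature may differ):

```typescript
// A required scope counts as granted only if the exact string appears in
// the token's scope list; there is no superset or implication logic.
function getMissingRequiredScopes(required: string[], granted: string[]): string[] {
  const grantedSet = new Set(granted)
  return required.filter((scope) => !grantedSet.has(scope))
}

// Why a granular scope fails even when a superset was granted:
// getMissingRequiredScopes(['drive.readonly'], ['drive']) returns ['drive.readonly'].
```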

* fix(connectors): align Google Drive requiredScopes with auth.ts

Google Drive connector required 'drive.readonly' but auth.ts requests
'drive' (the superset). Since scope validation does literal matching,
this caused a spurious 'Additional permissions required' warning.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 19:31:17 -08:00

Sim Logo

The open-source platform to build AI agents and run your agentic workforce. Connect 1,000+ integrations and LLMs to orchestrate agentic workflows.


Build Workflows with Ease

Design agent workflows visually on a canvas—connect agents, tools, and blocks, then run them instantly.

Workflow Builder Demo

Supercharge with Copilot

Leverage Copilot to generate nodes, fix errors, and iterate on flows directly from natural language.

Copilot Demo

Integrate Vector Databases

Upload documents to a vector store and let agents answer questions grounded in your specific content.

Knowledge Uploads and Retrieval Demo

Quickstart

Cloud-hosted: sim.ai


Self-hosted: NPM Package

npx simstudio

Then open http://localhost:3000

Note

Docker must be installed and running on your machine.

Options

Flag               Description
-p, --port <port>  Port to run Sim on (default 3000)
--no-pull          Skip pulling latest Docker images

Self-hosted: Docker Compose

git clone https://github.com/simstudioai/sim.git && cd sim
docker compose -f docker-compose.prod.yml up -d

Open http://localhost:3000

Using Local Models with Ollama

Run Sim with local AI models using Ollama - no external APIs required:

# Start with GPU support (automatically downloads gemma3:4b model)
docker compose -f docker-compose.ollama.yml --profile setup up -d

# For CPU-only systems:
docker compose -f docker-compose.ollama.yml --profile cpu --profile setup up -d

Wait for the model to download, then visit http://localhost:3000. Add more models with:

docker compose -f docker-compose.ollama.yml exec ollama ollama pull llama3.1:8b

Using an External Ollama Instance

If Ollama is running on your host machine, use host.docker.internal instead of localhost:

OLLAMA_URL=http://host.docker.internal:11434 docker compose -f docker-compose.prod.yml up -d

On Linux, use your host's IP address or add extra_hosts: ["host.docker.internal:host-gateway"] to the compose file.

Using vLLM

Sim supports vLLM for self-hosted models. Set VLLM_BASE_URL and optionally VLLM_API_KEY in your environment.

Self-hosted: Dev Containers

  1. Open VS Code with the Remote - Containers extension
  2. Open the project and click "Reopen in Container" when prompted
  3. Run bun run dev:full in the terminal or use the sim-start alias
    • This starts both the main application and the realtime socket server

Self-hosted: Manual Setup

Requirements: Bun, Node.js v20+, PostgreSQL 12+ with pgvector

  1. Clone and install:
git clone https://github.com/simstudioai/sim.git
cd sim
bun install
  2. Set up PostgreSQL with pgvector:
docker run --name simstudio-db -e POSTGRES_PASSWORD=your_password -e POSTGRES_DB=simstudio -p 5432:5432 -d pgvector/pgvector:pg17

Or install manually via the pgvector guide.

  3. Configure environment:
cp apps/sim/.env.example apps/sim/.env
cp packages/db/.env.example packages/db/.env
# Edit both .env files to set DATABASE_URL="postgresql://postgres:your_password@localhost:5432/simstudio"
  4. Run migrations:
cd packages/db && bunx drizzle-kit migrate --config=./drizzle.config.ts
  5. Start development servers:
bun run dev:full  # Starts both the Next.js app and the realtime socket server

Or run separately: bun run dev (Next.js) and cd apps/sim && bun run dev:sockets (realtime).

Copilot API Keys

Copilot is a Sim-managed service. To use Copilot on a self-hosted instance:

  • Go to https://sim.ai → Settings → Copilot and generate a Copilot API key
  • Set the COPILOT_API_KEY environment variable in your self-hosted apps/sim/.env file to that value

Environment Variables

Key environment variables for self-hosted deployments. See .env.example for defaults or env.ts for the full list.

Variable             Required  Description
DATABASE_URL         Yes       PostgreSQL connection string with pgvector
BETTER_AUTH_SECRET   Yes       Auth secret (openssl rand -hex 32)
BETTER_AUTH_URL      Yes       Your app URL (e.g., http://localhost:3000)
NEXT_PUBLIC_APP_URL  Yes       Public app URL (same as above)
ENCRYPTION_KEY       Yes       Encrypts environment variables (openssl rand -hex 32)
INTERNAL_API_SECRET  Yes       Secures internal API routes (openssl rand -hex 32)
API_ENCRYPTION_KEY   Yes       Encrypts API keys (openssl rand -hex 32)
COPILOT_API_KEY      No        API key from sim.ai for Copilot features

Tech Stack

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Made with ❤️ by the Sim Team
