Compare commits

...

2 Commits

Author SHA1 Message Date
Vikhyath Mondreti
dbe53f406b fix(max-tokens): anthropic models streaming vs non-streaming 2026-01-25 14:28:09 -08:00
Waleed
be2a9ef0f8 fix(storage): support Azure connection string for presigned URLs (#2997)
* fix(docs): update requirements to be more accurate for deploying the app

* updated kb to support 1536 dimension vectors for models other than text embedding 3 small

* fix(storage): support Azure connection string for presigned URLs

* fix(kb): update test for embedding dimensions parameter

* fix(storage): align credential source ordering for consistency
2026-01-25 13:06:12 -08:00
21 changed files with 262 additions and 91 deletions

View File

@@ -44,7 +44,7 @@ services:
     deploy:
       resources:
         limits:
-          memory: 4G
+          memory: 1G
     environment:
       - NODE_ENV=development
       - DATABASE_URL=postgresql://postgres:postgres@db:5432/simstudio

View File

@@ -10,12 +10,20 @@ Stellen Sie Sim auf Ihrer eigenen Infrastruktur mit Docker oder Kubernetes bereit

 ## Anforderungen

-| Ressource | Minimum | Empfohlen |
-|----------|---------|-------------|
-| CPU | 2 Kerne | 4+ Kerne |
-| RAM | 12 GB | 16+ GB |
-| Speicher | 20 GB SSD | 50+ GB SSD |
-| Docker | 20.10+ | Neueste Version |
+| Ressource | Klein | Standard | Produktion |
+|----------|-------|----------|------------|
+| CPU | 2 Kerne | 4 Kerne | 8+ Kerne |
+| RAM | 12 GB | 16 GB | 32+ GB |
+| Speicher | 20 GB SSD | 50 GB SSD | 100+ GB SSD |
+| Docker | 20.10+ | 20.10+ | Neueste Version |
+
+**Klein**: Entwicklung, Tests, Einzelnutzer (1-5 Nutzer)
+**Standard**: Teams (5-50 Nutzer), moderate Arbeitslasten
+**Produktion**: Große Teams (50+ Nutzer), Hochverfügbarkeit, intensive Workflow-Ausführung
+
+<Callout type="info">
+Die Ressourcenanforderungen werden durch Workflow-Ausführung (isolated-vm Sandboxing), Dateiverarbeitung (In-Memory-Dokumentenparsing) und Vektoroperationen (pgvector) bestimmt. Arbeitsspeicher ist typischerweise der limitierende Faktor, nicht CPU. Produktionsdaten zeigen, dass die Hauptanwendung durchschnittlich 4-8 GB und bei hoher Last bis zu 12 GB benötigt.
+</Callout>
+
 ## Schnellstart

View File

@@ -16,12 +16,20 @@ Deploy Sim on your own infrastructure with Docker or Kubernetes.

 ## Requirements

-| Resource | Minimum | Recommended |
-|----------|---------|-------------|
-| CPU | 2 cores | 4+ cores |
-| RAM | 12 GB | 16+ GB |
-| Storage | 20 GB SSD | 50+ GB SSD |
-| Docker | 20.10+ | Latest |
+| Resource | Small | Standard | Production |
+|----------|-------|----------|------------|
+| CPU | 2 cores | 4 cores | 8+ cores |
+| RAM | 12 GB | 16 GB | 32+ GB |
+| Storage | 20 GB SSD | 50 GB SSD | 100+ GB SSD |
+| Docker | 20.10+ | 20.10+ | Latest |
+
+**Small**: Development, testing, single user (1-5 users)
+**Standard**: Teams (5-50 users), moderate workloads
+**Production**: Large teams (50+ users), high availability, heavy workflow execution
+
+<Callout type="info">
+Resource requirements are driven by workflow execution (isolated-vm sandboxing), file processing (in-memory document parsing), and vector operations (pgvector). Memory is typically the constraining factor rather than CPU. Production telemetry shows the main app uses 4-8 GB average with peaks up to 12 GB under heavy load.
+</Callout>
+
 ## Quick Start

View File

@@ -10,12 +10,20 @@ Despliega Sim en tu propia infraestructura con Docker o Kubernetes.

 ## Requisitos

-| Recurso | Mínimo | Recomendado |
-|----------|---------|-------------|
-| CPU | 2 núcleos | 4+ núcleos |
-| RAM | 12 GB | 16+ GB |
-| Almacenamiento | 20 GB SSD | 50+ GB SSD |
-| Docker | 20.10+ | Última versión |
+| Recurso | Pequeño | Estándar | Producción |
+|----------|---------|----------|------------|
+| CPU | 2 núcleos | 4 núcleos | 8+ núcleos |
+| RAM | 12 GB | 16 GB | 32+ GB |
+| Almacenamiento | 20 GB SSD | 50 GB SSD | 100+ GB SSD |
+| Docker | 20.10+ | 20.10+ | Última versión |
+
+**Pequeño**: Desarrollo, pruebas, usuario único (1-5 usuarios)
+**Estándar**: Equipos (5-50 usuarios), cargas de trabajo moderadas
+**Producción**: Equipos grandes (50+ usuarios), alta disponibilidad, ejecución intensiva de workflows
+
+<Callout type="info">
+Los requisitos de recursos están determinados por la ejecución de workflows (sandboxing isolated-vm), procesamiento de archivos (análisis de documentos en memoria) y operaciones vectoriales (pgvector). La memoria suele ser el factor limitante, no la CPU. La telemetría de producción muestra que la aplicación principal usa 4-8 GB en promedio con picos de hasta 12 GB bajo carga pesada.
+</Callout>
+
 ## Inicio rápido

View File

@@ -10,12 +10,20 @@ Déployez Sim sur votre propre infrastructure avec Docker ou Kubernetes.

 ## Prérequis

-| Ressource | Minimum | Recommandé |
-|----------|---------|-------------|
-| CPU | 2 cœurs | 4+ cœurs |
-| RAM | 12 Go | 16+ Go |
-| Stockage | 20 Go SSD | 50+ Go SSD |
-| Docker | 20.10+ | Dernière version |
+| Ressource | Petit | Standard | Production |
+|----------|-------|----------|------------|
+| CPU | 2 cœurs | 4 cœurs | 8+ cœurs |
+| RAM | 12 Go | 16 Go | 32+ Go |
+| Stockage | 20 Go SSD | 50 Go SSD | 100+ Go SSD |
+| Docker | 20.10+ | 20.10+ | Dernière version |
+
+**Petit** : Développement, tests, utilisateur unique (1-5 utilisateurs)
+**Standard** : Équipes (5-50 utilisateurs), charges de travail modérées
+**Production** : Grandes équipes (50+ utilisateurs), haute disponibilité, exécution intensive de workflows
+
+<Callout type="info">
+Les besoins en ressources sont déterminés par l'exécution des workflows (sandboxing isolated-vm), le traitement des fichiers (analyse de documents en mémoire) et les opérations vectorielles (pgvector). La mémoire est généralement le facteur limitant, pas le CPU. La télémétrie de production montre que l'application principale utilise 4-8 Go en moyenne avec des pics jusqu'à 12 Go sous forte charge.
+</Callout>
+
 ## Démarrage rapide

View File

@@ -10,12 +10,20 @@ DockerまたはKubernetesを使用して、自社のインフラストラクチ

 ## 要件

-| リソース | 最小 | 推奨 |
-|----------|---------|-------------|
-| CPU | 2コア | 4+コア |
-| RAM | 12 GB | 16+ GB |
-| ストレージ | 20 GB SSD | 50+ GB SSD |
-| Docker | 20.10+ | 最新版 |
+| リソース | スモール | スタンダード | プロダクション |
+|----------|---------|-------------|----------------|
+| CPU | 2コア | 4コア | 8+コア |
+| RAM | 12 GB | 16 GB | 32+ GB |
+| ストレージ | 20 GB SSD | 50 GB SSD | 100+ GB SSD |
+| Docker | 20.10+ | 20.10+ | 最新版 |
+
+**スモール**: 開発、テスト、シングルユーザー(1-5ユーザー)
+**スタンダード**: チーム(5-50ユーザー)、中程度のワークロード
+**プロダクション**: 大規模チーム(50+ユーザー)、高可用性、高負荷ワークフロー実行
+
+<Callout type="info">
+リソース要件は、ワークフロー実行(isolated-vmサンドボックス)、ファイル処理(メモリ内ドキュメント解析)、ベクトル演算(pgvector)によって決まります。CPUよりもメモリが制約要因となることが多いです。本番環境のテレメトリによると、メインアプリは平均4-8 GB、高負荷時は最大12 GBを使用します。
+</Callout>
+
 ## クイックスタート

View File

@@ -10,12 +10,20 @@ import { Callout } from 'fumadocs-ui/components/callout'

 ## 要求

-| 资源 | 最低要求 | 推荐配置 |
-|----------|---------|-------------|
-| CPU | 2 核 | 4 核及以上 |
-| 内存 | 12 GB | 16 GB 及以上 |
-| 存储 | 20 GB SSD | 50 GB 及以上 SSD |
-| Docker | 20.10+ | 最新版本 |
+| 资源 | 小型 | 标准 | 生产环境 |
+|----------|------|------|----------|
+| CPU | 2 核 | 4 核 | 8+ 核 |
+| 内存 | 12 GB | 16 GB | 32+ GB |
+| 存储 | 20 GB SSD | 50 GB SSD | 100+ GB SSD |
+| Docker | 20.10+ | 20.10+ | 最新版本 |
+
+**小型**: 开发、测试、单用户(1-5 用户)
+**标准**: 团队(5-50 用户)、中等工作负载
+**生产环境**: 大型团队(50+ 用户)、高可用性、密集工作流执行
+
+<Callout type="info">
+资源需求由工作流执行(isolated-vm 沙箱)、文件处理(内存中文档解析)和向量运算(pgvector)决定。内存通常是限制因素而不是 CPU。生产遥测数据显示,主应用平均使用 4-8 GB,高负载时峰值可达 12 GB。
+</Callout>
+
 ## 快速开始

View File

@@ -408,6 +408,7 @@ describe('Knowledge Search Utils', () => {
         input: ['test query'],
         model: 'text-embedding-3-small',
         encoding_format: 'float',
+        dimensions: 1536,
       }),
     })
   )

View File

@@ -513,6 +513,12 @@ Return ONLY the JSON array.`,
       })(),
     }),
   },
+  {
+    id: 'maxTokens',
+    title: 'Max Output Tokens',
+    type: 'short-input',
+    placeholder: 'Enter max tokens (e.g., 4096)...',
+  },
   {
     id: 'responseFormat',
     title: 'Response Format',
@@ -754,6 +760,7 @@ Example 3 (Array Input):
     },
   },
   temperature: { type: 'number', description: 'Response randomness level' },
+  maxTokens: { type: 'number', description: 'Maximum number of tokens in the response' },
   reasoningEffort: { type: 'string', description: 'Reasoning effort level for GPT-5 models' },
   verbosity: { type: 'string', description: 'Verbosity level for GPT-5 models' },
   thinkingLevel: { type: 'string', description: 'Thinking level for Gemini 3 models' },

View File

@@ -8,6 +8,17 @@ const logger = createLogger('EmbeddingUtils')
 const MAX_TOKENS_PER_REQUEST = 8000
 const MAX_CONCURRENT_BATCHES = env.KB_CONFIG_CONCURRENCY_LIMIT || 50
+const EMBEDDING_DIMENSIONS = 1536
+
+/**
+ * Check if the model supports custom dimensions.
+ * text-embedding-3-* models support the dimensions parameter.
+ * Checks for 'embedding-3' to handle Azure deployments with custom naming conventions.
+ */
+function supportsCustomDimensions(modelName: string): boolean {
+  const name = modelName.toLowerCase()
+  return name.includes('embedding-3') && !name.includes('ada')
+}

 export class EmbeddingAPIError extends Error {
   public status: number
@@ -93,15 +104,19 @@ async function getEmbeddingConfig(
 async function callEmbeddingAPI(inputs: string[], config: EmbeddingConfig): Promise<number[][]> {
   return retryWithExponentialBackoff(
     async () => {
+      const useDimensions = supportsCustomDimensions(config.modelName)
+
       const requestBody = config.useAzure
         ? {
             input: inputs,
             encoding_format: 'float',
+            ...(useDimensions && { dimensions: EMBEDDING_DIMENSIONS }),
           }
         : {
             input: inputs,
             model: config.modelName,
            encoding_format: 'float',
+            ...(useDimensions && { dimensions: EMBEDDING_DIMENSIONS }),
           }

       const response = await fetch(config.apiUrl, {
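A note on the pattern in this hunk: the conditional spread `...(useDimensions && { dimensions: EMBEDDING_DIMENSIONS })` adds the key only when the model accepts it, since spreading `false` in an object literal contributes nothing. A self-contained sketch of the behavior, reusing the helper from the diff (the `buildRequestBody` wrapper is illustrative, not from this PR):

```ts
const EMBEDDING_DIMENSIONS = 1536

// Same check as in the diff above
function supportsCustomDimensions(modelName: string): boolean {
  const name = modelName.toLowerCase()
  return name.includes('embedding-3') && !name.includes('ada')
}

// Illustrative wrapper: spreading `false` adds no keys, so models that
// predate the dimensions parameter never see a field they would reject.
function buildRequestBody(model: string, inputs: string[]) {
  const useDimensions = supportsCustomDimensions(model)
  return {
    input: inputs,
    model,
    encoding_format: 'float',
    ...(useDimensions && { dimensions: EMBEDDING_DIMENSIONS }),
  }
}

console.log('dimensions' in buildRequestBody('text-embedding-3-small', ['hi'])) // true
console.log('dimensions' in buildRequestBody('text-embedding-ada-002', ['hi'])) // false
```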

View File

@@ -18,6 +18,52 @@ const logger = createLogger('BlobClient')

 let _blobServiceClient: BlobServiceClientInstance | null = null

+interface ParsedCredentials {
+  accountName: string
+  accountKey: string
+}
+
+/**
+ * Extract account name and key from an Azure connection string.
+ * Connection strings have the format: DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=...
+ */
+function parseConnectionString(connectionString: string): ParsedCredentials {
+  const accountNameMatch = connectionString.match(/AccountName=([^;]+)/)
+  if (!accountNameMatch) {
+    throw new Error('Cannot extract account name from connection string')
+  }
+
+  const accountKeyMatch = connectionString.match(/AccountKey=([^;]+)/)
+  if (!accountKeyMatch) {
+    throw new Error('Cannot extract account key from connection string')
+  }
+
+  return {
+    accountName: accountNameMatch[1],
+    accountKey: accountKeyMatch[1],
+  }
+}
+
+/**
+ * Get account credentials from BLOB_CONFIG, extracting from connection string if necessary.
+ */
+function getAccountCredentials(): ParsedCredentials {
+  if (BLOB_CONFIG.connectionString) {
+    return parseConnectionString(BLOB_CONFIG.connectionString)
+  }
+
+  if (BLOB_CONFIG.accountName && BLOB_CONFIG.accountKey) {
+    return {
+      accountName: BLOB_CONFIG.accountName,
+      accountKey: BLOB_CONFIG.accountKey,
+    }
+  }
+
+  throw new Error(
+    'Azure Blob Storage credentials are missing: set AZURE_CONNECTION_STRING or both AZURE_ACCOUNT_NAME and AZURE_ACCOUNT_KEY'
+  )
+}
+
 export async function getBlobServiceClient(): Promise<BlobServiceClientInstance> {
   if (_blobServiceClient) return _blobServiceClient
@@ -127,6 +173,8 @@ export async function getPresignedUrl(key: string, expiresIn = 3600) {
   const containerClient = blobServiceClient.getContainerClient(BLOB_CONFIG.containerName)
   const blockBlobClient = containerClient.getBlockBlobClient(key)

+  const { accountName, accountKey } = getAccountCredentials()
+
   const sasOptions = {
     containerName: BLOB_CONFIG.containerName,
     blobName: key,
@@ -137,13 +185,7 @@ export async function getPresignedUrl(key: string, expiresIn = 3600) {
   const sasToken = generateBlobSASQueryParameters(
     sasOptions,
-    new StorageSharedKeyCredential(
-      BLOB_CONFIG.accountName,
-      BLOB_CONFIG.accountKey ??
-        (() => {
-          throw new Error('AZURE_ACCOUNT_KEY is required when using account name authentication')
-        })()
-    )
+    new StorageSharedKeyCredential(accountName, accountKey)
   ).toString()

   return `${blockBlobClient.url}?${sasToken}`
@@ -168,9 +210,14 @@ export async function getPresignedUrlWithConfig(
     StorageSharedKeyCredential,
   } = await import('@azure/storage-blob')

   let tempBlobServiceClient: BlobServiceClientInstance
+  let accountName: string
+  let accountKey: string

   if (customConfig.connectionString) {
     tempBlobServiceClient = BlobServiceClient.fromConnectionString(customConfig.connectionString)
+    const credentials = parseConnectionString(customConfig.connectionString)
+    accountName = credentials.accountName
+    accountKey = credentials.accountKey
   } else if (customConfig.accountName && customConfig.accountKey) {
     const sharedKeyCredential = new StorageSharedKeyCredential(
       customConfig.accountName,
@@ -180,6 +227,8 @@ export async function getPresignedUrlWithConfig(
       `https://${customConfig.accountName}.blob.core.windows.net`,
       sharedKeyCredential
     )
+    accountName = customConfig.accountName
+    accountKey = customConfig.accountKey
   } else {
     throw new Error(
       'Custom blob config must include either connectionString or accountName + accountKey'
@@ -199,13 +248,7 @@ export async function getPresignedUrlWithConfig(
   const sasToken = generateBlobSASQueryParameters(
     sasOptions,
-    new StorageSharedKeyCredential(
-      customConfig.accountName,
-      customConfig.accountKey ??
-        (() => {
-          throw new Error('Account key is required when using account name authentication')
-        })()
-    )
+    new StorageSharedKeyCredential(accountName, accountKey)
   ).toString()

   return `${blockBlobClient.url}?${sasToken}`
@@ -403,13 +446,9 @@ export async function getMultipartPartUrls(
   if (customConfig) {
     if (customConfig.connectionString) {
       blobServiceClient = BlobServiceClient.fromConnectionString(customConfig.connectionString)
-      const match = customConfig.connectionString.match(/AccountName=([^;]+)/)
-      if (!match) throw new Error('Cannot extract account name from connection string')
-      accountName = match[1]
-      const keyMatch = customConfig.connectionString.match(/AccountKey=([^;]+)/)
-      if (!keyMatch) throw new Error('Cannot extract account key from connection string')
-      accountKey = keyMatch[1]
+      const credentials = parseConnectionString(customConfig.connectionString)
+      accountName = credentials.accountName
+      accountKey = credentials.accountKey
     } else if (customConfig.accountName && customConfig.accountKey) {
       const credential = new StorageSharedKeyCredential(
         customConfig.accountName,
@@ -428,12 +467,9 @@ export async function getMultipartPartUrls(
   } else {
     blobServiceClient = await getBlobServiceClient()
     containerName = BLOB_CONFIG.containerName
-    accountName = BLOB_CONFIG.accountName
-    accountKey =
-      BLOB_CONFIG.accountKey ||
-      (() => {
-        throw new Error('AZURE_ACCOUNT_KEY is required')
-      })()
+    const credentials = getAccountCredentials()
+    accountName = credentials.accountName
+    accountKey = credentials.accountKey
   }

   const containerClient = blobServiceClient.getContainerClient(containerName)
@@ -501,12 +537,10 @@ export async function completeMultipartUpload(
   const containerClient = blobServiceClient.getContainerClient(containerName)
   const blockBlobClient = containerClient.getBlockBlobClient(key)

-  // Sort parts by part number and extract block IDs
   const sortedBlockIds = parts
     .sort((a, b) => a.partNumber - b.partNumber)
     .map((part) => part.blockId)

-  // Commit the block list to create the final blob
   await blockBlobClient.commitBlockList(sortedBlockIds, {
     metadata: {
       multipartUpload: 'completed',
@@ -557,10 +591,8 @@ export async function abortMultipartUpload(key: string, customConfig?: BlobConfi
   const blockBlobClient = containerClient.getBlockBlobClient(key)

   try {
-    // Delete the blob if it exists (this also cleans up any uncommitted blocks)
     await blockBlobClient.deleteIfExists()
   } catch (error) {
-    // Ignore errors since we're just cleaning up
     logger.warn('Error cleaning up multipart upload:', error)
   }
 }
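Since the credential extraction above hinges on two regexes, a quick usage sketch of the same matching against a made-up connection string may help (account name and key here are placeholders):

```ts
// Placeholder credentials, in the format documented on parseConnectionString
const sample =
  'DefaultEndpointsProtocol=https;AccountName=simstudio;AccountKey=abc123==;EndpointSuffix=core.windows.net'

// The [^;]+ character class stops at the next semicolon, so base64
// padding characters ('=') inside the key are captured intact.
console.log(sample.match(/AccountName=([^;]+)/)?.[1]) // "simstudio"
console.log(sample.match(/AccountKey=([^;]+)/)?.[1]) // "abc123=="
```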

View File

@@ -9,6 +9,7 @@ import {
   generateToolUseId,
 } from '@/providers/anthropic/utils'
 import {
+  getMaxOutputTokensForModel,
   getProviderDefaultModel,
   getProviderModels,
   supportsNativeStructuredOutputs,
@@ -178,7 +179,9 @@ export const anthropicProvider: ProviderConfig = {
       model: request.model,
       messages,
       system: systemPrompt,
-      max_tokens: Number.parseInt(String(request.maxTokens)) || 1024,
+      max_tokens:
+        Number.parseInt(String(request.maxTokens)) ||
+        getMaxOutputTokensForModel(request.model, request.stream ?? false),
       temperature: Number.parseFloat(String(request.temperature ?? 0.7)),
     }
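The fallback expression relies on `Number.parseInt(String(x)) || default` semantics: an unset `maxTokens` stringifies to "undefined", parses to NaN, and NaN is falsy, so the model-aware default wins. A standalone sketch (the function name is illustrative, not from this PR):

```ts
// Illustrative only: mirrors the max_tokens expression in the hunk above.
function resolveMaxTokens(requested: unknown, modelDefault: number): number {
  return Number.parseInt(String(requested)) || modelDefault
}

console.log(resolveMaxTokens('2048', 4096)) // 2048
console.log(resolveMaxTokens(undefined, 4096)) // 4096 (NaN is falsy)
console.log(resolveMaxTokens(0, 4096)) // 4096 — a quirk of ||: an explicit 0 also falls through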

View File

@@ -20,7 +20,11 @@ import {
   generateToolUseId,
   getBedrockInferenceProfileId,
 } from '@/providers/bedrock/utils'
-import { getProviderDefaultModel, getProviderModels } from '@/providers/models'
+import {
+  getMaxOutputTokensForModel,
+  getProviderDefaultModel,
+  getProviderModels,
+} from '@/providers/models'
 import type {
   ProviderConfig,
   ProviderRequest,
@@ -259,7 +263,9 @@ export const bedrockProvider: ProviderConfig = {
   const inferenceConfig = {
     temperature: Number.parseFloat(String(request.temperature ?? 0.7)),
-    maxTokens: Number.parseInt(String(request.maxTokens)) || 4096,
+    maxTokens:
+      Number.parseInt(String(request.maxTokens)) ||
+      getMaxOutputTokensForModel(request.model, request.stream ?? false),
   }

   const shouldStreamToolCalls = request.streamToolCalls ?? false

View File

@@ -34,6 +34,12 @@ export interface ModelCapabilities {
   toolUsageControl?: boolean
   computerUse?: boolean
   nativeStructuredOutputs?: boolean
+  maxOutputTokens?: {
+    /** Maximum tokens for streaming requests */
+    max: number
+    /** Safe default for non-streaming requests (to avoid timeout issues) */
+    default: number
+  }
   reasoningEffort?: {
     values: string[]
   }
@@ -613,6 +619,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       nativeStructuredOutputs: true,
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -627,6 +634,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       nativeStructuredOutputs: true,
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -640,6 +648,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     },
     capabilities: {
       temperature: { min: 0, max: 1 },
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -654,6 +663,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       nativeStructuredOutputs: true,
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -668,6 +678,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       nativeStructuredOutputs: true,
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -681,6 +692,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     },
     capabilities: {
       temperature: { min: 0, max: 1 },
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -695,6 +707,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       computerUse: true,
+      maxOutputTokens: { max: 8192, default: 8192 },
     },
     contextWindow: 200000,
   },
@@ -709,6 +722,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       computerUse: true,
+      maxOutputTokens: { max: 8192, default: 8192 },
     },
     contextWindow: 200000,
   },
@@ -1655,6 +1669,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       nativeStructuredOutputs: true,
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -1668,6 +1683,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       nativeStructuredOutputs: true,
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -1681,6 +1697,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       nativeStructuredOutputs: true,
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -1694,6 +1711,7 @@ export const PROVIDER_DEFINITIONS: Record<string, ProviderDefinition> = {
     capabilities: {
       temperature: { min: 0, max: 1 },
       nativeStructuredOutputs: true,
+      maxOutputTokens: { max: 64000, default: 4096 },
     },
     contextWindow: 200000,
   },
@@ -2333,3 +2351,31 @@ export function getThinkingLevelsForModel(modelId: string): string[] | null {
   const capability = getThinkingCapability(modelId)
   return capability?.levels ?? null
 }
+
+/**
+ * Get the max output tokens for a specific model.
+ * Returns the model's max capacity for streaming requests,
+ * or the model's safe default for non-streaming requests to avoid timeout issues.
+ *
+ * @param modelId - The model ID
+ * @param streaming - Whether the request is streaming (default: false)
+ */
+export function getMaxOutputTokensForModel(modelId: string, streaming = false): number {
+  const normalizedModelId = modelId.toLowerCase()
+  const STANDARD_MAX_OUTPUT_TOKENS = 4096
+
+  for (const provider of Object.values(PROVIDER_DEFINITIONS)) {
+    for (const model of provider.models) {
+      const baseModelId = model.id.toLowerCase()
+      if (normalizedModelId === baseModelId || normalizedModelId.startsWith(`${baseModelId}-`)) {
+        const outputTokens = model.capabilities.maxOutputTokens
+        if (outputTokens) {
+          return streaming ? outputTokens.max : outputTokens.default
+        }
+        return STANDARD_MAX_OUTPUT_TOKENS
+      }
+    }
+  }
+
+  return STANDARD_MAX_OUTPUT_TOKENS
+}
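Note the matching rule in `getMaxOutputTokensForModel`: an exact id, or a `startsWith(`${baseModelId}-`)` prefix, which lets dated snapshots resolve to their base model's limits. A standalone sketch with a made-up one-entry table (the real data lives in PROVIDER_DEFINITIONS):

```ts
// Made-up table for illustration; the shape mirrors the maxOutputTokens capability above.
const limits: Record<string, { max: number; default: number }> = {
  'claude-sonnet-4-5': { max: 64000, default: 4096 },
}

function lookup(modelId: string, streaming = false): number {
  const id = modelId.toLowerCase()
  for (const [base, caps] of Object.entries(limits)) {
    // Exact id, or a dated variant such as "<base>-20250929"
    if (id === base || id.startsWith(`${base}-`)) {
      return streaming ? caps.max : caps.default
    }
  }
  return 4096 // mirrors STANDARD_MAX_OUTPUT_TOKENS
}

console.log(lookup('claude-sonnet-4-5-20250929', true)) // 64000 (streaming: full capacity)
console.log(lookup('claude-sonnet-4-5-20250929')) // 4096 (non-streaming: safe default)
```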

View File

@@ -8,6 +8,7 @@ import {
   getComputerUseModels,
   getEmbeddingModelPricing,
   getHostedModels as getHostedModelsFromDefinitions,
+  getMaxOutputTokensForModel as getMaxOutputTokensForModelFromDefinitions,
   getMaxTemperature as getMaxTempFromDefinitions,
   getModelPricing as getModelPricingFromDefinitions,
   getModelsWithReasoningEffort,
@@ -992,6 +993,18 @@ export function getThinkingLevelsForModel(model: string): string[] | null {
   return getThinkingLevelsForModelFromDefinitions(model)
 }

+/**
+ * Get max output tokens for a specific model.
+ * Returns the model's maxOutputTokens capability for streaming requests,
+ * or the model's safe default (falling back to 4096) for non-streaming requests to avoid timeout issues.
+ *
+ * @param model - The model ID
+ * @param streaming - Whether the request is streaming (default: false)
+ */
+export function getMaxOutputTokensForModel(model: string, streaming = false): number {
+  return getMaxOutputTokensForModelFromDefinitions(model, streaming)
+}
+
 /**
  * Prepare tool execution parameters, separating tool parameters from system parameters
  */

View File

@@ -5,7 +5,7 @@ import type { ToolConfig, ToolResponse } from '@/tools/types'
 const logger = createLogger('BrowserUseTool')

 const POLL_INTERVAL_MS = 5000
-const MAX_POLL_TIME_MS = 180000
+const MAX_POLL_TIME_MS = 600000 // 10 minutes
 const MAX_CONSECUTIVE_ERRORS = 3

 async function createSessionWithProfile(
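Raising MAX_POLL_TIME_MS from 3 to 10 minutes only moves the loop's deadline; together the three constants imply a bounded poll of roughly this shape (a hypothetical sketch, not the tool's actual loop; `checkStatus` is a stand-in):

```ts
const POLL_INTERVAL_MS = 5000
const MAX_POLL_TIME_MS = 600000 // 10 minutes
const MAX_CONSECUTIVE_ERRORS = 3

// Hypothetical poller governed by the constants above: poll every 5 s,
// give up at the deadline, and abort after 3 consecutive errors.
async function pollUntilDone(checkStatus: () => Promise<'running' | 'finished'>): Promise<boolean> {
  const deadline = Date.now() + MAX_POLL_TIME_MS
  let consecutiveErrors = 0
  while (Date.now() < deadline) {
    try {
      if ((await checkStatus()) === 'finished') return true
      consecutiveErrors = 0 // a successful poll resets the error streak
    } catch {
      if (++consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) return false
    }
    await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS))
  }
  return false // timed out
}
```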

View File

@@ -52,7 +52,7 @@ services:
     deploy:
       resources:
         limits:
-          memory: 8G
+          memory: 1G
     healthcheck:
       test: ['CMD', 'wget', '--spider', '--quiet', 'http://127.0.0.1:3002/health']
       interval: 90s

View File

@@ -56,7 +56,7 @@ services:
     deploy:
       resources:
         limits:
-          memory: 8G
+          memory: 1G
     healthcheck:
       test: ['CMD', 'wget', '--spider', '--quiet', 'http://127.0.0.1:3002/health']
       interval: 90s

View File

@@ -42,7 +42,7 @@ services:
     deploy:
       resources:
         limits:
-          memory: 4G
+          memory: 1G
     environment:
       - DATABASE_URL=postgresql://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-postgres}@db:5432/${POSTGRES_DB:-simstudio}
       - NEXT_PUBLIC_APP_URL=${NEXT_PUBLIC_APP_URL:-http://localhost:3000}

View File

@@ -13,10 +13,10 @@ app:
   resources:
     limits:
-      memory: "6Gi"
+      memory: "8Gi"
       cpu: "2000m"
     requests:
-      memory: "4Gi"
+      memory: "6Gi"
       cpu: "1000m"

 # Production URLs (REQUIRED - update with your actual domain names)
@@ -52,11 +52,11 @@ realtime:
   resources:
     limits:
-      memory: "4Gi"
-      cpu: "1000m"
-    requests:
-      memory: "2Gi"
+      memory: "1Gi"
       cpu: "500m"
+    requests:
+      memory: "512Mi"
+      cpu: "250m"

   env:
     NEXT_PUBLIC_APP_URL: "https://sim.acme.ai"

View File

@@ -29,10 +29,10 @@ app:
   # Resource limits and requests
   resources:
     limits:
-      memory: "4Gi"
+      memory: "8Gi"
       cpu: "2000m"
     requests:
-      memory: "2Gi"
+      memory: "4Gi"
       cpu: "1000m"

 # Node selector for pod scheduling (leave empty to allow scheduling on any node)
@@ -245,11 +245,11 @@ realtime:
   # Resource limits and requests
   resources:
     limits:
-      memory: "2Gi"
-      cpu: "1000m"
-    requests:
       memory: "1Gi"
       cpu: "500m"
+    requests:
+      memory: "512Mi"
+      cpu: "250m"

   # Node selector for pod scheduling (leave empty to allow scheduling on any node)
   nodeSelector: {}