docs: update documentation for dynamic LLM registry

- Update llm.md: Replace hardcoded model lists with dynamic registry description
- Update ai_condition.md: Fix default model description (now uses admin-configured recommended model)
- Add docs/platform/llm-registry.md: Comprehensive admin guide for LLM Registry UI

Addresses documentation gaps for the shift from hardcoded LlmModel enum to database-driven registry.
This commit is contained in:
Bentlybro
2026-03-02 13:38:36 +00:00
parent 11e4e8ed02
commit 394cc9027f
3 changed files with 243 additions and 8 deletions

View File

@@ -20,7 +20,7 @@ The block uses a Large Language Model (LLM) to evaluate the condition by:
| Condition | A plaintext English description of the condition to evaluate |
| Yes Value | (Optional) The value to output if the condition is true. If not provided, Input Value will be used |
| No Value | (Optional) The value to output if the condition is false. If not provided, Input Value will be used |
| Model | The LLM model to use for evaluation (defaults to GPT-4o) |
| Model | The LLM model to use for evaluation. Defaults to the platform's recommended model (configurable by your admin via the LLM Registry) |
| Credentials | API credentials for the LLM provider |
## Outputs

View File

@@ -65,7 +65,7 @@ The result routes data to yes_output or no_output, enabling intelligent branchin
| condition | A plaintext English description of the condition to evaluate | str | Yes |
| yes_value | (Optional) Value to output if the condition is true. If not provided, input_value will be used. | Yes Value | No |
| no_value | (Optional) Value to output if the condition is false. If not provided, input_value will be used. | No Value | No |
| model | The language model to use for evaluating the condition. | "o3-mini" \| "o3-2025-04-16" \| "o1" \| "o1-mini" \| "gpt-5.2-2025-12-11" \| "gpt-5.1-2025-11-13" \| "gpt-5-2025-08-07" \| "gpt-5-mini-2025-08-07" \| "gpt-5-nano-2025-08-07" \| "gpt-5-chat-latest" \| "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "gpt-4o-mini" \| "gpt-4o" \| "gpt-4-turbo" \| "gpt-3.5-turbo" \| "claude-opus-4-1-20250805" \| "claude-opus-4-20250514" \| "claude-sonnet-4-20250514" \| "claude-opus-4-5-20251101" \| "claude-sonnet-4-5-20250929" \| "claude-haiku-4-5-20251001" \| "claude-opus-4-6" \| "claude-3-haiku-20240307" \| "Qwen/Qwen2.5-72B-Instruct-Turbo" \| "nvidia/llama-3.1-nemotron-70b-instruct" \| "meta-llama/Llama-3.3-70B-Instruct-Turbo" \| "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \| "meta-llama/Llama-3.2-3B-Instruct-Turbo" \| "llama-3.3-70b-versatile" \| "llama-3.1-8b-instant" \| "llama3.3" \| "llama3.2" \| "llama3" \| "llama3.1:405b" \| "dolphin-mistral:latest" \| "openai/gpt-oss-120b" \| "openai/gpt-oss-20b" \| "google/gemini-2.5-pro-preview-03-25" \| "google/gemini-3-pro-preview" \| "google/gemini-2.5-flash" \| "google/gemini-2.0-flash-001" \| "google/gemini-2.5-flash-lite-preview-06-17" \| "google/gemini-2.0-flash-lite-001" \| "mistralai/mistral-nemo" \| "cohere/command-r-08-2024" \| "cohere/command-r-plus-08-2024" \| "deepseek/deepseek-chat" \| "deepseek/deepseek-r1-0528" \| "perplexity/sonar" \| "perplexity/sonar-pro" \| "perplexity/sonar-deep-research" \| "nousresearch/hermes-3-llama-3.1-405b" \| "nousresearch/hermes-3-llama-3.1-70b" \| "amazon/nova-lite-v1" \| "amazon/nova-micro-v1" \| "amazon/nova-pro-v1" \| "microsoft/wizardlm-2-8x22b" \| "gryphe/mythomax-l2-13b" \| "meta-llama/llama-4-scout" \| "meta-llama/llama-4-maverick" \| "x-ai/grok-4" \| "x-ai/grok-4-fast" \| "x-ai/grok-4.1-fast" \| "x-ai/grok-code-fast-1" \| "moonshotai/kimi-k2" \| "qwen/qwen3-235b-a22b-thinking-2507" \| "qwen/qwen3-coder" \| "Llama-4-Scout-17B-16E-Instruct-FP8" \| "Llama-4-Maverick-17B-128E-Instruct-FP8" \| "Llama-3.3-8B-Instruct" \| "Llama-3.3-70B-Instruct" \| "v0-1.5-md" \| "v0-1.5-lg" \| "v0-1.0-md" | No |
| model | The language model to use for evaluating the condition. Available models are configured by your platform admin via the LLM Registry. Defaults to the admin-configured recommended model. | str | No |
### Outputs
@@ -103,7 +103,7 @@ The block sends the entire conversation history to the chosen LLM, including sys
|-------|-------------|------|----------|
| prompt | The prompt to send to the language model. | str | No |
| messages | List of messages in the conversation. | List[Any] | Yes |
| model | The language model to use for the conversation. | "o3-mini" \| "o3-2025-04-16" \| "o1" \| "o1-mini" \| "gpt-5.2-2025-12-11" \| "gpt-5.1-2025-11-13" \| "gpt-5-2025-08-07" \| "gpt-5-mini-2025-08-07" \| "gpt-5-nano-2025-08-07" \| "gpt-5-chat-latest" \| "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "gpt-4o-mini" \| "gpt-4o" \| "gpt-4-turbo" \| "gpt-3.5-turbo" \| "claude-opus-4-1-20250805" \| "claude-opus-4-20250514" \| "claude-sonnet-4-20250514" \| "claude-opus-4-5-20251101" \| "claude-sonnet-4-5-20250929" \| "claude-haiku-4-5-20251001" \| "claude-opus-4-6" \| "claude-3-haiku-20240307" \| "Qwen/Qwen2.5-72B-Instruct-Turbo" \| "nvidia/llama-3.1-nemotron-70b-instruct" \| "meta-llama/Llama-3.3-70B-Instruct-Turbo" \| "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \| "meta-llama/Llama-3.2-3B-Instruct-Turbo" \| "llama-3.3-70b-versatile" \| "llama-3.1-8b-instant" \| "llama3.3" \| "llama3.2" \| "llama3" \| "llama3.1:405b" \| "dolphin-mistral:latest" \| "openai/gpt-oss-120b" \| "openai/gpt-oss-20b" \| "google/gemini-2.5-pro-preview-03-25" \| "google/gemini-3-pro-preview" \| "google/gemini-2.5-flash" \| "google/gemini-2.0-flash-001" \| "google/gemini-2.5-flash-lite-preview-06-17" \| "google/gemini-2.0-flash-lite-001" \| "mistralai/mistral-nemo" \| "cohere/command-r-08-2024" \| "cohere/command-r-plus-08-2024" \| "deepseek/deepseek-chat" \| "deepseek/deepseek-r1-0528" \| "perplexity/sonar" \| "perplexity/sonar-pro" \| "perplexity/sonar-deep-research" \| "nousresearch/hermes-3-llama-3.1-405b" \| "nousresearch/hermes-3-llama-3.1-70b" \| "amazon/nova-lite-v1" \| "amazon/nova-micro-v1" \| "amazon/nova-pro-v1" \| "microsoft/wizardlm-2-8x22b" \| "gryphe/mythomax-l2-13b" \| "meta-llama/llama-4-scout" \| "meta-llama/llama-4-maverick" \| "x-ai/grok-4" \| "x-ai/grok-4-fast" \| "x-ai/grok-4.1-fast" \| "x-ai/grok-code-fast-1" \| "moonshotai/kimi-k2" \| "qwen/qwen3-235b-a22b-thinking-2507" \| "qwen/qwen3-coder" \| "Llama-4-Scout-17B-16E-Instruct-FP8" \| "Llama-4-Maverick-17B-128E-Instruct-FP8" \| "Llama-3.3-8B-Instruct" \| "Llama-3.3-70B-Instruct" \| "v0-1.5-md" \| "v0-1.5-lg" \| "v0-1.0-md" | No |
| model | The language model to use for the conversation. Available models are configured by your platform admin via the LLM Registry. Defaults to the admin-configured recommended model. | str | No |
| max_tokens | The maximum number of tokens to generate in the chat completion. | int | No |
| ollama_host | Ollama host for local models | str | No |
@@ -257,7 +257,7 @@ The block formulates a prompt based on the given focus or source data, sends it
|-------|-------------|------|----------|
| focus | The focus of the list to generate. | str | No |
| source_data | The data to generate the list from. | str | No |
| model | The language model to use for generating the list. | "o3-mini" \| "o3-2025-04-16" \| "o1" \| "o1-mini" \| "gpt-5.2-2025-12-11" \| "gpt-5.1-2025-11-13" \| "gpt-5-2025-08-07" \| "gpt-5-mini-2025-08-07" \| "gpt-5-nano-2025-08-07" \| "gpt-5-chat-latest" \| "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "gpt-4o-mini" \| "gpt-4o" \| "gpt-4-turbo" \| "gpt-3.5-turbo" \| "claude-opus-4-1-20250805" \| "claude-opus-4-20250514" \| "claude-sonnet-4-20250514" \| "claude-opus-4-5-20251101" \| "claude-sonnet-4-5-20250929" \| "claude-haiku-4-5-20251001" \| "claude-opus-4-6" \| "claude-3-haiku-20240307" \| "Qwen/Qwen2.5-72B-Instruct-Turbo" \| "nvidia/llama-3.1-nemotron-70b-instruct" \| "meta-llama/Llama-3.3-70B-Instruct-Turbo" \| "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \| "meta-llama/Llama-3.2-3B-Instruct-Turbo" \| "llama-3.3-70b-versatile" \| "llama-3.1-8b-instant" \| "llama3.3" \| "llama3.2" \| "llama3" \| "llama3.1:405b" \| "dolphin-mistral:latest" \| "openai/gpt-oss-120b" \| "openai/gpt-oss-20b" \| "google/gemini-2.5-pro-preview-03-25" \| "google/gemini-3-pro-preview" \| "google/gemini-2.5-flash" \| "google/gemini-2.0-flash-001" \| "google/gemini-2.5-flash-lite-preview-06-17" \| "google/gemini-2.0-flash-lite-001" \| "mistralai/mistral-nemo" \| "cohere/command-r-08-2024" \| "cohere/command-r-plus-08-2024" \| "deepseek/deepseek-chat" \| "deepseek/deepseek-r1-0528" \| "perplexity/sonar" \| "perplexity/sonar-pro" \| "perplexity/sonar-deep-research" \| "nousresearch/hermes-3-llama-3.1-405b" \| "nousresearch/hermes-3-llama-3.1-70b" \| "amazon/nova-lite-v1" \| "amazon/nova-micro-v1" \| "amazon/nova-pro-v1" \| "microsoft/wizardlm-2-8x22b" \| "gryphe/mythomax-l2-13b" \| "meta-llama/llama-4-scout" \| "meta-llama/llama-4-maverick" \| "x-ai/grok-4" \| "x-ai/grok-4-fast" \| "x-ai/grok-4.1-fast" \| "x-ai/grok-code-fast-1" \| "moonshotai/kimi-k2" \| "qwen/qwen3-235b-a22b-thinking-2507" \| "qwen/qwen3-coder" \| "Llama-4-Scout-17B-16E-Instruct-FP8" \| "Llama-4-Maverick-17B-128E-Instruct-FP8" \| "Llama-3.3-8B-Instruct" \| "Llama-3.3-70B-Instruct" \| "v0-1.5-md" \| "v0-1.5-lg" \| "v0-1.0-md" | No |
| model | The language model to use for generating the list. Available models are configured by your platform admin via the LLM Registry. Defaults to the admin-configured recommended model. | str | No |
| max_retries | Maximum number of retries for generating a valid list. | int | No |
| force_json_output | Whether to force the LLM to produce a JSON-only response. This can increase the block's reliability, but may also reduce the quality of the response because it prohibits the LLM from reasoning before providing its JSON response. | bool | No |
| max_tokens | The maximum number of tokens to generate in the chat completion. | int | No |
@@ -424,7 +424,7 @@ The block sends the input prompt to a chosen LLM, along with any system prompts
| prompt | The prompt to send to the language model. | str | Yes |
| expected_format | Expected format of the response. If provided, the response will be validated against this format. The keys should be the expected fields in the response, and the values should be the description of the field. | Dict[str, str] | Yes |
| list_result | Whether the response should be a list of objects in the expected format. | bool | No |
| model | The language model to use for answering the prompt. | "o3-mini" \| "o3-2025-04-16" \| "o1" \| "o1-mini" \| "gpt-5.2-2025-12-11" \| "gpt-5.1-2025-11-13" \| "gpt-5-2025-08-07" \| "gpt-5-mini-2025-08-07" \| "gpt-5-nano-2025-08-07" \| "gpt-5-chat-latest" \| "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "gpt-4o-mini" \| "gpt-4o" \| "gpt-4-turbo" \| "gpt-3.5-turbo" \| "claude-opus-4-1-20250805" \| "claude-opus-4-20250514" \| "claude-sonnet-4-20250514" \| "claude-opus-4-5-20251101" \| "claude-sonnet-4-5-20250929" \| "claude-haiku-4-5-20251001" \| "claude-opus-4-6" \| "claude-3-haiku-20240307" \| "Qwen/Qwen2.5-72B-Instruct-Turbo" \| "nvidia/llama-3.1-nemotron-70b-instruct" \| "meta-llama/Llama-3.3-70B-Instruct-Turbo" \| "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \| "meta-llama/Llama-3.2-3B-Instruct-Turbo" \| "llama-3.3-70b-versatile" \| "llama-3.1-8b-instant" \| "llama3.3" \| "llama3.2" \| "llama3" \| "llama3.1:405b" \| "dolphin-mistral:latest" \| "openai/gpt-oss-120b" \| "openai/gpt-oss-20b" \| "google/gemini-2.5-pro-preview-03-25" \| "google/gemini-3-pro-preview" \| "google/gemini-2.5-flash" \| "google/gemini-2.0-flash-001" \| "google/gemini-2.5-flash-lite-preview-06-17" \| "google/gemini-2.0-flash-lite-001" \| "mistralai/mistral-nemo" \| "cohere/command-r-08-2024" \| "cohere/command-r-plus-08-2024" \| "deepseek/deepseek-chat" \| "deepseek/deepseek-r1-0528" \| "perplexity/sonar" \| "perplexity/sonar-pro" \| "perplexity/sonar-deep-research" \| "nousresearch/hermes-3-llama-3.1-405b" \| "nousresearch/hermes-3-llama-3.1-70b" \| "amazon/nova-lite-v1" \| "amazon/nova-micro-v1" \| "amazon/nova-pro-v1" \| "microsoft/wizardlm-2-8x22b" \| "gryphe/mythomax-l2-13b" \| "meta-llama/llama-4-scout" \| "meta-llama/llama-4-maverick" \| "x-ai/grok-4" \| "x-ai/grok-4-fast" \| "x-ai/grok-4.1-fast" \| "x-ai/grok-code-fast-1" \| "moonshotai/kimi-k2" \| "qwen/qwen3-235b-a22b-thinking-2507" \| "qwen/qwen3-coder" \| "Llama-4-Scout-17B-16E-Instruct-FP8" \| "Llama-4-Maverick-17B-128E-Instruct-FP8" \| "Llama-3.3-8B-Instruct" \| "Llama-3.3-70B-Instruct" \| "v0-1.5-md" \| "v0-1.5-lg" \| "v0-1.0-md" | No |
| model | The language model to use for answering the prompt. Available models are configured by your platform admin via the LLM Registry. Defaults to the admin-configured recommended model. | str | No |
| force_json_output | Whether to force the LLM to produce a JSON-only response. This can increase the block's reliability, but may also reduce the quality of the response because it prohibits the LLM from reasoning before providing its JSON response. | bool | No |
| sys_prompt | The system prompt to provide additional context to the model. | str | No |
| conversation_history | The conversation history to provide context for the prompt. | List[Dict[str, Any]] | No |
@@ -464,7 +464,7 @@ The block sends the input prompt to a chosen LLM, processes the response, and re
| Input | Description | Type | Required |
|-------|-------------|------|----------|
| prompt | The prompt to send to the language model. You can use any of the {keys} from Prompt Values to fill in the prompt with values from the prompt values dictionary by putting them in curly braces. | str | Yes |
| model | The language model to use for answering the prompt. | "o3-mini" \| "o3-2025-04-16" \| "o1" \| "o1-mini" \| "gpt-5.2-2025-12-11" \| "gpt-5.1-2025-11-13" \| "gpt-5-2025-08-07" \| "gpt-5-mini-2025-08-07" \| "gpt-5-nano-2025-08-07" \| "gpt-5-chat-latest" \| "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "gpt-4o-mini" \| "gpt-4o" \| "gpt-4-turbo" \| "gpt-3.5-turbo" \| "claude-opus-4-1-20250805" \| "claude-opus-4-20250514" \| "claude-sonnet-4-20250514" \| "claude-opus-4-5-20251101" \| "claude-sonnet-4-5-20250929" \| "claude-haiku-4-5-20251001" \| "claude-opus-4-6" \| "claude-3-haiku-20240307" \| "Qwen/Qwen2.5-72B-Instruct-Turbo" \| "nvidia/llama-3.1-nemotron-70b-instruct" \| "meta-llama/Llama-3.3-70B-Instruct-Turbo" \| "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \| "meta-llama/Llama-3.2-3B-Instruct-Turbo" \| "llama-3.3-70b-versatile" \| "llama-3.1-8b-instant" \| "llama3.3" \| "llama3.2" \| "llama3" \| "llama3.1:405b" \| "dolphin-mistral:latest" \| "openai/gpt-oss-120b" \| "openai/gpt-oss-20b" \| "google/gemini-2.5-pro-preview-03-25" \| "google/gemini-3-pro-preview" \| "google/gemini-2.5-flash" \| "google/gemini-2.0-flash-001" \| "google/gemini-2.5-flash-lite-preview-06-17" \| "google/gemini-2.0-flash-lite-001" \| "mistralai/mistral-nemo" \| "cohere/command-r-08-2024" \| "cohere/command-r-plus-08-2024" \| "deepseek/deepseek-chat" \| "deepseek/deepseek-r1-0528" \| "perplexity/sonar" \| "perplexity/sonar-pro" \| "perplexity/sonar-deep-research" \| "nousresearch/hermes-3-llama-3.1-405b" \| "nousresearch/hermes-3-llama-3.1-70b" \| "amazon/nova-lite-v1" \| "amazon/nova-micro-v1" \| "amazon/nova-pro-v1" \| "microsoft/wizardlm-2-8x22b" \| "gryphe/mythomax-l2-13b" \| "meta-llama/llama-4-scout" \| "meta-llama/llama-4-maverick" \| "x-ai/grok-4" \| "x-ai/grok-4-fast" \| "x-ai/grok-4.1-fast" \| "x-ai/grok-code-fast-1" \| "moonshotai/kimi-k2" \| "qwen/qwen3-235b-a22b-thinking-2507" \| "qwen/qwen3-coder" \| "Llama-4-Scout-17B-16E-Instruct-FP8" \| "Llama-4-Maverick-17B-128E-Instruct-FP8" \| "Llama-3.3-8B-Instruct" \| "Llama-3.3-70B-Instruct" \| "v0-1.5-md" \| "v0-1.5-lg" \| "v0-1.0-md" | No |
| model | The language model to use for answering the prompt. Available models are configured by your platform admin via the LLM Registry. Defaults to the admin-configured recommended model. | str | No |
| sys_prompt | The system prompt to provide additional context to the model. | str | No |
| retry | Number of times to retry the LLM call if the response does not match the expected format. | int | No |
| prompt_values | Values used to fill in the prompt. The values can be used in the prompt by putting them in a double curly braces, e.g. {{variable_name}}. | Dict[str, str] | No |
@@ -501,7 +501,7 @@ The block splits the input text into smaller chunks, sends each chunk to an LLM
| Input | Description | Type | Required |
|-------|-------------|------|----------|
| text | The text to summarize. | str | Yes |
| model | The language model to use for summarizing the text. | "o3-mini" \| "o3-2025-04-16" \| "o1" \| "o1-mini" \| "gpt-5.2-2025-12-11" \| "gpt-5.1-2025-11-13" \| "gpt-5-2025-08-07" \| "gpt-5-mini-2025-08-07" \| "gpt-5-nano-2025-08-07" \| "gpt-5-chat-latest" \| "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "gpt-4o-mini" \| "gpt-4o" \| "gpt-4-turbo" \| "gpt-3.5-turbo" \| "claude-opus-4-1-20250805" \| "claude-opus-4-20250514" \| "claude-sonnet-4-20250514" \| "claude-opus-4-5-20251101" \| "claude-sonnet-4-5-20250929" \| "claude-haiku-4-5-20251001" \| "claude-opus-4-6" \| "claude-3-haiku-20240307" \| "Qwen/Qwen2.5-72B-Instruct-Turbo" \| "nvidia/llama-3.1-nemotron-70b-instruct" \| "meta-llama/Llama-3.3-70B-Instruct-Turbo" \| "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \| "meta-llama/Llama-3.2-3B-Instruct-Turbo" \| "llama-3.3-70b-versatile" \| "llama-3.1-8b-instant" \| "llama3.3" \| "llama3.2" \| "llama3" \| "llama3.1:405b" \| "dolphin-mistral:latest" \| "openai/gpt-oss-120b" \| "openai/gpt-oss-20b" \| "google/gemini-2.5-pro-preview-03-25" \| "google/gemini-3-pro-preview" \| "google/gemini-2.5-flash" \| "google/gemini-2.0-flash-001" \| "google/gemini-2.5-flash-lite-preview-06-17" \| "google/gemini-2.0-flash-lite-001" \| "mistralai/mistral-nemo" \| "cohere/command-r-08-2024" \| "cohere/command-r-plus-08-2024" \| "deepseek/deepseek-chat" \| "deepseek/deepseek-r1-0528" \| "perplexity/sonar" \| "perplexity/sonar-pro" \| "perplexity/sonar-deep-research" \| "nousresearch/hermes-3-llama-3.1-405b" \| "nousresearch/hermes-3-llama-3.1-70b" \| "amazon/nova-lite-v1" \| "amazon/nova-micro-v1" \| "amazon/nova-pro-v1" \| "microsoft/wizardlm-2-8x22b" \| "gryphe/mythomax-l2-13b" \| "meta-llama/llama-4-scout" \| "meta-llama/llama-4-maverick" \| "x-ai/grok-4" \| "x-ai/grok-4-fast" \| "x-ai/grok-4.1-fast" \| "x-ai/grok-code-fast-1" \| "moonshotai/kimi-k2" \| "qwen/qwen3-235b-a22b-thinking-2507" \| "qwen/qwen3-coder" \| "Llama-4-Scout-17B-16E-Instruct-FP8" \| "Llama-4-Maverick-17B-128E-Instruct-FP8" \| "Llama-3.3-8B-Instruct" \| "Llama-3.3-70B-Instruct" \| "v0-1.5-md" \| "v0-1.5-lg" \| "v0-1.0-md" | No |
| model | The language model to use for summarizing the text. Available models are configured by your platform admin via the LLM Registry. Defaults to the admin-configured recommended model. | str | No |
| focus | The topic to focus on in the summary | str | No |
| style | The style of the summary to generate. | "concise" \| "detailed" \| "bullet points" \| "numbered list" | No |
| max_tokens | The maximum number of tokens to generate in the chat completion. | int | No |
@@ -763,7 +763,7 @@ Configure agent_mode_max_iterations to control loop behavior: 0 for single decis
| Input | Description | Type | Required |
|-------|-------------|------|----------|
| prompt | The prompt to send to the language model. | str | Yes |
| model | The language model to use for answering the prompt. | "o3-mini" \| "o3-2025-04-16" \| "o1" \| "o1-mini" \| "gpt-5.2-2025-12-11" \| "gpt-5.1-2025-11-13" \| "gpt-5-2025-08-07" \| "gpt-5-mini-2025-08-07" \| "gpt-5-nano-2025-08-07" \| "gpt-5-chat-latest" \| "gpt-4.1-2025-04-14" \| "gpt-4.1-mini-2025-04-14" \| "gpt-4o-mini" \| "gpt-4o" \| "gpt-4-turbo" \| "gpt-3.5-turbo" \| "claude-opus-4-1-20250805" \| "claude-opus-4-20250514" \| "claude-sonnet-4-20250514" \| "claude-opus-4-5-20251101" \| "claude-sonnet-4-5-20250929" \| "claude-haiku-4-5-20251001" \| "claude-opus-4-6" \| "claude-3-haiku-20240307" \| "Qwen/Qwen2.5-72B-Instruct-Turbo" \| "nvidia/llama-3.1-nemotron-70b-instruct" \| "meta-llama/Llama-3.3-70B-Instruct-Turbo" \| "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" \| "meta-llama/Llama-3.2-3B-Instruct-Turbo" \| "llama-3.3-70b-versatile" \| "llama-3.1-8b-instant" \| "llama3.3" \| "llama3.2" \| "llama3" \| "llama3.1:405b" \| "dolphin-mistral:latest" \| "openai/gpt-oss-120b" \| "openai/gpt-oss-20b" \| "google/gemini-2.5-pro-preview-03-25" \| "google/gemini-3-pro-preview" \| "google/gemini-2.5-flash" \| "google/gemini-2.0-flash-001" \| "google/gemini-2.5-flash-lite-preview-06-17" \| "google/gemini-2.0-flash-lite-001" \| "mistralai/mistral-nemo" \| "cohere/command-r-08-2024" \| "cohere/command-r-plus-08-2024" \| "deepseek/deepseek-chat" \| "deepseek/deepseek-r1-0528" \| "perplexity/sonar" \| "perplexity/sonar-pro" \| "perplexity/sonar-deep-research" \| "nousresearch/hermes-3-llama-3.1-405b" \| "nousresearch/hermes-3-llama-3.1-70b" \| "amazon/nova-lite-v1" \| "amazon/nova-micro-v1" \| "amazon/nova-pro-v1" \| "microsoft/wizardlm-2-8x22b" \| "gryphe/mythomax-l2-13b" \| "meta-llama/llama-4-scout" \| "meta-llama/llama-4-maverick" \| "x-ai/grok-4" \| "x-ai/grok-4-fast" \| "x-ai/grok-4.1-fast" \| "x-ai/grok-code-fast-1" \| "moonshotai/kimi-k2" \| "qwen/qwen3-235b-a22b-thinking-2507" \| "qwen/qwen3-coder" \| "Llama-4-Scout-17B-16E-Instruct-FP8" \| "Llama-4-Maverick-17B-128E-Instruct-FP8" \| "Llama-3.3-8B-Instruct" \| "Llama-3.3-70B-Instruct" \| "v0-1.5-md" \| "v0-1.5-lg" \| "v0-1.0-md" | No |
| model | The language model to use for answering the prompt. Available models are configured by your platform admin via the LLM Registry. Defaults to the admin-configured recommended model. | str | No |
| multiple_tool_calls | Whether to allow multiple tool calls in a single response. | bool | No |
| sys_prompt | The system prompt to provide additional context to the model. | str | No |
| conversation_history | The conversation history to provide context for the prompt. | List[Dict[str, Any]] | No |

View File

@@ -0,0 +1,235 @@
# LLM Registry Admin Guide
The LLM Registry is a database-driven system that manages all available LLM models, providers, and creators across your AutoGPT platform. This allows platform administrators to dynamically control which models are available without requiring code deployments.
## Overview
The LLM Registry replaces the previous hardcoded model system with a flexible, admin-controlled registry accessible at `/admin/llms`. This enables you to:
- Add and configure LLM models from any provider
- Set model costs and pricing tiers
- Enable or disable models at runtime
- Migrate workflows to different models when needed
- Configure a platform-wide recommended default model
## Accessing the LLM Registry
Navigate to `/admin/llms` in your AutoGPT platform (admin privileges required). The admin UI has four main sections:
1. **Providers** — LLM providers (OpenAI, Anthropic, etc.)
2. **Models** — Individual LLM models with configurations
3. **Creators** — Organizations that created/trained models
4. **Migrations** — Model migration history and management
## Managing Providers
Providers represent the API endpoints for LLM services (e.g., OpenAI, Anthropic, Groq).
### Adding a Provider
1. Click "Add Provider" in the Providers tab
2. Fill in the provider details:
- **Name**: Internal identifier (e.g., `openai`, `anthropic`)
- **Display Name**: User-facing name
- **Description**: Optional provider description
- **Default Credentials**: Default credential provider name
- **Capabilities**: Check which features the provider supports:
- Tools/function calling
- JSON output mode
- Reasoning tokens
- Parallel tool calls
### Editing a Provider
Click the edit icon next to any provider to update its configuration. Changes take effect immediately.
## Managing Models
Models are the individual LLM variants available for blocks to use.
### Adding a Model
1. Click "Add Model" in the Models tab
2. Configure the model:
- **Slug**: Unique identifier (e.g., `gpt-4o`, `claude-opus-4-6`)
- **Display Name**: User-facing name
- **Provider**: Select from your configured providers
- **Creator**: Select the organization that trained the model
- **Context Window**: Maximum input tokens
- **Max Output Tokens**: Maximum generation length
- **Enabled**: Whether the model is available for use
- **Costs**: Add cost entries for different credential providers and units (per-run or per-token)
### Model Costs
Each model can have multiple cost entries for different credential providers. For example:
- OpenAI API credentials: 5 credits per run
- Custom self-hosted deployment: 2 credits per run
Cost units can be:
- **RUN**: Flat cost per model invocation
- **TOKEN**: Cost scales with token usage
### Disabling a Model
You have two options when disabling a model:
#### Simple Disable
Just uncheck "Enabled" on the model edit form. Workflows using this model will fail until it's re-enabled.
#### Disable with Migration
Use the "Toggle Model" feature to disable a model AND migrate all existing workflows to a replacement:
1. Click "Toggle" on the model
2. Select the replacement model slug
3. Optionally set a custom credit cost for migrated workflows
4. Provide a migration reason (e.g., "Provider outage")
This creates a migration record and updates all `AgentNode` records using the old model.
## Model Migrations
The migration system tracks when workflows are moved from one model to another, allowing you to revert if needed.
### Viewing Migrations
The Migrations tab shows all model migration history with:
- Source and target model
- Number of nodes migrated
- Migration reason
- Timestamp
- Revert status
### Reverting a Migration
If the original model becomes available again:
1. Find the migration in the Migrations tab
2. Click "Revert"
3. Choose whether to re-enable the source model
4. All affected workflows are switched back to the original model
### Custom Pricing for Migrations
When migrating, you can set a custom credit cost that overrides the target model's normal cost. This is useful if you want to:
- Charge the same cost as the original model
- Offer discounted pricing during a migration period
- Apply special pricing for affected users
The custom cost is stored in the migration record and applied to all affected workflows.
## Recommended Model
The platform can have one recommended model that serves as the default for all LLM blocks when no model is explicitly specified.
### Setting the Recommended Model
1. Go to the Models tab
2. Use the "Recommended Model" selector at the top
3. Choose the model you want as the platform default
4. Save
All blocks with `model` inputs using `default_factory=LlmModel.default()` will use this model.
### When to Change It
- After adding a better/cheaper model
- During provider outages or deprecations
- For platform-wide cost optimization
## Model Creators
Creators represent the organizations that developed and trained the models (e.g., OpenAI, Meta, Anthropic).
### Adding a Creator
1. Click "Add Creator"
2. Fill in:
- **Name**: Internal identifier
- **Display Name**: User-facing name
- **Description**: Optional background information
- **Website URL**: Link to creator's site
- **Logo URL**: Optional brand logo
Creators are used for organization and attribution in the UI.
## How Block Defaults Work
When a block has a `model` input:
- **If user selects a model**: That model is used
- **If user leaves it blank**: The platform's recommended model is used
- **If a model is disabled**: Workflows fail (unless migrated)
- **If a fallback exists**: Same-provider fallback may be used automatically
## Common Admin Tasks
### Adding a New Provider and Models
1. Add the provider (e.g., "groq")
2. Add its creator if not already present (e.g., "Meta")
3. Add individual models from that provider
4. Configure costs for each model
5. Enable the models you want available
6. Optionally set one as the recommended default
### Responding to a Provider Outage
1. Go to the affected provider's models
2. For each enabled model, click "Toggle"
3. Select a replacement model from a different provider
4. Set migration reason: "Provider outage"
5. Optionally set custom cost to match original pricing
6. When the provider recovers, revert the migrations
### Deprecating Old Models
1. Disable the deprecated model with migration
2. Choose a newer/better replacement
3. Reason: "Model deprecated by provider"
4. Monitor costs and performance of the replacement
5. After validation period, keep the migration permanent
## Architecture Notes
- Models are stored in the `LlmModel` database table
- Costs are in `LlmModelCost` with foreign key to model
- Migrations are tracked in `LlmModelMigration`
- The registry is cached and refreshed automatically
- Changes take effect immediately without deployments
## Troubleshooting
### "Model not found in registry"
- Check that the model exists in `/admin/llms`
- Verify the slug matches exactly (case-sensitive)
- Refresh the registry if you just added it
### "Model exists but is disabled"
- Enable it in the model edit form, or
- Provide a migration target when disabling
### Workflows still using old model after migration
- Check the migration record was created
- Verify the migration wasn't reverted
- Inspect the `AgentNode` records in the database
## Best Practices
1. **Always provide a migration target when disabling models** — prevents workflow failures
2. **Use descriptive migration reasons** — helps with audit trails
3. **Test new models before setting as recommended** — validate cost/performance
4. **Keep at least one model from each major provider enabled** — redundancy during outages
5. **Review migration history periodically** — clean up old migrations if needed
6. **Document custom costs in migration reasons** — explain pricing changes
## Security Considerations
- Only platform admins should have access to `/admin/llms`
- Model costs directly affect user credits and billing
- Migration records are permanent for audit purposes
- Credential provider names must match exactly with the credentials system
---
For questions or issues with the LLM Registry, consult the [platform documentation](../README.md) or contact your platform administrator.