Files
genai-toolbox/docs/en/resources/embeddingModels/_index.md
Yuan Teoh 293c1d6889 feat!: update configuration file v2 (#2369)
This PR introduces a significant update to the Toolbox configuration
file format, which is one of the primary **breaking changes** required
for the implementation of the Advanced Control Plane.

# Summary of Changes
The configuration schema has been updated to enforce resource isolation
and facilitate atomic, incremental updates.
* Resource Isolation: Resource definitions are now separated into
individual blocks, using a distinct structure for each resource type
(Source, Tool, Toolset, etc.). This improves readability, management,
and auditing of configuration files.
* Field Name Modification: Internal field names have been modified to
align with declarative methodologies. Specifically, the configuration
now separates kind (general resource type, e.g., Source) from type
(specific implementation, e.g., Postgres).

# User Impact
Existing tools.yaml configuration files are now in an outdated format.
Users must eventually update their files to the new YAML format.

# Mitigation & Compatibility
Backward compatibility is maintained during this transition to ensure no
immediate user action is required for existing files.
* Immediate Backward Compatibility: The source code includes a
pre-processing layer that automatically detects outdated configuration
files (v1 format) and converts them to the new v2 format under the hood.
* [COMING SOON] Migration Support: The new toolbox migrate subcommand
will be introduced to allow users to automatically convert their old
configuration files to the latest format.

# Example
Example for config file v2:
```
kind: sources
name: my-pg-instance
type: cloud-sql-postgres
project: my-project
region: my-region
instance: my-instance
database: my_db
user: my_user
password: my_pass
---
kind: authServices
name: my-google-auth
type: google
clientId: testing-id
---
kind: tools
name: example_tool
type: postgres-sql
source: my-pg-instance
description: some description
statement: SELECT * FROM SQL_STATEMENT;
parameters:
- name: country
  type: string
  description: some description
---
kind: tools
name: example_tool_2
type: postgres-sql
source: my-pg-instance
description: returning the number one
statement: SELECT 1;
---
kind: toolsets
name: example_toolset
tools:
- example_tool
```

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Averi Kitsch <akitsch@google.com>
2026-01-27 16:58:43 -08:00

107 lines
3.1 KiB
Markdown

---
title: "EmbeddingModels"
type: docs
weight: 2
description: >
EmbeddingModels represent services that transform text into vector embeddings
for semantic search.
---
EmbeddingModels represent services that generate vector representations of text
data. In the MCP Toolbox, these models enable **Semantic Queries**, allowing
[Tools](../tools/) to automatically convert human-readable text into numerical
vectors before using them in a query.
This is primarily used in two scenarios:
- **Vector Ingestion**: Converting a text parameter into a vector string during
an `INSERT` operation.
- **Semantic Search**: Converting a natural language query into a vector to
perform similarity searches.
## Hidden Parameter Duplication (valueFromParam)
When building tools for vector ingestion, you often need the same input string
twice:
1. To store the original text in a TEXT column.
1. To generate the vector embedding for a VECTOR column.
Requesting an Agent (LLM) to output the exact same string twice is inefficient
and error-prone. The `valueFromParam` field solves this by allowing a parameter
to inherit its value from another parameter in the same tool.
### Key Behaviors
1. Hidden from Manifest: Parameters with valueFromParam set are excluded from
the tool definition sent to the Agent. The Agent does not know this parameter
exists.
1. Auto-Filled: When the tool is executed, the Toolbox automatically copies the
value from the referenced parameter before processing embeddings.
## Example
The following configuration defines an embedding model and applies it to
specific tool parameters.
{{< notice tip >}} Use environment variable replacement with the format
${ENV_NAME} instead of hardcoding your API keys into the configuration file.
{{< /notice >}}
### Step 1 - Define an Embedding Model
Define an embedding model in the `embeddingModels` section:
```yaml
kind: embeddingModels
name: gemini-model # Name of the embedding model
type: gemini
model: gemini-embedding-001
apiKey: ${GOOGLE_API_KEY}
dimension: 768
```
### Step 2 - Embed Tool Parameters
Use the defined embedding model, embed your query parameters using the
`embeddedBy` field. Only string-typed parameters can be embedded:
```yaml
# Vector ingestion tool
kind: tools
name: insert_embedding
type: postgres-sql
source: my-pg-instance
statement: |
INSERT INTO documents (content, embedding)
VALUES ($1, $2);
parameters:
- name: content
type: string
description: The raw text content to be stored in the database.
- name: vector_string
type: string
# This parameter is hidden from the LLM.
# It automatically copies the value from 'content' and embeds it.
valueFromParam: content
embeddedBy: gemini-model
---
# Semantic search tool
kind: tools
name: search_embedding
type: postgres-sql
source: my-pg-instance
statement: |
SELECT id, content, embedding <-> $1 AS distance
FROM documents
ORDER BY distance LIMIT 1
parameters:
- name: semantic_search_string
type: string
description: The search query that will be converted to a vector.
embeddedBy: gemini-model # refers to the name of a defined embedding model
```
## Kinds of Embedding Models