This PR introduces a significant update to the Toolbox configuration file format, which is one of the primary **breaking changes** required for the implementation of the Advanced Control Plane.

# Summary of Changes

The configuration schema has been updated to enforce resource isolation and facilitate atomic, incremental updates.

* **Resource Isolation:** Resource definitions are now separated into individual blocks, using a distinct structure for each resource type (Source, Tool, Toolset, etc.). This improves readability, management, and auditing of configuration files.
* **Field Name Modification:** Internal field names have been modified to align with declarative methodologies. Specifically, the configuration now separates `kind` (the general resource type, e.g., Source) from `type` (the specific implementation, e.g., Postgres).

# User Impact

Existing `tools.yaml` configuration files are now in an outdated format. Users must eventually update their files to the new YAML format.

# Mitigation & Compatibility

Backward compatibility is maintained during this transition, so no immediate user action is required for existing files.

* **Immediate Backward Compatibility:** The source code includes a pre-processing layer that automatically detects outdated configuration files (v1 format) and converts them to the new v2 format under the hood.
* **[COMING SOON] Migration Support:** The new `toolbox migrate` subcommand will be introduced to allow users to automatically convert their old configuration files to the latest format.

# Example

Example of a v2 config file:

```
kind: sources
name: my-pg-instance
type: cloud-sql-postgres
project: my-project
region: my-region
instance: my-instance
database: my_db
user: my_user
password: my_pass
---
kind: authServices
name: my-google-auth
type: google
clientId: testing-id
---
kind: tools
name: example_tool
type: postgres-sql
source: my-pg-instance
description: some description
statement: SELECT * FROM SQL_STATEMENT;
parameters:
  - name: country
    type: string
    description: some description
---
kind: tools
name: example_tool_2
type: postgres-sql
source: my-pg-instance
description: returning the number one
statement: SELECT 1;
---
kind: toolsets
name: example_toolset
tools:
  - example_tool
```

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Averi Kitsch <akitsch@google.com>
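For comparison, here is a minimal sketch of the v1 format that the pre-processing layer converts from. The nested-map layout and the single `kind` field are assumptions based on the pre-v2 `tools.yaml` convention described above, not an excerpt from this PR:

```
# Hypothetical v1-format equivalent of the first source and tool in the
# v2 example above (layout assumed; resources were keyed by name, and a
# single `kind` field named the specific implementation).
sources:
  my-pg-instance:
    kind: cloud-sql-postgres
    project: my-project
    region: my-region
    instance: my-instance
    database: my_db
    user: my_user
    password: my_pass
tools:
  example_tool:
    kind: postgres-sql
    source: my-pg-instance
    description: some description
    statement: SELECT * FROM SQL_STATEMENT;
```

Under this reading, the migration is mechanical: each map entry becomes its own `---`-separated document, the map key becomes `name`, and the old `kind` splits into `kind` (resource category) plus `type` (implementation), which is what the upcoming `toolbox migrate` subcommand is described as automating.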
| title | type | weight | description | aliases |
|---|---|---|---|---|
| serverless-spark-create-spark-batch | docs | 2 | A "serverless-spark-create-spark-batch" tool submits a Spark batch to run asynchronously. | |
## About

A `serverless-spark-create-spark-batch` tool submits a Java Spark batch to a Google Cloud Serverless for Apache Spark source. The workload executes asynchronously and takes around a minute to begin executing; its status can be polled using the get batch tool.

It's compatible with the following sources:

* serverless-spark
`serverless-spark-create-spark-batch` accepts the following parameters:

* `mainJarFile`: Optional. The `gs://` URI of the jar file that contains the main class. Exactly one of `mainJarFile` or `mainClass` must be specified.
* `mainClass`: Optional. The name of the driver's main class. Exactly one of `mainJarFile` or `mainClass` must be specified.
* `jarFiles`: Optional. A list of `gs://` URIs of jar files to add to the CLASSPATHs of the Spark driver and tasks.
* `args`: Optional. A list of arguments passed to the driver.
* `version`: Optional. The Serverless runtime version to execute with.
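As an illustration, a hypothetical set of parameter values for one invocation; the bucket, jar paths, and runtime version below are placeholders, not values from this page:

```
# Hypothetical parameters for a single serverless-spark-create-spark-batch call.
mainJarFile: gs://my-bucket/jobs/wordcount.jar   # exactly one of mainJarFile / mainClass
jarFiles:
  - gs://my-bucket/libs/helper.jar
args:                                            # forwarded to the driver's main method
  - gs://my-bucket/input/
  - gs://my-bucket/output/
version: "2.2"                                   # assumed runtime version string
```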
## Custom Configuration

This tool supports custom `runtimeConfig` and `environmentConfig` settings, which can be specified in a `tools.yaml` file. These configurations are parsed as YAML and passed to the Dataproc API.

Note: If your project requires custom runtime or environment configuration, you must write a custom `tools.yaml`; you cannot use the serverless-spark prebuilt config.
## Example tools.yaml

```
kind: tools
name: "serverless-spark-create-spark-batch"
type: "serverless-spark-create-spark-batch"
source: "my-serverless-spark-source"
runtimeConfig:
  properties:
    spark.driver.memory: "1024m"
environmentConfig:
  executionConfig:
    networkUri: "my-network"
```
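For context, the `source: "my-serverless-spark-source"` line above refers to a source defined in its own block. A minimal sketch of such a block, assuming the `serverless-spark` source type and that it is configured with a project and location (the field names here are assumptions, not taken from this page):

```
# Hypothetical source block referenced by the tool above.
kind: sources
name: "my-serverless-spark-source"
type: "serverless-spark"   # assumed type name, matching the prebuilt config mentioned above
project: "my-project"      # assumed field: Google Cloud project ID
location: "us-central1"    # assumed field: region where batches run
```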
## Response Format

The response contains the operation metadata JSON object corresponding to batch operation metadata, plus additional fields `consoleUrl` and `logsUrl` where a human can go for more detailed information.
```
{
  "opMetadata": {
    "batch": "projects/myproject/locations/us-central1/batches/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "batchUuid": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "createTime": "2025-11-19T16:36:47.607119Z",
    "description": "Batch",
    "labels": {
      "goog-dataproc-batch-uuid": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "goog-dataproc-location": "us-central1"
    },
    "operationType": "BATCH",
    "warnings": [
      "No runtime version specified. Using the default runtime version."
    ]
  },
  "consoleUrl": "https://console.cloud.google.com/dataproc/batches/...",
  "logsUrl": "https://console.cloud.google.com/logs/viewer?..."
}
```
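Since the batch runs asynchronously, the `opMetadata.batch` resource name in this response is what a follow-up status check would key on. Below is a hedged sketch of a companion tool definition for that check; the `serverless-spark-get-batch` type name is inferred from the naming pattern on this page, not stated here:

```
# Hypothetical companion tool for polling the batch submitted above.
kind: tools
name: "serverless-spark-get-batch"
type: "serverless-spark-get-batch"   # assumed type name for the "get batch" tool
source: "my-serverless-spark-source"
description: "Returns the current status of a Serverless for Apache Spark batch."
```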
## Reference
| field | type | required | description |
|---|---|---|---|
| type | string | true | Must be "serverless-spark-create-spark-batch". |
| source | string | true | Name of the source the tool should use. |
| description | string | false | Description of the tool that is passed to the LLM. |
| runtimeConfig | map | false | Runtime config for all batches created with this tool. |
| environmentConfig | map | false | Environment config for all batches created with this tool. |
| authRequired | string[] | false | List of auth services required to invoke this tool. |
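To illustrate the optional fields in the table, here is a hedged sketch of a tool definition that restricts invocation to authenticated callers; the auth service name `my-google-auth` reuses the example from the PR description above, and the tool name and description text are illustrative:

```
kind: tools
name: "create-spark-batch-secured"
type: "serverless-spark-create-spark-batch"
source: "my-serverless-spark-source"
description: "Submits a Spark batch; callers must be authenticated."
authRequired:
  - my-google-auth   # auth service defined in its own block elsewhere in the config
```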