
---
title: "serverless-spark-create-spark-batch"
type: docs
weight: 2
description: >
  A "serverless-spark-create-spark-batch" tool submits a Spark batch to run asynchronously.
aliases:
- /resources/tools/serverless-spark-create-spark-batch
---

## About

A `serverless-spark-create-spark-batch` tool submits a Java Spark batch to a Google Cloud Serverless for Apache Spark source. The workload runs asynchronously and typically takes around a minute to start executing; its status can be polled using the get batch tool.

It's compatible with the following sources:

* serverless-spark

`serverless-spark-create-spark-batch` accepts the following parameters:

* `mainJarFile`: Optional. The `gs://` URI of the jar file that contains the main class. Exactly one of `mainJarFile` or `mainClass` must be specified.
* `mainClass`: Optional. The name of the driver's main class. Exactly one of `mainJarFile` or `mainClass` must be specified.
* `jarFiles`: Optional. A list of `gs://` URIs of jar files to add to the CLASSPATHs of the Spark driver and tasks.
* `args`: Optional. A list of arguments passed to the driver.
* `version`: Optional. The Serverless runtime version to execute with.
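For example, an invocation of this tool might pass arguments shaped like the following. All values here are illustrative placeholders, not defaults:

```json
{
  "mainJarFile": "gs://my-bucket/my-spark-app.jar",
  "jarFiles": ["gs://my-bucket/deps/helper.jar"],
  "args": ["--input", "gs://my-bucket/data/"],
  "version": "2.2"
}
```

Note that `mainClass` is omitted here, since exactly one of `mainJarFile` or `mainClass` may be set.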

## Custom Configuration

This tool supports custom `runtimeConfig` and `environmentConfig` settings, which can be specified in a `tools.yaml` file. These configurations are parsed as YAML and passed through to the Dataproc API.

Note: If your project requires custom runtime or environment configuration, you must write a custom `tools.yaml`; you cannot use the `serverless-spark` prebuilt config.

### Example tools.yaml

```yaml
kind: tools
name: "serverless-spark-create-spark-batch"
type: "serverless-spark-create-spark-batch"
source: "my-serverless-spark-source"
runtimeConfig:
  properties:
    spark.driver.memory: "1024m"
environmentConfig:
  executionConfig:
    networkUri: "my-network"
```

## Response Format

The response contains the operation metadata JSON object corresponding to the batch operation metadata, plus two additional fields, `consoleUrl` and `logsUrl`, where a human can go for more detailed information.

```json
{
  "opMetadata": {
    "batch": "projects/myproject/locations/us-central1/batches/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "batchUuid": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "createTime": "2025-11-19T16:36:47.607119Z",
    "description": "Batch",
    "labels": {
      "goog-dataproc-batch-uuid": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
      "goog-dataproc-location": "us-central1"
    },
    "operationType": "BATCH",
    "warnings": [
      "No runtime version specified. Using the default runtime version."
    ]
  },
  "consoleUrl": "https://console.cloud.google.com/dataproc/batches/...",
  "logsUrl": "https://console.cloud.google.com/logs/viewer?..."
}
```
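A client that wants to poll for status needs the batch ID, which is the final path segment of the `opMetadata.batch` resource name. A minimal sketch in Python (the field shapes follow the response example above; this helper is illustrative and not part of the Toolbox API):

```python
import json

# Sample response in the documented shape (values are illustrative).
response_json = """
{
  "opMetadata": {
    "batch": "projects/myproject/locations/us-central1/batches/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "batchUuid": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
  },
  "consoleUrl": "https://console.cloud.google.com/dataproc/batches/..."
}
"""

response = json.loads(response_json)

# "batch" is a full resource name; the last path segment is the batch ID
# you would pass to the get batch tool when polling for completion.
batch_resource = response["opMetadata"]["batch"]
batch_id = batch_resource.rsplit("/", 1)[-1]
print(batch_id)
```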

## Reference

| **field**         | **type** | **required** | **description**                                            |
|-------------------|:--------:|:------------:|------------------------------------------------------------|
| type              | string   |     true     | Must be "serverless-spark-create-spark-batch".             |
| source            | string   |     true     | Name of the source the tool should use.                    |
| description       | string   |    false     | Description of the tool that is passed to the LLM.         |
| runtimeConfig     | map      |    false     | Runtime config for all batches created with this tool.     |
| environmentConfig | map      |    false     | Environment config for all batches created with this tool. |
| authRequired      | string[] |    false     | List of auth services required to invoke this tool.        |