genai-toolbox/docs/en/resources/sources/serverless-spark.md at 7e2db3e133ffdfdbb8ce427eb4a6704edcaa0341

mirror of https://github.com/googleapis/genai-toolbox.git synced 2026-05-02 03:00:36 -04:00

Yuan Teoh 293c1d6889 feat!: update configuration file v2 (#2369)

This PR introduces a significant update to the Toolbox configuration
file format, which is one of the primary **breaking changes** required
for the implementation of the Advanced Control Plane.

# Summary of Changes
The configuration schema has been updated to enforce resource isolation
and facilitate atomic, incremental updates.
* Resource Isolation: Resource definitions are now separated into
individual blocks, using a distinct structure for each resource type
(Source, Tool, Toolset, etc.). This improves readability, management,
and auditing of configuration files.
* Field Name Modification: Internal field names have been modified to
align with declarative methodologies. Specifically, the configuration
now separates kind (general resource type, e.g., Source) from type
(specific implementation, e.g., Postgres).

# User Impact
Existing tools.yaml configuration files are now in an outdated format.
Users must eventually update their files to the new YAML format.

# Mitigation & Compatibility
Backward compatibility is maintained during this transition to ensure no
immediate user action is required for existing files.
* Immediate Backward Compatibility: The source code includes a
pre-processing layer that automatically detects outdated configuration
files (v1 format) and converts them to the new v2 format under the hood.
* [COMING SOON] Migration Support: The new toolbox migrate subcommand
will be introduced to allow users to automatically convert their old
configuration files to the latest format.

# Example
Example for config file v2:
```
kind: sources
name: my-pg-instance
type: cloud-sql-postgres
project: my-project
region: my-region
instance: my-instance
database: my_db
user: my_user
password: my_pass
---
kind: authServices
name: my-google-auth
type: google
clientId: testing-id
---
kind: tools
name: example_tool
type: postgres-sql
source: my-pg-instance
description: some description
statement: SELECT * FROM SQL_STATEMENT;
parameters:
- name: country
  type: string
  description: some description
---
kind: tools
name: example_tool_2
type: postgres-sql
source: my-pg-instance
description: returning the number one
statement: SELECT 1;
---
kind: toolsets
name: example_toolset
tools:
- example_tool
```
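
For contrast, the same Postgres source, first tool, and toolset in the outdated v1 layout might look roughly like the sketch below. It assumes that v1 keyed each resource by name under top-level `sources`, `tools`, and `toolsets` maps and used `kind` for the specific implementation; that layout is not shown in this commit message, so check it against an existing tools.yaml before relying on it.

```yaml
# Assumed v1 layout (not taken from this commit message):
# resources are keyed by name under one map per resource type,
# and `kind` carries what v2 calls `type`.
sources:
  my-pg-instance:
    kind: cloud-sql-postgres
    project: my-project
    region: my-region
    instance: my-instance
    database: my_db
    user: my_user
    password: my_pass
tools:
  example_tool:
    kind: postgres-sql
    source: my-pg-instance
    description: some description
    statement: SELECT * FROM SQL_STATEMENT;
    parameters:
      - name: country
        type: string
        description: some description
toolsets:
  example_toolset:
    - example_tool
```

Read side by side with the v2 example, each named entry becomes its own YAML document: the map key moves into `name`, the enclosing section becomes `kind`, and the old `kind` value becomes `type`.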

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Averi Kitsch <akitsch@google.com>

2026-01-27 16:58:43 -08:00

2.7 KiB

| title | type | weight | description |
|-------|------|--------|-------------|
| Serverless for Apache Spark | docs | 1 | Google Cloud Serverless for Apache Spark lets you run Spark workloads without requiring you to provision and manage your own Spark cluster. |

About

The Serverless for Apache Spark source allows Toolbox to interact with Spark batches hosted on Google Cloud Serverless for Apache Spark.

Available Tools

* serverless-spark-list-batches: List and filter Serverless Spark batches.
* serverless-spark-get-batch: Get a Serverless Spark batch.
* serverless-spark-cancel-batch: Cancel a running Serverless Spark batch operation.
* serverless-spark-create-pyspark-batch: Create a Serverless Spark PySpark batch operation.
* serverless-spark-create-spark-batch: Create a Serverless Spark Java batch operation.

Requirements

IAM Permissions

Serverless for Apache Spark uses Identity and Access Management (IAM) to control user and group access to serverless Spark resources like batches and sessions.

Toolbox uses your Application Default Credentials (ADC) to authenticate and authorize requests to Google Cloud Serverless for Apache Spark. Make sure the IAM identity associated with your ADC has the permissions required for the actions you intend to perform. Common roles include roles/dataproc.serverlessEditor (which includes permissions to run batches) and roles/dataproc.serverlessViewer. Follow this guide to set up your ADC.

Example

kind: sources
name: my-serverless-spark-source
type: serverless-spark
project: my-project-id
location: us-central1
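
Once the source is defined, tools can refer to it by name. The following is a minimal sketch rather than a complete reference: it assumes that Serverless Spark tools follow the same v2 shape as other tools (`kind`, `name`, `type`, `source`, `description`). The names `list_spark_batches`, `get_spark_batch`, and `spark_toolset` are hypothetical, and any tool-specific options are omitted.

```yaml
# Hypothetical tool names; `type` values come from the Available Tools list.
kind: tools
name: list_spark_batches
type: serverless-spark-list-batches
source: my-serverless-spark-source   # must match the source name above
description: List and filter Serverless Spark batches.
---
kind: tools
name: get_spark_batch
type: serverless-spark-get-batch
source: my-serverless-spark-source
description: Get a Serverless Spark batch.
---
# Group the tools so they can be loaded together.
kind: toolsets
name: spark_toolset
tools:
- list_spark_batches
- get_spark_batch
```

Because both tools only read batch state, an identity with a viewer-level role would likely be enough for this toolset; tools that create or cancel batches need a role that permits running batches.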

Reference

| field | type | required | description |
|-------|------|----------|-------------|
| type | string | true | Must be "serverless-spark". |
| project | string | true | ID of the GCP project with Serverless for Apache Spark resources. |
| location | string | true | Location containing Serverless for Apache Spark resources. |
