genai-toolbox/docs/en/resources/sources/bigquery.md
Yuan Teoh 293c1d6889 feat!: update configuration file v2 (#2369)
This PR introduces a significant update to the Toolbox configuration
file format, which is one of the primary **breaking changes** required
for the implementation of the Advanced Control Plane.

# Summary of Changes
The configuration schema has been updated to enforce resource isolation
and facilitate atomic, incremental updates.
* Resource Isolation: Resource definitions are now separated into
individual blocks, using a distinct structure for each resource type
(Source, Tool, Toolset, etc.). This improves readability, management,
and auditing of configuration files.
* Field Name Modification: Internal field names have been modified to
align with declarative methodologies. Specifically, the configuration
now separates kind (general resource type, e.g., Source) from type
(specific implementation, e.g., Postgres).

# User Impact
Existing tools.yaml configuration files are now in an outdated format.
Users must eventually update their files to the new YAML format.

# Mitigation & Compatibility
Backward compatibility is maintained during this transition to ensure no
immediate user action is required for existing files.
* Immediate Backward Compatibility: The source code includes a
pre-processing layer that automatically detects outdated configuration
files (v1 format) and converts them to the new v2 format under the hood.
* [COMING SOON] Migration Support: The new toolbox migrate subcommand
will be introduced to allow users to automatically convert their old
configuration files to the latest format.
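For comparison, here is a sketch of the same resources in the outdated v1 layout that the pre-processing layer converts under the hood. It is reconstructed from the field-name changes described above (v1 nested resources under top-level maps keyed by name, with `kind` holding the specific implementation that v2 calls `type`), so details may vary:
```yaml
# v1 layout: resources nested under top-level maps, keyed by name.
sources:
  my-pg-instance:
    kind: cloud-sql-postgres   # v1 "kind" becomes v2 "type"
    project: my-project
    region: my-region
    instance: my-instance
    database: my_db
    user: my_user
    password: my_pass
tools:
  example_tool:
    kind: postgres-sql
    source: my-pg-instance
    description: some description
    statement: SELECT * FROM SQL_STATEMENT;
toolsets:
  example_toolset:
    - example_tool
```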

# Example
Example for config file v2:
```yaml
kind: sources
name: my-pg-instance
type: cloud-sql-postgres
project: my-project
region: my-region
instance: my-instance
database: my_db
user: my_user
password: my_pass
---
kind: authServices
name: my-google-auth
type: google
clientId: testing-id
---
kind: tools
name: example_tool
type: postgres-sql
source: my-pg-instance
description: some description
statement: SELECT * FROM SQL_STATEMENT;
parameters:
- name: country
  type: string
  description: some description
---
kind: tools
name: example_tool_2
type: postgres-sql
source: my-pg-instance
description: returning the number one
statement: SELECT 1;
---
kind: toolsets
name: example_toolset
tools:
- example_tool
```

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Averi Kitsch <akitsch@google.com>
2026-01-27 16:58:43 -08:00


---
title: "BigQuery"
type: docs
weight: 1
description: >
  BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective
  analytics data warehouse that lets you run analytics over vast amounts of
  data in near real time. With BigQuery, there's no infrastructure to set up
  or manage, letting you focus on finding meaningful insights using GoogleSQL
  and taking advantage of flexible pricing models across on-demand and
  flat-rate options.
---

# BigQuery Source

BigQuery is Google Cloud's fully managed, petabyte-scale, and cost-effective analytics data warehouse that lets you run analytics over vast amounts of data in near real time. With BigQuery, there's no infrastructure to set up or manage, letting you focus on finding meaningful insights using GoogleSQL and taking advantage of flexible pricing models across on-demand and flat-rate options.

If you are new to BigQuery, you can start by loading and querying data with the bq command-line tool.

BigQuery uses GoogleSQL for querying data. GoogleSQL is an ANSI-compliant structured query language (SQL) that is also implemented for other Google Cloud services. The usual SQL best practices apply when writing queries against your BigQuery data, such as avoiding unnecessary full table scans and overly complex filters.
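For instance, filtering on a partitioned table's partition column lets BigQuery prune partitions instead of scanning the whole table (the `my_dataset.events` table and its `event_date` partition column below are hypothetical):
```sql
-- Prunes to a single day's partition instead of scanning the full table.
SELECT user_id, COUNT(*) AS n_events
FROM `my_dataset.events`
WHERE event_date = DATE '2024-01-01'
GROUP BY user_id;
```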

## Available Tools

## Pre-built Configurations

## Requirements

### IAM Permissions

BigQuery uses Identity and Access Management (IAM) to control user and group access to BigQuery resources like projects, datasets, and tables.

### Authentication via Application Default Credentials (ADC)

By default, Toolbox will use your Application Default Credentials (ADC) to authorize and authenticate when interacting with BigQuery.

When using this method, you need to ensure the IAM identity associated with your ADC (such as a service account) has the correct permissions for the queries you intend to run. Common roles include roles/bigquery.user (which includes permissions to run jobs and read data) or roles/bigquery.dataViewer. Follow this guide to set up your ADC.
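For local development, the ADC setup typically amounts to the following two steps (the project ID and member shown are placeholders to replace with your own):
```shell
# Create local Application Default Credentials for your user account.
gcloud auth application-default login

# Grant the identity the ability to run jobs and read data.
gcloud projects add-iam-policy-binding my-project-id \
  --member="user:you@example.com" \
  --role="roles/bigquery.user"
```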

If you are running on Google Compute Engine (GCE) or Google Kubernetes Engine (GKE), you might need to explicitly set the access scopes for the service account. While you can configure scopes when creating the VM or node pool, you can also specify them in the source configuration using the scopes field. Common scopes include https://www.googleapis.com/auth/bigquery or https://www.googleapis.com/auth/cloud-platform.

### Authentication via User's OAuth Access Token

If the useClientOAuth parameter is set to true, Toolbox will instead use the OAuth access token for authentication. This token is parsed from the Authorization header passed in with the tool invocation request. This method allows Toolbox to make queries to BigQuery on behalf of the client or the end-user.

When using this on-behalf-of authentication, you must ensure that the identity used has been granted the correct IAM permissions.
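With this on-behalf-of flow, the caller supplies the token alongside the invocation request. A minimal Python sketch of building such a request follows; the `/api/tool/<name>/invoke` path, the local server address, and the `example_tool` name are assumptions to verify against your deployment:

```python
import json
import urllib.request

TOOLBOX_URL = "http://127.0.0.1:5000"  # assumed local Toolbox address


def build_invoke_request(tool_name, params, access_token):
    """Build a tool-invocation request that forwards the end-user's
    OAuth access token in the Authorization header.

    The /api/tool/<name>/invoke path is an assumption about the
    Toolbox HTTP API; verify it against the version you are running.
    """
    return urllib.request.Request(
        f"{TOOLBOX_URL}/api/tool/{tool_name}/invoke",
        data=json.dumps(params).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # With useClientOAuth: true, Toolbox reads this header and
            # runs the BigQuery job on behalf of the token's owner.
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


req = build_invoke_request("example_tool", {"country": "US"}, "ya29.EXAMPLE-TOKEN")
print(req.get_header("Authorization"))  # prints "Bearer ya29.EXAMPLE-TOKEN"
```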

## Example

Initialize a BigQuery source that uses ADC:

```yaml
kind: sources
name: my-bigquery-source
type: "bigquery"
project: "my-project-id"
# location: "US" # Optional: Specifies the location for query jobs.
# writeMode: "allowed" # One of: allowed, blocked, protected. Defaults to "allowed".
# allowedDatasets: # Optional: Restricts tool access to a specific list of datasets.
#   - "my_dataset_1"
#   - "other_project.my_dataset_2"
# impersonateServiceAccount: "service-account@project-id.iam.gserviceaccount.com" # Optional: Service account to impersonate
# scopes: # Optional: List of OAuth scopes to request.
#   - "https://www.googleapis.com/auth/bigquery"
#   - "https://www.googleapis.com/auth/drive.readonly"
# maxQueryResultRows: 50 # Optional: Limits the number of rows returned by queries. Defaults to 50.
```

Initialize a BigQuery source that uses the client's access token:

```yaml
kind: sources
name: my-bigquery-client-auth-source
type: "bigquery"
project: "my-project-id"
useClientOAuth: true
# location: "US" # Optional: Specifies the location for query jobs.
# writeMode: "allowed" # One of: allowed, blocked, protected. Defaults to "allowed".
# allowedDatasets: # Optional: Restricts tool access to a specific list of datasets.
#   - "my_dataset_1"
#   - "other_project.my_dataset_2"
# impersonateServiceAccount: "service-account@project-id.iam.gserviceaccount.com" # Optional: Service account to impersonate
# scopes: # Optional: List of OAuth scopes to request.
#   - "https://www.googleapis.com/auth/bigquery"
#   - "https://www.googleapis.com/auth/drive.readonly"
# maxQueryResultRows: 50 # Optional: Limits the number of rows returned by queries. Defaults to 50.
```

## Reference

| field | type | required | description |
|-------|------|----------|-------------|
| type | string | true | Must be "bigquery". |
| project | string | true | ID of the Google Cloud project to use for billing and as the default project for BigQuery resources. |
| location | string | false | Specifies the location (e.g., "us", "asia-northeast1") in which to run the query job. This location must match the location of any tables referenced in the query. Defaults to the table's location, or "US" if the location cannot be determined. Learn More |
| writeMode | string | false | Controls the write behavior for tools. `allowed` (default): All queries are permitted. `blocked`: Only SELECT statements are allowed for the bigquery-execute-sql tool. `protected`: Enables session-based execution where all tools associated with this source instance share the same BigQuery session. This allows for stateful operations using temporary tables (e.g., `CREATE TEMP TABLE`). For bigquery-execute-sql, SELECT statements can be used on all tables, but write operations are restricted to the session's temporary dataset. For tools like bigquery-sql, bigquery-forecast, and bigquery-analyze-contribution, the writeMode restrictions do not apply, but they will operate within the shared session. Note: The `protected` mode cannot be used with `useClientOAuth: true`. It is also not recommended for multi-user server environments, as all users would share the same session. A session is terminated automatically after 24 hours of inactivity or after 7 days, whichever comes first. A new session is created on the next request, and any temporary data from the previous session will be lost. |
| allowedDatasets | []string | false | An optional list of dataset IDs that tools using this source are allowed to access. If provided, any tool operation attempting to access a dataset not in this list will be rejected. To enforce this, two types of operations are also disallowed: 1) dataset-level operations (e.g., `CREATE SCHEMA`), and 2) operations where table access cannot be statically analyzed (e.g., `EXECUTE IMMEDIATE`, `CREATE PROCEDURE`). If a single dataset is provided, it will be treated as the default for prebuilt tools. |
| useClientOAuth | bool | false | If true, forwards the client's OAuth access token from the "Authorization" header to downstream queries. Note: This cannot be used with `writeMode: protected`. |
| scopes | []string | false | A list of OAuth 2.0 scopes to use for the credentials. If not provided, default scopes are used. |
| impersonateServiceAccount | string | false | Service account email to impersonate when making BigQuery and Dataplex API calls. The authenticated principal must have the roles/iam.serviceAccountTokenCreator role on the target service account. Learn More |
| maxQueryResultRows | int | false | The maximum number of rows to return from a query. Defaults to 50. |
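To illustrate the protected write mode described above, a minimal session-based setup might look like the following sketch (the `bigquery-execute-sql` tool type is named in the writeMode description above; verify the exact tool configuration against your Toolbox version):
```yaml
kind: sources
name: my-bq-session-source
type: bigquery
project: my-project-id
writeMode: protected   # all tools on this source share one BigQuery session
---
kind: tools
name: run_bq_sql
type: bigquery-execute-sql
source: my-bq-session-source
description: Executes SQL; writes are confined to the session's temp dataset.
```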