feat: Add Bigtable source and tool (#418)

# Add Bigtable support

A `bigtable` source can be added as the following example

```
sources:
  test-bigtable-source:
    kind: "bigtable"
    project: "sample-project"
    instance: "sample-instance"
```

A `bigtable` tool can be added as below

```
tools:
  get-test-tool-data:
    kind: bigtable-sql
    source: test-bigtable-source
    description: Some description
    statement: SELECT * FROM `test-table` WHERE address['state'] = @state;
    parameters:
      - name: state
        type: string
        description: Filter by state
```

---------

Co-authored-by: Yuan <45984206+Yuan325@users.noreply.github.com>
Co-authored-by: Kurtis Van Gent <31518063+kurtisvg@users.noreply.github.com>
Co-authored-by: Wenxin Du <117315983+duwenxin99@users.noreply.github.com>
This commit is contained in:
An Nguyen
2025-04-18 09:48:28 -07:00
committed by GitHub
parent 91f4402a71
commit ae53b8eeff
12 changed files with 876 additions and 0 deletions

View File

@@ -0,0 +1,70 @@
---
title: "Bigtable"
type: docs
weight: 1
description: >
Bigtable is a low-latency NoSQL database service for machine learning, operational analytics, and user-facing operations. It's a wide-column, key-value store that can scale to billions of rows and thousands of columns. With Bigtable, you can replicate your data to regions across the world for high availability and data resiliency.
---
# Bigtable Source
[Bigtable][bigtable-docs] is a low-latency NoSQL database service for machine
learning, operational analytics, and user-facing operations. It's a wide-column,
key-value store that can scale to billions of rows and thousands of columns.
With Bigtable, you can replicate your data to regions across the world for high
availability and data resiliency.
If you are new to Bigtable, you can try to [create an instance and write data
with the cbt CLI][bigtable-quickstart-with-cli].
You can use [GoogleSQL statements][bigtable-googlesql] to query your Bigtable
data. GoogleSQL is an ANSI-compliant structured query language (SQL) that is
also implemented for other Google Cloud services. SQL queries are handled by
cluster nodes in the same way as NoSQL data requests. Therefore, the same best
practices apply when creating SQL queries to run against your Bigtable data,
such as avoiding full table scans or complex filters.
[bigtable-docs]: https://cloud.google.com/bigtable/docs
[bigtable-quickstart-with-cli]:
https://cloud.google.com/bigtable/docs/create-instance-write-data-cbt-cli
[bigtable-googlesql]:
https://cloud.google.com/bigtable/docs/googlesql-overview
## Requirements
### IAM Permissions
Bigtable uses [Identity and Access Management (IAM)][iam-overview] to control
user and group access to Bigtable resources at the project, instance, table, and
backup level. Toolbox will use your [Application Default Credentials (ADC)][adc]
to authorize and authenticate when interacting with [Bigtable][bigtable-docs].
In addition to [setting the ADC for your server][set-adc], you need to ensure
the IAM identity has been given the correct IAM permissions for the query
provided. See [Apply IAM roles][grant-permissions] for more information on
applying IAM permissions and roles to an identity.
[iam-overview]: https://cloud.google.com/bigtable/docs/access-control
[adc]: https://cloud.google.com/docs/authentication#adc
[set-adc]: https://cloud.google.com/docs/authentication/provide-credentials-adc
[grant-permissions]: https://cloud.google.com/bigtable/docs/access-control#iam-management-instance
## Example
```yaml
sources:
my-bigtable-source:
kind: "bigtable"
project: "my-project-id"
instance: "test-instance"
```
## Reference
| **field** | **type** | **required** | **description** |
|-----------|:--------:|:------------:|-------------------------------------------------------------------------------|
| kind | string | true | Must be "bigtable". |
| project | string | true | Id of the GCP project that the cluster was created in (e.g. "my-project-id"). |
| instance | string | true | Name of the Bigtable instance. |

View File

@@ -0,0 +1,82 @@
---
title: "bigtable-sql"
type: docs
weight: 1
description: >
A "bigtable-sql" tool executes a pre-defined SQL statement against a Google
Cloud Bigtable instance.
---
## About
A `bigtable-sql` tool executes a pre-defined SQL statement against a Bigtable
instance. It's compatible with any of the following sources:
- [bigtable](../sources/bigtable.md)
### GoogleSQL
Bigtable supports SQL queries. The integration with Toolbox supports `googlesql`
dialect, the specified SQL statement is executed as a [data manipulation
language (DML)][bigtable-googlesql] statements, and specified parameters will
inserted according to their name: e.g. `@name`.
[bigtable-googlesql]: https://cloud.google.com/bigtable/docs/googlesql-overview
## Example
```yaml
tools:
search_user_by_id_or_name:
kind: bigtable-sql
source: my-bigtable-instance
statement: |
SELECT
TO_INT64(cf[ 'id' ]) as id,
CAST(cf[ 'name' ] AS string) as name,
FROM
% s
WHERE
TO_INT64(cf[ 'id' ]) = @id
OR CAST(cf[ 'name' ] AS string) = @name;
description: |
Use this tool to get information for a specific user.
Takes an id number or a name and returns info on the user.
Example:
{{
"id": 123,
"name": "Alice",
}}
parameters:
- name: id
type: integer
description: User ID
- name: name
type: string
description: Name of the user
```
## Reference
| **field** | **type** | **required** | **description** |
|-------------|:------------------------------------------:|:------------:|--------------------------------------------------------------------------------------------------|
| kind | string | true | Must be "bigtable-sql". |
| source | string | true | Name of the source the SQL should execute on. |
| description | string | true | Description of the tool that is passed to the LLM. |
| statement | string | true | SQL statement to execute on. |
| parameters | [parameters](_index#specifying-parameters) | false | List of [parameters](_index#specifying-parameters) that will be inserted into the SQL statement. |
## Tips
- [Bigtable Studio][bigtable-studio] is a useful to explore and manage your
Bigtable data. If you're unfamiliar with the query syntax, [Query
Builder][bigtable-querybuilder] lets you build a query, run it against a
table, and then view the results in the console.
- Some Python libraries limit the use of underscore columns such as `_key`. A
workaround would be to leverage Bigtable [Logical
Views][bigtable-logical-view] to rename the columns.
[bigtable-studio]: https://cloud.google.com/bigtable/docs/manage-data-using-console
[bigtable-logical-view]: https://cloud.google.com/bigtable/docs/create-manage-logical-views
[bigtable-querybuilder]: https://cloud.google.com/bigtable/docs/query-builder