feat(tools/dataform): add dataform compile tool (#1470)

## Description

This change introduces a new tool for compiling local Dataform projects.

The new tool, `dataform-compile`, allows users to programmatically run
the `dataform compile` command against a project on the local
filesystem. This tool does not require a `source` and instead relies on
the `dataform` CLI being available in the server's `PATH`.

### Changes:
* Added the new tool definition in
`internal/tools/dataformcompile/dataformcompile.go`.
* The tool requires the following parameter:
    * `project_dir`: The local Dataform project directory to compile.
* The tool uses `os/exec` to run the `dataform compile --json` command
and parses the resulting JSON output.
* Added a new integration test in
`internal/tools/dataformcompile/dataformcompile_test.go` which:
    * Skips the test if the `dataform` CLI is not found in the `PATH`.
* Uses `dataform init` to create a temporary, minimal project for
testing.
* Verifies success, missing parameter errors, and errors from a
non-existent directory.
---
> Should include a concise description of the changes (bug or feature),
it's
> impact, along with a summary of the solution

## PR Checklist

---
> Thank you for opening a Pull Request! Before submitting your PR, there
are a
> few things you can do to make sure it goes smoothly:

- [x] Make sure you reviewed

[CONTRIBUTING.md](https://github.com/googleapis/genai-toolbox/blob/main/CONTRIBUTING.md)
- [x] Make sure to open an issue as a

[bug/issue](https://github.com/googleapis/genai-toolbox/issues/new/choose)
before writing your code! That way we can discuss the change, evaluate
  designs, and agree on the general idea
- [x] Ensure the tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
- [x] Appropriate docs were updated (if necessary)
- [x] Make sure to add `!` if this involve a breaking change

🛠️ Fixes #1469

---------

Co-authored-by: Yuan Teoh <45984206+Yuan325@users.noreply.github.com>
This commit is contained in:
Saurabh Maurya
2025-10-01 14:15:28 -07:00
committed by GitHub
parent 4dff01f98a
commit 3be9b7b3bd
7 changed files with 407 additions and 0 deletions

View File

@@ -194,6 +194,26 @@ steps:
dataplex \
dataplex
- id: "dataform"
name: golang:1
waitFor: ["compile-test-binary"]
entrypoint: /bin/bash
env:
- "GOPATH=/gopath"
secretEnv: ["CLIENT_ID"]
volumes:
- name: "go"
path: "/gopath"
args:
- -c
- |
apt-get update && apt-get install -y npm && \
npm install -g @dataform/cli && \
.ci/test_with_coverage.sh \
"Dataform" \
dataform \
dataform
- id: "postgres"
name: golang:1
waitFor: ["compile-test-binary"]

View File

@@ -80,6 +80,7 @@ import (
_ "github.com/googleapis/genai-toolbox/internal/tools/cloudsqlmysql/cloudsqlmysqlcreateinstance"
_ "github.com/googleapis/genai-toolbox/internal/tools/cloudsqlpg/cloudsqlpgcreateinstances"
_ "github.com/googleapis/genai-toolbox/internal/tools/couchbase"
_ "github.com/googleapis/genai-toolbox/internal/tools/dataform/dataformcompilelocal"
_ "github.com/googleapis/genai-toolbox/internal/tools/dataplex/dataplexlookupentry"
_ "github.com/googleapis/genai-toolbox/internal/tools/dataplex/dataplexsearchaspecttypes"
_ "github.com/googleapis/genai-toolbox/internal/tools/dataplex/dataplexsearchentries"

View File

@@ -0,0 +1,7 @@
---
title: "Dataform"
type: docs
weight: 1
description: >
Tools that work with Dataform.
---

View File

@@ -0,0 +1,48 @@
---
title: "dataform-compile-local"
type: docs
weight: 1
description: >
A "dataform-compile-local" tool runs the `dataform compile` CLI command on a local project directory.
aliases:
- /resources/tools/dataform-compile-local
---
## About
A `dataform-compile-local` tool runs the `dataform compile` command on a local Dataform project.
It is a standalone tool and **is not** compatible with any sources.
At invocation time, the tool executes `dataform compile --json` in the specified project directory and returns the resulting JSON object from the CLI.
`dataform-compile-local` takes the following parameter:
- `project_dir` (string): The absolute or relative path to the local Dataform project directory. The server process must have read access to this path.
## Requirements
### Dataform CLI
This tool executes the `dataform` command-line interface (CLI) via a system call. You must have the **`dataform` CLI** installed and available in the server's system `PATH`.
You can typically install the CLI via `npm`:
```bash
npm install -g @dataform/cli
```
See the [official Dataform documentation](https://www.google.com/search?q=https://cloud.google.com/dataform/docs/install-dataform-cli) for more details.
## Example
```yaml
tools:
my_dataform_compiler:
kind: dataform-compile-local
description: Use this tool to compile a local Dataform project.
```
## Reference
| **field** | **type** | **required** | **description** |
| :---- | :---- | :---- | :---- |
| kind | string | true | Must be "dataform-compile-local". |
| description | string | true | Description of the tool that is passed to the LLM. |

View File

@@ -0,0 +1,122 @@
// Copyright 2025 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package dataformcompilelocal
import (
"context"
"fmt"
"os/exec"
"strings"
"github.com/goccy/go-yaml"
"github.com/googleapis/genai-toolbox/internal/sources"
"github.com/googleapis/genai-toolbox/internal/tools"
)
const kind string = "dataform-compile-local"
func init() {
if !tools.Register(kind, newConfig) {
panic(fmt.Sprintf("tool kind %q already registered", kind))
}
}
func newConfig(ctx context.Context, name string, decoder *yaml.Decoder) (tools.ToolConfig, error) {
actual := Config{Name: name}
if err := decoder.DecodeContext(ctx, &actual); err != nil {
return nil, err
}
return actual, nil
}
type Config struct {
Name string `yaml:"name" validate:"required"`
Kind string `yaml:"kind" validate:"required"`
Description string `yaml:"description" validate:"required"`
AuthRequired []string `yaml:"authRequired"`
}
var _ tools.ToolConfig = Config{}
func (cfg Config) ToolConfigKind() string {
return kind
}
func (cfg Config) Initialize(srcs map[string]sources.Source) (tools.Tool, error) {
allParameters := tools.Parameters{
tools.NewStringParameter("project_dir", "The Dataform project directory."),
}
paramManifest := allParameters.Manifest()
mcpManifest := tools.GetMcpManifest(cfg.Name, cfg.Description, cfg.AuthRequired, allParameters)
t := Tool{
Name: cfg.Name,
Kind: kind,
AuthRequired: cfg.AuthRequired,
Parameters: allParameters,
manifest: tools.Manifest{Description: cfg.Description, Parameters: paramManifest, AuthRequired: cfg.AuthRequired},
mcpManifest: mcpManifest,
}
return t, nil
}
var _ tools.Tool = Tool{}
type Tool struct {
Name string `yaml:"name"`
Kind string `yaml:"kind"`
AuthRequired []string `yaml:"authRequired"`
Parameters tools.Parameters `yaml:"allParams"`
manifest tools.Manifest
mcpManifest tools.McpManifest
}
func (t Tool) Invoke(ctx context.Context, params tools.ParamValues, accessToken tools.AccessToken) (any, error) {
paramsMap := params.AsMap()
projectDir, ok := paramsMap["project_dir"].(string)
if !ok || projectDir == "" {
return nil, fmt.Errorf("error casting 'project_dir' to string or invalid value")
}
cmd := exec.CommandContext(ctx, "dataform", "compile", projectDir, "--json")
output, err := cmd.CombinedOutput()
if err != nil {
return nil, fmt.Errorf("error executing dataform compile: %w\nOutput: %s", err, string(output))
}
return strings.TrimSpace(string(output)), nil
}
func (t Tool) ParseParams(data map[string]any, claims map[string]map[string]any) (tools.ParamValues, error) {
return tools.ParseParams(t.Parameters, data, claims)
}
func (t Tool) Manifest() tools.Manifest {
return t.manifest
}
func (t Tool) McpManifest() tools.McpManifest {
return t.mcpManifest
}
func (t Tool) Authorized(verifiedAuthServices []string) bool {
return tools.IsAuthorized(t.AuthRequired, verifiedAuthServices)
}
func (t Tool) RequiresClientAuthorization() bool {
return false
}

View File

@@ -0,0 +1,71 @@
// Copyright 2025 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package dataformcompilelocal_test
import (
"testing"
yaml "github.com/goccy/go-yaml"
"github.com/google/go-cmp/cmp"
"github.com/googleapis/genai-toolbox/internal/server"
"github.com/googleapis/genai-toolbox/internal/testutils"
"github.com/googleapis/genai-toolbox/internal/tools/dataform/dataformcompilelocal"
)
func TestParseFromYamlDataformCompile(t *testing.T) {
ctx, err := testutils.ContextWithNewLogger()
if err != nil {
t.Fatalf("unexpected error: %s", err)
}
tcs := []struct {
desc string
in string
want server.ToolConfigs
}{
{
desc: "basic example",
in: `
tools:
example_tool:
kind: dataform-compile-local
description: some description
`,
want: server.ToolConfigs{
"example_tool": dataformcompilelocal.Config{
Name: "example_tool",
Kind: "dataform-compile-local",
Description: "some description",
AuthRequired: []string{},
},
},
},
}
for _, tc := range tcs {
t.Run(tc.desc, func(t *testing.T) {
got := struct {
Tools server.ToolConfigs `yaml:"tools"`
}{}
// Parse contents
err := yaml.UnmarshalContext(ctx, testutils.FormatYaml(tc.in), &got)
if err != nil {
t.Fatalf("unable to unmarshal: %s", err)
}
if diff := cmp.Diff(tc.want, got.Tools); diff != "" {
t.Fatalf("incorrect parse: diff %v", diff)
}
})
}
}

View File

@@ -0,0 +1,138 @@
// Copyright 2025 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package dataformcompilelocal
import (
"context"
"fmt"
"net/http"
"os"
"os/exec"
"path/filepath"
"regexp"
"strings"
"testing"
"time"
"github.com/googleapis/genai-toolbox/internal/testutils"
"github.com/googleapis/genai-toolbox/tests"
)
// setupTestProject creates a minimal dataform project using the 'dataform init' CLI.
// It returns the path to the directory and a cleanup function.
func setupTestProject(t *testing.T) (string, func()) {
tmpDir, err := os.MkdirTemp("", "dataform-project-*")
if err != nil {
t.Fatalf("Failed to create temp dir: %v", err)
}
cleanup := func() {
os.RemoveAll(tmpDir)
}
cmd := exec.Command("dataform", "init", tmpDir, "test-project-id", "US")
if output, err := cmd.CombinedOutput(); err != nil {
cleanup()
t.Fatalf("Failed to run 'dataform init': %v\nOutput: %s", err, string(output))
}
definitionsDir := filepath.Join(tmpDir, "definitions")
exampleSQLX := `config { type: "table" } SELECT 1 AS test_col`
err = os.WriteFile(filepath.Join(definitionsDir, "example.sqlx"), []byte(exampleSQLX), 0644)
if err != nil {
cleanup()
t.Fatalf("Failed to write example.sqlx: %v", err)
}
return tmpDir, cleanup
}
func TestDataformCompileTool(t *testing.T) {
if _, err := exec.LookPath("dataform"); err != nil {
t.Skip("dataform CLI not found in $PATH, skipping integration test")
}
projectDir, cleanupProject := setupTestProject(t)
defer cleanupProject()
toolsFile := map[string]any{
"tools": map[string]any{
"my-dataform-compiler": map[string]any{
"kind": "dataform-compile-local",
"description": "Tool to compile dataform projects",
},
},
}
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
defer cancel()
cmd, cleanupServer, err := tests.StartCmd(ctx, toolsFile)
if err != nil {
t.Fatalf("command initialization returned an error: %s", err)
}
defer cleanupServer()
waitCtx, cancelWait := context.WithTimeout(ctx, 30*time.Second)
defer cancelWait()
out, err := testutils.WaitForString(waitCtx, regexp.MustCompile(`Server ready to serve`), cmd.Out)
if err != nil {
t.Logf("toolbox command logs: \n%s", out)
t.Fatalf("toolbox didn't start successfully: %s", err)
}
nonExistentDir := filepath.Join(os.TempDir(), "non-existent-dir")
testCases := []struct {
name string
reqBody string
wantStatus int
wantBody string // Substring to check for in the response
}{
{
name: "success case",
reqBody: fmt.Sprintf(`{"project_dir":"%s"}`, projectDir),
wantStatus: http.StatusOK,
wantBody: "test_col",
},
{
name: "missing parameter",
reqBody: `{}`,
wantStatus: http.StatusBadRequest,
wantBody: `parameter \"project_dir\" is required`,
},
{
name: "non-existent directory",
reqBody: fmt.Sprintf(`{"project_dir":"%s"}`, nonExistentDir),
wantStatus: http.StatusBadRequest,
wantBody: "error executing dataform compile",
},
}
api := "http://127.0.0.1:5000/api/tool/my-dataform-compiler/invoke"
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
resp, bodyBytes := tests.RunRequest(t, http.MethodPost, api, strings.NewReader(tc.reqBody), nil)
if resp.StatusCode != tc.wantStatus {
t.Fatalf("unexpected status: got %d, want %d. Body: %s", resp.StatusCode, tc.wantStatus, string(bodyBytes))
}
if tc.wantBody != "" && !strings.Contains(string(bodyBytes), tc.wantBody) {
t.Fatalf("expected body to contain %q, got: %s", tc.wantBody, string(bodyBytes))
}
})
}
}