The idea is to populate a Google Sheet with data from a database at regular intervals. Since Google Sheets is quite powerful and can be accessed from any browser, it can serve as a view layer with basic built-in permissions.
In order to populate the Google Sheet, a Google Cloud Function is needed to extract the data from the database and update the sheet.
Please note that in the case of BigQuery you can connect the sheet directly to the data source (link).
Function
A cloud-native solution is to create a Google Cloud Function (2nd generation) that writes the data to the sheet. Cloud Scheduler is used to trigger the execution of the Cloud Function.
The Cloud Function is implemented in TypeScript and transpiled into JavaScript during a build step. It can be run and tested locally by leveraging Google Cloud's Functions Framework for Node.js.
You can find the full implementation in a GitLab repository: https://gitlab.com/x-dennis/cloud-function-google-sheet. Merge requests for improvements are appreciated and accepted.
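To give an impression of the overall shape, here is a minimal sketch of the entry point, assuming the HTTP flavour of the Functions Framework is used (the actual implementation lives in the repository linked above):
import * as functions from '@google-cloud/functions-framework'

// Register an HTTP function under the name "main"; this matches the
// entry point configured later in the Terraform definition.
functions.http('main', async (req, res) => {
  // 1. fetch the rows from the database
  // 2. clear the sheet
  // 3. write the fresh rows
  res.status(200).send('Sheet updated')
})
After the build, such a function can be served locally with something like npx functions-framework --target=main --source=build.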
Authentication
The Google Auth Library takes care of the authentication process.
When running locally, it takes advantage of the Application Default Credentials of the locally authenticated user.
When deployed, it assumes the Cloud Function's identity instead. No further configuration is required unless a different identity needs to be assumed.
import { google } from 'googleapis'

const auth = new google.auth.GoogleAuth({
  scopes: ['https://www.googleapis.com/auth/spreadsheets']
})
const authClient = await auth.getClient()
Sheet
Google's API Node.js Client exposes Google's APIs as Node.js functions. It's the foundation for the interaction with the Google Sheet.
First, it needs to be initialized by providing the authenticated client.
const sheets = google.sheets({ version: 'v4', auth: authClient })
Note: The Cloud Function's account needs to be permitted to edit the Google Sheet. This step is not automated and was taken care of manually by simply sharing the sheet with the Cloud Function's service account.
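The spreadsheetId used in the following snippets is not shown in the excerpts; presumably it is read from the environment, since it is passed in as the SPREADSHEET_ID variable in the Terraform definition further below. A minimal sketch:
// The spreadsheet id is provided via an environment variable
// (see SPREADSHEET_ID in the Terraform configuration below).
const spreadsheetId = process.env.SPREADSHEET_ID ?? ''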
Clearance
Before filling the sheet, it is cleared completely. In this case the range starts at A2, because the first row is populated with column titles that don't change. Z marks the last column to clear, and since no end row is specified, every row below the header is cleared as well. Employees is the name of the specific sheet.
await sheets.spreadsheets.values.clear({
spreadsheetId,
range: 'Employees!A2:Z'
})
Data Preparation
As an example, three columns per row are populated with the id, the lastName and the firstName of an employee. The rows attribute of the result field contains the rows previously fetched from the database. index determines the row number; 2 is added because the database rows are 0-indexed while the sheet starts at row 1, and row 1 contains the titles, so the data starts at row 2.
const data = result.rows.map((row, index) => ({
range: `A${index + 2}:C${index + 2}`,
values: [[row.id, row.lastName, row.firstName]]
}))
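How result is produced is not shown above. A minimal sketch, assuming node-postgres (pg) and a hypothetical employees table; the PG* environment variables configured in the Terraform definition further below are picked up by pg automatically:
import { Pool } from 'pg'

// node-postgres reads PGHOST, PGPORT, PGUSER, PGPASSWORD and PGDATABASE
// from the environment, so no explicit connection config is needed here.
const pool = new Pool()

// Hypothetical table and column names; adjust them to the actual schema.
const result = await pool.query(
  'SELECT id, last_name AS "lastName", first_name AS "firstName" FROM employees ORDER BY id'
)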
Update
The batchUpdate method allows updating multiple ranges at once. RAW indicates that the values are written exactly as provided, without any parsing or type conversion.
await sheets.spreadsheets.values.batchUpdate({
spreadsheetId,
requestBody: {
data,
valueInputOption: 'RAW'
}
})
Build
For the build, tsc is run to transpile the codebase. The target for the output is the build folder in this case. The Cloud Functions Node.js runtime takes care of installing the dependencies if a package.json (and package-lock.json) is present, which is why both files are copied into this folder as well.
{
"scripts": {
"build": "tsc && cp package*.json build"
}
}
Infrastructure
Instead of creating the cloud resources manually, Terraform is used to declare the infrastructure as code, to provision the resources and to track their state.
Note: Some of the depends_on declarations are optional. They were mainly used to ensure that service APIs are enabled before running other modules that depend on those APIs.
Services
A couple of services (or rather their APIs) need to be activated before any other Terraform module can create the desired resources.
resource "google_project_service" "functions" {
project = var.project
service = "cloudfunctions.googleapis.com"
disable_dependent_services = true
}
Under the hood, the second generation of Cloud Functions leverages Cloud Run for execution (link).
resource "google_project_service" "run" {
project = var.project
service = "run.googleapis.com"
disable_dependent_services = true
}
Cloud Run is container-based, which is why the Artifact Registry is needed as well.
resource "google_project_service" "artifact_registry" {
project = var.project
service = "artifactregistry.googleapis.com"
disable_dependent_services = true
}
Cloud Build handles the build of the function. In the default case it pulls the dependencies defined in the package.json file and creates an image.
resource "google_project_service" "cloud_build" {
project = var.project
service = "cloudbuild.googleapis.com"
disable_dependent_services = true
}
resource "google_project_service" "sheets" {
project = var.project
service = "sheets.googleapis.com"
disable_dependent_services = true
}
Secrets
The database credentials need to be stored somewhere safe. That's why the Secret Manager API is activated as well. Possibly this could be replaced with Cloud SQL IAM-based authentication.
resource "google_project_service" "secret_manager" {
project = var.project
service = "secretmanager.googleapis.com"
disable_dependent_services = true
}
A secret acts as a container for one or more versions of the actual secret value.
resource "google_secret_manager_secret" "pg_password" {
secret_id = "gcf-${random_id.function.hex}-pg-password"
replication {
user_managed {
replicas {
location = var.region
}
}
}
depends_on = [
google_project_service.secret_manager
]
}
Here the password is provided by a variable that is stored in the environment of the deploying machine. A secret version contains the actual secret and some additional metadata.
resource "google_secret_manager_secret_version" "pg_password" {
secret = google_secret_manager_secret.pg_password.name
secret_data = var.db_password
enabled = true
depends_on = [
google_project_service.secret_manager
]
}
It must be ensured that the service account linked to the Google Cloud Function can actually access the secrets.
resource "google_secret_manager_secret_iam_binding" "pg_password" {
project = google_secret_manager_secret.pg_password.project
secret_id = google_secret_manager_secret.pg_password.secret_id
role = "roles/secretmanager.secretAccessor"
members = [
"serviceAccount:${google_service_account.function.email}"
]
depends_on = [
google_service_account.function
]
}
Database Connection
If a Cloud SQL instance already exists, the data keyword can be used to reference it so that its attributes are known to Terraform.
data "google_sql_database_instance" "instance" {
name = var.db_instance
project = var.project
}
Storage
The source code is stored in a Cloud Storage bucket and is pulled by Cloud Build to build the function as described before.
resource "google_project_service" "storage" {
project = var.project
service = "storage.googleapis.com"
disable_dependent_services = true
}
A random identifier is added as a suffix to the bucket name to ensure the uniqueness of the name.
resource "google_storage_bucket" "function" {
name = "${var.project}-gcf-${random_id.function.hex}"
location = var.region
uniform_bucket_level_access = true
}
Terraform's archive_file data source creates a .zip archive containing the source code of the function.
data "archive_file" "function" {
type = "zip"
source_dir = "${path.root}/../cloud-function/build"
output_path = "${path.root}/build/function.zip"
}
Unfortunately, Terraform does not pick up changes inside the .zip file by itself. That's why Terraform is used to archive the source code, which in turn enables the use of the output_md5 hash. Since the hash changes every time the content of the .zip file changes, a re-upload to the bucket is triggered and thus the Cloud Function is updated.
resource "google_storage_bucket_object" "object" {
name = "${data.archive_file.function.output_md5}.zip"
bucket = google_storage_bucket.function.name
source = data.archive_file.function.output_path
}
Cloud Function
First, a service account is created that the Cloud Function will assume. This service account will be equipped with different roles to ensure a seamless, password-less integration between the different Google services.
Side note: The random id (random_id.function.hex) is used consistently across all resources (service accounts, function, buckets etc.) to easily identify and connect them.
resource "google_service_account" "function" {
account_id = "gcf-${random_id.function.hex}-sa"
display_name = "Service account for Cloud Function accessing Google Sheets"
project = var.project
}
Since the Cloud Function is supposed to fetch data from a Cloud SQL database, the role roles/cloudsql.client is added to this account.
resource "google_project_iam_member" "function_sa" {
project = var.project
role = "roles/cloudsql.client"
member = "serviceAccount:${google_service_account.function.email}"
}
The google_cloudfunctions2_function resource creates a 2nd generation Cloud Function and accepts a plethora of parameters.
In the build_config block, the code source, the runtime and the entry point are defined. The service_config block offers a few options to define the required resources, such as memory, but also to pass regular, plain-text environment variables as well as "secret" environment variables. The latter allow us to pass a secret from Secret Manager as an environment variable that is not exposed in the Google Cloud Console.
resource "google_cloudfunctions2_function" "function" {
name = "gcf-${random_id.function.hex}"
location = var.region
description = "Cloud function updating a \"Google Sheet\" with database values"
build_config {
runtime = "nodejs16"
entry_point = "main"
source {
storage_source {
bucket = google_storage_bucket.function.name
object = google_storage_bucket_object.object.name
}
}
}
service_config {
max_instance_count = 1
available_memory = "256M"
timeout_seconds = 60
service_account_email = google_service_account.function.email
environment_variables = {
PGDATABASE = var.db
PGHOST = "/cloudsql/${data.google_sql_database_instance.instance.connection_name}"
PGPORT = "5432"
PGUSER = var.db_user
SPREADSHEET_ID = var.spreadsheet_id
}
secret_environment_variables {
key = "PGPASSWORD"
project_id = var.project
secret = google_secret_manager_secret.pg_password.secret_id
version = "latest"
}
}
depends_on = [
google_project_service.artifact_registry,
google_project_service.cloud_build,
google_project_service.run,
google_secret_manager_secret_version.pg_password
]
}
There is one important catch when it comes to the Cloud Function's connection to the database: Once the function is deployed, the Cloud SQL connection has to be set manually on the underlying Cloud Run service as described in the Google docs.
Unfortunately this cannot be automated in Terraform as of today. There is an open issue on GitHub discussing this.
Scheduler
The last step requires us to create a scheduler that triggers the function every few minutes in order to update the Google Sheet.
First, another service account is required. It is used to create an OIDC token that is verified by the Cloud Function when the scheduler triggers it.
resource "google_service_account" "job" {
account_id = "job-${random_id.function.hex}-sa"
display_name = "Service account for scheduled Cloud Function invocation"
}
The roles/cloudfunctions.invoker role is assigned to the service account so that it can trigger the function.
resource "google_cloudfunctions2_function_iam_member" "job" {
project = var.project
location = var.region
cloud_function = google_cloudfunctions2_function.function.name
role = "roles/cloudfunctions.invoker"
member = "serviceAccount:${google_service_account.job.email}"
}
Unfortunately, the previous step was not enough for the scheduler to trigger the Cloud Function. Since this was not mentioned anywhere, I had to find out for myself that the scheduler's service account also requires the roles/run.invoker role on the underlying Cloud Run instance of the Cloud Function. Fortunately, it has the same name as the function, so it is easy to identify. Only after this role was assigned could the scheduler trigger the function; before that, authentication always failed.
resource "google_cloud_run_service_iam_member" "member" {
project = var.project
location = var.region
// The underlying Cloud Run instance has the same name
// like the Cloud Function which triggered its creation.
service = google_cloudfunctions2_function.function.name
role = "roles/run.invoker"
member = "serviceAccount:${google_service_account.job.email}"
depends_on = [
google_cloudfunctions2_function.function
]
}
Another service (or rather its API) needs to be activated so that Terraform can create all the required resources.
resource "google_project_service" "cloud_scheduler" {
project = var.project
service = "cloudscheduler.googleapis.com"
disable_dependent_services = true
}
Finally, the job can be defined. In this case, the scheduler runs every five minutes and triggers the Cloud Function via the configured http_target.
resource "google_cloud_scheduler_job" "job" {
name = "job-${random_id.function.hex}"
description = "Scheduled execution (every 5th minute) of Cloud Function ${google_cloudfunctions2_function.function.name}"
schedule = "*/5 * * * *"
time_zone = "Europe/Berlin"
attempt_deadline = "60s"
retry_config {
retry_count = 1
}
http_target {
http_method = "GET"
uri = google_cloudfunctions2_function.function.service_config[0].uri
oidc_token {
service_account_email = google_service_account.job.email
}
}
depends_on = [
google_project_service.cloud_scheduler
]
}
Conclusion
I find it absolutely satisfying to see all of these infrastructure components created and wired correctly in the blink of an eye. And the best part is that it can be destroyed, parameterized and repeated as many times as I want!
In this case, even the deployment of the function is taken care of. The only thing left is a simple CI/CD pipeline to update the function as soon as a code change is committed to a specific branch in version control.
The second generation of Cloud Functions is even more powerful, mitigates the cold start problem by allowing you to always have at least one instance ready to serve requests, and is even better integrated with other services thanks to Eventarc.
Although a Cloud Function is a powerful tool with a minimal setup, it still requires some work to set up all the moving parts (service accounts, secrets, bucket, code deployment, the function itself, scheduler) correctly. I hope this article and the related repository can help you reduce the initial effort and get up to speed more quickly.