The idea is to populate a Google Sheet with data from a database at regular intervals. Since Google Sheets is quite powerful and can be accessed from any browser, it can serve as a view layer with basic built-in permissions.
In order to populate the Google Sheet, a Google Cloud Function is needed to extract the data from the database and update the sheet.
Please note that in the case of BigQuery you can connect the sheet directly to the data source (link).
Function
A cloud-native solution is to create a Google Cloud Function (2nd generation) that writes the data to the sheet. Cloud Scheduler is used to trigger the execution of the Cloud Function.
The Cloud Function is implemented in TypeScript and transpiled into JavaScript during a build step. It can be run and tested locally by leveraging Google Cloud's Functions Framework for Node.js.
You can find the full implementation in a GitLab repository: https://gitlab.com/x-dennis/cloud-function-google-sheet. Merge requests for improvements are appreciated and accepted.
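To give an impression of the overall shape, here is a minimal sketch of the entry point, assuming the HTTP flavour of the Functions Framework is used (the actual implementation lives in the repository linked above):
import * as functions from '@google-cloud/functions-framework'

// Register an HTTP function under the name "main"; this matches the
// entry point configured later in the Terraform definition.
functions.http('main', async (req, res) => {
  // 1. fetch the rows from the database
  // 2. clear the sheet
  // 3. write the fresh rows
  res.status(200).send('Sheet updated')
})
After the build, such a function can be served locally with something like npx functions-framework --target=main --source=build.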
Authentication
The Google Auth Library takes care of the authentication process.
When running locally, it takes advantage of the Application Default Credentials of the locally authenticated user.
When deployed, it assumes the Cloud Function's identity instead. No further configuration is required unless a different identity needs to be assumed.
import { google } from 'googleapis'

const auth = new google.auth.GoogleAuth({
  scopes: ['https://www.googleapis.com/auth/spreadsheets']
})
const authClient = await auth.getClient()
Sheet
Google's API Node.js Client exposes Google's APIs as Node.js functions. It's the foundation for the interaction with the Google Sheet.
First, it needs to be initialized by providing the authenticated client.
const sheets = google.sheets({ version: 'v4', auth: authClient })
Note: The Cloud Function's account needs to be permitted to edit the Google Sheet. This step is not automated and was taken care of manually by simply sharing the sheet with the Cloud Function's service account.
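The spreadsheetId used in the following snippets is not shown in the excerpts; presumably it is read from the environment, since it is passed in as the SPREADSHEET_ID variable in the Terraform definition further below. A minimal sketch:
// The spreadsheet id is provided via an environment variable
// (see SPREADSHEET_ID in the Terraform configuration below).
const spreadsheetId = process.env.SPREADSHEET_ID ?? ''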
Clearance
Before filling the sheet, it is cleared completely. In this case the range starts at A2, because the first row is populated with column titles that don't change. Z marks the last column to clear, and since no end row is specified, every row below the header is cleared as well. Employees is the name of the specific sheet.
await sheets.spreadsheets.values.clear({
spreadsheetId,
range: 'Employees!A2:Z'
})
Data Preparation
As an example, three columns per row are populated with the id, the lastName and the firstName of an employee. The rows attribute of the result field contains the rows previously fetched from the database. index determines the row number; 2 is added because the database rows are 0-indexed while the sheet starts at row 1, and row 1 contains the titles, so the data starts at row 2.
const data = result.rows.map((row, index) => ({
range: `A${index + 2}:C${index + 2}`,
values: [[row.id, row.lastName, row.firstName]]
}))
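How result is produced is not shown above. A minimal sketch, assuming node-postgres (pg) and a hypothetical employees table; the PG* environment variables configured in the Terraform definition further below are picked up by pg automatically:
import { Pool } from 'pg'

// node-postgres reads PGHOST, PGPORT, PGUSER, PGPASSWORD and PGDATABASE
// from the environment, so no explicit connection config is needed here.
const pool = new Pool()

// Hypothetical table and column names; adjust them to the actual schema.
const result = await pool.query(
  'SELECT id, last_name AS "lastName", first_name AS "firstName" FROM employees ORDER BY id'
)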
Update
The batchUpdate method allows updating multiple ranges at once. RAW indicates that the values are written exactly as provided, without any parsing or type conversion.
await sheets.spreadsheets.values.batchUpdate({
spreadsheetId,
requestBody: {
data,
valueInputOption: 'RAW'
}
})
Build
For the build, tsc is run to transpile the codebase. The target for the output is the build folder in this case. The Cloud Functions Node.js runtime takes care of installing the dependencies if a package.json (and package-lock.json) is present, which is why both files are copied into this folder as well.
{
"scripts": {
"build": "tsc && cp package*.json build"
}
}
Infrastructure
Instead of creating the cloud resources manually, Terraform is used to declare the infrastructure as code, to provision the resources and to track their state.
Note: Some of the depends_on declarations are optional. They were mainly used to ensure that service APIs are enabled before running other modules that depend on those APIs.
Services
A couple of services (or rather their APIs) need to be activated before any other Terraform module can create the desired resources.
resource "google_project_service" "functions" {
project = var.project
service = "cloudfunctions.googleapis.com"
disable_dependent_services = true
}
Under the hood, the second generation of Cloud Functions leverages Cloud Run for execution (link).
resource "google_project_service" "run" {
project = var.project
service = "run.googleapis.com"
disable_dependent_services = true
}
Cloud Run is container-based, which is why the Artifact Registry is needed as well.
resource "google_project_service" "artifact_registry" {
project = var.project
service = "artifactregistry.googleapis.com"
disable_dependent_services = true
}
Cloud Build handles the build of the function. In the default case it pulls the dependencies defined in the package.json file and creates an image.
resource "google_project_service" "cloud_build" {
project = var.project
service = "cloudbuild.googleapis.com"
disable_dependent_services = true
}
resource "google_project_service" "sheets" {
project = var.project
service = "sheets.googleapis.com"
disable_dependent_services = true
}
Secrets
The database credentials need to be stored somewhere safe. That's why the Secret Manager API is activated as well. Possibly this could be replaced with Cloud SQL IAM-based authentication.
resource "google_project_service" "secret_manager" {
project = var.project
service = "secretmanager.googleapis.com"
disable_dependent_services = true
}
A secret acts as a container for one or more versions of the actual secret value.
resource "google_secret_manager_secret" "pg_password" {
secret_id = "gcf-${random_id.function.hex}-pg-password"
replication {
user_managed {
replicas {
location = var.region
}
}
}
depends_on = [
google_project_service.secret_manager
]
}
Here the password is provided by a variable that is stored in the environment of the deploying machine. A secret version contains the actual secret and some additional metadata.
resource "google_secret_manager_secret_version" "pg_password" {
secret = google_secret_manager_secret.pg_password.name
secret_data = var.db_password
enabled = true
depends_on = [
google_project_service.secret_manager
]
}
It must be ensured that the service account linked to the Google Cloud Function can actually access the secrets.
resource "google_secret_manager_secret_iam_binding" "pg_password" {
project = google_secret_manager_secret.pg_password.project
secret_id = google_secret_manager_secret.pg_password.secret_id
role = "roles/secretmanager.secretAccessor"
members = [
"serviceAccount:${google_service_account.function.email}"
]
depends_on = [
google_service_account.function
]
}
Database Connection
If a Cloud SQL instance already exists, the data keyword can be used to reference it so that its attributes are known to Terraform.
data "google_sql_database_instance" "instance" {
name = var.db_instance
project = var.project
}
Storage
The source code is stored in a Cloud Storage bucket and is pulled by Cloud Build to build the function as described before.
resource "google_project_service" "storage" {
project = var.project
service = "storage.googleapis.com"
disable_dependent_services = true
}
A random identifier is added as a suffix to the bucket name to ensure the uniqueness of the name.
resource "google_storage_bucket" "function" {
name = "${var.project}-gcf-${random_id.function.hex}"
location = var.region
uniform_bucket_level_access = true
}
Terraform's archive_file data source creates a .zip archive containing the source code of the function.
data "archive_file" "function" {
type = "zip"
source_dir = "${path.root}/../cloud-function/build"
output_path = "${path.root}/build/function.zip"
}
Unfortunately, Terraform does not pick up changes inside the .zip file by itself. That's why Terraform is used to archive the source code, which in turn enables the use of the output_md5 hash. Since the hash changes every time the content of the .zip file changes, a re-upload to the bucket is triggered and thus the Cloud Function is updated.
resource "google_storage_bucket_object" "object" {
name = "${data.archive_file.function.output_md5}.zip"
bucket = google_storage_bucket.function.name
source = data.archive_file.function.output_path
}
Cloud Function
First, a service account is created that the Cloud Function will assume. This service account will be equipped with different roles to ensure a seamless, password-less integration between the different Google services.
Side note: The random id (random_id.function.hex) is used consistently across all resources (service accounts, function, buckets etc.) to easily identify and connect them.
resource "google_service_account" "function" {
account_id = "gcf-${random_id.function.hex}-sa"
display_name = "Service account for Cloud Function accessing Google Sheets"
project = var.project
}
Since the Cloud Function is supposed to fetch data from a Cloud SQL database, the role roles/cloudsql.client is added to this account.
resource "google_project_iam_member" "function_sa" {
project = var.project
role = "roles/cloudsql.client"
member = "serviceAccount:${google_service_account.function.email}"
}
The google_cloudfunctions2_function resource creates a 2nd generation Cloud Function and accepts a plethora of parameters.
In the build_config block, the code source, the runtime and the entry point are defined. The service_config block offers a few options to define the required resources, such as memory, but also to pass regular, plain-text environment variables as well as "secret" environment variables. The latter allow us to pass a secret from Secret Manager as an environment variable that is not exposed in the Google Cloud Console.
resource "google_cloudfunctions2_function" "function" {
name = "gcf-${random_id.function.hex}"
location = var.region
description = "Cloud function updating a \"Google Sheet\" with database values"
build_config {
runtime = "nodejs16"
entry_point = "main"
source {
storage_source {
bucket = google_storage_bucket.function.name
object = google_storage_bucket_object.object.name
}
}
}
service_config {
max_instance_count = 1
available_memory = "256M"
timeout_seconds = 60
service_account_email = google_service_account.function.email
environment_variables = {
PGDATABASE = var.db
PGHOST = "/cloudsql/${data.google_sql_database_instance.instance.connection_name}"
PGPORT = "5432"
PGUSER = var.db_user
SPREADSHEET_ID = var.spreadsheet_id
}
secret_environment_variables {
key = "PGPASSWORD"
project_id = var.project
secret = google_secret_manager_secret.pg_password.secret_id
version = "latest"
}
}
depends_on = [
google_project_service.artifact_registry,
google_project_service.cloud_build,
google_project_service.run,
google_secret_manager_secret_version.pg_password
]
}
There is one important catch when it comes to the Cloud Function's connection to the database: Once the function is deployed, the Cloud SQL connection has to be set manually on the underlying Cloud Run service as described in the Google docs.
Unfortunately this cannot be automated in Terraform as of today. There is an open issue on GitHub discussing this.
Scheduler
The last step requires us to create a scheduler that triggers the function every few minutes in order to update the Google Sheet.
First, another service account is required. It is used to create an OIDC token that is verified by the Cloud Function when the scheduler triggers it.
resource "google_service_account" "job" {
account_id = "job-${random_id.function.hex}-sa"
display_name = "Service account for scheduled Cloud Function invocation"
}
The roles/cloudfunctions.invoker role is assigned to the service account so that it can trigger the function.
resource "google_cloudfunctions2_function_iam_member" "job" {
project = var.project
location = var.region
cloud_function = google_cloudfunctions2_function.function.name
role = "roles/cloudfunctions.invoker"
member = "serviceAccount:${google_service_account.job.email}"
}
Unfortunately, the previous step was not enough for the scheduler to trigger the Cloud Function. Since this was not mentioned anywhere, I had to find out for myself that the scheduler's service account also requires the roles/run.invoker role on the underlying Cloud Run instance of the Cloud Function. Fortunately, it has the same name as the function, so it is easy to identify. Only after this role was assigned could the scheduler trigger the function; before that, authentication always failed.
resource "google_cloud_run_service_iam_member" "member" {
project = var.project
location = var.region
// The underlying Cloud Run instance has the same name
// like the Cloud Function which triggered its creation.
service = google_cloudfunctions2_function.function.name
role = "roles/run.invoker"
member = "serviceAccount:${google_service_account.job.email}"
depends_on = [
google_cloudfunctions2_function.function
]
}
Another service (or rather its API) needs to be activated so that Terraform can create all the required resources.
resource "google_project_service" "cloud_scheduler" {
project = var.project
service = "cloudscheduler.googleapis.com"
disable_dependent_services = true
}
Finally, the job can be defined. In this case, the scheduler runs every five minutes and triggers the Cloud Function via the configured http_target.
resource "google_cloud_scheduler_job" "job" {
name = "job-${random_id.function.hex}"
description = "Scheduled execution (every 5th minute) of Cloud Function ${google_cloudfunctions2_function.function.name}"
schedule = "*/5 * * * *"
time_zone = "Europe/Berlin"
attempt_deadline = "60s"
retry_config {
retry_count = 1
}
http_target {
http_method = "GET"
uri = google_cloudfunctions2_function.function.service_config[0].uri
oidc_token {
service_account_email = google_service_account.job.email
}
}
depends_on = [
google_project_service.cloud_scheduler
]
}
Conclusion
I find it absolutely satisfying to see all of these infrastructure components created and wired correctly in the blink of an eye. And the best part is that it can be destroyed, parameterized and repeated as many times as I want!
In this case, even the deployment of the function is taken care of. The only thing left is a simple CI/CD pipeline to update the function as soon as a code change is committed to a specific branch in version control.
The second generation of Cloud Functions is even more powerful, mitigates the cold start problem by allowing you to always have at least one instance ready to serve requests, and is even better integrated with other services thanks to Eventarc.
Although a Cloud Function is a powerful tool with a minimal setup, it still requires some work to set up all the moving parts (service accounts, secrets, bucket, code deployment, the function itself, scheduler) correctly. I hope this article and the related repository can help you reduce the initial effort and get up to speed more quickly.