Week 1 - Terraform and Snowflake

What is Terraform?

Terraform is a Infrastructure as Code (IaC) tool developed by HashiCorp. It allows users to provision and control their infrastructure using yaml files.
It does the same thing that something like an aws cdk script would do (creating, updating, deleting resources, maintaining state), but with much less hassle. It's also platform agnostic, it supports all major cloud providers (AWS, Azure, GCP) as well as services like Snowflake, Docker, Kafka, etc

Here's a very good TLDR:

What is Snowflake?

Snowflake is cloud based data warehousing platform, similar in concept to Google's BigQuery and AWS Redshift. It's differentiated by being able to integrate with data stores across all of the major cloud platforms and having a very versatile architecture.

Snowflake databases are virtual hard drives where the user stores data, while Snowflake Warehouses are virtual compute instances that run the analytical queries. They are largely independent of one another, meaning you can have a lot of storage and little compute or vice versa, no need to get a bigger RDS instance. They are also charged separately.

Here's a very good TLDR (sadly Fireship hasn't covered it 😢 ... yet!):

Let's set up a Data Warehouse

The Terraform setup

Make sure you have terraform installed on your local machine.
The terraform website provides instructions for all the supported operating systems.

I also recommend installing tfenv it's a terraform version manager. Using it you can install different versions of terraform and easily switch between them.
This is going to be very useful when you encounter errors that require downgrading to fix.

The Snowflake setup

To setup Snowflake to work with terraform, we are going to need to set up a service account.
This is a must because we are going to be running terraform and therefor also Snowflake via CD pipeline.

To enable this we are going to need to create a Snowflake user and enable that user to log in without us manually logging in. we can do that in one of two ways, by providing terraform with this user with a password or by providing the user with a pair of encryption keys.

For the purposes of this tutorial I'm going to be using the key pair method because it's less obvious than the password method.

Generating the key pair

Let's start by opening the terminal, then navigate to the directory where the ssh keys are stored.

cd ~/.ssh

Once in the ssh directory we can generate the keys, let's start by generating the private key using openssl.

openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out snowflake_tf_snow_key.p8 -nocrypt

What this command is doing is generating a 2048 bit private key, converting it to PKCS#8 (industry standard) format, saving it to a file called snowflake_tf_snow_key.p8 and not encrypting the private key file with a passphrase.

After this is done we need to use the private key to generate the public key.

openssl rsa -in snowflake_tf_snow_key.p8 -pubout -out snowflake_tf_snow_key.pub

This command takes the previously created private key as input and outputs the public key to a file named snowflake_tf_snow_key.pub.

Creating the Snowflake service account

The service account needs to be created in the Snowflake console, open a SQL worksheet set the user role to ACCOUNTADMIN.

To create a service account user enter and run the following script.

CREATE USER "tf-snow" RSA_PUBLIC_KEY='RSA_PUBLIC_KEY_HERE' DEFAULT_ROLE=PUBLIC MUST_CHANGE_PASSWORD=FALSE;

Replace the RSA_PUBLIC_KEY_HERE with the contents of the snowflake_tf_snow_key.pub file we generated earlier.

After creating the service account user, we need to give it the roles it need to create and destroy Snowflake objects.
Run the following SQL commands to assign the SYSADMIN and SECURITYADMIN roles.

GRANT ROLE SYSADMIN TO USER "tf-snow";
GRANT ROLE SECURITYADMIN TO USER "tf-snow";

The service account user is now done, the last set is to retrieve it's account_id and region, this will be needed for authentication and authorization.

The following script will output the account_id and region.

SELECT current_account() as YOUR_ACCOUNT_LOCATOR, current_region() as YOUR_SNOWFLAKE_REGION_ID;
Setting Environmental Variables

To not have to hardcode auth information into the terraform script we need to set the values as environmental variables. We can easily do this in the terminal with the export command.

export SNOWFLAKE_USER="tf-snow"
export SNOWFLAKE_AUTHENTICATOR=JWT
export SNOWFLAKE_PRIVATE_KEY=`cat ~/.ssh/snowflake_tf_snow_key.p8`
export SNOWFLAKE_ACCOUNT="YOUR_ACCOUNT_LOCATOR"

The Terraform script

The terraform lives in a file named main.tf, it contains the following script.

terraform {
	required_providers {
		snowflake = {
			source = "Snowflake-Labs/snowflake"
			version = "~> 0.76"
		}
	}
}

# Providers in this context serve like different roles
provider "snowflake" {
	alias = "sys_admin"
	role = "SYSADMIN"
}
  
# The database (storage)
resource "snowflake_database" "db" {
	provider = snowflake.sys_admin
	name = "TF_DEMO"
}
  
# The warehouse (compute)
resource "snowflake_warehouse" "warehouse" {
	provider = snowflake.sys_admin
	name = "TF_DEMO"
	warehouse_size = "small"
	auto_suspend = 60
}

  
provider "snowflake" {
	alias = "security_admin"
	role = "SECURITYADMIN"
}

# We now add a role that we can assign to out user to allow it your use our resources.

resource "snowflake_role" "role" {
	provider = snowflake.security_admin
	name = "TF_DEMO_SVC_ROLE"
}


resource "snowflake_grant_privileges_to_account_role" "grant" {
	provider = snowflake.security_admin
	on_account_object {
		object_type = "DATABASE"
		object_name = snowflake_database.db.name
	}
	privileges = ["USAGE"]
	account_role_name = snowflake_role.role.name
	with_grant_option = false
}


resource "snowflake_schema" "schema" {
	provider = snowflake.sys_admin
	database = snowflake_database.db.name
	name = "TF_DEMO"
	is_managed = false
}

  
resource "snowflake_schema_grant" "grant" {
	provider = snowflake.security_admin
	database_name = snowflake_database.db.name
	schema_name = snowflake_schema.schema.name
	privilege = "USAGE"
	roles = [snowflake_role.role.name]
	with_grant_option = false
}

  
resource "snowflake_warehouse_grant" "grant" {
	provider = snowflake.security_admin
	warehouse_name = snowflake_warehouse.warehouse.name
	privilege = "USAGE"
	roles = [snowflake_role.role.name]
	with_grant_option = false
}

  
resource "tls_private_key" "svc_key" {
	algorithm = "RSA"
	rsa_bits = 2048
}

  
resource "snowflake_user" "user" {
	provider = snowflake.security_admin
	name = "tf_demo_user"
	default_warehouse = snowflake_warehouse.warehouse.name
	default_role = snowflake_role.role.name
	default_namespace = "${snowflake_database.db.name}.${snowflake_schema.schema.name}"
	rsa_public_key = substr(tls_private_key.svc_key.public_key_pem, 27, 398)
}


resource "snowflake_role_grants" "grants" {
	provider = snowflake.security_admin
	role_name = snowflake_role.role.name
	users = [snowflake_user.user.name]
}

How to Run the Script

To run the terraform script we need to first initiate the terraform project, we do that running the init command in the directory where the main.tf file is located.

terraform init

This will generate the .terraform directory, this is where terraform stores all the resources it needs to run the scripts. It also generates the terraform.tfstate file, this is where the state is stored. Make sure to add it to .gitignore, the state file contains all the used information including environmental variables and secrets, make sure it's private.

Before running the script we need to make sure it's correct, we do that by running the plan command.

terraform plan

If everything works (the provider is under-development, the current code might not work), you can use the script by running the apply.

terraform apply

After this command runs it course, go to the Snowflake console you will be able to see the newly created database and warehouse.

The full code

The full code is available here.

Next

Week 2 - Ingesting Data with Mage