Snowflake

Integrate via Direct Connect

Steps

1. Create a Role

Create a role, name it KUBIT.

CREATE ROLE KUBIT;

2. Generate a Key-pair

Here's Snowflake's full guide for reference).

openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

3. Create a User

Create a user, name it KUBIT and set the public key you generated in the previous step.

CREATE USER KUBIT RSA_PUBLIC_KEY ='MIIBIjANBgkqh...' DEFAULT_ROLE=KUBIT

4. Create a Database

Kubit needs to create a very small auxiliary table to support some features.

  1. Create a database called KUBIT.
    CREATE DATABASE KUBIT;
  2. Make the KUBIT role owner of the KUBIT database. This will allow us to create and execute tasks that are only available to an owner role.
    GRANT OWNERSHIP ON DATABASE KUBIT TO ROLE KUBIT;

5. Create a Warehouse

Create a Data Warehouse called KUBIT (size depends on your data volume)

Grant the KUBIT role the following privileges: MONITOR | USAGE | OPERATE

CREATE OR REPLACE WAREHOUSE KUBIT WAREHOUSE_SIZE=... INITIALLY_SUSPENDED=TRUE;
GRANT MONITOR ON WAREHOUSE KUBIT TO ROLE KUBIT;
GRANT USAGE ON WAREHOUSE KUBIT TO ROLE KUBIT;
GRANT OPERATE ON WAREHOUSE KUBIT TO ROLE KUBIT;

6. Grant Access

Add read-only (SELECT on FUTURE) permissions to the KUBIT role to the schema(s) containing the views interfacing with Kubit.

5. Share Connection Details (Optional)

Share your Snowflake account locator, private key and key password and warehouse name with the Kubit team if you need help setting up your Kubit environment.

7. Split Test and Production Workloads (Optional)

👍

Best Practice

Create one more user role and warehouse for development purposes in order to isolate the impact on the production environment and avoid accidents.

Best Practices

Clustering Keys

  1. Kubit works with time-series data so clustering by date of the fact tables is essential for performance for tables bigger than 1TB.
  2. Since most Kubit reports also filter by event name it is beneficial to add the event name as a second cluster column.

Here's Snowflake's reference for Clustering Keys.

Warehouse Configuration

You have 4 major parameters to think of when creating a warehouse:

  • Type - Gen2 is the recommended type for analytical workloads.
  • Size - start small and only improve based on evidence from Query History. If the warehouse feels slow you should examine your queries with Query Profile. Look for Bytes spilled to disk above 0, if the warehouse is spilling often you need to increase the size.
  • Auto-resume - you want this on so any activity wakes up the warehouse.
  • Auto-suspend - this setting controls for how long the warehouse should remain up and idling after all running queries have finished executing. The trade-off is between cost saving and a few additional seconds on some Kubit reports to account for the warehouse waking up. Our default recommendation is to set it to the minimum possible - 1 minutes.