Skip to content

Creating a New Connection

This guide shows how to connect your warehouse to Numbers Station from within the web application and start to ask analytics questions.

Prerequisites

In order to connect to your data warehouse, you will first need to set up a way to share access with Numbers Station in a controlled way. This varies by data warehouse, so please check the instructions for your specific use case.

Databricks

To connect to Databricks, Numbers Station requires that you:

  1. Provide warehouse parameters
  2. Create a service principal
  3. Create a Personal Access Token (PAT) for the service principal

Provide Warehouse Parameters

In your databricks workspace, go to SQL > SQL warehouses and select your warehouse. Then, go to the Connection details tab and copy the Server hostname and HTTP path to the fields on the left.

image

Create a Service Principal

It is best practice to follow the principle of least privilege. You can do so by creating a service principal for use by Numbers Station and only allowing access to the resources you plan to connect with Numbers Station.

Follow the databricks guide to create a service principal.

Once your service principal is created, you'll need to enable access token usage for this service principal.

Grant Permissions

In order to function, Numbers Station requires you to provide the service principal the correct entitlements and permissions.

Entitlements

Numbers Station needs databricks-sql-access entitlements. This is provided by default, but you can follow these Databricks instructions to ensure the correct entitlement is provided.

"Data Editor" Permissions

Numbers Station requires "Data Editor" permissions on the result cache schema. Assuming you have already added your service principal to the workspace, locate your result cache schema in your Catalog. Then, select the Permissions tab, select the service principal in the Principals drop down menu, and select the Data Editor Privilege preset. Then, click Grant.

Create a PAT

Follow this databricks guide to create a PAT for your service principal.

Save the access token, you'll need it to create the connection in Numbers Station.

Snowflake

Snowflake recommends granting permissions to a role other than ACCOUNTADMIN. See Snowflake for more on the use of the ACCOUNTADMIN Role.

Create a Custom Role

Create a Numbers Station specific role in Snowflake to interface with Numbers Station. In Snowflake, run the following to create a custom role and user with access to the databases, schemas, and tables you want to interact with in Numbers Station.

-- Create a new role intended for reading/writing data with Numbers Station
create role NUMBERSSTATION;

-- [Optional] Grant privileges on your databases to the new role.
-- Only databases listed here will be readable by the new user/role.
-- Each of these steps are required to ensure access to the database/schema/tables.
GRANT usage on database <name> to role NUMBERSSTATION;
GRANT usage on all schemas in database <name> to role NUMBERSSTATION;
GRANT usage on future schemas in database <name> to role NUMBERSSTATION;
GRANT select on all tables in database <name> to role NUMBERSSTATION;
GRANT select on future tables in database <name> to role NUMBERSSTATION;

-- Create a user, or skip this step if you're using an existing user and have assigned the user NUMBERSSTATION as DEFAULT_ROLE
create user NUMBERSSTATION_USER
password = '<PASSWORD>'
default_warehouse = <WAREHOUSE>
default_role = NUMBERSSTATION;

-- grant access to the <WAREHOUSE> for the NUMBERSSTATION role.
GRANT usage on warehouse <warehouse> to role NUMBERSSTATION;

-- grant the NUMBERSSTATION role to the user.
GRANT role NUMBERSSTATION to user NUMBERSSTATION_USER;

-- Create the Result Cache db and schema
CREATE DATABASE IF NOT EXISTS "NUMBERS_STATION";
CREATE SCHEMA IF NOT EXISTS "NUMBERS_STATION"."NUMBERS_STATION";

-- grant the NUMBERSSTATION role access to Result Cache
GRANT CREATE TABLE ON SCHEMA "NUMBERS_STATION"."NUMBERS_STATION" TO ROLE NUMBERSSTATION;

Warning

If you choose to rely on an existing user, ensure the user has DEFAULT_ROLE as NUMBERSSTATION or wider, a DEFAULT_WAREHOUSE, and that the DEFAULT_ROLE has access to the DEFAULT_WAREHOUSE. For example, if your DEFAULT_ROLE is ORGADMIN, you generally won't have been granted usage on any warehouses.

Google BigQuery

To connect to BigQuery, Numbers Station requires that you:

  1. Create a custom role.
  2. Create a service account to connect to Google Cloud Platform (GCP).
  3. Ensure your GCP project has the correct APIs enabled.

Creating a Custom Role for Numbers Station

It is best practice to follow the principle of least privilege. You can do so by creating a custom role for use by Numbers Station with at least the following permissions.

Permissions in YAML

  - bigquery.datasets.create
  - bigquery.datasets.delete
  - bigquery.datasets.get
  - bigquery.jobs.create
  - bigquery.tables.create
  - bigquery.tables.delete
  - bigquery.tables.get
  - bigquery.tables.getData
  - bigquery.tables.list
  - bigquery.tables.update
  - bigquery.tables.updateData

Here is an example command to create the role programmatically (it requires gcloud and jq installed).

PROJECT_ID=<fill this in with your target project id>
ROLE_NAME=NUMBERS_STATION
ROLE_ID=$ROLE_NAME
ROLE_DESCRIPTION="Numbers Station Role"
ROLE_PERMISSIONS=bigquery.datasets.create,bigquery.datasets.delete,bigquery.datasets.get,bigquery.jobs.create,bigquery.tables.create,bigquery.tables.delete,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.list,bigquery.tables.update,bigquery.tables.updateData

gcloud auth login
echo ""
echo "Creating role named $ROLE_NAME..."
gcloud iam roles create $ROLE_ID --project=$PROJECT_ID --title=$ROLE_NAME --description="$ROLE_DESCRIPTION" --permissions=$ROLE_PERMISSIONS
echo ""
echo "Verifying role..."
ROLES_LIST=$(gcloud iam roles list --project=$PROJECT_ID --filter=title=$ROLE_NAME --format=json)
ROLE_FULL_NAME=$(echo $ROLES_LIST | jq -r .[0].name)

Creating a Service Account

Once you have the custom role, you can proceed to create a service account. Numbers Station uses this account to access your data, as well as set up the cloud functions and remote functions needed to run our transformations. Make sure to assign the custom role during the second step of the creation process.

After creating the service account, you must then generate a JSON service account key. You will provide it to Numbers Station to grant us access to the project that this service account is associated with. The file's contents are stored and encrypted in transit and at rest.

PROJECT_ID=<fill this in with your target project id>
SERVICE_ACCOUNT_DISPLAY_NAME=NUMBERS-STATION
echo ""
echo "Creating service account named $SERVICE_ACCOUNT_DISPLAY_NAME..."
gcloud iam service-accounts create --project=$PROJECT_ID $SERVICE_ACCOUNT_DISPLAY_NAME --display-name=$SERVICE_ACCOUNT_DISPLAY_NAME

echo ""
echo "Verifying service account..."
SERVICE_ACCOUNTS_LIST=$(gcloud iam service-accounts list --project=$PROJECT_ID --filter=displayName=$SERVICE_ACCOUNT_DISPLAY_NAME --format=json)
SERVICE_ACCOUNT_EMAIL=$(echo $SERVICE_ACCOUNTS_LIST | jq -r .[0].email)

echo ""
echo "Adding role to service account..."
gcloud projects add-iam-policy-binding $PROJECT_ID --member=serviceAccount:$SERVICE_ACCOUNT_EMAIL --role=$ROLE_FULL_NAME

echo ""
echo "Creating service account key..."
gcloud iam service-accounts keys create --project=$PROJECT_ID --iam-account=$SERVICE_ACCOUNT_EMAIL $SERVICE_ACCOUNT_DISPLAY_NAME.json

echo ""
echo ""
echo "Copy the following key into Numbers Station when creating a new BigQuery connection:"
echo ""
cat $SERVICE_ACCOUNT_DISPLAY_NAME.json

Enabling APIs

To validate permissions, your project must have the Cloud Resource Manager API enabled.

To connect to external tables, your project must have BigQuery Connection API enabled.

For each external table, permissions also must be added. For example, if the external table comes from Google Sheets, the service account email needs be on the share list of the Sheet.

Warning

BigQuery does not support cross-location queries. So, when you specify a connection's location, select the location that matches the location of the datasets you are operating on. Otherwise, Numbers Station's connection will be unable to query your data. For instance, if the dataset is in the us-west2 region, the connection must use the same region and the multi-region US does not work. If the dataset is in the US region, the connection must use the multi-region US, not a single region us-west1.

Making a Connection in Numbers Station

Once you have the prerequisites complete, you are ready to create a connection in Numbers Station to your data warehouse.

There are two primary paths where you can create a connection.

The most direct path is to click on your account name in the top left > Settings > Connections (in the Workspace group) > Create connection. Note that this page also shows all the existing connections in your account.

The second way is to create a connection while creating a dataset. This can be done by clicking the Datasets button at the bottom left > New dataset > Create connection.

From either of these paths, you will be presented the same UI where you can input the information obtained in the prerequisite steps, so Numbers Station can connect to your warehouse securely.

image

image

image