Skip to content

Creating and Managing Connection(s)

This guide shows how to connect your warehouse to Numbers Station from within the web application and start to ask analytics questions.

Setting Up Permissions in Data Warehouse

In order to connect to your data warehouse, you will first need to set up a way to share access with Numbers Station in a controlled way. This varies by data warehouse, so please check the instructions for your specific use case.

Databricks

To connect to Databricks, Numbers Station requires that you:

  1. Provide warehouse parameters
  2. Create a service principal
  3. Create a Personal Access Token (PAT) for the service principal

Provide Warehouse Parameters

In your databricks workspace, go to SQL > SQL warehouses and select your warehouse. Then, go to the Connection details tab and copy the Server hostname and HTTP path to the fields on the left.

image

Create a Service Principal

It is best practice to follow the principle of least privilege. You can do so by creating a service principal for use by Numbers Station and only allowing access to the resources you plan to connect with Numbers Station.

Follow the databricks guide to create a service principal.

Once your service principal is created, you'll need to enable access token usage for this service principal.

Grant Permissions

In order to function, Numbers Station requires you to provide the service principal the correct entitlements and permissions.

Entitlements

Numbers Station needs databricks-sql-access entitlements. This is provided by default, but you can follow these Databricks instructions to ensure the correct entitlement is provided.

"Data Editor" Permissions

Numbers Station requires "Data Editor" permissions on the result cache schema. Assuming you have already added your service principal to the workspace, locate your result cache schema in your Catalog. Then, select the Permissions tab, select the service principal in the Principals drop down menu, and select the Data Editor Privilege preset. Then, click Grant.

Create a PAT

Follow this databricks guide to create a PAT for your service principal.

Save the access token, you'll need it to create the connection in Numbers Station.

Snowflake

Snowflake recommends granting permissions to a role other than ACCOUNTADMIN. See Snowflake for more on the use of the ACCOUNTADMIN Role.

Create a Custom Role

Create a Numbers Station specific role in Snowflake to interface with Numbers Station. In Snowflake, run the following to create a custom role and user with access to the databases, schemas, and tables you want to interact with in Numbers Station.

-- Create a new role intended for reading/writing data with Numbers Station
CREATE ROLE NUMBERSSTATION;

-- [Optional] Grant privileges on your databases to the new role.
-- Only databases listed here will be readable by the new user/role.
-- Each of these steps are required to ensure access to the database/schema/tables.
GRANT USAGE ON DATABASE <name> TO ROLE NUMBERSSTATION;
GRANT USAGE ON ALL SCHEMAS IN DATABASE <name> TO ROLE NUMBERSSTATION;
GRANT USAGE ON FUTURE SCHEMAS IN DATABASE <name> TO ROLE NUMBERSSTATION;
GRANT SELECT ON ALL TABLES IN DATABASE <name> TO ROLE NUMBERSSTATION;
GRANT SELECT ON FUTURE TABLES IN DATABASE <name> TO ROLE NUMBERSSTATION;
GRANT SELECT ON ALL VIEWS IN DATABASE <name> TO ROLE NUMBERSSTATION;
GRANT SELECT ON FUTURE VIEWS IN DATABASE <name> TO ROLE NUMBERSSTATION;

-- Create a user, or skip this step if you're using an existing user and have assigned the user NUMBERSSTATION as DEFAULT_ROLE
create user NUMBERSSTATION_USER
password = '<PASSWORD>'
default_warehouse = <WAREHOUSE>
default_role = NUMBERSSTATION;

-- grant access to the <WAREHOUSE> for the NUMBERSSTATION role.
GRANT USAGE ON WAREHOUSE <warehouse> TO ROLE NUMBERSSTATION;

-- grant the NUMBERSSTATION role to the user.
GRANT ROLE NUMBERSSTATION TO USER NUMBERSSTATION_USER;

-- Create the Result Cache db and schema
CREATE DATABASE IF NOT EXISTS "NUMBERS_STATION";
CREATE SCHEMA IF NOT EXISTS "NUMBERS_STATION"."NUMBERS_STATION";

-- grant the NUMBERSSTATION role access to Result Cache
GRANT ALL PRIVILEGES ON DATABASE NUMBERS_STATION TO ROLE NUMBERSSTATION;
GRANT ALL PRIVILEGES ON SCHEMA "NUMBERS_STATION"."NUMBERS_STATION" TO ROLE NUMBERSSTATION;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA "NUMBERS_STATION"."NUMBERS_STATION" TO ROLE NUMBERSSTATION;
GRANT ALL PRIVILEGES ON FUTURE TABLES IN SCHEMA "NUMBERS_STATION"."NUMBERS_STATION" TO ROLE NUMBERSSTATION;
GRANT ALL PRIVILEGES ON ALL VIEWS IN SCHEMA "NUMBERS_STATION"."NUMBERS_STATION" TO ROLE NUMBERSSTATION;
GRANT ALL PRIVILEGES ON FUTURE VIEWS IN SCHEMA "NUMBERS_STATION"."NUMBERS_STATION" TO ROLE NUMBERSSTATION;

Warning

If you choose to rely on an existing user, ensure the user has DEFAULT_ROLE as NUMBERSSTATION or wider, a DEFAULT_WAREHOUSE, and that the DEFAULT_ROLE has access to the DEFAULT_WAREHOUSE. For example, if your DEFAULT_ROLE is ORGADMIN, you generally won't have been granted usage on any warehouses.

Google BigQuery

To connect to BigQuery, Numbers Station requires that you:

  1. Create a custom role.
  2. Create a service account to connect to Google Cloud Platform (GCP).
  3. Ensure your GCP project has the correct APIs enabled.

Creating a Custom Role for Numbers Station

It is best practice to follow the principle of least privilege. You can do so by creating a custom role for use by Numbers Station with at least the following permissions.

Permissions in YAML

  - bigquery.datasets.create
  - bigquery.datasets.delete
  - bigquery.datasets.get
  - bigquery.jobs.create
  - bigquery.tables.create
  - bigquery.tables.delete
  - bigquery.tables.get
  - bigquery.tables.getData
  - bigquery.tables.list
  - bigquery.tables.update
  - bigquery.tables.updateData

Here is an example command to create the role programmatically (it requires gcloud and jq installed).

PROJECT_ID=<fill this in with your target project id>
ROLE_NAME=NUMBERS_STATION
ROLE_ID=$ROLE_NAME
ROLE_DESCRIPTION="Numbers Station Role"
ROLE_PERMISSIONS=bigquery.datasets.create,bigquery.datasets.delete,bigquery.datasets.get,bigquery.jobs.create,bigquery.tables.create,bigquery.tables.delete,bigquery.tables.get,bigquery.tables.getData,bigquery.tables.list,bigquery.tables.update,bigquery.tables.updateData

gcloud auth login
echo ""
echo "Creating role named $ROLE_NAME..."
gcloud iam roles create $ROLE_ID --project=$PROJECT_ID --title=$ROLE_NAME --description="$ROLE_DESCRIPTION" --permissions=$ROLE_PERMISSIONS
echo ""
echo "Verifying role..."
ROLES_LIST=$(gcloud iam roles list --project=$PROJECT_ID --filter=title=$ROLE_NAME --format=json)
ROLE_FULL_NAME=$(echo $ROLES_LIST | jq -r .[0].name)

Creating a Service Account

Once you have the custom role, you can proceed to create a service account. Numbers Station uses this account to access your data, as well as set up the cloud functions and remote functions needed to run our transformations. Make sure to assign the custom role during the second step of the creation process.

After creating the service account, you must then generate a JSON service account key. You will provide it to Numbers Station to grant us access to the project that this service account is associated with. The file's contents are stored and encrypted in transit and at rest.

PROJECT_ID=<fill this in with your target project id>
SERVICE_ACCOUNT_DISPLAY_NAME=NUMBERS-STATION
echo ""
echo "Creating service account named $SERVICE_ACCOUNT_DISPLAY_NAME..."
gcloud iam service-accounts create --project=$PROJECT_ID $SERVICE_ACCOUNT_DISPLAY_NAME --display-name=$SERVICE_ACCOUNT_DISPLAY_NAME

echo ""
echo "Verifying service account..."
SERVICE_ACCOUNTS_LIST=$(gcloud iam service-accounts list --project=$PROJECT_ID --filter=displayName=$SERVICE_ACCOUNT_DISPLAY_NAME --format=json)
SERVICE_ACCOUNT_EMAIL=$(echo $SERVICE_ACCOUNTS_LIST | jq -r .[0].email)

echo ""
echo "Adding role to service account..."
gcloud projects add-iam-policy-binding $PROJECT_ID --member=serviceAccount:$SERVICE_ACCOUNT_EMAIL --role=$ROLE_FULL_NAME

echo ""
echo "Creating service account key..."
gcloud iam service-accounts keys create --project=$PROJECT_ID --iam-account=$SERVICE_ACCOUNT_EMAIL $SERVICE_ACCOUNT_DISPLAY_NAME.json

echo ""
echo ""
echo "Copy the following key into Numbers Station when creating a new BigQuery connection:"
echo ""
cat $SERVICE_ACCOUNT_DISPLAY_NAME.json

Enabling APIs

To validate permissions, your project must have the Cloud Resource Manager API enabled.

To connect to external tables, your project must have BigQuery Connection API enabled.

For each external table, permissions also must be added. For example, if the external table comes from Google Sheets, the service account email needs be on the share list of the Sheet.

Warning

BigQuery does not support cross-location queries. So, when you specify a connection's location, select the location that matches the location of the datasets you are operating on. Otherwise, Numbers Station's connection will be unable to query your data. For instance, if the dataset is in the us-west2 region, the connection must use the same region and the multi-region US does not work. If the dataset is in the US region, the connection must use the multi-region US, not a single region us-west1.

Tableau

Tableau is a data visualization tool widely used for business intelligence and analytics. It allows users to connect to a variety of data sources, transform data, and create interactive, shareable dashboards and visualizations. This helps users analyze and gain insights from their data more effectively.

Numbers Station supports direct integration with your tableau workbooks.

Configuration

To connect to Tableau, Numbers Station requires that you:

  1. Provide service address and site ID.
  2. Configure Connected Apps with Direct Trust.
Provide service address and site ID
Service Address

The service address is essentially the URL of your Tableau Server or Tableau Online instance.

  • If you're using Tableau Online: The service address will typically be in the format https://[your-site].online.tableau.com, where [your-site] is your site's specific name.
  • If you're using Tableau Server: The service address is the base URL where your Tableau Server is hosted, often something like https://yourserver.com or https://[IP address]:[port] if accessed by IP.
Site ID

Go to your Tableau Server or Tableau Online and log in. In the URL, after the base service address, you will see the Site ID. For example, in https://yourserver.com/#/site/[site-id]/workbooks, the [site-id] segment is your Site ID. If you're on the default site, the Site ID may be "Default," or it could be another custom name if you've set up multiple sites.

Configure Connected Apps with Direct Trust.

Tableau suggests to use Tableau Connected Apps for application integration. Tableau connected apps enable a seamless and secure authentication experience by facilitating an explicit trust relationship between your Tableau Cloud site and external applications.

You can do so by creating a connected app for use by Numbers Station and only allowing access to the resources you plan to connect with Numbers Station.

Follow the Configure Connected Apps with Direct Trust to create a connected app.

Enable the connected app and generate the secret value.

Redshift

Amazon Redshift is a fully managed, cloud-based data warehouse service that allows organizations to run fast, complex queries on large datasets. It is designed for scalability and performance, enabling users to easily store and analyze vast amounts of data for business intelligence and analytics.

Numbers Station supports direct integration with your Redshift data.

Creating a Dedicated User for Numbers Station

To connect to Redshift, Numbers Station recommends creating a dedicated user with minimal permissions.

Specifically, we need:

  • CREATE permission on the database.
  • SELECT permission on all tables within the public schema
  • To share tables beyond those in the public schema with Numbers Station, we need both SELECT on the tables themselves and USAGE on their corresponding schemas.
  • ALL permissions on a schema for Numbers Station's internal results cache (or allow Numbers Station to create its own schema).

The following instructions show how to do so using your superuser Redshift account.

Prerequisites:

  1. Access to a Redshift superuser in order to grant permissions.
  2. We currently require the enable_case_sensitive_identifier configuration to be false (the default value) on your cluster. You can verify the value of this setting by running SHOW enable_case_sensitive_identifier;.

To connect to Redshift, Numbers Station recommends that you:

  1. Create a new user by running CREATE USER <user_name> PASSWORD <password>;. Numbers Station will use this user to access tables within your database.
  2. Provide CREATE permission on the database you are sharing with us by running: GRANT CREATE ON DATABASE <database_name> TO <user_name>;. This is used to create our results cache schema, if necessary.
  3. For schema permissions:

    1. If all of the tables you intend to share with us are within the public schema:

      • You can provide SELECT permission on all tables within public schema to all users in the public group by running GRANT SELECT ON ALL TABLES IN SCHEMA public TO public;.
      • If you want Numbers Station to have access to any future tables you might add to the public schema, you can provide SELECT permission to all future tables for all users by running ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO GROUP public;.
    2. If any of the tables you intend to share with Numbers Station are not within the public schema, then you need to (1) grant USAGE permission on the target schema and (2) specifically grant SELECT permission on that table by running GRANT SELECT ON TABLE <schema_name>.<table_name> TO <user_name>;.

  4. Numbers Station also requires an additional schema for our internal results cache. This can be managed in one of two ways.

    1. You can create a new schema and provide ALL permissions on this schema by running GRANT ALL ON SCHEMA <schema_name>.
    2. Let Numbers Station automatically create its own schema based on the name you specify in the connection.

Provide Host Address and Region

Host Address

The host refers to the hostname or IP address of the Redshift cluster that the connector will connect to. It is the endpoint of your Redshift database that Numbers Station will use to establish a connection. Usually a complete host address will look like this: <your-cluster-name>.<random-id>.<region>.redshift.amazonaws.com:<port>. And the part after ":" is your network port number used to establish a connection to the Redshift database.

Region

The region is used to specify where your Redshift cluster is hosted (AWS region), and when forming the endpoint URL for the cluster, it is the <region> part of the example host address provided above. It would typically be a string representing the AWS region code (e.g., us-west-2, us-east-1, eu-central-1, etc.).

Making a Connection in Numbers Station

Once you have the prerequisites complete, you are ready to create a connection in Numbers Station to your data warehouse.

There are two primary paths where you can create a connection.

The most direct path is to click on your account name in the top left > Settings > Connections (in the Workspace group) > Create connection. Note that this page also shows all the existing connections in your account.

The second way is to create a connection while creating a dataset. This can be done by clicking the Datasets button at the bottom left > New dataset > Create connection.

From either of these paths, you will be presented the same UI where you can input the information obtained in the prerequisite steps, so Numbers Station can connect to your warehouse securely.

image

image

image

image

image

Managing Connections

Clicking on the Connections icon on the left navigation menu, navigates you to a page where all the connections of the workspace are listed.

managing-connection

You can test, edit, and delete a connection from this page.

One of the most popular usecases is to update the password and connection secrets on periodic bases.