Skip to content

When Should I Group Dimensions?

The "Grouping Dimensions" feature in our platform allows you to relate dimensions that are duplicated across views. This is particularly useful when your dataset contains columns that are semantically the same but might be duplicated across multiple views. Grouping these dimensions helps our system to decide which dimension to use, optimizing the cost of the SQL query.

Why Group Dimensions?

When dimensions are duplicated across views, it can be challenging for our system to determine which dimension to use. Grouping these dimensions informs the system of their equivalence, allowing it to select the dimension that minimizes the SQL query cost. This helps in optimizing performance and ensuring consistent results by reducing noise in the dataset by making it more organized.

How to Create a Dimension Group

  1. Navigate to the Dataset Page: Go to the dataset page for the dataset you want to manage.
  2. Access the Knowledge Tab: If you're not already there, click on the "Knowledge" tab to see the list of dimensions defined in the dataset.
  3. Select the Dimension to Group: Hover over the name of the dimension you wish to group. A three-dot menu button will appear.
  4. Open the Grouping Modal: Click on the three-dot menu button and select the "Group to..." option. A modal will open.
  5. Choose the Target Dimension: In the modal, select the other dimension to which you want to group the dimension.

group-to

Best Practices

  • Ensure Data Equivalence: Before grouping dimensions, verify that the underlying data (profiles for sample values) is the same. This means the dimensions should represent the same data point across different views.
  • Administrative Permissions: Only admins can edit the dimension groups. Ensure you have the necessary permissions to make these changes.
  • Consider User Preferences: If you care which specific dimension is used in the query, you should not group the dimensions. Grouping removes control over which dimension is chosen by the system.

Example: Retail Sales

Suppose I have a retail sales dataset with a star schema about sales within my stores, as described below. Due to earlier reporting requirements I set up both the dimension tables in my star schema to all have the city of my stores. However, the information contained in each column is effectively the same. When I import that dataset to Numbers Station, I would mark each city in those tables as part of the same group since they are semantically equivalent. In other words Branch City and City would be in a group.

  • Fact Table


    • Branch FK
    • Location FK
    • Sales
  • Branch Dimension


    • Branch Name
    • Branch City
  • Location Dimension


    • Street Address
    • City
    • Country

Troubleshooting

  • Numbers Station is picking a suboptimal dimension from the group. This can happen if you have a verified question from prior to the release of the grouped dimension feature that specifies a suboptimal query. If Numbers Station copies the verified question then it can get suboptimal results. Editing the semantic query in the verified question should fix this.
  • Numbers Station won't let me pick the specific dimension from the group when I add dimensions or filters. We automatically suggest the best dimension to add to minimize the scope of the query and so do not allow you to specify a particular dimension in a group. If it matters beyond efficiency which dimension Numbers Station uses then it likely means those dimensions should not be in a group together.

By following these guidelines, you can effectively manage dimension grouping in your datasets, leading to optimized queries and consistent data handling.