Relationships in Dataset
Combining data from different models
In an exploration, users can drag fields from different data models in a single explore. Based on the relationships metadata, Holistics figures out the correct JOINs statements to apply to the SQL query.
Example: Given a dataset created from the following models: orders
, users
, merchants
, cities
, countries
You want to calculate: "Total orders broken down by countries". You will need access to countries.name
and orders.total_orders
fields.
Simply drag in Country Name and Total Orders field (from 2 different models):
The generated query will be:
SELECT
T3."name" AS "name",
count(1) AS "total_orders"
FROM
"ecommerce"."orders" T0
LEFT JOIN "ecommerce"."users" T1 ON T0."user_id" = T1."id"
LEFT JOIN "ecommerce"."cities" T2 ON T1."city_id" = T2."id"
LEFT JOIN "ecommerce"."countries" T3 ON T2."country_code" = T3."code"
GROUP BY 1
ORDER BY 2 DESC
Ambiguity in join paths
Sometimes, based on the relationships metadata, there are more than one possible paths to combine fields between two models.
Example: you have some models with relationships as below
To answer the question "Total order values placed by countries", you drag in Country Name from countries
and Orders Count from orders
. As you can see, there are two ways to go from countries
to orders
:
- A: countries → cities → users → orders → order items
- B: countries → cities → merchants → products → order items
Each JOIN path will produce a different result with different meanings (total values by users' countries vs. by merchant's origin countries.)
To prevent ambiguity, when creating datasets, Holistics automatically detects and disables ambiguous relationships in the JOIN paths. However, you could also browse the list of relationships and adjust your active relationship according to your needs.
Duplicating into 2 different models: In these situations, it is also recommended that you duplicate Cities and Countries models into dedicated models for Users and Merchants. This makes your dataset more clear to the end-user:
Which JOIN types (left, right, full) does Holistics use?
The JOINs are constructed following some simple rules:
- If the relationship between A and B is n - 1, then valid JOIN path is from n to 1 (
A LEFT JOIN B
orB RIGHT JOIN A
) - If the relationship is 1 - 1, the path is valid both ways (
A LEFT JOIN B
orB LEFT JOIN A
are OK) so we apply FULL JOIN in this case.
Why do I sometimes see 'SELECT DISTINCT' in the SQL?
In some cases, you will see we apply SELECT DISTINCT in the underlying query for two reasons.
First, this is part of our mechanism to avoid fan-out issue. For more information, please refer to our document about fan-out issue
Second, we see no value to display all records (even duplicated records) so we have applied SELECT DISTINCT to prevent showing redundant data. Our aim is to provide dataset's explorers an overview of their data.