Cannot combine fields due to fan-out issues?
In some cases, when exploring you encounter this error Cannot combine selected fields due to potential fan-out issues.
Why does fan-out issue happen in real-life?
Fan-out happens when you LEFT JOIN two data models with one-to-many relationship, that one row of your model on the left can match up with multiple rows in your joined table on the right. So, aggregate functions like COUNT or SUM may include duplicates, throwing off the results.
For example, you are working on an ecommerce dataset that contains two data models users
and orders
with one-to-many relationship as below diagram.
Let's say when exploring this set of Data, if you use any field from orders
with a measure from users
, the fan-out issue occurs. For example, you want to know total users have activities (orders.status
is not null) on your ecommerce platform.
In the aforementioned case, you use measure field users.total_users
(COUNT(users."id")
) of users
data model and non-measure field orders.status
of orders
data model. Since the relationship between users
and orders
is one-to-many, we say that a fanout has occurred. The measure users.total_users
(COUNT(users."id")
) will be duplicated and normally count of the user will be wrong.
The underlying query will be generated as below:
SELECT
T1."user_id" AS "user_id",
COUNT(T0."id") AS "total_user"
FROM
"ecommerce"."users" T0
LEFT JOIN "ecommerce"."order" T1 ON T0."id" = T1."user_id"
WHERE
T0."id" < 2200 and T0."status" is not null
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500
What have we done to solve the issue
In order to solve this issue, Holistics has modified the underlying query to get the distinct value when the fan-out issue happens.
Specifically, when the measure/aggregation is on the dimension model instead of the fact model, we will calculate the measure first inside the dimension model before executing the join.
For example, let take the example above, when we count the user.id
, this query will run first inside user
model to calculate total user
for each user_id
. Let's call this Set A
SELECT
T0."id" AS "user_id",
COUNT(T0."id") AS "total_user"
FROM
"ecommerce"."users" T0
WHERE
T0."id" < 2200
GROUP BY 1
ORDER BY 2 DESC
After that, we will execute the join from orders
to users
as normal and select distinct user_id
from order
model. Let's call this Set B
SELECT
distinct T0."id" AS "user_id"
FROM
"ecommerce"."users" T0
LEFT JOIN "ecommerce"."order" T1 ON T0."id" = T1."user_id"
WHERE
T0."id" < 2200 AND T1."status" is not null
Finally, we will execute Inner Join Set A and Set B, select user_id
from Set B and total_user
from Set A. By doing this, the user_id
will be unique and fan-out issue no longer exists.
The final query will be:
WITH total_actived_users AS (
SELECT
T0."id" AS "user_id",
COUNT(T0."id") AS "total_users "
FROM
"ecommerce"."users" T0
WHERE
T0."id" < 2200
GROUP BY 1
ORDER BY 2 DESC
)
, distinct_user_id AS (
SELECT
DISTINCT T0."id" AS "user_id"
FROM
"ecommerce"."users" T0
LEFT JOIN
"ecommerce"."orders" T1 ON T0."id" = T1."user_id"
WHERE
T0."id" < 2200 AND T1."status" IS NOT NULL
)
SELECT
T1."user_id",
T0."total_users"
FROM
total_actived_users T0
INNER JOIN
distinct_user_id T1 ON T0."user_id" = T1."user_id"
Why do you still encounter this fan-out issue? How to solve?
In Holistics, the fan-out issue happens when your custom measure is defined by your own SQL logic, and (for now) we're unable to analyze the SQL semantic (of your measure) to handle the fan-out issue.
If you use either Aggregation Function or Business Calculation in our Dataset explore, the issue will be solved.
Let’s deep dive into our solutions to protect yourself from the fan-out issues.
Use Aggregation or Business calculation on the Dataset Exploration UI
- If your measure is only a basic calculation (COUNT, SUM, AVG,...), you can use our Aggregation in our dataset exploration instead of creating a custom measure in the modeling layer.
Just drag whatever field was originally nested in your custom measure definition, and place that field in the y-axis field. Holistics will naturally assign an aggregation to that field, most likely a count
. You can then proceed to click on the drop-down icon on the right side of the field box to select a different aggregation.
- For composite measure, we advise that you use Business calculation. For example, when you need to divide one measure by another measure or need to use
case...when
safe_divide(
sum(
case(
when: public_appointments.status == 'Quote Requested', then: 0,
when: public_appointments.status == 'Incomplete', then: 0,
else: 1
)
),
sum(
case(
when: public_appointments.status == 'Incomplete', then: 0,
else: 1
)
)
)