Holistics Expression
Definition
Holistics Expression is our propriety language that allows you to define data, perform transformations, and make calculations, etc. in a database-agnostic way. Essentially, it eliminates the nuances between different SQL dialects, allowing you to concentrate on your data logic.
Why Holistics Expression is necessary
Firstly, in comparison to SQL, Holistics Expression serves more complex business use cases. For example, you can:
- Create a period-over-period growth percentage
- Analyze cumulative sum from the selected range
- Create custom fields combining data from multiple models instead of just the current model
- Apply Column Level Permission easily
- And so much more
Next, if you are working with multiple databases with different SQL dialects, Holistics Expression will help you to create dimension/measure flexibly with unified syntax that functions properly with all of your databases (and of course, with our system as well).
Where to use Holistics Expression
Holistics Expression aims to support both Data Analysts and Explorers to efficiently create calculations thus fulfilling their analytics needs. However, while Analysts work primarily in the modeling layer and prepare reusable measures/calculations, Explorers, on the other hand, view prepared reports and only explore or create calculations on-demand, so that Holistics has introduced two places where Holistics Expression can be created and used.
Business Calculation
Business Calculations can be created when exploring Dataset or Reports and Explorers are its target users. You can imagine it's like calculating 2 or multiple fields in an Excel Sheet. Its formula can be adjusted easily right in Dataset Exploration View.
Model Field Expression
In contrast, Model Field Expression can only be created in a specific model and used when exploring the related model in Dataset. The formula of Model Field Expression cannot be modified in Dataset Exploration View.
How Holistics Expression works
Holistics Expression provides a unified syntax against different SQL dialects for defining any measure (metric) on top of a set of models inside a dataset. For a simple measure which only involves a single model, a measure expression work just like an aggregate function in SQL. Behind the scene, when you use a measure in a dataset, it's translated to a simple SQL expression and put into the final query.
With more complicated measures that involve more models, since we know what you are computing, which models and relationships are involved, we can dynamically prepare these dependencies in different parts of the final query with respect to correctness and performance.
Supported Functions
The functions can be divided into a few basic categories:
- Aggregator Functions: take values from multiple rows to perform a calculation
- Logical Functions: return value based on some logical conditions
- Dealing with Nulls and Zeros functions
- Time Intelligence Functions: Date- and time-related functions
- (Not Available) Filter Funtions: filter expressions that are applied to a measure calculation
Aggregator Functions
Functions | Syntax | Purpose |
---|---|---|
count | count(field) | Counts the total number of items in a group, not including NULL values |
count_distinct | count_distinct(field) | Counts the total number of distinct items in a group, not including NULL values. |
average | average(field) | Averages the values of items in a group, not including NULL values. |
min | min(field) | Computes the item in the group with the smallest numeric value. |
max | max(field) | Computes the item in the group with the largest numeric value. |
sum | sum(field) | Sums the total number of items in a group, not including NULL values. |
median | median(field) | Computes the median of an expression, which is the value that the values in the expression are below 50% of the time. |
stdev | stdev(field) | Returns the standard deviation (sample) of the column created by expression unless expression defines a column of lists, in which case returns the standard deviation (sample) of each list. |
stdevp | stdevp(field) | Returns the standard deviation (population) of the column created by expression unless expression defines a column of lists, in which case returns the standard deviation (population) of each list. |
var | var(field) | Returns the variance (sample) of the column created by expression unless expression defines a column of lists, in which case returns the variance (sample) of each list. |
varp | varp(field) | Returns the variance (population) of the column created by expression unless expression defines a column of lists, in which case returns the variance (population) of each list. |
running total | Shows how a metric has changed over time. This function can only be used in Dataset Exploration UI. |
Logical Funtions
Funtions | Syntax | Purpose |
---|---|---|
case when | case(when: condition_expression, then: value_expression, else: value_expression) | goes through conditions and returns a value when the first condition is met (like an IF-THEN-ELSE statement). |
and | and(condition_expression, ...) | compares between two Booleans as expression and returns true when both expressions are true. |
or | or(condition_expression, ...) | compares two Booleans as expression and returns true when one of the expressions is true. |
not | not(field_expression) | takes a single Boolean as an argument and invert it. |
is | is(field_expression) | evaluates the given statement and return either True or False. |
in | in(field_expression, value_expression, value...) | takes a field expression and a list of values. Return true if that list of values contains the value of that field expression. |
Dealing with Nulls and Zeros functions
Funtions | Syntax | Purpose |
---|---|---|
coalesce | coalesce(val1, val2, ...., val_n) | returns the first non-null value in a list |
nullif | nullif(expr1, expr2) | returns NULL if two expressions are equal, otherwise it returns the first expression. |
safe_divide | safe_divide(val1, val2) | Equivalent to the division operator (X / Y), but returns NULL if an error occurs, such as a division by zero error. |
Time Intelligence Functions
Currently, Holistics only supports epoch
funtion. Other functions are not available.
Functions | Syntax | Purpose |
---|---|---|
epoch | epoch(date); epoch(datetime) | Returns a Unix timestamp which is the number of seconds that have elapsed since ‘1970-01-01 00:00:00’ UTC |
date_trunc | date_trunc(datetime, time_col: 'year', 'quarter', 'month', 'week', 'day', 'hour', 'minute') | Truncate a TIMESTAMP on secific date part |
now | now() | Returns current timestamp |
day | day() | Extract the day from a given timestamp |
month | month() | Extract the month from a given timestamp |
year | year() | Extract the year from a given timestamp |
quarter | quarter() | Extract the quarter from a given timestamp |
week | week() | Extract the week from a given timestamp |
hour | hour() | Extract the hour from a given timestamp |
minute | minute() | Extract the minute from a given timestamp |
week_day | week_day() | Return the day number of a specific date (within a week) |
week_num | week_num() | Return the week number of a specific date (within a year) |
Please refer to Reference section on the left for more information.
FAQs
Holistics Expression is case sensitive
Question: Can I use CASE(WHEN:...,THEN:...,ELSE:...)
, AND()
, OR()
,...
Answer: No, Since Holistics Expression is case sensitive and we don't support capitalized letters in our Expression so the exact needs to be followed
case(when:...,then:...,else:...)
and()
or()
- ...
How to create calculation with only a subset of my current data
Question: How can I create a calculation with only a subset of my current data (using condition inside an aggregate function).
For example, from the eCommerce dataset, what if I want to calculate the total value from the delivered orders only (exclude all other cancelled and refunded orders)
Answer: Since Filter function
inside an Aggregate function
is currently not supported, we recommend that at this moment, you can combine measure
function with case when
inside to calculate the data with any specific condition being applied.
sum(
case(
when: order_derived.order_status == 'delivered'
, then: order_derived.item_value
, else: null
)
)
How to handle error Divide by 0
Question: When doing division in Business Calculation (field_a/field_b), sometimes I encounter division by zero
error which is obviously because my Divisor = 0. How should I handle this case?
Answer: There are 2 ways to handle this case:
Option 1: Use safe_divide
syntax.
Option 2: You can add conditional expression in your divisor to return NULL whenever it has the value of 0
sum(model.field_a)
/
case(
when: sum(model.field_b) == 0
, then: null
, else: sum(model.field_b)
)
Can I add comments?
You can add comments using //
syntax:
sum(
// only take delivered orders
case(
when: order_derived.order_status == 'delivered'
, then: order_derived.item_value
, else: null
)
)