Calculate Running Total (Cumulative Total)
Holistics has supported a native feature to calculate Running Total. Check out our documentation here.
Introduction
A running total is a common metric to gain insight into how an amount has accumulated over time. In this article, we'll show you how to set up cumulative number (running total) in Holistics.
Use Case
We'll use a simple table running_total_demo_data
that contains 3 fields: date
, sale
, and product
. There are 1379 rows, dates range from Jan 2016 to Jun 2020 with no sales made on certain days, prices are random positive values, and the product is assigned randomly between A/B/C.
Given the above demo data, we want to build 2 reports:
- Running total of all sales (unsegmented).
- Running total of sales broken down by individual product (segmented).
High-level Solution
Currently, Holistics doesn't support native running total calculation, so we need to do some transformations using SQL queries to calculate the running totals.
We will create two query models that calculate running totals as follow:
We create
rt_all (ordering, date, sales, running_all)
where running_all is the cumulative sum of all products' sales.We create
rt_products (ordering, date, product, sales, running_by_product)
where running_by_product is the cumulative sum of each product's sales.
To perform the cumulative calculation, in this guide we'll be using the self-join technique instead of window functions technique. Read on for more details.
Transforming Data
First, we'll prep the data by adding a numbered
ordering
field to each row and havert_prepped
model.To calculate running total, we self-join the model back on itself, using the
ordering
andproduct
fields as the join conditions to calculate the cumulative fieldrunning_total
and save it to new models as below.- We build
rt_all
model: running total by date - We build
rt_products
model: running total by product and date
- We build
Adding ordering
column
Our first step will be to create an ordering
column. Besides being simpler for a lot of JOIN
-related transformations, it's also easier for humans to read (especially when timestamps between transactions are too similar).
We create a Transform Model (which we'll name rt_prepped
) with this simple code:
select
row_number () over () as ordering,
date,
sales,
product
from
{{ #running_total_demo_data}}
ROW_NUMBER()
written in this way generates a sequential column that numbers each row. You can add an ORDER BY
argument within the OVER()
clause if you have another column that you would like to order it by.
Note: If you are recording both positive and negative value transactions, you might want to use a
DATE_PART()
function to extract the dates based on the intervals of your choice (e.g. 'month' and/or 'year'),SUM()
the values, and then group them by your chosen interval so that there is only one entry per interval. You will also have to pre-filter the segments you wish to exclude or segment your report by.
Building Running total by Date (Unsegmented)
Now, let's add a new cumulative column to tell people how total prices of all products do we have up to a particular day and save it to rt_all
model
Query:
We are simply joining each row on its precedent and summing price totals to create our running_all
field. Creating the ordering
field makes my first JOIN condition easier. Without it, you might need to create a joining key.
SELECT
t1.ordering,
t1.date,
t1.sales,
sum(t2.sales) as running_all
FROM
{{ #rt_prepped t1}}
INNER JOIN {{ #rt_prepped t2}} ON t1.ordering>= t2.ordering
GROUP BY
t1.ordering,
t1.date,
t1.sales
ORDER BY
t1.id
Building Running total by Product & Date (Segmented)
Sometimes, we need to calculate the running total based on date by product.
In SQL, this might not be ideal as it would create separate running totals for each product
type and you will have to do extra work to create an overall running total in your visualizations, but with Holistics it can actually cut down on your work!
Let's add second JOIN conditions to segment our data by the product
field and then proceed.
SELECT
t1.ordering,
t1.date,
t1.product,
t1.sales,
sum(t2.sales) as running_by_product
FROM
{{ #rt_prepped t1}}
INNER JOIN {{ #rt_prepped t2}} ON t1.ordering>= t2.ordering AND t1.product = t2.product
GROUP BY
t1.ordering,
t1.product,
t1.date,
t1.sales
ORDER BY
t1.ordering
Both segmented and unsegmented models have their time and place when it comes to generating reports.
Caveat: Handle Days With No Data
There's a problem with the above models: days with no sales are not shown in the result table (e.g. no sales between Jan 07, 2016 and Jan 31, 2016). This is considered a bug in our query, and needs to be addressed. We do want that particular day to report 0 sales, instead of missing out on the value completely.
We fix this by creating a date dimensions model which generates a list of all dates between [min(date), max(date)]
(using PostgreSQL's generate_series
function) and then FULL JOIN
it to the base model on the date
/datetime
field. You then further clean the data by either changing your price
and other fields with CASE WHEN
conditions to populate them with 0/'NIL' where appropriate.
Now, let's edit rt_prepped
model with a more complicated SQL query as below:
with date_range as(
select
generate_series( min({{ #a.date}}), max({{ #a.date}}),'1d')::date as dates
from
{{ #running_total_demo_data a}}
)
select
row_number () over () as ordering,
date_range.dates as date,
coalesce(t1.product,'A') as product,
coalesce(t1.sales,0) as sales
FROM
{{ #running_total_demo_data t1}}
full join date_range on t1.date=date_range.dates
Visualizations and Reporting in Holistics
Let's walk through how to present the data using different types of charts.
Area charts
For a simple area chart, drag the Date
field into the "X-Axis" area, and click on it to select your interval. We've chosen "Quarter".
Drag your segment Product
into the "Legend area.
Drag your Running By Product
field to your "Y-Axis" area, and click on it and select "Max".
As stated above, if you are recording both positive and negative transactions, you will have to pre-group and sum your transactions by your chosen interval, or else the visualization might be inaccurate.
Under "Styles", enable "Stack Series".
Of course, you can always use the unsegmented version of your table.
Displaying using Pivot Tables
For segmented data in a pivot table, drag your Date
to "Rows", your segments to "Columns" (in order of hierarchy), and Running All
to values set to Max. Go to "Styles" to enable your Row Totals. Now you have all your running totals by segments and as a whole.
As with the Area Chart, if you want the unsegmented running total without restriction on your Date interval, either prep your data accordingly or use the unsegmented data.
Combination Chart
Just like the area chart, Date
as your "X-Axis", segments in your legend, and "Max" Running Total
as one of your Y-Axes with any other relevant metrics in other Y-Axes.
Dashboard Filters
You can also create both selective and global filters on your Dashboard with a few clicks. This allows your business user to see only the data that is relevant to them without needing to run to an analyst to generate a whole new SQL query for each new report.
For our example, we've created a "Product" filter that affects only the segmented charts, and a global "Date" filter.
Summary
As you saw, in this guide we have shown you how to do a simple cumulative report using the power and flexibility of Holistics data modeling and visualization. We address a few points:
- Calculate running total without window functions. Instead of that,
INNER JOIN
data model back on itself. - Avoid empty-row days with generated date ranges
- The accumulating sums don't only sum for the selected date period. It also takes into account the running total from the beginning of time.