Build Cohort Retention Chart
Introduction
Cohort retention report is a type of report that allows you to track how different groups of users engage with your platform over time. This tutorial will show you how to build a cohort retention report in Holistics.
Let's start with some definition:
- A cohort is a group of users who share common characteristics around a time period. For example: Cohort of students who enrolled in 2021, cohort of users who signed up on August 2021.
- Retention: A measure of how well your platform retains users. For example if 100 people sign ups for your restaurant on first month, but only 20 of them come back the next month, your "retention rate" is 20%.
- By looking at the retention over time of different cohorts of users (hence "cohort retention"), we can see if we are improving our products/services in the right direction.
By this post, we'll show you how to build the below Cohort Retention report:
By monthly cohorts, how many of our first-time buyers are coming back to make purchases the subsequent months?
The above chart tells you a few things:
- Cohort Month: We cohort users by the month when they made their first purchase
- Cohort Size: How many users in that cohort; i.e how many users made their first purchase on that month.
- Month 00, Month 01, Month 02 etc: Months since the user has made the first purchase
- For example: 436 users made their first purchase in Sep 2016. 87% of them came back (made at least 1 purchase) in Month 1; 75% of them came back in month 3, and so on.
Input Data
For this report, we only need a simple table orders
that contains these fields:
id
: order iduser_id
: ID of the usercreated_at
: when the order is made
High-level Approach
There are 2 main steps involved in building a cohort retention report using Holistics:
- From the raw input data, transform them into the right data format
- Use Holistics' "Cohort Retention" chart type to visualize
The transformed data should look like the table below:
- The
cohort month
and corresponding total users in that cohort (cohort size
) - The months since the user has made the first purchase
month number
and how many users are still active on all subsequent months after their first purchasenumber users
.
Step-by-step Instructions
High-level Transformation Diagram
As it's not a simple transformation, we break them down into multiple steps with interim charts. The diagram below puts the steps together:
Step 1: Defining Cohort
We want to group our cohorts based on the month in which they made their first purchase and store them into cohort_dfn (user_id, cohort_month)
model.
That means: For each user with user_id
, which monthly cohort cohort_month
does s/he belong to.
select
{{#o.user_id}},
date_trunc('month', min({{#o.created_at}}))::date as cohort_month
from {{ #orders as o}}
group by 1
Step 2: Calculating Cohort Size for each cohort
We build cohort_size (cohort_month, total_users)
model which is simply how many users are in each cohort:
select
{{#c.cohort_month}},
count(distinct({{ #c.user_id }})) as total_users
from {{#cohort_dfn as c}}
group by 1
Step 3: Calculate if user X makes purchases in month Y
We build retention_by_user_by_month (user_id, month_number)
model that indicates if user X has made a purchase in month Y.
Note that month_number
is a integer value, denoting number of months since user's cohort month.
For example:
- User X belongs to cohort Sep 2019
- X makes a repeat purchase on Nov 2019
- Thus, there will be a record with
(X, 2)
(2 = months between November and September)
A sample table would look like:
| user | month_number |
| Alex | 0 |
| Alex | 1 |
| Bob | 0 |
| Bob | 2 |
The query:
select
{{#o.user_id}},
((date_part('year', {{#o.created_at}}::date) - date_part('year', {{#c.cohort_month}}::date)) * 12 +
(date_part('month', {{#o.created_at}}::date) - date_part('month', {{#c.cohort_month}}::date))) as month_number
from {{#orders as o}}
left join {{#cohort_dfn as c}} on {{#o.user_id}} = {{#c.user_id}}
Step 4: Putting them together
From the data in step 3, we aggregate them and build cohort_retention (cohort_month, month_number, num_users)
model. This indicates how many users in cohort X make purchases in month number Y.
Query below. We use count distinct
to calculate number of users in each group (cohort_month, month_number)
and assign it to dimension num_users
.
select {{#c.cohort_month}}
, concat('Month ', to_char({{#r.month_number}}, 'fm00')) as month_number
, count(distinct({{#r.user_id}})) as num_users
from {{#retention_by_user_by_month as r}}
left join {{#cohort_dfn as c}} on {{#r.user_id}} = {{#c.user_id}}
group by 1,2
Step 5: Create Dataset
Finally, we create a dataset cohort
which contains 2 models cohort_size
and cohort_retention
with 1-n relationship. This is necessary to create the visualization.
Create the relationship between 2 models cohort_size
and cohort_retention
; then, add them to the new dataset called cohort
.
Business users can now drag and drop to explore the data.
Final Step: Building the chart
Now that we’ve built the dataset, go to Reporting and choose the Visualization Retention Heatmap, drag the fields for each element of chart.
It’s great to know our absolute user count by month, but what would actually be even better is to know what percentage of each cohort is being retained over time. We can achieve this via a simple click using Styles tab and Toggle on Support Percentage Display.
Conclusion
Cohort retention analysis is a simple, yet effective way to understand the performance of your marketing retention and acquisition efforts.