Skip to main content

Build Cohort Retention Chart

Introduction

Cohort retention report is a type of report that allows you to track how different groups of users engage with your platform over time. This tutorial will show you how to build a cohort retention report in Holistics.

Let's start with some definition:

  • A cohort is a group of users who share common characteristics around a time period. For example: Cohort of students who enrolled in 2021, cohort of users who signed up on August 2021.
  • Retention: A measure of how well your platform retains users. For example if 100 people sign ups for your restaurant on first month, but only 20 of them come back the next month, your "retention rate" is 20%.
  • By looking at the retention over time of different cohorts of users (hence "cohort retention"), we can see if we are improving our products/services in the right direction.

By this post, we'll show you how to build the below Cohort Retention report:

By monthly cohorts, how many of our first-time buyers are coming back to make purchases the subsequent months?

The above chart tells you a few things:

  • Cohort Month: We cohort users by the month when they made their first purchase
  • Cohort Size: How many users in that cohort; i.e how many users made their first purchase on that month.
  • Month 00, Month 01, Month 02 etc: Months since the user has made the first purchase
  • For example: 436 users made their first purchase in Sep 2016. 87% of them came back (made at least 1 purchase) in Month 1; 75% of them came back in month 3, and so on.

Input Data

For this report, we only need a simple table orders that contains these fields:

  • id: order id
  • user_id: ID of the user
  • created_at: when the order is made

High-level Approach

There are 2 main steps involved in building a cohort retention report using Holistics:

  1. From the raw input data, transform them into the right data format
  2. Use Holistics' "Cohort Retention" chart type to visualize

The transformed data should look like the table below:

  • The cohort month and corresponding total users in that cohort (cohort size)
  • The months since the user has made the first purchase month number and how many users are still active on all subsequent months after their first purchase number users.

Step-by-step Instructions

High-level Transformation Diagram

As it's not a simple transformation, we break them down into multiple steps with interim charts. The diagram below puts the steps together:

Step 1: Defining Cohort

We want to group our cohorts based on the month in which they made their first purchase and store them into cohort_dfn (user_id, cohort_month) model.

That means: For each user with user_id, which monthly cohort cohort_month does s/he belong to.

select
{{#o.user_id}},
date_trunc('month', min({{#o.created_at}}))::date as cohort_month
from {{ #orders as o}}
group by 1

Step 2: Calculating Cohort Size for each cohort

We build cohort_size (cohort_month, total_users) model which is simply how many users are in each cohort:

select 
{{#c.cohort_month}},
count(distinct({{ #c.user_id }})) as total_users
from {{#cohort_dfn as c}}
group by 1

Step 3: Calculate if user X makes purchases in month Y

We build retention_by_user_by_month (user_id, month_number) model that indicates if user X has made a purchase in month Y.

Note that month_number is a integer value, denoting number of months since user's cohort month.

For example:

  • User X belongs to cohort Sep 2019
  • X makes a repeat purchase on Nov 2019
  • Thus, there will be a record with (X, 2) (2 = months between November and September)

A sample table would look like:

| user | month_number |
| Alex | 0 |
| Alex | 1 |
| Bob | 0 |
| Bob | 2 |

The query:

select
{{#o.user_id}},
((date_part('year', {{#o.created_at}}::date) - date_part('year', {{#c.cohort_month}}::date)) * 12 +
(date_part('month', {{#o.created_at}}::date) - date_part('month', {{#c.cohort_month}}::date))) as month_number
from {{#orders as o}}
left join {{#cohort_dfn as c}} on {{#o.user_id}} = {{#c.user_id}}

Step 4: Putting them together

From the data in step 3, we aggregate them and build cohort_retention (cohort_month, month_number, num_users) model. This indicates how many users in cohort X make purchases in month number Y.

Query below. We use count distinct to calculate number of users in each group (cohort_month, month_number) and assign it to dimension num_users.

select {{#c.cohort_month}}
, concat('Month ', to_char({{#r.month_number}}, 'fm00')) as month_number
, count(distinct({{#r.user_id}})) as num_users
from {{#retention_by_user_by_month as r}}
left join {{#cohort_dfn as c}} on {{#r.user_id}} = {{#c.user_id}}
group by 1,2

Step 5: Create Dataset

Finally, we create a dataset cohort which contains 2 models cohort_size and cohort_retention with 1-n relationship. This is necessary to create the visualization. Cohort models relationship

Create the relationship between 2 models cohort_size and cohort_retention; then, add them to the new dataset called cohort.

Business users can now drag and drop to explore the data.

Final Step: Building the chart

Now that we’ve built the dataset, go to Reporting and choose the Visualization Retention Heatmap, drag the fields for each element of chart.

It’s great to know our absolute user count by month, but what would actually be even better is to know what percentage of each cohort is being retained over time. We can achieve this via a simple click using Styles tab and Toggle on Support Percentage Display.

Conclusion

Cohort retention analysis is a simple, yet effective way to understand the performance of your marketing retention and acquisition efforts.


Let us know what you think about this document :)