This article is part of Faraday's Out of the Lab series, which highlights initiatives our Data Science team undertakes and challenges they solve.
Building cohorts with event data
When it comes to predicting customer behavior, including event data is crucial. Events are simply actions taken by a customer or lead, like making a purchase or cancelling a subscription, that are recorded by marketing and sales platforms.
At Faraday, we love events. Why? They are factual, immutable, and have timestamps.
Whenever possible, we interpret raw client data as streams of events. This brings structure and consistency to the messy world that is data collection across many different organizations and verticals.
Working with event data allows us to analyze so much about the relationship between each client and their customers. An important feature of events is that they occur at a specific time, which allows us to translate event data into a collection of dates. We can then ask consistent questions about these events for deeper insight and understanding of customer behavior. How often did this person experience the event? When was the first time? When was the most recent time?
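As a rough illustration of those three questions, here is a minimal Python sketch. The event tuples, the person IDs, and the `summarize` helper are all hypothetical and invented for this example; they are not Faraday's actual schema or tooling.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event stream: (person_id, event_type, event_date).
events = [
    ("ana", "purchase", date(2021, 1, 5)),
    ("ana", "purchase", date(2021, 3, 9)),
    ("ben", "purchase", date(2021, 2, 14)),
    ("ana", "cancellation", date(2021, 6, 1)),
]

def summarize(events, event_type):
    """Return {person_id: {"count", "first", "last"}} for one event type."""
    dates_by_person = defaultdict(list)
    for person_id, kind, when in events:
        if kind == event_type:
            dates_by_person[person_id].append(when)
    return {
        person_id: {"count": len(dates), "first": min(dates), "last": max(dates)}
        for person_id, dates in dates_by_person.items()
    }

# How often, when first, and when most recently did each person purchase?
purchase_summary = summarize(events, "purchase")
```

Because every event carries a date, the same three answers can be computed for any event type without changing the logic.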
Features of customer cohorts
Events are a precursor to the most important building block we use here at Faraday to build predictive models: cohorts. Put simply, cohorts are groups of people that have experienced the same event. Because events have timestamps, you can imagine a cohort accumulating members along a timeline. Some cohort examples include:
- Home purchasers cohort — defined by a “closing” event
- Grocery buyers cohort — defined by their first “purchase” event
- Churned subscribers cohort — defined by a “cancellation” date
An important feature of cohorts is that individuals cannot be removed from a cohort once they have entered it with a qualifying event (e.g. a purchase or a subscription cancellation). This can seem finicky, but an example shows why it matters: because membership is permanent, we have to be careful about when someone enters. We want to avoid counting someone as a customer while they are still able to return a product, so ideally a customer is added to a customer cohort only after the return period has lapsed. Since we use cohorts to define the groups of people we model on, someone who purchases a product and then returns it is not a customer we want to use to find new customers.
Now, we don’t want to throw away the customers who returned products, because they can be a useful seed for a retention model. Luckily, we can put them in their own cohort, defined by the date they returned their product.
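One way to picture these rules is the sketch below. It is an illustration under stated assumptions, not Faraday's implementation: the 30-day return window, the event tuples, and the `build_cohorts` function are all made up for the example, and for simplicity only each person's first purchase is considered.

```python
from datetime import date, timedelta

RETURN_WINDOW = timedelta(days=30)  # assumed return policy, not from the article

# Hypothetical events: (person_id, event_type, event_date).
events = [
    ("ana", "purchase", date(2021, 1, 5)),
    ("ben", "purchase", date(2021, 1, 10)),
    ("ben", "return", date(2021, 1, 20)),
    ("cy", "purchase", date(2021, 3, 25)),
]

def build_cohorts(events, as_of):
    """Return (customers, returners), each mapping person_id -> entry date."""
    first_purchase, first_return = {}, {}
    for person_id, kind, when in sorted(events, key=lambda e: e[2]):
        if when > as_of:
            continue  # ignore anything that has not happened yet
        if kind == "purchase":
            first_purchase.setdefault(person_id, when)
        elif kind == "return":
            first_return.setdefault(person_id, when)

    customers = {}
    for person_id, bought in first_purchase.items():
        entry = bought + RETURN_WINDOW  # the date the return period lapses
        returned = first_return.get(person_id)
        returned_in_window = returned is not None and bought <= returned <= entry
        if entry <= as_of and not returned_in_window:
            customers[person_id] = entry
    # Returners form their own cohort, keyed by the date of the return.
    return customers, first_return

customers, returners = build_cohorts(events, as_of=date(2021, 4, 1))
```

In this sketch a person enters the customers cohort only once the window has fully elapsed, so running it again with a later `as_of` date can add members but never removes one.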
Reliability of customer cohorts
We like cohorts because they are only able to grow, retaining each individual customer that enters. The fact that someone can’t be removed from a cohort means that, when modeling, we can expect results from our historical models to be consistent. Additionally, when we need to slice the cohort based on different date ranges, we can be sure that the same date range will always provide the same people. This prevents us from having to deal with a sticky situation where data used to create a model is changing as time passes.
We want our models and data to remain static once we have used them for a client. This allows us to readily test and validate the effectiveness of models without the headache of verifying that the data hasn’t changed since we created them. If the data had somehow changed, we would have the damn near impossible task of reconstructing the data as it existed when we built the model in order to get reliable performance metrics.
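The append-only property can be sketched in a few lines of Python. The cohort contents and the `slice_cohort` and `fingerprint` helpers are hypothetical, and hashing the member list is just one possible way to check that a historical slice has not changed.

```python
import hashlib
from datetime import date

# Hypothetical cohort: person_id -> date they entered the cohort.
cohort = {
    "ana": date(2021, 1, 5),
    "ben": date(2021, 2, 14),
    "cy": date(2021, 4, 20),
}

def slice_cohort(cohort, start, end):
    """Sorted IDs of members who entered between start and end, inclusive."""
    return sorted(pid for pid, entered in cohort.items() if start <= entered <= end)

def fingerprint(member_ids):
    """Stable hash of a member list, recorded when a model is built."""
    return hashlib.sha256("\n".join(member_ids).encode("utf-8")).hexdigest()

# At model-building time: slice the first quarter and record its fingerprint.
training_ids = slice_cohort(cohort, date(2021, 1, 1), date(2021, 3, 31))
recorded = fingerprint(training_ids)

# Later, the cohort grows...
cohort["dee"] = date(2021, 5, 2)

# ...but the same date range still returns exactly the same people.
assert fingerprint(slice_cohort(cohort, date(2021, 1, 1), date(2021, 3, 31))) == recorded
```

If members could leave a cohort, or if entry dates could be rewritten, the final check could fail and the original training data would have to be reconstructed some other way.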
How brands use cohort analysis
Defining and understanding key cohorts unlocks all of Faraday’s analyses. Here are some of the ways we often leverage them for clients.
We can use a “Customers” cohort as the basis of our persona modeling, building out holistic pictures of the individuals that fall into that group so brands can personalize ads and experiences to fit each persona. In Fig. 1 you can see a “Customer” cohort broken out by persona.
When leveraging propensity modeling, we are looking at the likelihood of one event happening after another: for example, an individual becoming a lead and then making a purchase to become a customer. In that case we would analyze the “Leads” cohort and predict the propensity of the second event, a lead converting into a customer.
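To make the "one event after another" framing concrete, the sketch below derives conversion labels from two cohorts. It stops short of an actual model: the cohort contents, the 90-day horizon, and the `conversion_labels` helper are assumptions made for illustration, and the resulting labels are only the kind of target a propensity model might be trained on.

```python
from datetime import date, timedelta

# Hypothetical cohorts: person_id -> date they entered the cohort.
leads = {
    "ana": date(2021, 1, 1),
    "ben": date(2021, 1, 15),
    "cy": date(2021, 2, 1),
    "dee": date(2021, 2, 10),
}
customers = {
    "ana": date(2021, 1, 20),
    "cy": date(2021, 6, 30),
}

def conversion_labels(leads, customers, horizon):
    """Label each lead True if they became a customer within the horizon."""
    labels = {}
    for person_id, became_lead in leads.items():
        became_customer = customers.get(person_id)
        labels[person_id] = (
            became_customer is not None
            and became_lead <= became_customer <= became_lead + horizon
        )
    return labels

labels = conversion_labels(leads, customers, horizon=timedelta(days=90))

# Share of leads that converted within the horizon: a baseline to beat.
base_rate = sum(labels.values()) / len(labels)
```

Here only one of the four leads converts within the assumed 90 days, so a useful propensity model would need to rank leads better than that 25% baseline.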
We compare cohorts for our Customer Insight Reports to give brands an idea of how their various types of customers differ from one another, and even how they compare to the U.S. population as a whole. These reports often surface surprisingly important details that brands may not have considered before.
To map out customer journeys, cohorts are key, as they identify the customers who have experienced the particular event(s) that are the “pit stops” along a specific customer journey. Fig. 2 above illustrates a customer journey built from cohorts: to go from “Everyone” (the U.S. population) to a “Best customer,” someone must first join the “Leads” cohort and then the “Customers” cohort.
Cohort analysis is a powerful tool for predicting customer behavior, and it accounts for many of the insights we provide to brands on a daily basis. Brands use these insights to make key decisions on everything from how to target high-value leads to how to proactively prevent churn. And it all starts with the raw event data any direct-to-consumer business is already collecting.
Interested in learning more about how your brand can use cohorts to predict customer behavior? Schedule a demo today.