Customer Segmentation in Online Retail: Cohort Analysis

A step-by-step explanation of performing customer segmentation on the Online Retail dataset using Python, focusing on cohort analysis.

Prachi Gopalani
The Startup


In this article, I explain how to carry out customer segmentation and related analyses on online retail data using Python.

What is Customer Segmentation?

Customer segmentation is the practice of dividing a customer base into groups of individuals who are similar in ways relevant to marketing, such as age, gender, interests and spending habits.

Key differentiators that divide customers into groups:

· Demographics (age, race, religion, gender, family size, ethnicity, income, education level)

· Geography (location)

· Psychographic (social class, lifestyle and personality characteristics)

· Behavioural (spending, consumption, usage and desired benefits)

What is Cohort Analysis?

A cohort is a group of users who share a common characteristic, such as the period in which they first purchased. Cohort analysis divides users into mutually exclusive groups and measures each group's behaviour over time.

It can provide insight into the product and customer lifecycles.

Types of cohorts

1. Time cohorts

Time cohorts are customers who signed up for a product or service during a particular time frame. Analysing these cohorts shows how customers behave depending on when they started using the company’s products or services. The time frame may be monthly, quarterly or even daily.

2. Behaviour cohorts

Behaviour cohorts are customers who purchased a product or subscribed to a service in the past. They group customers by the type of product or service they signed up for. Customers who signed up for basic services might have different needs from those who signed up for advanced services. Understanding the needs of the various cohorts can help a company design custom-made services or products for particular segments.

3. Size cohorts

Size cohorts refer to the various sizes of customers who purchase the company’s products or services. This categorization can be based on the amount a customer spends in some period after acquisition, or on the product type on which the customer spent most of their order amount in that period. Next, let’s look at what cohort analysis can tell us.
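As a rough illustration of spending-based size cohorts, the pandas sketch below totals each customer's spending (Quantity × UnitPrice) and splits customers into three equal-sized groups with `pd.qcut`. The rows, the `TotalSum` column and the Low/Mid/High labels are hypothetical, and for simplicity the sketch ignores the "some period after acquisition" window.

```python
import pandas as pd

# Hypothetical transactions; column names follow the Online Retail dataset,
# but the values are made up for illustration.
df = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3, 4, 5],
    "Quantity":   [2, 1, 10, 1, 1, 1, 50, 3],
    "UnitPrice":  [5.0, 3.0, 2.5, 1.0, 1.0, 1.0, 0.5, 4.0],
})

# Spending per line item (hypothetical TotalSum column), then per customer
df["TotalSum"] = df["Quantity"] * df["UnitPrice"]
spend = df.groupby("CustomerID")["TotalSum"].sum()

# Split customers into three equal-sized spending groups (tertiles)
size_cohort = pd.qcut(spend, q=3, labels=["Low", "Mid", "High"])
```

Quantile-based bins keep the groups balanced in size; fixed spending thresholds chosen by the business are an equally valid alternative.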

Cohort Analysis can answer questions like:

1. Are the new cohorts you’re acquiring more (or less) valuable than previous users?

2. Have changes you’ve made to your site impacted users who are new to your site?

3. Are there seasonal differences between users you acquire? Perhaps users acquired during big retail moments like Black Friday behave differently than those acquired at other times.

4. What is your users’ retention rate?

5. What is the long-term value of your users?

6. When do users start to churn?

Getting started

We will use the Online Retail dataset, a popular transactional dataset from the UCI Machine Learning Repository. The link to the data can be found here.

This transactional dataset contains all the transactions that occurred between 01/12/2010 and 09/12/2011 for a UK-based, registered, non-store online retailer. It holds realistic customer transaction information in a format commonly used in industry.

Data snapshot

Fig: Whole Data set

Data Attributes:

1. InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter ‘C’, it indicates a cancellation.

2. StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.

3. Description: Product (item) name. Nominal.

4. Quantity: The quantities of each product (item) per transaction. Numeric.

5. InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.

6. UnitPrice: Unit price. Numeric, Product price per unit in sterling.

7. CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.

8. Country: Country name. Nominal, the name of the country where each customer resides.
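Because cancellations are encoded in InvoiceNo, they can be flagged with a simple prefix check. The sketch below is an assumption about how one might clean the data before cohort analysis, not necessarily the author's preprocessing: it marks invoices starting with 'C' as cancelled (hypothetical `IsCancelled` column) and also drops rows without a CustomerID, since those cannot be assigned to a cohort. The rows are made up.

```python
import pandas as pd

# Illustrative rows; InvoiceNo is treated as a string so the 'C' prefix survives
df = pd.DataFrame({
    "InvoiceNo":  ["536365", "C536379", "536366", "C536383"],
    "CustomerID": [17850.0, 14527.0, 17850.0, None],
})

# Hypothetical flag: cancelled invoices start with the letter 'C'
df["IsCancelled"] = df["InvoiceNo"].astype(str).str.startswith("C")

# Keep non-cancelled transactions that can be tied to a customer
clean = df[~df["IsCancelled"] & df["CustomerID"].notna()]
```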

Cohort Analysis: Retention Rate

Since we will be performing cohort analysis based on customers' transaction records, we will mainly be dealing with the following columns:

· InvoiceDate

· CustomerID

· UnitPrice

· Quantity

The step-by-step approach used to generate the retention rate cohort chart:

Step 1: Month Extraction from InvoiceDate Column

First, we create a function that takes any date and returns the first day of the same month and year.

Fig: Month extraction function
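The original function appears only as an image, so here is a minimal sketch of what such a month-extraction helper might look like; the name `get_month` and the sample dates are hypothetical.

```python
import datetime as dt
import pandas as pd

def get_month(x):
    """Return a datetime for the 1st day of the month and year of x."""
    return dt.datetime(x.year, x.month, 1)

# Applying it element-wise to a datetime Series
dates = pd.Series(pd.to_datetime(["2010-12-01 08:26", "2011-01-17 13:05"]))
months = dates.apply(get_month)
```

Applying a Python function row by row is easy to read but can be slow on the full dataset; a vectorized alternative is `dates.dt.to_period("M").dt.to_timestamp()`, which yields the same first-of-month timestamps.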

Step 2: Assigning Cohorts to Each Transaction

Next, we create a column called InvoiceMonth that holds the month of each transaction, i.e. the first day of the month of its InvoiceDate. Then we group the data by CustomerID and take the earliest InvoiceMonth for each customer; this first-purchase month is stored in the CohortMonth column.

Fig: Creating the InvoiceMonth and CohortMonth column
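A minimal sketch of this step, assuming the data sits in a pandas DataFrame named `df` with the dataset's CustomerID and InvoiceDate columns; the rows and the `get_month` helper (a hypothetical version of the Step 1 function) are illustrative. Using `groupby(...).transform("min")` writes each customer's first month back onto every one of their rows.

```python
import datetime as dt
import pandas as pd

def get_month(x):
    # Hypothetical helper: first day of the month of x (see Step 1)
    return dt.datetime(x.year, x.month, 1)

# Illustrative transactions for two customers
df = pd.DataFrame({
    "CustomerID": [10001, 10001, 10002, 10002, 10002],
    "InvoiceDate": pd.to_datetime([
        "2011-01-18", "2011-03-02", "2010-12-07", "2011-01-26", "2011-04-07",
    ]),
})

# Month of each transaction
df["InvoiceMonth"] = df["InvoiceDate"].apply(get_month)

# Earliest InvoiceMonth per customer, broadcast back to all of their rows
df["CohortMonth"] = df.groupby("CustomerID")["InvoiceMonth"].transform("min")
```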

Step 3: Assigning Cohort Index to each transaction

To find the cohort index, we compute the difference between the InvoiceMonth and CohortMonth columns in number of months.

The code is shown here.
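One way to compute the month difference, sketched under the assumption that both columns hold first-of-month timestamps: split each date into year and month integers and combine them as (year difference × 12) + month difference. The helper name `get_date_int` and the sample rows are hypothetical. Some tutorials add 1 so the index starts at 1; this sketch keeps 0 for the acquisition month, matching how CohortIndex 0 is read in the table discussed in Step 5.

```python
import pandas as pd

# Illustrative rows: one cohort (Dec 2010) seen in three different months
df = pd.DataFrame({
    "InvoiceMonth": pd.to_datetime(["2010-12-01", "2011-01-01", "2011-12-01"]),
    "CohortMonth":  pd.to_datetime(["2010-12-01", "2010-12-01", "2010-12-01"]),
})

def get_date_int(frame, column):
    """Hypothetical helper: return the year and month of a datetime column."""
    return frame[column].dt.year, frame[column].dt.month

invoice_year, invoice_month = get_date_int(df, "InvoiceMonth")
cohort_year, cohort_month = get_date_int(df, "CohortMonth")

# Whole months elapsed since the cohort month (0 = acquisition month)
df["CohortIndex"] = (invoice_year - cohort_year) * 12 + (invoice_month - cohort_month)
```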

Step 4: Calculating the Number of Unique Customers in Each (CohortMonth, CohortIndex) Group

The retention rate is the percentage of a cohort's customers who are still active after a specific time interval. In this section, we calculate the retention count for each CohortMonth paired with each CohortIndex.
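Written as a formula, with notation introduced here only for illustration: let N(c, n) be the number of unique customers from cohort c (their CohortMonth) who made a transaction at CohortIndex n. The retention rate of cohort c at index n is then:

```latex
\text{RetentionRate}(c, n) = \frac{N(c, n)}{N(c, 0)} \times 100\%
```

For example, with the figures discussed below for CohortMonth 2010-12-01, 362 of the 948 customers were active at CohortIndex 1, giving 362 / 948 × 100% ≈ 38.2%.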

Next, we count the number of unique CustomerIDs falling in each (CohortMonth, CohortIndex) group. This gives the number of retained customers from each cohort who bought items n months after their first purchase, where n is the CohortIndex, and we store these counts in a new DataFrame of cohort data.

Fig: Identifying Unique Customer ID
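A minimal pandas sketch of this counting step; the toy rows are made up and `cohort_data` is a hypothetical name for the new DataFrame. `nunique` ensures that a customer with several invoices in the same month is counted only once.

```python
import pandas as pd

# Toy transactions already labelled with CohortMonth and CohortIndex
df = pd.DataFrame({
    "CustomerID":  [1, 1, 2, 2, 3, 3],
    "CohortMonth": pd.to_datetime(["2010-12-01"] * 4 + ["2011-01-01"] * 2),
    "CohortIndex": [0, 1, 0, 0, 0, 2],
})

# Unique customers per (CohortMonth, CohortIndex) group
grouping = df.groupby(["CohortMonth", "CohortIndex"])
cohort_data = grouping["CustomerID"].nunique().reset_index()
```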

Step 5: Retention rate Calculation

Next, we build the cohort analysis matrix by applying the pivot function to these grouped counts, with CohortMonth as rows, CohortIndex as columns and the number of unique CustomerIDs as values. Here are the cohort counts obtained:

Fig: Pivot table of CohortMonth & CohortIndex
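A sketch of the pivot, plus the division that turns counts into retention rates, assuming the grouped counts are in a DataFrame with CohortMonth, CohortIndex and CustomerID columns (toy numbers, hypothetical variable names). The first column (CohortIndex 0) holds each cohort's size, so dividing every row by it gives the share of the cohort still active at each index.

```python
import pandas as pd

# Toy grouped counts in long format
cohort_data = pd.DataFrame({
    "CohortMonth": pd.to_datetime(["2010-12-01"] * 3 + ["2011-01-01"] * 2),
    "CohortIndex": [0, 1, 2, 0, 1],
    "CustomerID":  [10, 4, 5, 8, 2],
})

# Rows: cohorts, columns: months since acquisition, values: unique customers
cohort_counts = cohort_data.pivot(
    index="CohortMonth", columns="CohortIndex", values="CustomerID"
)

# Cohort size = CohortIndex 0; divide each row by its own cohort size
cohort_sizes = cohort_counts.iloc[:, 0]
retention = cohort_counts.divide(cohort_sizes, axis=0)
```

Cells for months a cohort has not reached yet stay NaN, which is why recent cohorts show shorter rows in a retention table.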

What does the above table tell us?

Consider CohortMonth 2010-12-01. For CohortIndex 0, the table tells us that 948 unique customers made their first transaction during CohortMonth 2010-12-01. For CohortIndex 1, it tells us that 362 of those 948 customers also made transactions during the next month; that is, they remained active.

For CohortIndex 2, the value gives the number of those 948 customers who also made transactions two months after their first month, and so on for higher values of CohortIndex.

Step 6: Visualizing the Retention Rate

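We divide each row of the count matrix by its CohortIndex 0 value to obtain retention rates, then plot the resulting matrix as a heatmap, converting the CohortMonth dates to Year-Month format with the strftime function.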

Fig: Cohort Heatmap
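The heatmap in the figure was produced by the author's own code. As a sketch of the same idea, the example below uses plain matplotlib on a small made-up retention matrix, reformats the CohortMonth index with `strftime("%Y-%m")`, and writes each cell as a percentage. `seaborn.heatmap(..., annot=True)` is a common, shorter alternative if seaborn is available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up retention matrix (fractions); NaN = month not reached yet
retention = pd.DataFrame(
    [[1.0, 0.40, 0.50], [1.0, 0.25, None]],
    index=pd.to_datetime(["2010-12-01", "2011-01-01"]),
    columns=[0, 1, 2],
)

# Year-Month labels such as "2010-12"
retention.index = retention.index.strftime("%Y-%m")

fig, ax = plt.subplots(figsize=(8, 3))
im = ax.imshow(retention.values.astype(float), cmap="BuGn", vmin=0, vmax=1)
ax.set_xticks(range(retention.shape[1]))
ax.set_xticklabels([str(c) for c in retention.columns])
ax.set_yticks(range(retention.shape[0]))
ax.set_yticklabels(retention.index)
ax.set_xlabel("CohortIndex")
ax.set_ylabel("CohortMonth")
ax.set_title("Retention rate by cohort")
fig.colorbar(im, ax=ax, label="Retention rate")

# Annotate every non-missing cell with its percentage
for i in range(retention.shape[0]):
    for j in range(retention.shape[1]):
        value = retention.iat[i, j]
        if pd.notna(value):
            ax.text(j, i, f"{value:.0%}", ha="center", va="center")

# fig.savefig("cohort_heatmap.png") or plt.show() to view the chart
```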

From the cohort retention heatmap, we can see an average retention of about 38% for CohortMonth 2010-12-01, with the highest retention rate (50%) occurring 11 months after acquisition. For all the other CohortMonths, the average retention rates are around 18–25%; that is, only this share of each cohort makes repeat transactions over the CohortIndex range shown.
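The article does not show how these averages were computed. One plausible approach, assumed here, is to drop CohortIndex 0 (always 100%) and take the mean of each cohort's remaining retention rates, together with `idxmax` to find the month of peak retention. The matrix values are made up.

```python
import pandas as pd

# Made-up retention matrix (fractions); NaN = month not reached yet
retention = pd.DataFrame(
    [[1.0, 0.40, 0.30, 0.50], [1.0, 0.20, 0.25, None]],
    index=["2010-12", "2011-01"],
    columns=[0, 1, 2, 3],
)

# Exclude CohortIndex 0; mean() and idxmax() skip NaN by default
later_months = retention.iloc[:, 1:]
avg_retention = later_months.mean(axis=1)
peak_index = later_months.idxmax(axis=1)
```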

From this analysis, a company can answer questions such as:

  • Did the strategy employed to improve customer conversion rates work?
  • Should I focus more on retention rather than acquiring new customers?

It can then design strategies to increase customer retention, for example by offering more attractive discounts or running more effective marketing campaigns.

The code is available here.

Follow for more interesting analytics updates!

Happy Reading!

“Want me to write a tech blog? Contact me here.”
