Rule-Based Classification
In this article, I will calculate the potential revenue return of the client to the firm by using a rule-based classification.
Business Problem:
A company that sells souvenirs wants to create new level-based customer definitions (personas) using its customers’ features and create segments according to these new customer definitions. They want to estimate how much new customers can benefit the company on average using the new segments.
for example, how much can the firm make benefits from 25 years old man who uses online transactions.
Dataset
The data set includes the prices of the products that an American gift company sells online and in-store and some demographic information of the users who buy these products.
PRICE — product price
SOURCE — online or store
SEX — customer’s gender
STATE — USA states in which are the company has the customers
AGE — Age of customers.
Purpose of Project
- LEVEL BASED PERSONA DEFINITION: To be able to define new customers according to category levels (Level).
- SIMPLE SEGMENTATION: Simply segment new customer definitions
- RULE-BASED CLASSIFICATION: When a new customer arrives, classify this customer according to segments.
- Our dataset contains 5000 observation units, 5 variables including continuous and categorical, and no NA values. Let’s look at the description section, as you can see our general customers seem young to make ads according to young people’s interests or try to acquire older customers.
What are the price averages in the state-source breakdown?
What are the total gains broken down by STATE, SOURCE, SEX, AGE?
We are going to turn the age variable into the categorical variable and make a new variable called cat_age.
We will split the variable age using the cut() function and set the range for age_cat.
For example: ‘0_18’, ‘19_23’, ‘24_30’, ‘31_40’, ‘41_70
Let’s create level based customer (persona) and add it in the data set as a new variable called “ customers_level_based”. We choose the criteria from the columns which we want to use. Our new personas look like this “CA_store_male_41_66” this shows us how we
- Using the pd.qcut function we will segment new personas by price into “D”, “C”, “B”, “A”.
Analysis of segment by price:
As you can see in the following snippet of code, the qcut function sorts customers by price and split them into four even segments. These segments make sense when we examine them. For example segment of ‘A’ has the most bigger price mean than other segments and the segment of ‘D’ has the lowest price mean so our Rule-Based Classifier has extracted the meaningful information from the dataset properties.
In conclusion, rule-based classification builds a model that is a set of rules. Rules can be extracted from training data. Rule quality measures are important to assess the rules and to determine the segments.
You can find the dataset and step-by-step implementation in my Kaggle account.
Thanks for reading. Please let me know if you have any feedback and question.