Machine Learning for Business Intelligence 1: Hands-on Exercises

Data

The objective of this application is to predict customer behaviour in order to prevent churn. Identifying the
determinants of churn and predicting which customers are likely to leave will allow the company to develop more
focused customer retention programs. Each row represents a customer, and each column contains one of that customer’s
attributes. The dataset includes information about:

• Customers who left within the last month – the column is called Churn

• Services that each customer has signed up for – phone, multiple lines, internet, online security,
online backup, device protection, tech support, and streaming TV and movies

• Customer account information – how long they’ve been a customer, contract, payment method,
paperless billing, monthly charges, and total charges

• Demographic info about customers – gender, age range, and whether they have partners and
dependents

More specifically, the dataset includes:

Two numerical columns:

1. MonthlyCharges: The amount charged to the customer monthly
2. TotalCharges: The total amount charged to the customer

Eighteen categorical columns:

1. CustomerID: A unique ID for each customer
2. gender: Whether the customer is a male or a female
3. SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
4. Partner: Whether the customer has a partner or not (Yes, No)
5. Dependents: Whether the customer has dependents or not (Yes, No)
6. Tenure: Number of months the customer has stayed with the company
7. PhoneService: Whether the customer has a phone service or not (Yes, No)
8. MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
9. InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
10. OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
11. OnlineBackup: Whether the customer has an online backup or not (Yes, No, No internet service)
12. DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service)
13. TechSupport: Whether the customer has tech support or not (Yes, No, No internet service)
14. StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service)
15. StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service)
16. Contract: The contract term of the customer (Month-to-month, One year, Two years)
17. PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
18. PaymentMethod: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))

Data understanding

Using the full dataset,

1. Load the data into RStudio and familiarize yourself with the variables and their meaning. Check each variable’s
type (e.g., factor, numeric) and convert it where necessary.

CustomerID should not be part of the analysis.
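A minimal sketch of step 1 in R, assuming the real data are read with `read.csv()` (the file name `telco_churn.csv` in the comment is hypothetical). To stay runnable, it builds a small made-up data frame with only a few of the columns, drops the ID column, converts character columns to factors and recodes the 0/1 `SeniorCitizen` indicator as a labelled factor. The exact spelling of the ID column (`customerID` or `CustomerID`) is an assumption, so the code removes whichever is present.

```r
# In practice the data would be read from a file, e.g.
#   data <- read.csv("telco_churn.csv", stringsAsFactors = FALSE)  # hypothetical file name
# A tiny toy data frame stands in for the real data here.
data <- data.frame(
  customerID     = c("0001-A", "0002-B", "0003-C"),
  gender         = c("Female", "Male", "Male"),
  SeniorCitizen  = c(0L, 1L, 0L),
  tenure         = c(1L, 34L, 2L),
  MonthlyCharges = c(29.85, 56.95, 53.85),
  stringsAsFactors = FALSE
)

# Inspect the type of every column
str(data)

# Drop the identifier; its spelling (customerID vs CustomerID) depends on the file
data <- data[, setdiff(names(data), c("customerID", "CustomerID")), drop = FALSE]

# Convert the remaining character columns to factors
char_cols <- sapply(data, is.character)
data[char_cols] <- lapply(data[char_cols], as.factor)

# SeniorCitizen is coded 0/1 but is categorical: turn it into a labelled factor
data$SeniorCitizen <- factor(data$SeniorCitizen, levels = c(0, 1), labels = c("No", "Yes"))

# Check the resulting types
sapply(data, class)
```

Checking `sapply(data, class)` after each conversion is a quick way to confirm that no column was left with an unintended type.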

Tenure is described as categorical above, but since it counts months it has many distinct values. One can
discretize it into a few categories to avoid too many levels. Alternatively, one can treat it as numeric.
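One way to discretize tenure is base R’s `cut()`. The sketch uses four bands (0-12, 13-24, 25-48 and 49+ months); these cut points are an illustrative assumption, not something the exercise prescribes, and the toy values stand in for the tenure column of the real data.

```r
# Toy tenure values in months (in the real data these would come from the tenure column)
tenure <- c(0L, 1L, 12L, 13L, 30L, 60L, 72L)

# Discretize into four bands; the cut points are an illustrative choice
tenure_group <- cut(tenure,
                    breaks = c(0, 12, 24, 48, Inf),
                    labels = c("0-12", "13-24", "25-48", "49+"),
                    include.lowest = TRUE,  # keep tenure == 0 in the first band
                    right = TRUE)           # intervals are closed on the right: (12, 24], ...

# Number of customers per band
table(tenure_group)
```

With the real data you would assign the result back to the data frame, or keep both the numeric and the binned version and compare how each performs in the models.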

2. Visualize the missing values and consider deleting the affected rows from the dataset.
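A base-R sketch for step 2: it counts the missing values per column, draws them as a bar chart and keeps only complete rows with `complete.cases()`. The toy data frame and the positions of its `NA`s are made up for illustration; packages such as `naniar` (with `vis_miss()`) offer richer missing-data plots if you prefer.

```r
# Toy data with a few missing values (made up; in the real data, inspect every column)
data <- data.frame(
  tenure       = c(1L, 0L, 34L, 2L),
  TotalCharges = c(29.85, NA, 1889.50, 108.15),
  Churn        = c("No", "No", NA, "Yes")
)

# Count missing values per column and plot the counts
na_counts <- colSums(is.na(data))
print(na_counts)
barplot(na_counts, main = "Missing values per column", ylab = "Number of NAs", las = 2)

# Option: keep only the rows without any missing value
data_complete <- data[complete.cases(data), ]
nrow(data_complete)
```

Deleting rows is reasonable only when few are affected, so compare `nrow(data)` before and after to see how much data the deletion costs.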

3. Recode “No internet service” as “No” in the following columns: “OnlineSecurity”,
“OnlineBackup”, “DeviceProtection”, “TechSupport”, “StreamingTV”, “StreamingMovies”.

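```r
library(plyr)  # provides mapvalues()

# Recode "No internet service" as "No" in the six add-on service columns.
# Selecting the columns by name rather than by position (10:15) keeps this
# correct even if column positions change, e.g. after CustomerID is removed.
cols <- c("OnlineSecurity", "OnlineBackup", "DeviceProtection",
          "TechSupport", "StreamingTV", "StreamingMovies")
for (col in cols) {
  data[[col]] <- as.factor(mapvalues(data[[col]],
                                     from = "No internet service",
                                     to   = "No"))
}
```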
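The loop above relies on `plyr`, which is retired (it receives only maintenance updates). If you prefer to avoid that dependency, a base-R version of the same recoding might look like the sketch below; the two-column toy data frame is hypothetical and stands in for the six service columns.

```r
# Toy data containing the value to recode (the real data frame has all six columns)
data <- data.frame(
  OnlineSecurity = c("Yes", "No", "No internet service"),
  StreamingTV    = c("No internet service", "Yes", "No"),
  stringsAsFactors = FALSE
)

cols <- c("OnlineSecurity", "StreamingTV")
for (col in cols) {
  x <- as.character(data[[col]])            # works whether the column is character or factor
  x[x %in% "No internet service"] <- "No"   # %in% never returns NA, so NAs are left untouched
  data[[col]] <- factor(x)                  # rebuild the factor so the old level disappears
}

levels(data$OnlineSecurity)
```

Rebuilding the factor with `factor(x)` drops the now-unused “No internet service” level, so each service column ends up with just the levels “No” and “Yes”.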