Machine learning workflow steps in Python

Import all required libraries

In [1]:
# import relevant modules
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.formula.api as sn
import scipy.stats as stats
from matplotlib.backends.backend_pdf import PdfPages
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from statsmodels.stats.outliers_influence import variance_inflation_factor
from patsy import dmatrices
%matplotlib inline

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Settings
pd.set_option('display.max_columns', None)
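import sys  # needed for sys.maxsize below
# threshold=np.nan raises ValueError in newer NumPy releases; sys.maxsize prints arrays in full
np.set_printoptions(threshold=sys.maxsize, precision=3)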
sns.set(style="darkgrid")
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

Load the data set

In [2]:
# sheet_name replaces the older sheetname keyword, which newer pandas versions no longer accept
custdata_df = pd.read_excel("Data Set.xlsx", sheet_name="customer_dbase")
In [3]:
custdata_df.sample(5)
Out[3]:
custid region townsize gender age agecat birthmonth ed edcat jobcat union employ empcat retire income lninc inccat debtinc creddebt lncreddebt othdebt lnothdebt default jobsat marital spoused spousedcat reside pets pets_cats pets_dogs pets_birds pets_reptiles pets_small pets_saltfish pets_freshfish homeown hometype address addresscat cars carown cartype carvalue carcatvalue carbought carbuy commute commutecat commutetime commutecar commutemotorcycle commutecarpool commutebus commuterail commutepublic commutebike commutewalk commutenonmotor telecommute reason polview polparty polcontrib vote card cardtype cardbenefit cardfee cardtenure cardtenurecat card2 card2type card2benefit card2fee card2tenure card2tenurecat cardspent card2spent active bfast tenure churn longmon lnlongmon longten lnlongten tollfree tollmon lntollmon tollten lntollten equip equipmon lnequipmon equipten lnequipten callcard cardmon lncardmon cardten lncardten wireless wiremon lnwiremon wireten lnwireten multline voice pager internet callid callwait forward confer ebill owntv hourstv ownvcr owndvd owncd ownpda ownpc ownipod owngame ownfax news response_01 response_02 response_03
394 8512-KZJXAA-A34 5 2.0 0 58 5 July 13 2 3 0 13 4 0 54 3.988984 3 16.4 1.186704 0.171180 7.669296 2.037225 0 4 0 -1 -1 1 7 0 0 0 0 0 0 7 1 1 25 4 3 1 0 28.0 2 0 1 1 1 23.0 1 0 1 1 0 0 0 0 0 0 9 5 0 0 1 3 3 2 0 18 5 2 2 1 0 14 4 226.22 62.82 1 3 43 0 9.10 2.208274 399.10 5.989212 1 37.25 3.617652 1616.55 7.388050 0 0.0 NaN 0.00 NaN 1 32.50 3.481240 1395.0 7.240650 0 0.0 NaN 0.00 NaN 1 0 0 0 1 1 1 1 0 1 18 1 1 1 0 1 1 0 0 1 0 0 0
218 8477-FURXBL-V98 1 1.0 0 47 4 March 15 3 3 1 8 3 0 45 3.806662 2 19.7 2.526525 0.926845 6.338475 1.846638 0 5 0 -1 -1 1 2 1 1 0 0 0 0 0 1 1 24 4 1 1 1 25.3 2 0 1 3 2 14.0 0 0 1 0 0 0 0 0 0 0 9 4 0 0 0 4 1 4 0 4 2 3 2 3 0 6 3 42.69 12.08 0 3 7 0 4.05 1.398717 27.50 3.314186 0 0.00 NaN 0.00 NaN 0 0.0 NaN 0.00 NaN 0 0.00 NaN 0.0 NaN 0 0.0 NaN 0.00 NaN 0 0 0 2 0 0 0 0 0 1 17 1 1 1 0 1 0 1 0 0 0 0 0
4015 0409-MMPGJY-ECA 1 1.0 1 60 5 January 12 2 5 1 27 5 0 79 4.369448 4 7.7 1.344343 0.295905 4.738657 1.555754 0 2 0 -1 -1 1 2 2 0 0 0 0 0 0 1 1 9 3 0 -1 -1 -1.0 -1 -1 1 2 1 15.0 0 1 0 1 0 0 0 0 0 0 9 6 1 0 1 1 1 2 0 19 5 4 2 2 0 11 4 1249.83 881.78 0 3 30 0 8.15 2.098018 271.55 5.604146 1 17.00 2.833213 555.60 6.320049 0 0.0 NaN 0.00 NaN 1 5.50 1.704748 155.0 5.043425 0 0.0 NaN 0.00 NaN 0 0 0 0 0 1 1 1 0 1 14 1 1 1 0 0 0 0 0 0 1 0 0
4407 9411-CNRRPX-2HW 3 2.0 0 56 5 July 17 4 1 0 16 5 0 219 5.389072 5 7.0 5.212200 1.651002 10.117800 2.314296 0 2 0 -1 -1 1 5 0 5 0 0 0 0 0 0 2 19 4 2 1 0 46.1 3 0 1 1 1 15.0 1 1 0 0 1 0 0 0 0 0 9 3 0 0 1 3 1 4 0 33 5 1 1 2 0 23 5 678.79 96.49 0 3 63 0 23.50 3.157000 1508.30 7.318738 0 0.00 NaN 0.00 NaN 1 38.2 3.642836 2324.95 7.751454 1 38.50 3.650658 2400.0 7.783224 1 23.7 3.165475 1406.55 7.248895 1 0 1 1 1 1 1 1 1 1 12 1 1 1 0 1 0 1 1 0 0 0 1
1044 1890-WDOXSL-5H1 2 3.0 0 20 2 May 10 1 6 1 3 2 0 19 2.944439 1 3.0 0.369930 -0.994441 0.200070 -1.609088 0 2 1 11 1 4 0 0 0 0 0 0 0 0 0 2 0 1 2 1 0 9.6 1 0 1 1 1 19.0 1 0 1 0 0 0 0 0 0 0 9 5 0 0 0 3 3 3 0 2 2 4 3 1 0 2 2 362.36 163.95 1 3 24 0 3.75 1.321756 69.50 4.241327 0 0.00 NaN 0.00 NaN 0 0.0 NaN 0.00 NaN 1 7.25 1.981001 175.0 5.164786 0 0.0 NaN 0.00 NaN 0 0 0 0 0 1 1 1 0 1 22 0 1 1 0 0 0 0 0 0 0 0 0
In [4]:
# Find column information in the dataframe.
custdata_df.columns
Out[4]:
Index(['custid', 'region', 'townsize', 'gender', 'age', 'agecat', 'birthmonth',
       'ed', 'edcat', 'jobcat',
       ...
       'owncd', 'ownpda', 'ownpc', 'ownipod', 'owngame', 'ownfax', 'news',
       'response_01', 'response_02', 'response_03'],
      dtype='object', length=130)

Create the dependent variable Y

In [5]:
# To create Y, sum cardspent (amount spent on the first card) and card2spent (amount spent on the second card)
custdata_df['totalspend'] = custdata_df['cardspent'] + custdata_df['card2spent']
In [6]:
custdata_df.head()
Out[6]:
custid region townsize gender age agecat birthmonth ed edcat jobcat union employ empcat retire income lninc inccat debtinc creddebt lncreddebt othdebt lnothdebt default jobsat marital spoused spousedcat reside pets pets_cats pets_dogs pets_birds pets_reptiles pets_small pets_saltfish pets_freshfish homeown hometype address addresscat cars carown cartype carvalue carcatvalue carbought carbuy commute commutecat commutetime commutecar commutemotorcycle commutecarpool commutebus commuterail commutepublic commutebike commutewalk commutenonmotor telecommute reason polview polparty polcontrib vote card cardtype cardbenefit cardfee cardtenure cardtenurecat card2 card2type card2benefit card2fee card2tenure card2tenurecat cardspent card2spent active bfast tenure churn longmon lnlongmon longten lnlongten tollfree tollmon lntollmon tollten lntollten equip equipmon lnequipmon equipten lnequipten callcard cardmon lncardmon cardten lncardten wireless wiremon lnwiremon wireten lnwireten multline voice pager internet callid callwait forward confer ebill owntv hourstv ownvcr owndvd owncd ownpda ownpc ownipod owngame ownfax news response_01 response_02 response_03 totalspend
0 3964-QJWTRG-NPN 1 2.0 1 20 2 September 15 3 1 1 0 1 0 31 3.433987 2 11.1 1.200909 0.183079 2.240091 0.806516 1 1 0 -1 -1 3 0 0 0 0 0 0 0 0 0 2 0 1 2 1 0 14.3 1 0 0 8 4 22.0 0 1 1 0 0 0 0 1 0 0 9 6 1 0 1 3 1 1 0 2 2 5 3 1 0 3 2 81.66 67.80 0 3 5 1 6.50 1.871802 34.40 3.538057 1 29.0 3.367296 161.05 5.081715 1 29.50 3.384390 126.1 4.837075 1 14.25 2.656757 60.0 4.094345 0 0.00 NaN 0.00 NaN 1 1 1 0 0 1 1 1 0 1 13 1 1 0 0 0 1 1 0 0 0 1 0 149.46
1 0648-AIPJSP-UVM 5 5.0 0 22 2 May 17 4 2 0 0 1 0 15 2.708050 1 18.6 1.222020 0.200505 1.567980 0.449788 1 1 0 -1 -1 2 6 0 0 0 0 0 0 6 1 3 2 1 2 1 1 6.8 1 0 0 1 1 29.0 1 0 0 1 0 0 1 0 1 1 9 4 1 0 0 2 4 1 0 4 2 4 1 3 0 4 2 42.60 34.94 1 1 39 0 8.90 2.186051 330.60 5.800909 0 0.0 NaN 0.00 NaN 1 54.85 4.004602 1975.0 7.588324 1 16.00 2.772589 610.0 6.413459 1 45.65 3.821004 1683.55 7.428660 1 1 1 4 1 0 1 0 1 1 18 1 1 1 1 1 1 1 1 1 0 0 0 77.54
2 5195-TLUDJE-HVO 3 4.0 1 67 6 June 14 2 2 0 16 5 0 35 3.555348 2 9.9 0.928620 -0.074056 2.536380 0.930738 0 4 1 13 2 3 3 2 1 0 0 0 0 0 1 1 30 5 3 1 1 18.8 1 0 1 4 3 24.0 1 0 1 1 1 0 0 0 0 0 2 5 1 0 0 2 1 4 0 35 5 4 1 3 0 25 5 184.22 175.75 0 3 65 0 28.40 3.346389 1858.35 7.527444 0 0.0 NaN 0.00 NaN 0 0.00 NaN 0.0 NaN 1 23.00 3.135494 1410.0 7.251345 0 0.00 NaN 0.00 NaN 1 0 0 0 0 0 0 0 0 1 21 1 1 1 0 0 0 0 0 1 0 0 0 359.97
3 4459-VLPQUH-3OL 4 3.0 0 23 2 May 16 3 2 0 0 1 0 20 2.995732 1 5.7 0.022800 -3.780995 1.117200 0.110826 1 2 1 18 4 5 0 0 0 0 0 0 0 0 1 3 3 2 3 1 1 8.7 1 0 1 1 1 38.0 1 0 0 0 0 0 0 0 0 0 9 3 0 0 0 2 1 4 0 5 2 3 2 4 0 5 2 340.99 18.42 1 1 36 0 6.00 1.791759 199.45 5.295564 0 0.0 NaN 0.00 NaN 0 0.00 NaN 0.0 NaN 1 21.00 3.044522 685.0 6.529419 0 0.00 NaN 0.00 NaN 1 0 0 2 0 0 0 0 1 1 26 1 1 1 0 1 1 1 0 1 1 0 0 359.41
4 8158-SMTQFB-CNO 2 2.0 0 26 3 July 16 3 2 0 1 1 0 23 3.135494 1 1.7 0.214659 -1.538705 0.176341 -1.735336 0 1 1 13 2 4 0 0 0 0 0 0 0 0 0 2 3 2 1 0 1 10.6 1 0 1 6 3 32.0 0 0 0 0 0 1 0 1 0 0 9 4 0 0 0 4 2 1 0 8 3 1 3 2 0 9 3 255.10 252.73 1 3 21 0 3.05 1.115142 74.10 4.305416 1 16.5 2.803360 387.70 5.960232 0 0.00 NaN 0.0 NaN 1 17.25 2.847812 360.0 5.886104 1 19.05 2.947067 410.80 6.018106 0 1 0 3 1 1 1 1 0 1 27 1 1 1 0 1 0 1 0 0 0 1 0 507.83
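As a quick sanity check, the totalspend values in the output above can be reproduced by hand. The sketch below rebuilds the first three rows from the cardspent and card2spent values shown in Out[6]; the `sample` frame and the `matches` flag are illustrative names, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# First three rows of cardspent / card2spent, copied from Out[6]
sample = pd.DataFrame({
    "cardspent": [81.66, 42.60, 184.22],
    "card2spent": [67.80, 34.94, 175.75],
})

# Same operation as In [5]: element-wise sum of the two spend columns
sample["totalspend"] = sample["cardspent"] + sample["card2spent"]

# totalspend values displayed in Out[6] for the same rows
expected = [149.46, 77.54, 359.97]

# Compare with a tolerance, since these are floating-point sums
matches = np.allclose(sample["totalspend"], expected)
print(sample)
print("matches displayed totals:", matches)
```

Note that a plain `+` yields NaN for any row where either column is missing. The data audit below reports no missing values for cardspent or card2spent, so that case should not arise here.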
In [7]:
# Run pandas-profiling to generate the data audit report

import pandas_profiling
pandas_profiling.ProfileReport(custdata_df)
Out[7]:

Overview

Dataset info

Number of variables 131
Number of observations 5000
Total Missing (%) 0.2%
Total size in memory 5.0 MiB
Average record size in memory 1.0 KiB

Variables types

Numeric 59
Categorical 1
Boolean 49
Date 0
Text (Unique) 1
Rejected 21
Unsupported 0

Warnings

Variables

active
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.466
Value Count Frequency (%)  
0 2670 53.4%
 
1 2330 46.6%
 

address
Numeric

Distinct count 57
Unique (%) 1.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 16.402
Minimum 0
Maximum 57
Zeros (%) 4.9%

Quantile statistics

Minimum 0
5-th percentile 1
Q1 6
Median 14
Q3 25
95-th percentile 40
Maximum 57
Range 57
Interquartile range 19

Descriptive statistics

Standard deviation 12.397
Coef of variation 0.75583
Kurtosis -0.22967
Mean 16.402
MAD 10.223
Skewness 0.70655
Sum 82012
Variance 153.7
Memory size 39.1 KiB
Value Count Frequency (%)  
0 245 4.9%
 
2 196 3.9%
 
4 195 3.9%
 
5 177 3.5%
 
3 172 3.4%
 
1 169 3.4%
 
8 169 3.4%
 
7 166 3.3%
 
12 166 3.3%
 
6 163 3.3%
 
Other values (47) 3182 63.6%
 

Minimum 5 values

Value Count Frequency (%)  
0 245 4.9%
 
1 169 3.4%
 
2 196 3.9%
 
3 172 3.4%
 
4 195 3.9%
 

Maximum 5 values

Value Count Frequency (%)  
52 7 0.1%
 
53 6 0.1%
 
54 1 0.0%
 
55 5 0.1%
 
57 3 0.1%
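Several of the statistics reported for each numeric variable are simple derived quantities. The sketch below shows one way they could be computed with pandas; it is not the profiling library's own code. The helper name `describe_numeric` and the toy series are assumptions, and it further assumes the report uses the sample standard deviation, the mean absolute deviation for MAD, and excess kurtosis.

```python
import pandas as pd

def describe_numeric(s):
    """Return a few of the derived statistics listed in the report (hypothetical helper)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    mean = s.mean()
    return {
        "range": s.max() - s.min(),
        "iqr": q3 - q1,                    # interquartile range: Q3 - Q1
        "cv": s.std() / mean,              # coefficient of variation: std / mean
        "mad": (s - mean).abs().mean(),    # mean absolute deviation around the mean
        "skewness": s.skew(),
        "kurtosis": s.kurt(),              # excess kurtosis (0 for a normal distribution)
    }

# Toy data, used only to illustrate the formulas
toy = pd.Series([0, 2, 4, 6, 8])
stats_summary = describe_numeric(toy)
for name, value in stats_summary.items():
    print(f"{name}: {value:.4f}")
```

The figures reported for address are consistent with these definitions: Q3 - Q1 = 25 - 6 = 19, and 12.397 / 16.402 is approximately 0.756.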
 

addresscat
Highly correlated

This variable is highly correlated with address and should be ignored for analysis

Correlation 0.92352
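The "Highly correlated" rejection can be approximated with plain pandas. This is a minimal sketch, not pandas-profiling's actual implementation: the helper name `highly_correlated`, the 0.9 cutoff, and the toy columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def highly_correlated(df, threshold=0.9):
    """List columns whose absolute Pearson correlation with an earlier column exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once
    mask = np.triu(np.ones(corr.shape), k=1).astype(bool)
    upper = corr.where(mask)
    return [col for col in upper.columns if (upper[col] > threshold).any()]

# Toy data: a numeric column, a binned copy of it, and an unrelated flag
toy = pd.DataFrame({"address": np.arange(58)})
toy["addresscat"] = pd.cut(toy["address"], bins=5, labels=False)
toy["flag"] = np.tile([0, 1], 29)

rejected = highly_correlated(toy)
print(rejected)
```

Because the later column of each pair is the one flagged, the binned variable is dropped and the original numeric variable is kept, which mirrors how the report treats addresscat and address.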

age
Numeric

Distinct count 62
Unique (%) 1.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 47.026
Minimum 18
Maximum 79
Zeros (%) 0.0%

Quantile statistics

Minimum 18
5-th percentile 20
Q1 31
Median 47
Q3 62
95-th percentile 76
Maximum 79
Range 61
Interquartile range 31

Descriptive statistics

Standard deviation 17.77
Coef of variation 0.37789
Kurtosis -1.187
Mean 47.026
MAD 15.403
Skewness 0.09076
Sum 235128
Variance 315.78
Memory size 39.1 KiB
Value Count Frequency (%)  
18 106 2.1%
 
35 102 2.0%
 
37 98 2.0%
 
24 97 1.9%
 
21 95 1.9%
 
63 95 1.9%
 
31 94 1.9%
 
57 93 1.9%
 
25 93 1.9%
 
36 92 1.8%
 
Other values (52) 4035 80.7%
 

Minimum 5 values

Value Count Frequency (%)  
18 106 2.1%
 
19 78 1.6%
 
20 80 1.6%
 
21 95 1.9%
 
22 82 1.6%
 

Maximum 5 values

Value Count Frequency (%)  
75 74 1.5%
 
76 58 1.2%
 
77 71 1.4%
 
78 70 1.4%
 
79 73 1.5%
 

agecat
Highly correlated

This variable is highly correlated with age and should be ignored for analysis

Correlation 0.96988

bfast
Numeric

Distinct count 3
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.0586
Minimum 1
Maximum 3
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
Median 2
Q3 3
95-th percentile 3
Maximum 3
Range 2
Interquartile range 2

Descriptive statistics

Standard deviation 0.82952
Coef of variation 0.40295
Kurtosis -1.5385
Mean 2.0586
MAD 0.70605
Skewness -0.10964
Sum 10293
Variance 0.6881
Memory size 39.1 KiB
Value Count Frequency (%)  
3 1875 37.5%
 
1 1582 31.6%
 
2 1543 30.9%
 

Minimum 5 values

Value Count Frequency (%)  
1 1582 31.6%
 
2 1543 30.9%
 
3 1875 37.5%
 

Maximum 5 values

Value Count Frequency (%)  
1 1582 31.6%
 
2 1543 30.9%
 
3 1875 37.5%
 

birthmonth
Categorical

Distinct count 12
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0
Value Count Frequency (%)  
September 458 9.2%
 
May 451 9.0%
 
January 420 8.4%
 
June 420 8.4%
 
February 418 8.4%
 
March 416 8.3%
 
July 413 8.3%
 
October 410 8.2%
 
August 406 8.1%
 
November 399 8.0%
 
Other values (2) 789 15.8%
 

callcard
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.7162
Value Count Frequency (%)  
1 3581 71.6%
 
0 1419 28.4%
 

callid
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.4752
Value Count Frequency (%)  
0 2624 52.5%
 
1 2376 47.5%
 

callwait
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.479
Value Count Frequency (%)  
0 2605 52.1%
 
1 2395 47.9%
 

carbought
Numeric

Distinct count 3
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.221
Minimum -1
Maximum 1
Zeros (%) 58.0%

Quantile statistics

Minimum -1
5-th percentile -1
Q1 0
Median 0
Q3 1
95-th percentile 1
Maximum 1
Range 2
Interquartile range 1

Descriptive statistics

Standard deviation 0.60912
Coef of variation 2.7562
Kurtosis -0.5264
Mean 0.221
MAD 0.49918
Skewness -0.15823
Sum 1105
Variance 0.37103
Memory size 39.1 KiB
Value Count Frequency (%)  
0 2901 58.0%
 
1 1602 32.0%
 
-1 497 9.9%
 

Minimum 5 values

Value Count Frequency (%)  
-1 497 9.9%
 
0 2901 58.0%
 
1 1602 32.0%
 

Maximum 5 values

Value Count Frequency (%)  
-1 497 9.9%
 
0 2901 58.0%
 
1 1602 32.0%
 

carbuy
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.361
Value Count Frequency (%)  
0 3195 63.9%
 
1 1805 36.1%
 

carcatvalue
Numeric

Distinct count 4
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 1.3894
Minimum -1
Maximum 3
Zeros (%) 0.0%

Quantile statistics

Minimum -1
5-th percentile -1
Q1 1
Median 1
Q3 2
95-th percentile 3
Maximum 3
Range 4
Interquartile range 1

Descriptive statistics

Standard deviation 1.0813
Coef of variation 0.77825
Kurtosis 0.23064
Mean 1.3894
MAD 0.84868
Skewness -0.49643
Sum 6947
Variance 1.1692
Memory size 39.1 KiB
Value Count Frequency (%)  
1 2399 48.0%
 
2 1267 25.3%
 
3 837 16.7%
 
-1 497 9.9%
 

Minimum 5 values

Value Count Frequency (%)  
-1 497 9.9%
 
1 2399 48.0%
 
2 1267 25.3%
 
3 837 16.7%
 

Maximum 5 values

Value Count Frequency (%)  
-1 497 9.9%
 
1 2399 48.0%
 
2 1267 25.3%
 
3 837 16.7%
 

card
Numeric

Distinct count 5
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.7142
Minimum 1
Maximum 5
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 4
Maximum 5
Range 4
Interquartile range 2

Descriptive statistics

Standard deviation 1.1849
Coef of variation 0.43656
Kurtosis -1.1112
Mean 2.7142
MAD 1.0323
Skewness 0.015333
Sum 13571
Variance 1.404
Memory size 39.1 KiB
Value Count Frequency (%)  
4 1344 26.9%
 
2 1247 24.9%
 
3 1200 24.0%
 
1 986 19.7%
 
5 223 4.5%
 

Minimum 5 values

Value Count Frequency (%)  
1 986 19.7%
 
2 1247 24.9%
 
3 1200 24.0%
 
4 1344 26.9%
 
5 223 4.5%
 

Maximum 5 values

Value Count Frequency (%)  
1 986 19.7%
 
2 1247 24.9%
 
3 1200 24.0%
 
4 1344 26.9%
 
5 223 4.5%
 

card2
Numeric

Distinct count 5
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.7744
Minimum 1
Maximum 5
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 5
Maximum 5
Range 4
Interquartile range 2

Descriptive statistics

Standard deviation 1.1734
Coef of variation 0.42296
Kurtosis -0.91791
Mean 2.7744
MAD 0.99139
Skewness 0.084736
Sum 13872
Variance 1.377
Memory size 39.1 KiB
Value Count Frequency (%)  
3 1384 27.7%
 
2 1301 26.0%
 
4 1141 22.8%
 
1 829 16.6%
 
5 345 6.9%
 

Minimum 5 values

Value Count Frequency (%)  
1 829 16.6%
 
2 1301 26.0%
 
3 1384 27.7%
 
4 1141 22.8%
 
5 345 6.9%
 

Maximum 5 values

Value Count Frequency (%)  
1 829 16.6%
 
2 1301 26.0%
 
3 1384 27.7%
 
4 1141 22.8%
 
5 345 6.9%
 

card2benefit
Numeric

Distinct count 4
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.534
Minimum 1
Maximum 4
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 4
Maximum 4
Range 3
Interquartile range 2

Descriptive statistics

Standard deviation 1.1173
Coef of variation 0.44091
Kurtosis -1.3562
Mean 2.534
MAD 0.99851
Skewness -0.046519
Sum 12670
Variance 1.2483
Memory size 39.1 KiB
Value Count Frequency (%)  
4 1294 25.9%
 
3 1286 25.7%
 
2 1216 24.3%
 
1 1204 24.1%
 

Minimum 5 values

Value Count Frequency (%)  
1 1204 24.1%
 
2 1216 24.3%
 
3 1286 25.7%
 
4 1294 25.9%
 

Maximum 5 values

Value Count Frequency (%)  
1 1204 24.1%
 
2 1216 24.3%
 
3 1286 25.7%
 
4 1294 25.9%
 

card2fee
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.1872
Value Count Frequency (%)  
0 4064 81.3%
 
1 936 18.7%
 

card2spent
Numeric

Distinct count 4477
Unique (%) 89.5%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 160.88
Minimum 0
Maximum 2069.2
Zeros (%) 3.6%

Quantile statistics

Minimum 0
5-th percentile 14.819
Q1 66.968
Median 125.34
Q3 208.31
95-th percentile 419.45
Maximum 2069.2
Range 2069.2
Interquartile range 141.34

Descriptive statistics

Standard deviation 146.29
Coef of variation 0.90935
Kurtosis 15.736
Mean 160.88
MAD 100.44
Skewness 2.8012
Sum 804380
Variance 21402
Memory size 39.1 KiB
Value Count Frequency (%)  
0.0 179 3.6%
 
63.690000000000005 3 0.1%
 
92.92 3 0.1%
 
175.75 3 0.1%
 
97.87 3 0.1%
 
112.88 3 0.1%
 
128.54 3 0.1%
 
159.1 3 0.1%
 
38.410000000000004 3 0.1%
 
128.35 3 0.1%
 
Other values (4467) 4794 95.9%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 179 3.6%
 
6.1000000000000005 1 0.0%
 
6.54 1 0.0%
 
6.86 1 0.0%
 
7.140000000000001 1 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
1277.68 1 0.0%
 
1282.76 1 0.0%
 
1309.3700000000001 1 0.0%
 
1611.3500000000001 1 0.0%
 
2069.25 1 0.0%
 

card2tenure
Highly correlated

This variable is highly correlated with cardtenure and should be ignored for analysis

Correlation 0.96298

card2tenurecat
Highly correlated

This variable is highly correlated with card2tenure and should be ignored for analysis

Correlation 0.92439

card2type
Numeric

Distinct count 4
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.5412
Minimum 1
Maximum 4
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 4
Maximum 4
Range 3
Interquartile range 2

Descriptive statistics

Standard deviation 1.1188
Coef of variation 0.44027
Kurtosis -1.3601
Mean 2.5412
MAD 1.0003
Skewness -0.04748
Sum 12706
Variance 1.2518
Memory size 39.1 KiB
Value Count Frequency (%)  
4 1319 26.4%
 
3 1257 25.1%
 
2 1235 24.7%
 
1 1189 23.8%
 

Minimum 5 values

Value Count Frequency (%)  
1 1189 23.8%
 
2 1235 24.7%
 
3 1257 25.1%
 
4 1319 26.4%
 

Maximum 5 values

Value Count Frequency (%)  
1 1189 23.8%
 
2 1235 24.7%
 
3 1257 25.1%
 
4 1319 26.4%
 

cardbenefit
Numeric

Distinct count 4
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.5058
Minimum 1
Maximum 4
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 3.25
95-th percentile 4
Maximum 4
Range 3
Interquartile range 1.25

Descriptive statistics

Standard deviation 1.1172
Coef of variation 0.44586
Kurtosis -1.3579
Mean 2.5058
MAD 0.99894
Skewness -0.012388
Sum 12529
Variance 1.2482
Memory size 39.1 KiB
Value Count Frequency (%)  
3 1274 25.5%
 
4 1250 25.0%
 
1 1245 24.9%
 
2 1231 24.6%
 

Minimum 5 values

Value Count Frequency (%)  
1 1245 24.9%
 
2 1231 24.6%
 
3 1274 25.5%
 
4 1250 25.0%
 

Maximum 5 values

Value Count Frequency (%)  
1 1245 24.9%
 
2 1231 24.6%
 
3 1274 25.5%
 
4 1250 25.0%
 

cardfee
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.1898
Value Count Frequency (%)  
0 4051 81.0%
 
1 949 19.0%
 

cardmon
Numeric

Distinct count 271
Unique (%) 5.4%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 15.444
Minimum 0
Maximum 188.5
Zeros (%) 28.4%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 13.75
Q3 22.75
95-th percentile 42
Maximum 188.5
Range 188.5
Interquartile range 22.75

Descriptive statistics

Standard deviation 15.008
Coef of variation 0.97175
Kurtosis 7.1671
Mean 15.444
MAD 11.245
Skewness 1.6877
Sum 77219
Variance 225.23
Memory size 39.1 KiB
Value Count Frequency (%)  
0.0 1419 28.4%
 
13.25 53 1.1%
 
11.5 52 1.0%
 
16.5 49 1.0%
 
16.25 49 1.0%
 
13.75 47 0.9%
 
18.25 45 0.9%
 
13.5 45 0.9%
 
14.25 44 0.9%
 
15.0 44 0.9%
 
Other values (261) 3153 63.1%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 1419 28.4%
 
3.25 1 0.0%
 
3.75 1 0.0%
 
4.0 3 0.1%
 
4.25 9 0.2%
 

Maximum 5 values

Value Count Frequency (%)  
100.25 1 0.0%
 
102.0 1 0.0%
 
104.5 1 0.0%
 
138.25 1 0.0%
 
188.5 1 0.0%
 

cardspent
Numeric

Distinct count 4760
Unique (%) 95.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 337.2
Minimum 0
Maximum 3926.4
Zeros (%) 0.1%

Quantile statistics

Minimum 0
5-th percentile 91.305
Q1 183.38
Median 276.36
Q3 418.54
95-th percentile 782.32
Maximum 3926.4
Range 3926.4
Interquartile range 235.16

Descriptive statistics

Standard deviation 245.15
Coef of variation 0.727
Kurtosis 21.44
Mean 337.2
MAD 167.79
Skewness 3.0512
Sum 1686000
Variance 60096
Memory size 39.1 KiB
Value Count Frequency (%)  
0.0 7 0.1%
 
186.91 4 0.1%
 
245.84 3 0.1%
 
321.19 3 0.1%
 
231.14000000000001 3 0.1%
 
202.31 3 0.1%
 
237.16 3 0.1%
 
412.99 3 0.1%
 
122.54 3 0.1%
 
249.0 3 0.1%
 
Other values (4750) 4965 99.3%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 7 0.1%
 
6.97 1 0.0%
 
7.34 1 0.0%
 
7.53 1 0.0%
 
8.11 1 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
2461.03 1 0.0%
 
2503.25 1 0.0%
 
2969.39 1 0.0%
 
3104.63 1 0.0%
 
3926.41 1 0.0%
 

cardten
Numeric

Distinct count 698
Unique (%) 14.0%
Missing (%) 0.0%
Missing (n) 2
Infinite (%) 0.0%
Infinite (n) 0
Mean 720.48
Minimum 0
Maximum 13705
Zeros (%) 28.4%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 425
Q3 1080
95-th percentile 2455.7
Maximum 13705
Range 13705
Interquartile range 1080

Descriptive statistics

Standard deviation 922.23
Coef of variation 1.28
Kurtosis 15.163
Mean 720.48
MAD 667.37
Skewness 2.6459
Sum 3601000
Variance 850500
Memory size 39.1 KiB
Value Count Frequency (%)  
0.0 1420 28.4%
 
590.0 21 0.4%
 
200.0 20 0.4%
 
380.0 20 0.4%
 
45.0 19 0.4%
 
195.0 19 0.4%
 
500.0 19 0.4%
 
330.0 18 0.4%
 
220.0 18 0.4%
 
435.0 18 0.4%
 
Other values (687) 3406 68.1%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 1420 28.4%
 
4.75 1 0.0%
 
5.0 17 0.3%
 
5.25 1 0.0%
 
7.75 1 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
6440.0 1 0.0%
 
7115.0 1 0.0%
 
7310.0 1 0.0%
 
9920.0 1 0.0%
 
13705.0 1 0.0%
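The report shows two missing values for cardten even though the rounded percentage reads 0.0%. A per-column count makes such small gaps easier to spot. The sketch below is one way to tabulate them; the helper name `missing_summary` and the toy frame are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return count and percentage of missing values for columns that have any."""
    n_missing = df.isna().sum()
    pct = 100 * n_missing / len(df)
    out = pd.DataFrame({"missing_n": n_missing, "missing_pct": pct})
    # Keep only columns with at least one missing value, largest first
    return out[out["missing_n"] > 0].sort_values("missing_n", ascending=False)

# Toy data, used only to illustrate the helper
toy = pd.DataFrame({
    "cardten": [0.0, 590.0, np.nan, 200.0],
    "commutetime": [24.0, np.nan, np.nan, 27.0],
    "age": [20, 22, 67, 23],
})

summary = missing_summary(toy)
print(summary)
```

On the real data this would be called as `missing_summary(custdata_df)`.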
 

cardtenure
Numeric

Distinct count 41
Unique (%) 0.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 16.656
Minimum 0
Maximum 40
Zeros (%) 1.8%

Quantile statistics

Minimum 0
5-th percentile 1
Q1 6
Median 14
Q3 26
95-th percentile 38
Maximum 40
Range 40
Interquartile range 20

Descriptive statistics

Standard deviation 12.021
Coef of variation 0.72173
Kurtosis -1.0561
Mean 16.656
MAD 10.355
Skewness 0.42936
Sum 83279
Variance 144.5
Memory size 39.1 KiB
Value Count Frequency (%)  
3 246 4.9%
 
1 228 4.6%
 
2 220 4.4%
 
4 193 3.9%
 
5 188 3.8%
 
6 176 3.5%
 
7 163 3.3%
 
11 158 3.2%
 
8 158 3.2%
 
9 153 3.1%
 
Other values (31) 3117 62.3%
 

Minimum 5 values

Value Count Frequency (%)  
0 91 1.8%
 
1 228 4.6%
 
2 220 4.4%
 
3 246 4.9%
 
4 193 3.9%
 

Maximum 5 values

Value Count Frequency (%)  
36 72 1.4%
 
37 83 1.7%
 
38 98 2.0%
 
39 113 2.3%
 
40 126 2.5%
 

cardtenurecat
Numeric

Distinct count 5
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 3.7822
Minimum 1
Maximum 5
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 3
Median 4
Q3 5
95-th percentile 5
Maximum 5
Range 4
Interquartile range 2

Descriptive statistics

Standard deviation 1.3538
Coef of variation 0.35794
Kurtosis -1.0266
Mean 3.7822
MAD 1.2057
Skewness -0.62824
Sum 18911
Variance 1.8327
Memory size 39.1 KiB
Value Count Frequency (%)  
5 2351 47.0%
 
2 847 16.9%
 
3 789 15.8%
 
4 694 13.9%
 
1 319 6.4%
 

Minimum 5 values

Value Count Frequency (%)  
1 319 6.4%
 
2 847 16.9%
 
3 789 15.8%
 
4 694 13.9%
 
5 2351 47.0%
 

Maximum 5 values

Value Count Frequency (%)  
1 319 6.4%
 
2 847 16.9%
 
3 789 15.8%
 
4 694 13.9%
 
5 2351 47.0%
 

cardtype
Numeric

Distinct count 4
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.507
Minimum 1
Maximum 4
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
Median 3
Q3 4
95-th percentile 4
Maximum 4
Range 3
Interquartile range 2

Descriptive statistics

Standard deviation 1.1185
Coef of variation 0.44614
Kurtosis -1.3608
Mean 2.507
MAD 1.0004
Skewness -0.0098086
Sum 12535
Variance 1.251
Memory size 39.1 KiB
Value Count Frequency (%)  
4 1260 25.2%
 
3 1257 25.1%
 
1 1242 24.8%
 
2 1241 24.8%
 

Minimum 5 values

Value Count Frequency (%)  
1 1242 24.8%
 
2 1241 24.8%
 
3 1257 25.1%
 
4 1260 25.2%
 

Maximum 5 values

Value Count Frequency (%)  
1 1242 24.8%
 
2 1241 24.8%
 
3 1257 25.1%
 
4 1260 25.2%
 

carown
Numeric

Distinct count 3
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.6414
Minimum -1
Maximum 1
Zeros (%) 16.0%

Quantile statistics

Minimum -1
5-th percentile -1
Q1 0
Median 1
Q3 1
95-th percentile 1
Maximum 1
Range 2
Interquartile range 1

Descriptive statistics

Standard deviation 0.6549
Coef of variation 1.021
Kurtosis 1.14
Mean 0.6414
MAD 0.5313
Skewness -1.5944
Sum 3207
Variance 0.42889
Memory size 39.1 KiB
Value Count Frequency (%)  
1 3704 74.1%
 
0 799 16.0%
 
-1 497 9.9%
 

Minimum 5 values

Value Count Frequency (%)  
-1 497 9.9%
 
0 799 16.0%
 
1 3704 74.1%
 

Maximum 5 values

Value Count Frequency (%)  
-1 497 9.9%
 
0 799 16.0%
 
1 3704 74.1%
 

cars
Numeric

Distinct count 9
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.1306
Minimum 0
Maximum 8
Zeros (%) 9.9%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 1
Median 2
Q3 3
95-th percentile 4
Maximum 8
Range 8
Interquartile range 2

Descriptive statistics

Standard deviation 1.3075
Coef of variation 0.61366
Kurtosis 0.32839
Mean 2.1306
MAD 1.0136
Skewness 0.50172
Sum 10653
Variance 1.7095
Memory size 39.1 KiB
Value Count Frequency (%)  
2 1607 32.1%
 
1 1119 22.4%
 
3 1082 21.6%
 
0 497 9.9%
 
4 481 9.6%
 
5 149 3.0%
 
6 51 1.0%
 
7 13 0.3%
 
8 1 0.0%
 

Minimum 5 values

Value Count Frequency (%)  
0 497 9.9%
 
1 1119 22.4%
 
2 1607 32.1%
 
3 1082 21.6%
 
4 481 9.6%
 

Maximum 5 values

Value Count Frequency (%)  
4 481 9.6%
 
5 149 3.0%
 
6 51 1.0%
 
7 13 0.3%
 
8 1 0.0%
 

cartype
Numeric

Distinct count 3
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.3438
Minimum -1
Maximum 1
Zeros (%) 45.7%

Quantile statistics

Minimum -1
5-th percentile -1
Q1 0
Median 0
Q3 1
95-th percentile 1
Maximum 1
Range 2
Interquartile range 1

Descriptive statistics

Standard deviation 0.65153
Coef of variation 1.8951
Kurtosis -0.70821
Mean 0.3438
MAD 0.58166
Skewness -0.48685
Sum 1719
Variance 0.42449
Memory size 39.1 KiB
Value Count Frequency (%)  
0 2287 45.7%
 
1 2216 44.3%
 
-1 497 9.9%
 

Minimum 5 values

Value Count Frequency (%)  
-1 497 9.9%
 
0 2287 45.7%
 
1 2216 44.3%
 

Maximum 5 values

Value Count Frequency (%)  
-1 497 9.9%
 
0 2287 45.7%
 
1 2216 44.3%
 

carvalue
Numeric

Distinct count 801
Unique (%) 16.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 23.233
Minimum -1
Maximum 99.6
Zeros (%) 0.0%

Quantile statistics

Minimum -1
5-th percentile -1
Q1 9.2
Median 17
Q3 31.1
95-th percentile 72
Maximum 99.6
Range 100.6
Interquartile range 21.9

Descriptive statistics

Standard deviation 21.232
Coef of variation 0.91387
Kurtosis 1.9517
Mean 23.233
MAD 15.904
Skewness 1.474
Sum 116160
Variance 450.78
Memory size 39.1 KiB
Value Count Frequency (%)  
-1.0 497 9.9%
 
9.8 25 0.5%
 
13.5 24 0.5%
 
6.300000000000001 24 0.5%
 
10.200000000000001 23 0.5%
 
13.0 23 0.5%
 
11.4 22 0.4%
 
9.1 22 0.4%
 
9.200000000000001 22 0.4%
 
9.9 22 0.4%
 
Other values (791) 4296 85.9%
 

Minimum 5 values

Value Count Frequency (%)  
-1.0 497 9.9%
 
2.2 1 0.0%
 
2.3000000000000003 1 0.0%
 
2.4000000000000004 1 0.0%
 
2.5 1 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
98.2 1 0.0%
 
98.5 4 0.1%
 
98.80000000000001 1 0.0%
 
99.2 1 0.0%
 
99.60000000000001 1 0.0%
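The value -1 appears 497 times in carvalue, carown, cartype, carcatvalue and carbought, which matches the 497 customers with cars = 0. This suggests, but does not prove, that -1 is a "not applicable" placeholder rather than a real measurement. Under that assumption, the sketch below masks the placeholder so it does not distort means or correlations; the helper name `mask_car_sentinels` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

# Car-related columns that show -1 in the report
CAR_COLUMNS = ["carown", "cartype", "carvalue", "carcatvalue", "carbought"]

def mask_car_sentinels(df, car_columns=CAR_COLUMNS, sentinel=-1):
    """Replace the assumed -1 placeholder with NaN for customers who own no car."""
    out = df.copy()
    no_car = out["cars"] == 0
    for col in car_columns:
        if col in out.columns:
            # Cast to float so NaN can be stored, then blank out the placeholder rows
            out[col] = out[col].astype(float).mask(no_car & (out[col] == sentinel))
    return out

# Toy data, used only to illustrate the helper
toy = pd.DataFrame({
    "cars": [0, 2, 1],
    "carown": [-1, 1, 0],
    "carvalue": [-1.0, 28.0, 9.6],
})

cleaned = mask_car_sentinels(toy)
print(cleaned)
```

Whether to mask, impute, or keep the placeholder as its own category depends on the model; this only illustrates the masking option.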
 

churn
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.2532
Value Count Frequency (%)  
0 3734 74.7%
 
1 1266 25.3%
 

commute
Numeric

Distinct count 10
Unique (%) 0.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.9962
Minimum 1
Maximum 10
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
Median 1
Q3 4
95-th percentile 8
Maximum 10
Range 9
Interquartile range 3

Descriptive statistics

Standard deviation 2.7435
Coef of variation 0.91567
Kurtosis -0.045572
Mean 2.9962
MAD 2.2996
Skewness 1.1277
Sum 14981
Variance 7.5269
Memory size 39.1 KiB
Value Count Frequency (%)  
1 2855 57.1%
 
4 635 12.7%
 
8 585 11.7%
 
5 302 6.0%
 
3 295 5.9%
 
10 153 3.1%
 
7 56 1.1%
 
2 50 1.0%
 
6 44 0.9%
 
9 25 0.5%
 

Minimum 5 values

Value Count Frequency (%)  
1 2855 57.1%
 
2 50 1.0%
 
3 295 5.9%
 
4 635 12.7%
 
5 302 6.0%
 

Maximum 5 values

Value Count Frequency (%)  
6 44 0.9%
 
7 56 1.1%
 
8 585 11.7%
 
9 25 0.5%
 
10 153 3.1%
 

commutebike
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.1234
Value Count Frequency (%)  
0 4383 87.7%
 
1 617 12.3%
 

commutebus
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.406
Value Count Frequency (%)  
0 2970 59.4%
 
1 2030 40.6%
 

commutecar
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.679
Value Count Frequency (%)  
1 3395 67.9%
 
0 1605 32.1%
 

commutecarpool
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.2718
Value Count Frequency (%)  
0 3641 72.8%
 
1 1359 27.2%
 

commutecat
Highly correlated

This variable is highly correlated with commute and should be ignored for analysis

Correlation 0.98117

commutemotorcycle
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.1026
Value Count Frequency (%)  
0 4487 89.7%
 
1 513 10.3%
 

commutenonmotor
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.0584
Value Count Frequency (%)  
0 4708 94.2%
 
1 292 5.8%
 

commutepublic
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.0954
Value Count Frequency (%)  
0 4523 90.5%
 
1 477 9.5%
 

commuterail
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.2746
Value Count Frequency (%)  
0 3627 72.5%
 
1 1373 27.5%
 

commutetime
Numeric

Distinct count 42
Unique (%) 0.8%
Missing (%) 0.0%
Missing (n) 2
Infinite (%) 0.0%
Infinite (n) 0
Mean 25.346
Minimum 8
Maximum 48
Zeros (%) 0.0%

Quantile statistics

Minimum 8
5-th percentile 16
Q1 21
Median 25
Q3 29
95-th percentile 35
Maximum 48
Range 40
Interquartile range 8

Descriptive statistics

Standard deviation 5.8791
Coef of variation 0.23196
Kurtosis 0.13487
Mean 25.346
MAD 4.6895
Skewness 0.29028
Sum 126680
Variance 34.564
Memory size 39.1 KiB
Value Count Frequency (%)  
24.0 336 6.7%
 
23.0 335 6.7%
 
27.0 331 6.6%
 
25.0 330 6.6%
 
22.0 325 6.5%
 
26.0 311 6.2%
 
21.0 307 6.1%
 
28.0 293 5.9%
 
29.0 260 5.2%
 
30.0 226 4.5%
 
Other values (31) 1944 38.9%
 

Minimum 5 values

Value Count Frequency (%)  
8.0 1 0.0%
 
9.0 6 0.1%
 
10.0 4 0.1%
 
11.0 9 0.2%
 
12.0 22 0.4%
 

Maximum 5 values

Value Count Frequency (%)  
44.0 4 0.1%
 
45.0 4 0.1%
 
46.0 6 0.1%
 
47.0 1 0.0%
 
48.0 1 0.0%
 

commutewalk
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.3838
Value Count Frequency (%)  
0 3081 61.6%
 
1 1919 38.4%
 

confer
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 0.478
Value Count Frequency (%)  
0 2610 52.2%
 
1 2390 47.8%
 

creddebt
Numeric

Distinct count 4950
Unique (%) 99.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 1.8573
Minimum 0
Maximum 109.07
Zeros (%) 0.0%

Quantile statistics

Minimum 0
5-th percentile 0.10109
Q1 0.38552
Median 0.92644
Q3 2.0638
95-th percentile