CS628 – Data Science Project # 5: Visualization Assignment
Monroe College
For the following questions (1 to 3), work with the bank_marketing_training data set. Use Python.
Attached is the file. Start with the following code.
# import required package
import pandas as pd

# read the csv bank_train data using the pandas package
bank_train = pd.read_csv("/Users/edeki/Desktop/Website Data Sets/bank_marketing_training")
1. Create a bar graph of the previous_outcome variable, with response overlay.
2. Create a normalized bar graph of the previous_outcome variable, with response overlay. Describe the relationship between previous_outcome and response.
3. Examine the non-normalized and normalized histograms of duration, with overlay of response. Identify cutoff point(s) for duration that separate low values of response from high values. Define a new categorical variable, duration_binned, using the cutoff points you identified.
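Hint for question 3: a minimal sketch of one possible approach, using made-up toy rows in place of bank_train. The bin edges, the cutoff values (250 and 500), and the labels ("short", "medium", "long") are placeholders chosen for illustration; they are not the cutoffs you are expected to find in the real data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy rows standing in for bank_train.
toy = pd.DataFrame({
    "duration": [30, 90, 150, 250, 320, 480, 610, 900],
    "response": ["no", "no", "no", "no", "yes", "no", "yes", "yes"],
})
dur_no = toy.loc[toy["response"] == "no", "duration"].to_numpy()
dur_yes = toy.loc[toy["response"] == "yes", "duration"].to_numpy()
bins = np.arange(0, 1001, 250)  # placeholder bin edges

# Non-normalized histogram: stacked counts per bin.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist([dur_no, dur_yes], bins=bins, stacked=True, label=["no", "yes"])
ax1.set_title("duration with response overlay")
ax1.legend()

# Normalized histogram: share of each response within every bin.
no_counts, _ = np.histogram(dur_no, bins=bins)
yes_counts, _ = np.histogram(dur_yes, bins=bins)
totals = no_counts + yes_counts
no_share = np.divide(no_counts, totals, out=np.zeros(len(totals)), where=totals > 0)
yes_share = np.divide(yes_counts, totals, out=np.zeros(len(totals)), where=totals > 0)
ax2.bar(bins[:-1], no_share, width=250, align="edge", label="no")
ax2.bar(bins[:-1], yes_share, width=250, align="edge", bottom=no_share, label="yes")
ax2.set_title("Normalized duration with response overlay")
ax2.legend()

# Bin duration at the placeholder cutoffs; right=False makes bins [low, high).
toy["duration_binned"] = pd.cut(toy["duration"], bins=[0, 250, 500, np.inf],
                                labels=["short", "medium", "long"], right=False)
```

Replace the placeholder cutoffs with the points where the proportion of response changes noticeably in your own normalized histogram.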
For the following questions (4 to 5), work with the adult_ch3_training data set.
# read the csv adult_ch3_training data using the pandas package
adult_ch3_train = pd.read_csv("/Users/edeki/Desktop/Website Data Sets/adult_ch3_training")
4. Consider capital-loss. Identify the outliers in capital-loss using the Z-score method. How many outliers are there?
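Hint for question 4: a minimal sketch of the Z-score calculation on made-up toy values. The column name "capital-loss" and the rule of flagging records with |z| > 3 are assumptions here; check the actual column name in adult_ch3_train and the threshold used in your course materials.

```python
import pandas as pd

# Hypothetical toy column standing in for adult_ch3_train["capital-loss"].
toy = pd.DataFrame({"capital-loss": [0] * 19 + [2000]})

x = toy["capital-loss"]
# Z-score: distance from the mean in units of standard deviation.
# pandas .std() uses the sample standard deviation (ddof=1) by default.
toy["capital-loss_z"] = (x - x.mean()) / x.std()

# Assumed rule: a record is an outlier when |z| > 3.
outliers = toy[toy["capital-loss_z"].abs() > 3]
n_outliers = len(outliers)
```

With the real data, n_outliers is the count the question asks for, and outliers holds the records needed for the next question.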
5. Construct a bar graph of Income for these outlier records.
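Hint for question 5: a minimal sketch, assuming the outlier records from question 4 are already in a data frame. The rows below are made up, and the column name "income" and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical outlier records; in the assignment these come from question 4.
outliers = pd.DataFrame({
    "capital-loss": [2000, 2400, 3000],
    "income": ["<=50K", ">50K", ">50K"],
})

# Count the records in each income category, then draw one bar per category.
income_counts = outliers["income"].value_counts()
ax = income_counts.plot(kind="bar", title="Income for capital-loss outlier records")
ax.set_xlabel("income")
ax.set_ylabel("count")
```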