MIS 690: Methodology Approach and Model Building
Predictive analytics allows businesses to identify who their customers and prospects are, how to upsell and cross-sell products and services, and how to predict consumer behavior. Building predictive models is an iterative process: the model is built from an initial premise and then improved until it generates a useful organizational outcome.
Predictive model creators and users must have a thorough understanding of the organization’s business operations as well as the industry in which it competes. The following eight steps assist in the development and use of prediction models:
Step 1: Define and scope the predictive analytics model we want to build: In this step, we decide which business processes will be studied as well as the expected business outcomes.
Step 2: Investigate and profile the data: Predictive analytics is a data-intensive process. In this step, we must determine the required data, where it is stored, how accessible it is, and its current condition.
Step 3: Collect, cleanse, and combine data: Once we have determined where the essential data is located, we may need to clean it.
Step 4: Create the prediction model: We develop the hypothesis and then create the test model. Our objective is to include and exclude various variables and components, and then test the model using historical data to determine whether the model’s findings support the hypothesis.
Step 5: Integrate analytics into business processes: To make the model valuable, we integrate it into the business process so that it can be used to help achieve the goal.
Step 6: Monitor the model and measure the business results: We should keep an eye on the model and see how effective it is in producing the expected results. As conditions change, it may be necessary to modify and fine-tune the model.
Step 7: Validate and test the data: This step confirms that our model performs well on a fresh and distinct dataset.
Step 8: Deploy and maintain the model: This entails putting the model into action and tracking its results.
Modeling Methods
For this part of the project, I use the Nurse_Retention_Survey dataset. I load it in R, view a summary of the dataset, and return the total number of observations.
This dataset contains the following information about 31 individuals: Timestamp, Age, Employment, Education, Careunit, Covid-19, Job Expectations, Use of skills, Professional Growth, Salary Satisfaction, Other Income, Life Stress, Work Stress, Salary, Benefits, Nurse to patient ratios, weekends off, Schedule Flexibility, Career advancements.
Test Design
It’s important to split the dataset into training and testing sets before moving forward. In this case, I want to fit a logistic regression model with the glm.fit() function, setting the family argument to binomial. Some variables, such as Education and Careunit, were left out because they are not binary. I used 60% of the dataset as the training set and 30% as the testing set. My raw dataset is attached to this Word document. I will use Timestamp as the response variable and the other variables as predictors to predict the probability that a nurse quits.
The issue I am facing is building the logistic regression model with the glm.fit() function: I am struggling to use Timestamp as the response variable to return the deviance residuals and coefficients and to determine the p-values.
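The training/testing split described in this section can be sketched as follows. This is a minimal Python illustration rather than the R workflow used in this report. The function name, the seed, and the row indices standing in for the 31 survey responses are all hypothetical, and the sketch assumes that every row not placed in the training set goes to the testing set.

```python
import random

def train_test_split(rows, train_fraction=0.6, seed=42):
    """Shuffle rows reproducibly and split them into training and testing sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    # Assumption: all rows after the cut form the testing set.
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the 31 survey responses (row indices only).
observations = list(range(31))
train_rows, test_rows = train_test_split(observations, train_fraction=0.6)
print(len(train_rows), len(test_rows))
```

Fixing the seed keeps the split reproducible, so the same rows land in the training set each time the analysis is rerun.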
Model Building
In a typical linear regression, we can use R^2 to assess how well a model fits the data. This number usually ranges from 0 to 1, with higher values indicating a better model fit. It’s different for logistic regression, where we can instead compute McFadden’s pseudo R^2.
Here are the steps for building the final model:
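McFadden’s pseudo R^2 compares the log-likelihood of the fitted model with that of an intercept-only (null) model: R^2 = 1 - lnL(model) / lnL(null). The Python sketch below shows one way to compute it from observed 0/1 outcomes and fitted probabilities. The function names and the sample values are hypothetical and are not taken from the survey data.

```python
import math

def log_likelihood(y_true, probabilities):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    eps = 1e-12  # keeps log() away from 0 when a probability is exactly 0 or 1
    total = 0.0
    for y, p in zip(y_true, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def mcfadden_r2(y_true, probabilities):
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(null)."""
    base_rate = sum(y_true) / len(y_true)  # the intercept-only model predicts this
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    ll_model = log_likelihood(y_true, probabilities)
    return 1.0 - ll_model / ll_null

# Hypothetical outcomes (1 = quit) and fitted probabilities, for illustration only.
y = [1, 0, 1, 1, 0, 0, 1, 0]
p_hat = [0.9, 0.2, 0.7, 0.8, 0.3, 0.1, 0.6, 0.4]
print(round(mcfadden_r2(y, p_hat), 3))
```

A value near 0 means the predictors add little beyond the overall quit rate, and values closer to 1 indicate a better fit.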
# Define the variables to be used in the model
# pull data into those variables from the training set
# define the model
# run the model on the training set
# make predictions
# summarize & analyze
# compare prediction with actual values based on testing set
# create confusion matrix
# define and calculate metrics for validating the model (accuracy, error rate, precision, recall, F1, etc.)
# explain each metric, the meaning of its value in this context
# quantify the validity of the entire model based on the values of each metric
# Summarize the overall suitability of the model in this context.
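The evaluation steps listed above (compare predictions with actual values, create the confusion matrix, and calculate the metrics) can be sketched in Python as follows. The actual labels, the predicted probabilities, and the 0.5 cutoff are hypothetical values chosen for illustration. In practice the probabilities would come from the fitted logistic regression model applied to the testing set.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels (1 = quit)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts

def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric's denominator is zero."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(cm):
    """Derive accuracy, error rate, precision, recall, and F1 from the counts."""
    total = sum(cm.values())
    accuracy = safe_div(cm["tp"] + cm["tn"], total)
    precision = safe_div(cm["tp"], cm["tp"] + cm["fp"])
    recall = safe_div(cm["tp"], cm["tp"] + cm["fn"])
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical testing-set labels and model probabilities, for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
probabilities = [0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.9, 0.1, 0.4, 0.55]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]  # assumed 0.5 cutoff

cm = confusion_matrix(actual, predicted)
metrics = classification_metrics(cm)
print(cm)
print(metrics)
```

In this context, precision is the share of nurses predicted to quit who actually quit, and recall is the share of nurses who quit that the model identified.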
Because of the issue encountered, I took another approach to the problem using SPSS Statistics. I performed a logistic regression to ascertain the effects of age, salary satisfaction, work environment, etc., on the likelihood that participants, whether or not they are vaccinated against Covid-19, do not quit during the next few months.
The explained variation in the dependent variable Covid-19 vaccination based on the model ranges from 62% to 100% depending on which method we use.
Binomial logistic regression estimates the probability that a nurse vaccinated against Covid-19 quits. The Variables in the Equation table shows the contribution of each independent variable to the model. However, after all variables were selected, they did not appear in the equation.
Had they appeared, variables such as Age (p = 0.011), Employment Years (p = 0.008), CareUnit PCU (p = 0.05), and Care Unit (3) Med Surg (p = 0.006) would add significantly to the model.