This assignment covers all material taught so far.
· Data set: 2015 Carbon-di-oxide Emissions by Engine Size_DAP Assignment 2_2021.xlsx
· N=1132
· Included: 9 variables (including year 2015)
· Source: https://open.canada.ca/en
· File: 2015 Carbon di-oxide Emissions By Engine Size_DAP Assignment xlsx
Please note that the data follow the metric system. Please present your results in metric units.
SUBMISSION:
· 1 File (PDF file with your responses to Questions 1-5 in the same order listed). I am NOT looking for a report.
· 1 JMP file with your final models and analysis loaded as scripts. Please add a team name to your file.
Motivation: We seek to evaluate the linkages between the engine size of an automobile (and other design aspects of an automobile) and its carbon-di-oxide emissions. This data set is a subset of a much larger data set available from Open Canada.
Two main questions:
1. Is the level of CO2 emissions of an automobile positively associated with the size of the automobile's engine after accounting for whether the automobile has 4, 6, or 8 cylinders?
2. Is the level of CO2 emissions of an automobile positively associated with the number of cylinders (4, 6, or 8) after accounting for the size of the automobile's engine?
Clean and Load the Data: This dataset is very clean: column names are clearly specified, and descriptors for all variables appear at the bottom of the file. Remove the extra lines, and load your data (for 4, 6, or 8 cylinders) into JMP. The project steps are listed below.
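The cleaning step above can be sketched outside JMP as well. The snippet below is a minimal illustration only: it assumes the sheet has been exported to CSV, and the column names (`Make`, `EngineSize`, `Cylinders`, `CO2`) and all sample values are hypothetical, not taken from the assignment file. It keeps rows with 4, 6, or 8 cylinders and drops anything whose cylinder field is not an integer, such as descriptor lines at the bottom.

```python
import csv
import io

KEEP_CYLINDERS = {4, 6, 8}


def load_rows(csv_text, cyl_col="Cylinders"):
    """Return rows (as dicts) whose cylinder count is 4, 6, or 8.

    Rows whose cylinder field is missing or not an integer (for example,
    descriptor lines at the bottom of the sheet) are dropped.
    """
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            cyl = int(row[cyl_col])
        except (KeyError, TypeError, ValueError):
            continue
        if cyl in KEEP_CYLINDERS:
            kept.append(row)
    return kept


# Hypothetical sample; the column names and values are made up.
sample = """Make,EngineSize,Cylinders,CO2
A,2.0,4,180
B,3.5,6,250
C,5.0,8,330
D,6.0,12,420
Descriptor text,,,
"""

rows = load_rows(sample)
print(len(rows), "rows kept")
```

The 12-cylinder row and the descriptor line are both excluded; the same filter can be applied in JMP with a row selection on the cylinder column.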
1. Hypotheses (Mechanics 8 points)
· Develop the 2 null hypotheses, one for each of the questions listed, and set up the analysis. These hypotheses are how you will answer the questions above: you will develop regression models to test Hypothesis 1 and Hypothesis 2.
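One possible way to write such hypotheses formally is sketched below. This is an assumption about the model form, not a required specification: it treats 4 cylinders as the baseline, with hypothetical indicator variables D_6 and D_8 for 6- and 8-cylinder vehicles, and Size for engine size.

```latex
\begin{align*}
\text{CO2}_i &= \beta_0 + \beta_1\,\text{Size}_i + \beta_2\,D_{6,i} + \beta_3\,D_{8,i} + \varepsilon_i \\
\text{Question 1:}\quad & H_0:\ \beta_1 = 0 \quad\text{vs.}\quad H_a:\ \beta_1 > 0 \\
\text{Question 2:}\quad & H_0:\ \beta_2 = \beta_3 = 0 \quad\text{vs.}\quad H_a:\ \beta_2 > 0 \ \text{or}\ \beta_3 > 0
\end{align*}
```

Under this sketch, each null hypothesis states that the variable of interest has no association with CO2 emissions once the other variable is in the model.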
2. Variable Analysis (Mechanics) (12 points)
1. Identify which one will be the response variable (Y) as you conduct the analysis: Carbon-di-oxide emissions, Engine Size, Number of cylinders.
2. Visualize the data.
3. Evaluate the distributions of the variables (be sure to report summary statistics: mean, median, standard deviation, and other statistical aspects of the distributions) and comment on these distributions.
4. Present the Pearson correlation matrix and scatterplots between the variables. Explain why this is important.
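As a rough illustration of the summary statistics and Pearson correlations asked for above, the sketch below computes them with the Python standard library. The data values and variable names are made up for illustration; JMP's Distribution and Multivariate platforms produce the same quantities.

```python
import math
import statistics


def summarize(values):
    """Mean, median, and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD (divides by n - 1)
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, not taken from the assignment data.
variables = {
    "engine_size": [1.6, 2.0, 2.5, 3.5, 5.0, 6.2],
    "cylinders": [4, 4, 4, 6, 8, 8],
    "co2": [150, 180, 200, 250, 330, 380],
}

for name, values in variables.items():
    print(name, summarize(values))

# Correlation matrix, printed one row per variable.
names = list(variables)
for a in names:
    print(a, [round(pearson_r(variables[a], variables[b]), 3) for b in names])
```

A high correlation between the two predictors (engine size and cylinders) is worth noting before fitting a model that contains both.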
3. Regression Model 1 (Methods) (35 points)
1. Develop the regression models to analyze Hypothesis 1.
2. How will you handle the number-of-cylinders variable (4, 6, or 8 cylinders)? Develop the required variables and proceed to estimate the model.
3. Analyze the regression model (global fit and parameters).
4. Present your message(s) and conclusion on the hypothesis that you set up.
5. Are the assumptions of linear regression met? (Look at the residual plots, residual by predicted and residual by X, and the normal quantile plots to answer this question.)
6. Are there any outliers? How can you be sure? (Extra credit: 3 points.) Note: We will not be talking about outliers until next week, and the project deadline is Oct 10, so I am not expecting a response to this question in your report. If, however, you compile your thoughts and add a short description after the class, this will count towards extra credit. (A formal analysis will be part of your Assignment 3, after the exam.)
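One common way to handle a three-level cylinder variable is indicator (dummy) coding with one level as the baseline. The sketch below is a minimal illustration under that assumption, using made-up data and hypothetical function names; it fits the model by solving the normal equations and computes the residuals that the residual plots are built from. It is not a substitute for JMP's Fit Model output.

```python
def dummy_code(cylinders, baseline=4):
    """Return 0/1 indicator columns for every level except the baseline."""
    levels = sorted(set(cylinders) - {baseline})
    rows = [[1.0 if c == level else 0.0 for level in levels] for c in cylinders]
    return rows, levels


def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors, y):
    """Least-squares fit with an intercept; returns [b0, b1, ...]."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, predictors):
    """Fitted values for each row of predictors."""
    return [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in predictors]


# Made-up illustrative values, not taken from the assignment data.
engine_size = [1.6, 2.0, 2.5, 3.0, 3.5, 3.7, 5.0, 5.7, 6.2]
cylinders = [4, 4, 4, 6, 6, 6, 8, 8, 8]
co2 = [152, 165, 178, 214, 226, 235, 296, 318, 330]

dummies, levels = dummy_code(cylinders)  # 4 cylinders is the baseline
predictors = [[size] + d for size, d in zip(engine_size, dummies)]
coefs = fit_ols(predictors, co2)
residuals = [obs - fit for obs, fit in zip(co2, predict(coefs, predictors))]

labels = ["intercept", "engine_size"] + [f"cyl_{lv}" for lv in levels]
print(dict(zip(labels, coefs)))
print([round(r, 2) for r in residuals])
```

With this coding, the `cyl_6` and `cyl_8` coefficients are differences from the 4-cylinder baseline at a fixed engine size, and plotting `residuals` against the fitted values gives the residual-by-predicted plot.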
4. Regression Model 2 (Methods) (35 points)
2. Develop the regression models to analyze Hypothesis 2. How will you handle the number-of-cylinders variable (4, 6, or 8 cylinders)?
3. Analyze the regression model (global fit and parameters).
4. Write down the prediction equation and document if all assumptions are met.
5. Present your message(s) and conclusions on the hypothesis you set up.
6. What is the engine size effect on emissions for an 8-cylinder engine? (Write down the equation and analyze it.)
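As a hedged illustration of how a group-specific equation can be read off a fitted model, suppose (this is an assumption, not a required specification) that the model uses 4 cylinders as the baseline, indicator variables D_6 and D_8, and engine-size interactions. Setting D_6 = 0 and D_8 = 1 gives the 8-cylinder equation, where b_0, ..., b_5 denote the estimated coefficients:

```latex
\begin{align*}
\text{CO2} &= \beta_0 + \beta_1\,\text{Size} + \beta_2\,D_6 + \beta_3\,D_8
  + \beta_4\,(\text{Size}\times D_6) + \beta_5\,(\text{Size}\times D_8) + \varepsilon \\
\text{8-cylinder } (D_6 = 0,\ D_8 = 1):\quad
\widehat{\text{CO2}} &= (b_0 + b_3) + (b_1 + b_5)\,\text{Size}
\end{align*}
```

In this sketch the engine size effect for an 8-cylinder vehicle is the slope b_1 + b_5. If the model has no interaction terms, the b_5 term drops out and the slope is b_1 for every cylinder group, with only the intercept shifting by b_3.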
5. Final Conclusions (Conclusions and Message(s) 10 points):
1. What is your final conclusion on the effect of cylinders and engine size on carbon-di-oxide emissions? Base your conclusion on both models. (6)
2. Craft the key “message(s)” based on the conclusion(s) you just arrived at. (2)
3. What else could you have done, given that you have seen the variables in the data?