Mastery Component – Version 1 MATH 2560H – Wesley Burr
Mastery Component: Nonparametric Confidence Intervals
Please complete all of the problems in RStudio using R Markdown, and knit to PDF for submission.
Problem 1. Write a simple function which, given two data sets dat1 and dat2 and a significance level alpha, returns the 100(1 - alpha)% bootstrap percentile confidence interval for the difference in means of dat1 and dat2.
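One possible shape for such a function is sketched below in base R. The name boot_pct_diff, the default number of resamples B, and the simulated inputs are illustrative assumptions and not part of the assignment; alpha is taken to be the significance level, so the interval has nominal level 1 - alpha.

```r
# Hypothetical helper: percentile bootstrap CI for mean(x) - mean(y).
# Each group is resampled independently, with replacement, at its own size.
boot_pct_diff <- function(x, y, alpha = 0.05, B = 10000) {
  nx <- length(x)
  ny <- length(y)
  diffs <- replicate(B, {
    xs <- x[sample.int(nx, replace = TRUE)]
    ys <- y[sample.int(ny, replace = TRUE)]
    mean(xs) - mean(ys)
  })
  # The interval endpoints are the alpha/2 and 1 - alpha/2 quantiles
  # of the bootstrap distribution of the difference in means.
  quantile(diffs, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(2560)
x <- rnorm(40, mean = 10, sd = 2)  # simulated data, for illustration only
y <- rnorm(50, mean = 8, sd = 3)
ci <- boot_pct_diff(x, y, alpha = 0.05, B = 2000)
ci
```

Indexing with sample.int avoids the special behaviour of sample() when it is handed a single number.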
Problem 2. Use the datasets package to import the data set airquality. This data set contains daily air quality measurements in New York from May to September 1973.
1. Create a histogram and compute summary statistics for the solar radiation in Langleys (the Solar.R vector). What do you observe?
2. Remove the missing observations from the Solar.R vector.
3. Bootstrap the distribution of the mean for Solar.R.
4. Find and interpret the 95% bootstrap percentile confidence interval for the mean.
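The resampling steps in this problem can be sketched roughly as follows. The seed, the number of resamples B, and the object names are arbitrary choices made for illustration; the histogram, summary statistics and interpretation are left out.

```r
# airquality ships with the datasets package, which R attaches by default.
solar <- airquality$Solar.R
solar <- solar[!is.na(solar)]          # drop the missing observations

set.seed(1973)                         # arbitrary seed, for reproducibility
B <- 5000                              # assumed number of bootstrap resamples
boot_means <- replicate(B, mean(sample(solar, replace = TRUE)))

# 95% percentile interval: the 2.5% and 97.5% quantiles of the
# bootstrap distribution of the sample mean.
ci <- quantile(boot_means, probs = c(0.025, 0.975), names = FALSE)
ci
```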
Problem 3. Consider the dataset Girls2004 (from the package resampledata) containing birth weights of baby girls born in Wyoming or Alaska.
(a) Compute summary statistics of the birth weights in each of the states.
(b) Compute the 92% bootstrap t confidence interval for the difference in the mean birth weights between the two states.
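As a reminder of how a two-sample bootstrap t interval is usually assembled, here is a minimal sketch on simulated data (not Girls2004). The function name boot_t_diff and its defaults are hypothetical. It studentizes each resampled difference with its own standard error and then inverts the quantiles of the resulting statistic.

```r
# Hypothetical helper: bootstrap t CI for mean(x) - mean(y), independent samples.
boot_t_diff <- function(x, y, alpha = 0.05, B = 10000) {
  nx <- length(x)
  ny <- length(y)
  est <- mean(x) - mean(y)
  se <- sqrt(var(x) / nx + var(y) / ny)
  t_star <- replicate(B, {
    xs <- x[sample.int(nx, replace = TRUE)]
    ys <- y[sample.int(ny, replace = TRUE)]
    (mean(xs) - mean(ys) - est) / sqrt(var(xs) / nx + var(ys) / ny)
  })
  q <- quantile(t_star, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  # Note the reversal: the upper quantile of T* gives the lower endpoint.
  c(lower = est - q[2] * se, upper = est - q[1] * se)
}

set.seed(2004)
g1 <- rnorm(40, mean = 3500, sd = 400)  # simulated weights, illustration only
g2 <- rnorm(40, mean = 3200, sd = 450)
ci92 <- boot_t_diff(g1, g2, alpha = 0.08, B = 2000)
ci92
```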
Problem 4. The resampledata package contains a data set called MobileAds. Google was interested in comparing the amount that advertisers paid before (m.cost_pre) and after (m.cost_post) Google's recommendation. We will restrict our attention to the mobile platform (indicated by the m in the variable names).
(a) Create histograms and compute summary statistics for the two variables, and describe the results.
(b) Notice that this is a matched-pairs data set. Explain why, and use this knowledge when answering the following questions.
(c) Find and interpret the 94% bootstrap t confidence interval for the mean difference in cost before and after Google's recommendation.
(d) Find and interpret the 94% bootstrap percentile confidence interval for the mean difference in cost before and after Google's recommendation.
(e) Compare the two confidence intervals.
(f) What could be a reason why the two confidence intervals are different? (7.5.1)
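For matched pairs, one common approach is to reduce the data to within-pair differences and treat them as a single sample. The sketch below does this on simulated, deliberately skewed data (not MobileAds); the variable names, seed and B are assumptions for illustration only.

```r
set.seed(94)
pre  <- rexp(50, rate = 1 / 2)                 # simulated "before" costs
post <- pre + rnorm(50, mean = 0.1, sd = 0.5)  # simulated "after" costs
d <- post - pre                                # one difference per pair

n <- length(d)
B <- 5000
alpha <- 0.06                                  # 94% confidence level
se <- sd(d) / sqrt(n)

# Resample whole pairs by resampling the differences; keep both the
# resampled mean and its studentized version.
boot <- replicate(B, {
  ds <- d[sample.int(n, replace = TRUE)]
  c(mean = mean(ds), t = (mean(ds) - mean(d)) / (sd(ds) / sqrt(n)))
})

probs <- c(alpha / 2, 1 - alpha / 2)
pct_ci <- quantile(boot["mean", ], probs = probs, names = FALSE)
q <- quantile(boot["t", ], probs = probs, names = FALSE)
t_ci <- c(mean(d) - q[2] * se, mean(d) - q[1] * se)

rbind(percentile = pct_ci, bootstrap_t = t_ci)
```

Printing the two intervals side by side makes any asymmetry around the observed mean difference easy to see.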
Problem 5. In this question we will use the Verizon data set, which can be loaded using the resampledata package. We will compare the mean repair times for ILEC customers with those for CLEC customers; ideally, there should be no difference.
(a) How many ILEC customers are in the sample? How many CLEC customers are in the sample?
(b) Create exploratory plots of the times for the two groups and comment on the distributions.
(c) Compute the 95% confidence interval for the difference in means using the formula t, bootstrap percentile, and bootstrap t methods. Compare the results. Which interval would you report?
(d) Modify your results from above to obtain a one-sided 95% t confidence interval (hypothesizing that ILEC customers have lower repair times compared to CLEC customers).
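For the formula t method, base R's t.test returns both two-sided and one-sided intervals. The sketch below uses simulated repair times (not the Verizon data) and assumes the difference is taken as ILEC minus CLEC, so "ILEC lower" corresponds to alternative = "less" and an interval with only an upper bound. A bootstrap t analogue would use a single alpha quantile of the studentized statistic in place of the alpha/2 pair.

```r
set.seed(5)
ilec <- rexp(200, rate = 1 / 8)   # simulated repair times, illustration only
clec <- rexp(25, rate = 1 / 16)

# Two-sided 95% Welch t interval for mean(ilec) - mean(clec).
two_sided <- t.test(ilec, clec, conf.level = 0.95)$conf.int

# One-sided 95% interval: lower endpoint is -Inf, upper endpoint is finite.
one_sided <- t.test(ilec, clec, alternative = "less",
                    conf.level = 0.95)$conf.int

two_sided
one_sided
```

The one-sided upper bound is tighter than the two-sided one because it uses the 0.95 rather than the 0.975 quantile of the t distribution.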