7.1 Explain the correlation of data points to a given equation.
7.2 Determine assumptions of the regression model.
Reading Assignment
Chapter 4: Regression Models, pp. 114–123
Unit Lesson

Regression Models

In business and government, situations occur that lead observers to wonder: is the change in X related to the change in Y? Indeed, they may see Y change every time X does and already conclude that the two are correlated, but they just cannot tell how much without analysis. It follows that after seeing something affect a change in something else in a field of one's concern, the next question that arises is: how much will Y change when X changes? Data that can be grouped along a sloping line can lead to a mathematical answer to that question, since any straight line on a graph can be written in the form y = mx + b (these terms will be defined shortly). So you do not have to wonder about data and changes; you can use regression analysis to calculate the change. There are two reasons to use regression analysis:
1. to understand the relationship between variables as shown by a collected pattern of data, and
2. to predict the value of one variable if the value of the other variable is known or set.
Sections 4.2 through 4.6, pages 114–124 of the textbook, walk you through taking a scatter plot of data and using a line (and simple linear regression) to model the correlation, predict an unknown value of Y given a set value for X, and determine how well the model's linear equation fits the data situation in terms of error and standard deviation. The error and standard deviation could be very small, showing that the linear equation fits well and the data is clustered very close together in relative value. Or, the error and standard deviation could be wide apart, showing that the predictions of correlations will not be that good, and leading you to wonder whether you have the right linear equation modeling the data. These are issues supporting analysts will work on if the solutions do not look right. To orient yourself on using regression to model correlation, follow the textbook's example (page 114) of the Triple A Construction Company. Note the pattern of the six data points (the local payroll amounts) in Figure 4.1, page 115 of the textbook. When plotted on a graph of X/Y axes, these form a scatter plot, which can be modeled by a line with a certain position and slope. Of course, the standard mathematics line equation, y = mx + b, can model this line and any other straight line, but the difference is in the values for the variables. In terms of linear regression, the equation for a line becomes:
Y = β0 + β1X + ε, where
UNIT VII STUDY GUIDE
Correlation
MSL 5080, Methods of Analysis for Business Operations 2
Y = dependent (response) variable
X = independent variable
β0 = Y-intercept (the value of Y when X = 0)
β1 = slope of the line
ε = random error

When linear regression is used as a model to solve a correlation, β0 and β1 are not known, but they are estimated with sample data. Rewrite the linear (regression) equation based on sample data as:
Ŷ = b0 + b1X, where

Ŷ = predicted value of Y
b0 = estimate of β0 based on sample results
b1 = estimate of β1 based on sample results

You could try a line and eyeball it so you can report that you have a close model, but what you really must do to be accurate is to determine the position of the line with minimal error. Error is defined with some common sense:
Error = actual value − predicted value

In terms of the linear regression equation, this means:
e = Y − Ŷ

Errors are squared so that an error in a negative direction does not cancel out an error in a positive direction, which would make the predicted values look more accurate than they may be. The best regression line, then, is the one with the minimum sum of squared errors, which is why regression analysis is also termed least-squares regression. Note how you can find b0 and b1: by taking the averages of X and Y (summing all the X values and dividing the sum by the number of X values, and doing the same for the Y values), you place the resulting averages, X̄ and Ȳ, into the formulas for b1 and b0:

b1 = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²
b0 = Ȳ − b1X̄
As indicated in the textbook, you can sum these data points by hand, but for cumbersome amounts of data, software can do this for us. Here, as in the textbook's Triple A Construction Company case with its six data points, calculating manually will give:
(Render, Stair, Hanna, & Hale, 2015)
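As a minimal sketch of this manual calculation, the least-squares estimates can be computed directly from the averages. The six data points below are an assumption reproduced from the Triple A Construction example (verify them against page 114 of the textbook); they are consistent with the fitted values discussed next:

```python
# Least-squares estimates b1 (slope) and b0 (intercept) from sample data.
# Data points assumed from the Triple A Construction example (check textbook p. 114).
X = [3, 4, 6, 4, 2, 5]      # payroll (hundreds of millions of dollars)
Y = [6, 8, 9, 5, 4.5, 9.5]  # sales (hundreds of thousands of dollars)

n = len(X)
x_bar = sum(X) / n          # average of the X values
y_bar = sum(Y) / n          # average of the Y values

# b1 = sum of (X - X̄)(Y - Ȳ) divided by the sum of (X - X̄)²
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar     # intercept follows from the averages

print(b0, b1)  # → 2.0 1.25
```

With these estimates the fitted line is Ŷ = 2 + 1.25X; for example, a payroll of 6 (hundred million dollars) predicts sales of 2 + 1.25(6) = 9.5 (hundred thousand dollars).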
And in the equation Ŷ = b0 + b1X, this gives Ŷ = 2 + 1.25X, or sales = 2 + 1.25(payroll), which enables us to estimate the predicted value of sales for whatever amount the payroll would be set. Also as noted, finding the numbers for the linear regression equation shows us the relationship between the variables. Here, you can see how sales should move, given certain payroll amounts (do not forget that payroll is in units of hundreds of millions and sales is in units of hundreds of thousands).

Measuring the Fit

As previously addressed, you can try linear regression equations and settle on one that calculations show is a good fit, but the issue of the amount of error will persist, can be argued over, and finally tends to lead analysts to find out how much error is in an equation and which equations fit with the smallest error. To address these issues and ward off objections to calculations, analysts developed the sum of squares total (SST), the sum of squares error (SSE), the sum of squares regression (SSR), and methods to test for significance. The reason you square terms in these equations, as you have in past units, is that an error with a negative value may cancel out an error with a positive value when these are added, making the regression model equation appear to have a smaller error than it really has. Squared terms are always positive, so that problem is eliminated by converting the error formulas to ones where the error values are squared. So:
Sum of squares total = SST = Σ(Y − Ȳ)², where Ȳ is the average of the Y values
Sum of squares error = SSE = Σe² = Σ(Y − Ŷ)²

The sum of squares regression, which shows how much of Y's variability is explained by the regression equation, is:
SSR = Σ(Ŷ − Ȳ)²

These sums are related: SST = SSR + SSE. As noted in the textbook on page 118, these measuring tools can be seen as the SSR showing the explained variability in Y and the SSE showing the unexplained variability in Y. The proportion of the total variability that is explained is called the coefficient of determination, r², and it is calculated with SST, SSE, and SSR like this:
r² = SSR/SST = 1 − SSE/SST
A value for r² is the percentage of the variability of Y explained by the regression equation, such as the one developed for payroll (X) in the Triple A Construction Company example. This discussion can now tie in the title of the unit, Correlation. The coefficient of correlation, r, is the square root of r² and shows the strength of the correlation captured by the regression equation. Note the four examples in Figure 4.3 of the textbook, and how in two cases, (a) and (d), the data points are aligned in an exact line, so that line has perfect correlation, as shown here:
(Render, et al., 2015)
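The fit measures above can be sketched in a few lines of arithmetic. This is a minimal illustration assuming the same six Triple A data points and the fitted line Ŷ = 2 + 1.25X discussed earlier (verify the inputs against the textbook):

```python
# Sums of squares and the coefficients of determination and correlation
# for the fitted line Ŷ = 2 + 1.25X (data assumed from the Triple A example).
X = [3, 4, 6, 4, 2, 5]      # payroll (hundreds of millions of dollars)
Y = [6, 8, 9, 5, 4.5, 9.5]  # sales (hundreds of thousands of dollars)

y_bar = sum(Y) / len(Y)                  # average of the Y values, Ȳ
Y_hat = [2 + 1.25 * x for x in X]        # predicted values Ŷ

SST = sum((y - y_bar) ** 2 for y in Y)                # total variability in Y
SSE = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))   # unexplained variability
SSR = sum((yh - y_bar) ** 2 for yh in Y_hat)          # explained variability

r_squared = SSR / SST    # proportion of Y's variability explained (SST = SSR + SSE)
r = r_squared ** 0.5     # coefficient of correlation; positive here, matching the slope

print(round(SST, 4), round(SSE, 4), round(SSR, 4))  # → 22.5 6.875 15.625
print(round(r_squared, 4), round(r, 4))             # → 0.6944 0.8333
```

Note that SSR + SSE reproduces SST exactly, so either form of the r² formula gives the same answer.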