Regression Basics
What are predictors and criteria?
According to the regression (linear) model, what are the two parts of variance of the dependent variable? (Write an equation and state in your own words what this says.)
How do changes in the slope and intercept affect (move) the regression line?
What does it mean to choose a regression line to satisfy the loss function of least squares?
How do we find the slope and intercept for the regression line with a single independent variable? (Either formula for the slope is acceptable.)
What does it mean to test the significance of the regression sum of squares? R-square?
Why does testing for the regression sum of squares turn out to have the same result as testing for R-square?
The Linear Model
The linear model assumes that the relations between two variables can be summarized by a straight line.
Jargon. It is customary to call the independent variable X and the dependent variable Y. The X variable is often called the predictor and Y is often called the criterion (the plural of 'criterion' is 'criteria'). It is customary to talk about the regression of Y on X, so that if we were predicting GPA from SAT we would talk about the regression of GPA on SAT.
Scores on a dependent variable can be thought of as the sum of two parts: (1) a linear function of an independent variable, and (2) random error. In symbols, we have:
$Y_i = \alpha + \beta X_i + \varepsilon_i$   (2.1)
Where $Y_i$ is a score on the dependent variable for the ith person, $\alpha + \beta X_i$ describes a line or linear function relating X to Y, and $\varepsilon_i$ is an error. Note that there is a separate score for each X, Y, and error (these are variables), but only one value of $\alpha$ and $\beta$, which are population parameters.
The portion of the equation denoted by $\alpha + \beta X_i$ defines a line. The symbol X represents the independent variable. The symbol $\alpha$ represents the Y intercept, that is, the value that Y takes when X is zero. The symbol $\beta$ describes the slope of the line. It denotes the number of units that Y changes when X changes one unit. If the slope is 2, then when X increases 1 unit, Y increases 2 units. If the slope is -.25, then as X increases 1 unit, Y decreases .25 units. Equation 2.1 is expressed in parameters. We usually have to estimate the parameters.
The equation for estimates rather than parameters is:
$Y_i = a + bX_i + e_i$   (2.2)
If we take out the error part of equation 2.2, we have a straight line that we can use to predict values of Y from values of X, which is one of the main uses of regression. It looks like this:
$Y'_i = a + bX_i$   (2.3)
Equation 2.3 says that the predicted value of Y is equal to a linear function of X. The slope of a line (b) is sometimes defined as rise over run. If Y is the vertical axis, then rise refers to change in Y. If X is the horizontal axis, then run refers to change in X. Therefore, rise over run is the ratio of change in Y to change in X. This means exactly the same thing as the number of units that Y changes when X changes 1 unit (e.g., 2/1 = 2, 10/12 = .833, -5/20 = -.25). Slope means rise over run.
Linear Transformation
The idea of a linear transformation is that one variable is mapped onto another in a one-to-one fashion. A linear transformation allows you to multiply (or divide) the original variable and then to add (or subtract) a constant. In junior high school, you were probably shown the transformation Y = mX + b, but we use Y = a + bX instead. A linear transformation is what is permissible in the transformation of interval scale data in Stevens's taxonomy (nominal, ordinal, interval, and ratio). The value a, the Y intercept, shifts the line up or down the Y-axis. The value of b, the slope, controls how rapidly the line rises as we move from left to right.
One further example may help to illustrate the notion of the linear transformation. We can convert temperature in degrees Centigrade to degrees Fahrenheit using a linear transformation.
Note that the Y intercept is 32, because when X = 0, Y = 32. The slope is rise over run. Run is degrees C, that is, zero to 100, or 100. Rise over the same part of the line is 212 - 32, or 180. Therefore the slope is 180/100, or 1.8. We can write the equation for the linear transformation Y = 32 + 1.8X, or F = 32 + 1.8C.
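In code, a linear transformation is just a + b*X. Here is a minimal sketch in Python (our own illustration, not part of the original notes; the function name is ours):

```python
def c_to_f(c):
    """Linear transformation from Centigrade to Fahrenheit: Y = 32 + 1.8X."""
    return 32 + 1.8 * c

print(c_to_f(0))    # 32.0  (water freezes)
print(c_to_f(100))  # 212.0 (water boils)
```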
Simple Regression Example
The regression problems that we deal with will use a line to transform values of X to predict values of Y. In general, not all of the points will fall on the line, but we will choose our regression line so as to best summarize the relations between X and Y.
Suppose we measured the height and weight of a random sample of adults in shopping malls in the U.S. We want to predict weight from height in the population.
Table 2.1

|  | Ht | Wt |
|---|----|----|
|  | 61 | 105 |
|  | 62 | 120 |
|  | 63 | 120 |
|  | 65 | 160 |
|  | 65 | 120 |
|  | 68 | 145 |
|  | 69 | 175 |
|  | 70 | 160 |
|  | 72 | 185 |
|  | 75 | 210 |
| N | 10 | 10 |
| Mean | 67 | 150 |
| Variance (S²) | 20.89 | 1155.56 |
| Standard Deviation (S) | 4.57 | 33.99 |

Correlation (r) = .94
It is customary to talk about the regression of Y on X, hence the regression of weight on height in our example. The regression equation for our example is Y = -316.86 + 6.97X, where -316.86 is the intercept (a) and 6.97 is the slope (b). We could also write that weight is -316.86 + 6.97(height). The slope value means that for each inch we increase in height, we expect to increase approximately 7 pounds in weight (increase does not mean change in height or weight within a person; rather, it means the expected difference between people who differ by one unit in height). The intercept is the value of Y that we expect when X is zero. So a person 0 inches tall should weigh -316.86 pounds. Of course we do not observe people who are zero inches tall, and we do not observe people with negative weight. It is often the case in psychology that the value of the intercept has no meaningful interpretation. Other examples include SAT scores, personality test scores, and many individual difference variables used as independent variables. Occasionally, however, the intercept does have meaning. Our independent variable might be digits recalled correctly, number of siblings, or some other independent variable defined so that zero has meaning.
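These numbers are easy to check. The sketch below (our own addition, not part of the original notes) fits the least-squares line to the Table 2.1 data with numpy; the variable names are ours:

```python
import numpy as np

ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], dtype=float)        # X
wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], dtype=float)  # Y

# degree-1 polyfit returns the least-squares slope and intercept
b, a = np.polyfit(ht, wt, 1)
print(round(a, 2), round(b, 2))   # -316.86 6.97

# predicted weight for a 65-inch-tall person
print(round(a + b * 65, 2))       # 136.06
```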
The linear model revisited. Recall that the linear model says each observed Y is composed of two parts, (1) a linear function of X, and (2) an error. We can illustrate this with our example.
Figure 2. Weight plotted against height, with the regression line.
We can use the regression line to predict values of Y given values of X. For any given value of X, we go straight up to the line, and then move horizontally to the left to find the value of Y. The value of Y at the line is called the predicted value of Y, and is denoted Y'. The difference between the observed Y and the predicted Y (Y - Y') is called a residual. The predicted Y part is the linear part. The residual is the error.
Table 2.2

| N | Ht | Wt | Y' | Resid |
|---|----|----|--------|--------|
| 1 | 61 | 105 | 108.19 | -3.19 |
| 2 | 62 | 120 | 115.16 | 4.84 |
| 3 | 63 | 120 | 122.13 | -2.13 |
| 4 | 65 | 160 | 136.06 | 23.94 |
| 5 | 65 | 120 | 136.06 | -16.06 |
| 6 | 68 | 145 | 156.97 | -11.97 |
| 7 | 69 | 175 | 163.94 | 11.06 |
| 8 | 70 | 160 | 170.91 | -10.91 |
| 9 | 72 | 185 | 184.84 | 0.16 |
| 10 | 75 | 210 | 205.75 | 4.25 |
| Mean | 67 | 150 | 150.00 | 0.00 |
| Standard Deviation | 4.57 | 33.99 | 31.85 | 11.89 |
| Variance | 20.89 | 1155.56 | 1014.37 | 141.32 |
Compare the numbers in the table for person 5 (height = 65, weight = 120) to the same person on the graph. The regression line at X = 65 is 136.06. The difference between the mean of Y and 136.06 is the part of Y due to the linear function of X. The difference between the line and Y is -16.06. This is the error part of Y, the residual. A couple of other things to note about Table 2.2 that we will come back to:
- The mean of the predicted values (Y') is equal to the mean of the actual values (Y), and the mean of the residual values (e) is equal to zero.
- The variance of Y is equal to the variance of the predicted values plus the variance of the residuals (both facts are verified numerically in the sketch below).
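A minimal numeric check of these two facts (our own sketch; `ht` and `wt` hold the Table 2.1 data):

```python
import numpy as np

ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], dtype=float)
wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], dtype=float)

b, a = np.polyfit(ht, wt, 1)
pred = a + b * ht      # Y', the linear part
resid = wt - pred      # e, the error part

print(round(pred.mean(), 2), wt.mean())   # 150.0 150.0 -- same mean
print(round(resid.mean(), 10))            # 0.0
# sample variances (ddof=1) to match the table
print(round(wt.var(ddof=1), 2))           # 1155.56
print(round(pred.var(ddof=1) + resid.var(ddof=1), 2))  # 1155.56 as well
# (the table's 1014.37 and 141.32 come from Y' rounded to two decimals,
# so they sum to 1155.69; with unrounded Y' the decomposition is exact)
```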
How can we find the location of the line? What are the values of a and b (the estimates of $\alpha$ and $\beta$)?
Finding the regression line: Method 1
It turns out that the correlation coefficient, r, is the slope of the regression line when both X and Y are expressed as z scores. Recall that r is the average of cross products, that is,

$r = \frac{\sum z_X z_Y}{N}$
The correlation coefficient is the slope of Y on X in z-score form, and we already know how to find it. Just find the z scores for each variable, multiply them, and find the average. The correlation coefficient tells us how many standard deviations Y changes when X changes one standard deviation. When there is no correlation (r = 0), Y changes zero standard deviations when X changes 1 SD. When r is 1, then Y changes 1 SD when X changes 1 SD.
The regression b weight is expressed in raw score units rather than z score units. To move from the correlation coefficient to the regression coefficient, we can simply transform the units:
$b = r\frac{S_Y}{S_X}$   (2.4)
This says that the regression weight is equal to the correlation times the standard deviation of Y divided by the standard deviation of X. Note that r shows the slope in z score form, that is, when both standard deviations are 1.0, so that their ratio is 1.0. But we want to know the number of raw score units that Y changes and the number that X changes. So to get the new ratio, we multiply by the standard deviation of Y and divide by the standard deviation of X, that is, multiply r by the raw score ratio of standard deviations.
To find the intercept, a, we compute the following:
$a = \bar{Y} - b\bar{X}$   (2.5)
This says to take the mean of Y and subtract the slope times the mean of X. Now it turns out that the regression line always passes through the mean of X and the mean of Y.
If there is no relationship between X and Y, the best guess for all values of X is the mean of Y. If there is a relationship (b is not zero), the best guess for the mean of X is still the mean of Y, and as X departs from the mean, so does Y. At any rate, the regression line always passes through the means of X and Y. This means that, regardless of the value of the slope, when X is at its mean, so is Y. We can write this as (from equation 2.3):

$\bar{Y} = a + b\bar{X}$
So just subtract and rearrange to find the intercept. Another way to think about this is that we know one point for the line, which is $(\bar{X}, \bar{Y})$. We also know the slope, so we can draw in the line until it crosses the Y-axis, that is, follow the line until X = 0.
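Here is a sketch of Method 1 on our height and weight data (our own code, for illustration): compute r as the average cross product of z scores, then rescale it by the standard deviations as in Equations 2.4 and 2.5.

```python
import numpy as np

ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], dtype=float)
wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], dtype=float)

# r as the average of z-score cross products (N in the denominator, i.e. ddof=0)
zx = (ht - ht.mean()) / ht.std()
zy = (wt - wt.mean()) / wt.std()
r = (zx * zy).mean()
print(round(r, 2))                       # 0.94

# Equation 2.4: rescale r from z-score units to raw-score units
b = r * wt.std(ddof=1) / ht.std(ddof=1)
# Equation 2.5: the line passes through the means
a = wt.mean() - b * ht.mean()
print(round(b, 2), round(a, 2))          # 6.97 -316.86
```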
We can rewrite the regression equation:

$Y' = a + bX = (\bar{Y} - b\bar{X}) + bX = \bar{Y} + b(X - \bar{X})$
This version of the regression equation says to start with the mean of Y, and slide up or down the regression line b times the deviation of X from its mean. For example, look back at Figure 2 and note the deviation of X from the mean. Note the similarity to ANOVA, where you have a grand mean and each factor in the model is expressed in terms of deviations from that mean.
Finding the regression line: Method 2
To find the slope, Pedhazur uses the formula:
$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$   (2.6)
This yields the same result as the formula I gave you in 2.4. To see why this is so, we can start with the formula I gave you for the slope and work down:

$b = r\frac{S_Y}{S_X} = \frac{\sum z_X z_Y}{N}\cdot\frac{S_Y}{S_X} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N S_X S_Y}\cdot\frac{S_Y}{S_X} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N S_X^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$

(using $N S_X^2 = \sum (X - \bar{X})^2$ when the variance is defined with N in the denominator).
This says that the slope is the sum of deviation cross products divided by the sum of squares for X. Of course, this is the same as the correlation coefficient multiplied by the ratio of the two standard deviations.
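A sketch of Method 2 on the same data (again our own illustration), which reproduces the Method 1 slope:

```python
import numpy as np

ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], dtype=float)
wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], dtype=float)

x_dev = ht - ht.mean()   # deviation scores for X
y_dev = wt - wt.mean()   # deviation scores for Y

# Equation 2.6: sum of deviation cross products over the sum of squares for X
b = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
print(round(b, 2))       # 6.97, the same as Method 1
```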
Finding the regression line: The notion of least squares.
What we are about with regression is predicting a value of Y given a value of X. There are many ways we could do this (in actuarial prediction, used by insurance companies, you find what happened in the past and guess that it will happen in the future, e.g., what is the likelihood that a 21-year-old male will crash his Neon?). However, the usual method that we use is to assume that there are linear relations between the two variables. If this is true, then the relations between the two can be summarized with a line. The question now is where to put the line so that we get the best prediction, whatever 'best' means. The statistician's solution to what 'best' means is called least squares. We define a residual to be the difference between the actual value and the predicted value (e = Y - Y'). It seems reasonable that we would like to make the residuals as small as possible, and earlier in our example, you saw that the mean of the residuals was zero. The criterion of least squares defines 'best' to mean that the sum of e² is as small as possible, that is, the smallest sum of squared errors, or least squares. It turns out that the regression line with the choice of a and b I have described has the property that the sum of squared errors is minimum for any line chosen to predict Y from X.
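The 'least' in least squares can be seen numerically. In the sketch below (our own illustration), we perturb the slope of the fitted line while keeping the line through the means; every perturbed line has a larger sum of squared errors:

```python
import numpy as np

ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], dtype=float)
wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], dtype=float)

def sse(a, b):
    """Sum of squared residuals for the line Y' = a + bX."""
    return ((wt - (a + b * ht)) ** 2).sum()

b, a = np.polyfit(ht, wt, 1)
print(round(sse(a, b), 2))                 # about 1271.8, the minimum

for b_alt in (b - 0.5, b + 0.5):
    a_alt = wt.mean() - b_alt * ht.mean()  # keep the line through the means
    print(round(sse(a_alt, b_alt), 2))     # about 1318.8, larger both times
```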
Least squares is called a loss function (for badness of fit, or errors). It is not the only loss function in use. The loss function most often used by statisticians other than least squares is called maximum likelihood. Least squares is a good choice for regression lines because it has been proved that least squares provides estimates that are BLUE, that is, Best (minimum variance) Linear Unbiased Estimates of the regression line. Maximum likelihood estimates are consistent; they become less and less biased as the sample size increases. You will see maximum likelihood (rather than least squares) used in many multivariate applications. ML is also used in a topic we will cover later, logistic regression, often used when the dependent variable is binary.
Partitioning the Sum of Squares
As we saw in Table 2.2, each value of weight (Y) could be thought of as a part due to the linear regression (a + bX, or Y') and a part due to error (e, or Y - Y'). In other words, Y = Y' + e. As we saw in the table, the variance of Y equals the variance of Y' plus the variance of e. Now the variance of Y' is also called the variance due to regression, and the variance of e is called the error variance. We can work this a little more formally by considering each observed score as a deviation from the mean of Y due in part to regression and in part to error.
$Y = \bar{Y} + (Y' - \bar{Y}) + (Y - Y')$

(observed = mean of Y + deviation from the mean due to regression + error part)
Subtract the mean:

$Y - \bar{Y} = (Y' - \bar{Y}) + (Y - Y')$

Squaring and summing over people gives:

$\sum (Y - \bar{Y})^2 = \sum (Y' - \bar{Y})^2 + \sum (Y - Y')^2$   (2.7)
(I finessed a part of the derivation that includes the cross products just before 2.7. The cross products sum to zero.)
This means that the sum of squares of Y equals the sum of squares regression plus the sum of squares of error (residual). If we divide through by N, we have the variance of Y equal to the variance of regression plus the residual variance. For lots of work, we don't bother to use the variance because we get the same result with sums of squares and it's less work to compute them.
The big point here is that we can partition the variance or sum of squares in Y into two parts, the variance (SS) of regression and the variance (SS) of error or residual.
We can also divide through by the sum of squares of Y to get proportions:
$\frac{\sum (Y' - \bar{Y})^2}{\sum (Y - \bar{Y})^2} + \frac{\sum (Y - Y')^2}{\sum (Y - \bar{Y})^2} = 1$   (2.8)
This says that the sum of squares of Y can be divided into two proportions, that due to regression and that due to error. The two proportions must add to 1. Recall our example:
| Wt (Y) | Ȳ | Y-Ȳ | (Y-Ȳ)² | Y' | Y'-Ȳ | (Y'-Ȳ)² | Resid (Y-Y') | Resid² |
|--------|-----|-----|--------|--------|--------|----------|--------|----------|
| 105 | 150 | -45 | 2025 | 108.19 | -41.81 | 1748.076 | -3.19 | 10.1761 |
| 120 | 150 | -30 | 900 | 115.16 | -34.84 | 1213.826 | 4.84 | 23.4256 |
| 120 | 150 | -30 | 900 | 122.13 | -27.87 | 776.7369 | -2.13 | 4.5369 |
| 160 | 150 | 10 | 100 | 136.06 | -13.94 | 194.3236 | 23.94 | 573.1236 |
| 120 | 150 | -30 | 900 | 136.06 | -13.94 | 194.3236 | -16.06 | 257.9236 |
| 145 | 150 | -5 | 25 | 156.97 | 6.97 | 48.5809 | -11.97 | 143.2809 |
| 175 | 150 | 25 | 625 | 163.94 | 13.94 | 194.3236 | 11.06 | 122.3236 |
| 160 | 150 | 10 | 100 | 170.91 | 20.91 | 437.2281 | -10.91 | 119.0281 |
| 185 | 150 | 35 | 1225 | 184.84 | 34.84 | 1213.826 | 0.16 | 0.0256 |
| 210 | 150 | 60 | 3600 | 205.75 | 55.75 | 3108.063 | 4.25 | 18.0625 |
| Sum = 1500 | 1500 | 0 | 10400 | 1500.01 | 0.01 | 9129.307 | -0.01 | 1271.907 |
| Variance |  |  | 1155.56 |  |  | 1014.37 |  | 141.32 |
The total sum of squares for Y is 10400. The sum of squares for regression is 9129.31, and the sum of squares for error is 1271.91. The regression and error sums of squares add to 10401.22, which is a tad off because of rounding error. Now we can divide the regression and error sums of squares by the sum of squares for Y to find proportions:

$\frac{9129.31}{10400} = .88 \qquad \frac{1271.91}{10400} = .12$
We can do the same with the variance of each:

$\frac{1014.37}{1155.56} = .88 \qquad \frac{141.32}{1155.56} = .12$
Both formulas say that the total variance (SS) can be split into two pieces, one for regression and one for error. The two pieces each account for a part of the variance (SS) in Y.
We can also compute the simple correlation between Y and the predicted value of Y, that is, $r_{YY'}$. For these data, that correlation is .94, which is also the correlation between X and Y (observed height and observed weight). (This is so because Y' is a linear transformation of X.) If we square .94, we get .88, which is called R-square, the squared correlation between Y and Y'. Notice that R-square is the same as the proportion of the variance due to regression: they are the same thing. We could also compute the correlation between Y and the residual, e. For our data, the resulting correlation is .35. If we square .35, we get .12, which is the squared correlation between Y and the residual, that is, $r_{Ye}$. This is also the proportion of variance due to error, and it agrees with the proportions we got based on the sums of squares and variances. We could also correlate Y' with e. The result would be zero. There are two separate, uncorrelated pieces of Y, one due to regression (Y') and the other due to error (e).
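These correlations are easy to reproduce (our own sketch, continuing with the same data):

```python
import numpy as np

ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], dtype=float)
wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], dtype=float)

b, a = np.polyfit(ht, wt, 1)
pred = a + b * ht
resid = wt - pred

r_y_pred = np.corrcoef(wt, pred)[0, 1]         # r between Y and Y'
r_y_resid = np.corrcoef(wt, resid)[0, 1]       # r between Y and e
r_pred_resid = np.corrcoef(pred, resid)[0, 1]  # r between Y' and e

print(round(r_y_pred, 2), round(r_y_pred ** 2, 2))    # 0.94 0.88 (R-square)
print(round(r_y_resid, 2), round(r_y_resid ** 2, 2))  # 0.35 0.12
print(round(r_pred_resid, 10))                        # 0.0
```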
Testing the Significance of the Regression and of R-square
Because R-square is the same as the proportion of variance due to the regression, which is the same as the proportion of the total sum of squares due to regression, testing one of these is the same as testing any of them. The test statistics that we will use follow the F distribution. They yield identical results because they test the same thing. To test for the significance of the regression sum of squares:
$F = \frac{SS_{reg}/k}{SS_{res}/(N - k - 1)}$   (2.9)
where k is the number of independent variables or predictors, and N is the sample size. In our example, k is 1 because there is one independent variable, and N is 10. The statistic computed in Equation 2.9 is a ratio of two mean squares (variance estimates), which is distributed as F with k and (N - k - 1) degrees of freedom when the null hypothesis (that the regression slope is zero) is true.
In our example, the computation is:

$F = \frac{9129.31/1}{1271.91/8} = \frac{9129.31}{158.99} = 57.42$
To test for R-square, we use the formula:
$F = \frac{R^2/k}{(1 - R^2)/(N - k - 1)}$   (2.10)
where N and k have the same meaning as before, and $R^2$ is the squared correlation between Y and Y'.
In our example, the computation is:

$F = \frac{.88/1}{(1 - .88)/8} = \frac{.88}{.015} = 58.67$
which is the same as our earlier result within rounding error. In general, the results will be exactly the same for these two tests except for rounding error.
To anticipate a little, soon we will be using multiple regression, where we have more than one independent variable. In that case, instead of r (the correlation) we will have R (the multiple correlation), and instead of $r^2$ we will have $R^2$, so the capital R indicates multiple predictors. However, the test for $R^2$ is the one just mentioned, that is,

$F = \frac{R^2/k}{(1 - R^2)/(N - k - 1)}$
So, if we had 2 independent variables and $R^2$ was .88, F would be:

$F = \frac{.88/2}{(1 - .88)/(10 - 2 - 1)} = \frac{.44}{.01714} = 25.67$
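Both tests are easy to verify numerically. A sketch (our own code) that computes the two F statistics for the single-predictor example without intermediate rounding, showing that they are identical:

```python
import numpy as np

ht = np.array([61, 62, 63, 65, 65, 68, 69, 70, 72, 75], dtype=float)
wt = np.array([105, 120, 120, 160, 120, 145, 175, 160, 185, 210], dtype=float)

b, a = np.polyfit(ht, wt, 1)
pred = a + b * ht
resid = wt - pred

N, k = len(wt), 1
ss_reg = ((pred - wt.mean()) ** 2).sum()
ss_res = (resid ** 2).sum()

# Equation 2.9: F from the sums of squares
F_ss = (ss_reg / k) / (ss_res / (N - k - 1))

# Equation 2.10: F from R-square
r2 = ss_reg / (ss_reg + ss_res)
F_r2 = (r2 / k) / ((1 - r2) / (N - k - 1))

print(round(F_ss, 2), round(F_r2, 2))   # both about 57.42
```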
Source: http://faculty.cas.usf.edu/mbrannick/regression/regbas.html