This section discusses hypothesis tests on the regression coefficients in multiple linear regression. As in the case of simple linear regression, these tests can only be carried out if it can be assumed that the random error terms, $\epsilon_i$, are normally and independently distributed with a mean of zero and a variance of $\sigma^2$.
Three types of hypothesis tests can be carried out for multiple linear regression models:
Test for significance of regression
This test checks the significance of the whole regression model.
t test
This test checks the significance of individual regression coefficients.
F test
This test can be used to simultaneously check the significance of a number of regression coefficients. It can also be used to test individual coefficients.
The test for significance of regression in the case of multiple linear regression analysis is carried out using the analysis of variance. The test is used to check whether a linear statistical relationship exists between the response variable and at least one of the predictor variables. The statements for the hypotheses are:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
$$H_1: \beta_j \neq 0 \text{ for at least one } j$$
The test for $H_0$ is carried out using the following statistic:
$$F_0 = \frac{MS_R}{MS_E}$$
where $MS_R$ is the regression mean square and $MS_E$ is the error mean square. If the null hypothesis, $H_0$, is true, then the statistic $F_0$ follows the $F$ distribution with $k$ degrees of freedom in the numerator and $n-(k+1)$ degrees of freedom in the denominator, where $k$ is the number of predictor variables and $n$ is the total number of observations.
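The null hypothesis, $H_0$, is rejected if the calculated statistic, $F_0$, is such that:
$$F_0 > f_{\alpha,k,n-(k+1)}$$
where $f_{\alpha,k,n-(k+1)}$ is the upper $\alpha$ percentage point of the $F$ distribution with $k$ and $n-(k+1)$ degrees of freedom.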
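The rejection rule above can be sketched in a few lines of Python. This is only an illustration, assuming SciPy is available; the function name `significance_of_regression` and the mean squares passed to it are hypothetical and are not taken from the data of this chapter.

```python
from scipy import stats


def significance_of_regression(ms_r, ms_e, k, n, alpha=0.1):
    """Return (F0, critical value, reject H0?) for the overall F test."""
    f0 = ms_r / ms_e
    df_num = k              # numerator degrees of freedom
    df_den = n - (k + 1)    # denominator degrees of freedom
    # Upper-alpha percentage point of the F distribution.
    f_crit = stats.f.ppf(1.0 - alpha, df_num, df_den)
    return f0, f_crit, f0 > f_crit


# Hypothetical mean squares, chosen only to exercise the function.
# k = 2 and n = 17 are illustrative values giving 2 and 14 degrees of freedom.
f0, f_crit, reject = significance_of_regression(ms_r=50.0, ms_e=10.0, k=2, n=17)
print(f0, round(f_crit, 3), reject)
```

The comparison `f0 > f_crit` is equivalent to checking that the upper-tail probability of $F_0$ is smaller than $\alpha$, so either form could be used.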
To calculate the statistic $F_0$, the mean squares $MS_R$ and $MS_E$ must be known. As explained in Chapter 4, the mean squares are obtained by dividing the sums of squares by their degrees of freedom. For example, the total mean square, $MS_T$, is obtained as follows:
$$MS_T = \frac{SS_T}{dof(SS_T)} \qquad (14)$$
where $SS_T$ is the total sum of squares and $dof(SS_T)$ is the number of degrees of freedom associated with $SS_T$. In multiple linear regression, the following equation is used to calculate $SS_T$:
$$SS_T = y'\left[I - \frac{1}{n}J\right]y \qquad (15)$$
where $n$ is the total number of observations, $y$ is the vector of observations (that was defined in Chapter 5, Estimating Regression Models Using Least Squares), $I$ is the identity matrix of order $n$ and $J$ represents an $n \times n$ square matrix of ones. The number of degrees of freedom associated with $SS_T$, $dof(SS_T)$, is $n-1$. Knowing $SS_T$ and $dof(SS_T)$, the total mean square, $MS_T$, can be calculated.
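As a rough numerical illustration of Eqns. (14) and (15), the following NumPy sketch evaluates the matrix form of the total sum of squares and compares it with the familiar sum of squared deviations from the mean. The response vector is hypothetical and is not the data of Table 5.1.

```python
import numpy as np

# Hypothetical response vector (not the data of Table 5.1).
y = np.array([12.0, 15.0, 11.0, 18.0, 20.0, 17.0])
n = y.size

I = np.eye(n)          # identity matrix of order n
J = np.ones((n, n))    # n x n matrix of ones

ss_t = y @ (I - J / n) @ y    # Eqn. (15)
dof_ss_t = n - 1
ms_t = ss_t / dof_ss_t        # Eqn. (14)

# The matrix form equals the sum of squared deviations from the mean.
assert np.isclose(ss_t, np.sum((y - y.mean()) ** 2))
print(ss_t, ms_t)
```

For large $n$ the scalar form is cheaper, since it avoids building the $n \times n$ matrices; the matrix form is used here only to mirror the equation.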
The regression mean square, $MS_R$, is obtained by dividing the regression sum of squares, $SS_R$, by the respective degrees of freedom, $dof(SS_R)$, as follows:
$$MS_R = \frac{SS_R}{dof(SS_R)} \qquad (16)$$
The regression sum of squares, $SS_R$, is calculated using the following equation:
$$SS_R = y'\left[H - \frac{1}{n}J\right]y \qquad (17)$$
where $n$ is the total number of observations, $y$ is the vector of observations, $H$ is the hat matrix (that was defined in Chapter 5, Estimating Regression Models Using Least Squares) and $J$ represents an $n \times n$ square matrix of ones. The number of degrees of freedom associated with $SS_R$, $dof(SS_R)$, is $k$, where $k$ is the number of predictor variables in the model. Knowing $SS_R$ and $dof(SS_R)$, the regression mean square, $MS_R$, can be calculated.
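A minimal NumPy sketch of Eqns. (16) and (17) is given next. The data are hypothetical (six observations and two predictors, chosen only for illustration), and the hat matrix is formed as $X(X'X)^{-1}X'$ from a design matrix with a leading column of ones.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictor variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([12.0, 15.0, 11.0, 18.0, 20.0, 17.0])
n, k = y.size, 2

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept column
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
J = np.ones((n, n))                         # n x n matrix of ones

ss_r = y @ (H - J / n) @ y    # Eqn. (17)
ms_r = ss_r / k               # Eqn. (16), dof(SS_R) = k

# Equivalent scalar form: squared deviations of fitted values from the mean.
y_hat = H @ y
assert np.isclose(ss_r, np.sum((y_hat - y.mean()) ** 2))
print(ss_r, ms_r)
```

Explicitly inverting $X'X$ mirrors the formula but is not the most numerically robust choice; a least squares solver would usually be preferred for ill-conditioned design matrices.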
The error mean square, $MS_E$, is obtained by dividing the error sum of squares, $SS_E$, by the respective degrees of freedom, $dof(SS_E)$, as follows:
$$MS_E = \frac{SS_E}{dof(SS_E)} \qquad (18)$$
The error sum of squares, $SS_E$, is calculated using the following equation:
$$SS_E = y'(I - H)y \qquad (19)$$
where $y$ is the vector of observations, $I$ is the identity matrix of order $n$ and $H$ is the hat matrix. The number of degrees of freedom associated with $SS_E$, $dof(SS_E)$, is $n-(k+1)$, where $n$ is the total number of observations and $k$ is the number of predictor variables in the model. Knowing $SS_E$ and $dof(SS_E)$, the error mean square, $MS_E$, can be calculated. The error mean square is an estimate of the variance, $\sigma^2$, of the random error terms, $\epsilon_i$.
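The following sketch evaluates Eqns. (18) and (19) with NumPy and also checks that the three sums of squares satisfy $SS_T = SS_R + SS_E$, which follows directly from Eqns. (15), (17) and (19). The data are again hypothetical and serve only to illustrate the calculation.

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 predictor variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([12.0, 15.0, 11.0, 18.0, 20.0, 17.0])
n, k = y.size, 2

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept column
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
I = np.eye(n)
J = np.ones((n, n))

ss_e = y @ (I - H) @ y        # Eqn. (19)
dof_ss_e = n - (k + 1)
ms_e = ss_e / dof_ss_e        # Eqn. (18), estimate of sigma^2

# The total sum of squares splits into regression and error parts.
ss_t = y @ (I - J / n) @ y    # Eqn. (15)
ss_r = y @ (H - J / n) @ y    # Eqn. (17)
assert np.isclose(ss_t, ss_r + ss_e)
print(ss_e, ms_e)
```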
Example 5.2
The test for the significance of regression, for the regression model obtained for the data in Table 5.1, is illustrated in this example. The null hypothesis for the model is:
$$H_0: \beta_1 = \beta_2 = 0$$
The statistic to test $H_0$ is:
$$F_0 = \frac{MS_R}{MS_E}$$
To calculate $F_0$, the sums of squares are calculated first so that the mean squares can be obtained. The mean squares are then used to calculate the statistic for the significance test.
The regression sum of squares, $SS_R$, can be obtained as:
$$SS_R = y'\left[H - \frac{1}{n}J\right]y$$
The hat matrix, $H$, is calculated as follows using the design matrix $X$ from Example 5.1:
$$H = X(X'X)^{-1}X'$$
Knowing $y$, $H$ and $J$, the regression sum of squares, $SS_R$, can be calculated by evaluating the expression above.
The number of degrees of freedom associated with $SS_R$ is $k$, which equals two since there are two predictor variables in the data in Table 5.1. Therefore, the regression mean square is:
$$MS_R = \frac{SS_R}{dof(SS_R)} = \frac{SS_R}{2}$$
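Similarly, to calculate the error mean square, $MS_E$, the error sum of squares, $SS_E$, can be obtained as:
$$SS_E = y'(I - H)y$$
The number of degrees of freedom associated with $SS_E$ is $n-(k+1) = 17-(2+1) = 14$. Therefore, the error mean square, $MS_E$, is:
$$MS_E = \frac{SS_E}{dof(SS_E)} = \frac{SS_E}{14} = 30.24$$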
The statistic to test the significance of regression can now be calculated as:
$$F_0 = \frac{MS_R}{MS_E} = \frac{MS_R}{30.24}$$
The critical value for this test, corresponding to a significance level of 0.1, is:
$$f_{\alpha,k,n-(k+1)} = f_{0.1,2,14} = 2.726$$
Since $F_0 > f_{0.1,2,14}$, $H_0: \beta_1 = \beta_2 = 0$ is rejected and it is concluded that at least one coefficient out of $\beta_1$ and $\beta_2$ is significant. In other words, it is concluded that a regression model exists between yield and either one or both of the factors in Table 5.1. The analysis of variance is summarized in Table 5.2.
Table 5.2: ANOVA table for the significance of regression test in Example 5.2.
The $t$ test is used to check the significance of individual regression coefficients in the multiple linear regression model. Adding a significant variable to a regression model makes the model more effective, while adding an unimportant variable may make the model worse. The hypothesis statements to test the significance of a particular regression coefficient, $\beta_j$, are:
$$H_0: \beta_j = 0$$
$$H_1: \beta_j \neq 0$$
The test statistic for this test is based on the $t$ distribution (and is similar to the one used in the case of simple linear regression models in Chapter 4):
$$T_0 = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \qquad (20)$$
where the standard error, $se(\hat{\beta}_j)$, is obtained from Eqn. (13).
The analyst would fail to reject the null hypothesis if the test statistic, calculated using Eqn. (20), lies in the acceptance region:
$$-t_{\alpha/2,n-(k+1)} < T_0 < t_{\alpha/2,n-(k+1)}$$
This test measures the contribution of a variable while the remaining variables are included in the model. For the model $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$, if the test is carried out for $\beta_1$, then the test will check the significance of including the variable $x_1$ in the model that contains $x_2$ and $x_3$ (i.e. the model $\hat{y} = \hat{\beta}_0 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$). Hence the test is also referred to as a partial or marginal test. In DOE++, this test is displayed in the Regression Information table.
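The partial $t$ test can be sketched as follows, assuming NumPy and SciPy are available. The function name `coefficient_t_tests` and the data are hypothetical. The standard errors are taken here as the square roots of the diagonal elements of $MS_E (X'X)^{-1}$, which is assumed to correspond to the expression the text refers to as Eqn. (13).

```python
import numpy as np
from scipy import stats


def coefficient_t_tests(X, y, alpha=0.1):
    """Two-sided t test for every coefficient; X includes the intercept column."""
    n, p = X.shape                       # p = k + 1 coefficients
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y         # least squares estimates
    resid = y - X @ beta_hat
    ms_e = resid @ resid / (n - p)       # error mean square
    C = ms_e * xtx_inv                   # variance-covariance matrix (assumed form)
    se = np.sqrt(np.diag(C))             # standard errors
    t0 = beta_hat / se                   # Eqn. (20)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, n - p)
    return beta_hat, se, t0, t_crit, np.abs(t0) > t_crit


# Hypothetical data: n = 6 observations, k = 2 predictor variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([12.0, 15.0, 11.0, 18.0, 20.0, 17.0])
X = np.column_stack([np.ones(y.size), x1, x2])

beta_hat, se, t0, t_crit, significant = coefficient_t_tests(X, y)
print(t0, t_crit, significant)
```

Each entry of `significant` answers the marginal question described above: whether the corresponding term contributes to the model given that all the other terms are already included.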
The test to check the significance of the estimated regression coefficients for the data in Table 5.1 is illustrated in this example. The null hypothesis to test the coefficient $\beta_2$ is:
$$H_0: \beta_2 = 0$$
The null hypothesis to test $\beta_1$ can be obtained in a similar manner. To calculate the test statistic, $T_0$, we need to calculate the standard error using Eqn. (13).
In Example 5.2, the value of the error mean square, $MS_E$, was obtained as 30.24. The error mean square is an estimate of the variance, $\sigma^2$. Therefore:
$$\hat{\sigma}^2 = MS_E = 30.24$$
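The variance-covariance matrix of the estimated regression coefficients is:
$$C = \hat{\sigma}^2 (X'X)^{-1}$$
From the diagonal elements of $C$, the estimated standard errors for $\hat{\beta}_1$ and $\hat{\beta}_2$ are:
$$se(\hat{\beta}_j) = \sqrt{C_{jj}}, \quad j = 1, 2$$
where $C_{jj}$ denotes the diagonal element of $C$ that corresponds to $\hat{\beta}_j$.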
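The corresponding test statistics for these coefficients are:
$$(T_0)_{\hat{\beta}_1} = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)}, \qquad (T_0)_{\hat{\beta}_2} = \frac{\hat{\beta}_2}{se(\hat{\beta}_2)}$$
The critical values for the present $t$ test at a significance of 0.1 are:
$$t_{\alpha/2,n-(k+1)} = t_{0.05,14} = 1.761$$
$$-t_{\alpha/2,n-(k+1)} = -t_{0.05,14} = -1.761$$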
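Considering $\hat{\beta}_2$, it can be seen that $(T_0)_{\hat{\beta}_2}$ does not lie in the acceptance region of $-t_{0.05,14} < T_0 < t_{0.05,14}$. The null hypothesis, $H_0: \beta_2 = 0$, is rejected and it is concluded that $\beta_2$ is significant at $\alpha = 0.1$. This conclusion can also be arrived at using the $p$ value, noting that the hypothesis is two-sided. The $p$ value corresponding to the test statistic, $(T_0)_{\hat{\beta}_2}$, based on the $t$ distribution with 14 degrees of freedom is:
$$p \text{ value} = 2 \times \left(1 - P\left(T \leq |(T_0)_{\hat{\beta}_2}|\right)\right)$$
Since the $p$ value is less than the significance, $\alpha = 0.1$, it is concluded that $\beta_2$ is significant. The hypothesis test on $\beta_1$ can be carried out in a similar manner.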
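The two-sided $p$ value calculation can be sketched with SciPy as follows. The helper name `two_sided_p_value` and the test statistic of 3.0 are hypothetical and are not the values of this example; only the 14 degrees of freedom and the significance of 0.1 are taken from the text.

```python
from scipy import stats


def two_sided_p_value(t0, dof):
    """p value of a two-sided t test: 2 * P(T > |t0|)."""
    return 2.0 * stats.t.sf(abs(t0), dof)


dof = 14       # n - (k + 1) for this example
alpha = 0.1

# Critical value bounding the acceptance region -t_crit < T0 < t_crit.
t_crit = stats.t.ppf(1.0 - alpha / 2.0, dof)

t0 = 3.0       # hypothetical test statistic, for illustration only
p_value = two_sided_p_value(t0, dof)

# Both decision rules agree: |t0| > t_crit exactly when p_value < alpha.
print(round(t_crit, 3), abs(t0) > t_crit, p_value < alpha)
```

Using the survival function `stats.t.sf` rather than `1 - stats.t.cdf` avoids loss of precision when the test statistic is large and the $p$ value is very small.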
As explained in Chapter 4, in DOE++, the information related to the $t$ test is displayed in the Regression Information table as shown in Figure 5.13. In this table, the $t$ test for $\beta_2$ is displayed in the row for the term Factor 2 because $\beta_2$ is the coefficient that represents this factor in the regression model. Columns labeled Standard Error, T Value and P Value represent the standard error, the test statistic for the $t$ test and the $p$ value for the $t$ test, respectively. These values have been calculated for $\beta_2$ in this example. The Coefficient column represents the estimates of the regression coefficients. These values are calculated using Eqn. (8) as shown in Example 5.1. The Effect column represents values obtained by multiplying the coefficients by a factor of 2. This value is useful in the case of two factor experiments and is explained in Chapter 7. Columns labeled Low CI and High CI represent the limits of the confidence intervals for the regression coefficients and are explained in Chapter 5, Confidence Interval on Regression Coefficients. The Variance Inflation Factor column displays values that give a measure of multicollinearity. This is explained in Chapter 5, Multicollinearity.
Figure 5.13: Regression results for the data in Table 5.1.