Hypothesis Tests in Multiple Linear Regression

This section discusses hypothesis tests on the regression coefficients in multiple linear regression. As in the case of simple linear regression, these tests can only be carried out if it can be assumed that the random error terms, $\epsilon_i$, are normally and independently distributed with a mean of zero and a variance of $\sigma^2$.

 

Three types of hypothesis tests can be carried out for multiple linear regression models:

 

  1. Test for significance of regression

    This test checks the significance of the whole regression model.

  2. t test

    This test checks the significance of individual regression coefficients.

  3. Partial F test

    This test can be used to simultaneously check the significance of a number of regression coefficients. It can also be used to test individual coefficients.

Test for Significance of Regression

The test for significance of regression in multiple linear regression analysis is carried out using the analysis of variance. The test checks whether a linear statistical relationship exists between the response variable and at least one of the predictor variables. The hypotheses are:

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$
$$H_1: \beta_j \neq 0 \text{ for at least one } j$$

The test for $H_0$ is carried out using the following statistic:

$$F_0 = \frac{MS_R}{MS_E}$$

where $MS_R$ is the regression mean square and $MS_E$ is the error mean square. If the null hypothesis, $H_0$, is true, then the statistic $F_0$ follows the $F$ distribution with $k$ degrees of freedom in the numerator and $n-(k+1)$ degrees of freedom in the denominator. The null hypothesis, $H_0$, is rejected if the calculated statistic, $F_0$, is such that:

$$F_0 > f_{\alpha,k,n-(k+1)}$$

Calculation of the Statistic $F_0$

To calculate the statistic $F_0$, the mean squares $MS_R$ and $MS_E$ must be known. As explained in Chapter 4, the mean squares are obtained by dividing the sums of squares by their degrees of freedom. For example, the total mean square, $MS_T$, is obtained as follows:

$$MS_T = \frac{SS_T}{dof(SS_T)} \qquad (14)$$

where $SS_T$ is the total sum of squares and $dof(SS_T)$ is the number of degrees of freedom associated with $SS_T$. In multiple linear regression, the following equation is used to calculate $SS_T$:

$$SS_T = y'\left[I - \frac{1}{n}J\right]y \qquad (15)$$

where $n$ is the total number of observations, $y$ is the vector of observations (that was defined in Chapter 5, Estimating Regression Models Using Least Squares), $I$ is the identity matrix of order $n$ and $J$ represents an $n \times n$ square matrix of ones. The number of degrees of freedom associated with $SS_T$, $dof(SS_T)$, is $(n-1)$. Knowing $SS_T$ and $dof(SS_T)$, the total mean square, $MS_T$, can be calculated.
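The matrix form of the total sum of squares can be checked against the familiar sum of squared deviations from the mean. The following is a minimal sketch, assuming NumPy is available; the response values are made up for illustration and are not the data of Table 5.1.

```python
import numpy as np

# Hypothetical response values, made up for illustration (not Table 5.1).
y = np.array([12.0, 15.0, 11.0, 18.0, 20.0, 17.0])
n = y.size

I = np.eye(n)        # identity matrix of order n
J = np.ones((n, n))  # n x n matrix of ones

# Eqn. (15): SS_T = y' [I - (1/n) J] y
ss_t = y @ (I - J / n) @ y

# Equivalent scalar form: sum of squared deviations from the mean
ss_t_check = np.sum((y - y.mean()) ** 2)

dof_t = n - 1        # degrees of freedom associated with SS_T
ms_t = ss_t / dof_t  # Eqn. (14): total mean square
```

Both `ss_t` and `ss_t_check` describe the same quantity; the matrix form is simply the one that extends naturally to the regression and error sums of squares.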

 

The regression mean square, $MS_R$, is obtained by dividing the regression sum of squares, $SS_R$, by the respective degrees of freedom, $dof(SS_R)$, as follows:

$$MS_R = \frac{SS_R}{dof(SS_R)} \qquad (16)$$

The regression sum of squares, $SS_R$, is calculated using the following equation:

$$SS_R = y'\left[H - \frac{1}{n}J\right]y \qquad (17)$$

where $n$ is the total number of observations, $y$ is the vector of observations, $H$ is the hat matrix (that was defined in Chapter 5, Estimating Regression Models Using Least Squares) and $J$ represents an $n \times n$ square matrix of ones. The number of degrees of freedom associated with $SS_R$, $dof(SS_R)$, is $k$, where $k$ is the number of predictor variables in the model. Knowing $SS_R$ and $dof(SS_R)$, the regression mean square, $MS_R$, can be calculated.

 

The error mean square, $MS_E$, is obtained by dividing the error sum of squares, $SS_E$, by the respective degrees of freedom, $dof(SS_E)$, as follows:

$$MS_E = \frac{SS_E}{dof(SS_E)} \qquad (18)$$

 

The error sum of squares, $SS_E$, is calculated using the following equation:

$$SS_E = y'(I - H)y \qquad (19)$$

where $y$ is the vector of observations, $I$ is the identity matrix of order $n$ and $H$ is the hat matrix. The number of degrees of freedom associated with $SS_E$, $dof(SS_E)$, is $n-(k+1)$, where $n$ is the total number of observations and $k$ is the number of predictor variables in the model. Knowing $SS_E$ and $dof(SS_E)$, the error mean square, $MS_E$, can be calculated. The error mean square is an estimate of the variance, $\sigma^2$, of the random error terms, $\epsilon_i$:

$$\hat{\sigma}^2 = MS_E$$
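The calculation of the sums of squares, mean squares and the test statistic can be sketched in a few lines. This is an illustrative sketch only, assuming NumPy is available; the function name `regression_anova` and the data are hypothetical and are not taken from Table 5.1 or from DOE++. The hat matrix is formed with an explicit inverse for clarity, which is acceptable for small, well-conditioned examples.

```python
import numpy as np

def regression_anova(X, y):
    """Sums of squares, mean squares and F0 for a design matrix X
    (first column all ones) and a response vector y."""
    n, p = X.shape                         # p = k + 1 coefficients
    k = p - 1                              # number of predictor variables
    I = np.eye(n)
    J = np.ones((n, n))
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    ss_t = y @ (I - J / n) @ y             # total sum of squares
    ss_r = y @ (H - J / n) @ y             # regression sum of squares
    ss_e = y @ (I - H) @ y                 # error sum of squares
    ms_r = ss_r / k                        # regression mean square
    ms_e = ss_e / (n - (k + 1))            # error mean square
    return {"ss_t": ss_t, "ss_r": ss_r, "ss_e": ss_e,
            "ms_r": ms_r, "ms_e": ms_e, "f0": ms_r / ms_e}

# Hypothetical data: two predictors, eight observations.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1])
y = 5.0 + 2.0 * x1 + 0.5 * x2 + noise

X = np.column_stack([np.ones_like(x1), x1, x2])
res = regression_anova(X, y)
```

The three sums of squares satisfy $SS_T = SS_R + SS_E$, and the degrees of freedom partition in the same way: $(n-1) = k + (n-(k+1))$. The value `res["f0"]` would then be compared with the critical value $f_{\alpha,k,n-(k+1)}$ taken from an $F$ table or a statistics library.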

 

Example 5.2

 

The test for the significance of regression, for the regression model obtained for the data in Table 5.1, is illustrated in this example. The null hypothesis for the model is:

$$H_0: \beta_1 = \beta_2 = 0$$

The statistic to test $H_0$ is:

$$F_0 = \frac{MS_R}{MS_E}$$

To calculate $F_0$, first the sums of squares are calculated so that the mean squares can be obtained. Then the mean squares are used to calculate the statistic $F_0$ to carry out the significance test.

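The regression sum of squares, $SS_R$, can be obtained as:

$$SS_R = y'\left[H - \frac{1}{n}J\right]y$$

The hat matrix, $H$, is calculated as follows using the design matrix $X$ from Example 5.1:

$$H = X(X'X)^{-1}X'$$

Knowing $y$, $H$ and $J$, the regression sum of squares, $SS_R$, can be evaluated numerically from the expression above.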

 

The number of degrees of freedom associated with $SS_R$ is $k$, which equals two since there are two predictor variables in the data in Table 5.1. Therefore, the regression mean square is:

$$MS_R = \frac{SS_R}{dof(SS_R)} = \frac{SS_R}{2}$$

Similarly, to calculate the error mean square, $MS_E$, the error sum of squares, $SS_E$, can be obtained as:

$$SS_E = y'(I - H)y$$

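The number of degrees of freedom associated with $SS_E$ is $n-(k+1)$, which equals 14 for this data set. Therefore, the error mean square, $MS_E$, is:

$$MS_E = \frac{SS_E}{dof(SS_E)} = \frac{SS_E}{14} = 30.24$$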
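As a consistency check, the error sum of squares can be recovered from two numbers quoted in this chapter: the error mean square of 30.24 and the 14 error degrees of freedom. The result is approximate because 30.24 is a rounded value. With $k = 2$ predictors, 14 error degrees of freedom also implies $n = 17$ observations.

```latex
SS_E = MS_E \times dof(SS_E) \approx 30.24 \times 14 = 423.36
\qquad
n = dof(SS_E) + (k + 1) = 14 + 3 = 17
```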

The statistic to test the significance of regression can now be calculated as:

$$f_0 = \frac{MS_R}{MS_E} = \frac{MS_R}{30.24}$$

The critical value for this test, corresponding to a significance level of 0.1, is:

$$f_{\alpha,k,n-(k+1)} = f_{0.1,2,14} = 2.726$$

Since $f_0 > f_{0.1,2,14}$, $H_0: \beta_1 = \beta_2 = 0$ is rejected and it is concluded that at least one coefficient out of $\beta_1$ and $\beta_2$ is significant. In other words, it is concluded that a regression model exists between yield and either one or both of the factors in Table 5.1. The analysis of variance is summarized in Table 5.2.

 

Table 5.2: ANOVA table for the significance of regression test in Example 5.2.

 

Test on Individual Regression Coefficients ($t$ Test)

The $t$ test is used to check the significance of individual regression coefficients in the multiple linear regression model. Adding a significant variable to a regression model makes the model more effective, while adding an unimportant variable may make the model worse. The hypothesis statements to test the significance of a particular regression coefficient, $\beta_j$, are:

$$H_0: \beta_j = 0$$
$$H_1: \beta_j \neq 0$$

The test statistic for this test is based on the $t$ distribution (and is similar to the one used in the case of simple linear regression models in Chapter 4):

$$T_0 = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \qquad (20)$$

where the standard error, $se(\hat{\beta}_j)$, is obtained from Eqn. (13). The analyst would fail to reject the null hypothesis if the test statistic, calculated using Eqn. (20), lies in the acceptance region:

$$-t_{\alpha/2,n-(k+1)} < T_0 < t_{\alpha/2,n-(k+1)}$$

 

This test measures the contribution of a variable while the remaining variables are included in the model. For the model $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$, if the test is carried out for $\beta_1$, then the test will check the significance of including the variable $x_1$ in the model that contains $x_2$ and $x_3$ (i.e. the model $\hat{y} = \hat{\beta}_0 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$). Hence the test is also referred to as a partial or marginal test. In DOE++, this test is displayed in the Regression Information table.
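The steps of the $t$ test on individual coefficients can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the function names and the data are hypothetical and are not taken from Table 5.1 or from DOE++. The critical value passed to `in_acceptance_region` is assumed to come from a $t$ table.

```python
import numpy as np

def coefficient_t_tests(X, y):
    """Least squares estimates, standard errors and t statistics for a
    design matrix X (first column all ones) and a response vector y."""
    n, p = X.shape                          # p = k + 1 coefficients
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y            # least squares estimates
    residuals = y - X @ beta_hat
    ms_e = residuals @ residuals / (n - p)  # error mean square, n-(k+1) dof
    C = ms_e * xtx_inv                      # variance-covariance matrix
    se = np.sqrt(np.diag(C))                # standard errors
    t0 = beta_hat / se                      # test statistics
    return beta_hat, se, t0

def in_acceptance_region(t0, t_crit):
    """True if the test fails to reject H0: beta_j = 0."""
    return -t_crit < t0 < t_crit

# Hypothetical data: two predictors, eight observations.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1])
y = 5.0 + 2.0 * x1 + 0.5 * x2 + noise
X = np.column_stack([np.ones_like(x1), x1, x2])

beta_hat, se, t0 = coefficient_t_tests(X, y)

# Table value t_{0.05,5} = 2.015 for alpha = 0.1 and 8 - (2 + 1) = 5 dof.
beta2_significant = not in_acceptance_region(t0[2], 2.015)
```

Each entry of `t0` tests one coefficient with all other terms kept in the model, which is why the result for one variable can change when another variable is added or removed.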

 

Example 5.3

 

The test to check the significance of the estimated regression coefficients for the data in Table 5.1 is illustrated in this example. The null hypothesis to test the coefficient $\beta_2$ is:

$$H_0: \beta_2 = 0$$

The null hypothesis to test $\beta_1$ can be obtained in a similar manner. To calculate the test statistic, $T_0$, we need to calculate the standard error using Eqn. (13).

 

In Example 5.2, the value of the error mean square, $MS_E$, was obtained as 30.24. The error mean square is an estimate of the variance, $\sigma^2$. Therefore:

$$\hat{\sigma}^2 = MS_E = 30.24$$

 

The variance-covariance matrix of the estimated regression coefficients is:

$$C = \hat{\sigma}^2 (X'X)^{-1} = 30.24\,(X'X)^{-1}$$

 

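From the diagonal elements of $C$, the estimated standard errors for $\hat{\beta}_1$ and $\hat{\beta}_2$ are:

$$se(\hat{\beta}_j) = \sqrt{C_{jj}}, \quad j = 1, 2$$

where $C_{jj}$ is the diagonal element of $C$ corresponding to $\hat{\beta}_j$.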

 

The corresponding test statistics for these coefficients are:

$$(t_0)_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}, \quad j = 1, 2$$

 

The critical values for the present $t$ test at a significance of 0.1 are:

$$t_{\alpha/2,n-(k+1)} = t_{0.05,14} = 1.761$$
$$-t_{\alpha/2,n-(k+1)} = -t_{0.05,14} = -1.761$$

 

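Considering $\hat{\beta}_2$, it can be seen that $(t_0)_{\hat{\beta}_2}$ does not lie in the acceptance region of $-t_{0.05,14} < t_0 < t_{0.05,14}$. The null hypothesis, $H_0: \beta_2 = 0$, is rejected and it is concluded that $\beta_2$ is significant at $\alpha = 0.1$. This conclusion can also be arrived at using the $p$ value, noting that the hypothesis is two-sided. The $p$ value corresponding to the test statistic, $(t_0)_{\hat{\beta}_2}$, based on the $t$ distribution with 14 degrees of freedom is:

$$p \text{ value} = 2 \times \left(1 - P\left(T \le |t_0|\right)\right)$$

Since the $p$ value is less than the significance, $\alpha = 0.1$, it is concluded that $\beta_2$ is significant. The hypothesis test on $\beta_1$ can be carried out in a similar manner.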
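The two-sided $p$ value can be computed without a statistics package by integrating the $t$ density numerically. The following is a sketch only, using the Python standard library and Simpson's rule; the function names are hypothetical, and a library routine for the $t$ distribution would normally be preferred in practice.

```python
import math

def t_pdf(x, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2)
    )
    return coef * (1 + x * x / dof) ** (-(dof + 1) / 2)

def two_sided_p_value(t0, dof, steps=10_000):
    """p value = 2 * P(T > |t0|), via Simpson's rule on [0, |t0|].
    `steps` must be even."""
    a = abs(t0)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    central = total * h / 3   # P(0 < T < |t0|)
    return 2 * (0.5 - central)

# At the critical value t_{0.05,14} = 1.761 the two-sided p value is
# close to the significance level of 0.1.
p_at_critical = two_sided_p_value(1.761, 14)
```

A test statistic beyond the critical value in either direction gives a $p$ value below 0.1, so the $p$ value rule and the acceptance region rule lead to the same conclusion.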

 

As explained in Chapter 4, in DOE++, the information related to the $t$ test is displayed in the Regression Information table as shown in Figure 5.13. In this table, the $t$ test for $\beta_2$ is displayed in the row for the term Factor 2 because $\beta_2$ is the coefficient that represents this factor in the regression model. Columns labeled Standard Error, T Value and P Value represent the standard error, the test statistic for the $t$ test and the $p$ value for the $t$ test, respectively. These values have been calculated for $\beta_2$ in this example. The Coefficient column represents the estimates of the regression coefficients. These values are calculated using Eqn. (8) as shown in Example 5.1. The Effect column represents values obtained by multiplying the coefficients by a factor of 2. This value is useful in the case of two factor experiments and is explained in Chapter 7. Columns labeled Low CI and High CI represent the limits of the confidence intervals for the regression coefficients and are explained in Chapter 5, Confidence Interval on Regression Coefficients. The Variance Inflation Factor column displays values that give a measure of multicollinearity. This is explained in Chapter 5, Multicollinearity.

 

Figure 5.13: Regression results for the data in Table 5.1.

 
See Also:
 
Estimating Regression Models Using Least Squares
Tests on Subsets of Regression Coefficients
Simple Linear Regression Analysis
Hypothesis Tests in Simple Linear Regression
Two Level Factorial Experiments
Multicollinearity