Test on Subsets of Regression Coefficients (Partial F Test)

This section discusses the partial $F$ test, which can be performed on subsets of regression coefficients. It also covers the two types of extra sum of squares on which the test can be based: the partial sum of squares and the sequential sum of squares.

The partial $F$ test can be considered the general form of the $t$ test on individual regression coefficients described in the previous section. This is because it simultaneously checks the significance of adding several regression coefficients (or just one) to the multiple linear regression model. Adding a variable to a model never decreases the regression sum of squares, $SS_R$, and usually increases it. The test is based on this increase in the regression sum of squares, which is called the extra sum of squares.

 

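Assume that the vector of regression coefficients, $\boldsymbol{\beta}$, for the multiple linear regression model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$ is partitioned into two vectors:

- the second vector, $\boldsymbol{\theta}_2$, contains the last $r$ regression coefficients;
- the first vector, $\boldsymbol{\theta}_1$, contains the first $(k+1-r)$ coefficients.

$$\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\theta}_1 \\ \boldsymbol{\theta}_2 \end{bmatrix}$$

with:

$$\boldsymbol{\theta}_1 = [\beta_0, \beta_1, \ldots, \beta_{k-r}]' \quad \text{and} \quad \boldsymbol{\theta}_2 = [\beta_{k-r+1}, \beta_{k-r+2}, \ldots, \beta_k]'$$

The hypotheses for testing the significance of adding the coefficients in $\boldsymbol{\theta}_2$ to a model that already contains the coefficients in $\boldsymbol{\theta}_1$ may be written as:

$$H_0: \boldsymbol{\theta}_2 = \mathbf{0} \qquad H_1: \boldsymbol{\theta}_2 \neq \mathbf{0}$$

The test statistic for this test follows the $F$ distribution, with $r$ degrees of freedom in the numerator and $n-(k+1)$ in the denominator under $H_0$. It is calculated as follows:

$$F_0 = \frac{SS_R(\boldsymbol{\theta}_2 \mid \boldsymbol{\theta}_1)/r}{MS_E} \qquad (21)$$

Here $SS_R(\boldsymbol{\theta}_2 \mid \boldsymbol{\theta}_1)$ is the increase in the regression sum of squares when the variables corresponding to the coefficients in $\boldsymbol{\theta}_2$ are added to a model already containing $\boldsymbol{\theta}_1$. $MS_E$ is the error mean square obtained from Eqn. (18). The value of the extra sum of squares is obtained as explained in the next section.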
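Eqn. (21) can be sketched in Python with NumPy as follows. The sketch makes these assumptions:

- The full-model design matrix has a leading column of ones.
- The reduced design matrix is the full one with the columns for $\boldsymbol{\theta}_2$ removed.
- The helper names `regression_ss` and `partial_f` are hypothetical and are not DOE++ functions.
- The regression sum of squares is computed from the hat matrix as $\mathbf{y}'[\mathbf{H} - (1/n)\mathbf{J}]\mathbf{y}$.
- The data are made up for illustration.

```python
import numpy as np


def regression_ss(X, y):
    """SS_R = y'[H - (1/n)J]y, with hat matrix H = X (X'X)^-1 X'."""
    n = len(y)
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    J = np.ones((n, n))
    return float(y @ (H - J / n) @ y)


def partial_f(X_full, X_reduced, y):
    """Partial F statistic for the columns of X_full that are absent from X_reduced.

    Returns (f0, r, error degrees of freedom of the full model).
    """
    n, p = X_full.shape                  # p = k + 1 coefficients in the full model
    r = p - X_reduced.shape[1]           # number of coefficients in theta_2
    ssr_full = regression_ss(X_full, y)
    extra_ss = ssr_full - regression_ss(X_reduced, y)   # SS_R(theta_2 | theta_1)
    ss_total = float(np.sum((y - y.mean()) ** 2))
    ms_error = (ss_total - ssr_full) / (n - p)          # MS_E on n - (k + 1) dof
    return (extra_ss / r) / ms_error, r, n - p


# Hypothetical data: does x2 add to a model that already contains x1?
x1 = np.arange(10, dtype=float)
x2 = np.array([2, 7, 1, 8, 2, 8, 1, 8, 2, 8], dtype=float)
noise = np.array([0.3, -0.2, 0.1, 0.4, -0.5, 0.2, -0.1, 0.0, 0.3, -0.4])
y = 3.0 + 2.0 * x1 + 0.5 * x2 + noise
X_full = np.column_stack([np.ones(10), x1, x2])
X_reduced = X_full[:, :2]                # drop the x2 column
f0, r, df_error = partial_f(X_full, X_reduced, y)
print(f"f0 = {f0:.2f} on ({r}, {df_error}) degrees of freedom")
```

The denominator uses the error mean square of the full model, i.e. the model containing both $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, as Eqn. (21) requires.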

 

The null hypothesis, $H_0$, is rejected if $f_0 > f_{\alpha,r,n-(k+1)}$. Rejection of $H_0$ leads to the conclusion that at least one of the variables $x_{k-r+1}, \ldots, x_k$ contributes significantly to the regression model. In DOE++, the results from the partial $F$ test are displayed in the ANOVA table.

Types of Extra Sum of Squares

The extra sum of squares can be calculated using either the partial (or adjusted) sum of squares or the sequential sum of squares. The type used affects the calculation of the test statistic of Eqn. (21). In DOE++, the type of extra sum of squares is selected on the Options tab of the Control Panel, as shown in Figure 5.14. The partial sum of squares is the default setting, for reasons explained in the following section on the partial sum of squares.

 

Figure 5.14: Selection of the type of extra sum of squares in DOE++.

 

Partial Sum of Squares

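The partial sum of squares for a term is the extra sum of squares when all terms except the term under consideration are already included in the model. For example, consider the model:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \epsilon \qquad (22)$$

Assume that we need to know the partial sum of squares for $\beta_2$. This is the increase in the regression sum of squares when $\beta_2$ is added to the model. It equals the regression sum of squares for the full model of Eqn. (22) minus that for the model containing all terms except $\beta_2$. These terms are $\beta_0$, $\beta_1$ and $\beta_{12}$, and the model that contains them is:

$$Y = \beta_0 + \beta_1 x_1 + \beta_{12} x_1 x_2 + \epsilon \qquad (23)$$

The partial sum of squares for $\beta_2$ can be represented as $SS_R(\beta_2 \mid \beta_0, \beta_1, \beta_{12})$ and is calculated as follows:

$$SS_R(\beta_2 \mid \beta_0, \beta_1, \beta_{12}) = SS_R(\beta_0, \beta_1, \beta_2, \beta_{12}) - SS_R(\beta_0, \beta_1, \beta_{12})$$

The first term on the right is the regression sum of squares of Eqn. (22), and the second is that of Eqn. (23).

For the present case, $\boldsymbol{\theta}_2 = [\beta_2]'$ and $\boldsymbol{\theta}_1 = [\beta_0, \beta_1, \beta_{12}]'$. Note that, for the partial sum of squares, $\boldsymbol{\theta}_1$ contains all coefficients other than the coefficient being tested.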

 

DOE++ uses the partial sum of squares as the default selection. The reason is that the $t$ test explained in Chapter 5, Test on Individual Regression Coefficients, is a partial test. The test on an individual coefficient is carried out assuming that all the remaining coefficients are included in the model, which mirrors how the partial sum of squares is calculated.

The results from the $t$ test are displayed in the Regression Information table, and the results from the partial $F$ test are displayed in the ANOVA table. Using the partial sum of squares as the default for the ANOVA table keeps the results in the two tables consistent with each other.

 

The partial sums of squares for all terms of a model may not add up to the regression sum of squares for the full model when the regression coefficients are correlated, i.e. when the predictor variables are not orthogonal. If the extra sums of squares for all terms should always add up to the regression sum of squares for the full model, then the sequential sum of squares should be used.
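Here is a minimal NumPy sketch of the partial sum of squares, assuming column 0 of the design matrix is the intercept. Each term's partial sum of squares is computed as the drop in the error sum of squares when that term is added last. This equals the increase in $SS_R$, because the total sum of squares does not change. The function names `sse` and `partial_ss` and the five-point data set are hypothetical; the data are chosen so that $x_1$ and $x_2$ are strongly correlated.

```python
import numpy as np


def sse(X, y):
    """Error sum of squares of the least squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def partial_ss(X, y):
    """Partial (adjusted) SS of each non-intercept column, i.e. that term added last."""
    sse_full = sse(X, y)
    # Removing column j and refitting gives the model with every term except term j.
    return [sse(np.delete(X, j, axis=1), y) - sse_full for j in range(1, X.shape[1])]


# Hypothetical data with strongly correlated predictors x1 and x2.
x1 = np.array([1, 2, 3, 4, 5], dtype=float)
x2 = np.array([1, 2, 3, 4, 6], dtype=float)
y = np.array([2, 4, 5, 8, 9], dtype=float)
X = np.column_stack([np.ones(5), x1, x2])

ss_partial = partial_ss(X, y)
ss_r = float(np.sum((y - y.mean()) ** 2)) - sse(X, y)   # SS_R = SS_T - SS_E
print([round(v, 3) for v in ss_partial], round(sum(ss_partial), 3), round(ss_r, 3))
```

For these data, the partial sums of squares are about 1.557 for $x_1$ and 0.1 for $x_2$. Together they fall far short of $SS_R = 32.5$ for the full model.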

 

Example 5.4

 

This example illustrates the partial $F$ test using the partial sum of squares. The test is conducted for the coefficient $\beta_1$, corresponding to the predictor variable $x_1$, for the data in Table 5.1.

The regression model used for this data set in Example 5.1 is:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

The null hypothesis to test the significance of $\beta_1$ is:

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

The statistic to test this hypothesis is:

$$F_0 = \frac{SS_R(\beta_1 \mid \beta_0, \beta_2)/r}{MS_E}$$

The terms are as follows:

- $SS_R(\beta_1 \mid \beta_0, \beta_2)$ is the partial sum of squares for $\beta_1$.
- $r$ is the number of degrees of freedom for $SS_R(\beta_1 \mid \beta_0, \beta_2)$. It is one, because just one coefficient, $\beta_1$, is being tested.
- $MS_E$ is the error mean square. It can be obtained using Eqn. (18) and was calculated in Example 5.2 as 30.24.

 

The partial sum of squares for $\beta_1$ is the regression sum of squares for the full model, $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$, minus the regression sum of squares for the model excluding $\beta_1$, $Y = \beta_0 + \beta_2 x_2 + \epsilon$. The regression sum of squares for the full model, $SS_R(\beta_0, \beta_1, \beta_2)$, can be obtained using Eqn. (31) and was calculated in Example 5.2.

 

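The regression sum of squares for the model $Y = \beta_0 + \beta_2 x_2 + \epsilon$ is obtained as shown next. First, the design matrix for this model, $\mathbf{X}_{\beta_0,\beta_2}$, is obtained by dropping the second column of the full design matrix, $\mathbf{X}$, which was obtained in Example 5.1. The second column of $\mathbf{X}$ corresponds to the coefficient $\beta_1$, which is no longer in the model. Therefore, the design matrix for the model $Y = \beta_0 + \beta_2 x_2 + \epsilon$ is:

$$\mathbf{X}_{\beta_0,\beta_2} = \begin{bmatrix} 1 & x_{12} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{n2} \end{bmatrix}$$

where $x_{i2}$ is the value of $x_2$ in the $i$th observation of Table 5.1.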

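The hat matrix corresponding to this design matrix is $\mathbf{H}_{\beta_0,\beta_2}$. It can be calculated using $\mathbf{H}_{\beta_0,\beta_2} = \mathbf{X}_{\beta_0,\beta_2}(\mathbf{X}_{\beta_0,\beta_2}'\mathbf{X}_{\beta_0,\beta_2})^{-1}\mathbf{X}_{\beta_0,\beta_2}'$. Once $\mathbf{H}_{\beta_0,\beta_2}$ is known, the regression sum of squares for the model $Y = \beta_0 + \beta_2 x_2 + \epsilon$ can be calculated using Eqn. (17) as:

$$SS_R(\beta_0, \beta_2) = \mathbf{y}'\left[\mathbf{H}_{\beta_0,\beta_2} - \left(\frac{1}{n}\right)\mathbf{J}\right]\mathbf{y}$$

where $\mathbf{J}$ is an $n \times n$ matrix of ones.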

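Therefore, the partial sum of squares for $\beta_1$ is:

$$SS_R(\beta_1 \mid \beta_0, \beta_2) = SS_R(\beta_0, \beta_1, \beta_2) - SS_R(\beta_0, \beta_2)$$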
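The steps used to obtain $SS_R(\beta_0, \beta_2)$ can be sketched with NumPy:

1. Delete a column of the full design matrix.
2. Form the hat matrix.
3. Evaluate Eqn. (17).

Table 5.1 is not reproduced in this section, so the sketch uses a hypothetical $2^2$ factorial data set. Only the procedure, not the numbers, corresponds to this example. The check that the hat matrix is symmetric and idempotent is an added sanity test that holds for any hat matrix.

```python
import numpy as np

# Hypothetical 2^2 factorial data (not Table 5.1); columns are 1, x1, x2.
X = np.array([[1, -1, -1],
              [1,  1, -1],
              [1, -1,  1],
              [1,  1,  1]], dtype=float)
y = np.array([1.0, 6.0, 3.0, 10.0])
n = len(y)

# Drop the second column (x1) to get the design matrix of Y = b0 + b2*x2.
X_b0b2 = np.delete(X, 1, axis=1)

# Hat matrix H = X (X'X)^-1 X' for the reduced model.
H = X_b0b2 @ np.linalg.inv(X_b0b2.T @ X_b0b2) @ X_b0b2.T
assert np.allclose(H, H.T) and np.allclose(H @ H, H)   # symmetric and idempotent

# Eqn. (17): SS_R = y'[H - (1/n)J]y, with J an n x n matrix of ones.
J = np.ones((n, n))
ssr_b0b2 = float(y @ (H - J / n) @ y)
print(ssr_b0b2)
```

For these data the reduced-model regression sum of squares is 9, up to floating-point rounding.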

 

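Knowing the partial sum of squares, the statistic to test the significance of $\beta_1$ is:

$$f_0 = \frac{SS_R(\beta_1 \mid \beta_0, \beta_2)/1}{MS_E} = \frac{SS_R(\beta_1 \mid \beta_0, \beta_2)}{30.24}$$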

 

The $p$ value corresponding to this statistic, based on the $F$ distribution with 1 degree of freedom in the numerator and 14 degrees of freedom in the denominator, is:

$$p \text{ value} = 1 - P(F \leq f_0)$$
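The $p$ value $1 - P(F \leq f_0)$ is normally obtained from a statistics library. For instance, SciPy's `scipy.stats.f.sf(f0, dfn, dfd)` returns this upper-tail probability for any degrees of freedom.

The dependency-free sketch below is an alternative with a narrower scope: it covers only one numerator degree of freedom and an even number of denominator degrees of freedom. That includes the (1, 14) case of this example. It relies on two facts:

- An $F(1, \nu)$ variable is the square of a Student $t(\nu)$ variable.
- The $t$ distribution has a classical closed-form series when $\nu$ is even.

The function name `f1_p_value` is hypothetical.

```python
import math


def f1_p_value(f0, nu):
    """Return P(F > f0) for F ~ F(1, nu), where nu is a positive even integer.

    Uses F(1, nu) = T**2 with T ~ t(nu) and the closed-form series for even nu:
        P(|T| < t) = sin(th) * (1 + (1/2)c + (1*3)/(2*4)c**2 + ...),
    where th = atan(t / sqrt(nu)), c = cos(th)**2, and the last term holds c**(nu/2 - 1).
    """
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this sketch only handles positive even nu")
    if f0 <= 0:
        return 1.0
    theta = math.atan(math.sqrt(f0) / math.sqrt(nu))
    c = math.cos(theta) ** 2
    term = series = 1.0
    for j in range(1, nu // 2):              # adds the terms in c**1 ... c**(nu/2 - 1)
        term *= c * (2 * j - 1) / (2 * j)
        series += term
    return 1.0 - math.sin(theta) * series     # 1 - P(F <= f0)


# Illustrative statistics with 1 and 14 degrees of freedom, as in this example.
for f0 in (1.0, 4.6, 10.0):
    print(f"f0 = {f0:5.1f}  p value = {f1_p_value(f0, 14):.4f}")
```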

Assume that the desired significance level is 0.1. Since the $p$ value is $< 0.1$, $H_0: \beta_1 = 0$ is rejected and it can be concluded that $\beta_1$ is significant. The test for $\beta_2$ can be carried out in a similar manner. In the results obtained from DOE++, the calculations for this test are displayed in the ANOVA table, as shown in Figure 5.15.

The same conclusion can also be reached using the $t$ test, as explained in Example 5.3 in Chapter 5, Test on Individual Regression Coefficients. The ANOVA and Regression Information tables in DOE++ represent two different ways to test the significance of the variables included in the multiple linear regression model.

 

 

Figure 5.15: ANOVA results for the data in Table 5.1.

 

Sequential Sum of Squares

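The sequential sum of squares for a coefficient is the extra sum of squares when coefficients are added to the model in a sequence. For example, consider the model:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_3 x_3 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 + \beta_{123} x_1 x_2 x_3 + \epsilon \qquad (24)$$

The sequential sum of squares for $\beta_{13}$ is the increase in the regression sum of squares when $\beta_{13}$ is added to the model, following the sequence of Eqn. (24). This extra sum of squares is the regression sum of squares for the model after $\beta_{13}$ is added, minus that for the model before $\beta_{13}$ is added. The model after $\beta_{13}$ is added is as follows:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_3 x_3 + \beta_{13} x_1 x_3 + \epsilon \qquad (25)$$

This is because, to maintain the sequence of Eqn. (24), all coefficients preceding $\beta_{13}$ must be included in the model. These are the coefficients $\beta_0$, $\beta_1$, $\beta_2$, $\beta_{12}$ and $\beta_3$.

Similarly, the model before $\beta_{13}$ is added must contain all coefficients of Eqn. (25) except $\beta_{13}$. This model is as follows:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_3 x_3 + \epsilon \qquad (26)$$

The sequential sum of squares for $\beta_{13}$ can be calculated as follows:

$$SS_R(\beta_{13} \mid \beta_0, \beta_1, \beta_2, \beta_{12}, \beta_3) = SS_R(\beta_0, \beta_1, \beta_2, \beta_{12}, \beta_3, \beta_{13}) - SS_R(\beta_0, \beta_1, \beta_2, \beta_{12}, \beta_3)$$

The first term on the right is the regression sum of squares of Eqn. (25), and the second is that of Eqn. (26).

For the present case, $\boldsymbol{\theta}_2 = [\beta_{13}]'$ and $\boldsymbol{\theta}_1 = [\beta_0, \beta_1, \beta_2, \beta_{12}, \beta_3]'$. Note that, for the sequential sum of squares, $\boldsymbol{\theta}_1$ contains all coefficients preceding the coefficient being tested.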

 

The sequential sums of squares for all terms add up to the regression sum of squares for the full model. However, they are order dependent: the sequential sum of squares for a term depends on which terms precede it in the model.
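Here is a minimal NumPy sketch of the sequential sum of squares. It assumes column 0 of the design matrix is the intercept and the remaining columns enter in the order they appear. Each term's value is the drop in the error sum of squares when it enters, which equals the increase in $SS_R$. The function names and data are hypothetical.

The sketch fits the same two correlated predictors in both orders. In each order, the values telescope to $SS_R$ of the full model, yet the individual values depend on the order of entry.

```python
import numpy as np


def sse(X, y):
    """Error sum of squares of the least squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def sequential_ss(X, y):
    """Sequential SS of each non-intercept column, entered in column order.

    Column 0 is assumed to be the intercept; the intercept-only model has SS_R = 0.
    """
    result = []
    previous = sse(X[:, :1], y)              # SS_E of Y = b0, which equals SS_T
    for j in range(2, X.shape[1] + 1):
        current = sse(X[:, :j], y)           # model with the first j columns
        result.append(previous - current)    # increase in SS_R as column j-1 enters
        previous = current
    return result


# Hypothetical data with strongly correlated predictors.
x1 = np.array([1, 2, 3, 4, 5], dtype=float)
x2 = np.array([1, 2, 3, 4, 6], dtype=float)
y = np.array([2, 4, 5, 8, 9], dtype=float)
ones = np.ones(5)

order_12 = sequential_ss(np.column_stack([ones, x1, x2]), y)   # x1 enters first
order_21 = sequential_ss(np.column_stack([ones, x2, x1]), y)   # x2 enters first
print(order_12, sum(order_12))
print(order_21, sum(order_21))
```

For these data, entering $x_1$ first gives 32.4 and 0.1. Entering $x_2$ first gives about 30.943 and 1.557. Both pairs add up to $SS_R = 32.5$.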

 

Example 5.5

 

This example illustrates the partial $F$ test using the sequential sum of squares. The test is conducted for the coefficient $\beta_1$, corresponding to the predictor variable $x_1$, for the data in Table 5.1. The regression model used for this data set in Example 5.1 is:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

The null hypothesis to test the significance of $\beta_1$ is:

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

The statistic to test this hypothesis is:

$$F_0 = \frac{SS_R(\beta_1 \mid \beta_0)/r}{MS_E}$$

The terms are as follows:

- $SS_R(\beta_1 \mid \beta_0)$ is the sequential sum of squares for $\beta_1$.
- $r$ is the number of degrees of freedom for $SS_R(\beta_1 \mid \beta_0)$. It is one, because just one coefficient, $\beta_1$, is being tested.
- $MS_E$ is the error mean square. It can be obtained using Eqn. (18) and was calculated in Example 5.2 as 30.24.

 

The sequential sum of squares for $\beta_1$ is the regression sum of squares for the model after adding $\beta_1$, $Y = \beta_0 + \beta_1 x_1 + \epsilon$, minus the regression sum of squares for the model before adding $\beta_1$, $Y = \beta_0 + \epsilon$.

 

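The regression sum of squares for the model $Y = \beta_0 + \beta_1 x_1 + \epsilon$ is obtained as shown next. First, the design matrix for this model, $\mathbf{X}_{\beta_0,\beta_1}$, is obtained by dropping the third column of the full design matrix, $\mathbf{X}$, which was obtained in Example 5.1. The third column of $\mathbf{X}$ corresponds to the coefficient $\beta_2$, which is no longer used in the present model. Therefore, the design matrix for the model $Y = \beta_0 + \beta_1 x_1 + \epsilon$ is:

$$\mathbf{X}_{\beta_0,\beta_1} = \begin{bmatrix} 1 & x_{11} \\ 1 & x_{21} \\ \vdots & \vdots \\ 1 & x_{n1} \end{bmatrix}$$

where $x_{i1}$ is the value of $x_1$ in the $i$th observation of Table 5.1.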

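The hat matrix corresponding to this design matrix is $\mathbf{H}_{\beta_0,\beta_1}$. It can be calculated using $\mathbf{H}_{\beta_0,\beta_1} = \mathbf{X}_{\beta_0,\beta_1}(\mathbf{X}_{\beta_0,\beta_1}'\mathbf{X}_{\beta_0,\beta_1})^{-1}\mathbf{X}_{\beta_0,\beta_1}'$. Once $\mathbf{H}_{\beta_0,\beta_1}$ is known, the regression sum of squares for the model $Y = \beta_0 + \beta_1 x_1 + \epsilon$ can be calculated using Eqn. (17) as:

$$SS_R(\beta_0, \beta_1) = \mathbf{y}'\left[\mathbf{H}_{\beta_0,\beta_1} - \left(\frac{1}{n}\right)\mathbf{J}\right]\mathbf{y}$$

The regression sum of squares for the model $Y = \beta_0 + \epsilon$ is equal to zero, since this model does not contain any predictor variables. Therefore:

$$SS_R(\beta_0) = 0$$

The sequential sum of squares for $\beta_1$ is:

$$SS_R(\beta_1 \mid \beta_0) = SS_R(\beta_0, \beta_1) - SS_R(\beta_0) = SS_R(\beta_0, \beta_1)$$

Knowing the sequential sum of squares, the statistic to test the significance of $\beta_1$ is:

$$f_0 = \frac{SS_R(\beta_1 \mid \beta_0)/1}{MS_E} = \frac{SS_R(\beta_1 \mid \beta_0)}{30.24}$$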

 

The $p$ value corresponding to this statistic, based on the $F$ distribution with 1 degree of freedom in the numerator and 14 degrees of freedom in the denominator, is:

$$p \text{ value} = 1 - P(F \leq f_0)$$

Assume that the desired significance level is 0.1. Since the $p$ value is $< 0.1$, $H_0: \beta_1 = 0$ is rejected and it can be concluded that $\beta_1$ is significant. The test for $\beta_2$ can be carried out in a similar manner. This result is shown in Figure 5.16.

 

Figure 5.16: Sequential sum of squares for the data in Table 5.1.

 
See Also:
 
Hypothesis Tests in Multiple Linear Regression
Confidence Intervals in Multiple Linear Regression
Estimating Regression Models Using Least Squares