Use of Regression to Calculate Sum of Squares

This section explains why DOE++ uses regression for all calculations related to the sum of squares. Many textbooks present the method of direct summation to calculate the sum of squares. However, this method is applicable only to balanced designs and may give incorrect results for unbalanced designs. For example, the sum of squares for factor $A$ in a balanced factorial experiment with two factors, $A$ and $B$, is given by:

$$SS_A=\sum_{i=1}^{n_a} n_b\,n\,(\bar{y}_{i..}-\bar{y}_{...})^2=\sum_{i=1}^{n_a}\frac{y_{i..}^2}{n_b\,n}-\frac{y_{...}^2}{n_a n_b n}$$

where $n_a$ is the number of levels of factor $A$, $n_b$ is the number of levels of factor $B$, and $n$ is the number of samples for each combination of $A$ and $B$. The term $\bar{y}_{i..}$ is the mean value for the $i$th level of factor $A$, $y_{i..}$ is the sum of all observations at the $i$th level of factor $A$, and $y_{...}$ is the sum of all observations.
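As a quick check that the two forms of $SS_A$ above agree, the following Python sketch evaluates both for a small balanced data set. The helper name `ss_a_balanced` and the data are hypothetical; the array is indexed as `y[i, j, k]` for level $i$ of $A$, level $j$ of $B$ and replicate $k$.

```python
import numpy as np

def ss_a_balanced(y):
    """Direct-summation SS_A for a balanced two-factor design.

    y has shape (n_a, n_b, n): y[i, j, k] is the k-th replicate at
    level i of factor A and level j of factor B.
    Returns the deviation form and the computational form of SS_A.
    """
    n_a, n_b, n = y.shape
    level_totals = y.sum(axis=(1, 2))   # y_i.. for each level of A
    level_means = y.mean(axis=(1, 2))   # ybar_i..
    grand_total = y.sum()               # y_...
    grand_mean = y.mean()               # ybar_...

    # sum_i n_b * n * (ybar_i.. - ybar_...)^2
    ss_dev = n_b * n * np.sum((level_means - grand_mean) ** 2)
    # sum_i y_i..^2 / (n_b * n) - y_...^2 / (n_a * n_b * n)
    ss_comp = (np.sum(level_totals ** 2) / (n_b * n)
               - grand_total ** 2 / (n_a * n_b * n))
    return ss_dev, ss_comp

# Hypothetical balanced data: 2 levels of A, 2 levels of B, 2 replicates.
y = np.array([[[19.0, 21.0], [14.0, 16.0]],
              [[15.0, 15.0], [11.0, 13.0]]])
ss_dev, ss_comp = ss_a_balanced(y)
print(ss_dev, ss_comp)  # 32.0 32.0
```

The computational form divides every level total by the same count $n_b n$, which is valid only when every cell holds exactly $n$ replicates.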

The analogous formula for $SS_A$ in an unbalanced design is:

$$SS_A=\sum_{i=1}^{n_a}\frac{y_{i..}^2}{n_{i.}}-\frac{y_{...}^2}{n_{..}}$$

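where $n_{i.}$ is the number of observations at the $i$th level of factor $A$ and $n_{..}$ is the total number of observations. Similarly, the sums of squares for factor $B$ and for the interaction $AB$ are given by:

$$SS_B=\sum_{j=1}^{n_b}\frac{y_{.j.}^2}{n_{.j}}-\frac{y_{...}^2}{n_{..}}$$

$$SS_{AB}=\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}\frac{y_{ij.}^2}{n_{ij}}-\sum_{i=1}^{n_a}\frac{y_{i..}^2}{n_{i.}}-\sum_{j=1}^{n_b}\frac{y_{.j.}^2}{n_{.j}}+\frac{y_{...}^2}{n_{..}}$$

where $y_{.j.}$ and $n_{.j}$ are the sum and the number of observations at the $j$th level of factor $B$, and $y_{ij.}$ and $n_{ij}$ are the sum and the number of observations for the combination of the $i$th level of $A$ and the $j$th level of $B$.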
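The three direct-summation relations can be sketched in Python as below. The helper `direct_summation_ss` and the data set are hypothetical (they are not the data of Table 6.6); the data are deliberately unbalanced, with three runs in cells (1,1) and (2,2) and one run in each of the other two cells.

```python
from collections import defaultdict

def direct_summation_ss(data):
    """Direct-summation SS_A, SS_B and SS_AB for a two-factor design.

    data is a list of (i, j, y) tuples: i is the level of factor A,
    j the level of factor B and y the observed response. The number
    of observations per cell may differ (unbalanced design).
    """
    tot_a, cnt_a = defaultdict(float), defaultdict(int)    # y_i.., n_i.
    tot_b, cnt_b = defaultdict(float), defaultdict(int)    # y_.j., n_.j
    tot_ab, cnt_ab = defaultdict(float), defaultdict(int)  # y_ij., n_ij
    for i, j, y in data:
        tot_a[i] += y
        cnt_a[i] += 1
        tot_b[j] += y
        cnt_b[j] += 1
        tot_ab[i, j] += y
        cnt_ab[i, j] += 1

    grand_total = sum(y for _, _, y in data)   # y_...
    correction = grand_total ** 2 / len(data)  # y_...^2 / n_..
    sum_a = sum(tot_a[k] ** 2 / cnt_a[k] for k in tot_a)
    sum_b = sum(tot_b[k] ** 2 / cnt_b[k] for k in tot_b)
    sum_ab = sum(tot_ab[k] ** 2 / cnt_ab[k] for k in tot_ab)

    ss_a = sum_a - correction
    ss_b = sum_b - correction
    ss_ab = sum_ab - sum_a - sum_b + correction
    return ss_a, ss_b, ss_ab

# Hypothetical unbalanced data: cell counts are 3, 1, 1 and 3.
data = [(1, 1, 19.0), (1, 1, 20.0), (1, 1, 21.0), (1, 2, 15.0),
        (2, 1, 15.0), (2, 2, 11.0), (2, 2, 12.0), (2, 2, 13.0)]
ss_a, ss_b, ss_ab = direct_summation_ss(data)
print(ss_a, ss_b, ss_ab)  # 72.0 72.0 -46.5
```

For these hypothetical data the relations give $SS_A = SS_B = 72$ but a negative $SS_{AB}$. The unequal cell counts make the levels of $A$ and $B$ correlated, so the direct-summation $SS_A$ and $SS_B$ each claim part of the same variation, and subtracting both from the cell sum of squares overshoots.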

Applying these relations to the unbalanced data of Table 6.6 gives a negative sum of squares for the interaction $AB$:

$$SS_{AB}<0$$

Table 6.6: Example of an unbalanced design.

which is obviously incorrect, since a sum of squares cannot be negative. For a detailed discussion, refer to [23].

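The correct sum of squares can be calculated from the $y$ and $X$ matrices for the design of Table 6.6, where $y$ is the vector of observed responses and $X$ is the design matrix of the regression model, whose last column represents the interaction effect $AB$.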
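The following sketch shows one way to assemble $y$ and $X$ for a two-factor, two-level design. It assumes -1/+1 coding of the factor levels and uses hypothetical data (not those of Table 6.6); `build_y_x` is a hypothetical helper. Printing $X'X$ shows why direct summation breaks down: with unequal cell counts the columns of $X$ are no longer orthogonal.

```python
import numpy as np

# Hypothetical unbalanced 2x2 design: (level of A, level of B, response).
# Cells (1,1) and (2,2) have three runs each; (1,2) and (2,1) have one.
runs = [(1, 1, 19.0), (1, 1, 20.0), (1, 1, 21.0), (1, 2, 15.0),
        (2, 1, 15.0), (2, 2, 11.0), (2, 2, 12.0), (2, 2, 13.0)]

def build_y_x(runs):
    """Return y and X = [1, A, B, AB], coding level 1 as -1 and level 2 as +1."""
    y = np.array([r[2] for r in runs])
    a = np.array([-1.0 if r[0] == 1 else 1.0 for r in runs])
    b = np.array([-1.0 if r[1] == 1 else 1.0 for r in runs])
    X = np.column_stack([np.ones(len(runs)), a, b, a * b])
    return y, X

y, X = build_y_x(runs)
# Off-diagonal entries are nonzero: the columns are not orthogonal.
print(X.T @ X)

# For comparison: the same layout with two runs in every cell.
balanced = [(i, j, 0.0) for i in (1, 2) for j in (1, 2) for _ in range(2)]
_, X_bal = build_y_x(balanced)
print(X_bal.T @ X_bal)  # diagonal
```

For the hypothetical runs, the $A$ and $B$ columns (and the intercept and $AB$ columns) have an inner product of 4, whereas the balanced layout gives $X'X = 8I$. When the columns are not orthogonal, the variation attributed to one term depends on which other terms are in the model.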

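Then the sum of squares for the interaction $AB$ can be calculated as:

$$SS_{AB}=y'\left[H-\frac{1}{n_{..}}J\right]y-y'\left[H_{\sim AB}-\frac{1}{n_{..}}J\right]y$$

where $H=X(X'X)^{-1}X'$ is the hat matrix and $J$ is the $n_{..}\times n_{..}$ matrix of ones ($n_{..}$ being the total number of observations). The matrix $H_{\sim AB}$ can be calculated using $H_{\sim AB}=X_{\sim AB}(X_{\sim AB}'X_{\sim AB})^{-1}X_{\sim AB}'$, where $X_{\sim AB}$ is the design matrix $X$ excluding the last column, which represents the interaction effect $AB$. Because the columns of $X_{\sim AB}$ are a subset of the columns of $X$, the difference $H-H_{\sim AB}$ is itself a projection matrix, so $SS_{AB}=y'(H-H_{\sim AB})y$ can never be negative. Evaluating this expression for the data of Table 6.6 gives the sum of squares for the interaction $AB$.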
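A minimal NumPy sketch of the hat-matrix calculation follows. It uses hypothetical unbalanced 2×2 data with -1/+1 coding (not the data of Table 6.6, and not necessarily the coding DOE++ uses); `hat_matrix` and `ss_interaction` are hypothetical helper names. It calls `np.linalg.inv` for readability, which assumes $X$ has full column rank.

```python
import numpy as np

def hat_matrix(X):
    """H = X (X'X)^-1 X' (assumes X has full column rank)."""
    return X @ np.linalg.inv(X.T @ X) @ X.T

def ss_interaction(y, X):
    """SS_AB = y'[H - J/n]y - y'[H_~AB - J/n]y, where H_~AB is built from X
    with its last (interaction) column removed."""
    n = len(y)
    J = np.ones((n, n))            # n x n matrix of ones
    H = hat_matrix(X)
    H_red = hat_matrix(X[:, :-1])  # X_~AB: drop the AB column
    ss = y @ (H - J / n) @ y - y @ (H_red - J / n) @ y
    return ss, H - H_red

# Hypothetical unbalanced 2x2 data (cell counts 3, 1, 1, 3), coded -1/+1.
a = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
b = np.array([-1, -1, -1, 1, -1, 1, 1, 1], dtype=float)
y = np.array([19, 20, 21, 15, 15, 11, 12, 13], dtype=float)
X = np.column_stack([np.ones(len(y)), a, b, a * b])

ss_ab, D = ss_interaction(y, X)
print(round(ss_ab, 6))  # 1.5
# H - H_~AB is symmetric and idempotent, so y'(H - H_~AB)y >= 0.
print(np.allclose(D, D.T), np.allclose(D @ D, D))  # True True
```

For these data the regression approach gives $SS_{AB}=1.5$, while the direct-summation relation for $SS_{AB}$ applied to the same observations gives $-46.5$. Since a projection matrix depends only on the space spanned by the columns, any coding of $A$ and $B$ that spans the same space yields the same $H$ and $H_{\sim AB}$, and therefore the same $SS_{AB}$.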

This is the value that is calculated by DOE++ (see Figure 6.15 for the experiment design and Figure 6.16 for the analysis).

Figure 6.15: Unbalanced experimental design for the data in Table 6.6.

Figure 6.16: Analysis for the unbalanced data in Table 6.6.

See Also:

Blocking

Two Level Factorial Experiments