|
Discussion of
the Residual Sum of Squares in DOE
Variation occurs in nature, be
it the tensile strength of a particular grade of steel, the caffeine
content in your energy drink or the distance traveled by your
vehicle in a day. Variations are also seen in the observations
recorded during multiple executions of a process, even when
conditions are kept as homogeneous as possible. The natural
variations that occur in a process, even when all factors are
maintained at the same level, are often termed noise. In
Design of Experiments (DOE) analysis, a key goal is to
study the effect(s) of one or more factors on measurements of interest (or responses)
for a product or process.
It thus becomes extremely important to distinguish the changes
in the response caused by a factor from those caused by noise. A
number of statistical methods, generally referred to as hypothesis testing,
are available to achieve this. These methods normally involve
quantifying the variability due to noise or error in
conjunction with the variability due to different factors and
their interactions. It is then possible to determine whether the changes
observed in the response(s) due to changes in the factor(s) are
significant.
In this article, we will
discuss different components of error and give an example of how
the error and each of its components are defined and used in the
context of hypothesis testing.
Background
In a DOE analysis, the error is quantified in terms of the
residual sum of squares. To illustrate this, consider the
standard regression model:
where
a
and b are coefficients
and εi is the error term.
Given this model, the fitted
value of yi is:
|
 |
(2) |
where
and
are estimated
coefficients.
The discrepancy between the
observed value and the fitted value is called the residual. Note
that while the residual and the error term are often used
interchangeably, the residual is the estimate of the error term
εi.
The residual sum of squares
(SSE) is an overall measurement of the discrepancy between the
data and the estimation model. The smaller the discrepancy, the
better the model's estimations will be. The discrepancy is quantified in
terms of the sum of squares of the residuals.
The sum of squares of the
residuals usually can be divided into two parts: pure error and
lack of fit. They are discussed in subsequent sections.
As mentioned, it is of great
importance to be able to identify which terms are significant.
(The word term refers to an effect in the model — for example,
the effect of factor A,
the effect of factor B or the effect of their interaction, AB.) To accomplish this, the mean square of
the term of interest is tested against the error mean square.
The mean square of the error is defined as the sum of squares of
the error divided by the degrees of freedom attributed to the
error.
|
 |
(4) |
The degrees of freedom of the
error is the number of observations in excess of the unknowns.
For example, if there are 3 unknowns and 7 independent
observations are taken then the degrees of freedom value is 4 (7 − 3 =
4).
The mean square of a term is
defined as the sum of squares of that term divided by the
degrees of freedom attributed to that term.
|
 |
(5) |
The degrees of freedom of a
term is the number of independent effects for that term. For
example, for a factor with 4 levels (i.e. 4 different
factor values or settings), only three are independent.
The
degrees of freedom value associated with that term is then 3. For more
information, see [1].
To test the hypothesis H0: the
tested term is not significant, a test statistic is needed. The
ratio between the term mean square and the error mean square is
called the F ratio.
|
 |
(6) |
It can be shown that if a term
is not significant, this ratio follows the F distribution. H0 is
rejected (i.e. the term is considered significant) if:

where
is the percentile of the
F distribution corresponding to a cumulative probability of (1-
a) and
a is the significance level.
Example
The following example shows the
definition and applications of the residual sum of squares, the
pure error and the lack of fit.
Consider the following
two-level design, where the two levels are coded as -1 and 1. Y
is the measurement of interest or the response. The obtained
measurements are shown next.
|
A |
B |
Y |
| -1 |
-1 |
12 |
| 1 |
-1 |
1 |
| -1 |
1 |
32 |
| 1 |
1 |
9 |
| -1 |
-1 |
23 |
| 1 |
-1 |
3 |
| -1 |
1 |
10 |
| 1 |
1 |
8 |
Each run of this experiment
involves a combination of the levels of the investigated
factors, A and B. Each of these combinations is referred to as a
treatment. Since there are two factors with two levels each,
there are 4 (22 = 4) possible combinations for the factor
settings. Since all possible combinations are present, this
design is called a full factorial design.
Multiple runs at a given
treatment are called replicates. This particular example has 2 replicates for each treatment (i.e.
there are 2 replicates of
the treatment with A = -1 and B = -1, 2 replicates of the
treatment with A = 1 and B = -1, etc.)
Analysis Results
Assuming that the interaction
term is not of interest, the following results are obtained from
ReliaSoft’s DOE++
software.

Figure
1: DOE++ Analysis Results
[Click to Enlarge]
The following discussion will
focus on the section marked above in red.
Residual Error
The regression model in this
analysis, not considering the interaction term AB, is:
|
 |
(7) |
with
a0 = 12.25,
a1 = -7 and
a2
= 2.5 (obtained from the Regression Information Table above
under the Coefficient column). The residual sum of squares can
then be obtained using equation (1).

Given 3 unknowns in the model,
a0,
a1 and
a2, and 8 observations, the degrees of freedom of the
error can be found.
Pure Error
Pure error reflects the
variability of the observations within each treatment. The sum
of squares for the pure error is the sum of the squared
deviations of the responses from the mean response in each set
of replicates.
In this example, there are four
treatments. The mean values for the treatments are shown below.
|
A |
B |
Y mean |
|
-1 |
-1 |
17.5 |
|
1 |
-1 |
2 |
|
-1 |
1 |
21 |
|
1 |
1 |
8.5 |
The pure error can be
calculated as follows.
- The pure error for the
treatment with A = -1 and B = -1 is:

- The pure error for the
treatment with A = 1 and B = -1 is:

- The pure error for the
treatment with A = -1 and B = 1 is:

- The pure error for the
treatment with A = 1 and B = 1 is:

- The pure error sum of squares
is then:

The degrees of freedom
corresponding to the pure error sum of squares is:

where
n
is the number of treatments and
m
is the number of replicates.
For the above example:
The mean square of the pure
error can now be obtained as follows:

The mean square of the pure
error can then be used to test the adequacy of the model as
shown in the next section.
Lack of Fit Error
The lack of fit measures the
error due to deficiency in the model. In this particular example,
the deficiency is explained by the missing term AB in the model.
If a regression model fits the
data well, the mean square of the lack of fit error should be
close to the mean square of the pure error. Therefore, the lack
of fit error can be used to test whether the model can fit the data
well. If the lack of fit term is significant, on the other hand,
we would reject the null hypothesis and would conclude that the
model is not adequate.
The lack of fit sum of squares
is calculated as follows:

The degrees of freedom are
calculated as follows:

The mean square of the lack of
fit can be obtained by:

The statistic to test the
significance of the lack of fit can then be calculated as
follows:

Note that when we are testing
for significance of the lack of fit, the denominator is the mean
square of the pure error, while when we are testing for
significance of a term in the model, the denominator is the mean
square of the residual as shown in equation (3).
The critical value for this
test at a 0.05 significance level is:

Since
(0.59 < 7.709), at a
significance level of 0.05, we fail to reject the hypothesis
that the model adequately fits the data. In other words, at that
significance level, it is acceptable to use the reduced model
that does not include the interaction term.
Alternatively, we could also
find the lowest significance level,
a, that would lead to the
rejection of the null hypothesis at the given value of the test
statistic and to the conclusion that the model is not
acceptable. This is also called the p value and it is defined
as:

For this example, it can be
calculated as follows:

Conclusion
This article presented some
background on the calculation of the error and its different
components. Of particular interest was the use of the mean
square of the pure error and the lack of fit to test for the
validity of the chosen model. Although not illustrated here,
once the residual error is obtained, it can be used to test for
the significance of any term in the model.
References
[1] ReliaSoft Corporation,
Experiment Design and Analysis Reference, ReliaSoft Publishing,
Tucson, AZ, 2008.
|