Using DOE++ to Analyze Unbalanced Designs
One purpose of Design of Experiments (DOE) is to determine which factors have significant effects on the response of interest and how they affect it. For example, one might examine how speed and the type of gasoline affect a vehicle's mileage and which of the two factors has the greater impact. When reliability engineers use DOE, the response of interest is product life: a reliability engineer uses DOE to investigate which stresses influence product life and by how much. To obtain accurate estimates, experiments must be carefully designed, and a general requirement is that the design be "balanced." In this article, we discuss what an unbalanced design is and how it can be analyzed correctly using ReliaSoft's DOE++ software.
Definition: If all the treatments (combinations of factor levels) in an experiment have the same number of observations, the design is called a balanced design.
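The definition can be checked mechanically. The sketch below is a minimal illustration, assuming the data are stored as a Python dictionary that maps each (A level, B level) treatment to its list of observations; the function name `is_balanced` and this data layout are our own, not part of DOE++. It uses the data of Tables 2 and 3 from the examples later in this article.

```python
from itertools import product


def is_balanced(cells, levels_a, levels_b):
    """Return True if every (A, B) treatment was observed and all have the same count.

    cells: hypothetical layout, a dict mapping (a_level, b_level) -> list of observations.
    A treatment missing from the dict counts as having zero observations.
    """
    counts = [len(cells.get((a, b), [])) for a, b in product(levels_a, levels_b)]
    return min(counts) > 0 and len(set(counts)) == 1


# Data of Table 2 (Example 1) and Table 3 (Example 2)
table2 = {(1, 1): [5, 7], (1, 2): [13, 15], (2, 1): [20, 30], (2, 2): [10, 14]}
table3 = {(1, 1): [5, 7, 70], (1, 2): [13, 15], (2, 1): [20, 30], (2, 2): [10, 14]}

print(is_balanced(table2, [1, 2], [1, 2]))  # True: two observations in every cell
print(is_balanced(table3, [1, 2], [1, 2]))  # False: cell (1, 1) has three
```

Listing the expected levels explicitly means a treatment that was never run is reported as unbalanced rather than silently ignored.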
Rather than explaining the benefits of balanced designs with a convoluted mathematical argument, we give a simple, intuitive explanation here. Suppose you want to compare the hardness of the steel supplied by two manufacturers. It is better to take the same number of samples from each manufacturer and compare their mean values. If you take only 1 sample from Vendor A and 2000 samples from Vendor B, the comparison will not be reliable, especially if the single sample from Vendor A is an outlier.
Analysis of Variance (ANOVA) is the technique usually employed to determine which factors are the major contributors to the variation in the response. In other words, this type of analysis is used to identify which factors have a significant influence on the response. Once these factors are identified, they can be adjusted to optimize the response. A typical ANOVA table for a two-level full factorial design with two factors looks like the one shown next.
Table 1: A Typical ANOVA Table

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Squares |
|---|---|---|---|
| Model | 3 | 368 | 122.6667 |
| A | 1 | 193.1429 | 193.1429 |
| B | 1 | 56 | 56 |
| AB | 1 | 28.5714 | 28.5714 |
| Residual | 1 | 648 | 648 |
| Pure Error | 1 | 648 | 648 |
| Total | 4 | 1016 | |
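The Mean Squares column is each sum of squares divided by its degrees of freedom. The short sketch below reproduces that column from Table 1 and, as an extra illustration not shown in the table, forms the usual F ratio of each term's mean square to the residual mean square. The variable names are our own.

```python
# (source, degrees of freedom, sum of squares) as listed in Table 1
table1 = [
    ("Model", 3, 368.0),
    ("A", 1, 193.1429),
    ("B", 1, 56.0),
    ("AB", 1, 28.5714),
    ("Residual", 1, 648.0),
]

# Mean square = sum of squares / degrees of freedom
mean_squares = {source: ss / df for source, df, ss in table1}

# F ratio = term mean square / residual mean square
f_ratios = {
    source: ms / mean_squares["Residual"]
    for source, ms in mean_squares.items()
    if source != "Residual"
}

for source, ms in mean_squares.items():
    line = f"{source:<8} MS = {ms:.4f}"
    if source in f_ratios:
        line += f"  F = {f_ratios[source]:.4f}"
    print(line)
```

Note that the sums of squares of A, B and AB in Table 1 add up to 277.7143, not to the Model value of 368. A total of 4 degrees of freedom means five observations spread over four treatments, so that design cannot be balanced, and for such designs the individual term sums of squares generally do not add up to the model sum of squares.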
However, the method that many DOE textbooks give for calculating the sums of squares in the ANOVA table applies only to balanced designs.[1, 2] (Dr. S. R. Searle provides an excellent discussion of balanced designs.[3])
Although heavily unbalanced
designs should be avoided, there are situations where it may be
necessary to work with an unbalanced design. Many engineers may
not be aware that the method taught in textbooks cannot be used
to obtain correct results in such cases. Fortunately, many DOE
software packages can handle unbalanced designs correctly. In
this article, we will illustrate why the calculation method for
balanced designs cannot be used for unbalanced designs and will
provide the results calculated by DOE++.
Example 1: Balanced Design
This simple example is based on the following data:
Table 2: Data for a Balanced Design

| | Factor B, Level 1 | Factor B, Level 2 | Row Total |
|---|---|---|---|
| Factor A, Level 1 | 5, 7 | 13, 15 | 40 |
| Factor A, Level 2 | 20, 30 | 10, 14 | 74 |
| Column Total | 62 | 52 | 114 |
This is a balanced design because every treatment (cell) has the same number of observations, namely two. The data can be entered into DOE++ as a two-level full factorial design with two factors and two replicates.

Figure 1: Example 1, Data Input in DOE++
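For a balanced design, the sums of squares for factors A, B and AB can be calculated using the method given in the textbooks:[1, 2]

$$SS_A = bn\sum_{i=1}^{a}\left(\bar{y}_{i..}-\bar{y}_{...}\right)^2$$

$$SS_B = an\sum_{j=1}^{b}\left(\bar{y}_{.j.}-\bar{y}_{...}\right)^2$$

$$SS_{AB} = n\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\bar{y}_{ij.}-\bar{y}_{...}\right)^2 - SS_A - SS_B$$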
where:
- $a$ is the number of levels of factor A.
- $b$ is the number of levels of factor B.
- $n$ is the number of observations in each cell (treatment).
- $\bar{y}_{.j.}$, $\bar{y}_{i..}$, $\bar{y}_{ij.}$ and $\bar{y}_{...}$ are the corresponding column, row, cell and grand averages.
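Expressed mathematically, with $y_{ijk}$ denoting the $k$th observation at level $i$ of factor A and level $j$ of factor B:

$$\bar{y}_{i..}=\frac{1}{bn}\sum_{j=1}^{b}\sum_{k=1}^{n}y_{ijk},\qquad \bar{y}_{.j.}=\frac{1}{an}\sum_{i=1}^{a}\sum_{k=1}^{n}y_{ijk}$$

$$\bar{y}_{ij.}=\frac{1}{n}\sum_{k=1}^{n}y_{ijk},\qquad \bar{y}_{...}=\frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}y_{ijk}$$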
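Using the above equations with $a = b = n = 2$: the row averages are 10 and 18.5, the column averages are 15.5 and 13, the cell averages are 6 and 14 (A at Level 1) and 25 and 12 (A at Level 2), and the grand average is 14.25. Therefore:

$$SS_A = 2\cdot 2\left[(10-14.25)^2+(18.5-14.25)^2\right] = 144.5$$

$$SS_B = 2\cdot 2\left[(15.5-14.25)^2+(13-14.25)^2\right] = 12.5$$

$$SS_{AB} = 220.5$$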
which matches the following
results from DOE++ using the Individual Test Terms
option:

Figure 2: Example 1, ANOVA Table in DOE++
The ANOVA table shows that factor A and the interaction effect AB are significant at a risk level of 0.1, while factor B is not. In other words, the p values for factor A and the interaction AB are less than the 0.10 risk level used in the analysis. Some readers may notice that the sum of squares in this table is labeled "Partial Sum of Squares." In fact, there are several types of sums of squares. DOE++ ships with example files that discuss many advanced topics in Design of Experiments, such as balanced vs. unbalanced designs and partial vs. sequential sums of squares. For additional details, please refer to the examples shipped with DOE++ (accessible by choosing Help > Open Examples Folder) and the accompanying reference book.[4]
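The significance statements above can be checked by hand. The sketch below assumes the standard F test, in which each term's mean square is divided by the residual (pure error) mean square, and compares the ratio with the tabulated critical value F(0.10; 1, 4) of about 4.54 instead of computing exact p values. The balanced-design sums of squares for the Table 2 data are 144.5 (A), 12.5 (B) and 220.5 (AB); the variable names are our own.

```python
from statistics import mean

# Example 1 data (Table 2): (A level, B level) -> observations
cells = {(1, 1): [5, 7], (1, 2): [13, 15], (2, 1): [20, 30], (2, 2): [10, 14]}

# Balanced-design sums of squares for these data; each term has 1 degree of freedom
ss_terms = {"A": 144.5, "B": 12.5, "AB": 220.5}

# Residual (pure error): squared deviations of each observation from its cell mean
ss_error = sum((y - mean(obs)) ** 2 for obs in cells.values() for y in obs)
df_error = sum(len(obs) for obs in cells.values()) - len(cells)  # 8 runs - 4 cells
ms_error = ss_error / df_error

F_CRIT = 4.54  # tabulated upper 10% point of the F distribution with (1, 4) df

f_values = {term: (ss / 1) / ms_error for term, ss in ss_terms.items()}
significant = {term for term, f in f_values.items() if f > F_CRIT}

for term, f in f_values.items():
    status = "significant" if term in significant else "not significant"
    print(f"{term:<3} F = {f:6.3f}  {status} at the 0.10 risk level")
```

With a residual mean square of 62 / 4 = 15.5, A and AB exceed the critical value and B does not, which agrees with the conclusion drawn from the DOE++ table.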
Example 2: Unbalanced Design
If we add one more observation to the data in Example 1, it
becomes an unbalanced design. The modified data set, which
includes one more observation in cell 1 (A = Level 1, B = Level
1), is given in Table 3:
Table 3: Data for an Unbalanced Design

| | Factor B, Level 1 | Factor B, Level 2 | Row Total |
|---|---|---|---|
| Factor A, Level 1 | 5, 7, 70 | 13, 15 | 110 |
| Factor A, Level 2 | 20, 30 | 10, 14 | 74 |
| Column Total | 132 | 52 | 184 |
For this unbalanced design, the equations used in Example 1 to calculate the sums of squares are not applicable. For example, if you calculate $SS_{AB}$ using an analogous equation:

Clearly, this result is wrong: a sum of squares cannot be negative, because it is a sum of squared terms. The negative value shows that the calculation method itself is at fault; the method used to compute sums of squares for balanced designs cannot be used for unbalanced designs. Using DOE++, you get the following sums of squares:

Figure 4: Example 2, ANOVA Table in DOE++
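The breakdown of the balanced-design method can be reproduced in a few lines. The sketch below applies one analogous form to both data sets: it computes the row, column and cell sums of squares from the averages, weighting each squared deviation by the actual number of observations behind that average, and then obtains the interaction by subtraction, $SS_{AB} = SS_{cells} - SS_A - SS_B$. This is our assumption of what an "analogous equation" looks like, and the function name `naive_ss_ab` is hypothetical; the equation used in the original calculation may differ in detail.

```python
from statistics import mean


def naive_ss_ab(cells):
    """Interaction SS by subtraction: SS_cells - SS_A - SS_B.

    cells: dict mapping (a_level, b_level) -> list of observations.
    Each squared deviation from the grand mean is weighted by the number of
    observations in that row, column or cell. For balanced data this equals the
    textbook interaction sum of squares; for unbalanced data it is not a valid
    sum of squares and can even be negative.
    """
    grand = mean([y for obs in cells.values() for y in obs])

    def weighted_ss(groups):
        # Sum over groups of (group size) * (group mean - grand mean)^2
        return sum(len(g) * (mean(g) - grand) ** 2 for g in groups)

    # Pool the observations by level of A (rows) and by level of B (columns)
    rows, cols = {}, {}
    for (a, b), obs in cells.items():
        rows.setdefault(a, []).extend(obs)
        cols.setdefault(b, []).extend(obs)

    ss_a = weighted_ss(rows.values())
    ss_b = weighted_ss(cols.values())
    ss_cells = weighted_ss(cells.values())
    return ss_cells - ss_a - ss_b


table2 = {(1, 1): [5, 7], (1, 2): [13, 15], (2, 1): [20, 30], (2, 2): [10, 14]}
table3 = {(1, 1): [5, 7, 70], (1, 2): [13, 15], (2, 1): [20, 30], (2, 2): [10, 14]}

print(round(naive_ss_ab(table2), 4))  # 220.5
print(round(naive_ss_ab(table3), 4))  # about -16.6889, a negative "sum of squares"
```

For the balanced data the subtraction returns the correct interaction sum of squares of 220.5; for the unbalanced data the row, column and cell terms are no longer orthogonal, and the difference comes out negative.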
Instead of using the above formulas to calculate the sum of squares for each term, DOE++ always uses the linear regression method; with this method, $SS_{AB}$ is calculated to be 0.0606. The results of Examples 1 and 2 show that the linear regression method calculates the sums of squares correctly for both balanced and unbalanced designs, so it makes sense for DOE++ to use this method in all cases. For details of the calculation, please refer to ReliaSoft's Experiment Design and Analysis Reference.[4]
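The regression approach can be sketched as follows. This is a minimal illustration of the general technique, not DOE++'s internal code: it assumes -1/+1 coding for the two levels of each factor and defines the partial (adjusted) sum of squares of a term as the increase in the residual sum of squares when that term's column is removed from the full model. The function names are our own. Under these assumptions it returns 144.5, 12.5 and 220.5 for Example 1 and 0.0606 for the interaction in Example 2, matching the value quoted above.

```python
import numpy as np


def design_matrix(cells):
    """Build y and the coded model matrix with columns [intercept, A, B, AB].

    cells: dict mapping (a, b) -> observations, with levels coded -1 and +1.
    """
    rows, y = [], []
    for (a, b), obs in cells.items():
        for value in obs:
            rows.append([1.0, a, b, a * b])
            y.append(value)
    return np.array(rows, dtype=float), np.array(y, dtype=float)


def residual_ss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return float(residuals @ residuals)


def partial_ss(cells):
    """Partial SS of each term: SSE(model without the term) - SSE(full model)."""
    X, y = design_matrix(cells)
    sse_full = residual_ss(X, y)
    columns = {"A": 1, "B": 2, "AB": 3}
    return {
        term: residual_ss(np.delete(X, col, axis=1), y) - sse_full
        for term, col in columns.items()
    }


# Level 1 coded as -1, Level 2 as +1
example1 = {(-1, -1): [5, 7], (-1, 1): [13, 15], (1, -1): [20, 30], (1, 1): [10, 14]}
example2 = {(-1, -1): [5, 7, 70], (-1, 1): [13, 15], (1, -1): [20, 30], (1, 1): [10, 14]}

for name, data in [("Example 1", example1), ("Example 2", example2)]:
    ss = partial_ss(data)
    print(name, {term: round(value, 4) for term, value in ss.items()})
```

Because every sum of squares here is a difference between the residual sums of squares of a smaller and a larger least-squares model, it can never be negative, whether or not the design is balanced.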
References:
[1] Montgomery, D., Design and Analysis of Experiments, 5th ed., New York: John Wiley & Sons, 2001, p. 180.
[2] Wu, C. F. and Hamada, M., Experiments: Planning, Analysis, and Parameter Design Optimization, New York: John Wiley & Sons, 2000, pp. 57-58.
[3] Searle, S. R., "A Biometrics Invited Paper: Topics in Variance Component Estimation," Biometrics, vol. 27, no. 1, pp. 1-76.
[4] ReliaSoft Corporation, Experiment Design and Analysis Reference, Tucson, AZ: ReliaSoft Publishing, 2008.