Type I and Type II Errors and
Their Applications
Type I and Type II errors are
two well-known concepts in quality engineering, which are related
to hypothesis testing. Often engineers are confused by these two
concepts simply because they have many different names. We list
a few of them here.
Type I errors are also called:
1) Producer’s risk
2) False alarm
3) False positive
4) α error
Type II errors are also called:
1) Consumer’s risk
2) Misdetection
3) False negative
4) β error
Type I and Type II errors can
be defined in terms of hypothesis testing.
- A Type I error (α) is the probability of rejecting a true
null hypothesis.
- A Type II error (β) is the probability of failing to reject
a false null hypothesis.
Or simply:
- A Type I error (α) is the probability of telling you things
are wrong, given that things are correct.
- A Type II error (β) is the probability of telling you things
are correct, given that things are wrong.
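The above statements are summarized in Table 1.

Table 1: Type I and Type II Errors
Decision            | H0 is true         | H0 is false
Reject H0           | Type I error (α)   | Correct decision
Fail to reject H0   | Correct decision   | Type II error (β)
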
One concept related to Type II
errors is "power." Power is the probability of rejecting H0 when
H1 is true. The value of power is equal to 1 - β.
It is the power to detect the change.
In this article, we will use
two examples to clarify what Type I and Type II errors are and
how they can be applied.
Example 1 - Application in
Manufacturing
Assume an engineer is interested in controlling the diameter of
a shaft. Under the normal (in control) manufacturing process,
the diameter is normally distributed with a mean of 10 mm and a
standard deviation of 1 mm. She records the difference between
the measured value and the nominal value for each shaft. If the
absolute value of the difference, D = M - 10 (where M is the
measurement), is beyond a critical value, she will check
to see if the manufacturing process is out of control.
What is the probability that
she will check the machine but the manufacturing process is, in
fact, in control? Or, in other words, what is the probability
that she will check the machine even though the process is in the
normal state and the check is actually unnecessary?
Assume that there is no measurement
error. Under normal manufacturing conditions, D is normally
distributed with a mean of 0 and a standard deviation of
1. If the critical value is 1.645, the probability that the
difference is beyond this value (that she will check the
machine), given that the process is in control, is:

$$P(|D| > 1.645 \mid \mu = 0) = 2\left[1 - \Phi(1.645)\right] = 0.1$$

where Φ is the standard normal cumulative distribution function.
So, the probability that she is
going to check the machine but find that the process is, in fact,
in control is 0.1. This probability is the Type I error, which may
also be called the false alarm rate, α error, producer’s risk, etc.
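The false alarm probability can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `type_i_error` is introduced here for illustration only. It uses 1.645, the two-sided 10% point of the standard normal distribution, as the critical value.

```python
from statistics import NormalDist

def type_i_error(critical_value, sigma=1.0):
    """Two-sided false alarm probability P(|D| > c) for D ~ N(0, sigma^2)."""
    in_control = NormalDist(mu=0.0, sigma=sigma)
    # Probability mass in both tails beyond +/- critical_value.
    return 2.0 * (1.0 - in_control.cdf(critical_value))

alpha = type_i_error(1.645)
print(f"Type I error: {alpha:.3f}")
```

The result should be very close to 0.1, the false alarm rate discussed above.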
The engineer realizes that the
probability of 10% is too high because checking the
manufacturing process is not an easy task and is costly. She
wants to reduce this number to 1% by adjusting the critical
value. The new critical value c is calculated from:

$$P(|D| > c \mid \mu = 0) = 2\left[1 - \Phi(c)\right] = 0.01$$

Using the inverse of the standard normal distribution
function, the new critical value is
c = Φ^{-1}(0.995) = 2.576. From the above
equation, it can be seen that the larger the critical value, the
smaller the Type I error.
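The inverse normal calculation can be sketched as follows, again with the standard-library `statistics.NormalDist`. The helper name `two_sided_critical_value` is an illustrative assumption, not something taken from the article or from any particular software package.

```python
from statistics import NormalDist

def two_sided_critical_value(alpha, sigma=1.0):
    """Critical value c such that P(|D| > c) = alpha for D ~ N(0, sigma^2)."""
    # Each tail holds alpha / 2, so c is the (1 - alpha / 2) quantile.
    return NormalDist(mu=0.0, sigma=sigma).inv_cdf(1.0 - alpha / 2.0)

for alpha in (0.10, 0.05, 0.01):
    c = two_sided_critical_value(alpha)
    print(f"alpha = {alpha:.2f} -> critical value = {c:.3f}")
```

Lowering the allowed Type I error pushes the critical value outward, which is the relationship the paragraph above describes.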
By raising the critical value, the engineer reduces the Type I
error. However, she now faces a new issue. Some
customers complain that the diameters of their shafts are too big.
Instead of having a mean value of 10, they have a mean value of
12, which means that the engineer did not detect the mean shift
and needs to adjust the manufacturing process back to its normal
state.
What is the probability of
failing to detect the mean shift under the current critical
value, given that the process is indeed out of control? In this
case, the mean of the diameter has shifted. The answer for this
question is found by examining the Type II error.
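The engineer asks a
statistician for help. The statistician uses the following
equation to calculate the Type II error:

$$\beta = P(|D| \le c \mid \mu = 2) = \Phi\left(\frac{c - \mu}{\sigma}\right) - \Phi\left(\frac{-c - \mu}{\sigma}\right) = \Phi(0.576) - \Phi(-4.576) \approx 0.7176$$

Here, c = 2.576 is the critical value, μ
is the mean of the difference between the measured and nominal
shaft diameters, σ is the standard deviation, and Φ is the
standard normal cumulative distribution function. The mean value
of the diameter shifting to 12 is the same as the mean of the
difference changing to 2. From the above equation, we can see
that the larger the critical value, the larger the Type II error.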
The result tells us that there
is a 71.76% probability that the engineer cannot detect the
shift if the mean of the diameter has shifted to 12. This is the
reason why oversized shafts have been sent to the customers, causing
them to complain.
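The missed-detection probability can be reproduced with a short sketch. The function name `type_ii_error` is hypothetical and chosen for illustration. It evaluates the probability that the difference stays inside the critical limits when its mean has shifted to 2.

```python
from statistics import NormalDist

def type_ii_error(critical_value, shifted_mean, sigma=1.0):
    """P(|D| <= c) when D ~ N(shifted_mean, sigma^2): the shift goes undetected."""
    shifted = NormalDist(mu=shifted_mean, sigma=sigma)
    return shifted.cdf(critical_value) - shifted.cdf(-critical_value)

beta = type_ii_error(critical_value=2.576, shifted_mean=2.0)
print(f"Type II error: {beta:.4f}")
print(f"Power:         {1.0 - beta:.4f}")
```

The Type II error comes out at roughly 0.72, in line with the figure quoted above, so the power to detect this shift with a single measurement is only about 28%.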
It seems that the engineer must find
a balance point to reduce both Type I and Type II errors. If she
increases the critical value to reduce the Type I error, the
Type II error will increase. If she reduces the critical value
to reduce the Type II error, the Type I error will increase. The
engineer asks the statistician for additional help.
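The trade-off can be made concrete by tabulating both errors over a range of critical values. This is a sketch under the assumptions of the example (sample size 1, standard deviation 1, shifted mean of 2); the list of critical values is arbitrary and chosen only for illustration.

```python
from statistics import NormalDist

IN_CONTROL = NormalDist(mu=0.0, sigma=1.0)  # difference D when the process is normal
SHIFTED = NormalDist(mu=2.0, sigma=1.0)     # difference D after the mean shifts to 2

def error_rates(critical_value):
    """Return (Type I error, Type II error) for a two-sided critical value."""
    alpha = 2.0 * (1.0 - IN_CONTROL.cdf(critical_value))
    beta = SHIFTED.cdf(critical_value) - SHIFTED.cdf(-critical_value)
    return alpha, beta

print("critical value   Type I   Type II")
for c in (1.0, 1.645, 2.0, 2.576, 3.0):
    alpha, beta = error_rates(c)
    print(f"{c:14.3f}   {alpha:6.4f}   {beta:7.4f}")
```

Reading down the table, the Type I column falls while the Type II column rises, so no choice of critical value improves both at a fixed sample size.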
The statistician notices that
the engineer makes her decision on whether the process needs to be
checked after each measurement. This means the sample size for
decision making is 1. The statistician suggests grouping a
certain number of measurements together and making the decision
based on the mean value of each group. By increasing the sample
size of each group, both Type I and Type II errors will be
reduced. However, a large sample size will delay the detection
of a mean shift. The smallest sample size that can meet both
Type I and Type II error requirements should be determined. The
engineer provides her requirements to the statistician.
The engineer wants:
- The Type I error to be
0.01.
- The Type II error to be
less than 0.1 if the mean value of the diameter shifts from
10 to 12 (i.e. if the difference shifts from
0 to 2).
Tables and curves for
determining sample size are given in many books. These curves
are called Operating Characteristic (OC) Curves. From the
OC curves of Appendix A in reference [1],
the statistician finds that the smallest sample size that meets the
engineer’s requirement is 4.
This sample size can also be
calculated numerically by hand. Assume the sample size is n
in each group. Under the normal manufacturing process, the mean
and the standard deviation of the group mean of the deviation
(difference between measurement and nominal value), denoted
\bar{D}, are 0 and σ/√n = 1/√n, respectively. Based on the Type I
error requirement, the critical value c for the group mean can be
calculated by the following equation:

$$P(|\bar{D}| > c \mid \mu = 0) = 2\left[1 - \Phi\left(\frac{c}{\sigma/\sqrt{n}}\right)\right] = 0.01$$

Under the abnormal
manufacturing condition (assume the mean of the deviation shifts
to 2 and the standard deviation is unchanged), the Type II error
is:

$$\beta = P(|\bar{D}| \le c \mid \mu = 2) = \Phi\left(\frac{c - 2}{\sigma/\sqrt{n}}\right) - \Phi\left(\frac{-c - 2}{\sigma/\sqrt{n}}\right) < 0.1$$

The smallest integer n that
satisfies the above two conditions is the required sample size.
Let’s set n = 3 first. The critical value is 1.4872 when the
sample size is 3. Using this critical value, we get a Type II
error of 0.1872, which is greater than the required 0.1. So we
increase n to 4. The critical value becomes 1.2879. The
corresponding Type II error is 0.0772, which is less than the
required 0.1. Therefore the final sample size is 4.
By using the mean value of
every 4 measurements, the engineer can control the Type II error
at 0.0772 and keep the Type I error at 0.01.
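The hand calculation can be automated with a simple search over n. The sketch below assumes the same setting as the example (two-sided test on the group mean, standard deviation 1, shift of 2). The function name `smallest_sample_size` and the `n_max` cap are illustrative choices, not part of the original method description.

```python
from math import sqrt
from statistics import NormalDist

def smallest_sample_size(alpha, beta_max, shift, sigma=1.0, n_max=1000):
    """Return (n, critical value, beta) for the smallest group size
    whose Type II error is below beta_max at the given Type I error."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    for n in range(1, n_max + 1):
        se = sigma / sqrt(n)   # standard deviation of the group mean
        c = z * se             # critical value for the group mean
        shifted = NormalDist(mu=shift, sigma=se)
        beta = shifted.cdf(c) - shifted.cdf(-c)
        if beta < beta_max:
            return n, c, beta
    raise ValueError("no sample size up to n_max meets the requirements")

n, c, beta = smallest_sample_size(alpha=0.01, beta_max=0.1, shift=2.0)
print(f"n = {n}, critical value = {c:.4f}, Type II error = {beta:.4f}")
```

For these inputs the search stops at n = 4 with a critical value near 1.2879 and a Type II error near 0.0772, matching the values worked out above.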
Sometimes, engineers are interested
only in one-sided changes of their products or processes.
For example, consider the case where the engineer in the
previous example cares only whether the diameter is becoming
larger. The hypothesis test becomes:

$$H_0: \mu = 0 \quad \text{versus} \quad H_1: \mu > 0$$

where μ is the mean of the difference D.
Assume the sample size is 1 and
the Type I error is set to 0.05. The critical value will be
1.645. For detecting a shift of the mean of the difference to a
given value, the corresponding Type II error is
β = P(D ≤ 1.645), evaluated with D normally distributed about
the shifted mean.
Readers can calculate these values in Excel or in
Weibull++. The relation between the Type I and Type II
errors is illustrated in Figure 1:

Figure 1: Illustration of Type I and Type II Errors
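The one-sided case can also be computed with a few lines of Python instead of Excel or Weibull++. In this sketch, the shift of 2.0 is an assumption carried over from the two-sided example purely for illustration, and `one_sided_errors` is a hypothetical helper name.

```python
from statistics import NormalDist

def one_sided_errors(alpha, shift, sigma=1.0):
    """Return (critical value, Type II error) for an upper one-sided test."""
    in_control = NormalDist(mu=0.0, sigma=sigma)
    # All of alpha sits in the upper tail for a one-sided test.
    c = in_control.inv_cdf(1.0 - alpha)
    # The shift is missed when the measurement stays at or below c.
    beta = NormalDist(mu=shift, sigma=sigma).cdf(c)
    return c, beta

# shift=2.0 is an illustrative assumption, not a value taken from the article.
c, beta = one_sided_errors(alpha=0.05, shift=2.0)
print(f"critical value = {c:.3f}, Type II error = {beta:.3f}")
```

Because the whole 5% is placed in one tail, the critical value is the same 1.645 that gives a 10% Type I error in the two-sided case.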
Example 2 - Application in Reliability
Engineering
Type I and Type II errors are also applied in reliability
engineering. For example, in a reliability demonstration test,
engineers usually choose sample size according to the Type II
error.
A reliability engineer needs to demonstrate that the reliability of a
product at a given time is higher than 0.9 at an 80% confidence
level. She decides to perform a zero failure test. How many
samples does she need to test in order to demonstrate the
reliability with this test requirement?
The above problem can be
expressed as a hypothesis test on the product reliability R,
where R_L = 0.9
is the lower bound of the reliability to be demonstrated. The
engineer must determine the minimum sample size such that the
probability of observing zero failures, given that the product
reliability is no higher than 0.9, is less than 20%. In other
words, the sample size is determined by controlling the Type II
error.
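The probability that no more than f failures occur during a
pass-fail test of n units, each with reliability R, is described
by the cumulative binomial equation [2]:

$$1 - CL = \sum_{i=0}^{f} \binom{n}{i} (1 - R)^{i} R^{n-i}$$

where CL is the confidence level. The smallest integer n for
which the right-hand side does not exceed 1 - CL
is the sample size. For a zero failure test (f = 0) with R = 0.9
and CL = 0.8, the condition reduces to 0.9^n ≤ 0.2.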

Figure 2: Determining Sample
Size for Reliability Demonstration Testing
Figure 2 shows the Design of
Reliability Tests (DRT) tool in Weibull++, which demonstrates
that the reliability is at least as high as the number entered
in the required inputs. From this analysis, we can see that the
engineer needs to test 16 samples.
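The sample size of 16 can be cross-checked directly from the cumulative binomial relation. The sketch below is a plain search written for illustration. It is not the calculation performed inside the Weibull++ DRT tool, and the names `prob_pass` and `demo_sample_size` are assumptions made here.

```python
from math import comb

def prob_pass(n, reliability, max_failures=0):
    """Probability of at most max_failures failures among n tested units."""
    q = 1.0 - reliability
    return sum(comb(n, i) * q**i * reliability**(n - i)
               for i in range(max_failures + 1))

def demo_sample_size(reliability, confidence, max_failures=0, n_max=10000):
    """Smallest n for which passing at the stated reliability has
    probability no greater than 1 - confidence."""
    for n in range(max_failures + 1, n_max + 1):
        if prob_pass(n, reliability, max_failures) <= 1.0 - confidence:
            return n
    raise ValueError("no sample size up to n_max meets the requirement")

n = demo_sample_size(reliability=0.9, confidence=0.8)
print(f"required sample size: {n}")
print(f"P(zero failures | R = 0.9, n = {n}) = {prob_pass(n, 0.9):.4f}")
```

With 15 units the probability of zero failures at a reliability of 0.9 is still slightly above 20%, which is why the search lands on 16.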
One might wonder what the Type
I error would be if 16 samples were tested with a 0 failure
requirement. In order to know this, the reliability value of
this product should be known.
Assume the engineer knows without doubt that the product
reliability is 0.95. What is the Type I error if she uses the
test plan given above? In other words, given a sample
size of 16 units, each with a reliability of 95%, how often will
one or more failures occur?
Using a sample size of 16 and
the critical failure number of 0, the Type I error can be
calculated as:

$$\alpha = 1 - R^{n} = 1 - 0.95^{16} = 0.5599$$

Therefore, if the true
reliability is 0.95, the probability of not passing the
demonstration test is 0.5599. In this case, the test plan is
too strict and the producer might want to adjust the number of
units to test to reduce the Type I error.
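The producer's risk of a zero failure plan can be evaluated for any assumed true reliability. The following sketch uses the 16-unit plan from the example. The function name `prob_fail_demo` and the list of reliabilities other than 0.95 are illustrative assumptions.

```python
def prob_fail_demo(n, true_reliability):
    """P(at least one failure among n units) for a zero failure test plan."""
    return 1.0 - true_reliability ** n

alpha = prob_fail_demo(16, 0.95)
print(f"Type I error at R = 0.95: {alpha:.4f}")

# How the risk of failing the test changes with the true reliability.
for r in (0.95, 0.98, 0.99, 0.999):
    print(f"R = {r:.3f} -> P(fail test) = {prob_fail_demo(16, r):.4f}")
```

Even a product whose reliability is well above the 0.9 target fails this plan fairly often, which illustrates why a zero failure test with many units is demanding for the producer.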
Conclusion
In this article, we discussed Type I and Type II errors and
their applications. It can be seen that a Type II error is very
useful in sample size determination. In fact, power and sample
size are important topics in statistics and are used widely in
our daily lives. For example, these concepts can help a
pharmaceutical company determine how
many samples are necessary in order to prove that a medicine is useful
at a given confidence level. So the next time a doctor tells you
the effectiveness of a medicine is 99.9%, it is wise to ask
how many patients were treated in the experiment and what the
confidence level was.
References
[1] D. C. Montgomery and G. C. Runger, Applied Statistics and
Probability for Engineers, 2nd Edition. John Wiley & Sons,
New York, 1999.
[2] D. Kececioglu, Reliability & Life Testing Handbook,
Volume 2. Prentice-Hall, New Jersey, 1994.