Analyzing Accelerated Test Data with Unrelated Failures
When running an accelerated test, it may be assumed that the test equipment is going to
operate properly when it is requested to do so. In reality, though, the
test equipment will more than likely fail over a certain period of time
(just like the units being tested) and may need to be repaired or
replaced. But what happens if the test equipment actually fails during
the test? What do you do then? How do you treat the units that were
unable to complete the test in the analysis? One possible solution would
be to simply remove the units from the analysis since they did not
actually complete the test. However, valuable information may be lost by
not including these units. If you do include the units in the analysis,
should they be considered failures or suspensions? How you answer these
questions will play an important role in your analysis of accelerated
test data.
Example
An electronic component was redesigned and was tested to failure at
three different temperatures. Six units were tested at each stress
level. At the 406K stress level, however, one unit was removed from the
test after 0.3 hours because the test equipment failed, which in turn
caused the component to fail. A warranty time of one year is to be given,
with an expected return of 10% of the population. The times-to-failure
data set is shown in the following table.
Times-to-failure (hrs)
406K | 416K | 426K
(0.3) | 164 | 92
248 | 176 | 105
456 | 289 | 155
528 | 319 | 184
731 | 340 | 219
813 | 543 | 235
The operating
temperature is 356K. Using the Arrhenius-Weibull model, the objective is
to determine the following:
- Should the first failure at 406K be included in the analysis?
- Determine the warranty time for 90% reliability.
- Determine the 90% lower confidence limit on the warranty time.
- Is the warranty requirement met? If not, what steps should be taken?
- Repeat the analysis with the unrelated failure included. Is there any difference?
- If the unrelated failure had occurred at 500 hr, should it be included in the analysis?
Solution
- Since the failure occurred at the very beginning of the test, and
for an unrelated reason, it can be omitted from the analysis. If it
is included, it should be treated as a suspension and not as a
failure.
- The first failure at 406K was neglected and the data were analyzed using
ReliaSoft's ALTA software. The following parameters were obtained:
β = 2.9658
B = 10679.57
C = 2.39662E-9
The use level probability plot (at 356K) can then be obtained. The
warranty time for a reliability of 90% (or an unreliability of 10%) can
be estimated from the plot, as shown next.

This estimate can also be obtained from the Arrhenius plot (a life vs.
stress plot). The 10th percentile (time for a reliability of 90%) is
plotted versus stress. This type of plot is useful because a time for a
given reliability can be determined for different stress levels.

A more accurate way of determining the warranty time would be to use
ALTA's Quick Calculation Pad (QCP). By selecting the Warranty (Time)
Information option from the Basic Calculations tab in the QCP and
entering 356 for the temperature and 90 for the required reliability, a
warranty time of 11,977.793 hr can be determined, as shown next.
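The QCP result can be cross-checked by hand from the fitted parameters. The sketch below assumes the usual Arrhenius-Weibull parameterization, in which the Weibull scale parameter at temperature V is η(V) = C·exp(B/V) and the time for reliability R is η·(−ln R)^(1/β). The article does not write the model out, so this form is an assumption, and the function name is illustrative only.

```python
import math

def arrhenius_weibull_time(reliability, temp_k, beta, b, c):
    """Time (hr) at which reliability drops to `reliability` at temp_k.

    Assumes eta(V) = c * exp(b / V) as the Weibull scale parameter.
    """
    eta = c * math.exp(b / temp_k)  # Arrhenius life-stress relation
    return eta * (-math.log(reliability)) ** (1.0 / beta)

# Parameters reported above (first failure at 406K omitted)
BETA, B, C = 2.9658, 10679.57, 2.39662e-9

# Warranty time for 90% reliability at the 356K use temperature
warranty_hr = arrhenius_weibull_time(0.90, 356.0, BETA, B, C)
print(f"{warranty_hr:.0f} hr")
```

Under that assumption the computed value comes out at roughly 12,000 hr, in line with the QCP figure. Calling the same function with 406, 416 or 426 for the temperature gives the points that the Arrhenius plot draws for the 10th percentile.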

- The warranty time for a 90% reliability was estimated to be
approximately 12,000 hr. This is above the 1 year (8,760 hr)
requirement. However, this is an estimate at the 50% confidence
level. In other words, the true warranty time is as likely to be
below 12,000 hr as above it. A known confidence
level is therefore crucial before any decisions are made. Using
ALTA, confidence bounds can be plotted on both probability and
Arrhenius plots. In the following use level probability plot, the
90% lower confidence limit (LCL) is plotted. Note that percentile
bounds are type 1 confidence bounds in ALTA.

An estimated 4,300 hr
warranty time at a 90% lower confidence level was obtained from the use
level probability plot. This means that 90% of the time, life will be
greater than this value. In other words, a life of 4,300 hr is a
bounding value for the warranty. The Arrhenius plot with the 90% lower
confidence level is shown next.

Using the QCP and specifying a 90% lower confidence level, a warranty
time of 4,436.5 hr is estimated.

- The warranty time for this component is estimated to be 4,436.5 hr at a 90% lower
confidence bound. This is about six months, much less than the required
1 year (8,760 hr) warranty time. Thus, the desired warranty is not met. In
this case, the following four options are available:
- Redesign
- Reduce the confidence level
- Change the warranty policy
- Test additional units at stress levels closer to the use level
- Including the unrelated failure of 0.3 hr at 406K (by treating it as a suspension at
that time), the following results are obtained:
β = 2.9658
B = 10679.57
C = 2.39662E-9
These results are identical to the ones with the unrelated failure excluded. A small
difference can be seen only if more significant digits are considered.
The warranty time with the 90% lower 1-sided confidence bound was
estimated to be:
T = 11,977.729 hr
TL = 4436.46 hr
Again, the difference
is negligible. This is due to the very early time at which this
unrelated failure occurred.
- The analysis is repeated treating the unrelated failure at 500 hr as a suspension, with
the following results:
β = 3.0227
B = 10959.52
C = 1.23808E-9
In this case, the results are very different. The warranty time with the 90% lower 1-sided
confidence bound is estimated to be:
T = 13780.208 hr
TL = 5303.67 hr
It can be seen that
in this case, it would be a mistake to neglect the unrelated failure. In
doing this, we would actually underestimate the warranty time. The
important observation in this example is that every piece of life
information is crucial. In other words, unrelated failures also provide
information about the life of the product. An unrelated failure
occurring at 500 hr indicates that the product has survived for that
period of time under the particular stress level, so ignoring it would
be a mistake. On the other hand, it would also be a mistake to treat
this data point as a failure, since the failure was caused by the failure
of the test equipment.
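One way to see why the 0.3 hr suspension changes nothing while a 500 hr suspension matters is to look at what each contributes to the log-likelihood. In maximum likelihood estimation with right-censored data, a suspended unit contributes ln R(t), which for a Weibull distribution is −(t/η)^β. The sketch below evaluates that term at 406K using the parameters of the fit that omitted the unit. It is only an indication of how much weight the data point carries, not a refit, and it again assumes η(V) = C·exp(B/V), a form the article does not state explicitly. The function name is illustrative.

```python
import math

def suspension_log_likelihood(time_hr, temp_k, beta, b, c):
    """Log-likelihood term of a suspended unit: ln R(t) = -(t / eta)^beta.

    Assumes eta(V) = c * exp(b / V) as the Weibull scale parameter.
    """
    eta = c * math.exp(b / temp_k)
    return -((time_hr / eta) ** beta)

# Parameters from the fit that omitted the unrelated failure
BETA, B, C = 2.9658, 10679.57, 2.39662e-9

early = suspension_log_likelihood(0.3, 406.0, BETA, B, C)    # removed at 0.3 hr
late = suspension_log_likelihood(500.0, 406.0, BETA, B, C)   # removed at 500 hr
print(f"0.3 hr: {early:.3e}   500 hr: {late:.3f}")
```

The 0.3 hr term is indistinguishable from zero, so the fit cannot tell whether the unit is included or not. The 500 hr term is many orders of magnitude larger in size, which is consistent with the shift in the estimated parameters reported above.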