Reliability HotWire

Issue 30, August 2003

Hot Topics

# Analyzing Accelerated Test Data with Unrelated Failures

When running an accelerated test, it is often assumed that the test equipment will operate properly whenever it is needed. In reality, the test equipment is likely to fail at some point (just like the units being tested) and may need to be repaired or replaced. But what happens if the test equipment fails during the test? How should the units that were unable to complete the test be treated in the analysis? One possible solution is simply to remove those units from the analysis, since they did not complete the test. However, valuable information may be lost by excluding them. If the units are included in the analysis, should they be considered failures or suspensions? The answers to these questions play an important role in the analysis of accelerated test data.

Example

An electronic component was redesigned and tested to failure at three different temperatures. Six units were tested at each stress level. At the 406K stress level, however, one unit was removed from the test after 0.3 hours because of a test equipment failure, which also caused the component to fail. A warranty time of one year is to be given, with an expected return of 10% of the population. The times-to-failure data set is shown in the following table; the unrelated failure is shown in parentheses.

Times-to-failure (hrs)

| 406K | 416K | 426K |
|---|---|---|
| (0.3) | 164 | 92 |
| 248 | 176 | 105 |
| 456 | 289 | 155 |
| 528 | 319 | 184 |
| 731 | 340 | 219 |
| 813 | 543 | 235 |

The operating temperature is 356K. Using the Arrhenius-Weibull model, the objective is to determine the following:

1. Should the first failure at 406K be included in the analysis?
2. Determine the warranty time for 90% reliability.
3. Determine the 90% lower confidence limit on the warranty time.
4. Is the warranty requirement met? If not, what steps should be taken?
5. Repeat the analysis with the unrelated failure included. Is there any difference?
6. If the unrelated failure had occurred at 500 hr, should it be included in the analysis?

Solution

1. Since the failure occurred at the very beginning of the test, and for an unrelated reason, it can be omitted from the analysis. If it is included, it should be treated as a suspension and not as a failure.

2. The first failure at 406K was neglected and the data were analyzed using ReliaSoft's ALTA software. The following parameters were obtained:

β = 2.9658

B = 10679.57

C = 2.39662E-9

The use level probability plot (at 356K) can then be obtained. The warranty time for a reliability of 90% (or an unreliability of 10%) can be estimated from the plot, as shown next.

This estimate can also be obtained from the Arrhenius plot (a life vs. stress plot). The 10th percentile (time for a reliability of 90%) is plotted versus stress. This type of plot is useful because a time for a given reliability can be determined for different stress levels.

A more accurate way of determining the warranty time would be to use ALTA's Quick Calculation Pad (QCP). By selecting the Warranty (Time) Information option from the Basic Calculations tab in the QCP and entering 356 for the temperature and 90 for the required reliability, a warranty time of 11,977.793 hr can be determined, as shown next.
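The QCP result can be cross-checked by hand from the fitted parameters. The following is a minimal sketch, assuming the usual Arrhenius-Weibull form in which the Weibull scale parameter is η(V) = C·exp(B/V) and the reliability is R(t) = exp(-(t/η)^β). The function name `arrhenius_weibull_time` is hypothetical and is not part of ALTA.

```python
import math

def arrhenius_weibull_time(reliability, temp_k, beta, B, C):
    """Time at which reliability drops to the given value at temp_k (kelvin)."""
    # Assumed Arrhenius life-stress relationship: Weibull scale parameter at temp_k.
    eta = C * math.exp(B / temp_k)
    # Invert the Weibull reliability function R(t) = exp(-(t / eta) ** beta).
    return eta * (-math.log(reliability)) ** (1.0 / beta)

# Parameters reported in the article (unrelated failure excluded).
warranty_hr = arrhenius_weibull_time(0.90, 356.0, beta=2.9658, B=10679.57, C=2.39662e-9)
print(f"Warranty time at 356K for R = 90%: {warranty_hr:.1f} hr")
```

Because the published parameters are rounded, the printed value should agree with the QCP figure only approximately, to within a few hours at most.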

3. The warranty time for a 90% reliability was estimated to be approximately 12,000 hr, which is above the 1-year (8,760 hr) requirement. However, this is an estimate at the 50% confidence level. In other words, 50% of the time, life will be greater than 12,000 hr and 50% of the time life will be less. A known confidence level is therefore crucial before any decisions are made. Using ALTA, confidence bounds can be plotted on both probability and Arrhenius plots. In the following use level probability plot, the 90% lower confidence level (LCL) is plotted. Note that percentile bounds are type 1 confidence bounds in ALTA.

An estimated 4,300 hr warranty time at a 90% lower confidence level was obtained from the use level probability plot. This means that 90% of the time, life will be greater than this value. In other words, a life of 4,300 hr is a bounding value for the warranty. The Arrhenius plot with the 90% lower confidence level is shown next.

Using the QCP and specifying a 90% lower confidence level, a warranty time of 4,436.5 hr is estimated.

4. The warranty time for this component is estimated to be 4,436.5 hr at a 90% lower confidence bound. This is roughly 6 months, much less than the required 1-year (8,760 hr) warranty time. Thus, the desired warranty is not met. In this case, the following four options are available:
• Redesign
• Reduce the confidence level
• Change the warranty policy
• Test additional units at stress levels closer to the use level

5. Including the unrelated failure of 0.3 hr at 406K (by treating it as a suspension at that time), the following results are obtained:

β = 2.9658

B = 10679.57

C = 2.39662E-9

To the digits shown, these results are identical to the ones with the unrelated failure excluded. A small difference can be seen only if more significant digits are considered. The warranty time (T) and its 90% lower 1-sided confidence bound (TL) were estimated to be:

T = 11,977.729 hr

TL = 4,436.46 hr

Again, the difference is negligible. This is due to the very early time at which this unrelated failure occurred.
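One way to see why the timing of the suspension matters is to look at what a suspended unit adds to the log-likelihood, assuming the parameters are fitted by maximum likelihood. A unit suspended at time t contributes ln R(t) = -(t/η)^β. The sketch below evaluates that term at 406K for suspensions at 0.3 hr and at 500 hr, using the parameters reported above and the assumed Arrhenius form η(V) = C·exp(B/V). The function names are hypothetical.

```python
import math

# Parameters reported in the article for the Arrhenius-Weibull fit.
BETA, B, C = 2.9658, 10679.57, 2.39662e-9

def eta(temp_k):
    """Weibull scale parameter at temp_k under the assumed Arrhenius relationship."""
    return C * math.exp(B / temp_k)

def suspension_log_likelihood(time_hr, temp_k):
    """ln R(t): the term a unit suspended at time_hr adds to the log-likelihood."""
    return -((time_hr / eta(temp_k)) ** BETA)

early = suspension_log_likelihood(0.3, 406.0)
late = suspension_log_likelihood(500.0, 406.0)

print(f"eta at 406K:          {eta(406.0):.0f} hr")
print(f"suspension at 0.3 hr: {early:.2e}")
print(f"suspension at 500 hr: {late:.2f}")
```

With a scale parameter of several hundred hours at 406K, the 0.3 hr suspension contributes a term that is effectively zero, so it cannot move the estimates. A suspension at 500 hr contributes a term of order 0.5 in magnitude, which is large enough to shift the fitted parameters.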

6. The analysis is repeated treating the unrelated failure at 500 hr as a suspension, with the following results:

β = 3.0227

B = 10959.52

C = 1.23808E-9

In this case, the results are noticeably different. The warranty time (T) and its 90% lower 1-sided confidence bound (TL) are estimated to be:

T = 13780.208 hr

TL = 5303.67 hr

In this case, it would be a mistake to neglect the unrelated failure. Doing so would underestimate the warranty time (about 11,978 hr instead of about 13,780 hr). The important observation in this example is that every piece of life information is crucial. In other words, unrelated failures also provide information about the life of the product. An unrelated failure occurring at 500 hr indicates that the product survived for that period of time under the particular stress level, so ignoring it would be a mistake. On the other hand, it would also be a mistake to treat this data point as a failure, since the failure was caused by the test equipment and not by the component itself.