Reliability HotWire: eMagazine for the Reliability Professional

Issue 5, July 2001

Hot Topics

Analyzing Sudden-Death Testing Data

[Editor's Note: This article has been updated since its original publication to reflect a more recent version of the software interface.]

Sudden-death testing has been in popular use for a number of decades. One of its appeals is reduced test time, with claims of test time reductions of up to 75% having been made. Naturally, this strongly attracts program managers and test engineers, who are always on the lookout for ways to cut test time and costs. On a cautionary note, one should keep in mind that there is no way to reduce the amount of testing without reducing the precision of the analysis results. Sudden-death testing is no exception: while this methodology allows for reduced test times, it is not a "magic bullet" that will miraculously produce highly precise reliability estimates in a fraction of the typical test time. In this article, we will look at the "classical" method of analyzing data from sudden-death tests, as well as a much simpler method of analysis using Weibull++.

Sudden-Death Testing
Sudden-death testing involves testing r identical groups of n units until the first failure in each group occurs. The number of units in each group must be the same. When a unit in a group fails, the rest of the units in that group are suspended, and another group is put on test. Some sources claim that this type of testing can be performed with as few as nine units, in three groups of three. While this seems to be a little low, we will not proffer any minimum sizes for groups or units, but will remind the prospective sudden-death testing practitioner that the more units that are tested to failure, the more precise the analysis results will be.

Once the testing for all of the groups has been completed, the failure times are plotted on a Weibull probability plot (or the probability plot for another life distribution). Note that only the times-to-failure of the weakest unit in each group are considered at this stage in the analysis; the suspended units do not come into play here. The failure times are plotted and a "sudden-death line" is drawn through the points, as if they were a set of complete data. This sudden-death line can be said to represent the population of first failures in groups of size n. Another line representing the entire population is drawn parallel to the sudden-death line, with the distance of separation determined by the median rank of the first failure and the number of units in each group. This will be illustrated in the following example.

Classical Analysis Example
In this example, suppose that we have 40 units with which to perform a reliability test. Seeking to reduce test time and expense, we decide to perform a sudden-death test, and divide the test units into eight groups of five units each (r = 8, n = 5). We put the first group on test and run all five units until one of the units fails at 120 hours. The test is immediately halted for that group, and another group of five is tested until a failure occurs, and so on until all eight groups have been tested in this manner. The following table shows the results of the test.

Group Number   Failed Unit   Failure Time (hours)
1              Unit #2       120
2              Unit #5       200
3              Unit #2       185
4              Unit #3        55
5              Unit #4       265
6              Unit #4        90
7              Unit #2       300
8              Unit #1       155

The following plot shows the first-failure data arranged and analyzed as a two-parameter Weibull data set, or sudden-death line.
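As a numerical stand-in for the plot, the sudden-death line can be approximated with median rank regression. The sketch below is not the article's procedure or Weibull++ code: it assumes Benard's approximation for the median ranks and regression on X, and the function name `fit_weibull_rrx` is made up for illustration.

```python
import math

# First-failure times (hours) for groups 1 through 8, from the table above.
first_failures = [120, 200, 185, 55, 265, 90, 300, 155]


def fit_weibull_rrx(times):
    """Fit a two-parameter Weibull to complete data by rank regression on X.

    Assumption: median ranks use Benard's approximation (i - 0.3) / (N + 0.4).
    Returns (beta, eta).
    """
    ordered = sorted(times)
    n = len(ordered)
    # Linearized Weibull CDF: ln(t) = ln(eta) + (1 / beta) * ln(-ln(1 - F))
    x = [math.log(t) for t in ordered]
    y = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4)))
         for i in range(1, n + 1)]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_yy = sum((b - y_bar) ** 2 for b in y)
    slope = s_xy / s_yy                # regression of x on y, equals 1 / beta
    intercept = x_bar - slope * y_bar  # equals ln(eta)
    return 1.0 / slope, math.exp(intercept)


beta_sd, eta_sd = fit_weibull_rrx(first_failures)
print(f"sudden-death line: beta = {beta_sd:.2f}, eta = {eta_sd:.0f} hours")
```

The eta printed here belongs to the sudden-death line only, that is, to the distribution of first failures in groups of five, not to the whole population.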

The next step in the analysis involves placing a line on the plot that represents the entire population. This line will be parallel to the sudden-death line. The points from which the sudden-death line is obtained are clustered around a median value that corresponds to MR1/n% of the population, where MR1/n is the median rank of the first failure in a sample of size n. In this example MR1/5 = 12.95%, so the sudden-death line represents the distribution of the life by which 12.95% of the population has failed, rather than the life of the entire population.
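The median rank of the first failure has a simple closed form, so it can be checked without tables. The snippet below is a minimal sketch; the function name `median_rank_first` is an illustrative choice, not something from the article or from Weibull++.

```python
def median_rank_first(n):
    """Exact median rank of the first failure in a sample of n units.

    The smallest of n uniform variates has CDF 1 - (1 - p)**n; setting
    that equal to 0.5 and solving for p gives the expression below.
    """
    return 1.0 - 0.5 ** (1.0 / n)


mr_1_5 = median_rank_first(5)
print(f"MR1/5 = {mr_1_5:.4f}")
```

For n = 5 this evaluates to roughly 0.1294, which agrees with the 12.95% quoted above to within rounding.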

In order to derive the total population line from the sudden-death line, we must equate the median (50% value) of the sudden-death line with the MR1/n% value of the total population line. This is done on the plot by drawing a horizontal line from the 50% unreliability value on the y-axis until it intersects the sudden-death line. A vertical line is then drawn down from this point. In this example, MR1/n% = MR1/5 = 12.95%, so another horizontal line is drawn from the 12.95% unreliability point on the y-axis to the vertical line extending down from the sudden-death line. This process is illustrated in the following figure.

At this point, a line parallel to the sudden-death line is drawn through the intersection of the vertical line and the 12.95% line. This new line represents the entire population rather than the first failures in each small group. Since the Weibull slope (β) has already been determined from the sudden-death line, all that remains to be determined is the value of eta (η). This is found by locating the intersection of the total population line and the 63.2% unreliability value, as illustrated in the following figure.
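The graphical shift described above can also be written as a short calculation. The sketch below is only an illustration of the geometry: the function name `population_eta` is invented here, and the sudden-death eta of 197 hours is an assumed value chosen for the example, since the article does not report it.

```python
import math


def population_eta(beta, eta_sd, n):
    """Eta of the population line obtained by shifting the sudden-death line.

    beta   : Weibull slope shared by both lines
    eta_sd : eta of the sudden-death (first-failure) line
    n      : number of units in each group
    """
    # Time at which the sudden-death line crosses 50% unreliability.
    t_median = eta_sd * math.log(2.0) ** (1.0 / beta)
    # Median rank of the first failure in a sample of n.
    mr_first = 1.0 - 0.5 ** (1.0 / n)
    # A line with the same beta through the point (t_median, mr_first).
    return t_median / (-math.log(1.0 - mr_first)) ** (1.0 / beta)


beta = 1.94     # slope quoted in the article
eta_sd = 197.0  # ASSUMED for illustration; not stated in the article
eta_pop = population_eta(beta, eta_sd, n=5)
print(f"population eta = {eta_pop:.0f} hours")
```

With the exact median rank, -ln(1 - MR1/n) equals ln(2)/n, so the whole construction reduces to multiplying the sudden-death eta by n raised to the power 1/β.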

As can be determined from the plot, the value of eta is 457 hours. We already know the value of beta from the sudden-death plot, β = 1.94. From these parameter values, all subsequent reliability calculations can be made.
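To make "subsequent reliability calculations" concrete, the sketch below evaluates the standard two-parameter Weibull reliability function and a B-life using the parameters read from the plot. The 100-hour mission time and the choice of B10 are illustrative assumptions, not quantities taken from the article.

```python
import math

BETA = 1.94   # slope from the sudden-death line
ETA = 457.0   # eta (hours) read from the total population line


def reliability(t, beta=BETA, eta=ETA):
    """Two-parameter Weibull reliability: R(t) = exp(-(t / eta)**beta)."""
    return math.exp(-((t / eta) ** beta))


def b_life(p, beta=BETA, eta=ETA):
    """Time by which a fraction p of the population is expected to fail."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)


r_100 = reliability(100.0)  # illustrative 100-hour mission
b10 = b_life(0.10)          # illustrative B10 life
print(f"R(100 h) = {r_100:.3f}, B10 = {b10:.0f} hours")
```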

Simpler Analysis Method Using Weibull++
The previous example illustrates the classic method of analyzing sudden-death testing data. One will notice that the process is rather labor-intensive, and subject to the inaccuracies inherent in dealing with manual probability plotting to determine parameter estimates. The overall process was developed before the widespread use of computers, when reliability engineers had to rely on manual plots and tables to perform many of their life data calculations.

Fortunately, we no longer have to rely on such practices of the previous millennium, and can use the power of the computer and Weibull++ to reach the same results much more easily and quickly. This is accomplished by simply treating all of the sudden-death testing data as one group, rather than a number of subgroups. According to our testing plan, eight groups of five units each were run until the first failure occurred, and then the test was terminated for that group. This means that for each group, we have one failure and four suspensions, all with the same time values. For example, Group #1 has one failure at 120 hours and four suspensions at 120 hours. The following figure shows the data entered into a Weibull++ data folio.
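The pooled data set is easy to build programmatically. The sketch below expands each group into one failure and four suspensions at the same time; the (time, state) tuple layout is an assumption made for illustration and is not meant to mirror the Weibull++ data folio format.

```python
# First-failure time (hours) for each group, from the test results table.
first_failures = {1: 120, 2: 200, 3: 185, 4: 55, 5: 265, 6: 90, 7: 300, 8: 155}
UNITS_PER_GROUP = 5

# Each record is (time in hours, state), with "F" = failure, "S" = suspension.
records = []
for group, t in first_failures.items():
    records.append((t, "F"))
    # The surviving units in the group are suspended at the same time.
    records.extend((t, "S") for _ in range(UNITS_PER_GROUP - 1))

n_failures = sum(1 for _, state in records if state == "F")
n_suspensions = sum(1 for _, state in records if state == "S")
print(len(records), n_failures, n_suspensions)
```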

At this point, it is a simple matter to calculate the parameters. In order to closely duplicate the results obtained through manual plotting, rank regression on X (RRX) should be used to estimate the parameters. The following plot shows the results of the analysis.
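For readers without the software, a rank regression on X with suspensions can be sketched in a few lines. This is not Weibull++ code and may differ from it in details: it assumes the adjusted-rank (Johnson) method for handling suspensions, Benard's approximation for the median ranks, and that a failure is ordered ahead of suspensions recorded at the same time. The function name is invented for the example.

```python
import math

first_failures = [120, 200, 185, 55, 265, 90, 300, 155]  # hours, one per group
UNITS_PER_GROUP = 5

# Pooled data: (time, failed?) with one failure and four suspensions per group.
data = []
for t in first_failures:
    data.append((t, True))
    data.extend([(t, False)] * (UNITS_PER_GROUP - 1))


def fit_rrx_with_suspensions(records):
    """Two-parameter Weibull fit by rank regression on X with suspensions.

    Assumptions: Johnson adjusted ranks, Benard median ranks, and failures
    sorted before suspensions at tied times. Returns (beta, eta).
    """
    ordered = sorted(records, key=lambda r: (r[0], not r[1]))
    n = len(ordered)
    x, y = [], []
    rank = 0.0
    for idx, (t, failed) in enumerate(ordered):
        if not failed:
            continue
        remaining = n - idx  # units still on test, including this one
        rank += (n + 1 - rank) / (1 + remaining)  # adjusted rank
        f = (rank - 0.3) / (n + 0.4)              # Benard median rank
        x.append(math.log(t))
        y.append(math.log(-math.log(1.0 - f)))
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_yy = sum((b - y_bar) ** 2 for b in y)
    slope = s_xy / s_yy                # regression of x on y, equals 1 / beta
    intercept = x_bar - slope * y_bar  # equals ln(eta)
    return 1.0 / slope, math.exp(intercept)


beta, eta = fit_rrx_with_suspensions(data)
print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```

Regression on X is used here because the article recommends RRX for matching the manual plot; regressing on Y instead would give slightly different estimates.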

The automated analysis returns β = 1.95 and η = 444 hours. These values are very close to the results obtained from the tedious manual method (β = 1.94, η = 457 hours), and they are more accurate, since the use of the Weibull++ program removes the inaccuracies of manual plotting.

 

ReliaSoft Corporation

Copyright © 2001 ReliaSoft Corporation, ALL RIGHTS RESERVED