
Evaluation of Life Test Plans with Different Censoring Schemes
There are many options available when designing a test to elicit failure distribution parameters and associated metrics. For example, there are multiple censoring schemes that can be used, such as testing all units to failure or suspending the test after a specific time or number of failures. There are also practical constraints, such as the number of units available or the total test time available. This article uses the SimuMatic tool in Weibull++ to examine the tradeoff between total test time, the number of test units and the variability of an estimated metric for four different life test scenarios: two scenarios in which all units are tested to failure, one scenario in which the test is terminated after a specific time, and one scenario in which the test is terminated after a specific number of failures.
A reliability engineer wants to determine the B10 life of an existing product that has undergone a redesign. The previous version of the product followed a Weibull distribution with a shape parameter, β, of 2.5 and had a characteristic life, η, of 1,500 hours. Discussions with the design engineers suggest that the new product will last twice as long as the old one, so the reliability engineer estimates that the new product has a characteristic life of 3,000 hours. He knows he can obtain a maximum of 15 units to test. He decides to use the SimuMatic tool in Weibull++ to help design the life test.
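As a starting point, he calculates the expected B10 life from the Weibull reliability equation:
$$R(t) = e^{-(t/\eta)^{\beta}}$$
Solving for time with R = 0.90, β = 2.5 and η = 3,000 hours, he obtains:
$$t = \eta\,(-\ln R)^{1/\beta} = 3{,}000\,(-\ln 0.90)^{1/2.5} \approx 1{,}220 \text{ hours}$$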
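The B10 calculation can also be sketched in a few lines of Python. This is only an illustration of the two-parameter Weibull formula with the parameter values assumed in the article; the function name `weibull_b_life` is hypothetical and is not part of Weibull++.

```python
import math

def weibull_b_life(unreliability, beta, eta):
    """Time by which the given fraction of units is expected to fail
    under a two-parameter Weibull distribution."""
    reliability = 1.0 - unreliability
    return eta * (-math.log(reliability)) ** (1.0 / beta)

# Previous design (eta = 1,500 h) and assumed redesign (eta = 3,000 h).
b10_old = weibull_b_life(0.10, beta=2.5, eta=1500.0)
b10_new = weibull_b_life(0.10, beta=2.5, eta=3000.0)

print(f"B10, previous design: {b10_old:.1f} hours")
print(f"B10, assumed redesign: {b10_new:.1f} hours")
```

Because the shape parameter is held constant, doubling the characteristic life simply doubles the B10 life.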
The engineer wants to ensure that he understands all the outputs of the SimuMatic tool before using it to design the reliability test. He decides to use SimuMatic to generate a very small number of data sets, each with 15 failure times. He uses the following settings in the SimuMatic setup window:
 Parameters: β = 2.5 and η = 3,000 hours
 Censoring: No censoring
 Seed: Use seed = 1
 Number of data sets: 10
 Number of points: 15
 Analysis Method: RRX
 Target Reliability: 90%
 Lower 1-Sided Confidence Level: 90%
 Reliability Values: 90%
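To make the simulation step concrete, the following is a rough Python sketch of what such a run does conceptually: draw 15 Weibull failure times, fit the parameters by rank regression on X, and record the B10 estimate and the longest failure time for each data set. It is not SimuMatic's implementation. The function names are hypothetical, Benard's approximation is assumed for the median ranks, and Python's random number generator will not reproduce the data sets that Weibull++ generates with seed 1.

```python
import math
import random

def median_ranks(n):
    """Benard's approximation to the median rank of each ordered failure
    (an assumption; other median-rank formulas exist)."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def fit_weibull_rrx(times):
    """Rank regression on X: least-squares fit of ln(t) on ln(-ln(1 - F))."""
    xs = [math.log(t) for t in sorted(times)]
    ys = [math.log(-math.log(1.0 - f)) for f in median_ranks(len(times))]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    slope = s_xy / s_yy                 # equals 1 / beta
    intercept = x_bar - slope * y_bar   # equals ln(eta)
    return 1.0 / slope, math.exp(intercept)

def simulate(n_sets, n_points, beta, eta, seed):
    """Return one (beta_hat, eta_hat, b10_hat, longest_time) row per data set."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_sets):
        # random.weibullvariate takes (scale, shape) in that order.
        times = [rng.weibullvariate(eta, beta) for _ in range(n_points)]
        beta_hat, eta_hat = fit_weibull_rrx(times)
        b10_hat = eta_hat * (-math.log(0.90)) ** (1.0 / beta_hat)
        rows.append((beta_hat, eta_hat, b10_hat, max(times)))
    return rows

rows = simulate(n_sets=10, n_points=15, beta=2.5, eta=3000.0, seed=1)
for beta_hat, eta_hat, b10_hat, longest in rows:
    print(f"beta={beta_hat:5.2f}  eta={eta_hat:7.1f}  "
          f"B10={b10_hat:7.1f}  longest={longest:7.1f}")
```

With only 10 data sets, the spread between rows is large, which is why the article later repeats the exercise with 1,000 data sets.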
The resulting Simulation sheet in the SimuMatic folio is shown next.
Figure 1: Simulation tab output from SimuMatic for 10 data sets
In order to see the data sets that were generated, he clicks the Show Raw Data button in the Additional Results section of the control panel. The first four data sets are shown next.
Figure 2: The first 4 of 10 data sets generated by SimuMatic
He copies the first data set and pastes it into a standard folio. He calculates the parameters and the B10 life (i.e., T(0.9)) and finds that the calculations match the first row in Figure 1, as expected. He also computes the average value of the largest time to failure observed in each data set in order to estimate how long the test might last; the average largest time to failure is 4,464 hours.
Next, he looks at the Sorted sheet shown in Figure 3. He notes that each column is sorted in ascending order independently of the other columns, i.e., the value of B10 for a given row cannot be computed from the values of β and η in that row. He can get some idea of how much variability in the estimates of B10 life he would see with a test of 15 specimens all tested to failure by comparing the B10 values in the highlighted rows. (Note that this is comparable to an 80% two-sided bound.) He uses these values to compute the bounds ratio as:
$$\text{Bounds Ratio} = \frac{B10_{90\text{th percentile}}}{B10_{10\text{th percentile}}}$$
Figure 3: 10th and 90th percentile estimates of B10 life
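The bounds ratio itself is simple to compute once the B10 estimates are sorted. The sketch below assumes a nearest-rank percentile convention, which may differ from the one SimuMatic uses, and the ten B10 values are made-up numbers for illustration only; they are not the values in Figure 3.

```python
import math

def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def bounds_ratio(b10_estimates, lower_pct=10, upper_pct=90):
    """Ratio of the upper to the lower percentile of the B10 estimates."""
    lower = nearest_rank_percentile(b10_estimates, lower_pct)
    upper = nearest_rank_percentile(b10_estimates, upper_pct)
    return upper / lower

# Hypothetical B10 estimates in hours (NOT taken from the article's figures).
example = [900.0, 1000.0, 1100.0, 1150.0, 1200.0,
           1250.0, 1300.0, 1400.0, 1500.0, 1800.0]
ratio = bounds_ratio(example)
print(f"Bounds ratio: {ratio:.2f}")
```

A ratio close to 1 means the 10th and 90th percentile estimates nearly coincide; larger values indicate a test plan that estimates B10 less precisely.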
He clicks the Show Summary button in the Additional Results section of the control panel to view the report shown in Figure 4. Right away, he sees that T1 is the B10 life he computed using only his initial parameter estimates, T2 is the B10 life at the 10th percentile (i.e., 1 – CL) from the Sorted tab, and T3 is the average largest time to failure. He also notes that DELTA is similar to the bounds ratio that he had just computed, except that DELTA compares the life computed using the initial parameter estimates and the lower bound.
Figure 4: Test Planning Results from SimuMatic
Armed with his new understanding of the outputs of SimuMatic, the reliability engineer decides to evaluate four different test scenarios using the bounds ratio and the expected test duration. In order to determine some candidate test plans, he opens the Test Design Assistant and selects Expected Failure Time Plot. In the plot's control panel, he specifies a sample size of 15 units that come from a population described by a shape parameter of 2.5 and a scale parameter of 3,000 hours. He creates the plot at a 90% two-sided confidence level in order to capture a large range of plausible failure times. (Note that this confidence level does not need to be the same as the one used in SimuMatic because the bounds are used for a different purpose. In SimuMatic, we were trying to determine the expected variability of estimates of B10 life for a specific test design; in the Expected Failure Time plot, we are interested in the variability of the 15th failure time.) The plot is shown in Figure 5.
Figure 5: Expected Failure Time plot for 15 units that follow a Weibull distribution with a shape parameter of 2.5 and a scale parameter of 3,000 hours
According to this plot, the 15th failure is expected to occur at 4,716 hours, which is comparable to the simulation result of 4,464 hours. (While these values may seem rather different, the simulation result is based on a very small number of simulated data sets. A simulation with 1,000 data sets is presented later in this article. In that case, the expected last failure time is at 4,699 hours, which agrees very well with the Expected Failure Time plot.)
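One way to sanity-check numbers read from such a plot is to treat the j-th failure out of n as an order statistic. The Python sketch below makes that assumption (the article does not state how Weibull++ builds the plot) and finds the time by which the j-th failure has occurred with a chosen probability, using the binomial form of the order-statistic distribution and bisection. All function names are hypothetical.

```python
import math

def weibull_quantile(f, beta, eta):
    """Time at which the Weibull unreliability equals f."""
    return eta * (-math.log(1.0 - f)) ** (1.0 / beta)

def prob_at_least(j, n, f):
    """P(at least j of n units have failed) when each has failed with probability f."""
    return sum(math.comb(n, k) * f ** k * (1.0 - f) ** (n - k)
               for k in range(j, n + 1))

def failure_time_quantile(j, n, p, beta, eta):
    """Time t with P(j-th failure of n occurs by t) = p, by bisection on F(t)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if prob_at_least(j, n, mid) < p:
            lo = mid
        else:
            hi = mid
    return weibull_quantile(0.5 * (lo + hi), beta, eta)

BETA, ETA = 2.5, 3000.0
median_15th = failure_time_quantile(15, 15, 0.50, BETA, ETA)
lower_15th = failure_time_quantile(15, 15, 0.05, BETA, ETA)
upper_15th = failure_time_quantile(15, 15, 0.95, BETA, ETA)
print(f"15th failure: median {median_15th:.0f} h, "
      f"90% two-sided bounds {lower_15th:.0f} h to {upper_15th:.0f} h")
```

Under these assumptions the median and the 5% and 95% points of the 15th failure time should come out within a few hours of the 4,716, 3,717 and 6,010 hours quoted in the article. The same function can be called with `j=10, n=15` or `j=10, n=10` to explore the other candidate plans.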
The plot also shows that the last failure is likely to occur between 3,717 hours and 6,010 hours. Based on this information, the engineer decides to investigate what might happen if the test were truncated at 3,500 hours. He chooses this time because it is below the lower bound on the 15th failure time, so a test truncated at 3,500 hours would probably yield a data set with one or more suspensions while still containing enough failures to make a reasonable estimate of the failure distribution of the units on test. Again referring to Figure 5, he draws a vertical line at 3,500 hours and determines that there will likely be between 9 and 14 failures (out of 15 units) by 3,500 hours. He considers two additional test scenarios: 1) truncating the test after 10 out of 15 total units have failed and 2) testing a total of 10 units to failure.
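The likely number of failures by 3,500 hours can be cross-checked without the plot by noting that, if the units fail independently, the failure count at a fixed time is binomial. The sketch below assumes the article's Weibull parameters and takes the 5th and 95th percentiles of that count as a stand-in for reading the plot; this is an alternative check, not necessarily how the engineer or Weibull++ derives the range.

```python
import math

BETA, ETA = 2.5, 3000.0
N_UNITS, T_STOP = 15, 3500.0

# Probability that a single unit has failed by the truncation time.
p_fail = 1.0 - math.exp(-((T_STOP / ETA) ** BETA))

# Binomial probability of exactly k failures among the 15 units.
pmf = [math.comb(N_UNITS, k) * p_fail ** k * (1.0 - p_fail) ** (N_UNITS - k)
       for k in range(N_UNITS + 1)]

def count_quantile(pmf, p):
    """Smallest failure count whose cumulative probability reaches p."""
    cumulative = 0.0
    for k, mass in enumerate(pmf):
        cumulative += mass
        if cumulative >= p:
            return k
    return len(pmf) - 1

low = count_quantile(pmf, 0.05)
high = count_quantile(pmf, 0.95)
expected = N_UNITS * p_fail
print(f"P(failure by {T_STOP:.0f} h) = {p_fail:.3f}")
print(f"Expected failures: {expected:.1f}; 5th to 95th percentile: {low} to {high}")
```

With these inputs, roughly 77% of units are expected to have failed by 3,500 hours, and the 5th to 95th percentile range of the count is 9 to 14, in line with the range read from Figure 5.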
Some of the data sets to be analyzed will contain a significant number of suspensions, so he decides to analyze all of the test scenarios using maximum likelihood estimation (MLE) to ensure a fair comparison between test strategies. The SimuMatic settings he uses are given in Table 1.
Table 1: Inputs to SimuMatic for different test scenarios
All test scenarios:
 Parameters: β = 2.5, η = 3,000 hours
 Seed: Use Seed = 1
 Number of Data Sets: 1,000
 Analysis Method: MLE
 Target Reliability: 90%
 Lower 1-Sided Confidence Level: 90%
 Reliability Values: 90%

| Test Scenario | Censoring | Number of Points |
| --- | --- | --- |
| 15 Units; All Failures | No censoring | 15 |
| 15 Units; Suspend Test at 3,500 Hours | Right censoring after a specific time: 3,500 hours | 15 |
| 15 Units; Suspend Test after 10 Failures | Right censoring after a specific number of failures: 10 | 15 |
| 10 Units; All Failures | No censoring | 10 |
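For readers without access to SimuMatic, the four scenarios can be approximated with a small Monte Carlo sketch in Python. It is an illustration under stated assumptions rather than a reproduction of the tool: the MLE is solved by bisection on the profile likelihood equation, percentiles use a nearest-rank convention, the duration is the mean test end time, and all function names are hypothetical. The results should be broadly similar to SimuMatic's but will not match exactly, because the random number stream and conventions differ.

```python
import math
import random

def mle_weibull(failures, suspensions):
    """Weibull MLE with right-censored data via the profile equation in beta."""
    scale = max(failures + suspensions)      # normalize to avoid overflow
    fx = [t / scale for t in failures]
    ax = fx + [t / scale for t in suspensions]
    r = len(fx)
    mean_ln_f = sum(math.log(x) for x in fx) / r

    def score(beta):
        s0 = sum(x ** beta for x in ax)
        s1 = sum(x ** beta * math.log(x) for x in ax)
        return s1 / s0 - 1.0 / beta - mean_ln_f

    lo, hi = 0.05, 50.0                      # assumed search range for beta
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    eta = scale * (sum(x ** beta for x in ax) / r) ** (1.0 / beta)
    return beta, eta

def censor(times, scheme, limit):
    """Return (failures, suspensions, test duration) for one data set."""
    ordered = sorted(times)
    if scheme == "time":                     # stop at a fixed time
        fails = [t for t in ordered if t <= limit]
        return fails, [limit] * (len(ordered) - len(fails)), limit
    if scheme == "failures":                 # stop at the limit-th failure
        stop = ordered[limit - 1]
        return ordered[:limit], [stop] * (len(ordered) - limit), stop
    return ordered, [], ordered[-1]          # no censoring

def run_scenario(n_units, scheme, limit, n_sets=1000, seed=1,
                 beta=2.5, eta=3000.0):
    """Return (bounds ratio, mean test duration) for one test plan."""
    rng = random.Random(seed)
    b10s, durations = [], []
    for _ in range(n_sets):
        times = [rng.weibullvariate(eta, beta) for _ in range(n_units)]
        fails, susp, duration = censor(times, scheme, limit)
        if len(fails) < 2:                   # too few failures to fit
            continue
        b_hat, e_hat = mle_weibull(fails, susp)
        b10s.append(e_hat * (-math.log(0.90)) ** (1.0 / b_hat))
        durations.append(duration)
    b10s.sort()
    lower = b10s[math.ceil(10 * len(b10s) / 100) - 1]
    upper = b10s[math.ceil(90 * len(b10s) / 100) - 1]
    return upper / lower, sum(durations) / len(durations)

scenarios = {
    "15 units, all failures": (15, "none", None),
    "15 units, stop at 3,500 h": (15, "time", 3500.0),
    "15 units, stop after 10 failures": (15, "failures", 10),
    "10 units, all failures": (10, "none", None),
}
results = {name: run_scenario(*args) for name, args in scenarios.items()}
for name, (ratio, duration) in results.items():
    print(f"{name:34s} bounds ratio {ratio:.2f}, mean duration {duration:,.0f} h")
```

Using the same fitting method for every scenario mirrors the engineer's reasoning: the comparison between plans is only fair if the estimator is held fixed while the censoring scheme and sample size change.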
He compares the results of the simulations, shown in Table 2. Note that the 95% upper bound on test duration is taken from the Expected Failure Time plots for 10 or 15 units. Not surprisingly, he finds that testing all 15 units to failure yields the lowest variability in the estimates of B10 life. However, he is pleased to find that by suspending the test at 3,500 hours, he can reduce the expected test time by over 25% and the worst-case (95% upper bound) test time by over 40% while increasing the bounds ratio by only about 2%. In addition, he finds that if 10 failures occur before 3,500 hours, he could elect to stop the test, knowing that he would expect around a 6% increase in the bounds ratio (1.96 versus 1.85) over testing all 15 units to failure. Finally, he notes that testing 10 units to failure will result in a longer test with more variability in the estimate of B10 life than either of the truncated test scenarios with 15 units.
Thus, the simulations show two things. First, suspensions increase the variability of the B10 estimate: the bounds ratio is higher when 15 units are tested and the test is stopped at a specified time or number of failures than when all 15 units are tested to failure. Second, suspended units still improve the estimate: the bounds ratio is lower for the test with 10 failures and 5 suspensions than for the test of 10 units to failure.
Table 2: Results from SimuMatic and Expected Failure Time plots for different test scenarios
| Test Scenario | Bounds Ratio | Expected Duration (hours) | 95% Upper Bound on Test Duration (hours) |
| --- | --- | --- | --- |
| 15 Units; All Failures | 1.85 | 4,699 | 6,010 |
| 15 Units; Suspend Test at 3,500 Hours | 1.89 | 3,500 | 3,500 |
| 15 Units; Suspend Test after 10 Failures | 1.96 | 2,990 | 3,671 |
| 10 Units; All Failures | 2.02 | 4,452 | 5,835 |
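The percentage comparisons between scenarios follow directly from the Table 2 values. The short Python sketch below recomputes them relative to the baseline of 15 units tested to failure; the dictionary keys and the helper name `pct_change` are illustrative only. Because the table values are rounded, the percentages are approximate.

```python
# Values copied from Table 2: (bounds ratio, expected duration, 95% upper bound).
table2 = {
    "15 units, all failures": (1.85, 4699, 6010),
    "15 units, suspend at 3,500 h": (1.89, 3500, 3500),
    "15 units, suspend after 10 failures": (1.96, 2990, 3671),
    "10 units, all failures": (2.02, 4452, 5835),
}

def pct_change(new, base):
    """Percentage change of new relative to base."""
    return 100.0 * (new - base) / base

baseline = table2["15 units, all failures"]
changes = {
    name: tuple(pct_change(value, base) for value, base in zip(row, baseline))
    for name, row in table2.items()
}

for name, (d_ratio, d_expected, d_upper) in changes.items():
    print(f"{name:36s} bounds ratio {d_ratio:+5.1f}%  "
          f"expected duration {d_expected:+6.1f}%  upper bound {d_upper:+6.1f}%")
```

From the rounded table entries, stopping at 3,500 hours changes the bounds ratio by about +2%, the expected duration by about -26% and the upper bound by about -42%, while stopping after 10 failures changes the bounds ratio by about +6%.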
Conclusion
This article examined the tradeoff between total test time and variability of an estimated metric for four different life test scenarios. First, an overview of the calculations performed by the SimuMatic tool in Weibull++ was presented. Then the Expected Failure Time plot was examined to provide guidance for what test scenarios might be good candidates for reducing test time and/or number of test units. Finally, SimuMatic was used to provide results that could be used to choose the best test plan.