A Simple Method for Analyzing a Data Set with Unknown Suspension Times
Sometimes valuable information is left out of field failure reporting systems, particularly if a system is put in place before it is known how the collected data will be used. Such omissions can lead to a misleading analysis of field failure data. This article considers a case where failure data on a fielded component are collected but operating times of unfailed fielded units are not known. First, the effect of suspension times on predicted reliability is examined, and then a simple method of estimating suspension times for fielded units is discussed.
Joe the reliability engineer had recently accepted a transfer to another site within his company. His new boss asked him to analyze some field data from a component to see if it was performing as well as the target B50 life of 5,000 hours. His boss gave Joe the total number of components (892) that were put into the field nine months ago and a failure data set, which is shown in Table 1. Joe was asked to provide an estimate of the B50 life of the components in the field by the end of the day.
Table 1: Fielded Component Failure Time Data Set
|Component Failure Times (hours)|
Every data point in the data set was classified as a failure, so Joe asked his boss where he could find information about the components that were still operating. Joe’s boss told him to treat the unfailed components as suspensions at 1 hour, just like Joe’s predecessor had done, since the only information they had about those components was that they didn’t fail upon startup.
Joe went back to his desk and looked at the information he had been given. The data set that contained the failure times was small compared to the total number of components in the population; maybe 7 or 8 percent of the total number of fielded components were accounted for in the data set. Joe knew that it would be best to use maximum likelihood estimation (MLE) rather than rank regression (RRX) to fit a model to the data, due to the large percentage of censored data points. But he also knew that the MLE method takes into account the exact times to suspension, so assuming a very small value for the operating times of all the suspended components would produce an overly conservative estimate of component life. Joe wanted to provide a more realistic estimate to his boss, but he knew that he would have to justify why he used a different method than his predecessor. He decided to use the Monte Carlo tool in Weibull++ to illustrate how the analysis could be improved.
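Joe's worry about MLE shows up directly in the Weibull log-likelihood for right-censored data. In the minimal Python sketch below, each failure at time t contributes the log of the probability density, and each suspension at time s contributes only the log of the reliability, -(s/eta)^beta. The function name `weibull_loglik` is a hypothetical helper, not part of Weibull++. Under the previous-generation parameters used later in this example (beta = 2.5, eta = 5,000 hours), a suspension at 1 hour adds almost nothing to the likelihood. That is why suspending every unfailed unit at 1 hour behaves much like leaving the suspensions out entirely.

```python
import numpy as np

def weibull_loglik(beta, eta, failures, suspensions):
    """Log-likelihood of a 2-parameter Weibull model with right-censored data.

    failures: times at which units failed
    suspensions: operating times of units that had not failed (right-censored)
    """
    t = np.asarray(failures, dtype=float)
    s = np.asarray(suspensions, dtype=float)
    # Each failure contributes ln f(t) = ln(beta/eta) + (beta-1) ln(t/eta) - (t/eta)^beta
    ll_fail = np.sum(np.log(beta / eta) + (beta - 1.0) * np.log(t / eta) - (t / eta) ** beta)
    # Each suspension contributes ln R(s) = -(s/eta)^beta
    ll_susp = -np.sum((s / eta) ** beta)
    return ll_fail + ll_susp

# Contribution of a single suspension under the previous-generation parameters
beta, eta = 2.5, 5000.0
for s in (1.0, 2000.0):
    print(s, weibull_loglik(beta, eta, [], [s]))
# Prints roughly -5.7e-10 for s = 1 and -0.101 for s = 2000
```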
Joe started by searching for a historical analysis of a similar component. He found data from a laboratory test of a previous generation of the component. The test results indicated that the previous generation component followed a Weibull distribution with a shape parameter, beta, of 2.5 and a scale parameter, eta, of 5,000 hours. He was unable to quickly find information on how many components were put into the field or the usage of the component. Given the time constraints, Joe chose to assume that 1,000 previous generation components were put into the field and that they had a usage that was uniformly distributed between 0 and 2,000 hours. He decided to separate the data into 10 bins, where each bin would hold 100 components of a specific usage level. As a first step, he created a folio with a data sheet for components with 200 hours of use, a second data sheet for components with 400 hours of use, and so on. He named each of the data sheets for use in the next steps.
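Joe's binning scheme can be checked analytically before any random data is generated. Under the assumed previous-generation model (beta = 2.5, eta = 5,000 hours), the expected fraction of units failed by usage level t is F(t) = 1 - exp[-(t/eta)^beta]. The short Python sketch below applies this to the 10 assumed usage bins of 100 units each. It suggests that only about 33 of the 1,000 simulated units would be expected to fail, so the simulated data set would consist overwhelmingly of suspensions.

```python
import numpy as np

BETA, ETA = 2.5, 5000.0                      # previous-generation Weibull parameters from the lab test
usage_levels = np.arange(200, 2001, 200)     # assumed usage bins: 200, 400, ..., 2,000 hours
units_per_bin = 100

# Weibull unreliability F(t) = 1 - exp(-(t/eta)^beta) at each bin's usage level
frac_failed = 1.0 - np.exp(-(usage_levels / ETA) ** BETA)
expected_failures = units_per_bin * frac_failed

for t, n in zip(usage_levels, expected_failures):
    print(f"{int(t):5d} h: {n:5.1f} expected failures")
print(f"total: {expected_failures.sum():.1f} of {units_per_bin * len(usage_levels)} units")
```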
Joe opened the Weibull++ Monte Carlo tool by choosing Home > Tools > Weibull++ Monte Carlo.
To create the data for the 200-hour usage level, he entered his parameters on the Main tab and specified the suspension time on the Censoring tab.
On the Settings tab, he cleared the Use Seed check box, set the number of data points to 100 and specified the "200" data sheet that he had created.
Then he clicked Generate to create the data for the 200-hour usage level. He repeated this process nine more times to generate data for the remaining usage levels, and then copied the data from all 10 data sheets into a single data sheet in a new folio.
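The same data set can be sketched outside Weibull++ with NumPy. This is an assumption-laden stand-in for the Monte Carlo tool, not a reproduction of its internals. Each unit's life is drawn from the Weibull (beta = 2.5, eta = 5,000 hours) distribution. A unit whose life exceeds its bin's usage level is recorded as a suspension at that usage level. The function name `simulate_fielded_units` and the seed are hypothetical.

```python
import numpy as np

def simulate_fielded_units(beta=2.5, eta=5000.0, usage_levels=range(200, 2001, 200),
                           units_per_bin=100, seed=None):
    """Return (times, failed) for units time-censored at their bin's usage level."""
    rng = np.random.default_rng(seed)
    times, failed = [], []
    for usage in usage_levels:
        # numpy's weibull() draws with unit scale, so multiply by eta
        life = eta * rng.weibull(beta, size=units_per_bin)
        is_failure = life <= usage
        # Failed units keep their failure time; survivors are suspended at the usage level
        times.append(np.where(is_failure, life, float(usage)))
        failed.append(is_failure)
    return np.concatenate(times), np.concatenate(failed)

times, failed = simulate_fielded_units(seed=1)
print(f"{failed.sum()} failures, {(~failed).sum()} suspensions")
```

Fixing the seed makes the sketch repeatable. Clearing Use Seed in Weibull++, as Joe did, instead gives a different random data set on each run.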
Joe made three copies of the Monte Carlo data for all 1,000 components, giving four identical data sheets. He set the distribution on all four data sheets to 2-parameter Weibull.
On the first data sheet, Joe deleted all the suspensions and analyzed the remaining data using RRX to mimic the case of analyzing his field data without considering the effect of the suspensions.
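Rank regression on X, as used for the failures-only data sheet, can be sketched as follows. The sketch assumes complete (uncensored) data. It estimates unreliability with Bernard's approximation to the median ranks and linearizes the Weibull CDF as ln[-ln(1 - F)] = beta ln t - beta ln eta. It then regresses ln t on that quantity. Bernard's approximation is used here for simplicity; exact median ranks would give slightly different estimates. The helper name `rrx_weibull` is hypothetical.

```python
import numpy as np

def rrx_weibull(failure_times):
    """Fit a 2-parameter Weibull by rank regression on X (complete data only)."""
    t = np.sort(np.asarray(failure_times, dtype=float))
    n = t.size
    i = np.arange(1, n + 1)
    median_rank = (i - 0.3) / (n + 0.4)          # Bernard's approximation
    y = np.log(-np.log(1.0 - median_rank))       # Weibull-linearized unreliability
    x = np.log(t)
    # RRX: regress x on y, i.e. x = a + b*y with b = 1/beta and a = ln(eta)
    b, a = np.polyfit(y, x, 1)
    return 1.0 / b, np.exp(a)

# Round-trip check: times placed exactly on a Weibull(2.5, 5000) line are recovered
n = 10
mr = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
exact_times = 5000.0 * (-np.log(1.0 - mr)) ** (1.0 / 2.5)
print(rrx_weibull(exact_times))   # approximately (2.5, 5000.0)
```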
On the second data sheet, he analyzed the Monte Carlo data for all 1,000 components. This data sheet and all of the subsequent ones were analyzed using MLE because they contained mostly suspended data points.
On the third data sheet, he changed all the suspension times to 1 hour to mimic the analysis his boss was expecting.
On the fourth data sheet, he changed all the suspension times to 2,000 hours to investigate what would happen if all the unfailed components in the field were assumed to have the maximum amount of usage.
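The MLE data sheets can be approximated in Python. The sketch below simulates the 1,000 units as described above. It then maximizes the censored Weibull log-likelihood for three versions of the data: the true suspension times, all suspensions at 1 hour, and all suspensions at 2,000 hours. The optimizer is SciPy's Nelder-Mead on the log of each parameter. This choice was made for simplicity and is not necessarily what Weibull++ uses internally. The failures-only RRX sheet is omitted. The sketch reports reliability at 2,000 hours alongside the value implied by the "actual" parameters. The exact numbers depend on the random seed; the point is their ordering.

```python
import numpy as np
from scipy.optimize import minimize

BETA, ETA = 2.5, 5000.0

def simulate(seed=1):
    rng = np.random.default_rng(seed)
    usage = np.repeat(np.arange(200, 2001, 200), 100).astype(float)  # 10 bins x 100 units
    life = ETA * rng.weibull(BETA, size=usage.size)
    failed = life <= usage
    return np.where(failed, life, usage), failed

def fit_weibull_mle(failures, suspensions):
    """Maximize the right-censored Weibull log-likelihood; returns (beta, eta)."""
    t = np.asarray(failures, dtype=float)
    s = np.asarray(suspensions, dtype=float)

    def neg_loglik(params):
        beta, eta = np.exp(params)  # optimize on the log scale so both stay positive
        with np.errstate(over="ignore", invalid="ignore"):
            ll = np.sum(np.log(beta / eta) + (beta - 1.0) * np.log(t / eta) - (t / eta) ** beta)
            ll -= np.sum((s / eta) ** beta)
        return -ll if np.isfinite(ll) else np.inf

    start = np.log([1.5, np.max(np.concatenate([t, s]))])
    res = minimize(neg_loglik, start, method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 5000})
    return tuple(np.exp(res.x))

def reliability(t, beta, eta):
    return np.exp(-(t / eta) ** beta)

times, failed = simulate()
fail_t, n_susp = times[failed], int((~failed).sum())
scenarios = {
    "actual suspension times": times[~failed],
    "suspensions at 1 hour": np.full(n_susp, 1.0),
    "suspensions at 2,000 hours": np.full(n_susp, 2000.0),
}
r2000 = {}
for name, susp in scenarios.items():
    beta_hat, eta_hat = fit_weibull_mle(fail_t, susp)
    r2000[name] = reliability(2000.0, beta_hat, eta_hat)
    print(f"{name:28s} beta={beta_hat:5.2f} eta={eta_hat:9.1f} R(2000)={r2000[name]:.3f}")
print(f"{'actual distribution':28s} R(2000)={reliability(2000.0, BETA, ETA):.3f}")
```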
Joe created an additional data sheet with no data to hold the parameters of the "actual" distribution. He clicked Calculate for the empty sheet and, when prompted, entered the parameters he had used to generate the Monte Carlo data (beta = 2.5, eta = 5,000 hours).
Joe created an overlay plot of reliability for all five data sheets to compare the results. He removed the data points in order to be able to see the reliability lines more clearly. He also made the line that showed the “actual” reliability curve thicker than the others.
To explain the plot to his boss, Joe pointed to the thick line representing the “actual” reliability curve. He explained that the two leftmost curves represent the analysis performed with only failure data and the analysis performed with all unfailed components assumed to have a suspension time of 1 hour. Joe’s boss was surprised that the two curves were nearly indistinguishable and that both methods underestimated the true reliability so badly. Next, Joe pointed out that the rightmost curve, which greatly overestimated the true reliability, assumed that all unfailed components had a usage equal to the greatest assumed usage of 2,000 hours. Finally, Joe showed his boss the curve immediately to the left of the “actual” reliability curve, in which he assumed that the usage of the unfailed components in the field was spread relatively evenly between 0 and 2,000 hours.
At that point in the discussion, Joe’s boss remembered that he did have some actual usage data that had been collected for the previous generation of the component. Joe’s predecessor hadn’t known how to incorporate the usage data into his analyses, so Joe’s boss hadn’t thought of it again until Joe’s plot reminded him. An hour later, Joe’s boss produced the usage data shown in the first two columns of Table 2.
Table 2: Component Usage Information
Usage per 9
Joe calculated the expected usage for the time the components had been in the field (9 months) and the cumulative percentage of customers with each usage, as shown in the last two columns of Table 2. (Note that Joe used 99% in the last cell of the table because Weibull++ accepts only values greater than 0% and less than 100% in the free-form data sheet.) He copied the calculated data into a free-form (probit) data sheet in Weibull++. He chose to analyze the data using rank regression on Y (RRY), which minimizes the variability in the y-direction, because he was supplying the y-axis values directly rather than calculating them from median ranks. Using the Distribution Wizard, he found that the lognormal distribution was the best fit for the usage data, with a location parameter (log-mean) of 6.62627 and a scale parameter (log-std) of 0.928195.
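The free-form (probit) fit can be sketched in Python. Because the cumulative percentages are supplied rather than computed from median ranks, a lognormal RRY fit reduces to a straight-line regression. The inverse standard normal CDF of each cumulative fraction, z, is regressed on ln(usage), giving log-std = 1/slope and log-mean = -intercept/slope. The usage points below are not Joe's Table 2 values. They are placed exactly on the reported fit purely to show that the regression recovers it, and the helper name `rry_lognormal` is hypothetical. The last line illustrates the 99% note: the normal quantile of 100% is infinite, so that point cannot be placed on the plot.

```python
import numpy as np
from scipy.stats import norm

def rry_lognormal(usage, cum_fraction):
    """Fit a lognormal to free-form (usage, cumulative fraction) pairs by regression on Y."""
    x = np.log(np.asarray(usage, dtype=float))
    z = norm.ppf(np.asarray(cum_fraction, dtype=float))  # probit transform of the y-values
    # RRY: regress z on ln(usage), z = (ln t - mu) / sigma = a + b*ln t
    b, a = np.polyfit(x, z, 1)
    return -a / b, 1.0 / b  # (log-mean, log-std)

# Illustrative points lying exactly on the reported fit (not the Table 2 data)
mu, sigma = 6.62627, 0.928195
cum = np.array([0.10, 0.25, 0.50, 0.75, 0.99])
usage = np.exp(mu + sigma * norm.ppf(cum))
print(rry_lognormal(usage, cum))   # approximately (6.62627, 0.928195)
print(norm.ppf(1.0))               # inf: why 99% was entered instead of 100%
```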
Joe turned to the Monte Carlo tool again to generate his suspensions. Since there were 892 fielded components and 71 failures, he generated 821 data points following the lognormal usage distribution given above and using random censoring with 100% right censoring.
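Below is a NumPy stand-in for this step. It assumes that the lognormal parameters above describe usage in hours over the nine-month field period. Each of the 892 - 71 = 821 unfailed units receives a usage time drawn from that distribution, and every drawn value is treated as a suspension. The seed is arbitrary.

```python
import numpy as np

N_FIELDED, N_FAILED = 892, 71
LOG_MEAN, LOG_STD = 6.62627, 0.928195    # lognormal usage fit from the free-form data

n_suspended = N_FIELDED - N_FAILED       # 821 unfailed units
rng = np.random.default_rng(2024)
# numpy's lognormal() takes the mean and std of the underlying normal (log-mean, log-std)
suspension_times = rng.lognormal(mean=LOG_MEAN, sigma=LOG_STD, size=n_suspended)

print(n_suspended, np.median(suspension_times))  # median should be near exp(6.62627), about 755 h
```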
Now Joe was able to combine the failures and suspensions into a single data sheet. He chose MLE and a 2-parameter Weibull distribution, then used the Quick Calculation Pad (QCP) to obtain the B50 life. He presented the results shown in Table 3 to his boss. Joe told his boss that even though the B50 life predicted by the new analysis method met the target, two things bothered him. First, the lower bound of the prediction did not meet the target, so Joe proposed continuing to update the analysis as more failures were observed. Second, the suspension times were still simulated rather than observed, and additional information from the field could help in multiple ways:
- With information about the usage of the failed items, the Weibull++ usage format warranty analysis folio could provide a somewhat more sophisticated analysis.
- Having suspension information from the field would be even more advantageous, as it would yield more accurate assessments of component reliability.
Joe’s boss agreed that collecting usage information on both failed and unfailed components would become a priority for the future.
Table 3: Component Data Analysis Summary
|Analysis||Beta||Eta (hours)||B50 Life Lower (hours)||B50 Life Median (hours)||B50 Life Upper (hours)|
|Failure Data Only||2.89||4,662||3,801||4,106||4,436|
|Failure Data Plus Monte Carlo Suspensions|
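For a 2-parameter Weibull model, the B50 life (the time by which half the population is expected to fail) is B50 = eta(ln 2)^(1/beta). The check below reads the values 2.89 and 4,662 in the failures-only row of Table 3 as beta and eta. With those values, it reproduces that row's median B50 to within rounding and confirms that this estimate falls short of the 5,000-hour target. The confidence bounds depend on the parameter covariance and are not reproduced here. The helper name `weibull_b_life` is hypothetical.

```python
import math

def weibull_b_life(beta, eta, fraction_failed=0.5):
    """Time by which the given fraction of a Weibull population is expected to fail."""
    return eta * (-math.log(1.0 - fraction_failed)) ** (1.0 / beta)

b50 = weibull_b_life(2.89, 4662.0)
print(round(b50))       # about 4,107 hours; Table 3 lists 4,106 from unrounded parameters
print(b50 >= 5000.0)    # False: below the 5,000-hour target
```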
This example showed a way to analyze a data set with unknown suspension times. First, using the results of a previous-generation component analysis, we examined the effect of assigning suspension times to unfailed components with unknown operating times in several ways. Making the suspension times all short, all long, or evenly distributed produced very different predicted reliability curves. Second, we used data on actual field usage to generate suspensions for the analysis of new component data. This showed that the new component was close to meeting its B50 life target.