Reliability HotWire
Hot Topics

You Have a Small Data Set: What Do You Do?

Engineers are no strangers to the pressure of making decisions based on little information and in compressed time frames; for many engineers, it is almost the norm. The high cost of reliability testing and the need to get to market before competitors often leave data analysts and engineers with less than satisfactory data. Many data sets contain no failures at all, only a few failures, or a few failures and many suspensions (surviving units). If you are one of the engineers who face these pressures and wonder how to produce good reliability estimates from such small data sets, read on. This article presents an array of approaches that can be used to analyze small life data sets with Weibull++.

Non-parametric Analysis

Non-parametric life data analysis makes it possible to analyze data without assuming an underlying life distribution model. This can be an advantage if you are uncomfortable assuming a distribution, either because you are dealing with unfamiliar failure modes about which you have not yet gathered enough knowledge, or because the data set is too small to decide statistically which distribution fits best. On the other hand, the confidence bounds from non-parametric analysis are usually much wider than those from parametric analysis, and predictions outside the range of the observations are not possible; both limitations greatly restrict this approach.

For the rest of this article, we will use the following data set for illustration. It describes the test results for a sample of 13 units that incorporate a minor improvement over a previous design. The data set contains a few failures (three) and many suspensions (ten).

Table 1 - Small Data Set Example

Number of Units   State (F or S)   Time (hrs)
1                 F                160
1                 F                560
1                 F                800
10                S                1000

Using the Kaplan-Meier non-parametric method, we obtain the following reliability plot (with 90% two-sided confidence bounds).

Figure 1 - Small Data Set Analyzed Non-Parametrically
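The Kaplan-Meier (product-limit) estimate behind a plot like Figure 1 can be sketched in a few lines of Python for the Table 1 data. Reliability drops only at failure times, by the fraction of at-risk units that fail; suspensions simply leave the risk set. The 90% bounds here use Greenwood's variance with a normal approximation, which is our assumption; Weibull++ may compute Kaplan-Meier bounds another way, so the intervals need not match the figure.

```python
import math

# Table 1 as (number of units, state, time in hours); "F" = failure, "S" = suspension
DATA = [(1, "F", 160), (1, "F", 560), (1, "F", 800), (10, "S", 1000)]


def kaplan_meier(data, z=1.645):
    """Return (time, reliability, lower, upper) at each failure time.

    Bounds use Greenwood's variance with a normal approximation
    (z = 1.645 gives roughly 90% two-sided bounds), clipped to [0, 1].
    """
    at_risk = sum(n for n, _, _ in data)
    reliability = 1.0
    greenwood_sum = 0.0
    rows = []
    # Process groups in time order; at tied times, failures come before suspensions
    for n, state, t in sorted(data, key=lambda row: (row[2], row[1] != "F")):
        if state == "F":
            reliability *= (at_risk - n) / at_risk
            if at_risk > n:
                greenwood_sum += n / (at_risk * (at_risk - n))
            half_width = z * reliability * math.sqrt(greenwood_sum)
            rows.append((t, reliability,
                         max(0.0, reliability - half_width),
                         min(1.0, reliability + half_width)))
        at_risk -= n  # failures and suspensions both leave the risk set
    return rows


for t, r, lo, hi in kaplan_meier(DATA):
    print(f"t = {t:5d} h  R = {r:.4f}  90% bounds = [{lo:.4f}, {hi:.4f}]")
```

The point estimates are 12/13, 11/13 and 10/13 at 160, 560 and 800 hours; with no failures after 800 hours, the curve stays flat, which is why nothing can be said beyond the last observation.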

Exponential Distribution

Analysis with the exponential distribution is a parametric approach that can be used to model units that have a constant failure rate (that is, units that do not degrade or wear out with time). Because the exponential distribution has only one parameter, it is more robust to small sample sizes and to uncertainty in parameter fitting than distributions with two or more parameters. Another advantage is that it can be applied to data sets with no failures at all (only suspensions). However, a major drawback of the exponential distribution is its assumption that failures are purely random (chance failures), an assumption that is often not valid.
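As a sketch of the zero-failure case, the classical chi-square result for a time-terminated test gives a one-sided lower bound on the mean time to failure, MTTF_L = 2T / χ²(CL; 2r + 2), where T is the total accumulated unit-hours, r is the number of failures and CL is the confidence level. With r = 0, this reduces to T / -ln(1 - CL). The test plan below (20 units run 500 hours each) is hypothetical, and the article does not state that Weibull++ uses this particular bound.

```python
import math


def zero_failure_mttf_lower_bound(total_unit_hours, confidence):
    """One-sided lower confidence bound on the exponential MTTF with zero failures.

    General form: MTTF_L = 2T / chi2(confidence; 2r + 2). For r = 0 the
    chi-square quantile with 2 degrees of freedom is -2 ln(1 - confidence),
    so MTTF_L = T / -ln(1 - confidence).
    """
    return total_unit_hours / -math.log(1.0 - confidence)


# Hypothetical test plan: 20 units, 500 hours each, no failures observed
T = 20 * 500
mttf_low = zero_failure_mttf_lower_bound(T, 0.90)
r_low_at_100 = math.exp(-100 / mttf_low)  # lower bound on R(100 h)
print(f"MTTF >= {mttf_low:.0f} h, R(100 h) >= {r_low_at_100:.4f} at 90% confidence")
```

Even without a single failure, the accumulated unit-hours support a quantitative (if conservative) reliability statement, which is exactly where the constant-failure-rate assumption earns its keep.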

Assuming that the design has a constant failure rate, we obtain the following probability plot (with 90% two-sided confidence bounds on reliability) for the data set in Table 1.

Figure 2 - Small Data Set Analyzed with Exponential Distribution
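For right-censored data, the maximum likelihood estimate of the exponential failure rate is simply the number of failures divided by the total time accumulated by all units, failed and suspended. A minimal sketch for Table 1 follows. We assume maximum likelihood here; the article does not say which estimation method produced Figure 2 (rank regression would give somewhat different numbers), and confidence bounds are omitted.

```python
import math

# Table 1 as (number of units, state, time in hours)
DATA = [(1, "F", 160), (1, "F", 560), (1, "F", 800), (10, "S", 1000)]

failures = sum(n for n, state, _ in DATA if state == "F")
total_time = sum(n * t for n, _, t in DATA)  # failures and suspensions both count

# Exponential MLE with right-censored data: failures / total accumulated time
failure_rate = failures / total_time
mttf = 1.0 / failure_rate


def reliability(t):
    """Exponential reliability at time t (hours)."""
    return math.exp(-failure_rate * t)


print(f"lambda = {failure_rate:.4e} per hour, MTTF = {mttf:.0f} h")
print(f"R(1000 h) = {reliability(1000):.4f}")
```

Note how the ten suspensions dominate the total time (10,000 of 11,520 unit-hours); ignoring them would badly understate the MTTF.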

One-Parameter Weibull Distribution

Like the exponential distribution, the one-parameter Weibull distribution is a one-parameter model. Unlike the exponential, however, it can model products with increasing, constant or decreasing failure rates. It is based on the common Weibull distribution but assumes that the shape parameter, β, is a known value; it is sometimes called the "WeiBayes" distribution. Its advantage over the common two-parameter Weibull is that it is more robust to small sample sizes and to uncertainty in parameter fitting, because it needs to estimate only one parameter, η, rather than two. The price you pay with this approach is that you must be able to assume a value of β. This could be based on prior comparable tests, observation and engineering knowledge. Note that the word "comparable" is key here: you cannot use a prior β if the data set you are analyzing comes from units of drastically different designs or from units that fail due to different failure modes.

In our example, let us assume that prior observations of a previous design indicated that β is typically 1.3. Don't get too mired in debating what the "right" β is, as you can't truly know. Instead, you can try a few plausible values of β and assess their impact on the predictions. The following figure shows one-parameter Weibull probability plots with β = 1.15, β = 1.2 and β = 1.3, with 90% two-sided confidence bounds on reliability.

Figure 3 - Small Data Set Analyzed with One-Parameter Weibull and Different β Values
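With β fixed, the maximum likelihood estimate of η has a closed form, η = (Σ t_i^β / r)^(1/β), where the sum runs over all units (failures and suspensions) and r is the number of failures. The sketch below repeats the β sensitivity check described above on Table 1. Treating the fit as maximum likelihood is our assumption; the numbers are illustrative and need not match Figure 3 exactly.

```python
import math

# Table 1 as (number of units, state, time in hours)
DATA = [(1, "F", 160), (1, "F", 560), (1, "F", 800), (10, "S", 1000)]


def weibayes_eta(data, beta):
    """MLE of the scale parameter eta when the shape beta is fixed.

    eta = (sum over all units of t**beta / number of failures) ** (1 / beta);
    suspensions contribute to the sum but not to the failure count.
    """
    failures = sum(n for n, state, _ in data if state == "F")
    if failures == 0:
        raise ValueError("no failures: use a confidence bound on eta instead")
    total = sum(n * t ** beta for n, _, t in data)
    return (total / failures) ** (1.0 / beta)


def reliability(t, beta, eta):
    """Weibull reliability at time t."""
    return math.exp(-((t / eta) ** beta))


# Sensitivity check over the assumed beta values
for beta in (1.15, 1.2, 1.3):
    eta = weibayes_eta(DATA, beta)
    print(f"beta = {beta:4.2f}  eta = {eta:7.0f} h  "
          f"R(1000 h) = {reliability(1000, beta, eta):.4f}")
```

With β = 1, the formula reduces to the exponential estimate (total time divided by the number of failures), which makes a quick sanity check.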

Bayesian Analysis

The premise of Bayesian statistics is to combine prior knowledge with a given set of current observations in order to make statistical inferences. The prior information could come from observational data, previous comparable experiments or engineering knowledge. This type of analysis is particularly useful when current test data are scarce but there is a strong prior understanding of a parameter of the assumed life model, and that understanding can be expressed as a distribution for the parameter. By incorporating this prior information, a posterior distribution for the parameter can be produced and an adequate estimate of reliability can be obtained. Weibull++ offers the Weibull-Bayesian distribution, which combines the properties of the Weibull distribution with the concepts of Bayesian statistics.

This approach expands on the concept behind the previous one (one-parameter Weibull). Here, instead of running the analysis with single deterministic values of β, we use a probabilistic model that describes our knowledge about β. The Weibull-Bayesian model is actually a true "WeiBayes" model: it offers an alternative to the one-parameter Weibull by accounting for the variation and uncertainty in the shape parameter observed in the past.

Let us assume that prior observations on previous designs showed that β follows a normal distribution with μ = 1.2 and σ = 0.1. The following figure shows the corresponding Weibull-Bayesian probability plot with 90% two-sided confidence bounds on reliability.

Figure 4 - Small Data Set Analyzed with the Weibull-Bayesian Distribution
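One way to see how a prior on β propagates into a reliability estimate is a brute-force grid approximation of the posterior. The sketch below weights each (β, η) pair by the normal prior on β (μ = 1.2, σ = 0.1) times the Weibull likelihood of Table 1 and reports the posterior mean of R(1000 h). The prior on η (proportional to 1/η), the grid limits and the choice of the posterior mean as the summary are our assumptions; this is not necessarily how Weibull++ implements its Weibull-Bayesian model, and no bounds are computed.

```python
import math

# Table 1 as (number of units, state, time in hours)
DATA = [(1, "F", 160), (1, "F", 560), (1, "F", 800), (10, "S", 1000)]
PRIOR_MU, PRIOR_SIGMA = 1.2, 0.1  # normal prior on beta from the example


def log_likelihood(beta, eta):
    """Weibull log-likelihood for right-censored data."""
    ll = 0.0
    for n, state, t in DATA:
        z = (t / eta) ** beta
        if state == "F":
            ll += n * (math.log(beta / eta) + (beta - 1) * math.log(t / eta) - z)
        else:
            ll -= n * z  # suspensions contribute log R(t) = -(t/eta)^beta
    return ll


def posterior_mean_reliability(t, n_beta=161, n_eta=300):
    """Posterior mean of R(t) on a (beta, log eta) grid.

    Assumptions: normal prior on beta spanning +/- 5 sigma; prior on eta
    proportional to 1/eta, i.e. uniform on the log-spaced eta grid (50 h to 1e6 h).
    """
    betas = [PRIOR_MU + PRIOR_SIGMA * (-5 + 10 * i / (n_beta - 1)) for i in range(n_beta)]
    lo, hi = math.log(50.0), math.log(1e6)
    log_etas = [lo + (hi - lo) * j / (n_eta - 1) for j in range(n_eta)]
    log_w, r_vals = [], []
    for beta in betas:
        log_prior = -0.5 * ((beta - PRIOR_MU) / PRIOR_SIGMA) ** 2
        for log_eta in log_etas:
            eta = math.exp(log_eta)
            log_w.append(log_prior + log_likelihood(beta, eta))
            r_vals.append(math.exp(-((t / eta) ** beta)))
    # Normalize in log space to avoid underflow before exponentiating
    peak = max(log_w)
    weights = [math.exp(lw - peak) for lw in log_w]
    return sum(w * r for w, r in zip(weights, r_vals)) / sum(weights)


print(f"Posterior mean R(1000 h) = {posterior_mean_reliability(1000):.4f}")
```

A grid is crude but transparent for two parameters; for more parameters, sampling methods such as Markov chain Monte Carlo are the usual replacement.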

Note:

What is described above is a selection of typical distributions (exponential, one-parameter Weibull and Weibull-Bayesian) that have convenient properties and practical applications in small data set analysis. This does not mean that other distributions (such as the two-parameter Weibull, lognormal, normal, gamma, Gumbel, etc.) cannot be used. From a theoretical point of view, any distribution, when its minimum data requirements for model fitting are met, can be used to analyze the data. As an example, the next figure shows the two-parameter Weibull probability plot with 90% two-sided confidence bounds on reliability.

Figure 5 - Small Data Set Analyzed with the Two-Parameter Weibull Distribution

Notice that in the above figure, the confidence bounds for the two-parameter Weibull are wider than the bounds in Figures 3 and 4. This is because, for the one-parameter Weibull and the Weibull-Bayesian distributions, prior knowledge about the β parameter is available and reduces the uncertainty in the estimates.
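For comparison, a two-parameter Weibull maximum likelihood fit to censored data can be sketched by solving the standard profile equation for β and then computing η in closed form. Because that equation is increasing in β, simple bisection suffices. This is an illustrative maximum likelihood sketch: Figure 5 may have been produced with a different estimation method, and the sketch does not compute the confidence bounds discussed above.

```python
import math

# Table 1 as (number of units, state, time in hours)
DATA = [(1, "F", 160), (1, "F", 560), (1, "F", 800), (10, "S", 1000)]


def beta_equation(beta, data):
    """Profile-likelihood equation for beta with right-censored Weibull data.

    g(beta) = sum(t^b ln t) / sum(t^b) - 1/b - mean(ln t over failures);
    its root is the MLE of beta. Times are scaled by the largest time to
    avoid overflow; the scaling cancels out of g.
    """
    t_max = max(t for _, _, t in data)
    num = den = 0.0
    log_fail_sum, failures = 0.0, 0
    for n, state, t in data:
        u = t / t_max
        ub = u ** beta
        num += n * ub * math.log(u)
        den += n * ub
        if state == "F":
            log_fail_sum += n * math.log(u)
            failures += n
    return num / den - 1.0 / beta - log_fail_sum / failures


def fit_weibull_2p(data, lo=0.01, hi=100.0, tol=1e-10):
    """Two-parameter Weibull MLE: bisection on beta, then closed-form eta."""
    # g is increasing in beta, so bisection on a bracketing interval converges
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_equation(mid, data) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    failures = sum(n for n, state, _ in data if state == "F")
    eta = (sum(n * t ** beta for n, _, t in data) / failures) ** (1.0 / beta)
    return beta, eta


beta_hat, eta_hat = fit_weibull_2p(DATA)
print(f"beta = {beta_hat:.3f}, eta = {eta_hat:.0f} h")
```

Here β is estimated from only three failures rather than assumed, and that extra estimation uncertainty is what widens the two-parameter bounds.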

Other Approaches:

Accelerated Testing

If you are worried that too few failures will occur within your test duration, you could consider accelerating the test by elevating the stresses to produce more failures in the same (or a shorter) amount of time. Accelerated life testing introduces another set of considerations and challenges, and it often requires an even larger sample size than standard testing. However, when properly executed, it can be a good way to avoid ending up with a data set dominated by suspensions.
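To make "elevating the stresses" concrete, here is a sketch of one common life-stress relationship, the Arrhenius model, which is often used when temperature is the accelerating stress. The article does not prescribe a model; the activation energy and temperatures below are hypothetical, and the acceleration factor is only meaningful if the failure mode at the elevated stress is the same as at use conditions.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev, use_temp_c, stress_temp_c):
    """Acceleration factor between use and stress temperatures (Arrhenius model)."""
    t_use = use_temp_c + 273.15  # convert Celsius to kelvin
    t_stress = stress_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))


# Hypothetical values: 0.7 eV activation energy, 40 C use, 85 C test
af = arrhenius_af(0.7, 40.0, 85.0)
test_hours = 1000
print(f"AF = {af:.1f}; {test_hours} test hours ~ {af * test_hours:.0f} use hours")
```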

Degradation Analysis

Degradation analysis is another alternative for analyzing data sets that contain few failures and many suspensions. Many failure mechanisms can be directly attributed to the degradation of a part or of a characteristic of the product (e.g., brake pad wear, leakage, noise level, temperature, crack propagation). Degradation analysis allows extrapolation to a failure time based on measurements of degradation or performance over time. From the projected failure times, a distribution can be derived, and subsequent reliability calculations become feasible.
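The extrapolation step can be sketched by fitting a degradation path to each unit's measurements and solving for the time at which the path crosses a failure threshold. The linear path, the brake-pad wear readings and the 6 mm limit below are all hypothetical; real degradation models (linear, exponential, power, etc.) should come from knowledge of the failure mechanism. The resulting pseudo failure times could then be analyzed like ordinary failure times.

```python
def pseudo_failure_time(times, measurements, threshold):
    """Fit y = a + b*t by least squares and return the time y reaches threshold.

    Assumes a linear degradation path that increases toward the threshold.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(measurements) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, measurements))
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("no upward trend: cannot extrapolate to the threshold")
    intercept = mean_y - slope * mean_t
    return (threshold - intercept) / slope


# Hypothetical brake-pad wear (mm) measured on three units at 0, 250 and 500 hours
inspection_hours = [0, 250, 500]
wear_mm = {
    "unit A": [0.0, 1.1, 2.0],
    "unit B": [0.0, 0.8, 1.7],
    "unit C": [0.1, 1.3, 2.4],
}
WEAR_LIMIT_MM = 6.0  # hypothetical failure threshold

for unit, wear in wear_mm.items():
    hours = pseudo_failure_time(inspection_hours, wear, WEAR_LIMIT_MM)
    print(f"{unit}: projected failure at {hours:.0f} h")
```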

This approach also introduces other challenges. You need to be able to measure the degradation over time, which could mean investing in additional types of equipment. It also requires knowledge (from physics of failure) about how the degradation worsens with time.

Note:

Typically, accelerated testing and degradation analysis are not "after the fact" approaches. In other words, you need to plan to apply them before starting the test. In cases where you are obtaining data not from tests but from field data or warranty returns, you may not be able to apply these approaches, as you do not have complete control over the customer usage. That is, you cannot ask customers to accelerate their usage or monitor the product degradation over time (unless your products have sensors that can collect data during the life of the product in the field).

Conclusion

Because abundant data rarely exist, this article presented an overview of different techniques for analyzing small data sets. A few words of caution: always use common sense. Don't accept results blindly, especially with small sample sizes; your engineering knowledge plays a critical role in such situations.