Reliability HotWire
Reliability Basics

How Good Is Your Assumed Distribution's Fit?

After fitting a distribution model to a data set when performing life data analysis, we are often interested in diagnosing the model's fit or comparing the fit of different distributions. In addition to the engineering knowledge that should always govern the choice of a distribution model, there are many statistical tools that can help in deciding whether a distribution model is a good choice from a statistical point of view. This article presents a survey of the statistical tools available in Weibull++ that can be used to assess the fit of a distribution model and to compare it to other distributions.

For the remainder of this article, we will use the following data set to explain the different ways to assess the fit of one or more distributions. For comparisons, we will use the Weibull distribution and the exponential distribution as an example. (Note, however, that the concept can be used to compare more than two distributions.)

Table 1: Sample data set
Probability Plots
Figure 1: Comparing the probability plots of two distributions using the same data set

The plots show that the Weibull distribution fits the data well and is a better fit than the exponential distribution.

Correlation Coefficient
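The correlation coefficient, ρ, is defined as:

$$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

where $\sigma_{xy}$ is the covariance of x (times-to-failure) and y (median ranks), $\sigma_x$ is the standard deviation of x, and $\sigma_y$ is the standard deviation of y. The estimator of ρ is the sample correlation coefficient, $\hat{\rho}$:

$$\hat{\rho} = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^2 \, \sum_{i=1}^{N}(y_i-\bar{y})^2}}$$

The range of $\hat{\rho}$ is $-1 \le \hat{\rho} \le 1$. The closer the value of $\hat{\rho}$ is to ±1, the better the linear fit.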
Using the data set presented in Table 1 and using the Rank Regression on X method to estimate the parameters, we make the following comparison:

Table 2: Comparing the correlation coefficients of two distributions using the same data set
The above table shows that the Weibull distribution is a very adequate model.
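The correlation comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of the Weibull++ calculation: median ranks are approximated with Benard's formula, the axes are linearized in the usual probability-plot way (ln t versus ln(-ln(1 - F)) for the Weibull, t versus -ln(1 - F) for the exponential), and the failure times are hypothetical values, since the Table 1 data are not reproduced here. All function names are illustrative.

```python
import math

def median_ranks(n):
    """Benard's approximation to the median rank of each ordered failure (an assumption)."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def weibull_plot_rho(times):
    """Correlation on Weibull probability-plot axes: ln(t) vs ln(-ln(1 - F))."""
    t = sorted(times)
    ranks = median_ranks(len(t))
    xs = [math.log(ti) for ti in t]
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]
    return correlation(xs, ys)

def exponential_plot_rho(times):
    """Correlation on exponential probability-plot axes: t vs -ln(1 - F)."""
    t = sorted(times)
    ranks = median_ranks(len(t))
    ys = [-math.log(1.0 - f) for f in ranks]
    return correlation(t, ys)

# Hypothetical failure times (NOT the article's Table 1 data).
sample_times = [16.0, 34.0, 53.0, 75.0, 93.0, 120.0]
print("Weibull rho:    ", weibull_plot_rho(sample_times))
print("Exponential rho:", exponential_plot_rho(sample_times))
```

The distribution whose linearized plot gives the value closer to ±1 is the better linear fit in this sense. Note that this sketch fits an intercept for the exponential case, which a one-parameter exponential regression would not.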
Likelihood Value
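For a complete data set, the likelihood function is:

$$L = \prod_{i=1}^{N} f(t_i; \theta_1, \theta_2, \ldots, \theta_k)$$

where f is the probability density function of the fitted distribution, $t_1, t_2, \ldots, t_N$ are the N times-to-failure, and $\theta_1, \theta_2, \ldots, \theta_k$ are the parameters of the distribution.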
Unlike the correlation coefficient, the likelihood value is not confined to a fixed range of possible values, so L cannot be used by itself to judge the fit of a distribution model. L can, however, be used to compare the fit of multiple distributions: the distribution with the largest L value is the best fit statistically.

Note that the likelihood values shown in Weibull++ are actually log-likelihood values, not likelihood values. The log-likelihood function is used instead because it is much easier to work with than L for parameter estimation. Because the logarithm is a monotonically increasing function, the distribution with the largest L also has the largest log-likelihood, so using the log-likelihood function does not affect the validity of the results.

Using the data set presented in Table 1 and using the MLE method to estimate the parameters, we make the following comparison:

Table 3: Comparing the log-likelihood value of two distributions using the same data set
The above table shows that the log-likelihood value for the Weibull distribution is greater than that for the exponential distribution (i.e. the Weibull distribution is statistically a better fit).
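The log-likelihood comparison can be illustrated with a short Python sketch. It assumes a complete (uncensored) data set, a two-parameter Weibull with shape `beta` and scale `eta`, and a one-parameter exponential with rate `lam`. The Weibull shape is found by simple bisection on the standard MLE equation, which is one possible approach and not necessarily the solver Weibull++ uses. The failure times are hypothetical, not the Table 1 data.

```python
import math

def exponential_mle(times):
    """Closed-form MLE of the exponential failure rate for complete data."""
    return len(times) / sum(times)

def exponential_loglik(times, lam):
    """Log-likelihood of complete data under an exponential model."""
    return len(times) * math.log(lam) - lam * sum(times)

def weibull_mle(times, lo=0.01, hi=50.0, tol=1e-10):
    """MLE of (beta, eta) for complete data, via bisection on the shape equation."""
    n = len(times)
    t_max = max(times)
    mean_log = sum(math.log(t) for t in times) / n

    def g(beta):
        # Scale by t_max so the powers stay in (0, 1]; the ratio is unchanged.
        w = [(t / t_max) ** beta for t in times]
        weighted_log = sum(wi * math.log(t) for wi, t in zip(w, times)) / sum(w)
        return weighted_log - 1.0 / beta - mean_log

    # g is increasing in beta, negative at lo and positive at hi for spread-out data.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = t_max * (sum((t / t_max) ** beta for t in times) / n) ** (1.0 / beta)
    return beta, eta

def weibull_loglik(times, beta, eta):
    """Log-likelihood of complete data under a two-parameter Weibull model."""
    return sum(
        math.log(beta) - math.log(eta)
        + (beta - 1.0) * (math.log(t) - math.log(eta))
        - (t / eta) ** beta
        for t in times
    )

# Hypothetical failure times (NOT the article's Table 1 data).
sample_times = [16.0, 34.0, 53.0, 75.0, 93.0, 120.0]
lam_hat = exponential_mle(sample_times)
beta_hat, eta_hat = weibull_mle(sample_times)
print("Exponential log-likelihood:", exponential_loglik(sample_times, lam_hat))
print("Weibull log-likelihood:    ", weibull_loglik(sample_times, beta_hat, eta_hat))
```

Because the exponential distribution is the special case of the Weibull with a shape parameter of 1, the maximized Weibull log-likelihood can never be smaller than the exponential one for the same complete data set; the size of the gap is what carries the information.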
Modified Kolmogorov-Smirnov (KS) Test

If the data set is made of N failure times ($t_1, t_2, \ldots, t_N$), we can define $S_N(t)$ to be the function giving the fraction of data points to the left of a given value t. $S_N(t)$ is constant between consecutive $t_i$ values (i = 1, 2, ..., N) and jumps by the same constant value, 1/N, at each $t_i$. The Modified KS test uses $D_{max}$, the maximum of the absolute difference between $S_N(t)$ and the fitted cumulative distribution function, Q(t) [Ref. 1]:

$$D_{max} = \max_{t} \left| S_N(t) - Q(t) \right|$$
What makes the Modified KS test useful is that its distribution in the case of the null hypothesis (i.e. data set drawn from the fitted distribution) can be calculated, at least to a useful approximation, thus giving the significance of any observed non-zero value of $D_{max}$. The Modified KS test returns the probability that $D_{CRIT} < D_{max}$. A high probability value, close to 1, indicates that there is a significant difference between the theoretical distribution and the data set.

Using the data set presented in Table 1 and using the MLE method to estimate the parameters, we make the following comparison:

Table 4: Comparing two distributions using the Modified Kolmogorov-Smirnov test
The above table shows that the value of $P(D_{CRIT} < D_{max})$ for the Weibull distribution is smaller than that for the exponential distribution (i.e. the Weibull distribution is statistically a better fit).
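As a rough illustration of the quantities involved, the Python sketch below computes the maximum distance between the empirical step function and a fitted CDF, then converts it to a probability using the classical asymptotic Kolmogorov distribution. This is an assumption made for illustration only: it is not necessarily the modification Weibull++ applies, and it ignores the effect of estimating the parameters from the same data. The function names and the Weibull parameters in the usage lines are hypothetical.

```python
import math

def ks_dmax(times, cdf):
    """Largest absolute gap between the empirical step function and cdf(t).

    The step function jumps by 1/N at each failure time, so the gap is
    checked just before and just after every jump.
    """
    t = sorted(times)
    n = len(t)
    d = 0.0
    for i, ti in enumerate(t, start=1):
        q = cdf(ti)
        d = max(d, abs(i / n - q), abs((i - 1) / n - q))
    return d

def kolmogorov_prob(d, n, terms=100):
    """Asymptotic approximation of P(D_CRIT < d) for a sample of size n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here, and the probability is negligible.
        return 0.0
    tail = 0.0
    for j in range(1, terms + 1):
        tail += 2.0 * (-1.0) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    tail = min(max(tail, 0.0), 1.0)
    return 1.0 - tail

def weibull_cdf(t, beta, eta):
    """Two-parameter Weibull cumulative distribution function."""
    return 1.0 - math.exp(-((t / eta) ** beta))

# Hypothetical data and hypothetical fitted parameters, for illustration only.
sample_times = [16.0, 34.0, 53.0, 75.0, 93.0, 120.0]
d_max = ks_dmax(sample_times, lambda t: weibull_cdf(t, 1.9, 73.5))
print("Dmax:", d_max)
print("Approximate P(DCRIT < Dmax):", kolmogorov_prob(d_max, len(sample_times)))
```

A smaller returned probability means the observed gap is less surprising under the fitted model, which matches the reading of the test described above.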
Chi-Squared Test
Suppose that $N_i$ is the number of data points in the ith bin and $n_i$ is the number expected according to the fitted distribution. The chi-squared statistic is then [Ref. 1]:

$$\chi^2 = \sum_{i} \frac{(N_i - n_i)^2}{n_i}$$
where the sum is over all the bins. The null hypothesis is that the $N_i$'s are drawn from the population represented by the $n_i$'s (i.e. the fitted model actually fits the data). Large values of $\chi^2$ indicate that this null hypothesis is rather unlikely. The $\chi^2$ value follows a distribution that can be approximated by the chi-squared probability function, especially when the number of bins is much greater than 1 or the number of data points in each bin is much greater than 1.

The chi-squared test returns the probability that $\chi^2_{CRIT} < \chi^2$. A high probability value, close to 1, indicates that there is a significant difference between the theoretical distribution and the data set.

Using the data set presented in Table 1 and using the MLE method to estimate the parameters, we make the following comparison:

Table 5: Comparing two distributions using the chi-squared test
The above table shows that the value of $P(\chi^2_{CRIT} < \chi^2)$ for the Weibull distribution is smaller than that for the exponential distribution (i.e. the Weibull distribution is statistically a better fit).
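The chi-squared statistic itself is simple to compute once the data are binned. The sketch below is a minimal illustration that assumes the bin edges are chosen by the analyst; how Weibull++ selects its bins is not stated in this article. The function name, the bin edges, and the exponential rate in the usage lines are all hypothetical, and converting the statistic to a probability (which requires the chi-squared distribution function) is left out.

```python
import math

def chi_squared_statistic(times, cdf, edges):
    """Sum over bins of (observed - expected)^2 / expected.

    `edges` lists the bin boundaries in increasing order. Each bin is
    [lo, hi), so data outside the first and last edge are not counted.
    A bin with zero expected count would raise ZeroDivisionError.
    """
    n_total = len(times)
    chi2 = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for t in times if lo <= t < hi)
        expected = n_total * (cdf(hi) - cdf(lo))
        chi2 += (observed - expected) ** 2 / expected
    return chi2

def exponential_cdf(t, lam):
    """One-parameter exponential cumulative distribution function."""
    return 1.0 - math.exp(-lam * t)

# Hypothetical data, rate, and bin edges, for illustration only.
sample_times = [16.0, 34.0, 53.0, 75.0, 93.0, 120.0]
bin_edges = [0.0, 40.0, 80.0, 200.0]
print(chi_squared_statistic(sample_times, lambda t: exponential_cdf(t, 6.0 / 391.0), bin_edges))
```

Running the same function with each candidate distribution's fitted CDF and the same bin edges gives statistics that can be compared directly; the smaller value indicates the closer agreement between observed and expected counts.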
References

Copyright 2007 ReliaSoft Corporation, ALL RIGHTS RESERVED








