Reliability HotWire: eMagazine for the Reliability Professional

Issue 9, November 2001

Reliability Basics

Maximum Likelihood Estimation

In last month's Reliability Basics, we looked at the probability plotting method of parameter estimation. Similar to this method is that of rank regression or least squares, which essentially "automates" the probability plotting method mathematically. In this article, we take a look at the maximum likelihood estimation (MLE) method. This is considered to be one of the most robust parameter estimation techniques.

The Likelihood Function
Maximum likelihood estimation endeavors to find the most "likely" values of distribution parameters for a set of data by maximizing the value of what is called the "likelihood function." This likelihood function is largely based on the probability density function (pdf) for a given distribution. As an example, consider a generic pdf:
f(x; θ1, θ2,..., θk)
where x represents the data (times to failure), and θ1, θ2,..., θk are the parameters to be estimated. For a two-parameter Weibull distribution, for example, these would be beta (β ) and eta (η). For complete data, the likelihood function is a product of the pdf functions, with one element for each data point in the data set:
L(θ1, θ2,..., θk) = ∏(i=1 to R) f(xi; θ1, θ2,..., θk)
where R is the number of failure data points in the complete data set, and xi is the ith failure time. It is often mathematically easier to manipulate this function by first taking the logarithm of it. This log-likelihood function then has the form:
Λ = ln(L) = ∑(i=1 to R) ln[f(xi; θ1, θ2,..., θk)]
It then remains to find the values for the parameters that result in the highest value for this function. This is most commonly done by taking the partial derivative of the log-likelihood function with respect to each parameter and setting it equal to zero:
∂Λ/∂θj = 0,  j = 1, 2,..., k
This results in a number of equations with an equal number of unknowns, which can be solved simultaneously. This can be a relatively simple matter if there are closed-form solutions for the partial derivatives. In situations where this is not the case, numerical techniques need to be employed.
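When no closed-form solution exists, the system can be solved numerically. The following is a minimal sketch for the two-parameter Weibull distribution, using hypothetical failure times: setting the partial derivatives of the log-likelihood to zero allows η to be eliminated, leaving a single equation in β that can be solved by bisection.

```python
import math

def weibull_mle(times, lo=0.01, hi=20.0, tol=1e-9):
    """Numerically solve the Weibull MLE equations for complete data.

    Setting the partial derivatives of the log-likelihood to zero gives
    one equation in beta alone (eta can be eliminated); it is solved here
    by bisection, after which eta follows in closed form.
    """
    n = len(times)
    mean_log = sum(math.log(t) for t in times) / n

    def g(beta):
        # d(log-likelihood)/d(beta) with eta profiled out;
        # g is a decreasing function of beta
        s = sum(t ** beta for t in times)
        s_log = sum((t ** beta) * math.log(t) for t in times)
        return 1.0 / beta + mean_log - s_log / s

    # bisection for the root of g on (lo, hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2.0
    eta = (sum(t ** beta for t in times) / n) ** (1.0 / beta)
    return beta, eta

# hypothetical complete failure-time data
data = [16, 34, 53, 75, 93, 120]
beta_hat, eta_hat = weibull_mle(data)
print(beta_hat, eta_hat)
```

The data set and the bisection bounds are illustrative only; commercial packages use more sophisticated (and more robust) numerical schemes.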

Exponential Example
This process is easily illustrated with the one-parameter exponential distribution. Since there is only one parameter, there is only one differential equation to be solved. Moreover, the equation has a closed-form solution, owing to the nature of the exponential pdf. The likelihood function for the exponential distribution is given by:
L(λ) = ∏(i=1 to R) λ·e^(−λ·xi) = λ^R · e^(−λ·∑(i=1 to R) xi)
where lambda (λ) is the parameter we are trying to estimate. Since the log-likelihood function is easier to manipulate mathematically, we derive this by taking the natural logarithm of the likelihood function. For the exponential distribution, the log-likelihood function has the form:
Λ = ln(L) = R·ln(λ) − λ·∑(i=1 to R) xi
Taking the derivative of the equation with respect to λ and setting it equal to zero results in:
∂Λ/∂λ = R/λ − ∑(i=1 to R) xi = 0
From this point, it is a simple matter to rearrange this equation to solve for λ:
λ̂ = R / ∑(i=1 to R) xi
This gives the closed-form solution for the MLE estimate of the one-parameter exponential distribution. Obviously, this is one of the simplest examples available, but it illustrates the process well. The methodology is more complex for distributions with multiple parameters or for those that do not have closed-form solutions.
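The closed-form result can be checked numerically. In this short sketch, using hypothetical failure times, the estimate λ̂ = R/∑xi should yield a log-likelihood value at least as large as that of any nearby value of λ:

```python
import math

def exp_loglik(lam, times):
    """Log-likelihood of the one-parameter exponential: R*ln(lam) - lam*sum(t)."""
    return len(times) * math.log(lam) - lam * sum(times)

# hypothetical complete failure-time data
data = [20, 40, 60, 100, 150]
lam_hat = len(data) / sum(data)   # closed-form MLE: R / sum of failure times
print(lam_hat)

# the closed-form estimate maximizes the log-likelihood
assert exp_loglik(lam_hat, data) >= exp_loglik(lam_hat * 1.1, data)
assert exp_loglik(lam_hat, data) >= exp_loglik(lam_hat * 0.9, data)
```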

Dealing With Suspensions
The previous section illustrated the MLE methodology for complete data sets. Often, however, data sets will contain suspended data. This makes the process a little more difficult, but not much. Essentially, dealing with suspended or right-censored data involves including another term in the likelihood function. As stated earlier, the term for the complete data uses the probability density function (pdf). The second term, for suspensions, incorporates the cumulative distribution function (cdf). This extended likelihood function has the form:
L = ∏(i=1 to R) f(xi; θ1, θ2,..., θk) · ∏(j=1 to m) [1 − F(yj; θ1, θ2,..., θk)]
where m is the number of suspended data points, yj is the jth suspension, and F(yj;θ1, θ2,..., θk) is the cdf. With this function, the analysis process proceeds as described previously: take the natural logarithm of the likelihood function, take the partial derivatives with respect to the parameters, and solve simultaneously.
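For the exponential distribution, this extended likelihood still has a closed-form solution: the number of failures divided by the total accumulated time on all units, failures and suspensions alike. The sketch below, with hypothetical data, builds the two-term log-likelihood directly and checks that the closed-form estimate maximizes it. (For the exponential, 1 − F(y) = e^(−λy), so each suspension contributes −λ·yj to the log-likelihood.)

```python
import math

def exp_loglik_censored(lam, failures, suspensions):
    """Log-likelihood with suspensions: failures contribute ln(pdf),
    suspensions contribute ln(1 - cdf) = -lam*y for the exponential."""
    ll = sum(math.log(lam) - lam * x for x in failures)   # ln f(x_i)
    ll += sum(-lam * y for y in suspensions)              # ln[1 - F(y_j)]
    return ll

# hypothetical data: 4 failures and 3 right-censored (suspended) units
failures = [30, 55, 80, 120]
suspensions = [100, 100, 150]

# closed-form exponential MLE: failures / total accumulated time
lam_hat = len(failures) / (sum(failures) + sum(suspensions))
print(lam_hat)

assert exp_loglik_censored(lam_hat, failures, suspensions) >= \
       exp_loglik_censored(lam_hat * 1.05, failures, suspensions)
```

Note how the actual suspension times enter the estimate directly, unlike in rank regression, where only their relative positions matter.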

The likelihood function for the suspended data helps illustrate some of the advantages that MLE analysis has over other parameter estimation techniques. First and foremost, MLE methodology takes into account the values of the suspension times, as is illustrated in the previous equation. Probability plotting and rank regression only take into account the relative location of the suspensions, not the actual time-to-suspension values. This makes MLE a much more powerful tool when dealing with data sets that contain a relatively large number of suspensions. A second advantage of the MLE method is that it is theoretically possible to derive parameter estimates for data sets containing nothing but suspensions. (Note, however, that the mathematics of the partial derivatives make it impossible to solve for more than one parameter with data sets consisting of nothing but suspensions. Either a one-parameter distribution must be used, or values for other parameters in the distribution must be assumed. It is generally not recommended to draw important conclusions from analyses of data sets containing only suspensions.)

Note that this analysis only uses failure or suspension time data for the analysis; at no point are reliability/unreliability values or estimates incorporated. This sometimes results in models that do not track plotted data points on probability plots. As was discussed in last month's Reliability Basics, data points are placed on the plot with the failure time as the x-coordinate and an unreliability estimate for the y-coordinate. Maximum likelihood estimation does not use these unreliability estimates. Consequently, the plot line based on the MLE parameter estimates does not always track the plotted points. This does not mean that one method or the other is "wrong," just that they were plotted using different techniques.

Likelihood Function Surface
ReliaSoft's Weibull++ software contains a feature that allows the generation of a three-dimensional representation of the log-likelihood function. This representation is best suited to two-parameter distributions, with the values of the parameters on the x- and y-axes and the log-likelihood value on the z-axis. (In Weibull++, the log-likelihood value is normalized to a value of 100%.) The following graphic gives an example of a likelihood function surface plot for a two-parameter Weibull distribution.

Likelihood Function Surface plot for a 2-parameter Weibull distribution, created in Weibull++.

Thus, the "peak" of the likelihood surface function corresponds to the values of the parameters that maximize the likelihood function, i.e. the MLE estimates for the distribution's parameters.
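The same idea can be reproduced outside the software by evaluating the log-likelihood over a grid of (β, η) values and locating the peak. The following is an illustrative sketch with hypothetical Weibull data and a deliberately coarse grid; the grid point with the largest log-likelihood value approximates the MLE estimates.

```python
import math

def weibull_loglik(beta, eta, times):
    """Two-parameter Weibull log-likelihood for complete data."""
    return sum(math.log(beta / eta) + (beta - 1) * math.log(t / eta)
               - (t / eta) ** beta for t in times)

data = [16, 34, 53, 75, 93, 120]   # hypothetical failure times

# evaluate the surface on a coarse grid and locate its peak
betas = [0.5 + 0.05 * i for i in range(80)]   # beta from 0.5 to 4.45
etas = [40.0 + 1.0 * j for j in range(80)]    # eta from 40 to 119
ll_peak, beta_peak, eta_peak = max(
    (weibull_loglik(b, e, data), b, e) for b in betas for e in etas)
print(beta_peak, eta_peak)
```

Grid search is shown here only because it mirrors the surface plot; it is far too slow and coarse for real analyses, where derivative-based methods are used instead.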

Comments on Maximum Likelihood Estimation
The MLE method has a number of large-sample properties that make it attractive for use. It is asymptotically consistent, which means that as the sample size gets larger, the estimates converge to the true values. It is asymptotically efficient, which means that for large samples it produces the most precise estimates. It is also asymptotically unbiased, which means that for large samples one expects to obtain the true value on average. Finally, the estimates themselves are approximately normally distributed if the sample is large enough. These are all excellent large-sample properties.

Unfortunately, the sample size necessary to achieve these properties can be quite large: thirty to fifty, or even more than a hundred, exact failure times, depending on the application. With fewer data points, the method can be biased. It is known, for example, that MLE estimates of the shape parameter of the Weibull distribution are biased for small sample sizes, and the effect can be magnified by the amount of censoring. This bias can cause discrepancies in analysis.
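Small-sample bias is easy to demonstrate by simulation. The sketch below uses the exponential distribution rather than the Weibull, because there the bias is known analytically: the expected value of λ̂ = n/∑xi is n/(n−1) times the true λ, a 25% overestimate on average for n = 5. The λ value, sample size, and replication count are illustrative choices.

```python
import random

random.seed(12345)

TRUE_LAMBDA = 0.01   # hypothetical true failure rate
N = 5                # small sample size, where bias is pronounced
REPS = 20000         # number of simulated data sets

# average the closed-form MLE over many simulated small samples
total = 0.0
for _ in range(REPS):
    # random.expovariate takes the rate parameter (lambda) directly
    sample = [random.expovariate(TRUE_LAMBDA) for _ in range(N)]
    total += N / sum(sample)          # closed-form exponential MLE
avg_estimate = total / REPS

# theory predicts avg_estimate / TRUE_LAMBDA is about n/(n-1) = 1.25
print(avg_estimate / TRUE_LAMBDA)
```

As the sample size N is increased, the ratio approaches 1, illustrating the asymptotic unbiasedness described above.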

There are also pathological situations when the asymptotic properties of the MLE do not apply. One of these is estimating the location parameter for the three-parameter Weibull distribution when the shape parameter has a value close to 1. These problems, too, can cause major discrepancies.

As a rule of thumb, our recommendation is to use rank regression techniques when sample sizes are small and censoring is not heavy. When heavy or uneven censoring is present, when a high proportion of the data points are interval data, and/or when the sample size is sufficiently large, MLE should be preferred.


ReliaSoft Corporation

Copyright © 2001 ReliaSoft Corporation, ALL RIGHTS RESERVED