Assume that a set of data pairs (x1, y1), (x2, y2), ..., (xN, yN) was obtained and plotted. Then, according to the least squares principle, which minimizes the sum of the squared vertical distances between the data points and the straight line fitted to the data, the best fitting straight line to these data is the straight line $y = \hat{a} + \hat{b}x$ such that:
$$\sum_{i=1}^{N} (\hat{a} + \hat{b}x_i - y_i)^2 = \min_{a,b} \sum_{i=1}^{N} (a + b x_i - y_i)^2$$
and where $\hat{a}$ and $\hat{b}$ are the least squares estimates of a and b, and N is the number of data points.
To obtain $\hat{a}$ and $\hat{b}$, let:
$$F = \sum_{i=1}^{N} (a + b x_i - y_i)^2$$
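Differentiating F with respect to a and b yields:
$$\frac{\partial F}{\partial a} = 2\sum_{i=1}^{N} (a + b x_i - y_i) \tag{1}$$
and:
$$\frac{\partial F}{\partial b} = 2\sum_{i=1}^{N} (a + b x_i - y_i)\,x_i \tag{2}$$
Setting Eqns. (1) and (2) equal to zero yields:
$$\sum_{i=1}^{N} (a + b x_i - y_i) = 0$$
and:
$$\sum_{i=1}^{N} (a + b x_i - y_i)\,x_i = 0$$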
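Solving the equations simultaneously yields:
$$\hat{a} = \frac{\sum_{i=1}^{N} y_i}{N} - \hat{b}\,\frac{\sum_{i=1}^{N} x_i}{N} = \bar{y} - \hat{b}\bar{x} \tag{3}$$
and:
$$\hat{b} = \frac{\sum_{i=1}^{N} x_i y_i - \dfrac{\sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{N}}{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}} \tag{4}$$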
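The closed-form solution referred to as Eqns. (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard least squares expressions (slope equal to the corrected sum of cross products divided by the corrected sum of squares of x, intercept equal to the mean of y minus the slope times the mean of x). The function name `regress_on_y` is a hypothetical choice made for this example.

```python
def regress_on_y(xs, ys):
    """Fit y = a + b*x by minimizing the squared vertical distances.

    Returns (a_hat, b_hat). The name and signature are illustrative only.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    denom = sum_xx - sum_x ** 2 / n
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b_hat = (sum_xy - sum_x * sum_y / n) / denom  # slope
    a_hat = sum_y / n - b_hat * sum_x / n         # intercept
    return a_hat, b_hat


if __name__ == "__main__":
    # Points lying exactly on y = 1 + 2x should return that line.
    print(regress_on_y([0, 1, 2, 3], [1, 3, 5, 7]))
```

The guard on the denominator covers the degenerate case in which every x value is the same, where no unique slope exists.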
Assume that a set of data pairs (x1, y1), (x2, y2), ..., (xN, yN) was obtained and plotted. Then, according to the least squares principle, which here minimizes the sum of the squared horizontal distances between the data points and the straight line fitted to the data, the best fitting straight line to these data is the straight line $x = \hat{a} + \hat{b}y$ such that:
$$\sum_{i=1}^{N} (\hat{a} + \hat{b}y_i - x_i)^2 = \min_{a,b} \sum_{i=1}^{N} (a + b y_i - x_i)^2$$
Again, $\hat{a}$ and $\hat{b}$ are the least squares estimates of a and b, and N is the number of data points.
To obtain $\hat{a}$ and $\hat{b}$, let:
$$F = \sum_{i=1}^{N} (a + b y_i - x_i)^2$$
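Differentiating F with respect to a and b yields:
$$\frac{\partial F}{\partial a} = 2\sum_{i=1}^{N} (a + b y_i - x_i) \tag{5}$$
and:
$$\frac{\partial F}{\partial b} = 2\sum_{i=1}^{N} (a + b y_i - x_i)\,y_i \tag{6}$$
Setting Eqns. (5) and (6) equal to zero yields:
$$\sum_{i=1}^{N} (a + b y_i - x_i) = 0$$
and:
$$\sum_{i=1}^{N} (a + b y_i - x_i)\,y_i = 0$$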
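Solving the above equations simultaneously yields:
$$\hat{a} = \frac{\sum_{i=1}^{N} x_i}{N} - \hat{b}\,\frac{\sum_{i=1}^{N} y_i}{N} = \bar{x} - \hat{b}\bar{y} \tag{7}$$
and:
$$\hat{b} = \frac{\sum_{i=1}^{N} x_i y_i - \dfrac{\sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{N}}{\sum_{i=1}^{N} y_i^2 - \dfrac{\left(\sum_{i=1}^{N} y_i\right)^2}{N}} \tag{8}$$
Solving the equation of the line for y yields:
$$y = -\frac{\hat{a}}{\hat{b}} + \frac{1}{\hat{b}}\,x$$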
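Regression on X and the final step of solving the fitted line for y can be sketched as follows. This is a minimal illustration, assuming the standard least squares expressions for a line of the form x = a + b·y. The names `regress_on_x` and `as_y_of_x` are hypothetical and chosen only for this example.

```python
def regress_on_x(xs, ys):
    """Fit x = a + b*y by minimizing the squared horizontal distances.

    Returns (a_hat, b_hat) for the line x = a_hat + b_hat * y.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_yy = sum(y * y for y in ys)

    denom = sum_yy - sum_y ** 2 / n
    if denom == 0:
        raise ValueError("all y values are identical; the slope is undefined")

    b_hat = (sum_xy - sum_x * sum_y / n) / denom
    a_hat = sum_x / n - b_hat * sum_y / n
    return a_hat, b_hat


def as_y_of_x(a_hat, b_hat):
    """Rewrite x = a_hat + b_hat*y as y = intercept + slope*x."""
    if b_hat == 0:
        raise ValueError("a zero slope cannot be inverted")
    return -a_hat / b_hat, 1.0 / b_hat


if __name__ == "__main__":
    # Points lying exactly on x = 2 + 0.5*y, i.e. y = -4 + 2*x.
    a_hat, b_hat = regress_on_x([2, 3, 4], [0, 2, 4])
    print(a_hat, b_hat, as_y_of_x(a_hat, b_hat))
```

Keeping the inversion in a separate helper makes it easy to plot both regressions on the same y-versus-x axes.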
Fit a least squares straight line using regression on X and regression on Y to the following data:
x | 1 | 2.5 | 4 | 6 | 8 | 9 | 11 | 15
y | 1.5 | 2 | 4 | 4 | 5 | 7 | 8 | 10
The first step is to generate the following table:
Table A.1 - Data analysis for the least squares method
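The intermediate quantities needed by the least squares formulas can be generated with a short script. This is a sketch only: the column layout (i, x, y, x², x·y, y²) is an assumption made for this example, while the x and y values are the data given above.

```python
# Example data from the text.
xs = [1, 2.5, 4, 6, 8, 9, 11, 15]
ys = [1.5, 2, 4, 4, 5, 7, 8, 10]

# One row per data pair: i, x, y, x^2, x*y, y^2 (assumed column layout).
rows = [(i, x, y, x * x, x * y, y * y)
        for i, (x, y) in enumerate(zip(xs, ys), start=1)]

# Column totals, skipping the index column.
totals = [sum(col) for col in list(zip(*rows))[1:]]

header = ("i", "x", "y", "x^2", "x*y", "y^2")
print("".join(f"{h:>9}" for h in header))
for row in rows:
    print("".join(f"{v:>9g}" for v in row))
print(f"{'sum':>9}" + "".join(f"{t:>9g}" for t in totals))
```

The final row holds the sums of x, y, x², x·y and y², which are the only inputs the slope and intercept formulas require.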
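Using the results in Table A.1, Eqns. (3) and (4) yield:
$$\hat{a} = \frac{41.5}{8} - \hat{b}\,\frac{56.5}{8} = 0.7784$$
and:
$$\hat{b} = \frac{387.5 - (56.5)(41.5)/8}{550.25 - (56.5)^2/8} = 0.6243$$
The least squares line is given by:
$$y = 0.7784 + 0.6243x$$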
The plotted line is shown in the next figure.
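For rank regression on X using the analyzed data in Table A.1, Eqns. (8) and (7) yield:
$$\hat{b} = \frac{387.5 - (56.5)(41.5)/8}{276.25 - (41.5)^2/8} = 1.5484$$
and:
$$\hat{a} = \frac{56.5}{8} - \hat{b}\,\frac{41.5}{8} = -0.9700$$
The least squares line is given by:
$$x = -0.9700 + 1.5484y \quad\text{or, solved for } y, \quad y = 0.6264 + 0.6458x$$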
The plotted line is shown in the next figure.
Note that the regression on Y is not necessarily the same as the regression on X. The only time when the two regressions are the same (i.e. will yield the same equation for a line) is when the data lie perfectly on a line.
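This note can be checked numerically. The sketch below fits both regressions to the example data and then to a small, made-up set of perfectly collinear points. The helper `fit_line` is a hypothetical name; it applies the standard least squares formulas with the roles of the two variables passed in explicitly.

```python
def fit_line(us, vs):
    """Least squares fit of v = a + b*u (residuals measured along v)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    b = (suv - su * sv / n) / (suu - su ** 2 / n)
    a = sv / n - b * su / n
    return a, b


def both_as_y_of_x(xs, ys):
    """Return ((intercept, slope) from regression on Y, same from regression on X)."""
    a_y, b_y = fit_line(xs, ys)   # y = a_y + b_y * x
    a_x, b_x = fit_line(ys, xs)   # x = a_x + b_x * y
    return (a_y, b_y), (-a_x / b_x, 1 / b_x)


# Example data from the text.
xs = [1, 2.5, 4, 6, 8, 9, 11, 15]
ys = [1.5, 2, 4, 4, 5, 7, 8, 10]
on_y, on_x = both_as_y_of_x(xs, ys)
print("regression on Y:", on_y)
print("regression on X:", on_x)

# Made-up points lying exactly on y = 1 + 2x: both fits coincide.
exact_on_y, exact_on_x = both_as_y_of_x([0, 1, 2, 3], [1, 3, 5, 7])
print("collinear data: ", exact_on_y, exact_on_x)
```

For scattered data the regression on X gives the steeper line when both are drawn as y against x, because the product of the two slopes (b from regression on Y times b from regression on X) equals the squared correlation coefficient, which is at most 1.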
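The correlation coefficient is given by:
$$\hat{\rho} = \frac{\sum_{i=1}^{N} x_i y_i - \dfrac{\sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{N}}{\sqrt{\left(\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right)\left(\sum_{i=1}^{N} y_i^2 - \dfrac{\left(\sum_{i=1}^{N} y_i\right)^2}{N}\right)}}$$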
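A minimal sketch of the sample correlation coefficient, assuming the usual definition (corrected sum of cross products divided by the square root of the product of the two corrected sums of squares). The function name `correlation_coefficient` is hypothetical.

```python
import math


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")

    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n

    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    # Example data from the text.
    xs = [1, 2.5, 4, 6, 8, 9, 11, 15]
    ys = [1.5, 2, 4, 4, 5, 7, 8, 10]
    print(round(correlation_coefficient(xs, ys), 4))
```

A value close to 1 or -1 indicates that the points lie close to a straight line; data lying exactly on a line with nonzero slope give exactly 1 or -1.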
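If x is a continuous random variable with pdf:
$$f(x; \theta_1, \theta_2, \ldots, \theta_k)$$
where θ1, θ2, ..., θk are k unknown constant parameters that need to be estimated, conduct an experiment and obtain N independent observations, x1, x2, ..., xN, which in life data analysis correspond to failure times. The likelihood function (for complete data) is given by:
$$L(x_1, x_2, \ldots, x_N \mid \theta_1, \theta_2, \ldots, \theta_k) = L = \prod_{i=1}^{N} f(x_i; \theta_1, \theta_2, \ldots, \theta_k)$$
The logarithmic likelihood function is:
$$\Lambda = \ln L = \sum_{i=1}^{N} \ln f(x_i; \theta_1, \theta_2, \ldots, \theta_k)$$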
The maximum likelihood estimators (MLE) of θ1, θ2, ... θk, are obtained by maximizing L or Λ.
By maximizing Λ, which is much easier to work with than L, the maximum likelihood estimators (MLE) of θ1, θ2, ..., θk are the simultaneous solutions of k equations such that:
$$\frac{\partial \Lambda}{\partial \theta_j} = 0, \qquad j = 1, 2, \ldots, k$$
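The idea of maximizing Λ can be illustrated numerically. The sketch below is not a production optimizer: it evaluates the log-likelihood of a one-parameter exponential pdf on a coarse grid of candidate values and keeps the best one. The data set, the grid and the names `log_likelihood` and `exponential_pdf` are all assumptions made for this example.

```python
import math


def log_likelihood(pdf, data, *params):
    """Sum of ln f(x_i; params) over all observations (complete data)."""
    return sum(math.log(pdf(x, *params)) for x in data)


def exponential_pdf(t, lam):
    """One-parameter exponential pdf, used here only as an example model."""
    return lam * math.exp(-lam * t)


# Hypothetical failure times, in hours.
data = [10, 20, 30, 40, 50]

# Coarse grid of candidate rates: 0.001, 0.002, ..., 0.200.
candidates = [k / 1000 for k in range(1, 201)]
best = max(candidates,
           key=lambda lam: log_likelihood(exponential_pdf, data, lam))

print("grid maximum:", best)
print("closed form: ", len(data) / sum(data))
```

A grid search is used only because it is easy to follow. In practice the k partial-derivative equations are solved in closed form where possible, or with a numerical root finder or optimizer otherwise.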
Even though it is common practice to plot the MLE solutions using median ranks (points are plotted according to median ranks and the line according to the MLE solutions), this is not completely accurate. As can be seen from the equations above, the MLE method is independent of any kind of ranks. For this reason, the MLE solution often appears not to track the data on the probability plot. This is perfectly acceptable since the two methods are independent of each other, and in no way suggests that the solution is wrong.
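To estimate $\hat{\lambda}$ of the exponential distribution for a sample of n units (all tested to failure), first obtain the likelihood function:
$$L(\lambda \mid t_1, t_2, \ldots, t_n) = \prod_{i=1}^{n} \lambda e^{-\lambda t_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} t_i}$$
Take the natural log of both sides:
$$\Lambda = \ln L = n \ln \lambda - \lambda \sum_{i=1}^{n} t_i$$
Obtain $\partial \Lambda / \partial \lambda$, and set it equal to zero:
$$\frac{\partial \Lambda}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} t_i = 0$$
Solve for $\hat{\lambda}$, or:
$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i}$$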
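As a quick check of this result, the sketch below assumes a one-parameter exponential model, for which the estimate is the number of failures divided by the total time on test, and verifies that the derivative of the log-likelihood vanishes there. The function names and the failure times are hypothetical.

```python
def exponential_mle(times):
    """MLE of the exponential failure rate for complete (no suspension) data."""
    if not times or min(times) < 0 or sum(times) == 0:
        raise ValueError("need non-negative failure times with a positive total")
    return len(times) / sum(times)


def score(times, lam):
    """Derivative of the log-likelihood with respect to lambda."""
    return len(times) / lam - sum(times)


if __name__ == "__main__":
    times = [10, 20, 30, 40, 50]      # hypothetical failure times, in hours
    lam_hat = exponential_mle(times)
    print("lambda_hat =", lam_hat)    # failures per hour
    print("score at lambda_hat =", score(times, lam_hat))
```

The score is positive for rates below the estimate and negative above it, which confirms that the stationary point is a maximum rather than a minimum.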
Note that the value of $\hat{\lambda}$ is an estimate because if we obtain another sample from the same population and re-estimate λ, the new value would differ from the one previously calculated. In plain language, $\hat{\lambda}$ is an estimate of the true value of λ. How close is the value of our estimate to the true value? To answer this question, one must first determine the distribution of the parameter, in this case λ. This methodology introduces a new term, confidence bound, which allows us to specify a range for our estimate with a certain confidence level. The treatment of confidence bounds is integral to reliability engineering, and to all of statistics. (Confidence bounds are covered in the Confidence Bounds chapter.)
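To obtain the MLE estimates for the mean, $\bar{T}$, and standard deviation, $\sigma_T$, for the normal distribution, start with the pdf of the normal distribution, which is given by:
$$f(T) = \frac{1}{\sigma_T \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{T - \bar{T}}{\sigma_T}\right)^2}$$
If T1, T2, ..., TN are known times-to-failure (with no suspensions), then the likelihood function is given by:
$$L(T_1, T_2, \ldots, T_N \mid \bar{T}, \sigma_T) = L = \prod_{i=1}^{N} \frac{1}{\sigma_T \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{T_i - \bar{T}}{\sigma_T}\right)^2} = \frac{1}{\left(\sigma_T \sqrt{2\pi}\right)^N}\, e^{-\frac{1}{2}\sum_{i=1}^{N}\left(\frac{T_i - \bar{T}}{\sigma_T}\right)^2}$$
then:
$$\Lambda = \ln L = -\frac{N}{2}\ln(2\pi) - N \ln \sigma_T - \frac{1}{2}\sum_{i=1}^{N}\left(\frac{T_i - \bar{T}}{\sigma_T}\right)^2$$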
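Then taking the partial derivatives of Λ with respect to each one of the parameters and setting them equal to zero yields:
$$\frac{\partial \Lambda}{\partial \bar{T}} = \frac{1}{\sigma_T^2}\sum_{i=1}^{N} (T_i - \bar{T}) = 0 \tag{9}$$
and:
$$\frac{\partial \Lambda}{\partial \sigma_T} = -\frac{N}{\sigma_T} + \frac{1}{\sigma_T^3}\sum_{i=1}^{N} (T_i - \bar{T})^2 = 0 \tag{10}$$
Solving Eqns. (9) and (10) simultaneously yields:
$$\hat{\bar{T}} = \frac{1}{N}\sum_{i=1}^{N} T_i$$
and:
$$\hat{\sigma}_T^2 = \frac{1}{N}\sum_{i=1}^{N} (T_i - \hat{\bar{T}})^2, \qquad \hat{\sigma}_T = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (T_i - \hat{\bar{T}})^2}$$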
It should be noted that these solutions are only valid for data with no suspensions, i.e., all units are tested to failure. When suspensions are present, that is, when not all units are tested to failure, the methodology changes and the problem becomes much more complicated.
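For complete data, the normal-distribution estimates reduce to the sample mean and the standard deviation computed with a divisor of N (not N - 1). The sketch below assumes exactly that closed form. The function name `normal_mle` and the sample values are hypothetical.

```python
import math


def normal_mle(times):
    """MLE of the normal mean and standard deviation for complete data.

    Note the divisor N in the variance: this is the maximum likelihood
    estimate, not the unbiased sample variance (which divides by N - 1).
    """
    n = len(times)
    if n == 0:
        raise ValueError("need one or more observations")
    mean_hat = sum(times) / n
    sigma_hat = math.sqrt(sum((t - mean_hat) ** 2 for t in times) / n)
    return mean_hat, sigma_hat


if __name__ == "__main__":
    # Hypothetical times-to-failure, in hours.
    print(normal_mle([12, 15, 18]))
```

In the Python standard library, `statistics.pstdev` computes the same divisor-N standard deviation, so it can serve as an independent check.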
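If we had five units that failed at 10, 20, 30, 40 and 50 hours, the mean would be:
$$\hat{\bar{T}} = \frac{10 + 20 + 30 + 40 + 50}{5} = 30$$
The standard deviation estimate then would be:
$$\hat{\sigma}_T = \sqrt{\frac{(10-30)^2 + (20-30)^2 + (30-30)^2 + (40-30)^2 + (50-30)^2}{5}} = \sqrt{200} = 14.1421$$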
A look at the likelihood function surface plot in Figure A-1 reveals that this pair of values is the point at which the function reaches its maximum.
Figure A-1: Likelihood surface plot for MLE normal distribution example.
This three-dimensional plot represents the likelihood function. As can be seen from the plot, the maximum likelihood estimates for the two parameters correspond with the peak or maximum of the likelihood function surface.
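The same conclusion can be reached without a plot by evaluating the log-likelihood at the estimates and at nearby points. This is a small numerical sketch, assuming the normal log-likelihood for complete data and the five failure times of the example; the step size of one hour in each direction is an arbitrary choice, and the function name is hypothetical.

```python
import math


def normal_log_likelihood(times, mean, sigma):
    """Log-likelihood of a normal model for complete data."""
    n = len(times)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((t - mean) ** 2 for t in times) / (2 * sigma ** 2))


times = [10, 20, 30, 40, 50]
mean_hat = 30.0
sigma_hat = math.sqrt(200.0)

peak = normal_log_likelihood(times, mean_hat, sigma_hat)

# Evaluate the eight neighbouring points one hour away in each direction.
neighbours = [
    normal_log_likelihood(times, mean_hat + d_mean, sigma_hat + d_sigma)
    for d_mean in (-1, 0, 1)
    for d_sigma in (-1, 0, 1)
    if (d_mean, d_sigma) != (0, 0)
]

print("log-likelihood at the estimates:", peak)
print("largest neighbouring value:     ", max(neighbours))
```

Every neighbouring value is lower than the value at the estimates, which is what the peak of the surface in the figure represents.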
©1996-2006. ReliaSoft Corporation. ALL RIGHTS RESERVED.