Reliability HotWire Issue 25, March 2003

Reliability Basics

Understanding Biasedness

There are many properties associated with parameter estimates, such as minimum variance, sufficiency, consistency, efficiency, completeness and biasedness. The property called biasedness is commonly discussed within reliability engineering and statistics, but what does it really mean? What is a biased estimator? This article explores the concept of biasedness and tries to shed some light on this often misunderstood topic.

Background

An estimator is said to be unbiased if the estimate θ̂ = d(X₁, ..., Xₙ) satisfies the condition E[θ̂] = θ for all θ ∈ Ω. Here E[X] denotes the expected value of X, which for a continuous distribution with density f(x) is defined by:

E[X] = ∫ x·f(x) dx

with the integral taken over the range of X.

But what does this really mean? First of all, biasedness comes into play when conducting analysis using maximum likelihood estimation (MLE). A full discussion of MLE is beyond the scope of this article, but one of its properties is that it is asymptotically unbiased. This means that as the sample size increases, the expected value of the estimate converges to the true parameter value.

As an example, consider the shape parameter, β (or beta), of the Weibull distribution. It is widely known that the MLE estimate of beta is biased. The degree of biasing increases as the sample size decreases, and the effect can be magnified by the amount of censoring in the data. Keep in mind that for large sample sizes, the distribution of the parameter estimates themselves is approximately normal (another MLE property). Therefore, as the sample size increases, the shape parameter estimate, which we know is biased, will approach the condition E[θ̂] = θ for all θ ∈ Ω. The example presented next elaborates on what this actually means.

Example

To illustrate biasing, Monte Carlo simulation will be used to generate times-to-failure for multiple data sets and the results will be compared. The data sets will be generated using ReliaSoft's free software tool called SimuMatic.
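Since the examples that follow use the Weibull distribution, the expected-value definition above can be checked numerically. The sketch below (plain Python, not part of the article's SimuMatic workflow; the parameter values β = 1.5 and η = 100 simply match the example that follows) draws Weibull times by inverting the CDF and compares the simulated mean against the closed-form value of the integral, E[T] = η·Γ(1 + 1/β).

```python
import math
import random

beta, eta = 1.5, 100.0   # Weibull shape and scale (same values as the example below)
rng = random.Random(1)

# Inverse-CDF sampling: if U ~ Uniform(0, 1), then T = eta * (-ln U)^(1/beta)
# follows a Weibull(beta, eta) distribution.
n = 200_000
mean_sim = sum(eta * (-math.log(rng.random())) ** (1.0 / beta)
               for _ in range(n)) / n

# Evaluating E[T] = ∫ t f(t) dt in closed form gives eta * Gamma(1 + 1/beta).
mean_theory = eta * math.gamma(1.0 + 1.0 / beta)

print(round(mean_sim, 1), round(mean_theory, 1))   # both near 90.3
```

The two numbers agree to within simulation noise, which is all the integral definition of E[X] promises.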
The simulation will be conducted with β = 1.5 and η = 100. Two simulations will be run for these parameter values. In the first case, 1,000 data sets will be generated with 10 samples in each (Case 1). The second simulation will also generate 1,000 data sets, but with 100 samples in each (Case 2). Using Monte Carlo simulation, SimuMatic generates the times-to-failure and then estimates β and η for each data set using MLE. This produces a population of parameter estimates for each case.

Figure 1 displays the estimated β and η values for the first 20 data sets of Case 1 (10 samples in each).

Figure 1: Beta and Eta values for Case 1

The estimated β and η values for the first 20 data sets of Case 2 (100 samples in each) are shown in Figure 2.

Figure 2: Beta and Eta values for Case 2

In this example the true value, β = 1.5, is known because it was given to the simulation. For a moment, assume that the estimate of β is not biased. Now, if the estimated values of β are sorted in ascending order, where would you expect them to come closest to the true value (β = 1.5)? If the parameter estimates are assumed to be normally distributed, you would expect the closest approach to the true value at the midpoint of the distribution. But we know β is biased, so this will not be the case. Where, then, does it approach the true value? The further from the midpoint of the distribution this occurs, the greater the amount of biasing.

From Figure 3 you can see that for Case 1 the value of β approaches 1.5 at 39.1% (not the midpoint). The values of beta are in column B and the values of eta are in column C.

Figure 3: Beta approaches true value at 39.10% for Case 1

The biasing of β is fairly obvious with a sample size of 10. The biasing is represented by the offset from the midpoint of the distribution at the point where the value of beta is closest to the true value. For Case 2, with a sample size of 100, see Figure 4.
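SimuMatic is not required to see this effect; a rough re-creation of the experiment can be sketched in plain Python. This is an assumption-laden stand-in, not ReliaSoft's implementation: the helper names (`draw_failures`, `mle_beta`, `crossing_percentile`) are invented here, the data are complete (uncensored), and fewer data sets are used to keep the run short. For each data set it draws Weibull(β = 1.5, η = 100) times, solves the one-dimensional MLE equation for β by bisection, and reports where the sorted β estimates cross the true value, the analogue of the 39.1% and 48.2% points in Figures 3 and 4.

```python
import math
import random

TRUE_BETA, TRUE_ETA = 1.5, 100.0

def draw_failures(n, rng):
    """Complete (uncensored) Weibull times via inverse-CDF sampling."""
    return [TRUE_ETA * (-math.log(rng.random())) ** (1.0 / TRUE_BETA)
            for _ in range(n)]

def mle_beta(times):
    """Solve the Weibull shape MLE equation g(b) = 0 by bisection.

    For complete data, g(b) = sum(t^b ln t)/sum(t^b) - 1/b - mean(ln t),
    which is monotone increasing in b, so bisection is safe.
    """
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / len(logs)

    def g(b):
        tb = [t ** b for t in times]
        return sum(x * l for x, l in zip(tb, logs)) / sum(tb) - 1.0 / b - mean_log

    lo, hi = 0.01, 50.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if g(mid) > 0 else (mid, hi)
    return 0.5 * (lo + hi)

def crossing_percentile(n_per_set, n_sets=500, seed=25):
    """Position (as a percentage) of the sorted beta estimate closest to the
    true value, plus the mean of the estimates."""
    rng = random.Random(seed)
    betas = sorted(mle_beta(draw_failures(n_per_set, rng)) for _ in range(n_sets))
    closest = min(range(n_sets), key=lambda i: abs(betas[i] - TRUE_BETA))
    return 100.0 * (closest + 1) / n_sets, sum(betas) / n_sets

pct10, mean10 = crossing_percentile(10)     # Case 1: 10 samples per data set
pct100, mean100 = crossing_percentile(100)  # Case 2: 100 samples per data set
print(f"n=10:  crossing near {pct10:.1f}%, mean beta estimate {mean10:.3f}")
print(f"n=100: crossing near {pct100:.1f}%, mean beta estimate {mean100:.3f}")
```

With a sketch like this, the small-sample case shows a mean β estimate noticeably above 1.5 and a crossing point well below the 50% midpoint, while the 100-sample case sits close to both, mirroring the article's figures.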
Figure 4: Beta approaches true value at 48.20% for Case 2

You can see that for a sample size of 100, the value of beta closest to the true value occurs very near the midpoint of the distribution. This implies that the biasing is much less pronounced with the larger sample size, since 48.2% is much closer to the midpoint (50%) of the distribution.

Conclusion

Biasing is sample size dependent: the smaller the sample size, the greater the extent of the biasing. Beta was used as the example in this article, but other parameters are biased as well. For example, the MLE estimator of the standard deviation of the normal distribution is also biased. And keep in mind that the biasing discussed here applies to maximum likelihood estimation; it does not arise when parameters are estimated using least squares (rank regression). However, this does not mean that least squares estimates are "more correct." For more information on the parameter estimation methods, see HotWire Issue 1 and Issue 16.

Copyright 2003 ReliaSoft Corporation, ALL RIGHTS RESERVED
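The conclusion's point about the normal distribution is easy to verify numerically. The MLE of σ divides the sum of squared deviations by n (not n − 1), and averaging that estimate over many small samples lands noticeably below the true σ. A minimal sketch (plain Python; the sample size and repetition count are arbitrary choices for illustration):

```python
import random

rng = random.Random(7)
true_sigma, n, reps = 2.0, 5, 20_000

avg_mle = 0.0
for _ in range(reps):
    xs = [rng.gauss(0.0, true_sigma) for _ in range(n)]
    m = sum(xs) / n
    # MLE of sigma: divide the sum of squared deviations by n (not n - 1).
    avg_mle += (sum((x - m) ** 2 for x in xs) / n) ** 0.5
avg_mle /= reps

print(f"true sigma = {true_sigma}, average MLE estimate = {avg_mle:.3f}")
# The average lands well below 2.0 for n = 5; the gap shrinks as n grows,
# consistent with MLE being asymptotically unbiased.
```

Raising n in this sketch pulls the average back toward the true value, the same sample-size dependence the beta example showed.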