If the mean value of an estimator equals the true value of the quantity it estimates, then the estimator is called an unbiased estimator (see Figure 3.4). For example, assume that the sample mean is being used to estimate the mean of a population. Because expectation is linear, the mean value of the sample means equals the population mean. Therefore, the sample mean is an unbiased estimator of the population mean.
If the mean value of an estimator is either less than or greater than the true value of the quantity it estimates, then the estimator is called a biased estimator. For example, suppose you decide to choose the smallest observation in a sample to be the estimator of the population mean. Such an estimator would be biased because the average of the values of this estimator would be less than the true population mean. In other words, the mean of the sampling distribution of this estimator would be less than the true value of the population mean it is trying to estimate.
Figure 3.4: Example showing the distribution of a biased estimator which underestimates the parameter in question, along with the distribution of an unbiased estimator.
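The two examples above (the sample mean and the smallest observation as estimators of the population mean) can be checked with a small simulation. The following is a minimal sketch, not part of the original text: the normal population with mean 10 and standard deviation 2, the sample size of 5, the number of trials, and the function name `simulate_estimators` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_estimators(pop_mean=10.0, pop_sd=2.0, n=5, trials=20000, seed=0):
    """Average two estimators of the population mean over many samples.

    Returns (average of sample means, average of sample minima).
    The population is assumed normal purely for illustration.
    """
    rng = random.Random(seed)
    means = []
    minima = []
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))  # unbiased estimator
        minima.append(min(sample))              # biased estimator
    return statistics.fmean(means), statistics.fmean(minima)


avg_of_means, avg_of_minima = simulate_estimators()
print(f"true population mean     : {10.0:.3f}")
print(f"average of sample means  : {avg_of_means:.3f}")
print(f"average of sample minima : {avg_of_minima:.3f}")
```

Under these assumptions, the average of the sample means should land very close to the population mean, while the average of the sample minima should sit clearly below it, which is the behavior the two distributions in Figure 3.4 depict.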
A case of biased estimation is seen to occur when the sample variance, $s^2$, is used to estimate the population variance, $\sigma^2$, if the following relation is used to calculate the sample variance:

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

The sample variance calculated using this relation is, on average, less than the true population variance. This is because the sample variance is calculated from deviations with respect to the sample mean, $\bar{x}$, rather than the population mean, $\mu$. Sample observations, $x_i$, tend to be closer to $\bar{x}$ than to $\mu$. Thus, the calculated deviations are smaller. As a result, the sample variance obtained is smaller than the population variance. To compensate for this, $n-1$ is used as the denominator in place of $n$ in the calculation of sample variance.
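The size of the shortfall can be made explicit. The following is a sketch of the standard argument, under the assumption (not stated in the text above) that the observations $x_1, \dots, x_n$ are independent draws from a population with mean $\mu$ and finite variance $\sigma^2$, with $\bar{x}$ denoting the sample mean:

$$\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2$$

$$E\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2$$

Dividing the sum of squared deviations by $n$ therefore gives an expected value of $\frac{n-1}{n}\sigma^2$, which falls short of $\sigma^2$, whereas dividing by $n-1$ gives an expected value of exactly $\sigma^2$.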
Thus, the correct formula to obtain the sample variance is:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

It is important to note that although using $n-1$ as the denominator makes the sample variance, $s^2$, an unbiased estimator of the population variance, $\sigma^2$, the sample standard deviation, $s$, still remains a biased estimator of the population standard deviation, $\sigma$. For large sample sizes this bias is negligible.
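These claims about the two denominators and about the standard deviation can be illustrated numerically. The sketch below is an assumption-laden example rather than anything prescribed by the text: it draws repeated samples of size 5 from a normal population with standard deviation 2 (so the population variance is 4), and the function name `average_estimates` is hypothetical. For each sample it divides the sum of squared deviations from the sample mean by `n` and by `n - 1`, and also takes the square root of the `n - 1` version.

```python
import math
import random
import statistics


def average_estimates(pop_sd=2.0, n=5, trials=50000, seed=1):
    """Average three estimators over many samples from a normal population.

    Returns the averages of:
      - the variance estimate with denominator n,
      - the variance estimate with denominator n - 1,
      - the standard deviation taken as the square root of the latter.
    """
    rng = random.Random(seed)
    var_n = []
    var_n_minus_1 = []
    sd_n_minus_1 = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, pop_sd) for _ in range(n)]
        xbar = statistics.fmean(sample)
        # Sum of squared deviations from the sample mean, not the true mean.
        ss = sum((x - xbar) ** 2 for x in sample)
        var_n.append(ss / n)
        var_n_minus_1.append(ss / (n - 1))
        sd_n_minus_1.append(math.sqrt(ss / (n - 1)))
    return (
        statistics.fmean(var_n),
        statistics.fmean(var_n_minus_1),
        statistics.fmean(sd_n_minus_1),
    )


avg_var_n, avg_var_n1, avg_sd_n1 = average_estimates()
print(f"population variance (assumed)     : {2.0 ** 2:.3f}")
print(f"average variance, denominator n   : {avg_var_n:.3f}")
print(f"average variance, denominator n-1 : {avg_var_n1:.3f}")
print(f"population std. deviation         : {2.0:.3f}")
print(f"average sample std. deviation     : {avg_sd_n1:.3f}")
```

With these settings, the average of the `n`-denominator estimates should fall noticeably below 4, the average of the `n - 1` estimates should be close to 4, and the average sample standard deviation should still come out somewhat below 2. Increasing `n` in the call should shrink that last gap, in line with the remark that the bias is negligible for large samples.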
See Also: