
Statistical Intervals

 

 

What are tolerance and prediction intervals? How are they different from confidence intervals? This month, we will define these three types of statistical intervals, describe two types of statistical studies and summarize when to use each type of interval based on the type of study to be performed.

Confidence intervals are the most commonly used of the three. A confidence interval is computed from a sample by a procedure that, if random samples were drawn repeatedly from the same population (or process), would produce intervals that contain the true population parameter or metric x% of the time.
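The "x% of the time" idea can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is normal, its standard deviation is known, and the mean of 50, standard deviation of 4, sample size of 20 and function names are all made up for the example.

```python
import random
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Two-sided interval for the population mean.

    Assumes a normal population with KNOWN standard deviation sigma.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def simulated_coverage(true_mean=50.0, sigma=4.0, n=20,
                       trials=2000, confidence=0.95, seed=1):
    """Fraction of repeated-sample intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw a fresh random sample from the same (hypothetical) population.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_confidence_interval(sample, sigma, confidence)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials


coverage = simulated_coverage()
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the long-run behavior of the procedure.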

Prediction intervals apply when a statement needs to be made about future observations, that is, about a population that does not currently exist. For example, sample units of a prototype are provided, and we need to know whether future units built to the same design will exhibit the same characteristic of interest as the prototype.
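As a rough illustration, here is one common case: a two-sided prediction interval for a single future observation, next to a confidence interval for the mean computed from the same data. This is a sketch, not a general recipe. It assumes the data are normally distributed and that the future unit comes from the same process as the sample. The ten measurements and the function names are hypothetical, and the t critical value is passed in from a t table because the Python standard library does not supply it.

```python
from statistics import mean, stdev


def prediction_interval(sample, t_crit):
    """Interval intended to contain ONE future observation (normal data assumed)."""
    n = len(sample)
    centre = mean(sample)
    half_width = t_crit * stdev(sample) * (1 + 1 / n) ** 0.5
    return centre - half_width, centre + half_width


def confidence_interval(sample, t_crit):
    """Interval for the population MEAN (normal data assumed)."""
    n = len(sample)
    centre = mean(sample)
    half_width = t_crit * stdev(sample) / n ** 0.5
    return centre - half_width, centre + half_width


# Hypothetical measurements on ten prototype units.
prototype = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.0]

# Two-sided 95% critical value of Student's t with n - 1 = 9 degrees of freedom,
# taken from a t table.
T_CRIT_95_DF9 = 2.262

pi_low, pi_high = prediction_interval(prototype, T_CRIT_95_DF9)
ci_low, ci_high = confidence_interval(prototype, T_CRIT_95_DF9)

print(f"95% confidence interval for the mean:     ({ci_low:.3f}, {ci_high:.3f})")
print(f"95% prediction interval, one future unit: ({pi_low:.3f}, {pi_high:.3f})")
```

The prediction interval is always the wider of the two, because it has to cover the unit-to-unit variation of a single future observation in addition to the uncertainty in the estimated mean.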

Tolerance intervals describe, for a sampled population (or process), an interval that contains at least a specified percentage of the population x% of the time. Note that for a normal distribution, a lower one-sided x% tolerance bound to be exceeded by y% of the population is equivalent to a lower one-sided x% confidence bound for the (100 - y)th percentile; an upper one-sided x% tolerance bound to exceed y% of the population is equivalent to an upper one-sided x% confidence bound for the yth percentile. For example, a lower bound that is exceeded by 95% of the population is a lower confidence bound for the 5th percentile.
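One simple case of a one-sided tolerance bound needs no distributional formula at all. If we take a random sample of size n from a continuous population and use the smallest observation as the lower bound, the confidence that at least a proportion p of the population exceeds that bound is 1 - p^n. The sketch below uses this distribution-free result, which is a different technique from a normal-theory tolerance bound. The function names are made up for the illustration.

```python
import math


def confidence_of_min_as_lower_bound(n, proportion):
    """Confidence that the sample minimum is exceeded by at least
    `proportion` of the population (random sample, continuous distribution)."""
    return 1 - proportion ** n


def min_sample_size(proportion, confidence):
    """Smallest n for which the sample minimum is a lower tolerance bound
    with the requested proportion and confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(proportion))


n_needed = min_sample_size(proportion=0.95, confidence=0.95)
print(f"Units needed for a 95%/95% lower bound: {n_needed}")
print(f"Confidence achieved: {confidence_of_min_as_lower_bound(n_needed, 0.95):.4f}")
```

For a bound exceeded by 95% of the population with 95% confidence, this gives n = 59. The smallest of 59 randomly sampled values then serves as the lower tolerance bound, provided the sample is random and representative.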

To help understand the differences among the three types of intervals and their applications, let’s discuss some basic assumptions of sample data.

Statistical intervals represent the uncertainty that exists in the data because we work with samples obtained from a larger population or process. As the sample size increases, the confidence interval becomes narrower. Uncertainty can also be introduced through measurement error, so it is important to understand how much uncertainty the measurement system contributes. Finally, uncertainty can exist when the data are not "representative" of the population or process. We need to remember that any statistical interval we calculate reflects only the statistical (sampling) uncertainty in the data, not the uncertainty about whether the samples represent the population of interest.
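To put a number on how the interval shrinks with sample size, the sketch below tabulates the half-width of a 95% confidence interval for a mean. It assumes a normal population with a known standard deviation of 4.0, a value invented for the illustration, and the function name is hypothetical.

```python
from statistics import NormalDist


def ci_half_width(sigma, n, confidence=0.95):
    """Half-width of a two-sided confidence interval for the mean,
    assuming a normal population with known standard deviation sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / n ** 0.5


for n in (10, 40, 160, 640):
    print(f"n = {n:4d}   half-width = {ci_half_width(sigma=4.0, n=n):.3f}")
```

The half-width is proportional to 1/sqrt(n), so four times as much data only cuts the interval in half. No amount of additional data removes measurement error or makes an unrepresentative sample representative.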

So, what does it mean to have "representative data"? First, we need to make a distinction between enumerative and analytical studies, a concept introduced by Deming (1950). Simply stated, an enumerative study is one with a well-defined, existing population, where the purpose of the study is to describe or make inferences about that population. An analytical study is one in which the population of interest does not currently exist. The purpose of an analytical study is to draw conclusions about the future product or process, not about the materials under current investigation. Analytical studies often involve sampling prototype units, made in the lab or on a pilot line, to draw conclusions about subsequent full-scale production. The same sampling activity can fall into either category: for example, a study may involve sampling product from inventory to make an inference either about the product population in inventory or about the process that produced it.

To summarize, if the focus of the study is to describe a characteristic about the existing inventory, then the study is enumerative. If the study focuses on the future output from the process, then the study is analytical.

Based on the focus of the study, there are important considerations in the sampling strategies.

Sampling strategy for an enumerative study:

Sampling strategy for an analytical study:

Based on the discussion of enumerative and analytical studies, it becomes a little easier to determine which type of statistical interval is appropriate for your application. Hahn and Meeker (1991) provide the table below, which summarizes when to use each type of statistical interval.

General Purpose of the Statistical Interval

| Characteristic of Interest | Description (Enumerative) | Prediction (Analytical) |
| --- | --- | --- |
| Location - as measured by the mean or some percentile of the specific distribution | Confidence interval for a population mean or median or a specified distribution percentile. | Prediction interval for a future sample mean, future sample median, or a particular ordered observation from a future sample. |
| Spread - as measured by its standard deviation | Confidence interval for a population standard deviation. | Prediction interval for the standard deviation of a future sample. |
| Enclosure Interval - Tolerance intervals to contain a specific proportion of the population and a prediction interval to contain all, or most, of the observations from a future sample. | Tolerance interval to contain (or cover) at least a specified proportion of a population. | Prediction interval to contain all or most of these observations from a future sample. |
| Probability of an Event - A specific example is a confidence interval for the probability that a measurement will exceed a specified threshold value | Confidence interval for the probability of being less than (or greater than) some specified value. | Prediction interval to contain the proportion of observations in a future sample that exceed a specified limit. |

So, there you have it! Confidence intervals and tolerance intervals are used when the population already exists and you want to be able to make inferences about that population based on a random sample taken from the population. Prediction intervals are used when you are trying to make an inference about a population that will exist in the future, based on sample data that "represents" the future population.

References

Deming, W. E., Some Theory of Sampling, 1st ed. New York: John Wiley & Sons, 1950.

Hahn, G. J. and Meeker, W. Q., Statistical Intervals: A Guide for Practitioners, 1st ed. New York: John Wiley & Sons, 1991.