Variations occur in nature, be it the tensile strength of a particular grade of steel, the caffeine content of your energy drink or the distance traveled by your vehicle in a day. Variations are also seen in the observations recorded during multiple executions of a process, even when all factors are strictly maintained at their respective levels and all the executions are run as identically as possible. The natural variations that occur in a process, even when all conditions are maintained at the same level, are often termed noise. When the effect of a particular factor on a process is studied, it becomes extremely important to distinguish the changes in the process caused by the factor from noise. A number of statistical methods are available to achieve this. This chapter covers basic statistical concepts that are useful in understanding the statistical analysis of data obtained from designed experiments. The initial sections of this chapter discuss the normal distribution and related concepts. The assumption of the normal distribution is widely used in the analysis of designed experiments. The subsequent sections introduce the standard normal, chi-squared, $t$ and $F$ distributions that are widely used in calculations related to hypothesis testing and confidence bounds. The final sections of this chapter cover hypothesis testing. It is important to gain a clear understanding of hypothesis testing because this concept finds direct application in the analysis of designed experiments to determine whether a particular factor is significant or not. [15]
If you record the distance traveled by your car every day, these values will show some variation because it is unlikely that your car travels the same distance each day. If a variable $X$ is used to denote these values, then $X$ is termed a random variable (because of the diverse and unpredictable values $X$ can take). Random variables are denoted by uppercase letters, while a measured value of the random variable is denoted by the corresponding lowercase letter. For example, if the distance traveled by your car on January 1 was 10.7 miles, then:

$$x = 10.7 \text{ miles}$$
A commonly used distribution to describe the behavior of random variables is the normal distribution. When you summarize a data set by its mean and standard deviation alone, you are often implicitly assuming that the data follow a normal distribution. A normal distribution (also referred to as the Gaussian distribution) is a bell-shaped curve (see Figure 3.1). The mean and standard deviation are the two parameters of this distribution. The mean determines the location of the distribution on the $x$ axis and is also called the location parameter of the normal distribution. The standard deviation determines the spread of the distribution (how narrow or wide it is) and is thus called the scale parameter of the normal distribution. The standard deviation, or its square, called the variance, gives an indication of the variability or spread of the data. A large value of the standard deviation (or variance) implies that a large amount of variability exists in the data.
Figure 3.1: Normal probability density functions for different values of mean and standard deviation.
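The roles of the two parameters can be checked numerically. The following is a minimal sketch, not part of the original text, that evaluates the standard formula for the normal density, $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$, at the mean for a few parameter pairs. The function name `normal_pdf` and the parameter pairs are assumptions chosen for illustration only. Changing the mean moves the peak along the axis, while increasing the standard deviation lowers and widens the curve.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of the normal density with the given mean and standard deviation at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical (mean, sd) pairs, chosen only to illustrate location and scale.
for mean, sd in [(15.0, 2.5), (15.0, 5.0), (20.0, 2.5)]:
    # The density always peaks at x = mean; the peak height depends only on sd.
    peak = normal_pdf(mean, mean, sd)
    print(f"mean={mean}, sd={sd}: peak at x={mean}, height={peak:.4f}")
```

Doubling the standard deviation halves the peak height, because the total area under every curve must remain equal to 1.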
Any curve in Figure 3.1 is also referred to as the probability density function, or pdf, of the normal distribution, because the area under the curve over a particular interval gives the probability that $X$ falls in that interval. For instance, if you obtained the mean and standard deviation for the distance data of your car as 15 miles and 2.5 miles respectively, then the probability that your car travels a distance between 7 miles and 14 miles is given by the area under the curve between these two values, which is calculated as 34.4% (see Figure 3.2). This means that on about 34 out of every 100 days your car travels, it can be expected to cover a distance in the range of 7 to 14 miles.
Figure 3.2: Normal probability density function with the shaded area representing the probability of occurrence of data between 7 and 14 miles.
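The shaded area can be reproduced as a difference of two cumulative probabilities. The sketch below is an illustration rather than the method used in the original text; it assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library, with the mean of 15 miles and standard deviation of 2.5 miles taken from the example above. The variable names are hypothetical.

```python
from statistics import NormalDist

# Distance model from the example: mean 15 miles, standard deviation 2.5 miles.
distance = NormalDist(mu=15.0, sigma=2.5)

# Area under the pdf between 7 and 14 miles = cdf(14) - cdf(7).
p = distance.cdf(14.0) - distance.cdf(7.0)
print(f"P(7 <= X <= 14) = {p:.3f}")
```

The result agrees with the 34.4% quoted above to three decimal places.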
On a normal probability density function with mean $\mu$ and standard deviation $\sigma$, the area under the curve between the values of $\mu - 3\sigma$ and $\mu + 3\sigma$ is approximately 99.7% of the total area under the curve. For the distance example, this implies that almost all the time (or 99.7% of the time) the distance traveled will fall in the range of 7.5 miles to 22.5 miles. Similarly, $\mu \pm 2\sigma$ covers approximately 95% of the area under the curve and $\mu \pm \sigma$ covers approximately 68% of the area under the curve.
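These coverage figures can be verified with a short calculation. The following sketch is an illustration under the same assumptions as the distance example (mean 15 miles, standard deviation 2.5 miles); the helper name `coverage` is hypothetical. Because the coverage of an interval of $k$ standard deviations on either side of the mean does not depend on the particular mean or standard deviation, the helper works on the standard normal distribution.

```python
from statistics import NormalDist


def coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


mean, sd = 15.0, 2.5  # values from the distance example
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"{low:.1f} to {high:.1f} miles: {coverage(k):.1%} of days")
```

For $k = 3$ the interval is 7.5 to 22.5 miles, matching the range given above.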
See Also:
Population Mean, Sample Mean and Variance