Estimating Regression Models Using Least Squares

 

Consider a multiple linear regression model with $k$ predictor variables:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$$

 

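Let each of the $k$ predictor variables, $x_1$, $x_2$, ..., $x_k$, have $n$ levels. Then $x_{ij}$ represents the $i$th level of the $j$th predictor variable $x_j$. For example, $x_{51}$ represents the fifth level of the first predictor variable $x_1$, while $x_{19}$ represents the first level of the ninth predictor variable, $x_9$. Observations, $y_1$, $y_2$, ..., $y_n$, recorded for each of these $n$ levels can be expressed in the following way:

$$\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \epsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \epsilon_2 \\
&\;\;\vdots \\
y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \epsilon_n
\end{aligned}$$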

 

The system of equations shown previously can be represented in matrix notation as follows:

$$y = X\beta + \epsilon \qquad (7)$$

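where:

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad
X = \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{bmatrix}$$

$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \qquad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$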

 

The matrix $X$ in Eqn. (7) is referred to as the design matrix. It contains information about the levels of the predictor variables at which the observations are obtained. The vector $\beta$ contains all the regression coefficients. To obtain the regression model, $\beta$ must be estimated; this is done using the least squares estimates, $\hat{\beta}$. The following equation is used:

$$\hat{\beta} = (X'X)^{-1}X'y \qquad (8)$$

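where $X'$ represents the transpose of the matrix $X$ while $(X'X)^{-1}$ represents the inverse of the matrix $X'X$. Knowing the estimates, $\hat{\beta}$, the multiple linear regression model can now be estimated as:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k \qquad (9)$$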
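The least squares computation of Eqn. (8) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure DOE++ uses internally: the data values, and the names `x1`, `x2`, `y`, `X` and `beta_hat`, are hypothetical and are not the data of Table 5.1. The response is generated without noise from known coefficients so that the estimate can be checked by eye.

```python
import numpy as np

# Hypothetical data (NOT Table 5.1): n = 6 observations, k = 2 predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Noise-free response built from assumed coefficients beta = (2, 3, -1).
y = 2.0 + 3.0 * x1 - 1.0 * x2

# Design matrix X: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Eqn. (8): beta_hat = (X'X)^{-1} X'y.
# Solving the linear system (X'X) beta = X'y avoids forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)  # close to the assumed coefficients (2, 3, -1), up to rounding
```

Solving the normal equations with `np.linalg.solve` is numerically equivalent to Eqn. (8) for well-conditioned data; for nearly collinear predictors a routine such as `np.linalg.lstsq` is usually the safer choice.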

 

The estimated regression model is also referred to as the fitted model. The observations, $y_i$, may be different from the fitted values $\hat{y}_i$ obtained from this model. The difference between these two values is the residual, $e_i$. The vector of residuals, $e$, is obtained as:

$$e = y - \hat{y} \qquad (10)$$

The fitted model of Eqn. (9) can also be written as follows, using $\hat{\beta}$ from Eqn. (8):

$$\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Hy \qquad (11)$$

where $H = X(X'X)^{-1}X'$. The matrix, $H$, is referred to as the hat matrix. It transforms the vector of the observed response values, $y$, to the vector of fitted values, $\hat{y}$.
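The relationship between the hat matrix, the fitted values and the residuals can be checked numerically. The sketch below assumes hypothetical data (again not the values of Table 5.1) and illustrative variable names; it forms the hat matrix, applies Eqns. (10) and (11), and confirms that the fitted values agree with those obtained from the coefficient estimates of Eqn. (8).

```python
import numpy as np

# Hypothetical data (NOT Table 5.1): n = 6 observations, k = 2 predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 6.8, 7.2, 10.9, 11.1, 15.2])

X = np.column_stack([np.ones_like(x1), x1, x2])

# Hat matrix H = X (X'X)^{-1} X', computed by solving (X'X) Z = X' for Z.
H = X @ np.linalg.solve(X.T @ X, X.T)

y_hat = H @ y   # Eqn. (11): fitted values
e = y - y_hat   # Eqn. (10): residuals

# The same fitted values follow from the coefficient estimates of Eqn. (8).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(y_hat, X @ beta_hat))  # fitted values agree
print(np.allclose(H @ H, H))             # H is idempotent (a projection)
print(np.allclose(X.T @ e, 0.0))         # residuals are orthogonal to the columns of X
```

Forming the $n \times n$ matrix $H$ explicitly is convenient for illustration but wasteful for large $n$; in practice the fitted values are usually computed as $X\hat{\beta}$.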

 

Example 5.1

 

An analyst studying a chemical process expects the yield to be affected by the levels of two factors, $x_1$ and $x_2$. Observations recorded for various levels of the two factors are shown in Table 5.1. The analyst wants to fit a first-order regression model to the data. Interaction between $x_1$ and $x_2$ is not expected based on knowledge of similar processes. Units of the factor levels and the yield are ignored for the analysis.

 

Table 5.1: Observed yield data for various levels of two factors.

 

The data of Table 5.1 can be entered into DOE++ using the Multiple Regression tool as shown in Figure 5.7. A scatter plot for the data in Table 5.1 is shown in Figure 5.8. The first-order regression model applicable to this data set having two predictor variables is:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$$

where the dependent variable, $Y$, represents the yield and the predictor variables, $x_1$ and $x_2$, represent the two factors respectively. The $X$ and $y$ matrices for the data can be obtained as:

 

Figure 5.7: Multiple Regression tool in DOE++ with the data in Table 5.1.

 

 

Figure 5.8: Three dimensional scatter plot for the observed data in Table 5.1.

 
 

MATH

The least squares estimates, $\hat{\beta}$, can now be obtained:MATH

Thus:MATH

and the elements of $\hat{\beta}$ are the estimated regression coefficients $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\beta}_2$. The fitted regression model is:MATH

 

In DOE++, the fitted regression model can be viewed using the Show Analysis Summary icon in the Control Panel. The model is shown in Figure 5.9.

 

Figure 5.9: Equation of the fitted regression model for the data in Table 5.1.

 
A plot of the fitted regression plane is shown in Figure 5.10. The fitted regression model can be used to obtain fitted values, $\hat{y}_i$, corresponding to an observed response value, $y_i$. For example, the fitted value corresponding to the fifth observation, $\hat{y}_5$, is:MATH

 

 

Figure 5.10: Fitted regression plane for the data of Table 5.1.

 
The observed fifth response value is denoted $y_5$. The residual corresponding to this value, $e_5 = y_5 - \hat{y}_5$, is:MATH

 

In DOE++, fitted values and residuals are available using the Diagnostic icon in the Control Panel. The values are shown in Figure 5.11. The fitted regression model can also be used to predict response values. For example, to obtain the response value for a new observation corresponding to 47 units of $x_1$ and 31 units of $x_2$, the value is calculated using:MATH

Figure 5.11: Fitted values and residuals for the data in Table 5.1.
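Predicting the response at a new combination of factor levels amounts to evaluating the fitted model at that point. The sketch below assumes hypothetical, noise-free data generated from assumed coefficients (not the data or the fitted coefficients of Table 5.1), so the predicted value shown is not the value of the example. Only the new levels, 47 units of $x_1$ and 31 units of $x_2$, are taken from the text; the names `x_new` and `y_pred` are illustrative.

```python
import numpy as np

# Hypothetical data (NOT Table 5.1), generated without noise from
# assumed coefficients beta = (2, 3, -1).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 2.0 + 3.0 * x1 - 1.0 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# New observation: leading 1 for the intercept, then 47 units of x1 and 31 units of x2.
x_new = np.array([1.0, 47.0, 31.0])

# Predicted response: beta_0_hat + beta_1_hat * 47 + beta_2_hat * 31.
y_pred = x_new @ beta_hat

print(y_pred)  # close to 2 + 3*47 - 31 = 112 for the assumed coefficients
```

The new point here lies far outside the range of the hypothetical factor levels, so this sketch illustrates only the arithmetic; extrapolating a fitted model beyond the observed region is generally unreliable.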

Properties of the Least Squares Estimators, $\hat{\beta}_j$

The least squares estimates, $\hat{\beta}_0$, $\hat{\beta}_1$, ..., $\hat{\beta}_k$, are unbiased estimators of $\beta_0$, $\beta_1$, ..., $\beta_k$, provided that the random error terms, $\epsilon_i$, are normally and independently distributed. The variances of the $\hat{\beta}_j$s are obtained using the $(X'X)^{-1}$ matrix. The variance-covariance matrix of the estimated regression coefficients is obtained as follows:

$$C = \hat{\sigma}^2 (X'X)^{-1} \qquad (12)$$

 

$C$ is a symmetric matrix whose diagonal elements, $C_{jj}$, represent the variance of the estimated $j$th regression coefficient, $\hat{\beta}_j$. The off-diagonal elements, $C_{ij}$, represent the covariance between the $i$th and $j$th estimated regression coefficients, $\hat{\beta}_i$ and $\hat{\beta}_j$. The value of $\hat{\sigma}^2$ is obtained using the error mean square, $MS_E$, which can be calculated as discussed in the beginning of Chapter 5, Multiple Linear Regression Analysis. The variance-covariance matrix for the data in Table 5.1 is shown in Figure 5.12. It is available in DOE++ using the Show Analysis Summary icon in the Control Panel. Calculations to obtain the matrix are given in Example 5.3 in Chapter 5, Test on Individual Regression Coefficients. The positive square root of $C_{jj}$ represents the estimated standard deviation of the $j$th regression coefficient, $\hat{\beta}_j$, and is called the estimated standard error of $\hat{\beta}_j$ (abbreviated $se(\hat{\beta}_j)$):

$$se(\hat{\beta}_j) = \sqrt{C_{jj}} \qquad (13)$$

 

Figure 5.12: The variance-covariance matrix for the data of Table 5.1.
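The variance-covariance matrix of Eqn. (12) and the standard errors of Eqn. (13) can be sketched as follows. The data are hypothetical (not the values of Table 5.1), and the sketch assumes the usual definition of the error mean square, $MS_E = SS_E / (n - (k+1))$, where $SS_E$ is the sum of squared residuals; the chapter referenced above gives the full derivation. The names `ms_e`, `C` and `se` are illustrative.

```python
import numpy as np

# Hypothetical data (NOT Table 5.1): n = 6 observations, k = 2 predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 6.8, 7.2, 10.9, 11.1, 15.2])

X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape  # p = k + 1 estimated coefficients

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y   # Eqn. (8)
e = y - X @ beta_hat           # Eqn. (10)

# Assumed definition: MS_E = SS_E / (n - (k + 1)), used as the estimate of sigma^2.
ms_e = (e @ e) / (n - p)

C = ms_e * XtX_inv             # Eqn. (12): variance-covariance matrix
se = np.sqrt(np.diag(C))       # Eqn. (13): standard error of each coefficient

print(C)
print(se)
```

The explicit inverse is used here because $(X'X)^{-1}$ itself appears in Eqn. (12); for the coefficient estimates alone, solving the normal equations is sufficient.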

 
See Also:
 
Multiple Linear Regression Analysis
Hypothesis Tests in Multiple Linear Regression
Test on Individual Regression Coefficients