The linear regression model may not be directly applicable to certain data. Non-linearity may be detected from scatter plots, or it may be known from the underlying theory of the product or process or from past experience. Transformations on either the predictor variable, $x$, or the response variable, $Y$, are often sufficient to make the linear regression model appropriate for the transformed data.
If it is known that the data follow the logarithmic distribution, then a logarithmic transformation on $Y$ (i.e., $Y^* = \log Y$) might be useful. For data following the Poisson distribution, a square root transformation ($Y^* = \sqrt{Y}$) is generally applicable.
Transformations on $Y$ may also be applied based on the type of scatter plot obtained from the data. Figure 4.17 shows a few such examples. For the scatter plot of Figure 4.17(a), a square root transformation ($Y^* = \sqrt{Y}$) is applicable, while for Figure 4.17(b) a logarithmic transformation (i.e., $Y^* = \log Y$) may be applied. For Figure 4.17(c), the reciprocal transformation ($Y^* = 1/Y$) is applicable. At times it may be helpful to introduce a constant into the transformation of $Y$. For example, if some values of $Y$ are negative and the logarithmic transformation on $Y$ seems applicable, a suitable constant, $k$, may be chosen to make all observed values of $k + Y$ positive. The transformation in this case would be $Y^* = \log(k + Y)$.
Figure 4.17: Transformations on $Y$ for a few possible scatter plots. Plot (a) may require $Y^* = \sqrt{Y}$, (b) may require $Y^* = \log Y$, and (c) may require $Y^* = 1/Y$.
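The transformations described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the helper names `transform_response` and `fit_line` are hypothetical, the natural logarithm is assumed (the text does not fix a base), and the data are synthetic, constructed so that $\log(k + Y)$ is exactly linear in $x$.

```python
import numpy as np

def transform_response(y, kind, k=0.0):
    """Return Y* for the named transformation; k is an optional shift constant."""
    shifted = np.asarray(y, dtype=float) + k
    if kind == "sqrt":
        if np.any(shifted < 0):
            raise ValueError("square root needs k + Y >= 0")
        return np.sqrt(shifted)
    if kind == "log":
        if np.any(shifted <= 0):
            raise ValueError("logarithm needs k + Y > 0")
        return np.log(shifted)  # natural log (assumed base)
    if kind == "reciprocal":
        if np.any(shifted == 0):
            raise ValueError("reciprocal needs k + Y != 0")
        return 1.0 / shifted
    raise ValueError(f"unknown transformation: {kind}")

def fit_line(x, y_star):
    """Least-squares fit of Y* = b0 + b1*x; returns (b0, b1, r_squared)."""
    x = np.asarray(x, dtype=float)
    b1, b0 = np.polyfit(x, y_star, 1)  # coefficients come highest degree first
    residuals = y_star - (b0 + b1 * x)
    ss_e = np.sum(residuals ** 2)
    ss_t = np.sum((y_star - np.mean(y_star)) ** 2)
    return b0, b1, 1.0 - ss_e / ss_t

# Synthetic (hypothetical) data: exponential growth shifted below zero,
# so the first observation is negative and a plain log(Y) would fail.
x = np.arange(1.0, 11.0)
y = np.exp(0.5 * x) - 3.0

k = 3.0  # chosen so that k + Y > 0 for every observation
b0, b1, r2 = fit_line(x, transform_response(y, "log", k=k))
print(f"intercept={b0:.4f}, slope={b1:.4f}, R^2={r2:.6f}")
```

In practice $k$ is not known in advance; any value larger than the negative of the smallest observed $Y$ makes the logarithm defined, and the choice can affect how linear the transformed data look.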
The Box-Cox method may also be used to automatically identify a suitable power transformation for the data based on the relation:
$$Y^* = Y^{\lambda}$$
Here the parameter $\lambda$ is determined using the given data such that the error sum of squares, $SS_E$, is minimized (details on this method are presented in Chapter 6).
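As a rough sketch of how such a search might work, the following Python code tries a grid of power values and keeps the one with the smallest error sum of squares from a straight-line fit. Several things here are assumptions rather than part of this section: the function names `boxcox_sse` and `best_lambda` are hypothetical, the grid range of -2 to 2 is arbitrary, and the transform is rescaled by the geometric mean of $Y$ (a commonly used normalization that keeps the error sums of squares comparable across power values, with the logarithm used when the power is zero). The method presented in Chapter 6 may differ in its details.

```python
import numpy as np

def boxcox_sse(x, y, lam):
    """SS_E of a straight-line fit to the scaled power transform of y (y > 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))  # geometric mean of the response
    if abs(lam) < 1e-12:
        y_star = gm * np.log(y)      # limiting case as lambda -> 0
    else:
        y_star = (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
    b1, b0 = np.polyfit(x, y_star, 1)
    residuals = y_star - (b0 + b1 * x)
    return float(np.sum(residuals ** 2))

def best_lambda(x, y, grid=None):
    """Return the grid value of lambda with the smallest SS_E."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 41)  # assumed search range, step 0.1
    sse = [boxcox_sse(x, y, lam) for lam in grid]
    return float(grid[int(np.argmin(sse))])

# Synthetic (hypothetical) data: sqrt(Y) is exactly linear in x,
# so the search should settle on lambda = 0.5.
x = np.arange(1.0, 13.0)
y = (2.0 + 3.0 * x) ** 2

lam_hat = best_lambda(x, y)
print(f"selected lambda: {lam_hat:.2f}")
```

A grid search is used only because it is easy to follow; a finer grid or a one-dimensional optimizer could refine the estimate. With real, noisy data the selected value is usually rounded to a convenient power such as 0.5, 0, or -1.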