Reliability HotWire

Issue 90, August 2008

Reliability Basics

Estimating the Expected Number of Failures for Items with Minimal or Perfect Repair

The area under the failure rate curve constructed from the first times-to-failure of a set of components is often used as an estimate of the number of spare parts needed during a mission. This method of estimating spares is applicable only under certain conditions. The primary assumptions of the method are that minimal repair is performed on the components in the population and that the size of the population remains constant over time. However, in many cases, reliability engineers erroneously apply the method to estimate the number of failures of non-repairable components. This article uses an example to clarify when it is appropriate to estimate the number of failures using the area under the failure rate curve and when other methods must be employed.

Example
A company has produced a new widget and the management has decided (without performing a reliability study) that they will provide a 300-hour warranty on the component. The management wants to minimize warranty costs on a population of 20 fielded widgets. Each new widget costs $10,000 to produce. It is up to the engineer to determine the best policy to keep the warranty costs to a minimum.

The engineer is given five widgets to test. The widgets are all tested to failure, and the following times-to-failure are obtained: 75 hours, 123 hours, 164 hours, 170 hours and 197 hours. The engineer enters these failure times into a Weibull++ Folio and calculates the parameters using rank regression on X and a 2-parameter Weibull distribution. The resulting parameters are β = 2.7948 and η = 164.9250 hours, as shown in Figure 1.

Figure 1: Computation of Weibull Parameters from Component Test Data
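For readers without Weibull++, the fit can be approximated with a short script. This is a minimal sketch, assuming that rank regression on X means regressing ln(t) on ln(-ln(1 - F)) with exact median ranks as the plotting positions; the function names `median_rank` and `fit_weibull_rrx` are hypothetical and are not part of any ReliaSoft product.

```python
import math

def median_rank(i, n, tol=1e-12):
    """Exact median rank of the i-th of n ordered failures.

    Solves P(at least i of n units have failed) = 0.5 for the unreliability F
    by bisection on the cumulative binomial tail.
    """
    def tail(f):
        return sum(math.comb(n, k) * f**k * (1 - f)**(n - k)
                   for k in range(i, n + 1))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tail(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def fit_weibull_rrx(times):
    """2-parameter Weibull fit by least squares of x = ln(t) on y = ln(-ln(1 - F))."""
    t = sorted(times)
    n = len(t)
    x = [math.log(v) for v in t]
    y = [math.log(-math.log(1 - median_rank(i, n))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_yy                 # x = intercept + slope * y, slope = 1 / beta
    intercept = x_bar - slope * y_bar   # intercept = ln(eta)
    return 1 / slope, math.exp(intercept)

beta, eta = fit_weibull_rrx([75, 123, 164, 170, 197])
print(f"beta = {beta:.4f}, eta = {eta:.4f} hours")
```

Under these assumptions the estimates should land close to the β and η reported above; small differences would point to a different plotting-position convention.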

The engineer's first thought is to determine how many widgets will fail during the warranty period. In other words, he wants to determine the number of components out of the original 20 that will fail during a 300-hour mission. He recalls that the cumulative distribution function, or cdf, denoted by F(T), describes the probability that a component will fail during a mission of duration T. Therefore, with N = 20 fielded widgets, he determines the expected number of first failures using the following equation:

\[ N_{\text{first}} = N \cdot F(T) = 20 \cdot F(300) = 20\left[1 - e^{-(300/\eta)^{\beta}}\right] \]

First he uses the Quick Calculation Pad to determine the percent expected to fail by 300 hours, as shown in Figure 2, and he then multiplies this value by 20 to find that all 20 original components are expected to fail by 300 hours.

Figure 2: Determining the Percent of Units that Experience at Least One Failure at Time = 300 hours
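The Quick Calculation Pad result can be cross-checked by evaluating the Weibull cdf directly. The sketch below uses the fitted parameters from the article; the variable and function names are illustrative only.

```python
import math

BETA = 2.7948           # Weibull shape parameter from the fit
ETA = 164.9250          # Weibull scale parameter, hours
N_UNITS = 20            # fielded widgets
WARRANTY_HOURS = 300.0

def weibull_cdf(t, beta, eta):
    """Probability that a new unit fails by age t."""
    return 1.0 - math.exp(-((t / eta) ** beta))

p_fail = weibull_cdf(WARRANTY_HOURS, BETA, ETA)
expected_first_failures = N_UNITS * p_fail
print(f"F(300) = {p_fail:.4f}")
print(f"Expected first failures = {expected_first_failures:.2f} of {N_UNITS}")
```

Because F(300) is very close to 1, the product rounds to all 20 units, which matches the engineer's finding. Note that this counts only the first failure of each unit, not any failures after repair or replacement.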

The engineer knows he has two options. He could repair the widgets upon failure and send them back into the field, or he could replace the widgets with new ones. There are two extreme situations he wants to consider. One is to perform the minimum amount of repair to keep the original 20 widgets operating through 300 hours, and the other is to replace each failed widget with a new one.

The engineer estimates that it would be possible to repair the widget for an average of $4,500 including parts and labor. For his analysis, the engineer assumes that the widget would undergo minimal repair, which means that the age of the widget when it is put back into service is identical to its age at failure.

He recalls that the failure rate function, or hazard function, denoted by λ(T), describes the instantaneous rate of failure (failures per unit time) for a component of age T. Integrating the failure rate function over the warranty period will provide the expected number of (minimal) repairs necessary per component in the population. So, with N = 20 fielded widgets, the expected number of repairs the engineer will need to perform is given by:

\[ N_{\text{repairs}} = N \int_{0}^{300} \lambda(T)\,dT = 20\left(\frac{300}{\eta}\right)^{\beta} \]

To calculate this value using Weibull++, the engineer adds a General Spreadsheet to the project. He then uses the built-in function EFAILURES() to compute the expected number of failures in the interval from 0.0001 hours to 300 hours, as shown in Figure 3. (Note that Weibull++ does not allow time equal to zero as a limit for these calculations because the failure rate can be undefined at this time, depending on the parameters of the chosen distribution.)

Figure 3: Using the EFAILURES Function in a Weibull++ General Spreadsheet
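The area under the failure rate curve can also be checked outside Weibull++. The following is a sketch that integrates the Weibull hazard numerically with Simpson's rule and compares it with the closed-form cumulative hazard; it is not a description of how EFAILURES is implemented, and the function names are hypothetical.

```python
import math

BETA = 2.7948     # Weibull shape parameter
ETA = 164.9250    # Weibull scale parameter, hours

def weibull_hazard(t, beta, eta):
    """Failure rate of a unit of age t, in failures per hour."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def area_under_hazard(t_start, t_end, beta, eta, steps=10_000):
    """Composite Simpson's rule for the integral of the hazard on [t_start, t_end]."""
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even step count
    h = (t_end - t_start) / steps
    total = weibull_hazard(t_start, beta, eta) + weibull_hazard(t_end, beta, eta)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * weibull_hazard(t_start + k * h, beta, eta)
    return total * h / 3

numeric = area_under_hazard(0.0001, 300.0, BETA, ETA)
closed_form = (300.0 / ETA) ** BETA - (0.0001 / ETA) ** BETA
print(f"numeric = {numeric:.4f}, closed form = {closed_form:.4f}")
```

For a Weibull distribution the integral has the closed form (T/η)^β, so the numerical integration is only there to make the "area under the curve" interpretation explicit.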

The engineer determines that he will need to repair each widget an average of 5.32 times, as shown in Figure 4.


Figure 4: Results of EFAILURES Function

Thus, for his 20 fielded components, he will need to make about 100 repairs (20 × 5.32 ≈ 106, which he rounds to 100), and will need a warranty budget of about $450,000 (100 repairs at $4,500 each).

Next, the engineer considers the case where each failed widget is replaced with a new one. This situation is referred to as "perfect repair." In order to address this situation, the engineer decides to simulate his population of 20 components using a reliability block diagram in BlockSim. He uses a single block to represent his 20 components, as shown in Figure 5.

Figure 5: Creating a Population of 20 Fielded Components

The engineer imports the parameters of the component failure distribution from his Weibull++ spreadsheet and specifies corrective maintenance of zero duration, as shown in Figures 6 and 7.

Figure 6: Importing the Failure Distribution from Weibull++ to BlockSim

Figure 7: Specifying the Corrective Maintenance Distribution for the 20 Fielded Components

The engineer runs 1,000 simulations, each lasting 300 hours. These settings are shown in Figure 8.

Figure 8: Specifying the Simulation Settings in BlockSim

The predicted number of failures is computed by summing the Expected NOF column in the Block Summary section of the simulation results, as shown in Figure 9.


Figure 9: Expected Total Number of Replacements for 20 Fielded Components
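The perfect repair case can be approximated with a small Monte Carlo simulation outside BlockSim. This is a rough sketch, assuming instantaneous replacement and independent units; it uses Python's standard `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` is the shape. The function name `simulate_replacements` and the seed are illustrative and do not reflect BlockSim's internals.

```python
import random

BETA = 2.7948     # Weibull shape parameter
ETA = 164.9250    # Weibull scale parameter, hours

def simulate_replacements(n_units, mission_hours, beta, eta, n_runs, seed=0):
    """Average number of replacements per run for a fleet under perfect repair."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        for _ in range(n_units):
            # Each replacement is a brand-new unit, so every life is a fresh draw.
            clock = rng.weibullvariate(eta, beta)
            while clock <= mission_hours:
                total += 1
                clock += rng.weibullvariate(eta, beta)
    return total / n_runs

mean_replacements = simulate_replacements(
    n_units=20, mission_hours=300.0, beta=BETA, eta=ETA, n_runs=1000)
print(f"Mean replacements for 20 units over 300 hours: {mean_replacements:.1f}")
```

The average depends slightly on the seed and the number of runs, so it should be compared with the BlockSim result only approximately.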

The engineer concludes that, if the organization chooses to replace failed widgets with new ones, 33 new widgets would be needed to support the population of 20 fielded components during the warranty. For this case, he would need to allocate $330,000 in warranty costs. Comparing this cost to the estimated $450,000 required to support a minimal repair policy, the engineer concludes that it is better to replace the failed widgets with new ones than to repair the fielded widgets.
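The cost comparison is simple arithmetic, sketched below with the article's figures. The break-even repair cost is an added illustration, not a number from the original analysis; it is the per-repair cost at which the two policies would cost the same given the article's counts.

```python
REPAIR_COST = 4_500          # dollars per minimal repair (article's estimate)
REPLACEMENT_COST = 10_000    # dollars per new widget

minimal_repair_count = 100   # article's rounded figure for 20 widgets
replacement_count = 33       # article's simulation result for 20 widgets

cost_repair = minimal_repair_count * REPAIR_COST
cost_replace = replacement_count * REPLACEMENT_COST

# Same comparison without rounding 20 * 5.32 down to 100 repairs.
cost_repair_unrounded = 20 * 5.32 * REPAIR_COST

# Per-repair cost at which minimal repair would match replacement.
break_even_repair_cost = cost_replace / minimal_repair_count

print(f"Minimal repair policy: ${cost_repair:,}")
print(f"Replacement policy:    ${cost_replace:,}")
print(f"Break-even repair cost: ${break_even_repair_cost:,.0f}")
```

With or without the rounding, replacement is the cheaper policy at these unit costs.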

Both methods of estimating the number of failures shown in this example can be correct, but they depend on different assumptions about the efficacy of the repair. When repair is assumed to make the component "as bad as old," it is appropriate to estimate the number of failures using the area under the failure rate curve. The method of estimation using BlockSim, on the other hand, can be made appropriate for any level of repair from "as good as new," as shown here, to "as bad as old," by adjusting the restoration factor on the Corrective Maintenance page of the Block Properties window. Here, a restoration factor of 1 represents perfect repair, whereas a restoration factor of 0 would represent minimal repair. For more information on restoration factors, please see the Reliability HotWire article, "Restoration Factors in BlockSim."
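To see how the two extremes connect, the sketch below simulates a single widget with a restoration factor between 0 and 1. It assumes a Kijima Type II-style virtual age, in which each repair scales the unit's accumulated age by (1 - RF); this is one common partial-repair model and is an assumption here, not a statement of how BlockSim applies its restoration factor. The function name `mean_failures` is hypothetical.

```python
import math
import random

BETA = 2.7948     # Weibull shape parameter
ETA = 164.9250    # Weibull scale parameter, hours

def mean_failures(rf, mission_hours, beta, eta, n_histories=20_000, seed=1):
    """Average failures per unit over the mission for restoration factor rf."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_histories):
        clock = 0.0
        virtual_age = 0.0
        while True:
            u = 1.0 - rng.random()      # uniform on (0, 1], avoids log(0)
            # Time to next failure, conditional on surviving to virtual_age.
            x = eta * ((virtual_age / eta) ** beta - math.log(u)) ** (1 / beta) - virtual_age
            clock += x
            if clock > mission_hours:
                break
            total += 1
            # Assumed repair model: rf = 1 resets age to 0, rf = 0 keeps full age.
            virtual_age = (1.0 - rf) * (virtual_age + x)
    return total / n_histories

minimal = mean_failures(0.0, 300.0, BETA, ETA)   # as bad as old
partial = mean_failures(0.5, 300.0, BETA, ETA)   # in between
perfect = mean_failures(1.0, 300.0, BETA, ETA)   # as good as new
print(f"RF = 0: {minimal:.2f}  RF = 0.5: {partial:.2f}  RF = 1: {perfect:.2f} failures per unit")
```

With RF = 0 the per-unit average should approach the area under the failure rate curve, and with RF = 1 it should approach the per-unit replacement rate implied by the perfect repair simulation.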

Copyright 2008 ReliaSoft Corporation, ALL RIGHTS RESERVED