Estimating the Expected Number of Failures for Items with Minimal or Perfect Repair
The area under the
failure rate curve constructed from the first times-to-failure
of a set of components is often used as an estimate of the
number of spare parts needed during a mission. This method of
estimating spares is applicable only under certain conditions.
The primary assumptions of the method are that minimal repair is
performed on the components in the population and that the size
of the population remains constant over time. However, in many
cases, reliability engineers erroneously apply the method to
estimate the number of failures of nonrepairable components.
This article uses an example to clarify when it is appropriate
to estimate the number of failures using the area under the
failure rate curve and when other methods must be employed.
Example
A company has produced a new widget and the management has
decided (without performing a reliability study) that they will
provide a 300-hour warranty on the component. The management
wants to minimize warranty costs on a population of 20 fielded
widgets. Each new widget costs $10,000 to produce. It is up to
the engineer to determine the best policy to keep the warranty
costs to a minimum.
The engineer is given five widgets to test. The
widgets are all tested to failure, and the following
times-to-failure are obtained: 75 hours, 123 hours, 164 hours,
170 hours and 197 hours. The engineer enters these failure times
into a Weibull++ Folio and calculates the parameters using rank
regression on X and a 2-parameter Weibull distribution. The resulting parameters
are β = 2.7948 and η = 164.9250 hours, as shown in Figure 1.
Figure 1: Computation of Weibull Parameters from Component Test
Data
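For readers without Weibull++, the rank regression on X fit can be sketched in a few lines of Python. This is a minimal sketch, not the Weibull++ algorithm: it assumes Benard's approximation for the median ranks, and the helpers `median_rank_benard` and `weibull_rrx` are hypothetical names introduced here. Weibull++ may compute median ranks differently, so the estimates should land close to, but not exactly on, β = 2.7948 and η = 164.9250 hours.

```python
import math

def median_rank_benard(i: int, n: int) -> float:
    """Benard's approximation to the median rank of the i-th of n ordered failures."""
    return (i - 0.3) / (n + 0.4)

def weibull_rrx(times):
    """Fit a 2-parameter Weibull by rank regression on X.

    Linearized cdf: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta),
    so regressing x = ln(t) on y = ln(-ln(1 - F)) gives
    x = ln(eta) + (1 / beta) * y.
    """
    t = sorted(times)
    n = len(t)
    x = [math.log(ti) for ti in t]
    y = [math.log(-math.log(1.0 - median_rank_benard(i, n))) for i in range(1, n + 1)]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_yy                 # equals 1 / beta
    intercept = x_bar - slope * y_bar   # equals ln(eta)
    return 1.0 / slope, math.exp(intercept)

beta, eta = weibull_rrx([75, 123, 164, 170, 197])
print(f"beta = {beta:.4f}, eta = {eta:.4f} hours")
```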
The engineer's first thought is to determine how many widgets will fail
during the warranty period. In other words, he wants to determine the
number of components out of the original 20 that will fail during a
300-hour mission. He recalls that the cumulative distribution function,
or cdf, denoted by F(T), describes the probability that a component will
fail during a mission of duration T. Therefore, he determines the
expected number of first failures using the following equation:

$$N_{F} = N \cdot F(T), \qquad F(T) = 1 - e^{-(T/\eta)^{\beta}}$$

where N = 20 is the number of fielded widgets and T = 300 hours is the
warranty period.
First he uses the Quick Calculation Pad to determine the percentage of
units expected to fail by 300 hours, as shown in Figure 2. Multiplying
this value (about 99.5%) by 20 gives approximately 19.9, so essentially
all 20 original components are expected to fail by 300 hours.
Figure 2: Determining the Percent of Units that Experience at
Least One Failure at Time = 300 hours
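The same calculation can be reproduced outside the Quick Calculation Pad. The sketch below assumes the Weibull parameters from Figure 1 and evaluates F(300) directly from the 2-parameter Weibull cdf; `weibull_cdf` is a hypothetical helper, not a Weibull++ function.

```python
import math

BETA = 2.7948   # Weibull shape from Figure 1
ETA = 164.9250  # Weibull scale in hours from Figure 1

def weibull_cdf(t: float, beta: float = BETA, eta: float = ETA) -> float:
    """Probability that a new unit fails by age t: F(t) = 1 - exp(-(t/eta)^beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

n_fielded = 20
warranty = 300.0
p_fail = weibull_cdf(warranty)
expected_first_failures = n_fielded * p_fail
print(f"F(300) = {p_fail:.4f}; expected first failures = {expected_first_failures:.2f}")
```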
The engineer knows
he has two options. He could repair the widgets upon failure and
send them back into the field, or he could replace the widgets
with new ones. There are two extreme situations he wants to
consider. One is to perform the minimum amount of repair to keep
the original 20 widgets operating through 300 hours, and the
other is to replace each failed widget with a new one.
The engineer estimates that it would be possible to repair
the widget for an average of $4,500, including parts and labor.
For his analysis, the engineer assumes that the widget would
undergo minimal repair, which means that the age of the widget
when it is put back into service is identical to its age at
failure.
He recalls that the failure rate function, or hazard function, denoted
by λ(T), describes the instantaneous rate of failure of a component of
age T (given that it has survived to T). Integrating the failure rate
function over the warranty period gives the expected number of
(minimal) repairs per component in the population. So the expected
number of repairs the engineer will need to perform for all 20 widgets
is given by:

$$N_{R} = N \int_{0}^{T} \lambda(t)\,dt, \qquad N = 20,\; T = 300 \text{ hours}$$
In order to
calculate this value using
Weibull++, the engineer adds a General Spreadsheet to the
project. He then uses the built-in function EFAILURES() to
compute the expected number of failures in the interval from
0.0001 hours to 300 hours, as shown in Figure 3. (Note that
Weibull++ does not allow time equal to zero as a limit for
these calculations since failure rate can be undefined at this
time depending on the parameters of the chosen distribution.)
Figure 3: Using the EFAILURES Function in a Weibull++ General
Spreadsheet
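EFAILURES() is a Weibull++ spreadsheet function, and its internals are not described here. As a hedged cross-check, assuming that it integrates the failure rate over the given interval, the sketch below integrates the Weibull failure rate numerically from 0.0001 to 300 hours with Simpson's rule and compares the result with the closed-form cumulative hazard. For these parameters, starting at 0.0001 rather than 0 changes the result negligibly.

```python
BETA = 2.7948   # Weibull shape from Figure 1
ETA = 164.9250  # Weibull scale in hours from Figure 1

def weibull_hazard(t: float) -> float:
    """Weibull failure rate: lambda(t) = (beta/eta) * (t/eta)^(beta - 1)."""
    return (BETA / ETA) * (t / ETA) ** (BETA - 1.0)

def simpson(f, a: float, b: float, n: int = 10_000) -> float:
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

t_start, t_end = 0.0001, 300.0  # same limits as the spreadsheet example
numeric = simpson(weibull_hazard, t_start, t_end)
# Closed form: H(t_end) - H(t_start), with H(t) = (t/eta)^beta
closed_form = (t_end / ETA) ** BETA - (t_start / ETA) ** BETA
print(f"numeric = {numeric:.4f}, closed form = {closed_form:.4f}")
```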
The engineer
determines that he will need to repair each widget an average of
5.32 times, as shown in Figure 4.
Figure 4: Results of EFAILURES
Function
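The value in Figure 4 can also be checked by hand. For a 2-parameter Weibull distribution the failure rate is λ(t) = (β/η)(t/η)^(β-1), and its integral has a closed form; the derivation below is a check added here using the Figure 1 parameters, not output from the software:

$$
\int_{0}^{T} \lambda(t)\,dt = \int_{0}^{T} \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} dt = \left(\frac{T}{\eta}\right)^{\beta} = \left(\frac{300}{164.9250}\right)^{2.7948} \approx 5.32
$$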
Thus, for his 20 fielded components, he will need to make roughly 100
repairs (20 × 5.32 ≈ 106), and will need a warranty budget of at least
$450,000 (about $479,000 for 106 repairs at $4,500 each).
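Minimal repair can also be checked by simulation. Under minimal repair, each widget's failures follow a nonhomogeneous Poisson process whose intensity is the failure rate λ(t), so the cumulative hazard accumulated between successive failures is exponentially distributed with mean 1. The sketch below assumes the Figure 1 parameters and the $4,500 repair cost; the helper names and the random seed are illustrative choices. The simulated mean should agree with the analytic 5.32 repairs per widget to within sampling noise, and the printed budget uses the unrounded figure.

```python
import random

BETA = 2.7948        # Weibull shape (Figure 1)
ETA = 164.9250       # Weibull scale in hours (Figure 1)
WARRANTY = 300.0     # hours
REPAIR_COST = 4_500  # dollars per minimal repair
N_UNITS = 20
RUNS = 1000

def cumulative_hazard(t: float) -> float:
    """Area under the Weibull failure rate curve from 0 to t."""
    return (t / ETA) ** BETA

H_END = cumulative_hazard(WARRANTY)

def minimal_repair_failures(rng: random.Random) -> int:
    """Failures of one widget in [0, WARRANTY] under minimal repair.

    Each repair leaves the age unchanged, so failures form a nonhomogeneous
    Poisson process; the cumulative hazard between failures is Exp(1).
    """
    h, count = 0.0, 0
    while True:
        h += rng.expovariate(1.0)
        if h > H_END:
            return count
        count += 1

rng = random.Random(12345)
samples = [minimal_repair_failures(rng) for _ in range(N_UNITS * RUNS)]
mean_per_unit = sum(samples) / len(samples)
fleet_repairs = N_UNITS * H_END

print(f"simulated repairs per widget: {mean_per_unit:.2f}")
print(f"analytic repairs per widget:  {H_END:.2f}")
print(f"expected fleet repairs:       {fleet_repairs:.1f}")
print(f"expected repair budget:       ${fleet_repairs * REPAIR_COST:,.0f}")
```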
Next, the engineer
considers the case where each failed widget is replaced with a
new one. This situation is referred to as "perfect repair." In
order to address this situation, the engineer decides to
simulate his population of 20 components using a reliability
block diagram in
BlockSim. He
uses a single block to represent his 20 components, as shown in
Figure 5.
Figure 5: Creating a Population of 20 Fielded Components
The engineer
imports the parameters of the component failure distribution
from his Weibull++ spreadsheet and specifies corrective
maintenance of zero duration, as shown in Figures 6 and 7.
Figure 6: Importing the Failure Distribution from Weibull++ to
BlockSim
Figure 7: Specifying the Corrective Maintenance Distribution for
the 20 Fielded Components
The engineer runs 1,000 simulations, each lasting 300 hours. These
settings are shown in Figure 8.
Figure 8: Specifying the Simulation Settings in BlockSim
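For readers who want to see what such a simulation does, the sketch below mimics the setup described above in plain Python: 20 field positions, instantaneous replacement with a new unit on each failure (zero-duration corrective maintenance), and 1,000 runs of 300 hours each. It is an independent renewal-process sketch, not BlockSim, and it assumes that every replacement draws a fresh life from the Figure 1 Weibull distribution. Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second. The mean should come out in the low 30s, though the exact value depends on the random seed.

```python
import random

BETA = 2.7948     # Weibull shape (Figure 1)
ETA = 164.9250    # Weibull scale in hours (Figure 1)
WARRANTY = 300.0  # simulation end time, hours
N_UNITS = 20      # fielded positions
RUNS = 1000       # number of simulations

def replacements_in_position(rng: random.Random) -> int:
    """Replacements in one field position over [0, WARRANTY] when every failed
    widget is swapped instantly for a new one (perfect repair)."""
    t, count = 0.0, 0
    while True:
        t += rng.weibullvariate(ETA, BETA)  # life of a brand-new widget
        if t > WARRANTY:
            return count
        count += 1

rng = random.Random(2024)
run_totals = [
    sum(replacements_in_position(rng) for _ in range(N_UNITS))
    for _ in range(RUNS)
]
mean_replacements = sum(run_totals) / RUNS
print(f"mean replacements for {N_UNITS} widgets over {WARRANTY:.0f} h: {mean_replacements:.1f}")
```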
The predicted
number of failures is computed by summing the Expected NOF
column in the Block Summary section of the simulation results,
as shown in Figure 9.
Figure 9: Expected Total Number of Replacements for 20 Fielded
Components
The engineer concludes that, if the organization chooses to replace
failed widgets with new ones, 33 new widgets would be needed to support
the population of 20 fielded components during the warranty. At $10,000
per widget, he would need to allocate $330,000 in warranty costs.
Comparing this cost to the estimated $450,000 required to
support a minimal repair policy, the engineer concludes it is
better to replace the failed widgets with new ones than to
repair the fielded widgets.
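The budget comparison itself is simple arithmetic, sketched below. It assumes 5.32 repairs per widget (Figure 4) and 33 replacements (Figure 9); using the unrounded 5.32 gives a minimal-repair budget somewhat above $450,000, which only strengthens the conclusion.

```python
REPAIR_COST = 4_500    # dollars per minimal repair
WIDGET_COST = 10_000   # dollars per new widget
N_UNITS = 20

repairs_per_widget = 5.32   # area under the failure rate curve (Figure 4)
new_widgets_needed = 33     # summed Expected NOF (Figure 9)

minimal_repair_budget = N_UNITS * repairs_per_widget * REPAIR_COST
replacement_budget = new_widgets_needed * WIDGET_COST

if replacement_budget < minimal_repair_budget:
    cheaper_policy = "replace with new widgets"
else:
    cheaper_policy = "minimal repair"

print(f"minimal repair budget: ${minimal_repair_budget:,.0f}")
print(f"replacement budget:    ${replacement_budget:,.0f}")
print(f"cheaper policy:        {cheaper_policy}")
```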
Both methods of
estimating the number of failures shown in this example can be
correct, but they depend on different assumptions about the
efficacy of the repair. When repair is assumed to make the
component "as bad as old," it is appropriate to estimate the
number of failures using the area under the failure rate curve.
The method of estimation using BlockSim, on the other
hand, can be made appropriate for any level of repair from "as
good as new," as shown here, to "as bad as old," by adjusting
the restoration factor on the Corrective Maintenance page of the
Block Properties window. Here, a restoration factor of 1
represents perfect repair, whereas a restoration factor of 0
would represent minimal repair. For more information on
restoration factors, please see the Reliability HotWire
article, "Restoration
Factors in BlockSim."
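To illustrate how a single parameter can span both extremes, the sketch below uses a Kijima Type I virtual-age model, in which each repair removes a fraction RF of the age added since the previous repair. This is an assumed model chosen for illustration; BlockSim's own treatment of restoration factors is described in the article cited above and may differ in detail. With RF = 1 the sketch reduces to replacement with a new unit, and with RF = 0 it reduces to minimal repair, so the two extremes should land near the two estimates in this example. The helper names and seed are hypothetical.

```python
import random

BETA = 2.7948     # Weibull shape (Figure 1)
ETA = 164.9250    # Weibull scale in hours (Figure 1)
WARRANTY = 300.0  # hours
N_UNITS = 20

def next_failure_gap(v: float, rng: random.Random) -> float:
    """Time to the next failure for a unit with virtual age v, sampled from the
    Weibull distribution conditioned on survival to v:
    H(v + x) - H(v) = E with E ~ Exp(1) and H(t) = (t/eta)^beta."""
    e = rng.expovariate(1.0)
    return ETA * ((v / ETA) ** BETA + e) ** (1.0 / BETA) - v

def failures_with_rf(rf: float, rng: random.Random) -> int:
    """Failures of one unit up to WARRANTY under an assumed Kijima Type I model."""
    t = v = 0.0
    count = 0
    while True:
        x = next_failure_gap(v, rng)
        t += x
        if t > WARRANTY:
            return count
        count += 1
        v += (1.0 - rf) * x  # rf = 1: as good as new; rf = 0: as bad as old

def mean_failures(rf: float, runs: int = 1000, seed: int = 7) -> float:
    """Average total failures across N_UNITS units per simulated run."""
    rng = random.Random(seed)
    total = sum(failures_with_rf(rf, rng) for _ in range(N_UNITS * runs))
    return total / runs

for rf in (0.0, 0.5, 1.0):
    print(f"RF = {rf:.1f}: expected failures for {N_UNITS} units = {mean_failures(rf):.1f}")
```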
