Reliability Growth Planning
[Editor's Note: This article has been updated
since its original publication to reflect a more recent
version of the software interface.]
The objective of reliability growth testing is to increase a system’s
reliability to a particular goal or requirement through the discovery of failure
modes and the implementation of corrective actions. A question that often
arises when setting up a reliability growth program is whether the reliability
goal will be met in the allocated test time. Alternatively, one may need to know
how many systems should be allocated for growth testing or how long the
growth test should last in order to meet the goal. The Growth Planning tool
in RGA can help answer those questions. In this article we present the Growth
Planning tool and provide an example of its use.
Introduction
The Growth Planning tool in RGA is based on the Crow Extended
model. This planning model is similar to the MIL-HDBK-189 [1]
growth curve with the major distinction that the growth curve in the military
handbook is based on the Crow-AMSAA (NHPP) model. Therefore, using MIL-HDBK-189
for growth planning assumes that the corrective actions for the observed failure
modes are incorporated during the test and at the specific time of failure.
However, in actual practice, some minor corrective actions may be implemented
during the test while others that require more investigation may be delayed until
after the completion of the test and some may not be fixed at all. Using the
Crow Extended model for growth planning allows for additional inputs to account
for a specific management strategy as well as delayed fixes with specified
effectiveness factors.
Before we look at an example of how the planning tool can be utilized
in RGA, let us first go over the definitions of the required inputs to
the model and the calculated outputs. Note that the math behind the planning
model is beyond the scope of this article. For more details on the model
please refer to the Reliability Growth
Planning chapter of the Reliability Growth & Repairable System
Analysis Reference [2].
Inputs to the Planning Model
 Initial MTBF is the MTBF of the system before the reliability
growth testing begins. It can be determined by some initial testing or
through historical information, engineering expertise and/or
reliability predictions.
 Goal MTBF is the MTBF requirement of the system.
 Growth Potential (GP) Design Margin is a "safety
factor" that can be adjusted to make sure that the desired reliability
growth will be reached. The higher the GP Design Margin, the smaller the risk
that the reliability observed in the field will fall below the requirement
but, at the same time, the more rigorous the reliability growth
program must be. Typically, the GP Design Margin takes values
between 1.2 and 1.5.
 Average Effectiveness Factor describes how effective corrective actions
are, on average, in reducing the failure intensity of the modes they
address. It can be determined based on engineering expertise, specific
product complexity, prior history, etc. An average value is used because
failure modes are rarely totally eliminated by a corrective action. After
failure modes have been found and fixed, a certain percentage of the failure
intensity will remain in the system. The Effectiveness Factor is the
fractional decrease in a mode's failure intensity after implementing the
corrective action. Typically, about 30% of the failure intensity
for the failure modes that are addressed will remain in the system after
implementing all of the corrective actions. Therefore, in many
reliability growth programs the average effectiveness factor is 0.7.
 Management Strategy determines the percentage of the unique
failure modes discovered during the test that will be addressed
(i.e., fixed). This is an important variable in reliability growth
planning because the Management Strategy can be changed to address a larger
percentage of the discovered failure modes if the MTBF goal cannot be
reached with the current strategy. Generally, the Management Strategy is
recommended to be above 90%.
 Discovery Beta describes the rate at which new, unique failure modes are
discovered during testing. A value less than 1 indicates that the
interarrival times between unique modes are getting larger. This value is
expected to be less than 1 because most failure modes are usually
identified early, and their interarrival times become larger as
the test progresses.
Note that the planning model will solve for only one of those variables.
Therefore, when setting up the planning calculations you will need to determine
which variable to solve for.
Outputs of the Planning Model
 Initial Time [t(0)] is the time it takes for growth to start.
In general, a failure mode needs to be observed and a corrective action
implemented before reliability growth can start. Therefore the initial
time must be a value greater than 0.
 Final MTBF (Act) is the MTBF of the system at the end of the
last phase of the growth test. This value takes into account the
average fix delay.
 T Goal (Act) is the time at which the Goal MTBF is reached.
This value takes into account the average fix delay.
 Nominal Idealized Growth Curve is the growth planning curve
that assumes that all fixes are implemented instantaneously.
 Actual Idealized Growth Curve is the growth planning curve that
takes into account the average fix delay, which is the time required to
incorporate corrective actions into the system.
Example
The reliability group of the ACME Company is preparing for the reliability
growth testing phase of a new system design. Before starting the growth test
the group wants to determine whether the Goal MTBF of 1,700 hours can be met
in the available test time and with the allocated test units. The results of
this analysis will be critical in determining whether the budget that was
allocated by management for growth testing will be sufficient or whether
they will need to push for additional resources in terms of time or test
units so that the reliability goal can be met.
The team plans to divide the growth test into three phases that will match
the product development stages. At the end of each phase, major
redesigns can be applied if deemed necessary. The following table shows
the duration of each phase, the available test units at each phase, the
corresponding test time and the estimated average fix delay for each phase.
| Phase | Duration (Weeks) | Number of Units | Test Hours per Day | Test Days per Week | Average Fix Delay (Weeks) |
|---|---|---|---|---|---|
| Phase 1 | 16 | 10 | 8 | 5 | 2 |
| Phase 2 | 16 | 16 | 16 | 6 | 3 |
| Phase 3 | 24 | 26 | 16 | 6 | 3 |
Converting the above data into cumulative test hours for each phase, the
team determined the following values.
| Phase | Cumulative Test Time (Hours) | Average Fix Delay (Hours) |
|---|---|---|
| Phase 1 | 6400 | 800 |
| Phase 2 | 30976 | 4608 |
| Phase 3 | 90880 | 7488 |
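The conversion from the phase schedule to cumulative hours can be sketched in a few lines of Python. This is only an illustration, not part of RGA; the function name `phase_hours` is hypothetical, and the sketch assumes (as the tabulated values imply) that each phase's fix delay in weeks is converted to hours using that phase's own number of units and daily/weekly test schedule.

```python
# Phase schedule from the first table:
# (duration in weeks, units, test hours/day, test days/week, fix delay in weeks)
phases = [
    (16, 10, 8, 5, 2),
    (16, 16, 16, 6, 3),
    (24, 26, 16, 6, 3),
]

def phase_hours(phases):
    """Return a list of (cumulative test hours, average fix delay in hours)."""
    rows = []
    cumulative = 0
    for weeks, units, hours_per_day, days_per_week, delay_weeks in phases:
        # Test hours accumulated by all units of this phase in one week.
        weekly_hours = units * hours_per_day * days_per_week
        cumulative += weeks * weekly_hours
        # Assumption: the fix delay is expressed in the same "all units" hours.
        rows.append((cumulative, delay_weeks * weekly_hours))
    return rows

for i, (cum, delay) in enumerate(phase_hours(phases), start=1):
    print(f"Phase {i}: cumulative = {cum} h, average fix delay = {delay} h")
```

Under these assumptions the sketch returns the same cumulative test times and fix delays as the table above.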
When the first prototypes of the new system became available and before the
reliability growth planning had begun, the team performed some initial testing
of 10 prototypes in order to evaluate the system’s reliability. The testing
lasted 5 weeks and each prototype was tested for 16 hours a day and 6 days a
week for a total test time of 4,800 hours. Given that this was an evaluation
test, no corrective actions were implemented during the test. When
a failure was observed, the system was repaired and returned to
operation, and testing resumed. The following table shows the observed failure
times and the corresponding failure modes. Note that the failure times shown
in the table represent the cumulative test time for all 10 units. So, for example,
while the first failure was observed at 81.12 hours, the cumulative time
is 811.2 hours because 10 units were in the test.
| Failure Time (Hours) | Mode |
|---|---|
| 811.2 | 105 |
| 1250.6 | 265 |
| 1955.7 | 145 |
| 3187.3 | 344 |
| 3825.1 | 265 |
| 4520.9 | 105 |
Having observed those failure times, the team can now calculate the Initial
MTBF of the system, which is an input to the growth planning model. Given that
no corrective actions were implemented during the test, the test is essentially
a Test-Find-Test type. The failure data can be analyzed in RGA using
the Crow Extended model with each mode categorized as a BD mode, meaning that the
corrective actions will be implemented after the test. Figure 1 shows the data
entered in a Failure Times folio in RGA and the calculated Demonstrated
MTBF using the Crow Extended Model with the unbiased beta option set. Note that
when analyzing the data, RGA requires the effectiveness factor of each
corrective action that will be implemented at the end of the test. Given that
no projections are necessary at this point, the team used an assumed
effectiveness factor of 0.7 for all failure modes.
Figure 1: Test data analyzed using the Crow Extended model
As can be seen from Figure 1, the Demonstrated MTBF at the end of the test
is 800 hours. This value will be used as the Initial MTBF for the growth
planning model.
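As a quick consistency check on this number (not a reproduction of the RGA calculation itself), note that with no corrective actions during the test the failure intensity can be assumed constant, in which case the MTBF is simply the total test time divided by the number of failures. The variable names below are illustrative only.

```python
# Total accumulated test time: 10 units, 5 weeks, 16 hours/day, 6 days/week.
total_test_hours = 10 * 5 * 16 * 6

# Cumulative failure times (hours) from the table of observed failures.
failure_times = [811.2, 1250.6, 1955.7, 3187.3, 3825.1, 4520.9]

# Assumption: constant failure intensity, since no fixes were made in test.
demonstrated_mtbf = total_test_hours / len(failure_times)
print(f"Total test time: {total_test_hours} h")
print(f"Demonstrated MTBF: {demonstrated_mtbf:.0f} h")
```

Six failures in 4,800 hours give 800 hours, which agrees with the value shown in Figure 1.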
Another variable that the team can estimate using the initial test data is the
Discovery Beta. As mentioned earlier, the Discovery Beta is the rate at which
new unique failure modes are discovered during the test. In order to determine
the Discovery Beta from the above data set, the team performed an analysis
that considers only the first occurrences of the unique failure modes.
Figure 2 shows those failure times entered in RGA and analyzed using
the Crow-AMSAA (NHPP) model.
Figure 2: Calculation of the Discovery Beta
The Discovery Beta is found to be 0.6772. Note that since
the data set is small, the team used the unbiased calculation of beta, which can
be set on the Calculations page of the Application Setup window in RGA.
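The Discovery Beta calculation can be approximated outside of RGA with a short Python sketch. It assumes the standard Crow-AMSAA maximum likelihood estimate for a time-terminated test ending at 4,800 hours, with the unbiased estimate taken as (N - 1)/N times the MLE; these formulas and the function name `crow_amsaa_beta` are assumptions for illustration and are not stated in this article.

```python
import math

def crow_amsaa_beta(times, end_time, unbiased=True):
    """Crow-AMSAA beta for a time-terminated test (assumed formulas)."""
    n = len(times)
    beta_mle = n / sum(math.log(end_time / t) for t in times)
    # Assumed unbiased correction for time-terminated data.
    return beta_mle * (n - 1) / n if unbiased else beta_mle

# (cumulative failure time in hours, failure mode) from the test data table.
failures = [
    (811.2, 105), (1250.6, 265), (1955.7, 145),
    (3187.3, 344), (3825.1, 265), (4520.9, 105),
]

# Keep only the first occurrence of each unique failure mode.
first_occurrence = {}
for time, mode in failures:
    first_occurrence.setdefault(mode, time)
first_times = sorted(first_occurrence.values())

discovery_beta = crow_amsaa_beta(first_times, end_time=4800.0)
print(f"First occurrences: {first_times}")
print(f"Discovery Beta (unbiased): {discovery_beta:.4f}")
```

With the four first-occurrence times (811.2, 1250.6, 1955.7 and 3187.3 hours), this sketch gives approximately 0.677, in line with the value reported by RGA.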
Finally, the team decided that the variable that the planning model will solve
for is the Management Strategy. This will allow them to determine the
portion of the failure modes found during the test that should be
addressed. Having calculated the Initial MTBF and the Discovery Beta, and with
the Goal MTBF of 1,700 hours already set, they knew
that the two additional required inputs were the Growth
Potential Design Margin and the Average Effectiveness Factor. They set the
Growth Potential Design Margin to 1.35, which is a fairly common value. Based
on past experience, they also set the Average Effectiveness Factor to 0.7.
After defining all inputs, they used RGA to determine
the Growth Plan. Figure 3 shows the Cumulative Phase Time (in hours) and the
Average Fix Delay of the three phases as entered in the Growth Planning folio
of RGA.
Figure 3: Cumulative Phase Time and Average Fix Delay for each phase
Figure 4 shows the planning calculations given the inputs that were
already defined and after solving for the appropriate Management
Strategy value.
Figure 4: Planning calculations
As can be seen, with a Management Strategy of 0.9306
(meaning that a corrective action will be implemented for 93.06% of all
unique failure modes found), the Goal MTBF will be reached at 88,296 hours, which
is less than the allocated test time of 90,880 hours. At the end of the
test, the MTBF should be about 1,704 hours.
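The Management Strategy value can be cross-checked with a small sketch. It assumes two relations that are not derived in this article: that the growth potential MTBF equals the GP Design Margin times the Goal MTBF, and that the growth potential MTBF equals the Initial MTBF divided by (1 - Management Strategy x Average Effectiveness Factor). The function name is hypothetical, and the sketch does not attempt to reproduce the T Goal or Final MTBF outputs, which also depend on the Discovery Beta, the phase times and the fix delays.

```python
def solve_management_strategy(initial_mtbf, goal_mtbf, gp_design_margin, avg_ef):
    """Solve the assumed growth potential relation for the management strategy."""
    # Assumed: growth potential MTBF = design margin * goal MTBF.
    growth_potential_mtbf = gp_design_margin * goal_mtbf
    # Assumed: growth_potential_mtbf = initial_mtbf / (1 - ms * avg_ef).
    return (1.0 - initial_mtbf / growth_potential_mtbf) / avg_ef

management_strategy = solve_management_strategy(
    initial_mtbf=800.0,      # Demonstrated MTBF from the initial test
    goal_mtbf=1700.0,        # Goal MTBF
    gp_design_margin=1.35,   # Growth Potential Design Margin
    avg_ef=0.7,              # Average Effectiveness Factor
)
print(f"Management Strategy: {management_strategy:.4f}")
```

Under these assumptions the result is about 0.9306, consistent with the value solved for in Figure 4. If the result had exceeded 1, the goal would not be reachable with the chosen margin and effectiveness factor, no matter how many of the discovered modes were fixed.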
Figure 5 shows the plot of the growth planning curve. The plot shows the
Nominal Idealized curve, the Actual Idealized curve and the planned growth at
the beginning of each phase.
Figure 5: The Growth Planning Curve
Based on this analysis, the team determined that the reliability goal can be
met with the already allocated resources. However, as with any test plan,
the plan rests on certain assumptions.
Therefore, once actual testing begins, the team should compare
the results of the test to the planned curve in order to verify whether the initial
assumptions were correct and whether the final goal can be met on time.
Conclusions
In this article we have seen that the Growth Planning model in RGA can
be a very useful tool in determining whether the reliability goal can be met
during the allocated time for growth testing. We presented all the necessary
inputs and outputs of the planning model and gave an example of how you can
use RGA to create a planning curve.
References
[1] Department of Defense, MIL-HDBK-189: Reliability Growth Modeling,
Philadelphia, PA: Naval Publications and Forms Center, 2009.
[2] ReliaSoft Corporation, Reliability Growth & Repairable System Analysis
Reference, Tucson, AZ: ReliaSoft Publishing, 2009.
