Hot Topics

Reliability Growth Planning

The objective of reliability growth testing is to increase a system’s reliability to a particular goal or requirement through the discovery of failure modes and the implementation of corrective actions. A question that often arises when setting up a reliability growth program is whether the reliability goal will be met in the allocated test time. Alternatively, one may need to know how many systems should be allocated for growth testing or how long the growth test should last in order to meet the goal. The Growth Planning tool in RGA can help answer those questions. In this article we present the Growth Planning tool and provide an example of its use.

Introduction

The Growth Planning tool in RGA is based on the Crow Extended model. This planning model is similar to the MIL-HDBK-189 [1] growth curve, with the major distinction that the growth curve in the military handbook is based on the Crow-AMSAA (NHPP) model. Therefore, using MIL-HDBK-189 for growth planning assumes that the corrective actions for the observed failure modes are incorporated during the test and at the specific time of failure. However, in actual practice, some minor corrective actions may be implemented during the test, others that require more investigation may be delayed until after the completion of the test, and some failure modes may not be fixed at all. Using the Crow Extended model for growth planning allows for additional inputs to account for a specific management strategy as well as delayed fixes with specified effectiveness factors.

Before we look at an example of how the planning tool can be utilized in RGA, let us first go over the definitions of the required inputs to the model and the calculated outputs. Note that the math behind the planning model is beyond the scope of this article. For more details on the model please refer to the Reliability Growth Planning chapter of the Reliability Growth & Repairable System Analysis Reference [2].

Inputs to the Planning Model

• Initial MTBF is the MTBF of the system before the reliability growth testing begins. It can be determined by some initial testing or through historical information, engineering expertise and/or reliability predictions.
• Goal MTBF is the MTBF requirement of the system.
• Growth Potential (GP) Design Margin is a "safety factor" that can be adjusted to make sure that the desired reliability growth will be reached. A higher GP Design Margin lowers the risk that the reliability observed in the field will fall below the requirement, but it also calls for a more rigorous reliability growth program. Typically, the GP Design Margin takes values between 1.2 and 1.5.
• Average Effectiveness Factor is used to determine how effective corrective actions are in eliminating a failure mode. It can be determined based on engineering expertise, specific product complexity, prior history, etc. The reason behind using an Average Effectiveness Factor is that failure modes are rarely totally eliminated by a corrective action. After failure modes have been found and fixed, a certain percentage of the failure intensity will remain in the system. The Effectiveness Factor is the fractional decrease in a mode's failure intensity after implementing the corrective action. Typically, about 30% of the failure intensity for the failure modes that are addressed will remain in the system after implementing all of the corrective actions. Therefore, in many reliability growth programs the Average Effectiveness Factor is 0.7.
• Management Strategy determines the percentage of the unique failure modes discovered during the test that will be addressed (i.e., fixed). This is an important variable in reliability growth planning because the Management Strategy can be changed to address a larger percentage of the discovered failure modes if the MTBF goal cannot be reached with the current strategy. Generally, it is recommended that the Management Strategy be above 90%.
• Discovery Beta is the rate at which new, unique failure modes are being discovered during testing. A value less than 1 indicates that the inter-arrival times between unique modes are getting larger. This value is expected to be less than 1 because most failure modes are usually identified early, and their inter-arrival times become larger as the test progresses.

Note that the planning model solves for only one of these variables; the others must be supplied. Therefore, when setting up the planning calculations you will need to determine which variable to solve for.

Outputs of the Planning Model

• Initial Time [t(0)] is the time it takes for growth to start. In general, a failure mode needs to be observed and a corrective action implemented before reliability growth can start. Therefore the initial time must be a value greater than 0.
• Final MTBF (Act) is the MTBF of the system at the end of the last phase of the growth test. This value takes into account the average fix delay.
• T Goal (Act) is the time at which the Goal MTBF is reached. This value takes into account the average fix delay.
• Nominal Idealized Growth Curve is the growth planning curve that assumes that all fixes are implemented instantaneously.
• Actual Idealized Growth Curve is the growth planning curve that takes into account the average fix delay, which is the time required to incorporate corrective actions into the system.

Example

The reliability group of the ACME Company is preparing for the reliability growth testing phase of a new system design. Before starting the growth test the group wants to determine whether the Goal MTBF of 1,700 hours can be met in the available test time and with the allocated test units. The results of this analysis will be critical in determining whether the budget that was allocated by management for growth testing will be sufficient or whether they will need to push for additional resources in terms of time or test units so that the reliability goal can be met.

The team plans to divide the growth test into three phases that will match the product development stages. At the end of each phase, major redesigns can be applied if deemed necessary. The following table shows the duration of each phase, the available test units at each phase, the corresponding test time and the estimated average fix delay for each phase.

Phase     Duration (Weeks)   Number of Units   Test Hours per Day   Test Days per Week   Average Fix Delay (Weeks)
Phase 1   16                 10                8                    5                    2
Phase 2   16                 16                16                   6                    3
Phase 3   24                 26                16                   6                    3

Converting the above data into cumulative test hours for each phase, the team determined the following values.

Phase     Cumulative Test Time (Hours)   Average Fix Delay (Hours)
Phase 1   6400                           800
Phase 2   30976                          4608
Phase 3   90880                          7488
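
The conversion from the weekly schedule to cumulative test hours can be reproduced with a short calculation. The sketch below is only an illustration: the function and variable names are hypothetical, and it assumes that the average fix delay in weeks is converted to hours at the test rate of the phase in which it occurs (units x hours per day x days per week), which matches the values in the table above.

```python
# Phase schedule taken from the first table in the example:
# (name, duration in weeks, units, test hours/day, test days/week, fix delay in weeks)
phases = [
    ("Phase 1", 16, 10, 8, 5, 2),
    ("Phase 2", 16, 16, 16, 6, 3),
    ("Phase 3", 24, 26, 16, 6, 3),
]


def phase_hours(schedule):
    """Return (name, cumulative test hours, average fix delay in hours) per phase."""
    rows = []
    cumulative = 0
    for name, weeks, units, hours_per_day, days_per_week, delay_weeks in schedule:
        # Test hours accumulated by all units in one calendar week of this phase.
        hours_per_week = units * hours_per_day * days_per_week
        cumulative += weeks * hours_per_week
        # Assumption: the fix delay is converted at the same weekly rate.
        rows.append((name, cumulative, delay_weeks * hours_per_week))
    return rows


rows = phase_hours(phases)
for name, cumulative, delay in rows:
    print(f"{name}: cumulative test time = {cumulative} h, average fix delay = {delay} h")
```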

When the first prototypes of the new system became available and before the reliability growth planning had begun, the team performed some initial testing of 10 prototypes in order to evaluate the system’s reliability. The testing lasted 5 weeks and each prototype was tested for 16 hours a day and 6 days a week for a total test time of 4,800 hours. Given that this was an evaluation test, no corrective actions were implemented during the test. When a failure was observed, the system was fixed so that it was brought back to operation and testing resumed. The following table shows the observed failure times and the corresponding failure modes. Note that the failure times shown in the table represent the cumulative test time for all 10 units. So, for example, while the first failure was observed at 81.12 hours, the cumulative time is 811.2 hours because 10 units were in the test.

Failure Time (Hours)   Mode
811.2                  105
1250.6                 265
1955.7                 145
3187.3                 344
3825.1                 265
4520.9                 105

Having observed those failure times, the team can now calculate the Initial MTBF of the system, which is an input to the growth planning model. Given that no corrective actions were implemented during the test, the test is essentially a Test-Find-Test type. The failure data can be analyzed in RGA using the Crow Extended model with each mode categorized as a BD mode, meaning that the corrective actions will be implemented after the test. Figure 1 shows the data entered in a Failure Times folio in RGA and the calculated Demonstrated MTBF using the Crow Extended Model with the unbiased beta option set. Note that when analyzing the data, RGA requires the effectiveness factor of each corrective action that will be implemented at the end of the test. Given that no projections are necessary at this point, the team used an assumed effectiveness factor of 0.7 for all failure modes.

Figure 1: Test data analyzed using the Crow Extended model

As can be seen from Figure 1, the Demonstrated MTBF at the end of the test is 800 hours. This value will be used as the Initial MTBF for the growth planning model.
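
As a rough sanity check on this value, the total test time can be divided by the number of observed failures. This is not the Crow Extended calculation that RGA performs. It is a simple ratio that is consistent with a constant failure intensity during a test in which no corrective actions were implemented, and the variable names are illustrative only.

```python
# Initial evaluation test: 10 prototypes, 5 weeks, 16 hours/day, 6 days/week.
total_test_time = 10 * 5 * 16 * 6  # cumulative hours across all units

# Cumulative failure times (hours) from the table of observed failures.
failure_times = [811.2, 1250.6, 1955.7, 3187.3, 3825.1, 4520.9]

# With no fixes applied during the test, a simple MTBF estimate is T / N.
mtbf_estimate = total_test_time / len(failure_times)
print(f"Total test time: {total_test_time} h, MTBF estimate: {mtbf_estimate} h")
```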

Another variable that the team can estimate using the initial test data is the Discovery Beta. As mentioned earlier, the Discovery Beta is the rate at which new unique failure modes are discovered during the test. In order to determine the Discovery Beta from the above data set, the team performed an analysis that considers only the first occurrences of the unique failure modes. Figure 2 shows those failure times entered in RGA and analyzed using the Crow-AMSAA (NHPP) model.

Figure 2: Calculation of the Discovery Beta

The Discovery Beta is found to be 0.6772. Note that since the data set is small, the team used the unbiased calculation of beta, which can be set on the Calculations page of the Application Setup window in RGA.
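
The Discovery Beta can be cross-checked by hand. The sketch below assumes the standard Crow-AMSAA maximum likelihood estimate for a time-terminated test ending at 4,800 hours, with the unbiased estimate obtained by multiplying by (N - 1)/N. The function name is hypothetical, and the termination assumption is ours rather than something stated in the figures, although it does reproduce the reported value.

```python
import math


def crow_amsaa_beta(times, end_time, unbiased=True):
    """Crow-AMSAA beta for a time-terminated test (assumed form).

    MLE:      beta = N / sum(ln(T / t_i))
    Unbiased: beta * (N - 1) / N
    """
    n = len(times)
    beta_mle = n / sum(math.log(end_time / t) for t in times)
    return beta_mle * (n - 1) / n if unbiased else beta_mle


# First occurrence of each unique failure mode (modes 105, 265, 145, 344).
first_occurrences = [811.2, 1250.6, 1955.7, 3187.3]

discovery_beta = crow_amsaa_beta(first_occurrences, end_time=4800.0)
print(f"Discovery Beta (unbiased): {discovery_beta:.4f}")
```

Only four data points enter this estimate, which is why the unbiased correction changes the result noticeably compared with the plain maximum likelihood value.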

Finally, the team decided that the planning model would solve for the Management Strategy. This will allow them to determine the portion of the failure modes found during the test that should be addressed. Having calculated the Initial MTBF and the Discovery Beta, they needed two additional inputs: the Growth Potential Design Margin and the Average Effectiveness Factor. They set the Growth Potential Design Margin to 1.35, which is a fairly common value. Based on past experience, they also set the Average Effectiveness Factor to 0.7.

After defining all inputs, they used RGA to determine the Growth Plan. Figure 3 shows the Cumulative Phase Time (in hours) and the Average Fix Delay of the three phases as entered in the Growth Planning folio of RGA.

Figure 3: Cumulative Phase Time and Average Fix Delay for each phase

Figure 4 shows the planning calculations given the inputs that were already defined and after solving for the appropriate Management Strategy value.

Figure 4: Planning calculations

As can be seen, with a Management Strategy of 0.9306 (meaning that a corrective action will be implemented for 93.06% of all unique failure modes found), the Goal MTBF will be reached at 88,296 hours, which is less than the allocated test time of 90,880 hours. At the end of the test, the MTBF should be about 1,704 hours.
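
Although the math behind the planning model is outside the scope of this article, the Management Strategy value can be checked with a short calculation. The sketch below assumes the growth potential relationship commonly used with the Crow Extended model: the growth potential MTBF equals the GP Design Margin times the Goal MTBF, and it also equals the Initial MTBF divided by (1 - Management Strategy x Average Effectiveness Factor). The function name is hypothetical and this is not presented as RGA's internal implementation, but under these assumptions the result agrees with the reported 0.9306.

```python
def management_strategy(initial_mtbf, goal_mtbf, gp_design_margin, avg_effectiveness):
    """Solve the assumed growth potential relationship for the Management Strategy.

    M_GP = gp_design_margin * goal_mtbf
    M_GP = initial_mtbf / (1 - MS * avg_effectiveness)
    =>  MS = (1 - initial_mtbf / M_GP) / avg_effectiveness
    """
    growth_potential_mtbf = gp_design_margin * goal_mtbf
    return (1.0 - initial_mtbf / growth_potential_mtbf) / avg_effectiveness


# Inputs used by the team in the example.
ms = management_strategy(
    initial_mtbf=800.0,
    goal_mtbf=1700.0,
    gp_design_margin=1.35,
    avg_effectiveness=0.7,
)
print(f"Management Strategy: {ms:.4f}")
```

Under the same relationship, a lower Initial MTBF or a higher GP Design Margin would push the required Management Strategy toward 1, which signals that nearly every discovered failure mode would have to be fixed.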

Figure 5 shows the plot of the growth planning curve. The plot shows the Nominal Idealized curve, the Actual Idealized curve and the planned growth at the beginning of each phase.

Figure 5: The Growth Planning Curve

Based on this analysis, the team determined that the reliability goal can be met with the already allocated resources. However, as with any test plan, this plan rests on certain assumptions. Therefore, once actual testing begins, the team should compare the test results to the planned curve in order to verify whether the initial assumptions were correct and whether the final goal can be met on time.

Conclusions

In this article we have seen that the Growth Planning model in RGA can be a very useful tool in determining whether the reliability goal can be met during the allocated time for growth testing. We presented all the necessary inputs and outputs of the planning model and gave an example of how you can use RGA to create a planning curve.

References

[1] Department of Defense, MIL-HDBK-189: Reliability Growth Modeling, Philadelphia, PA: Naval Publications and Forms Center, 2009.

[2] ReliaSoft Corporation, Reliability Growth & Repairable System Analysis Reference, Tucson, AZ: ReliaSoft Publishing, 2009.