Reliability HotWire: eMagazine for the Reliability Professional

Issue 39, May 2004

Hot Topics
Introduction to Reliability Growth

In general, the first prototypes produced during the development of a new system contain design, manufacturing and/or engineering deficiencies. To identify and correct these deficiencies, the prototypes are usually subjected to a rigorous testing program. During this testing, most problem areas are identified and appropriate corrective (or redesign) actions are taken. During the first phases of a product's development, the estimate of the product's final reliability is typically called the "reliability goal." To reach this goal, the product must undergo comprehensive testing and appropriate corrective actions must be implemented. This well-structured process of finding reliability problems and monitoring the increase of the product's reliability through successive phases is commonly called "reliability growth."

Background

Reliability growth is the improvement in the reliability of a product (i.e. component, subsystem or system) or service over a period of time due to changes in the product design and/or the manufacturing process. Reliability growth analysis (RGA) concerns itself with the quantification and assessment of parameters (or metrics) relating to the item's reliability growth over time. Reliability growth management concerns itself with the planning and management for the item's reliability growth as a function of time and resources.

Reliability information generated and recorded over time can be used to observe trends in the reliability of the product. The term "growth" is used because the reliability of the product is assumed to increase over time as design changes and fixes are implemented. In other words, reliability growth analysis projects the reliability of a system, component, unit (or service) to some future development time. This projection is based upon information currently available from predictions or prior experience on identical or similar systems. Monitoring the reliability, the mean time between failures (MTBF) and the failure rate of the system, equipment or product establishes the trend: an increase in the reliability, an increase in the MTBF or a decrease in the failure rate. This improvement is achieved with engineering, research, development, test-analyze-and-fix (TAAF) and/or test-analyze-and-redesign (TAAR) procedures, applied until the product passes its acceptance tests and/or is delivered to the end-user.

Such growth results from corrective and/or preventive actions, based on experience gained from early failures, that are applied to the equipment, design, production and operation processes. These actions are the most direct reason for improved reliability. The TAAF or TAAR concept is applied by uncovering weaknesses during the testing stages and performing appropriate corrective actions before full-scale production.

Growth can also occur by natural screening. If the population of devices is heterogeneous, then the items with high failure rates are naturally screened out through operational use. Such screening improves the mixture of a heterogeneous population and generates an apparent growth phenomenon when, in fact, the devices themselves are not improving.
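The screening effect can be illustrated with a small sketch. It assumes a population made of two groups of non-repaired units, each with its own constant failure rate; the group sizes and rates below are hypothetical and do not come from the article. The failure rate of the surviving population falls over time even though no individual unit improves.

```python
import math

def population_hazard(t, weak_fraction, weak_rate, strong_rate):
    """Failure rate of the surviving population at time t for a two-group mixture.

    Assumption: each group has a constant failure rate and failed units
    are not replaced, so weak units gradually drop out of the population.
    """
    weak_surviving = weak_fraction * math.exp(-weak_rate * t)
    strong_surviving = (1.0 - weak_fraction) * math.exp(-strong_rate * t)
    failing_now = weak_rate * weak_surviving + strong_rate * strong_surviving
    return failing_now / (weak_surviving + strong_surviving)

# Hypothetical population: 20% weak units (0.01 failures/hour),
# 80% strong units (0.001 failures/hour).
times = (0, 100, 500, 1000)
hazards = [population_hazard(t, 0.2, 0.01, 0.001) for t in times]
for t, h in zip(times, hazards):
    print(f"t = {t:>4} h  population failure rate = {h:.5f} per hour")
```

The population failure rate starts at the weighted average of the two groups and drifts toward the rate of the strong group, which is the apparent growth described above.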

Learning by operator and maintenance personnel also plays an important role in this improvement scenario. Through continued use of the equipment, operator and maintenance personnel become more familiar with it. This is called "natural learning." Natural learning is a continual process in which the reliability is improved as fewer mistakes are made in operation and maintenance. Thus, the equipment is used more effectively. The learning rate increases in the early stages and then levels off when familiarity is achieved. To compensate for this phenomenon, the natural learning can be accompanied by revisions of technical manuals or even specialized training for improved operation and maintenance. This type of growth is also called "familiarity growth" or "maintenance growth." If a population of devices undergoes repairs without their age being affected, then this leads to reliability growth. External factors, such as decreasing stress trends or wear-hardening characteristics, can also lead to reliability growth.

The concept of reliability growth is not just theoretical or absolute. Growth is related to factors such as reliability requirements, initial reliability level, reliability funding and management, corrective actions and competitive factors. For example, a 400% improvement in reliability for equipment that initially had one tenth of the reliability goal is not as much of an improvement as a 50% improvement in reliability for equipment that initially had one half of the reliability goal.
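The arithmetic behind this example can be checked with a short sketch. It assumes that an "X% improvement" multiplies the starting level by (1 + X/100) and that reliability is expressed as a fraction of the goal; the function name is hypothetical.

```python
def fraction_of_goal_after(initial_fraction, improvement_percent):
    """Fraction of the reliability goal reached after a relative improvement.

    Assumption: an X% improvement multiplies the starting level by (1 + X/100).
    """
    return initial_fraction * (1.0 + improvement_percent / 100.0)

case_a = fraction_of_goal_after(0.10, 400)  # starts at one tenth of the goal
case_b = fraction_of_goal_after(0.50, 50)   # starts at one half of the goal

print(f"400% improvement from 10% of goal -> {case_a:.0%} of goal")
print(f"50% improvement from 50% of goal  -> {case_b:.0%} of goal")
```

Under this reading, the first item only reaches half of the goal while the second reaches three quarters of it, so the smaller percentage improvement leaves the equipment closer to its requirement.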

A comprehensive reliability growth program is developed based on three important factors:

  • Management, where the decisions are made to keep the program moving.
  • Testing, where all the weaknesses and failure modes are found in the design and manufacturing process.
  • Failure Identification, Analysis and Fix (FIAAF), where the cause of failure is isolated, analyzed and then fixed.

One question that a manager may ask is "When does a reliability growth program take place in the development process?" Actually, there is more than one answer to this question. It may take place during early prototype testing, during integrated testing, during dedicated test-analyze-and-fix (TAAF) and/or test-analyze-and-redesign (TAAR) procedures or during production to address any manufacturing or quality problems.

Integrated reliability growth testing is based on existing testing, such as operational testing, safety testing or any other testing that is involved with the operation of that specific product. In this case, the FIAAF process is applied in parallel with the existing development tests. This type of reliability growth program is relatively cost-effective because the additional cost is minimal. Note, however, that when testing against a product's specifications, the test environment has to be consistent with the specified environmental conditions under which the product is going to be operated.

In the case of a dedicated TAAF and/or TAAR, the environmental test conditions are controlled. This test, however, is very costly to conduct and any required design changes would also be expensive to perform because this testing phase is usually late in the development process. On the other hand, a dedicated reliability growth testing program is necessary if a high reliability target is desired, especially for complex systems.

Another question that a manager may ask is "What does a reliability growth program consist of and how is growth accomplished during testing?" First, a reliability goal is set, and this goal should be achieved during the development testing program with the necessary allocation or reallocation of resources. Planning and evaluating are essential factors in controlling the growth process. A comprehensive reliability growth program needs well-structured planning of assessment techniques, which should be based on demonstrated and projected values that are designed to evaluate reliability growth as testing progresses. A reliability growth program differs from a conventional reliability program in that there is a more objectively developed growth standard against which the assessments are compared. Second, the reliability of the current version of the equipment can be evaluated more accurately with assessment techniques. Hence, a comparison between the assessed value and the planned value provides a good indication of whether or not the program is progressing as planned. If the program does not progress as planned, then new strategies, re-examination of the problem areas and new assessment techniques should be incorporated.

This discussion, in turn, raises the following question: "What if some of the fixes cannot be incorporated during testing?" It is possible that only some fixes can be made to the product during testing and that other fixes must be delayed until the end of the test, because the test is too expensive to stop and restart or because the equipment is too complex for a complete teardown. The incorporation of delayed fixes usually results in a distinct jump in the reliability of the system at the end of the test phase. This situation is handled with demonstrated reliability values and projected reliability values. Demonstrated reliability values are based on the actual and current system performance. Projected reliability values are based on an estimate of future system reliability, which accounts for the delayed fixes that will be incorporated at the end of the test or between test phases.
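The difference between the two values can be sketched numerically. The example below is a deliberately simplified illustration, not a method stated in the article: it assumes that each delayed fix removes a fixed fraction (an assumed "fix effectiveness") of the failures seen for that failure mode, and it ignores failure modes that have not yet been observed. All numbers and function names are hypothetical.

```python
def demonstrated_mtbf(test_hours, total_failures):
    """MTBF based only on what was actually observed during the test."""
    return test_hours / total_failures

def projected_mtbf(test_hours, failures_not_fixed, delayed_fix_modes):
    """MTBF expected after delayed fixes are incorporated (simplified).

    delayed_fix_modes: list of (failures_observed, assumed_fix_effectiveness),
    where effectiveness is the assumed fraction of failures removed by the fix.
    """
    remaining = failures_not_fixed + sum(
        n * (1.0 - effectiveness) for n, effectiveness in delayed_fix_modes
    )
    return test_hours / remaining

# Hypothetical test phase: 1000 hours, 4 failures from modes that will not
# be fixed, plus two modes whose fixes are delayed to the end of the test.
delayed = [(10, 0.7), (6, 0.5)]
total_failures = 4 + sum(n for n, _ in delayed)

m_demonstrated = demonstrated_mtbf(1000.0, total_failures)
m_projected = projected_mtbf(1000.0, 4, delayed)
print(f"Demonstrated MTBF: {m_demonstrated:.1f} h")
print(f"Projected MTBF:    {m_projected:.1f} h")
```

The gap between the two printed values corresponds to the jump in reliability expected when the delayed fixes are incorporated. A formal projection model would normally also account for failure modes that were not seen during the test.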

Why Reliability Growth?

Reliability growth can be quantified in different ways, such as the increase in the MTBF, the decrease in the failure rate as a function of time or the increase in the mission success probability (synonymous with reliability). These quantities are, in general, mathematically related, so one can be obtained from the others.
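As a minimal sketch of how these quantities relate, the example below assumes a constant failure rate (the exponential case), which the article does not state; under that assumption the failure rate is the reciprocal of the MTBF and the mission reliability is exp(-t/m). The function names and numbers are hypothetical.

```python
import math

def failure_rate_from_mtbf(mtbf_hours):
    """Failure rate (per hour), assuming a constant failure rate."""
    return 1.0 / mtbf_hours

def mission_reliability(mtbf_hours, mission_hours):
    """Probability of completing a mission without failure, exponential case."""
    return math.exp(-mission_hours / mtbf_hours)

def mtbf_from_reliability(reliability, mission_hours):
    """Invert the exponential relation to recover the MTBF."""
    return -mission_hours / math.log(reliability)

# Hypothetical growth from a 100-hour MTBF to a 250-hour MTBF,
# evaluated for a 20-hour mission.
for mtbf in (100.0, 250.0):
    lam = failure_rate_from_mtbf(mtbf)
    rel = mission_reliability(mtbf, 20.0)
    print(f"MTBF = {mtbf:.0f} h  failure rate = {lam:.4f}/h  R(20 h) = {rel:.4f}")
```

An increase in the MTBF, a decrease in the failure rate and an increase in mission reliability are then three views of the same improvement. When the failure rate changes with time, the relations are more involved, but the quantities remain linked.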

These quantities are collectively called "reliability growth trends" and are usually presented as curves, or reliability growth curves, constructed on certain mathematical and statistical models, called "reliability growth models." In the quantification of these models, you should first be concerned with the current reliability estimate of the product. This estimate can be obtained using current failure data on prototypes or, if the product is in its design phases, by using past data from similar products or data from the subcomponents of the product. Once the current reliability estimate has been assessed, then it needs to be determined whether it is possible to extrapolate or extend the growth to some point in the future.

This ability to accurately project the reliability of the product at some time in the future will enable you to determine:

  • Whether the stated reliability requirements will be achieved.
  • The associated time for meeting such requirements.
  • The associated costs of meeting such requirements.
  • The correlation of reliability changes with reliability activities.
  • The reliability improvement warranty.
  • The plan for maintenance manpower and logistic activities.
  • How to perform the life-cycle-cost analysis.

The determination of these items is of extreme value in the proper management of the production process and the overall reliability program. In other words, reliability growth studies are necessary to ensure that, based on information available at the beginning of a project, the goals for reliability, R, MTBF, m, or failure rate, λ, can be met by the acceptance or delivery and use time. A growth model is normally used to project R, m or λ at the completion date. If the projected R or m equals or exceeds the specified target, or the projected λ is at or below it, then the project manager can be confident that the project's requirements will be met. Otherwise, the manager will have to re-assess the reliability prediction techniques or refine them in the hope of exceeding the goal. The manager will also have to improve the design, use more redundancy or more reliable components, or allocate a greater proportion of the contract's resources to design engineering, reliability and maintainability engineering, research and development, manufacturing, purchasing, quality control, inspection and testing, and perhaps provide for more test units.
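The projection step can be sketched with one commonly used growth model. The article does not name a specific model, so the choice here of the Crow-AMSAA (power law) model, with maximum likelihood estimates for a time-terminated test, is an assumption; the failure times, the completion time and the MTBF goal are hypothetical. The projection also assumes that the observed growth rate continues unchanged to the completion date.

```python
import math

def fit_crow_amsaa(failure_times, test_end):
    """Estimate (lambda, beta) for expected failures N(t) = lambda * t**beta.

    Uses the maximum likelihood estimates for a time-terminated test.
    """
    n = len(failure_times)
    beta = n / sum(math.log(test_end / t) for t in failure_times)
    lam = n / test_end ** beta
    return lam, beta

def instantaneous_mtbf(lam, beta, t):
    """Reciprocal of the failure intensity lambda * beta * t**(beta - 1)."""
    return 1.0 / (lam * beta * t ** (beta - 1.0))

# Hypothetical cumulative failure times (hours) from a 1000-hour test.
failure_times = [5, 20, 52, 110, 190, 310, 480, 700, 960]
lam, beta = fit_crow_amsaa(failure_times, test_end=1000.0)

current = instantaneous_mtbf(lam, beta, 1000.0)
projected = instantaneous_mtbf(lam, beta, 2000.0)  # assumed completion time
goal = 300.0                                       # hypothetical MTBF goal

print(f"beta = {beta:.3f} (values below 1 indicate growth)")
print(f"Current MTBF at 1000 h:   {current:.1f} h")
print(f"Projected MTBF at 2000 h: {projected:.1f} h")
print("Goal met" if projected >= goal else "Goal not met: re-plan the program")
```

If the projected value falls short of the goal, the comparison signals that the corrective measures described above are needed.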

ReliaSoft Corporation

Copyright 2004 ReliaSoft Corporation, ALL RIGHTS RESERVED