Reliability HotWire: eMagazine for the Reliability Professional

Issue 70, December 2006

Reliability Basics

Standards Based Reliability Prediction in a Nutshell

Standards based reliability prediction is a methodology for predicting the reliability of systems and components (mostly electronics) based on failure rate estimates published by globally recognized military or commercial standards. It is especially useful in the early stages of development, when hard failure data is not yet available, or when manufacturers are contractually obliged by their customers to use published standards for their reliability predictions. This article presents an overview of standards based reliability prediction and how it can be performed with the help of the Lambda Predict software.

Assumptions and Applicability
Issue 50 of Reliability HotWire presented an introduction to standards based reliability prediction and discussed the applicability and assumptions used in this approach. Issue 51 presented an overview of common prediction standards and analysis methods. Readers are encouraged to review these issues to set the stage for this article.

Prediction Standards
The common prediction standards are MIL-HDBK-217, Bellcore/Telcordia (SR-332), NSWC-98/LE1 (for mechanical components), China 299B (GJB/z-299B) and RDF 2000 (IEC 62380).

Analysis Methods
The typical analysis methods are:

  • Part Count Analysis method.

  • Part Stress Analysis method.

  • In addition to these two methods, which are common to all the standards, Bellcore/Telcordia defines three methods of its own (Method I, Method II and Method III).

Issue 51 presented an overview of the analysis methods mentioned above.

Calculations and Metrics
The standards typically estimate system reliability from base failure rates for the components in the system. A base failure rate describes a component operating under reference ("normal," as defined by the standard) conditions. The base failure rates are then multiplied by various adjustment factors, called pi factors, that describe the specific conditions and stresses under which the component is used; depending on the condition, a pi factor can be less than or greater than 1. Some standards (such as MIL-217) also apply factors that describe the quality of the component.
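As a rough sketch of the multiplicative model just described, the snippet below computes a part failure rate as a base failure rate times a set of pi factors. The factor names (pi_T, pi_Q, pi_E) echo MIL-HDBK-217 notation, but the function name and all numeric values are hypothetical placeholders, not values taken from any standard's tables.

```python
from math import prod


def part_failure_rate(base_rate_fpmh: float, pi_factors: dict[str, float]) -> float:
    """Multiply a base failure rate (failures per million hours) by its pi factors."""
    for name, value in pi_factors.items():
        if value < 0:
            raise ValueError(f"pi factor {name} must be non-negative")
    return base_rate_fpmh * prod(pi_factors.values())


# Hypothetical inputs for illustration only; real values come from the standard's tables.
lambda_b = 0.0025  # base failure rate, FPMH
factors = {
    "pi_T": 1.5,  # temperature factor
    "pi_Q": 3.0,  # quality factor
    "pi_E": 4.0,  # environment factor
}
lambda_p = part_failure_rate(lambda_b, factors)
print(f"Part failure rate: {lambda_p:.4f} FPMH")
```

Note that every factor in this example exceeds 1: a harsh environment, an elevated temperature or a low quality grade raises the predicted failure rate above the base value.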

Standards based reliability prediction calculates failure rates by summing, or "rolling up," the failure rates of all components and subassemblies to the system level. Depending on the analysis method, it may also add the failure rates associated with the components' solder joint connections and with other construction elements, such as surface mounts, printed circuit boards (PCBs) or hybrid devices. The following metrics can be calculated:
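A minimal sketch of the roll-up, assuming a simple nested structure in which each block holds either component failure rates or further sub-blocks. The block names and numbers are hypothetical, and optional contributors such as solder joints or PCB construction are modeled simply as additional entries in a block.

```python
def roll_up(block: dict) -> float:
    """Sum the failure rates (FPMH) of all items in a block, recursing into sub-blocks."""
    total = 0.0
    for item in block.values():
        if isinstance(item, dict):
            total += roll_up(item)  # sub-block: roll up its own contents first
        else:
            total += float(item)  # leaf: a component (or connection) failure rate
    return total


# Hypothetical system: two boards plus a chassis-level connector (rates in FPMH).
system = {
    "power_board": {"regulator": 0.12, "capacitors": 0.05, "solder_joints": 0.01},
    "cpu_board": {"processor": 0.30, "memory": 0.08, "pcb": 0.02},
    "connector": 0.04,
}
print(f"System failure rate: {roll_up(system):.2f} FPMH")
```

Because a series system fails when any element fails, the rolled-up rate is a plain sum; no weighting is applied at any level.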

Failure Rate, λ: The conditional failure rate, defined as the total number of failures within an item population divided by the total time expended by that population during a particular measurement interval under stated conditions. Reliability predictions are typically stated in failures per million hours (FPMH). In Bellcore, failure rates are usually expressed in failures per billion hours, called FITs (failures in time).
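Since FPMH and FITs differ only in the time base (10^6 versus 10^9 hours), converting between them is a factor of 1000. The short sketch below shows the conversions together with the failures-over-time definition above; the function names and the example fleet data are hypothetical.

```python
FITS_PER_FPMH = 1_000  # 1 failure per 10^6 h equals 1000 failures per 10^9 h


def fpmh_to_fits(rate_fpmh: float) -> float:
    """Convert failures per million hours to FITs (failures per billion hours)."""
    return rate_fpmh * FITS_PER_FPMH


def fits_to_fpmh(rate_fits: float) -> float:
    """Convert FITs (failures per billion hours) to failures per million hours."""
    return rate_fits / FITS_PER_FPMH


def observed_failure_rate(failures: int, total_hours: float) -> float:
    """Failures divided by accumulated population operating hours, in FPMH."""
    return failures / total_hours * 1e6


# Hypothetical fleet: 3 failures over 5,000,000 accumulated unit-hours.
rate = observed_failure_rate(3, 5e6)
print(f"{rate:.2f} FPMH = {fpmh_to_fits(rate):.0f} FITs")
```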

MTBF: The Mean Time Between Failures is the expected operating time, in hours, between failures under specified conditions. With a constant failure rate, MTBF = 1/λ.

Unavailability: In standards based reliability prediction, this term is used interchangeably with unreliability in the case of non-repairable systems. Unreliability is defined as 1 - R(t), where R(t) is the reliability. Because the standards assume a constant failure rate and all of the calculations are based on failure rates or MTBF values, the reliability function is described by the exponential distribution model, given below. Substituting a specific time t gives the reliability of the system/subsystem at that time.

$$R(t) = e^{-\lambda t} \qquad \text{or} \qquad R(t) = e^{-t/\mathrm{MTBF}}$$
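A minimal sketch of these relationships, assuming the failure rate is given in FPMH and time in hours; the function names and the 50 FPMH example are hypothetical.

```python
import math


def mtbf_hours(rate_fpmh: float) -> float:
    """MTBF = 1 / lambda for a constant failure rate; lambda given in FPMH."""
    return 1e6 / rate_fpmh


def reliability(t_hours: float, rate_fpmh: float) -> float:
    """R(t) = exp(-lambda * t) under the exponential (constant failure rate) model."""
    lam_per_hour = rate_fpmh / 1e6
    return math.exp(-lam_per_hour * t_hours)


def unreliability(t_hours: float, rate_fpmh: float) -> float:
    """1 - R(t); for non-repairable systems this is also the unavailability."""
    return 1.0 - reliability(t_hours, rate_fpmh)


rate = 50.0  # hypothetical system failure rate, FPMH
mtbf = mtbf_hours(rate)  # 20,000 hours
r_1yr = reliability(8760, rate)  # one year of continuous operation
print(f"MTBF = {mtbf:.0f} h, R(8760 h) = {r_1yr:.4f}, 1 - R = {unreliability(8760, rate):.4f}")
```

Evaluating R(t) at t = MTBF gives e^-1, about 0.368: under the exponential model only about 37% of units survive to the MTBF.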

Contribution: The failure rate of an item or block (collection of items) accounts for a certain percentage of the failure rate of the next higher level of the hierarchy. This percentage is the item or block's contribution. It may be (a) the percent contribution of a component's failure rate to the total failure rate of the block (collection of components) to which it is connected, (b) the percent contribution of a component or block's failure rate to the total failure rate of the top-level hierarchy or system (collection of blocks or components) to which it is connected, or (c) the percent contribution of a system's failure rate to the total overall project (collection of systems) failure rate.
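Each of the cases above reduces to dividing an item's failure rate by the total of the level it is compared against. A small sketch, using hypothetical block names and rates; the same function applies to case (c) when given system-level rates.

```python
def contributions(rates: dict[str, float]) -> dict[str, float]:
    """Percent contribution of each item's failure rate to the total of its parent."""
    total = sum(rates.values())
    if total == 0:
        raise ValueError("total failure rate is zero; contributions are undefined")
    return {name: 100.0 * rate / total for name, rate in rates.items()}


# Hypothetical block-level failure rates (FPMH) under one system.
blocks = {"power_board": 0.18, "cpu_board": 0.40, "connector": 0.04}
for name, pct in contributions(blocks).items():
    print(f"{name:12s} {pct:5.1f}%")
```

Sorting by contribution is a quick way to see which blocks dominate the prediction and are the best candidates for design improvement.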

First-Year-Multiplier
This feature is unique to the Bellcore/Telcordia standard. Bellcore emphasizes the early-life (infant) mortality problems of electronics and the use of burn-in by manufacturers to reduce infant mortality by weeding out weak components. The Bellcore standard applies a First-Year-Multiplier factor that accounts for infant mortality risk in the failure rate prediction. The multiplier expresses the average failure rate during the first year of operation as a multiple of the steady-state failure rate. The Bellcore standard also gives "credit" for a burn-in period and reduces the First-Year-Multiplier accordingly (i.e., the longer the burn-in, the smaller the multiplier).
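To illustrate what the multiplier means in practice, the sketch below scales a steady-state failure rate by a first-year multiplier and converts it into expected first-year failures for a fleet. The multiplier values are hypothetical; the standard itself determines the actual value, including the burn-in credit, and that calculation is not reproduced here.

```python
HOURS_PER_YEAR = 8760  # continuous operation


def first_year_rate(steady_state_fits: float, multiplier: float) -> float:
    """Average first-year failure rate (FITs) = multiplier x steady-state rate."""
    return multiplier * steady_state_fits


def expected_first_year_failures(steady_state_fits: float, multiplier: float, units: int) -> float:
    """Expected failures in a fleet of `units` during the first year of operation."""
    rate_per_hour = first_year_rate(steady_state_fits, multiplier) / 1e9  # FITs -> per hour
    return rate_per_hour * HOURS_PER_YEAR * units


steady_state = 500.0  # hypothetical steady-state rate, FITs
for label, multiplier in [("no burn-in", 4.0), ("with burn-in", 2.0)]:  # hypothetical values
    n = expected_first_year_failures(steady_state, multiplier, units=1000)
    print(f"{label:12s}: {n:.2f} expected first-year failures per 1000 units")
```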

Mission Profile
Reliability must be predicted for the conditions of field use, which may vary over time. A mission profile decomposes the conditions that the product goes through over time into several homogeneous working phases.

The capability to specify a mission profile is available only in the RDF 2000 standard. This standard allows specification of a temperature mission profile with different phases. The phases can have different temperatures that influence the failure rate of the components. Each phase is also one of the following types, with its own average outside temperature swing seen by the equipment:

  • On/Off working phases

  • Permanent working phases

  • Dormant (storage) phases

The different types of phases mentioned above affect the failure rate calculation in different ways, as they apply different stresses on the components.
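The RDF 2000 model has its own phase-dependent formulas (including terms for on/off cycling and dormant phases), which are not reproduced here. As a simplified illustration only, the sketch below shows the general idea of combining phases by weighting each phase's failure rate by the time spent in it. The Phase class, the phase durations and the per-phase rates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    hours: float  # time spent in this phase per year (hypothetical)
    rate_fpmh: float  # failure rate under this phase's conditions (hypothetical)


def time_weighted_rate(phases: list[Phase]) -> float:
    """Average failure rate over the profile, weighting each phase by its duration."""
    total_hours = sum(p.hours for p in phases)
    return sum(p.rate_fpmh * p.hours for p in phases) / total_hours


profile = [
    Phase("on/off, 40 C", 2000, 0.9),
    Phase("permanent, 25 C", 6000, 0.5),
    Phase("dormant storage", 760, 0.05),
]
print(f"Profile-average failure rate: {time_weighted_rate(profile):.3f} FPMH")
```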

Repairable and/or Redundant Systems Analysis
Typical standards based reliability predictions treat devices and equipment as non-repairable series systems, in which any component failure causes a system failure and the system stays in that failed state forever. Therefore, no redundancy or repair is included in the models. Lambda Predict provides additional capabilities to include system- and/or block-level repair and redundancy in the failure rate and unavailability calculations. Repairable analysis is implemented in Lambda Predict by entering MTTR (Mean Time to Repair) data. The analyst can also specify the number of redundant units and the configuration used: simple parallel (hot standby) or cold standby (backup). The base availability is calculated from the repair rate, μ = 1/MTTR, and the failure rate, λ. For redundant systems, the unavailability is then calculated from the base availability, the failure rate and the number of units available as backups. A corresponding failure rate can also be calculated for redundant systems.
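A minimal sketch of the repairable case, assuming the textbook steady-state availability A = μ/(λ + μ) for a single unit, and assuming hot-standby units that fail and are repaired independently, so the redundant group is unavailable only when all units are down. These are standard simplifications, not necessarily the exact formulas Lambda Predict uses; cold standby requires a different model and is not shown. The function names and input values are hypothetical.

```python
def base_availability(rate_per_hour: float, mttr_hours: float) -> float:
    """Steady-state availability A = mu / (lambda + mu), with mu = 1 / MTTR."""
    mu = 1.0 / mttr_hours
    return mu / (rate_per_hour + mu)


def hot_standby_unavailability(rate_per_hour: float, mttr_hours: float, backups: int) -> float:
    """Unavailability of 1 active unit plus `backups` hot-standby units.

    Assumes identical, independent units, each with its own repair.
    """
    u = 1.0 - base_availability(rate_per_hour, mttr_hours)
    return u ** (backups + 1)


lam = 1e-4  # hypothetical failure rate, failures/h (MTBF = 10,000 h)
mttr = 8.0  # hypothetical MTTR, h
for k in range(3):
    print(f"{k} backup(s): unavailability = {hot_standby_unavailability(lam, mttr, k):.3e}")
```

Each added backup multiplies the unavailability by roughly λ·MTTR here, which is why redundancy pays off most when repairs are fast relative to the MTBF.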

Allocation
Oftentimes, a design needs to meet a certain reliability goal. For systems of multiple components/subsystems, the reliability goal needs to be apportioned (allocated) across the different components/subsystems so that the rolled-up failure rate meets the goal.

Standards based reliability prediction methods typically use one of five allocation models to logically apportion the product design reliability into lower-level design criteria so that the cumulative reliability meets the requirement. Because the models rely on different allocation techniques, they can produce different results. The five allocation methods are:

  • Equal: This method is the simplest and does not take into account any differences among the elements; it just takes the reliability goal and splits it equally across all the elements.

  • AGREE: A technique that considers both the complexity and importance of each element based on the estimated probability of product failure as a result of element failure.

  • Feasibility: Relies on numerical rating scales to assess the elements based on product complexity, state-of-the-art, operating time and environment.

  • ARINC: This method looks only at the current failure rate of a subsystem and allocates the reliability using a weighted factor calculated by taking the ratio of the subsystem's current failure rate to the sum of all the subsystems' failure rates.

  • Repairable Systems Allocation: This technique allocates subsystem failure rates so that the system meets an availability objective for a repairable system. Assuming that all subsystems are identical and have a constant repair rate, the ratio of allocated failure rate to repair rate is determined for each subsystem from a steady-state availability calculation, and the failure rate allocated to each subsystem follows from that ratio.
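Three of these methods can be sketched in a few lines, assuming a series system with constant failure rates. For the repairable case, the sketch assumes n identical, independent subsystems in series, so the system availability is the product of subsystem availabilities, with A_i = μ/(λ_i + μ). AGREE and Feasibility are omitted because they need importance, complexity or rating inputs that are not specified here. Function names and numbers are hypothetical.

```python
def equal_allocation(system_rate: float, n: int) -> list[float]:
    """Equal: each of n series elements receives the same share of the goal."""
    return [system_rate / n] * n


def arinc_allocation(current_rates: list[float], system_rate: float) -> list[float]:
    """ARINC: weight w_i = lambda_i / sum(lambda); allocated rate = w_i * goal."""
    total = sum(current_rates)
    return [system_rate * r / total for r in current_rates]


def repairable_allocation(system_availability: float, n: int, repair_rate: float) -> float:
    """Identical repairable subsystems in series.

    A_i = A_s ** (1/n), and from A_i = mu / (lambda + mu): lambda = mu * (1/A_i - 1).
    """
    a_i = system_availability ** (1.0 / n)
    return repair_rate * (1.0 / a_i - 1.0)


goal = 100.0  # hypothetical system failure rate goal, FPMH
print(equal_allocation(goal, 4))  # four elements share the goal equally
print(arinc_allocation([30.0, 60.0, 30.0], goal))  # current design totals 120 FPMH
lam_i = repairable_allocation(0.99, n=4, repair_rate=0.1)  # MTTR = 10 h
print(f"Allocated subsystem failure rate: {lam_i:.3e} per hour")
```

Note that ARINC preserves the relative ranking of the current failure rates, while Equal ignores it entirely.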

Derating Analysis
Most equipment failures are precipitated by stress. When the applied stress exceeds the inherent strength of the part, either serious degradation or failure will occur. To ensure reliability, equipment must be designed to endure stress over time without failure. In addition, design stress parameters must be identified and controlled, and parts and materials that can withstand these stresses must be selected. Derating is the selection and application of parts and materials so that the applied stress is less than rated for a specific application. A common example is a power-versus-temperature derating curve, whose negative slope shows that as the operating ambient temperature increases, the allowable output power of a component drops to ensure reliable system operation. Derating curves provide a quick way to estimate the maximum output of a device at a given temperature.
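A typical power derating curve is flat up to a "knee" temperature and then falls linearly to zero. The sketch below evaluates such a curve; the linear shape is a common convention, and the 0.25 W rating and the 70 °C / 155 °C breakpoints are hypothetical values, not figures from any derating standard.

```python
def max_power(ambient_c: float, rated_power_w: float, knee_c: float, zero_power_c: float) -> float:
    """Linear derating: full rating up to knee_c, falling linearly to zero at zero_power_c."""
    if ambient_c <= knee_c:
        return rated_power_w
    if ambient_c >= zero_power_c:
        return 0.0
    slope = rated_power_w / (zero_power_c - knee_c)  # watts lost per degree C
    return rated_power_w - slope * (ambient_c - knee_c)


# Hypothetical 0.25 W part: full power to 70 C, zero allowable power at 155 C.
for t in (25.0, 70.0, 112.5, 155.0):
    print(f"{t:6.1f} C -> {max_power(t, 0.25, 70.0, 155.0):.3f} W")
```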

In Lambda Predict 2, a derating standard can be applied to MIL-217, Bellcore or RDF 2000 systems. The available published derating standards are:

  • NAVSEA-TE000-AB-GTP-010: This standard is based on the Parts Derating Requirements and Application Manual for Navy Electronic Equipment.

  • MIL-STD-975M: This standard provides part selection for electrical, electronic and electromechanical parts used in the design and construction of space flight hardware in space missions as well as essential ground support equipment (GSE).

  • MIL-STD-1547: Electronic parts, materials and processes for space and launch vehicles.

  • Naval Air Systems Command AS-4613: Application and derating requirements for electronic components, General spec. F.

Derating is configured at the system level. It impacts only those component categories that are considered in the derating standards.

Because these derating standards impose different requirements and do not fully agree on exact parameters or values, some reliability analysts prefer to combine a published standard with their own derating requirements. Lambda Predict 2 allows for this flexibility.

Once a standard has been chosen, each component indicates whether its current stress levels are within the limits of the derating standard.
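The check described above, combined with in-house rules, can be sketched as follows. The category names and limit values are hypothetical (not taken from NAVSEA, MIL-STD-975M, MIL-STD-1547 or AS-4613), and how Lambda Predict actually merges rules is not described here; in this sketch a custom limit simply overrides the published one, and categories not covered by any rule are reported as not applicable.

```python
from typing import Optional

# Hypothetical maximum stress ratios (applied / rated) per component category.
published_limits = {"resistor_power": 0.5, "capacitor_voltage": 0.6, "diode_current": 0.5}
custom_limits = {"capacitor_voltage": 0.5}  # in-house rule tightens one limit

combined_limits = {**published_limits, **custom_limits}  # custom values override


def within_derating(category: str, applied: float, rated: float, limits: dict[str, float]) -> Optional[bool]:
    """True/False if the stress ratio is within/outside the limit; None if not covered."""
    if category not in limits:
        return None  # category not addressed by the derating rules
    return applied / rated <= limits[category]


# A capacitor at 30 V on a 50 V rating passes the published limit but not the combined one.
print(within_derating("capacitor_voltage", 30.0, 50.0, published_limits))
print(within_derating("capacitor_voltage", 30.0, 50.0, combined_limits))
```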

Copyright 2006 ReliaSoft Corporation, ALL RIGHTS RESERVED