Standards Based Reliability Prediction in a Nutshell
Standards based reliability prediction is a methodology for predicting
reliability for systems and components (mostly electronics) based on failure
rate estimates published by globally recognized military or commercial
standards. Standards based reliability prediction is especially useful in
the initial stages of development when hard failure data is not yet
available or when manufacturers are obliged contractually by their customers
to use published standards for their reliability predictions. This article
presents an overview of standards based reliability prediction and how it
can be performed with the help of the Lambda Predict software.
Assumptions and Applicability
Issue 50 of Reliability HotWire presented an introduction to standards based
reliability prediction and discussed the applicability and assumptions used
in this approach. Issue 51 presented an overview of common prediction
standards and analysis methods. Readers are encouraged to review these
issues to set the stage for this article.
Prediction Standards
The common prediction standards are MIL-HDBK-217, Bellcore/Telcordia
(SR-332), NSWC-98/LE1 (for mechanical components), China 299B (GJB/z-299B)
and RDF 2000 (IEC 62380).
Analysis Methods
The typical analysis methods are:
- Part Count Analysis method.
- Part Stress Analysis method.
In addition to these methods, which are common to all the standards,
Bellcore also uses three more methods (Method I, Method II, Method III).
Issue 51 presented an overview of the analysis methods mentioned above.
Calculations and Metrics
The standards typically estimate the system reliability by relying on
base failure rates for the components in the system. The base failure
rates describe the components while operating under "normal" (determined by
the standard) environmental conditions. The base failure rates are then
multiplied by various factors (called pi factors, which can be smaller or
larger than 1) that describe the specific conditions/stresses under which
the component is used and, in the case of some standards (such as MIL-217),
by factors that describe the quality of the component.
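The multiplication described above can be sketched in a few lines of Python. This is only an illustration: the function name, the factor names and all numeric values are hypothetical and are not taken from any of the standards, each of which defines its own base rates and pi factors per component type.

```python
from math import prod


def part_failure_rate(base_rate_fpmh, pi_factors):
    """Return the adjusted failure rate: the base rate times every pi factor."""
    return base_rate_fpmh * prod(pi_factors.values())


# Hypothetical base rate and pi factors, for illustration only.
factors = {"temperature": 0.8, "environment": 1.0, "quality": 0.5}
adjusted = part_failure_rate(0.002, factors)
print(f"Adjusted failure rate: {adjusted:.4f} FPMH")
```

Keeping the factors in a dictionary makes it easy to see which condition drives the adjusted rate up or down.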
Standards based reliability prediction calculates failure rates by summing,
or "rolling up," the failure rates of all components and subassemblies to
the system level. It may (depending on the method used by the analysis)
also add the failure rates associated with the components' solder joint
connections and other types of construction, such as surface mounts and
printed circuit boards (PCBs) or hybrid devices. The following metrics can
be calculated:
Failure Rate, λ: The conditional failure rate, defined as the total
number of failures within a population of items, divided by the total time
expended by that population, during a particular measurement interval under
stated conditions. Reliability predictions are typically stated in failures
per million hours (FPMH). In Bellcore, failure rates are usually expressed
as failures per billion hours (FITs).
MTBF: The Mean Time Between Failures is the expected number of hours of
operation between failures under specified conditions.
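As a rough sketch of the roll-up and of the two metrics defined so far, the following Python snippet sums component failure rates for a series system, converts the result from failures per million hours to failures per billion hours, and derives the MTBF under the constant failure rate assumption. The part names and rates are made up for illustration.

```python
def roll_up(failure_rates_fpmh):
    """Sum component failure rates (FPMH) to the next level of the hierarchy."""
    return sum(failure_rates_fpmh)


def fpmh_to_failures_per_billion_hours(rate_fpmh):
    """Convert failures per million hours to failures per billion hours."""
    return rate_fpmh * 1000.0


def mtbf_hours(rate_fpmh):
    """MTBF in hours for a constant failure rate given in FPMH."""
    return 1e6 / rate_fpmh


# Hypothetical component failure rates in FPMH.
parts = {"resistor": 0.01, "capacitor": 0.04, "microcircuit": 0.15}
system_rate = roll_up(parts.values())
print(f"System failure rate: {system_rate:.2f} FPMH")
print(f"Per billion hours:   {fpmh_to_failures_per_billion_hours(system_rate):.0f}")
print(f"MTBF:                {mtbf_hours(system_rate):,.0f} hours")
```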
Unavailability: In standards based reliability prediction, this term
is used interchangeably with Unreliability in the case of non-repairable
systems. Unreliability is defined as 1 - R(t), where R(t) represents
reliability. Since the standards assume a constant failure rate and all of
the calculations are based on failure rates or MTBF values, the exponential
distribution is used to describe the reliability function. The following
expressions describe the exponential distribution model; the time variable
can be used to calculate the reliability of the system/subsystem at a
specific time value.
$R(t) = e^{-\lambda t}$
or
$R(t) = e^{-t/\mathrm{MTBF}}$
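A minimal Python sketch of the exponential model follows. It assumes the failure rate is supplied in failures per million hours and the time in hours; the function names and the example values are illustrative.

```python
import math


def reliability(rate_fpmh, hours):
    """R(t) = exp(-lambda * t), with lambda converted from FPMH to per hour."""
    rate_per_hour = rate_fpmh / 1e6
    return math.exp(-rate_per_hour * hours)


def unreliability(rate_fpmh, hours):
    """Unreliability, 1 - R(t)."""
    return 1.0 - reliability(rate_fpmh, hours)


# Hypothetical system: 0.2 FPMH evaluated at one year of continuous operation.
one_year = 8760.0
print(f"R(1 year)     = {reliability(0.2, one_year):.5f}")
print(f"1 - R(1 year) = {unreliability(0.2, one_year):.5f}")
```

Note that at t = MTBF the model gives R = e^-1, or roughly 0.37, so the MTBF is not a time by which most units are expected to survive.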
Contribution: The failure rate of an
item or block (collection of items) accounts for a certain percentage of the
failure rate of the next higher level of the hierarchy. This is the item's
or block's contribution. This may be (a) the percent contribution of a
component's failure rate to the total failure rate of the block (collection
of components) to which it is connected, (b) the percent contribution of a
component or block's failure rate to the total failure rate of the top level
hierarchy or system (collection of blocks or components) to which it is
connected or (c) the percent contribution of a system's failure rate to the
total overall project (collection of systems) failure rate.
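The percent contribution can be sketched as a simple ratio. In the Python example below, the block contents and failure rates are hypothetical; the same function applies at any level of the hierarchy, as long as the rates passed in belong to the same parent.

```python
def contributions(failure_rates):
    """Return each item's percent share of the parent's total failure rate."""
    total = sum(failure_rates.values())
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return {name: 100.0 * rate / total for name, rate in failure_rates.items()}


# Hypothetical component failure rates (FPMH) within one block.
block = {"resistor": 0.01, "capacitor": 0.04, "microcircuit": 0.15}
shares = contributions(block)
for name, percent in shares.items():
    print(f"{name}: {percent:.1f}%")
```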
First-Year-Multiplier
This feature is unique to the Bellcore/Telcordia standard. Bellcore stresses
the early life (infant) mortality problems of electronics and the use of
burn-in by manufacturers to reduce the severity of infant mortality by
weeding out weak components that suffer from early life problems. The
Bellcore standard applies a First-Year-Multiplier factor that accounts for
infant mortality risks in the failure rate prediction. The multiplier
expresses the average failure rate during the first year of operation as a
multiple of the Steady-State Failure Rate. The Bellcore standard also
applies a "credit" for the use of a burn-in period and reduces the
First-Year-Multiplier accordingly (i.e., the multiplier is smaller for
longer periods of burn-in).
Mission Profile
Reliability must be predicted in accordance with the field use conditions,
which are sometimes time-varying. A mission profile can be used to
decompose the conditions that the product goes through over time into
several homogeneous working phases.
The capability to specify a mission profile is available only in the RDF
2000 standard. This standard allows specification of a temperature mission
profile with different phases. The phases can have different temperatures
that influence the failure rate of the components. Each phase can also be
one of the following types, with various average outside temperature swings
seen by the equipment:
- On/Off working phases
- Permanent working phases
- Dormant (storage) phases
The different types of phases mentioned above affect the failure rate
calculation in different ways, as they apply different stresses on the
components.
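To illustrate the idea of a mission profile, the Python sketch below computes a time-weighted average failure rate over several phases. This is a deliberate simplification and not the RDF 2000 model itself, which also accounts for effects such as temperature cycling; the phase names, time fractions and per-phase failure rates are all hypothetical.

```python
def profile_failure_rate(phases):
    """Time-weighted average failure rate over mission phases.

    phases: list of (fraction_of_time, failure_rate_fpmh) tuples whose
    fractions must sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in phases)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("phase time fractions must sum to 1")
    return sum(fraction * rate for fraction, rate in phases)


# Hypothetical profile: (fraction of calendar time, failure rate in FPMH).
mission = [
    (0.25, 0.40),  # on/off working phase
    (0.50, 0.20),  # permanent working phase
    (0.25, 0.04),  # dormant (storage) phase
]
print(f"Average failure rate: {profile_failure_rate(mission):.2f} FPMH")
```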
Repairable and/or Redundant Systems Analysis
Typical standards based reliability predictions address devices and
equipment as non-repairable serial systems, where any component failure will
cause a system failure and the system stays in that failed state forever.
Therefore, no redundancy or repair is included in the models. Lambda
Predict provides additional capabilities to include system- and/or
block-level repair and redundancy functions in the failure rate and
unavailability calculations. Repairable analysis is implemented in Lambda
Predict by simply entering MTTR (Mean Time to Repair) data. The analyst can
also specify the number of redundant units and describe the relationship
used: simple parallel configuration (hot standby) or cold standby (backup)
configuration. The base availability is calculated using the repair rate,
μ = 1/MTTR, and the failure rate, λ. In the case of redundant systems, the
unavailability is then calculated using the base availability, the failure
rate and the number of systems available as backups. A corresponding
failure rate can also be calculated for redundant systems.
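The article does not give the exact formulas that Lambda Predict uses, so the Python sketch below relies on the standard steady-state result for a single repairable unit, A = μ / (λ + μ), and assumes that hot standby units fail and are repaired independently, so that the system is unavailable only when every unit is down. The function names and numbers are illustrative.

```python
def steady_state_availability(rate_per_hour, mttr_hours):
    """Base availability A = mu / (lambda + mu), with mu = 1 / MTTR."""
    repair_rate = 1.0 / mttr_hours
    return repair_rate / (rate_per_hour + repair_rate)


def parallel_unavailability(rate_per_hour, mttr_hours, units):
    """Unavailability of `units` identical hot standby units.

    Assumes independent failures and an independent repair for each unit,
    so the system is down only when all units are down at the same time.
    """
    unit_unavailability = 1.0 - steady_state_availability(rate_per_hour, mttr_hours)
    return unit_unavailability ** units


# Hypothetical unit: 1e-4 failures per hour (MTBF of 10,000 h), MTTR of 10 h.
availability = steady_state_availability(1e-4, 10.0)
print(f"Single unit availability:   {availability:.6f}")
print(f"Two units, unavailability:  {parallel_unavailability(1e-4, 10.0, 2):.3e}")
```

A cold standby arrangement would need a different model, since the backup unit does not accumulate operating time until it is switched in.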
Allocation
Oftentimes, a design needs to meet a certain reliability goal. For systems
of multiple components/subsystems, the reliability goal needs to be
apportioned (allocated) across the different components/subsystems in a way
that the rolled-up failure rate would meet the reliability goal.
Standards based reliability prediction methods typically use one of five
allocation models that can be used to logically apportion the product design
reliability into lower level design criteria such that the cumulative
reliability can meet the requirement. The methods rely on different
allocation techniques and would therefore produce different results. The
five allocation methods are:
- Equal: This method is the simplest and does not take into account any
differences among the elements; it just takes the reliability goal and
splits it equally across all the elements.
- AGREE: A technique that considers both the complexity and importance of
each element based on the estimated probability of product failure as a
result of element failure.
- Feasibility: Relies on numerical rating scales to assess the elements
based on product complexity, state-of-the-art, operating time and
environment.
- ARINC: This method looks only at the current failure rate of a subsystem
and allocates the reliability using a weighted factor calculated by taking
the ratio of the subsystem's current failure rate to the sum of all the
subsystems' failure rates.
- Repairable Systems Allocation: This technique allocates subsystem failure
rates to allow the system to meet an availability objective for a repairable
system. Assuming that all subsystems are identical and have a constant
repair rate and by determining the ratio of allocated failure rate to the
repair rate for each subsystem based on a steady-state availability
calculation, the failure rate allocated to each subsystem can be determined.
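Two of the simpler methods above, Equal and ARINC, can be sketched in Python as follows. The sketch assumes a series system with constant failure rates, so the goal is expressed as a system failure rate and the allocated rates must sum to it. The goal value and the current subsystem failure rates are hypothetical.

```python
def equal_allocation(goal_rate, subsystem_count):
    """Split the system failure rate goal evenly across all subsystems."""
    return [goal_rate / subsystem_count] * subsystem_count


def arinc_allocation(goal_rate, current_rates):
    """Weight each subsystem by its share of the current total failure rate."""
    total = sum(current_rates)
    return [goal_rate * rate / total for rate in current_rates]


# Hypothetical goal of 0.10 FPMH for a system currently predicted at 0.20 FPMH.
current = [0.01, 0.04, 0.15]
print("Equal:", equal_allocation(0.10, len(current)))
print("ARINC:", arinc_allocation(0.10, current))
```

Both methods produce allocations that add up to the goal, but ARINC preserves the relative proportions of the current prediction, whereas Equal ignores them.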
Derating Analysis
Most equipment failures are precipitated by stress. When the applied stress
exceeds the inherent strength of the part, either a serious degradation or a
failure will occur. To assure reliability, equipment must be designed to
endure stress over time without failure. In addition, design stress
parameters must be identified and controlled and parts and materials that
can withstand these stresses must be selected. Derating is the selection and
application of parts and materials so that the applied stress is less than
the rated value for a specific application. Graphically, derating appears as
the negative slope of a power versus temperature graph: as the operating
ambient temperature increases, the allowable output power of a particular
component is reduced to ensure reliable system operation. Derating curves
provide a quick way to estimate the maximum output of a device at a given
temperature.
In Lambda Predict 2, a derating standard can be applied to MIL-217,
Bellcore or RDF 2000 systems. The available published derating standards
are:
- NAVSEA-TE000-AB-GTP-010: This standard is based on the Parts Derating
Requirements and Application Manual for Navy Electronic Equipment.
- MIL-STD-975M: Electronics parts, materials and processes for space and
launch vehicles.
- MIL-STD-1547: This standard provides part selection for electrical,
electronic and electromechanical parts used in the design and construction
of space flight hardware in space missions as well as essential ground
support equipment (GSE).
- Naval Air System Command AS-4613: Application and derating requirements
for electronic components, General spec. F.
Derating is configured at the system level. It impacts only those component
categories that are considered in the derating standards.
Because these derating standards dictate different requirements for derating
and do not all agree completely in their exact parameters or values, some
reliability analysts prefer to combine a published standard with their own
derating requirements. Lambda Predict 2 allows for this flexibility.
Once a standard has been chosen, each component will indicate whether its
current stress levels are within the limits of the derating standard.
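A derating check of this kind can be sketched as a comparison between a stress ratio (applied stress divided by rated stress) and a maximum allowed ratio. In the Python example below, the limit table, component categories and values are hypothetical placeholders; they are not taken from any of the published derating standards or from Lambda Predict.

```python
# Hypothetical derating limits: maximum allowed ratio of applied to rated stress.
DERATING_LIMITS = {
    ("resistor", "power"): 0.5,
    ("capacitor", "voltage"): 0.6,
}


def stress_ratio(applied, rated):
    """Ratio of the applied stress to the part's rated stress."""
    return applied / rated


def within_derating(category, stress_type, applied, rated):
    """Return True if the stress ratio does not exceed the derating limit."""
    limit = DERATING_LIMITS[(category, stress_type)]
    return stress_ratio(applied, rated) <= limit


# A 0.25 W resistor dissipating 0.2 W runs at 80% of rating: over the 50% limit.
print(within_derating("resistor", "power", applied=0.2, rated=0.25))
# A 50 V capacitor operated at 25 V runs at 50% of rating: within the 60% limit.
print(within_derating("capacitor", "voltage", applied=25.0, rated=50.0))
```

Component categories missing from the table raise a KeyError here; a real tool would instead skip categories that the chosen standard does not cover.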