Reliability Glossary
Alphabetical Listing
The following glossary contains brief definitions of terms frequently used in reliability engineering and life data analysis. The purpose of these entries is to provide a quick explanation of the terms in question, not to provide extensive explanations or mathematical derivations. For those desiring such detailed descriptions, links have been provided when possible for more extensive coverage elsewhere in ReliaSoft's reliability engineering knowledge base.
For ease of reference, the contents of this Reliability Glossary have also been subdivided into topic-specific categories. View the Subject Listing.
A
Accelerated life testing
A testing strategy whereby the
engineer extrapolates a product’s failure behavior at normal conditions
from life data obtained at accelerated stress levels. Since products
fail more quickly at higher stress levels, this sort of strategy allows
the engineer to obtain reliability information about a product (e.g.,
mean life, probability of failure at a specific time, etc.) in a shorter
time.
Acceleration factor
The ratio of the product’s life at the use stress level to its life at an accelerated stress level.
For example, if the product has a life of 100 hours at the use stress level, and
it is being tested at an accelerated stress level which reduces its life to 50
hours, then the acceleration factor is 2.
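The ratio in the example above can be written as a one-line calculation. This is a minimal sketch; the function name is hypothetical and is not taken from any ReliaSoft product.

```python
def acceleration_factor(life_at_use: float, life_at_accelerated: float) -> float:
    """Ratio of life at the use stress level to life at the accelerated level."""
    if life_at_use <= 0 or life_at_accelerated <= 0:
        raise ValueError("life values must be positive")
    return life_at_use / life_at_accelerated


# The example from the definition: 100 hours at use stress, 50 hours accelerated.
print(acceleration_factor(100.0, 50.0))  # 2.0
```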
AMPM
"AMPM" stands for "AMSAA maturity projection
model." This is an enhanced reliability growth
model that predicts failure rates in future stages of
development. It allows the user to assess the effectiveness of proposed
and implemented fixes in order to estimate the future failure rate.
AMSAA model
"AMSAA" stands for "Army
Materiel Systems Analysis Activity." This is
a reliability growth model that uses a relationship
between cumulative test time and cumulative failures to track and project
reliability growth.
ANOVA
ANOVA stands for analysis
of variance, a method by which the source of variability is
identified. This method is widely used in industry to help identify the
source of potential problems in the production process, and identify whether
variation in measured output values is due to variability between various
manufacturing processes, or within them. By varying the factors in a predetermined
pattern and analyzing the output, one can use statistical techniques to
make an accurate assessment as to the cause of variation in a manufacturing
process.
Analysis plan
A plan used to keep
track of team members, ground rules and assumptions, estimated completion
dates, scheduled work sessions and other details to help an analyst plan
and manage analysis projects.
Arrhenius model
An accelerated life
testing model used to establish a relationship
between absolute temperature and reliability. It was originally developed
by Swedish chemist Svante Arrhenius to define the relationship between temperature
and the rates of chemical reaction.
Availability
The probability that
an item will be able to function (i.e., it will not be failed or undergoing
repair) when called upon to do so. This measure takes into account an item’s
reliability (how quickly it fails) and its maintainability (how quickly
it can be repaired).
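One common way to combine the two ingredients is the steady-state (inherent) availability ratio, MTBF / (MTBF + MTTR). The sketch below assumes that formula, which the glossary entry itself does not state, and uses a hypothetical function name.

```python
def steady_state_availability(mtbf: float, mttr: float) -> float:
    """Long-run fraction of time the item is up: MTBF / (MTBF + MTTR).

    Assumption: mtbf is the mean operating time between failures and
    mttr is the mean time to repair, both in the same time unit.
    """
    if mtbf <= 0 or mttr < 0:
        raise ValueError("mtbf must be positive and mttr non-negative")
    return mtbf / (mtbf + mttr)


# An item that runs 990 hours on average and takes 10 hours to repair.
print(steady_state_availability(990.0, 10.0))
```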
B
BX% life
The time at which X% of the units
in a population will have failed. For example, if an item has a B10 life
of 100 hours, then 10% of the population will have failed by 100 hours of
operation.
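As an illustration, the BX% life can be computed in closed form once a life distribution has been chosen. The sketch below assumes a 2-parameter Weibull distribution with shape beta and scale eta; the function names are hypothetical.

```python
import math


def weibull_unreliability(t: float, beta: float, eta: float) -> float:
    """Weibull cdf: fraction of the population failed by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_bx_life(x_percent: float, beta: float, eta: float) -> float:
    """Time by which x_percent of the population is expected to have failed."""
    if not 0.0 < x_percent < 100.0:
        raise ValueError("x_percent must be between 0 and 100")
    # Solve 1 - exp(-(t/eta)^beta) = x/100 for t.
    return eta * (-math.log(1.0 - x_percent / 100.0)) ** (1.0 / beta)


b10 = weibull_bx_life(10.0, beta=1.5, eta=1000.0)
# Plugging the B10 life back into the cdf recovers 10% unreliability.
print(b10, weibull_unreliability(b10, 1.5, 1000.0))
```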
Block diagram
A diagram that represents
how the components, represented by "blocks," are arranged and related reliability-wise
in a larger system. This is often but not necessarily the same as the way
that the components are physically related. This is also called a reliability
block diagram or RBD.
C
Censored data
Data in which not all
of the data points represent exact failure times (e.g., there may be operation
times for units that have not failed). Censoring schemes include right-censoring,
left-censoring and interval censoring.
Competing failure modes
A model whereby items
that fail due to more than one failure mode can be represented as a series
reliability system with each block representing a failure mode. The failure
modes are considered to be "competing" amongst each other to see which one
will cause the item to fail.
Complete data
Data that consists of only exact failure times.
Complex system
A block diagram that cannot be reduced to series
and/or parallel systems.
Conditional reliability
The probability that a product will operate successfully for a specific time interval, given that it has already operated successfully up to a specified time
(e.g., the probability that an item that has survived for 100 hours will
survive for an additional 100 hours).
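Conditional reliability follows from the reliability function as R(t | T) = R(T + t) / R(T). The sketch below assumes a 2-parameter Weibull reliability function purely for illustration; the function names and parameter values are hypothetical.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability of surviving to time t under a 2-parameter Weibull."""
    return math.exp(-((t / eta) ** beta))


def conditional_reliability(t: float, age: float, beta: float, eta: float) -> float:
    """Probability of surviving an additional t, given survival to `age`."""
    return weibull_reliability(age + t, beta, eta) / weibull_reliability(age, beta, eta)


# Chance that a unit that has survived 100 hours survives 100 more.
print(conditional_reliability(100.0, 100.0, beta=2.0, eta=500.0))
```

With beta = 1 the Weibull reduces to the exponential distribution, and the result no longer depends on the age of the unit.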
Confidence bounds
A measure of the precision of a statistical estimate.
This is represented by a range of values that the particular estimate should
fall within a specified percentage of the time. For example, if we perform
ten different reliability tests for our product and analyze the results,
we will obtain slightly different parameters for the distribution each time,
and thus slightly different reliability results. However, by employing confidence
bounds, we obtain a range within which these reliability values are likely
to occur a certain percentage of the time. This helps us gauge the utility
of the data and the accuracy of the resulting estimates.
Contour plot
A graphical representation of the possible solutions
to the likelihood ratio equation. This is employed to determine confidence
bounds as well as make comparisons between two different data sets.
Control plan
A plan used to keep
track of characteristics that affect a product during the manufacturing
process to ensure that the desired product specifications are met.
It is often integrated with the
PFD worksheet and/or
process FMEA.
Cumulative damage model
An accelerated
life testing model used to analyze data with multiple stress types and/or
situations where the stress varies with time.
Criticality analysis
A method for prioritizing
issues that takes into account the probability of failure for the item and
the portion of the failure likelihood that can be attributed to a particular
failure mode. The resulting prioritization is used to determine the sequence
and time-frame for the corrective actions that will be performed.
Cumulative distribution function (cdf)
A function obtained by integrating the failure distribution
pdf. In life data analysis, the cdf is equivalent
to the unreliability function.
D
Decomposition method
A method for
determining the reliability of complex systems. The decomposition method
is an application of the law of total probability, which involves choosing
a "key" component and then calculating the reliability of the system twice:
once as if the key component failed and once as if the key component succeeded.
These two probabilities are then combined to obtain the reliability of the
system, since at any given time the key component will be failed or operating.
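The decomposition method can be sketched on a five-component bridge network, a standard example of a complex system. The layout is an assumption made for illustration: A and B are on the input side, C and D on the output side, and the key component E links the two midpoints, so the success paths are A-C, B-D, A-E-D and B-E-C. Components are assumed to fail independently.

```python
def series(*rs: float) -> float:
    """Reliability of independent blocks in series."""
    out = 1.0
    for r in rs:
        out *= r
    return out


def parallel(*rs: float) -> float:
    """Reliability of independent blocks in parallel."""
    q = 1.0
    for r in rs:
        q *= 1.0 - r
    return 1.0 - q


def bridge_reliability(r_a, r_b, r_c, r_d, r_e):
    # Key component E working: (A parallel B) in series with (C parallel D).
    if_e_works = series(parallel(r_a, r_b), parallel(r_c, r_d))
    # Key component E failed: (A series C) in parallel with (B series D).
    if_e_failed = parallel(series(r_a, r_c), series(r_b, r_d))
    # Law of total probability over the two states of E.
    return r_e * if_e_works + (1.0 - r_e) * if_e_failed


print(bridge_reliability(0.9, 0.9, 0.9, 0.9, 0.9))
```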
Degradation analysis
A technique
that uses the performance (degradation) measurements of a product over time
to predict the point at which each unit in the sample is expected to fail.
This analysis is useful for tests performed on products with very high reliability,
where it is not possible to test the units to failure under normal conditions.
Design FMEA
An
FMEA performed with the objective of improving the design
of a subsystem or component.
Design for reliability (DFR)
A process in which
a set of reliability engineering practices is used early in a product's
design and integrated into the entire product development cycle.
Design reviews based on failure mode (DRBFM)
A methodology used
to evaluate proposed changes to an existing design. DRBFM uses a worksheet
similar to the FMEA worksheet, but it typically focuses on the failure modes
that might be introduced by a specific change to a product or process.
Detection ratings
Qualitative ratings
used to estimate the likelihood of prior detection for each cause of failure
(i.e., the likelihood of detecting the problem before it reaches the end
user or customer).
Downtime
The amount of time during which a repairable unit
is not operating. This can be due to being in a failed state, administrative
delay, waiting for replacement parts to be shipped or undergoing active
repair.
Duane model
A
reliability growth model similar to the
AMSAA model that uses a relationship between cumulative
test time and cumulative failures to develop a reliability growth profile.
E
Event space method
A method for determining the reliability
of complex systems. With the event space method, all mutually exclusive
events are determined. The reliability of the system is simply the probability
of the union of all mutually exclusive events that yield a system success
(the unreliability is the probability of the union of all mutually exclusive
events that yield a system failure).
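A minimal sketch of the event space method enumerates every combination of component states, which are mutually exclusive by construction, and adds up the probabilities of the combinations in which the system works. The three-component layout (A in series with B and C in parallel) and the independence of components are assumptions made for illustration.

```python
from itertools import product


def system_works(state: dict) -> bool:
    """Hypothetical system: A in series with the parallel pair (B, C)."""
    return state["A"] and (state["B"] or state["C"])


def event_space_reliability(reliabilities: dict, works) -> float:
    names = list(reliabilities)
    total = 0.0
    # Each tuple of True/False values is one mutually exclusive event.
    for outcome in product([True, False], repeat=len(names)):
        state = dict(zip(names, outcome))
        prob = 1.0
        for name in names:
            r = reliabilities[name]
            prob *= r if state[name] else (1.0 - r)
        if works(state):
            total += prob
    return total


print(event_space_reliability({"A": 0.9, "B": 0.9, "C": 0.9}, system_works))
```

The number of events doubles with each added component, so this approach is practical only for small systems.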
Exponential distribution
A lifetime
statistical distribution that assumes a constant failure rate for the product
being modeled.
Eyring model
An accelerated life testing
model based on quantum mechanics that is typically used when temperature
or humidity is the accelerated stress.
F
Failure distribution
A mathematical
model that describes the probability of failures occurring over time. Also
known as the probability density function (pdf),
this function is integrated to obtain the probability that the failure time
takes a value in a given time interval. This function is the basis for other
important reliability functions, including the reliability
function, the failure rate function and
the mean life.
Failure effect categorization (FEC)
A process whereby
the effects of a system’s functional failures are evaluated and categorized
in order to help the analyst prioritize identified issues and choose the
appropriate maintenance strategy to address them.
Failure mode and effects analysis (FMEA)
A methodology designed
to identify potential failure modes for a product or process, to assess
the risk associated with those failure modes, to rank the issues in terms
of importance and to identify and carry out corrective actions to address
the most serious concerns.
Failure mode criticality
see Criticality analysis
Failure modes and reliability analysis (FMRA)
A methodology that
uses information from FMEAs as a starting point for other reliability/availability
analyses and cost calculations.
Failure modes, effects and criticality analysis (FMECA)
Similar to the
FMEA methodology, except that it includes a
criticality analysis.
Failure rate
A function that describes
the number of failures that can be expected to take place over a given unit
of time. The failure rate function has the units of failures per unit time
among surviving units (e.g., one failure per month).
Fisher matrix
A mathematical expression
that is used to determine the variability of estimated parameter values
based on the variability of the data used to make the parameter estimates.
It is used to determine confidence bounds when using maximum
likelihood estimation (MLE) techniques.
Functional failure analysis (FFA)
see Failure
mode and effects analysis (FMEA)
G
Gaussian distribution
see Normal
distribution
General log-linear model
An accelerated
life testing model that can account for multiple non-thermal stresses as
acceleration factors. In ALTA PRO, this model allows the user to select
a life-stress relationship (Arrhenius, Inverse Power Law or Exponential) for each stress.
Generalized gamma distribution
A lifetime distribution that is used less
frequently to model life data than other life distributions, but that can
mimic the attributes of other distributions such as the Weibull or
lognormal, depending on the values of its parameters. This ability to
behave like more commonly used life distributions is sometimes used to
determine which of those distributions should be used to model a particular
set of data.
Gompertz model
A
reliability growth model that models reliability
values at different stages of development and produces an S-shaped reliability
growth curve.
H
HALT
"HALT" stands for "Highly accelerated
life testing." It is an accelerated testing method
used primarily to reveal probable failure modes for the product.
HASS
"HASS" stands for "Highly accelerated
stress screening." It is
similar to the HALT testing method, except it is applied during the production
stage to prevent the shipment of defective items.
Hazard rate
see Failure rate
I
Importance measure
A measure of the
relative contribution of a component to the overall system’s
reliability. The importance measure of a component is equivalent to the
first partial derivative of the system reliability with respect to the
component reliability.
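Taking the importance measure as the partial derivative of system reliability with respect to a component's reliability, it can be evaluated exactly when components are independent. System reliability is then linear in each component's reliability, so the derivative equals the system reliability with the component perfect minus the system reliability with the component failed. The layout below (A in series with B and C in parallel) is a hypothetical example.

```python
def system_reliability(r: dict) -> float:
    """Hypothetical system: A in series with the parallel pair (B, C)."""
    return r["A"] * (1.0 - (1.0 - r["B"]) * (1.0 - r["C"]))


def importance(system, r: dict, name: str) -> float:
    """Partial derivative of system reliability w.r.t. component `name`."""
    perfect = dict(r, **{name: 1.0})  # component never fails
    failed = dict(r, **{name: 0.0})   # component always failed
    return system(perfect) - system(failed)


r = {"A": 0.9, "B": 0.8, "C": 0.7}
for name in r:
    print(name, importance(system_reliability, r, name))
```

In this example the series component A has the largest importance, so improving A raises system reliability the most per unit of component improvement.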
Interval censored data
Data that represents a range of time within which the unit is known to have
failed (e.g., it might be observed that a unit failed at some point between
50 and 100 hours of operation).
Inverse power law
An accelerated life testing
model commonly used when the accelerating factor is a single, non-thermal
stress (e.g., vibration, voltage or temperature cycling).
J
K
Kaplan-Meier estimator
This is an estimator used as an alternative
to the median ranks method for calculating the estimates of the unreliability
for probability plotting purposes. It is also used to determine reliability
estimates for nonparametric data analysis.
L
Left censored data
A type of interval
censored data where the failure is only known to have occurred before
a specific time (e.g., it might be observed that a unit failed at some point
before 500 hours of operation).
Life data analysis
The statistical
analysis of failure and usage data performed to be able to mathematically
model the reliability and failure characteristics of a product.
Life distribution
see Failure
distribution
Life-stress relationship
A relationship that describes how stress levels affect the reliability of a product.
Various mathematical models (e.g., the
Arrhenius model) are available to
describe a product's life-stress relationship.
Likelihood function
A function that represents the joint probability
of all the points in a data set. For complete data, the likelihood function
consists of the product of the pdf for each data point; for data
sets that also include suspended or censored data, the likelihood function
is more complex. Maximum likelihood estimation (MLE)
techniques maximize this function in order to determine the best parameter
estimates.
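As a sketch of how suspensions enter the likelihood, the example below assumes an exponential distribution with failure rate lam. Each exact failure time contributes the pdf, and each right-censored (suspended) time contributes the reliability at that time. The function names and data values are hypothetical.

```python
import math


def exponential_log_likelihood(lam: float, failures, suspensions) -> float:
    """Log-likelihood for exact failure times plus right-censored times."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    ll = 0.0
    for t in failures:
        ll += math.log(lam) - lam * t  # log of pdf: lam * exp(-lam * t)
    for t in suspensions:
        ll += -lam * t                 # log of reliability: exp(-lam * t)
    return ll


def exponential_mle(failures, suspensions) -> float:
    """Closed-form maximizer: failures divided by total time on test."""
    return len(failures) / (sum(failures) + sum(suspensions))


failures, suspensions = [10.0, 20.0, 30.0], [40.0]
lam_hat = exponential_mle(failures, suspensions)
print(lam_hat, exponential_log_likelihood(lam_hat, failures, suspensions))
```

For most other distributions no closed-form maximizer exists and the log-likelihood is maximized numerically.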
Likelihood ratio
The ratio of a likelihood function for an
unknown parameter vector to the likelihood function calculated at the estimated
parameter vector. The relationship of this ratio to the chi-squared distribution
can then be used to calculate confidence bounds and confidence regions.
Lloyd-Lipow model
A
reliability growth model based on the number
of trials and successes at each stage of product development.
Lognormal distribution
A lifetime statistical
distribution that is often used to model products in which physical fatigue
is the prominent contributor to the primary failure mode.
M
Maintainability
The probability
that a failed unit will be repaired within a given amount of time. The term
is also used to denote the discipline of studying and improving the maintainability
of products, primarily by reducing the amount of time required to diagnose
and repair failures.
Maintenance tasks
Activities intended
to repair or maintain a system (e.g., inspections).
Maximum likelihood estimation (MLE)
A method
of parameter estimation involving the maximization of the likelihood equation.
The best parameter estimates are obtained by determining the parameter values
that maximize the value of the likelihood equation for a particular data
set.
Mean life
A reliability measure that
represents the expected value of the failure times for a failure distribution,
also known as the average or central life value. While this represents a
useful representative value of a distribution of failure times, it is often
over-used as the sole reliability metric.
Median ranks
Measures used to obtain estimates
of the unreliability. Median ranks are the values that the true probability
of failure should have at the jth failure out of a sample of N units, at
a 50% confidence level, or the best estimate for the unreliability. This
estimate is based on a solution of the binomial equation.
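The binomial equation mentioned above can be solved numerically. The sketch below finds the probability p at which the chance of seeing at least j failures among N units equals 50%, using simple bisection. The function name is hypothetical, and Benard's approximation, (j - 0.3) / (N + 0.4), is shown only as a familiar cross-check.

```python
from math import comb


def median_rank(j: int, n: int, confidence: float = 0.5) -> float:
    """Solve sum_{k=j}^{n} C(n,k) p^k (1-p)^(n-k) = confidence for p."""
    if not 1 <= j <= n:
        raise ValueError("j must satisfy 1 <= j <= n")

    def at_least_j(p: float) -> float:
        return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(j, n + 1))

    lo, hi = 0.0, 1.0
    for _ in range(100):  # at_least_j increases with p, so bisection converges
        mid = (lo + hi) / 2.0
        if at_least_j(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


n = 5
for j in range(1, n + 1):
    print(j, median_rank(j, n), (j - 0.3) / (n + 0.4))
```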
Mixed Weibull distribution
A variation
of the Weibull distribution used to model data with
distinct subpopulations that may represent different failure characteristics
over the lifetime of a product. Each subpopulation has separate Weibull
parameters calculated and the results are combined in a mixed Weibull distribution
to represent all of the subpopulations in one function.
Mode ratio
The probability that the item's failure will be due to the failure mode
under consideration. In other words, this represents the percentage of
all failures for the item that will be due to the given mode.
Modified Gompertz model
A
reliability growth model based
on a variation of the Gompertz model.
Monte Carlo simulation
A method of generating values from a
known distribution for the purposes of experimentation. This is accomplished
by generating uniform random variables and using them in an inverse reliability
equation to produce failure times that would conform to the desired input
distribution.
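A minimal sketch of this procedure, assuming a 2-parameter Weibull as the input distribution: each uniform random number u is treated as an unreliability value and the Weibull cdf is inverted to obtain a failure time. The function name, parameter values and seed are hypothetical.

```python
import math
import random


def sample_weibull_failure_times(n: int, beta: float, eta: float, seed=None):
    """Draw n failure times by inverting the Weibull cdf at uniform values."""
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1)
        # Solve 1 - exp(-(t/eta)^beta) = u for t.
        times.append(eta * (-math.log(1.0 - u)) ** (1.0 / beta))
    return times


times = sample_weibull_failure_times(20000, beta=1.5, eta=100.0, seed=1)
# About 63.2% of Weibull failure times fall at or below eta.
print(sum(t <= 100.0 for t in times) / len(times))
```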
MTBF
In the case of repairable systems,
"MTBF" stands for "mean time between failures."
This average time excludes the time spent waiting for repair, being repaired,
being re-qualified, and other downing events such as inspections and preventive
maintenance; it is intended to measure only the time a system
is available and operating. In the case of non-repairable systems,
"MTBF" stands for "mean time before failure" and
is represented by the mean life value for a failure
distribution of non-repairable units.
MTTF
"MTTF" stands for "mean time to failure"
and is represented by the mean life value for a
failure distribution of non-repairable units.
MTTR
"MTTR" stands for "mean time to repair"
and is represented by the mean life value for a
distribution of repair times (see Maintainability).
N
NHPP
"NHPP" stands for "non-homogeneous
Poisson process," which is a simple parametric model used to
represent events with a non-constant failure recurrence rate. This type
of model is often used to model
reliability growth and the reliability of repairable
units.
Nonparametric analysis
A method
of analysis that allows the user to characterize failure data without assuming
an underlying failure distribution. This avoids the potentially large errors
brought about by making incorrect assumptions about the distribution. However,
the confidence bounds associated with nonparametric analysis are usually
much wider than those calculated via parametric analysis. Additionally,
predictions outside the range of the observations are not possible.
Normal distribution
A common lifetime
statistical distribution that was developed by mathematician C. F. Gauss.
The distribution is a continuous, bell-shaped distribution that is symmetric
about its mean and can take on values from negative infinity to positive
infinity.
O
Occurrence ratings
Qualitative ratings
used to estimate the likelihood of occurrence for each cause of failure.
P
Path-tracing method
A method for determining the reliability
of complex systems. With this method, every path from a starting point to
an ending point is considered. Since system success involves having at least
one path available from one end of the reliability
block diagram to the other, as long as at least one path is available,
the system has not failed. The reliability of the system is simply the probability
of the union of these paths.
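The probability of the union of paths can be evaluated with the inclusion-exclusion principle. The sketch below assumes independent components and uses a five-component bridge network as a hypothetical example, with success paths A-C, B-D, A-E-D and B-E-C.

```python
from itertools import combinations


def path_tracing_reliability(paths, reliabilities: dict) -> float:
    """P(at least one path works), by inclusion-exclusion over the paths."""
    total = 0.0
    for k in range(1, len(paths) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(paths, k):
            # All chosen paths work only if every component on them works.
            components = set().union(*subset)
            prob = 1.0
            for name in components:
                prob *= reliabilities[name]
            total += sign * prob
    return total


BRIDGE_PATHS = [{"A", "C"}, {"B", "D"}, {"A", "E", "D"}, {"B", "E", "C"}]
r = {name: 0.9 for name in "ABCDE"}
print(path_tracing_reliability(BRIDGE_PATHS, r))
```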
Plotting paper
see Probability
plotting paper
Probability
A quantitative description of the likelihood
of a particular event. Probability is conventionally expressed on a scale
from 0 to 1, or 0% to 100%, with an unlikely event having a probability
close to 0, and a very common event having a probability close to 1.
Probability density function (pdf)
A mathematical model that describes the probability of events occurring
over time. This function is integrated to obtain the probability that the
event time takes a value in a given time interval. In
life data analysis, the event in question
is a failure, and the pdf is the basis for other important reliability
functions, including the reliability function,
the failure rate function and the
mean life.
Probability plot
A type of plot that linearizes a distribution’s
cdf, allowing the user to manually plot failure time vs. estimated
unreliability. Provided that the plotted points fall on a relatively straight
line (thus indicating that the chosen distribution is a good fit), the parameter
estimates can be obtained from scales on the plot. This is a crude, time-consuming
method of fitting a distribution to failure data, but it was practically
the only method available prior to the widespread use of computers.
Probability plotting paper
A specially
designed type of graph paper that allows the user to plot failure time vs.
unreliability as a linear function. Plotting paper construction varies
from distribution to distribution. Probability plotting papers that have
been generated by ReliaSoft's software are available on the Web at
http://www.weibull.com/GPaper/index.htm.
Process flow diagram (PFD)
A high level chart
that helps the analyst visualize the steps that a product goes through in
a manufacturing or assembly process.
Process flow diagram (PFD) worksheet
A worksheet that captures details about what happens to an item in each step of its manufacturing
or assembly process and records the product and process characteristics
that are important to keep under control. The information from this worksheet
can be used as an input to the process FMEA (PFMEA) and control plan for the item.
Process FMEA
An FMEA performed with the objective of improving the design
of a manufacturing process.
Proportional hazards model
An accelerated life testing model that can account for multiple non-thermal
stresses as acceleration factors. This model allows the use of zero as a stress value, which enables the
analysis of data with indicator variables (e.g., 0 = on/off and 1 = continuous operation).
Q
Quality
A common buzzword referring to
the non-quantifiable point-level excellence of a product or process. While
sometimes used interchangeably with the term reliability, quality refers
to the characteristics of a product at one point in time, while
reliability refers to the characteristics of
a product over its entire lifetime.
R
Reliability
The probability of an
item operating for a given amount of time without failure. More generally,
reliability is the capability of parts, components, equipment, products
and systems to perform their required functions for desired periods of time
without failure, in specified environments and with a desired confidence.
Reliability analysis
see Life
data analysis
Reliability block diagram
see Block
diagram
Reliability centered maintenance (RCM) analysis
A structured framework
for analyzing the functions and potential failure modes for a physical asset
(such as an airplane, a manufacturing production line, etc.) in order to
develop a scheduled maintenance plan that will provide an acceptable level
of operability, with an acceptable level of risk, in an efficient and cost-effective
manner.
Reliability growth
The analysis of the
change in reliability over time, usually applied to products under development.
Reliability growth analysis provides the means by which the reliability,
mean life or failure rate is tracked over time, allowing the user to predict
future reliability values based on the current rate of growth of the reliability
measurement of interest.
Reliability importance
see Importance
measure
Reliability life data analysis
see Life data analysis
Reliability test design
The process of designing plans for
reliability testing.
Reliability testing
Testing units to failure in order to obtain
raw failure time data for life data analysis.
Reliable life
see Warranty time
Repair
An action that restores a failed part or component to
operating condition.
Repair distribution
A mathematical model that describes the
probability of repairs occurring over time.
Repairable system
A system that can be restored to operating
condition after a failure by the repair or replacement of one or more components.
Right censored data
Data that represents
the length of time during which a unit has operated without failure (e.g.,
it might be observed that a unit did not fail during a 100-hour test); also
called suspended data.
Risk discovery analysis
A preliminary analysis
that often involves answering questions and/or assigning ratings about possible
risks. It is used to help the analyst choose which items should receive
more detailed consideration via a failure mode and effects
analysis (FMEA) or reliability centered maintenance (RCM)
analysis.
Risk priority number (RPN) system
A relative rating
system that prioritizes issues by assigning numerical values to them in
each of three different categories: Severity (S),
Occurrence (O) and Detection
(D). The three ratings are multiplied together to determine the overall
RPN for each issue, where higher RPNs indicate higher risk priority.
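The RPN calculation and the resulting ranking can be sketched as follows. The issue names and rating values are hypothetical, and the 1 to 10 rating scale is an assumption reflecting common practice rather than something stated in this glossary.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number: the product of the three ratings."""
    return severity * occurrence * detection


# Hypothetical issues with (severity, occurrence, detection) ratings on 1-10 scales.
issues = {
    "seal leak": (8, 3, 5),
    "connector corrosion": (6, 6, 4),
    "label fading": (2, 7, 2),
}

# Higher RPN means higher risk priority, so sort in descending order.
ranked = sorted(issues, key=lambda name: rpn(*issues[name]), reverse=True)
for name in ranked:
    print(name, rpn(*issues[name]))
```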
S
Sequential testing
A testing
methodology in which test units are tested consecutively instead of simultaneously.
Severity ratings
Qualitative ratings
used to estimate the severity of each effect of failure.
Spares provisioning
The stocking of spare units or components
based on the anticipated number of failures for a given mission or length
of operation.
SPRT
SPRT stands for sequential probability
ratio test. This is a type of accept/reject
sequential testing in which accept/reject
boundaries are defined by the user and units are sequentially tested until
either the accept boundary or the reject boundary has been reached, and
a decision is made about the suitability of the units.
Statistics
The branch of mathematics that deals with the collection,
organization, analysis and interpretation of data.
Stress testing
A testing strategy whereby units are tested
at stresses higher than what would be encountered during normal operating
conditions, usually to induce failures.
Stress-strength interference
A method by which the probability of failure of an item is calculated by
superimposing the distribution of the item’s strength over the distribution
of the stress it will encounter during normal usage.
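When stress and strength are both assumed to be normally distributed and independent, the overlap of the two distributions has a closed form: failure occurs when strength minus stress is negative, and that difference is itself normally distributed. The sketch below relies on those assumptions; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def interference_failure_probability(stress_mean: float, stress_sd: float,
                                     strength_mean: float, strength_sd: float) -> float:
    """P(stress > strength) for independent normal stress and strength."""
    margin_mean = strength_mean - stress_mean
    margin_sd = sqrt(stress_sd**2 + strength_sd**2)
    # Probability that the safety margin (strength - stress) falls below zero.
    return NormalDist(margin_mean, margin_sd).cdf(0.0)


print(interference_failure_probability(70.0, 6.0, 100.0, 8.0))
```

For other distribution pairs the same probability is generally obtained by numerical integration or simulation.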
Suspended data
see Right censored data
System FMEA
An
FMEA performed with the objective of improving the design
of a system.
System reliability
The reliability of an entire system, as
opposed to the reliability of its components. The system reliability is
defined by the reliability of the components as well as the way the components
are arranged reliability-wise.
T
Temperature-humidity model
An accelerated
life testing model used when the two accelerating factors are temperature
and humidity.
Temperature-nonthermal model
An accelerated life testing model used when the two accelerating factors
are temperature and another non-thermal stress factor.
U
Uptime
The amount of time during which a repairable unit is
operating per design.
V
W
Warranty analysis
The analysis of warranty
and return data for the purpose of determining the reliability characteristics
of a product.
Warranty time
The time at which a specified reliability value will be reached (e.g., a goal of
90% reliability with a reliable life of 4 years means that if 100 identical
units are fielded, then 90 of them will still be operating at the end of 4
years).
Weibull distribution
A statistical distribution
frequently used in life data analysis. Developed by Swedish mathematician
Waloddi Weibull, this distribution is widely used due to its versatility
and the fact that the Weibull pdf can assume different
shapes based on the parameter values.
X
Y
Z