The following glossary contains brief definitions of terms frequently used in reliability engineering and life data analysis. The purpose of these entries is to provide a quick explanation of the terms in question, not to provide extensive explanations or mathematical derivations. For those desiring such detailed descriptions, links have been provided when possible for more extensive coverage elsewhere in ReliaSoft's reliability engineering knowledge base.
For ease of reference, the contents of this Reliability Glossary have also been subdivided into topic-specific categories. View the Subject Listing.
Accelerated life testing
A testing strategy whereby the engineer extrapolates a product’s failure behavior at normal conditions from life data obtained at accelerated stress levels. Since products fail more quickly at higher stress levels, this sort of strategy allows the engineer to obtain reliability information about a product (e.g., mean life, probability of failure at a specific time, etc.) in a shorter time.
Acceleration factor
The ratio of the product’s life at the use stress level to its life at an accelerated stress level. For example, if the product has a life of 100 hours at the use stress level, and it is being tested at an accelerated stress level that reduces its life to 50 hours, then the acceleration factor is 2.
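The ratio described above can be written as a one-line calculation. The following is a minimal sketch; the function name `acceleration_factor` is an illustrative choice and is not taken from any ReliaSoft software.

```python
def acceleration_factor(life_at_use: float, life_at_accelerated: float) -> float:
    """Return life at the use stress level divided by life at the accelerated level."""
    if life_at_use <= 0 or life_at_accelerated <= 0:
        raise ValueError("Life values must be positive.")
    return life_at_use / life_at_accelerated


# The example from the definition: 100 hours at use stress, 50 hours when accelerated.
print(acceleration_factor(100.0, 50.0))  # 2.0
```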
"AMPM" stands for "AMSAA maturity prediction model." This is an enhanced reliability growth model that allows the user to predict failure rates in future stages of development. This model allows the user to assess the effectiveness of proposed and implemented fixes in order to determine the future failure rate.
"AMSAA" stands for "Army Material Systems Analysis Activity." This is a reliability growth model that uses a relationship between cumulative test time and cumulative failures to develop a reliability growth model.
ANOVA stands for analysis of variance, a method by which the source of variability is identified. This method is widely used in industry to help identify the source of potential problems in the production process, and identify whether variation in measured output values is due to variability between various manufacturing processes, or within them. By varying the factors in a predetermined pattern and analyzing the output, one can use statistical techniques to make an accurate assessment as to the cause of variation in a manufacturing process.
A plan used to keep track of team members, ground rules and assumptions, estimated completion dates, scheduled work sessions and other details to help an analyst plan and manage analysis projects.
A model used in accelerated life testing to establish a relationship between absolute temperature and reliability. It was originally developed by Swedish chemist Svante Arrhenius to describe the relationship between temperature and the rates of chemical reactions.
The probability that an item will be able to function (i.e., it will not be failed or undergoing repair) when called upon to do so. This measure takes into account an item’s reliability (how quickly it fails) and its maintainability (how quickly it can be repaired).
The time at which X% of the units in a population will have failed. For example, if an item has a B10 life of 100 hours, then 10% of the population will have failed by 100 hours of operation.
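As an illustration, the BX life can be computed directly once a life distribution has been fitted. The sketch below assumes a two-parameter Weibull distribution with shape `beta` and scale `eta`; that choice of distribution and the function name are assumptions made for the example, since the definition itself applies to any distribution.

```python
import math


def bx_life_weibull(x_percent: float, beta: float, eta: float) -> float:
    """Time by which x_percent of units have failed, for a 2-parameter Weibull.

    Solves F(t) = 1 - exp(-(t / eta) ** beta) = x_percent / 100 for t.
    """
    if not 0.0 < x_percent < 100.0:
        raise ValueError("x_percent must be strictly between 0 and 100.")
    unreliability = x_percent / 100.0
    return eta * (-math.log(1.0 - unreliability)) ** (1.0 / beta)


# B10 life for an assumed Weibull with beta = 1.5 and eta = 1000 hours.
b10 = bx_life_weibull(10.0, beta=1.5, eta=1000.0)
print(f"B10 life: {b10:.1f} hours")
```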
Block diagram
A diagram that represents how the components, represented by "blocks," are arranged and related reliability-wise in a larger system. This is often but not necessarily the same as the way that the components are physically related. This is also called a reliability block diagram or RBD.
Data in which not all of the data points represent exact failure times (e.g., there may be operation times for units that have not failed). Censoring schemes include right-censoring, left-censoring and interval censoring.
Competing failure modes
A model whereby items that fail due to more than one failure mode can be represented as a series reliability system with each block representing a failure mode. The failure modes are considered to be "competing" amongst each other to see which one will cause the item to fail.
A block diagram that cannot be reduced to series and/or parallel systems.
The probability that a product will successfully operate for a specific time interval given that it has operated successfully up to a specified time (e.g., the probability that an item that has survived for 100 hours will survive for an additional 100 hours).
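In terms of the reliability function R, this conditional probability is R(T + t) / R(T), where T is the time already survived and t is the additional interval. The sketch below evaluates it for an assumed two-parameter Weibull distribution; the parameter values and function names are illustrative only.

```python
import math
from typing import Callable


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Reliability function of a 2-parameter Weibull distribution."""
    return math.exp(-((t / eta) ** beta))


def conditional_reliability(
    t: float, survived: float, reliability: Callable[[float], float]
) -> float:
    """Probability of surviving an additional t, given survival up to `survived`."""
    return reliability(survived + t) / reliability(survived)


# Probability that a unit that has survived 100 hours survives 100 more,
# under an assumed Weibull with beta = 1.5 and eta = 500 hours.
r = conditional_reliability(
    100.0, 100.0, lambda x: weibull_reliability(x, beta=1.5, eta=500.0)
)
print(f"Conditional reliability: {r:.4f}")
```

For a wear-out distribution such as this one (beta greater than 1), the conditional reliability for the next 100 hours is lower than the reliability of a new unit over its first 100 hours.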
A measure of the precision of a statistical estimate. This is represented by a range of values that the particular estimate should fall within a specified percentage of the time. For example, if we perform ten different reliability tests for our product and analyze the results, we will obtain slightly different parameters for the distribution each time, and thus slightly different reliability results. However, by employing confidence bounds, we obtain a range within which these reliability values are likely to occur a certain percentage of the time. This helps us gauge the utility of the data and the accuracy of the resulting estimates.
A graphical representation of the possible solutions to the likelihood ratio equation. This is employed to determine confidence bounds as well as make comparisons between two different data sets.
A plan used to keep track of the characteristics that affect a product during the manufacturing process, in order to ensure that the desired product specifications are met. It is often integrated with the PFD worksheet and/or process FMEA.
Cumulative damage model
An accelerated life testing model used to analyze data with multiple stress types and/or situations where the stress varies with time.
A method for prioritizing issues that takes into account the probability of failure for the item and the portion of the failure likelihood that can be attributed to a particular failure mode. The resulting prioritization is used to determine the sequence and time-frame for the corrective actions that will be performed.
A method for determining the reliability of complex systems. The decomposition method is an application of the law of total probability, which involves choosing a "key" component and then calculating the reliability of the system twice: once as if the key component failed and once as if the key component succeeded. These two probabilities are then combined to obtain the reliability of the system, since at any given time the key component will be failed or operating.
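The decomposition steps can be sketched for a small complex system. The example below assumes a hypothetical five-component "bridge" layout (A and B on the input side, C and D on the output side, E connecting the two middle nodes), independent components, and E as the key component; none of these details come from the glossary itself.

```python
def series(*reliabilities: float) -> float:
    """Reliability of independent components in series."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel(*reliabilities: float) -> float:
    """Reliability of independent components in parallel."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1.0 - r
    return 1.0 - all_fail


def bridge_reliability(ra: float, rb: float, rc: float, rd: float, re: float) -> float:
    """Law of total probability with E as the key component."""
    # If E works, the middle nodes are joined: (A or B) followed by (C or D).
    r_given_e_works = series(parallel(ra, rb), parallel(rc, rd))
    # If E has failed, only the two straight paths A-C and B-D remain.
    r_given_e_failed = parallel(series(ra, rc), series(rb, rd))
    return re * r_given_e_works + (1.0 - re) * r_given_e_failed


print(round(bridge_reliability(0.9, 0.9, 0.9, 0.9, 0.9), 5))  # 0.97848
```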
A technique that uses the performance (degradation) measurements of a product over time to predict the point at which each unit in the sample is expected to fail. This analysis is useful for tests performed on products with very high reliability, where it is not possible to test the units to failure under normal conditions.
An FMEA performed with the objective of improving the design of a subsystem or component.
Design reviews based on failure mode (DRBFM)
A methodology used to evaluate proposed changes to an existing design. DRBFM uses a worksheet similar to the FMEA worksheet, but it typically focuses on the failure modes that might be introduced by a specific change to a product or process.
Qualitative ratings used to estimate the likelihood of prior detection for each cause of failure (i.e., the likelihood of detecting the problem before it reaches the end user or customer).
The amount of time during which a repairable unit is not operating. This can be due to being in a failed state, administrative delay, waiting for replacement parts to be shipped or undergoing active repair.
Event space method
A method for determining the reliability of complex systems. With the event space method, all mutually exclusive events are determined. The reliability of the system is simply the probability of the union of all mutually exclusive events that yield a system success (the unreliability is the probability of the union of all mutually exclusive events that yield a system failure).
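A brute-force version of the event space method enumerates every combination of working and failed components. The sketch below assumes independent components and a hypothetical three-component system (A in series with the parallel pair B and C); the function and variable names are illustrative.

```python
from itertools import product
from typing import Callable, Dict


def event_space_reliability(
    reliabilities: Dict[str, float],
    system_works: Callable[[Dict[str, bool]], bool],
) -> float:
    """Sum the probabilities of all mutually exclusive states that yield success."""
    names = list(reliabilities)
    total = 0.0
    for states in product([True, False], repeat=len(names)):
        state = dict(zip(names, states))
        probability = 1.0
        for name in names:
            r = reliabilities[name]
            probability *= r if state[name] else 1.0 - r
        if system_works(state):
            total += probability
    return total


def example_system(state: Dict[str, bool]) -> bool:
    """Hypothetical system: A in series with (B parallel C)."""
    return state["A"] and (state["B"] or state["C"])


r = event_space_reliability({"A": 0.9, "B": 0.8, "C": 0.7}, example_system)
print(round(r, 4))  # 0.846
```

The number of states doubles with each added component, so this direct enumeration is only practical for small systems.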
A lifetime statistical distribution that assumes a constant failure rate for the product being modeled.
An accelerated life testing model based on quantum mechanics that is typically used when temperature or humidity is the accelerated stress.
Failure distribution
A mathematical model that describes the probability of failures occurring over time. Also known as the probability density function (pdf), this function is integrated to obtain the probability that the failure time takes a value in a given time interval. This function is the basis for other important reliability functions, including the reliability function, the failure rate function and the mean life.
Failure effect categorization (FEC)
A process whereby the effects of a system’s functional failures are evaluated and categorized in order to help the analyst prioritize identified issues and choose the appropriate maintenance strategy to address them.
Failure mode and effects analysis (FMEA)
A methodology designed to identify potential failure modes for a product or process, to assess the risk associated with those failure modes, to rank the issues in terms of importance and to identify and carry out corrective actions to address the most serious concerns.
Failure mode criticality
see Criticality analysis
Failure rate
A function that describes the number of failures that can be expected to take place over a given unit of time. The failure rate function has the units of failures per unit time among surviving units (e.g., one failure per month).
A mathematical expression that is used to determine the variability of estimated parameter values based on the variability of the data used to make the parameter estimates. It is used to determine confidence bounds when using maximum likelihood estimation (MLE) techniques.
Functional failure analysis (FFA)
see Failure mode and effects analysis (FMEA)
see Normal distribution
General log-linear model
An accelerated life testing model that can account for multiple non-thermal stresses as acceleration factors. In ALTA PRO, this model allows the user to select a life-stress relationship (Arrhenius, Inverse Power Law or Exponential) for each stress.
Generalized gamma distribution
A distribution that, while not as frequently used for modeling life data as other life distributions, can mimic the attributes of other distributions such as the Weibull or lognormal, depending on the values of its parameters. This ability to behave like more commonly used life distributions is sometimes used to determine which of those distributions should be used to model a particular set of data.
"HALT" stands for "Highly accelerated life testing." It is an accelerated testing method used primarily to reveal probable failure modes for the product.
"HASS" stands for "Highly accelerated stress screening." It is similar to the HALT testing method, except it is applied during the production stage to prevent the shipment of defective items.
see Failure rate
Importance measure
A measure of the relative contribution of a component to the overall system’s reliability. The importance measure of a component is equivalent to the first partial derivative of the system reliability with respect to the component reliability.
Interval censored data
Data that represents a range of time within which the unit is known to have failed (e.g., it might be observed that a unit failed at some point between 50 and 100 hours of operation).
Inverse power law
An accelerated life testing model commonly used when the accelerating factor is a single, non-thermal stress (e.g., vibration, voltage or temperature cycling).
This is an estimator used as an alternative to the median ranks method for calculating the estimates of the unreliability for probability plotting purposes. It is also used to determine reliability estimates for nonparametric data analysis.
A type of interval censored data where the failure is only known to have occurred before a specific time (e.g., it might be observed that a unit failed at some point before 500 hours of operation).
Life data analysis
The statistical analysis of failure and usage data performed to be able to mathematically model the reliability and failure characteristics of a product.
see Failure distribution
A relationship that describes how stress levels affect the reliability of a product. Various mathematical models (e.g., the Arrhenius model) are available to describe a product's life-stress relationship.
A function that represents the joint probability of all the points in a data set. For complete data, the likelihood function consists of the product of the pdf for each data point; for data sets that also include suspended or censored data, the likelihood function is more complex. Maximum likelihood estimation (MLE) techniques maximize this function in order to determine the best parameter estimates.
The ratio of a likelihood function for an unknown parameter vector to the likelihood function calculated at the estimated parameter vector. The relationship of this ratio to the chi-squared distribution can then be used to calculate confidence bounds and confidence regions.
A lifetime statistical distribution that is often used to model products in which physical fatigue is the prominent contributor to the primary failure mode.
The probability that a failed unit will be repaired within a given amount of time. The term is also used to denote the discipline of studying and improving the maintainability of products, primarily by reducing the amount of time required to diagnose and repair failures.
Activities intended to repair or maintain a system (e.g., inspections).
Maximum likelihood estimation (MLE)
A method of parameter estimation involving the maximization of the likelihood equation. The best parameter estimates are obtained by determining the parameter values that maximize the value of the likelihood equation for a particular data set.
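For some distributions the maximization has a closed-form solution. The sketch below assumes an exponential distribution with right censored (suspended) units, for which the likelihood is maximized at the number of failures divided by the total accumulated operating time; the function names and data are illustrative.

```python
import math
from typing import Sequence


def exponential_log_likelihood(
    rate: float, failures: Sequence[float], suspensions: Sequence[float] = ()
) -> float:
    """Log-likelihood for exponential data with right censored suspensions."""
    total_time = sum(failures) + sum(suspensions)
    return len(failures) * math.log(rate) - rate * total_time


def exponential_mle_rate(
    failures: Sequence[float], suspensions: Sequence[float] = ()
) -> float:
    """Failure rate that maximizes the exponential log-likelihood."""
    if not failures:
        raise ValueError("At least one failure is needed for a finite estimate.")
    return len(failures) / (sum(failures) + sum(suspensions))


# Three failures and one unit still running at 400 hours (assumed data).
failures = [100.0, 200.0, 300.0]
suspensions = [400.0]
rate_hat = exponential_mle_rate(failures, suspensions)
print(f"Estimated failure rate: {rate_hat:.4f} per hour")
print(f"Estimated mean life: {1.0 / rate_hat:.1f} hours")
```

For distributions without a closed-form solution, such as the Weibull, the same log-likelihood is typically maximized numerically.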
A reliability measure that represents the expected value of the failure times for a failure distribution, also known as the average or central life value. While this represents a useful representative value of a distribution of failure times, it is often over-used as the sole reliability metric.
Measures used to obtain estimates of the unreliability. Median ranks are the values that the true probability of failure should have at the jth failure out of a sample of N units, at a 50% confidence level, or the best estimate for the unreliability. This estimate is based on a solution of the binomial equation.
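The binomial equation mentioned above can be solved numerically. The sketch below uses simple bisection to find the value Z for which the probability of at least j failures out of N equals 0.5, and compares it with Benard's approximation, (j - 0.3) / (N + 0.4), a widely used shortcut that is not mentioned in the definition itself. All function names are illustrative.

```python
from math import comb


def binomial_tail(z: float, j: int, n: int) -> float:
    """Probability of at least j failures out of n when each fails with probability z."""
    return sum(comb(n, k) * z**k * (1.0 - z) ** (n - k) for k in range(j, n + 1))


def median_rank(j: int, n: int, confidence: float = 0.5, tol: float = 1e-12) -> float:
    """Solve binomial_tail(z, j, n) = confidence for z by bisection."""
    if not 1 <= j <= n:
        raise ValueError("j must satisfy 1 <= j <= n.")
    low, high = 0.0, 1.0
    while high - low > tol:
        mid = (low + high) / 2.0
        # The tail probability increases with z, so move the matching bound.
        if binomial_tail(mid, j, n) < confidence:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def benard_approximation(j: int, n: int) -> float:
    """Benard's closed-form approximation to the median rank."""
    return (j - 0.3) / (n + 0.4)


for j in range(1, 6):
    print(j, round(median_rank(j, 5), 4), round(benard_approximation(j, 5), 4))
```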
Mixed Weibull distribution
A variation of the Weibull distribution used to model data with distinct subpopulations that may represent different failure characteristics over the lifetime of a product. Each subpopulation has separate Weibull parameters calculated and the results are combined in a mixed Weibull distribution to represent all of the subpopulations in one function.
The probability that the item's failure will be due to the failure mode under consideration. In other words, this represents the percentage of all failures for the item that will be due to the given mode.
Monte Carlo simulation
A method of generating values from a known distribution for the purposes of experimentation. This is accomplished by generating uniform random variables and using them in an inverse reliability equation to produce failure times that would conform to the desired input distribution.
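As a sketch of this procedure, the code below draws uniform random numbers, treats each one as a reliability value, and inverts the reliability equation of an assumed two-parameter Weibull distribution to obtain failure times. The distribution, its parameters and the function name are assumptions made for the example.

```python
import math
import random
from typing import List, Optional


def simulate_weibull_failure_times(
    n: int, beta: float, eta: float, seed: Optional[int] = None
) -> List[float]:
    """Generate n failure times from a 2-parameter Weibull by inverse transform."""
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        # rng.random() lies in [0, 1); subtracting from 1 avoids log(0).
        reliability = 1.0 - rng.random()
        # Invert R(t) = exp(-(t / eta) ** beta) to solve for t.
        times.append(eta * (-math.log(reliability)) ** (1.0 / beta))
    return times


samples = simulate_weibull_failure_times(10_000, beta=2.0, eta=100.0, seed=1)
sample_mean = sum(samples) / len(samples)
theoretical_mean = 100.0 * math.gamma(1.0 + 1.0 / 2.0)
print(f"Sample mean: {sample_mean:.2f}, theoretical mean: {theoretical_mean:.2f}")
```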
In the case of repairable systems, "MTBF" stands for "mean time between failures." This average time excludes the time spent waiting for repair, being repaired, being re-qualified and other downing events such as inspections and preventive maintenance; it is intended to measure only the time a system is available and operating. In the case of non-repairable systems, "MTBF" stands for "mean time before failure" and is represented by the mean life value for a failure distribution of non-repairable units.
"MTTF" stands for "mean time to failure" and is represented by the mean life value for a failure distribution of non-repairable units.
"NHPP" stands for "non-homogeneous Poisson process," which is a simple parametric model used to represent events with a non-constant failure recurrence rate. This type of model is often used to model reliability growth and the reliability of repairable units.
A method of analysis that allows the user to characterize failure data without assuming an underlying failure distribution. This avoids the potentially large errors brought about by making incorrect assumptions about the distribution. However, the confidence bounds associated with nonparametric analysis are usually much wider than those calculated via parametric analysis. Additionally, predictions outside the range of the observations are not possible.
Normal distribution
A common lifetime statistical distribution that was developed by mathematician C. F. Gauss. The distribution is a continuous, bell-shaped distribution that is symmetric about its mean and can take on values from negative infinity to positive infinity.
Qualitative ratings used to estimate the likelihood of occurrence for each cause of failure.
A method for determining the reliability of complex systems. With this method, every path from a starting point to an ending point is considered. Since system success involves having at least one path available from one end of the reliability block diagram to the other, as long as at least one path is available, the system has not failed. The reliability of the system is simply the probability of the union of these paths.
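The probability of the union of paths can be evaluated with the inclusion-exclusion principle. The sketch below assumes independent components and a hypothetical five-component bridge system with four success paths; the path list and names are illustrative and not taken from the glossary.

```python
from itertools import combinations
from math import prod
from typing import Dict, List, Set


def path_tracing_reliability(
    reliabilities: Dict[str, float], paths: List[Set[str]]
) -> float:
    """Probability that at least one path has all of its components working."""
    total = 0.0
    for k in range(1, len(paths) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(paths, k):
            # All k paths work only if every component on any of them works.
            components = set().union(*subset)
            total += sign * prod(reliabilities[c] for c in components)
    return total


# Hypothetical bridge: straight paths A-C and B-D, plus two paths through E.
bridge_paths = [{"A", "C"}, {"B", "D"}, {"A", "E", "D"}, {"B", "E", "C"}]
component_reliabilities = {name: 0.9 for name in "ABCDE"}
print(round(path_tracing_reliability(component_reliabilities, bridge_paths), 5))  # 0.97848
```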
see Probability plotting paper
A quantitative description of the possible likelihood of a particular event. Probability is conventionally expressed on a scale from 0 to 1, or 0% to 100%, with an unlikely event having a probability close to 0, and a very common event having a probability close to 1.
Probability density function (pdf)
A mathematical model that describes the probability of events occurring over time. This function is integrated to obtain the probability that the event time takes a value in a given time interval. In life data analysis, the event in question is a failure, and the pdf is the basis for other important reliability functions, including the reliability function, the failure rate function and the mean life.
A type of plot that linearizes a distribution’s cdf, allowing the user to manually plot failure time vs. estimated unreliability. Provided that the plotted points fall on a relatively straight line (thus indicating that the chosen distribution is a good fit), the parameter estimates can be obtained from scales on the plot. This is a crude, time-consuming method of fitting a distribution to failure data, but it was practically the only method available prior to the widespread use of computers.
Probability plotting paper
A specially designed type of graph paper that allows the user to plot failure time vs. unreliability as a linear function. Plotting paper construction varies from distribution to distribution. Probability plotting papers that have been generated by ReliaSoft's software are available on the Web at http://www.weibull.com/GPaper/index.htm.
Process flow diagram (PFD)
A high level chart that helps the analyst visualize the steps that a product goes through in a manufacturing or assembly process.
Process flow diagram (PFD) worksheet
A worksheet that captures details about what happens to an item in each step of its manufacturing or assembly process and records the product and process characteristics that are important to keep under control. The information from this worksheet can be used as an input to the process FMEA (PFMEA) and control plan for the item.
Proportional hazards model
An accelerated life testing model that can account for multiple non-thermal stresses as acceleration factors. This model allows the use of zero as a stress value, which enables the analysis of data with indicator variables (e.g., 0 = on/off and 1 = continuous operation).
A common buzzword referring to the non-quantifiable point-level excellence of a product or process. While sometimes used interchangeably with the term reliability, quality refers to the characteristics of a product at one point in time, while reliability refers to the characteristics of a product over its entire lifetime.
The probability of an item operating for a given amount of time without failure. More generally, reliability is the capability of parts, components, equipment, products and systems to perform their required functions for desired periods of time without failure, in specified environments and with a desired confidence.
see Life data analysis
Reliability block diagram
see Block diagram
Reliability centered maintenance (RCM) analysis
A structured framework for analyzing the functions and potential failure modes for a physical asset (such as an airplane, a manufacturing production line, etc.) in order to develop a scheduled maintenance plan that will provide an acceptable level of operability, with an acceptable level of risk, in an efficient and cost-effective manner.
The analysis of the change in reliability over time, usually applied to products under development. Reliability growth analysis provides the means by which the reliability, mean life or failure rate is tracked over time, allowing the user to predict future reliability values based on the current rate of growth of the reliability measurement of interest.
see Importance measure
Reliability life data analysis
see Life data analysis
Testing units to failure in order to obtain raw failure time data for life data analysis.
see Warranty time
An action that restores a failed part or component to operating condition.
A mathematical model that describes the probability of repairs occurring over time.
A system that can be restored to operating condition after a failure by the repair or replacement of one or more components.
Right censored data
Data that represents the length of time during which a unit has operated without failure (e.g., it might be observed that a unit did not fail during a 100-hour test); also called suspended data.
Risk discovery analysis
A preliminary analysis that often involves answering questions and/or assigning ratings about possible risks. It is used to help the analyst choose which items should receive more detailed consideration via a failure mode and effects analysis (FMEA) or reliability centered maintenance (RCM) analysis.
Risk priority number (RPN) system
A relative rating system that prioritizes issues by assigning numerical values to them in each of three different categories: Severity (S), Occurrence (O) and Detection (D). The three ratings are multiplied together to determine the overall RPN for each issue, where higher RPNs indicate higher risk priority.
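The RPN calculation is a simple product of the three ratings. The sketch below assumes a 1 to 10 rating scale, which is common practice but is not stated in the definition, and uses made-up issues purely for illustration.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Multiply the Severity, Occurrence and Detection ratings (assumed 1-10 scales)."""
    for rating in (severity, occurrence, detection):
        if not 1 <= rating <= 10:
            raise ValueError("Ratings are assumed to lie between 1 and 10.")
    return severity * occurrence * detection


# Hypothetical issues as (description, S, O, D).
issues = [
    ("Seal leak", 8, 3, 5),
    ("Connector corrosion", 5, 6, 2),
    ("Sensor drift", 6, 4, 7),
]

# Rank from highest to lowest risk priority.
ranked = sorted(issues, key=lambda issue: risk_priority_number(*issue[1:]), reverse=True)
for description, s, o, d in ranked:
    print(description, risk_priority_number(s, o, d))
```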
A testing methodology in which test units are tested consecutively instead of simultaneously.
Qualitative ratings used to estimate the severity of each effect of failure.
The stocking of spare units or components based on the anticipated number of failures for a given mission or length of operation.
SPRT stands for sequential probability ratio test. This is a type of accept/reject sequential testing in which accept/reject boundaries are defined by the user and units are sequentially tested until either the accept boundary or the reject boundary has been reached, at which point a decision is made about the suitability of the units.
The branch of mathematics that deals with the collection, organization, analysis and interpretation of data.
A testing strategy whereby units are tested at stresses higher than what would be encountered during normal operating conditions, usually to induce failures.
A method by which the probability of failure of an item is calculated by superimposing the distribution of the item’s strength over the distribution of the stress it will encounter during normal usage.
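When both stress and strength are assumed to be independent and normally distributed, the overlap of the two distributions has a closed form: the item fails when strength minus stress is negative, and that difference is itself normally distributed. The sketch below relies on that assumption; the numerical values and function name are illustrative.

```python
import math
from statistics import NormalDist


def stress_strength_failure_probability(
    stress_mean: float, stress_sd: float, strength_mean: float, strength_sd: float
) -> float:
    """Probability that stress exceeds strength for independent normal variables."""
    # The margin (strength - stress) is normal with the difference of the means
    # and the root-sum-square of the standard deviations.
    margin = NormalDist(
        mu=strength_mean - stress_mean,
        sigma=math.sqrt(stress_sd**2 + strength_sd**2),
    )
    return margin.cdf(0.0)


# Assumed values: stress ~ N(300, 30) and strength ~ N(400, 40), in the same units.
p_fail = stress_strength_failure_probability(300.0, 30.0, 400.0, 40.0)
print(f"Probability of failure: {p_fail:.4f}")
```

For other distribution pairs the overlap generally has to be evaluated by numerical integration or simulation.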
see Right censored data
An FMEA performed with the objective of improving the design of a system.
The reliability of an entire system, as opposed to the reliability of its components. The system reliability is defined by the reliability of the components as well as the way the components are arranged reliability-wise.
An accelerated life testing model used when the two accelerating factors are temperature and humidity.
An accelerated life testing model used when the two accelerating factors are temperature and another non-thermal stress factor.
The amount of time during which a repairable unit is operating per design.
The analysis of warranty and return data for the purpose of determining the reliability characteristics of a product.
Warranty time
The time at which a specified reliability value will be reached (e.g., a goal of 90% reliability with a reliable life of 4 years means that if 100 identical units are fielded, then 90 of them will still be operating at the end of 4 years).
A statistical distribution frequently used in life data analysis. Developed by Swedish mathematician Waloddi Weibull, this distribution is widely used due to its versatility and the fact that the Weibull pdf can assume different shapes based on the parameter values.