Reliability HotWire

Issue 32, October 2003

Reliability Basics

As awareness of product reliability increases, so does the responsibility of engineering organizations to ensure that reliability requirements are met. As a result, engineers and managers who have had little experience with life data analysis or applied statistics may find themselves responsible for calculating and reporting on a product's reliability. This article offers background on elementary concepts of reliability and life data analysis that should prove useful to the novice analyst. We will begin our discussion with an overview of how to characterize your product's reliability.

The most general purpose of life data analysis is to characterize the life behavior of a product. We will assume that the product in question is non-repairable (i.e. it is used from its initial turn-on and is run until it fails, at which time it is replaced).

In simpler terms, the purpose of reliability analysis is to indicate the probability of success for a specified time. This probability is called the reliability and is always associated with a given time. That is, the probability of success is a function of time, so any stated reliability value is meaningful only when paired with the time at which it applies. For example, a specification may call for a 90% reliability at 100 hours of operation. This means that the product has a 90% probability of running for 100 hours without failure. It can also be interpreted to mean that 90% of a population of such products will run for 100 hours without failure, while the other 10% will have failed before 100 hours.

Other reliability/time combinations will hold true for the same product. For example, the products in the previous example may have a reliability of 75% at 200 hours. The relationship between reliability and operation time for a product can generally be characterized by a continuous reliability function or curve, which represents reliability as a function of time. This function is usually denoted as R(t), with R representing the dependent variable reliability and t representing the independent variable time. A graphical representation of such a function is shown in the following figure:

This curve represents the probability that the product survives to a given time over its lifetime (equivalently, 1 - R(t) is the probability of failure by time t) and is one of the fundamental measures in life data analysis.
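As an illustration of how a continuous R(t) ties the reliability/time pairs together, the sketch below passes a curve through the two example points used above (90% at 100 hours and 75% at 200 hours). It assumes a two-parameter Weibull form, R(t) = exp(-(t/eta)^beta), which is a common choice in life data analysis but is an assumption made here, not something the article specifies. The function names are hypothetical, and two points only determine a curve; they are not a substitute for analyzing real failure data.

```python
import math

def weibull_reliability(t, beta, eta):
    """Reliability at time t for an ASSUMED two-parameter Weibull model."""
    return math.exp(-((t / eta) ** beta))

def fit_weibull_two_points(t1, r1, t2, r2):
    """Solve for (beta, eta) so that R(t1) = r1 and R(t2) = r2.

    Uses the linearized form ln(-ln R) = beta * (ln t - ln eta).
    """
    y1 = math.log(-math.log(r1))
    y2 = math.log(-math.log(r2))
    beta = (y2 - y1) / (math.log(t2) - math.log(t1))
    eta = t1 / (-math.log(r1)) ** (1.0 / beta)
    return beta, eta

# The two reliability/time pairs quoted in the text.
beta, eta = fit_weibull_two_points(100.0, 0.90, 200.0, 0.75)
print(f"beta = {beta:.3f}, eta = {eta:.1f} hours")

# Evaluate the resulting curve at a few operating times.
for t in (50, 100, 200, 400):
    print(f"R({t} h) = {weibull_reliability(t, beta, eta):.3f}")
```

By construction the curve reproduces 0.90 at 100 hours and 0.75 at 200 hours; values at other times are only as trustworthy as the assumed model.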

One other reliability metric that merits brief discussion is the mean life, or MTBF/MTTF. It is widely used as a reliability metric because of its simplicity. It is very easy, however, to become overly reliant on this metric, which is often thought to be synonymous with a reliability of 50%. This is not always the case, and using the MTBF in these circumstances may result in misleading characterizations of a product's reliability. For a detailed discussion of the unsuitability of the MTBF as the sole reliability metric, see the article "The Limitations of Using the MTTF as a Reliability Specification" in this issue of Reliability HotWire.

The characterization of a product's life behavior is the result of the analysis of life test data or field failure data. These data take the form of the amount of time it took for a number of units to fail. This concept sometimes does not sit well with those involved in the product development process who, quite understandably, feel uncomfortable associating their carefully designed products with failure. However, the fact remains that all products will eventually fail if operated for a long enough period of time. In order to characterize when failure is likely to happen, failure data are required.
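Returning to the point about mean life: the sketch below computes the reliability at the MTTF for a few shapes of life distribution, to show that it is not generally 50%. It again assumes a two-parameter Weibull model (an assumption for illustration only), for which the mean life is eta * Gamma(1 + 1/beta). The function names and the scale value of 1000 hours are hypothetical.

```python
import math

def weibull_mean_life(beta, eta):
    """Mean life (MTTF) of an ASSUMED two-parameter Weibull distribution."""
    return eta * math.gamma(1.0 + 1.0 / beta)

def weibull_reliability(t, beta, eta):
    """Reliability at time t for the same assumed Weibull model."""
    return math.exp(-((t / eta) ** beta))

def reliability_at_mean_life(beta, eta=1000.0):
    """Probability that a unit survives to its own mean life."""
    return weibull_reliability(weibull_mean_life(beta, eta), beta, eta)

# beta = 1 is the exponential (constant failure rate) case.
for beta in (0.5, 1.0, 2.0, 3.5):
    r = reliability_at_mean_life(beta)
    print(f"beta = {beta}: R(MTTF) = {r:.3f}")
```

In the exponential case (beta = 1) the reliability at the mean life is exp(-1), about 36.8%, so roughly 63% of units are expected to fail before reaching the MTTF. Under this model the result depends only on the shape parameter, not on the scale.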

Failure data may be obtained from a reliability or life test conducted in a controlled environment, the purpose of which is to operate units to failure in order to obtain data for reliability analysis. Ideally, all of the units put on test should be operated until they fail, resulting in a data set consisting of complete data. Sometimes this is not possible due to time and budgetary constraints, and there will be accumulated test time for units that did not fail. These are known as suspended data and, while not as informative as complete failure data, they should not be discarded, because the information they contain (i.e., the amount of time the units have run without failing) is also important in the assessment of a product's reliability.

Further, one must take care that the conditions under which the data set is obtained are very close to those the product will see during normal operation. Otherwise, the data obtained from the test may lead to inaccurate reliability results, which may in turn lead to poor business decisions.

The problem of operating conditions is not a concern when analyzing field failure data, because the units under analysis were operated under actual use conditions. One drawback of field failure data is that they may consist primarily of suspended data. Another drawback is that they may be tainted or incomplete. For example, field data obtained for reliability analysis have often been collected originally for another purpose, such as financial warranty tracking. In some cases, the data may not contain all of the information required to perform a good reliability analysis. There may also be large portions of information missing, that is, large segments of the field population that are unaccounted for. Have they failed? How long have they been running? Are they still in operation? The answers to these questions are very important in the analysis of field data, and if this information cannot be provided for a large segment of the product's population, a field data analysis may return grossly inaccurate results. It is generally a good idea to have a reliability professional involved in the development of field data collection systems in order to avoid some of these pitfalls.
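To make the earlier point about suspended data concrete, the following sketch compares a mean life estimate with and without the run time of unfailed units. It assumes an exponential life distribution, chosen here only because its maximum likelihood estimate has a simple closed form (total accumulated time divided by the number of failures). The data values and the function name are hypothetical, not taken from any real test.

```python
def exponential_mttf_estimate(failure_times, suspension_times=()):
    """Maximum likelihood MTTF under an ASSUMED exponential model.

    Total time on test (failed and unfailed units) divided by the
    number of observed failures.
    """
    failures = len(failure_times)
    if failures == 0:
        raise ValueError("at least one observed failure is required")
    total_time = sum(failure_times) + sum(suspension_times)
    return total_time / failures

# Hypothetical life test: 10 units, test stopped at 500 hours.
failure_times = [120.0, 340.0, 410.0]   # 3 units failed (complete data)
suspension_times = [500.0] * 7          # 7 units still running (suspended data)

with_suspensions = exponential_mttf_estimate(failure_times, suspension_times)
failures_only = exponential_mttf_estimate(failure_times)

print(f"MTTF using suspensions:    {with_suspensions:.1f} hours")
print(f"MTTF discarding them:      {failures_only:.1f} hours")
```

Discarding the seven suspended units ignores 3,500 hours of failure-free operation, so the failures-only estimate is much lower than the estimate that uses all of the accumulated time.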