 Reliability HotWire

Issue 40, June 2004

Reliability Basics

Fielded Systems in Reliability Growth (Part I)

Analyzing data gathered from the field (i.e., fielded systems) returns some of the most important information since this information is collected from units that have been used by the customer. These systems can be categorized into two basic types: one-time (non-repairable) systems or reusable (repairable) systems. In the latter case, under continuous operation, the system is repaired but not replaced after each failure. For example, if the system is a vehicle and the water pump fails, then the water pump is replaced and the vehicle is repaired.

In Part I of this article, we present repairable system data analysis, in which the reliability of an individual system can be tracked and quantified based on data from multiple systems in the field. In next month's issue of HotWire, Part II of this article will present fleet analysis, in which data from multiple systems in the field are collected and analyzed together. Reliability metrics for the fleet can then be quantified and the effect of reliability improvements can be investigated. When conducting fleet analysis, the results returned are for the entire fleet and not for an individual system.

Repairable Systems

Suppose a system was put into operation at age zero and operated for a period of time, T. The number of failures, N(T), experienced by the system during this time was random and the successive times of these failures, $0 < X_1 < X_2 < \dots < X_{N(T)}$, were also random. If the times between successive failures during the operation of the system, $X_i - X_{i-1}$ ($i = 1, 2, \dots$, with $X_0 = 0$), are independent, identically distributed exponential random variables with failure rate λ, then {N(t), t > 0} is a homogeneous Poisson process with intensity λ.

If Δt is infinitesimally small, then for a homogeneous Poisson process, λΔt is approximately the probability of an event occurring in any interval of length Δt, regardless of the time t at the beginning of the interval. In terms of a repairable system, this implies that the system is neither improving nor wearing out with age, but rather is maintaining a constant intensity of failure.
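To make the constant-intensity assumption concrete, the following minimal sketch simulates a homogeneous Poisson process by accumulating independent exponential times between failures. The function name `simulate_hpp`, the seed and the parameter values are illustrative assumptions and are not part of the original article.

```python
import random

def simulate_hpp(lam, T, seed=0):
    """Return the failure ages in (0, T] of an HPP with constant intensity lam."""
    rng = random.Random(seed)
    ages = []
    t = rng.expovariate(lam)  # time to first failure, measured from X0 = 0
    while t <= T:
        ages.append(t)
        t += rng.expovariate(lam)  # i.i.d. exponential time between failures
    return ages

# Hypothetical values: intensity of 0.01 failures per hour over 100,000 hours.
failures = simulate_hpp(lam=0.01, T=100000.0)
print(len(failures), "failures; expected about", 0.01 * 100000.0)
```

Because the intensity does not depend on age, the number of failures in any window of a given length has the same distribution, whether the window is early or late in the life of the system.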

The non-homogeneous Poisson process with intensity function u(t) is a generalization of the homogeneous Poisson process that allows for a change or trend in the intensity of system failure. Analogous to λΔt in the homogeneous Poisson process, u(t)Δt is approximately the probability that a failure will occur in the interval (t, t + Δt).

Unlike the constant intensity of the homogeneous Poisson process, the intensity u(t) may depend on the age, t, of the system. The intensity u(t) would be decreasing during debugging, constant over the system's useful life and increasing during the wearout phase of the system. This suggests that a possible useful extension of the homogeneous Poisson process with exponential times between failures is the non-homogeneous Poisson process with Weibull time to first failure. The intensity function for this non-homogeneous Poisson process is:

$$u(t) = \lambda \beta t^{\beta - 1} \qquad (1)$$

where:

• λ > 0
• β > 0
• t is the age of the system
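As a minimal sketch, assuming the power law form of the intensity, u(t) = λβt^(β-1), the function below evaluates the intensity at a given age. The name `power_law_intensity` and the parameter values are hypothetical and chosen only to show how β controls the trend.

```python
def power_law_intensity(t, lam, beta):
    """Failure intensity u(t) = lam * beta * t**(beta - 1) at system age t > 0."""
    if t <= 0 or lam <= 0 or beta <= 0:
        raise ValueError("t, lam and beta must all be positive")
    return lam * beta * t ** (beta - 1.0)

# Hypothetical parameter values, chosen only to show the three regimes.
for beta in (0.5, 1.0, 2.0):
    early = power_law_intensity(10.0, 0.1, beta)
    late = power_law_intensity(1000.0, 0.1, beta)
    print(f"beta={beta}: u(10)={early:.4f}, u(1000)={late:.4f}")
```

With β below 1 the intensity falls with age, with β equal to 1 it stays at λ, and with β above 1 it rises with age.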

This particular mathematical form of the intensity, u(t), is the same mathematical form as the failure rate for a Weibull distribution. However, it is very important to keep in mind that we are not dealing with the Weibull distribution and that the Weibull distribution terminology, interpretation of failure rate, estimation and other statistical procedures do not apply.

The non-homogeneous Poisson process with intensity u(t) given by Eqn. (1) has the power law mean value function:

$$E[N(t)] = \lambda t^{\beta} \qquad (2)$$

which is the expected number of failures for a system during its age (0, t). Because of the functional forms of the intensity function and the mean value function, this particular non-homogeneous Poisson process is often referred to as a "Weibull Poisson process" or the "power law Poisson process."
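The mean value function can be sketched in a few lines, assuming the power law form E[N(t)] = λt^β. The helper names and the numbers used here are illustrative assumptions. The second helper uses the fact that the expected number of failures between two ages is the difference of the mean value function at those ages.

```python
def expected_failures(t, lam, beta):
    """Expected number of failures over the age interval (0, t): lam * t**beta."""
    if t < 0 or lam <= 0 or beta <= 0:
        raise ValueError("t must be non-negative; lam and beta must be positive")
    return lam * t ** beta

def expected_failures_between(a, b, lam, beta):
    """Expected number of failures between ages a and b, with a <= b."""
    return expected_failures(b, lam, beta) - expected_failures(a, lam, beta)

# Hypothetical parameters: lam = 2.0, beta = 0.5.
print(expected_failures(4.0, 2.0, 0.5))                 # over (0, 4)
print(expected_failures_between(4.0, 16.0, 2.0, 0.5))   # over (4, 16)
```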

Note that for β = 1, we have the homogeneous Poisson process. For β > 1, u(t) is strictly increasing and the intervals between successive failures, $X_i - X_{i-1}$, are stochastically decreasing, which would be characteristic of a wearout situation. For β < 1, u(t) is strictly decreasing and the intervals between successive failures, $X_i - X_{i-1}$, are stochastically increasing, which would be characteristic of a debugging situation, perhaps caused by quality and manufacturing problems. This, however, is not reliability growth since the improvement pertains only to the reliability of an individual system and not to the reliability of future systems.

In addition to the intensity function, u(t), given by Eqn. (1) and the mean value function given by Eqn. (2), other relationships based on the power law are often of practical interest. For example, the probability that a system of age t will reach age t + d without failure is given by:

$$R(d \mid t) = e^{-\left[\lambda (t + d)^{\beta} - \lambda t^{\beta}\right]}$$

This is the mission reliability for a system of age t and mission length d.
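The mission reliability follows from the Poisson property that the probability of zero failures in an interval is the exponential of minus the expected number of failures in that interval. The sketch below assumes the power law mean value function λt^β. The function name and the parameter values are hypothetical.

```python
import math

def mission_reliability(t, d, lam, beta):
    """Probability that a system of age t survives a mission of length d without failure."""
    # Expected number of failures between ages t and t + d under the power law.
    expected = lam * (t + d) ** beta - lam * t ** beta
    return math.exp(-expected)

# Hypothetical parameters: the same 10-hour mission flown at two different ages.
for age in (0.0, 500.0):
    r = mission_reliability(age, 10.0, lam=0.01, beta=1.5)
    print(f"age={age}: R={r:.4f}")
```

For β = 1 the result reduces to exp(-λd) and no longer depends on the age t, which matches the homogeneous Poisson process.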

Comparing to the Weibull Distribution

As mentioned previously, the power law model is a non-homogeneous Poisson process (NHPP) with Weibull time to first failure. This leads to some confusion with the Weibull distribution. It is important to understand the difference between the two models. The power law models the rate of failures over system age. This means that in the case of repairable system data, the times between successive events (or failures) are neither independent nor identically distributed.

For example, consider a repairable system composed of three components. When one of these components fails, the system fails and the failed component is replaced with a new one. In other words, the three components are in a reliability-wise series configuration, as shown in Figure 1.

Figure 1: Three components connected reliability-wise in series

Each of the three components has a failure distribution. This distribution can be Weibull, lognormal, etc. Therefore, every component follows its own process and every failure of a particular component is independent from its previous failure (since the component is replaced). So the failures of a particular component are independent and identically distributed. On the other hand, the system failures are dependent on the failures of the components (which all have different distributions) as well as on the ages of the components.

In summary, the difference between the Weibull distribution and the power law model is the fact that the events/failures are independent and identically distributed in a Weibull analysis, whereas they are dependent in the power law. In addition, under the NHPP power law model, it is also assumed that the system after the repair is "as bad as old." This is also known as "minimal repair" where the system reliability after the repair is the same as the system reliability before the failure.
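One way to see the effect of the minimal repair assumption is to simulate failure ages from the power law NHPP. The sketch below uses a time transformation, which is an assumption of this example rather than a method taken from the article: if Γ1 < Γ2 < ... are the event times of a unit-rate homogeneous Poisson process, then the ages (Γi/λ)^(1/β) follow the power law process with mean value function λt^β. The function name, seed and parameter values are hypothetical.

```python
import random

def simulate_power_law_nhpp(lam, beta, T, seed=0):
    """Return failure ages in (0, T] for a power law NHPP with mean value lam * t**beta."""
    rng = random.Random(seed)
    ages = []
    gamma = rng.expovariate(1.0)  # event time of a unit-rate HPP
    while True:
        age = (gamma / lam) ** (1.0 / beta)  # invert the mean value function
        if age > T:
            break
        ages.append(age)
        gamma += rng.expovariate(1.0)
    return ages

# Hypothetical parameters with beta < 1: gaps between failures tend to grow with age.
ages = simulate_power_law_nhpp(lam=1.0, beta=0.5, T=1_000_000.0)
gaps = [b - a for a, b in zip([0.0] + ages[:-1], ages)]
half = len(gaps) // 2
print("mean gap, first half:", sum(gaps[:half]) / half)
print("mean gap, second half:", sum(gaps[half:]) / (len(gaps) - half))
```

The times between failures produced this way depend on the age at which the previous failure occurred, so they are not independent and identically distributed draws from a single Weibull distribution.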

Parameter Estimation

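Suppose that the number of systems under study is K and the qth system is observed continuously from time $S_q$ to time $T_q$, q = 1,...,K. During the period $[S_q, T_q]$, let $N_q$ be the number of failures experienced by the qth system and let $X_{i,q}$ be the age of this system at the ith occurrence of failure, $i = 1, \dots, N_q$, q = 1,...,K. The times $S_q$ and $T_q$ may possibly be observed failure times for the qth system. If $X_{N_q,q} = T_q$, then the data on the qth system is said to be failure truncated and $T_q$ is a random variable with $N_q$ fixed. If $X_{N_q,q} < T_q$, then the data on the qth system is said to be time truncated. The maximum likelihood (ML) estimates of λ and β are values that satisfy the following equations:

$$\hat{\lambda} = \frac{\sum_{q=1}^{K} N_q}{\sum_{q=1}^{K} \left( T_q^{\hat{\beta}} - S_q^{\hat{\beta}} \right)} \qquad (3)$$

$$\hat{\beta} = \frac{\sum_{q=1}^{K} N_q}{\hat{\lambda} \sum_{q=1}^{K} \left[ T_q^{\hat{\beta}} \ln(T_q) - S_q^{\hat{\beta}} \ln(S_q) \right] - \sum_{q=1}^{K} \sum_{i=1}^{N_q} \ln(X_{i,q})} \qquad (4)$$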

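where 0 ln(0) is defined to be zero. In general, these equations cannot be solved explicitly for $\hat{\lambda}$ and $\hat{\beta}$, but must be solved by iterative procedures. Once we have the estimates $\hat{\lambda}$ and $\hat{\beta}$, the ML estimate of the intensity function is given by:

$$\hat{u}(t) = \hat{\lambda} \hat{\beta} t^{\hat{\beta} - 1}$$

When $S_q = 0$ and the data set is time truncated at $T_q = T$ (q = 1, 2,...,K), then the ML estimates $\hat{\lambda}$ and $\hat{\beta}$ are in closed form:

$$\hat{\lambda} = \frac{\sum_{q=1}^{K} N_q}{K T^{\hat{\beta}}} \qquad (5)$$

$$\hat{\beta} = \frac{\sum_{q=1}^{K} N_q}{\sum_{q=1}^{K} \sum_{i=1}^{N_q} \ln\left( \frac{T}{X_{i,q}} \right)} \qquad (6)$$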
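The closed form case can be sketched as follows, assuming every system starts at age zero and is time truncated at the same age T, so that β is estimated as the total number of failures divided by the sum of ln(T / X) over all failures, and λ as the total number of failures divided by K·T^β. The function name and the failure ages below are hypothetical illustration values, not the data from this article.

```python
import math

def power_law_mle_time_truncated(failure_ages, T):
    """Closed form ML estimates for K systems, each observed over (0, T].

    failure_ages: one list of failure ages per system.
    Returns (lam_hat, beta_hat).
    """
    K = len(failure_ages)
    n_total = sum(len(ages) for ages in failure_ages)
    log_sum = sum(math.log(T / x) for ages in failure_ages for x in ages)
    if n_total == 0 or log_sum <= 0:
        raise ValueError("need at least one failure strictly before T")
    beta_hat = n_total / log_sum              # Eqn. (6)
    lam_hat = n_total / (K * T ** beta_hat)   # Eqn. (5)
    return lam_hat, beta_hat

# Hypothetical failure ages for two systems observed to T = 100.
lam_hat, beta_hat = power_law_mle_time_truncated([[10.0, 50.0], [20.0, 80.0]], T=100.0)
print(f"lam_hat={lam_hat:.4f}, beta_hat={beta_hat:.4f}")
```

Note that β is estimated first, since Eqn. (6) does not involve λ, and the result is then substituted into Eqn. (5).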

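Also, when K = 1, $S_1 = 0$ and the data set is failure truncated, that is, $X_{N_1,1} = T_1$, then $\hat{\lambda}$ and $\hat{\beta}$ are in closed form. These estimates are:

$$\hat{\lambda} = \frac{N_1}{T_1^{\hat{\beta}}}$$

$$\hat{\beta} = \frac{N_1}{\sum_{i=1}^{N_1 - 1} \ln\left( \frac{T_1}{X_{i,1}} \right)}$$

Example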

For the data in Table 1, the starting time for each system is equal to zero and the ending time for each system is 2000. Calculate the ML estimates $\hat{\lambda}$ and $\hat{\beta}$.

Table 1: Repairable system failure data

For this data set, the general Eqns. (3) and (4) reduce to the closed form Eqns. (5) and (6) for calculating the ML estimates $\hat{\lambda}$ and $\hat{\beta}$. The system failure intensity function is then estimated by substituting these estimates into Eqn. (1):

$$\hat{u}(t) = \hat{\lambda} \hat{\beta} t^{\hat{\beta} - 1}$$

Figure 2 is a plot of $\hat{u}(t)$ over the period (0, 2000). Clearly, the estimated failure intensity function is most representative over the range of the data and any extrapolation should be viewed with the usual caution.

Figure 2: Instantaneous Failure Intensity vs. Time plot

If the starting times of the systems are not all equal to zero, then the maximum likelihood estimates $\hat{\lambda}$ and $\hat{\beta}$ must be solved iteratively using Eqns. (3) and (4).

In next month's issue of HotWire, Part II of this article will present fleet analysis.