Reliability HotWire  
Reliability Basics  
Fielded Systems in Reliability Growth (Part I)

Analyzing data gathered from the field (i.e., from fielded systems) yields some of the most important reliability information, because it is collected from units that have been used by the customer. These systems fall into two basic types: one-time (nonrepairable) systems and reusable (repairable) systems. In the latter case, under continuous operation, the system is repaired but not replaced after each failure. For example, if the system is a vehicle and the water pump fails, then the water pump is replaced and the vehicle is repaired.

In Part I of this article, we present repairable system data analysis, in which the reliability of an individual system can be tracked and quantified based on data from multiple systems in the field. In next month's issue of HotWire, Part II of this article will present fleet analysis, in which data from multiple systems in the field can be collected and analyzed. In addition, reliability metrics for the fleet can be quantified and the effect of reliability improvements can be investigated. When conducting fleet analysis, the results returned are for the entire fleet and not for an individual system.

Repairable Systems

Suppose a system was put into operation at age zero and operated for a period of time, T.
The number of failures, N(T), experienced by the system during this time was random, and the successive times of these failures, 0 < X_{1} < X_{2} < ... < X_{N(T)}, were also random. If the times between successive failures during the operation of the system, X_{i} − X_{i−1} (i = 1, 2, ...; X_{0} = 0), are independent, identically distributed exponential random variables with failure rate λ, then {N(t), t > 0} is a homogeneous Poisson process with intensity λ.

If Δt is infinitesimally small, then for a homogeneous Poisson process, λΔt is approximately the probability of an event occurring in any interval of length Δt, regardless of the time t at the beginning of the interval. In terms of a repairable system, this implies that the system is neither improving nor wearing out with age, but rather is maintaining a constant intensity of failure.

The nonhomogeneous Poisson process with intensity function u(t) is a generalization of the homogeneous Poisson process that allows for a change or trend in the intensity of system failure. Analogous to λΔt in the homogeneous Poisson process, u(t)Δt is approximately the probability that a failure will occur in the interval (t, t + Δt). Unlike the homogeneous Poisson process failure probability, the intensity, u(t), may depend on the age, t, of the system. u(t) would be decreasing during debugging, constant over the system's useful life and increasing during the wearout phase of the system. This suggests that a possible useful extension of the homogeneous Poisson process with exponential times between failures is the nonhomogeneous Poisson process with Weibull time to first failure. The intensity function for this nonhomogeneous Poisson process is:

\[ u(t) = \lambda \beta t^{\beta - 1} \tag{1} \]

where λ > 0 and β > 0 are the model parameters and t > 0 is the age of the system.
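As a minimal sketch of how the intensity behaves, assuming the standard power law form u(t) = λβt^(β−1) with λ > 0 and β > 0, the function can be evaluated in Python for a few values of β. The function name `intensity`, the argument names `lam` and `beta`, and the parameter values are illustrative assumptions, not taken from the article.

```python
def intensity(t, lam, beta):
    """Power law failure intensity, assumed form u(t) = lam * beta * t**(beta - 1)."""
    if t <= 0 or lam <= 0 or beta <= 0:
        raise ValueError("t, lam and beta must all be positive")
    return lam * beta * t ** (beta - 1)


# Hypothetical parameter values chosen only to show the three regimes:
# beta < 1 (decreasing intensity), beta = 1 (constant), beta > 1 (increasing).
ages = [100.0, 500.0, 1000.0]
for beta in (0.5, 1.0, 1.5):
    values = [intensity(t, lam=0.01, beta=beta) for t in ages]
    print(f"beta = {beta}: {values}")
```

With β = 1 the printed values do not change with age, which corresponds to the homogeneous Poisson process described earlier.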
This particular mathematical form of the intensity, u(t), is the same mathematical form as the failure rate for a Weibull distribution. However, it is very important to keep in mind that we are not dealing with the Weibull distribution and that the Weibull distribution terminology, interpretation of failure rate, estimation and other statistical procedures do not apply.

The nonhomogeneous Poisson process with intensity u(t) has the power law mean value function:

\[ E[N(t)] = \lambda t^{\beta} \tag{2} \]
which is the expected number of failures for a system during its age (0, t). Because of the functional forms of the intensity function and the mean value function, this particular nonhomogeneous Poisson process is often referred to as a "Weibull Poisson process" or the "power law Poisson process."

Note that for β = 1, we have the homogeneous Poisson process. For β > 1, u(t) is strictly increasing and the intervals between successive failures, X_{i} − X_{i−1}, are stochastically decreasing, which would be characteristic of a wearout situation. For β < 1, u(t) is strictly decreasing and the intervals between successive failures, X_{i} − X_{i−1}, are stochastically increasing, which would be characteristic of a debugging situation, perhaps caused by quality and manufacturing problems. This, however, is not reliability growth, since the improvement pertains only to the reliability of an individual system and not to the reliability of future systems.

In addition to the intensity function, u(t), given by Eqn. (1) and the mean value function given by Eqn. (2), other relationships based on the power law are often of practical interest. For example, the probability that a system of age t will go to age t + d without failure is given by:

\[ R(t) = e^{-\left[ \lambda (t + d)^{\beta} - \lambda t^{\beta} \right]} \]
This is the mission reliability for a system of age t and mission length d.

Comparing to the Weibull Distribution

As mentioned previously, the power law model is a nonhomogeneous Poisson process (NHPP) with Weibull time to first failure. This leads to some confusion with the Weibull distribution, so it is important to understand the difference between the two models. The power law models the rate of failures over system age. This means that in the case of repairable system data, the times between successive events (or failures) are neither independent nor identically distributed. For example, consider a repairable system composed of three components. When one of these components fails, the system fails and the failed component is replaced with a new one. In other words, the three components are in a reliability-wise series configuration, as shown in Figure 1.
Figure 1: Three components connected reliability-wise in series

Each of the three components has a failure distribution. This distribution can be Weibull, lognormal, etc. Therefore, every component follows its own process and every failure of a particular component is independent of its previous failure (since the component is replaced). So the failures of a particular component are independent and identically distributed. On the other hand, the system failures depend on the failures of the components (which all have different distributions) as well as on the ages of the components.

In summary, the difference between the Weibull distribution and the power law model is that the events/failures are independent and identically distributed in a Weibull analysis, whereas they are dependent in the power law model. In addition, under the NHPP power law model, it is also assumed that the system after the repair is "as bad as old." This is also known as "minimal repair," where the system reliability after the repair is the same as the system reliability just before the failure.

Parameter Estimation

Suppose that the number of systems under study is K and the q^{th} system is observed continuously from time S_{q} to time T_{q}, q = 1,...,K. During the period [S_{q}, T_{q}], let N_{q} be the number of failures experienced by the q^{th} system and let X_{i,q} be the age of this system at the i^{th} occurrence of failure, i = 1,...,N_{q}, q = 1,...,K. The times S_{q} and T_{q} may possibly be observed failure times for the q^{th} system. If X_{N_{q},q} = T_{q}, then the data on the q^{th} system is said to be failure truncated and T_{q} is a random variable with N_{q} fixed. If X_{N_{q},q} < T_{q}, then the data on the q^{th} system is said to be time truncated. The maximum likelihood (ML) estimates of λ and β are the values λ̂ and β̂ that satisfy the following equations:

\[ \hat{\lambda} = \frac{\sum_{q=1}^{K} N_q}{\sum_{q=1}^{K} \left( T_q^{\hat{\beta}} - S_q^{\hat{\beta}} \right)} \tag{3} \]

\[ \hat{\beta} = \frac{\sum_{q=1}^{K} N_q}{\hat{\lambda} \sum_{q=1}^{K} \left[ T_q^{\hat{\beta}} \ln T_q - S_q^{\hat{\beta}} \ln S_q \right] - \sum_{q=1}^{K} \sum_{i=1}^{N_q} \ln X_{i,q}} \tag{4} \]
where 0 ln(0) is defined to be zero. In general, these equations cannot be solved explicitly for λ̂ and β̂, but must be solved by iterative procedures. Once we have the estimates λ̂ and β̂, the ML estimate of the intensity function is given by:

\[ \hat{u}(t) = \hat{\lambda} \hat{\beta} t^{\hat{\beta} - 1} \]
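One way to carry out the iterative solution of Eqns. (3) and (4) is sketched below in Python. It assumes the standard power law NHPP likelihood equations, λ̂ = ΣN_q / Σ(T_q^β̂ − S_q^β̂) for Eqn. (3) and the matching score equation in β̂ for Eqn. (4). It substitutes the first into the second and solves the resulting single equation in β̂ by bisection. The function names, the bracketing interval and the failure data are hypothetical and chosen only for illustration; bisection is one possible iterative procedure, not necessarily the one used by any particular software.

```python
import math


def pow_log(x, beta):
    """Return x**beta * ln(x), with 0 ln(0) defined to be zero."""
    return 0.0 if x == 0 else x ** beta * math.log(x)


def score(beta, systems):
    """Eqn. (4) rewritten as g(beta) = 0 after substituting Eqn. (3) for lambda."""
    n = sum(len(x) for _, _, x in systems)
    sum_log_x = sum(math.log(xi) for _, _, x in systems for xi in x)
    num = sum(pow_log(t, beta) - pow_log(s, beta) for s, t, _ in systems)
    den = sum(t ** beta - s ** beta for s, t, _ in systems)
    return n / beta + sum_log_x - n * num / den


def fit_power_law(systems, lo=1e-3, hi=20.0, tol=1e-10):
    """Estimate (lambda, beta) by bisection on beta over the bracket [lo, hi]."""
    g_lo = score(lo, systems)
    if g_lo * score(hi, systems) > 0:
        raise ValueError("no sign change in [lo, hi]; widen the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = score(mid, systems)
        if g_lo * g_mid <= 0:
            hi = mid          # root lies in the lower half
        else:
            lo, g_lo = mid, g_mid  # root lies in the upper half
    beta = 0.5 * (lo + hi)
    n = sum(len(x) for _, _, x in systems)
    lam = n / sum(t ** beta - s ** beta for s, t, _ in systems)  # Eqn. (3)
    return lam, beta


# Hypothetical data: (start age S_q, end age T_q, failure ages X_iq) per system.
systems = [
    (0.0, 2000.0, [120.0, 540.0, 1310.0]),
    (0.0, 2000.0, [75.0, 980.0]),
    (0.0, 2000.0, [30.0, 410.0, 860.0, 1750.0]),
]
lam_hat, beta_hat = fit_power_law(systems)
print(f"lambda-hat = {lam_hat:.6f}, beta-hat = {beta_hat:.6f}")
```

Because each system is described by its own (S_q, T_q) pair, the same sketch applies when the starting ages are not zero or the ending ages differ between systems. The bracket [lo, hi] is an assumption and may need widening for other data sets.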
When S_{q} = 0 and the data set is time truncated at T_{q} = T (q = 1, 2,...,K), then the ML estimates λ̂ and β̂ are in closed form.
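Also, when K = 1, S_{1} = 0 and the data set is failure truncated, that is, X_{N_{1},1} = T_{1} = T, then λ̂ and β̂ are in closed form. These estimates are:

\[ \hat{\lambda} = \frac{\sum_{q=1}^{K} N_q}{K T^{\hat{\beta}}} \tag{5} \]

\[ \hat{\beta} = \frac{\sum_{q=1}^{K} N_q}{\sum_{q=1}^{K} \sum_{i=1}^{N_q} \ln \left( T / X_{i,q} \right)} \tag{6} \]

Example

For the data in Table 1, the starting time for each system is equal to zero and the ending time for each system is 2000. Calculate the ML estimates λ̂ and β̂.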
Table 1: Repairable system failure data

For this data set, the general Eqns. (3) and (4) reduce to the closed form Eqns. (5) and (6) for calculating the ML estimates. These estimates are:
The system failure intensity function is then estimated by:
Figure 2 is a plot of û(t) over the period (0, 2000). Clearly, the estimated failure intensity function is most representative over the range of the data and any extrapolation should be viewed with the usual caution.
Figure 2: Instantaneous Failure Intensity vs. Time plot

If the starting times of the systems are not all equal to zero, then the maximum likelihood estimates λ̂ and β̂ must be solved for iteratively using Eqns. (3) and (4).

In next month's issue of HotWire, Part II of this article will present fleet analysis.
Copyright 2004 ReliaSoft Corporation, ALL RIGHTS RESERVED  