Reliability Importance Measures of Components in a Complex System - Identifying the 20% in the 80/20 Rule
[Editor's Note: This article has been updated since its original publication to reflect a more recent version of the software interface.]
When analyzing a system's reliability and availability, measuring the importance of components is often of significant value in prioritizing improvement efforts, performing trade-off analysis in system design, or suggesting the most efficient way to operate and maintain a system. Focusing on the most problematic areas in the system results in the most significant gains. This article presents different ways of assessing the importance of non-repairable and repairable components within a system using BlockSim.
With modern technology and higher reliability requirements, systems are getting more complicated. Therefore, identifying the most problematic components can become difficult. Many systems are repairable systems composed of many components that fail and get repaired based on different distributions. With limitations and constraints (such as spare parts availability, repair crew response time, logistic delays etc.), exact analytical solutions become intractable. In these cases, simulation becomes the tool of choice in modeling repairable systems and identifying weak components and areas where maintainability limitations hinder the availability of the system.
Note: In this article, the cost of improving the reliability of the component is not considered. Cost of improvement is covered in the Reliability Allocation section of the System Analysis Reference.
1. Importance Measures for Non-Repairable Systems
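Using reliability importance (IR) measures is one method of identifying the relative importance of each component in a system with respect to the overall reliability of the system. The reliability importance, IRi, of component i in a system of n components is given by Leemis [2]:
$$I_{R_i}(t) = \frac{\partial R_s(t)}{\partial R_i(t)} \qquad (1)$$
where R_s(t) is the system reliability at time t and R_i(t) is the reliability of component i at time t.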
This metric measures the rate of change (at time t) of the system reliability with respect to the component's reliability change. It also measures the probability of a component being responsible for system failure at time t. The value of the reliability importance given by Eqn. (1) depends on both the reliability of a component and its corresponding position in the system.
As an example, let us consider the system shown next.
The failure distributions for the components in the diagram are:
The system reliability equation for this configuration can be expressed as:
Hence, according to Eqn. (1), the reliability importance of component A, for example, is:
By varying the time value, t, and obtaining the corresponding reliabilities at t for each of the components in the above equation, we can obtain the reliability importance value for different times. For instance, if t=50 hr, IRA=0.936. The same procedure can be applied for every component.
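To make the calculation concrete, here is a minimal Python sketch of Eqn. (1). Because the example diagram and its failure distributions are not reproduced in this text, the sketch assumes a hypothetical three-block system (A in series with the parallel pair B and C) with made-up exponential failure rates, so it will not reproduce the IRA=0.936 value quoted above. It relies on the fact that, for independent components, the system reliability is linear in each individual component reliability, so the partial derivative equals the system reliability with the component perfect minus the system reliability with the component failed.

```python
import math

# Hypothetical failure rates (per hour); NOT the values used in the article.
FAILURE_RATES = {"A": 0.002, "B": 0.01, "C": 0.01}


def component_reliability(name, t):
    """Exponential reliability R_i(t) = exp(-lambda_i * t)."""
    return math.exp(-FAILURE_RATES[name] * t)


def system_reliability(r):
    """Assumed configuration: A in series with the parallel pair (B, C)."""
    return r["A"] * (1.0 - (1.0 - r["B"]) * (1.0 - r["C"]))


def reliability_importance(t):
    """Return I_Ri(t) = dRs/dRi for every component at time t.

    Computed as Rs(R_i = 1) - Rs(R_i = 0), which is exact here because
    Rs is linear in each R_i when components are independent.
    """
    r = {name: component_reliability(name, t) for name in FAILURE_RATES}
    importance = {}
    for name in r:
        perfect = dict(r, **{name: 1.0})  # component never fails
        failed = dict(r, **{name: 0.0})   # component always failed
        importance[name] = system_reliability(perfect) - system_reliability(failed)
    return importance


if __name__ == "__main__":
    for name, value in reliability_importance(50.0).items():
        print(f"I_R[{name}](t=50 hr) = {value:.4f}")
```

Calling `reliability_importance` over a range of time values gives the data behind an importance-versus-time plot; a single call gives the static importance at one time.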
This type of reliability importance measure can be presented graphically in various ways. The following BlockSim plot shows the reliability importance of each block over time.
The next plot is a snapshot of the previous plot at a specific time value (this is called "static reliability importance").
The following plot is also static reliability importance, but is presented as a "square pie chart" that shows the breakdown of the components' reliability importance.
The three plots above show that two of the components, A and I (20% of the components), clearly dominate and are responsible for most of the system failures.
2. Importance Measures for Repairable Systems
Table 1: Maintainability Characteristics of the System
Through simulation, the system and component histories over time can be captured. The results of the simulation can be used to quantify two other types of reliability importance measures, ReliaSoft's Failure Criticality Index (RS FCI) and ReliaSoft's Downing Event Criticality Index (RS DECI), both available in BlockSim. A discussion of these two metrics is presented next.
2.1. ReliaSoft's Failure Criticality Index
This metric considers only failure events and excludes preventive maintenance and inspection events that cause an interruption in the system's operation.
RS FCI reports the percentage of times that a system failure event was caused (triggered) by a failure of a particular component over the simulation time (0,t). Intuitively, this index has the same meaning and the same application as the reliability importance measure, IRi, described in Eqn. (1).
For example, if we simulate the system's operation for 5,000 hours in BlockSim, we obtain the following Block Summary report.
For component A, RS FCI = 75.03%. This implies that 75.03% of the times that the system failed, a component A failure was responsible. Note that the combined RS FCI of A and I is 81.41%. In other words, A and I contributed to about 80% of the system's total downing failures.
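As a rough illustration of how a failure-triggered percentage of this kind can be tallied from a simulation, the following Python sketch runs a very small Monte Carlo model. It is not BlockSim's algorithm: it assumes a hypothetical series system (any component failure downs the system), exponential times to failure, fixed repair durations, and no component aging while the system is down. The component names, rates and repair times are invented for the example.

```python
import random
from collections import Counter

# Hypothetical inputs; not taken from the article's example system.
FAILURE_RATES = {"A": 0.01, "B": 0.002, "C": 0.001}   # failures per hour
REPAIR_HOURS = {"A": 10.0, "B": 10.0, "C": 10.0}      # fixed repair durations


def simulate_system_failures(failure_rates, repair_hours, horizon, seed=1):
    """Count, per component, the system failures it triggered in (0, horizon)."""
    rng = random.Random(seed)
    causes = Counter()
    clock = 0.0
    while True:
        # Exponential lifetimes are memoryless, so fresh draws after every
        # restart are equivalent to tracking each component's remaining life.
        draws = {name: rng.expovariate(rate) for name, rate in failure_rates.items()}
        culprit = min(draws, key=draws.get)   # earliest failure downs the series system
        clock += draws[culprit]
        if clock > horizon:
            break
        causes[culprit] += 1
        clock += repair_hours[culprit]        # system is down during the repair
    return causes


def failure_criticality_index(causes):
    """Percentage of system failures triggered by each component."""
    total = sum(causes.values())
    if total == 0:
        return {}
    return {name: 100.0 * count / total for name, count in causes.items()}


if __name__ == "__main__":
    counts = simulate_system_failures(FAILURE_RATES, REPAIR_HOURS, horizon=5000.0)
    for name, pct in sorted(failure_criticality_index(counts).items()):
        print(f"{name}: {pct:.2f}% of system failures")
```

In a non-series configuration, a component failure would only be counted when it actually brings the system down, which is why the index depends on the component's position in the system as well as its failure rate.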
The RS FCI results can also be seen in a graphical format.
2.2. ReliaSoft's Downing Event Criticality Index (RS DECI)
This metric considers all downing events (i.e., failures, preventive maintenance and inspection events that cause an interruption in the system's operation).
In the simulation results, we see that for component A, RS DECI = 46.30%. This implies that 46.30% of the times that the system was down were due to component A being down. Note that the combined RS DECI of A and I is 84.69%. Once again we see how the vital few issues, A and I (20% of the components), contributed to about 80% of the system downtime, whereas the trivial many (80% of the components) contributed to only 20% of the downtime.
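The difference between the two indices comes down to which downing events are counted. The Python sketch below shows this on a small, invented event log; the log format (component, event type) and the event-type labels are assumptions made for illustration and do not reflect how BlockSim stores its simulation history.

```python
from collections import Counter

# Hypothetical log of system downing events: (triggering component, event type).
EVENTS = [
    ("A", "failure"), ("A", "pm"), ("I", "failure"), ("A", "failure"),
    ("B", "inspection"), ("I", "pm"), ("A", "failure"), ("C", "failure"),
]


def criticality_index(events, event_types):
    """Percentage of the selected system downing events triggered by each component."""
    counts = Counter(comp for comp, kind in events if kind in event_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {comp: 100.0 * n / total for comp, n in counts.items()}


# Failure-only index (in the spirit of RS FCI) versus
# all-downing-events index (in the spirit of RS DECI).
rs_fci = criticality_index(EVENTS, {"failure"})
rs_deci = criticality_index(EVENTS, {"failure", "pm", "inspection"})

if __name__ == "__main__":
    print("failure-only index:", rs_fci)
    print("all-downing-events index:", rs_deci)
```

In this made-up log, component B appears only in the all-events index because its single downing event is an inspection, not a failure.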
The RS DECI results can also be seen in a graphical format.
3. FRED Report
For the repairable system example, the FRED (Failure Reporting, Evaluation and Display) report is shown next.
The FRED report shows the average availability, the MTBF, the MTTR (mean time to repair) and the RS FCI values for each component in the system. In addition, the components are color coded to show the maintainability/availability of each component in relation to the other components (using a color spectrum varying from red for worst, to dark green for best). For example, we can conclude from the above FRED report that component G's reliability needs improvement (MTBF=107.717) and that component A's availability is the lowest (Am=0.912).
4.1. Eliminating Problems
In BlockSim, you can delete a block or set it so that it does not fail; this will eliminate its effect.
Changing Failure Distribution
B10 Life for the System with C1
B10 Life for the System with C2
The above analysis can be used to weigh the gains obtained by switching to a more expensive supplier.
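A comparison of this kind can be sketched in a few lines of Python. The B10 life is the time by which 10% of units are expected to have failed, that is, the time at which the reliability drops to 0.90. The example below assumes a hypothetical two-block series system in which one block is fixed and the other comes from supplier C1 or C2; all Weibull parameters are invented, and the B10 life is found by simple bisection, not by whatever method BlockSim uses.

```python
import math

# Hypothetical Weibull parameters (beta, eta in hours) for the two suppliers.
C1 = (1.5, 1000.0)
C2 = (1.5, 3000.0)


def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def system_reliability(t, supplier):
    """Assumed series system: one fixed block plus the block being sourced."""
    fixed = weibull_reliability(t, beta=1.5, eta=2000.0)
    candidate = weibull_reliability(t, *supplier)
    return fixed * candidate


def b10_life(reliability, upper=1e6, tol=1e-6):
    """Time at which a decreasing reliability function reaches 0.90 (bisection).

    Assumes reliability(0) > 0.90 and reliability(upper) < 0.90.
    """
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reliability(mid) > 0.90:
            lo = mid   # still above 90% reliability: B10 life is later
        else:
            hi = mid   # at or below 90%: B10 life is earlier
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for label, supplier in (("C1", C1), ("C2", C2)):
        t10 = b10_life(lambda t: system_reliability(t, supplier))
        print(f"System B10 life with {label}: {t10:.1f} hr")
```

The gain in system B10 life from the better component can then be set against the difference in purchase cost.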
The next example shows the impact on availability if preventive maintenance is performed on every component every 500 hours of component age. You can see that availability increases, which indicates that the original schedule of preventive maintenance was far too frequent. By performing the maintenance less often, you will not only increase availability (by reducing planned downtime) but will also save money on the maintenance tasks.
Mean Availability of the System with Original PM Timing
Mean Availability of the System with New PM Timing
2. Leemis, L.M., Reliability: Probabilistic Models and Statistical Methods, Prentice Hall, Inc., Englewood Cliffs, New Jersey, 1995.
3. Wang, W., Loman, J., Vassiliou, P., Reliability Importance of Components in a Complex System, Proceedings of the Annual Reliability & Maintainability Symposium, 2004.
Copyright 2006 ReliaSoft Corporation, ALL RIGHTS RESERVED