Reliability Importance Measures of Components in a Complex System - Identifying the 20% in the 80/20 Rule
When analyzing a system's reliability and availability, measuring the importance of components is often of significant value in prioritizing improvement efforts, performing trade-off analysis in system design, or suggesting the most efficient way to operate and maintain a system. Focusing on the most problematic areas in the system yields the most significant gains. This article presents different ways of assessing the importance of non-repairable and repairable components within a system using BlockSim.
With modern technology and higher reliability requirements, systems are becoming more complicated, which makes it harder to identify the most problematic components. Many systems are repairable systems composed of many components that fail and get repaired according to different distributions. Once limitations and constraints are included (such as spare parts availability, repair crew response time and logistic delays), exact analytical solutions become intractable. In these cases, simulation becomes the tool of choice for modeling repairable systems and for identifying weak components and areas where maintainability limitations hinder the availability of the system.
Note: In this article, the cost of improving the reliability of the component is not considered. Cost of improvement is covered in the Reliability Allocation section of the System Analysis Reference.
1. Importance Measures for Non-Repairable Systems
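Using Reliability Importance (IR) measures is one method of identifying the relative importance of each component in a system with respect to the overall reliability of the system. The reliability importance, IRi, of component i in a system of n components is given by Leemis:
$$I_{R_i}(t) = \frac{\partial R_s(t)}{\partial R_i(t)} \qquad (1)$$
where R_s(t) is the system reliability at time t and R_i(t) is the reliability of component i at time t.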
This metric measures the rate of change (at time t) of the system reliability with respect to a change in the component's reliability. It also measures the probability of a component being responsible for system failure at time t. The value of the reliability importance given by Eqn. (1) depends on both the reliability of a component and its corresponding position in the system.
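As a minimal sketch of how this measure can be evaluated, the snippet below assumes independent components, in which case the system reliability is linear in each component's reliability and the partial derivative equals the system reliability with the component perfect minus the system reliability with the component failed. The three-block layout and the reliability values are hypothetical and are not the Figure 1 system.

```python
from typing import Callable, Dict


def reliability_importance(
    system_reliability: Callable[[Dict[str, float]], float],
    reliabilities: Dict[str, float],
    name: str,
) -> float:
    """Reliability importance of one component, assuming independent components.

    For independent components R_s is linear in each R_i, so the partial
    derivative dR_s/dR_i equals R_s(R_i = 1) - R_s(R_i = 0).
    """
    perfect = {**reliabilities, name: 1.0}
    failed = {**reliabilities, name: 0.0}
    return system_reliability(perfect) - system_reliability(failed)


def example_system(r: Dict[str, float]) -> float:
    """Hypothetical layout: block A in series with the parallel pair (B, C)."""
    return r["A"] * (1.0 - (1.0 - r["B"]) * (1.0 - r["C"]))


# Hypothetical component reliabilities at some fixed time t.
r = {"A": 0.95, "B": 0.80, "C": 0.70}
importance = {name: reliability_importance(example_system, r, name) for name in r}

for name, value in importance.items():
    print(f"I_R({name}) = {value:.3f}")
```

In this hypothetical layout the series block A has the highest importance even though it is the most reliable block, which illustrates the dependence on position noted above.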
As an example, let us consider the system described in Figure 1.
The failure distributions for the components in the diagram are:
The system reliability equation for this configuration can be expressed as:
Hence, according to Eqn. (1), the reliability importance of component A, for example, is:
By varying the time value, t, and obtaining the corresponding reliabilities at t for each of the components in the above equation, we can obtain the reliability importance value for different times. For instance, if t=50 hr, IRA=0.936. The same procedure can be applied for every component.
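The time sweep described here can be sketched as follows. The Weibull parameters and the series-parallel layout are assumptions made purely for illustration; they are not the Figure 1 distributions, so the numbers produced will not match the 0.936 quoted above.

```python
import math
from typing import Dict, Tuple

# Hypothetical Weibull parameters (beta, eta in hr) for three blocks.
PARAMS: Dict[str, Tuple[float, float]] = {
    "A": (1.5, 1000.0),
    "B": (1.0, 500.0),
    "C": (2.0, 800.0),
}


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def system_reliability(r: Dict[str, float]) -> float:
    """Hypothetical layout: A in series with the parallel pair (B, C)."""
    return r["A"] * (1.0 - (1.0 - r["B"]) * (1.0 - r["C"]))


def importance_at(t: float) -> Dict[str, float]:
    """Reliability importance of every block at time t (independent blocks)."""
    r = {name: weibull_reliability(t, beta, eta) for name, (beta, eta) in PARAMS.items()}
    return {
        name: system_reliability({**r, name: 1.0}) - system_reliability({**r, name: 0.0})
        for name in r
    }


for t in (50.0, 500.0, 1000.0):
    values = importance_at(t)
    print(t, {name: round(v, 3) for name, v in values.items()})
```

Evaluating `importance_at` on a grid of times gives the data for a plot of importance versus time, while a single call gives a static snapshot.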
This type of reliability importance measure can be presented graphically in various ways. The following BlockSim plot shows the reliability importance of each block in Figure 1 over time.
The next plot is a snapshot of the previous plot at a specific time value (this is called "static reliability importance").
The following plot also shows static reliability importance, but is presented as a "square pie chart" that shows the breakdown of the components' reliability importance.
The three plots above show that two of the components (20%), A and I, clearly dominate and are responsible for most of the failures of the system.
2. Importance Measures for Repairable Systems
Table 1 Maintainability Characteristics of the Figure 1 Example System
Through simulation, the system and component histories over time can be captured. The results of the simulation can be used to quantify two other types of reliability importance measures, ReliaSoft's Failure Criticality Index (RS FCI) and ReliaSoft's Downing Event Criticality Index (RS DECI), both available in BlockSim. A discussion of these two metrics is presented next.
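To make the idea of capturing histories concrete, here is a minimal Monte Carlo sketch. It is not BlockSim's algorithm. It assumes a pure series system, exponential times to failure and to repair, and components that keep aging while the system is down. The function name and all parameter values are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def simulate_series_system(
    components: Dict[str, Tuple[float, float]],
    horizon: float,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Simulate a series system and return (failure triggers, system downtime).

    components maps a name to (mean time to failure, mean time to repair), in hr.
    """
    rng = random.Random(seed)
    events = []  # (time, component name, "fail" or "repair")
    for name, (mttf, mttr) in components.items():
        t = 0.0
        while True:
            t += rng.expovariate(1.0 / mttf)  # time until the next failure
            if t >= horizon:
                break
            events.append((t, name, "fail"))
            t += rng.expovariate(1.0 / mttr)  # time until the repair completes
            if t >= horizon:
                break
            events.append((t, name, "repair"))
    events.sort()

    down = set()          # components currently down
    triggers = []         # component that took the system from up to down
    downtime = 0.0
    down_since = 0.0
    for time, name, kind in events:
        if kind == "fail":
            if not down:  # system was up, so this failure triggers an outage
                triggers.append(name)
                down_since = time
            down.add(name)
        else:
            down.remove(name)
            if not down:  # last down component restored, system is up again
                downtime += time - down_since
    if down:              # system still down at the end of the simulation
        downtime += horizon - down_since
    return triggers, downtime


# Hypothetical two-block system simulated for 5000 hr.
triggers, downtime = simulate_series_system(
    {"A": (100.0, 10.0), "B": (400.0, 5.0)}, horizon=5000.0, seed=1
)
print(len(triggers), "system failures;", round(downtime, 1), "hr of downtime")
```

The list of triggers and the accumulated downtime are the raw material for event-based importance measures such as the two indices discussed in this section.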
2.1. ReliaSoft's Failure Criticality Index
This metric considers only failure events and excludes preventive maintenance and inspection events that cause an interruption in the system's operation.
RS FCI reports the percentage of times that a system failure event was caused (triggered) by a failure of a particular component over the simulation time (0,t). Intuitively, this index has the same meaning and the same application as the Reliability Importance measure, IRi, described in Eqn. (1).
For example, if we simulate the system's operation for 5000 hr in BlockSim, we obtain the following Block Summary report.
Figure 2 Blocks Simulation Summary Report for 5000 hr of System Operation
For component A, RS FCI = 73.73%. This implies that 73.73% of the times that the system failed, a component A failure was responsible. Note that the combined RS FCI of A and I is 81.67%. In other words, A and I contributed to about 80% of the system's total downing failures.
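The percentage described above can be illustrated with a short sketch that counts, for each component, how many system failures it triggered. The event log below is hypothetical and the function is an illustration of the definition, not BlockSim's implementation.

```python
from collections import Counter
from typing import Dict, Iterable


def failure_criticality_index(triggers: Iterable[str]) -> Dict[str, float]:
    """Percentage of system failures triggered by each component."""
    counts = Counter(triggers)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical log: one entry per system failure, naming the triggering block.
log = ["A", "A", "I", "A", "B", "A", "A", "I", "A", "A"]
fci = failure_criticality_index(log)

for name, pct in sorted(fci.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

Because every system failure has exactly one trigger in this sketch, the percentages sum to 100%, which is what allows the values for individual components to be added together.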
The RS FCI results can also be seen in a graphical format.
2.2. ReliaSoft's Downing Event Criticality Index (RS DECI)
This metric considers all downing events (i.e., failures, preventive maintenance and inspection events) that cause an interruption in the system's operation.
In Figure 2, we see that for component A, RS DECI = 51.68%. This implies that 51.68% of the times that the system was down were due to component A being down. Note that the combined RS DECI of A and I is 80.05%. Once again we see how the vital few issues, A and I (20% of the components), contributed to about 80% of the system downtime, whereas the trivial many (80% of the components) contributed to only 20% of the downtime.
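The "vital few" reading of these numbers can be sketched as a simple Pareto selection. Only the value for A (51.68%) and the combined value for A and I (80.05%) are stated in the text; the value for I is obtained here by subtraction, and the remaining components are lumped into one entry because their individual values are not given. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def vital_few(index_pct: Dict[str, float], threshold_pct: float = 80.0) -> Tuple[List[str], float]:
    """Smallest set of top-ranked entries whose cumulative index reaches the threshold."""
    ranked = sorted(index_pct.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, pct in ranked:
        selected.append(name)
        cumulative += pct
        if cumulative >= threshold_pct:
            break
    return selected, cumulative


rs_deci = {
    "A": 51.68,                                   # stated in the text
    "I": 80.05 - 51.68,                           # derived from the combined value
    "all other components (lumped)": 100.0 - 80.05,
}
selected, cumulative = vital_few(rs_deci)
print(selected, round(cumulative, 2))
```

The same selection can be applied to RS FCI values or to any other index that sums to 100% across components.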
The RS DECI results can also be seen in a graphical format.
3. FRED Report
For the repairable system example in Figure 1, the FRED report is shown next.
The FRED report shows the average availability, the MTBF (mean time between failures), the MTTR (mean time to repair) and the RS FCI values for each component in the system. In addition, the components are color coded (on a color spectrum varying from red, for worst reliability, to dark green, for best reliability) to show the reliability of each component in relation to the other components. For example, we can conclude from the above FRED report that component G's reliability needs improvement and that component B's maintainability needs improvement (MTTR = 678.71 hr).
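For readers who want to see how such per-component figures relate to a simulated history, the sketch below uses common textbook definitions (availability as uptime over total time, MTBF as uptime per failure, MTTR as downtime per failure). These definitions are an assumption and may differ from the ones BlockSim uses in the FRED report; the class name and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ComponentHistory:
    uptime_hr: float    # total time the component was operating
    downtime_hr: float  # total time the component was down
    failures: int       # number of failures observed


def summarize(history: ComponentHistory) -> Dict[str, float]:
    """Average availability, MTBF and MTTR under textbook definitions."""
    total = history.uptime_hr + history.downtime_hr
    if history.failures == 0:
        # No failures observed: MTBF is unbounded and no repair time was incurred.
        return {"availability": history.uptime_hr / total, "mtbf_hr": float("inf"), "mttr_hr": 0.0}
    return {
        "availability": history.uptime_hr / total,
        "mtbf_hr": history.uptime_hr / history.failures,
        "mttr_hr": history.downtime_hr / history.failures,
    }


# Hypothetical component: 4500 hr up, 500 hr down, 5 failures in 5000 hr.
summary = summarize(ComponentHistory(uptime_hr=4500.0, downtime_hr=500.0, failures=5))
print(summary)
```

A low MTBF points to a reliability problem, while a high MTTR points to a maintainability problem, which is the distinction drawn between components G and B above.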
4.1. Eliminating Problems
In BlockSim, you can delete a block or set it so that it does not fail; this will eliminate its effect.
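A static analogue of this "what if the block never failed" experiment can be sketched without simulation. For independent components, setting one reliability to 1 raises the system reliability by the component's reliability importance multiplied by its unreliability. The layout and values below are hypothetical and do not represent the Figure 1 system or BlockSim's calculation.

```python
from typing import Dict


def system_reliability(r: Dict[str, float]) -> float:
    """Hypothetical layout: A in series with the parallel pair (B, C)."""
    return r["A"] * (1.0 - (1.0 - r["B"]) * (1.0 - r["C"]))


def gain_if_eliminated(r: Dict[str, float], name: str) -> float:
    """Increase in system reliability if the named block can no longer fail."""
    return system_reliability({**r, name: 1.0}) - system_reliability(r)


# Hypothetical component reliabilities at a fixed time.
r = {"A": 0.95, "B": 0.80, "C": 0.70}
gains = {name: gain_if_eliminated(r, name) for name in r}

for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"eliminating {name}: +{gain:.3f}")
```

In this example, eliminating B or C gains more than eliminating A, even though A has the highest reliability importance, because A is already highly reliable and has little room left to improve.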
Changing Failure Distribution
The above analysis can be used to weigh the gains obtained by switching to a more expensive supplier.
The next example shows the impact on availability if each preventive maintenance policy applied to the components is carried out every 200 hr of component age.
Copyright 2006 ReliaSoft Corporation, ALL RIGHTS RESERVED