
Modeling Maintenance-induced Failures During the Scheduled Maintenance of a Component Using Reliability Block Diagrams (RBDs)

 

 

Sometimes the scheduled preventive maintenance (PM) of a component in a system can do more harm than good. Although the maintenance task is intended to prolong the life of the component, it can instead cause the component to fail, resulting in corrective maintenance (CM). This article uses BlockSim 10 to show how you can include such maintenance-induced failures in your simulations.

Introduction

The most common causes of maintenance-induced failures include:

  1. Failure caused directly by the technician, whether due to a poorly written maintenance manual, lack of training or negligence. (Human error)
  2. The component is located where access to it is blocked, which prevents the technician from performing the maintenance with the proper tools. (Lack of design for serviceability and maintenance)
  3. Certain parts of the component, such as fittings or seals, cannot easily be opened or removed without being broken. (PM-induced failures)

Traditionally, when a system is analyzed with a reliability block diagram (RBD), these types of failures have been ignored, and all scheduled maintenance tasks have been assumed to be performed exactly as intended. However, BlockSim 10's tasks based on maintenance groups and its state change triggers allow us to simulate these maintenance-induced failures. Including them in the simulation lets us analyze the system more realistically with respect to labor costs, spare part costs and unplanned downtime.

Example

For our analysis, imagine that we have a system composed of two main components. The first component is an electrical controller, which is only replaced upon failure. The second component is a mechanical lift, which is periodically restored to like-new condition, and replaced upon failure.

Analysis without maintenance-induced failures:

The RBD representing the system is:

[Figure: RBD for the analysis without maintenance-induced failures]

The following tables show the reliability and maintenance properties for both the controller and the lift:

| Block URD | Reliability Model | Corrective Task | Scheduled Task(s) |
|---|---|---|---|
| Electrical Controller | Exponential, MTTF = 2.5 years | Electrical Controller Corrective Task | N/A |
| Mechanical Lift | 2P-Weibull, β = 1.7, η = 2 years | Mechanical Lift Corrective Task | Annual PM Scheduled Task |
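As a quick sanity check on the two reliability models in the table, the short Python sketch below evaluates each block's probability of surviving one year, i.e., until the lift's first annual PM comes due. It assumes a 365-day year (8,760 hours); the function names are illustrative only and are not part of BlockSim.

```python
import math

HOURS_PER_YEAR = 8760.0  # assumption: 365-day year


def exponential_reliability(t: float, mttf: float) -> float:
    """Exponential model: R(t) = exp(-t / MTTF)."""
    return math.exp(-t / mttf)


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """2-parameter Weibull model: R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


# Parameters from the table, converted to hours.
CONTROLLER_MTTF = 2.5 * HOURS_PER_YEAR
LIFT_BETA, LIFT_ETA = 1.7, 2.0 * HOURS_PER_YEAR

r_controller = exponential_reliability(HOURS_PER_YEAR, CONTROLLER_MTTF)
r_lift = weibull_reliability(HOURS_PER_YEAR, LIFT_BETA, LIFT_ETA)
print(f"Electrical Controller R(1 year) = {r_controller:.3f}")  # 0.670
print(f"Mechanical Lift R(1 year)       = {r_lift:.3f}")        # 0.735
```

In other words, roughly a quarter of lifts would be expected to fail before their first annual PM, so both the corrective and the scheduled task on the lift come into play during a 5-year simulation.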

 

| Corrective Task | Task Frequency | Duration | Crew | Spare Part Pool |
|---|---|---|---|---|
| Electrical Controller Corrective Task | On Failure | 168 hours | CM Repair Crew | Electrical Spares |
| Mechanical Lift Corrective Task | On Failure | 96 hours | CM Repair Crew | Mechanical Spares |

 

| Scheduled Task | Task Type | Task Frequency | Duration | Crew | Spare Part Pool | Restoration Factor |
|---|---|---|---|---|---|---|
| Annual PM Scheduled Task | Preventive | Every 1 year based on item age | 40 hours | PM Repair Crew | N/A | 1.0 |
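To make the lift's maintenance policy concrete (a PM every year of item age with a restoration factor of 1.0, and replacement on failure), here is a deliberately simplified Monte Carlo sketch of the lift on its own. It is not BlockSim's simulation engine: it ignores the controller, the crew and spare-part logistic delays and all costs, and it assumes the lift does not age while it is down, so its numbers are not expected to match the BlockSim results in this article. The function `simulate_lift` is a hypothetical name.

```python
import random

HOURS_PER_YEAR = 8760.0                # assumption: 365-day year
BETA, ETA = 1.7, 2.0 * HOURS_PER_YEAR  # lift 2P-Weibull parameters
PM_INTERVAL = 1.0 * HOURS_PER_YEAR     # PM every 1 year of item age
PM_DURATION = 40.0                     # Annual PM Scheduled Task duration (h)
CM_DURATION = 96.0                     # corrective task duration only, no delays (h)
END_TIME = 5.0 * HOURS_PER_YEAR


def simulate_lift(rng: random.Random) -> tuple[int, int, float]:
    """One run: return (number of PMs, number of CMs, lift downtime in hours)."""
    t, pms, cms, downtime = 0.0, 0, 0, 0.0
    while t < END_TIME:
        # After a PM or CM the lift is as good as new (restoration factor 1.0),
        # so a fresh time to failure is drawn: weibullvariate(scale, shape).
        time_to_failure = rng.weibullvariate(ETA, BETA)
        if time_to_failure < PM_INTERVAL:
            t += time_to_failure  # the lift fails before the PM is due
            duration, is_pm = CM_DURATION, False
        else:
            t += PM_INTERVAL      # the PM comes due first
            duration, is_pm = PM_DURATION, True
        if t >= END_TIME:
            break
        if is_pm:
            pms += 1
        else:
            cms += 1
        downtime += min(duration, END_TIME - t)  # clip downtime at the end time
        t += duration
    return pms, cms, downtime


rng = random.Random(2024)  # fixed seed for repeatability
runs = [simulate_lift(rng) for _ in range(10_000)]
mean_downtime = sum(r[2] for r in runs) / len(runs)
print(f"Mean lift downtime over 5 years: {mean_downtime:.1f} h")
```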

 

| Crew | Cost per Incident | Logistic Delay | Number of Tasks |
|---|---|---|---|
| CM Repair Crew | 5,000 | 24 hours (included in cost) | No Limit |
| PM Repair Crew | 1,000 | N/A | No Limit |

 

| Spare Pool | Cost per Part | Spare Limit | Logistic Delay |
|---|---|---|---|
| Electrical Spares | 50,000 | No | 24 hours |
| Mechanical Spares | 75,000 | No | 24 hours |

After entering the reliability and maintenance data in BlockSim, the block properties windows for the two components are shown below:

[Figure: Electrical Controller block properties]

[Figure: Mechanical Lift block properties]

Notice that this methodology ignores the possibility that the scheduled maintenance task will result in a component failure. Running 10,000 simulations with an end time of 5 years results in the following:

Availability: 0.9842
Downtime (hrs): 692.6
Crew Costs: 20582.1
Part Costs: 206017.5

Analysis with maintenance-induced failures:

The RBD representing the system is now:

[Figure: RBD for the analysis with maintenance-induced failures]

The two new blocks in the system RBD do not directly affect the system configuration. We use the PM Timer block to keep track of when the lift is down for scheduled maintenance. (Note that this timer ignores the corrective maintenance of the lift because we are considering only preventive maintenance-induced failures in this article.) We use the PM Failure block to model the chance of a maintenance-induced failure during scheduled maintenance of the lift.

The logic behind the preventive maintenance failure is:

  1. When the lift goes down for its annual preventive maintenance, it triggers an inspection of equal duration (40 hours) on the PM Timer block, which brings the PM Timer block down.
  2. The inspection of the PM Timer block turns the PM Failure block ON. (The PM Failure block is OFF while the lift is operating and during corrective maintenance of the lift.) We give this block a reliability model whose probability of failure at the duration of the lift's scheduled maintenance equals the desired probability. For this example, assume we want to model a 20 percent probability of failure per scheduled maintenance (i.e., within 40 hours). We therefore choose an exponential distribution with an MTTF of 179.2 hours, because 1 - exp(-40/179.2) ≈ 0.20.
  3. The scheduled maintenance on the lift and the inspection on the PM Timer block end at the same time. This turns the PM Failure block OFF and triggers an inspection of the PM Failure block. If the PM Failure block has not failed (the maintenance on the lift was successful), nothing further happens. If it has failed (the maintenance on the lift was unsuccessful), the PM Failure block is restored through its own corrective maintenance (with a duration of 0.1 hours to prevent too many events from occurring simultaneously) and then remains OFF until the next annual preventive maintenance of the lift. The start of this corrective maintenance also triggers a preventive maintenance task on the lift that is designed to mimic the lift's corrective maintenance.
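The arithmetic behind step 2, and the pass/fail outcome that steps 1 to 3 produce at each annual PM, can be sketched in a few lines of Python. This is an illustration under the article's assumptions (the PM Failure block is ON for exactly 40 hours per PM and follows an exponential model), not BlockSim's implementation; `pm_outcome_is_failure` is a hypothetical helper.

```python
import math
import random

PM_DURATION = 40.0    # hours the PM Failure block is ON during each annual PM
TARGET_P_FAIL = 0.20  # desired probability of a maintenance-induced failure per PM

# Exponential model: F(t) = 1 - exp(-t / MTTF). Solving F(40) = 0.20 for MTTF:
mttf = -PM_DURATION / math.log(1.0 - TARGET_P_FAIL)
print(f"Required MTTF = {mttf:.2f} h")  # 179.26 h

# The article uses 179.2 h, which still gives a failure probability of about 20%.
p_fail_179_2 = 1.0 - math.exp(-PM_DURATION / 179.2)
print(f"P(failure per PM) with MTTF = 179.2 h: {p_fail_179_2:.4f}")  # 0.2001


def pm_outcome_is_failure(rng: random.Random, mttf_hours: float) -> bool:
    """Hypothetical helper: does the PM Failure block fail while it is ON?

    True means the inspection at the end of the PM finds it failed, its 0.1 h
    corrective task starts, and that start triggers the 96 h 'CM as PM' task
    on the lift. False means the PM succeeded and nothing else happens.
    """
    time_to_failure = rng.expovariate(1.0 / mttf_hours)
    return time_to_failure < PM_DURATION


rng = random.Random(7)
trials = 100_000
failures = sum(pm_outcome_is_failure(rng, 179.2) for _ in range(trials))
print(f"Simulated induced-failure fraction: {failures / trials:.3f}")
```

One convenient property of the exponential choice is that it is memoryless: every PM sees the same 20 percent chance of failure no matter how many hours the PM Failure block has already spent ON during earlier PMs. A wear-out model such as a Weibull distribution with β > 1 would not behave this way.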

To accomplish this, we must place the Mechanical Lift, PM Timer and PM Failure blocks into separate maintenance groups. We must also enable the state change triggers in the PM Failure block, using the following setup:

[Figure: State Change Triggers options for the PM Failure block]

The following tables show the reliability and maintenance properties for all four relevant blocks:

| Block URD | Reliability Model | Corrective Task | Scheduled Task(s) |
|---|---|---|---|
| Electrical Controller | Exponential, MTTF = 2.5 years | Electrical Controller Corrective Task | N/A |
| Mechanical Lift | 2P-Weibull, β = 1.7, η = 2 years | Mechanical Lift Corrective Task | Annual PM Scheduled Task, CM as PM Scheduled Task |
| PM Timer | Cannot Fail | N/A | PM Timer Scheduled Task |
| PM Failure | Exponential, MTTF = 179.2 hours | PM Failure Corrective Task | PM Failure Scheduled Task |

 

| Corrective Task | Task Frequency | Duration | Crew | Spare Part Pool |
|---|---|---|---|---|
| Electrical Controller Corrective Task | On Failure | 168 hours | CM Repair Crew | Electrical Spares |
| Mechanical Lift Corrective Task | On Failure | 96 hours | CM Repair Crew | Mechanical Spares |
| PM Failure Corrective Task | When found failed during inspection | 0.1 hours | N/A | N/A |

 

| Scheduled Task | Task Type | Task Frequency | Duration | Crew | Spare Part Pool | Restoration Factor |
|---|---|---|---|---|---|---|
| Annual PM Scheduled Task | Preventive | Every 1 year based on item age | 40 hours | PM Repair Crew | N/A | 1.0 |
| CM as PM Scheduled Task (mimics the CM of the lift) | Preventive | Based on the start of the corrective task for the PM Failure block | 96 hours | CM Repair Crew | Mechanical Spares | 1.0 |
| PM Timer Scheduled Task | Inspection | Based on the start of the preventive task for the Mechanical Lift | 40 hours | N/A | N/A | 0.0 |
| PM Failure Scheduled Task | Inspection | Based on the PM Timer block being restored | Immediate | N/A | N/A | 0.0 |
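Putting the task table together, the sequence of events around one annual PM can be written out as a small timeline. The sketch below lists only the task start and end times implied by the table; it ignores the crew and spare-part logistic delays and the possibility that the lift fails for other reasons during the PM. The `Event` type, `pm_timeline` and `lift_task_hours` are hypothetical names used for illustration.

```python
from dataclasses import dataclass

PM_DURATION = 40.0        # Annual PM Scheduled Task and PM Timer inspection (h)
PM_FAILURE_CM = 0.1       # PM Failure Corrective Task (h)
CM_AS_PM_DURATION = 96.0  # CM as PM Scheduled Task on the lift (h)


@dataclass
class Event:
    time: float  # hours after the annual PM starts
    block: str
    action: str


def pm_timeline(induced_failure: bool) -> list[Event]:
    """Events for one annual PM, with or without a maintenance-induced failure."""
    events = [
        Event(0.0, "Mechanical Lift", "Annual PM starts"),
        Event(0.0, "PM Timer", "inspection starts; PM Failure turned ON"),
        Event(PM_DURATION, "Mechanical Lift", "Annual PM ends"),
        Event(PM_DURATION, "PM Timer", "restored; PM Failure turned OFF and inspected"),
    ]
    if induced_failure:
        events += [
            Event(PM_DURATION, "PM Failure", "found failed; corrective task starts"),
            Event(PM_DURATION, "Mechanical Lift", "CM as PM task starts"),
            Event(PM_DURATION + PM_FAILURE_CM, "PM Failure", "restored; stays OFF"),
            Event(PM_DURATION + CM_AS_PM_DURATION, "Mechanical Lift", "CM as PM task ends"),
        ]
    return sorted(events, key=lambda e: e.time)  # stable sort keeps same-time order


def lift_task_hours(induced_failure: bool) -> float:
    """Total task time on the lift for one annual PM (logistic delays excluded)."""
    return PM_DURATION + (CM_AS_PM_DURATION if induced_failure else 0.0)


for e in pm_timeline(induced_failure=True):
    print(f"{e.time:6.1f} h  {e.block:<16} {e.action}")

p = 0.20  # induced-failure probability per PM from the PM Failure model
expected = (1 - p) * lift_task_hours(False) + p * lift_task_hours(True)
print(f"Expected lift task time per annual PM: {expected:.1f} h")  # 59.2 h
```

Under these simplifications, a failed PM keeps the lift in maintenance for 136 hours instead of 40, and also draws a part from the Mechanical Spares pool and a CM Repair Crew call-out, which is where the extra downtime and costs in the results come from.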

 

| Crew | Cost per Incident | Logistic Delay | Number of Tasks |
|---|---|---|---|
| CM Repair Crew | 5,000 | 24 hours (included in cost) | No Limit |
| PM Repair Crew | 1,000 | N/A | No Limit |

 

| Spare Pool | Cost per Part | Spare Limit | Logistic Delay |
|---|---|---|---|
| Electrical Spares | 50,000 | No | 24 hours |
| Mechanical Spares | 75,000 | No | 24 hours |

The block properties windows for the Mechanical Lift, PM Timer and PM Failure blocks are shown below. (Note that we made no changes to the Electrical Controller block.)

[Figure: Mechanical Lift block properties]

[Figure: PM Timer block properties]

[Figure: PM Failure block properties]

This methodology incorporates the chance that the annual PM on the lift causes a failure, which requires the lift to be replaced before the system can be used again. Running 10,000 simulations with an end time of 5 years results in the following:

Availability: 0.9823
Downtime (hrs): 776.0
Crew Costs: 24077.5
Part Costs: 258697.5
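Both availability figures are consistent with the reported mean downtimes if availability is taken as the fraction of the 5-year end time during which the system is up. A minimal check, assuming a 365-day year (5 years = 43,800 hours):

```python
END_TIME = 5 * 8760.0  # assumption: 5 years of 365 days, in hours


def mean_availability(downtime_hours: float, end_time_hours: float = END_TIME) -> float:
    """Availability as the fraction of the simulation end time that the system is up."""
    return 1.0 - downtime_hours / end_time_hours


a_without = mean_availability(692.6)  # without maintenance-induced failures
a_with = mean_availability(776.0)     # with maintenance-induced failures
print(f"{a_without:.4f}  {a_with:.4f}")  # 0.9842  0.9823
```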

Conclusion

Based on a 5-year system simulation, by removing the assumption that preventive maintenance is performed flawlessly, we showed that downtime is likely to increase by about 12%, crew costs by about 17% and part costs by about 26%. These increases are significant enough to justify components that are designed for serviceability and maintenance, well-written preventive maintenance procedures and well-trained technicians. Even with these improvements, the possibility of a maintenance-induced failure always exists; the improvements only reduce the probability and frequency of such events. Simulation-based RBD analysis should therefore be part of the business strategy, so that the potential risks and costs of unplanned downtime are understood and decisions about equipment upgrades, standard work development and training are data-driven.
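The percentages quoted in the conclusion follow directly from the two sets of simulation results. A short Python check (the dictionary keys simply mirror the result labels):

```python
baseline = {"Downtime (hrs)": 692.6, "Crew Costs": 20582.1, "Part Costs": 206017.5}
with_induced = {"Downtime (hrs)": 776.0, "Crew Costs": 24077.5, "Part Costs": 258697.5}


def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


increases = {m: percent_increase(baseline[m], with_induced[m]) for m in baseline}
for metric, pct in increases.items():
    print(f"{metric}: +{pct:.1f}%")
# Downtime (hrs): +12.0%
# Crew Costs: +17.0%
# Part Costs: +25.6%
```

Rounded to whole percentages, these give the 12%, 17% and 26% increases stated above.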

 

 
ReliaSoft