One of the most important benefits of simulation is the ability to define how and when actions are performed. In our case, the actions of interest are part repairs/replacements. This is accomplished in BlockSim through the use of maintenance policies. Specifically, three different types of policies can be defined for maintenance actions: corrective maintenance, preventive maintenance and inspection.
A corrective maintenance policy defines when a corrective maintenance (CM) action is performed. Figure 8.27 shows a corrective maintenance policy assigned to a block in BlockSim.
Figure 8.27: Setting a corrective maintenance policy in BlockSim.
Corrective actions will be performed either immediately upon failure of the item or upon finding that the item has failed (for "hidden" failures that are not detected until an inspection). BlockSim allows the selection of either category. If "Upon Failure" is selected, the CM action is initiated immediately upon failure. If no policy has been set for a block, then this is the default option. All prior examples were done based on the instruction to perform a CM upon failure. If the "Upon Inspection" option is selected, then the CM action will only be initiated after an inspection is done on the failed component. How and when the inspections are performed is defined by the block's inspection properties and also by the inspection policy. This has the effect of defining a dependency between the corrective maintenance policy and the inspection policy, as shown in Figure 8.28.
Figure 8.28: Cascading dependencies present when CM "Upon Inspection" has been specified.
Figure 8.28 shows the options available in an inspection policy within BlockSim. Inspections can be performed upon a fixed time interval. This is either based on the item's age (item clock) or the system's age (system clock). Furthermore, inspections can also be set to occur if the system goes down or if another group item goes down. Within BlockSim, items are considered to be in the same group if they have the same non-zero "Item Group #." Note that the default value for this is 0. Zero is a reserved number and it means that the item does not belong to any group. Inspections do not bring the item down by default.
Figure 8.29 shows the options available in a preventive maintenance (PM) policy within BlockSim. Much like inspections, PMs can be performed upon a fixed time interval. This is either based on the item's age (item clock) or the system's age (system clock). Furthermore, PM actions can also be set to occur if the system goes down or if another group item goes down. Because PM actions always bring the item down, one can also specify whether preventive maintenance will be performed if the action brings the system down.
Figure 8.29: PM policy options.
It is important to keep in mind that the system and each component of the system maintain separate clocks within the simulation. Figure 8.30 illustrates system and item clocks. The system clock is the elapsed simulation time, while the item clock is the age of the item since its last renewal. If the system clock is used, the inspection will be performed every X time units, whereas if the item clock is used, the inspection will be performed every time the component reaches an age of X. As an example, if the inspection is set to be performed at a system age of 100, then an inspection will be performed at 100, 200, 300 and so forth. If the inspection is set based on an item's age of 100, then the inspection will be performed when the item reaches an age of 100.
Figure 8.30: The system and each block maintain different clocks during each simulation.
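As a rough illustration of the difference between the two clocks (not BlockSim's internal implementation), the following Python sketch lists the inspection times that each clock would produce. The function names, the `renewals` argument and the simplifying assumptions (instantaneous renewals, zero-duration inspections and an item that is otherwise always operating) are hypothetical and are introduced only for this illustration.

```python
def system_clock_inspections(interval, horizon):
    """Inspection times when the interval is measured on the system clock."""
    return list(range(interval, horizon + 1, interval))


def item_clock_inspections(interval, horizon, renewals=()):
    """Inspection times when the interval is measured on the item clock.

    `renewals` holds the (hypothetical) system times at which the item is
    restored to as-good-as-new, which resets its age to zero.
    """
    pending = sorted(renewals)
    times = []
    t = interval  # first inspection is due when the item reaches age `interval`
    while t <= horizon:
        if pending and pending[0] < t:
            # The item was renewed before reaching the inspection age, so the
            # next inspection is due `interval` time units after that renewal.
            t = pending.pop(0) + interval
            continue
        times.append(t)
        t += interval
    return times


print(system_clock_inspections(100, 500))                # [100, 200, 300, 400, 500]
print(item_clock_inspections(100, 500, renewals=[250]))  # [100, 200, 350, 450]
```

With a renewal at 250, the system-clock schedule is unchanged, while the item-clock schedule shifts because the item's age restarts from zero at the renewal.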
Inspection tasks can be used to check for indications of an approaching failure. BlockSim models the point at which an approaching failure becomes detectable upon inspection using the Failure Detection Threshold and the P-F Interval. The failure detection threshold allows the user to enter a number between 0 and 1 indicating the fraction of an item's life that must elapse before an approaching failure can be detected. For instance, if the failure detection threshold is set to 0.8, then the failure of a component can be detected only during the last 20% of its life. If an inspection occurs during this time, an approaching failure is detected and the inspection triggers a preventive maintenance task to take the necessary precautions to delay the failure by either repairing or replacing the component.
The P-F interval allows the user to enter the amount of time before the failure of a component when the approaching failure can be detected by an inspection. The P-F interval represents the warning period that spans from P (when a potential failure can be detected) to F (when the failure occurs). If a P-F interval is set as 200 then the approaching failure of the component can only be detected 200 time units (tu) before the failure of the component. Thus, if a component has a fixed life of 1,000 tu and the P-F interval is set to 200 tu, then if an inspection occurs at or beyond 800 tu, the approaching failure of the component that is to occur at 1,000 tu is detected by this inspection and a preventive maintenance task is triggered to take action against this failure.
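The two detection criteria described above can be written as simple predicates. This is only a sketch under the assumption that the item's life (its time to failure) is known and that the item has not yet failed; the function and parameter names are hypothetical and do not correspond to any BlockSim interface.

```python
def detectable_by_threshold(age, life, threshold):
    """True if the elapsed fraction of the item's life has reached the
    failure detection threshold (a number between 0 and 1)."""
    return age >= threshold * life


def detectable_by_pf_interval(age, life, pf_interval):
    """True if the inspection falls within `pf_interval` time units of the failure."""
    return age >= life - pf_interval


# Threshold of 0.8 on a life of 1,000 tu: detectable only in the last 20% of life.
print(detectable_by_threshold(750, 1000, 0.8))    # False
print(detectable_by_threshold(850, 1000, 0.8))    # True

# P-F interval of 200 tu on a life of 1,000 tu: detectable at or beyond 800 tu.
print(detectable_by_pf_interval(799, 1000, 200))  # False
print(detectable_by_pf_interval(800, 1000, 200))  # True
```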
Example using P-F Interval
To illustrate the use of the P-F interval in BlockSim, consider a component A that fails every 700 tu. The corrective maintenance on this equipment takes 100 tu to complete, while the preventive maintenance takes 50 tu to complete. Both the corrective and preventive maintenance actions have a type II restoration factor of 1. Inspection tasks of 10 tu duration are performed on the component every 300 tu. There is no restoration of the component during the inspections. The P-F interval for this component is 100 tu (see Figure 8.31).
Figure 8.31: Inspection policy options for the P-F interval example.
The component behavior from 0 to 2000 tu is shown in Figure 8.32 and described next.
At 300 tu the first scheduled inspection of 10 tu duration occurs. At this time the age of the component is 300 tu. This inspection does not lie in the P-F interval of 100 tu (which begins at the age of 600 tu and ends at the age of 700 tu). Thus, no approaching failure is detected during this inspection.
At 600 tu the second scheduled inspection of 10 tu duration occurs. At this time the age of the component is 590 tu (no age is accumulated during the first inspection from 300 tu to 310 tu as the component does not operate during this inspection). Again this inspection does not lie in the P-F interval. Thus, no approaching failure is detected during this inspection.
At 720 tu the component fails after having accumulated an age of 700 tu. A corrective maintenance task of 100 tu duration occurs to restore the component to as-good-as-new condition.
At 900 tu the third scheduled inspection occurs. At this time the age of the component is 80 tu. This inspection does not lie in the P-F interval (from age 600 tu to 700 tu). Thus, no approaching failure is detected during this inspection.
At 1200 tu the fourth scheduled inspection occurs. At this time the age of the component is 370 tu. Again, this inspection does not lie in the P-F interval and no approaching failure is detected.
At 1500 tu the fifth scheduled inspection occurs. At this time the age of the component is 660 tu, which lies in the P-F interval. As a result, an approaching failure is detected and the inspection triggers a preventive maintenance task. A preventive maintenance task of 50 tu duration occurs at 1510 tu to restore the component to as-good-as-new condition.
At 1800 tu the sixth scheduled inspection occurs. At this time the age of the component is 240 tu. This inspection does not lie in the P-F interval (from age 600 tu to 700 tu) and no approaching failure is detected.
Figure 8.32: Component behavior for P-F interval example.
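The event sequence of this example can be reproduced with a small deterministic simulation. The Python sketch below is only an illustration of the logic described above and is not how BlockSim is implemented. The constant and function names are hypothetical, and one behavior is an added assumption that the example never exercises: an inspection that falls due while the component is down is simply skipped.

```python
LIFE = 700           # the component fails after accumulating 700 tu of age
CM_DURATION = 100    # corrective maintenance, restores to as-good-as-new
PM_DURATION = 50     # preventive maintenance, restores to as-good-as-new
INSP_EVERY = 300     # inspections are scheduled on the system clock
INSP_DURATION = 10   # the component does not age during an inspection
PF_INTERVAL = 100
HORIZON = 2000


def simulate():
    """Return a list of (system_time, event, item_age) tuples."""
    log = []
    t, age = 0, 0
    next_insp = INSP_EVERY
    while True:
        # Assumption: inspections falling due during downtime are skipped.
        while next_insp < t:
            next_insp += INSP_EVERY
        t_fail = t + (LIFE - age)
        if min(t_fail, next_insp) > HORIZON:
            break
        if t_fail < next_insp:
            # Failure occurs first: perform CM and renew the component.
            log.append((t_fail, "failure", LIFE))
            t, age = t_fail + CM_DURATION, 0
        else:
            # Inspection occurs first (or simultaneously with the failure).
            age += next_insp - t
            t = next_insp
            log.append((t, "inspection", age))
            t += INSP_DURATION
            next_insp += INSP_EVERY
            if age >= LIFE - PF_INTERVAL:
                # Within the P-F interval: the inspection triggers a PM.
                log.append((t, "pm", age))
                t, age = t + PM_DURATION, 0
    return log


for event in simulate():
    print(event)
```

Running the sketch lists inspections at 300, 600, 900, 1200, 1500 and 1800 tu with component ages of 300, 590, 80, 370, 660 and 240 tu, the failure at 720 tu and the PM starting at 1510 tu, matching the description above.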
All the options available in the Maintenance tab of the Block Properties window and the associated policies were designed to maximize the modeling flexibility within BlockSim. However, maximizing the modeling flexibility introduces issues that the user needs to be aware of and requires the user to carefully select options in order to assure that the selections do not contradict one another. One obvious case would be to define a PM action on a component in series (which will always bring the system down) and then assign a PM policy to the block that has the "Do not perform maintenance if the action brings the system down" option set. With these settings, no PMs will ever be performed on the component during the BlockSim simulation. The following sections summarize some issues and special cases for the user to consider when defining maintenance properties and policies in BlockSim.
Inspections do not consume spare parts. However, an inspection can have a renewal effect on the component if the restoration factor is set to a number other than the default of 0.
On the inspection tab, if Inspection brings system down is selected, this also implies that the inspection brings the item down.
If a PM or an inspection is scheduled based on the item's age, then it will occur exactly when the item reaches that age. However, it is important to note that failed items do not age. Thus, if an item fails before it reaches that age, the action will not be performed. This means that if the item fails before the scheduled inspection (based on item age) and the CM is set to be performed upon inspection, the CM will never take place. The reason that this option is allowed in BlockSim is for the flexibility of specifying renewing inspections.
Downtime due to a failure discovered during a non-downing inspection is included when computing results "w/o PM & Inspections."
If a PM upon item age is scheduled and is not performed because it brings the system down (based on the option in the PM policy) the PM will not happen unless the item reaches that age again (after restoration by CM, inspection or another type of PM).
If the CM policy is upon inspection and a failed component is scheduled for PM prior to the inspection, the PM action will restore the component and the CM will not take place.
In the case of simultaneous events, only one event is executed. The following precedence order is used: inspection, preventive maintenance, corrective maintenance.
The PM option of Do not perform if it brings the system down is only considered at the time that PM needs to be initiated. If the system is down at that time, due to another item, then the PM will be performed regardless of any future consequences to the system up state. In other words, when the other item is fixed, it is possible that the system will remain down due to this PM action. In this case, the PM time difference is added to the system PM downtime.
If the CM policy is upon inspection, the inspection does not restore the block, only the CM restores the block.
Downing events cannot overlap. If a component is down due to a PM and another PM is suggested based on another trigger, the second call is ignored.
A non-downing inspection with a restoration factor restores the block based on the age of the block at the beginning of the inspection (i.e. duration is not restored). Note that this is different from BlockSim 6.
Non-downing events can overlap with downing events. If a non-downing inspection and a downing event happen concurrently, the non-downing event will be dealt with in parallel with the downing event.
If a failure or PM occurs during a non-downing inspection and the CM or PM has a restoration factor and the inspection action has a restoration factor, then both restoration factors are used (compounded).
A PM or inspection on system down is triggered only if the system was up at the time that the event brought the system down.
A non-downing inspection with a restoration factor of 0 does not affect the block.
An inspection that finds a block at or beyond the failure detection threshold will trigger a preventive maintenance action as long as preventive maintenance can be performed on that block.
An inspection that finds a block within the range of the P-F Interval will trigger a preventive maintenance action as long as preventive maintenance can be performed on the block.
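The precedence rule for simultaneous events listed above (inspection, then preventive maintenance, then corrective maintenance) can be sketched as a simple selection. The event labels and the function name are hypothetical and serve only to illustrate the rule.

```python
# Precedence order for simultaneous events: earlier entries win.
PRECEDENCE = ("inspection", "pm", "cm")


def resolve_simultaneous(events):
    """Return the single event that is executed among simultaneous events."""
    return min(events, key=PRECEDENCE.index)


print(resolve_simultaneous(["cm", "pm"]))          # pm
print(resolve_simultaneous(["pm", "inspection"]))  # inspection
```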
To illustrate the use of maintenance policies in BlockSim, let's use the same example (the example using both crews and pools) from the Using Resources: Pools and Crews section of this on-line reference with the following modifications (Figures 8.33 and 8.34 also show these settings):
Figure 8.33: CM and Inspection settings for blocks A and D for the example in the Using Resources: Pools and Crews section of this on-line reference.
Blocks A and D:
Belong to the same group (Group 1).
Corrective maintenance actions are upon inspection (not upon failure) and the inspections are performed every 30 tu based on system time. Inspections have a duration of 1 tu. Furthermore, unlimited free crews are available to perform the inspections.
Whenever either item fails, the other one gets a PM.
The PM has a fixed duration of 10 tu.
The same crews are used for both corrective and preventive maintenance actions.
Figure 8.34: PM setting for blocks A and D for the example in the Using Resources: Pools and Crews section of this on-line reference.
The item and system behavior from 0 to 300 tu is shown in Figure 8.35 and described next.
Figure 8.35: Up/down event sequence for the system and the blocks in the example in the Using Resources: Pools and Crews section of this on-line reference.
At 100, block A goes down and brings the system down.
No maintenance action is performed since an "upon inspection" policy was utilized.
The next scheduled inspection is at 120, thus Crew A is called to perform the maintenance by 121 (end of the inspection).
Crew A arrives and initiates the repair on A at 131.
The only part in the pool is utilized and an on-condition restock is triggered.
Pool [on-hand = 0, pending: 150s, 181].
Block A is repaired by 141.
At the same time (121), a PM is initiated for block D because the PM policy called for "PM upon a maintenance action on another group item."
Crew B is called for block D and arrives at 136.
No part is available until 150. An on-condition restock is triggered for 181.
Pool [on-hand = 0, pending: 150s, 181, 181].
At 150, a part becomes available and the PM is completed by 160.
Pool [on-hand = 0, pending: 181, 181].
At 161, block B fails (corrective maintenance upon failure).
Block B gets Crew A, which arrives at 171.
No part is available until 181. An on-condition restock is triggered for 221.
Pool [on-hand = 0, pending: 181, 181, 221].
A part arrives at 181.
The repair is completed by 201.
Pool [on-hand = 0, pending: 181, 221].
At 162, block C fails.
Block C gets Crew B, which arrives at 177.
No part is available until 181. An on-condition restock is triggered for 222.
Pool [on-hand = 0, pending: 181, 221, 222].
A part arrives at 181.
The repair is completed by 201.
Pool [on-hand = 0, pending: 221, 222].
At 163, block F fails and brings the system down.
Block F calls Crew A then B. Both are busy.
Crew A will be the first available so F calls A again and waits.
No part is available until 221. An on-condition restock is triggered for 223.
Pool [on-hand = 0, pending: 221, 222, 223].
Crew A arrives at 211.
Repair begins at 221.
Repair is completed by 241.
Pool [on-hand = 0, pending: 222, 223].
At 298, block A goes down and brings the system down.
Figure 8.36: Simulation results for the example in the Using Resources: Pools and Crews section of this on-line reference.
System Uptime: This is 200 tu.
This can be obtained by observing the following system up durations, 0 to 100, 160 to 163 and 201 to 298.
System CM Downtime: This is 58 tu.
Observe that even though the system failed at 100, the CM action (on block A) was initiated at 121 and lasted until 141, thus only 20 tu of this downtime are attributed to the CM action.
The next CM action started at 163 when block F failed and lasted until 201 when blocks B and C were restored, thus adding another 38 tu of CM downtime.
System Inspection Downtime: This is 1 tu.
The only time the system was under inspection was from 120 to 121, during the inspection of block A.
System PM Downtime: This is 19 tu.
Note that the entire PM action duration on block D was from 121 to 160.
Until 141, and from the system perspective, the CM on block A was the cause for the downing. Once block A was restored (at 141), then the reason for the system being down became the PM on block D.
Thus, the PM on block D was only responsible for the downtime after block A was restored, or from 141 to 160.
System Total Downtime: This is 100 tu.
This includes all of the above downtimes plus the 20 tu (100 to 120) and the 2 tu (298 to 300) that the system was down due to the undiscovered failure of block A.
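Mean Availability (All Events):
$$\overline{A}(t=300) = \frac{\text{Uptime}}{t} = \frac{200}{300} = 0.6667$$
Mean Availability (w/o PM & Inspection):
This is due to the CM downtime of 58, the undiscovered downtime of 22 and the inspection downtime of 1, or:
$$\overline{A}_{\text{w/o PM \& Insp}}(t=300) = \frac{300 - (58 + 22 + 1)}{300} = \frac{219}{300} = 0.73$$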
It should be noted that the inspection downtime was included even though the definition was "w/o PM & Inspection." The reason for this is that the inspection did not cause the downtime in this case. Only downtimes caused by the PM or inspections are excluded.
Point Availability and Reliability at 300 are both zero because the system was down at 300.
Expected Number of Failures is 3.
The system failed at 100, 163 and 298.
The MTTFF is 100 because the example is deterministic.
Number of Failures is 3.
The first is the failure of block A, the second is the failure of block F and the third is the failure of block A.
Number of CMs is 2.
The first is the CM on block A and the second is the CM on block F.
Number of Inspections is 1.
Number of PMs is 1.
Total Events are 6. These are events that the downtime can be attributed to. Specifically, the following events were observed:
The failure of block A at 100.
Inspection on block A at 120.
The CM action on block A.
The PM action on block D (after A was fixed).
The failure of block F at 163.
The failure of block A at 298.
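The system-level figures above can be cross-checked by splitting the 300 tu timeline into segments and attributing each segment to a single cause. The Python sketch below does only this bookkeeping; the segment boundaries are taken from the event description, while the labels and function name are hypothetical.

```python
from collections import defaultdict

HORIZON = 300

# (start, end, cause of the system state during that segment)
SEGMENTS = [
    (0, 100, "up"),
    (100, 120, "undiscovered failure"),  # A failed, not yet inspected
    (120, 121, "inspection"),            # inspection of block A
    (121, 141, "cm"),                    # CM on block A
    (141, 160, "pm"),                    # PM on block D after A is restored
    (160, 163, "up"),
    (163, 201, "cm"),                    # F fails while B and C are down
    (201, 298, "up"),
    (298, 300, "undiscovered failure"),  # second failure of block A
]


def totals_by_cause(segments):
    """Sum the duration of the segments attributed to each cause."""
    totals = defaultdict(int)
    for start, end, cause in segments:
        totals[cause] += end - start
    return dict(totals)


totals = totals_by_cause(SEGMENTS)
total_downtime = HORIZON - totals["up"]
mean_availability = totals["up"] / HORIZON

# "w/o PM & Inspection" excludes only downtime caused by PM or inspections.
# The 1 tu inspection is still counted because block A had already failed.
counted = totals["cm"] + totals["undiscovered failure"] + totals["inspection"]
mean_availability_wo_pm_insp = (HORIZON - counted) / HORIZON

print(totals)
print(total_downtime)                          # 100
print(round(mean_availability, 4))             # 0.6667
print(round(mean_availability_wo_pm_insp, 4))  # 0.73
```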
The details for blocks A, B, C, D and F are shown in Figure 8.37.
Figure 8.37: Block details for this example.
We will discuss some of these results. First note that there are four downing events on block A: initial failure, inspection and CM, plus the last failure at 298. All others have just one. Also, block A had a total downtime of 41 + 2, giving it a mean availability of 0.8567. The first time-to-failure for block A occurred at 100 while the second occurred after 298 - 141 = 157 tu of operation, yielding an average time between failures (MTBF) of 257/2 = 128.5 (note that this is the same as uptime/failures). Block D never failed so its MTBF cannot be determined. Furthermore, MTBDE for each item is determined by dividing the block's uptime by the number of events. The RS FCI and RS DECI metrics are obtained by looking at the SD Failures and SD Events of the item and the number of system failures and events. Specifically, the only items that caused system failure are blocks A and F; A at 100 and 298 and F at 163. It is important to note that even though one could argue that block F alone did not cause the failure (B and C were also failed), the downing was attributed to F because the system reached a failed state only when block F failed.
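The block A figures quoted above follow from its downtime intervals. The short Python sketch below repeats that arithmetic; the function name and argument layout are hypothetical, and the intervals are the ones given in the text (down from 100 to 141 and from 298 to 300).

```python
def block_metrics(horizon, down_intervals, failures):
    """Return (uptime, mean availability, MTBF) for a block.

    MTBF is taken as uptime / number of failures, as in the text.
    """
    downtime = sum(end - start for start, end in down_intervals)
    uptime = horizon - downtime
    return uptime, uptime / horizon, uptime / failures


uptime_a, availability_a, mtbf_a = block_metrics(
    horizon=300,
    down_intervals=[(100, 141), (298, 300)],  # 41 + 2 tu of downtime
    failures=2,
)
print(uptime_a)                  # 257
print(round(availability_a, 4))  # 0.8567
print(mtbf_a)                    # 128.5
```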
Regarding the number of inspections, which were scheduled every 30 tu, nine occurred for block A [30, 60, 90, 120, 150, 180, 210, 240, 270] and eight for block D. Block D was not inspected at 150 because it was undergoing a PM action at that time.
Figure 8.38 shows the crew results.
Figure 8.38: Crew details for this example.
Crew A received a total of six calls and accepted three. Specifically,
At 121, the crew was called by block A and the call was accepted.
At 121, block D also called for its PM action and was rejected. Block D then called crew B, which accepted the call.
At 161, block B called crew A. Crew A accepted.
At 162, block C called crew A. Crew A rejected and block C called crew B, which accepted the call.
At 163, block F called crew A and then crew B and both rejected. Block F then waited until a crew became available at 201 and called that crew again. This was crew A, which accepted.
The total wait time is the time that blocks had to wait for the maintenance crew. Block F is the only component that waited, waiting 38 tu for crew A.
Also, the costs for crew A were 1 per unit time and 10 per incident, thus the total costs were 100 + 30. The costs for crew B were 2 per unit time and 20 per incident, thus the total costs were 156 + 40.
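The crew cost totals can be checked from the time each crew was engaged and the number of calls it accepted. In the Python sketch below, the engagement intervals are inferred from the event description (from the time a call was accepted until the task was completed) rather than read from BlockSim output, and the function name is hypothetical.

```python
def crew_cost(engagements, rate_per_tu, fee_per_incident):
    """Return (time-based cost, incident-based cost) for a crew."""
    time_used = sum(end - start for start, end in engagements)
    return time_used * rate_per_tu, len(engagements) * fee_per_incident


# Crew A: block A (121-141), block B (161-201), block F (201-241).
crew_a = crew_cost([(121, 141), (161, 201), (201, 241)], rate_per_tu=1, fee_per_incident=10)

# Crew B: PM on block D (121-160), block C (162-201).
crew_b = crew_cost([(121, 160), (162, 201)], rate_per_tu=2, fee_per_incident=20)

print(crew_a, sum(crew_a))  # (100, 30) 130
print(crew_b, sum(crew_b))  # (156, 40) 196
```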
Figure 8.39 shows the spare part pool results. The pool started with a stock level of 1 and ended up with 2. Specifically:
At 121, the pool dispensed a part to block A and ordered another to arrive at 181.
At 121, it dispensed a part to block D and ordered another to arrive at 181.
At 150, a scheduled part arrived to restock the pool.
At 161, the pool dispensed a part to block B and ordered another to arrive at 221.
At 181, it dispensed a part to block C and ordered another to arrive at 222.
At 221, it dispensed a part to block F and ordered another to arrive at 223.
The 222 and 223 arrivals remained in stock until the end.
Overall, five parts were dispensed. Blocks had to wait a total of 126 tu to receive parts (B: 181 - 161 = 20, C: 181 - 162 = 19, D: 150 - 121 = 29 and F: 221 - 163 = 58).
Figure 8.39: Pool details for this example.
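The pool behavior in this example can be approximated with a small first-come, first-served ledger. The Python sketch below is a simplification and not BlockSim's algorithm. It assumes that each request triggers an on-condition restock arriving 60 tu later (a lead time inferred from the pairs 121/181, 161/221, 162/222 and 163/223 in the text) and that each request receives the earliest available part. All names are hypothetical.

```python
import heapq


def run_pool(initial_stock, scheduled_arrivals, requests, lead_time, horizon):
    """Serve part requests in time order.

    Returns (wait time per block, stock on hand at `horizon`).
    """
    # Times at which parts become available; initial stock is available at 0.
    available = [0] * initial_stock + list(scheduled_arrivals)
    heapq.heapify(available)
    waits = {}
    for t, block in sorted(requests):
        heapq.heappush(available, t + lead_time)  # on-condition restock order
        part_time = heapq.heappop(available)      # earliest available part
        waits[block] = max(0, part_time - t)
    end_stock = sum(1 for a in available if a <= horizon)
    return waits, end_stock


waits, end_stock = run_pool(
    initial_stock=1,
    scheduled_arrivals=[150],
    requests=[(121, "A"), (121, "D"), (161, "B"), (162, "C"), (163, "F")],
    lead_time=60,   # inferred from the text, not a documented setting
    horizon=300,
)
print(waits)                # {'A': 0, 'D': 29, 'B': 20, 'C': 19, 'F': 58}
print(sum(waits.values()))  # 126
print(end_stock)            # 2
```

Under these assumptions the sketch reproduces the five dispensed parts, the total wait of 126 tu and the ending stock level of 2.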
To illustrate some special cases that one needs to be aware of, consider the following diagram.
In this diagram, blocks A and D have the same properties as before, with the exception that the inspection duration is now set to zero. Furthermore, recall the rule that only one event is executed in the case of simultaneous events. In this case and when block A fails, the inspection on block A at 120 will find the failure of A, which will then trigger a PM event on block D at the same instant that D also gets an inspection. This causes two simultaneous events on block D. This will result in the cancellation of the PM event on block D. The reason for the cancellation is to avoid the recursive situation where the PM on D would trigger a PM on A, which is undergoing CM, which would trigger a PM on D and so forth. Different options can be used to avoid this. One is to assign a non-zero inspection duration. In this case, the PM on block D would get triggered after the inspection on block A, as seen in the prior example.
See Also:
Repairable Systems Analysis Through Simulation
©1999-2007. ReliaSoft Corporation. ALL RIGHTS RESERVED.