To further illustrate the probabilistic case presented in the Simple Repairs section of this on-line reference, assume that both components in the prior example have normally distributed failure and repair times, with means equal to the deterministic values used in the prior example and standard deviations of 10 and 1, respectively. That is, FA ~ N(100, 10), FB ~ N(120, 10) and RA = RB ~ N(10, 1). Given the probabilistic nature of the example, the times to each event will vary from one simulation to the next. By repeating the simulation a sufficiently large number of times, one arrives at the results of interest for the system and its components. Some of the results for this example, over 1,000 simulations, are given in Figure 8.8 and explained in the next sections. (Note: The results are based on 1,000 simulations run in BlockSim 7 with an end time of 300 and a fixed seed of 1.) The simulation settings are shown in Figure 8.7.
Figure 8.7: BlockSim simulation window.
Figure 8.8: Summary of system results for 1,000 simulations.
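The simulation described above can be approximated with a short Monte Carlo sketch. This is not BlockSim's algorithm. It assumes the two components are in series, that each component keeps operating while the other is being repaired, that repairs restore a component to as-good-as-new, and that negative normal draws are truncated at zero. All function names are hypothetical, and because Python's random number generator differs from BlockSim's, the numbers will not match Figure 8.8 exactly.

```python
import random
from statistics import mean, stdev

END_TIME = 300.0  # simulation end time used in this example


def component_down_intervals(rng, fail_mu, fail_sigma, rep_mu, rep_sigma, end_time):
    """Alternate failure and repair draws; return [(down_start, down_end), ...]."""
    intervals, t = [], 0.0
    while True:
        t += max(rng.gauss(fail_mu, fail_sigma), 0.0)  # operate until next failure
        if t >= end_time:
            return intervals
        repair = max(rng.gauss(rep_mu, rep_sigma), 0.0)
        intervals.append((t, min(t + repair, end_time)))
        t += repair  # assumption: restored to as-good-as-new


def merge(intervals):
    """Union of possibly overlapping intervals, sorted by start time."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def simulate_once(rng, end_time=END_TIME):
    """One run of the assumed two-component series system."""
    a = component_down_intervals(rng, 100, 10, 10, 1, end_time)
    b = component_down_intervals(rng, 120, 10, 10, 1, end_time)
    system_down = merge(a + b)  # series: system is down whenever A or B is down
    downtime = sum(end - start for start, end in system_down)
    return {
        "uptime": end_time - downtime,
        "down_events": len(system_down),
        "first_failure": system_down[0][0] if system_down else None,
    }


def run_simulations(n=1000, seed=1, end_time=END_TIME):
    rng = random.Random(seed)
    return [simulate_once(rng, end_time) for _ in range(n)]


def summarize(results, end_time=END_TIME):
    availability = [r["uptime"] / end_time for r in results]
    return {
        "mean_availability": mean(availability),
        "std_availability": stdev(availability),
        "mean_uptime": mean(r["uptime"] for r in results),
        "mean_down_events": mean(r["down_events"] for r in results),
    }


print(summarize(run_simulations()))
```

The per-run dictionaries keep the raw quantities (uptime, number of system down events, first failure time) so that each summary metric discussed in the following sections can be derived from the same set of runs.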
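The mean availability (all events) is the mean availability due to all downing events, which can be thought of as the operational availability. It is the ratio of the system uptime to the total simulation time (total time). For this example:
$$\overline{A} = \frac{\text{Uptime}}{\text{Total time}} = \frac{260.35}{300} \approx 0.868$$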
Std Deviation (Mean Availability)
This is the standard deviation of the mean availability of all downing events for the system during the simulation.
The mean availability without PM and inspection is the mean availability due to failure events only; it is 0.8679 for this example. In this case it is identical to the mean availability for all events, because no preventive maintenance actions or inspections were defined for this system. The inclusion of these actions is discussed in later sections.
Downtimes caused by PM and inspections are not included. However, if the PM or inspection action results in the discovery of a failure, then these times are included. As an example, consider a component that has failed but its failure is not discovered until the component is inspected. Then the downtime from the time failed to the time restored after the inspection is counted as failure downtime, since the original event that caused this was the component's failure.
The point availability, A(t), is the probability that the system is up at time t. To obtain this value at, for example, t = 300, a special counter is used during the simulation. The counter is incremented by one every time the system is up at 300 hours. The point availability at 300 is then the number of times the system was up at 300 divided by the number of simulations. For this example, this is 0.933; that is, the system was up at 300 hours in 933 of the 1,000 simulations.
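The counter described above can be sketched as follows. The data layout (one list of system down intervals per simulation) and the function names are assumptions made for illustration; the four runs shown are made-up values, not BlockSim output.

```python
def is_up(down_intervals, t):
    """True if time t does not fall inside any (start, end) down interval."""
    return not any(start <= t < end for start, end in down_intervals)


def point_availability(runs, t):
    """Fraction of simulations in which the system is up at time t."""
    up_counter = sum(1 for down_intervals in runs if is_up(down_intervals, t))
    return up_counter / len(runs)


# Four hypothetical simulations, each a list of system down intervals.
runs = [
    [(100.0, 110.0), (295.0, 305.0)],  # down at t = 300
    [(120.0, 130.0)],                  # up at t = 300
    [(98.0, 109.0), (221.0, 230.0)],   # up at t = 300
    [(290.0, 300.5)],                  # down at t = 300
]
print(point_availability(runs, 300.0))  # 2 of 4 runs are up: 0.5
```

With the counts reported for this example, the same ratio is 933 / 1000 = 0.933.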
The point reliability, R(t), is the probability that the system has not failed by time t. It is similar to point availability, with the major exception that it only looks at the probability that the system did not have a single failure. Other (non-failure) downing events are ignored. During the simulation, a special counter is again used. This counter is incremented by one (at most once in each simulation) if the system has had at least one failure up to 300 hours. The point reliability at 300 is then the number of simulations in which the system did not fail up to 300 hours (the number of simulations minus the counter) divided by the number of simulations. For this example, this is 0 because the system failed prior to 300 hours in all 1,000 simulations.
It is very important to note that this value is not always the same as the reliability computed using the analytical methods, depending on the redundancy present. The reason that it may differ is best explained by the following scenario:
Assume two units in parallel. The analytical system reliability, which does not consider repairs, is the probability that at least one of the two units has not failed (one minus the probability that both units fail). In this case, when one unit goes down it is not repaired, and the system fails when the second unit fails. When repairs are simulated, however, it is possible for one of the two units to fail and be repaired before the second unit fails. Thus, when the second unit fails, the system is still up because the first unit has been repaired.
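The difference between the two values can be illustrated numerically. The sketch below is a hypothetical example, not taken from this reference: it assumes two independent units in active parallel with failure times N(100, 10) and N(120, 10), repair times N(10, 1), and an evaluation time of 150. The analytical value ignores repairs; the simulated value treats the system as failed only if both units are down at the same moment.

```python
import random
from statistics import NormalDist


def analytical_parallel_reliability(t):
    """No repairs: the system survives unless both units have failed by t."""
    unreliability_a = NormalDist(100, 10).cdf(t)
    unreliability_b = NormalDist(120, 10).cdf(t)
    return 1.0 - unreliability_a * unreliability_b


def down_intervals(rng, fail_mu, fail_sigma, rep_mu, rep_sigma, end_time):
    """Alternate failure and repair draws; return [(down_start, down_end), ...]."""
    intervals, t = [], 0.0
    while True:
        t += max(rng.gauss(fail_mu, fail_sigma), 0.0)
        if t >= end_time:
            return intervals
        repair = max(rng.gauss(rep_mu, rep_sigma), 0.0)
        intervals.append((t, min(t + repair, end_time)))
        t += repair


def both_down_at_once(a_down, b_down):
    """Parallel system fails only when down intervals of A and B overlap."""
    return any(sa < eb and sb < ea for sa, ea in a_down for sb, eb in b_down)


def simulated_parallel_reliability(t, n=2000, seed=1):
    """Fraction of runs with repairs in which the system never failed up to t."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n):
        a = down_intervals(rng, 100, 10, 10, 1, t)
        b = down_intervals(rng, 120, 10, 10, 1, t)
        if not both_down_at_once(a, b):
            survived += 1
    return survived / n


print("analytical, no repairs:", analytical_parallel_reliability(150.0))
print("simulated, with repairs:", simulated_parallel_reliability(150.0))
```

Under these assumptions the analytical reliability at 150 is close to zero, because both units have almost certainly failed at least once, while the simulated value is much higher, because the first unit is usually back in service before the second one fails.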
The expected number of failures is the average number of system failures. The system failures (not downing events) for all simulations are counted and then averaged. For this case, this is 3.998, which means that a total of 3,998 system failure events occurred over the 1,000 simulations. Thus, the expected number of system failures for one run is 3.998. This number includes all failures, even those that may have a duration of zero.
This is the standard deviation of the number of failures for the system during the simulation.
MTTFF is the mean time to first failure for the system. It is computed by keeping track of the time at which the first system failure occurred in each simulation; MTTFF is then the average of these times. It may or may not be identical to the MTTF obtained in the analytical solution, for the same reasons as those discussed for the point reliability. For this case, the MTTFF is 99.4692. This is to be expected, since one of the two components in series has a mean time to failure of 100 hours.
It is important to note that for each simulation run, if a first failure time is observed, it is recorded as the system time to first failure. If no failure is observed in the system, the simulation end time is used as a right censored (suspended) data point. MTTFF is then computed as the total operating time until the first failure divided by the number of observed failures (constant failure rate assumption). Furthermore, if the simulation end time is much less than the time to first failure for the system, it is also possible that all data points are right censored (i.e., no system failures were observed). In this case, the MTTFF is again computed using a constant failure rate assumption, or:
$$MTTFF = \frac{2 \cdot T_S \cdot N}{\chi^2_{0.50;2}}$$
Where TS is the simulation end time and N is the number of simulations. One should be aware that this formulation may yield unrealistic (or erroneous) results if the system does not have a constant failure rate. (Note: This is not any different than the error one gets when trying to estimate the MTTF from a sample of units tested for a short period of time and without observing any failures.) If you are trying to obtain an accurate (realistic) estimate of this value, then your simulation end time should be set to a value that is well beyond the MTTF of the system (as computed analytically). As a general rule, the simulation end time should be at least three times larger than the MTTF of the system.
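The censored estimate described above can be sketched as follows. The function name and input layout are hypothetical. The branch for the case with no observed failures assumes the 50% chi-squared value with two degrees of freedom, which equals 2 ln 2; treat that branch as an illustration of a common constant failure rate convention rather than a statement of BlockSim's internals.

```python
import math


def mttff(first_failure_times, end_time):
    """Estimate MTTFF from one entry per simulation.

    Each entry is the time of the first system failure, or None if the
    system did not fail before end_time (right censored at end_time).
    """
    n = len(first_failure_times)
    observed = [t for t in first_failure_times if t is not None]
    censored = n - len(observed)
    total_time = sum(observed) + censored * end_time
    if observed:
        # Constant failure rate assumption: total time / observed failures.
        return total_time / len(observed)
    # Assumption: no failures observed, use the 50% chi-squared value with
    # two degrees of freedom, whose median is 2 * ln(2).
    chi2_50_2dof = 2.0 * math.log(2.0)
    return 2.0 * end_time * n / chi2_50_2dof


# Three observed first failures and one censored run (made-up values).
print(mttff([95.0, 102.0, None, 110.0], end_time=120.0))

# End time far too short: every run is censored.
print(mttff([None] * 10, end_time=50.0))
```

The second call shows why a short end time is risky: the estimate is driven entirely by the end time and the number of simulations, not by the system's actual failure behavior.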
The uptime is the average time the system was up and operating. It is obtained by taking the sum of the uptimes for each simulation and dividing it by the number of simulations. For this example, the uptime is 260.35. The operational availability, Ao, for this system is then:
$$A_o = \frac{\text{Uptime}}{\text{Simulation end time}} = \frac{260.35}{300} \approx 0.868 \qquad (3)$$
The CM downtime is the average time the system was down for corrective maintenance (CM) actions only. It is obtained by taking the sum of the CM downtimes for each simulation and dividing it by the number of simulations. For this example, this is 39.64.
The inherent availability, AI, for this system over the observed time (which may or may not be steady state, depending on the length of the simulation) is then:
$$A_I = \frac{\text{Uptime}}{\text{Uptime} + \text{CM downtime}} = \frac{260.35}{260.35 + 39.64} \approx 0.868 \qquad (4)$$
The inspection downtime is the average time the system was down due to inspections. It is obtained by taking the sum of the inspection downtimes for each simulation and dividing it by the number of simulations. For this example, this is zero because no inspections were defined.
The PM downtime is the average time the system was down due to preventive maintenance (PM) actions. It is obtained by taking the sum of the PM downtimes for each simulation and dividing it by the number of simulations. For this example, this is zero because no PM actions were defined.
The total downtime is the downtime due to all events. In general, one may look at this as the sum of the above downtimes. However, this is not always the case. Actions can overlap in time, depending on the options and settings for the simulation. Furthermore, there are other events that can cause the system to go down that are not counted in any of the above categories. As an example, in the case of standby redundancy with a switch delay, if the settings are to reactivate the failed component after repair, the system may be down during the switch-back action. This downtime does not fall into any of the above categories, but it is counted in the total.
For this example, the total downtime is identical to the CM downtime, because no inspections or PM actions were defined.
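The reason the total downtime can differ from the sum of the per-category downtimes is easiest to see with interval arithmetic. The sketch below uses made-up intervals and hypothetical category names. It treats the total as the length of the union of all down intervals, which is an assumption about how overlaps are handled, not a description of BlockSim's internals.

```python
def merge_intervals(intervals):
    """Union of possibly overlapping (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_downtime(intervals_by_category):
    """Length of time the system is down for any reason."""
    all_intervals = [iv for ivs in intervals_by_category.values() for iv in ivs]
    return sum(end - start for start, end in merge_intervals(all_intervals))


down = {
    "CM": [(100.0, 110.0)],           # 10 time units
    "PM": [(105.0, 112.0)],           # 7 time units, overlaps the CM
    "switch-back": [(200.0, 201.0)],  # not CM, PM or inspection
}

reported_categories = ("CM", "PM")
category_sum = sum(e - s for c in reported_categories for s, e in down[c])
print(category_sum)          # 17.0: double counts the overlap, misses the switch
print(total_downtime(down))  # 13.0: (100, 112) plus (200, 201)
```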
System downing events are events associated with downtime. If the duration of the event is zero, the event is not counted as a system downing event. However, the block properties "CM brings system down," "PM brings system down" and "Inspection brings system down" take precedence; when one of these is set, an event with zero duration is counted as a system downing event.
The number of failures (system downing) is the average number of system downing failures. Unlike the Expected Number of Failures, NF, this number does not include failures with zero duration. For this example, this is 3.998.
The number of CMs is the number of corrective maintenance actions that caused the system to fail. It is obtained by taking the sum of all CM actions that caused the system to fail divided by the number of simulations. It does not include CM events of zero duration. For this example, this is 3.998. Note that this may differ from the number of failures (system downing). An example would be the case where the system has failed but, due to other settings for the simulation, a CM is not initiated (e.g., an inspection is needed to initiate a CM).
The number of inspections is the number of inspection actions that caused the system to fail. It is obtained by taking the sum of all inspection actions that caused the system to fail divided by the number of simulations. It does not include inspection events of zero duration. For this example, this is zero.
The number of PMs is the number of PM actions that caused the system to fail. It is obtained by taking the sum of all PM actions that caused the system to fail divided by the number of simulations. It does not include PM events of zero duration. For this example, this is zero.
The total events value is the total number of system downing events. It also does not include events of zero duration. This number may differ from the sum of the other listed events. As an example, consider the case where a failure does not get repaired until an inspection, but the inspection occurs after the simulation end time. In this case, the number of inspections, CMs and PMs will be zero while the number of total events will be one.
Cost and throughput results are discussed in later sections.
It is important to note that two identical system downing events (that are continuous or overlapping) may be counted and viewed differently. As shown in Case 1 of Figure 8.9, two overlapping failure events are counted as only one event from the system perspective because the system was never restored and remained in the same down state, even though that state was caused by two different components. Thus, the number of downing events in this case is one and the duration is as shown in CM system. In the case that the events are different, as shown in Case 2 of Figure 8.9, two events are counted, the CM and the PM. However, the downtime attributed to each event is different than the actual time of each event. In this case, the system was first down due to a CM and remained in a down state due to the CM until that action was over. However, immediately upon completion of that action, the system remained down but now due to a PM action. In this case, only the PM action portion that kept the system down is counted.
Figure 8.9: Duration and count of different overlapping events.
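The counting rules for the two cases can be sketched in code. The function below is a hypothetical illustration of the rules as described in the text, with made-up times: overlapping events of the same type collapse into one event, while an overlapping event of a different type is counted separately but is credited only with the time it kept the system down after the earlier event ended. Dropping a different-type event that ends before the earlier event does is an additional assumption.

```python
def attribute_downing_events(events):
    """events: list of (start, end, kind). Returns [(kind, start, end), ...]
    as counted from the system perspective."""
    counted = []
    for start, end, kind in sorted(events):
        if counted:
            last_kind, last_start, last_end = counted[-1]
            if start <= last_end:  # system is still down from the last event
                if kind == last_kind:
                    # Case 1: same type, one event with extended duration.
                    counted[-1] = (last_kind, last_start, max(last_end, end))
                elif end > last_end:
                    # Case 2: different type, count only the extra downtime.
                    counted.append((kind, last_end, end))
                continue
        counted.append((kind, start, end))
    return counted


case_1 = attribute_downing_events([(100.0, 110.0, "CM"), (105.0, 118.0, "CM")])
case_2 = attribute_downing_events([(100.0, 110.0, "CM"), (105.0, 118.0, "PM")])

print(case_1)  # one CM event lasting from 100 to 118
print(case_2)  # a CM event from 100 to 110, then a PM event from 110 to 118
```

In Case 2 the PM action actually lasted 13 time units, but only the 8 time units after the CM was completed are attributed to it.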
The system point results, shown in Figure 8.10, give the Point Availability (All Events), A(t), and the Point Reliability, R(t), as defined in the previous section. These are computed and returned at different points in time, based on the number of intervals selected by the user. Additionally, this window shows 1 - A(t), 1 - R(t), Cost(t), Mean A(t), Mean A(t_i - t_{i-1}), System Failures(t) and Throughput(t).
Figure 8.10: System point results. The number of intervals shown is based on the increments set (Figure 8.7). The results shown in this figure are for 10 increments, i.e., one result every 30 time units (tu).
Results by component are presented in the next section of this on-line reference.
See Also:
Repairable Systems Analysis Through Simulation
©1999-2007. ReliaSoft Corporation. ALL RIGHTS RESERVED.