A Case Study Using Monte Carlo Simulation for Risk Analysis
RENO is a user-friendly platform for building and running complex analyses of any probabilistic or deterministic scenario. It combines an intuitive flowchart modeling approach with Monte Carlo simulation to estimate or optimize results for risk analysis, complex reliability modeling, maintenance planning, operational research, financial planning or other analysis objectives. In this article, we will use RENO to perform a simple reliability and risk analysis of an airplane’s environmental control system (ECS) and downstream air-conditioning subsystem.
An ECS is a system within an aircraft that provides the cabin with breathable air, thermal control and overall pressurization. The ECS is supplied by "bleed air" that is taken and compressed from engines on the aircraft. A failure of the air bleed system may be catastrophic to everyone on board because cabin depressurization can lead to hypoxia (oxygen deprivation) at altitudes above 10,000 feet, damage the aircraft or even start a fire.
High pressure air leaks produce heat, which triggers heat detectors that alert the pilot to a potential fire or cabin depressurization. After the warning light turns on, the pilot can use the emergency shut off valves to retain cabin pressure.
In the following example, we will estimate the probability that a heat sensor fails to detect heat, given that there is a failure within the ECS. A heat detection failure would prevent the pilot from knowing about an air leak, which, if left undiscovered, may result in fatal consequences.
The following diagram shows a broad overview of an ECS and its AC subsystem.
The diagram shows that air travels from the engine and into an upstream air bleed system. It then goes through a flow control valve and a heat exchanger before it is pressurized and released at a controlled temperature to the people on board. Each component has an associated heat detector with different detection capabilities. If the heat detectors alert the pilot to an air leak, the pilot can activate the shut off valves to retain pressure.
For the purposes of this example, we will use the following simplified diagram and assumptions.
Note that the following assumptions are for example purposes only and do not reflect actual failure distributions for an aircraft.
- The upstream air bleed system (UABS) reliability follows a 2-parameter Weibull distribution with a beta of 1.5 and an eta of 140,000 hours. A failure would be an air leak that causes depressurization and overheating within the UABS.
- The flow control valve (FCV) reliability follows a 2-parameter Weibull distribution with a beta of 2 and an eta of 120,000 hours. A failure would be an air leak that causes depressurization and overheating around the FCV.
- The heat exchanger (HE) reliability follows a 2-parameter Weibull distribution with a beta of 1.5 and an eta of 100,000 hours. A failure would be an air leak that causes depressurization and overheating around the HE.
- The UABS system heat detector is located at the end of the upstream system and detects 95% of all overheating occurrences.
- The FCV system heat detector is located next to the FCV and detects 90% of all overheating occurrences.
- The HE system heat detector is located next to the heat exchanger and detects 98% of all overheating occurrences.
- The aircraft must have operated for 20,000 hours before preventive maintenance (PM) is performed. If a failure is detected, the plane is grounded immediately and then repaired before another flight occurs.
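As a quick cross-check on these assumptions, the chance that each component fails within one PM cycle can be computed directly from the 2-parameter Weibull unreliability function, F(t) = 1 - exp(-(t/eta)^beta). The following is a minimal Python sketch, not part of the RENO project; the function and variable names are illustrative.

```python
import math

def weibull_unreliability(t, beta, eta):
    """Probability of failure by time t for a 2-parameter Weibull distribution."""
    return 1.0 - math.exp(-((t / eta) ** beta))

# (beta, eta in hours) copied from the assumptions listed above
COMPONENTS = {
    "UABS": (1.5, 140_000.0),
    "FCV": (2.0, 120_000.0),
    "HE": (1.5, 100_000.0),
}
PM_CYCLE_TIME = 20_000.0  # hours of operation before preventive maintenance

# Probability that each component, considered on its own, fails before PM
pm_failure_probability = {
    name: weibull_unreliability(PM_CYCLE_TIME, beta, eta)
    for name, (beta, eta) in COMPONENTS.items()
}

for name, p in pm_failure_probability.items():
    print(f"{name}: P(failure within PM cycle) = {p:.4f}")
```

Under these assumptions, each component on its own has a single-digit percentage chance of failing before the 20,000-hour PM point, with the heat exchanger the most likely to fail and the flow control valve the least.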
Step 1: In RENO, create reliability models for the UABS, FCV and HE systems, as shown next.
Step 2: Define RENO static functions that will generate a random failure time from each model, as shown next.
In this example, the static functions use the RENO internal function called "rvm" to return a random value based on the UABS, FCV and HE models.
RENO static functions will compute a value only once before a simulation of the flowchart begins, as opposed to functions that generate new random values each time they are used in the flowchart. As you will see, these static functions will generate a failure time from each model and keep those failure times constant while simulation is in progress. This will allow us to compare those values against each other within the same simulation.
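The idea of drawing each failure time once and reusing it for the rest of a simulation can be sketched outside RENO as follows. This Python example is only an analogy: it uses inverse-transform sampling of the Weibull distribution in place of RENO's "rvm" function, and names such as sample_weibull and draw_static_failure_times are hypothetical.

```python
import math
import random

def sample_weibull(rng, beta, eta):
    """Draw one failure time by inverse-transform sampling of a 2-parameter Weibull."""
    u = rng.random()  # uniform on [0, 1)
    return eta * (-math.log(1.0 - u)) ** (1.0 / beta)

def draw_static_failure_times(rng):
    """Draw one failure time per component, to be held fixed for a single simulation."""
    return {
        "UABS": sample_weibull(rng, 1.5, 140_000.0),
        "FCV": sample_weibull(rng, 2.0, 120_000.0),
        "HE": sample_weibull(rng, 1.5, 100_000.0),
    }

rng = random.Random(42)  # fixed seed so the sketch is repeatable
static_times = draw_static_failure_times(rng)

# Every comparison in this simulation reads the same stored values,
# so the three failure times can be compared against each other consistently.
uabs_fails_first = static_times["UABS"] < static_times["FCV"]
print(static_times, uabs_fails_first)
```

If a fresh value were drawn at every use instead, a comparison such as UABS versus FCV could be made against one value while a different value was passed down the flowchart.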
Step 3: Create a variable to define the aircraft's preventive maintenance cycle. In this case, an aircraft must have operated for 20,000 hours before maintenance is performed.
Step 4: Create a flowchart that follows the logic in the simplified diagram given above. For example, the following picture shows one possible flowchart for modeling the problem. This flowchart estimates the probability that the heat detectors fail to detect an air leak when a component fails within the PM cycle.
A breakdown of this flowchart is described next.
- The Upstream Air Bleed System block obtains a failure time based on the "StaticUABS" static function. The block then passes that value to the next block in the flowchart.
- The Flow Control Valve conditional block evaluates whether the UABS's failure time is less than the FCV's failure time. If the condition is true, then the UABS's failure time is sent down the "True" path to compare it with the PM cycle time. If the condition is false, the FCV's failure time is sent down the "False" path to compare it with the HE's failure time.
- The Heat Exchanger conditional block evaluates whether the FCV's failure time is less than the HE's failure time. If the condition is true, the FCV's failure time is compared with the PM cycle time; otherwise, the HE's failure time is compared with the PM cycle time.
- The Failure During Life conditional blocks compare the incoming failure times with the preventive maintenance cycle time to determine whether the failure occurs while the aircraft is still in service, before PM is performed. Since PM_Cycle_Time is set to 20,000 hours, only failure times that are less than or equal to 20,000 hours will be sent to the next block on the "True" path.
- The Failure Time Array result storage blocks record each failure time so that they can be plotted later. Notice that each component has its own result storage block so that each may be plotted separately.
- The Time Heat Occurs counter blocks count the number of times that overheating occurs. Each time the simulation passes through a counter block, the count is incremented by 1 and a value of 1 is passed to the next block in the flowchart.
- The Air Leak Is Properly Detected conditional blocks evaluate whether the heat detection systems were successful. Here, the FP<=% condition configures the conditional blocks to ignore any incoming values. Instead, the blocks will draw a random number uniformly distributed from 0 to 100, and then evaluate whether that number is less than or equal to the condition value.
For example, the UABS system detects 95% of all occurrences of overheating. Therefore, 95% of the time, a value of 1 will be sent down the "True" path and counted as a success, while 5% of the time, a value of 1 will be sent down the "False" path and counted as a failure.
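The block-by-block logic above can be approximated in a few lines of ordinary code. The following Python sketch is an illustration under the stated assumptions, not an export of the RENO flowchart; run_once, simulate and the other names are hypothetical, and the branch order simply mirrors the description above.

```python
import math
import random

PM_CYCLE_TIME = 20_000.0  # hours, from the assumptions
# (beta, eta in hours, detection percentage) per component, from the assumptions
COMPONENTS = {
    "UABS": (1.5, 140_000.0, 95.0),
    "FCV": (2.0, 120_000.0, 90.0),
    "HE": (1.5, 100_000.0, 98.0),
}

def sample_weibull(rng, beta, eta):
    """Inverse-transform sample from a 2-parameter Weibull distribution."""
    return eta * (-math.log(1.0 - rng.random())) ** (1.0 / beta)

def run_once(rng):
    """One pass through the flowchart.

    Returns (component, failure_time, detected), or None when the selected
    failure time falls after the PM cycle time.
    """
    # "Static" values: drawn once and reused for every comparison in this pass
    times = {n: sample_weibull(rng, b, e) for n, (b, e, _) in COMPONENTS.items()}
    if times["UABS"] < times["FCV"]:      # Flow Control Valve block, "True" path
        name = "UABS"
    elif times["FCV"] < times["HE"]:      # Heat Exchanger block, "True" path
        name = "FCV"
    else:                                 # Heat Exchanger block, "False" path
        name = "HE"
    if times[name] > PM_CYCLE_TIME:       # Failure During Life block
        return None
    # Air Leak Is Properly Detected block: uniform draw on 0-100 vs. percentage
    detected = rng.uniform(0.0, 100.0) <= COMPONENTS[name][2]
    return name, times[name], detected

def simulate(n_runs, seed=0):
    """Repeat the flowchart and tally overheating events and missed detections."""
    rng = random.Random(seed)
    heat_events, missed = 0, 0
    failure_times = {name: [] for name in COMPONENTS}  # one array per component
    for _ in range(n_runs):
        outcome = run_once(rng)
        if outcome is None:
            continue
        name, t, detected = outcome
        failure_times[name].append(t)
        heat_events += 1
        if not detected:
            missed += 1
    return heat_events, missed, failure_times

heat_events, missed, failure_times = simulate(1_000)
print(f"overheating events: {heat_events}, undetected: {missed}")
```

Because this sketch uses its own random number stream, its counts will not match RENO's output run for run; it is meant only to make the control flow concrete.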
Step 5: Perform 1,000 simulations on the flowchart. The results show that there is a 0.3% probability that the heat detectors will fail to detect an air leak within a preventive maintenance cycle.
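Because the estimate comes from a finite number of simulations, it carries sampling uncertainty. The Python sketch below computes a Wilson score interval for a binomial proportion; it assumes that 0.3% of 1,000 simulations corresponds to 3 runs in which a leak went undetected, which the article does not state explicitly.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    return center - half_width, center + half_width

# Assumption: 0.3% of 1,000 simulations means 3 runs with an undetected leak
low, high = wilson_interval(3, 1_000)
print(f"approximate 95% interval: {low:.2%} to {high:.2%}")
```

With so few undetected events, the interval is wide relative to the point estimate, so increasing the number of simulations is a reasonable next step when a tighter estimate is needed.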
The following plot shows the probability of failure at each time for the UABS, FCV and HE systems. The plot is generated based on data stored in the failure time array blocks.
Many reliability process analyses are complex and difficult to visualize. Creating a RENO flowchart can help reliability engineers better visualize the process, while using that same flowchart to perform analytical calculations or run simulations. Since RENO is also integrated into the Synthesis Platform, the reliability analysis can be linked to other related analyses in any other Synthesis application including Weibull++ and ALTA analyses, BlockSim diagrams and fault trees, etc.