Reliability HotWire

Issue 103, September 2009

Reliability Basics

Steps in a System Reliability, Availability and Maintainability Simulation Analysis

This article provides an overview of the steps involved in performing a system analysis via simulation, along with some introductory concepts. The focus is mainly on system reliability, availability and maintainability (RAM) analysis. However, these concepts can be extended to other fields.
 
Over the past decades, the term "system" has been defined in many ways, by different sources and for different fields. In the reliability arena, we may define it as a collection of components, subsystems and/or assemblies arranged according to a specific design in order to achieve desired functions with acceptable performance and reliability.

There are multiple ways of analyzing a system once it has been defined [1].



Figure 1: Ways to study a system

One could experiment with the actual system and, for example, run the system to failure in order to determine the system reliability. Often this is impractical because it is too expensive or requires too much time or too many resources. It may even be impossible if the system itself does not exist yet. In these cases, a model of the system can be developed so that the system can be studied. If a model is used, it could be a physical model, such as a prototype or a small-scale model. Building and experimenting with such a model can provide valuable information about the real system. If this is not feasible or sufficient, a mathematical model may be created.
 
Mathematical models may be analytical or simulation models. An analytical model for system reliability would be the system's probability density function (pdf) expressed as a function of the individual failure distributions of the components. We could then obtain the system's cumulative distribution function (cdf) and other results of interest such as the conditional reliability, the warranty time, etc. We could also perform ancillary analyses such as reliability importance calculations or reliability allocation optimization. Because analytical models provide exact solutions, they may be preferable to simulation techniques even when some assumptions and simplifications are required.
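As a rough illustration of an analytical model, the Python sketch below computes the exact reliability of a small series-parallel system from the component life distributions. The system layout (one controller in series with two redundant pumps), the Weibull parameters and all function names are hypothetical and chosen only for this example. The sketch also assumes that component failures are independent.

```python
import math

def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)^beta) for a two-parameter Weibull life distribution."""
    return math.exp(-((t / eta) ** beta))

def series(*reliabilities):
    """A series arrangement works only if every block works."""
    return math.prod(reliabilities)

def parallel(*reliabilities):
    """A parallel (redundant) arrangement fails only if every block fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

def system_reliability(t):
    # Hypothetical system: a controller in series with two redundant pumps.
    # All parameter values are illustrative assumptions, not data.
    r_controller = weibull_reliability(t, beta=1.0, eta=10_000.0)
    r_pump = weibull_reliability(t, beta=1.5, eta=2_000.0)
    return series(r_controller, parallel(r_pump, r_pump))

def conditional_reliability(t, age):
    """Probability of surviving a further t hours, given survival to `age`."""
    return system_reliability(age + t) / system_reliability(age)

print(f"R(1000)          = {system_reliability(1000.0):.4f}")
print(f"R(1000 | 500)    = {conditional_reliability(1000.0, 500.0):.4f}")
```

Because the result is a closed-form expression, it is exact for the stated assumptions and returns the same value every time it is evaluated, which is the property that makes analytical models attractive when they are feasible.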
 
Simulation refers to the creation and use of a computer model in order to replicate and analyze the behavior of a real system. As the complexity of a model increases (e.g. as repairs, resource utilization, throughput, preventive maintenance, inspections and other factors are to be considered), simulation quickly becomes the only feasible approach. Some advantages of simulation analysis are:

  • Real-world complex systems with stochastic components can be represented accurately with simulation models, but rarely with analytical models.
  • Simulation allows experimenting with a system without disrupting it. For example, the performance of an existing system can be evaluated in varying conditions. Alternative designs of a system can also be tested against a requirement.
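To make the second point concrete, here is a minimal Python sketch that simulates a single repairable unit and compares two alternatives against an availability requirement. The exponential failure and repair times, the MTBF and MTTR values, the 0.95 requirement and the function names are all assumptions made for illustration. A real study would use the distributions obtained from data.

```python
import random

def simulate_availability(mtbf, mttr, mission_time, rng):
    """Simulate one mission of a repairable unit; return the fraction of time up."""
    clock = 0.0
    uptime = 0.0
    while clock < mission_time:
        # Time to next failure (exponential is an assumption of this sketch).
        up = min(rng.expovariate(1.0 / mtbf), mission_time - clock)
        uptime += up
        clock += up
        if clock >= mission_time:
            break
        # Corrective repair duration (also assumed exponential).
        clock += rng.expovariate(1.0 / mttr)
    return uptime / mission_time

def mean_availability(mtbf, mttr, mission_time, runs, seed=0):
    """Average the availability over a number of independent simulation runs."""
    rng = random.Random(seed)
    total = sum(simulate_availability(mtbf, mttr, mission_time, rng)
                for _ in range(runs))
    return total / runs

requirement = 0.95  # hypothetical availability requirement
baseline = mean_availability(mtbf=100.0, mttr=10.0, mission_time=5000.0, runs=200)
faster_repair = mean_availability(mtbf=100.0, mttr=5.0, mission_time=5000.0, runs=200)
print(f"baseline:      {baseline:.3f}  meets requirement: {baseline >= requirement}")
print(f"faster repair: {faster_repair:.3f}  meets requirement: {faster_repair >= requirement}")
```

Neither alternative has to exist physically to be evaluated, and changing the design amounts to changing the inputs to the model.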

When simulation is the approach of choice, one has to be aware of the disadvantages of a simulation study. Some disadvantages include:

  • The time and cost to develop, populate, simulate, validate and analyze a model are often high.
  • The results vary from run to run, so the accuracy of the results depends on the number of simulations. Because of this, optimization is more challenging than comparing a fixed number of alternatives.
  • Results obtained via simulation techniques are often harder to "sell" than results from analytical models.
  • The large volume of numerical information often makes the analyst overconfident in the results. A simple example of this would be the excessive number of significant figures often presented in study results.
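The run-to-run variation and the significant-figures issue can both be seen in a small experiment. The Python sketch below estimates the reliability of a hypothetical 2-out-of-3 system of identical Weibull units, using different numbers of runs and different random seeds, and reports a binomial standard error with each estimate. The system, its parameters and the function names are illustrative assumptions.

```python
import math
import random

def survives(t, rng):
    """One simulated mission of a hypothetical 2-out-of-3 system of identical units."""
    # random.weibullvariate takes (scale, shape); the values are assumptions.
    lifetimes = [rng.weibullvariate(1000.0, 1.5) for _ in range(3)]
    return sum(life > t for life in lifetimes) >= 2

def estimate_reliability(t, runs, seed):
    """Return the simulated reliability at time t and its standard error."""
    rng = random.Random(seed)
    successes = sum(survives(t, rng) for _ in range(runs))
    p = successes / runs
    std_error = math.sqrt(p * (1.0 - p) / runs)  # binomial standard error
    return p, std_error

for runs in (100, 10_000):
    for seed in (1, 2):
        p, se = estimate_reliability(500.0, runs, seed)
        print(f"runs={runs:>6} seed={seed}: R(500) = {p:.3f} +/- {se:.3f}")
```

The standard error shrinks roughly with the square root of the number of runs, so an estimate based on 100 runs does not justify more than about one or two significant figures, however many digits the software prints.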

Once you have determined that simulation is appropriate for your system analysis, the basic steps described in the next section apply.

Basic Steps for Performing a Simulation Study

A simulation study may be divided into different steps. Many variations have been put forth in the literature. Some basic steps that should be included in any study are outlined below.

  1. Define the problem
    Define the problem and the overall objectives of the study, including the specific questions the study must answer. Identify the performance measures that will be used to evaluate the efficiency of the system. In a RAM analysis, these may include reliability, availability and/or throughput, along with costs. A life cycle cost (LCC) estimate may also be desirable. Decide on the time frame of the study and the required resources. This step should provide a full understanding of the scope of the study, keeping in mind that changes may occur as the study progresses. It requires a collaborative effort among all stakeholders: management, project manager, simulation analysts and subject matter experts (SMEs).
  2. Define the system
    This step requires a definition of the different elements of the system and the ways they interact within it. The system structure may be defined, possibly via a block diagram and/or a reliability block diagram (RBD), along with the operating procedures and environment. The system definition should be kept as simple as possible, with complexity added only as needed. The level of detail chosen is a factor that can determine the success or failure of a simulation study. More information on selecting the level of detail of the study can be found in [1]. At this point, it is important for the stakeholders to agree on the validity of the conceptual model before additional time and money are spent. The limitations and shortfalls should be discussed. The main goal of this step is to define a conceptual model of the system that is adequate for solving the problems and answering the questions defined in the previous step.
  3. Collect the data
    This step in the study is labor-intensive, as a large amount of data and processing may be required. Quantities of interest, such as the probability distributions for failure and repair, need to be collected. If available, data on the existing system should be collected to validate the model. It is important to document the assumptions as the study progresses, especially during this step. The assumptions can then be reviewed during the different validation milestones.
  4. Construct the model
    If the choice has not already been made, the analysts will need to decide whether to use a programming language (such as C or C++), a general simulation environment (such as Excel® or RENO) or high-level simulation software (such as BlockSim), where no programming is required by the user. The choice will influence the level of system complexity that can be captured. For example, some assumptions may have to be made when using a commercial off-the-shelf package that may not be necessary when using a programming language. However, the time to develop the model will generally be lower with high-level software.
  5. Verify the model
    At this point, the analysts need to ensure that the model is actually doing what is expected. If performance data were collected from the actual system, this is the time to check that the output of the model matches the real system. The analysts and SMEs should check the model for correctness. Sensitivity analysis can be used to determine the impact of different factors on the performance of the system. This may assist in focusing on the critical aspects of the model.
  6. Design the simulation
    Some of the things that should be determined:
    • What should the initial conditions of the system be?
    • If steady-state results are of interest, what should the warm-up period be?
    • What should the mission time be? (This was likely determined in Step 1.)
    • How many simulation runs should be used? This will be directly tied to the accuracy of the results.
  7. Run the model and analyze the output
    Run the model following the simulation design from the previous step. Generally, the objective in this phase is to determine the performance of one or more alternatives of the system so that a comparison can be made. Statistically sound analysis of the simulation output must be performed. One example of simulation output analysis in BlockSim can be found in [4]. Extensive literature on this topic is available, and the analyst should understand it well. A perfectly good model may go to waste if this step is not done carefully.
  8. Document, present and use the results
    In this step, formal documentation should be compiled regarding the assumptions, the simulation model and its validation and the results of the study. In a simulation study, the process of discovery and understanding of the system is often as valuable as the results. Ideally, the bulk of the information to be documented has already been collected and is readily available. This information will be key, not only for the current and future understanding of the system, but also for the credibility of the study. If the results are both valid and credible, they can now be used as part of the decision-making process.
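Steps 6 and 7 can be illustrated with a minimal Python sketch. It fixes the design choices explicitly (initial condition, warm-up period, mission time and number of runs) and then compares two alternatives using an approximate 95% confidence interval on the difference in mean availability. Everything specific here is an assumption made for illustration: the single repairable unit, the exponential failure and repair times, the parameter values, the normal approximation (z = 1.96) and the function names.

```python
import math
import random
import statistics

def one_run(mtbf, mttr, warm_up, mission_time, rng):
    """Availability over [warm_up, warm_up + mission_time] for one repairable unit."""
    end = warm_up + mission_time
    clock, uptime, is_up = 0.0, 0.0, True  # initial condition: unit starts as new
    while clock < end:
        duration = rng.expovariate(1.0 / (mtbf if is_up else mttr))
        next_clock = min(clock + duration, end)
        if is_up:
            # Count only the uptime that falls after the warm-up period.
            uptime += max(0.0, next_clock - max(clock, warm_up))
        clock = next_clock
        is_up = not is_up
    return uptime / mission_time

def replicate(mtbf, mttr, runs, seed, warm_up=500.0, mission_time=5000.0):
    """Run independent replications and return one availability value per run."""
    rng = random.Random(seed)
    return [one_run(mtbf, mttr, warm_up, mission_time, rng) for _ in range(runs)]

def difference_interval(a, b, z=1.96):
    """Approximate 95% interval for mean(b) - mean(a), independent replications."""
    diff = statistics.mean(b) - statistics.mean(a)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff - z * se, diff + z * se

baseline = replicate(mtbf=100.0, mttr=10.0, runs=200, seed=1)
improved = replicate(mtbf=100.0, mttr=5.0, runs=200, seed=2)
low, high = difference_interval(baseline, improved)
print(f"availability gain from faster repair: [{low:.3f}, {high:.3f}]")
```

If the interval excludes zero, the difference between the alternatives is larger than what run-to-run variation alone would explain. Increasing the number of runs narrows the interval, which is the link between the design decisions in Step 6 and the accuracy of the conclusions in Step 7.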

Conclusion
In this article, we presented general concepts to determine when a simulation study may be beneficial. Some basic steps for a simulation study were also defined. It should be noted that although these steps will be found in any sound simulation study, they are most likely part of an iterative process. Different validation milestones should be put in place to ensure the correctness of the model and to make sure the objectives of the study are still being met (or even to re-evaluate the objectives if necessary).

References
[1] Law, A. M. and Kelton, W. D., Simulation Modeling and Analysis, 1997.
[2] Mykytka, E. F. and Litko, J. R., "Simulation Modeling for Reliability Analysis," Tutorial Notes, Annual Reliability and Maintainability Symposium, January 2000.
[3] ReliaSoft Corporation, System Analysis Reference: Reliability, Availability and Optimization, Tucson: ReliaSoft Publishing, 2007.
[4] ReliaSoft Corporation, "An Application of BlockSim's Log of Simulations," 2009. http://www.weibull.com/hotwire/issue97/relbasics97.htm.

Copyright 2009 ReliaSoft Corporation, ALL RIGHTS RESERVED