A Method for Estimating Bounds on System Reliability in the Absence of Times-to-Failure Data
A common method for obtaining confidence bounds on system reliability uses the variance of each component's reliability along with the system reliability equation. This analytical method (discussed in the August 2009 issue of Reliability HotWire) requires failure data for each component in the system. If times-to-failure data are not available for all components (e.g., the reliability information comes from a supplier who will share the distribution parameters but not the actual failure data), some reliability engineers use a worst case/best case method based on the lower or upper confidence bounds for the parameters of the components. However, when data are available to compare the two approaches, the worst case/best case estimates turn out to be quite far from the analytical solution.
This article introduces a simulation method of obtaining approximate bounds on system reliability that does not require data for each component, but provides results that are closer to the analytical solution.
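The analytical method combines each component's reliability variance with the system reliability equation. One common way to do that is a first-order (delta-method) approximation, Var(R_s) ≈ Σ (∂R_s/∂R_i)² Var(R_i). The sketch below applies it to independent blocks in parallel. It is an illustration under that assumption, not necessarily the exact formulation BlockSim uses, and the function name and the input values are hypothetical.

```python
import math


def parallel_system(reliabilities, variances):
    """First-order (delta-method) variance of a parallel system's reliability.

    R_s = 1 - prod(1 - R_i), so dR_s/dR_i = prod over j != i of (1 - R_j).
    Component reliabilities are assumed to be independent.
    """
    unrel = [1.0 - r for r in reliabilities]
    r_sys = 1.0 - math.prod(unrel)
    var_sys = 0.0
    for i, var_i in enumerate(variances):
        # Partial derivative of R_s with respect to R_i
        grad = math.prod(q for j, q in enumerate(unrel) if j != i)
        var_sys += grad ** 2 * var_i
    return r_sys, var_sys


# Hypothetical inputs: two blocks with known reliability and variance at some time t
r_sys, var_sys = parallel_system([0.9, 0.8], [0.01, 0.04])
print(r_sys, var_sys)  # approximately 0.98 and 0.0008
```

Turning this variance into confidence bounds requires a further assumption about the sampling distribution of R_s, which is part of why the method needs real failure data for each component.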
Analytical Method
Jacob started with a set of times-to-failure data for a specific component. In Weibull++, he fitted a 2-parameter Weibull model to the data and created a plot with 90% 2-sided confidence bounds, as shown in Figure 1.
Figure 1: Times-to-failure data fit with a 2-parameter Weibull distribution with 90% 2-sided confidence bounds
Jacob then published the model to make it available in BlockSim, as shown in Figure 2.
Figure 2: Published model of times-to-failure data
Now suppose that Jacob wanted to compute 90% 2-sided confidence bounds for a system of 10 of these components arranged in a parallel configuration. He created the block diagram shown in Figure 3 and assigned the Universal Reliability Definition (URD) shown in Figure 4 to each component block.
Figure 3: 10 components arranged reliability-wise in parallel
Figure 4: URD assigned to each component in the RBD of Figure 3
Figure 5 shows the resulting system reliability plot with 90% confidence bounds.
Figure 5: System reliability with 90% 2-sided confidence bounds for 10 components arranged reliability-wise in parallel
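For reference, the median curve in Figure 5 follows from the system reliability equation for identical, independent blocks in parallel, R_s(t) = 1 − (1 − R(t))^10, with the Weibull component model R(t) = exp(−(t/η)^β). This minimal sketch uses the median estimates the article reports for this data set (β = 2.10, η = 1819); it reproduces only the point estimate, not the confidence bounds.

```python
import math

BETA, ETA = 2.10, 1819.0  # median Weibull estimates for the component
N = 10                     # identical blocks in parallel (Figure 3)


def weibull_reliability(t, beta=BETA, eta=ETA):
    """2-parameter Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def system_reliability(t, n=N):
    """Identical, independent blocks in parallel: R_s = 1 - (1 - R)^n."""
    return 1.0 - (1.0 - weibull_reliability(t)) ** n


for t in (1000.0, 1819.0, 3000.0, 4000.0):
    print(f"t={t:6.0f}  R_component={weibull_reliability(t):.4f}  "
          f"R_system={system_reliability(t):.4f}")
```

At t = η each component has R = exp(−1) ≈ 0.368, yet ten of them in parallel still give a system reliability of about 0.99.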
Worst Case/Best Case Method
Jacob had heard about other reliability engineers who fit distributions to only the lower and upper bounds of each component and used those bounds as the failure models in the RBD in order to obtain the system-level bounds. He decided to try this method with his current data set by doing the following:
- Create a table of bounds on unreliability versus time in Weibull++.
- Copy the data for each bound and the corresponding time to a free-form folio, then use the Distribution Wizard to determine the model and parameters that best fit the bounds data. In general, the distribution that best fits the bounds will not be the same as the distribution that best fits the times-to-failure data. In other words, if the times-to-failure data follow a Weibull distribution, the median estimate will also follow a Weibull distribution, but the bounds very likely will not, as shown in Table 1.
Unreliability | 5% | 50% | 95% |
Model | Generalized gamma | Weibull | Generalized gamma |
Parameter 1 | μ = 7.57 | β = 2.10 | μ = 7.43 |
Parameter 2 | σ = 0.500 | η = 1819 | σ = 0.477 |
Parameter 3 | λ = 0.671 | N/A | λ = 1.32 |
- Use an overlay plot to compare the component reliability vs. time plot (with confidence bounds) to the fitted distributions. Figure 6 shows that in this case the original and fitted curves were virtually indistinguishable.
Figure 6: Overlay plot of component reliability with confidence bounds and models from step 2
- Publish the fitted models for the upper and lower bounds to make them available to use in BlockSim.
- Make two copies of the block diagram used in the analytical method. In the first copy, assign all blocks a single URD that uses the lower bound as the failure model; in the second copy, assign all blocks a single URD that uses the upper bound as the failure model.
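At any single time t, the two diagrams reduce to the same parallel equation evaluated with every block at its lower bound or every block at its upper bound. A minimal sketch, assuming identical, independent blocks; the component bound values are hypothetical, not taken from the article's data:

```python
def parallel_reliability(r_component, n=10):
    """Identical, independent blocks in parallel: R_s = 1 - (1 - R)^n."""
    return 1.0 - (1.0 - r_component) ** n


def worst_best_case(r_lower, r_upper, n=10):
    """System reliability with every block at its lower (worst) or upper (best) bound."""
    return parallel_reliability(r_lower, n), parallel_reliability(r_upper, n)


# Hypothetical component reliability bounds at one time t
worst, best = worst_best_case(0.30, 0.44)
print(f"worst case = {worst:.4f}, best case = {best:.4f}")
```

Because every block is pushed to the same extreme at once, each bound is raised to the 10th power, which is what drives these bounds far from the analytical ones.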
Jacob plotted the best case and worst case bounds along with the analytical solution and its confidence bounds, as shown in Figure 7. He observed that the worst case and best case solutions are more than twice as far from the analytical solution as the analytical confidence bounds are. He decided to try a different approach.
Figure 7: Comparison of analytical (solid lines) and worst case/best case (dashed/dotted lines) reliability estimates and bounds
Simulation Method
To more closely approximate the analytical confidence bounds, Jacob considered what system confidence bounds really mean. They are a measure of the variability of system reliability that takes into account the variability of each input block. The worst case (or best case) method assumes that every block in the system has lower (or higher) than average reliability, while the analytical method assumes that the system contains a mixture of lower than average, average and higher than average reliability components. Jacob therefore concluded that, to approximate the analytical method more closely, he must include some cases where all components have lower than average reliability, some cases where all components have higher than average reliability, and many cases where the system contains both lower and higher than average reliability components.
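A small Monte Carlo experiment illustrates this argument. The sketch assumes, purely for illustration, that at a fixed time each component's reliability follows a Beta(40, 60) distribution (a hypothetical choice, not from the article). It compares the system reliability with all 10 parallel blocks at their 5th percentile against the 5th percentile of system reliability when each block is drawn independently.

```python
import random

N = 10          # blocks in parallel
TRIALS = 20000  # Monte Carlo samples
rng = random.Random(42)


def parallel(rs):
    """System reliability of independent blocks in parallel."""
    q = 1.0
    for r in rs:
        q *= 1.0 - r
    return 1.0 - q


def percentile(sorted_vals, p):
    """Nearest-rank empirical percentile from an ascending list."""
    k = max(0, min(len(sorted_vals) - 1, round(p / 100 * len(sorted_vals)) - 1))
    return sorted_vals[k]


def draw_component():
    # Hypothetical spread of one component's reliability at a fixed time t
    return rng.betavariate(40, 60)


# 5th percentile of a single component's reliability
comp = sorted(draw_component() for _ in range(TRIALS))
r5 = percentile(comp, 5)

# Worst case: every block sits at its 5th percentile simultaneously
worst_case = parallel([r5] * N)

# Mixture: blocks drawn independently, then the system's 5th percentile is taken
system = sorted(parallel([draw_component() for _ in range(N)]) for _ in range(TRIALS))
mixture_p5 = percentile(system, 5)

print(f"all blocks at 5th percentile:  {worst_case:.4f}")
print(f"5th percentile of mixed system: {mixture_p5:.4f}")
```

The independent draws still include the rare case where every block is low, but most samples mix low and high blocks, so the system's 5th percentile sits noticeably closer to the median than the all-low worst case.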
Jacob started by modeling the variability of the Weibull parameters of the component failure distribution by doing the following:
- Create the table of estimates of beta and eta at different percentiles given in Table 2.
Percentile | Beta | Eta |
10 | 1.81 | 1658 |
30 | 1.97 | 1752 |
50 | 2.10 | 1819 |
70 | 2.23 | 1889 |
90 | 2.43 | 1995 |
- Use a free-form folio to fit a separate distribution for each parameter to obtain the parameter models given in Table 3.
Parameter | Beta | Eta |
Model | Lognormal | Lognormal |
Ln-mean | 0.740 | 7.51 |
Ln-standard deviation | 0.115 | 0.0721 |
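As a rough cross-check of the lognormal models, one can regress ln(value) on the standard normal quantile z_p of each percentile in Table 2; the intercept estimates the ln-mean and the slope the ln-standard deviation. This least-squares fit on probability-plot coordinates is a swapped-in technique, not necessarily what the free-form folio does, and the helper name is hypothetical. The results should land close to the values in Table 3.

```python
import math
from statistics import NormalDist

# Percentile estimates of the Weibull parameters (Table 2)
PERCENTILES = [10, 30, 50, 70, 90]
BETA = [1.81, 1.97, 2.10, 2.23, 2.43]
ETA = [1658, 1752, 1819, 1889, 1995]


def fit_lognormal(percentiles, values):
    """Least-squares fit of ln(x) = mu + sigma * z_p, z_p = standard normal quantile.

    Returns (mu, sigma): the ln-mean and ln-standard deviation.
    """
    z = [NormalDist().inv_cdf(p / 100) for p in percentiles]
    y = [math.log(v) for v in values]
    z_bar = sum(z) / len(z)
    y_bar = sum(y) / len(y)
    sigma = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / sum(
        (zi - z_bar) ** 2 for zi in z
    )
    mu = y_bar - sigma * z_bar
    return mu, sigma


mu_b, sd_b = fit_lognormal(PERCENTILES, BETA)
mu_e, sd_e = fit_lognormal(PERCENTILES, ETA)
print(f"beta: ln-mean={mu_b:.3f}, ln-std={sd_b:.3f}")
print(f"eta:  ln-mean={mu_e:.3f}, ln-std={sd_e:.4f}")
```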
- Publish the model for each parameter for use in the subsequent RENO simulation.
Jacob created a RENO simulation that computes system reliability from samples of the component parameter distributions by doing the following:
- Create the appropriate variables (e.g., time values, reliability values and other variables that control the flow of the RENO simulation), as shown in Figure 8.
Figure 8: Variables used in RENO simulation
- Create an analytical RBD with 10 components in parallel. Figure 9 shows how he gave each component a separate URD with a dynamic failure model that uses the value contained in one of the static reliability variables created in the previous step.
Figure 9: RBD called by RENO simulation
- Publish the analytical RBD model to use in the RENO simulation.
- Create a Synthesis Workbook to store results.
- Create a RENO flowchart with an inner loop for computing system reliabilities at a given time and an outer loop for generating time values to be used in the inner loop.
- Each pass through the inner loop:
- Computes a static reliability value for each component from pairs of parameters randomly chosen from the distributions of beta and eta as shown in Figure 10.
Figure 10: Block properties for the first component showing the equation used to generate the static reliability value used in the RBD
- Calculates and stores the system reliability using the published RBD model and the component static reliability values, as shown in Figure 11.
Figure 11: Block properties for computing the system reliability. The upper left window shows the call to the RBD to obtain the system unreliability. The lower right window calculates the system reliability.
- Each pass through the outer loop (shown in blue) computes and stores a time value for use in the inner loop (shown in yellow).
Figure 12: RENO flowchart. The inner loop for calculating system reliability for multiple combinations of parameters at a particular time is in yellow, and the outer loop for generating time values to be used in the inner loop is in blue.
- For each time value stored in the workbook, sort the reliability values in ascending order. For this scenario, 99 reliability values are computed at each time step, so the 5th value in the sorted list roughly corresponds to the lower bound (5th percentile) and the 95th value corresponds to the upper bound (95th percentile), as shown in Figure 13.
Figure 13: RENO simulation results sorted in ascending order for each time step. The lower bound of system reliability is shown in bold. (Note that only the first 20 percentiles and the first 15 time steps are shown.)
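The RENO procedure above can be sketched in plain Python, assuming independent lognormal draws of β and η for each of the 10 components using the Table 3 models (β and η are sampled separately, so any correlation between them is ignored, as with two separate parameter models). The time grid is hypothetical because the article does not list its time values, and results are printed rather than stored in a Synthesis Workbook.

```python
import math
import random

# Lognormal models for the Weibull parameters (ln-mean, ln-std), from Table 3
BETA_LN_MEAN, BETA_LN_STD = 0.740, 0.115
ETA_LN_MEAN, ETA_LN_STD = 7.51, 0.0721

N_COMPONENTS = 10  # blocks in parallel, as in Figure 9
N_TRIALS = 99      # inner-loop passes per time value


def component_reliability(t, beta, eta):
    """2-parameter Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def parallel_reliability(reliabilities):
    """System reliability of independent blocks in parallel."""
    q = 1.0
    for r in reliabilities:
        q *= 1.0 - r
    return 1.0 - q


def simulate_bounds(t, rng):
    """Inner loop: sample (beta, eta) per component; return (lower, median, upper)."""
    samples = []
    for _ in range(N_TRIALS):
        rs = [
            component_reliability(
                t,
                rng.lognormvariate(BETA_LN_MEAN, BETA_LN_STD),
                rng.lognormvariate(ETA_LN_MEAN, ETA_LN_STD),
            )
            for _ in range(N_COMPONENTS)
        ]
        samples.append(parallel_reliability(rs))
    samples.sort()  # ascending, as in Figure 13
    # With 99 sorted values, the k-th value (1-based) estimates the k-th percentile.
    return samples[4], samples[49], samples[94]


rng = random.Random(2009)
# Outer loop over a hypothetical time grid
for t in range(500, 5001, 500):
    lower, median, upper = simulate_bounds(float(t), rng)
    print(f"t={t:5d}  lower={lower:.4f}  median={median:.4f}  upper={upper:.4f}")
```

With only 99 trials per time step the percentile estimates are noisy; increasing N_TRIALS (and adjusting the indexes accordingly) gives smoother bounds at the cost of run time.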
To compare the three methods, Jacob plotted the results on a single graph, shown in Figure 14. The plot shows that the simulation method produces a reliability estimate and bounds that are much closer to the analytical solution than the worst case/best case bounds. At larger time values, the simulation median estimate and bounds are greater than the analytical solution, but the distance between the bounds and the median estimate is preserved.
Figure 14: Comparison of analytical (solid lines), worst case/best case (dashed/dotted lines) and simulation (dashed lines) reliability estimates and bounds
The simulation method is a reasonable approximation to the analytical method for diagrams with few blocks whose system reliability equation is a sum of products of component reliabilities. However, the simulation method should not be used for diagrams with complex dependencies, such as load sharing or standby redundancy, because it will give incorrect results in these scenarios.
Conclusion
This article presented a method for obtaining approximate confidence bounds on system reliability when times-to-failure data are unavailable. Its results compared favorably with confidence bounds calculated analytically. The method can provide a reasonable estimate of the variability of system reliability when there is not enough information to build an analytical model.
Appendix
For the purpose of comparing the simulation method to the analytical method, we used the variances of the parameters computed in Weibull++ to obtain distributions to describe each parameter; however, we could just as easily start from rough estimates of the variability of the parameters of the failure time distribution. For example, Figure 15 shows how we could use as little information as the median estimate of beta provided by the supplier (e.g., 2.1) and an expert opinion on the 90th percentile value of beta (e.g., 2.4) with the Quick Parameter Estimator to get a rough estimate of the variability of beta to use in place of the model fit to the various percentiles of beta.
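If a lognormal model is assumed for beta (as in Table 3), a median and one other percentile pin down both parameters in closed form: μ = ln(median) and σ = (ln(x_p) − μ)/z_p, where z_p is the standard normal quantile. The sketch below applies this to the supplier's median of 2.1 and the expert's 90th-percentile value of 2.4. The Quick Parameter Estimator's internal algorithm is not described here, so this is an equivalent calculation under the lognormal assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_from_median_and_percentile(median, value, p):
    """Lognormal (ln-mean, ln-std) from a median and one other percentile.

    ln(value) = ln(median) + sigma * z_p  =>  sigma = (ln(value) - ln(median)) / z_p
    """
    mu = math.log(median)
    z_p = NormalDist().inv_cdf(p)
    sigma = (math.log(value) - mu) / z_p
    return mu, sigma


# Supplier's median beta (2.1) and an expert's 90th-percentile value (2.4)
mu, sigma = lognormal_from_median_and_percentile(2.1, 2.4, 0.90)
print(f"ln-mean={mu:.4f}, ln-std={sigma:.4f}")
```

The resulting ln-standard deviation (about 0.104) is somewhat smaller than the 0.115 fitted from the data in Table 3, which shows how strongly the simulated bounds depend on the expert's estimate of the upper percentile.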
Figure 15: Use of Quick Parameter Estimator to define distributions of component parameters in the absence of times-to-failure data