Using FMRA to Estimate Baseline Reliability
As you may have seen when exploring the different standards and templates for FMEA, the analysis method can be modified to meet different
objectives. Regardless of what the objective is, at the end of the day the FMEA process will produce a wealth of information from a
cross-functional team that should be leveraged for other Design for Reliability (DFR) activities. Most of us are familiar with the traditional
approaches of using information from the design FMEA (DFMEA) as an input to design verification plans (e.g., DVP&Rs), process FMEAs and process
control plans. Some practitioners also have experience with using FMEA data to generate fault trees for advanced risk analysis. What we have
not done to date is use the DFMEA as the starting point in our reliability analysis, and as an integral part of our DFR process. In
this article, we would like to introduce you to a new type of analysis based on the DFMEA called Failure Modes and Reliability Analysis (FMRA).
FMEA from a DFR and Reliability Perspective
On its own, the DFMEA activity accomplishes its objectives of identifying potential failures, assessing risk and initiating corrective actions to
improve the design. In doing so, the analysis also produces a wealth of information that can be leveraged by other activities. For example,
early in the design process, the DFMEA can be used to generate a baseline estimate of the design’s reliability, which is sought as part of the overall DFR program.
As a starting point when other reliability information is not yet available, the quantitative probability of occurrence for each failure cause can be
obtained from the qualitative occurrence rating that has been assigned by the FMEA team as part of the traditional risk priority number (RPN) calculation. For
the purposes of this analysis, the traditional FMEA occurrence scale can be expanded to include a quantitative value for each rating in the scale. For example, if
the FMEA team assumed that the occurrence rating labeled "Rare" implies 1 in 100,000, then Probability of Occurrence = 1/100,000 = 0.00001, or 0.0010%.
This could be treated as a fixed probability (Q) that is the same regardless of how long the product operates. Alternatively, and for better reliability
modeling, a life distribution could be used to describe this probability. For example, if the FMEA team assumed that the probability of
occurrence by 1,000 hours of operation is "Rare" (1:100,000), then an exponential distribution could be substituted by computing the failure rate
(lambda) for which the unreliability (i.e., probability of occurrence of this failure cause) at time = 1,000 hours is 0.0010%.
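A minimal sketch of that conversion follows, assuming the standard exponential unreliability Q(t) = 1 - exp(-lambda * t). The function names are illustrative only and are not part of Xfmea/RCM++ or any other product.

```python
import math

def exponential_lambda(q: float, t: float) -> float:
    """Failure rate of an exponential model whose unreliability is q at time t."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    if t <= 0:
        raise ValueError("t must be positive")
    # Q(t) = 1 - exp(-lambda * t)  =>  lambda = -ln(1 - Q) / t
    return -math.log1p(-q) / t

def unreliability(lam: float, t: float) -> float:
    """Probability that the failure cause has occurred by time t."""
    return -math.expm1(-lam * t)

# "Rare" = 1 in 100,000 by 1,000 hours of operation (values from the text)
lam = exponential_lambda(1 / 100_000, 1_000.0)
print(f"lambda    = {lam:.6e} per hour")
print(f"Q(1000 h) = {unreliability(lam, 1_000.0):.6e}")
print(f"Q(5000 h) = {unreliability(lam, 5_000.0):.6e}")
```

Unlike the fixed probability Q, this model lets the probability of occurrence grow with operating time, which is why the 5,000-hour value is larger than the 1,000-hour value.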
Then, for each item within the existing FMEA, a fault tree or reliability block diagram (RBD) can be easily constructed relating the probability of occurrence
of each cause to the probability of failure of the item. For example, if the team assumes that the item will fail if any one of the failure causes occurs, this
could be modeled with a series configuration RBD or an OR gate in a fault tree. The model can then be expanded to the entire system using combinations of RBDs and
fault trees, rolling up from the FMEA causes to the system level.
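As a rough sketch of the series (OR gate) case, assuming the failure causes are statistically independent, the item survives only if none of its causes occurs. The cause probabilities below are hypothetical placeholders, not values from any FMEA.

```python
from math import prod

def series_reliability(cause_probabilities):
    """Reliability of an item that fails if ANY one of its independent causes occurs."""
    return prod(1.0 - q for q in cause_probabilities)

def or_gate(cause_probabilities):
    """Probability of item failure: the fault tree OR gate over the same causes."""
    return 1.0 - series_reliability(cause_probabilities)

# Hypothetical probabilities of occurrence for three causes of one item
causes = [1e-5, 1e-4, 1e-3]
print(f"item reliability   = {series_reliability(causes):.6f}")
print(f"item unreliability = {or_gate(causes):.6f}")
```

The series RBD and the OR gate are two views of the same calculation: one reports the probability of success, the other the probability of failure.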
Note that for series reliability-wise configurations, Version 8 of the Xfmea/RCM++ software can automatically construct the RBDs (in the background)
based on the system configuration and failure causes defined in the FMEAs. For more complex configurations, the software allows
users to view and modify the configurations in a synchronized view of the FMRA in BlockSim.
With this approach, the reliability modeling subject matter expert leverages the work done by the FMEA team to automatically create a baseline reliability
model. At this stage, the results obtained at the system level are based solely on the probabilities of occurrence defined for each cause in the
FMEA, which may or may not be correct. During this modeling activity, an overall assessment of the validity of these values can be performed and communicated
back to the FMEA team for reassessment, modification or further information gathering actions (e.g., reliability testing). (See the later discussion in
the FMRA Vetting Process section.)
Illustrating the FMRA Process
To illustrate the FMRA process, consider a simple example based on the assembly/component DFMEAs for a single light pendant chandelier. The following picture shows
the rating scale that was used by the team when they assigned an occurrence rating to each failure cause identified in the FMEA. In addition to the qualitative ratings
and criteria (e.g., 1 = 1 in 1 Million), the scale also has a quantitative value associated with each rating (e.g., 1 in 1 Million = 0.000001).
Based on this probability of occurrence and its corresponding probabilistic definition, one can easily build a one-parameter exponential reliability model for
each failure cause as follows:
Assuming an exponential distribution, its single parameter λ can be estimated from the probability of occurrence Q at the reference time t by solving
Q = 1 − e^(−λt), which gives λ = −ln(1 − Q)/t.
Note that the exponential distribution is the default choice because a single probability value/time is the only information available. When better information
is obtained, other more appropriate models should be utilized to describe the reliability. (See the later discussion in the FMRA Vetting Process section.)
Now, assuming that any one of these causes could cause the component to fail (reliability-wise in series), an initial reliability estimate at any given time
could easily be obtained by combining the causes to get the reliability of the component and then combining components and assemblies until we reach the system
level. We will call this the "first draft" FMRA. Note that more complex configurations may be appropriate to describe the reliability-wise relationships
of the failure causes and/or components. These can be implemented in the analysis after the first draft is completed. (See the later discussion in
the Next Steps section.)
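The roll-up from causes to components to the system can be sketched as follows, assuming every level is reliability-wise in series, every cause is independent and exponentially distributed, and every occurrence probability applies at the same reference time. The component names, cause names and probabilities are hypothetical and are not taken from the chandelier DFMEA.

```python
import math

T_REF = 1_000.0  # hours at which each occurrence probability is assumed to apply (hypothetical)

# Hypothetical first-draft data: component -> {cause: probability of occurrence by T_REF}
chandelier = {
    "bulb": {"filament burns out": 1e-2, "glass cracks": 1e-4},
    "socket": {"contact corrodes": 1e-3},
    "wiring": {"insulation degrades": 1e-4, "connection loosens": 1e-3},
}

def cause_reliability(q_ref, t, t_ref=T_REF):
    """Exponential reliability at time t for a cause with unreliability q_ref at t_ref."""
    lam = -math.log1p(-q_ref) / t_ref
    return math.exp(-lam * t)

def component_reliability(causes, t):
    """Series combination of all causes of one component."""
    return math.prod(cause_reliability(q, t) for q in causes.values())

def system_reliability(system, t):
    """Series combination of all components of the system."""
    return math.prod(component_reliability(causes, t) for causes in system.values())

for hours in (1_000.0, 5_000.0):
    print(f"first-draft R({hours:.0f} h) = {system_reliability(chandelier, hours):.4f}")
```

Because the structure is a plain nested mapping, a different reliability-wise configuration for one component could later replace its series product without touching the rest of the roll-up.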
It is extremely important to note at this point that, even though we just computed a system reliability value, this first draft FMRA value may be nowhere
close to the true reliability. What we need to do now is go back through the first draft and review each entry and result. We will call this subsequent step
the "FMRA Vetting" process.
FMRA Vetting Process
The first draft of the FMRA is just that, a draft. It needs to be thoroughly reviewed and vetted before proceeding. The list that follows outlines items that
need to be considered in the vetting process.
Tip: Within the Xfmea/RCM++ Version 8 software, you can create a baseline (i.e., an exact replica of the project at a specific point in time) before
any major change to the analysis. You can restore the baseline whenever it may be needed, which allows you to view the project as it was at the previous point in time.
When you are ready to begin modifying the first draft of the FMRA in the Xfmea/RCM++ software, one option is to create a copy of the project so
the original DFMEA can remain unchanged and you can modify the FMRA in a separate project. The drawback of this approach is that it dissociates the FMRA from
the DFMEA, so changes made to one will not be reflected in the other. As an alternative, the list below includes tips for ways that you can adjust the FMRA for
the purposes of a more accurate reliability calculation while still maintaining synchronization with the original DFMEA.
- Discount/eliminate issues that have no impact on reliability. Depending on how the FMEA was done, there may be multiple items, functions, failures or
causes in the DFMEA that have no impact on reliability. For example:
- A failure in the DFMEA could be "Design fails to aesthetically please the customer" with multiple causes such as "Red color is disliked
by X% of the population," etc. While these are important considerations during design, the color of the chandelier is irrelevant from a reliability perspective.
- Other failures could be process issues that will be addressed in manufacturing, or items that will not impact reliability if the appropriate controls are put in
place.
In short, we need to remove failures that are not reliability-related from our FMRA. If you wish to maintain synchronization with the original DFMEA, instead of
deleting records from the FMEA, you can set their reliability to 100% (i.e., cannot fail). Doing so excludes the issues from the reliability analysis but
keeps the FMRA and DFMEA synchronized in the same project and tightly integrated.
- Include other contributing issues not considered in the FMEA that have an impact on reliability. There may be other failures (including interfaces)
that affect the reliability but were not included in the DFMEA. These will need to be added into the FMRA.
- Account for common causes that may appear multiple times in the FMEA. Depending on how the DFMEA was done, a single cause may appear more than
once. From a reliability perspective, the item only fails once, and a single failure should not be counted multiple times. If you wish to maintain synchronization
with the original DFMEA, you can use the mirroring functionality in Xfmea/RCM++ to ensure that common cause failure modes are handled appropriately in the analysis.
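The effect of two of these adjustments on a series calculation can be sketched as follows. This is only an illustration of the arithmetic, not of the software's mirroring feature: it assumes that records sharing a cause name describe one shared event, and all records and values are hypothetical.

```python
from math import prod

# Hypothetical first-draft records:
# (item, cause, reliability at the mission time, is the issue reliability-related?)
records = [
    ("shade", "red color disliked", 0.90, False),
    ("socket", "contact corrodes", 0.999, True),
    ("wiring", "power surge", 0.995, True),
    ("bulb", "power surge", 0.995, True),  # same physical cause listed a second time
    ("bulb", "filament burns out", 0.97, True),
]

def first_draft_reliability(records):
    """Series product over every record, exactly as entered."""
    return prod(r for _, _, r, _ in records)

def vetted_reliability(records):
    """Series product after the two adjustments described above."""
    by_cause = {}
    for _, cause, r, reliability_related in records:
        # A non-reliability issue is kept in the record set but cannot fail (R = 100%).
        effective = r if reliability_related else 1.0
        # A cause shared by several items is counted only once.
        by_cause.setdefault(cause, effective)
    return prod(by_cause.values())

print(f"first draft: {first_draft_reliability(records):.4f}")
print(f"vetted:      {vetted_reliability(records):.4f}")
```

Setting a reliability to 1.0 leaves the series product unchanged, which is why the record can stay in the shared project without affecting the result.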
Review and Validate Inputs
- Review each occurrence rating assigned during the DFMEA and its derived reliability equivalent. The resulting values are only as good as the inputs
provided ("garbage in, garbage out"). In this case, the inputs are the qualitative occurrence ratings from the FMEA team. The team could be wrong on
some or all of them.
- Compare/cross-reference the occurrence rating value with other FMEAs done by other teams on similar items and similar environments.
In Xfmea/RCM++, it is easy to search through all FMEAs stored in the same Synthesis repository.
- Review any available historical data, published data and warranty data, as well as all related analysis and models that have been performed to
describe the reliability of the item at the expected use conditions.
- Look for similar models in the Synthesis repository (i.e., Weibull++, ALTA or other analyses on similar items).
- Look for reference data in published standards (e.g., standards-based reliability prediction).
- Cross-reference with data from the failure reporting, analysis and corrective action system (FRACAS).
- Get expert opinion and use the Quick Parameter Estimator within Xfmea/RCM++ to translate these opinions into usable models.
- Use physics of failure, computer simulation, finite element analysis and other tools and methods.
- In cases where no reasonable assessment can be made, testing may need to be performed. Make this testing a part of the reliability plan and set aside a
budget for it.
- Do a common-sense reality check on the values given. For example, in the chandelier FMRA, the reliability of the bulb was calculated
to be 93% after 5,000 hours of operation. If this is an incandescent bulb, this value may be overly optimistic.
- Question the dangerous exponential assumption. Even though we used an exponential distribution for the initial transition to a reliability model, remember
that this distribution assumes a constant failure rate. For the majority of items, this assumption is invalid. If wearout is present or suspected, you may need
to replace these initial models with distributions that have non-constant failure rates (e.g., Weibull or lognormal). In the absence of data, you can
use the Quick Parameter Estimator within Xfmea/RCM++ to translate the available estimates into different models. For example, if beta is known for a specific failure
mode, use a 1-parameter Weibull distribution coupled with the stated probability.
- Comparatively review and rank all values to further identify inconsistencies. Re-compute the FMRA based on the modifications performed so far. Use the
color-coding feature to look at causes/failure modes that are high unreliability contributors. Assess if this is valid. For each item in question, repeat the
review steps above.
- Review all items in question with the original FMEA team, revise the DFMEA as appropriate and regenerate the FMRA. At this point, the DFMEA will likely
be modified to take the new information into account, and then the FMRA can be regenerated. Depending on the number of changes and the extent of your
participation, you may have to go through the vetting process again.
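To see why the exponential assumption in the list above deserves scrutiny, the sketch below fits both an exponential model and a 1-parameter Weibull model (beta assumed known) to the same stated probability and compares them at a later time. It assumes the Weibull unreliability Q(t) = 1 - exp(-(t/eta)^beta). The beta value is hypothetical, and this is not a description of how the Quick Parameter Estimator works.

```python
import math

def weibull_eta(q, t, beta):
    """Scale parameter eta of a Weibull model with known beta and unreliability q at time t."""
    # Q(t) = 1 - exp(-(t / eta)**beta)  =>  eta = t / (-ln(1 - Q))**(1 / beta)
    return t / (-math.log1p(-q)) ** (1.0 / beta)

def weibull_reliability(t, eta, beta):
    """Weibull reliability at time t."""
    return math.exp(-((t / eta) ** beta))

q_ref, t_ref = 1e-5, 1_000.0  # "Rare" by 1,000 hours, as in the earlier example
beta = 2.5                    # hypothetical wearout shape parameter (beta > 1)

eta_wearout = weibull_eta(q_ref, t_ref, beta)
eta_constant = weibull_eta(q_ref, t_ref, 1.0)  # beta = 1 is the exponential case

for hours in (1_000.0, 5_000.0):
    q_exp = 1.0 - weibull_reliability(hours, eta_constant, 1.0)
    q_wbl = 1.0 - weibull_reliability(hours, eta_wearout, beta)
    print(f"{hours:>6.0f} h  exponential Q = {q_exp:.3e}   Weibull Q = {q_wbl:.3e}")
```

Both models agree at the reference time by construction. Beyond it, the wearout model predicts a higher probability of occurrence, so a first draft built on the exponential model can understate the unreliability at longer operating times.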
Next Steps
Once vetted, the baseline reliability value is now your initial design reliability. Compare this with the target reliability, keeping in mind that this first
baseline estimate is usually optimistic. Furthermore, the product still needs to go through manufacturing, and the manufacturing process is not going to increase the
reliability. You therefore want your initial baseline reliability value to exceed your reliability target value.
If the initial baseline is not sufficient for the target, you may expand the analysis to include RBDs in BlockSim and continue with different types of
analysis including reliability importance and reliability allocation. Reliability importance analysis provides more advanced methods to identify the issues that
contribute most to the overall reliability. Reliability allocation analysis provides reliability requirements for each item (all the way down
to the cause level) so that the target is met. For causes that have higher reliability requirements (which translate back to lower probability of occurrence), a
review of the new requirements with the FMEA team is advised so that additional corrective actions can be assigned if necessary to assure that the new
requirements are met. This may again require a revision of the FMEA and the FMRA.
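For a feel of what these two analyses produce, the sketch below uses two simple textbook techniques for a series system: the Birnbaum importance measure (the sensitivity of system reliability to each item) and equal apportionment of a target. These are stand-ins chosen for illustration and are not necessarily the methods BlockSim applies. The item names and reliability values are hypothetical.

```python
from math import prod

def birnbaum_importance(reliabilities):
    """dR_system/dR_item for a series system: the product of all the other reliabilities."""
    return {
        name: prod(r for other, r in reliabilities.items() if other != name)
        for name in reliabilities
    }

def equal_allocation(target, n_items):
    """Reliability each of n series items needs so that their product equals the target."""
    return target ** (1.0 / n_items)

# Hypothetical vetted baseline reliabilities at the mission time
baseline = {"bulb": 0.93, "socket": 0.995, "wiring": 0.99}
target = 0.95

importance = birnbaum_importance(baseline)
required = equal_allocation(target, len(baseline))

print(f"baseline system reliability = {prod(baseline.values()):.4f} (target {target})")
for name, r in baseline.items():
    gap = "needs improvement" if r < required else "meets allocation"
    print(f"{name:<7} R = {r:.3f}  importance = {importance[name]:.4f}  {gap}")
```

In a series system the least reliable item always has the highest importance, so it is the natural first candidate for additional corrective actions when the allocation is reviewed with the FMEA team.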
From a knowledge base perspective, the FMEA and related FMRA should continuously be updated as new information becomes available, including the addition of
new failure modes uncovered during testing and the revision of the underlying reliability models based on data obtained from testing. After the product’s
release, this process should continue with field information.