How Long Should You Burn In a System?
Burn-in testing is an important reliability technique that can be used to improve the population of a product before shipping. It relies on weeding out weak units by operating all units for a predetermined period of time. The units that survive are considered to be appropriate for shipping, while failed units are discarded. Some applications call for burning in a whole system made of various components instead of burning in the components individually. This article discusses good burn-in strategies, depending on the type of the system's failure rate behavior, using BlockSim.
Burn-in is not good for all types of products. It should be used only when dealing with populations with a "freak" (weak) subpopulation or units that have a continuously decreasing failure rate or have a period of decreasing failure rate. If burn-in is determined to be an appropriate approach, the optimum burn-in test period needs to be determined. Many approaches are available to determine the optimum burn-in period, such as optimizing burn-in time to meet a reliability goal or a failure rate goal or to minimize the cost of burn-in and future replacements. Issue 58 addressed one of these methods.
We face a problem when burning in a system made of components with differing failure behaviors. In such situations, an ideal burn-in test should test each component's population separately and then build systems with components that survived the burn-in tests (assuming that the construction of the system does not add significant new failure modes; otherwise, a system burn-in test might still be needed). However, there are many situations that do not allow for such a procedure. For example, to replicate in a burn-in test the same conditions that components would experience in regular operation as part of the system, the components may need to be assembled into the system before the test is conducted. Another reason could be that the manufacturer does not have all the equipment necessary to test each of the system's components separately. System burn-in will also help eliminate units that sustained damage in the assembly processes, such as damage introduced by component insertion machines, manual operations, stressed conditions (e.g., excessive temperature due to soldering) or general handling. Such situations call for burning in the whole system. When dealing with a system made of components with different failure characteristics, how does one go about determining an appropriate burn-in test time?
We will study different categories of failure rates and decide on an appropriate system burn-in strategy using BlockSim. For the rest of this article, we will consider the following system Reliability Block Diagram (RBD).
Figure 1 - Example System
Note that the blocks in the above RBD can represent different components, subassemblies or failure modes inherent in the system, or even failure modes introduced by the assembly process.
Case 1: Systems with increasing failure rate
The system failure rate plot, λ(t), obtained in BlockSim is shown next.
Figure 2 - Failure Rate Plot for Case 1
Systems of this kind are already in their wearout stage from the start time. In this case, burn-in will have a negative effect. It would make the systems worse, because systems that do survive the burn-in will have a greater failure rate than what they had before the test and will be more likely to fail in the field. Performing a burn-in is not advised in this case.
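To see this effect numerically, the following minimal sketch assumes a single unit with a Weibull life distribution whose shape parameter is greater than 1 (an increasing failure rate). The parameter values, the function names and the 10 hr burn-in are illustrative assumptions and are not taken from the example system. The sketch compares the reliability of a new unit over a 50 hr mission with the conditional reliability of a unit that has survived burn-in, R(t | Tb) = R(Tb + t) / R(Tb).

```python
import math

def weibull_reliability(t, beta, eta):
    """Weibull reliability R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))

def conditional_reliability(t, burn_in, beta, eta):
    """Reliability for a mission of length t, given survival of the burn-in."""
    return (weibull_reliability(burn_in + t, beta, eta)
            / weibull_reliability(burn_in, beta, eta))

# Hypothetical wearout-type unit: beta > 1 means an increasing failure rate.
beta, eta = 2.0, 100.0
mission = 50.0   # hours of field operation
burn_in = 10.0   # hours of burn-in (illustrative)

r_new = weibull_reliability(mission, beta, eta)
r_burned = conditional_reliability(mission, burn_in, beta, eta)
print(f"R({mission:.0f} hr), new unit:       {r_new:.4f}")
print(f"R({mission:.0f} hr), after burn-in:  {r_burned:.4f}")
```

Under these assumptions the burned-in unit is less reliable over the same mission than a new one, which is the behavior described above.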
Case 2: Systems with decreasing failure rate
The system failure rate plot is shown next.
Figure 3 - Failure Rate Plot for Case 2
Systems with decreasing failure rates are ideal candidates for burn-in testing. For such systems, longer burn-in periods will result in surviving systems that are better than new systems, but will also result in more discarded weak systems. The burn-in period should be determined based on a certain criterion (for example, optimizing burn-in time based on a reliability goal, a failure rate goal or cost of burn-in and future replacements). In this example, we use the failure rate goal criterion. If the failure rate goal is FR, then the burn-in period can be determined by estimating the time, Tb, that satisfies λ(Tb)= FR, as seen in the next figure.
Figure 4 - Determining an Optimum Burn-in Time Based on a Failure Rate Goal
For example, if FR = 0.22 hr^-1, then Tb = 7.6 hr.
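Outside of BlockSim, the same condition λ(Tb) = FR can be solved numerically. The sketch below is only an illustration: it assumes a hypothetical series system of three Weibull blocks with shape parameters below 1, so that the system failure rate (the sum of the block failure rates) is strictly decreasing. The block parameters, the failure rate goal and the function names are assumptions, not the values behind the figures in this article.

```python
# Hypothetical series system: each block is Weibull with beta < 1, so every
# block (and therefore the system) has a decreasing failure rate.
BLOCKS = [(0.5, 200.0), (0.7, 500.0), (0.8, 1000.0)]  # (beta, eta in hours)

def system_failure_rate(t, blocks=BLOCKS):
    """Failure rate of a series system: the sum of the block failure rates."""
    return sum((beta / eta) * (t / eta) ** (beta - 1.0) for beta, eta in blocks)

def burn_in_time(fr_goal, lo=1e-6, hi=1e6, tol=1e-9):
    """Find Tb such that system_failure_rate(Tb) equals fr_goal by bisection.

    Assumes the failure rate is strictly decreasing on [lo, hi].
    """
    if not (system_failure_rate(hi) <= fr_goal <= system_failure_rate(lo)):
        raise ValueError("failure rate goal is not reached in the search interval")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if system_failure_rate(mid) > fr_goal:
            lo = mid  # still above the goal, so a longer burn-in is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

fr_goal = 0.005  # hypothetical goal, in failures per hour
tb = burn_in_time(fr_goal)
print(f"Tb = {tb:.2f} hr, failure rate at Tb = {system_failure_rate(tb):.6f} per hr")
```

Bisection is used here because it only requires the failure rate to be monotonic on the search interval; it would not be appropriate for a bathtub-shaped failure rate, where the goal may be met at two different times.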
In BlockSim, we can evaluate the benefit of burn-in by comparing the reliability after 50 hr of operation if no burn-in is conducted (Figure 5A) to the conditional reliability after 50 hr of operation given that a burn-in period of Tb = 7.6 hr has been completed (Figure 5B).
Figure 5A - Reliability without Burn-in
Figure 5B - Reliability with Burn-in
The above comparison shows that a burn-in of 7.6 hr would improve the reliability at t = 50 hr from 40% to about 55%.
In BlockSim, we can also obtain reliability plots for the system with and without burn-in. Since a "burned-in" system would be starting operation with blocks that have accumulated a certain age (Tb), an RBD of the burned-in system can be created by duplicating the original RBD with the same properties and applying a start age to every block. The next figure shows how to apply the Start Age to a block (this should be done for every block in the RBD).
Figure 6 - Applying the Start Age for a Block
After applying the start age to every block, the failure rate or other plots (such as the reliability plot) can be obtained. The next figure is a plot of the system's reliability after burn-in compared with the system's reliability if no burn-in is performed.
Figure 7 - Comparison of the Reliability of the System in Case 2 with or without Burn-in
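The start age idea can be sketched in a few lines of code. This is not BlockSim's implementation: it assumes that a start age T on a block means the block's reliability is conditioned on having survived to age T, R(t | T) = R(T + t) / R(T). The three-block RBD (block A in series with the parallel pair B and C), the Weibull parameters and all names are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Block:
    """A Weibull block; start_age is the age already accumulated at t = 0."""
    beta: float
    eta: float
    start_age: float = 0.0

    def reliability(self, t):
        # Conditional reliability given survival to start_age:
        # R(t | T) = R(T + t) / R(T)
        def r(x):
            return math.exp(-((x / self.eta) ** self.beta))
        return r(self.start_age + t) / r(self.start_age)

def system_reliability(blocks, t):
    """Hypothetical RBD: block A in series with the parallel pair (B, C)."""
    a, b, c = (blk.reliability(t) for blk in blocks)
    return a * (1.0 - (1.0 - b) * (1.0 - c))

# Illustrative parameters only; all shapes are below 1 (decreasing failure rates).
original = [Block(0.6, 300.0), Block(0.5, 150.0), Block(0.7, 200.0)]
# Duplicate the RBD and apply the same start age to every block.
burned_in = [replace(blk, start_age=7.6) for blk in original]

for t in (10.0, 50.0, 100.0):
    print(f"t = {t:5.0f} hr: no burn-in {system_reliability(original, t):.3f}, "
          f"burned in {system_reliability(burned_in, t):.3f}")
```

Keeping the start age as a property of each block, rather than shifting the time axis of the whole system, mirrors the procedure described above and would also allow different blocks to carry different accumulated ages.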
Case 3: Systems with bathtub failure rate
The system failure rate plot is shown next.
Figure 8 - Failure Rate Plot for Case 3
In this case we consider that the system follows a period of decreasing failure rate, followed by an increasing failure rate period (and possibly an intermediary constant failure rate period). Note that the term bathtub curve is usually used to describe an entire non-homogeneous population of units. Here, however, we are using it to describe the failure rate of a single system or each system in a population of systems.
Using Figure 8, we can determine the appropriate burn-in time for the system that could eliminate the early life problems as follows.
Figure 9 - Determining an Optimum Burn-in Time to Eliminate Weak Systems That Follow the Bathtub Failure Rate
From Figure 9, we determine that a burn-in period of Tb = 160hr would eliminate the weak systems. Longer burn-in times will not be economical because the system's failure rate will start increasing again after Tb (or, in cases where there is a period of intermediary constant failure rate, the failure rate will start increasing again sometime after Tb). The following is the system's reliability plot if burn-in is conducted (the plot is obtained using the Start Age property as explained in Case 2), compared with the system's reliability plot if no burn-in is conducted.
Figure 10 - Comparison of Case 3's System with or without Burn-in
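The burn-in time read from the failure rate plot is the time at which the failure rate stops decreasing. As a rough sketch of how that point could be located numerically, the code below uses a hypothetical two-block series system (one Weibull block with infant mortality, one with wearout) and a golden-section search. The parameters, the search interval and the function names are assumptions; they do not reproduce the Tb = 160 hr result, which comes from the example RBD.

```python
import math

# Hypothetical series system, not the article's RBD: one block with infant
# mortality (beta < 1) and one block with wearout (beta > 1).
BLOCKS = [(0.5, 1000.0), (3.0, 2000.0)]  # (beta, eta in hours)

def system_failure_rate(t):
    """Series system failure rate: the sum of the Weibull block failure rates."""
    return sum((beta / eta) * (t / eta) ** (beta - 1.0) for beta, eta in BLOCKS)

def argmin_golden(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c                    # minimum lies in [lo, d]
            c = hi - inv_phi * (hi - lo)
        else:
            lo, c = c, d                    # minimum lies in [c, hi]
            d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)

tb = argmin_golden(system_failure_rate, 1.0, 5000.0)
print(f"Failure rate bottoms out near Tb = {tb:.1f} hr "
      f"({system_failure_rate(tb):.6f} failures per hr)")
```

A golden-section search needs only function values and a single minimum in the interval, which fits a simple bathtub shape. If the curve has a long flat middle section, the minimizer is not well defined and the start of the flat region would be a more natural choice of Tb.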
The above figure shows that burn-in can help in eliminating the early life problems. Note, however, that after around t = 714 hr, the burned-in system's reliability becomes inferior to that of the system that was not burned in. The reason is that as the system is being burned in, all the components in the system that have a decreasing failure rate (components B and G) are benefiting from the test. However, the components that have an increasing failure rate (components A, C, D, E, F and H) are becoming worse. That is the price we pay for burning in the whole system, rather than burning in only the components that need it! Therefore, in such situations, burn-in will have short-term benefits but will cause problems in the long term. The long-term problems will be more significant when the burn-in period is a significant fraction of the whole mission life. Consequently, a decision needs to be made regarding the benefit of burn-in considering the mission time of the systems (i.e., the period we want the systems to operate for). For example, if the system is to be used for less than 714 hr, then burn-in will be beneficial. However, if the system's mission time is to exceed 714 hr, then we need to consider the long-term effect of burn-in and either not perform it or perform it for a shorter time.
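The crossover point can also be estimated numerically. The following sketch again assumes a hypothetical two-block series system (one block with a decreasing failure rate, one with an increasing failure rate) and a hypothetical burn-in of 644 hr; none of these values come from the example RBD, so the result will not match the 714 hr figure. It works with cumulative hazards rather than reliabilities, because the burned-in system is better exactly when its cumulative hazard over the mission is smaller, and this avoids numerical underflow at long mission times.

```python
import math

# Hypothetical series system and burn-in time (illustrative values only).
BLOCKS = [(0.5, 1000.0), (3.0, 2000.0)]  # (beta, eta in hours)
BURN_IN = 644.0                          # hours

def cumulative_hazard(t):
    """Series system cumulative hazard H(t), so that R(t) = exp(-H(t))."""
    return sum((t / eta) ** beta for beta, eta in BLOCKS)

def reliability_new(t):
    return math.exp(-cumulative_hazard(t))

def reliability_burned_in(t, burn_in=BURN_IN):
    """Conditional reliability R(burn_in + t) / R(burn_in)."""
    return math.exp(-(cumulative_hazard(burn_in + t) - cumulative_hazard(burn_in)))

def excess_hazard(t, burn_in=BURN_IN):
    """Mission hazard of the burned-in system minus that of a new system.

    Negative values mean the burned-in system is the more reliable one.
    """
    return (cumulative_hazard(burn_in + t) - cumulative_hazard(burn_in)
            - cumulative_hazard(t))

def crossover_time(lo=1.0, hi=1e5, tol=1e-6):
    """Bisection for the mission time at which burn-in stops paying off."""
    if not (excess_hazard(lo) < 0.0 < excess_hazard(hi)):
        raise ValueError("no crossover inside the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess_hazard(mid) < 0.0:
            lo = mid  # burn-in still beneficial at mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t_cross = crossover_time()
print(f"Burn-in is beneficial for missions shorter than about {t_cross:.0f} hr")
```

Comparing the planned mission time against such a crossover estimate is one way to frame the decision discussed above: if the mission is longer than the crossover, a shorter burn-in or no burn-in may be the better choice.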
Case 4: Systems with constant failure rate
Copyright 2006 ReliaSoft Corporation, ALL RIGHTS RESERVED