Reliability HotWire

Issue 6, August 2001

Hot Topics

Reliability Allocation and Optimization

During the process of developing a new product, the engineer is often faced with the task of designing a system that conforms to a set of reliability specifications. The engineer is given the goal for the system and must then develop a design that will achieve the desired reliability of the system while performing all of the system's intended functions at a minimum cost. This involves a "balancing act" of determining how to allocate reliability to the components in the system so the system will meet its reliability goal while at the same time ensuring that the system meets all of the other associated performance specifications.

The simplest method for allocating reliability is to distribute the reliabilities uniformly among all components. For example, suppose a system with five components in series has a reliability objective of 90% for a given operating time. The uniform allocation of the reliability goal to all components would require each component to have a reliability of 98% for the specified operating time, since (0.98)^5 is approximately equal to 0.90. While this manner of allocation is easy to calculate, it is generally not the best way to allocate reliability for a system. The optimum method of allocating reliability would take into account the cost or relative difficulty of improving the reliability of different subsystems or components.
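
As a quick check of the arithmetic above, the equal-allocation target for n components in series is the n-th root of the system goal. The short Python sketch below is illustrative only; the function name is hypothetical and not part of any ReliaSoft product.

```python
def uniform_series_allocation(system_goal: float, n_components: int) -> float:
    """Reliability each of n series components needs if all are allocated equally."""
    return system_goal ** (1.0 / n_components)


# Five components in series with a 90% system goal, as in the example.
per_component = uniform_series_allocation(0.90, 5)
print(f"Required reliability per component: {per_component:.4f}")
print(f"System reliability at 98% each:     {0.98 ** 5:.4f}")
```

The exact requirement is slightly below 98%, which is why rounding each component up to 98% lands just above the 90% goal.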

The reliability optimization process begins with the development of a model that represents the entire system. This is accomplished with the construction of a system reliability block diagram that represents the reliability relationships of the components in the system. From this model, the system reliability impact of different component modifications can be estimated and considered alongside the costs that would be incurred in the process of making those modifications. It is then possible to perform an optimization analysis for this problem, finding the best combination of component reliability improvements that meet or exceed the performance goals at the lowest cost. ReliaSoft's BlockSim system reliability, maintainability and availability software can be used to perform this type of analysis.

Improving Reliability
Reliability engineers are often called upon to make decisions as to whether to improve a certain component or components in order to achieve a minimum required system reliability. (Note that this minimum required system reliability is always associated with a specified time.) There are two approaches to improving the reliability of a system: fault avoidance and fault tolerance. Fault avoidance is achieved by using high-quality and high-reliability components, and is usually less expensive than fault tolerance. Fault tolerance, on the other hand, is achieved by redundancy. Redundancy can result in increased design complexity and increased costs through additional weight, space, etc.

Before deciding whether to improve the reliability of a system by fault tolerance or avoidance, a reliability assessment for each component in the system should be made. Once the reliability values for the components have been quantified, an analysis can be performed in order to determine if that system's reliability goal will be met. If it becomes apparent that the system's reliability will not be adequate to meet the desired goal at the specified mission duration, steps can be taken to determine the best way to improve the system's reliability so that it will reach the desired target.

Consider a system with three components connected reliability-wise in series. The reliabilities for each component for a given time are: R1 = 70%, R2 = 80%, and R3 = 90%. The reliability goal RG = 85% is required for this system.

The current reliability of the system is RS = R1 · R2 · R3 = 50.4%. Obviously, this is far short of the system's required reliability performance. It is apparent that the reliability of the system's constituent components will need to be increased in order for the system to meet its goal. First, we will try increasing the reliability of one component at a time to see whether the reliability goal can be achieved. The following figure shows the effect on the overall system reliability of raising the reliability of individual components.

The preceding figure shows that even by raising the individual component reliability to a hypothetical value of 1 (100% reliability, i.e. the component will never fail), the overall system reliability goal will not be met by improving the reliability of just one component. The next logical step would be to try to increase the reliability of two components. The question now becomes: which two? One might also suggest increasing the reliability of all three components. A basis for making such decisions needs to be found in order to avoid the "trial and error" aspect of altering the system's components randomly in an attempt to achieve the system reliability goal. The question becomes one of how to do this most efficiently and cost effectively. We will need more information to make an informed decision as to how to go about improving the system's reliability. How much does each component need to be improved for the system to meet its goal? How feasible is it to improve the reliability of each component? Would it actually be more efficient to raise the reliability of two or three components?
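
The single-component check described above can be reproduced in a few lines. This is a minimal sketch using the three-component series example; the variable names are chosen here for illustration.

```python
from math import prod


def series_reliability(reliabilities):
    """Reliability of components connected reliability-wise in series."""
    return prod(reliabilities)


current = [0.70, 0.80, 0.90]   # R1, R2, R3 from the example
goal = 0.85                    # system reliability goal RG

baseline = series_reliability(current)

# Best case for each component: raise it alone to a hypothetical 100%.
best_case = []
for i in range(len(current)):
    trial = list(current)
    trial[i] = 1.0
    best_case.append(series_reliability(trial))

print(f"Current system reliability: {baseline:.3f}")
for i, value in enumerate(best_case, start=1):
    print(f"Only R{i} perfect: {value:.3f} (meets goal: {value >= goal})")
```

Even the most favorable case, a perfect first component, leaves the system at R2 · R3 = 72%, so at least two components have to be improved.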

In order to answer these questions, we must introduce another variable into the problem: cost. Cost does not necessarily have to be in dollars; it could be in terms of non-monetary resources, like time. By associating cost values with the reliabilities of the system's components, we can find an optimum design that will provide the required reliability at a minimum cost.

Cost/Penalty Function
There is always a cost associated with changing a design, due to change of vendors, use of higher-quality materials, retooling costs, administrative fees, or other factors. Before attempting to improve the reliability, the cost as a function of reliability for each component must be obtained. Otherwise, the design changes may result in a system that is needlessly expensive or over-designed. Developing the "cost of reliability" relationship will give the engineer an understanding of which components/subsystems to improve and how to best concentrate the effort and allocate resources in doing so. The first step will be to obtain a relationship between the cost of improvement and reliability.

The next challenge is to model the cost as a function of reliability. The preferred approach would be to formulate the cost function from actual cost data. This can be done from past experience. If a reliability growth program is in place, the costs associated with each stage of improvement can also be quantified. Defining the different costs associated with different vendors or different component models is also useful in formulating a model of component cost as a function of reliability.

However, there are many cases where no such information is available. For this reason, a general behavior model of the cost versus the component's reliability was developed for performing reliability optimization in BlockSim. The objective of this function is to model an overall cost behavior for all types of components. Of course, it is impossible to formulate a model that will be precisely applicable to every situation, but the proposed relationship is general enough to cover most applications. The default cost function in BlockSim acts like a penalty function for increasing a component's reliability. An exponential behavior for the cost is assumed, and the function has the following form:

Ci(Ri; f, Rmin,i, Rmax,i) = exp[(1 - f) · (Ri - Rmin,i) / (Rmax,i - Ri)]

where:

• Ci(Ri) is the penalty function (or cost) as a function of component reliability.
• f is the feasibility of improving a component's reliability relative to other components in the system.
• Rmin,i is the current reliability at the given mission time at which the optimization is to be performed.
• Rmax,i is the maximum achievable reliability at the given mission time at which the optimization is performed.

Note that this penalty function is dimensionless. It essentially acts as a weighting factor that describes the difficulty in increasing the component reliability from its current value. The function has the following properties:

1. The cost increases as the allocated reliability departs from the minimum, or current value of reliability. It is assumed that the reliabilities for the components will not take values any lower than they already have. Depending on the optimization, a component's reliability may not need to be increased from its current value, but it will not drop any lower.
2. The cost increases as the allocated reliability approaches the maximum achievable reliability. This is a reliability value that is approached asymptotically as the cost increases, but never actually reached.
3. The cost is a function of the range of improvement, which is the difference between the component's initial reliability and the corresponding maximum achievable reliability.
4. The exponent in the equation approaches infinity as the component's reliability approaches its maximum achievable value. This means that it is easier to increase the reliability of a component from a lower initial value. For example, it is easier to increase a component's reliability from 70% to 75% than from 90% to 95%.
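
The behavior described in the list above can be illustrated numerically. The sketch below assumes the penalty takes the form exp[(1 - f) · (R - Rmin) / (Rmax - R)], which is consistent with the four properties listed; the maximum achievable reliability of 0.99 and the feasibility of 0.5 are arbitrary values picked for illustration, not figures from this article.

```python
from math import exp


def penalty(r, r_min, r_max, f):
    """Assumed exponential penalty for raising a reliability from r_min to r."""
    if not (r_min <= r < r_max):
        raise ValueError("r must satisfy r_min <= r < r_max")
    return exp((1.0 - f) * (r - r_min) / (r_max - r))


R_MAX = 0.99   # assumed maximum achievable reliability
F = 0.5        # assumed feasibility

no_change = penalty(0.70, 0.70, R_MAX, F)      # staying at the current value
from_70_to_75 = penalty(0.75, 0.70, R_MAX, F)
from_90_to_95 = penalty(0.95, 0.90, R_MAX, F)

print(f"No improvement: {no_change:.3f}")
print(f"70% -> 75%:     {from_70_to_75:.3f}")
print(f"90% -> 95%:     {from_90_to_95:.3f}")
```

Under these assumptions the same five-point improvement carries a noticeably larger penalty when it starts from 90%, because the target sits much closer to the maximum achievable reliability.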

Feasibility
The feasibility term in the penalty function equation is a constant which represents the difficulty in increasing a component/subsystem's reliability relative to the rest of the components in the system. Depending on the design complexity, technological limitations, etc., certain components can be very hard to improve. Clearly, the more difficult it is to improve the reliability of the component/subsystem, the greater the cost. In BlockSim, the feasibility parameter takes values between 0.1 and 0.9, with 0.1 being very hard to improve and 0.9 being very easy to improve. (Note: The feasibility parameter for a component in BlockSim is assigned in the Optimization page of the component's Block Properties window.) Several methods can be used to obtain a feasibility value. Weighting factors for allocating reliability have been proposed by many authors and can be used to quantify feasibility. These weights depend on certain factors of influence such as the complexity of the component, the state of the art, the operational profile, the criticality, etc. Engineering judgment based on past experience, supplier quality, supplier availability, etc. can also be used in determining a feasibility value. Overall, the assignment of a feasibility value is going to be a subjective process. Of course, this problem is negated if the relationship between the cost and the reliability for each component is known.

Maximum Achievable Reliability
For the purposes of reliability optimization, we need to define a limiting reliability that a component will approach, but not reach. The costs near the maximum achievable reliability are very high, and the actual value for the maximum reliability is usually dictated by technological or financial constraints. In deciding on a value to use for the maximum achievable reliability, current state of the art of the component in question and other similar factors will have to be considered. In the end, a realistic estimation based on engineering judgment and experience will be necessary to assign a value to this input.

Note that the time associated with this maximum achievable reliability is the same as that of the overall system reliability goal. Almost any component can achieve a very high reliability value, provided the mission time is short enough. For example, a component with an exponential distribution and a failure rate of one failure per hour has a reliability that drops below 1% for missions greater than five hours. However, it can achieve a reliability of approximately 99.9% as long as the mission is no longer than four seconds. For the purposes of optimization in BlockSim, the reliability values of the components are associated with the time for which the system reliability goal is specified. For example, if the problem were to achieve a system goal of 99% reliability at 1000 hours, the maximum achievable reliability values entered for the individual components would be the maximum reliability that each component could attain for a mission of 1000 hours.
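
The dependence on mission time in the exponential example can be checked directly from R(t) = e^(-λt). The following is a small sketch with a hypothetical helper name.

```python
from math import exp


def exponential_reliability(failure_rate_per_hour, mission_hours):
    """Reliability of a constant-failure-rate component: R(t) = exp(-lambda * t)."""
    return exp(-failure_rate_per_hour * mission_hours)


r_five_hours = exponential_reliability(1.0, 5.0)
r_four_seconds = exponential_reliability(1.0, 4.0 / 3600.0)

print(f"5-hour mission:   {r_five_hours:.4%}")
print(f"4-second mission: {r_four_seconds:.4%}")
```

The same component is nearly certain to fail over five hours and nearly certain to survive four seconds, which is why a maximum achievable reliability only has meaning alongside a stated mission time.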

Optimizing the System's Reliability
Once the cost functions for the individual components have been determined, it becomes necessary to develop an expression for the overall system cost. This takes the form of: Cs(RG) = C1(R1) + C2(R2) + ... + Cn(Rn), that is, the sum of Ci(Ri) over i = 1, 2, ..., n. In other words, the cost of the system is simply the sum of the costs of its components. This is regardless of the form of the individual component cost functions. They can be of the general behavior model in BlockSim or they can be user-defined. Once the overall cost function for the system has been defined, the problem becomes one of minimizing the cost function while remaining within the constraints defined by the target system reliability and the reliability ranges of the components. The latter constraints in this case are defined by the minimum and maximum reliability values for the individual components.

BlockSim employs a nonlinear programming technique to minimize the system cost function. The system has a minimum (current) and theoretical maximum reliability value that is defined by the minimum and maximum reliabilities of the components, and by the way the system is configured. That is, the structural properties of the system are accounted for in the determination of the optimum solution. For example, the optimization for a system of three units in series will be different than the optimization for a system consisting of the same three units in parallel. The optimization occurs by varying the reliability values of the components within their respective constraints of maximum and minimum reliability in a way that the overall system goal is achieved. Obviously, there can be any number of different combinations of component reliability values that might achieve the system goal. The optimization routine essentially finds the combination that results in the lowest overall system cost.

Method of Implementing the Optimization
As was mentioned earlier, there are two different methods of implementing the changes suggested by the reliability optimization routine: fault tolerance and fault avoidance. Once the optimized component reliabilities have been determined, it does not matter which of the two methods is employed to realize the optimum reliability for the component in question. For example, suppose we have determined that a component must have its reliability for a certain mission time raised from 50% to 75%. The engineer must now decide how to go about implementing the increase in reliability. If the engineer decides to do this via fault avoidance, another component must be found that will perform the same function with a higher reliability. On the other hand, if the engineer decides to go the fault tolerance route, the optimized reliability can be achieved merely by placing a second identical component in parallel with the first one.
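
The fault tolerance arithmetic in this example follows from the reliability of identical units in parallel, 1 - (1 - R)^n. A minimal sketch, with hypothetical function names:

```python
def parallel_reliability(unit_reliability, n_units):
    """Reliability of n identical units in parallel (any one unit suffices)."""
    return 1.0 - (1.0 - unit_reliability) ** n_units


def units_needed(unit_reliability, target, max_units=100):
    """Smallest number of identical parallel units that reaches the target."""
    for n in range(1, max_units + 1):
        if parallel_reliability(unit_reliability, n) >= target - 1e-12:
            return n
    raise ValueError("target not reachable within max_units")


print(parallel_reliability(0.50, 2))   # two 50% units in parallel
print(units_needed(0.50, 0.75))        # units required to reach 75%
```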

Obviously, the method of implementing the reliability optimization is going to be related to the cost, and this is something the reliability engineer must take into account when deciding on what type of cost function is going to be used for the optimization. In fact, if we take a closer look at the fault tolerance scheme, we can see some parallels with the general behavior cost model included in BlockSim.

For example, consider a system that initially consists of a single unit. The cost of that unit, including all associated mounting and hardware costs, is one dollar. The reliability of this unit for a given mission time is 30%. It has been determined that this is inadequate and that a second component is to be added in parallel to boost the reliability. The reliability for the two-unit parallel system is RS = 1 - (1 - 0.3)^2 = 0.51, or 51%. So, the reliability has increased by 21 percentage points, and the cost has increased by one dollar. In a similar fashion, we can continue to add additional units in parallel, thus increasing the reliability and the cost. We now have an array of reliability values and the associated costs that we can use to develop a cost function for this fault tolerance scheme. The following figure shows the relationship between cost and reliability for this example.
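
The array of reliability and cost values described above can be generated as follows. The article does not say how many units were considered, so the choice of six here is arbitrary.

```python
UNIT_RELIABILITY = 0.30   # reliability of one unit for the mission time
UNIT_COST = 1.0           # dollars per unit, including mounting and hardware


def redundancy_table(max_units):
    """(units, system reliability, total cost) for 1..max_units in parallel."""
    rows = []
    for n in range(1, max_units + 1):
        reliability = 1.0 - (1.0 - UNIT_RELIABILITY) ** n
        rows.append((n, reliability, n * UNIT_COST))
    return rows


table = redundancy_table(6)
for n, reliability, cost in table:
    print(f"{n} unit(s): R = {reliability:.4f}, cost = ${cost:.2f}")
```

Each additional unit costs the same dollar but buys a smaller reliability gain than the one before it, so cost climbs steeply as reliability approaches 1.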

As can be seen, this looks quite similar to the general behavior cost model discussed earlier. In fact, the standard regression analysis function available in Weibull++ indicates that an exponential model fits this cost model quite well. The function is given by the equation

C(R) = 0.3756 · e^(3.1972 · R)

where C is the cost in dollars and R is the fractional reliability value. Thus, it is apparent that using an exponential model to represent the general relationship between cost and reliability is valid.
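
As a rough sanity check, the fitted function can be compared with the actual cost of the parallel-unit scheme. The sketch below reads the equation as C(R) = 0.3756 · e^(3.1972 · R) and evaluates it for one to eight units; the number of units is an arbitrary choice made here, since the article does not state which data points were used in the regression.

```python
from math import exp


def fitted_cost(reliability):
    """Exponential fit quoted in the article: C(R) = 0.3756 * e^(3.1972 * R)."""
    return 0.3756 * exp(3.1972 * reliability)


comparison = []
for n in range(1, 9):
    reliability = 1.0 - 0.7 ** n   # n identical 30% units in parallel
    actual = float(n)              # one dollar per unit
    comparison.append((n, reliability, actual, fitted_cost(reliability)))

for n, reliability, actual, fitted in comparison:
    print(f"{n} unit(s): R = {reliability:.4f}, "
          f"actual = ${actual:.2f}, fitted = ${fitted:.2f}")
```

Over this range the fitted values stay within roughly ten percent of the actual costs, which supports the exponential form as a reasonable approximation for this example.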