Reliability HotWire

Issue 112, June 2010

Hot Topics

Build Equivalent Single System During Reliability Growth Testing

[Editor's Note: This article has been updated since its original publication to reflect a more recent version of the software interface.]

During reliability growth testing, it is common to test multiple systems concurrently. The cumulative test hours and failure information from all the systems under test are then used for reliability growth modeling. For this analysis, it is critical to track the cumulative test hours that all the systems accumulate under the same configuration (i.e., with the same "fixes" implemented). When a failure occurs during reliability growth testing, a fix is implemented not only on the system that had the failure but also on the other systems in the same test. Ideally, all the systems would be stopped and the same fix implemented before the test resumes. In reality, however, the same fix is rarely implemented simultaneously on every system. Delays are common, and they complicate the calculation of cumulative test hours. In this article, we illustrate how to correctly obtain the cumulative test hours for each failure mode and use them to build an equivalent single system (ESS) for reliability growth modeling and prediction using RGA.

Example

Assume four systems are each tested for 500 hours. Failures (F) and implemented fixes (I) are recorded in the following table. Three failure modes are observed: BC1, BC2 and BC3. The fix for the BC1 failure mode is implemented at different times on different systems. Each occurrence of the BC1 failure mode is identified in the final column with a unique number for easy reference later in the article. The question engineers face is: how can you build an equivalent single system from these data?

System ID   Event   Time to Event (hours)   Failure Mode   Failure ID for BC1
System 1    F       100                     BC1            #1
System 1    I       200                     BC1
System 1    F       250                     BC1            #2
System 1    I       300                     BC1
System 1    F       350                     BC1            #3
System 1    F       450                     BC1            #4
System 2    F       280                     BC1            #5
System 2    I       300                     BC1
System 2    F       380                     BC1            #6
System 2    I       400                     BC1
System 2    F       480                     BC1            #7
System 3    F       290                     BC1            #8
System 3    F       390                     BC1            #9
System 3    I       400                     BC1
System 3    F       490                     BC1            #10
System 3    I       500                     BC1
System 4    F       5                       BC2
System 4    F       12                      BC3
System 4    F       270                     BC1            #11
System 4    F       370                     BC1            #12
System 4    F       470                     BC1            #13
System 4    I       500                     BC1
System 4    I       500                     BC2
System 4    I       500                     BC3

To calculate the cumulative event times for the equivalent single system (ESS), we assume the failure modes are independent of each other. The cumulative test time is calculated for each failure mode separately, and the results are then combined to build the equivalent single system. Figure 1 shows the event times for BC1.

Figure 1: Event Times for Failure Mode BC1

For each BC1 failure, the corresponding cumulative time for the ESS is calculated below.

  1. For failure #1 at 100 hours on system 1 (S1), the cumulative test time is 100 x 4 = 400. This is because all four systems have the same configuration up to 100 hours, so when the failure occurs at 100 hours, the four systems together have accumulated 400 test hours.
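  2. Failure #2 at 250 hours (S1) occurs 50 hours after the fix was implemented on S1 at 200 hours, and no other system has received the fix yet. Its time on the ESS is therefore 1400 (I) + 50 (S1) = 1450, where 1400 is the ESS time of the BC1 fix calculated below.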
  3. For failure #3 at 350 hours (S1), its time on the ESS is 1400 (I) + 150 (S1) + 50 (S2) + 0 (S3) + 0 (S4) = 1600.
  4. For failure #4 at 450 hours (S1), its time on the ESS is 1400 (I) + 250 (S1) + 150 (S2) + 50 (S3) + 0 (S4) = 1850.
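  5. For failure #5 at 280 hours (S2), its time on the ESS is 200 (S1) + 280 x 3 = 1040. The failure occurs before the I event on S2, so S2, S3 and S4 each contribute 280 hours, while S1, which received the fix at 200 hours, contributes only the 200 hours it ran before its fix.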
  6. For failure #6 at 380 hours (S2), its time on the ESS is 1400 (I) + 180 (S1) + 80 (S2) + 0 (S3, S4) = 1660.
  7. For failure #7 at 480 hours (S2), its time on the ESS is 1400 (I) + 280 (S1) + 180 (S2) + 80 (S3) + 0 (S4) = 1940.
  8. For failure #8 at 290 hours (S3), its time on the ESS is 290 (S3) + 290 (S4) + 290 (S2) + 200 (S1) = 1070.
  9. For failure #9 at 390 hours (S3), its time on the ESS is 390 (S3) + 390 (S4) + 300 (S2) + 200 (S1) = 1280.
  10. For failure #10 at 490 hours (S3), its time on the ESS is 1400 (I) + 90 (S3) + 190 (S2) + 290 (S1) = 1970.
  11. For failure #11 at 270 hours (S4), its time on the ESS is 270 (S4) + 270 (S3) + 270 (S2) + 200 (S1) = 1010.
  12. For failure #12 at 370 hours (S4), its time on the ESS is 370 (S4) + 370 (S3) + 300 (S2) + 200 (S1) = 1240.
  13. For failure #13 at 470 hours (S4), its time on the ESS is 470 (S4) + 400 (S3) + 300 (S2) + 200 (S1) = 1370.
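To cross-check the 13 values above, the steps can be sketched in Python. This is a minimal sketch, not RGA's implementation. It assumes that only each system's first BC1 fix matters (200, 300, 400 and 500 hours on S1 to S4), and `FIRST_FIX`, `FAILURES` and `ess_failure_time` are hypothetical names. A failure on a system that has not yet received the fix counts each system's hours up to the failure or up to that system's fix, whichever comes first. A failure on a system that already has the fix adds each system's hours since its own fix to the 1400-hour ESS fix time calculated in the next step.

```python
# First BC1 fix (I event) time on each system, in hours (from the table).
FIRST_FIX = {1: 200, 2: 300, 3: 400, 4: 500}

# BC1 failures: failure ID -> (system, time to failure in hours).
FAILURES = {
    1: (1, 100), 2: (1, 250), 3: (1, 350), 4: (1, 450),
    5: (2, 280), 6: (2, 380), 7: (2, 480),
    8: (3, 290), 9: (3, 390), 10: (3, 490),
    11: (4, 270), 12: (4, 370), 13: (4, 470),
}


def ess_failure_time(system, t, first_fix):
    """Cumulative ESS time of a failure at time t on the given system."""
    if t <= first_fix[system]:
        # Failed system still has the old configuration: each system
        # contributes its hours up to t, or up to its own fix if earlier.
        return sum(min(t, fix) for fix in first_fix.values())
    # Failed system already has the fix: start from the ESS fix time and add
    # each system's hours since its own fix (0 if it is not yet fixed).
    ess_fix_time = sum(first_fix.values())
    return ess_fix_time + sum(max(0, t - fix) for fix in first_fix.values())


ess = {fid: ess_failure_time(s, t, FIRST_FIX) for fid, (s, t) in FAILURES.items()}
print(ess)
# {1: 400, 2: 1450, 3: 1600, 4: 1850, 5: 1040, 6: 1660, 7: 1940,
#  8: 1070, 9: 1280, 10: 1970, 11: 1010, 12: 1240, 13: 1370}
```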

For each BC1 fix, the corresponding cumulative time for the ESS is calculated below.

  1. For the fix at 200 hours (S1), the same fix is implemented at 300 hours on system 2 (S2), at 400 hours on system 3 (S3) and at 500 hours on system 4 (S4). The cumulative time before the fix is therefore 200 + 300 + 400 + 500 = 1400. This means the four systems accumulated 1400 operating hours under the same configuration before the fix.
  2. The second fix is implemented at 300 hours on S1, at 400 hours on S2 and at 500 hours on S3, but these fixes are not used in the ESS calculation, because the Crow Extended model does not need to know when they were implemented. More information on recurring failures of the same mode after an I event will be covered in a future article.
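The fix calculation reduces to summing each system's first I time. Here is a minimal Python sketch of that step. It assumes the BC1 I events are listed per system as in the table, and it ignores later I events on the same system (the second fix), as described above. `BC1_FIXES` and `ess_fix_time` are hypothetical names.

```python
# All BC1 I events per system, in hours (from the table).
BC1_FIXES = {1: [200, 300], 2: [300, 400], 3: [400, 500], 4: [500]}


def ess_fix_time(fixes_by_system):
    """ESS time of a fix: sum of each system's test hours before its first I event.

    Later I events for the same mode (e.g. the second BC1 fix) are ignored.
    """
    return sum(min(times) for times in fixes_by_system.values())


print(ess_fix_time(BC1_FIXES))  # 1400
```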

Since we assume the failure modes are independent of each other, the same procedure is used to obtain the cumulative event times for BC2 and BC3.

  1. For BC2 failure at 5 hours (S4), the cumulative time is 5 x 4 = 20.
  2. For BC3 failure at 12 hours (S4), the cumulative time is 12 x 4 = 48.

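Finally, for the BC2 and BC3 I events at 500 hours (S4), the cumulative time is 500 x 4 = 2000. These fixes were implemented only on S4, so all four systems ran the full 500 hours in the original configuration for these two modes.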
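For BC2 and BC3, only S4 has an I event, and it occurs at the end of the 500-hour test. One way to reuse the same logic is to treat a system with no I event for a mode as staying in the original configuration until the end of the test. This sketch makes that assumption. With this convention, the failure times reduce to t x 4 and the fix time to 500 x 4. `TEST_END`, `first_fix_with_default` and `ess_pre_fix_failure_time` are hypothetical names.

```python
TEST_END = 500  # hours; assumed: every system ran the full 500-hour test
SYSTEMS = [1, 2, 3, 4]


def first_fix_with_default(fixes, systems, test_end):
    """First I time per system; systems never fixed count up to the test end."""
    return {s: fixes.get(s, test_end) for s in systems}


def ess_pre_fix_failure_time(t, first_fix):
    """ESS time of a failure on a system that has not yet received the fix."""
    return sum(min(t, fix) for fix in first_fix.values())


# Only S4 receives the BC2 and BC3 fixes, both at 500 hours.
bc2_fix = first_fix_with_default({4: 500}, SYSTEMS, TEST_END)
bc3_fix = first_fix_with_default({4: 500}, SYSTEMS, TEST_END)

print(ess_pre_fix_failure_time(5, bc2_fix))   # 20 (BC2 failure on S4)
print(ess_pre_fix_failure_time(12, bc3_fix))  # 48 (BC3 failure on S4)
print(sum(bc2_fix.values()), sum(bc3_fix.values()))  # 2000 2000
```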

The calculations above follow one basic rule: accumulate test hours only from systems with the same configuration. Several general rules for BC modes are summarized below:

  1. Each failure time for a BC mode that occurs before any implemented fix (I event) for that mode is calculated by multiplying the failure time by the total number of systems under test.
  2. The implemented fix (I event) time in the equivalent single system is calculated by adding the test time invested in each system before that I event takes place. It is the total time the systems have spent in the same configuration with respect to that specific mode.
  3. When a fix has been implemented in one or more systems (I event) and the same BC mode then occurs in a system that has not yet received the fix, the failure time in the equivalent single system is calculated by adding the test time until this failure on the failed system and, for each of the other systems, one of the following:
    1. The test time until that system's I event, if the I event occurred earlier than this failure.
    2. The time of this failure, if that system's I event occurred later than this failure or the system had no I event for that BC mode.
  4. When a fix for a mode has been implemented (I event) and the same BC mode then occurs in a system that has already received the fix, the failure time in the equivalent single system is calculated by adding, to the equivalent I event time, the test time each system has accumulated since its own I event (zero for systems not yet fixed).
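Rules 1 to 4 can also be written compactly. The notation below is introduced here for illustration and does not come from the article. N is the number of systems, τ_j is the time of system j's first I event for the mode, and a failure of that mode occurs at time t on system s. We assume that τ_j equals the end-of-test time for a system that never receives the fix, which reproduces the 2000-hour BC2 and BC3 fix times.

```latex
% tau_j: time of system j's first I event for the mode (end of test if none)
% t: failure time on system s;  N: number of systems
\begin{align*}
\tau_{\mathrm{ESS}} &= \sum_{j=1}^{N} \tau_j
  && \text{(rule 2)} \\
t_{\mathrm{ESS}} &= \sum_{j=1}^{N} \min(t,\ \tau_j)
  && t \le \tau_s \quad \text{(rules 1 and 3)} \\
t_{\mathrm{ESS}} &= \tau_{\mathrm{ESS}} + \sum_{j=1}^{N} \max(0,\ t - \tau_j)
  && t > \tau_s \quad \text{(rule 4)}
\end{align*}
```

When no system has been fixed yet, every min(t, τ_j) equals t, and the second line reduces to N t, which is rule 1. For failure #6 (t = 380 on S2, with τ = 200, 300, 400, 500), the third line gives 1400 + 180 + 80 + 0 + 0 = 1660.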

Building the ESS by hand becomes tedious when the number of systems and the number of failures are large. Fortunately, RGA implements all of these calculations in the Multiple Systems with Event Code data type. This new data type, and the ability to transfer its data to the ESS data type, will be available in version 7.5.1 or higher, which will be released by ReliaSoft in June 2010. Licensed users can obtain the latest service release from http://RGA.ReliaSoft.com/updates.htm.
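For readers who want to automate the same bookkeeping outside RGA, here is a Python sketch. It is not RGA's implementation. It applies the rules above to every mode in the example table and makes two assumptions: only each system's first I event matters, and a system without an I event for a mode keeps the original configuration until the end of the 500-hour test. It returns a time-sorted list of ESS events and keeps the original BC labels. `build_ess` and the other names are hypothetical.

```python
from collections import defaultdict

TEST_END = 500  # hours; assumed: all four systems ran the full test
SYSTEMS = [1, 2, 3, 4]

# (system, event, time to event in hours, mode); F = failure, I = implemented fix
EVENTS = [
    (1, "F", 100, "BC1"), (1, "I", 200, "BC1"), (1, "F", 250, "BC1"),
    (1, "I", 300, "BC1"), (1, "F", 350, "BC1"), (1, "F", 450, "BC1"),
    (2, "F", 280, "BC1"), (2, "I", 300, "BC1"), (2, "F", 380, "BC1"),
    (2, "I", 400, "BC1"), (2, "F", 480, "BC1"),
    (3, "F", 290, "BC1"), (3, "F", 390, "BC1"), (3, "I", 400, "BC1"),
    (3, "F", 490, "BC1"), (3, "I", 500, "BC1"),
    (4, "F", 5, "BC2"), (4, "F", 12, "BC3"), (4, "F", 270, "BC1"),
    (4, "F", 370, "BC1"), (4, "F", 470, "BC1"),
    (4, "I", 500, "BC1"), (4, "I", 500, "BC2"), (4, "I", 500, "BC3"),
]


def build_ess(events, systems, test_end):
    """Return the ESS as a time-sorted list of (ESS time, event, mode) tuples."""
    by_mode = defaultdict(list)  # modes are treated independently
    for system, event, t, mode in events:
        by_mode[mode].append((system, event, t))

    ess = []
    for mode, mode_events in by_mode.items():
        # tau[j]: system j's first I time for this mode; systems that never
        # receive the fix are treated as unchanged until the end of the test.
        tau = {j: test_end for j in systems}
        has_fix = False
        for system, event, t in mode_events:
            if event == "I":
                tau[system] = min(tau[system], t)
                has_fix = True
        ess_fix = sum(tau.values())  # rule 2

        for system, event, t in mode_events:
            if event != "F":
                continue
            if t <= tau[system]:
                # Rules 1 and 3: the failed system still has the old configuration.
                t_ess = sum(min(t, tj) for tj in tau.values())
            else:
                # Rule 4: the failed system already has the fix.
                t_ess = ess_fix + sum(max(0, t - tj) for tj in tau.values())
            ess.append((t_ess, "F", mode))
        if has_fix:
            ess.append((ess_fix, "I", mode))
    return sorted(ess)


for t_ess, event, mode in build_ess(EVENTS, SYSTEMS, TEST_END):
    print(f"{t_ess:>5}  {event}  {mode}")
```

The printed list starts with the BC2 failure at 20 and the BC3 failure at 48. It continues with the thirteen BC1 failures and the BC1 fix at 1400, and it ends with the BC2 and BC3 fixes at 2000.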

The data in the above table can be entered into RGA as shown in Figure 2.

Figure 2: Multiple Systems with Event Code

RGA can also display the failure times of each system and of the corresponding ESS graphically. This plot is called the System Operation plot and is shown in Figure 3.

Figure 3: System Operation Plot

The values of the points on the "Equivalent" line in the plot are the same as the values we have calculated manually. Click the Transfer to New Data Type icon and select Equivalent Single System as shown in Figure 4.

Figure 4: Transfer Data Type Window

The transferred data appears in a separate worksheet, as shown in Figure 5.

Figure 5: Transferred Data

The values in Figure 5 are indeed the same as the values we calculated manually. Note that all the BC modes were renamed to BD modes. This is because all the fixes are delayed fixes (not implemented at the time of the failure). For more detail on the definitions of the failure mode classifications, please refer to [1]. Using the newly obtained ESS, we can build the model and make predictions. All the calculations in the Multiple Systems with Event Code data type are based on the ESS.

Conclusion

In this article, we illustrated how to correctly calculate the cumulative test hours when multiple systems are tested in the same reliability growth test. For this analysis, it is critical to count the test hours that the systems accumulate under the same configuration. We used an example to demonstrate the step-by-step calculation of the ESS event times and showed how to obtain the same ESS in RGA. Once the ESS has been obtained, we can use it to build a reliability growth model and calculate reliability metrics such as the demonstrated MTBF, demonstrated failure intensity and growth rate.

References

[1] ReliaSoft Corporation. "Crow Extended Model." ReliaSoft Corporation. 2010. http://reliawiki.org/index.php/Crow_Extended