A Blueprint for Implementing A Comprehensive Reliability Engineering Program
Section 5 of 7: Data Collection
Data collection is the framework for a good reliability engineering program. Meaningful reliability reports depend on an accurate and comprehensive system for recording data on a product's reliability performance. Although the nature of the data collection system will differ based on the type of data being collected, certain elements must be common to the type of data collected and the way the information is recorded. These common elements provide the continuity needed to develop an accurate assessment of the "cradle-to-grave" reliability of a product.
Whenever possible, computers should be employed in the data collection and recording process. Of course, the method of data collection will vary with the product under consideration, but given the decreasing cost and increasing power of computer systems, it should not be very difficult to set up a computerized data collection system. In some cases, it is even possible to automate much of the data collection process, thus further decreasing the potential for data recording errors.
In a previous section, the different types of in-house reliability testing were discussed. One of the most important aspects of setting up an in-house reliability testing program lies in having certain common elements that extend across all of the different types of tests. This can be aided greatly by a degree of uniformity in the data collection process. Having a core group of data types that are collected from every test that takes place makes it easier to perform similar analyses across a variety of different test types. This lends a great deal of continuity in test analysis and reporting that will benefit the reliability program and the entire organization.
As mentioned earlier, it is highly beneficial to automate the data collection and recording process wherever possible. The method of data collection will differ from product to product, depending on whether it is possible to use a computer interface to operate the product during the test and automatically record the test results. Regardless of the method employed in running the test, it is always possible and advisable to use a database system to keep track of the test results. Relational databases make it fairly easy to manipulate large quantities of data, and they greatly aid the reporting process. With a properly managed database, it is even possible to automate some, if not all, of the data reporting process. Of course, human oversight is always a necessity when dealing with data analysis of any type, but proper use of database structuring and manipulation can make the reliability engineer's job much easier.
In setting up a database system for collecting data from in-house reliability testing, there are a minimum number of data types that need to be included in the database structure. For the purposes of in-house reliability data collection, it is recommended to have at least three related databases: a test log, a failure log, and a service log. Detailed descriptions of these databases and the information they should contain appear below.
The test log contains detailed information on the tests being run on the products. The structure of the database will vary depending on the testing procedures and the type of products for which data are being captured. If the product requires a test in which the test units are essentially just turned on and left to run until they fail or the time of the test expires, the test log will be fairly simple. However, if the product requires a variety of different inputs in order to be properly exercised during testing, the test log should be detailed enough to record all of the pertinent information. A suggested list of fields for the test log includes:
- Transaction number: a unique identification code for the test log entry.
- Test start date: the date the test starts.
- Test start time: the time the test starts.
- Test name: the name or identifier for the test being run.
- Test stage or step: if the test is run in a series of stages or steps with different inputs, this field should provide a description or count of which segment of the test is being run, e.g., "Step 2," "High Temperature," etc.
- Test inputs: this describes the test inputs at each stage or step of the test. Depending on the nature of the product and the testing, it may be necessary to create a separate log for the specific steps and inputs of the test in order to keep the test log from being too cluttered with specific step/input information.
- Operator comments: specific comments regarding the test that may be useful when performing subsequent analyses.
- Deviations: descriptions of deviations from the original test plan.
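As an illustration, the suggested test log fields could be stored in a small relational table. The following is a minimal sketch using Python's built-in sqlite3 module; the table name, column names, and the log_test helper are assumptions made for this example and are not prescribed by the field list above.

```python
import sqlite3

# In-memory database for illustration; a real system would use a file or server.
conn = sqlite3.connect(":memory:")

# Hypothetical column names mirroring the suggested test log fields.
conn.execute(
    """
    CREATE TABLE test_log (
        transaction_no  INTEGER PRIMARY KEY,  -- unique ID for the entry
        start_date      TEXT NOT NULL,        -- ISO date, e.g. 2016-01-15
        start_time      TEXT NOT NULL,        -- 24-hour time, e.g. 08:30
        test_name       TEXT NOT NULL,
        test_stage      TEXT,                 -- e.g. 'Step 2', 'High Temperature'
        test_inputs     TEXT,
        operator_notes  TEXT,
        deviations      TEXT
    )
    """
)


def log_test(conn, start_date, start_time, test_name,
             stage=None, inputs=None, notes=None, deviations=None):
    """Insert one test log entry and return its transaction number."""
    cur = conn.execute(
        "INSERT INTO test_log (start_date, start_time, test_name, test_stage,"
        " test_inputs, operator_notes, deviations) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (start_date, start_time, test_name, stage, inputs, notes, deviations),
    )
    conn.commit()
    return cur.lastrowid


# Example entry with made-up values.
first_id = log_test(conn, "2016-01-15", "08:30", "Burn-in",
                    stage="Step 1", inputs="ambient temperature")
print("Logged test entry", first_id)
```

Letting the database assign the transaction number keeps it unique without relying on the operator, which is one way to reduce recording errors.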
The failure log is where the majority of the information that is important to the generation of reliability results will reside. Care should be taken in the construction of this database so that all of the pertinent failure information will be collected every time a failure occurs. At the same time, it should not have so many fields that it becomes unwieldy when conducting a reliability analysis. Having to record a large amount of minutely detailed information for each failure can also slow down the overall testing process. Developing a method of automating the data collection process will alleviate this problem, but that is not always possible. A suggested list of fields for the failure log includes:
- Transaction number: a unique identification code for the failure log entry.
- Test log cross-reference: the transaction number for the test log entry that corresponds to the test on which the failure occurred.
- Service log cross-reference: the transaction number for the most recent service log entry.
- Failure date: the date when the failure occurred.
- Failure time: the time when the failure occurred.
- Failure type: describes the failure type encountered, particularly if a multi-tiered system of failure classification is being used.
- Test stage: the stage or step of the test when the failure occurred. This can be cross-referenced to the appropriate test or input log.
- Symptom code: the symptom noticed by the operator when the failure occurred. The type of symptom code can be cross-linked to a preliminary failure code.
- Failure code: describes the actual mode of failure. A preliminary failure code can be generated based on the symptom code, but a failure analysis engineer should make the final disposition of the failure code.
- Failed part ID: this describes the part or parts that caused the failure. If possible, the failed part serial number should be included as a separate field.
- Resolution: this field describes what action was taken to restore the failed unit to operational status. This field should be cross-linked to the service log.
- Comments: specific comments regarding the failure that may be useful when performing subsequent analyses.
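The relationship between the symptom code and the failure code described in the list above can be sketched as a simple lookup that supplies a preliminary code, which a failure analysis engineer later confirms or overrides. The code values, the FailureEntry class, and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical symptom-to-preliminary-failure-code table, for illustration only.
PRELIMINARY_CODE_BY_SYMPTOM = {
    "NO_POWER": "F-PSU",
    "OVERHEAT": "F-FAN",
}


@dataclass
class FailureEntry:
    transaction_no: int
    test_log_ref: int              # cross-reference to the test log entry
    symptom_code: str              # what the operator observed
    failure_code: Optional[str] = None
    code_is_final: bool = False    # True once an engineer has dispositioned it

    def __post_init__(self):
        # Fill in a preliminary failure code from the symptom if none was given.
        if self.failure_code is None:
            self.failure_code = PRELIMINARY_CODE_BY_SYMPTOM.get(
                self.symptom_code, "F-UNKNOWN"
            )

    def finalize(self, code: str) -> None:
        """Record the failure analysis engineer's final disposition."""
        self.failure_code = code
        self.code_is_final = True


entry = FailureEntry(transaction_no=1, test_log_ref=1, symptom_code="OVERHEAT")
preliminary = entry.failure_code      # preliminary code derived from the symptom
entry.finalize("F-BEARING")           # engineer overrides after analysis
```

Keeping a flag that distinguishes preliminary from final codes makes it possible to exclude undispositioned failures from an analysis, or to report them separately.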
The purpose of the service log is to track and record any service actions or modifications performed on test units. It is important to keep a concise record of any service actions performed on test units because even a relatively small modification or repair can potentially have a large effect on the performance of the test units. By requiring service technicians and engineers to use a service log whenever they work on a test unit, the amount of unofficial "tinkering" with a system will be minimized, thus reducing unexplained changes in test unit performance. A service log entry should be made whenever a test unit is installed or upgraded. This allows for tracking design level or version number changes across the tests. A suggested list of fields for the service log includes:
- Transaction number: a unique identification code for the service log entry.
- Test log cross-reference: the transaction number for the test log entry that corresponds to the test during which the service was performed.
- Service date: the date on which the service was performed.
- Service time: the time at which the service was performed.
- Current version identifier: identifies the revision or design level of the test unit before the service is performed.
- New version identifier: identifies the revision or design level of the test unit after the service is performed. This will be the same as the current version identifier unless the service performed upgrades the test unit to the next level.
- Service type: describes the service performed.
- Part modified/replaced: a description/serial number of the part modified or replaced during the service action.
- Comments: specific comments regarding the service action that may be useful when performing subsequent analyses.
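To show how the cross-reference fields tie the three logs together, the sketch below builds reduced versions of the test, failure, and service logs in SQLite and counts failures by the design version in place when each failure occurred. All table names, column names, and data values are assumptions for this example, and only a subset of the suggested fields is included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE test_log (
        transaction_no INTEGER PRIMARY KEY,
        test_name      TEXT NOT NULL
    );
    CREATE TABLE service_log (
        transaction_no  INTEGER PRIMARY KEY,
        test_log_ref    INTEGER REFERENCES test_log(transaction_no),
        current_version TEXT NOT NULL,   -- version before the service
        new_version     TEXT NOT NULL,   -- version after the service
        service_type    TEXT
    );
    CREATE TABLE failure_log (
        transaction_no  INTEGER PRIMARY KEY,
        test_log_ref    INTEGER REFERENCES test_log(transaction_no),
        service_log_ref INTEGER REFERENCES service_log(transaction_no),
        failure_code    TEXT
    );
    """
)

# Made-up sample data: one test, an install at version A, an upgrade to B.
conn.execute("INSERT INTO test_log VALUES (1, 'Life test')")
conn.executemany(
    "INSERT INTO service_log VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "A", "A", "install"), (2, 1, "A", "B", "upgrade")],
)
# Each failure points at the most recent service log entry when it occurred.
conn.executemany(
    "INSERT INTO failure_log VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "F-01"), (2, 1, 1, "F-02"), (3, 1, 2, "F-01")],
)

# Count failures by the design version the unit had when it failed.
failures_by_version = conn.execute(
    """
    SELECT s.new_version, COUNT(*)
    FROM failure_log AS f
    JOIN service_log AS s ON s.transaction_no = f.service_log_ref
    GROUP BY s.new_version
    ORDER BY s.new_version
    """
).fetchall()
print(failures_by_version)
```

Because each failure entry references the most recent service entry, the version at the time of failure can be recovered with a single join rather than by comparing dates by hand.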
Depending on the circumstances, collection of field data for reliability analyses can be either a simple matter or a major headache. Even if there is not a formal field data collection system in place, odds are that much of the necessary general information is being collected already in order to track warranty costs, financial information, etc. The potential drawback is that the data collection system may not be set up to collect all of the types of data necessary to perform a thorough reliability analysis. As mentioned earlier, many field data collection methodologies focus on aspects of the field performance other than reliability. Usually, it is a small matter to modify data collection processes to gather the necessary reliability information.
For example, in one instance the field repair personnel were only collecting information specific to the failure of the system and what they did to correct the fault. No information was being collected on the time accumulated on the systems at the time of failure. Fortunately, it was a simple matter to have the service personnel access the usage information, which was stored on a computer chip in the system. This information was then included with the rest of the data collected by the service technician, which allowed for a much greater resolution in the failure times used in the calculation of field reliability. Previously, the failure time was calculated by subtracting the date the product was shipped from the failure date. This could cause problems because the product could remain unused for months after it was shipped, so the calculated time could greatly overstate the actual time in use. By adding the relatively small step of requiring the service technicians to record the accumulated use time at failure, a much more accurate model of the field reliability of this unit could be made.
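The difference between the two ways of obtaining a failure time in the example above can be illustrated with a short calculation. The dates and the recorded usage value are made up, and converting calendar days to hours at 24 hours per day is an assumption that treats the unit as running continuously from the day it shipped.

```python
from datetime import date

HOURS_PER_DAY = 24.0  # assumption: continuous operation since shipment


def calendar_hours(ship_date: date, failure_date: date) -> float:
    """Failure time estimated as failure date minus ship date, in hours."""
    return (failure_date - ship_date).days * HOURS_PER_DAY


# Hypothetical field record.
ship_date = date(2015, 1, 1)
failure_date = date(2015, 7, 1)
recorded_usage_hours = 1200.0   # read from the unit's usage counter

estimate = calendar_hours(ship_date, failure_date)
overstatement = estimate - recorded_usage_hours

print(f"Calendar-based estimate: {estimate:.0f} h")
print(f"Recorded usage at failure: {recorded_usage_hours:.0f} h")
print(f"Overstatement: {overstatement:.0f} h")
```

With these illustrative numbers the calendar-based estimate is several times the recorded usage, which is the kind of distortion the added usage reading removes.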
Another difficulty in using field data to perform reliability analyses is that the data may reside in different places, and in very different forms. The field service data, customer support data, and failure analysis data may be in different databases, each of which may be tailored to the specific needs of the group recording the data. The challenge in this case is in developing a method of gathering all of the pertinent data from the various sources and databases and pulling it into one central location where it can easily be processed and analyzed. While this can be challenging, the planning involved in bringing these previously separate sources of information together usually results in a beneficial synergy. By bringing together key players and acting as a catalyst to discussion and discovery, the planning process often helps an organization to gain a new understanding of its current processes and to identify untapped existing resources. Many organizations that have undergone this process have found that the fresh perspective and ability to spark communication provided by such consultations are invaluable to their organizations.
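One way to picture the consolidation step is a small routine that maps each group's field names onto a common layout and collects the records by unit serial number. The source names, field names, and sample records below are all hypothetical; real systems would read from the respective databases rather than from in-line lists.

```python
from collections import defaultdict

# Made-up records from three separate groups, each with its own field names.
field_service = [
    {"serial": "SN100", "repair_date": "2015-03-02", "action": "replaced fan"},
]
customer_support = [
    {"unit_sn": "SN100", "call_date": "2015-03-01", "complaint": "overheating"},
]
failure_analysis = [
    {"SerialNo": "SN100", "root_cause": "bearing wear"},
]

# Per-source mapping from local field names to a common layout (assumed).
FIELD_MAPS = {
    "field_service": {"serial": "serial_no", "repair_date": "date", "action": "detail"},
    "customer_support": {"unit_sn": "serial_no", "call_date": "date", "complaint": "detail"},
    "failure_analysis": {"SerialNo": "serial_no", "root_cause": "detail"},
}


def normalize(source, record):
    """Rename a record's fields to the common layout and tag its origin."""
    mapping = FIELD_MAPS[source]
    out = {"source": source}
    for key, value in record.items():
        out[mapping.get(key, key)] = value
    return out


def consolidate(sources):
    """Gather normalized records from all sources, grouped by serial number."""
    by_serial = defaultdict(list)
    for name, records in sources.items():
        for record in records:
            normalized = normalize(name, record)
            by_serial[normalized["serial_no"]].append(normalized)
    return dict(by_serial)


central = consolidate({
    "field_service": field_service,
    "customer_support": customer_support,
    "failure_analysis": failure_analysis,
})
for row in central["SN100"]:
    print(row)
```

Tagging each record with its source preserves traceability back to the originating group's database once everything sits in one place.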
Copyright © 1992- HBM Prenscia Inc. Document updated January 2016.