Reliability HotWire: eMagazine for the Reliability Professional

Issue 43, September 2004

Hot Topics

Data Collection

Data collection is the framework for a good reliability engineering program. Meaningful reliability reports require an accurate and comprehensive system for recording data on a product's reliability performance. Although the nature of the data collection system will differ based on the type of data being collected, certain elements should be common to both the types of data collected and the way the information is recorded. This provides the continuity essential for developing an accurate assessment of the "cradle-to-grave" reliability of a product.

Whenever possible, computers should be employed in the data collection and recording process. Of course, the method of data collection will vary with the product under consideration, but given the decreasing cost and increasing power of computer systems, it should not be very difficult to set up a computerized data collection system. In some cases, it is even possible to automate much of the data collection process, further decreasing the potential for data recording errors.

The concepts of in-house data collection, field data collection, and ReliaSoft's Dashboard are presented in more detail below.

In-House Test Data Collection

One of the most important aspects of setting up an in-house reliability testing program lies in having certain common elements that extend across all of the different types of tests. This can be aided greatly by a degree of uniformity in the data collection process. By having a core group of data types that are collected from every test that takes place, it is easier to perform similar analyses across a variety of different test types. This lends a great deal of continuity in test analysis and reporting that will benefit the reliability program and the entire organization.

As mentioned earlier (HotWire Issue 41, July 2004), it is highly beneficial to automate the data collection and recording process wherever possible. The method of data collection will differ from product to product, depending on whether it is possible to use a computer interface to operate the product during the test and automatically record the test results. Regardless of the method employed in running the test, it is always possible and advisable to use a database system to keep track of the test results. The use of relational databases makes it fairly easy to manipulate large quantities of data, and greatly aids in the reporting process. Properly managed, it is even possible to automate some if not all of the data reporting process using database information. Of course, human oversight is always a necessity when dealing with data analysis of any type, but proper use of database structuring and manipulation can make the reliability engineer's job much easier.

In setting up a database system for collecting data from in-house reliability testing, there are a minimum number of data types that need to be included in the database structure. For the purposes of in-house reliability data collection, it is recommended to have at least three related databases: a test log, a failure log, and a service log. Detailed descriptions of these databases and the information they should contain appear below.
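
As a rough illustration of how the three logs might relate to one another, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The table names, column names and sample rows are assumptions made for this example, and only a few fields from each log are shown. It is not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_log (
    transaction_no INTEGER PRIMARY KEY,
    test_name      TEXT NOT NULL,
    start_date     TEXT NOT NULL
);
CREATE TABLE service_log (
    transaction_no INTEGER PRIMARY KEY,
    test_log_ref   INTEGER REFERENCES test_log(transaction_no),
    service_date   TEXT NOT NULL,
    service_type   TEXT
);
CREATE TABLE failure_log (
    transaction_no  INTEGER PRIMARY KEY,
    test_log_ref    INTEGER NOT NULL REFERENCES test_log(transaction_no),
    service_log_ref INTEGER REFERENCES service_log(transaction_no),
    failure_date    TEXT NOT NULL,
    failure_code    TEXT
);
""")

# Hypothetical sample rows, for illustration only.
conn.execute("INSERT INTO test_log VALUES (1, 'Burn-in', '2004-09-01')")
conn.execute("INSERT INTO test_log VALUES (2, 'Thermal cycle', '2004-09-03')")
conn.execute("INSERT INTO service_log VALUES (1, 1, '2004-09-02', 'Fan replaced')")
conn.execute("INSERT INTO failure_log VALUES (1, 1, NULL, '2004-09-02', 'F-FAN')")
conn.execute("INSERT INTO failure_log VALUES (2, 1, 1, '2004-09-05', 'F-PSU')")

# Simple report: number of failures recorded against each test.
report = conn.execute("""
    SELECT t.test_name, COUNT(f.transaction_no)
    FROM test_log AS t
    LEFT JOIN failure_log AS f ON f.test_log_ref = t.transaction_no
    GROUP BY t.transaction_no
    ORDER BY t.transaction_no
""").fetchall()

for test_name, failure_count in report:
    print(f"{test_name}: {failure_count} failure(s)")
```

Because each failure and service entry carries the transaction number of its test log entry, a single join is enough to roll failures up by test. The LEFT JOIN keeps tests with no recorded failures in the report.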

Test Log

The test log contains detailed information on the tests being run on the products. The structure of the database will vary depending on the testing procedures and the type of products for which data are being captured. If the product requires a test in which the test units are essentially just turned on and left to run until they fail or the test time expires, the test log will be fairly simple. However, if the product requires a variety of different inputs in order to be properly exercised during testing, the test log should be detailed enough to record all of the pertinent information. A suggested list of fields for the test log includes:

  • Transaction number: a unique identification code for the test log entry.
  • Test start date: the date the test starts.
  • Test start time: the time the test starts.
  • Test name: the name or identifier for the test being run.
  • Test stage or step: if the test is run in a series of stages or steps with different inputs, this field should provide a description or count of which segment of the test is being run, e.g. "Step 2," "High Temperature," etc.
  • Test inputs: this describes the test inputs at each stage or step of the test. Depending on the nature of the product and the testing, it may be necessary to create a separate log for the specific steps and inputs of the test in order to keep the test log from being too cluttered with specific step/input information.
  • Operator comments: specific comments regarding the test that may be useful when performing subsequent analyses.
  • Deviations: descriptions of deviations from the original test plan.
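
The fields above could be represented along the following lines. This is only a sketch: the class names RunLogEntry and StepRecord, the attribute names and the sample values are assumptions made for illustration. The per-step inputs are kept in a separate list, in the spirit of the separate step/input log suggested above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class StepRecord:
    """One stage or step of a test, with a description of its inputs."""
    stage: str
    inputs: str


@dataclass
class RunLogEntry:
    """One test log entry (field names are illustrative)."""
    transaction_no: int
    start: datetime          # test start date and time
    test_name: str
    steps: List[StepRecord] = field(default_factory=list)
    operator_comments: str = ""
    deviations: str = ""


entry = RunLogEntry(
    transaction_no=101,
    start=datetime(2004, 9, 1, 8, 30),
    test_name="Thermal step-stress",
)
entry.steps.append(StepRecord("Step 1", "25 C, nominal voltage"))
entry.steps.append(StepRecord("Step 2", "High Temperature: 70 C, nominal voltage"))
entry.deviations = "Step 2 started 15 minutes late"

for step in entry.steps:
    print(entry.transaction_no, entry.test_name, step.stage, step.inputs)
```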

Failure Log

The failure log is where the majority of the information that is important to the generation of reliability results will reside. Care should be taken in the construction of this database so that all of the pertinent failure information will be collected every time a failure occurs. At the same time, it should not have so many fields that it becomes unwieldy when conducting a reliability analysis. Recording a large amount of minutely detailed information for every failure can also slow down the overall testing process. Developing a method of automating the data collection process will alleviate this problem, but that is not always possible. A suggested list of fields for the failure log includes:

  • Transaction number: a unique identification code for the failure log entry.
  • Test log cross-reference: the transaction number for the test log entry that corresponds to the test on which the failure occurred.
  • Service log cross-reference: the transaction number for the most recent service log entry.
  • Failure date: the date when the failure occurred.
  • Failure time: the time when the failure occurred.
  • Failure type: describes the failure type encountered, particularly if a multi-tiered system of failure classification is being used.
  • Test stage: the stage or step of the test when the failure occurred. This can be cross-referenced to the appropriate test or input log.
  • Symptom code: the symptom noticed by the operator when the failure occurred. The type of symptom code can be cross-linked to a preliminary failure code.
  • Failure code: describes the actual mode of failure. A preliminary failure code can be generated based on the symptom code, but a failure analysis engineer should make the final disposition of the failure code.
  • Failed part ID: this describes the part or parts that caused the failure. If possible, the failed part serial number should be included as a separate field.
  • Resolution: this field describes what action was taken to restore the failed unit to operational status. This field should be cross-linked to the service log.
  • Comments: specific comments regarding the failure that may be useful when performing subsequent analyses.
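
One way the link between symptom codes and failure codes might work is sketched below. The lookup table, the code values and the class name FailureLogEntry are hypothetical. The point is only that a preliminary code can be derived automatically from the symptom, while the code assigned by the failure analysis engineer takes precedence once it exists.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from operator symptom codes to preliminary failure codes.
PRELIMINARY_CODES = {
    "NO_POWER": "F-PSU",
    "OVERHEAT": "F-FAN",
}


@dataclass
class FailureLogEntry:
    transaction_no: int
    test_log_ref: int                  # cross-reference to the test log
    service_log_ref: Optional[int]     # most recent service log entry, if any
    symptom_code: str
    final_failure_code: Optional[str] = None  # set by the failure analysis engineer

    @property
    def failure_code(self) -> str:
        """Final code if one has been assigned, otherwise the preliminary code."""
        if self.final_failure_code is not None:
            return self.final_failure_code
        return PRELIMINARY_CODES.get(self.symptom_code, "UNCLASSIFIED")


failure = FailureLogEntry(1, test_log_ref=101, service_log_ref=None,
                          symptom_code="OVERHEAT")
print(failure.failure_code)            # preliminary code from the symptom

failure.final_failure_code = "F-THERMISTOR"
print(failure.failure_code)            # engineer's final disposition
```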

Service Log

The purpose of the service log is to track and record any service actions or modifications performed on test units. It is important to keep a concise record of any service actions performed on test units because even a relatively small modification or repair can potentially have a large effect on the performance of the test units. By requiring service technicians and engineers to use a service log whenever they work on a test unit, the amount of unofficial "tinkering" with a system will be minimized, thus reducing unexplained changes in test unit performance. A service log entry should be made whenever a test unit is installed or upgraded. This allows for tracking design level or version number changes across the tests. A suggested list of fields for the service log includes:

  • Transaction number: a unique identification code for the service log entry.
  • Test log cross-reference: the transaction number for the test log entry that corresponds to the test during which the service was performed.
  • Service date: the date when the service was performed.
  • Service time: the time when the service was performed.
  • Current version identifier: identifies the revision or design level of the test unit before the service is performed.
  • New version identifier: identifies the revision or design level of the test unit after the service is performed. This will be the same as the current version identifier unless the performed service upgrades the test unit to the next level.
  • Service type: describes the service performed.
  • Part modified/replaced: a description/serial number of the part modified or replaced during the service action.
  • Comments: specific comments regarding the service action that may be useful when performing subsequent analyses.
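
Recording both the current and new version identifiers makes it straightforward to reconstruct when a test unit changed design level. The sketch below assumes a simplified ServiceLogEntry record and a helper named version_changes. Both are illustrative rather than part of any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class ServiceLogEntry:
    transaction_no: int
    service_date: date
    current_version: str   # design level before the service
    new_version: str       # design level after the service
    service_type: str


def version_changes(entries: List[ServiceLogEntry]) -> List[Tuple[date, str, str]]:
    """Return (date, old version, new version) for each service that changed the version."""
    ordered = sorted(entries, key=lambda e: (e.service_date, e.transaction_no))
    return [
        (e.service_date, e.current_version, e.new_version)
        for e in ordered
        if e.new_version != e.current_version
    ]


# Hypothetical service history for one test unit.
service_log = [
    ServiceLogEntry(1, date(2004, 9, 1), "A", "A", "Unit installed"),
    ServiceLogEntry(2, date(2004, 9, 10), "A", "A", "Fan replaced"),
    ServiceLogEntry(3, date(2004, 9, 20), "A", "B", "Firmware upgrade"),
]

changes = version_changes(service_log)
for changed_on, old, new in changes:
    print(f"{changed_on}: version {old} -> {new}")
```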

Field Data Collection

Depending on the circumstances, collection of field data for reliability analyses can be either a simple matter or a major headache. Even if there is not a formal field data collection system in place, odds are that much of the necessary general information is being collected already in order to track warranty costs, financial information, etc. The potential drawback is that the data collection system may not be set up to collect all of the types of data necessary to perform a thorough reliability analysis. As mentioned earlier, many field data collection methodologies focus on aspects of the field performance other than reliability. Usually, it is a small matter to modify data collection processes to gather the necessary reliability information.

For example, in one instance the field repair personnel were collecting only information specific to the failure of the system and what they did to correct the fault. No information was being collected on the time accumulated on the systems at the time of failure. Fortunately, it was a simple matter to have the service personnel access the usage information, which was stored on a computer chip in the system. This information was then included with the rest of the data collected by the service technician, which allowed for a much greater resolution in the failure times used in the calculation of field reliability. Previously, the failure time was calculated by subtracting the date the product was shipped from the failure date. This could cause problems in that the product could remain unused for months after it was shipped. By adding the relatively small step of requiring the service technicians to record the accumulated use time at failure, a much more accurate model of the field reliability of this unit could be made.
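
The difference between the two approaches can be illustrated with a small calculation. The dates and the usage reading below are invented for the example. The calendar-based figure is only an upper bound on operating time, since it assumes the unit ran continuously from the day it shipped.

```python
from datetime import date

# Hypothetical field record for a single failed unit.
ship_date = date(2004, 1, 15)
failure_date = date(2004, 7, 13)
usage_hours_at_failure = 310.0   # assumed reading from the unit's usage counter

# Calendar-based estimate: failure date minus ship date.
calendar_days = (failure_date - ship_date).days
calendar_hours = calendar_days * 24   # upper bound: assumes continuous use

print(f"Calendar time to failure: {calendar_days} days "
      f"(at most {calendar_hours} hours)")
print(f"Recorded use at failure:  {usage_hours_at_failure} hours")
print(f"Fraction of calendar time in use: "
      f"{usage_hours_at_failure / calendar_hours:.1%}")
```

When recorded use is a small fraction of elapsed calendar time, as in this invented case, failure times based on ship dates will overstate the operating life of the unit.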

Another difficulty in using field data to perform reliability analyses is that the data may reside in different places, and in very different forms. The field service data, customer support data and failure analysis data may be in different databases, each of which may be tailored to the specific needs of the group recording the data. The challenge in this case is in developing a method of gathering all of the pertinent data from the various sources and databases and pulling it into one central location where it can easily be processed and analyzed. These functions can also be performed automatically, using ReliaSoft's Dashboard system.
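
The consolidation step can be sketched as follows, assuming each source identifies units by serial number but uses its own field names. The record layouts, field names and the consolidate function are all hypothetical. A real system would need to handle far messier inputs.

```python
# Records as they might arrive from three separate systems,
# each with its own (hypothetical) field names.
field_service = [
    {"serial": "SN-001", "repair_date": "2004-06-02", "action": "Replaced fan"},
]
customer_support = [
    {"SerialNo": "SN-001", "call_date": "2004-06-01", "complaint": "Overheating"},
    {"SerialNo": "SN-002", "call_date": "2004-06-04", "complaint": "No power"},
]
failure_analysis = [
    {"unit_serial": "SN-001", "root_cause": "Fan bearing wear"},
]


def consolidate(field_service, customer_support, failure_analysis):
    """Merge records from the three sources into one dict keyed by serial number."""
    central = {}

    def record_for(serial):
        # Create the unit's central record on first sight, then reuse it.
        return central.setdefault(
            serial, {"repairs": [], "calls": [], "analyses": []}
        )

    for row in field_service:
        record_for(row["serial"])["repairs"].append((row["repair_date"], row["action"]))
    for row in customer_support:
        record_for(row["SerialNo"])["calls"].append((row["call_date"], row["complaint"]))
    for row in failure_analysis:
        record_for(row["unit_serial"])["analyses"].append(row["root_cause"])
    return central


central = consolidate(field_service, customer_support, failure_analysis)
for serial, record in sorted(central.items()):
    print(serial, record)
```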

ReliaSoft's Dashboard

ReliaSoft's Dashboard system is a tool for the automation of product quality tracking and warranty processes that pulls in data from a variety of sources and presents the analyzed results in a central location. It is designed around a central database that is used to capture, analyze and present product field reliability, quality and warranty data. The system can be used to capture product quality data for product failures reported via customer returns, the customer call center, field repairs and other warranty channels. The ReliaSoft Dashboard is a Web-based reporting mechanism that allows users to view product quality and reliability reports and analyses. As needed, customized data entry tools, data load applications and data transfers from existing systems can be used to capture product data. The following is a description of a Dashboard system currently in use to capture and track field reliability information.

The system is designed around a master database that captures product quality and reliability data in one centralized location. Data are incorporated into the master database through a variety of techniques designed to work with the information infrastructure already in place at the organization. In this case, Sales, Manufacturing, Warranty Cost, Call Center and Repair Center data are extracted on a regular basis from the databases in which they currently reside and "pumped" into the master database using a specially designed data import utility to validate and load the data. (Other methods to transfer data from existing databases are available and their implementation depends on the existing information architecture and the information technology policies of the individual organization.)
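
The general idea of validating extracted data before loading it into a master database can be sketched as follows. This is not the Dashboard's import utility, whose internals are not described here. The table layout, the required fields and the validate and load functions are assumptions made for illustration, using Python's built-in sqlite3 module.

```python
import sqlite3
from datetime import datetime

REQUIRED_FIELDS = ("serial", "failure_date", "source")


def validate(row):
    """Return a list of problems found in one extracted row (empty if valid)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not row.get(name)]
    if row.get("failure_date"):
        try:
            datetime.strptime(row["failure_date"], "%Y-%m-%d")
        except ValueError:
            problems.append("failure_date is not YYYY-MM-DD")
    return problems


def load(conn, rows):
    """Insert valid rows into the master table; return the rejected rows with reasons."""
    rejected = []
    for row in rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))
            continue
        conn.execute(
            "INSERT INTO master (serial, failure_date, source) VALUES (?, ?, ?)",
            (row["serial"], row["failure_date"], row["source"]),
        )
    conn.commit()
    return rejected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master (serial TEXT, failure_date TEXT, source TEXT)")

# Hypothetical rows extracted from existing departmental databases.
extracted = [
    {"serial": "SN-001", "failure_date": "2004-06-02", "source": "Repair Center"},
    {"serial": "SN-002", "failure_date": "06/05/2004", "source": "Call Center"},
    {"serial": "", "failure_date": "2004-06-07", "source": "Warranty Cost"},
]

rejected = load(conn, extracted)
loaded = conn.execute("SELECT COUNT(*) FROM master").fetchone()[0]
print(f"Loaded {loaded} row(s), rejected {len(rejected)}")
for row, problems in rejected:
    print("Rejected:", row, problems)
```

Returning the rejected rows together with the reasons, rather than silently dropping them, gives the owners of the source systems something concrete to correct.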

Detailed data on product returns (processed through regional distribution organizations) are captured via a customized Web-based data entry interface that serves to validate and load the data into the master database. Some of the products returned by customers are routed through a product disposition center that has been designed to coordinate more detailed analysis of returned products based on statistical sampling techniques. A customized application in use at this location allows the organization to coordinate the disposition and sampling. More extensive failure analyses are performed on selected products, and data obtained during that process are incorporated into the system via a Web-based data entry interface as well. The next figure shows a graphical representation of an example of this process.


Once the pertinent information has been loaded to a central storage area, it can easily be processed and presented in a format that will be meaningful to the users of the information. The Dashboard system presents an at-a-glance overview of a variety of different analyses based on the data that have been extracted. It is then possible to "drill down" to more detailed levels of information from the top-level view.
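
The "drill down" idea can be illustrated in miniature: a top-level summary by product, followed by a more detailed breakdown for one product. The records, field names and the drill_down function are invented for the example and say nothing about how the Dashboard itself is implemented.

```python
from collections import Counter

# Hypothetical failure records already loaded into a central store.
records = [
    {"product": "Model A", "failure_code": "F-FAN"},
    {"product": "Model A", "failure_code": "F-PSU"},
    {"product": "Model A", "failure_code": "F-FAN"},
    {"product": "Model B", "failure_code": "F-PSU"},
]

# Top-level view: failure counts per product.
overview = Counter(r["product"] for r in records)


def drill_down(records, product):
    """Detailed view: failure counts by failure code for one product."""
    return Counter(r["failure_code"] for r in records if r["product"] == product)


print("Overview:", dict(overview))
print("Model A detail:", dict(drill_down(records, "Model A")))
```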

Another advantage to using the Dashboard (or any other tool or methodology) to pull together disparate sources of field information is that the planning involved in bringing these previously separate sources of information together usually results in a beneficial synergy. By bringing together key players and acting as a catalyst to discussion and discovery, the planning process often helps an organization to gain a new understanding of its current processes and to identify untapped existing resources. Many organizations that have undergone this process have found that the fresh perspective and ability to spark communication provided by such consultations are invaluable.

Copyright 2004 ReliaSoft Corporation, ALL RIGHTS RESERVED