Reliability HotWire: eMagazine for the Reliability Professional

Issue 68, October 2006

Hot Topics

Don't Let Your Event/Maintenance Log Data Go to Waste

 

Companies increasingly rely on information to run their businesses and stay competitive. Aggressive productivity goals and globalization are putting more pressure on companies to increase the efficiency and availability of their assets. In this competitive environment, one company's downtime can become another company's opportunity. Obtaining relevant data about the failures and repairs of assets is central to achieving company goals.

 

One very valuable kind of data that is collected in the manufacturing, process, oil, machine tools and communication industries is the log of failures and repairs of physical assets such as machines, airplanes, computers, servers, trucks and oil refineries. This information can provide many insights about failure modes, outages, their frequencies, the repair duration, uptime/downtime and availability. It can aid in gaining knowledge required for critical business performance improvements.

 

The volume of this type of data can become very large, and extracting useful, concise information from it can become cumbersome. This article highlights some of the features of Weibull++ 7 that can be used to process the event or maintenance logs that your company might be keeping.

 

Event/Maintenance Logs
Event logs, or maintenance logs, store information about an asset's associated failures and repairs. Some event logs contain more information than others, but essentially event logs capture data in a format that includes the type of event, the date/time when the event occurred and the date/time when the system was restored to operation.

 

Weibull++ offers a feature called the Event Log Folio that helps in logging such entries and automatically converting them to time-to-failure and time-to-repair data that can be analyzed with life data analysis techniques. The data format for Weibull++ 7's Event Log is shown next.
 

 

Weibull++ 7's Event Log consists of the following columns:

  • System: identifies the system (asset) that experienced an event. This is an optional column that can be displayed by checking the Allow System column option on the Other tab of the Control Panel. If this column is not used, all components are assumed to belong to the same system.

  • F/E: indicates whether the occurrence was a failure (F) or another type of event (E). The F and E occurrences can be analyzed jointly or separately. The E event is a general event type that can be used to enter comments, events other than relevant failures, etc.

  • Date Occurred: indicates the exact date of the occurrence.

  • Time Occurred: indicates the exact time of the occurrence.

  • Date Restored: indicates the exact date when the system was restored.

  • Time Restored: indicates the exact time when the system was restored.

  • Level 1, Level 2, Level 3, Level 4: indicate the subsystem, the subsubsystem, component, subcomponent, etc. that was responsible for the failure or event.

  • OTF: If any of the components accumulate age when the system is down for any reason other than their own failure, this can be indicated with a Y in the OTF (Operate Through other Failures) column. This is an optional column that can be displayed by checking the Allow OTF column option on the Other tab of the Control Panel. If this column is not used, then all components are assumed to be down when the system is down.

  • Description: allows you to enter descriptions of the failure or event.

System-level information is also needed for each system, namely:

  • The start date/time and end date/time.

  • Whether the system was new or not when the event log data started being collected.

  • The shift patterns of the system to show which parts of the days of the week the system is in use.
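
As a rough illustration, the columns and system-level settings listed above could be modeled as two small record types. This is only a sketch in Python: the class and field names (EventLogEntry, SystemInfo, kind, otf and so on) are assumptions made for this example and are not identifiers from Weibull++, and the sample row is invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class EventLogEntry:
    """One row of an event log (illustrative field names, not Weibull++ ones)."""
    kind: str                     # "F" for a failure, "E" for another type of event
    occurred: datetime            # Date Occurred and Time Occurred combined
    restored: datetime            # Date Restored and Time Restored combined
    levels: Tuple[str, ...]       # Level 1 .. Level 4, outermost level first
    system: Optional[str] = None  # optional System column
    otf: bool = False             # True if the item ages while the system is down
    description: str = ""

    def __post_init__(self):
        if self.kind not in ("F", "E"):
            raise ValueError("kind must be 'F' or 'E'")
        if self.restored < self.occurred:
            raise ValueError("restoration cannot precede occurrence")

    def downtime_hours(self) -> float:
        """Hours between occurrence and restoration."""
        return (self.restored - self.occurred).total_seconds() / 3600.0


@dataclass
class SystemInfo:
    """System-level settings (the shift pattern is left out of this sketch)."""
    name: str
    start: datetime
    end: datetime
    new_at_start: bool            # was the system new when logging began?


# Invented sample row, for illustration only.
example = EventLogEntry(
    kind="F",
    occurred=datetime(2006, 1, 2, 8, 0),
    restored=datetime(2006, 1, 2, 10, 30),
    levels=("Machine", "Motor"),
)
```

Validating each row when it is created catches the most common logging mistakes (an unknown event type, or a restoration time earlier than the occurrence) before any times-to-failure are derived.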

Analyzing Event/Maintenance Logs
Once event logs have been collected, the data can be used to extract failure times and repair times. Suppose that n failures and their repair actions took place during the logging period. For each unique event at a given level of the system, the time-to-failure is the time between the previous restoration (repair) and the next occurrence of that event.

  

Times-to-failure_i = t_i - r_{i-1}    (1)

where

  • t_i = Date/time of occurrence i

  • r_{i-1} = Date/time of restoration of the previous occurrence (i-1)

For systems that were not new when data started being collected, the times to first occurrence of every unique event are considered to be suspensions (right censored), because the system is assumed to have accumulated more hours before data began to be gathered (i.e. the time between the start date/time and the first occurrence of an event is not the entire operating time). In this case:

 

Suspension_1 = t_1 - Start Time    (2)

 

For systems that were new when the event log data started being collected:

 

Times-to-failure_1 = t_1 - Start Time    (3)

 

When monitoring of the system stops or the system is no longer in use (that is, when the system end date/time is reached), the time accumulated since the last restoration of each event is treated as a suspension.

 

Last Suspension = System End Time - r_n    (4)
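
The four cases above can be sketched in a few lines of Python. This is a minimal illustration, not the Event Log Folio's implementation: the function name failure_times is hypothetical, times are expressed as plain hours counted from an arbitrary origin, and the sample numbers are invented.

```python
def failure_times(events, start, end, new_at_start):
    """Split one component's history into times-to-failure and suspensions.

    events: list of (t_i, r_i) pairs (occurrence, restoration) in hours,
            sorted by occurrence, for ONE unique event/component.
    Assumes the component operates through other failures, so no system
    downtime is subtracted.
    """
    failures, suspensions = [], []
    previous_restore = start
    for i, (t, r) in enumerate(events):
        interval = t - previous_restore
        if i == 0 and not new_at_start:
            suspensions.append(interval)   # Eqn. (2): age at start is unknown
        else:
            failures.append(interval)      # Eqn. (3) when i == 0, else Eqn. (1)
        previous_restore = r
    suspensions.append(end - previous_restore)  # Eqn. (4)
    return failures, suspensions


# Invented history: two failures in a 1000-hour observation window.
history = [(100, 110), (400, 420)]
print(failure_times(history, start=0, end=1000, new_at_start=True))
# -> ([100, 290], [580])
print(failure_times(history, start=0, end=1000, new_at_start=False))
# -> ([290], [100, 580])
```

Note how the only difference between a new and a not-new system is whether the first interval counts as a failure time or as a suspension.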

 

Eqn. (1) to Eqn. (4) are valid when the component operates through the failures of other components. When it does not, the system downtime caused by those other failures must be subtracted. In other words, Eqn. (1) to Eqn. (4) become:

 

Times-to-failure_i = t_i - r_{i-1} - System Down Time since r_{i-1}    (5)
Suspension_1 = t_1 - Start Time - System Down Time since Start Time    (6)
Times-to-failure_1 = t_1 - Start Time - System Down Time since Start Time    (7)
Last Suspension = System End Time - r_n - System Down Time since r_n    (8)
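
A sketch of the downtime-adjusted calculation follows, again in Python with times in plain hours. The function names (downtime_within, adjusted_failure_times) are hypothetical, the outages caused by other components are assumed not to overlap one another, and the numbers are invented.

```python
def downtime_within(window_start, window_end, outages):
    """Total hours of the (begin, finish) outages that fall inside the window.

    Assumes the outages do not overlap each other.
    """
    total = 0.0
    for begin, finish in outages:
        overlap = min(finish, window_end) - max(begin, window_start)
        if overlap > 0:
            total += overlap
    return total


def adjusted_failure_times(events, other_outages, start, end, new_at_start):
    """Eqns. (5)-(8): like Eqns. (1)-(4), minus system downtime due to others.

    events:        (t_i, r_i) pairs for the component of interest, sorted.
    other_outages: (begin, finish) system outages caused by OTHER components.
    """
    failures, suspensions = [], []
    previous_restore = start
    for i, (t, r) in enumerate(events):
        age = t - previous_restore - downtime_within(previous_restore, t, other_outages)
        if i == 0 and not new_at_start:
            suspensions.append(age)        # Eqn. (6)
        else:
            failures.append(age)           # Eqn. (7) when i == 0, else Eqn. (5)
        previous_restore = r
    last = end - previous_restore - downtime_within(previous_restore, end, other_outages)
    suspensions.append(last)               # Eqn. (8)
    return failures, suspensions


# Invented data: the component fails twice; other components cause three outages.
history = [(100, 110), (400, 420)]
others = [(50, 60), (200, 230), (900, 950)]
print(adjusted_failure_times(history, others, start=0, end=1000, new_at_start=True))
# -> ([90.0, 260.0], [530.0])
```

Each interval shrinks by exactly the hours during which the system was down for reasons unrelated to the component, since the component accumulated no age in that time.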

 

Repair times are obtained by calculating the difference between the date/time of event occurrence and the date/time of restoration.

 

Times-to-repair_i = r_i - t_i    (9)

 

Eqn. (1) to Eqn. (9) should also take into consideration the periods of non-operation during which the systems are not used, as in the case of operations that do not run on a 24/7 basis.
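
One way to account for non-operating periods is to count only the hours that fall inside the shift pattern when taking the difference between two date/times. The Python sketch below assumes a hypothetical Monday-to-Friday, 08:00-17:00 pattern (45 hours per week); the SHIFTS table and the operating_hours function are illustrative names, not part of Weibull++.

```python
from datetime import datetime, time, timedelta

# Hypothetical shift pattern: weekday index (Monday = 0) -> (shift start, shift end).
SHIFTS = {day: (time(8, 0), time(17, 0)) for day in range(5)}  # Mon-Fri, 9 h/day


def operating_hours(begin, finish, shifts=SHIFTS):
    """Hours between begin and finish that fall inside the shift pattern."""
    total = timedelta()
    day = begin.date()
    while day <= finish.date():
        if day.weekday() in shifts:
            shift_start, shift_end = shifts[day.weekday()]
            lo = max(begin, datetime.combine(day, shift_start))
            hi = min(finish, datetime.combine(day, shift_end))
            if hi > lo:                      # the interval overlaps this shift
                total += hi - lo
        day += timedelta(days=1)
    return total.total_seconds() / 3600.0


# Restored Friday 16:00, next failure Monday 09:00: only 2 operating hours elapsed.
print(operating_hours(datetime(2006, 10, 6, 16, 0), datetime(2006, 10, 9, 9, 0)))
# -> 2.0
```

Replacing each plain subtraction in the equations above with a call such as operating_hours(r_prev, t) keeps weekends and off-shift hours from inflating the times-to-failure or times-to-repair.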

 

The list of times-to-failure and times-to-repair obtained in the above manner, from every system in the event log, can be used to derive failure distributions and repair distributions respectively using life data analysis methods. The process of data extraction and model fitting can be automated using the Weibull++ 7 Event Log Folio.

 

Example Using Weibull++ 7

The following example illustrates the use of event logs in a manufacturing company that tracks the event logs of its cutting machines. There are two cutting machines, one in each of two parallel production lines. The following figure shows the event log data of both machines.

 



 

For safety reasons, when a machine fails, the machine is turned off. None of the machine's components continue to work during a failure of other parts of the machine. Therefore, the OTF column is not used in the analysis.

 

These machines were new when the logging of failures and maintenance actions began. Click the System Setup icon to enter the start date/time and end date/time of each machine and the state of the machine at the start of the event logs.

 

 

 

Click the Shift Pattern icon to specify the shift during which the machines are supposed to be working.

 

 

 

The level of analysis also needs to be specified. This sets the depth of the analysis and generates models for the components at the indicated levels. On the Main tab of the Control Panel, make the following selections:

  • Levels to Analyze: Level 2

  • Failures and Events: Analyze Separately

You can also choose from a variety of options for analysis on the Analysis tab of the Control Panel. For this example, for both the Failure Distribution and the Repair Distribution, select Prefer RRX if sufficient data and select all three distributions for consideration.
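
RRX stands for rank regression on X. For readers who want to see what that involves, here is a minimal Python sketch for a two-parameter Weibull distribution. It rests on several assumptions that are not taken from Weibull++: median ranks are approximated with Benard's formula, suspensions are handled with Johnson's adjusted-rank method, and the function names and sample data are invented for illustration.

```python
import math


def adjusted_ranks(items):
    """items: list of (time, is_failure). Returns [(time, adjusted rank)] for the
    failures only, using Johnson's rank adjustment to account for suspensions."""
    ordered = sorted(items, key=lambda it: (it[0], not it[1]))  # failures first on ties
    n = len(ordered)
    ranks, previous = [], 0.0
    for position, (t, is_failure) in enumerate(ordered):
        if is_failure:
            remaining = n - position          # units still at risk, including this one
            previous += (n + 1 - previous) / (1 + remaining)
            ranks.append((t, previous))
    return ranks


def fit_weibull_rrx(items):
    """Rank regression on X for a 2-parameter Weibull. Returns (beta, eta)."""
    ranked = adjusted_ranks(items)
    if len(ranked) < 2:
        raise ValueError("at least two failures are needed")
    n = len(items)
    xs = [math.log(t) for t, _ in ranked]
    # Benard's approximation of the median rank, then the Weibull linearization.
    ys = [math.log(-math.log(1.0 - (rank - 0.3) / (n + 0.4))) for _, rank in ranked]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    slope = sxy / syy                 # regress x on y: x = intercept + slope * y
    intercept = x_mean - slope * y_mean
    return 1.0 / slope, math.exp(intercept)   # beta = 1/slope, eta = exp(intercept)


# Invented data in hours: (time, True) is a failure, (time, False) a suspension.
sample = [(120.0, True), (310.0, True), (450.0, False), (520.0, True), (800.0, False)]
beta, eta = fit_weibull_rrx(sample)
print(f"beta = {beta:.3f}, eta = {eta:.1f} hours")
```

Regressing on X means the fit minimizes errors in the time direction (ln t) rather than in the unreliability direction, which is why the roles of x and y are swapped relative to an ordinary least-squares line.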

 

Click Calculate to obtain results. You can then click the (...) button under Analysis Summary at the bottom of the Main tab of the Control Panel to view detailed results from the analysis. This summary lists the fitted failure and repair distributions and their parameter estimates in addition to the uptime and downtime of the elements of the system at the selected level.

 



 

You can perform further analysis by transferring failure and repair data from the event log to standard folios, where more analysis options are available. To do this, click the Transfer Life Data to New Folio icon.

 

 

The next two figures show the failure data and repair data for the conveyor belt component.

 

 

 

The reliability plots for the different components in the system are shown next.

 

 

Integration with BlockSim 6

The failure and repair distribution models that were derived above can be used for system reliability, maintainability and availability analysis. Weibull++ 7 provides a feature for transferring these models to BlockSim 6. Click the BlockSim icon to select which blocks you want to transfer.

 

 

In the window that appears, make the following selection to transfer the models for the machine's components and click OK. In this example, we chose not to transfer the E events because they do not describe the failure characteristics of the components or their repair actions.

 

 

This will create a BlockSim 6 file that contains a template with all the blocks representing the system's makeup.

 

 

The failure distributions and repair distributions derived using the Event Log Folio in Weibull++ 7 are applied to the respective blocks. These blocks can be used to create a Reliability Block Diagram (RBD) that can be used for reliability and availability analysis. The RBD that describes the cutting machine is as follows.

 

Note that each block in this diagram is a subdiagram. The subdiagrams are shown next.

 

 

 

 

The availability of the cutting machine over a two-year period (2 years × 52 weeks/year × 45 hours/week = 4,680 hours) can be estimated using simulation. The following simulation plot shows the machine's availability over that period.
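
To give a feel for what such a simulation does, the Python sketch below estimates the mean availability of a single repairable block over the 4,680-hour period. It is a simplified illustration and not BlockSim's engine, which simulates the whole RBD: the function name is hypothetical, the block is assumed to alternate between Weibull-distributed uptimes and lognormally distributed repair times, and the parameter values are invented rather than taken from the example's results.

```python
import random

MISSION_HOURS = 2 * 52 * 45   # 2 years x 52 weeks/year x 45 hours/week = 4680


def simulate_availability(eta, beta, repair_mu, repair_sigma,
                          mission=MISSION_HOURS, runs=1000, seed=0):
    """Monte Carlo estimate of mean availability for one repairable block.

    Uptimes   ~ Weibull(scale=eta, shape=beta)
    Downtimes ~ Lognormal(repair_mu, repair_sigma), parameters of ln(hours)
    """
    rng = random.Random(seed)           # fixed seed makes the run repeatable
    total_up = 0.0
    for _ in range(runs):
        clock = 0.0
        while clock < mission:
            ttf = rng.weibullvariate(eta, beta)
            total_up += min(ttf, mission - clock)   # count uptime inside the mission
            clock += ttf
            if clock >= mission:
                break
            clock += rng.lognormvariate(repair_mu, repair_sigma)  # repair time
    return total_up / (runs * mission)


# Invented parameters, for illustration only.
estimate = simulate_availability(eta=500.0, beta=1.5, repair_mu=1.0, repair_sigma=0.5)
print(f"Estimated mean availability over {MISSION_HOURS} hours: {estimate:.4f}")
```

Averaging over many simulated histories is what smooths the result; a single run would give only one possible sequence of failures and repairs.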

 

 

The RBD shown above is the backbone for many subsequent types of analysis, such as life cycle cost analysis and throughput analysis. The diagrams can also be enhanced by specifying more details about crews, spare parts stocks, part ordering, preventive maintenance, inspections, etc. These topics have been covered extensively in previous issues of Reliability HotWire.

 

Conclusion

This article addresses how useful information can be extracted from event and maintenance logs in industries that collect this type of historic failure data for their important assets. This type of knowledge is essential for critical operation management and maintenance processes, as well as for business performance analysis and improvements. The Event Log Folio feature in Weibull++ 7, along with its smooth integration with BlockSim 6, facilitates this type of analysis.

Copyright 2006 ReliaSoft Corporation, ALL RIGHTS RESERVED