Availability and the Different Ways to Calculate It
Availability is an
important metric used to assess the performance of
repairable systems,
accounting for both the reliability
and maintainability properties of a component or system. Did
you know, though, that there are different classifications of
availability and different ways to calculate it?
This article will
explore the different availability classifications,
what they mean and how they can be calculated with the help
of BlockSim 7.
Availability
Classifications
The classification of availability is somewhat flexible and
is largely based on
the types of downtimes
used in the
computation and on
the relationship
with time (i.e. the span of time to which the
availability refers). As a result, there are a number of
different classifications of availability, including:
Instantaneous (or
Point) Availability
Average Uptime
Availability (or Mean Availability)
Steady State
Availability
Inherent Availability
Achieved Availability
Operational
Availability
A wide range of
availability classifications and definitions exist. The ones
presented here are the most common but variations exist and
you should be aware of how they are calculated and what they
mean so that you can
make an appropriate choice for the analysis you are
performing.
Instantaneous or Point
Availability, A(t)
Instantaneous (or point) availability is the probability
that a system (or component) will be operational (up and
running) at a specific time,
t.
This classification is
typically used in the military, as it is sometimes necessary to estimate the
availability of a system at a specific time of interest (e.g.
when a certain mission is to happen). The point
availability is very similar to the reliability function
in that it gives a
probability that a system will function at the given time,
t. Unlike reliability,
however, the instantaneous availability
measure incorporates maintainability information. At a given
time, t, the system will be operational if one of the
following conditions is met (1):
The system
functioned properly
from 0 to t, i.e. it never failed by time t.
The probability of this happening is R(t).
Or,
The system functioned properly
since the last repair at time u, 0 < u <
t. The probability of this condition is:
with m(u) being
the renewal density function of the system.
Consequently, the point
availability is the summation of the above two
probabilities, or:
In BlockSim
7, the point availability can be obtained through
simulation. It
can be estimated for different points of time
and can also be plotted as a function of time.
Average Uptime
Availability (or Mean Availability),
The mean availability is the proportion of time during a
mission or time period that the system is available for use.
It represents the mean value of the instantaneous
availability function over the period (0, T] and is given
by:
For systems
that have periodical maintenance, availability may be zero
at regular periodical intervals. In this case, mean
availability is a more meaningful measure than instantaneous
availability. Such a definition of availability is commonly
used in manufacturing and telecommunication systems.
Steady State
Availability,
The steady state availability of the system is the limit of
the availability function as time
tends to
infinity. Steady state availability is also called the
long-run or asymptotic availability. A common equation for
the steady state availability found in literature is:
However, it
must be noted that the steady state also applies to mean
availability.
The next
figure illustrates the steady state availability
graphically.
Figure 1
- Illustration of point availability approaching steady
state.
For practical
considerations, the availability function will
start approaching the steady state availability value after
a time period of approximately four times the average
time-to-failure. This varies depending on the
maintainability issues and complexity of the system. In
other words, you can think of the steady state availability
as a stabilizing point where the system's availability is
roughly a constant value.
The steady state
availability reflects the long-term availability after the
system "settles."
The system availability may initially be unstable due to training/learning
issues, deciding on a good spare parts stocking policy,
deciding on the number of repair personnel, optimizing the
efficiency of repair, burn-in of the system, etc., and could take some time before
it stabilizes.
It is important to be very careful in using the steady state
availability as the sole metric for some systems, especially
systems that do not need regular maintenance. A large-scale
system with repeated repairs, such as a car, will reach a
point where it is almost certain that something will break
and need repair once a month. However, this state may not be
reached until, for instance, 500,000 miles. Obviously, if an
Operations Manager of rental vehicles, for example, keeps
the vehicles until they reach 50,000 miles, then this metric
value would not be of any use.
Inherent
Availability, AI
Inherent availability is the steady state availability when
considering only the corrective maintenance (CM) downtime of
the system. This classification is what is sometimes
referred to as the availability as seen by maintenance
personnel. This classification excludes preventive
maintenance downtime, logistic delays, supply delays and
administrative delays.
Since these other causes of delay
can be minimized or eliminated, an availability value that considers
only the corrective downtime is the inherent or
intrinsic property of the system. Many times, this is the
type of availability that companies use to report the
availability of their products (e.g.
computer
servers) because they see downtime other
than actual repair
time as out of their control and too unpredictable.
The corrective
downtime reflects the efficiency and speed of the
maintenance
personnel, as well as their expertise and training
level. It also reflects characteristics that should be of
importance to the engineers who design the system,
such as the complexity of necessary repairs, ergonomics
factors and whether ease of repair (maintainability) was
adequately considered in the design.
For a single component, the
inherent availability can be computed by:
This gets slightly more
complicated for a system. To do this,
you need to look at
the mean time between failures, or MTBF, and compute this as
follows:
MTBF = Uptime / Number of
System Failures
MTTR = CM Downtime / Number of
System Failures
This may look simple.
However, you should keep in mind that until steady state is
reached, the MTBF calculation may be a function of time (e.g.
a degrading system).
In such cases, before reaching steady
state, the calculated MTBF changes
as the system ages
and more data are collected. Thus, the above
formulation should be used cautiously. Furthermore, it is
important to note that the MTBF defined here is different
from the MTTF (or, more precisely for a repairable system,
MTTFF: mean time to first failure).
Achieved
Availability, AA
Achieved availability is very similar to inherent
availability with the exception that preventive maintenance
(PM) downtimes are also included. Specifically, it is the
steady state availability when considering corrective and
preventive downtime of the system. The achieved availability
is sometimes referred to as the availability seen by the
maintenance department (includes both corrective and
preventive maintenance but does not include logistic delays,
supply delays or administrative delays).
Achieved availability can be
computed by looking at the mean time between maintenance
actions, MTBM, and the mean maintenance downtime, :
MTBM = Uptime / (Number of
System Failures + Number of System Downing PMs)
=
(CM Downtime + PM Downtime) / (Number of System Failures +
Number of System Downing PMs)
Note: System Downing PMs are
PMs that cause the system
to go down or require a shut down of
the system.
Operational
Availability, Ao
Operational availability is a measure of the "real" average
availability over a period of time
and includes all
experienced sources of downtime, such as administrative
downtime, logistic downtime, etc. The operational
availability is the availability that the customer actually
experiences. It is essentially the a posteriori
availability based on actual events that happened to the
system. The previously discussed availability classifications are a
priori estimates based on models of the system failure
and downtime distributions. In many cases, operational
availability cannot be controlled by the manufacturer due to
variation in location, resources and other factors that are
the sole province of the end user of the product.
Operational availability is
the ratio of the system uptime
to total time.
Mathematically, it is given by:
where the operating cycle
is the overall time period of operation being investigated
and uptime is the total time the system was functioning
during the operating cycle. (Note: The operational
availability is a function of time,
t, or operating cycle.)
The concept of operational
availability is closely related to the concept of
operational readiness. In military applications, this means
that the assigned numbers of operating and maintenance
personnel, the supply chain for spare parts and training are
adequate. In the commercial world, a manufacturer may be
capable of manufacturing a very reliable and maintainable
product (i.e. very good inherent availability). But
what if the manufacturer has a poor distribution and
transportation system or does not stock the parts needed or
provide enough service personnel to support the systems in
the field? Then, the readiness of this manufacturer to go to
market with the product is low.
Logistic planners, design
engineers
and maintainability engineers can collaboratively estimate
the repair needs of the system, required personnel, spares,
maintenance tasks, repair procedures, support equipment and
other resources. Only when all downtime causes are addressed
will you be able to paint a realistic picture of your
system's availability in actual operation.
What Type of Availability Should
You Use?
You
may be
wondering why there are so many types of availability and
which one you should use. You can use different
classifications of availabilities
to present different
conclusions about your
system's availability. The difference can be potentially
large and availability measurements can be misused or
misleading to your company and your customers.
If you use a
different classification from the one your customer uses,
you and your customer could have very
different impressions of the system. Therefore, the choice
of availability classification to use should be made
carefully, taking
into account your system and
industry and how your company and your customers perceive
availability. Another place
to be careful
about availability definition is in contracts. Definitions need to be stated clearly. Make sure that your
company and your customers have the same understanding of
availability and agree on the classification to use.
As an example, if your system has an increasing availability
(as in the second part of
Figure 1) and you report the long-term steady state
availability of your system, but your customer judges your
system based on the short-term mean availability, then you
will have a problem.
As another example,
consider a company that
rents oil drilling equipment and
is responsible for
repairing the equipment.
If their customer penalizes
them
because the equipment stays down longer
than a
certain duration,
they might
consider using operational and mean availability
metrics (that include all delays) in
their analyses instead
of inherent (or achieved) availability, which
considers
only the corrective (and preventive) actions downtime when parts
and crews are available.
BlockSim 7 Example Let us use the following
repairable system reliability block diagram to illustrate
the different availability classifications and calculations
using BlockSim 7.
The blocks have the
following failure and repair properties.
Component
Failure Distribution (hr)
Repair Duration Distribution (hr)
Preventive Replacement Policy
Preventive Replacement Duration Distribution
(hr)
Repair and Preventive Parts Pool
Maintenance Crew Delay (Travel Time) (hr)
A
Weibull (β = 1.5, η = 1000)
Normal (μ=12, σ= 2)
Every 1000 hr based on system age
Normal (μ=5, σ= 1.5)
Spare Pool Policy A
Normal (μ=5, σ= 2.5)
B
Lognormal (μT'=5,
σT' = 1)
Exponential (Mean = 20)
--
--
Spare Pool Policy B
Normal (μ=5, σ= 2.5)
C
Exponential (Mean = 10000)
Normal (μ=15, σ= 5)
Every 1000 hr based on system age
Normal (μ=5, σ= 1.5)
Spare Pool Policy C
Normal (μ=5, σ= 2.5)
D
Weibull (β = 3 ,η = 2000)
Exponential (Mean = 14)
Every 1000 hr based on system age
Normal (μ=5, σ= 1.5)
Spare Pool Policy D
Normal (μ=5, σ= 2.5)
All the components share
the same maintenance crew. The crew can perform only one
task at a time.
The spare pool policies
have the following properties.
Spare Pool Policy
Initial Stock Level
Logistic Delay to Obtain Part from Pool
Pool Restocking
Logistic Delay (Shipping) to Restock Pool
Spare Pool Policy A
5
Normal (μ=1, σ= 0.5)
Add
1 when stock drops to 1
Normal (μ=96, σ= 3)
Spare Pool Policy B
20
Normal (μ=1, σ= 0.5)
Add
1 when stock drops to 1
Normal (μ=96, σ= 3)
Spare Pool Policy C
1
Normal (μ=1, σ= 0.5)
Add
1 when stock drops to 0
Normal (μ=96, σ= 3)
Spare Pool Policy D
2
Normal (μ=1, σ= 0.5)
Add
1 when stock drops to 1
Normal (μ=96, σ= 3)
In this example, we are
interested in the operation of the system over 3000 hours.
The RBD is created and the
failure and maintenance properties are set.
Calculating
Instantaneous (or Point) Availability
Using the RBD model and all
the repair characteristics and delays, the system
is
simulated for 3000 hours of operation. The following figure
for the instantaneous availability is
then obtained.
As you can see in the above
figure, the availability drops during the PMs. As an example of instantaneous availability
calculations, the instantaneous availability at
t=500 is
0.9580, at
t=1000 is 0.6386
and at t=2500 is 0.9600. The
results can be read directly from the graph or obtained
using the Quick Calculation Pad (QCP) or by reading the
value from the System Point Results report in BlockSim 7.
Calculating Average Uptime
Availability (or Mean Availability)
Using the RBD model and all
the repair characteristics and delays, the system
is
simulated for 3000 hours of operation. The mean availability
can be obtained in BlockSim 7 using the Quick
Calculation Pad (QCP), as shown next, or by reading the value from the
summarized simulation report.
In this example, the mean
availability at 3,000 hours is 0.9453.
Calculating Steady State
Availability
To obtain the long-term
availability of the system, we simulate it for a much longer
time, 10,000 hours. In this example, there is actually no
obvious steady state in terms of the point availability.
The system is
renewed by the PMs and the availability increases again, as can be seen in
the following instantaneous (or point) availability plot. In
addition, the PMs are performed based on predetermined
system age intervals, where the system "goes down" during
the PM, causing point availability to be zero during these
intervals.
Calculating Inherent
Availability
To calculate the inherent
availability, AI, for this example, we need to
simulate the system with corrective maintenance downtime without logistic delays or preventive maintenance.
All of
these characteristics should be disabled in the BlockSim
7 model in order to obtain the intrinsic availability
of the system. To calculate AI, we need to
calculate the MTBF and MTTR.
MTBF
= Uptime / Number of System
Failures
= 2925.9617/6.302
= 464.2909 hours
MTTR
= CM
Downtime / Number of System Failures
= 11.7483 hours
The values of uptime, number
of system failures and CM downtime are all obtained from the
summarized simulation report in BlockSim 7 shown
next.
Therefore:
AI
= MTBF / (MTBF + MTTR) = 0.9753
Calculating Achieved
Availability
To calculate the achieved
availability for this example, we need to simulate the
system with corrective and preventive maintenance downtimes
but without logistic delays (all logistic
delay characteristics should be disabled in the BlockSim 7
model). The estimated achieved availability is obtained by
calculating MTBM and
.
MTBM
= Uptime / (Number of System Failures + Number of
System Downing PMs)
= 2885.9619 / 6.804
= 424.1566
= (CM
Downtime + PM Downtime) / (Number of System Failures
+ Number of System Downing PMs)
=
16.7604 / 6.804
= 16.7604
The values of uptime, number
of system failures, CM Downtime, PM Downtime, number of
system failures and number of system downing PMs are all
obtained from the summarized simulation report in
BlockSim 7, as shown next.
Therefore:
AA
= MTBM / (MTBM +) =
0.9619
Calculating Operational
Availability
This is calculated in
BlockSim 7 similarly to the mean availability
calculation. The operational availability is expected to be
0.9453.
Note: This estimate is based on the model, not
on actual observation of operation. There are other factors (not included in this
model) that could affect the operational availability, such
as customer delays, additional travel time, etc.Typically,
operational availability is calculated based on actual
observed events, not
on the model. This model can
serve only as
an a
priori estimate of the operational availability and can
be fine tuned by adding more expected delays that a customer
can observe.
Conclusion
This article discussed the
different classifications and ways to calculate
the availability of repairable
systems.
This type of
availability for your application should be considered carefully
because different classifications can
give very different results about the system's availability. The article also demonstrated how
BlockSim 7 can be used to make these different
calculations.