Mean Time Between Failures in Plant Maintenance

KPIs in Plant Maintenance

Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR) are two important KPIs in plant maintenance management and lean manufacturing.

Mean Time Between Failures = (Total up time) / (number of breakdowns)

Mean Time To Repair = (Total down time) / (number of breakdowns)
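As a minimal sketch, the two formulas above can be written as Python functions. The function names and the sample figures (700 running hours, 20 hours down, 4 breakdowns) are hypothetical, chosen only to illustrate the arithmetic.

```python
def mtbf(total_uptime_hours: float, breakdowns: int) -> float:
    """Mean Time Between Failures = total up time / number of breakdowns."""
    if breakdowns <= 0:
        raise ValueError("MTBF is undefined when there are no breakdowns")
    return total_uptime_hours / breakdowns


def mttr(total_downtime_hours: float, breakdowns: int) -> float:
    """Mean Time To Repair = total down time / number of breakdowns."""
    if breakdowns <= 0:
        raise ValueError("MTTR is undefined when there are no breakdowns")
    return total_downtime_hours / breakdowns


# Hypothetical month: 700 hours running, 20 hours down, 4 breakdowns.
print(mtbf(700, 4))  # 175.0
print(mttr(20, 4))   # 5.0
```

Both metrics share the same denominator, so a machine with no recorded breakdowns has no defined MTBF or MTTR; the sketch raises an error rather than returning a misleading number.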

"Mean Time" means, statistically, the average time.

"Mean Time Between Failures" is literally the average time elapsed from one failure to the next. Usually people think of it as the average time that something works until it fails and needs to be repaired (again). Because reliable production processes are crucial in a Lean Manufacturing environment, MTBF is a vital measure for lean initiatives.

"Mean Time To Repair" is the average time that it takes to repair something after a failure.

For something that cannot be repaired, the correct term is "Mean Time To Failure" (MTTF). Some would define MTBF for repairable devices as the sum of MTTF plus MTTR. In other words, the mean time between failures is then the average time from one failure to the next, including the repair. This distinction is important if the repair time is a significant fraction of MTTF.

Here is an example. A light bulb in a chandelier is not repairable (it will be replaced), so MTTF is the most appropriate measure. The MTTF might be 10,000 hours.

On the other hand, without oil changes, an automobile's engine may fail after 150 hours of highway driving; that is the MTTF. Assuming 6 hours to remove and replace the engine (MTTR), Mean Time Between Failures is 150 hours using the up-time formula above, or 156 hours (150 + 6) under the MTTF-plus-MTTR definition.
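The difference between the two definitions can be seen by laying out the engine example as a timeline. This sketch assumes a perfectly regular cycle of 150 running hours followed by 6 repair hours; the helper `failure_timeline` is hypothetical. Averaging the running periods gives the up-time MTBF, while averaging the gaps between successive failure moments gives the MTTF-plus-MTTR figure.

```python
def failure_timeline(mttf: float, mttr: float, cycles: int) -> list[tuple[float, float]]:
    """Return (uptime_moment, downtime_moment) pairs for a regular run/repair cycle."""
    events = []
    t = 0.0
    for _ in range(cycles):
        up = t
        down = up + mttf   # the engine fails after running mttf hours
        events.append((up, down))
        t = down + mttr    # back in service after mttr hours of repair
    return events


events = failure_timeline(mttf=150, mttr=6, cycles=3)

# Up-time definition: average running time per failure.
uptime_per_failure = sum(d - u for u, d in events) / len(events)

# Failure-to-failure definition: average gap between successive failure moments.
failure_moments = [d for _, d in events]
gaps = [b - a for a, b in zip(failure_moments, failure_moments[1:])]
failure_to_failure = sum(gaps) / len(gaps)

print(uptime_per_failure)  # 150.0
print(failure_to_failure)  # 156.0
```

The two results differ by exactly the repair time, which is why the choice of definition matters most when MTTR is large relative to MTTF.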

Like automobiles, most manufacturing equipment will be repaired rather than replaced after a failure, so Mean Time Between Failures is the more appropriate measurement.


What is a Failure?

"Failure" can have multiple meanings.  Let us briefly examine one device's "failures":

An Uninterruptible Power Source (UPS) may have five functions under two conditions:

  • While the main power is available:
    • (1) Allow power to flow from the main source to the machine being protected
    • (2) Condition the power by limiting surges or brownouts
    • (3) Store power in a battery, up to the battery's full charge
  • When the main power is interrupted:
    • (4) Supply continuous power to the machine being protected
    • (5) Emit a signal to indicate that the main power is off

There is no question that the UPS has failed if it prevents main power from flowing to the machine being protected (function 1).  Failures for functions 2, 3 or 5 may not be obvious, because the "protected" machine is still running on main power or on the battery supply.  Even if noticed, these failures may not trigger immediate corrective measures because the "protected" machine is still running and it may be more important to keep it running than to repair or replace the UPS.

What is Availability?

The "availability" of a device is, mathematically, MTBF / (MTBF + MTTR) for scheduled working time.

The automobile in the earlier example is available for 150/156 = 96.2% of the time.  The repair is unscheduled down time.

With an unscheduled half-hour oil change every 50 hours, triggered when a dashboard indicator alerts the driver, availability would increase to 50/50.5 = 99%.

If oil changes were properly scheduled as a maintenance activity, availability would be 100%, because planned maintenance is not counted against scheduled working time.
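The availability figures for the automobile can be checked with a small helper. This is a sketch assuming the MTBF / (MTBF + MTTR) formula above, with both inputs in hours; `availability` is a hypothetical name.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), over scheduled working time."""
    return mtbf / (mtbf + mttr)


# Engine fails every 150 h and needs 6 h of unscheduled replacement.
print(f"{availability(150, 6):.1%}")   # 96.2%

# Unscheduled half-hour oil change every 50 h.
print(f"{availability(50, 0.5):.1%}")  # 99.0%
```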

Why are these important for reliability?

"Availability" is a key performance indicator in manufacturing; it is part of the "Overall Equipment Effectiveness" (OEE) metric.
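OEE is commonly computed as the product of three factors: availability, performance and quality, each expressed as a fraction between 0 and 1. A minimal sketch follows; the 90% performance and 98% quality values are hypothetical, and availability reuses the 150/156 figure from the automobile example.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness as the product of its three factors (each 0..1)."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Hypothetical line: availability 150/156, 90% performance, 98% quality.
print(f"{oee(150 / 156, 0.90, 0.98):.1%}")  # 84.8%
```

Because the factors multiply, any loss of availability from breakdowns reduces OEE in direct proportion.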

A production schedule that includes down time for preventive maintenance can accurately predict total production. Schedules that ignore Mean Time Between Failures and Mean Time To Repair will overestimate output, because unplanned breakdowns consume time that the schedule assumed would be productive.

How to calculate actual Mean Time Between Failures

Actual or historic Mean Time Between Failures is calculated from real-world observations. (Equipment designers use a separate discipline, based on the components and the anticipated workload.)

Calculating actual Mean Time Between Failures requires a set of observations; each observation is:

  • Uptime_moment: the moment at which a machine began operating (initially or after a repair)
  • Downtime_moment: the moment at which a machine failed after operating since the previous Uptime_moment

So each Time Between Failures (TBF) value is the difference between one Uptime_moment observation and the subsequent Downtime_moment.

Three quantities are required:

  • n = the number of observations
  • u_i = the i-th Uptime_moment
  • d_i = the i-th Downtime_moment, following the i-th Uptime_moment

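So Mean Time Between Failures, summed over all observations i = 1 through n, is:

$$\mathrm{MTBF} = \frac{1}{n}\sum_{i=1}^{n} \left(d_i - u_i\right)$$

Because each observation ends in a failure, n is also the number of failures. More simply, MTBF is the total working time divided by the number of failures.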
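The formula above translates directly into code. This sketch assumes each observation is a pair of Python `datetime` values (Uptime_moment, Downtime_moment) and that every observation ends in a failure; the function name and the sample log are hypothetical.

```python
from datetime import datetime, timedelta


def mtbf_from_observations(observations: list[tuple[datetime, datetime]]) -> timedelta:
    """MTBF = sum(d_i - u_i) / n over (Uptime_moment, Downtime_moment) pairs."""
    if not observations:
        raise ValueError("at least one observation is required")
    total_uptime = timedelta(0)
    for uptime_moment, downtime_moment in observations:
        if downtime_moment < uptime_moment:
            raise ValueError("Downtime_moment must not precede its Uptime_moment")
        total_uptime += downtime_moment - uptime_moment  # one TBF value
    return total_uptime / len(observations)


# Hypothetical log for one machine: three run periods, each ending in a breakdown.
log = [
    (datetime(2024, 3, 1, 6, 0), datetime(2024, 3, 2, 14, 0)),   # ran 32 h
    (datetime(2024, 3, 2, 18, 0), datetime(2024, 3, 3, 18, 0)),  # ran 24 h
    (datetime(2024, 3, 4, 8, 0), datetime(2024, 3, 5, 12, 0)),   # ran 28 h
]
result = mtbf_from_observations(log)
print(result.total_seconds() / 3600)  # 28.0
```

The repair gaps between one Downtime_moment and the next Uptime_moment are deliberately ignored here; they would feed an MTTR calculation instead.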

By Oskar Olofsson
