RCM - Reliability-centered maintenance

Reliability-centered maintenance, or RCM, was introduced by United Airlines in the 1960s in response to a higher-than-expected rate of failures in commercial aircraft.

John Moubray defined RCM as a "process to establish the safe minimum levels of maintenance". This definition balances the need for safety against the goal of minimizing the cost and interruption of service incurred by performing maintenance activities.

The goal is to keep a machine or process running safely and correctly, without "surprise" failures, at minimum cost. RCM accepts the costs and downtime of scheduled maintenance; it does not expect every component to run 24/7.

One of the implications is that some processes might be allowed to run to failure, if the cost of repair or replacement is low. For example, in an office setting, one would not schedule maintenance operations on an inexpensive ballpoint pen; one simply reorders a box of twenty when the box is empty. By contrast, regular maintenance on an expensive, heavily used printer or photocopier may be wise, especially if the work is scheduled for a weekend, when the machine will not be needed to print a sales presentation at the last minute.

A Brief History

One of the early surprises was that aircraft in a fleet went out of service more often than a maintenance schedule based on repairing or replacing individual components predicted. For example, one device might be expected to function for, say, 900 hours before requiring service. A related part might have been serviced at its expected interval, say at 500 hours. Yet the combination of the two parts in the airplane might fail at 800 hours.

The interactions among sensitive parts made the maintenance effort difficult to predict from the service intervals of individual components. Probability theorists noted that the time until failure of many components is better described as "memoryless". This means that the probability that a new component fails in its first 100 hours is the same as the probability that a component which has already run N hours without failing will fail in the next 100 hours. An older part is therefore no more likely to fail than a new one, so replacing it on a fixed schedule does not reduce its chance of failing.
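The memoryless property can be checked numerically. The sketch below assumes that time to failure follows an exponential distribution, the standard continuous distribution with this property, and uses a hypothetical rate of one failure per 900 operating hours. The rate and the function names are illustrative, not taken from any airline data.

```python
import math

# Hypothetical failure rate: one failure per 900 operating hours on average.
RATE_PER_HOUR = 1 / 900


def prob_fail_by(t: float, rate: float = RATE_PER_HOUR) -> float:
    """P(T <= t) when the time to failure T is exponentially distributed."""
    return 1.0 - math.exp(-rate * t)


def prob_fail_in_next(window: float, age: float, rate: float = RATE_PER_HOUR) -> float:
    """P(T <= age + window | T > age): the chance that a part which has
    survived `age` hours fails within the next `window` hours."""
    survived = 1.0 - prob_fail_by(age, rate)
    return (prob_fail_by(age + window, rate) - prob_fail_by(age, rate)) / survived


new_part = prob_fail_by(100)
for age in (0, 500, 800, 2000):
    # Every line shows the same probability: the part's age does not matter.
    print(f"age {age:>4} h: P(fail in next 100 h) = {prob_fail_in_next(100, age):.4f}")
print(f"new part:   P(fail in first 100 h) = {new_part:.4f}")
```

A wear-out distribution, such as a Weibull distribution with shape parameter greater than 1, would instead make the conditional probability rise with age; that is the situation in which replacing parts on a fixed schedule pays off.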

Seven Questions

RCM works by asking seven questions, documenting the answers, and using this information to guide the maintenance program.

  • What should this item do?
  • What are the failure modes for this item?
  • What events might cause each of these failures?
  • What are the results of these failures?
  • What are the consequences of these failures?
  • What regular and systematic actions can prevent these failures, or minimize them to the point where the failure does not interfere with the safe operation of the item or overall machine or process?
  • What are the remedies if preventative actions cannot be taken?
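One way to document the answers is a small record per item, with one entry for each way the item can fail. The sketch below is a hypothetical data structure, not a format prescribed by RCM; the class names, field names and the radiator example are illustrative. It also flags failure modes for which neither question 6 nor question 7 has been answered yet.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureMode:
    """Answers to questions 2 to 7 for one way the item can fail."""
    description: str                      # Q2: how the item can fail
    causes: List[str]                     # Q3: events that might cause it
    effects: str                          # Q4: what happens when it occurs
    consequence: str                      # Q5: why it matters, e.g. "safety"
    proactive_task: Optional[str] = None  # Q6: preventive action, if any
    default_action: Optional[str] = None  # Q7: remedy if no preventive action


@dataclass
class RcmRecord:
    """The documented answers for one item."""
    item: str
    function: str                         # Q1: what the item should do
    failure_modes: List[FailureMode] = field(default_factory=list)

    def unresolved(self) -> List[str]:
        """Failure modes that have neither a proactive task nor a default action."""
        return [
            fm.description
            for fm in self.failure_modes
            if fm.proactive_task is None and fm.default_action is None
        ]


radiator = RcmRecord(
    item="Car radiator",
    function="Keep the engine below its maximum safe operating temperature",
    failure_modes=[
        FailureMode(
            description="Coolant leaks from the core or hoses",
            causes=["corrosion of the core", "loose hose clamp"],
            effects="Engine overheats and may be seriously damaged",
            consequence="operational, high repair cost",
            proactive_task="Check coolant level and hoses at each service",
        ),
        FailureMode(
            description="Pressure cap no longer seals",
            causes=["worn cap gasket"],
            effects="Coolant boils off faster than expected",
            consequence="operational",
        ),
    ],
)

print(radiator.unresolved())
```

Here the second failure mode is reported because the analysis has not yet decided on a preventive task or a default action for it.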

Implications of Risk-based Maintenance

RCM has several implications in planning a maintenance strategy:

  • Some Failures are More Urgent than Others
  • Some Failures Just Need Work-Arounds
  • Some Failures must be Avoided
  • One Practical Suggestion: Prioritize

Some Failures are More Urgent than Others

Let's use some automotive examples to explore this point.

If the cigarette lighter outlet fails, you may be unable to light a cigarette or keep your mobile phone charged. There is no effect on the primary function of the automobile. The car will not come to a stop by itself, nor does the driver need to arrange for immediate repairs for safety reasons.

If one headlight burns out, the car is still operable, even at night. The failure of that one component does not prevent driving the vehicle. Depending on the jurisdiction, however, there may be other problems: the driver might need to convince a police officer that the lamp will be replaced promptly.

Losing the second headlight would be a catastrophic safety problem at night, although the car could continue moving forward. Safety would be so compromised that a prudent driver would stop.

If the radiator starts leaking, the engine will soon overheat. Driving until engine failure will stop the car and also result in a very high repair bill.
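The examples above can be ranked on a simple urgency scale. The sketch below uses a hypothetical four-level scale; the level names, and the choice to rank a safety failure above an expensive one, are assumptions for illustration rather than part of any standard.

```python
from enum import IntEnum


class Urgency(IntEnum):
    """Hypothetical urgency scale; a higher value means act sooner."""
    NO_EFFECT = 0   # primary function unaffected: repair when convenient
    DEGRADED = 1    # still operable: plan a repair soon
    COSTLY = 2      # continuing will cause expensive damage: stop soon
    SAFETY = 3      # continuing is unsafe: stop now


failures = {
    "Cigarette lighter outlet fails": Urgency.NO_EFFECT,
    "One headlight burns out": Urgency.DEGRADED,
    "Radiator starts leaking": Urgency.COSTLY,
    "Second headlight fails at night": Urgency.SAFETY,
}

# Sort failure names by their urgency level, most urgent first.
ranked = sorted(failures, key=failures.get, reverse=True)
for name in ranked:
    print(f"{failures[name].name:<10} {name}")
```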

Some Failures Just Need Work-Arounds

In a manufacturing environment, an example of a failure that might not require immediate repair is the failure of an automatic output counter, if the outputs can be counted manually. The cost of shutting down to repair the counter may be greater than the cost of the additional labour, reduced productivity or increased error rate of counting by hand.

The key point is that the RCM plan should include this work-around as a response considered in advance, because it is easy to make the wrong decision "in the heat of the moment".
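The trade-off behind this decision can be written as a simple cost comparison. The sketch below is a hypothetical model, not part of RCM itself: it assumes the counter can be repaired during the next planned stop at no extra shutdown cost, and it lumps extra labour, lost productivity and counting errors into a single per-shift figure. The function name and all figures are illustrative.

```python
def workaround_is_cheaper(
    unplanned_shutdown_cost: float,
    repair_cost: float,
    manual_cost_per_shift: float,
    shifts_until_planned_stop: int,
) -> bool:
    """Compare two options for a failed automatic counter (hypothetical model).

    Option A: stop the line now and repair the counter.
    Option B: count by hand until the next planned stop, then repair the
    counter during that stop at no extra shutdown cost.
    `manual_cost_per_shift` bundles extra labour, lost productivity and
    the expected cost of counting errors.
    """
    repair_now = unplanned_shutdown_cost + repair_cost
    work_around = manual_cost_per_shift * shifts_until_planned_stop + repair_cost
    return work_around < repair_now


# Hypothetical figures in arbitrary currency units.
print(workaround_is_cheaper(5000, 800, 200, shifts_until_planned_stop=10))  # True
print(workaround_is_cheaper(5000, 800, 200, shifts_until_planned_stop=40))  # False
```

The repair cost appears on both sides and cancels; it is kept only to make the two options explicit. The decision turns on whether manual counting until the next planned stop costs less than an unplanned shutdown.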

Some Failures must be Avoided

The RCM process identifies failure conditions with safety or financial risks that must be avoided "at any cost". Flight-critical airplane components must work, or must have reliable back-up systems.

In a manufacturing context, the process identifies the high-priority maintenance needed to keep critical components operating to meet their safety and functional requirements.
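The value of a back-up system follows from the standard formula for parallel redundancy: if independent units each fail with some probability, the chance that all of them fail together is the product of those probabilities. The sketch below assumes independence and uses a hypothetical per-flight failure probability; the function name is illustrative.

```python
import math
from typing import Sequence


def all_units_fail(failure_probs: Sequence[float]) -> float:
    """Probability that every redundant unit fails during the same period,
    assuming the units fail independently of one another."""
    return math.prod(failure_probs)


single = 1e-3  # hypothetical per-flight failure probability of one unit
print(all_units_fail([single]))          # primary unit alone
print(all_units_fail([single, single]))  # primary unit plus one back-up
```

Independence is the critical assumption. A shared cause, such as a common power supply or the same maintenance error applied to both units, can make the real figure far worse than the product suggests.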

One Practical Suggestion: Prioritize

RCM is similar to the Failure Modes and Effects Analysis (FMEA) process, in that both assess risks in terms of the likelihood of a noticeable failure occurring and how serious the effects would be. In both, the analyst is interested in failures that affect the process: the machine shuts down, produces defective products, or is likely to injure the operator.
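FMEA commonly combines these ratings into a risk priority number (RPN): the product of severity, occurrence and detection scores, each usually rated from 1 to 10. The sketch below scores the three failures named above; the scores, the class name and the `needs_action` thresholds are hypothetical. The rule of acting on very high severity regardless of RPN reflects a convention many practitioners follow, and is included here as an assumption.

```python
from dataclasses import dataclass


@dataclass
class FmeaEntry:
    name: str
    severity: int    # 1 = negligible effect ... 10 = hazardous
    occurrence: int  # 1 = remote ... 10 = almost certain
    detection: int   # 1 = almost certainly noticed in time ... 10 = undetectable

    def __post_init__(self) -> None:
        for score in (self.severity, self.occurrence, self.detection):
            if not 1 <= score <= 10:
                raise ValueError(f"{self.name}: scores must be between 1 and 10")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def needs_action(entry: FmeaEntry, rpn_limit: int = 100, severity_limit: int = 9) -> bool:
    """Hypothetical rule: act if the RPN is high OR the effect is very severe."""
    return entry.rpn >= rpn_limit or entry.severity >= severity_limit


entries = [
    FmeaEntry("Machine shuts down", severity=5, occurrence=4, detection=2),
    FmeaEntry("Produces defective products", severity=6, occurrence=5, detection=6),
    FmeaEntry("Injures the operator", severity=10, occurrence=2, detection=4),
]

for e in sorted(entries, key=lambda e: e.rpn, reverse=True):
    print(f"RPN {e.rpn:>3}  act={needs_action(e)!s:<5}  {e.name}")
```

With these scores the injury risk has a lower RPN than defective products, yet it still triggers action through the severity rule; ranking by RPN alone would understate it.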

Both processes produce detailed plans for avoiding or correcting such failures. The level of detail is such that many factories can afford to analyze only part of their equipment. Therefore, the manager must either start with the machines or processes already known to carry the greatest risks, or begin with a broad but shallow assessment.

Some industries require the full analysis: the nuclear and aviation industries are examples where a failure can be catastrophic. In these industries, the loss of public confidence could be a greater disaster than the financial cost of a breakdown. As noted, most other corporations should begin by making a "quick" survey to determine which machines need a complete analysis.

By Oskar Olofsson