Justifying Preventative Maintenance – Where’s the Value?

Summary:

How to build a simple, easy-to-understand business benefits model for preventative maintenance investments; you have to model the risk of failure. Fear not - this simple calculation will do the trick.

It’s relatively simple to have the IT Value Conversation (What is the value of IT? How much more should we invest? What is the cost of cutting resources?) when IT and their internal partners regularly work together to quantify benefits for projects. This is especially true for manufacturing companies, where projects usually focus on labor productivity – since IT is typically thought of as a “cost of doing business”, something quite removed from the top line.

The challenge often comes with projects that are sponsored by IT, for IT – work done to “maintain technical currency”. What exactly are the benefits of a server refresh or a tedious, desktop-by-desktop upgrade of a virus scanning program? Universally, IT managers insist on keeping the technology up to date, grousing about aging servers “on the brink of crashing” and out-dated software that lacks the features and functionality of the Latest and Greatest. Typically, however, they cannot glibly quantify the hard $$, soft $$, and productivity benefits of keeping these systems up-to-date.

It is difficult to ask for a large capital investment with no quantifiable benefits – I applaud the generations of IT managers that have been able to pull it off. The rest of us, however, need to come up with a simple yet reasonable model for expressing why it makes sense to make these investments.

Note: This is not an education exercise for neo-luddites that don’t get IT (sic) and just want everything to be as cheap and easy as their Android phone. No, the real need is to make a reasoned case to get a chunk of a [typically] small pile of available cash, where you will [always] be competing with other functional areas of the company making their own [purportedly] sound business cases.

Want to win the money grab? You need to come to the table with numbers.

Hard $$, Soft $$, and Productivity

Our basic story is going to be something like this:

What: Upgrade [hard/soft]ware X to latest version Y, at a cost of $Z.
Why: Maintain currency – [hard/soft]ware X is going off vendor support, and is approaching the end of its useful life.
Cost: One time $A plus annual recurring $B
Benefits: ???

Ah, the dreaded Benefits question. In these cases, it’s probably a Risk Management ploy – if we don’t upgrade, the servers will eventually crash, and then umm, ooo, err, and bad things will happen. A fine story, but when in competition for scarce resources, we have to be specific.

I’ve written previously about the three types of benefits for any project; to make a solid business case, we must express the Hard $$, Soft $$, and Productivity benefits in real numbers if we hope to get the effort going. Two of those three categories will probably not identify a huge amount of benefits …

Hard $$ – Typically the easiest to quantify, if only because upgrade projects rarely produce any cost reductions. I’m talking real $$ here – as in I will cut next year’s budget by $H when this updated system goes into place. There might be something here; you could be replacing expensive maintenance on aging devices with cheaper maintenance on newer and fewer devices. Do the math and capture anything that you can – even if it is a relatively small amount.

Productivity – Here, we’re talking labor productivity – and for upgrades, this may be expressed in terms like “the new [hard/soft]ware features additional automation that will reduce support tasks by P hours/month for the team”. I wouldn’t expect a lot here, but think about it and see if there is not at least some improvement with the new stuff.

Risk Mitigation and Cost Avoidance

The Soft $$ benefits category is where most of the Risk Management thinking comes into play. It’s the classic FUD scenario – if we don’t replace this [hard/soft]ware, we will eventually come to a point where the system fails and the entire company will grind to a halt. Hmmm – sounds like a Doomsday scenario – but this stuff has been working just great for the last 5 years, why not leave well enough alone?

The response should be a conservative yet realistic risk model that sets out a plausible scenario for a system outage. Have a conversation with the primary users of the systems, and ask a series of simple questions that will make the problem real. Here’s a sample exchange concerning a trio of servers that support an ERP system …

If the system fails – can I still manually take orders and ship product?

Yes; we can manually process orders and coordinate manufacturing shifts, but based on volume, we wouldn’t want to go more than two days before the backlog of paperwork would get overwhelming. Plus, we run lean, so without replenishment signals, we will run out of WIP and Raw Material in just a few days.

So – what is the maximum number of days without a system until we can’t make money?

After two days, we will be forced to shut the plant down – two days, and we stop shipping.

How much do I ship in a day?

Total Annual Revenue = $50M per year, and we plan 200 shipping days per year – so that’s $250K per day.

If the servers fail, how long until we get the right replacement parts and get the system back running again?

This system is off maintenance, but parts still exist on eBay – I’d say 1 week (5 business days).

So therefore – how much is the cost of one system failure?

( (5 business days downtime) – (2 business days on “manual mode”) ) * ($250K Daily Revenue) = $750K
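The arithmetic from the exchange above can be sketched in a few lines of Python, which makes it easy to re-run with your own answers. The function and parameter names here are illustrative assumptions, not part of any standard model; the inputs are the figures from the sample conversation.

```python
def cost_of_one_failure(annual_revenue, shipping_days_per_year,
                        downtime_days, manual_mode_days):
    """Revenue lost in one outage, after the days we can survive on manual processes."""
    daily_revenue = annual_revenue / shipping_days_per_year
    # Days with no shipments at all; never negative if manual mode outlasts the outage.
    lost_days = max(downtime_days - manual_mode_days, 0)
    return daily_revenue * lost_days


# Figures from the sample exchange: $50M revenue, 200 shipping days,
# 5 business days to restore service, 2 days survivable in manual mode.
failure_cost = cost_of_one_failure(50_000_000, 200, 5, 2)
print(f"${failure_cost:,.0f}")  # $750,000
```

If manual mode can outlast the repair time, the lost-revenue figure drops to zero, and the case for the upgrade has to rest on something else.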

Now, let’s “risk adjust” this number, to make it realistic. I’m sure the servers will not last another 10 years, but I also think we could squeeze just one more year out of them – the chance is kinda small, right? If that’s the way we are thinking …

How many years have to go by before we think it’s a 50/50 proposition that the servers will fail?

Three years; in the next three years, it’s an even chance that the servers will fail.

Ok, so now we spread the cost of one system failure across that number of years: $750K per failure / 3 years = $250K per year – we’ll call this the Risk Adjusted Cost of Failure.

So now, my suggestion to spend $50K for three servers, for $250K in annual [Soft $$] benefits, does not seem out of line. Actually, seems like a pretty good investment to me; payback in about 10 weeks.
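For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the risk adjustment and the payback estimate. The function names are hypothetical, and the 52-weeks-per-year conversion is an assumption of the sketch; the inputs are the figures used above.

```python
def risk_adjusted_annual_cost(failure_cost, years_to_even_odds):
    """Spread the cost of one failure over the years until failure is a 50/50 bet."""
    return failure_cost / years_to_even_odds


def payback_weeks(investment, annual_benefit):
    """Weeks until the annual benefit covers the one-time investment (52-week year)."""
    return investment / annual_benefit * 52


# $750K per failure, 50/50 odds within three years, $50K for three servers.
annual_benefit = risk_adjusted_annual_cost(750_000, 3)
weeks = payback_weeks(50_000, annual_benefit)
print(f"${annual_benefit:,.0f} per year, payback in {weeks:.1f} weeks")
```

Note that this follows the simple division used in the conversation. A stricter expected-value reading (an assumption of this note, not the method above) would also multiply by the 50% probability, giving $125K per year and a payback of roughly 21 weeks, which is still a comfortable case.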

Facts not FUD

Yes, this is a Cost Avoidance conversation, and yes, this is based on a certain amount of Preventing Future Calamity. However, this approach helps frame the conversation well; it shifts the conversation from I want new servers to I’m trying to protect our revenue – and building a plausible conversational model with the folks in Customer Service and Operations makes them part of the team arguing for the project.

In the end, you still may not get the dollars for the system upgrade – and the world will still wail and mope until you replace the failed servers when they eventually go. But you will be able to remind folks that it was a joint decision to defer the spending – and that can go a long way towards reducing the noise. This process will also help IT understand what trade-offs are being made when decisions to spend or not spend do not go their way.

Then again – why am I being so pessimistic? With such well-reasoned numbers presenting a logical, fact-based business case, you will get these projects approved every time!

4 August, 2014

James MacLennan