In fact, operators are identified as the culprits in 80 percent of all accidents. Doesn’t this seem odd in this day and age of automation and complex systems? Doesn’t this seem like an easy out on the part of investigators? Rather than play the blame game for whatever reason, organizations should focus on causal factors and address the real issues.
Complexity
It’s no secret that our world is becoming more complex. Not only does everything from stoves to air conditioners to the space shuttle have electronics and computers, but these systems are becoming increasingly integrated into a web of connections, creating dependencies and interactions never before dreamed of. With all of these interconnections and dependencies, however, comes the potential for some serious headaches.
We like to think of ourselves as rational people who live by causality. Simple causality deals with direct cause and effect: the system overheated and failed. We read this as “because it overheated, the system failed.” It is straightforward, we cling to it, and it seems easy enough. The truth is that problems aren’t always this simple.
When multiple things fail and interact in ways that, combined, bring the system down, we call it systemic causality. The trick is that systemic causality does not necessarily follow a linear path; it emerges from multiple components, or subsystems, failing and interacting with one another.
For example, a power failure causing systems to crash seems easy to explain. But when we dig in, we find that there was a UPS (uninterruptible power supply) and a generator. The systems should have been protected.
However, we find out that the UPS died earlier than expected. A couple of space heaters had been plugged into the UPS circuit because an electrician had accidentally connected an outlet to the protected power circuit, and the heaters only ran during business hours when the staff was present, so the extra load was never noticed during regular weekend testing.
To make matters worse, the generator hadn’t been exercised for a long time because its electronic remote starter had failed; the fuel had evaporated and clogged the fuel injection system. Operations didn’t notice because the generator’s monitoring circuit had been reporting erroneously for quite some time and operators had learned to discount what the logs said. On top of that, the second generator was down having preventive maintenance performed.
The realistic answer is that nobody could have known that all of these independent issues would interact to cause an outage. How would you address this type of scenario?
Incomprehensibility
The hard part, and this is why operators get blamed the most, is that many times the failures in the component systems create scenarios that were previously unfathomable. If you visit this site and go to the accidents and errors page, you will see a long list of accidents, many of which are blamed on operators. Why? My hunch is that they needed to blame somebody. “We are going to put you into a situation for which you are totally unprepared and see how you do. If you fail, we’ll blame you.” That sure sounds promising, doesn’t it?
So what does this mean? First, vendors and internal groups must place an emphasis on proper design and thorough testing. The testing needs to be formalized and performed by software test engineers versed in the proper methodologies. Note that quality must come first, before features. While testing is a much-needed detective control that assists in ensuring quality, it is not the total solution; it must be integrated with the overall system so that test feedback drives process improvement loops. To borrow a phrase from manufacturing: you don’t inspect quality in, you build quality in.
Second, an effective change management process must be in place and followed. There must be detective controls that can map changes found in production back to authorized change orders, and only authorized changes should be allowed to remain. The ITIL Service Support book and the ITPI Visible Ops methodology provide great guidance here.
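To make that detective control concrete, here is a minimal sketch in Python of reconciling detected production changes against authorized change orders. The data structures, field names (such as change_id) and the sample data are assumptions for illustration only, not an interface defined by ITIL or Visible Ops.

# Minimal sketch (illustrative names only): reconcile changes detected in
# production against the list of authorized change orders.

from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set

@dataclass
class DetectedChange:
    item: str                  # configuration item that changed, e.g. "web01:/etc/httpd.conf"
    detected_at: datetime      # when the detective control spotted it
    change_id: Optional[str]   # change order cited by the implementer, if any

def unauthorized_changes(detected: List[DetectedChange],
                         authorized_ids: Set[str]) -> List[DetectedChange]:
    """Return detected changes that cannot be mapped to an authorized change order."""
    return [c for c in detected
            if c.change_id is None or c.change_id not in authorized_ids]

if __name__ == "__main__":
    authorized = {"CHG-1041", "CHG-1042"}   # pulled from the change management system
    detected = [
        DetectedChange("web01:/etc/httpd.conf", datetime(2006, 3, 4, 2, 15), "CHG-1041"),
        DetectedChange("db02:kernel.params", datetime(2006, 3, 4, 3, 5), None),
    ]
    for change in unauthorized_changes(detected, authorized):
        # Anything printed here should be rolled back or escalated for review;
        # only authorized changes are allowed to remain.
        print("UNAUTHORIZED:", change.item, change.detected_at.isoformat())

The point of the sketch is the reconciliation itself: every change observed in production either maps to an approved change order or is flagged for rollback or review.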
Third, we must evolve adaptive processes that rapidly recognize and adapt to variations from the understood mean. This applies not just to application logic, but to manual human processes as well. Systems and their operators must be adept at recognizing the need to change and then actually changing in a secure, timely and efficient manner.
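As a small illustration of recognizing variation from the understood mean, the sketch below flags a reading that drifts outside a simple control band around a baseline. The metric (UPS circuit load), the baseline values and the three-sigma threshold are assumptions chosen for the example, not a recommended standard.

# Minimal sketch: flag a sample that deviates from the understood mean.
# The baseline readings, threshold and metric are illustrative assumptions.

from statistics import mean, stdev

def out_of_band(baseline, sample, sigmas=3.0):
    """Return True if the sample falls outside mean +/- sigmas * standard deviation."""
    mu = mean(baseline)
    sd = stdev(baseline)
    return abs(sample - mu) > sigmas * sd

if __name__ == "__main__":
    # Hypothetical weekend-test load readings (kW) on the protected UPS circuit.
    baseline_load = [4.1, 4.3, 4.0, 4.2, 4.1, 4.4, 4.2]
    business_hours_reading = 7.8   # the unnoticed space-heater load from the earlier example
    if out_of_band(baseline_load, business_hours_reading):
        print("Load deviates from the understood mean -- investigate before it bites.")

Had a check like this run against business-hours readings rather than only during weekend tests, the extra load in the earlier scenario would have surfaced long before the outage.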
Fourth, members of failure review boards must avoid taking the easy way out. Rather than flag the outcome as a result of operator error, ask yourself these two simple questions: “Could anyone have realistically known what to do in that situation?” and “Could anyone have computed the solution and acted in the timeframe allotted?” Quite often, the answer is “no,” which points back to systemic issues that are process-based and/or technical.
Summary
Rather than expect operators to perform superhuman acts of omniscience, we must confront the systemic issues in processes and technology so that the accidents cannot happen again. Yes, people can and do make mistakes. The point is that it is too simple to blame the operator for unexplained failures.
Organizations must dig in, ensure that there is learning after the accident, and introduce appropriate measures to prevent recurrences in the future. This is done by addressing root causes, not by playing the blame game.