Monitoring Complex Systems

Written By

George Spafford

Jun 17, 2004

Complex networks require increasingly sophisticated monitoring systems. However, far too often, monitoring is an afterthought and not a holistically engineered part of the system. In fact, it is very common that the overall monitoring system is complicated and mission-critical, yet has varying degrees of documentation, training, fault-tolerance and security.

To improve, organizations must recognize that a monitoring system can itself cause problems, and that it brings a unique set of issues that must be taken into account and mitigated.

Perceived Reliability

We must consider how people perceive the accuracy of automated feedback systems. A properly designed monitoring system must let operators realistically investigate, and record the findings of, every alert raised or issue flagged.

In other words, the system must be a closed loop in which issues are raised, investigated, mitigated (if need be) and the results logged. The problem is that as the number of erroneous alerts increases, so do the personnel time wasted and the level of frustration.
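The closed loop described above can be sketched as a small state machine. This is only an illustration: the `Alert` class, the `AlertState` names and the transition table are hypothetical, chosen to mirror the four steps in the text, with mitigation left optional.

```python
from dataclasses import dataclass, field
from enum import Enum


class AlertState(Enum):
    RAISED = "raised"
    INVESTIGATED = "investigated"
    MITIGATED = "mitigated"
    LOGGED = "logged"


# Allowed transitions (an assumption for this sketch). Mitigation is
# optional, so an investigated alert may go straight to LOGGED.
ALLOWED = {
    AlertState.RAISED: {AlertState.INVESTIGATED},
    AlertState.INVESTIGATED: {AlertState.MITIGATED, AlertState.LOGGED},
    AlertState.MITIGATED: {AlertState.LOGGED},
    AlertState.LOGGED: set(),
}


@dataclass
class Alert:
    source: str
    message: str
    state: AlertState = AlertState.RAISED
    history: list = field(default_factory=list)

    def advance(self, new_state: AlertState, note: str) -> None:
        """Move to the next step, refusing to skip the investigation."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.history.append((self.state, new_state, note))
        self.state = new_state

    @property
    def closed(self) -> bool:
        # The loop is closed only once the findings have been logged.
        return self.state is AlertState.LOGGED


def open_alerts(alerts):
    """Return the alerts whose loop has not been closed yet."""
    return [a for a in alerts if not a.closed]
```

A growing `open_alerts` list is one rough signal that operators can no longer realistically investigate everything the system raises.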

This “perceived reliability” is a key dynamic for any form of monitoring. If operators have expectations that are out of alignment with what the system can deliver, then they are far more likely to discount reports coming from that system and even falsify reports in order to “not waste time.”

Far too many accidents have taken place because operators assumed that messages were false positives when, in fact, the alerts were accurate. From this, we can posit The Law of False Alerts: As the rate of erroneous alerts increases, operator reliance on, or belief in, subsequent warnings decreases.
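One way to keep an eye on this dynamic is to measure the false alert rate per source from the logged results of investigations. The sketch below assumes each investigated alert is recorded as a `(source, was_real)` pair; the function names and the 0.5 threshold are illustrative assumptions, not figures from any standard.

```python
from collections import Counter


def false_alert_rate(outcomes):
    """Return {source: fraction of investigated alerts that were false}.

    `outcomes` is an iterable of (source, was_real) pairs, an assumed
    record format for this sketch.
    """
    total = Counter()
    false = Counter()
    for source, was_real in outcomes:
        total[source] += 1
        if not was_real:
            false[source] += 1
    return {source: false[source] / total[source] for source in total}


def noisy_sources(outcomes, threshold=0.5):
    """List sources whose false alert rate exceeds `threshold`.

    The default threshold is arbitrary; a real value would come from
    what operators can tolerate before they start ignoring a source.
    """
    rates = false_alert_rate(outcomes)
    return sorted(s for s, rate in rates.items() if rate > threshold)
```

Sources that show up in `noisy_sources` are candidates for tuning or repair before operators learn to discount them.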

If a monitoring system used to detect security breaches, or any critical parameter for that matter, constantly raises false alarms in one area of a complex system, wouldn’t that area be a prime target for a hacker or terrorist? Whether it is an intrusion detection system that constantly reports non-existent incursions, a flaky motion sensor flagging movement that doesn’t exist, or an open/closed sensor providing a false report about a valve’s state, if it becomes known as a weak link through media reports or even the office rumor mill, then it is at risk of allowing a breach to happen.

What do we do?

First, we must treat monitoring as an intrinsic part of the overall system in question. By adding monitoring to a system with little thought, we risk monitoring the wrong events and/or wrongly interpreting the reported data. In other words, there must be a holistic approach that identifies the key performance indicators in the system, their acceptable bounds and the key causal logic. “If these sensors register X, Y and Z, then event Alpha must be taking place and IT operations must be alerted immediately.”
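The quoted rule can be written down explicitly once the indicators and their acceptable bounds are known. The following is a minimal sketch: the indicator names X, Y and Z come from the example above, but the numeric bounds and the function names are made up for illustration.

```python
# Hypothetical acceptable bounds (low, high) for each key indicator.
BOUNDS = {
    "X": (0.0, 75.0),
    "Y": (10.0, 90.0),
    "Z": (0.0, 5.0),
}


def breached(readings, bounds=BOUNDS):
    """Return the set of indicators whose reading is outside its bounds.

    Indicators with no reading are not counted as breached in this
    sketch; a real system would likely treat a silent sensor as its
    own kind of event.
    """
    result = set()
    for name, (low, high) in bounds.items():
        value = readings.get(name)
        if value is not None and not (low <= value <= high):
            result.add(name)
    return result


def event_alpha(readings, bounds=BOUNDS):
    """Causal rule: Alpha is inferred only when X, Y and Z all breach."""
    return breached(readings, bounds) == set(bounds)
```

For instance, `event_alpha({"X": 80, "Y": 95, "Z": 6})` is true under these assumed bounds, while a reading set in which only X and Z breach does not trigger the rule.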

The human factor must be taken into account, with careful planning of which events trigger an alarm, the processes used to validate results, the layout of the messages and so on. Always bear in mind that as the level of false positives increases, faith in the monitoring system decreases. The monitoring system must not only be accurate, it must be viewed as accurate and as providing value to the operators, or they will increasingly ignore it over time, perhaps with disastrous results.

Second, build “monitoring in depth.” This is a play on “defense in depth” in that multiple sensors are arranged to confirm events.

For example, one potential scenario is that a more sensitive but more error-prone sensor is used to initially indicate a state and a less sensitive but more reliable sensor is used in series to corroborate the earlier “fast alert” probe.

Another scenario could involve an array of sensors used to confirm an event due to the critical need to be certain that the data collected is accurate. A single monitoring system is as susceptible to a single point-of-failure incident as any other system.
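Both scenarios can be sketched in a few lines. The function names, the three status strings and the quorum size are assumptions made for illustration; the series arrangement is modeled by consulting the reliable sensor only after the fast one has tripped.

```python
from typing import Callable, Iterable


def corroborate(fast_tripped: bool, check_reliable: Callable[[], bool]) -> str:
    """Two sensors in series: the reliable one confirms the fast one.

    `check_reliable` is a zero-argument callable (hypothetical) so the
    slower, more reliable sensor is read only when needed.
    """
    if not fast_tripped:
        return "clear"
    return "confirmed" if check_reliable() else "unconfirmed"


def quorum_confirms(votes: Iterable[bool], required: int) -> bool:
    """Sensor array: the event counts only if enough sensors agree."""
    votes = list(votes)
    if not 1 <= required <= len(votes):
        raise ValueError("required must be between 1 and the number of sensors")
    return sum(1 for v in votes if v) >= required
```

With three sensors and `required=2`, a single faulty sensor can neither raise a confirmed event on its own nor suppress one that the other two report.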

Third, plan for continuous improvement. Odds are high that most of the underlying systems monitored will evolve over time for one reason or another. In parallel, the monitoring system must evolve to continue meeting expectations.

A monitoring system that can only handle 10Mb/s will face a virtually impossible task if the underlying system is upgraded to gigabit speeds and it can’t sample the data fast enough. Furthermore, these systems must be reviewed over time to ensure that they still align with operator requirements. For example, filters may need to be added or modified to screen out unnecessary “noise” that didn’t initially exist and that the operators now contend with (provided, of course, that proper analysis is done to determine why the noise exists).
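The size of that gap is easy to estimate. The sketch below assumes a fully utilized link, takes gigabit to mean 1000Mb/s, and uses a hypothetical `sampling_coverage` helper; real traffic is rarely at line rate, so treat the result as a worst case.

```python
def sampling_coverage(monitor_mbps: float, link_mbps: float) -> float:
    """Worst-case fraction of traffic the monitor can inspect.

    Assumes the link is saturated and the monitor is limited only by
    its throughput, both simplifications for this sketch.
    """
    if monitor_mbps <= 0 or link_mbps <= 0:
        raise ValueError("rates must be positive")
    return min(1.0, monitor_mbps / link_mbps)


# A 10Mb/s monitor on a saturated gigabit link sees about 1% of the data.
coverage_after_upgrade = sampling_coverage(10, 1000)
```

Under these assumptions, the upgrade leaves the monitor a hundredfold short of the capacity it needs.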

Fourth, treat monitoring as an important activity and give it the appropriate engineering resources and processes, such as change advisory boards set up to review, approve and schedule changes.

Monitoring must evolve from a haphazard afterthought to a critical application with specified service levels identifying timeliness, accuracy, uptime and security. For all monitoring, and especially SCADA systems, there must be effective communication between functional groups to ensure the systems are designed, secured and maintained appropriately.

Summary

Complex systems require increasingly sophisticated monitoring systems. Care must be taken to design secure systems that meet requirements and are perceived as accurate by the operators.

If a monitoring system is perceived as not adding value, operators will depend on it less and less. This, in turn, creates a fertile environment for security breaches, accidents and all types of inefficiencies. With that in mind, monitoring systems must evolve from afterthoughts into well-engineered systems, and keep evolving, to ensure expectations are met.
