Too Much of a Good Thing: Managing Information Overload in Storage Management

Written By

Drew Robb

Jun 23, 2003

When managing storage and other network elements, you can easily end up with far too much of a good thing. Servers, routers, switches, desktops, firewalls, intrusion detection systems — each produces a wealth of information detailing every aspect of its performance, as well as the performance of related network elements. The result is that you end up with an overwhelming amount of data. A vast sea of unimportant alerts within device-specific logs masks a handful of vital alerts that require immediate analysis, coordination, and priority attention by administrators.

“Our admins will go in and look at the logs to see what happened before a server locked up,” says Steve Luciano, Network Administrator for New Pig Corporation, an industrial safety and plant maintenance vendor headquartered in Tipton, Pennsylvania. “But it’s difficult to keep on top of all the servers amongst everything else they have to do.”

New Pig searched for a means of presenting storage and networking information from disparate sources in a useful and centralized format. This led to the company acquiring and installing Event Log Management (ELM) software.

Mother Lode

The key element to track when managing storage systems is, of course, the disk drives.

“You have to understand that disk drives are like light bulbs,” says Paul Santeler, VP of Management Networking and High Availability Products Group at Hewlett-Packard. “They will fail. It is how well prepared you are when one fails that makes the difference between a well-run or poorly run data center.”

To help administrators prepare for impending failures, disk drives use a system called Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.). S.M.A.R.T. monitors up to thirty different items within the drive, including seek time, head flying height, the time it takes to spin a disk up to its rated speed, and the internal temperature of the drive.

S.M.A.R.T. analyzes all these monitored elements and creates an overall health assessment for the drive based on algorithms the manufacturer establishes for that particular model. When a drive appears to be approaching the failure point, S.M.A.R.T. alerts the administrator in (hopefully) enough time to back up the drive and replace it. If the disk is part of a RAID array, there is an additional level of protection.
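As a rough illustration of how such a health assessment can work, the sketch below compares each monitored attribute's normalized value against a failure threshold. The attribute names, values, thresholds, and the `warn_margin` parameter are all hypothetical; real drives use manufacturer-specific algorithms that this simplification does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    name: str
    value: int      # normalized current value (higher is healthier)
    threshold: int  # failure threshold (hypothetical numbers below)


def assess_drive(attributes, warn_margin=10):
    """Return 'FAILING', 'WARNING', or 'OK' for a list of SmartAttribute."""
    status = "OK"
    for attr in attributes:
        # An attribute at or below its threshold marks the drive as failing.
        if attr.value <= attr.threshold:
            return "FAILING"
        # An attribute close to its threshold is worth an early warning.
        if attr.value - attr.threshold <= warn_margin:
            status = "WARNING"
    return status


# Illustrative readings only; not taken from any real drive.
drive = [
    SmartAttribute("seek_time_performance", value=68, threshold=30),
    SmartAttribute("spin_up_time", value=35, threshold=30),
    SmartAttribute("temperature", value=60, threshold=40),
]
print(assess_drive(drive))  # prints "WARNING" (spin_up_time is near its threshold)
```

The early "WARNING" state is the point of the exercise: it gives the administrator time to back up and replace the drive before any attribute actually crosses its threshold.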

“When there is a failure coming, the S.M.A.R.T. drive passes that information to the RAID controller,” says Santeler. “But RAID does its own analysis as well, monitoring hundreds or thousands of things on the drive itself to try to see as a whole what might cause failure.”

But drive status is just one part of ensuring the availability and performance of storage systems. A complete picture requires an end-to-end view of the entire process as it affects end users. It is therefore wise to keep tabs on other sources of information as well, including:

FECN/BECN – FECNs (Forward Explicit Congestion Notifications) and BECNs (Backward Explicit Congestion Notifications) are Frame Relay messages that notify the receiving (FECN) or sending (BECN) device that there is congestion in the network.

SNMP – SNMP (Simple Network Management Protocol) lets administrators monitor and manage such items as CPU utilization, available disk space, temperature, up or down status of devices, connections or services, excessive errors on switches/routers, server fan failure, and bandwidth utilization.

Security Threats – These include password hacking, stealth and port scans on firewalls, application failures due to viruses, and login authentication failures recorded in firewall or other security logs.

Alert Reduction

While all this information should make it easy to proactively manage storage and network systems, the problem in most cases is simply too much information. Even a medium-sized network can have hundreds of separate logs, and each of these logs holds more information than can easily be digested and acted on. This is where Event Log Management (ELM) tools help out. Examples of ELMs include Adiscon GmbH’s Event Reporter; Somix Technologies, Inc.’s Logalot; TNT Software’s ELM Log Manager; GFI Software’s LANGuard; and RGE, Inc.’s IPSentry.

ELMs aggregate all the information contained in the Event Logs and Syslogs into a single database and present that information in a single interface. While this is easier than having to individually log onto each piece of equipment to view the logs, the real value in ELMs lies in their ability to winnow down the information to a manageable level.

ELMs store all log entries, but since the vast majority of entries are routine items that never need to be seen, the management console can be configured to hide non-essential entries. For events that do require intervention, administrators can set the appropriate alerting and escalation policies.
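The store-everything, show-only-what-matters idea can be sketched in a few lines. This is a minimal illustration, not how any of the products named above is implemented; the table layout, the severity labels, the `ROUTINE` set, and the host names are assumptions made for the example.

```python
import sqlite3

# Hypothetical severities that are stored but hidden from the console.
ROUTINE = ("info", "debug")


def open_store():
    """Create an in-memory database standing in for the central repository."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE events (source TEXT, severity TEXT, message TEXT)")
    return db


def ingest(db, source, severity, message):
    """Store every entry, routine or not."""
    db.execute("INSERT INTO events VALUES (?, ?, ?)", (source, severity, message))


def console_view(db):
    """Return only the entries an administrator needs to see."""
    placeholders = ",".join("?" for _ in ROUTINE)
    rows = db.execute(
        "SELECT source, severity, message FROM events "
        f"WHERE severity NOT IN ({placeholders}) ORDER BY rowid",
        ROUTINE,
    )
    return rows.fetchall()


db = open_store()
ingest(db, "fileserver01", "info", "Scheduled backup completed")
ingest(db, "switch-3", "error", "CRC errors on port 12")
ingest(db, "raid-ctrl", "warning", "Drive 4 predictive failure")

total = db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(total, len(console_view(db)))  # prints "3 2"
```

Nothing is discarded: the routine backup entry stays in the database and can still be queried later, but it never clutters the console.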

New Pig, for example, uses Logalot for ELM. “If you have a problem with a switch and are getting a lot of Cyclic Redundancy Check (CRC) errors, it won’t send a hundred e-mails,” says Luciano, “but they all get tallied on the bulletin board so I can go there to view them.”
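The behavior Luciano describes, one notification followed by a running tally, might look something like the sketch below. The `AlertBoard` class and its method names are hypothetical and are not Logalot's actual interface; the example only shows the general technique of suppressing duplicate notifications while still counting every occurrence.

```python
from collections import Counter


class AlertBoard:
    """Tally repeated alerts and notify only on the first occurrence."""

    def __init__(self):
        self.tally = Counter()
        self.notifications = []  # stands in for e-mails that would be sent

    def report(self, source, error):
        key = (source, error)
        self.tally[key] += 1
        # Only the first occurrence of a given alert triggers a notification.
        if self.tally[key] == 1:
            self.notifications.append(f"{source}: {error}")


board = AlertBoard()
for _ in range(100):
    board.report("switch-3", "CRC error")

print(len(board.notifications), board.tally[("switch-3", "CRC error")])  # prints "1 100"
```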

Having all alerts available in a single console makes it easier to quickly track down the source of a problem. For instance, knowing that you have simultaneous alerts from the Intrusion Detection System and from the database server indicating excessive CPU utilization provides a quicker answer to what is happening than if you had to track down each individually.
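One simple way to surface that kind of relationship is to group alerts that arrive close together in time and come from different sources. The sketch below assumes a hypothetical alert record (a dictionary with `time`, `source`, and `message` keys) and an arbitrary five-minute window; a real product would likely use richer correlation rules.

```python
from datetime import datetime, timedelta


def correlate(alerts, window=timedelta(minutes=5)):
    """Group alerts that fall within `window` of the first alert in a group,
    then keep only groups that involve more than one source."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        if groups and alert["time"] - groups[-1][0]["time"] <= window:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return [g for g in groups if len({a["source"] for a in g}) > 1]


# Illustrative alerts only.
alerts = [
    {"time": datetime(2003, 6, 23, 9, 0), "source": "ids", "message": "Port scan detected"},
    {"time": datetime(2003, 6, 23, 9, 2), "source": "db-server", "message": "CPU utilization 98%"},
    {"time": datetime(2003, 6, 23, 14, 30), "source": "switch-3", "message": "CRC errors"},
]

related = correlate(alerts)
print([a["source"] for a in related[0]])  # prints "['ids', 'db-server']"
```

The afternoon switch alert stands alone and is left out, while the intrusion detection and database alerts, two minutes apart, are presented together.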

“Before, it was a matter of not really knowing what was going on or why things were happening,” Luciano says. “Now, when the IS manager wants to find out what is going on with the network, she can go to the bulletin board and see all the active situations that are going on.”

Simplification

With storage growing at 50 to 100 percent annually in many organizations, ELM tools certainly won’t solve all problems. They do, however, simplify the often overwhelming business of dealing with multitudes of alerts, alarms, and events. ELMs allow the administrator to set alerting parameters for storage resources (such as disk space, fragmentation levels, and disk performance criteria) and gather those alerts into one central repository. At the end of the day, that means the most vital alerts come to your immediate attention while the abundance of duplicative or less important events remains hidden until you need to drill down further to learn more about specific situations.
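Alerting parameters of this kind can be pictured as a table of limits checked against each volume's current readings. The metric names, limits, and volume names below are assumptions chosen for illustration, not defaults from any ELM product.

```python
# Hypothetical alerting parameters for storage resources.
THRESHOLDS = {
    "disk_used_pct": 90.0,      # disk space
    "fragmentation_pct": 30.0,  # fragmentation level
    "avg_response_ms": 25.0,    # disk performance
}


def check_volume(name, metrics, thresholds=THRESHOLDS):
    """Return an alert string for each metric that exceeds its limit."""
    alerts = []
    for metric, limit in thresholds.items():
        value = metrics.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{name}: {metric}={value} exceeds {limit}")
    return alerts


# A plain list stands in for the central repository.
repository = []
repository += check_volume("vol1", {"disk_used_pct": 95.2, "fragmentation_pct": 12.0})
repository += check_volume("vol2", {"disk_used_pct": 41.0, "avg_response_ms": 8.0})
print(repository)
```

Only the nearly full volume generates an alert; the healthy readings produce nothing and stay out of the administrator's way.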

This article originally appeared on Enterprise IT Planet.

Drew Robb

Drew Robb is a contributing writer for Datamation, Enterprise Storage Forum, eSecurity Planet, Channel Insider, and eWeek. He has been reporting on all areas of IT for more than 25 years. He has a degree from the University of Strathclyde UK (USUK), and lives in the Tampa Bay area of Florida.
