Observability for the Real World

Datamation content and product recommendations are editorially independent. We may make money when you click on links to our partners.

Introducing ‘Observability’

Observability is the hot new buzzword in IT Operations, DevOps, Agile, and Site Reliability Engineering (SRE) communities. The concept of observability originally comes from control theory and the industrial world, and is defined in Wikipedia as:

“A measure of how well internal states of a system can be inferred from knowledge of its external outputs.”

For example, in a water treatment plant with no instrumentation inside the pipes, a plant operator outside the pipes cannot determine if water is flowing, which way it is flowing, how clean it is, etc. The system lacks observability.

However, by adding flow gauges and quality sensors inside the pipes, connected (by ‘telemetry’) to meters or dashboards outside the pipes, the internal system states (flow speed, water purity, etc.) can be inferred from the external system outputs (meters, dashboards, etc.). The system has observability.

Observability for Software Applications and Services

The same principle can be applied to software. Modern developers are building measurement directly into code, delivering observable status indicators to meters and dashboards outside the application. This allows operations teams (including IT ops, sysadmins, SREs) to, for example:

· Detect, isolate, and alert sooner on critical incidents and events.

· Investigate problem root causes more accurately and efficiently.

· Fix incidents faster with real-time feedback on remediation efforts.

· Conduct more accurate post-incident reviews and post-mortems.

· Better understand problem history to prevent recurrence.

· Close feedback loops with requirements for continuous improvement.

· Use analytics and machine learning to predict and prevent problems.

· And much, much more.

Observability for the Real World

No wonder observability is becoming the norm for cloud-native businesses, which can build and deliver new code unhindered by decades of success and the ‘legacy’ of systems and applications that come with that success.

However, even a large traditional enterprise can build observability into services, even without substantial refactoring. For example:

· With no internal system changes – collect system-level data directly from servers, storage, networks, containers, cloud services, etc. (e.g. entity performance, utilization, capacity).

· With minor configuration changes – deploy collectd to measure and forward infrastructure attributes (e.g. CPU/memory utilization, network performance, storage IOPS).

· With (probably) minor code changes – deploy statsd to collect and forward metrics from inside your application (e.g. transaction response time, volume, errors, etc.).

· With (perhaps) major code changes – use semantic logging (even simple JavaScript injection) to instrument any activity, including business metrics (e.g. sign-ups, click-through rate, revenue).
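To make the statsd option concrete, here is a minimal sketch of what "minor code changes" can look like in Python. It formats metrics in the plain-text StatsD line protocol (`<name>:<value>|<type>`) and sends them over UDP. The `StatsdClient` class, the `shop` prefix, the `checkout` function, and the metric names are all hypothetical, chosen only for illustration; the host and port assume a statsd daemon listening locally on its default port, 8125.

```python
import socket
import time


def format_metric(name, value, metric_type):
    """Build one StatsD line: <name>:<value>|<type> (c = counter, ms = timer)."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


class StatsdClient:
    """Hypothetical fire-and-forget client; not a real library's API."""

    def __init__(self, host="127.0.0.1", port=8125, prefix="shop"):
        self._addr = (host, port)  # assumed location of the statsd daemon
        self._prefix = prefix
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def _send(self, name, value, metric_type):
        payload = format_metric(f"{self._prefix}.{name}", value, metric_type)
        try:
            self._sock.sendto(payload, self._addr)
        except OSError:
            pass  # a metrics failure should never break the application
        return payload

    def incr(self, name, count=1):
        """Count an occurrence, e.g. one transaction or one error."""
        return self._send(name, count, "c")

    def timing(self, name, millis):
        """Report a duration in whole milliseconds."""
        return self._send(name, int(millis), "ms")


def checkout(client, fail=False):
    """Illustrative business function instrumented for volume, errors, latency."""
    start = time.monotonic()
    try:
        if fail:
            raise ValueError("payment declined")
        client.incr("checkout.count")
    except ValueError:
        client.incr("checkout.errors")
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        client.timing("checkout.response_ms", elapsed_ms)
```

Because UDP is fire-and-forget, calling `checkout(StatsdClient())` adds very little overhead to the transaction, and it keeps working even if no collector is listening.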

Each approach is valuable to varying degrees. Even basic infrastructure metrics will help to detect and triage many problems, allowing IT Operations teams to answer key technology questions, such as:

· What is a normal transaction volume or resource utilization by hour, day, or month?

· Is my application performing correctly for this time of day, day of week, etc.?

· Are the application infrastructure and configuration sufficient for my current load?

· Are there transaction bottlenecks in certain applications that are causing problems?

· Are there services or systems throwing exceptions and errors that I need to fix?
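As an illustration of the first two questions, the sketch below builds a per-hour baseline from historical samples and checks whether a new reading is "normal" for its hour. It is one simple approach (mean plus or minus a multiple of the standard deviation), not the only one; the function names, the sample data shape, and the default tolerance of three standard deviations are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def hourly_baseline(samples):
    """Group (hour_of_day, value) samples and return {hour: (mean, stdev)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def is_normal(baseline, hour, value, tolerance=3.0):
    """True if value lies within `tolerance` standard deviations of the
    historical mean for that hour. Hours with no history count as normal."""
    if hour not in baseline:
        return True
    average, spread = baseline[hour]
    # Floor the spread so a perfectly flat history does not divide the
    # world into "exactly equal" and "anomalous".
    return abs(value - average) <= tolerance * max(spread, 1e-9)


# Illustrative history: transactions per minute at 09:00 and 14:00.
history = [(9, 100), (9, 110), (9, 90), (14, 500), (14, 520), (14, 480)]
baseline = hourly_baseline(history)
```

With this history, a reading of 105 at 09:00 falls inside the band, while a reading of 500 at 09:00 does not, even though 500 is perfectly normal at 14:00.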

However, application activity recorded in a well-structured semantic log opens up observability into higher-order data, allowing multiple stakeholders to also answer key business questions such as:

· How long are purchases taking at different times of day, or days of the week?

· What is my click-through rate, and how does it vary by customer, transaction, product?

· Is my current revenue number normal right now – and what should I do about it?

· Who is my best customer? My worst? Where should I focus my marketing?

· How many purchases are failing, and why? Which customers are affected?
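To show how a semantic log can answer a question like the last one, here is a minimal sketch that writes each business event as one JSON object per line and then counts failed purchases by reason. The event name `purchase`, the field names (`status`, `reason`, `customer_id`), and both function names are hypothetical; a real application would choose its own schema and would usually add a timestamp.

```python
import json
from collections import Counter


def log_event(stream, event, **fields):
    """Write one self-describing event as a single JSON line."""
    record = {"event": event, **fields}
    stream.write(json.dumps(record, sort_keys=True) + "\n")


def failed_purchases(lines):
    """Return (failure counts by reason, set of affected customer IDs)."""
    reasons = Counter()
    customers = set()
    for line in lines:
        record = json.loads(line)
        if record.get("event") == "purchase" and record.get("status") == "failed":
            reasons[record.get("reason", "unknown")] += 1
            customers.add(record.get("customer_id"))
    return reasons, customers
```

Because every record carries named fields rather than free text, the same log can serve operations (error counts) and the business (which customers were affected) without any parsing rules being written after the fact.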

From Observation to Action with AIOps

Observability itself is not the end goal. More charts and dashboards will not help your business succeed per se. To be truly meaningful, observability must feed action – such as real-time problem and incident triage, closed DevOps feedback loops, or prescriptive problem prevention.

Typically, this means collecting observability data, correlating it with other monitoring outputs, and processing it with advanced analytics and machine learning, to drive ‘known good’ responses into automated actions. Combining monitoring and observability with advanced data integration, machine learning, predictive analytics, and orchestration capabilities delivers what Gartner calls “Artificial Intelligence for IT Operations,” or “AIOps.”

For example, AIOps solutions will take your raw observability data and make it meaningful and actionable by:

· Integrating it with critical system data from sources such as DCIM and APM tools, HTTP events, API outputs, device data, SNMP traps, and even RMF, SMF, or CICS data.

· Improving ‘signal to noise’ by correlating, analyzing, and filtering these integrated datasets to suppress alert storms or isolate the most notable events.

· Leveraging machine learning and predictive analytics to identify and even correct otherwise hidden anomalies to get ahead of potential problems.

· Triggering automated workflows to find, fix, and prevent both known and novel incidents by executing known solutions, even without human intervention.

· Correlating technology and business insights to enable Product Managers and DevOps teams to iterate on new ideas in real-time to achieve business goals.
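The 'signal to noise' step can be illustrated with a deliberately simple sketch: forward the first alert for a given source and check, then suppress repeats of the same alert for a fixed window. Real AIOps products use far richer correlation than this; the function name, the alert tuple shape, and the 300-second window are assumptions made for the example.

```python
def suppress_storm(alerts, window_seconds=300):
    """Forward an alert only if the same (source, check) pair has not been
    forwarded within the last `window_seconds`.

    alerts: iterable of (timestamp_seconds, source, check), sorted by time.
    """
    last_forwarded = {}  # (source, check) -> timestamp of last forwarded alert
    forwarded = []
    for timestamp, source, check in alerts:
        key = (source, check)
        previous = last_forwarded.get(key)
        if previous is None or timestamp - previous >= window_seconds:
            forwarded.append((timestamp, source, check))
            last_forwarded[key] = timestamp
    return forwarded


# Illustrative storm: one database host repeatedly alerting on CPU.
storm = [
    (0, "db1", "cpu"),
    (10, "db1", "cpu"),
    (20, "web1", "http"),
    (200, "db1", "cpu"),
    (310, "db1", "cpu"),
]
quiet = suppress_storm(storm)
```

Here five raw alerts become three forwarded ones: the repeats at 10 and 200 seconds are suppressed, while the alert at 310 seconds is forwarded again because the window has elapsed.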

Observability Nirvana

Observability as practiced at (and often preached by) cloud-based startups delivering web-based services is an exciting new world of IT management – but to many traditional IT Ops teams, it does not seem achievable. However, any business can and should adopt observability techniques, including large enterprise IT. Especially as a supplement to traditional monitoring, observability changes the game in software service delivery, and moves IT closer to the nirvana of true business-technology alignment.

About the author:

Andi Mann is the Chief Technology Advocate for Splunk.
