Friday, May 24, 2024

Mining Data to Up Airline Safety

Air travel safety has come a long way. And technology is partly to thank for it.

According to industry statistics, airplanes are the safest way to get from Point A to Point B. Driving a car is positively hazardous in comparison. In 1950, 17 out of every one million commercial airline passengers worldwide died. But since the mid-1970s, that figure has fluctuated around one in a million, and the U.S. average is 0.3 per million.

And technology plays a role in that safety record.

“Our aviation system is so robust with backups that a single problem almost never winds up hurting someone,” says Christopher Hart, the Federal Aviation Administration’s (FAA) assistant administrator for System Safety.

To keep it that way, FAA investigators pore over wreckage from every accident to determine what went wrong and what steps to take to ensure it never happens again. But crashes now are so rare, and the circumstances so unique, that they offer little guidance.

“There is so little predictability in how the next accident will occur that it is hard to predict where an intervention should be to prevent accidents,” adds Hart.

To further improve safety, therefore, it is necessary to look at potential, rather than actual, problems.

One way is to use simulations. Airplane manufacturers make extensive use of computer modeling to see how their equipment performs under various weather conditions and load factors. Another method is to look at small but common mechanical anomalies that could lead to problems down the road.
“We are trying to get smarter and look at events that happen relatively frequently, but are innocuous by themselves because of the robustness of the systems,” says Hart. “But if they are part of the links in an accident chain, we can stop those links before they cause an accident.”

Achieving this requires overcoming two barriers.

The first is that the potentially useful data is held by thousands of different entities, including private and national airlines, manufacturers, maintenance companies, air traffic controllers, trade associations, labor unions and air forces. It would be useful, for example, to be able to aggregate the Boeing 747 maintenance records of all airlines that fly the plane so that common problems could be identified and corrected, rather than each airline being limited to the information on the few 747s in its own fleet. But these groups don’t necessarily want to share detailed information about their operations, whether to protect a competitive advantage or for fear of lawsuits.
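
The benefit of pooling is easy to see in a toy example. The Python sketch below assumes invented airline names, components and a hypothetical alert threshold of three write-ups; it is not drawn from any real maintenance system. A problem that shows up only once in each fleet stays below the threshold for every individual airline, but crosses it once the records are combined.

```python
from collections import Counter

# Hypothetical maintenance write-ups as (airline, component) pairs.
# Airline names and components are invented for illustration only.
RECORDS = [
    ("Airline A", "fuel pump"),
    ("Airline A", "hydraulic line"),
    ("Airline B", "fuel pump"),
    ("Airline B", "cabin pressure valve"),
    ("Airline C", "fuel pump"),
    ("Airline C", "hydraulic line"),
]

# Assumed alert threshold: flag a component once it has this many write-ups.
THRESHOLD = 3


def recurring_issues(records, threshold=THRESHOLD):
    """Return, sorted by name, the components written up at least `threshold` times."""
    counts = Counter(component for _airline, component in records)
    return sorted(c for c, n in counts.items() if n >= threshold)


def single_airline_view(records, airline, threshold=THRESHOLD):
    """Apply the same test to one airline's records only."""
    own = [r for r in records if r[0] == airline]
    return recurring_issues(own, threshold)


if __name__ == "__main__":
    for airline in ("Airline A", "Airline B", "Airline C"):
        print(airline, single_airline_view(RECORDS, airline))  # [] for each airline
    print("Pooled", recurring_issues(RECORDS))  # ['fuel pump']
```

No single airline sees the fuel-pump pattern; only the pooled view does.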

The other barrier is that most of the data is unstructured, making it difficult to compare and analyze.

To address these shortcomings, the FAA facilitated the creation of the Global Aviation Information Network (GAIN) in 1996. GAIN is a voluntary international membership organization composed of public and private entities from more than 50 countries. It is structured around the philosophy that “the collection, analysis, and sharing of safety information using advanced technologies in a just culture environment will illuminate safety concerns and permit identification and implementation of cost-effective mitigations.”

While GAIN still has a long way to go in getting its members to share information openly, progress is being made in developing analytical tools designed specifically for safety information.

“The airline industry generates two types of data: digital data from flight data recorders and textual data generated from reports written by pilots and others,” explains Hart. “Several entities are looking at the digital data, so our main focus has been on the free-text data, where not as much work has been done.”

To fill this gap, GAIN’s Analytic Methods and Tools Working Group has sponsored the creation of several tools to analyze the text information. To date, each of these has been used by a single airline.

One of these was a proof of concept conducted by Southwest Airlines using the PolyAnalyst tool from Megaputer Inc. of Bloomington, Indiana. Hart says PolyAnalyst was selected, in part, because of its ability to analyze small data sets.

“One of the issues with text mining software is the volume of information needed to provide a valid analysis,” he says. “Some of the software requires quite a large number of data inputs to develop relationships between words and terminology in the data set, but the Megaputer tool is more applicable to smaller data sets than other tools.”
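
To see why some text-mining tools want large inputs, consider one common way of relating terms: counting how many documents contain both words of a pair. The Python sketch below uses three invented report snippets and a small assumed stop-word list. It illustrates that general technique only and is not a description of how PolyAnalyst or any other product works internally.

```python
import re
from collections import Counter
from itertools import combinations

# Invented report snippets; real narratives can run to thousands of words.
REPORTS = [
    "Hydraulic pressure low on climb, crew ran checklist",
    "Low hydraulic pressure warning during approach",
    "Bird strike on climb, no damage found",
]

# A deliberately tiny, assumed stop-word list.
STOP_WORDS = {"on", "no", "during", "ran"}


def terms(text):
    """Return the set of lower-case words in `text`, minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def cooccurrence(reports):
    """Count, for each unordered pair of terms, how many reports contain both."""
    counts = Counter()
    for report in reports:
        # Sorting makes each pair's key independent of word order.
        for pair in combinations(sorted(terms(report)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    counts = cooccurrence(REPORTS)
    print(counts[("hydraulic", "pressure")])  # 2
    print(counts[("climb", "pressure")])      # 1
```

Even in this tiny sample, 29 of the 32 term pairs occur in only one report, so any association between them rests on a single observation. More reports give the counts enough support to separate real relationships from coincidence.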


The six-week test involved reports from Southwest’s pilots detailing any abnormal occurrences during different flight phases. These reports are filed in an Oracle database containing 63 structured fields. The database also has an unstructured field that accepts up to 4,000 words of free text, in which pilots give a narrative description of the incident.
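
A record that mixes structured fields with a free-text narrative can be modeled roughly as follows. This Python sketch is hypothetical: the three field names stand in for a handful of the 63 structured fields, which the article does not list, and the 4,000-word limit is enforced with a simple whitespace word count.

```python
from dataclasses import dataclass

# Limit taken from the article's description of the free-text field.
MAX_NARRATIVE_WORDS = 4000


@dataclass
class PilotReport:
    """Hypothetical pilot report: a few structured fields plus a narrative.

    The real database has 63 structured fields; these three are
    illustrative stand-ins, not Southwest's actual schema.
    """
    report_id: int
    aircraft_type: str
    flight_phase: str
    narrative: str = ""

    def __post_init__(self):
        # Count words by splitting on whitespace, then enforce the limit.
        if len(self.narrative.split()) > MAX_NARRATIVE_WORDS:
            raise ValueError(f"narrative exceeds {MAX_NARRATIVE_WORDS} words")


if __name__ == "__main__":
    report = PilotReport(1, "737-700", "climb", "Hydraulic pressure low on climb")
    print(report.flight_phase)  # climb
```

Keeping the narrative alongside the structured fields is what lets an analysis tool relate the free text to, say, aircraft type or flight phase.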

The existing system for analyzing the material in the database was a time-consuming manual process that relied on the analyst’s memory and was prone to human error.

PolyAnalyst analyzed both the text and structured data of 2,000 database records and generated graphic depictions of the types of anomalies for each type of aircraft in use. The user could click on any of the data points in the graph to drill down to the individual pilot reports and get all the details.
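
The chart-and-drill-down workflow described above can be approximated with a small index keyed by aircraft type and anomaly category. In this Python sketch the report IDs, aircraft types, narratives and keyword rules are all invented, and plain keyword matching stands in for whatever text analysis PolyAnalyst actually performs.

```python
from collections import defaultdict

# Hypothetical keyword rules mapping narrative text to anomaly categories.
# Simple substring matching is a stand-in, not PolyAnalyst's method.
CATEGORY_KEYWORDS = {
    "hydraulic": ("hydraulic",),
    "bird strike": ("bird",),
    "pressurization": ("cabin pressure", "pressurization"),
}

# Invented sample records: (report_id, aircraft_type, narrative).
REPORTS = [
    (101, "737-300", "Hydraulic pressure low on climb"),
    (102, "737-700", "Bird strike on departure"),
    (103, "737-300", "Hydraulic quantity dropped in cruise"),
    (104, "737-500", "Cabin pressure fluctuated at altitude"),
]


def categorize(narrative):
    """Return every category whose keywords appear in the narrative."""
    text = narrative.lower()
    return [cat for cat, words in CATEGORY_KEYWORDS.items()
            if any(w in text for w in words)]


def build_index(reports):
    """Map (aircraft_type, category) to the IDs of matching reports."""
    index = defaultdict(list)
    for report_id, aircraft, narrative in reports:
        for cat in categorize(narrative):
            index[(aircraft, cat)].append(report_id)
    return index


if __name__ == "__main__":
    index = build_index(REPORTS)
    for (aircraft, cat), ids in sorted(index.items()):
        print(f"{aircraft:8} {cat:15} count={len(ids)} reports={ids}")
```

Each list's length is the value a chart would plot, and the IDs it holds are what a click would open.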

While it is helpful for an individual airline to have such safety information, the real benefit will come when airlines start sharing this data.
“Individual airlines have a certain quantity of data that they keep to themselves, so the Megaputer tool is only being applied to data from one airline,” says Hart. “If we are successful in pooling data from other airlines, it will be more successful.”

This is where GAIN’s other data-sharing project comes into play.

Hart stresses that the FAA is not trying to get the airlines’ data.

Instead, GAIN is working on a method in which the raw data remains on each member’s servers, but members pool non-identifiable information.
“We are not only looking at what happens at the airline level, but at the system level: airlines, plus air traffic control systems, plus the maintenance network,” he says. “The foundation for that will be what the individual entities are finding from programs like Southwest’s.”
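
GAIN’s approach of leaving raw data on each member’s servers while pooling non-identifiable information can be illustrated with a simple pattern: each member computes de-identified totals locally and shares only those. The Python sketch below assumes invented member names, record layouts and event categories; it shows the general idea, not GAIN’s actual design, which the article does not describe.

```python
from collections import Counter

# Hypothetical records held by each member; they stay on that member's side.
# Layout: (flight_number, aircraft_type, event_category). All values invented.
MEMBER_DATA = {
    "member_1": [("FL100", "747", "engine vibration"),
                 ("FL101", "747", "engine vibration")],
    "member_2": [("FL200", "747", "engine vibration"),
                 ("FL201", "A320", "TCAS alert")],
}


def local_summary(records):
    """Run by each member: drop the flight number, keep only counts.

    Only (aircraft_type, event_category) totals leave the member.
    """
    return Counter((aircraft, event) for _flight, aircraft, event in records)


def pool(summaries):
    """Run centrally: add up the de-identified counts from all members."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts rather than replacing them
    return total


if __name__ == "__main__":
    shared = [local_summary(recs) for recs in MEMBER_DATA.values()]
    print(pool(shared)[("747", "engine vibration")])  # 3
```

Very small totals can still point back to a single operator, so a real scheme would likely need extra safeguards, such as suppressing counts below some minimum.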
