Mining Data to Up Airline Safety

Air travel safety has come a long way. And technology is partly to thank for it.

According to industry statistics, airplanes are the safest way to get from Point A to Point B. Driving a car is positively hazardous in comparison. In 1950, 17 out of every one million commercial airline passengers worldwide died. But since the mid-1970s, that figure has fluctuated around one in a million, and the U.S. average is 0.3 per million.

And technology plays a role in that safety record.

"Our aviation system is so robust with backups that a single problem almost never winds up hurting someone," says Christopher Hart, the Federal Aviation Administration's (FAA) assistant administrator for System Safety.

To keep it that way, FAA investigators pore over wreckage from every accident to determine what went wrong and what steps to take to ensure it never happens again. But crashes now are so rare, and the circumstances so unique, that they offer little guidance.

"There is so little predictability in how the next accident will occur that it is hard to predict where an intervention should be to prevent accidents," adds Hart.

To improve safety further, therefore, it is necessary to look at potential, rather than actual, problems.

One way is to use simulations. Airplane manufacturers extensively use computer modeling to see how their equipment performs under various weather conditions and load factors. Another method is to look at small, but common, mechanical anomalies which could lead to problems down the road.

"We are trying to get smarter and look at events that happen relatively frequently, but are innocuous by themselves because of the robustness of the systems," says Hart. "But if they are part of the links in an accident chain, we can stop those links before they cause an accident."

Achieving this requires overcoming two barriers.

The first is that the potentially useful data is held by thousands of different entities, including private and national airlines, manufacturers, maintenance companies, air traffic controllers, trade associations, labor unions and air forces. It would be useful, for example, to aggregate the Boeing 747 maintenance records of all airlines that fly the plane, so that common problems could be identified and corrected, rather than each airline being limited to the information on the few 747s in its own fleet. But these groups don't necessarily want to share detailed information about their operations, whether to protect a competitive advantage or for fear of lawsuits.
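To make the 747 example concrete, here is a minimal Python sketch of why pooling matters. The airlines, defect codes and review threshold are all invented for illustration, and real maintenance analysis is far more involved; the point is only that a defect too rare to stand out in any single fleet can cross a review threshold once the records are combined.

```python
from collections import Counter

# Invented maintenance findings for one aircraft type, keyed by airline.
# Each string is a hypothetical defect code, not a real reporting code.
fleet_findings = {
    "Airline A": ["hyd_leak", "bleed_valve", "hyd_leak"],
    "Airline B": ["bleed_valve", "hyd_leak"],
    "Airline C": ["hyd_leak", "flap_sensor"],
}

# Assumed rule: a defect seen at least this many times is flagged for review.
THRESHOLD = 4


def flag_common_defects(findings_by_airline, threshold):
    """Pool findings from every airline given and return the defect codes
    whose combined count meets the threshold, sorted alphabetically."""
    pooled = Counter()
    for findings in findings_by_airline.values():
        pooled.update(findings)
    return sorted(code for code, count in pooled.items() if count >= threshold)


if __name__ == "__main__":
    # Each airline on its own: nothing reaches the threshold.
    for airline, findings in fleet_findings.items():
        print(airline, flag_common_defects({airline: findings}, THRESHOLD))
    # All three fleets pooled: the hydraulic leak is flagged.
    print("Pooled", flag_common_defects(fleet_findings, THRESHOLD))
```

No single airline sees the hypothetical hydraulic leak more than twice, but the pooled count reaches four.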

The other barrier is that most of the data is unstructured, making it difficult to compare and analyze.

To address these shortcomings, the FAA facilitated the creation of the Global Aviation Information Network (GAIN) in 1996. GAIN is a voluntary international membership organization composed of public and private entities from more than 50 countries. It is structured around the philosophy that "the collection, analysis, and sharing of safety information using advanced technologies in a just culture environment will illuminate safety concerns and permit identification and implementation of cost-effective mitigations."

While GAIN still has a long way to go in getting its members to share information openly, progress is being made in developing analytical tools designed specifically for safety information.

"The airline industry generates two types of data: digital data from flight data recorders and textual data generated from reports written by pilots and others," explains Hart. "Several entities are looking at the digital data, so our main focus has been on the free text data where not as much work has been done."

To fill this gap, GAIN's Analytic Methods and Tools Working Group has sponsored the creation of several tools to analyze the text information. To date, each of these has been used by a single airline.

One of these efforts was a proof-of-concept project by Southwest Airlines using the PolyAnalyst tool from Megaputer, Inc. of Bloomington, Ind. Hart says PolyAnalyst was selected in part because of its ability to analyze small data sets.

"One of the issues with text mining software is the volume of information needed to provide a valid analysis," he says. "Some of the software requires quite a large number of data inputs to develop relationships between words and terminology in the data set, but the Megaputer tool is more applicable to smaller data sets than other tools."

The six-week test involved reports from Southwest's pilots detailing any abnormal occurrences during different phases of flight. These reports are filed in an Oracle database containing 63 structured fields, plus an unstructured field that accepts up to 4,000 words of free text so pilots can give a narrative description of the incident.

The existing system for analyzing the material in the database was a time-consuming manual process which relied on the analyst's memory and was prone to human error.

PolyAnalyst analyzed both the text and structured data of 2,000 database records, and generated graphic depictions of the types of anomalies for each type of aircraft in use. The user could click on any of the data points in the graph to drill down to the individual pilot reports to get all the details.
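PolyAnalyst's internal methods aren't described here, so the following Python sketch swaps in a much simpler technique, fixed keyword matching, to illustrate the general idea: tag each free-text narrative with anomaly categories, tally them by the structured aircraft-type field, and keep the report IDs so a user can drill down from a count to the underlying reports. The categories, keywords and sample reports are hypothetical.

```python
from collections import defaultdict

# Hypothetical anomaly categories and the keywords that trigger them.
# A real text-mining tool would learn such associations from the data;
# this sketch only checks whether a keyword appears in the narrative.
CATEGORIES = {
    "altitude deviation": ("altitude", "overshot", "level bust"),
    "bird strike": ("bird",),
    "system fault": ("warning", "fault", "inoperative"),
}

# Invented sample records: two structured fields plus a free-text narrative.
reports = [
    {"id": 101, "aircraft": "737-300",
     "narrative": "Bird struck windshield on approach."},
    {"id": 102, "aircraft": "737-700",
     "narrative": "Cabin pressure warning during climb."},
    {"id": 103, "aircraft": "737-300",
     "narrative": "Overshot assigned altitude by 300 ft."},
]


def categorize(narrative):
    """Return every category whose keywords occur in the narrative text."""
    text = narrative.lower()
    return [category for category, words in CATEGORIES.items()
            if any(word in text for word in words)]


def anomalies_by_aircraft(records):
    """Map (aircraft type, category) to the IDs of the matching reports.

    len() of each ID list is the tally a chart would plot; the IDs
    themselves support drilling down to the full pilot reports.
    """
    index = defaultdict(list)
    for record in records:
        for category in categorize(record["narrative"]):
            index[(record["aircraft"], category)].append(record["id"])
    return dict(index)


if __name__ == "__main__":
    for (aircraft, category), ids in sorted(anomalies_by_aircraft(reports).items()):
        print(f"{aircraft}  {category}: {len(ids)} report(s) {ids}")
```

Keyword matching is brittle (it misses synonyms and misspellings), which is exactly the kind of limitation dedicated text-mining tools aim to overcome.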

While it is helpful for an individual airline to have such safety information, the real benefit will come when airlines start sharing this information.

"Individual airlines have a certain quantity of data that they keep to themselves, so the Megaputer tool is only being applied to data from one airline," says Hart. "If we are successful in pooling data from other airlines it will be more successful."

This is where GAIN's other data-sharing project comes into play.

Hart stresses that the FAA is not trying to get the airlines' data. Instead, GAIN is working on a method in which each member's raw data remains on its own servers, while the members pool only non-identifiable data.
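The article doesn't describe how GAIN's pooling method works. One simple arrangement consistent with the description is sketched below in Python, under that assumption: each member runs a summarizing function on its own server that emits only counts of aircraft type and event, never tail numbers or other identifying fields, and a shared hub adds the counts together. All field names and records are hypothetical. Counting alone is not full de-identification, since a rare event may still point to one operator, so a real scheme would need further safeguards.

```python
from collections import Counter

# Invented raw reports held by two hypothetical members. Tail numbers and
# crew labels stand in for identifying details that never leave a member.
member_a_reports = [
    {"tail_number": "N100XA", "pilot": "Crew 1", "aircraft": "737", "event": "TCAS alert"},
    {"tail_number": "N200XA", "pilot": "Crew 2", "aircraft": "737", "event": "unstable approach"},
]
member_b_reports = [
    {"tail_number": "N300XB", "pilot": "Crew 3", "aircraft": "737", "event": "TCAS alert"},
]


def summarize_locally(raw_reports):
    """Run on a member's own server. Only (aircraft type, event) counts are
    returned; identifying fields are never copied into the summary."""
    return Counter((report["aircraft"], report["event"]) for report in raw_reports)


def pool(summaries):
    """Run at a shared hub: add up the members' de-identified counts."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


if __name__ == "__main__":
    summaries = [summarize_locally(member_a_reports),
                 summarize_locally(member_b_reports)]
    for (aircraft, event), count in sorted(pool(summaries).items()):
        print(aircraft, event, count)
```

The hub sees that two members together logged two TCAS alerts on the 737, without ever receiving a tail number or crew identity.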

"We are not only looking at what happens at the airline level, but at the system level: airlines, plus air traffic control systems, plus the maintenance network," he says. "The foundation for that will be what the individual entities are finding from programs like Southwest's."
