For many organizations, data integration has become a major concern, so creating a data integration strategy is a key part of their digital transformation.
As they pursue digital transformation, nearly all enterprises are seeking to become more data-driven. In the NewVantage Partners Big Data Executive Survey, 98.6 percent of enterprises surveyed were working to create a data-driven culture, while only 1.4 percent said that being data-driven was not a priority.
Becoming more data-driven often means combining information from a wide array of sources, applications and formats so that it can be analyzed to derive valuable business insights. The process of combining that data is known as data integration.
For many businesses, data integration begins with a simple data exchange need. For example, an organization may want to pull information from its ecommerce platform into its ERP solution, or it may want to make social media data available through its marketing software.
Over time, these organizations generally find that their data integration needs grow exponentially. They become very interested in breaking down the "data silos" that have kept different types of data separate from each other so that they can combine and analyze data in new and interesting ways. However, at the same time, they need to make sure that they are keeping their data secure and meeting their compliance needs.
To meet these sometimes contradictory goals, a growing number of enterprises are crafting data integration strategies. In fact, that same NewVantage report found that 47.8 percent of enterprises already have a data strategy in place, and 42.2 percent said they were working on creating a strategy, although it wasn't finished yet. Overall, 98.6 percent of the executives surveyed said that having a data strategy was a priority.
So what should a data integration strategy include? What key questions should it answer and which issues should it address?
Key Elements of a Data Integration Strategy
Different experts recommend different approaches to creating a data integration strategy. One of the easiest and most intuitive may be to answer the "Five Ws and an H" questions: Who, What, When, Where, Why and How.
- Who in your organization will be involved in data integration? Will you have IT specialists with knowledge of programming who tackle all data integration tasks? Or do you want to enable knowledge workers from the business side to use data integration tools on their own? Your answer will have a major impact on the type of data integration solutions you choose to purchase. Many large enterprises have a chief data officer (CDO) and data governance team that establishes policy and procedures, but they deploy tools that enable self-service for analysts and other business users involved in day-to-day creation of reports and analytics.
- What data do you have that might need integration? Many companies have already created an inventory of their data for security or compliance purposes. Revisiting this list can help you see how many different applications and data formats exist in your environment. If you have only a few different data silos, the most cost-effective strategy for you may be to choose a basic data exchange or ETL tool that can handle your specific needs. But if you have a lot of different silos to integrate — or you anticipate that you may want to integrate a lot of different types of data in the future — it might be better to go with a more full-featured data integration platform that can handle many different types of data.
- When will you do your data integration? If you are creating a data warehouse, the data integration will likely happen in advance of any analytics. If you are creating a data lake based on Hadoop or similar technologies that store data in its raw, unaltered form, some data integration will take place right before running analytics workloads. Many companies have both a data warehouse and a data lake. The architecture you choose will affect your data integration procedures and the type of technology you need.
- Where will your data integration take place: in the cloud or on premises? Also, is your data in the cloud, on premises or both? Will a cloud-based solution meet your needs or do you need an on-premises solution? Or will you take a hybrid approach, with some integration occurring in the cloud and some on premises?
- Why are you integrating your data? Many experts would recommend actually starting with this question. You need to understand the business reasons for your data integration in order to create a suitable strategy. Understanding the business case can also help you determine whether a particular solution will result in positive return on investment (ROI).
- How will you do your data integration? This last question is the most complicated of all because it concerns the policies, procedures and tools you will use for data integration. This section of your data integration strategy should also cover three other important considerations: data governance, data quality and data security.
Data governance is defined as the exercise of control and collaborative decision-making over a set of data assets, including monitoring and enforcement. In other words, data governance is all about making sure that the right people have the right access to the right data at the right time.
Good data governance requires organizations to have visibility into where information originated, who created it, who viewed it, who edited it and any transformations it has undergone as part of the data integration process. This concept, also known as data lineage, allows analysts to trace data back to its origins, giving them confidence that the data they are using for their analytics is trustworthy.
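As a rough illustration, lineage can be pictured as a log of events that travels with each record. The sketch below is a minimal example under that assumption, not a description of how any particular governance product works. The names `LineageEvent`, `TrackedRecord` and `apply_transformation` are hypothetical, as are the actor names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class LineageEvent:
    """One entry in a record's history: who did what, and when."""
    actor: str
    action: str  # for example "created", "viewed" or "transformed"
    detail: str
    timestamp: str


@dataclass
class TrackedRecord:
    """A data record that carries its own lineage log (hypothetical design)."""
    data: Dict[str, Any]
    source: str
    lineage: List[LineageEvent] = field(default_factory=list)

    def log(self, actor: str, action: str, detail: str = "") -> None:
        now = datetime.now(timezone.utc).isoformat()
        self.lineage.append(LineageEvent(actor, action, detail, now))


def apply_transformation(
    record: TrackedRecord,
    actor: str,
    name: str,
    func: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> TrackedRecord:
    """Apply func to the record's data and note the step in the lineage log."""
    record.data = func(record.data)
    record.log(actor, "transformed", name)
    return record


record = TrackedRecord(data={"customer": " alice "}, source="ecommerce_platform")
record.log("order_service", "created", "imported from " + record.source)
apply_transformation(
    record,
    "etl_job",
    "trim_customer_name",
    lambda d: {**d, "customer": d["customer"].strip()},
)

# Trace the record back to its origin, oldest event first.
for event in record.lineage:
    print(event.timestamp, event.actor, event.action, event.detail)
```

An analyst reading the log can see that the record originated in the ecommerce platform and that one named transformation was applied before it reached the analytics layer.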
Some vendors sell standalone data governance or master data management (MDM) solutions, but these capabilities are also sometimes incorporated into data integration platforms.
Closely related to data governance is the idea of data quality. In fact, the two terms are sometimes conflated or even used interchangeably; however, most data management experts would say that they refer to slightly different concepts.
The concept of data quality can be understood as the degree to which data is accurate, consistent, timely, relevant for the current use and aligned with business rules. Good data governance, including knowing where your data came from, is essential for good data quality, but governance alone doesn't ensure data quality. For example, data governance might tell you who input the ZIP code for a particular customer, but it doesn't ensure that the ZIP code is accurate or even that it has the correct number of digits.
Data cleansing or data quality management tools analyze data to find missing records, data that has the wrong format, duplicate records, etc. And as with data governance solutions, data quality tools can be standalone solutions or they can be part of a larger data integration or data management platform.
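The kinds of checks described above can be sketched in a few lines. This is a simplified illustration, not how any commercial data quality tool is implemented. It assumes US ZIP codes in five-digit or ZIP+4 form, and the field names `customer_id` and `zip_code` and the function `find_quality_issues` are hypothetical.

```python
import re
from typing import Dict, List

# Assumption: US ZIP codes, either 5 digits or ZIP+4 (12345-6789).
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def find_quality_issues(records: List[Dict[str, str]]) -> List[str]:
    """Return descriptions of missing, malformed and duplicate values."""
    issues = []
    seen_ids = set()
    for index, record in enumerate(records):
        customer_id = record.get("customer_id", "")
        zip_code = record.get("zip_code", "")

        if not customer_id:
            issues.append(f"row {index}: missing customer_id")
        elif customer_id in seen_ids:
            issues.append(f"row {index}: duplicate customer_id {customer_id}")
        else:
            seen_ids.add(customer_id)

        if not zip_code:
            issues.append(f"row {index}: missing zip_code")
        elif not ZIP_PATTERN.fullmatch(zip_code):
            issues.append(f"row {index}: malformed zip_code {zip_code!r}")
    return issues


sample = [
    {"customer_id": "C1", "zip_code": "30301"},
    {"customer_id": "C2", "zip_code": "3030"},   # too few digits
    {"customer_id": "C1", "zip_code": "30301"},  # duplicate of row 0
    {"customer_id": "C3", "zip_code": ""},       # missing value
]

issues = find_quality_issues(sample)
for issue in issues:
    print(issue)
```

Note that a format check like this only confirms that a ZIP code has a plausible shape. Confirming that it is the right ZIP code for the customer would require comparing it against a trusted reference source.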
Data security is also closely associated with data governance, but it is primarily concerned with preventing unauthorized access to data assets. When integrating data from multiple sources, organizations need to make sure that they continue to provide an appropriate level of protection for all the data they are bringing together. Personally identifiable customer information is particularly sensitive and may be covered by government regulations.
Teams working on data integration should work closely with IT security to make sure that they have adequate data protection measures in place. That may include encrypting data and using appropriate authentication tools, in addition to the many other defenses in place on the network. Some data integration platforms have capabilities like role-based authentication and data encryption built in.
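One way to picture role-based protection of sensitive fields is sketched below. It is an illustration only, and it deliberately swaps in keyed hashing (pseudonymization) for encryption so that it runs with the Python standard library alone; keyed hashing is not reversible and is not a substitute for encryption. The roles, field names, key and the functions `pseudonymize` and `view_for_role` are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Set

# Hypothetical roles and the fields each role may read in clear text.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"order_total", "region"},
    "support_agent": {"order_total", "region", "email"},
}

# Assumption: a real deployment would load this key from a secrets manager.
PSEUDONYM_KEY = b"example-key-do-not-use-in-production"


def pseudonymize(value: str) -> str:
    """Replace a sensitive value with a keyed hash (one-way, not encryption)."""
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def view_for_role(record: Dict[str, str], role: str) -> Dict[str, str]:
    """Return a copy of the record, masking fields the role may not read."""
    allowed = ROLE_PERMISSIONS.get(role, set())  # unknown roles see nothing
    return {
        name: value if name in allowed else pseudonymize(value)
        for name, value in record.items()
    }


customer = {"email": "pat@example.com", "region": "EMEA", "order_total": "42.50"}

analyst_view = view_for_role(customer, "analyst")
print(analyst_view)
```

Because the same input always produces the same pseudonym, analysts can still count or join on the masked email field without ever seeing the address itself.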
Understanding Data Integration Tools
After going through the process of answering the "Five Ws and an H" questions and examining your data governance, data quality and data security needs, you should have a pretty good idea of the capabilities you need in a data integration tool. Many different types of data integration tools are available, including master data management, data governance, data cleansing, data catalogs, data modeling and other tools that include some data integration capabilities.
The following are some of the most common data integration solutions that businesses should understand:
- ETL tools extract data from one application, transform it into a new format and load it into a new application. (Note that ELT tools, which extract, load and then transform data, are also available.)
- APIs, short for Application Programming Interfaces, provide a programmatic way for one application to share data with another. These application-specific tools are highly technical and designed for use by developers.
- Data integration platforms incorporate a wide range of different capabilities, such as ETL, ELT, data governance, data quality and data security. These tools can integrate data from a wide variety of different sources and are suitable for use by business users.
- Integration Platform as a Service (iPaaS) offerings are cloud-based tools for data integration. They generally offer very good ease of use and can integrate data from cloud-based sources, including software as a service (SaaS).
- Data migration services move data from one place to another and may offer some limited capabilities for data transformations as well. Most of the major cloud vendors offer migration services for moving data to the cloud.
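To make the extract, transform and load steps from the ETL entry above concrete, here is a minimal sketch that uses only the Python standard library. It is not how any commercial ETL tool works internally. The CSV content stands in for an export from an ecommerce platform, an in-memory SQLite database stands in for the target application, and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical export from an ecommerce platform.
SOURCE_CSV = """order_id,customer,total
1001, Alice ,19.99
1002,Bob,5.00
"""


def extract(text: str) -> List[Dict[str, str]]:
    """Extract: read rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[Dict[str, str]]) -> List[Tuple[int, str, float]]:
    """Transform: trim whitespace and convert types for the target schema."""
    return [
        (int(row["order_id"]), row["customer"].strip(), float(row["total"]))
        for row in rows
    ]


def load(connection: sqlite3.Connection, rows: List[Tuple[int, str, float]]) -> None:
    """Load: write the transformed rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(connection, transform(extract(SOURCE_CSV)))

loaded = connection.execute(
    "SELECT order_id, customer, total FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

An ELT variant would reverse the last two steps: load the raw rows into the target first, then run the transformation inside the target system, for example as a SQL statement.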