SAP took the wraps off a new data integration, management and governance offering called SAP Data Hub during a New York City event on Sept. 25. The product is aimed at helping enterprises draw more value, faster, out of their large and complex data storage environments.
SAP Data Hub enables organizations to get a running start on their big data projects and applications by dispensing with one of the most cumbersome aspects of readying enterprise data for value-extracting workloads: moving the data into a purpose-built repository for further processing.
“You’re moving from a world of centralizing the data in one location to a world of centralizing the data management,” Greg McStravick, president of Database and Data Management at SAP, told attendees of the company’s Big Data event. “It’s the movement, orchestration, monitoring and governance of the data while you’re leaving the data where it resides.”
Available as an on-premises application to start (platform as a service and software as a service implementations are in the works, according to the company), the software provides straightforward visibility into complex data landscapes, or the assortments of cloud storage, data lakes, data warehouses and other often-siloed data sources that can hinder an enterprise’s big data efforts.
According to SAP’s own Data 2020: State of Big Data Study (PDF), the vast majority of enterprises (85 percent) are struggling to manage data from a variety of locations. Seventy-two percent reported that the sheer number and variety of data sources have added complexity to their data landscapes.
“Our study findings show that like natural energy resources, data resources are just beneath the surface, in places that are either inaccessible or invisible,” McStravick said in prepared remarks related to the report. “If data is the new gold, then we aim to make data scientists the new gold miners.”
After getting a handle on one’s data landscape, SAP Data Hub enables users to create and manage data processing pipelines that can access, transform and harmonize business information from multiple sources. Supported libraries, such as the TensorFlow machine learning technology, can help customers accelerate artificial intelligence (AI) projects that tap into vast stores of data across various locations. The data pipeline models created by the tool can be easily copied, tweaked and reused, further accelerating innovation, SAP asserts.
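SAP has not published a code-level interface for Data Hub’s pipeline modeler, so the following is only a minimal, hypothetical Python sketch of the general pattern the company describes: a reusable pipeline definition that accesses records from several sources, transforms them and harmonizes them into one shape. Every name here (Pipeline, step, the sample sources) is an invented illustration, not an SAP API.

```python
# Hypothetical sketch of a reusable "access -> transform -> harmonize"
# pipeline. Nothing here is an SAP Data Hub API; all names are invented.
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Pipeline:
    """An ordered list of steps that each map records to records."""
    name: str
    steps: List[Callable[[Iterable[Dict]], Iterable[Dict]]] = field(default_factory=list)

    def step(self, fn):
        # Usable as a decorator so a pipeline model reads top to bottom.
        self.steps.append(fn)
        return fn

    def run(self, records: Iterable[Dict]) -> List[Dict]:
        for fn in self.steps:
            records = fn(records)
        return list(records)

    def copy(self, new_name: str) -> "Pipeline":
        # "Copied, tweaked and reused": clone the model, then adjust steps.
        return Pipeline(new_name, list(self.steps))


# Two siloed sources with slightly different schemas (illustrative data).
warehouse_rows = [{"CUST_ID": 1, "REVENUE_EUR": 1200.0}]
data_lake_rows = [{"customer": 2, "revenue_usd": 900.0}]

sales = Pipeline("harmonize-sales")


@sales.step
def harmonize(records):
    # Map both source schemas onto one canonical record shape;
    # the 0.85 USD-to-EUR rate is a placeholder for illustration.
    for r in records:
        if "CUST_ID" in r:
            yield {"customer_id": r["CUST_ID"], "revenue": r["REVENUE_EUR"]}
        else:
            yield {"customer_id": r["customer"], "revenue": r["revenue_usd"] * 0.85}


print(sales.run(warehouse_rows + data_lake_rows))
```

In this sketch, reuse is simply a matter of calling `copy()` and swapping a step, which mirrors the copy-tweak-reuse workflow the product pitch describes.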
Finally, SAP Data Hub distributes the compute activities of data pipelines to the environments in which the data natively resides. It’s a tactic that allows data pipelines to be completed as quickly as possible and enables businesses to use the existing data processing capabilities found in SAP HANA, Apache Hadoop, SAP Vora or Apache Spark, said the company.
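To make the compute-to-the-data tactic concrete, here is a short, hypothetical Python sketch of the idea. An in-memory SQLite database stands in for an engine such as SAP HANA or Spark SQL; the point is only to contrast pushing a filter down to the engine that owns the data versus extracting every row and filtering centrally. None of this reflects SAP Data Hub internals.

```python
# Hypothetical sketch of "push compute to the data": a pipeline step is
# translated into work the native engine runs over its own storage.
# sqlite3 is a stand-in for an engine like SAP HANA or Spark SQL.
import sqlite3

# Simulate a remote store that owns its data.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE sales (customer_id INTEGER, revenue REAL)")
store.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 1200.0), (2, 765.0)])


def filter_with_pushdown(predicate_sql: str):
    # Ship only the predicate; the engine scans in place and returns
    # just the qualifying rows.
    return store.execute(
        "SELECT customer_id, revenue FROM sales WHERE " + predicate_sql
    ).fetchall()


def filter_after_extract(predicate):
    # The slower alternative the pushdown tactic avoids: copy every row
    # out of the store first, then filter centrally.
    rows = store.execute("SELECT customer_id, revenue FROM sales").fetchall()
    return [row for row in rows if predicate(row)]


print(filter_with_pushdown("revenue > 1000.0"))       # -> [(1, 1200.0)]
print(filter_after_extract(lambda r: r[1] > 1000.0))  # -> [(1, 1200.0)]
```

Both calls return the same rows; the difference is where the work happens and how much data crosses the wire, which is the performance argument SAP is making.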
Pedro Hernandez is a contributing editor at Datamation. Follow him on Twitter @ecoINSITE.