Saturday, May 18, 2024

5 Top Data Storage Software Trends 


As storage hardware becomes more commoditized—especially at the lower end—software has stolen some of the spotlight, distinguishing itself by providing additional functionality and features that set products apart. Here are five of the top storage software trends.

Keeping Pace with Evolving Data Access Requirements

For years, the industry has been focused on how to store the rapidly growing volumes of unstructured data. While that’s still true, organizations are increasingly looking in parallel at how to monetize that data. That means finding ways to cost-effectively store more and more data without bogging it down in difficult-to-access databases and repositories.

“Monetizing unstructured data means it needs to be liberated from data silos for retention, operationalized with actionable metadata, and made available to applications and users wherever they reside,” said Floyd Christofferson, Hammerspace’s VP of product marketing.
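One way to picture the "actionable metadata" Christofferson describes is a catalog that registers files from any silo with searchable, business-level tags. The following is a minimal sketch, not any vendor's actual API; the catalog structure and tag names are invented for illustration:

```python
# Minimal sketch of an "actionable metadata" catalog: files scattered across
# silos are registered once with searchable tags, so applications can find
# data without knowing which silo holds it. All names here are illustrative.

catalog = {}  # path -> metadata dict

def register(path, silo, **tags):
    """Record a file's location plus business-level tags."""
    catalog[path] = {"silo": silo, **tags}

def find(**criteria):
    """Return paths whose metadata matches every criterion."""
    return [p for p, meta in catalog.items()
            if all(meta.get(k) == v for k, v in criteria.items())]

register("/nfs/scans/0001.tif", silo="on-prem-nas", project="apollo", pii=False)
register("s3://lake/scans/0002.tif", silo="cloud", project="apollo", pii=True)

print(find(project="apollo", pii=False))  # -> ["/nfs/scans/0001.tif"]
```

The point of the sketch is that queries run against the metadata, not the silos themselves, so data can stay where it lives while still being discoverable.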

Overcoming Incompatibility Issues

Data is stored under many operating systems, and storage vendors have also developed their own storage operating systems and platforms. These often rely on different standards, protocols, and languages, leading to significant incompatibility issues.

“The emerging trend is software that can bridge the gaps between incompatible file systems and vendor silos to enable holistic access to and management of data via shared and extensible metadata across any on-premises or cloud storage type,” Christofferson said. “In this way, data becomes a more easily manageable and exploitable resource.”
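The bridging idea Christofferson describes can be sketched as a thin namespace layer over dissimilar backends: each backend only has to expose a common list/read interface, and callers see one logical tree. The backend classes below are stand-ins, not real client libraries:

```python
# Illustrative sketch of a unified namespace over incompatible storage
# backends. Each backend exposes the same minimal interface, and the
# namespace routes paths to the right one. Backend classes are stand-ins
# for real NFS/S3 clients.

class NFSBackend:
    def __init__(self, files): self._files = files
    def list(self): return list(self._files)
    def read(self, name): return self._files[name]

class S3Backend:
    def __init__(self, objects): self._objects = objects
    def list(self): return list(self._objects)
    def read(self, key): return self._objects[key]

class GlobalNamespace:
    def __init__(self): self._mounts = {}
    def mount(self, prefix, backend): self._mounts[prefix] = backend
    def list(self):
        return [f"{prefix}/{name}"
                for prefix, b in self._mounts.items()
                for name in b.list()]
    def read(self, path):
        prefix, _, name = path.partition("/")
        return self._mounts[prefix].read(name)

ns = GlobalNamespace()
ns.mount("nas", NFSBackend({"a.csv": b"1,2"}))
ns.mount("lake", S3Backend({"b.csv": b"3,4"}))
print(ns.list())              # -> ["nas/a.csv", "lake/b.csv"]
print(ns.read("lake/b.csv"))  # -> b"3,4"
```

Applications address `lake/b.csv` the same way regardless of which vendor silo actually holds the bytes, which is the "holistic access" being described.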

Incorporating More AI/ML into Analytics 

Greater compatibility and data access have opened many doors in machine learning (ML), artificial intelligence (AI), and analytics. Analytics personnel no longer find themselves wrestling with data formats, shifting data into business intelligence (BI) applications, or wasting time corralling data to explore. AI and ML make it possible to globalize analytics across multiple data repositories and silos without consolidating the data into a single repository, reducing operational expenses and increasing user and application productivity through persistent access.
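Analytics "in place" across silos can be sketched as a federated query: run the same predicate against each repository and merge only the results, rather than copying everything into one warehouse first. The repositories below are plain in-memory lists standing in for real systems:

```python
# Sketch of a federated query across multiple repositories: the predicate
# is pushed to each source and only matching rows come back, tagged with
# their origin. Repository names and rows are invented for illustration.

repos = {
    "crm": [{"region": "EU", "revenue": 120}, {"region": "US", "revenue": 80}],
    "erp": [{"region": "EU", "revenue": 200}],
}

def federated_query(repos, predicate):
    """Apply predicate in each repo; return matches tagged with their source."""
    hits = []
    for name, rows in repos.items():
        hits.extend({**row, "source": name} for row in rows if predicate(row))
    return hits

eu_rows = federated_query(repos, lambda r: r["region"] == "EU")
total = sum(r["revenue"] for r in eu_rows)
print(total)  # -> 320
```

Only the matching rows cross repository boundaries, which is what lets the analysis span silos without a consolidation step.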

Boosting Storage Performance 

The wider use of AI in the storage industry is making itself felt in storage software in a variety of ways. Users looking for the best possible performance want gains on both the hardware and the software side of the equation. AI is helping deliver them: Gartner predicts that 70 percent of enterprises will use cloud and cloud-based AI infrastructure to operationalize artificial intelligence by 2025.

“AI workloads in the cloud require cloud storage to optimize for high performance and throughput,” said Bin Fan, Alluxio’s VP of open source and founding engineer. But building optimal storage systems is challenging, as AI workloads are data- and metadata-intensive and require storage systems with high performance, high concurrency, and high data throughput. 

“While designing AI platform architecture in the cloud,” Fan said, “organizations should consider these storage requirements upfront and choose software-defined storage solutions for scalability and cost-effectiveness.”
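The concurrency requirement Fan mentions is easy to see in miniature: fetching many training shards in parallel hides per-request latency. The sketch below simulates storage latency with `time.sleep`; `simulated_fetch` is a stand-in for a real object-store GET:

```python
# Sketch of why high concurrency matters for AI data loading: with 8
# workers, 16 fetches at ~50 ms each complete in roughly 2 x 50 ms,
# versus ~800 ms if issued one at a time.

import time
from concurrent.futures import ThreadPoolExecutor

def simulated_fetch(key):
    time.sleep(0.05)          # pretend network/storage latency
    return f"data-{key}"

keys = [f"shard-{i}" for i in range(16)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    shards = list(pool.map(simulated_fetch, keys))
parallel_s = time.perf_counter() - start
print(len(shards), round(parallel_s, 2))
```

Real AI pipelines do the same thing at much larger scale, which is why storage systems that cannot sustain many concurrent reads become the bottleneck for GPU utilization.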

Moving Toward Container-Native Storage  

Container-native storage is becoming increasingly popular among enterprises, and there are three primary database or data state use cases for it:

  • Transactional databases, such as Oracle and SQL Server 
  • Large data lakes or data warehouses for analytics and post-processing 
  • Near-line data sources, such as message queues (Kafka), SQL/NoSQL (Postgres, Mongo), file and object, and searchable databases (Elastic) 

Gou Rao, CTO of Portworx by Pure Storage, recommended that the first two be kept outside of any container platform, since they really don’t have an agile and dynamic management requirement. But the third use case, near-line data sources, is more closely tied with the front-end application stack and is subject to more frequent updates, schema changes, and duplication of deployments. 

“The truth is, there are a lot of database deployments at the near-line tier with frequent enough changes that require constant control directly by the development team,” Rao said. “Since the container-based (cloud-native or stateless) applications, which are managed by Kubernetes, directly depend on this tier of databases, it makes more sense to have these resources directly controlled by a common control plane: the Kubernetes control plane. Over the next few years, expect to see Kubernetes offer a richer and more native database-as-a-service experience.”
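In the model Rao describes, a containerized near-line database claims its storage declaratively through the Kubernetes control plane. Below is a minimal sketch of such a request, a PersistentVolumeClaim built as a plain Python structure; the claim name and `storageClassName` are placeholders for whatever container-native storage provider (such as Portworx) is installed in the cluster:

```python
# Sketch of the declarative storage request a containerized near-line
# database (e.g., Postgres or Kafka) would make on Kubernetes. The
# metadata name and storageClassName are hypothetical placeholders.

import json

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "postgres-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "fast-replicated",   # placeholder class
        "resources": {"requests": {"storage": "50Gi"}},
    },
}

manifest = json.dumps(pvc, indent=2)
print(manifest)
```

Because the claim lives in the same control plane as the application deployments that depend on it, schema changes and duplicated environments can provision their storage through the same workflow.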

 
