Data collection trends vary tremendously over time. With so much data now available in the enterprise, how that data is collected has become a major challenge. Instead of coming from a single application, it must be gathered from multiple systems across the enterprise. And it is not only internal data that is gathered: the bulk of it comes from customers, the web, and social media.
5 Top Data Collection Trends
1. Unstructured Data
Not so long ago, it was all about collecting data into structured repositories such as relational databases. That started to change about a decade ago as social media gathered momentum. Suddenly, unstructured data was king. Platforms like Hadoop emerged to collect and bring order to unstructured data. Organizations found themselves attempting to bring huge volumes of data from disparate sources into one physical or virtual spot.
“Data is now being collected from a wider base and type of devices, including the Internet of Things (IoT), cameras, and other sensors, as well as the numbers and sheer volume or amount of data being collected at the edge as well as other locations,” said Greg Schulz, an analyst with StorageIO Group.
2. The Data Explosion
The data explosion has been a trend for many years. What many didn’t expect was that the explosion would keep on exploding. The amount of digital data worldwide is growing at around 23% per year, according to IDC, meaning the volume of data in the digital universe roughly doubles every few years.
“The deluge of data that every company is contending with, from connected products, assets, processes, and customers, is difficult to leverage without the applications in place that enable collation of information and rapid decision making,” said Jeffrey Hojlo, an analyst at IDC.
3. Data Tiering
Brian Henderson, director of product marketing for unstructured data solutions at Dell, said a lot of companies are taking a fresh look at their data management strategies, including storage and data protection technology. The more advanced IT organizations are tiering their data based on volume (how much data they have and how fast it’s growing), variety (what kinds of data, where it is stored, and whether it can be accessed from one place), velocity (what performance is required), and how critical the data is to their business.
Thus, data is placed in different tiers depending on its value and how it is used. Large repositories might house petabytes of unstructured data that may be analyzed only once. Other tiers might hold time-sensitive data that has value only for a short period. And of course, there is live, operational data, which might stay in a high-performance tier for one to three months. After that, it is relegated to lower tiers until it ends up in an archive.
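The tiering pattern described above can be sketched in a few lines. The tier names and thresholds here are hypothetical, chosen only to illustrate the age-and-criticality logic; a real policy engine would also weigh access frequency, cost, and compliance rules.

```python
# Illustrative sketch: assign a dataset to a storage tier.
# Tier names ("hot", "warm", "cold", "archive") and the 90-day /
# 365-day cutoffs are assumptions, not from any specific product.

def assign_tier(age_days: int, business_critical: bool) -> str:
    """Place data in a tier by age and business criticality."""
    if business_critical and age_days <= 90:
        return "hot"      # live, operational data on high-performance storage
    if age_days <= 90:
        return "warm"     # time-sensitive data with short-lived value
    if age_days <= 365:
        return "cold"     # bulk unstructured data, rarely analyzed
    return "archive"      # relegated after its active life

print(assign_tier(30, True))     # recently written, critical data
print(assign_tier(400, False))   # old data headed for the archive
```

In practice such a function would run periodically against storage metadata, moving objects between tiers as they age out of each window.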
4. Scale-Out NAS
To overcome the challenges of traditional storage infrastructure, many businesses are looking to scale-out network attached storage (NAS) solutions and the benefits they can offer for managing their data. Scale-out NAS can host unstructured data in a single large pool of storage for file sharing, large graphics, video files, and many other applications, accessible via the NFS, SMB, HDFS, S3, HTTP, and FTP protocols. For example, Dell EMC PowerScale, powered by the PowerScale OneFS operating system, helps store, manage, secure, protect, and analyze unstructured data for a range of applications and workloads. It also offers API-integrated cyber protection capabilities.
“Most backup and data protection technologies today can support NAS workloads, but it’s important to complement backup software with storage-based solutions that replicate data faster and have faster recovery times,” Henderson said.
5. Data Location
In days gone by, all data was funneled to a central data center or central storage area network (SAN). But the magnitude of data collected in the modern enterprise means it usually isn’t feasible to send it all over the network to headquarters. Alongside different tiers of data, different data locations are also evolving. A big data collection trend is determining the ideal location for each kind of data.
Some data is useful only at the edge. Take the case of self-driving vehicle data: it is useful for a few moments within the vehicle and in sensors near the vehicle’s location, so it is gathered there. The network would become hopelessly clogged if all of it were transmitted centrally. That’s why most of it is discarded or summarized within a short period, and only key data is relayed to central systems.
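The summarize-then-relay pattern can be sketched simply. This is a minimal illustration, not any vendor’s implementation: raw readings are reduced to a compact summary at the edge, and only values crossing a (hypothetical) alert threshold are relayed in full.

```python
import statistics

def summarize_at_edge(readings: list[float], alert_threshold: float):
    """Reduce raw sensor readings to a small summary for central systems.

    Returns a compact statistical summary plus only the individual
    readings worth relaying in full (those at or above the threshold).
    Everything else is effectively discarded at the edge.
    """
    summary = {
        "count": len(readings),
        "mean": statistics.mean(readings),
        "max": max(readings),
    }
    key_events = [r for r in readings if r >= alert_threshold]
    return summary, key_events

# A burst of local sensor data: only the summary and the one
# threshold-crossing reading would travel over the network.
summary, key_events = summarize_at_edge([1.0, 2.0, 3.0, 10.0], alert_threshold=8.0)
print(summary)
print(key_events)
```

The bandwidth saving comes from sending a fixed-size summary regardless of how many raw readings were collected locally.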
“There is a major change ongoing about how data and metadata, including telemetry, are collected from various locations,” said Schulz. “Some data only needs to be gathered and processed at the edge. Other data needs to be sent to a core on-premises or cloud location for additional heavy processing.”