The ability to automatically and transparently migrate data within a storage system relies on this mapping so that the data can be reconstructed for the user. This reconstruction is embodied within metadata that specifies how data is distributed across the various storage subsystems.
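To make the role of this metadata concrete, here is a minimal sketch of a mapping layer, assuming a simple in-memory model. The names `ExtentMap` and `Location` and the tier labels `"hdd"` and `"ssd"` are hypothetical and are not taken from any product. The sketch shows only that a migration changes the mapping entry, while reads by logical address keep returning the same data.

```python
from dataclasses import dataclass


@dataclass
class Location:
    tier: str    # hypothetical tier name, e.g. "ssd" or "hdd"
    offset: int  # slot index within that tier


class ExtentMap:
    """Maps logical extent numbers to physical locations (illustrative only)."""

    def __init__(self):
        self._map = {}        # logical extent -> Location (the metadata)
        self._tiers = {}      # tier name -> {offset: data}
        self._next_free = {}  # tier name -> next unused offset

    def _allocate(self, tier):
        offset = self._next_free.get(tier, 0)
        self._next_free[tier] = offset + 1
        return offset

    def write(self, extent, data, tier="hdd"):
        # New extents are placed on the requested tier; existing ones stay put.
        loc = self._map.get(extent)
        if loc is None:
            loc = Location(tier, self._allocate(tier))
            self._map[extent] = loc
        self._tiers.setdefault(loc.tier, {})[loc.offset] = data

    def read(self, extent):
        # The caller supplies only the logical extent; the map finds the data.
        loc = self._map[extent]
        return self._tiers[loc.tier][loc.offset]

    def tier_of(self, extent):
        return self._map[extent].tier

    def migrate(self, extent, dest_tier):
        src = self._map[extent]
        if src.tier == dest_tier:
            return
        data = self._tiers[src.tier].pop(src.offset)
        dest = Location(dest_tier, self._allocate(dest_tier))
        self._tiers.setdefault(dest_tier, {})[dest.offset] = data
        self._map[extent] = dest  # updating the metadata completes the move


volume = ExtentMap()
volume.write(7, b"frequently read data")
volume.migrate(7, "ssd")
```

After the call to `migrate`, `volume.read(7)` still returns the original bytes even though they now live on a different tier, which is the transparency the mapping provides.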
In addition to the various implementation styles (which we'll explore next), there are a number of trade-offs in the granularity of the data to be migrated (see Figure 2). Each comes with its own advantages and disadvantages. For example, some vendors implement LUN-level migration, which is conceptually simple but means that all content within a LUN is treated the same way.
Sub-LUN level migration is also implemented. It can operate on large chunks of data or, in the extreme case, on individual blocks. Sub-LUN level migration has certain advantages, as frequently accessed data can be migrated to faster tiers, leaving the other data in the LUN on less expensive tiers of storage.
Sub-LUN level migration also has a cost, as metadata must be managed for the individual chunks of data (and the smaller the chunk size, the more metadata there is to manage, so the less efficient it may ultimately be). On the other hand, if the migrated chunks of data are larger than a block, performance gains may be realized in the form of read-ahead (for example, if the blocks within the chunk are logically related).
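The metadata cost can be estimated with simple arithmetic. The sketch below assumes one fixed-size mapping entry per chunk; the 16-byte entry size and the 1 TiB LUN are illustrative assumptions, not figures from any particular array.

```python
def mapping_overhead(lun_bytes, chunk_bytes, entry_bytes=16):
    """Return (entry_count, metadata_bytes) for a LUN tracked per chunk.

    entry_bytes is an assumed size for one mapping entry.
    """
    entries = -(-lun_bytes // chunk_bytes)  # ceiling division
    return entries, entries * entry_bytes


TIB = 1024 ** 4
CHUNK_SIZES = [
    ("4 KiB block", 4 * 1024),
    ("1 MiB chunk", 1024 ** 2),
    ("1 GiB chunk", 1024 ** 3),
]

for label, chunk in CHUNK_SIZES:
    entries, meta = mapping_overhead(TIB, chunk)
    print(f"{label}: {entries:,} entries, {meta:,} bytes of metadata")
```

Under these assumptions, tracking a 1 TiB LUN at 4 KiB granularity needs about 268 million entries (4 GiB of metadata), while 1 MiB chunks need roughly one million entries (16 MiB) and 1 GiB chunks need only 1,024 entries (16 KiB).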
An important characteristic of a solution that incorporates data migration is efficiency. The solution should minimize any impact on storage performance. Other trade-offs include the method by which data is classified, the frequency with which classification is performed, and the initial placement of data (whether new data is assumed to be hot or cold), among others.
Some implementations, for example, perform data migration as a background process (such as a nightly activity), whereas others perform it in real time. While potentially introducing latency, real-time migration provides the ability to react dynamically to changes in how users access their data.
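As a rough illustration of classification and background migration, the following sketch counts accesses per extent and, on each background pass, plans promotions and demotions. The `HeatTracker` class, the `"fast"` and `"slow"` tier names, and the fixed access-count threshold are all assumptions made for the example; real products use their own (often more elaborate) heuristics.

```python
from collections import Counter


class HeatTracker:
    """Counts accesses per extent between background passes (illustrative)."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold  # assumed cutoff for "hot" data
        self.counts = Counter()

    def record_access(self, extent):
        self.counts[extent] += 1

    def plan_migrations(self, placement):
        """Return a list of (extent, destination_tier) moves.

        placement maps each extent to its current tier, "fast" or "slow".
        Counters are reset afterward so each pass sees only recent activity.
        """
        moves = []
        for extent, tier in sorted(placement.items()):
            hot = self.counts[extent] >= self.hot_threshold
            if hot and tier == "slow":
                moves.append((extent, "fast"))   # promote hot data
            elif not hot and tier == "fast":
                moves.append((extent, "slow"))   # demote cold data
        self.counts.clear()
        return moves


# New data starts on the slow tier here (an "assume cold" initial placement).
placement = {0: "slow", 1: "slow", 2: "fast"}
tracker = HeatTracker(hot_threshold=3)
for extent in [0, 0, 0, 1]:
    tracker.record_access(extent)

moves = tracker.plan_migrations(placement)
```

A background implementation would call `plan_migrations` on a schedule (nightly, for example), whereas a real-time design would evaluate the same kind of rule as accesses arrive, trading extra work in the I/O path for faster reaction.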
Figure 2: Levels of Data Migration.
Host-based implementations integrate the tiering and migration logic into the host servers. While this can be restrictive, because the logic serves only the storage of a single host, virtualization has changed this to also support multi-user (multi-VM) configurations.
Operating systems, for example, can integrate this type of functionality into their logical volume managers (such as Linux's LVM), and hypervisors can incorporate it into their storage stacks. VMware implements this under the product name Storage vMotion, which permits the migration of live (active) virtual machine disks between storage media. This is implemented efficiently by using changed block tracking to migrate the virtual machine disk in the background and then, at the end, suspending the VM for a short time to move any remaining blocks to the destination datastore.
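The general idea behind changed block tracking can be sketched as an iterative copy loop. This is a simplified simulation and not VMware's implementation: the function name `migrate_with_cbt`, the `writes_during_pass` parameter (which stands in for guest I/O arriving during each pass), and the cutover threshold are all hypothetical.

```python
def migrate_with_cbt(source, dest, writes_during_pass,
                     max_passes=5, cutover_threshold=1):
    """Copy source blocks into dest while the source keeps changing.

    source, dest: equal-length lists of block contents.
    writes_during_pass: list of lists of (index, data) writes; element i is
        applied to the source during pass i to simulate a running VM.
    Returns the number of blocks copied in the final, paused step.
    """
    dirty = set(range(len(source)))  # the first pass must copy every block

    for pass_no in range(max_passes):
        if len(dirty) <= cutover_threshold:
            break  # few enough blocks remain for a short pause
        to_copy, dirty = dirty, set()
        for index in sorted(to_copy):
            dest[index] = source[index]
        # Simulated guest writes that arrive while the pass runs are
        # recorded in the dirty set so a later pass copies them again.
        if pass_no < len(writes_during_pass):
            for index, data in writes_during_pass[pass_no]:
                source[index] = data
                dirty.add(index)

    # Final step: the VM is considered suspended, so no new writes can
    # arrive while the remaining dirty blocks are copied.
    for index in dirty:
        dest[index] = source[index]
    return len(dirty)


source = [b"a", b"b", b"c", b"d"]
dest = [None] * len(source)
writes = [[(1, b"B"), (2, b"C")], [(2, b"CC")]]
final_blocks = migrate_with_cbt(source, dest, writes)
```

Each pass copies only the blocks changed since the previous pass, so the dirty set tends to shrink as long as copying outpaces the guest's writes, and the pause at the end covers only the few blocks that remain.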
Network-based implementations place an intermediary into the network between the storage users and the physical storage. This offloads the functionality from the host, but also permits a vendor-agnostic storage backend (storage from multiple vendors). Examples of network-based implementations (for both data migration and numerous other features) include IBM's SAN Volume Controller (SVC), HP's SAN Virtualization Services Platform (SVSP), and FalconStor's Network Storage Server (NSS).
Finally, target-based implementations pull the required logic into the storage array itself. As with network-based implementations, the overhead of virtualizing the data is offloaded from the host, creating an abstraction at the target. Once this abstraction is constructed, other advanced features can be implemented, such as data reduction (as the physical placement and format of data are hidden from the host users). Many examples of target-based implementations exist, such as EMC's FAST, Compellent's Data Progression, 3PAR's Dynamic Optimization, and many others.
Figure 3: Implementation Styles.
About the Author
M. Tim Jones is a firmware and product architect and the author of Artificial Intelligence: A Systems Approach, GNU/Linux Application Programming (now in its second edition), AI Application Programming (in its second edition), and BSD Sockets Programming from a Multilanguage Perspective. His background ranges from the development of software for geosynchronous satellites to the architecture and development of storage and virtualization solutions.