Physical servers were usually under-utilized and took time and effort to deploy. These servers also consumed data center space, power and cooling. Virtualization reduced hardware costs, reduced the environmental requirements by saving on power and cooling and improved the utilization of physical hardware in comparison to dedicated server environments.
Of course there are tradeoffs in using virtual infrastructure. Operating system licenses are not free; there are additional management costs to consider and staff have to be trained and gain experience in the technology. However, the net effect of virtualization has been to allow many companies to reduce their overall computing costs.
Virtualization, as it continues to gain adoption, has a very close relationship with storage. Virtual machines are just data and have to be stored somewhere. On standalone virtual servers, this can be achieved simply by using DAS (locally attached) disks. But if more advanced virtualization features (and increased availability) are required then virtual machines will normally be stored on SAN or NAS arrays. Using a dedicated storage array provides a number of significant benefits to the user, namely:
• Increased resilience – storage arrays operate with a high degree of redundancy with multiple power supplies, fans and other internal components.
• Scalability – storage arrays are highly scalable devices and can be extended dynamically without outages.
• Shared access – storage arrays allow shared access to data, which isn’t generally practical or possible with DAS. Shared access is essential in increasing resiliency.
• Increased functionality – storage arrays offer advanced features such as replication and snapshots that can offload processing power from the virtual server.
Virtual guests all contained on a single LUN; LUN failover keeps replicated LUNs together.
Storage Features Key to Virtualization
As virtual servers become more widely adopted, they will encompass more and more mission-critical systems. It will be essential, therefore, for virtual guests to be replicated between locations to improve availability and to protect against a disaster at the local data center.
Replication can be performed synchronously (where I/O is confirmed as written to both source and target arrays before being confirmed to the host), or asynchronously (where the source array confirms I/O complete to the host without waiting to confirm the target array has received the data).
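The difference between the two modes can be illustrated with a toy model. The following is a minimal sketch, not a vendor API: the class ReplicatedVolume, its methods and the in-memory dictionaries standing in for the two arrays are all hypothetical names chosen for illustration.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a source/target array pair (hypothetical, not a vendor API)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.source = {}        # block address -> data held on the source array
        self.target = {}        # block address -> data held on the target array
        self.pending = deque()  # writes acknowledged to the host but not yet on the target

    def write(self, address, data):
        """Write one block and return when the host would receive its acknowledgement."""
        self.source[address] = data
        if self.synchronous:
            # Synchronous: the target holds the data before the host is told "done".
            self.target[address] = data
        else:
            # Asynchronous: acknowledge immediately, ship the write to the target later.
            self.pending.append((address, data))
        return "ack"

    def drain(self):
        """Simulate the replication link catching up on queued writes."""
        while self.pending:
            address, data = self.pending.popleft()
            self.target[address] = data

    def unreplicated_writes(self):
        """Acknowledged writes that would be lost if the source array failed now."""
        return len(self.pending)


sync_vol = ReplicatedVolume(synchronous=True)
async_vol = ReplicatedVolume(synchronous=False)
for vol in (sync_vol, async_vol):
    vol.write(0, b"guest-data")

print(sync_vol.unreplicated_writes())        # 0
print(async_vol.unreplicated_writes())       # 1
async_vol.drain()
print(async_vol.target == async_vol.source)  # True
```

In this sketch the asynchronous volume acknowledges the host sooner, but until drain() runs there is a window in which the target array is behind the source.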
Storage arrays are highly efficient at replicating data, which has been a key feature of these devices for 15-20 years. However, for SAN arrays, the way in which data is presented to the host can cause an issue with replication.
Storage is typically presented to virtual servers as large LUNs (Logical Unit Numbers). Each LUN is a single unit of storage as far as the array is concerned, but from the virtual server perspective it will be used to hold many virtual guests.
For example, a 500GB LUN could hold 25 virtual guests of 20GB per guest. Each of these virtual guests will have its own service levels and DR requirements. The storage array will replicate the entire LUN and in the event of a “failover” scenario where operation moves to the remote array, it is expected that the primary LUN will not be accessed, and all host I/O will occur on the remote LUN.
This may be fine for a complete DR failover but doesn’t address other operational requirements, for example, where a single VM guest is moved to another location for operational reasons rather than a DR outage.
In this instance, the VM guest may have to be manually moved to another data store, removing the benefit of using replication within the array. We will see later that this issue is being addressed by new functionality in the array.
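The granularity mismatch described above can be made concrete with the 500GB example. This is a minimal sketch under stated assumptions: the Lun class, its add_guest and failover methods and the guest names are hypothetical and only model the idea that the array fails over a whole LUN, never an individual guest.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    """Toy model of a replicated LUN used as a datastore (hypothetical names)."""
    name: str
    size_gb: int
    active_site: str = "primary"
    guests: dict = field(default_factory=dict)  # guest name -> size in GB

    def add_guest(self, guest, size_gb):
        """Place a guest on the LUN, refusing it if the LUN would overflow."""
        used = sum(self.guests.values())
        if used + size_gb > self.size_gb:
            raise ValueError("LUN is full")
        self.guests[guest] = size_gb

    def failover(self, site):
        """Array-level failover: the whole LUN switches site.

        Returns every guest affected, because the array cannot fail over
        a single guest independently of the others on the same LUN.
        """
        self.active_site = site
        return sorted(self.guests)


lun = Lun("datastore-1", size_gb=500)
for i in range(1, 26):
    lun.add_guest(f"guest-{i:02d}", 20)  # 25 guests x 20GB = 500GB

# Only guest-07 needs to move, but the array can only fail over the LUN.
affected = lun.failover("remote")
print(len(affected))    # 25
print(lun.active_site)  # remote
```

Moving one guest by array failover therefore drags the other 24 along with it, which is why a single guest move typically falls back to a host-side copy to another data store.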
The replication issue touches on another closely related subject, that of data mobility.
Where replication provides benefits in increasing levels of availability, the wider subject of data mobility becomes more important in virtual environments. By mobility, we refer to the ability to move data (and by definition, virtual guests) around a storage enterprise, both within the data center and between separate data centers.
Mobility within the data center is an essential feature for a number of reasons:
• Balance – It enables workloads to be balanced across multiple storage arrays.
• Change configuration – It enables storage devices to be added into and taken out of a configuration as required, for instance, when arrays are being replaced.
• DR scenario – It enables data to be moved to other locations for pre-emptive DR planning or in the event of an actual DR scenario.
The ability to move data around the infrastructure is becoming more important in delivering today’s virtual environment and will be even more important in the future as workload moves into “the Cloud.” Without advanced storage array functionality, data movement would have to be performed by the virtual server, consuming CPU and network resources.
We will see later that storage vendors are working toward solutions that will enable arrays to move large amounts of data independent of the virtual server itself.
Virtualization and Backup
The move to virtualized environments meant a new approach to backup. Although possible, it is impractical to back up each virtual guest individually. Instead, functionality within the virtual server enables backup images of each virtual guest to be taken and accessed by a separate backup server. To improve the performance of this feature, the storage array can perform the snapshot process, offloading CPU and I/O resources from the virtual server.
Performance is clearly an issue in backup. More generally, performance is a key storage array feature for virtualization.
Storage arrays have been developed to process large volumes of I/O, which can be either sequential or random. In virtual environments the I/O is typically random and this doesn’t work well with DAS storage, which would require more expensive, high-speed drives.
Storage arrays can benefit from large numbers of disks (as it is a shared environment), dedicated cache and multiple I/O connections, all of which improve performance and deliver a more consistent I/O response time.
As more workload is virtualized, the hypervisor itself becomes a significant part of the support effort, because it is the platform that is tied to the hardware itself.
Boot from SAN enables the hypervisor to be disconnected from the hardware and allows a single hypervisor instance to be booted on any server; it also allows the hypervisor to be replaced with another instance that could be (for example) an upgraded version.
By removing the boot device from the server and placing it on the SAN, the server holds no state information and so becomes a commodity. This is most easily demonstrated with the use of blade servers, where multiple physical servers exist in a single chassis; they can be added or removed from the blade infrastructure at any time. Ultimately, blade flexibility is served best with shared SAN storage.
Both hypervisor and boot disk stored on SAN; hypervisor can be booted from any physical server.
Why VAAI Is So Important
Although storage arrays already offer many important features to virtual environments, there are additional requirements not met by today’s hardware. That is why, for VMware vSphere, VAAI (vStorage APIs for Array Integration) was developed.
VAAI defines a set of API calls that are implemented within the storage array through amendments to the SCSI protocol. The most notable of these are the following:
• Block Zero – implemented as Write Same in SCSI, this pushes the task of zeroing out large blocks of data down to the array. In fact, Write Same could be used to write any value over a large range of data. However, it is most useful to vSphere for writing zeroed-out data when creating new virtual disks (VMDKs).
• Full Copy – implemented within the array as SCSI EXTENDED COPY, this feature allows bulk movement of data both within and between storage arrays, taking the load off the vSphere hypervisor when performing Storage vMotion or guest cloning functions.
• Hardware Assisted Locking (HAL) – this moves the SCSI hardware lock from the LUN to the block level, improving performance on certain vSphere operations that require locking for data integrity. In addition, HAL could potentially resolve the issue of LUN replication, allowing I/O on both sides of a replicated LUN pair.
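The three primitives above can be sketched against an in-memory stand-in for a LUN. This is an illustration only: ToyArray and its method names are hypothetical, no real SCSI commands are issued, and modelling the block-level lock as a compare-and-write is an assumption of this sketch rather than something the text specifies.

```python
class ToyArray:
    """In-memory stand-in for one LUN on a storage array (hypothetical, no real SCSI)."""

    BLOCK_SIZE = 512

    def __init__(self, num_blocks):
        # Every block starts zeroed; bytes objects are immutable so sharing is safe.
        self.blocks = [bytes(self.BLOCK_SIZE)] * num_blocks

    def _check_range(self, start, count):
        if start < 0 or count < 0 or start + count > len(self.blocks):
            raise IndexError("block range outside the LUN")

    def write_same(self, start, count, pattern):
        """Block Zero analogue: one request fills `count` blocks with one pattern."""
        if len(pattern) != self.BLOCK_SIZE:
            raise ValueError("pattern must be exactly one block long")
        self._check_range(start, count)
        for lba in range(start, start + count):
            self.blocks[lba] = pattern

    def extended_copy(self, src, dst, count):
        """Full Copy analogue: the array copies blocks without host reads/writes."""
        self._check_range(src, count)
        self._check_range(dst, count)
        self.blocks[dst:dst + count] = self.blocks[src:src + count]

    def compare_and_write(self, lba, expected, new):
        """Block-level locking analogue: write only if the block is unchanged."""
        self._check_range(lba, 1)
        if self.blocks[lba] != expected:
            return False  # another host changed the block first
        self.blocks[lba] = new
        return True


lun = ToyArray(16)
zero = bytes(ToyArray.BLOCK_SIZE)
ones = b"\xff" * ToyArray.BLOCK_SIZE

lun.write_same(0, 4, ones)   # fill blocks 0-3 with a single request
lun.extended_copy(0, 8, 4)   # clone blocks 0-3 into blocks 8-11 inside the array

# Two hosts race for the lock record in block 15; only the first one wins.
host_a = lun.compare_and_write(15, zero, b"A" * ToyArray.BLOCK_SIZE)
host_b = lun.compare_and_write(15, zero, b"B" * ToyArray.BLOCK_SIZE)
print(host_a, host_b)  # True False
```

In each case the host sends a single small request and the array does the bulk work, and only the contended block is protected rather than the whole LUN.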
Virtualization and Storage Vendor Solutions
Storage vendors are starting to offer new features and products that specifically meet the needs of virtual environments.
Example 1: EMC VPLEX
EMC’s VPLEX product virtualizes the storage LUN and permits I/O to either side of a replicated LUN pair. This enables virtual guests to be moved between storage arrays (typically in geographically distant locations) with no outage and without waiting for data to be replicated.
Example 2: Compellent Live Volume
Compellent’s newly announced Live Volume feature enables a single logical LUN to be spread across multiple storage arrays. The LUN can be associated with one array and dynamically moved to another in order to meet workload balancing or DR requirements.
Virtualization and Storage: Summary
It’s clear that as virtualization continues to gain importance in the enterprise, storage will form a critical part in delivering that infrastructure. The features of the storage array will continue to evolve and deliver better performance, availability and resilience than could be achieved using directly attached storage (DAS) alone. Storage and virtualization are, and will continue to remain, closely linked.