What you will learn in this tip: In the first part of our series on data replication techniques, you learned about the differences between asynchronous and synchronous replication.
In part two, you will learn about array-based replication and network-based replication.
Array-based replication has many merits. For storage managers, it's simply another array feature. It's managed like other array functions and options, so it takes little effort to leverage. Because it's an array function, deploying it requires very little cross-departmental coordination; the storage group makes it happen and supports it once deployed. The array and its replication feature come from the same supplier, so a single vendor supports both, which eliminates a great deal of finger-pointing when problems occur. Furthermore, array-based replication is less likely to be disrupted by extraneous activities such as patching and other changes, which are more likely to affect host-based replication products; this gives array-based replication a higher degree of resilience.
Early on, a mechanism to replicate data from one array to another emerged as a necessity and array vendors quickly added replication to their storage systems -- to high-end arrays first, where it's standard now, and then to midrange and lower-end arrays. Dell Inc. is a perfect illustration of the trend of replication filtering down into the low end of the data storage market. Today, all of Dell's storage systems, with the exception of the lower-end PowerVault arrays, support replication, from Dell/EMC SAN Storage and Dell EqualLogic arrays to the Dell DX Object Storage Platform.
Array-based replication's greatest shortcoming is its requirement for similar source and target arrays, limiting its use to homogeneous storage environments. Most storage vendors don't even support replication between their own array families. Among major storage vendors, NetApp is the lone exception, supporting array-based replication between any of its arrays. Another noteworthy vendor is Hitachi Data Systems, whose Virtual Storage Platform (VSP) and Universal Storage Platform (USP) are able to reach out to other arrays via storage virtualization. And with very few exceptions, such as Dell EqualLogic arrays, replication is an extra-cost option that's charged for by device or replicated capacity.
Block-based Fibre Channel and iSCSI arrays replicate block changes on volumes and LUNs. Because only the changed blocks, each a few hundred bytes, need to be replicated, block-based replication is very fast and efficient. Executed beneath the file system, it is operating system agnostic and supports replication between any platforms attached to the array. Block-based replication has the potential to take advantage of advanced array features such as deduplication, compression and encryption, and some vendors have enhanced their replication offerings accordingly. For instance, NetApp, with the 8.0.1 release of Data ONTAP, added the ability to replicate only the data changes in FlexClone volumes between parent and clone images. A FlexClone volume is a thin-provisioned clone that requires very little actual disk space; but until this latest release, the complete volume had to be replicated instead of the disk-efficient FlexClone.
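The idea of replicating only changed blocks can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's implementation: the 512-byte block size, the `TrackedVolume` class and the `replicate_changes` function are all hypothetical names chosen for illustration.

```python
BLOCK_SIZE = 512  # assumed block size, for illustration only


class TrackedVolume:
    """Toy volume that remembers which blocks changed since the last replication."""

    def __init__(self, num_blocks):
        self.data = bytearray(num_blocks * BLOCK_SIZE)
        self.dirty = set()  # indexes of blocks written since the last replication

    def write(self, offset, payload):
        if not payload:
            return
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write falls outside the volume")
        self.data[offset:offset + len(payload)] = payload
        # Mark every block the write touched as dirty.
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        self.dirty.update(range(first, last + 1))

    def read_block(self, index):
        start = index * BLOCK_SIZE
        return bytes(self.data[start:start + BLOCK_SIZE])


def replicate_changes(source, target):
    """Copy only the dirty blocks to the target and return how many were sent."""
    sent = 0
    for index in sorted(source.dirty):
        start = index * BLOCK_SIZE
        target.data[start:start + BLOCK_SIZE] = source.read_block(index)
        sent += 1
    source.dirty.clear()
    return sent
```

A 20-byte write that straddles a block boundary marks two blocks dirty, so only those two blocks cross the wire no matter how large the volume is. Nothing here looks at file names or file system structures, which is why the approach is independent of the host operating system.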
Network-attached storage (NAS) systems usually replicate at the file system level. This gives them file system metadata awareness, which can be leveraged during the replication process and enables replication based on criteria such as file size and file type. But file-level replication is slower and usually less efficient than block-based replication. The performance impact increases with the number of files and folders in a replication set that need to be parsed: the larger the tree, the longer it takes to parse. For that reason, BlueArc introduced the object-based JetMirror technology, replacing time-consuming sequential file parsing with an object-based metadata store. "Backups with JetMirror are 2.8 times faster than with NDMP [Network Data Management Protocol] and replication times for very large file stores can be reduced by an order of magnitude," said Ravi Chalaka, BlueArc's senior director of solutions marketing.
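To make the trade-off concrete, here is a minimal Python sketch of selecting files for replication by size and type. It is an illustration only and does not reflect how any NAS product works internally; the function name `select_files` and its parameters are hypothetical.

```python
import os


def select_files(root, max_size=None, extensions=None):
    """Walk the tree under root and return relative paths matching the criteria.

    max_size   -- skip files larger than this many bytes (None means no limit)
    extensions -- set of lowercase extensions such as {".txt"} (None means all)
    """
    selected = []
    # Every directory and file in the tree is visited, so the cost of this
    # step grows with the size of the tree.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if extensions is not None:
                if os.path.splitext(name)[1].lower() not in extensions:
                    continue
            if max_size is not None and os.path.getsize(path) > max_size:
                continue
            selected.append(os.path.relpath(path, root))
    return sorted(selected)
```

The filter is possible only because the replicator can see file names and sizes, which a block-level replicator cannot. The price is the full tree walk, which is the sequential parsing cost the paragraph above describes.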
Network-based replication pros and cons
Network-based replication usually comes into play in heterogeneous storage environments. It works with any vendor's array and supports any host platform. The splitting of I/Os happens in the network, between hosts and arrays, in either an inline appliance or a Fibre Channel fabric. The I/O splitter looks at the destination address of an incoming write I/O and, if it's part of a replication volume, forwards a copy of the I/O to the replication target. In many ways, network-based replication combines the merits of array- and host-based replication. Having arrived on the market only several years ago, it has the smallest market share, trailing both array-based and host-based replication in revenue and numbers, but it's growing at a quicker rate than array-based replication, according to IDC.
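The splitter's decision can be sketched as follows. This is a simplified Python model under stated assumptions, not the logic of any shipping product: `WriteIO`, `IOSplitter` and the volume identifiers are hypothetical, and plain lists stand in for the primary array and the replication target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteIO:
    volume_id: str   # destination volume of the write
    lba: int         # logical block address within the volume
    payload: bytes


class IOSplitter:
    """Toy splitter: every write reaches the primary, replicated volumes get a copy."""

    def __init__(self, replicated_volumes):
        self.replicated_volumes = set(replicated_volumes)
        self.primary_log = []   # stands in for the primary array
        self.replica_log = []   # stands in for the replication target

    def handle_write(self, io):
        # The write always continues to its original destination.
        self.primary_log.append(io)
        # Only writes addressed to a replicated volume are duplicated.
        if io.volume_id in self.replicated_volumes:
            self.replica_log.append(io)
            return True
        return False
```

Because the decision depends only on the destination address of the write, the splitter needs no knowledge of the host platform or of the array behind it, which is what lets the approach span heterogeneous environments.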
Compared to the multitude of array- and host-based replication offerings, there are fewer network-based replication products on the market, and they can be broken into two groups: inline appliances and fabric-based replication products.
Inline appliances, such as the IBM Corp. SAN Volume Controller (SVC), don't depend on intelligent switches from Brocade Communications Systems Inc. or Cisco Systems Inc. for splitting I/Os; instead, I/Os are terminated and forwarded in the appliance to storage targets. Unlike the wire-speed splitting of fabric-based products, the overhead of terminating and initiating new I/Os causes a small delay. Fabric-based products use a split-path architecture in which data that isn't part of a replication or virtualized volume is simply passed through, whereas in an inline appliance all traffic has to traverse the replication appliance. As a result, inline appliances are more likely to hit a scalability threshold than their fabric-based counterparts. "A variety of hardware options, including cache and number and speed of processors, have enabled IBM to address scalability and performance concerns for the most part," said Greg Schulz, founder and senior analyst at Stillwater, Minn.-based StorageIO Group.
While fabric-based replication products may be technologically superior, with better performance and scalability, they're significantly more complex and require intelligent switches. To use them in environments that don't have intelligent switches, fabric-based replication products usually provide host agents that perform the splitting of I/Os on hosts instead of in the fabric. EMC Corp. RecoverPoint, with its continuous data protection (CDP) and remote replication capabilities, is the most prominent fabric-based replication product.
Hewlett-Packard (HP) Co. StorageWorks SVSP and LSI StoreAge SVM -- the former being an OEM product of the latter -- combine the simplicity of an inline appliance with the performance and scalability of a fabric-based product. The products use a split-path approach where management is handled in-band; however, data movement and normal data flow occur out of band, leading to improved scaling and performance.
Other network-based replication players are FalconStor Software Inc. with its MicroScan and Delta Resync replication features, and InMage Systems Inc.
In the next part of our series, learn about the pros and cons of host-based replication products.
About this author: Jacob Gsoedl is a freelance writer and a corporate director for business systems.
This article was previously published in Storage magazine.
This was first published in April 2011