At the highest level, there are three main types of data replication commonly used: application-based, host-based and storage array-based data replication. In fact, we could get
Purists could argue that SAN and appliance-based replication are not "true" array-based replication because they are independent of the disk array, but for the purposes of this article, we can agree that replication takes place at the storage level rather than being host- or application-based. What distinguishes storage-level replication is that it relieves the application and server resources of the processing overhead associated with replication.
The downsides of array-based replication
A few years ago, it was a lot easier to outline the downsides of array-based replication: it was a very low-level technology that replicated blocks of data without much ceremony. Because the application was not aware of the replication process, you often had to take applications down to preserve data integrity. The technology offered very little support for heterogeneous storage hardware, which made it pricey. It typically worked at the array controller level, and remote replication was costly because of its network bandwidth requirements and often required some proprietary form of protocol conversion. Mirroring, or RAID 1, is probably one of the earliest and best-known forms of controller-based (array-based) replication. It is also known for its cost and limitations.
Not that those days are long gone, but data replication options are now much more flexible and affordable. Once reserved for enterprise-class organizations, array-based replication now comes in many offerings that SMBs can afford without breaking the bank. This has made array-based replication a very popular option because it is centralized and operating system agnostic. However, interoperability between most vendor solutions remains a challenge.
That being said, if what is needed is a very low-cost replication solution for a limited amount of data, array-based replication is often still too expensive for a small environment or a small remote office.
Getting started with disk array-based data replication
The best way to get started with array-based replication is by first answering a number of high-level questions:
- What are you trying to achieve? The need to implement array-based replication may be to address slow backups, increase the frequency of backups, capture frequently changing data, reduce or eliminate tape management, etc. If host-based or application-based replication is too resource intensive, array-based replication is likely the right approach. If large volumes of data are replicated, host-based or application-based replication can also interfere with network traffic.
- What do you need to replicate? The answer to that question can strongly influence the cost of the solution, so it is always good to address it realistically. Since technology is, in most cases, deployed to support the business, there should be a business requirement for the benefits of replication rather than a "nice-to-have" capability.
This takes us back to a very familiar discussion about data availability, recovery time objectives (RTOs) and recovery point objectives (RPOs), which ultimately drive the need for a particular technology. There might be a need for a certain application to have access to the latest possible copy of data. In such cases, a technology like EMC Corp.'s RecoverPoint is well suited. The I/O is split at the SAN level and written simultaneously to two different supported storage arrays, local or remote. This is done transparently and provides what is known as continuous data protection (CDP).
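To make the idea of splitting I/O more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions and not a description of how RecoverPoint or any other product is built: the class names `BlockVolume` and `WriteSplitter`, the in-memory dictionaries standing in for arrays, and the write journal used to rebuild an earlier state are all hypothetical.

```python
class BlockVolume:
    """Toy stand-in for a volume on a storage array: block number -> bytes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data


class WriteSplitter:
    """Hypothetical splitter: sends every write to two volumes and journals it."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.journal = []  # ordered list of (block_no, data), an assumption of this sketch

    def write(self, block_no, data):
        # Record the write first, then apply it to both copies.
        self.journal.append((block_no, data))
        self.primary.write(block_no, data)
        self.replica.write(block_no, data)

    def replay(self, upto):
        """Rebuild the block contents as they stood after the first `upto` writes."""
        image = {}
        for block_no, data in self.journal[:upto]:
            image[block_no] = data
        return image


splitter = WriteSplitter(BlockVolume("local"), BlockVolume("remote"))
splitter.write(0, b"a")
splitter.write(1, b"b")
splitter.write(0, b"c")
```

After these three writes both volumes hold the same blocks, and `splitter.replay(2)` returns the state before the last write. That ability to step back to an arbitrary earlier write is what this sketch uses to stand in for continuous protection.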
On the other hand, the ability to access point-in-time copies might be desirable. Data corruption and unintentional data deletion are often cited as situations where this functionality is required. Snapshot technology is a good fit here: because only changed blocks are replicated, it can create multiple point-in-time copies without requiring disk space equal to a multiple of the space the original data occupies. The NetApp Snap software suite offers a comprehensive set of options including local and remote snapshots, mirroring, vaulting and application-specific snapshot capabilities.
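The space savings of snapshots can be illustrated with a small Python sketch. It uses a simple copy-on-write scheme, which is one common way to keep only changed blocks; it is an assumption chosen for illustration and not a description of any vendor's implementation. The class `SnapshotVolume` and its methods are hypothetical names.

```python
class SnapshotVolume:
    """Toy copy-on-write volume: snapshots store only blocks changed since they were taken."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # live data: block number -> bytes
        self.snapshots = []         # one dict of preserved old blocks per snapshot

    def take_snapshot(self):
        # A new snapshot starts empty; it costs no data space until blocks change.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_no, data):
        old = self.blocks.get(block_no)  # None if the block did not exist yet
        for snap in self.snapshots:
            # Preserve the old content once per snapshot, on the first overwrite.
            if block_no not in snap:
                snap[block_no] = old
        self.blocks[block_no] = data

    def read_snapshot(self, snap_id):
        """Return the full point-in-time view: live blocks overlaid with preserved ones."""
        view = dict(self.blocks)
        view.update(self.snapshots[snap_id])
        # Blocks recorded as None did not exist when the snapshot was taken.
        return {k: v for k, v in view.items() if v is not None}


vol = SnapshotVolume({0: b"a", 1: b"b", 2: b"c"})
first = vol.take_snapshot()
vol.write(1, b"B")
second = vol.take_snapshot()
vol.write(2, b"C")
```

In this example the first snapshot ends up holding two preserved blocks and the second only one, even though each presents a complete three-block view of the volume. Copying the old block into every open snapshot is a simplification made to keep the sketch short.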
Application-aware replication is probably the most significant development in the field of array-based replication. The integration of replication with applications has made it possible to create copies of the data while the application is up and user access is maintained. Whether the solution leverages snapshots, mirroring, or volume-based replication, the ability to create a replica of a particular data set without affecting user access to the application is very much aligned with today's availability requirements. This capability is also far superior to the traditional daily backup since it can support much tighter recovery point objectives by allowing multiple daily copies or even continuous protection.
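The link between copy frequency and recovery point objectives can be shown with a little arithmetic. The Python sketch below assumes, for simplicity, that copies are evenly spaced over a 24-hour day; the function names `worst_case_data_loss` and `meets_rpo` are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(copies_per_day):
    """Longest gap between two evenly spaced copies in a 24-hour day."""
    if copies_per_day <= 0:
        raise ValueError("copies_per_day must be positive")
    return timedelta(days=1) / copies_per_day


def meets_rpo(copies_per_day, rpo):
    """True if the worst-case gap between copies does not exceed the RPO."""
    return worst_case_data_loss(copies_per_day) <= rpo


daily_backup_gap = worst_case_data_loss(1)    # traditional daily backup
hourly_copy_gap = worst_case_data_loss(24)    # one replica per hour
```

Under that assumption, a single daily backup can lose up to 24 hours of changes, while 24 copies a day bring the worst case down to one hour. A call such as `meets_rpo(4, timedelta(hours=4))` then gives a quick check of whether a proposed schedule satisfies a stated objective.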
Local vs. remote replication
One more significant feature of array-based replication is the ability to provide local or remote copies of the data. While this capability is certainly not unique to array-based replication, it is not as intrusive or resource intensive as host- or application-based replication, which, once more, provides an opportunity to replicate more frequently.
Vendor choices in array-based replication
There are many vendors with array-based replication offerings, and the cost of those offerings will vary. Some of the best-known vendors and products include:
- EMC's TimeFinder, SRDF, MirrorView, SAN Copy, SnapView
- IBM Corp.'s Advanced Copy Services suite
- Hitachi Data Systems' Universal Replicator, TrueCopy and ShadowImage
- Hewlett-Packard (HP) Co.'s StorageWorks suite of software
Some vendors with specialized offerings have smaller market shares but nonetheless offer innovative data protection technologies with an array-based replication component. One example is Data Domain, which leverages data reduction through data deduplication in a backup data storage target, enhanced with array-based replication for disaster recovery purposes.
That said, with the exception of EMC SAN Copy, which replicates at the LUN level, and HDS with the Universal Replicator leveraging TagmaStore, heterogeneous support remains the main challenge for array-based replication.
The line between host-based or application-based replication and array-based replication is not as clear as it once was. Virtualization appliances are emerging as the answer to interoperability between heterogeneous platforms while remaining off the host.
Pierre Dorion is the data center practice director and a senior consultant with Long View Systems Inc. in Phoenix, Ariz., specializing in the areas of business continuity and disaster recovery planning services, and corporate data protection.
This was first published in June 2009