An important part of the "beyond the data center" challenge is
Remote data backup and recovery has traditionally been decentralized for two reasons: the high cost of moving large quantities of data to a centralized location, and slow access to that data for remote recovery. Bandwidth is costly, and in many remote locations even obtaining sufficient capacity is a stretch. Compounding the problem, traditional backup is a batch-oriented activity that moves large amounts of data in a limited period of time, and the additional bandwidth needed to support that activity across a WAN was impossible to justify. There's also the matter of recovery: while backing up over a small pipe is painful, restoring large volumes of data can be excruciating.
As a result, firms tended to implement remote office/branch office (ROBO) backup in accordance with a "mini-data center" model, equipping each location with its own backup server and tape drives or small library. Someone at the remote office periodically inserted and removed the tapes and, with luck, tapes were occasionally sent to an off-site location. Quality and consistency of data protection varied dramatically from site to site.
Technologies for remote office backup
Several technologies have evolved that are shifting backup paradigms and creating new alternatives for remote backup. They range from improvements to the traditional remote backup model to approaches that eliminate the need for remote backup and, potentially, even remote servers and storage. They include:
Disk. As with backup in general, lower-cost, high-capacity disk has been a major enabler of improved remote protection. Disk can mitigate issues inherent in tape, such as stop-start performance, reliability concerns, and sequential job scheduling and resource management constraints, all of which made local backup at remote sites particularly painful. But it's important to remember that two advantages of tape -- low per-unit storage cost and transportability -- can be overriding factors.
Data replication. Limited backup operational capability and the need to move data off-site make it reasonable to consider replication as an alternative to remote backup. Moving data back to the corporate data center where it can be managed (and backed up) as part of standard operations makes sense. This requires bandwidth, but unlike the concentrated high-capacity demand of nightly backup, it's a continuous flow based on the change rate of data. Because data replication alone presents a challenge when recovering large volumes of data at the remote location, it's typically deployed with other technologies.
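To make the contrast between a concentrated nightly backup and a continuous replication flow concrete, here is a rough sketch. The site size, the 2% daily change rate and the 8-hour window are hypothetical figures chosen for illustration, not numbers from any particular deployment, and the helper name required_mbps is made up for this example.

```python
def required_mbps(gigabytes: float, hours: float) -> float:
    """Average link speed (megabits/s) needed to move `gigabytes` in `hours`."""
    megabits = gigabytes * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / (hours * 3600)

# Hypothetical branch office (assumed values, for illustration only).
protected_gb = 500.0
daily_change_gb = protected_gb * 0.02  # assumed 2% daily change rate
backup_window_hours = 8                # assumed nightly backup window

# A full backup pushed across the WAN inside the nightly window.
batch_full = required_mbps(protected_gb, backup_window_hours)
# An incremental backup still concentrates the day's changes into the window.
batch_incremental = required_mbps(daily_change_gb, backup_window_hours)
# Replication spreads the same changed data across the whole day.
replication = required_mbps(daily_change_gb, 24)

print(f"nightly full backup:        {batch_full:.1f} Mbps")
print(f"nightly incremental backup: {batch_incremental:.2f} Mbps")
print(f"continuous replication:     {replication:.2f} Mbps")
```

These are averages; real change rates are bursty, so a link sized exactly to the replication figure would fall behind during busy periods.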
Continuous data protection (CDP). CDP is a complementary option to replication. Real-CDP or near-CDP technology is available in SAN-based appliances, app-specific adjunct products, host-based snapshot managers and backup apps. With continuous data protection, data can be transparently protected to a local backup server and then replicated.
Data reduction. A key to successfully deploying replication in remote locations is the ability to minimize the number of bytes transferred. Data reduction in the form of data deduplication and compression is an essential enabling technology for remote data protection. This capability is necessary to keep disk-capacity requirements low and to make maximum use of bandwidth. Like CDP, this technology is incorporated into different types of hardware and software products.
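The effect of deduplication and compression on bytes transferred can be sketched in a few lines. This is a simplified illustration that assumes fixed-size chunks and an in-memory index of chunk hashes; commercial products typically use more sophisticated (often variable-size) chunking, and the function name bytes_to_send and the sample data are invented for this example.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration

def bytes_to_send(data: bytes, known_hashes: set) -> int:
    """Return the bytes that would cross the WAN after dedup and compression.

    `known_hashes` holds digests of chunks the target already stores and is
    updated in place as new chunks are "sent".
    """
    sent = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        if digest in known_hashes:
            continue  # duplicate chunk: the target already has it
        known_hashes.add(digest)
        sent += len(zlib.compress(chunk))  # compress only the new chunks
    return sent

target_index = set()  # chunk digests already held at the data center
monday = b"quarterly report draft\n" * 2000
tuesday = monday + b"one new paragraph appended on Tuesday\n"

first = bytes_to_send(monday, target_index)
second = bytes_to_send(tuesday, target_index)
print(f"Monday:  {len(monday)} bytes of data, {first} bytes sent")
print(f"Tuesday: {len(tuesday)} bytes of data, {second} bytes sent")
```

On the second day only the chunk containing the appended text is new, so far fewer bytes are sent than on the first day. The sketch ignores the small cost of sending chunk references and index lookups.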
Wide-area file services (WAFS). Wide-area optimization technologies, such as wide-area file services, application services and data services (WAFS, WAAS and WDS), leverage data reduction to further improve bandwidth efficiency. Typically implemented as appliances, these devices compress data and provide additional performance optimization functions such as caching and low-level protocol optimization. They're also increasingly app-aware and can substantially improve I/O by streamlining chatty application-level protocols. Depending on the app mix and aggregate network activity, these devices can produce dramatic results and, in some situations, make it possible to eliminate remote servers entirely.
Specialized remote backup applications have evolved that integrate capabilities such as replication, CDP and data reduction, including products from traditional software vendors as well as alternative products and services.
The challenge then is how to determine the right backup model. The fundamental choices include:
Option 1. Back up locally to tape (or to disk with off-site tape copy).
Option 2. Back up locally, then replicate the backup data off-site.
Option 3. Back up directly to a remote location with data transferred directly from the branch server to a data center (or third-party backup service provider).
Option 4. Replicate primary data to a remote location with no local backup. This option leverages technologies such as host- or array-based snapshot and replication to ensure that an off-site copy of data exists.
Additional considerations can significantly influence the final design and technology choices:
- Administration and management. Is the control and operation to be centralized or distributed? A high degree of centralization would tend to favor Option 3 or 4.
- Bandwidth. How much data must be protected? Does sufficient bandwidth exist given the data volumes to permit consideration of remote options? Large remote sites may need to adopt some variation of Option 1 or 2.
- Security. What data access and encryption policies are in place or anticipated? Any option needs to account for security, but those involving local administration and tape handling are a particular concern.
- Recoverability. In the event of primary data loss, what are acceptable recovery time objectives and recovery point objectives for the remote environment? Recovering across the WAN can mean substantial downtime, which can rule out Options 3 and 4 where recovery objectives are tight.
- Data type. Some remote locations offer only file and print services, while others provide other application services. Application support can impact the range of remote-backup options, including consideration of application-specific backup and replication technologies.
- Policy. Coordinating consistent data policies across many remote locations can be daunting. But regulations may demand the ability to audit policies and provide proof of compliance, again favoring centralized control and monitoring solutions.
- Cost. Remote backup design decisions often end up as a compromise due to cost constraints. Fully understanding cost tradeoffs requires precise inputs -- data quantities, change rates and growth estimates -- and realistic bandwidth calculations that factor in data-reduction and WAFS efficiencies.
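As a rough illustration of how these considerations interact, the sketch below estimates the time to restore a site across the WAN and then applies the rules of thumb from the list above to narrow Options 1 through 4. The Site fields, the 4:1 data-reduction ratio and the sample sites are all assumptions made for this example; a real assessment would also weigh security, data type, policy and cost.

```python
from dataclasses import dataclass

@dataclass
class Site:
    protected_gb: float      # data that would need restoring after a loss
    wan_mbps: float          # usable WAN bandwidth to the data center
    reduction_ratio: float   # assumed data-reduction factor (4.0 means 4:1)
    rto_hours: float         # recovery time objective for the site
    centralized_admin: bool  # is centralized control and operation preferred?

def wan_restore_hours(site: Site) -> float:
    """Hours to pull a full restore across the WAN after data reduction."""
    megabits = site.protected_gb * 8 * 1000 / site.reduction_ratio
    return megabits / site.wan_mbps / 3600

def candidate_options(site: Site) -> list:
    """Narrow Options 1-4 using the rules of thumb from the list above."""
    if wan_restore_hours(site) > site.rto_hours:
        return [1, 2]      # remote-only recovery is too slow: keep a local copy
    if site.centralized_admin:
        return [3, 4]      # centralization favors direct remote protection
    return [1, 2, 3, 4]    # nothing ruled out by these two checks alone

# Two hypothetical sites sharing the same link speed and objectives.
small = Site(protected_gb=200, wan_mbps=20, reduction_ratio=4.0,
             rto_hours=8, centralized_admin=True)
large = Site(protected_gb=5000, wan_mbps=20, reduction_ratio=4.0,
             rto_hours=8, centralized_admin=True)

for name, site in (("small", small), ("large", large)):
    print(f"{name}: {wan_restore_hours(site):.1f} h to restore over the WAN, "
          f"candidate options {candidate_options(site)}")
```

With these assumed inputs the small site can restore within its objective and stays a candidate for Options 3 and 4, while the large site cannot and falls back to Options 1 and 2.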
Even with common high-level goals, the final design can vary. For example, consider two firms with requirements for local recovery and off-site data vaulting. One, with multiple storage devices, may choose a local disk-based backup solution that incorporates deduplication and replication, while the other (with a common file server) might choose NAS snapshots and replication assisted by a WAFS appliance.
Ultimately, whether data sits inside or outside the data center is irrelevant, and extending appropriate data protection capabilities is now both required and realizable.
This article originally appeared in Storage magazine.
James Damoulakis is CTO of GlassHouse Technologies, an independent storage services firm with offices across the United States and in the UK.
This was first published in July 2009