Data backup overview

Stephen J. Bigelow, Features Writer

Storage administrators usually don't have trouble getting data onto disk or tape. The real challenge is keeping that valuable corporate data safe in the face of daily operations. Whether you're dealing with a hard disk failure, chasing a misplaced tape or recovering from nature's wrath, data loss is simply a fact of life. But data loss isn't simply an inconvenience -- it can result in costly business interruptions, and the increasing weight of government regulations and consumer expectations can bring severe penalties for lost data. IT professionals must take decisive steps to back up and protect corporate data. This chapter covers the areas of disk, tape and remote data backup, and explains the essential ideas used in successful data backup strategies.

Tape backup

Tape is the quintessential data backup medium. Most tape technology is well established and inexpensive, but it is also too slow to serve as a primary storage platform. The appeal of "cheap, plentiful and slow" storage has made tape a traditional complement to disk storage systems.

Tape is a removable medium, so a cartridge can easily be moved between compatible drives. Cartridges are designed for specific tape drive architectures, however, and are not interchangeable across formats. The "tape" is simply a length of flexible plastic ribbon coated with magnetic media and wound around a set of spindles. The spindles are mounted inside a plastic cartridge enclosure that protects the tape media from damage. Tape cartridges have a relatively short working life because the tape media physically contacts the tape drive's read/write heads. Tapes should be replaced after about 2,000 passes.

A tape drive is the electromechanical device that reads and writes the tape cartridge and exchanges that data with the rest of the computer. Drives typically use either helical scan or linear tape head technology to access the tape. Helical scan drives use a rotating head positioned at an angle, reading and writing data as diagonal stripes across the tape's width. Linear drives use a stationary head and move the tape past it, so tracks run along the tape's length. Numerous tape formats in service today build on these two approaches, including advanced intelligent tape (AIT), digital data storage (DDS), digital linear tape (DLT), Linear Tape-Open (LTO) and Travan. The choice of tape drive should take into account capacity needs, performance, media cost and technological longevity.

Even the latest tape technology cannot offer enough capacity to back up an entire enterprise to a single tape. Rather than manually spanning data backup jobs across multiple tape cartridges, tape drives are often organized into groups, dubbed tape libraries. Backup software then uses the various drives and cartridges in the tape library to achieve a complete backup. In many cases, a robotic arm or autoloader is added to exchange tapes with each drive, allowing a tape library to manage a huge number of tapes.
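
As a rough illustration of how a backup job might be spanned across cartridges, the sketch below fills each cartridge in turn and opens a new one when the next item no longer fits. The capacity figure, the file names and the span_across_cartridges helper are all hypothetical; real backup software can also split a single file across tapes and records the result in a catalog.

```python
def span_across_cartridges(files, cartridge_capacity_gb):
    """Assign files to cartridges in order, opening a new cartridge when full.

    files: list of (name, size_gb) tuples. Hypothetical helper for illustration;
    a file larger than one cartridge is rejected rather than split.
    """
    cartridges = [[]]
    free_gb = cartridge_capacity_gb
    for name, size_gb in files:
        if size_gb > cartridge_capacity_gb:
            raise ValueError(f"{name} does not fit on a single cartridge")
        if size_gb > free_gb:
            # Current cartridge is too full: start a fresh one.
            cartridges.append([])
            free_gb = cartridge_capacity_gb
        cartridges[-1].append(name)
        free_gb -= size_gb
    return cartridges


# Made-up job: four data sets and an assumed 400 GB cartridge.
job = [("mail.db", 250), ("sales.db", 180), ("home_dirs", 120), ("logs", 90)]
layout = span_across_cartridges(job, cartridge_capacity_gb=400)
for number, contents in enumerate(layout, start=1):
    print(f"Cartridge {number}: {', '.join(contents)}")
```

With these made-up sizes the job lands on two cartridges. An autoloader would then be responsible for presenting each cartridge to a drive in turn.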

Backup software is a critical management tool that interfaces backup hardware (the tape drives and libraries) with corporate data servers, allowing administrators to decide when and where to back up selected files, folders, drives, servers or even entire data centers. Backup software also supports automation so backups can be performed and verified during off hours without direct human intervention. For example, EMC NetWorker and Symantec Veritas Backup Exec 10d are two well-known backup tools.

Disk backup

While hard disks are certainly the primary storage medium for all types of computer systems, disks are increasingly being used for data backup tasks. This is partly due to the falling costs of high-volume storage devices, such as SATA and SAS drives, but also because backup needs are changing. Many organizations work in a global 24/7 marketplace and cannot afford to go offline for nightly tape backups. When trouble does strike, a busy organization must restore its operations in a matter of hours -- not days. Disks offer the cost-effective speed and storage capacity to make disk-based backup effective [see Chapter one for more information about disk storage].

The simplest type of disk-based backup is disk-to-disk, basically copying the contents of one disk to another. If the first disk fails, data can be retrieved from the other. This is sometimes called mirroring and is an essential tenet of RAID. In some cases, both disk and tape technologies are combined in a disk-to-disk-to-tape platform, dubbed D2D2T. Primary disk storage is first backed up to secondary disks -- lost data can be quickly restored from the backup disk. Tape is then added on as a form of long-term archival storage. A benefit of D2D2T is that tapes can be written from the secondary disk storage so the main storage system is not taken offline in the tape writing process. The resulting tapes can then be sent off site to protect the primary and secondary disk storage systems against disaster.
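
The D2D2T flow can be pictured with a toy model. The sketch below is only an illustration under simple assumptions: each storage tier is an in-memory dictionary, and the D2D2T class and its method names are invented for this example rather than taken from any product.

```python
class D2D2T:
    """Toy model of disk-to-disk-to-tape staging (illustrative only)."""

    def __init__(self):
        self.primary = {}    # production disk: file name -> contents
        self.secondary = {}  # backup disk
        self.tapes = []      # archived tape images, oldest first

    def backup_to_disk(self):
        # Stage 1: copy the primary disk contents to the secondary disk.
        self.secondary = dict(self.primary)

    def archive_to_tape(self):
        # Stage 2: write the tape from the secondary copy, so the primary
        # system is not involved in the slower tape write.
        self.tapes.append(dict(self.secondary))

    def restore(self, name):
        # Prefer the fast disk copy; fall back to the newest tape holding the file.
        if name in self.secondary:
            return self.secondary[name], "secondary disk"
        for image in reversed(self.tapes):
            if name in image:
                return image[name], "tape"
        raise KeyError(name)


system = D2D2T()
system.primary["orders.csv"] = b"v1"
system.backup_to_disk()
system.archive_to_tape()

del system.primary["orders.csv"]      # simulate data loss on the primary disk
print(system.restore("orders.csv"))   # served from the secondary disk

system.secondary.clear()              # simulate losing the backup disk as well
print(system.restore("orders.csv"))   # served from tape
```

The point of the model is the ordering: restores come from the secondary disk when they can, and tape is the fallback when both disk tiers are gone.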

Some companies with established investments in tape libraries may have trouble justifying the shift to disk-based backup systems. One way to ease the transition anxiety from tape to disk is through a virtual tape library (VTL). A VTL is simply a disk storage system designed to mimic the behaviors of a tape library. By emulating a tape system, a VTL can utilize disk speed to accelerate backups and restorations while leveraging an organization's existing backup software, policies, infrastructure and in-house technical expertise. Select a VTL that will most closely match your current tape library system, capacity needs and backup software. For example, Advanced Digital Information Corp.'s Pathlight VX can offer up to 57.6 terabytes of capacity while emulating LTO-1 and LTO-2 drives.

Remote backup

One problem for today's enterprise is the proliferation of remote offices. Business data can be just as important on servers in the Boise, Idaho, sales office as in the Seattle headquarters. Unfortunately, remote offices typically do not have IT personnel on staff, relying instead on non-IT workers to rotate backup tapes and ship them to a data center. Several trends are appearing to address this problem. A growing number of organizations are eliminating tapes in favor of WAN-based backups that transfer crucial information to the data center across broadband WAN connections. LiveVault Corp.'s InControl is one product intended for remote WAN backups. Rather than creating physical tape backups and rotating them to an off-site storage facility, some organizations also use WAN links to transfer data directly to an off-site archive service, such as Iron Mountain Inc. [see the SearchStorage.com article on remote office backup].

Bandwidth is the main issue with any WAN-based backup scheme. Fast bandwidth is expensive, so the focus with WAN backups is to use techniques like data deduplication (a.k.a. single-instance storage or commonality factoring) and conventional compression to optimize the use of available bandwidth. Another popular technique is to avoid complete backups over the WAN and transfer only the most important business files between locations.
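
To show how deduplication and compression work together to save bandwidth, here is a minimal sketch. It assumes fixed-size 4 KB chunks identified by SHA-256 digests and compressed with zlib; the chunk size and the plan_transfer and apply_transfer helpers are assumptions made for illustration, and commercial products often use variable-size chunking and their own protocols.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size for this sketch


def plan_transfer(data, known_hashes):
    """Work out what must cross the WAN to back up `data`.

    known_hashes: digests of chunks the data center already stores.
    Returns (recipe, payload): the ordered digests needed to rebuild the data,
    and a dict of digest -> compressed bytes for chunks not yet stored remotely.
    """
    recipe, payload = [], {}
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if digest not in known_hashes and digest not in payload:
            payload[digest] = zlib.compress(chunk)  # send each new chunk once
    return recipe, payload


def apply_transfer(recipe, payload, store):
    """Add the new chunks to the remote store and rebuild the original bytes."""
    for digest, compressed in payload.items():
        store[digest] = zlib.decompress(compressed)
    return b"".join(store[digest] for digest in recipe)


remote_store = {}  # stands in for the data center's chunk store

monday = b"A" * 8192 + b"B" * 4096
recipe, payload = plan_transfer(monday, remote_store.keys())
assert apply_transfer(recipe, payload, remote_store) == monday
print(len(recipe), "chunks referenced,", len(payload), "sent")

tuesday = monday + b"C" * 4096  # one new chunk appended
recipe, payload = plan_transfer(tuesday, remote_store.keys())
assert apply_transfer(recipe, payload, remote_store) == tuesday
print(len(recipe), "chunks referenced,", len(payload), "sent")
```

In this example the second backup references four chunks but sends only the one the data center has not seen before, and that chunk is compressed before it is sent.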

Some organizations are eliminating the difficulties of remote IT by consolidating remote IT into a single data center. Remote access then uses WAN links with application-accelerating technologies, such as wide area file services (WAFS), to serve applications and files to remote offices just as if the data were local. WAFS generally involves appliances installed at both ends of the WAN link, which cache needed files at each remote office for quick access. Any changes to a file can then be saved back to the data center as time and bandwidth allow [see the SearchStorage.com article on WAFS].

Other backup concepts

Backups generally fall into three categories: full, incremental and differential. A full backup is a complete copy of all files. A full backup on a server with 528 GB of data will transfer all of that data to the backup target (e.g., disk or tape). Full backups take the longest to make, but they are the easiest and fastest to restore.

An incremental backup captures only the changes made since the last backup of any kind. If you perform a full 200 GB backup on a server Monday, and 2 GB of new data are added on Tuesday, an incremental backup will only capture the new 2 GB. If another 1 GB changes on Wednesday, only the new 1 GB is captured. Once a full backup is performed, incremental backups can be very quick. To recover, however, you must first restore the last full backup and then apply every incremental backup made since, in order.

By comparison, a differential backup captures the total changes made since the last full backup. For example, if 3 GB changes on Monday, 2 GB on Tuesday and 7 GB on Wednesday, the daily differential backups will capture 3 GB, 5 GB and 12 GB, respectively. Differential backups can take longer to make than incremental backups, but are easier to restore. With a differential backup, only the full backup and the last differential backup must be restored.
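
The arithmetic behind the incremental and differential examples can be checked with a short sketch. It assumes, as the example above does, that each day's changes touch different data, so differential sizes are simple running totals; the backup_sizes and restore_chain helpers are hypothetical names used only for illustration.

```python
def backup_sizes(daily_changes_gb, kind):
    """Size of each day's backup following an initial full backup.

    daily_changes_gb: new or changed data per day, in GB.
    kind: "incremental" (changes since the previous backup) or
          "differential" (all changes since the last full backup).
    """
    if kind not in ("incremental", "differential"):
        raise ValueError(f"unknown backup kind: {kind}")
    sizes, running_total = [], 0
    for change in daily_changes_gb:
        running_total += change
        sizes.append(change if kind == "incremental" else running_total)
    return sizes


def restore_chain(days_since_full, kind):
    """Backups that must be restored, in order, to recover the latest state."""
    if days_since_full == 0:
        return ["full"]
    if kind == "incremental":
        return ["full"] + [f"incremental day {d}" for d in range(1, days_since_full + 1)]
    return ["full", f"differential day {days_since_full}"]


changes = [3, 2, 7]  # GB changed on Monday, Tuesday and Wednesday
print(backup_sizes(changes, "incremental"))   # [3, 2, 7]
print(backup_sizes(changes, "differential"))  # [3, 5, 12]
print(restore_chain(3, "incremental"))        # full plus three incrementals
print(restore_chain(3, "differential"))       # full plus the last differential
```

The trade-off shows up directly in the output: incremental backups stay small each day but lengthen the restore chain, while differential backups grow each day but always restore in two steps.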

Mirroring and replication are essentially the same thing -- both create copies of data -- but there are subtle differences in context. Replication basically creates an offline copy of the data that isn't necessarily intended for use, while mirroring creates a data copy that can be used directly. For example, data is frequently replicated to CD or DVD for long-term archival storage, but data may be mirrored to disk for RAID.

Snapshot and continuous data protection (CDP) technologies are also appearing in disk-based backup systems. Snapshots capture the state of a storage system at a given point in time, saving detailed reference information about available data and its location, similar to a detailed table of contents. When trouble strikes, data can be restored based on the latest snapshot. Snapshots can be taken as frequently as a storage administrator deems necessary. CDP provides even more granular detail, recording each storage transaction to a journal in real time. If data loss occurs, the storage system can be "wound back" to the last good transaction, which could be minutes, even seconds, ago [see the SearchStorage.com article on CDP].
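
The idea of winding a system back through a journal can be sketched in a few lines. The model below is a simplification under stated assumptions: storage is an in-memory map of block numbers to contents, every write records the block's previous contents, and the JournaledVolume class is a hypothetical name rather than any vendor's implementation.

```python
class JournaledVolume:
    """Toy block store that journals every write so state can be wound back."""

    def __init__(self):
        self.blocks = {}   # block number -> current contents
        self.journal = []  # (sequence, block number, previous contents or None)

    def write(self, block, data):
        # Record the old contents before overwriting, then apply the write.
        self.journal.append((len(self.journal) + 1, block, self.blocks.get(block)))
        self.blocks[block] = data
        return len(self.journal)  # sequence number of this transaction

    def snapshot(self):
        # Point-in-time view of the block map. This toy copies the map;
        # real snapshots typically keep references to data instead.
        return dict(self.blocks)

    def rewind_to(self, sequence):
        # Undo journal entries newest-first until `sequence` is the last write applied.
        while self.journal and self.journal[-1][0] > sequence:
            _, block, previous = self.journal.pop()
            if previous is None:
                self.blocks.pop(block, None)  # block did not exist before this write
            else:
                self.blocks[block] = previous


volume = JournaledVolume()
volume.write(0, b"ledger v1")
last_good = volume.write(1, b"index v1")  # remember the last good transaction
hourly = volume.snapshot()                # coarser, point-in-time protection

volume.write(0, b"corrupted")             # trouble strikes
volume.write(2, b"junk")

volume.rewind_to(last_good)               # wind back to the last good transaction
print(volume.blocks == hourly)            # True: state matches the earlier snapshot
```

A snapshot can only return the system to the moment it was taken, whereas the journal allows recovery to any individual transaction in between.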

Security is increasingly important for all data backup operations. Company data often includes confidential or personally identifiable information that needs to be protected. When a tape is lost or a network is hacked, sensitive information may fall into the wrong hands. Backup systems are starting to use encryption when saving files to tape or archival storage. Encrypted data cannot be read without the corresponding keys, so stolen backups cannot be misused as long as the keys themselves remain secure.