
Using data deduplication with backup applications: Source vs. target dedupe

23 Dec 2009 | SearchStorage.co.UK

By W. Curtis Preston

Your data backup software company wants some of Data Domain's revenue -- seriously. Backup software companies didn't see the intelligent disk target (IDT) market coming. The next thing they knew, companies like Data Domain were making millions of dollars a year selling such devices. Then the independent software vendors (ISVs) that make backup software started having the same thought: "If we offered dedupe for regular backups, customers would pay the data deduplication premium to us instead of to those appliance companies." And a line in the sand was drawn.

Source deduplication is your friend

The IDT vs. backup software battle is just beginning, and this article will include a description of the products that have entered the battle; however, first we should discuss the battle that's completely over: backup of small amounts of data coming from remote sites. In this fight for your storage dollars, source deduplication has won hands down. Whether you're backing up a single home computer with your personal data, hundreds of remote users with laptops, or many remote offices with less than a terabyte of data each, source dedupe is your friend.

Without source dedupe, backups of smaller data sets and remote data sets can be quite challenging. Home users have historically used free products that are included with their OS or USB drive. Remote offices typically use something like Symantec Corp.'s Backup Exec and a DAT drive. Only the most conscientious laptop users have any kind of backup plan at all other than occasionally copying their data to a server that gets backed up. All of these methods are fraught with problems, and they suffer most from human error.

Installing a source dedupe product on these systems allows them to back up to a source dedupe server over a WAN connection -- completely automating this most important business function. They can back up to a source dedupe server managed by the IT department, or to a cloud backup service managed by an outside company.

The reason that source dedupe allows you to back up large amounts of data over such a small connection is that a source dedupe product communicates with the source dedupe backup server to identify and transmit only the blocks that are new. The product starts by asking the file system for the files that have changed since the last backup, then it examines each of those files for blocks that have changed. This method of backup is obviously very well suited for remote data or mobile data.
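
To make that exchange concrete, here is a minimal Python sketch of a client and server trading block fingerprints. It is an illustration only, not any vendor's implementation: the fixed 4 KB block size, the use of SHA-256 and the names `DedupeServer`, `missing` and `backup` are all assumptions made for the example, and real products differ in how they cut and identify blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file's contents into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupeServer:
    """Hypothetical backup server that stores each unique block once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}       # fingerprint -> block data
        self.recipes: dict[str, list[str]] = {}  # file name -> ordered fingerprints

    def missing(self, fingerprints: list[str]) -> set[str]:
        """Report which fingerprints the server has never seen."""
        return {fp for fp in fingerprints if fp not in self.blocks}

    def store(self, name: str, fingerprints: list[str],
              new_blocks: dict[str, bytes]) -> None:
        """Keep the new blocks and the recipe needed to rebuild the file."""
        self.blocks.update(new_blocks)
        self.recipes[name] = fingerprints

    def restore(self, name: str) -> bytes:
        """Rebuild a file from its recipe."""
        return b"".join(self.blocks[fp] for fp in self.recipes[name])


def backup(server: DedupeServer, name: str, data: bytes) -> int:
    """Back up one changed file; return the bytes of block data sent."""
    blocks = split_blocks(data)
    fingerprints = [hashlib.sha256(b).hexdigest() for b in blocks]
    needed = server.missing(fingerprints)  # small message over the WAN
    new_blocks = {fp: b for fp, b in zip(fingerprints, blocks) if fp in needed}
    server.store(name, fingerprints, new_blocks)
    return sum(len(b) for b in new_blocks.values())


server = DedupeServer()
original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
changed = b"A" * 4096 + b"X" * 4096 + b"C" * 4096

first_sent = backup(server, "report.doc", original)   # all three blocks are new
second_sent = backup(server, "report.doc", changed)   # only the middle block is new
print(first_sent, second_sent)
```

In this toy run the first backup sends all 12,288 bytes, while the second sends only the 4,096-byte block that changed. The fingerprints themselves still cross the wire, but they are tiny compared with the block data.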

Cloud backup and source dedupe

One interesting way that some companies can begin using source dedupe is to use a cloud backup provider that will manage the backups for them. All they have to do is install the cloud backup provider's software on their servers and start backing up to the cloud service. There's no backup server to install or manage. The only challenge some companies may have is getting the first backup done, since the first backup obviously has to send all the blocks. Some cloud providers offer a "seeding" option where they ship you a disk drive that you back up to locally and then ship back to them. They copy this backup to their servers, thus "seeding" your initial full. Once that has been done, your servers only have to back up the blocks that have changed since you backed up to the seeding system.
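
A quick back-of-the-envelope calculation shows why the first backup is the hard part. The figures below (a 500 GB server, a 10 Mbit/s WAN link and a 1% daily change rate) are made-up numbers chosen only for illustration, and the helper `transfer_hours` is hypothetical; it also ignores protocol overhead and compression.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to push size_gb (decimal gigabytes) over a link_mbps link."""
    bits = size_gb * 8e9                 # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6)   # megabits per second -> bits per second
    return seconds / 3600


# Illustrative assumptions, not measurements.
full_gb = 500.0
daily_change_gb = full_gb * 0.01
wan_mbps = 10.0

initial_full = transfer_hours(full_gb, wan_mbps)
nightly = transfer_hours(daily_change_gb, wan_mbps)
print(f"initial full: {initial_full:.1f} hours, nightly changes: {nightly:.1f} hours")
```

Under those assumptions the initial full would tie up the link for more than four days, while each nightly backup of changed blocks would take a little over an hour. That gap is what a shipped seeding drive is meant to close.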

Target deduplication

Where source dedupe is perfect for smaller, remote data sets, target dedupe is meant for larger data sets where you have essentially unlimited bandwidth between the backup client and the backup server. This is the market that the appliance vendors have focused on, and some have done quite well selling you appliances that will ingest native, un-deduped backups and dedupe them for you. That's what made backup software companies sit up and take notice.

The first company to make a move was Symantec. They took NetBackup PureDisk (a source dedupe product) and moved it inside the media server, allowing it to receive and dedupe regular NetBackup backups. The media server dedupes the data inline as it receives the data, and the deduped data is sent over IP to a PureDisk server.

IBM Corp.'s Tivoli Storage Manager (TSM) followed with TSM server dedupe in TSM 6.1. TSM's implementation is a post-process implementation that looks at backups that have been sent to a disk-type device and dedupes them after the fact. CA announced similar capability for its ARCserve Backup product.
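
The difference between inline and post-process target dedupe can be sketched in a few lines of Python. This is a toy model, not how NetBackup or TSM actually work: the class names `InlineTarget` and `PostProcessTarget`, the 1 KB block size and the SHA-256 fingerprints are all assumptions for the example, and the metadata needed to restore a backup is left out for brevity.

```python
import hashlib

BLOCK = 1024  # assumed block size for the demo


def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class InlineTarget:
    """Dedupes each block as it arrives, so duplicates never reach disk."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}
        self.peak_bytes = 0  # measured after each received backup

    def receive(self, blocks: list[bytes]) -> None:
        for block in blocks:
            self.store.setdefault(fingerprint(block), block)
        self.peak_bytes = max(self.peak_bytes, self.used_bytes())

    def used_bytes(self) -> int:
        return sum(len(b) for b in self.store.values())


class PostProcessTarget:
    """Lands raw backups on disk first and dedupes them in a later pass."""

    def __init__(self) -> None:
        self.landing: list[list[bytes]] = []  # raw backups awaiting dedupe
        self.store: dict[str, bytes] = {}
        self.peak_bytes = 0  # measured after each received backup

    def receive(self, blocks: list[bytes]) -> None:
        self.landing.append(list(blocks))
        self.peak_bytes = max(self.peak_bytes, self.used_bytes())

    def dedupe_pass(self) -> None:
        """Run after the fact: fold landed backups into the deduped store."""
        for landed in self.landing:
            for block in landed:
                self.store.setdefault(fingerprint(block), block)
        self.landing.clear()

    def used_bytes(self) -> int:
        raw = sum(len(b) for landed in self.landing for b in landed)
        return raw + sum(len(b) for b in self.store.values())


a, b, c, d = (bytes([ch]) * BLOCK for ch in b"ABCD")
night1, night2 = [a, b, c], [a, b, d]  # two fulls that share two blocks

inline, post = InlineTarget(), PostProcessTarget()
for night in (night1, night2):
    inline.receive(night)
    post.receive(night)
post.dedupe_pass()

print(inline.used_bytes(), inline.peak_bytes)  # 4096 4096
print(post.used_bytes(), post.peak_bytes)      # 4096 6144
```

Both targets end up storing the same four unique blocks. The post-process target simply needs enough disk to hold the un-deduped backups until its pass runs, while the inline target does its fingerprinting work in the path of the incoming backup.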

CommVault Systems Inc. is the latest vendor to enter the fray with its media agent dedupe option. Backups that are sent to a Simpana media agent are deduped inline before they are stored on disk. If you wanted to back up a remote site using this feature, CommVault says a media agent in a remote site could write deduped data to a CIFS share that was mounted from the central site.

Both CommVault and Symantec are making claims that you should use their dedupe software instead of buying a dedupe appliance, although CommVault's claims tend to be a little bolder. (Dave West, vice president of marketing and business development at CommVault, wrote in his blog that he sees no use case where any CommVault customer would need to buy a dedupe appliance.)

Can source dedupe or target dedupe from your backup application meet your needs? That will depend on your environment. We definitely see cases where a company's throughput requirements for target dedupe can only be met by an appliance. But there are also plenty of cases where either will work and it's simply a matter of negotiating over price. Just be sure to perform a proof of concept of any vendor claims before signing that check.

About this author: W. Curtis Preston (a.k.a. "Mr. Backup"), Executive Editor and Independent Backup Expert, has been singularly focused on data backup and recovery for more than 15 years. From starting as a backup admin at a $35 billion credit card company to being one of the most sought-after consultants, writers and speakers in this space, it's hard to find someone more focused on recovering lost data. He is the webmaster of BackupCentral.com, the author of hundreds of articles, and the books "Backup and Recovery" and "Using SANs and NAS."


All Rights Reserved, Copyright 2008 - 2010, TechTarget | Read our Privacy Policy