Data deduplication tutorial

31 Mar 2009 | SearchDataBackup.com

by Russ Fellows

If you've decided that your data backup system can benefit from data deduplication, you definitely have plenty of choices. But first you need to figure out where and how to implement dedupe. There are several data backup products that incorporate data deduplication. Some are virtual tape library (VTL) products, others are network-attached storage (NAS) that may be used as a backup target, and still others are backup applications.

In this tutorial, we look at post-processing versus inline deduplication, disk-based backup and dedupe, and compare the popular deduplication products.

Table of contents

>> A look at post-processing deduplication
>> Inline deduplication
>> Disk-based backup systems and deduplication
>> A comparison of deduplication product offerings
>> The future of data dedupe

A look at post-processing deduplication

For backup and storage administrators looking to minimize the time it takes to back up their data, the best option is often a post-process method. Backup data is first sent to a temporary holding area so that the backup itself runs as fast as possible, which shortens the backup window. Once the backup completes, the data is reexamined and duplicate data is removed. The disadvantage of this method is that additional storage space is consumed while the undeduplicated data sits in the holding area. (Some post-process systems start deduping before the whole backup is complete, so they may not require as much storage on the target.)
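The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the function name `post_process_backup`, the tiny 4-byte chunks and the use of SHA-256 fingerprints are all assumptions made for the example.

```python
import hashlib

def post_process_backup(stream, chunk_size=4):
    """Hypothetical post-process flow: land everything first, dedupe afterwards."""
    # Phase 1: the backup itself. Every chunk is written to the holding area
    # untouched, so the backup window only pays for raw writes.
    staging = [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]
    peak_bytes = sum(len(chunk) for chunk in staging)

    # Phase 2: after the backup completes, re-read the staged data and keep
    # one copy of each distinct chunk, keyed by its fingerprint.
    store = {}
    recipe = []  # ordered fingerprints needed to rebuild the stream
    for chunk in staging:
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)
        recipe.append(fp)
    final_bytes = sum(len(chunk) for chunk in store.values())
    return store, recipe, peak_bytes, final_bytes

data = b"AAAABBBBAAAACCCCAAAABBBB"
store, recipe, peak, final = post_process_backup(data)
print(peak, final)  # 24 12
```

The gap between the two numbers is the extra space the holding area consumes until the second pass has finished.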

Inline deduplication

An alternative to deduplicating data after a backup is to perform deduplication inline, as data is being sent to the backup device. The advantage of this method is that no extra space is required. Another advantage is that once the data is deduplicated and stored, the process is done, and backup data may then be replicated to offsite storage. With post-processing deduplication, data must be written to storage, deduplicated at a later time, and only then replicated to offsite storage. As a result, the time to complete the entire backup process -- including replication to offsite systems -- can be longer than with systems that deduplicate inline.
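As a rough sketch of the inline approach, the Python below fingerprints each chunk as it arrives and writes only chunks it has not seen before. The function name `inline_backup`, the pre-split 4-byte chunks and the SHA-256 fingerprints are assumptions for illustration, not a description of any particular product.

```python
import hashlib

def inline_backup(chunks):
    """Hypothetical inline flow: fingerprint each chunk before it is written."""
    store = {}       # fingerprint -> chunk, the only copy kept on disk
    recipe = []      # ordered fingerprints needed to rebuild the stream
    peak_bytes = 0   # most disk space ever in use during the backup
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:
            # Only previously unseen chunks reach the disk.
            store[fp] = chunk
            peak_bytes += len(chunk)
        recipe.append(fp)
    return store, recipe, peak_bytes

incoming = [b"AAAA", b"BBBB", b"AAAA", b"CCCC", b"AAAA", b"BBBB"]
store, recipe, peak = inline_backup(incoming)
print(peak)  # 12
```

Because duplicates never reach the disk, peak usage equals final usage and the stored data is ready to replicate as soon as the last chunk arrives. The price is that the fingerprint lookup sits in the write path during the backup.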

Disk-based backup systems and deduplication

Data deduplication can dramatically decrease the amount of disk space required for backup data, while retaining the significant performance improvements that disk-based backup devices have over tape. As a result, disk-based backup targets, whether they are NAS devices or VTLs, can meet demanding service-level objectives while remaining cost competitive with tape-based systems.
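To put rough numbers on the space savings, the arithmetic can be sketched as follows. The helper names and the 100 TB and 10:1 figures are hypothetical inputs chosen for illustration, not measurements of any product; actual ratios depend on the data being backed up.

```python
def physical_tb(logical_tb, dedupe_ratio):
    """Disk needed to hold `logical_tb` of backup data at `dedupe_ratio`:1."""
    if dedupe_ratio < 1:
        raise ValueError("dedupe_ratio is expressed as N:1 with N >= 1")
    return logical_tb / dedupe_ratio

def percent_saved(dedupe_ratio):
    """Share of raw capacity avoided at `dedupe_ratio`:1."""
    return (1 - 1 / dedupe_ratio) * 100

# Hypothetical example: 100 TB of retained backups at an assumed 10:1 ratio.
print(f"{physical_tb(100, 10):.1f} TB on disk, {percent_saved(10):.1f}% saved")
# prints "10.0 TB on disk, 90.0% saved"
```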

A comparison of deduplication product offerings

There are several vendors that deliver products that incorporate data deduplication. Provided below is a comparison of vendors, products and features.

| Product | Simpana 8 | DDX | Avamar | DL 4000 | SIR | VLS | Diligent |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vendor | Comm-Vault | Data Domain | EMC Corp. | EMC Corp. | FalconStor Software | HP Co. | IBM Corp. |
| Deployment type | Backup software | VTL w/ storage | Backup software | VTL appl. w/ storage | VTL w/ or w/o storage | VTL appl. w/ storage | VTL appl. w/ or w/o storage |
| Dedupe cost | Add-on | Included | Included | Add-on | Add-on | Add-on | Included |
| When dedupe | Inline | Inline | Inline | Inline and post process | Post process | Post process | Inline |
| Dedupe location | Distributed | Target | Source | Target | Target | Target | Target |
| Chunk size | Variable | Variable | Variable | Variable | Variable | Variable | Variable |
| Access method | - | - | Hardware dependent | - | - | - | - |
| NAS (NFS/CIFS) | Yes | Yes | - | No | No | No | No |
| FC primary storage | No | No | - | No | No | No | No |
| FC tape storage (VTL) | No | Yes | - | Yes | Yes | Yes | Yes |
| iSCSI primary storage | No | No | - | No | No | No | No |
| iSCSI tape storage (VTL) | Yes | Yes | - | Yes | Yes | Yes | Yes |

| Product | HydraStor DataRedux | FAS DeDupe | Enterprise Archive | DXi | S2100 w/ DeltaStor | VTL Prime | PureDisk |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Vendor | NEC Corp. | NetApp | Permabit | Quantum Corp. | Sepaton Inc. | Sun Inc. | Symantec Corp. |
| Deployment type | Secondary storage | Primary storage | Secondary storage | VTL appl. w/ storage | VTL w/ or w/o storage | VTL appl. w/ storage | Backup software |
| Dedupe cost | Add-on | Included (no-cost license) | Included | Add-on | Add-on | Add-on | Included |
| When dedupe | Inline | Post process | Inline | Both (inline and post process) | Post process | Post process | Inline |
| Dedupe location | Target | Target | Target | Target | Target | Target | Source |
| Chunk size | Variable | 4 KB block | Variable | Variable | Variable | Variable | Variable |
| Access method | - | - | - | - | - | - | Hardware dependent |
| NAS (NFS/CIFS) | Yes | Yes | Yes | Yes | No | No | - |
| FC primary storage | No | Yes | No | No | No | No | - |
| FC tape storage (VTL) | No | No | No | Yes | Yes | Yes | - |
| iSCSI primary storage | No | Yes | No | No | No | No | - |
| iSCSI tape storage (VTL) | No | No | No | Yes | Yes | Yes | - |
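
The "Chunk size" row distinguishes variable-size chunks from fixed blocks. The Python sketch below illustrates the difference on a backup stream that has had a single byte inserted at the front. It is a toy under stated assumptions: the function names, the 64-byte block size and the windowed-sum boundary rule are invented for the example, and real products use their own chunking methods.

```python
import random

def fixed_chunks(data, size=64):
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=16, divisor=64):
    """Cut wherever the sum of the last `window` bytes is divisible by `divisor`.

    A toy boundary rule: cut points depend only on nearby content, not on the
    byte offset, which is the defining property of variable-size chunking.
    """
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        if sum(data[end - window:end]) % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last cut
    return chunks

rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(4096))
shifted = b"X" + original  # the same data with one byte inserted at the front

for name, chunker in (("fixed", fixed_chunks), ("variable", variable_chunks)):
    before, after = chunker(original), chunker(shifted)
    reused = len(set(before) & set(after))
    print(f"{name}: {reused} of {len(set(before))} chunks already stored")
```

With the toy rule, every chunk after the first cut point in the original stream appears unchanged in the shifted stream, because the cut points move along with the content. With fixed blocks, the insertion moves every block boundary by one byte, so the blocks generally no longer match what is already stored.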

The future of data dedupe

It is likely that, over time, data deduplication will become a service offered as a feature across multiple product types and deployment scenarios. Until then, you must carefully evaluate your cost, performance and data retention goals before choosing a data deduplication product that will deliver the optimal benefits in your particular environment, or test the product carefully in your environment before you buy it.

About the author: Russ Fellows is a Senior Analyst with the Evaluator Group. He is responsible for leading research and analysis of product and market trends for NAS, virtual tape libraries and storage security.


