NetApp: Post-process deduplication limits performance hit in primary storage data deduplication

16 Dec 2009 | Carol Sliwa, Features Writer

NetApp Inc. offers data deduplication as a feature of its Data Ontap operating system with its FAS and V-series systems. The company cites post-process deduplication as a major reason it's able to limit the deduplication performance penalty to 10% to 20% for average workloads. Writes are stored first, without deduplication, to minimize interference with application throughput. Deduplication runs later, either on a scheduled basis (typically during off-peak hours) or automatically, based on the growth of the storage volume.
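
The two triggers described above can be sketched roughly as follows. This is an illustration only, not NetApp's scheduler: the function name, the off-peak window and the 20% growth threshold are all hypothetical, since the article gives no concrete values.

```python
from datetime import datetime

# Hypothetical policy values; the article gives no concrete schedule or threshold.
OFF_PEAK_HOURS = range(0, 5)   # midnight through 04:59
GROWTH_THRESHOLD = 0.20        # fraction of new data that triggers a run


def should_run_dedupe(mode, now, new_bytes, used_bytes):
    """Decide whether a background deduplication pass should start."""
    if mode == "schedule":
        # Scheduled mode: run only inside the off-peak window
        return now.hour in OFF_PEAK_HOURS
    if mode == "auto":
        # Automatic mode: run once enough new data has accumulated
        return used_bytes > 0 and new_bytes / used_bytes >= GROWTH_THRESHOLD
    raise ValueError(f"unknown mode: {mode!r}")


night = datetime(2009, 12, 16, 2, 0)
noon = datetime(2009, 12, 16, 12, 0)
run_at_night = should_run_dedupe("schedule", night, 0, 10**12)
run_at_noon = should_run_dedupe("schedule", noon, 0, 10**12)
run_on_growth = should_run_dedupe("auto", noon, 25 * 10**10, 10**12)
```

In both modes the write itself has already completed by the time the check runs, which is the point of the post-process approach.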

"It's always done in the background, and it's always done after the write occurs," said Larry Freeman, senior marketing manager for storage efficiency at NetApp. "If you run it more frequently, it's going to run faster because we're going to catch the duplicate blocks before there's too many of them."

Inline vs. post-process deduplication

NetApp's post-process deduplication approach contrasts with the inline, or real-time, method used by some of the popular backup dedupe products such as EMC Corp.'s Data Domain. (Some of the other backup systems use post-process deduplication.) Inline dedupe removes the duplicates as they appear and wastes little space. But Freeman claimed the performance impact on CPU resources is too high for primary storage.

"They're intercepting [the data] at the storage controller, and they have to make an immediate real-time decision: Do I store this or do I reference it?" Freeman said of inline dedupe products. "You have to compare that data object to every other object that's been stored previously. They do this with some sophisticated look-up tables and hash comparisons, but the more data is in the system, the more extensive the look-up has to be, and the slower the system becomes."
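
The store-or-reference decision Freeman describes can be pictured with a minimal sketch. It is not any vendor's implementation: the class name is hypothetical, SHA-256 is an assumed stand-in for the hash, and an in-memory dictionary stands in for the look-up tables he mentions.

```python
import hashlib


class InlineDedupeStore:
    """Toy inline deduplication: every write is checked before it is stored."""

    def __init__(self):
        self.index = {}   # fingerprint -> position in self.blocks
        self.blocks = []  # unique data blocks actually stored
        self.refs = []    # one entry per logical write, pointing at a block

    def write(self, data):
        # The look-up sits on the write path, so its cost is paid on every write
        fp = hashlib.sha256(data).digest()
        block_id = self.index.get(fp)
        if block_id is not None and self.blocks[block_id] == data:
            self.refs.append(block_id)
            return "referenced"
        # New data (or a hash collision): store the block as-is
        self.blocks.append(data)
        self.index[fp] = len(self.blocks) - 1
        self.refs.append(len(self.blocks) - 1)
        return "stored"


store = InlineDedupeStore()
results = [store.write(b"A" * 4096), store.write(b"B" * 4096), store.write(b"A" * 4096)]
```

Here the index grows with every unique block written, which is the part of the design Freeman argues becomes expensive as more data accumulates.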

Freeman said the vendor originally expected its dedupe to be used for backup and archiving, but customers found it especially valuable for reducing VMware virtual machine disk (VMDK) files. "We promoted that and it really just took off," he said. "There was no turning back. Deduplication became the focus of primary storage."

NetApp's post-process deduplication system uses a fingerprint catalog to identify candidates for data deduplication. Each 32-byte fingerprint, which is generated by an algorithm and also referred to as a digital signature or hash, references a larger 4 KB data block. When the system finds two fingerprints that match, it pulls the blocks into memory and does a byte-level validation to guard against false positives, or hash collisions.
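
A rough sketch of the catalog-then-validate idea follows. The article does not name the fingerprint algorithm, so SHA-256 is used here purely as an assumption because it happens to produce 32 bytes; the function names and the in-memory catalog are likewise hypothetical.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # 4 KB data blocks, as described in the article


def fingerprint(block):
    """Return a 32-byte fingerprint (SHA-256 is a stand-in, not NetApp's algorithm)."""
    return hashlib.sha256(block).digest()


def find_duplicates(blocks):
    """Return (duplicate_index, original_index) pairs confirmed byte for byte."""
    catalog = defaultdict(list)  # fingerprint -> indexes of unique blocks
    confirmed = []
    for idx, block in enumerate(blocks):
        fp = fingerprint(block)
        for candidate in catalog[fp]:
            # A fingerprint match is only a hint; compare the full blocks
            if blocks[candidate] == block:
                confirmed.append((idx, candidate))
                break
        else:
            # No validated match: this block becomes a new catalog entry
            catalog[fp].append(idx)
    return confirmed


blocks = [b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"A" * BLOCK_SIZE]
duplicates = find_duplicates(blocks)  # [(2, 0)]
```

Comparing the small fingerprints first keeps the expensive full-block comparison limited to the rare candidates that actually match.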

Multiple-block referencing technology then kicks in. Each of the data blocks has a pointer going to it. If two blocks validate as identical, the system moves one of the data pointers to point to the same block as the first pointer and releases the duplicate block back to the free pool on the storage system.
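
The pointer move and block release can be illustrated with a toy model. This is a sketch under stated assumptions, not Data Ontap's file system: the `Volume` class, its fields and the reference counts are hypothetical, and each file offset is written only once to keep the example short.

```python
class Volume:
    """Toy volume in which file block pointers map to physical blocks."""

    def __init__(self):
        self.physical = {}   # physical block id -> data
        self.refcount = {}   # physical block id -> number of pointers to it
        self.pointers = {}   # (file name, offset) -> physical block id
        self.free_pool = []  # physical block ids released for reuse
        self._next_id = 0

    def write(self, file, offset, data):
        # Reuse a released block if one is available, else allocate a new id
        if self.free_pool:
            block_id = self.free_pool.pop()
        else:
            block_id = self._next_id
            self._next_id += 1
        self.physical[block_id] = data
        self.refcount[block_id] = 1
        self.pointers[(file, offset)] = block_id
        return block_id

    def share(self, keep_id, dup_id):
        """Repoint references from dup_id to keep_id, then free dup_id."""
        if keep_id == dup_id or self.physical[keep_id] != self.physical[dup_id]:
            return False  # same block, or byte-level validation failed
        for key, block_id in self.pointers.items():
            if block_id == dup_id:
                self.pointers[key] = keep_id
                self.refcount[keep_id] += 1
        del self.physical[dup_id]
        del self.refcount[dup_id]
        self.free_pool.append(dup_id)
        return True


vol = Volume()
first = vol.write("vm1.vmdk", 0, b"x" * 4096)
second = vol.write("vm2.vmdk", 0, b"x" * 4096)
shared = vol.share(first, second)
```

After `share` succeeds, both files point at the same physical block and the duplicate's id sits in the free pool, ready for the next write.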

Freeman said NetApp's Data Ontap operating system is especially conducive to data deduplication because it includes a file system with data pointers to facilitate the multiple-block referencing. "All we needed to do to add deduplication was create a catalog of fingerprints to identify duplicate data," he said.

NetApp deduplicates any raw data on the system, whether storage-area network (SAN) or network-attached storage (NAS). The system supports deduplication on a per-volume basis, with a volume limit of 16 TB. Future plans include addressing customer requests for increased volume sizes as well as deduplication across volumes.

Space savings average 30% across all storage tiers, performance workloads and applications, according to Freeman. He said the company doesn't break down the storage savings by tier. But with its leading use case, VMware Inc.'s VMDK files, space savings are in the range of 70%, he said.

The American Association of Airport Executives claimed initial space savings of approximately 30% on 1 TB of CIFS-based shared drives and 22% on 600 GB of NFS-based data using deduplication with the NetApp FAS 3140 it rolled out in February.
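
As a quick worked example, the association's two figures translate into absolute capacity as shown below. The calculation assumes decimal units (1 TB = 1,000 GB), which the article does not specify, and the helper name is hypothetical.

```python
GB_PER_TB = 1000  # assumption: decimal units; the article does not say


def saved_gb(size_gb, savings_percent):
    """Capacity reclaimed, in GB, for a data set of the given size."""
    return size_gb * savings_percent / 100


cifs_saved = saved_gb(1 * GB_PER_TB, 30)  # 300.0 GB from the CIFS shares
nfs_saved = saved_gb(600, 22)             # 132.0 GB from the NFS data
total_saved = cifs_saved + nfs_saved      # 432.0 GB overall
combined_percent = 100 * total_saved / (1 * GB_PER_TB + 600)  # 27.0
```

Under that assumption, the two data sets together gave back roughly 432 GB of 1.6 TB, or about 27%.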

"If I don't have to keep growing that volume out but I can put more on it because of dedupe, I can not only store more locally but I can replicate more and have a better disaster recovery plan. And it doesn't take up any more bandwidth," said Patrick Osborne, senior vice president of IT at the Alexandria, Va.-based association.

But Osborne wasn't comfortable performing deduplication on all of his data. The association elected not to deduplicate its training videos and highly sensitive biometric files out of fear of corrupting the data, he said.

"I brought it to my users and said, 'Hey, we can do this [on the NetApp FAS 3140]. We might save space, but we don't know how it's going to work.' They said no," Osborne said. "Since I was saving space in those other areas where I was really looking to save space, I was OK."

All Rights Reserved, Copyright 2008 - 2010, TechTarget