The Kadena technology uses progressive deduplication, which is different from the fixed-block and variable-block deduplication approaches most vendors use to identify duplicate data. Network Backup 9 is application-aware and sizes its window differently depending on the application whose data it is deduping. It takes a two-step approach, first looking for probable matches using a lightweight algorithm and then confirming the matches with a hash. Arkeia CEO Bill Evans calls this a sliding window approach, and claims the method is faster than using only a hash.
Fixed-block deduplication looks at chunks that are all the same size and achieves a lower compression ratio than variable-block dedupe, which examines blocks of different sizes and takes longer to dedupe. "Progressive deduplication with a sliding window gives you the best of both worlds," Evans said.
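To make the two-step idea concrete, here is a minimal Python sketch of a sliding window that first screens each position with a cheap checksum and only then confirms a probable match with a cryptographic hash. It is an illustration of the general technique, not Arkeia's or Kadena's actual algorithm: the 8-byte window, the byte-sum checksum, the choice of SHA-256 and the function names are all assumptions made for the example.

```python
import hashlib

BLOCK = 8  # hypothetical window size in bytes; tiny so the demo is readable


def weak_sum(data: bytes) -> int:
    """Cheap checksum (plain byte sum) that can be updated in O(1) as the window slides."""
    return sum(data) & 0xFFFFFFFF


def build_index(stored: bytes) -> dict:
    """Index already-stored data in BLOCK-sized pieces: weak sum -> set of strong hashes."""
    index = {}
    for off in range(0, len(stored) - BLOCK + 1, BLOCK):
        block = stored[off:off + BLOCK]
        index.setdefault(weak_sum(block), set()).add(hashlib.sha256(block).digest())
    return index


def find_duplicates(incoming: bytes, index: dict) -> list:
    """Slide a window over incoming data and return offsets of confirmed duplicate blocks."""
    matches = []
    if len(incoming) < BLOCK:
        return matches
    pos = 0
    weak = weak_sum(incoming[:BLOCK])
    while True:
        window = incoming[pos:pos + BLOCK]
        candidates = index.get(weak)  # step 1: lightweight screen for a probable match
        if candidates and hashlib.sha256(window).digest() in candidates:
            matches.append(pos)       # step 2: strong hash confirmed the match
            pos += BLOCK              # skip past the matched block
            if pos + BLOCK > len(incoming):
                break
            weak = weak_sum(incoming[pos:pos + BLOCK])
        else:
            if pos + BLOCK >= len(incoming):
                break
            # Slide one byte: drop the outgoing byte, add the incoming one.
            weak = (weak - incoming[pos] + incoming[pos + BLOCK]) & 0xFFFFFFFF
            pos += 1
    return matches
```

With stored data of b"AAAAAAAABBBBBBBBCCCCCCCC" and incoming data of b"xyzBBBBBBBBqAAAAAAAA", find_duplicates returns the offsets 3 and 12, even though neither duplicate sits on an 8-byte boundary. A comparison at fixed 8-byte offsets would miss both, which is the gap a sliding window is meant to close. The expensive hash is computed only at positions where the cheap checksum already matches.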
Arkeia also offers a profiler application that measures the actual dedupe compression ratios achieved on specific data. The tool is used for planning deduplication projects.
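As a rough illustration of what measuring a dedupe ratio on sample data involves, the Python sketch below divides the logical size of the data by the bytes that remain once repeated chunks are stored only once. It is not Arkeia's profiler: the fixed 4,096-byte chunk size, the SHA-256 fingerprint and the function name are assumptions chosen for the example, and a real tool would also account for compression and metadata overhead.

```python
import hashlib


def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return logical bytes divided by bytes left after dropping repeated chunks."""
    if not data:
        return 1.0  # nothing to dedupe; treat as a 1:1 ratio
    unique = {}  # chunk fingerprint -> chunk length, first occurrence only
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return len(data) / sum(unique.values())
```

For example, three identical 4,096-byte chunks followed by one different chunk total 16,384 logical bytes but only 8,192 unique bytes, so the function reports a ratio of 2.0, or 2:1.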
Arkeia's data dedupe option will cost $2,000 per server. It will be available in early 2011, with the profiler tool due out next month.
Evans said Network Backup upgrades usually come around 18 months apart, but version 9 took two years to develop because of the integration of Kadena's dedupe. He said it was important to get deduplication right in this release because most of Arkeia's competitors already have it and he sees it as part of the bigger picture of replicating backups from on-site appliances to the cloud. That's a capability Arkeia plans to add.
"We think the battle for deduplication is not over yet," he said. "We wanted a compelling approach to dedupe to differentiate ourselves from others in the market. It's part of our long-term strategy of having backups replicated to the cloud. The long pole in the tent for that model is deduplication, because the scarce resource is bandwidth to the cloud."
Enterprise Strategy Group analyst Lauren Whitehouse said Arkeia is putting a new twist on an established technology.
"It's hard to create something completely different," she said. "Arkeia is evolving its dedupe from what others have done. Hash and compare are not new. Fixed length and variable length are not new. When you take the best of these approaches and combine them, you can have something better. It's taking what's already out there and is understood, and is trying to improve on it."
Whitehouse said the ability to replicate deduped data is an important piece that Arkeia is still missing.
"There's no storage-to-storage replication," she said. "If I don't want to use tape, how do I get the data off-site? If I optimize on disk, I want to move it to a secondary site and make sure it's optimized. I could use a feature of my storage system, but I might have to un-dedupe it to move it across the wire."
ExaGrid goes 'generic' with DeltaZone data dedupe
ExaGrid introduced DeltaZone, a new deduplication algorithm that lets customers use generic byte-level deduplication as well as content-aware byte-level dedupe.
ExaGrid has always offered content-aware dedupe for its EX series of midrange data backup appliances. The content-aware dedupe scales better and is tuned for individual applications, but ExaGrid VP of product marketing Marc Crespi said that approach is less flexible: it limited the applications the product could support, and it took ExaGrid longer to release upgrades and support new applications.
Application-aware deduplication requires the vendor to optimize its software for specific backup applications before it can support them for its customers. ExaGrid supports CA ARCserve, CommVault Simpana, IBM Corp.'s System i platform (AS400 or iSeries), Hewlett-Packard (HP) Co.'s Data Protector, Symantec Corp.'s Backup Exec and NetBackup, Veeam Software, Vizioncore (now Quest Software Inc.) vRangerPro and VMware Backup.
"Byte-level methods require more knowledge of content," Crespi said. "We have to do more work to bring applications to market. Now you'll see us accelerate product announcements. We'll add backup applications and venture into archiving and nearline use cases."
For now, those use cases do not include primary storage, Crespi said. "We're staying focused on secondary storage -- backup, archiving and nearline data," he said. "However, DeltaZone technology has promise for primary storage. You can envision what this byte-level variant can do for primary storage deduplication."
ExaGrid customers can have their systems analyze data by content or in the generic mode. Crespi said DeltaZone has been tested by a small group of customers since January, and is now part of the ExaGrid operating system.
Whitehouse said ExaGrid's generic mode can cause less disruption for its customers when adding upgrades and new features.
"End users don't want to risk disruption once the solution is in place," she said. "Scalability for performance and capacity is a key consideration for purchase as end users mature in their use of the technology. While early adopters were OK with a high-risk/high-reward proposition in the early stage of deduplication adoption, now the majority of customers want high reward without risk."