Ocarina ECOsystem deconstructs before compression, deduplication for primary storage data reduction

16 Dec 2009 | Carol Sliwa, Features Writer

Ocarina Networks Inc. claims that its primary data storage-targeted ECOsystem appliance produces data reduction of up to 85% on Microsoft Office documents, PDFs and virtual machine files and 40% or more on images.

But the competition contends the savings come at a performance price that some users may be unwilling to pay.

Ocarina says judging the performance impact is a complex matter that depends, to some degree, on how a customer uses the product. The appliance offloads CPU work from the filer and performs data deduplication and compression on a post-process basis, according to Mike Davis, senior director of marketing at Ocarina.

"The main impact to consider is not CPU overhead but marginal delay for decompressing files," Davis wrote in an email. "This does take 'milliseconds,' which matters for transactional applications but not for human reads and Web services where we are generally not noticeable."

The ECO in ECOsystem stands for extract, correlate and optimize. In the first step, the software identifies the file type and then extracts, or decompresses, the file to get to the zeros and ones that represent its richest expression. A compound document (such as a PDF with an embedded image file) may require multiple levels of recursive decompression.
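The recursive extraction described above can be sketched in a few lines. This is only an illustration under stated assumptions: it recognizes just gzip streams and zip archives by their magic bytes, and the function name `extract`, the `!gunzip` naming suffix and the `max_depth` limit are hypothetical, not part of Ocarina's product.

```python
import gzip
import io
import zipfile


def extract(name, data, depth=0, max_depth=8):
    """Yield (name, raw_bytes) leaves, unwrapping nested containers recursively."""
    if depth < max_depth and data[:2] == b"\x1f\x8b":
        # gzip magic number: decompress and look inside the result again
        yield from extract(name + "!gunzip", gzip.decompress(data), depth + 1, max_depth)
    elif depth < max_depth and data[:4] == b"PK\x03\x04":
        # zip local-file header: recurse into every member of the archive
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                yield from extract(name + "/" + member, archive.read(member),
                                   depth + 1, max_depth)
    else:
        # not a container this sketch knows about: treat it as raw content
        yield name, data
```

A zip archive that holds a gzip-compressed text file would be unwrapped twice before the plain text is reached, which is the kind of multi-level decompression the article attributes to compound documents.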

"Really, this extraction is the thing that makes us different than everybody else," said Carter George, vice president of products at Ocarina. He claimed duplicates are often obscured because many file types have already been compressed. Decompressing the files exposes those duplicates.

While unraveling the files, ECOsystem attempts to identify natural object boundaries, such as a section of text, a graphic or a photo. For instance, it might take a hash of the whole photo rather than looking for duplicate 4 KB chunks at the block level.

In the correlation (or data deduplication) step, the system removes the duplicates and directs pointers to the matching parts.
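A minimal sketch of object-level deduplication with pointers follows. It assumes each extracted object is hashed whole with SHA-256; the article does not say which hash ECOsystem uses, and the names `dedupe` and `rebuild` are hypothetical.

```python
import hashlib


def dedupe(objects):
    """Store each distinct object once; return (store, pointers)."""
    store = {}      # digest -> object bytes, kept only once
    pointers = []   # one digest per input object, in original order
    for obj in objects:
        digest = hashlib.sha256(obj).hexdigest()
        store.setdefault(digest, obj)   # first copy wins, later copies are dropped
        pointers.append(digest)
    return store, pointers


def rebuild(store, pointers):
    """Follow the pointers to reconstruct the original sequence of objects."""
    return [store[digest] for digest in pointers]
```

If the same photo appears in two documents, the store holds it once and both documents carry a pointer to it.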

"By keeping those things together as natural objects, we get to the compression stage and you've already taken out the dupes," George said. "You can still get more space savings by applying compressors to the things that are left."
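As a rough illustration of how the two stages might add up, the sketch below assumes compression is applied only to the data that survives deduplication and that the two savings are independent. The function name and the sample fractions are illustrative, not figures from Ocarina.

```python
def combined_reduction(dedupe_fraction, compression_fraction):
    """Overall fraction of space saved when compression runs on what dedupe leaves."""
    remaining = (1 - dedupe_fraction) * (1 - compression_fraction)
    return 1 - remaining
```

Under these assumptions, a data set that shrinks 10% through deduplication and then 50% through compression ends up about 55% smaller overall, since 0.9 x 0.5 = 0.45 of the original remains.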

ECOsystem has approximately 125 compressors for the optimization step. Some are standard compressors based on the pioneering work of Abraham Lempel and Jacob Ziv. Others are proprietary compressors developed by Ocarina's research team of mathematicians for specific file types, such as seismic or genomic data.

"The more you know about what kinds of patterns are going to show up in a file, the more specialized the compressor you can build," George said. "There's whole classes of data where you will get zero to 10% data reduction with dedupe, but with a good compressor, you can get 50%, 60%, 70%, 80% reduction."

Customers who want maximum performance might opt to turn on deduplication and turn off compression. That works well for VMware VMDK files and might be beneficial for other primary storage scenarios, according to George.

But George estimated that 80% of online or near-line storage is not especially hot, or active, so customers might elect to use fast generic compression. A third option is available for archived colder storage, using the data-specific compressors. The system looks inside each object to figure out the data type and pick the best-fit compressor.
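The three options can be sketched as a tiered policy. Several assumptions apply: the tier names `hot`, `warm` and `cold` are hypothetical, Python's standard zlib, bz2 and lzma modules stand in for Ocarina's proprietary compressors, and the cold tier picks the best fit by trying every codec rather than by inspecting the data type as the article describes.

```python
import bz2
import lzma
import zlib

# Hypothetical stand-ins for a compressor library: name -> (compress, decompress)
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def optimize(data, tier):
    """Return (codec_name, stored_bytes) for a 'hot', 'warm' or 'cold' tier."""
    if tier == "hot":
        return "none", data                      # dedupe only, no compression
    if tier == "warm":
        return "zlib", zlib.compress(data, 1)    # fast generic compression
    # cold: compress with every codec and keep the smallest result
    candidates = {name: codec[0](data) for name, codec in CODECS.items()}
    best = min(candidates, key=lambda name: len(candidates[name]))
    return best, candidates[best]


def restore(codec_name, stored):
    """Reverse optimize() using the recorded codec name."""
    if codec_name == "none":
        return stored
    return CODECS[codec_name][1](stored)
```

Trying every codec costs more CPU than picking one by file type, which is why this sketch reserves it for the cold tier.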

"You get sets of knobs and dials to pick what you want," George said. He advises customers to consider applying policies, such as only deduping files that are older than 10 days or haven't changed in 30 days, to minimize the performance impact.
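An age-based policy of the kind George describes might look like the following sketch. The `Policy` class and its field names are hypothetical; the 10-day and 30-day thresholds come from the examples above, and how ECOsystem actually expresses policies is not covered in the article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Policy:
    min_age_days: Optional[int] = None    # file must be at least this old
    min_idle_days: Optional[int] = None   # file must be unmodified for this long

    def allows(self, created, modified, now):
        """Return True if a file with these timestamps may be processed."""
        if self.min_age_days is not None:
            if now - created < timedelta(days=self.min_age_days):
                return False
        if self.min_idle_days is not None:
            if now - modified < timedelta(days=self.min_idle_days):
                return False
        return True
```

A file created 15 days ago and edited 6 days ago would pass `Policy(min_age_days=10)` but not `Policy(min_idle_days=30)`.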

Like some other primary storage offerings, Ocarina's product works on a post-process basis, waiting for a file to land on disk before deduplicating it. But unlike the others, ECOsystem employs what George calls a "sliding window," or variable-block, approach that compares the zeros and ones in each block to find duplicates.
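The article gives no detail on how the sliding window works. One common way to get variable-size blocks is content-defined chunking with a rolling hash, sketched below as an assumption about the general technique rather than a description of Ocarina's implementation. The constants `WINDOW`, `MASK`, `BASE` and `MOD` are arbitrary illustrative choices.

```python
WINDOW = 16                       # bytes covered by the sliding window
MASK = (1 << 6) - 1               # cut when the low 6 bits of the hash are zero
BASE = 257
MOD = (1 << 31) - 1
POW = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window


def chunks(data):
    """Split data at boundaries chosen by a rolling hash of the last WINDOW bytes."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            out.append(data[start:i + 1])            # boundary found: close chunk
            start = i + 1
    if start < len(data):
        out.append(data[start:])                     # trailing partial chunk
    return out
```

Because each boundary depends only on the bytes inside the window, inserting a byte at the front of a file changes the first chunk but leaves later chunks identical, whereas fixed-size blocks would all shift. A production chunker would also enforce minimum and maximum chunk sizes, which this sketch omits.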

ECOsystem works only with network-attached storage (NAS) filers today, but George said one partner issued a request for block storage. Other future plans for the product include an embeddable edition for NAS vendors, a direct-attached storage (DAS) option and a port for Windows servers.

Although George claims ECOsystem is 100% for the primary enterprise data storage market, not all of its customers choose to use the product that way. Saker Klippsten, head of engineering at Zoic Studios Inc. in Culver City, Calif., said his company uses the technology for secondary storage of reusable assets, such as stock film footage.

Klippsten said Zoic has realized 40% to 65% data reduction with Ocarina's ECOsystem and has no plans to use it for primary storage. "It takes time to decompress and read the files, and we want to access them in real-time," he said.

All Rights Reserved, Copyright 2008 - 2010, TechTarget | Read our Privacy Policy