According to a white paper released this week and dated Feb. 26, Network Appliance Inc. (NetApp) has deduplication, but it is still limited in the products it supports and the size of the data stores it can deduplicate, and it has the potential for high overhead.
The company has ported its single-instancing algorithm, which it calls Advanced Single Instance Storage (A-SIS), from its SnapLock
According to the white paper, "A-SIS only stores unique data blocks in the flexible volume and creates a small amount of additional metadata in the process. Each block of data has a digital signature, which is compared to all other signatures in the flexible volume. If an exact byte-for-byte block match exists on the flexible volume, the duplicate block is discarded and its disk space is reclaimed."
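The signature-then-compare approach the white paper describes can be illustrated with a minimal sketch. This is not NetApp's implementation: the use of SHA-256 as the signature, the in-memory dictionary and the function name `deduplicate` are all assumptions made for illustration, since the white paper excerpt does not say how signatures are computed or stored.

```python
import hashlib


def deduplicate(blocks):
    """Keep one copy of each distinct block.

    Returns (store, block_map): `store` holds the unique blocks, and
    block_map[i] is the index in `store` of the i-th input block.
    Hypothetical helper; SHA-256 is an assumed signature function.
    """
    store = []            # unique blocks that actually occupy space
    by_signature = {}     # signature -> indices into `store`
    block_map = []
    for block in blocks:
        signature = hashlib.sha256(block).digest()
        match = None
        # A matching signature is only a candidate; confirm it with a
        # byte-for-byte comparison before discarding the duplicate.
        for idx in by_signature.get(signature, []):
            if store[idx] == block:
                match = idx
                break
        if match is None:
            store.append(block)
            match = len(store) - 1
            by_signature.setdefault(signature, []).append(match)
        block_map.append(match)
    return store, block_map
```

Feeding it three blocks, two of them identical, leaves two blocks in `store`, and the original sequence can still be rebuilt from `block_map`.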
The white paper also claims that the post-process deduplication has a 1% write performance hit. The background process, which is activated through a command line interface, can also be scheduled or run manually. A-SIS operates on the active file system (AFS) of a flexible volume.
The product is currently in beta tests and has not yet been released to the public. The white paper contains instructions for deploying A-SIS with NearStore, but doing so requires two licenses, "nearstore_asis2" and "nearstore_option", to be activated on the filer.
A-SIS won't work with snapshots or LUNs and is limited in scale
There are also a few catches at this phase of the product, as detailed by the white paper: Any block referenced by a snapshot copy cannot be deduplicated; A-SIS will only work on data sent via CIFS or NFS; it will not work on LUNs; and it is so far compatible only with the NearStore R200, FAS3020c and FAS3050c. A-SIS also cannot deduplicate across FlexVols, which currently have a size limit of 4 terabytes (TB) on the R200, 2 TB on the FAS3020c and 1 TB on the FAS3050c.
The white paper also warns, "The total storage used by A-SIS is …1% to 3% of the actual stored data due to fingerprints in the fingerprint file and change log file(s). So for 1 TB of data there would be 10 GB to 30 GB of overhead." That's without snapshots -- if snapshots are turned on for the flexible volume, the paper states, "the overhead becomes additive each time A-SIS is run and is therefore substantial."
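The overhead figures quoted from the white paper are simple percentages of the stored data, which a short calculation makes concrete. The function name `asis_overhead_gb` is hypothetical, and the sketch treats 1 TB as 1,000 GB, as the paper's own example does. It covers only the case without snapshots.

```python
def asis_overhead_gb(stored_gb, low_pct=1, high_pct=3):
    """Return the (low, high) overhead in GB for a given amount of stored data.

    The 1% and 3% defaults come from the white paper's quoted range;
    the function itself is illustrative only.
    """
    return stored_gb * low_pct / 100, stored_gb * high_pct / 100


low, high = asis_overhead_gb(1000)  # 1 TB expressed as 1,000 GB
print(f"{low:.0f} GB to {high:.0f} GB")  # prints "10 GB to 30 GB"
```

Applied to the 4 TB FlexVol limit cited for the R200, the same range works out to 40 GB to 120 GB.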
Finally, under best practices, the white paper suggests that users "run A-SIS infrequently … do not run eight A-SIS processes concurrently if possible because there will be a negative performance impact on other applications."
It continues, "given the above two items, the best bet is to disable any A-SIS schedules on the flexible volume and run A-SIS manually, [and] turn off scheduled Snapshot copies or keep Snapshot copies to a minimum ... if Snapshot copies are required, run A-SIS before creating the Snapshot copy as this will minimize the amount of data that gets locked in Snapshot copies."
"This pretty much looks like an SMB [small and midsized business] type play where the backup window finishes, and you have fairly large amounts of time to dedupe data already backed up," said Jerome Wendt, lead analyst and president of DCIG Inc. "You might free up space in the course of a day with it, but you still need all that space for your backups before the dedupe happens."
"Based on what I'm seeing in this white paper, to me this looks like a poor man's solution to single-instance storage," Wendt said. "I wouldn't view this as a robust way to manage it. It appears it will work, but it's really not suited for the enterprise. There are so many qualifiers here, and even NetApp recommends just doing it under certain circumstances and infrequently."
Read the entire white paper on A-SIS.