Special Report

Data reduction techniques for better storage efficiency

Simply deleting unwanted or unneeded data is the most effective way to reduce a company's data footprint and, in turn, its energy needs. The hard part is often getting end users to cooperate. When deletion isn't a viable option, IT shops can turn to technologies such as data deduplication, compression and thin provisioning to improve storage efficiency and possibly lower power consumption.

"The real savings in energy efficiency comes in managing the data," said Christine Taylor, an analyst at Taneja Group in Hopkinton, Mass. "The less data you have to house, the less energy you're going to have to use."

Before turning to data reduction technologies, Taylor advises IT shops to delete as much data as possible and to set up automated data retention so they can safely eliminate data, or copy it to tape, once its preset retention period ends. Classification mechanisms can also help to weed out unnecessary data.
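As a rough illustration of what an automated retention check might look like, the Python sketch below flags records whose age exceeds the retention period for their class. The category names, the retention periods and the `expired` function are hypothetical, chosen only for this example; they are not drawn from Taylor's advice or from any product.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data class, in days (assumed values).
RETENTION_DAYS = {"scratch": 30, "project": 365, "regulatory": 7 * 365}

def expired(records, today):
    """Return names of records that have outlived their retention period.

    Each record is a (name, category, last_modified) tuple.
    """
    out = []
    for name, category, last_modified in records:
        limit = timedelta(days=RETENTION_DAYS[category])
        if today - last_modified > limit:
            out.append(name)
    return out

records = [
    ("tmp.dat", "scratch", date(2010, 5, 1)),
    ("results.csv", "project", date(2010, 1, 15)),
    ("audit.log", "regulatory", date(2002, 1, 1)),
]
candidates = expired(records, today=date(2010, 7, 1))
print(candidates)  # names eligible for deletion or a move to tape
```

In practice the list of candidates would feed a deletion job or a tape copy rather than a print statement.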

If deletion hinges on user participation, IT shops might need to come up with some creative approaches or subtle forms of pressure. The Friedrich Miescher Institute (FMI) for Biomedical Research in Basel, Switzerland, for instance, decided to make the data archiving process a bit more difficult for its researchers.

When FMI's scientists had the ability to acquire data directly into the institute's archive, they sometimes saved large quantities of unnecessary data, such as the results of failed experiments. Now they must first acquire the data locally and then move it to a "scratch area" to analyze it before they're able to shift it to the archive.

"It's work to get the data into the archive, so they only take what they absolutely need," said Dean Flanders, head of informatics at FMI. "We really want to make sure that they're not just dumping things in there that they'll never look at. We don't want to store junk."

When the challenge of deletion proves too great, the following technologies can achieve data reduction and/or boost storage efficiency, which can have an impact on energy consumption:

Data deduplication: Deduplication technology shrinks the data footprint by removing redundant data at the file or sub-file level. Its most common use is with backups and archives, but it's increasingly gaining acceptance with primary storage.

The leading vendors in the backup/archive realm all offer data deduplication capabilities. NetApp Inc. is the top proponent in the primary storage space and claims that more than 8,000 customers have licensed its free deduplication technology.
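To make the sub-file idea concrete, here is a minimal Python sketch that splits data into fixed-size chunks, fingerprints each chunk with SHA-256 and stores every distinct chunk only once. The 4 KB chunk size and the names `dedupe_chunks` and `restore` are assumptions made for illustration; this is not how any particular vendor's product is implemented.

```python
import hashlib

def dedupe_chunks(blobs, chunk_size=4096):
    """Store each distinct fixed-size chunk once.

    Returns (store, recipes): store maps a chunk digest to its bytes,
    and each recipe is the ordered list of digests that rebuilds one blob.
    """
    store = {}
    recipes = []
    for blob in blobs:
        recipe = []
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes

def restore(store, recipe):
    """Rebuild a blob by following its list of chunk digests."""
    return b"".join(store[d] for d in recipe)

# Two backups that share most of their content.
backup_monday = b"A" * 8192 + b"B" * 4096
backup_tuesday = b"A" * 8192 + b"C" * 4096

store, recipes = dedupe_chunks([backup_monday, backup_tuesday])
logical = len(backup_monday) + len(backup_tuesday)
physical = sum(len(chunk) for chunk in store.values())
print(logical, physical)  # 24576 12288
```

The two backups hold 24 KB of logical data but only three distinct chunks, so the store keeps 12 KB, and either backup can still be rebuilt in full from its recipe.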

Compression: Whereas data deduplication eliminates redundant data, compression reduces the size of each piece of data using algorithms that have been around since the 1950s. Compression can be used on its own or in conjunction with data deduplication.

Leading storage vendors generally offer some form of compression. Storwize Inc. promotes compression of primary storage, and EMC Corp. and Ocarina Networks Inc. offer a combination of data deduplication and compression for primary storage.
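For a sense of how lossless compression behaves, the short Python sketch below uses the standard library's zlib module, which implements DEFLATE (a combination of LZ77 and Huffman coding, the latter dating to 1952). The sample log line is invented for the example, and the ratio achieved on real data will vary widely with its content.

```python
import zlib

# Highly repetitive sample data (made up for this example).
record = b"2010-07-01 INFO backup job completed OK\n" * 1000

packed = zlib.compress(record, 9)     # 9 = highest compression level
unpacked = zlib.decompress(packed)    # lossless: original bytes come back

ratio = len(record) / len(packed)
print(len(record), len(packed), round(ratio, 1))
```

Repetitive text such as logs compresses very well; data that is already compressed, such as JPEG images or video, typically shrinks little or not at all.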

Snapshots: Snapshots are point-in-time copies of files, directories or data volumes that are especially helpful in the context of backups. Some systems save space by copying only the changes and using pointers to the original snapshot.

"A couple years ago, we used to make full-volume copies all the time to make backups more efficient. You quickly took a logical image of a backup, took a snapshot and used it to run your backups again, and that used to require the same amount of capacity as the volume that you were making a copy of," said Brian Garrett, vice president of the Enterprise Strategy Group Lab in Milford, Mass. "Now with snapshots, you can reduce that capacity required to make a logical copy of the volume, which can reduce the amount of capacity you need to deploy and the amount of spinning drives, and, therefore, the energy you're using."
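The space saving Garrett describes can be sketched with a toy copy-on-write volume in Python. The `Volume` class and its methods are hypothetical and greatly simplified; the point is only that a snapshot starts out empty and grows by one preserved block each time a live block is overwritten, instead of duplicating the whole volume.

```python
class Volume:
    """Toy block volume with pointer-based, copy-on-write style snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, one bytes object per block
        self.snapshots = []         # per snapshot: {block index: old block}

    def snapshot(self):
        """Take a snapshot; it consumes no block storage until data changes."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Preserve the old block in every snapshot that has not saved it yet.
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id):
        """Point-in-time view: saved blocks where changed, live blocks otherwise."""
        saved = self.snapshots[snap_id]
        return [saved.get(i, block) for i, block in enumerate(self.blocks)]

vol = Volume([b"a", b"b", b"c", b"d"])
snap = vol.snapshot()
vol.write(2, b"C")

print(vol.read_snapshot(snap))   # [b'a', b'b', b'c', b'd']
print(len(vol.snapshots[snap]))  # 1
```

After one write, the snapshot holds a single preserved block rather than a four-block full copy, yet it still presents the volume exactly as it was when the snapshot was taken.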

Thin provisioning: IT shops can improve their storage utilization rates by using thin provisioning to allocate storage on demand, rather than allocating it upfront, which often leads to overprovisioning and lots of underused or wasted disk space.

"In essence, what you've created" with thin provisioning, Garrett said, "is a pool of spinning drives that is energy-optimized to meet your needs."
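The allocate-on-demand idea can be illustrated with a small Python sketch. The `ThinPool` class, its method names and the block counts are assumptions made for the example: volumes advertise a large logical size, but the shared pool consumes a physical block only when a block is first written.

```python
class ThinPool:
    """Toy thin-provisioned pool: physical blocks are consumed on first write."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.volumes = {}    # volume name -> advertised size in blocks
        self.allocated = {}  # (volume name, logical block) -> data

    def create_volume(self, name, logical_blocks):
        # Only the advertised size is recorded; nothing is allocated yet.
        self.volumes[name] = logical_blocks

    def write(self, name, block, data):
        if not 0 <= block < self.volumes[name]:
            raise IndexError("block outside the volume's logical size")
        key = (name, block)
        if key not in self.allocated and len(self.allocated) >= self.physical_blocks:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.allocated[key] = data

    def used(self):
        return len(self.allocated)

pool = ThinPool(physical_blocks=100)
pool.create_volume("mail", 500)
pool.create_volume("db", 500)
pool.write("mail", 0, b"x")
pool.write("mail", 1, b"y")
pool.write("db", 42, b"z")

provisioned = sum(pool.volumes.values())
print(provisioned, pool.used())  # 1000 3
```

Here 1,000 blocks are promised to applications while only three are actually in use. The trade-off is that an oversubscribed pool must be monitored, since writes fail once the physical capacity runs out.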

This was first published in July 2010