Tutorial

What are synthetic backups and can you eliminate full backups?

By W. Curtis Preston

A long time ago, the developers of IBM Corp.'s Tivoli Storage Manager (TSM) asked a simple question: Why are we backing up data that hasn't changed? This became one of the core elements of TSM design and what TSM would eventually refer to as "progressive incremental" and others would call "incrementals forever." Once a given version of a file has been backed up, it's never backed up again.
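The "never back up the same version twice" rule can be illustrated with a small sketch. This is not how TSM is implemented; it is a minimal model that assumes a file version can be identified by a hash of its content, and the names `file_version`, `progressive_incremental` and `catalog` are hypothetical. A real product may well detect change differently, for example by comparing file attributes.

```python
import hashlib


def file_version(content: bytes) -> str:
    """Identify a file version by the SHA-256 digest of its content (an assumption)."""
    return hashlib.sha256(content).hexdigest()


def progressive_incremental(files: dict[str, bytes],
                            catalog: dict[str, set[str]]) -> list[str]:
    """Back up only file versions the catalog has not seen; return their paths."""
    sent = []
    for path, content in sorted(files.items()):
        version = file_version(content)
        seen = catalog.setdefault(path, set())
        if version not in seen:
            seen.add(version)  # record the version so it is never sent again
            sent.append(path)
    return sent
```

Running the function twice against unchanged files sends nothing the second time; only a file whose content changes is sent again.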

Other backup products have chosen to use the traditional full/incremental approach to backups, also referred to as the grandfather-father-son method. But the question persisted: Why are we backing up data that hasn't changed? Eventually, CommVault, EMC Corp. and Symantec Corp. all came to the same conclusion: Instead of transferring data that's already been backed up across the network, just transfer it from one tape to another within the backup server. Because 90% of any given full backup is already on tape or disk somewhere, a "synthetic full" can be created by copying the data that's needed from the latest full, together with the incrementals taken since then, to a new full backup. This provides the benefit of a full backup (fast restores via collocation of the necessary data) without the downside of a full backup (unnecessary transfer of the data across the network).

All three products have implemented the concept of the synthetic full in a slightly different way (CommVault and Symantec call synthetic fulls "synthetic backups," while EMC uses the term "saveset consolidation"). However, all of them share one critical concept. Once a synthetic full is created, it's essentially just like any other full: it will be used for restores, and later incremental backups will be based on that full. The previous full is only necessary if you're keeping it for longer retention.
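The idea shared by these products can be sketched in a few lines. This is an illustrative model only, not any vendor's implementation: it assumes a backup is a mapping from path to content, that an incremental marks a deleted file with `None`, and that incrementals are applied oldest first. The names `Backup` and `synthesize_full` are hypothetical.

```python
from typing import Optional

# A backup maps each path to its content; None in an incremental marks a deletion.
Backup = dict[str, Optional[bytes]]


def synthesize_full(last_full: dict[str, bytes],
                    incrementals: list[Backup]) -> dict[str, bytes]:
    """Build a new full from the last full plus later incrementals, oldest first."""
    new_full = dict(last_full)  # start from data the backup server already holds
    for incremental in incrementals:
        for path, content in incremental.items():
            if content is None:
                new_full.pop(path, None)  # file was deleted on the client
            else:
                new_full[path] = content  # newer version replaces the older one
    return new_full
```

Nothing in this merge reads from the client, which is the point: the new full is assembled entirely from data already on the backup server, and the previous full is left untouched in case it is being kept for longer retention.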

Tivoli Storage Manager users may feel that TSM's concept of a backup set is very similar to a synthetic full, but it's actually quite different. Unlike synthetic backups, the contents of a TSM backup set aren't tracked in the backup database. In fact, one of the main purposes of a TSM backup set is to create an "instant archive" of backups that you wish to keep for a longer period of time than your TSM database has room for (see "Can backups be turned into data archives?" below). Another purpose for the TSM backup set is to create a backup that can be used outside of TSM; a TSM backup set can be read without the aid of the TSM catalog. If TSM backup sets were kept in the TSM database and usable for standard restores, then they would be the same as a synthetic full.

Can backups be turned into data archives?

IBM's Tivoli Storage Manager has a backup feature where backups are copied to what is officially called a "backup set." IBM occasionally also calls a backup set an "instant archive." This seems to go against the usual mantra that backups aren't archives, and simply holding onto backups longer doesn't magically turn them into archives. So are TSM backup sets truly data archives?

To answer this question, let's take a look at a new feature in Symantec Corp.'s Backup Exec 2010. Backup Exec incorporates Symantec's Enterprise Vault engine, so users can create archives of their backups by copying them into this engine. But Backup Exec does more than just copy the data from one tape format to another; it actually creates an index of the content of the archived files or applications. This means that you can perform Google-like searches against these archives by searching for phrases that might appear in files or Exchange emails, and Backup Exec will extract that data for you.

CommVault Systems Inc.'s Simpana can also perform content searches against its backups: you can search for files or emails based on a particular word or phrase. Like Symantec, CommVault has a more full-featured archive product as well, but you can perform archive-like searches against its backups.
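What separates these searchable backups from a plain copy is the content index. The sketch below shows the general technique, an inverted index from words to the items that contain them. It is not the Enterprise Vault or Simpana implementation; `build_index` and `search` are hypothetical names, and the search matches items containing every word of the query rather than the exact phrase.

```python
import re
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the names of the items that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the items that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits
```

A backup catalog that records only file names and dates cannot answer this kind of query; the index has to be built from the content itself when the data is archived.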

Let's contrast this to what Tivoli Storage Manager is doing. A TSM backup set actually has fewer database entries than a regular TSM backup; its purpose is to "archive" older files that you no longer have room for in the TSM database. So instead of having more context than regular backups, a TSM "instant archive" actually has less. While it's now possible with some products to "turn a backup into an archive," calling a TSM backup set an "instant archive" does a disservice to the word archive.

But that's not to suggest that TSM backup sets have no value. They do allow for longer retention than what's possible in the TSM database, and they also allow for restores without having to install TSM.

This story was originally published in Storage magazine.

This was first published in June 2010