COLUMN
Backup management: Tracking recurring backup failures and tape drive performance
By David Boyd
04 Feb 2009 | SearchStorage.co.UK



Whether you approach backup management from an SMB or enterprise point of view, there are two metrics you should be keenly aware of: the presence of recurring backup failures and tape drive performance. Backup management in an SMB environment can be simple. You tend to become familiar with each individual backup job: when it runs, roughly when it completes and how much data it will secure. You also tend to have good visibility into backup failures, and therefore know when backups are succeeding. Essentially, you have a good grasp of the recovery point for the servers you're responsible for.
But that level of intimacy is difficult, if not impossible, to maintain in larger environments; there's no way you can remember the details of 20,000 backup jobs. You're unlikely to know offhand whether a backup that failed did so for the first time or the 10th time. This is why accurate backup reporting tools are essential.
Beware of statistics in backup management
I recently came across a case where some data had become corrupted and required restoration from tape. Sadly, there was no tape copy to be found. The backups were configured and working, but they ran beyond the backup window and were being terminated automatically. It wasn't uncommon for a percentage of backups to be terminated in this fashion each day, so no special attention was paid to this particular client.
Also, the reporting mechanisms in place were based upon a percentage success rate and the environment frequently achieved success rates in excess of 99.7%. However, statistics tell only part of the story. The 0.3% of failures in the scenario above could represent terabytes of data and perhaps as much as 10% of business-critical information.
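To see how a headline percentage can hide this, here is a minimal Python sketch that compares the success rate by job count with the share of data actually secured. The job records, names and sizes are invented for illustration; they are not figures from the environment described above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    client: str
    size_gb: float   # data the job was meant to secure (hypothetical figure)
    succeeded: bool


def success_rate_by_count(jobs):
    """Fraction of jobs that completed, which is what most reports show."""
    return sum(j.succeeded for j in jobs) / len(jobs)


def success_rate_by_volume(jobs):
    """Fraction of the data that was actually secured."""
    total = sum(j.size_gb for j in jobs)
    secured = sum(j.size_gb for j in jobs if j.succeeded)
    return secured / total


# Hypothetical night: 997 small jobs succeed, 3 large jobs fail.
jobs = [Job(f"small-{i}", 10.0, True) for i in range(997)]
jobs += [Job(f"big-{i}", 400.0, False) for i in range(3)]

by_count = success_rate_by_count(jobs)
by_volume = success_rate_by_volume(jobs)
print(f"by count: {by_count:.1%}, by volume: {by_volume:.1%}")
```

With these made-up numbers the report would show 99.7% of jobs succeeding, while roughly a tenth of the data was left unprotected.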
For these reasons, there are two metrics I always examine when analysing a backup environment:
- The presence of recurring failures. Most organisations can cope with the odd failure; if there are tight recovery point objectives (RPOs), data can often be recovered from other sources (e.g., database log shipping and subsequent roll forwards). But recurring failures often represent the largest risk. In the situation above, the host with the corrupt data had routinely fallen into the 0.3% of failed backup jobs and as a result reached a point where restoration was impossible.
- Tape drive performance. If you know you are securing your data, the next question to ask is "How efficiently are you driving your hardware?" From this, you can deduce how you can get the most from your environment and at which point it will need capital investment.
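Both metrics can be derived from ordinary job history. The following is a rough sketch in Python, assuming a hypothetical history of (client, date, succeeded) records and a drive whose rated native speed you supply yourself. The function names, the threshold of three failures and the sample figures are all assumptions for illustration, not part of any particular backup product.

```python
from collections import defaultdict
from datetime import date


def consecutive_failures(history):
    """Return {client: length of the current run of failed backups}."""
    by_client = defaultdict(list)
    # Sort by client, then date, so each client's results are oldest first.
    for client, day, ok in sorted(history, key=lambda r: (r[0], r[1])):
        by_client[client].append(ok)
    streaks = {}
    for client, results in by_client.items():
        streak = 0
        for ok in reversed(results):  # walk back from the newest result
            if ok:
                break
            streak += 1
        streaks[client] = streak
    return streaks


def recurring_failures(history, threshold=3):
    """Clients whose most recent `threshold` or more backups all failed."""
    streaks = consecutive_failures(history)
    return sorted(c for c, n in streaks.items() if n >= threshold)


def drive_utilisation(gb_written, seconds, rated_mb_per_s):
    """Achieved throughput as a fraction of the drive's rated speed."""
    achieved_mb_per_s = gb_written * 1024 / seconds
    return achieved_mb_per_s / rated_mb_per_s


# Hypothetical job history for two clients.
history = [
    ("db01", date(2009, 1, 26), True),
    ("db01", date(2009, 1, 27), False),
    ("db01", date(2009, 1, 28), False),
    ("db01", date(2009, 1, 29), False),
    ("web01", date(2009, 1, 27), False),
    ("web01", date(2009, 1, 28), True),
    ("web01", date(2009, 1, 29), True),
]

flagged = recurring_failures(history)
# Hypothetical drive: 900 GB written in 4 hours, rated at 120 MB/s.
utilisation = drive_utilisation(gb_written=900, seconds=4 * 3600,
                                rated_mb_per_s=120)
print(flagged, f"{utilisation:.0%}")
```

A report built this way flags the client that keeps failing, however good the overall percentage looks, and shows how far each drive is running below its rated speed.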
There is an ever-growing number of products available that can provide detailed reports on your backup environment, regardless of the backup application. Purchasing the correct tool and configuring it to deliver all of the information your organisation requires is essential. It's important that you don't rely on statistics as the sole indicator of an environment's success, because that approach doesn't highlight recurring problems.
About the author: David Boyd is a senior consultant at GlassHouse Technologies (UK), a global provider of IT infrastructure services. Boyd has more than seven years of experience in backup and storage, with a major focus on designing and implementing backup solutions for blue-chip companies.