Continuous data protection (CDP) backup captures every change made to the data in a system as it happens. So if your system is hit with a virus or data loss occurs, you can go back to the most recent clean copy of the data and restore it. There are two types of continuous data protection: real-CDP and near-CDP. What are the differences between them? Can we consider continuous data protection a replacement for traditional backup systems? And who should consider using CDP in their enterprise? W. Curtis Preston, independent backup expert and executive editor for TechTarget, answers these questions and more in this Q&A. His answers are also available as an MP3 below.
Download for later:
Download the podcast with Curtis Preston
Table of contents:
>> Can you outline the differences between real-CDP and near-CDP?
>> Some people have problems with the term near-CDP. Why?
>> Can we consider CDP a replacement for backup?
>> Who should consider using CDP in their enterprise?
>> What CDP products are being offered today?
Can you outline the differences between real-CDP and near-CDP?
The term continuous data protection really only applies to what some people call real-CDP. Real-CDP is basically data replication: the changes that occur on the protected system are immediately replicated to another system. What separates CDP from plain replication is what happens on the destination side. With replication, the destination only mirrors the current state of the source. With real-CDP, the destination is continuously updated too, but it also stores a log of every change, which allows the system to roll those changes back to any previous point in time. Essentially, with real-CDP you can undo anything that happens to a system within the period for which you keep that log.
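To make the "replication plus a log of changes" idea concrete, here is a minimal Python sketch, assuming a toy key-value data set rather than real disk blocks. The class `CDPJournal` and its methods are hypothetical names chosen for illustration; the point is only that the destination keeps every timestamped change, so any earlier state can be rebuilt by replaying the log up to the chosen moment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CDPJournal:
    """Hypothetical real-CDP destination: a starting copy plus a log of every write."""

    base: Dict[str, str] = field(default_factory=dict)  # state when journaling began
    log: List[Tuple[float, str, Optional[str]]] = field(default_factory=list)

    def apply(self, ts: float, key: str, value: Optional[str]) -> None:
        """Record one change from the protected system; value=None means a delete."""
        if self.log and ts < self.log[-1][0]:
            raise ValueError("changes must arrive in timestamp order")
        self.log.append((ts, key, value))

    def state_at(self, ts: float) -> Dict[str, str]:
        """Rebuild the data as of any instant by replaying the log up to ts."""
        state = dict(self.base)
        for when, key, value in self.log:
            if when > ts:
                break  # the log is ordered, so nothing later applies
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state


if __name__ == "__main__":
    journal = CDPJournal()
    journal.apply(100.0, "user:alice", "mailbox-1")
    journal.apply(200.0, "user:alice", None)  # the accidental delete
    print(journal.state_at(200.0))  # {}
    print(journal.state_at(197.0))  # {'user:alice': 'mailbox-1'}
```

Replaying from a base copy keeps the sketch short; a real product would more likely keep periodic checkpoints so it does not have to replay the whole log on every recovery.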
People using real-CDP typically keep every change for a few days. After that, they keep only selected points in time, which are treated like snapshots, and the changes that happened between those points are discarded. For example, a real-CDP user might keep every change for three days and hourly points in time after that. Then, after a week or a month, they might switch to keeping daily points in time, and so on.
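A tiered retention policy like this can be sketched as a function that decides which points in time stay recoverable. The thresholds below (every change for three days, hourly points up to seven days, daily points beyond that) are assumed example values loosely following the text, and `recoverable_points` is a hypothetical name. The sketch only computes which points survive; it does not model how a real product merges the discarded changes into its stored copies.

```python
from typing import Dict, List, Tuple

HOUR = 3600
DAY = 24 * HOUR


def recoverable_points(change_times: List[float], now: float,
                       keep_all_for: float = 3 * DAY,
                       hourly_until: float = 7 * DAY) -> List[float]:
    """Return the change timestamps an assumed tiered retention policy keeps.

    Younger than keep_all_for: every change is kept.
    Up to hourly_until: only the last change in each clock hour is kept.
    Older: only the last change in each day is kept.
    """
    kept: List[float] = []
    last_in_bucket: Dict[Tuple[int, int], float] = {}
    for t in sorted(change_times):
        age = now - t
        if age <= keep_all_for:
            kept.append(t)
        else:
            size = HOUR if age <= hourly_until else DAY
            # Later changes in the same bucket overwrite earlier ones.
            last_in_bucket[(size, int(t // size))] = t
    kept.extend(last_in_bucket.values())
    return sorted(kept)
```

For example, with `now = 864000` (day 10), two changes in the same hour four days ago collapse to the later one, and two changes on the same day nine days ago collapse to the later one, while changes from the last few hours are all kept.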
Near-CDP captures the state of the data only at specific points in time. Near-CDP is not an official term; it describes a combination of snapshots and data replication. Snapshots by themselves are not a good data protection mechanism because they live on, and depend on, the system being protected. Replication by itself is not a good data protection mechanism either, because if you do something stupid like delete a table or get a virus, replication simply copies the deletion or the virus over to your replication system. However, when you combine the two technologies, you get a good data protection mechanism, and many people call this near-CDP. With near-CDP, I would take a snapshot, say, every hour, and then replicate each snapshot to another destination.
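The snapshot-plus-replication combination can be sketched in a few lines of Python, assuming a toy key-value data set. `NearCDP` and its methods are hypothetical names. The sketch shows why the pairing works: a bad change does get replicated with the next snapshot, but the earlier snapshots already on the destination are untouched, so you can still restore from before the mistake, though only to a moment when a snapshot was taken.

```python
from typing import Dict, List, Optional, Tuple


class NearCDP:
    """Hypothetical near-CDP pair: snapshots of the source, each copied to a replica."""

    def __init__(self) -> None:
        self.live: Dict[str, str] = {}  # the protected (primary) data
        self.replica: List[Tuple[float, Dict[str, str]]] = []  # snapshots held on the destination

    def write(self, key: str, value: Optional[str]) -> None:
        """Change the primary; value=None deletes the key. Nothing is shipped yet."""
        if value is None:
            self.live.pop(key, None)
        else:
            self.live[key] = value

    def snapshot_and_replicate(self, ts: float) -> None:
        """Freeze the current primary state and send that copy to the destination."""
        self.replica.append((ts, dict(self.live)))

    def restore(self, target_ts: float) -> Tuple[float, Dict[str, str]]:
        """Return the newest replicated snapshot taken at or before target_ts."""
        earlier = [snap for snap in self.replica if snap[0] <= target_ts]
        if not earlier:
            raise LookupError("no snapshot exists at or before that time")
        taken_at, state = max(earlier, key=lambda snap: snap[0])
        return taken_at, dict(state)


if __name__ == "__main__":
    cdp = NearCDP()
    cdp.write("orders", "table-v1")
    cdp.snapshot_and_replicate(ts=0.0)     # hourly snapshot
    cdp.write("orders", None)              # someone drops the table
    cdp.snapshot_and_replicate(ts=3600.0)  # the damage is replicated too...
    print(cdp.restore(target_ts=3599.0))   # ...but the older snapshot survives: (0.0, {'orders': 'table-v1'})
```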
The real difference between near-CDP and real-CDP is what they can do at recovery time. A real-CDP system can recover to any point in time. So if you were to delete a user in Microsoft Exchange and immediately decide you didn't want to delete that user, you could roll the entire system back to three seconds ago and recover that user. If you found yourself in the same situation with a near-CDP system that takes snapshots every hour, and it had been 59 minutes since the last snapshot, you would have to roll the entire database back to that snapshot, losing nearly an hour of changes. In other words, the only difference that truly matters is the recovery point objective (RPO). A real-CDP system can recover to any point in time, while a near-CDP system can recover only to the specific points in time that were captured in advance.
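The Exchange example can be worked through in a short Python sketch. The date, the midnight start of the hourly snapshot schedule, and the function names are assumptions for illustration; the three-second rollback comes from the text. It computes where each approach can roll back to and how much recent work is lost.

```python
from datetime import datetime, timedelta


def near_cdp_recovery_point(incident: datetime, first_snapshot: datetime,
                            interval: timedelta) -> datetime:
    """Newest scheduled snapshot at or before the incident."""
    if incident < first_snapshot:
        raise ValueError("no snapshot has been taken yet")
    snapshots_since_first = (incident - first_snapshot) // interval  # an int
    return first_snapshot + snapshots_since_first * interval


def real_cdp_recovery_point(incident: datetime,
                            margin: timedelta = timedelta(seconds=3)) -> datetime:
    """Real-CDP can roll back to just before the mistake; margin is how far back you choose."""
    return incident - margin


if __name__ == "__main__":
    deleted_at = datetime(2010, 7, 1, 10, 59)  # 59 minutes after the 10:00 snapshot
    first = datetime(2010, 7, 1, 0, 0)
    hourly = timedelta(hours=1)

    near = near_cdp_recovery_point(deleted_at, first, hourly)
    real = real_cdp_recovery_point(deleted_at)
    print("near-CDP rolls back to", near.time(), "losing", deleted_at - near)  # 10:00:00, 0:59:00
    print("real-CDP rolls back to", real.time(), "losing", deleted_at - real)  # 10:58:57, 0:00:03
```

The worst case for near-CDP is just under one full snapshot interval of lost changes, which is why shortening the interval is the usual way to tighten its RPO.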
Some people have problems with the term near-CDP. Why?
Certain people object to the term near-CDP. They say, "You're either continuous or you're not continuous. You can't be near-continuous." In my opinion, that's technically true, but there are a number of binary conditions where we use "near" or "nearly" as a modifier, such as "nearly dead" or "nearly full." To me, near-CDP has much more in common with real-CDP than it does with regular backups. It runs continuously throughout the day, and it's an incremental backup. What really separates both near-CDP and real-CDP from traditional data backups is the recovery process. With either kind of CDP, recovery in most cases is done by mounting a file system. With traditional data backups, we have to restore something, and that takes time that grows with the amount of data. If we are restoring a 10 TB Oracle database, we are going to be there for a while, whereas with a near-CDP or real-CDP system, we can simply mount the CDP copy and start using it immediately as the recovery system while we repair the primary system. However, with near-CDP you can't recover to every point in time, only to certain points in time. So again, it's near to being continuous, but it's not continuous. I'm fine with the term, and so are many others, but because of the controversy, it will probably never be an officially sanctioned term.
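A quick calculation shows why a 10 TB restore keeps you "there for a while." The sustained restore rates below are assumed example figures, not measurements of any product, and `restore_hours` is a hypothetical helper using decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes).

```python
def restore_hours(size_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy size_tb terabytes back at a sustained rate (decimal units)."""
    size_bytes = size_tb * 1e12
    seconds = size_bytes / (throughput_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    for rate in (200, 500, 1000):  # assumed sustained restore rates in MB/s
        print(f"10 TB at {rate} MB/s: {restore_hours(10, rate):.1f} hours")
```

At 200 MB/s this works out to about 13.9 hours, and even at 1,000 MB/s it is still about 2.8 hours of copying before the database is usable, whereas mounting a CDP copy does not have to move the data first.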
Can we consider CDP a replacement for backup?
Sort of. I say that because there are some caveats. But under many circumstances you can replace your traditional data backup system with continuous data protection. If you think about what a backup system needs, it needs an onsite copy and an offsite copy, so you do need the ability to get the data offsite. Unfortunately, some companies feel they have to have a tape copy. If you do have to have a tape copy, you can create it from a CDP system, but you'll need a traditional data backup system to do that. You're also going to need long-term retention.
A near-CDP or real-CDP system can do the same things as a traditional backup system except for the tape copy. Historically, real-CDP systems did not do long-term retention, but they can now, by switching to the snapshot model where they keep every change for only a limited period and keep selected points in time after that. As a result, they offer a much longer retention period than they used to, so you can recover your data and keep it for however long you're willing to buy disk for. On that same note, the amount of disk you need for a CDP system is similar to the amount you need for traditional backup; it doesn't require much more disk than a backup system using regular data deduplication. You can also have both onsite and offsite copies with CDP, because you back up to the onsite system, and that system can replicate to an offsite system. So I think CDP meets all of the real requirements of a backup system.
Frankly, I hope that CDP or something like it is adopted more widely in the future, because the way we've been doing backups is kind of silly. With traditional backup systems, you are creating full backups or full-file incrementals, meaning that if, for example, one block in my PowerPoint file changes, I have to back up the entire PowerPoint file. Continuous data protection, on the other hand, is a lot smarter: it captures only the blocks that change, and you don't have to do a full restore -- that's silly. We don't have time to do that in today's data center. Plus, CDP addresses most if not all of the issues that traditional backup systems have created over the decades. So yes, I think CDP could replace traditional backup, and I am seeing it replace traditional backup in a number of environments. I truly hope that it becomes more accepted over time and replaces traditional backup in a lot more environments.
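The PowerPoint example contrasts file-level and block-level change tracking. The Python sketch below compares the bytes each approach would copy after a one-byte edit. The 4 KiB block size and the function names are assumptions, and hashing fixed-size blocks is only an easy way to show which blocks differ; a CDP product would more likely track writes as they happen rather than comparing files afterward.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size in bytes


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """SHA-256 digest of each fixed-size block (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Indexes of blocks in `new` that differ from, or did not exist in, `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [i for i, digest in enumerate(new_digests)
            if i >= len(old_digests) or digest != old_digests[i]]


if __name__ == "__main__":
    old_file = bytes(10 * BLOCK_SIZE)  # a 40 KiB file of zeros
    edited = bytearray(old_file)
    edited[5000] = 1  # one byte changes, inside block 1
    blocks = changed_blocks(old_file, bytes(edited))
    print("file-level incremental copies", len(edited), "bytes")            # 40960
    print("block-level copies", len(blocks) * BLOCK_SIZE, "bytes", blocks)  # 4096 [1]
```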
Who should consider using CDP in their enterprise?
I think that everyone should consider continuous data protection. The biggest motivator would be if you have a really large system and you're not sure how you're going to meet your RPOs and recovery time objectives (RTOs). CDP systems can recover things from as recently as three seconds ago, and I don't know anyone who wouldn't be happy with that. Plus, the RTO can be as short as the time it takes to mount the recovery system. So CDP is the best choice when you have especially demanding RTOs and RPOs.
Another reason you might consider CDP is cost. You can spend hundreds of thousands or even millions of dollars on a dedupe system, but that's really just a band-aid for your backup system. You should consider going back to the drawing board and, instead of buying a big dedupe system, using a CDP system.
What CDP products are being offered today?
There are several. Most are offered by large companies such as Atempo Inc., CA, FalconStor, IBM Corp. and Symantec Corp. All of these large companies have some type of continuous data protection or near-CDP offering. NetApp is arguably the original, if not the most popular, company offering near-CDP. Also, any vendor that currently offers snapshots combined with replication in its products can be considered to offer near-CDP. All of the companies I just mentioned, with the exception of NetApp, have real-CDP offerings.
There are also smaller companies offering CDP, such as AppAssure Software Inc., Double-Take Software Inc. and InMage. If you're okay with using a startup company, I would urge you to look at those as well, because they're taking continuous data protection to the next level and using CDP as an enabler for their backup products. If you want a product from a more traditional, larger company, talk to your backup software vendor and find out what its CDP offerings are like.
This was first published in July 2010