Data Domain vs. EMC Avamar: Which deduplication technology is better?

November 9, 2009 | Avamar, Deduplication, EMC
Lego Ninja vs. Robot

Data Domain vs. Avamar – Which is better?

Now that EMC owns both Data Domain and Avamar, I am constantly being asked which technology is better. Before the Data Domain acquisition, it was tough to get a straight answer because the two deduplication giants were constantly slugging it out and slandering each other to find an edge and gain more market share. With the two technologies now living under the same umbrella, it is sometimes hard to tell where one technology ends and the other begins.

Both Avamar and Data Domain have their pros and cons and the niches where they fit best, as well as places where they shouldn’t be deployed. If you’re reading this post, you’re probably trying to figure out which technology would be the best for you. To help sort the sales fluff from the truth, I have summarized the similarities and differences between the two products so you can figure out which one best fits your environment.

First off, the two products share some common attributes, so let’s look at those first.

Block-based deduplication

Rather than just deduplicating full files that are exactly the same, Avamar and Data Domain break files apart into small blocks and compare those blocks to ones that have already been backed up. Each unique block only needs to be backed up once within your environment. So let’s say you have a 100MB PowerPoint presentation that has your name on the front slide, you send it to 9 other people, and each person makes a small change and puts their name on the front slide. Traditional backup technologies will see each one as a new file and back up 10 copies of the file, each 100MB in size. Avamar and Data Domain will break the file apart into blocks, see that only a small portion has changed, and back up just the changes.
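To make the PowerPoint example concrete, here is a minimal Python sketch of block-level deduplication. It is an illustration only, not how either product is implemented: the 4 KB block size, the `BlockStore` class, and the `make_presentation` helper are all made up for this example, and it uses fixed-size blocks to keep things short.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a real product setting


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into consecutive blocks of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Hypothetical backup store that keeps each unique block only once."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}  # fingerprint -> block contents

    def backup(self, data: bytes) -> int:
        """Back up `data` and return how many new bytes had to be stored."""
        new_bytes = 0
        for block in split_blocks(data):
            fingerprint = hashlib.sha256(block).digest()
            if fingerprint not in self.blocks:
                self.blocks[fingerprint] = block
                new_bytes += len(block)
        return new_bytes


def make_presentation(name: str) -> bytes:
    """Fake 100-block file: one title block with a name, 99 shared blocks."""
    title = name.encode().ljust(BLOCK_SIZE, b" ")
    body = b"".join(bytes([i]) * BLOCK_SIZE for i in range(99))
    return title + body


store = BlockStore()
first = store.backup(make_presentation("Alice"))   # every block is new
second = store.backup(make_presentation("Bob"))    # only the title block is new
print(f"first copy stored {first} bytes, second copy stored {second} bytes")
```

The first copy stores all 100 blocks, while the second copy only adds the one block that holds the new name. A traditional backup would have stored all 100 blocks both times.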

Variable length deduplication

Both Avamar and Data Domain utilize variable-length deduplication rather than fixed-length dedupe. What does that mean in layman’s terms? Fixed-length deduplication always looks for segments of the same size when looking for common data. So if you use a 128K fixed block, it will look at the first 128K of the file, then the second 128K of the file, and so on, looking for common data. Variable-length deduplication takes a more intelligent approach and can vary the size of the segment when it is looking for commonality. This means if small changes are inserted into the middle of a file, it is smart enough to pick out just those changes and still see the common data around them. When changes are inserted into the middle of a file with fixed-length deduplication, the data after the insertion shifts, the block boundaries no longer line up, and the remainder of the file can often be seen as all new data. That is why you will typically see much higher commonality rates with Avamar and Data Domain than with most of the competition.
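The shifting problem is easier to see in code. The sketch below compares fixed-length blocks with a very simple content-defined chunker that cuts a segment wherever a checksum of the last few bytes hits a chosen value. This is a generic textbook-style technique and an assumption on my part, not the actual algorithm used by Avamar or Data Domain. The window size, divisor, and block size are arbitrary, and real implementations also enforce minimum and maximum segment sizes, which this sketch skips.

```python
import hashlib
import random
import zlib


def fixed_chunks(data: bytes, size: int = 256) -> list[bytes]:
    """Fixed-length segments: boundaries depend only on the byte offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 16, divisor: int = 64) -> list[bytes]:
    """Variable-length segments: cut wherever the checksum of the previous
    `window` bytes is divisible by `divisor`, so boundaries follow content."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        if zlib.crc32(data[i - window:i]) % divisor == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks


def count_new(old: list[bytes], new: list[bytes]) -> int:
    """Count segments in `new` whose fingerprint was not seen in `old`."""
    seen = {hashlib.sha256(c).digest() for c in old}
    return sum(1 for c in new if hashlib.sha256(c).digest() not in seen)


# Pseudo-random test data with 8 bytes inserted near the front of the file.
rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(32768))
modified = original[:4196] + b"INSERTED" + original[4196:]

new_fixed = count_new(fixed_chunks(original), fixed_chunks(modified))
new_variable = count_new(variable_chunks(original), variable_chunks(modified))
print(f"new segments with fixed-length chunking:    {new_fixed}")
print(f"new segments with variable-length chunking: {new_variable}")
```

With fixed-length blocks, every block after the insertion point is shifted by 8 bytes and looks new. With content-defined boundaries, the cut points move with the data, so only the handful of segments around the insertion are new and everything after them matches what was already backed up.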

That is where the similarities end and the differences begin. Here are some key areas where the two products differ:

Where does the deduplication happen?

Data Domain utilizes target-based deduplication. The Data Domain appliance is simply a disk target that you point your backup software at. Backups leave the server in their full format and are deduplicated on the fly as they hit the Data Domain appliance. The data flowing out of the server and across the network is not reduced, but the amount of data stored on disk is reduced significantly.

On the other side, Avamar utilizes source-based deduplication. Since Avamar is both the backup software and backup-to-disk target, it can actually deduplicate the data before it leaves the server. This means that files are broken apart and deduplicated before any backup data is sent across the network. Only the changed blocks are sent across the network to the backup-to-disk target. This results in a reduction in network traffic, the amount of data stored on disk, and also the time it takes you to back up.
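Here is a toy Python model of the difference, counting the bytes that cross the network in each approach. Everything in it is an assumption made for illustration: the `DedupTarget` class, the 4 KB blocks, and the idea that the client sends a 32-byte fingerprint per block to ask whether the target already has it. It is not the actual protocol of either product.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size


def blocks_of(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


class DedupTarget:
    """Hypothetical backup-to-disk target that stores each block once."""

    def __init__(self) -> None:
        self.index: set[bytes] = set()
        self.stored_bytes = 0

    def has(self, fingerprint: bytes) -> bool:
        return fingerprint in self.index

    def store(self, fingerprint: bytes, block: bytes) -> None:
        if fingerprint not in self.index:
            self.index.add(fingerprint)
            self.stored_bytes += len(block)


def target_side_backup(data: bytes, target: DedupTarget) -> int:
    """Send every block; the target deduplicates on arrival.
    Returns the number of bytes sent over the network."""
    sent = 0
    for block in blocks_of(data):
        sent += len(block)
        target.store(hashlib.sha256(block).digest(), block)
    return sent


def source_side_backup(data: bytes, target: DedupTarget) -> int:
    """Send a fingerprint per block, and the block itself only if the
    target does not already have it. Returns the bytes sent."""
    sent = 0
    for block in blocks_of(data):
        fingerprint = hashlib.sha256(block).digest()
        sent += len(fingerprint)
        if not target.has(fingerprint):
            sent += len(block)
            target.store(fingerprint, block)
    return sent


# Day 1: 50 distinct blocks. Day 2: the same data with one block changed.
day1_blocks = [bytes([i]) * BLOCK_SIZE for i in range(50)]
day2_blocks = list(day1_blocks)
day2_blocks[10] = bytes([200]) * BLOCK_SIZE
day1, day2 = b"".join(day1_blocks), b"".join(day2_blocks)

for name, backup in [("target-side", target_side_backup),
                     ("source-side", source_side_backup)]:
    target = DedupTarget()
    sent_day1 = backup(day1, target)
    sent_day2 = backup(day2, target)
    print(f"{name}: day 1 sent {sent_day1}, day 2 sent {sent_day2}, "
          f"stored {target.stored_bytes}")
```

Both approaches end up storing the same amount on disk. The difference is on the wire: the target-side backup sends the full data set on day 2, while the source-side backup sends only the fingerprints plus the one changed block.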

What is included?

Data Domain was designed to very simply integrate into any existing backup environment with very little effort. You still utilize your existing backup software and just point to the Data Domain appliance as a backup target.

Avamar is a rip-and-replace for your current backup environment as it includes both backup software and a B2D appliance. You will remove your current backup agent from your servers and load the Avamar agent in its place. This agent is how Avamar is able to deduplicate at the server level.

How do you expand?

Data Domain comes as an appliance with disk built-in. Depending on the model, you can expand by adding drives until you hit the maximum amount for that model. When you reach the maximum capacity for the model, you must purchase a new unit to upgrade. A Data Domain gateway product is also available which allows you to use your existing storage behind it.

Avamar was designed as a node-based grid solution. Each node has a specific capacity and if you need more backup storage, you add more nodes to your grid. Data is striped within the nodes and also striped across the nodes for additional protection. This means your data is safer, but there is also a parity overhead to be aware of.

Do you really need tape?

Both Avamar and Data Domain allow you to back up to an appliance and then replicate the data offsite to a sister appliance. Now that you have your backups geographically dispersed, do you really need tape? Most people will say no because tape becomes more of a liability and security risk than anything if you already have your backups offsite. However, sometimes the strictest of compliance departments will absolutely require tape even if the backups are stored offsite on disk. If that is the case, you have some decisions to make.

Data Domain doesn’t have any native tape-out functionality, but that is by design. Since you’re using your existing backup software to push data to the Data Domain, you would use that software to push backups off to tape. It is as simple as running a copy job that copies your backup set to another medium.

Avamar was originally designed as a completely tapeless solution. The backup data is pushed to the Avamar appliance and then replicated offsite. However, as Avamar became more popular and people with tape requirements wanted to jump on the bandwagon, EMC began developing a “tape-out” functionality. The first release of Avamar tape-out was basically a script to do a rolling restoration of your backups to a proxy server and then use another backup software to move that restored backup to tape. Not the prettiest scenario in the world, but if you wanted to move your backups to tape monthly, it was serviceable. In the next release of Avamar, the tape-out functionality is being totally re-written and looks much more promising. Stay tuned for more information on this as it becomes available.

So when you’re evaluating which deduplication technology is the best for you, make sure to consider your need for tape in the decision. Tape with Data Domain is much simpler, but Avamar is working to get there too.

There are many other similarities and differences between the two products, but those are some of the heavy hitters. So let’s take a few specific use-cases and see which technology fits the best:

I need to decrease my backup time – If your backups are quickly growing out of your backup window, you need to find a way to back up more data in less time. Data Domain could speed up your process since backing up to disk is faster than tape. However, you are still sending your backup data in its full format, so that change will likely be minor. Avamar will deduplicate at the server and only send the changed blocks across the network, and this will drastically decrease your backup time.

Advantage: Avamar

I love my existing backup software and just want to decrease my backup footprint – If your existing backup software is working great and you’re just looking to do backup-to-disk with deduplication, Data Domain is the perfect solution for you. Simply plug in the Data Domain appliance, point your backup software to it, and your backups will be deduplicated as they hit the appliance. You can also replicate offsite if needed. Avamar requires removal of your existing backup software and isn’t a great fit for this scenario.

Advantage: Data Domain

I need a better way to back up my remote offices – If you have data at your remote offices and you want to centralize your backups without having huge WAN links, deduplication is a must. Data Domain can help by putting a small appliance at each site and replicating into a large appliance to centralize the backups. Avamar can take it one step further because it deduplicates data before sending it across the wire. Because of this, in some situations, you can just put agents at the remote sites and send only the changed blocks across the WAN to your centralized site. This allows you to avoid putting backup hardware at remote sites.

Advantage: Avamar

My Compliance Department requires that I use tape on a weekly basis – If replicating your backups offsite isn’t enough and you must use tape on a regular basis, Data Domain will easily allow you to push to tape by utilizing your existing backup software. Avamar has methods to push to tape that were mentioned above, but they aren’t designed for frequent use.

Advantage: Data Domain

So what does all of this mean? The moral of the story is that there isn’t a one-size-fits-all backup solution in the market right now. You really need to know exactly what you want to accomplish, compare that with the product feature sets, and determine which is the best fit. The key is understanding what each product has to offer and how that fits into your environment.

Photo Credit: Dunechaser via Flickr