Data Domain vs. EMC Avamar: Which deduplication technology is better?

Now that EMC owns both Data Domain and Avamar, I am constantly being asked which technology is better. Before the Data Domain acquisition, it was tough to get a straight answer because the two deduplication giants were constantly slugging it out and slandering each other to try and find an edge and gain more market share. With the two technologies now living under the same umbrella, sometimes it is hard to tell where one technology ends and the other begins.

Both Avamar and Data Domain have their pros and cons and the niches where they fit best, as well as places where they shouldn’t be deployed. If you’re reading this post, you’re probably trying to figure out which technology would be the best for you. In an effort to try and help sort the sales fluff from the truth, I have tried to summarize the similarities and differences between the two products so you can figure out which one best fits your environment.

First off, the two products share some common attributes, so let’s look at those first.

Block-based deduplication

Rather than just deduplicating full files that are exactly the same, Avamar and Data Domain will break files apart into small blocks and compare those blocks to ones that have already been backed up. Each unique block only needs to be backed up once within your environment. So let’s say you have a 100MB PowerPoint presentation that has your name on the front slide and you send it to 9 other people and each person makes a small change and puts their name on the front slide. Traditional backup technologies will see each one as a new file and back up 10 copies of the file, each 100MB in size. Avamar and Data Domain will break the file apart into blocks, see that only a small portion has changed, and back up just the changes.
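To make the PowerPoint example concrete, here is a minimal Python sketch of block-level deduplication. It is not how either product is implemented: the 4 KB block size, the BlockStore class, and the make_copies helper are all assumptions made up for illustration, fixed-size blocks are used only to keep it short, and the files are scaled down to about 1 MB each so it runs quickly.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # illustrative block size, not a vendor value


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into consecutive blocks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Hypothetical backup store that keeps each unique block only once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 digest -> block bytes

    def backup(self, data):
        """Store `data`; return (recipe, bytes_newly_stored)."""
        recipe = []
        new_bytes = 0
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_bytes += len(block)
            recipe.append(digest)  # the file becomes a list of pointers
        return recipe, new_bytes

    def restore(self, recipe):
        """Rebuild a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in recipe)


def make_copies(count=10, body_size=1_000_000):
    """Build `count` files that differ only in a 64-byte title field."""
    body = os.urandom(body_size)
    return [f"Presenter {i:02d}".encode().ljust(64) + body for i in range(count)]


store = BlockStore()
for number, copy in enumerate(make_copies(), start=1):
    _, new_bytes = store.backup(copy)
    print(f"copy {number}: file is {len(copy)} bytes, newly stored {new_bytes} bytes")
```

The first copy is stored in full. Each later copy only adds the one block that contains the changed name, and every file can still be rebuilt from its list of block digests.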

Variable length deduplication

Both Avamar and Data Domain utilize variable length deduplication rather than fixed length de-dupe. What does that mean in layman’s terms? Fixed-length deduplication always looks for segments of the same size when looking for common data. So if you use a 128K fixed block, it will look at the first 128K of the file, then the second 128K of the file, and so on looking for common data. Variable-length deduplication takes a more intelligent approach and can vary the size of the segment when it is looking for commonality. This means if small changes are inserted into the middle of a file, it is smart enough to pick out just those changes and still see the common data around them. When changes are inserted into the middle of a file with fixed-length deduplication, the data will typically shift and the remainder of the file can often be seen as all new data. Avamar and Data Domain both utilize variable-length deduplication and that is why you will typically see much higher commonality rates than most of the competition.
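The shifting problem is easier to see in code. The Python sketch below compares fixed-length chunking with a simple content-defined (variable-length) chunker built on a gear-style rolling hash. This is a generic textbook technique that I am swapping in for illustration, not the algorithm Avamar or Data Domain actually uses, and every constant in it (GEAR, MIN_CHUNK, MAX_CHUNK, MASK, the 2 KB fixed size) is an assumption.

```python
import hashlib
import random

# Table of pseudo-random 32-bit values, one per byte value (illustrative).
_table_rng = random.Random(42)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]

MIN_CHUNK = 512        # never cut a chunk shorter than this
MAX_CHUNK = 8192       # always cut once a chunk reaches this length
MASK = (1 << 11) - 1   # a cut point shows up roughly every 2 KB of data


def fixed_chunks(data, size=2048):
    """Fixed-length segmentation: cut every `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data):
    """Content-defined segmentation: cut where the rolling hash hits a pattern."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # The low bits of `rolling` depend only on the last few bytes seen,
        # so cut points follow the content, not the position in the file.
        rolling = ((rolling << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (rolling & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def new_bytes(old_chunks, new_chunks):
    """Bytes in `new_chunks` whose content was not present in `old_chunks`."""
    seen = {hashlib.sha256(c).digest() for c in old_chunks}
    return sum(len(c) for c in new_chunks if hashlib.sha256(c).digest() not in seen)


def compare(original, edited):
    """Return (new bytes with fixed chunks, new bytes with variable chunks)."""
    fixed = new_bytes(fixed_chunks(original), fixed_chunks(edited))
    variable = new_bytes(variable_chunks(original), variable_chunks(edited))
    return fixed, variable


data_rng = random.Random(7)
original = bytes(data_rng.getrandbits(8) for _ in range(100_000))
edited = original[:30_000] + b"a few new bytes" + original[30_000:]
fixed, variable = compare(original, edited)
print(f"fixed-length sees {fixed} new bytes, variable-length sees {variable}")
```

With fixed-length chunks, everything after the insertion point shifts and looks new. With content-defined chunks, the cut points move along with the data, so only the chunks around the insertion are treated as new.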

That is where the similarities end and the differences begin. Here are some key areas where the two products differ:

Where does the deduplication happen?

Data Domain utilizes target-based deduplication. The Data Domain appliance is simply a disk target that you point your backup software at. Backups leave the server in their full format and are deduplicated on the fly as they hit the Data Domain appliance. The data flowing out of the server and across the network is not reduced, but the amount of data stored on disk is reduced significantly.

On the other side, Avamar utilizes source-based deduplication. Since Avamar is both the backup software and backup-to-disk target, it can actually deduplicate the data before it leaves the server. This means that files are broken apart and deduplicated before any backup data is sent across the network. Only the changed blocks are sent across the network to the backup-to-disk target. This results in a reduction in network traffic, the amount of data stored on disk, and also the time it takes you to back up.
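Here is a rough Python model of the difference between the two approaches. It is a simplified sketch and not either product's actual protocol: the class names, the 4 KB block size, and the idea that the client simply remembers which block hashes it has already sent are assumptions for illustration, and the small amount of traffic needed to exchange hashes is ignored.

```python
import hashlib
import os

BLOCK = 4096  # illustrative block size


def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def digest(block):
    return hashlib.sha256(block).hexdigest()


class TargetDedupAppliance:
    """Hypothetical target-side model: full backup arrives, dedup happens on arrival."""

    def __init__(self):
        self.store = {}

    def receive_full_backup(self, data):
        """Return (bytes_over_network, bytes_newly_stored)."""
        stored = 0
        for block in split_blocks(data):
            d = digest(block)
            if d not in self.store:
                self.store[d] = block
                stored += len(block)
        return len(data), stored  # the whole backup crossed the network


class SourceDedupClient:
    """Hypothetical source-side model: the agent skips blocks it has already sent."""

    def __init__(self, server_store):
        self.server_store = server_store
        self.known = set()  # hashes of blocks this client has sent before

    def backup(self, data):
        """Return (bytes_over_network, bytes_newly_stored)."""
        sent = stored = 0
        for block in split_blocks(data):
            d = digest(block)
            if d in self.known:
                continue  # nothing leaves the server for this block
            self.known.add(d)
            sent += len(block)
            if d not in self.server_store:
                self.server_store[d] = block
                stored += len(block)
        return sent, stored


# Two nightly backups of the same 10-block data set; one block changes overnight.
night1 = os.urandom(10 * BLOCK)
night2 = night1[:3 * BLOCK] + os.urandom(BLOCK) + night1[4 * BLOCK:]

appliance = TargetDedupAppliance()
client = SourceDedupClient(server_store={})
for label, data in (("night 1", night1), ("night 2", night2)):
    print(label, "target:", appliance.receive_full_backup(data),
          "source:", client.backup(data))
```

On the second night both models store only the one changed block, but the target-side model still moved the full backup across the network while the source-side model moved a single block.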

What is included?

Data Domain was designed to very simply integrate into any existing backup environment with very little effort. You still utilize your existing backup software and just point to the Data Domain appliance as a backup target.

Avamar is a rip-and-replace for your current backup environment as it includes both backup software and a B2D appliance. You will remove your current backup agent from your servers and load the Avamar agent in its place. This agent is how Avamar is able to deduplicate at the server level.

How do you expand?

Data Domain comes as an appliance with disk built-in. Depending on the model, you can expand by adding drives until you hit the maximum amount for that model. When you reach the maximum capacity for the model, you must purchase a new unit to upgrade. A Data Domain gateway product is also available which allows you to use your existing storage behind it.

Avamar was designed as a node-based grid solution. Each node has a specific capacity and if you need more backup storage, you add more nodes to your grid. Data is striped within the nodes and also striped across the nodes for additional protection. This means your data is safer, but there is also a parity overhead to be aware of.

Do you really need tape?

Both Avamar and Data Domain allow you to back up to an appliance and then replicate the data offsite to a sister appliance. Now that you have your backups geographically dispersed, do you really need tape? Most people will say no because tape becomes more of a liability and security risk than anything if you already have your backups offsite. However, sometimes the strictest of compliance departments will absolutely require tape even if the backups are stored offsite on disk. If that is the case, you have some decisions to make.

Data Domain doesn’t have any native tape-out functionality, but that is by design. Since you’re using your existing backup software to push data to the Data Domain, you would use that software to push backups off to tape. It is as simple as doing a copy job and copying your backup set to another media.

Avamar was originally designed as a completely tapeless solution. The backup data is pushed to the Avamar appliance and then replicated offsite. However, as Avamar became more popular and people with tape requirements wanted to jump on the bandwagon, EMC began developing a “tape-out” functionality. The first release of Avamar tape-out was basically a script to do a rolling restoration of your backups to a proxy server and then use another backup software to move that restored backup to tape. Not the prettiest scenario in the world, but if you wanted to move your backups to tape monthly, it was serviceable. In the next release of Avamar, the tape-out functionality is being totally re-written and looks much more promising. Stay tuned for more information on this as it becomes available.

So when you’re evaluating which deduplication technology is the best for you, make sure to consider your need for tape in the decision. Tape with Data Domain is much simpler, but Avamar is working to get there too.

There are many other similarities and differences between the two products, but those are some of the heavy hitters. So let’s take a few specific use-cases and see which technology fits the best:

I need to decrease my backup time – If your backups are quickly growing out of your backup window, you need to find a way to back up more data in less time. Data Domain could speed up your process since backing up to disk is faster than tape. However, you are still sending your backup data in its full format, so that change will likely be minor. Avamar will deduplicate at the server and only send the changed blocks across the network, and this will drastically decrease your backup time.

Advantage: Avamar

I love my existing backup software and just want to decrease my backup footprint – If your existing backup software is working great and you’re just looking to do backup-to-disk with deduplication, Data Domain is the perfect solution for you. Simply plug in the Data Domain appliance, point your backup software to it, and your backups will be deduplicated as they hit the appliance. You can also replicate offsite if needed. Avamar requires removal of your existing backup software and isn’t a great fit for this scenario.

Advantage: Data Domain

I need a better way to back up my remote offices – If you have data at your remote offices and you want to centralize your backups without having huge WAN links, deduplication is a must. Data Domain can help by putting a small appliance at each site and replicating into a large appliance to centralize the backups. Avamar can take it one step further because it deduplicates data before sending across the wire. Because of this, in some situations, you can just put agents at the remote sites and send only the changed blocks across the WAN to your centralized site. This allows you to avoid putting backup hardware at remote sites.

Advantage: Avamar

My Compliance Department requires that I use tape on a weekly basis – If replicating your backups offsite isn’t enough and you must use tape on a regular basis, Data Domain will easily allow you to push to tape by utilizing your existing backup software. Avamar has methods to push to tape that were mentioned above, but they aren’t designed for frequent use.

Advantage: Data Domain

So what does all of this mean? The moral of the story is that there isn’t a one-size-fits-all backup solution in the market right now. You really need to know exactly what you want to accomplish, compare that with the product feature sets, and determine which is the best fit. The key is understanding what each product has to offer and how that fits into your environment.

Photo Credit: Dunechaser via Flickr

9 Comments

  • Dan Gauld says:

    Nice post. Very informative. Wondering do you have an opinion on which technology works best with Databases? I hear the avamar DB restoration sucks wind.

  • Sean Livingstone says:

    Some comments on yours. :-)

    In the section “Block-based deduplication”, the last statement is not 100% correct, both will see the same block but only Avamar will not back it up, again. Data Domain will not STORE it again however, the backup software being used with Data Domain will backup the same data over and over again, especially the full backups. So, Avamar will move the duplicate segment only once during backup, in the Data Domain case the associated backup application will move the same data over and over again. This is sort of mentioned in the section “Where does the deduplication happen?” but, it is worth expanding on it in the first section, just to provide clarity.

    In the section “Where does the deduplication happen?”, it mentions that Avamar is rip-and-replace, which is correct for the servers that you will be protecting with Avamar. Good to note, though, that Avamar can work alongside users’ existing backup applications, for the use cases where Avamar is not a good fit. Many customers have this environment working and use both to their advantage.

    In the section “How do you expand?” it mentions that Avamar has parity overhead that a user has to be aware of. Data Domain uses RAID 6, which also has parity, double parity overhead in RAID 6’s case. This is not a bad thing because it serves to protect the backed up data, which is all good. Point is both have parity overhead.

    Thank you.

  • Roseanne Sullivan says:

    Another difference between Avamar and Data Domain deduplication is in the area of replication. I am a Data Domain technical writer who recently took an Avamar Administration course and looked at their user guide, so I have a cursory familiarity.
    – Replication in Data Domain depends on an optional software feature that is licensed.
    – Replication seems to be built into the Avamar server software
    Avamar supports full (or root-to-root) replication, which creates a complete logical copy of an entire source server on the destination Avamar server.
    Not sure whether Data Domain Replicator enables root replication.
    It seems that replication in Avamar must be done only when backups are finished. In Data Domain, replication can occur simultaneously.

  • Justin says:

    Dan – When it comes to databases and deduplication, your mileage truly will vary. I’ve got customers who are getting 97% commonality on full database backups and others who get under 40%. Both of those are extremes and the rates you’ll see will typically be somewhere in the middle. Your comment on Avamar database restores sucking is likely regarding remote office restores. Avamar is great because in many cases, it will allow you to back up remote office servers to a central location directly over the wire because it deduplicates the data before sending it across the WAN. The one thing to remember is that if you have to do a full database restore, the entire database must be copied across the wire because none of the data exists at the remote site. Avamar does a great job with databases but I always warn people to have realistic expectations around restore times over the wire and the deduplication rates that databases will get in comparison with file servers.

    Sean – You make a couple of great points and I appreciate you providing some clarity. Your point on block-based deduplication is spot-on and is definitely a big differentiation point between Avamar and Data Domain and how much data is actually moved off of the server being backed up.

    Regarding the rip-and-replace aspect of Avamar, it definitely only applies to servers where you will be deploying Avamar. I have seen many people bring in Avamar for a specific subset of data such as remote office backups, VMware, etc. and leave their legacy backup software for other portions of the environment. Given the node-based architecture of Avamar, that makes it very easy for people to phase Avamar in and add nodes over time instead of doing a full rip-and-replace of their existing backup architecture.

    Great point on the parity overhead of both Avamar and Data Domain as well. Data Domain has the RAID 6 overhead and Avamar has both the RAID 5 and RAIN overhead. I agree 100% that this is far from a bad thing. Since you’re only storing each unique piece of data in your environment one time, maintaining the integrity of that data becomes even more critical. Whether it is Data Domain’s Data Invulnerability Architecture or Avamar’s use of RAIN and Snapshots, both do an amazing job of making sure your backups are safe.

    Roseanne – As you stated, Avamar replication is built-in and does not require an additional license. Data Domain’s replication does require an optional license on both sides. With Avamar, you can choose which backup sets are replicated and if you want to replicate your daily, weekly, monthly, and/or annual backups. Replication with Avamar is scheduled as part of the backup job and happens once the backup completes.

    In the Data Domain world, backups can be sent to different folders and then you choose which folders to replicate. It does not natively allow for granular replication of just your monthly or weekly backups. However, you could tell your backup software to put your monthly backups in a different folder and replicate only that folder.

    Those are two more distinct differences that weren’t mentioned in my original post. Thanks for bringing them up.

  • Justin–great summary of the differences. I’m the founder of Avamar, and it’s always eye opening to find people who understand the technology well enough to communicate competitive differentiation.

    A quick note on databases in the comments section: variable segmentation doesn’t help for databases. Databases store data in blocks inside files. Those blocks are fixed-sized blocks (often 8K), and the blocks have structure that resists de-duplication (unique IDs, transaction record log information, integrity check information, actual user data, etc.). These “structural wrappers” decrease the efficiency of de-duplication solutions; while you will still see day-over-day (e.g. snapshot) and compression benefits, you will rarely see global de-duplication benefits. As a result, databases tend to pig up storage in de-duplication solutions and create capacity management challenges (you need global de-duplication benefits to outweigh the overhead for de-duplication).

    While today’s solutions will work OK, databases are unique applications that really need unique solutions.

  • Arthur Clay says:

    Our experience with Avamar after using it for about 1 year is very positive for file server data and most databases. It works well even replicating across slow WAN links (as long as they are reliable).

    Downsides:
    Exchange deduplication is awful. Do not rely on this to work if you have medium-to-large Exchange databases or slow WAN links to replicate across.

    Initial seeding of data is poorly thought through. Ensure you have an alternative way to get your data seeded to your replication destination (e.g. tape or high speed link).

  • Mike G. says:

    With Avamar performing its dedupe at the source, is it made aware of blocks of data that are common with other servers?

    For example, if 2 servers both have the same 100MB file, are they aware of each other’s blocks of data, or is the Avamar agent only aware of the data of a single machine?

    • Justin Mescher says:

      @Mike – Each host maintains a hash cache file which keeps track of every unique block that has been backed up from that host. It then uses that hash file to determine whether a changed block needs to be sent to the Avamar Datastore. In your example, the block is unique to the server, but already exists in the Avamar Datastore. In that case, the server will send the block to the Datastore and it will be discarded and a pointer stored instead. So you would not be saving bandwidth in this example, but you would still be saving space on disk. Hopefully this answers your question.

      Thanks!
      Justin
