OK, so you have decided that deduplication is the best thing ever and a must-have for your backup needs. The next big question on the horizon is what KIND of deduplication is right for you. Two of the big hitters in the market today are EMC’s Avamar and CommVault’s Simpana. Both products seem to be doing very well in the wild, and they approach deduplication in completely different ways.
In the case of Avamar, the product deduplicates at the client using variable block deduplication. Once the scan is complete on the client server and the deduplication hashes are created, the client checks back with the Avamar Data Store appliance farm to see which blocks the farm has not seen, and then only the blocks that are truly unique across the environment (not just that server) are sent over the wire. This results in extremely high levels of deduplication AND remarkably fast backups, since very little data is normally left to send after deduplication and comparison to the rest of the environment. The data is stored on the EMC Data Store appliance, which presents a pretty simple GUI for recovery. The only major chink in Avamar’s armor is that it cannot natively create tapes for those data sets you may want to retain longer than you have space to keep on the appliance farm.
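To make the client-side flow concrete, here is a minimal Python sketch of the idea: chunk at content-defined boundaries, hash each chunk, ask the store which hashes it has never seen, and send only those. This is not Avamar's actual algorithm or wire protocol. The names `chunk_variable`, `DedupStore`, and `backup`, the toy rolling checksum, and the chunk size limits are all assumptions made up for illustration.

```python
import hashlib


def chunk_variable(data, mask=0x3F, min_size=16, max_size=256):
    """Split data at content-defined boundaries (toy rolling checksum).

    A boundary is declared when the low bits of the checksum match the mask,
    so boundaries follow the content rather than fixed offsets.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        at_boundary = size >= min_size and (rolling & mask) == mask
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupStore:
    """Hypothetical stand-in for the appliance farm: a global hash -> block map."""

    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        """Return the digests the store has never seen from any client."""
        return {d for d in digests if d not in self.blocks}

    def put(self, digest, chunk):
        self.blocks[digest] = chunk

    def restore(self, recipe):
        """Rebuild a backup from its ordered list of digests."""
        return b"".join(self.blocks[d] for d in recipe)


def backup(data, store):
    """Client side: hash locally, then send only chunks the store lacks."""
    chunks = chunk_variable(data)
    recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
    unseen = store.missing(recipe)  # one round trip carrying hashes only
    bytes_sent = 0
    for digest, chunk in zip(recipe, chunks):
        if digest in unseen:
            store.put(digest, chunk)
            bytes_sent += len(chunk)
            unseen.discard(digest)  # do not resend a repeat within this backup
    return recipe, bytes_sent
```

Backing up the same data a second time through `backup`, even from a different client, returns a `bytes_sent` of zero because every hash is already known to the store. That is the property that shrinks the backup window in this model.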
CommVault came at deduplication a completely different way and leveraged their existing tape archive construct to create a form of fixed block deduplication. In this case the clients do the same thing they always did: run their scans, package the data, and then shoot it out. Once the data gets to the Media Agent, the deduplication occurs and the data is written out to any disk target supported by CommVault. Since the deduplication is fixed block, the deduplication ratios are not as good as with variable block (as in Avamar), but certainly much better than typical compression. Since the deduplication occurs on the Media Agent, after the full data set has already crossed the wire, there are no savings in backup window time. The good news is that this is CommVault, and cutting tapes is its forte: the process is completely automated, with the ability to set different retentions on each type of media to fit all your compliance desires within a single tool. Also, since the format of the media archive did not change, restores are just as fast with deduplicated data as they were with plain Jane backup to disk, which is huge if you have a lot of data to restore. Avamar can be sluggish in terms of restore on the smaller deployments but is still certainly functional for your everyday restore needs.
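The target-side model can be sketched the same way. The toy Python below is not CommVault's implementation; `MediaAgent`, `ingest`, and the 128-byte block size are hypothetical names and values chosen for illustration. It shows two things from the paragraph above: the agent receives every byte before it deduplicates, and fixed block boundaries are sensitive to data shifting.

```python
import hashlib

BLOCK_SIZE = 128  # illustrative only; real products use far larger blocks


def fixed_blocks(data, block_size=BLOCK_SIZE):
    """Cut data at fixed offsets, regardless of content."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class MediaAgent:
    """Hypothetical target-side deduplicator in front of a disk target."""

    def __init__(self):
        self.disk = {}  # digest -> block already written to disk

    def ingest(self, stream):
        """Receive the whole stream, then write only blocks not yet on disk.

        Returns (bytes_received, bytes_written).
        """
        received = len(stream)  # the full stream has already crossed the wire
        written = 0
        for block in fixed_blocks(stream):
            digest = hashlib.sha256(block).digest()
            if digest not in self.disk:
                self.disk[digest] = block
                written += len(block)
        return received, written
```

Ingesting the same stream twice writes nothing new the second time, yet `bytes_received` is the full size on both runs, which is why the backup window does not shrink. Prepending a single byte to the stream moves every fixed boundary, so previously stored blocks stop matching. That sensitivity is one reason fixed block ratios trail variable block ratios.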
In the grand scheme, Avamar is the holy grail of backup speed since it only ever sends fractions of the incremental data over the wire to the target, which reduces not only backup times but also the impact of backup on both the hosts and the network. I also give CommVault a major tip of the hat for how they leveraged their existing technology and morphed it into a deduplication technology that brings huge benefits to their current customer base while staying on the commodity hardware bandwagon.
Obviously there are many more features to both products worth investigating and comparing, but now you know how the two differ technically in terms of the deduplication angle.
Photo Credit: JTony via Flickr