
Avamar And Data Domain: The Two Best Deduplication Software Appliances

By | Avamar, Deduplication, Networking, Storage | No Comments

For years, backup-to-disk technologies have evolved to ingest large amounts of data very quickly, especially when compared to the newest tape options. This evolution has pushed backup applications to leverage disk targets for equally fast restores, even down to the file level.

Essentially, this means that today clients can integrate disk-based backup solutions to:

– Mitigate the risk of traditional tape failures

– Reduce the amount of time it takes to perform large back-up jobs

– Reduce the capacity required at the backup target by roughly 10-20X compared to tape

– Reduce the amount of data traversing the network during a back-up job (Avamar or similar “source-based” technologies)

– Lower the total cost of ownership in comparison to tape

– Enable clients to automate the “off-site” requirement for tape by replicating one disk system to another over long distances

– Lower RTO and RPO for clients through customizable policies

Data Domain deduplication is useless without backup software in place. By leveraging Data Domain’s OST functionality (DDBoost), we can now combine Data Domain’s deep compression ability with Avamar’s superior archiving abilities.

With source-based deduplication, Avamar deduplicates data on the host side, enabling environments with lower bandwidth and longer backup windows to complete backups much faster. And after the initial backup, this strategy leaves less data on disk, which is good for everyone.
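To make the source-based idea concrete, here is a minimal shell sketch of the concept – fixed 4KB chunks and md5 stand in for Avamar’s real variable-block chunking and hashing, and the backup path is hypothetical. The client chunks the data, hashes each chunk, and only “sends” chunks the target has never seen before:

# Conceptual sketch only – not Avamar's actual protocol.
mkdir -p /tmp/dedup-demo && cd /tmp/dedup-demo
split -b 4096 /path/to/backup.dat chunk.
touch target_hashes.txt   # stands in for the hashes the grid already knows
for c in chunk.*; do
  h=$(md5sum "$c" | awk '{print $1}')
  if grep -q "$h" target_hashes.txt; then
    echo "$c: duplicate - send a hash reference only"
  else
    echo "$h" >> target_hashes.txt
    echo "$c: unique - send the full chunk over the wire"
  fi
done

On a second run against similar data, nearly every chunk takes the “duplicate” path – which is exactly why daily source-deduplicated backups move so little data.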


Where Data Domain shines most is in its ability to compress the already-deduplicated data roughly 10X further than Avamar. This integration allows Avamar to send weekend, month-end and year-end backups to the Data Domain, allowing for much longer retention. It extends Avamar’s reach into long retention cycles on disk, which remains one of the faster restore methods.

Data Domain’s “target-based” deduplication means the deduplication happens at the DD appliance itself: Data Domain is the target, and it is there that the data is reduced.

All data has to cross the network to the target when leveraging Data Domain: if there is a need to back up 10TB, then 10TB must traverse the network to the DD appliance. When leveraging Avamar, I may only need to send 2TB over the network, because the data has been deduplicated before being pushed to the target.

Taking Data Domain even further, Avamar can replicate backups to another Data Domain off site.

Allowing Avamar to control the replication enables it to keep the catalogues and track the location of each backup, which gives the end user ease of management when a restore request is made. The prerequisites are the DDBoost license enabler and the Replicator license on the Data Domain. Overall, this integration of the two “best deduplication appliances” gives the end user a much wider spectrum of performance, use and compliance.

For a deeper dive into deduplication strategies, read the article from IDS CTO Justin Mescher about Data Domain vs EMC Avamar: which deduplication technology is better.

Protecting Exchange 2010 with EMC RecoverPoint and Replication Manager

By | Backup, Deduplication, Disaster Recovery, EMC, Replication, Storage | No Comments

Regular database backups of Microsoft Exchange environments are critical to maintaining the health and stability of the databases. Performing full backups of Exchange provides a database integrity checkpoint and commits transaction logs. There are many tools which can be leveraged to protect Microsoft Exchange environments, but one of the key challenges with traditional backups is the length of time that it takes to back up prior to committing the transaction logs.

Additionally, database integrity should always be checked prior to backing up, to ensure the data being backed up is valid. This extended time often interferes with daily activities, so the backup usually must be scheduled around other maintenance activities, such as daily defragmentation. What if you could eliminate the backup window entirely?

EMC RecoverPoint, in conjunction with EMC Replication Manager, can create application-consistent replicas with next to zero impact, which can be used for staging to tape, direct recovery, or object-level recovery with Recovery Storage Groups or third-party applications. These replicas leverage Microsoft VSS technology to freeze the database, RecoverPoint bookmark technology to mark the image time in the journal volume, and then thaw the database – all in less than thirty seconds, often in less than five.

EMC Replication Manager is aware of all of the database server roles in the Microsoft Exchange 2010 Database Availability Group (DAG) infrastructure and can leverage any of the members (Primary, Local Replica, or Remote Replica) to be a replication source.

EMC Replication Manager automatically mounts the bookmarked replica images to a mount host running the Microsoft Exchange tools role and the EMC Replication Manager agent. The database and transaction logs are then verified using the eseutil utility provided with the Microsoft Exchange tools. This ensures that the replica is a valid, recoverable copy of the database. The validation can take from a few minutes to several hours, depending on the number and size of databases and transaction log files. The key is that the load from this process does not impact the production database servers. Once the verification completes, EMC Replication Manager calls back to the production database to commit and delete the transaction logs.
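For reference, the same checks can be run by hand on the mount host with the standard Exchange tools (the database path here is hypothetical). Dump the database header to confirm its shutdown state, then verify page checksums:

eseutil /mh F:\DB01\DB01.edb

eseutil /k F:\DB01\DB01.edb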

Once the Microsoft Exchange database and transaction logs are validated, the files can be spun off to tape from the mount host – or, depending on the retention requirement, you could eliminate tape backups of the Microsoft Exchange environment completely. Depending on the write load on the Microsoft Exchange server and how large the RecoverPoint journal volumes are, you can maintain days or even weeks of retention/recovery images in a fairly small footprint compared to disk- or tape-based backup.

There are a number of recovery scenarios available from a solution based on RecoverPoint and Replication Manager. The images can be reverse-synchronized to the source – a fast delta-based copy, but data-destructive. Alternatively, the database files can be copied from the mount host to a new drive and mounted as a Recovery Storage Group on the Microsoft Exchange server. The database and log files can also be opened on the mount host directly with tools such as Kroll OnTrack for mailbox and message-level recovery.

Photo Credit: pinoldy

What Happens When You Poke A Large Bear (NetApp SnapMirror) And An Aggressive Wolf (EMC RecoverPoint)?

By | Backup, Clariion, Data Loss Prevention, Deduplication, Disaster Recovery, EMC, NetApp, Replication, Security, Storage | No Comments

This month I will take an objective look at two competitive data replication technologies – NetApp SnapMirror and EMC RecoverPoint. My intent is not to create a technology war, but I do realize that I am poking a rather large bear and an aggressive wolf with a sharp stick.

A quick review of both technologies:


SnapMirror:

  • NetApp’s controller-based replication technology.
  • Leverages the snapshot technology that is fundamentally part of the WAFL file system.
  • Establishes a baseline image, copies it to a remote (or partner local) filer, and then updates it incrementally in a semi-synchronous or asynchronous (scheduled) fashion.


RecoverPoint:

  • EMC’s heterogeneous, fabric-layer, journaled replication technology.
  • Leverages a splitter driver at the array controller, fabric switch, and/or host layer to split writes from a LUN or group of LUNs to a replication appliance cluster.
  • The split writes are written to a journal and then applied to the target volume(s) while preserving write-order fidelity.

SnapMirror consistency is based on the volume or qtree being replicated: if the volume contains multiple qtrees or LUNs, those will be replicated consistently. To replicate multiple volumes in a consistent fashion, you need to quiesce the applications or hosts accessing each of the volumes, take snapshots of all the volumes, and then SnapMirror those snapshots. An effective way to automate this process is with SnapManager.
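As a sketch of the manual path (7-Mode syntax, hypothetical volume names), after quiescing the application you would create same-named snapshots on each volume so every volume is captured at the same point in time, and then SnapMirror those snapshots:

snap create vol_db app_consistent.1

snap create vol_log app_consistent.1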

After the initial synchronization, SnapMirror targets are accessible read-only, which provides an effective source volume for backups to disk (SnapVault) or tape. The targets are not read/write accessible, though, unless the SnapMirror relationship is broken or FlexClone is leveraged to make a read/write copy of the target. The granularity of replication and recovery is based on a schedule (standard SnapMirror) or on continual updates in semi-synchronous mode.

When failing over, the SnapMirror relationship is simply broken and the volume is brought online. This makes DR failover testing, and even site-to-site migrations, a fairly simple task; I’ve found that many people use this functionality as much for migration as for data protection or disaster recovery. Failing back to the production site is simply a matter of off-lining the original source, reversing the replication, and then failing back once the resync completes.
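In 7-Mode syntax the sequence looks roughly like this, with hypothetical filer and volume names:

# On the DR filer: stop updates and make the target read/write.
snapmirror quiesce vol_dr
snapmirror break vol_dr
# To fail back: on the original source filer, resync in the reverse
# direction, then break again once the delta copy completes.
snapmirror resync -S drfiler:vol_dr vol_prod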

In terms of interface, SnapMirror is traditionally managed through configuration files and the CLI. However, the latest version of OnCommand System Manager includes an intuitive, easy-to-use interface for setting up and managing SnapMirror connections and relationships.

RecoverPoint is like TiVo® for block storage. It continuously records incoming write changes to individual LUNs or groups of LUNs in a logical container aptly called a consistency group. The writes are tracked by a splitter driver that can exist on the source host, in the fabric switch, or on a Clariion (VNX) or Symmetrix (VMAXe only today) array. The host splitter driver enables replication between non-EMC and EMC arrays (check ESM for the latest support notes).

The split write IO with RecoverPoint is sent to a cluster of appliances that package, compress and deduplicate the data, then send it over a WAN IP link or a local Fibre Channel link. The target RecoverPoint appliance then writes the data to the journal. The journaled writes are applied to the target volume as time and system resources permit, and they are retained as long as there is capacity in the journal volume – which is what makes it possible to rewind the LUN(s) in the consistency group to any retained point in time.

In addition to remote replication, RecoverPoint can also replicate to local storage. This option is available as a standalone feature or in conjunction with remote replication.

RecoverPoint has a standalone Java application that can be used to manage all of the configuration and operational features. There is also integration for management of consistency groups by Microsoft Cluster Services and VMware Site Recovery Manager. For application-consistent “snapshots” (RecoverPoint calls them “bookmarks”), EMC Replication Manager or the KVSS command-line utilities can be leveraged. Recently, a “light” version of the management tool was integrated into the Clariion/VNX Unisphere management suite.

So, sharpening up the stick … NetApp SnapMirror is a simple-to-use tool that leverages the strengths of the WAFL architecture to replicate NetApp volumes (file systems) and update them either continuously or on a scheduled basis using the built-in snapshot technology. Recent enhancements to System Manager have made it much simpler to use, but it is limited to NetApp controllers. It can replicate SAN volumes (iSCSI or FC LUNs) in NetApp environments, as they are essentially single files within a volume or qtree.

RecoverPoint is a block-based SAN replication tool that splits writes and can recover to any point in time that exists in the journal volume. It is not built into the array; it is a separate appliance that exists in the fabric and leverages array-, fabric- or host-based splitters. I would make the case that RecoverPoint is a much more sophisticated block-based replication tool that provides a finer level of recoverable granularity, at the expense of being more complicated.

 Photo Credit: madcowk

Tape Sucks: Avamar 6.0 Version #tapesucksmoveon

By | Avamar, Backup, Data Loss Prevention, Deduplication, EMC, Replication, Storage, VMware | No Comments

Since its release last week, there has been a lot of buzz around Avamar 6.0. I am going to take the liberty of reading between the lines and exploring some of the good (but gory) details.

The biggest news in this release is the DDBoost/Data Domain integration in the new Avamar client binaries. This allows an Avamar client to send a backup dataset stream to a Data Domain system instead of an Avamar node or grid. Datasets that are not “dedupe friendly” (too large for Avamar to handle, or with very high change rates) are typically retained for shorter periods of time; these can be targeted to a DD array but still managed through the same policies and the same backup and recovery interface.

Client types supported in this release are limited to Exchange VSS, SQL, SharePoint, Oracle and VMware image backups. Replication of data is Avamar-to-Avamar and Data Domain-to-Data Domain: there isn’t any mixing or cross-replication. Avamar coordinates the replication and replicates the metadata so that backups are manageable and recoverable from either side. From a licensing perspective, Avamar requires a capacity license for the Data Domain system at a significantly reduced cost per TB. DDBoost and replication licenses are also required on the Data Domain.

There is a major shift in hardware for Avamar 6.0: 

  1. The Gen4 Hardware platform was introduced with a significant increase in storage capacity.
  2. The largest nodes now support 7.8TB per node – enabling grids of up to 124TB.
  3. The new high-capacity nodes are based on the Dell R510 hardware with twelve 2TB SATA drives.
  4. To speed up indexing, the new 7.8TB nodes also leverage an SSD drive for the hash tables.
  5. There are also 1.3TB, 2.6TB and 3.9TB Gen4 nodes based on the Dell R710 hardware.
  6. All nodes use RAID 1 pairs; it seems the performance hit of RAID 5 on the 3.3TB Gen3 nodes was too high.
  7. All Gen4 nodes now run SLES (SUSE Linux) for improved security.

There were several enhancements made for grid environments. Multi-node systems now leverage the ADS switches exclusively for a separate internal network that allows the grid nodes to communicate in the event of front-end network issues. There are both HA and Non-HA front end network configurations, depending on availability requirements. In terms of grid support, it appears that the non-RAIN 1X2 is no longer a supported configuration with Gen4 nodes. Also, spare nodes are now optional for Gen4 grids if you have Premium Support.

Avamar 6.0 is supported on Gen3 hardware, so existing customers can upgrade from 4.x and 5.x versions. Gen3 hardware will also remain available for upgrades to existing grids as the mixing of Gen3 and Gen4 systems in a grid is not supported. Gen3 systems will continue to run on Red Hat (RHEL 4).

Avamar 5.x introduced vStorage API integration for VMware ESX 4.0 and later versions. This functionality provides changed block tracking for backup operations, but not for restores. Avamar 6.0 now provides in-place “rollback” restores leveraging this same technology, which reduces restore times dramatically by restoring only the changed blocks back into an existing VM. The other key VMware feature introduced in version 6.0 is proxy server pooling: previously a proxy was assigned to a datastore, but now proxy servers can be pooled for load balancing in large environments.

There were several additional client enhancements on the Microsoft front, including Granular Level Recovery (GLR) support and multistreaming (one to six concurrent streams) for the Exchange and SharePoint clients.

All in all, the Avamar 6.0 release provides several key new features and scales significantly further than previous versions. With the addition of Data Domain as a target, tape-less backup is quickly approaching reality.

Photo Credit: altemark

Leveraging EMC VNX & NFS To Work For Your VMware Environments #increasestoragecapacity

By | Deduplication, EMC, Replication, Storage, Virtualization, VMware | No Comments

Storage Benefits
NFS (Network File System) is native to UNIX and Linux. Because the NFS protocol is native to these operating systems, NFS file systems can be provisioned thin instead of thick, as is typical with iSCSI or Fibre Channel. Provisioning LUNs or datastores thin allows the end user to manage NAS capacity efficiently; users have reported a 50% increase in both capacity and usable space.

NFS datastores are also a lot easier to attach to hosts than FC or iSCSI ones: there are no HBAs or Fibre Channel fabric involved, and all that needs to be created is a VMkernel port for networking. NAS and SAN capacity can quickly become scarce if the end user can’t control the amount of storage being used, or if there are VMs with over-provisioned VMDKs. NFS file systems can also be deduplicated: on top of the space saved via thin provisioning, the VNX can track similar data and store only the changes to the file system.
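On classic ESX 4.x, for instance, attaching an NFS export comes down to two commands (the addresses, port group and export names below are hypothetical, and the port group must already exist on a vSwitch):

# Create a VMkernel interface for NFS traffic.
esxcfg-vmknic -a -i 10.0.10.21 -n 255.255.255.0 NFS-PG
# Mount the NFS export from the data mover as a datastore.
esxcfg-nas -a -o 10.0.10.50 -s /vmware_nfs01 nfs_datastore01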

EMC and VMware’s best practice is to use deduplication on NFS exports which house ISOs, templates and other miscellaneous tools and applications. Enabling deduplication on file systems which house VMDKs is not a best practice, because the VMDKs will not compress. The Automatic Volume Manager (AVM) can also stripe the NFS volumes across multiple RAID groups (assuming the array was purchased with more than just six drives), which increases the I/O performance of the file system and the VMs. AVM extends the file system to the next available empty volume, transparently to VMware – meaning that if you add drives to the file system, you will also be increasing the performance of your virtual machines.

Availability Benefits
Using VNX, SnapSure snapshots (checkpoints) can be taken of the NFS file systems and mounted anywhere, in both physical and virtual environments. NFS snapshots allow you to mount production datastores in your virtual environment and use them for testing VMs without affecting production data. Leveraging SnapSure will help the end user meet RTO and RPO objectives: SnapSure can create 96 checkpoints and 16 writable snapshots per file system. Not to mention the ease of use SnapSure has over SnapView – SnapSure is configured at the file system level: just right-click the file system, select how many snapshots you need, add a schedule and you’re finished.
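The same operation can also be scripted from the Control Station with the fs_ckpt command (the file system name below is hypothetical, and exact syntax may vary by DART release):

# Create a read-only SnapSure checkpoint of the file system.
fs_ckpt nfs_vmware01 -Create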

From my experience in the field, end users find SnapSure much easier than SnapView or Replication Manager. Using VNX, NFS will also enable the user to replicate the file system to an offsite NS4-XXX without adding any additional networking hardware. VNX Replicator allows the user to mount file systems at other sites without affecting production machines. Users can replicate up to 1024 file systems, with 256 active sessions.

Networking Benefits
VNX data movers can be purchased with 1Gb/s or 10Gb/s NICs. Depending on your existing infrastructure, the VNX can leverage LACP or EtherChannel trunks to increase the bandwidth and availability of your NFS file systems. LACP trunks enable the data mover to monitor and proactively reroute traffic across all available NICs in the Fail Safe Network, increasing storage availability. In my experience, customers leveraging 10Gb for NFS have seen a huge improvement in reads and writes to disk and storage, as well as in vMotion from datastore to datastore, with up to 100% bandwidth and throughput.

Photo Credit: dcJohn

VMware Virtual Machine Backup with Avamar Deduplication Solution (Installation How-to #Fromthefield)

By | Avamar, Backup, Deduplication, Replication, VMware | No Comments

Initially, our customer had a huge challenge backing up his virtual environment: he had to treat his VMs as physical machines. Due to the long backup window of backup-to-tape, the customer was only able to get a full backup on the weekend. As we all know, backup to tape is the cheapest; but with VMware establishing itself as an industry standard, tape can no longer be our primary backup target.

At first, he attempted to back up to a SAN partition mounted to a VM—although this improved his backup window, the customer found a new challenge in moving the data from disk to tape. Unfortunately, moving the data off the SAN partition to tape was still taking too long, and he didn’t have enough space to accommodate a second day’s backups. Given that, he opened up to the suggestion of moving toward backup-to-disk as part of an Avamar deduplication solution.

The installation was very straightforward. Once I initialized the nodes, we started by deploying the Avamar VM image client. The proxy is a self-deploying VM; once we load the OVA, we simply go through this 4-step process:

  1. Networking (IP, DNS, Hostname)
  2. Time zones
  3. Proxy type, Windows or Linux (the type of VM you will be backing up)
  4. Register client to main Avamar node

Take note, this is very important: if you will not be installing a genuine certificate for your virtual center, you must modify the mcserver.xml file as follows.

Log in as admin, then run:

ssh-agent bash

ssh-add ~admin/.ssh/admin_key

dpnctl stop mcs

<text editor> /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml

Edit this line and set it to true, as shown:

<entry key="ignore_vc_cert" value="true" />

Once you’ve modified the file, type dpnctl start mcs.
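Before moving on, it’s worth confirming that the Management Console Service actually came back up; dpnctl can report this:

dpnctl status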

This will allow you to add the virtual center server to Avamar and import clients. Trust me: if you don’t complete the steps above, importing the virtual center will fail.

Once we’ve imported the clients and enabled changed block tracking, Avamar has a default policy group that includes all of your VM clients. Simply add the schedule and retention of your liking, and your VMs are ready for backups. Utilizing the proxy agent, Avamar will back up the entire virtual machine (.vmx, .nvram and .vmdk files).

The benefit of backing up the server as a virtual machine is that it allows you to restore a server seamlessly, without having to load an OS and an application. We were able to seed the VM to Avamar within 12 hours; the next backup ran for about 15 minutes, with Avamar finding between 99.5% and 100% common data.

Once we converted all the VM backups to Avamar, the customer was able to perform full backups of over 25 machines daily. And in order to comply with offsite media requirements, we replicated all of the data on Avamar to a secondary Avamar node which, after a three-day seeding window, would take less than 4 hours to replicate the changed blocks over the WAN.

Leveraging Avamar to back up VMware will increase your productivity and improve your RTO and RPO. And guys, it’s just plain simple to use! DOWN WITH TAPES!!!

[Photo credit to SandiaLabs on Flickr]

Data Domain vs. EMC Avamar: Which deduplication technology is better?

By | Avamar, Deduplication, EMC | No Comments

Now that EMC owns both Data Domain and Avamar, I am constantly being asked which technology is better. Before the Data Domain acquisition, it was tough to get a straight answer because the two deduplication giants were constantly slugging it out and slandering each other to try and find an edge and gain more market share. With the two technologies now living under the same umbrella, sometimes it is hard to tell where one technology ends and the other begins.


Will Your Data Benefit from Deduplication? Find Out By Testing Dedupe Rates with the EMC CAT Tool

By | Deduplication, EMC | No Comments

Is deduplication really all it’s cracked up to be?

With everyone in the industry talking about deduplication, you can’t go two minutes without hearing how great it is or some outlandish claim about deduplication rates. So the question is… is dedupe really all it’s cracked up to be? The answer isn’t really in the deduplication technology itself; it’s in the make-up of the data you’re looking to deduplicate. So how do you know if dedupe is the right technology for you?

Do you have a ton of highly-compressed images or multimedia files? These aren’t the ideal data types for deduplication.

Does your environment contain a lot of large databases like SQL, Exchange, and Oracle? If so, dedupe can help, but not as much as those crazy marketing numbers say.

Do you have large file servers? Lots of VMware? Remote offices you need to back up? This is where dedupe really shines and where you can really add some efficiency to your environment. It is also where numbers like 200:1 or 500:1 come from – and they can actually be beaten in some cases.

Now of course your data doesn’t fit neatly into just one of the categories above; it likely spans two of them, if not all three. So we’re back to the original question: is dedupe the right technology for you? One of the keys to deploying an *effective* deduplication solution is knowing where to deploy it, how to deploy it, and what to expect.

Just because you can deduplicate your live VMware or database environment doesn’t necessarily mean you should. There are a lot of implications to deduplicating data that is frequently accessed, and performance can suffer severely in some cases. While dedupe is a great technology, it can bring your environment to its knees if implemented incorrectly. I’ll address this in another post, because that is a whole different topic. Today I want to focus on how to figure out if, and how, deduplication can benefit your business in the backup process.

Knowing where to deploy deduplication and what rates to expect can really only be determined through an assessment of your current environment. EMC has a great tool called the Commonality Assessment Tool (CAT) that will allow you to look at a subset of your data and see exactly what the commonality is. This tool can be downloaded from the IDS website for free here (click the link for the EMC CAT tool download).

So what is this tool and what can you expect from it? EMC offers a deduplication solution called Avamar, which is backup software and a backup-to-disk appliance all wrapped into one. The CAT tool is essentially a modified Avamar client that performs a simulated backup on your server(s); instead of actually backing the data up, it just tracks what the deduplication rate would be and how long the actual backup would have taken. I’ll quickly take you through the process of running this tool and show you how easy it is to figure out exactly how much commonality is in your data.

***One important thing to note before beginning is that the CAT Tool has the same impact on your system as a normal backup client. It is recommended to run it off-hours and not during your regular backup window.

  1. Download the CAT Tool from the IDS website (click for the deduplication rate test tool download). A link will be emailed to you where you can download a zip file containing the tool.
  2. Extract the zip file to C:\Avasst. The directory will contain avtar.exe and avasst.exe. *Note: This directory must exist or the tool will not run correctly.
  3. Open a command prompt by going to Start/Run and running cmd.exe.
  4. Browse to the CAT directory by typing “cd C:\Avasst”
  5. Run the CAT tool by typing “avasst”
  6. You will be prompted to select a folder to scan. In this example, I will scan the D Drive by entering “d:”. If you want to scan multiple folders at once, see the notes at the end.
  7. The first time you run the tool, you can expect it to take approximately 1 hour per 100GB of files plus 1 hour per million files (so a 500GB volume holding 2 million files would take roughly 7 hours). Subsequent runs will be much quicker due to deduplication.
  8. When the tool has completed running, you will see a completion screen.
  9. Now if you look in the C:\Avasst folder, you will see several files that are tracking the deduplication rates and backup times for your data. They are just raw data and need to be run through a tool in order to interpret the results.
  10. In order to see the full benefits of deduplication, you will want to run this tool against the same dataset several times (at least 3). You can also run it across several different datasets to see commonality across several servers. Since the commonality tracking is stored locally in the C:\Avasst folder, you will want to mount directories from other servers and scan them from this server across the network.
  11. When you have scanned your datasets, zip the results and send them to your IDS Engineer to have the results interpreted.

Some other notes on the CAT Tool:

  • If you want to scan data from other servers, you can simply mount another server’s drive to a drive letter on the local server and scan that drive.
  • In a real deduplication solution, all data will be deduplicated globally against other servers and backup sets. Since the assessment tool tracks deduplication locally, you will need to scan all datasets from the same server to see global deduplication benefits.
  • The CAT Tool can easily be scheduled using the built-in Windows Scheduler. Those instructions are included in a Word document included with the CAT Tool download.
  • If you want to scan multiple folders at once, you will need to create a silent file that contains the folders you want to scan. Simply create a file named “Silent” with no extension in the C:\Avasst folder. Inside that file, just put a line for each drive or folder you want to scan (see the example below).

**Note that you cannot end any entries with a backslash. For example, C: and D:\Users are valid, but C:\ and D:\Users\ are invalid.
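For example, a Silent file that scans the C: drive and the D:\Users folder would contain exactly these two lines:

C:
D:\Users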

Deduplication Wars: EMC Avamar vs CommVault Simpana

By | Avamar, Deduplication, EMC | No Comments

OK, so you have decided that deduplication is the best thing ever and a must-have for your backup needs. The next big question on the horizon is what KIND of deduplication is right for you. Two of the big hitters in the market today are EMC’s Avamar and CommVault’s Simpana products. Both products seem to be doing very well in the wild, and each approaches deduplication in a completely different manner.

In the case of Avamar, the product deduplicates at the client using variable-block deduplication. Once the scan is complete on the client server and the deduplication hashes are created, the client checks back with the Avamar Data Store appliance farm to see which blocks the farm has not seen, and then only the truly unique blocks across the environment (not just that server) are sent over the wire. This results in extremely high levels of deduplication AND remarkably fast backups, since very little data is normally left to send after deduplication and comparison against the rest of the environment. The data is stored on the EMC Data Store appliance, which presents a pretty simple GUI for recovery. The only major chink in Avamar’s armor is that it cannot natively create tapes for those datasets you may want to retain longer than you have space to keep on the appliance farm.

CommVault came at the deduplication process in a completely different way, leveraging their existing tape archive construct to create a form of fixed-block deduplication. In this case the clients do the same thing they always did: run their scans, package the data, and ship it out. Once the data gets to the Media Agent, the deduplication occurs and the data is written out to any disk target supported by CommVault. Since the deduplication is fixed-block, the deduplication ratios are not as good as with variable-block (aka Avamar), but they are certainly much better than typical compression. And since the deduplication occurs on the Media Agent, there is no savings in backup window time. The good news is that this is CommVault: cutting tapes is its forte, completely automated, with the ability to set different retentions on each type of media to fit all your compliance desires within a single tool. Also, since the format of the media archive did not change, restores are just as fast with deduplicated data as they were with plain-Jane backup to disk, which is huge if you have a lot of data to restore. Avamar can be sluggish in terms of restores on smaller deployments, but it is still certainly functional for your everyday restore needs.
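A quick shell experiment shows why fixed-block chunking dedupes less than variable-block: shift the data by a single byte and every fixed-size chunk hash changes, whereas a content-defined (variable-block) chunker would re-find most boundaries. The file names here are arbitrary:

# Hash 4KB fixed chunks of a file, then of the same file shifted one byte.
head -c 1M /dev/urandom > data.bin
printf 'X' | cat - data.bin > shifted.bin
split -b 4096 data.bin a_
split -b 4096 shifted.bin b_
md5sum a_* | awk '{print $1}' | sort > a.hashes
md5sum b_* | awk '{print $1}' | sort > b.hashes
comm -12 a.hashes b.hashes | wc -l   # prints 0 - no chunks in common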

In the grand scheme, Avamar is the holy grail of backup speed, since it only ever sends fractions of incremental data over the wire to the target, which reduces not only backup times but also the impact of backups on both the hosts and the network. I also give CommVault a major tip of the hat for how they leveraged their existing technology and morphed it into a deduplication technology that brings huge benefits to their current customer base while staying on the commodity hardware bandwagon.

Obviously there are many more features of both products worth investigating and comparing, but now you know how the two differ technically in terms of deduplication.

Photo Credit: JTony via Flickr