The Many Faces of Archiving (And Why It Isn’t Backup)

If archiving is defined as intelligent data management, then neither Backup Technologies, nor Hierarchical Storage Management (HSM) techniques, nor Storage Resource Management (SRM) tools qualify; however, these continue to be leveraged for archiving as substitute products. Even “Information Lifecycle Management” that would benefit from archiving is now equated with archiving. This has led to a proliferation of archiving products that tend to serve different purposes for different organizations.

Background

IT organizations have long valued the notion of preserving copies of data in case “work” got lost. In fact, with every occurrence of data disaster, the role of data backup operations has strengthened and no company can do without a strategy in place. Since 1951, when Mauchly and Eckert ushered in the era of digital computing with the construction of UNIVAC, the computing industry has seen all kinds of media in which storage could be kept for later recall: punch cards, magnetic tapes, floppy disks, hard drives, CD-R/RW, flash drives, DVD, Blue-ray and HD-DVD to name a few. And the varying formats and delivery methods have helped create generations of vendors with competing technologies.

Backups had come of age … but also became increasingly costly and hard to manage with data complexity, growth and retention.

Backups had come of age, cloaked and dressed with a respectable name “data protection”—the magic wand that was insurance for “data loss.” But, it also became increasingly costly and hard to manage with data complexity, growth and retention. Thus came about the concept of “archiving,” defined simply as “long term data.” That, coupled with another smart idea for moving data to less expensive storage (tier), helped IT organizations to reduce costs. The HSM technique dovetails into tiered storage management, as it is really a method to move data that is not changing or not being accessed frequently. HSM was first implemented by IBM and also by DEC VAX/VMS systems. In practice, HSM is typically performed by dedicated software, such as IBM Tivoli Storage Manager, Oracle’s SAM-QFS, Quantum SGI DMF, StorNext or EMC Legato OTG DiskXtender.

On the other hand, SRM tools evolved as quota management tools for companies trying to deal with hard-to-control data growth, and now include SAN management functions. Many of the HSM players sell tools in this space as well: IBM Tivoli Storage Productivity Center, Quantum Vision, EMC Storage Resource management Suite, HP Storage Essentials, HDS Storage Services Manager (Aptare) and NetApp SANscreen (Onaro). Other SRM products include Quest Storage Horizon (Monosphere), SolarWinds Storage Profiler (Tek-Tools) and CA Storage Resource Manager. Such tools are able to provide analysis, create reports and target inefficiencies in the system, creating a “containment” approach to archiving.

Almost as old as the HSM technique is the concept of Information Lifecycle Management (ILM). ILM recognizes archiving as an important function distinct from backup. In 2004, SNIA gave ILM a broader definition by aligning it with business processes and value, while associating it with five functional phases: Creation and Receipt; Distribution; Use; Maintenance; Disposition. Storage and Backup vendors embraced the ILM “buzzword” and re-packaged their products as ILM solutions, cleverly embedding HSM tools in “policy engines.” And so, with these varied implementations of “archiving tools,” businesses have come to realize different levels of satisfaction.

Discussion

Kelly J. Lipp, who today evaluates products from the Active Archive Alliance members, wrote (in 1999) the paper entitled “Why archive is archive, backup is backup and backup ain’t archive.” Kelly wrote this simple definition: “Backup is short term and archive is long term.” He then ended the paper with this profound statement: “We can’t possibly archive all of the data, and we don’t need to. Use your unique business requirements and the proper tools to solve your backup and archive issues.”

Backup is short term and archive is long term.
— Kelly Lipp

However, the Active Archive Alliance promotes a “combined solution of open systems applications, disk, and tape hardware that gives users an effortless means to store and manage ALL their data.” ALL their data? Yes, say many of the pundits who rely on search engines to “mine” for hidden nuggets of information.

Exponential data growth is pushing all existing data management technologies to their limits, and newer locations for storing data—the latest being “storage clouds”—attempt to solve the management dilemma. But the realization that for the bulk of data that is “unstructured,” there is no orderly process to bring back information that is of value to the business brings increasing concern.

Similarly, though, to the clutter stored in our basement, data that collects meaninglessly may become “data blot.”

Although businesses rely on IT to safeguard data, the value of the information contained therein is not always known to IT. Working with available tools, IT chooses attributes such as age and size and location to measure the worth, and then executes “archiving” to move this data out, so that computing systems may perform adequately. Similarly, though, to the clutter stored in our basement, data that collects meaninglessly may become “data blot.” Data survival then depends on proper information classification and organization.

Traditionally, data has seen formal organization in the form of databases—all variations of SQL, email and document management included. With the advent of Big Data and the use of ecosystems such as “Hadoop,” large databases now leverage flat file systems that are better suited for mapping search algorithms. And this may be considered as yet another form of archiving because data stored here is “immutable” anyway. All of these databases (and the many related applications) tend to have more formal archiving processes, but little visibility into the underlying storage. Newer legal and security requirements tend to focus on such databases, leading to the rise of “archiving” for compliance.

That brings us back full circle. While security and legality play a lot in today’s archiving world, one could argue that these tend to create “pseudo archives” that can be removed (deleted) when the stipulated time has passed. In contrast, a book or film on digital media adds to the important assets of a company that become the basis for its valuation and for future ideas. If one were to create a literature masterpiece, the file security surrounding the digitized asset is less consequential than the fact that 100 years later those files would still be valuable to the organization that owns it.

Archiving … is the preservation of a business’s digital assets: information that is valuable irrespective of time and needed when needed.

The meaning of archiving becomes clearer when viewed as distinctly different from backup. It is widely accepted that purpose of a backup is to restore lost data. Thus, backup is preservation of “work in progress”: data that does not have to be understood, but resurrected as-is when needed. Archiving, on the other hand, is the preservation of a business’s digital assets: information that is valuable irrespective of time and needed when needed. The purpose of archiving is to hold assets in a meaningful way for later recall.

Backup is a simple IT process. Archiving is tied to business flow.

This suggests that archiving does not need “policy engines” and “security strongholds,” but rather information grouping, classification, search and association. Because these tend to be industry-specific, “knowledge engines” would be more appropriate for archiving tools. Increasingly, IT professional services are now working with businesses and vendors alike to bridge the gaps and bring about dramatic industry transformations through the implementation of intelligent archiving.

Conclusion

Backups have grown in importance since the days of early computing, and as technology has changed, so has the costs for preserving the data in different storage media. Backup technologies also have become substitute tools for archives by choosing long-term retention for those data.

With a plethora of tools and techniques developed to manage the storage growth, and contain the storage costs (the HSM techniques and the SRM tools), archiving has been implemented in different organizations for different purposes and with different meaning.

In defining Information Lifecycle Management, SNIA has elevated the importance of archiving, and thereby encouraged vendors to re-package HSM tools in policy engines. On the other hand, databases for SQL and email—and even Big Data ecosystems—have implemented archiving without visibility into the underlying storage.

As archiving tools continue to evolve, it is now considered distinctly different from backup. While backup protects “work in progress,” archiving preserves valuable business information. Unlike backups that need “policy engines,” archiving requires “knowledge engines” which may be industry-specific. IT professional services have stepped in to bridge the gaps and bring about transformations through the implementation of intelligent archiving.

References

1. “Why archive is archive, backup is backup and backup ain’t archive” by Kelly J. Lipp, 1999
2. http://en.wikipedia.org/wiki/Backup
3. http://en.wikipedia.org/wiki/Archive
4. http://www.backuphistory.com/
5. http://www.activearchive.com/blog/19
6. http://www.it-sense.org/webportal/index.php?option=com_content&view=article&id=60&Itemid=67

Photo credit: Antaratma via Flickr