Why an open archive API is needed to transfer data between cloud services

ITPro Today

April 7, 2016


Microsoft is listed as one of the “Challengers” in Gartner’s Magic Quadrant for Enterprise Information Archiving (29 October 2015), a testament to the effort put into building out its compliance features in the years since their debut in Exchange 2010.

Gartner’s market definition is: “Enterprise information archiving (EIA) incorporates products and solutions for archiving user messaging content (such as email, IM, and public and business social media data) and other data types (such as files, enterprise file synchronization and sharing [EFSS] and Microsoft SharePoint documents, some structured data, and website content).”

The scope for archiving is much wider than email, yet it seems to me that few who have moved workloads into Office 365 have considered how they might extract their data. I covered this topic in November 2015, and further consideration brings me to a list of the content types that I might have to extricate if I wanted to move data out of my Office 365 tenant.

Exchange mailbox data is relatively easy to deal with because hybrid connectivity allows mailboxes to be moved back to on-premises servers if the need arises.

SharePoint documents, lists, and other metadata are more problematic, but companies like Sharegate can help. Remember that SharePoint Online spans traditional sites as well as providing storage for OneDrive for Business and Office 365 group libraries.

Then there’s the new information such as the videos uploaded into the Office 365 Video Portal, which holds its metadata in SharePoint and the transcoded video content in Azure Media Services. Hopefully, you have all the original videos that were uploaded to the portal.

And if Yammer is used, the small matter of how to transfer information held in its groups to some other repository comes into play.

There’s more to consider too, such as the shared notebooks used by Office 365 Groups plus all the configuration information for the tenant and user settings. In short, moving away from Office 365 (or any other cloud service) is a bear.

The costs involved in such an activity can be staggering. For instance, HP’s Digital Safe is a cloud-based archiving solution used by large enterprises. Let’s say that you want to move your archives to a competitor solution, such as those offered by Veritas, Proofpoint, or Mimecast. A large price tag is usually associated with the work necessary to extract the data from the source archive, package it into a format suitable for the target, and ingest it into the new archive. The effort is likely to require a lot of manual intervention that will drive the cost well into six or even seven figures.

Which brings me to the need for something like an Open Archive API to allow for high-fidelity transfer from one archive to another (or one cloud service to another). Ideally, the API would accommodate transfer via “drive shipping” or network uploads, just like the Office 365 Import Service does today. In fact, the Office 365 Import Service is probably further along the path than other archive vendors because it has a specification for ingestion packages that ISVs can use to create feeds from non-Microsoft data sources.

Because of the volume of data held in archives, the API needs to be optimized for high-capacity transfer of information and have sophisticated error-checking and reporting capabilities. It would also need to ensure that the chain of custody for data is maintained during a transfer so that legal challenges could not be mounted on the basis that information could have been interfered with while being transferred.
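To make the error-checking and chain-of-custody idea a little more concrete, here is a minimal sketch in Python. It is purely illustrative: no Open Archive API exists, and the names `build_manifest` and `verify_transfer`, the item identifiers, and the manifest layout are all my own assumptions. The source archive records a SHA-256 digest for every item before transfer, and the target recomputes the digests after ingestion and reports anything missing, altered, or unexpected.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(items: dict[str, bytes]) -> dict:
    """Source side (hypothetical): record a digest per item, plus a seal
    computed over the whole list of digests."""
    entries = {item_id: digest(content) for item_id, content in sorted(items.items())}
    seal = digest(json.dumps(entries, sort_keys=True).encode("utf-8"))
    return {"entries": entries, "seal": seal}


def verify_transfer(manifest: dict, received: dict[str, bytes]) -> list[str]:
    """Target side (hypothetical): return a list of problems found.
    An empty list means every item arrived intact."""
    problems = []
    entries = manifest["entries"]
    # Check that the manifest itself was not altered in transit.
    expected_seal = digest(json.dumps(entries, sort_keys=True).encode("utf-8"))
    if expected_seal != manifest["seal"]:
        problems.append("manifest: seal mismatch")
    # Check each item listed in the manifest.
    for item_id, expected in entries.items():
        if item_id not in received:
            problems.append(f"{item_id}: missing")
        elif digest(received[item_id]) != expected:
            problems.append(f"{item_id}: digest mismatch")
    # Flag anything that arrived but was never listed by the source.
    for item_id in received:
        if item_id not in entries:
            problems.append(f"{item_id}: unexpected item")
    return problems
```

A plain hash only detects accidental or careless changes. To stand up to a legal challenge, the seal would presumably need to be digitally signed by the source archive and the verification report retained as part of the audit trail, which is beyond this sketch.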

An Open Archive API would mean lower cost and greater customer convenience when the time came to move data between repositories. It would eliminate the need for specialized connectors to extract or ingest data, provided of course that the archiving vendors agreed to support the API.

Some might say that the EDRM model would be a good basis because it’s already in use in the eDiscovery space. However, I think we need an API that has much better coverage of the data now in use in cloud services together with the ability to restore items in the target repository as if they were created there initially. And eDiscovery operations often rely on exporting to PSTs to move data around, which seems like a pretty retrograde step.

Cloud services have been with us for a decade. We solved the problem of email interoperability years ago when SMTP won over X.400 to become the de facto standard. Given the increasing amount of data held in cloud services today, isn’t it about time that the industry worked together to give tenants true control over their information and fulfil all those promises that “it is yours to take with you if you decide to leave the service”?

I think the time is ripe for archive vendors to do a better job of interoperability. Do you agree?

Follow Tony @12Knocksinna
