Cutting the Cost of Unstructured Data Management

As IT costs climb but budgets do not, IT leaders are examining enterprise workloads to find new sources of savings. Unstructured data is a frequent starting point, as organizations find themselves storing ever greater volumes of data from IoT sensors, social media, and analytics-driven applications.

IDC estimates that the volume of data generated worldwide will double by 2026 – and that unstructured data will account for over 90% of this, with enterprise data growing twice as fast as consumer data. While unstructured data is key to delivering invaluable business insights, managing petabytes of data is complex and costly. And unless the data is readily available to business departments to use as analytics input or for AI training, there is no payoff for the cost of storing it.

The nature of unstructured data complicates this challenge. Any solution must work within the hybrid cloud strategies that are proliferating across enterprises. Because hybrid clouds enable data to be generated in multiple locations – on-premises, at the edge, and in the cloud – while also requiring it to be accessible from multiple locations, they add to the complexity.

To deal with these issues and curb storage costs while deriving value from unstructured data, enterprises should explore three options.

#1. Cut the Cost of Public Cloud Storage

Storage in public infrastructure clouds runs hot and cold: most IT teams use it for both hot (frequently accessed) data and cold (infrequently accessed) data. Better management of both can deliver significant cost savings. It begins with establishing a clear picture of what data is being stored, where it is being stored, which applications are using it, and how they are using it. That picture allows a rethink of whether data belongs on-premises or in a public cloud, cutting the costly network access and egress charges for copying or moving data between locations. Wherever possible, data should be held in the same location as the applications that process it.
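As a rough illustration of building that picture, the sketch below walks a directory tree and totals the bytes that look hot or cold based on when each file was last touched. It is a minimal sketch under stated assumptions, not how any particular cloud service or data management platform classifies data: the function name classify_by_access and the 90-day threshold are hypothetical, and access times can be unreliable on file systems mounted with access-time updates disabled, which is why the modification time is also considered.

```python
import os
import time

SECONDS_PER_DAY = 86400


def classify_by_access(root, cold_after_days=90, now=None):
    """Total the bytes under `root` that look hot or cold.

    Assumption: a file is "cold" if it has been neither read nor modified
    in the last `cold_after_days` days. The 90-day default is illustrative.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * SECONDS_PER_DAY
    totals = {"hot": 0, "cold": 0}

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it

            # Use the later of access and modification time as "last touched".
            last_touched = max(st.st_atime, st.st_mtime)
            tier = "cold" if last_touched < cutoff else "hot"
            totals[tier] += st.st_size

    return totals
```

Calling classify_by_access("/data/projects") on a hypothetical share returns a dictionary of byte totals per tier, which gives a first estimate of how much data could move to cheaper storage.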

#2. Use Data Management Platforms To Build Data Fabrics

Leaders should create an automated, policy-based data management environment to implement what has been variously labeled as a data fabric or a data-first strategy that will contain costs; optimize data access; and ensure compliance, governance, and data security. Gartner has predicted that by 2024, data fabric deployments will quadruple efficiency in data utilization while cutting human-driven data management tasks in half.

A range of data management software platforms is available for this purpose. The first step before implementation is to assess the current situation and expectations for future changes. This includes determining both the IT and business goals and how to measure whether they have been achieved. It also means securing buy-in at the executive level and the close involvement of the business units that will benefit from the data fabric.

Managing any resource requires knowing what you have. This is difficult for unstructured datasets, but data management platforms can automate the inventory of unstructured data by examining its metadata and, in some cases, also searching content for personally identifiable information.

While data management platforms offer a diverse range of capabilities, this is one of the key functions, as the visibility it provides not only helps IT teams manage storage efficiently but also helps business units understand what data is available for their use.
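To make the idea of a metadata-driven inventory concrete, the sketch below summarizes a directory tree by file extension, counting files and bytes for each type. It is a simplified illustration only: the function name inventory is hypothetical, it reads nothing but file-system metadata, and it does not attempt the content scanning for personally identifiable information that some platforms perform.

```python
import os
from collections import Counter


def inventory(root):
    """Summarize files under `root` by extension using metadata only.

    Returns a dict such as {".csv": {"files": 2, "bytes": 5}}.
    Files without an extension are grouped under "(none)".
    """
    counts = Counter()
    sizes = Counter()

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Normalize case so ".CSV" and ".csv" land in the same group.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that cannot be read
            counts[ext] += 1
            sizes[ext] += size

    return {ext: {"files": counts[ext], "bytes": sizes[ext]} for ext in counts}
```

A summary like this is the kind of view that lets both IT teams and business units see at a glance which data types dominate storage.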

#3. Choose NAS vs. Object Storage as Requirements Dictate

While the use of data management platforms is key to optimizing costs and maximizing the value of unstructured data, so is the choice of the underlying storage. For unstructured data at anything beyond desktop scale, the choice is between file-based storage – also known as network-attached storage (NAS) – and object storage. Both can be implemented as on-premises systems or used as public cloud services, and many of the systems and services include management and other capabilities that overlap with some data management platform functionality, such as automatic data tiering.

NAS and object storage offer competing advantages. Historically NAS has provided higher performance, although the differential has narrowed. NAS is frequently used in document-sharing applications, as well as data storage for web applications and virtual servers.

Object storage applications increasingly overlap with the uses of NAS. IT teams tend to deploy object storage when applications scale intensively, or when data needs to be available from anywhere, anytime. Workloads with large volumes of disparate data that are intended for use in analytics and AI training are also a good fit for this type of storage since its larger numbers of connections can provide additional performance.

The Payoff: More Than Just Control of Runaway Costs

The growth of unstructured data is unlikely to slow, meaning that CIOs must get to grips with this problem now. Enterprises that better understand how to optimize access to their data can use it to improve their bottom line.

About the Author

Candida Valois

Candida Valois is field CTO for Scality, a hardware-agnostic storage software firm whose solutions help organizations build reliable, secure, and sustainable data storage architectures. Candida is an IT specialist with 20-plus years of IT experience in architecture, development of software, services, and sales for various industries. She is passionate about technology and delivering valuable solutions.

Twitter: @scality @CandidaValois
Author LinkedIn: https://www.linkedin.com/in/cvalois/
Company LinkedIn: https://www.linkedin.com/company/Scality/
