How to Build a Data Stack That Actually Puts You in Charge of Your Data

Indicium Chief Data Officer Daniel Avancini outlines seven essential considerations for constructing a data stack that ensures you have full control over your data.

This article was written by Daniel Avancini, chief data officer at Indicium.

A business's data stack — meaning the set of tools it uses to manage data — should be a resource that helps businesses thrive. Too often, however, data stacks become a burden that holds companies back and constrains their ability to innovate.

This happens when organizations become wed to data tools and processes that end up defining what they can do with their data. Instead of using data in adaptable ways based on varying business needs, they let their data stack determine what the business can — and can't — do with the information at its disposal.

Freeing yourself from this constraint (or avoiding it altogether) requires carefully evaluating and selecting the tools your company uses to build its data stack. You want tools that put you in control of your data, not tools that control what you can do with your data.

Now, I'm not here to say that any one tool is the best fit for everyone's needs. No single tool is, because every company needs its data tools to do different things. What I would like to do is elaborate on what to think about when selecting data tools and building a data stack, with the goal of maximizing flexibility and value and minimizing constraint.

How We Got Here: A Brief History of the Data Stack

Ironically, efforts over the past decade to make data stacks more modular and flexible are a large part of the reason why many companies have ended up with data stacks that restrict, rather than enhance, their flexibility to innovate.

Until the 2010s, most enterprise data platforms were monolithic: single products that packed in every data-management capability a company could want. This made data stacks inflexible, because it wasn't practical to use multiple platforms for different purposes. With rare exceptions, you had to pick a vendor and use its data tooling exclusively. Migrating between data platforms was also very complicated, which exacerbated the lock-in that monolithic data solutions created.

Then, starting about a decade ago, a series of more modular data tooling solutions began emerging, many from startups. Instead of trying to build monolithic solutions capable of doing everything in the realm of data management, most of these vendors catered to specific types of needs. You had tools that only did data quality, for instance, or that only supported data discovery. The idea behind this new ecosystem was that companies would benefit when they could pick and choose from among multiple solutions to build data stacks tailored to their needs.

To be sure, not being tied to a single monolithic data platform is a good thing. However, modular data tools do not necessarily guarantee choice and flexibility, either. They can be just as restrictive when it's difficult to migrate from one tool to another, or when you use multiple tools with overlapping or extraneous capabilities, which effectively means you end up paying for more features than you actually use.

The point I'm getting at here is that, although it has become common to talk about the "modern data stack" as a very flexible and modular approach to meeting data management needs, the reality isn't always so rosy. I'm all in favor of giving companies the ability to pick and choose from multiple data management solutions. But to achieve the full value that the modern stack is intended to provide, you need to think critically about exactly which tools you use and how much flexibility they actually deliver.

How to Build a Data Stack That Actually Delivers Value

There are seven key considerations to address when building a data stack that puts you in full control of your data.

1. Determine which capabilities you need from your stack

First and foremost, you need to think about what your tools actually need to do — and what they don't need to do.

This is important because some businesses don't need every type of data management capability. Data discovery or cataloging may not be important, for instance, to a company that only works with well-organized, structured data.

If you add tools to your stack that provide capabilities you won't use, you needlessly complicate the stack. Worse, you make it harder to evolve your stack over time because you've added moving pieces that deliver no value.

So, before deciding to build a data stack that does everything a monolithic data platform would do — or that includes all of the data management capabilities of your competitors — assess your actual requirements. You may find that they are simpler than you thought, and that your data stack can be simpler, too.

2. Design an agnostic data stack architecture

Next, sketch a data stack architecture that delivers the capabilities you've deemed necessary for your business.

Your goal here should be to determine what your ideal data stack looks like, including not just which types of tools it will include, but also which personnel and processes will leverage those tools.

As you approach this, think in a tool-agnostic way. In other words, rather than looking at vendor solutions and building a stack based on what's available, think in terms of your needs. This is important because you shouldn't let tools define what your stack looks like. Instead, you should define your ideal stack first, and then select tools that allow you to build it.
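One way to picture the "define the stack first, choose tools second" idea is to write down each capability as an interface before any vendor is involved. The following is only a minimal sketch under stated assumptions: the names `QualityCheck`, `RequiredFieldCheck`, and `split_by_quality` are hypothetical and do not come from any real product. In practice, the concrete class would wrap whichever tool you eventually select.

```python
from typing import Iterable, Protocol

Row = dict  # a record, represented here as a plain dictionary


class QualityCheck(Protocol):
    """The capability the stack needs, described without naming a tool."""

    def is_valid(self, row: Row) -> bool: ...


class RequiredFieldCheck:
    """Hypothetical stand-in for a vendor adapter that satisfies QualityCheck."""

    def __init__(self, field: str) -> None:
        self.field = field

    def is_valid(self, row: Row) -> bool:
        # A row passes when the required field is present and not None.
        return row.get(self.field) is not None


def split_by_quality(rows: Iterable[Row], check: QualityCheck):
    """Pipeline code depends only on the interface, never on a specific tool."""
    accepted, rejected = [], []
    for row in rows:
        (accepted if check.is_valid(row) else rejected).append(row)
    return accepted, rejected


rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
accepted, rejected = split_by_quality(rows, RequiredFieldCheck("email"))
```

Because `split_by_quality` knows only about the interface, replacing the tool behind it would mean writing a new adapter rather than rewriting the pipeline.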

3. Evaluate data tool capabilities

Once you know what your data stack architecture will ideally look like, you can begin assessing tools that provide the capabilities it calls for.

Your key focus during this process should be on determining what makes each tool unique. Many tools available today offer overlapping functionality, which can lead to needless redundancy within your data stack. Ideally, you'll select solutions that allow you to build a data stack that does exactly what you require — nothing more, and nothing less.
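A simple way to spot redundancy is to list the capabilities you need next to the capabilities each candidate tool provides, then compare the sets. The sketch below is illustrative only: the tool names and capability labels are made up, and `coverage_report` is a hypothetical helper, not part of any real product.

```python
from itertools import combinations

# Hypothetical requirements and tool inventories, for illustration only.
required = {"ingestion", "transformation", "data_quality"}

tools = {
    "tool_a": {"ingestion", "transformation"},
    "tool_b": {"transformation", "data_quality", "data_discovery"},
    "tool_c": {"data_quality"},
}


def coverage_report(required, tools):
    """Return (missing, extraneous, overlaps) for a candidate set of tools."""
    provided = set().union(*tools.values()) if tools else set()

    # Capabilities you need that no tool provides.
    missing = required - provided

    # Capabilities each tool brings that you never asked for.
    extraneous = {
        name: caps - required for name, caps in tools.items() if caps - required
    }

    # Required capabilities that two tools both provide.
    overlaps = {}
    for a, b in combinations(sorted(tools), 2):
        shared = tools[a] & tools[b] & required
        if shared:
            overlaps[(a, b)] = shared

    return missing, extraneous, overlaps


missing, extraneous, overlaps = coverage_report(required, tools)
```

With this sample data, nothing is missing, `tool_b` brings an unneeded `data_discovery` capability, and two pairs of tools duplicate a required capability. That is the kind of overlap worth questioning before you commit.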

4. Determine how easy it is to achieve data tools' full potential

Another critical consideration when evaluating tools is how much expertise and effort are necessary to get tools to do what you need them to do.

This is important because too often, vendors make promises about their tools' capabilities, but just because a tool can theoretically do something doesn't mean it's easy to do that thing with that tool. A data discovery tool that requires you to install special plugins or write custom code to work with a legacy storage system you depend on, for example, won't deliver as much value as one that supports that system out of the box.

5. Assess migration capabilities

No matter which tools you end up choosing to build your data stack, you should strive to ensure that you can migrate to alternative solutions if and when you need to.

In theory, the modularity of the modern data stack makes migration easy. But in practice, migration can be challenging. It may require rewriting transformation rules, for example — a lengthy process that requires specialized expertise.

Here again, the point is that just because you can migrate from one tool to another doesn't mean it's going to be easy. Your goal should be to build a stack that lets you migrate with as little rework as possible whenever you need to.

6. Evaluate tool costs

Consider as well how much data tools will cost. This may seem obvious, but the varying pricing and licensing models that tool vendors use can make it challenging to draw apples-to-apples cost comparisons. One vendor might charge based only on the volume of data you ingest into its tool, for example, while another charges based on data volume as well as the number of different data assets you're working with. Some vendors charge monthly or yearly licensing fees on top of other costs, while others base their fees on usage alone.

A full discussion of how to piece through different data tool pricing models is beyond the scope of this article. But suffice it to say that before committing yourself to a certain tool, take the time to perform an in-depth analysis to gain an accurate estimate of what the tool will actually cost you to use.
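As a starting point for that analysis, it can help to reduce each vendor's pricing to the same yearly figure for your own expected workload. The sketch below assumes three hypothetical vendors that mirror the models described above (volume only, volume plus data assets, and a flat license plus usage). Every rate and workload number is invented for illustration, and real price sheets usually include tiers and discounts that this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingModel:
    """Simplified, hypothetical pricing model; all fees are per month."""

    name: str
    per_gb: float = 0.0           # fee per GB ingested
    per_asset: float = 0.0        # fee per data asset tracked
    monthly_license: float = 0.0  # flat license fee

    def annual_cost(self, gb_per_month: float, assets: int) -> float:
        monthly = (
            self.per_gb * gb_per_month
            + self.per_asset * assets
            + self.monthly_license
        )
        return 12 * monthly


# Invented rates, chosen only to illustrate the comparison.
vendors = [
    PricingModel("volume_only", per_gb=0.5),
    PricingModel("volume_plus_assets", per_gb=0.25, per_asset=2.0),
    PricingModel("license_plus_usage", per_gb=0.125, monthly_license=500.0),
]

# Assumed workload: 2,000 GB ingested per month across 200 data assets.
costs = {v.name: v.annual_cost(gb_per_month=2000, assets=200) for v in vendors}
cheapest = min(costs, key=costs.get)
```

Rerunning the same comparison with a larger or smaller workload is worthwhile, because the cheapest option at one data volume is not necessarily the cheapest at another.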

7. Evaluate personnel requirements

The capabilities and cost of tools are only part of the equation when it comes to building a data stack that puts you in charge. People are equally critical. To operate your stack effectively, in a way that actually creates business value, you need well-structured teams that can bring what's necessary to put your tools to maximum use.

The depth of expertise that you'll need from your team will vary depending on what your data stack looks like and how complex your data management operations are. But however you decide to build your team, it's critical to think from the start about what they'll need to be able to do. Otherwise, you risk being constrained in your ability to take full advantage of your data stack due to shortcomings within your team.

Conclusion: Taking Control of Your Data Stack

In a sense, having so many data tools available today is like eating at a restaurant with dozens of options on the menu: While the breadth of choices available to you is a good thing, it can feel burdensome to determine exactly which selections to make.

I can't tell you what to order for dinner. But I can tell you that when choosing tools to build a data stack, you need to think critically and in-depth about what each tool does and how it does it. Otherwise, you risk creating a set of tools that might seem flexible and modular, but that in practice are just as constricting as the monolithic data platforms of old.
