Data Prep Boosted by IBM Updates to Cloud Pak for Data

Before any enterprise can take advantage of data analytics, it has to ensure its data has been uploaded, cleaned and formatted so that it can be easily transferred into analytical models. It's not glamorous work, but IBM is aiming to make it easier.

Scot Petersen, Editor at Large

September 11, 2019

Whether you are trying to do "DataOps," "MLOps" or "AIOps," proper data preparation is the key to success in analytics, machine learning or artificial intelligence. Unfortunately for many businesses, data prep is also the hardest step.

The data preparation problem is the target of the latest updates to IBM's Cloud Pak for Data, its Red Hat OpenShift-based platform for collecting, organizing and analyzing data. This week, IBM announced updates to three Cloud Pak for Data components that add more automation to each of those areas.

  • IBM updated the Watson Knowledge Catalog with new tools to add more sources of third-party data and ensure data quality. 

  • InfoSphere DataStage, IBM's extract, transform and load (ETL) tool, now includes a Change Data Capture feature to improve the processes for transforming and transferring data into analytics models.

  • IBM also debuted StoredIQ InstaScan, a product that analyzes data compliance and confidence levels. 

“We are finding that the biggest struggle clients have if they want to do anything with AI comes down to data preparation, which today is still highly manual, which is hard to believe in 2019,” said Rob Thomas, IBM's GM of Data and AI.

“What we are announcing are not super exciting things, but are important in the process, things like automating data matching and metadata creation. If you can automate these things that are very labor-intensive today, your engineers and data scientists are freed up to do more meaningful work,” he said.

Conquering data management is not just a technology problem, Thomas stressed. Inside many companies, the bigger problem is often creating a mindset that treats data as a valuable resource. “Every company knows they need this but some may not be ready to attack the problem [from a cultural standpoint]. You have to build the right ‘data culture’ that gets the organization thinking about different ways to manage data to leverage analytics.”

Even businesses that have the tools and the culture may not be focused enough on specific problems, Thomas said, and that lack of focus can further bog down an organization looking for untapped data resources.

Should an enterprise try to get its data shop in order first or try to attack a specific problem?

“I encourage clients to pick a problem; otherwise the challenge with a generic data prep exercise is that you'll never finish,” he said. “You eventually want to get to a more mature level, like creating a data catalog, but until then, to build momentum, you have to pick a problem and show that you can make progress on that.”

Thomas also encourages customers not to get overly focused on unstructured data, which can be harder to analyze, and to stick to bread-and-butter data sources.

“There’s a lot of noise about the rise of unstructured data, like video, language processing and images, but 95 percent of this work is actually on structured data: point of sale, CRM, ERP, web clickstream data,” he said. “You will solve more problems by going after structured data sets.”

Thomas called out one successful early adopter, the New Jersey state court system, which reorganized its data to the point where it reduced the time to pull a risk profile from hours to seconds and was able to reduce its jail population by 35 percent. Another customer, AMC Networks, began running more personalized ads based on the customer data it had collected.

Not every business is capable of achieving quick results, but the goal is the same: to reach a point where data analytics can be self-service. “You can’t bring enough people to deal with the amounts of data you are dealing with today,” he said. “You need automation, data curation, metadata management. You need self-service-ready data.”

Scot Petersen is a technology analyst at Ziff Brothers Investments, a private investment firm. He has an extensive background in the technology field. Prior to joining Ziff Brothers, Scot was the editorial director, Business Applications & Architecture, at TechTarget. Before that, he was the director, Editorial Operations, at Ziff Davis Enterprise. While at Ziff Davis Media, he was a writer and editor at eWEEK. No investment advice is offered in his blog. All duties are disclaimed. Scot works for a private investment firm, which may at any time invest in companies whose products are discussed in this blog, and no disclosure of securities transactions will be made.