Getting Ready for AI with a Strong Data Management Foundation
Key Takeaways
⇨ With the advent of AI, data quality is an even more stringent requirement.
⇨ Recent research indicates that foundational data management practices are not in place for many companies.
⇨ Companies that clean their data as part of an SAP S/4HANA migration gain a byproduct: a strong data foundation ready for AI applications.
Garbage in, garbage out. We’ve all heard the adage applied to business process design, enterprise systems implementation, and really any scenario that has raw material as an input and a finished good as an output. The quality of the result depends on the quality of the inputs.
This has been true for decades when applied to data analytics, visualization, and decision support. The most sophisticated algorithms cannot credibly inform decisions without reliable operational data as an input. And now, with the advent of AI, data quality is an even more stringent requirement.
“The good news is that all these data initiatives that companies had before around data governance, data cleansing, and data management are still relevant if not more so than ever before,” said Daniel Yu, SVP, Product Marketing & Solution Management, SAP Data and Analytics.
Table stakes for preparing enterprise data to leverage AI productively include the following:
- Data quality: Ensuring that data is accurate, complete, consistent, and meets the needs of the organization. This involves data profiling, data cleansing, and data validation processes (see the sketch after this list).
- Data management: Establishing procedures for data collection, storage, and retrieval, as well as defining data ownership and accountability.
- Data privacy and security: Implementing measures to protect sensitive and confidential data in compliance with data protection regulations (e.g. GDPR, HIPAA). This includes data encryption, access controls, and data masking.
- Data lifecycle management: Managing data throughout its entire lifecycle, from creation and capture to archiving and disposal.
- Data cataloguing and metadata: Creating a centralized catalogue of data assets and defining metadata to describe data elements, helping users understand and locate data.
- Data lineage: Providing a detailed record of how data is sourced, transformed, stored, and consumed.
- Data stewardship: Appointing data stewards responsible for overseeing data within specific domains or business units, ensuring data quality and compliance.
- Data compliance: Ensuring that data handling practices align with legal and regulatory requirements, industry standards, and internal policies.
- Data access and authorization: Defining who has access to what data and under what conditions, as well as implementing access control mechanisms.
- Data governance council: Forming a governance council or committee to oversee and enforce data governance policies and resolve data-related issues.
- Data training and awareness: Providing training and education to employees about data governance practices and the importance of data stewardship.
- Data auditing and monitoring: Regularly auditing and monitoring data-related processes to identify issues, assess compliance, and make improvements.
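To make the data quality item concrete, the sketch below shows the kind of lightweight profiling and validation check that practice implies. It is a minimal, hypothetical example in Python with pandas; the column names, sample records, and rules are assumptions for illustration, not part of any SAP tool or specific company process described in this article.

```python
# Minimal data-profiling sketch (illustrative only): checks completeness,
# uniqueness, and validity against a small, made-up customer extract.
# Column names, sample values, and rules are hypothetical.
import pandas as pd

# Hypothetical sample of master data records
records = pd.DataFrame(
    {
        "customer_id": ["C001", "C002", "C002", "C004"],
        "country": ["US", "DE", "DE", None],
        "email": ["a@example.com", "bad-email", "b@example.com", "c@example.com"],
    }
)

def profile(df: pd.DataFrame) -> dict:
    """Return simple data quality metrics: completeness, duplicates, validity."""
    email_ok = df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", na=False)
    return {
        # Completeness: share of non-null values per column
        "completeness": df.notna().mean().round(2).to_dict(),
        # Uniqueness: duplicate business keys that would undermine master data
        "duplicate_customer_ids": int(df["customer_id"].duplicated().sum()),
        # Validity: rows whose email fails a basic format rule
        "invalid_emails": int((~email_ok).sum()),
    }

if __name__ == "__main__":
    print(profile(records))
```

In a governance process along the lines described above, failures surfaced by checks like these would be routed to a data steward for correction before the records feed reporting, analytics, or AI models.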
While the data management practices that have supported reporting and analytics use cases serve as a foundation for AI, their importance is now greatly magnified.
“It’s no longer decision support, meaning there’s some other element there to make a final decision. With AI, that decision now will become automatic. So that data trust level is 10 times, maybe 100 times more important,” said Mike Keilen, SVP Solution Management & GTM BTP-Core at SAP.
Recent research indicates that these table stakes are not in place for many companies. More than half of companies that participated in SAPinsider’s recent Data Management Strategies report said that their data management strategies were meeting their requirements only partially, slightly, or not at all.
For building products company Pacific Coast Companies, the journey to data quality and readiness began five or six years ago, when it recognized that managing data was mission critical. The company split its data among its six subsidiaries and then held each subsidiary accountable for the quality of its own data. Results have been mixed and data quality remains a challenge.
CIO Marty Menard recommends three steps in pursuit of data quality: first, admit you have a problem; second, establish new processes for how you manage data; and third, invest in resources to clean everything up and make sure people follow the process going forward.
“One of the four or five things that will really limit our growth is being able to have accurate high-quality data,” said Menard.
Pacific Coast Companies brought in a third party to help identify data inconsistencies. The provider's tools combed through company data pulled out of SAP, flagged the inconsistencies, and made recommendations. This diagnostic was the first step; the second was to agree on how the process would work going forward. Today, the company manages all its data within SAP and is considering adding SAP's master data management solution.
Menard and his colleagues are not standing still waiting to completely solve data quality issues before pushing forward with predictive analytics, robotic process automation, AI, and machine learning initiatives. In fact, over the summer he sponsored an internship program that brought in computer science students to test and prove out experimental AI models.
If anything, AI is serving as an accelerant and a forcing function for data management improvement initiatives. It is evolving data management from what used to be a CIO problem to what is rapidly becoming an enterprise-wide opportunity.
SAP Datasphere is a Business Technology Platform (BTP) tool that is well positioned to enable the data management practices that will ultimately support AI adoption. The next generation of SAP Data Warehouse Cloud, Datasphere is a comprehensive data service that enables every data professional to deliver seamless and scalable access to mission-critical business data, and it equips organizations to deliver meaningful data to every data consumer with business context and logic intact.
What this means for SAPinsiders
SAP S/4HANA migrations could be a tailwind.
Companies making the move from ECC to SAP S/4HANA are already making decisions about what to do with historical and operational data. Data migration companies, like Switzerland-based Data Migration International, work with their customers to identify which data needs to be migrated to the target system and which data can be left behind in a data warehouse. Often only 10% or 20% of data needs to be migrated, so data quality initiatives need only be applied to the subset bound for SAP S/4HANA. Companies that put in the effort to clean their data as part of this migration gain a byproduct: a strong data foundation ready for AI applications.
Data literacy is a cultural necessity.
AI, particularly generative AI, is a productivity tool. Companies need to train their employees so they fully understand how data can be used on a day-to-day basis and what data means for the company, end customers, and the marketplace. Those involved in crafting the underlying data models must understand how bias is reflected in them, consider the impact of the decisions they are making, and be proactive in limiting the bias that ends up in those models.
Process integrity can’t be overlooked.
Getting data ready for AI is as much a process problem as it is a tools problem. Companies must have clearly documented procedures for how data is captured, stored, transformed, and consumed.
“We can go and buy a data quality solution in the marketplace and we can teach people how to use it, but if the process isn’t established for how they’re going to manage this with the end in mind, the end being how I’m going to do a comparison from a reporting or analytics perspective, it’s a waste of time and money,” said Menard.