SAPexpert/CRM
Many companies starting CRM initiatives are not satisfied with the resulting customer views, citing distrust of the data in their systems. Data quality strategies help you resolve data issues that distort customer views. This article offers tips for preparing, blueprinting, and implementing a solid data quality strategy.
Key Concept
Identifying inconsistencies and anomalies in your data can range from basic issues of syntax (formatting) to the more technically challenging semantic definitions of the data. A simple syntax issue could entail formatting all product codes in the same way, which may require a simple conversion process. The more complex semantic problems may require the intelligent interpretation of data within the appropriate context.
Many believe that CRM solutions offer the much-sought-after 360 degree view of customer relationships. However, even the best CRM technologies cannot generate clear and unified customer views on their own. They are constrained by limitations of the data within them. In CRM projects, if you factor in the number of systems — often 10, 20, or more — companies target for consolidation, the result can be a tangle of unreliable and often cryptic data.
For example, take the case of a major gas company that was cleaning up and consolidating customer lists from various CRM systems. Several lists contained references to LDIY, although it was unclear what this represented. Name fields contained it (e.g., John Smith LDIY) as well as address fields (e.g., 240 Main St. LDIY). The project manager considered discarding these extra characters, but wisely didn’t. Instead he brought the business users into the process and discovered that LDIY means “Large Dog in Yard” to a meter reader.
Since there was no other place in the system to put this information, meter readers added it where they could to the customer record. The company needed to maintain this information to prevent emergency room trips and workers’ compensation claims. As part of its data quality strategy, the company created a separate field in the customer database for this information.
The most common problem companies encounter when beginning a data quality project is understanding where to begin. At my company, we developed a data quality best practices guide for SAP implementations, based on past successes with SAP clients (Figure 1). I’ll explain the first three phases of a data quality project (project preparation, blueprint, and implement).

Figure 1
Business users and project managers have a significant role in ensuring success of a data quality initiative
Project Preparation
Companies often find that their current data quality practices fail to address all of their data issues. Frequently these practices rely on only one or two key individuals who know the custom code involved with the data, which is risky.
To begin to solve the data quality problem, you must involve IT and business people at the start. Bring in subject-matter experts from the departments involved with the data. For example, someone from the company’s marketing department may act as a data steward to make sure data adheres to marketing’s data-quality standards.
Once you have set up your team, you should analyze your current data quality practices and decide your project’s priorities. Your answers to the following questions can guide you through prioritizing your data quality initiatives.
- How many internal organizations make decisions based on the mySAP CRM data or reports?
- How critical are these decisions to your business?
- What actions does the company take as a result of these decisions? What is the impact of making a poor decision based on bad data?
- Is the data quality in these systems too poor to consider cleansing at all?
- Which systems support real-time, customer-facing functions? What is the impact of poor quality data to customer retention?
As your cross-functional team identifies your strategic approach based on your analysis, you can see clear priorities emerge in the systems you need to fix. You must address systems with the greatest impact first. Tools used during this phase include data profiling and data discovery technology.
Blueprint
Stage two in a data quality project involves developing a solid plan (blueprint) to ensure project success. In this stage you define data quality metrics and relate these metrics to business initiatives.
Define Data Quality Metrics
Define the data quality metrics you want to track. This may include both high-level, data-centric rules as well as specific rules that apply to a particular system or application. Examples of these rules include:
- Number of records with changes to name fields for improved matching
- Number of records with changes to address data fields for US Postal Service standardization
- Number of duplicate records identified
- Number of processed records with incomplete mailing address but with valid phone numbers or emails
- Number of records with duplicate primary keys
- Number of records with invalid values in xxx field
- Number of transaction records with transaction type = ‘Personal Check’ and a negative transaction value
- Total dollar value associated with erroneous double shipments for transactions with a subset of backordered items
- Total dollar value for customer goodwill where ‘Reason Type’ = “Shipment received late”
- Total dollar value of bills with no invoices
- Total dollar value of invoices with no bills
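Several of these rules can be expressed as simple automated checks. The sketch below shows how three of them might look in Python; the record layout and field names are illustrative assumptions, not an actual SAP schema.

```python
from collections import Counter

# Hypothetical sample of transaction records; field names are
# illustrative, not taken from any real SAP table.
records = [
    {"id": 1, "type": "Personal Check", "amount": -25.00, "zip": "02180"},
    {"id": 2, "type": "Credit Card", "amount": 40.00, "zip": "ABCDE"},
    {"id": 1, "type": "Credit Card", "amount": 15.00, "zip": "02180"},
]

# Rule: number of records with duplicate primary keys
key_counts = Counter(r["id"] for r in records)
dup_keys = sum(c for c in key_counts.values() if c > 1)

# Rule: number of records with invalid values in the zip field
invalid_zips = sum(1 for r in records if not r["zip"].isdigit())

# Rule: transaction type 'Personal Check' with a negative value
bad_checks = sum(
    1 for r in records
    if r["type"] == "Personal Check" and r["amount"] < 0
)

print(dup_keys, invalid_zips, bad_checks)  # 2 1 1
```

Counts like these, run on a schedule, become the trend lines you report against the metrics above.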
Relate Data Quality Metrics to Business Initiatives
Data quality does not usually have a definable cost or revenue. To understand the financial impact of data quality on your organization, you must relate it to business functions. For example:
- What is the revenue associated with transactions with invalid product values? Here you investigate what was actually shipped and what was the expected revenue.
- How frequently do orders with items on back-order result in double shipments of non-back-ordered items?
- How many times do double shipments occur? Look at the cost of these double shipments, and figure out their impact on revenue reporting and inventory.
- How do erroneous shipments impact inventory?
- How often are there shipments that were never billed?
- How often are invoices created for orders that were never shipped? Find out how frequently this resulted in customer service calls and assess the cost of compensating dissatisfied customers.
- How many duplicate mailings are you sending to the same household? Detail the cost for those duplicates in terms of mail pieces, postage, and opportunity lost.
These numbers and metrics are valuable not only for judging how you’re doing on the project; they also matter to project management, because they document successes and justify further expenditure on the data quality project. In this phase, the IT team also needs to consider questions such as:
- How am I going to access the data?
- What will my data model look like?
- How much server power and storage will I need?
Finally, at this stage the IT team should consider setting up test case scenarios with the business users, along with a plan for handling exceptions.
Implement
To help solve the data quality problem at the implementation phase, you can employ one of several strategies. These strategies include SAP Master Data Management (MDM), MDM with third-party data quality tools, and a non-MDM implementation.
MDM
MDM supports SAP Enterprise Service Architecture (ESA). In addition to helping you set up the framework for service-oriented architecture (SOA), MDM offers a user interface (UI) that supports administrative tasks such as handling data exceptions and assigning role-based access to business processes and information. Data managers use this UI to configure data source merging, business rules, and distribution details to downstream applications.
Using the data models and mappings in MDM, you can manually find and merge exact duplicates, select records to merge, and check data ranges during import. MDM can deliver an easy way to connect to and synchronize data and support vertical object data models.
MDM with Data Quality Tools
The option of MDM in conjunction with third-party data quality tools places less emphasis on how the data moves from place to place or how the data is distributed. Instead, it focuses on automating processes that prevent bad data from entering your systems. It also finds anomalies in the data automatically.
MDM with data quality tools can give you better match/dedupe (the removal of duplicate entries) results in two ways. First, instead of just finding exact matches (John Smith in database one to John Smith in database two), data quality tools can find duplicates in misfielded data (John Smith in the address line), or data that has characters missing or transposed (Jon Smith). If you were using MDM alone, you would need to manually search for these items.
A standardization process with data quality tools manipulates the data into the same shape and size and makes sure, for example, that an address is on the address line instead of the name line. This standardization improves matching results, allowing MDM to find a relationship between a part number 09-0990-02 and a part number 0909 9002. The data standardization phase of data quality tools puts the part numbers in the same format.
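A minimal sketch of that standardization step, assuming a canonical form that simply strips punctuation and whitespace (the real rules in a data quality tool are far richer), shows how the two part-number variants from the example come to compare equal:

```python
import re

def standardize_part(raw: str) -> str:
    """Reduce a part number to a canonical alphanumeric string so that
    formatting variants compare equal. This canonical form is an
    assumption for illustration, not an actual tool's rule set."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()

# The two variants from the text now standardize to the same value.
a = standardize_part("09-0990-02")
b = standardize_part("0909 9002")
print(a, b, a == b)  # 09099002 09099002 True
```

Once both records carry the same canonical value, an exact-match pass in MDM can link them.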
Second, data quality tools also cut down on large match windows by providing better matching granularity with more relevant matches. For instance, when matching customer records, MDM may pull from part of a name, zip code, or address line and present many results in the match window that may or may not be actual matches. You need to review the potential matches and make many decisions about the viability of a match (Table 1).
| Name | City | State | Postal | Key | Result |
| --- | --- | --- | --- | --- | --- |
| Steve Sarsfield | Billerica | MA | 02180 | 02MABILSTV | |
| Steven M. Sarsfield | Billreica | Mass | 01280 | 01MABILSTV | Not exact match |
| Stephen Sarsfield | Billerica | MA | 02180 | 02MABILSTP | Not exact match |
| S Sarsfield | Bill. | MA | 02180 | 02MABILSSR | Not exact match |
| Stevan Seinfeld | Billings | MT | 02000-000-0000 | 02MABILSTV | Possible false match |
| Steven Sarsfield | Billerica | MA | | __MABILSTV | Not exact match |

Table 1
Results of a search for “Steve Sarsfield” prior to data cleansing
With data quality tools, a match window is simply the first step. These tools then use routines and business rules to narrow the window, finding data that is misspelled, has transposed letters, is misfielded, has comma-reversed names, or that is clearly different and shouldn’t be in the match window.
Say you’re trying to match your data against the first record, Steve Sarsfield, to find duplicates. To do so, matching technologies build match keys that pull pieces of each record and produce match windows: subsets of your data with an opportunity to match. You can then use third-party data quality tools to perform automated analysis that verifies whether the records in a window truly match.
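A rough sketch of this windowing idea follows. The key layout below (postal prefix, state, city prefix, first-name consonants) is my guess at the style of key shown in Table 1, not the actual algorithm any tool uses, and the similarity check uses Python's standard-library `SequenceMatcher` in place of a commercial matcher.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

def match_key(name: str, city: str, state: str, postal: str) -> str:
    """Build a coarse match key: postal prefix + state + city prefix +
    consonants of the first name. The layout is an assumption made to
    resemble the keys in Table 1, for illustration only."""
    first = name.split()[0].upper()
    consonants = re.sub(r"[AEIOU]", "", first)[:3]
    return f"{postal[:2]}{state[:2].upper()}{city[:3].upper()}{consonants}"

records = [
    ("Steve Sarsfield", "Billerica", "MA", "02180"),
    ("Stevan Seinfeld", "Billings", "MT", "02000"),
    ("Steven Sarsfield", "Billerica", "MA", "02180"),
]

# Group records into match windows by shared key...
windows = defaultdict(list)
for rec in records:
    windows[match_key(*rec)].append(rec)

# ...then verify candidates inside each window with a similarity score.
for key, group in windows.items():
    if len(group) > 1:
        a, b = group[0][0], group[1][0]
        score = SequenceMatcher(None, a, b).ratio()
        print(key, a, "~", b, round(score, 2))
```

Note how the two Sarsfield records share the key 02MABILSTV and land in the same window, while Stevan Seinfeld of Montana keys differently and is never compared.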
In Table 2 a data quality tool cleansed and standardized the data. In this case, the data architect appended the cleansed data and kept the original data intact. The new cleansed records result in more matches when they should, and fewer matches when they shouldn’t.
| Name | Cleansed name | Cleansed middle | Cleansed last | City | Cleansed city | State | Cleansed state | Postal | Geocode result | New key | Result |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Steve Sarsfield | Steven | | Sarsfield | Billerica | Billerica | MA | MA | 02180 | 02180 | 02MABILSSA | |
| Steven M. Sarsfield | Steven | M | Sarsfield | Billreica | Billerica | Mass | MA | 01280 | 02180 | 02MABILSSA | Possible match |
| Stephen Sarsfield | Steven | | Sarsfield | Billerica | Billerica | MA | MA | 02180 | 02180 | 02MABILSSA | Possible match |
| S Sarsfield | S | | Sarsfield | Bill. | Billerica | MA | MA | 02180 | 02180 | 02MABILSSA | Possible match |
| Stevan Seinfeld | Stevan | | Seinfeld | Billings | Billings | MT | MT | 02000-000-0000 | 59101 | 59MTBILSSE | Excluded from match |
| Steven Sarsfield | Steven | | Sarsfield | Billerica | Billerica | MA | MA | | 02180 | 02MABILSSA | Possible match |

Table 2
Cleansed data results for “Steve Sarsfield”
Data quality tools contain built-in knowledge of data, particularly name and address data. For example, global names and addresses have many nuances. An address in Mexico can look quite different than an address in Belgium — a data quality tool knows the difference.
Non-MDM Implementation
If you’re not yet ready to implement MDM, you can connect data quality into your SAP application in several other ways. Data quality vendors have developed connectors into SAP CRM that allow both real-time and batch cleansing. You would use these connectors instead of MDM, although this process does not allow you to manage data easily within the SAPGUI as the MDM process does.
The two data quality certifications that SAP offers are postal validation (PV) and duplicate check/error-tolerant search (DES). SAP designates the PV certification as BC-BAS-PV; it is available for third-party tools that cleanse, standardize, and enhance address data in accordance with postal guidelines. DES, designated BC-BAS-DES by SAP, offers third-party prevention of duplicate addresses entering the system, using error-tolerant searches. If you enter a record that is slightly different, you can immediately find the “almost” duplicate and decide whether it is a match.
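The idea behind an error-tolerant search can be sketched in a few lines. This is not the BC-BAS-DES interface itself; it just illustrates the concept using Python's standard-library `difflib`, with a hypothetical list of existing customer names.

```python
from difflib import get_close_matches

# Hypothetical customer names already in the system.
existing = ["Steve Sarsfield", "Maria Gonzalez", "John Smith"]

def near_duplicates(new_entry: str, cutoff: float = 0.8) -> list[str]:
    """Error-tolerant search: surface existing records that are
    'almost' the same as the one being entered, so a user can decide
    whether it is a match before a duplicate is saved."""
    return get_close_matches(new_entry, existing, n=3, cutoff=cutoff)

print(near_duplicates("Steven Sarsfield"))  # ['Steve Sarsfield']
```

Raising or lowering the cutoff trades false alarms against missed duplicates, the same tuning decision a DES-certified tool exposes in more sophisticated form.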
At my company, the data quality tools connect to the Basis layer and the system calls them via ABAP. Virtually any application that sits on the Basis layer, including mySAP CRM, can call the data quality functions.
Beyond Implementation
Data quality is not a one-time “fix it and forget it” project. Rather, the process of building a framework for capturing, consolidating, and controlling data quality is most likely to be an evolutionary one that involves business users throughout the organization. Here are some guidelines for moving toward higher data quality:
- Start where the impact of poor information is most critical
- Focus on reusable, portable components to economically and incrementally build better data quality across your organization, including SAP and any other enterprise applications.
- Establish automatic, rules-based processing for analyzing, harmonizing, and maintaining data
- Build data management processes with an eye toward major business processes
- Expand and grow data quality practices over time
Steve Sarsfield
Steve Sarsfield is the product marketing manager for Harte-Hanks Trillium Software and the author of many white papers and articles about data quality. He believes in the philosophy that “it’s all about the data,” especially when it comes to making enterprise applications successful.
You may contact the author at steve_sarsfield@trilliumsoftware.com.