Extracting the right data from your transaction system for analysis and decision making is important. You can standardize and correct data using SAP BusinessObjects Data Services. A real-time data migration example illustrates the concepts and walks you through the process. The scenario can help technical teams in any organization understand the data quality process for data migration. You can use this process in a non-SAP environment when data warehouses are built on top of relational database systems for reporting and analysis. You can also use it when the target is in an SAP system.
Key Concept
SAP BusinessObjects Data Services is an information management application for data integration, data quality, data profiling, and text analysis. It allows users to integrate, transform, improve, and deliver trusted data that supports critical business processes, so they can make sound, timely decisions.
This article was originally published on BusinessObjects Expert
Data cleansing is an important aspect of SAP BusinessObjects because it allows companies to combine redundant records and streamline data for front-end users. SAP BusinessObjects Data Services is an extraction, transformation, and loading (ETL) tool that combines two facets of data migration (i.e., data quality and data integration) in a single user interface.
SAP provides SAP Best Practices for Data Migration, a set of preconfigured data migration jobs that use BusinessObjects Data Services to migrate legacy data to another system. SAP Best Practices is available for two versions of BusinessObjects Data Services: 3.2 and 4.0.
Note
This information is aimed at end users and developers who are interested in best practices regarding data cleansing.
This article is based on BusinessObjects Data Services 3.2; the process also applies to BusinessObjects Data Services 4.0. Concepts are explained using real-time data (i.e., transactional data an organization uses in its day-to-day activities) and show how to get the data cleansed and ready to load.
Data migration using SAP Best Practices for Data Migration involves the following tasks:
- Use data quality transforms to ensure clean data in the staging database, which is mapped to the source fields of the best practices job
- Configure the best practices scenarios to your requirements and execute the jobs to load the data into the target system
I focus on the data quality aspect of data migration in real-time scenarios. I use vendor master – basic data load as an example to explain the various steps involved in the data migration process using the best practices.
Note
Although the target used in this article is the vendor master tables in SAP, the target can be any table in a non-SAP relational database or data warehouse. All the transforms described in the article work for both SAP- and non-SAP-based scenarios.
Source Data
Source data for migration is usually extracted from the legacy system in the form of delimited text documents or Microsoft Excel files in comma-separated values (CSV) format. Table 1 shows a sample vendor master data structure with data extracted from a legacy source system. I use this data as a reference throughout this article to explain various steps in the data quality process of data migration.

Table 1
Data extracted from a legacy system
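As a quick aside, outside of Data Services itself a legacy extract like this is simply a delimited file. The minimal Python sketch below shows one way such a CSV extract could be read into an in-memory staging structure for inspection; the file name and the VENDOR_NAME column are assumptions for illustration only.

```python
# A minimal sketch (not part of the best practices job): read a delimited legacy
# extract into a list of row dictionaries. The file name and the VENDOR_NAME
# column are assumptions for illustration only.
import csv

def load_legacy_extract(path="vendor_master_extract.csv"):
    """Read the CSV extract; the first row is assumed to hold the column headers."""
    with open(path, newline="", encoding="utf-8") as f:
        return [dict(row) for row in csv.DictReader(f)]

# Example usage (assuming the file exists):
# records = load_legacy_extract()
# print(records[0]["VENDOR_NAME"])
```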
Mapping Source and Target Data
SAP Best Practices has defined standard target data structures to maintain the source data.
Table 2 shows the data structure for vendor master basic data as defined in the best practices job, which in this case is data migration.

Table 2
An example of data structure
This is a standard structure provided by SAP, designed around the target SAP system's database fields. For example, imagine that the vendor is YASH Technologies Pvt Ltd and the vendor name field in the SAP system is 17 characters long. You cannot store the full name in the first field, so you store the remaining part of the name (i.e., Pvt Ltd) in the vendor last name field, which is optional.
The fields preceded by an asterisk are mandatory. They are mapped to the target field in an SAP system and cannot be left blank. Business users add data to them. If the legacy data does not hold values for the mandatory fields, the best practice jobs fill in default values from the lookup tables defined in the repository. This is the case with the Vendor Account Number field in Table 3. If the source system has values for the mandatory fields, you need to ensure that they are mapped to the right target field. This mapping is the first step in the data quality process.
The next step is to map the legacy and target fields as shown in Table 3. This process includes defining business rules (e.g., providing default values based on the target system configuration).

Table 3
Data source and target fields
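To make the mapping rule concrete, here is a small Python sketch of the idea. The legacy and target field names and the placeholder default value are assumptions; in the actual best practices job, the mappings and defaults come from the repository lookup tables.

```python
# Illustrative sketch of the mapping step. The field names and the default value
# are assumptions; the real mappings and lookup tables are defined in the best
# practices job repository.
FIELD_MAP = {               # legacy field -> target field
    "NAME": "VENDOR_NAME",
    "STREET": "STREET_ADDRESS",
    "ZIP": "POSTCODE",
}
MANDATORY_DEFAULTS = {      # target field -> default used when the source is blank
    "VENDOR_ACCOUNT_NUMBER": "DEFAULT",
}

def map_record(legacy_row: dict) -> dict:
    """Map a legacy row to the target structure and fill mandatory defaults."""
    target = {tgt: legacy_row.get(src, "") for src, tgt in FIELD_MAP.items()}
    for field, default in MANDATORY_DEFAULTS.items():
        if not target.get(field):
            target[field] = default   # mimic the lookup-table default
    return target
```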
A technical developer performs the following data quality activities to produce the clean staging data:
- Address cleansing
- Data cleansing
- Match process
- De-duplication
Let’s look at each of these four activities in detail.
Address Cleansing
Address cleansing ensures that address fields (e.g., City, Region, Country, and Postal Code) are correct and standardized by using the Global Address Cleanse transform. In Data Services, any data migration, loading activity, or enhancement is done using built-in processes called transforms. The transforms refer to the address directories that SAP provides to validate and correct the values.
Note
Verify that you have the latest address directories on the server to ensure correct data. Address directories are updated regularly and can be downloaded from https://service.sap.com/support. For my scenario, the data is based only on US addresses, so I use the U.S. Global Address Cleanse transform.
Figure 1 shows the configuration screen of the U.S. Global Address Cleanse transform. The source fields on the left are mapped to the target fields on the right. The tabs below the mapping area are used to configure the rules that produce the cleansed address fields. The transform corrects the source addresses, generates and populates missing values, and formats the data before it is loaded into the target table.

Figure 1
Mapping U.S. address data in the U.S. Global Address Cleanse transform
You can see the source address in Figure 2 and the target address in Figure 3 after the addresses are corrected.

Figure 2
Addresses prior to the cleanse transform

Figure 3
Cleansed addresses
Executing the Address Cleanse transform cleanses the region, postal code, and address fields and populates the country key (Figure 3). Place all the previously mentioned transforms in the right order of execution under a Data Services project; when you execute the project, Data Services runs the job. There is no transaction code involved because this is a BusinessObjects tool with a Windows-based GUI.
The following results are achieved by the address cleanse process in Figure 3:
- The REGION field is populated with values based on the codes from the source file (e.g., region code HI is replaced with Hawaii in the second row).
- The POSTCODE field is enhanced to the nine-digit US format (e.g., postal code 96819 is enhanced to 96819-1982 in the second row based on the locality and region).
- The COUNTRY field is populated with the country code based on the postal code.
- In the eighth row, the City field is corrected (e.g., Phenix has been corrected to Phoenix).
- The address 3131 N Nimitz Hwy #-105 has been corrected to 3131 N Nimitz Hwy Ste 105 in the third row.
- In the 10th row, the region name is populated as Oregon based on the postal code (notice that the Region field is <Null> in the source data).
This example demonstrates how BusinessObjects Data Services can cleanse the data even though the source data on the original Excel sheet was incomplete.
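The real corrections come from the SAP address directories referenced by the Global Address Cleanse transform, so there is no simple code equivalent. Purely as a conceptual illustration of the kinds of rules involved, the Python sketch below mimics the corrections listed above with tiny hard-coded lookups; these lookups are assumptions, not directory data.

```python
# Conceptual sketch only: the Global Address Cleanse transform validates addresses
# against SAP-supplied directories. These hard-coded lookups merely mimic the
# corrections described above and are not real directory data.
REGION_NAMES = {"HI": "Hawaii", "OR": "Oregon", "AZ": "Arizona"}
CITY_FIXES = {"Phenix": "Phoenix"}
ZIP4_LOOKUP = {"96819": "96819-1982"}          # locality/region-based ZIP+4

def cleanse_address(row: dict) -> dict:
    out = dict(row)
    region = out.get("REGION") or ""
    city = out.get("CITY") or ""
    zipcode = out.get("POSTCODE") or ""
    out["REGION"] = REGION_NAMES.get(region, region)       # expand region codes
    out["CITY"] = CITY_FIXES.get(city, city)                # correct misspelled cities
    out["POSTCODE"] = ZIP4_LOOKUP.get(zipcode, zipcode)     # extend to ZIP+4 where known
    out["COUNTRY"] = out.get("COUNTRY") or "US"             # scenario uses US data only
    return out
```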
Data Cleansing
Data cleansing ensures that data is parsed, corrected, and enhanced to fit business needs. Data cleansing standardizes data, and during de-duplication of data, it helps eliminate duplicate records. Among the cleansing activities are the following:
- Condensation, which removes spaces at the beginning and end of data in the address-cleansed file. This activity helps ensure that the data fits into the target field.
- De-punctuation, which removes punctuation from the source files to save space, similar to condensation.
- Capitalization, which changes the case to uppercase in the source file. Rules are part of the transform and can be customized according to the users’ requirements.
- Standardization, which makes telephone and fax numbers uniform and puts them in a consistent structure.
Let’s discuss these activities further.
Condensation
You achieve condensation using a Query transform. As shown in Figure 4, use the ltrim() and rtrim() functions to remove the spaces at the beginning and end of the data.

Figure 4
Condensation of data using ltrim() and rtrim()
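For readers who want the logic outside the tool, here is the same trimming rule as a minimal Python sketch, with lstrip() and rstrip() standing in for the Data Services ltrim() and rtrim() functions; the VENDOR_NAME field name is an assumption.

```python
# Minimal sketch of condensation: strip leading and trailing spaces, mirroring
# what ltrim() and rtrim() do in the Query transform. VENDOR_NAME is an assumed
# field name used only for illustration.
def condense(row: dict) -> dict:
    out = dict(row)
    out["VENDOR_NAME"] = (out.get("VENDOR_NAME") or "").lstrip().rstrip()
    return out

print(condense({"VENDOR_NAME": "  YASH TECHNOLOGIES  "}))  # {'VENDOR_NAME': 'YASH TECHNOLOGIES'}
```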
De-punctuation and Capitalization
You perform de-punctuation and capitalization using the Data Cleanse transform, which corrects the data and makes duplicate records easier to identify before the load. A Data Cleanse transform is usually run before a Match transform: standardized values allow the Match transform to identify duplicate records reliably, and those duplicates are then removed. As a result, downstream processing works on the right subset of data rather than on the full data set, which reduces execution time when the data volume is large. The required data cleansing rules are configured in the Options tab (Figure 5).

Figure 5
Configuring de-punctuation and capitalization rules
The results of the Data Cleanse activity include the following (Figure 6):
- Vendor names are fully capitalized
- Phone and fax numbers are standardized to the US format

Figure 6
Results of de-punctuation and capitalization activities
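To summarize the remaining cleansing rules in code form, the sketch below shows de-punctuation, capitalization, and a simple US phone standardization in Python. The Data Cleanse transform applies these through configurable rules, so the exact output formats here are assumptions.

```python
# Conceptual sketch of de-punctuation, capitalization, and phone standardization.
# The Data Cleanse transform applies these through configurable rules; the exact
# output formats below are assumptions for illustration.
import re

def depunctuate(value: str) -> str:
    return re.sub(r"[^\w\s]", "", value)              # drop punctuation characters

def capitalize(value: str) -> str:
    return value.upper()                              # force uppercase

def standardize_us_phone(value: str) -> str:
    digits = re.sub(r"\D", "", value)                 # keep digits only
    if len(digits) == 10:                             # assume 10-digit US numbers
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return value                                      # leave anything else unchanged

print(capitalize(depunctuate("Yash Technologies, Inc.")))   # YASH TECHNOLOGIES INC
print(standardize_us_phone("808.842.1953"))                  # (808) 842-1953
```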
Match Process
Once the data is standardized, the next step is to identify the duplicate records so they can be eliminated. This is done using the Firm Address Match transform (Figure 7). In my example, matching is based on the vendor name and the address.

Figure 7
The Firm Address Match transform
Figure 8 shows there is one duplicate record. You can identify it using the Match transform.

Figure 8
Data after address cleansing
After you execute the Match transform, you see the results in Figure 9. The system adds a new column called Firm_Address_Status to the result set. The values in this column identify the duplicate records:
- A status of U or P indicates a unique record
- A status of D indicates a duplicate record
Figure 9 shows that the duplicate record is populated with a status of D.

Figure 9
Result data after Match Transform
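The Firm Address Match transform applies configurable match criteria, so the following Python sketch is only a toy illustration: it keys on the cleansed vendor name and address and flags later occurrences as duplicates. It assigns only U and D (omitting the P status for simplicity), and the field names are assumptions.

```python
# Toy illustration of duplicate flagging after cleansing. The real Firm Address
# Match transform uses configurable match criteria; this sketch keys on the
# cleansed name and address, assigns U to first occurrences and D to repeats,
# and omits the P status for simplicity.
def flag_duplicates(rows):
    seen = set()
    for row in rows:
        key = (row.get("VENDOR_NAME", "").upper(), row.get("STREET_ADDRESS", "").upper())
        row["Firm_Address_Status"] = "D" if key in seen else "U"
        seen.add(key)
    return rows
```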
De-duplication
De-duplication is the final step in the data quality process. Once you identify the duplicate records using the Match transform, the next step is to analyze and remove the duplicates. This is done with a Query transform that filters the records on the Firm_Address_Status values generated by the Match transform, as shown in Figure 10, separating the duplicate records from the unique ones.

Figure 10
The de-duplication data flow
A developer adds filters in the WHERE tab of the Query transform as shown in Figure 11.

Figure 11
The Query transform
The system checks the values in the target tables (vendor_dedup table, which is used as a source here) for duplicates. As shown in Figure 12, the duplicate row is loaded into the vendor_duplicates table (right side), and the unique records are loaded into the vendor_unique table (left side).

Figure 12
A duplicate row added on the right
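Expressed outside the tool, the filter logic from the WHERE tab amounts to the Python sketch below, with list names mirroring the vendor_unique and vendor_duplicates tables; this illustrates the rule, not the Query transform itself.

```python
# Sketch of the de-duplication split driven by Firm_Address_Status: unique records
# (status U or P) go to one target, duplicates (status D) to the other, mirroring
# the WHERE filters shown in Figure 11.
def split_by_status(rows):
    vendor_unique = [r for r in rows if r.get("Firm_Address_Status") in ("U", "P")]
    vendor_duplicates = [r for r in rows if r.get("Firm_Address_Status") == "D"]
    return vendor_unique, vendor_duplicates
```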
Figure 13 shows the final set of data after the data quality job is executed.

Figure 13
The cleansed data
The table shown in Figure 13 is used as the staging table. The clean data is now ready to be loaded into the vendor tables in the target system. You then run the BusinessObjects Data Services best practices jobs after mapping the staging table fields as the source. Detailed documentation on the configuration and execution process is available at https://help.sap.com/bp_dmgv332/DMS_US/HTML/index.htm.
Jeswin Davidson Ebenezer
Jeswin Davidson Ebenezer is an SAP BusinessObjects consultant with YASH Technologies. He has about nine years of experience working on BusinessObjects ETL and reporting implementations.
You may contact the author at Jeswin.ebenezer@yash.com.