Learn how to create a real-time data quality Web service that delivers accurate and reliable customer data. Explore time-saving tips and tricks for developing a flexible data flow that any application can use to cleanse and standardize data, and for configuring an SAP BusinessObjects Data Services server.
Key Concept
The data quality engine on the SAP BusinessObjects Data Services server uses a break key to minimize the comparison recordset considered for a data match. Without the break key narrowing down the recordset, the match operation is inefficient and processes more data than necessary, causing the data flow to perform poorly. Performance is especially important for a real-time job, as these jobs are often embedded in other applications as Web services. For example, SAP NetWeaver Master Data Management uses the data quality engine natively to perform real-time data quality. Any performance degradation in the real-time job is magnified in the host application.
Data quality is a topic that is consistently at the forefront of most IT corporate initiatives. Businesses are more concerned than ever about ways to enhance customer experience at every touch point. Having the best customer record at input is the first step in promoting and maintaining quality master data in the enterprise.
Take, for example, ordering an item online and the convenience of a business already knowing your address from past order history, or correcting an entered address with a prompt suggesting, “Did you mean this address?” Companies with mature business intelligence implementations often know a customer’s entire purchase history, and they use this history to both speed up and enhance the ordering experience. Prompts such as “Would you be interested in these products?” are often used in these scenarios.
These examples signify that a company has a mature data quality environment and is actively pursuing master data management. Behind many of these successful data quality implementations is SAP BusinessObjects Data Services, which can verify, associate, and select the best customer record at every customer interaction.
Note
To gain the most from this article, you should be familiar with the SAP BusinessObjects Data Services platform, data warehousing, and extraction, transformation, and loading (ETL) concepts.
I frame my discussion around using SAP BusinessObjects Data Services for real-time data quality. Through examples, I demonstrate a properly constructed data flow that first cleanses a fragmented record and then matches this cleansed data against a deduped master customer list in a data warehouse. All of these operations happen at runtime via an SAP BusinessObjects Data Services Web service. You can create this Web service in the SAP BusinessObjects Data Services Designer, which is the client-installed development studio.
Implementing a real-time SAP BusinessObjects Data Services data quality data flow is a complex process. More steps are involved in creating a real-time data flow than in producing a standard batch data flow. Your first step is to design a real-time data flow that is logical and straightforward. Your next two steps are to expose the real-time data flow job as an SAP BusinessObjects Data Services Web service and configure it within the SAP BusinessObjects Data Services Management Console. Finally, you need to test the Web service to make sure that it functions as you have planned.
As I walk you through these steps, I offer best practices in real-time data flow design, explain time-saving configuration tips within the SAP BusinessObjects Data Services Management Console, and show you how to use open source software for unfiltered testing.
Step 1. Design a Real-Time Data Flow
Figure 1 shows you an example of a real-time data quality data flow as it appears in the SAP BusinessObjects Data Services Designer. The first thing to notice in the data flow is that the source (DqMatchCstmrIn) and target (DqMatchCstmrOut) objects are both XML messages. These messages allow the data flow to exchange information with just about any type of application source or target in real time using a Web service. The Web service created is stateless: applications do not need to reference the schema or the structure of the data coming into or going out of the real-time job. This keeps the real-time job extremely flexible in a decoupled, service-oriented model.
In short, all the logic and decision making happen within the data flow, which allows a wide variety of applications to use it. The data that is submitted to and received from the data flow via XML messages can be a generic structure. By being generic, the XML data exchange is flexible and therefore fits into the landscape of any application that sends or receives XML messages.
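To make this concrete, here is a minimal sketch in Python of what building such a generic input message might look like. The element names (PERSON1, PHONE, DOB) are illustrative assumptions, not the actual schema of this job; the real structure is defined by the XML message source in the data flow.

import xml.etree.ElementTree as ET

# Build a hypothetical DqMatchCstmrIn message. Element names are
# illustrative assumptions -- the real schema comes from the data flow's
# XML message source.
def build_request(person, phone="", dob=""):
    root = ET.Element("DqMatchCstmrIn")
    ET.SubElement(root, "PERSON1").text = person   # multi-purpose name field
    ET.SubElement(root, "PHONE").text = phone
    ET.SubElement(root, "DOB").text = dob
    return ET.tostring(root, encoding="unicode")

print(build_request("Don Loden", phone="5125550100"))

Because the structure is this generic, any application that can produce a small XML document like this one can call the service; nothing in the payload is specific to the sending application.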
Figure 1
Real-time data flow for cleansing and standardizing personal data
The XML input leads to a data cleanse transform, which cleanses and standardizes the customer name fields. In this real-time job, the multi-purpose person field, containing both first and last names, is split into the appropriate given (i.e., first) and family (i.e., last) name fields: MATCH_GIVEN_NAME1 and MATCH_FAMILY_NAME.
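The data cleanse transform performs this split with its parsing dictionaries, so the following Python fragment is only a simplified illustration of the input/output contract, not of the transform's actual logic; the output field names match those in the data flow.

# Naive illustration of the name split's contract: one multi-purpose
# person field in, MATCH_GIVEN_NAME1 and MATCH_FAMILY_NAME out.
# The real data cleanse transform uses parsing dictionaries, not split().
def split_person(person):
    parts = person.strip().split()
    return {
        "MATCH_GIVEN_NAME1": parts[0] if parts else "",
        "MATCH_FAMILY_NAME": parts[-1] if len(parts) > 1 else "",
    }

print(split_person("Don Loden"))
# {'MATCH_GIVEN_NAME1': 'Don', 'MATCH_FAMILY_NAME': 'Loden'}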
As you can see at the top of the data flow in Figure 1, I add a comment that explains the purpose of the data flow. It is a best practice to include comments about the end purpose or any special logic within the design of the data flow to help others use the schema more effectively. You can also include your name and contact information and the date you created the data flow, in case users need to contact you.
Clicking the EnglishNorthAmerica_DataCleanse transform icon in the data flow opens the data cleanse editor window (Figure 2). The mapping for the Schema Out setting returns the GIVEN and FAMILY names for the Schema In person field.
In this example, the Schema Out setting is the output of the EnglishNorthAmerica_DataCleanse transform, and it is what performs the first name/last name split.
Figure 2
Mappings for cleansing the data in the transformed person field, displayed in the data cleanse editor window
After the data is cleansed according to the mappings, it moves into the case transform, a decision point in the data flow named CASE_NULL_KEY (Figure 1). The case transform interprets the cleansed data from the data cleanse transform and does one of two things: it either passes the data through to the match transform if there is enough information for a match (i.e., Name_Phone or NameDOB) or forgoes the match if there is not enough information (i.e., NULL). You use the case transform because you want to avoid an expensive, long-running operation in the data quality engine when there is not enough data for a successful match.
If there is enough input data to perform the match, then a break key is assembled in CASE_NULL_KEY, and the case transform passes the data to the match transform to perform the match. If the case transform forgoes the matching operation, it still passes the cleansed data from the data cleanse transform to the XML target message DqMatchCstmrOut.
A break key is merely a concatenation of fields that narrows a comparison recordset. The break key is an important concept: without the break key narrowing down the recordset, the match operation processes more data than necessary, causing the data flow to perform poorly. You can perform the match without the key, but it is best practice to use one. You will achieve a better result and have a better-performing data flow.
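The following Python sketch shows the idea behind both the CASE_NULL_KEY routing and the break key assembly. The specific fields and the three-character family name prefix are assumptions for illustration; the real key is whatever concatenation of fields best narrows your comparison recordset.

# Sketch of the CASE_NULL_KEY decision plus break-key assembly.
# Field choices and prefix lengths are illustrative assumptions.
def route_record(rec):
    family = rec.get("MATCH_FAMILY_NAME", "")
    phone = rec.get("PHONE", "")
    dob = rec.get("DOB", "")

    if family and phone:                 # Name_Phone path
        return "Name_Phone", family[:3].upper() + phone[-4:]
    if family and dob:                   # NameDOB path
        return "NameDOB", family[:3].upper() + dob
    return "NULL", None                  # not enough data; skip the match

print(route_record({"MATCH_FAMILY_NAME": "Loden", "PHONE": "5125550100"}))
# ('Name_Phone', 'LOD0100')

Only records that share a break key value are compared with one another, which is why a well-chosen key shrinks the comparison recordset so dramatically.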
The match in this example is performed on a portion of the family name and phone number data or on name and date of birth data. If records match, they pass from the match transforms to the merge transform, which merges the recordsets together. A final query transform (i.e., OutputPrep) nests the schema to the XML output (DqMatchCstmrOut) along with the original record that was submitted to the data flow. The nesting provides a clear break for the records and is standard practice for XML messages.
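As a rough picture of that nesting, the response might look something like the sketch below. The element names and the GLOBAL_CUSTOMER_ID field are illustrative assumptions based on the output described here and in the test results later in this article.

# Hypothetical shape of the nested DqMatchCstmrOut message: the original
# input record followed by the matched master records nested beneath it.
sample_response = """\
<DqMatchCstmrOut>
  <INPUT_RECORD>
    <PERSON1>Don Loden</PERSON1>
    <PHONE>5125550100</PHONE>
  </INPUT_RECORD>
  <MATCHES>
    <MATCH>
      <MATCH_GIVEN_NAME1>Don</MATCH_GIVEN_NAME1>
      <MATCH_FAMILY_NAME>Loden</MATCH_FAMILY_NAME>
      <GLOBAL_CUSTOMER_ID>42</GLOBAL_CUSTOMER_ID>
    </MATCH>
  </MATCHES>
</DqMatchCstmrOut>"""
print(sample_response)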
Step 2. Expose the Real-Time Job as a Web Service
Exposing the real-time job as a Web service allows the data flow to interact with any application capable of sending and receiving Web services using XML messages. To expose the real-time job and data flow, open the SAP BusinessObjects Data Services Management Console and navigate to the Real-Time and Web Services nodes in the navigation panel on the left side of the console.
Next, select the Web Services Configuration tab, and then click Add Real-time Services from the menu below (Figure 3). This lists all the real-time jobs in the repository that you can select to expose as Web services.
Figure 3
Expose the Data Services real-time job as a Web service
Step 3. Configure the Real-Time Job as a Web Service
To configure the real-time job as a Web service and add it to the SAP BusinessObjects Data Services WSDL, click the Real-Time job node in the navigation pane, and then select the access server name for which you want to configure the job. Next, select the Real-Time Job Configuration tab. As shown in Figure 4, you adjust the timeout settings and the system configuration (i.e., the database connection) that the job uses.
Figure 4
Configure the real-time job as a Web service
Step 4. Test the Web Service
To perform a true test of the SAP BusinessObjects Data Services job as a Web service, you need to get “outside” of the SAP BusinessObjects Data Services Designer. When real-time jobs are run inside the designer tool, they are run in batch mode because this tool only tests and compiles batch jobs as executable objects. You can test the real-time job as a batch job, but not as a true real-time job using a source and target XML message.
To test as a Web service, the job must respond to an XML message, and this cannot happen within the SAP BusinessObjects Data Services Designer. To run the job as a real-time job, the Web service must be called, and data elements must be passed into the Web service via WSDL using XML messages. In this way, testing real-time Web services is not like testing other jobs or data flows in the designer tool.
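To make those mechanics concrete, here is a minimal sketch in Python, using the requests library, of what such a call looks like. The service endpoint, the envelope's element names, and the disabled certificate verification are all assumptions for illustration; take the real operation and message names from the WSDL (whose case-sensitive address is covered below), and note that your configuration may also require a SOAPAction header.

import requests

# The article's example WSDL address; the path is case sensitive.
WSDL_URL = "https://bods1:28080/DataServices/servlet/webservices?ver=2.1&wsdlxml"
ENDPOINT = "https://bods1:28080/DataServices/servlet/webservices"  # assumed

# Hypothetical message body -- the real element names come from the WSDL.
envelope = """<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <DqMatchCstmrIn>
      <PERSON1>Don Loden</PERSON1>
      <PHONE>5125550100</PHONE>
    </DqMatchCstmrIn>
  </soapenv:Body>
</soapenv:Envelope>"""

# Sanity-check that the (case-sensitive) WSDL address is reachable.
assert requests.get(WSDL_URL, verify=False).status_code == 200

resp = requests.post(
    ENDPOINT,
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"},
    verify=False,  # assumes a self-signed certificate on the example host
)
print(resp.status_code)
print(resp.text)  # the nested DqMatchCstmrOut message with matched records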
You can test the Web service in a variety of ways, but I find the free open source utility SOAPUI to be a good testing mechanism. It is readily available, straightforward to set up, and yields reliable results. To test with SOAPUI, you must install the SOAPUI product on a workstation and create a project. You can find instructions for installing SOAPUI at https://www.soapui.org/Getting-Started/installing-on-windows.html.
To create the project, launch the SOAPUI client and specify the parameters for the project (Figure 5). These parameters name the project and connect the project in SOAPUI to the SAP BusinessObjects Data Services WSDL that was exposed as a Web service in the preceding section.
The complete address for the WSDL/WADL for SAP BusinessObjects Data Services is https://bods1:28080/DataServices/servlet/webservices?ver=2.1&wsdlxml. Note that the address for the WSDL is case sensitive, which is not mentioned in the product documentation; remembering this will save you time and aggravation. In fact, you can copy and paste the above text into the Initial WSDL/WADL field, as shown in Figure 5.
Figure 5
Set up the SOAPUI project to point to the SAP BusinessObjects Data Services WSDL
Be sure to select the Create sample requests for all operations? check box and save the settings by clicking OK. You are now ready to use the Navigator to browse to the real-time services request.
As shown in Figure 6, you can use the Navigator pane on the left side of the SOAPUI application to find the Web service exposed in the SAP BusinessObjects Data Services Management Console. Follow the path Test_Project > Real-time_Services > Request 1. Request 1 exists by default in the newly created Test_Project.
Figure 6
Use the Navigator to find the exposed Web service Request 1
Clicking Request 1 opens the testing window shown in Figure 7. Here you enter your test data into the XML document fields in the left input pane, and then click the green arrow icon to submit the request to test the Web service. For example, I enter my name and phone number into the XML fields in the left pane of the testing window.
Figure 7
Data in the testing window before and after submitting the request
The testing shows that the Web service has returned the cleansed and matched records. Now, any application receiving this data not only has accurate and reliable data, but also the global_customer_id, which was returned from the data warehouse, as shown in the highlighted row_id column (Figure 7). This new column strengthens the result by attaching additional intelligence from the data warehouse.
These results mean that the application produces a better customer record with standardized data, and most importantly, offers the link to the global customer record. The new record is linked to the customer’s purchase history from the data warehouse. This is valuable information for the business to know at the customer touch point.

Don Loden
Don Loden is an information management and information governance professional with experience in multiple verticals. He is an SAP-certified application associate on SAP EIM products and has more than 15 years of information technology experience in the following areas: ETL architecture, development, and tuning; logical and physical data modeling; and mentoring on data warehouse, data quality, information governance, and ETL concepts. Don speaks globally and mentors on information management, governance, and quality. He authored the book SAP Information Steward: Monitoring Data in Real Time and is the co-author of two books: Implementing SAP HANA and Creating SAP HANA Information Views. Don has also authored numerous articles for publications such as SAPinsider magazine, TechTarget, and Information Management magazine. You may contact the author at don.loden@protiviti.com.