Including a data validation process as part of your SAP BW or SAP NetWeaver BI implementation helps you find problems in your data more quickly and lends credibility to your data. Learn how to optimize your data validation effort with seven important principles.
Key Concept
Validation is a process that checks incoming data in your BW system. It ensures that critical data elements, such as the number of records or sums of certain value fields (key figures), are updated properly in the data warehouse.
SAP BW and SAP NetWeaver BI users must know and trust that BW/BI data is correct, complete, and current. The longer bad data sits in the data warehouse, the greater the risk of making poor business decisions based on that data. Data validation becomes more critical as the data volume increases and its use spreads across the organization.
Most organizations perform some form of data validation, but it is rarely part of a formal process. These organizations often rely on tracking statuses, such as data load and packet statuses, as an indicator of BW data quality. Although this is certainly one part of ensuring proper data quality, it is not the whole story: loads can complete successfully, yet data errors can still arise from flaws in data extraction, transformation, and update. When data errors occur, user confidence in data warehouse quality erodes, which can ultimately undermine the success of the enterprise data warehouse (EDW) initiative.
Let us explain the data validation concept and show you the best ways to approach it. We’ll then take you through the seven key principles every good data validation effort should follow. They are:
- Validate all data
- Validate data at a high level of granularity
- Put business meaning behind the validation elements
- Make data validation a continuous and automated process
- Know the source of your data
- Customize your validation approach
- Validate the EDW across all layers of the data warehouse architecture
The Data Validation Concept
Data validation provides users clear proof that the EDW data they are viewing is valid. However, it is not a simple process. You must understand the associated business processes, data entry modes, and the specific data warehousing models and applications.
For example, when implementing data validation measures for Financial Accounting/Controlling (FI/CO), you should have key users from the finance department participate in the process. They know best which pieces of data are vital to organizational measurement and decision making. Without this involvement, the validation functions cannot achieve their full potential: improving user confidence in EDW output and preventing data errors.
Once established, different parts of the organization can use the validation solutions. For example, the finance group can use validation analysis both to prepare reporting designs prior to go-live and to support ongoing month-end closing processing. The validation checks ensure that BW imports all the required data correctly from the source system.
7 Principles for Data Validation
1. Validate all data. Validating only a representative sample of the data is insufficient. All it takes is a few missing or duplicated records, or garbled key figures, to ruin the accuracy of many reports. To validate data, you need to determine the following:
- Application areas to validate
- Level of granularity for the validation
- Time frame for data validation (days, months, quarters, years)
- Validation frequency (daily, weekly, monthly, yearly)
For example, you can evaluate Profitability Analysis (CO-PA) line item data from the current month and the previous three months to prepare for each month-end close. Validating that this data has arrived correctly from the R/3 source into BW assures finance users that the closing reports contain accurate data.
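These scoping decisions can be written down explicitly before any check is built. The following is a minimal Python sketch (illustrative names only; BW itself would implement this in ABAP or through process chain settings) that captures the four decisions for the CO-PA example and derives the start of the validation window:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ValidationScope:
    """The four scoping decisions for one validation rule."""
    application_area: str   # e.g., "CO-PA line items"
    granularity: str        # level at which records are compared
    timeframe_months: int   # how far back the comparison reaches
    frequency: str          # "daily", "weekly", "monthly", "yearly"

def window_start(today: date, months_back: int) -> date:
    """First day of the month `months_back` months before `today`."""
    total = today.year * 12 + (today.month - 1) - months_back
    return date(total // 12, total % 12 + 1, 1)

# The CO-PA example: current month plus the previous three, checked monthly.
copa = ValidationScope("CO-PA line items", "aggregated by company code", 3, "monthly")
print(window_start(date(2007, 6, 15), copa.timeframe_months))  # 2007-03-01
```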
2. Validate data at a high level of granularity. Validation performed at a document line item detail level is comprehensive, but requires time and effort. Moreover, it duplicates the extraction process, which has time and database space implications.
Therefore, it is important to design validation functions at a high level of granularity. They should focus on detecting when and where a problem exists, rather than providing transparency down to the individual documents that differ. Data validation is intended primarily for analyzing data quality broadly, with the ability to segment the data into smaller subsets to find where quality problems exist.
For example, with a validation design you could evaluate the aggregated record count and key figure sums for all the sales orders changed in R/3 in the past 90 days to ensure that they have all loaded correctly into BW. You should design the validation to provide subtotal groupings for company code, sales organization, and document type. This allows the user to figure out where record count or key figure sum differences exist.
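To make the idea concrete, here is a minimal Python sketch of such an aggregated comparison. It is not SAP-delivered code; the field names company_code, sales_org, doc_type, and net_value are hypothetical stand-ins for the extracted characteristics and key figure:

```python
from collections import defaultdict

def aggregate(rows, group_keys, key_figure):
    """Roll rows up to [record count, key-figure sum] per subtotal grouping."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        totals[key][0] += 1
        totals[key][1] += row[key_figure]
    return totals

def compare(source_rows, bw_rows,
            group_keys=("company_code", "sales_org", "doc_type"),
            key_figure="net_value"):
    """Flag groupings where counts or sums differ between source and BW."""
    src = aggregate(source_rows, group_keys, key_figure)
    bw = aggregate(bw_rows, group_keys, key_figure)
    for key in sorted(set(src) | set(bw)):
        s, b = src.get(key, [0, 0.0]), bw.get(key, [0, 0.0])
        if s != b:
            print(f"Mismatch at {dict(zip(group_keys, key))}: source={s}, BW={b}")

# One sales order line missing on the BW side surfaces as a count/sum gap.
source = [{"company_code": "1000", "sales_org": "US01", "doc_type": "OR", "net_value": 500.0},
          {"company_code": "1000", "sales_org": "US01", "doc_type": "OR", "net_value": 250.0}]
bw = source[:1]
compare(source, bw)
```

The user sees only that company code 1000 in sales organization US01 has a gap, which is exactly the level at which the problem should first surface.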
3. Put business meaning behind the validation elements. Validate the data elements that are meaningful to the business, such as item net value and sales quantity. Technically, any key figure values could serve as the basis for validation. However, data validation is also an important method of improving user confidence in the data, so the numbers you validate must be meaningful to the user. For example, if users rely on the correctness of net sales numbers, they take comfort in knowing that someone has validated those numbers and in seeing the validation results for the net sales area.
4. Make data validation a continuous and automated process. You should execute validation functions in a carefully timed sequence relative to the data loading process. To realize the full value of a validation framework, you must perform the validation continuously, without imposing an operational burden on the BW support team.
Design, set up, and perfect validation functions in background processes that run automatically. This allows the BW support team to interpret problems highlighted in the validation results, investigate their causes, and quickly enact corrective measures, rather than spending time performing manual functions to initiate and compile validation data.
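As a sketch of the automation pattern, the Python below registers validation checks and runs them as a single background step, logging results for the support team to interpret. In a real landscape, the trigger would be a process chain step or an external scheduler; the registered check here is a hypothetical placeholder:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

CHECKS = []  # validation functions registered to run after each load

def validation_check(func):
    """Register a check so the background job picks it up automatically."""
    CHECKS.append(func)
    return func

@validation_check
def copa_record_counts():
    # Placeholder: a real check would compare source vs. BW counts and sums
    # for the scoped time window described under principles 1 and 2.
    return True, "CO-PA counts and sums match for the scoped window"

def run_all_checks():
    """Intended to run as a background step right after the load completes."""
    for check in CHECKS:
        ok, detail = check()
        log = logging.info if ok else logging.error
        log("%s: %s", check.__name__, detail)

if __name__ == "__main__":
    run_all_checks()
```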
Integrating data validation is a critical part of the design, quality assurance, and production phases of the project lifecycle. Successful data quality process integration offers substantial benefits for an SAP BW/SAP NetWeaver BI implementation. With this integration in place, an organization uses BW support resources more effectively, detects and resolves problems faster, and encourages greater user confidence in the data.
5. Know the source of your data. The key things to know are:
- Which fields the application populates
- Which of these fields you use in data warehouse reporting
- How frequently you load this information into the EDW
- When data is archived out of the source system
With this knowledge, setting up your validation rules and processes becomes more streamlined and mechanical.
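One way to make this mechanical is to record the answers as structured metadata that the validation rules can read. Below is a speculative Python sketch, using the purchase requisition table EBAN from the example that follows; the field list and attribute names are illustrative, not a complete extractor definition:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SourceProfile:
    """What to know about a source before writing validation rules for it."""
    table: str                  # e.g., "EBAN" for purchase requisitions
    extracted_fields: set       # fields the extractor takes from the source
    reported_fields: set        # subset actually used in EDW reporting
    load_frequency: str         # e.g., "daily"
    archived_before: date       # source data older than this has been archived

def usable_window_start(profile: SourceProfile, requested_start: date) -> date:
    """A source-to-BW comparison can only cover data still in the source."""
    return max(requested_start, profile.archived_before)

eban = SourceProfile(
    table="EBAN",
    extracted_fields={"BANFN", "BNFPO", "MENGE", "PREIS"},
    reported_fields={"MENGE", "PREIS"},
    load_frequency="daily",
    archived_before=date(2006, 1, 1),
)
print(usable_window_start(eban, date(2005, 6, 1)))  # clipped to 2006-01-01
```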
For example, in the area of purchase requisitions, companies develop their own extractors because none are provided with BW Business Content. The validation team must meet with the BW team (and possibly key purchasing users) to learn which fields they took from table EBAN in R/3, how and where they mapped them into the BW InfoSource, and how the system populates them in the EDW InfoProviders. To design the optimal validation rules, the validation designer must learn how the system stores the data (record keys), what its load frequency and volumes are, and when the source system archives its data.
6. Customize your validation approach. Not all BW objects are the same, so you shouldn't validate them all in the same way. Validation frequency, selection parameters, and comparison object definitions differ depending on the BW object type. For example, a DataStore object (formerly an operational data store [ODS] object) is fundamentally different from a highly summarized InfoCube in its record types, lengths, and retention periods.
Say you need to perform a validation of data in a detailed Accounts Payable (FI-AP) DataStore object. For this situation, you could have a more detailed drill-down design to evaluate more precise characteristics, such as the cost center and G/L account. However, you should perform this validation for a shorter time frame — for example, only the previous 30 days at daily level, due to the likely volume of records stored in that DataStore object.
Conversely, FI-AP validation performed against a summarized, reportable InfoCube may be limited to organizational and high-level document characteristics, such as company code, purchasing organization, or document type. In this case, the validation can include a longer time frame — for example, six months at a calendar-month level.
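The contrast between the two designs can be captured as per-object validation profiles. A minimal Python sketch with hypothetical attribute names, showing how drill-down characteristics, time frame, and time grain differ by object type:

```python
from dataclasses import dataclass

@dataclass
class ValidationProfile:
    """How one BW object type is validated; attribute names are illustrative."""
    target: str
    drilldown: tuple          # characteristics used for subtotal groupings
    window: str               # how far back the comparison reaches
    time_grain: str           # granularity of the time axis

# Detailed DataStore object: finer drill-down, shorter window (record volume).
dso_profile = ValidationProfile(
    target="FI-AP DataStore object",
    drilldown=("cost_center", "gl_account"),
    window="previous 30 days",
    time_grain="day",
)

# Summarized InfoCube: coarser drill-down, longer window.
cube_profile = ValidationProfile(
    target="FI-AP InfoCube",
    drilldown=("company_code", "document_type"),
    window="previous 6 months",
    time_grain="calendar month",
)
```

A validation framework can then read the profile for each target object instead of hard-coding a single approach.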
7. Validate the EDW across all layers of the data warehouse architecture. BW allows almost limitless data transformation and staging within the data warehouse architecture. Therefore, it is important not to stop at the EDW layer. Validation should occur from the EDW layer up through all the analytical decision support system objects. Some BW objects may allow validation using the inherent redundancy of data in the BW system. In such cases, use that redundancy to cross-validate data.
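As a final sketch of the cross-layer idea: record counts legitimately change when data is summarized, so the Python below reconciles only the key-figure sum across layers. Layer names and field names are illustrative:

```python
def layer_sum(rows, key_figure="net_value"):
    """Key-figure sum for one layer's slice of the same data selection."""
    return sum(row[key_figure] for row in rows)

def cross_validate(layers, key_figure="net_value"):
    """Compare each layer's sum against the baseline layer (first entry)."""
    base_name, base_rows = layers[0]
    base = layer_sum(base_rows, key_figure)
    for name, rows in layers[1:]:
        total = layer_sum(rows, key_figure)
        status = "matches" if total == base else f"differs ({total} vs {base})"
        print(f"{name} {status} {base_name}")

edw_layer = [{"net_value": 100.0}, {"net_value": 250.0}]
reporting_cube = [{"net_value": 350.0}]  # summarized, but the sum reconciles
cross_validate([("EDW DataStore object", edw_layer),
                ("Reporting InfoCube", reporting_cube)])
```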
With these seven principles in mind, you can properly define, design, and implement your validation solution architecture in SAP BW and SAP NetWeaver BI. Having a defined validation process in place ensures that the data coming into your BW system is correct. This enables users to spend more time analyzing the data and less time double-checking it.
Michael Loveless
Michael Loveless is an SAP NetWeaver BI strategic architect for Sapiex, Inc., a consulting group that focuses on strategic architecture planning and integration for SAP applications. Previously, he worked for SAP America (1998-2004) as a platinum SAP NetWeaver BI consultant. Michael has worked on many SAP NetWeaver BI projects in roles ranging from an SAP NetWeaver BI application configuration consultant to a global strategic SAP NetWeaver BI SME. Currently, he is working as an SAP NetWeaver BI 7.0 project manager and architect for a global forecasting solution project in the pharmaceutical industry, and as a global BI strategic planner for federal government projects.
You may contact the author at michael.loveless@sapiex.net.
Gueorgui Patchov
Gueorgui Patchov is an SAP BW data architect with Sapiex, Inc., a consulting group that focuses on strategic architecture planning and integration for SAP applications. Previously, he worked for the Rohm and Haas Company in Philadelphia, PA, and has provided BI consulting services in the chemical, pharmaceutical, and consumer products industries. Currently, he is consulting for a large SAP NetWeaver BI 7.0 implementation project in the pharmaceutical industry.
You may contact the author at goshopachov@yahoo.com.
If you have comments about this article or publication, or would like to submit an article idea, please contact the editor.