Converting data from a legacy system to an SAP system can be a daunting task. Follow experienced advice for planning and executing your data conversion strategy, from developing project scope to testing and monitoring the data conversion. Included with these tips are two downloads: a sample Microsoft Visio data dependency planning chart and a conversion project plan checklist.
Key Concept
The data conversion process ensures that you migrate all your legacy data in a usable, consistent format into your new SAP system with the right quality and right dependencies. The process helps you preserve data usability by using the right tools, processes, and resources at the right time. A data conversion strategy involves extracting data from source legacy systems, transforming the data in the format required by SAP load programs, loading the data into the target SAP system, and finally reconciling data loaded in the SAP system.
Data is the lifeblood of any organization. Companies use data to run day-to-day business, reporting, analytics, and financial metrics. Most importantly, top management uses data for decision making. It is the data conversion manager’s responsibility to make sure that all the data is captured, transformed, and loaded into the target SAP system.
Teams involved in successful implementations take data conversions very seriously, and they make every effort to preserve data qualitatively and quantitatively. They use the converted data during integration testing cycles and end-to-end testing cycles to make sure that the desired business processes work seamlessly. In this article, I draw on my experience with many successful SAP implementations to explain how to prepare for data conversion and how to address the technical and functional considerations.
The Data Conversion Process
Invariably, all new SAP implementation projects require data conversion efforts. Companies need to migrate the data from existing legacy systems into the new SAP systems to run their day-to-day operations and to make business-critical decisions.
Often the sponsor and the key project management team members think that data conversion is simple and do not put enough emphasis on the data conversion effort. In almost all implementations in which the team had this attitude, either the implementation failed or there was tremendous rework to get things fixed after the fact.
I call this the 1-10-100 rule. For each unit of data that you plan to convert, fixing and cleaning the data at the source (legacy system) takes one unit of effort. When you fix it during testing, you need to put in 10 units of effort. If you fix the same issue in the target SAP system after go-live, you need to apply 100 units of effort.
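The 1-10-100 rule can be sketched as a tiny cost model. The multipliers below are the rule-of-thumb values from the text, not measured figures, and the function names are illustrative:

```python
# Illustrative cost model for the 1-10-100 rule: the later a data
# defect is fixed, the more effort it consumes. Multipliers are
# rule-of-thumb values, not measured figures.
EFFORT_MULTIPLIER = {"source": 1, "testing": 10, "post_go_live": 100}

def remediation_effort(defect_count: int, stage: str) -> int:
    """Total effort units to fix defect_count bad records at a given stage."""
    return defect_count * EFFORT_MULTIPLIER[stage]

# Fixing 50 bad records in the legacy source vs. after go-live:
assert remediation_effort(50, "source") == 50
assert remediation_effort(50, "post_go_live") == 5000
```

The hundredfold gap is why cleansing effort belongs in the legacy system, before extraction.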
The business owns, maintains, and uses the data that needs to be converted. Make every effort to involve business representatives; this is crucial in designing the conversion processes. The business representatives have in-depth knowledge of the current legacy applications and data, and they often are in a better position to resolve issues with the data conversion.
Functional team members should take ownership in creating the conversion data maps. Technical personnel should be involved in designing conversion tools to use and in resolving technical issues.
Figure 1 depicts the data conversion process flow at a high level.
Figure 1
The process flow of data from the source legacy extract to loading and reconciliation in a target SAP system
In the sections that follow, I describe several of the important tasks you need to undertake when planning and executing a data conversion project.
Determine the Scope of Legacy Data to Convert
During the requirements phase of the project, when you request information on what data should be migrated from existing legacy systems to the new SAP system, the first response from the business team members and data owners often is, “we want all the data to be converted.” At this point, as the data conversion manager, you need to ask them why they need all the data to be converted. In some instances it is needed; in others it is not. You should make sure that you are only converting data that adds value and makes business sense.
As a first step, identify key business contacts and data owners and schedule a series of workshops. These workshops should aim at identifying all existing legacy systems and understanding the legacy system data.
- Identify all the data sources and their dependencies
- Do a volume analysis. Identify the volume of data available at each data source.
- Decide how much history data to convert (e.g., three or five years’ worth of data). This depends on the business requirements and statutory regulations.
- Decide on what data volumes to convert and at what intervals. You may wonder why. When my team and I converted HR data at one of our major clients, we had a huge volume of employee qualifications data (two million rows). This data was not needed immediately and did not need to be loaded during cutover; deferring it drastically reduced our cutover window. We loaded this data every day from 6:00 PM to 6:00 AM for a week after go-live.
- Consider the effect of the data conversion on system sizing and performance issues. You may receive requirements to load a decade’s worth of data. This not only adds implementation time, but it also affects system performance. You should involve the Project Management Office (PMO) because system sizing and performance issues affect the overall project.
Publish and Share a Conversion Guide
The majority of SAP implementations are spread into multiple releases spanning six to nine months. Publish a data conversion guide to help you to consolidate all the file formats needed for conversions. The guide should include the objective, roles and responsibilities, data mapping, translation, data cleaning techniques, validation techniques, reconciliation techniques, manual data entry rules, data conversion procedures, and data correction procedures. For example, a large company may decide to roll out its SAP implementation in phases:
Phase 1: North America
Phase 2: Europe, the Middle East, and Africa (EMEA)
Phase 3: The rest of the countries
Another company may decide to roll out by business units. It is a good idea to develop a conversion guide and define the conversion processes. The conversion guide essentially should cover:
- Scope of data conversion activities
- Data guidelines (transfer method, file format)
- Exception and error handling procedure
- Data mapping tools
- Record layouts and file mapping
- Data dependencies. For example, transaction data cannot be loaded unless master data exists.
The conversion guide should be stored in a central location accessible to all stakeholders on the project. Any updates should be made to this central copy.
Map Legacy Data to the SAP Format
After the legacy data is extracted, it should be transformed and mapped to the SAP data format described in the conversion guide. All mandatory fields required in SAP tables should be extracted or defaulted and populated in the file for loading into the SAP system.
As an example, the HR employee subgroup may not be available or it may not be a required field in your legacy system, but in SAP systems, this is a mandatory field. In this case, my team and I decided to default this field with N1 for all the converted data.
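The defaulting rule above can be sketched in a simple mapping routine. The SAP field names (PERSK for employee subgroup) and the N1 default mirror the example in the text; the legacy record layout itself is hypothetical:

```python
# Sketch of defaulting a mandatory SAP field that has no legacy source.
# The "N1" employee subgroup default follows the example in the text;
# the legacy field names are hypothetical.
def map_employee(legacy: dict) -> dict:
    return {
        "PERNR_LEGACY": legacy["emp_id"],
        "VORNA": legacy["first_name"],
        "NACHN": legacy["last_name"],
        # Employee subgroup is mandatory in the SAP system but may be
        # absent in the legacy extract, so default it when missing.
        "PERSK": legacy.get("emp_subgroup") or "N1",
    }

row = map_employee({"emp_id": "100234", "first_name": "Ana", "last_name": "Reyes"})
assert row["PERSK"] == "N1"
```

Keeping the default inside the mapping routine (rather than hand-editing extract files) makes the rule repeatable across mock cycles.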
The conversion guide is especially helpful when it comes to flat file structure definition. It spells out what fields are needed, description of the field in SAP terminology, required or optional field, data type, and length.
Figure 2 lists the field characteristics for a data conversion. In most instances, the legacy extract is transformed in an external system such as Microsoft Access or Excel. Your business data owner should champion and complete this data transformation initiative with the help of the data conversion team members.
Figure 2
Flat file structure input for custom SAP data conversion programs. The file structures for each conversion are published in the conversion guide.
Design Custom Tables to Hold Legacy ID Mapping
Wherever possible, it is a good idea to create a custom table to hold the legacy IDs and SAP IDs. A development team member can create a custom table via transaction SE11 (ABAP Data Dictionary) and create a table maintenance view in transaction SM30 (call view maintenance). The custom table helps during reconciliation as well as subsequent loads (Figure 3).
Figure 3
This custom cross-reference table holds the legacy project ID along with the SAP project ID
For example, when you load customer data into SAP ERP, you should store the legacy customer ID and the SAP-generated customer ID. When you load your quotes, contracts, and sales orders, your custom program can read the table and obtain the SAP ID for the corresponding legacy ID.
Similarly, for HR data, you should store the legacy employee number and the SAP-generated personnel number in a custom table. Subsequent transactional data such as time entry records and sales orders on this project will have a legacy project ID in the legacy file. You can use this cross-reference table to refer to the correct SAP project ID when you load project time entries and project-related sales orders.
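The cross-reference lookup described above can be sketched as follows. In the SAP system this would be a custom table created in SE11; here a plain Python dict stands in so the lookup logic is easy to see:

```python
# Minimal sketch of the legacy-ID-to-SAP-ID cross-reference lookup.
# A dict stands in for the custom SE11 table.
xref = {}  # legacy ID -> SAP-generated ID

def record_mapping(legacy_id, sap_id):
    xref[legacy_id] = sap_id  # populated as each master record loads

def resolve(legacy_id):
    # Transactional loads (quotes, orders, time entries) call this to
    # swap the legacy key for the SAP key before posting.
    if legacy_id not in xref:
        raise KeyError(f"No SAP ID yet for legacy ID {legacy_id}; "
                       "load master data first")
    return xref[legacy_id]

record_mapping("CUST-0042", "0000100057")
assert resolve("CUST-0042") == "0000100057"
```

The hard failure on a missing key is deliberate: it surfaces a violated load dependency (transactional data before master data) instead of silently posting a bad reference.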
Translate Legacy Fields
You cannot just extract a raw file from your legacy system and load it into your SAP system. You need to perform translations on some of the legacy fields. This depends on your SAP configuration and should be provided to the team translating these legacy fields. I recommend you create these translations in your SAP data conversion programs because the translated target values originate in the SAP system.
For example, in standard SAP configuration, 01 and 02 represent gender in HR master data whereas in your legacy system you have male as M and female as F. You need to carry out gender translation 01 for M and 02 for F. This is a simple translation and can be performed while extracting the data in the legacy system or in your custom SAP program while loading the data into your SAP system.
You can choose to translate these values in the legacy system, an external system, or in the load program. If the translation fields are few, it is advantageous to work either in your legacy or SAP load programs. If the fields to be translated are too complex and too many, it is advisable to work in an external system such as MS Access, Informatica, or on a SQL server.
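Wherever the translation runs, it amounts to a table-driven value swap. A minimal sketch, using the gender example from the text (M/F to 01/02); the function name is illustrative:

```python
# Translation of legacy code values to SAP configuration values.
# Driving the routine with a mapping table means the same code serves
# any field that has a small, fixed value set.
GENDER_MAP = {"M": "01", "F": "02"}

def translate(value: str, mapping: dict, field: str) -> str:
    try:
        return mapping[value]
    except KeyError:
        # Surface unmapped values instead of loading garbage.
        raise ValueError(f"Untranslatable {field} value: {value!r}")

assert translate("M", GENDER_MAP, "gender") == "01"
```

Raising on an unmapped value, rather than defaulting, forces the mapping issue back to the business data owner during testing rather than after go-live.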
Note
Default values, in contrast, should be applied in the SAP data load program because these values may not exist in the legacy system.
Scrub Data in the Legacy System and External Applications
You may have a few decades of data in the legacy system that you do not want to bring into your new SAP system. As much as possible, unused or inactive data should be archived. The company may decide to keep this data in the legacy system or archive it on third-party databases or data disks. The data owner should determine which data is needed so that it is cleansed, formatted, mapped, transformed, and extracted in the desired format as needed by SAP loads.
The key steps for data cleansing are:
- Define the business rules for data scrubbing. What data do you want to bring into your new system? As an example, you may have duplicate customers and contacts in your legacy system. When you bring over this data, you should dedupe the records.
- Define conversion extract criteria. What data do you really need to extract and what is the criteria for extraction? For example, you may want to extract sales orders active since the beginning of the financial year. Alternatively, the business may decide to extract only orders with pending invoices or billing documents.
- Assess legacy data for accuracy, duplications, and completeness
- Identify conversion mapping data issues or anomalies. As an example, sometimes in the legacy system, some of the important fields are free text fields. As a result of the nature of the field type, you could have typos. Another example is that you may have 100 different values for a field in legacy system, whereas in the SAP system, you have only three values defined as per the company business requirement. This causes mapping issues and anomalies.
- Define the process for monitoring data cleansing effectiveness
- Document the data cleaning process and key success factors (KSFs). This is especially useful for future releases.
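A simple deduplication pass of the kind described in the first step might look like this. Real cleansing rules are business-specific; the key-normalization here is only an illustration:

```python
# Collapse duplicate records on a normalized key, keeping the first
# occurrence. Which fields form the key is a business decision.
def dedupe(records, key_fields):
    seen, unique = set(), []
    for rec in records:
        # Normalize case and whitespace so "ACME Corp" and "acme corp"
        # collapse to one key.
        key = tuple(" ".join(str(rec[f]).split()).lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

rows = [{"name": "ACME Corp", "city": "Dallas"},
        {"name": "acme corp",  "city": "Dallas"},
        {"name": "Beta Inc",   "city": "Austin"}]
assert len(dedupe(rows, ["name", "city"])) == 2
```

Which of the duplicates to keep (first, most recent, most complete) is exactly the kind of rule the business data owner must decide.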
Legacy data issues that might arise during this step fall into three basic categories:
- Syntactic: These are surface differences, such as spelling errors, usually the result of typing or data entry mistakes. You can address this kind of error with spell checking or phonetic (sound-alike) comparison.
- Structural: The internal representation of data is inconsistent between sources. Typical problems include definition inconsistencies. For example, in SAP ERP Human Capital Management personnel data, the values for the gender field are 01 for male and 02 for female, but in the legacy system, the values are M and F. Another example involves currency: one system stores two decimal places while the other stores three.
- Semantic: These occur when the users’ interpretation of the data differs. The automated script or program may not be able to logically interpret the data. These errors can also occur when a disagreement among source records requires a human resolution, such as inconsistent definitions or duplicate vendor master records. For example, this type of issue can occur when a customer has multiple addresses and you are unsure which one is the most current. In this case, you need the business data owner involvement to decide which record to extract and load.
Note
The business data owner is responsible for the quality of the data. The data conversion team supports them and advises them about the best approach to transform the data.
Establish Dependencies Between Each Load
I strongly recommend that you develop a dependency chart for all your conversions. This is the starting point for you to plan your resources. Microsoft Visio is very useful for creating this type of chart.
Figure 4 shows an example of an HR data conversion dependency chart. If you look at level 2 (Infotype 1000) and level 3 (Infotypes 1001, 1002, 1005), the data conversions in level 3 cannot be loaded unless the level 2 data conversions are loaded. The dependencies are depicted by the arrow connectors. This chart helped everyone in the HR team on my project understand the sequence and dependencies. You can download the Visio file for this chart at the end of the article.
Figure 4
Data conversion dependency chart for HR conversions
The dependency chart helps you manage conversion loads effectively and efficiently. With the help of the dependency chart (and the sequencing diagram in Figure 5), your team can plan the number of loads that can be concurrently scheduled. You can then time them to make the most of the SAP NetWeaver Application Server functionality.
Figure 5
Data conversion sequencing: logical sequence diagram
When you load data into your SAP system, it uses its own application servers. Based on the available application servers and processors, you can plan data loads so you use them at optimum capacity without idling the system. For example, you can always plan to run 30 batch jobs concurrently at any given time based on the server capacity. Scheduling more than 30 may lead to resource contentions and load delays.
To load balance the servers, I created data conversion sequencing. This chart helps your team to concurrently load various data conversions based on the dependencies and availability of the system processors (application server processors).
Test the Data Conversion
Any software development should go through testing. This is where you discover and fix issues. To optimize resources, it is a best practice to plan three mock conversion cycles for the data conversion testing. You carry out all these tests on your QA clients, which should more or less simulate your production client.
The benefits of testing over mock conversion cycles are that you can:
- Verify that the right data is extracted and loaded into the target SAP system
- Confirm that data transformation and mapping are done accurately
- Have the data reconciled and verified by the business data owner
- Fine-tune the extraction, transformation, and loading (ETL) programs to run efficiently in the shortest possible time
- Capture the actual load time for each data conversion. These timings are useful in cutover project planning.
Conversion cycles follow three steps: Extract data from the legacy system, transform the data to fit the SAP load program format, and load the data into the SAP system.
Step 1. Extract data from legacy systems. At each cycle, data is identified with the help of the business owner (extraction criteria). This data should be cleaned in the legacy system and extracted. Alternatively, you can extract it and clean it in external systems such as Access or Informatica.
Step 2. Transform the data into the format required by SAP load programs. This includes data mapping as well as data transformation. You can carry this step out in Access, Informatica, or SAP NetWeaver Process Integration (SAP NetWeaver PI).
Step 3. Load the data into the designated SAP client. The QA system should be refreshed for each cycle. The client refresh is a complex process that your Basis admin team performs. If it is a first-time implementation, the QA client should be completely clean. For subsequent releases, the QA client should be refreshed from the production client. The idea is to have your QA system as close as possible to your production system.
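The three steps above can be sketched as a minimal pipeline skeleton. Each stage here is a stub; in practice, extraction runs against the legacy system and the load step calls the SAP conversion program:

```python
# The three cycle steps as a minimal, composable pipeline skeleton.
def extract(source_rows, criteria):
    return [r for r in source_rows if criteria(r)]     # step 1: extract

def transform(rows, mapper):
    return [mapper(r) for r in rows]                   # step 2: transform

def load(rows, loader):
    ok = sum(1 for r in rows if loader(r))             # step 3: load
    return {"total": len(rows), "loaded": ok, "failed": len(rows) - ok}

stats = load(
    transform(extract([{"id": 1, "active": True}, {"id": 2, "active": False}],
                      criteria=lambda r: r["active"]),
              mapper=lambda r: {"SAP_ID": r["id"]}),
    loader=lambda r: True)
assert stats == {"total": 1, "loaded": 1, "failed": 0}
```

Keeping the stages separate makes it easy to rerun only the failing stage in a later mock cycle.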
Make sure you capture the statistics listed in Figure 6 for each conversion load. These statistics help you compare and measure progress from cycle 1 to cycle 3.
Figure 6
Statistics to track during each conversion mock cycle
Each conversion mock cycle should show continuous improvement, and the last cycle’s load rate should be 100% or close to it. Any failed records should be added manually by the business. Adding failed records manually is especially critical for master data. As an example, suppose one customer master record fails due to a syntactic data issue and this customer is one of your major customers with 100+ sales orders tied to the customer master. Not fixing this one record manually will result in more than a hundred transactional record failures down the line.
After you capture details in the form provided in Figure 6, the conversion lead should consolidate the data for all conversions to provide a status update to the PMO and the stakeholders.
Figure 7 shows you the consolidated view of all the data conversions’ statuses.
Figure 7
Consolidated status report of all your conversions
Validate the Data Loads
After loading data in the SAP system, the conversion team member should validate the data (compare the data loaded in the SAP system against the legacy data file) in an Excel sheet, an Access file, or within the SAP system via a custom program. Validating immediately after loading helps the team identify mapping and transformation issues, any bugs in the data conversion programs, and any configuration issues. The key roles involved in this process are the conversion lead or manager and the data conversion team members.
The objectives for data validation include:
- Validate that the source data loaded into the SAP system as expected
- Verify the functionality of configuration used by conversion programs
- Verify the accuracy of the cross-reference mapping tables (old legacy ID to new SAP ID)
- Reconcile source data to SAP data
- Identify and fix legacy data issues
- Verify the sequence of data loads
Data validation should be automated as much as possible. As an example, we extracted data loaded into the SAP system and compared it with the legacy file data. We loaded these two files into Access and did a comparison. You can perform this validation in Access, Excel, Informatica, SAP NetWeaver PI, or webMethods. The idea behind the validations is to make sure that the data in the legacy file is accurately mapped to fields in the correct SAP format for the SAP load programs.
You should validate data whenever the data form, structure, or content changes. To uncover these changes, you should compare the pre-cleansed and post-cleansed data reports. The different types of data validation include:
- Legacy extract file validation: Validate the file with independent existing legacy reports
- Pre-SAP load validation: Validate that the data mapping and data transformations are accurate in the final extract file. You can validate this data against SAP configuration tables, translation tables, and master data.
- Load validation: Capture the counts of records in the legacy extract, successfully loaded records, errored out records, and the error message report
- Post-SAP load validation: Includes quantitative and qualitative validation. In this process you compare the SAP data with independent reports from the legacy system.
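A post-load validation of this kind can be sketched as a keyed comparison of the legacy extract against data read back out of the SAP system. This mirrors the Access comparison described above, in plain Python:

```python
# Compare the legacy extract with records read back from the SAP
# system, keyed by legacy ID. Returns what never loaded and what
# loaded with wrong values.
def compare(legacy_rows, sap_rows, key):
    legacy = {r[key]: r for r in legacy_rows}
    sap = {r[key]: r for r in sap_rows}
    missing = sorted(set(legacy) - set(sap))      # extracted but never loaded
    mismatched = [k for k in legacy.keys() & sap.keys()
                  if legacy[k] != sap[k]]         # loaded with wrong values
    return missing, mismatched

legacy = [{"id": "A1", "amt": 100}, {"id": "A2", "amt": 250}]
sap    = [{"id": "A1", "amt": 100}]
missing, bad = compare(legacy, sap, "id")
assert missing == ["A2"] and bad == []
```

The two result lists map directly onto the statistics in Figure 6: records missing from SAP are load failures, while mismatches point to mapping or transformation bugs.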
Reconcile the Data
After the data is loaded, the business team should reconcile the data by making sure the data in the legacy system matches what is now in the SAP system. The team should check that all the mappings, translations, and transformations carried over correctly from the legacy system. You can use different sampling techniques, such as Weibull’s statistical approach, to pick a random sample of records for reconciliation. You can use any random sampling method as long as the sample is representative of the whole set and the method is proven.
The data should be reconciled by logging on to the SAP system and the legacy system. Data should never be reconciled using data dumps from the SAP and legacy systems. For example, if you are reconciling sales contracts, the user should use SAP transaction VA43 (display contract), open the contract, and on another screen open the sales contract on the legacy system to compare all the key fields. During this reconciliation process, no data should be corrected in the target SAP system.
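Drawing the reconciliation sample can be as simple as a uniform random draw. Any sound sampling method works, per the text; the 5% rate below is an arbitrary illustration, not a recommendation:

```python
# Uniform random draw of record IDs for manual reconciliation.
import math
import random

def reconciliation_sample(record_ids, rate=0.05, seed=None):
    rng = random.Random(seed)          # a seed makes the draw repeatable
    n = max(1, math.ceil(len(record_ids) * rate))
    return rng.sample(record_ids, n)   # sample without replacement

ids = [f"DOC{i:05d}" for i in range(1000)]
sample = reconciliation_sample(ids, rate=0.05, seed=7)
assert len(sample) == 50 and set(sample) <= set(ids)
```

Seeding the generator lets the business data owner and the conversion team check the same sample independently, screen by screen, as the text prescribes.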
After completing the reconciliation, the business data owner reports the success and failure rates. If the failure rate is high or if too many discrepancies are observed, you should request a data reload. This may be necessary to correct the legacy file, data mapping, SAP configuration, or incorrect loading of fields.
Note
Data reconciliation should check for the 5Cs: completeness, comprehensibility, correctness, consistency, and compliance with business rules.
Establish Gate Criteria for Each Conversion Cycle
The PMO, release manager, deployment manager, testing lead, and conversion lead should determine the gate criteria before kicking off conversion cycles. During your testing phase, each project establishes gate criteria. A gate criterion is the cutoff point where you declare the testing cycle is a success. The usual gate criteria for data conversion are 75%, 90%, and almost 100% for cycles 1, 2, and 3, respectively. Typically, on my projects, my team and I target 85% for cycle 1, 95% for cycle 2, and ~100% for cycle 3. Even though on paper it shows as three mock cycles, each cycle involves iterative data loads — the data keeps loading until the gate criterion is achieved. It may take a few mock cycles for some conversion objects.
Obtain Sign-Off from the Business Data Owner
For each mock cycle, the business owner for the conversion should sign off on it to show that the data is reconciled and verified and that the correct data was loaded into the SAP system. This sign-off also shows that the business owner verifies that the data in the legacy system was successfully migrated. In addition to manual or automated reconciliation, the business owner may run a few reports in the legacy and SAP system to make sure the data makes sense. The key roles for this part of the process include the business owner, SAP data conversion functional owner, legacy functional/technical team members, performance leads, and a Basis team member.
Now that I have defined the data conversion process, in the next section I describe the technical considerations for data conversions.
Data Classification — Configuration, Master Data, and Transactional Data
Sequence the data into three distinct groups, with a logical sub-grouping of data conversion loads within each group. This aids planning, optimizes the use of system processors, and reduces the end-to-end conversion cycle time.
The first group consists of configuration and master data, which is the basis for all the other data loads. Configuration data such as the company code, plants, sales organizations, personnel area/sub-area, and employee groups/sub-groups should be configured in the system. By adding the configuration data, you are laying the basic foundation and structure in your SAP system.
Next follows the master data loads, including cost centers, customers, vendors, materials, and employees. Master data is the basis for transactional data. As an example, sales quotes or orders cannot be created unless customer master data is loaded. Similarly, purchase orders cannot be created unless vendor master data is loaded.
Master data volumes are very high, and you may want to bring in all the data from the legacy system. As an example, a customer may not have done business with you in recent times, but you do not want to exclude this data; by excluding it, you are missing important customer information. When it comes to customer and vendor data, you may want to consider loading all the data, because master data is never historic data.
Next, load the transactional data such as opportunities, quotes, sales orders, contracts, purchase orders, payroll data, and time data. These volumes are usually huge and these loads depend on configuration and master data. You may want to restrict as much historical data as possible. Any data more than three to five years old should be archived, unless you have a requirement stating otherwise.
File Extraction from the Legacy System
Files should be cut (extracted) well in advance of loading. This allows you to validate the files and, if needed, gives you time to re-extract the data from the legacy system. The conversion process typically has a validation step before loading so you can catch any errors in the data extract before loading the data into SAP ERP. In addition, the validation step ensures that valuable system resources are not tied up during the actual load process, which reduces the time for the conversion load. Before you extract data from the legacy system, your cutover deployment plan should address these requirements. For a go-live, this is very important because the files should be extracted immediately after the legacy systems are locked. When the SAP cutover begins with migrating transports, the legacy team should extract the data, and the validation team should validate and sign off on the extracts.
There should be a central secure location for files on your share drive or SharePoint. All success reports, error files, and conversion forms should be saved in this location. This is useful for your audit and compliance teams.
Split Large Data Loads
Due to huge volumes, some conversions take a very long time to load. It may take days for an SAP General Ledger load or an HR organizational data load (infotype 1001). Some conversions, such as customer master, vendor master, and material loads, can be very time consuming. Because the cutover time available is short, data conversions involving large volumes should be split into multiple manageable loads. Your technical team members can schedule 10 to 15 jobs concurrently via background processes. Assuming that you plan to run 10 concurrent processes, the load time is reduced almost tenfold.
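The split itself is mechanical: cut one huge load file into fixed-size chunks that can each be scheduled as a parallel background job. A minimal sketch, where the chunk size would in practice be tuned to the available application-server work processes:

```python
# Split one large load into fixed-size chunks for parallel batch jobs.
def split_load(records, chunk_size):
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]

records = list(range(95_000))
chunks = split_load(records, chunk_size=10_000)
assert len(chunks) == 10                            # 10 parallel jobs
assert sum(len(c) for c in chunks) == len(records)  # nothing lost
```

The second assertion is the important one operationally: after any split, verify that the chunk record counts sum back to the original file count before scheduling the jobs.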
Error Handling and Reload Capabilities
Data conversion programs should be designed to handle errors and should be capable of reloading in the event of program failure. Always add a validation routine in your program. This helps you figure out any issue and allows you to fix the legacy file or configurations as needed. Validation routines also reduce actual load time as you can eliminate validations during load.
Figure 8 shows how you can validate the file in transaction SE38. In this custom program, the developer used CALL TRANSACTION and batch input sessions to load data into the SAP system.
Figure 8
A typical conversion program selection screen with validation and load options
Plan to load data via the SAP background processor. This way you have the logs and the exact duration of the run. Always capture errors in a file or in a session so you can reload the file or session as needed after fixing the configuration. For example, while loading vendor data, one of the tax types was missed in the configuration. As a result, around 7% of the records failed. I captured this in the Batch Data Communication (BDC) error session via transaction SM35 (Figure 9). After the SAP configuration for the tax type is fixed, the BDC session can be re-processed.
Figure 9
To process a session, choose the line item and click the Process button
Generate statistics after the load, such as total records in the legacy file, successfully loaded records, and failed records (Figure 10). Generate a success record file and an error record file.
Figure 10
After the loading is completed, the custom program should generate a report
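The post-load report might compute something like the following: totals, success and failure counts, and the load rate that is measured against the cycle gate criteria. The field names are illustrative:

```python
# Sketch of the post-load statistics the conversion program should emit.
def load_report(total, loaded):
    failed = total - loaded
    rate = round(100.0 * loaded / total, 1) if total else 0.0
    return {"total": total, "loaded": loaded,
            "failed": failed, "load_rate_pct": rate}

rpt = load_report(total=12_500, loaded=11_625)
assert rpt["failed"] == 875 and rpt["load_rate_pct"] == 93.0
```

A 93% load rate, for example, would pass a cycle 1 gate of 85% but fail the 95% cycle 2 gate, so the failed-record file would feed the next round of cleansing.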
Monitor System Performance and Application Server Management
During loads, your Basis team and performance team should monitor the load performance using the following transactions:
- SM66: System-wide work process overview
- SM50: Work process overview (gives your Basis team a good idea about how the loads are going). This transaction lists the processor use. Your Basis team members should be monitoring this transaction during the data conversion load to determine how your data conversions are progressing (Figure 11).
- SM37: Job overview. This transaction displays the progress of your conversion jobs and its status (Figure 12). The screen displays the batch job details such as start date and time and the time it took to run the jobs.
- WE02 (Display IDoc) and WE06 (Active IDoc monitoring): Monitor your IDoc loads (Figure 13)
Figure 11
Transaction SM50: Work process overview
Figure 12
Transaction SM37: Overview of job selection
Figure 13
Transaction WE02: Display IDoc
Schedule Conversion Batch Jobs
All conversions should be scheduled via background jobs. The advantage of using batch jobs is that logs are saved and can be retrieved in the SAP system. You can use transaction SM36 (schedule background job) as shown in Figure 14. Click the Job wizard button on the main screen (not shown) to specify jobs on specific servers. Your technical team members should monitor jobs regularly. They can check job details by going to transaction SM37 (background job monitoring).
Figure 14
Transaction SM36: Schedule background job
Checklist for Success
To execute a successful data conversion strategy, I suggest you capture and track tasks in your conversion project plan. The list in Figure 15 summarizes the conversion tasks and can serve as a checklist for your implementation. You can download a copy of this checklist at the bottom of the article.
Figure 15
Example conversion project plan checklist
In addition to the checklist here are a few tips based on my experience to make your strategy work:
- Team members should be available onsite and connected to the company LAN network. The data conversion should always be loaded onsite and should be monitored by your Basis and performance teams.
- Data conversions consume as much as 30% to 50% of your cutover time
- Encourage all stakeholders to raise any concerns they have during the conversion process. This helps you avoid issues after go-live in the production system.
- Communicate the project progress to all stakeholders by using progress reports, status reports, and project time lines
Remember to follow four simple rules: clean, validate, design, and reconcile. Clean the data in your legacy system before extracting; do not load any data that does not add value to your business. Validate the data in the legacy data extracts to make sure mappings and translations are accurate. Design data conversion programs with flexible options: they should be able to validate data before loading, create error reports, and generate load statistics. Finally, have the business data owners reconcile the data, comparing existing legacy reports with SAP reports, and make sure the end-to-end process works as required.
As a final note, your team is the key to a successful data conversion strategy. Team members commit long hours and bring in their best to make it successful. Make a point to recognize teams and individuals who go above and beyond. Recognition motivates teams to bring out the best in them.
Srini Munagavalasa
Srinivasa (Srini) Munagavalasa has 14 years of experience in various SAP modules. Srini has worked on multiple SAP global implementations at major clients. He has experience as a project manager, deployment lead, build manager, and technical development manager.
If you have comments about this article or publication, or would like to submit an article idea, please contact the editor.