Using Information Lifecycle Management (ILM) and Nearline Storage (NLS) techniques enables organizations with SAP NetWeaver BW implementations to improve warehouse performance while considerably reducing database administration costs. In addition, using ILM with NLS improves your ability to manage and satisfy service level agreements. Discover the important aspects of ILM and garner best practices for using ILM with NLS.
Key Concept
Information Lifecycle Management (ILM) is the set of policies, processes, practices, and tools that you can use to ensure that specific data at a specific point in its value life cycle is assigned to the most appropriate and cost-effective IT infrastructure. Any organization, whether it is just starting with an SAP NetWeaver BW implementation or has already installed one, can take advantage of the benefits of ILM. It is best to introduce ILM as early as possible in the process because it is much easier to prevent the problems associated with database growth than to fix them after they have occurred. If you have a large existing system, you can implement ILM in phases to start realizing benefits earlier and with as little effect as possible on the overall system.
It is generally accepted that the value of the information stored in a data warehouse changes over time. As the volume of data produced by enterprises continues to grow, this recognition has led to the development of the concept of Information Lifecycle Management (ILM).
For users of SAP NetWeaver BW 7.0, the Nearline Storage (NLS) interface is a key component for implementing an ILM strategy. In the SAP world, NLS is a new category of data persistency that is similar to archiving in that its overall purpose is to take read-only data out of the database system and move it onto less expensive devices (normally file systems). The crucial difference between nearline data and archived data lies in the degree of data accessibility offered. Unlike archived data, nearline data can be seamlessly accessed at speeds comparable to those involved in accessing data in the main online database.
NLS functionality is fully integrated into SAP NetWeaver BW 7.0 and can be managed using the standard Data Warehousing Workbench (transaction RSA1) and process chains. A tiered architectural approach such as the Layered Scalable Architecture (LSA) allows existing relational database management system (RDBMS) functionality to be integrated with the massively parallel performance capabilities of both the NLS and SAP NetWeaver BW Accelerator components.
In this article, I discuss the rationale for implementing an ILM strategy and present a number of recommended best practices intended to ensure a quick, successful ILM/NLS implementation in SAP NetWeaver BW.
Why Implement ILM?
A number of possible factors might prompt an organization to proceed with an ILM implementation. There may be an obligation to implement ILM to comply with legal regulations for data retention, or an organization may choose ILM because of anticipated benefits in the areas of resource usage, system availability, risk management, system performance, and support for analytics. The benefits that ILM can provide in each of these areas are described in greater detail below.
Reduced Resource Usage
ILM enables a reduction in hardware expenditures (on disks, CPUs, and memory), as well as the time requirements and costs associated with management of the online database system. When the substantial footprint of warehouse data is factored in, the savings can be considerable. Typically, 1 GB of raw data ends up requiring 3 GB of database space after indexes and summary tables have been built. When a RAID 1 solution (mirrored disks) is in place, this 3 GB of database space grows to 6 GB.
Furthermore, that data is then copied into multiple systems: development, test and QA, pre-production, backup and disaster recovery, and so on. Ultimately, the initial 1 GB of data could actually occupy as much as 30 GB, with a corresponding increase in the time required to perform standard system maintenance functions such as backup, database re-organization, re-indexing, and re-calculation of aggregates.
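To make the arithmetic concrete, here is a minimal sketch (in Python, for illustration only) that walks through the same multipliers; the factors and the number of landscape copies are assumptions taken from the figures above, not measurements of any particular system.

```python
# Back-of-the-envelope estimate of the total footprint of 1 GB of raw
# warehouse data, using the illustrative multipliers from the text.
# All factors are assumptions for this sketch, not measured values.

raw_gb = 1.0

db_factor = 3.0       # indexes and summary tables roughly triple raw data
raid1_factor = 2.0    # RAID 1 mirroring doubles the physical storage
landscape_copies = 5  # e.g., dev, test/QA, pre-production, backup, DR

online_gb = raw_gb * db_factor             # 3 GB of database space
mirrored_gb = online_gb * raid1_factor     # 6 GB on mirrored disks
total_gb = mirrored_gb * landscape_copies  # ~30 GB across the landscape

print(f"Online database space : {online_gb:.0f} GB")
print(f"With RAID 1 mirroring : {mirrored_gb:.0f} GB")
print(f"Across the landscape  : {total_gb:.0f} GB")
```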
Reducing the database size by implementing an ILM strategy means that fewer resources are required to manage that database, and it significantly reduces the amount of data processing that needs to be performed within limited batch windows. This can make it much easier to meet and manage Service Level Agreements (SLAs), and ultimately results in reduced demand for expensive DBA resources.
Increased System Availability and Better Risk Management
As suggested in the previous section, implementing an ILM strategy can reduce the overall time spent refreshing the SAP NetWeaver BW database (including backup, table/index reorganization, and summary table management), so that available batch windows continue to be respected in the face of growing data volumes. Moving data out of the main SAP NetWeaver BW database can also decrease the time required for software upgrades or database conversions, such as when implementing Unicode support. Furthermore, ILM can help reduce the risk of lost revenue associated with disaster recovery. Recovering a 1 TB database is roughly 10 times faster than recovering a 10 TB one, and because the cost of an hour of unplanned downtime may range from $300,000 for a business application to over $2,000,000 for a financial/trading organization, faster recovery alone can be a compelling reason to proceed quickly with an ILM implementation as part of a prudent risk management strategy.
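As a rough illustration, the sketch below estimates recovery time and downtime cost for databases of different sizes; the restore throughput and the hourly cost figure are assumptions chosen only to make the trade-off concrete.

```python
# Rough illustration of how database size drives recovery time and the
# potential cost of unplanned downtime. The restore rate and hourly cost
# are assumptions used only to make the arithmetic concrete.

restore_rate_tb_per_hour = 1.0      # assumed restore throughput
downtime_cost_per_hour = 300_000    # low end of the range cited in the text

for size_tb in (1, 10):
    hours = size_tb / restore_rate_tb_per_hour
    cost = hours * downtime_cost_per_hour
    print(f"{size_tb:>2} TB database: ~{hours:.0f} h to recover, "
          f"~${cost:,.0f} in downtime cost")
```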
Improved System Performance
You can implement ILM strategically to improve overall system performance, with a direct effect on the productivity of business users. A smaller database usually offers faster response times, along with easier database performance tuning and modeling — you can create more indexes and full table scans are faster to execute. All this can be achieved without sacrificing the ability to analyze long-term trends.
More Data Available for Analysis
For any number of reasons, an organization may be required to integrate and provide access to more data, and to retain it for longer periods. If no ILM strategy is in place, it may be very difficult technically to maintain existing SLAs in the face of this organic data growth. In this respect, the need to deal with organic data growth may itself be the primary justification for an ILM implementation.
Best Practices for ILM
The SAP NetWeaver BW NLS component was introduced in the initial SAP NetWeaver BW 7.0 release. A number of enterprises are already taking advantage of this feature, and their experiences have led to the identification of the following best practices for an ILM implementation.
1. Don’t wait to encounter database size issues before thinking about introducing an ILM strategy. Start early with a preventative approach rather than undertaking a more complex and expensive cure later on.
An ILM implementation has two main phases:
- The initial ILM data migration process
- The ILM steady state
The steady state is attained at the point when only the required data is kept online, which means that all static data has been moved to the NLS solution. During the initial ILM data migration process, an enterprise may have to migrate a large amount of data to catch up, which can have a major effect on the costs and time requirements of this operation. This also puts additional pressure on already constricted batch windows, thereby increasing the amount of time required to reach the steady state.
Frequently, ILM implementation projects do not allocate sufficient time for initial data migration, mainly because the amount of time required for data deletion and database reorganization is underestimated — this can actually represent up to 70% of the migration process. The longer the enterprise waits before implementing ILM, the more time is required to reach the steady state when pressure on the batch windows is eased considerably. So, the quicker and more aggressive the ILM implementation is, the sooner real benefits materialize.
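The following back-of-the-envelope sketch illustrates why the catch-up phase can stretch out when it is confined to a nightly batch window; the backlog size, copy throughput, window length, and the 70% share for deletion and reorganization are all illustrative assumptions.

```python
# Simple estimate of how long the initial catch-up migration might take
# within a limited nightly batch window. Backlog size, throughput, and the
# 70% share for deletion/reorganization are illustrative assumptions.

backlog_tb = 8.0                 # static data waiting to be nearlined
copy_rate_gb_per_hour = 100.0    # assumed copy throughput to NLS
delete_reorg_share = 0.70        # deletion/reorg dominates the effort
batch_window_hours = 4.0         # hours available per night for ILM work

copy_hours = backlog_tb * 1024 / copy_rate_gb_per_hour
# If copy/verification is only ~30% of the effort, total effort scales up:
total_hours = copy_hours / (1.0 - delete_reorg_share)
nights_to_steady_state = total_hours / batch_window_hours

print(f"Copy effort            : ~{copy_hours:.0f} h")
print(f"Total effort           : ~{total_hours:.0f} h (incl. delete/reorg)")
print(f"Nights to steady state : ~{nights_to_steady_state:.0f}")
```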
2. Nearline the SAP BW 3.x system before migrating to SAP NetWeaver BW 7.0 and converting to Unicode. An organization looking to upgrade to SAP NetWeaver BW 7.0 and convert to Unicode at the same time should take this opportunity to implement ILM with SAP BW 3.x, before proceeding with the technical upgrade. This reduces the size of the database, making the upgrade faster and easier to execute.
3. Incorporate NLS into the data modeling phase. The ILM solution is simpler and more powerful when the database is designed with NLS in mind. Database objects should be structured and architected with the aim of facilitating data migration from the online database to the NLS solution. For example, the migration rule can be mapped to a table partition structure. If the table partition is based on the same structure as the data selection criteria used for the data migration (a time slice, for example), it can deliver better throughput and greatly facilitate the data management process. There is a considerable reduction in requirements for table and index reorganization in the underlying database, and much less impact on operational batch windows.
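As a simple illustration of this alignment, the sketch below selects whole monthly partitions whose time slice falls before a retention cutoff. The partition naming and the cutoff are hypothetical, and in a real system the selection would be expressed in the data archiving process rather than in code like this.

```python
# Minimal sketch of a time-slice migration rule that lines up with a
# monthly partitioning scheme. Partition names and the retention cutoff
# are hypothetical examples.

from datetime import date

# Hypothetical monthly partitions of an InfoProvider, keyed by period.
partitions = ["2008-01", "2008-02", "2008-03", "2009-01", "2009-02"]

def partitions_to_nearline(partitions, cutoff):
    """Return partitions whose period closes before the cutoff month.

    Because the selection criterion (a closed time slice) matches the
    partitioning scheme, each migration drops whole partitions instead
    of forcing row-by-row deletes and index reorganization.
    """
    selected = []
    for period in partitions:
        year, month = (int(p) for p in period.split("-"))
        # The whole period ends before the cutoff month -> data is static.
        if date(year, month, 1) < cutoff.replace(day=1):
            selected.append(period)
    return selected

print(partitions_to_nearline(partitions, cutoff=date(2009, 1, 1)))
# -> ['2008-01', '2008-02', '2008-03']
```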
4. Identify the data objects that should be addressed first to achieve quick wins. Undertaking an ILM implementation requires a good understanding of the attributes of various SAP objects such as Persistent Staging Areas (PSAs), DataStore objects (DSOs), and InfoCubes. Size, growth rate, the nature of the data (static or non-static) and its business lifespan, and retention requirements are attributes that you must identify to define effective migration rules for time-sliced data.
The implementation should start with the largest object that will most quickly bring the greatest benefit when moved to NLS. Specialists can use transaction DB02 (the tables and indexes monitor), along with other available tools, to identify the best starting point. Alternatively, some companies offer professional services focused on developing plans of action and identifying the specific benefits of implementing ILM in a given situation.
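A simple way to think about the ranking is sketched below; the object names, sizes, and static-data shares are invented for illustration, and in practice the figures would come from DB02 and each object's growth statistics.

```python
# Sketch for ranking nearlining candidates by expected payoff.
# Object names, sizes, and static shares are made up for illustration.

candidates = [
    # (name, size in GB, fraction of data that is static / no longer updated)
    ("PSA_SALES",       900, 0.95),
    ("DSO_BILLING_WO",  600, 0.90),
    ("CUBE_SALES_HIST", 400, 0.60),
    ("DSO_OPEN_ORDERS", 150, 0.10),
]

def payoff(obj):
    """Rough payoff: how much space could move to NLS right away."""
    _, size_gb, static_share = obj
    return size_gb * static_share

for name, size_gb, static_share in sorted(candidates, key=payoff, reverse=True):
    print(f"{name:<16} {size_gb:>5} GB  ~{size_gb * static_share:>5.0f} GB movable")
```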
5. Use the Data Acquisition Layer as a source for NLS. The data objects associated with the data acquisition layer (PSAs and write-optimized DSOs) are typically large and static by nature, making them ideal initial candidates for nearlining. When stored in the NLS solution, you can use these objects as direct sources for data transfer process (DTP) and extraction, transformation, and loading (ETL) processes, and for much of the atomic-level reporting or extraction normally involved in more detailed trend analysis.
Some may believe that it is better to archive this data (that is, move it offline), but this raises additional issues relating to security and accessibility: How certain is it that the current ETL process is appropriate, and that it covers all the data that may be required, now and into the future? Also, the restore process for archive data is costly and has a massive impact on users of the system when it is activated. If the data did not perform well in the online database, the situation will be worse when it is archived, and if it needs to be fetched back into the online database, the negative effect on performance can be substantial.
Instead, the data should be migrated to the NLS solution as soon as possible, and archived only when it is at the end of its life cycle — that is, when it is no longer required. The nearline data is an excellent data source for supporting regulatory compliance because it is possible to ensure that no transformation or update processes have been applied to it.
6. Incorporate more data using NLS. It is possible to incorporate more data into SAP NetWeaver BW by means of an aggressive ILM strategy applied to the data acquisition layer. A new data source can be directly mapped to a write-optimized object, so that as soon as the data has been created in SAP ERP, it is migrated directly to the nearline component. After it is defined and activated, this SAP object becomes available as a new data source supporting new analytic capabilities. For example, information from external data sources such as PeopleSoft, Oracle Financial, or other systems can be readily added in this way.
7. Store the Data Staging Layer in NLS. The data staging layer represents the detail data associated with an InfoCube. Usually, this data has been transformed and prepared as part of the InfoCube construction process. Data stored in the data staging layer is an excellent candidate for NLS because you can use it to feed DTP processes as well as to support drill-to-detail capability.
8. Move infrequently used InfoCubes to the nearline repository to improve reporting layer performance. The reporting layer consists of the data that is most often used for reporting. This is the data that is normally targeted for migration to the SAP NetWeaver BW Accelerator to provide accelerated performance. Applying the ILM strategy to InfoCubes, by moving less frequently used data to the nearline solution, has a direct, positive effect on the size of data in SAP NetWeaver BW Accelerator and on reporting performance against this data. This, in turn, lowers the cost both of implementation and ongoing operations.
Note
For more information about SAP NetWeaver BW Accelerator, refer to the related BI Expert articles.
9. Design the SAP NetWeaver BW Accelerator implementation to incorporate NLS functionality. Though the NLS solution does not interact directly with SAP NetWeaver BW Accelerator, you can use NLS with this component to improve the overall performance and manageability of the SAP NetWeaver BW environment. InfoCubes should be designed with the NLS implementation in mind. This means that InfoCubes should be partitioned in accordance with the migration rule. With this strategy, you can reduce the amount of reorganization required at the SAP NetWeaver BW Accelerator level after online data has been migrated to nearline because you will need to re-index fewer InfoCubes. For a given request, SAP NetWeaver BW automatically and transparently accesses both SAP NetWeaver BW Accelerator and NLS to obtain access to all the data required by the request.
10. Consider the effect that NLS data archive process (DAP) package size will have on operations. The SAP NetWeaver BW NLS application programming interface (API) is controlled by a DAP that defines the source data to be migrated, the migration rules, and the parameters controlling the overall process. Exercise caution when defining the granularity of the DAP package that represents the data slice to be nearlined. It is possible to define a very large package or to partition the data into smaller packages. The important thing to understand is that the package size is the minimum granularity of any operation. During a restore process, the full DAP package is always restored, because it is not possible to perform a partial restore. Multiple smaller DAP packages can be processed in parallel, enabling better operational performance than a single large package. Finally, the granularity of recovery operations is tied to the granularity of the DAP package: the smaller the package, the more savepoints are introduced into the process.
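The sketch below illustrates this trade-off; the package sizes and the amount of data requested back are assumptions, and the point is simply that restores always bring back whole packages, while smaller packages open the door to parallel processing.

```python
# Small sketch contrasting one large DAP package with several smaller ones.
# Sizes and the requested volume are assumptions; a restore always brings
# back whole packages, and smaller packages can be processed in parallel.

def restore_cost_gb(package_sizes_gb, needed_gb):
    """GB actually restored to satisfy a request for `needed_gb` of data,
    assuming the needed rows sit in the fewest possible packages."""
    restored, remaining = 0.0, needed_gb
    for size in sorted(package_sizes_gb):
        if remaining <= 0:
            break
        restored += size          # a package can only be restored in full
        remaining -= size
    return restored

single_large = [1200.0]           # one package covering a full year
monthly_packages = [100.0] * 12   # twelve monthly packages

for layout, packages in [("single package", single_large),
                         ("monthly packages", monthly_packages)]:
    print(f"{layout:<17}: restore {restore_cost_gb(packages, 80.0):.0f} GB "
          f"to get 80 GB back")
```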
11. Reduce pressure on batch windows by splitting the DAP process into operational steps. A DAP process is divided into three main phases: copy, verification, and delete. In the copy phase, the target data is locked and copied from the online system database to nearline storage. In the verification phase, the integrity of the data that has been moved to nearline is validated. The delete phase involves the actual deletion of the source data and activation of the nearline data for requests and DTPs. Before the delete phase, the nearline data cannot be directly accessed (to ensure that data is not counted more than once).
These three phases can be executed in sequence or decoupled. The recommended practice is to split them into two steps: the first comprising the copy and verification phases, and the second consisting of the delete phase. The delete phase can be costly from an operational point of view because it involves deleting data, reorganizing table spaces and indexes, or rebuilding summary tables.
In contrast, you can execute the copy/verification phase during normal batch windows or as a daily operation. You can perform a backup immediately before the delete phase, which can be scheduled for a weekend. You can schedule all these processes by using a process chain. The costs of the delete phase can be reduced by targeting data associated with a specific table partition for deletion. This approach improves the ability to meet SLAs, removes pressure on batch windows, and secures the overall ILM process.
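The two-step scheduling can be pictured as follows; the function names are placeholders standing in for the corresponding process chain steps, not real BW API calls.

```python
# Conceptual sketch of decoupling the DAP phases into two scheduled steps.
# The functions are placeholders for the corresponding process chain steps;
# they do not represent actual SAP NetWeaver BW APIs.

import datetime

def copy_and_verify(dap_request):
    """Step 1 (nightly batch window): copy data to NLS and verify integrity.
    The nearline copy is not yet visible to queries at this point."""
    print(f"{datetime.date.today()}: copied and verified {dap_request}")

def backup_then_delete(dap_request):
    """Step 2 (weekend): back up, then delete the source data and activate
    the nearline data for requests and DTPs, freeing online database space."""
    print(f"{datetime.date.today()}: backed up, deleted source, "
          f"activated nearline data for {dap_request}")

# Nightly: step 1 only.
copy_and_verify("DAP request 2008-Q4")

# Weekend: step 2 completes the migration.
backup_then_delete("DAP request 2008-Q4")
```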
12. Use NLS to enable data governance and e-discovery. Implementing a nearline solution in conjunction with a data governance storage solution (such as Open Text, EMC Centera, or any eXtensible Access Method (XAM)-compliant storage provider) is an ideal way for an enterprise to enable data governance as well as electronic discovery for legal purposes. Data stored using those solutions can be queried directly via the nearline solution.
Richard Grondin
Richard Grondin is vice president of research and development for SAND Technology. He has been involved in the design and implementation of large data warehouse solutions since 1996, and currently leads the development and direction of the nearline and corporate information memory solutions at SAND. He has worked closely with SAP on the SAP NetWeaver BW nearline solution since 2004.
You may contact the author at richard.grondin@sand.com.