The Analysis Process Designer (APD) workbench, introduced in BW 3.1 Content (BW 3.0B SP6), allows users to combine numerous transformations into a single data flow. It offers a less technical approach to enhancing subject-oriented, non-volatile data that has already been integrated, cleansed, and transformed in the data warehouse. The author examines current APD features and outlines future plans for integrating it with other data mining functionality in the upcoming BW 3.5 release set for later this year.

As data volumes continue to grow, more and more decisions are based on the intelligence gleaned from the data warehouse using various approaches to data mining. Data mining is loosely defined as any machine-based, algorithmic discovery of hidden information in massive volumes of data. Practically speaking, this means leveraging information repositories for nuggets of information that are not readily obvious from the manual inspection of raw data — even with advanced OLAP and reporting tools. Since the launch of SAP BW 2.1C SP9, various data mining algorithms — or modeling methods — have been available to SAP BW users including:

Decision trees
Scoring (including regression)
Clustering
Association analysis

SAP has added more models and methods since their initial introduction. One of the most valuable features came with the release of SAP BW 3.1 Content (a.k.a. SAP BW 3.0B SP7) via the introduction of the Analysis Process Designer (APD). Using transaction RSANWB, the APD workbench provides users with a visual, drag-and-drop interface for developing and deploying advanced data transformations. SAP BW 3.5 goes one step further by combining the APD and data mining processes. It facilitates new data transformation capabilities by employing trained data mining models in APD data flows. I will explain this integration along with other data mining enhancements in the upcoming SAP BW 3.5 release slated for later this year. The sidebar “How Data Mining in SAP BW Works Today” provides an overview of current data mining capabilities in SAP BW.

Using the APD

Since the introduction of SAP BW 3.1 Content, the APD has provided an enhanced data transformation workbench. It is able to combine numerous transformations into a single data flow. The APD offers a less technical approach to enhancing subject-oriented, non-volatile data that has already been integrated, cleansed, and transformed in the data warehouse. It is not intended, however, to be a replacement for traditional ETL processes.

The APD offers plenty of modeling flexibility. A variety of data sources are available to the APD, including data from an SAP BW query, flat files, InfoObject master data, or an SAP BW InfoProvider (InfoCube, ODS object, or MultiProvider). It supports various data transformations such as outer joins, filtering, aggregation, sorting, regression, and more. The supported data targets for SAP BW 3.1 Content include transactional ODS objects, other SAP systems, master data, and SAP BW survey target groups, which are custom lists that can be reused by the Web Application server.

The diagram presented in Figure 1 shows a basic APD data flow. Starting on the left, you see two data sources: an ODS and an InfoCube. Using grouping transformations, the data from each source is then aggregated to assist later data transformation processes by reducing the number of records. Both data streams are then filtered based on specific parameters of interest in the analysis.

Figure 1

The Analysis Process Designer (APD) in an SAP BW 3.1 Content system

Although it is not explicit in the diagram, in this example the ODS contains statistics for all known customers. The data from the InfoCube shows profitability for customers who have purchased from a company. The join transformation determines those customers in the ODS who have not made any purchases. Based on this joined data, the customers with no sales are identified and written to the Churn ODS where the data is available for further analysis. This is just an example and doesn’t represent any sort of fixed analysis. The icons in the diagram can be combined at the designer’s discretion.
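For readers who think in code, the logic of this example flow (aggregate, filter, join, write to a target) can be sketched in a few lines of Python. This is only an illustration of the data flow, not APD or SAP code. The sample records, the field names, the region filter, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two APD data sources.
customer_ods = [
    {"customer": "C1", "region": "EMEA"},
    {"customer": "C2", "region": "APAC"},
    {"customer": "C3", "region": "EMEA"},
    {"customer": "C4", "region": "EMEA"},
]
sales_cube = [
    {"customer": "C1", "period": "2004-01", "profit": 120.0},
    {"customer": "C1", "period": "2004-02", "profit": 80.0},
    {"customer": "C2", "period": "2004-01", "profit": 45.0},
    {"customer": "C4", "period": "2004-02", "profit": 10.0},
]

def aggregate_profit(rows):
    """Grouping step: reduce the records to one total per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["profit"]
    return dict(totals)

def filter_region(rows, region):
    """Filter step: keep only the rows of interest for the analysis."""
    return [row for row in rows if row["region"] == region]

def customers_without_sales(customers, profit_by_customer):
    """Join step: keep customers that have no matching sales record."""
    return [c for c in customers if c["customer"] not in profit_by_customer]

profit = aggregate_profit(sales_cube)
emea_customers = filter_region(customer_ods, "EMEA")
churn_ods = customers_without_sales(emea_customers, profit)
print(churn_ods)
```

Each function plays the role of one icon in the diagram, and the final list plays the role of the Churn ODS.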

SAP BW 3.5 Weds Data Mining and the APD

Prior to SAP BW 3.5, the APD and SAP BW data mining functionality were two different processes. With the launch of SAP BW 3.5 later this year, the two will be combined, allowing transformation, data mining modeling, and data mining model training to be done directly in the APD. Of course, data mining is not required to use the APD in the new release. As in SAP BW 3.1 Content, you can include advanced transformation of data without a named data mining method in your data flow. Data mining modelers using the APD in SAP BW 3.5 will enjoy benefits such as:

A single point of access via RSANWB for the APD and data mining modeling
A wider variety of data sources feeding data mining models directly
Direct pre- and post-processing of APD transformations
Integration of APD models into the SAP BW metadata repository
Access to intermediate results that can be calculated and cached to increase performance on an ad hoc or scheduled basis

Data mining model outputs are considered transformations in SAP BW 3.5, and several methods can be used as transformations once the model is trained. Transformations in SAP BW 3.5 include weighted score tables, linear and non-linear regressions, decision trees, segmentation, and planned external access to trained data mining models in SAP BW. Various mining methods are also available as data targets and are typically used during the training process. SAP BW 3.5 also enhances the statistics available on each node within an APD diagram. Statistics can be called up online via context menus and consulted during model development or refinement. This functionality allows model developers and power users to more finely tune model output.

Data Mining Model Training in SAP BW 3.5

Support for model training is provided by the APD in SAP BW 3.5, and most of the data mining model methods exist in the data targets section. An initial APD data flow can be set up to train a model, then a single, trained data mining model can be used as a transformation in other APD data flows. Data mining model developments in the SAP BW 3.5 APD data flow process eliminate the need for the model creation wizard supported in prior releases. Model data flow is done directly in the APD diagram, and model parameters are set as properties of the model icon in the diagram. Models created in versions prior to SAP BW 3.5 are available in the APD for data flows after the upgrade process is complete. Care must be taken, however, to use the APD to map (or remap) the same fields from data sources into the existing models.
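The pattern of training a model in one data flow and then reusing it as a transformation in another can be illustrated with a minimal Python sketch. It is not how the APD is implemented. A one-variable ordinary least squares fit stands in for the linear regression method, and the function names, field names, and sample values are assumptions made for illustration only.

```python
def train_linear_model(xs, ys):
    """Training flow: fit y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def apply_as_transformation(model, rows, source_field, target_field):
    """Scoring flow: add the predicted value to every record."""
    return [
        dict(row, **{target_field: model["intercept"] + model["slope"] * row[source_field]})
        for row in rows
    ]

# First data flow: train once on (hypothetical) historical data.
history_visits = [1, 2, 3, 4]
history_sales = [3.0, 5.0, 7.0, 9.0]
trained_model = train_linear_model(history_visits, history_sales)

# Second data flow: reuse the trained model as a transformation.
# The field mapping ("visits") must match the field used in training.
new_rows = [{"customer": "C9", "visits": 5}]
scored = apply_as_transformation(trained_model, new_rows, "visits", "forecast")
print(scored)
```

The point about remapping fields after an upgrade shows up here as well: the scoring flow only works if the source field it reads corresponds to the field the model was trained on.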

Note: SAP BW 3.5 supports the Data Mining Workbench (RSDMWB), and models can be manipulated using it. Users, however, are encouraged to access mining models from the APD.

“Knowledge Discovery” in SAP BW 3.5

As a final note, I’d like to point out another valuable aspect of SAP BW 3.5. A new discipline of sorts is getting a lot of attention these days in the data mining world and it has been addressed in SAP BW 3.5. Known collectively as Knowledge Discovery in Databases (KDD), it covers the preprocessing, post-processing, and information delivery of data mining operations along with specifics of data mining. While there are various interpretations of KDD, in general the field is made up of five processing steps, which are represented at the top of Figure 2. Note in the lower part of the figure that the APD and data mining integration in SAP BW 3.5 supports KDD tasks. Specifically, the data mining processes are handled as transformations with possible pre- and post-processing steps to accommodate the data flow of raw data into and processed data out of the model.

Figure 2

Knowledge Discovery in Databases (KDD) processes mapped to SAP BW APD functionality

As with any well-defined process, a certain amount of non-technical planning (labeled “task analysis” in Figure 2) should guide your entire effort so that a solid business requirement is addressed. The APD functionality in SAP BW 3.5 then allows the data to be sourced, transformed, and deployed following KDD guidelines. Data deployment can be handled by either batch processes (persisting the data for further processing) or online on an ad hoc basis. Ultimately, SAP BW offers the flexibility of incorporating data mining capabilities into robust data transformations that allow end users and/or dependent applications further insight into the data than might otherwise be available only via traditional methods.

How Data Mining in SAP BW Works Today

SAP BW supports data mining models that are not limited to any particular industry or business function and delivers model-specified output. Data mining functionality found in SAP BW releases up to and including SAP BW 3.1 Content is accessed via the Data Mining Workbench using transaction RSDMWB (Figure 1).

Figure 1

Transaction RSDMWB provides access to the Data Mining Workbench

Data Mining Method	Algorithm	Description	Predicted Field Type	Example Business Scenario	Visualization
Decision Tree	C4.5	Uses the values of input variables to predict the value of a categorical variable	Discrete	Fraud detection	Web AD SAPGUI External targets
Non-Linear Regression	Modified Least Squares	Attempts to explain the relationship between sets of variables assuming a non-linear model	Continuous	Sales forecasting	SAPGUI External targets
Linear Regression	Least Squares	Attempts to explain the relationship between sets of variables assuming a linear model	Continuous	Response rate forecasting	SAPGUI External targets
Weighted Score Tables	N/A	Creates an overall score by taking different dimensions into account	N/A	Customer scoring	SAPGUI External targets
Clustering	K-Means	Creates groups of records that are similar to each other within a particular group and different across distinct groups	N/A	Customer segmentation	SAPGUI External targets
Association Analysis	A-Priori	Uncovers the hidden patterns, correlations, or causal structures among a set of items or objects	N/A	Market basket analysis	SAPGUI External targets
ABC Classification	N/A	Classifies objects into distinct groups based on a particular dimension	N/A	Customer segmentation	SAPGUI External targets
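To give a feel for the simplest method in the table, here is a minimal Python sketch of an ABC classification. It is not SAP's implementation. It assumes the common approach of ranking objects by one dimension and assigning classes by cumulative share, and the 80 percent and 95 percent thresholds, the function name, and the revenue figures are all hypothetical.

```python
def abc_classify(values, a_share=0.80, b_share=0.95):
    """Rank objects by value and assign A, B, or C by cumulative share.

    The thresholds are illustrative assumptions, not SAP defaults.
    """
    total = sum(values.values())
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    classes = {}
    cumulative = 0.0
    for key, value in ranked:
        cumulative += value
        share = cumulative / total
        if share <= a_share:
            classes[key] = "A"
        elif share <= b_share:
            classes[key] = "B"
        else:
            classes[key] = "C"
    return classes

# Hypothetical revenue per customer (the "particular dimension").
revenue = {"C1": 500, "C2": 250, "C3": 150, "C4": 60, "C5": 40}
segments = abc_classify(revenue)
print(segments)
```

With these figures the two largest customers land in class A, the third in class B, and the remaining two in class C.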

To determine the impact of specific model parameters on model results and output, consult the documentation in SAP BW online help, both at https://help.sap.com (navigate to SAP NetWeaver>SAP Business Information Warehouse>Administrator Workbench>Modeling Data Mining) and in the context-driven help in your SAP BW system. More information is available at http://service.sap.com/cnm-analytics. Find Customer Analytics in the scenario section and see literature type “Documentation” for details on each modeling method. In general, you set up and configure your data mining activities following these five steps:

Select a model type
Create (or select) a BEx query or query view as a data source for model training
Set model-specific parameters
Train the model by executing it against historical data to build statistics, which are used for evaluating future results
Load results back into the SAP BW system

The first two steps are the same for any model, as is the last step. Each of the data mining methods, however, has unique parameters and settings that vary based on the desired outcome or issue at hand. The entire configuration and setup process is facilitated through the use of a model creation wizard.

Model Parameter Settings and Training

Setting the parameters for a data mining model involves identifying key fields in the data source and assigning weighting values to those individual fields. In some cases, additional steps must be taken such as indicating whether fields hold discrete values or a continuous flow of an unknown number of unique values, accommodating sparse values, etc. Training the model, which requires creating historical statistics, also varies depending on model type. For example, in a clustering model, historical data would provide data for groups or clusters of customers based on any number of demographic data points including income, geographic location, age range, etc. These historical groupings can then be applied to future data as it passes into the data warehouse and then through the data mining model algorithm. (For examples of decision tree and scoring models, see “Put BW’s Newly Enhanced Data-Mining Tools to Use”.)

Depending on your needs, data mining model output can be stored in SAP BW for use in other applications such as SAP CRM. When new data is run against a trained clustering model, for example, the resulting clustering group identifier can populate a ClusterID attribute of new Customer(s) in the SAP BW master data. By running this process in batch mode, new information could be retained and BEx users could identify specific groups of customers, while CRM users could use the same data to determine specific customers for a targeted marketing campaign.

In addition to offering persistent data, with the launch of the SAP BI 3.2 Content Add-On,¹ the technology supports Predictive Model Markup Language (PMML). PMML is an XML-based language that provides a standard way of defining and sharing data mining models that can interact with systems external to SAP BW. Model definitions, rather than just results, can be exported for use with external applications. You can also share models seamlessly with different applications and systems, and enable operational processes to use data mining models obtained from multiple sources.
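The clustering scenario described above, in which new customers are run against a trained model and receive a cluster identifier, can be sketched in Python as a nearest-centroid lookup. This is an illustration only, not SAP BW code. The centroids, the two features (age and income in thousands), the field names, and the cluster IDs are hypothetical, and a real model would normally scale its features before measuring distance.

```python
import math

# Hypothetical centroids produced by an earlier training run.
# Each centroid is (age, annual income in thousands).
TRAINED_CENTROIDS = {
    "CL01": (25.0, 30.0),
    "CL02": (45.0, 80.0),
    "CL03": (65.0, 50.0),
}

def assign_cluster(features, centroids):
    """Return the ID of the centroid closest to the given features."""
    return min(centroids, key=lambda cid: math.dist(features, centroids[cid]))

def enrich_master_data(customers, centroids):
    """Batch step: add a cluster_id attribute to each new customer."""
    return [
        dict(c, cluster_id=assign_cluster((c["age"], c["income_k"]), centroids))
        for c in customers
    ]

new_customers = [
    {"customer": "C100", "age": 28, "income_k": 35},
    {"customer": "C101", "age": 50, "income_k": 75},
    {"customer": "C102", "age": 63, "income_k": 48},
]
enriched = enrich_master_data(new_customers, TRAINED_CENTROIDS)
print(enriched)
```

The `cluster_id` field plays the role of the ClusterID attribute: once it is stored with the customer record, reporting users and campaign planners can both select on it.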

¹ Beginning with SAP BW 3.1 Content Add-On, the technology and business content delivery was separated. SAP BI 3.2 Content Add-On refers to the business content add-on and SAP BW 3.1 Content Add-On refers to the underlying technology release.

Glen Leslie

Glen Leslie is a product manager for SAP’s Business Intelligence solution. Originally a data warehousing consultant, Glen has been working with SAP BW since the 1.2B release in a variety of environments.