This is the first installment of a series of three articles that help with understanding the basics of the SAP Predictive Analytics toolset and its history. In this part, learn about the automated analysis component of SAP Predictive Analytics (specifically, a classification analysis) and see how it can be used to generate money-saving analysis of your data.
Key Concept
In this first part of a series, Ned Falk discusses all the data-mining/predictive analysis tools SAP offers (for a wide perspective) and then focuses on one of the newest tools, SAP Predictive Analytics. Learn how to use the Automated Analysis component within SAP Predictive Analytics (previously marketed separately as SAP InfiniteInsight) to classify your customers as well as to predict the profile of a customer who would actually buy your products.
SAP has a long history with data-mining tools. The newest tools provide robust statistical analysis with graphical user interfaces (GUIs) that guide you through each step and are very easy to deploy and understand. The newest toolset, SAP Predictive Analytics, is the culmination of this long history. In this article I cover the basics of what you need to know to best use this tool.
A Definition of Data Mining and a Brief History
For many, the term data mining means drilling down or selecting more rows of data. For example, given a report that lists sales by month and customer, most people would consider filtering on a specific month and drilling down to the part numbers for a customer in that month to be data mining. However, that is not what data mining is at all. Rather, a common definition of data mining (from Merriam-Webster.com) is “the practice of searching through large amounts of computerized data to find useful patterns or trends.”
Over the years, the ability to mine data has spread, moving from the domain of a few math geeks at universities with supercomputers to business experts who apply easy-to-use data-mining tools to basic business situations. I have worked with data-mining tools at SAP for many years. The first SAP toolset for data mining was SAP BW-based, but its development was driven by the SAP CRM team because data mining is central to improving CRM business processes in many ways. For example, using:
- Clustering to group together customers for targeted marketing
- Decision trees to predict buying behavior
- Association analysis to suggest additional secondary products to purchase when a primary product is selected
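As a rough illustration of the last item, the arithmetic behind an association suggestion can be sketched in a few lines of Python. The baskets, product names, and the `min_confidence` threshold below are all invented for illustration, and this is not how any SAP tool implements association analysis. It only shows the support and confidence figures behind a "customers who selected X also bought Y" suggestion.

```python
# Hypothetical shopping baskets (invented data, for illustration only).
baskets = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"mouse", "keyboard"},
    {"laptop", "mouse", "keyboard"},
]

def rule_stats(baskets, primary, secondary):
    """Return (support, confidence) for the rule primary -> secondary."""
    n = len(baskets)
    with_primary = [b for b in baskets if primary in b]
    with_both = [b for b in with_primary if secondary in b]
    support = len(with_both) / n
    confidence = len(with_both) / len(with_primary) if with_primary else 0.0
    return support, confidence

def suggest(baskets, primary, min_confidence=0.5):
    """List secondary products worth suggesting when `primary` is selected."""
    products = set().union(*baskets) - {primary}
    scored = [(p, *rule_stats(baskets, primary, p)) for p in sorted(products)]
    return [(p, s, c) for p, s, c in scored if c >= min_confidence]

for product, support, confidence in suggest(baskets, "laptop"):
    print(f"laptop -> {product}: support={support:.2f}, confidence={confidence:.2f}")
```

Support is the share of all baskets that contain both products; confidence is the share of baskets with the primary product that also contain the secondary one.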
This original toolset for data mining consisted of the Analysis Process Designer (APD) and the Data Mining Workbench. Both of these tools were part of the basic license cost of SAP BW and were highly integrated with SAP CRM, features that made them very compelling for companies.
Two other more recent data-mining toolsets are integrated in the new SAP HANA database: the Predictive Analysis Library (PAL) of data-mining algorithms and the Application Function Modeler (AFM). The AFM is a graphical modeling tool for feeding data into and out of an algorithm. These tools allow SAP HANA programmers and experienced SAP HANA modelers to leverage data-mining algorithms. Also, there is a new SAP HANA Analysis Process (HAP) tool, which is an SAP BW-based tool that gives SAP BW modelers access to the HANA PAL through the SAP BW application server.
What binds most of these tools together is a third toolset, SAP Predictive Analytics, which I discuss in more detail later in this article. All these tools bring together pre-packaged data-mining algorithms that hide the underlying complex math, while also providing a way to source and pre-process the data that flows into and out of the algorithm.
Each of these tools has advantages and disadvantages, summarized by the matrix in Table 1. This matrix is not 100-percent definitive, but rather contains some generalizations about each of these tools.
Toolset | Target audience | Data-mining power/level of sophistication | Integration/Visualization
SAP HANA PAL | SAP HANA programmers | High | Requires an external tool (BusinessObjects reporting tools, for example) to visualize and programming to feed and save data.
SAP HANA AFM (with SAP Predictive Analytics) | SAP HANA modelers | High | Requires an external tool to visualize; its graphical tool helps source data and store results.
SAP BW’s transaction codes RSANWB and RSDMWB | Advanced business users and BW modelers | Medium/Low | Requires BEx Query and other reporting tools to expose data to end consumers.
SAP HAP | BW modelers | Medium to high | Limited extraction, transformation, and loading (ETL) and requires external tool to visualize data.
SAP Predictive Analytics | Advanced business users up to data scientists | High | Integrated in SAP Lumira for visualization, as part of the overall process.
Table 1
A quick guide to SAP’s data-mining tool options
SAP Predictive Analytics
Now that you have some history and background, let’s deepen the discussion about the last tool in Table 1, SAP Predictive Analytics. Predictive Analysis was the name of a toolset initially brought into SAP by its acquisition of BusinessObjects, but this tool has changed a lot since then. This is partly due to the creation of SAP HANA and SAP Lumira (a data-visualization tool) and partly due to another acquisition, a data-mining company called KXEN. The claim to fame of KXEN’s InfiniteInsight tool was its very powerful data-mining algorithms, combined with its ease of use for even a total beginner. This made the KXEN product very appealing to a variety of users, from business analysts to novice data scientists. After the KXEN acquisition, this easy-to-consume (non-BW-centric) product was re-branded as SAP InfiniteInsight.
SAP realized that it now had too many tools. To keep in line with its new “simple” mantra, SAP needed to increase usability and decrease complexity, but still provide very sophisticated data-mining tools to all classes of users. The result of this streamlining effort is the newest SAP Predictive Analytics 2.0 toolset. With just one install, you get the guts of both Predictive Analytics and what was previously the separate SAP InfiniteInsight tool. I am sure that there is more integration of these planned in the future, but what is available now does simplify the landscape.
For now, in the new SAP Predictive Analytics 2.0, the central component of the previous version of Predictive Analysis is found under Expert Analytics, and what was previously SAP InfiniteInsight is now found under Automated Analytics, both shown in Figure 1.

Figure 1
Home page of SAP Predictive Analytics 2.0
The Basics of Predictive Analytics: Automated Analytics – Classification Model
The main function of the Automated Analytics tool is to guide you through a data-mining model use case from beginning to end. (I am leaving the introductory discussion of the Expert Analytics function for a later article.)
To illustrate the functionality of the automated analysis wizard, I show how to create a new classification model. However, there are many more choices, such as regression and association analysis, just to name a couple.
Note
Full disclosure: A technical hurdle with my license for SAP Predictive Analytics prevents me from showing you updated screenprints of the Automated Analytics tool as a component of Predictive Analytics. Rather, the following screenprints show the prior release, SAP InfiniteInsight. However, for the purposes of the article, the basics have not changed.
Using the old SAP InfiniteInsight release, start the wizard for the classification demo by selecting the Create a Classification/Regression Model option (Figure 2). If you are using the newer 2.0 version, choose the Modeler option under Automated Analytics and then select the Classification/Regression option.

Figure 2
Select the create a new classification/regression model option
Next, the wizard quickly and logically walks you through the required steps for creating a new classification (or regression) model as detailed below.
Classification Model: Step 1 – Source the Data to Train the Model
The role of the classification model is to take a set of data in which many variables (attribute fields with data) exist along with the known outcome of the predicted variable, and to determine the statistical relationships between the attributes and that outcome. For example, the data for many metal welds might include who made the weld, how many years of experience the welder had, what type of welding tool was used, when the tool was made, and who made it (attributes). In addition, for all this data to be useful for training a classification model, you must also know the answer to the question: Did any particular weld in the set fail (predicted variable)?
Then, after training the model on the set of data with the known outcomes, you apply the model to another set of data to predict outcomes that you do not yet know. In this weld example, you collect information about welds that have not failed and predict whether they will.
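The train-then-apply pattern described above can be sketched with a tiny naive Bayes classifier. This is a stand-in technique chosen because it fits in a few lines; it is not the algorithm that Automated Analytics uses. The weld records, field names, and values are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical weld records: attributes plus the known outcome ("failed").
training = [
    {"welder": "A", "experience": "senior", "tool": "T1", "failed": 0},
    {"welder": "A", "experience": "senior", "tool": "T2", "failed": 0},
    {"welder": "B", "experience": "junior", "tool": "T2", "failed": 1},
    {"welder": "B", "experience": "junior", "tool": "T1", "failed": 1},
    {"welder": "C", "experience": "senior", "tool": "T1", "failed": 0},
    {"welder": "C", "experience": "junior", "tool": "T2", "failed": 1},
]

def train(rows, target):
    """Count how often each attribute value occurs with each outcome."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)   # (attribute, outcome) -> value counts
    values_seen = defaultdict(set)        # attribute -> distinct values
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                value_counts[(attr, row[target])][value] += 1
                values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Return the outcome with the highest naive Bayes score for `row`."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_outcome, best_score = None, -1.0
    for outcome, count in class_counts.items():
        score = count / total             # prior probability of the outcome
        for attr, value in row.items():
            if attr not in values_seen:
                continue                  # ignore fields the model never saw
            # Laplace smoothing so an unseen value does not zero the score.
            seen = value_counts[(attr, outcome)][value]
            score *= (seen + 1) / (count + len(values_seen[attr]))
        if score > best_score:
            best_outcome, best_score = outcome, score
    return best_outcome

# Training phase: learn from welds whose outcome is known.
model = train(training, target="failed")

# Apply phase: predict the outcome for welds that have not failed (yet).
new_welds = [
    {"welder": "A", "experience": "senior", "tool": "T1"},
    {"welder": "B", "experience": "junior", "tool": "T2"},
]
predictions = [predict(model, weld) for weld in new_welds]
print(predictions)
```

The important point is the separation of the two phases: `train` only ever sees records with a known outcome, and `predict` only ever sees records without one.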
In my example, I’m not using welds, but rather customers. I am using their attributes to build a classification model to determine future customer purchasing behavior as the predicted variable. More specifically, using the Automated Analytics tool, I try to predict, based on the attributes of customers, if they are likely to purchase products promoted by company X.
Once you select the classification/regression option in Figure 2, the wizard prompts for the location of a source of data to initially train the model. In my case the source is a simple comma-separated values (CSV) file on my PC, but you can source data from any number of places (for example, from any SAP HANA database tables and views).

Figure 3
Select the source of data for training the model
After you’ve selected the source of the data, click the Next button (in the bottom right of the figure) and the Data Description screen opens. This screen initially appears without any metadata in the bottom half, but when you click the Analyze button on the right, the system parses the data (in this case, the CSV file) and suggests the types of fields and some other metadata that are relevant to the data model. This metadata, once parsed, shows up in the bottom of the screen in Figure 4.

Figure 4
The metadata description of the training data after analysis
Under the covers, this math model needs to know, for example, which fields are numbers and which hold other characters. Even more than that, the model needs to know if some of the numbers are continuous (like a customer’s income), ordinal (the number of years of education for the customer), or nominal (like a meaningless number code assigned to indicate a color).
The wizard does a pretty good job determining this metadata, but if you have details that the system might not, you should change the settings to help the model work more accurately. For example, if you know a field called Color has just three possible entries for your data (1 = Red, 2 = Blue, and 3 = Green) and the wizard incorrectly thought this was a continuous field, you would need to correct the wizard and set the field as nominal, not continuous, because the codes are labels with no meaningful order.
Once you have parsed the data, click the View Data button (boxed in red in Figure 4) to see the data that the model will use for this first training phase (Figure 5). As you can see in Figure 5, there is a bunch of demographic data about customers. What is not shown (but appears in this screen) is a field on the right, named target_response. This field contains a 0 if the customer you contacted did not buy the product and a 1 if they did.
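To make the idea of guessed metadata and manual correction concrete, here is a minimal sketch of a type-guessing heuristic with an override step. The heuristic, the `max_codes` cutoff, the column names, and the sample rows are all assumptions made for illustration; the wizard's actual rules are not described in this article.

```python
import csv
import io

# Hypothetical sample standing in for the training CSV file.
SAMPLE = """age,education_years,color,income
34,12,1,52000.50
51,16,3,88000.00
29,12,2,43000.75
45,18,1,91000.10
"""

def guess_type(values, max_codes=3):
    """Guess a coarse value type from the raw strings of one column."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        return "nominal"            # non-numeric text is treated as categories
    if any(not n.is_integer() for n in numbers):
        return "continuous"         # fractional values suggest a measurement
    if len(set(numbers)) <= max_codes:
        return "ordinal"            # few distinct integers: could be ranks or codes
    return "continuous"

def describe(csv_text, overrides=None):
    """Guess a type per column, then apply the analyst's corrections."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    guessed = {col: guess_type([r[col] for r in rows]) for col in rows[0]}
    guessed.update(overrides or {})
    return guessed

# The heuristic cannot know that the color codes carry no order, so the
# analyst overrides the guess, as you would in the Data Description screen.
metadata = describe(SAMPLE, overrides={"color": "nominal"})
print(metadata)
```

A guess based only on the values can never see what the numbers mean, which is why the manual correction step matters.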

Figure 5
View the model training data
Now that you have parsed the data and assigned the right value type to each field (e.g., ordinal, nominal, and so on), click the Close button. This takes you back to the screen in Figure 4, from which you can continue to use the dataset as described below.
Classification Model: Step 2 – Choose the Variables
Click the Next button in the bottom right corner of Figure 4, which takes you to the screen where you can choose the variables for the model (Figure 6). Here the modeler (in this case, you) can choose the variables that you want the model to consider. In this context, variables are just the fields describing customers and their buying behaviors. They are called variables as they are flexible factors that the data-mining model could consider, but does not have to consider.

Figure 6
Choose the model’s variables
What is critical in this screen is that you must select at least one variable to be your target variable. In addition, in this case I also decided to exclude a few (census) fields from the model analysis. The excluded variables (the bottom right of Figure 6) include race and gender amongst others. This time I did this for demo purposes, but there might be good reasons to do this, either technically, or from a business perspective, or for legal reasons.
In the first situation, the technical one, a feature of the tool shows which variables have the greatest impact in determining the outcome of the model. Because each model variable increases complexity and run time, excluding variables that contribute little to the outcome might make sense. For the business reason, it might be too expensive to provide the values for some variables in the next predictive phase, where the model is applied. For example, suppose you have to pay a lot of money for age information on each customer, and the model shows that age has a low impact on the model accuracy. If you know these facts ahead of time, this is the time to exclude these variables.
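One way to picture "impact on the outcome" is to rank candidate variables by how much each one reduces uncertainty about the target. The sketch below uses information gain for that purpose. This is an assumed stand-in, not the contribution measure that the tool reports, and the customer records and the 0.05 cutoff are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, variable, target):
    """Reduction in target entropy after splitting the rows on `variable`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[variable], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical customer records (invented for illustration).
customers = [
    {"region": "north", "age_band": "30-39", "owns_home": "yes", "target_response": 1},
    {"region": "south", "age_band": "30-39", "owns_home": "yes", "target_response": 1},
    {"region": "north", "age_band": "30-39", "owns_home": "yes", "target_response": 1},
    {"region": "south", "age_band": "40-49", "owns_home": "yes", "target_response": 1},
    {"region": "north", "age_band": "40-49", "owns_home": "no", "target_response": 0},
    {"region": "south", "age_band": "40-49", "owns_home": "no", "target_response": 0},
    {"region": "north", "age_band": "40-49", "owns_home": "no", "target_response": 0},
    {"region": "south", "age_band": "30-39", "owns_home": "no", "target_response": 0},
]

candidates = ["region", "age_band", "owns_home"]
gains = {v: information_gain(customers, v, "target_response") for v in candidates}

# Rank variables by gain and keep those above an arbitrary, assumed cutoff.
ranked = sorted(gains, key=gains.get, reverse=True)
kept = [v for v in ranked if gains[v] >= 0.05]
print(ranked, kept)
```

In this invented data, `region` tells you nothing about the response, so it is the natural candidate to exclude; a variable that is both expensive to obtain and low in the ranking is an even easier decision.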
The operation of this screen doesn’t require much explanation. In my case I select target_response as the target variable and (as explained above) I exclude a few other variables. FYI, the target response in my data holds the results of my internet analysis. For example, a 1 in this variable for a customer means they have purchased products from my website, and a 0 means they visited the website but did not make a purchase. I used the blue arrow icons (in the center of Figure 6) to make sure the variables appear in the correct boxes, but you can also use the drag-and-drop method to move the variables around.
Now you can go to the next step, where you get the big picture on how you have defined the model.
Classification Model: Step 3 – Summarize the Model Parameters
Click the Next button in Figure 6, and the wizard provides a final summary of some of the parameters you used so far (Figure 7). Check the parameters carefully and, if all looks well, click the Generate button. This generates the underlying algorithm with the customizations previously selected.

Figure 7
Review and then generate the new training model
Classification Model: Step 4 – Review the Output from the Model’s Training Phase
Once you click the Generate button (in Figure 7), the wizard trains the model and provides summary reports about the training (Figure 8). Here you could, for example, change the Report Type from the default Model Overview type to generate reports about which variables have the most effect (contribution) in determining the target response. Simply change the report type using the drop-down options in the field.

Figure 8
A trained classification model overview report
Now, focusing on the screen in Figure 8, here are some important points about the defaulted model overview report that I want to highlight.
- The Nominal Targets section shows a higher frequency of no responses (0 – Frequency) for the target response versus the yes responses (1 – Frequency). In this case, 76.05% versus 23.95%.
- The model has a Predictive Power of .7959 and a Prediction Confidence of .9948. The Predictive Power value indicates how much better the model predicts than a random guess; a positive value means it is better than nothing at all.
- The Prediction Confidence (KR) value is, in SAP speak, a robustness indicator. SAP help tells you that this value should be greater than .95. Otherwise, you might not get good results when you apply the model to new data to predict the target outcome.
- A Target Key value of 1 indicates a hypothetical perfect model. This is where every target result can be mathematically explained by the other model variables.
For more information on these critical measures of the model, refer to SAP help.
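To give a feel for what "better than a random guess" can mean as a number, the sketch below compares a model's cumulative-gain curve with the curves of a random and a perfect ranking. This is an assumption about the general idea behind such an indicator, not SAP's exact computation of Predictive Power; the function names and sample scores are invented.

```python
def gain_curve_area(outcomes_in_rank_order):
    """Area under the cumulative-gain curve, using the trapezoid rule."""
    n = len(outcomes_in_rank_order)
    positives = sum(outcomes_in_rank_order)
    area, captured = 0.0, 0
    for outcome in outcomes_in_rank_order:
        previous = captured / positives
        captured += outcome
        area += (previous + captured / positives) / 2 / n
    return area

def predictive_power(scores, outcomes):
    """Model's lift over random, divided by a perfect model's lift over random.

    Assumes 0/1 outcomes with at least one of each; tied scores keep
    their input order.
    """
    by_score = [o for _, o in sorted(zip(scores, outcomes), key=lambda p: -p[0])]
    perfect = sorted(outcomes, reverse=True)   # all buyers ranked first
    random_area = 0.5                          # diagonal of the gain chart
    return (gain_curve_area(by_score) - random_area) / (
        gain_curve_area(perfect) - random_area
    )

# Hypothetical scores and known outcomes for four customers.
print(predictive_power([0.9, 0.8, 0.7, 0.2], [1, 0, 1, 0]))
```

Under this definition, 1 corresponds to a ranking that puts every buyer ahead of every non-buyer, 0 to a ranking no better than chance, and negative values to a ranking that is worse than chance.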
The model is now complete and ready to be used to predict customer behavior for an unknown dataset of new customers.
Classification Model: Step 5 – Use the Model
Now that you’ve trained the model, the next step is to use it. Click the Next button in Figure 8 and the Using the Model option screen opens (Figure 9). In this section you can choose to again display some of the overview reports or display reports about the impact of different variables in influencing the target variable (called a Contribution by Variables report, not shown). This report tells you, for example, if gender is more useful for predicting buying behavior than age.

Figure 9
Use the model to display additional reports
Another interesting report is the Confusion Matrix report (Figure 10). This report shows how likely the model is to choose the opposite result when it is applied back on top of the known dataset. For example, I know that customer Bob purchased products, but the model, if used on a record like that, would have predicted that he would not have purchased them. It would be confused. In this case, the Confusion Matrix shows that 7.9 percent of the time the model would predict that customers would not buy (0) when, in fact, they actually did (1).

Figure 10
The confusion matrix report
After reviewing the Confusion Matrix, you have a better idea of how robust the model’s predictions are likely to be.
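The counting behind a confusion matrix is simple enough to sketch directly. The outcomes and decisions below are invented, and expressing the missed buyers as a share of all records is an assumption about how a percentage like the one above might be derived.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) pair for a binary 0/1 target."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(1, 1)],
        "false_negative": counts[(1, 0)],   # predicted no-buy, customer bought
        "false_positive": counts[(0, 1)],   # predicted buy, customer did not
        "true_negative": counts[(0, 0)],
    }

# Hypothetical known outcomes and model decisions for ten customers.
actual    = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

matrix = confusion_matrix(actual, predicted)

# Share of all records where the model said "will not buy" but the customer did.
missed_buyers_share = matrix["false_negative"] / len(actual)
print(matrix, missed_buyers_share)
```

The customer Bob example corresponds to the `false_negative` cell: a known buyer for whom the model's decision was 0.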
Classification Model: Step 6 – Run (Apply) the Model
In this final step, you want to run (apply) the new model to predict customer buying behavior.
Once you’re satisfied with the model and have displayed reports about its prediction prowess, it’s time to use it to predict the buying behavior of a new group of customers to whom you have not yet tried to sell. The goal here is to spend more energy selling to customers who are more likely to buy. Click the Previous button (Figure 10) to return to the prior screen (Figure 9). This time, select the Run option on the left, which opens the Applying the Model screen (Figure 11).

Figure 11
Settings for applying the model and getting predictions as output
The critical options and their settings are:
- The location of a data source (the Data field) in which the target response variable holds unknown (blank) values. In this case it is the census apply file.
- The Generation Options, which are Decision (Generate field) and Apply (Mode).
- A location and a name for the output prediction data (the Data file); in this case, DECISION_OUT.
Once you’ve made your entries, click the Apply button. The system executes the algorithm it created during the model training phase on this new set of data to predict customer buying behavior. As you can see in Figure 12, the prediction run is successful.

Figure 12
A successful prediction run of the training model
The last step is to view the generated output. Click the View Output button in the center of Figure 12 and the results are shown in Figure 13.

Figure 13
The prediction model run results
Figure 13 shows the most critical field, the decision_rr… column. This column contains the result of the prediction for each input record. For example, the prediction for customer 1200014 is that they will buy (1) and for customer 1200015, that they will not buy (0).
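Conceptually, the apply step turns each new customer record into a 0/1 decision and writes the result to an output location. A minimal sketch of that flow follows. The scoring rule, the 0.5 threshold, the `owns_home` attribute, and the output column names are all invented stand-ins; only the two customer numbers and their decisions mirror the example above.

```python
import csv
import io

def apply_model(score, customers, threshold=0.5):
    """Turn a score per customer into a 0/1 decision row."""
    return [
        {"customer_id": c["customer_id"], "decision": int(score(c) >= threshold)}
        for c in customers
    ]

def write_decisions(rows, handle):
    """Write the decision rows as CSV to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=["customer_id", "decision"])
    writer.writeheader()
    writer.writerows(rows)

# Stand-in scoring rule; a real run would use the trained model instead.
def toy_score(customer):
    return 0.8 if customer["owns_home"] == "yes" else 0.2

# New customers whose buying behavior is unknown (attributes are invented).
new_customers = [
    {"customer_id": 1200014, "owns_home": "yes"},
    {"customer_id": 1200015, "owns_home": "no"},
]

decisions = apply_model(toy_score, new_customers)

# An in-memory buffer keeps the sketch self-contained; a file opened with
# open(path, "w", newline="") would work the same way.
output = io.StringIO()
write_decisions(decisions, output)
print(output.getvalue())
```

With output like this in hand, the sales effort can be directed first at the customers whose decision is 1.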
Review the SAP Class offerings for BOII10 (the current SAP InfiniteInsight class) and new classes coming out soon for the new combined Predictive Analytics/SAP InfiniteInsight offering based on SAP Predictive Analytics 2.1: PAII10.
Ned Falk
Ned Falk is a senior education consultant at SAP. In prior positions, he implemented many ERP solutions, including SAP R/3. While at SAP, he initially focused on logistics. Now he focuses on SAP HANA, SAP BW (formerly SAP NetWeaver BW), SAP CRM, and the integration of SAP BW and SAP BusinessObjects tools. You can meet him in person when he teaches SAP HANA, SAP BW, or SAP CRM classes from the Atlanta SAP office, or in a virtual training class over the web. If you need an SAP education plan for SAP HANA, SAP BW, BusinessObjects, or SAP CRM, you may contact Ned via email.
You may contact the author at ned.falk@sap.com.