Analytics Innovation in The Cloud
Getting into the pilot seat
Pilots are a critical element in advanced analytics journeys. For those not familiar with the jargon, a pilot tests an analytics use case in a scaled-down, siloed environment. As organizations evolve in their analytics journey, they will go through several pilots. I believe that for an organization to build truly intelligent enterprise capabilities by the end of this decade, it will have to run hundreds of pilots. Remember, experimentation is critical to innovation. From a supply chain and operations perspective, a pilot has two aspects. The first is to test the algorithm outside of its IDE with real-world data. The second is to use the algorithm's recommendations to change business processes. This article focuses on the first, which we will call the algorithm pilot from this point onward.
If you have spearheaded projects that run these algorithm pilots and helped design pilot environments and architectures, you know it is a messy process. An algorithm gets built, trained, tested, and validated in a siloed environment, and most of the data it uses is fed in as piecemeal data files. Sometimes quick and dirty data lakes are created to support the development. But the real challenge comes when you test the algorithm “outside the lab”. You want to see how it behaves when fed by the data pipelines it will interface with and consume in the real world. That…is the true definition of an algorithm pilot. I see many professionals mistake the first stage, developing, testing, and validating an algorithm, for a successful algorithm pilot. It is not! The algorithm pilot is essentially putting your algorithm out in the field.
But the view from the Pilot’s seat is not always beautiful
Taking the algorithm outside the lab and “simulating” a test environment among organizational systems is not easy. IT resources have to be requested in advance, approvals have to be obtained, and so on. And if you have done this before, you know that several chinks start showing up in the model as soon as it gets out of the lab and starts “tasting” real-world data. Many tweaks are made, and essentially, a significant portion of the model development process happens again!
Hence, organizations should create a standardized test environment in the cloud: an environment within the organization’s own cloud infrastructure, dedicated exclusively to running pilots.
A standardized, plug-and-play environment dedicated to pilot algorithm development
The concept is not radical in terms of the infrastructure or technology involved. The only prerequisite is that you leverage Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), and the majority of large organizations already have the key components they need in the cloud. This is where the beauty of a cloud infrastructure comes into play.
If you start thinking about the key components that you need to run a robust algorithm pilot, the first five aspects that may come to your mind are:
- Ease of creating an architecture for the pilot
- Connectivity to real-world organizational data systems for robust “real world” testing
- Interdependency of the use case (pilot) on other use cases (pilots)
- Ability to scale up fast
- Ease of transition to production
And this is where the Infrastructure as a Service (IaaS) component of cloud computing comes into play. We will get an overview of how this can be done later in this article. This is the first step, where you create an environment with access to all the real-world datasets. You can compartmentalize this “pilot environment” so several pilots can run simultaneously.
But an environment with access to all data points is not the only benefit of having this dedicated pilot environment. To illustrate the other aspects, I will use an example. Since my hands-on experience in data science and analytics is in supply chain and distribution, we will use an example from that area.
Understanding with an example: Warehouse Analytics
Suppose that you are on the path towards building intelligent warehousing and distribution operations, and a vital aspect of this “intelligence” that you envision is powered by analytics. As a first step, you identify use cases, as shown below, where you think advanced analytics can add significant value. Let us review them, as we will need that information later.
Algorithm Pilot 1: Simulation models are frequently used by industrial engineers for designing warehouse flows, but their usage is primarily prescriptive. However, if you have designed a simulation model accordingly, you can train a neural network on thousands or millions of simulation runs so that it learns every possible inbound and outbound flow scenario and the best way to operate the warehouse in each. Note that two key inputs the simulation model will use (along with many others) are known inbound and outbound flow/demand patterns.
So if your warehouse is suddenly heading toward an overflow, the deep learning algorithm monitoring the data will flag that, at a certain point, your flows, schedule, labor assignments, etc. need to be modified to prevent overflows and disruptions. All the tools and technology needed to develop something like this already exist, so there is nothing futuristic about this pilot.
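As a rough illustration of the idea behind Pilot 1, the sketch below generates labeled runs from a toy simulation and fits a model to them. Everything in it is an assumption made for illustration: the `simulate` function, the capacity and flow ranges, and the use of a single logistic unit (the simplest possible "network") in place of a real deep learning model trained on a real simulation.

```python
import math
import random

CAPACITY = 1000   # pallet positions (hypothetical)
MAX_FLOW = 200    # max pallets per hour in or out (hypothetical)
HOURS = 8         # length of one simulated shift


def simulate(start, inbound, outbound):
    """Toy stand-in for one simulation run: 1 if stock ever exceeds capacity."""
    stock = start
    for _ in range(HOURS):
        stock = max(0, stock + inbound - outbound)
        if stock > CAPACITY:
            return 1
    return 0


def features(start, inbound, outbound):
    """Scale inputs to roughly [0, 1] so gradient descent behaves well."""
    return [start / CAPACITY, inbound / MAX_FLOW, outbound / MAX_FLOW]


def predict_proba(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=300, lr=1.0):
    """Full-batch gradient descent on the logistic loss."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0, 0.0], 0.0
        for x, y in samples:
            err = predict_proba(weights, bias, x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Generate labeled scenarios from the simulation, then fit the model to them.
rng = random.Random(0)
runs = []
for _ in range(500):
    scenario = (rng.uniform(0, CAPACITY),
                rng.uniform(0, MAX_FLOW),
                rng.uniform(0, MAX_FLOW))
    runs.append((features(*scenario), simulate(*scenario)))

weights, bias = train(runs)


def overflow_risk(start, inbound, outbound):
    """Estimated probability that this scenario ends in an overflow."""
    return predict_proba(weights, bias, features(start, inbound, outbound))
```

Once trained, `overflow_risk(start, inbound, outbound)` can score live readings far faster than rerunning the simulation, which is the property that makes this kind of monitoring practical.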
Algorithm Pilot 2: As a logistics professional, you are probably already thinking about managing your dock flows optimally and the impact any mayhem on your docks can have on an algorithm like the one in Pilot 1. For example, if your dock is overflowing, the algorithm from Pilot 1, working in a silo, will suggest a replan without knowing how the docks will manage the overflow over the next few hours. Without that information, whether it assumes that the docks will function as normal or that they will stay in an overflow state, the new plan will be worthless.
Hence, optimizing the docks in tandem is necessary. Dock schedule optimization algorithms will typically be heuristics with multiple optimization algorithms embedded. They can be implemented with open-source tools.
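To make the notion of a dock scheduling heuristic concrete, here is a minimal sketch of one very simple rule: serve trucks in arrival order and send each to the door that frees up first. The `Truck` fields and the rule itself are illustrative assumptions; a production heuristic would add door compatibility, priorities, labor constraints, and embedded optimization steps.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Truck:
    truck_id: str
    arrival: int   # minutes after shift start
    unload: int    # minutes needed at a door


def schedule_docks(trucks, n_doors):
    """Greedy rule: in arrival order, assign each truck to the earliest-free door."""
    doors = [(0, door) for door in range(n_doors)]  # (time door is free, door index)
    heapq.heapify(doors)
    plan = []
    for truck in sorted(trucks, key=lambda t: (t.arrival, t.truck_id)):
        free_at, door = heapq.heappop(doors)
        start = max(free_at, truck.arrival)
        end = start + truck.unload
        plan.append((truck.truck_id, door, start, end))
        heapq.heappush(doors, (end, door))
    return plan


def total_wait(plan, trucks):
    """Total minutes trucks spend waiting between arrival and start of unloading."""
    arrivals = {t.truck_id: t.arrival for t in trucks}
    return sum(start - arrivals[truck_id] for truck_id, _, start, _ in plan)


trucks = [Truck("A", 0, 30), Truck("B", 0, 20), Truck("C", 10, 15)]
plan = schedule_docks(trucks, n_doors=2)
```

With two doors, truck C has to wait for the door that B vacates, and `total_wait` gives a single number that a pilot can track as arrival patterns change.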
Pilot 3: But you are a smart warehouse manager, and therefore you are already thinking about the challenges of optimizing your dock without full visibility into, and control over, your yard. This is where the third pilot comes into the picture. It is relatively simpler, since many smart Yard Management Solutions (YMS) exist. You need a “linking algorithm” (read about linking algorithms here: https://www.sapinsideronline.com/blogs/what-are-linkage-algorithms-and-why-they-can-be-really-powerful/). This algorithm taps into the intelligent YMS data, culls the relevant records, and translates them into the information the dock optimization heuristic needs.
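A linking algorithm of this kind can be very small. The sketch below filters YMS records down to trailers waiting in the yard and reshapes them into the inputs a dock heuristic might expect. The record fields (`trailer_id`, `status`, `gate_in`, `pallets`), the `IN_YARD` status value, and the minutes-per-pallet constant are all hypothetical; a real YMS will have its own schema.

```python
from datetime import datetime

MINUTES_PER_PALLET = 2  # hypothetical unloading rate


def link_yms_to_dock(yms_records, shift_start):
    """Keep trailers waiting in the yard and reshape them for the dock heuristic."""
    jobs = []
    for record in yms_records:
        if record.get("status") != "IN_YARD":
            continue  # departed or not-yet-arrived trailers are irrelevant to the dock
        gate_in = datetime.fromisoformat(record["gate_in"])
        jobs.append({
            "truck_id": record["trailer_id"],
            # minutes between shift start and gate-in
            "arrival": int((gate_in - shift_start).total_seconds() // 60),
            "unload": record["pallets"] * MINUTES_PER_PALLET,
        })
    return sorted(jobs, key=lambda job: job["arrival"])


yms_records = [
    {"trailer_id": "TR-2", "status": "IN_YARD", "gate_in": "2024-01-01T07:15:00", "pallets": 20},
    {"trailer_id": "TR-1", "status": "IN_YARD", "gate_in": "2024-01-01T06:30:00", "pallets": 10},
    {"trailer_id": "TR-9", "status": "DEPARTED", "gate_in": "2024-01-01T05:00:00", "pallets": 5},
]
dock_jobs = link_yms_to_dock(yms_records, shift_start=datetime(2024, 1, 1, 6, 0))
```

The value of keeping this step separate is that the dock heuristic never needs to know the YMS schema, so either side can change without breaking the other.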
Pilot 4: Since you are strategic, you know that eventually, with all the fancy analytics you plan to run, you will uncover a need to redesign the warehouse layout and flows. So you have planned a pilot around this aspect as well.
Now there can be a few more pilots in the warehouse analytics domain (and hopefully, the four examples above have already made you start thinking in that direction), but for the sake of this example, let us say that you have finalized these four pilots. But what was THE one theme you noticed as we moved from one pilot to another? You noticed that no pilot stands alone: each one depends on data from, or decisions made by, another.
And imagine the chicken or egg situation here. If you are optimizing in a silo, the result is never optimal. But creating a massive pilot of an end-to-end analytics platform also does not make sense and may not be realistic (unless you are developing an off-the-shelf solution). So what do we generally do? We go with the lesser evil. We build siloed pilots and build a business case based on their results. And the dollar savings in the real world may not match what the siloed pilot suggested because of the interdependencies.
This is where the “pilot in the cloud” solution can help. In an environment where all the pilots have access to the same data and write their results to the same data pool, where other pilot algorithms consume them, the setup will mimic the true end-to-end behavior.
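The shared data pool idea can be sketched in a few lines. In the toy example below, three pilots read from and publish to one pool, so each consumes what the previous one produced. The `DataPool` class, the key names, and the numbers are all assumptions for illustration; in practice the pool would be a shared store in your cloud environment rather than an in-memory dictionary.

```python
class DataPool:
    """In-memory stand-in for the shared data pool in the pilot environment."""

    def __init__(self, seed):
        self._data = dict(seed)

    def read(self, key):
        return self._data[key]

    def publish(self, key, value):
        self._data[key] = value


PALLETS_PER_DOOR_HOUR = 100  # hypothetical unloading rate per dock door


def yard_pilot(pool):
    # Pilot 3: publish how many pallets are waiting in the yard.
    pool.publish("pallets_waiting", sum(pool.read("yard_trailers").values()))


def dock_pilot(pool):
    # Pilot 2: the inbound rate is capped by dock door capacity.
    capacity = pool.read("dock_doors") * PALLETS_PER_DOOR_HOUR
    pool.publish("inbound_per_hour", min(pool.read("pallets_waiting"), capacity))


def warehouse_pilot(pool):
    # Pilot 1: flag a replan if the warehouse would fill within two hours.
    rate = pool.read("inbound_per_hour")
    hours = pool.read("free_slots") / rate if rate else float("inf")
    pool.publish("hours_until_full", hours)
    pool.publish("replan_needed", hours < 2)


pool = DataPool({
    "yard_trailers": {"T1": 300, "T2": 600},  # pallets per trailer
    "dock_doors": 4,
    "free_slots": 500,
})
for pilot in (yard_pilot, dock_pilot, warehouse_pilot):
    pilot(pool)
```

Here the warehouse pilot's replan flag depends on the dock pilot's throughput, which in turn depends on the yard pilot's count. That chain is exactly what a siloed pilot cannot see.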
How the setup will work at a high level
To simplify it, here is what you need to do (remember, as mentioned earlier, that a critical prerequisite is a robust IaaS and PaaS solution, which all leading cloud providers offer).
Create a master development environment in the cloud. (I have heard from many developers that they believe cloud advantages are less common in software development than in business applications. I disagree! Cloud-based IDEs like AWS Cloud9, code-server, Gitpod, etc. have been around for some time now, and I have seen more sophisticated offerings emerge recently, like Koding, which by the way is open source and free and will work on any of your favorite hyperscalers like AWS, Azure, and Google Cloud.) Next, configure a data source location (for example, an AWS EC2 instance) that connects to your overall data hub as a node. This source then communicates with something like AWS Cloud9.
The description above leaves out the granular setup details on purpose. The key point is that you can use your data hub to push the data your model needs into the data pool you created (like the EC2 example). Remember that when you set up something like this, you create a standardized development environment that is familiar to your IT Ops and support folks as well (in case you need them). You can further standardize this pilot environment by providing application frameworks, code samples, and development tools.
One key aspect to keep in mind, though, is that your application and data architecture must be designed for interoperability and multi-cloud flexibility.
There are many additional benefits, but most of them come with the cloud in general, so I will not dig deeper into them.
What does this mean for SAPinsiders?
To summarize, as indicated in the illustration, having your pilot IDEs in the cloud, tapping into the same data source of near-real-time, real-world data, provides you with the following benefits:
- Seamless connectivity to all data points
- Rapid testing of information exchange among modules, as close to the production environment as possible
- Faster scaling in testing
- Ease of transition to production
- Seamlessly connect to other tools to create platforms
- Minimal additional effort required if looking to offer as a SaaS product (though the architectural setup will need to be a bit more intense in this case)
Kumar Singh, Research Director, Automation & Analytics, SAPinsider, can be reached at email@example.com