Deep learning will not transform your Supply Chain: unless you know where and how to leverage it

Are motor cars better than horses?

The following advertisement for the International Commercial Truck, circa 1910, is on display in Maine’s Owls Head Transportation Museum:

“That the motor truck is an excellent substitute for the horse has been proven in every instance where business men have given it a fair trial. But the man who uses his motor truck simply as a substitute for horses neglects to make the most of his opportunities. The horse is not a machine—five to six hours’ actual work—fifteen to twenty-five miles—is its maximum day’s work. A motor truck can be used twenty-four hours a day if necessary, and it will travel the last hour and the hundredth mile just as fast as the first.

“Business men who are using the motor truck in place of horse and wagon equipment with the greatest success are men who have given this problem careful study. In most instances it was necessary to change the plan of routing—delays which were necessary to give the horses rest were eliminated—plans were laid to keep the truck busy the entire day with as few delays as possible…” 

Reinventing the wheel?

What do you take away from the advertisement above? To my mind, it is a brilliant representation of our struggle with new technologies, deep learning included. We think about any new technology in the context of existing technologies and processes. The winners in this decade will be the ones who develop a breed of managers who embrace a new technology to redefine processes and offerings, and do not try to force-fit it onto existing processes.

Recently, I saw an article in MIT Sloan Management Review about an experiment done by professors from Michigan and MIT. They compared the accuracy of a regression model predicting credit card customer acquisitions with that of a deep learning model, and found that the difference was only about 3 percentage points. Deep learning is obviously more computationally intensive than a regression model, so the question was: is it worth using deep learning for a mere 3-percentage-point improvement in accuracy?

Obviously not.

But, as the authors note, deep learning may simply not be the right method to use for that kind of problem at all. You can read the article here: Deep learning in Marketing Analytics. Cutting through all the verbiage in the article, my “manipulated” interpretation of the message is:

If simple works well enough, don’t try complex. Use complex to change paradigms – achieve things that have not been done before.

Where to use DL in a Supply Chain context?

Now, the article mentioned above was focused on marketing analytics, so you may ask: where can (or should) we use deep learning in Supply Chains? Not in forecasting, is my suggestion, unless you have a very complex forecasting challenge such as fashion apparel.

So where should you use deep learning? To change the paradigms of your business. To erase the constraint boundaries of your operating model and create new ones. Can real-time analytics on shipments moving through your Supply Chain give you an ability you did not have before? Can extracting insights from trillions of transactions happening over hundreds of systems in your network provide a capability that is a game changer? Can you capture images of inventory in your warehouse and translate them into real-time insights about your warehouse efficiency? What if an algorithm-driven setup took over setting and calculating the time-standard metrics of your warehouse?

The examples above are just broad, high-level illustrations. See the appendix for more suggested applications that I have recently experimented with. The idea, as far as leveraging deep learning in the Supply Chain goes, is:

Use deep learning to develop capabilities that you did not have before: performing some form of analytics that was previously extremely challenging, automating a process that required some sort of cognitive (human) input, taking control of basic-to-intermediate Supply Chain planning decisions, or generating real-time, end-to-end Supply Chain insights across your network.

Do not introduce complexity into analytical or automation processes that do not actually need that level of complexity.

How to leverage it is equally important, and that also ties into the complexity aspect. Within the same process, you may need a more granular application of deep learning on one part, while the remainder can be plain automation. Remember that deep learning is computationally intensive, so optimizing its application across the network not only saves you money but also minimizes latency, since you may be counting on some of the decisions or recommendations made by your algorithms.

Appendix:

I. How did DL take ML to the next level?

Simple ML methods that work well for small-scale data analysis are not effective when dealing with large, high-dimensional datasets. However, deep learning (DL), a branch of ML based on a set of algorithms that attempt to model high-level abstractions in data, can handle this issue. Ian Goodfellow and his co-authors define DL in their book Deep Learning (MIT Press, 2016) as follows:
“Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones.”

Similar to an ML model, a DL model also takes in an input, X, and learns high-level abstractions or patterns from it to predict an output, Y. For example, based on the stock prices of the past week, a DL model can predict the stock price for the next day. When training on such historical stock data, the model tries to minimize the difference between its predictions and the actual values. In this way, a DL model learns to generalize to inputs it hasn't seen before and make predictions on test data.
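As an illustration only, here is a minimal Python/Keras sketch of that stock example. The data, window size, and network are all my own hypothetical choices, not code from any cited source:

```python
# A minimal sketch of the stock-price example: a model learns to map the
# last 7 days of prices (X) to the next day's price (Y) by minimizing the
# gap between its predictions and the actual values.
import numpy as np
from tensorflow import keras

# Hypothetical data: 500 days of synthetic prices (a random walk).
prices = np.cumsum(np.random.randn(500)) + 100

# Build (X, Y) pairs: 7-day windows as input, the following day as target.
window = 7
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
Y = prices[window:]

model = keras.Sequential([
    keras.Input(shape=(window,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")  # minimize prediction error
model.fit(X, Y, epochs=10, verbose=0)

# Generalization: predict from a window the model has not seen in training.
print(model.predict(X[-1:]))
```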
Now, you might be wondering, if an ML model can do the same tasks, why do we need DL for this? Well, DL models tend to perform well with large amounts of data, whereas old ML models stop improving after a certain point. The core concept of DL, inspired by the structure and function of the brain, is called artificial neural networks (ANNs).

Being at the core of DL, ANNs help you learn the associations between sets of inputs and outputs in order to make more robust and accurate predictions. However, DL is not limited to ANNs; many theoretical advances, software stacks, and hardware improvements have brought DL to the masses. Let’s look at an example in which we want to develop a predictive analytics model, such as an animal recognizer, where our system has to resolve two problems:

  • To classify whether an image represents a cat or a dog
  • To cluster images of dogs and cats

If we solve the first problem using a typical ML method, we must define facial features (ears, eyes, whiskers, and so on) and write a method to identify which features (typically non-linear) are most important when classifying a particular animal. At the same time, we cannot address the second problem, because classical ML algorithms for clustering images (such as k-means) cannot handle non-linear features. The following diagram shows the workflow we would follow to classify whether a given image is of a cat:

[Diagram: workflow for classifying a cat image with a classical ML method]
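To make the contrast concrete, here is a toy sketch of the classical route. The features and their values are entirely hypothetical; the point is only that we must engineer them ourselves before a simple classifier can use them:

```python
# Classical ML route: hand-crafted features feed a simple classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical hand-crafted features per image:
# [ear_pointiness, eye_roundness, whisker_length]
X = np.array([
    [0.9, 0.8, 0.9],   # cat-like
    [0.8, 0.9, 0.8],   # cat-like
    [0.2, 0.4, 0.1],   # dog-like
    [0.3, 0.3, 0.2],   # dog-like
])
y = np.array([1, 1, 0, 0])  # 1 = cat, 0 = dog

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.85, 0.9, 0.95]]))  # classify a new "image"
```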

DL algorithms take these two problems one step further: the most important features for classification or clustering are extracted automatically. In contrast, when using a classical ML algorithm, we would have to provide the features manually. A DL algorithm takes more sophisticated steps instead. For example, it first identifies the edges that are most relevant when separating cats from dogs. It then tries to find various combinations of shapes and edges hierarchically; this hierarchical composition of simple features into more complex ones is what makes the learning “deep”.

Then, after several iterations, it carries out hierarchical identification of complex concepts and features. Based on the features identified, the DL algorithm decides which of them are most significant for classifying the animal; this step is known as feature extraction. Finally, for the clustering problem, it sets the label column aside and performs unsupervised training using autoencoders (AEs) to extract latent features, which are then fed to k-means for clustering. The clustering assignment hardening (CAH) loss and the reconstruction loss are then jointly optimized toward an optimal clustering assignment.
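Here is a minimal sketch of that autoencoder-plus-k-means idea, my own illustration with hypothetical data rather than any published implementation; the joint CAH-loss optimization is omitted for brevity:

```python
# An autoencoder learns latent features without labels; k-means then
# clusters those latent features.
import numpy as np
from tensorflow import keras
from sklearn.cluster import KMeans

# Hypothetical data: 200 flattened 28x28 "images".
X = np.random.rand(200, 784).astype("float32")

# Autoencoder: compress to a 10-dimensional latent space and reconstruct.
inputs = keras.Input(shape=(784,))
latent = keras.layers.Dense(10, activation="relu")(inputs)
outputs = keras.layers.Dense(784, activation="sigmoid")(latent)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")  # reconstruction loss
autoencoder.fit(X, X, epochs=5, verbose=0)         # unsupervised: X -> X

# Extract the latent features and hand them to k-means, as in the text.
encoder = keras.Model(inputs, latent)
Z = encoder.predict(X)
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(Z)
print(clusters[:10])
```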

In practice, however, a DL algorithm is fed a raw image representation; it does not see the image as we do, because it only knows the position and color of each pixel. The analysis proceeds in layers. At a lower level, the software analyzes a grid of a few pixels, with the task of detecting a type of color or various nuances. If it finds something, it informs the next level, which then checks whether that color belongs to a larger form, such as a line. The process continues up through the levels until the algorithm understands what the image shows, as illustrated in the following diagram:

[Diagram: hierarchical layers of analysis in a DL image classifier, from pixel grids and colors to edges, larger forms, and the final label]
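As a rough sketch of this layered analysis, the stacked convolutional network below mirrors the idea: early layers look at small pixel grids for colors and edges, while later layers combine them into larger forms. All shapes and sizes here are illustrative assumptions:

```python
# A small CNN whose layers roughly correspond to the hierarchy above.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),                 # raw pixels and color
    keras.layers.Conv2D(16, 3, activation="relu"),  # low level: edges
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(32, 3, activation="relu"),  # mid level: shapes
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),    # top level: cat vs. dog
])
model.summary()
```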

Although dog-versus-cat is a very simple classification task, software capable of doing these kinds of things is now widespread and is found, for example, in systems for recognizing faces or for searching images on Google. Such software is based on DL algorithms. By contrast, with a linear ML algorithm we cannot build such applications, since these algorithms are incapable of handling non-linear image features.

Also, with classical ML approaches we typically tune only a few hyperparameters. When neural networks are brought into the mix, however, things become far more complex: across the layers there are millions or even billions of parameters (weights) to learn, so many that the cost function becomes non-convex. Another reason the cost is non-convex is that the activation functions used in the hidden layers are non-linear.
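As a quick illustration of the scale involved (my own toy example, not a figure from the text), even one modest dense layer already holds about a million weights:

```python
# One dense layer from 1,000 inputs to 1,000 units holds
# 1,000 * 1,000 weights + 1,000 biases = 1,001,000 parameters.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(1000,)),
    keras.layers.Dense(1000, activation="relu"),  # non-linear activation
    keras.layers.Dense(10),
])
model.summary()  # ~1.01M trainable parameters in total
```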

II. Some applications of Deep learning in Supply Chains

Please refer to this article for some applications that have been tested in Python.

A Supply Chain Executive’s summary of Deep Learning: With 15+ innovative application opportunities across Supply Chain
