“Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones.”
— Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning
Like an ML model, a DL model takes an input, X, and learns high-level abstractions or patterns from it in order to predict an output, Y. For example, based on the stock prices of the past week, a DL model can predict the stock price for the next day. When trained on such historical stock data, a DL model tries to minimize the difference between its predictions and the actual values. In this way, it learns to generalize to inputs it hasn’t seen before and can make predictions on test data.
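As a minimal sketch of that training idea, the following uses a synthetic price series and a single linear layer in place of a deep network (the window size, learning rate, and data are all illustrative), minimizing the squared difference between predicted and actual next-day prices by gradient descent:

```python
import numpy as np

# Illustrative setup: predict the next day's price from the previous 5 days.
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 200)) + 100  # synthetic random-walk prices

window = 5
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]

# One linear layer trained by gradient descent on the mean squared error.
w = np.zeros(window)
b = 0.0
lr = 1e-5
for _ in range(2000):
    pred = X @ w + b
    err = pred - y                    # difference between prediction and actual
    w -= lr * X.T @ err / len(y)      # gradient step on the weights
    b -= lr * err.mean()              # gradient step on the bias

mse = np.mean((X @ w + b - y) ** 2)
```

A real DL model would stack non-linear layers on top of this, but the objective, minimizing prediction error on historical data, is the same.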
Now, you might be wondering, if an ML model can do the same tasks, why do we need DL? Well, DL models tend to keep improving as the amount of data grows, whereas traditional ML models plateau after a certain point. The core concept of DL, inspired by the structure and function of the brain, is the artificial neural network (ANN).
Being at the core of DL, ANNs help you to learn the associations between sets of inputs and outputs in order to make more robust and accurate predictions. However, DL is not only limited to ANNs; there have been many theoretical advances, software stacks, and hardware improvements that bring DL to the masses. Let’s look at an example in which we want to develop a predictive analytics model, such as an animal recognizer, where our system has to resolve two problems:
- To classify whether an image represents a cat or a dog
- To cluster images of dogs and cats
If we solve the first problem using a typical ML method, we must define the facial features (ears, eyes, whiskers, and so on) and write a method to identify which features (typically non-linear) are most important when classifying a particular animal. However, we cannot address the second problem this way, because classical ML algorithms for clustering images (such as k-means) cannot handle non-linear features. Take a look at the following diagram, which shows a workflow that we would follow to classify whether the given image is of a cat:
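The classical workflow can be sketched as follows; the handcrafted features and their values are purely hypothetical, with a simple logistic-regression classifier standing in for the method that weighs the features:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical handcrafted features per image:
# [ear_pointiness, snout_length, whisker_density] - values invented for illustration.
cats = rng.normal([0.9, 0.3, 0.8], 0.1, size=(50, 3))  # label 1
dogs = rng.normal([0.4, 0.7, 0.3], 0.1, size=(50, 3))  # label 0
X = np.vstack([cats, dogs])
y = np.array([1] * 50 + [0] * 50)

# Logistic regression on the manual features, trained by gradient descent.
w, b = np.zeros(3), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))  # predicted probability of "cat"
    grad = p - y
    w -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean()

p = 1 / (1 + np.exp(-(X @ w + b)))
accuracy = np.mean((p > 0.5) == y)
```

The hard part in practice is not this classifier but the manual feature engineering that produces `X` in the first place, which is exactly the step DL automates.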
DL algorithms take these two problems one step further: the most important features are extracted automatically, once the algorithm has determined which features matter most for classification or clustering. In contrast, when using a classical ML algorithm, we would have to provide the features manually. A DL algorithm instead takes more sophisticated steps. For example, it first identifies the edges that are most relevant for distinguishing cats from dogs. It then searches for combinations of edges and shapes hierarchically, a process known as hierarchical feature learning.
Then, after several iterations, it hierarchically identifies complex concepts and features. Based on the features it has identified, the DL algorithm decides which of them are most significant for classifying the animal; this step is known as feature extraction. Finally, for the clustering task, it drops the label column and performs unsupervised training using autoencoders (AEs) to extract latent features, which are then passed to k-means for clustering. The clustering assignment hardening loss (CAH loss) and the reconstruction loss are then jointly optimized toward an optimal clustering assignment.
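A rough sketch of that final step follows, with a linear autoencoder standing in for a deep AE and a minimal k-means loop; the data, dimensions, and learning rate are illustrative, and the joint CAH-loss optimization is omitted:

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative data: two well-separated blobs in 10-D standing in for image features.
blob_a = rng.normal(0, 0.3, (60, 10)) + 2.0
blob_b = rng.normal(0, 0.3, (60, 10)) - 2.0
X = np.vstack([blob_a, blob_b])

# Linear autoencoder: encode to 2-D, decode back, minimize reconstruction loss.
W_enc = rng.normal(0, 0.1, (10, 2))
W_dec = rng.normal(0, 0.1, (2, 10))
lr = 0.01
for _ in range(300):
    Z = X @ W_enc                              # latent features
    E = Z @ W_dec - X                          # reconstruction error
    W_dec -= lr * Z.T @ E / len(X)
    W_enc -= lr * X.T @ (E @ W_dec.T) / len(X)

# Minimal k-means (k = 2) on the extracted latent features.
Z = X @ W_enc
centers = Z[[0, -1]]                           # init one center in each blob
for _ in range(20):
    dists = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
    labels = dists.argmin(1)
    centers = np.array([Z[labels == k].mean(0) for k in range(2)])
```

In the deep-clustering methods the text describes, the AE and the cluster assignments would then be refined together rather than in two separate stages as here.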
In practice, however, a DL algorithm is fed a raw image representation; it doesn’t see an image the way we do, because it only knows the position and color of each pixel. The analysis is divided into layers. At a lower level, for example, the algorithm examines a grid of a few pixels, with the task of detecting a type of color or various nuances. If it finds something, it informs the next level, which then checks whether that color belongs to a larger form, such as a line. The process continues up through the levels until the algorithm recognizes what the image shows, as illustrated in the following diagram:
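The lowest level of this hierarchy can be sketched as a small filter sliding over the pixel grid; here a hypothetical 6x6 grayscale image with a vertical brightness edge produces its strongest responses exactly where the edge sits:

```python
import numpy as np

# Toy 6x6 "image": dark left half (0.0), bright right half (1.0).
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# A low-level layer: a 3x3 vertical-edge filter applied at every position.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        # Response of the filter on the 3x3 patch at (i, j).
        out[i, j] = (img[i:i + 3, j:j + 3] * kernel).sum()
```

The response map `out` is large only in the columns straddling the dark-to-bright boundary; a next layer would combine such edge responses into lines and shapes.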
Although a dog-versus-cat classifier is a very simple example, software capable of these kinds of tasks is now widespread, found in systems for recognizing faces or for searching images on Google, for example. This kind of software is based on DL algorithms. By contrast, we cannot build such applications with a linear ML algorithm, since these algorithms are incapable of handling non-linear image features.
Also, with classical ML approaches, we typically only have to tune a few hyperparameters. When neural networks are brought into the mix, however, things become far more complex: across the layers there can be millions or even billions of parameters (weights) to learn, and the cost function becomes non-convex. Another reason for this non-convexity is that the activation functions used in the hidden layers are non-linear.
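To see how quickly the number of learnable parameters grows, consider a small fully connected network (the layer sizes here are illustrative); each layer contributes one weight per input-output pair plus one bias per output:

```python
# Parameter count of a small fully connected network: 784 -> 512 -> 256 -> 10.
layers = [784, 512, 256, 10]

# Each layer has (n_in * n_out) weights and n_out biases.
params = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
print(params)  # over half a million parameters for this tiny network
```

Even this modest network has hundreds of thousands of parameters; modern image and language models scale this same arithmetic into the millions and billions.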
II. Some Applications of Deep Learning in Supply Chains
Please refer to this article for some applications that have been tested in Python.