Predicting Production Time by Leveraging Machine Learning: An Applied Use Case

As I have said consistently in my articles and posts on LinkedIn, the real value of Smart Manufacturing lies in the data generated by the architecture of Smart Factories. Some applications are well known by now and are already in use at some organizations, but the real competitive advantage comes from using analytics to develop capabilities that “conventional” methods cannot provide.

A real-life pilot

This example is from a real-life pilot conducted at a footwear manufacturing plant. The manufacturer is one of the top footwear manufacturers in India and exports a major share of its volume to 16 countries. The company had recently built an automated footwear manufacturing facility, and the line was generating large amounts of data. I was advising them as a remote external consultant on how to make the best use of that data, and we ran a few pilots to validate some opportunity areas.

Defining the opportunity

We know that accurate prediction of manufacturing lead times (LT) significantly influences the quality and efficiency of production planning and scheduling (PPS). Traditional planning and control methods mostly calculate average lead times derived from historical data. This often leaves PPS deficient, because production planners cannot account for the variability of LT, which is affected by multiple factors in today’s complex manufacturing environment.

In footwear manufacturing, more sophisticated LT prediction methods may be needed because of complex operations, mass production, multiple routings, and the demand for high process and resource efficiency, as we will see in the description of the manufacturing process below. To overcome these challenges, supervised machine learning (ML) approaches can be employed for LT prediction, relying on historical production data obtained from manufacturing execution systems (MES).

Production process

At a high level, the production steps in the manufacturing process were:

  • Cutting
  • Skiving
  • Mounting
  • Cleaning

Each step followed a different process cycle for each product type.

Products and attributes

At a high level, five different types of product lines were being manufactured:

  • Shoes
  • Boots
  • Top boots
  • Sandals
  • Slippers

As you can imagine, the process described above varied from product to product and in many instances used different equipment. For instance, a high-end “hand crafted” shoe would be cut by hand, while others could be cut by the leather cutting machine. In addition, the mounting type could be either manual or automated (machine).

Within each product line, different SKUs may also follow a different flow depending on the attributes listed below.


Data gathering and cleaning: real-world challenges

It goes without saying that data gathering and cleaning was the most challenging aspect. A major part of that challenge came from the “hand crafted” processes. Data generated by sensors was relatively easy to obtain, since the main challenge there was cleaning it and making sense of it. In other cases, data had to be collected manually, for which a team of Jagah.AI engineers carried out extensive time studies.

After outliers and records with missing data were eliminated, a dataset comprising seven independent variables and the production times of 57,600 shoes was finalized for the pilot. The independent variables and their values are provided in the table below and have also been identified above. Production time is recorded in seconds in the dataset.


Model Application

After the data gathering and cleaning step, the data was divided into training and test datasets. The architecture and flow of the modeling process are shown in the illustration below.
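
To make the split concrete, here is a minimal sketch in plain Python. The 80/20 ratio, the fixed seed, the record layout, and every value in `records` are my assumptions for illustration only; they are not the pilot's actual settings or data.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: (product line, cutting type, mounting type, seconds).
# The numbers are made up for the example, not taken from the pilot dataset.
records = [
    ("shoe", "machine", "automated", 410),
    ("shoe", "hand", "manual", 930),
    ("boot", "machine", "automated", 520),
    ("boot", "hand", "manual", 1100),
    ("top boot", "machine", "automated", 610),
    ("top boot", "hand", "manual", 1250),
    ("sandal", "machine", "automated", 300),
    ("sandal", "hand", "manual", 640),
    ("slipper", "machine", "automated", 240),
    ("slipper", "machine", "manual", 380),
]

train, test = train_test_split(records, test_fraction=0.2)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed keeps the split reproducible, so that different models are compared on exactly the same test rows.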


Decision tree, k-nearest neighbors (k-NN), and artificial neural network (ANN) techniques were used to train the models, and the trained models were then applied to the test dataset.
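
As an illustration of one of these techniques, below is a minimal k-NN regression sketch in plain Python. It assumes purely categorical features and uses the Hamming distance (the number of attributes that differ); the feature names, the value of k, and the sample rows are hypothetical and do not reproduce the setup used in the pilot.

```python
def hamming(a, b):
    """Number of positions at which two equally long feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_rows, query, k=3):
    """Average the production time of the k training rows closest to query.

    train_rows is a list of (features, seconds) pairs. Ties in distance are
    resolved by the original row order, because sorted() is stable.
    """
    if k < 1 or not train_rows:
        raise ValueError("need k >= 1 and at least one training row")
    nearest = sorted(train_rows, key=lambda row: hamming(row[0], query))[:k]
    return sum(seconds for _, seconds in nearest) / len(nearest)

# Hypothetical rows: ((product line, cutting type, mounting type), seconds)
train_rows = [
    (("shoe", "machine", "automated"), 400.0),
    (("shoe", "machine", "automated"), 420.0),
    (("shoe", "hand", "manual"), 900.0),
    (("boot", "machine", "automated"), 520.0),
    (("boot", "hand", "manual"), 1100.0),
    (("sandal", "machine", "automated"), 300.0),
]

prediction = knn_predict(train_rows, ("shoe", "machine", "automated"), k=2)
print(prediction)  # 410.0, the mean of the two exact matches
```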


The accuracy of the models was compared using the Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE) metrics.
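
For readers who want the two metrics spelled out, here is a short sketch using their standard definitions. The function names and the sample values are mine and are purely illustrative; they are not results from the pilot.

```python
def mae(actual, predicted):
    """Mean Absolute Error, in the same unit as the data (seconds here)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when an actual value is zero; production times are positive,
    so that case should not occur for this kind of data.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero actual values")
    ratios = (abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * sum(ratios) / len(actual)

# Made-up production times in seconds, for illustration only
actual = [400.0, 500.0, 1000.0]
predicted = [380.0, 550.0, 900.0]

print(round(mae(actual, predicted), 2))   # 56.67 seconds
print(round(mape(actual, predicted), 2))  # 8.33 percent
```

MAE tells you how many seconds a prediction is off on average, while MAPE puts the error in relative terms, which makes products with very different production times comparable.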


According to the results, the best performance was obtained with the CHAID decision tree model, and the most important variable was found to be the process type. CHAID (Chi-squared Automatic Interaction Detection) builds decision trees by using chi-square statistics to identify optimal splits. Here, production times were divided into bins to form the dependent variable, which the tree differentiates with respect to the independent variables.
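
The chi-square statistic that CHAID relies on can be sketched as follows. This is only the core test of association between one categorical predictor and the binned production time. A full CHAID implementation also merges similar categories and compares adjusted p-values rather than raw statistics, which this sketch leaves out. The variable names, the "shift" predictor, and all values are hypothetical.

```python
from collections import Counter

def chi_square(categories, bins):
    """Pearson chi-square statistic for two equally long label sequences."""
    if not categories or len(categories) != len(bins):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(categories)
    observed = Counter(zip(categories, bins))  # contingency table cells
    row_totals = Counter(categories)
    col_totals = Counter(bins)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n   # count expected if independent
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

# Production times already binned into two classes (made-up data)
time_bin = ["fast"] * 4 + ["slow"] * 4
process_type = ["machine"] * 4 + ["hand"] * 4  # perfectly separates the bins
shift = ["day", "night"] * 4                   # unrelated to the bins

print(chi_square(process_type, time_bin))  # 8.0
print(chi_square(shift, time_bin))         # 0.0
```

In this toy example the process type has the larger statistic, so it would be the more promising candidate for a split. Both predictors have the same number of categories, which is why comparing the raw statistics is acceptable here.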



Because this is a continuous learning process, manufacturing firms need to create dynamic predictive models that update themselves as the dataset changes. Changes in human and machinery factors can also be appended to the dataset regularly. With the extended techniques that come with the evolving use of machine learning, algorithms can give better results for manufacturing analytics cases than some conventional approaches.
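
One simple way to picture a model that "updates itself" is a predictor that only looks at a sliding window of the most recent observations. The sketch below is deliberately not the CHAID model from the pilot: it swaps in a per-process-type mean to keep the idea visible, and the class name, window size, and values are all my assumptions.

```python
from collections import deque

class SlidingWindowMeanModel:
    """Predicts the mean production time per process type from recent data."""

    def __init__(self, window=1000):
        # Old observations drop out automatically once the window is full
        self.window = deque(maxlen=window)

    def update(self, process_type, seconds):
        """Append a newly observed production time to the window."""
        self.window.append((process_type, seconds))

    def predict(self, process_type):
        """Mean of matching recent rows, or of all rows if none match."""
        times = [s for p, s in self.window if p == process_type]
        if not times:
            times = [s for _, s in self.window]
        if not times:
            raise ValueError("no observations yet")
        return sum(times) / len(times)

model = SlidingWindowMeanModel(window=3)
model.update("machine", 400.0)
model.update("machine", 420.0)
first = model.predict("machine")   # mean of 400 and 420

model.update("hand", 900.0)
model.update("machine", 500.0)     # pushes the oldest row (400) out
second = model.predict("machine")  # mean of 420 and 500
print(first, second)               # 410.0 460.0
```

The same pattern applies to heavier models: instead of recomputing a mean, the model would be retrained periodically on the latest window of data.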

Also, as I have mentioned on many occasions, deep learning is not a good candidate for predictive analytics models that are not very complex, as we can see from the accuracy percentages in the table above.
