More than a year ago, I wrote a post on this website about how k-means clustering has some really useful, realistic applications in the area of Supply Chain. In the article, I offered to share the sample datasets that I had created to illustrate the applications, along with the code (I was naive back then, doling out too much information for free). I did receive a couple of requests back then to share the solutions, and I obliged.
This article is a result of two events:
- In the last two months, multiple students suddenly discovered my k-means article and requested the k-means code (note: just the code).
- Yesterday, I came across a research paper that shared an approach to leverage k-means clustering for Supply Chain segmentation.
Combining these two events with my knowledge of k-means clustering, I believe there are three very important Analytics strategy insights that organizations need to pay attention to.
Insight 1: Data Science and Analytics education is lacking
Read this email exchange that I had with someone who is supposedly doing an entire technical thesis on leveraging AI in Supply Chain, and at a University ranked among the top 20 in Engineering, no less. I had three such exchanges with different individuals who requested my “k-means code”.
There are two noteworthy aspects of the notes that I received from this person who is doing a technical thesis on ways to leverage AI in Supply Chain:
In the first email, they ask for “Python code for AI in Supply Chain”. If you are in the field of Data Science, you are probably already scratching your head. It is like asking: Can you share your SQL code for mining data from Teradata? Or: Can you share the tool you used for your plumbing project for my plumbing project (without telling you what “my plumbing project” is)? Considering this person is an engineering student: NOT ACCEPTABLE.
I was so surprised that I had to go check on LinkedIn to make sure this person was for real and actually was a student at the University they claimed.
When asked for details, they added just one: “AI in Shipping Industry”. They needed k-means clustering code for AI in the Shipping Industry. I can modify my plumbing analogy above to rephrase their “modified” question as: Can you give me tools for a plumbing project in the lower level of my house?
As you can see in the screengrab, I again asked them for details, because I wanted to help them. But it was evident that the more detailed questions were slowly going beyond their area of expertise. When my question asked for specifics, they decided to stop responding. They did not want my “Python AI code for Supply Chain” anymore.😁
What is the key lesson here ?
Despite the exponential increase in Analytics and Data Science degrees, the shortage in the domain will remain. The key challenge is not that Data Science or Advanced Analytics education does not exist. The key challenge is that the combination of the right education with the right skills is rare.
The example above illustrates the point. You see, running k-means clustering is just a matter of a few lines of code that import and call Python libraries. The need of the hour is for the next generation of analytics professionals to understand that they need to think without algorithms first: define the opportunity (problem), understand the influencing parameters, and devise a high-level solution. AND THEN they start refining the solution.
I see that the Data Science graduates we are churning out, and many who are already in the market, are really good at executing if you tell them the exact solution, the model they need to use, and the approach. What we need is more people who can define the solution, identify the modeling approach, and so on. The emphasis on execution, which is essentially writing code that imports libraries in R or Python, is steering Data Science education in the wrong direction.
This brings us to the second insight.
Insight 2: Ignore the Algorithms Jazz and hire Smartly
Building on the example from Insight 1, if I were to share only the code for k-means, as that person requested, it would be this (by the way, this screengrab is not from my work but from a customer segmentation model).
Yes, don’t be surprised. Just these two lines! After the input data has been prepped into an input data set (the data frame RFM_norm1), these two lines of code take the data and run k-means clustering on it. What would that person have done with these two lines of code if I had actually shared them?
The key, whether in this problem or in my Inventory classification clustering method, is not the algorithm itself.
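To make the point concrete, here is a minimal sketch of what "just two lines" can look like in Python with scikit-learn. It is not the code from the screengrab: the array `rfm_norm` is a hypothetical, randomly generated stand-in for a prepared data set like RFM_norm1, and the choice of two clusters is an assumption made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the prepared input data set (RFM_norm1 in the text):
# rows are customers, columns are normalized recency, frequency and monetary value.
rng = np.random.default_rng(0)
rfm_norm = np.vstack([
    rng.normal(loc=-1.0, scale=0.2, size=(50, 3)),  # one synthetic customer group
    rng.normal(loc=1.0, scale=0.2, size=(50, 3)),   # a second, clearly different group
])

# The clustering itself. These two lines are the whole "k-means code".
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(rfm_norm)

# labels holds one cluster id per customer; deciding what the clusters mean
# for the business is the part no library does for you.
```

Everything that matters happens before and after those two lines: building the input data set and interpreting the resulting segments.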
What are the key lessons here ?
First, when you map out how you need to leverage analytics, as I have shared in various articles, start with strategy and work all the way down to the analytical approach. This crucial step will determine whether your Analytics capabilities stick around in a sustainable and productive way. Then, in actual model building, the key is to understand the data requirements, collect only relevant data so as to minimize noise, be smart about parameter selection and feature engineering, and do proper data cleaning and scaling. Model building via Python libraries is “self service”.
The second lesson here is to understand what type of Data Science skills you need to hire for. I will start this lesson with this quote:
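As a rough illustration of why scaling and parameter selection deserve more thought than the model call, the sketch below standardizes two features that live on very different scales and then compares several values of k using the silhouette score. The inventory features (annual demand and unit cost), the synthetic numbers, and the use of the silhouette score as the selection criterion are all assumptions chosen for this example, not a method taken from the article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical inventory features on very different scales:
# column 0 is annual demand in units, column 1 is unit cost.
rng = np.random.default_rng(42)
raw = np.vstack([
    np.column_stack([rng.normal(10_000, 500, 40), rng.normal(2.0, 0.2, 40)]),
    np.column_stack([rng.normal(10_000, 500, 40), rng.normal(9.0, 0.2, 40)]),
    np.column_stack([rng.normal(1_000, 500, 40), rng.normal(9.0, 0.2, 40)]),
])

# Scaling: without it, demand (thousands) would swamp unit cost (single digits)
# in the Euclidean distances that k-means relies on.
scaled = StandardScaler().fit_transform(raw)

# Parameter selection: try several cluster counts and score each one.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)

# Keep the k with the highest silhouette score as a starting point for discussion.
best_k = max(scores, key=scores.get)
```

The library calls are the easy part here too. Deciding which features belong in `raw`, and whether the suggested number of segments makes business sense, is where the real work sits.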
“Unless you are a research scientist or work for a huge corporation with a large R&D budget, you usually don’t implement machine learning algorithms yourself. You don’t implement gradient descent or some solver either. You use libraries, most of which are open sourced.” Andriy Burkov, “The Hundred-Page Machine Learning Book”, Page 41
Why is this quote relevant? You do not always need rocket scientists in your Data Science teams. Only 5% of Data Scientists do the level of ML engineering that requires an intense PhD in, say, Computer Science. So in order to build your Data Science team, forget about the qualifications everyone out there is hiring for. Determine your qualifications based on the unique problems that your team should and will work on. Develop your own unique hiring methods. Build your secret, unconventional Data Science tribe.
Insight 3: Automation of Strategic Analysis is on the horizon
Going back to one of the drivers that made me write this article: “I came across a research paper that shared an approach to leverage k-means clustering for Supply Chain segmentation.”
From a company perspective, this is an example of a solution you can build. I often see organizations constrained by conventional methods and approaches, primarily because these are marketed heavily by external partners who have expertise in such “comfort zone” methods. But the real competitive edge comes from taking solutions like this k-means clustering for Supply Chain segmentation all the way to production-level use. Imagine the benefits of having a tool that keeps track of your Segmentation strategy. Find similar unique opportunities across your Supply Chain.
What is the lesson here ?
Find “unique” applications of Data Science, like this one, if you really want to leverage Data Science as a competitive differentiator. Otherwise, your Analytics capabilities will be commodities like many others.
Views expressed are my own.