Macy’s Inventory problem and an Advanced Analytics solution

This blog is based on a news article in Forbes

Power of drawing insights from events in the Business World

When you see evolving trends and facts in the business world around you, think about how you could have leveraged process optimization, technology and/or analytics to solve the problem. That is the true meaning of “reading” – learning from what you read. Over a period of time, this habit will help you develop the indispensable skill of problem solving.

Macy’s Inventory Problem

“Most retail companies have reported sales and earnings for the third quarter of 2019. Inventories were well under control by most retailers despite somewhat lower sales. Only Macy’s misfired by having 1.5% higher inventory despite a drop of 3.5% in comparable store sales.”


*Nordstrom only reported total sales. The company did not report comparable store sales for the quarter

Quick insights from the chart

It is obvious that Macy’s expected higher sales (its comparable sales were down 3.5%), and because it could not sell what it expected to, it ended up with more inventory. However, look at the other retailers in the same segment in the chart above. Their sales percentages are also negative, but so are their inventory-on-hand percentages, which essentially means they were able to foresee the decrease in sales, whereas Macy’s did not.

What could be the key driver?

In my opinion, the core of most excess-inventory problems can be traced to undesirable buying behavior and/or demand forecasting challenges.

Forecast numbers that are used to place orders

In my opinion, Macy’s forecasting lead times are much longer than its competitors’. Even if they are not, the specific nature of the items it buys and sells makes them a poor fit for ordering perhaps a year in advance. In fashion retailing, trends change rapidly, and what was considered a “hot” item a few months ago can turn cold very soon, leaving the stocked inventory as markdown items.

However, the majority of items at Macy’s are not its private labels but are sourced from other manufacturers. This essentially means that Macy’s does not have much control over the design-to-consumption lead time.

What is the challenge, and how can Analytics solutions help?

As mentioned above, for most of these items, Macy’s may not have control over the design-to-consumption lead time, which impacts the forecasting and sourcing lead times as well. Taking that as a constraint, what analytical approaches could have been used?

Forecasting demand for fashion retail is one of the most difficult forecasting problems in the industry, given fast-changing consumer tastes, long (> 8 months) design and production cycles, bulk manufacturing for cost efficiency, heavy competition on pricing, and increasing marketing costs. When planning fashion merchandise, there is very little information available on what the prevailing fashion will be, what the competitors’ mix will be, and how particular pricing and marketing interventions may need to be applied to promote the merchandise.

What retailers do have is large volumes of previous years’ sales data, which they use to forecast future purchases with conventional techniques. While these help estimate demand at reasonable levels of confidence for existing or previously sold merchandise, they cannot be used to predict demand for new merchandise. Since multiple design parameters interact non-linearly to define the look or appeal of an item in fashion, past sales data by itself is not instructive in predicting demand for future designs.

Details from this point onward are technical in nature.

Deep Learning and Machine Learning can provide the solution

Deep learning models are a good fit because of their ability to model feature interactions, even transient ones in time, and to capture the non-linear relationship between the target and the regressors. The scale of historical demand for large retailers is also typically large (~1 million styles or items listed at any point in time), which limits the utility of SVM-like models that do not scale well to large data sets and hyperparameter searches. I followed a high-level three-step approach to building the model.

Step 1: Get the data: To evaluate the feasibility of leveraging a deep learning algorithm, I scouted online and finally got my hands on a dataset from a study of large-scale fashion sales data.

Step 2: Feature Engineering: The next step was to directly infer which clothing/footwear attributes and merchandising factors drove demand for those items.

Step 3: Model selection, building and training: Finally, I built generalised models to forecast demand given new item attributes, and tested for robust performance by experimenting with different neural architectures and ML methods.

The models were built using the deep learning framework PyTorch, and were trained on my laptop with 32 GB of RAM and an 8-core CPU.

I. Data Aspects

I used historical sales data from a leading Indian fashion e-commerce company to train my models.

Due to a Non-Disclosure Agreement with the company, I cannot share any details of the models and the underlying data beyond what is in this article.

The data contained 5 different article types. In fashion ontology, an article type is a hierarchy level that groups items characterized by a similar set of attributes; for example, Shirts, Casual Shoes, Tops, and Vests are article types, and a particular item listed under one of these may be referred to interchangeably as a style or an item.

Only items that were cataloged or went live in the last two years were included in the model data set. Items that went live in the first year were used for training, and items that went live in the next 6 months formed the validation set, which was used to tune the models’ hyper-parameters using standard validation techniques. Finally, a test set covering the subsequent 6 months was used for measuring and reporting performance.
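The temporal split described above can be sketched as follows; the `items` table with a `live_date` column is a hypothetical stand-in for the real (undisclosed) data:

```python
import pandas as pd

def temporal_split(items: pd.DataFrame, start: str):
    """Split items by go-live date: year 1 -> train,
    next 6 months -> validation, following 6 months -> test."""
    start = pd.Timestamp(start)
    one_year = start + pd.DateOffset(years=1)
    eighteen_months = start + pd.DateOffset(months=18)
    train = items[items["live_date"] < one_year]
    val = items[(items["live_date"] >= one_year)
                & (items["live_date"] < eighteen_months)]
    test = items[items["live_date"] >= eighteen_months]
    return train, val, test

df = pd.DataFrame({
    "style_id": [1, 2, 3],
    "live_date": pd.to_datetime(["2017-06-01", "2018-03-01", "2018-09-01"]),
})
train, val, test = temporal_split(df, "2017-01-01")
```

Splitting by go-live date (rather than randomly) keeps the evaluation honest: the model is always scored on styles it has never seen launch.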

Note that in this type of setup, the temporal length of the time series varies from style to style, since styles were listed for different durations.

II. Attributes

Based on the data, I modeled promotions, discount, and list page views (visibility), along with the fashion attributes of the style, as external regressors. Some of these features are not known for future time steps at prediction time, so I transformed most of them so that default values of promotions and discounts for future time steps can be easily approximated without remembering the training data. The engineered features are detailed below.

Fashion Factors:

Fashion-related attributes of a style, such as color and material, are used. These attributes may differ across article types. I embedded these attributes to compress their representations while preserving salient features, and to capture mutual similarities and differences.
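A minimal PyTorch sketch of embedding categorical fashion attributes; the attribute names, vocabulary sizes, and embedding dimensions here are illustrative assumptions, not the actual model configuration:

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary sizes for two attributes of one article type.
n_colors, n_materials = 50, 20
color_emb = nn.Embedding(n_colors, 8)       # each color id -> 8-dim vector
material_emb = nn.Embedding(n_materials, 4) # each material id -> 4-dim vector

# A batch of two styles, each described by (color_id, material_id).
colors = torch.tensor([3, 17])
materials = torch.tensor([5, 5])

# Concatenate the per-attribute embeddings into one dense style vector.
style_vec = torch.cat([color_emb(colors), material_emb(materials)], dim=1)
print(style_vec.shape)  # torch.Size([2, 12])
```

The embedding tables are trained jointly with the rest of the network, so styles with similar demand-relevant attributes end up with nearby vectors.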

Merchandising Factors:

– Discount: In my initial analysis of the data, I found that most brands sold at a consistent average discount, while intra-brand variations in discounts sometimes boosted sales on the retail platform. I captured the discount’s deviation from both the brand average and the overall platform average, and found this feature to carry more information than the item/style’s absolute discount. A value of 0 for a future time step then simply means the style will be sold at the average brand/platform discount. This feature also captured the non-linear and brand-specific effects of discounting in fashion retail.
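The discount-deviation features can be computed with a pandas sketch like the following, on a toy table standing in for the real sales data:

```python
import pandas as pd

sales = pd.DataFrame({
    "brand": ["A", "A", "B", "B"],
    "discount": [0.10, 0.30, 0.40, 0.40],
})

platform_avg = sales["discount"].mean()                       # overall average
brand_avg = sales.groupby("brand")["discount"].transform("mean")

# Deviation features: 0 for a future style means
# "sold at the average brand / platform discount".
sales["disc_dev_brand"] = sales["discount"] - brand_avg
sales["disc_dev_platform"] = sales["discount"] - platform_avg
```

Because the default (0) is meaningful on its own, future time steps need no lookup into the training data.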

– Visibility: Visibility features are derived from list page views, which represent the shelf space allocated to a style in an online store. The ratios of a style’s list views to the brand and platform averages are numerical measures of its visibility dispersion and have a big impact on observed sales, so I use them as features. The list views given to a style depend on its sales, CTR, applied promotions, etc., but in the absence of this information, new styles are usually assigned the platform-average list views.
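The visibility ratios follow the same pattern as the discount deviations, again on toy numbers:

```python
import pandas as pd

views = pd.DataFrame({
    "brand": ["A", "A", "B"],
    "list_views": [1000.0, 3000.0, 2000.0],
})

platform_avg = views["list_views"].mean()
brand_avg = views.groupby("brand")["list_views"].transform("mean")

# A ratio near 1.0 means "average shelf space";
# new styles without history default to 1.0.
views["vis_ratio_brand"] = views["list_views"] / brand_avg
views["vis_ratio_platform"] = views["list_views"] / platform_avg
```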

– Promotion: To model the drop in sales just before and after a promotion, features like days-to-promotion and days-from-promotion are used. In the Indian retail scenario, certain country-wide observed holidays/occasions, such as Diwali and Valentine’s Day, are promotional shopping-festival days. In the run-up to a shopping festival, customers tend to postpone their buying until the promotional event, and immediately after that period of intense activity I could see a significant lull in shopping enthusiasm. Hence the choice of a calendar-like feature counting down to, and up from, planned promotional events.

III. Derived Features:

– Age of Style: The shelf life of a style. With a longer shelf life, a style’s demand may decay over time.

– Trend and Seasonality: To model a trend in interest over time, the number of weeks between the experiment start date and the current date is used. To model seasonality in purchase patterns, the first three terms of the Fourier series over week-of-year are used as features. For a new item, these can be derived at prediction time.

– Cannibalization: Cannibalization is a commerce-specific scenario: given that customers have a certain need, equivalent items may cannibalize each other’s sales in meeting that need. I created features like the number of styles listed in a week, the number of styles listed within the same brand that week, and the number of styles listed by other brands in similar price ranges. If all the styles to be considered are available along with their merchandising factors, these features can be inferred for new items; if not, averages/medians may be used as representative values.
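The first two cannibalization counts reduce to group-wise counts over the listings table; a toy example:

```python
import pandas as pd

listings = pd.DataFrame({
    "style_id": [1, 2, 3, 4],
    "brand": ["A", "A", "B", "B"],
    "week": [1, 1, 1, 2],
})

# Competing supply in the same week, overall and within the same brand.
listings["n_week"] = listings.groupby("week")["style_id"].transform("count")
listings["n_brand_week"] = (
    listings.groupby(["brand", "week"])["style_id"].transform("count")
)
```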

IV. Models performance comparison

As you can see in the table, I experimented with both ML (tree-based) and deep learning models. Almost all the ML-based models outperform the naive average-based projection model. XGBoost with MSE loss, optimized in logarithmic scale, gives the best performance, followed by GBRT. Among the deep learning models, an LSTM with Poisson loss, optimized in linear scale, performs best.
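A minimal PyTorch sketch of an LSTM forecaster trained with a Poisson loss, in the spirit of the best deep model above; the layer sizes, feature count, and random data are illustrative assumptions, not the actual configuration:

```python
import torch
import torch.nn as nn

class DemandLSTM(nn.Module):
    """LSTM over weekly regressors, emitting a positive Poisson rate per step."""
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)            # (batch, time, hidden)
        log_rate = self.head(out).squeeze(-1)
        return torch.exp(log_rate)       # exp keeps the predicted rate positive

model = DemandLSTM(n_features=10)
x = torch.randn(4, 12, 10)               # 4 styles, 12 weeks, 10 regressors
y = torch.randint(0, 5, (4, 12)).float() # weekly unit sales (counts)
loss = nn.PoissonNLLLoss(log_input=False)(model(x), y)
loss.backward()
```

A Poisson likelihood is a natural fit here because weekly unit sales are non-negative counts, which squared-error losses handle poorly for slow-moving styles.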

You may ask whether this comparison shows that tree-based models are the better option. Remember, though, that this was a relatively small experiment with a controlled amount of data. When the data scale is much larger and models need to run continuously, deep learning models have an edge.


Based on my own research



