Forecasting retail store sales with deep learning using entity embeddings

Lotus Labs
Mar 3, 2021


Accurate retail sales demand forecasting is critical for optimal resource allocation, budget planning, and other related retail tasks during the year. This problem is challenging because sales prediction depends on numerous external factors, including weather, city population, unemployment, growth, or marketing changes.

One challenge of modeling retail data is the need to make decisions based on limited history. Holidays and select major events come once a year, and so does the chance to see how strategic decisions impacted the bottom line. In addition, markdowns are known to affect sales.

State-of-the-art methods for handling these tasks often rely on a combination of univariate forecasting models and machine learning methods. Such models usually require extensive tuning to set seasonality and other parameters. They also require manual feature extraction and frequent retraining, which can become prohibitive when there are millions of time series to analyze.

In this article, we propose a novel end-to-end neural network architecture that outperforms the current state-of-the-art sales forecasting methods on a public retail dataset. Our approach does not require complicated model ensembles and needs only minimal domain-specific engineering. The article discusses some key concepts behind how we applied neural networks to structured retail data.


The dataset contains historical sales data for 45 stores located in different regions, and each store contains several departments. The company also runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. The dataset can be found here.
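To make the holiday weighting concrete, here is a minimal sketch of a weighted error score. The article does not state the exact evaluation formula, so the weighted mean absolute error used here is an assumption, and the function name `holiday_weighted_mae` and the toy numbers are purely illustrative.

```python
import numpy as np


def holiday_weighted_mae(y_true, y_pred, is_holiday, holiday_weight=5.0):
    """Mean absolute error where holiday weeks count `holiday_weight` times.

    Assumption: the evaluation is a weighted MAE. Only the 5x weighting of
    holiday weeks comes from the article.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Weight is 5 for holiday weeks and 1 for every other week.
    weights = np.where(np.asarray(is_holiday, dtype=bool), holiday_weight, 1.0)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))


# Toy example: three weeks, the last one is a holiday week.
score = holiday_weighted_mae(
    y_true=[100.0, 200.0, 300.0],
    y_pred=[110.0, 190.0, 280.0],
    is_holiday=[False, False, True],
)
```

With this kind of weighting, an error made in a holiday week costs as much as the same error repeated over five ordinary weeks, so a model is rewarded for getting the rare, high-stakes weeks right.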

The dataset is divided into three tables: Stores, Features, and Sales. The Stores table contains anonymized information about the 45 stores, indicating each store's type and size. The Features table contains additional data related to the store, department, and regional activity for the given dates, and is described in Table 1. MarkDown data is only available after Nov 2011, and even then not for every store or every week.
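As a rough illustration of how the three tables might be combined into one training frame, the sketch below joins tiny in-memory stand-ins with pandas. All column names (`Store`, `Dept`, `Date`, `Weekly_Sales`, `Type`, `Size`, `Temperature`, `MarkDown1`) and all values are assumptions made for the example, not the contents of Tables 1 and 2.

```python
import pandas as pd

# Toy stand-ins for the three tables; values are made up for illustration.
dates = pd.to_datetime(["2011-11-04", "2011-11-11", "2011-11-04", "2011-11-11"])

stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"], "Size": [150000, 200000]})

features = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Date": dates,
    "Temperature": [54.9, 59.1, 51.2, 55.3],
    # Markdown data is missing for some stores and weeks.
    "MarkDown1": [float("nan"), 10382.9, float("nan"), float("nan")],
})

sales = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Dept": [1, 1, 1, 1],
    "Date": dates,
    "Weekly_Sales": [24924.5, 46039.5, 35034.1, 41595.6],
})

# Attach weekly features by store and date, then static store attributes.
df = (
    sales
    .merge(features, on=["Store", "Date"], how="left")
    .merge(stores, on="Store", how="left")
)

# Keep a flag for missing markdowns before filling them, so the model can
# tell "no markdown recorded" apart from "markdown of zero".
df["MarkDown1_missing"] = df["MarkDown1"].isna()
df["MarkDown1"] = df["MarkDown1"].fillna(0.0)
```

Filling missing markdowns with zero plus an indicator column is one common choice among several; the article does not say how the gaps were handled.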

Table 1. Features description

The Sales table contains historical sales data from 2010–02 to 2012–11. It is described in Table 2.

Table 2. Sales data description

Our Model

Our model is based on deep learning, a powerful class of machine learning algorithms that use artificial neural networks to understand and leverage patterns in data. Deep learning algorithms use multiple layers to progressively extract higher-level features from raw data, which reduces the amount of manual feature extraction that other machine learning methods need. The “deep” in “deep learning” refers to the number of layers through which the data is transformed: each transformation builds on the previous one, so important features are extracted from raw data automatically.

This is the opposite of more traditional, rule-based methods, where the data analysis, the feature extraction, and the rule creation are all done by hand, which is usually a tedious process.

Our model's core idea is the use of entity embeddings, which means using a different set of dimensions to represent a categorical set of data, as represented in Figure 2.

A categorical set of inputs is data whose values fall into distinct categories (or types) with no inherent numerical relationship to one another. With entity embeddings, each entity becomes an embedding (a vector) in a new set of dimensions, hence the term entity embedding. Think of these dimensions as different characteristics in the dataset. What this technique finds is a hidden representation that works for our specific problem. The neural network learns this hidden representation during the standard supervised training process. By mapping similar values close to each other in the embedding space, the model identifies patterns in the categorical variables that would otherwise have been difficult to reveal. This means that we can find useful patterns without performing much feature engineering!

Figure 2. Model architecture using entity embeddings.
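The general shape of an entity-embedding network can be sketched in a few lines of PyTorch. This is a minimal illustration, not the architecture from Figure 2: the class name `EntityEmbeddingNet`, the layer widths, the embedding sizes, and the choice of categorical columns (store, department, store type) are all assumptions. Only the count of 45 stores comes from the dataset description.

```python
import torch
from torch import nn


class EntityEmbeddingNet(nn.Module):
    """One embedding table per categorical column, followed by a small MLP."""

    def __init__(self, cardinalities, embedding_dims, n_continuous, hidden=(64, 32)):
        super().__init__()
        # Each categorical column gets its own learnable lookup table.
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n_cat, dim) for n_cat, dim in zip(cardinalities, embedding_dims)]
        )
        in_features = sum(embedding_dims) + n_continuous
        layers = []
        for width in hidden:
            layers += [nn.Linear(in_features, width), nn.ReLU()]
            in_features = width
        layers.append(nn.Linear(in_features, 1))  # single output: weekly sales
        self.mlp = nn.Sequential(*layers)

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_categorical) integer codes; x_cont: (batch, n_continuous)
        embedded = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embedded + [x_cont], dim=1)
        return self.mlp(x).squeeze(1)


torch.manual_seed(0)

# Hypothetical setup: 45 stores, 99 departments, 3 store types, 4 numeric features.
model = EntityEmbeddingNet(cardinalities=[45, 99, 3], embedding_dims=[10, 20, 2], n_continuous=4)

# Random stand-in batch of 8 rows, only to show one training step.
x_cat = torch.stack(
    [torch.randint(0, 45, (8,)), torch.randint(0, 99, (8,)), torch.randint(0, 3, (8,))],
    dim=1,
)
x_cont = torch.randn(8, 4)
y = torch.randn(8)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
predictions = model(x_cat, x_cont)
loss = nn.functional.mse_loss(predictions, y)
optimizer.zero_grad()
loss.backward()   # gradients flow into the embedding tables as well
optimizer.step()
```

The embedding tables are ordinary parameters, so they are trained by the same backpropagation step as the dense layers. That is what "learned during the standard supervised training process" amounts to in practice.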

Entity embeddings have been shown to work successfully when fitting neural networks on structured data. For example, the winning solution in a Kaggle competition on predicting the distance of taxi rides used entity embeddings to deal with each ride's categorical metadata. Similarly, the third-place solution on the task of predicting store sales for Rossmann drug stores used entity embeddings, and it was much less complicated than the first- and second-place solutions. More on entity embeddings can be found in this paper.


Now, let us compare our deep learning model against some of the most popular machine learning algorithms to showcase its predictive accuracy. The metric we chose to evaluate the regression models is the root mean squared error (RMSE).
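For reference, RMSE and a 5-fold cross-validated version of it can be computed as in the sketch below. The helper names, the shuffled fold split, and the ordinary least squares placeholder model are assumptions for illustration. The article does not describe how its folds were built or share its evaluation code.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def cross_validated_rmse(X, y, fit, predict, n_folds=5, seed=0):
    """Average validation RMSE over `n_folds` shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = fit(X[train_idx], y[train_idx])
        scores.append(rmse(y[val_idx], predict(model, X[val_idx])))
    return float(np.mean(scores))


# Placeholder model (ordinary least squares) standing in for any regressor.
def fit_ols(X, y):
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


# Synthetic data, only to exercise the functions.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)
cv_score = cross_validated_rmse(X, y, fit_ols, predict_ols)
```

One caveat on the design: shuffled folds let a model train on weeks that come after the ones it is validated on. For time-ordered sales data, a split that respects chronology is often a safer estimate of forecasting error.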

Table 3. Comparison of RMSE values from different models, obtained using 5-fold cross-validation.

The embeddings we have created capture latent features with minimal feature engineering. Our model outperforms both XGBoost and random forests, improving performance (as measured by RMSE) by 42%, as seen in Table 3.

As Figure 3 shows, by using deep learning and embedding layers, we can efficiently capture latent features that are difficult to engineer by hand, and the neural network model predicts the weekly sales accurately.

Figure 3. Real and forecasted weekly sales in number. Data for all the stores are shown.

Entity embeddings address the disadvantages of simple variable encodings such as one-hot encoding. One-hot encoding a variable with many categories results in very sparse vectors, which are computationally inefficient and make optimization harder. A one-hot encoding also places every category at the same distance from every other, whereas embeddings carry information about how close different categories are. Because embeddings are learned, they can represent each category in the way that best serves the prediction task.
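The contrast between the two encodings can be seen in a small NumPy sketch. The embedding matrix here is random rather than learned, and the sizes (45 categories, 4 dimensions) are chosen only for illustration.

```python
import numpy as np

n_categories, embedding_dim = 45, 4
rng = np.random.default_rng(0)

# Stand-in for a learned embedding table: one dense row per category.
embedding_matrix = rng.normal(size=(n_categories, embedding_dim))

store_ids = np.array([0, 7, 44])

# One-hot encoding: each store becomes a sparse vector of length 45.
one_hot = np.eye(n_categories)[store_ids]

# An embedding layer is equivalent to multiplying the one-hot vector by the
# embedding matrix, but it is implemented as a cheap row lookup.
via_matmul = one_hot @ embedding_matrix
via_lookup = embedding_matrix[store_ids]

# Every pair of distinct one-hot vectors is exactly sqrt(2) apart, so the
# encoding says nothing about which stores are similar.
one_hot_dist_a = np.linalg.norm(one_hot[0] - one_hot[1])
one_hot_dist_b = np.linalg.norm(one_hot[0] - one_hot[2])

# Dense embeddings can place some stores closer together than others.
embedding_dist_a = np.linalg.norm(via_lookup[0] - via_lookup[1])
embedding_dist_b = np.linalg.norm(via_lookup[0] - via_lookup[2])
```

In a trained model the rows of the embedding matrix are adjusted by gradient descent, so distances between rows end up reflecting how similarly the categories behave with respect to the target.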

How can LotusLabs help you?

Building an AI system is clearly a complex undertaking. The right conditions must be in place to ensure that the system also works reliably in day-to-day operations, performing as planned. The factors that determine whether an implementation is a success will cover all levels of the retail business.

At LotusLabs, we are experts in Machine Learning and AI infrastructure. Our people work with your people, at all levels. Our methods help you find ways to put AI to work.

You want to see AI drive value in every corner of your business. But how do you get started? And how do you get there before your competition? LotusLabs helps you define an AI Roadmap that contains your vision. With the roadmap ready, you can focus on projects with the highest return and least risk.

Transform your business into an AI-driven enterprise, implementing machine learning models that solve complex business problems and drive real ROI on the path toward functioning AI-supported retail.


