Deep Learning: What it is and how it relates to supply chains

Disclaimer: In this post I’m going to write about how we use Deep Learning in my company, Lokad.

When you follow the news about deep learning, you might have come across exciting breakthroughs such as algorithms which are able to colorize black and white photographs or automatic real-life translations of texts on pictures taken by a phone app . While these are all pretty cool applications, they do not immediately give any direct use cases for most traditional businesses. At Lokad, our goal is to translate the stunning reach of deep learning capabilities into the real world, to optimize supply chains everywhere.

So, before going into detail how we do that, let me quickly and very roughly summarize what Deep Learning actually entails without going too much into technical details.

First of all, deep learning is a flavor of machine learning. Regular non-machine learning algorithms require full prior knowledge on the task (and no training data whatsoever). An expert-knowledge approach to demand forecasting would require you to specify in advance all specific rules and patterns such as

“All articles that have category=Spring will peak in May and slowly die down until October.”

This, however may only be true for some products of this category. It is also possible that there might be subcategories that behave a bit differently and so on. Combining these with a moving average forecast already yields an overall understanding of future demand which is not so far from reality.

However it does have the following downsides:

It does not embrace uncertainty — In our experience, risk and uncertainty are crucial for supply chains, since it’s mostly the boundary cases that can be either very profitable or very costly if ignored,
You have to maintain and manage the complexity of your rule set – An approach is only as powerful as the set of rules that are applied to it. Maintaining rules is very costly i.e. for each rule in the algorithm, we calculate there is an initial cost of about 1 man-day of implementation, testing and proper documentation initially and about half a day of maintenance. Assuming you keep on refining your rules and therefore have to readjust the old ones this yields a cost of 8k € per rule for a five year period. It is worth noting that this only applies for one rule and does not take into account the exponential increase in complexity that arises when dealing with more complex product portfolios. Even demand patterns for small businesses usually exhibit dozens of influences making their maintenance incredibly costly.

Now imagine that there is a technology that could, like a human child, learn on its own to deduce patterns from data and could thus independently predict how your portfolio of products develop throughout a year.

Just like a child in development, a deep learning algorithm will try to make sense of the world by trying to deduce correlations from observations. It will test them and discard those that do not make any sense for the remaining data.

Again following our analogy, like a child learning to makes sense of the world, a deep learning algorithm is consuming lots and lots of data and the key lies in grasping the information that is actually relevant. While a child in a big city might be completely overwhelmed with all the different colors, noises and smells, it will learn later that the traffic lights are the ones to watch out for in combination with noises coming from approaching cars that are most critical when crossing a street. The same mechanism is in place for deep learning. The algorithm may process a vast amount of data and needs to find out the essence of what drives demand.

The way to figure out what is important and what is not is carried out via repeating similar situations several times, like you would repeat correct traffic behavior with a child. A human brain is highly parallelizing its sensory input processing and reaction, so that it is able to react quickly to urgent new data such as a car that is approaching while crossing the street.

With the rise of big data, parallelization became also a key topic driving efficiency and, in fact, feasibility of a “human-like” autonomous learning process.

At Lokad, we actually use the parallelized computing power high end gaming graphic cards in our cloud servers, to efficiently run our optimization for our clients, processing for a portfolio of 10.000 products with five years of sales data in less half an hour while largely outperforming any conventional moving average based algorithms (or even Lokad’s own earlier generation machine learning forecasts) in accuracy.

Lokad then uses the demand forecasting results which come in a probabilistic format to optimize the supply chain decisions taking into account economic drivers such as one’s stance on growth vs. profitability. With these analyses, Lokad directly delivers the best supply chain decisions such as purchase orders or dispatching decisions. “Best” here refers to the economic driver set up (i.e. growth vs. profitability, opportunity costs etc.) that has been put in place supply chain decisions. It it will scale with the business as one’s portfolio and demand patterns become more complex making any hard coded demand forecasting rules which need to be maintained by a human completely obsolete.

Notes:

Average Developer salary in Germany 58k €, 261 working days – 30 days of vacation in 2018 yields a 250€ manday rate)

Bits and Pieces

A personal blog on topics related to Data Science

Deep Learning: What it is and how it relates to supply chains