Uncertainty Quantification and Probabilistic Forecasting of Big Data Time Series at Amazon Supply Chain

This project took place in the summer term 2022. You CANNOT apply to this project anymore!

Results of this project are explained in detail in the final report.

Project description

In Amazon EU Supply Chain, we are improving our customer experience and lowering our costs by answering complex questions with advanced analytical methods, ranging from econometrics and predictive methods (Machine Learning) to prescriptive optimization models (Operations Research). Thanks to the computational developments and accessibility efforts made by Amazon Web Services, we now make extensive use of large and extra-large datasets (more than 100 million observations) to perform our analyses and develop more precise statistical models.

To improve the way we handle uncertainty in our planning process, we are working towards building risk management approaches (stochastic optimization, robust optimization, dynamic programming, global minimum variance portfolio estimation, VAR estimation) for supply chain planning. Adopting such a framework requires moving from point forecasts to statistical/machine learning algorithms that estimate the data-generating process, that is, the conditional distribution of a target variable given a set of exogenous variables. To model the conditional distribution, we will explore quantile loss applications to Machine/Statistical Learning models, with a particular focus on linear quantile regressions. The solution developed should not only estimate the conditional distribution accurately, but also scale up as the size of the dataset increases. Students will test the algorithms on simulated data and on an Amazon dataset of hundreds of millions of observations. For this reason, after exploring Python implementations, we will, if required, implement a Spark/TensorFlow version of the algorithms or solutions based on parallel computing.
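
To make the quantile loss mentioned above more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method used in the project: the function names, the simulated "demand" data and the 0.9 quantile level are all hypothetical. It shows that the constant forecast minimizing the quantile (pinball) loss at level tau is an empirical tau-quantile of the sample. A linear quantile regression applies the same loss, but replaces the constant with a linear function of the exogenous variables.

```python
import random


def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss at level tau, with 0 < tau < 1."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau per unit,
        # over-prediction (diff < 0) costs (1 - tau) per unit.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant_quantile(y, tau):
    """Sample value that minimizes the pinball loss as a constant forecast.

    Brute-force search over the observed values (quadratic cost),
    which is only meant for small illustrative samples.
    """
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Simulated data (assumption: Gaussian "demand", purely for illustration).
random.seed(0)
demand = [random.gauss(100.0, 15.0) for _ in range(500)]

tau = 0.9
q90 = best_constant_quantile(demand, tau)

# Share of observations at or below the fitted quantile; it should be close to tau.
coverage = sum(1 for d in demand if d <= q90) / len(demand)
print(f"fitted {tau:.0%} quantile: {q90:.2f}, empirical coverage: {coverage:.3f}")
```

The asymmetric weights are what turn a point forecast into a quantile forecast: with tau = 0.9, under-predicting is penalized nine times more heavily than over-predicting, so the minimizer settles at a value that roughly 90% of the observations fall below. The brute-force search would not scale to the dataset sizes described above and only serves to expose the property.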

The students will be provided with full access to an AWS account linked to EC2 and EMR machines, and non-confidential toy datasets to test their results along the way.

Students accepted to this project should attend (unless they have proven knowledge) online workshops at the LRZ from 19.04. to 22.04.2022. More information will be provided to the accepted students.