Synthetic Augmentation of Smartwatch Data for Proactive Infectious Disease Management
The results of this project will be uploaded here as a final report by mid-May 2025.

- Sponsored by: Heidelberg Institute of Global Health (HIGH) in cooperation with the TUM Chair of Bioinformatics
- Project lead: Dr. Ricardo Acevedo Cabra
- Scientific lead: Jürgen Wallner
- TUM co-mentor: Dr. Alessandro Scagliotti
- Term: Winter semester 2024
- Application deadline: Sunday 21.07.2024

Recent outbreaks such as Zika and COVID-19 are stark reminders that infectious diseases still pose a significant global threat to public health. The continuing annual burden of malaria adds urgency to the need for innovative containment strategies, especially as climate change expands mosquito breeding sites and widens the geographical range of vector-borne diseases.
Resting heart rate (RHR) elevation has emerged as a potential early indicator of infections, presenting opportunities for intervention during the pre-symptomatic phase. This is particularly important in regions where healthcare resources are limited, as early detection can mitigate severe complications and curb further contagion.
This project aims to improve timely infection detection without relying on costly diagnostic methods by leveraging wearable technology. Specifically, heart rate, step, and sleep data collected from smartwatches worn by 300 volunteers in Siaya, Kenya, are analyzed. Because such data are often incomplete, generative models such as Generative Adversarial Networks (GANs) or Wasserstein GANs (WGANs) are used to fill data gaps synthetically, with the aim of improving the accuracy of time series anomaly detection with LSTM autoencoders (LSTM-AE).
Ultimately, these efforts contribute to more effective population health management and better preparedness for future disease outbreaks exacerbated by climate change.

The Pre-Symptomatic Malaria Detection dataset was compiled by the Heidelberg Institute of Global Health (HIGH) over a 3-month period with 300 study participants. It contains participant-level variables: heart rate (bpm), sleep data (time), and physical activity (number of steps). The dataset is provided for research and analysis purposes only. For interested students, a public dataset with similar variables is available at: https://www.nature.com/articles/s41591-021-01593-2

The primary objective of the project is to explore the feasibility of generating synthetic heart rate data as an individual vital sign parameter. The first step is to implement a simple LSTM autoencoder (LSTM-AE) on the incomplete original data. The second step is to impute the missing values that are common in smartwatch data. The same LSTM-AE methodology is then applied to the synthetically completed data to address the core research question. For generalization and comparison, the data generation process may also be extended to the dataset available at: https://www.nature.com/articles/s41591-021-01593-2

- Anomaly Detection Configuration:
• Input: Actual heart rate readings from day T - 7 to day T.
• Output: Reconstructed heart rate readings from day T - 7 to day T.
• Thresholds: Reconstruction errors based on standard deviations between day T - 2 and day T.
• Alert Generation: Determine deviation on day T.
- Anomaly Detection Guidance:
• Pre-processing: Data may need downsampling and handling of missing values.
• Hyperparameters: Optimal values to be determined experimentally.
• Alternative Approaches: A Finite State Machine can be considered instead of an LSTM-AE for simpler anomaly detection.
- Project Focus:
• Incorporating a method to address data gaps by synthetically filling them using a GAN or WGAN.
• Evaluating and comparing anomaly detection performance using incomplete and completed datasets.
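The threshold and alert steps of the configuration above can be sketched in a few lines of Python. This is only one possible reading of the description, not the project's actual implementation: the function name `day_t_alerts`, the multiplier `k`, and the rule "mean plus k standard deviations of the pooled errors from day T - 2 to day T" are all assumptions made for illustration. The reconstructed readings would come from the LSTM-AE; here they are passed in as plain lists.

```python
from statistics import mean, pstdev


def day_t_alerts(actual, reconstructed, k=2.0):
    """Return indices of day-T readings whose reconstruction error is unusually high.

    actual, reconstructed: lists of 8 daily lists (day T-7 ... day T) of equal shape.
    k: hypothetical multiplier on the standard deviation (to be tuned experimentally).
    """
    if len(actual) != 8 or len(reconstructed) != 8:
        raise ValueError("expected 8 days of readings: T-7 through T")

    # Absolute reconstruction error for every reading on every day.
    errors = [
        [abs(a - r) for a, r in zip(day_a, day_r)]
        for day_a, day_r in zip(actual, reconstructed)
    ]

    # Assumption: the threshold is derived from the errors pooled over day T-2 to day T.
    recent = [e for day in errors[-3:] for e in day]
    threshold = mean(recent) + k * pstdev(recent)

    # Alert generation: flag readings on day T that exceed the threshold.
    return [i for i, e in enumerate(errors[-1]) if e > threshold]


# Toy data for illustration only (not study data): 8 days with 4 readings each.
actual = [[60.0, 62.0, 61.0, 63.0] for _ in range(8)]
reconstructed = [list(day) for day in actual]
actual[-1][2] = 81.0  # simulated elevated heart rate on day T
print(day_t_alerts(actual, reconstructed))  # prints [2]
```

Pooling the errors over three days keeps the threshold sensitive to recent behavior, but it also means an anomaly on day T raises its own threshold; computing the threshold from earlier days only would be an equally plausible variant.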
Overall, the project concentrates on filling data gaps synthetically with a GAN or WGAN. Anomaly detection performance will be evaluated on both the incomplete dataset and the completed dataset, and this comparison will indicate how far synthetic data can improve anomaly detection.
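A minimal sketch of how such a comparison could be scored is given below. The project description does not name evaluation metrics, so precision, recall, and F1 over alert days are an assumption, as are the function names `precision_recall_f1` and `compare_runs` and the availability of days with confirmed infection as ground truth. The day indices in the example are invented for illustration and are not results.

```python
def precision_recall_f1(true_days, alert_days):
    """Score alerts against days with confirmed infection (both given as day indices)."""
    true_days, alert_days = set(true_days), set(alert_days)
    true_positives = len(true_days & alert_days)
    precision = true_positives / len(alert_days) if alert_days else 0.0
    recall = true_positives / len(true_days) if true_days else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def compare_runs(true_days, alerts_by_dataset):
    """Score the same detector on several dataset variants, e.g. incomplete vs. completed."""
    return {
        name: precision_recall_f1(true_days, alerts)
        for name, alerts in alerts_by_dataset.items()
    }


# Invented example values, for illustration only.
scores = compare_runs(
    true_days=[3, 10],
    alerts_by_dataset={
        "incomplete": [3, 5],      # alerts raised on the original data with gaps
        "completed": [3, 10, 12],  # alerts raised after synthetic gap filling
    },
)
for name, (precision, recall, f1) in scores.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Scoring both variants with the same detector settings and the same ground truth keeps the comparison focused on the effect of the gap filling rather than on differences in tuning.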

The Heidelberg Institute of Global Health (HIGH) is one of the research institutes at the Faculty of Medicine at Heidelberg University, Germany’s oldest university. Through its research, the institute aims to contribute to improving the health of some of the sickest and poorest populations worldwide, especially in Africa and Asia. Through its teaching, the institute aims to train the next generation of global health researchers and practitioners. Therefore, our Digital Global Health working group invites interested students!
Don't miss this opportunity to be part of impactful research with real-world implications. Apply now to join us in addressing one of the most pressing healthcare challenges of our time.