Green Lakehouse: Sustainable Computing in Data Engineering
Join us for the presentation of our project
Date: July 9
Time: 18:00
Location: NOVE Building, Ground Floor, Solar Conference Room
Address: Luise-Ullrich-Straße 14, 80636 Munich (Please ask the front desk for directions.)
Follow the link or contact the mentors for more information.
- Sponsored by: Data Reply GmbH
- Project lead: Dr. Ricardo Acevedo Cabra
- Scientific leads: Antonio Di Turi and Majid Azimi
- TUM co-mentor: TBA
- Term: Winter semester 2025
- Application deadline: Sunday 20.07.2025
Apply to this project here

Motivation and Values
In our data-driven world, Data Engineering has rapidly matured, evolving from siloed monolithic systems to flexible architectures like Data Lakes and Warehouses. Today, Lakehouse architectures are emerging as the industry standard—merging the scalability of Data Lakes with the structure and performance of Data Warehouses.
Yet as this evolution continues, so do the demands. Petabyte-scale data workloads not only pose technical and operational challenges—they also come with significant energy and financial costs. Companies are now prioritizing sustainability and cost-efficiency, particularly with the rise of FinOps—a practice focused on optimizing cloud spending through financial accountability and informed decision-making.
This project aims to address a key aspect of sustainable computing: how to design and operate large-scale data pipelines with both performance and energy efficiency in mind.
Project Objectives
We will implement a modern Lakehouse architecture using open-source technologies and hybrid infrastructure (on-prem + cloud), and evaluate the energy and cost efficiency of various computing frameworks. Specifically, we will:
- Migrate data from various formats to Apache Iceberg, a modern open-source table format.
- Compare performance, cost, and energy efficiency across data processing engines such as Apache Spark, Apache Flink, Trino, Daft, Kafka, and Hadoop.
- Collect energy metrics using Kepler (for containerized/cloud environments) and Scaphandre (for bare-metal environments).
- Build insightful dashboards to visualize trade-offs between performance, cost (FinOps), and sustainability.
Core Values
- Innovation at the Intersection of Data and Sustainability: Explore cutting-edge techniques in both sustainable computing and data infrastructure.
- Career Growth Through Hands-On Implementation: Building a Lakehouse from the ground up using open technologies provides real-world experience that is highly sought after.
- Strong Technical Foundation: Exposure to a variety of modern tools across the data stack—valuable for both engineers and architects.
- Industry Relevance: As companies across Germany (and globally) invest in Lakehouse architectures, your experience will align with in-demand practices in both data management and FinOps.
Methodology
- Architecture & Infrastructure: We will build the Lakehouse on Kubernetes (K8s) to ensure scalability and avoid vendor lock-in. Cloud services will be used where beneficial to accelerate delivery.
- Data Pipeline: As an initial proof of concept, we will migrate CSV files into Iceberg tables. From there, pipelines will be built with various frameworks to test scalability and efficiency.
- Energy & Cost Monitoring: Kepler (Kubernetes-based Efficient Power Level Exporter) for container-based workloads; Scaphandre for energy usage on bare-metal machines.
- Custom dashboards will correlate workflow cost, energy consumption, and performance, enabling FinOps-based decision making.
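To correlate energy with cost as the monitoring bullet describes, power samples scraped from Kepler or Scaphandre must be integrated over time and priced. A minimal sketch, assuming a list of (timestamp, watts) samples and an illustrative electricity price:

```python
# Sketch: turning sampled power readings (as exported by tools like Kepler
# or Scaphandre) into energy and cost figures. The sampling format and the
# electricity price are illustrative assumptions.

def energy_wh(samples):
    """Integrate (timestamp_seconds, watts) samples into watt-hours
    using the trapezoidal rule."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += (p0 + p1) / 2.0 * (t1 - t0)
    return joules / 3600.0

def energy_cost_eur(wh, eur_per_kwh=0.30):
    """Convert watt-hours into an estimated cost (assumed price per kWh)."""
    return wh / 1000.0 * eur_per_kwh

# A job drawing a steady 100 W for one hour consumes 100 Wh:
samples = [(0, 100.0), (1800, 100.0), (3600, 100.0)]
print(energy_wh(samples))                            # → 100.0
print(round(energy_cost_eur(energy_wh(samples)), 4)) # → 0.03
```

In practice the samples would come from a Prometheus query against the exporters, and the resulting euros-per-job figure is what the FinOps dashboards would plot alongside runtime.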
Evaluation & Reporting
Standardized reports will compare:
- Resource usage
- Time-to-completion
- Energy consumption
- Operational cost estimates
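A standardized report over those four dimensions could be assembled as follows. The engine names, metric values, and price constants here are made-up placeholders, not project results:

```python
# Sketch: building a standardized comparison report from per-engine
# benchmark runs. All names, numbers, and prices are illustrative.
EUR_PER_KWH = 0.30        # assumed electricity price
EUR_PER_CORE_HOUR = 0.05  # assumed compute price

def benchmark_report(runs):
    """runs maps engine name -> {'runtime_s', 'cpu_core_hours', 'energy_wh'};
    returns report rows sorted by estimated operational cost."""
    rows = []
    for engine, m in runs.items():
        cost = (m["energy_wh"] / 1000.0) * EUR_PER_KWH \
             + m["cpu_core_hours"] * EUR_PER_CORE_HOUR
        rows.append({"engine": engine, **m, "est_cost_eur": round(cost, 4)})
    return sorted(rows, key=lambda r: r["est_cost_eur"])

report = benchmark_report({
    "spark": {"runtime_s": 1200, "cpu_core_hours": 10.0, "energy_wh": 2000.0},
    "trino": {"runtime_s": 800, "cpu_core_hours": 6.0, "energy_wh": 1500.0},
})
print(report[0]["engine"])  # → trino (cheapest in this made-up data)
```

Keeping the report schema fixed across engines is what makes runs of Spark, Flink, Trino, and the rest directly comparable in the dashboards.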
Requirements & Opportunities
You do not need prior experience with all tools to participate. What matters most is your willingness to learn quickly and work professionally in a team setting. Preferred experience includes:
- Familiarity with Kubernetes (infrastructure base)
- Exposure to Data Engineering tools (Flink, Spark, Trino, Daft, Kafka, Hadoop)
- Understanding of AWS concepts
- Strong Python programming skills
However, learning is a key part of this project—so we welcome applicants who are motivated to grow and explore new domains.
Apply to this project here