Program Code Generator using SotA Transformer Models

Results of this project are explained in the final report.

Motivation
In our fast-changing world, insurance companies face additional challenges. One of them is
the frequent replacement of their legacy management systems. At msg life, the leading life insurance software provider in the DACH region, we offer solutions for data migration, but here too, time does not stand still. In recent years, we have been integrating machine learning and exploratory data analysis methods into our systems. To this end, we have invested in research and development (R&D) of new AI-based methods to offer better solutions to our customers. Furthermore, we want to offer our employees better tools to help them with tedious tasks.

Goal
The overall goal is to generate program code based on the formal specifications provided by life
insurance experts. Our data migration procedural model guides us through the journey of data from the source system to the destination system. A major part of this is the conversion of specifications into program code. In this project, we want to develop an AI-based solution that performs this conversion.
To this end, current generative state-of-the-art (SotA) open-source methods need to be tested for
their applicability in our scenario.
Thus, the project consists of the following tasks:

  • Application of one or more promising models to the described task
  • Systematic comparison of the models (see the illustrative sketch after this list)
  • Improvement of the models by fine-tuning and/or transfer learning
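
As a purely illustrative example of what such a systematic comparison could involve, the sketch below scores generated code against the conventionally programmed reference code with a generic text-similarity metric (BLEU, via the Hugging Face evaluate library). The metric choice, the sample snippets, and all names are assumptions made for illustration only; code-specific metrics such as CodeBLEU or simple exact-match rates would be equally reasonable choices.

    # Illustrative only: score model-generated code against reference code with BLEU.
    # The sample strings below are toy placeholders, not project data.
    import evaluate

    bleu = evaluate.load("bleu")

    predictions = [
        "def add(a, b):\n    return a + b",
        "def is_even(n):\n    return n % 2 == 0",
    ]
    references = [
        ["def add(a, b):\n    return a + b"],       # one or more references per prediction
        ["def is_even(n):\n    return n % 2 == 0"],
    ]

    score = bleu.compute(predictions=predictions, references=references)
    print(f"BLEU: {score['bleu']:.3f}")
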

The expected deliverables are:

  • A statement about the most promising models or model classes
  • A systematic comparison of the models
  • Optional: a prototype using the most promising model(s) trained on our data, together with documentation of the results

Ideally, such methods fit seamlessly into our procedural model, or at least best practices can be derived for it.

Main Methods
Generative SotA open-source models, for example:

  • General transformer models for translation like BERT
  • Specialized models for code generation like CodeBERT
  • Large generative language models like BLOOM or GPT-J
  • The students can select alternative methods or models as a baseline or as a working model
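
As a rough, non-authoritative illustration of how one of the listed model classes could be applied, the sketch below prompts an open-source generative model (here GPT-J, loaded through the Hugging Face transformers library) with a small specification-style comment and lets it complete the code. The chosen model, the prompt, and the decoding settings are placeholder assumptions, not prescribed choices.

    # Illustrative sketch: generate code from a specification-style prompt with an
    # open-source causal language model. Model, prompt, and settings are placeholders.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "EleutherAI/gpt-j-6B"  # any open-source code-capable model could be swapped in
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # A toy "specification" phrased as a comment, followed by a function signature.
    prompt = "# Specification: return the sum of two integer amounts\ndef add_amounts(a, b):"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

In the same way, other open-source models from the list above (e.g. BLOOM) could be dropped in for comparison, which ties directly into the systematic evaluation described earlier.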

Data

  • Existing expert specifications
  • Matching conventionally programmed code (from finished data migration projects)
  • No sensitive data is included

Important notice

Students accepted to this project should attend online workshops at the LRZ in April 2023, before the semester starts, unless they have proven prior knowledge. More information will be provided to accepted students.