Generative creative AI - Using GANs for text-based image creation

Image: a person holding painting tools (Capgemini, CC0)

Sponsored by: Capgemini
Project Lead: Dr. Ricardo Acevedo Cabra
Scientific Lead: Matthias Wissel, Matthias Rebs, and Arndt Kirchhoff
Co-Mentoring: PhD candidate Sandro Belz
Term: Summer semester 2019

Results of this project are explained in detail in the final documentation and presentation.

For us humans it is easy to understand the description of an image. Stories come to life in our minds and form vivid pictures and scenes. Creating these pictures and scenes ourselves, for example with brush strokes, is another matter and takes some degree of training. The same holds for machine learning. Algorithms handle discriminative tasks, such as classification or object detection, very well. Generating realistic images, on the other hand, is a much bigger challenge, because the model has to produce far more information than it receives: a class label carries only a few bits, whereas an image consists of thousands of pixel values.

The challenge is nevertheless worth tackling because of its many areas of application. One is content creation, such as designing clothing or producing marketing material. Another is editing support, such as changing the scenery of a photo from rainy to sunny. Finally, there is the vast area of data augmentation, which can enrich data sets by filling underrepresented cases with synthesized data, for example dangerous scenarios in autonomous driving or rare situations in fraud detection.

In this project we want to research different variants of generative adversarial networks (GANs). In particular, we will try to understand and implement the kind of models that can create images from text descriptions. As a starting point we will look into an approach based on StackGAN, which has yielded very promising results in creating photorealistic images. The goal is to implement a model that produces images the human eye cannot distinguish from photographs.