Let me show you! Developing an image-based document search

This project took place in winter term 2020. You can no longer apply to this project!

Results of this project are explained in detail in the final documentation and presentation.

  • Sponsored by: inovex GmbH
  • Project Leader: Dr. Ricardo Acevedo Cabra
  • Scientific Lead: Dr. Robert Pesch, Sebastian Blank, Julia Kronburger
  • TUM Co-Mentor: PhD candidate Olga Graf
  • Term: Winter semester 2020

Finding relevant textual descriptions for images is called image-to-text retrieval. Efficient image-to-text retrieval can significantly improve the user experience of many expert tools. For example, service technicians can find documentation and manuals given a photo of a defective technical component. Developing such a retrieval system, potentially serving hundreds of requests per second, is a non-trivial task. It requires a diverse team with backgrounds in Data Science, Data Engineering, Software Engineering, and Project Management.

Our project aims to leverage state-of-the-art deep-learning and indexing approaches to develop an image-based document search system. Typically, these approaches embed images and text documents into a joint representation space. Relevant documents are then retrieved based on the similarity between the image and document representations. The project addresses the different stages on the path to a data product. We will start with a simple, already published model and set up a vector-based search engine to demonstrate the basic functionality of the retrieval system and create our first Minimum Viable Product (MVP). Further stages cover model development and refinement, the implementation of an efficient indexing structure, and the implementation of a prototype frontend for retrieving documents. We will use publicly available labeled data sets to train and evaluate our model. Public cloud resources will allow us to set up the scalable infrastructure that the search system needs.
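To make the retrieval principle concrete, here is a minimal sketch of similarity search in a joint representation space. It is an illustration only, not the project's implementation: the function names `cosine_similarity` and `retrieve`, the document ids, and the three-dimensional vectors are all made up for this example, and cosine similarity is assumed as the similarity measure. In a real system the vectors would come from trained image and text encoders, and the linear scan would be replaced by an efficient index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def retrieve(
    image_embedding: Sequence[float],
    document_embeddings: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (document id, score) pairs, best match first.

    This is a brute-force scan over all documents; it only demonstrates
    the principle and does not scale to large collections.
    """
    scored = [
        (doc_id, cosine_similarity(image_embedding, embedding))
        for doc_id, embedding in document_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-written embeddings (assumption: in practice these
# would be produced by the image and text encoders of a trained model).
documents = {
    "pump_manual": [0.9, 0.1, 0.0],
    "valve_datasheet": [0.1, 0.9, 0.1],
    "safety_notice": [0.0, 0.2, 0.9],
}
query_image = [0.8, 0.2, 0.1]  # embedding of the technician's photo

results = retrieve(query_image, documents, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With these toy vectors, the query lies closest to `pump_manual`, so that document is ranked first. Serving hundreds of requests per second over a large collection would typically call for an approximate nearest-neighbor index instead of comparing the query against every document.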

We will develop the image-retrieval system in short, agile iterations. In every iteration, we will incrementally increase the complexity of our model, the search system, the infrastructure, and the frontend. A key ingredient for successfully developing such a system will be a team with broad interests and skills whose members are eager to learn and collaborate.