Cross-lingual Semantic Search

Building an advanced information retrieval (IR) system is challenging because a modern search engine is expected to perform many tasks at once. Some of these tasks are:

  1. Ranking the retrieved documents by relevance with respect to the information need of the users,
  2. Finding relevant documents in languages other than that of the query (cross-lingual IR),
  3. Retrieving semantically relevant documents even if they do not include the exact words of the query, and
  4. Prioritizing documents about the queried topic over those that merely mention the topic.

While traditional IR research has pursued different approaches to tackle some of these challenges in isolation, recent advances in neural IR suggest strategies for tackling several of them simultaneously. During the initial project phase, we will set up an existing IR system for cross-lingual and semantic search, which already contains several querying methods. We will then develop further querying methods based on, for example, document vectors generated by recurrent neural networks, or on combinations of probabilistic models with neural techniques. Finally, we will compare the newly developed methods against the existing ones.
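To illustrate the idea of querying by document vectors, the following is a minimal sketch, not the project's actual method. It assumes that queries and documents in different languages have already been mapped into one shared vector space (for example by a neural encoder) and ranks documents by cosine similarity to the query. The function names, document identifiers, and the hand-written three-dimensional vectors are hypothetical and serve only as an example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors; a real system would obtain them from a trained encoder.
DOC_VECS = {
    "en_solar_energy": [0.9, 0.1, 0.0],
    "de_solarenergie": [0.8, 0.2, 0.1],
    "fr_recette_pain": [0.0, 0.1, 0.9],
}
QUERY_VEC = [1.0, 0.0, 0.0]  # assumed embedding of an English query about solar energy

RANKING = rank_documents(QUERY_VEC, DOC_VECS)
for doc_id, score in RANKING:
    print(f"{doc_id}\t{score:.3f}")
```

Because the comparison happens in the vector space rather than on surface words, the German document can rank close to the English one although it shares no terms with the query. This only holds to the extent that the assumed encoder places translations near each other.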

Results: The final documentation and presentation describe the results of this project in detail.