Enterprise Risk Autopilot

This project took place in the summer term 2022. You CANNOT apply to this project anymore!

Results of this project are explained in detail in the final report.

Project Description

Hospitals must ensure their patients’ data is well protected. Energy providers need to ensure their systems are secure so that they can provide reliable power to the grid. Car manufacturers must protect their designs and make sure they are safe, and universities must safeguard their students’ data and knowledge. Alyne supports leading organisations in all of these sectors in achieving higher levels of security and better transparency of risk. Regulators and lawmakers define a multitude of rules and regulations for these practices, and Alyne makes complying with them much easier.

As you can imagine, at Alyne we must process a large number of regulatory documents. To make sense of this data for our enterprise customers, we must identify what could go wrong if a certain legal or regulatory requirement is not met. This is what is defined as a risk. Automating the process of understanding a specific law or regulation and deducing the risks that may result from weaknesses in its implementation is a challenging task, and a huge value driver for our customers. This is the objective of our exciting project.

Our main task of risk mapping will be split into three steps:

1. Preprocessing task:

  • *Goal*: given an original PDF document and a finite set of block elements (headings, paragraphs, images, lists, etc.), develop a mechanism that parses the document into these block elements while preserving the hierarchical structure of the original document
  • *Methods*: open-source PDF parsing tools, LayoutLM, CNNs

2. Risk identification and mapping stage:

  • *Goal*: identify references in documents that can be mapped to risks and suggest relevant risks from the risk register
  • *Methods*: deep ranking, domain adaptation, contrastive learning, weak supervision

3. Interpretability:

  • *Goal*: interpret the suggestions provided by the model using modern interpretability approaches
  • *Methods*: learn to dissect modern transformer-based models (BERT, XLNet) and interpret their internal decision functions (BERTScore, for example)
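
To make the risk mapping step more concrete, the following is a minimal sketch of how a regulatory passage could be matched against a risk register by ranking. It is only an illustration under stated assumptions: a simple bag-of-words cosine similarity stands in for the learned ranking models named above, and the function names, the register entries, and the example passage are all hypothetical and not taken from the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_risks(passage, risk_register, top_k=2):
    """Rank risk-register entries by similarity to a regulatory passage."""
    query = Counter(tokenize(passage))
    scored = [
        (cosine(query, Counter(tokenize(description))), risk_id)
        for risk_id, description in risk_register.items()
    ]
    # Highest score first; ties are broken by risk id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(risk_id, round(score, 3)) for score, risk_id in scored[:top_k]]


# Hypothetical risk register; the entries are invented for illustration.
RISK_REGISTER = {
    "R1": "Unauthorized access to personal data due to missing encryption",
    "R2": "Power outage caused by failure of control systems",
    "R3": "Loss of backup data after hardware failure",
}

# Hypothetical regulatory passage.
passage = "Personal data must be protected by encryption against unauthorized access."
suggestions = suggest_risks(passage, RISK_REGISTER)
print(suggestions)
```

In a learned ranking setup, the term-count vectors would be replaced by embeddings from a trained encoder, while the ranking logic around them could stay largely the same.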

Tools we will use: Python, PyTorch, Hugging Face Transformers, scikit-learn, pandas, NumPy

Students accepted to this project should attend online workshops at the LRZ from 19.04. to 22.04.2022, unless they have proven knowledge. More information will be provided to the accepted students.