Generating investment opportunities using LLMs

Results of this project are shown in the demo video and the final report (PDF) below:

About FSN Capital
Established in 1999, FSN Capital Partners is a leading Northern European private equity firm and advisor to the FSN Capital Funds, with €4 billion under management and offices in Oslo, Stockholm, Copenhagen,
and Munich. FSN Capital Funds make control investments in growth-oriented Northern European companies, to support further growth and to transform companies into more sustainable, competitive, international,
and profitable entities. FSN Capital Partners is at the forefront of data-driven investing and is actively driving digital transformation in the private equity industry.

About Private Equity
Private equity is a way of investing in which the investment firm buys a majority stake in a private company. As opposed to investing in public companies through the stock market, a private equity investor takes
active ownership of the company, meaning it partners with the company to increase its value during the ownership period. A private equity fund typically owns a company for 5-10 years before selling it to another
owner or taking the company to the stock market (IPO).


Challenge (Motivation and Value)

  • Arguably, the two strongest predictors of investment opportunity quality and risk are 1) how the subsector the company operates in is performing, and 2) how well the company is performing within the subsector, compared to the other companies in this group.
  • A subsector is defined as a micro-cluster of similar companies that buy and sell the same products or services and are affected by the same market forces.
  • A subsector is often tied to a geographical area (size dependent on the industry) and typically consists of 3-10 comparable companies.
  • The search space is vast. In 2020, the EU estimated that its business economy was made up of 26.3 million active enterprises. Standard classifications in company databases (e.g. NACE codes) are often not granular enough for getting to a shortlist of relevant companies. For example, one might easily find a long list of biotech companies, but it is usually a laborious manual task to filter down to biotech companies which have a certain product offering or operate in a niche market. There is, for instance, little value in comparing a biotech company that produces an insulin meter with a protein folding software company. These should sit in their own subsectors.
  • FSN wishes to 1) automatically cluster companies into one or more subsectors, 2) score subsector attractiveness, and 3) score companies' relative performance in the subsectors.
  • The most promising way to solve this challenge is to describe an industry/subsector/target-company based on keywords, scrape company websites and then apply keyword search, clustering and LLMs to determine company similarity on a much more granular level. Combined with financial data, you start getting industry-leading investment technology.
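
As a rough illustration of the keyword-based similarity idea described above, the following minimal sketch scores website texts against a keyword description using TF-IDF and cosine similarity. It is a simple stand-in, not FSN's method: a real pipeline would likely use embeddings or LLMs instead. The company names and website snippets are hypothetical, and the function names are chosen only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency, so no weight is zero or negative.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, companies):
    """Rank companies (name -> website text) by similarity to the query."""
    names = list(companies)
    vectors = tfidf_vectors([query] + [companies[name] for name in names])
    query_vec = vectors[0]
    scores = {name: cosine(query_vec, v) for name, v in zip(names, vectors[1:])}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical, hand-written website snippets (not real data).
companies = {
    "GlucoSense": "We build insulin meters and glucose monitoring devices for diabetes patients.",
    "FoldLab": "Protein folding simulation software for drug discovery research teams.",
    "NordicPumps": "Industrial water pumps and valves for municipal utilities.",
}
ranking = rank_by_similarity("insulin meter glucose monitoring device", companies)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Note that plain token matching misses "meter" versus "meters"; this is one reason a production system would probably add stemming or move to embedding-based similarity.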

Project (Work packages)

  • Web scraping: Optional. FSN has scraped the websites of all companies in its existing company database. FSN will provide this data but can also share the scraping algorithms with the team so that they can be further optimized for the needs of this project.
  • Data pre-processing: Cleaning noise from the scraped website data. Optional: automated translation of texts into English.
  • Modelling: Application of clustering, NLP algorithms and LLMs to determine similarity scores for companies based on keywords and website content. Testing of open-source alternatives to OpenAI GPT models. Definition of a company similarity metric and of measures to evaluate its quality. Development of a ranking logic that considers financial data in addition to similarity scores. Optional: addition of alternative data sources such as Google Trends.
  • Data visualization / Prototype development: The target is to develop a lightweight web application (for example via Dash Plotly) that investors can use as a user interface to get a shortlist of relevant companies based on their keyword description of the industry/market.
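
The ranking logic mentioned in the modelling work package could, under simple assumptions, look like the sketch below: a weighted sum of a similarity score and one min-max scaled financial metric. The `Candidate` fields, the choice of revenue growth as the financial metric, and the default weight of 0.7 are all hypothetical and only serve to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    similarity: float      # similarity to the keyword description, assumed in [0, 1]
    revenue_growth: float  # hypothetical financial metric, e.g. 0.12 for 12 % growth


def min_max(values):
    """Scale values to [0, 1]; return 0.5 for all if they are identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(candidates, weight_similarity=0.7):
    """Return (name, score) pairs sorted by a weighted combined score."""
    if not candidates:
        return []
    growth_scaled = min_max([c.revenue_growth for c in candidates])
    scored = [
        (c.name, weight_similarity * c.similarity + (1 - weight_similarity) * g)
        for c, g in zip(candidates, growth_scaled)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical candidates (not real companies or figures).
candidates = [
    Candidate("A", similarity=0.90, revenue_growth=0.02),
    Candidate("B", similarity=0.80, revenue_growth=0.30),
    Candidate("C", similarity=0.20, revenue_growth=0.40),
]
shortlist = rank(candidates)
for name, score in shortlist:
    print(f"{name}: {score:.3f}")
```

Here company B outranks A despite a slightly lower similarity, because its growth is much stronger; the weight controls how much financial performance can override textual similarity.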

Technical specifications

Data sources

  • Company database: containing website URLs, financials, existing industry classifications, and ownership structures
  • Scraped company website data
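
Scraped website data usually contains markup, scripts and styling that have to be removed before modelling. The following minimal sketch shows one possible cleaning step using only Python's standard library. The format of FSN's scraped data is not specified here, so the assumption that it arrives as raw HTML strings, as well as the class and function names, are illustrative only.

```python
import re
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text content while skipping script, style and noscript blocks."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(raw_html):
    """Return the visible text of an HTML page with whitespace collapsed."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


# Hypothetical scraped page (not real data).
raw = (
    "<html><head><style>p{color:red}</style><script>track()</script></head>"
    "<body><h1>Acme  Pumps</h1><p>Industrial pumps\n for utilities.</p></body></html>"
)
cleaned = clean_html(raw)
print(cleaned)
```

Navigation menus, cookie banners and similar page furniture would still pass through this step and would need additional, site-aware filtering.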

Tools

  • Python, Langchain, Google Search API, GPTX-API, Hugging Face, GCP, GitHub, Dash Plotly, Docker