Deep Learning for de novo peptide sequencing
Project Description
The three major classes of molecules of life are DNA, RNA, and proteins. Unlike for DNA and RNA, there is, to date, no accurate, high-throughput sequencing technology for proteins. The closest technology is tandem mass spectrometry, which yields mass spectra of protein fragments called peptides. Highly accurate de novo peptide sequencing (DNPS), i.e., determining peptide amino acid sequences solely from tandem mass spectra, will make proteomics amenable to applications including genotyping, cancer surveillance, pathogen surveillance, immuno-oncology, metagenomics, and paleogenomics.
In this project, we developed Koina, an open, containerized, web-accessible service that standardizes and accelerates access to peptide property prediction models. On top of Koina, we created Oktoberfest, a search-engine-agnostic Python package that generates spectral libraries and rescores peptide-spectrum matches. Building on these foundations, we introduced Spectralis, a DNPS method for tandem mass spectrometry. Spectralis features a convolutional neural network layer that connects spectrum peaks spaced by amino acid masses, proposes classifications of fragment ion series, and provides a peptide-spectrum confidence score. Together, these components improve peptide recall over initial predictions and make state-of-the-art DNPS accessible and scalable.
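The idea of connecting peaks spaced by amino acid masses can be illustrated with a minimal sketch. This is not the Spectralis network itself: it is a plain pairwise search, assuming singly charged fragments, a hypothetical function name `link_peaks`, an arbitrary tolerance of 0.02 Da, and only a subset of the standard monoisotopic residue masses.

```python
from itertools import combinations

# Monoisotopic residue masses in Da (standard values; subset for brevity).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
}


def link_peaks(mz_values, tol=0.02):
    """Return (i, j, residue) for peak pairs spaced by one residue mass.

    Indices refer to the peaks sorted by m/z. Assumes singly charged
    fragments, so an m/z difference equals a mass difference.
    """
    peaks = sorted(mz_values)
    links = []
    for i, j in combinations(range(len(peaks)), 2):
        delta = peaks[j] - peaks[i]
        for residue, mass in RESIDUE_MASSES.items():
            if abs(delta - mass) <= tol:
                links.append((i, j, residue))
    return links


# Three illustrative (made-up) peaks: the first two are one alanine apart,
# the last two one glycine apart.
example_links = link_peaks([175.119, 246.156, 303.178])
```

Chains of such links correspond to candidate fragment ion series. A learned layer can weigh these connections against peak intensities and noise instead of applying a hard tolerance as done here.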
Results
- Developed Koina, an open-source, containerized, decentralized, and online-accessible high-performance prediction service.
- Used FragPipe as an example to demonstrate Koina integration with existing proteomics software tools.
- Packaged our spectral library generation and rescoring pipeline as Oktoberfest, an open-source Python package.
- Demonstrated that Oktoberfest improves rescoring analyses in two distinct use cases.
- Oktoberfest is freely available on GitHub.
- Introduced Spectralis, a de novo peptide sequencing method for tandem mass spectrometry.
- Spectralis achieved 40% sensitivity at 90% precision, nearly double the sensitivity of the state of the art.
- Application to unidentified spectra confirmed this advantage and showcased the applicability of Spectralis to variant calling.
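As a reading aid for the "sensitivity at 90% precision" figure, the following minimal sketch shows one common way such a number can be computed from confidence scores. The definitions are assumptions, not taken from the publications: precision is the fraction of accepted predictions that are correct, sensitivity is the fraction of all spectra with a correct accepted prediction, and ties between scores are ignored. The function name `recall_at_precision` is hypothetical.

```python
def recall_at_precision(scores, is_correct, min_precision=0.9):
    """Highest recall at any score cutoff whose precision >= min_precision."""
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, is_correct), key=lambda pair: -pair[0])
    total = len(ranked)
    best_recall = 0.0
    correct = 0
    # Accept the top-k predictions for k = 1..total and track both metrics.
    for accepted, (_, ok) in enumerate(ranked, start=1):
        correct += bool(ok)
        if correct / accepted >= min_precision:
            best_recall = max(best_recall, correct / total)
    return best_recall


# Toy data (made up): five spectra, three of them sequenced correctly.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_correct = [True, True, False, True, False]
toy_recall = recall_at_precision(toy_scores, toy_correct, min_precision=0.9)
```

In the toy data, only the two highest-scoring predictions can be accepted without dropping below 90% precision, so two of five spectra count as correctly sequenced.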
Follow-up
Postdoc Joel Lapin joined Computational Mass Spectrometry and is working on alternative encodings of tandem mass spectra (Lapin et al. 2025) in collaboration with researchers from Applied Systems Biology at KTH (Royal Institute of Technology in Stockholm).
Participation in the “OpenMS Summer 2024 Fellowship” resulted in hosting a PhD student from Adrem Data Lab at the University of Antwerp. This led to a collaboration on a novel benchmark platform for DNPS tools (Pominova et al. 2026).
Another goal of our labs is to develop tools for DNPS of post-translationally modified peptides, including support for chimeric spectra (spectra containing fragments of multiple peptides). To this end, we are generalizing the underlying algorithms, including the transformer Casanovo (Straub et al. 2025), to predict modified residues (Klaproth-Andrade et al. 2026). For validation, we will further improve Prosit (Gabriel et al. 2025), a foundation for Spectralis.
Publications
Proceedings of the EuBIC-MS developers meeting 2023; Pedro Beltrao, Tim Van Den Bossche, Ralf Gabriels, Tanja Holstein, Tobias Kockmann, Alireza Nameni, Christian Panse, Ralph Schlapbach, Ludwig Lautenbacher, Matthias Mattanovich, Alexey Nesvizhskii, Bart Van Puyvelde, Jonas Scheid, Veit Schwämmle, Maximilian Strauss, Anna Klimovskaia Susmelj, Matthew The, Henry Webel, Mathias Wilhelm, Dirk Winkelhardt, Witold E. Wolski, Muyao Xi; Journal of Proteomics (July 2024), https://doi.org/10.1016/j.jprot.2024.105246
Koina: Democratizing machine learning for proteomics research; Ludwig Lautenbacher, Kevin L. Yang, Tobias Kockmann, Christian Panse, Matthew Chambers, Elias Kahl, Fengchao Yu, Wassim Gabriel, Dulguun Bold, Tobias Schmidt, Kai Li, Brendan MacLean, Alexey I. Nesvizhskii, Mathias Wilhelm; bioRxiv (June 2024), https://doi.org/10.1101/2024.06.01.596953
Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing; Daniela Klaproth-Andrade, Johannes Hingerl, Yanik Bruns, Nicholas H. Smith, Jakob Träuble, Mathias Wilhelm, Julien Gagneur; Nature Communications 15, 151 (January 2024), https://doi.org/10.1038/s41467-023-44323-7
Oktoberfest: Open-source spectral library generation and rescoring pipeline based on Prosit; Mario Picciani, Wassim Gabriel, Victor-George Giurcoiu, Omar Shouman, Firas Hamood, Ludwig Lautenbacher, Cecilia Bang Jensen, Julian Müller, Mostafa Kalhor, Armin Soleymaniniya, Bernhard Kuster, Matthew The, Mathias Wilhelm; Proteomics (September 2023), https://doi.org/10.1002/pmic.202300112
Activities
- Koina collaboration visit by Tobias Kockmann and Christian Panse at Computational Mass Spectrometry
- Ludwig Lautenbacher attending BSPR/EuPA 2023 Conference,
  “Koina: Bringing machine learning to the community” (talk)
- Wassim Gabriel attending EuroBioC 2023,
  “Accessing and using a European prediction service for biological data” (talk)
- Ludwig Lautenbacher & Mathias Wilhelm attending Annual Conference of the DGMS 2024,
  “Koina: Bringing machine learning to the community” (poster),
  “Prosit, Koina, and Oktoberfest: Deep-learning for proteomics research at your fingertips” (workshop)
- Daniela Klaproth-Andrade Salazar, Ludwig Lautenbacher, Mathias Wilhelm, Yanik Bruns, and Mario Picciani attending Annual Conference of the ASMS 2024,
  “Improving de novo peptide sequencing for post-translationally modified peptides” (talk),
  “De novo peptide sequencing breakthroughs and challenges” (panel discussion at workshop),
  “De novo sequencing of multiple peptides in chimeric mass spectra” (poster),
  “Koina: Bringing machine learning to the community” (talk),
  “Oktoberfest: search engine agnostic rescoring pipeline leveraging online peptide property prediction from various models” (poster)
- Marina Pominova from Adrem Data Lab, University of Antwerp, visiting Computational Mass Spectrometry
- Since 11/2023: bimonthly Gagneur-Wilhelm lab meetings to foster bilateral exchange and joint research.
Team
Yanik Bruns, Computational Molecular Medicine
Wassim Gabriel, Computational Mass Spectrometry
Mario Picciani, Computational Mass Spectrometry
Tobias Kockmann, Functional Genomics Center Zurich (FGCZ) - University of Zurich | ETH Zurich
Christian Panse, Functional Genomics Center Zurich (FGCZ) - University of Zurich | ETH Zurich, Swiss Institute of Bioinformatics (SIB)
Joel Lapin, Computational Mass Spectrometry