Jeroen Van Goey

Staff Research Engineer in BioAI building generative models for biology

Machine learning research engineer, fascinated by the intersection of AI and science. I bridge scientific ML, biological domain knowledge, production-quality software, and team leadership.

Currently looking for Staff/Principal Research Engineer roles in AI for science (biology, proteomics, drug discovery, protein design, foundation models).

  • Published InstaNovo (a transformer) and InstaNovo+ (a diffusion model) for de novo peptide sequencing in Nature Machine Intelligence.
  • Led BioAI teams working on peptide sequencing and signal-peptide design.
  • Built and shipped production-grade ML systems across GPUs, HPC, cloud, and scientific data pipelines.

What I work on

I build and ship machine-learning systems for science: generative sequence models, calibrated and reliable predictions, and the large-scale data and training behind them. My main application is de novo proteomics: reading protein and peptide sequences directly from raw mass spectra.

  • Generative sequence models: transformer and diffusion models that decode sequences from noisy scientific signals (InstaNovo / InstaNovo+), top-ranked across an independent benchmark of 17 tools on 83 datasets.
  • Reliable, calibrated predictions: rescoring, calibration and false-discovery-rate control so model outputs carry statistical guarantees (Winnow).
  • Large-scale data & training: curating large public datasets with automated LLM labelling, and distributed training across GPUs, HPC and cloud.
  • Research to production: turning models into reusable, production-grade libraries and an end-to-end pipeline (prediction, scoring, quantification, structural assembly).

Publications

InstaNovo figure 1
K. Eloff, K. Kalogeropoulos, A. Mabona, O. Morell, R. Catzel, E. Rivera-de-Torre, J. Berg Jespersen, W. Williams, S. P. B. van Beljouw, M. J. Skwark, A. H. Laustsen, S. J. J. Brouns, A. Ljungars, E. M. Schoof, J. Van Goey, U. auf dem Keller, K. Beguir, N. Lopez Carranza, T. P. Jenkins
Nature Machine Intelligence · 2025 · 7(4):565–579
InstaNovo-P figure 1
J. Lauridsen, P. Ramasamy, R. Catzel, V. Canbay, A. Mabona, K. Eloff, P. Fullwood, J. Ferguson, A. Kirketerp-Møller, I. S. Goldschmidt, T. Claeys, S. van Puyenbroeck, N. Lopez Carranza, E. M. Schoof, L. Martens, J. Van Goey, C. Francavilla, T. P. Jenkins, K. Kalogeropoulos
bioRxiv (preprint) · 2025
Winnow figure 1
A. Mabona, J. Daniel, H. S. J. Knudsen, R. Catzel, K. M. Eloff, E. M. Schoof, N. Lopez Carranza, T. P. Jenkins, J. Van Goey, K. Kalogeropoulos
arXiv (preprint) · 2025
InstaNexus figure 1
M. Reverenna, M. Wennekers Nielsen, D. S. Wolff, J. Daniel, E. Lytra, S. Thumtecho, P. D. Colaianni, A. Ljungars, A. H. Laustsen, E. M. Schoof, J. Van Goey, T. P. Jenkins, M. V. Lukassen, A. Santos, K. Kalogeropoulos
Molecular & Cellular Proteomics · 2026 · 25(4):101547
Database search framework figure 1
K. Kalogeropoulos, J. Van Goey, T. P. Jenkins, K. M. Eloff
Journal of Proteome Research · 2026 · 25(5):2234–2242
Open-source and FAIR research software for proteomics figure 1
Y. Perez-Riverol, W. Bittremieux, W. S. Noble, L. Martens, A. Bilbao, M. R. Lazear, B. Grüning, D. S. Katz, M. J. MacCoss, C. Dai, J. K. Eng, R. Bouwmeester, M. R. Shortreed, E. Audain, T. Sachsenberg, J. Van Goey, G. Wallmann, B. Wen, L. Käll, W. E. Fondrie
Journal of Proteome Research · 2025 · 24(5):2222–2234
afkSNP poster
J. Van Goey, H. Pouseele, P. Supply, S. Niemann
Conference poster · Benelux Bioinformatics Conference 2015

InstaNovo in the news

The InstaNovo paper drew broad attention across the scientific press.

More coverage: press releases, blogs & social

Institutional press releases: InstaDeep · DTU · Science News Denmark (Novo Nordisk Foundation)

Blogs & newsletters: Plenty of Room · Proteomics News

Community: r/proteomics · r/massspectrometry

On X: InstaDeep · viral thread (149.8K views) · Chemistry World · A. Laustsen · T. P. Jenkins · J. Bravo-Abad

On LinkedIn: InstaNovo · Proteomics & Immunopeptidomics · M. Busch · A. Laustsen

After publication, we released the next generation of InstaNovo models, described in the post "Introducing the next generation of InstaNovo models" (announcement on X · on LinkedIn).

More at the InstaNovo project website and its blog.


Projects

Awesome De Novo Peptide Sequencing

A comprehensive, interactive map of the field: algorithms, post-processors, downstream applications, and adjacent tools, deep-learning and classical alike.


Experience

Location & availability: I moved from Belgium to South Africa with my family in 2023, and we plan to relocate back to Europe around mid-2027. I'm open to remote-first roles now, and to hybrid or on-site roles in Europe from 2027.

InstaDeep

Staff Research Engineer in BioAI
July 2024 – Present
Cape Town, South Africa
Team lead and hiring manager for two cross-disciplinary BioAI research teams: de novo peptide sequencing (in collaboration with the Technical University of Denmark) and signal-peptide design for secretion efficiency in immunotherapy and vaccines (with BioNTech). Responsible for recruitment and team expansion across the Cape Town and Kigali offices.

Senior BioAI Software Engineer
August 2022 – July 2024
Cape Town, South Africa
Designed, implemented and delivered performant, scalable protein-design libraries (Transformer-based, written in Python / PyTorch) running compute-intensive workloads on large, complex data across GPUs, HPC supercomputers and the cloud.

Barco

Sr. Software Development Engineer, Machine Learning
February 2020 – August 2022
Kortrijk, Belgium
Built a TensorFlow Extended production pipeline training deep-learning models on multispectral images to detect and classify melanoma skin cancers, working across the research, cloud-backend and mobile/web frontend teams.

BASF

Bioinformatics Researcher, Manager of the Python & R Platforms
August 2018 – January 2020
Zwijnaarde, Belgium
Owned the Python/R data-analysis platform used by 480 researchers and data scientists: technical support, proactive monitoring and root-cause analysis, training (NumPy, pandas, BioPython, …) and mentoring across the scientific community.

Bayer Crop Science

Bioinformatics Researcher & Python Platform Manager
February 2018 – July 2018
Zwijnaarde, Belgium
Same platform role, prior to the acquisition by BASF Agricultural Solutions.

Applied Maths, a bioMérieux company

Bioinformatics Software Developer
September 2011 – January 2018
Sint-Martens-Latem, Belgium
Worked on BioNumerics, a bioinformatics suite for integrated analysis of biological data, writing custom Python scripts and extensions for clients in the clinical, academic and government sectors.


Education

Hogeschool West-Vlaanderen (Howest)
2018–2019
Microdegree, Machine Learning & Deep Learning

Katholieke Universiteit Leuven
2010–2012
Bioinformatics, partial credits obtained (M.Sc. not completed)

University of Antwerp
1997–2004
M.Sc., Biology


Hackathons

I have co-created, organised and judged machine-learning hackathons at the Deep Learning Indaba and its regional IndabaX events.

3BPA (3-(benzyloxy)pyridin-2-amine) molecule
IndabaX 2025 · Stellenbosch University, South Africa

Training a Machine Learning Interatomic Potential (MLIP) model (predicting the energy and forces of atomic systems) with the mlip library, using the 3BPA molecule (3-(benzyloxy)pyridin-2-amine) as the benchmark system.

Snakes and Sequences hackathon
Deep Learning Indaba 2024 · Senegal

Using InstaNovo and de novo peptide sequencing to characterise African snake venoms for better antivenoms.

Desert locust breeding ground prediction hackathon
IndabaX 2024 · University of the Witwatersrand (Wits), Johannesburg

Predicting desert locust breeding grounds from remote-sensing data, with teams competing on the Zindi leaderboard.


Skills

Machine learning
Transformers, diffusion models, large language models (LLMs), supervised & self-supervised learning

Frameworks
PyTorch, TensorFlow / TFX, NumPy, pandas

Scale & MLOps
Distributed GPU training, HPC clusters (on-premises and the MareNostrum 5 supercomputer), cloud (AWS), Docker, CI/CD

Bioinformatics
BioPython, BLAST, ClustalW, Snakemake, BioNumerics

Languages
Python (primary), R

Just for fun

Project Euler profile: Solved 68, Level 2

My solutions to Project Euler, a series of challenging mathematical and computer-programming problems.

Year   Days completed   Stars
2024    5               10
2023    2                5
2022    6               12
2021    8               17
2020    8               18
2019    6               14
2018    5               10
2017    6               13
2016   15               30
2015   12               27

My solutions to Advent of Code, the annual December programming-puzzle event (156 stars across 2015–2024).

Just another genome hacker. 🧬