Jeroen Van Goey

Staff Research Engineer in BioAI building generative models for biology

Machine learning research engineer, fascinated by the intersection of AI and science. I bridge scientific ML, biological domain knowledge, production-quality software, and team leadership.

Currently looking for Staff/Principal Research Engineer roles in AI for science (biology, proteomics, drug discovery, protein design, foundation models).

  • Published InstaNovo (a transformer) and InstaNovo+ (a diffusion model) for de novo peptide sequencing in Nature Machine Intelligence.
  • Led BioAI teams working on peptide sequencing and signal-peptide design.
  • Built and shipped production-grade ML systems across GPUs, HPC, cloud, and scientific data pipelines.

What I work on

I build and ship machine-learning systems for science: generative sequence models, calibrated and reliable predictions, and the large-scale data and training behind them. My main application is de novo peptide sequencing: predicting peptide sequences directly from raw mass spectra.

  • Generative sequence models: transformer and diffusion models that decode sequences from noisy scientific signals (InstaNovo / InstaNovo+), top-ranked across an independent benchmark of 17 tools on 83 datasets.
  • Reliable, calibrated predictions: rescoring, calibration and false-discovery-rate control so model outputs carry statistical guarantees.
  • Large-scale data & training: curating large public datasets with automated LLM labelling, and distributed training across GPUs, HPC and cloud.
  • Research to production: turning models into reusable, production-grade libraries and an end-to-end pipeline (prediction, scoring, quantification, structural assembly).

Publications

* co-first author · co-senior author

InstaNovo
K. Eloff*, K. Kalogeropoulos*, J. Van Goey, T. P. Jenkins
Nature Machine Intelligence · 2025 · 7(4):565–579
InstaNovo-P
J. Lauridsen*, P. Ramasamy*, J. Van Goey, K. Kalogeropoulos
bioRxiv (preprint) · 2025
Winnow
A. Mabona*, J. Daniel*, J. Van Goey, K. Kalogeropoulos
arXiv (preprint) · 2025
InstaNexus
M. Reverenna, M. Wennekers Nielsen, J. Van Goey, K. Kalogeropoulos
Molecular & Cellular Proteomics · 2026 · 25(4):101547
Database search framework
K. Kalogeropoulos, J. Van Goey, T. P. Jenkins, K. M. Eloff
Journal of Proteome Research · 2026 · 25(5):2234–2242
Open-source and FAIR research software for proteomics
Y. Perez-Riverol, W. Bittremieux, J. Van Goey, W. E. Fondrie
Journal of Proteome Research · 2025 · 24(5):2222–2234
afkSNP poster
J. Van Goey, H. Pouseele, P. Supply, S. Niemann
Conference poster · Benelux Bioinformatics Conference 2015

InstaNovo in the news

The InstaNovo paper drew broad attention across the scientific press.

More coverage: follow-up, press releases, blogs & social

Institutional press releases: InstaDeep · DTU · Science News Denmark (Novo Nordisk Foundation)

Blogs & newsletters: Plenty of Room · Proteomics News

Video: Revolutionary AI Tool InstaNovo Redefines Protein Sequencing

Community: r/proteomics · r/massspectrometry

On X: InstaDeep · viral thread (149.8K views) · Chemistry World · A. Laustsen · T. P. Jenkins · J. Bravo-Abad

On LinkedIn: InstaNovo · Proteomics & Immunopeptidomics · M. Busch · A. Laustsen


Experience

Location & availability: I moved from Belgium to South Africa with my family in 2023, and we plan to relocate back to Europe around mid-2027. I'm open to remote-first roles now, and to hybrid or on-site roles in Europe from mid-2027.

InstaDeep

Staff Research Engineer in BioAI
August 2022 – Present
Cape Town, South Africa

  • Tech lead and hiring manager for two cross-disciplinary ML research teams: de novo peptide sequencing (with the Technical University of Denmark) and signal-peptide design for secretion efficiency (with BioNTech).
  • Set research direction and take models from research to production, balancing scientific rigour with reliable, maintainable engineering.
  • Lead the BioAI department of InstaDeep’s Cape Town office and serve as the office’s site manager; responsible for hiring and growing the BioAI teams across the Cape Town and Kigali offices.
  • Built the team’s ML engineering foundations: scalable, Transformer-based libraries (Python / PyTorch) for large-scale training across GPUs, HPC supercomputers (MareNostrum 5), and the cloud.

Barco

Senior Software Development Engineer, Machine Learning
February 2020 – August 2022
Kortrijk, Belgium
Built a TensorFlow Extended production pipeline training deep-learning models on multispectral images to detect and classify melanoma skin cancers for Demetra, Barco’s dermatology imaging device. Engineered the pipeline with end-to-end data and model lineage tracking to deliver the traceability and reproducibility that regulated medical industries (FDA approval) require. Worked across the research, cloud-backend and mobile/web frontend teams.

BASF

Bioinformatics Researcher, Manager of the Python & R Platforms
August 2018 – January 2020
Zwijnaarde, Belgium
Owned the Python/R data-analysis platform used by 480 researchers and data scientists: technical support, proactive monitoring and root-cause analysis, training (NumPy, pandas, BioPython, …) and mentoring across the scientific community.

Bayer Crop Science

Bioinformatics Researcher & Python Platform Manager
February 2018 – July 2018
Zwijnaarde, Belgium
Same platform role, prior to acquisition by BASF Agricultural Solutions.

Applied Maths, a bioMérieux company

Bioinformatics Software Developer
September 2011 – January 2018
Sint-Martens-Latem, Belgium
Worked on BioNumerics, a bioinformatics suite for integrated analysis of biological data, writing custom Python scripts and extensions for clients in the clinical, academic and government sectors.


Education

Hogeschool West-Vlaanderen (Howest)
2018–2019
Microdegree, Machine Learning & Deep Learning

Katholieke Universiteit Leuven
2010–2012
Bioinformatics (partial credits obtained)

University of Antwerp
1997–2004
M.Sc., Biology


Projects

Awesome De Novo Peptide Sequencing

A comprehensive, interactive map of the field: algorithms, post-processors, downstream applications, and adjacent tools, deep-learning and classical alike.


Hackathons

I have co-created, organized and judged machine-learning hackathons at the Deep Learning Indaba and its regional IndabaX events.

3BPA (3-(benzyloxy)pyridin-2-amine) molecule
IndabaX 2025 · Stellenbosch University, South Africa

Training a Machine Learning Interatomic Potential (MLIP) model (predicting the energy and forces of atomic systems) with the mlip library, using the 3BPA molecule (3-(benzyloxy)pyridin-2-amine) as the benchmark system.

Snakes and Sequences hackathon
Deep Learning Indaba 2024 · Senegal

Using InstaNovo and de novo peptide sequencing to characterise African snake venoms for better antivenoms.

Desert locust breeding ground prediction hackathon
IndabaX 2024 · University of the Witwatersrand, Johannesburg, South Africa

Predicting desert locust breeding grounds from remote-sensing data, an early-warning task that helps target control efforts before swarms threaten crops and food security across Africa.

Unveiling Cassava's Secrets genomics hackathon
Deep Learning Indaba 2023 · Ghana

Probing the genome of cassava, a staple crop for food security across Africa, using AgroNT, a variant of InstaDeep's Nucleotide Transformer trained on edible plant genomes.


Teaching & mentoring

Machine Learning for Biology practical
2023
Deep Learning Indaba, Ghana
Mentored the hands-on practical alongside InstaDeep and Google DeepMind colleagues.


Skills

Machine learning
Transformers, diffusion models, large language models (LLMs), supervised & self-supervised learning

Frameworks
PyTorch, TensorFlow / TFX, NumPy, pandas

Scale & MLOps
Distributed GPU training, HPC clusters (on-premise and the MareNostrum 5 supercomputer), cloud (AWS), Docker, CI/CD

Bioinformatics
BioPython, BLAST, ClustalW, Snakemake, BioNumerics

Languages
Python (primary), R

Just for fun

Project Euler profile: Solved 68, Level 2

My solutions to Project Euler, a series of challenging mathematical and computer-programming problems.

Year   Days completed   Stars
2024    5               10
2023    2                5
2022    6               12
2021    8               17
2020    8               18
2019    6               14
2018    5               10
2017    6               13
2016   15               30
2015   12               27

My solutions to Advent of Code, the annual December programming-puzzle event (156 stars across 2015–2024).

Just another genome hacker. 🧬