Jeroen Van Goey

Staff Research Engineer in BioAI building generative models for biology

Machine learning research engineer, fascinated by the intersection of AI and science. I bridge scientific ML, biological domain knowledge, production-quality software, and team leadership.

Currently looking for Staff/Principal Research Engineer roles in AI for science (biology, proteomics, drug discovery, protein design, foundation models).

  • Published InstaNovo (a transformer) and InstaNovo+ (a diffusion model) for de novo peptide sequencing in Nature Machine Intelligence.
  • Led BioAI teams working on peptide sequencing and signal-peptide design.
  • Built and shipped production-grade ML systems across GPUs, HPC, cloud, and scientific data pipelines.

What I work on

I build and ship machine-learning systems for science. My main application is de novo peptide sequencing: generative models that read peptide sequences directly from raw mass spectra, the confidence estimation that makes those predictions trustworthy, and the proteome-scale analysis they feed.

Research framework
1. Spectrum → Peptide: De novo sequencing with transformer & diffusion models (InstaNovo / InstaNovo+)
2. Peptide → Confidence: Rescoring, calibration & false-discovery-rate control (Winnow)
3. Peptides → Proteome: Quantification & structural assembly into biological insight (InstaNexus)
Generative sequence models and scalable ML, from raw spectra to biology
  • Independently validated: top-ranked across an external benchmark of 17 de novo sequencing tools on 83 datasets.
  • Built for scale: distributed training on large public datasets (~63 million spectra) curated with automated LLM labelling.

Publications

8 publications, including in Nature Machine Intelligence.

* co-first author · † co-senior author

InstaNovo
K. Eloff*, K. Kalogeropoulos*, J. Van Goey, T. P. Jenkins
Nature Machine Intelligence · 2025 · 7(4):565–579

InstaNovo-P
J. Lauridsen*, P. Ramasamy*, J. Van Goey, K. Kalogeropoulos†
bioRxiv (preprint) · 2025

Winnow
A. Mabona*, J. Daniel*, J. Van Goey, K. Kalogeropoulos
arXiv (preprint) · 2025

InstaNexus
M. Reverenna, M. Wennekers Nielsen, J. Van Goey, K. Kalogeropoulos†
Molecular & Cellular Proteomics · 2026 · 25(4):101547

Database search framework
K. Kalogeropoulos, J. Van Goey, T. P. Jenkins, K. M. Eloff
Journal of Proteome Research · 2026 · 25(5):2234–2242

Open-source and FAIR research software for proteomics
Y. Perez-Riverol, W. Bittremieux, J. Van Goey, W. E. Fondrie
Journal of Proteome Research · 2025 · 24(5):2222–2234

afkSNP (poster)
J. Van Goey, H. Pouseele, P. Supply, S. Niemann
Conference poster · Benelux Bioinformatics Conference 2015

InstaNovo in the news

The InstaNovo paper drew broad attention across the scientific press.

More coverage: follow-up, press releases, blogs & social

Institutional press releases: InstaDeep · DTU · Science News Denmark (Novo Nordisk Foundation)

Blogs & newsletters: Plenty of Room · Proteomics News

Video: Revolutionary AI Tool InstaNovo Redefines Protein Sequencing

Community: r/proteomics · r/massspectrometry

On X: InstaDeep · viral thread (149.8K views) · Chemistry World · A. Laustsen · T. P. Jenkins · J. Bravo-Abad

On LinkedIn: InstaNovo · Proteomics & Immunopeptidomics · M. Busch · A. Laustsen


Experience

Location & availability: I moved from Belgium to South Africa with my family in 2023, and we plan to relocate back to Europe around mid-2027. I'm open to remote-first roles now, and to hybrid or on-site roles in Europe from 2027.

InstaDeep

Staff Research Engineer in BioAI
August 2022 – Present
Cape Town, South Africa

  • Tech lead and hiring manager for two cross-disciplinary ML research teams of about 5 to 8 people: de novo peptide sequencing (with the Technical University of Denmark) and signal-peptide design for secretion efficiency (with BioNTech).
  • Delivered a client collaboration with Syngenta, applying genomic language models (AgroNT, a Nucleotide Transformer trained on ~10.5 million genomic sequences spanning trillions of base pairs across 48 plant species) to accelerate crop trait research.
  • Set research direction and take models from research to production, balancing scientific rigour with reliable, maintainable engineering.
  • Lead the BioAI department of InstaDeep’s Cape Town office and serve as the office’s site manager (an office of about 25 people); responsible for hiring and growing the BioAI teams across the Cape Town and Kigali offices.
  • Directed the team’s ML engineering foundations: scalable, Transformer-based libraries (Python / PyTorch) for large-scale training on InstaDeep’s Kyber (~500 PFLOPs) and EuroHPC’s MareNostrum 5 (260 PFLOPs) supercomputers, and the cloud (AWS, GCP).

Barco

Senior Software Development Engineer, Machine Learning
February 2020 – August 2022
Kortrijk, Belgium
Built a TensorFlow Extended production pipeline (orchestrated with Apache Airflow) training deep-learning models on multispectral images to detect and classify melanoma skin cancers for Demetra, Barco’s dermatology imaging device. Engineered the pipeline with end-to-end data and model lineage tracking to deliver the traceability and reproducibility that regulated medical industries (FDA approval) require. Worked across the research, cloud-backend and mobile/web frontend teams.

BASF

Bioinformatics Researcher, Manager of the Python & R Platforms
August 2018 – January 2020
Zwijnaarde, Belgium
Owned the Python/R data-analysis platform used by 480 researchers and data scientists: technical support, proactive monitoring and root-cause analysis, training (NumPy, pandas, BioPython, …) and mentoring across the company’s internal research community.

Bayer Crop Science

Bioinformatics Researcher & Python Platform Manager
February 2018 – July 2018
Zwijnaarde, Belgium
Same platform role, prior to acquisition by BASF Agricultural Solutions.

Applied Maths (acquired by bioMérieux)

Bioinformatics Software Developer
September 2011 – January 2018
Sint-Martens-Latem, Belgium
Worked on BioNumerics, a bioinformatics suite for integrated analysis of biological data, writing custom Python scripts and extensions for clients in the clinical, academic and government sectors.


Education

Hogeschool West-Vlaanderen (Howest)
2018–2019
Microdegree, Machine Learning & Deep Learning

Katholieke Universiteit Leuven
2010–2012
Bioinformatics (partial credits obtained)

University of Antwerp
1997–2004
M.Sc., Biology


Projects

Awesome De Novo Peptide Sequencing

A comprehensive, interactive map of the field: algorithms, post-processors, downstream applications, and adjacent tools, deep-learning and classical alike.


Hackathons

I have co-created, organized and judged machine-learning hackathons at the Deep Learning Indaba and its regional IndabaX events.

3BPA (3-(benzyloxy)pyridin-2-amine) molecule
IndabaX 2025 · Stellenbosch University, South Africa

Training a Machine Learning Interatomic Potential (MLIP) model (predicting the energy and forces of atomic systems) with the mlip library, using the 3BPA molecule (3-(benzyloxy)pyridin-2-amine) as the benchmark system.

Snakes and Sequences hackathon
Deep Learning Indaba 2024 · Senegal

Using InstaNovo and de novo peptide sequencing to characterise African snake venoms for better antivenoms.

Desert locust breeding ground prediction hackathon
IndabaX 2024 · University of the Witwatersrand, Johannesburg, South Africa

Predicting desert locust breeding grounds from remote-sensing data, an early-warning task that helps target control efforts before swarms threaten crops and food security across Africa.

Unveiling Cassava's Secrets genomics hackathon
Deep Learning Indaba 2023 · Ghana

Probing the genome of cassava, a staple crop for food security across Africa, using AgroNT, a variant of InstaDeep's Nucleotide Transformer trained on edible plant genomes.


Teaching & mentoring

Machine Learning for Biology practical
2023
Deep Learning Indaba, Ghana
Mentored the hands-on practical alongside InstaDeep and Google DeepMind colleagues.

Scientific Python training & mentoring
2018–2020
BASF and Bayer Crop Science, Belgium
Trained and mentored researchers and data scientists in scientific Python (NumPy, pandas, BioPython, …) as manager of the Python/R analysis platform used across the company’s internal research community.


Skills

Machine learning
Transformers, diffusion models, large language models (LLMs), supervised & self-supervised learning

Frameworks
PyTorch, TensorFlow / TFX, NumPy, pandas

Scale & MLOps
Distributed GPU training, HPC clusters (InstaDeep’s Kyber and EuroHPC’s MareNostrum 5), Kubernetes-based ML platforms (AIchor), cloud (AWS, GCP), Docker, CI/CD, workflow orchestration (Airflow, Dagster), experiment tracking (MLflow, Neptune, TensorBoard)

Bioinformatics
BioPython, BLAST, ClustalW, Snakemake, BioNumerics

Languages
Python (primary), R

Just for fun

Project Euler profile: Solved 68, Level 2

My solutions to Project Euler, a series of challenging mathematical and computer-programming problems.

Year   Days completed   Stars
2024    5               10
2023    2                5
2022    6               12
2021    8               17
2020    8               18
2019    6               14
2018    5               10
2017    6               13
2016   15               30
2015   12               27

My solutions to Advent of Code, the annual December programming-puzzle event (156 stars across 2015–2024).

Just another genome hacker. 🧬