Jeroen Van Goey

Staff Research Engineer in BioAI building generative models for biology

Machine learning research engineer, fascinated by the intersection of AI and science. I bridge scientific ML, biological domain knowledge, production-quality software, and team leadership.

Currently looking for Staff/Principal Research Engineer roles in AI for science (biology, proteomics, drug discovery, protein design, foundation models).

  • Published InstaNovo (a transformer) and InstaNovo+ (a diffusion model) for de novo peptide sequencing in Nature Machine Intelligence.
  • Lead BioAI teams working on peptide sequencing and signal-peptide design.
  • Built and shipped production-grade ML systems across GPUs, HPC, cloud, and scientific data pipelines.

What I work on

I build and ship machine-learning systems for science. My main application is de novo peptide sequencing: generative models that read peptide sequences directly from raw mass spectra, the confidence estimation that makes those predictions trustworthy, and the proteome-scale analysis they feed.

Research framework
  1. Spectrum → Peptide: De novo sequencing with transformer & diffusion models (InstaNovo / InstaNovo+)
  2. Peptide → Confidence: False-discovery-rate control, rescoring and calibration using PyTorch (Winnow)
  3. Peptides → Proteome: From quantification & de Bruijn graph assembly to biological insight (InstaNexus)
Generative sequence models and scalable ML, from raw spectra to biology
  • Independently validated: top-ranked across an external benchmark of 17 de novo sequencing tools on 83 datasets.
  • Built for scale: distributed training on large public datasets (~63 million spectra) curated with automated LLM labelling.

Publications

8 publications, including in Nature Machine Intelligence.

* co-first author · † co-senior author

InstaNovo-P figure 1
J. Lauridsen*, P. Ramasamy*, J. Van Goey, K. Kalogeropoulos†
bioRxiv (preprint) · 2025
Winnow figure 1
A. Mabona*, J. Daniel*, J. Van Goey, K. Kalogeropoulos
arXiv (preprint) · 2025
InstaNexus figure 1
M. Reverenna, M. Wennekers Nielsen, J. Van Goey, K. Kalogeropoulos†
Molecular & Cellular Proteomics · 2026 · 25(4):101547
Database search framework figure 1
K. Kalogeropoulos, J. Van Goey, T. P. Jenkins, K. M. Eloff
Journal of Proteome Research · 2026 · 25(5):2234–2242
Open-source and FAIR research software for proteomics figure 1
Y. Perez-Riverol, W. Bittremieux, J. Van Goey, W. E. Fondrie
Journal of Proteome Research · 2025 · 24(5):2222–2234
afkSNP poster
J. Van Goey, H. Pouseele, P. Supply, S. Niemann
Conference poster · Benelux Bioinformatics Conference 2015

InstaNovo in the news

The InstaNovo paper drew broad attention across the scientific press.

More coverage: follow-up, press releases, blogs & social

Institutional press releases: InstaDeep · DTU · Science News Denmark (Novo Nordisk Foundation)

Blogs & newsletters: Plenty of Room · Proteomics News

Video: Revolutionary AI Tool InstaNovo Redefines Protein Sequencing

Community: r/proteomics · r/massspectrometry

On X: InstaDeep · viral thread (149.8K views) · Chemistry World · A. Laustsen · T. P. Jenkins · J. Bravo-Abad

On LinkedIn: InstaNovo · Proteomics & Immunopeptidomics · M. Busch · A. Laustsen

Follow-up:


Experience

Location & availability: I moved from Belgium to South Africa with my family in 2023, and we plan to relocate back to Europe around mid-2027. I'm therefore open to remote-first roles now, and to hybrid or on-site roles in Europe from 2027.

InstaDeep

Staff Research Engineer · BioAI Lead
August 2022 – Present
Cape Town, South Africa

  • Tech lead and hiring manager for two cross-disciplinary ML research teams of about 5 to 8 people: de novo peptide sequencing (with the Technical University of Denmark) and signal-peptide design for secretion efficiency (with BioNTech).
  • Set research direction and take models from research to production (InstaNovo is available to commercial customers on DeepChain), balancing scientific rigour with reliable, maintainable engineering.
  • Delivered a client collaboration with Syngenta, applying genomic language models (AgroNT, a Nucleotide Transformer trained on ~10.5 million genomic sequences spanning trillions of base pairs across 48 plant species) to accelerate crop trait research.
  • Lead the BioAI department of InstaDeep’s Cape Town office and serve as the office’s site manager (an office of about 25 people); responsible for hiring and growing the BioAI teams across the Cape Town and Kigali offices.
  • Directed the team’s ML engineering foundations: scalable, Transformer-based libraries (Python / PyTorch) for large-scale training on InstaDeep’s Kyber (~500 PFLOPs) and EuroHPC’s MareNostrum 5 (260 PFLOPs) supercomputers, and the cloud (AWS, GCP).

Barco

Senior Software Development Engineer, Machine Learning
February 2020 – August 2022
Kortrijk, Belgium
Built a TensorFlow Extended production pipeline (orchestrated with Apache Airflow) training deep-learning models on multispectral images to detect and classify melanoma skin cancers for Demetra, Barco’s dermatology imaging device. Engineered the pipeline with end-to-end data and model lineage tracking to deliver the traceability and reproducibility that regulated medical industries (FDA approval) require. Worked across the research, cloud-backend and mobile/web frontend teams.

BASF

Bioinformatics Researcher, Manager of the Python & R Platforms
August 2018 – January 2020
Zwijnaarde, Belgium
Owned the Python/R data-analysis platform used by 480 researchers and data scientists: technical support, proactive monitoring and root-cause analysis, training (NumPy, pandas, BioPython, …) and mentoring across the company’s internal research community.

Bayer Crop Science

Bioinformatics Researcher & Python Platform Manager
February 2018 – July 2018
Zwijnaarde, Belgium
Same platform role, prior to acquisition by BASF Agricultural Solutions.

Applied Maths (acquired by bioMérieux)

Bioinformatics Software Developer
September 2011 – January 2018
Sint-Martens-Latem, Belgium
Worked on BioNumerics, a bioinformatics suite for integrated analysis of biological data, writing custom Python scripts and extensions for clients in the clinical, academic and government sectors.


Education

HOWEST Hogeschool West-Vlaanderen
2018–2019
Kortrijk, Belgium
Microdegree, Machine Learning & Deep Learning

Katholieke Universiteit Leuven
2010–2012
Leuven, Belgium
Bioinformatics (partial credits obtained)

University of Antwerp
1997–2004
Antwerp, Belgium
M.Sc., Biology

Universitat de Barcelona
2002
Barcelona, Spain
Erasmus interuniversity exchange

Møglestu videregående skole
1996–1997
Lillesand, Norway
AFS intercultural exchange program

Onze-Lieve-Vrouwecollege, Antwerp
1990–1996
Antwerp, Belgium
Secondary education: mathematics–modern languages


Projects

Awesome De Novo Peptide Sequencing

A comprehensive, interactive map of the field: algorithms, post-processors, downstream applications, and adjacent tools, deep-learning and classical alike.


Hackathons

I have co-created, organized and judged machine-learning hackathons at the Deep Learning Indaba and its regional IndabaX events.

3BPA (3-(benzyloxy)pyridin-2-amine) molecule
IndabaX 2025 · Stellenbosch University, South Africa

Training a Machine Learning Interatomic Potential (MLIP) model (predicting the energy and forces of atomic systems) with the mlip library, using the 3BPA molecule (3-(benzyloxy)pyridin-2-amine) as the benchmark system.

Snakes and Sequences hackathon
Deep Learning Indaba 2024 · Senegal

Using InstaNovo and de novo peptide sequencing to characterise African snake venoms for better antivenoms.

Desert locust breeding ground prediction hackathon
IndabaX 2024 · University of the Witwatersrand, Johannesburg

Predicting desert locust breeding grounds from remote-sensing data, an early-warning task that helps target control efforts before swarms threaten crops and food security across Africa.

Unveiling Cassava's Secrets genomics hackathon
Deep Learning Indaba 2023 · Ghana

Probing the genome of cassava, a staple crop for food security across Africa, using AgroNT, a variant of InstaDeep's Nucleotide Transformer trained on edible plant genomes.


Teaching & mentoring

Machine Learning for Biology practical
2023
Deep Learning Indaba, Ghana
Mentored the hands-on practical alongside InstaDeep and Google DeepMind colleagues.

Scientific Python training & mentoring
2018–2020
BASF and Bayer Crop Science, Belgium
Trained and mentored researchers and data scientists in scientific Python (NumPy, pandas, BioPython, …) as manager of the Python/R analysis platform used across the company’s internal research community.


Skills

Machine learning
Transformers, diffusion models, large language models (LLMs), supervised & self-supervised learning

Frameworks
PyTorch, TensorFlow / TFX, NumPy, pandas

MLOps
Kubernetes-based ML platforms (AIchor), Docker, CI/CD, workflow orchestration (Airflow, Dagster), experiment tracking (MLflow, Neptune, TensorBoard)

Cloud
Buckets, VMs, Cloud Run (AWS, GCP)

Scale
Distributed GPU training, HPC clusters (InstaDeep’s Kyber and EuroHPC’s MareNostrum 5)

Bioinformatics
BioPython, BLAST, ClustalW, Snakemake, BioNumerics

Languages
Python (primary), R

Just for fun

Project Euler profile: Solved 68, Level 2
Project Euler

My solutions to Project Euler, a series of challenging mathematical and computer-programming problems.

Advent of Code

My solutions to Advent of Code, the annual December programming-puzzle event (156 stars across 2015–2024).

Year  Days completed  Stars
2024   5              10
2023   2               5
2022   6              12
2021   8              17
2020   8              18
2019   6              14
2018   5              10
2017   6              13
2016  15              30
2015  12              27

Just another genome hacker. 🧬