Jeroen Van Goey
Staff Research Engineer in BioAI building generative models for biology
Machine learning research engineer, fascinated by the intersection of AI and science. I bridge scientific ML, biological domain knowledge, production-quality software, and team leadership.
Currently looking for Staff/Principal Research Engineer roles in AI for science (biology, proteomics, drug discovery, protein design, foundation models).
- Published InstaNovo (a transformer) and InstaNovo+ (a diffusion model) for de novo peptide sequencing in Nature Machine Intelligence.
- Lead BioAI teams working on peptide sequencing and signal-peptide design.
- Built and shipped production-grade ML systems across GPUs, HPC, cloud, and scientific data pipelines.
What I work on
I build and ship machine-learning systems for science. My main application is de novo peptide sequencing: generative models that read peptide sequences directly from raw mass spectra, the confidence estimation that makes those predictions trustworthy, and the proteome-scale analysis they feed.
- Independently validated: top-ranked across an external benchmark of 17 de novo sequencing tools on 83 datasets.
- Built for scale: distributed training on large public datasets (~63 million spectra) curated with automated LLM labelling.
Publications
8 publications, including in Nature Machine Intelligence.
InstaNovo in the news
The InstaNovo paper drew broad attention across the scientific press.
An AI revolution comes to protein sequencing
- Science News: AI may help decode proteins DNA can't reveal
- Chemistry World: AI takes step towards cracking biology's toughest problem
- BioTechniques: Doubling up: novel AI models improve de novo peptide sequencing
- Technology Networks: AI models accelerate discovery within protein science
- AZoLifeSciences: Novel AI models enhance disease diagnosis and pathogen ID
- ScienceDaily: Possible game-changers within protein science and healthcare
- Mirage News: AI models revolutionize protein science
- The Research Code: AI system reads protein sequences without databases
- Dong-a Science: AI cracks protein sequencing (Korean)
More coverage: follow-up, press releases, blogs & social
Institutional press releases: InstaDeep · DTU · Science News Denmark (Novo Nordisk Foundation)
Blogs & newsletters: Plenty of Room · Proteomics News
Video: Revolutionary AI Tool InstaNovo Redefines Protein Sequencing
Community: r/proteomics · r/massspectrometry
On X: InstaDeep · viral thread (149.8K views) · Chemistry World · A. Laustsen · T. P. Jenkins · J. Bravo-Abad
On LinkedIn: InstaNovo · Proteomics & Immunopeptidomics · M. Busch · A. Laustsen
Experience
InstaDeep
Staff Research Engineer · BioAI Lead
August 2022 – Present
Cape Town, South Africa
- Tech lead and hiring manager for two cross-disciplinary ML research teams of about 5 to 8 people: de novo peptide sequencing (with the Technical University of Denmark) and signal-peptide design for secretion efficiency (with BioNTech).
- Set research direction and take models from research to production (InstaNovo is available to commercial customers on DeepChain), balancing scientific rigour with reliable, maintainable engineering.
- Delivered a client collaboration with Syngenta, applying genomic language models (AgroNT, a Nucleotide Transformer trained on ~10.5 million genomic sequences spanning trillions of base pairs across 48 plant species) to accelerate crop trait research.
- Lead the BioAI department of InstaDeep’s Cape Town office and serve as the office’s site manager (an office of about 25 people); responsible for hiring and growing the BioAI teams across the Cape Town and Kigali offices.
- Directed the team’s ML engineering foundations: scalable, Transformer-based libraries (Python / PyTorch) for large-scale training on InstaDeep’s Kyber (~500 PFLOPs) and EuroHPC’s MareNostrum 5 (260 PFLOPs) supercomputers, and in the cloud (AWS, GCP).
Barco
Senior Software Development Engineer, Machine Learning
February 2020 – August 2022
Kortrijk, Belgium
Built a TensorFlow Extended production pipeline (orchestrated with Apache Airflow) that trained deep-learning models on multispectral images to detect and classify melanoma skin cancers for Demetra, Barco’s dermatology imaging device. Engineered the pipeline with end-to-end data and model lineage tracking to deliver the traceability and reproducibility that regulated medical industries require (for example, for FDA approval). Worked across the research, cloud-backend and mobile/web frontend teams.
BASF
Bioinformatics Researcher, Manager of the Python & R Platforms
August 2018 – January 2020
Zwijnaarde, Belgium
Owned the Python/R data-analysis platform used by 480 researchers and data scientists: technical support, proactive monitoring and root-cause analysis, training (NumPy, pandas, BioPython, …) and mentoring across the company’s internal research community.
Bayer Crop Science
Bioinformatics Researcher & Python Platform Manager
February 2018 – July 2018
Zwijnaarde, Belgium
Held the same platform role as at BASF, prior to the acquisition by BASF Agricultural Solutions.
Applied Maths (acquired by bioMérieux)
Bioinformatics Software Developer
September 2011 – January 2018
Sint-Martens-Latem, Belgium
Worked on BioNumerics, a bioinformatics suite for integrated analysis of biological data, writing custom Python scripts and extensions for clients in the clinical, academic and government sectors.
Education
HOWEST Hogeschool West-Vlaanderen
2018–2019
Kortrijk, Belgium
Microdegree, Machine Learning & Deep Learning
Katholieke Universiteit Leuven
2010–2012
Leuven, Belgium
Bioinformatics (partial credits obtained)
University of Antwerp
1997–2004
Antwerp, Belgium
M.Sc., Biology
Universitat de Barcelona
2002
Barcelona, Spain
Erasmus interuniversity exchange
Møglestu videregående skole
1996–1997
Lillesand, Norway
AFS intercultural exchange program
Onze-Lieve-Vrouwecollege, Antwerp
1990–1996
Antwerp, Belgium
Secondary education: mathematics–modern languages
Projects
Hackathons
I have co-created, organized and judged machine-learning hackathons at the Deep Learning Indaba and its regional IndabaX events.
Training a Machine Learning Interatomic Potential (MLIP) model (predicting the energy and forces of atomic systems) with the mlip library, using the 3BPA molecule (3-(benzyloxy)pyridin-2-amine) as the benchmark system.
Using InstaNovo and de novo peptide sequencing to characterise African snake venoms for better antivenoms.
Predicting desert locust breeding grounds from remote-sensing data, an early-warning task that helps target control efforts before swarms threaten crops and food security across Africa.
Probing the genome of cassava, a staple crop for food security across Africa, using AgroNT, a variant of InstaDeep's Nucleotide Transformer trained on edible plant genomes.
Teaching & mentoring
Machine Learning for Biology practical
2023
Deep Learning Indaba, Ghana
Mentored the hands-on practical alongside InstaDeep and Google DeepMind colleagues.
Scientific Python training & mentoring
2018–2020
BASF and Bayer Crop Science, Belgium
Trained and mentored researchers and data scientists in scientific Python (NumPy, pandas, BioPython, …) as manager of the Python/R analysis platform used across the company’s internal research community.
Skills
Machine learning
Transformers, diffusion models, large language models (LLMs), supervised & self-supervised learning
Frameworks
PyTorch, TensorFlow / TFX, NumPy, pandas
MLOps
Kubernetes-based ML platforms (AIchor), Docker, CI/CD, workflow orchestration (Airflow, Dagster), experiment tracking (MLflow, Neptune, TensorBoard)
Cloud
Buckets, VMs, Cloud Run (AWS, GCP)
Scale
Distributed GPU training, HPC clusters (InstaDeep’s Kyber and EuroHPC’s MareNostrum 5)
Bioinformatics
BioPython, BLAST, ClustalW, Snakemake, BioNumerics
Languages
Python (primary), R
Just for fun
My solutions to Project Euler, a series of challenging mathematical and computer-programming problems.
My solutions to Advent of Code, the annual December programming-puzzle event (156 stars across 2015–2024).
Just another genome hacker. 🧬