Jeroen Van Goey
Staff Research Engineer in BioAI building generative models for biology
Machine learning research engineer, fascinated by the intersection of AI and science. I bridge scientific ML, biological domain knowledge, production-quality software, and team leadership.
Currently looking for Staff/Principal Research Engineer roles in AI for science (biology, proteomics, drug discovery, protein design, foundation models).
- Published InstaNovo (a transformer) and InstaNovo+ (a diffusion model) for de novo peptide sequencing in Nature Machine Intelligence.
- Led BioAI teams working on peptide sequencing and signal-peptide design.
- Built and shipped production-grade ML systems across GPUs, HPC, cloud, and scientific data pipelines.
What I work on
I build and ship machine-learning systems for science. My main application is de novo peptide sequencing: generative models that read peptide sequences directly from raw mass spectra, the confidence estimation that makes those predictions trustworthy, and the proteome-scale analysis they feed.
- Independently validated: top-ranked across an external benchmark of 17 de novo sequencing tools on 83 datasets.
- Built for scale: distributed training on large public datasets (~63 million spectra) curated with automated LLM labelling.
Publications
8 publications, including in Nature Machine Intelligence.
InstaNovo in the news
The InstaNovo paper drew broad attention across the scientific press.
- Science: An AI revolution comes to protein sequencing
- Science News: AI may help decode proteins DNA can't reveal
- Chemistry World: AI takes step towards cracking biology's toughest problem
- BioTechniques: Doubling up: novel AI models improve de novo peptide sequencing
- Technology Networks: AI models accelerate discovery within protein science
- AZoLifeSciences: Novel AI models enhance disease diagnosis and pathogen ID
- ScienceDaily: Possible game-changers within protein science and healthcare
- Mirage News: AI models revolutionize protein science
- The Research Code: AI system reads protein sequences without databases
- Dong-a Science: AI cracks protein sequencing (Korean)
More coverage: follow-up, press releases, blogs & social
Institutional press releases: InstaDeep · DTU · Science News Denmark (Novo Nordisk Foundation)
Blogs & newsletters: Plenty of Room · Proteomics News
Video: Revolutionary AI Tool InstaNovo Redefines Protein Sequencing
Community: r/proteomics · r/massspectrometry
On X: InstaDeep · viral thread (149.8K views) · Chemistry World · A. Laustsen · T. P. Jenkins · J. Bravo-Abad
On LinkedIn: InstaNovo · Proteomics & Immunopeptidomics · M. Busch · A. Laustsen
Experience
InstaDeep
Staff Research Engineer in BioAI
August 2022 – Present
Cape Town, South Africa
- Tech lead and hiring manager for two cross-disciplinary ML research teams of about 5 to 8 people: de novo peptide sequencing (with the Technical University of Denmark) and signal-peptide design for secretion efficiency (with BioNTech).
- Delivered a client collaboration with Syngenta, applying genomic language models (AgroNT, a Nucleotide Transformer trained on ~10.5 million genomic sequences spanning trillions of base pairs across 48 plant species) to accelerate crop trait research.
- Set research direction and take models from research to production, balancing scientific rigour with reliable, maintainable engineering.
- Lead the BioAI department of InstaDeep’s Cape Town office and serve as the office’s site manager (an office of about 25 people); responsible for hiring and growing the BioAI teams across the Cape Town and Kigali offices.
- Directed the team’s ML engineering foundations: scalable, Transformer-based libraries (Python / PyTorch) for large-scale training on InstaDeep’s Kyber (~500 PFLOPs) and EuroHPC’s MareNostrum 5 (260 PFLOPs) supercomputers, and the cloud (AWS, GCP).
Barco
Senior Software Development Engineer, Machine Learning
February 2020 – August 2022
Kortrijk, Belgium
Built a TensorFlow Extended production pipeline (orchestrated with Apache Airflow) training deep-learning models on multispectral images to detect and classify melanoma skin cancers for Demetra, Barco’s dermatology imaging device. Engineered the pipeline with end-to-end data and model lineage tracking to deliver the traceability and reproducibility that regulated medical industries (FDA approval) require. Worked across the research, cloud-backend and mobile/web frontend teams.
BASF
Bioinformatics Researcher, Manager of the Python & R Platforms
August 2018 – January 2020
Zwijnaarde, Belgium
Owned the Python/R data-analysis platform used by 480 researchers and data scientists: technical support, proactive monitoring and root-cause analysis, training (NumPy, pandas, BioPython, …) and mentoring across the company’s internal research community.
Bayer Crop Science
Bioinformatics Researcher & Python Platform Manager
February 2018 – July 2018
Zwijnaarde, Belgium
Same platform role as at BASF, held before the acquisition by BASF Agricultural Solutions.
Applied Maths (acquired by bioMérieux)
Bioinformatics Software Developer
September 2011 – January 2018
Sint-Martens-Latem, Belgium
Worked on BioNumerics, a bioinformatics suite for integrated analysis of biological data, writing custom Python scripts and extensions for clients in the clinical, academic and government sectors.
Education
Hogeschool West-Vlaanderen (Howest)
2018–2019
Microdegree, Machine Learning & Deep Learning
Katholieke Universiteit Leuven
2010–2012
Bioinformatics (partial credits obtained)
University of Antwerp
1997–2004
M.Sc., Biology
Projects
Hackathons
I have co-created, organized and judged machine-learning hackathons at the Deep Learning Indaba and its regional IndabaX events.
Training a Machine Learning Interatomic Potential (MLIP) model (predicting the energy and forces of atomic systems) with the mlip library, using the 3BPA molecule (3-(benzyloxy)pyridin-2-amine) as the benchmark system.
Using InstaNovo and de novo peptide sequencing to characterise African snake venoms for better antivenoms.
Predicting desert locust breeding grounds from remote-sensing data, an early-warning task that helps target control efforts before swarms threaten crops and food security across Africa.
Probing the genome of cassava, a staple crop for food security across Africa, using AgroNT, a variant of InstaDeep's Nucleotide Transformer trained on edible plant genomes.
Teaching & mentoring
Machine Learning for Biology practical
2023
Deep Learning Indaba, Ghana
Mentored the hands-on practical alongside InstaDeep and Google DeepMind colleagues.
Scientific Python training & mentoring
2018–2020
BASF and Bayer Crop Science, Belgium
Trained and mentored researchers and data scientists in scientific Python (NumPy, pandas, BioPython, …) as manager of the Python/R analysis platform used across the company’s internal research community.
Skills
Machine learning
Transformers, diffusion models, large language models (LLMs), supervised & self-supervised learning
Frameworks
PyTorch, TensorFlow / TFX, NumPy, pandas
Scale & MLOps
Distributed GPU training, HPC clusters (InstaDeep’s Kyber and EuroHPC’s MareNostrum 5), Kubernetes-based ML platforms (AIchor), cloud (AWS, GCP), Docker, CI/CD, workflow orchestration (Airflow, Dagster), experiment tracking (MLflow, Neptune, TensorBoard)
Bioinformatics
BioPython, BLAST, ClustalW, Snakemake, BioNumerics
Languages
Python (primary), R
Just for fun
My solutions to Project Euler, a series of challenging mathematical and computer-programming problems.
My solutions to Advent of Code, the annual December programming-puzzle event (156 stars across 2015–2024).
Just another genome hacker. 🧬