Prepared for anything: detection and interpretation of novel viruses by metagenomics


Starting in 2020

Group and collaboration

Dr. Bas E. Dutilh, Metagenomics Group, Theoretical Biology and Bioinformatics
Institute for Biodynamics and Biocomplexity (IBB), Science for Life, Utrecht University (UU), Utrecht Bioinformatics Centre (UBC)
Dr. Ronnie de Jonge, Plant-Microbe Interactions, Institute for Environmental Biology (IEB), Science for Life, UU
Prof. Dr. Marion Koopmans, Viroscience, Erasmus Medical Centre
PhD Student: -

Project description

One Health research often depends on the accurate and sensitive detection of microbes and viruses in environmental sequencing datasets (metagenomes). Identifying viral sequences in metagenomes is like searching for a needle in a haystack, because the sequences of novel viruses often cannot be recognised with commonly used homology-based search tools. In this PhD project, researchers will integrate innovative machine learning and statistical approaches into a sensitive virus detection tool. “We will apply the tool to important plant, human, and other NCOH datasets. Our novel computational tools for virus identification in a range of biomes – including healthy and diseased humans, farm animals and wildlife – will lead to a better understanding of the role of viruses in human, plant, animal, and environmental systems.”


  1. collect a broad range of viromic sequencing datasets from both well-studied and neglected hosts and environments, obtained from NCOH collaborators and public resources; process them according to best practices developed in our group (quality control, trimming, assembly); and create a training dataset for the steps below.
  2. develop a machine learning (ML) tool for virus detection. We will train a convolutional neural network (CNN) to distinguish viral from cellular sequences based on local nucleotide patterns. Cellular sequences will be derived either from the reference database or from assembled total community metagenomes, which may be matched by host or environment with the virome samples to minimise microbiome biases. This will result in a model that captures the essential characteristics distinguishing viral from cellular sequences. The model will be implemented in a virus detection tool whose performance (sensitivity, specificity) will ultimately be assessed on a held-out test dataset that was not seen during training.
  3. apply the ML tool developed in step 2 to screen metagenomic datasets for viral sequences.
  4. compare the identified viruses to all viruses in the database to determine their evolutionary relationships according to community standards [8,9]. Where possible, we will classify them according to the taxonomy of the International Committee on Taxonomy of Viruses (ICTV).
  5. characterise and functionally annotate the viruses identified above, with a special focus on newly discovered lineages in plant- and rhizosphere-associated metagenomes. This will allow us to identify the role of viruses in stimulating or inhibiting plant health under stress conditions, with possible relevance to phage therapy.
  6. combine the viral sequences and their functional annotations with the known viruses already available in the reference database of Jovian. Developed in collaboration with partner RIVM, Jovian allows single-nucleotide resolution analysis of viral strains in metagenomic datasets by aligning metagenomic or viromic sequencing reads to viral genomes in the reference database.
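
As a rough illustration of step 2, the Python sketch below shows the basic operation a CNN applies to nucleotide sequences (one-hot encoding, one convolution filter, global max pooling) and how sensitivity and specificity could be computed from the resulting calls. It is not the project's tool. The filter is hand-set to the motif "TATA" rather than learned from data, the sequences and labels are toy values, and all names (`one_hot`, `conv1d_max`, `sensitivity_specificity`) are hypothetical.

```python
from typing import List, Sequence, Tuple

ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a nucleotide string as 4-element vectors in A, C, G, T order.

    Ambiguous bases such as N become an all-zero vector.
    """
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def conv1d_max(encoded: Sequence[Sequence[float]],
               kernel: Sequence[Sequence[float]]) -> float:
    """Slide one filter along the sequence and return its strongest response.

    This is a single convolution filter followed by global max pooling.
    """
    width = len(kernel)
    if len(encoded) < width:
        raise ValueError("sequence is shorter than the filter")
    best = float("-inf")
    for start in range(len(encoded) - width + 1):
        score = sum(encoded[start + i][j] * kernel[i][j]
                    for i in range(width)
                    for j in range(len(ALPHABET)))
        best = max(best, score)
    return best


def sensitivity_specificity(truth: Sequence[bool],
                            predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity), where True means 'viral'."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one viral and one cellular sequence")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (assumption): in a trained CNN the filter weights are learned.
motif_filter = one_hot("TATA")
sequences = ["GGCTATAAGC", "GGCCGGCCGG", "ATATATTTAA", "CCCCGGGGAA"]
labels = [True, False, True, False]

scores = [conv1d_max(one_hot(s), motif_filter) for s in sequences]
calls = [score >= 4.0 for score in scores]  # call 'viral' on a perfect match
sensitivity, specificity = sensitivity_specificity(labels, calls)
print(scores, sensitivity, specificity)
```

A real detector would stack many learned filters and further layers, and the decision threshold would be chosen on validation data rather than fixed by hand.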

Complex Systems & Metagenomics is the overarching theme of more than 10 PhD tracks in NCOH projects, which aim to create new interdisciplinary, inter-thematic, and inter-institutional research collaborations.