Theoretical and practical metagenomic approaches to viral discovery
IBI 5071 - 2019
Syllabus
Janus - USP post-graduation system

Institute of Biomedical Sciences
 
Post-Graduate Program on Bioinformatics
 
Discipline:

IBI5071-1 - Theoretical and practical metagenomic approaches to virus detection


Class credits: 2
Homework credits: 0
Total Time: 30 hours
Concentration Area: 95131
Activation: 07/01/2017

Objective
  • This course aims to present fundamental concepts of profile HMM construction using viral sequences and their application to genomic and metagenomic data. The course also introduces different machine learning approaches applied to viral sequence studies.
 
Instructors
Arthur Gruber (University of São Paulo, Brazil)
Manja Marz (Friedrich Schiller University Jena, Germany)
 
Course description

The advent of next-generation sequencing has made it possible to sequence not only a single genome but the genomes of a whole community of microorganisms in a biome. Metagenomics allows us to estimate the biological diversity of a sample, to catalogue the enzymes and pathways present in the community, and to detect unknown organisms. We currently know only a small fraction of viral diversity, and the use of metagenomic data to identify emerging viruses represents a major bioinformatics challenge. In this discipline we intend to cover some methods and tools for processing metagenomic data and their use for viral discovery, including theoretical concepts and practical sessions.

Two factors make this challenge particularly hard. First, viruses evolve much faster than prokaryotes and eukaryotes, leading to higher sequence divergence and making their detection by conventional pairwise alignment methods more difficult. Second, the number of viral genomes available in public databases is relatively low compared to, for instance, archaea and bacteria, which also makes viral sequence detection and classification more challenging. In this course, we intend to cover some innovative, recently developed methods that can increase the sensitivity of detection of evolutionarily remote viruses. One of these approaches involves the construction and application of profile HMMs. We will teach conceptual aspects of profile HMM construction, especially for taxonomically specific groups of viruses, and offer practical sessions in which students will build profile HMMs and apply them to metagenomic data for viral detection and discovery. We also intend to cover the fundamentals of different machine learning approaches and to present, in practical sessions, different methods applied to viral detection, classification, and virus-host interactions, among other topics.
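
As a rough illustration of why profile-based methods can be more sensitive than pairwise comparison, the sketch below scores a read against per-column statistics derived from several aligned sequences rather than against a single reference. It is not a profile HMM: it is a deliberately simplified position-specific scoring matrix with no insert or delete states, and the toy alignment, the read, the uniform background, and all function names are assumptions made up for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_profile(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from an ungapped alignment.

    Simplified stand-in for a profile HMM: match states only,
    uniform background, no insertions or deletions.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)
    total = len(alignment) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        profile.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return profile


def score_window(profile, window):
    """Sum the column scores; unknown symbols (e.g. N) contribute 0."""
    return sum(col.get(base, 0.0) for col, base in zip(profile, window))


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (start, score) or None."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        score = score_window(profile, sequence[start:start + width])
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical toy data: four aligned family members and one read.
toy_alignment = ["ACGTAC", "ACGTTC", "ACGAAC", "TCGTAC"]
toy_read = "GGGACGTACGGG"

toy_profile = build_profile(toy_alignment)
hit_start, hit_score = best_hit(toy_profile, toy_read)
print(f"best window starts at position {hit_start}")
```

Because each column is scored from the whole alignment, positions that vary within the family are penalized less than conserved ones. Real profile HMMs add insert and delete states and position-specific gap handling on top of this idea.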

 
Content
  1. Metagenomics - challenges for viral detection and discovery
  2. Profile HMM construction for viral detection and discovery
  3. Screening metagenomic data with profile HMMs
  4. Targeted progressive assembly using profile HMMs as seeds
  5. Finding proviruses in bacterial genomes using profile HMMs
  6. Introduction to machine learning methods
  7. SVM to detect viral miRNAs
  8. Random Forest to detect viral miRNAs
  9. PCA for viral host classification
  10. CNN for viral host classification
  11. Introduction to the RNA world
  12. Folding algorithm: McCaskill and partition functions
  13. RNAfold to determine the secondary structures of RNA viruses
  14. Vienna RNA Package for the study of virus-host interactions
  15. LRIscan/Circos (long-range interactions of segmented viruses)
  16. Covariance models
  17. Infernal to detect viral elements from (meta-)genomic samples

 
Evaluation Method
  • Theory test at the end of the course.

Bibliography
     
  • No textbook is required for this course. Some papers covering the main topics are listed below. Additional papers will be assigned and made available on the course’s website in advance.
  1. Alves, J.M., de Oliveira, A.L., Sandberg, T.O., Moreno-Gallego, J.L., de Toledo, M.A., de Moura, E.M., Oliveira, L.S., Durham, A.M., Mehnert, D.U., Zanotto, P.M., Reyes, A., and Gruber, A. (2016). GenSeed-HMM: A Tool for Progressive Assembly Using Profile HMMs as Seeds and its Application in Alpavirinae Viral Discovery from Metagenomic Data. Front Microbiol 7, 269.
  2. Bexfield, N., and Kellam, P. (2011). Metagenomics and the molecular identification of novel viruses. Vet J 190, 191-198.
  3. Bibby, K., and Peccia, J. (2013). Identification of viral pathogen diversity in sewage sludge by metagenome analysis. Environ Sci Technol 47, 1945-1951.
  4. Dutilh, B.E., Schmieder, R., Nulton, J., Felts, B., Salamon, P., Edwards, R.A., and Mokili, J.L. (2012). Reference-independent comparative metagenomics using cross-assembly: crAss. Bioinformatics 28, 3225-3231.
  5. Edwards, R.A., McNair, K., Faust, K., Raes, J., and Dutilh, B.E. (2016). Computational approaches to predict bacteriophage-host relationships. FEMS Microbiol Rev 40, 258-272.
  6. Fancello, L., Raoult, D., and Desnues, C. (2012). Computational tools for viral metagenomics and their application in clinical research. Virology 434, 162-174.
  7. Grazziotin, A.L., Koonin, E.V., and Kristensen, D.M. (2016). Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation. Nucleic Acids Res 45, D491-D498.
  8. Huerta-Cepas, J., Szklarczyk, D., Forslund, K., Cook, H., Heller, D., Walter, M.C., Rattei, T., Mende, D.R., Sunagawa, S., Kuhn, M., Jensen, L.J., Von Mering, C., and Bork, P. (2015). eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res.
  9. Kristensen, D.M., Waller, A.S., Yamada, T., Bork, P., Mushegian, A.R., and Koonin, E.V. (2013). Orthologous gene clusters and taxon signature genes for viruses of prokaryotes. J Bacteriol 195, 941-950.
  10. Mokili, J.L., Rohwer, F., and Dutilh, B.E. (2012). Metagenomics and future perspectives in virus discovery. Curr Opin Virol 2, 63-77.
  11. Reyes, A., Haynes, M., Hanson, N., Angly, F.E., Heath, A.C., Rohwer, F., and Gordon, J.I. (2010). Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466, 334-338.
  12. Roux, S., Enault, F., Hurwitz, B.L., and Sullivan, M.B. (2015). VirSorter: mining viral signal from microbial genomic data. PeerJ 3, e985.
  13. Sharma, D., Priyadarshini, P., and Vrati, S. (2015). Unraveling the web of viroinformatics: computational tools and databases in virus research. J Virol 89, 1489-1501.
  14. Skewes-Cox, P., Sharpton, T.J., Pollard, K.S., and Derisi, J.L. (2014). Profile hidden Markov models for the detection of viruses within metagenomic sequence data. PLoS One 9, e105067.
  15. Smits, S.L., Bodewes, R., Ruiz-Gonzalez, A., Baumgartner, W., Koopmans, M.P., Osterhaus, A.D., and Schurch, A.C. (2015). Recovering full-length viral genomes from metagenomes. Front Microbiol 6, 1069.
  16. Tang, P., and Chiu, C. (2010). Metagenomics for the discovery of novel human viruses. Future Microbiol 5, 177-189.
  17. Yutin, N., Wolf, Y.I., Raoult, D., and Koonin, E.V. (2009). Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virol J 6, 223.

© 2019 Arthur Gruber

Instituto de Ciências Biomédicas - Av. Prof. Lineu Prestes, 1374 - Cidade Universitária - SP

Last update: August 12, 2019