publications
2020
- A network-based integrated framework for predicting virus–prokaryote interactions. Wang, Weili, Ren, Jie, Tang, Kujin, Dart, Emily, Ignacio-Espinoza, Julio Cesar, Fuhrman, Jed A., Braun, Jonathan, Sun, Fengzhu, and Ahlgren, Nathan A. NAR Genomics and Bioinformatics, 2020.
Metagenomic sequencing has greatly enhanced the discovery of viral genomic sequences; however, it remains challenging to identify the host(s) of these new viruses. We developed VirHostMatcher-Net, a flexible, network-based, Markov random field framework for predicting virus–prokaryote interactions using multiple, integrated features: CRISPR sequences and alignment-free similarity measures ($s_2^*$ and WIsH). Evaluation of this method on a benchmark set of 1462 known virus–prokaryote pairs yielded host prediction accuracy of 59% and 86% at the genus and phylum levels, representing 16–27% and 6–10% improvement, respectively, over previous single-feature prediction approaches. We applied our host prediction tool to crAssphage, a human gut phage, and two metagenomic virus datasets: marine viruses and viral contigs recovered from globally distributed, diverse habitats. Host predictions were frequently consistent with those of previous studies, but more importantly, this new tool made many more confident predictions than previous tools, up to nearly 3-fold more (n > 27,000), greatly expanding the diversity of known virus–host interactions.
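The $s_2^*$ feature compares a virus and a candidate host by oligonucleotide (k-mer) usage rather than by alignment. Assuming $s_2^*$ is closely related to the d2* dissimilarity used by the earlier VirHostMatcher tool, the Python sketch below shows a d2*-style statistic: k-mer counts are centred on their expectation under a background model, scaled, and compared by cosine. The order-0 (i.i.d.) background, the default k = 6, and all function names are illustrative assumptions, not VirHostMatcher-Net's actual settings or API.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any window containing a non-ACGT character."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(base in ALPHABET for base in word):
            counts[word] += 1
    return counts


def standardized_vector(seq: str, k: int) -> dict:
    """Centre each k-mer count on its expectation under an assumed order-0 (i.i.d.)
    background model and divide by the square root of that expectation."""
    counts = kmer_counts(seq, k)
    n_windows = sum(counts.values())
    bases = Counter(b for b in seq.upper() if b in ALPHABET)
    total = sum(bases.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    freq = {b: bases[b] / total for b in ALPHABET}
    vec = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        prob = 1.0
        for b in word:
            prob *= freq[b]
        expected = n_windows * prob
        if expected > 0:  # words containing an absent base carry no information
            vec[word] = (counts[word] - expected) / sqrt(expected)
    return vec


def d2star(seq_x: str, seq_y: str, k: int = 6) -> float:
    """d2*-style dissimilarity in [0, 1]: 0.5 * (1 - cosine of the standardized vectors)."""
    vx = standardized_vector(seq_x, k)
    vy = standardized_vector(seq_y, k)
    dot = sum(vx[w] * vy[w] for w in vx.keys() & vy.keys())
    norm_x = sqrt(sum(v * v for v in vx.values()))
    norm_y = sqrt(sum(v * v for v in vy.values()))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("k-mer counts match the background exactly; d2* is undefined")
    return 0.5 * (1.0 - dot / (norm_x * norm_y))
```

Lower values mean more similar k-mer usage once background base composition is removed; ranking candidate host genomes by `d2star(virus_contig, host_genome)` is one simple way to use it as a single feature.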
2019
- Long-term stability and Red Queen-like strain dynamics in marine viruses. Ignacio-Espinoza, J. Cesar, Ahlgren, Nathan A., and Fuhrman, Jed A. Nature Microbiology, 2019.
Viruses that infect microorganisms dominate marine microbial communities numerically, with impacts ranging from host evolution to global biogeochemical cycles [1,2]. However, virus community dynamics, which are necessary for developing conceptual and mechanistic models, remain difficult to assess. Here, we describe the long-term stability of a viral community by analysing the metagenomes of near-surface 0.02–0.2 µm samples from the San Pedro Ocean Time-series [3] that were sampled monthly over 5 years. Of 19,907 assembled viral contigs (>5 kb, mean 15 kb), 97% were found in every sample (by >98% identity metagenomic read recruitment), with relative abundances ranging over seven orders of magnitude, limited temporal reordering of rank abundances, and little change in richness. Seasonal variations in viral community composition were superimposed on the overall stability; maximum community similarity occurred at 12-month intervals. Despite the stability of viral genotypic clusters with 98% sequence identity, viral sequences showed transient variations in single-nucleotide polymorphisms (SNPs) and constant turnover of minor population variants, each rising and falling over a few months, reminiscent of Red Queen dynamics [4]. The rise and fall of variants within populations, interpreted in light of known virus–host interactions [5], is consistent with the hypothesis that fluctuating selection acts on a microdiverse cloud of strains, and that this succession is associated with ever-shifting virus–host defences and counterdefences. This results in long-term virus–host coexistence facilitated by perpetually changing minor variants.
- Dynamic marine viral infections and major contribution to photosynthetic processes shown by spatiotemporal picoplankton metatranscriptomes. Sieradzki, Ella T., Ignacio-Espinoza, J. Cesar, Needham, David M., Fichot, Erin B., and Fuhrman, Jed A. Nature Communications, 2019.
Viruses provide top-down control on microbial communities, yet their direct study in natural environments has been hindered by culture limitations. Advances in bioinformatics enable cultivation-independent study of viruses. Many studies assemble new viral genomes and study viral diversity using marker genes from free viruses. Here we use cellular metatranscriptomics to study active community-wide viral infections. Recruitment to viral contigs allows tracking of infection dynamics over time and space. Our assemblies represent viral populations but appear biased towards low-diversity viral taxa. Tracking relatives of published T4-like cyanophages and pelagiphages reveals high genomic continuity. We determine potential hosts by matching the dynamics of infection with the abundance of particular microbial taxa. Finally, we quantify the relative contribution of cyanobacteria and viruses to photosystem-II psbA (reaction center) expression at our study sites. We show that sometimes >50% of all cyanobacterial plus viral psbA expression is of viral origin, highlighting the contribution of viruses to photosynthesis and oxygen production.
- Discovery and ecogenomic context of a global Caldiserica-related phylum active in thawing permafrost, Candidatus Cryosericota phylum nov., Ca. Cryosericia class nov., Ca. Cryosericales ord. nov., Ca. Cryosericaceae fam. nov., comprising the four species Cryosericum septentrionale gen. nov. sp. nov., Ca. C. hinesii sp. nov., Ca. C. odellii sp. nov., Ca. C. terrychapinii sp. nov. Martinez, Miguel A., Woodcroft, Ben J., Ignacio-Espinoza, J. Cesar, Zayed, Ahmed A., Singleton, Caitlin M., Boyd, Joel A., Li, Yueh Fen, Purvine, Samuel, Maughan, Heather, Hodgkins, Suzanne B., Anderson, Darya, Sederholm, Maya, Temperton, Ben, Bolduc, Benjamin, Saleska, Scott R., Tyson, Gene W., and Rich, Virginia I. Systematic and Applied Microbiology, 2019.
The phylum Caldiserica was identified from the hot spring 16S rRNA gene lineage ‘OP5’ and named for the sole isolate Caldisericum exile, a hot spring sulfur-reducing chemoheterotroph. Here we characterize 7 Caldiserica metagenome-assembled genomes (MAGs) from a thawing permafrost site in Stordalen Mire, Arctic Sweden. By 16S rRNA and marker gene phylogenies, and average nucleotide and amino acid identities, these Stordalen Mire Caldiserica (SMC) MAGs form part of a clade divergent from C. exile. Genome, metatranscriptome, and proteome analyses suggest that, unlike Caldisericum, the SMCs (i) are carbohydrate and possibly amino acid fermenters that can use labile plant compounds and peptides, and (ii) encode adaptations to low temperature. The SMC clade rose to community dominance within permafrost, with a peak metagenome-based relative abundance of ∼60%, and was also physiologically active in the upper, seasonally thawed soil. Beyond Stordalen Mire, analysis of 16S rRNA gene surveys indicated a global distribution of this clade, predominantly in anaerobic, carbon-rich and cold environments. These findings establish the SMCs as four novel, phenotypically and ecologically distinct species within a single novel genus, distinct from the C. exile clade at the phylum level. The SMCs are thus part of a novel cold-habitat phylum within an understudied, globally distributed superphylum encompassing the Caldiserica. We propose the names Candidatus Cryosericota phylum nov., Ca. Cryosericia class nov., Ca. Cryosericales ord. nov., Ca. Cryosericaceae fam. nov., Ca. Cryosericum gen. nov., Ca. Cryosericum septentrionale sp. nov., Ca. C. hinesii sp. nov., Ca. C. odellii sp. nov., and Ca. C. terrychapinii sp. nov.
2016
- Illuminating structural proteins in viral "dark matter" with metaproteomics. Brum, Jennifer R., Ignacio-Espinoza, J. Cesar, Kim, Eun Hae, Trubl, Gareth, Jones, Robert M., Roux, Simon, VerBerkmoes, Nathan C., Rich, Virginia I., and Sullivan, Matthew B. Proceedings of the National Academy of Sciences of the United States of America, 2016.
Viruses are ecologically important, yet environmental virology is limited by the dominance of unannotated genomic sequences representing taxonomic and functional "viral dark matter." Although recent analytical advances are rapidly improving taxonomic annotations, identifying functional dark matter remains problematic. Here, we apply paired metaproteomics and dsDNA-targeted metagenomics to identify 1,875 virion-associated proteins from the ocean. Over one-half of these proteins were newly functionally annotated and represent abundant and widespread viral metagenome-derived protein clusters (PCs). One primarily unannotated PC dominated the dataset, but structural modeling and genomic context identified this PC as a previously unidentified capsid protein from multiple uncultivated tailed virus families. Furthermore, four of the five most abundant PCs in the metaproteome represent capsid proteins containing the HK97-like protein fold previously found in many viruses that infect all three domains of life. The dominance of these proteins within our dataset, as well as their global distribution throughout the world’s oceans and seas, supports prior hypotheses that this HK97-like protein fold is the most abundant biological structure on Earth. Together, these culture-independent analyses improve virion-associated protein annotations, facilitate the investigation of proteins within natural viral communities, and offer a high-throughput means of illuminating functional viral dark matter.
- Plankton networks driving carbon export in the oligotrophic ocean. Guidi, Lionel, Chaffron, Samuel, Bittner, Lucie, Eveillard, Damien, Larhlimi, Abdelhalim, Roux, Simon, Darzi, Youssef, Audic, Stephane, Berline, Léo, Brum, Jennifer R., Coelho, Luis Pedro, Ignacio-Espinoza, J. Cesar, Malviya, Shruti, Sunagawa, Shinichi, Dimier, Céline, Kandels-Lewis, Stefanie, Picheral, Marc, Poulain, Julie, Searson, Sarah, Stemmann, Lars, Not, Fabrice, Hingamp, Pascal, Speich, Sabrina, Follows, Mick, Karp-Boss, Lee, Boss, Emmanuel, Ogata, Hiroyuki, Pesant, Stephane, Weissenbach, Jean, Wincker, Patrick, Acinas, Silvia G., Bork, Peer, De Vargas, Colomban, Iudicone, Daniele, Sullivan, Matthew B., Raes, Jeroen, Karsenti, Eric, Bowler, Chris, and Gorsky, Gabriel. Nature, 2016.
The biological carbon pump is the process by which CO₂ is transformed to organic carbon via photosynthesis, exported through sinking particles, and finally sequestered in the deep ocean. While the intensity of the pump correlates with plankton community composition, the underlying ecosystem structure driving the process remains largely uncharacterized. Here we use environmental and metagenomic data gathered during the Tara Oceans expedition to improve our understanding of carbon export in the oligotrophic ocean. We show that specific plankton communities, from the surface and deep chlorophyll maximum, correlate with carbon export at 150 m and highlight unexpected taxa such as Radiolaria and alveolate parasites, as well as Synechococcus and their phages, as lineages most strongly associated with carbon export in the subtropical, nutrient-depleted, oligotrophic ocean. Additionally, we show that the relative abundance of a few bacterial and viral genes can predict a significant fraction of the variability in carbon export in these regions.
- Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer. Gregory, Ann C., Solonenko, Sergei A., Ignacio-Espinoza, J. Cesar, LaButti, Kurt, Copeland, Alex, Sudek, Sebastian, Maitland, Ashley, Chittick, Lauren, Santos, Filipa, Weitz, Joshua S., Worden, Alexandra Z., Woyke, Tanja, and Sullivan, Matthew B. BMC Genomics, 2016.
Background: Genetic recombination is a driving force in genome evolution. Among viruses it has a dual role. For genomes with higher fitness, it maintains genome integrity in the face of high mutation rates. Conversely, for genomes with lower fitness, it provides immediate access to sequence space that cannot be reached by mutation alone. How recombination affects the cohesion and dissolution of individual whole genomes within viral sequence space is poorly understood for double-stranded DNA bacteriophages (a.k.a. phages), owing to the challenge of obtaining appropriately scaled genomic datasets. Results: Here we explore the role of recombination in both maintaining and differentiating whole genomes of 142 wild double-stranded DNA marine cyanophages. Phylogenomic analysis across the 51 core genes revealed ten lineages, six of which were well represented. These phylogenomic lineages represent discrete genotypic populations based on comparisons of intra- and inter-lineage shared gene content, genome-wide average nucleotide identity, and detected gaps in the distribution of pairwise differences between genomes. McDonald-Kreitman selection tests identified putative niche-differentiating genes under positive selection that differed across the six well-represented genotypic populations and that may have driven initial divergence. Consistent with these discrete populations, recombination analyses of both genic and intergenic regions largely revealed less genetic exchange between populations than within them. Conclusions: These findings suggest that discrete double-stranded DNA marine cyanophage populations occur in nature and are maintained by patterns of recombination akin to those observed in bacteria, archaea and sexual eukaryotes.
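A McDonald-Kreitman test compares nonsynonymous and synonymous differences that are fixed between groups with those that are polymorphic within groups; an excess of fixed nonsynonymous differences is read as a sign of positive selection. The Python sketch below assumes the four per-gene site counts have already been tallied (that step, and the example counts, are hypothetical) and computes the neutrality index and a two-sided Fisher exact p-value. It illustrates the general test, not the study's exact pipeline.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]:
    sums the hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability of x in the top-left cell given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


def mcdonald_kreitman(dn: int, ds: int, pn: int, ps: int):
    """McDonald-Kreitman test on fixed (D) and polymorphic (P) counts at
    nonsynonymous (n) and synonymous (s) sites.
    Returns (neutrality index, alpha, two-sided p-value)."""
    if min(dn, ds, pn, ps) == 0:
        raise ValueError("all four counts must be positive to compute the neutrality index")
    neutrality_index = (pn / ps) / (dn / ds)
    alpha = 1.0 - neutrality_index  # estimated fraction of adaptive nonsynonymous substitutions
    # Rows: nonsynonymous, synonymous; columns: fixed, polymorphic.
    p_value = fisher_exact_two_sided(dn, pn, ds, ps)
    return neutrality_index, alpha, p_value
```

For example, `mcdonald_kreitman(dn=7, ds=17, pn=2, ps=42)` yields a neutrality index well below 1 (alpha near 0.88) with p < 0.05, the pattern usually interpreted as positive selection; these counts are illustrative only and do not come from the study.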
2015
- Life-style and genome structure of marine Pseudoalteromonas siphovirus B8b isolated from the northwestern Mediterranean Sea. Lara, Elena, Holmfeldt, Karin, Solonenko, Natalie, Sà, Elisabet Laia, Ignacio-Espinoza, J. Cesar, Cornejo-Castillo, Francisco M., Verberkmoes, Nathan C., Vaqué, Dolors, Sullivan, Matthew B., and Acinas, Silvia G. PLoS ONE, 2015.
Marine viruses (phages) alter bacterial diversity and evolution, with impacts on marine biogeochemical cycles, yet the scarcity of well-developed model systems limits opportunities for hypothesis testing. Here we isolate phage B8b from the Mediterranean Sea using Pseudoalteromonas sp. QC-44 as a host and characterize it using a range of techniques. Morphologically, phage B8b was classified as a member of the Siphoviridae family. One-step growth analyses showed that this siphovirus had a latent period of 70 min and released 172 new viral particles per cell. Host range analysis against 89 bacterial host strains revealed that phage B8b infected 3 Pseudoalteromonas strains (52 tested, >99.9% 16S rRNA gene nucleotide identity) and 1 non-Pseudoalteromonas strain belonging to Alteromonas sp. (37 strains from 6 genera tested), which helps bound the phylogenetic distance possible in a phage-mediated horizontal gene transfer event. The Pseudoalteromonas phage B8b genome size was 42.7 kb, with clear structural and replication modules; the structural module was delineated by identifying 16 structural genes through virion structural proteomics, only 4 of which had any similarity to known structural proteins. In nature, this phage was common in coastal marine environments in both photic and aphotic layers (found in 26.5% of available viral metagenomes), but not abundant in any sample (average per-sample abundance was 0.65% of the reads). Together these data improve our understanding of siphoviruses in nature, and provide foundational information for a new ’rare virosphere’ phage-host model system.
- Patterns and ecological drivers of ocean viral communities. Brum, Jennifer R., Ignacio-Espinoza, J. Cesar, Roux, Simon, Doulcier, Guilhem, Acinas, Silvia G., Alberti, Adriana, Chaffron, Samuel, Cruaud, Corinne, De Vargas, Colomban, Gasol, Josep M., Gorsky, Gabriel, Gregory, Ann C., Guidi, Lionel, Hingamp, Pascal, Iudicone, Daniele, Not, Fabrice, Ogata, Hiroyuki, Pesant, Stéphane, Poulos, Bonnie T., Schwenck, Sarah M., Speich, Sabrina, Dimier, Celine, Kandels-Lewis, Stefanie, Picheral, Marc, Searson, Sarah, Bork, Peer, Bowler, Chris, Sunagawa, Shinichi, Wincker, Patrick, Karsenti, Eric, Sullivan, Matthew B., Boss, Emmanuel, Follows, Michael, Grimsley, Nigel, Jaillon, Olivier, Karp-Boss, Lee, Krzic, Uros, Raes, Jeroen, Reynaud, Emmanuel G., Sardet, Christian, Sieracki, Mike, Stemmann, Lars, Velayoudon, Didier, and Weissenbach, Jean. Science, 2015.
Viruses influence ecosystems by modulating microbial population size, diversity, metabolic outputs, and gene flow. Here, we use quantitative double-stranded DNA (dsDNA) viral-fraction metagenomes (viromes) and whole viral community morphological data sets from 43 Tara Oceans expedition samples to assess viral community patterns and structure in the upper ocean. Protein cluster cataloging defined pelagic upper-ocean viral community pan and core gene sets and suggested that this sequence space is well-sampled. Analyses of viral protein clusters, populations, and morphology revealed biogeographic patterns whereby viral communities were passively transported on oceanic currents and locally structured by environmental conditions that affect host community structure. Together, these investigations establish a global ocean dsDNA viromic data set with analyses supporting the seed-bank hypothesis to explain how oceanic viral communities maintain high local diversity.
- Determinants of community structure in the global plankton interactome. Lima-Mendez, Gipsi, Faust, Karoline, Henry, Nicolas, Decelle, Johan, Colin, Sébastien, Carcillo, Fabrizio, Chaffron, Samuel, Ignacio-Espinoza, J. Cesar, Roux, Simon, Vincent, Flora, Bittner, Lucie, Darzi, Youssef, Wang, Jun, Audic, Stéphane, Berline, Léo, Bontempi, Gianluca, Cabello, Ana M., Coppola, Laurent, Cornejo-Castillo, Francisco M., D’Ovidio, Francesco, De Meester, Luc, Ferrera, Isabel, Garet-Delmas, Marie José, Guidi, Lionel, Lara, Elena, Pesant, Stéphane, Royo-Llonch, Marta, Salazar, Guillem, Sánchez, Pablo, Sebastian, Marta, Souffreau, Caroline, Dimier, Céline, Picheral, Marc, Searson, Sarah, Kandels-Lewis, Stefanie, Gorsky, Gabriel, Not, Fabrice, Ogata, Hiroyuki, Speich, Sabrina, Stemmann, Lars, Weissenbach, Jean, Wincker, Patrick, Acinas, Silvia G., Sunagawa, Shinichi, Bork, Peer, Sullivan, Matthew B., Karsenti, Eric, Bowler, Chris, De Vargas, Colomban, Raes, Jeroen, Boss, Emmanuel, Follows, Michael, Grimsley, Nigel, Hingamp, Pascal, Iudicone, Daniele, Jaillon, Olivier, Karp-Boss, Lee, Krzic, Uros, Reynaud, Emmanuel G., Sardet, Christian, Sieracki, Mike, and Velayoudon, Didier. Science, 2015.
Species interaction networks are shaped by abiotic and biotic factors. Here, as part of the Tara Oceans project, we studied the photic zone interactome using environmental factors and organismal abundance profiles and found that environmental factors are incomplete predictors of community structure. We found associations across plankton functional types and phylogenetic groups to be nonrandomly distributed on the network and driven by both local and global patterns. We identified interactions among grazers, primary producers, viruses, and (mainly parasitic) symbionts and validated network-generated hypotheses using microscopy to confirm symbiotic relationships. We have thus provided a resource to support further research on ocean food webs and on integrating biological components into ocean models.
- Closing the gaps on the viral photosystem-I psaDCAB gene organization. Roitman, Sheila, Flores-Uribe, José, Philosof, Alon, Knowles, Ben, Rohwer, Forest, Ignacio-Espinoza, J. Cesar, Sullivan, Matthew B., Cornejo-Castillo, Francisco M., Sánchez, Pablo, Acinas, Silvia G., Dupont, Chris L., and Béjà, Oded. Environmental Microbiology, 2015.
Marine photosynthesis is largely driven by cyanobacteria, namely Synechococcus and Prochlorococcus. Genes encoding photosystem (PS) I and II reaction centre proteins are found in cyanophages and are believed to increase their fitness. Two viral PSI gene arrangements are known, psaJF→C→A→B→K→E→D and psaD→C→A→B. The genes shared between these gene cassettes, and their encoded proteins, are distinguished by %G+C content and protein sequence, respectively. Data on the psaD→C→A→B gene organization had been reported from only two partial gene cassettes from Global Ocean Sampling stations in the Pacific and Indian oceans. We have now extended our search to 370 marine stations from six metagenomic projects. Genes corresponding to both PSI gene arrangements were detected in the Pacific, Indian and Atlantic oceans, confined to an equatorial strip between 30°N and 30°S. In addition, we found that the predicted structure of the viral PsaA protein from the psaD→C→A→B organization contains a lumenal loop conserved in PsaA proteins from Synechococcus, but completely absent from viral PsaA proteins of the psaJF→C→A→B→K→E→D gene organization and from most Prochlorococcus strains. This may indicate a co-evolutionary scenario in which cyanophages carrying either of these gene organizations infect cyanobacterial ecotypes biogeographically restricted to the equatorial strip between 30°N and 30°S.
2014
- Viral tagging reveals discrete populations in Synechococcus viral genome sequence space. Deng, Li, Ignacio-Espinoza, J. Cesar, Gregory, Ann C., Poulos, Bonnie T., Weitz, Joshua S., Hugenholtz, Philip, and Sullivan, Matthew B. Nature, 2014.
Microbes and their viruses drive myriad processes across ecosystems ranging from oceans and soils to bioreactors and humans. Despite this importance, microbial diversity is only now being mapped at scales relevant to nature, while the viral diversity associated with any particular host remains little researched. Here we quantify host-associated viral diversity using viral-tagged metagenomics, which links viruses to specific host cells for high-throughput screening and sequencing. In a single experiment, we screened 10⁷ Pacific Ocean viruses against a single strain of Synechococcus and found that naturally occurring cyanophage genome sequence space is statistically clustered into discrete populations. These population-based, host-linked viral ecological data suggest that, for this single host and seawater sample alone, there are at least 26 double-stranded DNA viral populations with estimated relative abundances ranging from 0.06 to 18.2%. These populations include previously cultivated cyanophage and new viral types missed by decades of isolate-based studies. Nucleotide identities of homologous genes mostly varied by less than 1% within populations, even in hypervariable genome regions, and by 42–71% between populations, which provides benchmarks for viral metagenomics and genome-based viral species definitions. Together these findings showcase a new approach to viral ecology that quantitatively links objectively defined environmental viral populations, and their genomes, to their hosts.
2013
- The global virome: Not as big as we thought? Ignacio-Espinoza, J. Cesar, Solonenko, Sergei A., and Sullivan, Matthew B. 2013.
Viruses likely infect all organisms, serving to an unknown extent as genetic vectors in complex networks of organisms. Environmental virologists have revealed that these abundant nanoscale entities are global players with critical roles in every ecosystem investigated. Curiously, novel genes dominate viral genomes and metagenomes, which has led to the suggestion that viruses represent the largest reservoir of unexplored genetic material on Earth, with literature estimates extrapolated from 14 mycobacteriophage genomes suggesting that two billion phage-encoded ORFs remain to be discovered. Here we examine the (meta)genomic data made available in the decade since this provocative assertion, and use ’protein clusters’ to evaluate whether sampling technologies have advanced to the point that we may be able to sample ’all’ of viral diversity in nature. © 2013 Elsevier B.V. All rights reserved.
- Sequencing platform and library preparation choices impact viral metagenomes. Solonenko, Sergei A., Ignacio-Espinoza, J. César, Alberti, Adriana, Cruaud, Corinne, Hallam, Steven, Konstantinidis, Kostas, Tyson, Gene, Wincker, Patrick, and Sullivan, Matthew B. BMC Genomics, 2013.
Background: Microbes drive the biogeochemistry that fuels the planet. Microbial viruses modulate their hosts directly through mortality and horizontal gene transfer, and indirectly by re-programming host metabolisms during infection. However, our ability to study these virus-host interactions is limited by methods that are low-throughput and heavily reliant upon the subset of organisms that are in culture. One way forward is culture-independent metagenomics, but these novel methods are rarely rigorously tested, especially for studies of environmental viruses, air microbiomes, extreme environment microbiology, and other areas with constrained sample amounts. Here we perform replicated experiments to evaluate Roche 454, Illumina HiSeq, and Ion Torrent PGM sequencing and library preparation protocols on virus metagenomes generated from as little as 10 pg of DNA. Results: Using %G + C content to compare metagenomes, we find that (i) metagenomes are highly replicable, (ii) some treatment effects are minimal, e.g., sequencing technology choice has 6-fold less impact than varying input DNA amount, and (iii) when restricted to a limited DNA concentration (<1 µg), changing the amount of amplification produces little variation. These trends were also observed when examining the metagenomes for gene function and assembly performance, although the latter aligned more closely with sequencing effort and read length than with the preparation steps tested. Among Illumina library preparation options, transposon-based libraries diverged from all others, and adaptor ligation was a critical step for optimizing sequencing yields. Conclusions: These data guide researchers in generating systematic, comparative datasets to understand complex ecosystems, and suggest that neither varied amplification nor sequencing platforms will deter such efforts. © 2013 Solonenko et al.; licensee BioMed Central Ltd.
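The comparison above rests on per-read %G + C content. The Python sketch below computes a normalized %G + C histogram for each metagenome and a Kolmogorov-Smirnov-style distance between two histograms. The bin count, the choice of distance, and all function names are assumptions for illustration; the abstract does not state which statistic the study used.

```python
def gc_fraction(read: str):
    """Fraction of G + C among unambiguous bases in a read, or None if it has none."""
    read = read.upper()
    acgt = sum(read.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (read.count("G") + read.count("C")) / acgt


def gc_profile(reads, n_bins: int = 20) -> list:
    """Normalized histogram of per-read %G+C over n_bins equal-width bins on [0, 1]."""
    hist = [0] * n_bins
    for read in reads:
        gc = gc_fraction(read)
        if gc is None:
            continue  # skip reads made only of ambiguous bases such as N
        hist[min(int(gc * n_bins), n_bins - 1)] += 1  # gc == 1.0 goes in the last bin
    total = sum(hist)
    if total == 0:
        raise ValueError("no reads with unambiguous bases")
    return [count / total for count in hist]


def profile_distance(profile_a: list, profile_b: list) -> float:
    """Kolmogorov-Smirnov-style distance: largest gap between the binned cumulative distributions."""
    cum_a = cum_b = best = 0.0
    for pa, pb in zip(profile_a, profile_b):
        cum_a += pa
        cum_b += pb
        best = max(best, abs(cum_a - cum_b))
    return best
```

Replicate libraries from the same sample would be expected to give small distances, while a treatment that shifts the %G + C distribution (for example, amplification bias) would show up as a larger one.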
2012
- Phylogenomics of T4 cyanophages: Lateral gene transfer in the ’core’ and origins of host genes. Ignacio-Espinoza, J. Cesar, and Sullivan, Matthew B. Environmental Microbiology, 2012.
The last two decades have revealed that phages (viruses that infect bacteria) are abundant and play fundamental roles in the Earth System, with the T4-like myoviruses (herein T4-like phages) emerging as a dominant ’signal’ in wild populations. Here we examine 27 T4-like phage genomes, with a focus on 17 that infect ocean picocyanobacteria (cyanophages), to evaluate lateral gene transfer (LGT) in this group. First, we establish a reference tree by evaluating concatenated core gene supertrees and whole genome gene content trees. Next, we evaluate what fraction of these ’core genes’ shared by all 17 cyanophages appear prone to LGT. Most (47 of 57 core genes) were vertically transferred, as inferred from tree tests and genomic synteny. Of the 10 core genes that failed the tree tests, most (8 of 10) remain syntenic in the genomes, and only a few (3 of 10) have identifiable signatures of mobile elements. Notably, only one of these 10 (thymidylate synthase) is shared not only by the 17 cyanophages but also by all 27 T4-like phages; its evolutionary history suggests that cyanophages may have been the source of this gene in Prochlorococcus. We then examined intragenic recombination among the core genes and found that it did occur, even among these core genes, but that the rate was significantly higher between closely related phages, perhaps reducing any detectable LGT signal and leading to taxon cohesion. Finally, among 18 auxiliary metabolic genes (AMGs, a.k.a. ’host’ genes), we found that half originated from their immediate hosts, in some cases multiple times (e.g. psbA, psbD, pstS), while the rest have less clear evolutionary origins, from either cyanobacteria (4 genes) or microbes (5 genes), with particular diversity among viral TalC and Hsp20 sequences. Together, these findings highlight the patterns and limits of vertical evolution, as well as the ecological and evolutionary roles of LGT in shaping T4-like phage genomes.
© 2012 Society for Applied Microbiology and Blackwell Publishing Ltd.