Tag: PacBio

Better together: long-range and long-read DNA sequencing methods close age-old blindspots in microbiome research

 

Since its debut, next-generation sequencing has not rested on its laurels. Improved sequencing platforms have reduced error and lengthened reads into the tens of thousands of bases. The arrival of ultra-long-range sequencing methods based on proximity ligation (aka Hi-C) has brought a new order of magnitude within reach by linking DNA strands with their neighbors before sequencing.

Rapid progress in this field has birthed genome-resolved metagenomics, the sequencing and assembly of genomes from environmental samples to study ecosystem dynamics. But metagenomic experiments often undersample microbial diversity, missing rare residents, overlooking closely related organisms (like bacterial strains), losing rich genetic data (like viruses and metabolite gene clusters), and ignoring host-viral or host-plasmid interactions.

 

A revolution within a revolution

New sequencing platforms and methods can reform metagenomics from within. Phase Genomics has been a leader in genome-resolved metagenomics with its ProxiMeta™ platform, which leverages a method that physically connects DNA molecules inside cells before sequencing to generate highly complete genomes for novel bacteria and viruses. Pairing proximity-based methods with long-read platforms, such as the PacBio® Sequel® IIe system that can yield HiFi reads of up to 15,000 base pairs with error rates below 1%, could stretch their potential even further.

In a study published in Nature Biotechnology, a team — led by Dr. Timothy Smith and Dr. Derek Bickhart at the U.S. Department of Agriculture and Dr. Pavel Pevzner at the University of California, San Diego — employed both PacBio HiFi sequencing and ProxiMeta in a deep sequencing experiment to uncover record levels of microbial diversity from a fecal sample of a Katahdin lamb. Combined, PacBio HiFi sequencing and ProxiMeta eased assembly, recovered rare microbes, resolved hundreds of strains and their haplotypes, and revealed hundreds of novel plasmid and viral interactions.

 

Deeper diversity

The team constructed SMRTbell® libraries to generate HiFi data, and ProxiMeta™ libraries to generate long-range sequencing data. The two datasets allowed them to assemble contigs and create draft genomes without manual curation.

Researchers compared the breadth and depth of HiFi data-derived metagenome-assembled genomes, or MAGs, to control MAGs from assemblies of the same sample made using long, more error-prone reads. HiFi data yielded 428 complete MAGs from bacteria and archaea — a record number from a single sample. HiFi data also generated more low-prevalence MAGs, capturing a larger slice of the community’s diversity by picking up more genomes from less common residents.

 

The hidden actors

But no assembly method could be considered “complete” if it overlooked viruses, the most numerous members of virtually all ecological niches on Earth. These tiny players shape microbial communities in ways scientists are still trying to understand. For example, as agents of horizontal gene transfer, they help spread antibiotic resistance genes. Conversely, they have recently grown in popularity as a means to kill resistant bacteria in our ongoing fight against antibiotic resistance.

Phase Genomics’ ProxiPhage™ tool can already assemble high-fidelity viral genomes from microbial communities, even using only short-read sequencing data. But the new study shows that having HiFi helps considerably. The team identified 424 unique viral-host interactions, including 60 between viruses and archaea, which is a more than 7-fold increase over control samples. In total, the HiFi library included nearly 400 viral contigs, more than half of which came from a single family that infects bacteria and archaea. The ability to connect viruses with their microbial hosts in vivo is a unique property of Phase Genomics’ technology.

 

HiFi family trees

The long-range ProxiMeta libraries contained information that yielded more than 1,400 complete and 350 partial sets of gene clusters from archaea and bacteria for synthesizing metabolites such as proteasome inhibitors — the most uncovered to date. These clusters likely help some of these microbes colonize the gut. HiFi data picked up about 40% more clusters than control MAGs, illustrating just how much data is lost when long reads aren’t also highly accurate reads.

The team also used the HiFi-based MAGs to trace lineages within the community. They computationally resolved 220 MAGs into strain haplotypes, based largely on variations within single-copy genes. One MAG had 25 different haplotypes, which are likely strains of the same genus or species.
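
As a rough illustration of what resolving a MAG into haplotypes from single-copy gene variation can look like, the toy sketch below finds variable positions among reads aligned to one gene and counts the distinct allele combinations. The function names, the thresholds, and the input reads are all hypothetical, and this is not the method used in the study, which works from real alignments with dedicated tools.

```python
from collections import Counter

def find_variant_positions(reads, min_minor_count=2):
    """Return positions where a second allele is seen at least min_minor_count times.

    `reads` is a hypothetical toy input: equal-length strings already aligned
    to the same single-copy gene.
    """
    positions = []
    for pos in range(len(reads[0])):
        counts = Counter(read[pos] for read in reads)
        if len(counts) > 1:
            # Support for the second most common allele at this position.
            minor_support = counts.most_common(2)[1][1]
            if minor_support >= min_minor_count:
                positions.append(pos)
    return positions

def count_haplotypes(reads, variant_positions, min_support=2):
    """Count allele combinations across variant positions, dropping rare ones."""
    patterns = Counter(
        "".join(read[pos] for pos in variant_positions) for read in reads
    )
    return {hap: n for hap, n in patterns.items() if n >= min_support}

# Toy example: seven reads covering the same six-base stretch of a gene.
reads = ["ACGTAC", "ACGTAC", "ATGTAA", "ATGTAA", "ACGTAA", "ACGTAA", "ACGTAG"]
variants = find_variant_positions(reads)
haplotypes = count_haplotypes(reads, variants)
print(variants, haplotypes)
```

Requiring a minimum level of support is what makes accurate reads matter here: with error-prone reads, sequencing errors are hard to tell apart from true low-frequency variants.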

ProxiMeta ultra-long-range sequencing also linked nearly 300 HiFi-assembled plasmids to specific MAGs — revealing the species that hosted them in vivo. One plasmid, for example, was found in bacteria from 13 different genera. Long-range data also identified the first plasmids associated with three archaea, including Methanobrevibacter and Methanosphaera.
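
The idea behind linking a plasmid to its hosts can be sketched in a few lines: because proximity ligation joins DNA that shared a cell, read pairs connecting a plasmid contig to contigs of a MAG are evidence that the MAG's organism carried the plasmid. The sketch below is a simplified illustration with hypothetical names, inputs, and a raw count threshold; it is not the ProxiMeta algorithm, and real pipelines would also normalize for contig length and abundance.

```python
from collections import defaultdict

def link_plasmids_to_hosts(hic_links, contig_to_mag, plasmids, min_links=5):
    """Assign candidate host MAGs to plasmid contigs from Hi-C read pairs.

    hic_links: iterable of (contig_a, contig_b) pairs, one per Hi-C read pair.
    contig_to_mag: dict mapping chromosomal contigs to their MAG (hypothetical input).
    plasmids: set of contig names classified as plasmids.
    min_links: assumed minimum number of supporting read pairs.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in hic_links:
        # A read pair can list the plasmid first or second, so check both orders.
        for plasmid, other in ((a, b), (b, a)):
            if plasmid in plasmids and other in contig_to_mag:
                counts[plasmid][contig_to_mag[other]] += 1
    return {
        plasmid: sorted(mag for mag, n in mags.items() if n >= min_links)
        for plasmid, mags in counts.items()
    }

# Toy example: one plasmid with strong links to two MAGs and weak links to a third.
hic_links = [("p1", "c1")] * 6 + [("c2", "p1")] * 5 + [("p1", "c3")] * 2
contig_to_mag = {"c1": "MAG_A", "c2": "MAG_B", "c3": "MAG_C"}
hosts = link_plasmids_to_hosts(hic_links, contig_to_mag, plasmids={"p1"})
print(hosts)
```

A plasmid that passes the threshold for several MAGs, as in this toy case, corresponds to the multi-host situation described above.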

 

What’s around the bend?

This study has lessons beyond one lamb’s gastrointestinal tract. It shows decisively that the discovery power innate to long-range sequencing methods like ProxiMeta is greatly enhanced when wedded to high-accuracy sequencing methods like HiFi. Together, the two generate increasingly sophisticated metagenome assemblies for biologists to interrogate.

Applied to other environmental samples, this platform could illuminate the diversity and complexity of other microbial communities, from the bottom of the sea to mountain peaks, and within the body of every human being. It could probe pressing issues of our day, such as disease, soil health, and antibiotic resistance, a scourge that can only be countered, for example with phage therapy, through a thorough understanding of microbial diversity, interactions, and ecology.

Better together: long-range and long-read DNA sequencing methods, combined, reach record heights in microbiome discovery

Microbiome plate and Phase Genomics logo. Reads "Breaking records in microbiome discovery"

 

 

Since its debut, next-generation sequencing has not rested on its laurels. Improved sequencing platforms have reduced error and lengthened reads into the tens of thousands of bases. The arrival of long-range sequencing methods based on proximity ligation (aka Hi-C) has brought a new order of magnitude within reach by linking DNA strands with their neighbors before sequencing.

 

This progress has birthed high-resolution metagenomics, the sequencing and assembly of genomes from environmental samples to study ecosystem dynamics. But metagenomic experiments often undersample microbial diversity, missing rare residents, overlooking closely related organisms (like bacterial strains), losing rich genetic data (like metabolite gene clusters), and ignoring host-viral or host-plasmid interactions.

 

A revolution within a revolution

 

New sequencing platforms and methods can reform metagenomics from within. Long-read platforms, such as the PacBio® Sequel® IIe system, now yield HiFi reads of up to 15,000 base pairs with error rates below 1%. In addition, Phase Genomics created ProxiMeta™ kits to generate proximity-ligated long-range sequencing libraries, which preserve associations between DNA strands originating in the same cell.

 

In a study posted May 4 to bioRxiv, a team — led by Dr. Timothy Smith and Dr. Derek Bickhart at the U.S. Department of Agriculture and Dr. Pavel Pevzner at the University of California, San Diego — employed both PacBio HiFi sequencing and ProxiMeta in a deep sequencing experiment to uncover record levels of microbial diversity from a fecal sample of a Katahdin lamb. Combined, PacBio HiFi sequencing and ProxiMeta eased assembly, recovered rare microbes, resolved hundreds of strains and haplotypes, and preserved hundreds of plasmid and viral interactions.

 

HiFi family trees

 

The team constructed SMRTbell® libraries to generate HiFi data and used ProxiMeta kits to generate long-range libraries. The two datasets, along with the metaFlye and ProxiMeta algorithms, allowed them to assemble contigs and create draft genomes without manual curation.

 

Researchers compared the breadth and depth of HiFi data-derived metagenome-assembled genomes, or MAGs, to control MAGs from assemblies of the same sample made using long, error-prone reads. HiFi data yielded more complete MAGs — 428 versus 335 — from more bacteria and archaea. HiFi data also generated more low-prevalence MAGs, capturing a larger slice of the community’s diversity by picking up more genomes from less common residents.

 

The HiFi MAGs also contained more than 1,400 complete and 350 partial sets of gene clusters for synthesizing metabolites such as proteasome inhibitors, which likely help some of these microbes colonize the gut. HiFi data picked up about 40% more of such clusters than control MAGs, illustrating just how much data is lost when long reads aren’t also highly accurate reads.

 

The team also used the HiFi MAGs to trace lineages within the community. They computationally resolved 220 MAGs into strain haplotypes, based largely on variations within single-copy genes. One MAG had 25 different haplotypes, which are likely strains of the same genus or species.

 

ProxiMeta’s long-range discoveries

 

The ProxiMeta-generated libraries added flesh to these MAG skeletons by unveiling additional rich biological information. Long-range sequencing linked nearly 300 HiFi-assembled plasmids to specific MAGs, revealing the species that hosted them. One plasmid, for example, was found in bacteria from 13 different genera. Long-range data also identified the first plasmids associated with two archaea, Methanobrevibacter and Methanosphaera.

 

Long-range sequencing illuminated the viral burden in this community. The HiFi library included nearly 400 viral contigs, more than half of which came from a single family of viruses that infect both bacteria and archaea. The team identified 424 unique viral-host interactions, including 60 between viruses and archaea, which is a more than 7-fold increase over controls.

 

What’s around the bend?

 

This study has lessons beyond one lamb’s gastrointestinal tract. It shows decisively that the highly accurate long reads generated by HiFi sequencing are ideal partners for Hi-C-derived methods like ProxiMeta, together generating increasingly sophisticated metagenome assemblies for biologists to interrogate.

 

Applied to other environmental samples, this platform could illuminate the diversity and complexity of other microbial communities, from the bottom of the sea to mountain peaks, and within the stomach of every human being. It could probe pressing issues of our day, such as antibiotic resistance, soil health, or how microbes can break down pollutants. These endeavors will not just fuel the engines of scientific inquiry; broader use of this method could also generate practical insights into those problems.

Phase Genomics and Pacific Biosciences Co-Developing New Genome Assembly Phasing Software

Phase Genomics and Pacific Biosciences logos

“FALCON-Phase” – an algorithm for producing diploid genomes.

 

Phase Genomics has entered into a co-development agreement with Pacific Biosciences to develop FALCON-Phase, a software module that combines Hi-C and PacBio® highly accurate long-read sequencing data to produce fully phased diploid genome assemblies. The software will be released later this spring.

FALCON-Phase augments PacBio Single Molecule, Real-Time (SMRT®) assemblies with Hi-C proximity-ligation data, generating accurate, fully-phased diploid assemblies. Specifically, it uses Hi-C’s chromatin proximity information to identify sequences belonging to the same parental chromosome in genome assemblies produced by PacBio’s FALCON-Unzip software, greatly reducing haplotype switching along the primary assembly.

Furthermore, by combining Phase Genomics’ Proximo Hi-C genome scaffolding technology with FALCON-Phase, users can fully reconstruct maternal and paternal haplotypes on a chromosomal scale. The end result is a diploid set of chromosome-scale scaffolds, or two fully-phased genomes for the same data and labor cost typical for a single genome project.

FALCON-Phase genome Phasing Graph

FALCON-Phase groups long-read contigs into two separate haplotypes based on Hi-C data. Red and blue edges show contigs connected to the same haplotype, while black edges show homologous contigs connected to both haplotypes. Colors were assigned based on the known phasing of the assembly, which was not otherwise used to inform the FALCON-Phase analysis.
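
To make the grouping idea concrete, here is a minimal sketch of how Hi-C link counts could sort pairs of homologous contigs into two haplotypes. It assumes that contigs from the same parental chromosome share more Hi-C links than contigs from different ones. The greedy strategy, the function name, and the input format are assumptions made for illustration and do not describe the actual FALCON-Phase implementation.

```python
def phase_contig_pairs(pairs, links):
    """Greedily sort homologous contig pairs into two haplotypes.

    pairs: list of (contig_x, contig_y) homologous pairs in assembly order.
    links: dict mapping frozenset({c1, c2}) to a Hi-C link count (hypothetical input).
    """
    def n_links(c1, c2):
        return links.get(frozenset((c1, c2)), 0)

    # The first pair anchors the two haplotypes arbitrarily.
    hap0, hap1 = [pairs[0][0]], [pairs[0][1]]
    for x, y in pairs[1:]:
        # Hi-C support for placing x with hap0 and y with hap1 ...
        keep = sum(n_links(x, c) for c in hap0) + sum(n_links(y, c) for c in hap1)
        # ... versus the swapped placement.
        swap = sum(n_links(y, c) for c in hap0) + sum(n_links(x, c) for c in hap1)
        if keep >= swap:
            hap0.append(x)
            hap1.append(y)
        else:
            hap0.append(y)
            hap1.append(x)
    return hap0, hap1

# Toy example: three homologous pairs whose Hi-C links favor swapping b and c.
pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
links = {
    frozenset(("a1", "b2")): 10,
    frozenset(("a2", "b1")): 9,
    frozenset(("a1", "b1")): 1,
    frozenset(("b2", "c2")): 8,
    frozenset(("b1", "c1")): 7,
    frozenset(("a1", "c2")): 3,
}
hap0, hap1 = phase_contig_pairs(pairs, links)
print(hap0, hap1)
```

A greedy pass like this can lock in an early mistake, so it is best read as an illustration of the signal being used rather than of a robust phasing strategy.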

These high-quality phased haplotypes can be leveraged to improve the efficiency of agricultural breeding programs, and could help identify disease-causing genomic variations in humans.

Prof. John Williams, Director of the Davies Research Centre at the University of Adelaide, Australia, wrote, “We are interested in expression of imprinted genes and for this work the availability of haplotype-resolved genome assemblies is an important advance. The release of software that enables the creation of haplotyped genome sequence assembly will revolutionize exploration of genome function. The FALCON-Phase software has this ability and can be applied retroactively to SMRT assemblies, as long as Hi-C data are available. Therefore, even pre-existing genomes can potentially be upgraded to haplotyped assemblies for little or no cost.”

Haplotype-resolved genome assembly is an exciting emerging field. Currently, there is only one other method, Trio Canu, which, unlike FALCON-Phase, requires both the parents and the offspring to be sequenced, adding cost. For many species, it is not possible to collect a trio in the wild and breeding is often not an option. Other Hi-C phasing techniques exist, but they phase genetic variants, not genome assemblies.

The addition of ultra-long genomic interactions captured by Hi-C to PacBio assemblies is very powerful and presents a straightforward solution to a problem experienced by almost all genomic researchers working with diploid organisms.

A formal announcement with more information is coming in the next month. For more information, email us at info@phasegenomics.com.

 

Pacific Biosciences, the Pacific Biosciences logo, PacBio and SMRT are trademarks of Pacific Biosciences of California, Inc.