Tag: genomics

Far and wide: New technology reveals the long arm of viruses in microbial ecosystems


Hydrothermal mat sampling aboard R/V Roger Revelle using ROV Jason. Credit: R/V Roger Revelle, Scripps Institution of Oceanography.

 

For decades, biologists largely studied microbes and their viruses in isolation, nurtured in laboratory cultures. Yet, to paraphrase the poet John Donne, no microbe is an island. In recent years, scientists have recognized this by studying microbes not as individual species, but as part of the larger microbiome: the communal ecosystems, each home to many different types of bacteria and archaea, in which most microbes reside. It is in these realms that microbes display their collective might. From guts to geysers, tiny tales of competition and cooperation within microbiomes have big effects on our health and environment — such as the spread of antibiotic resistance and the stability of food webs.

 

Revealing microbiome mechanics

Traditional, laboratory-based methods struggle to probe the individual components of the microbiome. But “metagenomics” allows us to study the community at large. Metagenomics is the sequencing of DNA from microbial communities, and metagenome-assembled genomes — or MAGs — put together using ever-more sensitive tools and processes, are increasingly able to resolve the inner workings of these complex ecosystems.

Recently, a collaboration between Phase Genomics and a team at Harvard University on a metagenomics project showed that phages — viruses that infect bacteria and archaea — have a surprisingly broad impact on the microbiome of a seafloor hydrothermal vent. Using a technique called proximity ligation (Hi-C), which cross-links DNA strands from the same cell before DNA extraction and sequencing, researchers reconstructed MAGs in this community and found that diverse microbes, including bacteria and archaea separated by billions of years of evolution, sported records of past encounters with the same phages. One explanation is that these phages have an unheard-of host range, one certainly not predicted by laboratory experiments. Another is that these deep-sea microbes may somehow “share” adaptive immunity across broad and deep evolutionary gulfs.

If phages have similarly broad impacts far above the ocean floor, scientists may have to rethink how communication, cooperation and evolution shape microbiomes — and how they impact the larger creatures, like us, that depend on them.

 

Tapping the archive

Microbiomes teem with phages. But deciphering their reach is no easy task. Thankfully, some bacteria and archaea are hoarders. Their CRISPR-based immune responses record past phage infections by inserting short fragments of phage genomes into a specific region of their own genome. Some studies have even sought to reconstruct the reach of phages in a microbiome by probing the content of these areas — known as spacer regions. Yet, the approach has its drawbacks.
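To make the idea of a spacer archive concrete, here is a minimal Python sketch of the matching step. It scans a phage contig on both strands for near-exact copies (protospacers) of a CRISPR spacer. The function names and the mismatch threshold are illustrative assumptions and do not describe the pipeline used in the study. Real workflows typically rely on alignment tools and may also check for a protospacer-adjacent motif (PAM), which this sketch ignores.

```python
# Hypothetical helpers for illustration; not the pipeline used in the study.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string; unknown bases become 'N'."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_protospacers(spacer: str, genome: str, max_mismatches: int = 1):
    """Return (position, strand, mismatches) for each window of `genome`
    that matches `spacer` with at most `max_mismatches` substitutions."""
    spacer, genome = spacer.upper(), genome.upper()
    k = len(spacer)
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for i in range(len(genome) - k + 1):
            mismatches = hamming(query, genome[i:i + k])
            if mismatches <= max_mismatches:
                hits.append((i, strand, mismatches))
    return hits


# Toy example: the spacer occurs verbatim on the forward strand at position 4.
print(find_protospacers("ACGTACGTAC", "TTTTACGTACGTACGGGG", max_mismatches=0))
# → [(4, '+', 0)]
```

Allowing a mismatch or two reflects the fact that phages mutate their protospacers to escape CRISPR immunity. Raising the threshold, however, also raises the chance of spurious hits in large metagenomes.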

“Spacer regions are rich in repeats, so they don’t get sorted well in the MAG assembly process,” said Yunha Hwang, a doctoral student at Harvard University. “That creates a bias regarding which spacers and phage fragments are ultimately assembled into MAGs.”

Hwang has studied these genetic archives of microbial immunity, and previously reported that, in a desert microbiome, phages may have broad host ranges.

“It was a preliminary result, but very exciting,” said Hwang. “I wanted to see if this was a wider feature of microbiomes, and I wanted to avoid that assembly bias.”

 

Achieving Hi-C depth in deep oceans

Hwang and Peter Girguis, a professor at Harvard, worked with Phase Genomics to employ a metagenomic approach centered on Hi-C, which, by preserving physical linkages between DNA fragments present in the same cell, eases the process of resolving repeat-rich regions like CRISPR spacers.
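Binning contigs into MAGs with Hi-C can be framed as a graph problem. Contigs are nodes, and the Hi-C read pairs linking two contigs form weighted edges. Contigs joined by strong, length-normalized links end up in the same bin. The sketch below uses single-linkage clustering with union-find. The normalization (links per kb²), the threshold and the toy data are assumptions for illustration. Production tools such as ProxiMeta use their own, more elaborate clustering, which is not reproduced here.

```python
from collections import defaultdict


def bin_contigs(contigs, links, lengths, min_score):
    """Group contigs whose length-normalized Hi-C link score meets `min_score`.

    links:   {(contig_a, contig_b): raw Hi-C read-pair count}
    lengths: {contig: length in bp}
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), count in links.items():
        # Normalize by contig size so long contigs don't dominate (links per kb^2).
        score = count / ((lengths[a] / 1_000) * (lengths[b] / 1_000))
        if a != b and score >= min_score:
            parent[find(a)] = find(b)

    bins = defaultdict(list)
    for c in contigs:
        bins[find(c)].append(c)
    return sorted(sorted(members) for members in bins.values())


contigs = ["c1", "c2", "c3", "c4"]
lengths = {c: 1_000 for c in contigs}
links = {("c1", "c2"): 50, ("c3", "c4"): 40, ("c2", "c3"): 1}
print(bin_contigs(contigs, links, lengths, min_score=5))
```

Single linkage is fragile: one spurious link above the threshold merges two genomes. That weakness is why real binning tools use more robust graph clustering.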

Hwang collected samples from the microbiome near a hydrothermal vent in the Gulf of California’s Guaymas Basin. Microbial communities like this employ “alternative” metabolic pathways — relying on the plume’s rich geochemical outflow for nutrients, energy and raw materials instead of the sun-based food webs more familiar to surface-dwellers. As soon as she reached port in San Diego, Hwang shipped the microbiome samples to Phase Genomics for cross-linking, DNA extraction, sequencing and MAG assembly.

The spacer regions of the MAGs assembled via Hi-C showed profiles of past phage infection similar to those obtained from conventional spacer sequencing and assembly. But the higher-quality Hi-C MAGs also eased the search for phage fragments within CRISPR spacers. And, as in Hwang’s study of desert microbiomes, individual phages in the hydrothermal vent microbiome had a broad reach, spanning both bacteria and archaea.

“This was so baffling to us, because these are two separate domains of life,” said Hwang. “The ability for a phage to infect a host depends on fundamental properties of cell biology, and bacteria and archaea are so different — their membranes, their proteins, their genomes. So, what does this mean?”

Another puzzle is that bacteria and archaea that are linked by symbiotic relationships — such as eating one another’s metabolic leftovers — were also more likely to harbor genomic fragments of the same phages in their CRISPR spacers.
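Once spacer hits have been tabulated, both observations reduce to simple set operations. The sketch below assumes a hypothetical list of (host MAG, phage) matches and a host-to-domain lookup. It flags phages whose spacer records appear in both bacteria and archaea. It also scores how many phages each pair of hosts shares, using the Jaccard index: shared phages divided by all phages either host has encountered. The names and data are made up for illustration.

```python
from itertools import combinations


def cross_domain_phages(matches, domain_of):
    """Phages whose spacer records appear in both bacterial and archaeal hosts."""
    domains = {}
    for host, phage in matches:
        domains.setdefault(phage, set()).add(domain_of[host])
    return sorted(p for p, seen in domains.items() if {"Bacteria", "Archaea"} <= seen)


def shared_phage_jaccard(matches):
    """Jaccard index of the phage sets recorded by every pair of hosts."""
    targets = {}
    for host, phage in matches:
        targets.setdefault(host, set()).add(phage)
    scores = {}
    for a, b in combinations(sorted(targets), 2):
        shared = targets[a] & targets[b]
        scores[(a, b)] = len(shared) / len(targets[a] | targets[b])
    return scores


# Hypothetical spacer hits: (host MAG, phage) pairs.
matches = [("bac1", "phA"), ("arc1", "phA"), ("bac1", "phB"), ("bac2", "phC")]
domain_of = {"bac1": "Bacteria", "bac2": "Bacteria", "arc1": "Archaea"}
print(cross_domain_phages(matches, domain_of))          # ['phA']
print(shared_phage_jaccard(matches)[("arc1", "bac1")])  # 0.5
```

In an analysis along these lines, Jaccard scores for known symbiotic partners could be compared with scores for unrelated host pairs.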

 

Spread the word

One theory to explain these findings is that phages within microbiomes, which can be hard-pressed for space in these close-knit communities, have evolved to infect hosts with radically diverse membrane compositions, host defenses and cell biology. But that is not the only possibility. Another is that symbiotic partners, separated by billions of years of evolution but united at the dinner table, may be sharing more than just a meal.

“In symbiotic microbes, when one population or species gets infected by a phage, there could be a selective advantage in sharing that adaptive, genetically encoded immunity with your partners,” said Hwang.

Future metagenomic studies of other microbiomes may help resolve these theories, or sire new ones. But the eventual explanations will undoubtedly force scientists to rethink how genetic information flows within microbiomes.

“How do bacteria and archaea build up ‘resilience’ in such closely packed communities?” said Hwang. “Perhaps one way that happens is through selective pressure to share records of past phage infections widely. Keeping your neighbor healthy keeps you healthy.”

 

Sounds familiar

Once upon a time, far above the ocean floor, children played a game called “telephone”: passing a phrase from one person to another — in the form of a whisper — to see how the message changed as it was heard by each ear and transmitted by each voice.

It seems that bacteria, archaea and phages play similar games, which is just the latest surprise that metagenomics has revealed about microbiomes. It will certainly not be the last.

Pass it on.

 

 

Catching Evolution in the Act

Scientist studying chromosomes

 

Genome sequencing has confirmed some long-held theories about the blueprints of life. But it has also unearthed quite a few surprises. Scientists once hypothesized that the human genome consisted of upward of 100,000 genes. The decades-long Human Genome Project — as well as many next-generation sequencing studies — has prompted the downward revision of that figure to a relatively spartan 20,000 genes, more or less.

 

Evolution in action

 

If there is a lesson in this vast overestimate of our gene count, it is perhaps that evolution shapes genomes in unexpected ways.

 

The advent of more nimble and lithe methods for genome assembly and analysis holds the promise to unearth the surprises that evolution has wrought. These relatively new advancements include tools like Phase Genomics’ ultra-long-range sequencing, which reconstructs the sequence of chromosomes by using positional relationships between DNA sequences in the genome. These methods have grown sufficiently sophisticated to catch the quick transitions that transform populations and species.
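The core intuition behind Hi-C scaffolding is that sequences lying close together on a chromosome produce more cross-linked read pairs than distant ones. Assuming a hypothetical table of link counts, a minimal way to order contigs on that basis is a greedy chain. The chain repeatedly attaches whichever unplaced contig is most strongly linked to either end. Real scaffolders do much more: they orient contigs, model how contact frequency decays with distance, and separate chromosomes. Proximo is one such tool, and its algorithm is not described here. This sketch does none of that.

```python
def greedy_order(contigs, links):
    """Order contigs into one chain using Hi-C link counts.

    links: {frozenset({a, b}): read-pair count}; needs at least two contigs.
    """
    def w(a, b):
        return links.get(frozenset((a, b)), 0)

    # Seed the chain with the most strongly linked pair.
    first, second = max(
        ((x, y) for i, x in enumerate(contigs) for y in contigs[i + 1:]),
        key=lambda pair: w(*pair),
    )
    order = [first, second]
    remaining = [c for c in contigs if c not in order]
    while remaining:
        # Pick the unplaced contig with the strongest link to either chain end.
        best = max(remaining, key=lambda c: max(w(c, order[0]), w(c, order[-1])))
        if w(best, order[0]) > w(best, order[-1]):
            order.insert(0, best)
        else:
            order.append(best)
        remaining.remove(best)
    return order


# Hypothetical counts for a chromosome whose true order is A-B-C-D.
links = {
    frozenset("AB"): 100, frozenset("BC"): 80, frozenset("CD"): 90,
    frozenset("AC"): 10, frozenset("BD"): 5, frozenset("AD"): 1,
}
print(greedy_order(["C", "A", "D", "B"], links))  # ['A', 'B', 'C', 'D']
```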

 

Recently a team led by Dr. Leonid Kruglyak at UCLA employed these tools to catch evolution at work. Their discovery relates to sex determination, a complex developmental process that, in animals, generally kicks off when an immature gonad develops into either testes or ovaries. In humans and many other animals, sex determination is governed largely by genes, and it in turn shapes their genomes and evolutionary trajectories like few other biological processes can.

 

That special pair

 

For species with full genetic control over sex determination, the process often leaves its imprint on the genome in the form of sex chromosomes. In most animals, genomes consist of pairs of chromosomes called autosomes. But in addition to those autosomes, many animals — including us — harbor an additional pair of chromosomes called the sex chromosomes. Sex chromosomes govern — or at least try to govern — whether the gonads develop into ovaries or testes, which in turn influences the development of genitals and secondary sex characteristics.

 

Scientists have long theorized that sex chromosomes evolve from autosomes. Studies of young, relatively new sex chromosome systems, like those of the medaka fish, indicate that the transition happens fast. Yet the steps that transform a pair of autosomes into sex chromosomes are at best murky, with many questions unresolved. Much could be answered by catching this transition from autosome to sex chromosome in the act.

 

Behind the curtain

In a paper published June 1 in Nature, Dr. Kruglyak and his colleagues announced that they have found just such a transition: an animal with a pair of autosomes that is beginning to act like sex chromosomes. The researchers utilized Phase Genomics’ Proximo™ genome scaffolding platform and PacBio long reads to sequence and assemble a highly complete genome for a small freshwater flatworm, Schmidtea mediterranea. In many parts of its natural habitat across the Mediterranean basin, S. mediterranea reproduces asexually by fission, without the need for sex. But some populations in Corsica and Sardinia produce the next generation through sexual reproduction.

 

The team, including lead and co-corresponding author Dr. Longhua Guo at UCLA, discovered that in these sexual strains of S. mediterranea, one pair of autosomes shows evidence of almost no genetic exchange, also known as recombination, during reproduction. This is a telltale signature of sex chromosomes. In addition, they saw that the unusual pair of autosomes harbors a large contingent of genes that play a role in developing sex-specific characteristics. Taken together, these genomic data finger these autosomes as a “sex-primed” pair that is in the process of evolving into fully fledged sex chromosomes.
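A toy calculation can illustrate the "almost no recombination" signal. Suppose each offspring chromosome has been genotyped at ordered markers, and each marker is labeled by which grandparental copy it came from. Every switch in label then marks a crossover. A chromosome whose mean crossover count per meiosis stays near zero is a candidate for recombination suppression. The labels, chromosome names and threshold below are hypothetical. Real analyses must also filter out genotyping errors, which look like spurious switches.

```python
def count_crossovers(origins: str) -> int:
    """Count switches in grandparental origin along ordered markers, e.g. 'AAABBB' -> 1."""
    return sum(1 for left, right in zip(origins, origins[1:]) if left != right)


def mean_crossovers(offspring_by_chrom):
    """Average crossovers per offspring (i.e., per meiosis) for each chromosome."""
    return {
        chrom: sum(count_crossovers(h) for h in haplotypes) / len(haplotypes)
        for chrom, haplotypes in offspring_by_chrom.items()
    }


def flag_low_recombination(offspring_by_chrom, max_mean=0.1):
    """Chromosomes whose mean crossover count is at or below `max_mean`."""
    means = mean_crossovers(offspring_by_chrom)
    return sorted(chrom for chrom, mean in means.items() if mean <= max_mean)


# Hypothetical marker origins for four offspring on two chromosomes.
offspring = {
    "chrA": ["AAABBB", "AABBBB", "ABBBBB", "AAAABB"],  # one crossover each
    "chrB": ["AAAAAA", "BBBBBB", "AAAAAA", "BBBBBB"],  # never recombines
}
print(mean_crossovers(offspring))         # {'chrA': 1.0, 'chrB': 0.0}
print(flag_low_recombination(offspring))  # ['chrB']
```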

 

Photo finishes

 

Future studies of S. mediterranea’s nascent sex chromosomes will likely fuel fresh inquiry and debate about this rarely-seen evolutionary transition. The answers will stretch far beyond flatworms. Studies of other recently evolved systems, such as in stickleback fish, show that sex chromosomes can play a decisive role in other poorly understood evolutionary transitions, such as the rise of a new species.

 

Beyond sex chromosomes, this study demonstrates the raw interrogative power of modern genome assembly and analysis methods. They can capture transitions — even the most brief and ephemeral. Applied appropriately, methods like these can help scientists make sense of a myriad of messy, complex processes that evolution shapes. These include problems that hit as close to home as gonads do, from curbing the spread of antibiotic resistance to protecting pollinators from annihilation. Evolution moves quickly. Now, so can we.

 

Better together: long-range and long-read DNA sequencing methods, combined, reach record heights in microbiome discovery


 


 

Since its debut, next-generation sequencing has not rested on its laurels. Improved sequencing platforms have reduced error rates and lengthened reads into the tens of thousands of bases. Long-range sequencing methods based on proximity ligation (aka Hi-C) have extended that reach by another order of magnitude, linking DNA strands with their physical neighbors before sequencing.

 

This progress has birthed high-resolution metagenomics, the sequencing and assembly of genomes from environmental samples to study ecosystem dynamics. But metagenomic experiments often undersample microbial diversity, missing rare residents, overlooking closely related organisms (like bacterial strains), losing rich genetic data (like metabolite gene clusters), and ignoring host-viral or host-plasmid interactions.

 

A revolution within a revolution

 

New sequencing platforms and methods can reform metagenomics from within. Long-read platforms, such as the PacBio® Sequel® IIe system, now yield HiFi reads of up to 15,000 base pairs with error rates below 1%. In addition, Phase Genomics created ProxiMeta™ kits to generate proximity-ligated long-range sequencing libraries, which preserve associations between DNA strands originating in the same cell.

 

In a study posted May 4 to bioRxiv, a team — led by Dr. Timothy Smith and Dr. Derek Bickhart at the U.S. Department of Agriculture and Dr. Pavel Pevzner at the University of California, San Diego — employed both PacBio HiFi sequencing and ProxiMeta in a deep sequencing experiment to uncover record levels of microbial diversity from a fecal sample of a Katahdin lamb. Combined, PacBio HiFi sequencing and ProxiMeta eased assembly, recovered rare microbes, resolved hundreds of strains and haplotypes, and preserved hundreds of plasmid and viral interactions.

 

HiFi family trees

 

The team constructed SMRTbell® libraries to generate HiFi data and used ProxiMeta kits to generate long-range libraries. The two datasets, along with the metaFlye and ProxiMeta algorithms, allowed them to assemble contigs and create draft genomes without manual curation.

 

Researchers compared the breadth and depth of HiFi data-derived metagenome-assembled genomes, or MAGs, to control MAGs from assemblies of the same sample made using long, error-prone reads. HiFi data yielded more complete MAGs — 428 versus 335 — from more bacteria and archaea. HiFi data also generated more low-prevalence MAGs, capturing a larger slice of the community’s diversity by picking up more genomes from less common residents.

 

The HiFi MAGs also contained more than 1,400 complete and 350 partial sets of gene clusters for synthesizing metabolites such as proteasome inhibitors, which likely help some of these microbes colonize the gut. HiFi data picked up about 40% more of such clusters than control MAGs, illustrating just how much data is lost when long reads aren’t also highly accurate reads.

 

The team also used the HiFi MAGs to trace lineages within the community. They computationally resolved 220 MAGs into strain haplotypes, based largely on variations within single-copy genes. One MAG had 25 different haplotypes, which are likely strains of the same genus or species.
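Here is a toy version of strain haplotyping. Within a MAG, find the positions where single-copy genes vary. Then read off which alleles each read carries at those positions and count the distinct combinations. Long, accurate HiFi reads help because a single read can span several variant sites. The function, data layout and support threshold are assumptions for illustration. The study's own haplotyping method is not described in this post.

```python
from collections import Counter


def count_haplotypes(read_alleles, n_sites, min_support=2):
    """Count haplotypes observed on reads that span every variant site.

    read_alleles: one dict per read, {site_index: base}, for the variant
                  sites that read covers (site indices 0..n_sites-1).
    Haplotypes seen on fewer than `min_support` reads are dropped as
    likely sequencing errors.
    """
    spanning = [
        tuple(read[i] for i in range(n_sites))
        for read in read_alleles
        if all(i in read for i in range(n_sites))
    ]
    counts = Counter(spanning)
    return {hap: n for hap, n in counts.items() if n >= min_support}


reads = (
    [{0: "A", 1: "G"}] * 3      # strain 1
    + [{0: "C", 1: "T"}] * 2    # strain 2
    + [{0: "A", 1: "T"}]        # singleton, treated as an error
    + [{0: "A"}]                # does not span both sites, ignored
)
print(count_haplotypes(reads, n_sites=2))  # {('A', 'G'): 3, ('C', 'T'): 2}
```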

 

ProxiMeta’s long-range discoveries

 

The ProxiMeta-generated libraries added flesh to these MAG skeletons by unveiling additional rich biological information. Long-range sequencing linked nearly 300 HiFi-assembled plasmids to specific MAGs — revealing the species that hosted them. One plasmid, for example, was found in bacteria from 13 different genera. Long-range data also identified the first plasmids associated with two archaea, Methanobrevibacter and Methanosphaera.

 

Long-range sequencing illuminated the viral burden in this community. The HiFi assembly included nearly 400 viral contigs, more than half of which came from a single family of viruses that infect both bacteria and archaea. The team identified 424 unique viral-host interactions, including 60 between viruses and archaea, which is a more than 7-fold increase over controls.

 

What’s around the bend?

 

This study has lessons beyond one lamb’s gastrointestinal tract. It shows that the highly accurate long reads generated by HiFi sequencing are ideal partners for Hi-C-derived methods like ProxiMeta, together generating increasingly sophisticated metagenome assemblies for biologists to interrogate.

 

Applied to other environmental samples, this platform could illuminate the diversity and complexity of other microbial communities, from the bottom of the sea to mountain peaks to the human gut. It could also probe pressing issues of our day, such as antibiotic resistance, soil health, or how microbes break down pollutants. These endeavors will not just fuel the engines of scientific inquiry; they could generate new insights into the pressing problems of our times.

Phase Genomics and QIAGEN Partner to Bring Hi-C Epigenetics Solutions to U.S. Market

In the last few years, Hi-C technology has grown in popularity within the epigenetics community. The chief application of this proprietary method is to measure the three-dimensional architecture of genomes to better understand complex nuclear dynamics. As a leader in this space, we at Phase Genomics seek to maximize the commercial footprint of our technology. As interest in this method has increased significantly, we have partnered with QIAGEN to increase its commercial availability. Read about the new EpiTect Hi-C kits available now through our collaborative effort.

 

QIAGEN expands its existing Epigenomic offering in the United States with Sample to Insight solution for Hi-C NGS analysis

• EpiTect Hi-C Kit helps researchers to better understand key aspects of long-range genome architecture
• License agreement enables QIAGEN to sell Phase Genomics’ proprietary proximity-ligation technology in the United States research market
• Adds to QIAGEN’s epigenomic capabilities in identification of individual methylation marks and histone modification at the nucleotide level

 

Hilden, Germany, and Germantown, Maryland, October 29, 2020 – QIAGEN today announced a non-exclusive agreement with Phase Genomics, Inc. to license specific patents, allowing QIAGEN to sell its EpiTect Hi-C kits in the United States. Through this agreement, QIAGEN can now support chromatin research in the largest research market in the world.

Chromatin conformation research, including chromosome conformation capture analysis (Hi-C), is an emerging and growing area of genomic research that is refining our knowledge of the interconnectivity and organization of the genome. Hi-C has become a vital tool for understanding the structures and organization associated with cell biology. The EpiTect Hi-C Kit provides a simplified, single-box solution, requiring fewer than 250,000 mammalian cells to generate sequence-ready libraries.

“QIAGEN’s EpiTect portfolio has until now focused on identifying individual methylation marks and histone modification in the genome at the nucleotide level,” said Kerstin Steinert, Vice President of Product Development & Research Services at QIAGEN. “With the QIAGEN EpiTect Hi-C Kit, we are providing an end-to-end solution to study the 3-D genome and identify larger structural aspects of chromatin conformation and genomic architecture.”

“This partnership demonstrates a strong confidence in the value of Phase Genomics’ technology. Now, scientists studying epigenetics can more fully understand changes in genome architecture that may trigger disease in ways that are more cost-effective than ever before,” said Ivan Liachko, PhD, Founder and CEO of Phase Genomics. “Our Hi-C proximity-ligation technology, now available through QIAGEN, will help accelerate treatments to market and discover new paths toward the prevention of disease.”

In keeping with the QIAGEN commitment to provide Sample to Insight solutions, customers will have two options to analyze data from experiments. The EpiTect Hi-C Portal, located on GeneGlobe Data Analysis Center, provides multiple analysis types, including contact matrices and maps. In addition, Phase Genomics will provide a comprehensive suite of computational analytic services for Hi-C data analysis to QIAGEN EpiTect Hi-C Kit customers through a cloud-based bioinformatic platform that employs novel computational approaches and algorithms to analyze and interrogate proximity ligation (Hi-C) data.
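A contact matrix is the basic object behind Hi-C contact maps. It counts read pairs binned by the genomic positions of their two ends. The sketch below builds one for a single chromosome from hypothetical (position 1, position 2) read-pair coordinates. It is not the EpiTect Hi-C Portal's implementation. Real pipelines add steps such as duplicate removal and matrix balancing, which are omitted here.

```python
def contact_matrix(pairs, chrom_length, bin_size):
    """Count Hi-C read pairs in a symmetric bin-by-bin matrix for one chromosome.

    pairs: iterable of (pos1, pos2) 0-based coordinates, each < chrom_length.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix


pairs = [(10, 20), (50, 250), (150, 160), (120, 290)]
for row in contact_matrix(pairs, chrom_length=300, bin_size=100):
    print(row)
# [1, 0, 1]
# [0, 1, 1]
# [1, 1, 0]
```

Plotting such a matrix as a heat map yields the familiar Hi-C contact map, with its bright diagonal of short-range contacts.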

Phase Genomics’ Transformative Genome Phasing Tool (FALCON-Phase) Now Compatible with Nanopore Sequencing

Nanopore and Hi-C produce a new fully-phased, chromosome-scale genome for the red raspberry.

On October 22, scientists at KeyGene revealed the first fully-phased, chromosome-scale reference genome for the red raspberry, sequenced with Oxford Nanopore long-read technology and scaffolded and phased into full chromosomes using Phase Genomics’ Proximo™ Hi-C method.  

Assembling complex plant genomes used to be considered nearly impossible, as they can be extremely large, polyploid, and full of highly repetitive regions. Long-read sequencing generates genomic data spanning very long regions, but the resulting contigs still need to be scaffolded, or “put together,” into chromosomes. Proximo Hi-C not only helps guide the assembly to produce chromosome-level scaffolds but can also tell which sequences and mutations come from the maternal and paternal chromosome copies (this is called phasing). Our phasing method, FALCON-Phase, was originally released in 2018 and was used in conjunction with the Proximo pipeline to generate this “platinum level” raspberry genome.
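A toy example can illustrate the phasing idea. Suppose an assembly contains a series of "bubbles", each holding two alternative haplotigs, and the Hi-C link counts between haplotigs are known. Haplotigs from the same parental chromosome should share more links. Each bubble can therefore be oriented to maximize links within each phase. The greedy sketch below makes that idea concrete, using hypothetical haplotig names and counts. FALCON-Phase's actual algorithm is more involved and is not reproduced here.

```python
def phase_bubbles(bubbles, links):
    """Assign the two haplotigs of each bubble to phase 0 or phase 1.

    bubbles: ordered list of (haplotig_x, haplotig_y) alternatives.
    links:   {frozenset({a, b}): Hi-C read-pair count}
    """
    def support(haplotig, group):
        return sum(links.get(frozenset((haplotig, g)), 0) for g in group)

    # The first bubble fixes the arbitrary labeling of the two phases.
    phase0, phase1 = [bubbles[0][0]], [bubbles[0][1]]
    for x, y in bubbles[1:]:
        keep = support(x, phase0) + support(y, phase1)
        swap = support(y, phase0) + support(x, phase1)
        if keep >= swap:
            phase0.append(x)
            phase1.append(y)
        else:
            phase0.append(y)
            phase1.append(x)
    return phase0, phase1


bubbles = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
links = {
    frozenset(("a1", "b2")): 30, frozenset(("a2", "b1")): 28,
    frozenset(("a1", "b1")): 3, frozenset(("a2", "b2")): 2,
    frozenset(("b2", "c1")): 25, frozenset(("b1", "c2")): 27,
    frozenset(("b2", "c2")): 4, frozenset(("b1", "c1")): 1,
    frozenset(("a1", "c1")): 5,
}
print(phase_bubbles(bubbles, links))
# (['a1', 'b2', 'c1'], ['a2', 'b1', 'c2'])
```

A greedy pass like this can propagate an early mistake down the chromosome. That risk is one reason practical phasing tools score switch errors more globally.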


The Highest-Quality Genomes: Q&A on Cannabis Genomics

 

Co-author Kevin McKernan of Medicinal Genomics talks more about the past, present, and future of cannabis genomic research. Read more about his newly published cannabis genome assembly project using Proximo Hi-C scaffolding featured in The Genetic Literacy Project.

 

What is the difference between hemp and marijuana? How can we use genomics to answer this question?

 

McKernan: The legal definition of hemp is any Cannabis sativa that has less than 0.3 percent THC acid, or THCA. Historically, hemp has been grown for fiber and the exceptional nutritional content of its seed. THCA expression is genetically controlled at what has historically been referred to as the Bt:Bd allele. Next-generation sequencing technologies are giving us our first glimpse of this complicated locus.

 

Why are you interested in assembling the Cannabis genome? What are you hoping to accomplish?

 

McKernan: A refined genome assembly will enable molecular breeding programs to deploy marker-assisted selection for yield, flowering time, pest resistance and rare cannabinoid expression. It will likely shed light on the heritability of hermaphroditism and apomixis. A clearer picture of the genes involved in cannabinoid and terpenoid expression will enable more intelligent breeding and synthetic biology programs.

 

Which genes are responsible for cannabidiolic acid production and how do these genes vary between the cultivars?

 

McKernan: The Cannabis plant makes 113 different cannabinoids. There are three well-understood cannabinoid synthesis genes. These highly similar genes all compete for a common precursor molecule. Mutations in these genes affect gross cannabinoid expression. A more refined reference may enlighten us to the genetic variants that can more accurately estimate THCA levels to segregate hemp and drug-type seed stocks.

 

What other hidden gems did you find in the Cannabis genome after you finished the assembly?

 

McKernan: The most exciting picture is the 2.1Mb CBCAS (cannabichromenic acid synthase) gene cluster seen in the Jamaican Lion assembly. It has 9 tandem copies of CBCAS, all in the same orientation, that are 99.4-99.9 percent identical and separated by 30-80kb long terminal repeats. This region has been an assembly knot for over seven years and I think the only reason it is visible to us today is due to novel sequencing tools we didn’t have in 2011.

 

Why is the Cannabis genome so difficult to assemble? Are there unique genomic features (e.g., copy number variants, special repeat classes, segmental duplications) that are especially troublesome?

McKernan: Its 1.07Gb genome consists of 10 chromosomes and is 73 percent repeats, 66 percent AT and 0.5-1 percent polymorphic. The genes that contribute to chemotype are under the most selective pressure and have hijacked long terminal repeats to enable gene expansions. We had suspicions of this back in 2011 but could never assemble the region to prove it.

 

Why was it important to obtain chromosomes for your assembly? How did Hi-C help?

 

McKernan: The Pacific Biosciences assembly was an amazing leap forward from the Illumina assemblies, but it is not chromosomal in scale. Hi-C has helped to organize these contigs into chromosomes, and it can do this without our having to make linkage maps.

 

What did you find to be most useful in working with Phase Genomics?

 

McKernan: Hi-C is very complementary to PacBio sequence data and is the only technology that delivers long range information without having to make high molecular weight DNA. This is very important in Cannabis as it is difficult to get high molecular weight DNA out of the plant.

 

What would you like other researchers, breeders or regulators to take away from your high-quality genome assembly? How do you think this genome assembly will be utilized in the future?

 

McKernan: We also need dozens of genomes sequenced to the quality level of Jamaican Lion to get a full picture of these complex cannabinoid loci. We need Hi-C libraries to better understand the microbiome of the plant, so we can more intelligently manage pathogenic threats that affect yield. Many endofungal bacteria like Ralstonia are found in metagenomic sequencing studies in Cannabis flowers and can be a risk to consumers and negatively impact plant yield. Ralstonia is also a notorious contaminant of library construction kits, which confounds many metagenomic studies. We suspect Hi-C will play important roles in segregating live versus dead DNA and resolving these contamination problems.

 

What regulatory challenges do you run into when working on Cannabis genomics?

 

McKernan: The biggest issue at the moment is that the movement of tissue, other than sterilized stalk, is currently federally prohibited in the U.S. This makes RNA studies very challenging as RNA isolation has to be performed in the field. Movement of DNA or cross-linked chromatin is legal, so this is a compelling case for the use of Hi-C in the Cannabis field (insert Hi-C pun here). Phase Genomics’ kits were critical, as shipping certain tissues is restricted. U.S. federal funding also remains restricted. We turned to the Dash Distributed Autonomous Organization for funding to rapidly sequence and publish the genome. We applied for funds in May of 2018 and had the first assembly public on August 2. This is a very generous contribution by Dash because any U.S. university that attempts to handle the plant places their federal funding at risk.

 

What genomic evidence suggests that Cannabis has been selectively bred by humans?

 

McKernan: I think the elevated THCA levels witnessed since prohibition — combined with the long terminal repeat-driven expansion of the synthase genes — is the best evidence we have.

 

What is your favorite fact and what is your least favorite misconception about Cannabis?

 

McKernan: My favorite thought experiment regarding the rapid reproduction of Cannabis is that its genome is very likely spreading through space and time more quickly than the human genome, and it evokes much of David Sinclair’s work on Xenohormesis. My least favorite misconception is the false dichotomy of medical versus recreational cannabis consumption. I think this showcases our reactionary health-care mindset as opposed to the preventative mindset we need to strive for. If you disregard recreational use, you are likely going to require more medical use. These compounds have been in our diet for thousands of years. We now know mutations in human endocannabinoid system-related genes are associated with neurological phenotypes and a large class of idiosyncratic diseases are now being recognized as clinical endocannabinoid deficiency (CED). It was incredibly naïve and destructive to remove cannabinoids from the American diet in 1937.

 

What do you think the future holds for the cannabis industry?

 

McKernan: In states that legalize cannabis, there is a 15 percent reduction in alcohol consumption, a 25 percent reduction in opiate overdoses, a 17 percent decrease in Medicare opiate usage and a 25 percent reduction in general pharmaceutical use. There is a 10 percent reduction in suicide and a 72 percent reduction in PTSD nightmares. The benefits to epilepsy have survived FDA scrutiny. This is the most disruptive market force we have seen in healthcare since the internet and next-generation sequencing. We are now just witnessing the alcohol industry take multi-billion dollar positions in the cannabis industry. It is only a matter of time before the pharmaceutical industry begins to hedge their losses as well. I am betting against the endocannabinoid mimetic known as acetaminophen and in favor of the less-toxic phytocannabinoids like cannabidiol.

 

 

About Phase Genomics

Seattle-based Phase Genomics offers research services and kits based on its Hi-C and proximity-ligation technologies, which enable chromosome-scale genome assembly, metagenomic deconvolution, and the analysis of structural genomic variation and genome architecture. Phase Genomics offers Hi-C genomics tools for genome scaffolding and phasing. Learn more about Proximo and bring the power of Hi-C into your lab today by purchasing one of our Hi-C kits.

How it Works: Proximo Hi-C Genome Scaffolding

Hi-C Technology Links Antimicrobial Resistance Genes to the Microbiome

 

Antibiotic resistance is a rapidly growing global health threat as bacteria share and spread resistance genes via plasmids and other mobile genetic elements. Several teams of researchers applied a new method to understand which microorganisms house genes for antibiotic resistance within complex microbiome communities.
Read the paper, Linking the Resistome and Plasmidome to the Microbiome.

 

ANTIMICROBIAL RESISTANCE ON THE RISE

 

According to the World Health Organization, antimicrobial resistance (AMR) in microbial pathogens is expected to claim 10 million lives a year by 2050 if there are no new pharmaceutical or technological advancements dedicated to combating this pressing problem. For almost a century, medicine has had a remarkable impact on human life by using antibiotics to treat infections, but this has led to a very concerning overuse problem, stoking an arms race between antibiotics and the pathogens they target. The CDC points out that at least 30% of outpatient antibiotic prescriptions in the U.S. are unnecessary, and the food and agriculture industry contributes massively to antibiotic overuse: each year, 130,000 tons of antibiotics are given to food animal livestock. Both of these problems correlate with the rise of AMR.

 

Although some bacteria are naturally resistant to antibiotics, bacteria can also acquire antimicrobial resistance genes (ARGs) and become resistant in two ways: 1) through spontaneous genetic mutations and/or 2) by acquiring genetic material from other microbes via plasmids, viruses, or other means of horizontal gene transfer. Because antibiotic overuse exerts evolutionary pressure on microbes, resistant pathogens within our bodies, hospitals, and the environment become reservoirs of transmissible ARGs. These genes can spread rapidly and accumulate within a single microbe, contributing to the emergence of multidrug-resistant microbes commonly known as superbugs.

 

PROXIMITY-LIGATION (HI-C) LINKS ARGs AND PLASMIDS TO THEIR HOSTS

 

One of the biggest obstacles scientists face when studying AMR is determining which microbes carry and spread specific ARGs. Because these genes often travel on mobile elements, they can move between species and can therefore be found in numerous organisms without one clear parental host. When the DNA of a mixed microbial sample is sequenced, all of the DNA is purified from all of the cells at once, severing the host-plasmid connection and making it nearly impossible to determine where each mobile element came from or whether it was shared among several species. In this newly published paper, researchers describe a novel method for linking ARGs and other mobile genetic elements to their hosts directly from microbiome samples using the latest version of the proximity-ligation (Hi-C) data analysis tool, ProxiMeta Hi-C.

 

Phase Genomics CEO Dr. Ivan Liachko describes how our Hi-C platform solves one of microbiologists’ greatest problems: linking plasmids to their hosts.

 

Hi-C uses in vivo proximity ligation to assemble complete genomes down to the strain level directly from mixed-population samples, and it physically links plasmids and ARGs to their hosts. The method is particularly useful for researchers studying the “dark matter” of the microbiome because it requires neither culturing nor a priori information about a sample.

 

USING HI-C TO TRACK ARGs IN THE MICROBIOME

 

Lead author Thibault Stalder from the University of Idaho used the ProxiMeta Hi-C kit on a complex wastewater microbial community, a suspected AMR reservoir, to learn which bacteria carry ARGs. After the Hi-C library was sequenced, Phase Genomics used the data with our cloud-based software, ProxiMeta, to cluster contigs into hundreds of genomes, most of them novel. Using these genome clusters, the Hi-C linkages between each ARG-, plasmid-, or integron-bearing contig and each genome were measured to determine which species physically hosted the relevant mobile elements.
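The core idea of host assignment can be illustrated with a toy example. ProxiMeta's actual scoring is not described here, so the following is only a minimal sketch under stated assumptions: Hi-C read pairs have already been mapped to contigs, genome clusters are known, and the host of a mobile-element contig is taken to be the cluster that shares the most read pairs with it. The function name `assign_host`, the `min_links` threshold, and all contig and cluster IDs are hypothetical.

```python
from collections import Counter


def assign_host(element_contig, hic_pairs, contig_to_cluster, min_links=2):
    """Pick the genome cluster with the most Hi-C links to a mobile-element contig.

    element_contig: ID of a contig carrying an ARG, plasmid marker, or integron
    hic_pairs: iterable of (contig_a, contig_b) tuples, one per Hi-C read pair
    contig_to_cluster: dict mapping genome contigs to genome-cluster IDs
    min_links: hypothetical minimum link count required to call a host
    Returns (host_cluster_or_None, Counter of links per cluster).
    """
    links = Counter()
    for a, b in hic_pairs:
        # Keep only read pairs with exactly one end on the mobile element.
        if a == element_contig and b != element_contig:
            partner = b
        elif b == element_contig and a != element_contig:
            partner = a
        else:
            continue
        cluster = contig_to_cluster.get(partner)
        if cluster is not None:  # ignore partners outside any genome cluster
            links[cluster] += 1
    if not links:
        return None, links
    best, count = links.most_common(1)[0]
    return (best if count >= min_links else None), links


# Toy example with hypothetical contig and cluster IDs.
pairs = [("plasmid_1", "c1"), ("c2", "plasmid_1"), ("plasmid_1", "c1"),
         ("c3", "plasmid_1"), ("c1", "c2")]
clusters = {"c1": "cluster_A", "c2": "cluster_A", "c3": "cluster_B"}
host, links = assign_host("plasmid_1", pairs, clusters)
print(host, dict(links))  # cluster_A {'cluster_A': 3, 'cluster_B': 1}
```

This works because cross-linking joins DNA only within the same cell, so a plasmid's read pairs land mostly on its host's chromosome. A real analysis would also need to account for noise and for elements genuinely shared by several hosts.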

 

ProxiMeta clustered contigs into more than 1,000 genome clusters and searched for over 30 groups of ARGs, plasmids, and integrons, genetic elements that speed up adaptation by capturing and expressing newly acquired ARGs (Figure 1, circle plot). For each of these genes, we inferred hosts (Figure 2). These host organisms generally belonged to families already known to carry each gene (marked with an “X” in Figure 2), supporting the accuracy of the analysis. In the future, this information will allow us to track the spread of AMR in complex communities of many diverse organisms.

 


Figure 1: Hi-C linkage between ARGs, plasmid markers, and integrons among clusters belonging to Alpha, Beta, Gamma and Delta Proteobacteria.

 

Over 200 genome clusters had strong Hi-C links to ARGs, and 12 of these had high-quality assemblies. The resulting genomes include both Gram-positive and Gram-negative bacteria, and most belong to species that had not been sequenced before. ARGs were linked mostly to genome clusters belonging to the Gammaproteobacteria, Betaproteobacteria, and Bacteroidetes (Figure 2, below).
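Summarizing links by bacterial family, as Figure 2 does, requires rolling cluster-level counts up to families and normalizing them. The normalization used for the figure is not stated, so this sketch simply assumes row normalization: each gene's links are summed per family and then divided by that gene's total, giving the fraction of its links that fall in each family. The function name and all gene, cluster, and family names are hypothetical.

```python
from collections import defaultdict


def family_link_profile(gene_cluster_links, cluster_to_family):
    """Aggregate Hi-C links per (gene, family) and scale each gene's row to sum to 1.

    gene_cluster_links: dict {(gene, cluster): raw Hi-C link count}
    cluster_to_family: dict {cluster: taxonomic family}
    Clusters without a family assignment are pooled under "unassigned".
    """
    totals = defaultdict(lambda: defaultdict(float))
    for (gene, cluster), count in gene_cluster_links.items():
        family = cluster_to_family.get(cluster, "unassigned")
        totals[gene][family] += count

    profile = {}
    for gene, per_family in totals.items():
        row_sum = sum(per_family.values())
        # Row normalization: each value is the fraction of that gene's links.
        profile[gene] = (
            {fam: n / row_sum for fam, n in per_family.items()} if row_sum else {}
        )
    return profile


# Toy example with hypothetical gene, cluster, and family names.
raw = {("gene1", "c1"): 6, ("gene1", "c2"): 2, ("gene2", "c3"): 4}
families = {"c1": "family_A", "c2": "family_B", "c3": "family_A"}
print(family_link_profile(raw, families))
# {'gene1': {'family_A': 0.75, 'family_B': 0.25}, 'gene2': {'family_A': 1.0}}
```

Row normalization makes genes with very different read depths comparable; other schemes, such as correcting for contig length or coverage, are equally plausible.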

 


Figure 2: Normalized Hi-C links between ARGs, plasmids, and families of bacteria.

 

 

FUTURE DIRECTIONS

 

This method can be useful to researchers studying not only the microbiome but also the virome. Phages (viruses that infect bacteria) also distribute genetic information among bacteria and influence host biology, much like plasmids. Several previous studies have shown that in vivo proximity ligation can link phages to their hosts directly from complex mixed samples, just as it linked plasmids and ARGs to their hosts in this study. This information could be crucial to labs and companies now engineering phages to replace the widespread use of antibiotics and combat AMR.

 

Each year in the United States, antibiotic-resistant bacteria infect more than 2 million people, and at least 23,000 of them die because of our inability to fight these superbugs. By using ProxiMeta Hi-C to better understand the genomics of microbial communities suspected to be AMR reservoirs, researchers can identify ARG carriers down to the strain level and quantify how prevalent these genes are. With further exploration, this tool could one day help limit the spread of these genes, reverse the trend of increasing antibiotic resistance, and save lives.

 

BRING A HI-C KIT INTO YOUR LAB TODAY

 

Phase Genomics offers a wide variety of proximity-ligation products and services, including Hi-C preparation kits and a range of cloud-based bioinformatic analysis platforms. Power your microbiome research with our easy Hi-C kits and ProxiMeta Hi-C: assemble hundreds of complete genomes for novel, unculturable microbes and associate plasmids with their hosts directly from raw microbiome samples.

A sweet new genome for the black raspberry using Proximo™ Hi-C


The black raspberry, known for its sweetness and health benefits, has been studied further to reveal its chromosome-scale genome.

What is a black raspberry, you may ask? Jams, preserves, pies, and liqueur are just a few of the delicious products made with black raspberries. But the black raspberry offers much more than exquisite flavor. For instance, did you know it contains pigments called anthocyanins that are used as dyes? It is also used in anti-aging beauty products and contains compounds that may help fight cancer. These useful properties are encoded in the black raspberry genome.

A multinational team of scientists has built a full map of the black raspberry genome. Teams from New Zealand, Canada, and the U.S.A. contributed to the project, which was led by Drs. Rubina Jibran and David Chagné and published in the Nature journal Horticulture Research. The team used Proximo™ Hi-C to order and orient short-read contigs into chromosome-scale scaffolds.
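To make "order and orient" concrete: Hi-C contacts become rarer with increasing genomic distance, so the ends of two contigs that sit next to each other on a chromosome tend to share the most read pairs. The sketch below is a generic greedy approach, not Proximo's algorithm. It assumes link counts between contig ends ("H" for head, "T" for tail) are already available and repeatedly joins the most strongly linked free ends without closing a circle. The function name and contig IDs are hypothetical.

```python
def greedy_scaffold(contigs, end_links):
    """Join contig ends with the strongest Hi-C links into ordered, oriented chains.

    contigs: list of contig names
    end_links: dict {((contig_a, end_a), (contig_b, end_b)): link count}, where an
        end is "H" (head, sequence start) or "T" (tail, sequence end)
    Returns a list of scaffolds; each is a list of (contig, "+" or "-").
    """
    parent = {c: c for c in contigs}  # union-find over contigs, to avoid loops

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    partner = {}  # each contig end can be joined to at most one other end
    for (end_a, end_b), _count in sorted(end_links.items(), key=lambda kv: -kv[1]):
        if end_a[0] == end_b[0] or end_a in partner or end_b in partner:
            continue  # self-link, or an end that is already used
        root_a, root_b = find(end_a[0]), find(end_b[0])
        if root_a == root_b:
            continue  # joining would close a circle
        parent[root_a] = root_b
        partner[end_a], partner[end_b] = end_b, end_a

    opposite = {"H": "T", "T": "H"}
    visited, scaffolds = set(), []
    for start in contigs:
        free = [e for e in ("H", "T") if (start, e) not in partner]
        if start in visited or not free:
            continue  # already placed, or an interior contig
        current, entry, chain = start, free[0], []
        while True:
            visited.add(current)
            # Entering through the head means reading the contig forward.
            chain.append((current, "+" if entry == "H" else "-"))
            exit_end = (current, opposite[entry])
            if exit_end not in partner:
                break
            current, entry = partner[exit_end]
        scaffolds.append(chain)
    return scaffolds


links = {(("c1", "T"), ("c2", "H")): 50,
         (("c2", "T"), ("c3", "T")): 40,
         (("c1", "H"), ("c3", "H")): 5}
print(greedy_scaffold(["c1", "c2", "c3"], links))
# [[('c1', '+'), ('c2', '+'), ('c3', '-')]]
```

In the toy input, the weak c1-c3 link is rejected because it would close a loop, and c3 ends up reverse-complemented because its tail, not its head, is joined to c2.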

A chromosome-scale reference genome is an important step for basic biology and for breeding programs. Breeders can use this genome when crossing plants to select for traits such as color or taste. To learn more about how Hi-C technology was used to improve the black raspberry genome, we contacted Dr. Chagné and Dr. Jibran for a Q&A session. We also wanted their take on the scientific value of Proximo Hi-C and to hear about their experience working with us.

 

What is a black raspberry? How is it different from the blackberries we have in Seattle?

The black raspberry we used is no different from the ones found in Seattle. Actually, I remember seeing some black raspberries (also called black-caps) at Pike Place Market a few years ago! Washington and Oregon are the largest producers of this delicious crop. Raspberries belong to the genus Rubus, which includes red (Rubus idaeus) and black (R. occidentalis) raspberries, blackberries, loganberries, and boysenberries.

 

There are many curious uses of black raspberries, what’s yours?

Black and red raspberries are great on top of Pavlova, alongside slices of kiwifruit. Pavlova is New Zealand’s iconic dessert served around Christmas time, which is the berry fruit season down under here.

 

What are molecular breeding technologies? What are some of the traits in black raspberry you’d like to breed for?

Molecular breeding techniques use DNA to inform selection decisions. My colleague Cameron Peace from Washington State University wrote a very good review of DNA-informed breeding in fruit trees. Plant & Food Research is a leader in using molecular tools to breed fruit species; for example, we use genetic markers to predict whether apple seedlings carry certain loci for black spot resistance or are likely to be red-fruited. The breeding goals of Plant & Food Research’s raspberry breeding programme are high fruit flavour, high berry antioxidant content, pest and disease resistance, and higher productivity.

 

The initial black raspberry genome assembly was built from short-read data. Why did you choose to scaffold the short-read contigs rather than create a new long-read assembly? Would you get chromosome-scale contigs from a long-read assembly?

Actually, we took both approaches; we wanted to see how much of the short-read assembly we could put together using Proximo Hi-C. A long-read-based assembly will be released soon, and comparing the two assemblies will be extremely informative about which strategy to use for future assemblies of other crop species.

 

How did you validate the Proximity Guided Assembly (PGA) scaffolds? How did you correct errors in the scaffolds?

The PGA for black raspberry was validated first by aligning it to a linkage map and then by aligning it to the genome of strawberry (Fragaria vesca), because the two genomes are syntenic.
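The answer above doesn't spell out how agreement with the linkage map was measured. One common way to quantify it, sketched here as an assumption rather than the authors' procedure, is to place mapped markers on each scaffold and compute Spearman's rank correlation between their physical positions (bp) and genetic positions (cM). A value near +1 or -1 indicates collinearity (a negative value only means the scaffold runs opposite to the map), while values near 0 point to misjoins. The function names and marker positions are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy markers on one scaffold: physical position (bp) vs. genetic position (cM).
physical_bp = [100, 2_000, 5_000, 9_000, 15_000]
genetic_cm = [0.0, 1.2, 3.5, 3.5, 7.9]
print(round(spearman_rho(physical_bp, genetic_cm), 3))  # 0.975
```

The same rank check could be applied to the positions of syntenic genes in strawberry.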

 

What was the process like in working with Phase Genomics? Would you recommend them to your colleagues?

I really enjoy working with Phase Genomics. Black raspberry is not the first genome on which we have collaborated with Phase Genomics; we previously assembled genomes for kiwifruit and New Zealand manuka. Our work with Phase Genomics is very iterative, and they are excellent at trying new methods and assembly parameters until we are satisfied with our assemblies. Every organism has its own challenges when it comes to genome assembly, and working with Phase Genomics in a very collaborative way is extremely useful. I have recommended Phase Genomics to colleagues.

Orphan Crop Gains Reference Genome with Proximo Hi-C

Amaranth genome assembly brought to the chromosome-scale using Phase Genomics’ Proximo Hi-C technology. 

 

“Orphan crops” are growing in popularity because they have the potential to feed the world’s expanding population. You may have heard of orphan crops like quinoa or spelt, but have you heard of amaranth? The amaranth genus (Amaranthus) is a hardy group of plants that produce nutritious leaves and seeds, high in protein and vitamins. Amaranth species grow well across a wide geographic range, including South America, Mesoamerica, and Asia. Amaranth was domesticated in Mesoamerica, was an important crop of the Aztec civilization, and has been a staple food of Mesoamericans for thousands of years. Breeders wish to enhance amaranth’s beneficial properties, such as drought resistance, nutrition, and seed production, to make it a more useful food source. However, effective plant breeding requires genetic and genomic resources, and building these resources has been hindered by the high cost of genome sequencing and assembly.

 


Dr. Jeff Maughan (left) and Dr. Damien Lightfoot (right) are the primary authors of the amaranth genome paper.

Dr. Jeff Maughan, a professor at Brigham Young University, is a champion of orphan crop genomics. Over the past year, Dr. Maughan and his team built a reference-quality amaranth genome on a tight budget. They built on an earlier short-read assembly by adding Hi-C data, which measures the conformation of chromatin in vivo, as well as low-coverage long reads and optical mapping data. After optical mapping was used to correct errors in the short-read assembly, the Hi-C data were used to cluster the short genome fragments into nearly complete chromosomes with Phase Genomics’ Proximity-Guided Assembly platform, Proximo™ Hi-C. The long reads were then used to close the remaining gaps on the chromosomes. This cost-effective strategy recovered over 98% of the 16 amaranth chromosomes.
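The clustering step works because Hi-C contacts are far more frequent within a chromosome than between chromosomes, so contigs from the same chromosome share dense links. Proximo's clustering method is not described here. The sketch below is a generic agglomerative approach under simple assumptions: it repeatedly merges the two groups with the highest link density (links divided by the product of group lengths in Mb, so long contigs are not favored just for collecting more reads) until `k` groups remain, where `k` would be 16 for amaranth. The function name, contig names, lengths, and counts are hypothetical.

```python
from itertools import combinations


def cluster_contigs(lengths, links, k):
    """Merge contigs into k groups, always joining the two groups with the
    highest Hi-C link density (links per Mb^2 of the two groups' lengths).

    lengths: dict {contig: length in bp}
    links: dict {frozenset({contig_a, contig_b}): inter-contig link count}
    """
    groups = [{c} for c in lengths]

    def density(g1, g2):
        total = sum(links.get(frozenset((a, b)), 0) for a in g1 for b in g2)
        size1 = sum(lengths[c] for c in g1) / 1e6
        size2 = sum(lengths[c] for c in g2) / 1e6
        return total / (size1 * size2)

    while len(groups) > k:
        # combinations() yields i < j, so deleting index j after the merge is safe.
        i, j = max(combinations(range(len(groups)), 2),
                   key=lambda ij: density(groups[ij[0]], groups[ij[1]]))
        groups[i] |= groups[j]
        del groups[j]
    return groups


lengths = {"a": 2_000_000, "b": 1_000_000, "c": 3_000_000, "d": 1_000_000}
links = {frozenset(("a", "b")): 500, frozenset(("c", "d")): 400,
         frozenset(("a", "c")): 5, frozenset(("b", "d")): 2}
print([sorted(g) for g in cluster_contigs(lengths, links, k=2)])
# [['a', 'b'], ['c', 'd']]
```

This brute-force loop rescans every pair of groups at each step, which is fine for a toy example. A real assembly with thousands of contigs would need cached densities and filtering of repetitive or very short contigs.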

 

The completed reference genome provides an important resource for the community and will boost the efforts of plant breeders to unlock more agricultural benefits from amaranth. In their paper, Dr. Maughan’s team demonstrated the utility of the reference-quality genome in at least two ways. First, they studied chromosomal evolution by comparing the amaranth genome to the beet genome, which helps researchers understand amaranth in the context of plant evolution. Second, they mapped the genetic locus responsible for stem color, clarifying the scientific understanding of a useful agricultural trait. Dr. Maughan points out that both of these experiments would have been impossible without the chromosome-scale genome assembly afforded by Proximo Hi-C.

 

A high-quality reference genome is the first of many important steps towards creating a modern breeding program for amaranth. We contacted Dr. Maughan to learn more about how he is improving amaranth genomics and the importance of orphan crops.

 

What is an orphan crop? 

According to the FAO (Food and Agriculture Organization of the United Nations), the world has approximately 7,000 cultivated edible plant species, but just five of them (rice, wheat, corn, millet, and sorghum) are estimated to provide 60% of the world’s energy intake, and just 30 species account for 95% of all human food energy needs. The remaining species are underutilized and are often referred to as “orphan crops”.

 

How is genomics relevant to orphan crops?

Would you invest your entire 401K savings in just three stocks? In essence, that is what we are doing with world food security, and it comes with tremendous risk. If we are going to diversify our food crops, it will be with these orphan crops. Modern plant breeding programs leverage genomics to significantly enhance genetic gain (yield), and such methods will undoubtedly expedite the development of advanced varieties in orphan crop species.

 

What are the challenges facing researchers interested in orphan crop genomics?  How have you overcome them?

Funding has long been the main obstacle to developing genomic resources for orphan crops. The development of cheap, high-quality next-generation sequencing technology has dramatically eased this problem, making genomics accessible for most plant species.

 

You used two scaffolding technologies for your assembly, Hi-C and BioNano. How did they compare?

Both technologies are extremely useful and complementary but address different genome assembly challenges.  The Hi-C data allows for the production of chromosome length scaffolds, while the BioNano data allows for fine-tuning and verification of the assembly.

 

Beyond building a high-quality genome assembly, what other genomic resources are required to encourage the adoption of orphan crops?

While genomic resources (such as genome assemblies and genetic markers) are fundamental for developing a modern plant breeding program, what is often missing for orphan crops is a collection of diverse germplasm (a gene bank), which is the foundation of a hybrid breeding program. The U.S. and other nations have extensive collections (tens of thousands of accessions) that serve as the genetic foundation for staple crop breeding programs; unfortunately, such collections are minimal or nonexistent for orphan crops.

 

Who stands to benefit the most from a complete amaranth genome?  How do you disseminate your work to them?

We collaborate extensively with researchers throughout South and Central America, where amaranth is already valued as a regionally important crop. We disseminate our research through traditional methods (e.g., peer-reviewed publications) as well as through sponsored scientist and student exchanges.

 

Amaranth is used in a variety of interesting foods, what’s your favorite dish?

Alegría, which is made with popped amaranth and honey, and is common throughout Mexico.

 

Hi-C Used to Assemble Extremely Large, Difficult Barley Genome

Barley is the fourth most cultivated plant in the world and has been a reliable food source for over 10,000 years. Genome Web reports on the exceptional quality of the barley genome assembly and on how researchers used Hi-C technology to tackle this extremely complex genome.

 

Like many other grain genomes, the barley genome is notoriously difficult to assemble because of its long repeat regions and large size (5.3 Gb). Nevertheless, the International Barley Genome Sequencing Consortium (IBSC) used Hi-C to tackle this assembly, producing chromosome-level scaffolds representing over 95% of the genome, in an effort to understand the biology of this widely cultivated plant. After completing the assembly, the researchers annotated the genome, identified over 87,000 different genes, and published their findings in Nature.

 

Obtaining reference-quality assemblies for complex genomes such as barley used to be an extremely challenging endeavor. With Hi-C, obstacles like polyploidy and multi-gigabase genomes become manageable because Hi-C captures ultra-long-range contiguity information from unbroken chromosomes, reducing the reliance on genetic maps. This ability lets researchers address questions that are otherwise difficult or impossible to answer, including questions about structural variation, complex gene structure, gene linkage, gene regulation, and more. Although the barley researchers performed the assembly themselves, Phase Genomics’ Proximo Hi-C service makes it easy for any researcher to obtain similar results; it has been used to assemble hundreds of genomes to chromosome scale over the past two years, including complex genomes like barley.

 

Read more about the barley genome on Genome Web.