Choose This Year’s Metagenomics Award Winner

Congratulations to Dr. Ben Tully on winning this year’s Project ProxiMeta: 2019 Metagenomics Award! Read more about his project, “The Complete Hydrothermal Microbial Metal Metabolism,” listed as finalist 4 below.

This summer, researchers from across the U.S. sent in short proposals for a chance to win a full-service ProxiMeta™ microbiome workup for a sample of their choice. ProxiMeta combines shotgun metagenomics with in vivo proximity ligation (Hi-C) and the necessary bioinformatic tools to help researchers assemble high-quality microbial genomes directly from complex microbiome samples.

HOW TO VOTE

Each project was assessed by a panel of scientists for scientific merit, novelty, impact, and feasibility, and four finalists were selected. Cast your vote on Twitter for your favorite project.

THE FINALISTS

1. The Gut Microbiome as a Risk Factor for Arsenic-Induced Cancer

Twitter: Gut & As-Induced Cancer

It is estimated that ~200 million people worldwide are exposed to arsenic concentrations exceeding current safety standards. Our collaborators have recently demonstrated that mouse and human microbiomes can protect mice from arsenic toxicity. While human stool supplementation fully restores protection against arsenic in germ-free mice, researchers were only able to isolate one microbe, Faecalibacterium prausnitzii, that successfully conferred protection to both parent and infant mice. These results are significant because arsenic poses the highest lifetime risk for developing cancer in humans. We will investigate the role of arsenic-transforming bacteria within the gastrointestinal (GI) microbiome as another possible risk factor.

In nature, arsenic-reducing microorganisms are well known for their ability to generate more toxic arsenic products called arsenites, which are typically formed in anaerobic environments like the gut. Past research indicates that ingested arsenic may also be transformed into toxic arsenite by gut microbes, thus increasing the risk for the host. On the other hand, arsenite-oxidizing microbes may benefit the host by lowering arsenite concentrations. The ability of the microbiome to transform arsenic is determined by its genetic composition; therefore, ProxiMeta sequencing technology will allow us to immediately analyze our collaborators’ rodent stool samples for genetic clues regarding this mysterious protection. Our project goals are to expand on this knowledge by (1) characterizing the genetic basis for the protection against arsenic provided by the microbiome and (2) identifying, and then isolating, the bacteria harboring the arsenic-transforming genes involved in protection.

We predict that differences in gut metagenome composition will explain variation in arsenic susceptibility within a population or even at the family level. This project will provide important insight into how gut microbes contribute to cancer and may lead to novel therapies and probiotics that could target the microbiome of arsenic-exposed individuals.


2. Evaluating Antimicrobial Resistance in Backyard Poultry Environments

Twitter: AMR in Backyard Poultry

Approximately 13 million rural, urban, and suburban US residents reported owning backyard poultry (BYP) in 2014, and interest in BYP ownership is nearly four times that amount. BYP ownership has risen recently due to product quality, public health, ethical, and animal welfare concerns about commercial operations. However, unlike commercial poultry production, BYP ownership and disease treatment are largely under-regulated. This lack of regulation raises public health concerns about the transmission of antimicrobial-resistant (AMR) bacteria, such as the AMR strains of Salmonella, Mycoplasma gallisepticum, and Escherichia coli commonly associated with BYP. BYP owners surveyed in 2014 were largely uninformed about poultry diseases and treatments but were interested in learning more about disease management.

The combination of a lack of regulation and a lack of public information warrants further research into the bacterial communities of BYP and their environments. Cloacal and environmental swabs were collected as part of a 2018 citizen science study in which BYP owners reported current and historical poultry antibiotic usage. We propose to conduct shotgun metagenomic sequencing and proximity ligation using the ProxiMeta platform, allowing for increased detection of full-length AMR gene alleles compared to that revealed by short-read sequencing. The combination of PacBio reads with Hi-C intercontig ligation analysis allows for identification of potential transfer events of AMR genes within communities and their potential dissemination throughout the environment.

This analysis is especially important considering the public health concerns about AMR persistence in backyard environments. Additionally, examining lytic phage and prophage presence would allow investigation of phage-mediated bacterial regulation that would not be possible with short-read sequencing alone. ProxiMeta analysis of these samples would provide the most comprehensive insight to date into AMR presence and persistence in BYP environments. These findings will be critical for new regulation and disease management for the increasing number of BYP flocks, which currently pose a potential health risk.


3. Unraveling the Metagenomics of Contamination

Twitter: Steel Site Contamination

We propose a metagenome characterization of contaminated Munger Landing sediment located in the St. Louis River, Duluth, MN, USA. Seasonal samples have already been collected and stored, and one of them will be sequenced. Munger Landing is located downstream from the U.S. Steel Superfund site, and contaminants include PAHs, dioxins, PCBs, and heavy metals.

Soil condition is integral to high productivity and ecosystem balance at all trophic levels. Human activities erode soil condition through agriculture, mining, sewage outflows, and chemical or waste disposal into waterways. These practices alter the chemical structure of the soil and break down the microbial community processes responsible for keeping biogeochemical cycling in the soil balanced. We hypothesize that the activity of the pathways involved in cycling nitrogen, phosphorus, and carbon is altered in contaminated soil systems.

Metagenomic profiling of Munger Landing will provide data to examine the microbes, metabolic pathways, and contaminant-processing genes present in the community, which can be characterized further using qRT-PCR. This project will be presented within a community college microbiology course module. A curriculum built on real-world data and the sequencing technology from Phase Genomics will teach students experimental design, troubleshooting, hypothesis testing, data analysis, and how to communicate the broader impacts of a study to society and to the fields of environmental microbiology and conservation.

In the future, this data will assist in designing a longitudinal metagenomic and metatranscriptomic study to assess the ability of remediation to ‘recover’ bacterial community function at the Munger Landing site, where remediation is slated to start in 2020-2021, as compared to two uncontaminated control sites. Ten sites slated for remediation have been identified as having high chemical and heavy metal contamination in the St. Louis River Estuary. The Munger Landing project will establish a workflow that can be applied to other contaminated sites.


4. The Complete Hydrothermal Microbial Metal Metabolism

Twitter: Hydrothermal Microbiome

Hydrothermal vents replenish the oceans with much-needed micronutrients, spewing iron, magnesium, nickel, and other metals from the earth’s crust. These metal micronutrients are used as biological cofactors by organisms throughout the marine food chain. Boiling, sterile hydrothermal fluids quickly cool and are colonized by highly specialized microorganisms that begin to cycle the metal species mixing with the seawater. Though hydrothermal plumes are regularly sampled, they have rarely been tracked through the water column to establish how microbial colonization occurs through time and space. We lack an understanding of how replicable colonization is and of the extent to which stochastic processes shape microbial community structure.

While on station at the East Pacific Rise hydrothermal vent field, size-fractionated samples (0.2, 3.0, and 5.0 μm) were collected in the hydrothermal plume emanating from Bio Vent. Sample fluids were collected from the source through the first 1 km of dispersal (the key distance for colonization), and this effort was repeated over the course of 10 days to determine the replicability of natural colonization events. The application of standard metagenomic sequencing and microbial genome reconstruction through binning would provide novel insight into the cycling of metals within the plume, but the use of cross-linked DNA techniques would deliver an unprecedented understanding of how strain diversity impacts colonization and how microbes interact with extrachromosomal elements in the environment.

While some microbes are poised to take advantage of reduced metal species for lithotrophic growth, microbes from the water column that become entrained in the plume will need metal-resistance adaptations to alleviate stress from the elevated metal concentrations present. Metal-resistance genes dispersed through the viral and plasmid pools are essential to understanding how the microbial community functions in this globally important source of metals to the oceans. Effective interpretation of the community can only be achieved through cross-linked DNA metagenomic techniques.

*All finalist projects are owned by verified researchers at U.S. academic institutions.


 

RESOURCES

 

Earth’s Wine Cellar: Digging into the Microbiome of Vineyards

 

Phase Genomics partnered with Browne Family Vineyards to begin to understand the microbiome makeup of soils within different vineyards across the state of Washington. The findings were unveiled at the Pacific Science Center’s “STEM: Science Uncorked” wine tasting event.

There are many different factors that contribute to soil composition, such as parent material, topography, climate, geological time, and the thousands of different and undiscovered microbes living in the soil—the least understood factor. In April of 2018, Browne Family Vineyards staff visited five of their vineyards, filled a bag with soil from each site, and sent it to Phase Genomics to analyze the microbiome in each of the soil samples.

SYMBIOSIS BETWEEN PLANTS AND MICROBES

Plants rely heavily on their microbiome to live, grow, and protect themselves from pathogens. One example of this symbiotic relationship is that plants release chemicals into the soil in order to attract microbes. These microbes bring nutrients such as nitrogen, iron, potassium, and phosphorus to the plants in exchange for sugar, which the microbes require to survive. Microbes also play an important role in nitrogen fixation, organic decay, and biofilm production to protect the plant roots from drought. It is evident that this symbiotic relationship between microbes and plants is critical to the health and survival of both, but further research into this complex community is inhibited by two main problems: It is impossible to isolate microbes in such a complex mix and most of the microbes have never been discovered before.

THE DARK MATTER OF THE MICROBIOME

Microbes live in communities where they rely on each other. This makes it difficult to isolate or culture (i.e., grow) microbes without killing them or altering their genetic makeup. Moreover, there can be millions of microbes living in a single teaspoon of soil, making these samples extremely complex environments. As a result, most of the microbial world remains unknown and is sometimes referred to as the “Dark Matter of the Microbiome”.

The most effective way to identify the microbes in the community is to look at the genetic makeup of the microbiome and try to classify the microbial genomes present. Standard practices include sequencing of the 16S rRNA gene (a marker gene that contains hypervariable regions) and shotgun sequencing. By combining these standard practices with Hi-C, researchers are now able to fully reconstruct genomes from a mix because Hi-C captures the DNA within each microbe to exploit key genetic features unique to each individual in the community. The Phase Genomics Hi-C kit and software, ProxiMeta™, uses this information to capture even novel genomes straight from the sample without culturing, illuminating the dark matter of the microbiome.

THE PROCEDURE

Figure 1: Shotgun Sequencing Procedure and Difficulties

Once the soil samples were collected from the five vineyards, Phase Genomics produced shotgun libraries to obtain DNA from all of the microbes in each sample (Figure 1). This essentially means taking the soil sample, breaking open all of the microbial cells, and then purifying the DNA (1.A). Since DNA is fragile, most of it gets broken into smaller pieces during this process, leaving a mix of many DNA fragments from all of the microbes that were present in the original soil sample. The fragmented DNA is then sequenced, and the “sequence reads” are compared against a database of known microbial genomes (1.B). The database is searched for matches, or “hits,” to see if the reads are similar to anything it contains (1.C).

A problem with relying on shotgun data is that it’s unclear which DNA fragments belong to which microbe, so classification relies heavily on computational techniques and on the accuracy of the reference database. This leaves much of the sample’s makeup unresolved and, again, leaves the microbiome in the dark. Although shotgun sequencing provides only a glimpse into the microbial community, these data allow scientists to differentiate the taxonomy (phyla, genera, species) of the microorganisms living in the soil.

THE RESULTS

Shotgun sequencing identified over 10,000 different species in each of the vineyard soil samples; however, it is impossible to know if this is the true number of species because only ~20% of the reads matched the database, indicating that ~80% of the reads came from genomes that are either incomplete in the database or undiscovered (see Table 1).

Table 1: Vineyard Read Classification

Vineyard     | Total Reads | Percent of Reads Classified | Number of Organisms Found | Percent of Unknown Organisms
Canyon       | 19,001,222  | 15.95%                      | 10,726                    | 73.32%
Canoe Ridge  | 21,214,190  | 17.66%                      | 11,721                    | 55.55%
Waterbrook   | 19,469,954  | 19.6%                       | 10,782                    | 50.58%
Skyfall      | 63,850,810  | 16.17%                      | 15,101                    | 80.08%
Willow Crest | 43,941,026  | 17.13%                      | 13,914                    | 71.84%
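
To make the numbers in Table 1 more concrete, here is a minimal Python sketch that converts each vineyard's classified percentage into an approximate read count and computes a pooled (read-weighted) classification rate. The read totals and percentages are copied from Table 1; the names `TABLE_1`, `classified_reads`, and `pooled_percent_classified` are made up for this illustration and are not part of any Phase Genomics software.

```python
# Values copied from Table 1: vineyard -> (total reads, percent of reads classified).
TABLE_1 = {
    "Canyon": (19_001_222, 15.95),
    "Canoe Ridge": (21_214_190, 17.66),
    "Waterbrook": (19_469_954, 19.6),
    "Skyfall": (63_850_810, 16.17),
    "Willow Crest": (43_941_026, 17.13),
}


def classified_reads(total_reads, percent_classified):
    """Approximate number of reads that matched the reference database."""
    return round(total_reads * percent_classified / 100)


def pooled_percent_classified(table):
    """Read-weighted classification rate across all vineyards."""
    total = sum(reads for reads, _ in table.values())
    classified = sum(reads * pct / 100 for reads, pct in table.values())
    return 100 * classified / total


if __name__ == "__main__":
    for vineyard, (reads, pct) in TABLE_1.items():
        print(f"{vineyard}: ~{classified_reads(reads, pct):,} classified reads")
    print(f"Pooled: {pooled_percent_classified(TABLE_1):.2f}% classified")
```

Because the deeper-sequenced samples (Skyfall and Willow Crest) carry more weight, the pooled rate is not the same as a simple average of the five percentages.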

 

Moreover, of the assigned reads, >50% did not match a genus or species, hinting that many of the organisms found are novel. Without digging too deep into the microbiome analysis, it is evident that the microbial makeup is different for each of the samples. The fraction of reads that could be classified varied by vineyard (Table 1), and among the classified reads, the vineyards have 3-4 microbial groups in common, which vary in abundance. These groups, such as Proteobacteria, Rhizobacteria, and Actinobacteria, are generally very common in soil.

Proteobacteria

There are obvious differences in the biodiversity of the soil samples, both in number of species and in relative abundance. For example, the Canoe Ridge and Waterbrook samples were >20% Delftia, while the microbes in the other vineyards were more evenly distributed, with abundances closer to 1-5%. Interestingly, Delftia, a rod-shaped bacterium, has the ability to break down toxic chemicals and to produce gold.

Actinobacteria

There are two main components that influence microbe classification in these samples: the desired taxonomy level, and the statistical threshold, or minimum number of reads, set to define it. Much like zooming in and out, the most “zoomed out” analysis is achieved with a stringent threshold and will reveal phylum, while the most “zoomed in” analysis is achieved with a more lenient threshold and will reveal genus and species.
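
The interplay between taxonomy level and read threshold can be illustrated with a toy example. The sketch below is not the classifier used for the vineyard data; it simply counts reads per taxon at a chosen rank and keeps taxa that reach a minimum read count. The function name `call_taxa` and the eight toy reads are invented for illustration.

```python
from collections import Counter


def call_taxa(read_assignments, rank, min_reads):
    """Return {taxon: read count} for taxa at `rank` supported by >= min_reads reads.

    read_assignments: list of dicts mapping a rank name ("phylum", "genus")
    to a taxon name; a rank is simply absent when the read could not be
    assigned at that level.
    """
    counts = Counter(a[rank] for a in read_assignments if a.get(rank))
    return {taxon: n for taxon, n in counts.items() if n >= min_reads}


# Toy reads (invented): every read has a phylum, but not every read has a genus.
reads = (
    [{"phylum": "Proteobacteria", "genus": "Delftia"}] * 3
    + [{"phylum": "Proteobacteria", "genus": "Bradyrhizobium"}] * 2
    + [{"phylum": "Actinobacteria", "genus": "Streptomyces"}]
    + [{"phylum": "Actinobacteria"}] * 2
)

phyla_strict = call_taxa(reads, "phylum", min_reads=3)   # zoomed out, stringent
genera_strict = call_taxa(reads, "genus", min_reads=3)   # zoomed in, stringent
genera_lenient = call_taxa(reads, "genus", min_reads=1)  # zoomed in, lenient
```

With the stringent cutoff, both phyla are still called but only one genus survives; lowering the cutoff recovers more genera at the cost of accepting calls backed by very few reads.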

If the data is “zoomed in” further, about 37% of the microbes in each community can be identified by genus. On average, 63% of the communities do not match a genus at all, hinting that these microbes may never have been sequenced. The most abundant microbial genera present in these samples are Bradyrhizobium, Streptomyces, and Nocardioides.

As discussed earlier, this data highlights the issues that are present with shotgun data and the corresponding analysis: there is still far too much that is unknown. In order to better understand these samples, we also performed Hi-C on two of the samples which will be discussed in further detail in the next section.

 

HI-C AND FINDING NOVEL GENOMES

One thing all these soil samples have in common is that they are composed of numerous novel species. To obtain more information on the microbes present in these samples, and solve the issue discussed earlier surrounding shotgun data, Hi-C was performed on two of the soil samples, Skyfall and Willow Crest. Essentially, Hi-C assigns DNA fragments from shotgun sequencing to the correct species by connecting DNA while the cells are still intact.

Hi-C enables clustering of shotgun assemblies and subsequently yields complete genomes from a microbiome, even if the genome has never been sequenced before. With complete microbe genomes, it becomes easier to classify organisms down to the strain-level—a step even further than species. By having the genome, we can essentially read a microorganism’s blueprint and learn more about its genes, evolution, and even function once the genome is annotated.
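
As a rough illustration of the clustering idea, the sketch below groups contigs that are connected by enough Hi-C links, on the assumption that DNA cross-linked inside the same cell comes from the same genome. This is a deliberately simplified connected-components approach and is not the algorithm ProxiMeta uses; the function name `cluster_contigs`, the `min_links` cutoff, and the toy data are all assumptions made for this example.

```python
def cluster_contigs(contigs, hic_links, min_links=2):
    """Group contigs into clusters using Hi-C link counts.

    contigs:   list of contig names
    hic_links: dict mapping (contig_a, contig_b) -> number of Hi-C read pairs
    min_links: hypothetical cutoff below which a link is treated as noise
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), n_links in hic_links.items():
        if n_links >= min_links:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for c in parent:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


# Toy data (invented for illustration, not from the vineyard samples).
contigs = ["c1", "c2", "c3", "c4", "c5"]
hic_links = {("c1", "c2"): 10, ("c2", "c3"): 4, ("c3", "c4"): 1, ("c4", "c5"): 7}
clusters = cluster_contigs(contigs, hic_links)
```

In the toy data, the single link between `c3` and `c4` falls below the cutoff, so the five contigs split into two clusters. Real data are noisier, and production tools normalize link counts (for example by contig length and coverage) before clustering.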

For example, preliminary data from the Willow Crest soil sample yielded 400 different genome clusters. When compared to known bacterial genomes in the RefSeq database, a curated collection of published reference genome sequences, over half of the extracted genomes cannot be identified at the genus level and thus likely represent newly discovered bacterial organisms.

SCIENCE UNCORKED

When the microbiome data from the vineyards were presented to the public at the Pacific Science Center, two questions consistently arose: How does this influence wine taste, and how can growers select for a healthy microbiome? These very forward-thinking questions unfortunately cannot be answered—yet.

Scientists do know that soil plays a big role in plant health, and this could in part be due to the plants’ symbiotic relationship with microbes, as discussed earlier. It has also been shown that biodiversity can benefit plants because of the diverse functions individual microbes have, i.e., with more microbes, there are more potential functions being served, versus one microbe serving one function. However, nailing down answers to these questions will take a lot of research. With emerging technologies like Hi-C, the answers have become much more obtainable.

Though the term “microbiome” may not be household vocabulary, many of the attendees were well aware of the role that microbes play in human health and how they influence the world around us. It goes to show that the rapid developments in the microbiome field are reaching beyond research and becoming more tangible for the general public. Relevant stories, like looking into the microbiome of vineyards, are helping the public understand the intricate concept of microbial life.

Learn more about ProxiMeta Hi-C and the microbiome by visiting our website, www.phasegenomics.com, and connect with us on Twitter by following @PhaseGenomics.

Lil BUB Aids in Discovery of New Bacteria

Published author, talk show host, movie star, musician, and philanthropist: Lil BUB has now also helped to discover novel microbial life living in her gut, in collaboration with AnimalBiome, KittyBiome, and Phase Genomics. Enter to sequence your cat’s microbiome in our #Meowcrobiome Twitter raffle!

 

We live in an era of discovery, especially as it relates to the microbiome and how microbial diversity influences our world, our health, and even our pets’ health. To better understand the microbial life of our feline friends, Lil BUB volunteered to have her gut microbiome sequenced. Thanks to a recent collaboration with AnimalBiome, KittyBiome, and Phase Genomics, Lil BUB helped discover 22 new microbes living in cats which, in time, could reveal new insights into cat health and happiness.

When KittyBiome started back in 2015 with the intent to understand the cat microbiome, Lil BUB’s owner Mike “Dude” Bridavsky provided a sample of her poop to be analyzed. Because of Lil BUB and over 1,000 other cats, KittyBiome’s microbial census will help us identify which microbes are associated with healthy cats and work towards helping cats with Inflammatory Bowel Disease (IBD), diabetes, and other ailments likely to be associated with the microbiome.

 

USING GENOMICS TO FIND MICROBES

Late last year, Phase Genomics offered to analyze samples from Lil BUB and another cat, Danny (who belongs to Jennifer Gardy, a microbiologist at the University of British Columbia and science TV host), using our ProxiMeta™ Hi-C Metagenomic Deconvolution platform to obtain complete microbial genomes from their samples. This method solves a huge problem in microbiome research: how to tell apart different species when their DNA is all mixed up in one sample (imagine a thousand jigsaw puzzles mixed together).

ProxiMeta Hi-C revealed about two hundred different species of microorganisms living in Lil BUB’s and Danny’s poop, many of which have never been seen before. The genome sequences of the microorganisms found in these samples were analyzed using our software and other microbiome analysis tools to measure the quality of the different assembled genomes and to see if those genomes matched any known microbes (Lil BUB’s and Danny’s data are available for free on our website). Without using our ProxiMeta Hi-C platform to extract these genomes, many of them would have gone undetected.

Phase Genomics sequenced both Lil BUB (left) and Danny’s (right) poop samples.

 

OVER 20 NEW BACTERIAL GENOMES DISCOVERED

Together, Lil BUB and her buddy Danny carry 22 previously undescribed bacterial species in their guts. Lil BUB’s poop sample had 13 species and Danny’s sample had 9 species that have never before been fully sequenced or characterized.

These new bacterial species mostly belong to the order Clostridiales, and the team is currently analyzing the genomes to better characterize them. This discovery will help continue to build a database that contains cat bacteria that are new to science, so we can better identify the contributions of the microbiome to various health conditions.

This cool discovery, made with the help of Lil BUB and Danny, highlights that there’s a universe of undiscovered microbial life out there. If we found 22 potentially novel species in only two cats, just imagine what else is out there, and what the implications might be for new ways to support and improve the health of our pets.

 

WHO ARE OUR HERO CATS?

Lil BUB is a one-of-a-kind critter, made famous on the Internet by her adorable genetic anomalies. She is a “perma-kitten”, which means she will stay kitten-sized and maintain kitten-like features her entire life. She has an extreme case of dwarfism, which means her limbs are disproportionately small relative to the rest of her body. Her lower jaw is significantly shorter than her upper jaw, and her teeth never grew in, so her tongue is always hanging out. Lil BUB is also a polydactyl cat, meaning she has extra toes (22 toes in total!). Lil BUB and Her Dude travel all over the country raising hundreds of thousands of dollars for animals in need.

Danny, an exotic shorthair with a face much like Grumpy Cat, is equally adorable.  He is the companion cat of one of KittyBiome’s original researchers, Jennifer Gardy, and was one of the very first cats to lend his poop profile to the KittyBiome initiative.  He is a very healthy cat and his microbial profile has helped us learn what a balanced gut in cats looks like.

WHAT’S NEXT?

Phase Genomics and AnimalBiome are eager to learn more about these newly-discovered bacterial species. They hope to work with the scientific community to analyze, identify, characterize and publish these genomes, starting with exploring their identities based on 16S rRNA and other marker genes.

HOW TO GET INVOLVED

  • Help characterize the new bacteria: If you know of a researcher, scientist or cat-lover who would like to help us, we are soliciting input on the analysis that needs to be done to properly characterize and publish these genomes. Participants who contribute in a substantive manner to the project will be co-authors on the publication. All data associated with the project will be deposited into publicly available databases and we will publish the findings in open access journals, so all pet lovers can read them. We will hold a raffle to award one lucky contributor a free Hi-C sample kit from Phase Genomics. If interested, contact us at team@animalbiome.com to learn more.
  • Name the new bacteria: We’re looking for input from the community on what we should name these 22 new bacteria, so if you have any fun ideas, please drop us an email at team@animalbiome.com. The format should follow standard practices of scientific nomenclature, so it should be constructed like this: “Clostridium _________.”
  • Submit your pet’s sample for genomic research: If you don’t win the raffle and still want your pet to contribute to scientific knowledge through the identification of new bacterial species, please contact us at team@animalbiome.com. We can provide you with the details and pricing involved for us to identify new species in your cat or dog through in depth analyses like we did for Lil BUB and Danny using the Hi-C approach pioneered by Phase Genomics, which would also result in a publication.

Improving databases of the microbiome of cats (and dogs) with new bacteria like this could help us learn more about how the gut microbiome helps support the digestive health of all pets.

ENTER YOUR CAT IN OUR TWITTER RAFFLE

Phase Genomics, AnimalBiome, and KittyBiome are hosting a Twitter raffle where you can enter to sequence your cat’s microbiome! All you have to do is go to either Phase Genomics’ or AnimalBiome’s original tweet of this blog post and retweet it with a picture and introduction of your cat and the hashtag #Meowcrobiome. On August 8th, 2018, we will randomly draw one (1) winner whose cat’s poop will be scientifically analyzed by Phase Genomics with ProxiMeta Hi-C to search for novel microbes, and three (3) additional winners who will receive a Kitty Kit to have their cat’s poop analyzed by AnimalBiome and compared to healthy cat guts. Send in your cat’s poop, and you too can help discover new microbial life!

LIL BUB AND DANNY’S STORY FEATURED ON GEEKWIRE PODCAST

GeekWire discussed Lil BUB, Danny, and the new bacteria found in their poop in their weekly Week in Geek podcast. Check out the full podcast on their website (the segment begins around 22:58), or play just the segment about Lil BUB and Danny below.

Phase Genomics and Pacific Biosciences Co-Developing New Genome Assembly Phasing Software

“FALCON-Phase” – an algorithm for producing diploid genomes.

 

Phase Genomics has entered into a co-development agreement with Pacific Biosciences to develop FALCON-Phase, a software module that combines Hi-C and PacBio® highly accurate long-read sequencing data to produce fully phased diploid genome assemblies. The software will be released later this spring.

FALCON-Phase augments PacBio Single Molecule, Real-Time (SMRT®) assemblies with Hi-C proximity-ligation data, generating accurate, fully-phased diploid assemblies. Specifically, it uses Hi-C’s chromatin proximity information to identify sequences belonging to the same parental chromosome in genome assemblies produced by PacBio’s FALCON-Unzip software, greatly reducing haplotype switching along the primary assembly.

Furthermore, by combining Phase Genomics’ Proximo Hi-C genome scaffolding technology with FALCON-Phase, users can fully reconstruct maternal and paternal haplotypes on a chromosomal scale. The end result is a diploid set of chromosome-scale scaffolds, or two fully-phased genomes for the same data and labor cost typical for a single genome project.

FALCON-Phase genome Phasing Graph

FALCON-Phase groups long-read contigs into two separate haplotypes based on Hi-C data. Red and blue edges show contigs connected to the same haplotype, while black edges show homologous contigs connected to both haplotypes. Colors were assigned based on the known phasing of the assembly, which was not otherwise used to inform the FALCON-Phase analysis.
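
The intuition behind grouping contigs into two haplotypes can be sketched with a toy greedy procedure: for each pair of homologous contigs, pick the orientation that places each contig with the haplotype it shares more Hi-C contacts with. This is only an illustration of the general idea and is not the FALCON-Phase algorithm; `phase_pairs`, the contig names, and the link counts are all invented for this example.

```python
def phase_pairs(pairs, links):
    """Greedily assign homologous contig pairs to two haplotype groups.

    pairs: list of (contig_a, contig_b) tuples, one per homologous region
    links: dict mapping frozenset({x, y}) -> number of Hi-C contacts
    Returns (phase0, phase1), two lists of contig names.
    """
    def contacts(x, y):
        return links.get(frozenset((x, y)), 0)

    phase0, phase1 = [], []
    for a, b in pairs:
        # Hi-C support for keeping a in phase0 and b in phase1 ...
        keep = sum(contacts(a, x) for x in phase0) + sum(contacts(b, y) for y in phase1)
        # ... versus swapping them.
        swap = sum(contacts(b, x) for x in phase0) + sum(contacts(a, y) for y in phase1)
        if swap > keep:
            a, b = b, a
        phase0.append(a)
        phase1.append(b)
    return phase0, phase1


# Toy data (invented): three homologous pairs with arbitrary initial labels.
pairs = [("p1_x", "p1_y"), ("p2_x", "p2_y"), ("p3_x", "p3_y")]
links = {
    frozenset(("p1_x", "p2_y")): 9,
    frozenset(("p1_y", "p2_x")): 8,
    frozenset(("p1_x", "p2_x")): 1,
    frozenset(("p2_y", "p3_x")): 6,
    frozenset(("p2_x", "p3_y")): 5,
}
phase0, phase1 = phase_pairs(pairs, links)
```

A greedy pass like this depends on the order in which pairs are visited and can get stuck on an early wrong choice, which is one reason real phasing tools use more robust optimization.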

These high-quality phased haplotypes can be leveraged to improve the efficiency of agricultural breeding programs, and could help identify disease-causing genomic variations in humans.

Prof. John Williams, Director of the Davies Research Centre at the University of Adelaide, Australia, wrote, “We are interested in expression of imprinted genes and for this work the availability of haplotype-resolved genome assemblies is an important advance. The release of software that enables the creation of haplotyped genome sequence assembly will revolutionize exploration of genome function. The FALCON-Phase software has this ability and can be applied retroactively to SMRT assemblies, as long as Hi-C data are available. Therefore, even pre-existing genomes can potentially be upgraded to haplotyped assemblies for little or no cost.”

Haplotype-resolved genome assembly is an exciting emerging field. Currently, there is only one other method, Trio Canu, which, unlike FALCON-Phase, requires the parents and offspring to be sequenced, adding an additional cost. For many species, it is not possible to collect a trio in the wild and breeding is often not an option. Other Hi-C phasing techniques exist, but they phase genetic variants, not genome assemblies.

The addition of ultra-long genomic interactions captured by Hi-C to PacBio assemblies is very powerful and presents a straightforward solution to a problem experienced by almost all genomic researchers working with diploid organisms.

A formal announcement with more information is coming in the next month. For more information, email us at info@phasegenomics.com.

 

Pacific Biosciences, the Pacific Biosciences logo, PacBio and SMRT are trademarks of Pacific Biosciences of California, Inc.

A sweet new genome for the black raspberry using Proximo™ Hi-C

Black raspberries

The black raspberry, known for its sweetness and health benefits, has been studied further to reveal its chromosome-scale genome.

What is a black raspberry, you may ask? Jams, preserves, pies, and liqueur are just a few of the delicious products made with black raspberry. But the black raspberry offers much more than its exquisite flavors. For instance, did you know it contains compounds called anthocyanins that are used as dyes? It is also used in anti-aging beauty products and contains compounds that may help fight cancer. The useful properties of black raspberry are encoded within the genome.

A multinational team of scientists has built a full map of the black raspberry genome. Teams from New Zealand, Canada, and the U.S.A. contributed to the project, led by Drs. Rubina Jibran and David Chagné. The work was published in Horticulture Research, a Nature-published journal. In the project, they leveraged Proximo™ Hi-C to order and orient short-read contigs into chromosome-scale scaffolds.

A chromosome-scale reference genome is an important step for basic biology and for breeding programs. Breeders can use this genome while crossing plants to select for traits like color or taste. To learn more about how Hi-C technology was used to improve the black raspberry genome, we contacted Dr. Chagné and Dr. Jibran for a Q&A session. We also wanted to get their take on the scientific value of Proximo Hi-C and to hear about their experiences working with us.

 

What is a black raspberry? How is it different from the blackberries we have in Seattle?

The black raspberry we used is no different from the ones found in Seattle. Actually, I remember seeing some black raspberries (also called black-caps) at Pike market a few years ago! Washington and Oregon are the largest producers of this delicious crop. Raspberries belong to the genus Rubus, which includes red (Rubus idaeus) and black (R. occidentalis) raspberries, blackberries, loganberries and boysenberries.

 

There are many curious uses of black raspberries, what’s yours?

Black and red raspberries are great on top of Pavlova, alongside slices of kiwifruit. Pavlova is New Zealand’s iconic dessert served around Christmas time, which is the berry fruit season down under here.

 

What are molecular breeding technologies? What are some of the traits in black raspberry you’d like to breed for?

Molecular breeding techniques use DNA to inform selection decisions. My colleague Cameron Peace from Washington State University did a very good review about the use of DNA-informed breeding in fruit trees. Plant & Food Research is leading in the use of molecular tools for breeding fruit species; for example, we are using genetic markers to predict if apple seedlings carry certain loci for black spot resistance or if they are likely to be red fruited. The breeding goals for Plant & Food Research’s raspberry breeding programme are high fruit flavour, berry anti-oxidant content, pest and disease resistance and higher productivity.

 

The initial black raspberry genome assembly was built from short-read data. Why did you choose to scaffold the short-read contigs rather than create a new long-read assembly? Would you get chromosome scale contigs from a long-read assembly? 

Actually, we took both approaches, and we decided we would like to see how much of the short-read assembly we could put together using Proximo Hi-C. A long-read based assembly will be released soon, and the comparison of both assemblies will be extremely informative on what strategy to use for future assemblies of other crop species.

 

How did you validate the Proximity Guided Assembly (PGA) scaffolds? How did you correct errors in the scaffolds?

The PGA for black raspberry was first validated by aligning it to a linkage map and then by aligning it to the genome of strawberry (Fragaria vesca) as they have syntenic genomes.
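One simple way to picture the linkage-map check described above is to ask whether markers appear in the same order on a scaffold as they do on the genetic map. The sketch below is only an illustration under stated assumptions: the marker positions are made up, the `spearman` helper is hypothetical, and it is not the authors' actual validation procedure. A rank correlation near 1 (or near -1 for a scaffold in reverse orientation) suggests the scaffold order agrees with the map.

```python
def spearman(xs, ys):
    """Spearman rank correlation for two equal-length lists with no tied values."""
    def ranks(values):
        # Rank 0 goes to the smallest value, rank n-1 to the largest.
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order):
            result[index] = rank
        return result

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical markers: (genetic position in cM, physical position in bp on one scaffold).
markers = [
    (0.0, 120_000),
    (5.2, 480_000),
    (11.8, 910_000),
    (20.1, 1_650_000),
    (33.4, 2_400_000),
]

genetic = [cm for cm, _ in markers]
physical = [bp for _, bp in markers]
rho = spearman(genetic, physical)
print(f"marker order agreement (Spearman rho): {rho:.2f}")
```

Markers that break the trend would lower the correlation and point to regions worth inspecting by hand.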

 

What was the process like in working with Phase Genomics? Would you recommend them to your colleagues?

I enjoy working with Phase Genomics a lot. Black raspberry is not the first genome on which we have collaborated with Phase Genomics, as we had previously assembled genomes for kiwifruit and New Zealand manuka. The way we work with Phase Genomics is very iterative and they are excellent at trying new methods and assembly parameters until we are satisfied with our assemblies. Every organism has its own challenges when it comes to genome assembly and working with Phase Genomics in a very collaborative way is extremely useful. I have recommended Phase Genomics to colleagues.

New Video: From Contigs to Chromosomes

Phase Genomics CEO and Founder Ivan Liachko, Ph.D., offers an inside look at our ProxiMeta™ Hi-C and Proximo™ Hi-C technology. In this 40-minute presentation, he explains how Hi-C is revolutionizing genome and metagenome assembly. Watch “From Contigs to Chromosomes” now and reach out to http://phasegenomics.com/contact-us/ with any questions.

Thanks to IMMSA for hosting this webinar.

Uncovering the microbiome: What will you do with metagenomics?

In this Nature Microbiology blog post, Mick Watson shares his journey into the depths of the rumen microbiome. Read more here to learn how Phase Genomics ProxiMeta Hi-C Metagenomic Deconvolution techniques are helping investigators advance their metagenomic research in complex samples. This study successfully assembled 913 genomes and, using these new metagenomics techniques, will help to improve our understanding of the microbial population in cow rumen in an unprecedented way. We look forward to seeing what else comes from Microbiome 2.0 and are proud to be a part of this impressive piece of work.

Hundreds of Genomes Isolated from Single Fecal Sample with Hi-C Kit

 

Hi-C Kit Microbiome

A Phase Genomics Hi-C kit for any sample type is now available!

Phase Genomics recently launched its ProxiMeta™ Hi-C metagenome deconvolution kit + software product, enabling researchers to bring this powerful technology (previously only available through the ProxiMeta service) into their own labs. A new paper posted to bioRxiv describes the results of employing ProxiMeta technology to deconvolute a human gut microbiome sample.

 

In the paper, ProxiMeta was used on a single human gut microbiome sample and isolated 252 individual microbial genomes or genome fragments, with 50 of these genomes meeting the “near-complete” threshold typically used as the standard according to the CheckM tool (>90% complete, <10% contaminated). Examining the tRNA and rRNA content of the genomes found 10 to meet “high-quality” and 75 to meet “medium-quality” thresholds. Additionally, 14 of the genomes represent near-complete assemblies of novel species or strains not found in RefSeq, showing that even after many years of research, there remain numerous unknown microbes in the human gut that are discoverable with new approaches.
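To make the “near-complete” threshold quoted above concrete, here is a minimal sketch that applies the >90% completeness and <10% contamination cutoffs to a handful of bins. The bin names and percentages are invented for illustration, and the function is a hypothetical helper rather than part of CheckM or ProxiMeta; in practice the two percentages would come from a quality-assessment tool's report.

```python
def is_near_complete(completeness, contamination):
    """Apply the thresholds quoted in the text: >90% complete, <10% contaminated."""
    return completeness > 90.0 and contamination < 10.0


# Hypothetical bins: name -> (completeness %, contamination %).
bins = {
    "bin_001": (97.3, 1.2),
    "bin_002": (92.0, 14.5),
    "bin_003": (61.8, 3.0),
}

near_complete = sorted(
    name for name, (comp, cont) in bins.items() if is_near_complete(comp, cont)
)
over_contaminated = sorted(
    name for name, (_, cont) in bins.items() if cont >= 10.0
)

print("near-complete:", near_complete)
print("exceeding the 10% contamination limit:", over_contaminated)
```

The same two lists are the kind of tallies behind statements such as how many genomes pass the threshold and what share of assemblies exceed the contamination limit.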

 

ProxiMeta’s results were compared to those achieved with MaxBin, a common tool used to perform metagenomic binning based on heuristics such as shotgun read depth and tetranucleotide profiles. MaxBin was able to create 29 near-complete genomes (cf. 50 for ProxiMeta), with only 5 meeting high-quality (cf. 10) and 44 meeting medium-quality (cf. 75) thresholds based on tRNA and rRNA content. In terms of ability to construct similar sets of near-complete genomes, ProxiMeta and MaxBin constructed 27 of approximately the same genomes, with ProxiMeta constructing an additional 32 genomes that MaxBin did not, and MaxBin constructing 9 genomes that ProxiMeta did not. ProxiMeta’s assembled genomes also exhibited a much lower amount of contamination than MaxBin’s assembled genomes, with 43% of MaxBin’s assemblies exceeding the 10% contamination limit that is the typical standard for genome quality, compared to only 2% of ProxiMeta’s assemblies.

 

Other results unique to ProxiMeta include the discovery of near-complete genomes for 14 novel species or strains and various associations of plasmids with their hosts. Of the 14 novel genomes, 10 appear to be of the class Clostridia, a common group of gut microbes that are poorly characterized because they are difficult to culture. ProxiMeta also assigned 137 contigs containing plasmid content to a cluster and identified candidate plasmid sequences as being present across multiple, distantly related bacteria. For example, ProxiMeta placed a known megaplasmid into an assembly for Eubacterium eligens that included homologous plasmid sequences placed into several other genomes, suggesting either the presence of the megaplasmid in other species, or variants of the megaplasmid being found on other mobile elements spread through the metagenome.

 

The depth of the resulting data offers the opportunity to learn much more about this microbial niche, and research continues to unlock new discoveries about this community. Phase Genomics is thrilled to be able to offer all researchers the same new power to dig deeper into their mixed samples than ever before, especially now with a product that puts the power of discovery in their hands.

 

To learn more about ordering our kits or services, just send us an email at info@phasegenomics.com

Orphan Crop Gains Reference Genome with Proximo Hi-C

Amaranth genome assembly brought to the chromosome-scale using Phase Genomics’ Proximo Hi-C technology. 

 

“Orphan crops” are growing in popularity because they have the potential to feed the world’s expanding population. You may have heard of orphan crops like quinoa or spelt, but have you heard of amaranth? The amaranth genus (Amaranthus) is a hardy group of plants that produce nutritious leaves and seeds (high in protein and vitamin content). Amaranth species grow strongly across a wide geographic range, including South America, Mesoamerica, and Asia. Amaranth was likely domesticated by the Aztec civilization and has been a staple food of Mesoamericans for thousands of years. Breeders wish to enhance amaranth’s beneficial properties like drought resistance, nutrition, and seed production to improve the usefulness of amaranth as a food source. However, effective plant husbandry requires genetic and genomic resources, and building these resources has been inhibited by the high cost of genome sequencing and assembly.

 

Genome assembly Hi-C Orphan Crop

Dr. Jeff Maughan (left) and Dr. Damien Lightfoot (right) are the primary authors of the amaranth genome paper.

Dr. Jeff Maughan, professor at Brigham Young University, is a champion of orphan crop genomics. Over the past year, Dr. Maughan and his team built a reference-quality amaranth genome on a tight budget. They built upon an earlier short-read assembly by adding Hi-C data, which measures the conformation of chromatin in vivo, as well as low-coverage long reads and optical mapping data. After using optical mapping to correct assembly errors in the short-read assembly, the Hi-C data was used to cluster the short genome fragments into nearly complete chromosomes using Phase Genomics’ Proximity-Guided Assembly platform, Proximo™ Hi-C. Then, the long reads were used to close remaining gaps on the chromosomes. This cost-effective strategy recovered over 98% of the 16 amaranth chromosomes.
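The clustering step mentioned above rests on a simple observation: fragments from the same chromosome tend to share many more Hi-C links than fragments from different chromosomes. The toy sketch below illustrates that idea with greedy agglomerative clustering on made-up link counts. It is not the Proximo algorithm; the contig names, lengths, link counts, and the `cluster_contigs` helper are all assumptions chosen for illustration.

```python
from itertools import combinations


def cluster_contigs(links, lengths, n_groups):
    """Greedily merge the two groups with the highest Hi-C link density
    (links divided by the product of group lengths) until n_groups remain."""
    clusters = {name: {name} for name in lengths}

    def density(a, b):
        total = sum(
            links.get(frozenset((x, y)), 0)
            for x in clusters[a]
            for y in clusters[b]
        )
        size_a = sum(lengths[x] for x in clusters[a])
        size_b = sum(lengths[y] for y in clusters[b])
        return total / (size_a * size_b)

    while len(clusters) > n_groups:
        a, b = max(combinations(sorted(clusters), 2), key=lambda pair: density(*pair))
        clusters[a] |= clusters.pop(b)

    return sorted(sorted(members) for members in clusters.values())


# Hypothetical contigs (lengths in kb) and Hi-C link counts between pairs.
lengths = {"c1": 100, "c2": 100, "c3": 100, "c4": 100, "c5": 100}
links = {
    frozenset(("c1", "c2")): 50,
    frozenset(("c2", "c3")): 40,
    frozenset(("c1", "c3")): 20,
    frozenset(("c4", "c5")): 60,
    frozenset(("c1", "c4")): 2,   # weak cross-chromosome background signal
    frozenset(("c3", "c5")): 1,
}

groups = cluster_contigs(links, lengths, n_groups=2)
print(groups)
```

Dividing by the product of group lengths is one simple normalization so that long fragments do not attract merges merely because they have more sites to link to; real scaffolders use more careful models and also order and orient fragments within each group.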

 

The completed reference genome provides an important resource for the community and will boost the efforts of plant breeders to unlock more agricultural benefits for amaranth.  In their paper, Dr. Maughan’s team demonstrated the utility of the reference quality genome in at least two ways.  First, they looked at chromosomal evolution by comparing the amaranth genome to the beet genome, which enables researchers to better understand amaranth in the context of how plants evolved, and second, they mapped the genetic locus responsible for stem color, which clarifies the scientific understanding of a useful agricultural trait.  Dr. Maughan points out that both of these experiments would have been impossible without the chromosome-scale genome assembly afforded by Proximo Hi-C.

 

A high-quality reference genome is the first of many important steps towards creating a modern breeding program for amaranth. We contacted Dr. Maughan to learn more about how he is improving amaranth genomics and the importance of orphan crops.

 

What is an orphan crop? 

According to the FAO (Food and Agriculture Organization of the United Nations) the world has approximately 7,000 cultivated edible plant species, but just five of them (rice, wheat, corn, millet, and sorghum) are estimated to provide 60% of the world’s energy intake and just 30 species account for nearly all (95%) of human food energy needs. The remaining species are underutilized and often referred to as “orphan crops”.

 

How is genomics relevant to orphan crops?

Would you invest your entire 401K savings in just three stocks? In essence, that is what we are doing with world food security. This comes with tremendous risk. If we are going to diversify our food crops, it will be with these orphan crops. Modern plant breeding programs leverage genomics to significantly enhance genetic gain (yield); such methods will undoubtedly expedite the development of advanced varieties in orphan crop species.

 

What are the challenges facing researchers interested in orphan crop genomics?  How have you overcome them?

Funding has long been the main obstacle to developing genomic resources for orphan crops. The development of cheap, high-quality next-generation sequencing technology has dramatically ameliorated this problem – making genomics accessible for most plant species.

 

You used two scaffolding technologies for your assembly, Hi-C and BioNano. How did they compare?

Both technologies are extremely useful and complementary but address different genome assembly challenges.  The Hi-C data allows for the production of chromosome length scaffolds, while the BioNano data allows for fine-tuning and verification of the assembly.

 

Beyond building a high-quality genome assembly, what other genomic resources are required to encourage the adoption of orphan crops?

While genomic resources (such as genome assemblies and genetic markers) are fundamental for developing a modern plant breeding program, often what is missing with orphan crops is the collection of diverse germplasm (or gene bank) that is the foundation of a hybrid breeding program.  The U.S. and other nations have extensive collections (tens of thousands of accessions) that serve as the genetic foundation for staple crop breeding programs – unfortunately, such collections are minimal or non-existent for orphan crops.

 

Who stands to benefit the most from a complete amaranth genome?  How do you disseminate your work to them?

We collaborate extensively with researchers throughout South and Central America, where amaranth is already valued as a regionally important crop. Dissemination of our research occurs through traditional methods (e.g., peer-reviewed publications) as well as through sponsored scientist and student exchanges.

 

Amaranth is used in a variety of interesting foods, what’s your favorite dish?

Alegría, which is made with popped amaranth and honey, and is common throughout Mexico.

 

Spotlight on Hi-C in Science: New Technologies Boost Genome Quality

Science writer Elizabeth Pennisi outlines available genomics technologies that are helping researchers improve genome assemblies, with a focus on Hi-C’s ability to bring genome assembly to the chromosome-scale.

This article, by Elizabeth Pennisi, focuses on how new technologies are substantially improving genome quality. Long reads, optical maps, and Hi-C data are being synergistically applied to improve modern genome assemblies including goat (Dr. Tim Smith), hummingbird (Dr. Erich Jarvis), maize, and more. Importantly, Hi-C provides the finishing touch to these genomes by providing ultra-long contiguity information that can scaffold entire chromosomes. We at Phase Genomics are glad researchers have chosen Proximo Hi-C to scaffold the goat, hummingbird, and hundreds of other assemblies into contiguous chromosome-scale reference genomes.

 

Read the article here

Hi-C Used to Assemble Extremely Large, Difficult Barley Genome

Barley is the 4th most cultivated plant in the world and has been a reliable food source for over 10,000 years. GenomeWeb reports on the exceptional state of the genome assembly and how researchers used Hi-C technology to tackle this extremely complex genome.

 

The barley genome, like those of many other grains, is notorious for being extremely difficult to assemble due to extensive polyploidy, long repeat regions, and its large genome size (5.3 Gb). However, the International Barley Genome Sequencing Consortium (IBSC) used Hi-C to tackle this genome assembly, producing chromosome-level scaffolds representing over 95% of the genome in an attempt to understand the biology of this widely cultivated plant. After completing the assembly, the researchers began annotating the genome and identified over 87,000 different genes, publishing their findings in Nature.

 

Obtaining reference-quality assemblies for complex genomes, such as barley, used to be an extremely challenging endeavor. With Hi-C, obstacles like polyploidy and multi-Gb genomes are manageable because the method captures ultra-long-range genomic contiguity information from unbroken chromosomes, replacing the need for genetic maps. This ability enables researchers to answer questions that were otherwise difficult or impossible to address, including questions about structural variation, complex gene structure, gene linkage, gene regulation, and more. While the researchers performed the barley assembly themselves, Phase Genomics’ Proximo Hi-C service makes it easy for any researcher to obtain similar results and has been used to assemble hundreds of genomes to chromosome-scale over the past two years, including complex genomes like barley.

 

Read more about the barley genome on GenomeWeb.

Spotlight on Hi-C in The Atlantic: The Game-Changing Technique That Cracked the Zika-Mosquito Genome

One of the most prolific science writers, Ed Yong, profiles how Hi-C sequencing technologies can make genome assembly easier and more cost-effective than ever before. 

Science writer Ed Yong tells the story of the researchers who tackled the genome of the disease-carrying mosquito Aedes aegypti, and how Hi-C “knitted” the genome from 36,000 pieces into complete and contiguous chromosomes. Yong points out that the completed genome will not only help scientists better understand the biology of the mosquito at a much deeper level, but it also marks a technological pivot in genomics: Hi-C makes genome assembly cheaper, more accurate and faster than ever before. The article also cites two examples of the power of Proximo Hi-C scaffolding: the newly published three-spine stickleback genome from our collaborator, Dr. Catherine Peichel, and Dr. Erich Jarvis’s hummingbird genome.

 

Read the article here

Papadum’s Recipe for an Outstanding, Chromosome-Scale Genome with Hi-C

Meet Papadum the Goat! Papadum is a descendant of a rare population of goats that used to inhabit San Clemente Island, and notably, Papadum also now holds the world record for the most contiguous non-model mammalian genome. The recipe for his amazing de novo genome assembly? Long reads, optical mapping, and Proximo Hi-C genome scaffolding. Read NIH’s article about Papadum’s genome here.

 

The goat genome has been of scientific interest for several reasons: goats are important suppliers of milk, cloth, meat, and more. But prior to the Papadum genome, scientists’ ability to fully understand how the goat genome controls its biology was limited. As a part of the “Feed the Future” initiative, in 2014 the U.S. Agency for International Development awarded innovative scientists Dr. Tim Smith, Dr. Derek Bickhart and Dr. Adam Phillippy a grant to attempt to eliminate these limitations by assembling Papadum’s genome. As pioneers in the genomics field, the scientists teamed up to leverage two rather young technologies, long reads and Hi-C, to create an ultra-high-quality new assembly of the goat genome.

 

Their efforts ultimately led to the creation of the highest quality de novo genome assembly of a mammal to date and are published in Nature Genetics. With this new reference-quality goat genome, scientists will have a better understanding of goat biology and health to guide better breeding decisions, improving traits like milk production, meat quality, and resilience to disease.

 

The Papadum genome assembly includes large DNA sequences called “chromosome-scale scaffolds”, which are nearly complete representations of entire chromosomes from Papadum. These chromosome-scale scaffolds are a critical achievement that allows a far better understanding of the mechanics of the goat genome than earlier, less advanced results, which included thousands of tiny fragments of chromosomes and lacked the overall structure of the goat genome. The difference is not unlike having an entire intact book, versus a jumble of all the individual words from the book.

 

The ability to reconstruct nearly complete chromosomes was made possible largely by a new technique called Proximity-Guided Assembly, performed with Phase Genomics’ Proximo™ Hi-C scaffolding technology. This process was followed by a tool called PBJelly, which identifies and closes gaps (regions of uncertainty) in the chromosome-scale scaffolds. After Proximo and PBJelly, the resulting assembly included 31 chromosome-scale scaffolds containing only 663 gaps total across the 3 billion base pair diploid genome. Building on research first published in 2013, Phase Genomics has since demonstrated the success of the Proximo Hi-C scaffolding method in the genomes of plants, animals, fungi and more.
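For readers curious how a gap count like the one above can be tallied, scaffold sequences commonly mark gaps as runs of the letter N. The sketch below counts such runs in a toy sequence. The sequence and the `count_gaps` helper are hypothetical, and this is not how PBJelly itself works; it only shows what “a gap” looks like in the sequence data.

```python
import re


def count_gaps(sequence):
    """Count gaps, where each gap is one uninterrupted run of N (or n) characters."""
    return len(re.findall(r"[Nn]+", sequence))


def gap_lengths(sequence):
    """Return the length of each gap, in the order the gaps appear."""
    return [len(run) for run in re.findall(r"[Nn]+", sequence)]


# Hypothetical scaffold fragment containing two gaps.
scaffold = "ACGTACGTNNNNNNACGTTTGANNACGT"

print("gaps:", count_gaps(scaffold))
print("gap lengths:", gap_lengths(scaffold))
```

Summing `count_gaps` over every scaffold in an assembly gives a total comparable in kind to the figure reported in the text.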

 

Papadum’s genome marks the beginning of an era where reference-quality genomes are achievable and affordable for any organism, not just extensively studied model organisms like mice, fruit flies, and humans. The availability of these extraordinarily complete genomes enables scientists to answer many new biological questions that have the potential to help farmers, government agencies, agricultural companies, and developing countries solve a significant part of the food security problem.

 

Read more about the grant, the scientists, and Papadum’s genome on the NIH’s National Human Genome Research Institute website.