Tag: Genome Assembly

Catching Evolution in the Act


 

Genome sequencing has confirmed some long-held theories about the blueprints of life. But it has also unearthed quite a few surprises. Scientists once hypothesized that the human genome consisted of upward of 100,000 genes. The 13-year Human Genome Project, along with many next-generation sequencing studies, has prompted the downward revision of that figure to a relatively spartan 20,000 genes, more or less.

 

Evolution in action

 

If there is a lesson in this vast overestimation of our gene count, it is perhaps that evolution shapes genomes in unexpected ways.

 

The advent of more nimble and lithe methods for genome assembly and analysis holds the promise to unearth the surprises that evolution has wrought. These relatively new advancements include tools like Phase Genomics’ ultra-long-range sequencing, which reconstructs the sequence of chromosomes by using positional relationships between DNA sequences in the genome. These methods have grown sufficiently sophisticated to catch the quick transitions that transform populations and species.

 

Recently a team led by Dr. Leonid Kruglyak at UCLA employed these tools to catch evolution at work. Their discovery relates to sex determination, a complex developmental process that, in animals, generally kicks off when an immature gonad develops into either testes or ovaries. In humans and many animals, sex determination is governed largely by genes, and in turn shapes their genomes and evolutionary trajectories like few other biological processes can.

 

That special pair

 

For species with full genetic control over sex determination, the process often leaves its imprint on the genome in the form of sex chromosomes. In most animals, genomes consist of pairs of chromosomes called autosomes. But in addition to those autosomes, many animals (including us) harbor another set of chromosomes called the sex chromosomes. Sex chromosomes govern, or at least try to govern, whether the gonads develop into ovaries or testes, which in turn influences the development of genitals and secondary sex characteristics.

 

Scientists have long theorized that sex chromosomes evolve from autosomes. Studies of young, relatively new sex chromosome systems, like those in the medaka, indicate that the transition happens fast. Yet the steps that transform a pair of autosomes into sex chromosomes are at best murky, with many questions unresolved. Much could be answered by catching this transition from autosome to sex chromosome in the act.

 

Behind the curtain

In a paper published June 1 in Nature, Dr. Kruglyak and his colleagues announced that they have found just such a transition: an animal with a pair of autosomes that is beginning to act like sex chromosomes. The researchers utilized Phase Genomics’ Proximo™ genome scaffolding platform and PacBio long reads to sequence and assemble a highly complete genome for a microscopic, freshwater flatworm, Schmidtea mediterranea. In many parts of its natural habitat across the Mediterranean basin, S. mediterranea reproduces by budding, without the need for sex. But some populations in Corsica and Sardinia produce the next generation through sexual reproduction.

 

The team, including lead and co-corresponding author Dr. Longhua Guo at UCLA, discovered that in these sexual strains of S. mediterranea, one pair of autosomes shows evidence of almost no genetic exchange, also known as recombination, during reproduction. This is a telltale signature of sex chromosomes. In addition, they saw that the unusual pair of autosomes harbors a large contingent of genes that play a role in developing sex-specific characteristics. Taken together, these genomic data point to these autosomes as a “sex-primed” pair that is in the process of evolving into fully fledged sex chromosomes.

 

Photo finishes

 

Future studies of S. mediterranea’s nascent sex chromosomes will likely fuel fresh inquiry and debate about this rarely seen evolutionary transition. The answers will stretch far beyond flatworms. Studies of other recently evolved systems, such as in stickleback fish, show that sex chromosomes can play a decisive role in other poorly understood evolutionary transitions, such as the rise of a new species.

 

Beyond sex chromosomes, this study demonstrates the raw interrogative power of modern genome assembly and analysis methods. They can capture transitions — even the most brief and ephemeral. Applied appropriately, methods like these can help scientists make sense of a myriad of messy, complex processes that evolution shapes. These include some issues that hit as close to home as gonads, from curbing the spread of antibiotic resistance to protecting pollinators from annihilation. Evolution moves quickly. Now, so can we.

 

Better together: long-range and long-read DNA sequencing methods, combined, reach record heights in microbiome discovery


 


 

Since its debut, next-generation sequencing has not rested on its laurels. Improved sequencing platforms have reduced error and lengthened reads into the tens of thousands of bases. The debut of long-range sequencing methods that are based on proximity ligation (aka Hi-C) has brought a new order of magnitude into reach by linking DNA strands with their neighbors before sequencing.

 

This progress has birthed high-resolution metagenomics, the sequencing and assembly of genomes from environmental samples to study ecosystem dynamics. But metagenomic experiments often undersample microbial diversity, missing rare residents, overlooking closely related organisms (like bacterial strains), losing rich genetic data (like metabolite gene clusters), and ignoring host-viral or host-plasmid interactions.

 

A revolution within a revolution

 

New sequencing platforms and methods can reform metagenomics from within. Long-read platforms, such as the PacBio® Sequel® IIe system, now yield HiFi reads of up to 15,000 base pairs with error rates below 1%. In addition, Phase Genomics created ProxiMeta™ kits to generate proximity-ligated long-range sequencing libraries, which preserve associations between DNA strands originating in the same cell.

 

In a study posted May 4 to bioRxiv, a team — led by Dr. Timothy Smith and Dr. Derek Bickhart at the U.S. Department of Agriculture and Dr. Pavel Pevzner at the University of California, San Diego — employed both PacBio HiFi sequencing and ProxiMeta in a deep sequencing experiment to uncover record levels of microbial diversity from a fecal sample of a Katahdin lamb. Combined, PacBio HiFi sequencing and ProxiMeta eased assembly, recovered rare microbes, resolved hundreds of strains and haplotypes, and preserved hundreds of plasmid and viral interactions.

 

HiFi family trees

 

The team constructed SMRTbell® libraries to generate HiFi data and used ProxiMeta kits to generate long-range libraries. The two datasets, along with the metaFlye and ProxiMeta algorithms, allowed them to assemble contigs and create draft genomes without manual curation.

 

Researchers compared the breadth and depth of HiFi data-derived metagenome-assembled genomes, or MAGs, to control MAGs from assemblies of the same sample made using long, error-prone reads. HiFi data yielded more complete MAGs — 428 versus 335 — from more bacteria and archaea. HiFi data also generated more low-prevalence MAGs, capturing a larger slice of the community’s diversity by picking up more genomes from less common residents.

 

The HiFi MAGs also contained more than 1,400 complete and 350 partial sets of gene clusters for synthesizing metabolites such as proteasome inhibitors, which likely help some of these microbes colonize the gut. HiFi data picked up about 40% more of such clusters than control MAGs, illustrating just how much data is lost when long reads aren’t also highly accurate reads.

 

The team also used the HiFi MAGs to trace lineages within the community. They computationally resolved 220 MAGs into strain haplotypes, based largely on variations within single-copy genes. One MAG had 25 different haplotypes, which are likely strains of the same genus or species.

 

ProxiMeta’s long-range discoveries

 

The ProxiMeta-generated libraries added flesh to these MAG skeletons by unveiling additional rich biological information. Long-range sequencing linked nearly 300 HiFi-assembled plasmids to specific MAGs, revealing the species that hosted them. One plasmid, for example, was found in bacteria from 13 different genera. Long-range data also identified the first plasmids associated with two archaea, Methanobrevibacter and Methanosphaera.
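To make the plasmid-to-host idea concrete, here is a minimal Python sketch of how proximity-ligation links might be tallied to suggest hosts for a plasmid. It is an illustration under stated assumptions, not the ProxiMeta algorithm: the function name, the input layout, the contig and MAG names, and the `min_links` cutoff are all hypothetical, and a real analysis would also normalize for contig length, abundance, and restriction-site density.

```python
from collections import defaultdict


def assign_plasmid_hosts(links, contig_to_mag, plasmids, min_links=5):
    """Tally Hi-C links between plasmid contigs and binned contigs.

    links: iterable of (contig_a, contig_b, n_read_pairs) tuples (hypothetical format).
    contig_to_mag: dict mapping chromosomal contigs to their MAG.
    plasmids: set of contig names classified as plasmids.
    Returns {plasmid: {mag: link_count}}, keeping MAGs with >= min_links links.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for a, b, n in links:
        # A link is undirected, so check both orientations.
        for plasmid, other in ((a, b), (b, a)):
            if plasmid in plasmids and other in contig_to_mag:
                counts[plasmid][contig_to_mag[other]] += n
    return {
        plasmid: {mag: n for mag, n in mags.items() if n >= min_links}
        for plasmid, mags in counts.items()
    }


# Toy data, invented for illustration only.
links = [
    ("plasmid_1", "contig_a", 12),
    ("contig_b", "plasmid_1", 9),
    ("plasmid_1", "contig_c", 2),  # weak signal, dropped by the cutoff
    ("plasmid_2", "contig_c", 7),
]
contig_to_mag = {"contig_a": "MAG_001", "contig_b": "MAG_001", "contig_c": "MAG_002"}
hosts = assign_plasmid_hosts(links, contig_to_mag, {"plasmid_1", "plasmid_2"})
print(hosts)
```

Because links only form between DNA molecules that shared a cell, a plasmid that passes the cutoff for several MAGs would be a candidate multi-host plasmid, like the one reported in 13 genera.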

 

Long-range sequencing illuminated the viral burden in this community. The HiFi library included nearly 400 viral contigs, more than half of which came from a single family of viruses that infect both bacteria and archaea. The team identified 424 unique viral-host interactions, including 60 between viruses and archaea, which is a more than 7-fold increase over controls.

 

What’s around the bend?

 

This study has lessons beyond one lamb’s gastrointestinal tract. It shows decisively that the highly accurate long reads generated by HiFi sequencing are ideal partners for Hi-C-derived methods like ProxiMeta. Together, they generate increasingly sophisticated metagenome assemblies for biologists to interrogate.

 

Applied to other environmental samples, this platform could illuminate the diversity and complexity of other microbial communities, from the bottom of the sea to mountain peaks and within the stomach of every human being. It could probe pressing issues of our day, such as antibiotic resistance, soil health, or how microbes can break down pollutants. These endeavors will not just fuel the engines of scientific inquiry; broader use of this method could also generate new insights into some of the most pressing problems of our times.

New genome assembly method makes fruitful advances in genomic technology

 

A collaboration between Phase Genomics and Pacific Biosciences of California is bringing about the next generation of genome assembly technology. A newly published software tool, FALCON-Phase, combines genomic proximity ligation methods developed by Phase Genomics™ with high-accuracy, long-read sequencing data from PacBio®, enabling researchers to create haplotype-resolved genome sequences on a chromosomal scale without parental genome data. This method and its application to several animal genomes were published today in Nature Communications.


Humans, as well as other animals, carry DNA sequence copies from both parents. These parental sequence “haplotypes” can carry millions of mutations unique to one of the parents and are often very relevant to diseases and other genetic traits. Until recently, accurately separating paternal and maternal mutations on the whole-genome scale required sequence information from the individual parents or extensive efforts that relied heavily on imputation from population studies. The new method employs the physical proximity information captured by proximity ligation (a technology also known as “Hi-C”) to separate maternal and paternal haplotype information from long-read genome assemblies. This development significantly increases the actionable information content coming out of genome sequencing studies.
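The intuition behind using proximity information for phasing can be sketched in a few lines of Python. This is a toy greedy scheme written for illustration, not the algorithm FALCON-Phase implements; the function name, the `link_counts` layout, and the numbers are all hypothetical. It relies on one assumption: Hi-C read pairs join sequences on the same physical chromosome copy far more often than sequences on the two different copies.

```python
def phase_blocks(link_counts, n_blocks):
    """Greedily orient phase blocks along one chromosome.

    link_counts[(i, hi, j, hj)] is the number of Hi-C read pairs joining
    haplotig hi (0 or 1) of block i to haplotig hj (0 or 1) of block j.
    Returns one flip flag per block; 1 means that block's two haplotigs
    should be swapped so that index 0 tracks the same parent throughout.
    """
    flips = [0]  # the first block defines the reference orientation
    for j in range(1, n_blocks):
        keep = swap = 0
        for i in range(j):
            for hi in (0, 1):
                for hj in (0, 1):
                    n = link_counts.get((i, hi, j, hj), 0)
                    # Parental side of haplotig hi after block i's flip.
                    side_i = hi ^ flips[i]
                    if side_i == hj:
                        keep += n  # evidence for leaving block j as is
                    else:
                        swap += n  # evidence for swapping block j
        flips.append(1 if swap > keep else 0)
    return flips


# Toy counts, invented for illustration: block 1 is assembled "upside down".
link_counts = {
    (0, 0, 1, 1): 30, (0, 1, 1, 0): 28, (0, 0, 1, 0): 3, (0, 1, 1, 1): 4,
    (1, 0, 2, 1): 25, (1, 1, 2, 0): 22, (1, 0, 2, 0): 2, (1, 1, 2, 1): 1,
}
flips = phase_blocks(link_counts, 3)
print(flips)
```

In this toy case only the middle block needs swapping. Note that the result separates the two parental haplotypes from each other but does not, by itself, say which one is maternal and which is paternal.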

 

 

“It’s an exciting time for genome assembly and PacBio HiFi sequencing continues to lead the way in this area with its powerful combination of read length and accuracy,” wrote Jonas Korlach, Chief Scientific Officer at Pacific Biosciences. “Phase Genomics Hi-C complements PacBio technology by extending our data into the ultra-long-range domain, enabling us to connect phase blocks and deliver chromosome-scale diploid assemblies without parental data. We are fortunate to have this excellent partnership with Phase Genomics, and we look forward to continuing to work together to create the highest quality reference genomes available.”

 

Assembling two fully-phased genomes in a single, streamlined process not only saves on the costs of research, but it also enables scientists to upgrade their genome assembly pipelines and obtain previously unobtainable information.

 

Dr. Erich Jarvis, professor at Rockefeller University and chair of the international Vertebrate Genomes Project, wrote, “Chromosome-scale haplotype phasing is critical for generating accurate genome assemblies and for understanding genomic variation within a species.” Furthermore, FALCON-Phase produces maternal and paternal haplotypes without family-trio data, so it can be applied to wild-caught samples or organisms lacking pedigree information. Jarvis notes, “In wild populations that many work with, parental samples are usually unavailable and therefore we need a method that can phase paternal and maternal sequences in the offspring individuals. With FALCON-Phase, we are able to use the Hi-C data that we have already generated for genome scaffolding and add a new dimension to every genome assembly, even retrospectively for previous projects. Our collaboration with Phase Genomics and PacBio has been extremely fruitful and the combination of the two technologies through FALCON-Phase will be highly beneficial to genomic sequencing efforts focused on conservation.”

 

FALCON-Phase is applicable to any diploid genome, including plants, animals, and fungi. It is available free of charge as open-source software (https://github.com/phasegenomics/FALCON-Phase), and Phase Genomics offers services that include the application of this method to various genome projects. See the latest news and publications on this and other genome assembly methods at https://phasegenomics.com/resources-and-support/publications/.

 

For more information, email us at info@phasegenomics.com.

Breaking the Mold: New Tech Sheds Light on 5 Mysteries of the Fungal World

 

This month Phase Genomics is celebrating #FungusFebruary by highlighting some of the unique capabilities of our Hi-C technology to solve age-old mysteries in the world of fungal genetics and deliver new potential for researchers to understand fungi, all while helping solve global crop crises and develop new groundbreaking pharmaceuticals.

While we wield the power of genomics to explore the wonders of fungi today, a few centuries ago people dismissed them as just weird plants. Eventually microscopes and anatomical studies revealed fungi as a distinct flavor of life — some varieties quite tasty — but educational experts today continue to bemoan the lack of lessons on fungi in biology curricula, and research on fungi — even those that cause disease — lags.

As a result, scientists lack much basic information on the genetics, life cycles, and reproductive habits of many fungi — even though members of this kingdom could help address a bevy of challenges in food and energy production, illuminate the evolution of complex life and even shelter us on Mars.

Genome studies on fungi of all stripes can resolve evolutionary relationships and ecosystem dynamics, identify metabolites of commercial and medical interest and — for fungi that cause disease — reveal biochemical and genetic targets to help us fight pathogenicity.

Like their animal and plant cousins, fungal genomes also have their challenging parts, including repeats, duplications and structural elements that complicate both sequencing and assembly. Recently, the chromosome conformation method “Hi-C” and advances in next-generation sequencing have helped untangle some of these sticky genomic knots, and show promise in taming genomes across this diverse and neglected kingdom of life.

 

        1. High-resolution mapping of centromeres

Hi-C’s power lies in its ability to identify regions of the genome that reside in close proximity to one another in the nucleus, information that essentially captures the 3D organization of the genome. But Hi-C doesn’t just identify where particular chromosomes reside within the nucleus. It can also help identify functional elements in genomes that are difficult to identify in other ways.

That is what two groups of researchers (from the Pasteur Institute and the University of Washington) did when they used Hi-C to track down two kinds of functional elements in yeast genomes: centromeres and rDNA clusters. Both are typically repeat-rich and difficult to identify without laborious experiments involving functional assays or mapping the binding sites of rare centromere proteins. In fungal species, centromeres are held tightly together at the spindle pole body, and the teams used this shared proximity to identify centromere locations in the genomes of numerous yeasts (and subsequently other fungi), despite not knowing their centromeric DNA sequence. Ribosomal DNA clusters similarly congregate in yeast nuclei, which one team exploited to identify their positions in Debaryomyces hansenii.
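As a rough illustration of the centromere-clustering idea, the Python sketch below picks, for each chromosome, the genomic bin with the most contacts to other chromosomes. It is a deliberately simplified stand-in for the published approaches, which fit statistical models to the contact signal; the function name, the binned input format, and the contact counts are hypothetical.

```python
from collections import defaultdict


def candidate_centromere_bins(trans_contacts):
    """Pick the bin with the most inter-chromosomal contacts per chromosome.

    trans_contacts maps (chrom_a, bin_a, chrom_b, bin_b) to a contact count.
    Because centromeres cluster together in the nucleus, bins that contain
    them should contact other chromosomes unusually often.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for (chrom_a, bin_a, chrom_b, bin_b), n in trans_contacts.items():
        if chrom_a == chrom_b:
            continue  # contacts within a chromosome carry no clustering signal
        totals[chrom_a][bin_a] += n
        totals[chrom_b][bin_b] += n
    return {chrom: max(bins, key=bins.get) for chrom, bins in totals.items()}


# Toy contact counts, invented for illustration.
trans_contacts = {
    ("chr1", 2, "chr2", 5): 40,
    ("chr1", 2, "chr3", 1): 35,
    ("chr2", 5, "chr3", 1): 38,
    ("chr1", 0, "chr2", 1): 3,
    ("chr1", 4, "chr3", 6): 2,
    ("chr2", 8, "chr3", 3): 4,
}
centromeres = candidate_centromere_bins(trans_contacts)
print(centromeres)
```

The appeal of the approach is that it needs no prior knowledge of the centromeric sequence itself, only the contact map.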

 

        2. High-quality genomes illuminate biochemical pathways

Fungi harbor a wide array of genes for synthesizing secondary metabolites, which range from harmful toxins to helpful pharmaceuticals. In fungi, genes for synthesizing secondary metabolites tend to occur in clusters, which are also thought to be sites of rapid evolution.

Phase Genomics worked with a University of Minnesota-led team and used Hi-C to generate high-quality genomes of six strains of Tolypocladium inflatum, an insect pathogen that has already given us the immunosuppressant drug cyclosporin. The new assemblies revealed major differences in secondary metabolite production between T. inflatum strains, including novel clusters, transpositions and clusters that may be involved in toxin synthesis. The bevy of discoveries from these assemblies showed how recombination can drive significant divergence even within a single species — and how important it is to build multiple high-quality genome assemblies that can capture that diversity.

 

        3. Fungal dikaryons and the hidden nuclear dance

The genetic differences between strains also apply to pathogenic fungi, like the stem rust, which parasitizes wheat. Phase Genomics partnered with a team led by scientists at CSIRO in Australia to apply Hi-C to stem rust, including the particularly deadly strain Ug99. Like those of many fungi, stem rust genomes are divided between two haploid nuclei. The team used Hi-C data to assemble complete haplotypes for both haploid genomes of the two strains they studied, and discovered that Ug99, a recent arrival that is decimating whole fields of wheat in Africa, has an unexpected origin: The strain arose through “somatic hybridization,” when hyphae from two strains exchange haploid nuclei. This may explain the strain’s sudden rise and deadly wake, and gives scientists new genomic information to understand Ug99’s virulence and identify weaknesses that could give wheat a leg up.

 

        4. Hybrids, beer, and fungal metagenomics

The ability to separate two nuclei from within the same cell can be extended to more complex samples. Yeasts, which are integral players in brewing, will often hybridize to form new species containing genomes from two organisms at once (the famous lager-producing yeast Saccharomyces carlsbergensis is one example of such a hybrid). But in a mixed microbial community, such as beer, wine, or a microbiome sample, how can DNA sequencing detect which genomes co-exist within the same cell? One special power of Hi-C is that it traps sequences that are within touching distance of each other, and therefore must come from inside the same cell. The Dunham lab at the University of Washington used this property to analyze an open-fermentation beer from a local brewery. The exciting result was that they were able to discover a new hybrid yeast, later named Pichia apotheca, using Hi-C data to identify it as a hybrid bearing two genomes from related organisms. This new hybrid species has since been used by home-brewers to ply their craft and gives beer a unique flavor.

 

        5. The Epigenetics of Symbiosis

Nature has plenty of examples of plants and fungi getting along. One of them is Epichloë festucae, a filamentous fungus that has evolved a symbiotic relationship with certain grass species. When Phase Genomics worked with a Massey University-led team, they discovered that E. festucae’s genome carries hallmarks of this symbiosis. The analysis of Hi-C data revealed that important genes are clustered into blocks separated by repeat-rich regions. Hi-C and RNA-seq data together showed that genes within the blocks have similar expression patterns — indicating that genes needed for symbiosis with their grass hosts tend to cluster together in the same blocks.

 

Looking Forward

Cutting-edge genomic technologies like Hi-C have the potential to keep making up for lost time and reveal even more intimate details of the hidden lives of fungi. This #FungusFebruary, it’s worth asking: What other mysteries about this long-overlooked kingdom are worth solving?

Phase Genomics Transformative Genome Phasing Tool (FALCON-Phase) Now Compatible with Nanopore Sequencing

Nanopore and Hi-C produce a new fully-phased, chromosome-scale genome for the red raspberry.

On October 22, scientists at KeyGene revealed the first fully-phased, chromosome-scale reference genome for the red raspberry, sequenced with Oxford Nanopore long-read technology and scaffolded and phased into full chromosomes using Phase Genomics’ Proximo™ Hi-C method.  

Assembling complex plant genomes used to be considered nearly impossible, as they can be extremely large, polyploid, and contain highly repetitive regions. Long-read sequencing generates genomic data spanning very long regions, but the resulting sequences still need to be scaffolded, or “put together,” into chromosomes. Proximo Hi-C not only helps guide the assembly to produce chromosome-level scaffolds but can also tell which sequences and mutations come from the maternal and paternal chromosome copies (this is called phasing). Our phasing method, FALCON-Phase, was originally released in 2018 and was used in conjunction with the Proximo pipeline to generate this “platinum level” raspberry genome.

Read more about the assembly and future directions for the project here.

The Highest-Quality Genomes: Q&A on Cannabis Genomics

 

Co-author Kevin McKernan of Medicinal Genomics talks more about the past, present, and future of cannabis genomic research. Read more about his newly published cannabis genome assembly project using Proximo Hi-C scaffolding featured in The Genetic Literacy Project.

 

What is the difference between hemp and marijuana? How can we use genomics to answer this question?

 

McKernan: The legal definition of hemp is any Cannabis sativa that has less than 0.3 percent THC acid, or THCA. Historically, hemp has been grown for fiber and the exceptional nutritional content of its seed. THCA expression is genetically controlled at what has been historically referred to as the Bt:Bd allele. Next-generation sequencing technologies are giving us our first glimpse of this complicated locus.

 

Why are you interested in assembling the Cannabis genome? What are you hoping to accomplish?

 

McKernan: A refined genome assembly will enable molecular breeding programs to deploy marker-assisted selection for yield, flowering time, pest resistance and rare cannabinoid expression. It will likely shed light on the heritability of hermaphroditism and apomixis. A clearer picture of the genes involved in cannabinoid and terpenoid expression will enable more intelligent breeding and synthetic biology programs.

 

Which genes are responsible for cannabidiolic acid production and how do these genes vary between the cultivars?

 

McKernan: The Cannabis plant makes 113 different cannabinoids. There are three well-understood cannabinoid synthesis genes. These highly similar genes all compete for a common precursor molecule. Mutations in these genes affect gross cannabinoid expression. A more refined reference may enlighten us to the genetic variants that can more accurately estimate THCA levels to segregate hemp and drug-type seed stocks.

 

What other hidden gems did you find in the Cannabis genome after you finished the assembly?

 

McKernan: The most exciting picture is the 2.1Mb CBCAS (cannabichromenic acid synthase) gene cluster seen in the Jamaican Lion assembly. This has 9 tandem copies of CBCAS all directionally orientated that are 99.4-99.9 percent identical and separated by 30-80kb long terminal repeats. This region has been an assembly knot for over seven years and I think the only reason it is visible to us today is due to novel sequencing tools we didn’t have in 2011.

 

Why is the Cannabis genome so difficult to assemble? Are there unique genomic features (i.e. copy number variants, special repeat classes, segmental duplications) that are especially troublesome?

McKernan: Its 1.07Gb genome consists of 10 chromosomes, with 73 percent repeat, 66 percent AT and 0.5-1 percent polymorphic. The genes that contribute to chemotype are under the most selective pressure and have hijacked long terminal repeats to enable gene expansions. We had suspicions of this back in 2011 but could never assemble the region to prove it.

 

Why was it important to obtain chromosomes for your assembly? How did Hi-C help?

 

McKernan: The Pacific Biosciences assembly delivered us an assembly that was an amazing leap forward from the Illumina assemblies, but it is not chromosomal in scale. Hi-C has helped to organize these contigs into chromosomes and it can do this without having to make linkage maps.

 

What did you find to be most useful in working with Phase Genomics?

 

McKernan: Hi-C is very complementary to PacBio sequence data and is the only technology that delivers long range information without having to make high molecular weight DNA. This is very important in Cannabis as it is difficult to get high molecular weight DNA out of the plant.

 

What would you like other researchers, breeders or regulators to take away from your high-quality genome assembly? How do you think this genome assembly will be utilized in the future?

 

McKernan: We also need dozens of genomes sequenced to the quality level of Jamaican Lion to get a full picture of these complex cannabinoid loci. We need Hi-C libraries to better understand the microbiome of the plant, so we can more intelligently manage pathogenic threats that affect yield. Many endofungal bacteria like Ralstonia are found in metagenomic sequencing studies in Cannabis flowers and can be a risk to consumers and negatively impact plant yield. Ralstonia is also notorious for contaminating many metagenomic studies due to contamination in library construction kits. We suspect Hi-C will play important roles in segregating live versus dead DNA and resolving these contamination problems.

 

What regulatory challenges do you run into when working on Cannabis genomics?

 

McKernan: The biggest issue at the moment is that the movement of tissue, other than sterilized stalk, is currently federally prohibited in the U.S. This makes RNA studies very challenging as RNA isolation has to be performed in the field. Movement of DNA or cross-linked chromatin is legal, so this is a compelling case for the use of Hi-C in the Cannabis field (insert Hi-C pun here). Phase Genomics’ kits were critical, as shipping certain tissues is restricted. U.S. federal funding also remains restricted. We turned to the Dash Distributed Autonomous Organization for funding to rapidly sequence and publish the genome. We applied for funds in May of 2018 and had the first assembly public on August 2. This is a very generous contribution by Dash because any U.S. university that attempts to handle the plant places their federal funding at risk.

 

What genomic evidence suggests that Cannabis has been selectively bred by humans?

 

McKernan: I think the elevated THCA levels witnessed since prohibition — combined with the long terminal repeat-driven expansion of the synthase genes — is the best evidence we have.

 

What is your favorite fact and what is your least favorite misconception about Cannabis?

 

McKernan: My favorite thought experiment regarding the rapid reproduction of Cannabis is that its genome is very likely spreading through space and time more quickly than the human genome, and it evokes much of David Sinclair’s work on Xenohormesis. My least favorite misconception is the false dichotomy of medical versus recreational cannabis consumption. I think this showcases our reactionary health-care mindset as opposed to the preventative mindset we need to strive for. If you disregard recreational use, you are likely going to require more medical use. These compounds have been in our diet for thousands of years. We now know mutations in human endocannabinoid system-related genes are associated with neurological phenotypes and a large class of idiosyncratic diseases are now being recognized as clinical endocannabinoid deficiency (CED). It was incredibly naïve and destructive to remove cannabinoids from the American diet in 1937.

 

What do you think the future holds for the cannabis industry?

 

McKernan: In states that legalize cannabis, there is a 15 percent reduction in alcohol consumption, a 25 percent reduction in opiate overdoses, a 17 percent decrease in Medicare opiate usage and a 25 percent reduction in general pharmaceutical use. There is a 10 percent reduction in suicide and a 72 percent reduction in PTSD nightmares. The benefits to epilepsy have survived FDA scrutiny. This is the most disruptive market force we have seen in healthcare since the internet and next-generation sequencing. We are now just witnessing the alcohol industry take multi-billion dollar positions in the cannabis industry. It is only a matter of time before the pharmaceutical industry begins to hedge their losses as well. I am betting against the endocannabinoid mimetic known as acetaminophen and in favor of the less-toxic phytocannabinoids like cannabidiol.

 

 

About Phase Genomics

Seattle-based Phase Genomics offers research services and kits based on its Hi-C and proximity-ligation technologies, which enable chromosome-scale genome assembly, metagenomic deconvolution, and the analysis of structural genomic variation and genome architecture. Its Hi-C genomics tools support genome scaffolding and phasing. Learn more about Proximo and bring the power of Hi-C into your lab today by purchasing one of our Hi-C kits.

How it Works: Proximo Hi-C Genome Scaffolding

Q&A with Co-Authors About Bees, Mites, and Their Genomes

Co-authors Dr. Alexander Mikheyev of the Okinawa Institute of Science and Technology and Dr. Jay Evans from the U.S. Department of Agriculture’s Bee Research Laboratory had such great answers that we wanted to share some of them. This research was also featured in ZME Science.

Why is it important and useful to have a high-quality genome for Varroa species? Is there any combined value with the recently published bee genome?

Dr. Mikheyev: Understanding the mechanisms of parasitism requires detailed information about the organization of the genome. Many recent ideas for fighting Varroa rely on molecular tools, which in turn rely on genomic data. Furthermore, a good genome enables us to understand the coevolutionary interactions between mites and the bees. For now, our studies are focused on understanding how the mite has evolved to become a better parasite. However, my lab is also looking at the bee side of the coevolutionary interaction. Having high-quality genomes for both will allow us to identify genomic regions and genes involved in coevolution.

Why did you choose to use Hi-C? Why did you need chromosomes for your genome assembly?

Dr. Evans: From prior genome efforts, we had no information on the physical positions of mite gene features. Now with these in place, we can leverage synteny information from other arthropod genomes and narrow searches for some hard-to-find proteins like olfactory receptors, which often occur in clusters. Generally, the improved genome helps us know what might be unique to Varroa — and therefore a novel clue into their biology and control.

Dr. Mikheyev: One element of this study was to look at patterns of gene duplication, which could indicate diversification of particular gene families. Having a contiguous genome allows us to better localize these duplications and confirm that the different copies are homologous. In the future, when we’ll be looking at signatures of selection, a really powerful approach is to identify genomic regions with reduced genetic diversity. Having adequate chromosomal scaffolding will be essential there.
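
As a rough illustration of the kind of scan Dr. Mikheyev mentions, the sketch below computes nucleotide diversity (average pairwise differences per site) in windows along a set of aligned sequences and flags windows where diversity is unusually low. This is not the authors' pipeline. The function names, the toy alignment, and the cutoff of half the mean diversity are all assumptions made for illustration, and a real analysis would work from population-scale variant calls placed on chromosome-level scaffolds.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for aligned, equal-length sequences."""
    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        total_diffs += sum(1 for x, y in zip(a, b) if x != y)
        n_pairs += 1
    return total_diffs / (n_pairs * len(seqs[0]))


def windowed_diversity(seqs, window, step):
    """Return a list of (window_start, diversity) for each full window."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, nucleotide_diversity(chunk)))
    return results


def low_diversity_windows(seqs, window, step, fraction=0.5):
    """Flag windows whose diversity is below `fraction` of the mean (arbitrary cutoff)."""
    scan = windowed_diversity(seqs, window, step)
    if not scan:
        return []
    mean_pi = sum(pi for _, pi in scan) / len(scan)
    return [start for start, pi in scan if pi < fraction * mean_pi]


# Hypothetical toy alignment: four samples, twelve sites.
toy_alignment = [
    "ACGTACGTACGT",
    "ACGTACGTTCGA",
    "ACGAACGTACGT",
    "TCGTACGTAGGT",
]

if __name__ == "__main__":
    print(windowed_diversity(toy_alignment, window=4, step=4))
    print(low_diversity_windows(toy_alignment, window=4, step=4))
```

In the toy data, the middle window is identical across all four samples, so it is the one flagged. With fragmented scaffolds, neighboring windows cannot be placed next to each other with confidence, which is one reason chromosome-scale assembly matters for this kind of scan.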

What genomic clues were found in the two Varroa species that may contribute to parasitism?

Dr. Evans: We found a clear set of genes for the proteins — olfactory receptors and others — that these mites must be using to react to their bee hosts. Hopefully, knowing these proteins will lead to smarter controls and insights into why each species maintains a specific host preference.

Dr. Mikheyev: For us, the most striking finding is this: The evolutionary trajectories of both mites, despite their similarities and close relatedness, were completely dissimilar. At this stage, it is still a bit hard to tell specifically what the selective pressures were and what the mites are adapting to. Curiously, in both species, genes involved in stress tolerance and detoxification were already under selection. This most likely happened before they ever faced miticides and suggests that they may have been pre-adapted to deal with our chemical warfare against them. We hope to tackle this in an upcoming study looking at population-level differences between mites adapted to original and novel hosts.

How do you hope these genomes will be used to help save honey bees?

Dr. Evans: Prior genome drafts had enough gaps that we missed candidate proteins for mite control. These mite genomes will lead to focused efforts to target pathways or traits not found in bees by techniques like small molecules, biological controls, and RNA interference.

Dr. Mikheyev: They can be used to develop new strategies for Varroa control. Also, in upcoming studies looking at how mite populations are adapted to original vs. switched hosts, we hope to identify genes and genomic regions that are specifically important in host switches.

Is there any genomic evidence that the western honeybee could be developing resistance to these pests?

Dr. Evans: Yes. Some bee breeders are targeting these traits, from behaviors to virus resistance. A recent, improved assembly of the honey bee genome — aided in part by Hi-C sequencing — is being used for trait identification and marker-assisted breeding right now.

Dr. Mikheyev: They most definitely are. Intriguingly, wild populations of honey bees seem to evolve tolerance to the mites relatively quickly. In one of my favorite studies, a USDA-monitored population in Louisiana first saw high mortality upon the arrival of Varroa, but a few years later colonies lived even longer than before. There are resistant populations known in the U.S. and in Europe, and resistance is a trait that can be selected. How this adaptation takes place in the bee is really interesting, and something we’ll continue to look into.

Isolating Varroa mites from bees involves a creative use of powdered sugar. How do you think this technique came about?

Dr. Mikheyev: We don’t know. The papers describing this method are pretty prosaic. It seems that in the late 1980s, wheat flour was used to control Varroa by knocking them off the bees — and eventually, someone tried sugar.

Dr. Evans: Since the mites are attached to their bee hosts, researchers have used a variety of ‘irritants’ to get them to fall off. Powdered sugar is safe for the bees and might even be an extra calorie boost. The bees pull sugar from each other and the mites fall off, mostly because of the sugar itself, but also because the grooming bees find them.

What is your favorite weird food that involves honey?

Dr. Mikheyev: It’s not really a food since it is honey, but I love the fact that the giant honey bees of Nepal make psychedelic honey from Rhododendron flowers. The story is worth tracking down if for no other reason than the dramatic photos of the men who harvest this honey from sheer cliffs.

Dr. Evans: Honey lemonade. Sorry, I am required by my kids to not say weird things.