
A Year in Review: 2024


 

Phase Genomics continues to pioneer genomic innovation, driving advancements in human health and life science research over another impactful year. Our efforts range from developing novel tools for detecting chromosomal abnormalities to managing the world’s most extensive phage-host interaction atlas, thereby accelerating genomic research and promoting a healthier future.

Our ultra-long-range sequencing technology is fostering advancements across a wide array of research applications, both at Phase Genomics and in laboratories worldwide. Using our microbial platform, we are driving transformative discovery in metagenomics and ecology while making strides in human health with cutting-edge approaches to antimicrobial research and oncology.

Thank you to our supporters, collaborators, and clients for their contributions to making this an outstanding year. Here are some key highlights from 2024.

 

Insights in oncology

This year, our cytogenetic platform uncovered novel, clinically relevant chromosomal aberrations critical for assessing patient care in oncology. Genomic Proximity Mapping™ (GPM) is an upgraded approach to cytogenetic analysis that meets and surpasses current risk stratification assessments. Over the summer, researchers at Fred Hutchinson Cancer Center and University of Washington Medical Center published research that used GPM to analyze 48 patient samples, identifying known and novel chromosomal aberrations. Read the medRxiv preprint to discover the expanding possibilities in leukemia research here »

 

Cow burps, super bugs, and our enemy’s enemy–phages

We are actively developing solutions to address the growing threat of antimicrobial resistance and, in the same stroke, advancing environmental health efforts with lysin discovery. By leveraging metagenomic data and AI, scientists can harness the evolutionary power of bacteriophages to target and eliminate harmful microbial pathogens with precision. Discover how we are turning the tables using our antimicrobial discovery platform with support from the Bill & Melinda Gates Foundation in our blog here »

 

A novel approach to vaccination

Phase Genomics’ metagenomic deconvolution technology helped crack the code on a potential new vaccine for farmed salmon to defend against sea lice by targeting the parasite’s microbiome. In an article published in The Economist, researcher Cristian Gallardo Escárate shares the results that led to this groundbreaking invention, which could ease the global environmental impacts of salmon farming. More here »

Diving into the data 

Two new data analysis tools were made available to ProxiMeta and CytoTerra platform users this year: ProxiMeta™ Explorer and CytoTerra® Curator.  

ProxiMeta™ Explorer is an interactive, cloud-based genome-resolved metagenomic analysis platform for data visualization and exploration. The platform provides fully customizable analyses and reports for tracking genomes across time, conditions, groups, and more with a click of a button. 

CytoTerra® Curator enables users to effortlessly review, revise, and generate reports from cytogenetic data – no prior bioinformatics experience required. From curating calls to constructing circos plots, Curator provides fast, accurate insights for human genomics and oncology research.

 

Tune in to this year’s podcasts

Listen to Phase Genomics CEO, Ivan Liachko, discuss the breadth of applications supported by Phase Genomics’ ultra-long-range sequencing technology in these podcast episodes. Discover the story behind commercializing and implementing biotech innovations and get a glimpse at where this technology is taking us. 

 

Looking Forward

In 2025, we aim to elevate our technology to new heights and broaden our impact across science and medicine. We hope you will follow us on our journey on X, LinkedIn, and BlueSky as we lead genomics innovation to an insightful and healthy future. 

 

Happy New Year from our team at Phase Genomics!

 

 

An Ancient Fungal Affair


 

New genomic technology reveals the parental past of “ancient asexuals,” paving a route to crop engineering and soil remediation with symbiotic fungi

 

In a warming, crowded world, we need more help than ever from plants. But maximizing the bounty from crops — from food to fuel to fibers — means coaxing plants to draw minerals and nutrients from soil more effectively, and paying special heed to the tiny, often-overlooked fungi that make this possible.

Plant roots have symbiotic relationships with fungi that stretch back eons. For example, arbuscular mycorrhizal fungi, or AMF, have been cozying up to plant roots for at least 400 million years. In exchange for carbon-rich lipids from their hosts, AMF — named for the branch-like structures their bodies form within plant roots — help host defenses against pathogens, deliver water and increase absorption of nutrients rich in nitrogen, potassium and phosphorus. They also boost plant diversity.

Thanks to this ecological generosity, AMF are used as crop stimulants and in soil remediation. Their lipid lust also makes them good at carbon sequestration. Theoretically, engineered AMF strains could mount an even better performance in these essential tasks. But scientists have long viewed certain features of AMF, particularly their genetic structure and life cycle, as evolutionary puzzles that must be solved to make strain engineering possible and build better symbionts.

Working with Phase Genomics, scientists at the University of Ottawa recently overcame this barrier, successfully sequencing the genomes of four strains of the most widely studied AMF species, Rhizophagus irregularis. Using Phase’s proximity-ligation sequencing technology, they showed for the first time that the genomes of AMF are simultaneously more straightforward and more surprising than many mycologists had dared to dream.

Armed with this knowledge, scientists can plan new approaches to engineer AMF strains for applications in biomass production, soil remediation — and beyond.

 

The mysterious kary carryall

For years, the more scientists looked at AMF, the more questions they had. AMF bodies are essentially bags of haploid nuclei — tens of thousands, all sharing a common cytoplasm. And that’s not all.

“There were many, many outstanding questions about AMF,” said Dr. Nicolas Corradi, leader of the University of Ottawa team. “This was primarily because these fungi are always multinucleated and lack observable sex. It was suggested that AMF have an ‘oddball’ genetics and evolution.”

They were assumed to be “ancient asexuals,” who must’ve somehow thrived without the gene-shuffling benefits of sexual reproduction.

Dr. Corradi and his colleagues were determined to find out if that’s the case, and in the process began to shatter AMF’s asexual reputation. In 2016, they showed that Rhizophagus irregularis strains harbor evidence of sexual reproduction, including finding some of the genes needed for it. In some strains, all nuclei were genetically identical. But other, more robust and resilient AMF strains — termed heterokaryons — harbored two distinct populations of nuclei in their cytoplasm. More recently, Dr. Corradi and his team reported that the two populations of nuclei in heterokaryons change in abundance, depending on their host plant.

“But these were, however, based on fragmented genome datasets,” said Dr. Corradi.

To know for sure what was going on in AMF heterokaryons, the team needed a method to sequence the complete genomes of both populations of nuclei, allowing more complex studies of gene expression, genetic exchange and evolution in these puzzling fungal packages.

 

Would you prefer carrots or chicory?

Working with Phase Genomics, Dr. Corradi and his team employed a combination of proximity ligation (Hi-C) and PacBio HiFi data to sequence the genomes of both nuclear populations in four Rhizophagus AMF heterokaryon strains. Surprisingly, all four strains harbored genomes largely similar in structure — 32 chromosomes, with clear delineations between gene-rich and gene-poor regions — but highly divergent in sequence. For all four strains, the two populations of nuclei were essentially haplotypes, derived from parental strains during prior sexual reproduction.

Equipped with eight complete genomes (two haplotypes for each of four strains), the team followed up with gene-expression analyses and discovered that each haplotype was transcriptionally active. But within an individual strain, haplotype gene expression patterns were not equal.

“AMF heterokaryons carry two haplotypes that physically separate among many thousands — potentially millions — of co-existing nuclei,” said Dr. Corradi. “This is unheard of in any other organism. But each ‘parental genome’ also regulates different biological functions, and these change depending on the plant host.”

They recorded at times dramatic shifts in haplotype abundance and expression depending on the AMF heterokaryon’s plant host — carrot versus chicory, for example. This suggests that each haplotype makes specific and unique contributions to the AMF heterokaryon’s phenotype. Future studies will have to tease out what role the plant host is playing, if any, in these shifting expression and abundance patterns.

 

Sex, but when? And more new mysteries

In assembling these long-sought genomes that co-exist within a common cytoplasm, Hi-C has revealed that Rhizophagus AMF heterokaryons are not as complex as once thought, or feared. Both haplotypes within each heterokaryon appear to arise through some past sexual reproduction event, contribute to the AMF’s phenotype and have unique gene expression patterns based on plant host. Their surprisingly ordinary genetic behavior — at least, ordinary for fungi — means it could be possible to engineer AMF that are even better symbionts for specific hosts, helping to boost crop biomass or improve resilience, for example. Engineered strains could also aid in soil remediation, or store carbon that would otherwise end up above ground or in the air.

The findings, coupled with the team’s previous experiments, also bring new mysteries into focus: AMF strains appear to employ a mixture of sexual and asexual reproduction, similar to other fungi. But scientists have never witnessed AMF sexual reproduction — a potentially useful tool for engineering strains. The new genome sequences will also serve as a point of comparison as scientists investigate whether the hundreds of other AMF species are similar to Rhizophagus — and their potential to transform agriculture.

Better together: long-range and long-read DNA sequencing methods, combined, reach record heights in microbiome discovery


 

Click here for an updated blog post.

 

Since its debut, next-generation sequencing has not rested on its laurels. Improved sequencing platforms have reduced error and lengthened reads into the tens of thousands of bases. The debut of long-range sequencing methods that are based on proximity ligation (aka Hi-C) has brought a new order-of-magnitude into reach by linking DNA strands with their neighbors before sequencing.

 

This progress has birthed high-resolution metagenomics, the sequencing and assembly of genomes from environmental samples to study ecosystem dynamics. But metagenomic experiments often undersample microbial diversity, missing rare residents, overlooking closely related organisms (like bacterial strains), losing rich genetic data (like metabolite gene clusters), and ignoring host-viral or host-plasmid interactions.

 

A revolution within a revolution

 

New sequencing platforms and methods can reform metagenomics from within. Long-read platforms, such as the PacBio® Sequel® IIe system, now yield HiFi reads of up to 15,000 base pairs with error rates below 1%. In addition, Phase Genomics created ProxiMeta™ kits to generate proximity-ligated long-range sequencing libraries, which preserve associations between DNA strands originating in the same cell.
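For readers who think in quality scores, the error rate quoted above can be placed on the Phred scale using the standard relation Q = -10 · log10(p). The snippet below is a minimal sketch of that conversion. The function name is our own and is not part of any PacBio or Phase Genomics software.

```python
import math


def phred_from_error_rate(p: float) -> float:
    """Convert a per-base error probability into a Phred quality score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


# A 1% error rate sits at roughly Q20; lower error rates give higher scores.
q_at_one_percent = phred_from_error_rate(0.01)
q_at_one_in_thousand = phred_from_error_rate(0.001)
```

Under this relation, "error rates below 1%" means reads that score above roughly Q20.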

 

In a study posted May 4 to bioRxiv, a team — led by Dr. Timothy Smith and Dr. Derek Bickhart at the U.S. Department of Agriculture and Dr. Pavel Pevzner at the University of California, San Diego — employed both PacBio HiFi sequencing and ProxiMeta in a deep sequencing experiment to uncover record levels of microbial diversity from a fecal sample of a Katahdin lamb. Combined, PacBio HiFi sequencing and ProxiMeta eased assembly, recovered rare microbes, resolved hundreds of strains and haplotypes, and preserved hundreds of plasmid and viral interactions.

 

HiFi family trees

 

The team constructed SMRTbell® libraries to generate HiFi data, and used ProxiMeta kits to generate long-range libraries. The two datasets, along with the metaFlye and ProxiMeta algorithms, allowed them to assemble contigs and create draft genomes without manual curation.

 

Researchers compared the breadth and depth of HiFi data-derived metagenome-assembled genomes, or MAGs, to control MAGs from assemblies of the same sample made using long, error-prone reads. HiFi data yielded more complete MAGs — 428 versus 335 — from more bacteria and archaea. HiFi data also generated more low-prevalence MAGs, capturing a larger slice of the community’s diversity by picking up more genomes from less common residents.
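To put the two MAG counts on a common footing, the short calculation below expresses the HiFi result as a relative gain over the control. Only the two counts come from the study as reported above. The helper name is ours.

```python
def relative_increase(new: int, baseline: int) -> float:
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


hifi_mags = 428     # complete MAGs from HiFi data (figure quoted in the text)
control_mags = 335  # complete MAGs from the error-prone long-read control

gain = relative_increase(hifi_mags, control_mags)
print(f"{gain:.1%}")  # 27.8%
```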

 

The HiFi MAGs also contained more than 1,400 complete and 350 partial sets of gene clusters for synthesizing metabolites such as proteasome inhibitors, which likely help some of these microbes colonize the gut. HiFi data picked up about 40% more of such clusters than control MAGs, illustrating just how much data is lost when long reads aren’t also highly accurate reads.

 

The team also used the HiFi MAGs to trace lineages within the community. They computationally resolved 220 MAGs into strain haplotypes, based largely on variations within single-copy genes. One MAG had 25 different haplotypes, which are likely strains of the same genus or species.

 

ProxiMeta’s long-range discoveries

 

The ProxiMeta-generated libraries added flesh to these MAG skeletons by unveiling additional rich biological information. Long-range sequencing linked nearly 300 HiFi-assembled plasmids to specific MAGs, revealing the species that hosted them. One plasmid, for example, was found in bacteria from 13 different genera. Long-range data also identified the first plasmids associated with two archaea, Methanobrevibacter and Methanosphaera.

 

Long-range sequencing illuminated the viral burden in this community. The HiFi library included nearly 400 viral contigs, more than half of which came from a single family of viruses that infect both bacteria and archaea. The team identified 424 unique viral-host interactions, including 60 between viruses and archaea, which is a more than 7-fold increase over controls.
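The idea behind these plasmid and viral assignments can be sketched in a few lines: because proximity ligation joins DNA that sat inside the same cell, read pairs that bridge a mobile element and a MAG contig count as evidence of a host relationship. The example below is an illustrative toy, not the ProxiMeta algorithm. The function name, the input layout, and the `min_links` threshold are all assumptions made for the sketch, and a real analysis would also normalize for contig length and abundance.

```python
from collections import Counter, defaultdict


def assign_hosts(hic_pairs, contig_to_mag, mobile_contigs, min_links=5):
    """Return {mobile_contig: [(mag, link_count), ...]} for well-supported links.

    hic_pairs:      iterable of (contig_a, contig_b), one entry per Hi-C read pair
    contig_to_mag:  dict mapping chromosomal contigs to their MAG
    mobile_contigs: set of plasmid or viral contig names
    min_links:      hypothetical support threshold, chosen here for illustration
    """
    links = defaultdict(Counter)
    for a, b in hic_pairs:
        # A read pair may list the mobile element first or second.
        for mobile, other in ((a, b), (b, a)):
            if mobile in mobile_contigs and other in contig_to_mag:
                links[mobile][contig_to_mag[other]] += 1
    return {
        mobile: [(mag, n) for mag, n in counts.most_common() if n >= min_links]
        for mobile, counts in links.items()
    }


# Toy data: one plasmid with strong links to MAG_A and weak links to MAG_B.
contig_to_mag = {"c1": "MAG_A", "c2": "MAG_A", "c3": "MAG_B"}
mobile = {"plasmid_1"}
pairs = (
    [("plasmid_1", "c1")] * 4
    + [("c2", "plasmid_1")] * 3
    + [("plasmid_1", "c3")] * 2
)
hosts = assign_hosts(pairs, contig_to_mag, mobile, min_links=5)
# hosts == {"plasmid_1": [("MAG_A", 7)]}
```

A plasmid found in several genera, like the one described above, would simply show up with more than one MAG passing the threshold.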

 

What’s around the bend?

 

This study has lessons beyond one lamb’s gastrointestinal tract. It shows decisively that the highly accurate long reads generated by HiFi sequencing are ideal partners for Hi-C-derived methods like ProxiMeta. Together, they generate increasingly sophisticated metagenome assemblies for biologists to interrogate.

 

Applied to other environmental samples, this platform could illuminate the diversity and complexity of other microbial communities, from the bottom of the sea to mountain peaks, and within the stomach of every human being. Broader use of this method will not just fuel the engines of scientific inquiry. It could also generate new insights into pressing problems of our times, such as antibiotic resistance, soil health, or how microbes can break down pollutants.

New genome assembly method makes fruitful advances in genomic technology

 

A collaboration between Phase Genomics and Pacific Biosciences of California is bringing about the next generation of genome assembly technology. A newly published software tool, FALCON-Phase, combines genomic proximity ligation methods developed by Phase Genomics™ with high-accuracy, long-read sequencing data from PacBio®, enabling researchers to create haplotype-resolved genome sequences on a chromosomal scale without parental genome data. This method and its application to several animal genomes were published today in Nature Communications.


Humans, as well as other animals, carry DNA sequence copies from both parents. These parental sequence “haplotypes” can carry millions of mutations unique to one of the parents and are often very relevant to diseases and other genetic traits. Until recently, accurately separating paternal and maternal mutations on the whole-genome scale required sequence information from the individual parents or extensive efforts that relied heavily on imputation from population studies. The new method employs the physical proximity information captured by proximity ligation (a technology also known as “Hi-C”) to separate maternal and paternal haplotype information from long-read genome assemblies. This development significantly increases the actionable information content coming out of genome sequencing studies.
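The intuition is that Hi-C contacts are far more frequent along the same physical chromosome than between the two parental copies, so sequence blocks that share many Hi-C links probably belong to the same haplotype. The toy below illustrates that reasoning with a simple greedy pass. It is a sketch under our own assumptions and is not the FALCON-Phase algorithm: the function name, the input layout, and the greedy strategy are all hypothetical.

```python
def phase_haplotigs(loci, links):
    """Greedily split paired haplotigs into two haplotypes using Hi-C links.

    loci:  list of (hap_x, hap_y) name pairs, one pair of alternatives per locus
    links: dict mapping frozenset({name1, name2}) -> Hi-C read-pair count
    Returns (phase_a, phase_b), each holding one haplotig per locus.
    """
    def n(u, v):
        return links.get(frozenset((u, v)), 0)

    # The first locus fixes the arbitrary labels "A" and "B".
    phase_a, phase_b = [loci[0][0]], [loci[0][1]]
    for x, y in loci[1:]:
        # Support for keeping x with A and y with B, versus swapping them.
        keep = sum(n(x, a) for a in phase_a) + sum(n(y, b) for b in phase_b)
        swap = sum(n(y, a) for a in phase_a) + sum(n(x, b) for b in phase_b)
        if swap > keep:
            x, y = y, x
        phase_a.append(x)
        phase_b.append(y)
    return phase_a, phase_b


# Toy data: three loci, with Hi-C links favoring L1_p + L2_q + L3_p on one haplotype.
loci = [("L1_p", "L1_q"), ("L2_p", "L2_q"), ("L3_p", "L3_q")]
links = {
    frozenset(("L1_p", "L2_q")): 9,
    frozenset(("L1_q", "L2_p")): 8,
    frozenset(("L1_p", "L2_p")): 1,
    frozenset(("L2_q", "L3_p")): 7,
    frozenset(("L2_p", "L3_q")): 6,
    frozenset(("L1_p", "L3_q")): 1,
}
phase_a, phase_b = phase_haplotigs(loci, links)
# phase_a == ["L1_p", "L2_q", "L3_p"], phase_b == ["L1_q", "L2_p", "L3_q"]
```

Note that nothing in this reasoning requires parental data: the phase comes entirely from physical proximity within the sample itself.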

 

 

“It’s an exciting time for genome assembly and PacBio HiFi sequencing continues to lead the way in this area with its powerful combination of read length and accuracy,” wrote Jonas Korlach, Chief Scientific Officer at Pacific Biosciences. “Phase Genomics Hi-C complements PacBio technology by extending our data into the ultra-long-range domain, enabling us to connect phase blocks and deliver chromosome-scale diploid assemblies without parental data. We are fortunate to have this excellent partnership with Phase Genomics, and we look forward to continuing to work together to create the highest quality reference genomes available.”

 

Assembling two fully-phased genomes in a single, streamlined process not only saves on the costs of research, but it also enables scientists to upgrade their genome assembly pipelines and obtain previously unobtainable information.

 

Dr. Erich Jarvis, professor at Rockefeller University and chair of the international Vertebrate Genomes Project, wrote, “Chromosome-scale haplotype phasing is critical for generating accurate genome assemblies and for understanding genomic variation within a species.” Furthermore, FALCON-Phase produces maternal and paternal haplotypes without family-trio data, so it can be applied to wild-caught samples or organisms lacking pedigree information. Jarvis notes, “In wild populations that many work with, parental samples are usually unavailable and therefore we need a method that can phase paternal and maternal sequences in the offspring individuals. With FALCON-Phase, we are able to use the Hi-C data that we have already generated for genome scaffolding and add a new dimension to every genome assembly, even retrospectively for previous projects. Our collaboration with Phase Genomics and PacBio has been extremely fruitful and the combination of the two technologies through FALCON-Phase will be highly beneficial to genomic sequencing efforts focused on conservation.”

 

FALCON-Phase is applicable to any diploid genome, including plants, animals, and fungi. It is available free of charge as open-source software (https://github.com/phasegenomics/FALCON-Phase), and Phase Genomics offers services that include the application of this method to a variety of genome projects. See the latest news and publications on this and other genome assembly methods at https://phasegenomics.com/resources-and-support/publications/.

 

For more information, email us at info@phasegenomics.com.

Phase Genomics and QIAGEN Partner to Bring Hi-C Epigenetics Solutions to U.S. Market

In the last few years, Hi-C technology has grown in popularity within the epigenetics community. The chief application of this proprietary method is to measure the three-dimensional architecture of genomes to better understand complex nuclear dynamics. Being a leader in this space, we at Phase Genomics seek to maximize the commercial footprint of our technology. As interest in this method has increased significantly, we have partnered with QIAGEN to increase its commercial availability. Read about the new EpiTect Hi-C kits available now through our collaborative effort.

 

QIAGEN expands its existing Epigenomic offering in the United States with Sample to Insight solution for Hi-C NGS analysis

• EpiTect Hi-C Kit helps researchers to better understand key aspects of long-range genome architecture
• License agreement enables QIAGEN to sell Phase Genomics’ proprietary proximity-ligation technology in the United States research market
• Adds to QIAGEN’s epigenomic capabilities in identification of individual methylation marks and histone modification at the nucleotide level

 

Hilden, Germany, and Germantown, Maryland, October 29, 2020 – QIAGEN today announced a non-exclusive agreement with Phase Genomics, Inc. to license specific patents to sell its EpiTect Hi-C kits in the United States. Through this agreement, QIAGEN now has access to support chromatin research in the largest research market in the world.

Chromatin conformation research, including chromatin conformational analysis (Hi-C), is an emerging and growing market area of genomic research that is refining our knowledge of the interconnectivity and organization of the genome. Hi-C has become a vital tool for understanding the structures and organization associated with cell biology. The EpiTect Hi-C Kit provides a simplified, single-box solution, requiring less than 250,000 mammalian cells to generate sequence-ready libraries.

“QIAGEN’s EpiTect portfolio has until now focused on identifying individual methylation marks and histone modification in the genome at the nucleotide level,” said Kerstin Steinert, Vice President of Product Development & Research Services at QIAGEN. “With the QIAGEN EpiTect Hi-C Kit, we are providing an end-to-end solution to study the 3-D genome and identify larger structural aspects of chromatin conformation and genomic architecture.”

“This partnership demonstrates a strong confidence in the value of Phase Genomics’ technology. Now, scientists studying epigenetics can more fully understand changes in genome architecture that may trigger disease in ways that are more cost-effective than ever before,” said Ivan Liachko, PhD, Founder and CEO of Phase Genomics. “Our Hi-C proximity-ligation technology, now available through QIAGEN, will help accelerate treatments to market and discover new paths toward the prevention of disease.”

In keeping with the QIAGEN commitment to provide Sample to Insight solutions, customers will have two options to analyze data from experiments. The EpiTect Hi-C Portal, located on GeneGlobe Data Analysis Center, provides multiple analysis types, including contact matrices and maps. In addition, Phase Genomics will provide a comprehensive suite of computational analytic services for Hi-C data analysis to QIAGEN EpiTect Hi-C Kit customers through a cloud-based bioinformatic platform that employs novel computational approaches and algorithms to analyze and interrogate proximity ligation (Hi-C) data.

Breaking the Mold: New Tech Sheds Light on 5 Mysteries of the Fungal World

 

This month Phase Genomics is celebrating #FungusFebruary by highlighting some of the unique capabilities of our Hi-C technology to solve age-old mysteries in the world of fungal genetics and deliver new potential for researchers to understand fungi, all while helping solve global crop crises and develop new groundbreaking pharmaceuticals.

While we wield the power of genomics to explore the wonders of fungi today, a few centuries ago people dismissed them as just weird plants. Eventually microscopes and anatomical studies revealed fungi as a distinct flavor of life — some varieties quite tasty — but educational experts today continue to bemoan the lack of lessons on fungi in biology curricula, and research on fungi — even those that cause disease — lags.

As a result, scientists lack much basic information on the genetics, life cycles, and reproductive habits of many fungi — even though members of this kingdom could help address a bevy of challenges in food and energy production, illuminate the evolution of complex life and even shelter us on Mars.

Genome studies on fungi of all stripes can resolve evolutionary relationships and ecosystem dynamics, identify metabolites of commercial and medical interest and — for fungi that cause disease — reveal biochemical and genetic targets to help us fight pathogenicity.

Like their animal and plant cousins, fungal genomes also have their challenging parts, including repeats, duplications and structural elements that complicate both sequencing and assembly. Recently, the chromosome conformation method “Hi-C” and advances in next-generation sequencing have helped untangle some of these sticky genomic knots, and show promise in taming genomes across this diverse and neglected kingdom of life.

 

1. High-resolution mapping of centromeres

 Hi-C’s power lies in its ability to identify regions of the genome that reside in close proximity to one another in the nucleus — information that essentially captures the 3D organization of the genome. But Hi-C doesn’t just identify where particular chromosomes reside within the nucleus. It can also help identify functional elements in genomes that are difficult to identify in other ways.

That is what two groups of researchers (from the Pasteur Institute and the University of Washington) did when they used Hi-C to track down functional elements in yeast genomes — centromeres and rDNA clusters — both of which are typically repeat-rich and difficult to identify without laborious experiments involving functional assays or mapping the binding sites of rare centromere proteins. In fungal species, centromeres are held tightly together at the spindle pole body, and the team used this shared proximity to identify centromere locations in the genomes of numerous yeasts (and subsequently other fungi), despite not knowing their centromeric DNA sequence. Ribosomal DNA clusters similarly congregate in yeast nuclei, which one team exploited to identify their positions in Debaryomyces hansenii.
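The clustering signal described above can be illustrated with a toy calculation: if centromeres from different chromosomes sit together in the nucleus, the bin holding each centromere should show unusually many contacts with other chromosomes. The sketch below simply picks the bin with the strongest inter-chromosomal signal on each chromosome. It is a simplification made under our own assumptions and is not the method used in the published studies. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def candidate_centromere_bins(contacts):
    """Pick, per chromosome, the bin with the most inter-chromosomal contacts.

    contacts: iterable of (chrom_a, bin_a, chrom_b, bin_b, count) records
    Returns {chrom: bin_index}.
    """
    trans_signal = defaultdict(lambda: defaultdict(int))
    for chrom_a, bin_a, chrom_b, bin_b, count in contacts:
        if chrom_a == chrom_b:
            continue  # within-chromosome contacts are ignored in this sketch
        trans_signal[chrom_a][bin_a] += count
        trans_signal[chrom_b][bin_b] += count
    return {
        chrom: max(bins, key=bins.get)
        for chrom, bins in trans_signal.items()
    }


# Toy data: bins chrI:3, chrII:5 and chrIII:2 contact each other heavily.
contacts = [
    ("chrI", 3, "chrII", 5, 40),
    ("chrI", 3, "chrIII", 2, 35),
    ("chrII", 5, "chrIII", 2, 38),
    ("chrI", 0, "chrII", 1, 2),
    ("chrII", 8, "chrIII", 6, 3),
    ("chrI", 1, "chrI", 2, 500),  # same-chromosome contact, skipped
]
centromeres = candidate_centromere_bins(contacts)
# centromeres == {"chrI": 3, "chrII": 5, "chrIII": 2}
```

The appeal of this kind of approach is that it needs no prior knowledge of the centromeric DNA sequence, only the contact map.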

 

2. High-quality genomes illuminate biochemical pathways

Fungi harbor a wide array of genes for synthesizing secondary metabolites, which range from harmful toxins to helpful pharmaceuticals. In fungi, genes for synthesizing secondary metabolites tend to occur in clusters, which are also thought to be sites of rapid evolution.

Phase Genomics worked with a University of Minnesota-led team and used Hi-C to generate high-quality genomes of six strains of Tolypocladium inflatum, an insect pathogen that has already given us the immunosuppressant drug cyclosporin. The new assemblies revealed major differences in secondary metabolite production between T. inflatum strains, including novel clusters, transpositions and clusters that may be involved in toxin synthesis. The bevy of discoveries from these assemblies showed how recombination can drive significant divergence even within a single species — and how important it is to build multiple high-quality genome assemblies that can capture that diversity.

 

3. Fungal dikaryons and the hidden nuclear dance

The genetic differences between strains also apply to pathogenic fungi, like the stem rust, which parasitizes wheat. Phase Genomics partnered with a team led by scientists at CSIRO in Australia to apply Hi-C to two strains of stem rust, including the particularly deadly scourge Ug99. Like many fungi, stem rust genomes are divided between two haploid nuclei. The team used Hi-C data to assemble complete haplotypes for both haploid genomes of both strains, and discovered that Ug99, a recent arrival that is decimating whole fields of wheat in Africa, has an unexpected origin: The strain arose through “somatic hybridization,” when hyphae from two strains exchange haploid nuclei. This may explain the strain’s sudden rise and deadly wake, and gives scientists new genomic information to understand Ug99’s virulence and identify weaknesses that could give wheat a leg up.

 

4. Hybrids, beer, and fungal metagenomics

The ability to separate two nuclei from within the same cell can be extended to more complex samples. Yeasts, which are integral players in brewing, will often hybridize to form new species containing genomes from two organisms at once (the famous lager-producing yeast Saccharomyces carlsbergensis is one example of such a hybrid). But in a mixed microbial community, such as beer, wine, or a microbiome sample, how can DNA sequencing detect which genomes co-exist within the same cell? One special power of Hi-C is that it traps sequences that are within touching distance of each other, and therefore must come from inside the same cell. The Dunham lab at the University of Washington used this property to analyze an open-fermentation beer from a local brewery. The exciting result was that they were able to discover a new hybrid yeast, later named Pichia apotheca, using Hi-C data to identify it as a hybrid bearing two genomes from related organisms. This new hybrid species has since been used by home-brewers to ply their craft and gives beer a unique flavor.

 

5. The Epigenetics of Symbiosis

Nature has plenty of examples of plants and fungi getting along. One of them is Epichloë festucae, a filamentous fungus that has evolved a symbiotic relationship with certain grass species. When Phase Genomics worked with a Massey University-led team, they discovered that E. festucae’s genome carries hallmarks of this symbiosis. The analysis of Hi-C data revealed that important genes are clustered into blocks separated by repeat-rich regions. Hi-C and RNA-seq data together showed that genes within the blocks have similar expression patterns — indicating that genes needed for symbiosis with their grass hosts tend to cluster together in the same blocks.

 

Looking Forward

Cutting-edge genomic technologies like Hi-C have the potential to keep making up for lost time and reveal even more intimate details of the hidden lives of fungi. This #FungusFebruary, it’s worth asking: What other mysteries about this long-overlooked kingdom are worth solving?

The Highest-Quality Genomes: Q&A on Cannabis Genomics

 

Co-author Kevin McKernan of Medicinal Genomics talks more about the past, present, and future of cannabis genomic research. Read more about his newly published cannabis genome assembly project using Proximo Hi-C scaffolding featured in The Genetic Literacy Project.

 

What is the difference between hemp and marijuana? How can we use genomics to answer this question?

 

McKernan: The legal definition of hemp is any Cannabis sativa that has less than 0.3 percent THC acid, or THCA. Historically, hemp has been grown for fiber and the exceptional nutritional content of its seed. THCA expression is genetically controlled at what has been historically referred to as the Bt:Bd allele. Next-generation sequencing technologies are giving us our first glimpse of this complicated locus.

 

Why are you interested in assembling the Cannabis genome? What are you hoping to accomplish?

 

McKernan: A refined genome assembly will enable molecular breeding programs to deploy marker-assisted selection for yield, flowering time, pest resistance and rare cannabinoid expression. It will likely shed light on the heritability of hermaphroditism and apomixis. A clearer picture of the genes involved in cannabinoid and terpenoid expression will enable more intelligent breeding and synthetic biology programs.

 

Which genes are responsible for cannabidiolic acid production and how do these genes vary between the cultivars?

 

McKernan: The Cannabis plant makes 113 different cannabinoids. There are three well-understood cannabinoid synthesis genes. These highly similar genes all compete for a common precursor molecule. Mutations in these genes affect gross cannabinoid expression. A more refined reference may enlighten us to the genetic variants that can more accurately estimate THCA levels to segregate hemp and drug-type seed stocks.

 

What other hidden gems did you find in the Cannabis genome after you finished the assembly?

 

McKernan: The most exciting picture is the 2.1Mb CBCAS (cannabichromenic acid synthase) gene cluster seen in the Jamaican Lion assembly. This has 9 tandem copies of CBCAS, all directionally orientated, that are 99.4-99.9 percent identical and separated by 30-80kb long terminal repeats. This region has been an assembly knot for over seven years and I think the only reason it is visible to us today is due to novel sequencing tools we didn’t have in 2011.

 

Why is the Cannabis genome so difficult to assemble? Are there unique genomic features (i.e. copy number variants, special repeat classes, segmental duplications) that are especially troublesome?

McKernan: Its 1.07Gb genome consists of 10 chromosomes, with 73 percent repeat, 66 percent AT and 0.5-1 percent polymorphic. The genes that contribute to chemotype are under the most selective pressure and have hijacked long terminal repeats to enable gene expansions. We had suspicions of this back in 2011 but could never assemble the region to prove it.

 

Why was it important to obtain chromosomes for your assembly? How did Hi-C help?

 

McKernan: The Pacific Biosciences assembly delivered us an assembly that was an amazing leap forward from the Illumina assemblies, but it is not chromosomal in scale. Hi-C has helped to organize these contigs into chromosomes and it can do this without having to make linkage maps.
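
The idea behind that answer can be illustrated with a toy sketch: contigs from the same chromosome tend to share far more Hi-C read pairs than contigs from different chromosomes, so grouping contigs by shared links approximates chromosome assignment. The Python below is only an illustration under stated assumptions, not the Proximo method. The contig names, link counts, threshold, and the `group_contigs` helper are all hypothetical, and real scaffolders also normalize the counts and then order and orient contigs, which this sketch does not attempt.

```python
from collections import defaultdict


def group_contigs(links, min_links):
    """Group contigs joined by at least `min_links` Hi-C read pairs.

    `links` maps a (contig_a, contig_b) pair to a read-pair count.
    Uses a simple union-find; this is a toy, not a real scaffolder.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in links.items():
        root_a, root_b = find(a), find(b)
        # Merge only when the contact signal is strong enough.
        if count >= min_links and root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for contig in parent:
        groups[find(contig)].add(contig)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical link counts: strong within a chromosome, weak between.
links = {
    ("ctg1", "ctg2"): 950,
    ("ctg2", "ctg3"): 870,
    ("ctg4", "ctg5"): 910,
    ("ctg1", "ctg4"): 12,
    ("ctg3", "ctg5"): 8,
}
groups = group_contigs(links, min_links=100)
print(groups)
```

With these made-up numbers the five contigs fall into two groups, which is the kind of chromosome-level grouping that would otherwise require a linkage map.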

 

What did you find to be most useful in working with Phase Genomics?

 

McKernan: Hi-C is very complementary to PacBio sequence data and is the only technology that delivers long range information without having to make high molecular weight DNA. This is very important in Cannabis as it is difficult to get high molecular weight DNA out of the plant.

 

What would you like other researchers, breeders or regulators to take away from your high-quality genome assembly? How do you think this genome assembly will be utilized in the future?

 

McKernan: We also need dozens of genomes sequenced to the quality level of Jamaican Lion to get a full picture of these complex cannabinoid loci. We need Hi-C libraries to better understand the microbiome of the plant, so we can more intelligently manage pathogenic threats that affect yield. Many endofungal bacteria like Ralstonia are found in metagenomic sequencing studies in Cannabis flowers and can be a risk to consumers and negatively impact plant yield. Ralstonia is also notorious for contaminating many metagenomic studies due to contamination in library construction kits. We suspect Hi-C will play important roles in segregating live versus dead DNA and resolving these contamination problems.

 

What regulatory challenges do you run into when working on Cannabis genomics?

 

McKernan: The biggest issue at the moment is that the movement of tissue, other than sterilized stalk, is currently federally prohibited in the U.S. This makes RNA studies very challenging as RNA isolation has to be performed in the field. Movement of DNA or cross-linked chromatin is legal, so this is a compelling case for the use of Hi-C in the Cannabis field (insert Hi-C pun here). Phase Genomics’ kits were critical, as shipping certain tissues is restricted. U.S. federal funding also remains restricted. We turned to the Dash Distributed Autonomous Organization for funding to rapidly sequence and publish the genome. We applied for funds in May of 2018 and had the first assembly public on August 2. This is a very generous contribution by Dash because any U.S. university that attempts to handle the plant places their federal funding at risk.

 

What genomic evidence suggests that Cannabis has been selectively bred by humans?

 

McKernan: I think the elevated THCA levels witnessed since prohibition — combined with the long terminal repeat-driven expansion of the synthase genes — is the best evidence we have.

 

What is your favorite fact and what is your least favorite misconception about Cannabis?

 

McKernan: My favorite thought experiment regarding the rapid reproduction of Cannabis is that its genome is very likely spreading through space and time more quickly than the human genome, and it evokes much of David Sinclair’s work on Xenohormesis. My least favorite misconception is the false dichotomy of medical versus recreational cannabis consumption. I think this showcases our reactionary health-care mindset as opposed to the preventative mindset we need to strive for. If you disregard recreational use, you are likely going to require more medical use. These compounds have been in our diet for thousands of years. We now know mutations in human endocannabinoid system-related genes are associated with neurological phenotypes and a large class of idiosyncratic diseases are now being recognized as clinical endocannabinoid deficiency (CED). It was incredibly naïve and destructive to remove cannabinoids from the American diet in 1937.

 

What do you think the future holds for the cannabis industry?

 

McKernan: In states that legalize cannabis, there is a 15 percent reduction in alcohol consumption, a 25 percent reduction in opiate overdoses, a 17 percent decrease in Medicare opiate usage and a 25 percent reduction in general pharmaceutical use. There is a 10 percent reduction in suicide and a 72 percent reduction in PTSD nightmares. The benefits to epilepsy have survived FDA scrutiny. This is the most disruptive market force we have seen in healthcare since the internet and next-generation sequencing. We are now just witnessing the alcohol industry take multi-billion dollar positions in the cannabis industry. It is only a matter of time before the pharmaceutical industry begins to hedge their losses as well. I am betting against the endocannabinoid mimetic known as acetaminophen and in favor of the less-toxic phytocannabinoids like cannabidiol.

 

 

About Phase Genomics

Seattle-based Phase Genomics offers research services and kits based on its Hi-C and proximity-ligation technologies, which enable chromosome-scale genome assembly, metagenomic deconvolution, and the analysis of structural genomic variation and genome architecture. The company also offers Hi-C genomics tools for genome scaffolding and phasing. Learn more about Proximo and bring the power of Hi-C into your lab today by purchasing one of our Hi-C kits.

How it Works: Proximo Hi-C Genome Scaffolding