The Highest-Quality Genomes: Q&A on Cannabis Genomics

 

Co-author Kevin McKernan of Medicinal Genomics talks more about the past, present, and future of cannabis genomic research. Read more about his newly published cannabis genome assembly project using Proximo Hi-C scaffolding featured in The Genetic Literacy Project.

 

What is the difference between hemp and marijuana? How can we use genomics to answer this question?

 

McKernan: The legal definition of hemp is any Cannabis sativa that has less than 0.3 percent THC acid, or THCA. Historically, hemp has been grown for fiber and the exceptional nutritional content of its seed. THCA expression is genetically controlled at what has been historically referred to as the Bt:Bd allele. Next-generation sequencing technologies are giving us our first glimpse of this complicated locus.

 

Why are you interested in assembling the Cannabis genome? What are you hoping to accomplish?

 

McKernan: A refined genome assembly will enable molecular breeding programs to deploy marker-assisted selection for yield, flowering time, pest resistance and rare cannabinoid expression. It will likely shed light on the heritability of hermaphroditism and apomixis. A clearer picture of the genes involved in cannabinoid and terpenoid expression will enable more intelligent breeding and synthetic biology programs.

 

Which genes are responsible for cannabidiolic acid production and how do these genes vary between the cultivars?

 

McKernan: The Cannabis plant makes 113 different cannabinoids. There are three well-understood cannabinoid synthesis genes. These highly similar genes all compete for a common precursor molecule. Mutations in these genes affect gross cannabinoid expression. A more refined reference may reveal the genetic variants that more accurately predict THCA levels, which would help segregate hemp and drug-type seed stocks.

 

What other hidden gems did you find in the Cannabis genome after you finished the assembly?

 

McKernan: The most exciting picture is the 2.1Mb CBCAS (cannabichromenic acid synthase) gene cluster seen in the Jamaican Lion assembly. This has 9 tandem copies of CBCAS, all oriented in the same direction, that are 99.4-99.9 percent identical and separated by 30-80kb long terminal repeats. This region has been an assembly knot for over seven years and I think the only reason it is visible to us today is due to novel sequencing tools we didn’t have in 2011.
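
Editor's illustration: a figure such as "99.4-99.9 percent identical" is a pairwise identity between aligned gene copies. The Python sketch below shows one common way such a number can be computed, assuming the two copies have already been aligned to equal length. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the Jamaican Lion assembly or the authors' pipeline.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions at which two sequences carry the same base.

    Assumes `a` and `b` are already aligned and of equal length
    (hypothetical helper, not the authors' method).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    # Compare case-insensitively, position by position.
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Toy example: two 10-base copies that differ at a single position.
copy_1 = "ACGTACGTAC"
copy_2 = "ACGTACGTAA"
identity = percent_identity(copy_1, copy_2)
```

At 99.4-99.9 percent identity, copies differ at only a handful of positions per thousand bases, which is why short reads rarely span enough differences to tell tandem copies apart.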

 

Why is the Cannabis genome so difficult to assemble? Are there unique genomic features (i.e. copy number variants, special repeat classes, segmental duplications) that are especially troublesome?

McKernan: Its 1.07Gb genome consists of 10 chromosomes, with 73 percent repeat, 66 percent AT and 0.5-1 percent polymorphic. The genes that contribute to chemotype are under the most selective pressure and have hijacked long terminal repeats to enable gene expansions. We had suspicions of this back in 2011 but could never assemble the region to prove it.
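
Editor's illustration: the "66 percent AT" figure is a base-composition statistic. A minimal Python sketch of how AT content can be computed from a sequence string follows. The function name `at_content` and the toy input are hypothetical, and the choice to ignore ambiguous bases such as N is an assumption, not something stated in the interview.

```python
def at_content(seq: str) -> float:
    """Percent of unambiguous bases (A, C, G, T) that are A or T.

    Ambiguous symbols such as N are skipped (an assumption made for
    this sketch, not a detail from the source).
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(base in "AT" for base in counted)
    return 100.0 * at / len(counted)


# Toy example: 4 of the 6 bases are A or T.
toy_at = at_content("AATTGC")
```

For a real assembly the same count would be accumulated over every contig rather than over a single string.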

 

Why was it important to obtain chromosomes for your assembly? How did Hi-C help?

 

McKernan: The Pacific Biosciences assembly delivered us an assembly that was an amazing leap forward from the Illumina assemblies, but it is not chromosomal in scale. Hi-C has helped to organize these contigs into chromosomes and it can do this without having to make linkage maps.

 

What did you find to be most useful in working with Phase Genomics?

 

McKernan: Hi-C is very complementary to PacBio sequence data and is the only technology that delivers long-range information without having to make high molecular weight DNA. This is very important in Cannabis as it is difficult to get high molecular weight DNA out of the plant.

 

What would you like other researchers, breeders or regulators to take away from your high-quality genome assembly? How do you think this genome assembly will be utilized in the future?

 

McKernan: We also need dozens of genomes sequenced to the quality level of Jamaican Lion to get a full picture of these complex cannabinoid loci. We need Hi-C libraries to better understand the microbiome of the plant, so we can more intelligently manage pathogenic threats that affect yield. Many endofungal bacteria like Ralstonia are found in metagenomic sequencing studies in Cannabis flowers and can be a risk to consumers and negatively impact plant yield. Ralstonia is also notorious for contaminating many metagenomic studies due to contamination in library construction kits. We suspect Hi-C will play important roles in segregating live versus dead DNA and resolving these contamination problems.

 

What regulatory challenges do you run into when working on Cannabis genomics?

 

McKernan: The biggest issue at the moment is that the movement of tissue, other than sterilized stalk, is currently federally prohibited in the U.S. This makes RNA studies very challenging as RNA isolation has to be performed in the field. Movement of DNA or cross-linked chromatin is legal, so this is a compelling case for the use of Hi-C in the Cannabis field (insert Hi-C pun here). Phase Genomics’ kits were critical, as shipping certain tissues is restricted. U.S. federal funding also remains restricted. We turned to the Dash Distributed Autonomous Organization for funding to rapidly sequence and publish the genome. We applied for funds in May of 2018 and had the first assembly public on August 2. This is a very generous contribution by Dash because any U.S. university that attempts to handle the plant places its federal funding at risk.

 

What genomic evidence suggests that Cannabis has been selectively bred by humans?

 

McKernan: I think the elevated THCA levels witnessed since prohibition — combined with the long terminal repeat-driven expansion of the synthase genes — is the best evidence we have.

 

What is your favorite fact and what is your least favorite misconception about Cannabis?

 

McKernan: My favorite thought experiment regarding the rapid reproduction of Cannabis is that its genome is very likely spreading through space and time more quickly than the human genome, and it evokes much of David Sinclair’s work on Xenohormesis. My least favorite misconception is the false dichotomy of medical versus recreational cannabis consumption. I think this showcases our reactionary health-care mindset as opposed to the preventative mindset we need to strive for. If you disregard recreational use, you are likely going to require more medical use. These compounds have been in our diet for thousands of years. We now know mutations in human endocannabinoid system-related genes are associated with neurological phenotypes and a large class of idiosyncratic diseases are now being recognized as clinical endocannabinoid deficiency (CED). It was incredibly naïve and destructive to remove cannabinoids from the American diet in 1937.

 

What do you think the future holds for the cannabis industry?

 

McKernan: In states that legalize cannabis, there is a 15 percent reduction in alcohol consumption, a 25 percent reduction in opiate overdoses, a 17 percent decrease in Medicare opiate usage and a 25 percent reduction in general pharmaceutical use. There is a 10 percent reduction in suicide and a 72 percent reduction in PTSD nightmares. The benefits to epilepsy have survived FDA scrutiny. This is the most disruptive market force we have seen in healthcare since the internet and next-generation sequencing. We are now just witnessing the alcohol industry take multi-billion dollar positions in the cannabis industry. It is only a matter of time before the pharmaceutical industry begins to hedge their losses as well. I am betting against the endocannabinoid mimetic known as acetaminophen and in favor of the less-toxic phytocannabinoids like cannabidiol.

 

 

About Phase Genomics

Seattle-based Phase Genomics offers research services and kits based on its Hi-C and proximity-ligation technologies, which enable chromosome-scale genome assembly, metagenomic deconvolution, and the analysis of structural genomic variation and genome architecture. Phase Genomics offers Hi-C genomics tools for genome scaffolding and phasing. Learn more about Proximo and bring the power of Hi-C into your lab today by purchasing one of our Hi-C kits.
