Category: Genomics

Orphan Crop Gains Reference Genome with Proximo Hi-C

Amaranth genome assembly brought to the chromosome-scale using Phase Genomics’ Proximo Hi-C technology. 


“Orphan crops” are growing in popularity because they have the potential to feed the world’s expanding population. You may have heard of orphan crops like quinoa or spelt, but have you heard of amaranth? The amaranth genus (Amaranthus) is a hardy group of plants that produce nutritious leaves and seeds, high in protein and vitamin content. Amaranth species grow vigorously across a wide geographic range, including South America, Mesoamerica, and Asia. Amaranth was likely domesticated by the Aztec civilization and has been a staple food of Mesoamericans for thousands of years. Breeders wish to enhance amaranth’s beneficial properties, such as drought resistance, nutrition, and seed production, to improve its usefulness as a food source. However, effective plant husbandry requires genetic and genomic resources, and building these resources has been inhibited by the high cost of genome sequencing and assembly.


Dr. Jeff Maughan (left) and Dr. Damien Lightfoot (right) are the primary authors of the amaranth genome paper.

Dr. Jeff Maughan, professor at Brigham Young University, is a champion of orphan crop genomics. Over the past year, Dr. Maughan and his team built a reference-quality amaranth genome on a tight budget. They built upon an earlier short-read assembly by adding Hi-C data, which measures the conformation of chromatin in vivo, as well as low-coverage long reads and optical mapping data. After optical mapping was used to correct errors in the short-read assembly, the Hi-C data were used to cluster the short genome fragments into nearly complete chromosomes with Phase Genomics’ Proximity-Guided Assembly platform, Proximo™ Hi-C. Then, the long reads were used to close remaining gaps on the chromosomes. This cost-effective strategy recovered over 98% of the 16 amaranth chromosomes.


The completed reference genome provides an important resource for the community and will boost the efforts of plant breeders to unlock more agricultural benefits for amaranth. In their paper, Dr. Maughan’s team demonstrated the utility of the reference-quality genome in at least two ways. First, they studied chromosomal evolution by comparing the amaranth genome to the beet genome, which helps researchers understand amaranth in the context of how plants evolved. Second, they mapped the genetic locus responsible for stem color, which clarifies the scientific understanding of a useful agricultural trait. Dr. Maughan points out that both of these experiments would have been impossible without the chromosome-scale genome assembly afforded by Proximo Hi-C.


A high-quality reference genome is the first of many important steps towards creating a modern breeding program for amaranth. We contacted Dr. Maughan to learn more about how he is improving amaranth genomics and the importance of orphan crops.


What is an orphan crop? 

According to the FAO (Food and Agriculture Organization of the United Nations) the world has approximately 7,000 cultivated edible plant species, but just five of them (rice, wheat, corn, millet, and sorghum) are estimated to provide 60% of the world’s energy intake and just 30 species account for nearly all (95%) of all human food energy needs.  The remaining species are underutilized and often referred to as “orphan crops”.


How is genomics relevant to orphan crops?

Would you invest your entire 401K savings in just three stocks? In essence, that is what we are doing with world food security. This comes with tremendous risk. If we are going to diversify our food crops, it will be with these orphan crops. Modern plant breeding programs leverage genomics to significantly enhance genetic gain (yield); such methods will undoubtedly expedite the development of advanced varieties in orphan crop species.


What are the challenges facing researchers interested in orphan crop genomics?  How have you overcome them?

Funding has long been the main obstacle to developing genomic resources for orphaned crops.  The development of cheap, high-quality next-generation sequencing technology has dramatically ameliorated this problem – making genomics accessible for most plant species.


You used two scaffolding technologies for your assembly, Hi-C and BioNano. How did they compare?

Both technologies are extremely useful and complementary but address different genome assembly challenges.  The Hi-C data allows for the production of chromosome length scaffolds, while the BioNano data allows for fine-tuning and verification of the assembly.


Beyond building a high-quality genome assembly, what other genomic resources are required to encourage the adoption of orphan crops?

While genomic resources (such as genome assemblies and genetic markers) are fundamental for developing a modern plant breeding program, often what is missing with orphan crops is the collection of diverse germplasm (or gene bank) that is the foundation of a hybrid breeding program.  The U.S. and other nations have extensive collections (tens of thousands of accessions) that serve as the genetic foundation for staple crop breeding programs – unfortunately, such collections are minimal or non-existent for orphan crops.


Who stands to benefit the most from a complete amaranth genome?  How do you disseminate your work to them?

We collaborate extensively with researchers throughout South and Central America, where amaranth is already valued as a regionally important crop. Dissemination of our research occurs through traditional methods (e.g., peer-reviewed publications) as well as through sponsored scientist and student exchanges.


Amaranth is used in a variety of interesting foods. What’s your favorite dish?

Alegría, which is made with popped amaranth and honey, and is common throughout Mexico.


Threespine Stickleback Genome Upgraded Using Phase Genomics’ Proximo™ Hi-C Technology

Threespine stickleback

Proximo Hi-C genome scaffolding not only improved the well-studied threespine stickleback assembly, but also found structural differences that would have otherwise been missed. 


This week, researchers from the University of Bern and the University of Georgia released a new high-quality reference genome for the threespine stickleback. The results of this project, a collaboration between Dr. Catherine Peichel, Dr. Michael White, and Phase Genomics, were published in the Journal of Heredity. By applying a relatively new scaffolding technology, Proximo Hi-C, the team was able to place 60% of previously unassigned sequence on chromosomes. These previously unplaced sequences make up ~5% (13 Megabases) of the stickleback genome and contain multiple genes and other functional DNA. The assembly was generated from an individual from a different lake than the previous stickleback reference genome, and the structural information generated by Proximo Hi-C allowed the team to identify novel structural variants between the two populations. These improvements and new structural information will benefit many research groups that use this model organism to study genetics and evolution.


The first effort to sequence and assemble the threespine stickleback genome, published in 2012, used a costly method called Sanger sequencing. This assembly was followed by two revisions, in 2013 and 2015, that used standard short-read sequencing technologies. Short reads can be assembled into larger fragments of the genome called contigs, but some regions of the genome are difficult to assemble because they are long, highly repetitive, or otherwise ambiguous. In the end, these efforts left researchers with a decent yet highly fragmented picture of the stickleback’s chromosomes, with other large portions of its genetic sequence left in individual contigs not associated with any chromosome.



Dr. Catherine (Katie) Peichel (left), Head of the Division of Evolutionary Ecology, University of Bern, and Dr. Michael White (right), Assistant Professor, Department of Genetics, University of Georgia, used Proximo Hi-C genome scaffolding to make many improvements to the threespine stickleback genome and detect structural variation.

Dr. Peichel and Dr. White used Phase Genomics’ Proximo Hi-C genome scaffolding technology to resolve many of these issues and create the new reference genome. Proximo Hi-C genome scaffolding uses a protocol called Hi-C to measure the physical structure of an organism’s genome and then uses that information to place contigs into chromosome-scale de novo assemblies. Phase Genomics was founded by the inventors of this genome assembly approach and has been making its Proximo Hi-C genome scaffolding technology available to researchers since 2015. The company specializes in generating and analyzing Hi-C data for the scaffolding of genomes such as the Threespine stickleback, as well as for analyzing microbial communities and other metagenomic samples through its ProxiMeta™ Hi-C metagenomic deconvolution technology.
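To make the idea of placing contigs with Hi-C data more concrete, here is a minimal Python sketch. It is a toy illustration only and is not the Proximo Hi-C algorithm: it assumes that contigs from the same chromosome share many more Hi-C read-pair links than contigs from different chromosomes, and it groups contigs with simple single-linkage clustering. The function name `cluster_contigs`, the `min_links` threshold, and the link counts are all hypothetical.

```python
from collections import defaultdict


def cluster_contigs(contigs, links, min_links):
    """Group contigs whose pairwise Hi-C link count is at least min_links.

    Toy single-linkage clustering with union-find. This is an assumed,
    simplified stand-in for chromosome clustering, not Proximo Hi-C itself.
    """
    parent = {c: c for c in contigs}

    def find(c):
        # Follow parent pointers to the cluster representative (path halving).
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), count in links.items():
        if count >= min_links:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = defaultdict(list)
    for c in contigs:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical link counts: same-chromosome pairs share many links,
# different-chromosome pairs share only a few.
contigs = ["c1", "c2", "c3", "c4", "c5"]
links = {
    ("c1", "c2"): 120, ("c2", "c3"): 95, ("c1", "c3"): 40,
    ("c4", "c5"): 110,
    ("c1", "c4"): 3, ("c3", "c5"): 5,
}
clusters = cluster_contigs(contigs, links, min_links=20)
print(clusters)
```

With these made-up counts the contigs fall into two groups, standing in for two chromosomes. A real scaffolder would also need to normalize link counts (for example by contig length) and then order and orient the contigs within each group, which this sketch leaves out.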


We know that scientific tools are only as good as the resulting scientific findings. We sent a brief Q&A to both Dr. Peichel and Dr. White to get their take on the scientific value of Proximo Hi-C and share their experiences in working with us.


Why is the stickleback genome important?

Sticklebacks are a “supermodel” for evolutionary genetics, in that they have been one of the leading model systems for identifying the genetic and molecular basis of phenotypic changes in natural populations. Thus, it is important to have a complete genome sequence so that one can correctly identify all the genes that are present in a genomic region that is associated with a phenotype of interest. -CP

Why did the original genome need improvement?

A high-quality Sanger-sequenced genome was published in 2012 and has undergone two revisions since this time. Despite incorporating dense linkage maps to help assign many of the unanchored scaffolds to linkage groups, over 26.7 Mb of the 460 Mb genome still remained unassigned to linkage groups. We needed to apply other approaches to try and assign these remaining scaffolds. -MW

How did Proximo Hi-C scaffolding improve the contiguity of the genome?

We were able to assign over 60% of the unassigned contigs to chromosomes. -CP

What other applications of the Hi-C data are useful to your biological questions?

Hi-C is a useful way to identify structural variation (like inversions) among stickleback populations. We are also excited about the possibility of using Hi-C for assembly of the hard-to-assemble regions of the genome like Y chromosomes. -CP

Why did you choose to work with Phase Genomics?

I was impressed by their interest in our biological questions and dedication to working with us until we were satisfied with the assembly. -CP

We chose to work with Phase Genomics because of the ease of the entire pipeline. Phase Genomics was fast and kept us updated at every step along the way. It was great to work with a group that was so communicative and open to trying different approaches to get the best assembly. -MW

Spotlight on Hi-C in Science: New Technologies Boost Genome Quality

Science writer Elizabeth Pennisi outlines available genomics technologies that are helping researchers improve genome assemblies, with a focus on Hi-C’s ability to bring genome assembly to the chromosome scale.

This article by Elizabeth Pennisi focuses on how new technologies are greatly improving genome quality. Long reads, optical maps, and Hi-C data are being applied together to improve modern genome assemblies, including goat (Dr. Tim Smith), hummingbird (Dr. Erich Jarvis), maize, and more. Importantly, Hi-C provides the finishing touch to these genomes by supplying ultra-long-range contiguity information that can scaffold entire chromosomes. We at Phase Genomics are glad researchers have chosen Proximo Hi-C to scaffold the goat, hummingbird, and hundreds of other assemblies into contiguous chromosome-scale reference genomes.


Read the article here

Spotlight on Hi-C in The Atlantic: The Game-Changing Technique That Cracked the Zika-Mosquito Genome

One of the most prolific science writers, Ed Yong, profiles how Hi-C sequencing technologies can make genome assembly easier and more cost-effective than ever before. 

Science writer Ed Yong tells the story of the researchers who tackled the genome of the disease-carrying mosquito Aedes aegypti, and how Hi-C “knitted” the genome from 36,000 pieces into complete and contiguous chromosomes. Yong points out that the completed genome will not only help scientists understand the biology of the mosquito at a much deeper level, but also marks a technological pivot in genomics: Hi-C makes genome assembly cheaper, faster, and more accurate than ever before. The article also cites our collaborator Dr. Catherine Peichel’s newly published threespine stickleback genome and Dr. Erich Jarvis’s hummingbird genome as examples of the power of Proximo Hi-C scaffolding.


Read the article here