Phase Genomics and Pacific Biosciences Co-Developing New Genome Assembly Phasing Software


“FALCON-Phase” – an algorithm for producing diploid genomes.

Phase Genomics has entered into a co-development agreement with Pacific Biosciences to build FALCON-Phase, a software module that combines Hi-C data with PacBio® highly accurate long-read sequencing data to produce fully phased diploid genome assemblies. The software will be released later this spring.

FALCON-Phase augments PacBio Single Molecule, Real-Time (SMRT®) assemblies with Hi-C proximity-ligation data. PacBio's FALCON-Unzip software separates the two haplotypes only within local phase blocks, so its primary contigs can switch between maternal and paternal sequence from one block to the next. FALCON-Phase uses Hi-C's chromatin proximity information to identify which of these sequences belong to the same parental chromosome, greatly reducing haplotype switching along the primary assembly.

Furthermore, by combining Phase Genomics’ Proximo Hi-C genome scaffolding technology with FALCON-Phase, users can fully reconstruct maternal and paternal haplotypes at chromosome scale. The end result is a diploid set of chromosome-scale scaffolds: two fully phased genomes for the data and labor cost typical of a single genome project.

Figure: FALCON-Phase genome phasing graph

FALCON-Phase groups long-read contigs into two separate haplotypes based on Hi-C data. Red and blue edges connect contigs belonging to the same haplotype, while black edges connect homologous contigs, one from each haplotype. Colors were assigned based on the known phasing of the assembly, which was not otherwise used to inform the FALCON-Phase analysis.
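To make the idea in the figure concrete, here is a minimal toy sketch of Hi-C-based grouping, assuming the homologous contig pairs (the black edges) are already known and Hi-C link counts between contigs are given. It is not FALCON-Phase's actual algorithm, which this announcement does not describe; it simply tries every way of assigning each pair's two members to the two haplotypes and keeps the assignment with the most within-haplotype Hi-C links. The function name `phase_pairs`, the contig names, and the contact counts are all hypothetical, and the exhaustive search is only practical for a handful of pairs.

```python
from itertools import product


def phase_pairs(pairs, contacts):
    """Toy phasing: split homologous contig pairs into two haplotypes.

    pairs: list of (contig_a, contig_b) homologous pairs (hypothetical input).
    contacts: dict mapping frozenset({x, y}) -> Hi-C link count between contigs.
    Returns (hap0, hap1), the split that maximizes within-haplotype links.
    Exhaustive search over 2^(n-1) assignments, so only for small examples.
    """
    if not pairs:
        return [], []

    def links(x, y):
        return contacts.get(frozenset((x, y)), 0)

    best_score, best = -1, None
    # Fix the first pair's orientation: swapping every pair yields the same split.
    for flips in product((False, True), repeat=len(pairs) - 1):
        hap0, hap1 = [pairs[0][0]], [pairs[0][1]]
        for (a, b), flip in zip(pairs[1:], flips):
            if flip:
                a, b = b, a
            hap0.append(a)
            hap1.append(b)
        # Sum Hi-C links between every two contigs placed in the same haplotype.
        score = sum(
            links(x, y)
            for hap in (hap0, hap1)
            for i, x in enumerate(hap)
            for y in hap[i + 1:]
        )
        if score > best_score:
            best_score, best = score, (hap0, hap1)
    return best


# Hypothetical example: three homologous pairs with made-up Hi-C link counts.
pairs = [("c1a", "c1b"), ("c2a", "c2b"), ("c3a", "c3b")]
contacts = {
    frozenset(("c1a", "c2b")): 40,
    frozenset(("c1b", "c2a")): 35,
    frozenset(("c2b", "c3a")): 50,
    frozenset(("c2a", "c3b")): 45,
    frozenset(("c1a", "c2a")): 3,
    frozenset(("c2a", "c3a")): 2,
}
hap0, hap1 = phase_pairs(pairs, contacts)
print(hap0, hap1)
```

With these made-up counts the sketch groups c1a, c2b and c3a into one haplotype and c1b, c2a and c3b into the other, because the strong Hi-C links fall within those groups. A practical tool would presumably also need to normalize link counts (for example by contig length) and use a search that scales to thousands of contigs.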

These high-quality phased haplotypes can be leveraged to improve the efficiency of agricultural breeding programs, and could help identify disease-causing genomic variations in humans.

Prof. John Williams, Director of the Davies Research Centre at the University of Adelaide, Australia, wrote, “We are interested in expression of imprinted genes and for this work the availability of haplotype-resolved genome assemblies is an important advance. The release of software that enables the creation of haplotyped genome sequence assembly will revolutionize exploration of genome function. The FALCON-Phase software has this ability and can be applied retroactively to SMRT assemblies, as long as Hi-C data are available. Therefore, even pre-existing genomes can potentially be upgraded to haplotyped assemblies for little or no cost.”

Haplotype-resolved genome assembly is an exciting emerging field. Currently, there is only one other method for producing haplotype-resolved assemblies, Trio Canu, which, unlike FALCON-Phase, requires sequencing both parents as well as the offspring, adding cost. For many species, collecting a trio in the wild is not possible, and breeding is often not an option. Other Hi-C phasing techniques exist, but they phase genetic variants, not genome assemblies.

Adding the long-range genomic interactions captured by Hi-C to PacBio assemblies is very powerful and offers a straightforward solution to haplotype switching, a problem experienced by almost all genomic researchers working with diploid organisms.

A formal announcement with more information is coming in the next month. For more information, email us at info@phasegenomics.com.

Pacific Biosciences, the Pacific Biosciences logo, PacBio and SMRT are trademarks of Pacific Biosciences of California, Inc.