
An Ancient Fungal Affair


 

New genomic technology reveals the parental past of “ancient asexuals,” paving a route to crop engineering and soil remediation with symbiotic fungi

 

In a warming, crowded world, we need more help than ever from plants. But maximizing the bounty from crops — from food to fuel to fibers — means coaxing plants to draw minerals and nutrients from soil more effectively, and paying special heed to the tiny, often-overlooked fungi that make this possible.

Plant roots have symbiotic relationships with fungi that stretch back eons. For example, arbuscular mycorrhizal fungi, or AMF, have been cozying up to plant roots for at least 400 million years. In exchange for carbon-rich lipids from their hosts, AMF — named for the branch-like structures their bodies form within plant roots — help host defenses against pathogens, deliver water and increase absorption of nutrients rich in nitrogen, potassium and phosphorus. They also boost plant diversity.

Thanks to this ecological generosity, AMF are used as crop stimulants and in soil remediation. Their lipid lust also makes them good at carbon sequestration. Theoretically, engineered AMF strains could mount an even better performance in these essential tasks. But scientists have long viewed certain features of AMF, particularly their genetic structure and life cycle, as evolutionary puzzles that must be solved to make strain engineering possible and build better symbionts.

Working with Phase Genomics, scientists at the University of Ottawa recently overcame this barrier, successfully sequencing the genomes of four strains of the most widely studied AMF species, Rhizophagus irregularis. Using Phase’s proximity-ligation sequencing technology, they showed for the first time that the genomes of AMF are simultaneously more straightforward and more surprising than many mycologists had dared to dream.

Armed with this knowledge, scientists can plan new approaches to engineer AMF strains for applications in biomass production, soil remediation — and beyond.

 

The mysterious kary carryall

For years, the more scientists looked at AMF, the more questions they had. AMF bodies are essentially bags of haploid nuclei — tens of thousands, all sharing a common cytoplasm. And that’s not all.

“There were many, many outstanding questions about AMF,” said Dr. Nicolas Corradi, leader of the University of Ottawa team. “This was primarily because these fungi are always multinucleated and lack observable sex. It was suggested that AMF have an ‘oddball’ genetics and evolution.”

They were assumed to be “ancient asexuals,” who must’ve somehow thrived without the gene-shuffling benefits of sexual reproduction.

Dr. Corradi and his colleagues were determined to find out if that’s the case, and in the process began to shatter AMF’s asexual reputation. In 2016, they showed that Rhizophagus irregularis strains harbor evidence of sexual reproduction, including finding some of the genes needed for it. In some strains, all nuclei were genetically identical. But other, more robust and resilient AMF strains — termed heterokaryons — harbored two distinct populations of nuclei in their cytoplasm. More recently, Dr. Corradi and his team reported that the two populations of nuclei in heterokaryons change in abundance, depending on their host plant.

“But these were, however, based on fragmented genome datasets,” said Dr. Corradi.

To know for sure what was going on in AMF heterokaryons, the team needed a method to sequence the complete genomes of both populations of nuclei, allowing more complex studies of gene expression, genetic exchange and evolution in these puzzling fungal packages.

 

Would you prefer carrots or chicory?

Working with Phase Genomics, Dr. Corradi and his team employed a combination of proximity ligation (Hi-C) and PacBio HiFi data to sequence the genomes of both nuclear populations in four Rhizophagus AMF heterokaryon strains. Surprisingly, all four strains harbored genomes largely similar in structure — 32 chromosomes, with clear delineations between gene-rich and gene-poor regions — but highly divergent in sequence. For all four strains, the two populations of nuclei were essentially haplotypes, derived from parental strains during prior sexual reproduction.

Equipped with eight complete genomes (two haplotypes for each of four strains), the team followed up with gene-expression analyses and discovered that each haplotype was transcriptionally active. But within an individual strain, haplotype gene expression patterns were not equal.

“AMF heterokaryons carry two haplotypes that physically separate among many thousands — potentially millions — of co-existing nuclei,” said Dr. Corradi. “This is unheard of in any other organism. But each ‘parental genome’ also regulates different biological functions, and these change depending on the plant host.”

They recorded at times dramatic shifts in haplotype abundance and expression depending on the AMF heterokaryon’s plant host — carrot versus chicory, for example. This suggests that each haplotype makes specific and unique contributions to the AMF heterokaryon’s phenotype. Future studies will have to tease out what role the plant host is playing, if any, in these shifting expression and abundance patterns.

 

Sex, but when? And more new mysteries

In assembling these long-sought genomes that co-exist within a common cytoplasm, Hi-C has revealed that Rhizophagus AMF heterokaryons are not as complex as once thought, or feared. Both haplotypes within each heterokaryon appear to arise through some past sexual reproduction event, contribute to the AMF’s phenotype and have unique gene expression patterns based on plant host. Their surprisingly ordinary genetic behavior — at least, ordinary for fungi — means it could be possible to engineer AMF that are even better symbionts for specific hosts, helping to boost crop biomass or improve resilience, for example. Engineered strains could also aid in soil remediation, or store carbon that would otherwise end up above ground or in the air.

The findings, coupled with the team’s previous experiments, also bring new mysteries into focus: AMF strains appear to employ a mixture of sexual and asexual reproduction, similar to other fungi. But scientists have never witnessed AMF sexual reproduction — a potentially useful tool for engineering strains. The new genome sequences will also serve as a point of comparison as scientists investigate whether the hundreds of other AMF species are similar to Rhizophagus — and their potential to transform agriculture.

Catching Evolution in the Act


 

Genome sequencing has confirmed some long-held theories about the blueprints of life. But it has also unearthed quite a few surprises. Scientists once hypothesized that the human genome consisted of upward of 100,000 genes. The decades-long Human Genome Project, along with many next-generation sequencing studies, has prompted the downward revision of that figure to a relatively spartan 20,000 genes, more or less.

 

Evolution in action

 

If there is a lesson in this vast overestimation of our gene load, it is perhaps that evolution shapes genomes in unexpected ways.

 

The advent of more nimble and lithe methods for genome assembly and analysis holds the promise to unearth the surprises that evolution has wrought. These relatively new advancements include tools like Phase Genomics’ ultra-long-range sequencing, which reconstructs the sequence of chromosomes by using positional relationships between DNA sequences in the genome. These methods have grown sufficiently sophisticated to catch the quick transitions that transform populations and species.

 

Recently a team led by Dr. Leonid Kruglyak at UCLA employed these tools to catch evolution at work. Their discovery relates to sex determination, a complex developmental process that, in animals, generally kicks off when an immature gonad develops into either testes or ovaries. In humans and many animals, sex determination is governed largely by genes, and in turn shapes their genomes and evolutionary trajectories like few other biological processes can.

 

That special pair

 

For species with full genetic control over sex determination, the process often leaves its imprint on the genome in the form of sex chromosomes. In most animals, genomes consist of pairs of chromosomes called autosomes. But in addition to those autosomes, many animals — including us — harbor another set of chromosomes called the sex chromosomes. Sex chromosomes govern — or at least try to govern — whether the gonads develop into ovaries or testes, which  in turn influences the development of genitals and secondary sex characteristics.

 

Scientists have long theorized that sex chromosomes evolve from autosomes. Studies of young, relatively new sex chromosome systems, like those in the medaka, indicate that the transition happens fast. Yet the steps that transform a pair of autosomes into sex chromosomes are at best murky, with many questions unresolved. Much could be answered by catching this transition from autosome to sex chromosome in the act.

 

Behind the curtain

In a paper published June 1 in Nature, Dr. Kruglyak and his colleagues announced that they have found just such a transition: an animal with a pair of autosomes that is beginning to act like sex chromosomes. The researchers utilized Phase Genomics’ Proximo™ genome scaffolding platform and PacBio long reads to sequence and assemble a highly complete genome for a microscopic, freshwater flatworm, Schmidtea mediterranea. In many parts of its natural habitat across the Mediterranean basin, S. mediterranea reproduces by budding, without the need for sex. But some populations in Corsica and Sardinia produce the next generation through sexual reproduction.

 

The team, including lead and co-corresponding author Dr. Longhua Guo at UCLA, discovered that in these sexual strains of S. mediterranea, one pair of autosomes shows evidence of almost no genetic exchange, also known as recombination, during reproduction. This is a telltale signature of sex chromosomes. In addition, they saw that the unusual pair of autosomes harbors a large contingent of genes that play a role in developing sex-specific characteristics. Taken together, these genomic data finger these autosomes as a “sex-primed” pair that are in the process of evolving into fully fledged sex chromosomes.

 

Photo finishes

 

Future studies of S. mediterranea’s nascent sex chromosomes will likely fuel fresh inquiry and debate about this rarely-seen evolutionary transition. The answers will stretch far beyond flatworms. Studies of other recently evolved systems, such as in stickleback fish, show that sex chromosomes can play a decisive role in other poorly understood evolutionary transitions, such as the rise of a new species.

 

Beyond sex chromosomes, this study demonstrates the raw interrogative power of modern genome assembly and analysis methods. They can capture transitions — even the most brief and ephemeral. Applied appropriately, methods like these can help scientists make sense of a myriad of messy, complex processes that evolution shapes. These include some issues that hit as close to home as gonads, from curbing the spread of antibiotic resistance to protecting pollinators from annihilation. Evolution moves quickly. Now, so can we.

 

Ultra-long-range sequencing technology expands research opportunities in reproductive genetics and oncology

 

Phase Genomics’ recent release of the RUO cytogenomics platform, CytoTerra™, was accompanied by a webinar which covered an in-depth analysis of current technologies and emerging opportunities in reproductive genetics and oncology.

 

 

“The genome is the blueprint of life.” Beginning the webinar with this line, Ivan Liachko describes Phase Genomics’ history of discoveries and contributions to genomic research. Through the development of various genomic, metagenomic, and epigenomic platforms, Phase Genomics has risen as a leader in next generation sequencing (NGS) solutions. Now, the company’s latest platform applies its ultra-long-range sequencing technology to cytogenomic applications.

 

Chromosome rearrangement is a known driver of many diseases, including cancer, infertility, developmental delay, and immunologic complications. Thus, the detection and treatment of these rearrangements is essential in the advancement of modern medicine and therapeutics. However, current methods are limited in scale, throughput, and resolution. Additionally, challenges in sample types and analysis constraints present a cascade of costly tests to run in order to assemble a complete view of the genome. Some of these challenges include culturing dividing cells for cytogenomics, obtaining advanced knowledge of the targeted abnormality for fluorescence in situ hybridization, and working within the limited scope of rearrangements detectable by chromosomal microarray analysis. Further complicating the process of genetic analysis, most cancer biopsies are stored as formalin-fixed paraffin-embedded (FFPE) samples—a wax-like encasing which kills the cells and traps the DNA. Historically, there has been no way to access the DNA to perform NGS testing in these sample types. However, recent cytogenomics platforms created by Phase Genomics do not require a priori knowledge, improve chromosomal abnormality detection, and unlock information in FFPE samples, offering a promising solution to many challenges in the oncology and reproductive genomics spaces. 

 

Phase Genomics’ new cytogenomics platforms, CytoTerra and OncoTerra, are powered by ultra-long-range sequencing—using proximity ligation data and artificial intelligence  to analyze the breadth of chromosome arrangements in a single assay—which eliminates the need for sequential testing, includes a scalable approach to genomic detection, and unlocks information stored in difficult sample types, including FFPE and frozen samples. 

 

Watch the webinar for more information on the expanding possibilities of chromosomal aberration detection and contact Phase Genomics to start a project.



Transcription

 

00:00:01:18 – 00:00:15:08

Speaker 1

Hi, everyone. My name is Ivan Liachko and I’m one of the founders and chief scientist at Phase Genomics. I’m joined today by Jill Tapper, our cytogenetics product manager. Today, we’re going to tell you about our new next generation cytogenomics platform.

 

00:00:15:19 – 00:00:36:23

Speaker 1

This new platform is powered by a unique next generation sequencing technology and has the power to transform how clinicians and researchers approach oncology and reproductive genetics. All right. Let’s get started. So, for those of you who are not familiar with Phase Genomics and what we do, essentially our thing is building genomes.

 

00:00:37:09 – 00:00:54:16

Speaker 1

What we do is we capture unique genomic information to reconstruct genomes and genome structure in order to transform research and clinical applications. We got our start by building cutting edge genomic tools to assemble genomes for non-model organisms.

 

00:00:55:00 – 00:01:13:09

Speaker 1

So, we came out a few years ago with the first chromosome scale non model genome scaffolds as a way of basically putting together an end-to-end chromosome scale genome for anything—for plants, animals, fungi. We’ve also developed tools to haplotype phase a genome of any size.

 

00:01:14:13 – 00:01:37:00

Speaker 1

This is something that at this point is fairly well accepted by the field. We’ve published this over 100 times. It’s even made it into the popular press. And what this sort of technology is based on is a method that has many names, the most descriptive of these is ultra-long-range sequencing.

 

00:01:37:10 – 00:01:52:11

Speaker 1

What it does is it allows us to sequence DNA molecules that are really far, far away from each other. And the way it works is you will take a cell that is intact, and within the cell the genome is condensed into this three-dimensional structure, right?

 

00:01:52:11 – 00:02:11:01

Speaker 1

Remember, a genome is just linear molecules being squished into a ball and they condense into these three-dimensional shapes. The way the technology works is it captures physical junctions between DNA molecules that are close to each other in three-dimensional proximity.

 

00:02:11:02 – 00:02:26:22

Speaker 1

So, in three-dimensional proximity with each other, we can capture these junctions and sequence them. And what that does is it gives us a way to count how often every part of the genome is close to every other part of the genome.

 

00:02:27:10 – 00:02:47:10

Speaker 1

And so, if you know how often two sequences are physically touching each other, you can figure out how close they are because the sequences that are closer touch more and sequences that are further touch less. And if you know this, if you know this sort of three-dimensional distance between all the sequences in the genome, you can

 

00:02:47:10 – 00:03:08:09

Speaker 1

reconstruct that into a genetic map, right? Sequences that are closer touch more, sequences that are further touch less, and that enables you to basically use computational tools to reconstruct that information into a karyotype. And so, if you have a genome that you don’t know how it’s supposed to go together, you can use

 

00:03:08:09 – 00:03:23:05

Speaker 1

this information to scaffold it by arranging all the pieces. But if you have a genome like the human genome where you know what it’s supposed to look like, this is a really robust way of detecting rearrangements: big, chromosome-scale, karyotype-style rearrangements.
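The counting-and-arranging idea described above can be illustrated with a toy example. This is only a minimal sketch, not the Proximo method: the piece names, the contact numbers, and the helper functions `count_contacts` and `order_pieces` are all hypothetical, and the greedy chaining used here is a deliberately simple stand-in for real scaffolding algorithms.

```python
from collections import Counter

def count_contacts(read_pairs):
    """Count how often each unordered pair of pieces was captured together."""
    counts = Counter()
    for a, b in read_pairs:
        if a != b:
            counts[frozenset((a, b))] += 1
    return counts

def order_pieces(pieces, counts):
    """Greedy ordering: seed with the strongest pair, then keep attaching
    the unplaced piece that touches either end of the chain most often."""
    def contacts(x, y):
        return counts[frozenset((x, y))]

    seed = max(counts, key=counts.get)
    path = sorted(seed)
    unplaced = set(pieces) - set(path)
    while unplaced:
        candidates = []
        for p in unplaced:
            candidates.append((contacts(path[0], p), p, "left"))
            candidates.append((contacts(path[-1], p), p, "right"))
        _, piece, side = max(candidates)
        if side == "left":
            path.insert(0, piece)
        else:
            path.append(piece)
        unplaced.remove(piece)
    return path

# Hypothetical contact totals: neighbours touch most, distant pieces least.
toy_totals = {
    ("A", "B"): 52, ("B", "C"): 50, ("C", "D"): 51, ("D", "E"): 49,
    ("A", "C"): 20, ("B", "D"): 21, ("C", "E"): 19,
    ("A", "D"): 8, ("B", "E"): 7, ("A", "E"): 3,
}
read_pairs = [pair for pair, n in toy_totals.items() for _ in range(n)]

counts = count_contacts(read_pairs)
ordered = order_pieces(["D", "A", "E", "C", "B"], counts)
print(ordered)
```

The pieces go in shuffled and come out in linear order (or its mirror image, since contact counts alone cannot tell which end is which).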

 

00:03:24:18 – 00:03:41:10

Speaker 1

There’s a lot of other things you can do with this technology. I’ll mention them briefly, just for reference. So, the first thing that I’ve mentioned is, if you have this data, you can reconstruct essentially a high-resolution genetic map for whichever organism it is you’re working with.

 

00:03:42:07 – 00:03:52:14

Speaker 1

But it also allows us to assemble and phase genomes de novo. So, when you don’t have a sort of a scaffold, a genome with some new organism you’ve never seen before, it allows us to assemble the genome from scratch.

 

00:03:53:10 – 00:04:10:22

Speaker 1

This technology allows us to understand the three-dimensional architecture of the genome. So basically, it allows us to study the 3D structure of a genome, which is a very, sort of very interesting biological property that every genome sort of lives in.

 

00:04:11:13 – 00:04:25:16

Speaker 1

We also have a number of cool tools in the microbiome space. So, this technology (I won’t go into this) has lots of really neat properties that allow us to discover new bacteria, new viruses, new mobile elements.

 

00:04:26:02 – 00:04:44:04

Speaker 1

It allows us to track the movements of mobile elements such as antibiotic resistance genes in infectious disease microbial environments. In addition to building a suite of wet lab molecular tools, we sell all sorts of kits and services in the space.

 

00:04:44:16 – 00:04:58:23

Speaker 1

We also are a very informatically focused company. And so, we develop the tools that are needed to take this unique information type and actually turn it into actionable insights. So, we’ve developed everything so Proximo, our genome scaffolding platform.

 

00:04:59:04 – 00:05:10:12

Speaker 1

We’ve developed tools such as FALCON-Phase, which allow us to haplotype-phase genomes. We have a number of tools for doing karyotype and cytogenetic-type studies, and that’s what we’re going to talk about today.

 

00:05:10:24 – 00:05:31:23

Speaker 1

And then we have a suite of methods that leverages this technology for microbiome discovery. ProxiMeta is for discovering new bacterial genomes, ProxiPhage for discovering new phages and then ProxiLink is for looking at the transmission of antibiotic resistance in complex microbial communities.

 

00:05:33:11 – 00:05:49:12

Speaker 1

And so, the focus of today’s talk is really going to be on one of the properties of this technology. It allows us, you know, we really want to understand the structure of chromosomes, the structure of genomes. This is extremely important in the medical space, right?

 

00:05:49:12 – 00:06:07:18

Speaker 1

There is a whole army of diagnostics that have been designed specifically to look at the structure of chromosomes. But the diagnostics of today, sort of the most well-adopted ones, you know, they’re limited. They’re limited in scale, they’re limited in throughput, they’re limited in their resolution.

 

00:06:08:01 – 00:06:27:07

Speaker 1

And our technology can solve these problems to a large degree. And that’s what we’re going to be displaying today. So, we recently launched a method called CytoTerra. It’s a new platform that we’ve developed that enables us to leverage this technology to really benefit folks who are trying to do cytogenomic testing.

 

00:06:27:18 – 00:06:30:19

Speaker 1

And that’s what Jill is going to talk to you about next.

 

00:06:32:20 – 00:06:58:09

Speaker 2

Thanks, Ivan. I want to start with some basic background context as to how our platform plays a role in advancing precision medicine focused research and diagnostics. And it’s really built on this fact that we know very well, which is genomic and chromosomal rearrangements are drivers of every aspect of disease, from etiology to prognosis to therapy selection.

 

00:06:59:00 – 00:07:22:03

Speaker 2

And we see proof of this in long-standing examples like the 9;22 translocation in CML, shown on the left, and those patients’ response to a very specific therapy, Gleevec. We also see it in an example like the spectral karyotype tumor on the right, where there are likely many abnormalities as opposed to one specific abnormality contributing to this

 

00:07:22:03 – 00:07:46:16

Speaker 2

tumor’s development. And those are examples in cancer, but we know these rearrangements play an equally important role in many other diseases and conditions like infertility, recurrent pregnancy loss, developmental delay, and those are just a few. So, we have a collection of cytogenomic methods we use to try and help uncover these genomic disease drivers.

 

00:07:47:01 – 00:08:07:19

Speaker 2

These are karyotyping or chromosome analysis, FISH, and microarray. So, among these current solutions, we have a combination of high and low throughput approaches, high- and low-resolution approaches. But even with this range of resolution and throughput, each solution still has its drawbacks.

 

00:08:08:15 – 00:08:34:00

Speaker 2

For cytogenetics, we need live cells to grow in culture. We also need highly skilled personnel to do the analysis and interpretation of the results. For FISH, we need advanced knowledge or advanced suspicion of the abnormality. With Array, it’s difficult to detect things like balanced rearrangements, inversions, and low-level mosaicism.

 

00:08:35:01 – 00:08:55:05

Speaker 2

So collectively, we’re trying to balance these limitations such that we get as comprehensive of a view as possible with respect to the size and type of abnormalities that may be present. And typically, to get that comprehensive view, it’s necessary to use these methods in a sequential format.

 

00:08:55:13 – 00:09:15:09

Speaker 2

So, when we look at that in terms of workflow and timeline, we end up with a resource intensive, very long, very expensive cascade testing approach. And this is where our Phase Genomics platform has a major impact in that it offers an efficient, streamlined workflow.

 

00:09:15:21 – 00:09:37:19

Speaker 2

And that’s because ultra-long-range sequencing can provide the large structural and copy number variation detection capabilities that we find with cytogenetics, along with the molecular precision of FISH and Array, all in a single assay. So, we’re eliminating the cost and time associated with the current cascade approach.

 

00:09:39:23 – 00:10:01:05

Speaker 2

Ivan touched on the technical aspects of ultra-long-range sequencing earlier on. But at a high level, we’re able to leverage that method’s unique capability to capture the physical proximity of DNA sequences in the genome. We then use our proprietary analytic software to convert the proximity counts to genomic distances.
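To make the idea of converting proximity counts to genomic distances concrete, here is a minimal sketch. The actual analytic software is proprietary and not described in the talk, so everything here is an assumption: the sketch supposes that contact counts fall off with distance as a simple power law, count = k * distance^(-alpha), fits the two hypothetical parameters `k` and `alpha` from a few pairs at known distances, and then inverts the relationship. The numbers are made up for illustration.

```python
import math

def fit_power_law(distances, counts):
    """Least-squares fit of log(count) = log(k) - alpha * log(distance)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)  # (alpha, k)

def estimate_distance(count, alpha, k):
    """Invert count = k * distance**(-alpha) to get a distance estimate."""
    return (k / count) ** (1.0 / alpha)

# Hypothetical calibration pairs: known separations (bp) and observed counts.
known_distances = [10_000, 100_000, 1_000_000]
observed_counts = [100, 10, 1]

alpha, k = fit_power_law(known_distances, observed_counts)
estimated = estimate_distance(5, alpha, k)
print(round(alpha, 3), round(estimated))
```

A pair of sequences seen together 5 times lands between the 100 kb and 1 Mb calibration points, which is the intuition behind "closer touch more, further touch less."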

 

00:10:01:12 – 00:10:31:17

Speaker 2

And as a result, we’re able to detect a wealth of abnormalities like balanced and unbalanced translocations, inversions, insertions, aneuploidy, and a lot more. And these capabilities are part of a comprehensive sample-to-report service workflow, with results ultimately being returned using standard ISCN and sequencing nomenclature and returned in a familiar clinical style report format.

 

00:10:32:16 – 00:10:53:02

Speaker 2

Well beyond the report, we’re creating a valuable data resource for novel variant and biomarker discovery. So, does it really work? Yes. And here is some data from proof of concept work we conducted with an academic health system clinical genetics lab.

 

00:10:53:21 – 00:11:14:07

Speaker 2

And in these 100 plus samples, we see our Phase Genomics platform not just meeting, but exceeding the detection capabilities of the current set of genomic approaches, with some low-level translocations not previously identified being detected. And we’ve had similar success with other sample types.

 

00:11:14:13 – 00:11:31:16

Speaker 2

And you can see a list of some of those here. Everything from whole blood and cheek swabs to POC tissues. But it’s not just compatibility with numerous sample types that makes this platform so flexible. It’s flexibility in sample condition as well.

 

00:11:32:05 – 00:11:50:22

Speaker 2

The platform works with fresh samples, frozen samples and very notably, it works with FFPE samples. And here’s just a small representation of some of the FFPE tissue types we’ve worked with in the past. So why is sample type flexibility compatibility…

 

00:11:51:02 – 00:12:16:14

Speaker 2

and detection capability particularly meaningful to reproductive health and oncology? Well, to start, there are some sample-related challenges in these areas. Obtaining fresh sample material for cell culture is difficult. In POC samples, for example, this tissue is often non-viable, with very high failure rates. For tumor and bone marrow samples,

 

00:12:16:20 – 00:12:36:03

Speaker 2

there’s often a limited amount of sample to work with, and the cells are not necessarily unviable, but they’re diseased cells that are often very challenging to work with. Formalin fixation and paraffin embedding are also prevailing collection methods for these sample types, so cell culture is immediately out of the question.

 

00:12:36:20 – 00:13:03:08

Speaker 2

And then there are concerns about obtaining sufficient quantity and quality of the high molecular weight DNA that’s needed for things like Array and many NGS assays. These are also areas where balanced rearrangements play a significant role. There are cryptic translocations or seemingly balanced rearrangements in areas of visual homology that can be causative, so knowing if something is

 

00:13:03:08 – 00:13:31:19

Speaker 2

truly balanced is critical. We also know that gene fusions resulting from balanced rearrangements drive cancer and tumor development. So, in the end, these challenges present as missed opportunities, opportunities to uncover diagnostic and prognostic information, to discover biomarkers for therapy and treatment development, and to make genotype phenotype disease associations to further disease understanding.

 

00:13:33:24 – 00:13:54:10

Speaker 2

So, in comparison to the current cytogenetics methods, our Phase Genomics platform can avoid these and many other missed opportunities by offering genome-wide simultaneous detection of the multiple types of genomic rearrangements that cause and characterize disease, and it can do so in a single assay.

 

00:13:59:20 – 00:14:11:04

Speaker 2

The platform has capabilities well beyond what we expect of our current cytogenetics methods. I’m going to hand things back to Ivan so he can talk about what some of those expanded possibilities are.

 

00:14:14:18 – 00:14:29:20

Speaker 1

Thank you, Jill. So, as you have just seen, this platform is very useful in the field of cytogenetics. But there’s more. There’s a lot of things you can do with it beyond just sort of an improved way of doing conventional testing.

 

00:14:30:14 – 00:14:57:20

Speaker 1

One of the main challenges in oncology is that while, you know, there are obviously so many different cancer types, only a small subset of available cancer samples get processed cytogenomically and get analyzed by cytogenomic assays. And the reason is that the vast majority of cancer biopsies and cancer samples are stored as FFPEs.

 

00:14:58:03 – 00:15:20:07

Speaker 1

They’re stored in formalin-fixed, paraffin-embedded format. And what that does is it kills all the cells and also ruins the DNA for long-read sequencing and for optical genome mapping. And so, it makes most karyotypic assays and large-scale structural analysis virtually impossible.

 

00:15:20:21 – 00:15:31:04

Speaker 1

And this is one of the things that our technology can overcome. So first, let’s take a look at what this data looks like. We’ve been talking a lot about the technology and what it can do. But here’s what.

 

00:15:31:16 – 00:15:54:05

Speaker 1

At its core, here’s what the data looks like. When you plot this type of ultra-long-range sequencing information, you generate these sorts of maps, these heatmaps. If you’re not familiar with this, imagine just a matrix where on the x axis and the y axis you just lay out the chromosomes, like left to right

 

00:15:54:19 – 00:16:18:03

Speaker 1

and you’re seeing this coordinate system. This is chromosome, you know, 1, 2, 3, 4, 5 and chromosomes along this line. And these boxes are showing you how much interaction there is within that combination of coordinates. So, this heat inside of this box tells you that there’s a lot of interaction between chromosome two and chromosome two, other parts of

 

00:16:18:03 – 00:16:33:07

Speaker 1

the same chromosome. They’re touching each other because they’re close. But there’s not a lot, for instance, between chromosome two and chromosome four. Right? But then if you look at a cancer sample like this one, you will see that there is this hotspot in this box.

 

00:16:33:07 – 00:16:48:05

Speaker 1

And what that means is that this area of chromosome two and this area of chromosome four are touching each other way more than they’re supposed to. They’re closer together. So, this was caused by a translocation and there are different types of these events.
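The hotspot described here can be mimicked on a toy contact matrix. This is a sketch under stated assumptions, not the platform's analytics: the speaker describes a trained AI that recognizes these patterns, whereas the code below swaps in a plain fold-over-background threshold. The bin layout, the contact values, and the function `find_hotspots` are all hypothetical.

```python
from statistics import median

# Toy layout: three chromosomes, three bins each, laid out left to right.
bin_chrom = ["chr2"] * 3 + ["chr3"] * 3 + ["chr4"] * 3
n = len(bin_chrom)

# Within a chromosome, contacts are high and decay with bin separation;
# between chromosomes there is only a low background.
matrix = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if bin_chrom[i] == bin_chrom[j]:
            matrix[i][j] = 100 // (1 + abs(i - j))
        else:
            matrix[i][j] = 2

# Simulated translocation: one chr2 bin touches one chr4 bin far too often.
matrix[2][6] = matrix[6][2] = 40

def find_hotspots(matrix, bin_chrom, fold=5.0):
    """Flag inter-chromosomal bin pairs far above the typical background."""
    size = len(matrix)
    inter = [
        (i, j, matrix[i][j])
        for i in range(size)
        for j in range(i + 1, size)
        if bin_chrom[i] != bin_chrom[j]
    ]
    background = median(value for _, _, value in inter)
    return [
        (bin_chrom[i], i, bin_chrom[j], j, value)
        for i, j, value in inter
        if value > fold * background
    ]

hotspots = find_hotspots(matrix, bin_chrom)
print(hotspots)
```

Only the chr2/chr4 bin pair stands out against the inter-chromosomal background, which is the same signal that shows up as a bright spot off the diagonal in the heatmap.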

 

00:16:48:06 – 00:17:07:20

Speaker 1

Sometimes they look like squares, sometimes they look like bow ties, et cetera. We have essentially trained the analytics, right, we’ve built this A.I. that recognizes these things, and that’s how we can generate these karyotypic reports, karyotype maps. But we can do this in FFPEs.

 

00:17:07:24 – 00:17:26:18

Speaker 1

And the reason why we can do this in the FFPE is because the first step of our method involves fixation with formaldehyde or formalin, which is what the F is in FFPE. And so, we’re able to not only generate these kinds of cytogenetic profiles on fresh frozen tissues and cells and these sorts of things,

 

00:17:26:23 – 00:17:44:22

Speaker 1

but also FFPE slices, and even a single FFPE slice can generate a really cool complex karyotype. So, this is an example from one of our collaborators. This is a solid cancer. An FFPE slice from a solid cancer is just a slice.

 

00:17:44:22 – 00:17:59:23

Speaker 1

You don’t need to consume the entire FFPE block. Everything works with sort of how people are used to looking at it, and you can see again, in this case, the data is shown in a different color. It’s now orange instead of blue, like in the previous slide.

 

00:18:00:09 – 00:18:20:21

Speaker 1

But basically, again, these boxes are the chromosomes, and these little shapes, these events out here that are marked by black arrows, represent the structural chromosomal aberrations within this FFPE slice. And so, you know, there are these little bow ties and squares like before, and you can sort of divide them.

 

00:18:21:13 – 00:18:30:15

Speaker 1

Here’s what they look like when you zoom in. If you were to kind of visually analyze it, this is what they would look like. Of course, we use software, but you can actually look at them and see them with your eyes.

 

00:18:31:19 – 00:18:46:23

Speaker 1

And so, this is what a balanced translocation looks like, an unbalanced one, an inversion. Because this technology is sequencing based, Illumina sequencing based, it allows you to do all the other things that you do with Illumina, so you can detect deletions and copy number changes.

 

00:18:46:23 – 00:19:10:18

Speaker 1

You can detect amplifications; you can detect aneuploidy and other similar things. And so, you can generate these really complex karyotypes right off of FFPE, you know, in a very manageable way. And so, what we’re going to show you in this video here is a comparison of a data set from a fresh

 

00:19:10:18 – 00:19:28:11

Speaker 1

frozen lung cancer sample and an FFPE matched sample from the same biopsy. And so, what you’re looking at again is, just like before, the boxes in the middle are the chromosomes and the events are out in sort of this yellow space.

 

00:19:28:16 – 00:19:43:15

Speaker 1

These are your translocations and other karyotype aberrations. So, we’re going to zoom in. This is being visualized in a tool called HiGlass. And so, what we’re doing is we’re sort of zooming in in sync with a fresh frozen and an FFPE sample.

 

00:19:44:03 – 00:20:06:07

Speaker 1

And what I hope you can see is that we’re able to, even in an FFPE sample, detect really, really sort of crisp, large-scale structural rearrangements. And this is going to zoom in, kind of just so you can see sort of the beginnings and the ends of these things, and we can then map

 

00:20:06:07 – 00:20:17:01

Speaker 1

them to, of course, their genomic coordinates. And what you’re looking at here is the entire genome. And so, you can see there’s lots of things happening here. Some of them are small, some of them are big.

 

00:20:17:11 – 00:20:39:20

Speaker 1

So, this method allows you to reconstruct high-quality karyotype maps specifically for FFPE samples, as well as for any kind of fresh frozen sample. And finally, what I’m going to end with is this technology, because it is based on Illumina sequencing.

 

00:20:40:01 – 00:20:55:14

Speaker 1

It is compatible with target capture methods. So, if you’re looking for specific SNPs, if you’re looking for mutations, if you’re looking for a loss of heterozygosity in specific regions, you can actually pair the two methods.

 

00:20:55:20 – 00:21:15:17

Speaker 1

You can use pretty much any capture panel out there. They all work extremely well with this technology, and that allows you to do simultaneous SNP profiling, looking for specific mutations in specific genes, while also generating a structural map of the karyotype for your given sample.

 

00:21:15:17 – 00:21:39:02

Speaker 1

Be it fresh frozen, be it cells, be it FFPE, directly out of one sample. So you can combine a target capture panel and a cytogenetic workup from the same sample with this technology. So just to summarize and finish off this talk, what we’ve shown you is that applying ultra-long-range sequencing to various biological samples

 

00:21:39:09 – 00:22:00:14

Speaker 1

is basically a really useful way of enhancing your cytogenomics game. It’s a scalable platform for cytogenomics. It’s simple. It can be performed in your lab. You don’t need a special machine for this, aside from Illumina sequencers, which are sort of, you know, pretty common these days. We provide both services and kits, as

 

00:22:00:14 – 00:22:20:07

Speaker 1

well as companion cloud-based analytics, so you don’t have to sort of figure out your own computational pipeline. We integrate both the analytics as well as the molecular methods. It works on all sorts of difficult sample types. It works on cheek swabs; it works on all sorts of samples that are usually very difficult

 

00:22:20:10 – 00:22:36:07

Speaker 1

to process by traditional cytogenetic methods. As I mentioned, you don’t need to buy a special machine for this. You don’t need high molecular weight DNA, so it’s a much easier way of generating large-scale cytogenomic data.

 

00:22:37:08 – 00:22:52:16

Speaker 1

If this is something that you’re interested in, please let us know. And the reason why we’re giving this webinar is to announce sort of our early access program. And so, we are recruiting folks to do all sorts of cool research with us.

 

00:22:53:00 – 00:22:58:19

Speaker 1

So, if this is of interest, please, you can scan that barcode. You can go visit us on the website and shoot us an email.

 

100 Publications

 

 

Over 100 scientific papers have been published using Phase Genomics technology!

 

Since our founding in 2015, we have sought to bring transformative change to research, industry, and the clinic by building and providing cutting-edge genomic solutions to scientists all over the globe. Now, in 2021, we are happy to look back at the accomplishments made by those using our kits, services, and software.

 

Our team of researchers, computational scientists, and bioinformaticians has refined our ProxiMeta and Proximo platforms (as well as many other products) to construct platinum genomes, master the microbiome, and expand our knowledge of the human genome and epigenomics. From potatoes to people, cassava to cannabis, bison to basenjis, our molecular tools and software have been used to drive genomic discoveries across many scientific fields. We encourage you to take a look at the fascinating collection of articles we have compiled here that explore more research using our technology.

 

Over the years, we have also helped break records and make headlines as researchers use our platforms to make breakthroughs in science.

 

A Question Hidden in the Platypus Genome: Are We the Weird Ones?

-The New York Times

Phase Genomics Releases Platform for Discovering New Viruses in Microbiome Samples

-BusinessWire

Precision Medicine Looks beyond DNA Sequences

-Genetic Engineering and Biotechnology News

 

We are grateful to all the researchers who have been working with us to accomplish these feats. Together, we can drive innovation and continue to make advances in genomic science. We will continue to work on ways to add applications and support current research, making it easier to get high-quality data and comprehensive reports.

 

Follow us on social media (Twitter, LinkedIn, YouTube) or subscribe to our quarterly newsletter (Phasebook) to receive updates on our technology and highlights from the latest in genomics.

Unlock the Virome with ProxiPhage

viruses moving through a net

 

Metagenomic studies are illuminating the diverse array of microbiomes that exist from the ocean floor to our gastrointestinal tracts. Understanding these microbial communities is essential to understanding modern health and the environment; however, outdated lab techniques are laborious, costly, and fail to create a complete picture of the microbiome. This article, posted by Ivan Liachko, describes how advancements in biotechnology are facilitating exciting discoveries with recent tools developed to capture phage and other mobile genetic element dynamics within microbiome samples.

Continue reading to discover how ProxiPhage, a recent addition to the ProxiMeta platform, is helping scientists answer questions relating to microbiome composition dynamics, prophage prevalence, frequency of transient infections, spread of antibiotic resistance, and more.

https://www.linkedin.com/pulse/unlocking-virome-proximity-guided-metagenomics-new-frontier-liachko/

 

 

Better together: long-range and long-read DNA sequencing methods, combined, reach record heights in microbiome discovery

Microbiome plate and Phase Genomics logo. Reads "Breaking records in microbiome discovery"

 


 

Since its debut, next-generation sequencing has not rested on its laurels. Improved sequencing platforms have reduced error and lengthened reads into the tens of thousands of bases. The debut of long-range sequencing methods that are based on proximity ligation (aka Hi-C) has brought a new order-of-magnitude into reach by linking DNA strands with their neighbors before sequencing.

 

This progress has birthed high-resolution metagenomics, the sequencing and assembly of genomes from environmental samples to study ecosystem dynamics. But metagenomic experiments often undersample microbial diversity, missing rare residents, overlooking closely related organisms (like bacterial strains), losing rich genetic data (like metabolite gene clusters), and ignoring host-viral or host-plasmid interactions.

 

A revolution within a revolution

 

New sequencing platforms and methods can reform metagenomics from within. Long-read platforms, such as the PacBio® Sequel® IIe system, now yield HiFi reads of up to 15,000 base pairs with error rates below 1%. In addition, Phase Genomics created ProxiMeta™ kits to generate proximity-ligated long-range sequencing libraries, which preserve associations between DNA strands originating in the same cell.

 

In a study posted May 4 to bioRxiv, a team — led by Dr. Timothy Smith and Dr. Derek Bickhart at the U.S. Department of Agriculture and Dr. Pavel Pevzner at the University of California, San Diego — employed both PacBio HiFi sequencing and ProxiMeta in a deep sequencing experiment to uncover record levels of microbial diversity from a fecal sample of a Katahdin lamb. Combined, PacBio HiFi sequencing and ProxiMeta eased assembly, recovered rare microbes, resolved hundreds of strains and haplotypes, and preserved hundreds of plasmid and viral interactions.

 

HiFi family trees

 

The team constructed SMRTbell® libraries to generate HiFi data, and used ProxiMeta kits to generate long-range libraries. The two datasets, along with the metaFlye and ProxiMeta algorithms, allowed them to assemble contigs and create draft genomes without manual curation.

 

Researchers compared the breadth and depth of HiFi data-derived metagenome-assembled genomes, or MAGs, to control MAGs from assemblies of the same sample made using long, error-prone reads. HiFi data yielded more complete MAGs — 428 versus 335 — from more bacteria and archaea. HiFi data also generated more low-prevalence MAGs, capturing a larger slice of the community’s diversity by picking up more genomes from less common residents.

 

The HiFi MAGs also contained more than 1,400 complete and 350 partial sets of gene clusters for synthesizing metabolites such as proteasome inhibitors, which likely help some of these microbes colonize the gut. HiFi data picked up about 40% more of such clusters than control MAGs, illustrating just how much data is lost when long reads aren’t also highly accurate reads.

 

The team also used the HiFi MAGs to trace lineages within the community. They computationally resolved 220 MAGs into strain haplotypes, based largely on variations within single-copy genes. One MAG had 25 different haplotypes, which are likely strains of the same genus or species.

 

ProxiMeta’s long-range discoveries

 

The ProxiMeta-generated libraries added flesh to these MAG skeletons by unveiling additional rich biological information. Long-range sequencing linked nearly 300 HiFi-assembled plasmids to specific MAGs, revealing the species that hosted them. One plasmid, for example, was found in bacteria from 13 different genera. Long-range data also identified the first plasmids associated with two archaea, Methanobrevibacter and Methanosphaera.
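The logic of linking a plasmid to its hosts can be illustrated with a short sketch. Because proximity ligation joins DNA that sat in the same cell, read pairs that connect a plasmid contig to a contig from a MAG are evidence that the MAG's organism carried that plasmid. The code below is a simplified illustration, not the ProxiMeta algorithm: the function name `assign_plasmid_hosts`, the input layout, and the `min_links` cutoff are assumptions, and a production method would also normalize for abundance and contig length.

```python
from collections import defaultdict

def assign_plasmid_hosts(links, contig_to_mag, plasmids, min_links=5):
    """Assign candidate host MAGs to plasmid contigs from Hi-C read pairs.

    links         : iterable of (contig_a, contig_b), one entry per read pair
    contig_to_mag : dict mapping chromosomal contig name -> MAG name
    plasmids      : set of plasmid contig names
    min_links     : hypothetical cutoff on supporting read pairs
    Returns {plasmid: sorted list of host MAGs}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in links:
        # A read pair can list the plasmid first or second, so check both.
        for plasmid, other in ((a, b), (b, a)):
            if plasmid in plasmids and other in contig_to_mag:
                counts[plasmid][contig_to_mag[other]] += 1

    return {
        plasmid: sorted(mag for mag, n in mags.items() if n >= min_links)
        for plasmid, mags in counts.items()
    }

# Toy data: one plasmid with strong links to two MAGs and weak links to a third.
links = (
    [("plasmid_1", "contig_A")] * 6
    + [("contig_B", "plasmid_1")] * 7
    + [("plasmid_1", "contig_C")] * 2
)
contig_to_mag = {"contig_A": "MAG_1", "contig_B": "MAG_2", "contig_C": "MAG_3"}

hosts = assign_plasmid_hosts(links, contig_to_mag, {"plasmid_1"})
print(hosts)  # {'plasmid_1': ['MAG_1', 'MAG_2']}
```

A plasmid that passes the cutoff for several MAGs, as in the toy data, corresponds to the multi-host case reported in the study.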

 

Long-range sequencing illuminated the viral burden in this community. The HiFi library included nearly 400 viral contigs, more than half of which came from a single family of viruses that infect both bacteria and archaea. The team identified 424 unique viral-host interactions, including 60 between viruses and archaea, which is a more than 7-fold increase over controls.

 

What’s around the bend?

 

This study has lessons beyond one lamb’s gastrointestinal tract. It shows decisively that the highly accurate long reads generated by HiFi sequencing are ideal partners for Hi-C-derived methods like ProxiMeta, together generating increasingly sophisticated metagenome assemblies for biologists to interrogate.

 

Applied to other environmental samples, this platform could illuminate the diversity and complexity of other microbial communities, from the bottom of the sea to mountain peaks, and within the stomach of every human being. It could probe pressing issues of our day, such as antibiotic resistance, soil health, or how microbes can break down pollutants. These endeavors will not just fuel the engines of scientific inquiry; broader use of this method could also generate new insights into these pressing problems.

New genome assembly method makes fruitful advances in genomic technology

 

A collaboration between Phase Genomics and Pacific Biosciences of California is bringing about the next generation of genome assembly technology. A newly published software tool, FALCON-Phase, combines genomic proximity ligation methods developed by Phase Genomics™ with the high-accuracy, long-read sequencing data from PacBio®, enabling researchers to create haplotype-resolved genome sequences on a chromosomal scale without having parental genome data. This method and its application to several animal genomes were published today in Nature Communications.

cow, zebra finch, and human hand arranged in a collage

Humans, as well as other animals, carry DNA sequence copies from both parents. These parental sequence “haplotypes” can carry millions of mutations unique to one of the parents and are often very relevant to diseases and other genetic traits. Until recently, accurately separating paternal and maternal mutations on the whole-genome scale required sequence information from the individual parents or extensive efforts that relied heavily on imputation from population studies. The new method employs the physical proximity information captured by proximity ligation (a technology also known as “Hi-C”) to separate maternal and paternal haplotype information from long-read genome assemblies. This development significantly increases the actionable information content coming out of genome sequencing studies.

 

 

“It’s an exciting time for genome assembly and PacBio HiFi sequencing continues to lead the way in this area with its powerful combination of read length and accuracy,” wrote Jonas Korlach, Chief Scientific Officer at Pacific Biosciences. “Phase Genomics Hi-C complements PacBio technology by extending our data into the ultra-long-range domain, enabling us to connect phase blocks and deliver chromosome-scale diploid assemblies without parental data. We are fortunate to have this excellent partnership with Phase Genomics, and we look forward to continuing to work together to create the highest quality reference genomes available.”

 

Assembling two fully-phased genomes in a single, streamlined process not only saves on the costs of research, but it also enables scientists to upgrade their genome assembly pipelines and obtain previously unobtainable information.

 

Dr. Erich Jarvis, professor at Rockefeller University and chair of the international Vertebrate Genomes Project, wrote, “Chromosome-scale haplotype phasing is critical for generating accurate genome assemblies and for understanding genomic variation within a species.” Furthermore, FALCON-Phase produces maternal and paternal haplotypes without family-trio data, so it can be applied to wild-caught samples or organisms lacking pedigree information. Jarvis notes, “In wild populations that many work with, parental samples are usually unavailable and therefore we need a method that can phase paternal and maternal sequences in the offspring individuals. With FALCON-Phase, we are able to use the Hi-C data that we have already generated for genome scaffolding and add a new dimension to every genome assembly, even retrospectively for previous projects. Our collaboration with Phase Genomics and PacBio has been extremely fruitful and the combination of the two technologies through FALCON-Phase will be highly beneficial to genomic sequencing efforts focused on conservation.”

 

FALCON-Phase is applicable to any diploid genome, including plants, animals, and fungi. It is available free of charge as open-source software (https://github.com/phasegenomics/FALCON-Phase), and Phase Genomics offers services that include the application of this method to various genome projects. See the latest news and publications on this and other genome assembly methods at https://phasegenomics.com/resources-and-support/publications/.

 

For more information, email us at info@phasegenomics.com.

Phase Genomics Developing Cost-Effective Genomic Tools to Track and Understand Crop Disease with New Funding

 

The impact of fungal rust pathogens is measured in tens of thousands of acres of lost crops annually and an increasingly vulnerable supply chain. An outbreak of oat crown rust devastated yields in South Dakota and Minnesota, wiping out as much as 50% of the crop in 2014 alone.  

 

Current rust surveillance collections are not enough to develop effective countermeasures against fungal rust pathogens such as oat crown rust, wheat stem rust, and many others. Now Phase Genomics has received a National Institute of Food and Agriculture grant to develop a cutting-edge genomic diagnostic test to affordably identify existing and novel strains of fungal pathogens in the wild.

 

Phase Genomics’ proprietary technology allows it to generate full chromosome-scale rust genomes and separate their constituent sub-genomes, creating a unique genomic resource that will provide the sequence information needed to identify, track, and study virulent fungal strains. Since the platform will employ machine learning tools combined with genomics, as the dataset grows it will potentially enable scientists to proactively predict the virulence of new wild strains before they have a chance to decimate crops. Costs are expected to be reduced by up to 90% relative to traditional diagnostic techniques.

 

The same proven and proprietary technology was demonstrated by researchers producing a first-of-its-kind reference genome for the wheat stem rust pathogenic strain Ug99. The economically destructive pathogen with a dikaryotic genome structure (two independent nuclei) is a crop killer on several continents. 

 

The new ability to leverage high-quality genomic information from sets of rust strains will transform researchers’ ability to diagnose crop disease, track its spread, and understand the evolution of fungal virulence.

 

Learn more about leveraging this technology in your agricultural research here.

Unlocking New Frontiers in our Understanding of Human Disease through Deep Learning and Three-Dimensional Genomics

 

By Ivan Liachko, Ph.D – Founder & CEO, Phase Genomics

 

As we enter the era of personalized medicine, novel genomic technologies are enabling a much deeper understanding of the biology of individual people. Such knowledge improves our ability to detect and diagnose diseases, offering personalized treatments that leverage each person’s unique genetic makeup for maximum safety and efficacy. However, human biology is very complex, and – despite decades of advances in DNA sequencing and analysis methods – we have yet to realize the full promise of genomics-enabled personalized medicine.

To truly realize the benefits of genomics in healthcare, we must go beyond basic sequencing efforts that look at mutations or gene expression patterns, and study the higher-order structure of the genome, i.e. its organization and shape. These are known to drive many kinds of human diseases including cancer, autism, and infertility.

Phase Genomics has commercialized a new genome sequencing technology that enables us to look beyond the genetic code and characterize the higher-order organization of genomes. This technology, called “proximity ligation,” does more than detect sequence differences. It enables us to identify and characterize changes in genome structure called “structural variation,” as well as patterns in the three-dimensional organization of the genome.

Phase Genomics has developed and commercialized several products that leverage proximity ligation in different research contexts. We are now combining the technology with deep learning to deliver new research and diagnostic capabilities for human disease.

The new, revolutionary approach currently in development at Phase Genomics combines deep learning with several other supervised and unsupervised machine learning methods to identify, recognize, and contextualize structural variants or other perturbations in a human genomic sample, based on recognizing structural signatures hidden deep within the proximity ligation data. Once variants are detected, they can be connected to the body of research and medical literature to provide actionable clinical information. The high-throughput nature of both the biological and computational underpinnings of this technology means that the approach is not only more effective than other methods; it is also faster, cheaper, and more scalable.

Phase Genomics will be announcing additional products delivering new dimensions of genomic insights into human disease in the coming months. For now, the research, development, and testing continue.

Built on Amazon’s AWS cloud computing and machine learning technology, and in consultation with 1Strategy – a leading cloud architecture and development firm – Phase Genomics’ proximity ligation plus deep learning technology is poised to open new frontiers in human clinical diagnostics.

The Era of Platinum Genomes Has Arrived

Platinum Genome

 

Phase Genomics is dedicating the rest of this month (January, 2019) to the beginning of “The Era of Platinum Genomes” to celebrate recent advancements in genome assembly; researchers now have the ability to generate chromosome-scale, fully-phased diploid genome assemblies for any species by combining two technologies: long-read sequencing data from PacBio and Phase Genomics’ Hi-C.

 

At the end of this month, we will be giving away a “Platinum Genome Project,” which includes a full Hi-C service or kit project, to an attendee at the International Plant and Animal Genome Conference 2019 (PAGXXVII). This project includes using Proximo Genome Scaffolding to generate chromosome-scale scaffolds and FALCON-Phase to phase haplotypes across the entire genome. Attendees can enter the raffle by stopping by our booth (#208) throughout the conference, or enter using the form at the bottom of this page. Stay tuned for the winner announcement on January 31st, 2019 by following our Twitter account @PhaseGenomics. Offer subject to sweepstakes terms. No purchase necessary.

 

WHAT ARE PLATINUM GENOMES?

 

Much like the music industry ranks albums as gold or platinum, genomes can also be classified using the same terminology based on the completeness of the assembly and quality of phasing (i.e. haplotype resolution). High-quality genomes that have complete chromosomes and haplotype resolution in critical sections of the genome qualify as “gold genomes,” whereas “platinum genomes” are assemblies with full chromosome scaffolds and haplotypes resolved across the entire genome.

 

Since the publication of the first human genome assembly, researchers from the 1000 Genomes Project and other groups have created several platinum human genomes to represent different human populations. In fact, one of our latest projects in collaboration with PacBio generated the most contiguous, haplotype-resolved human genome to date. However, there are only a few platinum genomes for non-human organisms, as scaffolding and haplotyping entire genomes is very labor-intensive using standard tools. We are excited to offer tools such as Proximo and FALCON-Phase to help usher in the era of straightforward platinum genome assemblies to researchers studying plants and animals.

RESOURCES

Phase Genomics Workshop at PAGXXVII: Add it to your schedule.

Standard Projects Outline 

Phase Genomics Platinum Genome Sweepstakes guidelines

 

A Year in Review with Phase Genomics

 

From releasing the world’s first Hi-C kits for plants and animals to publishing the most contiguous human genome assembly to date, Phase Genomics has had a year filled with new papers, new discoveries, and new applications. Proximity-Guided Assembly is continuing to fuel genomic research and here is a brief recap of newsworthy items in 2018.

 

PAPERS

 

Published Hi-C Genome Assemblies:

 

 

 

 

 

 

 

 

 

 

 

Published Metagenomic Projects:

 

 

 

 

PRODUCT RELEASES

 

 

 

 

 

BLOGS AND VIDEOS

 

Uncovering the microbiome: What will you do with metagenomics? March 1st, 2018

 

New Video: From Contigs to Chromosomes March 15th, 2018

 

A sweet new genome for the black raspberry using Proximo™ Hi-C March 28th, 2018

 

Phase Genomics and Pacific Biosciences Co-Developing new Genome Assembly Phasing Software April 19th, 2018

 

Lil BUB Aids in Discovery of New Bacteria August 1st, 2018

 

Hi-C solves the problem of linking plasmids to hosts in microbiome samples August 8th, 2018

 

Earth’s Wine Cellar: Digging into the Microbiome of Vineyards September 6th, 2018

 

Hi-C Technology Links Antimicrobial Resistance Genes to the Microbiome December 4th, 2018

 

 

IN THE NEWS

 

NPR, March 6th, 2018
Mysteries of the Moo-crobiome: Could Tweaking Cow Gut Bugs Improve Beef?

 

GeekWire, June 27th, 2018
Phase Genomics wins $1.5M grant to peer inside microorganisms’ DNA

 

GeekWire, August 1st, 2018
Cat celebrity Lil Bub lends poop to Seattle startup, leading to discovery of new kinds of bacteria

 

Market Watch, August 10th, 2018
$500+ Million Human Microbiome Market Scenario, 2018-2022

 

GenomeWeb, September 13th, 2018
Vertebrate Genomes Project Releases First Assemblies; Describes Challenges, Plans

 

Bio-IT World, October 9th, 2018
Pacific Biosciences Releases Highest-Quality, Most Contiguous Individual Human Genome Assembly To Date

 

Genetic Engineering and Biotechnology News (GEN), November 14th, 2018
Precision Medicine Looks beyond DNA Sequences

 

Boise State Radio, December 18th, 2018
University Of Idaho Scientists Put Crosshairs On Antibiotic-Resistant Bacteria

Phase Genomics and Pacific Biosciences Co-Developing new Genome Assembly Phasing Software

Phase Genomics and Pacific Biosciences logos

“FALCON-Phase” – an algorithm for producing diploid genomes.

 

Phase Genomics has entered into a co-development agreement with Pacific Biosciences to develop FALCON-Phase, a software module that combines Hi-C and PacBio® highly accurate long-read sequencing data to produce fully-phased diploid genome assemblies. The software will be released later this spring.

FALCON-Phase augments PacBio Single Molecule, Real-Time (SMRT®) assemblies with Hi-C proximity-ligation data, generating accurate, fully-phased diploid assemblies. Specifically, it uses Hi-C’s chromatin proximity information to identify sequences belonging to the same parental chromosome in genome assemblies produced by PacBio’s FALCON-Unzip software, greatly reducing haplotype switching along the primary assembly.
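To make the phasing idea concrete, here is a minimal sketch of how Hi-C links could sort pairs of alternative haplotigs into two haplotypes. It rests on the observation that sequences from the same parental chromosome share more Hi-C contacts than sequences from different parental copies. This is a simple greedy illustration under stated assumptions, not the FALCON-Phase algorithm: the function name `phase_haplotig_pairs`, the input format, and the greedy strategy are all hypothetical.

```python
def phase_haplotig_pairs(pairs, links):
    """Greedily assign each pair of alternative haplotigs to two haplotypes.

    pairs : list of (x, y) haplotig names, one pair per heterozygous region,
            in order along the primary contig
    links : dict mapping frozenset({u, v}) -> number of Hi-C read pairs
            connecting haplotigs u and v
    Returns (hap0, hap1), two lists of haplotig names.
    """
    def n_links(u, v):
        return links.get(frozenset((u, v)), 0)

    # The first pair fixes the arbitrary labels "haplotype 0" and "haplotype 1".
    hap0, hap1 = [pairs[0][0]], [pairs[0][1]]

    for x, y in pairs[1:]:
        # Hi-C support for keeping x with hap0 and y with hap1 ...
        keep = sum(n_links(x, h) for h in hap0) + sum(n_links(y, h) for h in hap1)
        # ... versus swapping the two haplotigs.
        swap = sum(n_links(y, h) for h in hap0) + sum(n_links(x, h) for h in hap1)
        if swap > keep:
            x, y = y, x
        hap0.append(x)
        hap1.append(y)

    return hap0, hap1

# Toy example: the second pair is listed in the "wrong" order and gets swapped.
pairs = [("h1a", "h1b"), ("h2a", "h2b"), ("h3a", "h3b")]
links = {
    frozenset(("h1a", "h2b")): 12,
    frozenset(("h1b", "h2a")): 10,
    frozenset(("h1a", "h2a")): 2,
    frozenset(("h2b", "h3a")): 9,
    frozenset(("h2a", "h3b")): 8,
    frozenset(("h1a", "h3a")): 3,
}

hap0, hap1 = phase_haplotig_pairs(pairs, links)
print(hap0)  # ['h1a', 'h2b', 'h3a']
print(hap1)  # ['h1b', 'h2a', 'h3b']
```

A greedy pass like this can lock in an early mistake, which is one reason a real phasing tool would need a more robust approach than this sketch.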

Furthermore, by combining Phase Genomics’ Proximo Hi-C genome scaffolding technology with FALCON-Phase, users can fully reconstruct maternal and paternal haplotypes on a chromosomal scale. The end result is a diploid set of chromosome-scale scaffolds, or two fully-phased genomes for the same data and labor cost typical for a single genome project.

FALCON-Phase genome Phasing Graph

FALCON-Phase groups long-read contigs into two separate haplotypes based on Hi-C data. Red and blue edges show contigs connected to the same haplotype, while black edges show homologous contigs connected to both haplotypes. Colors were assigned based on known phasing of assembly, which was not otherwise used to inform FALCON-Phase analysis.

These high-quality phased haplotypes can be leveraged to improve the efficiency of agricultural breeding programs, and could help identify disease-causing genomic variations in humans.

Prof. John Williams, Director of the Davies Research Centre at the University of Adelaide, Australia, wrote, “We are interested in expression of imprinted genes and for this work the availability of haplotype-resolved genome assemblies is an important advance. The release of software that enables the creation of haplotyped genome sequence assembly will revolutionize exploration of genome function. The FALCON-Phase software has this ability and can be applied retroactively to SMRT assemblies, as long as Hi-C data are available. Therefore, even pre-existing genomes can potentially be upgraded to haplotyped assemblies for little or no cost.”

Haplotype-resolved genome assembly is an exciting emerging field. Currently, there is only one other method, Trio Canu, which, unlike FALCON-Phase, requires the parents and offspring to be sequenced, adding an additional cost. For many species, it is not possible to collect a trio in the wild and breeding is often not an option. Other Hi-C phasing techniques exist, but they phase genetic variants, not genome assemblies.

The addition of ultra-long genomic interactions captured by Hi-C to PacBio assemblies is very powerful and presents a straightforward solution to a problem experienced by almost all genomic researchers working with diploid organisms.

A formal announcement with more information is coming in the next month. For more information, email us at info@phasegenomics.com.

 

Pacific Biosciences, the Pacific Biosciences logo, PacBio and SMRT are trademarks of Pacific Biosciences of California, Inc.

A sweet new genome for the black raspberry using Proximo™ Hi-C

Black raspberries

The black raspberry, known for its sweetness and health benefits, has been studied further to reveal its chromosome-scale genome.

What is a black raspberry, you may ask? Jams, preserves, pies, and liqueur are just a few of the delicious products made with black raspberry. The black raspberry offers much more beyond its exquisite flavors. For instance, did you know it contains compounds called anthocyanins that are used as a dye? It is also used in anti-aging beauty products and contains compounds that may help fight cancer. The useful properties of black raspberry are encoded within the genome.

A multi-national team of scientists has built a full map of the black raspberry genome. Teams from New Zealand, Canada, and the U.S.A. contributed to the project led by Drs. Rubina Jibran and David Chagné. The work was published in Nature, Horticulture Research. In the project, they leveraged Proximo™ Hi-C to order and orient short-read contigs into chromosome-scale scaffolds.

A chromosome-scale reference genome is an important step for basic biology and for breeding programs. Breeders can use this genome while crossing plants to select for traits like color or taste.  To learn more about how Hi-C technology was used to improve the black raspberry genome we contacted Dr. Chagné and Dr. Jibran for a Q&A session. We also wanted their take on the scientific value of Proximo Hi-C and to share their experiences working with us.

 

What is a black raspberry? How is it different from the blackberries we have in Seattle?

The black raspberry we used is no different from the ones found in Seattle. Actually, I remember seeing some black raspberries (also called black-caps) at Pike market a few years ago! Washington and Oregon are the largest producers of this delicious crop. Raspberries belong to the genus Rubus, which includes red (Rubus idaeus) and black (R. occidentalis) raspberries, blackberries, loganberries and boysenberries.

 

There are many curious uses of black raspberries, what’s yours?

Black and red raspberries are great on top of Pavlova, alongside slices of kiwifruit. Pavlova is New Zealand’s iconic dessert, served around Christmas time, which is berry season down under.

 

What are molecular breeding technologies? What are some of the traits in black raspberry you’d like to breed for?

Molecular breeding techniques use DNA to inform selection decisions. My colleague Cameron Peace from Washington State University wrote a very good review of the use of DNA-informed breeding in fruit trees. Plant & Food Research is leading in the use of molecular tools for breeding fruit species. For example, we are using genetic markers to predict whether apple seedlings carry certain loci for black spot resistance or whether they are likely to be red fruited. The breeding goals for Plant & Food Research’s raspberry breeding programme are high fruit flavour, berry anti-oxidant content, pest and disease resistance and higher productivity.

 

The initial black raspberry genome assembly was built from short-read data. Why did you choose to scaffold the short-read contigs rather than create a new long-read assembly? Would you get chromosome scale contigs from a long-read assembly? 

Actually, we took both approaches. We wanted to see how much of the short-read assembly we could put together using Proximo Hi-C. A long-read based assembly will be released soon, and the comparison of both assemblies will be extremely informative about which strategy to use for future assemblies of other crop species.

 

How did you validate the Proximity Guided Assembly (PGA) scaffolds? How did you correct errors in the scaffolds?

The PGA for black raspberry was first validated by aligning it to a linkage map and then by aligning it to the genome of strawberry (Fragaria vesca) as they have syntenic genomes.
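
As a rough illustration of the linkage-map check described above, the Python sketch below measures how collinear a scaffold is with a genetic map. It is a minimal sketch, not the authors' validation pipeline: the marker positions are made-up values, and using a Spearman rank correlation as the collinearity score is an assumption chosen for simplicity.

```python
import math


def ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation between two equal-length sequences."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # no ordering information to compare
    return cov / math.sqrt(var_x * var_y)


# Hypothetical markers on one scaffold: physical position (bp) on the
# scaffold and genetic position (cM) on the linkage map.
scaffold_bp = [120_000, 850_000, 2_400_000, 5_100_000, 9_700_000]
linkage_cm = [1.5, 7.2, 18.0, 33.4, 61.9]

# Values near +1 or -1 suggest the scaffold order agrees with the map
# (a negative sign only means the scaffold is reversed relative to it).
score = spearman(scaffold_bp, linkage_cm)
print(f"collinearity score: {score:.2f}")
```

A score that drops well below 1 in absolute value would point to markers that are out of order, which is the kind of signal that prompts a closer look at a scaffold.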

 

What was the process like in working with Phase Genomics? Would you recommend them to your colleagues?

I very much enjoy working with Phase Genomics. Black raspberry is not the first genome on which we have collaborated with Phase Genomics; we had previously assembled genomes for kiwifruit and New Zealand manuka. The way we work with Phase Genomics is very iterative, and they are excellent at trying new methods and assembly parameters until we are satisfied with our assemblies. Every organism has its own challenges when it comes to genome assembly, and working with Phase Genomics in a very collaborative way is extremely useful. I have recommended Phase Genomics to colleagues.

New Video: From Contigs to Chromosomes

Phase Genomics CEO and Founder Ivan Liachko, Ph.D., offers an inside look at our ProxiMeta™ Hi-C and Proximo™ Hi-C technology. In this 40-minute presentation, he explains how Hi-C is revolutionizing genome and metagenome assembly. Watch “From Contigs to Chromosomes” now and reach out to http://phasegenomics.com/contact-us/ with any questions.

Thanks to IMMSA for hosting this webinar.

Orphan Crop Gains Reference Genome with Proximo Hi-C

Amaranth genome assembly brought to the chromosome scale using Phase Genomics’ Proximo Hi-C technology.

 

“Orphan crops” are growing in popularity because they have the potential to feed the world’s expanding population. You may have heard of orphan crops like quinoa or spelt, but have you heard of amaranth? The amaranth genus (Amaranthus) is a hardy group of plants that produce nutritious leaves and seeds (high in protein and vitamin content). Amaranth species grow strongly across a wide geographic range, including South America, Mesoamerica, and Asia. Amaranth was likely domesticated by the Aztec civilization and has been a staple food of Mesoamericans for thousands of years. Breeders wish to enhance amaranth’s beneficial properties, like drought resistance, nutrition, and seed production, to improve its usefulness as a food source. However, effective plant husbandry requires genetic and genomic resources, and building these resources has been inhibited by the high cost of genome sequencing and assembly.

 


Dr. Jeff Maughan (left) and Dr. Damien Lightfoot (right), are the primary authors of the amaranth genome paper.

Dr. Jeff Maughan, professor at Brigham Young University, is a champion of orphan crop genomics. Over the past year, Dr. Maughan and his team built a reference-quality amaranth genome on a tight budget. They built upon an earlier short-read assembly by adding Hi-C data, which measures the conformation of chromatin in vivo, as well as low-coverage long reads and optical mapping data. After optical mapping was used to correct errors in the short-read assembly, the Hi-C data was used to cluster the short genome fragments into nearly complete chromosomes using Phase Genomics’ Proximity-Guided Assembly platform, Proximo™ Hi-C. Then, the long reads were used to close remaining gaps on the chromosomes. This cost-effective strategy recovered over 98% of the 16 amaranth chromosomes.

 

The completed reference genome provides an important resource for the community and will boost the efforts of plant breeders to unlock more agricultural benefits for amaranth.  In their paper, Dr. Maughan’s team demonstrated the utility of the reference quality genome in at least two ways.  First, they looked at chromosomal evolution by comparing the amaranth genome to the beet genome, which enables researchers to better understand amaranth in the context of how plants evolved, and second, they mapped the genetic locus responsible for stem color, which clarifies the scientific understanding of a useful agricultural trait.  Dr. Maughan points out that both of these experiments would have been impossible without the chromosome-scale genome assembly afforded by Proximo Hi-C.

 

A high-quality reference genome is the first of many important steps towards creating a modern breeding program for amaranth. We contacted Dr. Maughan to learn more about how he is improving amaranth genomics and the importance of orphan crops.

 

What is an orphan crop? 

According to the FAO (Food and Agriculture Organization of the United Nations), the world has approximately 7,000 cultivated edible plant species, but just five of them (rice, wheat, corn, millet, and sorghum) are estimated to provide 60% of the world’s energy intake, and just 30 species account for nearly all (95%) of human food energy needs. The remaining species are underutilized and often referred to as “orphan crops”.

 

How is genomics relevant to orphan crops?

Would you invest your entire 401(k) savings in just three stocks? In essence, that is what we are doing with world food security. This comes with tremendous risk. If we are going to diversify our food crops, it will be with these orphan crops. Modern plant breeding programs leverage genomics to significantly enhance genetic gain (yield); such methods will undoubtedly expedite the development of advanced varieties in orphan crop species.

 

What are the challenges facing researchers interested in orphan crop genomics?  How have you overcome them?

Funding has long been the main obstacle to developing genomic resources for orphan crops. The development of cheap, high-quality next-generation sequencing technology has dramatically ameliorated this problem – making genomics accessible for most plant species.

 

You used two scaffolding technologies for your assembly, Hi-C and BioNano. How did they compare?

Both technologies are extremely useful and complementary but address different genome assembly challenges.  The Hi-C data allows for the production of chromosome length scaffolds, while the BioNano data allows for fine-tuning and verification of the assembly.

 

Beyond building a high-quality genome assembly, what other genomic resources are required to encourage the adoption of orphan crops?

While genomic resources (such as genome assemblies and genetic markers) are fundamental for developing a modern plant breeding program, often what is missing with orphan crops is the collection of diverse germplasm (or gene bank) that is the foundation of a hybrid breeding program.  The U.S. and other nations have extensive collections (tens of thousands of accessions) that serve as the genetic foundation for staple crop breeding programs – unfortunately, such collections are minimal or non-existent for orphan crops.

 

Who stands to benefit the most from a complete amaranth genome?  How do you disseminate your work to them?

We collaborate extensively with researchers throughout South and Central America, where amaranth is already valued as a regionally important crop. Dissemination of our research occurs through traditional methods (e.g., peer-reviewed publications) as well as through sponsored scientist and student exchanges.

 

Amaranth is used in a variety of interesting foods, what’s your favorite dish?

Alegría, which is made with popped amaranth and honey, and is common throughout Mexico.

 

Threespine Stickleback Genome Upgraded Using Proximo™ Hi-C

Threespine stickleback

Proximo Hi-C genome scaffolding not only improved the well-studied threespine stickleback assembly, but also found structural differences that would have otherwise been missed. 

 

This week, researchers from the University of Bern and the University of Georgia released a new high-quality reference genome for the threespine stickleback. The results of this project, a collaboration between Dr. Catherine Peichel, Dr. Michael White, and Phase Genomics, were published in the Journal of Heredity. By applying a relatively new scaffolding technology, Proximo Hi-C, the team was able to place 60% of previously unassigned sequence on chromosomes. These previously unplaced sequences make up ~5% (13 Megabases) of the stickleback genome and contain multiple genes and other functional DNA. The assembly was generated from an individual from a different lake than the previous stickleback reference genome, and the structural information generated by Proximo Hi-C allowed the team to identify novel structural variants between the two populations. These improvements and new structural information will benefit many research groups that use this model organism to study genetics and evolution.

 

The first effort to sequence and assemble the threespine stickleback genome, published in 2012, used a costly method called Sanger sequencing. This assembly was followed by two revisions, in 2013 and 2015, that used standard short-read sequencing technologies. Short reads can be assembled into larger fragments of the genome called contigs, but some regions of the genome are difficult to assemble because they are long, highly repetitive, or otherwise ambiguous. In the end, these efforts left researchers with a decent yet highly fragmented picture of the stickleback’s chromosomes, with other large portions of its genetic sequence left in individual contigs not associated with any chromosome.

 


Dr. Catherine (Katie) Peichel (left), Head of the Division of Evolutionary Ecology, University of Bern, and Dr. Michael White, Assistant Professor, Department of Genetics, University of Georgia, used Proximo Hi-C genome scaffolding to make many improvements to the threespine stickleback genome and detect structural variation.

Dr. Peichel and Dr. White used Phase Genomics’ Proximo Hi-C genome scaffolding technology to resolve many of these issues and create the new reference genome. Proximo Hi-C genome scaffolding uses a protocol called Hi-C to measure the physical structure of an organism’s genome and then uses that information to place contigs into chromosome-scale de novo assemblies. Phase Genomics was founded by the inventors of this genome assembly approach and has been making its Proximo Hi-C genome scaffolding technology available to researchers since 2015. The company specializes in generating and analyzing Hi-C data for the scaffolding of genomes such as that of the threespine stickleback, as well as for analyzing microbial communities and other metagenomic samples through its ProxiMeta™ Hi-C metagenomic deconvolution technology.
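
To give a feel for how Hi-C contact information can group contigs into chromosomes, here is a toy Python sketch. It is not the Proximo algorithm: the contig names, lengths, link counts, and density threshold are all hypothetical, and the simple length normalization with a union-find merge is an assumption chosen for brevity. Real scaffolders also order and orient contigs within each group, which this sketch does not attempt.

```python
from collections import defaultdict


def cluster_contigs(lengths, links, min_density):
    """Group contigs that share a high density of Hi-C read pairs.

    lengths: dict mapping contig name -> length in bp
    links: dict mapping (contig_a, contig_b) -> number of Hi-C read pairs
    min_density: minimum links per bp^2 required to merge two contigs
    """
    parent = {contig: contig for contig in lengths}

    def find(contig):
        # Follow parent pointers to the group representative,
        # shortening the path as we go.
        while parent[contig] != contig:
            parent[contig] = parent[parent[contig]]
            contig = parent[contig]
        return contig

    for (a, b), count in links.items():
        # Longer contigs collect more links by chance, so normalize
        # the raw count by the product of the two contig lengths.
        density = count / (lengths[a] * lengths[b])
        if density >= min_density:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for contig in lengths:
        groups[find(contig)].append(contig)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical contigs and Hi-C link counts.
lengths = {"c1": 100_000, "c2": 200_000, "c3": 150_000,
           "c4": 120_000, "c5": 80_000}
links = {("c1", "c2"): 400, ("c2", "c3"): 300, ("c4", "c5"): 200,
         ("c1", "c4"): 5, ("c3", "c5"): 3}

clusters = cluster_contigs(lengths, links, min_density=5e-9)
print(clusters)  # [['c1', 'c2', 'c3'], ['c4', 'c5']]
```

The idea the sketch relies on is that sequences on the same chromosome interact far more often than sequences on different chromosomes, so the sparse links between c1 and c4 or c3 and c5 fall below the threshold and the two groups stay separate.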

 

We know that scientific tools are only as good as the resulting scientific findings. We sent a brief Q&A to both Dr. Peichel and Dr. White to get their take on the scientific value of Proximo Hi-C and share their experiences in working with us.

 

Why is the stickleback genome important?

Sticklebacks are a “supermodel” for evolutionary genetics, in that they have been one of the leading model systems for identifying the genetic and molecular basis of phenotypic changes in natural populations. Thus, it is important to have a complete genome sequence so that one can correctly identify all the genes that are present in a genomic region that is associated with a phenotype of interest. -CP

Why did the original genome need improvement?

A high-quality Sanger-sequenced genome was published in 2012 and has undergone two revisions since then. Despite incorporating dense linkage maps to help assign many of the unanchored scaffolds to linkage groups, over 26.7 Mb of the 460 Mb genome still remained unassigned to linkage groups. We needed to apply other approaches to try and assign these remaining scaffolds. -MW

How did Proximo Hi-C scaffolding improve the contiguity of the genome?

We were able to assign over 60% of the unassigned contigs to chromosomes. -CP

What other applications of the Hi-C data are useful to your biological questions?

Hi-C is a useful way to identify structural variation (like inversions) among stickleback populations. We are also excited about the possibility of using Hi-C for assembly of the hard-to-assemble regions of the genome like Y chromosomes. -CP

Why did you choose to work with Phase Genomics?

I was impressed by their interest in our biological questions and dedication to working with us until we were satisfied with the assembly. -CP

We chose to work with Phase Genomics because of the ease of the entire pipeline. Phase Genomics was fast and kept us updated at every step along the way. It was great to work with a group that was so communicative and open to trying different approaches to get the best assembly. -MW

Spotlight on Hi-C in Science: New Technologies Boost Genome Quality

Science writer Elizabeth Pennisi outlines available genomics technologies that are helping researchers improve genome assemblies, with a focus on Hi-C’s ability to bring genome assembly to the chromosome scale.

This article, by Elizabeth Pennisi, focuses on how new technologies are making genome quality much better. Long reads, optical maps, and Hi-C data are being synergistically applied to improve modern genome assemblies including goat (Dr. Tim Smith), hummingbird (Dr. Erich Jarvis), maize, and more. Importantly, Hi-C provides the finishing touch to these genomes by providing ultra-long contiguity information that can scaffold entire chromosomes. We at Phase Genomics are glad researchers have chosen Proximo Hi-C to scaffold the goat, hummingbird, and hundreds of other assemblies into contiguous chromosome-scale reference genomes.

 

Read the article here

Spotlight on Hi-C in The Atlantic: The Game-Changing Technique That Cracked the Zika-Mosquito Genome

One of the most prolific science writers, Ed Yong, profiles how Hi-C sequencing technologies can make genome assembly easier and more cost-effective than ever before. 

Science writer Ed Yong recounts how researchers tackled the genome of the disease-carrying mosquito Aedes aegypti, and how Hi-C “knitted” the genome from 36,000 pieces into complete and contiguous chromosomes. Yong points out that the completed genome will not only help scientists understand the biology of the mosquito at a much deeper level, but also marks a technological pivot in genomics: Hi-C makes genome assembly cheaper, more accurate and faster than ever before. The article also cites our collaborator Dr. Catherine Peichel’s newly published threespine stickleback genome and Dr. Erich Jarvis’s hummingbird genome as examples of the power of Proximo Hi-C scaffolding.

 

Read the article here