Haplotype-phased cassava genome assembly
Cassava (Manihot esculenta Crantz, 2n=36) is an important food crop for a billion people across 105 countries, where its starchy root is a critical staple. Cassava has a highly heterozygous genome, a high genetic load and genotype-dependent asynchronous flowering, and it is typically propagated by stem cuttings. Thus, any genetic variation between haplotypes, including large structural variations, is preserved through clonal propagation. Traditional genome assembly approaches generate a collapsed haplotype representation of the genome. In highly heterozygous plants, this introduces artefacts and oversimplifies the heterozygous regions of the genome. To resolve each haplotype of the cassava genome independently, we assembled the genome of the cassava genotype TME7 (Oko-Iyawo) using a combination of Pacific Biosciences (PacBio), Illumina and Hi-C sequence reads.
PacBio reads were assembled into contigs using the FALCON and FALCON-Unzip pipeline, and phase switch errors within contigs were corrected with FALCON-Phase using Hi-C read data. The ultra-long-range genotype information in the Hi-C reads was then used for scaffolding and for phasing within scaffolds. Each haploid assembly is ~700 Mb. Comparison of the two phases revealed more than 5,000 large structural variants, including insertions and deletions of up to 10 kb, together affecting more than 8 Mb of the genome.
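To make the notion of a phase switch error concrete, the Python sketch below counts switches along a single phased block. It assumes, hypothetically, that each heterozygous site is encoded as 0 or 1 according to which allele sits on haplotype 1, and that a trusted reference phasing is available for comparison. It illustrates the kind of error being corrected; it is not how FALCON-Phase itself works.

```python
from typing import Sequence


def count_switch_errors(predicted: Sequence[int], truth: Sequence[int]) -> int:
    """Count phase switch errors along one phased block.

    Each heterozygous site is encoded as 0 or 1 (which allele is on haplotype 1).
    A site is "in orientation" when the predicted and true phases agree; a switch
    error is any point where that orientation changes between adjacent sites.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must cover the same sites")
    orientation = [p == t for p, t in zip(predicted, truth)]
    return sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)


truth = [0, 0, 1, 0, 1, 1]
# The haplotypes swap after the third site: one switch error.
print(count_switch_errors([0, 0, 1, 1, 0, 0], truth))  # 1
# A block whose labels are swapped throughout has no switch errors,
# because calling a haplotype "1" or "2" is arbitrary.
print(count_switch_errors([1, 1, 0, 1, 0, 0], truth))  # 0
```

Counting changes in orientation, rather than mismatched sites, is what separates a switch error from a harmless global label swap.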
We also used this genome, together with RNA-seq data from diverse tissues, to study allele-specific expression (ASE) in cassava. We identified thousands of genes showing patterns of ASE, including many associated with defense. Furthermore, genes with higher levels of ASE were more likely to lie near large haplotypic structural variations.
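The summary does not state how ASE was called. One common approach, shown here only as a hedged Python sketch, tests each gene's haplotype-specific RNA-seq read counts against an expected 50:50 ratio with an exact binomial test, then controls the false discovery rate across genes with the Benjamini-Hochberg procedure. The input format (per-gene read counts already assigned to haplotype 1 and haplotype 2), the `min_depth` filter and the `alpha` threshold are hypothetical choices, not the study's parameters.

```python
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k reads out of n against p = 0.5."""
    if n == 0:
        return 1.0
    tail = min(k, n - k)
    # Binomial(n, 0.5) is symmetric, so the two-sided p-value is twice
    # the probability of a count at least this far into the lower tail.
    lower = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * lower)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_ase(
    counts: dict[str, tuple[int, int]],
    min_depth: int = 20,  # hypothetical coverage filter
    alpha: float = 0.05,  # hypothetical FDR threshold
) -> dict[str, float]:
    """Return {gene: haplotype-1 read fraction} for genes with significant ASE."""
    genes = [g for g, (h1, h2) in counts.items() if h1 + h2 >= min_depth]
    pvals = [binom_two_sided_p(counts[g][0], sum(counts[g])) for g in genes]
    qvals = benjamini_hochberg(pvals)
    return {
        g: counts[g][0] / sum(counts[g])
        for g, q in zip(genes, qvals)
        if q < alpha
    }


# Hypothetical per-gene read counts: (haplotype 1, haplotype 2).
counts = {"geneA": (90, 10), "geneB": (52, 48), "geneC": (5, 3)}
print(call_ase(counts))  # {'geneA': 0.9}
```

Only reads overlapping heterozygous sites can be assigned to a haplotype, so these counts come from a subset of all mapped reads; a beta-binomial model is often preferred over the plain binomial when replicate data show overdispersion.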
These two haplotype assemblies will provide an excellent means to study the haplotype-specific structural variation, synteny and allele-specific gene expression that contribute to important agricultural traits, and will further our understanding of the genetics and domestication of cassava.
This research, published in The Plant Journal, has key implications for our understanding of how high heterozygosity is involved in tissue development and in other important biological processes such as defense and even heterosis.