Haplotype-phased cassava genome assembly


Cassava (Manihot esculenta Crantz, 2n=36) is an important food crop: its starchy root is a critical staple for a billion people across 105 countries. Cassava has a highly heterozygous genome, a high genetic load, and genotype-dependent asynchronous flowering, and it is typically propagated by stem cuttings. Thus, any genetic variation between haplotypes, including large structural variation, is preserved by such clonal propagation. Traditional genome assembly approaches generate a collapsed haplotype representation of the genome. In highly heterozygous plants, this introduces artefacts and oversimplifies heterozygous regions of the genome. To resolve each haplotype of the cassava genome independently, we applied a combination of Pacific Biosciences (PacBio), Illumina, and Hi-C sequence reads to the cassava genotype TME7 (Oko-Iyawo).

PacBio reads were assembled into contigs using the FALCON and FALCON-Unzip pipeline, and phase switch errors within contigs were corrected using FALCON-Phase and Hi-C read data. The ultra-long-range genotype information from Hi-C sequencing was then used for scaffolding and within-scaffold phasing, thus correcting phase switch errors. After gap filling, the result is 36 highly contiguous chromosome sequences representing the two haploid copies of each chromosome. We report a contig N50 of 735 kb and 712 kb for phase 0 and phase 1, respectively, and a scaffold N50 of ~40 Mb for both phases. Each haploid assembly is ~750 Mb. Both linkage and optical maps confirmed the contiguity, order, and orientation of the assemblies. Comparison of the two phases revealed 3,753 structural variants, including insertions and deletions of up to 10 kb, affecting more than 5 Mb of sequence. Annotation of haplotype-specific genes and transposable elements is underway. These two haplotype assemblies will provide an excellent means to study haplotype-specific structural variation, synteny, and allele-specific gene expression contributing to important agricultural traits, and will further our understanding of the genetics and domestication of cassava.
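For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed from a list of contig or scaffold lengths. The function name `n50` and the toy lengths are illustrative assumptions; this is not the code used to produce the reported values.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (not data from this assembly).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

Applied separately to the contig lengths of each phase, the same calculation gives one contig N50 per haplotype, which is how two values (one for phase 0, one for phase 1) can be reported side by side.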