Sci Data. 2017 Oct 10;4:170149. doi: 10.1038/sdata.2017.149.
Pirita Paajanen, Jan Strauss, Cock van Oosterhout, Mark McMullan, Matthew D Clark, Thomas Mock
PMCID: PMC5634323 DOI: 10.1038/sdata.2017.149
The genome of the cold-adapted diatom Fragilariopsis cylindrus is characterized by highly diverged haplotypes that intersperse its homozygous genome. Here, we describe how a combination of PacBio DNA and Illumina RNA sequencing can be used to resolve this complex genomic landscape locally into the highly diverged haplotypes, and how to map various environmentally controlled transcripts onto individual haplotypes. We assembled PacBio sequence data with the FALCON assembler and created a haplotype resolved annotation of the assembly using annotations of a Sanger sequenced F. cylindrus genome. RNA-seq datasets from six different growth conditions were used to resolve allele-specifc gene expression in F. cylindrus. This approach enables to study differential expression of alleles in a complex genomic landscape and provides a useful tool to study how diverged haplotypes in diploid organisms are used for adaptation and evolution to highly variable environments.