Vaccinium darrowii clone NJ8810/NJ8807 v1.2 genome sequence
This is the primary haplotype genome assembly.
GDV Accession: GDV21003
NCBI Accession: JAFMTH000000000
Reference: Yu J, Hulse-Kemp A, Babiker E and Staton M. High-quality reference genome and annotation aids understanding of berry development for evergreen blueberry (Vaccinium darrowii). Revision for Horticulture Research.
Materials and Methods: The Vaccinium darrowii genome assembly was generated from DNA extracted from leaves of clone NJ8810/NJ8807. De novo assembly was performed using MECAT2. The draft genome assembly was polished by consensus calling with Arrow from PacBio SMRT Tools using the PacBio long reads. The assembly was then further polished by two rounds of error correction with Pilon using Illumina short reads. At this stage the assembly was a partially fused set of heterozygous sequences representing both haplotypes. This mostly diploid representation of the genome was separated into a primary haplotype assembly and a secondary haplotype assembly by purge_dups. To anchor contigs into chromosomes, the primary assembly was scaffolded using Hi-C sequencing reads through the Dovetail HiRise™ pipeline (Dovetail Genomics, LLC). The secondary haplotype assembly was scaffolded using the same Hi-C data with the software packages juicer and 3D-DNA.
The primary assembly and the secondary haplotype assembly (i.e. the haplotigs removed by purging) were annotated separately by the BRAKER2 pipeline using RNA-seq reads, Iso-Seq reads and V. corymbosum proteins as evidence. First, repetitive elements were identified and masked by RepeatModeler and RepeatMasker using previously characterized plant repetitive elements from RepBase. Next, trimmed Illumina RNA-seq reads were aligned to both masked assemblies by STAR v2.7.3a, and Iso-Seq reads were aligned to the genome assemblies by minimap2. The aligned RNA-seq and Iso-Seq reads were merged with samtools and used as transcript evidence in the BRAKER2 pipeline. Additionally, 128,559 proteins from the V. corymbosum ‘Draper’ genome were mapped to the masked assemblies by ProtHint as protein evidence. The repeat-masked assemblies were annotated by BRAKER2 in ‘etpmode’ with both transcript and protein evidence. The predicted gene models were then filtered using structural and functional annotation from EnTAP and gFACS.
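The transcript-evidence step described above (STAR for RNA-seq, minimap2 for Iso-Seq, samtools to merge) could be sketched as follows. This is a minimal illustration, not the commands actually run for this assembly: the file names, thread count, paired-end layout of the RNA-seq reads and the minimap2 `splice:hq` preset are all assumptions, and the function only builds the command lines without executing them.

```python
def transcript_evidence_commands(genome_fa, star_index, rnaseq_r1, rnaseq_r2,
                                 isoseq_fq, threads=8):
    """Build (but do not run) the commands for one masked assembly.

    All paths and the output names below are hypothetical placeholders.
    """
    t = str(threads)
    # STAR names its sorted BAM "<prefix>Aligned.sortedByCoord.out.bam".
    star_bam = "rnaseq.Aligned.sortedByCoord.out.bam"
    star = ["STAR", "--runThreadN", t,
            "--genomeDir", star_index,
            "--readFilesIn", rnaseq_r1, rnaseq_r2,   # assumes paired-end reads
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", "rnaseq."]
    # Spliced long-read alignment; the splice:hq preset is an assumption.
    minimap2 = ["minimap2", "-ax", "splice:hq", "-uf", "-t", t,
                "-o", "isoseq.sam", genome_fa, isoseq_fq]
    # minimap2 writes SAM, so sort it into BAM before merging.
    sort = ["samtools", "sort", "-@", t, "-o", "isoseq.bam", "isoseq.sam"]
    # Merge both alignment sets into one BAM to pass on as transcript evidence.
    merge = ["samtools", "merge", "transcripts.bam", star_bam, "isoseq.bam"]
    return [star, minimap2, sort, merge]


if __name__ == "__main__":
    for cmd in transcript_evidence_commands(
            "primary.masked.fa", "star_index", "rna_R1.fq.gz", "rna_R2.fq.gz",
            "isoseq.fq.gz"):
        print(" ".join(cmd))
```

Because the primary and secondary haplotype assemblies were annotated separately, the same sequence of commands would be generated once per masked assembly, each with its own genome file and STAR index.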
Genome Assembly Summary:
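Assembly summaries of this kind usually report the number of sequences, total length and N50. As a minimal sketch of how such values can be derived from a FASTA file (the function names and the example file name are illustrative assumptions, not part of the pipeline described above):

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def summarize(lengths):
    """Collect the basic statistics into one dictionary."""
    return {
        "sequences": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths) if lengths else 0,
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Hypothetical file name for the primary haplotype assembly.
    print(summarize(read_fasta_lengths("primary_assembly.fa")))
```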
Additional information about this analysis: