Vaccinium darrowii clone NJ8810/NJ8807 v1.2 genome sequence

Genome Overview


Analysis Name	Vaccinium darrowii clone NJ8810/NJ8807 v1.2 genome sequence
Method	PacBio, Illumina, Hi-C (MECAT2, Arrow, Pilon, purge_dups, Juicer, 3D-DNA)
Source	V. darrowii clone NJ8810/NJ8807
Date performed	2021-06-18

This is the primary haplotype genome assembly.

GDV Accession: GDV21003

NCBI Accession: JAFMTH000000000, GCA_020921065.1 (download a file that maps NCBI scaffold and gene names to the corresponding GDV scaffold and gene names)
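The mapping file lets NCBI-named sequences be relabelled with their GDV names. Its exact layout is not described on this page, so the sketch below assumes a hypothetical two-column, tab-separated file (NCBI name first, GDV name second). `load_name_map` and `rename_fasta_headers` are illustrative helpers, not GDV tools, and the names in the demo are placeholders.

```python
import csv
import io
import sys


def load_name_map(handle):
    """Read an NCBI -> GDV name table.

    Assumption: tab-separated, NCBI name in column 1, GDV name in column 2.
    Rows starting with '#' and rows with fewer than two columns are skipped.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and not row[0].startswith("#"):
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename_fasta_headers(lines, mapping):
    """Yield FASTA lines with each header ID replaced by its mapped name.

    IDs missing from the mapping are left unchanged; any description text
    after the first space in the header is preserved.
    """
    for line in lines:
        if line.startswith(">"):
            seq_id, _, rest = line[1:].rstrip("\n").partition(" ")
            new_id = mapping.get(seq_id, seq_id)
            yield ">" + new_id + (" " + rest if rest else "") + "\n"
        else:
            yield line


if __name__ == "__main__":
    # Placeholder names only; real names come from the downloaded mapping file.
    table = io.StringIO("ncbi_A\tgdv_1\nncbi_B\tgdv_2\n")
    fasta = [">ncbi_A description\n", "ACGT\n", ">ncbi_C\n", "GGCC\n"]
    sys.stdout.writelines(rename_fasta_headers(fasta, load_name_map(table)))
```

Unmapped IDs pass through unchanged, so a partially matching table never silently drops sequences.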

Reference: Yu, J., Hulse-Kemp, A.M., Babiker, E., Staton, M. High-quality reference genome and annotation aids understanding of berry development for evergreen blueberry (Vaccinium darrowii). Horticulture Research 8, 228 (2021). DOI: https://doi.org/10.1038/s41438-021-00641-9

Materials and Methods: The Vaccinium darrowii genome assembly was generated from DNA extracted from leaves of clone NJ8810/NJ8807. De novo assembly was performed with MECAT2. The draft assembly was polished by consensus calling with Arrow (PacBio SMRT Tools) using the PacBio long reads, and then by two rounds of error correction with Pilon using Illumina short reads. At this stage the assembly was a partially fused set of heterozygous sequences from both haplotypes. This largely diploid representation of the genome was separated into a primary haplotype assembly and a secondary haplotype assembly with purge_dups. To anchor contigs into chromosomes, the primary assembly was scaffolded with Hi-C sequencing reads through the Dovetail HiRise™ pipeline (Dovetail Genomics, LLC). The secondary haplotype assembly was scaffolded with the same Hi-C data using Juicer and 3D-DNA.

The primary and secondary haplotype assemblies (the latter consisting of the haplotigs removed by purging) were annotated separately with the BRAKER2 pipeline, using RNA-seq, Iso-Seq and V. corymbosum protein evidence. First, repetitive elements were identified and masked with RepeatModeler and RepeatMasker, using previously characterized plant repetitive elements from RepBase. Next, trimmed Illumina RNA-seq reads were aligned to both masked assemblies with STAR v2.7.3a, and Iso-Seq reads were aligned to the assemblies with minimap2. The aligned RNA-seq and Iso-Seq reads were merged with samtools and used as transcript evidence in BRAKER2. In addition, 128,559 proteins from the V. corymbosum ‘Draper’ genome were mapped to the masked assemblies with ProtHint as protein evidence. The repeat-masked assemblies were annotated by BRAKER2 in ‘etpmode’ using both transcript and protein evidence. The predicted gene models were then filtered on structural and functional annotation criteria with EnTAP and gFACS.
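BRAKER2 runs on repeat-masked assemblies. Assuming the masked FASTA is soft-masked (repeat bases in lowercase, as produced by RepeatMasker's `-xsmall` option and recommended in the BRAKER documentation), the proportion of masked sequence can be estimated with a sketch like the one below. Soft masking is an assumption here, and `soft_masked_fraction` is a hypothetical helper.

```python
def soft_masked_fraction(sequences):
    """Return the fraction of letters that are lowercase (soft-masked).

    Assumes soft-masking: repeat bases are lowercase, all other bases
    uppercase. Lowercase 'n' counts as masked; non-letter characters
    (e.g. '-' or '*') are ignored.
    """
    total = 0
    masked = 0
    for seq in sequences:
        for base in seq:
            if base.isalpha():
                total += 1
                if base.islower():
                    masked += 1
    return masked / total if total else 0.0


if __name__ == "__main__":
    # Toy sequences; in practice these would come from the masked scaffold FASTA.
    print(soft_masked_fraction(["ACGTacgt", "nnAA"]))  # 6 of 12 letters -> 0.5
```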

Genome Assembly Summary:

Total size (bp)	582,669,857
Number of scaffolds	107
N50 (bp)	47,393,601
Assembly BUSCO score (embryophyta_odb10)	94.0%
Annotation BUSCO score (embryophyta_odb10)	92.3%
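N50 is the scaffold length at which scaffolds of that length or longer together contain at least half of the total assembly size. A minimal sketch of that definition, taking scaffold lengths in base pairs (for example, read from the scaffold FASTA); `n50` is an illustrative function name:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (bp).

    Sort lengths from longest to shortest and accumulate them; the N50 is
    the length at which the running total first reaches half the total size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


if __name__ == "__main__":
    # Total 28 bp, half 14; 10 + 8 = 18 >= 14, so N50 is 8.
    print(n50([10, 8, 5, 3, 2]))
```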

Downloads

All the files below are also available on the GDV Data Repository.

V. darrowii genome, v1.2 assembly files:

Scaffolds [Fasta Format]
mRNA and Transcripts [GFF3 Format]
Transcript sequences [Fasta Format]
Protein sequences [Fasta Format]
Gene sequences [Fasta Format]
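The downloads are plain-text FASTA and GFF3 files. A minimal, dependency-free sketch for reading them, assuming uncompressed input; `read_fasta` and `count_gff3_types` are hypothetical helpers:

```python
from collections import Counter


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines.

    The identifier is the first whitespace-delimited word of the header.
    """
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_gff3_types(lines):
    """Count GFF3 features by type (column 3).

    Comment/directive lines ('#') and lines without nine tab-separated
    columns (e.g. an embedded ##FASTA section) are ignored.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    fasta = [">s1 desc\n", "ACG\n", "TT\n", ">s2\n", "GG\n"]
    print([(n, len(s)) for n, s in read_fasta(fasta)])
    gff = ["##gff-version 3\n",
           "chr1\tsrc\tgene\t1\t90\t.\t+\t.\tID=g1\n",
           "chr1\tsrc\tmRNA\t1\t90\t.\t+\t.\tID=m1;Parent=g1\n"]
    print(count_gff3_types(gff))
```

For example, `with open("scaffolds.fasta") as fh: lengths = [len(seq) for _, seq in read_fasta(fh)]` (placeholder file name) yields per-scaffold lengths, and `count_gff3_types` reports how many features of each type the GFF3 file contains.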

Homology Analysis:

Protein homology searches were performed by the Main Bioinformatics Lab using BLAST (e-value cutoff of 1e-6) against the Swiss-Prot and TrEMBL protein databases. The results were parsed into Excel format.
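How the BLAST results were parsed is not described here, so the following is only a sketch of one way to apply the 1e-6 cutoff. It assumes BLAST+ tabular output (`-outfmt 6`, standard 12 columns), keeps the highest-bitscore hit per query at or below the cutoff (a best-hit rule that is itself an assumption), and writes a CSV file that Excel can open. `best_hits` and `write_csv` are hypothetical names.

```python
import csv
import io
import sys

# Standard 12-column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines, max_evalue=1e-6):
    """Return {query: hit}, keeping for each query the highest-bitscore hit
    whose e-value is at or below max_evalue. Comment lines ('#') are skipped."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(OUTFMT6, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


def write_csv(hits, handle):
    """Write one row per query (sorted by query ID) as CSV, readable by Excel."""
    writer = csv.DictWriter(handle, fieldnames=OUTFMT6)
    writer.writeheader()
    for query in sorted(hits):
        writer.writerow(hits[query])


if __name__ == "__main__":
    # Toy input: both q1 hits pass the cutoff; q2's hit (1e-3) does not.
    demo = io.StringIO(
        "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t150\n"
        "q1\ts2\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-30\t200\n"
        "q2\ts3\t50.0\t50\t25\t0\t1\t50\t1\t50\t1e-3\t40\n"
    )
    write_csv(best_hits(demo), sys.stdout)
```

Ranking by bitscore rather than e-value avoids ties among very small e-values that BLAST reports as 0.0.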

Homology Analysis: V. darrowii NJ8810/NJ8807 v1.2 Proteins vs. Swissprot
Homology Analysis: V. darrowii NJ8810/NJ8807 v1.2 Proteins vs. TrEMBL
