Summary
Resource Type
Genome Assembly
Organism
Name
Whole Genome Assembly (v4.0) and Annotation (v4.1) of Panicum virgatum - cv. AP13 (JGI)
Program, Pipeline, Workflow or Method Name
Assembly & annotation, performed by JGI
Program Version
v 4.1
Algorithm
Date Performed
Wednesday, January 30, 2019
Data Source
Source Name
JGI-Phytozome Panicum virgatum assembly/annotation
Source Version
v4.1 [450]
Source URI
https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Pvirgatum

 

Overview

 

This is a release of the V1.0 assembly of Panicum virgatum AP13. This version orders part of the tetraploid genome on subgenome-specific pseudomolecules. The main assembly consists of Roche 454 linear data sequenced by the JGI and assembled into contigs. For this genome version, contigs were positioned on a dense genetic map (derived from resequencing an AP13 x VS16 cross), contigs were collapsed on subgenome-specific markers, scaffolds were built with WGS linking data, and larger scaffolds were then constructed using synteny to the diploid relative Panicum hallii.

The first 18 scaffolds are named 01-09 (a or b), after the syntenic foxtail millet (Setaria italica) chromosomes.

There is 636.1 Mb of sequence localized to chromosomes and an additional 593.5 Mb that is not yet localized. Within this set of scaffolds, there can be up to 4 or more copies of a locus, contributed from A1, A2, B1, or B2. Please do not rely on gene identifiers being stable from release to release; there will be significant movement of genes as we integrate direct sequence from our clone-based genome improvement project.
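As a quick consistency check, the two sequence totals quoted above can be summed and compared with the genome size given under Statistics. This is only a sketch of the arithmetic; the variable names are illustrative and not part of the record.

```python
# Values quoted in the Overview (in Mb).
localized_mb = 636.1      # sequence placed on chromosomes
unlocalized_mb = 593.5    # sequence not yet placed

total_mb = localized_mb + unlocalized_mb
localized_fraction = localized_mb / total_mb

print(f"Total assembled sequence: {total_mb:.1f} Mb")
print(f"Localized to chromosomes: {localized_fraction:.1%}")
```

The sum is consistent with the approximate 1,230 Mb reported for the main genome assembly, with roughly half of the sequence placed on chromosomes.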

 

Statistics

 

Genome

The main genome assembly is approximately 1,230 Mb arranged in 319,670 contigs

The main pseudomolecules include 99,024 contigs localized to either the A or B subgenome

Contig N50 (L50) = 54,506 (5.7 Kb)
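For readers unfamiliar with these contiguity statistics, the following minimal sketch shows how such a pair of values can be computed from a list of contig lengths. It assumes the common convention in which N50 is a length and L50 is a contig count; conventions differ between groups on which number carries which label, so compare against the units shown above rather than the names. The function name `n50_l50` is hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    N50 here is the length of the contig at which the running total,
    taken from longest to shortest, first reaches half of the assembly.
    L50 is the number of contigs needed to reach that point.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Toy example with five contigs (not real switchgrass data).
print(n50_l50([10, 8, 6, 4, 2]))
```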

Loci

98,007 total loci containing protein-coding transcripts

Alternative Transcripts

27,432 total alternatively spliced transcripts

For primary transcripts

Average number of exons: 3.9

Median exon length: 183

Median intron length: 133

Number of complete genes: 83,091

Number of incomplete genes with start codon: 5,109

Number of incomplete genes with stop codon: 8,065
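Summary statistics like the exon and intron figures above can be derived from exon coordinates. The sketch below is illustrative only: it assumes 1-based inclusive coordinates (as in GFF3) and a hypothetical input mapping of transcript IDs to exon intervals, and it is not the pipeline JGI used.

```python
from statistics import mean, median


def exon_intron_stats(transcripts):
    """Summarize exon counts and exon/intron lengths.

    transcripts: dict mapping a transcript ID to a list of (start, end)
    exon coordinates, 1-based and inclusive (hypothetical input format).
    """
    exon_counts, exon_lengths, intron_lengths = [], [], []
    for exons in transcripts.values():
        ordered = sorted(exons)
        exon_counts.append(len(ordered))
        exon_lengths.extend(end - start + 1 for start, end in ordered)
        # An intron is the gap between two consecutive exons.
        intron_lengths.extend(
            next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        )
    return {
        "mean_exons": mean(exon_counts),
        "median_exon_length": median(exon_lengths),
        "median_intron_length": median(intron_lengths) if intron_lengths else None,
    }


# Toy example: one three-exon transcript and one single-exon transcript.
toy = {"t1": [(1, 100), (201, 300), (401, 450)], "t2": [(10, 59)]}
print(exon_intron_stats(toy))
```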

 

Sequencing, Assembly, and Annotation

 

How was the assembly generated?

This sequence was generated primarily with the Roche 454 platform and includes 15x sequence coverage of the genome with about 6.5x coming from long linear reads. It was assembled with Newbler2.6. The contigs were then positioned on a subgenome specific marker map and where multiple contigs shared a marker, collapsed into a single contig. These contigs were then scaffolded using long mate pair information and then ordered based on synteny with our diploid switchgrass model organism.

Gene prediction

359,536 transcript assemblies were made from ~0.64B pairs of 150 bp stranded paired-end Illumina RNAseq reads and ~37M pairs of 100 bp paired-end Illumina RNAseq reads using PERTRAN (Shu et al., manuscript in preparation). The assemblies were further assembled into PASA transcript assemblies, together with 5,662 full-length cDNAs and ~810K Sanger ESTs, using PASA (Haas, 2003). Loci were determined by PASA transcript assembly alignments and/or EXONERATE alignments of proteins from Arabidopsis thaliana, rice, sorghum, foxtail millet, Brachypodium distachyon, soybean, grape and Swiss-Prot proteomes to the repeat-soft-masked P. virgatum genome, masked using RepeatMasker (Smit, 1996-2012) and the MIPS grass repeat library, with up to 2 kb extension on both ends unless extending into another locus on the same strand.

Gene models were predicted by homology-based predictors: FGENESH+ (Salamov, 2000), FGENESH_EST (similar to FGENESH+, but using ESTs as splice site and intron input instead of protein/translated ORF), and GenomeScan (Yeh, 2001). The best-scored predictions for each locus were selected using multiple positive factors, including EST and protein support, and one negative factor: overlap with repeats. The selected gene predictions were improved by PASA. Improvement includes adding UTRs, splicing correction, and adding alternative transcripts.

PASA-improved gene model proteins were subject to protein homology analysis against the above-mentioned proteomes to obtain Cscore and protein coverage. Cscore is the ratio of a protein's BLASTP score to the MBH (mutual best hit) BLASTP score, and protein coverage is the highest percentage of the protein aligned to the best of its homologs. PASA-improved transcripts were selected based on Cscore, protein coverage, EST coverage, and the overlap of their CDS with repeats. A transcript was selected if its Cscore was at least 0.5 and its protein coverage at least 0.5, or if it had EST coverage, provided that less than 20% of its CDS overlapped with repeats. For gene models whose CDS overlaps with repeats by more than 20%, the Cscore had to be at least 0.9 and homology coverage at least 70% for the model to be selected. The selected gene models were subject to Pfam and Panther analysis, and gene models whose protein is more than 30% in Pfam/Panther TE domains were removed.
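The selection thresholds described above can be expressed as a small decision rule. The following is a minimal sketch of one reading of that text, not JGI's actual implementation. All names (`Transcript`, `compute_cscore`, `passes_selection`, `keep_gene_model`) are hypothetical, fractions are expressed on a 0-1 scale, and a repeat overlap of exactly 20% is assumed to fall under the stricter rule because the text does not specify that boundary.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    cscore: float               # BLASTP score / mutual-best-hit BLASTP score
    protein_coverage: float     # best fraction of the protein aligned to a homolog
    has_est_coverage: bool      # any EST support
    cds_repeat_overlap: float   # fraction of the CDS overlapping repeats
    te_domain_fraction: float = 0.0  # fraction of protein in Pfam/Panther TE domains


def compute_cscore(blastp_score, mbh_blastp_score):
    """Cscore as described: a BLASTP score relative to the MBH BLASTP score."""
    return blastp_score / mbh_blastp_score


def passes_selection(t):
    """Apply the Cscore / coverage / EST / repeat-overlap thresholds."""
    if t.cds_repeat_overlap < 0.20:
        homology_ok = t.cscore >= 0.5 and t.protein_coverage >= 0.5
        return homology_ok or t.has_est_coverage
    # Assumption: overlap of exactly 20% is handled by the stricter rule.
    return t.cscore >= 0.9 and t.protein_coverage >= 0.70


def keep_gene_model(t):
    """Selection followed by removal of models dominated by TE domains."""
    return passes_selection(t) and t.te_domain_fraction <= 0.30


# Toy examples (invented values, for illustration only).
print(passes_selection(Transcript(0.6, 0.55, False, 0.05)))
print(passes_selection(Transcript(0.6, 0.80, True, 0.35)))
```

Under this reading, EST support alone is enough for a transcript with low repeat overlap, while a repeat-rich transcript must clear the higher homology thresholds regardless of EST support.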

 

References:

Haas, B.J., Delcher, A.L., Mount, S.M., Wortman, J.R., Smith, R.K., Jr., Hannick, L.I., Maiti, R., Ronning, C.M., Rusch, D.B., Town, C.D. et al. (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31: 5654-5666. http://nar.oupjournals.org/cgi/content/full/31/19/5654

Smit, A.F.A., Hubley, R. and Green, P. RepeatMasker Open-3.0. 1996-2011.

Yeh, R.-F., Lim, L.P. and Burge, C.B. (2001) Computational inference of homologous gene structures in the human genome. Genome Res. 11: 803-816.

Salamov, A.A. and Solovyev, V.V. (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10: 516-522.

 

Restrictions on dataset usage

I would like to use this data to help clone a gene, analyse a gene family, etc.
Please use this data to advance your studies. Please cite "Panicum virgatum v1.0, DOE-JGI, http://phytozome.jgi.doe.gov/".

I would like to do a large-scale comparison of Panicum virgatum to other genomes, and/or a global analysis of its gene content.
As a public service, the Department of Energy's Joint Genome Institute (JGI) is making the completed Panicum virgatum genome sequence available before scientific publication according to the Ft. Lauderdale Accord. This balances the imperative of the DOE and the JGI that the data from its sequencing projects be made available as soon and as completely as possible with the desire of contributing scientists and the JGI to reserve a reasonable period of time to publish on the genome sequencing and analysis without concerns about preemption by other groups. JGI policy is that early release should aid the progress of science. By accessing these data, you agree not to publish any articles containing analyses of genes or genomic data on a whole genome or chromosome scale prior to publication by JGI and/or its collaborators of a comprehensive genome analysis ("Reserved Analyses"). "Reserved analyses" include the identification of complete (whole genome) sets of genomic features such as genes, gene families, regulatory elements, repeat structures, GC content, or any other genome feature, and whole-genome- or chromosome- scale comparisons with other species. The embargo on publication of Reserved Analyses by researchers outside of the Panicum virgatum Genome Sequencing Project is expected to extend until the publication of the results of the sequencing project is accepted. Scientific users are free to publish papers dealing with specific genes or small sets of genes using the sequence data. If these data are used for publication, the following acknowledgment should be included: 'These sequence data were produced by the US Department of Energy Joint Genome Institute'. This letter has been circulated to Journal Editors so that they are aware of the conditions of access and publication detailed above. These data may be freely downloaded and used by all who respect the restrictions in the previous paragraphs. 
The assembly and sequence data should not be redistributed or repackaged without permission from the JGI. Any redistribution of the data during the embargo period should carry this notice: "The Joint Genome Institute provides these data in good faith, but makes no warranty, expressed or implied, nor assumes any legal liability or responsibility for any purpose for which the data are used. Once the sequence is moved to unreserved status, the data will be freely available for any subsequent use."

We prefer that potential users of this sequence assembly contact the individuals listed under Contacts with their plans, to ensure that the proposed usage of sequence data is not considered a Reserved Analysis.

 

Contacts

JGI Contact: Jeremy Schmutz (HudsonAlpha/JGI) (email: jschmutz AT hudsonalpha DOT org)

 

 

Source : Phytozome

Publication
There are no publications associated with this record.
Cross Reference
There are no cross references.