Summary
Resource Type
Genome Assembly
Organism
Name
Whole Genome Assembly (v3.0) and Annotation (v3.1) of Brachypodium distachyon - cv. Bd21 (JGI)
Program, Pipeline, Workflow or Method Name
Assembly & annotation, performed by JGI
Program Version
v3.1
Algorithm
Date Performed
Monday, November 20, 2017
Data Source
Source Name: JGI-Phytozome Brachypodium distachyon assembly/annotation
Source Version: v3.1 [314]
Source URI: https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Bdistachyon

Assembly-version = v3.0
Annotation-version = v3.1


About Brachypodium distachyon

Brachypodium distachyon, like Arabidopsis thaliana, has several features that recommend it as a model plant for functional genomic studies, especially in the grasses. It has a small, diploid genome (~355 Mb), small physical size, a short life cycle and few growth requirements. Brachypodium is related to the major cereal grain species but is understood to be more closely related to the Triticeae (wheat and barley) than to the other cereals.

Assembly

This JGI release represents the second improved Brachypodium distachyon (Bd21 strain) genome and includes ~270 Mb of improved Brachypodium sequence. These regions were improved by dividing the gene space into ~2 Mb overlapping pieces. Each region was manually inspected and then finished using a variety of technologies, including Sanger (primer walks on subclones and fosmid templates, transposon sequencing on subclone templates), Illumina (small-insert shatter libraries) and clone-based shotgun sequencing using both Sanger and Illumina libraries. In total, 1,496 gaps were closed and 1.43 Mb of sequence was added to the assembly. Overall contiguity (contig N50) increased by a factor of 63, from 347.8 kb to 22 Mb.
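As a small illustration of the contiguity metric quoted above, the sketch below computes a contig N50 and checks the reported fold increase. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the Bd21 assembly; only the 347.8 kb and 22 Mb figures come from the text.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Toy contig lengths (hypothetical, not the real assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Fold change in contig N50 reported in the text: 347.8 kb -> 22 Mb.
fold_increase = 22_000_000 / 347_800
```

With the toy lengths, the two longest contigs (80 and 70) already hold half of the 300 total bases, so the N50 is the shorter of the two. The fold increase works out to roughly 63, matching the figure in the text.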

Annotation

74,756 transcript assemblies were constructed from 160 million paired-end Illumina RNA-seq reads, and 17,647 transcript assemblies from ~1.9 million 454 reads. The transcript assemblies from RNA-seq reads were made using PERTRAN. 76,209 transcript assemblies were then constructed using PASA from 314,866 sequences in total, consisting of the RNA-seq transcript assemblies above as well as Sanger ESTs. Loci were determined by transcript assembly alignments and/or EXONERATE alignments of proteins from Arabidopsis thaliana, rice, sorghum, foxtail, grape, soybean and Swiss-Prot eukaryote proteins to the Brachypodium distachyon Bd21 genome, soft-masked for repeats with RepeatMasker, with up to 2 kb extension on both ends unless extending into another locus on the same strand. Gene models were predicted by the homology-based predictors FGENESH+, FGENESH_EST (similar to FGENESH+, but with ESTs as splice site and intron input instead of protein/translated ORF) and GenomeScan.

The end result was 34,310 loci containing protein-coding transcripts and 52,972 protein-coding transcripts.

 

Source : Gramene


 
NCBI GenBank Records
Release Date: 11/20/2017
BioProject: PRJNA32607
GenBank assembly: GCA_000005505.4
Accession ID: ADDN01000000
Overview

The temperate wild grass species Brachypodium distachyon (Brachypodium) is a new model plant for temperate grasses and herbaceous energy crops. Temperate grass species such as wheat, barley, and forage grasses underpin our food supply. However, the size and complexity of their genomes are a major barrier to biotechnological improvement. Similarly, while herbaceous energy crops (especially grasses) are poised to become a major source of renewable energy in the United States, we know very little about the biology of traits that affect their utility for energy production. Thus a tractable temperate grass model is urgently needed to address questions directly relevant both for improving grain crops and forage grasses that are indispensable to our food production systems, and for developing grasses into superior energy crops. Neither rice nor Arabidopsis adequately fits this role.

Brachypodium is closely related to the cool-season grasses and is an emerging model system for the diverse and economically important grain, forage and turf crops that these groups encompass. The small Brachypodium genome can be used as an accurate template for the much larger polyploid genomes of crops such as wheat and barley. Moreover, since Brachypodium is inbreeding, small in stature, can be grown rapidly, and is amenable to transformation, it can be used as a functional model to gain the knowledge about basic grass biology necessary to develop superior energy crops. This combination of desirable attributes underlies the burgeoning research interest in the species.

 

Statistics

This release of Phytozome includes the JGI v3.0 assembly of Brachypodium distachyon Bd21 and the JGI v3.1 annotation.

Genome

Approximately 272 Mb arranged in 5 chromosomes and 22 unmapped scaffolds. 99.8% of all sequence is contained in the 5 chromosome assemblies.

Loci

34,310 loci containing protein-coding transcripts

Transcripts

52,972 protein-coding transcripts

 

Sequencing, Assembly, and Annotation
Assembly Improvement

This release represents the second improved Brachypodium distachyon (Bd21) genome and includes ~270 Mb of improved Brachypodium sequence. These regions were improved by dividing the gene space into ~2 Mb overlapping pieces. Each region was manually inspected and then finished using a variety of technologies, including Sanger (primer walks on subclones and fosmid templates, transposon sequencing on subclone templates), Illumina (small-insert shatter libraries) and clone-based shotgun sequencing using both Sanger and Illumina libraries. In total, 1,496 gaps were closed and 1.43 Mb of sequence was added to the assembly. Overall contiguity (contig N50) increased by a factor of 63, from 347.8 kb to 22 Mb.

Gene Prediction and Locus Naming

74,756 transcript assemblies were constructed from 160 million paired-end Illumina RNA-seq reads, and 17,647 transcript assemblies from ~1.9 million 454 reads. The transcript assemblies from RNA-seq reads were made using PERTRAN (Shu et al., manuscript in preparation). 76,209 transcript assemblies were then constructed using PASA (Haas, 2003) from 314,866 sequences in total, consisting of the RNA-seq transcript assemblies above as well as Sanger ESTs. Loci were determined by transcript assembly alignments and/or EXONERATE alignments of proteins from Arabidopsis thaliana, rice, sorghum, foxtail, grape, soybean and Swiss-Prot eukaryote proteins to the Brachypodium distachyon Bd21 genome, soft-masked for repeats with RepeatMasker (Smit, 1996-2012), with up to 2 kb extension on both ends unless extending into another locus on the same strand. Gene models were predicted by the homology-based predictors FGENESH+ (Salamov, 2000), FGENESH_EST (similar to FGENESH+, but with ESTs as splice site and intron input instead of protein/translated ORF) and GenomeScan (Yeh, 2001).
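The locus extension rule described above (up to 2 kb on both ends unless extending into another locus on the same strand) can be sketched roughly as follows. This is only one possible reading of the rule, not the JGI pipeline itself: the `Locus` class, the `extend_loci` function, the 1-based inclusive coordinates, and the choice to stop at a neighbour's original (unextended) boundary are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_EXTENSION = 2_000  # "up to 2 kb extension on both ends"


@dataclass(frozen=True)
class Locus:
    start: int   # 1-based, inclusive (assumed convention)
    end: int
    strand: str  # "+" or "-"


def extend_loci(loci, chrom_length, max_ext=MAX_EXTENSION):
    """Extend each locus by up to max_ext bp per side.

    Assumption: an extension stops one base short of the original
    coordinates of the neighbouring locus on the same strand. Loci on
    the opposite strand do not limit the extension.
    """
    extended = []
    for strand in ("+", "-"):
        same = sorted((l for l in loci if l.strand == strand),
                      key=lambda l: l.start)
        for i, locus in enumerate(same):
            left_limit = same[i - 1].end + 1 if i > 0 else 1
            right_limit = (same[i + 1].start - 1
                           if i + 1 < len(same) else chrom_length)
            new_start = max(locus.start - max_ext, left_limit)
            new_end = min(locus.end + max_ext, right_limit)
            # Never shrink a locus, even if the seed loci overlap.
            extended.append(Locus(min(new_start, locus.start),
                                  max(new_end, locus.end),
                                  strand))
    return extended


# Hypothetical loci on a 20 kb toy chromosome.
seed = [Locus(5000, 6000, "+"), Locus(7000, 8000, "+"),
        Locus(6500, 6800, "-")]
extended = extend_loci(seed, chrom_length=20_000)
```

In this toy case the two plus-strand loci are only 1 kb apart, so each is extended up to the other's original boundary rather than by the full 2 kb, while the minus-strand locus is extended by 2 kb on both sides.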

The highest-scoring predictions for each locus were selected using multiple positive factors, including EST and protein support, and one negative factor: overlap with repeats. The selected gene predictions were improved by PASA; improvement includes adding UTRs, correcting splicing, and adding alternative transcripts. PASA-improved gene model proteins were subjected to protein homology analysis against the above-mentioned proteomes to obtain a Cscore and protein coverage. Cscore is the ratio of a protein's BLASTP score to the MBH (mutual best hit) BLASTP score, and protein coverage is the highest percentage of the protein aligned to the best of its homologs. PASA-improved transcripts were selected based on Cscore, protein coverage, EST coverage, and the overlap of their CDS with repeats. A transcript whose CDS overlaps repeats by less than 20% was selected if its Cscore was at least 0.5 and its protein coverage at least 0.5, or if it had EST coverage. For gene models whose CDS overlaps repeats by more than 20%, the Cscore had to be at least 0.9 and homology coverage at least 70% for selection. The selected gene models were subjected to Pfam analysis, and gene models whose protein is more than 30% in Pfam TE domains were removed.
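The selection thresholds in the paragraph above can be written as a small predicate. The sketch below is illustrative only: the `Transcript` fields and the `is_selected` function are hypothetical names, all quantities are expressed as fractions between 0 and 1, and a CDS repeat overlap of exactly 20% (which the text does not specify) is assumed to fall under the stricter rule.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    cscore: float              # BLASTP score / mutual-best-hit BLASTP score
    protein_coverage: float    # fraction of protein aligned to best homolog
    has_est_support: bool      # any EST coverage
    cds_repeat_overlap: float  # fraction of CDS overlapping repeats
    pfam_te_fraction: float    # fraction of protein in Pfam TE domains


def is_selected(t):
    """Apply the selection thresholds described in the text."""
    if t.cds_repeat_overlap < 0.20:
        # Low repeat overlap: homology support OR EST support suffices.
        passes = ((t.cscore >= 0.5 and t.protein_coverage >= 0.5)
                  or t.has_est_support)
    else:
        # Repeat-rich CDS: stricter homology thresholds, EST alone
        # is not enough. Exactly 20% is treated as repeat-rich here
        # (an assumption).
        passes = t.cscore >= 0.9 and t.protein_coverage >= 0.70
    # Final Pfam filter: drop models that are >30% TE domains.
    return passes and t.pfam_te_fraction <= 0.30


# Hypothetical transcripts exercising each branch.
homology_supported = Transcript(0.6, 0.55, False, 0.05, 0.0)
est_only = Transcript(0.2, 0.10, True, 0.10, 0.0)
repeat_rich_weak = Transcript(0.8, 0.90, True, 0.40, 0.0)
repeat_rich_strong = Transcript(0.95, 0.75, False, 0.40, 0.0)
te_like = Transcript(0.95, 0.75, False, 0.05, 0.5)
```

Of these five hypothetical transcripts, the first two and the fourth pass; the third fails the stricter Cscore threshold for repeat-rich models, and the last is removed by the Pfam TE filter.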

References:

Haas, B.J., Delcher, A.L., Mount, S.M., Wortman, J.R., Smith, R.K., Jr., Hannick, L.I., Maiti, R., Ronning, C.M., Rusch, D.B., Town, C.D. et al. (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654-5666. http://nar.oupjournals.org/cgi/content/full/31/19/5654

Smit, A.F.A., Hubley, R. and Green, P. (1996-2011) RepeatMasker Open-3.0.

Yeh, R.-F., Lim, L.P. and Burge, C.B. (2001) Computational inference of homologous gene structures in the human genome. Genome Res. 11, 803-816.

Salamov, A.A. and Solovyev, V.V. (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10, 516-522.

 

Locus name and transcript name mapping from previous annotation version

The locus model name of a v2.1 gene is mapped to a corresponding v3.1 gene if 1) the v2.1 and v3.1 loci overlap uniquely and appear on the same chromosome, and 2) at least one pair of translated transcripts from the old and new loci are MBHs (mutual best hits) with at least 70% normalized identity in a BLASTP alignment (normalized identity is defined as the number of identical residues divided by the length of the longer sequence). For a given pair of v2.1 and v3.1 transcripts at loci that map, transcript names are mapped forward if either a) an MBH relationship exists between the two proteins with at least 90% normalized identity, or b) the proteins have at least 90% normalized identity and are not MBHs, but the corresponding transcript sequences are (also with at least 90% normalized identity). The latter rule specifically handles cases where the v2.1 and v3.1 models differ mainly by the addition or extension of UTR in the v3.1 model. These rules allowed the locus model names of approximately 90% of non-TE-associated models in v2.1 to be mapped forward as locus model names in v3.1.
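The mapping rules above reduce to a few threshold checks on normalized identity. The sketch below restates them in code; the function names and argument layout are hypothetical, and the inputs (whether a pair is an MBH, how many residues are identical) are assumed to have been computed beforehand from BLASTP alignments.

```python
def normalized_identity(identical_residues, len_a, len_b):
    """Identical residues divided by the length of the longer sequence."""
    return identical_residues / max(len_a, len_b)


def locus_name_maps(overlap_unique, same_chromosome, protein_pairs):
    """Decide whether a v2.1 locus name maps to a v3.1 locus.

    protein_pairs is an iterable of (is_mbh, identity) tuples, one per
    pair of translated transcripts from the old and new loci.
    """
    if not (overlap_unique and same_chromosome):
        return False
    return any(is_mbh and identity >= 0.70
               for is_mbh, identity in protein_pairs)


def transcript_name_maps(protein_is_mbh, protein_identity,
                         transcript_is_mbh, transcript_identity):
    """Decide whether a transcript name maps forward (loci already map)."""
    # Rule a: proteins are mutual best hits with >= 90% identity.
    rule_a = protein_is_mbh and protein_identity >= 0.90
    # Rule b: proteins are >= 90% identical but not MBHs, while the
    # transcript sequences are MBHs with >= 90% identity.
    rule_b = (not protein_is_mbh and protein_identity >= 0.90
              and transcript_is_mbh and transcript_identity >= 0.90)
    return rule_a or rule_b


# Hypothetical example: 270 identical residues between proteins of
# length 300 and 280 gives 270 / 300 = 0.9 normalized identity.
example_identity = normalized_identity(270, 300, 280)
```

Dividing by the longer sequence penalizes pairs in which one protein is much shorter than the other, so a truncated model does not map forward on the strength of a short, perfectly matching region.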

 

Contacts

Principal Collaborators:

  • John Vogel (email: jpvogel AT lbl DOT gov)
  • Todd Mockler (Danforth Plant Science Center) (email: TMockler AT danforthcenter DOT org)

JGI Contact:

  • Jeremy Schmutz (HudsonAlpha/JGI) (email: jschmutz AT hudsonalpha DOT org)

 

Reference Publication(s)

International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010 Feb 11; 463(7282):763-8.

 

Additional Resources

More JGI Brachypodium resources (such as T-DNA mutants, Germplasm, and protocols) and information can be found here.

 

Source : Phytozome

Publication
International Brachypodium Initiative. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010 Feb 11; 463(7282):763-8.