Summary
Resource Type
Genome Assembly
Organism
Name
Whole Genome Assembly (TAIR9) and Annotation (TAIR10) of Arabidopsis thaliana - ec. Columbia
Program, Pipeline, Workflow or Method Name
TAIR Assembly & annotation workflow
Program Version
TAIR10
Algorithm
Date Performed
Wednesday, January 30, 2019
Data Source
Source Name
: The Arabidopsis Information Resource (TAIR)
Source Version
: TAIR10

Assembly : GCF_000001735.3
The TAIR10 release contains 27,416 protein-coding genes, 4,827 pseudogenes or transposable element genes and 1,359 ncRNAs (33,602 genes in all, 41,671 gene models). A total of 126 new loci and 2,099 new gene models were added.

https://www.arabidopsis.org/portals/genAnnotation/gene_structural_annotation/annotation_data.jsp


Data Source

TAIR:  The Arabidopsis Information Resource

Overview

Arabidopsis thaliana is a small flowering plant that is widely used as a model organism in plant biology. Arabidopsis is a member of the mustard (Brassicaceae) family, which includes cultivated species such as cabbage and radish. Arabidopsis is not of major agronomic significance, but it offers important advantages for basic research in genetics and molecular biology (from TAIR - The Arabidopsis Information Resource).

Statistics

This release of Phytozome includes TAIR annotation release 10 of the Arabidopsis thaliana genome release 9.

Genome

Approximately 135 Mb arranged in 5 chromosomes

Loci

27416 loci containing protein-coding transcripts

Transcripts

35386 protein-coding transcripts

Contacts

For general questions about the assembly and annotation, please contact the TAIR curators (email: curator AT arabidopsis DOT org)

Reference Publication(s)

Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, Karthikeyan AS, Lee CH, Nelson WD, Ploetz L, Singh S, Wensel A, Huala E. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic acids research. 2012 Jan; 40(Database issue):D1202-10.

Source : Phytozome


TAIR10 Genome release announcement

The Arabidopsis Information Resource (TAIR) is pleased to announce the release of the latest version of the Arabidopsis genome annotation (TAIR10). 

The latest release builds upon the gene structures of the previous TAIR9 release using RNA-seq and proteomics datasets as well as manual updates informed by cross species alignments, peptides and community input regarding missing and incorrectly annotated genes. 

TAIR10 statistics

The TAIR10 release contains 27,416 protein-coding genes, 4,827 pseudogenes or transposable element genes and 1,359 ncRNAs (33,602 genes in all, 41,671 gene models). A total of 126 new loci and 2,099 new gene models were added.

Eighteen percent (5885) of Arabidopsis genes now have annotated splice variants. Updates were made to 1184 gene models of which 707 had CDS updates. There were 41 gene splits and 37 gene merges. No changes were made to the Arabidopsis genome assembly for the TAIR10 release. 
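The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures; the variable names are illustrative, and the choice of denominator for the splice-variant percentage (all 33,602 genes rather than only the protein-coding ones) is an assumption, since the announcement does not state it.

```python
# Counts quoted in the TAIR10 release statistics.
protein_coding = 27_416
pseudogene_or_te = 4_827
ncrna = 1_359
gene_models = 41_671
genes_with_splice_variants = 5_885

# The three gene categories should add up to the stated gene total.
total_genes = protein_coding + pseudogene_or_te + ncrna


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: the "eighteen percent" figure is relative to all genes.
share_all_genes = round(percent(genes_with_splice_variants, total_genes), 1)
# Alternative denominator, shown for comparison only.
share_protein_coding = round(percent(genes_with_splice_variants, protein_coding), 1)
# Average number of gene models per gene.
models_per_gene = round(gene_models / total_genes, 2)

print(total_genes)           # 33602
print(share_all_genes)       # 17.5
print(share_protein_coding)  # 21.5
print(models_per_gene)       # 1.24
```

Under that assumption the share is 17.5%, which rounds to the eighteen percent quoted in the announcement.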

Gene annotation utilized available proteomics data (Baerenfaller et al., 2008 and Castellana et al., 2008) and RNA-seq data from the Ecker and Mockler labs (Lister et al., 2008; Filichkin et al., 2010). RNA-seq data were mapped to the Arabidopsis genome using TopHat, HashMatch or supersplat. After quality and low-complexity filtering, a total of ~200 million RNA-seq reads were successfully mapped to the genome. Of these, ~9 million represent spliced reads. Proteomics data and spliced RNA-seq reads were provided to Augustus, and the resulting gene models were categorised and manually reviewed. Validated gene updates, novel genes and novel splice variants were incorporated into the TAIR10 release. Additional spliced RNA-seq reads not already incorporated into gene models by Augustus were supplied to TAU. The resulting TAU models were again reviewed for potential novel splice variants. Transcript assemblies were generated via Cufflinks using all spliced reads and unspliced reads from the Ecker sets. Transcript assemblies were filtered and compared to existing gene models, resulting in the addition of 56 novel genes. Additional new proteome data provided to us by Katja Baerenfaller were used to directly update 24 gene models.
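To illustrate what separating "spliced" from unspliced reads can look like in practice, here is a minimal sketch that counts mapped reads whose alignment skips reference sequence. It assumes alignments in SAM format, where a CIGAR `N` operation marks a skipped region such as an intron. The announcement does not say how TAIR identified spliced reads, so the function names and the toy records below are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """True if the alignment contains an 'N' (skipped reference region)."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def count_spliced(sam_lines):
    """Return (mapped_reads, spliced_reads) for an iterable of SAM lines."""
    mapped = spliced = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # read is unmapped
        mapped += 1
        if is_spliced(cigar):
            spliced += 1
    return mapped, spliced


# Toy records (hypothetical reads, not real data).
seq, qual = "A" * 36, "I" * 36
toy_sam = [
    "@HD\tVN:1.6",
    f"r1\t0\tChr1\t100\t50\t36M\t*\t0\t0\t{seq}\t{qual}",
    f"r2\t0\tChr1\t500\t50\t20M85N16M\t*\t0\t0\t{seq}\t{qual}",
    f"r3\t4\t*\t0\t0\t*\t*\t0\t0\t{seq}\t{qual}",
]
mapped, spliced = count_spliced(toy_sam)
print(mapped, spliced)  # 2 1
```

By the figures quoted above, roughly 9 million of ~200 million mapped reads (about 4.5%) fall into the spliced category.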

Gene models created using the Gnomon pipeline were provided to TAIR by NCBI. Reanalysis of these models for TAIR10 resulted in 11 additional novel genes, 67 additional alternative splice variants and 178 updates to existing genes. 

Genome assembly updates (done for TAIR9)

In agreement with our reference genome policy, corrections to the reference assembly were only made if supported by at least two independently derived sequence libraries from the Columbia ecotype. The following updates were made to the chromosome sequences for the TAIR9 release:

227 single nucleotide substitutions were made to the assembly sequence based on re-sequencing data provided by Richard Clark (Ossowski et al. 2008) and Joe Ecker. 

341 indels were made to the assembly sequence based on re-sequencing data provided by Richard Clark and on EST and cDNA sequences deposited in GenBank that supported the insertion/deletion.

14 regions previously identified in TAIR8 as vector, E. coli or rice contamination, where the existing sequence had been substituted with the equivalent number of IUPAC ambiguity code 'N's, were standardized (via deletion) to a set size of 100 bp.
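The last of these updates, shrinking masked regions to a fixed length, can be sketched as follows. This is a simplified illustration, not TAIR's procedure: the actual update targeted 14 specific regions, whereas this hypothetical helper shortens any run of more than `gap_size` uppercase `N` characters, and it leaves shorter runs untouched.

```python
import re


def standardize_n_runs(seq, gap_size=100):
    """Shorten every run of more than gap_size 'N's to exactly gap_size 'N's.

    Bases are only deleted, never added, so runs of gap_size or fewer
    'N's are returned unchanged.
    """
    long_run = re.compile("N{%d,}" % (gap_size + 1))
    return long_run.sub("N" * gap_size, seq)


# Toy sequence: a 250-base masked region flanked by ordinary sequence.
masked = "ACGT" + "N" * 250 + "TTGA"
fixed = standardize_n_runs(masked)
print(len(masked), len(fixed))  # 258 108
```

Because such an edit deletes bases, coordinates downstream of each standardized region shift, which is one reason chromosome lengths differ between releases.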

All five nuclear chromosomes were updated for TAIR9; details of the golden path length of each chromosome can be found at http://www.arabidopsis.org/portals/genAnnotation/gene_structural_annotat...

Further details of these assembly changes and earlier TAIR8 updates can be found at ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/ 

Assembly updates and gap information can also be viewed in TAIR's GBrowse (see Assembly tracks section).

We would like to thank all those who contributed to the latest release by providing submissions for new and incorrectly annotated genes. 

TAIR wishes to thank Cornell University for use of the computer clusters at the Cornell Center for Advanced Computing (CAC). 

Source : The Arabidopsis Information Resource

Publication
Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, Karthikeyan AS, Lee CH, Nelson WD, Ploetz L, Singh S, Wensel A, Huala E. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic acids research. 2012 Jan; 40(Database issue):D1202-10.