Whole Genome Assembly and Annotation (v2.1) of Sorghum bicolor Rio

Summary

Name	Whole Genome Assembly and Annotation (v2.1) of Sorghum bicolor Rio
Description	Assembly: The Sorghum Rio genome assembly was constructed by Cooper et al. (2019) using FALCON (Chin et al., 2016) and polished with Quiver (Chin et al., 2013). The Sorghum Rio v2.1 assembly in SorghumBase corresponds to release v2.0 of Phytozome. A total of 35,627 unique, non-repetitive, non-overlapping 1 kb sequences were generated from the existing Sorghum bicolor v3.0 assembly and aligned to the polished Sorghum Rio assembly. Scaffolds were oriented, ordered, and assembled into 10 chromosomes. NCBI accession: GCA_015952705.1.
Annotation: Genome-guided transcript assemblies were made from close to 1 billion bp of 2x151 bp paired-end Illumina RNA-seq reads using PERTRAN (Shu, unpublished; see Cooper et al., 2019). PASA (Haas et al., 2003) alignment assemblies were constructed using the PERTRAN output from the Rio RNA-seq data along with sequences from known S. bicolor expressed sequence tags (ESTs) associated with the current reference genome. As further described in Phytozome, loci were determined by transcript assembly alignments and/or EXONERATE alignments of proteins from Arabidopsis thaliana, soybean, maize, rice, foxtail, Sorghum bicolor BTx623, brachy, grape, and Swiss-Prot proteomes to the Sorghum bicolor Rio genome, which was repeat-soft-masked using RepeatMasker (RepeatMasker Open-3.0 by AFA Smit, R Hubley & P Green, 1996-2011). Each locus was extended by up to 2 kb on both ends unless the extension ran into another locus on the same strand.
Gene models were predicted by the homology-based predictors FGENESH+ (Salamov and Solovyev, 2000), FGENESH_EST (similar to FGENESH+, but using ESTs as splice site and intron input instead of protein/translated ORFs), and GenomeScan (Yeh et al., 2001), from PASA assembly ORFs (in-house homology-constrained ORF finder), and from AUGUSTUS via BRAKER1 (Hoff et al., 2016). The best-scoring predictions for each locus were selected using multiple positive factors, including EST and protein support, and one negative factor: overlap with repeats.
The selected gene predictions were improved by PASA (Haas et al., 2003). PASA-improved gene model proteins were subjected to protein homology analysis against the above-mentioned proteomes to obtain a Cscore and protein coverage. PASA-improved transcripts were then selected based on Cscore, protein coverage, EST coverage, and the overlap of their CDS with repeats. Selected gene models were subjected to Pfam analysis, and gene models whose protein was more than 30% in Pfam TE domains were removed. For additional details, see Sorghum bicolor Rio v2.1 (Sorghum Rio) in Phytozome v12.1.
Program, Pipeline, Workflow or Method Name	PASA-improved
Program Version	n/a
Algorithm
Date Performed	Monday, February 13, 2023
Data Source	Source Name: Phytozome; Source Version: 2.1; Source URI: https://phytozome-next.jgi.doe.gov/info/SbicolorRio_v2_1
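
The final filtering step in the Description (removing gene models whose protein is more than 30% in Pfam TE domains) can be illustrated with a minimal Python sketch. This is not the Phytozome pipeline code. It assumes that "30% in Pfam TE domains" means the fraction of protein residues covered by at least one TE-domain hit, with overlapping hits counted once. The function names and the 1-based inclusive coordinate convention are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: a domain hit is a (start, end) pair in 1-based, inclusive
# residue coordinates on the protein.
Interval = Tuple[int, int]


def covered_length(intervals: List[Interval]) -> int:
    """Count residues covered by at least one interval, merging overlaps."""
    total = 0
    current_end = 0  # last residue already counted
    for start, end in sorted(intervals):
        if end <= current_end:
            continue  # fully inside a region that was already counted
        total += end - max(start, current_end + 1) + 1
        current_end = end
    return total


def te_fraction(protein_length: int, te_domain_hits: List[Interval]) -> float:
    """Fraction of the protein covered by TE-domain hits (assumed definition)."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return covered_length(te_domain_hits) / protein_length


def keep_gene_model(protein_length: int,
                    te_domain_hits: List[Interval],
                    max_te_fraction: float = 0.30) -> bool:
    """Keep a model unless MORE than max_te_fraction of it is in TE domains."""
    return te_fraction(protein_length, te_domain_hits) <= max_te_fraction
```

For example, a 200-residue protein with hypothetical TE-domain hits at residues 1-40 and 30-70 has 70 covered residues (35%), so `keep_gene_model(200, [(1, 40), (30, 70)])` rejects it under these assumptions.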

Publication

There are no publications associated with this record.

Annotations

This record has the following annotations.
Term	Name	Definition
There are no annotations of this type

Relationship

There are 0 relationships.