Difference between revisions of "SGD Newsletter, Fall 2021"


Revision as of 06:33, 10 November 2021

About this newsletter:
This is the Fall 2021 issue of the SGD newsletter. The goal of this newsletter is to inform our users about new features in SGD and to foster communication within the yeast community. You can view this newsletter as well as previous newsletters on our Community Wiki.

Updated Protein Complex pages

Nomenclature Updates

Legacy gene names

SGD has long been the keeper of the official Saccharomyces cerevisiae gene nomenclature. Robert Mortimer handed over this responsibility to SGD in 1993 after maintaining the yeast genetic map and gene nomenclature for 30 years.

The accepted format for gene names in S. cerevisiae comprises three uppercase letters followed by a number. The letters typically signify a phrase (referred to as the "Name Description" in SGD) that provides information about a function, mutant phenotype, or process related to that gene, for example "ADE" for "ADEnine biosynthesis" or "CDC" for "Cell Division Cycle". Gene names for many types of chromosomal features follow this basic format regardless of the type of feature named, whether an ORF, a tRNA, another type of non-coding RNA, an ARS, or a genetic locus. Some S. cerevisiae gene names that predate the current nomenclature standards do not conform to this format, such as MRPL38, RPL1A, and OM45.
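The format described above can be checked mechanically. The following is a minimal sketch, not SGD's own validation code: the pattern and the function name `conforms_to_standard` are assumptions made for illustration, and the test names ADE2 and CDC28 are only examples built on the "ADE" and "CDC" prefixes mentioned above.

```python
import re

# Assumed reading of the standard: exactly three uppercase letters
# followed by one or more digits, with nothing before or after.
STANDARD_GENE_NAME = re.compile(r"[A-Z]{3}[0-9]+")


def conforms_to_standard(name: str) -> bool:
    """Return True if the name is three uppercase letters plus a number."""
    return STANDARD_GENE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # RPL1A and OM45 are cited above as names that predate the standard.
    for name in ["ADE2", "CDC28", "RPL1A", "OM45"]:
        print(name, conforms_to_standard(name))
```

Using `fullmatch` rather than `search` matters here: a name such as RPL1A contains a conforming prefix (RPL1) but should still be rejected because of the trailing letter.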

A few historical gene names predate both the nomenclature standards and the database, and were less computer-friendly than more recent gene names, due to the presence of punctuation. SGD recently updated these gene names to be consistent with current standards and to be more software-friendly by removing punctuation. The old names for these four genes have been retained as aliases.

ORF      Old gene name  New gene name
YGL234W  ADE5,7         ADE57
YER069W  ARG5,6         ARG56
YBR298C  DUR1,2         DUR12
YIL154C  IMP2'          IMP2
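The renaming in the table amounts to dropping every character that is not a letter or a digit. Here is a small sketch of that rule, assuming it is the only transformation involved. The helper name `remove_punctuation` is hypothetical and this is not how SGD performed the update.

```python
import re


def remove_punctuation(gene_name: str) -> str:
    """Drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", gene_name)


# Old and new names exactly as listed in the table above.
LEGACY_TO_CURRENT = {
    "ADE5,7": "ADE57",
    "ARG5,6": "ARG56",
    "DUR1,2": "DUR12",
    "IMP2'": "IMP2",
}

if __name__ == "__main__":
    for old_name, new_name in LEGACY_TO_CURRENT.items():
        print(old_name, "->", remove_punctuation(old_name),
              "(matches table)" if remove_punctuation(old_name) == new_name else "(differs)")
```

Because the old names are retained as aliases, a lookup table like `LEGACY_TO_CURRENT` is also a simple way to keep older scripts or spreadsheets working with the new names.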

New Systematic Nomenclature for yeast genes not in the reference genome

For many years, a widely adopted systematic nomenclature has existed for yeast protein-coding genes, or ORFs, as many yeast researchers call them. Readers of the last SGD newsletter will recall that, earlier this year, SGD adopted a new systematic nomenclature for the entire annotated complement of ncRNAs.

We have just put into place a new systematic nomenclature for S. cerevisiae genes that are not found in the reference genome of strain S288C ("non-reference" genes). This new systematic nomenclature is similar to, but distinct from, that used for ORFs and that used for ncRNAs. Non-reference genes are designated by a symbol consisting of three uppercase letters and a four-digit number, as follows: Y for "Yeast", SC for "Saccharomyces cerevisiae", and a four-digit number corresponding to the sequential order in which the gene was added to SGD. We currently have 55 of these genes in SGD, some of which are old favorites like MAL21/YSC0004 and MATA/YSC0046, while others are more recent additions like XDH1/YSC0051. Going forward, as evidence is published pointing to other S. cerevisiae genes not present in the S288C reference genome, they will be added to the annotation using the next sequential number available. We already have 15 more of these YSC0000 names reserved by researchers and awaiting publication.
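As an illustration of the format just described, the sketch below builds and parses names of the form YSC plus a zero-padded four-digit number. The function names are hypothetical, and the assumption that numbering starts at 1 is ours, not something the newsletter states.

```python
import re

# "YSC" followed by exactly four digits, e.g. YSC0004.
NON_REFERENCE_NAME = re.compile(r"YSC([0-9]{4})")


def format_non_reference_name(order: int) -> str:
    """Build a systematic name from the order in which a gene was added."""
    # Assumption: valid sequence numbers run from 1 to 9999.
    if not 1 <= order <= 9999:
        raise ValueError(f"sequence number out of range: {order}")
    return f"YSC{order:04d}"


def parse_non_reference_name(name: str) -> int:
    """Return the sequence number encoded in a systematic name."""
    match = NON_REFERENCE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a non-reference systematic name: {name!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # Sequence numbers taken from the examples above (MAL21, MATA, XDH1).
    for order in (4, 46, 51):
        print(format_non_reference_name(order))
```

The fixed width means the names sort correctly as plain strings, so no special handling is needed to list them in the order they were added.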

If you have some non-reference genes for which these names would be appropriate, please let us know!

New links to AlphaFold 3D Predicted Protein Structure Database

Would you like to see the shape of your protein?

SGD now contains links to AlphaFold in the Resources sections of the Summary, Protein, and Homology pages for every gene.


  • The links through SGD give quick access to AlphaFold at EMBL’s European Bioinformatics Institute (EMBL-EBI), which offers a new, highly accurate tool for predicting protein structure with speed and clarity.
  • Given a peptide sequence for an uncharacterized protein, AlphaFold will model predicted domains and provide relative confidence levels for each portion of the prediction.
  • The predicted domains can then be compared to known protein structures (using a tool such as PDBeFold) to seek matches to characterized protein families.
  • Whether or not a family is identified, the comparison will yield clues to protein function to help design the next experiments.

YeastMine Updates

SGD has updated YeastMine with ...


Alliance of Genome Resources - latest release

The Alliance of Genome Resources, a collaborative effort from SGD and other model organism databases (MODs), released version 4.1 this past August. Notable improvements and new features include:

  • Human and model organism high throughput (HTP) variant data
    • Human variants are imported from Ensembl
    • Model organism HTP variants are submitted by Alliance members (FlyBase, RGD, SGD, WormBase) or imported from EVA (MGI and ZFIN).
    • Added HTP variants to the Alleles and Variants table on gene pages (e.g. rat Lepr Gene page) and to the table on the Alleles and Variants Details page (e.g. rat Lepr Alleles and Variants Details).
    • Created a report page for Human and model organism HTP variants (e.g. human variant rs1041354454).
    • Expanded Allele Category in search to “Allele/Variant” and added a search for HTP variants.
  • On Gene Pages, a new Pathways widget displays via tabs:
    • Reactome models of pathways for human gene products as well as inferred pathways for model organism genes based on orthology to human genes.
    • Reactome reactions for gene products (e.g. human TP53 Gene page)
    • Gene Ontology Causal Activity Models (GO-CAMs). These provide a framework to represent a biological system by linking together multiple GO annotations. PMID:31548717 (e.g. worm nsy-1 Gene page).
  • Experimental conditions are included for Disease and Phenotype data in tables on Gene, Allele, and Disease pages (e.g. zebrafish scn1lab Gene page).
  • AllianceMine added the Orthologs and the Alleles and Variants (low-throughput) data types in this release. You can now query for these data types via pre-made template queries.
  • The Alliance Community Forum has been released. The Forum permits discussions across six model organism communities: flies, mice, yeast, rats, worms, and zebrafish. More details will follow.

Upcoming Conferences

  • Fungal Genetics - the premier meeting for the international community of fungal geneticists
    • Asilomar Conference Grounds, Pacific Grove, California (and Online)
    • March 15 - 20, 2022
  • 36th International Specialised Symposium on Yeasts (ISSY36) - Yeast Sea to Sky - Yeast in the Genomics Era
    • University of British Columbia, Vancouver
    • July 12 - 16, 2022
  • Yeast Genetics Meeting - the premier meeting for students, postdoctoral scholars, research staff, and principal investigators studying various aspects of eukaryotic biology in yeast
    • University of California, Los Angeles
    • August 17 - 21, 2022

Gene Ontology Consortium Fall 2021 Meeting

From October 12-14, SGD biocurators attended the Gene Ontology Consortium's Fall Meeting with participants from around the world. The goal of these meetings is to bring together data scientists with diverse backgrounds (curators, programmers, etc.) for lively discussions regarding how to better capture, curate, analyze, and serve data to researchers, educators, students, and other life science professionals. Our goal in participating in these meetings each year is to find ways to make SGD even better for you!

Discussion topics included, but were not limited to:

  • LitSuggest - web-based system for biomedical literature recommendation and curation
  • ECO, Evidence and Conclusions Ontology - terms used to describe types of evidence and assertion methods
  • PAINT, Phylogenetic Annotation and INference Tool from PANTHER - orthology between reference genome genes and human disease genes

Happy Holidays from SGD!

We know that 2021 has been another challenging year for everyone. Our thoughts go out to all those who have been impacted by recent events. We wish you and your family, friends, and lab mates the best during the upcoming holidays.

Stanford University will be closed for two weeks starting on December 20, and will reopen on January 3rd, 2022. Although SGD staff members will be taking time off, the website will be up and running throughout the winter break, and we will resume responding to user requests and questions in the new year.