Frequently Asked Questions

Yes, you can now use * as a wild card anywhere in a search string. You must have at least three characters (letters, numbers, etc.) in addition to the wild card.

(Previously, wild cards did not work at the beginning of a search string, but PomBase now uses a different search technology that supports wild cards anywhere.)
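
The three-character rule can be checked before submitting a search. The following is a minimal sketch, not PomBase's own validation: the function name is hypothetical, and it assumes that every character other than "*" and whitespace counts towards the minimum.

```python
def is_valid_wildcard_query(query: str, min_chars: int = 3) -> bool:
    """Return True if the query has enough non-wildcard characters."""
    # Assumption: anything that is not '*' or whitespace counts as a
    # character; the real search engine may count differently.
    literal_chars = [c for c in query if c != "*" and not c.isspace()]
    return len(literal_chars) >= min_chars


print(is_valid_wildcard_query("*cdc"))  # wild card at the start, 3 other characters
print(is_valid_wildcard_query("cd*"))   # only 2 other characters
```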

Yes. In the Advanced Search, the Gene Systematic IDs and Gene Names filters both accept lists. You can type or paste lists of IDs/names into the box, separated by commas or with one ID or name per line.
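
If your IDs come from a spreadsheet or script, it can help to tidy the list before pasting it in. This is a small sketch under the assumption that commas and line breaks are the only separators; the function name is hypothetical, the IDs are used purely as illustrations, and removing duplicates is a convenience choice rather than a PomBase requirement.

```python
import re


def parse_gene_list(text: str) -> list[str]:
    """Split a pasted gene list on commas or line breaks."""
    items = [item.strip() for item in re.split(r"[,\r\n]+", text)]
    seen = set()
    result = []
    for item in items:
        # Skip empty entries and repeats, keeping the original order.
        if item and item not in seen:
            seen.add(item)
            result.append(item)
    return result


pasted = "SPBC11B10.09, SPAC2F7.03c\nSPBC11B10.09\n"
print("\n".join(parse_gene_list(pasted)))  # one ID per line, ready to paste
```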

At present, there is a fixed set of data retrieved when you execute the search. We plan to offer more flexible options in the near future. Later, we also hope to allow you to upload a file containing your gene list.

For convenience, there is a direct link to a search page pre-configured to accept a list of systematic IDs available in the Find menu, on the Find page, and here:

http://www.pombase.org/spombe/query/builder?filter=12

Once you have done a search for your genes, the list of results will be available in the Query Management section of the Advanced Search, allowing you to combine the list with other lists or with additional search criteria.

You can search for genes annotated to a Fission Yeast Phenotype Ontology term in the Advanced Search (http://www.pombase.org/spombe/query/builder or go to the Find tab and click "Advanced Search").

In the "Select Filter" pulldown, if you know the ID (for example, "inviable cell" is FYPO:0000049, and "elongated cell" is FYPO:0000017) choose "FYPO ID", and then type or paste the ID into the box. Otherwise, choose "FYPO Term Name" and start typing; the autocomplete feature will suggest phenotypes. Choose one, and click the Submit button to run the search. You can download the list in plain text or a few other formats from the query results page.

Note that the FYPO search retrieves annotations by following the is_a, part_of, output_of, has_output, and has_part relationships in the ontology. For example, FYPO includes the relation "inviable swollen elongated cell with enlarged nucleus" (FYPO:0002083) has_part "swollen cell" (FYPO:0000025). Genes annotated to FYPO:0002083 will therefore be retrieved in a search for FYPO:0000025. See the Advanced Search documentation for more information.
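
To illustrate how following relationships affects results, here is a toy sketch of the retrieval logic. It is not the PomBase implementation: the single edge is the has_part example given above, while the gene names and the function name are hypothetical.

```python
from collections import deque

# Relationships the FYPO search is described as following.
FOLLOWED = {"is_a", "part_of", "output_of", "has_output", "has_part"}

# (more specific term, relation, more general term); taken from the example above.
EDGES = [
    ("FYPO:0002083", "has_part", "FYPO:0000025"),
]

# Hypothetical direct annotations, for illustration only.
ANNOTATIONS = {
    "FYPO:0002083": {"geneA"},
    "FYPO:0000025": {"geneB"},
}


def genes_for_term(term_id, edges=EDGES, annotations=ANNOTATIONS):
    """Collect genes annotated to a term or to any term linked to it."""
    # Index the edges so we can walk from a term to the terms pointing at it.
    linked = {}
    for child, relation, parent in edges:
        if relation in FOLLOWED:
            linked.setdefault(parent, set()).add(child)

    found, seen, queue = set(), {term_id}, deque([term_id])
    while queue:
        term = queue.popleft()
        found |= annotations.get(term, set())
        for child in linked.get(term, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return found


print(sorted(genes_for_term("FYPO:0000025")))  # includes the gene annotated to FYPO:0002083
```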

Example query: Genes annotated to "elongated cell" (FYPO:0000017), all alleles

Also see the FAQ on finding essential genes.

If an essential gene is deleted, the cell cannot survive under normal laboratory conditions. A search for deletion alleles annotated to the Fission Yeast Phenotype Ontology term "inviable vegetative cell population" (FYPO:0002061) would therefore identify essential fission yeast genes. Similarly, deletion alleles annotated to "viable vegetative cell population" (FYPO:0002060) represent non-essential genes.

Downloadable summary

A set of "viability summary" data, as shown at the top of the FYPO table on each gene page, is available as a downloadable file. The file has two columns: the gene systematic ID and one of three values: "viable", "inviable" or "condition-dependent".
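
The two-column layout is simple to read programmatically. The sketch below assumes the columns are tab-separated (check the downloaded file, as the delimiter is not stated here) and uses hypothetical gene IDs in place of real data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the described two-column layout.
SAMPLE = "geneA\tviable\ngeneB\tinviable\ngeneC\tcondition-dependent\n"

ALLOWED = {"viable", "inviable", "condition-dependent"}


def read_viability(handle):
    """Map each systematic ID to its viability value."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        gene_id, status = row
        if status not in ALLOWED:
            raise ValueError(f"unexpected viability value: {status!r}")
        table[gene_id] = status
    return table


viability = read_viability(io.StringIO(SAMPLE))
counts = Counter(viability.values())
print(counts)
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.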

Querying

  • To find genes annotated to "inviable vegetative cell population", select the "FYPO ID" filter and type or paste the ID, FYPO:0002061. Set the Allele Expression pulldown to "Null Expression" and submit the query. The results include all genes that showed inviable phenotypes in the HTP deletion project as well as some manually annotated genes. Do the same for viable (FYPO:0002060).
  • For some deletion mutants, viability depends on experimental conditions, which cannot yet be queried in PomBase. These genes are annotated to both viable (FYPO:0002060) and inviable (FYPO:0002061) at once. To find them, use the "AND" operator in the Query Management panel (this search can also be set up all at once in the New Query panel).
  • See the Advanced Search documentation for more information on performing the searches described here.
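
If you download both result lists, the same combinations can be reproduced with set operations. This is an illustrative sketch with hypothetical gene IDs; treating "inviable only" as essential is one reading of the description above, not an official PomBase definition.

```python
# Hypothetical downloaded results for the two null-allele queries.
viable = {"geneA", "geneC"}     # FYPO:0002060
inviable = {"geneB", "geneC"}   # FYPO:0002061

# The "AND" operator corresponds to a set intersection.
condition_dependent = viable & inviable

# Genes found by only one of the two queries.
essential = inviable - viable
non_essential = viable - inviable

print(sorted(condition_dependent), sorted(essential), sorted(non_essential))
```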

A brief note about FYPO terms

At present, there are very few null mutants annotated as inviable in life cycle stages other than vegetative growth, and "inviable vegetative cell population" best fits the most common usage of "essential gene". If you do want to include other stages (such as "inviable spore"), you can use the very generic term "inviable cell population" (FYPO:0002059) or "viable cell population" (FYPO:0002058) in your query. All of the caveats about alleles and conditions still apply.

Query links

Gene Ontology (GO) cellular component annotations capture the localizations of gene products to subcellular structures such as organelles or complexes. GO Cellular Component annotations are displayed on PomBase gene pages as described in the PomBase GO documentation. The GO Consortium provides documentation that describes what the Cellular Component ontology includes. To search for proteins (or functional RNAs) with a particular localization, use the Gene Ontology filter in the Advanced Search to find genes annotated to the relevant GO Cellular Component term(s).

PomBase GO Cellular Component annotations include data from the whole-genome localization study (Matsuyama et al. 2006) as well as manually curated data from papers on small-scale experiments, and inferences from ortholog annotations. Macromolecular complex annotations are also available in a file (see FAQ).

Example query: nucleus (GO:0005634)

PomBase uses Gene Ontology (GO) molecular function terms to capture the activities -- including enzymatic activities, binding, transporters, etc. -- of gene products. You can therefore use the GO filters in the Advanced Search to retrieve genes whose products have a given activity.

In the "Select Filter" pulldown, if you know the ID (for example, "histone acetyltransferase activity" is GO:0004402, and "calcium ion transmembrane transporter activity" is GO:0015085) choose "GO ID", and then type or paste the ID into the box. Otherwise, choose "GO Term Name" and start typing; the autocomplete feature will suggest terms. Choose one, and click the Submit button to run the search. You can download the list in plain text or a few other formats from the query results page. You can try using more specific or less specific terms to retrieve the results that best fit your expectations and needs. See the Advanced Search documentation and the Gene Page GO documentation for more information, including how ontology searches retrieve annotations to general terms.

Example query: phosphoprotein phosphatase activity (GO:0004721)

Protein modifications (where curated) are included in the Modifications section on gene pages. (We plan to include RNA modifications later.) The Gene Page modifications documentation describes the display.

To retrieve all genes whose products have a given modification, use the PSI-MOD filter in the Advanced Search. In the "Select Filter" pulldown, if you know the ID (for example, "phosphorylated residue" is MOD:00696) choose "PSI-MOD ID", and then type or paste the ID into the box. Otherwise, choose "PSI-MOD Term Name" and start typing; the autocomplete feature will suggest terms. Choose one, and click the Submit button to run the search. See the Advanced Search documentation for more information, including how ontology searches retrieve annotations to general terms.

We are aware that protein modification curation is relatively incomplete. If you know of any protein modifications that are missing from the gene pages or the search results, please notify the PomBase curators.

Example query: phosphorylated residue (MOD:00696)

In the Advanced Search (http://www.pombase.org/spombe/query/builder), choose "Transmembrane Domains" under the "Protein Filters" heading in the "Select Filter" pulldown. Enter the minimum and maximum number of domains (use the same number as minimum and maximum to retrieve proteins with, e.g. exactly 7 transmembrane domains), and submit.

Example query: Genes whose products have 7 transmembrane domains

Go to the Advanced Search - http://www.pombase.org/spombe/query/builder
Under "Select Filter" choose "Genes By Type", then choose "snoRNA". Click "Submit". You can download the resulting list of genes or the genomic sequences. Also see the FAQ on retrieving sequence coordinates.

Note that there are likely a number of snoRNAs that have not yet been identified and annotated in S. pombe; we hope to investigate further in the future.

Query: snoRNA genes

Downloadable intron datasets are available in FASTA format from the Intron Data page.

You can also find genes with introns using the PomBase Advanced Search. To find all genes with introns, search for genes with a specified number of exons, and use the range 2 (i.e. at least one intron) to 20 (more than the maximum known, 16 introns). You can also restrict the search to protein-coding genes. Note that the PomBase count includes introns in UTRs.

Instructions for searching PomBase

  1. Go to the Advanced Search - http://www.pombase.org/spombe/query/builder
  2. Under "Select Filter" choose "Genes That Have N Exons" (under the "Gene Filters" heading)
  3. Enter values: Minimum 2, Maximum 20
  4. Optional: to restrict to protein-coding genes, click "+". Leave the operator set to "AND", and choose "Genes by Type", then choose "protein_coding".
  5. Click "Submit". The results page has links to download the resulting list of genes or the genomic, cDNA or protein sequences. Note that we plan to offer additional download options, including coordinates, in the future. In the meantime, see the FAQ on finding sequence features in a region.
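
The arithmetic behind the 2 to 20 range is that a gene with N exons has N - 1 introns. A tiny sketch, with hypothetical function names, makes the bounds explicit:

```python
def intron_count(exon_count: int) -> int:
    """A gene with N exons has N - 1 introns."""
    return max(exon_count - 1, 0)


def in_exon_range(exon_count: int, minimum: int = 2, maximum: int = 20) -> bool:
    """Mirror the Minimum/Maximum values entered in the filter."""
    return minimum <= exon_count <= maximum


# 16 introns (the maximum mentioned above) means 17 exons, inside the range.
print(intron_count(17), in_exon_range(17), in_exon_range(1))
```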

Query link: protein-coding genes with 2-20 exons

A dataset of intron branch sites is available as a track in the Genome Browser. The data were published in:

Bitton DA, Rallis C, Jeffares DC, Smith GC, Chen YY, Codlin S, Marguerat S, Bähler J. LaSSO, a strategy for genome-wide mapping of intronic lariats and branch points using RNA-seq. Genome Res. 2014 Jul;24(7):1169-79. (PMID:24709818; DOI:10.1101/gr.166819.113)

To view this data track, follow the instructions in the track configuration FAQ, and select the Intron Branch Point track.

Transcript start and end coordinates from all sources will be available as individual data tracks in the Ensembl genome browser in the near future, which will allow you to view, evaluate and download them. We also provide downloadable UTR data sets that are updated periodically.

Also see the precedence criteria used to choose default UTR features to display on gene pages.

To retrieve UTRs for a specified list of genes, see the FAQ on downloading sequences for multiple genes (choose 5' UTR and/or 3' UTR in step 9).

In PomBase, S. cerevisiae orthologs are curated for S. pombe genes as described in the Orthologs documentation.

To find S. pombe orthologs for a budding yeast gene, you can search for the systematic name (ORF name) of the S. cerevisiae gene in the Simple Search (go to http://www.pombase.org/search/ensembl or use the search box in the page header). For example, S. cerevisiae LRP1 has the systematic name YHR081W, and a search on this in PomBase will retrieve the S. pombe gene cti1. Note that only systematic names can be searched for S. cerevisiae, to avoid confusion in cases where unrelated genes coincidentally have the same name in S. pombe and S. cerevisiae. To find systematic names of S. cerevisiae genes, you can search SGD.

Also see the FAQ on downloading the full set of orthologs.

There are various ways you can find protein family members.

  1. If you know the Pfam, PRINTS, PROSITE, or InterPro accession for the family or domain you want, you can use the Advanced Search (http://www.pombase.org/spombe/query/builder). Go to the New Query tab, choose "Proteins That Have Specific Protein Domains" in the "Select Filter" pulldown, enter the accession, and submit.
  2. If you don't have an accession, but do know any member of the family, go directly to its gene page. In the "Protein Features" section of the gene page there is a table of protein domains and families, which includes a link to a list of all family members in S. pombe.
  3. If you know neither accessions nor family members, you can search for keywords in the InterPro database (http://www.ebi.ac.uk/interpro/), which combines signatures from a number of member databases, including Pfam. Record the accession number(s) of the family, and use them in the PomBase advanced search as described in item 1 above. (If necessary, you can use Query Management to combine the results of several queries.)

You can also try a keyword search in the PomBase advanced search, but this is much less reliable, because a keyword search may retrieve some proteins that don't have the domain or aren't family members due to coincidentally matching words in gene product descriptions. In the future, we plan to add the ability to search the full text of gene pages, which will provide another option for finding protein family information.

Example query: Proteins matching "ATPase, AAA-type, core" (Pfam:PF00004)

Yes: In the Advanced Search (http://www.pombase.org/spombe/query/builder), choose the "Conserved in ..." filter option. Then choose one of the descriptions, and submit.

Example query: Genes conserved in vertebrates

In PomBase, human orthologs are curated for S. pombe genes as described in the Manually Curated Orthologs documentation.

To find S. pombe orthologs for a human gene, you can search for the standard human gene name in the Simple Search (go to http://www.pombase.org/search/ensembl or use the search box in the page header). For example, searching for human ABTB1 will retrieve the S. pombe gene btb1. To find standard human gene names, you can search HGNC. Note that in a few cases, a human gene name will coincidentally match a name or synonym of a non-orthologous S. pombe gene as well as the actual curated ortholog(s), so please check the gene pages carefully, especially if your search retrieves more than one result.

Also see the FAQs on finding genes conserved in human, finding disease gene orthologs, and on downloading the full set of curated orthologs.

Almost all genes that are conserved between fission yeast and human are also conserved in other vertebrates (there are two exceptions, genes encoding amino acid biosynthesis proteins that have become pseudogenes in human). To retrieve these genes, go to the Advanced Search (http://www.pombase.org/spombe/query/builder), and choose the "Conserved in ..." filter option. Then choose the description "Conserved in vertebrates", and submit.

Also see the FAQs on finding disease gene orthologs, finding the ortholog of a specific gene, and on downloading the full set of curated orthologs.

Query link: Genes conserved in vertebrates

S. pombe genes whose human orthologs have been implicated in disease are annotated with terms from the internal PBO vocabulary. To retrieve all of these genes, you can use the most general "disease associated" term. To do the query manually:

  1. Go to the Advanced Search (http://www.pombase.org/spombe/query).
  2. Find the term:
    1. Select the 'PBO Term Name' filter, start typing 'disease associated', and choose 'disease_associated' from the autocomplete options; or
    2. Select the 'PBO ID' filter and enter 'PBO:5000000'.
  3. Submit the query.

You can also type all or part of a specific disease name (e.g. 'cancer') into the 'PBO Term Name' filter to see if any matches come up in the autocomplete suggestions. Also see the FAQs on finding genes conserved in human, finding the ortholog of a specific gene, and on downloading the full set of curated orthologs.

Example queries:

For orthologs that are not manually curated by PomBase, we suggest two approaches:

Compara

You can search for orthologs/paralogs in Fungi, or in a pan-taxonomic comparison (eukaryotes), using Compara in the Ensembl browser.

  1. On any gene page, go to the Orthologs section (scroll or use the Quick Links box).
  2. Follow the relevant link to Compara - for fungal alignments, choose "View orthologs in other fungal species with Compara", or for all eukaryotic species choose "View orthologs across taxonomic space using pan-species Compara".
  3. You should see a "collapsed" gene tree highlighting your fission yeast gene of interest. From here you can click on any node to see a menu of options:
    1. Expand or collapse specific sub-nodes of the tree, or expand the tree fully
    2. View the alignment in FASTA format
    3. Launch the jalview multiple alignment viewer to see the full alignment and colour by residue conservation, hydrophobicity, etc.

To configure the protein entries visible in the alignment, select the most "inclusive" node you require. You can reduce the number of entries by collapsing individual sub-trees (the first option under step 3) before you generate your alignment. A brief video demonstrates using the Compara trees.

Information about how the Compara trees are generated, homology types, and species is available from the Ensembl comparative genomics documentation.


YOGY

From any gene page, follow the link to YOGY under External References.

YOGY is a web-based resource for retrieving orthologous proteins from ten eukaryotic organisms and one prokaryote: Homo sapiens, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, Dictyostelium discoideum, Drosophila melanogaster, Caenorhabditis elegans, Plasmodium falciparum, Escherichia coli, Schizosaccharomyces pombe, and Saccharomyces cerevisiae. Using a gene or protein identifier from any of these organisms as a query, this database provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, Homologene, OrthoMCL

The "Drugs with known S. pombe targets" page lists drugs that have been shown to affect S. pombe, with brief summaries of their targets.

If you notice any errors or omissions on this page, or can provide any supporting references, please email the helpdesk.

You can search for GO terms by name or ID in the PomBase Advanced Search, and retrieve a list of all genes annotated to the term and its descendants via the relations is_a, part_of, regulates, positively_regulates, and negatively_regulates. For example, a search for "cytokinesis" will include genes annotated to "regulation of cytokinesis". (See the GO documentation on Ontology Structure and Ontology Relations for more information.)

S. pombe GO annotations are also available in browsers that use the GO repository, notably AmiGO and QuickGO. Both browsers have extensive documentation available:

Hint: to find S. pombe annotations, use Taxon: 4896 (Schizosaccharomyces pombe) or Source: PomBase. You can download the results in GAF format.
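
Once you have a GAF download, the taxon hint can also be applied offline. The sketch below assumes the standard GAF 2.x column order (column 2 is the gene product ID, column 5 the GO ID, column 13 the taxon); the sample rows, including the reference placeholder, are hypothetical.

```python
def pombe_annotations(lines, taxon="taxon:4896"):
    """Yield (gene product ID, GO ID) pairs for one taxon from GAF lines."""
    for line in lines:
        if line.startswith("!"):
            continue  # GAF header and comment lines start with '!'
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a complete annotation row
        # Column 13 can hold 'taxon:A|taxon:B'; the first entry is the
        # organism of the annotated gene product.
        if cols[12].split("|")[0] == taxon:
            yield cols[1], cols[4]


def _row(*cols):
    return "\t".join(cols)


# Hypothetical rows, for illustration only.
SAMPLE_GAF = [
    "!gaf-version: 2.0",
    _row("PomBase", "geneA", "symbolA", "", "GO:0005634", "PMID:0000000",
         "IDA", "", "C", "", "", "protein", "taxon:4896", "20140101", "PomBase"),
    _row("OtherDB", "geneX", "symbolX", "", "GO:0005634", "PMID:0000000",
         "IDA", "", "C", "", "", "protein", "taxon:4932", "20140101", "OtherDB"),
]

print(list(pombe_annotations(SAMPLE_GAF)))
```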

In PomBase, GO IDs on gene pages link to QuickGO, and ontology detail pages for GO terms offer links to both AmiGO and QuickGO.

In the future, we plan to make Fission Yeast Phenotype Ontology (FYPO) terms and annotations available in a browser analogous to AmiGO or QuickGO. Until such a browser becomes available, FYPO is accessible in these external resources:

NCBO BioPortal - search on the BioPortal home page, go to the FYPO summary page, or go to the FYPO terms page. For assistance, see the "User Interface" part of the BioPortal Help.

EBI's Ontology Lookup Service (OLS) - search on the OLS home page or go to the FYPO page. Help is provided on each page.

The Advanced Search includes a filter that retrieves all genes at once. Simply go to the New Query panel (http://www.pombase.org/spombe/query/builder), select the filter 'All Genes', and submit the query.

Query link: All genes

 

Yes, you can search for short nucleotide sequences, such as primers or other oligomers, in the PomBase BLAST. For sequences less than 20 nt long, however, you may need to change the search sensitivity from "Normal" to "Short sequences" using the pulldown menu at the bottom of the query form.

Go to the Genome Browser (in the Tools menu), and enter coordinates in the 'Search for:' box. The format is 'I:100000..200000' or 'I:100000-200000' (i.e. use Roman numerals to specify the chromosome, and don't include the word "chromosome"; use either '..' or '-' between the start and end coordinates.)
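
If you generate coordinate strings in a script, a quick format check can catch mistakes before you paste them into the browser. This sketch is not part of PomBase; it assumes the three nuclear chromosomes I, II and III are the only valid names, and the function name is hypothetical.

```python
import re

# Roman-numeral chromosome, a colon, then start and end joined by '..' or '-'.
REGION = re.compile(r"^(III|II|I):(\d+)(?:\.\.|-)(\d+)$")


def parse_region(text: str):
    """Return (chromosome, start, end) or raise ValueError."""
    match = REGION.match(text.strip())
    if match is None:
        raise ValueError(f"not in 'I:100000..200000' format: {text!r}")
    chromosome = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start coordinate is larger than end coordinate")
    return chromosome, start, end


print(parse_region("I:100000..200000"))
print(parse_region("II:100000-200000"))
```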

You can do this in the Genome Browser (from a gene page or the Tools menu). First, enter the coordinates, then click the Export Data button on the left-hand side. In the Output pulldown (topmost in the popup window) choose one of the formats under the "Feature File" header. Then follow the remaining steps to retrieve the sequence features -- add flanking sequences, select options for your selected output format, etc.

At present, there isn't a good way to retrieve flanking sequences for multiple genes in bulk directly from PomBase. (You can download coding sequences via the Advanced Search, or flanking sequences for individual genes via the gene page Sequence section.) We hope to add a more convenient option in the near future, but in the meantime, we recommend using the Ensembl Genomes Biomart query:

  1. Go to http://fungi.ensembl.org/biomart/martview/
  2. Select the database “Ensembl Fungi Genes” from the "CHOOSE DATABASE" drop-down menu.
  3. Select “Schizosaccharomyces pombe genes” from the "CHOOSE DATASET" drop-down menu. Additional options will appear in the left-hand sidebar.
  4. In the left-hand menu, click on the header “Filters”.
  5. Expand the section “GENE” by clicking the + sign
  6. In the drop-down menu in the section “ID list limit” select “PomBase Gene ID(s)”. (This will automatically tick the "ID list limit" box.) In the box underneath, type or paste a list of S. pombe gene names or systematic IDs.
  7. In the left-hand menu, click on the header “Attributes”.
  8. Click the “Sequences” button, and expand the “SEQUENCES” section.
  9. Click a button to select which sequences you want. In the cartoon, red or black highlighting indicates what each option retrieves. Key: |---, 5' flanking region; leftmost box, 5' UTR; inner boxes, coding exons; rightmost box, 3' UTR; ---|, 3' flanking region; ^, introns.
  10. To include flanking regions, tick one or both of the "Upstream flank" and "Downstream flank" boxes, and enter the length you want. (Note: the "flank" options in the button selections retrieve ONLY flanking sequence, and will only retrieve 3' or 5' in any given query, not both.)
  11. When you have specified what you want, find the "Results" button in the header and click it. You will be able to view or download the results, or have them emailed to you.

To find genes within a given set of chromosome coordinates, you must perform a complex query in the Advanced Search to specify which chromosome you want, and what coordinates. You can do the queries separately and then combine them, or set up a single query. In either case, first go to the Advanced Search page (http://www.pombase.org/spombe/query).

To query by separate steps:

  1. Select the 'Genes Between Chromosome Coordinates' filter. Enter your start and end coordinates, and click 'Submit'.
  2. Click the 'New Query' tab (in the horizontal list just under the page header).
  3. Select the 'Genes On Chromosome...' filter. Choose a chromosome from the pulldown. Click 'Submit'
  4. Click the 'Query Management' tab.
  5. Select your last two queries by checking the boxes on the left.
  6. Click 'Join (AND)' to find genes that match both sets of criteria.

(You can do the chromosome and coordinate queries in either order; it makes no difference.)

To do a single query:

  1. Select the 'Genes Between Chromosome Coordinates' filter. Enter your start and end coordinates.
  2. Click '+' to add another query parameter. Leave the 'Operator' pulldown on the left set to 'AND' (the default).
  3. Select the 'Genes On Chromosome...' filter. Choose a chromosome from the pulldown. Click 'Submit' to execute the entire query.
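
Both approaches amount to intersecting a coordinate filter with a chromosome filter. The sketch below shows that logic on hypothetical gene records; it assumes a gene must lie entirely within the range to match, which may differ from how the PomBase filter treats genes that overlap the boundaries.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chromosome: str
    start: int
    end: int


# Hypothetical records, for illustration only.
GENES = [
    Gene("geneA", "II", 1200000, 1201500),
    Gene("geneB", "II", 2500000, 2501000),
    Gene("geneC", "I", 1500000, 1502000),
]


def between_coordinates(genes, start, end):
    return {g.name for g in genes if g.start >= start and g.end <= end}


def on_chromosome(genes, chromosome):
    return {g.name for g in genes if g.chromosome == chromosome}


# Separate queries joined with AND give the same result in either order.
result = between_coordinates(GENES, 1000000, 2000000) & on_chromosome(GENES, "II")
print(sorted(result))
```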

See the Advanced Search documentation for more information on setting up complex queries.

To find all features (not only genes), use the Genome Browser as described here.

Example query: Genes between coordinates 1000000-2000000 on chromosome 2

No; at present only the Ensembl genome browser is available via the PomBase web site. (As of May 2013, we are investigating the possibility of adding an Artemis applet to PomBase, and will update this FAQ accordingly when it becomes available.)

If you want to browse the S. pombe genome in the Artemis environment, it is fairly easy to download and run locally:

Once you have loaded the file(s), you can do many different things, e.g.:

  • Find features by name or ID
  • Find all features of a given type (e.g. see the "can I find transposons" FAQ)
  • Find matches to a specific nucleotide sequence (e.g. see the "restriction enzyme map" FAQ)
  • View the nucleotide or amino acid sequence of a region or feature
  • Export selected sequences

A video demonstrating Artemis installation is available on YouTube. See the Artemis FAQ and the Artemis manual (pdf; Sanger site) for additional information.

On gene pages, the source of the annotated transcript coordinates is shown with the UTR coordinates in the Transcript section (e.g. cdc2). PomBase curators have chosen default UTR features using three data sources and a set of precedence criteria:

  1. Highest priority is given to data from low-throughput "conventional" experiments performed on individual mRNAs and reported in publications or submitted to EMBL. Where low-throughput data are not available, one of three high-throughput datasets is used.
  2. The Broad data published in 2011 by Rhind et al. (PMID:21511999) are given precedence because they are the most recent, have higher resolution, and detected splicing within the UTRs. Note: This study used a "greedy" algorithm to determine the longest possible transcript from transcriptome reads, which may result in the prediction of longer UTRs than are actually present. Use these data with caution, and refer to the transcript profiling data in the genome browser for genes of interest.
  3. For genes not covered by (1) or (2), start/end data from Lantermann et al. (PMID:20118936) based on transcriptome data from Dutrow et al. (PMID:18641648) are used where available.
  4. For genes not covered by (1), (2) or (3), we use data from Wilhelm et al. (PMID:18488015).
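
The precedence order above can be expressed as a simple ordered lookup. This is only a sketch of the rule as described; the source labels and the function name are hypothetical and do not correspond to identifiers used in PomBase files.

```python
# Highest priority first, following items 1 to 4 above.
PRECEDENCE = ["low_throughput", "broad_rhind", "lantermann", "wilhelm"]


def choose_utr_source(available):
    """Return the highest-priority source that has data for a gene."""
    for source in PRECEDENCE:
        if source in available:
            return source
    return None  # no UTR data from any of the listed sources


print(choose_utr_source({"wilhelm", "lantermann"}))
print(choose_utr_source({"broad_rhind", "low_throughput"}))
```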

More information is available in the mailing list archive for two HTP datasets (Broad: http://listserver.ebi.ac.uk/pipermail/pombelist/2011/000856.html ; Lantermann/Dutrow: http://listserver.ebi.ac.uk/pipermail/pombelist/2011/000814.html).

Transcript start and end coordinates from all sources will be available as individual data tracks in the Ensembl genome browser in the near future, which will allow you to view and evaluate them. PomBase will also curate splice and transcript variants as data become available.

Yes. First, make sure sequence display is "on":

  1. From the Location tab, click on "Configure this page"
  2. In the left hand menu of the pop-up, click on "Sequence and assembly"
  3. If the box next to "Sequence" is blank, click it

(also see the FAQ on configuring tracks).

The Location tab shows two graphics. Between the two images there is a slider that controls the zoom. Click the '+' or drag the vertical bar to the left to zoom in. The lower graphic will first display colored blocks representing color-coded nucleotides, and then, at maximum zoom, legible sequence. (Note that the vertical bar may appear to be all the way to the left of the slider before you actually zoom in enough to read the sequence; keep clicking '+' if necessary.)

At any zoom level, use the arrows flanking the zoom slider to scroll along the sequence.

To search PomBase for transposable elements:

  1. Go to the Advanced Search - http://www.pombase.org/spombe/query/builder
  2. Under "Select Filter" choose "Gene Annotation Status" and then choose "Transposon".
  3. Click "Submit". The results page has links to download the resulting list of genes or the genomic, cDNA or protein sequences. Note that we plan to offer additional download options, including coordinates, in the future.

At present, there are 11 full-length transposons annotated, and two frameshifted copies.

Query link: Transposons

Lone LTRs are also annotated as sequence features. They cannot yet be retrieved by the simple or advanced searches, but they can be displayed on a track in the Ensembl browser (under "Repeats").

Finally, if you wish to install Artemis (available from http://www.sanger.ac.uk/resources/software/artemis/), you can use it to view LTRs in more detail. Read in the EMBL format files of sequence and annotation (available from the Genome Datasets page). To see LTRs,

  1. In the Select menu, choose "By Key".
  2. In the pulldown that pops up, choose "LTR".

A video demonstrating Artemis installation is available on YouTube. See the Artemis FAQ and the Artemis manual (pdf; Sanger site) for additional information.

PBO is an internal set of terms used for various PomBase annotations that do not fit into any of the other ontologies in use. PBO IDs and term names can be queried in the Advanced search, and are most useful if you have noted a term or ID from a gene page. Examples include complementation annotations (e.g. cdc2 'functionally complemented by H. sapiens CDK1' PBO:0012584), disease association, and "miscellaneous" annotations (e.g. pom1 'forms a polar gradient' PBO:0000437).

PomBase offers two ways to find polyadenylation sites and usage:

  • Each gene page has a link in the External References section to the Pomb(A) database of polyadenylation signal and cleavage sites.
  • Fission yeast polyadenylation data are available in the genome browser. To display the data:
    1. From any gene page, click the "View in Genome Browser" link.
    2. Go to the "Configure this page" option in the left-hand menu.
    3. Select "Polyadenylation sites" in the left-hand menu of the popup window. Select the tracks you want to show, then click the "tick" in the corner.

Further help with configuring browser tracks is available.

We plan to offer a downloadable list of protein-coding genes (5052 as of release version 23_47, October 2014) in the near future.

In the meantime, you can use the Advanced Search to retrieve a list. All protein coding genes have the type "protein coding", but this type also includes a few transposon genes and several genes that are dubious (i.e. predicted by automated methods considered unlikely to actually encode protein), which you will presumably want to exclude from the set. To do so, use the NOT operator and the "Annotation Status" filter. The query is:

Genes By Type protein coding
NOT Annotation Status dubious
NOT Annotation Status transposon

query for protein coding genes

You can also perform the query in separate steps:

  1. In the New Query panel, query for Feature Type protein coding (query 1)
  2. New Query - NOT Annotation Status dubious (query 2)
  3. New Query - NOT Annotation Status transposon (query 3)
  4. In the Query Management panel, select 2 and 3 and combine them with OR (union); this forms query 4.
  5. Also in Query Management, select query 1 and query 4, and follow the instructions to combine them with NOT.
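
In set terms, the single query and the stepwise version give the same answer: removing "dubious" and then "transposon" is equivalent to removing their union. A short sketch with hypothetical gene IDs:

```python
# Hypothetical result sets, for illustration only.
protein_coding = {"gene1", "gene2", "gene3", "gene4"}  # query 1
dubious = {"gene3"}                                    # query 2
transposon = {"gene4"}                                 # query 3

# Single query: apply each NOT in turn.
single_query = (protein_coding - dubious) - transposon

# Stepwise: union of queries 2 and 3 (query 4), then query 1 NOT query 4.
excluded = dubious | transposon
stepwise = protein_coding - excluded

print(sorted(single_query), single_query == stepwise)
```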

See the Advanced Search documentation for more information on performing the search described here.

Query link: Protein-coding genes (excluding 'dubious' and 'transposon' status)

A file of cDNA sequences in FASTA format is available on the Genome Datasets page.

Available options:

  1. Download one of the files available via the Genome Datasets page. The GFF3 files contain coordinates, whereas the EMBL- and GenBank-format files contain both coordinates and sequence data. You can then parse the files for the feature type you need. For example, to find all non-coding RNAs, search for "ncRNA_gene"; for coding sequences, use "CDS", etc. There are also separate files available for CDS and UTR data.
  2. If you only need genes, you can use the Advanced Search to find all genes of a given type. (Note that non-gene features such as repeats cannot be retrieved by this method.) Select the "Genes By Type" filter, then choose a type from the pulldown menu. The results include coordinates, and the "Download Results" options include sequences in FASTA format. If you need more than one feature type, query for each type and then use Query Management to combine the individual queries with the OR operator. See the Advanced Search documentation for more information.
  3. The bioinformatically inclined can also use the Ensembl Genomes REST API to retrieve transcript feature coordinates, as described in the FAQ on pombe transcriptome sequences. Select the desired feature type(s) from the output file of stable IDs (bear in mind that Ensembl idiosyncratically uses "biotype" to mean feature type).
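
For the first option, a minimal sketch of parsing a GFF3 file for one feature type might look like the following. It relies only on the standard GFF3 layout (nine tab-separated columns, with the feature type in column 3). The function name `features_of_type` and the sample rows are made up for illustration, and the feature types actually present in a given PomBase file should be checked before relying on a particular name.

```python
def features_of_type(gff3_lines, feature_type):
    """Yield (seqid, start, end, strand, attributes) for rows of one feature type."""
    for line in gff3_lines:
        # Skip directives, comments, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature row (e.g. embedded sequence data)
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        if ftype == feature_type:
            yield seqid, int(start), int(end), strand, attrs


# Made-up example rows (placeholder IDs and coordinates, not real PomBase data).
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "chrX\texample\tgene\t1\t90\t.\t+\t.\tID=example_gene_0",
    "chrX\texample\tncRNA_gene\t100\t200\t.\t+\t.\tID=example_gene_1",
])

for row in features_of_type(SAMPLE_GFF3.splitlines(), "ncRNA_gene"):
    print(*row)
```

To use it on a downloaded file, pass an open file handle instead of the sample lines.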

Example advanced search query: snoRNA genes

PomBase offers two ways to view nucleotide-level similarity between S. pombe and S. japonicus, S. octosporus, or S. cryophilus. Both use the Genome Browser.

  1. To view nucleotide similarity data tracks in the browser, follow the usual steps as described in the data track FAQ. Select the data type "Comparative Genomics".
  2. Display syntenic regions as follows:
    1. Go to your region of interest in the browser (e.g. follow the link from a gene page or use sequence coordinates). Make sure the "Location" tab is selected in the horizontal set of tabs along the top.
    2. In the left-hand menu, find the "Comparative Genomics" heading, and click on "Region Comparison".
    3. To select a species for comparison, go to the bottom of the left-hand menu, and click the "Select species or regions" link (it may appear to be subtly blinking; we apologise for this anomaly).
    4. In the popup, click the "+" beside any species in the "Unselected species or regions" list to move it to the "Selected species or regions" list. Note: "lastz" is the nucleotide alignment algorithm used. Close the popup - click the tick/check mark in the upper right corner, or click outside the popup.
    5. Synteny views will now be visible in the bottom-most graphical display (scroll down if necessary). For any region in the S. pombe genome, pink tracks show the region in the second genome with the best nucleotide alignment. Green bands connect the best-aligned regions to highlight synteny.
    6. A video is available demonstrating this feature.

There is a data track available for transcription factor binding sites in the genome browser. Follow the instructions for showing tracks, and choose "Transcription Factor Binding Sites" in the left-hand menu of the pop-up.

Also see the FAQ on finding transcription factors.

Go to the Advanced Search - http://www.pombase.org/spombe/query/builder
Under "Select Filter" choose "Genes By Type", then choose "rRNA". Click "Submit". You can download the resulting list of genes or the genomic sequences. Also see the FAQ on retrieving sequence coordinates.

Also see the FAQ on rDNA sequences.

Query: rRNA genes

There are two possible approaches:

1. Retrieve a set of GO annotations in GAF format for S. japonicus, S. octosporus or S. cryophilus, as described in the relevant FAQ. Use the GO annotation dataset and your gene list for enrichment.

OR

2. In your gene list of interest, substitute the Schizosaccharomyces species gene IDs with the IDs of orthologous S. pombe genes. For ortholog IDs, see the FAQ on Schizosaccharomyces orthologs, and use the indicated table from Rhind et al. Comparative functional genomics of the fission yeasts (PMID:21511999).

In either case, you can then proceed as described in the FAQ on S. pombe GO enrichment. For the first option, use the Princeton GO Term Finder or another enrichment tool that allows you to use your own GAF, and include the GO Slim analysis using GO Term Mapper as recommended in the FAQ on enrichment in S. pombe.

For the sequenced strains of S. japonicus, S. octosporus and S. cryophilus, the Ensembl group has generated GO annotation data sets for protein-coding genes by transferring experiment-based annotations from S. pombe orthologs. You can use the QuickGO browser to retrieve the data for each species -- follow the "Search and Filter GO annotation sets" link, then click "Filter" to set a taxon filter for the taxon ID:

S. japonicus (strain yFS275) - 402676
S. octosporus (strain yFS286) - 483514
S. cryophilus (strain OY26) - 653667

Because these automated annotations are inferred only from experimentally-derived S. pombe annotations, coverage will not be complete.

Note that the GAF downloaded from QuickGO uses UniProtKB accessions in the gene product ID column (column 2). To use the GAF in any further analysis, such as term enrichment, you will have to convert the accessions to systematic IDs. See the FAQ on ID mapping for hints.
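
As a rough sketch of the conversion step, the following replaces the accession in column 2 of each GAF row using a lookup table that you supply. The mapping, the accessions, and the function name `convert_gaf_ids` are hypothetical, and the sample rows are truncated placeholders. Rows with no mapping are dropped here, which may or may not suit your analysis.

```python
def convert_gaf_ids(gaf_lines, accession_to_systematic):
    """Replace the column-2 accession with a systematic ID; drop unmapped rows."""
    for line in gaf_lines:
        if line.startswith("!"):          # GAF header/comment lines pass through
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue                      # blank or malformed row
        new_id = accession_to_systematic.get(cols[1])
        if new_id is None:
            continue                      # no mapping available for this accession
        cols[1] = new_id
        yield "\t".join(cols) + "\n"


# Placeholder rows, truncated for brevity; real GAF rows have more columns.
SAMPLE_GAF = [
    "!gaf-version: 2.0\n",
    "UniProtKB\tACC0001\tsym1\t\tGO:EXAMPLE1\n",
    "UniProtKB\tACC0002\tsym2\t\tGO:EXAMPLE2\n",
]
# Hypothetical mapping; build the real one as described in the FAQ on ID mapping.
MAPPING = {"ACC0001": "EXAMPLE_SYS_ID_1"}

for converted in convert_gaf_ids(SAMPLE_GAF, MAPPING):
    print(converted, end="")
```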

One feasible approach to improve annotation coverage is to download the S. pombe GO annotations (see the GO Associations download page), and then substitute the S. pombe IDs with the IDs of orthologous genes from the other Schizosaccharomyces species of interest. For ortholog IDs, see the FAQ on Schizosaccharomyces orthologs, and use the indicated table from Rhind et al. Comparative functional genomics of the fission yeasts (PMID:21511999).

Note that some genes are present in S. japonicus, S. octosporus or S. cryophilus but absent from S. pombe. For some of these gene products, GO annotations can be transferred from other species. If you wish to include annotations for these genes in your analysis you will need to use this option, and extend your GAF with the relevant annotation lines (contact the Helpdesk if you need assistance).

Combining all approaches gives the best coverage possible at present. You can use a "GO Slim" tool such as Princeton's GO Term Mapper to see if there are any gaps in coverage, as described in the FAQ on enrichment in S. pombe. Also see the FAQs on GO term enrichment in other Schizosaccharomyces species.

At present, if you need sequences for all tRNAs, rRNAs, other ncRNAs, etc. we recommend using the Advanced Search and results download as described in the FAQ on retrieving sequence coordinates for all features of a particular type.

Downloadable FASTA sequence datasets will be added to the PomBase FTP site in the near future.

On a gene-by-gene basis, you can use the link to "View orthologs in other fungal species with Compara" as described in the FAQ on orthologs in other species.

For a full set of orthologous genes in S. pombe, S. cryophilus, S. japonicus and S. octosporus, see Table S12, columns AD-AG, in Rhind et al. Comparative functional genomics of the fission yeasts (PMID:21511999).

The best way to find metabolism-related annotations for S. pombe genes is to use the GO annotation data available from PomBase in combination with mappings between GO terms and entries in the various metabolism-oriented databases.

For example, many GO molecular function (MF) terms representing enzymatic activities are mapped to the corresponding Enzyme Commission (EC) number for the reaction, and some are also mapped to entries from KEGG or from the Rhea database of annotated chemical reactions. GO MF and biological process (BP) terms may be annotated to reactions or pathways, respectively, in MetaCyc or Reactome.

A complete list, with descriptions and links, is available on the GO Consortium's Download Mappings page.

FYPO enrichment analysis is analogous to GO term enrichment, using phenotypes rather than GO annotations, i.e. analysing a gene list by finding FYPO terms that are significantly over- or under-represented among the annotations for the genes.

At present, PomBase does not have its own FYPO enrichment tool, and very few ontology enrichment tools can use phenotype data. One that does is AnGeLi, produced by Jürg Bähler's lab.

A small number of enrichment tools use phenotype data. See the FAQ on FYPO term enrichment.

We recommend using only the genome sequence, either from PomBase downloadable files or from the sequence retrieval tools on the gene pages and in the genome browser. Although there are some sequence updates still pending, the genome sequence is more accurate than individual gene sequences that predate the genome.

Many older S. pombe sequence submissions to the DNA databases (International Nucleotide Sequence Database Collaboration databases, i.e. ENA, GenBank, DDBJ) contain one or more errors (sometimes with an error rate as high as 20%), and we do not have the resources to maintain past sequences or flag every error in PomBase.

The PomBase genome browser includes a data track of core promoter locations from Li et al. (2015) Genome-wide analysis of core promoter structures in Schizosaccharomyces pombe with DeepCAGE. RNA Biol. 12:525-37. PMID:25747261. The promoter track is listed under the Regulatory Elements menu, and is best viewed using the "Labels" track style, which labels promoter features using the associated gene IDs (see the browser configuration FAQ for more information).

We will add new tracks for any more promoter data sets that are submitted to us.

PomBase has very few manually curated promoters. Those that exist are displayed on a genome browser track, "PomBase Annotated Promoter (SO:0000167)". This track is under the Sequence and assembly menu, and is activated in the default configuration. To search the manually curated promoters, we suggest that you use Artemis. Follow the instructions in this FAQ, and search for features with "Key" = "promoter".

A selection of protein sequence motifs and features have been manually curated using terms from the Sequence Ontology (SO). For example, Rad54 has a KEN box (a motif recognized by the anaphase-promoting complex; SO:0001807), and Cuf1 and Trz1 have nuclear localization signals (NLS; SO:0001528). These annotations are included in the Protein Features section of the gene page.

To search for these features, use one of the "Sequence Ontology" filters in the Advanced Search (see the documentation for help with searching).

Also see the FAQs on transmembrane domains and protein families, and the section of the search documentation on using Protein Filters.

Example query: nuclear localization signal (SO:0001528)

If there is complementation data available for an S. pombe gene, it will be displayed in the Complementation section of the gene page. For example, ura3 can be complemented by S. cerevisiae URA1, and itself complements human DHODH.

To search for complementation annotations, use one of the "PBO" filters in the Advanced Search (see the documentation for help with searching). The complementation descriptions are stored as entries in the PBO internal ontology, so a search for PBO term names that match "complements" or "complemented by" will retrieve genes with complementation data curated. The most general term, "complementation" (PBO:2000000) retrieves all genes that have any complementation annotation.

Example queries:

 

To find genes that have any effect on a process, we recommend searching for both GO and FYPO terms relevant to the process.

As described in the FAQ on GO and FYPO annotations, PomBase curators annotate all genes with phenotypes that affect a process, whereas GO annotations are restricted to genes whose products act directly in a process or its regulation. By querying for genes annotated to either a GO term or a FYPO term, you can find genes with relevant phenotypes (including "downstream effects") as well as genes involved in a process (with or without mutant phenotypes affecting the process).

Use the "OR" operator in the PomBase Advanced Search, available in Query Management, as described in the Advanced Search documentation. For example, to find genes that affect cellular respiration, search for "FYPO:0000078 (abnormal cellular respiration) OR GO:0045333 (cellular respiration)". For any process, you can try using more specific or less specific terms to retrieve the results that best fit your expectations and needs.

Example query: genes annotated to 'abnormal cellular respiration' (FYPO:0000078) or 'cellular respiration' (GO:0045333)

You can use the Ensembl Genomes (EG) MySQL database access to query S. pombe data. Note, however, that there is often a time lag in updating EG, so it may not have data as up-to-date as on the PomBase web site. MySQL dumps of EG data, including Schizosaccharomyces species, are available from EG's FTP site. (We plan to provide MySQL dumps for PomBase releases soon.)

For Chado, we do not have a publicly accessible PostgreSQL server. Instead, you can download Chado database dumps to query locally.

The GO annotations available from PomBase (gene pages, advanced search, etc.) and the GO Consortium site (AmiGO; GO downloads) differ from those available from the UniProt GOA site (including QuickGO) for three main reasons:

  1. RNA - PomBase provides GO annotations for functional RNAs (e.g. rRNA, tRNA, snRNA), but at present the UniProt GOA dataset only includes annotations for protein-coding genes.
  2. Time lag - S. pombe GO data are updated at the same time on the PomBase and GO Consortium sites, but the UniProt GOA site may be up to a few weeks behind.
  3. Filtering - PomBase does not include automated annotations that are redundant with manual annotations (contact the Helpdesk for further details). The GO Consortium site uses the same filtered annotation dataset as PomBase, whereas the UniProt GOA site includes the automated annotations.



You can find the GO annotations for your genes corresponding to functional roles and localizations. Our recommended approach depends on how many specific topics you are interested in:

  • For a small number of specific GO terms (e.g. localization to the nucleus or cytoplasm, or a role in signaling or DNA metabolism), you can import your gene list into the Advanced Search and then combine it with a query for each term of interest (use the "Systematic IDs" filter for your list, and then the Term name or GO ID filter; see the search documentation for more information).
  • If you are interested in many GO terms, or if you do not know in advance which terms may be relevant, we recommend that you use a "GO term enrichment" tool. Such tools are typically used to find terms overrepresented for a gene list, but can be used to retrieve all GO annotations if the p-value threshold is set artificially high.

Both the Advanced Search and term enrichment tools take advantage of the hierarchical structure of GO, such that annotations to specific terms are propagated to "ancestor" terms via is_a and part_of relations. See the PomBase GO documentation, and the GO Consortium documentation linked there, for more information. (These approaches also make it easier to maintain and update your data than storing individual GO annotations locally.)

Also see the FAQ on GO term enrichment and the PomBase GO Slim page.

"GO term enrichment" refers to analysing a gene list by finding GO terms that are significantly over- or under-represented among the annotations for the genes. Finding GO terms that are shared by genes in your list can help you find out what they have in common biologically.

PomBase does not have its own GO enrichment tool, but we recommend one, and provide a bit more information, in the FAQ on GO term enrichment.

Yes, the Phenotype annotations page offers two options, a complete phenotype annotation file and a "viability summary" for deletion mutants. At present, the full file contains all manually curated single mutant phenotypes, and is in the same format as PomBase uses for bulk phenotype data submissions (see the file formats FAQ). Further information on the viability summary is available in the essential genes FAQ.

Orphan genes are generally defined as genes without homologs in other organisms. In PomBase, genes conserved in the Schizosaccharomyces genus are distinguished from genes conserved only in S. pombe.

To retrieve either set of genes, use the "Conserved in" filter in the Advanced Search. Choose "Schizosaccharomyces specific" for genes found in more than one Schizosaccharomyces species, or "Schizosaccharomyces pombe specific" for genes found only in S. pombe. See the Advanced Search documentation for help with performing searches.

Historical note: Prior to August 2014, PomBase and its predecessor GeneDB referred to single-copy genes conserved within, but not outside, the Schizosaccharomyces genus as "sequence orphans". See the Gene Characterisation Statistics History page for more details (note that the gene characterisation classifications reflect whether a gene has been studied experimentally as well as the extent of its conservation).

All sequence-specific DNA-binding transcription factors should be annotated to at least two GO Molecular Function terms, either directly or by transitivity (i.e. annotated to a more specific "descendant" term linked to one of these terms):

  • GO:0000976 transcription regulatory region sequence-specific DNA binding (view in QuickGO or AmiGO)
  • GO:0003700 sequence-specific DNA binding transcription factor activity (view in QuickGO or AmiGO)

Annotation extensions are used to capture two types of "target" data (where available):

  • Annotations to GO:0000976 (or a descendant) may have extensions that capture DNA binding specificity using Sequence Ontology (SO) terms. A list of DNA binding sites identified in S. pombe is available on the DNA Binding Sites page.
  • Annotations to GO:0003700 (or a descendant) may have extensions identifying target genes.

Because it is not yet possible to query annotation extensions in the PomBase Advanced Search, to identify target genes you must either inspect transcription factor gene pages manually, or search the GO annotation dataset. For the latter:

  1. Download the GO annotation file (GAF) from the GO Associations page. The file is tab-delimited text, so it can be opened in a spreadsheet application or parsed with a script; the format is described on the GO website.
  2. Look up the GO IDs for the specific terms to which genes are directly annotated -- the "Child Terms" feature in QuickGO is good for this (for "transcription factor activity", the most commonly used term is GO:0000978, RNA polymerase II core promoter proximal region sequence-specific DNA binding).
  3. Find annotations to the GO IDs of interest (GO ID is column 5; gene ID column 2), and then look at the annotation extensions (column 16).
  4. Contact the Helpdesk if you have any problems or questions.
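
Step 3 can be scripted along these lines. The sketch uses only the column positions given above (gene ID in column 2, GO ID in column 5, annotation extension in column 16). The function name `annotation_extensions` and the sample row, including its extension text, are invented placeholders rather than real annotations.

```python
def annotation_extensions(gaf_lines, go_ids):
    """Collect (gene ID, GO ID, extension) for annotations to any ID in `go_ids`."""
    hits = []
    for line in gaf_lines:
        if line.startswith("!"):          # skip GAF header/comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 16:
            continue                      # row too short to carry an extension
        gene_id, go_id, extension = cols[1], cols[4], cols[15]
        if go_id in go_ids and extension:
            hits.append((gene_id, go_id, extension))
    return hits


# One made-up 17-column row; every value is a placeholder.
SAMPLE_ROW = "\t".join([
    "DB", "EXAMPLE_GENE_1", "sym1", "", "GO:0000978", "REF:0", "IDA", "",
    "F", "", "", "protein", "taxon:0", "20000101", "DB",
    "example_relation(EXAMPLE:target_1)", "",
])

for hit in annotation_extensions(["!header\n", SAMPLE_ROW + "\n"], {"GO:0000978"}):
    print(hit)
```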

Finally, note that not all S. pombe transcription factors have been extensively characterised with respect to target genes, and for those that have, target curation in PomBase may be incomplete. You may therefore wish to query for transcription factors that have been experimentally characterised, and therefore might have targets which are not yet curated. To do so, use the Advanced Search to find which of the genes annotated to the transcription factor-related GO terms above have the annotation status "published" (e.g. GO ID "GO:0000978" AND Annotation Status "published"; see the Advanced Search documentation for more tips on setting up the query).

Query links:

Genes annotated to 'transcription regulatory region sequence-specific DNA binding' (GO:0000976) or 'sequence-specific DNA binding transcription factor activity' (GO:0003700)

Genes annotated to 'RNA polymerase II core promoter proximal region sequence-specific DNA binding' (GO:0000978) with annotation status 'published'

Replication origin coordinates are not yet included in PomBase. We hope to obtain comprehensive origin data in the genome browser in the future, but we rely on user submissions to set priorities for adding data tracks to the browser.

Until replication origins are available in PomBase, we suggest that you use the data collated by Conrad Nieduszynski in S. pombe OriDB:

At present this is not possible; all advanced search results are ordered alphabetically by systematic ID.

As a workaround, we suggest that you download the results using the TSV link, and import them into a spreadsheet. You can then combine the results with other data in the spreadsheet, and use the spreadsheet software to sort on any column.
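
If you prefer a script to a spreadsheet, the same workaround can be sketched with Python's standard csv module. The column names in the sample are assumptions for illustration only; check the header row of your own download. Note that this sorts values as text.

```python
import csv
import io


def sort_tsv(tsv_text, column, reverse=False):
    """Parse tab-separated text with a header row and sort the rows on one column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = sorted(reader, key=lambda row: row[column], reverse=reverse)
    return reader.fieldnames, rows


# Made-up download contents; the IDs and names are placeholders.
SAMPLE_TSV = "systematic_id\tname\nEXAMPLE_2\tzeta1\nEXAMPLE_1\talpha1\n"

header, rows = sort_tsv(SAMPLE_TSV, "name")
print("\t".join(header))
for row in rows:
    print("\t".join(row[field] for field in header))
```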

The best route depends on the amount and type of data you have.

  • If you have a published paper with "small-scale" data (a few genes, several different data types, etc.), you can curate it in Canto, PomBase's online community curation tool. The Canto documentation describes the supported data types (Gene Ontology, phenotypes, interactions, modifications) and how to use the curation interface.
  • If you have data from large-scale experiments that is associated with sequence coordinates, it can be displayed as a track in the Genome Browser. Use the HTP Data Submission form to send details.
  • If you have a large set of phenotype data (e.g. from a screen of the deletion collection), especially if there are many annotations to one or a few FYPO term(s), you can send them in the spreadsheet format supported by the Batch Phenotype Data Submission form.
  • For large sets of genetic or physical interaction data, we recommend that you prepare a spreadsheet using the template provided by BioGRID (see "Step 1. Send Us Your Interaction Data"). You are welcome to send the spreadsheet either to BioGRID or to PomBase (via the helpdesk); PomBase and BioGRID exchange data regularly so all interactions will appear in both databases.

You can do both Canto curation and large-scale data submission for a single publication, if it reports both large- and small-scale experimental results.

For any data types not listed above, or if you have any questions, please contact the PomBase helpdesk.

PomBase welcomes large sets of published data. The recommended submission route depends on the data type:

  • For any data that can be associated with genome sequence coordinates (e.g. gene expression, ChIP-seq protein localisation, variation, etc.), please use the Data Submission Form for HTP sequence-linked data.
  • Several types of data associated with genes can be displayed on gene pages. PomBase has developed bulk upload formats for phenotype, modification, and gene expression data -- see the FAQ on file formats for links to the file format descriptions and data submission forms.

If you have any other type of large-scale data -- or if you have problems or questions regarding the available submission forms -- please contact the Helpdesk.

At present, PomBase can host any types of data that can be connected with sequence features or coordinates, and can display the data as tracks in the genome browser. We accept data in any of several formats, listed on the genome browser documentation page on Adding Custom Tracks to Ensembl. To choose a file format for your data, consult the table below and the linked FAQs. Please consult the helpdesk if you need further assistance.

File format  Recommended for
BAM          sequence alignments, especially from high-throughput experiments
BED          sequence features with coordinates
bedGraph     values attached to genome locations/regions
bigBed       sequence features with coordinates
bigWig       values attached to genome locations/regions
GFF3         sequence features with coordinates
PSL          sequence alignments
VCF          structural variations, such as SNPs, insertions, deletions, or copy number variants
WIG          values attached to genome locations/regions

We can also accept batch submissions of certain types of data that appear on PomBase gene pages. For these data types, we use dedicated PomBase-specific formats as shown in the table:

Data type                     File format description                   Submission form
Phenotypes                    phenotype file format                     submit phenotype data
Modifications                 modification file format                  submit modification data
Qualitative gene expression   qualitative gene expression file format   submit qualitative gene expression data
Quantitative gene expression  quantitative gene expression file format  submit quantitative gene expression data

 

We may be able to accept data in other text formats. Please enquire via the PomBase helpdesk if you have any questions about your data format.

Wiggle (WIG) is a file format for display of continuous-value data in a genome browser track.

WIG format is described at the UCSC Genome Bioinformatics web site, and the Broad Institute file format guide provides additional information.

PSL is a tab-delimited text format that represents sequence alignments.

PSL format is described in the UCSC Genome Bioinformatics FAQ, and the Broad Institute file format guide provides additional information.

BigWig is a file format for display of dense, continuous data in a genome browser track, created by conversion from Wiggle (WIG) format.

BigWig format is described at the UCSC Genome Bioinformatics web site, and the Broad Institute file format guide provides additional information.

BigBed is a binary file format that is created by conversion from BED, and thus stores similar types of data for display in a genome browser track.

BigBed format is described at the UCSC Genome Bioinformatics web site, and the Broad Institute file format guide provides additional information.

BedGraph is a file format that allows display of continuous-valued data in a genome browser track.

BedGraph format is described at the UCSC Genome Bioinformatics web site, and the Broad Institute file format guide provides additional information.

If you have strand-specific data that can be represented in bedGraph format, we recommend submitting two bedGraph files, one per strand.
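
As an illustration of the one-file-per-strand recommendation, the sketch below groups strand-annotated records into two sets of bedGraph lines (chromosome, start, end, value, tab-separated). The input record layout, the function name `split_by_strand`, and the sample values are assumptions made for this example. Your own pipeline may hold the data differently.

```python
def split_by_strand(records):
    """Group (chrom, start, end, strand, value) records into bedGraph lines per strand."""
    lines = {"+": [], "-": []}
    for chrom, start, end, strand, value in records:
        # bedGraph rows carry no strand column, hence one output per strand.
        lines[strand].append(f"{chrom}\t{start}\t{end}\t{value}")
    return lines


# Made-up records; coordinates and values are placeholders.
SAMPLE_RECORDS = [
    ("chrX", 0, 100, "+", 1.5),
    ("chrX", 100, 200, "-", 0.25),
    ("chrX", 200, 300, "+", 3.0),
]

per_strand = split_by_strand(SAMPLE_RECORDS)
for strand, rows in per_strand.items():
    # In practice, write each list to its own file, e.g. one for "+" and one for "-".
    print(strand, len(rows))
```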

BED is a tab-delimited text format that defines a feature track for a genome browser.

BED format is described in the UCSC Genome Bioinformatics FAQ, and the Broad Institute file format guide provides additional information.

Variant Call Format (VCF) is a text file format used to describe structural variations, such as SNPs, insertions, deletions, or copy number variants.

The file format specification is available from the 1000 Genomes web site.

The UCSC Genome Bioinformatics FAQ and the Broad Institute file format guide provide additional information.

BAM is a binary file format used for nucleotide sequence alignment data.

The file format specification (PDF) is available from the SAMtools web site.

The UCSC Genome Bioinformatics FAQ and the Broad Institute file format guide provide additional information.

Generic Feature Format Version 3 (GFF3) is a tab-delimited text file format used to represent genomic sequence features.

PomBase produces a GFF3 file of S. pombe sequence features, and accepts high-throughput data submissions in this format.

The file format specification is available from the Sequence Ontology web site, which also provides a link to validation software.

In the Ensembl genome browser, click the "Configure this page" button in the left-hand bar. A pop-up box will appear. Note that this box has several tabs along its top, and the exact selection of tabs and configuration options depends on whether you are configuring the "Location", "Gene", or "Transcript" tab of the main browser.

To turn a track on or off, click the small box to the left of its description. Note that some tracks simply toggle on and off, whereas for others a small popup appears, in which you can select from a set of options controlling exactly how the track appears. The left-hand bar of the configuration popup organizes available tracks into subsets, and offers a few additional options (including "Reset configuration", which restores the default display).

For example, to show or hide repeat regions, make sure you have the "Location" tab selected. The tabs for this configuration allow you to configure the "Region" (lower) and "Overview" (upper) images separately. In the "Configure Region Image" tab, click "Repeat regions" in the popup's left-hand bar. You can then check one box to show all repeats, or select specific types of repeat to display.

When you are finished choosing tracks, click the tick/check mark in the upper right corner of the configuration popup.

PomBase offers two ways to find polyadenylation sites and usage:

  • Each gene page has a link in the External References section to the Pomb(A) database of polyadenylation signal and cleavage sites.
  • Fission yeast polyadenylation data are available in the genome browser. To display the data:
    1. From any gene page, click the "View in Genome Browser" link.
    2. Go to the "Configure this page" option in the left-hand menu.
    3. Select "Polyadenylation sites" in the left-hand menu of the popup window. Select the tracks you want to show, then click the "tick" in the corner.

Further help with configuring browser tracks is available.

Genome sequence files can be downloaded from the Genome Datasets page in several different formats.

If an essential gene is deleted, the cell cannot survive under normal laboratory conditions. A search for deletion alleles annotated to the Fission Yeast Phenotype Ontology term "inviable vegetative cell population" (FYPO:0002061) would therefore identify essential fission yeast genes. Similarly, deletion alleles annotated to "viable vegetative cell population" (FYPO:0002060) represent non-essential genes.

Downloadable summary

A set of "viability summary" data, as shown at the top of the FYPO table on each gene page, is available as a downloadable file. The file has two columns: the gene systematic ID and one of three values: "viable", "inviable" or "condition-dependent".
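
A short sketch of reading such a two-column file follows. It assumes the columns are separated by whitespace (for example a tab), which should be checked against the actual download. The function name `read_viability` and the sample IDs are placeholders.

```python
from collections import Counter

ALLOWED = {"viable", "inviable", "condition-dependent"}


def read_viability(lines):
    """Map each systematic ID to 'viable', 'inviable' or 'condition-dependent'."""
    summary = {}
    for line in lines:
        parts = line.split()
        # Ignore blank lines and anything that is not an ID plus a known value.
        if len(parts) != 2 or parts[1] not in ALLOWED:
            continue
        summary[parts[0]] = parts[1]
    return summary


# Made-up rows; the IDs are placeholders, not real systematic IDs.
SAMPLE_LINES = [
    "EXAMPLE_1\tviable\n",
    "EXAMPLE_2\tinviable\n",
    "EXAMPLE_3\tcondition-dependent\n",
]

summary = read_viability(SAMPLE_LINES)
essential = sorted(gene for gene, status in summary.items() if status == "inviable")
print(essential)
print(Counter(summary.values()))
```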

Querying

  • To find genes annotated to "inviable vegetative cell population", select the "FYPO ID" filter and type or paste the ID, FYPO:0002061. Set the Allele Expression pulldown to "Null Expression" and submit the query. The results include all genes that showed inviable phenotypes in the HTP deletion project as well as some manually annotated genes. Do the same for viable (FYPO:0002060).
  • For some deletion mutants, viability depends on experimental conditions, which cannot yet be queried in PomBase. These genes are annotated to both viable (FYPO:0002060) and inviable (FYPO:0002061) at once. To find them, use the "AND" operator in the Query Management panel (this search can also be set up all at once in the New Query panel).
  • See the Advanced Search documentation for more information on performing the searches described here.

A brief note about FYPO terms

At present, there are very few null mutants annotated as inviable in life cycle stages other than vegetative growth, and "inviable vegetative cell population" best fits the most common usage of "essential gene". If you do want to include other stages (such as "inviable spore"), you can use the very generic term "inviable cell population" (FYPO:0002059) or "viable cell population" (FYPO:0002058) in your query. All of the caveats about alleles and conditions still apply.

Yes, there is a file that lists GO macromolecular complex assignments for fission yeast gene products in the FTP directory:

ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Complexes/

Note that the complex inventory includes the RNA subunits of ribonucleoprotein complexes. There is some redundancy in the list, because some gene products are annotated to both complexes and subcomplexes. For example, subunits of the DASH complex (GO:0042729) are annotated to 'condensed chromosome outer kinetochore' (GO:0000940) as well as GO:0042729. Additional notes are available in a README file: ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Complexes/README

Also see the FAQ on localization.

The current version of the manually curated list of orthologs and orthologous groups identified between fission and budding yeast is available for download from the Orthologs page (linked from the Datasets page).

The current version of the manually curated list of orthologs and orthologous groups identified between fission yeast and human is available for download from the Orthologs page (linked from the Datasets page).

Also see the FAQs on finding genes conserved in human, finding disease gene orthologs, and finding the ortholog of a specific gene.

There is no single transcriptome sequence file available from PomBase at present. Several transcriptomic data sets are available as tracks in the PomBase genome browser. The GFF3 genome feature files available from the Genome Datasets page include the coordinates of the annotated full-length transcript features.
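
For readers who want to pull transcript coordinates out of a GFF3 file themselves, here is a minimal Python sketch. It relies only on the standard nine-column, tab-delimited GFF3 layout. The feature type name ("mRNA") and the IDs in the sample are assumptions for illustration, so check which type names the PomBase files actually use.

```python
import io

def transcript_coords(handle, feature_types=("mRNA",)):
    """Yield (seqid, start, end, strand, attributes) for selected GFF3 features."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        if ftype not in feature_types:
            continue
        # Column 9 holds semicolon-separated key=value pairs
        attr_map = dict(
            field.split("=", 1) for field in attrs.split(";") if "=" in field
        )
        yield seqid, int(start), int(end), strand, attr_map

# Hypothetical GFF3 content for illustration only
sample_gff = (
    "##gff-version 3\n"
    "I\tExample\tgene\t1000\t2500\t.\t+\t.\tID=GENE0001\n"
    "I\tExample\tmRNA\t1000\t2500\t.\t+\t.\tID=GENE0001.1;Parent=GENE0001\n"
    "I\tExample\tCDS\t1100\t2400\t.\t+\t0\tID=GENE0001.1:exon:1;Parent=GENE0001.1\n"
)
records = list(transcript_coords(io.StringIO(sample_gff)))
for record in records:
    print(record)
```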

The bioinformatically inclined can also use the Ensembl Genomes REST API to retrieve transcript feature coordinates. The FAQ on programmatic access to PomBase provides an introduction to using the API, some pombe-specific examples, and links to additional documentation.

The Broad Institute has archived genomic data files for the Schizosaccharomyces species, including transcript files.

A dataset of intron branch sites is available as a track in the Genome Browser. The data were published in:

Bitton DA, Rallis C, Jeffares DC, Smith GC, Chen YY, Codlin S, Marguerat S, Bähler J. LaSSO, a strategy for genome-wide mapping of intronic lariats and branch points using RNA-seq. Genome Res. 2014 Jul;24(7):1169-79. (PMID:24709818; DOI:10.1101/gr.166819.113)

To view this data track, follow the instructions in the track configuration FAQ, and select the Intron Branch Point track.

Yes, if you have locally stored data that you want to see in the context of the genome browser, you can add it as a custom track. The Ensembl documentation on adding custom tracks describes supported file formats, options for uploading or linking your data, and instructions on using the web interface to configure your tracks.

On gene pages, the source of the annotated transcript coordinates is shown with the UTR coordinates in the Transcript section (e.g. cdc2). PomBase curators have chosen default UTR features using three data sources and a set of precedence criteria:

  1. Highest priority is given to data from low-throughput "conventional" experiments performed on individual mRNAs and reported in publications or submitted to EMBL. Where low-throughput data are not available, one of three high-throughput datasets is used.
  2. The Broad dataset published in 2011 by Rhind et al. (PMID:21511999) is given precedence because it is the most recent, has higher resolution, and detected splicing within the UTRs. Note: This study used a "greedy" algorithm to determine the longest possible transcript from transcriptome reads, which may result in the prediction of longer UTRs than are actually present. Use these data with caution, and refer to the transcript profiling data in the genome browser for genes of interest.
  3. For genes not covered by (1) or (2), start/end data from Lantermann et al. (PMID:20118936) based on transcriptome data from Dutrow et al. (PMID:18641648) are used where available.
  4. For genes not covered by (1), (2) or (3), we use data from Wilhelm et al. (PMID:18488015).

More information is available in the mailing list archive for two HTP datasets (Broad: http://listserver.ebi.ac.uk/pipermail/pombelist/2011/000856.html ; Lantermann/Dutrow: http://listserver.ebi.ac.uk/pipermail/pombelist/2011/000814.html).

Transcript start and end coordinates from all sources will be available as individual data tracks in the Ensembl genome browser in the near future, which will allow you to view and evaluate them. PomBase will also curate splice and transcript variants as data become available.

S. pombe genome features were originally annotated using Artemis. As noted in the manual (ftp://ftp.sanger.ac.uk/pub/resources/software/artemis/artemis.pdf - see p. 9), Artemis draws from a list of feature keys that is documented at EBI: ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/FT_current.html#7.2

In the genome sequence data files, features are defined using Sequence Ontology terms. Gene pages use a selection of human-friendly text descriptions for feature types. (Further details will be available here soon.)

Transcript start and end coordinates from all sources will be available as individual data tracks in the Ensembl genome browser in the near future, which will allow you to view, evaluate and download them. We also provide downloadable UTR data sets that are updated periodically.

Also see the precedence criteria used to choose default UTR features to display on gene pages.

To retrieve UTRs for a specified list of genes, see the FAQ on downloading sequences for multiple genes (choose 5' UTR and/or 3' UTR in step 9).

There is a data track available for transcription factor binding sites in the genome browser. Follow the instructions for showing tracks, and choose "Transcription Factor Binding Sites" in the left-hand menu of the pop-up.

Also see the FAQ on finding transcription factors.

For the sequenced strains of S. japonicus, S. octosporus and S. cryophilus, the Ensembl group has generated GO annotation data sets for protein-coding genes by transferring experiment-based annotations from S. pombe orthologs. You can use the QuickGO browser to retrieve the data for each species -- follow the "Search and Filter GO annotation sets" link, then click "Filter" to set a taxon filter for the taxon ID:

S. japonicus (strain yFS275) - 402676
S. octosporus (strain yFS286) - 483514
S. cryophilus (strain OY26) - 653667

Because these automated annotations are inferred only from experimentally-derived S. pombe annotations, coverage will not be complete.

Note that the GAF downloaded from QuickGO uses UniProtKB accessions in the gene product ID column (column 2). To use the GAF in any further analysis, such as term enrichment, you will have to convert the accessions to systematic IDs. See the FAQ on ID mapping for hints.
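
The conversion step might look something like the following Python sketch. It assumes you have already built a dictionary mapping UniProtKB accessions to systematic IDs (for example, from an ID mapping table). The accessions, IDs and shortened GAF lines in the example are placeholders, since a real GAF line has more columns.

```python
def remap_gaf_ids(gaf_lines, accession_to_systematic):
    """Replace column 2 (DB Object ID) of each GAF line using a mapping.

    Returns (converted_lines, unmapped_accessions). Lines whose accession
    is not in the mapping are dropped and reported. Column 1 (DB) is left
    untouched; update it separately if your downstream tool requires it.
    """
    converted, unmapped = [], set()
    for line in gaf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("!"):
            converted.append(line)  # keep GAF header/comment lines unchanged
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            continue  # malformed line
        accession = cols[1]
        if accession in accession_to_systematic:
            cols[1] = accession_to_systematic[accession]
            converted.append("\t".join(cols))
        else:
            unmapped.add(accession)
    return converted, unmapped

# Placeholder data for illustration only (GAF lines truncated for brevity)
example_lines = [
    "!gaf-version: 2.1",
    "UniProtKB\tACC0001\tgeneA\t\tGO:0005634",
    "UniProtKB\tACC0002\tgeneB\t\tGO:0005737",
]
example_mapping = {"ACC0001": "SYS0001"}
converted, unmapped = remap_gaf_ids(example_lines, example_mapping)
print("\n".join(converted))
print("unmapped:", sorted(unmapped))
```

Reporting the unmapped accessions separately makes it easier to spot gaps in the mapping before running an enrichment analysis.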

One feasible approach to improve annotation coverage is to download the S. pombe GO annotations (see the GO Associations download page), and then substitute the S. pombe IDs with the IDs of orthologous genes from the other Schizosaccharomyces species of interest. For ortholog IDs, see the FAQ on Schizosaccharomyces orthologs, and use the indicated table from Rhind et al. Comparative functional genomics of the fission yeasts (PMID:21511999).

Note that some genes are present in S. japonicus, S. octosporus or S. cryophilus but absent from S. pombe. For some of these gene products, GO annotations can be transferred from other species. If you wish to include annotations for these genes in your analysis you will need to use this option, and extend your GAF with the relevant annotation lines (contact the Helpdesk if you need assistance).

Combining all approaches gives the best coverage possible at present. You can use a "GO Slim" tool such as Princeton's GO Term Mapper to see if there are any gaps in coverage, as described in the FAQ on enrichment in S. pombe. Also see the FAQs on GO term enrichment in other Schizosaccharomyces species.

At present, if you need sequences for all tRNAs, rRNAs, other ncRNAs, etc. we recommend using the Advanced Search and results download as described in the FAQ on retrieving sequence coordinates for all features of a particular type.

Downloadable FASTA sequence datasets will be added to the PomBase FTP site in the near future.

The genome browser includes variation data from natural S. pombe isolates, published in:

Jeffares DC et al. 2015. The genomic and phenotypic diversity of Schizosaccharomyces pombe. Nat Genet. 47(3): 235-241. doi:10.1038/ng.3215 PMID:25665008

To view the variation data, enable one or more of the tracks under "Variation". Help is available for enabling tracks.

Insertions/deletions (indels) and SNPs can be enabled as separate tracks, or "Sequence variants (all sources)" displays both types on a single track. Clicking on any variation feature in the track brings up a small pop-up box with a summary and link to further details about the variation.

A Variscan track showing variation diversity along the sequence is also available.

Ensembl provides a quick reference card with more information on the browser interface for variations.

The PomBase genome browser includes a data track of core promoter locations from Li et al. (2015) Genome-wide analysis of core promoter structures in Schizosaccharomyces pombe with DeepCAGE. RNA Biol. 12:525-37. PMID:25747261. The promoter track is listed under the Regulatory Elements menu, and is best viewed using the "Labels" track style, which labels promoter features using the associated gene IDs (see the browser configuration FAQ for more information).

We will add new tracks for any more promoter data sets that are submitted to us.

PomBase contains very few manually curated promoters; those that exist are displayed on a genome browser track, "PomBase Annotated Promoter (SO:0000167)". This track is under the Sequence and assembly menu, and is activated in the default configuration. To search the manually curated promoters, we suggest that you use Artemis. Follow the instructions in this FAQ, and search for features with "Key" = "promoter".

Yes, the Phenotype annotations page offers two options, a complete phenotype annotation file and a "viability summary" for deletion mutants. At present, the full file contains all manually curated single mutant phenotypes, and is in the same format as PomBase uses for bulk phenotype data submissions (see the file formats FAQ). Further information on the viability summary is available in the essential genes FAQ.

Replication origin coordinates are not yet included in PomBase. We hope to obtain comprehensive origin data in the genome browser in the future, but we rely on user submissions to set priorities for adding data tracks to the browser.

Until replication origins are available in PomBase, we suggest that you use the data collated by Conrad Nieduszynski in S. pombe OriDB.

The S. pombe networks in esyN use the PomBase High Confidence Physical Interaction Network (HCPIN) data. The data can be downloaded in tab-delimited format from the Download Datasets page, which links to an FTP directory with two files. One contains physical interaction data, and the other contains GO substrate data. Each file lists two gene systematic IDs and a reference identifier (usually a PubMed ID).
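
Given the three-column layout described above, a file could be loaded into a simple network structure along these lines. This is a sketch only: it assumes there is no header row, and it treats every pair as undirected, which suits physical interactions but discards direction if the order of the two IDs in the substrate file is meaningful. The gene IDs and references are placeholders.

```python
from collections import defaultdict
import io

def load_interactions(handle):
    """Parse tab-delimited lines of: gene A, gene B, reference.

    Returns (partners, references), where partners maps each gene to the
    set of genes it is paired with, and references maps each unordered
    pair to the set of supporting reference identifiers.
    """
    partners = defaultdict(set)
    references = defaultdict(set)
    for line in handle:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        gene_a, gene_b, ref = cols[0], cols[1], cols[2]
        partners[gene_a].add(gene_b)
        partners[gene_b].add(gene_a)
        references[frozenset((gene_a, gene_b))].add(ref)
    return dict(partners), dict(references)

# Placeholder rows for illustration only
sample_rows = io.StringIO(
    "GENE_A\tGENE_B\tPMID:00000001\n"
    "GENE_B\tGENE_C\tPMID:00000002\n"
    "GENE_A\tGENE_B\tPMID:00000003\n"
)
partners, references = load_interactions(sample_rows)
print(sorted(partners["GENE_B"]))
print(sorted(references[frozenset(("GENE_A", "GENE_B"))]))
```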

We plan to offer a downloadable list of protein-coding genes (5052 as of release version 23_47, October 2014) in the near future.

In the meantime, you can use the Advanced Search to retrieve a list. All protein coding genes have the type "protein coding", but this type also includes a few transposon genes and several genes that are dubious (i.e. predicted by automated methods considered unlikely to actually encode protein), which you will presumably want to exclude from the set. To do so, use the NOT operator and the "Annotation Status" filter. The query is:

Genes By Type protein coding
NOT Annotation Status dubious
NOT Annotation Status transposon

You can also perform the query in separate steps:

  1. In the New Query panel, query for Feature Type protein coding (query 1)
  2. New Query - NOT Annotation Status dubious (query 2)
  3. New Query - NOT Annotation Status transposon (query 3)
  4. In the Query Management panel, select 2 and 3 and combine them with OR (union); this forms query 4
  5. Also in Query Management, select query 1 and query 4, and follow the instructions to combine them with NOT.

See the Advanced Search documentation for more information on performing the search described here.
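
If you have exported the individual gene lists, the Boolean logic of this query can be reproduced offline with set operations. The Python sketch below is illustrative only: the gene IDs are placeholders and the dictionaries stand in for lists you would download yourself.

```python
def protein_coding_set(by_type, by_status):
    """Genes of type 'protein coding' NOT (status 'dubious' OR 'transposon')."""
    excluded = by_status.get("dubious", set()) | by_status.get("transposon", set())
    return by_type.get("protein coding", set()) - excluded

# Placeholder gene lists for illustration only
genes_by_type = {"protein coding": {"geneA", "geneB", "geneC", "geneD"}}
genes_by_status = {
    "dubious": {"geneC"},
    "transposon": {"geneD"},
}
selected = protein_coding_set(genes_by_type, genes_by_status)
print(sorted(selected))
```

The union of the two excluded sets corresponds to combining the "dubious" and "transposon" queries with OR, and the set difference corresponds to the final NOT step.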

Query link: Protein-coding genes (excluding 'dubious' and 'transposon' status)

A file of cDNA sequences in FASTA format is available on the Genome Datasets page.

Data that appear on gene pages -- sequence feature annotations, ontology annotations, etc. -- are stored in a database that uses the Chado schema. Dumps from the Chado database for each PomBase release are available via the Downloads page.

Yes, the Quick Links menu can be expanded and collapsed by clicking the "Quick Links" text in its header.

One gene can be correctly annotated to both a "viable" term and an "inviable" term from FYPO, under certain circumstances:

  • Different alleles may have different phenotypes; e.g., a deletion may be inviable, but a point mutation may be fully viable or conditionally lethal.
  • One allele may cause death under some, but not all, conditions.
  • An allele may cause only some cells in a population to die (this would be annotated using an "inviable cell" term, with an extension to indicate incomplete penetrance ("low" or "medium"), plus an annotation to a "viable cell population" term).
  • Cells that can divide for a few generations but then die are annotated as inviable, but can acquire suppressor mutations at a high enough frequency for populations to appear viable.

At present, alleles cannot be queried directly in the PomBase advanced search, but the FYPO phenotype filters do allow you to retrieve annotations for all alleles, or to restrict to null expression (deletions etc.) or overexpression of the wild-type allele. Comparing results with and without the allele restrictions may help resolve apparent discrepancies.

Note that it is not yet possible to search for specific conditions, or for penetrance, but we plan to add these features to the Advanced Search.

If, however, the allele and condition details are identical, annotation to both viable and inviable terms is probably an error (either one of the terms is wrong, or there are missing or incorrect details for the alleles and/or conditions). Please let us know via the helpdesk if you notice any potential errors.

You can retrieve sequences from a gene page or in the Genome Browser.

On the gene page: Scroll down or click the quick link to the Sequence section of the page, where there is a set of pre-set one-click options and a Custom option. For protein-coding genes, there are pre-set options to retrieve the coding sequence (CDS), CDS + UTRs, CDS + UTRs + introns, or a translation of the CDS; for non-coding RNA genes only the relevant options are offered. Under Display Options, you can choose whether to retrieve plain text or add color highlighting of different regions.

To include flanking sequences, use the Custom Sequence option. Clicking the View button takes you to a page where you can specify whether to include UTRs and introns, and how much upstream and downstream sequence to include. Click the Download button to see the sequence. You can save by copying and pasting from the browser.

To use the Genome Browser: Click the "View in Genome Browser" link under the map graphic on a gene page, or go directly to the Genome Browser via the Tools menu, and search for a gene name or systematic ID. Click Export Data (a button on the left hand side). Select the number of bases up- and downstream, which strand and the features you would like. Click "next". Select your download option. Your browser will save or display the data, depending on which format you select.

To retrieve flanking regions for more than one gene at a time, at present you must use the Ensembl Genomes Biomart query, as described in this FAQ.

In PomBase, human and S. cerevisiae orthologs are manually curated for S. pombe genes as described in the Manually Curated Orthologs documentation. Because manual ortholog curation is extremely time-consuming, it is not done for any species other than human and S. cerevisiae. For automated prediction of orthologs in other species, please see the relevant FAQ.

In the future we will add a tree view of consensus orthologs in key species to the gene pages.

In the Ensembl genome browser, click the "Configure this page" button in the left-hand bar. A pop-up box will appear. Note that this box has several tabs along its top, and the exact selection of tabs and configuration options depends on whether you are configuring the "Location", "Gene", or "Transcript" tab of the main browser.

To turn a track on or off, click the small box to the left of its description. Note that some tracks simply toggle on and off, whereas for others a small popup appears, in which you can select from a set of options controlling exactly how the track appears. The left-hand bar of the configuration popup organizes available tracks into subsets, and offers a few additional options (including "Reset configuration", which restores the default display).

For example, to show or hide repeat regions, make sure you have the "Location" tab selected. The tabs for this configuration allow you to configure the "Region" (lower) and "Overview" (upper) images separately. In the "Configure Region Image" tab, click "Repeat regions" in the popup's left-hand bar. You can then check one box to show all repeats, or select specific types of repeat to display.

When you are finished choosing tracks, click the tick/check mark in the upper right corner of the configuration popup.

Go to the Genome Browser (in the Tools menu), and enter coordinates in the 'Search for:' box. The format is 'I:100000..200000' or 'I:100000-200000' (i.e. use Roman numerals to specify the chromosome, and don't include the word "chromosome"; use either '..' or '-' between the start and end coordinates.)
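
If you generate coordinate strings in a script, a small validator can catch formatting mistakes before you paste them into the browser. The Python sketch below accepts the two formats described above. Restricting the chromosome name to I, II or III is an assumption made for this example, and the browser may accept other sequence names.

```python
import re

# Roman-numeral chromosome, then start and end separated by '..' or '-'
REGION_RE = re.compile(r"^(I{1,3}):(\d+)(?:\.\.|-)(\d+)$")

def parse_region(text):
    """Return (chromosome, start, end) or raise ValueError."""
    match = REGION_RE.match(text.strip())
    if not match:
        raise ValueError(f"unrecognised region format: {text!r}")
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is greater than end: {text!r}")
    return chrom, start, end

print(parse_region("I:100000..200000"))
print(parse_region("I:100000-200000"))
```

A string such as "chromosome I:100000-200000" is rejected, matching the note above about omitting the word "chromosome".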

Yes. First, make sure sequence display is "on":

  1. From the Location tab, click the "Configure this page" button
  2. In the left hand menu of the pop-up, click on "Sequence and assembly"
  3. If the box next to "Sequence" is blank, click it

(also see the FAQ on configuring tracks).

The Location tab shows two graphics. Between the two images there is a slider that controls the zoom. Click the '+' or drag the vertical bar to the left to zoom in. The lower graphic will first display colored blocks representing color-coded nucleotides, and then, at maximum zoom, legible sequence. (Note that the vertical bar may appear to be all the way to the left of the slider before you actually zoom in enough to read the sequence; keep clicking '+' if necessary.)

At any zoom level, use the arrows flanking the zoom slider to scroll along the sequence.

You can do this in the Genome Browser (from a gene page or the Tools menu). First, enter the coordinates, then click the Export Data button on the left-hand side. In the Output pulldown (topmost in the popup window) choose one of the formats under the "Feature File" header. Then follow the remaining steps to retrieve the sequence features -- add flanking sequences, select options for your selected output format, etc.

Go to the Genome Browser (in the Tools menu), and find the region of interest by searching for a gene name, a systematic ID, or a set of coordinates. Then click the Export Data button on the left-hand side.

Select the number of bases up- and downstream (even if you have searched using coordinates, you can add flanking sequence to what you download), which strand and the features you would like. Click "next". Select your download option. Your browser will save or display the data, depending on which format you select.

The reference genome sequence excludes most of the ribosomal DNA (rDNA) repeats, which are present in two tandem arrays on chromosome III. These arrays are estimated to be 1225 kb and 240 kb in size for the sequenced strain (972 h-). The reference sequence includes two partial and one complete representative rDNA repeats:

The complete repeat sequence coordinates are Chromosome 3:5542-13722 (note that the reverse strand is transcribed). The link goes to the PomBase Genome Browser, where you can view and download the sequence. Because the reverse strand is transcribed, you may want to choose "-1" in the location settings.
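
If you download the forward-strand sequence and need the transcribed strand, you can reverse-complement it yourself. The Python sketch below assumes 1-based, inclusive coordinates and uses a short toy sequence, not real genome data.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def extract_region(chrom_seq, start, end, strand=1):
    """Extract start..end (1-based, inclusive); strand -1 gives the reverse strand."""
    sub = chrom_seq[start - 1:end]
    return reverse_complement(sub) if strand == -1 else sub

# Length of the complete rDNA repeat, from the coordinates quoted above
repeat_length = 13722 - 5542 + 1

# Toy sequence for illustration only
toy_chromosome = "AACCGGTTAC"
forward = extract_region(toy_chromosome, 2, 5)
reverse = extract_region(toy_chromosome, 2, 5, strand=-1)
print(repeat_length, forward, reverse)
```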

Also see the FAQ on finding rRNA genes.

Browse for Chromosome II:2129208-2137121, and see the Mating Type Region page.

The mating type region will soon be annotated as a feature that refers to a Sequence Ontology term.

Centromeres can be retrieved in the PomBase Ensembl browser; the coordinates are:
Chromosome I:3753687-3789421
Chromosome II:1602264-1644747
Chromosome III:1070904-1137003

Sequence features within the centromeres, such as repeats, are annotated with Sequence Ontology terms. For more details, see the Centromeres page.
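
The centromere coordinates listed above can be used directly in scripts, for example to compute centromere lengths or to test whether a position falls inside a centromere. This Python sketch assumes the coordinates are 1-based and inclusive.

```python
# Coordinates copied from the list above
CENTROMERES = {
    "I": (3753687, 3789421),
    "II": (1602264, 1644747),
    "III": (1070904, 1137003),
}

def centromere_length(chrom):
    """Length in bp, counting both end coordinates."""
    start, end = CENTROMERES[chrom]
    return end - start + 1

def in_centromere(chrom, position):
    """True if the position lies within the centromere of that chromosome."""
    start, end = CENTROMERES[chrom]
    return start <= position <= end

for name in CENTROMERES:
    print(name, centromere_length(name))
```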

The reference sequence was last updated in January 2007; only feature coordinates and annotation have changed since then. See Sequence Updates and Sequence Updates Pending for more information.

As genes are annotated, each is assigned a status, as described on the Gene Characterisation page. Taxonomic conservation of a gene is assigned manually on a case-by-case basis, taking into account multiple criteria. Additional information is available from PomBase curators upon request.

Genes listed on the Priority Unstudied Genes page are those that have "conserved unknown" characterisation status and the "conserved in vertebrates" taxonomic distribution.

You can also use the Advanced Search to find conserved unstudied genes as described in the FAQs on characterisation status and taxonomic conservation. Start by searching for Annotation Status "conserved unknown", and refine the search by adding a Conserved in ... filter if you wish.

Each gene is assigned exactly one characterisation status that reflects how much is known about the gene, whether it is conserved, etc. Specific status descriptions:

  • Experimentally characterised: Completely or partially characterised in a small scale experiment, with some published information about the biological role (corresponding to any of the fission yeast GO slim biological process terms)
  • Role inferred from homology: A biological role (as above, a fission yeast GO slim term) is inferred from homology to an experimentally characterised gene product
  • Conserved protein (unknown biological role): Conserved outside the Schizosaccharomyces, but nothing known about the biological role in any organism
  • Schizosaccharomyces specific protein, uncharacterised: Unpublished and found only in fission yeast (S. pombe, S. octosporus, S. japonicus, S. cryophilus); nothing known about biological role. May be single copy or a member of a multi-member family.
  • S. pombe specific protein, uncharacterised: Unpublished and found only in S. pombe (not detected in other Schizosaccharomyces species); nothing known about biological role
  • Dubious: Unlikely to be protein coding

A current summary of gene characterisation status for the S. pombe genome is available, as well as a table of historical characterisation status counts.

You can also retrieve current lists of genes with each characterisation status using the Advanced Search. Select the Annotation Status filter, then choose a status from the pulldown menu, and submit.

On the Genome Statistics page: http://www.pombase.org/status/statistics

The Advanced Search includes a filter that retrieves all genes at once. Simply go to the New Query panel (http://www.pombase.org/spombe/query/builder), select the filter 'All Genes', and submit the query.

Query link: All genes

 

Most non-coding RNAs in PomBase are based on transcriptome data, either from Jürg Bähler's lab (Solexa/deep sequencing; PMID:18488015) or Nick Rhind's lab (RNA sequencing; PMID:21511999). For any ncRNA, the source should be linked as a publication in the "Literature" section at the bottom of the PomBase gene page. To get an idea of the transcription in a region, you can look at the Bähler Lab Transcriptome Viewer, which is linked from most gene pages, e.g. SPNCRNA.200. Unfortunately some genes, such as SPNCRNA.1115, post-date the viewer and therefore do not have entries, but you can look at transcription in this region by accessing a neighboring gene.

The Fission Yeast GO slim terms page provides a generic GO biological process slim for S. pombe, and shows total genes annotated to each term directly or to any of its descendants.

If you want GO slim annotations for your own list of S. pombe genes, we recommend the GO Term Mapper at Princeton. Upload your list of genes, and select "PomBase (S. pombe GOslim) (Process only)" from the "Choose GO slim" pulldown. GO Term Mapper's interface and documentation should make the rest straightforward, but let PomBase staff know if you have any problems.

For further information on using the generic S. pombe slim, or on creating your own GO slim, please see the Fission Yeast GO slimming tips page.

The "Drugs with known S. pombe targets" page lists drugs that have been shown to affect S. pombe, with brief summaries of their targets.

If you notice any errors or omissions on this page, or can provide any supporting references, please email the helpdesk.

In PomBase, human and S. cerevisiae orthologs are manually curated for S. pombe genes as described in the Manually Curated Orthologs documentation. Because manual ortholog curation is extremely time-consuming, it is not done for any species other than human and S. cerevisiae. For automated prediction of orthologs in other species, please see the relevant FAQ.

In the future we will add a tree view of consensus orthologs in key species to the gene pages.

Almost all genes that are conserved between fission yeast and human are also conserved in other vertebrates (there are two exceptions, genes encoding amino acid biosynthesis proteins that have become pseudogenes in human). To retrieve these genes, go to the Advanced Search (http://www.pombase.org/spombe/query/builder), and choose the "Conserved in ..." filter option. Then choose the description "Conserved in vertebrates", and submit.

Also see the FAQs on finding disease gene orthologs, finding the ortholog of a specific gene, and on downloading the full set of curated orthologs.

Query link: Genes conserved in vertebrates

S. pombe genome features were originally annotated using Artemis. As noted in the manual (ftp://ftp.sanger.ac.uk/pub/resources/software/artemis/artemis.pdf - see p. 9), Artemis draws from a list of feature keys that is documented at EBI: ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/FT_current.html#7.2

In the genome sequence data files, features are defined using Sequence Ontology terms. Gene pages use a selection of human-friendly text descriptions for feature types. (Further details will be available here soon.)

On gene pages, the source of the annotated transcript coordinates is shown with the UTR coordinates in the Transcript section (e.g. cdc2). PomBase curators have chosen default UTR features using three data sources and a set of precedence criteria:

  1. Highest priority is given to data from low-throughput "conventional" experiments performed on individual mRNAs and reported in publications or submitted to EMBL. Where low-throughput data are not available, one of three high-throughput datasets is used.
  2. The Broad data published in 2011 by Rhind et al. (PMID:21511999) is given precedence because it is the most recent, is higher resolution and detected splicing within the UTRs. Note: This study used a "greedy" algorithm to determine the longest possible transcript from transcriptome reads, which may result in the prediction of longer UTRs than are actually present. Use these data with caution, and refer to the transcript profiling data in the genome browser for genes of interest.
  3. For genes not covered by (1) or (2), start/end data from Lantermann et al. (PMID:20118936) based on transcriptome data from Dutrow et al. (PMID:18641648) are used where available.
  4. For genes not covered by (1), (2) or (3), we use data from Wilhelm et al. (PMID:18488015).
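The precedence rules above amount to taking the first available source from an ordered list. The following is a minimal sketch of that logic only; the source labels and the function name are hypothetical and are not identifiers used by PomBase.

```python
# Hypothetical labels for the four data sources, in the order of
# precedence described in the list above (highest priority first).
UTR_SOURCE_PRECEDENCE = [
    "low_throughput",      # conventional experiments on individual mRNAs
    "broad_rhind",         # Rhind et al. (PMID:21511999)
    "lantermann_dutrow",   # Lantermann et al. (PMID:20118936)
    "wilhelm",             # Wilhelm et al. (PMID:18488015)
]


def choose_utr_source(available_sources):
    """Return the highest-precedence source present for a gene, or None."""
    for source in UTR_SOURCE_PRECEDENCE:
        if source in available_sources:
            return source
    return None


# A gene with coordinates from two high-throughput datasets:
print(choose_utr_source({"wilhelm", "broad_rhind"}))
```

A gene with no entry in any of the four sources yields None, which corresponds to a gene with no default UTR features.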

More information is available in the mailing list archive for two HTP datasets (Broad: http://listserver.ebi.ac.uk/pipermail/pombelist/2011/000856.html ; Lantermann/Dutrow: http://listserver.ebi.ac.uk/pipermail/pombelist/2011/000814.html).

Transcript start and end coordinates from all sources will be available as individual data tracks in the Ensembl genome browser in the near future, which will allow you to view and evaluate them. PomBase will also curate splice and transcript variants as data become available.

Orphan genes are generally defined as genes without homologs in other organisms. In PomBase, genes conserved in the Schizosaccharomyces genus are distinguished from genes conserved only in S. pombe.

To retrieve either set of genes, use the "Conserved in" filter in the Advanced Search. Choose "Schizosaccharomyces specific" for genes found in more than one Schizosaccharomyces species, or "Schizosaccharomyces pombe specific" for genes found only in S. pombe. See the Advanced Search documentation for help with performing searches.

Historical note: Prior to August 2014, PomBase and its predecessor GeneDB referred to single-copy genes conserved within, but not outside, the Schizosaccharomyces genus as "sequence orphans". See the Gene Characterisation Statistics History page for more details (note that the gene characterisation classifications reflect whether a gene has been studied experimentally as well as the extent of its conservation).

Systematic IDs follow patterns based on the feature type, and in some cases the chromosome, as shown in the table below.

Open reading frame (ORF) IDs also indicate which cosmid or plasmid they were found on in genome sequencing. In most cases, an ORF ID that ends with a digit indicates that the ORF is on the forward (Watson) strand, and an ORF ID that ends with 'c' indicates that it is on the reverse (Crick) strand. There are a few exceptions, however, because some cosmids were moved and their orientation reversed late in the sequence assembly procedure.

IDs with '.1' appended are transcript IDs; the dot-and-digit IDs follow Ensembl's standard. At present, PomBase has only one transcript annotated for any given feature, but in the future when alternative transcripts are annotated the digit will be incremented (.2, .3, etc.).
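As an illustration of the two conventions just described, here is a small sketch that separates a transcript suffix from a systematic ID and guesses the strand of an ORF from its last character. It assumes that gene IDs contain exactly one dot and transcript IDs two, which is an assumption made for this example. The function names are hypothetical, and the strand guess is only a heuristic because of the exceptions noted above.

```python
def split_transcript_id(identifier):
    """Split 'SPAC2F7.03c.1' into ('SPAC2F7.03c', 1).

    Assumes a gene ID has one dot and a transcript ID adds a second
    dot followed by digits. Returns (identifier, None) otherwise.
    """
    parts = identifier.split(".")
    if len(parts) == 3 and parts[2].isdigit():
        return ".".join(parts[:2]), int(parts[2])
    return identifier, None


def likely_strand(orf_id):
    """Guess the strand from the ORF ID ending (heuristic only)."""
    gene_id, _ = split_transcript_id(orf_id)
    if gene_id.endswith("c"):
        return "reverse"
    if gene_id and gene_id[-1].isdigit():
        return "forward"
    return None


print(split_transcript_id("SPAC2F7.03c.1"))
print(likely_strand("SPBC11B10.09"))
```

Always confirm the strand on the gene page or in the genome browser before relying on the ID alone.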

Systematic ID patterns

ID pattern Description
SPAC* features, usually ORFs, on chromosome 1, sequenced on cosmids
SPBC* features, usually ORFs, on chromosome 2, sequenced on cosmids
SPCC* features, usually ORFs, on chromosome 3, sequenced on cosmids
SPAP* features, usually ORFs, on chromosome 1, sequenced on plasmids
SPBP* features, usually ORFs, on chromosome 2, sequenced on plasmids
SPCP* features, usually ORFs, on chromosome 3, sequenced on plasmids
SPATRNA* tRNA genes on chromosome 1
SPBTRNA* tRNA genes on chromosome 2
SPCTRNA* tRNA genes on chromosome 3
SPLTRA* LTRs on chromosome 1
SPLTRB* LTRs on chromosome 2
SPLTRC* LTRs on chromosome 3
SPNCRNA* non-coding RNA genes (no chromosome info in ID)
SPRPTA.* repeats (other than LTRs or centromeric repeats) on chromosome 1
SPRPTB.* repeats (other than LTRs or centromeric repeats) on chromosome 2
SPRPTC.* repeats (other than LTRs or centromeric repeats) on chromosome 3
SPRPTCENA* centromeric repeats on chromosome 1
SPRPTCENB* centromeric repeats on chromosome 2
SPRPTCENC* centromeric repeats on chromosome 3
SPRRNA* rRNA genes (no chromosome info in ID)
SPSNORNA* snoRNA genes (no chromosome info in ID)
SPSNRNA* snRNA genes (no chromosome info in ID)
SPTF* transposons (no chromosome info in ID)
SPMTR* features on the separately sequenced mating type region contig
SPMIT* features on the mitochondrial chromosome
SPMITTRNA* subset of SPMIT*; tRNA genes on mitochondrial chromosome
SPNUMT* NUMTs (nuclear mitochondrial pseudogenes) (no chromosome info in ID)
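
If you need to sort a list of systematic IDs by feature type, the table can be applied as a longest-prefix lookup. The sketch below is illustrative only: the descriptions are abbreviated from the table, the function name is hypothetical, and the chromosome 3 plasmid prefix is written as SPCP by analogy with SPAP and SPBP.

```python
# (prefix, short description) pairs abbreviated from the table above.
ID_PATTERNS = [
    ("SPAC", "chromosome 1, sequenced on cosmids"),
    ("SPBC", "chromosome 2, sequenced on cosmids"),
    ("SPCC", "chromosome 3, sequenced on cosmids"),
    ("SPAP", "chromosome 1, sequenced on plasmids"),
    ("SPBP", "chromosome 2, sequenced on plasmids"),
    ("SPCP", "chromosome 3, sequenced on plasmids"),  # by analogy with SPAP/SPBP
    ("SPATRNA", "tRNA genes on chromosome 1"),
    ("SPBTRNA", "tRNA genes on chromosome 2"),
    ("SPCTRNA", "tRNA genes on chromosome 3"),
    ("SPLTRA", "LTRs on chromosome 1"),
    ("SPLTRB", "LTRs on chromosome 2"),
    ("SPLTRC", "LTRs on chromosome 3"),
    ("SPNCRNA", "non-coding RNA genes"),
    ("SPRPTA.", "repeats on chromosome 1"),
    ("SPRPTB.", "repeats on chromosome 2"),
    ("SPRPTC.", "repeats on chromosome 3"),
    ("SPRPTCENA", "centromeric repeats on chromosome 1"),
    ("SPRPTCENB", "centromeric repeats on chromosome 2"),
    ("SPRPTCENC", "centromeric repeats on chromosome 3"),
    ("SPRRNA", "rRNA genes"),
    ("SPSNORNA", "snoRNA genes"),
    ("SPSNRNA", "snRNA genes"),
    ("SPTF", "transposons"),
    ("SPMTR", "mating type region contig"),
    ("SPMIT", "mitochondrial chromosome"),
    ("SPMITTRNA", "tRNA genes on mitochondrial chromosome"),
    ("SPNUMT", "NUMTs"),
]

# Try longer prefixes first so that SPMITTRNA wins over SPMIT.
_BY_LENGTH = sorted(ID_PATTERNS, key=lambda item: len(item[0]), reverse=True)


def classify_systematic_id(systematic_id):
    """Return the description for the longest matching prefix, or None."""
    upper = systematic_id.upper()
    for prefix, description in _BY_LENGTH:
        if upper.startswith(prefix):
            return description
    return None


print(classify_systematic_id("SPNCRNA.200"))
```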

The S. pombe networks in esyN use the PomBase High Confidence Physical Interaction Network (HCPIN) data. The data can be downloaded in tab-delimited format from the Download Datasets page, which links to an FTP directory with two files. One contains physical interaction data, and the other contains GO substrate data. Each file lists two gene systematic IDs and a reference identifier (usually a PubMed ID).
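
A file of this shape can be read into a simple lookup of interaction partners. The sketch below assumes three tab-separated columns in the order ID, ID, reference, with no header row; check the downloaded file before relying on that layout. The sample rows are made-up placeholders, not real data.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows in the assumed layout: ID <tab> ID <tab> reference.
SAMPLE = "ID_A\tID_B\tREF_1\nID_A\tID_C\tREF_2\n"


def load_interactions(handle):
    """Map each ID to a set of (partner ID, reference) pairs."""
    partners = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, short, or comment lines.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        gene_a, gene_b, reference = row[0], row[1], row[2]
        partners[gene_a].add((gene_b, reference))
        partners[gene_b].add((gene_a, reference))
    return partners


network = load_interactions(io.StringIO(SAMPLE))
print(sorted(network["ID_A"]))
```

To read a downloaded file instead of the sample, pass an open file handle (opened with newline="") to load_interactions.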

We plan to offer a downloadable list of protein-coding genes (5052 as of release version 23_47, October 2014) in the near future.

In the meantime, you can use the Advanced Search to retrieve a list. All protein coding genes have the type "protein coding", but this type also includes a few transposon genes and several genes that are dubious (i.e. predicted by automated methods considered unlikely to actually encode protein), which you will presumably want to exclude from the set. To do so, use the NOT operator and the "Annotation Status" filter. The query is:

Genes By Type protein coding
NOT Annotation Status dubious
NOT Annotation Status transposon

query for protein coding genes

You can also perform the query in separate steps:

  1. In the New Query panel, query for Feature Type protein coding (query 1)
  2. New Query - NOT Annotation Status dubious (query 2)
  3. New Query - NOT Annotation Status transposon (query 3)
  4. In the Query Management panel, select queries 2 and 3 and combine them with OR (union); this forms query 4.
  5. Also in Query Management, select query 1 and query 4, and follow the instructions to combine them with NOT.
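
In set terms, the query keeps every gene of type "protein coding" that has neither of the two excluded statuses. The sketch below shows that logic with made-up gene names; it illustrates the set operations only and does not call any PomBase service.

```python
# Made-up gene names standing in for the three result lists.
protein_coding = {"gene_a", "gene_b", "gene_c", "gene_d"}
dubious = {"gene_b"}
transposon = {"gene_d"}

# Single-query form: A NOT B NOT C.
one_step = protein_coding - dubious - transposon

# Stepwise form: take the union of the excluded sets, then subtract it.
excluded = dubious | transposon
stepwise = protein_coding - excluded

print(sorted(one_step))
print(one_step == stepwise)
```

Both routes give the same list, so you can use whichever is more convenient in the search interface.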

See the Advanced Search documentation for more information on performing the search described here.

Query link: Protein-coding genes (excluding 'dubious' and 'transposon' status)

The current PomBase version and release date are displayed at the bottom of each gene page. The version number has two parts, of which the first is the Ensembl Genomes (EG) version and the second is the version of curated PomBase annotations (sequence features, ontology annotations, etc.). For example, PomBase version 20_39 uses EG version 20 and PomBase annotation data version 39.
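
If you record version numbers in scripts or lab notes, the two parts can be separated as in this short sketch (the function name is hypothetical):

```python
def parse_pombase_version(version):
    """Split a version such as '20_39' into (EG version, annotation version)."""
    eg_version, annotation_version = version.split("_")
    return int(eg_version), int(annotation_version)


print(parse_pombase_version("20_39"))
```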

The Data Version History page shows additional information about the versions of various data and software portions of the current PomBase release. A table of historical values is also included.

Yes. First, make sure sequence display is "on":

  1. From the Location tab, click on "Configure this Page"
  2. In the left hand menu of the pop-up, click on "Sequence and assembly"
  3. If the box next to "Sequence" is blank, click it

(also see the FAQ on configuring tracks).

The Location tab shows two graphics. Between the two images there is a slider that controls the zoom. Click the '+' or drag the vertical bar to the left to zoom in. The lower graphic will first display colored blocks representing color-coded nucleotides, and then, at maximum zoom, legible sequence. (Note that the vertical bar may appear to be all the way to the left of the slider before you actually zoom in enough to read the sequence; keep clicking '+' if necessary.)

At any zoom level, use the arrows flanking the zoom slider to scroll along the sequence.

Go to the Genome Browser (in the Tools menu), and enter coordinates in the 'Search for:' box. The format is 'I:100000..200000' or 'I:100000-200000' (i.e. use Roman numerals to specify the chromosome, and don't include the word "chromosome"; use either '..' or '-' between the start and end coordinates.)
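
If you generate coordinate strings in a script, it can help to validate them first. This is a minimal sketch that accepts only the two formats shown above and only the three nuclear chromosomes (I, II, III); other sequence names the browser may accept are not covered, and the function name is hypothetical.

```python
import re

# Roman-numeral chromosome, colon, start, '..' or '-', end.
_REGION = re.compile(r"(III|II|I):(\d+)(?:\.\.|-)(\d+)")


def parse_region(text):
    """Return (chromosome, start, end) or raise ValueError."""
    match = _REGION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised region: {text!r}")
    chromosome = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is after end in {text!r}")
    return chromosome, start, end


print(parse_region("I:100000..200000"))
print(parse_region("II:100000-200000"))
```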

The current S. pombe genome assembly does not include the complete telomeric regions or the telomeric short repeats. These omissions are beyond the control of PomBase curators. Subtelomeric repeats are also not explicitly defined at present, although we hope to provide this information in the future. Additional information about S. pombe telomeres is available on the Telomeres page (linked via the Genome Status menu).

Browse for Chromosome II:2129208-2137121, and see the Mating Type Region page.

The mating type region will soon be annotated as a feature and will refer to a Sequence Ontology term.

Centromeres can be retrieved in the PomBase Ensembl browser; the coordinates are:
Chromosome I:3753687-3789421
Chromosome II:1602264-1644747
Chromosome III:1070904-1137003

Sequence features within the centromeres, such as repeats, are annotated with Sequence Ontology terms. For more details, see the Centromeres page.
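
The coordinates above can be used to check whether a position of interest falls inside a centromere. The sketch below assumes the coordinates are 1-based and inclusive at both ends; the names are hypothetical.

```python
# Centromere coordinates as listed above (assumed 1-based, inclusive).
CENTROMERES = {
    "I": (3753687, 3789421),
    "II": (1602264, 1644747),
    "III": (1070904, 1137003),
}


def centromere_length(chromosome):
    """Length in bp of the annotated centromere on a chromosome."""
    start, end = CENTROMERES[chromosome]
    return end - start + 1


def in_centromere(chromosome, position):
    """True if the position lies within the annotated centromere."""
    start, end = CENTROMERES[chromosome]
    return start <= position <= end


for name in CENTROMERES:
    print(name, centromere_length(name))
```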

The reference genome sequence excludes most of the ribosomal DNA (rDNA) repeats, which are present in two tandem arrays on chromosome III. These arrays are estimated to be 1225 kb and 240 kb in size for the sequenced strain (972 h-). The reference sequence includes two partial and one complete representative rDNA repeats:

The complete repeat sequence coordinates are Chromosome 3:5542-13722 (note that the reverse strand is transcribed). The link goes to the PomBase Genome Browser, where you can view and download the sequence. Because the reverse strand is transcribed, you may want to choose "-1" in the location settings.
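
If you download the forward-strand sequence instead, you can obtain the transcribed strand by taking the reverse complement. A minimal sketch follows; it handles only the unambiguous bases A, C, G, T and N, and leaves any other character unchanged.

```python
# Translation table for complementing bases in either case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```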

Also see the FAQ on finding rRNA genes.

No; at present only the Ensembl genome browser is available via the PomBase web site. (As of May 2013, we are investigating the possibility of adding an Artemis applet to PomBase, and will update this FAQ accordingly when it becomes available.)

If you want to browse the S. pombe genome in the Artemis environment, it is fairly easy to download and run locally:

Once you have loaded the file(s), you can do many different things, e.g.:

  • Find features by name or ID
  • Find all features of a given type (e.g. see the "can I find transposons" FAQ)
  • Find matches to a specific nucleotide sequence (e.g. see the "restriction enzyme map" FAQ)
  • View the nucleotide or amino acid sequence of a region or feature
  • Export selected sequences

A video demonstrating Artemis installation is available on YouTube. See the Artemis FAQ and the Artemis manual (pdf; Sanger site) for additional information.

The current version of the manually curated list of orthologs and orthologous groups identified between fission and budding yeast is available for download from the Orthologs page (linked from the Datasets page).

In PomBase, S. cerevisiae orthologs are curated for S. pombe genes as described in the Orthologs documentation.

To find S. pombe orthologs for a budding yeast gene, you can search for the systematic name (ORF name) of the S. cerevisiae gene in the Simple Search (go to http://www.pombase.org/search/ensembl or use the search box in the page header). For example, S. cerevisiae LRP1 has the systematic name YHR081W, and a search on this in PomBase will retrieve the S. pombe gene cti1. Note that only systematic names can be searched for S. cerevisiae, to avoid confusion in cases where unrelated genes coincidentally have the same name in S. pombe and S. cerevisiae. To find systematic names of S. cerevisiae genes, you can search SGD.

Also see the FAQ on downloading the full set of orthologs.

The current version of the manually curated list of orthologs and orthologous groups identified between fission yeast and human is available for download from the Orthologs page (linked from the Datasets page).

Also see the FAQs on finding genes conserved in human, finding disease gene orthologs, and finding the ortholog of a specific gene.

Almost all genes that are conserved between fission yeast and human are also conserved in other vertebrates (there are two exceptions, genes encoding amino acid biosynthesis proteins that have become pseudogenes in human). To retrieve these genes, go to the Advanced Search (http://www.pombase.org/spombe/query/builder), and choose the "Conserved in ..." filter option. Then choose the description "Conserved in vertebrates", and submit.

Also see the FAQs on finding disease gene orthologs, finding the ortholog of a specific gene, and on downloading the full set of curated orthologs.

Query link: Genes conserved in vertebrates

In PomBase, human orthologs are curated for S. pombe genes as described in the Manually Curated Orthologs documentation.

To find S. pombe orthologs for a human gene, you can search for the standard human gene name in the Simple Search (go to http://www.pombase.org/search/ensembl or use the search box in the page header). For example, searching for human ABTB1 will retrieve the S. pombe gene btb1. To find standard human gene names, you can search HGNC. Note that in a few cases, a human gene name will coincidentally match a name or synonym of a non-orthologous S. pombe gene as well as the actual curated ortholog(s), so please check the gene pages carefully, especially if your search retrieves more than one result.

Also see the FAQs on finding genes conserved in human, finding disease gene orthologs, and on downloading the full set of curated orthologs.

S. pombe genes whose human orthologs have been implicated in disease are annotated with terms from the internal PBO vocabulary. To retrieve all of these genes, you can use the most general "disease associated" term. To do the query manually:

  1. Go to the Advanced Search (http://www.pombase.org/spombe/query).
  2. Find the term:
    1. Select the 'PBO Term Name' filter, start typing 'disease associated', and choose 'disease_associated' from the autocomplete options; or
    2. Select the 'PBO ID' filter and enter 'PBO:5000000'.
  3. Submit the query.

You can also type all or part of a specific disease name (e.g. 'cancer') into the 'PBO Term Name' filter to see if any matches come up in the autocomplete suggestions. Also see the FAQs on finding genes conserved in human, finding the ortholog of a specific gene, and on downloading the full set of curated orthologs.

Example queries:

You can use Compara via the genome browser to see multiple alignments:

  1. On any gene page, go to the Orthologs section (scroll or use the Quick Links box).
  2. Follow the relevant link to Compara - for fungal alignments, choose "View orthologs in other fungal species with Compara", or for all eukaryotic species choose "View orthologs across taxonomic space using pan-species Compara".
  3. You should see a "collapsed" gene tree highlighting your fission yeast gene of interest. From here you can click on any node to see a menu of options:
    1. Expand or collapse specific sub-nodes of the tree, or expand the tree fully
    2. View the alignment in FASTA format
    3. Launch the jalview multiple alignment viewer to see the full alignment and colour by residue conservation, hydrophobicity, etc.

To configure the protein entries visible in the alignment, select the most "inclusive" node you require. You can reduce the number of entries by collapsing individual sub-trees (see the first option under step 3) before you generate your alignment. A brief video demonstrates using the Compara trees.

Information about how the Compara trees are generated, homology types, and species is available here: http://fungi.ensembl.org/info/genome/compara/homology_method.html

For orthologs that are not manually curated by PomBase, we suggest two approaches:

Compara

You can search for orthologs/paralogs in Fungi, or in a pan-taxonomic comparison (eukaryotes), using Compara in the Ensembl browser.

  1. On any gene page, go to the Orthologs section (scroll or use the Quick Links box).
  2. Follow the relevant link to Compara - for fungal alignments, choose "View orthologs in other fungal species with Compara", or for all eukaryotic species choose "View orthologs across taxonomic space using pan-species Compara".
  3. You should see a "collapsed" gene tree highlighting your fission yeast gene of interest. From here you can click on any node to see a menu of options:
    1. Expand or collapse specific sub-nodes of the tree, or expand the tree fully
    2. View the alignment in FASTA format
    3. Launch the jalview multiple alignment viewer to see the full alignment and colour by residue conservation, hydrophobicity, etc.

To configure the protein entries visible in the alignment, select the most "inclusive" node you require. You can reduce the number of entries by collapsing individual sub-trees (see the first option under step 3) before you generate your alignment. A brief video demonstrates using the Compara trees.

Information about how the Compara trees are generated, homology types, and species is available from the Ensembl comparative genomics documentation.


YOGY

From any gene page, follow the link to YOGY under External References.

YOGY is a web-based resource for retrieving orthologous proteins from ten eukaryotic organisms and one prokaryote: Homo sapiens, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, Dictyostelium discoideum, Drosophila melanogaster, Caenorhabditis elegans, Plasmodium falciparum, Escherichia coli, Schizosaccharomyces pombe, and Saccharomyces cerevisiae. Using a gene or protein identifier from any of these organisms as a query, this database provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, Homologene, OrthoMCL

There are various ways you can find protein family members.

  1. If you know the Pfam, PRINTs, PROSITE, or InterPro accession for the family or domain you want, you can use the Advanced Search (http://www.pombase.org/spombe/query/builder). Go to the New Query tab, choose "Proteins That Have Specific Protein Domains" in the "Select Filter" pulldown, enter the accession, and submit.
  2. If you don't have an accession, but do know any member of the family, go directly to its gene page. In the "Protein Features" section of the gene page there is a table of protein domains and families, which includes a link to a list of all family members in S. pombe.
  3. If you know neither accessions nor family members, you can search for keywords in the InterPro database (http://www.ebi.ac.uk/interpro/), which combines signatures from a number of member databases, including Pfam. Record the accession number(s) of the family, and use them in the PomBase advanced search as described in item 1 above. (If necessary, you can use Query Management to combine the results of several queries.)

You can also try a keyword search in the PomBase advanced search, but this is much less reliable, because a keyword search may retrieve some proteins that don't have the domain or aren't family members due to coincidentally matching words in gene product descriptions. In the future, we plan to add the ability to search the full text of gene pages, which will provide another option for finding protein family information.

Example query: Proteins matching "ATPase, AAA-type, core" (Pfam:PF00004)

Yes: In the Advanced Search (http://www.pombase.org/spombe/query/builder), choose the "Conserved in ..." filter option. Then choose one of the descriptions, and submit.

Example query: Genes conserved in vertebrates

In PomBase, human and S. cerevisiae orthologs are manually curated for S. pombe genes as described in the Manually Curated Orthologs documentation. Because manual ortholog curation is extremely time-consuming, it is not done for any species other than human and S. cerevisiae. For automated prediction of orthologs in other species, please see the relevant FAQ.

In the future we will add a tree view of consensus orthologs in key species to the gene pages.

On a gene-by-gene basis, you can use the link to "View orthologs in other fungal species with Compara" as described in the FAQ on orthologs in other species.

For a full set of orthologous genes in S. pombe, S. cryophilus, S. japonicus and S. octosporus, see Table S12, columns AD-AG, in Rhind et al. Comparative functional genomics of the fission yeasts (PMID:21511999).

PomBase offers two ways to view nucleotide-level similarity between S. pombe and S. japonicus, S. octosporus, or S. cryophilus. Both use the Genome Browser.

  1. To view nucleotide similarity data tracks in the browser, follow the usual steps as described in the data track FAQ. Select the data type "Comparative Genomics".
  2. Display syntenic regions as follows:
    1. Go to your region of interest in the browser (e.g. follow the link from a gene page or use sequence coordinates). Make sure the "Location" tab is selected in the horizontal set of tabs along the top.
    2. In the left-hand menu, find the "Comparative Genomics" heading, and click on "Region Comparison".
    3. To select a species for comparison, go to the bottom of the left-hand menu, and click the "Select species or regions" link (it may appear to be subtly blinking; we apologise for this anomaly).
    4. In the popup, click the "+" beside any species in the "Unselected species or regions" list to move it to the "Selected species or regions" list. Note: "lastz" is the nucleotide alignment algorithm used. Close the popup - click the tick/check mark in the upper right corner, or click outside the popup.
    5. Synteny views will now be visible in the bottom-most graphical display (scroll down if necessary). For any region in the S. pombe genome, pink tracks show the region in the second genome with the best nucleotide alignment. Green bands connect the best-aligned regions to highlight synteny.
    6. A video is available demonstrating this feature.

Genome sequence files can be downloaded from the Genome Datasets page in several different formats.

At present, there isn't a good way to retrieve flanking sequences for multiple genes in bulk directly from PomBase. (You can download coding sequences via the Advanced Search, or flanking sequences for individual genes via the gene page Sequence section.) We hope to add a more convenient option in the near future, but in the meantime, we recommend using the Ensembl Genomes Biomart query:

  1. Go to http://fungi.ensembl.org/biomart/martview/
  2. Select the database “Ensembl Fungi Genes” from the "CHOOSE DATABASE" drop-down menu.
  3. Select “Schizosaccharomyces pombe genes” from the "CHOOSE DATASET" drop-down menu. Additional options will appear in the left-hand sidebar.
  4. In the left-hand menu, click on the header “Filters”.
  5. Expand the section “GENE” by clicking the + sign
  6. In the drop-down menu in the section “ID list limit” select “PomBase Gene ID(s)”. (This will automatically tick the "ID list limit" box.) In the box underneath, type or paste a list of S. pombe gene names or systematic IDs.
  7. In the left-hand menu, click on the header “Attributes”.
  8. Click the “Sequences” button, and expand the “SEQUENCES” section.
  9. Click a button to select which sequences you want. In the cartoon, red or black highlighting indicates what each option retrieves. Key: |---, 5' flanking region; leftmost box, 5' UTR; inner boxes, coding exons; rightmost box, 3' UTR; ---|, 3' flanking region; ^, introns.
  10. To include flanking regions, tick one or both of the "Upstream flank" and "Downstream flank" boxes, and enter the length you want. (Note: the "flank" options in the button selections retrieve ONLY flanking sequence, and will only retrieve 3' or 5' in any given query, not both.)
  11. When you have specified what you want, find the "Results" button in the header and click it. You will be able to view or download the results, or have them emailed to you.

Yes. First, make sure sequence display is "on":

  1. From the Location tab, click on "Configure this Page"
  2. In the left hand menu of the pop-up, click on "Sequence and assembly"
  3. If the box next to "Sequence" is blank, click it

(also see the FAQ on configuring tracks).

The Location tab shows two graphics. Between the two images there is a slider that controls the zoom. Click the '+' or drag the vertical bar to the left to zoom in. The lower graphic will first display colored blocks representing color-coded nucleotides, and then, at maximum zoom, legible sequence. (Note that the vertical bar may appear to be all the way to the left of the slider before you actually zoom in enough to read the sequence; keep clicking '+' if necessary.)

At any zoom level, use the arrows flanking the zoom slider to scroll along the sequence.

You can retrieve sequences from a gene page or in the Genome Browser.

On the gene page: Scroll down or click the quick link to the Sequence section of the page, where there is a set of pre-set one-click options and a Custom option. For protein-coding genes, there are pre-set options to retrieve the coding sequence (CDS), CDS + UTRs, CDS + UTRs + introns, or a translation of the CDS; for non-coding RNA genes only the relevant options are offered. Under Display Options, you can choose whether to retrieve plain text or add color highlighting of different regions.

To include flanking sequences, use the Custom Sequence option. Clicking the View button takes you to a page where you can specify whether to include UTRs and introns, and how much upstream and downstream sequence to include. Click the Download button to see the sequence. You can save by copying and pasting from the browser.

To use the Genome Browser: Click the "View in Genome Browser" link under the map graphic on a gene page, or go directly to the Genome Browser via the Tools menu, and search for a gene name or systematic ID. Click Export Data (a button on the left hand side). Select the number of bases up- and downstream, which strand and the features you would like. Click "next". Select your download option. Your browser will save or display the data, depending on which format you select.

To retrieve flanking regions for more than one gene at a time, at present you must use the Ensembl Genomes Biomart query, as described in this FAQ.

You can do this in the Genome Browser (from a gene page or the Tools menu). First, enter the coordinates, then click the Export Data button on the left-hand side. In the Output pulldown (topmost in the popup window) choose one of the formats under the "Feature File" header. Then follow the remaining steps to retrieve the sequence features -- add flanking sequences, select options for your selected output format, etc.

Go to the Genome Browser (in the Tools menu), and find the region of interest by searching for a gene name, a systematic ID, or a set of coordinates. Then click the Export Data button on the left-hand side.

Select the number of bases up- and downstream (even if you have searched using coordinates, you can add flanking sequence to what you download), which strand and the features you would like. Click "next". Select your download option. Your browser will save or display the data, depending on which format you select.
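
When working out how much flanking sequence to request, remember that "upstream" depends on the strand: for a reverse-strand feature, upstream sequence lies at higher coordinates. The sketch below shows the arithmetic under the assumption of 1-based inclusive coordinates. The function name is hypothetical, and it clips at position 1 but does not know the chromosome length.

```python
def with_flanks(start, end, strand, upstream, downstream):
    """Extend a region by flanks; strand is 1 (forward) or -1 (reverse)."""
    if strand == 1:
        new_start, new_end = start - upstream, end + downstream
    elif strand == -1:
        # On the reverse strand, upstream lies at higher coordinates.
        new_start, new_end = start - downstream, end + upstream
    else:
        raise ValueError("strand must be 1 or -1")
    return max(1, new_start), new_end


print(with_flanks(1000, 2000, 1, 500, 100))
print(with_flanks(1000, 2000, -1, 500, 100))
```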

Go to the Genome Browser (in the Tools menu), and enter coordinates in the 'Search for:' box. The format is 'I:100000..200000' or 'I:100000-200000' (i.e. use Roman numerals to specify the chromosome, and don't include the word "chromosome"; use either '..' or '-' between the start and end coordinates.)

The genomic GTF file available on the Genome Datasets page includes all gene features. We plan to replace this with a GFF file in the future, and will add all other annotated genome features (such as repeats) to the file at that time.

The EMBL format files also contain all annotated features.

Another option for extracting all annotated features (or if you need to specify which feature types to include) is to use the Ensembl API. See the FAQ "Can I access PomBase via an API?" for more information on using the API.

Downloadable intron datasets are available in FASTA format from the Intron Data page.

You can also find genes with introns using the PomBase Advanced Search. To find all genes with introns, search for genes with a specified number of exons, and use the range 2 (i.e. at least one intron) to 20 (more than the maximum known, 16 introns). You can also restrict the search to protein-coding genes. Note that the PomBase count includes introns in UTRs.

Instructions for searching PomBase

  1. Go to the Advanced Search - http://www.pombase.org/spombe/query/builder
  2. Under "Select Filter" choose "Genes That Have N Exons" (under the "Gene Filters" heading)
  3. Enter values: Minimum 2, Maximum 20
  4. Optional: to restrict to protein-coding genes, click "+". Leave the operator set to "AND", and choose "Genes by Type", then choose "protein_coding".
  5. Click "Submit". The results page has links to download the resulting list of genes or the genomic, cDNA or protein sequences. Note that we plan to offer additional download options, including coordinates, in the future. In the meantime, see the FAQ on finding sequence features in a region.
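
The exon range used in the search works because a gene with N exons has N - 1 introns, so a minimum of 2 exons means at least one intron. The sketch below applies the same filter to a made-up table of exon counts; the gene names and function names are placeholders.

```python
# Made-up exon counts for illustration only.
exon_counts = {"gene_one": 1, "gene_two": 2, "gene_three": 5}


def intron_count(exons):
    """A gene with N exons has N - 1 introns."""
    return max(exons - 1, 0)


def genes_with_introns(counts, minimum=2, maximum=20):
    """Names of genes whose exon count falls in the search range."""
    return sorted(
        name for name, exons in counts.items() if minimum <= exons <= maximum
    )


print(genes_with_introns(exon_counts))
print(intron_count(exon_counts["gene_three"]))
```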

Query link: protein-coding genes with 2-20 exons

A dataset of intron branch sites is available as a track in the Genome Browser. The data were published in:

Bitton DA, Rallis C, Jeffares DC, Smith GC, Chen YY, Codlin S, Marguerat S, Bähler J. LaSSO, a strategy for genome-wide mapping of intronic lariats and branch points using RNA-seq. Genome Res. 2014 Jul;24(7):1169-79. (PMID:24709818; DOI:10.1101/gr.166819.113)

To view this data track, follow the instructions in the track configuration FAQ, and select the Intron Branch Point track.

Transcript start and end coordinates from all sources will be available as individual data tracks in the Ensembl genome browser in the near future, which will allow you to view, evaluate and download them. We also provide downloadable UTR data sets that are updated periodically.

Also see the precedence criteria used to choose default UTR features to display on gene pages.

To retrieve UTRs for a specified list of genes, see the FAQ on downloading sequences for multiple genes (choose 5' UTR and/or 3' UTR in step 9).

Go to the Advanced Search - http://www.pombase.org/spombe/query/builder
Under "Select Filter" choose "Genes By Type", then choose "snoRNA". Click "Submit". You can download the resulting list of genes or the genomic sequences. Also see the FAQ on retrieving sequence coordinates.

Note that there are likely a number of snoRNAs that have not yet been identified and annotated in S. pombe; we hope to investigate further in the future.

Query: snoRNA genes

The reference genome sequence excludes most of the ribosomal DNA (rDNA) repeats, which are present in two tandem arrays on chromosome III. These arrays are estimated to be 1225 kb and 240 kb in size for the sequenced strain (972 h-). The reference sequence includes two partial and one complete representative rDNA repeats:

The complete repeat sequence coordinates are Chromosome 3:5542-13722 (note that the reverse strand is transcribed). The link goes to the PomBase Genome Browser, where you can view and download the sequence. Because the reverse strand is transcribed, you may want to choose "-1" in the location settings.
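If you download the chromosome sequence and want the transcribed (reverse) strand of a region such as the one above, the region has to be reverse-complemented. The sketch below shows one way to do this with 1-based inclusive coordinates; the function name is an assumption, and the toy sequence is not real genome data.

```python
# Translation table for complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_region(chrom_seq, start, end, strand=1):
    """Return the 1-based inclusive region start..end of chrom_seq.

    If strand is -1, the region is reverse-complemented.
    """
    if start < 1 or end > len(chrom_seq) or start > end:
        raise ValueError("coordinates outside the sequence")
    region = chrom_seq[start - 1:end]
    if strand == -1:
        region = region.translate(_COMPLEMENT)[::-1]
    return region


toy = "AACCGGTTAC"                        # toy sequence, not genome data
forward = extract_region(toy, 1, 4)       # forward strand
reverse = extract_region(toy, 1, 4, -1)   # reverse strand

# Length of the complete rDNA repeat given above (inclusive coordinates).
repeat_length = 13722 - 5542 + 1
```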

Also see the FAQ on finding rRNA genes.

S. pombe genome features were originally annotated using Artemis. As noted in the manual (ftp://ftp.sanger.ac.uk/pub/resources/software/artemis/artemis.pdf - see p. 9), Artemis draws from a list of feature keys that is documented at EBI: ftp://ftp.ebi.ac.uk/pub/databases/embl/doc/FT_current.html#7.2

In the genome sequence data files, features are defined using Sequence Ontology terms. Gene pages use a selection of human-friendly text descriptions for feature types. (Further details will be available here soon.)

To search PomBase for transposable elements:

  1. Go to the Advanced Search - http://www.pombase.org/spombe/query/builder
  2. Under "Select Filter" choose "Gene Annotation Status" and then choose "Transposon".
  3. Click "Submit". The results page has links to download the resulting list of genes or the genomic, cDNA or protein sequences. Note that we plan to offer additional download options, including coordinates, in the future.

At present, there are 11 full-length transposons annotated, and two frameshifted copies.

Query link: Transposons

Lone LTRs are also annotated as sequence features. They cannot yet be retrieved by the simple or advanced searches, but they can be displayed on a track in the Ensembl browser (under "Repeats").

Finally, if you wish to install Artemis (available from http://www.sanger.ac.uk/resources/software/artemis/), you can use it to view LTRs in more detail. Read in the EMBL format files of sequence and annotation (available from the Genome Datasets page). To see LTRs,

  1. In the Select menu, choose "By Key".
  2. In the pulldown that pops up, choose "LTR".

A video demonstrating Artemis installation is available on YouTube. See the Artemis FAQ and the Artemis manual (pdf; Sanger site) for additional information.

Although old cosmid sequences used in the reference assembly are not available in PomBase directly, they are all stored in the International Nucleotide Sequence Database Collaboration (ENA, GenBank, DDBJ) archives. For ease of searching, PomBase curators recommend finding the accession for a cosmid (e.g. AL137130) and using GenBank to retrieve the sequence:

Go to http://www.ncbi.nlm.nih.gov/nucleotide/ (or choose "Nucleotide" in the search pull-down menu on any NCBI search page). Enter the accession. The resulting page will inform you that the sequence has been replaced by one of the whole-chromosome entries, but offers links to both the current chromosome entry and the obsolete contig entry.

There is no single transcriptome sequence file available from PomBase at present. Several transcriptomic data sets are available as tracks in the PomBase genome browser. The GFF3 genome feature files available from the Genome Datasets page include the coordinates of the annotated full-length transcript features.

The bioinformatically inclined can also use the Ensembl Genomes REST API to retrieve transcript feature coordinates. The FAQ on programmatic access to PomBase provides an introduction to using the API, some pombe-specific examples, and links to additional documentation.

The Broad Institute has archived genomic data files for the Schizosaccharomyces species, including transcript files.

Go to the Advanced Search - http://www.pombase.org/spombe/query/builder
Under "Select Filter" choose "Genes By Type", then choose "rRNA". Click "Submit". You can download the resulting list of genes or the genomic sequences. Also see the FAQ on retrieving sequence coordinates.

Also see the FAQ on rDNA sequences.

Query: rRNA genes

At present, if you need sequences for all tRNAs, rRNAs, other ncRNAs, etc. we recommend using the Advanced Search and results download as described in the FAQ on retrieving sequence coordinates for all features of a particular type.

Downloadable FASTA sequence datasets will be added to the PomBase FTP site in the near future.

We recommend using only the genome sequence, either from PomBase downloadable files or from the sequence retrieval tools on the gene pages and in the genome browser. Although there are some sequence updates still pending, the genome sequence is more accurate than individual gene sequences that predate the genome.

Many older S. pombe sequence submissions to the DNA databases (International Nucleotide Sequence Database Collaboration databases, i.e. ENA, GenBank, DDBJ) contain one or more errors (sometimes with an error rate as high as 20%), and we do not have the resources to maintain past sequences or flag every error in PomBase.

Replication origin coordinates are not yet included in PomBase. We hope to make comprehensive origin data available in the genome browser in the future, but we rely on user submissions to set priorities for adding data tracks to the browser.

Until replication origins are available in PomBase, we suggest that you use the data collated by Conrad Nieduszynski in S. pombe OriDB:

A file of cDNA sequences in FASTA format is available on the Genome Datasets page.

Available options:

  1. Download one of the files available via the Genome Datasets page. The GFF3 files contain coordinates, whereas the EMBL- and GenBank-format files contain both coordinates and sequence data. You can then parse the files for the feature type you need. For example, to find all non-coding RNAs, search for "ncRNA_gene"; for coding sequences, use "CDS", etc. There are also separate files available for CDS and UTR data.
  2. If you only need genes, you can use the Advanced Search to find all genes of a given type. (Note that non-gene features such as repeats cannot be retrieved by this method.) Select the "Genes By Type" filter, then choose a type from the pulldown menu. The results include coordinates, and the "Download Results" options include sequences in FASTA format. If you need more than one feature type, query for each type and then use Query Management to combine the individual queries with the OR operator. See the Advanced Search documentation for more information.
  3. The bioinformatically inclined can also use the Ensembl Genomes REST API to retrieve transcript feature coordinates, as described in the FAQ on pombe transcriptome sequences. Select the desired feature type(s) from the output file of stable IDs (bear in mind that Ensembl idiosyncratically uses "biotype" to mean feature type).
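As a rough illustration of option 1, the sketch below filters GFF3 rows by feature type (column 3) and returns their coordinates. It assumes the standard nine-column, tab-separated GFF3 layout; the function name and the sample rows are hypothetical and do not come from a PomBase file.

```python
def features_of_type(gff3_lines, wanted_types):
    """Yield (seqid, type, start, end, strand, attributes) for matching rows."""
    wanted = set(wanted_types)
    for line in gff3_lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives and comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        if ftype in wanted:
            yield seqid, ftype, int(start), int(end), strand, attrs


# Hypothetical rows, for illustration only.
sample = [
    "##gff-version 3",
    "I\texample\tgene\t100\t900\t.\t+\t.\tID=geneA",
    "I\texample\tCDS\t150\t800\t.\t+\t0\tID=geneA.1:CDS;Parent=geneA.1",
    "II\texample\tncRNA_gene\t50\t300\t.\t-\t.\tID=geneB",
]
hits = list(features_of_type(sample, ["ncRNA_gene", "CDS"]))
```

To use it on a downloaded file, pass an open file handle instead of the sample list.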

Example advanced search query: snoRNA genes

Yes, sequence and feature annotations data files are available for each chromosome on the Genome Datasets page (FTP download).

See the Citing PomBase page, which lists papers to cite for PomBase, the S. pombe genome sequence, and Compara. Additional key papers may be added as needed.

Yes. First, make sure sequence display is "on":

  1. From the Location tab, click on "Configure this Page"
  2. In the left hand menu of the pop-up, click on "Sequence and assembly"
  3. If the box next to "Sequence" is blank, click it

(also see the FAQ on configuring tracks).

The Location tab shows two graphics. Between the two images there is a slider that controls the zoom. Click the '+' or drag the vertical bar to the left to zoom in. The lower graphic will first display colored blocks representing color-coded nucleotides, and then, at maximum zoom, legible sequence. (Note that the vertical bar may appear to be all the way to the left of the slider before you actually zoom in enough to read the sequence; keep clicking '+' if necessary.)

At any zoom level, use the arrows flanking the zoom slider to scroll along the sequence.

Yes, you can search for short nucleotide sequences, such as primers or other oligomers, in the PomBase BLAST. For sequences less than 20 nt long, however, you may need to change the search sensitivity from "Normal" to "Short sequences" using the pulldown menu at the bottom of the query form.

PomBase does not have its own tool for ID conversion. We suggest you try the EBI's PICR web service (http://www.ebi.ac.uk/Tools/picr/), which can convert between UniProtKB, RefSeq, Ensembl Genomes (including S. pombe systematic IDs) and many other common database IDs.

For UniProt IDs, we provide a static mapping file of PomBase systematic IDs and UniProtKB accessions, available on the Data Mapping page and by FTP from ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Mappings/PomBase2UniProt.tsv.
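Once downloaded, a mapping file of this kind can be loaded into a lookup table. The sketch below assumes a two-column, tab-separated layout with the systematic ID first and the UniProtKB accession second; that layout is an assumption, so check the file before relying on it. The function name and the IDs in the sample are placeholders, not real identifiers.

```python
import csv
import io


def load_mapping(handle):
    """Read tab-separated rows into {systematic_id: uniprot_accession}."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, short or commented rows.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        mapping[row[0].strip()] = row[1].strip()
    return mapping


# Placeholder rows standing in for the downloaded file (assumed layout).
sample = io.StringIO("SPXX0000.01\tX00001\nSPXX0000.02c\tX00002\n")
id_to_uniprot = load_mapping(sample)
```

For the real file, replace the sample with `open("PomBase2UniProt.tsv", newline="")`.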

For other Schizosaccharomyces species (S. japonicus, S. octosporus, S. cryophilus), the PICR service converts between UniProtKB accessions and the identifiers used in Rhind et al. Comparative functional genomics of the fission yeasts (PMID:21511999), but with one caveat: the IDs used by PICR, UniProtKB, and QuickGO contain underscores, whereas those in Rhind et al. do not (e.g. SJAG_00455 versus SJAG00455). The IDs with underscores are correct.
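If you have a list of IDs in the style without underscores, a small helper can convert them before you submit them to a service that expects the underscore form. This is a sketch only; it assumes IDs consist of a four-letter prefix followed by digits, as in the example above, and the function name is hypothetical.

```python
import re

# Four capital letters, an optional underscore, then digits only.
_ID_PATTERN = re.compile(r"^([A-Z]{4})_?(\d+)$")


def with_underscore(gene_id):
    """Return the ID in the underscore form; leave other IDs unchanged."""
    match = _ID_PATTERN.match(gene_id.strip())
    if match is None:
        return gene_id
    return f"{match.group(1)}_{match.group(2)}"


converted = with_underscore("SJAG00455")
```

IDs that do not match the pattern, such as S. pombe systematic IDs, are returned unchanged.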

Yes: Ensembl Genomes provides a RESTful interface (see the Ensembl Genomes REST API Endpoints page) that allows language-independent programmatic access to all genomes accessible through Ensembl Genomes, including the same Schizosaccharomyces pombe genome data available in PomBase. The REST interface provides data in a variety of formats including GFF3, FASTA and JSON. Data types accessible via this interface include:

  • genomic features, including genes and CDSs
  • genomic and protein sequences
  • cross-references including ontologies
  • gene trees and orthologues

In addition, the interface also provides access to the Variant Effect Predictor tool and a tool for mapping genomic coordinates between different versions of genome assemblies.
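As a hedged sketch of what a request might look like, the code below builds URLs for two endpoints of the Ensembl REST API (`lookup/id` and `overlap/region`). The base URL, the species name and the chromosome label "III" are assumptions and may differ for the server you use; check the endpoint documentation before relying on them. The code only constructs the URLs and does not contact any server.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://rest.ensembl.org"    # assumption: current REST server
SPECIES = "schizosaccharomyces_pombe"    # assumption: species name on the server


def lookup_url(stable_id, base=BASE_URL):
    """URL that looks up a feature by its stable (systematic) ID."""
    query = urlencode({"content-type": "application/json"})
    return f"{base}/lookup/id/{quote(stable_id)}?{query}"


def overlap_url(region, feature="gene", species=SPECIES, base=BASE_URL):
    """URL that lists features of one type overlapping a region."""
    query = urlencode({"feature": feature, "content-type": "application/json"})
    return f"{base}/overlap/region/{species}/{quote(region, safe=':-')}?{query}"


gene_url = lookup_url("SPBC11B10.09")
region_url = overlap_url("III:5542-13722")   # "III" is an assumed label
```

The resulting URLs can be opened in a browser or fetched with any HTTP client.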

The user guide provides comprehensive descriptions of interface functionality, plus examples using a variety of languages and interfaces. The following URLs are examples specific to S. pombe:

Yes, the Ensembl API can be used with PomBase, as documented here:

  1. Ensembl Perl API installation instructions
  2. Ensembl core database API documentation
  3. Tutorial for using the API with the core database - includes examples of connecting to the database and retrieving chromosomes, genes, transcripts and translations, along with the corresponding xrefs.

We will add examples for common API uses soon.

In the future, we plan to make Fission Yeast Phenotype Ontology (FYPO) terms and annotations available in a browser analogous to AmiGO or QuickGO. Until such a browser becomes available, FYPO is accessible in these external resources:

NCBO BioPortal - search on the BioPortal home page, go to the FYPO summary page, or go to the FYPO terms page. For assistance, see the "User Interface" part of the BioPortal Help.

EBI's Ontology Lookup Service (OLS) - search on the OLS home page or go to the FYPO page. Help is provided on each page.

No; at present only the Ensembl genome browser is available via the PomBase web site. (As of May 2013, we are investigating the possibility of adding an Artemis applet to PomBase, and will update this FAQ accordingly when it becomes available.)

If you want to browse the S. pombe genome in the Artemis environment, it is fairly easy to download and run locally:

Once you have loaded the file(s), you can do many different things, e.g.:

  • Find features by name or ID
  • Find all features of a given type (e.g. see the "can I find transposons" FAQ)
  • Find matches to a specific nucleotide sequence (e.g. see the "restriction enzyme map" FAQ)
  • View the nucleotide or amino acid sequence of a region or feature
  • Export selected sequences

A video demonstrating Artemis installation is available on YouTube. See the Artemis FAQ and the Artemis manual (pdf; Sanger site) for additional information.

The Fission Yeast GO slim terms page provides a generic GO biological process slim for S. pombe, and shows total genes annotated to each term directly or to any of its descendants.

If you want GO slim annotations for your own list of S. pombe genes, we recommend the GO Term Mapper at Princeton. Upload your list of genes, and select "PomBase (S. pombe GOslim) (Process only)" from the "Choose GO slim" pulldown. GO Term Mapper's interface and documentation should make the rest straightforward, but let PomBase staff know if you have any problems.

For further information on using the generic S. pombe slim, or on creating your own GO slim, please see the Fission Yeast GO slimming tips page.

Clones included in the final sequence are available from archives@sanger.ac.uk

GO term enrichment identifies GO terms that are significantly overrepresented (or underrepresented) among a set of genes.

At present PomBase does not have its own GO enrichment tool. We recommend using the Generic GO Term Finder at Princeton, because it offers a simple interface and up-to-date ontology and annotation data, including the current PomBase GO annotation dataset (you can upload your own background set, GO annotation file, or both). You can also use the GO Term Finder to retrieve all annotations for your gene list by setting the p-value to 1.
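To give a sense of what "significantly overrepresented" means, the sketch below computes a one-sided hypergeometric p-value for a single term. This is one common approach to enrichment testing, shown under our own assumptions; it is not necessarily the exact procedure used by any particular tool, and it omits multiple-testing correction. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes by chance.

    population: number of genes in the background set
    annotated:  background genes annotated to the term
    sample:     number of genes in your list
    hits:       genes in your list annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: all 5 listed genes carry a term held by 5 of 10 background genes.
p_small = enrichment_p_value(10, 5, 5, 5)
# With hits = 0 every outcome counts, so the p-value is 1.
p_all = enrichment_p_value(10, 5, 5, 0)
```

A threshold of 1 therefore accepts every term, which is why setting the p-value to 1 returns all annotations for a gene list.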

Before you perform an enrichment analysis, we recommend that you use the accompanying "slimming" tool, GO Term Mapper, which is useful for a broad overview of the annotation set (for more information, see the Fission Yeast GO slim terms page and FAQ). GO Term Mapper is especially useful if you use your own GAF for the enrichment, because it will show:

  • IDs in your gene list that are missing from the annotation set (the annotations in GO Term Mapper's database or your uploaded GAF)
  • Genes in your list that have annotations but do not map to the slim
  • Genes in your list that have no annotations

A few other enrichment tools are described on the GO Consortium's GO Enrichment Analysis page.

For any GO analysis, we strongly recommend that you describe your approach fully in methods, and include the release details (number and/or date) for PomBase and the GO terms and annotations you use.

You can use Compara via the genome browser to see multiple alignments:

  1. On any gene page, go to the Orthologs section (scroll or use the Quick Links box).
  2. Follow the relevant link to Compara - for fungal alignments, choose "View orthologs in other fungal species with Compara", or for all eukaryotic species choose "View orthologs across taxonomic space using pan-species Compara".
  3. You should see a "collapsed" gene tree highlighting your fission yeast gene of interest. From here you can click on any node to see a menu of options:
    1. Expand or collapse specific sub-nodes of the tree, or expand the tree fully
    2. View the alignment in FASTA format
    3. Launch the jalview multiple alignment viewer to see the full alignment and colour by residue conservation, hydrophobicity, etc.

To configure the protein entries visible in the alignment, select the most "inclusive" node you require. You can reduce the number of entries by collapsing individual sub-trees (the first option in step 3) before you generate your alignment. A brief video demonstrates using the Compara trees.

Information about how the Compara trees are generated, homology types, and species are available here: http://fungi.ensembl.org/info/genome/compara/homology_method.html

For orthologs that are not manually curated by PomBase, we suggest two approaches:

Compara

You can search for orthologs/paralogs in Fungi, or in a pan-taxonomic comparison (eukaryotes), using Compara in the Ensembl browser.

  1. On any gene page, go to the Orthologs section (scroll or use the Quick Links box).
  2. Follow the relevant link to Compara - for fungal alignments, choose "View orthologs in other fungal species with Compara", or for all eukaryotic species choose "View orthologs across taxonomic space using pan-species Compara".
  3. You should see a "collapsed" gene tree highlighting your fission yeast gene of interest. From here you can click on any node to see a menu of options:
    1. Expand or collapse specific sub-nodes of the tree, or expand the tree fully
    2. View the alignment in FASTA format
    3. Launch the jalview multiple alignment viewer to see the full alignment and colour by residue conservation, hydrophobicity, etc.

To configure the protein entries visible in the alignment, select the most "inclusive" node you require. You can reduce the number of entries by collapsing individual sub-trees (the first option in step 3) before you generate your alignment. A brief video demonstrates using the Compara trees.

Information about how the Compara trees are generated, homology types, and species is available from the Ensembl comparative genomics documentation.


YOGY

From any gene page, follow the link to YOGY under External References.

YOGY is a web-based resource for retrieving orthologous proteins from ten eukaryotic organisms and one prokaryote: Homo sapiens, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, Dictyostelium discoideum, Drosophila melanogaster, Caenorhabditis elegans, Plasmodium falciparum, Escherichia coli, Schizosaccharomyces pombe, and Saccharomyces cerevisiae. Using a gene or protein identifier from any of these organisms as a query, this database provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, Homologene, OrthoMCL

PomBase does not offer a converter. The Sequence Ontology site has a conversion script that can be used via a web form (at http://www.sequenceontology.org/cgi-bin/converter.cgi) or checked out from their CVS repository.

PomBase does not offer a GFF-to-GTF converter. There is a perl script on SEQanswers, which uses the module Bio::Tools::GFF from the BioPerl library, available from http://seqanswers.com/forums/showpost.php?p=22529&postcount=4

No, but this can be done within Artemis.

Install Artemis (available from http://www.sanger.ac.uk/resources/software/artemis/; a video is available).

You can then read in the EMBL format chromosome contig files of sequence and annotation (available from the Genome Datasets page). To generate a restriction map:

  1. Create a new entry using the "Create" menu item "New Entry"
  2. Toggle off the main annotation by un-checking the chromosome contig file (this will make your new file "no name" the active entry).
  3. Save your new file with your preferred name.
  4. Use the Create menu option "Mark From Pattern" to create features for any restriction patterns of interest and save them into your file.
  5. You can add "color" labels to distinguish the different restriction sites. See the Artemis FAQ and the Artemis manual (pdf; Sanger site) for additional information.
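If you only need the positions of a recognition sequence rather than a graphical map, a short script can do something similar to "Mark From Pattern" on a downloaded sequence. The sketch below is not part of Artemis; the function name and the toy sequence are hypothetical. It searches the forward strand only, which is sufficient for palindromic sites such as EcoRI (GAATTC); for a non-palindromic site you would also need to search for its reverse complement.

```python
import re


def find_sites(sequence, pattern):
    """Return 1-based start positions of every match, including overlaps."""
    # A lookahead lets the regex engine report overlapping matches.
    lookahead = re.compile(f"(?=({pattern}))", re.IGNORECASE)
    return [m.start() + 1 for m in lookahead.finditer(sequence)]


toy = "ttGAATTCaaaGAATTCgg"   # toy sequence, not genome data
positions = find_sites(toy, "GAATTC")
```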

There are two possible approaches:

1. Retrieve a set of GO annotations in GAF format for S. japonicus, S. octosporus or S. cryophilus, as described in the relevant FAQ. Use the GO annotation dataset and your gene list for enrichment.

OR

2. In your gene list of interest, substitute the Schizosaccharomyces species gene IDs with the IDs of orthologous S. pombe genes. For ortholog IDs, see the FAQ on Schizosaccharomyces orthologs, and use the indicated table from Rhind et al. Comparative functional genomics of the fission yeasts (PMID:21511999).

In either case, you can then proceed as described in the FAQ on S. pombe GO enrichment. For the first option, use the Princeton GO Term Finder or another enrichment tool that allows you to use your own GAF, and include the GO Slim analysis using GO Term Mapper as recommended in the FAQ on enrichment in S. pombe.

FYPO enrichment analysis is analogous to GO term enrichment, using phenotypes rather than GO annotations, i.e. analysing a gene list by finding FYPO terms that are significantly over- or under-represented among the annotations for the genes.

At present, PomBase does not have its own FYPO enrichment tool, and very few ontology enrichment tools can use phenotype data. One that does is AnGeLi, produced by Jürg Bähler's lab.

A small number of enrichment tools use phenotype data. See the FAQ on FYPO term enrichment.

You can use the Ensembl Genomes (EG) MySQL database access to query S. pombe data. Note, however, that there is often a time lag in updating EG, so it may not have data as up-to-date as on the PomBase web site. MySQL dumps of EG data, including Schizosaccharomyces species, are available from EG's FTP site. (We plan to provide MySQL dumps for PomBase releases soon.)

For Chado, we do not have a publicly accessible PostgreSQL server. Instead, you can download Chado database dumps to query locally.

You can find the GO annotations for your genes corresponding to functional roles and localizations. Our recommended approach depends on how many specific topics you are interested in:

  • For a small number of specific GO terms (e.g. localization to the nucleus or cytoplasm, or a role in signaling or DNA metabolism), you can import your gene list into the Advanced Search and then combine it with a query for each term of interest (use the "Systematic IDs" filter for your list, and then the Term name or GO ID filter; see the search documentation for more information).
  • If you are interested in many GO terms, or if you do not know in advance which terms may be relevant, we recommend that you use a "GO term enrichment" tool. Such tools are typically used to find terms overrepresented for a gene list, but can be used to retrieve all GO annotations if the p-value threshold is set artificially high.

Both the Advanced Search and term enrichment tools take advantage of the hierarchical structure of GO, such that annotations to specific terms are propagated to "ancestor" terms via is_a and part_of relations. See the PomBase GO documentation, and the GO Consortium documentation linked there, for more information. (These approaches also make it easier to maintain and update your data than storing individual GO annotations locally.)
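The propagation described above can be pictured as walking up the ontology from each annotated term. The sketch below uses a toy ontology with made-up term labels (not real GO IDs) and treats every recorded parent link as an is_a or part_of relation; it illustrates the idea and is not the implementation used by PomBase or by any enrichment tool.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` via the recorded parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for_term(query, annotations, parents):
    """Genes annotated to `query` directly or to any of its descendants."""
    return {
        gene
        for gene, terms in annotations.items()
        if any(t == query or query in ancestors(t, parents) for t in terms)
    }


# Toy ontology and annotations; all labels are hypothetical.
parents = {"T:child": ["T:mid"], "T:mid": ["T:root"], "T:other": ["T:root"]}
annotations = {"geneA": ["T:child"], "geneB": ["T:other"], "geneC": ["T:mid"]}
mid_genes = genes_for_term("T:mid", annotations, parents)
```

A query for "T:mid" returns geneA as well as geneC, because the annotation to "T:child" propagates to its ancestor.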

Also see the FAQ on GO term enrichment and the PomBase GO Slim page.

"GO term enrichment" refers to analysing a gene list by finding GO terms that are significantly over- or under-represented among the annotations for the genes. Finding GO terms that are shared by genes in your list can help you find out what they have in common biologically.

PomBase does not have its own GO enrichment tool, but we recommend one, and provide a bit more information, in the FAQ on GO term enrichment.

To visualise interaction networks for S. pombe genes, PomBase links to esyN from gene pages and GO slim terms. esyN is a web-based tool for building, sharing, and viewing network data developed by Dan Bean and Giorgio Favrin in the Cambridge Systems Biology Centre, University of Cambridge, UK [1].

On gene pages, we have links to gene-specific interaction networks in esyN in the table headers of the Interactions sections:

  • The Genetic Interactions section links to all interactions centred on the gene and curated in BioGRID
  • The Physical interactions section has links to two datasets:

For example, the Genetic Interactions header for cdc2 links to http://www.esyn.org/builder.php?type=Graph&query=SPBC11B10.09&organism=4896&interactionType=genetic&source=biogrid
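Links of this form can also be generated for a list of genes. The sketch below rebuilds the example URL from its query parameters; the parameter names and values are taken from that example, while the function name is hypothetical and other parameter combinations are untested assumptions.

```python
from urllib.parse import urlencode


def esyn_gene_url(systematic_id, interaction_type="genetic", source="biogrid"):
    """Build an esyN link for the interactions centred on one gene."""
    params = {
        "type": "Graph",
        "query": systematic_id,
        "organism": 4896,            # value used in the example link
        "interactionType": interaction_type,
        "source": source,
    }
    return "http://www.esyn.org/builder.php?" + urlencode(params)


cdc2_url = esyn_gene_url("SPBC11B10.09")
```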

We also have esyN links on the GO Slim page and on ontology term pages for GO Slim biological process terms. Each GO Slim term links to the HCPIN physical interaction network in esyN. For example, the GO Slim page and the ontology term page for "regulation of mitotic cell cycle" (GO:0007346) link to http://www.esyn.org//builder.php?type=Graph&term=GO:0007346&interactionType=physical&source=pombase&includeInteractors=false

Using the esyN network display:

  • A brief simulation is used to position the nodes initially. This layout can often be improved by continuing the simulation: click "Layouts" in the left-hand panel, then click "Force-Directed (improve)". Repeat until you like the network arrangement.
  • You can extend any network in esyN:
    1. Click on a node in the display.
    2. Click "Extend Network" in the right-hand bar.
    3. In the Advanced Tools box under the network display, first click "Get Interactions".
    4. In the table that appears, click "Add" buttons to add individual interactions, or click "Add all".


You can also use esyN to:

  • Visualize interactions for a user-defined list of genes. To do this, visit http://www.esyn.org/builder.php, click on the "Network from list" option in the left-hand panel, and follow the instructions in the pop-up.
  • Build your own network -- either an Interactome graph or a Petri net -- from scratch (see the tutorial at http://www.esyn.org/tutorial.html). In both cases you can use the Advanced Tools to retrieve the interactions for a number of model organisms from several databases (see http://www.esyn.org/builder.php?type=Graph#interactions).
  • Save and share your networks. By logging in via the "My esyN" link (at the top of every esyN page), any user can save, share privately with collaborators, or publish any network.
  • Browse, view, and modify previously published models (both graphs and Petri nets) at http://www.esyn.org/browse.php. We describe these networks as "public" in the sense of the open source movement: they are free to be copied, modified and (when possible) re-published, and we actively encourage any collaborative effort to build and improve these biological networks.

[1] esyN reference: Bean DM, Heimbach J, Ficorella L, Micklem G, Oliver SG, Favrin G. esyN: network building, sharing and publishing. PLoS One. 2014 Sep 2;9(9):e106035. (PMID:25181461; DOI:10.1371/journal.pone.0106035)

Data that appear on gene pages -- sequence feature annotations, ontology annotations, etc. -- are stored in a database that uses the Chado schema. Dumps from the Chado database for each PomBase release are available via the Downloads page.

PomBase offers two ways to view nucleotide-level similarity between S. pombe and S. japonicus, S. octosporus, or S. cryophilus. Both use the Genome Browser.

  1. To view nucleotide similarity data tracks in the browser, follow the usual steps as described in the data track FAQ. Select the data type "Comparative Genomics".
  2. Display syntenic regions as follows:
    1. Go to your region of interest in the browser (e.g. follow the link from a gene page or use sequence coordinates). Make sure the "Location" tab is selected in the horizontal set of tabs along the top.
    2. In the left-hand menu, find the "Comparative Genomics" heading, and click on "Region Comparison".
    3. To select a species for comparison, go to the bottom of the left-hand menu, and click the "Select species or regions" link (it may appear to be subtly blinking; we apologise for this anomaly).
    4. In the popup, click the "+" beside any species in the "Unselected species or regions" list to move it to the "Selected species or regions" list. Note: "lastz" is the nucleotide alignment algorithm used. Close the popup - click the tick/check mark in the upper right corner, or click outside the popup.
    5. Synteny views will now be visible in the bottom-most graphical display (scroll down if necessary). For any region in the S. pombe genome, pink tracks show the region in the second genome with the best nucleotide alignment. Green bands connect the best-aligned regions to highlight synteny.
    6. A video is available demonstrating this feature.

One gene can be correctly annotated to both a "viable" term and an "inviable" term from FYPO, under certain circumstances:

  • Different alleles may have different phenotypes; e.g., a deletion may be inviable, but a point mutation may be fully viable or conditionally lethal.
  • One allele may cause death under some, but not all, conditions.
  • An allele may cause only some cells in a population to die (this would be annotated using an "inviable cell" term, with an extension to indicate incomplete penetrance ("low" or "medium"), plus an annotation to a "viable cell population" term).
  • Cells that can divide for a few generations but then die are annotated as inviable, but can acquire suppressor mutations at a high enough frequency for populations to appear viable.

At present, alleles cannot be queried directly in the PomBase advanced search, but the FYPO phenotype filters do allow you to retrieve annotations for all alleles, or to restrict to null expression (deletions etc.) or overexpression of the wild-type allele. Comparing results with and without the allele restrictions may help resolve apparent discrepancies.

Note that it is not yet possible to search for specific conditions, or for penetrance, but we plan to add these features to the Advanced Search.

If, however, the allele and condition details are identical, annotation to both viable and inviable terms is probably an error (either one of the terms is wrong, or there are missing or incorrect details for the alleles and/or conditions). Please let us know via the helpdesk if you notice any potential errors.

You can search for genes annotated to a Fission Yeast Phenotype Ontology term in the Advanced Search (http://www.pombase.org/spombe/query/builder or go to the Find tab and click "Advanced Search").

In the "Select Filter" pulldown, if you know the ID (for example, "inviable cell" is FYPO:0000049, and "elongated cell" is FYPO:0000017) choose "FYPO ID", and then type or paste the ID into the box. Otherwise, choose "FYPO Term Name" and start typing; the autocomplete feature will suggest phenotypes. Choose one, and click the Submit button to run the search. You can download the list in plain text or a few other formats from the query results page.

Note that the FYPO search retrieves annotations by following the is_a, part_of, output_of, has_output, and has_part relationships in the ontology. For example, FYPO includes the relation "inviable swollen elongated cell with enlarged nucleus" (FYPO:0002083) has_part "swollen cell" (FYPO:0000025). Genes annotated to FYPO:0002083 will therefore be retrieved in a search for FYPO:0000025. See the Advanced Search documentation for more information.
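
As an illustration only (this is not PomBase's implementation), the following Python sketch shows how a search that follows ontology relationships can retrieve genes annotated to more specific terms. The `PARENTS` graph and `ANNOTATIONS` table are invented for the example; only the two FYPO IDs and the has_part link between them come from the text above, and the gene names are placeholders.

```python
from collections import deque

# Hypothetical mini-ontology: term -> list of (relation, target term).
# Only the FYPO:0002083 has_part FYPO:0000025 link comes from the FAQ text.
PARENTS = {
    "FYPO:0002083": [("has_part", "FYPO:0000025")],
    "FYPO:0000025": [],
}

# Relations the FAQ says the FYPO search follows.
FOLLOWED = {"is_a", "part_of", "output_of", "has_output", "has_part"}

# Hypothetical direct annotations: gene -> set of terms (placeholder names).
ANNOTATIONS = {
    "geneA": {"FYPO:0002083"},
    "geneB": {"FYPO:0000025"},
    "geneC": set(),
}


def ancestors(term):
    """Return the term plus every term reachable via the followed relations."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, target in PARENTS.get(current, []):
            if relation in FOLLOWED and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def genes_for(query_term):
    """Genes annotated to the query term directly or via a linked term."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(query_term in ancestors(term) for term in terms)
    )


print(genes_for("FYPO:0000025"))  # ['geneA', 'geneB']
```

In this toy data, geneA is annotated only to FYPO:0002083 but is still returned for FYPO:0000025, mirroring the behaviour described above.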

Example query: Genes annotated to "elongated cell" (FYPO:0000017), all alleles

Also see the FAQ on finding essential genes.

If an essential gene is deleted, the cell cannot survive under normal laboratory conditions. A search for deletion alleles annotated to the Fission Yeast Phenotype Ontology term "inviable vegetative cell population" (FYPO:0002061) would therefore identify essential fission yeast genes. Similarly, deletion alleles annotated to "viable vegetative cell population" (FYPO:0002060) represent non-essential genes.

Downloadable summary

A set of "viability summary" data, as shown at the top of the FYPO table on each gene page, is available as a downloadable file. The file has two columns: the gene systematic ID and one of three values: "viable", "inviable" or "condition-dependent".
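
Once downloaded, a two-column file like this is easy to process with a short script. The Python sketch below assumes the columns are tab-separated (an assumption; check the actual file) and uses invented placeholder IDs rather than real systematic IDs.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for the downloaded file (IDs are placeholders).
SAMPLE = (
    "GENE_0001\tviable\n"
    "GENE_0002\tinviable\n"
    "GENE_0003\tcondition-dependent\n"
    "GENE_0004\tviable\n"
)

# The three values the FAQ says can appear in the second column.
ALLOWED = {"viable", "inviable", "condition-dependent"}


def read_viability(handle):
    """Map each systematic ID to its viability value, rejecting unknown values."""
    summary = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        gene_id, value = row[0], row[1]
        if value not in ALLOWED:
            raise ValueError(f"unexpected viability value: {value!r}")
        summary[gene_id] = value
    return summary


summary = read_viability(io.StringIO(SAMPLE))
counts = Counter(summary.values())
inviable_genes = sorted(g for g, v in summary.items() if v == "inviable")

print(counts["viable"], counts["inviable"], counts["condition-dependent"])
print(inviable_genes)
```

To use a real download, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open(path, newline="")`.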

Querying

  • To find genes annotated to "inviable vegetative cell population", select the "FYPO ID" filter and type or paste the ID, FYPO:0002061. Set the Allele Expression pulldown to "Null Expression" and submit the query. The results include all genes that showed inviable phenotypes in the HTP deletion project as well as some manually annotated genes. Do the same for viable (FYPO:0002060).
  • For some deletion mutants, viability depends on experimental conditions, which cannot yet be queried in PomBase. These genes are annotated to both viable (FYPO:0002060) and inviable (FYPO:0002061) at once. To find them, use the "AND" operator in the Query Management panel (this search can also be set up all at once in the New Query panel).
  • See the Advanced Search documentation for more information on performing the searches described here.
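
If you download the two result lists, the same "AND" logic can be reproduced offline with set operations. The Python sketch below uses invented gene IDs; labelling the overlap "condition-dependent" is an assumption based on the description above, not a statement of how the viability summary file is generated.

```python
# Made-up result sets for the two "Null Expression" queries (placeholder IDs).
inviable = {"GENE_0001", "GENE_0002", "GENE_0003"}  # FYPO:0002061
viable = {"GENE_0003", "GENE_0004"}                 # FYPO:0002060


def classify(viable_genes, inviable_genes):
    """Split genes into viable-only, inviable-only and annotated-to-both groups."""
    both = viable_genes & inviable_genes  # equivalent to the "AND" operator
    return {
        "condition-dependent": sorted(both),
        "inviable": sorted(inviable_genes - both),
        "viable": sorted(viable_genes - both),
    }


groups = classify(viable, inviable)
print(groups["condition-dependent"])  # ['GENE_0003']
```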

A brief note about FYPO terms

At present, there are very few null mutants annotated as inviable in life cycle stages other than vegetative growth, and "inviable vegetative cell population" best fits the most common usage of "essential gene". If you do want to include other stages (such as "inviable spore"), you can use the very generic term "inviable cell population" (FYPO:0002059) or "viable cell population" (FYPO:0002058) in your query. All of the caveats about alleles and conditions still apply.

In the future, we plan to make Fission Yeast Phenotype Ontology (FYPO) terms and annotations available in a browser analogous to AmiGO or QuickGO. Until such a browser becomes available, FYPO is accessible in these external resources:

NCBO BioPortal - search on the BioPortal home page, go to the FYPO summary page, or go to the FYPO terms page. For assistance, see the "User Interface" part of the BioPortal Help.

EBI's Ontology Lookup Service (OLS) - search on the OLS home page or go to the FYPO page. Help is provided on each page.

PomBase uses Gene Ontology (GO) molecular function terms to capture the activities -- including enzymatic activities, binding, transporters, etc. -- of gene products. You can therefore use the GO filters in the Advanced Search to retrieve genes whose products have a given activity.

In the "Select Filter" pulldown, if you know the ID (for example, "histone acetyltransferase activity" is GO:0004402, and "calcium ion transmembrane transporter activity" is GO:0015085) choose "GO ID", and then type or paste the ID into the box. Otherwise, choose "GO Term Name" and start typing; the autocomplete feature will suggest terms. Choose one, and click the Submit button to run the search. You can download the list in plain text or a few other formats from the query results page. You can try using more specific or less specific terms to retrieve the results that best fit your expectations and needs. See the Advanced Search documentation and the Gene Page GO documentation for more information, including how ontology searches retrieve annotations to general terms.

Example query: phosphoprotein phosphatase activity (GO:0004721)

Gene Ontology (GO) cellular component annotations capture the localizations of gene products to subcellular structures such as organelles or complexes. GO Cellular Component annotations are displayed on PomBase gene pages as described in the PomBase GO documentation. The GO Consortium provides documentation that describes what the Cellular Component ontology includes. To search for proteins (or functional RNAs) with a particular localization, use the Gene Ontology filter in the Advanced Search to find genes annotated to the relevant GO Cellular Component term(s).

PomBase GO Cellular Component annotations include data from the whole-genome localization study (Matsuyama et al. 2006) as well as manually curated data from papers on small-scale experiments, and inferences from ortholog annotations. Macromolecular complex annotations are also available in a file (see FAQ).

Example query: nucleus (GO:0005634)

Yes, there is a file that lists GO macromolecular complex assignments for fission yeast gene products in the FTP directory:

ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Complexes/

Note that the complex inventory includes the RNA subunits of ribonucleoprotein complexes. There is some redundancy in the list, because some gene products are annotated to both complexes and subcomplexes. For example, subunits of the DASH complex (GO:0042729) are annotated to 'condensed chromosome outer kinetochore' (GO:0000940) as well as GO:0042729. Additional notes are available in a README file: ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Complexes/README

Also see the FAQ on localization.

Protein modifications (where curated) are included in the Modifications section on gene pages. (We plan to include RNA modifications later.) The Gene Page modifications documentation describes the display.

To retrieve all genes whose products have a given modification, use the PSI-MOD filter in the Advanced Search. In the "Select Filter" pulldown, if you know the ID (for example, "phosphorylated residue" is MOD:00696) choose "PSI-MOD ID", and then type or paste the ID into the box. Otherwise, choose "PSI-MOD Term Name" and start typing; the autocomplete feature will suggest terms. Choose one, and click the Submit button to run the search. See the Advanced Search documentation for more information, including how ontology searches retrieve annotations to general terms.

We are aware that protein modification curation is relatively incomplete. If you know of any protein modifications that are missing from the gene pages or the search results, please notify the PomBase curators.

Example query: phosphorylated residue (MOD:00696)

GO term enrichment identifies GO terms that are significantly overrepresented (or underrepresented) among a set of genes.
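
To make "significantly overrepresented" concrete, the Python sketch below computes a one-sided hypergeometric p-value for a single term. It is a generic textbook test with invented numbers, offered as an illustration rather than a description of how any particular enrichment tool works; real tools also correct for testing many terms at once.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric P(X >= hits) for over-representation.

    population: number of genes in the background set
    annotated:  background genes annotated to the term
    sample:     number of genes in your list
    hits:       genes in your list annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Invented example: 10 background genes, 3 annotated to the term,
# and a list of 3 genes that are all annotated to it.
print(enrichment_p_value(10, 3, 3, 3))
```

With these numbers only one of the 120 possible three-gene lists contains all three annotated genes, so the p-value is 1/120.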

At present PomBase does not have its own GO enrichment tool. We recommend using the Generic GO Term Finder at Princeton, because it offers a simple interface and up-to-date ontology and annotation data, including the current PomBase GO annotation dataset (you can upload your own background set, GO annotation file, or both). You can also use the GO Term Finder to retrieve all annotations for your gene list by setting the p-value to 1.

Before you perform an enrichment analysis, we recommend that you use the accompanying "slimming" tool, GO Term Mapper, which is useful for a broad overview of the annotation set (for more information, see the Fission Yeast GO slim terms page and FAQ). GO Term Mapper is especially useful if you use your own GAF for the enrichment, because it will show:

  • IDs in your gene list that are missing from the annotation set (the annotations in GO Term Mapper's database or your uploaded GAF)
  • Genes in your list that have annotations but do not map to the slim
  • Genes in your list that have no annotations

A few other enrichment tools are described on the GO Consortium's GO Enrichment Analysis page.

For any GO analysis, we strongly recommend that you describe your approach fully in methods, and include the release details (number and/or date) for PomBase and the GO terms and annotations you use.

The Fission Yeast GO slim terms page provides a generic GO biological process slim for S. pombe, and shows total genes annotated to each term directly or to any of its descendants.

If you want GO slim annotations for your own list of S. pombe genes, we recommend the GO Term Mapper at Princeton. Upload your list of genes, and select "PomBase (S. pombe GOslim) (Process only)" from the "Choose GO slim" pulldown. GO Term Mapper's interface and documentation should make the rest straightforward, but let PomBase staff know if you have any problems.

For further information on using the generic S. pombe slim, or on creating your own GO slim, please see the Fission Yeast GO slimming tips page.

You can search for GO terms by name or ID in the PomBase Advanced Search, and retrieve a list of all genes annotated to the term and its descendants via the relations is_a, part_of, regulates, positively_regulates, and negatively_regulates. For example, a search for "cytokinesis" will include genes annotated to "regulation of cytokinesis". (See the GO documentation on Ontology Structure and Ontology Relations for more information.)

S. pombe GO annotations are also available in browsers that use the GO repository, notably AmiGO and QuickGO. Both browsers have extensive documentation available.

Hint: to find S. pombe annotations, use Taxon: 4896 (Schizosaccharomyces pombe) or Source: PomBase. You can download the results in GAF format.
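
If you already have a GAF that mixes several species, the same taxon restriction can be applied locally. The Python sketch below relies on the GAF 2.x convention that column 13 holds the taxon (for example "taxon:4896"); the sample rows, gene IDs, and reference strings are invented for the demonstration.

```python
TAXON_COL = 12  # GAF column 13 (0-based index 12), e.g. "taxon:4896"


def demo_row(obj_id, go_id, taxon, assigned_by):
    """Build a made-up 17-column GAF line for the demonstration."""
    cols = ["DemoDB", obj_id, obj_id, "", go_id, "REF:placeholder", "IDA", "",
            "C", "", "", "protein", taxon, "20200101", assigned_by, "", ""]
    return "\t".join(cols)


def rows_for_taxon(lines, taxon_id="4896"):
    """Yield split GAF rows whose taxon column includes the given taxon ID."""
    wanted = f"taxon:{taxon_id}"
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= TAXON_COL:
            continue  # too short to be a GAF annotation line
        # The taxon column may hold two pipe-separated taxa.
        if wanted in cols[TAXON_COL].split("|"):
            yield cols


SAMPLE = [
    "!gaf-version: 2.1",
    demo_row("GENE_0001", "GO:0005634", "taxon:4896", "PomBase"),
    demo_row("GENE_0002", "GO:0005634", "taxon:9606", "OtherSource"),
]

hits = list(rows_for_taxon(SAMPLE))
print([(cols[1], cols[4]) for cols in hits])  # [('GENE_0001', 'GO:0005634')]
```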

In PomBase, GO IDs on gene pages link to QuickGO, and ontology detail pages for GO terms offer links to both AmiGO and QuickGO.

Annotation extensions can be used with annotations to terms from various ontologies, such as GO, FYPO, modifications, etc. Extensions provide additional specificity to the annotation by linking the term to another ontology term or a gene product via a relationship.

Extensions are most commonly used with GO annotations, where they can be used to capture details such as substrates of molecular functions or cell cycle phases during which a localization is observed. More information is available in the gene page GO annotation documentation. The GO Consortium provides further information on annotation extensions in its annotation documentation, including the file format guide, on a wiki page, and in a publication. PomBase converts many extension names to more human-friendly text, as described here.

Phenotype annotations using FYPO may have extensions that capture expressivity (severity) or penetrance, or identify a gene or gene product used in an assay, as described in the gene page phenotype documentation.

PBO is an internal set of terms used for various PomBase annotations that do not fit into any of the other ontologies in use. PBO IDs and term names can be queried in the Advanced Search, and are most useful if you have noted a term or ID from a gene page. Examples include complementation annotations (e.g. cdc2 'functionally complemented by H. sapiens CDK1' PBO:0012584), disease association, and "miscellaneous" annotations (e.g. pom1 'forms a polar gradient' PBO:0000437).

There are two possible approaches:

1. Retrieve a set of GO annotations in GAF format for S. japonicus, S. octosporus or S. cryophilus, as described in the relevant FAQ. Use the GO annotation dataset and your gene list for enrichment.

OR

2. In your gene list of interest, substitute the Schizosaccharomyces species gene IDs with the IDs of orthologous S. pombe genes. For ortholog IDs, see the FAQ on Schizosaccharomyces orthologs, and use the indicated table from Rhind et al. Comparative functional genomics of the fission yeasts (PMID:21511999).

In either case, you can then proceed as described in the FAQ on S. pombe GO enrichment. For the first option, use the Princeton GO Term Finder or another enrichment tool that allows you to use your own GAF, and include the GO Slim analysis using GO Term Mapper as recommended in the FAQ on enrichment in S. pombe.

For the sequenced strains of S. japonicus, S. octosporus and S. cryophilus, the Ensembl group has generated GO annotation data sets for protein-coding genes by transferring experiment-based annotations from S. pombe orthologs. You can use the QuickGO browser to retrieve the data for each species -- follow the "Search and Filter GO annotation sets" link, then click "Filter" to set a taxon filter for the taxon ID:

S. japonicus (strain yFS275) - 402676
S. octosporus (strain yFS286) - 483514
S. cryophilus (strain OY26) - 653667

Because these automated annotations are inferred only from experimentally-derived S. pombe annotations, coverage will not be complete.

Note that the GAF downloaded from QuickGO uses UniProtKB accessions in the gene product ID column (column 2). To use the GAF in any further analysis, such as term enrichment, you will have to convert the accessions to systematic IDs. See the FAQ on ID mapping for hints.
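
The conversion itself can be scripted once you have a mapping table. The Python sketch below rewrites column 2 of each annotation line; the mapping, accessions, and systematic IDs are placeholders, and the sample lines are truncated to the first few columns for brevity. Column 1 (the database name) is left unchanged here, which you may also want to update.

```python
def remap_gaf_ids(lines, mapping):
    """Rewrite GAF column 2 using `mapping`; return (new_lines, unmapped_ids).

    Lines whose ID has no entry in `mapping` are left out of the output
    and reported, so that gaps in the mapping are visible.
    """
    out, unmapped = [], set()
    for line in lines:
        if line.startswith("!"):
            out.append(line)  # keep header lines unchanged
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            out.append(line)  # not an annotation line; pass through
            continue
        accession = cols[1]
        if accession in mapping:
            cols[1] = mapping[accession]
            out.append("\t".join(cols))
        else:
            unmapped.add(accession)
    return out, sorted(unmapped)


# Placeholder mapping and truncated demo lines (not real accessions or IDs).
MAPPING = {"ACC_A": "GENE_0001"}
LINES = [
    "!gaf-version: 2.1",
    "\t".join(["UniProtKB", "ACC_A", "abc1", "", "GO:0005634"]),
    "\t".join(["UniProtKB", "ACC_B", "xyz1", "", "GO:0005634"]),
]

converted, missing = remap_gaf_ids(LINES, MAPPING)
print(converted[1].split("\t")[1])  # GENE_0001
print(missing)                      # ['ACC_B']
```

The same pattern applies to any other ID substitution in a GAF, given a suitable mapping table.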

One feasible approach to improve annotation coverage is to download the S. pombe GO annotations (see the GO Associations download page), and then substitute the S. pombe IDs with the IDs of orthologous genes from the other Schizosaccharomyces species of interest. For ortholog IDs, see the FAQ on Schizosaccharomyces orthologs, and use the indicated table from Rhind et al. Comparative functional genomics of the fission yeasts (PMID:21511999).

Note that some genes are present in S. japonicus, S. octosporus or S. cryophilus but absent from S. pombe. For some of these gene products, GO annotations can be transferred from other species. If you wish to include annotations for these genes in your analysis you will need to use this option, and extend your GAF with the relevant annotation lines (contact the Helpdesk if you need assistance).

Combining all approaches gives the best coverage possible at present. You can use a "GO Slim" tool such as Princeton's GO Term Mapper to see if there are any gaps in coverage, as described in the FAQ on enrichment in S. pombe. Also see the FAQs on GO term enrichment in other Schizosaccharomyces species.

The best way to find metabolism-related annotations for S. pombe genes is to use the GO annotation data available from PomBase in combination with mappings between GO terms and entries in the various metabolism-oriented databases.

For example, many GO molecular function (MF) terms representing enzymatic activities are mapped to the corresponding Enzyme Commission (EC) number for the reaction, and some are also mapped to entries from KEGG or from the Rhea database of annotated chemical reactions. GO MF and biological process (BP) terms may be annotated to reactions or pathways, respectively, in MetaCyc or Reactome.

A complete list, with descriptions and links, is available on the GO Consortium's Download Mappings page.

FYPO enrichment analysis is analogous to GO term enrichment, using phenotypes rather than GO annotations, i.e. analysing a gene list by finding FYPO terms that are significantly over- or under-represented among the annotations for the genes.

At present, PomBase does not have its own FYPO enrichment tool, and very few ontology enrichment tools can use phenotype data. One that does is AnGeLi, produced by Jürg Bähler's lab.

A small number of enrichment tools use phenotype data. See the FAQ on FYPO term enrichment.

A selection of protein sequence motifs and features have been manually curated using terms from the Sequence Ontology (SO). For example, Rad54 has a KEN box (a motif recognized by the anaphase-promoting complex; SO:0001807), and Cuf1 and Trz1 have nuclear localization signals (NLS; SO:0001528). These annotations are included in the Protein Features section of the gene page.

To search for these features, use one of the "Sequence Ontology" filters in the Advanced Search (see the documentation for help with searching).

Also see the FAQs on transmembrane domains and protein families, and the section of the search documentation on using Protein Filters.

Example query: nuclear localization signal (SO:0001528)

If there is complementation data available for an S. pombe gene, it will be displayed in the Complementation section of the gene page. For example, ura3 can be complemented by S. cerevisiae URA1, and itself complements human DHODH.

To search for complementation annotations, use one of the "PBO" filters in the Advanced Search (see the documentation for help with searching). The complementation descriptions are stored as entries in the PBO internal ontology, so a search for PBO term names that match "complements" or "complemented by" will retrieve genes with complementation data curated. The most general term, "complementation" (PBO:2000000) retrieves all genes that have any complementation annotation.

PomBase curators use GO Biological Process annotations to indicate that a gene product is directly involved in a process or its regulation. FYPO annotations indicate when a mutation in a gene causes a change in a process, but do not say whether the effect is direct or indirect.

Many mutant phenotypes reflect downstream effects of compromising an upstream process. In these cases, we annotate the phenotypes using FYPO terms, but do not annotate to the corresponding GO biological process term. We use "regulation of biological process" GO terms in cases where there is evidence for a gene playing a regulatory role in wild-type cells, but not where defects in an upstream process affect a downstream process (even though the latter is sometimes described as "regulating" or "modulating" the downstream process).

For example, a defect in cellular respiration may arise from mutations in genes directly involved in respiration, but also as a downstream effect of mutations in genes involved in mitochondrial translation, respiratory chain complex assembly, or ubiquinone biosynthesis. Similarly, DNA replication defects often also lead to defects in chromosome segregation; for the genes involved we annotate both replication and segregation phenotypes, but only replication in GO biological process.

The cell cycle offers an even more dramatic example of why we restrict usage of GO annotations. Over 750 genes can be mutated to give an elongated vegetative cell phenotype, which is traditionally interpreted as indicating that cell cycle progression is blocked in interphase. Most of these genes, however, are involved in transcription, translation, transport or splicing, and cell cycle delays seen in mutants are due to activation of cell cycle checkpoints by the abnormal processes. To annotate all 750 genes to "regulation of mitotic cell cycle" would obscure the genes that actually are part of the cell cycle regulatory network, greatly reducing the usefulness and precision of GO annotations.

Also see the FAQ on finding genes that affect a process.

To find genes that have any effect on a process, we recommend searching for both GO and FYPO terms relevant to the process.

As described in the FAQ on GO and FYPO annotations, PomBase curators annotate all genes with phenotypes that affect a process, whereas GO annotations are restricted to genes whose products act directly in a process or its regulation. By querying for genes annotated to either a GO term or a FYPO term, you can find genes with relevant phenotypes (including "downstream effects") as well as genes involved in a process (with or without mutant phenotypes affecting the process).

Use the "OR" operator in the PomBase Advanced Search, available in Query Management, as described in the Advanced Search documentation. For example, to find genes that affect cellular respiration, search for "FYPO:0000078 (abnormal cellular respiration) OR GO:0045333 (cellular respiration)". For any process, you can try using more specific or less specific terms to retrieve the results that best fit your expectations and needs.

Example query: genes annotated to 'abnormal cellular respiration' (FYPO:0000078) or 'cellular respiration' (GO:0045333)

The GO annotations available from PomBase (gene pages, advanced search, etc.) and the GO Consortium site (AmiGO; GO downloads) differ from those available from the UniProt GOA site (including QuickGO) for three main reasons:

  1. RNA - PomBase provides GO annotations for functional RNAs (e.g. rRNA, tRNA, snRNA), but at present the UniProt GOA dataset only includes annotations for protein-coding genes.
  2. Time lag - S. pombe GO data are updated at the same time on the PomBase and GO Consortium sites, but the UniProt GOA site may be up to a few weeks behind.
  3. Filtering - PomBase does not include automated annotations that are redundant with manual annotations (contact the Helpdesk for further details). The GO Consortium site uses the same filtered annotation dataset as PomBase, whereas the UniProt GOA site includes the automated annotations.



You can find the GO annotations for your genes corresponding to functional roles and localizations. Our recommended approach depends on how many specific topics you are interested in:

  • For a small number of specific GO terms (e.g. localization to the nucleus or cytoplasm, or a role in signaling or DNA metabolism), you can import your gene list into the Advanced Search and then combine it with a query for each term of interest (use the "Systematic IDs" filter for your list, and then the Term name or GO ID filter; see the search documentation for more information).
  • If you are interested in many GO terms, or if you do not know in advance which terms may be relevant, we recommend that you use a "GO term enrichment" tool. Such tools are typically used to find terms overrepresented for a gene list, but can be used to retrieve all GO annotations if the p-value threshold is set artificially high.

Both the Advanced Search and term enrichment tools take advantage of the hierarchical structure of GO, such that annotations to specific terms are propagated to "ancestor" terms via is_a and part_of relations. See the PomBase GO documentation, and the GO Consortium documentation linked there, for more information. (These approaches also make it easier to maintain and update your data than storing individual GO annotations locally.)

Also see the FAQ on GO term enrichment and the PomBase GO Slim page.

"GO term enrichment" refers to analysing a gene list by finding GO terms that are significantly over- or under-represented among the annotations for the genes. Finding GO terms that are shared by genes in your list can help you find out what they have in common biologically.

PomBase does not have its own GO enrichment tool, but we recommend one, and provide a bit more information, in the FAQ on GO term enrichment.

Yes, the Phenotype annotations page offers two options, a complete phenotype annotation file and a "viability summary" for deletion mutants. At present, the full file contains all manually curated single mutant phenotypes, and is in the same format as PomBase uses for bulk phenotype data submissions (see the file formats FAQ). Further information on the viability summary is available in the essential genes FAQ.