    Sub-cellular localization special characters and formalisms

    A comma (",") separates multiple sub-cellular locations that have all been experimentally established, whereas a slash ("/") separates two or more possible sub-cellular locations that have not yet been experimentally determined.
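    Because the separator carries meaning, a location entry can be interpreted mechanically. The sketch below is a minimal illustration, not part of the database: the function name `parse_location`, the category labels it returns and the example location names are hypothetical, and it assumes that location names never contain a comma or slash themselves and that the two separators are never mixed in one entry.

```python
from typing import List, Optional, Tuple


def parse_location(field: str) -> Tuple[str, List[str]]:
    """Split a location entry and report what its separator means.

    Returns ("established", [...]) for comma-separated entries (all locations
    experimentally established), ("undetermined", [...]) for slash-separated
    entries (possible locations not yet experimentally determined) and
    ("single", [...]) for an entry without a separator.
    """
    has_comma, has_slash = "," in field, "/" in field
    if has_comma and has_slash:
        # Assumption: mixing both separators is not defined by the formalism.
        raise ValueError("entry mixes ',' and '/': " + field)
    sep: Optional[str] = "," if has_comma else ("/" if has_slash else None)
    parts = [p.strip() for p in field.split(sep)] if sep else [field.strip()]
    kind = {",": "established", "/": "undetermined", None: "single"}[sep]
    return kind, parts


print(parse_location("Cytoplasm, Cell membrane"))  # ('established', ['Cytoplasm', 'Cell membrane'])
print(parse_location("Cytoplasm/Secreted"))        # ('undetermined', ['Cytoplasm', 'Secreted'])
```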

    Non-experimental qualifiers

    Each protein is assigned a level of evidence for its sub-cellular location. Proteins whose location has been experimentally verified are labeled "Experimental". In the absence of experimental evidence, proteins are categorized using the non-experimental qualifiers defined in the Uniprot database:

      Potential: There is some logical or conclusive evidence that the given annotation could apply. This non-experimental qualifier is often used to present results from protein sequence analysis software tools, which are only annotated if the result makes sense in the biological context of a given protein.
      Probable: Indicates stronger evidence than the qualifier "Potential". This qualifier implies that there is at least some experimental evidence indicating that the information is expected to be found in the natural environment of a protein.
      By similarity: When some biological information was experimentally obtained for a given protein (or part of it), it may be transferred to other protein family members within a certain taxonomic range, depending on the biological event or characteristic.
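    The evidence levels form a small, closed vocabulary. As a hedged sketch, they can be modeled as an enumeration; the `Evidence` enum, the `split_qualifier` helper and the "Location (Qualifier)" text layout it parses are hypothetical, and reading an unqualified entry as Experimental is an assumption made only for illustration. The enum deliberately defines no ranking, since the text only states that "Probable" is stronger than "Potential".

```python
import re
from enum import Enum
from typing import Tuple


class Evidence(Enum):
    EXPERIMENTAL = "Experimental"
    POTENTIAL = "Potential"
    PROBABLE = "Probable"
    BY_SIMILARITY = "By similarity"

    @property
    def is_experimental(self) -> bool:
        return self is Evidence.EXPERIMENTAL


# Hypothetical layout: a location name followed by a parenthesized qualifier.
_QUALIFIER = re.compile(r"^(.*?)\s*\((Potential|Probable|By similarity)\)\s*$")


def split_qualifier(text: str) -> Tuple[str, Evidence]:
    """Split 'Location (Qualifier)'; an unqualified entry is assumed Experimental."""
    m = _QUALIFIER.match(text)
    if m:
        return m.group(1), Evidence(m.group(2))
    return text.strip(), Evidence.EXPERIMENTAL


print(split_qualifier("Cell membrane (Potential)"))  # ('Cell membrane', <Evidence.POTENTIAL: 'Potential'>)
```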

    Prediction Score

    Proteins whose sub-cellular location is unspecified in the Uniprot database have been assigned to a single location (where possible) or to multiple locations, based on the results of five bioinformatics tools:

      LipoP 1.0: Predicts lipoproteins and signal peptides in Gram-negative bacteria. Although LipoP 1.0 was trained only on sequences from Gram-negative bacteria, O. Rahman et al. report that it also performs well on sequences from Gram-positive bacteria.
      SignalP 4.1: The SignalP 4.1 server predicts the presence and location of signal peptide cleavage sites in amino acid sequences from Gram-positive prokaryotes, Gram-negative prokaryotes and eukaryotes. The method combines a prediction of cleavage sites with a signal peptide/non-signal peptide prediction, based on a combination of several artificial neural networks.
      TMHMM 2.0: Predicts transmembrane helices in proteins.
      Phobius: A combined predictor of transmembrane topology and signal peptides.
      PRED-TAT: Predicts twin-arginine (TAT) and Sec signal peptides using Hidden Markov Models.
    The predictions of the tools were combined, and a prediction score (0 to 5 stars) was assigned according to how many of the five tools predict the assigned sub-cellular location.
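    As a minimal sketch of the star score, assume each tool's output has already been normalized to one hypothetical label per protein: "TM", a signal peptide label such as "SP_I", "SP_II" or "SP_TAT", or "none". The score is then the number of tools whose label agrees with the feature behind the assigned location (e.g. "SP" for a secretory protein). The label scheme and function names are illustrative assumptions, not the tools' native output formats.

```python
from typing import Dict


def feature(call: str) -> str:
    """Collapse a normalized call to "TM", "SP" or "none" (signal peptide types merged)."""
    return "SP" if call.startswith("SP_") else call


def prediction_score(calls: Dict[str, str], assigned_feature: str) -> int:
    """Count the tools whose call supports the assigned feature (0 to len(calls))."""
    return sum(1 for call in calls.values() if feature(call) == assigned_feature)


def stars(score: int, total: int = 5) -> str:
    """Render the score as filled ('*') and empty ('.') stars."""
    return "*" * score + "." * (total - score)


# Hypothetical normalized calls for one protein.
calls = {"LipoP": "SP_II", "SignalP 4.1": "SP_I", "TMHMM 2.0": "TM",
         "Phobius": "SP_I", "PRED-TAT": "SP_TAT"}
score = prediction_score(calls, "SP")
print(score, stars(score))  # 4 ****.
```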

    Annotation rules

    Each sub-cellular location is based either on Uniprot or on the results of the prediction tools (see above). In the first case, proteins are tagged "Uniprot", indicating that the topology is based on the electronic annotations of the Uniprot database. The remaining proteins are tagged "SToPS", indicating that the annotation was based on the following criteria:

      Transmembrane: When the majority of the tools predict a transmembrane (TM) helix and the remaining tools predict either a signal peptide or no feature (e.g. three tools predict a TM helix and two a signal peptide).
      Secretory: When the majority of the tools predict a signal peptide of any type (TAT, type I or type II), e.g. four tools predict a signal peptide and one a TM helix.
      Lipoproteins: When the protein is classified as secretory and, in addition, LipoP predicts a type II signal peptide.
      Cytoplasmic: When LipoP predicts that the protein is cytoplasmic and the other tools predict neither a TM helix nor a signal peptide.
      Secretory or Transmembrane: When an equal number of tools predict a signal peptide and a transmembrane region.
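    Taken together, the tagging and the criteria above amount to a majority vote over the five tools. The sketch below assumes each tool's output has been normalized to a hypothetical label ("TM", "SP_I", "SP_II", "SP_TAT" or "none"), treats LipoP's "none" as its cytoplasmic call, and returns None for combinations the criteria do not cover (e.g. two TM calls, one signal peptide and two no-feature calls). The function names and these choices are assumptions for illustration, not the SToPS implementation.

```python
from typing import Dict, Optional, Tuple


def classify(calls: Dict[str, str]) -> Optional[str]:
    """Apply the majority-vote criteria to one protein's normalized tool calls."""
    n = len(calls)
    tm = sum(1 for c in calls.values() if c == "TM")
    sp = sum(1 for c in calls.values() if c.startswith("SP_"))
    if tm > n / 2:
        # Majority TM; the remaining tools can only report a signal peptide or none.
        return "Transmembrane"
    if sp > n / 2:
        # Secretory, refined to lipoprotein when LipoP reports a type II signal peptide.
        return "Lipoproteins" if calls.get("LipoP") == "SP_II" else "Secretory"
    if tm == sp and tm > 0:
        return "Secretory or Transmembrane"
    if tm == 0 and sp == 0:
        # LipoP reports cytoplasmic ("none") and no tool reports TM or a signal peptide.
        return "Cytoplasmic"
    return None  # combination not covered by the criteria


def annotate(uniprot_location: Optional[str], calls: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Prefer the Uniprot location when present; otherwise apply the criteria."""
    if uniprot_location:
        return uniprot_location, "Uniprot"
    return classify(calls), "SToPS"


example = {"LipoP": "TM", "SignalP 4.1": "SP_I", "TMHMM 2.0": "TM",
           "Phobius": "TM", "PRED-TAT": "SP_TAT"}
print(annotate(None, example))  # ('Transmembrane', 'SToPS')
```

    Checking the Lipoproteins case inside the secretory branch mirrors the text, where a lipoprotein is by definition a secretory protein with an additional LipoP type II call.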

