This section contains documentation and references related to PSORTb v.2.0. Documentation pertaining to the old version of PSORT-B (v.1.1) is still accessible here. A plain text version of the documentation is available here.

1. History

Computational prediction of the subcellular localization of proteins 
                  is a valuable tool for genome analysis and annotation, since 
                  a protein's subcellular localization can provide clues regarding 
                  its function in an organism. For bacterial pathogens, the prediction 
                  of proteins on the cell surface is of particular interest due 
                  to the potential of such proteins to be primary drug or vaccine 
                  targets. A protein's subcellular localization is influenced 
                  by several features present within the protein's primary structure, 
                  such as the presence of a signal peptide or membrane-spanning 
                  alpha-helices. Several algorithms have been developed 
                  to analyze single features such as these; however, the PSORT 
                  family of programs analyzes several features at once, using 
                  information obtained from each analysis to generate an overall 
                  prediction of localization site. Developed by Kenta Nakai in 
                  1991, PSORT is an algorithm which assigns a probable localization 
                  site to a protein given an amino acid sequence alone. Originally 
                  developed for prediction of protein localization in Gram-negative 
                  bacteria, PSORT was expanded into a suite of programs (PSORT, 
                  PSORT II, iPSORT) capable of handling proteins from all classes 
                  of organisms.

The Brinkman Laboratory headed development of PSORT-B, an updated 
                  version of the PSORT algorithm designed for Gram-negative bacterial 
                  proteins. PSORT-B includes new analytical modules designed to 
                  capitalize on new discoveries and observations in protein sorting, 
                  and benefits from a training dataset of over 1400 proteins of 
                  known localization. Its focus is on precision over recall to 
                  facilitate accurate predictions, at the expense of making 
                  fewer predictions than other methods may make. PSORT-B v.1.1 
                  was released in July 2003 and has now been succeeded by PSORTb 
                  v.2.0.

2. PSORTb v.2.0 vs previous versions of PSORT and PSORT-B

The original version of PSORT, still frequently 
                  used for prediction of prokaryotic localization sites, used 
                  a number of analyses arranged in an if/then rule-based format 
                  to determine which of four localization sites a protein might 
                  be resident at - cytoplasm, periplasm, inner or outer membrane 
                  (see the documentation available at the PSORT WWW server for a full 
                  explanation). The PSORTb algorithm, however:

    uses updated versions of several of these analyses, as well as several novel analytical methods
    utilizes a probabilistic system for determination of a final prediction, rather than a rule-based system
    is capable of predicting all localization sites (PSORT I does not predict extracellular proteins)
    does not force a prediction, returning a prediction of "Unknown" if no prediction is made
    displays a 28% increase in precision (% of correct predictions) relative to PSORT I

Furthermore, PSORTb v.2.0 offers several improvements over v.1.1:

    prediction of Gram-positive proteins added
    increased coverage (more predictions are made)
    automated flagging of proteins with potential multiple localization sites

Note also the change in name between PSORT-B v.1.1 and PSORTb v.2.0 - the hyphen was eliminated in order to avoid conflicts during pattern matching or other searches.

3. PSORTb v.2.0: Analytical Modules

PSORTb v.2.0 consists of multiple analytical 
                  modules, each of which analyzes one biological feature known 
                  to influence or be characteristic of subcellular localization. 
                  Each module may act as a binary predictor, classifying a protein 
                  as either belonging or not belonging to a particular localization 
                  site, or it may be multi-category, able to assign a protein 
                  to one of several localization sites. When analyzing a Gram-negative 
                  organism, possible localization sites are: cytoplasm, cytoplasmic 
                  membrane, periplasm, outer membrane and extracellular space. 
                  Gram-positive localization sites include: cytoplasm, cytoplasmic 
                  membrane, cell wall and extracellular space. All modules are 
                  capable of returning a negative prediction as well, such that 
                  a protein will not be forced into one of the localization sites. 
                  
                  3.1 SCL-BLAST & SCL-BLASTe: SCL-BLAST, 
                    or SubCellular Localization BLAST, is a BLASTP search against 
                    the current local database of 
                    proteins of known subcellular localization. An E-value 
                    cutoff of 10e-10 is used to ensure that returned HSPs represent 
                    true homologs, and an additional length restriction is placed 
                    on any subject matches - the length of the query:subject HSP 
                    must be within 80-120% of the length of the subject protein, 
                    thus reducing potential errors associated with the domain 
                    nature of proteins. SCL-BLAST selects the top-scoring HSP 
                    from the list of results, and returns that protein's localization 
                    site as its prediction, along with the name of the top-scoring 
                    HSP and the associated E-value. SCL-BLAST is capable of assigning 
                    a protein to any one of the possible localization sites. SCL-BLASTe 
                    is a specialized implementation of this analysis, in which 
                    a user's query protein is checked to see if it is an exact 
                    match to a protein in the SCL-BLAST database. If an exact 
                    match is found (100% similarity and within 1aa length), the 
                    protein is immediately predicted as residing at that localization 
                    site, and is not passed to subsequent modules.

3.2 Support Vector Machines (SVMs) 
                    are machine learning-based classifiers trained to 
                    classify a protein as belonging or not belonging to the set 
                    of proteins at a specific localization site. PSORTb v.2.0 
                    contains 9 SVMs, one for each of the localization sites (5 
                    Gram-negative and 4 Gram-positive). Trained using frequent 
                    sequences mined from proteins resident at a specific localization 
                    site, each SVM will examine a query protein and determine 
                    whether it does or does not belong at the localization site 
                    in question. If the SVM determines that the protein belongs 
                    to that particular site, that result is returned. Otherwise, 
                    an unknown prediction is returned.

3.3 Motif & Profile Analysis 
                    relies on the observation that a protein's function is closely 
                    linked to its localization, and that several PROSITE motifs 
                    characteristic of specific functions can be used to infer 
                    specific localizations. Several potentially important motifs 
                    were used to scan our current dataset, and motifs with a false-positive 
                    rate of 0% were built into PSORTb, as were expanded versions 
                    of the motifs termed "profiles". Note that the 0% false-positive 
                    rate is based on the dataset of proteins of known localization 
                    that we currently have access to. We wish to emphasize that 
                    this does not necessarily mean that such motifs and profiles 
                    will always be 100% accurate. If you identify an incorrect 
                    prediction, please contact us. 
                    A submitted protein is scanned for the occurrence of any of 
                    these motifs or profiles, and, if found, the localization 
                    site associated with the motif/profile is returned as the 
                    program's prediction. Motifs associated with each of the possible localization 
                    sites are included in PSORTb.

3.4 Outer Membrane Motif Analysis 
                    uses motifs generated from data mining techniques 
                    applied to a set of 425 beta-barrel proteins to classify a 
                    query protein as outer membrane or non-outer membrane (She 
                    et al., 2003). The A Priori algorithm was used to mine for 
                    short motifs found more often in outer membrane proteins than 
                    in proteins at the other four localization sites. Over 250 
                    such motifs were identified, and a query protein is scanned 
                    for the co-occurrence of two or more of these motifs. A prediction 
                    of outer membrane is returned if successful.

3.5 HMMTOP (Tusnady, 1998), a hidden Markov model-based method, is used to identify transmembrane alpha helices (TMHs), 
                    which can then be used to identify proteins spanning the cytoplasmic 
                    membrane. Our analyses have shown that when three or more 
                    TMHs are predicted in a protein, there is a 94% chance of 
                    that protein being a cytoplasmic (inner) membrane protein. PSORTb 
                    therefore returns a prediction of cytoplasmic 
                    membrane if 3 or more helices are found.

3.6 A Signal Peptide 
                    directs a protein for export past the cytoplasmic membrane, 
                    and thus can be further used to differentiate cytoplasmic 
                    and non-cytoplasmic proteins. A hidden Markov model was trained 
                    on the dataset used to train the SignalP program, and is used 
                    to predict potential signal peptide cleavage sites. If a cleavage 
                    site with a high probability value is not found, the first 
                    70 amino acids of the protein are passed to a support vector 
                    machine module trained on the same data. If the SVM is unable 
                    to recognize a signal peptide, the protein is predicted not 
                    to have one and is classified as cytoplasmic. However, a protein 
                    may possess a non-traditional signal peptide, so the results 
                    of this analysis carry less weight than do other modules when 
                    generating a final prediction.

4. PSORTb v.2.0: Final Prediction

In order to generate a final prediction, the results of each module 
                  are combined and assessed. A probabilistic method and 5-fold 
                  cross validation were used to assess the likelihood of a protein 
                  being at a specific localization given the prediction of a certain 
                  module. These likelihoods are used to generate a probability 
                  value for each of the five localization sites for a user's query 
                  protein.

PSORTb v.2.0 returns a list of the five localization sites and the 
                  associated probability value for each. We consider 7.5 
                  to be a good cutoff above which a single localization 
                  can be assigned, and our precision and recall values for the 
                  program are calculated using this cutoff.

In certain cases, two localization sites may both exhibit high scores, 
                  which may indicate a protein with domains present in neighbouring 
                  localization sites. In cases where a localization site has a 
                  score of at least 4.5 (for Gram-negative) or 5.0 (for Gram-positive) 
                  but no more than 7.49, the result returned to the user will say "Unknown 
                  - This protein may have multiple localization sites". In 
                  cases like these, we recommend you examine the long format output 
                  of the program's prediction to draw your own conclusion.

This section of the documentation will be updated 
                  as changes are made to the web interface. Please check back 
                  often for up-to-date instructions on program use.

5.1 Accessing PSORTb

5.1.1 WWW Access: PSORTb is available online at http://www.psort.org. The sequence submission form 
                    for the current version of the program is located at http://www.psort.org/psortb2. The first release 
                    of the program is still accessible, at http://www.psort.org/psortb/v1index.html.

5.1.2 Standalone PSORTb: PSORTb is also available 
                    as a standalone program to run in a Linux environment. The 
                    file, as well as instructions for installation, is available 
                    at the PSORTb Downloads page.

5.2 Submitting a Sequence for Analysis on the WWW

                  5.2.1 Sequence Submission: The sequence submission 
                    form can be found at http://www.psort.org/psortb/. One or more sequences 
                    can be pasted into the text box, or the "upload from 
                    file" option can be used to analyze a file of one or 
                    more sequences stored on your computer. When using the text 
                    box option, please note that a maximum of 600,000 characters 
                    can be pasted into the box.

5.2.2 Selecting Gram Stain: PSORTb 
                    v.2.0 performs different analyses depending on the class of 
                    organism. You are required to choose the appropriate Gram-stain 
                    for your sequences. Not sure which option to select? Our Genomes page lists the classifications we used 
                    when we analyzed sequenced genomes. If your organism is not 
                    found there, try the NCBI Taxonomy Browser, which provides a rough taxonomy for 
                    many bacterial species and may be helpful (for example, 
                    there is an association between proteobacteria and Gram-negative 
                    stain properties), or see the authoritative Bergey's Manual for 
                    Gram-stain properties for your microbe of interest.

5.2.3 Acceptable Organisms: PSORTb v.2.0 accepts 
                    protein sequences from Gram-negative and Gram-positive bacteria. 
                    All protein sequences from Archaea and eukaryotic organisms 
                    must be analyzed using a different tool. See the Resources page for possible options.

5.2.4 Acceptable Formats: PSORTb 
                    requires that a PROTEIN sequence be submitted in FASTA format. 
                     A sequence within a FASTA sequence file consists of three parts:

    A title line, which must begin with a `>' symbol, and may be followed by any type of text
    A newline character at the end of the title line
    The sequence itself, which continues until the end of file or the next `>' is reached

An example of FASTA format is shown below:

>gi|31562958|sp|Q8CWD2|BTUF_ECOL6
MAKSLFRALVALSFLAPLWLNAAPRVITLSPANTELAFAAGITPVGVSSYSDYPLQAQKIEQVSTWQGMN
LERIVALKPDLVIAWRGGNAERQVDQLASLGIKVMWVDATSIEQIANALRQLAPWSPQPDKAEQAAQSLL
DQYAQLKAQYADKPKKRVFLQFGINPPFTSGKESIQNQVLEVCGGENIFKDSRVPWPQVSREQVLARSPQ
AIVITGGPDQIPKIKQYWGEQLKIPVIPLTSDWFERASPRIILAAQQLCNALSQVD
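
The three-part structure described above (title line, newline, sequence) is simple enough to check before submitting a file. The following is a minimal Python sketch, not part of PSORTb: the function name parse_fasta and the two short records in the example are hypothetical and used only for illustration.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (title, sequence) pairs.

    A record starts at a line beginning with '>' and its sequence runs
    until the next '>' line or the end of the text.
    """
    records = []
    title = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            chunks.append(line)  # sequence lines are joined without separators
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


# Hypothetical two-record input, for illustration only.
example = ">seq1 first example\nMKTAYIAK\nQRQISFVK\n>seq2\nMAKSLF\n"
for record_title, sequence in parse_fasta(example):
    print(record_title, len(sequence))
```

Counting the returned records is a quick way to confirm that every title line was recognized before the file is uploaded.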
For more information, see the description at NCBI or contact us.

5.2.5 Whole Genome Analysis: In order to reduce the 
                    load on the PSORTb servers, precalculated results for whole 
                    bacterial genomes are available on the PSORTb site, on the 
                    Genomes page.

5.3 Submitting a Sequence for Analysis to Standalone PSORTb

5.3.1 Sequence File: One 
                    or more sequences in FASTA format can be submitted to standalone 
                    PSORTb, provided they are all contained within one file (e.g. 
                    mysequences.txt) and are all from the same Gram class of organism. 
                    If you have both Gram-negative and Gram-positive sequences 
                    you wish to analyze, they must be divided into two files and 
                    run separately.

5.3.2 Command line syntax: 
                    Standalone PSORTb contains several options and arguments, 
                    which are described below. The most basic command, however, 
                    which will be sufficient for most instances, is:

$ psort [-p|-n] mysequences.txt > mysequences.out

    psort calls the PSORTb program
    -p (Gram-positive) or -n (Gram-negative) tells the program which predictive model to use
    mysequences.txt is the name of your FASTA file containing the sequences to be analyzed
    > mysequences.out sends the output to a new file that will be created called mysequences.out. If no > is used, the output will be written to the terminal display.

Usage:
    psort [-p|-n] [OPTIONS] [SEQFILE]

Runs psort on the sequence file SEQFILE. If SEQFILE isn't provided then sequences will be read from STDIN.

    --help, -h       Displays usage information
    --positive, -p   Gram positive bacteria
    --negative, -n   Gram negative bacteria
    --cutoff, -c     Sets a cutoff value for reported results
    --divergent, -d  Sets a cutoff value for the multiple localization flag
    --hmmtop, -h     Specifies the path to the HMMTOP installation. If not set, defaults to the value of the PSORT_HMMTOP environment variable.
    --matrix, -m     Specifies the path to the pftools installation. If not set, defaults to the value of the PSORT_PFTOOLS environment variable.
    --format, -f     Specifies sequence format (default is FASTA)
    --output, -o     Specifies the format for the output (default is 'normal'). Value can be one of: terse, long or normal
    --root, -r       Specify PSORT_ROOT for running local copies. If not set, defaults to the value of the PSORT_ROOT environment variable.
    --server, -s     Specifies the PSort server to use
    --verbose, -v    Be verbose while running
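
When many files need to be processed, the basic command above can be assembled from a script. The following Python sketch is only an illustration and is not distributed with PSORTb. It uses just the -p, -n and -o options listed in the usage text, and it assumes that the psort executable is on the PATH and that the -o value is passed as a separate argument. The function names build_psort_command and run_psort are hypothetical.

```python
import subprocess


def build_psort_command(seqfile, gram, output_format=None, executable="psort"):
    """Assemble the argument list for a standalone PSORTb run.

    gram is 'positive' or 'negative'; output_format, if given, is one of
    the values listed for --output (terse, long or normal).
    """
    flags = {"positive": "-p", "negative": "-n"}
    if gram not in flags:
        raise ValueError("gram must be 'positive' or 'negative'")
    command = [executable, flags[gram]]
    if output_format is not None:
        if output_format not in ("terse", "long", "normal"):
            raise ValueError("output_format must be terse, long or normal")
        # Assumption: the format name follows -o as its own argument.
        command += ["-o", output_format]
    command.append(seqfile)
    return command


def run_psort(seqfile, outfile, gram, output_format=None):
    """Run psort and write its standard output to outfile (like `> outfile`)."""
    command = build_psort_command(seqfile, gram, output_format)
    with open(outfile, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)


# Shows the command that would be run, without executing it.
print(" ".join(build_psort_command("mysequences.txt", "negative")))
```

Keeping the command construction separate from the call to subprocess makes it easy to inspect the exact command line before any analysis is started.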
5.3.3 Help: Typing psort -h at the command prompt will bring up a list of available 
                    options and usage instructions.

5.4 Understanding the Output

5.4.1 Output Formats: PSORTb 
                    allows the user to select one of three output formats from 
                    the sequence submission screen: Normal, Tab-delimited (terse 
                    format) and Tab-delimited (long format). Normal output is 
                    recommended for analysis of one or a few sequences, whereas 
                    tab-delimited output in either format is recommended for the 
                    analysis of a large number of sequences. The output formats 
                    are described below. If you would like to try the examples 
                    given below for yourself, input sequences are below:

Gram-positive input sequence:

>SAK_BPP42
MLKRSLLFLTVLLLLFSFSSITNEVSASSSFDKGKYKKGDDASYFEPTGPYLMVNVTGVDGKRNELLSPR
YVEFPIKPGTTLTKEKIEYYVEWALDATAYKEFRVVELDPSAKIEVTYYDKNKKKEETKSFPITEKGFVV
PDLSEHIKNPGFNLITKVVIEKK

Gram-negative input sequence:

>NP_949347.1
MQGHHFGGDMSNSEAIDNTTAKLRLAQSSSLLALALLIGSAPAQAADTDWGWLAIGAPAATAQGWTGKGV
VIGVVDTGIDFSHPALSGRAFDYNYGSFVAGSNHPHATHVAGIIGATDINRGMEGVAPDVRFSSMKIFTG
AGGSYLGDAAVADAYDGAIGSGVRIFNNSWGSSDSIANFTSREELLAHEPLLVGAFTRAVNADAVLVWST
GNDGRSQPSWQAAAPYYIQELKANWIAVTSVGENGTIASYANACGVAKAWCLAAPGGDFNPGIYSTIPGK
DYGYMSGTSMAAPYVTGATAIARQMFPKASGAQLAQIVLQTSRDIGAPGIDDVYGWGLLAVDNIVDTINP
RGAALFASAAWGRFTTLSAIGNTVLDRISDLRNGRGDVVTAPLAFAGQNGAFSQSGSNPRNAYAADLAAA
PQPSPLGFGSVWARGLAGRATLSGSASSPQTTADISGGLLGFDLVNNQNLLVGIAGGGSNTNLTASGISD
KAGAQAWHVLGYAAAMYGPAFVNVAGGWNSFDQSYQRRVIPGTAGTVFASTISAAQSSSTDVAYFFQGRG
GWTFQTEVGRIEPYVHGATRNQSFGGFSETNASIFSLSVPSASLSEAEYGAGVRWACAPIKTVDQRVAVA
PTIDLAYVRFTNDGPIQVETNLLGTSVVGQTAALGADAIRVAAGLSLTSLAGISGSFGYTGTVRDAATAH
TVSGGLSIKF
 
5.4.2 Normal Output: The Normal output option displays 
                    the results of each of PSORTb's analytical modules, the localization 
                    scores for each of the localization sites, as well as a final prediction 
                    and associated score (if one site scores above the 7.5 cutoff). 
                    Below are examples of both Gram-positive and Gram-negative 
                    output, using the input sequences given in 5.4.1. Descriptions 
                    of the output fields can be found beneath each output example.
Gram-positive sample output:

| SeqID: SAK_BPP42 |
| Analysis Report: |
| CMSVM+ | Unknown | [No details] |
| CWSVM+ | Unknown | [No details] |
| CytoSVM+ | Unknown | [No details] |
| ECSVM+ | Extracellular | [No details] |
| HMMTOP | Unknown | [1 internal helix found] |
| Motif+ | Unknown | [No motifs found] |
| Profile+ | Unknown | [No matches to profiles found] |
| SCL-BLAST+ | Extracellular | [matched 134189: Extracellular protein] |
| SCL-BLASTe+ | Unknown | [No matches against database] |
| Signal+ | Non-cytoplasmic | [Signal peptide detected] |
| Localization Scores: |
| Cytoplasmic | 0.0 |
| CytoplasmicMembrane | 0.0 |
| Cellwall | 0.2 |
| Extracellular | 9.98 |
| Final Prediction: |
| Extracellular | 9.98 |

SeqID returns whatever was found on the title line of the FASTA format 
                    input file.

The Analysis Report contains the results of each of PSORTb's analytical 
                    modules. The module name is listed in the left-most column, 
                    the centre column contains the localization site predicted 
                    by that module (or "Unknown" if the module did not 
                    generate a prediction), and the right-most column contains 
                    comments related to the modules' findings. The modules in 
                    the Gram-positive version are as follows:

    CMSVM+: The Gram-positive version of the support vector machine trained to identify cytoplasmic membrane proteins. Returns cytoplasmic membrane or unknown.
    CWSVM+: The support vector machine trained to identify cell wall proteins (Gram-positive only). Returns cell wall or unknown.
    CytoSVM+: The Gram-positive version of the support vector machine trained to identify cytoplasmic proteins. Returns cytoplasmic or unknown.
    ECSVM+: The Gram-positive version of the support vector machine trained to identify extracellular proteins. Returns extracellular or unknown.
    HMMTOP: Predicts transmembrane helices within the sequence. The presence of 3 or more transmembrane helices causes the module to return a prediction of cytoplasmic membrane, otherwise unknown is returned. The Details column returns the number of predicted helices.
    Motif+: Searches the sequence for Gram-positive motifs indicative of a specific localization site. If a match occurs, the localization site associated with that motif is reported, otherwise unknown is returned. The Details column returns a link to the motif in PROSITE.
    Profile+: Searches the sequence for Gram-positive profiles indicative of a specific localization site. If a match occurs, the localization site associated with that profile is reported, otherwise unknown is returned. The Details column returns a link to the profile in PROSITE.
    SCL-BLAST+: Performs a BLASTP search against the Gram-positive subset of the current PSORTdb dataset. If a match is found, its associated localization site is returned and a link to that protein's record at NCBI is provided in the Details column.
    SCL-BLASTe+: Like SCL-BLAST, but only returns a match if the query and subject have 100% similarity and are within 1aa in length of each other. If a match is found, its associated localization site is returned and a link to that protein's record at NCBI is provided in the Details column.
    Signal+: Searches the sequence for the presence of a Gram-positive cleavable N-terminal signal peptide. If a signal peptide is detected, the module returns a prediction of non-cytoplasmic, otherwise a result of unknown is returned.

In the Localization Scores area, the confidence value for each of the localization sites is given. If one of the sites has a score of 7.5 or greater, this site and its score are returned in the Final Prediction section. If two sites have high scores, a flag of "This protein may have multiple localization sites" is also returned in the Final Prediction field.

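
The way the Final Prediction field follows from the Localization Scores can be sketched in a few lines of Python. This is one reading of the cutoffs given in section 4 (7.5 for a single site; 4.5 for Gram-negative or 5.0 for Gram-positive as the lower bound for the multiple localization flag), not PSORTb's actual code. The function name final_prediction and the constant names are hypothetical, and the scores are copied from the two sample outputs in this section.

```python
SINGLE_SITE_CUTOFF = 7.5
# Lower bound of the "multiple localization" band, per Gram class (section 4).
MULTIPLE_SITE_CUTOFF = {"negative": 4.5, "positive": 5.0}
MULTIPLE_FLAG = "Unknown (This protein may have multiple localization sites)"


def final_prediction(scores, gram):
    """Return (label, score) for a dict mapping site name to score.

    The score is None whenever no single site can be assigned.
    """
    site, best = max(scores.items(), key=lambda item: item[1])
    if best >= SINGLE_SITE_CUTOFF:
        return site, best
    if best >= MULTIPLE_SITE_CUTOFF[gram]:
        return MULTIPLE_FLAG, None
    return "Unknown", None


# Scores from the Gram-positive sample output (SAK_BPP42).
gram_positive_scores = {
    "Cytoplasmic": 0.0,
    "CytoplasmicMembrane": 0.0,
    "Cellwall": 0.2,
    "Extracellular": 9.98,
}
# Scores from the Gram-negative sample output (NP_949347.1).
gram_negative_scores = {
    "Cytoplasmic": 0.00,
    "CytoplasmicMembrane": 0.00,
    "Periplasm": 0.00,
    "OuterMembrane": 5.87,
    "Extracellular": 4.13,
}

print(final_prediction(gram_positive_scores, "positive"))
print(final_prediction(gram_negative_scores, "negative"))
```

Under these assumptions the Gram-positive example yields Extracellular with a score of 9.98, and the Gram-negative example yields the multiple localization flag, matching the two sample outputs.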
                    Gram-negative sample output (to illustrate multiple localization):  
                    
                       
                        | SeqID: 
                          NP_949347.1 |   
                        | Analysis 
                          Report: |  |   
                        | CMSVM- | Unknown | [No 
                          details] |   
                        | CytoSVM- | Unknown | [No 
                          details] |   
                        | ECSVM- | Extracellular | [No 
                          details] |   
                        | HMMTOP | Unknown | [No 
                          internal helices found] |   
                        | Motif- | Unknown | [No 
                          motifs found] |   
                        | OMPMotif- | Unknown | [No 
                          motifs found] |   
                        | OMSVM- | OuterMembrane | [No 
                          details] |   
                        | PPSVM- | Unknown | [No 
                          details] |   
                        | Profile- | Unknown | [No 
                          matches to profiles found] |   
                        | SCL-BLAST- | OuterMembrane, 
                          Extracellular | [matched 
                          3646417: 
                          Outer membrane (Autotransporter)] |   
                        | SCL-BLASTe- | Unknown | [No 
                          matches against database] |   
                        | Signal- | Non-cytoplasmic | [Signal 
                          peptide detected] |   
                        | Localization 
                          Scores: |   
                        | Cytoplasmic | 0.00 |   
                        | CytoplasmicMembrane | 0.00 |   
                        | Periplasm | 0.00 |   
                        | OuterMembrane | 5.87 |   
                        | Extracellular | 4.13 |   
                        | Final 
                          Prediction: |   
                        | Unknown 
                          (This protein may have multiple localization sites) |
                    The modules which differ from those described for the Gram-positive
                    version of PSORTb are listed below:
                      CMSVM-: The Gram-negative version of the support vector
                      machine trained to identify cytoplasmic membrane proteins.
                      Returns cytoplasmic membrane or unknown.
                      CytoSVM-: The Gram-negative version of the support vector
                      machine trained to identify cytoplasmic proteins. Returns
                      cytoplasmic or unknown.
                      ECSVM-: The Gram-negative version of the support vector
                      machine trained to identify extracellular proteins. Returns
                      extracellular or unknown.
                      HMMTOP: See above.
                      Motif-: Searches the sequence for Gram-negative
                      motifs indicative of a specific localization site. If
                      a match occurs, the localization site associated with that
                      motif is reported; otherwise unknown is returned. The details
                      column returns a link to the motif in PROSITE.
                      OMPMotif-: Searches the sequence for Gram-negative
                      outer membrane protein motifs. If a match occurs, outer
                      membrane is reported; otherwise unknown is returned. The
                      details column returns the numerical identifiers of the
                      motifs found.
                      OMSVM-: The support vector machine trained to identify
                      outer membrane proteins. Returns outer membrane or unknown
                      (Gram-negative only).
                      PPSVM-: The support vector machine trained to identify
                      periplasmic proteins. Returns periplasm or unknown (Gram-negative
                      only).
                      Profile-: Searches the sequence for Gram-negative
                      profiles indicative of a specific localization site.
                      If a match occurs, the localization site associated with
                      that profile is reported; otherwise unknown is returned.
                      The details column returns a link to the profile in PROSITE.
                      SCL-BLAST-: Performs a BLASTP search against the Gram-negative
                      subset of the current PSORTdb dataset.
                      If a match is found, its associated localization site is
                      returned and a link to that protein's record at NCBI is
                      provided in the Details column.
                      SCL-BLASTe-: See above.
                      Signal-: Searches the sequence for the presence of a Gram-negative
                      cleavable N-terminal signal peptide. If a signal peptide
                      is detected, the module returns a prediction of non-cytoplasmic;
                      otherwise a result of unknown is returned.
                    In the Localization Scores area, the confidence value for each of
                  the localization sites is given. If one of the sites has a
                  score of 7.5 or greater, this site and its score are returned
                  in the Final Prediction section. If two sites have high
                  scores, a flag of "This protein may have multiple localization
                  sites" is also returned in the Final Prediction field.
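The Final Prediction rule described for the Localization Scores area can be sketched in a few lines of Python. This is a minimal illustration, not PSORTb's own code: the function name, the site labels and the example scores are assumptions made for the sketch, and only the 7.5 cutoff comes from the text. The criterion PSORTb uses to raise the "multiple localization sites" flag is not specified here, so the sketch leaves it out.

```python
from typing import Dict, Optional, Tuple

# Cutoff stated in the documentation: a site is reported only if its
# score is 7.5 or greater.
DEFAULT_CUTOFF = 7.5


def final_prediction(
    scores: Dict[str, float], cutoff: float = DEFAULT_CUTOFF
) -> Tuple[str, Optional[float]]:
    """Return (site, score) for the top-scoring site if it reaches the
    cutoff; otherwise return ("Unknown", None).

    Hypothetical helper for illustration only; it does not reproduce the
    multiple-localization flag.
    """
    site, score = max(scores.items(), key=lambda item: item[1])
    if score >= cutoff:
        return site, score
    return "Unknown", None


# Illustrative scores only (site labels and values are assumptions).
example_scores = {
    "Cytoplasmic": 0.00,
    "CytoplasmicMembrane": 0.00,
    "Periplasmic": 0.62,
    "OuterMembrane": 5.25,
    "Extracellular": 4.13,
}

print(final_prediction(example_scores))
```

With these illustrative scores no site reaches 7.5, so the sketch reports Unknown.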
                  5.4.3 Tab-delimited (Terse Format) Output: Tab-delimited
                    terse format output returns a list of submitted sequences,
                    each one on a new line, with 3 columns: SeqId contains the
                    information from the FASTA file definition line, Localization
                    contains the final prediction of localization site (or "Unknown"
                    if no site scored 7.5 or greater), and Score contains the confidence
                    value associated with this localization site. Tab characters
                    separate the columns and, in the case of a multiple
                    sequence submission, the sequence records are separated by
                    newline characters. This format can easily be read into a
                    spreadsheet program such as MS Excel.
                  5.4.4 Tab-delimited (Long Format) Output: Tab-delimited
                    long format output returns a list of submitted sequences, each
                    one on a new line, with all of the information from the
                    PSORTb results placed into columns. The SeqId, module results
                    and comments from the analysis report, localizations and scores,
                    and the final prediction and score are each placed into their
                    own column.

                  PSORTb is designed to emphasize precision (or
                  specificity) over recall (or sensitivity), and as a result, 
                  some classes of proteins are not predicted well. The following 
                  issues must be considered when performing an analysis using 
                  the current version of PSORTb:  
                  6.1 Proteins resident at multiple localization sites:
                    Many proteins can exist at multiple localization sites. Examples
                    of such proteins include integral membrane proteins with large
                    periplasmic domains, and autotransporters, which contain an
                    outer membrane pore domain and a cleaved extracellular domain.
                    The current version of PSORTb handles this situation by flagging
                    proteins which show a distribution of localization scores
                    favouring two sites, rather than one. It is important to examine
                    the distribution of localization scores carefully in order
                    to determine whether your submitted protein may have multiple localization
                    sites and, if so, which two sites are involved.
                  6.2 Lipoproteins: The current version of PSORTb does
                    not detect lipoprotein motifs.
                  6.3 Precision vs. Recall: PSORTb has been designed
                    to yield as high a precision level as possible, at the expense 
                    of recall. Programs which make predictions at all costs often 
                    provide incorrect or incomplete results, which can be propagated 
                    through annotated databases, datasets and reports in the literature. 
                    We believe that a confident prediction is more valuable than 
                    any prediction, and we have designed the program to this end. 
                    Note, however, that a user may choose to use their own reduced 
                    cutoff score in generating final predictions. |
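The advice in 6.1 and 6.3 (examine the score distribution, and optionally apply a reduced cutoff of your own) can be sketched as follows. This is a hedged illustration rather than part of PSORTb: the function names, site labels, example scores and the reduced cutoff of 5.0 are all assumptions chosen for the example, and a lower cutoff trades precision for recall as discussed above.

```python
from typing import Dict, List, Tuple


def rank_sites(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (site, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def call_site(scores: Dict[str, float], cutoff: float) -> str:
    """Return the top-scoring site if it reaches the cutoff, else "Unknown".

    Hypothetical helper: the cutoff is chosen by the user.
    """
    best_site, best_score = rank_sites(scores)[0]
    return best_site if best_score >= cutoff else "Unknown"


# Illustrative scores only (site labels and values are assumptions).
scores = {
    "Cytoplasmic": 0.00,
    "CytoplasmicMembrane": 0.00,
    "Periplasmic": 0.62,
    "OuterMembrane": 5.25,
    "Extracellular": 4.13,
}

# The two highest-scoring sites are the candidates to inspect when a
# protein may reside at more than one site.
top_two = rank_sites(scores)[:2]
print("Top two sites:", top_two)

# Default cutoff from the documentation versus a user-chosen reduced cutoff.
print("Cutoff 7.5:", call_site(scores, 7.5))
print("Cutoff 5.0:", call_site(scores, 5.0))
```

Any prediction obtained this way should be reported together with the cutoff used, since it falls below the confidence level PSORTb applies by default.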