PSORT.ORG - Protein Subcellular Localization Prediction Resources

PSORT.org provides links to the PSORT family of programs for subcellular localization prediction as well as other datasets and resources relevant to localization prediction. The page is currently hosted by the Brinkman Laboratory at Simon Fraser University, and our goal is to provide an open-source resource centre for researchers interested in subcellular localization prediction.

Please choose from the following PSORT programs for localization prediction:

Locally hosted resources:

  • PSORTdb - contains updated, pre-computed PSORTb v.3 and PSORTb v.2.1 subcellular localization (SCL) prediction results for the completely sequenced archaeal and bacterial genomes from NCBI.
    A two-component searchable and browsable database: ePSORTdb contains bacterial proteins of experimentally verified localization used in training and testing PSORTb, and cPSORTdb contains localization predictions for bacterial genomes. The old version of PSORTdb is also available.
  • Standalone PSORTb for Linux: a downloadable version of PSORTb that can be run locally.
  • Datasets of Proteins of Known Localization: datasets of proteins used to train and evaluate PSORTb. Note: the datasets used in PSORTb development can now be accessed through ePSORTdb.
  • Precomputed Genomes with PSORTb v.2.0: an archived version of precomputed PSORTb v.2.0 results for available bacterial genomes.
  • Motifs and Profiles Associated with Specific Localizations: motifs and profiles characteristic of specific localization sites, used in PSORTb's Motif, Profile, and OMPMotif modules.

PSORTb and PSORTdb are maintained by the Brinkman Laboratory, Simon Fraser University, British Columbia, Canada. PSORT and PSORT II are maintained by Kenta Nakai, at the Human Genome Center, Institute for Medical Science, University of Tokyo, Japan. iPSORT is maintained by Hideo Bannai at the Human Genome Center.


Other predictive methods, datasets and resources:

The following is a collection of links relevant to subcellular localization prediction. If you would like to see a link to a particular program or resource added to this page, please contact us.

At the bottom of the page, we have also provided a suggested reading list containing selected review articles describing SCL and SCL prediction.

Other prokaryotic subcellular localization predictors (with web servers):

  • PRED-TAT (Bagos et al, 2010) predicts TAT and Sec signal peptides.
  • CW-PRED (Litou et al, 2008) predicts cell wall-attached proteins in Gram-positive bacteria using a hidden Markov model (HMM).
  • PRED-SIGNAL (Bagos et al, 2009) predicts signal peptides in archaea.
  • PRED-LIPO (Bagos et al, 2008) predicts lipoprotein signal peptides of Gram-positive bacteria using an HMM.
  • iLoc-Gneg (Xiao et al, 2011) uses Gene Ontology and sequence information to predict 8 localization sites in Gram-negative bacteria.
  • NClassG+ (Restrepo-Montoya et al, 2011) is a sequence-based classifier for identifying non-classically secreted Gram-positive bacterial proteins.
  • Gpos-mPLoc (Shen and Chou, 2009) and Gneg-mPLoc (Shen and Chou, 2010) predict bacterial subcellular localization using gene ontology, functional domain, and sequential evolutionary information.
  • SOSUI-GramN (Imai et al, 2008) predicts Gram-negative localizations based on N- and C-terminal signal sequences.
  • Augur (Billion et al, 2006) is a computational pipeline for whole-genome prediction of surface proteins in Gram-positive bacteria.
  • SubcellPredict (Niu et al, 2008) uses the AdaBoost algorithm to predict cytoplasmic, periplasmic and extracellular localization sites for prokaryotic organisms.
  • P-classifier (Wang et al, 2005) predicts subcellular localizations of proteins for Gram-negative bacteria based on amino acid subalphabets and a combination of multiple support vector machines.
  • PSLDoc (Chang et al, 2008) uses document classification techniques, combining probabilistic latent semantic analysis with a support vector machine model, to predict localization for prokaryotic and eukaryotic proteins.
  • TBpred (Rashid et al, 2007) is a prediction server that predicts four subcellular localizations (cytoplasmic, integral membrane, secretory, and membrane attached by lipid anchor) of mycobacterial proteins.
  • PSL101 (Su et al, 2007) is a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structure homology approach.
  • SLP-Local (Matsuda et al, 2005) predicts localizations for chloroplast, mitochondria, secretory pathway, and other locations (nucleus or cytosol) for eukaryotic proteins, as well as cytoplasmic, extracellular, and periplasmic localizations for Gram-negative organisms.
  • Gpos-PLoc (Shen and Chou, 2007) and Gneg-PLoc (Chou and Shen, 2006) use K-nearest neighbor-based classifiers to predict localizations for Gram-positive and Gram-negative bacteria, respectively.
  • CELLO version 2 (Yu et al, 2006) uses a two-level Support Vector Machine system to assign localizations to both prokaryotic and eukaryotic proteins. Version 1 of the software is described in the Yu et al, 2004 paper.
  • PSLpred (Bhasin et al, 2005) is a localization prediction tool for Gram-negative bacteria which utilizes support vector machine and PSI-BLAST to generate predictions for 5 localization sites.
  • Proteome Analyst's Subcellular Localization Server (Lu et al, 2004): this specialized server, available at the PENCE Proteome Analyst site, can classify Gram-negative, Gram-positive, fungal, plant and animal proteins into many localization sites. A database of predictions is also available and is described below.
  • LOCtree (Nair and Rost, 2005). LOCtree is a eukaryotic and prokaryotic localization prediction tool available at the CUBIC site. Databases of localization predictions made by CUBIC's servers are also available and are described below.
  • SubLoc (Hua and Sun, 2001) uses a support vector machine to assign a prokaryotic protein to the cytoplasmic, periplasmic, or extracellular sites, and a eukaryotic protein to the cytoplasmic, mitochondrial, nuclear, or extracellular sites. A modified version of SubLoc was used in PSORT-B v.1.1 to differentiate cytoplasmic and non-cytoplasmic proteins.
  • SignalP 4.0 (Petersen et al, 2011), (Bendtsen et al, 2004) predicts traditional N-terminal signal peptides in both prokaryotic and eukaryotic proteins.
  • TatP (Bendtsen et al, 2005) predicts twin-arginine signal peptides in bacteria.
  • LipoP (Juncker et al, 2003) uses an HMM to predict lipoprotein signal peptides in Gram-negative bacteria.
  • Signal-BLAST (Frank and Sippl, 2008) uses BLAST to predict signal peptides in bacteria.

Other prokaryotic subcellular localization prediction methods (without web servers):

  • Wang et al, 2011 predict protein SCL by pseudo amino acid composition with a segment-weighted and features-combined approach.
  • FFT-based SCL predictor (Wang et al, 2007) is a fast Fourier transform-based support vector machine for subcellular localization prediction using different substitution models.
  • GNBSL (Guo et al, 2006) predicts subcellular localization for Gram-negative bacteria using a combination of several SVMs based on the PSSM and PSFM generated from the input protein.
  • HensBC (Bulashevska and Eils, 2006) predicts localizations by constructing a hierarchical ensemble of classifiers, namely Bayesian classifiers based on Markov chain models.

Other eukaryotic subcellular localization predictors:

  • PredSL (Petsalaki et al, 2006) uses neural networks, Markov chains and HMMs to predict eukaryotic protein SCLs based on their N-terminal amino acid sequences.
  • PSCL (Wang et al, 2011) uses InterPro domains to predict plant protein SCLs.
  • BCAR SCL prediction (Yoon and Lee, 2011) predicts plant, animal and fungal protein SCLs by boosting association rules.
  • SCLpred (Mooney et al, 2011) predicts SCLs for animals and fungi by N-to-1 neural networks.
  • Discriminative HMMs (Lin et al, 2011) predict yeast SCLs using motifs that are present in one compartment but absent in other, nearby compartments, with a hierarchical structure that mimics the protein sorting mechanism.
  • SecretP (Yu et al, 2010) predicts mammalian secreted proteins using PseAA and SVMs.
  • TESTLoc (Shen and Burger, 2010) predicts 9 subcellular localizations of plant proteins from EST (expressed sequence tag) input.
  • PROlocalizer (Laurila and Vihinen, 2010) predicts 12 localizations for animal proteins by integrating 11 methods.
  • Plant-mPLoc (Shen and Chou, 2010) predicts plant protein subcellular localization using Gene Ontology, functional domain, and 3 modes of pseudo-amino acid composition.
  • YLoc (Briesemeister et al, 2010, Briesemeister et al, 2010) provides users with attribute-based explanations of its predictions and can predict multiple localizations for animal, plant and fungal proteins.
  • KnowPredsite (Lin et al, 2009) predicts single and multiple localizations based on local similarity of proteins at different sites.
  • SherLoc2 (Briesemeister et al, 2009) predicts animal, plant and fungal protein subcellular localizations using sequence-based and text-based features.
  • MultiLoc2 (Blum et al, 2009) predicts animal, plant and fungal protein subcellular localizations; this new version of the software integrates phylogenetic profiles and Gene Ontology terms.
  • Hum-mPLoc 2.0 (Shen and Chou, 2009) is an updated version of Hum-mPLoc.
  • Signal-BLAST (Frank and Sippl, 2008) uses BLAST to predict signal peptides in eukaryotes and bacteria.
  • RSLpred (Kaundal and Raghava, 2009) predicts subcellular localization of rice (Oryza sativa) proteins.
  • SubCellProt (Garg et al, 2009) uses k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) to classify proteins into 11 subcellular localizations.
  • ESLpred2 (Garg and Raghava, 2008) is an updated version of ESLpred and can predict localizations for animal, plant, and fungal proteins.
  • AdaBoost Learner (Jin et al, 2008) predicts 12 eukaryotic localizations using the AdaBoost algorithm.
  • SubcellPredict (Niu et al, 2008) uses the AdaBoost algorithm to predict cytoplasmic, nuclear, mitochondrial, and extracellular localization sites for eukaryotic organisms.
  • PSLDoc (Chang et al, 2008) uses document classification techniques, combining probabilistic latent semantic analysis with a support vector machine model, to predict localization for prokaryotic and eukaryotic proteins.
  • EpiLoc (Brady and Shatkay, 2008) is a text-based system for predicting animal, plant and fungal protein subcellular locations.
  • ProLoc-GO (Huang et al, 2008) utilizes Gene Ontology terms for sequence-based prediction of subcellular localization.
  • AAIndexLoc (Tantoso and Li, 2007) predicts protein subcellular localization by using amino acid composition and physicochemical properties.
  • SLPFA (Tamura and Akutsu, 2007) predicts localizations using feature vectors based on amino acid composition (frequency) and sequence alignment. Subcellular locations predicted include chloroplast, mitochondria, secretory pathway, and other locations (nucleus or cytosol) for eukaryotic proteins.
  • YimLOC (Shen and Burger, 2007) integrates previously published subcellular localization prediction tools using a stacked decision tree and makes predictions for mitochondrial proteins.
  • SLP-Local (Matsuda et al, 2005) predicts localizations for chloroplast, mitochondria, secretory pathway, and other locations (nucleus or cytosol) for eukaryotic proteins, as well as cytoplasmic, extracellular, and periplasmic localizations for Gram-negative organisms.
  • SherLoc (Shatkay et al, 2007) integrates several sequence-based and text-based features and provides predictions for plant, animal, and fungal proteins.
  • SLPS (Jia et al, 2007), or Subcellular Localization Predicting System, predicts localization using a Nearest Neighbor Algorithm (NNA) and incorporating a protein functional domain profile.
  • Hum-mPLoc (Shen and Chou, 2007) is a localization predictor specific for human proteins. It uses an ensemble classifier that handles cases where a human protein has multiple possible location sites.
  • Hum-PLoc (Chou and Shen, 2006) uses a KNN classifier to predict localizations of human proteins.
  • Euk-mPLoc (Chou and Shen, 2007) (Chou and Shen, 2010) is a general eukaryotic predictor which hybridizes gene ontology information, functional domain information, and sequential evolutionary information to predict eukaryotic protein subcellular localization.
  • Euk-PLoc (Shen et al, 2007) is a general eukaryotic predictor that uses a KNN (K-Nearest Neighbor)-based algorithm to predict localizations.
  • Plant-PLoc (Chou and Shen, 2007) is a plant-specific predictor that uses KNN algorithm to predict localizations.
  • BaCelLo (Pierleoni et al, 2006) is a predictor for five classes of eukaryotic subcellular localization (secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast) and it is based on different SVMs organized in a decision tree.
  • Protein Prowler version 1.2 (Hawkins and Boden, 2006) uses a multi-layer classifier system to predict the subcellular localization of proteins from their amino acid sequence. It classifies eukaryotic targeting signals as secretory, mitochondrion, chloroplast or other. Version 1.1 was originally described in the Boden and Hawkins, 2005 paper.
  • pTARGET (Guda 2006), (Guda and Subramaniam, 2005) uses amino acid composition and localization-specific Pfam domains to assign a eukaryotic protein to one of nine localization sites.
  • CELLO version 2 (Yu et al, 2006) uses a two-level Support Vector Machine system to assign localizations to both prokaryotic and eukaryotic proteins.
  • Golgi Localization Predictor (Yuan and Teasdale, 2002) predicts Golgi Type II membrane proteins and can discriminate between proteins destined for the Golgi apparatus or other post-Golgi locations.
  • pSLIP (Sarda et al, 2005) uses a support vector machine and multiple physicochemical properties of amino acids to assign a eukaryotic protein to one of six localization sites.
  • HSLpred (Bhasin et al, 2005) is a localization prediction tool for human proteins which utilizes support vector machine and PSI-BLAST to generate predictions for 4 localization sites.
  • LOCSVMPSI (Xie et al, 2005) is a eukaryotic localization prediction method that incorporates evolutionary information into its predictions. The method uses PSI-BLAST and support vector machine to generate predictions for up to 12 localization sites.
  • PSLT (Scott et al, 2004) is a Bayesian network-based method that predicts human protein localization based on motif/domain co-occurrence. The tool is not yet available online; however, its predictions for 9793 human proteins in SWISS-PROT are available for download from the PSLT site.
  • ESLPred (Bhasin and Raghava, 2004) uses Support Vector Machine and PSI-BLAST to assign eukaryotic proteins to the nucleus, mitochondrion, cytoplasm, or extracellular space.
  • Proteome Analyst's Subcellular Localization Server (Lu et al, 2004): this specialized server, available at the PENCE Proteome Analyst site, can classify Gram-negative, Gram-positive, fungal, plant and animal proteins into many localization sites. A database of predictions is also available and is described below.
  • LOCtree (Nair and Rost, 2005). LOCtree is a eukaryotic and prokaryotic localization prediction tool available at the CUBIC site. Databases of localization predictions made by CUBIC's servers are also available and are described below.
  • SecretomeP (Bendtsen et al, 2004) predicts eukaryotic proteins which are secreted via a non-traditional secretory mechanism.
  • SignalP (Bendtsen et al, 2004) predicts traditional N-terminal signal peptides in both prokaryotic and eukaryotic proteins.
  • SubLoc (Hua and Sun, 2001) uses a support vector machine to assign a prokaryotic protein to the cytoplasmic, periplasmic, or extracellular sites, and a eukaryotic protein to the cytoplasmic, mitochondrial, nuclear, or extracellular sites. A modified version of SubLoc was used in PSORT-B v.1.1 to differentiate cytoplasmic and non-cytoplasmic proteins.
  • TargetP (Emanuelsson et al, 2000) predicts the presence of signal peptides, chloroplast transit peptides, and mitochondrial targeting peptides for plant proteins, and the presence of signal peptides and mitochondrial targeting peptides for non-plant eukaryotic proteins.
  • Predotar is designed to predict the presence of mitochondrial and plastid targeting peptides in plant sequences.
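Several of the predictors listed above (for example SubLoc, AAIndexLoc, SLPFA and pTARGET) use amino acid composition as an input feature. The following is a minimal sketch of how such a 20-dimensional composition vector can be computed, assuming standard one-letter residue codes and simply ignoring non-standard symbols. It is an illustration only, not the feature encoding of any specific tool; the function name `aa_composition` is hypothetical.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes) in a fixed order, so that
# every sequence maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> list[float]:
    """Return the relative frequency of each standard amino acid.

    Hypothetical helper for illustration: non-standard symbols
    (e.g. X, B, Z, U, '*') are ignored rather than counted.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues)
    # One frequency per amino acid, in AMINO_ACIDS order; the values sum to 1.
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vec = aa_composition("MKKLLPTAAAGLLLLAAQPAMA")
    for aa, freq in zip(AMINO_ACIDS, vec):
        if freq > 0:
            print(f"{aa}: {freq:.3f}")
```

A vector like this is typically fed to a classifier such as an SVM or a nearest-neighbor method; the listed tools differ in which additional features (dipeptides, physicochemical indices, domains, alignments) they combine with it.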

Other eukaryotic subcellular localization prediction methods (without web servers):

  • GO-TLM (Mei et al, 2011) uses a Gene Ontology transfer model to predict eukaryotic protein SCLs.
  • Wang et al, 2011 predict yeast protein SCLs with a frequent pattern tree (FPT) approach.
  • Tian et al, 2011 predict protein SCLs by combining PCA and WSVMs.
  • Liao et al, 2011 predict apoptosis protein SCLs with PseAAC by incorporating tripeptide composition.
  • M(3)-SVM (Yang and Lu, 2010) uses an ensemble classifier that includes gene ontology (GO) semantic information, amino acid composition with secondary structure and solvent accessibility information to predict SCLs.
  • ngLOC (King and Guda, 2007) uses an n-gram-based Bayesian classifier that predicts the localization of a protein sequence over ten distinct subcellular organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.

Nucleus-specific localization predictors:

  • NoD (Scott et al, 2011) predicts human nucleolar SCL using a neural network algorithm.
  • SpectrumKernel+ (Mei and Fei, 2010) predicts subnuclear localizations by embedding into implicit size-varying motifs the multi-aspect amino acid physicochemical properties captured by amino acid classification approaches.
  • NLStradamus (Nguyen Ba et al, 2009) is a simple Hidden Markov Model for nuclear localization signal prediction.
  • Nuc-PLoc (Shen and Chou, 2007) is a web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM.
  • NUCLEO (Hawkins et al, 2007) predicts possible nuclear localization, taking dually localized proteins into account. It uses an SVM-based approach with a custom kernel that employs a composite spectrum (or multiple k-mer) encoding conjoined with a bit vector indicating the presence or absence of a range of sequence motifs known to be important for nuclear proteins.
  • NucPred (Brameier et al, 2007) predicts possible nuclear localization using a genetic programming-based algorithm. A previous version was described in the Heddad et al, 2004 paper.
  • ProLoc (Huang et al, 2007) predicts subnuclear localizations using an evolutionary SVM based classifier with automatic selection from a large set of physicochemical composition (PCC) features.
  • Subnuclear Compartments Prediction System (Lei and Dai, 2006), (Lei and Dai, 2005) predicts subnuclear localization by combining an SVM-based system for sequence analysis with a nearest-neighbor classifier using a similarity measure derived from the GO annotation terms for the protein sequences.
  • NetNES (la Cour et al, 2004) predicts nuclear export signals using neural networks and HMMs.
  • predictNLS (Cokol et al, 2000) uses nuclear localization signal motifs to predict whether a protein might be localized to the nucleus.

Viral protein subcellular localization predictors:

  • Virus-mPLoc (Shen and Chou, 2010) predicts viral protein subcellular localization with the ability to predict multiple localizations for a protein.
  • Virus-PLoc (Shen and Chou, 2007) predicts viral protein subcellular localization using a fusion of classifiers implemented with K-nearest neighbor rules and Swiss-Prot annotated viral proteins as training data.

Other subcellular localization-related databases:
  • OMPdb (Tsirigos et al, 2011) is a comprehensive database of beta-barrel outer membrane proteins in Gram-negative bacteria.
  • ExTopoDB (Tsaousis et al, 2011) is a database of experimentally derived topological models of transmembrane proteins.
  • LocDB (Rastogi and Rost, 2011) is a manually curated database with experimental annotations for the subcellular localizations of proteins in Homo sapiens and Arabidopsis thaliana.
  • FGsub (Sun et al, 2010) is a website that contains SCL prediction results for the fungal pathogen Fusarium graminearum.
  • CoBaltDB (Goudenège et al, 2010) is a database of prokaryotic subcellular localization predictions that integrates the prediction results of many general SCL predictors as well as specific signal sequence or cleavage site predictors.
  • LocateP-DB (Zhou et al, 2008) is a database of precomputed Gram-positive genomic protein subcellular localization predictions.
  • DBMLoc (Zhang et al, 2008) is a database of proteins with multiple subcellular localizations.
  • TOPDOM (Tusnady et al, 2008) is a database of domains and sequence motifs located consistently on the same side of the membrane in alpha-helical transmembrane proteins.
  • eSLDB (Pierleoni et al, 2007) collects the annotations of subcellular localizations of eukaryotic proteomes based on experimental results, homology, and computational predictions.
  • SUBA (Heazlewood et al, 2007) is an Arabidopsis subcellular localization database with annotations based on experimental results, literature references, Swiss-Prot annotations, and computational predictions.
  • FTFLP Database (Li et al, 2006) contains a collection of Arabidopsis protein localizations verified using fluorescent tagging of full-length proteins.
  • SPdb (Choo et al, 2005) is a signal peptide database containing a repository of experimentally verified and predicted signal peptides.
  • NESbase (la Cour et al, 2003) is a database with a collection of nuclear export signals.
  • LOCATE (Sprenger et al, 2007) (Fink et al, 2006) is a database that houses data describing the membrane organization and subcellular localization of human and mouse proteins.
  • PDB_TM (Tusnady et al, 2005) is a database of transmembrane proteins with known 3D structures.
  • PA-GOSUB (Lu et al, 2005) is a database collecting the localization predictions made by the Proteome Analyst tool.
  • Organelle DB (Wiwatwattana and Kumar, 2005) is a database of eukaryotic proteins found at various organelles and subcellular structures.
  • AMPDB (Heazlewood and Millar, 2005) is a database of known and predicted mitochondrial proteins in the plant species Arabidopsis thaliana.
  • MITOMAP (Brandon et al, 2005) is a database of information related to the human mitochondrial genome.
  • DBSubLoc (Guo et al, 2004): A dataset of proteins with annotated subcellular localizations according to SWISS-PROT and PIR.
  • LOCtarget (Nair and Rost, 2004) is a database of LOCtree predictions for structural genomics targets. LOC3D (Nair and Rost, 2003) is a database of predicted localizations for eukaryotic proteins with 3D structures. LOCkey (Nair and Rost, 2002) contains predicted localizations for the human, Arabidopsis, fly, yeast and worm genomes based on Swiss-Prot keywords. LOChom (2002) is a database of predicted localizations based on homology to experimentally annotated proteins.
  • SignalP (Nielsen et al, 1997): The dataset of prokaryotic and eukaryotic secreted and non-secreted proteins used to train SignalP, and also used to train PSORTb's signal peptide prediction module.
  • Signal Peptides (Menne et al, 2000): The dataset of prokaryotic and eukaryotic secreted and non-secreted proteins used in an independent evaluation of several signal peptide prediction methods, and used to test PSORTb's signal peptide prediction module.

Transmembrane alpha-helix predictors and membrane prediction software:

Beta-barrel outer membrane protein predictors:

Suggested reading:

Jennifer Gardy and Fiona Brinkman: "Methods for predicting bacterial protein subcellular localization", Nature Reviews Microbiology, 4(10):741-751 (2006).

Gisbert Schneider and Uli Fechner: "Advances in the prediction of protein targeting signals", Proteomics, 4(6):1571-1580 (2004).

Olof Emanuelsson: "Predicting protein subcellular localisation from amino acid sequence information", Briefings in Bioinformatics, 3(4):361-376 (2002).

Kenta Nakai: "Protein sorting signals and prediction of subcellular localization", Adv. Protein Chem., 54:277-344 (2000).
