PSORT.ORG - Protein Subcellular Localization Prediction Resources

PSORT.org provides links to the PSORT family of programs for subcellular localization prediction as well as other datasets and resources relevant to localization prediction. The page is currently hosted by the Brinkman Laboratory at Simon Fraser University, and our goal is to provide an open-source resource centre for researchers interested in subcellular localization prediction.

PSORT family of programs for protein subcellular localization prediction and analysis

Related resources:

PSORTb and PSORTdb are maintained by the Brinkman Laboratory, Simon Fraser University, British Columbia, Canada. PSORT and PSORT II are maintained by Kenta Nakai at the Human Genome Center, Institute of Medical Science, University of Tokyo, Japan. iPSORT is maintained by Hideo Bannai at the Human Genome Center.


Other predictive methods, datasets and resources:

The following is a collection of links relevant to subcellular localization prediction. If you would like to see a link to a particular program or resource added to this page, please contact us.

At the bottom of the page, we have also provided a suggested reading list of selected review articles on subcellular localization (SCL) and SCL prediction.

Other prokaryotic subcellular localization predictors (with web servers):

Active

Archived

  • Augur (Billion et al, 2006) is a computational pipeline for whole-genome prediction of surface proteins in Gram-positive bacteria.
  • NClassG+ (Restrepo-Montoya et al, 2011) is a sequence-based classifier for identifying non-classically secreted Gram-positive bacterial proteins.
  • P-classifier (Wang et al, 2005) predicts the subcellular localization of Gram-negative bacterial proteins using amino acid subalphabets and a combination of multiple support vector machines.
  • PSLDoc (Chang et al, 2008) applies document classification techniques, combining probabilistic latent semantic analysis with a support vector machine model, to predict localization for both prokaryotic and eukaryotic proteins.
  • SubLoc (Hua and Sun, 2001) uses a support vector machine to assign a prokaryotic protein to the cytoplasmic, periplasmic, or extracellular site, and a eukaryotic protein to the cytoplasmic, mitochondrial, nuclear, or extracellular site. A modified version of SubLoc was used in PSORT-B v.1.1 to differentiate cytoplasmic from non-cytoplasmic proteins.

Other prokaryotic subcellular localization prediction methods (without web servers):

  • FFT-based SCL predictor (Wang et al, 2007) is a support vector machine based on the fast Fourier transform that predicts subcellular localization using different substitution models.
  • GNBSL (Guo et al, 2006) predicts subcellular localization for Gram-negative bacteria by combining several SVMs built on the position-specific scoring matrix (PSSM) and position-specific frequency matrix (PSFM) generated from the input protein.
  • HensBC (Bulashevska and Eils, 2006) predicts localizations by constructing a hierarchical ensemble of classifiers, namely Bayesian classifiers based on Markov chain models.
  • Wang et al, 2011 predict protein SCL by pseudo amino acid composition with a segment-weighted and features-combined approach.

Other eukaryotic subcellular localization predictors:

Active

  • AAIndexLoc (Tantoso and Li, 2007) predicts protein subcellular localization by using amino acid composition and physicochemical properties.
  • BaCelLo (Pierleoni et al, 2006) is a predictor for five classes of eukaryotic subcellular localization (secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast) and it is based on different SVMs organized in a decision tree.
  • BCAR SCL prediction (Yoon and Lee, 2011) predicts plant, animal and fungal protein SCLs by boosting association rules.
  • CELLO version 2 (Yu et al, 2006) uses a two-level Support Vector Machine system to assign localizations to both prokaryotic and eukaryotic proteins.
  • DeepLoc (Almagro Armenteros et al, 2017) predicts protein subcellular localization in ten categories from sequence alone using deep learning.
  • Discriminative HMMs (Lin et al, 2011) predict yeast protein SCLs using motifs that are present in one compartment but absent from other nearby compartments, organized in a hierarchical structure that mimics the protein sorting mechanism.
  • ESLPred (Bhasin and Raghava, 2004) uses Support Vector Machine and PSI-BLAST to assign eukaryotic proteins to the nucleus, mitochondrion, cytoplasm, or extracellular space.
  • Euk-mPLoc (Chou and Shen, 2007; Chou and Shen, 2010) is a general eukaryotic predictor that hybridizes gene ontology information, functional domain information, and sequential evolutionary information to predict eukaryotic protein subcellular localization.
  • Euk-PLoc (Shen et al, 2007) is a general eukaryotic predictor that uses a K-nearest neighbor (KNN)-based algorithm to predict localizations.
  • Golgi Localization Predictor (Yuan and Teasdale, 2002) predicts Golgi type II membrane proteins and can discriminate between proteins destined for the Golgi apparatus and those destined for other post-Golgi locations.
  • HSLpred (Bhasin et al, 2005) is a localization prediction tool for human proteins that uses support vector machines and PSI-BLAST to generate predictions for 4 localization sites.
  • Hum-mPLoc (Shen and Chou, 2007) is a localization predictor specific for human proteins. It uses an ensemble classifier that handles cases where a human protein has multiple possible location sites.
  • LOCtree3 (Goldberg et al, 2014) is a eukaryotic and prokaryotic localization prediction tool available at the Rost Lab Wiki page.
  • MultiLoc2 (Blum et al, 2009) predicts animal, plant and fungal protein subcellular localizations; this new version of the software integrates phylogeny and Gene Ontology terms into its predictions.
  • Plant-mPLoc (Shen and Chou, 2010) predicts plant protein subcellular localization using Gene Ontology, functional domain information, and 3 modes of pseudo-amino acid composition.
  • PredSL (Petsalaki et al, 2006) uses neural networks, Markov chains and HMMs to predict eukaryotic protein SCLs based on their N-terminal amino acid sequences.
  • Protein Prowler version 1.2 (Hawkins and Boden, 2006) uses a multi-layer classifier system to predict the subcellular localization of proteins from their amino acid sequence. It classifies eukaryotic targeting signals as secretory, mitochondrial, chloroplast or other. Version 1.1 was originally described in Boden and Hawkins, 2005.
  • Proteome Analyst's Subcellular Localization Server (Lu et al, 2004) is a specialized server, available at the PENCE Proteome Analyst site, that classifies Gram-negative, Gram-positive, fungal, plant and animal proteins into many localization sites. A database of its predictions is also available and is described below.
  • PSLDoc (Chang et al, 2008) applies document classification techniques, combining probabilistic latent semantic analysis with a support vector machine model, to predict localization for both prokaryotic and eukaryotic proteins.
  • RSLpred (Kaundal and Raghava, 2009) predicts subcellular localization of rice (Oryza sativa) proteins.
  • SecretomeP (Bendtsen et al, 2004) predicts eukaryotic proteins which are secreted via a non-traditional secretory mechanism.
  • SecretP (Yu et al, 2010) predicts mammalian secreted proteins using pseudo-amino acid composition (PseAA) and SVMs.
  • SherLoc2 (Briesemeister et al, 2009) predicts animal, plant and fungal protein subcellular localizations using sequence-based and text-based features.
  • Signal-BLAST (Frank and Sippl, 2008) uses BLAST to predict signal peptides in eukaryotes and bacteria.
  • SignalP (Bendtsen et al, 2004) predicts traditional N-terminal signal peptides in both prokaryotic and eukaryotic proteins.
  • SLPFA (Tamura and Akutsu, 2007) predicts localizations from feature vectors based on amino acid composition (frequency) and sequence alignment. Predicted subcellular locations for eukaryotic proteins include chloroplast, mitochondria, secretory pathway, and other locations (nucleus or cytosol).
  • SLP-Local (Matsuda et al, 2005) predicts chloroplast, mitochondria, secretory pathway, and other locations (nucleus or cytosol) for eukaryotic proteins, as well as cytoplasm, extracellular space, and periplasm for Gram-negative bacteria.
  • TargetP (Emanuelsson et al, 2000) predicts the presence of signal peptides, chloroplast transit peptides, and mitochondrial targeting peptides in plant proteins, and the presence of signal peptides and mitochondrial targeting peptides in non-plant eukaryotic proteins.
  • YLoc (Briesemeister et al, 2010; Briesemeister et al, 2010) explains the attributes behind its predictions to users and can predict multiple localizations for animal, plant and fungal proteins.

Archived

  • AdaBoost Learner (Jin et al, 2008) predicts 12 eukaryotic localizations using the AdaBoost algorithm.
  • EpiLoc (Brady and Shatkay, 2008) is a text-based system for predicting animal, plant and fungal protein subcellular locations.
  • Hum-mPLoc 2.0 (Shen and Chou, 2009) is an updated version of Hum-mPLoc.
  • Hum-PLoc (Chou and Shen, 2006) uses a KNN classifier to predict localizations of human proteins.
  • KnowPredsite (Lin et al, 2009) predicts single and multiple localizations based on local similarity of proteins at different sites.
  • LOCSVMPSI (Xie et al, 2005) is a eukaryotic localization prediction method that incorporates evolutionary information into its predictions. It uses PSI-BLAST and a support vector machine to generate predictions for up to 12 localization sites.
  • Plant-PLoc (Chou and Shen, 2007) is a plant-specific predictor that uses a KNN algorithm to predict localizations.
  • Predotar is designed to predict the presence of mitochondrial and plastid targeting peptides in plant sequences.
  • PROlocalizer (Laurila and Vihinen, 2010) predicts 12 localizations for animal proteins by integrating 11 methods.
  • ProLoc-GO (Huang et al, 2008) uses Gene Ontology terms for sequence-based prediction of subcellular localization.
  • PSCL (Wang et al, 2011) uses InterPro domains to predict plant protein SCLs.
  • pSLIP (Sarda et al, 2005) uses a support vector machine and multiple physicochemical properties of amino acids to assign a eukaryotic protein to one of six localization sites.
  • PSLT (Scott et al, 2004) is a Bayesian network-based method that predicts human protein localization based on motif/domain co-occurrence. The tool is not yet available online; however, its predictions for 9793 human proteins in SWISS-PROT are available for download from the PSLT site.
  • pTARGET (Guda, 2006; Guda and Subramaniam, 2005) uses amino acid composition and localization-specific Pfam domains to assign a eukaryotic protein to one of nine localization sites.
  • SCLpred (Mooney et al, 2011) predicts SCLs for animals and fungi by N-to-1 neural networks.
  • SLPS (Jia et al, 2007), or Subcellular Localization Predicting System, predicts localization using a Nearest Neighbor Algorithm (NNA) and incorporating a protein functional domain profile.
  • SubCellProt (Garg et al, 2009) uses k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) to classify proteins into 11 subcellular localizations.
  • SubcellPredict (Niu et al, 2008) uses the AdaBoost algorithm to predict cytoplasmic, nuclear, mitochondrial, and extracellular localization sites for eukaryotic proteins.
  • SubLoc (Hua and Sun, 2001) uses a support vector machine to assign a prokaryotic protein to the cytoplasmic, periplasmic, or extracellular site, and a eukaryotic protein to the cytoplasmic, mitochondrial, nuclear, or extracellular site. A modified version of SubLoc was used in PSORT-B v.1.1 to differentiate cytoplasmic from non-cytoplasmic proteins.
  • TESTLoc (Shen and Burger, 2010) predicts 9 plant protein subcellular localizations from EST (expressed sequence tag) input.
  • YimLOC (Shen and Burger, 2007) integrates previously published subcellular localization prediction tools using a stacked decision tree and makes predictions for mitochondrial proteins.

Other eukaryotic subcellular localization prediction methods (without web servers):

  • GO-TLM (Mei et al, 2011) uses a Gene Ontology transfer model to predict eukaryotic protein SCLs.
  • Wang et al, 2011 predict yeast protein SCL with a frequent pattern tree (FPT) approach.
  • Tian et al, 2011 predict protein SCLs by combining PCA and WSVMs.
  • Liao et al, 2011 predict apoptosis protein SCLs with PseAAC by incorporating tripeptide composition.
  • M(3)-SVM (Yang and Lu, 2010) uses an ensemble classifier that includes gene ontology (GO) semantic information, amino acid composition with secondary structure and solvent accessibility information to predict SCLs.
  • ngLOC (King and Guda, 2007) uses an n-gram-based Bayesian classifier that predicts the localization of a protein sequence over ten distinct subcellular organelles. An enhanced version of ngLOC was developed to estimate the subcellular proteomes of eight eukaryotic organisms: yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.

Nucleus-specific localization predictors:

Active

  • NetNES (la Cour et al, 2004) predicts nuclear export signals using neural networks and HMMs.
  • NLStradamus (Nguyen Ba et al, 2009) is a simple Hidden Markov Model for nuclear localization signal prediction.
  • NoD (Scott et al, 2011) predicts nucleolar localization of human proteins using a neural network algorithm.
  • Nuc-PLoc (Shen and Chou, 2007) is a web-server for predicting protein subnuclear localization by fusing PseAA composition and PsePSSM.
  • NUCLEO (Hawkins et al, 2007) predicts possible nuclear localization while taking dually localized proteins into consideration. It uses an SVM-based approach with a custom kernel that combines a composite spectrum (multiple k-mer) encoding with a bit vector indicating the presence or absence of a range of sequence motifs known to be important for nuclear proteins.
  • NucPred (Brameier et al, 2007) predicts possible nuclear localization using a genetic programming-based algorithm. A previous version was described in Heddad et al, 2004.
  • predictNLS (Cokol et al, 2000) uses nuclear localization signal motifs to predict whether a protein might be localized to the nucleus.
  • SpectrumKernel+ (Mei and Fei, 2010) predicts subnuclear localizations by embedding the multi-aspect physicochemical properties of amino acids, captured by amino acid classification approaches, into implicit size-varying motifs.

Archived

  • ProLoc (Huang et al, 2007) predicts subnuclear localizations using an evolutionary SVM based classifier with automatic selection from a large set of physicochemical composition (PCC) features.
  • Subnuclear Compartments Prediction System (Lei and Dai, 2006; Lei and Dai, 2005) predicts subnuclear localization by combining an SVM-based system for sequence analysis with a nearest-neighbor classifier using a similarity measure derived from the GO annotation terms for the protein sequences.

Viral protein subcellular localization predictors:

Active

Archived

  • Virus-PLoc (Shen and Chou, 2007) predicts viral protein subcellular localization using a fusion of classifiers implemented with K-nearest neighbor rules and Swissprot annotated viral proteins as training data.

Other subcellular localization-related databases:

Active

  • eSLDB (Pierleoni et al, 2007) collects annotations of the subcellular localizations of eukaryotic proteomes based on experimental results, homology, and computational predictions.
  • ExTopoDB (Tsaousis et al, 2011) is a database of experimentally derived topological models of transmembrane proteins.
  • FGsub (Sun et al, 2010) is a website that contains SCL prediction results for the fungal pathogen Fusarium graminearum.
  • FTFLP Database (Li et al, 2006) contains a collection of Arabidopsis protein localizations verified using fluorescent tagging of full-length proteins.
  • LOCATE (Sprenger et al, 2007; Fink et al, 2006) is a database that houses data describing the membrane organization and subcellular localization of human and mouse proteins.
  • LocDB (Rastogi and Rost, 2011) is a manually curated database with experimental annotations for the subcellular localizations of proteins in Homo sapiens and Arabidopsis thaliana.
  • MITOMAP (Brandon et al, 2005) is a database of information related to the human mitochondrial genome.
  • NESbase (la Cour et al, 2003) is a database of nuclear export signals.
  • OMPdb (Tsirigos et al, 2011) is a comprehensive database of beta-barrel outer membrane proteins in Gram-negative bacteria.
  • Organelle DB (Wiwatwattana and Kumar, 2005) is a database of eukaryotic proteins found at various organelles and subcellular structures.
  • PA-GOSUB (Lu et al, 2005) is a database collecting the localization predictions made by the Proteome Analyst tool.
  • PDBTM (Tusnady et al, 2005) is a database of transmembrane proteins with known 3D structures.
  • SignalP (Nielsen et al, 1997): The dataset of prokaryotic and eukaryotic secreted and non-secreted proteins used to train SignalP, and also used to train PSORTb's signal peptide prediction module.
  • Signal Peptides (Menne et al, 2000): The dataset of prokaryotic and eukaryotic secreted and non-secreted proteins used in an independent evaluation of several signal peptide prediction methods, and used to test PSORTb's signal peptide prediction module.
  • SPdb (Choo et al, 2005) is a signal peptide database containing a repository of experimentally verified and predicted signal peptides.
  • STEPdb (Orfanoudaki et al) is a database providing a comprehensive characterization of the subcellular localization and topology of the Escherichia coli proteome.
  • SUBA4 (Heazlewood et al, 2007) is an Arabidopsis subcellular localization database with annotations based on experimental results, literature references, Swiss-Prot annotations, and computational predictions.
  • TOPDOM (Tusnady et al, 2008) is a database of domains and sequence motifs located consistently on the same side of the membrane in alpha-helical transmembrane proteins.

Archived

  • AMPDB (Heazlewood and Millar, 2005) is a database of known and predicted mitochondrial proteins in the plant species Arabidopsis thaliana.
  • CoBaltDB (Goudenège et al, 2010) is a database of prokaryotic subcellular localization predictions that integrates the prediction results of many general SCL predictors as well as specific signal sequence or cleavage site predictors.
  • DBMLoc (Zhang et al, 2008) is a database of proteins with multiple subcellular localizations.
  • DBSubLoc (Guo et al, 2004): A dataset of proteins with annotated subcellular localizations according to SWISS-PROT and PIR.
  • FGsub (Sun et al, 2010) is a website that contains SCL prediction results for the fungal pathogen Fusarium graminearum.
  • LocateP-DB (Zhou et al, 2008) is a database of precomputed Gram-positive genomic protein subcellular localization predictions.
  • LOCtarget (Nair and Rost, 2004) is a database of LOCtree predictions for structural genomics targets. LOC3D (Nair and Rost, 2003) is a database of predicted localizations for eukaryotic proteins with 3D structures. LOCkey (Nair and Rost, 2002) contains predicted localizations for the human, Arabidopsis, fly, yeast and worm genomes based on Swiss-Prot keywords. LOChom (2002) is a database of predicted localizations based on homology to experimentally annotated proteins.

Transmembrane alpha-helix predictors and membrane prediction software:

Active

Archived

Beta-barrel outer membrane protein predictors:

Active

Archived

Suggested reading:

Jennifer Gardy and Fiona Brinkman: "Methods for predicting bacterial protein subcellular localization", Nature Reviews Microbiology, 4(10):741-751 (2006).

Gisbert Schneider and Uli Fechner: "Advances in the prediction of protein targeting signals", Proteomics, 4(6):1571-1580 (2004).

Olof Emanuelsson: "Predicting protein subcellular localisation from amino acid sequence information", Briefings in Bioinformatics, 3(4):361-376 (2002).

Kenta Nakai: "Protein sorting signals and prediction of subcellular localization", Adv. Protein Chem., 54:277-344 (2000).
