PSORT.org provides links to the PSORT family of programs for subcellular localization
prediction as well as other datasets and resources relevant to localization
prediction. The page is currently hosted by the Brinkman Laboratory
at Simon Fraser University, and our goal is to provide an open-source
resource centre for researchers interested in subcellular localization
prediction.
PSORT family of programs for protein subcellular localization prediction and analysis
PSORTdb (Peabody et al, 2015) is a database with two components: cPSORTdb, which contains updated pre-computed PSORTb prediction results for completely sequenced archaeal and bacterial genomes from NCBI, and ePSORTdb, which contains proteins of experimentally verified localization used in training and testing.
Related resources:
PSORTb and PSORTdb are
maintained by the Brinkman
Laboratory,
Simon Fraser University, British Columbia, Canada.
PSORT and
PSORT II are maintained by Kenta Nakai, at the Human Genome Center, Institute of Medical Science, University
of Tokyo, Japan. iPSORT is maintained by Hideo
Bannai at the Human Genome Center.
Other predictive methods, datasets and resources:
The following is a collection of links relevant to subcellular localization prediction. If you would like to see a link to a particular program or resource added to this page, please
contact us.
At the bottom of the page, we have also provided a suggested reading list containing selected review articles describing
subcellular localization (SCL) and SCL prediction.
Other prokaryotic subcellular localization predictors (with web servers):
Active
- CELLO version 2 (Yu
et al, 2006) uses a two-level Support Vector Machine system
to assign localizations to both prokaryotic and eukaryotic proteins.
Version 1 of the software is described in the Yu et al, 2004 paper.
- CW-PRED
(Litou
et al, 2008) predicts cell wall-attached proteins in Gram-positive
bacteria using HMM.
- Gpos-mPLoc
(Shen and
Chou, 2009) and Gneg-mPLoc
(Shen and
Chou, 2010) predict bacterial subcellular localization by combining
gene ontology, functional domain, and sequential evolution information.
- Gpos-PLoc
(Shen
and Chou, 2007) and Gneg-PLoc
(Chou
and Shen, 2006) use a K-nearest neighbor-based classifier to predict
localizations for Gram-positive and Gram-negative bacteria, respectively.
- iLoc-Gneg
(Xiao
et al, 2011) uses Gene Ontology and sequence information to
predict 8 localization sites in Gram-negative bacteria.
- LipoP
(Juncker
et al, 2003) uses HMM to predict lipoprotein signal peptides
in Gram-negative bacteria.
- LOCtree3
(
Goldberg et al, 2014). LOCtree3 is a eukaryotic and prokaryotic localization prediction tool available at the
Rost Lab Wiki Page.
- PRED-LIPO
(Bagos
et al, 2008) predicts lipoprotein signal peptides of Gram-positive
bacteria using HMM.
- PRED-SIGNAL
(Bagos
et al, 2009) predicts signal peptides for Archaea.
- PRED-TAT
(Bagos et al, 2010)
predicts TAT and Sec signal peptides.
- Proteome Analyst's Subcellular Localization Server (Lu et al, 2004). This specialized server, available at the PENCE Proteome Analyst site,
is able to classify Gram-negative, Gram-positive, fungal, plant and animal proteins to many localization sites.
A database of predictions is also available and is described below.
- PSL101
(Su
et al, 2007) is a hybrid prediction method for Gram-negative
bacteria that combines a one-versus-one support vector machine (SVM)
model and a structure homology approach.
- PSLpred
(Bhasin et al, 2005) is a localization prediction tool for
Gram-negative bacteria which utilizes support vector machine and
PSI-BLAST to generate predictions for 5 localization sites.
- Signal-BLAST
(Frank
and Sippl, 2008) uses BLAST to predict signal peptides in bacteria.
- SignalP 4.0 (
Petersen et al, 2011), (Bendtsen et al, 2004) predicts traditional N-terminal signal
peptides in both prokaryotic and eukaryotic proteins.
- SLP-Local
(Matsuda
et al, 2005) predicts localizations for chloroplast, mitochondria,
secretory pathway, and other locations (nucleus or cytosol) for
eukaryotic proteins, as well as cytoplasm, extracellular space, and periplasm
for Gram-negative organisms.
- SOSUI-GramN
(Imai
et al, 2008) predicts Gram-negative localizations based on N-
and C-terminal signal sequences.
- TatP
(Bendtsen
et al, 2005) predicts twin-arginine signal peptides in bacteria.
- TBpred
(Rashid
et al, 2007) is a prediction server that predicts four subcellular
localizations (cytoplasmic, integral membrane, secretory and membrane
attached by lipid anchor) of mycobacterial proteins.
Archived
- Augur
(Billion
et al, 2006) is a computational pipeline for Gram-positive bacterial
whole-genome surface protein predictions.
- NClassG+
(Restrepo-Montoya
et al, 2011) is a sequence-based classifier for identifying non-classically
secreted Gram-positive bacterial proteins.
- P-classifier
(Wang
et al, 2005) predicts subcellular localizations of proteins
for Gram-negative bacteria based on amino acid subalphabets and
a combination of multiple support vector machines.
- PSLDoc
(Chang et
al, 2008) uses document classification techniques and incorporates a probabilistic latent semantic analysis with a support vector machine model, for prediction on prokaryotes and eukaryotes.
- SubLoc (Hua and Sun, 2001) uses Support Vector Machine to assign a prokaryotic protein to the cytoplasmic, periplasmic, or extracellular sites, and a eukaryotic protein to the cytoplasmic, mitochondrial, nuclear, or extracellular sites. A modified version of SubLoc was used in PSORT-B v.1.1 to differentiate cytoplasmic and non-cytoplasmic proteins.
Other prokaryotic subcellular localization prediction methods (without web servers):
- FFT-based SCL predictor (Wang
et al, 2007) is a fast Fourier transform-based support vector
machine for subcellular localization prediction using different
substitution models.
- GNBSL (Guo
et al, 2006) generates subcellular localization prediction for
Gram-negative bacteria using a combination of several different
SVMs based on the PSSM and PSFM generated from the input protein.
- HensBC (Bulashevska
and Eils, 2006) predicts localizations by constructing a hierarchical
ensemble of classifiers, namely Bayesian classifiers based on Markov
chain models.
- Wang
et al, 2011 predict protein SCL by pseudo amino acid composition
with a segment-weighted and features-combined approach.
Other eukaryotic subcellular localization predictors:
Active
- AAIndexLoc (Tantoso
and Li, 2007) predicts protein subcellular localization by using
amino acid composition and physicochemical properties.
- BaCelLo (Pierleoni
et al, 2006) is a predictor for five classes of eukaryotic subcellular
localization (secretory pathway, cytoplasm, nucleus, mitochondrion
and chloroplast) and it is based on different SVMs organized in
a decision tree.
- BCAR
SCL prediction (Yoon
and Lee, 2011) predicts plant, animal and fungal protein SCLs
by boosting association rules.
- CELLO version 2 (Yu
et al, 2006) uses a two-level Support Vector Machine system
to assign localizations to both prokaryotic and eukaryotic proteins.
- DeepLoc (Almagro Armenteros et al. 2017) predicts protein subcellular localization in ten categories from sequence alone using deep learning.
- Discriminative
HMMs (Lin
et al, 2011) predicts yeast SCLs using motifs that are present
in a compartment but absent in other, nearby, compartments by utilizing
a hierarchical structure that mimics the protein sorting mechanism.
- ESLPred (Bhasin and Raghava, 2004) uses Support Vector Machine and
PSI-BLAST to assign eukaryotic proteins to the nucleus, mitochondrion,
cytoplasm, or extracellular space.
- Euk-mPLoc
(Chou
and Shen, 2007) (Chou
and Shen, 2010) is a general eukaryotic predictor which hybridizes
gene ontology information, functional domain information, and sequential
evolutionary information to predict eukaryotic protein subcellular
localization.
- Euk-PLoc
(Shen
et al, 2007) is a general eukaryotic predictor that uses a KNN
(K-Nearest Neighbor)-based algorithm to predict localizations.
- Golgi
Localization Predictor (Yuan
and Teasdale, 2002) predicts Golgi Type II membrane proteins
and can discriminate between proteins destined for the Golgi apparatus
or other post-Golgi locations.
- HSLpred (Bhasin et al, 2005) is a localization prediction tool for
human proteins which utilizes support vector machine and PSI-BLAST
to generate predictions for 4 localization sites.
- Hum-mPLoc
(Shen
and Chou, 2007) is a localization predictor specific for human
proteins. It uses an ensemble classifier that handles cases where
a human protein has multiple possible location sites.
- LOCtree3
(
Goldberg et al, 2014). LOCtree3 is a eukaryotic and prokaryotic localization prediction tool available at the
Rost Lab Wiki Page.
- MultiLoc2
(Blum
et al, 2009) predicts animal, plant and fungal protein subcellular localizations;
the new version of the software integrates phylogeny and Gene Ontology terms.
- Plant-mPLoc
(Shen and
Chou, 2010) predicts plant protein subcellular localization
by Gene Ontology, functional domain, and 3 modes of pseudo-amino acid composition.
- PredSL
(Petsalaki
et al, 2006) uses neural networks, Markov chains and HMMs to
predict eukaryotic protein SCLs based on their N-terminal amino
acid sequences.
- Protein Prowler version 1.2 (Hawkins
and Boden, 2006) uses a multi-layer classifier system for predicting the subcellular localization of proteins based on their amino acid sequence. It classifies eukaryotic targeting signals as secretory, mitochondrion, chloroplast or other. Version 1.1 was originally described in the Boden and Hawkins, 2005 paper.
- Proteome Analyst's Subcellular Localization Server (Lu et al, 2004). This specialized server, available at the PENCE
Proteome Analyst site, is able to classify Gram-negative, Gram-positive,
fungal, plant and animal proteins to many localization sites. A database
of predictions is also available and is described below.
- PSLDoc
(Chang et
al, 2008) uses document classification techniques and incorporates
a probabilistic latent semantic analysis with a support vector machine
model, for prediction on prokaryotes and eukaryotes.
- RSLpred
(Kaundal
and Raghava, 2009) predicts subcellular localization of rice
(Oryza sativa) proteins.
- SecretomeP (Bendtsen et al, 2004) predicts eukaryotic proteins which are
secreted via a non-traditional secretory mechanism.
- SecretP
(Yu
et al, 2010) predicts mammalian secreted proteins using PseAA and SVMs.
- SherLoc2
(Briesemeister
et al, 2009) predicts animal, plant and fungal protein subcellular
localizations using sequence-based and text-based features.
- Signal-BLAST
(Frank
and Sippl, 2008) uses BLAST to predict signal peptides in eukaryotes
and bacteria.
- SignalP (Bendtsen et al, 2004) predicts traditional N-terminal signal
peptides in both prokaryotic and eukaryotic proteins.
- SLPFA
(Tamura
and Akutsu, 2007) predicts localizations by feature vectors
based on amino acid composition (frequency) and sequence alignment.
Subcellular locations predicted include chloroplast, mitochondria,
secretory pathway, and other locations (nucleus or cytosol) for
eukaryotic proteins.
- SLP-Local
(Matsuda
et al, 2005) predicts localizations for chloroplast, mitochondria,
secretory pathway, and other locations (nucleus or cytosol) for
eukaryotic proteins, as well as cytoplasm, extracellular space, and periplasm
for Gram-negative organisms.
- TargetP (Emanuelsson et al, 2000) predicts the presence of signal peptides,
chloroplast transit peptides, and mitochondrial targeting peptides
for plant proteins, and the presence of signal peptides and mitochondrial
targeting peptides for eukaryotic proteins.
- YLoc
(Briesemeister
et al, 2010, Briesemeister et al, 2010) provides attribute-based
explanations for users and multiple localization prediction capabilities
for animal, plant and fungal protein subcellular localizations.
Archived
- AdaBoost
Learner (Jin
et al, 2008) predicts 12 eukaryotic localizations using the AdaBoost algorithm.
- EpiLoc (Brady
and Shatkay, 2008) is a text-based system for predicting animal,
plant and fungal protein subcellular locations.
- Hum-mPLoc
2.0 (Shen
and Chou, 2009) is an updated version of Hum-mPLoc.
- Hum-PLoc
(Chou
and Shen, 2006) uses a KNN classifier to predict localizations
of human proteins.
- KnowPredsite
(Lin
et al, 2009) predicts single and multiple localizations based on local similarity of proteins at different sites.
- LOCSVMPSI (Xie
et al, 2005) is a eukaryotic localization prediction method
that incorporates evolutionary information into its predictions.
The method uses PSI-BLAST and support vector machine to generate
predictions for up to 12 localization sites.
- Plant-PLoc (Chou
and Shen, 2007) is a plant-specific predictor that uses a KNN
algorithm to predict localizations.
- Predotar is designed to predict the presence of mitochondrial
and plastid targeting peptides in plant sequences.
- PROlocalizer
(Laurila and
Vihinen, 2010) predicts 12 animal protein localizations by integrating
11 methods.
- ProLoc-GO
(Huang et
al, 2008) utilizes Gene Ontology terms for sequence-based prediction
of subcellular localization.
- PSCL
(Wang
et al, 2011) uses Interpro domains to predict plant protein
SCLs.
- pSLIP (Sarda et al, 2005) uses support vector machine and multiple
physicochemical properties of amino acids to assign a eukaryotic
protein to one of six localization sites.
- PSLT (Scott et al, 2004) is a Bayesian network-based method that
predicts human protein localization based on motif/domain co-occurrence.
The tool is not yet available online; however, its predictions for
9793 human proteins in SWISS-PROT are available for download from
the PSLT site.
- pTARGET
(Guda
2006), (Guda
and Subramaniam, 2005) uses amino acid composition and localization-specific
Pfam domains to assign a eukaryotic protein to one of nine localization
sites.
- SCLpred
(Mooney
et al, 2011) predicts SCLs for animals and fungi by N-to-1 neural
networks.
- SLPS (Jia
et al, 2007), or Subcellular Localization Predicting System,
predicts localization using a Nearest Neighbor Algorithm (NNA) and
incorporating a protein functional domain profile.
- SubCellProt
(Garg
et al, 2009) uses k Nearest Neighbor (k-NN) and Probabilistic
Neural Network (PNN) to classify proteins into 11 subcellular localizations.
- SubcellPredict
(Niu
et al, 2008) uses the AdaBoost algorithm to predict cytoplasmic,
nuclear, mitochondrial, and extracellular localization sites for
eukaryotic organisms.
- SubLoc (Hua and Sun, 2001) uses Support Vector Machine to assign a
prokaryotic protein to the cytoplasmic, periplasmic, or extracellular
sites, and a eukaryotic protein to the cytoplasmic, mitochondrial,
nuclear, or extracellular sites. A modified version of SubLoc was
used in PSORT-B v.1.1 to differentiate cytoplasmic and non-cytoplasmic
proteins.
- TESTLoc
(Shen
and Burger, 2010) predicts 9 plant protein subcellular localizations
for EST-DNA input.
- YimLOC
(Shen
and Burger, 2007) integrates previously published subcellular
localization prediction tools using a stacked decision tree and
makes predictions for mitochondrial proteins.
Other eukaryotic subcellular localization prediction methods (without web servers):
- GO-TLM (Mei
et al, 2011) uses a Gene Ontology transfer model to predict
eukaryotic protein SCLs.
- Wang
et al, 2011 predict yeast protein SCL with a frequent pattern
tree (FPT) approach.
- Tian
et al, 2011 predict protein SCLs by combining PCA and WSVMs.
- Liao
et al, 2011 predict apoptosis protein SCLs with PseAAC by incorporating
tripeptide composition.
- M(3)-SVM (Yang
and Lu, 2010) uses an ensemble classifier that includes gene
ontology (GO) semantic information, amino acid composition with
secondary structure and solvent accessibility information to predict
SCLs.
- ngLOC (King
and Guda, 2007) uses an n-gram-based Bayesian classifier
that predicts the localization of a protein sequence over ten distinct
subcellular organelles. An enhanced version of ngLOC was developed
to estimate the subcellular proteomes of eight eukaryotic organisms:
yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse,
and human.
Nucleus-specific localization predictors:
Active
- NetNES (la
Cour et al, 2004) predicts nuclear export signals using neural
networks and HMMs.
- NLStradamus
(Nguyen
Ba et al, 2009) is a simple Hidden Markov Model for nuclear
localization signal prediction.
- NoD
(Scott
et al, 2011) predicts human nucleolar SCL using a neural network
algorithm.
- Nuc-PLoc
(Shen
and Chou, 2007) is a web-server for predicting protein subnuclear
localization by fusing PseAA composition and PsePSSM.
- NUCLEO
(Hawkins
et al, 2007) predicts possible nuclear localization while taking
dually localized proteins into consideration. It uses an SVM-based
approach with a custom kernel that employs a composite spectrum
(or multiple k-mer) encoding conjoined with a bit vector
indicating the presence or absence of a range of sequence motifs
known to be important for nuclear proteins.
- NucPred
(Brameier
et al, 2007) predicts possible nuclear localization by using
a genetic programming-based algorithm. The previous version was described
in the Heddad et al, 2004 paper.
- predictNLS (Cokol et al, 2000) uses nuclear localization signal motifs
to predict whether a protein might be localized to the nucleus.
- SpectrumKernel+ (Mei
and Fei, 2010) predicts subnuclear localizations by embedding
into implicit size-varying motifs the multi-aspect amino acid physicochemical
properties captured by amino acid classification approaches.
Archived
Viral protein subcellular localization predictors:
Active
Archived
- Virus-PLoc (Shen and Chou, 2007) predicts viral protein subcellular localization using a fusion of classifiers implemented with K-nearest neighbor rules, with Swiss-Prot annotated viral proteins as training data.
Other subcellular localization-related databases:
Active
- eSLDB (
Pierleoni et al, 2007) collects the annotations of subcellular localizations of eukaryotic proteomes
based on experimental results, homology, and computational predictions.
- ExTopoDB (Tsaousis et al, 2011)
is a database of experimentally derived topological models of transmembrane proteins.
- FGsub (Sun et al, 2010)
is a website that contains SCL prediction results for the fungal pathogen Fusarium graminearum.
- FTFLP Database
(Li
et al, 2006) contains a collection of Arabidopsis protein localizations verified using fluorescent
tagging of full-length proteins.
- LOCATE (
Sprenger et al, 2007) (Fink et al, 2006) is a database that houses data describing the membrane
organization and subcellular localization of human and mouse proteins.
- LocDB (Rastogi and Rost, 2011)
is a manually curated database with experimental annotations for the subcellular localizations
of proteins in Homo sapiens and Arabidopsis thaliana.
- MITOMAP (Brandon et al, 2005) is a database of information related
to the human mitochondrial genome.
- NESbase (
la Cour et al, 2003) is a database with a collection of nuclear export signals.
- OMPdb
(Tsirigos et al, 2011)
is a comprehensive database of beta-barrel outer membrane proteins in
Gram-negative bacteria.
- Organelle DB (
Wiwatwattana and Kumar, 2005) is a database of eukaryotic proteins found in various organelles
and subcellular structures.
- PA-GOSUB (Lu et al, 2005)
is a database collecting the localization predictions made by the Proteome Analyst tool.
- PDB_TM (Tusnady et al,
2005) is a database of transmembrane proteins with known 3D structures.
- SignalP (Nielsen et al, 1997): The dataset of prokaryotic and eukaryotic secreted and non-secreted
proteins used to train SignalP, and also used to train PSORTb's signal peptide prediction module.
- Signal Peptides
(Menne et al, 2000): The dataset of prokaryotic and eukaryotic secreted and non-secreted
proteins used in an independent evaluation of several signal peptide prediction methods, and used to test PSORTb's
signal peptide prediction module.
- SPdb (Choo et al, 2005) is a signal peptide database containing a repository of experimentally verified and
predicted signal peptides.
- STEPdb (Orfanoudaki
et al.) is a database providing a comprehensive characterization of the subcellular localization and topology of the
Escherichia coli proteome.
- SUBA4 (Heazlewood et al, 2007) is an Arabidopsis subcellular localization database with annotations based
on experimental results, literature references, Swiss-Prot annotations, and computational predictions.
- TOPDOM (Tusnady et al, 2008) is a database of domains and sequence motifs located
consistently on the same side of the membrane in alpha-helical transmembrane proteins.
Archived
- AMPDB (Heazlewood and Millar, 2005) is a database of known and predicted
mitochondrial proteins in the plant species Arabidopsis thaliana.
- CoBaltDB
(Goudenège
et al, 2010) is a database of prokaryotic subcellular localization
predictions that integrates the prediction results of many general
SCL predictors as well as specific signal sequence or cleavage site
predictors.
- DBMLoc
(Zhang
et al, 2008) is a database of proteins with multiple subcellular
localizations.
- DBSubLoc (Guo et al, 2004): A dataset of proteins with annotated subcellular
localizations according to SWISS-PROT and PIR.
- FGsub
(Sun
et al, 2010) is a website that contains SCL prediction results
for the fungal pathogen Fusarium graminearum.
- LocateP-DB
(Zhou
et al, 2008) is a database of precomputed Gram-positive genomic
protein subcellular localization predictions.
- LOCtarget (Nair and Rost, 2004) is a database of LOCtree predictions
for structural genomics targets. LOC3D (Nair and Rost, 2003) is a database of predicted localizations
for eukaryotic proteins with 3D structures. LOCkey (Nair and Rost, 2002) contains predicted localizations for
the human, Arabidopsis, fly, yeast and worm genomes based on Swiss-Prot
keywords. LOChom (2002) is a database of predicted localizations based
on homology to experimentally annotated proteins.
Transmembrane alpha-helix predictors and membrane
prediction software:
Active
- BetAware (Savojardo et al, 2013)
- HMM-TM
(Bagos et al, 2006) incorporates
prior topological information in HMMs.
- HMMTOP (Tusnady and Simon, 1998) HMMTOP is used in all versions of PSORTb.
- LIPS (Adamian and Liang, 2006)
- MemPype
(Pierleoni et al, 2011)
is a pipeline that identifies membrane-associated proteins and discriminates the types of
membrane SCL and topology for eukaryotic membrane proteins.
- MEMSAT3
(Jones, 2007)
- Philius
(Reynolds et al, 2008)
is an updated version of Phobius
- Phobius (Käll et al, 2007; Käll et al, 2004)
- PolyPhobius (Käll et al, 2005)
- SOSUI (Tokyo Univ. of Agriculture & Technology)
- SPOCTOPUS (Viklund et al, 2008)
predicts signal peptides and transmembrane helices as well as their topology.
- SVMtm (Yuan et al, 2003)
- TMHMM (Krogh et al, 2001)
- TMpred (Hofmann and Stoffel, 1993)
- TOPCONS (Bernsel et al, 2009)
is a web server for consensus prediction of membrane protein topology.
Archived
Beta-barrel outer membrane protein
predictors:
Active
Archived