MMTSB
Tool Set Documentation

enscluster.pl

Latest revision as of 14:37, 30 July 2009

Usage

usage:   enscluster.pl [options] tag
options: [-jclust] [-kclust]
         [-maxnum value] [-minsize value] [-maxlevel value]
         [-radius value] [-[no]iterate] [-maxerr value]
         [-mode rmsd|contact|phi|psi|phipsi|mix]
         [-contmaxdist value] [-mixfactor value]
         [-l min:max[=min:max ...]] [-fit min:max[=min:max] | -fitxl]
         [-selmode ca|cb|cab|heavy|all]
         [-[no]lsqfit]
         [-dir workdir]
         [-opt file[:file]]


Description

This script applies a clustering algorithm to ensemble structures. Its options and functionality are very similar to those of cluster.pl. The differences are that an ensemble tag is expected instead of a list of files, and that the output is stored in a file tag.cluster in the ensemble data directory. The clustering options are also stored in and read from the options file associated with the ensemble tag.

In addition to the parameters from cluster.pl, the parameter -dir specifies the ensemble directory. With -opt, options files other than the default one can be read in.

For fragment/loop modeling, the residue range may be specified as in cluster.pl. If a residue range has previously been stored in the ensemble configuration file, clustering is based only on the corresponding residue subset even if -l is not explicitly given. Fitting for RMSD-based clustering is always done on the protein template surrounding the selected residues.
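
The -l argument uses the form min:max[=min:max ...] shown in the usage line. As a minimal sketch, the Python helpers below build and parse such a string. The function names are hypothetical and not part of the MMTSB Tool Set, and the check that each range start does not exceed its end is an assumption made for illustration.

```python
from typing import List, Sequence, Tuple

ResidueRange = Tuple[int, int]


def format_residue_ranges(ranges: Sequence[ResidueRange]) -> str:
    """Join (first, last) pairs into the min:max[=min:max ...] form used by -l."""
    if not ranges:
        raise ValueError("at least one residue range is required")
    parts = []
    for first, last in ranges:
        # Assumption for this sketch: a range must not run backwards.
        if first > last:
            raise ValueError(f"range start {first} exceeds end {last}")
        parts.append(f"{first}:{last}")
    return "=".join(parts)


def parse_residue_ranges(text: str) -> List[ResidueRange]:
    """Split a min:max[=min:max ...] string back into (first, last) pairs."""
    ranges = []
    for part in text.split("="):
        first, last = part.split(":")
        ranges.append((int(first), int(last)))
    return ranges


print(format_residue_ranges([(10, 20), (35, 42)]))  # prints 10:20=35:42
```

The resulting string can be passed as the value of -l on the command line.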

The centroid output options are not supported in the ensemble clustering script.

Options

-help 
usage information
-jclust 
hierarchical clustering
-kclust 
K-means clustering
-maxnum value 
maximum number of clusters for hierarchical clustering
-minsize value 
minimum cluster size to generate subclusters in hierarchical clustering
-maxlevel value 
maximum levels for hierarchical clustering
-radius value 
define cluster radius for K-means clustering
-[no]iterate 
(do not) iterate during K-means clustering
-mode rmsd|contact|phi|psi|phipsi|mix 
measure for comparing structures during clustering
-contmaxdist value 
contact distance threshold if clustering based on contact map
-mixfactor value 
weight factor if clustering both on RMSD and contact map
-l min:max[=min:max ...] 
compare only specified residue range when clustering
-selmode ca|cb|cab|heavy|all 
atoms used for comparing structures during clustering
-[no]lsqfit 
(do not) superimpose structures before comparing
-dir workdir 
data directory
-opt file[:file] 
provide file with clustering options
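
When enscluster.pl is driven from another program, it can help to assemble the argument list in one place. The sketch below does this in Python using only flags listed above. The function name and its keyword parameters are hypothetical, and the function only builds the command; it does not run it.

```python
import shlex
from typing import List, Optional, Sequence


def build_enscluster_command(
    tag: str,
    *,
    method: Optional[str] = None,        # "jclust" or "kclust"; None passes no method flag
    maxnum: Optional[int] = None,
    minsize: Optional[int] = None,
    radius: Optional[float] = None,
    workdir: Optional[str] = None,
    opt_files: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return an argument list following 'enscluster.pl [options] tag'."""
    argv = ["enscluster.pl"]
    if method in ("jclust", "kclust"):
        argv.append(f"-{method}")
    elif method is not None:
        raise ValueError(f"unknown clustering method: {method}")
    for flag, value in (("-maxnum", maxnum), ("-minsize", minsize), ("-radius", radius)):
        if value is not None:
            argv += [flag, str(value)]
    if workdir is not None:
        argv += ["-dir", workdir]
    if opt_files:
        # -opt takes file[:file], so several files are joined with ':'.
        argv += ["-opt", ":".join(opt_files)]
    argv.append(tag)  # the ensemble tag comes last
    return argv


command = build_enscluster_command("sample", method="kclust", radius=5, workdir="data")
print(shlex.join(command))  # prints enscluster.pl -kclust -radius 5 -dir data sample
```

The returned list can be handed to subprocess.run on a system where the MMTSB Tool Set is installed and on the search path.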

Examples

enscluster.pl -maxnum 3 -minsize 10 -dir data sample
performs hierarchical clustering for the ensemble structures associated with the sample tag. The maximum number of clusters at each level is set to 3, and subclusters are recursively clustered again if they have 10 or more elements.


enscluster.pl -kclust -radius 5 -dir data sample
performs K-means clustering with a radius of 5 Å for the ensemble structures associated with the sample tag.
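
To illustrate how limits such as -maxnum, -minsize and -maxlevel can interact in a recursive scheme like the hierarchical example above, here is a toy Python sketch. It is not the algorithm used by enscluster.pl: the splitting step simply sorts numbers and cuts them into contiguous chunks as a stand-in for a real structural comparison, and the exact meaning given to each limit here is an assumption.

```python
from typing import Any, Dict, List, Sequence


def chunk_split(values: Sequence[float], maxnum: int) -> List[List[float]]:
    """Toy stand-in for a clustering step: at most maxnum contiguous chunks."""
    ordered = sorted(values)
    count = min(maxnum, len(ordered))
    if count == 0:
        return []
    size, extra = divmod(len(ordered), count)
    groups, start = [], 0
    for index in range(count):
        end = start + size + (1 if index < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


def subdivide(values: Sequence[float], maxnum: int, minsize: int,
              maxlevel: int, level: int = 0) -> Dict[str, Any]:
    """Build a cluster tree; a node is split again only if it is large enough
    (minsize) and the depth limit (maxlevel) has not been reached."""
    node: Dict[str, Any] = {"level": level, "members": list(values), "children": []}
    if level < maxlevel and len(values) >= minsize:
        for group in chunk_split(values, maxnum):
            node["children"].append(
                subdivide(group, maxnum, minsize, maxlevel, level + 1))
    return node


tree = subdivide(list(range(30)), maxnum=3, minsize=10, maxlevel=2)
print([len(child["members"]) for child in tree["children"]])  # prints [10, 10, 10]
```

With these settings, each of the three top-level groups has 10 members and is therefore split once more, while their smaller children are left as they are.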