MMTSB
Tool Set Documentation

bestcluster.pl

 
 
Latest revision as of 12:56, 30 July 2009

Usage

usage:   bestcluster.pl [options] tag
options: [-dir datadir]
         [-level num]
         [-ctag tag]
         [-prop tag[+tag...]]
         [-size]
         [-crit avg|avglow|avgcent|best|best#|median]
         [-limstd multiple]
         [-lowest]
         [-xlowest tags]


Description

This script scores ensemble clusters previously generated with enscluster.pl. As with the other ensemble utilities, a tag is required to identify the structure set, and the ensemble directory may be given with -dir.

By default, the final clusters at the bottom of the cluster hierarchy are scored. Alternatively, clusters at a specific level may be scored by giving the level with the -level option.

The total energy (etot) is the default property used for scoring. A different property may be chosen with -prop. Several methods are available for reducing the property values of the cluster members to a single score per cluster. The default is a simple average over all members. Other methods are selected with the -crit option followed by the corresponding keyword. With avglow and avgcent, cluster members whose property value lies too far from the mean, measured in standard deviations, are ignored when calculating the average. With avglow only values at the high end of the distribution are ignored; with avgcent extreme values on both sides of the distribution are omitted. avglow is particularly useful for energies when a small number of erroneously high values occur due to structural distortions that were not resolved during minimization. With either avglow or avgcent, the option -limstd may be used to change the exclusion limit, given in multiples of the standard deviation.
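The averaging criteria can be illustrated with a short Python sketch. This is not the script's own code: the function name, the use of the population standard deviation, the inclusive comparison and the placeholder limit of 1.0 are all assumptions made for illustration, and the actual default of -limstd is not stated here.

```python
from statistics import mean, pstdev


def cluster_average(values, crit="avg", limstd=1.0):
    """Return (score, number of members used) for one cluster.

    Hypothetical helper: limstd=1.0 is a placeholder, not the
    documented default of bestcluster.pl.
    """
    if crit not in ("avg", "avglow", "avgcent"):
        raise ValueError(f"unsupported criterion: {crit}")

    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    limit = limstd * sigma

    if crit == "avg" or sigma == 0:
        kept = list(values)  # simple average over all members
    elif crit == "avglow":
        # drop only values on the high end of the distribution
        kept = [v for v in values if v <= mu + limit]
    else:
        # avgcent: drop extreme values on both sides
        kept = [v for v in values if abs(v - mu) <= limit]

    return mean(kept), len(kept)


energies = [-100.0, 0.0, 1.0, -1.0, 0.0, 100.0]
for crit in ("avg", "avglow", "avgcent"):
    print(crit, cluster_average(energies, crit))
```

In this made-up data set avglow discards only the value 100.0, while avgcent discards both -100.0 and 100.0.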

With best, the best value is used as the score; with best<num> (written best# in the usage summary), the score is the average over the best num structures. Finally, with median the median value is taken as the cluster score.
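A minimal Python sketch of these three criteria follows. It assumes that "best" means the lowest value, as would be natural for energies; the function name and the keyword parsing are hypothetical and are not taken from the script.

```python
import re
from statistics import mean, median


def cluster_score(values, crit):
    """Score one cluster with 'best', 'best<num>' (e.g. 'best2') or 'median'."""
    if crit == "median":
        return median(values)

    match = re.fullmatch(r"best(\d*)", crit)
    if match is None:
        raise ValueError(f"unsupported criterion: {crit}")

    num = int(match.group(1) or 1)  # plain 'best' uses a single structure
    if num < 1:
        raise ValueError("best<num> needs num >= 1")

    # assumption: lower property values are better
    return mean(sorted(values)[:num])


rgyr = [3.0, 1.0, 2.0, 5.0]
print(cluster_score(rgyr, "best"))
print(cluster_score(rgyr, "best2"))
print(cluster_score(rgyr, "median"))
```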

In the output, the clusters are sorted by score (or by size if -size is given). Each line starts with the cluster name, followed by the total number of members, the number of members used in calculating the score, the score itself and, if applicable, the standard deviation and the statistical error of the score, which is based on the standard deviation and the number of values used in calculating the score. If the option -lowest is given, additional fields contain the energy and filename of the lowest energy conformation for each cluster.
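For post-processing, an output line in the default six-column form can be split into named fields. The Python sketch below uses field names chosen for illustration and does not handle the extra fields added by -lowest or -xlowest. The relation error = standard deviation / sqrt(members used) is an inference that agrees with the sample output on this page; it is not a documented formula.

```python
import math
from typing import NamedTuple


class ClusterScore(NamedTuple):
    cluster: str    # cluster name, e.g. t.2.1
    members: int    # total number of members
    used: int       # members used in calculating the score
    score: float
    stddev: float
    stderr: float   # statistical error of the score


def parse_line(line):
    """Parse one six-column line of bestcluster.pl output."""
    name, members, used, score, stddev, stderr = line.split()
    return ClusterScore(name, int(members), int(used),
                        float(score), float(stddev), float(stderr))


def expected_stderr(rec):
    # assumption: error = stddev / sqrt(number of values used)
    return rec.stddev / math.sqrt(rec.used)


rec = parse_line("t.2.1            5     5   -1870.1520   44.8227   20.0453")
print(rec)
print(round(expected_stderr(rec), 4))
```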

Options

-help 
usage information
-dir directory 
data directory
-level num 
specify clustering level for hierarchical clusters
-ctag alttag 
read clustering data from alttag.clusters
-prop tag[+tag...] 
specify which properties to use for sorting clusters
-size 
sort clusters by size
-crit avg|avglow|avgcent|best|best#|median 
criteria for ranking clusters (average, best score etc.)
-limstd multiple 
provide cutoff in terms of multiples of standard deviation for excluding data when averaging
-lowest 
show file name and score of lowest structure in each cluster
-xlowest tags 
show additional properties for lowest scoring structure

Examples

bestcluster.pl -dir data vacmin
scores final clusters for the vacmin ensemble structures according to the average total energy

t.2.1            5     5   -1870.1520   44.8227   20.0453
t.2.2            5     5   -1769.0720   88.6326   39.6377
t.1.1            5     5   -1741.1520   49.8687   22.3020
t.1.2            5     5   -1654.2400  152.9225   68.3890


bestcluster.pl -dir data -level 1 -crit best vacmin
scores first level clusters for the vacmin ensemble structures according to the best total energy

t.2             10     1   -1916.5800    0.0000    0.0000
t.1             10     1   -1791.0300    0.0000    0.0000


bestcluster.pl -dir data -crit avglow -limstd 1.2 vacmin
scores final clusters for the vacmin ensemble structures according to the average total energy, excluding structures with energies more than 1.2 standard deviations above the average

t.2.1            5     3   -1898.0267   16.0680    9.2769
t.1.1            5     3   -1775.0900    7.3533    4.2454
t.2.2            5     3   -1707.0833   32.4968   18.7620
t.1.2            5     3   -1694.6000   41.5083   23.9648


bestcluster.pl -prop rgyr -crit best2 -dir data vacmin
scores final clusters for the vacmin ensemble structures according to the average radius of gyration of the two structures in each cluster with the best radius of gyration

t.1.2            5     2       8.9657    0.1587    0.1122
t.2.2            5     2       9.1108    0.0507    0.0359
t.2.1            5     2       9.2168    0.3081    0.2178
t.1.1            5     2       9.2837    0.0013    0.0009