genseq.pl
Usage
usage:   genseq.pl [options] [-pdb | -monsster | -one] [file]
options: [-out [one][sec] | monsster] [-sel from:to]
         [-s inx:sequence[=inx:sequence]]
         [-2ndpred file[:file...]] [-2ndone file]
         [-dssp] [-dsspfull] [-dsspnum] [-fill]
Description
The main purpose of this script is to generate a MONSSTER amino acid
sequence file from a PDB file or sequence string, but it can also
convert between sequence formats and add secondary structure information.
The input format is selected with the -pdb (PDB file, default),
-one (string consisting of one-letter amino acid codes), or
-monsster (MONSSTER sequence file) options. If a file name is
specified, input is read from that file; otherwise it is read from standard input.
The output format is either a MONSSTER sequence file (default) or
a string of one-letter amino acid codes. It is selected with
-out monsster or -out one, respectively. With
-out onesec, secondary structure abbreviations are printed
along with the sequence. With these
options a sequence file can be generated from a PDB file, and
an abbreviated sequence string can be obtained from a sequence file,
as shown in the examples below.
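The conversion from a MONSSTER sequence file to an abbreviated string can be sketched in a few lines of Python. This is only an illustration, not the script's implementation: it assumes one whitespace-separated record per line in the order index, residue name, secondary structure code, and a trailing flag, as in the example output further down. The function name monsster_to_one and the use of X for unrecognized residue names are hypothetical choices.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def monsster_to_one(text):
    """Return the one-letter sequence for MONSSTER-style records.

    Assumed layout (not confirmed by the documentation): one record per
    line, whitespace-separated, with the residue name in the second column.
    """
    letters = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # 'X' for unknown residue names is an assumption of this sketch.
        letters.append(THREE_TO_ONE.get(fields[1].upper(), "X"))
    return "".join(letters)


records = """\
1 MET 1 1
2 LEU 1 1
3 SER 1 1
4 ASP 1 1
5 GLU 1 1
"""
print(monsster_to_one(records))  # MLSDE
```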
If the PDB file is lacking part of the complete structure, as in typical
loop modeling applications, the sequence for the missing parts can be
given through the option -s. This option takes a colon-separated argument
consisting of the residue index at which the missing segment starts and an
abbreviated sequence string. Several such segments can be joined with =.
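To make the argument format concrete, the following Python sketch parses an inx:sequence[=inx:sequence] string and merges the segments into a partial sequence. It is a hypothetical illustration, not the script's own code: the function names are invented here, and the choice to fill only indices that are absent from the PDB-derived sequence (rather than overwrite existing residues) is an assumption.

```python
def parse_fill_argument(arg):
    """Split 'inx:sequence[=inx:sequence]' into (start_index, sequence) pairs."""
    segments = []
    for part in arg.split("="):
        index_text, sequence = part.split(":", 1)
        segments.append((int(index_text), sequence))
    return segments


def merge_sequence(known, segments):
    """Fill gaps in a partial sequence.

    known:    dict mapping residue index -> one-letter code (from the PDB file)
    segments: output of parse_fill_argument
    Assumption: residues already present in 'known' are left unchanged.
    """
    merged = dict(known)
    for start, sequence in segments:
        for offset, letter in enumerate(sequence):
            merged.setdefault(start + offset, letter)
    return "".join(merged[i] for i in sorted(merged))


# Residues 3 and 4 are missing from the structure and supplied as '3:SD'.
partial = {1: "M", 2: "L", 5: "E", 6: "D"}
print(merge_sequence(partial, parse_fill_argument("3:SD")))  # MLSDED
```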
Further options are available to include secondary structure information
in the sequence file. If -dssp is specified the secondary structure
is taken from DSSP output. This requires the availability of a compiled
version of DSSP in the
MMTSB binary directory. With the option
-2ndpred a colon-separated list of files containing output from common automated
secondary structure prediction programs can be provided. If more than one file
is given, a consensus prediction is determined from all predictions.
Currently prediction outputs from the following programs are recognized:
PHD, SSpro, PSIpred, Jpred2, PSSP, PROF, Pred2ary.
Finally it is possible to set the secondary structure from a file containing
an abbreviated string with the option -2ndone.
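The documentation does not say how the consensus is determined. As one plausible reading, the Python sketch below applies a strict per-residue majority vote over equal-length prediction strings and falls back to U (unknown) when no state has a majority. The voting rule, the function name consensus, and the letter E for strand are assumptions of this sketch, not confirmed behavior of genseq.pl.

```python
from collections import Counter


def consensus(predictions, unknown="U"):
    """Per-position strict majority vote over secondary structure strings.

    Assumption: all predictions are already aligned, have equal length,
    and use the same one-letter alphabet (e.g. H, E, U).
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("predictions must have equal length")
    result = []
    for column in zip(*predictions):
        state, count = Counter(column).most_common(1)[0]
        # Accept a state only if more than half of the predictors agree.
        result.append(state if count * 2 > len(column) else unknown)
    return "".join(result)


print(consensus(["HHHEU", "HHUEU", "HUUEE"]))  # HHUEU
```

With only two predictions this rule keeps a state only where both agree, which is a deliberately conservative choice.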
Options
- -help
- usage information
- -pdb
- read input in PDB format
- -monsster
- read input in MONSSTER sequence format
- -one
- read input as one-letter amino acid abbreviations
- -out one|onesec|sec|monsster
- specify output format
- -sel from:to
- output only limited residue range
- -s inx:sequence[=...]
- provide sequence at specific index (useful for filling in missing sequence from input)
- -2ndpred file[:file...]
- read secondary structure predictions from files
- -2ndone file
- read one-letter secondary structure code from file
- -dssp
- determine secondary structure with DSSP program
Examples
genseq.pl 1vii.exp.pdb
generates a MONSSTER sequence file from a PDB file
1 MET 1 1
2 LEU 1 1
3 SER 1 1
4 ASP 1 1
5 GLU 1 1
6 ASP 1 1
7 PHE 1 1
8 LYS 1 1
9 ALA 1 1
10 VAL 1 1
...
genseq.pl -out one -monsster 1vii.seq
generates an abbreviated sequence string from a MONSSTER sequence file
MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF
genseq.pl -dssp 1vii.exp.pdb
generates a MONSSTER sequence file with secondary structure identification
calculated via dssp (alpha: 2, beta: 4, coil/unknown: 1)
1 MET 1 1
2 LEU 1 1
3 SER 1 1
4 ASP 2 1
5 GLU 2 1
6 ASP 2 1
7 PHE 2 1
8 LYS 2 1
9 ALA 1 1
10 VAL 1 1
...
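The code mapping stated above (alpha: 2, beta: 4, coil/unknown: 1) can be illustrated with a short Python sketch. It assumes that only the DSSP state H is treated as alpha and only E as beta; how genseq.pl classifies the other DSSP states is not documented here. The trailing 1 in each record is copied from the example output, and the function names are hypothetical.

```python
def dssp_to_monsster(state):
    """Map a one-letter DSSP state to a MONSSTER secondary structure code."""
    # Assumption: only 'H' counts as alpha and only 'E' as beta;
    # every other state (turns, bends, blanks, ...) is coil/unknown.
    if state == "H":
        return 2
    if state == "E":
        return 4
    return 1


def monsster_records(residues, dssp_states, first_index=1):
    """Format records as 'index resname code 1', as in the example output."""
    lines = []
    pairs = zip(residues, dssp_states)
    for index, (name, state) in enumerate(pairs, start=first_index):
        lines.append(f"{index} {name} {dssp_to_monsster(state)} 1")
    return lines


for record in monsster_records(["SER", "ASP", "GLU"], ["T", "H", "H"], first_index=3):
    print(record)
# 3 SER 1 1
# 4 ASP 2 1
# 5 GLU 2 1
```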
genseq.pl -out onesec -2ndpred phd.out:jpred2.out 1vii.exp.pdb
generates abbreviated sequence information from a PDB file, with secondary structure
information taken from PHD and Jpred2 secondary structure prediction server output.
MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF
UUUUHHHHHHHHHHHHHHHHHHHHHHHHHHHHUUUU
genseq.pl -s 10:VFGMTRSAFANL 1vii.exp.x10:21.pdb
generates a MONSSTER sequence file from the given PDB file. The sequence
for the missing residues starting at index 10 is inserted from the
sequence string.
1 MET 1 1
2 LEU 1 1
3 SER 1 1
4 ASP 1 1
5 GLU 1 1
6 ASP 1 1
7 PHE 1 1
8 LYS 1 1
9 ALA 1 1
10 VAL 1 1
...