PFRMAT SS 
TARGET T0043 
AUTHOR 6844-7080-1795 
REMARK 27 June 1998 
METHOD Shestopalov Boris V.      Russian Academy of Sciences 
METHOD Sanct-Peterburg 194064    Institute of Cytology
METHOD DOUblet Code Method   -   DOUC Method 
METHOD INTRO-DOUC-TION. The method is manual. A manual
METHOD single-sequence pre-DOUC-tion for a 300-residue
METHOD protein takes only 30 minutes. The basis of
METHOD the method was published in 1990 (Shestopalov B.V.
METHOD Prediction of protein secondary structure by doublet
METHOD code method. Mol. Biol., Moscow, Engl. transl., 24/4,
METHOD p.900-907). The method has been modified for CASP3.
METHOD DOUC-SCRIPTION. Coils, strands and helices consist
METHOD of overlapping structurons of 2, 3 and 5 residues
METHOD respectively, which are encoded by the residue pairs
METHOD (i, i+1), (i, i+2) and (i, i+4). Codon tables are
METHOD obtained from analysis of residue pair occurrence in
METHOD secondary structures. The codon distributions in a
METHOD primary structure are written on three lines under it.
METHOD Codons of different structural types usually overlap
METHOD in an amino acid sequence, so a choice between codons
METHOD is necessary. The choice is to exclude the smallest
METHOD number of codons needed for the overlap to disappear.
METHOD The resulting codon distributions are used for prediction.
METHOD If several variants of the distribution are obtained,
METHOD the prediction of some regions may be ambiguous,
METHOD and such regions cannot be predicted at this stage.
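REMARK A minimal sketch of the codon search and overlap check
REMARK described above, written in Python. The codon sets are
REMARK small subsets of the strong codons in the DOUC-CODONS
REMARK tables; the function names and the "-" filler are
REMARK assumptions made for this sketch. It only reports the
REMARK overlapping positions and does not perform the manual
REMARK codon choice.

```python
# Hypothetical sketch: codon subsets copied from the DOUC-CODONS tables.
# Offsets follow the residue pairs (i, i+1), (i, i+2), (i, i+4).
OFFSETS = {"C": 1, "E": 2, "H": 4}  # coil, strand, helix
CODONS = {
    "C": {"GP", "PG", "SP", "NG", "DG"},
    "E": {"VV", "VI", "IV", "YV", "FV"},
    "H": {"AA", "AL", "EL", "LA", "KA"},
}


def find_codons(seq, kind):
    """Return start positions i where (seq[i], seq[i+offset]) is a codon."""
    off = OFFSETS[kind]
    return [i for i in range(len(seq) - off)
            if seq[i] + seq[i + off] in CODONS[kind]]


def codon_line(seq, kind):
    """Mark every residue covered by a structuron (i .. i+offset)."""
    off = OFFSETS[kind]
    line = ["-"] * len(seq)
    for i in find_codons(seq, kind):
        for j in range(i, i + off + 1):
            line[j] = kind
    return "".join(line)


def overlap_positions(seq):
    """Positions claimed by more than one structural type."""
    lines = [codon_line(seq, kind) for kind in "CEH"]
    return [i for i in range(len(seq))
            if sum(line[i] != "-" for line in lines) > 1]


if __name__ == "__main__":
    # First 30 residues of the target, read from the model lines.
    seq = "TVAYIAIGSNLASPLEQVNAALKALGDIPE"
    print(seq)
    for kind in "CEH":
        print(codon_line(seq, kind))
    print("overlaps:", overlap_positions(seq))
```

REMARK Running the script prints the sequence, one marker line
REMARK per structural type, and the positions where a choice
REMARK between codons would be needed.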
METHOD The average accuracy of this procedure, so-called
METHOD single sequence prediction (SSP), is limited to 63%
METHOD because only local interactions are considered.
METHOD If one uses similar sequences whose predicted secondary
METHOD structures are so similar that the same 3D structure
METHOD may be assumed on the basis of the data of B.Rost,
METHOD C.Sander, R.Schneider (JMB, 235/1, 13-26, 1994), the
METHOD average secondary structure may be closer to the
METHOD experimental one by up to 5-10%, and ambiguities are
METHOD excluded. This is the version of so-called multiple
METHOD sequence prediction (MSP) used here.
METHOD DOUC-TAILS. The codons are classified as strong
METHOD and weak. A residue pair is a strong (weak) codon
METHOD if the probability of the respective structure
METHOD for this pair is greater than (equal to) the total
METHOD probability of the two other structures. The probability
METHOD is calculated from the occurrence of the pair
METHOD in a secondary structure database using the reverse
METHOD binomial distribution (2P-1=0.999).
METHOD The codon choice is first made between strong
METHOD codons. Then weak codons are considered.
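REMARK A minimal sketch of the strong/weak classification,
REMARK under stated assumptions. The text does not give the
REMARK exact calculation, so a Wilson score interval at the
REMARK two-sided confidence 2P-1=0.999 is used here as a
REMARK stand-in for the reverse binomial distribution. A pair
REMARK is taken as strong when the whole interval lies above
REMARK 0.5 and as weak when the interval contains 0.5. Both
REMARK rules and all names are assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist

CONFIDENCE = 0.999                     # the 2P-1 value quoted in the text
P_ONE_SIDED = (1 + CONFIDENCE) / 2     # P = 0.9995
Z = NormalDist().inv_cdf(P_ONE_SIDED)  # normal quantile, about 3.29


def wilson_interval(k, n, z=Z):
    """Wilson score interval for a binomial proportion k/n (assumed stand-in)."""
    if n == 0:
        return 0.0, 1.0
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def classify(k, n):
    """k: occurrences of the pair in one structure type; n: in all three types."""
    if n == 0:
        return "none"
    low, high = wilson_interval(k, n)
    if low > 0.5:       # structure outweighs the two others taken together
        return "strong"
    if high >= 0.5:     # indistinguishable from 0.5 at this confidence
        return "weak"
    return "none"
```

REMARK Under these assumptions a rarely observed pair tends to
REMARK be weak for more than one structural type, because a
REMARK small count gives a wide interval.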
METHOD The secondary structure database was first constructed
METHOD from the primary and secondary structures
METHOD of 257 proteins. The secondary structure of these
METHOD proteins was then predicted with the code obtained
METHOD from this database. A new database was constructed
METHOD from the primary and secondary structures of the
METHOD correctly predicted regions, and a new code was
METHOD obtained from it. The new code was used to predict
METHOD the secondary structure of new proteins, the correctly
METHOD predicted regions were added to the database, and a new
METHOD code was obtained from the enlarged database. The cycle
METHOD was then repeated with further new proteins...
METHOD The DOUC-CODONS used for the CASP3 target prediction
METHOD are listed in the REMARK lines after the method.
REMARK At present the DOUBLET CODE is 95-97% complete.
REMARK Most of the weak codons with rare residues (W,
REMARK C, H, M) will probably become strong or disappear.
REMARK Therefore a version of the DOUC method without
REMARK differentiation between strong and weak codons
REMARK is also used.
METHOD Five models are used. Models 1, 2 and 5 are MSP;
METHOD models 3 and 4 are SSP. Secondary structures for the
METHOD MSP alignments are selected from those obtained by
METHOD pre-DOUC-tion for sequences selected, as a rule, by the
METHOD BLAST method /identities/ (http://www.ncbi.nlm.nih.gov),
METHOD Altschul et al., NAR v.25, p.3389-3402 (1997).
METHOD All structures are selected for which an alignment can
METHOD be constructed such that any pair of nearest structures
METHOD has a similarity of not less than 70% (models 1, 2) or
METHOD 61% (model 5). The LIST OF SEQUENCES USED FOR MSP
METHOD is placed before the model line.
METHOD Models 1 and 3 are made without strong/weak codon
METHOD differentiation. Models 1 and 2 (model 5) use a
METHOD similarity of not less than 70% (61%) between aligned
METHOD structures (see above).
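REMARK A minimal sketch of the similarity threshold used to
REMARK select structures for MSP. The text does not define the
REMARK similarity measure, so it is assumed here to be the
REMARK percentage of aligned positions with the same state,
REMARK skipping gaps written as "-". The chained selection
REMARK (a structure is kept once it reaches the threshold
REMARK against any structure already kept) is one reading of
REMARK "any pair of nearest structures". All names are
REMARK hypothetical.

```python
def ss_similarity(a, b):
    """Percent of aligned, ungapped positions with the same state."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def select_chain(target, candidates, threshold=70.0):
    """Keep candidates that reach the threshold against any kept structure."""
    kept = [target]
    pool = list(candidates)
    grew = True
    while grew:                      # repeat until no candidate can be added
        grew = False
        for cand in pool[:]:
            if any(ss_similarity(cand, k) >= threshold for k in kept):
                kept.append(cand)
                pool.remove(cand)
                grew = True
    return kept


if __name__ == "__main__":
    target = "CCCCCCCCCC"
    others = ["CCCCCCCEEE", "CCCCEEEEEE", "HHHHHHHHHH"]
    print(select_chain(target, others, threshold=70.0))
```

REMARK In the example the second candidate is only 40% similar
REMARK to the target but 70% similar to the first candidate,
REMARK so it is kept through the chain. Passing threshold=61.0
REMARK would correspond to the limit quoted for model 5.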
REMARK DOUC-CODONS-07.06.98. The database contains 150000
REMARK amino acid residues. COIL. Strong codons: AD AG AP CG
REMARK CN CP DC DD DG DH DK DN DP DS DT DW ED EG EP ES FP GA 
REMARK GC GD GE GG GH GK GL GM GN GP GQ GR GS GT GW HD HG HN  
REMARK HP HS IP KD KG KN KP KS LP MG MP ND NG NH NK NN NP NR 
REMARK NS NT PA PC PD PE PF PG PH PK PL PM PN PP PQ PR PS PT 
REMARK PW PY QG QP RG RP SD SG SH SK SN SP SQ SR SS ST TD TG 
REMARK TK TN TP TS VP WD WG WP YP.  Weak codons: CD CH DQ DR 
REMARK EN HH HK KC MN MS NC NQ NW NY QD QN QS RN RS SC SW WC 
REMARK WN WS. STRAND. Strong codons: CI CV FC FF FI FL FT FV 
REMARK FW FY HV IC IF II IL IT IV IW IY LC LF LI LL LV LY MF 
REMARK MV TF TH TI TT TV TY VC VF VI VL VS VT VV VW VY WF WI  
REMARK WV WY YC YF YI YL YT YV YY.  Weak codons: CC CF CL CM 
REMARK CT CW CY FM HC HF HH HI HW HY IM LW MC MI MW MY VH VM 
REMARK WC WH WL WW YH YW. HELIX.  Strong codons: AA AE AK AL 
REMARK AM AQ AR EA EE EK EL EM EQ ER IL IM KA KE KQ LA LL LM 
REMARK LQ LR MA ME ML MR QA QD QE QK QQ QR RA RE RK RM RQ RR.
REMARK Weak codons:   AH AW CM DR EH EW FL FM HE HH HM KM KR 
REMARK LI MH MI MK MM MQ MW QM RW WI WK WL WM WW YM. 
REMARK 127 COIL CODONS: 103 strong, 24 weak; 81 STRAND CODONS: 
REMARK 53/28; 68 HELIX CODONS: 40/28. 276 IN TOTAL: 196/80.
REMARK LIST OF SEQUENCES USED FOR MSP: T0043, ncbi indexes gi 
REMARK 1170357, 2160683, 98370, 119838, 2982888, 1651824, 
REMARK 123494, 1706767, 2113970, 3097221, 3242316, 1934972, 
REMARK 2314180. 
MODEL  5 
T C 1.00 
V E 1.00 
A E 1.00 
Y E 1.00 
I E 1.00 
A E 1.00 
I E 1.00 
G C 1.00 
S C 1.00 
N C 1.00 
L C 1.00 
A C 1.00 
S C 1.00 
P C 1.00 
L H 1.00 
E H 1.00 
Q H 1.00 
V H 1.00 
N H 1.00 
A H 1.00 
A H 1.00 
L H 1.00 
K H 1.00 
A H 1.00 
L H 1.00 
G H 1.00 
D H 1.00 
I H 1.00 
P C 1.00 
E C 1.00 
S E 1.00 
H E 1.00 
I E 1.00 
L C 1.00 
T C 1.00 
V C 1.00 
S C 1.00 
S C 1.00 
F C 1.00 
Y C 1.00 
R C 1.00 
T C 1.00 
P C 1.00 
P C 1.00 
L C 1.00 
G C 1.00 
P C 1.00 
Q C 1.00 
D C 1.00 
Q C 1.00 
P C 1.00 
D C 1.00 
Y C 1.00 
L C 1.00 
N C 1.00 
A C 1.00 
A C 1.00 
V E 1.00 
A E 1.00 
L E 1.00 
E C 1.00 
T C 1.00 
S C 1.00 
L C 1.00 
A C 1.00 
P C 1.00 
E C 1.00 
E H 1.00 
L H 1.00 
L H 1.00 
N H 1.00 
H H 1.00 
T H 1.00 
Q H 1.00 
R H 1.00 
I H 1.00 
E H 1.00 
L H 1.00 
Q H 1.00 
Q H 1.00 
G H 1.00 
R H 1.00 
V H 1.00 
R H 1.00 
K C 1.00 
A C 1.00 
E C 1.00 
R C 1.00 
W C 1.00 
G C 1.00 
P C 1.00 
R C 1.00 
T C 1.00 
L E 1.00 
D E 1.00 
L E 1.00 
D E 1.00 
I E 1.00 
M E 1.00 
L E 1.00 
F E 1.00 
G C 1.00 
N C 1.00 
E C 1.00 
V C 1.00 
I C 1.00 
N C 1.00 
T C 1.00 
E C 1.00 
R C 1.00 
L C 1.00 
T C 1.00 
V C 1.00 
P C 1.00 
H C 1.00 
Y C 1.00 
D C 1.00 
M C 1.00 
K C 1.00 
N C 1.00 
R C 1.00 
G C 1.00 
F E 1.00 
M E 1.00 
L E 1.00 
W C 1.00 
P C 1.00 
L C 1.00 
F E 1.00 
E E 1.00 
I E 1.00 
A C 1.00 
P C 1.00 
E C 1.00 
L C 1.00 
V C 1.00 
F C 1.00 
P C 1.00 
D C 1.00 
G C 1.00 
E C 1.00 
M C 1.00 
L C 1.00 
R C 1.00 
Q H 1.00 
I H 1.00 
L H 1.00 
H H 1.00 
T H 1.00 
R C 1.00 
A C 1.00 
F C 1.00 
D C 1.00 
K C 1.00 
L C 1.00 
N C 1.00 
K C 1.00 
W C 1.00 
END 
