PFRMAT SS 
TARGET T0052 
AUTHOR 6844-7080-1795 
METHOD Shestopalov Boris V.      Russian Academy of Sciences 
METHOD Sanct-Peterburg 194064    Institute of Cytology 
METHOD DOUblet Code Method   -   DOUC Method 
METHOD INTRO-DOUC-TION. The method is manual. It takes only 
METHOD 30 minutes to perform a manual single-sequence 
METHOD pre-DOUC-tion for a 300-residue protein. The basis of 
METHOD the method was published in 1990 (Shestopalov B.V. 
METHOD Prediction of protein secondary structure by doublet 
METHOD code method. Mol. Biol., Moscow, Engl. transl., 24/4, 
METHOD p.900-907). The method has been modified for CASP3. 
METHOD DOUC-SCRIPTION. Coils, strands and helices consist 
METHOD of overlapping structurons, which consist of 2, 3 and 5 
METHOD residues respectively and are encoded by the residue pairs 
METHOD (i, i+1), (i, i+2), (i, i+4). Codon tables are obtained 
METHOD from analysis of residue pair occurrence in secondary 
METHOD structures. The codon distributions for a primary structure 
METHOD are written in three lines under the sequence. 
METHOD Codons of different structural types usually overlap 
METHOD in an amino acid sequence, so a choice between codons 
METHOD is necessary. The choice is to exclude the smallest 
METHOD number of codons until the overlap disappears. 
METHOD The resulting codon distributions are used for prediction. 
METHOD If several variant distributions are obtained, 
METHOD the prediction of some regions may be ambiguous, 
METHOD and such regions cannot be predicted at this stage. 
METHOD The described procedure predicts with 62-63% accuracy. 
METHOD The multiple-sequence version improves the accuracy 
METHOD and eliminates some ambiguous regions. 
METHOD DOUC-TAILS. The codons are classified as strong 
METHOD or weak. A residue pair is a strong (weak) codon 
METHOD if the probability of the respective structure 
METHOD for this pair is greater than (equal to) the combined 
METHOD probability of the two other structures. The probability 
METHOD is calculated from the occurrence of the pair 
METHOD in a secondary structure database using the reverse 
METHOD binomial distribution (2P-1=0.999). 
METHOD The codon choice is performed first among the strong 
METHOD codons. Then the weak codons are considered. 
METHOD The secondary structure database was first constructed 
METHOD from the primary and secondary structures 
METHOD of 257 proteins. The secondary structure of these 
METHOD proteins was then predicted with the code obtained 
METHOD from this database. A new database was constructed 
METHOD from the primary and secondary structures of the correctly 
METHOD predicted regions, and a new code was obtained from 
METHOD it. The new code was used to predict the secondary 
METHOD structure of new proteins, the correctly predicted 
METHOD regions were added to the database, a new code was 
METHOD obtained from the enlarged database, new proteins 
METHOD were predicted, a new database was constructed, and so on. 
METHOD The DOUC-CODONS used for the CASP3 target prediction 
METHOD are given in the REMARK lines after the METHOD lines. 
METHOD Five models are used. Models 1 and 2 (3 and 4) are 
METHOD multiple-sequence (single-sequence) predictions. Alignments 
METHOD are constructed by eye on the basis of BLAST results 
METHOD (identities; http://www.ncbi.nlm.nih.gov) and residue 
METHOD equivalence derived from DOUC. A list of the aligned 
METHOD sequences (NCBI indexes) is given in a REMARK before the 
METHOD MODEL line. Models 1 and 3 make no distinction 
METHOD between strong and weak codons. Model 5 is a special 
METHOD case described in a REMARK before the model. 
REMARK DOUC-CODONS-07.06.98. The database volume is 150000 
REMARK amino acid residues. COIL. Strong codons: AD AG AP CG 
REMARK CN CP DC DD DG DH DK DN DP DS DT DW ED EG EP ES FP GA 
REMARK GC GD GE GG GH GK GL GM GN GP GQ GR GS GT GW HD HG HN 
REMARK HP HS IP KD KG KN KP KS LP MG MP ND NG NH NK NN NP NR 
REMARK NS NT PA PC PD PE PF PG PH PK PL PM PN PP PQ PR PS PT 
REMARK PW PY QG QP RG RP SD SG SH SK SN SP SQ SR SS ST TD TG 
REMARK TK TN TP TS VP WD WG WP YP.  Weak codons: CD CH DQ DR 
REMARK EN HH HK KC MN MS NC NQ NW NY QD QN QS RN RS SC SW WC 
REMARK WN WS. STRAND. Strong codons: CI CV FC FF FI FL FT FV 
REMARK FW FY HV IC IF II IL IT IV IW IY LC LF LI LL LV LY MF 
REMARK MV TF TH TI TT TV TY VC VF VI VL VS VT VV VW VY WF WI 
REMARK WV WY YC YF YI YL YT YV YY.  Weak codons: CC CF CL CM 
REMARK CT CW CY FM HC HF HH HI HW HY IM LW MC MI MW MY VH VM 
REMARK WC WH WL WW YH YW. HELIX.  Strong codons: AA AE AK AL 
REMARK AM AQ AR EA EE EK EL EM EQ ER IL IM KA KE KQ LA LL LM 
REMARK LQ LR MA ME ML MR QA QD QE QK QQ QR RA RE RK RM RQ RR. 
REMARK Weak codons:   AH AW CM DR EH EW FL FM HE HH HM KM KR 
REMARK LI MH MI MK MM MQ MW QM RW WI WK WL WM WW YM. 
REMARK 127 COIL CODONS: 103 strong, 24 weak; 81 STRAND CODONS: 
REMARK 53/28; 68 HELIX CODONS: 40/28. 276 IN TOTAL: 196/80. 
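The codon placement described under DOUC-SCRIPTION can be sketched roughly as follows. This is only an illustration, not the author's manual procedure: the function name `codon_lines`, the `-` filler character and the tiny codon sets are assumptions made for the example (GP, VV and AE are strong codons taken from the table above). The sketch writes the three lines under a sequence and does not attempt the subsequent choice between overlapping codons.

```python
# Pair offsets from the method description: coil (i, i+1),
# strand (i, i+2), helix (i, i+4).
OFFSETS = {"C": 1, "E": 2, "H": 4}


def codon_lines(seq, codons):
    """Return one marker line per structure type, aligned under seq.

    codons maps "C", "E", "H" to a set of two-letter residue pairs.
    Hypothetical helper for illustration only.
    """
    lines = {}
    for ss, offset in OFFSETS.items():
        marks = ["-"] * len(seq)
        for i in range(len(seq) - offset):
            pair = seq[i] + seq[i + offset]
            if pair in codons[ss]:
                # A codon marks its whole structuron: residues i..i+offset,
                # i.e. 2, 3 or 5 residues for coil, strand, helix.
                for j in range(i, i + offset + 1):
                    marks[j] = ss
        lines[ss] = "".join(marks)
    return lines


# Toy input: a 5-residue fragment and one codon per structure type.
example = codon_lines("GPVIV", {"C": {"GP"}, "E": {"VV"}, "H": {"AE"}})
for ss in "CEH":
    print(example[ss])
```

For the toy fragment GPVIV, the coil line marks residues 1-2 (pair GP) and the strand line marks residues 3-5 (pair VV at positions 3 and 5); no helix codon is found.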
MODEL  4 
L C 1.00 
G C 1.00 
K C 1.00 
F C 1.00 
S C 1.00 
Q C 1.00 
T E 1.00 
C E 1.00 
Y E 1.00 
N C 1.00 
S C 1.00 
A C 1.00 
I C 1.00 
Q C 1.00 
G C 1.00 
S C 1.00 
V C 0.00 
L C 0.00 
T C 0.00 
S C 0.00 
T C 0.00 
C C 1.00 
E C 1.00 
R C 1.00 
T C 1.00 
N C 1.00 
G C 1.00 
G C 1.00 
Y C 1.00 
N C 1.00 
T C 1.00 
S C 1.00 
S C 1.00 
I E 1.00 
D E 1.00 
L E 1.00 
N C 1.00 
S C 1.00 
V C 1.00 
I C 1.00 
E C 1.00 
N C 1.00 
V C 1.00 
D C 1.00 
G C 1.00 
S C 1.00 
L E 1.00 
K E 1.00 
W E 1.00 
Q C 1.00 
P C 1.00 
S C 1.00 
N C 1.00 
F C 1.00 
I E 1.00 
E E 1.00 
T E 1.00 
C C 1.00 
R C 1.00 
N C 1.00 
T C 1.00 
N C 1.00 
L C 1.00 
A C 1.00 
G C 1.00 
S C 1.00 
S C 1.00 
E H 1.00 
L H 1.00 
A H 1.00 
A H 1.00 
E H 1.00 
C H 1.00 
K H 1.00 
T H 1.00 
R H 1.00 
A H 1.00 
Q H 1.00 
Q C 1.00 
F C 1.00 
V C 0.00 
S C 0.00 
T C 0.00 
K C 0.00 
I E 1.00 
N E 1.00 
L E 1.00 
D C 1.00 
D C 1.00 
H C 1.00 
I C 1.00 
A C 1.00 
N C 1.00 
I C 1.00 
D C 1.00 
G C 1.00 
T C 1.00 
L E 1.00 
K E 1.00 
Y E 1.00 
E C 1.00 
END 
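For readers who want to inspect the model records above, a minimal sketch of reading them is given here. It assumes each record between the MODEL and END lines has exactly three whitespace-separated fields (residue, predicted state, confidence), as in this file; the function names `parse_ss_records` and `segments` are hypothetical and not part of any CASP tool.

```python
def parse_ss_records(lines):
    """Collect (residue, state, confidence) tuples between MODEL and END."""
    records = []
    in_model = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MODEL":
            in_model = True
        elif fields[0] == "END":
            break
        elif in_model and len(fields) == 3:
            records.append((fields[0], fields[1], float(fields[2])))
    return records


def segments(records):
    """Run-length encode the states as (state, start, end), 1-based."""
    out = []
    for pos, (_, state, _) in enumerate(records, start=1):
        if out and out[-1][0] == state:
            # Same state as the previous residue: extend the current segment.
            out[-1] = (state, out[-1][1], pos)
        else:
            out.append((state, pos, pos))
    return out


# Toy input in the same layout as the records in this file.
sample = ["MODEL  4", "L C 1.00", "G C 1.00", "T E 1.00",
          "C E 1.00", "V C 0.00", "END"]
print(segments(parse_ss_records(sample)))
```

The segment list makes it easier to see where predicted strands (E) and helices (H) begin and end than the one-residue-per-line layout does.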
