PFRMAT AL 
TARGET T0043 
AUTHOR 3873-9906-1225 
REMARK Submission 2 
REMARK Work by Gidon Moont (1), Lawrence Kelley (1),
REMARK Bob MacCallum (1), Marcel Turcotte (1), Mansoor Saqi (2)
REMARK and Michael Sternberg (1) (m.sternberg@icrf.icnet.uk) 
REMARK (1) Biomolecular Modelling Laboratory, 
REMARK Imperial Cancer Research Fund 
REMARK     Lincoln's Inn Fields, London WC2A 3PX, UK
REMARK (2) Bioinformatics Group, GlaxoWellcome, Stevenage, UK 
METHOD 
METHOD Method outline 
METHOD --------------- 
METHOD unknown = target, library of known folds = template 
METHOD (0) Initial check for remote homology of target 
METHOD to templates of known structures using PSI-BLAST 
METHOD (1) Secondary structure & sequence of target matched
METHOD against fold template library using FOLDFIT
METHOD (2) Multiple sequence matching against fold template 
METHOD library scored by novel metric (PSSM) 
METHOD (3) Search against Hidden Markov Models for fold template 
METHOD library using SAM 
METHOD (4) Local hydrophobicity and predicted secondary structure 
METHOD matched for target and template using SIVA (MacCallum & 
METHOD Thornton) 
METHOD (5) Filter top hits from above against topological rules
METHOD for folds derived by an artificial-intelligence machine
METHOD learning approach (PROGOL; Turcotte, Muggleton &
METHOD Sternberg)
METHOD (6) Evaluation of above results in terms of literature and 
METHOD function of target. 
METHOD 
METHOD General features of approach 
METHOD ----------------------------- 
METHOD 
METHOD (i) The fold (template) library consists of non-redundant 
METHOD SCOP domains with <40% sequence identity per family (called 
METHOD SCOP40). 
METHOD 
METHOD (ii) Secondary structure prediction from multiple alignment
METHOD (homologues gathered with PSI-BLAST) using DSC (King &
METHOD Sternberg), PHD (Rost & Sander) and JPRED (Barton).
METHOD 
METHOD Method details 
METHOD -------------- 
METHOD 
METHOD (1) FOLDFIT (Russell,R.B., Saqi, M.A.S., Bates,P.A., 
METHOD Sayle,R.A.  & Sternberg, M.J.E. (1998). Prot Eng 11, 1-9.) 
METHOD The target is represented by its sequence and predicted
METHOD secondary structure and scanned against the known secondary
METHOD structure and sequence of each template in the fold library.
METHOD Different weights for secondary structure and sequence are 
METHOD used to obtain different possible top hits. 
METHOD 
METHOD (2) Sequence profiles generated by PSI-BLAST for the target
METHOD and the templates are matched by a sum of squares metric
METHOD (PSSM, Kelley, MacCallum, Saqi & Sternberg, unpublished).
METHOD (NB This method is being extended to obtain 3D-structure-
METHOD  and sequence-based profiles for the templates.)
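METHOD
METHOD Illustrative sketch only, not the unpublished PSSM code.
METHOD It assumes that "sum of squares" means the sum of squared
METHOD differences between aligned profile columns, and that the
METHOD two profiles are already aligned and of equal length.  The
METHOD names column_distance and profile_distance are hypothetical.

```python
from typing import Sequence


def column_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of squared differences between two profile columns.

    Assumption: each column holds one score per residue type,
    in the same residue order for both profiles.
    """
    if len(p) != len(q):
        raise ValueError("profile columns must have the same length")
    return sum((a - b) ** 2 for a, b in zip(p, q))


def profile_distance(
    target: Sequence[Sequence[float]],
    template: Sequence[Sequence[float]],
) -> float:
    """Total column distance over two pre-aligned, gap-free profiles.

    Lower values mean more similar profiles. How columns are paired
    (the alignment step) is outside this sketch.
    """
    if len(target) != len(template):
        raise ValueError("profiles must have the same number of columns")
    return sum(column_distance(p, q) for p, q in zip(target, template))
```

METHOD In this sketch identical profiles score 0 and the score
METHOD grows as the column scores diverge.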
METHOD 
METHOD (3) The target is searched, using SAM (Hughey & Krogh),
METHOD against a library of HMMs generated from each
METHOD template in SCOP40 (Moont, MacCallum & Sternberg).
METHOD 
METHOD (4) Vector-based alignment of per-residue hydrophobicity 
METHOD and DSC predicted secondary structure probabilities for 
METHOD both target and template.  This approach could also
METHOD be used in the absence of known structures for library
METHOD sequences.  The algorithm is SIVA (MacCallum & Thornton,
METHOD unpublished).
METHOD 
METHOD (5) Using an artificial-intelligence-based machine learning
METHOD algorithm (PROGOL, Muggleton et al), we have obtained
METHOD expert-system-type rules governing protein folds (Turcotte,
METHOD Muggleton & Sternberg).  These rules include data on
METHOD patterns and types of secondary structures including 
METHOD length, loop length and hydrophobicity.  Top hits from all 
METHOD the above methods were screened against rules for the folds 
METHOD to assess their likelihood. 
METHOD 
METHOD (6) Visual inspection of results. 
METHOD 
METHOD Details for T0043 
METHOD -----------------
METHOD 
METHOD Runs from FOLDFIT, SIVA and the PSSMs consistently ranked
METHOD flavodoxin high in the list.  Visual inspection of the
METHOD sequence confirmed patterns of hydrophobic residues typical
METHOD of an alpha/beta fold.  Starting from an initial FOLDFIT
METHOD alignment, a manual alignment was constructed and checked
METHOD by building a 3D model with MAXSPROUT.
METHOD 
METHOD 
MODEL  1 
PARENT 3fx2 
T    1 A 2 
V    2 K 3 
A    3 A 4 
Y    4 L 5 
I    5 I 6 
A    6 V 7 
I    7 Y 8 
G    8 G 9 
S    9 S 10 
N   10 T 11 
L   11 T 12 
A   12 G 13 
S   13 N 14 
P   14 T 15 
L   15 E 16 
E   16 Y 17 
Q   17 T 18 
V   18 A 19 
N   19 E 20 
A   20 T 21 
A   21 I 22 
L   22 A 23 
K   23 R 24 
A   24 E 25 
L   25 L 26 
G   26 A 27 
D   27 D 28 
I   28 A 29 
E   30 G 30 
S   31 Y 31 
H   32 E 32 
I   33 V 33 
L   34 D 34 
T   35 S 35 
V   36 R 36 
S   37 D 37 
S   38 A 38 
F   39 A 39 
Y   40 S 40 
R   41 V 41 
T   42 E 42 
P   43 A 43 
Q   48 G 44 
D   49 G 45 
Q   50 L 46 
P   51 F 47 
N   55 L 52 
A   56 V 53 
A   57 L 54 
V   58 L 55 
A   59 G 56 
L   60 C 57 
E   61 S 58 
T   62 T 59 
S   63 W 60 
L   64 G 61 
A   65 D 62 
P   66 D 63 
E   67 S 64 
E   68 I 65 
L   69 E 66 
L   70 L 67 
N   71 Q 68 
H   72 D 69 
T   73 D 70 
Q   74 F 71 
R   75 I 72 
I   76 P 73 
E   77 L 74 
L   78 F 75 
Q   79 D 76 
Q   80 S 77 
G   81 L 78 
R   82 E 79 
V   83 E 80 
P   91 T 81 
R   92 G 82 
T   93 A 83 
L   94 Q 84 
D   95 G 85 
L   96 R 86 
D   97 K 87 
I   98 V 88 
M   99 A 89 
L  100 C 90 
F  101 F 91 
G  102 G 92 
N  103 C 93 
E  104 G 94 
V  105 D 95 
I  106 S 96 
H  115 S 97 
Y  116 Y 98 
D  117 E 99 
M  118 Y 100 
K  119 F 101 
N  120 C 102 
R  121 G 103 
G  122 A 104 
F  123 V 105 
M  124 D 106 
L  125 A 107 
W  126 I 108 
P  127 E 109 
L  128 E 110 
F  129 K 111 
E  130 L 112 
I  131 K 113 
L  135 I 119 
V  136 V 120 
F  137 Q 121 
P  138 D 122 
D  139 G 123 
G  140 L 124 
E  141 R 125 
M  142 I 126 
L  143 D 127 
R  144 G 128 
Q  145 D 129 
I  146 P 130 
L  147 R 131 
H  148 A 132 
T  149 A 133 
R  150 R 134 
A  151 D 135 
F  152 D 136 
D  153 I 137 
K  154 V 138 
L  155 G 139 
N  156 W 140 
K  157 A 141 
W  158 H 142 
TER 
END 
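
The alignment records above can be read programmatically. The following is a minimal sketch, assuming each alignment line between MODEL and TER holds four whitespace-separated fields: target residue, target residue number, parent residue, parent residue number. It also assumes plain integer residue numbers (no insertion codes), which holds for this file but not for every parent structure. The names AlignedPair, parse_al_pairs and find_numbering_jumps are hypothetical and used only for illustration.

```python
from typing import List, NamedTuple, Tuple


class AlignedPair(NamedTuple):
    target_aa: str   # one-letter code of the target residue
    target_pos: int  # residue number in the target sequence
    parent_aa: str   # one-letter code of the parent (template) residue
    parent_pos: int  # residue number in the parent structure


def parse_al_pairs(text: str) -> List[AlignedPair]:
    """Collect aligned residue pairs found between MODEL and TER/END."""
    pairs: List[AlignedPair] = []
    in_model = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MODEL":
            in_model = True
            continue
        if fields[0] in ("TER", "END"):
            in_model = False
            continue
        # Alignment lines have exactly four fields with single-letter
        # residue codes; PARENT and other keyword lines are skipped.
        if (in_model and len(fields) == 4
                and len(fields[0]) == 1 and len(fields[2]) == 1):
            pairs.append(AlignedPair(fields[0], int(fields[1]),
                                     fields[2], int(fields[3])))
    return pairs


def find_numbering_jumps(
    pairs: List[AlignedPair],
) -> List[Tuple[AlignedPair, AlignedPair]]:
    """Return consecutive pairs where either numbering skips residues."""
    jumps = []
    for prev, cur in zip(pairs, pairs[1:]):
        if (cur.target_pos - prev.target_pos != 1
                or cur.parent_pos - prev.parent_pos != 1):
            jumps.append((prev, cur))
    return jumps
```

Calling find_numbering_jumps(parse_al_pairs(file_text)) lists the places where the alignment skips residues in the target or the parent, which is a quick way to locate the insertions and deletions implied by the model.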
