PFRMAT AL 
TARGET T0085 
AUTHOR 5827-4749-3439  
METHOD T0085 prediction method 
METHOD  
METHOD As usual, we employed a two-pronged approach: (1) We scored the target  
METHOD against all the HMMs and threading models in our libraries, and (2) we  
METHOD gathered homologs to the target using one or more tools 
METHOD (such as iterated FASTA or PsiBLAST), constructed an  
METHOD HMM for the target and homologs (using UCSC's HMM software), and  
METHOD scored PDB with that HMM. This yielded three sets of scores, which we then used  
METHOD to find a target-structure match. 
METHOD  
METHOD Since we didn't find sequence homologs using BLAST, 
METHOD we focused on analyzing scores given the sequence by the 
METHOD HMMs in our library. This analysis 
METHOD revealed three structures from the same SCOP  
METHOD superfamily (multi-heme cytochromes) 
METHOD to have the highest scores in our HMM library: 
METHOD d1wad, d2cy3, and d1czj. Of these three, d1wad and d2cy3 
METHOD had virtually identical scores for the general HMMs  
METHOD in our library (constructed to represent the family as a whole). 
METHOD While the scores were only moderately strong (NLL scores of 
METHOD -15.5 and -14.5 respectively), analysis of the scores given 
METHOD a representative subset of PDB by each of the HMMs showed 
METHOD the scores to be at least two standard deviations away 
METHOD from the mean scores. (In unpublished experiments on proteins  
METHOD with solved structures, we have found these z-scores  
METHOD to provide more information than the NLL-NULL 
METHOD score alone,  
METHOD and two standard deviations from the mean to be a conservative 
METHOD cutoff with high confidence of accuracy, especially when 
METHOD combined with NLL-NULL scores < -13 or so.)  
METHOD 
METHOD We chose d2cy3 over d1wad primarily because the sequence similarity 
METHOD of the target and structure was superior in variable regions of  
METHOD the multiple alignment of superfamily members. 
METHOD 
METHOD We considered, but rejected, a cytochrome fold with a different 
METHOD SCOP classification: single heme cytochromes, including 
METHOD 1ycc, 3c2c, and 1ctj. This group had high scores, although not as high 
METHOD as the multi-heme scores. Of this latter group, only the HMM for 1ycc  
METHOD and homologs gave the target scores as strong as those 
METHOD given by the multi-heme HMMs.  
METHOD These structures were rejected for two reasons: (1) our target 
METHOD contains multiple hemes, making it more naturally a member 
METHOD of the multiple heme SCOP class; (2) the alignments of the target 
METHOD to these structures were extremely gappy, albeit with some 
METHOD sequence similarities in aligned regions. 
METHOD 
METHOD  
METHOD The alignment submitted for the prediction was obtained as follows: 
METHOD (1) The target and structure homologs were aligned to the HMM 
METHOD constructed for d2cy3 and homologs. In constructing this HMM 
METHOD we used prrp from Gotoh to align the sequences, constructed an 
METHOD HMM from the alignment using modelfromalign, and then re-estimated 
METHOD the HMM using weighted sequences (total weights are derived 
METHOD from an estimate of the number of independent observations; 
METHOD the relative weighting scheme is from 
METHOD Henikoff & Henikoff). 
METHOD (2) The multiple alignment of the structure and homologs was 
METHOD examined to identify regions of high variability. We employed  
METHOD a log likelihood function to measure the similarity among 
METHOD subfamilies for physical-chemical constraints at each position. 
METHOD We also examined the alignment for 
METHOD gapped regions as an additional signal of essential variability. 
METHOD We excluded these high-variability regions. (The Bayesian formula 
METHOD was published in Sjolander's thesis, available by ftp from 
METHOD ftp.cse.ucsc.edu in pub/protein/phylogeny.) 
METHOD (3) The threading and HMM alignments were compared to 
METHOD find regions where the alignments agreed with shifts of 
METHOD less than or equal to 2 residues. 
METHOD (4) The alignment submitted contains those portions of the 
METHOD automatic HMM alignment that passed the criteria of (2) and (3). 
METHOD 
METHOD We have omitted the N-terminal 4 residues from the alignment 
METHOD (a conserved ADAP motif in both structure and target), 
METHOD even though they are aligned by HMM and threading methods, 
METHOD and the log likelihood function is positive in that region. 
METHOD Our reason for doing so is the expectation that this N-terminal 
METHOD region of the target must have a different structure than 
METHOD d2cy3 given the presence of a heme-binding CXXCH motif, 
METHOD and the otherwise variable primary sequence in that area. 
METHOD 
METHOD This produced an alignment that starts at d2cy3 V32 
METHOD and ends at d2cy3 K100, 
METHOD and omits four residues in a variable region in the middle.  
METHOD Almost 70 C-terminal residues of the target are left unaligned. 
METHOD 
METHOD Examining the alignment of the target to these structures 
METHOD revealed a different order of heme groups: the target inserts 
METHOD a heme group at the N-terminus, aligns its last three heme 
METHOD groups to the first three of these superfamily members, 
METHOD and has no heme group to align to the last C-terminal heme 
METHOD group of the structures. We also found that the C-terminal heme  
METHOD of the superfamily relatives of d2cy3 
METHOD has only weak similarity among superfamily members; some have 
METHOD CXXCH motifs, while others have CXXXXCH motifs.  
METHOD Interestingly, the conserved Histidine residues upstream of 
METHOD the CXXCH motifs that help coordinate the heme groups  
METHOD among superfamily members are not found in the target.  
METHOD 
METHOD 
METHOD The un-edited alignment is shown below: 
METHOD 
METHOD                      10        20        30        40        50 
METHOD                       |         |         |         |         | 
METHOD T0085   ADAPFEgrkkcsSCHKAQAQSWKDTAHAKAMESLKPNVKKEAKQKAKLDPAKDY-TQDKDCV 
METHOD d2cy3__ ADAPGD......DYVISAPEGMKAKPKGDKPGALQKT------VPFPHTKHATV-----ECV 
METHOD  
METHOD  
METHOD           60        70               80                    90       10 
METHOD            |         |                |                     | 
METHOD T0085   GCHVDGFGQKGGYTIEspkpmltGVGCESCHGPGRNFRGDhrksgqafeksgKKTPRKDLAK 
METHOD d2cy3__ QCHHTLEADG--GAVK.......KCTTSGCHDSLE-FRDK............ANAKDIKLV- 
METHOD  
METHOD  
METHOD         0       110       120 
METHOD         |         |         | 
METHOD T0085   KGQDFHFEERCSACHLNYEGSPWKGAKAPYtpftpevdakytfkfdemvkevkamhehykle 
METHOD d2cy3__ ---ENAFHTQCIDCHKALKKDKKP--TGPT................................ 
METHOD  
METHOD  
METHOD           130                   140 
METHOD             |                     | 
METHOD T0085   gvfeGEPKFKFHDEfqasakpakkgk. 
METHOD d2cy3__ ....AC--GKCHTT.............  
METHOD  
METHOD  
MODEL 1 
PARENT 2cy3 
Q    44    V    32 
K    45    P    33 
A    46    F    34 
K    47    P    35 
L    48    H    36 
D    49    T    37 
P    50    K    38 
A    51    H    39 
K    52    A    40 
D    53    T    41 
Y    54    V    42 
D    59    E    43 
C    60    C    44 
V    61    V    45 
G    62    Q    46 
C    63    C    47 
H    64    H    48 
V    65    H    49 
D    66    T    50 
G    67    L    51 
F    68    E    52 
G    69    A    53 
Q    70    D    54 
K    71    G    55 
G    85    K    60 
V    86    C    61 
G    87    T    62 
C    88    T    63 
E    89    S    64 
S    90    G    65 
C    91    C    66 
H    92    H    67 
G    93    D    68 
P    94    S    69 
G    95    L    70 
R    96    E    71 
F    98    F    72 
R    99    R    73 
G    100    D    74 
D    101    K    75 
K    114    A    76 
K    115    N    77 
T    116    A    78 
P    117    K    79 
R    118    D    80 
K    119    I    81 
D    120    K    82 
L    121    L    83 
A    122    V    84 
D    127    E    85 
F    128    N    86 
H    129    A    87 
F    130    F    88 
E    131    H    89 
E    132    T    90 
R    133    Q    91 
C    134    C    92 
S    135    I    93 
A    136    D    94 
C    137    C    95 
H    138    H    96 
L    139    K    97 
N    140    A    98 
Y    141    L    99 
E    142    K    100 
TER 
END 
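
The two numeric filters described in the METHOD remarks (the z-score plus NLL-NULL cutoff used to accept a fold, and the shift-of-at-most-2 agreement test between the HMM and threading alignments) can be illustrated with a minimal Python sketch. It is not the software used for this prediction. The function names, the dictionary representation of an alignment (target residue number mapped to structure residue number), and the sign convention (more negative NLL-NULL is better, so a strong hit lies at least two standard deviations below the background mean) are assumptions made here for illustration.

```python
from statistics import mean, pstdev


def z_score(target_score, background_scores):
    """Z-score of one NLL-NULL score against a background distribution.

    background_scores: scores the same HMM gives to a representative
    subset of PDB (hypothetical input; no real values are implied).
    """
    mu = mean(background_scores)
    sigma = pstdev(background_scores)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (target_score - mu) / sigma


def passes_cutoffs(target_score, background_scores,
                   z_cutoff=-2.0, nll_cutoff=-13.0):
    """True if the score clears both cutoffs named in the remarks.

    Assumption: lower (more negative) scores are better, so the hit must
    be at least two standard deviations BELOW the mean and below -13.
    """
    z = z_score(target_score, background_scores)
    return target_score < nll_cutoff and z <= z_cutoff


def agreeing_positions(hmm_aln, thread_aln, max_shift=2):
    """Keep HMM-aligned positions that the threading alignment supports.

    Both alignments map target residue number -> structure residue number.
    A position is kept when both methods align the target residue and the
    structure residues differ by at most max_shift.
    """
    kept = {}
    for target_pos, struct_pos in hmm_aln.items():
        other = thread_aln.get(target_pos)
        if other is not None and abs(struct_pos - other) <= max_shift:
            kept[target_pos] = struct_pos
    return kept
```

A call such as `agreeing_positions(hmm_aln, thread_aln)` returns the subset of the HMM alignment that survives the agreement test. The variability filter from step (2) would be applied separately, since its log likelihood function is not specified in enough detail here to sketch.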
