PFRMAT AL 
TARGET T0075 
AUTHOR 5529-3140-9255 
REMARK Prediction team BENNER-COHEN has two group leaders, 
REMARK but we will consistently use Steven Benner's predictor 
REMARK number to avoid confusion concerning a second team 
REMARK connected with Fred Cohen (called Cohen, Fred). 
REMARK (Fred Cohen's number as a group leader is 6140-7890-6093). 
REMARK Prediction team members: D.L.Gerloff, G.Cannarozzi, 
REMARK M.Joachimiak, F.E.Cohen & S.A.Benner. 
REMARK 
REMARK This threading alignment prediction is accompanied by 
REMARK a separate secondary structure submission !! 
METHOD 
METHOD IMPORTANT REMARK: THIS STRUCTURE WAS MODELLED UNDER THE  
METHOD ASSUMPTION THAT THE STRUCTURE IS MONOMERIC. 
METHOD HOWEVER, AS THE C-TERMINAL PART HAS SOME SEQUENCE SIMILARITY TO THE 
METHOD HELIX-LOOP-HELIX PROTEINS (SETH & PAPAS, ONCOGENE (1990), 5:1761-1767)  
METHOD WE ARE SUBMITTING THE SUGGESTED THREADING ALIGNMENT WITH A MEMBER OF 
METHOD THE bHLH PROTEINS (1AN4, USF1) AS A SECOND ALTERNATIVE. IF THE TARGET 
METHOD PROTEIN OCCURRED AS A DIMER, ONE WOULD HAVE TO START FROM THE
METHOD HLH-FRAGMENT AND PACK A FRAGMENT SIMILAR TO A PART OF MODEL1:75-100
METHOD AGAINST THE SIDE NOT OCCUPIED BY DIMER-INTERACTIONS. DUE TO TIME 
METHOD PRESSURE, WE WILL NOT BE ABLE TO MODEL THIS ALTERNATIVE IN COORDINATE 
METHOD FORMAT. 
METHOD OUR FAVORITE MODEL IS NUMBER ONE, AND WE HAVE NO INDICATIONS FROM THE 
METHOD LITERATURE THAT T0075 IS A MANDATORY OLIGOMER. IN FACT, WE FOUND
METHOD SOME REMARKS IN THE LITERATURE THAT THE ETS1 PROTEIN (FULL PROTEIN)
METHOD APPEARS TO BE CAPABLE OF BINDING TO DNA AS A MONOMER. 
METHOD 
METHOD This model was assembled based on our secondary structure prediction 
METHOD which is submitted separately. Due to the absence of significant hits 
METHOD in the UCLA-DOE and ProCyon standard results, we attempted to assemble   
METHOD a core model ab initio, first, based on sequence analysis (Surf/Int/ 
METHOD Parse prediction) and three central assumptions specific to this problem. 
METHOD (A) The target is a monomer and forms a small core, starting
METHOD approximately at residue 54, or even 69. This conclusion is based on the lack
METHOD of sequence similarity throughout the protein family in the N-terminal 
METHOD segment (which is consistent with the PRODOM domain assignment in this 
METHOD case), and on the occurrence of a strongly charged region 58-67 which  
METHOD is reminiscent of the "basic" regions found in DNA-binding proteins. 
METHOD (Our secondary structure prediction includes the N-terminus, however.) 
METHOD (B) Consistent with what is found in small protein cores, we attribute 
METHOD the high conservation of hydrophobic residues W72, V77, W80, W83, L90, 
METHOD L105, L125, L129, and possibly F113 to core packing. This was used 
METHOD to assemble our model together with (C), the prediction of a functional 
METHOD site including the highly conserved C106, H128; possible other
METHOD candidates to complement this site are E111, H76, or E87.
METHOD (We suspect a zinc-binding site, either monomeric or functioning
METHOD as a composite site between monomeric subunits in an oligomer.)
METHOD 
METHOD After restricting the number of plausible conformations based on the 
METHOD assumptions listed above, we noted that the folding topology of a  
METHOD winged DNA-binding domain might be helpful as a scaffold in the 
METHOD modeling, as the orientation of the secondary structure elements 
METHOD around a hydrophobic core seemed to fit, approximately, the
METHOD constraints of one of our favorite topologies.
METHOD If we model onto this scaffold, residue H76 seems to be involved 
METHOD in the functional site (zinc?), while E87 is not -- note that  
METHOD the presence of these two residues at the functional site is
METHOD mutually exclusive due to the predicted alpha helix in between.
METHOD Accordingly, another plausible topology would have the buried 
METHOD helix V77-V85, and the loop/strand hairpin, rotated by 180 deg.  
METHOD and allow E87 to participate. We slightly prefer the first
METHOD topology because of packing. However, we cannot assign a role to
METHOD conserved position E87 -- this might indicate a problem in this 
METHOD prediction.  
METHOD 
METHOD Therefore, and because the search for plausible topologies was 
METHOD not exhaustive, we submit this model at LOW CONFIDENCE LEVEL. 
METHOD (Also, we cannot see any indication of similarity in functional
METHOD sites between our target structure, 2stw_A (DBD) and the upstream 
METHOD fragment, T0075. This further weakens our prediction.) In addition, 
METHOD we did not search actively to find the best possible scaffold.  
METHOD Interestingly, in this context, the homolog of 2stw_A, 1pue_E, was
METHOD listed among the UCLA-DOE FR-server hits below significance level,
METHOD as were a number of other members of the so-called winged DNA-
METHOD binding proteins.
METHOD 
METHOD In summary, it should be noted that we have insufficient grounds 
METHOD for predicting an evolutionary relationship between our scaffold 
METHOD ETS-domain and the T0075 fragment. However, the geometrical  
METHOD arrangement of the secondary structure elements seems plausible, 
METHOD while it is probably not the only plausible arrangement. 
METHOD 
REMARK  
REMARK WE HAVE ALSO GENERATED A COORDINATE MODEL FOR T0075 (WHICH IS MODEL1),
REMARK WHERE THE MISSING PARTS HAVE BEEN FILLED IN (AT VERY LOW RESOLUTION).
REMARK THE COORDINATE MODEL 1 WILL SUPERSEDE THIS ENTRY (AL-FORMAT MODEL1). 
REMARK  
MODEL  1 
PARENT 2stw_A 
T   75   Q   26 
H   76   L   27 
V   77   W   28 
R   78   Q   29 
D   79   F   30 
W   80   L   31 
V   81   L   32 
M   82   E   33 
W   83   L   34 
A   84   L   35 
V   85   T   36 
S   89   S   45 
L   90   W   46 
K   91   T   47 
D   94   D   49 
F   95   G   50 
Q   96   W   51 
K   97   E   52 
F   98   F   53 
C   99   K   54 
M   100   L   55 
G   102   P   58 
A   103   D   59 
A   104   E   60 
L   105   V   61 
C   106   A   62 
A   107   R   63 
L   108   R   64 
G   109   W   65 
D   123   E   77 
I   124   K   78 
L   125   L   79 
W   126   S   80 
E   127   R   81 
H   128   G   82 
L   129   L   83 
E   130   R   84 
I   131   Y   85 
L   132   Y   86 
Q   133   Y   87 
TER 
END 
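
The alignment rows between PARENT and TER can be read mechanically. The sketch below is only an illustration, not part of the submission: it assumes each row holds four whitespace-separated fields in the order target residue, target number, parent residue, parent number (an ordering inferred from the rows themselves, e.g. W 80 and C 106 match the T0075 residues named in the METHOD text). The names `AlignedPair`, `parse_al_rows`, `target_gaps`, and `identities` are hypothetical helpers, and the sample rows are a short excerpt of MODEL 1.

```python
from typing import List, NamedTuple, Tuple


class AlignedPair(NamedTuple):
    """One alignment row: a target residue paired with a parent residue."""
    target_aa: str
    target_pos: int
    parent_aa: str
    parent_pos: int


def parse_al_rows(text: str) -> List[AlignedPair]:
    """Collect rows that look like '<aa> <num> <aa> <num>'; skip everything else."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: one-letter code, integer, one-letter code, integer.
        if len(fields) != 4 or len(fields[0]) != 1 or len(fields[2]) != 1:
            continue
        try:
            pairs.append(
                AlignedPair(fields[0], int(fields[1]), fields[2], int(fields[3]))
            )
        except ValueError:
            continue  # a numbering field was not an integer
    return pairs


def target_gaps(pairs: List[AlignedPair]) -> List[Tuple[int, int]]:
    """Return (previous, next) target numbers wherever target residues are skipped."""
    gaps = []
    for prev, nxt in zip(pairs, pairs[1:]):
        if nxt.target_pos - prev.target_pos > 1:
            gaps.append((prev.target_pos, nxt.target_pos))
    return gaps


def identities(pairs: List[AlignedPair]) -> List[AlignedPair]:
    """Return the rows where target and parent carry the same residue type."""
    return [p for p in pairs if p.target_aa == p.parent_aa]


# Excerpt of MODEL 1 (target residues 85-95), copied from the rows above.
SAMPLE = """\
PARENT 2stw_A
V   85   T   36
S   89   S   45
L   90   W   46
K   91   T   47
D   94   D   49
F   95   G   50
TER
"""

pairs = parse_al_rows(SAMPLE)
print(len(pairs), "aligned pairs")
print("unaligned target stretches between:", target_gaps(pairs))
print("identical positions:", [(p.target_pos, p.target_aa) for p in identities(pairs)])
```

In this excerpt the unaligned stretches in the target numbering (85 to 89, 91 to 94) correspond to the parts that the REMARK above describes as filled in only in the coordinate model.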
