
PFRMAT DR
TARGET T0090
AUTHOR 1438-8319-9551
REMARK Predictions are for lack of fixed tertiary structure, 
REMARK that is, for "natively unfolded" or "intrinsically 
REMARK unstructured" segments. 
REMARK O=ordered, structured, folded
REMARK D=disordered, unstructured, unfolded
REMARK The last column is NOT a probability per se.  PONDR is a neural
REMARK network predictor that produces a score, and a residue is
REMARK predicted to be disordered when its score is greater than 0.5.
REMARK Studies have shown that neural network scores are roughly
REMARK equivalent to probabilities. Short runs of residues predicted
REMARK to be disordered are more likely to arise by chance than long
REMARK runs of predicted disorder.
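The cutoff rule can be stated as code. This is a minimal sketch that assumes each residue record under MODEL 1 has the form `residue label score`. `parse_residue_lines` and `label_for` are hypothetical helper names and are not part of PONDR.

```python
def parse_residue_lines(lines):
    """Collect (residue, label, score) triples from 'AA LABEL SCORE' lines."""
    records = []
    for line in lines:
        parts = line.split()
        # Residue records have exactly three fields with an O/D label in the
        # middle; header lines such as 'MODEL  1' or 'PFRMAT DR' are skipped.
        if len(parts) == 3 and parts[1] in ("O", "D"):
            residue, label, score = parts
            records.append((residue, label, float(score)))
    return records


def label_for(score, cutoff=0.5):
    """Call disorder only when the score is strictly greater than the cutoff."""
    return "D" if score > cutoff else "O"


records = parse_residue_lines(["MODEL  1", "M D 0.9099", "P O 0.495406"])
for residue, label, score in records:
    assert label_for(score) == label  # file labels agree with the cutoff rule
print(records)
```

A score of exactly 0.5 is labeled O, because the text calls disorder only for scores greater than the cutoff.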
METHOD 
METHOD PONDR is a neural network Predictor Of Naturally Disordered
METHOD Regions.  The predictor used here integrates three predictors:
METHOD one for each terminus and one for internal sequence regions.
METHOD For the internal regions, a training set of 15 disordered
METHOD regions totaling 1149 residues was compiled and balanced by an
METHOD equal number of ordered residues taken at random from NRL_3D.
METHOD Of the 15 disordered regions in the training set, 8 were
METHOD characterized by X-ray diffraction (PDB IDs: 2tbv, 2ts1, 1aui,
METHOD 1bgw, 1elo, 1af3, 1ati and 1lbh) and 7 by NMR (SWISS-PROT IDs:
METHOD prio_mouse, h5_chick, flgm_salty, regn_lambd, hsf_klula and
METHOD hmgi_human; PIR accession: S50866).
METHOD 
METHOD From an initial pool of 31 attributes, a branch-and-bound search
METHOD was used to select the 10 attributes that gave the best collective
METHOD discrimination between ordered and disordered residues in the
METHOD training set, using a Mahalanobis distance criterion. The 31
METHOD attributes in the initial pool were the 20 amino acid compositions,
METHOD two different hydropathy scales, a flexibility index, alpha-moment,
METHOD beta-moment, net charge (K + R - D - E), aromatic composition
METHOD (W + F + Y), coordination number, codon number, alphabet size and
METHOD side chain volume. The attributes selected by this process were
METHOD the fractions of W, Y, F, D, E, K and R, aromatic composition,
METHOD coordination number and net charge.
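Nine of the ten selected attributes are simple composition measures. The sketch below computes them for one sequence window. Several details are assumptions, not statements from the method: the caller chooses the window length (none is given above), net charge and aromatic content are expressed as fractions rather than raw counts, and coordination number is omitted because it needs a per-residue scale that is not given here. `composition_attributes` is a hypothetical helper name.

```python
from collections import Counter

SELECTED_RESIDUES = "WYFDEKR"


def composition_attributes(window):
    """Compositional attributes for one sequence window (hypothetical helper).

    Assumed: fractions rather than counts; coordination number omitted.
    """
    window = window.upper()
    if not window:
        raise ValueError("window must be non-empty")
    counts = Counter(window)  # missing residues count as 0
    n = len(window)
    frac = {aa: counts[aa] / n for aa in SELECTED_RESIDUES}
    attrs = {f"frac_{aa}": value for aa, value in frac.items()}
    attrs["aromatic"] = frac["W"] + frac["F"] + frac["Y"]
    attrs["net_charge"] = frac["K"] + frac["R"] - frac["D"] - frac["E"]
    return attrs


print(composition_attributes("KKDW"))
```

For the window "KKDW", K contributes 0.5 and D contributes 0.25, so the net charge is 0.25 and the aromatic fraction (one W) is also 0.25.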
METHOD 
METHOD The back-propagation learning algorithm was used to train a
METHOD feedforward neural network with the ten selected attributes as
METHOD inputs, one fully connected hidden layer of ten neurons and a
METHOD single output. To estimate errors, the data were split into 5
METHOD disjoint subsets and each network was trained on 80% of the data
METHOD (4 of the 5 subsets), with 3 different initializations per split,
METHOD so training was repeated 5 x 3 = 15 times. Once the accuracy had
METHOD been established by this 5-fold cross-validation, a new neural
METHOD network was trained to the same accuracy using all the data.
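The 5 x 3 training schedule can be enumerated directly. This sketch assumes that each balanced training residue is one example (2 x 1149 = 2298 examples), that residues are assigned to folds at random, and that `seed` is arbitrary. The network training itself is not shown, and `cross_validation_runs` is a hypothetical name.

```python
import numpy as np


def cross_validation_runs(n_samples, n_folds=5, n_inits=3, seed=0):
    """Enumerate the (fold, initialization) training runs of a k-fold scheme.

    Each run trains on every fold except one (about 80% of the data for
    5 folds) and holds that fold out for error estimation.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    runs = []
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        for init in range(n_inits):
            # 'init' would seed the network weights for this run
            runs.append({"fold": k, "init": init, "train": train, "test": folds[k]})
    return runs


# Assumed sample count: 1149 disordered + 1149 ordered residues
runs = cross_validation_runs(2 * 1149)
print(len(runs))  # 15
```

The held-out folds are disjoint. The training sets are not: each one is the union of the other four folds.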
METHOD  
METHOD To enable prediction from the first to the last residue of a
METHOD protein, disorder was partitioned by position, and separate
METHOD predictors were developed for the N-terminal and C-terminal
METHOD regions (Li X, Romero P, Rani M, Dunker AK, Obradovic Z.
METHOD Predicting protein disorder for N-, C-, and internal regions.
METHOD Genome Informatics 1999;10:30-40). These terminal predictors used
METHOD 8 inputs.
METHOD 
METHOD The three predictors were integrated in 3 steps. First, each
METHOD predictor made predictions over its own domain, with the
METHOD N-terminal and internal predictors overlapping at positions
METHOD 11-14 and, for a protein of length M, the C-terminal and internal
METHOD predictors overlapping at positions M-14 to M-11. Second, the two
METHOD predictions at each of the 4 overlapping positions at either end
METHOD were averaged. Third, the integrated output was smoothed by
METHOD averaging over a sliding window of 9 amino acids; the first and
METHOD last 4 positions, which a centered 9-residue window cannot cover,
METHOD kept the unsmoothed output values from the N- and C-terminal
METHOD predictors, respectively. This integrated predictor is the one
METHOD used here.
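The three integration steps can be sketched with NumPy. The sketch assumes that each predictor's output has been embedded in a full-length array, with NaN wherever that predictor makes no prediction, and that the window is centered. Averaging the non-NaN values then covers both the overlap averaging and the single-predictor positions. `integrate_predictions` and `place` are hypothetical names, and the toy scores in the usage below are made up.

```python
import numpy as np


def integrate_predictions(n_term, internal, c_term, window=9):
    """Merge three full-length score arrays and smooth the result.

    Each input has one entry per residue, with NaN wherever that predictor
    makes no prediction; every position must be covered by at least one,
    and the protein must be at least `window` residues long.
    """
    stacked = np.vstack([n_term, internal, c_term]).astype(float)
    merged = np.nanmean(stacked, axis=0)  # averages overlaps, passes single values through
    half = window // 2
    smoothed = merged.copy()
    # Centered moving average; the first and last `half` positions stay unsmoothed.
    smoothed[half:len(merged) - half] = np.convolve(
        merged, np.ones(window) / window, mode="valid")
    return smoothed


def place(scores, start, length):
    """Embed a predictor's scores (0-based start) into a NaN-padded array."""
    out = np.full(length, np.nan)
    out[start:start + len(scores)] = scores
    return out


M = 30
n = place(np.ones(14), 0, M)   # positions 1-14
i = place(np.zeros(9), 10, M)  # positions 11 to M-11 (= 19)
c = place(np.ones(15), 15, M)  # positions M-14 (= 16) to M
result = integrate_predictions(n, i, c)
print(result.round(3))
```

In this toy case, position 15 (0-based index 14) is predicted only by the internal predictor. It is smoothed to 4/9, because its 9-residue window holds eight averaged overlap values of 0.5 and one 0.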
REMARK In the predictions below, the predicted disorder at residues
REMARK 90-122 (WLL...IVK) and 135-178 (GGT...AYQ) spans regions long
REMARK enough to be significant, indicating the likely presence of
REMARK unobserved loops. The prediction of order within a region of
REMARK predicted disorder, e.g. the drop in scores at residues 123-134,
REMARK suggests that this region may contain a binding site (Garner et
REMARK al. Genome Informatics 1999;10:41-50). Finally, the predictor has
REMARK previously performed fairly well even for short disordered
REMARK regions located at the ends of proteins (Li et al. Genome
REMARK Informatics 1999;10:30-40), so the 7-residue prediction of
REMARK disorder at the amino terminus is likely to be significant.
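The disordered regions cited above are runs of consecutive D labels. The sketch below extracts those runs as 1-based residue ranges. The `min_length` filter is a hypothetical parameter for discarding short runs, which are more likely to arise by chance. No threshold value is given above, so the default keeps every run. `disordered_segments` is a hypothetical name.

```python
def disordered_segments(labels, min_length=1):
    """Return 1-based (start, end) runs of consecutive 'D' labels.

    min_length is a hypothetical filter for discarding short runs; the
    method does not specify a value.
    """
    segments = []
    start = None
    for pos, label in enumerate(labels, start=1):
        if label == "D" and start is None:
            start = pos  # a disordered run begins
        elif label != "D" and start is not None:
            if pos - start >= min_length:
                segments.append((start, pos - 1))
            start = None
    # Close a run that extends to the last residue
    if start is not None and len(labels) - start + 1 >= min_length:
        segments.append((start, len(labels)))
    return segments


print(disordered_segments("DDOOODDDO"))               # [(1, 2), (6, 8)]
print(disordered_segments("DDOOODDDO", min_length=3))  # [(6, 8)]
```

Applied to the 209 labels in MODEL 1 with the default `min_length`, this should return `[(1, 7), (90, 122), (135, 178)]`, matching the regions discussed above.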
MODEL  1
M D 0.9099
L D 0.767803
K D 0.873028
P D 0.664547
D D 0.655069
N D 0.593222
L D 0.535211
P O 0.495406
V O 0.472641
T O 0.438084
F O 0.421567
G O 0.412643
K O 0.388137
N O 0.350066
D O 0.316793
V O 0.297113
E O 0.246245
I O 0.207674
I O 0.180521
A O 0.144877
R O 0.111986
E O 0.075879
T O 0.068349
L O 0.073607
Y O 0.089977
R O 0.10757
G O 0.114797
F O 0.114836
F O 0.121718
S O 0.124973
L O 0.130696
D O 0.133615
L O 0.127699
Y O 0.113794
R O 0.100276
F O 0.096913
R O 0.119776
H O 0.132885
R O 0.145012
L O 0.153879
F O 0.158027
N O 0.17095
G O 0.181136
Q O 0.188631
M O 0.191151
S O 0.183875
H O 0.187142
E O 0.212741
V O 0.235371
R O 0.265001
R O 0.270132
E O 0.269825
I O 0.273049
F O 0.276325
E O 0.26832
R O 0.259336
G O 0.232191
H O 0.22087
A O 0.188835
A O 0.181136
V O 0.185012
L O 0.187725
L O 0.209819
P O 0.227137
F O 0.234199
D O 0.233858
P O 0.222626
V O 0.22963
R O 0.234115
D O 0.228901
E O 0.218878
V O 0.211629
V O 0.250394
L O 0.291721
I O 0.347708
E O 0.377602
Q O 0.40549
I O 0.396333
R O 0.424798
I O 0.464453
A O 0.484325
A O 0.475399
Y O 0.44565
D O 0.40447
T O 0.389384
S O 0.391285
E O 0.446625
T O 0.467813
P O 0.484719
W D 0.525803
L D 0.554189
L D 0.614461
E D 0.678734
M D 0.735969
V D 0.775723
A D 0.797983
G D 0.822024
M D 0.847535
I D 0.84672
E D 0.847731
E D 0.847534
G D 0.847812
E D 0.855128
S D 0.862338
V D 0.871792
E D 0.877534
D D 0.876076
V D 0.871485
A D 0.870411
R D 0.865414
R D 0.863683
E D 0.862871
A D 0.864287
I D 0.865938
E D 0.86854
E D 0.865533
A D 0.856107
G D 0.829634
L D 0.755328
I D 0.681412
V D 0.615204
K D 0.519513
R O 0.427366
T O 0.33774
K O 0.252396
P O 0.186319
V O 0.151902
L O 0.184291
S O 0.20586
F O 0.218694
L O 0.248947
A O 0.314262
S O 0.386752
P O 0.464199
G D 0.534733
G D 0.588379
T D 0.63968
S D 0.696926
E D 0.754278
R D 0.821466
S D 0.848835
S D 0.862147
I D 0.870868
M D 0.876514
V D 0.877787
G D 0.855565
E D 0.833509
V D 0.814647
D D 0.79156
A D 0.782089
T D 0.779712
T D 0.78032
A D 0.773396
S D 0.75343
G D 0.737711
I D 0.722763
H D 0.699696
G D 0.686225
L D 0.644052
A D 0.623037
D D 0.600866
E D 0.583787
N D 0.581857
E D 0.592113
D D 0.556054
I D 0.527819
R D 0.521626
V D 0.553086
H D 0.550262
V D 0.552651
V D 0.53486
S D 0.521544
R D 0.507436
E D 0.521866
Q D 0.538846
A D 0.542295
Y D 0.525693
Q D 0.516279
W O 0.499464
V O 0.462008
E O 0.413342
E O 0.35488
G O 0.310759
K O 0.261389
I O 0.198036
D O 0.136346
N O 0.076484
A O 0.020611
A O 0.007571
S O 0.005115
V O 0.005414
I O 0.012483
A O 0.019749
L O 0.027028
Q O 0.031439
W O 0.040722
L O 0.052592
Q O 0.071796
L O 0.093282
H O 0.127597
H O 0.173044
Q O 0.204254
A O 0.234336
L O 0.278387
K O 0.322928
N O 0.348174
E O 0.337224
W O 0.452109
A O 0.49653
END


