Why are proteins marginally stable?
D. Taverna and R. A. Goldstein
Proteins, in press.
Most globular proteins are marginally stable regardless of size or
activity. The most common interpretation is that proteins must be
marginally stable in order to function, and so marginal stability
represents the results of positive selection. We consider the issue of
marginal stability directly using model proteins and the dynamical aspects
of protein evolution in populations. We find that the marginal stability
of proteins is an inherent property of proteins due to the high
dimensionality of the sequence space, without regard to protein function.
In this way, marginal stability can result from neutral, non-adaptive
evolution. By allowing protein sub-populations with different stability
requirements for functionality to compete, we find that marginally stable
populations of proteins tend to dominate. Our results show that
functionalities consistent with marginal stability have a strong
evolutionary advantage, and might arise because of the natural tendency of
proteins towards marginal stability.
The distribution of indel lengths
Protein sequence alignment has become a widely used method in the study
of newly-sequenced proteins. Most sequence alignment methods use an affine
gap penalty to assign scores to insertions and deletions. While affine gap
penalties capture the relative ease of extending a gap compared with
opening one, they are still an obvious over-simplification of the real
processes that occur during sequence evolution. In order to improve the
efficiency of sequence alignment methods and to obtain a better
understanding of the process of sequence evolution, we wish to find a
more accurate model of insertions and deletions in homologous proteins. In
this work, we extract the probability of a gap occurrence and the
resulting gap length distribution in distantly related proteins (sequence
identity less than 25%) using alignments based on their common structures.
We observe a distribution of gaps that can be fitted with a
multi-exponential with four distinct components. The results suggest new
approaches to modeling insertions and deletions in sequence alignments.
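As a rough illustration of the contrast drawn in this abstract, the following Python sketch compares an affine gap penalty with the log-probability penalty implied by a four-component multi-exponential gap-length distribution. The weights, decay lengths, and affine parameters are placeholders chosen for illustration only; they are not the fitted values from this work, and all function names are hypothetical.

```python
import math

# Placeholder parameters (assumptions, not fitted values from the paper).
WEIGHTS = [0.6, 0.25, 0.1, 0.05]
DECAY_LENGTHS = [1.0, 3.0, 10.0, 40.0]


def affine_gap_penalty(length, gap_open=10.0, gap_extend=1.0):
    """Affine penalty: a fixed opening cost plus a per-residue extension cost."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + gap_extend * (length - 1)


def multi_exponential(length, weights, decay_lengths):
    """Unnormalized multi-exponential: sum_i w_i * exp(-length / tau_i)."""
    return sum(w * math.exp(-length / tau)
               for w, tau in zip(weights, decay_lengths))


def gap_length_probabilities(weights, decay_lengths, max_length=200):
    """Normalize the multi-exponential over gap lengths 1..max_length."""
    raw = [multi_exponential(n, weights, decay_lengths)
           for n in range(1, max_length + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def log_penalties(probabilities):
    """Penalty for each gap length taken as the negative log-probability."""
    return [-math.log(p) for p in probabilities]
```

Under these assumed parameters the log-probability penalty grows more and more slowly with gap length, whereas the affine penalty grows at a constant rate per residue. That is one way to see why a single extension cost may penalize long gaps too heavily.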
Evolution of functionality in lattice
proteins
We study the evolution of protein functionality using a 2-dimensional
lattice model. We find that characteristics particular to evolution, such
as population dynamics and the early evolutionary trajectories, have a
large effect on the distribution of observed structures. We find little
difference between the distribution of structures evolved for function and
those evolved for their ability to form compact structures.
Optimization of a new score function for
the detection of remote homologs
The growth in protein sequence data has placed a premium on ways to
infer structure and function of the newly sequenced proteins. One of the
most effective ways is to identify a homologous relationship with a
protein about which more is known. While close evolutionary relationships
can be confidently determined with standard methods, the difficulty
increases as the relationships become more distant. All of these methods
rely on some score function to measure sequence similarity. The choice of
score function is especially critical for these distant relationships. We
describe a new method of determining a score function, optimizing the
ability to discriminate between homologs and non-homologs. We find that
this new score function performs better than standard score functions for
the identification of distant homologies.
How to Generate Improved Potentials for Protein Tertiary
Structure Prediction: A Lattice Model Study
Success in the protein structure prediction problem relies heavily on
the choice of an accurate potential function. One approach towards extracting
these potentials from a database of known protein structures is to maximize
the Z-score of the database proteins, maximizing the ability of the potential
to discriminate correct from random conformations. These optimization methods
have an unfortunate tendency to underestimate the repulsive interactions,
leading to reduced accuracy and predictive ability. Using a lattice model,
we show how this tendency is due to the Gaussian form assumed for the energies
of the ensemble of random structures, and show how we can weight the distribution
to suppress the high-energy state contribution to the Z-score calculation.
The result is a potential that is more accurate and more likely to yield
correct predictions than other Z-score optimization methods as well as
potentials of mean force.
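For readers unfamiliar with the Z-score used here, the following is a minimal Python sketch of the basic quantity: the gap between the mean energy of random (decoy) conformations and the energy of the correct structure, in units of the standard deviation of the decoy energies. The sign convention and the function name are assumptions for illustration, and the reweighting of high-energy states described in the abstract is not reproduced.

```python
import statistics


def z_score(native_energy, decoy_energies):
    """Gap between the decoy mean and the native energy, divided by the
    population standard deviation of the decoy energies.

    With this (assumed) sign convention, a larger value means the potential
    separates the correct structure from random ones more clearly.
    """
    if len(decoy_energies) < 2:
        raise ValueError("need at least two decoy energies")
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    if spread == 0.0:
        raise ValueError("decoy energies have zero spread")
    return (mean - native_energy) / spread
```

Because the denominator is a standard deviation, the Z-score implicitly summarizes the decoy energies by their first two moments, which is where a Gaussian assumption about the random ensemble enters.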
Optimizing for Success: A New Score Function For Distantly Related
Protein Sequence Comparison
The exponential growth of the sequence data produced by the genome projects
motivates the development of better ways of inferring structural and functional
information about those newly sequenced proteins. Looking for similarities
between these probe protein sequences and other protein sequences in the
database has proved to be one of the most useful current techniques. This
procedure, known as sequence comparison, relies on the use of an appropriate
score function that discriminates homologs from non-homologs. Current score
functions have difficulty identifying distantly-related homologs with low
sequence similarity. As a result, there is an increased demand for a new
score function that yields statistically significant higher scores for
all pairs of homologous protein sequences, including such distantly related
homologs. We present a new method for generating a score function by optimizing
it for successful discrimination between homologous and unrelated proteins.
The performance of the new score function (OPTIMA) on a set of distantly
related protein sequences was compared with that of other commonly used
substitution matrices obtained with different methods. OPTIMA performs better
than Dayhoff's PAM250, structurally derived matrices (JTT), the Gonnet et al.
matrix (GONN, an improvement of the PAM series), and the widely used
BLOSUM 62 (BL62). This improvement could have a large impact on the detection
of homologs as the number of protein sequences increases.
Modeling Evolution at the Protein Level using an Adjustable Amino
Acid Fitness Model.
An adjustable fitness model for amino acid site substitutions
is investigated. This model, a generalization of previously developed evolutionary
models, has several distinguishing characteristics: it separately accounts
for the processes of mutation and substitution, allows for heterogeneity
among substitution rates and among evolutionary constraints, and does not
make any prior assumptions about which sites or characteristics of proteins
are important to molecular evolution. While the model has fewer adjustable
parameters than the general reversible mtREV model, when optimized it outperforms
mtREV in likelihood analysis on protein-coding mitochondrial genes. In
addition, the optimized fitness parameters of the model show correspondence
to some biophysical characteristics of amino acids.
The Evolution of Duplicated Genes Considering Protein Stability
Constraints
We model the evolution of duplicated genes by assuming that the gene's
protein message, if transcribed and translated, must form a stable, folded
structure. We observe the change in protein structure over time in an evolving
population of lattice model proteins. We find that selection of stable
proteins conserves the original structure if the structure is highly designable,
that is, if a large fraction of all foldable sequences form that structure.
This effect implies the relative number of pseudogenes can be less than
previously predicted with neutral evolution models. The data also suggest
a reason for lower than expected ratios of non-synonymous to synonymous
substitutions in pseudogenes.
Surveying Determinants of Protein Structure Designability across
Different Energy Models and Amino-Acid Alphabets: A Consensus
A variety of analytical and computational models have been proposed
to answer the question of why some protein structures are more "designable"
(i.e. have more sequences folding into them) than others. One class
of analytical and statistical-mechanical models has approached the designability
problem from a thermodynamic viewpoint. These models highlighted specific
structural features important for increased designability. Furthermore,
designability was shown to be inherently related to thermodynamically-relevant
energetic measures of protein folding such as the foldability F
and energy gap
The Distribution of Structures in Evolving Protein Populations
Proteins exhibit a non-uniform distribution of structures. A number
of models have been advanced to explain this observation by considering
the distribution of designabilities, that is, the fraction of all sequences
that could successfully fold into any particular native structure. In this
paper, we describe how population dynamics can make the distribution of
observed protein structures more uneven than the distribution of designabilities.
Additional factors, such as the topology of the sequence space and the
similarity of other structures, can also influence this distribution.
Universal Correlation between Energy Gap and Foldability for the
Random Energy Model and Lattice Proteins
The Random Energy Model, originally used to analyze the physics
of spin glasses, has been employed to explore what makes a protein a good
folder versus a bad folder. In earlier work, the ratio of the folding temperature
over the glass-transition temperature was related to a statistical measure
of protein energy landscapes denoted as the foldability F. It was
posited and subsequently established by simulation that good folders had
larger foldabilities, on average, than bad folders. An alternative hypothesis,
equally verified by protein folding simulations, was that it is the energy
gap
Estimating the Total Number of Protein Folds
Many seemingly unrelated protein families share common folds.
Theoretical models based on structure designability have suggested that
a few folds should be very common while many others have low probability.
In agreement with the predictions of these models, we show that the distribution
of observed protein families over different folds can be modeled with a
highly-stretched exponential. Our results suggest that there are approximately
4000 possible folds, some so unlikely that only 2000 folds exist among
naturally occurring proteins. Due to the large number of extremely rare
folds, constructing a comprehensive database of all existent folds would
be difficult. Constructing a database of the most-likely folds representing
the vast majority of protein families would be considerably easier.
Using Physical-Chemistry Based Mutation Models in Phylogenetic Analyses
of HIV-1 Subtypes
HIV-1 subtype phylogeny is investigated using a previously-developed
computational model of natural amino acid site mutations. This model, based
on Boltzmann statistics and Metropolis kinetics, involves an order of magnitude
fewer adjustable parameters than traditional mutation matrices and deals
more effectively with the issue of protein site-heterogeneity. After training
on sequences of HIV-1 envelope (env) proteins from a few specific
subtypes, our model is more likely to describe the evolutionary record
for other subtypes than methods using a single mutation matrix, even a
matrix optimized over the same data. Pairwise distances are calculated
between various probabilistic ancestral subtype sequences, and a distance
matrix approach is used to find the optimal phylogenetic tree. Our results
indicate that the relationship between subtypes B, C, and D may be closer
than previously thought.
Effect of Alphabet Size and Foldability Requirements on Protein Structure
Designability
A number of investigators have addressed the issue of why certain
protein structures are especially common by considering structure "designability",
defined as the number of sequences that would successfully fold into any
particular native structure. One such approach, based on "foldability",
suggested that structures could be classified according to their maximum
possible foldability and that this optimal foldability would be highly
correlated with structure designability. Other approaches have focused
on computing the designability of lattice proteins written with reduced
two-letter amino-acid alphabets. These different approaches suggested contrasting
characteristics of the most designable structures. This report compares
the designability of lattice proteins over a wide range of amino-acid alphabets
and foldability requirements. While all alphabets have a wide distribution
of protein designabilities, the form of the distribution depends on how
protein "viability" is defined. Furthermore, under increasing foldability
requirements, the changes in designabilities for all alphabets are in good
agreement with the previous conclusions of the foldability approach. Most
importantly, it was noticed that those structures which were highly designable
for the two-letter amino-acid alphabets are not especially designable with
higher-letter alphabets.
Optimizing Potentials for the Inverse Protein Folding Problem
Inverse protein folding, which seeks to identify sequences that
fold into a given structure, has been approached by threading candidate
sequences onto the structure and scoring them with database-derived potentials.
The sequences with the lowest energies are predicted to fold into that
structure. It has been argued that the limited success of this type of
approach is not due to the discrepancy between the scoring potential and
the true potential but is rather due to the fact that sequences choose
their lowest-energy structure rather than structures choosing the lowest-energy
sequences. Here we develop a non-physical potential scheme optimized for
the inverse folding problem. We maximize the average probability of success
for a set of lattice proteins to obtain the optimal potential energy function,
and show that the potential obtained by our method is more likely to produce
successful predictions than the true potential.
Optimizing Energy Potentials for Success in Protein Tertiary Structure
Prediction
Success in the protein structure prediction problem relies on
the choice of an accurate potential function. For a single protein sequence,
Wolynes and co-workers showed that the potential function can be optimized
for predictive success by maximizing the energy gap between the correct
structure and the ensemble of random structures relative to the distribution
of the energies of these random structures (Z-score). Different ways have
been described of implementing this procedure for an ensemble of database
proteins. Here we demonstrate a new approach to carrying out this task.
For a single protein sequence, the probability of success (i.e.
the probability that the folded state is the lowest energy state) is derived.
We then maximize the average probability of success for a set of proteins
to obtain the optimal potential energy function. This results in maximum
attention being focused on those proteins whose structures are difficult
but not impossible to predict. Using a lattice model of proteins, we show
that the optimal interaction potentials obtained by our method are both
more accurate and more likely to produce successful predictions than those
obtained by other averaging procedures.
On the Thermodynamic Hypothesis of Protein Folding
The validity of the thermodynamic hypothesis of protein folding
was explored by simulating the evolution of protein sequences. Simple models
of lattice proteins were allowed to evolve by random point mutations subject
to the constraint that they fold into a pre-determined native structure
using a Monte Carlo folding algorithm. We employed a simple analytical
approach to compute the probability of violation of the thermodynamic hypothesis
as a function of the size of the protein, the fraction of the total number
of possible conformations which are kinetically accessible, and the roughness
of the free-energy landscape. It was found that even if the folding is
under kinetic control, the sequence will evolve so that the native state
is most often the state of minimum free energy.
Models of Natural Mutations Including Site Heterogeneity
New computational models of natural site mutations are developed
that account for the different selective pressure acting on different locations
in the protein. The number of adjustable parameters is greatly reduced
by basing the models on the underlying physical-chemical properties of
the amino acids. This allows us to use our method on small data sets built
of specific protein types. We demonstrate that with this approach we can
represent the evolutionary patterns in HIV envelope proteins far better
than with more traditional methods.
Beyond Mutation Matrices: Physical-chemistry Based Evolutionary Models
We describe a model for characterizing site mutations in evolving
proteins. By representing the fitness of each of the amino acids as a function
of the physical-chemical properties of that amino acid, and constructing
mutation matrices based on Boltzmann statistics and Metropolis kinetics,
we are able to greatly reduce the number of adjustable parameters. This
allows us to include site heterogeneity in the model, as well as to optimize
the model for specific protein types. We demonstrate the applicability
of the model by investigating the phylogenetic relationship between various
subtypes of HIV-1.
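The construction described in this abstract can be sketched in Python as follows, under simplifying assumptions. Each amino acid is given a fitness value directly (in the model described above, fitness is a function of physical-chemical properties; that mapping is not reproduced here), a change to a different residue is proposed uniformly, and it is accepted with a Metropolis probability. The fitness values, the four-letter alphabet, and the function names are hypothetical.

```python
import math

# Hypothetical fitness values for a reduced four-residue alphabet.
EXAMPLE_FITNESS = {"A": 0.0, "L": 0.5, "K": -0.3, "D": -1.0}


def metropolis_matrix(fitness):
    """Build a one-step substitution matrix from per-residue fitness values.

    A change i -> j is proposed uniformly among the other residues and
    accepted with the Metropolis probability min(1, exp(F_j - F_i)).
    """
    residues = list(fitness)
    n = len(residues)
    matrix = {}
    for i in residues:
        row = {}
        for j in residues:
            if i != j:
                accept = min(1.0, math.exp(fitness[j] - fitness[i]))
                row[j] = accept / (n - 1)
        # Probability of no change keeps each row normalized.
        row[i] = 1.0 - sum(row.values())
        matrix[i] = row
    return matrix


def boltzmann_distribution(fitness):
    """Equilibrium frequencies proportional to exp(F_i)."""
    weights = {r: math.exp(f) for r, f in fitness.items()}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}
```

With this choice the matrix satisfies detailed balance with respect to the Boltzmann distribution, so a whole 20 x 20 matrix is determined by one fitness value per residue rather than by a separate parameter for every pair.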
Evolution of Model Proteins on a Foldability Landscape
We model the evolution of simple lattice proteins as a random
walk in a fitness landscape, where the fitness represents the ability of
the protein to fold. At higher selective pressure the evolutionary trajectories
are confined to neutral networks where the native structure is conserved
and the dynamics are non-self-averaging and non-exponential. The optimizability
of the corresponding native structure has a strong effect on the size of
these neutral networks, and thus on the nature of the evolutionary process.
Site Mutations in Model Proteins
Model proteins can be used to understand the process of site mutations.
We simulate the evolution of lattice proteins, requiring that every sequence
during the evolutionary trajectory be sufficiently able to fold. We can
then study what mutations are accepted, and how these relative mutation
rates depend upon surface accessibility. We measure the degree of conservation
of the mutation by how much it affects the intramolecular interactions
that determine the native structure and the foldability. We find that although
substitutions in the interior of the protein are more conservative than
those on the protein exterior in terms of substituting similar amino acids,
the changes in the interactions are comparable in these two different cases.
The advantages of the interaction landscape approach are discussed.
The Foldability Landscape of Model Proteins
Molecular evolution may be considered as a walk in a multi-dimensional
fitness landscape, where the fitness at each point is associated with features
such as the function, stability and survivability of these molecules. We
present a simple model for the evolution of protein sequences on a landscape
with a precisely defined fitness function. We use simple lattice models
to represent protein structures; the ability of a protein sequence
to fold into the structure with lowest energy, quantified as the foldability,
represents the fitness of the sequence. The foldability of the sequence
is characterized based on the spin glass model of protein folding. We consider
evolution as a walk in this foldability landscape and study the nature
of the landscape and the dynamics on such a landscape. Selective pressure
is explicitly included in this model in the form of a minimum foldability
requirement. We find that different native structures are not evenly distributed
in interaction space, with similar structures and structures with similar
optimal foldabilities clustered together. Evolving proteins marginally
fulfill the selective criteria of foldability. As the selective pressure
is increased, evolutionary trajectories become increasingly confined to
"neutral networks", where the sequence and the interactions can be significantly
changed while a constant structure is maintained.
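To make the notion of a lattice protein concrete, here is a minimal Python sketch that scores one conformation by summing pair-contact energies between residues that are lattice neighbors but not adjacent along the chain. The two-dimensional square lattice, the two-letter H/P energy function, and all names are assumptions made for illustration; they are not taken from the model used in this work, and the foldability itself is not computed.

```python
def contact_energy(sequence, coords, pair_energy):
    """Sum pair-contact energies for a chain placed on a 2D square lattice.

    sequence    -- string of residue labels
    coords      -- list of (x, y) lattice sites, one per residue, in chain order
    pair_energy -- function (a, b) -> energy of a contact between residues a, b
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coordinates differ in length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        if abs(x0 - x1) + abs(y0 - y1) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    site_to_index = {site: i for i, site in enumerate(coords)}
    energy = 0.0
    for i, (x, y) in enumerate(coords):
        # Looking only in the +x and +y directions counts each pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = site_to_index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                energy += pair_energy(sequence[i], sequence[j])
    return energy


def hp_energy(a, b):
    """Illustrative two-letter energy: only H-H contacts are favorable."""
    return -1.0 if a == "H" and b == "H" else 0.0
```

Enumerating all compact conformations of a short chain and evaluating this energy for each gives the energy spectrum from which a native state and a foldability measure could then be defined.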
Predicting Protein Secondary Structure Using Probabilistic Substitution
Schemata
We demonstrate the applicability of our previously developed Bayesian
probabilistic approach for predicting residue solvent accessibility to
the problem of predicting secondary structure. Using only single sequence
data, this method achieves a 3-state accuracy of 67% over a database of
473 non-homologous proteins. This approach is more amenable to inspection
and less likely to overlearn specifics of a dataset than "black box"
methods such as neural networks. It is also conceptually simpler and less
computationally costly. We also introduce a novel method for representing
and incorporating multiple sequence alignment information within the prediction
algorithm, achieving 72% accuracy over a dataset of 304 non-homologous
proteins. This is accomplished by creating a statistical model of the evolutionarily-derived
correlations between patterns of amino acid substitution and local protein
structure. This model consists of parameter vectors, termed "substitution
schemata", which probabilistically encode the structure-based heterogeneity
in the distributions of amino acid substitutions found in alignments of
homologous proteins. The model is optimized for structure prediction by
maximizing the mutual information between the set of schemata and the database
of secondary structures. Unlike "expert heuristic" methods, this approach
has been demonstrated to work well over large datasets. Unlike the opaque
neural network algorithms, this approach is physicochemically intelligible.
Moreover, the model optimization procedure, the formalism for predicting
one-dimensional structural features, and our previously developed method
for tertiary structure recognition all share a common Bayesian probabilistic
basis. This consistency starkly contrasts with the hybrid and ad hoc
nature of methods which have dominated this field in recent years.
Compaction and Folding in Model Proteins
Protein folding is modeled as diffusion on a free-energy landscape,
allowing use of the diffusion equation to study the impact of energetic
parameters on the folding dynamics. The free-energy landscape is characterized
by two different order parameters, one representing the degree of compactness,
the other a measure of the progress towards the folded state. For marginally
stable proteins, fastest folding is achieved when the non-specific interactions
favoring compaction are strong, resulting in a high folding temperature.
Such proteins fold by rapid collapse followed by slower accumulation of
correct contacts.
Protein Heteronuclear NMR Assignments using Mean-Field Simulated
Annealing
A computational method for the assignment of the NMR spectra of
larger (21 kDa) proteins using a set of six of the most sensitive heteronuclear
multidimensional nuclear magnetic resonance experiments is described. Connectivity
data obtained from HNCa, HN(CO)Ca, HN(Ca)Ha, and Ha(CaCO)NH and spin-system
identification data obtained from CP-(H)CCH-N TOCSY and CP-(H)C(CaCO)NH
TOCSY were used to perform sequence-specific assignments using a mean-field
formalism and simulated annealing. This mean-field method reports the resonance
assignments in a probabilistic fashion, displaying the certainty of assignments
in an unambiguous and quantitative manner. This technique was applied to
the NMR data of the 172-residue peptide-binding domain of the E. coli heat-shock
protein, DnaK. The method is demonstrated to be robust to significant amounts
of missing, spurious, noisy, extraneous, and erroneous data.
Mutation Matrices and Physical-Chemical Properties: Correlations
and Implications
In order to better understand how the properties of individual
amino acids result in proteins with particular structures and functions,
we have examined the correlations between previously-derived
structure-dependent mutation rates and changes in various
physical-chemical properties of the amino acids such as volume,
charge,
Probabilistic Reconstruction of Ancestral Protein Sequences
Using a maximum likelihood formalism, we have developed a method
to reconstruct the sequences of ancestral proteins. Our approach calculates
not only the most probable ancestral sequence, but also
the probability of every amino acid at any given node in the evolutionary
tree. Because we consider evolution on the amino acid level, we are better
able to include effects of evolutionary pressure, and take advantage of
structural information about the protein through the use of mutation matrices
that depend on secondary structure and surface accessibility. The computational
complexity of this method scales linearly with the number of homologous
proteins used to reconstruct the ancestral sequence.
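The kind of calculation described in this abstract can be sketched in Python for a single alignment column, using the standard recursion over a tree in which each node's conditional likelihood is built from those of its children. This is an illustrative sketch, not the authors' implementation: it assumes one transition matrix shared by every branch, a two-letter alphabet, and a tree written as nested tuples, whereas the method described above uses mutation matrices that depend on secondary structure and surface accessibility.

```python
def conditional_likelihoods(node, alphabet, transition):
    """Return L[a] = P(observed leaves below node | node is in state a).

    A leaf is a single-character string (the observed residue); an internal
    node is a tuple of child nodes.
    """
    if isinstance(node, str):
        return [1.0 if a == node else 0.0 for a in alphabet]
    n = len(alphabet)
    result = [1.0] * n
    for child in node:
        child_like = conditional_likelihoods(child, alphabet, transition)
        for a in range(n):
            result[a] *= sum(transition[a][b] * child_like[b] for b in range(n))
    return result


def root_posterior(tree, alphabet, transition, prior):
    """Posterior probability of each residue at the root of the tree."""
    like = conditional_likelihoods(tree, alphabet, transition)
    joint = [p * l for p, l in zip(prior, like)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("observed data have zero likelihood")
    return {a: j / total for a, j in zip(alphabet, joint)}


# Hypothetical two-state example: residues "A" and "B", 10% chance of change.
EXAMPLE_TRANSITION = [[0.9, 0.1], [0.1, 0.9]]
EXAMPLE_POSTERIOR = root_posterior(("A", "A"), "AB", EXAMPLE_TRANSITION, [0.5, 0.5])
```

Each leaf and each internal node is visited once, which is consistent with a cost that grows linearly with the number of sequences in the tree.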
Why are some Protein Structures so Common?
Many biological proteins are observed to fold into one of a limited
number of structural motifs. By considering the requirements imposed on
proteins by their need to fold rapidly, and the ease with which such requirements
can be fulfilled as a function of the native structure, we can explain
why certain structures are repeatedly observed among proteins with negligible
sequence similarity. This work has implications for the understanding of
protein sequence-structure relationships as well as protein evolution.
Predicting Solvent Accessibility: Higher Accuracy Using Bayesian
Statistics and Optimized Residue Substitution Classes
We introduce a novel Bayesian probabilistic method for predicting
the solvent accessibilities of amino acid residues in globular proteins.
Using single sequence data this method achieves prediction accuracies higher
than previously published methods. Substantially improved predictions, comparable
to the highest accuracies reported in the literature to date, are obtained
by representing alignments of the example proteins and their homologs as
strings of residue substitution classes depending on the side chain types
observed at each alignment position. These results demonstrate the applicability
of this relatively simple Bayesian approach to structure prediction and
illustrate the utility of the classification methodology previously developed
to extract information from aligned sets of structurally related proteins.
Constructing Amino Acid Residue Substitution Classes Maximally Indicative
of Local Protein Structure
Using an information theoretic formalism, we optimize classes
of amino acid substitution to be maximally indicative of local protein
structure. Our statistically-derived classes are loosely identifiable with
the heuristic constructions found in previously published work. However,
while these other methods provide a more rigid idealization of physicochemically-constrained
residue substitution, our classes provide substantially more structural
information with many fewer parameters. Moreover, these substitution classes
are consistent with the paradigmatic view of the sequence-to-structure
relationship in globular proteins which holds that the three-dimensional
architecture is predominantly determined by the arrangement of hydrophobic
and polar side chains with weak constraint on the actual amino acid identities.
More specific constraints are imposed on the placement of prolines, glycines
and the charged residues. These substitution classes have been used in
highly accurate predictions of residue solvent accessibility. They could
also be used in the identification of homologous proteins, the construction
and refinement of multiple sequence alignments, and as a means of condensing
and codifying the information in multiple sequence alignments for secondary
structure prediction and tertiary fold recognition.
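As an illustration of what "maximally indicative" could mean in information-theoretic terms, the following Python sketch computes the mutual information between two labelings from a table of joint counts, for example substitution class (rows) against local structural state (columns). The choice of mutual information in bits as the objective, the table layout, and the function name are assumptions for illustration; the optimization over class definitions is not shown.

```python
import math


def mutual_information(joint_counts):
    """Mutual information, in bits, between the row and column labels
    of a table of non-negative joint counts."""
    total = sum(sum(row) for row in joint_counts)
    if total <= 0:
        raise ValueError("count table is empty")
    row_totals = [sum(row) for row in joint_counts]
    col_totals = [sum(col) for col in zip(*joint_counts)]
    info = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count > 0:
                # p(i,j) * log2( p(i,j) / (p(i) * p(j)) )
                ratio = count * total / (row_totals[i] * col_totals[j])
                info += (count / total) * math.log2(ratio)
    return info
```

A set of classes whose counts are independent of the structural state scores zero, while classes that fully determine a two-state structural label score one bit. A search over class definitions would aim to increase this value.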
Correlating Structure-Dependent Mutation Matrices with Physical-Chemical
Properties
We have investigated how structure-dependent mutation matrices
derived in previous work correlate with various physical-chemical properties
of the 20 naturally occurring amino acids. Among the properties we investigated
were
Optimal Local Propensities for Model Proteins
Lattice models of proteins were used to examine the role of local
propensities in stabilizing the native state of a protein, using techniques
drawn from spin-glass theory to characterize the free-energy landscapes.
In the strong evolutionary limit, optimal conditions for folding
are achieved when the contributions from local interactions to the stability
of the native state of the protein are small. Further increasing the local
interactions rapidly decreases the foldability.
Context-Dependent Optimal Substitution Matrices Derived Using Bayesian
Statistics and Phylogenetic Trees
Substitution matrices are a key tool in important applications
such as identifying sequence homologies, creating sequence alignments,
and more recently using evolutionary patterns for the prediction of protein
structure. We have developed a novel approach to the derivation of these
matrices that utilizes not only multiple sequence alignments, but also
the associated evolutionary trees. The key to our method is the use of
a Bayesian formalism to calculate the probability that a given substitution
matrix fits the tree structures and multiple sequence alignment data. With
this ability, we can determine optimal substitution matrices for various
local environments, depending upon parameters such as secondary structure
and surface accessibility.
Searching for Foldable Protein Structures using Optimized Energy
Functions
During evolution, the effective interactions between residues
in a protein can be adjusted through mutations to allow the protein to
fold to its native structure on an adequate time-scale. We seek to address
the question, are there some structures that can be better optimized than
others? Using exhaustive enumeration of the compact conformations of short
proteins confined to simple lattices, we find that the best structures
are those that contain contacts rare in random structures, indicating the
importance of non-local contacts for assisting the folding process. Certain
structural motifs such as long
Optimized Energy Functions for Tertiary Structure Prediction and
Recognition
A theoretical basis for the alignment of a protein sequence to
a set of protein structure templates is presented, based on a Bayesian
statistical analysis. The optimal Hamiltonian for this threading is closely
related to the Hamiltonian optimized for molecular dynamics based on spin-glass
theory. The Bayesian theory provides the optimal penalty functions for
insertions and deletions in the alignment, which can be put in the form
of a chemical potential. In contrast to standard methods for determining
gap penalties, these penalties involve the logarithm of the probability
distribution of gaps in alignments against correct templates as compared
to the probability distribution of gaps in alignments against random templates,
as determined self-consistently. Sequences of unknown proteins can be aligned
to known protein structures, identifying similar structural motifs and
generating reasonably correct alignments.
3-Dimensional Model for the Hormone-Binding Domains of Steroid-Receptors
We have used a motif-based structural search method to identify
structural homologs of the hormone binding domains of the nuclear receptors
from among a set of known protein structures and have found the closest
similarity with members of the subtilisin-like serine proteases. These
proteins consist of an open twisted sheet of parallel
Protein Tertiary Structure Prediction using Optimized Hamiltonians
with Local Interactions
Protein folding codes embodying local interactions including surface
and secondary structure propensities and residue-residue contacts are optimized
for a set of training proteins by using spin-glass theory. A screening
method based on these codes correctly matches the structure of a set of
test proteins with proteins of similar topology with 100% accuracy, even
with limited sequence similarity between the test proteins and the structural
homologs and the absence of any structurally similar proteins in the training
set.
Optimal Protein-Folding Codes from Spin-Glass Theory
Protein-folding codes embodied in sequence-dependent energy functions
can be optimized using spin-glass theory. Optimal folding codes for associative-memory
Hamiltonians based on aligned sequences are deduced. A screening method
based on these codes correctly recognizes protein structures in the "twilight
zone" of sequence identity in the overwhelming majority of cases. Simulated
annealing for the optimally encoded Hamiltonian generally leads to qualitatively
correct structures.
B. Qian and R. A. Goldstein
Proteins, 45 (2001), 102-104.
P. D. Williams, D. D. Pollock, and R. A. Goldstein
Journal of Molecular Graphics and Modelling 19 (2001), 150-156.
M. Kann, B. Qian, and R.A. Goldstein
Proteins 41 (2000), 498-503.
T.-L. Chiu and R.A. Goldstein
Proteins 41 (2000), 157-163.
M. Kann and R.A. Goldstein
RECOMB 2000, in press.
M.W. Dimmic, D. P. Mindell, and R.A. Goldstein
Pacific Symposium on Biocomputing 2000.
D.M. Taverna and R.A. Goldstein
Pacific Symposium on Biocomputing 2000.
N.E.G. Buchler and R.A. Goldstein
Journal of Chemical Physics 112 (2000), 2533-2547.
However, many of these models have been done within a very narrow focus: namely,
pair-contact interactions and two-letter amino-acid alphabets. Recently,
two-letter amino-acid alphabets for pair-contact models have been shown
to contain designability artifacts which disappear for larger-letter amino-acid
alphabets. In addition, a solvation model was demonstrated to give identical
designability results to previous two-letter amino-acid alphabet pair-contact
models. In light of these discordant results, this report synthesizes a
broad consensus regarding the relationship between specific structural
features, foldability F, energy gap, and structure designability for
different energy models (pair-contact vs.
solvation) across a wide range of amino-acid alphabets. We also propose
a novel measure Z which is shown to be well correlated to designability.
Finally, we conclusively demonstrate that two-letter amino-acid alphabets
for pair-contact models appear to be solvation models in disguise.
D.M. Taverna and R.A. Goldstein
Biopolymers 53 (2000), 1-8.
N.E.G. Buchler and R.A. Goldstein
Journal of Chemical Physics 111 (1999), 6599-6609.
between the native
state and the next highest energy that distinguishes good folders from
bad folders. This duality of measures has led to some controversy and confusion
with little done to reconcile the two. In this paper, we revisit the Random
Energy Model to derive the statistical distributions of the various energy
gaps and foldability. The resulting joint distribution allows us to explicitly
demonstrate the positive correlation between foldability and energy gap.
In addition, we compare the results of this analytical theory with a variety
of lattice models. Our simulations indicate that both the individual distributions
and the joint distribution of foldability and energy gap agree qualitatively
well with the Random Energy Model. It is argued that the universal distribution
of, and the positive correlation between, foldability and energy gap, both
in lattice proteins and the REM, are simply stochastic consequences of
the "Thermodynamic Hypothesis".
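The positive correlation between foldability and energy gap can be explored numerically with a small sketch. It assumes a Random Energy Model in which each sequence has independent Gaussian conformational energies, takes the lowest energy as the native state, defines the gap as the distance to the next-lowest energy, and uses a Z-score-like foldability (mean of the non-native energies minus the native energy, divided by their standard deviation). These definitions, the function names, and the parameter values are assumptions of the sketch and may differ from those used in the paper.

```python
import math
import random
import statistics


def sample_rem(n_states, rng):
    """Draw one random 'sequence': n_states independent Gaussian energies."""
    energies = sorted(rng.gauss(0.0, 1.0) for _ in range(n_states))
    native, rest = energies[0], energies[1:]
    gap = rest[0] - native  # next-lowest energy minus native energy
    # Z-score-like foldability (assumed definition for this sketch).
    foldability = (statistics.fmean(rest) - native) / statistics.pstdev(rest)
    return foldability, gap


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rem_correlation(n_sequences=500, n_states=200, seed=0):
    """Correlation between foldability and gap over many random sequences."""
    rng = random.Random(seed)
    pairs = [sample_rem(n_states, rng) for _ in range(n_sequences)]
    foldabilities, gaps = zip(*pairs)
    return pearson(foldabilities, gaps)


if __name__ == "__main__":
    print(f"foldability/gap correlation: {rem_correlation():.2f}")
```

Because a lower native energy increases both quantities at once, the correlation in this toy model comes out positive without any assumption about sequence design.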
S. Govindarajan, R. Recabarren, and R.A. Goldstein
Proteins 35 (1999), 408-414.
J.M. Koshi, D.P. Mindell, and R.A. Goldstein
Molecular Biology and Evolution 16 (1999), 173-179.
N.E.G. Buchler and R.A. Goldstein
Proteins 34 (1999), 113-124.
T.-L. Chiu and R.A. Goldstein
Protein Engineering 11 (1998), 749-752.
T.-L. Chiu and R.A. Goldstein
Folding & Design 3 (1998), 223-228.
S. Govindarajan and R.A. Goldstein
Proc. Nat'l Acad. Sci (USA) 95 (1998), 5545-5549.
J.M. Koshi and R.A. Goldstein
Proteins 32 (1998), 289-295.
J.M. Koshi, D.P. Mindell, and R.A. Goldstein
Genome Informatics 1997 (1998), refereed conference proceeding.
S. Govindarajan and R.A. Goldstein
Proteins 29 (1997), 461-466.
S. Govindarajan and R.A. Goldstein
Mathematical Modelling and Scientific Computing (1997).
S. Govindarajan and R.A. Goldstein
Biopolymers 42 (1997), 427-438.
M.J. Thompson and R.A. Goldstein
Protein Science 6 (1997), 1963-1975.
T.-L. Chiu and R.A. Goldstein
Journal of Chemical Physics 107 (1997), 4408-4415.
N.E.G. Buchler, E.R.P. Zuiderweg, H. Wang, and R.A. Goldstein
Journal of Magnetic Resonance 125 (1997), 34-42.
J.M. Koshi and R.A. Goldstein
Proteins 27 (1996), 336-344.
α-helical and β-sheet
propensity, and hydrophobicity. In most cases we found the
ΔG of transfer from octanol
to water to be the best model for evolutionary constraints, in contrast to
the much weaker correlation with the
ΔG of transfer from cyclohexane to water, a property
found highly correlated to changes in stability in site-directed
mutagenesis studies. This suggests that natural evolution may follow
different rules than those suggested by results obtained in the
laboratory. A high degree of conservation of a surface residue's relative
hydrophobicity was also observed, a fact which cannot be explained based
on constraints on protein stability, but may reflect the consequences of
the reverse-hydrophobic effect. Local propensity, especially
α-helical propensity,
is rather poorly conserved during evolution, indicating that non-local
interactions dominate protein structure formation. We found that changes
in volume were important in specific cases, most significantly in transitions
among the hydrophobic residues in buried locations. To demonstrate how
these techniques could be used to understand particular protein families,
we derived and analyzed mutation matrices for the hypervariable and framework
regions of antibody light chain V regions. We found a surprisingly high
conservation of hydrophobicity in the hypervariable region, possibly indicating
an important role for hydrophobicity in antigen recognition.
J.M. Koshi and R.A. Goldstein
Journal of Molecular Evolution 42 (1996), 413-420.
S. Govindarajan and R.A. Goldstein
Proc. Nat'l Acad. Sci (USA) 93 (1996), 3341-3345.
M.J. Thompson and R.A. Goldstein
Proteins 25 (1996), 38-47.
M.J. Thompson and R.A. Goldstein
Proteins 25 (1996), 28-37.
J.M. Koshi and R.A. Goldstein
Pacific Symposium on Biocomputing '96, (L. Hunter and T. E. Klein
eds) (1995), 488-499.
ΔG of transfer from water to octanol and cyclohexane,
α-helical and β-sheet propensity, size, and charge. We found that
the ΔG of transfer
to octanol had a high correlation with matrices for all categories of
residues, especially the matrices for buried and exposed positions. This
result suggests that octanol is a good model for understanding both the
changes in stability resulting from substitutions of buried residues and
changes in foldability resulting from varying exposed residues. We also
found the correlations of the matrices with size and charge varied with
the local environment, and that neither
α-helical nor β-sheet propensity had high correlations with most
matrices. Thus, conservation of size and charge appears to be important in
specific environments, and conservation of
α-helix and β-sheet propensity does not seem
to be a key factor.
S. Govindarajan and R.A. Goldstein
Proteins 22 (1995), 413-418.
J.M. Koshi and R.A. Goldstein
Protein Engineering 8 (1995), 641-645.
S. Govindarajan and R.A. Goldstein
Biopolymers 36 (1995), 43-51.
β-hairpins, Greek-key motifs, and jelly rolls, commonly found in
proteins of known structure, have a high degree of optimizability.
Contrary to what might be expected, positive correlations between the
various interactions reduce optimizability. The optimization procedure
produces a correlated energy landscape, which might assist folding.
R.A. Goldstein, Z.A. Luthey-Schulten, and P.G. Wolynes
Protein Structure by Distance Analysis (1994), 135-144.
R.A. Goldstein, J.A. Katzenellenbogen, Z.A. Luthey-Schulten, D.A. Seielstad,
and P.G. Wolynes
Proc. Nat'l Acad. Sci (USA) 90 (1993), 9949-9953.
β-strands flanked on both sides
by α-helices.
The alignment with the protease scaffold was refined by using multiple
sequence prealignment of different sets of nuclear receptors, and alternative
model structures were screened by considering their consistency with the
results of biochemical experiments defining the ligand binding pocket.
In the most favored model, nearly all of the residues thought to be involved
in ligand binding map to a pocket of appropriate dimensions where the subtilisin-like
proteases have their active site. The three-dimensional model that we propose
for the hormone binding domains of the nuclear receptors provides a framework
for the design of experiments to further investigate nuclear receptor structure
and function.
R.A. Goldstein, Z.A. Luthey-Schulten, and P.G. Wolynes
Proc. Nat'l Acad. Sci (USA) 89 (1992), 9029-9033.
R.A. Goldstein, Z.A. Luthey-Schulten, and P.G. Wolynes
Proc. Nat'l Acad. Sci (USA) 89 (1992), 4918-4922.