
ABSTRACTS OF SELECTED PAPERS

Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition.
M. Schmidt am Busch, A. Sedano & T. Simonson (2010) PLOS One, 5, article e10410.

ABSTRACT

Background: Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. Methodology/Principal findings: We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000–300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile hidden Markov model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected, and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed, and ways to improve the method are discussed. Conclusions/Significance: For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
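
The abstract does not name the heuristic used to scan sequence and conformational space. As an illustration only, the sketch below assumes a Metropolis Monte Carlo search over a precomputed table of self and pairwise energies, which is one common way such a scan is organised. The energy tables are random placeholders (`random_tables` is a hypothetical helper) standing in for the molecular mechanics sidechain/backbone energies; each position's integer state stands in for an amino acid type plus rotamer.

```python
import itertools
import math
import random


def random_tables(n_pos, n_states, seed=0):
    """Hypothetical energy tables: self_e[i][a] and pair_e[(i, j)][a][b] for i < j.

    In a real design run these would be molecular mechanics interaction
    energies between sidechain rotamers and the fixed backbone template.
    """
    rng = random.Random(seed)
    self_e = [[rng.uniform(-1.0, 1.0) for _ in range(n_states)] for _ in range(n_pos)]
    pair_e = {
        (i, j): [[rng.uniform(-0.5, 0.5) for _ in range(n_states)] for _ in range(n_states)]
        for i in range(n_pos)
        for j in range(i + 1, n_pos)
    }
    return self_e, pair_e


def total_energy(assignment, self_e, pair_e):
    """Sum of self terms plus all pairwise terms for one full assignment."""
    e = sum(self_e[i][a] for i, a in enumerate(assignment))
    for (i, j), table in pair_e.items():
        e += table[assignment[i]][assignment[j]]
    return e


def mc_design(self_e, pair_e, n_steps=20_000, kT=1.0, seed=1):
    """Metropolis Monte Carlo over assignments; returns best state and all accepted states."""
    rng = random.Random(seed)
    current = [rng.randrange(len(row)) for row in self_e]
    e_cur = total_energy(current, self_e, pair_e)
    best, e_best = list(current), e_cur
    visited = {tuple(current): e_cur}
    for _ in range(n_steps):
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] = rng.randrange(len(self_e[i]))  # mutate one position
        e_trial = total_energy(trial, self_e, pair_e)
        # Metropolis criterion: accept downhill moves, uphill with Boltzmann probability.
        if e_trial <= e_cur or rng.random() < math.exp((e_cur - e_trial) / kT):
            current, e_cur = trial, e_trial
            visited[tuple(current)] = e_cur
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best, visited


def exhaustive_minimum(self_e, pair_e):
    """Brute-force global minimum; feasible only for tiny toy systems."""
    return min(
        total_energy(list(a), self_e, pair_e)
        for a in itertools.product(*(range(len(row)) for row in self_e))
    )
```

On a toy system small enough to enumerate, `exhaustive_minimum` gives a check on the search; the `visited` dictionary plays the role of the collected sequence ensemble from which profiles would later be built. A real design run would have far larger state spaces, where exhaustive checks are impossible and annealing schedules are often added.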


Predicting the acid/base behavior of proteins: a constant-pH Monte Carlo approach with Generalized Born solvent.
A. Aleksandrov, S. Polydorides, G. Archontis & T. Simonson (2010) Journal of Physical Chemistry B, 114, 10634-10648.

ABSTRACT

The acid/base properties of proteins are essential in biochemistry, and proton binding is a valuable reporter on electrostatic interactions. We propose a constant-pH Monte Carlo strategy to compute protonation free energies and pKa’s. The solvent is described implicitly, through a generalized Born (GB) model. The electronic polarizability and backbone motions of the protein are included through the protein dielectric constant. Side chain motions are described explicitly, by the Monte Carlo scheme. An efficient computational algorithm is described, which allows us to treat the fluctuating shape of the protein/solvent boundary in a way that is numerically exact (within the GB framework); this contrasts with several previous constant-pH approaches. For a test set of six proteins and 78 titratable groups, the model performs well, with an rms error of 1.2 pH units. While this is slightly greater than that of a simple null model (rms error of 1.1) and a fully empirical model (rms error of 0.9), it is obtained using physically meaningful model parameters, including a low protein dielectric constant of four. Importantly, similar performance is obtained for side chains with large and small pKa shifts (relative to a standard model compound). The titration curve slopes and the conformations sampled are reasonable. Several directions to improve the method further are discussed.
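
The protonation-state sampling at the heart of a constant-pH Monte Carlo scheme can be sketched for a few titratable sites. This is a simplified illustration, not the authors' method: it assumes fixed intrinsic pKa values and fixed pairwise couplings in kT units, whereas the model described above recomputes Generalized Born energies and moves side chains explicitly. Protonating site i costs ln(10)(pH - pKa_i) in kT, so an isolated site should follow the Henderson-Hasselbalch curve 1/(1 + 10^(pH - pKa)). The function names and the coupling matrix are hypothetical.

```python
import itertools
import math
import random

LN10 = math.log(10.0)


def state_energy(state, pH, pka, coupling):
    """Free energy (units of kT) of a protonation state relative to all-deprotonated.

    state[i] is 1 if site i is protonated. Protonating site i costs
    ln(10) * (pH - pka[i]); coupling[i][j] (kT) is added when both i and j are protonated.
    """
    e = sum(LN10 * (pH - pka[i]) for i, s in enumerate(state) if s)
    for i, j in itertools.combinations(range(len(state)), 2):
        if state[i] and state[j]:
            e += coupling[i][j]
    return e


def mc_titration(pH, pka, coupling, n_steps=50_000, seed=0):
    """Metropolis sampling of protonation states; returns fraction protonated per site."""
    rng = random.Random(seed)
    state = [0] * len(pka)
    e = state_energy(state, pH, pka, coupling)
    protonated = [0] * len(pka)
    for _ in range(n_steps):
        trial = list(state)
        i = rng.randrange(len(pka))
        trial[i] = 1 - trial[i]  # propose a single protonation/deprotonation move
        e_trial = state_energy(trial, pH, pka, coupling)
        if e_trial <= e or rng.random() < math.exp(e - e_trial):
            state, e = trial, e_trial
        for k, s in enumerate(state):
            protonated[k] += s
    return [c / n_steps for c in protonated]


def exact_titration(pH, pka, coupling):
    """Boltzmann average over all 2**n protonation states (small n only)."""
    n = len(pka)
    weights = {
        state: math.exp(-state_energy(state, pH, pka, coupling))
        for state in itertools.product((0, 1), repeat=n)
    }
    z = sum(weights.values())
    return [sum(w for s, w in weights.items() if s[k]) / z for k in range(n)]
```

Scanning `pH` and locating where each site's fraction crosses 0.5 gives apparent pKa values; with a repulsive coupling, two sites with the same intrinsic pKa are each less than half protonated at pH = pKa, which is the kind of electrostatic pKa shift the method is designed to capture.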


Dynamics of beta 3 integrin I-like and Hybrid domains: Insight from simulations on the mechanism of transition between open and closed forms.
T. Gaillard, A. Dejaegere & R. Stote (2009) Proteins, 76, 977-994.

ABSTRACT

The conformational dynamics of the I-like and Hybrid domains from the beta 3 integrin headpiece were studied by molecular dynamics simulation and normal mode analysis. Crystallographic structures of integrins show that the integrin headpiece can exist in markedly different conformations, manifested by a significant difference in the angle between the I-like and Hybrid domains. The relative orientation of these two domains is believed to be a crucial element of integrin function, as it may translate local structural modifications induced by ligand binding into large-scale conformational changes. To investigate the detailed mechanisms responsible for this coupling, we carried out molecular dynamics simulations of the I-like/Hybrid system and employed quasi-harmonic and normal mode analyses to characterize the large-scale motions. Our results show that the conformational transition of the I-like and Hybrid domains inferred from crystallographic data is contained in the low-frequency dynamics of the system. Using targeted molecular dynamics simulations, we investigated the roles played by two structural elements of the I-like domain, the alpha 7 and alpha 1 helices, in the interdomain transition. From our results, we propose that these two helices function in tandem to initiate the large-scale interdomain conformational transition apparent in integrin activation and signaling.
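
Quasi-harmonic analysis extracts large-scale motions from an MD trajectory by diagonalising the positional covariance matrix: the largest-variance eigenvectors correspond to the lowest effective frequencies, omega = sqrt(kT / variance). The pure-Python sketch below finds only the dominant mode by power iteration and assumes unit masses and frames already superimposed on a reference. A production analysis would mass-weight the coordinates and use a full eigensolver; this is an illustration of the idea, not the authors' code.

```python
import math
import random


def covariance_matrix(frames):
    """Positional covariance of a trajectory; frames are equal-length coordinate lists."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for f in frames:
        d = [f[k] - mean[k] for k in range(dim)]
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += d[a] * d[b] / n
    return cov


def dominant_mode(cov, n_iter=500, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    dim = len(cov)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no fluctuations at all
            return 0.0, v
        v, lam = [x / norm for x in w], norm
    return lam, v


def quasi_harmonic_frequency(variance, kT=1.0):
    """Effective frequency omega = sqrt(kT / variance), assuming unit masses."""
    return math.sqrt(kT / variance)
```

For a real I-like/Hybrid trajectory the frames would hold 3N Cartesian coordinates; projecting the trajectory onto the dominant mode then shows whether that low-frequency motion resembles the open/closed difference seen between crystal structures.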

Tet repressor induction by tetracycline: a molecular dynamics, continuum electrostatics, and crystallographic study.
A. Aleksandrov, L. Schuldt, W. Hinrichs & T. Simonson (2008) Journal of Molecular Biology, 378, 896-910.

ABSTRACT

The Tet repressor (TetR) mediates the most important mechanism of bacterial resistance against tetracycline antibiotics (Tc). In the absence of Tc, TetR is tightly bound to its operator DNA; upon binding of Tc with an associated Mg ion, it dissociates from the DNA, allowing expression of the repressed genes. Its tight control by Tc makes TetR broadly useful in genetic engineering. The Tc binding site is over 20 Angstroms from the DNA, so the binding signal must propagate over a long distance. We use molecular dynamics simulations and continuum electrostatic calculations to test two models of the allosteric mechanism. We simulate the TetR:DNA complex, the Tc-bound "induced" TetR, and the transition pathway between them. The simulations support the model inferred previously from the crystal structures and reveal new details. When [Tc:Mg] binds, the Mg ion makes direct and water-mediated interactions with helix 8 of one TetR monomer and helix 6 of the other monomer, and helix 6 is pulled in towards the central core of the structure. Hydrophobic interactions with helix 6 then pull helix 4 in a pendulum motion, with a maximal displacement at its N-terminus: the DNA interface. The crystal structure of an additional TetR reported here corroborates this motion. The N-terminal residue of helix 4, Lys48, is highly conserved in DNA-binding regulatory proteins of the TetR class and makes the largest contribution of any amino acid to the TetR:DNA binding free energy. Thus, the conformational changes lead to a drastic reduction in the TetR:DNA binding affinity, allowing TetR to detach itself from the DNA. Tc plays the role of a specific Mg carrier, whereas the Mg ion itself makes key interactions that trigger the allosteric transition in the TetR:Tc complex.

Homology modelling of protein-protein complexes: a simple method and its possibilities and limitations.
G. Launay & T. Simonson (2008) BMC Bioinformatics, 9, 427-443.

ABSTRACT

Structure-based computational methods are needed to help identify and characterize protein-protein complexes and their function. For individual proteins, the most successful technique is homology modelling. We investigate a simple extension of this technique to protein-protein complexes. We consider a large set of complexes of known structures, involving pairs of single-domain proteins. The complexes are compared with each other to establish their sequence and structural similarities and the relation between the two. Compared to earlier studies, a simpler dataset, a simpler structural alignment procedure, and an additional energy criterion are used. Next, we compare the X-ray structures to models obtained by threading the native sequence onto other, homologous complexes. An elementary requirement for a successful energy function is to rank the native structure above any threaded structure. We use the DFIRE(beta) energy function, whose quality and complexity are typical of the models used today. Finally, we compare near-native models to distinctly non-native models. If weakly stable complexes (defined by a binding energy cutoff) and a few unusual complexes are excluded, a simple homology principle holds: complexes that share more than 35% sequence identity share similar structures and interaction modes; this principle was less clear-cut in earlier studies. The energy function was then tested for its ability to identify experimental structures among sets of decoys produced by a simple threading procedure. On average, the experimental structure is ranked above 92% of the alternate structures. Thus, discrimination of the native structure is good but not perfect, and the discrimination of near-native structures is fair. Typically, a single alternate, non-native binding mode exists that has a native-like energy. Some of the associated failures may correspond to genuine alternate binding modes and/or to native complexes that are artefacts of the crystal environment. In other cases, additional model filtering with more sophisticated tools is needed. The results suggest that the simple modelling procedure applied here could help identify and characterize protein-protein complexes. The next step is to apply it at a genomic scale.
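
Two quantities in this abstract are easy to make concrete: the sequence identity behind the 35% homology rule, and the fraction of decoys ranked below the native structure. The sketch assumes one identity convention (identical positions over gap-free aligned columns) and assumes that lower energy means a better model; the abstract states neither detail, and both function names are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    This is one of several common conventions; others divide by the
    length of the shorter sequence or of the whole alignment.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def fraction_outranked(native_energy, decoy_energies):
    """Fraction of decoys whose energy is strictly higher (worse) than the native's."""
    if not decoy_energies:
        raise ValueError("need at least one decoy")
    return sum(e > native_energy for e in decoy_energies) / len(decoy_energies)
```

Averaging `fraction_outranked` over every complex in a test set gives a statistic of the kind quoted above (the experimental structure ranked above 92% of the alternatives on average), while `percent_identity(...) > 35.0` expresses the simple homology rule.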

Computational protein design as a tool for fold recognition.
M. Schmidt am Busch, D. Mignon & T. Simonson (2009) Proteins, 77, 139-158.

ABSTRACT

Computationally designed protein sequences have been proposed as a basis to perform fold recognition and homology searching. To investigate this possibility, an automated procedure is used to completely redesign 24 SH3 proteins and 22 SH2 proteins. We use the experimental backbone coordinates as fixed templates in the folded state and a molecular mechanics model to compute the pairwise interaction energies between all sidechain types and conformations. Energy calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is then used to scan the sequence and conformational space for optimal solutions. We produced 200,000-450,000 sequences for each backbone template. The designed sequences resemble moderately distant natural homologues of the initial templates, according to their identity scores and their similarity with respect to the Pfam sets of SH2 and SH3 domains. Standard homology detection tools document their native-like character: the Conserved Domain Database recognizes 61% (52%) of our low-energy sequences as SH3 (SH2) domains; the SUPERFAMILY hidden Markov model library recognizes 81% (84%). Conversely, Position Specific Scoring Matrices (PSSMs) derived from our designed sequences can be used to detect natural homologues in sequence databases. Within SwissProt, a set of natural SH3 PSSMs detects 772 SH3 domains, for example; our designed PSSMs detect 67% of these, plus one additional sequence and two false positives. If six amino acids involved in substrate binding (a selective pressure not accounted for in our design) are reset to their experimental types, then 77% of the experimental SH3 domains are detected. Results for the SH2 domains are similar. Several directions to improve the method further are discussed.
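
A position-specific scoring matrix turns an ensemble of aligned sequences, here designed ones, into per-position log-odds scores. The sketch below is a deliberately simple version that assumes gap-free sequences of equal length, a flat pseudocount, and a uniform background over the 20 amino acids. Tools such as PSI-BLAST use more elaborate pseudocount and background schemes, so this is not the exact procedure behind the detection rates quoted above.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Base-2 log-odds PSSM from gap-free, equal-length sequences.

    Assumes a flat pseudocount per amino acid and a uniform background (1/20).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned)  # missing residues count as 0
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, query):
    """Ungapped score of a query with the same length as the profile."""
    if len(query) != len(pssm):
        raise ValueError("query length must match the profile length")
    return sum(column[aa] for column, aa in zip(pssm, query))
```

A query scoring above a chosen threshold would be reported as a putative homologue; in a real database search the query is first aligned to the profile rather than being required to have the same length.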

Molecular dynamics simulations of the 30S ribosomal subunit reveal a preferred tetracycline binding site.
A. Aleksandrov & T. Simonson (2008) Journal of the American Chemical Society, 130, 1114-1115.

ABSTRACT

Tetracyclines (Tc) are important antibiotics that inhibit bacterial ribosomes. Two and six Tc binding sites, respectively, were seen in two X-ray structures of the Thermus thermophilus 30S ribosome subunit; the exact functional role of the various sites remains unclear. We study the two consensus sites, seen in both structures: a primary site, which is positioned to block tRNA A-site binding, and a secondary site, which has a weaker electron density. We combine molecular dynamics simulations and continuum electrostatic calculations to estimate the relative affinities of the two sites. The dielectric constant of the ribosome is set to 8, to reproduce the experimental binding free energy differences between Tc and its analogues minocycline and doxycycline, as well as more rigorous free energy simulations of Mg2+ binding. We find that both sites include a pre-bound Mg2+ ion, present before Tc binds. Using long simulations and comparing 8 structural models for each site, we then show that primary site Tc binding is stronger by 1-4 kcal/mol; this range appears consistent with the crystallographically observed occupancies of the two sites. With this free energy range, the secondary site (TET5) is largely unoccupied under physiological conditions. Thus, we propose that the primary site is the inhibitory site and that allosteric effects may not be essential for tetracycline function.
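
The 1-4 kcal/mol preference for the primary site translates into a ratio of binding constants through K_primary / K_secondary = exp(ddG / RT). The sketch below evaluates that conversion at an assumed 298.15 K; the range corresponds to roughly a 5-fold to 850-fold stronger affinity for the primary site. Turning this into absolute occupancies would also require the Tc concentration and an absolute affinity, which the abstract does not give, and `affinity_ratio` is a hypothetical helper name.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def affinity_ratio(ddg_kcal, temperature=298.15):
    """Ratio K_primary / K_secondary implied by a binding free energy difference.

    ddg_kcal = dG_secondary - dG_primary, positive when the primary site binds more strongly.
    """
    return math.exp(ddg_kcal / (R_KCAL * temperature))
```

Because the ratio is exponential in ddG, the uncertainty spanned by the 1-4 kcal/mol range amounts to more than two orders of magnitude in relative affinity.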

Neutral evolution of proteins: the superfunnel in sequence space and its relation to mutational robustness.
J. Noirel & T. Simonson (2008) Journal of Chemical Physics, 129, 185104-185112.

ABSTRACT

Following Kimura's neutral theory of molecular evolution (M. Kimura, The Neutral Theory of Molecular Evolution, Cambridge University Press, Cambridge, 1983), it has become common to assume that the vast majority of viable mutations of a gene confer little or no functional advantage. Yet in silico models of protein evolution have shown that mutational robustness of sequences could be selected for, even in the context of neutral evolution. The evolution of a biological population can be seen as a diffusion on the network of viable sequences. This network is called a "neutral network." Depending on the mutation rate μ and the population size N, the biological population can evolve purely randomly (μN << 1) or it can evolve in such a way as to select for sequences of higher mutational robustness (μN >> 1). The stringency of the selection depends not only on the product μN but also on the exact topology of the neutral network, the special arrangement of which was named a "superfunnel." Even though the relation between mutation rate, population size, and selection has been thoroughly investigated, a study of the salient topological features of the superfunnel that could affect the strength of the selection was lacking. This question is addressed in this study. We use two different models of proteins: on lattice and off lattice. We compare neutral networks computed using these models to random networks. From this, we identify two important factors of the topology that determine the stringency of the selection for mutationally robust sequences. First, the presence of highly connected nodes (hubs) in the network increases the selection for mutationally robust sequences. Second, the stringency of the selection increases when the correlation between a sequence's mutational robustness and that of its neighbors increases. The latter finding relates a global characteristic of the neutral network to a local one, which is attainable through experiments or molecular modeling.
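
The two limiting regimes can be illustrated on a toy neutral network. In the μN << 1 limit the population performs a neutral random walk and samples viable sequences uniformly, so its mean robustness is the network's average degree divided by the number of point mutants. In the μN >> 1 limit, a standard result from earlier neutral-network studies (assumed here, not derived in this abstract) is that the population concentrates on the principal eigenvector of the adjacency matrix, so mean robustness equals the spectral radius divided by the number of point mutants. The binary alphabet and the viability rule below are hypothetical stand-ins for the lattice and off-lattice protein models.

```python
import itertools


def viable(seq):
    """Hypothetical viability rule: a binary 'sequence' folds if it has at most three 1s."""
    return sum(seq) <= 3


def neutral_network(length, is_viable):
    """Nodes are viable sequences; edges join sequences one point mutation apart."""
    nodes = [s for s in itertools.product((0, 1), repeat=length) if is_viable(s)]
    index = {s: i for i, s in enumerate(nodes)}
    neighbors = []
    for s in nodes:
        nbrs = []
        for k in range(length):
            mutant = s[:k] + (1 - s[k],) + s[k + 1:]
            if mutant in index:
                nbrs.append(index[mutant])
        neighbors.append(nbrs)
    return nodes, neighbors


def robustness_small_population(neighbors, length):
    """muN << 1: uniform sampling of the network.

    Mean robustness = average fraction of point mutants that remain viable.
    """
    return sum(len(n) for n in neighbors) / (len(neighbors) * length)


def robustness_large_population(neighbors, length, n_iter=2000):
    """muN >> 1: mean robustness = lambda_max / length (binary alphabet assumed).

    Power iteration runs on A + I so it also converges on bipartite networks.
    """
    v = [1.0] * len(neighbors)
    lam_plus_one = 1.0
    for _ in range(n_iter):
        w = [v[i] + sum(v[j] for j in nbrs) for i, nbrs in enumerate(neighbors)]
        lam_plus_one = max(w)
        v = [x / lam_plus_one for x in w]
    return (lam_plus_one - 1.0) / length
```

Because the spectral radius of a graph is at least its average degree, with equality for regular networks, the large-population robustness is never lower than the small-population value; heterogeneous networks with pronounced hubs tend to widen that gap, in line with the first finding above.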

Probing electrostatic interactions and ligand binding in aspartyl-tRNA synthetase through site-directed mutagenesis and computer simulations.
D. Thompson, C. Lazennec, P. Plateau & T. Simonson (2008) Proteins, 71, 1450-1460.

ABSTRACT

Faithful genetic code translation requires that each aminoacyl-tRNA synthetase recognise its cognate amino acid ligand specifically. Aspartyl-tRNA synthetase (AspRS) distinguishes between its negatively charged Asp substrate and two competitors, neutral Asn and dianionic succinate, using a complex network of electrostatic interactions. Here, we used molecular dynamics simulations and site-directed mutagenesis experiments to probe these interactions further. We attempt to decrease the Asp/Asn binding free energy difference via single, double and triple mutations that reduce the net positive charge in the active site of Escherichia coli AspRS. Earlier, Glutamine 199 was changed to a negatively charged glutamate, giving a computed reduction in Asp affinity in good agreement with experiment. Here, Lysine 198 was changed to a neutral leucine; then, Lys198 and Gln199 were mutated simultaneously. Both mutants are predicted to have reduced Asp binding and improved Asn binding, but the changes are insufficient to overcome the initial, high specificity of the native enzyme, which retains a preference for Asp. Probing the aminoacyl-adenylation reaction through pyrophosphate exchange experiments, we found no detectable activity for the mutant enzymes, indicating weaker Asp binding and/or poorer transition state stabilization. The simulations show that the mutations' effect is partly offset by proton uptake by a nearby histidine. Therefore, we performed additional simulations where the nearby Histidines 448 and 449 were mutated to neutral or negative residues: {Lys198Leu, His448Gln, His449Gln} and {Lys198Leu, His448Glu, His449Gln}. This led to unexpected conformational changes and loss of active site preorganization, suggesting that the AspRS active site has a limited structural tolerance for electrostatic modifications. The data give insights into the complex electrostatic network in the AspRS active site and illustrate the difficulty in engineering charged-to-neutral changes of the preferred ligand.
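
Relative binding free energies such as the Asp/Asn difference are typically obtained from alchemical simulations through a thermodynamic cycle. The abstract does not spell this out, so the relations below are offered as the standard bookkeeping rather than a description of the exact protocol used. Transforming the ligand from Asp to Asn both in the complex and free in solution closes the cycle:

```latex
\Delta\Delta G_{\mathrm{bind}}(\mathrm{Asp}\to\mathrm{Asn})
  = \Delta G_{\mathrm{bind}}(\mathrm{Asn}) - \Delta G_{\mathrm{bind}}(\mathrm{Asp})
  = \Delta G^{\mathrm{complex}}_{\mathrm{Asp}\to\mathrm{Asn}} - \Delta G^{\mathrm{free}}_{\mathrm{Asp}\to\mathrm{Asn}}
```

A positive value means Asp is preferred. A mutation that weakens Asp binding and strengthens Asn binding lowers this quantity, and its effect on specificity is the difference between the mutant and wild-type values of the same expression.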