tél : 01.69.35.97.15 - fax : 02.38.25.55.83

Rougier, N. P., Hinsen, K., Alexandre, F., Arildsen, T., Barba, L., Benureau, F. C. Y., Brown, C. T., De Buyl, P., Caglayan, O., Davison, A. P., Delsuc, M. A., Detorakis, G. Diem, A. K., Drix, D., Enel, P., Girard, B., Guest, O., Hall, M. G., Henriques, R. N., Hinaut, X., Jaron, K. S, Khamassi, M., Klein, A., Manninen, T., Marchesi, P., McGlinn, D., Metzner, C., Petchey, O. L., Plesser, H. E., Poisot, T., Ram, K., Ram, Y., Roesch, E., Rossant, C., Rostami, V., Shifman, A., Stachelek, J., Stimberg, M., Stollmeier, F., Vaggi, F., Viejo, G., Vitay, J., Vostinar, A., Yurchak, R. and Zito, T. (

Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results, however computational science lags behind. In the best case, authors may provide their source code as a compressed archive and they may feel confident their research is reproducible. But this is not exactly true. James Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer-reviews. Existing journals have been slow to adapt : source codes are rarely requested, hardly ever actually executed to check that they produce the results advertised in the article. ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from other traditional scientific journals. ReScience resides on GitHub where each new implementation of a computational study is made available together with comments, explanations, and software tests.

Frustrated by another failed software installation ? Wondering why you can’t reproduce your colleagues’ computations ? This story will tell you why. It won’t magically solve your problems, but it does point out a glimpse of hope for the future.

With the development of new experimental technologies, biologists are faced with an avalanche of data to be computationally analyzed for scientific advancements and discoveries to emerge. Faced with the complexity of analysis pipelines, the large number of computational tools, and the enormous amount of data to manage, there is compelling evidence that many if not most scientific discoveries will not stand the test of time : increasing the reproducibility of computed results is of paramount importance.

The objective we set out in this paper is to place scientific workflows in the context of reproducibility. To do so, we define several kinds of reproducibility that can be reached when scientific workflows are used to perform experiments. We characterize and define the criteria that need to be catered for by reproducibility-friendly scientific workflow systems, and use such criteria to place several representative and widely used workflow systems and companion tools within such a framework. We also discuss the remaining challenges posed by reproducible scientific workflows in the life sciences. Our study was guided by three use cases from the life science domain involving in silico experiments.

Many of us write code regularly as part of our scientific activity, perhaps even as a full-time job. But even though we write—and use—more and more code, we rarely think about the roles that this code will have in our research, in our publications, and ultimately in the scientific record. In this article, the author outlines some frequent roles of code in computational science. These roles aren’t exclusive ; in fact, it’s common for a piece of code to have several roles, at the same time or as an evolution over time. Thinking about these roles, ideally before starting to write the code, is a good habit to develop.

Hinsen K. and Kneller G. R. (

Anomalous diffusion is characterized by its asymptotic behavior for t —> infinity. This makes it difficult to detect and describe in particle trajectories from experiments or computer simulations, which are necessarily of finite length. We propose a new approach using Bayesian inference applied directly to the observed trajectories sampled at different time scales. We illustrate the performance of this approach using random trajectories with known statistical properties and then use it for analyzing the motion of lipid molecules in the plane of a lipid bilayer.

Computers are the only research tools that by design exhibit chaotic behavior : a minimal change in the input to a computation can change its output in any imaginable way. Developers and users of scientific software should be aware of this feature and set up safety nets for protecting themselves against bad surprises.

Pouzat C., Davison A. and Hinsen K. (

Dans les articles de recherche ordinaires, tout n’est pas écrit, loin de là :

des connaissances sont présupposées, des détails techniques sont omis.

Depuis quelques années, des chercheurs ont entrepris de publier des

articles qui, en plus du contenu scientifique classique, contiennent toute

l’information nécessaire à la reproduction de celui-ci, une fois les données

acquises. Des logiciels spécifiques ont été mis au point pour rendre aisée

la production de tels articles. Les chercheurs ont intérêt à en profiter, car

les revues scientifiques et les agences de financement de la recherche leur

demandent de plus en plus d’adopter ce genre de pratiques.

A coarse-grained geometrical model for protein secondary-structure description and analysis is presented which uses only the positions of the C(alpha) atoms. A space curve connecting these positions by piecewise polynomial interpolation is constructed and the folding of the protein backbone is described by a succession of screw motions linking the Frenet frames at consecutive C(alpha) positions. Using the ASTRAL subset of the SCOPe database of protein structures, thresholds are derived for the screw parameters of secondary-structure elements and demonstrate that the latter can be reliably assigned on the basis of a C(alpha) model. For this purpose, a comparative study with the widely used DSSP (Define Secondary Structure of Proteins) algorithm was performed and it was shown that the parameter distribution corresponding to the ensemble of all pure C(alpha) structures in the RCSB Protein Data Bank matches that of the ASTRAL database. It is expected that this approach will be useful in the development of structure-refinement techniques for low-resolution data.

Technical debt is a useful metaphor for understanding the long-term impact of technical choices. The analysis of technical debt has found its place in the software industry, with results that are applicable to scientific software as well. But technical debt is also a useful concept to look at how we use computers in doing scientific research.

Here, Konrad Hinsen talks about how one aspect of validating a piece of software is to check that it does what it’s expected to do. But how do you write down your expectations ?

Numerical solutions of mathematical equations in scientific models are the result of several approximation steps. Konrad Hinsen uses a simulation of the solar system as an example for illustrating these approximations and explaining their role in the difficult problem of testing scientific software.

The 18 kDa protein TSPO is a highly conserved transmembrane protein found in bacteria, yeast, animals and plants. TSPO is involved in a wide range of physiological functions, among which the transport of several molecules. The atomic structure of monomeric ligand-bound mouse TSPO in detergent has been published recently. A previously published low-resolution structure of Rhodobacter sphaeroides TSPO, obtained from tubular crystals with lipids and observed in cryo-electron microscopy, revealed an oligomeric structure without any ligand. We analyze this electron microscopy density in view of available biochemical and biophysical data, building a matching atomic model for the monomer and then the entire crystal. We compare its intra- and inter-molecular contacts with those predicted by amino acid covariation in TSPO proteins from evolutionary sequence analysis. The arrangement of the five transmembrane helices in a monomer of our model is different from that observed for the mouse TSPO. We analyze possible ligand binding sites for protoporphyrin, for the high-affinity ligand PK 11195, and for cholesterol in TSPO monomers and/or oligomers, and we discuss possible functional implications.

The lack of replicability and reproducibility of scientific studies based on computational methods has lead to serious mistakes in published scientific findings, some of which have been discovered and publicized recently. Many strategies are currently pursued to improve the situation. This article reports the first conclusions from the ActivePapers project, whose goal is the development and application of a computational platform that allows the publication of computational research in a form that enables installation-free deployment, encourages reuse, and permits the full integration of datasets and software into the scientific record. The main finding is that these goals can be achieved with existing technology, but that there is no straightforward way to adapt legacy software to such a framework.

Hinsen, K. (

Computational techniques have revolutionized many aspects of scientific research over the last few decades. Experimentalists use computation for data analysis, processing ever bigger data sets. Theoreticians compute predictions from ever more complex models. However, traditional articles do not permit the publication of big data sets or complex models. As a consequence, these crucial pieces of information no longer enter the scientific record. Moreover, they have become prisoners of scientific software : many models exist only as software implementations, and the data are often stored in proprietary formats defined by the software. In this article, I argue that this emphasis on software tools over models and data is detrimental to science in the long term, and I propose a means by which this can be reversed.

The MOlecular SimulAtion Interchange Conventions (MOSAIC) consist of a data model for molecular simulations and of concrete implementations of this data model in the form of file formats. MOSAIC is designed as a modular set of specifications, of which the initial version covers molecular structure and configurations. A reference implementation in the Python language facilitates the development of simulation software based on MOSAIC.

Fuglebakk E., Reuter N. and Hinsen K. (

Elastic network models (ENMs) are valuable tools for investigating collective motions of proteins, and a rich variety of simple models have been proposed over the past decade. A good representation of the collective motions requires a good approximation of the covariances between the fluctuations of the individual atoms. Nevertheless, most studies have validated such models only by the magnitudes of the single-atom fluctuations they predict. In the present study, we have quantified the agreement between the covariance structure predicted by molecular dynamics (MD) simulations and those predicted by a representative selection of proposed coarse-grained ENMs. We then contrast this approach with the comparison to MD-predicted atomic fluctuations and comparison to crystallographic B-factors. While all the ENMs yield approximations to the MD-predicted covariance structure, we report large and consistent differences between proposed models. We also find that the ability of the ENMs to predict atomic fluctuations is correlated with their ability to capture the covariance structure. In contrast, we find that the models that agree best with B-factors model collective motions less reliably and recommend against using B-factors as a benchmark.

We study the dynamical transition of human acetylcholinesterase by analyzing elastic neutron scat- tering data with a simulation gauged analytical model that goes beyond the standard Gaussian ap- proximation for the elastic incoherent structure factor [G. R. Kneller and K. Hinsen, J. Chem. Phys. 131, 045104 (2009)]. The model exploits the whole available momentum transfer range in the ex- perimental data and yields not only a neutron-weighted average of the atomic mean square position fluctuations, but also an estimation for their distribution. Applied to the neutron scattering data from human acetylcholinesterase, it reveals a strong increase of the motional heterogeneity at the two transition temperatures T = 150 K and T = 220 K, respectively, which can be located with less am- biguity than with the Gaussian model. We find that the first transition is essentially characterized by a change in the form of the elastic scattering profile and the second by a homogeneous increase of all motional amplitudes. These results are in agreement with previous combined experimental and simulation studies of protein dynamics, which attribute the first transition to an onset of methyl rotations and the second to more unspecific diffusion processes involving large amplitude motions.

In the present work, we propose a simple model-free approach for the computation of molecular dif- fusion tensors from molecular dynamics trajectories. The method uses a rigid body trajectory of the molecule under consideration, which is constructed a posteriori by an accumulation of quaternion- based superposition fits of consecutive conformations. From the rigid body trajectory, we compute the translational and angular velocities of the molecule and by integration of the latter also the cor- responding angular trajectory. All quantities can be referred to the laboratory frame and a molecule- fixed frame. The 6 × 6 diffusion tensor is computed from the asymptotic slope of the tensorial mean square displacement and, for comparison, also from the Kubo integral of the velocity cor- relation tensor. The method is illustrated for two simple model systems – a water molecule and a lysozyme molecule in bulk water. We give estimations of the statistical accuracy of the calculations.

In all-atom molecular simulation studies of proteins, each atom in the protein is represented by a point mass and interactions are defined in terms of the atomic positions. In recent years, various simplified approaches have been proposed. These approaches aim to improve computational effi- ciency and to provide a better physical insight. The simplified models can differ widely in their description of the geometry and the interactions inside the protein. This study explores the most fun- damental choice in the simplified protein models : the choice of a coordinate set defining the protein structure. A simplified model can use fewer point masses than the all-atom model and/or eliminate some of the internal coordinates of the molecule by setting them to an average or ideal value. We look at the implications of such choices for the overall protein structure. We find that care must be taken for angular coordinates, where even very small variations can lead to significant changes in the positions of far away atoms. In particular, we show that the φ/ψ torsion angles are not a sufficient coordinate set, whereas another coordinate set with two degrees of freedom per residue, virtual Cα backbone bond, and torsion angles performs satisfactorily.

Did you ever think about what your ideal work environment for scientific programming would be like ? Here’s my version of it.

Reproducible research will change not only the way we run computations, but also the way we write scientific software.

Technology being developed right now in computer science labs might change the way scientists and engineers write programs in the future.

Hinsen, K. (

In the long run, your data matters more than your code. It’s worth investing some effort to keep your data in good shape for years to come. Data is at the heart of science. A scientist is expected to be able to back up all published conclusions with data. Data management should thus be a priority in science. Scientists can’t afford to lose data, be uncertain of what it means, or not know where it came from. In the experimental sciences, there’s a long tradition of writing down all experimental setups, parameters, and results meticulously in a lab notebook. Unfortunately, computational science is much less rigorous about data handling, although there are clear signs of improvement. Here, we’ll consider what you can do to better prepare and manage your data.

Photoionization of protein ions : The ionization energy of polyprotonated protein cations in the gas phase measured using VUV synchrotron radiation appears to be correlated with the charge state z of the protein and its tertiary structure. A simple electrostatic model accounts for the results and also shows predictive capabilities to derive a mean radius R(m) of the protein ion from the ionization energy, and vice versa.

We present an implementation of path integral molecular dynamics for sampling low temperature properties of doped helium clusters using Langevin dynamics. The robustness of the path integral Langevin equation and white-noise Langevin equation [M. Ceriotti, M. Parrinello, T. E. Markland, and D. E. Manolopoulos, J. Chem. Phys. 133, 124104 (2010)] sampling methods are considered for those weakly bound systems with comparison to path integral Monte Carlo (PIMC) in terms of efficiency and accuracy. Using these techniques, convergence studies are performed to confirm the systematic error reduction introduced by increasing the number of discretization steps of the path integral. We comment on the structural and energetic evolution of HeN−CO2 clusters from N = 1 to 20. To quantify the importance of both rotations and exchange in our simulations, we present a chemical potential and calculated band origin shifts as a function of cluster size utilizing PIMC sampling that includes these effects. This work also serves to showcase the implementation of path integral simulation techniques within the molecular modelling toolkit [K. Hinsen, J. Comp. Chem. 21, 79 (2000)], an open-sourcemolecular simulation package.

We present a new version of the program package nMoldyn, which has been originally developed for a neutron-scattering oriented analysis of molecular dynamics simulations of macromolecular systems (Kneller et al., Comput. Phys. Commun. 1995, 91, 191) and was later rewritten to include in-depth time series analyses and a graphical user interface (Rog et al., J. Comput. Chem. 2003, 24, 657). The main improvement in this new version and the focus of this article are the parallelization of all the analysis algorithms for use on multicore desktop computers as well as distributed-memory computing clusters. The parallelization is based on a task farming approach which maintains a simple program structure permitting easy modification and extension of the code to integrate new analysis methods.

We present a model for the local diffusion-relaxation dynamics of the Cα-atoms in proteins describing both the diffusive short-time dynamics and the asymptotic long-time relaxation of the position autocorrelation functions. The relaxation rate spectra of the latter are represented by shifted gamma distributions, where the standard gamma distribution describes anomalous slow relaxation in macromolecular systems of infinite size and the shift accounts for a smallest local relaxation rate in macromolecules of finite size. The resulting autocorrelation functions are analytic for any time t ⩾ 0. Using results from a molecular dynamics simulation of lysozyme, we demonstrate that the model fits the position autocorrelation functions of the Cα-atoms exceptionally well and reveals moreover a strong correlation between the residue’s solvent-accessible surface and the fitted model parameters.

The notion of state is fundamental to the study of dynamical systems. State refers to the minimal set of information that fully describes the system at a specific time. The simplest state model is found in Fortran 77, where a program’s state consists of the values of all its scalar variables and of all the elements of all arrays, the statement to be executed next, and the statements at which execution resumes at the end of each active subroutine or function. In the following, I’ll concentrate on the variables, because the statement references are part of control flow and not directly accessible to the programmer. It’s important, however, to remember that each state is associated with a precise point in the program’s execution. The two important concepts in understanding data handling in Fortran 77 are thus value and variable, where the latter term includes the elements of arrays. Each variable has a value at each point in execution. This value is changed by an assignment statement. It can be used in an expression, where it’s combined with other values (constants or the values of other variables) by operations, such as arithmetic, to produce new values. Languages such as C or Fortran 90 that offer pointers and dynamic memory allocation lead to a much more complicated description of state. Obviously the contents of the dynamically allocated storage must be included in the total state of a program, but that’s only the first step. Each such storage block can itself contain pointers to other storage blocks. The neat distinction between variables and values disappears : the value of a variable can define another variable. As a consequence, it’s no longer possible to compile a list of all parts of a program’s state just by looking at its source code. What is or isn’t part of the state is defined by the state itself.

Hinsen, K. (

Kneller, G.R. & Hinsen, K. (

Kneller, G., Hinsen, K., Sutmann, G., & Calandrini, V. (

Motivation : In the study of the structural flexibility of proteins, crystallographic Debye-Waller factors are the most important experimental information used in the calibration and validation of computational models, such as the very successful elastic network models (ENMs). However, these models are applied to single protein molecules, whereas the experiments are performed on crystals. Moreover, the energy scale in standard ENMs is undefined and must be obtained by fitting to the same data that the ENM is trying to predict, reducing the predictive power of the model. Results : We develop an elastic network model for the whole protein crystal in order to study the influence of crystal packing and lattice vibrations on the thermal fluctuations of the atom positions. We use experimental values for the compressibility of the crystal to establish the energy scale of our model. We predict the elastic constants of the crystal and compare with experimental data. Our main findings are (1) crystal packing modifies the atomic fluctuations considerably and (2) thermal fluctuations are not the dominant contribution to crystallographic Debye-Waller factors.

The influence of solvent on the slow internal dynamics of proteins is studied by comparing Molecular Dynamics simulations of solvated and unsolvated lysozyme. The dynamical trajectories are projected onto the protein’s normal modes in order to obtain a separate analysis for each of the associated time scales. The results show that solvent effects are important for the slowest motions (below ï¿1/2 1/ps) but negligible for faster motions. The damping effects seen in the latter show that the principal source of friction in protein dynamics is not the solvent, but the protein itself.

Calandrini, V ; Hamon, V ; Hinsen, K ; Calligari, P ; Bellissent-Funel, M ; Kneller, GR (

The paper presents a study of the influence of non-denaturing hydrostatic pressure on the relaxation dynamics of lysozyme in solution, which combines molecular dynamics simulations and quasielastic neutron scattering experiments. We compare results obtained at ambient pressure and at 3 kbar. To interpret the simulation results and the experimental data, we adopt the fractional Ornstein-Uhlenbeck process as a model for the internal relaxation dynamics of the protein. On the experimental side, global protein motions are accounted for by the model of free translational diffusion, neglecting the much slower rotational diffusion. We find that the protein dynamics in the observed time window from 1 to 100 picoseconds is slowed down under pressure, while its fractal characteristics is preserved, and that the amplitudes of the motions are reduced by about 20 %. The slowing down of the relaxation is reduced with increasing $q$-values, where more localized motions are seen.

The combination of the Python language and the bulk synchronous parallel computing model make developing and testing parallel programs a much more pleasurable experience.

Hinsen, K. (

Scientific computing is usually associated with compiled languages for maximum efficiency. However, in a typical application program, only a small part of the code is time-critical and requires the efficiency of a compiled language. It is often advantageous to use interpreted high-level languages for the remaining tasks, adopting a mixed-language approach. This will be demonstrated for Python, an interpreted object-oriented high-level language that is well suited for scientific computing. Particular attention is paid to high-level parallel programming using Python and the BSP model. We explain the basics of BSP and how it differs from other parallel programming tools like MPI. Thereafter we present an application of Python and BSP for solving a partial differential equation from computational science, utilizing high-level design of libraries and mixed-language (Python-C or Python-Fortran) programming. (c) 2004 Published by Elsevier B.V.

The paper describes preliminary results of a molecular dynamics simulation study on the influence of non-denaturing hydrostatic pressure on the structure and the relaxation dynamics of lysozyme. The overall compression and the structural changes are in agreement with results from recent nuclear magnetic resonance experiments. We find that moderate hydrostatic pressure reduces essentially the amplitudes of the atomic motions. but does not change the characteristics of the slow internal dynamics. The latter is well described by a fractional Ornstein-Uhlenbeck process, concerning both single particle and collective motions. (c) 2006 Elsevier B.V. All rights reserved.

The dihedral angle principal component analysis method published recently by Mu, Nguyen, and Stock, is shown to produce distortions of the free energy landscape due to the neglect of constraints in the coordinates. It is further shown that these distortions can create artificial minima and energy barriers. The rugged energy landscape that the authors find for a small peptide chain might thus be an artifact of their method.

Hinsen, K ; Reuter, N ; Navaza, J ; Stokes, DL ; Lacapere, JJ (

A method for the flexible docking of high-resolution atomic structures into lower resolution densities derived from electron microscopy is presented. The atomic structure is deformed by an iterative process using combinations of normal modes to obtain the best fit of the electron microscopical density. The quality of the computed structures has been evaluated by several techniques borrowed from crystallography. Two atomic structures of the SERCA1 Ca-ATPase corresponding to different conformations were used as a starting point to fit the electron density corresponding to a different conformation.

Kneller, GR ; Hinsen, K (

Correlation functions describing relaxation processes in proteins and other complex molecular systems are known to exhibit a nonexponential decay. The simulation study presented here shows that fractional Brownian dynamics is a good model for the internal dynamics of a lysozyme molecule in solution. We show that both the dynamic structure factor and the associated memory function fit well the corresponding analytical functions calculated from the model. The numerical analysis is based on autoregressive modeling of time series. (C) 2004 American Institute of Physics.

Reuter, N ; Hinsen, K ; Lacapere, JJ (

The transport of Ca2+ by Ca-ATPase across the sarcoplasmic reticulum membrane is accompanied by several transconformations of the protein. Relying on the already established functional importance of low-frequency modes in dynamics of proteins, we report here a normal mode analysis of the Ca2+-ATPase based on the crystallographic structures of the E1Ca(2) and E2TG forms. The lowest-frequency modes reveal that the N and A(+Nter) domains undergo the largest amplitude movements. The dynamical domain analysis performed with the DomainFinder program suggests that they behave as rigid bodies, unlike the highly flexible P domain.

We present a new implementation of the program nMoldyn, which has been developed for the computation and decomposition of neutron scattering intensities from Molecular Dynamics trajectories (Comp. Phys. Commun 1995, 91, 191-214). The new implementation extends the functionality of the original version, provides a much more convenient user interface (both graphical/interactive and batch), and can be used as a tool set for implementing new analysis modules. This was made possible by the use of a high-level language, Python, and of modern object-oriented programming techniques.

Using autoregressive modeling of discrete signals, we investigate the influence of mass and size on the memory function of a tracer particle immersed in a Lennard-Jones liquid. We find that the memory function of the tracer particle scales with the inverse reduced mass of the simulated system. Increasing the particle’s mass leads rapidly to a slow exponential decay of the velocity autocorrelation function, whereas the memory function changes just its amplitude. This effect is the more pronounced the smaller and the heavier the tracer particle is. (C) 2003 American Institute of Physics.

One of the main obstacles to a more widespread use of parallel computing in computational science is the difficulty of implementing, testing, and maintaining parallel programs. The combination of a simple parallel computation model, BSP, and a high-level programming language, Python, simplifies these tasks significantly. It allows the rapid development facilities of Python to be applied to parallel programs, providing interactive development as well as interactive debugging of parallel programs.

Hinsen, K ; Petrescu, AJ ; Dellerue, S ; Bellissent-Funel, MC ; Kneller, GR (

Recent analyses of molecular dynamics simulations of hydrated C-phycocyanin suggest that the internal single-particle dynamics of this protein can be decomposed into four almost decoupled motion types : (1) diffusion of residues ("beads") in an effective harmonic potential, (2) corresponding vibrations in a local potential well, (3) purely rotational rigid side-chain diffusion, and (4) residue deformations, Each residue bead is represented by the corresponding C-alpha carbon atom on the main chain.

Kneller, GR ; Hinsen, K (

We propose a new method to compute reliable estimates for memory functions of dynamical variables from molecular dynamics simulations. The key point is that the dynamical variable under consideration, which we take to be the velocity of a fluid particle, is modeled as an autoregressive stochastic process. The parameters of this stochastic process can be determined from molecular dynamics trajectories using efficient algorithms that are well established in signal processing. The procedure is also referred to as the maximum entropy method. From the autoregressive model of the velocity autocorrelation function we compute the one-sided z transform of the discretized memory function and the memory function itself. Using liquid argon as a simple model system, we demonstrate that the autocorrelation function and its power spectrum can be approximated to almost arbitrary precision. The same is therefore true for the memory function, which is calculated within the same stochastic model. (C) 2001 American Institute of Physics.

Hinsen, K (

The Molecular Modeling Toolkit is a library that implements common molecular simulation techniques, with an emphasis on biomolecular simulations. It uses modem software engineering techniques (object-oriented design, a high-level language) to overcome limitations associated with the large monolithic simulation programs that are commonly used for biomolecules. Its principal advantages are (1) easy extension and combination with other libraries due to modular library design ; (2) a single high-level general-purpose programming language (Python) is used for library implementation as well as for application scripts ; (3) use of documented and machine-independent formats for all data files ; and (4) interfaces to other simulation and visualization programs. (C) 2000 John Wiley & Sons, Inc.

In studies of macromolecular dynamics it is often desirable to analyze complex motions in terms of a small number of coordinates. Only for simple types of motion, e.g., rigid-body motions, these coordinates can be easily constructed from the Cartesian atomic coordinates. This article presents an approach that is applicable to infinitesimal or approximately infinitesimal motions, e.g., Cartesian velocities, normal modes, or atomic fluctuations. The basic idea is to characterize the subspace of interesting motions by a set of (possibly linearly dependent) vectors describing elementary displacements, and then project the dynamics onto this subspace. Often the elementary displacements can be found by physical intuition. The restriction to small displacements facilitates the study of complicated coupled motions and permits the construction of collective-motion subspaces that do not correspond to any set of generalized coordinates. As an example for this technique, we analyze the low-frequency normal modes of proteins up to approximate to 20 THz (600 cm(-1)) in order to see what kinds of motions occupy which frequency range. This kind of analysis is useful for the interpretation of spectroscopic measurements on proteins, e.g., inelastic neutron scattering experiments.

Radiation damage in DNA is caused mainly by hydroxyl radicals which are generated by ionizing radiation in water and removing hydrogen atoms from the DNA chain. This damage affects certain nucleotide sequences more than others due to differences in the local structure of the DNA chains. This sequence dependence has been analyzed experimentally and calculated theoretically for a rigid DNA model. In this paper we take into account the flexibility of the DNA chain and show how it modifies the strand breakage probabilities. We use a simple harmonic model for DNA flexibility which permits the study of a long (68 base pair) fragment with modest computational effort. The essential influence of flexibility is an increased breakage probability towards the ends of the fragment, which can also be identified in the experimental data.

The slow dynamics of proteins around its native folded state is usually described by diffusion in a strongly anharmonic potential. In this paper, we try to understand the form and origin of the anharmonicities, with the principal aim of gaining a better understanding of the principal motion types, but also in order to develop more efficient numerical methods for simulating neutron scattering spectra of large proteins.

New numerical techniques permit the identification and characterization of dynamical domains in proteins with almost negligible computational effort. The validity of the approximations on which these techniques are based provides valuable insight into the nature of the energy landscape of proteins. An easy-to-use interactive analysis program based on the new methods is available. Extensions to higher frequency ranges are discussed briefly. (C) 2000 Elsevier Science B.V. All rights reserved.

Spotheim-Maurizot, M., Sy, D., Begusova, M., Hinsen, K., Savoye, C., Tartier, L., Viduna, D., Kneller, G. & M. Charlier, M. (

The glycine decarboxylase complex consists of four different proteins (the L-, P-, H-, and T-proteins). The H-protein plays a central role in communication among the other enzymes, as its lipoamide arm interacts successively with each of the components of the complex. The crystal structures of two states of the H-protein have been resolved : the oxidized form, H-ox at 2 Angstrom and the methylamine-loaded form, H-met at 2.2 Angstrom. However, the position of the arm for the reduced form, H-red, is still unknown. We have performed numerical free-energy calculations in order to better understand the differences in the structures and to elucidate the conformation of the arm in H-red. The results of the simulations are in agreement with the crystallographic results, as the minima of the free energy surface for H-ox and H-met correspond to the crystal structures.

Aspartate transcarbamylase (ATCase) initiates the pyrimidine biosynthetic pathway in Escherichia coli. Binding of aspartate to this allosteric enzyme induces a cooperative transition between the tensed (T) and relaxed (R) states of the enzyme which involves large quaternary and tertiary rearrangements. The mechanisms of the transmission of the regulatory signal to the active site (60 PL away) and that of the cooperative transition are not known in detail, although a large number of single, double, and triple site-specific mutants and chimeric forms of ATCase have been obtained and kinetically characterized. A previous analysis of the very low-frequency normal modes of both the T and R state structures of ATCase identified some of the large-amplitude motions mediating the intertrimer elongation and rotation that occur during the cooperative transition (Thomas et al., J.Mol. Biol. 257 : 1070-1087, 1996 ; Thomas et al., J. Mol. Biol. 261:490-506, 1996).

The empirical force fields used for protein simulations contain short-ranged terms (chemical bond structure, steric effects, van der Waals interactions) and long-ranged electrostatic contributions. It is well known that both components are important for determining the structure of a protein. We show that the dynamics around a stable equilibrium state can be described by a much simpler midrange force field made up of the chemical bond structure terms plus unspecific harmonic terms with a distance-dependent force constant. A normal mode analysis of such a model can reproduce the experimental density of states as well as a conventional molecular dynamics simulation using a standard force field with long-range electrostatic terms. This finding is consistent with a recent observation that effective Coulomb interactions are short ranged for systems with a sufficiently homogeneous charge distribution. (C) 1999 American Institute of Physics. [S0021-9606(99)52348-9].

We present a new approach for determining dynamical domains in large proteins, either based on a comparison of different experimental structures, or on a simplified normal mode calculation for a single conformation. In a first step, a deformation measure is evaluated for all residues in the protein ; a high deformation indicates highly flexible interdomain regions. The sufficiently rigid parts of the protein are then classified into rigid domains and low-deformation interdomain regions on the basis of their global motion. We demonstrate the techniques on three proteins : citrate synthase, which has been the subject of earlier domain analyses, HIV-1 reverse transcriptase, which has a rather complex domain structure, and aspartate transcarbamylase as an example of a very large protein. These examples show that the comparison of conformations and the normal mode analysis lead to essentially the same domain identification, except for cases where the experimental conformations differ by the presence of a large ligand, such as a DNA strand. Normal mode analysis has the advantage of requiring only one experimental structure and of providing a more detailed picture of domain movements, e.g. the splitting of domains into subdomains at higher frequencies. Proteins 1999 ;34:369-382, (C) 1999 Wiley-Liss, Inc.

Hinsen, K (

The identification of dynamical domains in proteins and the description of the low-frequency domain motions are one of the important applications of numerical simulation techniques. The application of these techniques to large proteins requires a substantial computational effort and therefore cannot be performed routinely, if at all. This article shows how physically motivated approximations permit the calculation of low-frequency normal modes in a few minutes on standard desktop computers, The technique is based on the observation that the low-frequency modes, which describe domain motions, are independent of force field details and can be obtained with simplified mechanical models.

Hinsen, K ; Roux, B (

The intramolecular proton transfer in the, enol form of acetylacetone is investigated at various temperatures both classically and quantum-mechanically using computer simulations. The potential energy surface is modeled using the empirical valence bond (EVB) approach of Warshel and fitted to the results of ab initio calculations. Quantum-statistical results are obtained via discretized Feynman path integral simulations. The classical and centroid potential of mean force for the reaction coordinate is obtained using umbrella sampling.

We present results of mixed quantum-classical molecular dynamics simulations of the intramolecular proton transfer in acetylacetone. Simulations are performed starting from the reactant and transition state configurations with initial velocities at each configuration chosen from an ensemble at 300 M. The proton motion is treated quantum mechanically and the remaining degrees of freedom are treated classically.

A potential energy model is developed to study the intramolecular proton transfer in the enol form of acetylacetone. It makes use of the empirical valence bond approach developed by Warshel to combine standard molecular mechanics potentials for the reactant and product states to reproduce the interconversion between these two states. Most parameters have been fitted to reproduce the key features of an ab initio potential surface obtained from 4-31G* Hartree-Fock calculations. The partial charges have been fitted to reproduce the electrostatic potential surface of 6-31G* Hartree-Fock wave functions, subject to total charge and symmetry constraints, using a fitting procedure based on generalized inverses. The resulting potential energy function reproduces the features most important for proton transfer simulations, while being several orders of magnitude faster in evaluation time than ab initio energy calculations. (C) 1997 by John Wiley & Sons, Inc.

Kneller, GR ; Hinsen, K (

Starting from the N-body friction matrix of an unconstrained system of N rigid particles immersed in a viscous liquid, we derive rigorous expressions for the corresponding friction and mobility matrices of a geometrically constrained dynamical system. Our method is based on the fact that geometrical constraints in a dynamical system can be cast in the form of linear constraints for the Cartesian translational and angular velocities of its constituents. Corresponding equations of motion for Molecular Dynamics simulations have been derived recently [1].

We describe a numerical method for calculating hydrodynamic interactions between spherical particles efficiently and accurately, both for particles immersed in an infinite liquid and for systems with periodic boundary conditions. Our method is based on a multipole expansion in Cartesian tensors. We then show how to solve the equations of motion for translational and rotational motion of suspended particles at large Peclet numbers. As an example we study the sedimentation of an array of spheres with and without periodic boundary conditions. We also study the effect of perturbations on the stability of the trajectories.

Cichocki, B ; Hinsen, K (

Based on the equations of motion for linked rigid bodies that we derived recently [G. Kneller and K. Hinsen, Phys Rev. E 50, 1559 (1994)], we develop a technique for the simulation of molecular systems with constraints. We apply it to analyze the importance of the-various degrees of freedom of a polypeptide chain for its dynamics. We find that keeping the peptide planes rigid does not change the dynamics much, but that the bending degrees of freedom of the alpha-carbon bond geometry are essential for large-amplitude backbone motions. This means that the phi and psi angles commonly used to characterize protein conformations and protein backbone dynamics do not constitute a sufficient set of variables to perform dynamical simulations.

The library presented here allows the efficient calculation of friction and mobility matrices that describe the hydrodynamic interactions between identical spherical particles suspended in a liquid, assuming stick boundary conditions on the particle surfaces. It allows an arbitrary mixture of fixed particles, freely moving particles, and freely moving rigid clusters of particles. A special option is provided to calculate the velocities of the particles in a Stokesian dynamics simulation. The particle configuration can be treated as finite in an infinite liquid or as periodic. Four levels of accuracy are available. The library can be used to study colloidal suspensions or macromolecules in solution.

Kneller, GR ; Hinsen, K (

We derive the equations of motion for linked rigid bodies from Lagrange mechanics and from Gauss’s principle of least constraint. The rotational motion of the subunits is described in terms of quaternion parameters and angular velocities. Different types of joints can be incorporated via axis constraints for the angular velocities. The resulting equations of motion are generalizations of the Euler equations of motion for a single rotor.

An efficient scheme is presented for the numerical calculation of hydrodynamic interactions of many spheres in Stokes flow. The spheres may have various sizes, and are freely moving or arranged in rigid arrays. Both the friction and mobility matrix are found from the solution of a set of coupled equations. The Stokesian dynamics of many spheres and the friction and mobility tensors of polymers and proteins may be calculated accurately at a modest expense of computer memory and time. The transport coefficients of suspensions can be evaluated by use of periodic boundary conditions.

Hinsen, K. & Felderhof, B.U. (

The electrostatic potential due to a multipole moment of order l is expressed in terms of 2l+1 independent Cartesian multipole components. The multipole expansion of the electrostatic interaction energy between two charge distributions is reduced correspondingly to the minimum number of Cartesian components.

We study the third-order electrical susceptibility of a fluid of nonpolar spherical molecules with linear polarizability and cubic hyperpolarizability. In the electrostatic dipole approximation, we derive an expression for the third-order susceptibility, which allows its evaluation from a property of the linear system. To second order in the linear polarizability, the correction to a mean field approximation, obtained by Buckingham, may be related to the correction to the Clausius-Mossotti formula for the linear dielectric constant. We present numerical results, obtained by computer simulation, for a system of hard spheres.

In view of recent theoretical results by Cichocki and Felderhof we reevaluate previously published simulation data for the mean square displacement of a particle in a hard sphere suspension without hydrodynamic interactions. This new evaluation is consistent with the theoretically predicted algebraic long-time tails. We observe better agreement for dense systems and obtain values for the relaxation time almost twice as large as previously.

We study the effective dielectric tensor for a composite of spheres with anisotropic dielectric tensor embedded in an isotropic uniform medium. We also study the effective conductivity tensor for a suspension of conducting spheres, where both the spheres and the conducting background medium are gyrotropic due to an applied magnetic field. The anisotropy of the effective conductivity tensor leads to the Hall effect. In theory the two problems are rather similar. We derive expressions of the Clausius-Mossotti type for the effective tensors. Correlation corrections are evaluated for weak gyrotropy by computer simulation. We consider both uniform spheres and the dipole approximation.

Hinsen, K ; Felderhof, BU (

We study the dielectric constant of a fluid of hard spheres with a polarizable point dipole and quadrupole at their center. The deviations from the Clausius-Mossotti formula are found at six volume fractions by computer simulation of a system with periodic boundary conditions. For small polarizability the deviation agrees well with the theoretical result up to a volume fraction of 30%, if calculated in superposition approximation for the triplet distribution function. We also determine the spectral density appearing in the spectral representation of the dielectric constant. The spectrum differs significantly from that for a system with only induced dipole interactions, especially at high volume fraction.

Cichocki, B ; Hinsen, K (

Cichocki, B ; Felderhof, BU ; Hinsen, K (

Chargé de recherche , Biophysique théorique, simulation moléculaire et calcul scientifique