**Statistical and computational physics of biomolecular systems**

Anomalous relaxation and diffusion processes in biomolecular systems

The internal dynamics of biomolecular systems such as proteins is characterized by a vast spectrum of time scales, and most of the dynamical modes are strongly overdamped and diffusive. Their time evolution and the corresponding time correlation functions can be modeled by fractional Fokker-Planck equations, which generalize the idea of Markovian, i.e. memoryless, small-step diffusion processes to stochastic processes with long-time memory. The keyword “anomalous relaxation” refers here to the strongly non-exponential decay of the corresponding time correlation functions. We have successfully applied, and continue to apply, such concepts to model quasielastic neutron scattering spectra and NMR relaxation spectra from proteins.
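To make "strongly non-exponential decay" concrete, here is a minimal sketch that evaluates the Mittag-Leffler relaxation function $\phi(t) = E_\alpha(-(t/\tau)^\alpha)$. This function solves the fractional relaxation equation and reduces to $e^{-t/\tau}$ for $\alpha = 1$. The function names and the truncated power series are illustrative choices made here, not the group's code. The series is only numerically reliable for moderate arguments (roughly $(t/\tau)^\alpha \lesssim 5$); robust codes use dedicated algorithms.

```python
import math


def mittag_leffler(alpha: float, z: float, n_terms: int = 150) -> float:
    """Truncated power series E_alpha(z) = sum_k z**k / Gamma(alpha*k + 1).

    Only reliable for moderate |z|: for large |z| the alternating terms
    cancel catastrophically in double precision.
    """
    if z == 0.0:
        return 1.0
    log_abs_z = math.log(abs(z))
    total = 0.0
    for k in range(n_terms):
        # |z|**k / Gamma(alpha*k + 1), evaluated in log space to avoid overflow
        magnitude = math.exp(k * log_abs_z - math.lgamma(alpha * k + 1.0))
        sign = -1.0 if (z < 0 and k % 2 == 1) else 1.0
        total += sign * magnitude
    return total


def relaxation_function(t: float, tau: float, alpha: float) -> float:
    """Mittag-Leffler relaxation phi(t) = E_alpha(-(t/tau)**alpha)."""
    return mittag_leffler(alpha, -((t / tau) ** alpha))


# Compare alpha = 1 (exponential) with alpha = 0.5 (slow, non-exponential tail)
for t in (0.5, 1.0, 2.0, 4.0):
    print(f"t={t}: alpha=1 -> {relaxation_function(t, 1.0, 1.0):.4f}, "
          f"alpha=0.5 -> {relaxation_function(t, 1.0, 0.5):.4f}")
```

For $\alpha < 1$ the decay is fast at short times but develops a long power-law tail. A single exponential cannot reproduce this.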

Anomalous diffusion generally refers to unconstrained diffusion processes in which the mean square displacement grows non-linearly with time. The underlying mechanisms are the same as for anomalous relaxation, except that the dynamics of the diffusing particles, which may be anything from single atoms to whole proteins, is not spatially confined. We have studied anomalous lateral diffusion of lipid molecules in lipid bilayers, and we have also developed a general theoretical framework for anomalous diffusion and relaxation that links such processes to the atomistic dynamics in “crowded” molecular systems. Anomalous diffusion is a ubiquitous phenomenon that is also of great importance in other domains of science, such as solid state physics, physical chemistry, and financial mathematics (http://www.smoluchowski.if.uj.edu.pl).
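A naive way to quantify anomalous diffusion is to do two things. First, compute the time-averaged mean square displacement (MSD) of a trajectory. Second, fit the exponent $\alpha$ in $\mathrm{MSD}(t) \propto t^\alpha$ on a log-log scale ($\alpha < 1$: subdiffusion, $\alpha = 1$: normal diffusion). The sketch below does this for a one-dimensional trajectory sampled at unit time steps. The function names are hypothetical. Anomalous diffusion is really defined by its long-time asymptotics, so such a finite-lag fit gives only a rough estimate. It is not the group's Bayesian multiscale analysis.

```python
import math
import random


def time_averaged_msd(x, max_lag):
    """Time-averaged MSD of a 1-D trajectory x (unit time step) for lags 1..max_lag."""
    n = len(x)
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [(x[i + lag] - x[i]) ** 2 for i in range(n - lag)]
        msd.append(sum(squared) / len(squared))
    return msd


def fit_msd_exponent(msd):
    """Least-squares slope of log(MSD) versus log(lag), i.e. alpha in MSD ~ t**alpha."""
    xs = [math.log(lag) for lag in range(1, len(msd) + 1)]
    ys = [math.log(value) for value in msd]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var = sum((a - mean_x) ** 2 for a in xs)
    return cov / var


# Ordinary Gaussian random walk: the fitted exponent should be close to 1
random.seed(1)
walk = [0.0]
for _ in range(20000):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
print(fit_msd_exponent(time_averaged_msd(walk, 50)))
```

A ballistic trajectory $x = vt$ gives $\alpha = 2$ exactly. A subdiffusive trajectory would give $\alpha < 1$.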

**Minimal models for protein structure and dynamics**

Based on the concepts of fractional Brownian dynamics and on the general theoretical framework for anomalous diffusion and relaxation processes, we have developed a so-called minimal model for the backbone dynamics of proteins (J. Chem. Phys. Editor’s choice 2012) and, more recently, a model-free interpretation of quasielastic neutron scattering (QENS) spectra from proteins (J. Chem. Phys. Editor’s choice 2016). The basic features of protein dynamics, in particular their multiscale character, are captured here by essentially two parameters, describing respectively the form and the scale of a spectrum. In the case of the QENS analysis, one additionally exploits the fact that high-resolution spectrometers can only detect the asymptotic form of the dynamics for long times and small frequencies.
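As an illustration of how one "form" and one "scale" parameter can describe a multiscale relaxation, consider a relaxation function of Mittag-Leffler type, with exponent $\alpha$ (form) and time scale $\tau$ (scale). This form is chosen here only as an example; it is not necessarily the exact functional form of the cited models. Its long-time tail, which is the part a high-resolution spectrometer resolves, reduces to a power law in which both parameters appear explicitly:

$$
\phi(t) = E_\alpha\!\left(-(t/\tau)^\alpha\right)
\;\simeq\; \frac{(t/\tau)^{-\alpha}}{\Gamma(1-\alpha)}
\qquad (t \gg \tau,\; 0 < \alpha < 1).
$$

As $\alpha \to 1$ the prefactor $1/\Gamma(1-\alpha)$ vanishes. This is consistent with the purely exponential decay $E_1(-t/\tau) = e^{-t/\tau}$, which has no power-law tail.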

Another type of minimal protein model developed in the group concerns the characterization of the global fold of proteins. The ScrewFrame model uses the positions of the Cα atoms along the backbone of a protein to construct a tube model for the protein under consideration. Such a tube model is essentially characterized by the bending and the internal torsion of the tube. The model is based on Cα-based Frenet frames, which are constructed from the discrete trace of the Cα positions, and on a sequence of helix motions relating these frames. Current applications concern the structural characterization of “unstructured proteins” and the analysis of electron microscopy images.
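The construction of Cα-based frames and of the helix (screw) motion between consecutive frames can be sketched as follows. This sketch rests on assumptions made here and is not the ScrewFrame implementation. It uses one common discrete Frenet definition: the tangent runs along $r_{i+1} - r_{i-1}$ and the binormal is perpendicular to the two adjacent Cα-Cα bonds. All helper names are hypothetical. By Chasles' theorem, any rigid motion relating two frames is a screw motion. Here that motion is summarized by its rotation angle and its translation (rise) along the screw axis.

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(a):
    norm = math.sqrt(dot(a, a))
    return [c / norm for c in a]


def ca_frames(ca):
    """Orthonormal frame (origin, (t, n, b)) at every interior C-alpha position.

    Fails if three consecutive positions are collinear (binormal undefined).
    """
    frames = []
    for i in range(1, len(ca) - 1):
        t = unit(sub(ca[i + 1], ca[i - 1]))
        b = unit(cross(sub(ca[i], ca[i - 1]), sub(ca[i + 1], ca[i])))
        n = cross(b, t)  # already a unit vector, since b and t are orthonormal
        frames.append((ca[i], (t, n, b)))
    return frames


def screw_motion(frame1, frame2):
    """Rotation angle (radians) and rise along the screw axis mapping frame1 onto frame2.

    Assumes the rotation angle lies strictly between 0 and pi, so the axis is well defined.
    """
    (o1, e1), (o2, e2) = frame1, frame2
    # Rotation matrix R = F2 F1^T, where the columns of F are the frame vectors
    rot = [[sum(e2[m][j] * e1[m][k] for m in range(3)) for k in range(3)] for j in range(3)]
    cos_angle = (rot[0][0] + rot[1][1] + rot[2][2] - 1.0) / 2.0
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    axis = unit([rot[2][1] - rot[1][2], rot[0][2] - rot[2][0], rot[1][0] - rot[0][1]])
    # The displacement component along the axis does not depend on where the axis lies
    rise = dot(sub(o2, o1), axis)
    return angle, rise


# Ideal alpha-helix trace: 100 degrees per residue, 1.5 A rise, 2.3 A radius
step = math.radians(100.0)
helix = [[2.3 * math.cos(i * step), 2.3 * math.sin(i * step), 1.5 * i] for i in range(10)]
frames = ca_frames(helix)
for f1, f2 in zip(frames, frames[1:]):
    angle, rise = screw_motion(f1, f2)
    print(f"{math.degrees(angle):.1f} deg, {rise:.2f} A")
```

For an ideal helix, every consecutive pair of frames is related by the same screw motion. Deviations from a constant angle and rise along a real chain therefore signal changes in bending and torsion.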

**Elastic Network Models for proteins**

An Elastic Network Model (ENM) describes a protein as a structured elastic object at a coarse-grained level. The most widely used ENMs represent a protein by its Cα atoms connected by springs. We have been developing, evaluating, and applying ENMs for many years; applications include, in particular, the interpretation of low-resolution protein structures and the analysis of conformational transitions.
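A common concrete form of the Cα spring model is the anisotropic network model. In that model, every pair of Cα atoms closer than a cutoff is joined by a harmonic spring of uniform stiffness. The sketch below builds its Hessian with NumPy. The 15 Å cutoff and unit force constant are illustrative assumptions: many ENM variants use other cutoffs or distance-dependent force constants, and this is not necessarily the group's parametrization. For a rigid network the Hessian has exactly six zero eigenvalues, which correspond to global translations and rotations. The remaining eigenvectors are the low-frequency normal modes used, for example, to analyze conformational transitions.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, k=1.0):
    """Hessian (3N x 3N) of an anisotropic network model built on C-alpha coordinates."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            r = coords[j] - coords[i]
            d2 = r @ r
            if d2 > cutoff ** 2:
                continue  # no spring beyond the cutoff
            # Off-diagonal super-element: -k * (r r^T) / |r|^2
            block = -k * np.outer(r, r) / d2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements make each block row sum to zero
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian


# Toy "protein": 12 random points in a 6 A box, so all pairs are within the cutoff
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 6.0, size=(12, 3))
hessian = anm_hessian(coords)
eigenvalues = np.linalg.eigvalsh(hessian)
print("zero modes:", int(np.sum(np.abs(eigenvalues) < 1e-8)))
```

A count other than six would indicate that the network is not rigid, for example because the cutoff leaves some atoms too weakly connected.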

**Reproducible research**

The rapid change in computing technology has made it difficult to reproduce or verify results obtained with the help of computers. Publishing software and electronic datasets is crucial for making such research transparent, but it remains difficult to publish them in such a way that other scientists can easily re-run a computational analysis several years later. In recent years we have published most of our work reproducibly, using the ActivePapers framework, which we are developing to support the specific needs of biomolecular simulation.

**Scientific data management**

A major technical challenge in publishing biomolecular simulation data is the lack of suitable file formats for many data types. Only molecular configurations and sequences of such configurations (trajectories) are well supported by today’s software tools. Other important information, such as molecular system definitions (including force fields and their parameters), normal modes, or models used in trajectory analysis, is difficult to archive or exchange and is therefore not published at all. We are working on the development of a modular and extensible data model and file formats for all aspects of molecular simulation. Current projects in this field are the MOSAIC data model and the digital scientific notation Leibniz.

**Software development**

Most of our research is methodological and therefore requires the development of appropriate software. We have made all our research software publicly available, both to allow verification of our work and to provide useful tools to the scientific community. Our most widely used tools are the Molecular Modelling Toolkit (MMTK), a Python library for molecular simulation, and nMOLDYN, an analysis tool for Molecular Dynamics (MD) trajectories and the calculation of MD-based neutron scattering spectra.

### Publications

**2017**

Rougier, N. P., Hinsen, K., Alexandre, F., Arildsen, T., Barba, L., Benureau, F. C. Y., Brown, C. T., De Buyl, P., Caglayan, O., Davison, A. P., Delsuc, M. A., Detorakis, G., Diem, A. K., Drix, D., Enel, P., Girard, B., Guest, O., Hall, M. G., Henriques, R. N., Hinaut, X., Jaron, K. S., Khamassi, M., Klein, A., Manninen, T., Marchesi, P., McGlinn, D., Metzner, C., Petchey, O. L., Plesser, H. E., Poisot, T., Ram, K., Ram, Y., Roesch, E., Rossant, C., Rostami, V., Shifman, A., Stachelek, J., Stimberg, M., Stollmeier, F., Vaggi, F., Viejo, G., Vitay, J., Vostinar, A., Yurchak, R. and Zito, T. (**2017**)

### Sustainable computational science: the ReScience initiative

PeerJ Computer Science (in press)

Computer science offers a large set of tools for prototyping, writing, running, testing, validating, sharing and reproducing results; however, computational science lags behind. In the best case, authors may provide their source code as a compressed archive and may feel confident that their research is reproducible. But this is not exactly true. James Buckheit and David Donoho proposed more than two decades ago that an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code, and data that produced the result. This implies new workflows, in particular in peer review. Existing journals have been slow to adapt: source code is rarely requested and hardly ever actually executed to check that it produces the results advertised in the article. ReScience is a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research can be replicated from its description. To achieve this goal, the whole publishing chain is radically different from that of traditional scientific journals. ReScience resides on GitHub, where each new implementation of a computational study is made available together with comments, explanations, and software tests.

(**2017**)

### A Dream of Simplicity: Scientific Computing on Turing Machines

Computing in Science and Engineering (2017) 19 (3) 78-85

Frustrated by another failed software installation? Wondering why you can’t reproduce your colleagues’ computations? This story will tell you why. It won’t magically solve your problems, but it does offer a glimpse of hope for the future.

(**2017**)

### Scientific workflows for computational reproducibility in the life sciences: Status, challenges and opportunities

Future Generation Computer Systems (2017) 75 (supplement C) 284-298

With the development of new experimental technologies, biologists are faced with an avalanche of data to be computationally analyzed for scientific advancements and discoveries to emerge. Faced with the complexity of analysis pipelines, the large number of computational tools, and the enormous amount of data to manage, there is compelling evidence that many if not most scientific discoveries will not stand the test of time: increasing the reproducibility of computed results is of paramount importance.

The objective we set out in this paper is to place scientific workflows in the context of reproducibility. To do so, we define several kinds of reproducibility that can be reached when scientific workflows are used to perform experiments. We characterize and define the criteria that need to be catered for by reproducibility-friendly scientific workflow systems, and use such criteria to place several representative and widely used workflow systems and companion tools within such a framework. We also discuss the remaining challenges posed by reproducible scientific workflows in the life sciences. Our study was guided by three use cases from the life science domain involving in silico experiments.

(**2017**)

### Problem-Specific Analysis of Molecular Dynamics Trajectories for Biomolecules

In "The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences", Kitzes, J., Turek, D. and Deniz, F. (eds), Part III, 254-260

Hinsen, K. (**2017**)

### The Roles of Code in Computational Science

Computing in Science & Engineering (2017) 19 (1) 78-82

Many of us write code regularly as part of our scientific activity, perhaps even as a full-time job. But even though we write, and use, more and more code, we rarely think about the roles that this code will have in our research, in our publications, and ultimately in the scientific record. In this article, the author outlines some frequent roles of code in computational science. These roles aren’t exclusive; in fact, it’s common for a piece of code to have several roles, at the same time or as an evolution over time. Thinking about these roles, ideally before starting to write the code, is a good habit to develop.

**2016**

Hinsen, K. and Kneller, G. R. (**2016**)

### Communication: A multiscale Bayesian inference approach to analyzing subdiffusion in particle trajectories

The Journal of Chemical Physics (2016) 145 (15) 151101 - doi: 10.1063/1.4965881

Anomalous diffusion is characterized by its asymptotic behavior for $t \to \infty$. This makes it difficult to detect and describe in particle trajectories from experiments or computer simulations, which are necessarily of finite length. We propose a new approach using Bayesian inference applied directly to the observed trajectories sampled at different time scales. We illustrate the performance of this approach using random trajectories with known statistical properties and then use it for analyzing the motion of lipid molecules in the plane of a lipid bilayer.

(**2016**)

### The Power to Create Chaos

Computing in Science & Engineering (2016) 18 (4) 75-79 - doi: 10.1109/MCSE.2016.67

Computers are the only research tools that by design exhibit chaotic behavior: a minimal change in the input to a computation can change its output in any imaginable way. Developers and users of scientific software should be aware of this feature and set up safety nets for protecting themselves against bad surprises.

(**2016**)

### Asymptotic neutron scattering laws for anomalously diffusing quantum particles

The Journal of Chemical Physics (2016) 145, 044103 - doi: 10.1063/1.4959124

The paper deals with a model-free approach to the analysis of quasielastic neutron scattering intensities from anomalously diffusing quantum particles. All quantities are inferred from the asymptotic form of their time-dependent mean square displacements, which grow as $t^{\alpha}$ with $0 \le \alpha < 2$. Confined diffusion ($\alpha = 0$) is explicitly included here. We discuss in particular the intermediate scattering function for long times and the Fourier spectrum of the velocity autocorrelation function for small frequencies. Quantum effects enter in both cases through the general symmetry properties of quantum time correlation functions. It is shown that the fractional diffusion constant can be expressed by a Green-Kubo type relation involving the real part of the velocity autocorrelation function. The theory is exact in the diffusive regime and at moderate momentum transfers.