Proposal outlines open ecosystem to make molecular simulation data reusable and AI-ready

More than a hundred experts in molecular simulation have published a paper in the journal Nature Methods calling for a paradigm shift in molecular dynamics data management.
The paper, led by Modesto Orozco, professor at the University of Barcelona, and Adam Hospital, both members of the Institute for Research in Biomedicine (IRB Barcelona), proposes the creation of a common infrastructure for storing and reusing simulation data in the context of the artificial intelligence revolution.
In particular, the article advocates for the implementation of FAIR (findable, accessible, interoperable, reusable) principles to improve the reproducibility of the calculations and facilitate their subsequent use as a source of information on the flexibility of biomacromolecules.
Computational simulations have become a key tool for studying the behavior of biomolecules over time. Thanks to supercomputers, molecular dynamics (MD) makes it possible to observe these processes with great precision and provides new knowledge of interest both in basic research and in the design of biomolecules, from enzymes to drugs.
Unlike structural biology or genomics, disciplines in which storing and sharing data under common standards is routine, molecular simulation data remain fragmented. Moreover, they often end up forgotten on personal computers, which hinders the reproducibility of calculations and prevents their further use.
This fragmentation makes it hard to integrate simulation data into structural biology and biophysics workflows, and it slows the development of artificial intelligence methods, whose training depends heavily on access to large amounts of dynamic data.
Reuse rather than repeat
Designing an open and sustainable ecosystem that multiplies the impact of this data and avoids unnecessary duplication is the aim of the new article, signed by more than a hundred leading international researchers, including several Nobel laureates in chemistry. The authors call for a change of model to apply the FAIR principles—which ensure that data is findable, accessible, interoperable and reusable—to simulation results.
"The community has assumed for years that repeating a simulation was easier and cheaper than archiving it. But that is no longer true," says Dr. Orozco, professor at the UB's Faculty of Chemistry, coordinator of the European MDDB project, head of the Molecular Modeling and Bioinformatics Group at IRB Barcelona and founder of the biotechnology company Nostrum Biodiscovery.
"The knowledge we can get from reusing data is enormous: it allows us to identify new targets, train artificial intelligence algorithms or design new experiments," adds researcher Hospital. Orozco and Hospital lead the European MDDB project, which aims to establish a centralized and accessible database for simulations.
Lessons from other fields
The proposal draws inspiration from the success of other fields that have embraced open science. The Protein Data Bank, which has collected three-dimensional structures of biomacromolecules since the 1970s, has been instrumental—not only in revealing the function of proteins and nucleic acids, enabling the "omics" revolution, and providing a holistic view of the cell, but also in the development of drugs, vaccines, and new therapies.
The data stored there were key to training AlphaFold2, whose developers were recognized with the 2024 Nobel Prize in Chemistry. The authors argue that complementing these structural data with dynamic information will open a new field whose potential is difficult to foresee.
According to the authors of the article, the time has come for the molecular simulation community to adopt practices similar to those of the structural and "omics" communities—not only preserving data, but also standardizing file formats, metadata, and quality criteria. The text outlines how a federated infrastructure—with distributed nodes and shared access tools—could make this planet-scale archive feasible.
Beyond storage
The approach put forward in the article published in Nature Methods goes beyond merely storing data. It advocates for an integrated model—from the precise documentation of simulations (including conditions, software, parameters, etc.) to their automated analysis, validation, and reuse through machine learning techniques.
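As a rough illustration of what "precise documentation" of a simulation might look like in practice, the Python sketch below defines a small metadata record and a basic completeness check. It is not taken from the paper or from the MDDB project: the class name, field names, and example values are all hypothetical, chosen only to mirror the items the article lists (conditions, software, parameters).

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class SimulationRecord:
    """Hypothetical metadata for one MD trajectory (field names are illustrative)."""
    identifier: str
    system: str
    software: str
    software_version: str
    force_field: str
    temperature_kelvin: float
    length_ns: float
    timestep_fs: float
    license: str
    parameters: dict = field(default_factory=dict)


def missing_fields(record):
    """Return the names of empty text fields and non-positive numeric fields."""
    problems = []
    for name, value in asdict(record).items():
        if isinstance(value, str) and not value.strip():
            problems.append(name)
        elif isinstance(value, (int, float)) and value <= 0:
            problems.append(name)
    return problems


# Placeholder values for illustration only; they do not describe a real dataset.
record = SimulationRecord(
    identifier="example-0001",
    system="lysozyme in water",
    software="GROMACS",
    software_version="2023.1",
    force_field="AMBER ff14SB",
    temperature_kelvin=300.0,
    length_ns=500.0,
    timestep_fs=2.0,
    license="CC-BY-4.0",
    parameters={"ensemble": "NPT"},
)

# Serialize to JSON so the record can travel alongside the trajectory files.
record_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(record_json)
print("incomplete fields:", missing_fields(record))
```

A machine-readable record of this kind, stored next to the trajectory, is one way a repository could index simulations and reject incomplete submissions. An actual standard would need to be agreed by the community, as the authors propose.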
"The value of these data doesn't end with the publication of a paper or their presentation at a conference. Often, that's just the beginning," concludes Dr. Orozco. "We must treat data as a shared resource for science."
This article was prepared within the framework of the European project MDDB (Molecular Dynamics Data Bank), coordinated by IRB Barcelona, which aims to build an open and standardized database for storing molecular dynamics simulations. The consortium brings together leading research centers in bioinformatics, simulation and data analysis to move toward more open, reproducible and collaborative science.
More information: Rommie E. Amaro et al, The need to implement FAIR principles in biomolecular simulations, Nature Methods (2025).
Journal information: Nature Methods
Provided by University of Barcelona