From sequence to structure: A fast track for RNA modeling
 
In Biology 101, we learn that RNA is a single, ribbon-like strand of nucleotides that is copied from our DNA and then read like a recipe to build a protein. But there's more to the story. Some RNA strands fold into complex shapes that allow them to drive cellular processes like gene regulation and protein synthesis, or catalyze biochemical reactions.
We know that these active molecules, called non-coding RNAs, are present in all life forms, yet we're just starting to understand their many roles—and how they can be harnessed for applications in environmental science, agriculture, and medicine.
To study—and potentially modify—the functions of non-coding RNAs, we need to determine their structure. Scientists from Lawrence Berkeley National Laboratory (Berkeley Lab) and the Hebrew University of Jerusalem have developed a streamlined process that predicts the structure of an RNA molecule down to the atomic level.
Members of the research community can come to Berkeley Lab's Advanced Light Source (ALS) user facility knowing nothing more than the molecule's nucleotide sequence and get a structure, or they can do it themselves using the team's open-source software.
"We were looking at the bigger picture with structure prediction, like how we can go from A to Z rather than working on A, B, and D. That's what we try to do at Berkeley Lab, make it user-friendly," said Michal Hammel, a staff scientist in Berkeley Lab's Molecular Biophysics and Integrated Bioimaging (MBIB) division.
Hammel co-developed the process, called SOlution Conformation PrEdictor for RNA (SCOPER), with MBIB colleague Scott Classen and Hebrew University collaborators Dina Schneidman-Duhovny and Edan Patt. A paper describing SCOPER was recently published in Biophysical Journal.
Historically, accurately determining the three-dimensional atomic blueprint of a folded RNA has ranged from difficult to impossible, because RNA molecules rarely form the neat crystals needed for imaging with X-ray crystallography. And because the twists and folds of the RNA strand move around as the molecule functions, there are actually multiple correct structures.
In recent years, AI tools like AlphaFold have become very accurate at generating protein structure predictions based on amino acid sequences, making life a lot easier for scientists worldwide and greatly accelerating the pace of drug discovery.
These algorithms have been expanded to RNA structures, but the accuracy remains middling. Getting a reliable model currently involves combining the outputs of multiple computational tools and imaging data. It's a long process, and still fraught with uncertainty.
SCOPER has simplified it significantly. Say you want to study a new RNA: First, put the nucleotide sequence into one of the open-source, AI-based structure prediction tools available today. Then, take your sample to a small-angle X-ray scattering (SAXS) facility for characterization. Better yet, let Hammel and his colleagues at the ALS's SAXS beamline get that data for you.
Take the SAXS data and predicted structures, and put them through SCOPER's pipeline. The first step uses an existing program to generate possible flexible arrangements of the RNA from the predicted static structures.
Next, a new machine learning program, developed and trained on existing atomic structures by Patt, refines the structures by adding the placements of magnesium ions. Inside cells, positively charged magnesium ions interact with negatively charged RNAs to keep them folded stably. Their presence also helps elucidate structure when using SAXS.
Then SCOPER generates simulated SAXS data representing the theoretical structures and compares them with the real-world SAXS data to determine which structure is correct.
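The comparison step can be illustrated with a minimal Python sketch. It assumes the widely used reduced chi-square score with a fitted scale factor, which is not necessarily the exact scoring SCOPER applies. The function names, the toy Guinier-style curves, and the error model are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_scale(i_exp, i_calc, sigma):
    """Least-squares scale factor c minimizing sum(((i_exp - c*i_calc)/sigma)**2)."""
    w = 1.0 / sigma**2
    return np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)

def chi_square(i_exp, i_calc, sigma):
    """Mean squared, error-weighted residual after scaling the computed profile."""
    c = fit_scale(i_exp, i_calc, sigma)
    residuals = (i_exp - c * i_calc) / sigma
    return float(np.mean(residuals**2))

def rank_models(i_exp, sigma, candidates):
    """Return (name, chi2) pairs sorted from best fit to worst."""
    scores = [(name, chi_square(i_exp, prof, sigma))
              for name, prof in candidates.items()]
    return sorted(scores, key=lambda s: s[1])

def toy_profile(q, rg):
    """Toy Guinier-style curve exp(-(q*rg)^2 / 3); a stand-in, not a real SAXS calculation."""
    return np.exp(-(q * rg) ** 2 / 3.0)

# Synthetic "experiment": a scaled copy of the curve for model_a (assumption).
q = np.linspace(0.01, 0.3, 50)
i_exp = 5.0 * toy_profile(q, 20.0)
sigma = 0.02 * i_exp + 1e-3  # made-up error model

candidates = {
    "model_a": toy_profile(q, 20.0),
    "model_b": toy_profile(q, 28.0),
}
ranking = rank_models(i_exp, sigma, candidates)
for name, score in ranking:
    print(name, score)
```

The lower the score, the better the candidate's simulated profile matches the measurement, which is how a set of competing predictions can be narrowed to the one the data support.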
Finally, another software program models the multiple arrangements that the confirmed structure might take as it functions. Without having to corral multiple software tools themselves, users walk away with a set of precise, three-dimensional atomistic models.
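One way to picture the multi-state idea: if a molecule spends part of its time in each of two shapes, the measured SAXS profile is roughly a population-weighted average of the two. The sketch below is a toy illustration under that assumption, not the program SCOPER uses. It scans the weight of one conformer and keeps the mix that fits best. All names and curves are hypothetical.

```python
import numpy as np

def chi_square(i_exp, i_calc, sigma):
    """Mean squared, error-weighted residual after least-squares scaling of i_calc."""
    w = 1.0 / sigma**2
    c = np.sum(w * i_exp * i_calc) / np.sum(w * i_calc**2)
    return float(np.mean(((i_exp - c * i_calc) / sigma) ** 2))

def best_two_state_mix(i_exp, sigma, prof_a, prof_b, steps=101):
    """Scan the population weight of conformer A from 0 to 1; return (weight, chi2)."""
    best = (0.0, float("inf"))
    for w_a in np.linspace(0.0, 1.0, steps):
        mix = w_a * prof_a + (1.0 - w_a) * prof_b
        score = chi_square(i_exp, mix, sigma)
        if score < best[1]:
            best = (float(w_a), score)
    return best

# Toy profiles for a compact and an extended conformer (assumed shapes).
q = np.linspace(0.01, 0.3, 50)
compact = np.exp(-(q * 18.0) ** 2 / 3.0)
extended = np.exp(-(q * 30.0) ** 2 / 3.0)

# Synthetic "measurement": 70% compact, 30% extended.
i_exp = 0.7 * compact + 0.3 * extended
sigma = 0.02 * i_exp + 1e-3  # made-up error model

weight, score = best_two_state_mix(i_exp, sigma, compact, extended)
print(weight, score)
```

A grid scan is used here only because it is easy to read; with more than two states, a constrained least-squares fit would be the more practical choice.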
"These days, programs like AlphaFold are almost 95% accurate for proteins but much worse for RNA. It will sometimes come up with five different models that are different. And now the question is, which one is right?" said Hammel. "SCOPER can tell you."
Researchers in the U.S. and Europe are already using the process, but the team is still working to make SCOPER even more convenient. The computing cluster at the ALS's SIBYLS beamline can run SCOPER as well as initial structure prediction software such as AlphaFold3, so users don't need to perform that step in advance.
But the power of this cluster is limited, so to make it as smooth and speedy as possible, Classen is installing the pipeline on the supercomputing systems at Berkeley Lab's National Energy Research Scientific Computing Center (NERSC) user facility. His goal is to make one "nice and neat" self-contained application at NERSC that users can operate with ease.
Once that work is complete, researchers could perform the whole process remotely: the SIBYLS beamline's automated capabilities would let users mail in their samples, and they could then access SCOPER online.
Berkeley Lab will be a one-stop shop for visualizing the solution state of RNAs.
More information: Edan Patt et al, Predicting RNA structure and dynamics with deep learning and solution scattering, Biophysical Journal (2024).
Journal information: Biophysical Journal
Provided by Lawrence Berkeley National Laboratory