Predicting SARS-CoV-2 variant infectivity with biophysical principles

When the COVID-19 pandemic first began, we saw how quickly the SARS-CoV-2 virus evolved. New variants emerged with mutations that increased transmissibility or helped the virus evade our immune systems. But predicting which mutations would be most successful—and why—has remained a challenge.
In our study, published in the Proceedings of the National Academy of Sciences, we take a different approach. Rather than relying purely on genomic surveillance, we apply biophysical principles to understand viral evolution. Specifically, we developed a statistical mechanics model that connects the thermodynamic properties of the virus's spike protein to its ability to spread.
A biophysical map of viral fitness
The key player in SARS-CoV-2's infection process is the receptor binding domain (RBD) of the spike protein. The RBD determines how well the virus binds to ACE2, the human cellular receptor, and how effectively it avoids neutralization by antibodies.
We found that binding affinities—how tightly the viral RBD binds to ACE2 and neutralizing antibodies—are powerful predictors of how a variant will behave in a population. Stronger ACE2 binding can make a variant more infectious, while weaker antibody binding allows it to escape immune defenses.
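To make "how tightly" concrete, here is a minimal Python sketch of the standard textbook relations between a dissociation constant, the binding free energy, and the fraction of RBD that is bound at equilibrium. It assumes simple 1:1 binding; the function names and the concentrations are illustrative choices, not values or code from our study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a KD given in molar units.

    Uses dG = RT * ln(KD) with a 1 M standard state, so tighter binding
    (smaller KD) gives a more negative free energy.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


def bound_fraction(partner_molar, kd_molar):
    """Equilibrium fraction of RBD bound to its partner, assuming 1:1 binding."""
    return partner_molar / (partner_molar + kd_molar)


# Illustrative numbers only (assumed, not measured values).
kd_reference = 10e-9   # 10 nM
kd_tighter = 1e-9      # 1 nM, a tenfold tighter binder
partner = 5e-9         # assumed free concentration of the binding partner

print(binding_free_energy(kd_reference), binding_free_energy(kd_tighter))
print(bound_fraction(partner, kd_reference), bound_fraction(partner, kd_tighter))
```

When the partner concentration equals KD, exactly half of the RBD is bound, and every tenfold drop in KD lowers the free energy by about 1.4 kcal/mol at room temperature.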
Our biophysical model integrates these molecular measurements to generate a fitness landscape: a map that predicts which viral mutations will be favored by evolution. We validated the model's predictions against real-world sequencing data.
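The idea of combining ACE2 binding and antibody escape into one score can be illustrated with a toy Python sketch. This is not the statistical mechanics model from our paper: it simply assumes that fitness is proportional to the chance that the RBD is bound to ACE2 multiplied by the chance that it is free of every antibody, that antibodies bind independently, and that all concentrations are fixed. The variant names and KD values are hypothetical.

```python
def bound_fraction(conc, kd):
    """Equilibrium bound fraction for simple 1:1 binding."""
    return conc / (conc + kd)


def toy_fitness(kd_ace2, kd_antibodies, ace2_conc=1e-9, antibody_conc=1e-9):
    """Toy fitness score: P(bound to ACE2) * P(not bound by any antibody).

    Assumes antibodies bind independently of one another and share one
    concentration; both are simplifications made for illustration.
    """
    infect = bound_fraction(ace2_conc, kd_ace2)
    escape = 1.0
    for kd in kd_antibodies:
        escape *= 1.0 - bound_fraction(antibody_conc, kd)
    return infect * escape


# Hypothetical variants with made-up KD values (molar).
variants = {
    "reference": {"kd_ace2": 10e-9, "kd_antibodies": [1e-9, 5e-9]},
    "tighter_ace2": {"kd_ace2": 2e-9, "kd_antibodies": [1e-9, 5e-9]},
    "antibody_escape": {"kd_ace2": 10e-9, "kd_antibodies": [100e-9, 500e-9]},
}

# Rank the variants from highest to lowest toy fitness.
ranking = sorted(variants, key=lambda name: toy_fitness(**variants[name]), reverse=True)
for name in ranking:
    print(name, round(toy_fitness(**variants[name]), 4))
```

In this toy setting, both the tighter ACE2 binder and the antibody-escape variant outscore the reference, which mirrors the two routes to higher fitness described above.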
Predicting fitness using experimental and machine learning predictions
To make this model practical, we combined experimental binding affinity data with machine learning predictions. Some RBD mutations have already been studied in the lab, but many have not. To fill in the gaps, we trained a transformer-based model to estimate the dissociation constants (KD) of mutations that have not been measured experimentally. A lower KD means tighter binding.
These predicted values were then fed into our biophysical model, allowing us to forecast the fitness of variants before they appear in large numbers in the population.

Why some mutations disappear
Our model also helps explain puzzling evolutionary reversals. For example, the Q493R mutation initially helped the virus evade antibodies, but later variants like BA.4 and BA.5 dropped this mutation. Why? Because fitness is a trade-off—while Q493R helped with immune escape, it also weakened ACE2 binding. As other mutations accumulated, the cost of keeping Q493R became too high, and natural selection reversed it.
This is an example of epistasis—the way mutations interact with each other. Instead of acting independently, mutations can amplify or cancel each other's effects. Our model captures these interactions, making it more realistic than previous approaches that assumed all mutations act in isolation.
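One common way to quantify epistasis, shown here as a Python sketch under stated assumptions, is to compare the binding free energy change of a double mutant with the sum of the changes of the two single mutants. The function names and KD values below are hypothetical, and this convention is offered for illustration rather than as the exact definition used in our model.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at 25 degrees Celsius


def ddg(kd_mutant, kd_reference):
    """Change in binding free energy (kcal/mol); positive means weaker binding."""
    return RT * math.log(kd_mutant / kd_reference)


def pairwise_epistasis(kd_ref, kd_a, kd_b, kd_ab):
    """Deviation of the double mutant from additivity of the two single mutants.

    Zero means the mutations act independently; a negative value means the
    double mutant binds more tightly than the single mutants would predict.
    """
    return ddg(kd_ab, kd_ref) - (ddg(kd_a, kd_ref) + ddg(kd_b, kd_ref))


# Made-up numbers: mutation A weakens binding 4x, mutation B tightens it 2x.
kd_ref, kd_a, kd_b = 10e-9, 40e-9, 5e-9

# If the effects were independent, the double mutant would sit at 20 nM.
print(pairwise_epistasis(kd_ref, kd_a, kd_b, kd_ab=20e-9))

# A double mutant measured at 5 nM would indicate negative epistasis.
print(pairwise_epistasis(kd_ref, kd_a, kd_b, kd_ab=5e-9))
```

A model that assumes mutations act in isolation always predicts the first case. The second case is the kind of interaction that can change whether a mutation such as Q493R is worth keeping once other mutations have accumulated.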
Beyond COVID-19: A universal approach to viral evolution
Although we developed this model for SARS-CoV-2, the principles behind it apply to many other viruses. Any virus that relies on protein binding for infection and immune escape—such as influenza or HIV—could be analyzed using similar methods.
By linking molecular properties to real-world epidemiology, our approach offers a powerful tool for pandemic preparedness. Instead of waiting for new variants to spread, we could predict their fitness in advance, guiding public health decisions, vaccine updates, and treatment strategies. By grounding viral evolution in biophysical reality, we move beyond trial-and-error surveillance toward a predictive science of pandemic evolution.
More information: Dianzhuo Wang et al, Biophysical principles predict fitness of SARS-CoV-2 variants, Proceedings of the National Academy of Sciences (2024).
Journal information: Proceedings of the National Academy of Sciences
Dianzhuo Wang is a senior PhD student in the Shakhnovich biophysics lab at Harvard. He uses tools drawn from statistical mechanics and machine learning to unravel the complex evolution of the SARS-CoV-2 virus. His goal is to develop predictive models that provide valuable insights into the future trajectory of the virus.