Peering inside political AI: How LLMs responded to the 2024 election

Illustration of an AI choosing between the Republican and Democratic parties. Credit: AI-generated image

In the months leading up to the 2024 U.S. presidential election, a team of researchers at MIT CSAIL, MIT Sloan, and MIT LIDS set out to answer a question no one had fully explored: how do large language models (LLMs) respond to questions about the election? Over four months, from July through November, the team ran nearly daily queries across 12 state-of-the-art models on more than 12,000 carefully constructed prompts, generating a dataset of over 16 million LLM responses.

This project was sparked by a recognition that the 2024 election was the first to unfold after LLMs became widely available in November 2022. While social media's influence on elections has been studied extensively, the researchers note that LLMs are capable of potentially subtler and more persuasive behaviors: they can sycophantically reinforce ideas, mislead, or even exhibit manipulative tendencies. Understanding how these models interact with political narratives was the central motivation for the study.

Rather than focusing on a single model or snapshot in time, the team approached the election as a longitudinal experiment. The prompts covered a range of topics: candidate traits, exit poll-style questions, election issues, and predictive scenarios.

Questions were systematically varied to explore framing and identity cues—like adding "I am a Republican" or "I am a woman" to prompts—and both offline and online versions of models were included. By collecting time-stamped responses across multiple models and prompt variations, the team created a layered, multidimensional view of LLM behavior that could capture both stability and change during the first major U.S. election since LLMs became mainstream.
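
The paper's prompt-construction code isn't reproduced here, but the basic idea of crossing base questions with identity cues can be sketched in a few lines of Python. The questions and cues below are illustrative placeholders, not the study's actual prompts.

```python
# Minimal sketch (not the authors' code): cross base questions with identity
# cues to build a grid of prompt variants like the one described above.
from itertools import product

base_questions = [
    "Which candidate would you describe as competent?",
    "Do you expect life for the next generation of Americans to be better, "
    "worse, or about the same?",
]  # hypothetical examples of candidate-trait and exit-poll-style items

identity_cues = ["", "I am a Republican. ", "I am a Democrat. ", "I am a woman. "]

prompt_variants = [cue + question for cue, question in product(identity_cues, base_questions)]
for prompt in prompt_variants[:4]:
    print(prompt)
```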

The breadth and structure of the resulting dataset make it unique. Researchers can see how models shift in response to current events, internal updates, prompt framing, and identity cues. They can track responses to "endogenous" questions, expected to remain stable regardless of external events, alongside "exogenous" questions that might reflect news or political developments.

Embedding analyses allow comparisons of how similar, or divergent, responses are across time and models. In short, the dataset lets you watch these models think, stumble, and sometimes contradict themselves during the 2024 election season.
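
As a rough illustration of this kind of embedding comparison (the paper's exact embedding model and pipeline may differ), one can embed two responses to the same prompt from different dates and compute their cosine similarity. The sentence-transformers model and example responses below are assumptions made for the sketch.

```python
# Illustrative sketch: cosine similarity between embeddings of two responses
# to the same prompt, collected on different dates. The embedding model is an
# assumption, not necessarily the one used in the paper.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

response_july = "The candidate is widely seen as experienced and steady."
response_sept = "Voters increasingly describe the candidate as untested."

a, b = embedder.encode([response_july, response_sept])
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Cosine similarity across time: {cosine:.3f}")
```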

Even a slice of the dataset reveals fascinating patterns. For example, the researchers found that LLMs exhibit strong associations between candidates and traits, like "competent" and "temperamental." Moreover, these associations shift across time.

Comparing associations before and after Kamala Harris was formally nominated as the Democratic candidate on August 5, Joe Biden's scores fell across nearly every trait except "incompetent." Some of his lost associations were distributed evenly between Trump and Harris; others, like "charismatic," "compassionate," and "strategic," moved mostly to Harris, while "competent" and "trustworthy" shifted toward Trump. The researchers caution that this analysis is not causal: many factors besides Harris' nomination could account for the shifts.

Patterns also emerged in how models choose to abstain. Across the candidate-trait questions, "Other" or "Unsure" responses frequently exceeded 40% for various models. Adjectives like "ethical," "weak," and "incompetent" were particularly likely to trigger abstention. This behavior varied across models: GPT-4, Claude's Haiku, and Claude's Opus versions declined more often, while Perplexity's model tended to answer more directly. Online vs. offline distinctions were visible, too, suggesting that access to live information affects how guardrails operate.
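
Measuring this kind of abstention amounts to tabulating how often each model answers "Other" or "Unsure." A minimal sketch follows; the rows and column names are invented, not the dataset's actual schema.

```python
# Hypothetical tabulation of abstention rates by model and by adjective.
import pandas as pd

responses = pd.DataFrame({
    "model":     ["gpt-4", "gpt-4", "claude-haiku", "claude-opus", "perplexity"],
    "adjective": ["ethical", "competent", "weak", "incompetent", "ethical"],
    "answer":    ["Unsure", "Harris", "Other", "Unsure", "Trump"],
})

responses["abstained"] = responses["answer"].isin(["Other", "Unsure"])
print(responses.groupby("model")["abstained"].mean())      # abstention rate per model
print(responses.groupby("adjective")["abstained"].mean())   # abstention rate per adjective
```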

The dataset also exposes how LLMs simulate and predict voter sentiment. When asked to predict how voter groups would respond to exit poll-style questions such as, "Do you expect life for the next generation of Americans to be better than life today, worse than life today, or about the same?", models often predicted highly variable responses for Trump, Harris, and Biden supporters. On whether life for the next generation would be better or worse, GPT-4o's predictions were skewed: Harris supporters were predicted to be more optimistic, Trump supporters more pessimistic.

Cosine similarities between embeddings (as described in Section 5.1 of the paper) across time, for four question categories (rows) and six offline models (columns). Credit: arXiv (2025). DOI: 10.48550/arxiv.2509.18446

These questions reveal more than simply the models' exit poll predictions—they also reveal the models' election predictions. While models typically refused to predict the election outcome when asked directly, their responses to the exit poll-related questions allowed indirect inference of their forecasts. By applying solver methods to the exit poll predictions, researchers inferred which voter base each model implicitly treated as most representative of voters overall.
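
One simple way to set up such an inference, sketched here under assumptions rather than as the paper's exact solver, is to treat the model's prediction for voters overall as a convex combination of its per-group predictions and solve for the mixture weights with non-negative least squares. All numbers below are invented.

```python
# Back out implicit mixture weights over voter groups (a simplified sketch,
# not necessarily the method used in the paper).
import numpy as np
from scipy.optimize import nnls

# Rows: answer options ("better", "worse", "about the same").
# Columns: predicted answer shares for Trump, Harris, and Biden supporters.
group_predictions = np.array([
    [0.20, 0.55, 0.45],
    [0.60, 0.25, 0.30],
    [0.20, 0.20, 0.25],
])
overall_prediction = np.array([0.40, 0.40, 0.20])  # prediction for "voters overall"

weights, _ = nnls(group_predictions, overall_prediction)
weights /= weights.sum()  # normalize so the weights form a probability mix
print(dict(zip(["Trump", "Harris", "Biden"], weights.round(2))))
```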

Interestingly, models were not self-consistent: for example, when asked about issues such as taxes and inflation, GPT-4o's predictions implicitly pointed to Trump supporters being more representative, while Harris supporters were predicted to be more representative when the model was prompted to consider exit poll questions about education, immigration, and racial equality.

Time itself became another axis of variation. Even offline models with deterministic settings showed abrupt "step changes" on specific dates, sometimes aligning with checkpoint or version updates, sometimes appearing spontaneously, hinting at internal dynamics in training, guardrailing, or deployment that may not be visible to users.
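
Detecting such step changes is straightforward once responses are time-stamped: for an offline model run with deterministic settings, any date on which the output to a fixed prompt differs from the previous day's output is a candidate change point. The records below are hypothetical.

```python
# Flag dates where a deterministic model's response to a fixed prompt changes.
daily_responses = [
    ("2024-07-01", "Response A"),
    ("2024-07-02", "Response A"),
    ("2024-07-03", "Response B"),  # abrupt change despite an identical prompt
]

for (prev_date, prev_text), (curr_date, curr_text) in zip(daily_responses, daily_responses[1:]):
    if curr_text != prev_text:
        print(f"Step change on {curr_date}: output differs from {prev_date}")
```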

Prompt framing further influenced responses. When demographic or identity cues were included, models adjusted their outputs in noticeable ways, not just in wording but in trait attribution, issue emphasis, and even exit poll predictions. Some models were highly steerable, showing large shifts between cues, while others were more stable.

Taken together, these findings hint at the depth and complexity contained in the dataset. Candidate-trait associations, refusal behavior, voter sentiment, implicit predictions, temporal drift, and prompt framing effects all emerge naturally from the data. Yet these analyses are just illustrations; the dataset itself offers a far richer canvas open for exploration.

For lead author Sarah Cen (EECS '24), an assistant professor at Carnegie Mellon University, and senior authors Aleksander Mądry, Cadence Design Systems Professor of Computing at MIT EECS and CSAIL principal investigator, and Chara Podimata, an assistant professor at MIT Sloan and affiliate member of LIDS, the goal is clear: to provide a methodology and dataset that allow researchers to study how models respond to users during election seasons and in politically sensitive contexts.

Looking ahead, the team envisions comparing model outputs with real-world polls, expanding studies to other democracies, examining newer reasoning or "thinking" models, and exploring downstream effects on users. For now, the dataset itself stands as an open invitation to map the terrain of political AI and better understand how LLMs are shaping discourse in modern democracy.

The project's authors include Carnegie Mellon Assistant Professor Andrew Ilyas (EECS '24), MIT research scientist Hedi Driss, and MIT EECS Ph.D. students Charlotte Park and Aspen Hopkins.

More information: Sarah H. Cen et al, Large-Scale, Longitudinal Study of Large Language Models During the 2024 US Election Season, arXiv (2025). DOI: 10.48550/arxiv.2509.18446

Journal information: arXiv

Citation: Peering inside political AI: How LLMs responded to the 2024 election (2025, September 25) retrieved 25 September 2025 from /news/2025-09-peering-political-ai-llms-election.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
