AI advance helps astronomers spot cosmic events with just a handful of examples

Stephanie Baum
scientific editor

Robert Egan
associate editor

A new study co-led by the University of Oxford and Google Cloud has shown how general-purpose AI can accurately classify real changes in the night sky—such as an exploding star, a black hole tearing apart a passing star, a fast-moving asteroid, or a brief stellar flare from a compact star system—and explain its reasoning, without the need for complex training.
The study, published in Nature Astronomy by researchers from the University of Oxford, Google Cloud, and Radboud University, demonstrates that a general-purpose large language model (LLM), Google's Gemini, can be transformed into an expert astronomy assistant with minimal guidance.
Using just 15 example images and a simple set of instructions, Gemini learned to distinguish real cosmic events from imaging artifacts with approximately 93% accuracy. Crucially, the AI also provided a plain-English explanation for every classification—an important step towards making AI-driven science more transparent and trustworthy, and towards building accessible tools that don't require massive training datasets or deep expertise in AI programming.
"It's striking that a handful of examples and clear text instructions can deliver such accuracy," said Dr. Fiorenzo Stoppa, co-lead author, from the University of Oxford's Department of Physics. "This makes it possible for a broad range of scientists to develop their own classifiers without deep expertise in training neural networks—only the will to create one."
"As someone without formal astronomy training, this research is incredibly exciting," said Turan Bulmus, co-lead author, from Google Cloud. "It demonstrates how general-purpose LLMs can democratize scientific discovery, empowering anyone with curiosity to contribute meaningfully to fields they might not have a traditional background in. It's a testament to the power of accessible AI to break down barriers in scientific research."
Rare signals in a universe of noise
Modern telescopes scan the sky relentlessly, generating millions of alerts every night about potential changes. While some of these are genuine discoveries like exploding stars, the vast majority are "bogus" signals caused by satellite trails, cosmic ray hits, or other instrumental artifacts.
Traditionally, astronomers have relied on specialized machine learning models to filter this data. However, these systems often operate like a "black box," providing a simple "real" or "bogus" label without explaining their logic. This forces scientists to either blindly trust the output or spend countless hours manually verifying thousands of candidates—a task that will become impossible with the next generation of telescopes such as the Vera C. Rubin Observatory, which will output around 20 terabytes of data every 24 hours.
The research team asked a simple question: Could a general-purpose, multimodal AI like Gemini, designed to understand text and images together, not only match the accuracy of specialized models, but also explain what it sees?
The team provided the LLM with just 15 labeled examples for each of three major sky surveys (ATLAS, MeerLICHT, and Pan-STARRS). Each example included a small image of a new alert, a reference image of the same patch of sky, and a "difference" image highlighting the change, along with a brief expert note. Guided only by these few-shot examples and concise instructions, the model then classified thousands of new alerts, providing a label (real/bogus), a priority score, and a short, readable description of its decision.
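The few-shot setup described above can be sketched as follows. This is an illustrative reconstruction, not the authors' actual code: the `Example` structure, prompt wording, and `build_prompt` helper are assumptions, and a real pipeline would attach the actual image data to a multimodal model rather than file paths.

```python
from dataclasses import dataclass

@dataclass
class Example:
    new_image: str   # path to the alert cutout (hypothetical field names)
    ref_image: str   # path to the reference cutout of the same sky patch
    diff_image: str  # path to the difference image highlighting the change
    label: str       # "real" or "bogus"
    note: str        # brief expert note explaining the label

def build_prompt(examples, candidate):
    """Assemble a few-shot prompt: concise instructions, ~15 labeled
    image-triplet examples, then the unlabeled candidate to classify."""
    parts = [
        "You are an expert astronomer. For each image triplet, classify the "
        "alert as 'real' or 'bogus', assign a priority score, and briefly "
        "explain your reasoning."
    ]
    for i, ex in enumerate(examples, 1):
        parts.append(
            f"Example {i}: new={ex.new_image}, ref={ex.ref_image}, "
            f"diff={ex.diff_image} -> {ex.label}. Note: {ex.note}"
        )
    new, ref, diff = candidate
    parts.append(f"Candidate: new={new}, ref={ref}, diff={diff} -> ?")
    return "\n".join(parts)
```

In practice, one such prompt would be built per survey (ATLAS, MeerLICHT, Pan-STARRS), each with its own 15 examples, since artifacts look different from instrument to instrument.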

A human in the loop: An AI that knows when to ask for help
A key component of the study was verifying the quality and usefulness of the AI's explanations. The team assembled a panel of 12 astronomers who reviewed the AI's descriptions and rated them as highly coherent and useful.
Moreover, in a parallel test, the team had Gemini review its own answers and assign a coherence score to each one. They discovered that the model's confidence was a powerful indicator of its accuracy: low-coherence outputs were much more likely to be incorrect. This self-assessment capability is critical for building a reliable "human-in-the-loop" workflow. By automatically flagging its own uncertain cases for human review, the system can focus astronomers' attention where it is most needed.
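A triage step like the one described could look like the minimal sketch below. The coherence scale and the 0.7 threshold are illustrative assumptions, not values from the paper; the point is only that low self-assessed coherence routes a candidate to a human rather than being auto-accepted.

```python
def triage(outputs, threshold=0.7):
    """Split model classifications into auto-accepted and human-review
    queues based on the model's own coherence score (assumed 0-1 scale)."""
    accepted, needs_review = [], []
    for out in outputs:
        if out["coherence"] >= threshold:
            accepted.append(out)
        else:
            needs_review.append(out)  # flag uncertain case for an astronomer
    return accepted, needs_review

auto, review = triage([
    {"id": "alert-1", "label": "real",  "coherence": 0.95},
    {"id": "alert-2", "label": "bogus", "coherence": 0.40},
])
# alert-1 is accepted automatically; alert-2 goes to a human reviewer.
```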
Using this self-correction loop to refine the initial examples, the team improved the model's performance on one dataset from ~93.4% to ~96.7%, demonstrating how the system can learn and improve in partnership with human experts.
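The refinement loop can be sketched in the same spirit, again as an assumed workflow rather than the published method: hard cases that the model got wrong and a human then corrected are promoted into the few-shot example set, which stays capped at its original size.

```python
def refine_examples(examples, corrected_cases, max_examples=15):
    """Fold human-corrected hard cases into the few-shot example set,
    dropping the oldest examples to keep the set at max_examples."""
    updated = examples + corrected_cases
    return updated[-max_examples:]
```

Iterating this step is one plausible way a small example set could climb from roughly 93% to roughly 97% accuracy, as reported for one of the datasets.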
Co-author Professor Stephen Smartt (Department of Physics, University of Oxford) said, "I've worked on this problem of rapidly processing data from sky surveys for over 10 years, and we are constantly plagued by weeding out the real events from the bogus signals in the data processing. We have spent years training machine learning models, neural networks, to do image recognition.
"However, the LLM's accuracy at recognizing sources with minimal guidance rather than task-specific training was remarkable. If we can engineer to scale this up, it could be a total game-changer for the field, another example of AI enabling scientific discovery."
What's next?
The team envisions this technology as the foundation for autonomous "agentic assistants" in science. Such systems could do far more than classify a single image; they could integrate multiple data sources (like images and brightness measurements), check their own confidence, autonomously request follow-up observations from robotic telescopes, and escalate only the most promising and unusual discoveries to human scientists.
Because the method requires only a small set of examples and plain-language instructions, it can be rapidly adapted for new scientific instruments, surveys, and research goals across different fields.
"We are entering an era where scientific discovery is accelerated not by black-box algorithms, but by transparent AI partners," said Turan Bulmus, co-lead author from Google Cloud.
"This work shows a path towards systems that learn with us, explain their reasoning, and empower researchers in any field to focus on what matters most: asking the next great question."
More information: Textual interpretation of transient image classifications from large language models, Nature Astronomy (2025).
Journal information: Nature Astronomy
Provided by University of Oxford