New AI technique creates synthetic images to track costly invasive plants

Gaby Clark
scientific editor

Andrew Zinin
lead editor

A single invasive plant species costs U.S. ranchers more than $35 million a year. Now, a team of researchers is using artificial intelligence to help keep it in check.
Researchers at Carnegie Mellon University, together with scientists at a conservation ranch in Montana, developed a method that trains machine learning models to detect invasive species more effectively, even with limited data. The study is published in the journal Scientific Data.
One such invasive species, leafy spurge, is a noxious weed with small green flowers that can wreak havoc on farms and natural ecosystems. Because it is toxic to livestock and crowds out native plants, it can render whole hayfields inedible. Researchers estimate that it causes more than $35 million in losses annually in the country's beef and hay production.
But the costs go far beyond the bottom line. As grazing lands shrink, food supplies tighten. As invasive species spread, pesticide use rises. Native plants disappear, pollinators and birds lose habitat, and the broader ecosystem begins to unravel. What starts as a weed in a pasture can quickly become a threat to the land itself.
Tracking down, monitoring and ultimately eradicating invasive species is a priority for those in agriculture, conservation and ecology, but this task is time-consuming and costly. AI could help, but scientists lack the data needed to develop a tool that can effectively identify and monitor these plants.
"These invasive plants are a serious problem," said Ruslan Salakhutdinov, a faculty member in CMU's School of Computer Science (SCS). "Leafy spurge can destroy the ecosystems around it. Building a machine learning tool to help was a tough problem to solve because we didn't have massive amounts of data on this plant, even online.
"So, it became this problem of how do we build accurate models with a limited amount of data. That's the problem we wanted to solve. And, along with this interesting problem, the solution has a big impact on the ecology and environment."
Salakhutdinov is the UPMC Professor of Computer Science in the Machine Learning Department (MLD). He worked with Brandon Trabucco, an MLD doctoral student; Max Gurinas, at Harvard University; and Kyle Doherty, a staff scientist at MPG Ranch, which manages more than 15,000 acres of conservation property in western Montana for research. Scientists at MPG Ranch had been using drones and AI tools to map the locations of plants in greater detail. Even though Doherty called himself an "AI nerd," working with Salakhutdinov's lab expanded his knowledge of the field.
The SCS researchers wanted to use new generative AI tools to improve models already trained to detect leafy spurge in drone images. Salakhutdinov and Trabucco wondered whether AI-generated synthetic images of leafy spurge could supply the missing training data and make those models more accurate.
The researchers developed a new technique called DA-Fusion, which uses generative diffusion models to create more varied and useful training images. Normally, to expand a dataset, researchers make small changes to existing images, such as cropping or flipping them. DA-Fusion goes further by changing the image's subject or background. For example, if the original photo shows leafy spurge growing in a crop field, DA-Fusion might generate an image of the same weed in a forest or a grassy field during a different season.
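To make the contrast concrete, here is a minimal sketch of the "normal" augmentations mentioned above, a random crop followed by an optional horizontal flip. It is an illustration only, not code from the study: the image is a toy list of pixel rows standing in for a drone tile, and the function names are hypothetical. Note that these operations only rearrange pixels that are already there; the weed, the background and the season stay the same, which is the limitation DA-Fusion is designed to get past.

```python
import random

# A toy image: a list of pixel rows (hypothetical stand-in for a drone tile).
Image = list[list[int]]

def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def random_crop(img: Image, height: int, width: int, rng: random.Random) -> Image:
    """Cut out a height x width window at a random position."""
    max_top = len(img) - height
    max_left = len(img[0]) - width
    if max_top < 0 or max_left < 0:
        raise ValueError("crop size is larger than the image")
    top = rng.randint(0, max_top)
    left = rng.randint(0, max_left)
    return [row[left:left + width] for row in img[top:top + height]]

def classic_augment(img: Image, crop: tuple[int, int], rng: random.Random) -> Image:
    """Random crop, then a horizontal flip with probability 0.5."""
    out = random_crop(img, crop[0], crop[1], rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return out

tile = [[r * 4 + c for c in range(4)] for r in range(4)]
print(horizontal_flip(tile)[0])  # [3, 2, 1, 0]
print(classic_augment(tile, (2, 2), random.Random(0)))
```

Every output pixel comes from the input tile, so no new visual content is ever created.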
DA-Fusion created diverse training data showing leafy spurge under varied conditions, such as snow cover or a spring bloom. This spared ecologists in Montana from having to go out and collect data in every kind of weather.
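Synthetic images supplement the real drone photos rather than replace them. The sketch below shows one plausible way to mix the two during training, under stated assumptions: several synthetic variants are generated ahead of time for each real image, and each time a real image is drawn, it is swapped for one of its variants with some probability. The class, the `synthetic_prob` parameter and the `fake_generate_variants` stub are all hypothetical; the stub stands in for the diffusion model and returns placeholder ids only.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Example:
    image_id: str  # id of a real drone tile (hypothetical)
    label: int     # 1 = leafy spurge present, 0 = absent
    variants: list[str] = field(default_factory=list)  # ids of pre-generated synthetic versions

def fake_generate_variants(image_id: str, n: int) -> list[str]:
    """Hypothetical stand-in for a diffusion-based generator; returns placeholder ids."""
    return [f"{image_id}/synthetic_{k}" for k in range(n)]

def sample_training_item(ex: Example, synthetic_prob: float,
                         rng: random.Random) -> tuple[str, int]:
    """Return the real image or, with probability synthetic_prob, one of its variants.

    The label is kept unchanged, on the assumption that each synthetic image
    depicts the same class (spurge present or absent) as its source photo.
    """
    if ex.variants and rng.random() < synthetic_prob:
        return rng.choice(ex.variants), ex.label
    return ex.image_id, ex.label

dataset = [Example("tile_001", 1), Example("tile_002", 0)]
for ex in dataset:
    ex.variants = fake_generate_variants(ex.image_id, n=3)

rng = random.Random(42)
batch = [sample_training_item(ex, synthetic_prob=0.5, rng=rng) for ex in dataset]
print(batch)
```

Setting `synthetic_prob` to 0 recovers training on real images only, so the same loop can serve as a baseline for measuring whether the synthetic data helps.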
A representative sample of leafy spurge presences (top; a) and absences (bottom; b).
"The costs to effectively manage a conservation ranch like MPG can be quite high, and a lot of that is due to the labor necessary to access remote areas and assess the presence of plants like leafy spurge," said Gurinas. "Machine learning techniques like the one we've developed allow a degree of automation which makes conservation efforts across wider areas more economically feasible."
By increasing the diversity and quality of training data, researchers can build more accurate machine learning models from fewer real examples. The team agrees that this collaboration between conservation scientists and machine learning researchers is a critical development for the future of agriculture and ecology.
"The exciting thing about the datasets that we're building is that they're unique. There aren't many ecological datasets out there for the machine learning community to sink their teeth into," Doherty said. "I think people are interested in making an impact. You can solve the problem of restoration ecologists and combat climate change. It's meaningful work that's important to put out there."
Doherty and his colleagues at MPG Ranch have made their leafy spurge dataset publicly available for the machine learning community. By sharing the data openly, they hope to accelerate efforts to detect and manage other invasive species.
"These tasks are some of the most important we face as a society," Trabucco said. "Problems like leafy spurge are very underserved, and maybe the advances that we see in machine learning can help us in the ways we've seen these tools solve other problems and unlock new abilities."
More information: Kyle Doherty et al., Ground-truthed and high-resolution drone images of the leafy spurge weed plant (Euphorbia esula), Scientific Data (2025).
Journal information: Scientific Data
Provided by Carnegie Mellon University