AI and climate change: How to reliably record greenhouse gas emissions

September 8, 2025

Large companies in the EU are legally required to report their greenhouse gas (GHG) emissions. Yet pulling this information manually from long PDF sustainability reports is slow and error-prone. Many teams try to speed up the process with automation, for example by using large language models (LLMs), AI systems that read text and produce answers.

Dr. Malte Schierholz, project coordinator and postdoctoral researcher at the Social Data Science and AI Lab (SODA Lab), urges caution, however. "With automatic extraction methods, it's easy to fully trust the LLM's output and overlook measurement errors that occur frequently."

Because the trend toward more automation is both promising and risky, the research group Greenhouse Gas Insights and Sustainability Tracking (GIST) set out to build a reliable point of reference for collecting emission data.

A gold standard for recording emissions data

In a paper published in Scientific Data, the group introduces a gold-standard benchmark dataset for extracting GHG emissions. The dataset is based on sustainability reports sampled from companies in the MSCI World Small Cap index and the German DAX.

"The basic task was to extract GHG emissions values from PDF files into a table," says Schierholz. "What first sounded straightforward turned out to be surprisingly complex."

In a multi-stage process, sustainable finance experts from LMU and Deutsche Bundesbank worked with methodologists to define strict annotation rules, ran multiple rounds of extraction and verification, and convened expert discussion groups.

"If you want a dataset that's both accurate and allows for comparisons between companies, you need clear rules and plenty of feedback loops throughout the data annotation process," says Jacob Beck, who led the annotation effort. "In the end, some ambiguous cases still required expert group discussion."

Many companies do not provide sufficient documentation

Sustainable finance researcher Dr. Andreas Dimmelmeier (GreenDIA consortium) was not surprised. "The hard-to-resolve cases stem not only from complex and partly inconsistent reporting protocols, but also from missing context and incomplete disclosures in company reports. Many companies in our sample did not disclose emissions according to established reporting and calculation frameworks."

The team also observed that about half of the reports contained no usable greenhouse gas data at all. When emissions were reported, they most often referred to direct emissions and indirect emissions from energy consumption. Data on other indirect emissions, such as those arising in the supply chain or from travel and transport, was rarely complete.

The dataset, together with scripts and supplementary materials, offers a transparent, rigorously curated foundation for evaluating automated approaches to sustainability reporting. By making the assumptions and decisions explicit, it enables fair method comparisons and clearer communication of annotation uncertainty. The GIST group hopes this resource will help researchers and practitioners measure progress more honestly and close critical data gaps on the path to net zero.
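To illustrate what evaluating an automated approach against such a benchmark might look like, here is a minimal Python sketch. It is not the group's published evaluation code. The record layout (company, reporting year, emission category), the category labels, the function name `score_extraction` and the 1% tolerance are all assumptions made for this example, and the dataset's actual schema may differ.

```python
from typing import Optional

# Hypothetical key layout: (company, reporting year, emission category).
# The published dataset's real schema may differ.
Key = tuple[str, int, str]


def score_extraction(
    gold: dict[Key, Optional[float]],
    predicted: dict[Key, Optional[float]],
    rel_tol: float = 0.01,  # assumed tolerance for rounding differences
) -> dict[str, float]:
    """Compare extracted emission values with gold-standard annotations.

    A gold value of None means the report contains no usable figure.
    """
    correct = wrong = missed = spurious = 0
    for key, gold_value in gold.items():
        pred_value = predicted.get(key)
        if gold_value is None:
            # Nothing to extract: any predicted number is spurious.
            if pred_value is None:
                correct += 1
            else:
                spurious += 1
        elif pred_value is None:
            # A disclosed value that the extractor failed to find.
            missed += 1
        elif abs(pred_value - gold_value) <= rel_tol * abs(gold_value):
            correct += 1
        else:
            wrong += 1
    total = len(gold)
    return {
        "correct": correct,
        "wrong": wrong,
        "missed": missed,
        "spurious": spurious,
        "accuracy": correct / total if total else 0.0,
    }


# Made-up example values, for illustration only.
gold = {
    ("Company A", 2022, "direct"): 100.0,
    ("Company A", 2022, "energy_indirect"): 50.0,
    ("Company A", 2022, "other_indirect"): None,
    ("Company B", 2022, "direct"): 200.0,
}
predicted = {
    ("Company A", 2022, "direct"): 100.5,
    ("Company A", 2022, "energy_indirect"): 500.0,
    ("Company A", 2022, "other_indirect"): 12.0,
}
print(score_extraction(gold, predicted))
```

Counting missed and spurious values separately from wrong ones reflects the article's point that many reports contain no usable data: an extractor that invents a number where none was disclosed makes a different kind of error than one that misreads a disclosed figure.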

More information: Jacob Beck et al., Addressing data gaps in sustainability reporting: A benchmark dataset for greenhouse gas emission extraction, Scientific Data (2025).

Journal information: Scientific Data

Provided by Ludwig Maximilian University of Munich

Citation: AI and climate change: How to reliably record greenhouse gas emissions (2025, September 8) retrieved 8 September 2025 from /news/2025-09-ai-climate-reliably-greenhouse-gas.html
