November 5-7, 2025
2nd Workshop on Correctness and Reproducibility
for Earth System Software
Tutorial: Rigor and Reasoning in Research Software
We are excited to announce the second edition of the Workshop on Correctness and Reproducibility for Earth System Software, to be held on November 5-7, 2025, at the Mesa Laboratory of the NSF National Center for Atmospheric Research (NCAR) in Boulder, Colorado. We aim to provide a dedicated forum for Earth system modelers, software engineers, and the broader scientific software community to discuss challenges, opportunities, and recent advances in ensuring software correctness and reproducibility. This workshop is a follow-up to the inaugural workshop held in November 2023, which brought together participants from academia, research labs, and industry to share their experiences and insights on software correctness and reproducibility.
Sponsored by the 2025 Better Scientific Software (BSSw) Fellowship program, this year’s workshop will feature a Tutorial on Rigor and Reasoning in Research Software, which will include sessions on practical techniques for improving software quality and reliability in scientific computing. The tutorial will cover core topics such as unit testing, continuous integration (CI), property-based testing, correctness in AI, and reasoning in research software. The workshop will also include invited talks, panel discussions, and contributed presentations on a wide range of topics related to software correctness and reproducibility.
Call for Abstracts: We invite contributions from researchers, software engineers, and practitioners in the Earth System Modeling (ESM) community, as well as the broader scientific computing community, on topics related to software correctness and reproducibility. Relevant applications include simulation codes, external libraries, AI techniques, diagnostics, packaging, and development practices.
Join us for a hands-on tutorial on bringing rigor and reasoning to research software (R3Sw). The R3Sw Tutorial runs primarily on Day 1 (November 5, 2025), with optional sessions on Days 2–3. See the tentative program for more details.
Motivation. Scientific software enables critical research, yet it’s often built in a “code-and-fix” style: quick to prototype, hard to test, and difficult to reason about. The R3Sw Tutorial draws inspiration from the scientific method, introducing practical techniques for designing and verifying code with the same rigor and systematic reasoning that underpin trustworthy scientific discovery.
What you’ll do. Working through a running example (1-D heat equation), you’ll incrementally transform an unstructured, monolithic code into modular, testable, and trustworthy code. Key topics (a brief illustrative sketch follows this list):
pytest (unit testing)
Hypothesis (property-based testing)
CrossHair (contract checking via symbolic execution)
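To give a flavor of these tools, here is a minimal, illustrative sketch, not the tutorial's actual material: the function names, grid size, and contract below are assumptions made for this example. It shows an explicit-Euler step for the 1-D heat equation, a pytest unit test, a Hypothesis property test, and a toy CrossHair-checkable contract.

```python
# Illustrative sketch only; the tutorial's code and exercises may differ.
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays


def heat_step(u: np.ndarray, alpha: float, dx: float, dt: float) -> np.ndarray:
    """Advance the 1-D heat equation one explicit-Euler step (boundary values held fixed)."""
    r = alpha * dt / dx**2  # explicit Euler is stable for r <= 0.5
    u_new = u.copy()
    u_new[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u_new


def test_uniform_field_is_unchanged():
    # pytest unit test: a spatially uniform field is a steady state of diffusion.
    u = np.full(11, 3.0)
    assert np.allclose(heat_step(u, alpha=1.0, dx=0.1, dt=1e-4), u)


@given(arrays(np.float64, 11, elements=st.floats(-1e3, 1e3)))
def test_no_new_extrema(u):
    # Hypothesis property test: a stable diffusion step creates no new extrema.
    u_next = heat_step(u, alpha=1.0, dx=0.1, dt=1e-4)
    assert u_next.max() <= u.max() + 1e-9
    assert u_next.min() >= u.min() - 1e-9


def clamp(x: int, lo: int, hi: int) -> int:
    """Toy helper with a PEP 316 docstring contract that CrossHair can check.

    pre: lo <= hi
    post: lo <= __return__ <= hi
    """
    return max(lo, min(hi, x))
```

Running pytest exercises both tests, and crosshair check on the module attempts to verify (or refute) the docstring contract.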
Who should attend? This tutorial is intended for scientists, engineers, and students involved in scientific computing, regardless of domain or career stage. No prior experience with testing or verification is required, but participants should have some familiarity with Python.
We will offer travel support for a limited number of students and early-career researchers. Indicate your interest on the registration form. This tutorial is led by Alper Altuntas, features guest lecturers from academia and industry, and is supported by the 2025 Better Scientific Software (BSSw) Fellowship program.
Special Session: AI/ML Reasoning and Explainability in Scientific Software (Thursday Morning Session)
The convergence of artificial intelligence and formal verification is creating a transformative virtuous cycle that will reshape how we create and validate knowledge. On one side, AI systems increasingly need verification to ensure correct outputs and build trust. On the other, formal verification methods require AI to overcome fundamental challenges in scalability, usability, and automation. This bidirectional relationship is already revolutionizing mathematics through systems like Lean4, where AI assists in proof discovery while formal foundations ensure mathematical rigor, demonstrating how this synergy accelerates reliable knowledge creation. The success of Lean4 in transforming mathematical research offers a compelling blueprint for achieving verifiable intelligence across domains, fundamentally changing our approach to creating systems that are both powerful and provably correct.
Neural networks deliver strong predictions across the sciences, but their opacity limits adoption, especially in geoscience, where understanding why a forecast is made matters. Explainable AI (XAI) methods aim to bridge this gap, yet their outputs are method-dependent and can disagree, and even faithful attributions may be physically misleading if the underlying model has learned noise rather than signal. We conduct a controlled study using a synthetic benchmark in which the true drivers of the target are known. By training neural networks across different data sizes and target noise levels (conditions that mirror many observational Earth system datasets), we test when XAI methods recover the true explanatory structure. Two main results emerge. First, explanatory fidelity increases as models capture a larger fraction of the learnable, signal-driven variance. Second, inter-method agreement tracks this fidelity and can serve as a practical proxy when ground truth is unavailable. Conversely, in low signal-to-noise or data-scarce regimes, explanations degrade and methods diverge. These findings offer concrete guidance for deploying XAI in geosciences and beyond: prioritize models that demonstrably learn signal, and use cross-method consensus as an operational check on explanation reliability.
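As a rough sketch of the cross-method consensus check described above (the attribution vectors and agreement metric here are placeholder assumptions, not the study's actual setup), agreement between two attribution maps for the same model and input could be quantified with a rank correlation:

```python
# Illustrative sketch: measure agreement between two feature-attribution vectors
# produced by different XAI methods for the same model and input.
import numpy as np
from scipy.stats import spearmanr


def attribution_agreement(attr_a: np.ndarray, attr_b: np.ndarray) -> float:
    """Spearman rank correlation between two per-feature attribution vectors."""
    rho, _ = spearmanr(attr_a.ravel(), attr_b.ravel())
    return float(rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attr_method_a = rng.normal(size=100)                        # stand-in for one method's attributions
    attr_method_b = attr_method_a + 0.1 * rng.normal(size=100)  # a second, broadly agreeing method
    print(f"Inter-method agreement (Spearman rho): {attribution_agreement(attr_method_a, attr_method_b):.2f}")
```

High consensus does not guarantee physical correctness, but, as noted above, low consensus is a useful warning sign in low signal-to-noise or data-scarce regimes.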
AI weather prediction models have demonstrated remarkable increases in accuracy over traditional NWP models with far less latency and computational requirements for prediction. However, there are a growing number of examples where the improvements in performance come at the expense of physical consistency guarantees that are necessary assumptions for downstream applications like data assimilation. AI weather prediction models also tend to experience error growth in ways that differ noticeably from physics-based models. This presentation will examine different error scenarios for AI weather prediction models and show how some of these errors are being mitigated through architecture and physics constraints in the NCAR CREDIT platform.
Both the workshop and tutorial will be held in person (with a virtual option) at the Mesa Laboratory of the NSF National Center for Atmospheric Research. (See Helpful things to know for your visit.)
Virtual Meeting details will be announced later.