November 5-7, 2025

2nd Workshop on Correctness and Reproducibility for Earth System Software

in conjunction with

Tutorial: Rigor and Reasoning in Research Software


Workshop

We are excited to announce the second edition of the Workshop on Correctness and Reproducibility for Earth System Software, to be held on November 5-7, 2025 at the Mesa Laboratory of the NSF National Center for Atmospheric Research (NCAR) in Boulder, Colorado. We aim to provide a dedicated forum for earth system modelers, software engineers, and the broader scientific software community to discuss challenges, opportunities, and recent advances in ensuring software correctness and reproducibility. This workshop is a follow-up to the inaugural workshop held in November 2023, which brought together participants from academia, research labs, and industry to share their experiences and insights on software correctness and reproducibility.

Sponsored by the 2025 Better Scientific Software (BSSw) Fellowship program, this year’s workshop will feature a Tutorial on Rigor and Reasoning in Research Software, which will include sessions on practical techniques for improving software quality and reliability in scientific computing. The tutorial will cover core topics such as unit testing, continuous integration (CI), property-based testing, correctness in AI, and reasoning in research software. The workshop will also include invited talks, panel discussions, and contributed presentations on a wide range of topics related to software correctness and reproducibility.

Call for Abstracts: We invite contributions from researchers, software engineers, and practitioners in the Earth System Modeling (ESM) community, as well as the broader scientific computing community. Topics include:

  • Testing, debugging, QA, and CI tools
  • Statistical and ensemble-based validation
  • Software design for correctness and reproducibility
  • Automated reasoning, formal methods, and verification techniques
  • Validation of HPC, cloud, heterogeneous, and GPU-based applications
  • Other verification and validation approaches

Relevant applications include simulation codes, external libraries, AI techniques, diagnostics, packaging, and development practices.

Tutorial

Join us for a hands-on tutorial on bringing rigor and reasoning to research software (R3Sw). The R3Sw Tutorial runs primarily on Day 1 (November 5, 2025), with optional sessions on Days 2–3. See the tentative program for more details.

Motivation. Scientific software enables critical research, yet it’s often built in a “code-and-fix” style: quick to prototype, hard to test, and difficult to reason about. The R3Sw Tutorial draws inspiration from the scientific method, introducing practical techniques for designing and verifying code with the same rigor and systematic reasoning that underpin trustworthy scientific discovery.

What you’ll do. Working through a running example (1-D heat equation), you’ll incrementally transform an unstructured, monolithic code into modular, testable, and trustworthy code. Key topics (a short illustrative sketch follows this list):

  • Designing for robustness: specifications, preconditions, postconditions, invariants
  • Unit testing with pytest
  • Property-based testing with Hypothesis
  • Symbolic & bounded checking with CrossHair
  • Practical strategies for testing real-world scientific code and verifying AI/ML-generated code
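
To give a flavor of the exercises, below is a minimal sketch (not taken from the tutorial materials) of the style the running example works toward: a small, contract-style update step for the 1-D heat equation, a pytest unit test, and a Hypothesis property test. The function name heat_step, the chosen properties, and the tolerances are illustrative assumptions rather than the tutorial's actual code.

    import numpy as np
    from hypothesis import given, strategies as st
    from hypothesis.extra import numpy as hnp


    def heat_step(u: np.ndarray, alpha: float, dx: float, dt: float) -> np.ndarray:
        """Advance the 1-D heat equation one explicit (FTCS) step.

        Preconditions (asserted below): a 1-D grid with interior points,
        positive parameters, and a stable step size (alpha*dt/dx**2 <= 0.5).
        Boundary values are held fixed (Dirichlet boundaries).
        """
        r = alpha * dt / dx**2
        assert u.ndim == 1 and u.size >= 3, "need a 1-D grid with interior points"
        assert alpha > 0 and dx > 0 and dt > 0, "parameters must be positive"
        assert r <= 0.5, "explicit scheme is unstable for r > 0.5"

        u_new = u.copy()
        u_new[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
        return u_new


    def test_uniform_field_is_steady_state():
        # Unit test: a constant temperature profile must not change.
        u = np.full(10, 3.0)
        np.testing.assert_allclose(heat_step(u, alpha=1.0, dx=0.1, dt=0.004), u)


    @given(hnp.arrays(np.float64, shape=st.integers(3, 50),
                      elements=st.floats(0.0, 100.0)))
    def test_maximum_principle(u):
        # Property test: a stable diffusion step never creates new extrema.
        out = heat_step(u, alpha=1.0, dx=0.1, dt=0.004)
        assert out.max() <= u.max() + 1e-9
        assert out.min() >= u.min() - 1e-9

Assert-style preconditions like these also provide a natural entry point for the symbolic and bounded checking session with CrossHair.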

Who should attend? This tutorial is intended for scientists, engineers, and students involved in scientific computing, regardless of domain or career stage. No prior experience with testing or verification is required, but participants should have some familiarity with Python.

We will offer travel support for a limited number of students and early-career researchers. Indicate your interest on the registration form. This tutorial is led by Alper Altuntas, features guest lecturers from academia and industry, and is supported by the 2025 Better Scientific Software (BSSw) Fellowship program.

Registration

  • Registration form.
  • In-person fee: $80; virtual fee: $25
  • Travel support: available for a limited number of student and early-career participants. Indicate your interest on the registration form.
  • Tutorial-only participation (Day 1) for students and postdocs is free. Contact altuntas AT ucar DOT edu for details.

Dates

  • Abstract submissions due: July 25, 2025.
  • Notification of acceptance: August 22, 2025.
  • Travel Support application deadline: September 29, 2025.
  • Registration deadline: October 20, 2025 (in person), October 31, 2025 (virtual).
  • Workshop and Tutorial dates: November 5-7, 2025.

Program

  • Click here to see the full program (Days 1-3)

  • Special Session: AI/ML Reasoning and Explainability in Scientific Software (Thursday Morning Session)

    • Keynote: Soonho Kong, Amazon Web Services (AWS)
      Title: “Lean into Verifiable Intelligence”
      Abstract

      The convergence of artificial intelligence and formal verification is creating a transformative virtuous cycle that will reshape how we create and validate knowledge. On one side, AI systems increasingly need verification to ensure correct outputs and build trust. On the other, formal verification methods require AI to overcome fundamental challenges in scalability, usability, and automation. This bidirectional relationship is already revolutionizing mathematics through systems like Lean4, where AI assists in proof discovery while formal foundations ensure mathematical rigor, demonstrating how this synergy accelerates reliable knowledge creation. The success of Lean4 in transforming mathematical research offers a compelling blueprint for achieving verifiable intelligence across domains, fundamentally changing our approach to creating systems that are both powerful and provably correct.

    • Invited Talk: Antonios Mamalakis, University of Virginia (UVA)
      Title: “When Do Explanations Explain? A Controlled Study of XAI Under Varying Signal-to-Noise”
      Abstract

      Neural networks deliver strong predictions across the sciences, but their opacity limits adoption, especially in geoscience, where understanding why a forecast is made matters. Explainable AI (XAI) methods aim to bridge this gap, yet their outputs are method-dependent and can disagree, and even faithful attributions may be physically misleading if the underlying model has learned noise rather than signal. We conduct a controlled study using a synthetic benchmark in which the true drivers of the target are known. By training neural networks across different data sizes and target noise levels (conditions that mirror many observational Earth system datasets), we test when XAI methods recover the true explanatory structure. Two main results emerge. First, explanatory fidelity increases as models capture a larger fraction of the learnable, signal-driven variance. Second, inter-method agreement tracks this fidelity and can serve as a practical proxy when ground truth is unavailable. Conversely, in low signal-to-noise or data-scarce regimes, explanations degrade and methods diverge. These findings offer concrete guidance for deploying XAI in geosciences and beyond: prioritize models that demonstrably learn signal, and use cross-method consensus as an operational check on explanation reliability.

    • Invited Talk: David John Gagne, NSF National Center for Atmospheric Research
      Title: “Validating and Enforcing Physical Consistencies in AI weather prediction models”
      Abstract

      AI weather prediction models have demonstrated remarkable increases in accuracy over traditional NWP models with far less latency and computational requirements for prediction. However, there are a growing number of examples where the improvements in performance come at the expense of physical consistency guarantees that are necessary assumptions for downstream applications like data assimilation. AI weather prediction models also tend to experience error growth in ways that differ noticeably from physics-based models. This presentation will examine different error scenarios for AI weather prediction models and show how some of these errors are being mitigated through architecture and physics constraints in the NCAR CREDIT platform.

    • Panel Discussion: “AI/ML Reasoning and Explainability in Scientific Computing”
      Moderator: Dorit Hammerling, Colorado School of Mines
      Panelists: Soonho Kong (AWS), Antonios Mamalakis (UVA), David John Gagne (NCAR)

Organizers

Co-chairs
Committee
  • John Baugh, Civil Engineering and Operations Research, North Carolina State University
  • Ilene Carpenter, Earth Sciences Segment Manager, Hewlett Packard Enterprise
  • Brian Dobbins, CGD, NSF National Center for Atmospheric Research
  • Michael Duda, Mesoscale & Microscale Meteorology Lab, NSF National Center for Atmospheric Research
  • Karsten Peters-von Gehlen, Department of Data Management, Deutsches Klimarechenzentrum GmbH (DKRZ)
  • Ganesh Gopalakrishnan, Kahlert School of Computing, University of Utah
  • Dorit Hammerling, Applied Mathematics and Statistics, Colorado School of Mines
  • Balwinder Singh, Atmospheric Sciences and Global Change, Pacific Northwest National Laboratory
Administrators

Submissions

  • Abstract submissions are closed.

Venue

Both the workshop and tutorial will be held in person (with a virtual option) at the Mesa Laboratory of the NSF National Center for Atmospheric Research. (Helpful things to know for your visit.)

  • Address: 1850 Table Mesa Dr, Boulder, CO 80305
  • Virtual meeting details will be announced later.


Lodging

  • Fairfield by Marriott Inn & Suites Boulder