BEACON AI Testbed

Evaluating AI-Enhanced Weather Forecasting in Southern New Mexico

Overview

This visualization compares four AI weather models (CREDIT, FourCastNet, GraphCast, and Pangu-Weather), each run standalone and as an ensemble with the operational 4DWX baseline, at 38 meteorological stations in southern New Mexico (Domain d03, 3.3 km resolution).

The analysis covers forecast lead times from 6 to 114 hours for three key atmospheric variables: potential temperature, sea level pressure, and wind speed.

01

Mean Performance Over Lead Time

How does forecast accuracy change as we predict further into the future? This chart shows the performance metric averaged across all stations at each lead time.
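The station averaging described above can be sketched as follows. This is a minimal illustration, not the testbed's actual pipeline: the record layout (station, lead time in hours, metric value), the station IDs, and the function name mean_by_lead_time are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical verification records: (station_id, lead_time_hours, metric_value).
records = [
    ("STN01", 6, 1.0),
    ("STN02", 6, 2.0),
    ("STN01", 12, 1.5),
    ("STN02", 12, 2.5),
]

def mean_by_lead_time(rows):
    """Average the metric across stations, separately for each lead time."""
    grouped = defaultdict(list)
    for _station, lead, value in rows:
        grouped[lead].append(value)
    # Sort by lead time so the result reads left to right like the chart.
    return {lead: mean(values) for lead, values in sorted(grouped.items())}

station_mean = mean_by_lead_time(records)
```

Each key of station_mean is a lead time and each value is one point on the plotted line.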

02

Variability Across Stations

How consistent is model performance across different locations? The shaded bands show the range (min to max) across all stations, while the lines show the median.
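A rough sketch of how the band (min to max) and the median line could be derived per lead time. The record layout and the name spread_by_lead_time are assumptions for illustration; the source does not describe the actual implementation.

```python
from collections import defaultdict
from statistics import median

def spread_by_lead_time(rows):
    """Return min, median, and max across stations for each lead time."""
    grouped = defaultdict(list)
    for _station, lead, value in rows:
        grouped[lead].append(value)
    return {
        lead: {"min": min(vals), "median": median(vals), "max": max(vals)}
        for lead, vals in sorted(grouped.items())
    }

# Hypothetical records: (station_id, lead_time_hours, metric_value).
records = [("STN01", 6, 1.0), ("STN02", 6, 2.0), ("STN03", 6, 4.0)]
spread = spread_by_lead_time(records)
```

The "min" and "max" entries bound the shaded band, and "median" gives the line drawn inside it.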

03

Direct Model Comparison

How do models compare head-to-head? The delta plot below shows the computed difference between model pairs across all lead times, while the grouped bar chart compares every model at five key forecast milestones.
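One way the pairwise difference might be computed is shown below. The sign convention (first model minus second) and the function name pairwise_delta are assumptions; the source does not state which direction the delta is taken in.

```python
def pairwise_delta(model_a, model_b):
    """Return model_a minus model_b at every lead time both models share.

    Inputs map lead time (hours) to a station-averaged metric value.
    """
    shared = sorted(set(model_a) & set(model_b))
    return {lead: model_a[lead] - model_b[lead] for lead in shared}

# Hypothetical station-averaged RMSE values per lead time.
credit = {6: 1.5, 12: 2.0, 18: 2.5}
ensemble = {6: 1.0, 12: 2.0, 24: 3.0}
delta = pairwise_delta(credit, ensemble)
```

Under this convention, and with RMSE as the metric, a positive delta means the second model has the lower error at that lead time. Lead times missing from either model are skipped rather than filled in.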

A. Pairwise Difference Across Lead Times

B. Model Performance at Key Milestones

04

Spatial Performance Patterns

Where does each model perform best? This map shows the comparative performance at each station location, averaged across all lead times.

Interpretation

Blue markers indicate CREDIT performs better; orange markers indicate CREDIT + 4DWX performs better. Purple markers show similar performance (within 3%).
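The marker rule could look something like the sketch below. Two points are assumptions, since the source only says "within 3%": the metric is taken to be RMSE (lower is better), and the 3% is measured as the gap between the two values relative to their average. The function name marker_color is hypothetical.

```python
def marker_color(rmse_credit, rmse_ensemble, tolerance=0.03):
    """Classify a station by which configuration has the lower RMSE.

    Assumption: 'similar' means the relative gap, measured against the
    average of the two RMSE values, is at most `tolerance`.
    """
    average = (rmse_credit + rmse_ensemble) / 2.0
    if average == 0.0:
        return "purple"  # both errors are zero, so performance is identical
    relative_gap = abs(rmse_credit - rmse_ensemble) / average
    if relative_gap <= tolerance:
        return "purple"
    return "blue" if rmse_credit < rmse_ensemble else "orange"
```

For example, marker_color(1.0, 1.2) gives "blue" because CREDIT has the clearly lower error, while marker_color(1.00, 1.01) gives "purple".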

05

Detailed Station Analysis

A comprehensive view of CREDIT model performance at every station and lead time. This heatmap reveals patterns that may not be visible in aggregated statistics.
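A heatmap of this kind needs the per-station, per-lead-time values arranged as a grid. The following is a minimal sketch under assumed inputs; the record layout and the name build_heatmap_grid are illustrative, and missing combinations are left as None rather than guessed.

```python
def build_heatmap_grid(rows):
    """Arrange (station, lead_time, value) records into a station x lead-time grid."""
    stations = sorted({station for station, _lead, _value in rows})
    leads = sorted({lead for _station, lead, _value in rows})
    lookup = {(station, lead): value for station, lead, value in rows}
    # One row per station, one column per lead time; None marks a missing cell.
    grid = [[lookup.get((s, l)) for l in leads] for s in stations]
    return stations, leads, grid

# Hypothetical records with one missing station/lead-time combination.
records = [("B", 6, 1.0), ("A", 12, 2.0), ("A", 6, 0.5)]
stations, leads, grid = build_heatmap_grid(records)
```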

06

Methodology

Data Source

Forecast verification data were generated with the METplus verification framework. Observations come from 38 surface stations across southern New Mexico.

Models Compared

  • CREDIT - AI/ML framework (CREDIT_6h)
  • CREDIT + 4DWX - Ensemble combination of CREDIT and 4DWX
  • FourCastNet - AI/ML global weather forecast model
  • FourCastNet + 4DWX - Ensemble combination of FourCastNet and 4DWX
  • GraphCast - Google DeepMind AI weather forecast model
  • GraphCast + 4DWX - Ensemble combination of GraphCast and 4DWX
  • Pangu-Weather - Huawei AI weather forecast model
  • Pangu-Weather + 4DWX - Ensemble combination of Pangu-Weather and 4DWX
  • 4DWX - Operational baseline model

Metrics

  • RMSE (Root Mean Square Error) - Measures absolute forecast accuracy. Lower values are better.
  • RMSE Skill Score - Measures RMSE improvement relative to the 4DWX baseline. Values above 0 indicate improvement over 4DWX, values below 0 indicate degradation, and 1.0 would be a perfect forecast.
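The two metrics can be sketched as below. The skill score uses the common form 1 - RMSE_model / RMSE_baseline, which matches the description above (0 means no change from the baseline, 1.0 is a perfect forecast). Whether the testbed computes it exactly this way is an assumption, and the function names are illustrative.

```python
import math

def rmse(forecasts, observations):
    """Root mean square error between paired forecasts and observations."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))

def rmse_skill_score(rmse_model, rmse_baseline):
    """Skill relative to a baseline (assumed form: 1 - model / baseline)."""
    if rmse_baseline == 0.0:
        raise ValueError("baseline RMSE must be non-zero")
    return 1.0 - rmse_model / rmse_baseline
```

With a baseline RMSE of 2.0, a model RMSE of 1.0 yields a skill score of 0.5, and a model RMSE of 3.0 yields -0.5.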

Variables

  • Potential Temperature (pott) - Temperature an air parcel would have if brought adiabatically to a reference pressure (conventionally 1000 hPa), measured in Kelvin
  • Sea Level Pressure (slp) - Atmospheric pressure at sea level, measured in hPa
  • Wind Speed (wspd) - Surface wind speed, measured in m/s
  • 2m Temperature (t2_adj) - Adjusted 2-meter temperature, measured in Kelvin
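For reference, potential temperature is conventionally obtained from Poisson's equation, theta = T * (p0 / p) ** (R_d / c_p). The sketch below assumes a 1000 hPa reference pressure and R_d / c_p of about 0.2857 for dry air. The source does not say how pott is derived in this dataset, so treat this as the textbook relation rather than the testbed's method.

```python
def potential_temperature(temperature_k, pressure_hpa,
                          reference_hpa=1000.0, kappa=0.2857):
    """Potential temperature in Kelvin from Poisson's equation.

    kappa is R_d / c_p for dry air (assumed value, about 0.2857).
    """
    if pressure_hpa <= 0.0:
        raise ValueError("pressure must be positive")
    return temperature_k * (reference_hpa / pressure_hpa) ** kappa

# Hypothetical station reading: 280 K at 850 hPa.
theta = potential_temperature(280.0, 850.0)
```

At the reference pressure itself the function returns the input temperature unchanged. At pressures below the reference, as at elevated stations, the result is higher than the measured temperature.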