Grand Statistics#

This example demonstrates how to compute grand statistics for an observation sequence. For an explanation of the statistics calculations see the Statistics guide.

Import the obs_sequence module and the statistics module.

import pydartdiags.obs_sequence.obs_sequence as obsq
from pydartdiags.stats import stats

Chose an obs_seq file to read. This is a small obs_seq file “obs_seq.final.ascii.medium” that comes with the pyDARTdiags package in the data directory, so we import os to get the path to the file

import os
data_dir = os.path.join(os.getcwd(), "../..", "data")
data_file = os.path.join(data_dir, "obs_seq.final.ascii.medium")

Read the obs_seq file into an obs_seq object.

obs_seq = obsq.ObsSequence(data_file)

# Select observations that were used in the assimilation.
used_obs = obs_seq.select_used_qcs()

used_obs is a dataframe with only the observations with QC value of 0.

The columns of the dataframe are the same as the original obs_seq dataframe.

used_obs.columns
Index(['obs_num', 'observation', 'prior_ensemble_mean',
       'prior_ensemble_spread', 'Data_QC', 'DART_quality_control',
       'linked_list', 'longitude', 'latitude', 'vertical', 'vert_unit', 'type',
       'metadata', 'external_FO', 'seconds', 'days', 'time', 'obs_err_var'],
      dtype='object')

Now we calculate the statistics required for DART diagnostics.

stats.diag_stats(used_obs)
obs_num observation prior_ensemble_mean prior_ensemble_spread Data_QC DART_quality_control linked_list longitude latitude vertical vert_unit type metadata external_FO seconds days time obs_err_var prior_sq_err prior_bias prior_totalvar
0 1 230.16 231.310652 0.405191 1.0 0.0 -1 2 -1 274.460 40.010 23950.0 pressure (Pa) ACARS_TEMPERATURE [] [] 75603 153005 2019-12-01 21:00:03 1.00 1.324001 1.150652 1.164180
1 2 18.40 15.720527 0.630827 1.0 0.0 1 3 -1 274.460 40.010 23950.0 pressure (Pa) ACARS_U_WIND_COMPONENT [] [] 75603 153005 2019-12-01 21:00:03 6.25 7.179578 -2.679473 6.647942
2 3 1.60 -4.932073 0.825899 1.0 0.0 2 4 -1 274.460 40.010 23950.0 pressure (Pa) ACARS_V_WIND_COMPONENT [] [] 75603 153005 2019-12-01 21:00:03 6.25 42.667980 -6.532073 6.932109
3 4 264.16 264.060532 0.035584 1.0 0.0 3 5 -1 242.628 34.105 56260.0 pressure (Pa) ACARS_TEMPERATURE [] [] 75603 153005 2019-12-01 21:00:03 1.00 0.009894 -0.099468 1.001266
4 5 11.60 10.134115 0.063183 1.0 0.0 4 6 -1 242.628 34.105 56260.0 pressure (Pa) ACARS_U_WIND_COMPONENT [] [] 75603 153005 2019-12-01 21:00:03 6.25 2.148818 -1.465885 6.253992
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
996 997 229.96 229.980655 0.205479 1.0 0.0 996 998 -1 239.665 36.126 28750.0 pressure (Pa) ACARS_TEMPERATURE [] [] 75661 153005 2019-12-01 21:01:01 1.00 0.000427 0.020655 1.042222
997 998 44.60 40.860344 0.479893 1.0 0.0 997 999 -1 239.665 36.126 28750.0 pressure (Pa) ACARS_U_WIND_COMPONENT [] [] 75661 153005 2019-12-01 21:01:01 6.25 13.985023 -3.739656 6.480297
998 999 26.80 28.039954 0.481712 1.0 0.0 998 1000 -1 239.665 36.126 28750.0 pressure (Pa) ACARS_V_WIND_COMPONENT [] [] 75661 153005 2019-12-01 21:01:01 6.25 1.537487 1.239954 6.482046
999 1000 237.16 237.538919 0.280533 1.0 0.0 999 1001 -1 261.915 33.102 31070.0 pressure (Pa) ACARS_TEMPERATURE [] [] 75661 153005 2019-12-01 21:01:01 1.00 0.143579 0.378919 1.078699
1000 1001 43.30 43.186404 0.573788 1.0 0.0 1000 -1 -1 261.915 33.102 31070.0 pressure (Pa) ACARS_U_WIND_COMPONENT [] [] 75661 153005 2019-12-01 21:01:01 6.25 0.012904 -0.113596 6.579233

689 rows × 21 columns



The statistics are calculated for each row in the dataframe, and the results are stored in new columns.

used_obs.columns
Index(['obs_num', 'observation', 'prior_ensemble_mean',
       'prior_ensemble_spread', 'Data_QC', 'DART_quality_control',
       'linked_list', 'longitude', 'latitude', 'vertical', 'vert_unit', 'type',
       'metadata', 'external_FO', 'seconds', 'days', 'time', 'obs_err_var',
       'prior_sq_err', 'prior_bias', 'prior_totalvar'],
      dtype='object')

The help function can be used to find out more about the diag_stats function including what statistics are calculated.

help(stats.diag_stats)
Help on function diag_stats in module pydartdiags.stats.stats:

diag_stats(df, phase)
    Calculate diagnostic statistics for a given phase and add them to the DataFrame.

    Note:
        This function is decorated with @apply_to_phases_in_place, which modifies its usage.
        You should call it as diag_stats(df), and the decorator will automatically apply the
        function to all relevant phases (‘prior’ and ‘posterior’) modifying the DataFrame
        in place.

    Args:
        df (pandas.DataFrame): The input DataFrame containing observation data and ensemble statistics.
            The DataFrame must include the following columns:

            - 'observation': The actual observation values.
            - 'obs_err_var': The variance of the observation error.
            - 'prior_ensemble_mean' and/or 'posterior_ensemble_mean': The mean of the ensemble.
            - 'prior_ensemble_spread' and/or 'posterior_ensemble_spread': The spread of the ensemble.

    Returns:
        None: The function modifies the DataFrame in place by adding the following columns:
            - 'prior_sq_err' and/or 'posterior_sq_err': The square error for the 'prior' and 'posterior' phases.
            - 'prior_bias' and/or 'posterior_bias': The bias for the 'prior' and 'posterior' phases.
            - 'prior_totalvar' and/or 'posterior_totalvar': The total variance for the 'prior' and 'posterior' phases.

    Notes:
        - Spread is the standard deviation of the ensemble.
        - The function modifies the input DataFrame by adding new columns for the calculated statistics.

Summarize the grand statistics, which are the statistics aggregated over all the observations for each type of observation.

stats.grand_statistics(used_obs)
type prior_rmse prior_bias prior_totalspread
0 ACARS_TEMPERATURE 0.936280 0.003940 1.068301
1 ACARS_U_WIND_COMPONENT 3.208672 0.767216 2.647837
2 ACARS_V_WIND_COMPONENT 2.991960 0.039161 2.646821
3 AIRCRAFT_TEMPERATURE 0.988145 0.302789 1.053419
4 AIRCRAFT_U_WIND_COMPONENT 3.970926 0.021871 3.170020
5 AIRCRAFT_V_WIND_COMPONENT 3.310620 -0.428454 3.165622
6 AIRS_TEMPERATURE 0.989038 -0.212630 1.042903
7 GPSRO_REFRACTIVITY 0.999933 0.089593 1.130523


Total running time of the script: (0 minutes 0.076 seconds)

Gallery generated by Sphinx-Gallery