Grand Statistics#

This example demonstrates how to compute grand statistics for an observation sequence. For an explanation of the statistics calculations see the Statistics guide.

Import the obs_sequence module and the statistics module.

import pydartdiags.obs_sequence.obs_sequence as obsq
from pydartdiags.stats import stats

Chose an obs_seq file to read. This is a small obs_seq file “obs_seq.final.ascii.medium” that comes with the pyDARTdiags package in the data directory, so we import os to get the path to the file

import os
data_dir = os.path.join(os.getcwd(), "../..", "data")
data_file = os.path.join(data_dir, "obs_seq.final.ascii.medium")

Read the obs_seq file into an obs_seq object.

obs_seq = obsq.ObsSequence(data_file)

# Select observations that were used in the assimilation.
used_obs = obs_seq.select_used_qcs()

used_obs is a dataframe with only the observations with QC value of 0.

The columns of the dataframe are the same as the original obs_seq dataframe.

used_obs.columns

Index(['obs_num', 'observation', 'prior_ensemble_mean',
       'prior_ensemble_spread', 'Data_QC', 'DART_quality_control',
       'linked_list', 'longitude', 'latitude', 'vertical', 'vert_unit', 'type',
       'metadata', 'external_FO', 'seconds', 'days', 'time', 'obs_err_var'],
      dtype='object')

Now we calculate the statistics required for DART diagnostics.

stats.diag_stats(used_obs)

	obs_num	observation	prior_ensemble_mean	prior_ensemble_spread	Data_QC	DART_quality_control	linked_list	longitude	latitude	vertical	vert_unit	type	metadata	external_FO	seconds	days	time	obs_err_var	prior_sq_err	prior_bias	prior_totalvar
0	1	230.16	231.310652	0.405191	1.0	0.0	-1 2 -1	274.460	40.010	23950.0	pressure (Pa)	ACARS_TEMPERATURE	[]	[]	75603	153005	2019-12-01 21:00:03	1.00	1.324001	1.150652	1.164180
1	2	18.40	15.720527	0.630827	1.0	0.0	1 3 -1	274.460	40.010	23950.0	pressure (Pa)	ACARS_U_WIND_COMPONENT	[]	[]	75603	153005	2019-12-01 21:00:03	6.25	7.179578	-2.679473	6.647942
2	3	1.60	-4.932073	0.825899	1.0	0.0	2 4 -1	274.460	40.010	23950.0	pressure (Pa)	ACARS_V_WIND_COMPONENT	[]	[]	75603	153005	2019-12-01 21:00:03	6.25	42.667980	-6.532073	6.932109
3	4	264.16	264.060532	0.035584	1.0	0.0	3 5 -1	242.628	34.105	56260.0	pressure (Pa)	ACARS_TEMPERATURE	[]	[]	75603	153005	2019-12-01 21:00:03	1.00	0.009894	-0.099468	1.001266
4	5	11.60	10.134115	0.063183	1.0	0.0	4 6 -1	242.628	34.105	56260.0	pressure (Pa)	ACARS_U_WIND_COMPONENT	[]	[]	75603	153005	2019-12-01 21:00:03	6.25	2.148818	-1.465885	6.253992
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
996	997	229.96	229.980655	0.205479	1.0	0.0	996 998 -1	239.665	36.126	28750.0	pressure (Pa)	ACARS_TEMPERATURE	[]	[]	75661	153005	2019-12-01 21:01:01	1.00	0.000427	0.020655	1.042222
997	998	44.60	40.860344	0.479893	1.0	0.0	997 999 -1	239.665	36.126	28750.0	pressure (Pa)	ACARS_U_WIND_COMPONENT	[]	[]	75661	153005	2019-12-01 21:01:01	6.25	13.985023	-3.739656	6.480297
998	999	26.80	28.039954	0.481712	1.0	0.0	998 1000 -1	239.665	36.126	28750.0	pressure (Pa)	ACARS_V_WIND_COMPONENT	[]	[]	75661	153005	2019-12-01 21:01:01	6.25	1.537487	1.239954	6.482046
999	1000	237.16	237.538919	0.280533	1.0	0.0	999 1001 -1	261.915	33.102	31070.0	pressure (Pa)	ACARS_TEMPERATURE	[]	[]	75661	153005	2019-12-01 21:01:01	1.00	0.143579	0.378919	1.078699
1000	1001	43.30	43.186404	0.573788	1.0	0.0	1000 -1 -1	261.915	33.102	31070.0	pressure (Pa)	ACARS_U_WIND_COMPONENT	[]	[]	75661	153005	2019-12-01 21:01:01	6.25	0.012904	-0.113596	6.579233

689 rows × 21 columns

The statistics are calculated for each row in the dataframe, and the results are stored in new columns.

used_obs.columns

Index(['obs_num', 'observation', 'prior_ensemble_mean',
       'prior_ensemble_spread', 'Data_QC', 'DART_quality_control',
       'linked_list', 'longitude', 'latitude', 'vertical', 'vert_unit', 'type',
       'metadata', 'external_FO', 'seconds', 'days', 'time', 'obs_err_var',
       'prior_sq_err', 'prior_bias', 'prior_totalvar'],
      dtype='object')

The help function can be used to find out more about the diag_stats function including what statistics are calculated.

help(stats.diag_stats)

Help on function diag_stats in module pydartdiags.stats.stats:

diag_stats(df, phase)
    Calculate diagnostic statistics for a given phase and add them to the DataFrame.

    Note:
        This function is decorated with @apply_to_phases_in_place, which modifies its usage.
        You should call it as diag_stats(df), and the decorator will automatically apply the
        function to all relevant phases (‘prior’ and ‘posterior’) modifying the DataFrame
        in place.

    Args:
        df (pandas.DataFrame): The input DataFrame containing observation data and ensemble statistics.
            The DataFrame must include the following columns:

            - 'observation': The actual observation values.
            - 'obs_err_var': The variance of the observation error.
            - 'prior_ensemble_mean' and/or 'posterior_ensemble_mean': The mean of the ensemble.
            - 'prior_ensemble_spread' and/or 'posterior_ensemble_spread': The spread of the ensemble.

    Returns:
        None: The function modifies the DataFrame in place by adding the following columns:
            - 'prior_sq_err' and/or 'posterior_sq_err': The square error for the 'prior' and 'posterior' phases.
            - 'prior_bias' and/or 'posterior_bias': The bias for the 'prior' and 'posterior' phases.
            - 'prior_totalvar' and/or 'posterior_totalvar': The total variance for the 'prior' and 'posterior' phases.

    Notes:
        - Spread is the standard deviation of the ensemble.
        - The function modifies the input DataFrame by adding new columns for the calculated statistics.

Summarize the grand statistics, which are the statistics aggregated over all the observations for each type of observation.

stats.grand_statistics(used_obs)

	type	prior_rmse	prior_bias	prior_totalspread
0	ACARS_TEMPERATURE	0.936280	0.003940	1.068301
1	ACARS_U_WIND_COMPONENT	3.208672	0.767216	2.647837
2	ACARS_V_WIND_COMPONENT	2.991960	0.039161	2.646821
3	AIRCRAFT_TEMPERATURE	0.988145	0.302789	1.053419
4	AIRCRAFT_U_WIND_COMPONENT	3.970926	0.021871	3.170020
5	AIRCRAFT_V_WIND_COMPONENT	3.310620	-0.428454	3.165622
6	AIRS_TEMPERATURE	0.989038	-0.212630	1.042903
7	GPSRO_REFRACTIVITY	0.999933	0.089593	1.130523

Total running time of the script: (0 minutes 0.158 seconds)

Gallery generated by Sphinx-Gallery