Remove Observations

This example demonstrates how to remove observations from an observation sequence and write a new observation sequence file.

Import the obs_sequence module.

import pydartdiags.obs_sequence.obs_sequence as obsq

Choose an obs_seq file to read. This example uses a small obs_seq file, “obs_seq.final.ascii.medium”, that ships in the data directory of the pyDARTdiags package, so we import os to build the path to the file.

import os
data_dir = os.path.join(os.getcwd(), "../..", "data")
data_file = os.path.join(data_dir, "obs_seq.final.ascii.medium")
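
Because the path above is built relative to the current working directory, it only resolves when the script is run from the expected location. A small check like the following sketch can surface a wrong path before the file is read. Note that `require_file` is a hypothetical helper introduced here for illustration; it is not part of pyDARTdiags.

```python
import os


def require_file(path):
    """Return the absolute form of path, or raise if it is not an existing file.

    Hypothetical helper for illustration only; not part of pyDARTdiags.
    """
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(f"obs_seq file not found: {full_path}")
    return full_path
```

In this example it could be used as `data_file = require_file(data_file)` before constructing the ObsSequence object.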

Read the obs_seq file into an obs_seq object.

obs_seq = obsq.ObsSequence(data_file)

Take a look at the observation sequence.

obs_seq.df.head()
   obs_num  observation  prior_ensemble_mean  prior_ensemble_spread  Data_QC  DART_quality_control  linked_list  longitude  latitude  vertical  vert_unit      type                    metadata  external_FO  seconds  days    time                 obs_err_var
0  1        230.16       231.310652           0.405191               1.0      0.0                   -1 2 -1      274.460    40.010    23950.0   pressure (Pa)  ACARS_TEMPERATURE       []        []           75603    153005  2019-12-01 21:00:03  1.00
1  2        18.40        15.720527            0.630827               1.0      0.0                   1 3 -1       274.460    40.010    23950.0   pressure (Pa)  ACARS_U_WIND_COMPONENT  []        []           75603    153005  2019-12-01 21:00:03  6.25
2  3        1.60         -4.932073            0.825899               1.0      0.0                   2 4 -1       274.460    40.010    23950.0   pressure (Pa)  ACARS_V_WIND_COMPONENT  []        []           75603    153005  2019-12-01 21:00:03  6.25
3  4        264.16       264.060532           0.035584               1.0      0.0                   3 5 -1       242.628    34.105    56260.0   pressure (Pa)  ACARS_TEMPERATURE       []        []           75603    153005  2019-12-01 21:00:03  1.00
4  5        11.60        10.134115            0.063183               1.0      0.0                   4 6 -1       242.628    34.105    56260.0   pressure (Pa)  ACARS_U_WIND_COMPONENT  []        []           75603    153005  2019-12-01 21:00:03  6.25


To count the number of observations by type, use the groupby method.

obs_seq.df.groupby('type').size()
type
ACARS_TEMPERATURE            107
ACARS_U_WIND_COMPONENT       106
ACARS_V_WIND_COMPONENT       105
AIRCRAFT_TEMPERATURE          20
AIRCRAFT_U_WIND_COMPONENT     20
AIRCRAFT_V_WIND_COMPONENT     20
AIRS_SPECIFIC_HUMIDITY        39
AIRS_TEMPERATURE              81
GPSRO_REFRACTIVITY           503
dtype: int64
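
As a quick cross-check, pandas' `value_counts` produces the same per-type totals as `groupby('type').size()`. The sketch below uses a small hypothetical DataFrame (`toy_df`, a stand-in for `obs_seq.df`, not the real data) so that it runs without pyDARTdiags.

```python
import pandas as pd

# Hypothetical stand-in for obs_seq.df; only the 'type' column matters here.
toy_df = pd.DataFrame({
    "type": [
        "ACARS_TEMPERATURE",
        "GPSRO_REFRACTIVITY",
        "ACARS_TEMPERATURE",
        "GPSRO_REFRACTIVITY",
        "GPSRO_REFRACTIVITY",
    ]
})

# Counts per type, ordered by type name (the same layout as the output above).
by_group = toy_df.groupby("type").size()

# value_counts orders by count by default; sort_index() puts it in name order.
by_value = toy_df["type"].value_counts().sort_index()

print(by_group)
print(by_value)
```

Either form works for inspection; `groupby` is the more general tool when you later want to aggregate other columns by type.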

Let’s remove the ‘GPSRO_REFRACTIVITY’ observations by keeping only the rows where 'type' is not 'GPSRO_REFRACTIVITY'.

obs_seq.df = obs_seq.df[obs_seq.df['type'] != 'GPSRO_REFRACTIVITY']
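
The same boolean-mask pattern extends to removing several observation types at once. The following is a minimal sketch on a hypothetical toy DataFrame (`toy_df`) rather than the real `obs_seq.df`; the helper name `drop_types` is illustrative and not part of pyDARTdiags.

```python
import pandas as pd


def drop_types(df, types_to_drop):
    """Return a copy of df without the rows whose 'type' is in types_to_drop."""
    keep = ~df["type"].isin(types_to_drop)  # True for rows we want to keep
    return df[keep].copy()


# Hypothetical stand-in for obs_seq.df.
toy_df = pd.DataFrame({
    "type": [
        "ACARS_TEMPERATURE",
        "GPSRO_REFRACTIVITY",
        "AIRS_TEMPERATURE",
        "GPSRO_REFRACTIVITY",
    ],
    "observation": [230.16, 0.5, 250.0, 0.7],
})

filtered = drop_types(toy_df, ["GPSRO_REFRACTIVITY", "AIRS_TEMPERATURE"])
print(filtered)
```

With the real data, the equivalent step would be assigning the result back, for example `obs_seq.df = drop_types(obs_seq.df, [...])`. The original DataFrame is left unchanged because the helper returns a copy.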

Now check that the removal worked. First, count only the ‘GPSRO_REFRACTIVITY’ observations:

gpsro_count = (obs_seq.df['type'] == 'GPSRO_REFRACTIVITY').sum()
print(f"Number of observations with type 'GPSRO_REFRACTIVITY': {gpsro_count}")
Number of observations with type 'GPSRO_REFRACTIVITY': 0

Then count the observations by type again. The ‘GPSRO_REFRACTIVITY’ observations have been removed from the DataFrame.

obs_seq.df.groupby('type').size()
type
ACARS_TEMPERATURE            107
ACARS_U_WIND_COMPONENT       106
ACARS_V_WIND_COMPONENT       105
AIRCRAFT_TEMPERATURE          20
AIRCRAFT_U_WIND_COMPONENT     20
AIRCRAFT_V_WIND_COMPONENT     20
AIRS_SPECIFIC_HUMIDITY        39
AIRS_TEMPERATURE              81
dtype: int64
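
One way to confirm that only the intended type changed is to compare the per-type counts taken before and after the filter. This sketch again assumes a hypothetical toy DataFrame (`toy_df`); with the real data, the "before" counts would have to be saved prior to reassigning `obs_seq.df`.

```python
import pandas as pd

# Hypothetical stand-in for obs_seq.df.
toy_df = pd.DataFrame({
    "type": [
        "ACARS_TEMPERATURE",
        "GPSRO_REFRACTIVITY",
        "AIRS_TEMPERATURE",
        "GPSRO_REFRACTIVITY",
        "GPSRO_REFRACTIVITY",
    ]
})

before = toy_df["type"].value_counts()
filtered = toy_df[toy_df["type"] != "GPSRO_REFRACTIVITY"]
after = filtered["type"].value_counts()

# Types that vanished entirely get a count of 0 instead of being missing.
after = after.reindex(before.index, fill_value=0)

# Keep only the types whose count actually changed.
removed = before - after
changed = removed[removed != 0]
print(changed)
```

If `changed` lists anything other than the type you meant to remove, the filter condition is worth rechecking before writing the file.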

Write the new observation sequence to a file.

obs_seq.write_obs_seq('obs_seq.final.ascii.medium.no_gpsro')

The new file will not have the ‘GPSRO_REFRACTIVITY’ observations.
