module: obs_sequence

class obs_sequence.ObsSequence(file, synonyms=None)

Initialize an ObsSequence object from an ASCII or binary observation sequence file, or create an empty ObsSequence object from scratch.

1D observations are given a datetime computed from their (days, seconds) offset since 2000-01-01 00:00:00.

3D observations are given a datetime computed from their (days, seconds) offset since 1601-01-01 00:00:00 (DART Gregorian calendar).
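The two epochs above can be illustrated with a small conversion from a DART (days, seconds) pair to a Python datetime. This is a minimal sketch: the helper name dart_time_to_datetime and its loc_mod switch are hypothetical and not part of the ObsSequence API.

```python
from datetime import datetime, timedelta

# Epochs described above (naive datetimes, no time zone).
EPOCH_1D = datetime(2000, 1, 1)
EPOCH_3D = datetime(1601, 1, 1)  # DART Gregorian calendar origin


def dart_time_to_datetime(days: int, seconds: int, loc_mod: str = "loc3d") -> datetime:
    """Hypothetical helper: convert a DART (days, seconds) pair to a datetime."""
    if loc_mod == "loc3d":
        epoch = EPOCH_3D
    elif loc_mod == "loc1d":
        epoch = EPOCH_1D
    else:
        raise ValueError(f"unknown location model: {loc_mod!r}")
    return epoch + timedelta(days=days, seconds=seconds)


print(dart_time_to_datetime(0, 3600, "loc1d"))  # 2000-01-01 01:00:00
print(dart_time_to_datetime(145731, 0))         # 2000-01-01 00:00:00
```

Python's datetime uses the proleptic Gregorian calendar, so 145731 days after 1601-01-01 lands exactly on 2000-01-01.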

Parameters:
  • file (str) – Path to the input observation sequence file (ASCII or binary). If None, an empty ObsSequence object is created from scratch.

  • synonyms (list, optional) –

    List of additional synonyms for the observation column in the DataFrame. The default list is

    ['NCEP BUFR observation',
    'AIRS observation',
    'GTSPP observation',
    'SST observation',
    'observations',
    'WOD observation']
    

    You can add more synonyms by providing a list of strings when creating the ObsSequence object.

    ObsSequence(file, synonyms=['synonym1', 'synonym2'])
    

Raises:

ValueError – If neither ‘loc3d’ nor ‘loc1d’ could be found in the observation sequence.

Examples

obs_seq = ObsSequence(file='obs_seq.final')
empty_obs_seq = ObsSequence(file=None)
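One way to picture the synonyms parameter is as an extension of the default list used to recognise the observation copy. The sketch below is hypothetical: the helper find_observation_copy is not part of the API, and it assumes matching happens after spaces are replaced with underscores, following the copie_names convention.

```python
# Default synonyms listed above for the observation copy.
DEFAULT_SYNONYMS = [
    "NCEP BUFR observation",
    "AIRS observation",
    "GTSPP observation",
    "SST observation",
    "observations",
    "WOD observation",
]


def find_observation_copy(copie_names, synonyms=None):
    """Hypothetical: return the first copy name that means 'observation', else None."""
    candidates = ["observation"] + DEFAULT_SYNONYMS + list(synonyms or [])
    # copie_names use underscores instead of spaces, so normalise the candidates too.
    normalised = {name.replace(" ", "_") for name in candidates}
    for name in copie_names:
        if name in normalised:
            return name
    return None


copies = ["NCEP_BUFR_observation", "prior_ensemble_mean", "DART_quality_control"]
print(find_observation_copy(copies))                            # NCEP_BUFR_observation
print(find_observation_copy(["my_obs"], synonyms=["my obs"]))   # my_obs
```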

Attributes of ObsSequence Objects

df

(pandas.DataFrame) The DataFrame containing the observation sequence data.

header

(list) The header of the observation sequence.

copie_names

(list) The names of the copies in the observation sequence. Spelled ‘copie’ to avoid clashing with the name ‘copy’, which Python uses for a standard-library module and for copy() methods. Spaces in copy names are replaced with underscores in copie_names.

non_qc_copie_names

(list) The names of the copies that are not quality control copies, e.g. observation, mean, ensemble_members.

qc_copie_names

(list) The names of the quality control copies, e.g. DART_QC

n_copies

(int) The total number of copies in the observation sequence.

n_non_qc

(int) The number of copies not including quality control.

n_qc

(int) The number of quality control copies.

vert

(dict) A dictionary mapping DART vertical coordinate types to their corresponding integer values:

  • undefined: ‘VERTISUNDEF’

  • surface: ‘VERTISSURFACE’ (value is surface elevation in meters)

  • model level: ‘VERTISLEVEL’

  • pressure: ‘VERTISPRESSURE’ (in Pascals)

  • height: ‘VERTISHEIGHT’ (in meters)

  • scale height: ‘VERTISSCALEHEIGHT’ (unitless)
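For reference, the list above can be paired with the integer codes that DART's threed_sphere location module assigns to these parameters. The codes below are stated as an assumption (check location_mod in your DART version), and the lookup helper describe_vertical is hypothetical; this sketch does not claim to reproduce the exact keys of the vert attribute.

```python
# Assumed DART integer codes for vertical coordinate types (threed_sphere).
VERT = {
    -2: "VERTISUNDEF",
    -1: "VERTISSURFACE",      # value is surface elevation in meters
    1: "VERTISLEVEL",         # model level
    2: "VERTISPRESSURE",      # Pascals
    3: "VERTISHEIGHT",        # meters
    4: "VERTISSCALEHEIGHT",   # unitless
}


def describe_vertical(code: int) -> str:
    """Hypothetical: return the DART vertical-coordinate name for an integer code."""
    try:
        return VERT[code]
    except KeyError:
        raise ValueError(f"unknown DART vertical coordinate code: {code}") from None


print(describe_vertical(2))  # VERTISPRESSURE
```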

loc_mod

(str) The location model, either ‘loc3d’ or ‘loc1d’. For 3D sphere models, latitude and longitude are stored in degrees in the DataFrame.
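Because write_obs_seq converts longitude and latitude back to radians for ‘loc3d’, the file and the DataFrame use different angular units. A minimal sketch of the two conversions, using only the standard library; the function names are hypothetical.

```python
import math


def to_degrees(lon_rad: float, lat_rad: float) -> tuple[float, float]:
    """Hypothetical: convert (longitude, latitude) from radians (file) to degrees (DataFrame)."""
    return math.degrees(lon_rad), math.degrees(lat_rad)


def to_radians(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Hypothetical: convert (longitude, latitude) from degrees back to radians for writing."""
    return math.radians(lon_deg), math.radians(lat_deg)


# Round trip for a point at 255 degrees E, 40 degrees N.
lon_deg, lat_deg = to_degrees(*to_radians(255.0, 40.0))
print(round(lon_deg, 6), round(lat_deg, 6))  # 255.0 40.0
```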

types

(dict) Dictionary of types of observations in the observation sequence, e.g. {23: ‘ACARS_TEMPERATURE’}

reverse_types

(dict) Dictionary of types with keys and values reversed, e.g. {‘ACARS_TEMPERATURE’: 23}
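Since each observation type name appears once in types, reverse_types can be built by swapping keys and values. A minimal sketch; the second entry (24) is made up for illustration.

```python
# types as described above; the 24 entry is illustrative only.
types = {23: "ACARS_TEMPERATURE", 24: "ACARS_U_WIND_COMPONENT"}

# Swap keys and values: type name -> integer type ID.
reverse_types = {name: type_id for type_id, name in types.items()}

print(reverse_types["ACARS_TEMPERATURE"])  # 23
```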

synonyms_for_obs

(list) List of synonyms for the observation column in the DataFrame.

all_obs

(list) List of all observations, where each observation is itself a list. Valid only when the ObsSequence is created from a file; set to None when the ObsSequence is created from scratch or when multiple ObsSequences are joined.

ObsSequence Methods

ObsSequence.write_obs_seq(file)

Write the observation sequence to a file.

This function writes the observation sequence stored in the ObsSequence DataFrame (df) to the specified file. It updates the header with the number of observations, converts coordinates back to radians if necessary, reverts NaNs back to MISSING_R8 for observations with QC=2, drops unnecessary columns, sorts the DataFrame by time, and generates the linked-list pattern that DART programs use to read the sequence.

Parameters:

file (str) – The path to the file where the observation sequence will be written.

Notes

  • Longitude and latitude are converted back to radians if the location model is ‘loc3d’.

  • NaNs that replaced MISSING_R8 values for observations whose posterior forward observation operator failed (DART QC = 2) are reverted to MISSING_R8.

  • The ‘bias’ and ‘sq_err’ columns are dropped if they exist in the DataFrame.

  • The DataFrame is sorted by the ‘time’ column.

  • An ‘obs_num’ column is added to the DataFrame to number the observations in time order.

  • A ‘linked_list’ column is generated to create a linked list pattern for the observations.

Example

obs_seq.write_obs_seq('obs_seq.new')
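The DataFrame-level steps in the Notes can be sketched with pandas. This is an illustration under stated assumptions, not the method's implementation: the helper prepare_for_write, the QC column name DART_quality_control, and the single-space layout of linked_list are hypothetical (DART writes these integers in fixed-width fields), and the radians conversion and header update are omitted. It assumes DART's MISSING_R8 value of -888888.0 and a linked-list entry of (previous, next, covariance group), with -1 marking the ends.

```python
import pandas as pd

MISSING_R8 = -888888.0  # DART's missing value for real(r8) data


def prepare_for_write(df: pd.DataFrame, qc_col: str = "DART_quality_control") -> pd.DataFrame:
    """Hypothetical helper mirroring the DataFrame steps listed in the Notes."""
    # Drop the derived diagnostic columns if they exist.
    out = df.drop(columns=["bias", "sq_err"], errors="ignore").copy()

    # Put MISSING_R8 back in place of NaN for obs with DART QC == 2.
    failed = out[qc_col] == 2
    num_cols = out.select_dtypes("number").columns
    out.loc[failed, num_cols] = out.loc[failed, num_cols].fillna(MISSING_R8)

    # Sort by time (stable, so ties keep their input order) and number the obs.
    out = out.sort_values("time", kind="stable").reset_index(drop=True)
    n = len(out)
    out["obs_num"] = range(1, n + 1)

    # Each obs points to the previous and next obs; -1 marks the ends.
    # The third field (-1) is the covariance group.
    out["linked_list"] = [
        f"{i - 1 if i > 1 else -1} {i + 1 if i < n else -1} -1" for i in out["obs_num"]
    ]
    return out


df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-01 06:00", "2020-01-01 00:00", "2020-01-01 03:00"]),
    "posterior_ensemble_mean": [1.5, float("nan"), 2.5],
    "DART_quality_control": [0, 2, 0],
    "bias": [0.1, 0.2, 0.3],
})
print(prepare_for_write(df)["linked_list"].tolist())  # ['-1 2 -1', '1 3 -1', '2 -1 -1']
```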

ObsSequence.possible_vs_used()

Calculates the count of possible vs. used observations by type.

The number of used observations (‘used’) is the total number of assimilated observations, as determined by select_used_qcs(). The result is a DataFrame listing each observation type with its count of possible observations and its count of used observations.

Returns:

A DataFrame with three columns: ‘type’, ‘possible’, and ‘used’. ‘type’ is the observation type, ‘possible’ is the count of all observations of that type, and ‘used’ is the count of observations of that type that passed quality control checks.

Return type:

pd.DataFrame
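The counting described above can be sketched as two group-by counts joined on type. This standalone function is hypothetical; it assumes the QC copy is named DART_quality_control and, following select_used_qcs, treats DART QC 0 and 2 as used.

```python
import pandas as pd


def possible_vs_used(df: pd.DataFrame, qc_col: str = "DART_quality_control") -> pd.DataFrame:
    """Hypothetical standalone version: count possible and used obs per type."""
    possible = df.groupby("type").size().rename("possible")
    # "Used" follows select_used_qcs: DART QC 0 or 2.
    used_rows = df[df[qc_col].isin([0, 2])]
    used = used_rows.groupby("type").size().rename("used")
    # Types with no used obs get a count of 0 rather than NaN.
    counts = possible.to_frame().join(used, how="left").fillna(0).astype(int)
    return counts.reset_index()


df = pd.DataFrame({
    "type": ["ACARS_TEMPERATURE"] * 3 + ["RADIOSONDE_TEMPERATURE"] * 2,
    "DART_quality_control": [0, 2, 7, 4, 5],
})
print(possible_vs_used(df))
```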

ObsSequence.select_by_dart_qc(dart_qc)

Selects rows from the ObsSequence DataFrame based on the DART quality control flag.

Parameters:

dart_qc (int) – The DART quality control flag to select.

Returns:

A DataFrame containing only the rows with the specified DART quality control flag.

Return type:

DataFrame

Raises:

ValueError – If the DART quality control flag is not present in the DataFrame.
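A minimal sketch of the selection and the ValueError described above, written as a standalone function. The function form and the column name DART_quality_control are assumptions; DART QC 7 is used here as an example flag (rejected by the outlier threshold).

```python
import pandas as pd


def select_by_dart_qc(df: pd.DataFrame, dart_qc: int,
                      qc_col: str = "DART_quality_control") -> pd.DataFrame:
    """Hypothetical standalone version: keep rows whose DART QC equals dart_qc."""
    mask = df[qc_col] == dart_qc
    if not mask.any():
        raise ValueError(f"DART QC flag {dart_qc} is not present in the DataFrame")
    return df[mask]


df = pd.DataFrame({"observation": [280.1, 281.4, 279.9],
                   "DART_quality_control": [0, 7, 0]})
rejected = select_by_dart_qc(df, 7)  # obs rejected by the outlier threshold
```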

ObsSequence.select_used_qcs()

Select rows from the DataFrame where the observation was used. Includes observations for which the posterior forward observation operators failed.

Returns:

A DataFrame containing only the rows with a DART quality control flag 0 or 2.

Return type:

pandas.DataFrame
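The filter described above reduces to a membership test on the QC copy. A sketch as a standalone function, assuming the column is named DART_quality_control.

```python
import pandas as pd


def select_used_qcs(df: pd.DataFrame, qc_col: str = "DART_quality_control") -> pd.DataFrame:
    """Hypothetical: keep assimilated obs, QC 0 plus QC 2 (posterior forward operator failed)."""
    return df[df[qc_col].isin([0, 2])]


df = pd.DataFrame({"DART_quality_control": [0, 1, 2, 4, 7]})
used = select_used_qcs(df)
```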

ObsSequence.composite_types(composite_types='use_default', raise_on_duplicate=False)

Set up and construct composite observation types for the DataFrame.

This function sets up composite observation types based on a provided YAML configuration or a default configuration. It constructs new composite rows by combining specified components and adds them to the DataFrame in place.

Parameters:
  • composite_types (str, optional) – The YAML configuration for composite types. If ‘use_default’, the default configuration is used. Otherwise, a custom YAML configuration can be provided.

  • raise_on_duplicate (bool, optional) – If True, raise an exception when the components contain duplicates. If False (the default), duplicates are treated as distinct observations.

Returns:

The updated DataFrame with the new composite rows added.

Return type:

pd.DataFrame

Raises:

Exception – If the components contain repeated values and raise_on_duplicate is True.
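The YAML schema is not shown in this section, so the sketch below uses a plain dict in place of the parsed configuration to illustrate pairing component rows and appending composite rows. Everything specific here is an assumption: the composite name ACARS_HORIZONTAL_WIND, the matching columns (time, latitude, longitude, vertical), and the vector-magnitude combination rule. Unlike the method, which updates the DataFrame in place, this sketch returns a new DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the parsed YAML: composite name -> two component types.
COMPOSITES = {
    "ACARS_HORIZONTAL_WIND": ["ACARS_U_WIND_COMPONENT", "ACARS_V_WIND_COMPONENT"],
}
# Assumed columns that identify the same observation point.
KEYS = ["time", "latitude", "longitude", "vertical"]


def add_composites(df, composites=COMPOSITES, value_cols=("observation",),
                   raise_on_duplicate=False):
    """Hypothetical: append one composite row per matched pair of components."""
    new_frames = []
    for name, (first, second) in composites.items():
        a = df[df["type"] == first]
        b = df[df["type"] == second]
        if raise_on_duplicate and (a.duplicated(KEYS).any() or b.duplicated(KEYS).any()):
            raise ValueError(f"duplicate component observations for {name}")
        # Without raising, duplicates simply produce extra pairs in the merge.
        pairs = a.merge(b, on=KEYS, suffixes=("_a", "_b"))
        combo = pairs[KEYS].copy()
        combo["type"] = name
        for col in value_cols:
            # Assumed combination rule: magnitude of the two-component vector.
            combo[col] = (pairs[f"{col}_a"] ** 2 + pairs[f"{col}_b"] ** 2) ** 0.5
        new_frames.append(combo)
    return pd.concat([df, *new_frames], ignore_index=True)


df = pd.DataFrame({
    "type": ["ACARS_U_WIND_COMPONENT", "ACARS_V_WIND_COMPONENT"],
    "time": [0, 0],
    "latitude": [40.0, 40.0],
    "longitude": [255.0, 255.0],
    "vertical": [25000.0, 25000.0],
    "observation": [3.0, 4.0],
})
out = add_composites(df)  # adds one ACARS_HORIZONTAL_WIND row with observation 5.0
```

ValueError is a subclass of Exception, so callers catching Exception as documented above would still see it.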

classmethod ObsSequence.join(obs_sequences, copies=None)

Join a list of observation sequences together.

This method combines the headers and observations from a list of ObsSequence objects into a single ObsSequence object.

Parameters:
  • obs_sequences (list of ObsSequence) – The list of ObsSequence objects to join.

  • copies (list of str, optional) – A list of copy names to include in the combined data. If not provided, all copies are included.

Returns:

A new ObsSequence object containing the combined data.

Examples

obs_seq1 = ObsSequence(file='obs_seq1.final')
obs_seq2 = ObsSequence(file='obs_seq2.final')
obs_seq3 = ObsSequence(file='obs_seq3.final')
combined = ObsSequence.join([obs_seq1, obs_seq2, obs_seq3])
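At the DataFrame level, joining can be pictured as stacking the per-sequence DataFrames, optionally restricted to the requested copies. This is a hypothetical sketch: join_frames and the metadata columns it keeps (type, time) are assumptions, and the real method also combines the headers and, per the all_obs attribute, sets all_obs to None.

```python
import pandas as pd


def join_frames(frames, copies=None, meta_cols=("type", "time")):
    """Hypothetical: stack observation DataFrames, optionally keeping only some copies."""
    if copies is not None:
        keep = list(meta_cols) + list(copies)
        for i, frame in enumerate(frames):
            missing = set(keep) - set(frame.columns)
            if missing:
                raise ValueError(f"sequence {i} is missing columns: {sorted(missing)}")
        frames = [frame[keep] for frame in frames]
    # With copies=None, copies absent from some inputs become NaN after concat.
    return pd.concat(frames, ignore_index=True)


a = pd.DataFrame({"type": ["ACARS_TEMPERATURE"], "time": [0],
                  "observation": [280.0], "prior_ensemble_mean": [279.5]})
b = pd.DataFrame({"type": ["ACARS_TEMPERATURE"], "time": [1],
                  "observation": [281.0], "prior_ensemble_mean": [280.2]})
combined = join_frames([a, b], copies=["observation"])
```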

ObsSequence.update_attributes_from_df()

Update all internal data (fields/properties) of the ObsSequence object that depend on the DataFrame (self.df). Call this after self.df is replaced or its structure changes.

Important

Assumes copies are all columns between ‘obs_num’ and ‘linked_list’ (if present).
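The column convention above can be sketched as a slice of the ordered column list, from which the copy counts follow. The helper split_copies is hypothetical, and two behaviours are assumptions: when linked_list is absent, every column after obs_num is treated as a copy, and QC copies are recognised by name.

```python
def split_copies(columns):
    """Hypothetical: derive copy names from an ordered list of DataFrame columns."""
    cols = list(columns)
    start = cols.index("obs_num") + 1
    # Copies end at 'linked_list'; if it is absent, this sketch assumes every
    # remaining column is a copy.
    end = cols.index("linked_list") if "linked_list" in cols else len(cols)
    copie_names = cols[start:end]
    # Assumption: QC copies are recognised by name ("QC" or "quality_control").
    qc = [c for c in copie_names if "QC" in c or "quality_control" in c]
    non_qc = [c for c in copie_names if c not in qc]
    return copie_names, non_qc, qc


columns = ["obs_num", "observation", "prior_ensemble_mean", "NCEP_QC",
           "DART_quality_control", "linked_list", "type"]
copie_names, non_qc, qc = split_copies(columns)
n_copies, n_non_qc, n_qc = len(copie_names), len(non_qc), len(qc)
```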

ObsSequence.create_header_from_dataframe()

Create a header for the observation sequence based on the data in the DataFrame.

It builds a dictionary of the unique observation types, counts the observations, and constructs the header from this information.

Example

obs_seq.create_header_from_dataframe()
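The header-building steps can be sketched from a column of type names. This is a hypothetical approximation of the ASCII obs_seq header, not the method's output: header_summary is illustrative, types are renumbered 1..k in sorted order (the real method may keep the original type numbers), older DART files use obs_kind_definitions instead of obs_type_definitions, and field widths, copy-name lines, and the first/last line are omitted.

```python
def header_summary(obs_types, n_copies, n_qc):
    """Hypothetical: assemble the main header fields from a column of obs type names."""
    # Assumption: renumber the unique types 1..k in sorted order.
    type_ids = {name: i for i, name in enumerate(sorted(set(obs_types)), start=1)}
    n_obs = len(obs_types)
    lines = ["obs_sequence", "obs_type_definitions", str(len(type_ids))]
    lines += [f"{i} {name}" for name, i in type_ids.items()]
    lines.append(f"num_copies: {n_copies} num_qc: {n_qc}")
    lines.append(f"num_obs: {n_obs} max_num_obs: {n_obs}")
    return lines


for line in header_summary(["ACARS_TEMPERATURE", "RADIOSONDE_TEMPERATURE",
                            "ACARS_TEMPERATURE"], 2, 1):
    print(line)
```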

ObsSequence.create_header(n)

Create a header for the obs_seq file from the ObsSequence object.

ObsSequence.has_posterior()

Check if the DataFrame has posterior information.

Returns:

True if both ‘posterior_ensemble_mean’ and ‘posterior_ensemble_spread’ columns are present, False otherwise.

Return type:

bool

ObsSequence.has_assimilation_info()

Check if the DataFrame has prior information.

Returns:

True if both ‘prior_ensemble_mean’ and ‘prior_ensemble_spread’ columns are present, False otherwise.

Return type:

bool
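Both checks above amount to testing whether two named columns are present. A minimal sketch as standalone functions; the shared helper has_columns is hypothetical.

```python
import pandas as pd


def has_columns(df: pd.DataFrame, required) -> bool:
    """Hypothetical helper: True if every required column is present in df."""
    return set(required).issubset(df.columns)


def has_posterior(df: pd.DataFrame) -> bool:
    return has_columns(df, ["posterior_ensemble_mean", "posterior_ensemble_spread"])


def has_assimilation_info(df: pd.DataFrame) -> bool:
    return has_columns(df, ["prior_ensemble_mean", "prior_ensemble_spread"])


prior_only = pd.DataFrame(columns=["observation", "prior_ensemble_mean",
                                   "prior_ensemble_spread"])
print(has_assimilation_info(prior_only), has_posterior(prior_only))  # True False
```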