module: obs_sequence#

class obs_sequence.obs_sequence(file, synonyms=None)#

Initialize an obs_sequence object from an ASCII or binary observation sequence file, or create an empty obs_sequence object from scratch.

Parameters:

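  • file (str) – The input observation sequence file, ASCII or binary. If None, an empty obs_sequence object is created from scratch.

  • synonyms (list of str, optional) – Additional synonyms for the observation column, added to the default list (see synonyms_for_obs).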

Returns:

An obs_sequence object

df#

The DataFrame containing the observation sequence data.

Type:

pandas.DataFrame

header#

The header of the observation sequence.

Type:

list

copie_names#

The names of the copies in the observation sequence. Spelled ‘copie’ to avoid conflict with the Python built-in ‘copy’. Spaces are replaced with underscores in copie_names.

Type:

list

non_qc_copie_names#

The names of the copies not including quality control, e.g. observation, mean, ensemble_members

Type:

list

qc_copie_names#

The names of the quality control copies, e.g. DART_QC

Type:

list

n_copies#

The total number of copies in the observation sequence.

Type:

int

n_non_qc#

The number of copies not including quality control.

Type:

int

n_qc#

The number of quality control copies.

Type:

int

vert#

A dictionary mapping DART vertical coordinate types to their corresponding integer values.

  • undefined: ‘VERTISUNDEF’

  • surface: ‘VERTISSURFACE’ (value is surface elevation in meters)

  • model level: ‘VERTISLEVEL’

  • pressure: ‘VERTISPRESSURE’ (in Pascals)

  • height: ‘VERTISHEIGHT’ (in meters)

  • scale height: ‘VERTISSCALEHEIGHT’ (unitless)

Type:

dict
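
As an illustration of the integer values involved, the sketch below lists the vertical coordinate codes as they are defined in DART's location module. The values are stated from DART rather than read from this package, and the names VERT_CODES, VERT_NAMES and vert_name are hypothetical helpers, not part of obs_sequence.

```python
# Integer codes for DART vertical coordinate types (assumed to match DART's
# location module; not read from the obs_sequence object itself).
VERT_CODES = {
    "VERTISUNDEF": -2,        # undefined
    "VERTISSURFACE": -1,      # surface (value is surface elevation in meters)
    "VERTISLEVEL": 1,         # model level
    "VERTISPRESSURE": 2,      # pressure (Pascals)
    "VERTISHEIGHT": 3,        # height (meters)
    "VERTISSCALEHEIGHT": 4,   # scale height (unitless)
}

# Reverse lookup: integer code -> name.
VERT_NAMES = {code: name for name, code in VERT_CODES.items()}


def vert_name(code):
    """Return the coordinate name for an integer code, or 'unknown'."""
    return VERT_NAMES.get(code, "unknown")
```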

loc_mod#

The location model, either ‘loc3d’ or ‘loc1d’. For 3D sphere models: latitude and longitude are in degrees in the DataFrame.

Type:

str

types#

Dictionary of the observation types in the observation sequence, e.g. {23: ‘ACARS_TEMPERATURE’}.

Type:

dict

reverse_types#

Dictionary of types with keys and values reversed, e.g. {‘ACARS_TEMPERATURE’: 23}.

Type:

dict

synonyms_for_obs#

List of synonyms for the observation column in the DataFrame. The default list is

[ 'NCEP BUFR observation',
'AIRS observation',
'GTSPP observation',
'SST observation',
'observations',
'WOD observation']

You can add more synonyms by providing a list of strings when creating the obs_sequence object.

obs_sequence(file, synonyms=['synonym1', 'synonym2']).df

Type:

list

seq#

Generator of observations from the observation sequence file.

Type:

generator

all_obs#

List of all observations; each observation is itself a list. Valid when the obs_sequence is created from a file. Set to None when the obs_sequence is created from scratch or when multiple obs_sequences are joined.

Type:

list

create_all_obs()#

Step through the generator to create a list of all observations in the sequence.

obs_to_list(obs)#

Put a single observation into a list.

static split_metadata(metadata)#

Split the metadata list at the first occurrence of an element starting with ‘externalF0’.

Parameters:

metadata (list of str) – The metadata list to be split.

Returns:

Two sublists, the first containing elements before ‘externalF0’, and the second containing ‘externalF0’ and all elements after it. If ‘externalF0’ is not found, the first sublist contains the entire metadata list, and the second is empty.

Return type:

tuple
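
The splitting rule described above can be sketched in a few lines. This is a minimal standalone version written from the description, not the library's source; the function name split_at_external is hypothetical.

```python
def split_at_external(metadata):
    """Split a list of strings at the first element starting with 'externalF0'.

    Returns (before, from_external_onwards). If no element matches, the first
    list is the whole input and the second is empty.
    """
    for i, item in enumerate(metadata):
        if item.startswith("externalF0"):
            return metadata[:i], metadata[i:]
    return metadata, []
```

For instance, a list whose third element starts with ‘externalF0’ is split into its first two elements and everything from the third element onward.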

list_to_obs(data)#

Convert a list of data to an observation.

The list is assumed to be in the order of obs_seq.copie_names.

static generate_linked_list_pattern(n)#

Create a list of strings with the linked list pattern for n observations.
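
A rough sketch of such a pattern follows. It assumes the DART ASCII convention in which each observation records the index of the previous observation, the index of the next observation, and a covariance group, with -1 meaning "none" and observations numbered from 1. The exact column widths used by the library are not reproduced, and linked_list_pattern is a hypothetical name.

```python
def linked_list_pattern(n):
    """Return one 'previous next cov_group' string per observation.

    Observations are numbered 1..n; -1 marks a missing neighbour and an
    unused covariance group (assumed DART convention).
    """
    lines = []
    for i in range(1, n + 1):
        prev_obs = i - 1 if i > 1 else -1   # first observation has no previous
        next_obs = i + 1 if i < n else -1   # last observation has no next
        lines.append(f"{prev_obs} {next_obs} -1")
    return lines
```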

write_obs_seq(file)#

Write the observation sequence to a file.

This function writes the observation sequence stored in the object's DataFrame (df) to a specified file. It updates the header with the number of observations, converts coordinates back to radians if necessary, drops unnecessary columns, sorts the DataFrame by time, and generates a linked list pattern for reading by DART programs.

Parameters:

file (str) – The path to the file where the observation sequence will be written.

Notes

  • Longitude and latitude are converted back to radians if the location model is ‘loc3d’.

  • The ‘bias’ and ‘sq_err’ columns are dropped if they exist in the DataFrame.

  • The DataFrame is sorted by the ‘time’ column.

  • An ‘obs_num’ column is added to the DataFrame to number the observations in time order.

  • A ‘linked_list’ column is generated to create a linked list pattern for the observations.
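
The preparation steps in these notes can be sketched without pandas, using a list of dictionaries in place of DataFrame rows. This is an illustration of the order of operations only. The column names ‘longitude’ and ‘latitude’ and the function name prepare_rows_for_writing are assumptions; ‘time’, ‘bias’, ‘sq_err’ and ‘obs_num’ are taken from the notes.

```python
import math


def prepare_rows_for_writing(rows, loc_mod="loc3d"):
    """Return copies of rows that are ready to be written in time order."""
    prepared = []
    for row in rows:
        # Drop diagnostic columns that do not belong in an obs_seq file.
        new_row = {k: v for k, v in row.items() if k not in ("bias", "sq_err")}
        if loc_mod == "loc3d":
            # Degrees in the DataFrame, radians in the file.
            new_row["longitude"] = math.radians(new_row["longitude"])
            new_row["latitude"] = math.radians(new_row["latitude"])
        prepared.append(new_row)

    # Sort by time, then number the observations in that order (1-based).
    prepared.sort(key=lambda r: r["time"])
    for num, row in enumerate(prepared, start=1):
        row["obs_num"] = num
    return prepared
```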

Example

obsq.write_obs_seq('obs_seq.new')

static update_types_dicts(df, reverse_types)#

Ensure all unique observation types are in the reverse_types dictionary and create the types dictionary.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the observation sequence data.

  • reverse_types (dict) – The dictionary mapping observation types to their corresponding integer values.

Returns:

A pair of dictionaries: the updated reverse_types dictionary, and the types dictionary with keys sorted in numerical order.

Return type:

tuple of (dict, dict)
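
One way to picture this bookkeeping is the sketch below, which uses a plain list of type names in place of the DataFrame column. How the library chooses integers for previously unseen types is not stated here; assigning the next unused integer is an assumption, and update_type_maps is a hypothetical name.

```python
def update_type_maps(type_names, reverse_types):
    """Return (reverse_types, types) covering every name in type_names.

    reverse_types maps name -> integer; types maps integer -> name with keys
    in ascending order. New names get the next unused integer (assumption).
    """
    updated = dict(reverse_types)  # do not mutate the caller's dictionary
    next_id = max(updated.values(), default=0) + 1
    for name in sorted(set(type_names)):
        if name not in updated:
            updated[name] = next_id
            next_id += 1

    # Build the inverse mapping with keys sorted numerically.
    types = {num: name for name, num in sorted(updated.items(), key=lambda kv: kv[1])}
    return updated, types
```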

create_header_from_dataframe()#

Create a header for the observation sequence based on the data in the DataFrame.

It creates a dictionary of unique observation types, counts the number of observations, and constructs the header with necessary information.

Example: self.create_header_from_dataframe()

column_headers()#

Define the columns for the DataFrame.

static is_binary(file)#

Check if a file is a binary file.
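
The test used by the library is not described here. A common heuristic, shown below as an assumption rather than the actual implementation, is to look for a null byte in the first chunk of the file, since ASCII text does not contain one. The names chunk_is_binary and looks_binary are hypothetical.

```python
def chunk_is_binary(chunk):
    """Heuristic: a null byte in the data suggests a binary file."""
    return b"\0" in chunk


def looks_binary(path, chunk_size=1024):
    """Apply the null-byte heuristic to the first chunk_size bytes of a file."""
    with open(path, "rb") as f:
        return chunk_is_binary(f.read(chunk_size))
```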

static read_header(file)#

Read the header and the number of lines in the header of an ASCII obs_seq file.

static read_binary_header(file)#

Read the header and number of lines in the header of a binary obs_seq file from Fortran output

static collect_obs_types(header)#

Create a dictionary for the observation types in the obs_seq header

static collect_copie_names(header)#

Extracts the names of the copies from the header of an obs_seq file.

Parameters:

header (list) – A list of strings representing the lines in the header of the obs_seq file.

Returns:

A tuple containing two elements:
  • copie_names (list): A list of strings representing the copy names with underscores for spaces.

  • len(copie_names) (int): The number of copy names.

Return type:

tuple
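
The name normalization and the shape of the return value can be sketched as follows. The sketch starts from a list of raw copy names rather than parsing a real header, because the header layout is not described here; collect_names is a hypothetical name.

```python
def collect_names(raw_names):
    """Return (names, count), with spaces in each name replaced by underscores."""
    names = [name.strip().replace(" ", "_") for name in raw_names]
    return names, len(names)
```

With this convention, a copy named ‘prior ensemble mean’ becomes the column name ‘prior_ensemble_mean’.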

static num_qc_non_qc(header)#

Find the number of QC and non-QC copies in the header.

static obs_reader(file, n)#

Reads the ASCII obs sequence file and returns a generator of the observations.

static check_trailing_record_length(file, expected_length)#

Reads and checks the trailing record length from the binary file written by Fortran.

Parameters:
  • file (file) – The file object.

  • expected_length – The expected length of the trailing record.

static read_record_length(file)#

Reads and unpacks the record length from the file.
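
For context, Fortran unformatted sequential files typically wrap each record in a length marker before and after the payload. The sketch below shows how the leading marker can be read and the trailing one checked. It assumes 4-byte little-endian markers (a common compiler default, not confirmed for every DART build), and the names read_marker, read_fortran_record and pack_record are hypothetical.

```python
import io
import struct

MARKER = "<i"  # assumed: 4-byte little-endian signed integer


def read_marker(stream):
    """Read one record-length marker; return None at end of file."""
    raw = stream.read(4)
    if len(raw) < 4:
        return None
    return struct.unpack(MARKER, raw)[0]


def read_fortran_record(stream):
    """Read one record and verify that the trailing marker matches."""
    length = read_marker(stream)
    if length is None:
        return None
    payload = stream.read(length)
    trailing = read_marker(stream)
    if trailing != length:
        raise ValueError(f"trailing record length {trailing} != expected {length}")
    return payload


def pack_record(payload):
    """Build a record in the same layout (useful for trying the reader)."""
    marker = struct.pack(MARKER, len(payload))
    return marker + payload + marker
```

A quick check is to round-trip a payload through pack_record and read_fortran_record using an in-memory io.BytesIO stream.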

obs_binary_reader(file, n)#

Reads the obs sequence binary file and returns a generator of the obs

composite_types(composite_types='use_default')#

Set up and construct composite types for the DataFrame.

This function sets up composite types based on a provided YAML configuration or a default configuration. It constructs new composite rows by combining specified components and adds them to the DataFrame.

Parameters:
  • composite_types (str, optional) – The YAML configuration for composite types. If 'use_default', the default configuration is used. Otherwise, a custom YAML configuration can be provided.

Returns:

The updated DataFrame with the new composite rows added.

Return type:

pd.DataFrame

Raises:

Exception – If there are repeat values in the components.

classmethod join(obs_sequences, copies=None)#

Join a list of observation sequences together.

This method combines the headers and observations from a list of obs_sequence objects into a single obs_sequence object.

Parameters:
  • obs_sequences (list of obs_sequence) – The list of obs_sequence objects to join.

  • copies (list of str, optional) – A list of copy names to include in the combined data. If not provided, all copies are included.

Returns:

A new obs_sequence object containing the combined data.

Example

obs_seq1 = obs_sequence(file='obs_seq1.final')
obs_seq2 = obs_sequence(file='obs_seq2.final')
obs_seq3 = obs_sequence(file='obs_seq3.final')
combined = obs_sequence.join([obs_seq1, obs_seq2, obs_seq3])

has_assimilation_info()#

Check if the DataFrame has prior information.

Returns:

True if both ‘prior_ensemble_mean’ and ‘prior_ensemble_spread’ columns are present, False otherwise.

Return type:

bool

has_posterior()#

Check if the DataFrame has posterior information.

Returns:

True if both ‘posterior_ensemble_mean’ and ‘posterior_ensemble_spread’ columns are present, False otherwise.

Return type:

bool
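
Both checks amount to testing that a required set of column names is present. A minimal sketch, operating on any iterable of column names rather than on the object itself (has_columns and the two constants are hypothetical names):

```python
PRIOR_COLUMNS = {"prior_ensemble_mean", "prior_ensemble_spread"}
POSTERIOR_COLUMNS = {"posterior_ensemble_mean", "posterior_ensemble_spread"}


def has_columns(columns, required):
    """Return True if every required column name appears in columns."""
    return required.issubset(columns)
```

With a pandas DataFrame, the same test could be written as required.issubset(df.columns).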

create_header(n)#

Create a header for the obs_seq file from the obs_sequence object.

obs_sequence.load_yaml_to_dict(file_path)#

Load a YAML file and convert it to a dictionary.

Parameters:

file_path (str) – The path to the YAML file.

Returns:

The YAML file content as a dictionary.

Return type:

dict

obs_sequence.convert_dart_time(seconds, days)#

Convert a DART time (seconds, days since 1601) to a datetime object.

Note

  • base year for Gregorian calendar is 1601

  • dart time is seconds, days since 1601
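
The conversion can be sketched with the standard library. The sketch assumes the epoch is midnight on January 1, 1601; the notes above give only the base year. The names DART_EPOCH and dart_time_to_datetime are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed epoch: 1601-01-01 00:00:00 (base year of DART's Gregorian calendar).
DART_EPOCH = datetime(1601, 1, 1)


def dart_time_to_datetime(seconds, days):
    """Convert a (seconds, days) pair counted from the epoch to a datetime."""
    return DART_EPOCH + timedelta(days=days, seconds=seconds)
```

The argument order (seconds first, then days) follows the signature of convert_dart_time.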

obs_sequence.construct_composit(df_comp, composite, components)#

Construct a composite DataFrame by combining rows from two components.

This function takes two DataFrames and combines rows from them based on matching location and time. It creates a new row with a composite type by combining specified columns using the square root of the sum of squares method.

Parameters:
  • df_comp (pd.DataFrame) – The DataFrame containing the component rows to be combined.

  • composite (str) – The type name for the new composite rows.

  • components (list of str) – A list containing the type names of the two components to be combined.

Returns:

A DataFrame containing the new composite rows.

Return type:

pd.DataFrame
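
The combination rule can be illustrated without pandas, using a list of dictionaries in place of DataFrame rows. Rows of the two component types are matched on location and time, and the chosen columns are combined as the square root of the sum of squares. The column names (‘type’, ‘latitude’, ‘longitude’, ‘vertical’, ‘time’, ‘observation’) and the function name combine_components are assumptions made for this sketch.

```python
import math


def combine_components(rows, composite, components, columns=("observation",)):
    """Build composite rows from two component types matched on place and time."""
    first, second = components

    def key(row):
        # Rows match when location and time are identical.
        return (row["latitude"], row["longitude"], row["vertical"], row["time"])

    second_by_key = {key(r): r for r in rows if r["type"] == second}

    composites = []
    for row in rows:
        if row["type"] != first:
            continue
        match = second_by_key.get(key(row))
        if match is None:
            continue  # no partner at this location and time
        new_row = dict(row)
        new_row["type"] = composite
        for col in columns:
            # sqrt(a**2 + b**2)
            new_row[col] = math.hypot(row[col], match[col])
        composites.append(new_row)
    return composites
```

For example, a pair of wind components with values 3.0 and 4.0 at the same location and time would yield one composite row with value 5.0.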