module: obs_sequence#
- class obs_sequence.ObsSequence(file, synonyms=None)#
Initialize an ObsSequence object from an ASCII or binary observation sequence file, or create an empty ObsSequence object from scratch.
1D observations are given a datetime of days, seconds since 2000-01-01 00:00:00
3D observations are given a datetime of days, seconds since 1601-01-01 00:00:00 (DART Gregorian calendar)
- Parameters:
file (str) – The input observation sequence ASCII or binary file. If None, an empty ObsSequence object is created from scratch.
synonyms (list, optional) –
List of additional synonyms for the observation column in the DataFrame. The default list is
['NCEP BUFR observation', 'AIRS observation', 'GTSPP observation', 'SST observation', 'observations', 'WOD observation']
You can add more synonyms by providing a list of strings when creating the ObsSequence object.
ObsSequence(file, synonyms=['synonym1', 'synonym2'])
- Raises:
ValueError – If neither ‘loc3d’ nor ‘loc1d’ could be found in the observation sequence.
Examples
obs_seq = ObsSequence(file='obs_seq.final')
- df#
The DataFrame containing the observation sequence data.
- Type:
pandas.DataFrame
- header#
The header of the observation sequence.
- Type:
list
- copie_names#
The names of the copies in the observation sequence. Spelled ‘copie’ to avoid conflict with the Python built-in ‘copy’. Spaces are replaced with underscores in copie_names.
- Type:
list
- non_qc_copie_names#
The names of the copies not including quality control, e.g. observation, mean, ensemble_members
- Type:
list
- qc_copie_names#
The names of the quality control copies, e.g. DART_QC
- Type:
list
- n_copies#
The total number of copies in the observation sequence.
- Type:
int
- n_non_qc#
The number of copies not including quality control.
- Type:
int
- n_qc#
The number of quality control copies.
- Type:
int
- vert#
A dictionary mapping DART vertical coordinate types to their corresponding integer values.
undefined: ‘VERTISUNDEF’
surface: ‘VERTISSURFACE’ (value is surface elevation in meters)
model level: ‘VERTISLEVEL’
pressure: ‘VERTISPRESSURE’ (in Pascals)
height: ‘VERTISHEIGHT’ (in meters)
scale height: ‘VERTISSCALEHEIGHT’ (unitless)
- Type:
dict
- loc_mod#
The location model, either ‘loc3d’ or ‘loc1d’. For 3D sphere models: latitude and longitude are in degrees in the DataFrame.
- Type:
str
- types#
Dictionary of types of observations in the observation sequence, e.g. {23: ‘ACARS_TEMPERATURE’}.
- Type:
dict
- reverse_types#
Dictionary of types with keys and values reversed, e.g. {‘ACARS_TEMPERATURE’: 23}.
- Type:
dict
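The relationship between types and reverse_types can be sketched as a simple dictionary inversion. This is a minimal illustration, not the library's code; the second entry and its integer ID are made up for the example.

```python
# Illustrative data only: the second entry and its ID are hypothetical.
types = {23: "ACARS_TEMPERATURE", 24: "EXAMPLE_TYPE"}

# Swap keys and values so the integer ID can be looked up by type name.
reverse_types = {name: type_id for type_id, name in types.items()}

print(reverse_types["ACARS_TEMPERATURE"])
```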
- synonyms_for_obs#
List of synonyms for the observation column in the DataFrame.
- Type:
list
- seq#
Generator of observations from the observation sequence file.
- Type:
generator
- all_obs#
List of all observations; each observation is itself a list. Valid when the ObsSequence is created from a file. Set to None when the ObsSequence is created from scratch or when multiple ObsSequences are joined.
- Type:
list
- create_all_obs()#
Step through the generator to create a list of all observations in the sequence.
- obs_to_list(obs)#
Put a single observation into a list.
- static split_metadata(metadata)#
Split the metadata list at the first occurrence of an element starting with ‘externalF0’.
- Parameters:
metadata (list of str) – The metadata list to be split.
- Returns:
- Two sublists, the first containing elements before ‘externalF0’, and the second containing ‘externalF0’ and all elements after it. If ‘externalF0’ is not found, the first sublist contains the entire metadata list, and the second is empty.
- Return type:
tuple
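The splitting behaviour described above could look roughly like the following. The function name split_metadata_sketch is hypothetical; it is a stand-in written from the description, not the library's implementation.

```python
def split_metadata_sketch(metadata):
    """Split a list of strings at the first element starting with 'externalF0'."""
    for i, item in enumerate(metadata):
        if item.startswith("externalF0"):
            # Everything before the marker, then the marker and the rest.
            return metadata[:i], metadata[i:]
    # Marker not found: the whole list goes in the first sublist.
    return metadata, []


before, after = split_metadata_sketch(["a", "b", "externalF0 1", "c"])
```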
- list_to_obs(data)#
Convert a list of data to an observation.
The list is assumed to be in the order of obs_seq.copie_names.
- static generate_linked_list_pattern(n)#
Create a list of strings with the linked list pattern for n observations.
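As a rough sketch of what a linked list pattern for n observations might contain, the example below assumes that each observation records the index of the previous observation, the index of the next observation, and a third field fixed at -1, with -1 also marking "none". That layout and the 1-based numbering are assumptions for illustration. The real method returns formatted strings, whose exact spacing is not reproduced here, so this hypothetical helper returns tuples instead.

```python
def linked_list_sketch(n):
    """Return (previous, next, -1) index triples for n observations."""
    rows = []
    for i in range(1, n + 1):
        prev_obs = i - 1 if i > 1 else -1  # first observation has no previous
        next_obs = i + 1 if i < n else -1  # last observation has no next
        rows.append((prev_obs, next_obs, -1))
    return rows


pattern = linked_list_sketch(3)
```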
- write_obs_seq(file)#
Write the observation sequence to a file.
This function writes the observation sequence stored in the ObsSequence DataFrame to the specified file. Before writing, it updates the header with the number of observations, converts coordinates back to radians if necessary, reverts NaNs back to MISSING_R8 for observations with QC=2, drops unnecessary columns, sorts the DataFrame by time, and generates a linked list pattern for reading by DART programs.
- Parameters:
file (str) – The path to the file where the observation sequence will be written.
Notes
Longitude and latitude are converted back to radians if the location model is ‘loc3d’.
The replacement of MISSING_R8 values with NaNs for any obs that failed the posterior forward observation operators (QC2) is reverted.
The ‘bias’ and ‘sq_err’ columns are dropped if they exist in the DataFrame.
The DataFrame is sorted by the ‘time’ column.
An ‘obs_num’ column is added to the DataFrame to number the observations in time order.
A ‘linked_list’ column is generated to create a linked list pattern for the observations.
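Some of the steps in these notes can be illustrated without pandas. The sketch below uses plain dictionaries in place of DataFrame rows, and the helper name prepare_for_write is hypothetical. Numbering observations from 1 is an assumption, and the actual method may order or combine these steps differently.

```python
import math
from datetime import datetime

# Stand-in rows; latitude and longitude are in degrees, as in the DataFrame.
records = [
    {"time": datetime(2020, 1, 1, 6), "latitude": 45.0, "longitude": 180.0},
    {"time": datetime(2020, 1, 1, 0), "latitude": -30.0, "longitude": 90.0},
]


def prepare_for_write(rows):
    """Sort copies of the rows by time, convert to radians, number them."""
    out = sorted((dict(r) for r in rows), key=lambda r: r["time"])
    for num, row in enumerate(out, start=1):
        row["latitude"] = math.radians(row["latitude"])
        row["longitude"] = math.radians(row["longitude"])
        row["obs_num"] = num  # assumed to start at 1
    return out


prepared = prepare_for_write(records)
```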
Example
obsq.write_obs_seq('obs_seq.new')
- static update_types_dicts(df, reverse_types)#
Ensure all unique observation types are in the reverse_types dictionary and create the types dictionary.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the observation sequence data.
reverse_types (dict) – The dictionary mapping observation types to their corresponding integer values.
- Returns:
Two dictionaries: the updated reverse_types dictionary, and the types dictionary with keys sorted in numerical order.
- Return type:
tuple of (dict, dict)
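A minimal sketch of this kind of update is shown below. It takes a list of type names rather than a DataFrame, and it assumes that a type missing from reverse_types is given the next unused integer. How the library actually assigns new values is not stated here, so treat that rule, and the name update_types_sketch, as hypothetical.

```python
def update_types_sketch(type_names, reverse_types):
    """Add missing type names to reverse_types and build the sorted types dict."""
    updated = dict(reverse_types)
    next_id = max(updated.values(), default=0) + 1  # assumed ID rule
    for name in sorted(set(type_names)):
        if name not in updated:
            updated[name] = next_id
            next_id += 1
    # Invert the mapping, inserting keys in ascending numerical order.
    types = {updated[name]: name for name in sorted(updated, key=updated.get)}
    return updated, types


new_reverse, new_types = update_types_sketch(
    ["TYPE_B", "TYPE_A", "TYPE_B"], {"TYPE_A": 5}
)
```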
- create_header_from_dataframe()#
Create a header for the observation sequence based on the data in the DataFrame.
It creates a dictionary of unique observation types, counts the number of observations, and constructs the header with necessary information.
Example: self.create_header_from_dataframe()
- column_headers()#
Define the columns for the DataFrame.
- static is_binary(file)#
Check whether a file is a binary file.
- static read_header(file)#
Read the header and the number of lines in the header of an ASCII obs_seq file.
- static read_binary_header(file)#
Read the header and number of lines in the header of a binary obs_seq file from Fortran output
- static collect_obs_types(header)#
Create a dictionary for the observation types in the obs_seq header
- static collect_copie_names(header)#
Extracts the names of the copies from the header of an obs_seq file.
- Parameters:
header (list) – A list of strings representing the lines in the header of the obs_seq file.
- Returns:
- A tuple containing two elements:
copie_names (list): A list of strings representing the copy names with underscores for spaces.
len(copie_names) (int): The number of copy names.
- Return type:
tuple
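The return value described above can be sketched as follows. The example starts from copy names that have already been extracted, since the header layout is not described here; the helper name normalize_copie_names is hypothetical.

```python
def normalize_copie_names(raw_names):
    """Replace spaces with underscores and return the names with their count."""
    copie_names = [name.strip().replace(" ", "_") for name in raw_names]
    return copie_names, len(copie_names)


names, count = normalize_copie_names(
    ["observation", "prior ensemble mean", " DART quality control "]
)
```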
- static num_qc_non_qc(header)#
Find the number of qc and non-qc copies in the header
- static obs_reader(file, n)#
Reads the ASCII obs sequence file and returns a generator of the observations.
- static check_trailing_record_length(file, expected_length)#
Reads and checks the trailing record length from the binary file written by Fortran.
- Parameters:
file (file) – The file object.
expected_length – The expected length of the trailing record.
- static read_record_length(file)#
Reads and unpacks the record length from the file.
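For context, Fortran unformatted sequential files typically wrap each record in a leading and a trailing length marker. The sketch below shows that framing with an in-memory buffer. It assumes 4-byte little-endian markers, which is common but depends on the compiler and platform. The helper names write_record and read_record are hypothetical and are not part of this module.

```python
import io
import struct


def write_record(stream, payload):
    """Write payload framed by leading and trailing 4-byte length markers."""
    marker = struct.pack("<i", len(payload))
    stream.write(marker + payload + marker)


def read_record(stream):
    """Read one framed record and check that the trailing length matches."""
    (length,) = struct.unpack("<i", stream.read(4))
    payload = stream.read(length)
    (trailing,) = struct.unpack("<i", stream.read(4))
    if trailing != length:
        raise ValueError(f"trailing length {trailing} != expected {length}")
    return payload


buffer = io.BytesIO()
write_record(buffer, struct.pack("<d", 1.5))  # one 8-byte real
buffer.seek(0)
(value,) = struct.unpack("<d", read_record(buffer))
```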
- obs_binary_reader(file, n)#
Reads the obs sequence binary file and returns a generator of the obs
- composite_types(composite_types='use_default', raise_on_duplicate=False)#
Set up and construct composite observation types for the DataFrame.
This function sets up composite observation types based on a provided YAML configuration or a default configuration. It constructs new composite rows by combining specified components and adds them to the DataFrame in place.
- Parameters:
composite_types (str, optional) – The YAML configuration for composite types. If ‘use_default’, the default configuration is used. Otherwise, a custom YAML configuration can be provided.
raise_on_duplicate (bool, optional) – If True, raises an exception if there are duplicates in the components. If False (the default), duplicates are treated as distinct observations.
- Returns:
The updated DataFrame with the new composite rows added.
- Return type:
pd.DataFrame
- Raises:
Exception – If there are repeat values in the components and raise_on_duplicate = True
- classmethod join(obs_sequences, copies=None)#
Join a list of observation sequences together.
This method combines the headers and observations from a list of ObsSequence objects into a single ObsSequence object.
- Parameters:
obs_sequences (list of ObsSequences) – The list of observation sequences objects to join.
copies (list of str, optional) – A list of copy names to include in the combined data. If not provided, all copies are included.
- Returns:
A new ObsSequence object containing the combined data.
Example
obs_seq1 = ObsSequence(file='obs_seq1.final')
obs_seq2 = ObsSequence(file='obs_seq2.final')
obs_seq3 = ObsSequence(file='obs_seq3.final')
combined = ObsSequence.join([obs_seq1, obs_seq2, obs_seq3])
- static update_linked_list(df)#
Sorts the DataFrame by ‘time’, resets the index, and adds/updates ‘linked_list’ and ‘obs_num’ columns in place. Modifies the input DataFrame directly.
- has_assimilation_info()#
Check if the DataFrame has prior information.
- Returns:
True if both ‘prior_ensemble_mean’ and ‘prior_ensemble_spread’ columns are present, False otherwise.
- Return type:
bool
- has_posterior()#
Check if the DataFrame has posterior information.
- Returns:
True if both ‘posterior_ensemble_mean’ and ‘posterior_ensemble_spread’ columns are present, False otherwise.
- Return type:
bool
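Both checks amount to testing whether a pair of column names is present. A minimal sketch, using a plain list of column names in place of the DataFrame and a hypothetical helper has_columns:

```python
def has_columns(columns, required):
    """Return True only if every required name appears in columns."""
    return set(required).issubset(columns)


# Example column names for a sequence that has prior but no posterior data.
columns = ["observation", "prior_ensemble_mean", "prior_ensemble_spread"]

has_prior = has_columns(
    columns, ["prior_ensemble_mean", "prior_ensemble_spread"]
)
has_post = has_columns(
    columns, ["posterior_ensemble_mean", "posterior_ensemble_spread"]
)
```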
- create_header(n)#
Create a header for the obs_seq file from the ObsSequence object.
- static replace_qc2_nan(df)#
Replace MISSING_R8 values with NaNs in posterior columns for observations where DART_quality_control = 2 (posterior forward observation operators failed).
As a result, these observations are ignored when posterior statistics are calculated.
- static revert_qc2_nan(df)#
Revert NaNs back to MISSING_R8s for observations where DART_quality_control = 2 (posterior forward observation operators failed)
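The replace and revert pair can be sketched on plain dictionaries standing in for DataFrame rows. The value used for MISSING_R8 (-888888.0) is an assumption here, as is the rule that posterior columns are those whose names start with "posterior". The function names are hypothetical.

```python
import math

MISSING_R8 = -888888.0  # assumed value of the DART missing-data flag


def replace_qc2_nan_sketch(rows):
    """Set MISSING_R8 posterior values to NaN where DART_quality_control is 2."""
    for row in rows:
        if row["DART_quality_control"] == 2:
            for key, value in row.items():
                if key.startswith("posterior") and value == MISSING_R8:
                    row[key] = math.nan


def revert_qc2_nan_sketch(rows):
    """Set NaN posterior values back to MISSING_R8 where DART_quality_control is 2."""
    for row in rows:
        if row["DART_quality_control"] == 2:
            for key, value in row.items():
                if key.startswith("posterior") and math.isnan(value):
                    row[key] = MISSING_R8


rows = [
    {"DART_quality_control": 2, "posterior_ensemble_mean": MISSING_R8},
    {"DART_quality_control": 0, "posterior_ensemble_mean": 271.3},
]
replace_qc2_nan_sketch(rows)
nan_after_replace = math.isnan(rows[0]["posterior_ensemble_mean"])
revert_qc2_nan_sketch(rows)
```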
- update_attributes_from_df()#
Update all internal data (fields/properties) of the ObsSequence object that depend on the DataFrame (self.df). Call this after self.df is replaced or its structure changes.
Important
Assumes copies are all columns between ‘obs_num’ and ‘linked_list’ (if present)
- obs_sequence.load_yaml_to_dict(file_path)#
Load a YAML file and convert it to a dictionary.
- Parameters:
file_path (str) – The path to the YAML file.
- Returns:
The YAML file content as a dictionary.
- Return type:
dict
- obs_sequence.convert_dart_time(seconds, days)#
Convert from seconds, days after 1601 to a datetime object.
Note
base year for Gregorian calendar is 1601
dart time is seconds, days since 1601
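The conversion in this note can be sketched with the standard library alone. The function name dart_time_to_datetime is hypothetical and the code is an illustration of the arithmetic, not the module's implementation.

```python
from datetime import datetime, timedelta

# Base date of the DART Gregorian calendar, as stated in the note above.
DART_EPOCH = datetime(1601, 1, 1)


def dart_time_to_datetime(seconds, days):
    """Convert DART (seconds, days) since 1601-01-01 to a datetime."""
    return DART_EPOCH + timedelta(days=days, seconds=seconds)


example = dart_time_to_datetime(seconds=3600, days=1)
```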
- obs_sequence.construct_composit(df_comp, composite, components, raise_on_duplicate)#
Creates a new DataFrame by combining pairs of rows from two specified component types in an observation DataFrame. It matches rows based on location and time, and then combines certain columns using the square root of the sum of squares of the components.
- Parameters:
df_comp (pd.DataFrame) – The DataFrame containing the component rows to be combined.
composite (str) – The type name for the new composite rows.
components (list of str) – A list containing the type names of the two components to be combined.
raise_on_duplicate (bool) – If True, raises an exception if there are duplicates in the components. Otherwise, duplicates are treated as distinct observations.
- Returns:
A DataFrame containing the new composite rows.
- Return type:
pd.DataFrame
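The combination rule, the square root of the sum of squares of two components matched on location and time, can be sketched with plain dictionaries instead of a DataFrame. The row layout, the "key" field holding (latitude, longitude, time), and the helper name combine_components are assumptions for illustration.

```python
import math

# Stand-in component rows; "key" holds (latitude, longitude, time).
u_rows = [
    {"key": (40.0, 255.0, 0), "observation": 3.0},
    {"key": (41.0, 256.0, 0), "observation": 1.0},
]
v_rows = [{"key": (40.0, 255.0, 0), "observation": 4.0}]


def combine_components(first, second):
    """Pair rows with equal keys and combine values as sqrt(a**2 + b**2)."""
    second_by_key = {row["key"]: row for row in second}
    combined = []
    for row in first:
        match = second_by_key.get(row["key"])
        if match is not None:  # unmatched rows produce no composite
            value = math.hypot(row["observation"], match["observation"])
            combined.append({"key": row["key"], "observation": value})
    return combined


composite_rows = combine_components(u_rows, v_rows)
```

Only the first row of u_rows has a partner in v_rows, so a single composite row is produced.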