module: obs_sequence#
- class obs_sequence.obs_sequence(file, synonyms=None)#
Initialize an obs_sequence object from an ASCII or binary observation sequence file, or create an empty obs_sequence object from scratch.
- Parameters:
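file (str) – The input observation sequence ASCII or binary file. If None, an empty obs_sequence object is created from scratch.
synonyms (list of str, optional) – Additional synonyms for the observation column, added to the default list described under synonyms_for_obs.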
- Returns:
An obs_sequence object
- df#
The DataFrame containing the observation sequence data.
- Type:
pandas.DataFrame
- header#
The header of the observation sequence.
- Type:
list
- copie_names#
The names of the copies in the observation sequence. Spelled ‘copie’ to avoid conflict with the Python built-in ‘copy’. Spaces are replaced with underscores in copie_names.
- Type:
list
- non_qc_copie_names#
The names of the copies not including quality control, e.g. observation, mean, ensemble_members
- Type:
list
- qc_copie_names#
The names of the quality control copies, e.g. DART_QC
- Type:
list
- n_copies#
The total number of copies in the observation sequence.
- Type:
int
- n_non_qc#
The number of copies not including quality control.
- Type:
int
- n_qc#
The number of quality control copies.
- Type:
int
- vert#
A dictionary mapping DART vertical coordinate types to their corresponding integer values.
undefined: ‘VERTISUNDEF’
surface: ‘VERTISSURFACE’ (value is surface elevation in meters)
model level: ‘VERTISLEVEL’
pressure: ‘VERTISPRESSURE’ (in Pascals)
height: ‘VERTISHEIGHT’ (in meters)
scale height: ‘VERTISSCALEHEIGHT’ (unitless)
- Type:
dict
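As a rough illustration, the integer codes that DART's 3D sphere location module assigns to these vertical coordinate types can be written as a plain dictionary. The names `VERT_CODES` and `CODE_TO_NAME` and the direction of the mapping are assumptions for this sketch; the actual `vert` attribute may be keyed differently.

```python
# Hypothetical sketch, not the library's own definition:
# DART vertical coordinate names mapped to their integer codes.
VERT_CODES = {
    "VERTISUNDEF": -2,        # undefined
    "VERTISSURFACE": -1,      # surface (value is surface elevation in meters)
    "VERTISLEVEL": 1,         # model level
    "VERTISPRESSURE": 2,      # pressure (Pascals)
    "VERTISHEIGHT": 3,        # height (meters)
    "VERTISSCALEHEIGHT": 4,   # scale height (unitless)
}

# Reverse lookup: integer code -> name.
CODE_TO_NAME = {code: name for name, code in VERT_CODES.items()}
```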
- loc_mod#
The location model, either ‘loc3d’ or ‘loc1d’. For 3D sphere models: latitude and longitude are in degrees in the DataFrame.
- Type:
str
- types#
Dictionary of the types of observations in the observation sequence, e.g. {23: ‘ACARS_TEMPERATURE’}
- Type:
dict
- reverse_types#
Dictionary of types with keys and values reversed, e.g. {‘ACARS_TEMPERATURE’: 23}
- Type:
dict
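The relationship between types and reverse_types can be sketched with a dictionary comprehension. This is only an illustration of the two mappings using the example entry above, not the library's own construction code.

```python
# Integer type number -> observation type name, as in the example above.
types = {23: "ACARS_TEMPERATURE"}

# Swap keys and values to look up the number from the name.
reverse_types = {name: number for number, name in types.items()}
```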
- synonyms_for_obs#
List of synonyms for the observation column in the DataFrame. The default list is
[ 'NCEP BUFR observation', 'AIRS observation', 'GTSPP observation', 'SST observation', 'observations', 'WOD observation']
You can add more synonyms by providing a list of strings when creating the obs_sequence object.
obs_sequence(file, synonyms=['synonym1', 'synonym2']).df
- Type:
list
- seq#
Generator of observations from the observation sequence file.
- Type:
generator
- all_obs#
List of all observations, where each observation is itself a list. Valid when the obs_sequence is created from a file. Set to None when the obs_sequence is created from scratch or when multiple obs_sequences are joined.
- Type:
list
- create_all_obs()#
Step through the generator to create a list of all observations in the sequence.
- obs_to_list(obs)#
Put a single observation into a list.
- static split_metadata(metadata)#
Split the metadata list at the first occurrence of an element starting with ‘externalF0’.
- Parameters:
metadata (list of str) – The metadata list to be split.
- Returns:
Two sublists, the first containing elements before ‘externalF0’, and the second containing ‘externalF0’ and all elements after it. If ‘externalF0’ is not found, the first sublist contains the entire metadata list, and the second is empty.
- Return type:
tuple
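The splitting behavior described above could look roughly like the following. The function name `split_at_external` is hypothetical and the body is a minimal sketch of the documented behavior, not the library's source.

```python
def split_at_external(metadata):
    """Split a list of strings at the first element starting with 'externalF0'."""
    for i, item in enumerate(metadata):
        if item.startswith("externalF0"):
            # Everything before the marker, then the marker and everything after.
            return metadata[:i], metadata[i:]
    # Marker not found: the whole list goes in the first sublist.
    return metadata, []


before, after = split_at_external(["a", "b", "externalF0 1", "c"])
```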
- list_to_obs(data)#
Convert a list of data to an observation.
The list is assumed to be in the order of obs_seq.copie_names.
- static generate_linked_list_pattern(n)#
Create a list of strings with the linked list pattern for n observations.
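A minimal sketch of such a pattern follows, assuming each entry holds the previous observation number, the next observation number, and a covariance group of -1, with -1 also marking "no previous" or "no next". The function name and the single-space formatting are assumptions; the real method may pad the fields differently.

```python
def linked_list_pattern(n):
    """Return n strings of the form 'previous next cov_group' (hypothetical layout)."""
    pattern = []
    for i in range(1, n + 1):
        prev_obs = i - 1 if i > 1 else -1   # first observation has no predecessor
        next_obs = i + 1 if i < n else -1   # last observation has no successor
        pattern.append(f"{prev_obs} {next_obs} -1")
    return pattern


pattern = linked_list_pattern(3)
```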
- write_obs_seq(file)#
Write the observation sequence to a file.
This function writes the observation sequence stored in the obs_seq.DataFrame to a specified file. It updates the header with the number of observations, converts coordinates back to radians if necessary, drops unnecessary columns, sorts the DataFrame by time, and generates a linked list pattern for reading by DART programs.
- Parameters:
file (str) – The path to the file where the observation sequence will be written.
Notes
Longitude and latitude are converted back to radians if the location model is ‘loc3d’.
The ‘bias’ and ‘sq_err’ columns are dropped if they exist in the DataFrame.
The DataFrame is sorted by the ‘time’ column.
An ‘obs_num’ column is added to the DataFrame to number the observations in time order.
A ‘linked_list’ column is generated to create a linked list pattern for the observations.
Example
obsq.write_obs_seq('obs_seq.new')
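The DataFrame preparation steps listed in the notes can be sketched with pandas as follows. The helper name `prepare_for_write` and the column names 'longitude' and 'latitude' are assumptions made for illustration; this sketch does not write a file or build the linked list.

```python
import numpy as np
import pandas as pd


def prepare_for_write(df, loc_mod="loc3d"):
    """Hypothetical helper mirroring the documented preparation steps."""
    out = df.copy()
    if loc_mod == "loc3d":
        # Degrees in the DataFrame, radians in the file.
        out["longitude"] = np.deg2rad(out["longitude"])
        out["latitude"] = np.deg2rad(out["latitude"])
    # Drop diagnostic columns if they exist.
    out = out.drop(columns=["bias", "sq_err"], errors="ignore")
    # Sort by time, then number the observations in time order.
    out = out.sort_values("time", kind="stable").reset_index(drop=True)
    out["obs_num"] = range(1, len(out) + 1)
    return out


df = pd.DataFrame({
    "longitude": [180.0, 90.0],
    "latitude": [0.0, 45.0],
    "time": pd.to_datetime(["2020-01-02", "2020-01-01"]),
    "bias": [0.1, 0.2],
})
prepared = prepare_for_write(df)
```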
- static update_types_dicts(df, reverse_types)#
Ensure all unique observation types are in the reverse_types dictionary and create the types dictionary.
- Parameters:
df (pd.DataFrame) – The DataFrame containing the observation sequence data.
reverse_types (dict) – The dictionary mapping observation types to their corresponding integer values.
- Returns:
Two dictionaries: the updated reverse_types dictionary, and the types dictionary with keys sorted in numerical order.
- Return type:
dict
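One way to picture this update, working on a plain list of type names rather than a DataFrame column: the function name is hypothetical, and numbering new types from the current maximum plus one is an assumption, since the source does not say how new integer values are chosen.

```python
def update_type_dicts(type_names, reverse_types):
    """Add missing type names to reverse_types and build the sorted types dict."""
    updated = dict(reverse_types)
    # Assumption: new types are numbered after the largest existing value.
    next_id = max(updated.values(), default=0) + 1
    for name in type_names:
        if name not in updated:
            updated[name] = next_id
            next_id += 1
    # types maps integer -> name, with keys in numerical order.
    types = {num: name for name, num in sorted(updated.items(), key=lambda kv: kv[1])}
    return updated, types


updated, types = update_type_dicts(
    ["ACARS_TEMPERATURE", "NEW_TYPE", "NEW_TYPE"],  # 'NEW_TYPE' is a placeholder name
    {"ACARS_TEMPERATURE": 23},
)
```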
- create_header_from_dataframe()#
Create a header for the observation sequence based on the data in the DataFrame.
It creates a dictionary of unique observation types, counts the number of observations, and constructs the header with necessary information.
Example: self.create_header_from_dataframe()
- column_headers()#
Define the columns for the DataFrame.
- static is_binary(file)#
Check if a file is a binary file.
- static read_header(file)#
Read the header and the number of lines in the header of an ASCII obs_seq file.
- static read_binary_header(file)#
Read the header and the number of lines in the header of a binary obs_seq file written by Fortran.
- static collect_obs_types(header)#
Create a dictionary for the observation types in the obs_seq header
- static collect_copie_names(header)#
Extracts the names of the copies from the header of an obs_seq file.
- Parameters:
header (list) – A list of strings representing the lines in the header of the obs_seq file.
- Returns:
- A tuple containing two elements:
copie_names (list): A list of strings representing the copy names with underscores for spaces.
len(copie_names) (int): The number of copy names.
- Return type:
tuple
- static num_qc_non_qc(header)#
Find the number of qc and non-qc copies in the header
- static obs_reader(file, n)#
Read the ASCII observation sequence file and return a generator of the observations.
- static check_trailing_record_length(file, expected_length)#
Reads and checks the trailing record length from the binary file written by Fortran.
- Parameters:
file (file) – The file object.
expected_length – The expected length of the trailing record.
- static read_record_length(file)#
Reads and unpacks the record length from the file.
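Fortran unformatted sequential files typically wrap each record in a leading and a trailing length marker. The two helpers above can be pictured with the sketch below, which assumes 4-byte markers in native byte order (a common compiler default) and uses hypothetical function names and an in-memory buffer in place of a real file.

```python
import io
import struct


def read_marker(f):
    """Read a 4-byte record-length marker; return None at end of file."""
    raw = f.read(4)
    if len(raw) < 4:
        return None
    return struct.unpack("=i", raw)[0]


def check_trailing_marker(f, expected_length):
    """Raise if the trailing marker does not match the leading one."""
    trailing = read_marker(f)
    if trailing != expected_length:
        raise ValueError(f"expected trailing length {expected_length}, got {trailing}")


# Build one record the way a Fortran unformatted write would lay it out.
payload = struct.pack("=2i", 7, 9)
marker = struct.pack("=i", len(payload))
f = io.BytesIO(marker + payload + marker)

length = read_marker(f)
values = struct.unpack("=2i", f.read(length))
check_trailing_marker(f, length)
```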
- obs_binary_reader(file, n)#
Read the binary observation sequence file and return a generator of the observations.
- composite_types(composite_types='use_default')#
Set up and construct composite types for the DataFrame.
This function sets up composite types based on a provided YAML configuration or a default configuration. It constructs new composite rows by combining specified components and adds them to the DataFrame.
- Parameters:
composite_types (str, optional) – The YAML configuration for composite types. If 'use_default', the default configuration is used. Otherwise, a custom YAML configuration can be provided.
- Returns:
The updated DataFrame with the new composite rows added.
- Return type:
pd.DataFrame
- Raises:
Exception – If there are repeated values in the components.
- classmethod join(obs_sequences, copies=None)#
Join a list of observation sequences together.
This method combines the headers and observations from a list of obs_sequence objects into a single obs_sequence object.
- Parameters:
obs_sequences (list of obs_sequence) – The list of observation sequence objects to join.
copies (list of str, optional) – A list of copy names to include in the combined data. If not provided, all copies are included.
- Returns:
A new obs_sequence object containing the combined data.
Example
obs_seq1 = obs_sequence(file='obs_seq1.final')
obs_seq2 = obs_sequence(file='obs_seq2.final')
obs_seq3 = obs_sequence(file='obs_seq3.final')
combined = obs_sequence.join([obs_seq1, obs_seq2, obs_seq3])
- has_assimilation_info()#
Check if the DataFrame has prior information.
- Returns:
True if both ‘prior_ensemble_mean’ and ‘prior_ensemble_spread’ columns are present, False otherwise.
- Return type:
bool
- has_posterior()#
Check if the DataFrame has posterior information.
- Returns:
True if both ‘posterior_ensemble_mean’ and ‘posterior_ensemble_spread’ columns are present, False otherwise.
- Return type:
bool
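Both checks amount to testing whether a pair of column names is present. A minimal sketch, using a hypothetical helper `has_columns` that works on any collection of column names (for a DataFrame, df.columns could be passed in):

```python
def has_columns(columns, required):
    """Return True only if every required name appears in columns."""
    return set(required).issubset(columns)


columns = ["observation", "prior_ensemble_mean", "prior_ensemble_spread"]
has_prior = has_columns(columns, ["prior_ensemble_mean", "prior_ensemble_spread"])
has_post = has_columns(columns, ["posterior_ensemble_mean", "posterior_ensemble_spread"])
```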
- create_header(n)#
Create a header for the obs_seq file from the obs_sequence object.
- obs_sequence.load_yaml_to_dict(file_path)#
Load a YAML file and convert it to a dictionary.
- Parameters:
file_path (str) – The path to the YAML file.
- Returns:
The YAML file content as a dictionary.
- Return type:
dict
- obs_sequence.convert_dart_time(seconds, days)#
Convert seconds and days since 1601 to a datetime object.
Note
The base year for the Gregorian calendar is 1601.
DART time is expressed as seconds and days since 1601.
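The conversion can be sketched with the standard library alone, assuming the reference instant is midnight on 1 January 1601. The function name `dart_time_to_datetime` is hypothetical.

```python
import datetime as dt


def dart_time_to_datetime(seconds, days):
    """Convert (seconds, days) since 1601-01-01 00:00 to a datetime."""
    return dt.datetime(1601, 1, 1) + dt.timedelta(days=days, seconds=seconds)


# 365 days and one hour after the base date; 1601 is not a leap year.
t = dart_time_to_datetime(3600, 365)
```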
- obs_sequence.construct_composit(df_comp, composite, components)#
Construct a composite DataFrame by combining rows from two components.
This function takes two DataFrames and combines rows from them based on matching location and time. It creates a new row with a composite type by combining specified columns using the square root of the sum of squares method.
- Parameters:
df_comp (pd.DataFrame) – The DataFrame containing the component rows to be combined.
composite (str) – The type name for the new composite rows.
components (list of str) – A list containing the type names of the two components to be combined.
- Returns:
A DataFrame containing the new composite rows.
- Return type:
pd.DataFrame
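The matching and combining described above can be illustrated with a pandas merge. Everything specific here is an assumption for the sketch: the helper name `combine_components`, the key columns used for matching, the choice of 'observation' as the combined column, and the placeholder type names.

```python
import numpy as np
import pandas as pd


def combine_components(df, composite, components,
                       keys=("latitude", "longitude", "vertical", "time"),
                       column="observation"):
    """Combine two component types into one composite type (hypothetical sketch)."""
    first = df[df["type"] == components[0]]
    second = df[df["type"] == components[1]]
    # Pair rows that share the same location and time.
    merged = first.merge(second, on=list(keys), suffixes=("_a", "_b"))
    out = merged[list(keys)].copy()
    # Square root of the sum of squares of the two components.
    out[column] = np.sqrt(merged[f"{column}_a"] ** 2 + merged[f"{column}_b"] ** 2)
    out["type"] = composite
    return out


df = pd.DataFrame({
    "type": ["U_WIND", "V_WIND"],  # placeholder type names
    "latitude": [40.0, 40.0],
    "longitude": [255.0, 255.0],
    "vertical": [85000.0, 85000.0],
    "time": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "observation": [3.0, 4.0],
})
wind = combine_components(df, "HORIZONTAL_WIND", ["U_WIND", "V_WIND"])
```

Rows without a partner at the same location and time are dropped by the inner merge, so only complete pairs produce a composite row.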