Join Observation Sequences
This example demonstrates how to read in two observation sequences and join them together.
Import the obs_sequence module.
import pydartdiags.obs_sequence.obs_sequence as obsq
Choose the first obs_seq file to read.
This example uses a small obs_seq file, “obs_seq.final.1000”, that ships in the data directory of the pyDARTdiags package, so we import os to build the path to the file.
import os
data_dir = os.path.join(os.getcwd(), "../..", "data")
data_file1 = os.path.join(data_dir, "obs_seq.final.1000")
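The path above is built relative to the current working directory, so it only resolves when the script is run from the expected location. As a minimal sketch (the variable file_found is introduced here for illustration only), the path can be normalized and checked before it is handed to the reader:

```python
import os

# Same path construction as above, with the "../.." segments collapsed
# so the printed path is easier to read.
data_dir = os.path.normpath(os.path.join(os.getcwd(), "../..", "data"))
data_file1 = os.path.join(data_dir, "obs_seq.final.1000")

# file_found is an illustrative name; it simply records whether the
# file exists at the resolved location.
file_found = os.path.isfile(data_file1)
print(f"{data_file1} exists: {file_found}")
```

If the check prints False, adjust data_dir to point at the data directory of your pyDARTdiags checkout.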
Read the obs_seq file into an obs_seq object.
obs_seq1 = obsq.ObsSequence(data_file1)
print('obs_seq1 has assimilation info:', obs_seq1.has_assimilation_info())
print('obs_seq1 has posterior:', obs_seq1.has_posterior())
obs_seq1 has assimilation info: True
obs_seq1 has posterior: True
Choose the second obs_seq file to read.
data_file2 = os.path.join(data_dir, "obs_seq.final.ascii.small")
obs_seq2 = obsq.ObsSequence(data_file2)
print('obs_seq2 has assimilation info:', obs_seq2.has_assimilation_info())
print('obs_seq2 has posterior:', obs_seq2.has_posterior())
obs_seq2 has assimilation info: True
obs_seq2 has posterior: False
obs_seq1 has posterior information, but obs_seq2 does not. So, before joining the two obs_seq objects, we remove the posterior columns from the obs_seq1 DataFrame using the pandas drop method.
obs_seq1.df.drop(columns=obs_seq1.df.filter(like='posterior').columns, inplace=True)
print('obs_seq1 has posterior:', obs_seq1.has_posterior())
obs_seq1 has posterior: False
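The column selection used above relies on two standard pandas calls: filter(like=...) picks columns whose label contains a substring, and drop(columns=...) removes them. The sketch below shows the same pattern on a toy DataFrame whose column names only mimic those of an obs_seq DataFrame; it does not use pyDARTdiags.

```python
import pandas as pd

# Toy stand-in for an obs_seq DataFrame; the values are made up.
df = pd.DataFrame({
    "observation": [1.0, 2.0],
    "prior_ensemble_mean": [0.9, 2.1],
    "posterior_ensemble_mean": [0.95, 2.05],
    "posterior_ensemble_spread": [0.1, 0.2],
})

# filter(like=...) selects every column whose label contains "posterior".
posterior_cols = df.filter(like="posterior").columns

# Without inplace=True, drop returns a new DataFrame and leaves df untouched.
trimmed = df.drop(columns=posterior_cols)

remaining = list(trimmed.columns)
print(remaining)
```

Returning a new DataFrame, rather than dropping in place, is a design choice that keeps the original data available for comparison.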
Now, let’s join the two obs_seq objects together using the join method.
ObsSequence.join() is a class method, so it is called on the ObsSequence class itself, reached here through the module we’ve imported as obsq. The method takes a list of obs_seq objects to join.
obs_seq_mega = obsq.ObsSequence.join([obs_seq1, obs_seq2])
print(f'length of obs_seq1: {len(obs_seq1.df)}')
print(f'length of obs_seq2: {len(obs_seq2.df)}')
print(f'length of obs_seq_mega: {len(obs_seq_mega.df)}')
obs_seq_mega.df.head()
length of obs_seq1: 1000
length of obs_seq2: 10
length of obs_seq_mega: 1010
Now the obs_seq_mega object has the observations from both obs_seq1 and obs_seq2, with the prior columns from both obs_seq DataFrames.
obs_seq_mega.df.columns
Index(['obs_num', 'observation', 'prior_ensemble_mean',
'prior_ensemble_spread', 'prior_ensemble_member_1',
'prior_ensemble_member_2', 'prior_ensemble_member_3',
'prior_ensemble_member_4', 'prior_ensemble_member_5',
'prior_ensemble_member_6', 'prior_ensemble_member_7',
'prior_ensemble_member_8', 'prior_ensemble_member_9',
'prior_ensemble_member_10', 'prior_ensemble_member_11',
'prior_ensemble_member_12', 'prior_ensemble_member_13',
'prior_ensemble_member_14', 'prior_ensemble_member_15',
'prior_ensemble_member_16', 'prior_ensemble_member_17',
'prior_ensemble_member_18', 'prior_ensemble_member_19',
'prior_ensemble_member_20', 'prior_ensemble_member_21',
'prior_ensemble_member_22', 'prior_ensemble_member_23',
'prior_ensemble_member_24', 'prior_ensemble_member_25',
'prior_ensemble_member_26', 'prior_ensemble_member_27',
'prior_ensemble_member_28', 'prior_ensemble_member_29',
'prior_ensemble_member_30', 'prior_ensemble_member_31',
'prior_ensemble_member_32', 'prior_ensemble_member_33',
'prior_ensemble_member_34', 'prior_ensemble_member_35',
'prior_ensemble_member_36', 'prior_ensemble_member_37',
'prior_ensemble_member_38', 'prior_ensemble_member_39',
'prior_ensemble_member_40', 'prior_ensemble_member_41',
'prior_ensemble_member_42', 'prior_ensemble_member_43',
'prior_ensemble_member_44', 'prior_ensemble_member_45',
'prior_ensemble_member_46', 'prior_ensemble_member_47',
'prior_ensemble_member_48', 'prior_ensemble_member_49',
'prior_ensemble_member_50', 'prior_ensemble_member_51',
'prior_ensemble_member_52', 'prior_ensemble_member_53',
'prior_ensemble_member_54', 'prior_ensemble_member_55',
'prior_ensemble_member_56', 'prior_ensemble_member_57',
'prior_ensemble_member_58', 'prior_ensemble_member_59',
'prior_ensemble_member_60', 'prior_ensemble_member_61',
'prior_ensemble_member_62', 'prior_ensemble_member_63',
'prior_ensemble_member_64', 'prior_ensemble_member_65',
'prior_ensemble_member_66', 'prior_ensemble_member_67',
'prior_ensemble_member_68', 'prior_ensemble_member_69',
'prior_ensemble_member_70', 'prior_ensemble_member_71',
'prior_ensemble_member_72', 'prior_ensemble_member_73',
'prior_ensemble_member_74', 'prior_ensemble_member_75',
'prior_ensemble_member_76', 'prior_ensemble_member_77',
'prior_ensemble_member_78', 'prior_ensemble_member_79',
'prior_ensemble_member_80', 'Data_QC', 'DART_quality_control',
'linked_list', 'longitude', 'latitude', 'vertical', 'vert_unit', 'type',
'metadata', 'external_FO', 'seconds', 'days', 'time', 'obs_err_var'],
dtype='object')
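For intuition about the lengths and columns reported above, stacking two tables that share the same columns adds their row counts and keeps the column set unchanged. The sketch below illustrates this with pandas concat on toy data. It is an analogy only and makes no claim about how the join method is implemented internally.

```python
import pandas as pd

# Two toy tables with identical columns; the values are made up.
a = pd.DataFrame({
    "observation": [0.1, 0.2, 0.3],
    "prior_ensemble_mean": [0.11, 0.19, 0.32],
})
b = pd.DataFrame({
    "observation": [1.5, 1.6],
    "prior_ensemble_mean": [1.4, 1.7],
})

# ignore_index=True gives the stacked table a fresh 0..n-1 row index.
combined = pd.concat([a, b], ignore_index=True)

print(f"length of a: {len(a)}")
print(f"length of b: {len(b)}")
print(f"length of combined: {len(combined)}")
```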
You can pass a list of columns to the join method to join only the columns you want. For example, if you want to keep only the ‘prior_ensemble_mean’ and ‘prior_ensemble_spread’ columns and discard the other optional columns from the obs_seq objects, you can do so like this:
obs_seq_no_members = obsq.ObsSequence.join([obs_seq1, obs_seq2],
['prior_ensemble_mean',
'prior_ensemble_spread'])
Note, the join method will still include the required columns for the obs_seq object to function properly.
obs_seq_no_members.df.columns
Index(['obs_num', 'observation', 'prior_ensemble_mean',
'prior_ensemble_spread', 'linked_list', 'longitude', 'latitude',
'vertical', 'vert_unit', 'type', 'metadata', 'external_FO', 'seconds',
'days', 'time', 'obs_err_var'],
dtype='object')
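The effect seen above, where requested columns are kept alongside a fixed set of required ones, can be sketched in plain pandas. Both the helper select_columns and the ALWAYS_KEPT list are hypothetical names introduced for this illustration; the list is a shortened copy of column names from the output above, not the library's actual definition of required columns.

```python
import pandas as pd

# Hypothetical, shortened list of always-kept columns (illustration only).
ALWAYS_KEPT = ["obs_num", "observation", "longitude", "latitude", "time"]

def select_columns(df, wanted):
    """Keep the always-kept columns plus the requested ones, in original order."""
    keep = set(ALWAYS_KEPT) | set(wanted)
    return df[[col for col in df.columns if col in keep]]

# Toy one-row table; the values are made up.
df = pd.DataFrame({
    "obs_num": [1],
    "observation": [2.5],
    "prior_ensemble_mean": [2.4],
    "prior_ensemble_spread": [0.3],
    "prior_ensemble_member_1": [2.2],
    "longitude": [10.0],
    "latitude": [45.0],
    "time": ["2020-01-01"],
})

subset = select_columns(df, ["prior_ensemble_mean", "prior_ensemble_spread"])
print(list(subset.columns))
```

Here the ensemble member column is discarded while the always-kept columns survive even though they were not requested.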