Spectral Analysis#

This example demonstrates how to perform spectral and cross-spectral analysis using the scipy.signal module. To learn more about the functions and keyword arguments available in this package, read SciPy’s signal processing documentation.

The motivation for this example is to demonstrate how to reproduce, in Python, the NCL spectral analysis functions specx_anal() and specxy_anal(), which calculate the spectra of a series.

This notebook will cover:

  1. Periodograms

  2. Smoothing

  3. Cross-Spectral Analysis (cospectrum, quadrature, phase, coherence squared)

  4. Coherence

For more examples of use, look at NCL spectral analysis applications.

To see a concise example using Python, visit the spectral_analysis notebook.

Spectral Analysis (Periodograms)#

According to the NCL specx_anal() documentation, the function:

  1. Optionally detrends the series

  2. Optionally tapers the series

  3. Calculates the variance of the detrended/tapered series

  4. Finds the Fast Fourier Transform of the series

  5. Squares the Fourier coefficients (i.e. the periodogram)

  6. Smooths the periodogram estimates

  7. Normalizes the periodogram so that the area under the curve matches the calculated variance

We will be able to combine steps 4 and 5 using scipy.signal.periodogram().

import xarray as xr
import geocat.datafiles as gcd
import pandas as pd
import scipy.signal
import matplotlib.pyplot as plt
import numpy as np

Read in Data#

We’ll use the geocat-datafiles module to access the relevant datafile for this example: SOI_Darwin.nc

Then, we use the xarray.open_dataset() function to read the data as an xarray dataset.

This data represents the Darwin Southern Oscillation Index between 1882 and 1998 using normalized sea level pressure.

soi_darwin = xr.open_dataset(gcd.get('netcdf_files/SOI_Darwin.nc'))

soi_darwin
<xarray.Dataset> Size: 22kB
Dimensions:  (time: 1404)
Coordinates:
  * time     (time) int32 6kB 0 1 2 3 4 5 6 ... 1398 1399 1400 1401 1402 1403
Data variables:
    date     (time) float64 11kB ...
    DSOI     (time) float32 6kB ...
Attributes:
    title:          Darwin Southern Oscillation Index
    source:         Climate Analysis Section, NCAR
    history:        \nDSOI = - Normalized Darwin\nNormalized sea level pressu...
    creation_date:  Tue Mar 30 09:29:20 MST 1999
    references:     Trenberth, Mon. Wea. Rev: 2/1984
    time_span:      1882 - 1998
    conventions:    None

Time Series Correction#

Currently, our time coordinate is in units of months since January 1882. We'll use pandas to convert it to a conventional DatetimeIndex.

start_date = pd.Timestamp('1882-01-01')
dates = [
    start_date + pd.DateOffset(months=int(month))
    for month in soi_darwin.indexes['time']
]
datetime_index = pd.DatetimeIndex(dates)

datetime_index
DatetimeIndex(['1882-01-01', '1882-02-01', '1882-03-01', '1882-04-01',
               '1882-05-01', '1882-06-01', '1882-07-01', '1882-08-01',
               '1882-09-01', '1882-10-01',
               ...
               '1998-03-01', '1998-04-01', '1998-05-01', '1998-06-01',
               '1998-07-01', '1998-08-01', '1998-09-01', '1998-10-01',
               '1998-11-01', '1998-12-01'],
              dtype='datetime64[ns]', length=1404, freq=None)
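
As a side note, the same index can be built more concisely with pandas.date_range() and the month-start frequency; the snippet below is an optional alternative to the loop above, not part of the original NCL workflow.

alt_index = pd.date_range('1882-01-01', periods=len(dates), freq='MS')  # 'MS' = month-start frequency
assert alt_index.equals(datetime_index)  # identical dates to the loop above
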
soi = soi_darwin.DSOI
soi['time'] = datetime_index

soi.plot();
[Figure: time series of the Darwin Southern Oscillation Index]

Detrend#

Here we will do a simple detrending by removing the mean with scipy.signal.detrend().

If you instead want to remove a linear least-squares fit, use type='linear'.

The scipy.signal.periodogram() function does have a detrending option, but because we are applying a custom tapering window ourselves, and detrending must be done before tapering, here we demonstrate how to separate out these steps.

soi_detrended = scipy.signal.detrend(soi, type='constant')
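
As an optional check (not part of the NCL example), type='constant' simply subtracts the series mean, so the detrended series should now average to (numerically) zero.

assert abs(soi_detrended.mean()) < 1e-5  # mean removed by constant detrending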

Tapering#

According to the code in NCL’s specx_dp.DTAPERX, the tapering performed in the NCL example is split-cosine-bell tapering.

Split-cosine-bell tapering is sometimes referred to as Tukey tapering; however, "Tukey" can also refer to cosine tapering.

Here we demonstrate how to apply a custom tapering window with scipy.signal.windows.tukey(). You can substitute your preferred tapering window.

percent_taper = 0.1
tukey_window = scipy.signal.windows.tukey(
    len(soi_detrended), alpha=percent_taper, sym=False
)  # generates a periodic window

soi_tapered = soi_detrended * tukey_window

Periodogram#

Now we can call scipy.signal.periodogram().

A periodogram estimate is the squared magnitude of the Fourier coefficients of the series.

There is also a scaling keyword that defaults to density, computing the power spectral density with units of \(V^{2}/Hz\), where V is the units of the input array (here dimensionless) and Hz is your frequency unit (we'll use cycles per month instead of per second). scaling='spectrum' is also supported and produces the squared magnitude spectrum with units of \(V^{2}\).

freq_soi, psd_soi = scipy.signal.periodogram(
    soi_tapered,
    fs=1,  # sample monthly
    detrend=False,
)
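
As an optional cross-check tying this back to steps 4 and 5 above (and assuming the default boxcar window and 'density' scaling used in the call), the same estimate can be rebuilt by hand from the FFT coefficients.

# Square the FFT coefficients and scale to a one-sided density: |X_k|^2 / (fs * N)
fft_coeffs = np.fft.rfft(soi_tapered)
manual_psd = np.abs(fft_coeffs) ** 2 / (1 * len(soi_tapered))  # fs = 1
manual_psd[1:-1] *= 2  # double every bin except DC and Nyquist (one-sided spectrum)
assert np.allclose(psd_soi, manual_psd)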

Smoothing#

To provide statistical reliability, the periodogram estimates must be smoothed.

The NCL example uses modified Daniell kernel smoothing, which SciPy does not provide directly, so here we'll demonstrate how to perform your own custom smoothing using scipy.signal.convolve().

Here is a list of supported smoothing windows.

k = 7  # define smoothing constant (kernel length)

kernel = np.ones(k)  # Create a Daniell smoothing kernel
kernel[0] = (
    0.5  # "Modify" the kernel by giving the endpoints half the weight of the interior points
)
kernel[-1] = 0.5
kernel = kernel / kernel.sum()

smoothed_psd = scipy.signal.convolve(
    psd_soi, kernel, mode='same'
)  # Sets output array as the same length as the first input

Normalization#

Lastly we’ll normalize our smoothed periodogram by the variance of the detrended and tapered series.

variance = np.var(soi_tapered, ddof=1)

df = freq_soi[1] - freq_soi[0]  # Frequency step

# Create an array to adjust contributions of endpoints
frac = np.ones_like(freq_soi)
frac[0] = 0.5
frac[-1] = 0.5

current_area = np.sum(
    smoothed_psd * df * frac
)  # Calculate the current area under the curve
normalization_factor = variance / current_area  # Find the factor to adjust this area

normalized_psd = (
    smoothed_psd * normalization_factor
)  # Apply the normalization factor to the smoothed power spectrum
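
As a quick sanity check, the area under the normalized curve (with half-weight endpoints) should now equal the variance computed above.

assert np.isclose(np.sum(normalized_psd * df * frac), variance)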

Plot#

plt.plot(freq_soi, normalized_psd)
plt.xlabel('Frequency (Cycles / Month)')
plt.ylabel('Variance (Pressure$^2$ / Frequency)')
plt.title('Darwin Southern Oscillation Index (1882 - 1998) - Power Spectral Density')
plt.xlim([0, 0.5]);
[Figure: power spectral density of the Darwin Southern Oscillation Index, 1882-1998]

Cross-Spectral Analysis#

Read in Data#

For our cross-spectral data we'll look at SLP_DarwinTahitiAnom.nc, which contains Darwin and Tahiti sea level pressure anomalies between 1935 and 1998.

This data will need a time index correction to use the standard DatetimeIndex.

slp_darwin = xr.open_dataset(gcd.get('netcdf_files/SLP_DarwinTahitiAnom.nc'))

# Time Series Correction
start_date = pd.Timestamp('1935-01-01')  # Months Since January 1935
dates = [
    start_date + pd.DateOffset(months=int(month))
    for month in slp_darwin.indexes['time']
]
datetime_index = pd.DatetimeIndex(dates)

slp_darwin['time'] = datetime_index
slp_darwin
<xarray.Dataset> Size: 18kB
Dimensions:  (time: 768)
Coordinates:
  * time     (time) datetime64[ns] 6kB 1935-01-01 1935-02-01 ... 1998-12-01
Data variables:
    date     (time) float64 6kB ...
    DSLP     (time) float32 3kB ...
    TSLP     (time) float32 3kB ...
Attributes:
    title:          Darwin and Tahiti Sea Level Pressure Anomalies: 1935-1998
    source:         Climate Analysis Section, NCAR
    history:        \nCreated to test specxy routines
    creation_date:  Tue Mar 30 12:34:39 MST 1999
    conventions:    None
dslp = slp_darwin.DSLP  # Darwin
dslp
<xarray.DataArray 'DSLP' (time: 768)> Size: 3kB
[768 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 6kB 1935-01-01 1935-02-01 ... 1998-12-01
Attributes:
    short_name:  DSLP
    long_name:   Darwin Sea Level Pressure Anomalies
    units:       hPa
tslp = slp_darwin.TSLP  # Tahiti
tslp
<xarray.DataArray 'TSLP' (time: 768)> Size: 3kB
[768 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 6kB 1935-01-01 1935-02-01 ... 1998-12-01
Attributes:
    short_name:  TSLP
    long_name:   Tahiti Sea Level Pressure Anomalies
    units:       hPa
dslp.plot(label='Darwin')
tslp.plot(label='Tahiti')
plt.legend()
plt.ylabel('Sea Level Pressure Anomalies [hPa]');
[Figure: Darwin and Tahiti sea level pressure anomaly time series]

Cross-Power Spectral Density#

We will calculate the cospectrum, quadrature, and phase by leveraging scipy.signal.csd(), taking the real part of the cross-spectral density (cospectrum), the imaginary part (quadrature), and the phase angle of the complex values (phase).

f, pxy = scipy.signal.csd(
    dslp,
    tslp,
    fs=1,  # monthly
    detrend='constant',
)  # remove mean

cospectrum = np.real(pxy)
quadrature = np.imag(pxy)

phase = np.angle(pxy, deg=True)
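# Equivalently, the phase is the arctangent of the quadrature spectrum over the
# cospectrum; this optional check (not in the NCL example) confirms the relationship.
assert np.allclose(phase, np.degrees(np.arctan2(quadrature, cospectrum)))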
fig, axs = plt.subplots(1, 3, figsize=(10, 5))

axs[0].plot(f, cospectrum)
axs[0].set_xlim([0, 0.5])
axs[0].set_xlabel('Frequency (Cycles / Month)')
axs[0].set_ylabel('Cospectrum')

axs[1].plot(f, quadrature)
axs[1].set_xlim([0, 0.5])
axs[1].set_ylim([-3, 3])
axs[1].set_xlabel('Frequency (Cycles / Month)')
axs[1].set_ylabel('Quadrature')

axs[2].plot(f, phase)
axs[2].set_xlim([0, 0.5])
axs[2].set_ylim([-200, 200])
axs[2].set_xlabel('Frequency (Cycles / Month)')
axs[2].set_ylabel('Phase')

fig.suptitle(
    'Darwin and Tahiti Sea Level Pressure Anomalies (1935-1998) - Cross-Spectral Analysis'
)

plt.tight_layout();  # Prevents the y-labels from overlapping the neighboring panels
[Figure: cospectrum, quadrature, and phase of the Darwin-Tahiti cross-spectrum]

Coherence Squared#

Coherence is a measure of how similar two spectra are with regard to their frequency content and the phase offset between them.

The output of scipy.signal.coherence() is already the magnitude-squared coherence, so it can be used directly for the coherence-squared plot.

f, cxy = scipy.signal.coherence(
    dslp,
    tslp,
    fs=1,  # monthly
    detrend='constant',
)  # remove mean
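
For reference, scipy.signal.coherence() is built from Welch power spectral density and cross-spectral density estimates; the optional sketch below reconstructs it from its definition, assuming the same (default) Welch parameters as the call above.

# Rebuild the magnitude-squared coherence from its definition: |Pxy|^2 / (Pxx * Pyy)
_, pxx_w = scipy.signal.welch(dslp, fs=1, detrend='constant')
_, pyy_w = scipy.signal.welch(tslp, fs=1, detrend='constant')
_, pxy_w = scipy.signal.csd(dslp, tslp, fs=1, detrend='constant')
assert np.allclose(cxy, np.abs(pxy_w) ** 2 / (pxx_w * pyy_w))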

co_sq = cxy  # scipy.signal.coherence() already returns the magnitude-squared coherence
plt.plot(f, co_sq)
plt.xlim([0, 0.5])
plt.ylim([0, 1])
plt.xlabel('Frequency (Cycles / Month)')
plt.ylabel('Coherence SQ')

plt.title('Darwin and Tahiti Sea Level Pressure Anomalies (1935-1998)');
[Figure: coherence squared between the Darwin and Tahiti sea level pressure anomalies]

Periodogram and Smoothing#

Previously we demonstrated additional tapering and normalization steps for the periodogram. Now we demonstrate a more streamlined working example for our Darwin and Tahiti datasets.

# Calculate Periodogram
freq_dslp, dslp_psd = scipy.signal.periodogram(
    dslp,
    fs=1,  # monthly
    detrend='constant',
)  # remove mean
freq_tslp, tslp_psd = scipy.signal.periodogram(tslp, fs=1, detrend='constant')


# Smooth the periodogram estimates (here with a normalized Hann window in place of NCL's modified Daniell kernel)
k = 3  # smoothing constant

window = scipy.signal.windows.hann(2 * k + 1)  # Hann Window
window = window / window.sum()  # Normalize window

dslp_smoothed = scipy.signal.convolve(dslp_psd, window, mode='same')
tslp_smoothed = scipy.signal.convolve(tslp_psd, window, mode='same')
fig, axs = plt.subplots(1, 2, figsize=(10, 5))

axs[0].plot(freq_dslp, dslp_smoothed)
axs[0].set_xlim([0, 0.5])
axs[0].set_ylim([0, 30])
axs[0].set_xlabel('Frequency (cycles/month)')
axs[0].set_ylabel('Variance of Darwin SLP')

axs[1].plot(freq_tslp, tslp_smoothed)
axs[1].set_xlim([0, 0.5])
axs[1].set_ylim([0, 30])
axs[1].set_xlabel('Frequency (cycles/month)')
axs[1].set_ylabel('Variance of Tahiti SLP')

fig.suptitle(
    'Darwin and Tahiti Sea Level Pressure (SLP) Anomalies (1935-1998) - Power Spectral Density'
);
[Figure: smoothed power spectral density of the Darwin and Tahiti sea level pressure anomalies]