Select time ranges in multidimensional arrays with pandas

Task

Select specific time ranges from multidimensional arrays

Solution

Pandas periods

I like pandas for very easy time handling, and would like to use similar approach when work with multidimensional arrays, for example from netCDF files. There are already some efforts to do this. However I don't need anything complicated, just select some months, years of time periods. For this I can use pandas itself and benefit from its great time indexing. Below I will show a small example of how to do this.

Necessary imports (everything can be installed from Anaconda)

In [1]:
import matplotlib.pylab as plt
from netCDF4 import Dataset, num2date
import pandas as pd
from mpl_toolkits.basemap import Basemap
import numpy as np
%matplotlib inline

Download some data (monthly mean air temperatures from NCEP reanalysis):

In [2]:
#!wget ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.derived/surface/air.mon.mean.nc

Open the file:

In [3]:
f = Dataset('./air.mon.mean.nc')

Load variables, and create 2D lat and lon

In [4]:
air = f.variables['air'][:]
time = f.variables['time']
lon = f.variables['lon'][:]
lat = f.variables['lat'][:]

lon, lat = np.meshgrid(lon, lat)

Here we create dates variable, that contains datetime values, extracted by num2date function from netCDF4 package (don't confuse it with matplotlib version, which is not that great!)

In [5]:
dates = num2date(time[:], time.units)
In [6]:
dates[:3]
Out[6]:
array([datetime.datetime(1948, 1, 1, 0, 0),
       datetime.datetime(1948, 2, 1, 0, 0),
       datetime.datetime(1948, 3, 1, 0, 0)], dtype=object)

Now we convert dates to pandas format:

In [7]:
dates_pd = pd.to_datetime(dates)

And then convert timestemps to periods, since we dealing with monthly values:

In [8]:
periods = dates_pd.to_period(freq='M')
In [9]:
periods
Out[9]:

[1948-01, ..., 2015-01]
Length: 805, Freq: M

This is a small helper function, that will plot our results:

In [10]:
def plt_map(data):
    m = Basemap(projection='mill',llcrnrlat=-80,urcrnrlat=80,\
            llcrnrlon=0,urcrnrlon=360,lat_ts=20,resolution='c')
    x, y = m(lon, lat)
    plt.figure(figsize=(10,7))
    m.drawcoastlines()
    m.drawparallels(np.arange(-80.,81.,20.))
    m.drawmeridians(np.arange(-180.,181.,20.))
    m.drawmapboundary(fill_color='white')
    m.contourf(x,y,data,levels=np.linspace(-25,30,56), extend="both");
    plt.colorbar( orientation='horizontal', pad=0.05)

Test on the first time step:

In [11]:
plt_map(air[0,:,:])

Select specific year (yearly mean)

We create the mask from the periods where only 2010 is selected, then apply it to the time dimension of our variable, and calculate the mean over time:

In [12]:
mask_2014 = periods.year==2010
In [13]:
data = air[mask_2014,:,:].mean(axis=0)
In [14]:
plt_map(data)

Monthly mean

Here we do the same thing, but for months (September mean):

In [15]:
mask_months = periods.month==9
In [16]:
data = air[mask_months,:,:].mean(axis=0)
In [17]:
plt_map(data)

Time period

Select only Octobers in 2000s:

In [18]:
mask_tm = (periods.year>=2000)&(periods.year<=2010)&(periods.month==10)

And calculate the mean:

In [19]:
data = air[mask_tm,:,:].mean(axis=0)
In [20]:
plt_map(data)

And so on... I guess you get the idea :)

Comments !

links

social