Task
Select specific time ranges from multidimensional arrays
Solution
Pandas periods
I like pandas for very easy time handling, and would like to use similar approach when work with multidimensional arrays, for example from netCDF files. There are already some efforts to do this. However I don't need anything complicated, just select some months, years of time periods. For this I can use pandas itself and benefit from its great time indexing. Below I will show a small example of how to do this.
Necessary imports (everything can be installed from Anaconda)
import matplotlib.pylab as plt
from netCDF4 import Dataset, num2date
import pandas as pd
from mpl_toolkits.basemap import Basemap
import numpy as np
%matplotlib inline
Download some data (monthly mean air temperatures from NCEP reanalysis):
#!wget ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis.derived/surface/air.mon.mean.nc
Open the file:
f = Dataset('./air.mon.mean.nc')
Load variables, and create 2D lat and lon
air = f.variables['air'][:]
time = f.variables['time']
lon = f.variables['lon'][:]
lat = f.variables['lat'][:]
lon, lat = np.meshgrid(lon, lat)
Here we create dates
variable, that contains datetime
values, extracted by num2date
function from netCDF4
package (don't confuse it with matplotlib version, which is not that great!)
dates = num2date(time[:], time.units)
dates[:3]
Now we convert dates
to pandas format:
dates_pd = pd.to_datetime(dates)
And then convert timestemps to periods, since we dealing with monthly values:
periods = dates_pd.to_period(freq='M')
periods
This is a small helper function, that will plot our results:
def plt_map(data):
m = Basemap(projection='mill',llcrnrlat=-80,urcrnrlat=80,\
llcrnrlon=0,urcrnrlon=360,lat_ts=20,resolution='c')
x, y = m(lon, lat)
plt.figure(figsize=(10,7))
m.drawcoastlines()
m.drawparallels(np.arange(-80.,81.,20.))
m.drawmeridians(np.arange(-180.,181.,20.))
m.drawmapboundary(fill_color='white')
m.contourf(x,y,data,levels=np.linspace(-25,30,56), extend="both");
plt.colorbar( orientation='horizontal', pad=0.05)
Test on the first time step:
plt_map(air[0,:,:])
Select specific year (yearly mean)¶
We create the mask from the periods
where only 2010 is selected, then apply it to the time dimension of our variable, and calculate the mean over time:
mask_2014 = periods.year==2010
data = air[mask_2014,:,:].mean(axis=0)
plt_map(data)
Monthly mean¶
Here we do the same thing, but for months (September mean):
mask_months = periods.month==9
data = air[mask_months,:,:].mean(axis=0)
plt_map(data)
Time period¶
Select only Octobers in 2000s:
mask_tm = (periods.year>=2000)&(periods.year<=2010)&(periods.month==10)
And calculate the mean:
data = air[mask_tm,:,:].mean(axis=0)
plt_map(data)
And so on... I guess you get the idea :)
Comments !