Task:
continue interactive analysis of time series (AO, NAO indexes)
Module:
pandas
In the previous part we looked at very basic ways of working with pandas. Here I am going to introduce a couple of more advanced tricks. We will use the very powerful pandas IO capabilities to create time series directly from a text file, create seasonal means with resample, and create multi-year monthly means with groupby. At the end I will show how new functionality from the upcoming IPython 2.0 (the interact function) can be used to explore your data more efficiently with a sort of simple GUI. There might be easier or better ways to do some of the things discussed here, and I will be happy to hear about them in the comments :)
Import the usual suspects and change some output formatting:
import pandas as pd
import numpy as np
%matplotlib inline
pd.set_option('display.max_rows', 15)  # limit the maximum number of rows displayed
We are also going to download the necessary files. Their description can be found in the first part.
!wget http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii
!wget http://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/norm.nao.monthly.b5001.current.ascii
Pandas IO
Pandas is equipped with very rich IO functionality that allows direct conversion of essentially any text-table-based data format to a Series or DataFrame. There is very good, extensive documentation with a lot of examples. Here we are going to open the AO file in the same way we did in the first part, and the NAO file with pandas IO. Then we are going to combine the two in one DataFrame.
First a simple numpy loadtxt, then create the dates and the Series:
ao = np.loadtxt('monthly.ao.index.b50.current.ascii')
dates = pd.date_range('1950-01', '2014-03', freq='M')
AO = pd.Series(ao[:,2], index=dates)
Now let's open NAO. First let's remind ourselves what the file looks like:
!tail norm.nao.monthly.b5001.current.ascii
We have 3 space-separated columns, with the first two columns containing years and months. Here is the expression that will create a time series out of this file:
NAO = pd.read_table('norm.nao.monthly.b5001.current.ascii', sep=r'\s+',
                    parse_dates={'dates': [0, 1]}, header=None, index_col=0, squeeze=True)
NAO
Some explanations:
- the first argument is obviously the file name
- sep=r'\s+' - a regular expression that describes the separator (one or more whitespace characters)
- parse_dates - combine columns 0 and 1, convert the resulting column to dates and give it the name "dates"
- header - don't use row 0 as the header
- index_col - use column 0 as the index (by this point it is already the result of the parse_dates parsing)
- squeeze - create a Series instead of a DataFrame
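For readers whose pandas version no longer accepts some of these arguments, here is a minimal sketch of the same idea done in explicit steps. It is only an illustration: the three-line sample text holds made-up values (not real NAO data), and the names sample, raw and nao_demo are hypothetical. It reads the whitespace-separated columns, builds dates from the year and month columns, and uses monthly periods as the index.

```python
import io

import pandas as pd

# Made-up sample in the same "year month value" layout (NOT real NAO values).
sample = "1950 1 0.10\n1950 2 0.20\n1950 3 0.30\n"

# Read the whitespace-separated columns and name them ourselves.
raw = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None,
                  names=["year", "month", "value"])

# Build time stamps from the year and month columns (the day is fixed to 1),
# then convert them to monthly periods.
stamps = pd.to_datetime(raw[["year", "month"]].assign(day=1))
index = pd.DatetimeIndex(stamps).to_period("M")

# The Series is indexed by month periods instead of points in time.
nao_demo = pd.Series(raw["value"].to_numpy(), index=index, name="NAO")
print(nao_demo)
```

Building the period index directly also avoids the time stamp mismatch discussed next.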
Now we would like to combine the AO and NAO Series, but there is a small problem: the dates in our two Series are different. The pandas date parser returns time stamps, so it fills in the missing day with the present day number (15 in my case) and interprets the indexes in NAO as points in time. A similar thing happened with the AO series: its index has monthly frequency, but every value is interpreted as a point in time associated with the last day of the month. As a consequence, the simple approach will not work:
aonao = pd.DataFrame({'AO':AO, 'NAO':NAO})
aonao.head(10)
But our data are monthly means, so they relate not to a particular point in time but rather to a time interval, or time span. We can convert the time stamps in our Series to time periods, and then combine them.
aonao = pd.DataFrame({'AO':AO.to_period(freq='M'), 'NAO':NAO.to_period(freq='M')} )
aonao.head(10)
Note that the index now shows only years and months. Below you can see that the index types of the original time series and of the converted one differ:
type(AO.index)
type(AO.to_period(freq='M').index)
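The alignment problem and its fix can be reproduced on a tiny example. This is a sketch with made-up values and hypothetical names (end_of_month, mid_month): one series is stamped on the last day of each month, the other on the 15th, so their indexes never match until both are converted to monthly periods.

```python
import pandas as pd

# Two toy monthly series whose time stamps fall on different days of the month.
end_of_month = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["1950-01-31", "1950-02-28", "1950-03-31"]))
mid_month = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(["1950-01-15", "1950-02-15", "1950-03-15"]))

# Combining on time stamps: no index values coincide, so every row has a NaN.
misaligned = pd.DataFrame({"AO": end_of_month, "NAO": mid_month})

# Combining on monthly periods: both series now share the same three labels.
aligned = pd.DataFrame({"AO": end_of_month.to_period("M"),
                        "NAO": mid_month.to_period("M")})

print(misaligned)
print(aligned)
```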
Seasonal means with resample
Pandas was initially created for the analysis of financial information, and it thinks not in seasons but in quarters. So we have to resample our data to quarters. We also need to shift the standard quarters so that they correspond to seasons. This is done by using 'Q-NOV' as the time frequency, indicating that the year in our case ends in November:
q_mean = aonao.resample('Q-NOV')
q_mean.head()
q_mean[q_mean.index.quarter==1].plot(figsize=(8,5))
If you don't mind sacrificing the first two data points (which, strictly speaking, can't represent the whole winter of 1949-1950), there is another way to do a similar thing: just resample to a 3M (3 months) interval starting from March (the third data point):
m3_mean = aonao[2:].resample('3M', closed='left' )
m3_mean.head()
Now, in order to select all winters, we have to choose the Februaries (the last month of each season):
m3_mean[m3_mean.index.month==2].plot(figsize=(8,5))
The result is the same except for the first point. This method allows you to use any time frequency, but you will have to deal with time stamps again, since periods for arbitrary frequencies are not yet implemented.
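In later pandas releases resample returns an object that needs an explicit aggregation such as .mean(), and some frequency names differ, so the calls above may need adjusting there. As an alternative sketch of the same season logic, the block below maps each month to a quarter of a year ending in November and averages with groupby. The data are synthetic (each value is simply the month number) and the names demo, season and winters are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data for 1950-1951: the value is just the month number.
idx = pd.period_range("1950-01", "1951-12", freq="M")
demo = pd.DataFrame({"AO": np.asarray(idx.month, dtype=float)}, index=idx)

# Map every month to a quarter of a year that ends in November,
# so that quarter 1 covers December, January and February.
season = idx.asfreq("Q-NOV")

# Average all months that fall into the same seasonal quarter.
seasonal_mean = demo.groupby(season).mean()

# Keep only the winters (quarter 1).
winters = seasonal_mean[seasonal_mean.index.quarter == 1]
print(winters)
```

The first and last winters here are incomplete (two months and one month respectively), which is the same caveat as for the first data points above.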
Multi-year monthly means with groupby
The first step is to add another column with month numbers to our DataFrame:
aonao['mon'] = aonao.index.month
aonao
Now we can use groupby to group our values by month and calculate the mean for each group:
monmean = aonao['1950':'2013'].groupby('mon').aggregate(np.mean)
monmean.plot(kind='bar')
There are very large negative values for the winter months of AO. To see what is going on there, it is useful to look at box plots for every month:
ax = aonao.boxplot(column=['AO'], by='mon')
ax = aonao.boxplot(column=['NAO'], by='mon')
While NAO shows a more or less uniform spread, AO has pronounced seasonal variations, with the largest spread during the winter months.
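The groupby step can also be written without the helper column by grouping directly on the month of the index. Here is a small sketch on synthetic data (three years, with values constructed so that the expected mean for month m is 1 + m/10); demo and monthly_clim are hypothetical names.

```python
import numpy as np
import pandas as pd

# Synthetic data for 1950-1952: value = (year - 1950) + month / 10.
idx = pd.period_range("1950-01", "1952-12", freq="M")
values = np.asarray(idx.year - 1950, dtype=float) + np.asarray(idx.month) / 10.0
demo = pd.DataFrame({"AO": values}, index=idx)

# Group by calendar month (1..12) taken straight from the index,
# then average over all years within each group.
monthly_clim = demo.groupby(demo.index.month).mean()
monthly_clim.index.name = "mon"
print(monthly_clim)
```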
Interactive exploration of data (only for IPython 2.0)
Say we would like to look at the variability of our indexes for individual months and also, if necessary, do a bit of smoothing to filter out high frequencies. Something like this will work:
pd.rolling_mean(aonao[['AO','NAO']][aonao.mon==1], window=1).plot()
This is the data for January with no smoothing (window=1).
pd.rolling_mean(aonao[['AO','NAO']][aonao.mon==2], window=10).plot()
This one is February, with a rolling mean over a 10-year window applied. Wouldn't it be nice to be able to change our two parameters (month and window) interactively? The IPython developers have included a very simple way to do this in the upcoming 2.0 version. The following code will only work on a local machine, and only with IPython 2.0 or later.
Import interact:
from IPython.html.widgets import interact
Define a function that will use our parameters as input:
def kp(mm=1, wind=1):
    pd.rolling_mean(aonao[['AO','NAO']][aonao.mon==mm], window=wind).plot(ylim=(-4,4))
And now you call interact with the previously defined function as the first argument. The other arguments are our parameters with their value limits and step size:
cc = interact(kp, mm=(1, 12, 1), wind=(1,10,1))
As simple as that, you get two controls, for month and window size, that you can operate with the mouse or arrow keys. You can find more about this feature in this talk by Brian Granger.
Those who do not have IPython 2.0 yet can enjoy a video of the process below :)
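On newer pandas versions, where the top-level pd.rolling_mean function is not available, the same month selection and smoothing can be sketched with the .rolling method. The example below is an illustration on synthetic data; smooth_month and demo are hypothetical names, and the function returns the smoothed DataFrame instead of plotting so that it can be checked without a display.

```python
import numpy as np
import pandas as pd

def smooth_month(df, month=1, window=1):
    """Select one calendar month and apply a rolling mean over `window` years."""
    selected = df[df.index.month == month]
    return selected.rolling(window=window).mean()

# Synthetic data for 1950-1954: AO is the number of years since 1950, NAO twice that.
idx = pd.period_range("1950-01", "1954-12", freq="M")
years = np.asarray(idx.year - 1950, dtype=float)
demo = pd.DataFrame({"AO": years, "NAO": 2 * years}, index=idx)

january_raw = smooth_month(demo, month=1, window=1)       # no smoothing
january_smooth = smooth_month(demo, month=1, window=2)    # 2-year rolling mean
print(january_smooth)
```

Adding .plot(ylim=(-4, 4)) to the returned DataFrame inside a wrapper function would give something that can be passed to interact in the same way as kp.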
from IPython.display import YouTubeVideo
YouTubeVideo('Ba9GAq5PR_8')