Seaborn library

Task:

Make matplotlib plots look nicer

Solution:

Seaborn

We all know and love matplotlib, but I guess most of you will agree that its default output is ugly. You can spend hours tweaking the plots, and you will certainly get a very nice result in the end; customization is one of the great powers of matplotlib, after all. However, there is another way: just rely on beautiful defaults created by someone else. Below I will show a couple of examples with the Seaborn library, which is based on matplotlib but makes figures look much better. It also provides a very simple way to draw statistical graphs, which I will also demonstrate.

Online documentation

Tutorial

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd

We are going to work with climate indices (AO, NAO, and PNA):

In [2]:
!wget http://www.cpc.ncep.noaa.gov/products/precip/CWlink/daily_ao_index/monthly.ao.index.b50.current.ascii
!wget http://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/norm.nao.monthly.b5001.current.ascii
!wget http://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/norm.pna.monthly.b5001.current.ascii

Create a pandas data frame (ind) out of them:

In [4]:
# sep=r'\s+' splits on runs of whitespace; '\s*' can also match the empty string
AO = pd.read_table('monthly.ao.index.b50.current.ascii', sep=r'\s+',
              parse_dates={'dates':[0, 1]}, header=None, index_col=0, squeeze=True)
NAO = pd.read_table('norm.nao.monthly.b5001.current.ascii', sep=r'\s+',
              parse_dates={'dates':[0, 1]}, header=None, index_col=0, squeeze=True)
PNA = pd.read_table('norm.pna.monthly.b5001.current.ascii', sep=r'\s+',
              parse_dates={'dates':[0, 1]}, header=None, index_col=0, squeeze=True)

ind = pd.DataFrame({'AO':AO, 'NAO':NAO, 'PNA':PNA})
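
A note for readers on recent pandas: `squeeze` has been removed from the readers in pandas 2.x, and combining date columns through `parse_dates` is deprecated there. Below is a minimal sketch of the same loading step written without those arguments. The helper name `read_index`, the column names, and the three sample rows are all made up for illustration (the values are not real AO data); the "year month value" layout is an assumption based on the `parse_dates` call above.

```python
import io
import pandas as pd

# Made-up rows in the assumed "year month value" layout (not real AO values).
sample = io.StringIO("1950 1 -0.06\n1950 2 0.63\n1950 3 -0.01\n")

def read_index(source):
    """Read a whitespace-separated year/month/value table into a dated Series."""
    raw = pd.read_csv(source, sep=r"\s+", header=None,
                      names=["year", "month", "value"])
    # Build a timestamp for the first day of each month.
    dates = pd.to_datetime(raw[["year", "month"]].assign(day=1))
    return pd.Series(raw["value"].to_numpy(), index=dates)

AO_demo = read_index(sample)
```

With the downloaded files, the same hypothetical helper would be called as `read_index('monthly.ao.index.b50.current.ascii')`.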

Here is what the usual plots look like (drawn with pandas, which uses matplotlib under the hood):

In [5]:
ind.plot()
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0xaefc4cec>
In [6]:
ind.AO.hist()
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0xaee9a5cc>

They do the job and show the data, but they also have a certain 1990s flavor :)

And now we just import the seaborn library:

In [7]:
import seaborn as sns

And magic happens; the plots change automatically:

In [8]:
ind.plot()
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0xac1d82ec>
In [9]:
ind.AO.hist()
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0xac1f624c>
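
One caveat for newer installations: since seaborn 0.8 the import alone no longer changes the matplotlib defaults, and the theme has to be switched on explicitly. A minimal sketch, assuming such a version is installed:

```python
import matplotlib as mpl
import seaborn as sns

# Apply seaborn's default theme (darkgrid style, notebook context).
# From seaborn 0.11 on, sns.set_theme() is the preferred name for the same call.
sns.set()

# The theme works by updating matplotlib's rcParams, e.g. it turns the grid on.
grid_on = mpl.rcParams["axes.grid"]
```

After this call, `ind.plot()` and `ind.AO.hist()` pick up the seaborn look just like in the cells above.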

Same plots, but they look much better. You can tweak some basic styling:

In [10]:
sns.set(context='talk',style='ticks')
In [11]:
ind.AO.hist()
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0xabfea40c>
In [12]:
sns.set(context='talk',style='darkgrid')
In [13]:
ind.AO.hist()
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0xabf6434c>

But in the end you get just a usual matplotlib object that can later be tweaked any way you want.
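
To illustrate that point, here is a small sketch that keeps the returned Axes and adjusts it with ordinary matplotlib calls. The data are random numbers standing in for the AO series, and the title and label texts are my own placeholders.

```python
import numpy as np
import pandas as pd
import seaborn as sns

sns.set(context="talk", style="darkgrid")

# Synthetic stand-in for ind.AO: random numbers, not real index values.
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=200), name="AO")

ax = values.hist()                      # returns a regular matplotlib Axes
ax.set_title("AO index (synthetic stand-in)")
ax.set_xlabel("index value")
sns.despine(ax=ax)                      # seaborn helper: drop top and right spines
```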

Statistical plots

But seaborn is not mainly about pretty plots; it's about pretty statistical plots. Regression, for example:

In [14]:
sns.regplot(x='AO', y='NAO', data=ind)
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0xaa66142c>
In [15]:
sns.jointplot(x='AO', y='NAO', data=ind)
Out[15]:
<seaborn.axisgrid.JointGrid at 0xaa5e23ec>
In [16]:
sns.jointplot(x='AO', y='NAO', data=ind, kind='reg', size=7)
Out[16]:
<seaborn.axisgrid.JointGrid at 0xaa540bec>

Or a correlation heat map:

In [17]:
sns.corrplot(ind)
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0xa9d420cc>
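
Later seaborn releases dropped `corrplot`. A comparable picture can be sketched by computing the correlation matrix with pandas and passing it to `sns.heatmap`. The data frame `demo` below is a synthetic stand-in for `ind`, and the color map and limits are my own choices rather than what `corrplot` used.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for `ind`: random numbers, not real climate indices.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(240, 3)), columns=["AO", "NAO", "PNA"])

corr = demo.corr()                       # pairwise Pearson correlations
ax = sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```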

This is what box plots look like with seaborn:

In [18]:
ind['mon'] = ind.index.month
In [19]:
sns.boxplot(ind.AO, groupby=ind.mon, alpha=0.8)
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0xa9d1452c>
In [20]:
sns.boxplot(ind.NAO, groupby=ind.mon)
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0xa994264c>
In [21]:
sns.boxplot(ind.PNA, groupby=ind.mon)
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0xa9654c6c>
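
The `groupby` argument used above belongs to early seaborn versions. In later releases the grouping variable is passed as `x` and the values as `y`. A sketch of the AO box plot in that style, again on a synthetic stand-in `demo` for `ind`:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for `ind`: 20 years of monthly random numbers.
rng = np.random.default_rng(0)
dates = pd.date_range("1950-01-01", periods=240, freq="MS")
demo = pd.DataFrame(rng.normal(size=(240, 3)), index=dates,
                    columns=["AO", "NAO", "PNA"])
demo["mon"] = demo.index.month

# One box per calendar month.
ax = sns.boxplot(x="mon", y="AO", data=demo)
```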

There is also a violin plot:

In [22]:
sns.violinplot(ind.NAO, groupby=ind.mon, inner="points", kernel='cos')
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0xa942660c>
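
The violin plot call changed in the same way in later releases. The sketch below uses the `x`/`y` form with `inner="point"` and leaves the kernel at its default, since I do not know of a direct replacement for `kernel='cos'` in newer versions. `demo` is a synthetic stand-in for `ind`.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for `ind`: 20 years of monthly random numbers.
rng = np.random.default_rng(0)
dates = pd.date_range("1950-01-01", periods=240, freq="MS")
demo = pd.DataFrame({"NAO": rng.normal(size=240)}, index=dates)
demo["mon"] = demo.index.month

# One violin per calendar month, with the individual observations drawn inside.
ax = sns.violinplot(x="mon", y="NAO", data=demo, inner="point")
```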

Kernel density estimate:

In [24]:
sns.kdeplot(ind.AO, shade=True)
sns.kdeplot(ind.NAO, shade=True)
sns.kdeplot(ind.PNA, shade=True)
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0xa91a35cc>

Distribution:

In [26]:
sns.distplot(ind.AO, rug=True)
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0xa9165e6c>

Linear regression for all data together:

In [29]:
sns.lmplot(x='AO', y='NAO', data=ind)
Out[29]:
<seaborn.axisgrid.FacetGrid at 0xa8429d6c>

For individual months (all together):

In [30]:
sns.lmplot(x='AO', y='NAO', data=ind, hue='mon')
Out[30]:
<seaborn.axisgrid.FacetGrid at 0xa83d506c>

For individual months one by one:

In [31]:
sns.lmplot(x='AO', y='NAO', data=ind, hue='mon', col='mon', col_wrap=3, size=4)
Out[31]:
<seaborn.axisgrid.FacetGrid at 0xa8669cac>
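
As with `jointplot`, seaborn 0.9 and later call the panel-size argument of `lmplot` `height` instead of `size`. A sketch of the month-by-month grid under that assumption, on synthetic data (random numbers with an arbitrary built-in AO/NAO relation):

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for `ind`; the 0.6 slope is an arbitrary choice.
rng = np.random.default_rng(0)
dates = pd.date_range("1950-01-01", periods=240, freq="MS")
ao = rng.normal(size=240)
demo = pd.DataFrame({"AO": ao, "NAO": 0.6 * ao + rng.normal(scale=0.8, size=240)},
                    index=dates)
demo["mon"] = demo.index.month

# One regression panel per month, three panels per row.
g = sns.lmplot(x="AO", y="NAO", data=demo, hue="mon",
               col="mon", col_wrap=3, height=4)
```

The returned `FacetGrid` keeps its panels in `g.axes`, so each one can still be adjusted with plain matplotlib calls.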