One of the things that bothers me most at work is data conversion. The world would be a much better place for somebody like me if everybody used the netCDF file format for data distribution. While the situation is slowly changing, and more and more organisations are switching to netCDF, there are still plenty that distribute their data in some crazy formats.
Wouldn't it be nice if somebody created converters for all these formats once and for all, and provided a way to search and access the data directly from Python? Imagine: instead of spending time writing regular expressions for yet another converter, you could watch cat videos on YouTube (I mean, do more actual research). The ulmo project tries to do something like this (AFAIU), providing "clean, simple and fast access to public hydrology and climatology data".
At the moment they certainly cover more hydrology than climatology data, but there is something interesting for me as well: historical measurements from meteorological stations. Below I will give you an example of how to access them, so you get an idea of what ulmo does.
Necessary preparations, as usual:
%pylab inline
Installation is very simple with pip, but if you need the development version, go to the github repository.
!pip install ulmo
Import ulmo and pandas (you already use pandas for data analysis, right? If not, you should; go and read this post first).
import ulmo
import pandas
We are going to work with data from the National Climatic Data Center's Global Historical Climatology Network - Daily (GHCN-Daily) dataset. Say we would like to find a station in our home town. It's a good idea to start by getting information about the stations in the country of interest. Ulmo has a function, ulmo.ncdc.ghcn_daily.get_stations, that obtains information about the stations available in the GHCN dataset and also lets you narrow the search with conditions like country, time span, and specific variables.
The country should be provided as a country code; the list of countries is available here. We will search for stations in Germany (GM) and ask for the result as a pandas DataFrame. Be patient, it might take some time.
st = ulmo.ncdc.ghcn_daily.get_stations(country='GM', as_dataframe=True)
Now we have a nice table with information about all available German meteorological stations:
st.head()
Let's search for Hamburg stations:
st[st.name.str.contains('HAMBURG')]
There are only two stations, and we are interested in data from HAMBURG FUHLSBUETTEL.
Getting the data is also very easy. The only thing you need is the id of your station:
data = ulmo.ncdc.ghcn_daily.get_data('GM000010147', as_dataframe=True)
This function returns a dictionary with the names of the variables as keys and pandas DataFrames with the measurements as values:
data
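To get a feel for that structure without hitting the network, here is a toy stand-in for what get_data returns: a dict mapping GHCN variable codes to DataFrames of daily values (the shape is modeled on the real output; the numbers are made up):

```python
import pandas as pd

# Toy stand-in for ulmo's return value: variable codes -> DataFrames
# of daily measurements (structure assumed; numbers made up)
days = pd.period_range('1950-01-01', periods=3, freq='D')
data = {
    'TMAX': pd.DataFrame({'value': [251, 263, 249]}, index=days),
    'PRCP': pd.DataFrame({'value': [0, 12, 3]}, index=days),
}

print(sorted(data))        # which variables are available
print(data['TMAX'].shape)  # one row per day, one 'value' column
```

Listing the keys first is the quickest way to see which variables a station actually reports before you pick one out.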
Let's get maximum daily temperatures:
tm = data['TMAX'].copy()
tm.head()
Values have to be divided by 10 to get degrees Celsius (GHCN-Daily stores temperatures in tenths of a degree):
tm.value=tm.value/10.0
Now you can plot the data, as you would usually do with pandas:
tm['value']['1980':'2010'].plot()
Or do some statistical analysis:
pandas.rolling_mean(tm.value, window=365).plot()
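Note that pandas.rolling_mean was removed in newer versions of pandas; the modern equivalent is the .rolling() method on the Series itself. A minimal sketch on a synthetic daily series standing in for tm.value:

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for tm.value
idx = pd.date_range('2000-01-01', periods=1000, freq='D')
s = pd.Series(np.sin(np.arange(1000) / 50.0), index=idx)

# Old: pandas.rolling_mean(s, window=365); new:
smooth = s.rolling(window=365).mean()
print(smooth.dropna().size)  # 636 = 1000 - 365 + 1 full windows
```

By default the result is NaN until a full 365-day window is available; pass min_periods if you want earlier values.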
Unfortunately something like
tm.value.resample('A')
will not work, since value has a dtype that pandas can't process in this case. We first have to convert the value column to float:
tm.value = tm.value.astype('float')
Now it's working:
tm['1950':'2012'].value.resample('A').plot()
title('Annual mean daily maximum temperature in Hamburg')
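A word of warning if you run this with a recent pandas: resample('A') alone no longer computes annual means, it returns a Resampler object, and you have to call an aggregation explicitly. A sketch on synthetic data:

```python
import numpy as np
import pandas as pd

# Five years of synthetic daily values
idx = pd.date_range('2000-01-01', '2004-12-31', freq='D')
s = pd.Series(np.arange(len(idx), dtype='float'), index=idx)

# Old pandas defaulted to the mean; now say it explicitly
annual = s.resample('A').mean()
print(len(annual))  # 5 yearly values, 2000 through 2004
```

So the plotting line from above becomes tm['1950':'2012'].value.resample('A').mean().plot() on current pandas.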
Now you know how to easily find a meteo station and get its data with ulmo. If you only want data for a single station, ulmo is maybe not that useful, but when you start collecting statistics from many stations, it becomes very handy. If you use hydrological data you certainly have to give ulmo a try (see the list of supported data sets). Hopefully the authors will continue development and add more data sources in the future :)