Use of Basemap for Amazon river discharge visualization


    Show how to work with river discharge data.
    Also show cople of ways to visualise this data with Basemap. 


    Pandas, Basemap

This notebook was originally created for Marinexplore Earth Data Challenge, in order to show how data for the submission was processed. I think it might be also interesting for those who begin to use python in geoscience, because it demonstrate couple of ways to handle csv and netCDF data and plotting capabilities of the Basemap module. There will be no extensive explanations though, moistly the code.

I want to show a small example of the work flow that is more or less typical during research process. Often you see some interesting feature in your data and want to investigate it in more detail. If you not lucky enough to work with the model data, this would require dealing with multiple data sources, and possibly multiple file formats. Having all data sets in one place in consistent format becomes very handy for this type of applications.

I begin with the visualization of river discharge data, that I did for one of my projects. It is performed in Tableau, and use Dai and Trenberth Global River Flow and Continental Discharge Dataset. This data set is also available on Marinexplore.

In [1]:
from IPython.display import HTML
HTML('<iframe src=\
top925rivers?:embed=y&:display_count=no width=800 height=700></iframe>')

This visualization is interactive, so if you choose several rivers on the map it will show a bar chart with distribution of river discharges.

There is an intresting story related to this visualization. After my first attempt to plot the data it was obvious that something is wrong - Niger had discarge close to the Amazon, and there were also cople of rivers in the top that I never heard of. I check my scripts, they were fine, so the problem was with the original data set. I wrote to the autors and now the error is fixed, and there is red letter warning on the Dai and Trenberth dataset website. Funny thing is, that for four years, that this data set is used nobody notices the error, and it took only one simple visualization to find it. So visualization might serve as simple, but very efficient way to check your data for errors.

I also want to show some simple statistics about my data, so I create another visualization with only 20 biggest rivers:

In [3]:
HTML('<iframe src=\
views/rivers_0/top20rivers width=800 height=700></iframe>')

Look at Amazon. I knew that it's a biggest river in the world, but I never though it's domination is so overwhelming. This huge river discharge should be visible somehow in salinity field of the ocean. In order to test this I decide to look at satellite salinity data.

But first we would need to plot our river discharge data in pyhton. Text file with anual mean river discharge values from Dai and Trenberth have been converted in .csv format with vertical bar as separator and is available here. We are going to open it with pandas, and plot with basemap.

In [2]:
%pylab inline
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.kernel.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.
In [4]:
import pandas as pd
In [5]:
df = pd.read_csv('/home/magik/NOTEBOOKS/marine/rivers2.csv', sep='|', na_values=[' Null', '  Null'])

Here is our data in pandas DataFrame:

In [6]:
No m2s_ra lonm latm area(km2) Vol nyr yrb yre elev(m) CT CN River_Name OCN Station_Name
0 1 12462 -51.75 -0.75 4618750 5389.537 79.0 1928 2006 37 BR SA Amazon ATL Obidos, Bra
1 2 10291 12.75 -5.75 3475000 1270.203 97.1 1903 2000 270 CD AF Congo ATL Kinshasa, C
2 3 11474 -61.75 9.25 836000 980.088 75.8 1923 1999 -9999 VE SA Orinoco ATL Pte Angostu
3 4 10374 120.75 32.25 1705383 905.141 99.6 1900 2000 19 CN AS Changjiang PAC Datong, Chi
4 5 10245 89.75 24.25 554542 670.943 44.3 1956 2000 19 BD AS Brahmaputra IND Bahadurabad

We need only several values to work with:

In [7]:
df_data = df[[' lonm', 'latm',  'Vol', 'm2s_ra']]

We don't need missing values:

In [8]:
df_data = df_data.dropna()

And convert values to floats:

In [9]:
df_data['Vol'] = df_data['Vol'].astype('float32')
df_data['lonm'] = df_data[' lonm'].astype('float32')
df_data['latm'] = df_data['latm'].astype('float32')
df_data['m2s_ra'] = df_data['m2s_ra'].astype('float32')

Now plot the map:

In [7]:
from mpl_toolkits.basemap import Basemap
import as cm
In [11]:
m = Basemap(projection='robin',lon_0=0,resolution='c')
x, y = m(df_data['lonm'][0:50], df_data['latm'][0:50])
In [16]:

<matplotlib.collections.PathCollection at 0xf5e822c>

This is discharge of first 50 rivers from Dai and Trenberth data set. Let's add some salinity.

In order to do so I first generated subset of the AQUARIUS sea surface salinity data. The data set is very short, so I took only one year (2012). I also need only annual mean, and had to generate it by myself with use of cdo. At this point I encountered some probelms. The output is in netCDF4, that is still not very abundant, and soft for handling netCDF files in standard linux distributions (like cdo, or is still compiled without netCDF4 support. This is a bit annoying, and it would be nice to be able to choose between netCDF3 and 4.

The file with annual mean sea surface salinity for 2012 is available from here and it's nerCDF3. I use Nio to open it, but it should be very similar to

In [3]:
import Nio
In [4]:
ff = Nio.open_file('/home/magik/NOTEBOOKS/marine/')
sal = ff.variables['sea_water.salinity'][:]
lat = ff.variables['lat'][:]
lon = ff.variables['lon'][:]
In [5]:
lon, lat = np.meshgrid(lon, lat)
In [20]:
import as cm
In [21]:
m2 = Basemap(projection='robin',lon_0=0,resolution='c')
x2, y2 = m2(lon,lat)
In [23]:

cs = m.contourf(x2,y2,sal[0,0,:,:], levels=np.arange(32,37.7,0.1), cmap=cm.coolwarm,  vmin=32, vmax=37.5 )
m2.scatter(x,y,s=np.sqrt(df_data['Vol'][0:50]*(df_data['m2s_ra'][0:50]/10000))*10,c='gray',marker='o',zorder=4, cmap=cm.Paired,alpha=0.5)
cbar = m2.colorbar(cs,location='bottom',pad="5%")

One can see that there is some reduced salinity, however it is located not to the east of the Amazon river mouth, but more to the north of it. Let's have a closer look:

In [24]:
m3 = Basemap(projection='merc',llcrnrlat=-20,urcrnrlat=20,\
x3, y3 = m3(lon,lat)
In [25]:

cs = m3.contourf(x3,y3,sal[0,0,:,:], levels=np.arange(32,37.5,0.1), cmap=cm.coolwarm,  vmin=32, vmax=38.5 )
cbar = m3.colorbar(cs,location='bottom',pad="5%")

The fresh water plume is obviously going to the north, and then at some point partly turn to the east. There is also some signature of big rivers from the western coast of the African continent. However we will concentrate on the Amazon river plume. What is causing the shape of it's surface distribution? Maybe winds?

In order to test this I create data set with NCDC sea winds for 2012, and calculate annual mean (available from here).

In [5]:
fw = Nio.open_file('/home/magik/NOTEBOOKS/marine/', 'r')

In order to be able to plot vectors with Basemap we have to flip our data (lat should be monotonically increasing):

In [11]:
latw = np.flipud(fw.variables['lat'][:])
lonw = fw.variables['lon'][:]
uw = np.flipud(fw.variables['wind.vector_u'][0,0,:,:])
vw = np.flipud(fw.variables['wind.vector_v'][0,0,:,:])
In [14]:
m4 = Basemap(projection='merc',llcrnrlat=-10,urcrnrlat=20,\
x4, y4 = m4(lon,lat)
In [15]:
uwproj,vwproj,xxw,yyw = \
In [16]:

cs = m4.contourf(x4,y4,sal[0,0,:,:], levels=np.arange(32,37.5,0.1), cmap=cm.coolwarm,  vmin=32, vmax=38.5 )
Q = m4.quiver(xxw,yyw,uwproj,vwproj,scale=200, width=0.002)
qk = plt.quiverkey(Q, 0.1, 0.1, 10, '10 m/s', labelpos='W')
#colorbar(orientation = 'horizontal')
cbar = m4.colorbar(cs,location='bottom',pad="5%")

The mean wind is classical, like from the text books, and can't lead to such distribution. This was strange idea from the beginning, but sometimes it worth to test even strange ideas :)

So, what about ocean currents, how do they look like in this area?

Data set that we will use is here and time mean for 2012 is located here.

In [8]:
fc = Nio.open_file('/home/magik/NOTEBOOKS/marine/', 'r')
In [10]:
lats = np.flipud(fc.variables['lat'][:])
lons = fc.variables['lon'][:]
u = np.flipud(fc.variables['sea_water.current.vector_u'][0,0,:,:])
v = np.flipud(fc.variables['sea_water.current.vector_v'][0,0,:,:])
In [11]:
lonuv, latuv = np.meshgrid(lons,lats)
In [12]:
m5 = Basemap(projection='merc',llcrnrlat=-10,urcrnrlat=20,\
x5, y5 = m5(lon,lat)
In [13]:
uproj,vproj,xx,yy = \
In [15]:

cs = m5.contourf(x5,y5,sal[0,0,:,:], levels=np.arange(32,37.5,0.1), cmap=cm.coolwarm,  vmin=32, vmax=38.5 )
Q = m5.quiver(xx,yy,uproj,vproj,scale=10, width=0.0012)
qk = plt.quiverkey(Q, 0.1, 0.1, 0.1, '0.1 m/s', labelpos='W')
#colorbar(orientation = 'horizontal')
cbar = m5.colorbar(cs,location='bottom',pad="5%")

The configuration of surface ocean currents match distribution of Amazon fresh water plume very well. Now one can look at the monthly and daily data in order to reveal more details about evolution of the plume.

This is a kind of analysis that can be done very quickly in case of easy availability of different data sets that Marinexplore provide.

In [ ]:

Comments !