Task:
Show how to work with river discharge data.
Also show cople of ways to visualise this data with Basemap.
Solution
Pandas, Basemap
This notebook was originally created for Marinexplore Earth Data Challenge, in order to show how data for the submission was processed. I think it might be also interesting for those who begin to use python in geoscience, because it demonstrate couple of ways to handle csv and netCDF data and plotting capabilities of the Basemap module. There will be no extensive explanations though, moistly the code.
I want to show a small example of the work flow that is more or less typical during research process. Often you see some interesting feature in your data and want to investigate it in more detail. If you not lucky enough to work with the model data, this would require dealing with multiple data sources, and possibly multiple file formats. Having all data sets in one place in consistent format becomes very handy for this type of applications.
I begin with the visualization of river discharge data, that I did for one of my projects. It is performed in Tableau, and use Dai and Trenberth Global River Flow and Continental Discharge Dataset. This data set is also available on Marinexplore.
from IPython.display import HTML
HTML('')
This visualization is interactive, so if you choose several rivers on the map it will show a bar chart with distribution of river discharges.
There is an intresting story related to this visualization. After my first attempt to plot the data it was obvious that something is wrong - Niger had discarge close to the Amazon, and there were also cople of rivers in the top that I never heard of. I check my scripts, they were fine, so the problem was with the original data set. I wrote to the autors and now the error is fixed, and there is red letter warning on the Dai and Trenberth dataset website. Funny thing is, that for four years, that this data set is used nobody notices the error, and it took only one simple visualization to find it. So visualization might serve as simple, but very efficient way to check your data for errors.
I also want to show some simple statistics about my data, so I create another visualization with only 20 biggest rivers:
HTML('')
Look at Amazon. I knew that it's a biggest river in the world, but I never though it's domination is so overwhelming. This huge river discharge should be visible somehow in salinity field of the ocean. In order to test this I decide to look at satellite salinity data.
%pylab inline
import pandas as pd
df = pd.read_csv('/home/magik/NOTEBOOKS/marine/rivers2.csv', sep='|', na_values=[' Null', ' Null'])
Here is our data in pandas DataFrame:
df.head()
We need only several values to work with:
df_data = df[[' lonm', 'latm', 'Vol', 'm2s_ra']]
We don't need missing values:
df_data = df_data.dropna()
And convert values to floats:
df_data['Vol'] = df_data['Vol'].astype('float32')
df_data['lonm'] = df_data[' lonm'].astype('float32')
df_data['latm'] = df_data['latm'].astype('float32')
df_data['m2s_ra'] = df_data['m2s_ra'].astype('float32')
Now plot the map:
from mpl_toolkits.basemap import Basemap
import matplotlib.cm as cm
m = Basemap(projection='robin',lon_0=0,resolution='c')
x, y = m(df_data['lonm'][0:50], df_data['latm'][0:50])
figure(figsize=(10,10))
m.drawcoastlines(linewidth=0.25)
m.drawcountries(linewidth=0.25)
m.fillcontinents(color='#eeefff',lake_color='white')
m.drawmapboundary(fill_color='white')
m.drawmeridians(np.arange(0,360,30))
m.drawparallels(np.arange(-90,90,30))
m.scatter(x,y,s=np.sqrt(df_data['Vol'][0:50]*(df_data['m2s_ra'][0:50]/10000))*10,c='gray',marker='o',zorder=4,alpha=0.5)
This is discharge of first 50 rivers from Dai and Trenberth data set. Let's add some salinity.
In order to do so I first generated subset of the AQUARIUS sea surface salinity data. The data set is very short, so I took only one year (2012). I also need only annual mean, and had to generate it by myself with use of cdo. At this point I encountered some probelms. The output is in netCDF4, that is still not very abundant, and soft for handling netCDF files in standard linux distributions (like cdo, or scipy.io.netcdf) is still compiled without netCDF4 support. This is a bit annoying, and it would be nice to be able to choose between netCDF3 and 4.
The file with annual mean sea surface salinity for 2012 is available from here and it's nerCDF3. I use Nio to open it, but it should be very similar to scipy.io.netcdf.
import Nio
ff = Nio.open_file('/home/magik/NOTEBOOKS/marine/AQUARIUS.L3.7DAY_tm.nc')
sal = ff.variables['sea_water.salinity'][:]
lat = ff.variables['lat'][:]
lon = ff.variables['lon'][:]
lon, lat = np.meshgrid(lon, lat)
import matplotlib.cm as cm
m2 = Basemap(projection='robin',lon_0=0,resolution='c')
x2, y2 = m2(lon,lat)
figure(figsize=(10,10))
m2.drawcoastlines(linewidth=0.25)
m2.drawcountries(linewidth=0.25)
m2.fillcontinents(color='#eeefff',lake_color='white')
m2.drawmapboundary(fill_color='white')
m2.drawmeridians(np.arange(0,360,30))
m2.drawparallels(np.arange(-90,90,30))
cs = m.contourf(x2,y2,sal[0,0,:,:], levels=np.arange(32,37.7,0.1), cmap=cm.coolwarm, vmin=32, vmax=37.5 )
m2.scatter(x,y,s=np.sqrt(df_data['Vol'][0:50]*(df_data['m2s_ra'][0:50]/10000))*10,c='gray',marker='o',zorder=4, cmap=cm.Paired,alpha=0.5)
cbar = m2.colorbar(cs,location='bottom',pad="5%")
cbar.set_label('psu')
One can see that there is some reduced salinity, however it is located not to the east of the Amazon river mouth, but more to the north of it. Let's have a closer look:
m3 = Basemap(projection='merc',llcrnrlat=-20,urcrnrlat=20,\
llcrnrlon=-80,urcrnrlon=30,lat_ts=20,resolution='c')
x3, y3 = m3(lon,lat)
figure(figsize=(10,10))
m3.drawcoastlines(linewidth=0.25)
m3.drawcountries(linewidth=0.25)
m3.fillcontinents(color='gray',lake_color='aqua')
m3.drawmapboundary(fill_color='white')
m3.drawmeridians(np.arange(0,360,30))
m3.drawparallels(np.arange(-90,90,30))
cs = m3.contourf(x3,y3,sal[0,0,:,:], levels=np.arange(32,37.5,0.1), cmap=cm.coolwarm, vmin=32, vmax=38.5 )
cbar = m3.colorbar(cs,location='bottom',pad="5%")
cbar.set_label('psu')
The fresh water plume is obviously going to the north, and then at some point partly turn to the east. There is also some signature of big rivers from the western coast of the African continent. However we will concentrate on the Amazon river plume. What is causing the shape of it's surface distribution? Maybe winds?
In order to test this I create data set with NCDC sea winds for 2012, and calculate annual mean (available from here).
fw = Nio.open_file('/home/magik/NOTEBOOKS/marine/NOAA.NCDC.SEAWINDS.RT.DAILY_tm.nc', 'r')
In order to be able to plot vectors with Basemap we have to flip our data (lat should be monotonically increasing):
latw = np.flipud(fw.variables['lat'][:])
lonw = fw.variables['lon'][:]
uw = np.flipud(fw.variables['wind.vector_u'][0,0,:,:])
vw = np.flipud(fw.variables['wind.vector_v'][0,0,:,:])
m4 = Basemap(projection='merc',llcrnrlat=-10,urcrnrlat=20,\
llcrnrlon=-80,urcrnrlon=-30,lat_ts=20,resolution='i')
x4, y4 = m4(lon,lat)
uwproj,vwproj,xxw,yyw = \
m4.transform_vector(uw,vw,lonw,latw,20,20,returnxy=True,masked=True)
figure(figsize=(10,10))
m4.drawcoastlines(linewidth=0.25)
m4.drawcountries(linewidth=0.25)
m4.fillcontinents(color='gray',lake_color='aqua')
m4.drawmapboundary(fill_color='white')
m4.drawmeridians(np.arange(0,360,30))
m4.drawparallels(np.arange(-90,90,30))
cs = m4.contourf(x4,y4,sal[0,0,:,:], levels=np.arange(32,37.5,0.1), cmap=cm.coolwarm, vmin=32, vmax=38.5 )
Q = m4.quiver(xxw,yyw,uwproj,vwproj,scale=200, width=0.002)
qk = plt.quiverkey(Q, 0.1, 0.1, 10, '10 m/s', labelpos='W')
#colorbar(orientation = 'horizontal')
cbar = m4.colorbar(cs,location='bottom',pad="5%")
cbar.set_label('psu')
The mean wind is classical, like from the text books, and can't lead to such distribution. This was strange idea from the beginning, but sometimes it worth to test even strange ideas :)
So, what about ocean currents, how do they look like in this area?
fc = Nio.open_file('/home/magik/NOTEBOOKS/marine/NOAA.OSCAR.CURRENTS_tm.nc', 'r')
lats = np.flipud(fc.variables['lat'][:])
lons = fc.variables['lon'][:]
u = np.flipud(fc.variables['sea_water.current.vector_u'][0,0,:,:])
v = np.flipud(fc.variables['sea_water.current.vector_v'][0,0,:,:])
lonuv, latuv = np.meshgrid(lons,lats)
m5 = Basemap(projection='merc',llcrnrlat=-10,urcrnrlat=20,\
llcrnrlon=-80,urcrnrlon=-30,lat_ts=20,resolution='i')
x5, y5 = m5(lon,lat)
uproj,vproj,xx,yy = \
m5.transform_vector(u,v,lons,lats,80,80,returnxy=True,masked=True)
figure(figsize=(10,10))
m5.drawcoastlines(linewidth=0.25)
m5.drawcountries(linewidth=0.25)
m5.fillcontinents(color='gray',lake_color='aqua')
m5.drawmapboundary(fill_color='white')
m5.drawmeridians(np.arange(0,360,30))
m5.drawparallels(np.arange(-90,90,30))
cs = m5.contourf(x5,y5,sal[0,0,:,:], levels=np.arange(32,37.5,0.1), cmap=cm.coolwarm, vmin=32, vmax=38.5 )
Q = m5.quiver(xx,yy,uproj,vproj,scale=10, width=0.0012)
qk = plt.quiverkey(Q, 0.1, 0.1, 0.1, '0.1 m/s', labelpos='W')
#colorbar(orientation = 'horizontal')
cbar = m5.colorbar(cs,location='bottom',pad="5%")
cbar.set_label('psu')
The configuration of surface ocean currents match distribution of Amazon fresh water plume very well. Now one can look at the monthly and daily data in order to reveal more details about evolution of the plume.
This is a kind of analysis that can be done very quickly in case of easy availability of different data sets that Marinexplore provide.
Comments !