This is part of Python for Geosciences notes.
================
Binary data¶
Open binary¶
!wget ftp://sidads.colorado.edu/pub/DATASETS/nsidc0051_gsfc_nasateam_seaice/final-gsfc/north/monthly/nt_200709_f17_v01_n.bin
Create file id:
ice = fromfile('nt_200709_f17_v01_n.bin', dtype='uint8')
We use uint8 data type. List of numpy data types
The file format consists of a 300-byte descriptive header followed by a two-dimensional array.
ice = ice[300:]
Reshape
ice = ice.reshape(448,304)
Simple visualisation of array with imshow (Matplotlib function):
imshow(ice)
colorbar()
To convert to the fractional parameter range of 0.0 to 1.0, divide the scaled data in the file by 250.
ice = ice/250.
imshow(ice)
colorbar()
Let's mask all land and missing values:
ice_masked = ma.masked_greater(ice, 1.0)
imshow(ice_masked)
colorbar()
Masking in this case is similar to using NaN in Matlab. More about NumPy masked arrays
Save binary¶
fid = open('My_ice_2007.bin', 'wb')
ice.tofile(fid)
fid.close()
In order to work with other data formats we need to use one of the SciPy submodules:
SciPy¶
General purpose scientific library (that consist of bunch of sublibraries) and builds on NumPy arrays.
- Special functions (scipy.special)
- Integration (scipy.integrate)
- Optimization (scipy.optimize)
- Interpolation (scipy.interpolate)
- Fourier Transforms (scipy.fftpack)
- Signal Processing (scipy.signal)
- Linear Algebra (scipy.linalg)
- Sparse Eigenvalue Problems (scipy.sparse)
- Statistics (scipy.stats)
- Multi-dimensional image processing (scipy.ndimage)
- File IO (scipy.io)
We are going to use only scipy.io library.
scipy.io¶
Open .mat files¶
First we have to load function that works with Matlab files:
from scipy.io import loadmat
We are going to download Polar science center Hydrographic Climatology (PHC) for January in Matlab format.
!wget https://www.dropbox.com/s/0kuzvz03gw6d393/PHC_jan.mat
Open file:
all_variables = loadmat('PHC_jan.mat')
We can look at the names of variables stored in the file:
all_variables.keys()
We need only PTEMP1 (3d potential temperature).
temp = numpy.array(all_variables['PTEMP1'])
Check variable's shape:
temp.shape
Show surface level:
imshow(temp[0,:,:])
colorbar()
Open netCDF files¶
Import nessesary function:
from scipy.io import netcdf
I am going to download NCEP reanalysis data. Surface 4 daily air temperature for 2012.
!wget ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis/surface/air.sig995.2012.nc
#Alternative for the times of US goverment shutdowns:
#!wget http://database.rish.kyoto-u.ac.jp/arch/ncep/data/ncep.reanalysis/surface/air.sig995.2012.nc
Create file id:
fnc = netcdf.netcdf_file('air.sig995.2012.nc')
It's not really file id, it's netcdf_file object, that have some methods and attributes:
fnc.description
fnc.history
list variables
fnc.variables
Access information about variables
air = fnc.variables['air']
This time we create netcdf_variable object, that contain among other things attributes of the netCDF variable as well as data themselves.
air.actual_range
air.long_name
air.units
print(air.add_offset)
print(air.scale_factor)
air.shape
We can access the data by simply using array syntax. Here we show first time step of our data set:
imshow(air[0,:,:])
colorbar()
Data are packed. In order to unpack we have to use scale_factor and add_offset values from attributes of the netCDF air variable. We also convert to $^{\circ}$C:
air_c = ((air[:] * air.scale_factor) + air.add_offset) - 273.15
Check the values of our new variable:
imshow(air_c[0,:,:])
colorbar()
Save netCDF file¶
Minimalistic variant :)
!rm test_netcdf.nc
fw = netcdf.netcdf_file('test_netcdf.nc', 'w')
fw.createDimension('t', 1464)
fw.createDimension('y', 73)
fw.createDimension('x', 144)
air_var = fw.createVariable( 'air','float32', ('t', 'y', 'x'))
air_var[:] = air_c[:]
fw.close()
More descriptive variant:
!rm test_netcdf.nc
fw = netcdf.netcdf_file('test_netcdf.nc', 'w')
fw.createDimension('TIME', 1464)
fw.createDimension('LATITUDE', 73)
fw.createDimension('LONGITUDE', 144)
time = fw.createVariable('TIME', 'f', ('TIME',))
time[:] = fnc.variables['time'][:]
time.units = 'hours since 1-1-1 00:00:0.0'
lat = fw.createVariable('LATITUDE', 'f', ('LATITUDE',))
lat[:] = fnc.variables['lat'][:]
lon = fw.createVariable('LONGITUDE', 'f', ('LONGITUDE',))
lon[:] = fnc.variables['lon'][:]
ha = fw.createVariable('New_air','f', ('TIME', 'LATITUDE', 'LONGITUDE'))
ha[:] = air_c[:]
ha.missing_value = -9999.
fw.close()
Comments !