Other articles


  1. Calculations with arrays bigger than your memory (dask arrays)

    Task

    Process numpy arrays in parallel
    
    

    Solution

    dask

    Geophysical models run at ever higher resolutions, producing more and more data. However, numpy arrays and pandas data frames only work with data that fit into memory. For many of us this means that before the real analysis we have to subsample or aggregate the initial data with some heavy-lifting tool (like cdo), and only then switch to the convenience and beauty of python. These times may soon come to an end with the introduction of dask, a library that helps to parallelize computations on big chunks of data. It allows you to analyze data that do not (or barely) fit into your computer's memory, as well as to use the multiprocessing capabilities of your machine.
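    The idea dask automates can be sketched in plain numpy (sizes here are made up): split the array into chunks, reduce each chunk independently, and combine the small partial results — dask does the same thing lazily, in parallel, and without loading everything at once.

```python
import numpy as np

# stand-in for an array too big to process in one go
data = np.arange(1_000_000, dtype=np.float64)

# reduce chunk by chunk, keeping only small partial results around;
# dask.array schedules exactly this kind of per-chunk work in parallel
chunk = 100_000
partial_sums = [data[i:i + chunk].sum() for i in range(0, data.size, chunk)]
total = sum(partial_sums)

print(total == data.sum())  # chunked result matches the direct reduction
```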

    read more


  2. Select time ranges in multidimensional arrays with pandas

    Task

    Select specific time ranges from multidimensional arrays
    
    

    Solution

    Pandas periods

    I like pandas for its very easy time handling, and I would like to use a similar approach when working with multidimensional arrays, for example from netCDF files. There are already some efforts to do this. However, I don't need anything complicated, just to select some months or years from a time period. For this I can use pandas itself and benefit from its great time indexing. Below I show a small example of how to do this.
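    The trick can be roughly sketched like this (array and dates below are made up): build a pandas DatetimeIndex for the time axis and turn it into a boolean mask over the plain numpy array.

```python
import numpy as np
import pandas as pd

# hypothetical 3-D field with dimensions (time, lat, lon)
data = np.random.rand(24, 3, 4)
time = pd.date_range("2000-01-01", periods=24, freq="MS")  # monthly stamps

# the pandas index gives us calendar attributes for free;
# a boolean mask selects winter months along the time axis
winter = data[time.month.isin([12, 1, 2])]
print(winter.shape)  # (6, 3, 4): two DJF seasons' worth of months
```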

    Necessary imports (everything can be installed from Anaconda)

    read more


  3. How to make your python code run faster

    Task:

     Make your python scripts run faster
    
    

    Solution:

    multiprocessing, cython, numba

    One of the counterarguments that you constantly hear about using python is that it is slow. This is somewhat true in many cases, although most of the tools that scientists mainly use, like numpy, scipy and pandas, have big chunks written in C, so they are very fast. For most geoscientific applications the main advice is to use vectorisation whenever possible and avoid loops. However, sometimes loops are unavoidable, and then python's speed can get on your nerves. Fortunately there are several easy ways to make your python loops faster.
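    A toy illustration of the vectorisation advice (the loop below is also exactly the shape of code that numba's @jit decorator or cython can compile to near-C speed):

```python
import numpy as np

def sum_of_squares_loop(a):
    """Plain python loop: slow, but the kind of code that
    numba or cython can compile without rewriting it."""
    total = 0.0
    for x in a:
        total += x * x
    return total

a = np.random.rand(100_000)
vectorised = float(np.dot(a, a))  # one call, runs in compiled C

# same answer, but the vectorised version is orders of magnitude faster
assert abs(sum_of_squares_loop(a) - vectorised) < 1e-6
```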

    read more


  4. Time series analysis with pandas. Part 2

    Task:

    continue interactive analysis of time series (AO, NAO indexes)
    
    

    Module:

    pandas

    In the previous part we looked at very basic ways of working with pandas. Here I am going to introduce a couple of more advanced tricks. We will use pandas' very powerful IO capabilities to create time series directly from a text file, create seasonal means with resample and multi-year monthly means with groupby. At the end I show how new functionality from the upcoming IPython 2.0 can be used to explore your data more efficiently with a sort of simple GUI (the interact function). There might be easier or better ways to do some of the things discussed here, and I will be happy to hear about them in the comments :)
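    The two aggregations mentioned can be sketched on synthetic data (the post itself builds the series from the AO/NAO text files):

```python
import numpy as np
import pandas as pd

# synthetic monthly series standing in for an AO/NAO index
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
ts = pd.Series(np.random.randn(48), index=idx)

# seasonal means: quarters anchored on December give DJF, MAM, JJA, SON
seasonal = ts.resample("QS-DEC").mean()

# multi-year monthly means: group all Januaries together, etc.
monthly_clim = ts.groupby(ts.index.month).mean()
print(len(monthly_clim))  # 12, one value per calendar month
```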

    read more


  5. SMOS sea ice thickness

    Task:

    Access SMOS Sea Ice Thickness data
    
    

    Solution:

    pydap

    Sea ice thickness is one of the most important environmental variables in the Arctic, but unfortunately also one of the hardest to measure. Unlike sea ice concentration, which satellites have measured operationally for more than three decades now, only recently have we begun to obtain limited satellite sea ice thickness information from missions like ICESat, Cryosat-2 and SMOS. The latter is not specifically dedicated to cryospheric applications, but it turns out that its measurements can be used to retrieve data about thin sea ice.

    Here you can learn more about the SMOS sea ice thickness project.

    read more


  6. Northern Cryosphere Metrics rendered with Colors

    Notebook file

    Author's twitter

    Arctic sea ice and snow cover are two of the most prominent features of the northern hemisphere cryosphere and can be seen with the naked eye from the Moon. Measurements of sea ice area started around 1979, and of snow cover a bit earlier. Both show a strong seasonal signal, and sea ice area also a decline over the three decades. Even stronger is the decline calculated by a sea ice model run by the Polar Science Center, Washington: PIOMAS outputs daily sea ice volume for the same period, allowing a good comparison of the three data sets. Instead of the usual line charts, this notebook translates the daily data into a color range. Changes within a dataset become far better visible, while comparability is maintained.
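    The reshaping behind such a view is simple: stack the daily series into a (year, day-of-year) array and hand it to something like matplotlib's imshow or pcolormesh. A sketch with random data (leap days ignored for brevity):

```python
import numpy as np

# synthetic daily record covering 30 non-leap years
daily = np.random.rand(30 * 365)

# one row per year, one column per day of year; rendering this array
# as an image gives the color-stripe view instead of 30 line plots
stripes = daily.reshape(30, 365)
print(stripes.shape)
```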

    read more


  7. Interpolation between grids with pyresample

    Task: Interpolate data from regular to curvilinear grid

    Solution: pyresample

    Following two excellent contributions on interpolation between grids by Nikolay Koldunov and Oleksandr Huziy, I would like to introduce a solution using the pyresample package. I feel it is timely, since pyresample encapsulates the strategy presented by Oleksandr (which I totally support) in fewer function calls. There might also be a speed-up factor to consider for big datasets, since pyresample comes with its own implementation of KD-trees, which was measured to be faster than scipy.spatial.cKDTree.

    The same data as in Nikolay's and Oleksandr's posts will be used for ease of comparison.

    Some necessary imports:

    read more


  8. Interpolation between grids with cKDTree

    Task: Interpolate data from regular to curvilinear grid

    Solution: scipy.spatial.cKDTree

    The problem of interpolation between various grids and projections is one that Earth and atmospheric scientists have to deal with sooner or later, whether for data analysis or for model validation. And when this happens, it is very useful to know convenient, suitable, fast algorithms and approaches. Following the post by Nikolay Koldunov about this problem, where he proposes to deal with it using the interp function from the basemap package, here I present the approach using the cKDTree class from the scipy.spatial package. Basically, this object builds an index over a k-dimensional coordinate space upon creation, in order to provide very efficient querying when needed.
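    In essence the approach is: flatten both grids to point lists, build the index on the source points, query it with the target points, and remap by fancy indexing. A minimal sketch in plane coordinates (for real lon/lat grids one usually converts to 3-D Cartesian coordinates first):

```python
import numpy as np
from scipy.spatial import cKDTree

# made-up regular source grid, flattened to an (N, 2) point list
xs, ys = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
src_points = np.column_stack([xs.ravel(), ys.ravel()])
src_values = np.sin(xs).ravel()

# build the index once; subsequent queries are very cheap
tree = cKDTree(src_points)

# made-up curvilinear target grid, queried point by point
xt, yt = np.meshgrid(np.linspace(1, 9, 20), np.linspace(1, 9, 20))
dist, idx = tree.query(np.column_stack([xt.ravel(), yt.ravel()]))
remapped = src_values[idx].reshape(xt.shape)  # nearest-neighbour remap
print(remapped.shape)
```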

    read more


  9. Near realtime data from Arctic ice mass balance buoys

    Notebook file

    Author's twitter

    Arctic sea ice thickness is very important information and tells a lot about the state of the floating ice. Unfortunately, direct measurements are rare, and an area-wide assessment from the ground is too costly. Satellites can fill the gap by using freeboard as a proxy, but there are still some obstacles, e.g. determining snow cover. Mass balance buoys offer a near-realtime view at a few sites and show the characteristics of sea ice melting in summer and freezing in winter. This notebook accesses the latest available data and plots daily thickness, temperature, snow cover and drift of the buoys.

    read more


  10. Interpolation between grids with Basemap

    Task:

    Interpolate data from regular to curvilinear grid
    
    

    Solution:

    Basemap.interp function

    Unfortunately, geophysical data are distributed on a large variety of grids, and from time to time we have to compare our variables to each other. Often plotting a simple map is enough, but if you want to go beyond qualitative comparison, you have to interpolate data from one grid to another. One of the easiest ways to do this is to use the basemap.interp function from the Matplotlib Basemap library. Here I will show how to prepare your data and how to perform the interpolation.
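    basemap.interp itself needs Basemap installed; the same regular-to-curvilinear bilinear remap can be sketched with scipy's RegularGridInterpolator as a stand-in (grids below are made up):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# made-up regular source grid and a zonally symmetric field on it
lon = np.linspace(-180, 180, 73)
lat = np.linspace(-90, 90, 37)
field = np.cos(np.radians(lat))[:, None] * np.ones(lon.size)

bilinear = RegularGridInterpolator((lat, lon), field)  # linear by default

# made-up curvilinear target grid: each point has its own lon/lat pair
lat_t = np.array([[10.0, 20.0], [30.0, 40.0]])
lon_t = np.array([[5.0, 15.0], [25.0, 35.0]])
out = bilinear(np.column_stack([lat_t.ravel(), lon_t.ravel()])).reshape(lat_t.shape)
print(out.shape)
```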

    Some necessary imports:

    read more


  11. Analyzing whale tracks

    Dr. Roberto De Almeida

    Notebook file

    In this IPython notebook we use ocean data to look at the trajectory of a migrating whale. When traveling on the surface of the Earth one cannot keep a constant heading (an angle with respect to North) to travel the shortest route from point $A$ to $B$. Instead, the heading must be constantly readjusted so that the arc of the trajectory corresponds to the intersection of the globe with a plane that passes through the center of the Earth:

    This is called great-circle navigation, and it is used by airplanes and ships (wherever possible).
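    The changing heading can be made concrete with the standard initial-bearing formula (a sketch; the function name is mine):

```python
import math

def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle heading, in degrees clockwise from North,
    at the start point (lat1, lon1) toward (lat2, lon2)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360

# along the route this value keeps changing, which is why the
# heading must be readjusted continuously
print(initial_bearing(0, 0, 0, 90))  # due East along the equator -> 90.0
```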

    There are also other factors that define the shortest route in time.

    read more


  12. Climatology data access with ulmo

    Task:

    easy access to climatology data 
    
    

    Solution:

    ulmo
    
    

    Notebook file

    One of the things that bothers me most at work is data conversion. The world would be a much better place for somebody like me if everybody used the netCDF file format for data distribution. While the situation is slowly changing, and more and more organisations switch to netCDF, there are still plenty of those who distribute their data in some crazy formats.

    Wouldn't it be nice if somebody once and for all created converters for all these formats and provided a way to search and access data directly from python? Imagine: instead of spending time writing regular expressions for yet another converter, you could watch cat videos on youtube.

    read more


  13. Time series analysis with pandas

    Task:

    analysis of several time series data (AO, NAO)
    
    

    Modules:

    pandas
    
    

    Notebook file

    Here I am going to show just some basic pandas stuff for time series analysis, as I think it is the most interesting topic for Earth scientists. If you find this small tutorial useful, I encourage you to watch this video, where Wes McKinney gives an extensive introduction to time series data analysis with pandas.

    On the official website you can find an explanation of what problems pandas solves in general, but I can tell you what problem pandas solves for me: it makes analysis and visualisation of 1D data, especially time series, MUCH faster. Before pandas, working with time series in python was a pain for me; now it's fun. Ease of use stimulates in-depth exploration of the data: why wouldn't you do some additional analysis if it's just one line of code? I hope you will also find this great tool helpful and useful. So, let's begin.
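    The flavour of those one-liners, on synthetic data standing in for the AO series:

```python
import numpy as np
import pandas as pd

# synthetic daily series standing in for the AO index
dates = pd.date_range("1950-01-01", periods=1000, freq="D")
ao = pd.Series(np.random.randn(1000), index=dates)

march_mean = ao.loc["1950-03"].mean()  # one month, selected by label
smooth = ao.rolling(30).mean()         # 30-day running mean, one line
print(len(smooth) == len(ao))
```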

    read more

