Process NetCDF4 – Network Common Data Form Files in the Python Programming Language

NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. A netCDF file stores multidimensional scientific variables such as temperature, humidity, pressure, wind speed, and wind direction. Each variable can be displayed through a dimension (such as time), for example by making a layer or table view from the netCDF file.
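As a quick illustration of this data model, here is a minimal sketch using the netCDF4 Python package (assumed to be installed via pip or conda) that creates a file with one dimension and one variable. The file name example.nc is just a placeholder:

from netCDF4 import Dataset
import numpy as np

# Create a netCDF file with a single unlimited "time" dimension and a
# temperature variable measured along it.
ds = Dataset("example.nc", "w", format="NETCDF4")
ds.createDimension("time", None)            # unlimited dimension
temp = ds.createVariable("temperature", "f4", ("time",))
temp.units = "K"                            # metadata travels with the data
temp[:] = np.array([280.1, 281.5, 279.8])
ds.close()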

The Unidata Program Center supports and maintains netCDF programming interfaces for C, C++, Java, and Fortran. Programming interfaces are also available for Python, IDL, MATLAB, R, Ruby, and Perl.

Data in netCDF format is:

  • Self-Describing. A netCDF file includes information about the data it contains (the sketch after this list demonstrates this).
  • Portable. A netCDF file can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
  • Scalable. Small subsets of large datasets in various formats may be accessed efficiently through netCDF interfaces, even from remote servers.
  • Appendable. Data may be appended to a properly structured netCDF file without copying the dataset or redefining its structure.
  • Sharable. One writer and multiple readers may simultaneously access the same netCDF file.
  • Archivable. Access to all earlier forms of netCDF data will be supported by current and future versions of the software.
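A minimal sketch of the self-describing property, reopening the example.nc file created above: the file itself tells us what dimensions, variables, and units it holds, with no side channel needed.

from netCDF4 import Dataset

ds = Dataset("example.nc")                  # open read-only
print(ds.dimensions)                        # dimension names and sizes
print(ds.variables)                         # variables, dtypes, shapes, attributes
print(ds.variables["temperature"].units)    # -> K
ds.close()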

In this tutorial we will be using the Python Iris library to process netCDF scientific data.

Iris

Iris is a powerful, format-agnostic, community-driven Python package for analyzing and visualizing Earth science data. Iris implements a data model based on the CF conventions, giving you a consistent interface for working with your data regardless of file format. It excels when working with multi-dimensional Earth science data, where tabular representations become unwieldy and inefficient.

CF standard names, units, and coordinate metadata are built into Iris, giving you a rich and expressive interface for maintaining an accurate representation of your data. Its treatment of data and associated metadata as first-class objects includes the following (a short sketch after the list illustrates a few of these):

  • visualization interface based on matplotlib and cartopy,
  • unit conversion,
  • subsetting and extraction,
  • merge and concatenate,
  • aggregations and reductions (including min, max, mean and weighted averages),
  • interpolation and re-gridding (including nearest-neighbor, linear and area-weighted), and
  • operator overloads (+, -, *, /, etc.).
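A minimal sketch of unit conversion, extraction, and aggregation, using the air_temp.pp file that ships with the iris-sample-data package (installation of both Iris and the sample data is covered later in this article):

import iris
import iris.analysis

# Load a cube from the Iris sample data (requires iris-sample-data).
cube = iris.load_cube(iris.sample_data_path("air_temp.pp"))

# Unit conversion: Kelvin to Celsius.
cube.convert_units("celsius")

# Subsetting/extraction with a constraint on latitude.
subset = cube.extract(iris.Constraint(latitude=lambda cell: cell > 0))
print(subset.summary(shorten=True))

# Aggregation: collapse the longitude dimension to a zonal mean.
zonal_mean = cube.collapsed("longitude", iris.analysis.MEAN)
print(zonal_mean.summary(shorten=True))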

Installing Iris

Iris is available using conda for the following platforms:

  • Linux 64-bit,
  • Mac OSX 64-bit, and
  • Windows 64-bit.

Windows 10 now supports Linux distributions via WSL (Windows Subsystem for Linux). This is a great option for users and developers to get started with Iris, but be aware that we do not currently test against any WSL distributions.

Installing Using Conda

To install Iris using conda, you must first download and install conda, for example from https://docs.conda.io/en/latest/miniconda.html.

Once conda is installed, you can install Iris using conda with the following command:

conda install -c conda-forge iris

If you wish to run any of the code in the gallery you will also need the Iris sample data. This can also be installed using conda:

conda install -c conda-forge iris-sample-data

Further documentation on using conda and the features it provides can be found at https://docs.conda.io/projects/conda/en/latest/index.html.

Sample NetCDF Meteorological Data

For this example, we will be using UK Meteorological Office weather data. You can get the UK Met Office data from the Registry of Open Data on AWS.
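Fetching an object with boto3 might look like the sketch below. The bucket and object key shown here are placeholders, not real names: check the dataset's page on the Registry of Open Data on AWS for the actual values, and make sure AWS credentials are configured for boto3.

import boto3

# Placeholder bucket and object key -- look these up on the dataset's
# Registry of Open Data on AWS page before running.
BUCKET = "met-office-open-data"   # hypothetical bucket name
KEY = "sample/forecast.nc"        # hypothetical object key

s3 = boto3.client("s3")
s3.download_file(BUCKET, KEY, "data/raw/sample.nc")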

The class below loads each cube from the netCDF file with Iris, converts temperatures from Kelvin to Celsius, unrotates the pole-rotated grid coordinates, and flattens each cube into a pipe-delimited CSV file:

import boto3
import csv
import io
import logging
import os
import sys
import time
from datetime import datetime

import iris
import iris.analysis.cartography
import iris.coord_systems
import numpy as np


class IrisNetCDFProcessor:

    # Constructor
    def __init__(self):
        self.logger = logging.getLogger(__name__)

    # Member method to process a netCDF file and create the CSV files
    # destined for the data lake S3 bucket.
    def process_netcdf_file(self, netCDFFile):
        try:
            # The file-naming convention puts the date in the fourth
            # underscore-separated token.
            file_date = netCDFFile.split('_')[3]
            listOfCubes = iris.load(netCDFFile)
        except Exception as e:
            self.logger.exception(str(e))
            self.logger.error("Cannot process netCDF file {0}".format(netCDFFile))
            sys.exit(1)

        for cube in listOfCubes:
            try:
                cubeName = cube.standard_name
                self.logger.info("CubeName : {0}".format(cubeName))
                if cubeName is None:
                    self.logger.info("Skipping cube : {0}".format(cubeName))
                    self.logger.info(cube)
                    continue
                self.logger.info(cube)
                self.logger.info(cube.shape)
                self.logger.info("Cube unit : {0}".format(cube.units))
                coord_names = [coord.name() for coord in cube.coords()]
                self.logger.info("Coordinate names : {0}".format(coord_names))

                # Convert temperatures from Kelvin to Celsius
                if str(cube.units) == "K":
                    self.logger.info("Converting Kelvin to Celsius .....")
                    cube.convert_units('celsius')

                # Get dimension coordinates
                dimCords = cube.dim_coords
                dim_names = [dimension.standard_name for dimension in dimCords]

                # Change the time format to the YYYY-MM-DD HH:MM:SS
                # standard PostgreSQL format
                t = cube.coord('time')
                self.logger.info("Time unit = {0}".format(t.units))
                time_calendar = t.units.calendar
                self.logger.info("Time Calendar = {0}".format(time_calendar))
                dates = t.units.num2date(t.points)
                ds_list = []
                for date_time in dates:
                    ds_list.append(date_time.strftime("%Y-%m-%d %H:%M:%S"))
                new_time = np.array(ds_list)
                self.logger.info("New time list : {0}".format(new_time))

                # Pick out the latitude and longitude dimension coordinates;
                # any leading dimensions are time (and, for 4-D cubes, level).
                latitudes = longitudes = None
                if len(dimCords) == 2:
                    latitudes = cube.coord(dimCords[0].standard_name).points
                    longitudes = cube.coord(dimCords[1].standard_name).points
                elif len(dimCords) == 3:
                    latitudes = cube.coord(dimCords[1].standard_name).points
                    longitudes = cube.coord(dimCords[2].standard_name).points
                elif len(dimCords) == 4:
                    latitudes = cube.coord(dimCords[2].standard_name).points
                    longitudes = cube.coord(dimCords[3].standard_name).points

                cs = cube.coord_system()
                self.logger.info("Coordinate System : {0}".format(cs))

                # Convert rotated-pole (RotatedGeogCS) longitudes and latitudes
                # to unrotated values. Both branches produce 2-D lat/lon grids
                # so the CSV-writing loops below can index them as [row][col].
                if isinstance(cs, iris.coord_systems.RotatedGeogCS):
                    x, y = np.meshgrid(longitudes, latitudes)
                    new_longitudes, new_latitudes = iris.analysis.cartography.unrotate_pole(
                        x, y,
                        cs.grid_north_pole_longitude,
                        cs.grid_north_pole_latitude)
                else:
                    new_longitudes, new_latitudes = np.meshgrid(longitudes, latitudes)

                # Create the CSV header. For 4-D cubes the second dimension
                # (e.g. a level coordinate) is deliberately left out, matching
                # the row-writing loops below.
                csvHeaderList = []
                lastColumnName = str(cube.standard_name) + "-" + str(cube.units)
                if len(dimCords) == 2:
                    csvHeaderList.append('time')
                    csvHeaderList.append(dimCords[0].standard_name)
                    csvHeaderList.append(dimCords[1].standard_name)
                elif len(dimCords) == 3:
                    csvHeaderList.append(dimCords[0].standard_name)
                    csvHeaderList.append(dimCords[1].standard_name)
                    csvHeaderList.append(dimCords[2].standard_name)
                elif len(dimCords) == 4:
                    csvHeaderList.append(dimCords[0].standard_name)
                    csvHeaderList.append(dimCords[2].standard_name)
                    csvHeaderList.append(dimCords[3].standard_name)
                csvHeaderList.append(lastColumnName)

                # Write the cube out as a pipe-delimited CSV file
                currTimestamp = int(time.time())
                localConvertedPath = "data/converted/"
                localCSVFileName = str(cube.standard_name) + "-" + str(currTimestamp) + ".csv"
                with open(localConvertedPath + localCSVFileName, 'w', newline='') as csvFile:
                    csvFileWriter = csv.DictWriter(csvFile, delimiter='|',
                                                   fieldnames=csvHeaderList)
                    csvFileWriter.writeheader()
                    if len(dimCords) == 2:
                        for i in range(cube.shape[0]):
                            for j in range(cube.shape[1]):
                                rowDict = {}
                                rowDict['time'] = new_time[0]
                                rowDict[dimCords[0].standard_name] = new_latitudes[i][j]
                                rowDict[dimCords[1].standard_name] = new_longitudes[i][j]
                                rowDict[lastColumnName] = cube.data[i][j]
                                csvFileWriter.writerow(rowDict)
                    elif len(dimCords) == 3:
                        for i in range(cube.shape[0]):
                            for j in range(cube.shape[1]):
                                for k in range(cube.shape[2]):
                                    rowDict = {}
                                    rowDict[dimCords[0].standard_name] = new_time[i]
                                    rowDict[dimCords[1].standard_name] = new_latitudes[j][k]
                                    rowDict[dimCords[2].standard_name] = new_longitudes[j][k]
                                    rowDict[lastColumnName] = cube.data[i][j][k]
                                    csvFileWriter.writerow(rowDict)
                    elif len(dimCords) == 4:
                        for i in range(cube.shape[0]):
                            for j in range(cube.shape[1]):
                                for k in range(cube.shape[2]):
                                    for l in range(cube.shape[3]):
                                        rowDict = {}
                                        rowDict[dimCords[0].standard_name] = new_time[i]
                                        rowDict[dimCords[2].standard_name] = new_latitudes[k][l]
                                        rowDict[dimCords[3].standard_name] = new_longitudes[k][l]
                                        rowDict[lastColumnName] = cube.data[i][j][k][l]
                                        csvFileWriter.writerow(rowDict)
            except Exception as e:
                self.logger.exception(str(e))
                self.logger.error("Cannot process cube in netCDF file {0}".format(netCDFFile))
                continue
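A minimal usage sketch, assuming the class above is importable and a data/converted/ directory exists. The file name is hypothetical; it only needs to satisfy the underscore-based date extraction in process_netcdf_file:

import logging

logging.basicConfig(level=logging.INFO)

processor = IrisNetCDFProcessor()
# Hypothetical file name: the fourth underscore-separated token is
# treated as the file date by process_netcdf_file.
processor.process_netcdf_file("ukmo_ukv_temperature_20200401.nc")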

If this was useful and helped you in your learning, please consider supporting us by subscribing to our YouTube channel, DataHackr.

Team,
DataHackr
