Python Notebook Example on how to convert a netcdf file to csv file
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"# Data conversion (from NetCDF to CSV)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In this chapter, we'll see **how to convert** datapoints of several years of Ocean Sea Surface Temperature from **netCDF format (.nc)** to **tabular format (.csv)**.\n", | |
"\n", | |
"Some users are also interested in converting Kelvin Temperature to Celcius Degree, that is why there will be an example at the end.\n", | |
"\n", | |
"#### Tip\n", | |
"<hr>\n", | |
"- To get the **netCDF files** on your local hard drive, follow the [previous chapter](http://localhost:8888/notebooks/sf_Temp/copernicus-marine-motu-fetcher.ipynb#Install-Copernicus-Marine-Environment).\n", | |
"\n", | |
"\n", | |
"- If you aren’t familiar with **this data format**, the [netCDF FAQ](http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF) is a good place to start. \n", | |
"\n", | |
"\n", | |
"- A HowTo on how to **manipulate** netCDF files content with **Python** is available [here]()." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Summary\n", | |
"\n", | |
"- #### working directories setup\n", | |
"- #### xarray install (via pip)\n", | |
"- #### xarray usage to convert netcdf to csv\n", | |
" - #### One-file-conversion-(specifying-a-filename)\n", | |
" - #### Several-files-conversion-(specifying-a-directory)\n", | |
"- #### pandas usage to clean/minize csv (without continent values)\n", | |
"- #### manipulate [pandas](https://github.com/pydata/pandas) DataFrames within notebook - [Qgrid](http://qgrid.readthedocs.io/en/widget-guidelines/#)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## working directories setup" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"csv netcdf raw-data\r\n" | |
] | |
} | |
], | |
"source": [ | |
"# List the content of the directory including Copernicus Marine Data (.nc, .png, .geotiff, .xml, pdf etc...)\n", | |
"!ls /home/mercator/copernicus-marine-data/" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "-" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"# Replace 'local_storage_directory', 'netcdf_dir' and 'csv_dir' by respectively\n", | |
"# the directory path to Copernicus Marine data, the directory path to netcdf files\n", | |
"# and the directory path to csv files\n", | |
"local_storage_directory = '/home/mercator/copernicus-marine-data/'\n", | |
"netcdf_dir = local_storage_directory + 'netcdf/'\n", | |
"csv_dir = local_storage_directory + 'csv/'" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## xarray install (via pip)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"\n", | |
"\n", | |
"#### __Important__\n", | |
"<hr>\n", | |
"\n", | |
"The **bash command (pip)** should be executed **only once** (to get/download the python module in the system), **not everytime you open this notebook**.\n", | |
"\n", | |
"<br>\n", | |
"If you **already have** this module, you can **go to** the **next slide** (simply import it)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"%%bash\n", | |
"pip install xarray" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## xarray usage to convert netcdf to csv" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"Import module(s) should be executed only once per kernel execution." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"import xarray as xr\n", | |
"import os" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"- ### One file conversion (specifying a filename)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"source": [ | |
"We set some variables to get the **input netcdf_file_name**, its **path** and the **output csv_file_name ** and **path**." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"# Replace the file name in quote by the file name of the netcdf file (.nc) you want to convert to csv\n", | |
"netcdf_file_name = 'CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "-" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"# Set variables names for the input file.nc (netcdf_file_in) and the output file.csv (`csv_file_out`)\n", | |
"netcdf_file_in = netcdf_dir + netcdf_file_name\n", | |
"csv_file_out = csv_dir + netcdf_file_name[:-3] + '.csv'" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"source": [ | |
"We **open** the netcdf file (using `open_dataset()` method), **convert** it to a dataframe (`to_dataframe()` method) and **write** this object to a **csv file** (`to_csv()` method)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"ds = xr.open_dataset(netcdf_file_in)\n", | |
"df = ds.to_dataframe()\n", | |
"\n", | |
"df.to_csv(csv_file_out)" | |
] | |
}, | |
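{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Optional quick check: printing the `ds` object opened above shows the dataset structure (dimensions, coordinates and the `analysed_sst` data variable). This is only a sketch of an inspection step and does not change the workflow." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Quick look at the dataset structure: dimensions (lat, lon, time),\n", | |
"# coordinates and data variables (here 'analysed_sst')\n", | |
"print ds" | |
] | |
}, | |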
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"scrolled": false, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"lat,lon,time,analysed_sst\r\n", | |
"39.6749992371,15.6750001907,2000-01-01 12:00:00,288.789993547\r\n", | |
"39.6749992371,15.6750001907,2000-01-02 12:00:00,288.759993548\r\n", | |
"39.6749992371,15.6750001907,2000-01-03 12:00:00,288.569993552\r\n", | |
"39.6749992371,15.6750001907,2000-01-04 12:00:00,288.589993551\r\n", | |
"39.6749992371,15.6750001907,2000-01-05 12:00:00,288.529993553\r\n", | |
"39.6749992371,15.6750001907,2000-01-06 12:00:00,288.549993552\r\n", | |
"39.6749992371,15.6750001907,2000-01-07 12:00:00,288.329993557\r\n", | |
"39.6749992371,15.6750001907,2000-01-08 12:00:00,288.299993558\r\n", | |
"39.6749992371,15.6750001907,2000-01-09 12:00:00,288.229993559\r\n" | |
] | |
} | |
], | |
"source": [ | |
"!head $csv_file_out" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"source": [ | |
"\n", | |
"\n", | |
"### One File vs Directory conversion\n", | |
"<hr>\n", | |
"\n", | |
"It's great for one netcdf file, but what if we want to convert a **LIST OF FILES** located in a specific directory (e.g. raw-data) " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"- ### Several files conversion (specifying a directory)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc\r\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2001.nc\r\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2002.nc\r\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2003.nc\r\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2004.nc\r\n" | |
] | |
} | |
], | |
"source": [ | |
"ls $local_storage_directory/raw-data/" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"scrolled": false, | |
"slideshow": { | |
"slide_type": "-" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"total 12M\r\n", | |
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:36 CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc\r\n", | |
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2001.nc\r\n", | |
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2002.nc\r\n", | |
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2003.nc\r\n", | |
"-rw-rw-r-- 1 mercator mercator 2.4M Dec 2 14:37 CMEMS_Yearly_Temperature_Dailymean_010_011_2004.nc\r\n" | |
] | |
} | |
], | |
"source": [ | |
"# Set a new directory path where the list of netcdf files (.nc) is stored.\n", | |
"# Path should end with a 'slash (/)'\n", | |
"files_to_convert = local_storage_directory + 'raw-data/'\n", | |
"!ls -lh $files_to_convert" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"slideshow": { | |
"slide_type": "-" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2000.nc has been processed to .csv\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2001.nc has been processed to .csv\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2002.nc has been processed to .csv\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2003.nc has been processed to .csv\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2004.nc has been processed to .csv\n" | |
] | |
} | |
], | |
"source": [ | |
"# Set a loop to create a .csv file for each .nc file listed in `files_to_convert`\n", | |
"for filename in os.listdir(files_to_convert):\n", | |
" ds = xr.open_dataset(files_to_convert + filename)\n", | |
" df = ds.to_dataframe()\n", | |
" df.to_csv(csv_dir + filename[:-3] + '.csv')\n", | |
" print filename + ' has been processed to .csv'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"total 317M\r\n", | |
"drwxrwxr-x 2 mercator mercator 6 Dec 2 23:03 monthly\r\n", | |
"drwxrwxr-x 2 mercator mercator 4.0K Dec 3 02:46 yearly\r\n", | |
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2000.csv\r\n", | |
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2001.csv\r\n", | |
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2002.csv\r\n", | |
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2003.csv\r\n", | |
"-rw-rw-r-- 1 mercator mercator 64M Dec 8 17:28 CMEMS_Yearly_Temperature_Dailymean_010_011_2004.csv\r\n" | |
] | |
} | |
], | |
"source": [ | |
"!ls -lrth $csv_dir" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"scrolled": true, | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" analysed_sst\n", | |
"lat lon time \n", | |
"15.025 38.974998 2004-01-01 12:00:00 NaN\n", | |
" 2004-01-02 12:00:00 NaN\n", | |
" 2004-01-03 12:00:00 NaN\n", | |
" 2004-01-04 12:00:00 NaN\n", | |
" 2004-01-05 12:00:00 NaN\n", | |
" 2004-01-06 12:00:00 NaN\n", | |
" 2004-01-07 12:00:00 NaN\n", | |
" 2004-01-08 12:00:00 NaN\n", | |
" 2004-01-09 12:00:00 NaN\n", | |
" 2004-01-10 12:00:00 NaN\n", | |
" 2004-01-11 12:00:00 NaN\n", | |
" 2004-01-12 12:00:00 NaN\n", | |
" 2004-01-13 12:00:00 NaN\n", | |
" 2004-01-14 12:00:00 NaN\n", | |
" 2004-01-15 12:00:00 NaN\n", | |
" 2004-01-16 12:00:00 NaN\n", | |
" 2004-01-17 12:00:00 NaN\n", | |
" 2004-01-18 12:00:00 NaN\n", | |
" 2004-01-19 12:00:00 NaN\n", | |
" 2004-01-20 12:00:00 NaN\n", | |
" 2004-01-21 12:00:00 NaN\n", | |
" 2004-01-22 12:00:00 NaN\n", | |
" 2004-01-23 12:00:00 NaN\n", | |
" 2004-01-24 12:00:00 NaN\n", | |
" 2004-01-25 12:00:00 NaN\n", | |
" 2004-01-26 12:00:00 NaN\n", | |
" 2004-01-27 12:00:00 NaN\n", | |
" 2004-01-28 12:00:00 NaN\n", | |
" 2004-01-29 12:00:00 NaN\n", | |
" 2004-01-30 12:00:00 NaN\n", | |
"... ...\n", | |
"22.875 39.974998 2004-12-02 12:00:00 NaN\n", | |
" 2004-12-03 12:00:00 NaN\n", | |
" 2004-12-04 12:00:00 NaN\n", | |
" 2004-12-05 12:00:00 NaN\n", | |
" 2004-12-06 12:00:00 NaN\n", | |
" 2004-12-07 12:00:00 NaN\n", | |
" 2004-12-08 12:00:00 NaN\n", | |
" 2004-12-09 12:00:00 NaN\n", | |
" 2004-12-10 12:00:00 NaN\n", | |
" 2004-12-11 12:00:00 NaN\n", | |
" 2004-12-12 12:00:00 NaN\n", | |
" 2004-12-13 12:00:00 NaN\n", | |
" 2004-12-14 12:00:00 NaN\n", | |
" 2004-12-15 12:00:00 NaN\n", | |
" 2004-12-16 12:00:00 NaN\n", | |
" 2004-12-17 12:00:00 NaN\n", | |
" 2004-12-18 12:00:00 NaN\n", | |
" 2004-12-19 12:00:00 NaN\n", | |
" 2004-12-20 12:00:00 NaN\n", | |
" 2004-12-21 12:00:00 NaN\n", | |
" 2004-12-22 12:00:00 NaN\n", | |
" 2004-12-23 12:00:00 NaN\n", | |
" 2004-12-24 12:00:00 NaN\n", | |
" 2004-12-25 12:00:00 NaN\n", | |
" 2004-12-26 12:00:00 NaN\n", | |
" 2004-12-27 12:00:00 NaN\n", | |
" 2004-12-28 12:00:00 NaN\n", | |
" 2004-12-29 12:00:00 NaN\n", | |
" 2004-12-30 12:00:00 NaN\n", | |
" 2004-12-31 12:00:00 NaN\n", | |
"\n", | |
"[1214388 rows x 1 columns]\n" | |
] | |
} | |
], | |
"source": [ | |
"print df" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"source": [ | |
"### What do we see?\n", | |
"<hr>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"**.csv files** we created contains:" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"- __4 columns__ for **Latitude (Lat)**, **Longitude (Lon)**, **Time (Date)** and **Sea Surface Temperature (SST)**" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"- __1 211 010 rows__ corresponding to the **value of the Ocean Sea Surface Temperature (SST)** for **each point (Lat, Lon)** of the bounding box from **January 2000** to **December 2004 (Time)**." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"- A large numbers of __Empty cells__ (grip point without SST values) are displayed as **'NaN'**" | |
] | |
}, | |
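{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A minimal, optional sanity check of these numbers, assuming the dataframe `df` from the conversion loop (the last processed file, here 2004) is still in memory: `df.shape` gives the number of rows and `isnull().sum()` counts the empty ('NaN') SST cells." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Optional sanity check on the last converted dataframe (df, from the loop above):\n", | |
"# number of (lat, lon, time) rows and number of empty ('NaN') SST cells\n", | |
"print df.shape\n", | |
"print df['analysed_sst'].isnull().sum()" | |
] | |
}, | |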
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"source": [ | |
"### Polygon: Valid points located in the dataset grid\n", | |
"<hr>\n", | |
"\n", | |
"\n", | |
"- What if we want to get a **.csv file** containing SST values located on the **Ocean (exclusively)**, not Land **(without continents)**?\n", | |
"\n", | |
"\n", | |
"- **Important**: CSV reader softwares **may be limited** in the **maximum total number of rows** displayable (e.g. **microsoft excel** can natively display csv files containing **up to 1 048 576 rows**)." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Pandas usage to clean/minimize .csv files (e.g. without continent values)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"import pandas as pd\n", | |
"import os" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"pandas module will be used on the previous created .csv files (e.g. '{{netcdf_file_name[:-3] + '.csv'}}'), which means that cells can be executed as is.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"csv_uncleaned_in = netcdf_file_name[:-3] + '.csv'\n", | |
"csv_cleaned_out = csv_uncleaned_in[:-4] + '_cleaned.csv'" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "-" | |
} | |
}, | |
"source": [ | |
"The .csv file is loaded in a pandas dataframe with the `read_csv()` method." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": true, | |
"scrolled": true, | |
"slideshow": { | |
"slide_type": "-" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"data = pd.read_csv(csv_dir + csv_uncleaned_in)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"source": [ | |
"Use of the **'dropna()'** function to remove rows including cells which do not contain SST values." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"scrolled": true, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" lat lon time analysed_sst\n", | |
"12810 15.075 39.674999 2000-01-01 12:00:00 300.209993\n", | |
"12811 15.075 39.674999 2000-01-02 12:00:00 300.199993\n", | |
"12812 15.075 39.674999 2000-01-03 12:00:00 300.209993\n", | |
"12813 15.075 39.674999 2000-01-04 12:00:00 300.309993\n", | |
"12814 15.075 39.674999 2000-01-05 12:00:00 300.369993\n", | |
"12815 15.075 39.674999 2000-01-06 12:00:00 300.399993\n", | |
"12816 15.075 39.674999 2000-01-07 12:00:00 300.389993\n", | |
"12817 15.075 39.674999 2000-01-08 12:00:00 300.389993\n", | |
"12818 15.075 39.674999 2000-01-09 12:00:00 300.249993\n", | |
"12819 15.075 39.674999 2000-01-10 12:00:00 300.209993\n", | |
"12820 15.075 39.674999 2000-01-11 12:00:00 300.289993\n", | |
"12821 15.075 39.674999 2000-01-12 12:00:00 300.229993\n", | |
"12822 15.075 39.674999 2000-01-13 12:00:00 300.329993\n", | |
"12823 15.075 39.674999 2000-01-14 12:00:00 300.379993\n", | |
"12824 15.075 39.674999 2000-01-15 12:00:00 300.309993\n", | |
"12825 15.075 39.674999 2000-01-16 12:00:00 300.199993\n", | |
"12826 15.075 39.674999 2000-01-17 12:00:00 300.039993\n", | |
"12827 15.075 39.674999 2000-01-18 12:00:00 300.029993\n", | |
"12828 15.075 39.674999 2000-01-19 12:00:00 299.919993\n", | |
"12829 15.075 39.674999 2000-01-20 12:00:00 299.909993\n", | |
"12830 15.075 39.674999 2000-01-21 12:00:00 299.989993\n", | |
"12831 15.075 39.674999 2000-01-22 12:00:00 300.049993\n", | |
"12832 15.075 39.674999 2000-01-23 12:00:00 300.059993\n", | |
"12833 15.075 39.674999 2000-01-24 12:00:00 300.069993\n", | |
"12834 15.075 39.674999 2000-01-25 12:00:00 299.979993\n", | |
"12835 15.075 39.674999 2000-01-26 12:00:00 300.049993\n", | |
"12836 15.075 39.674999 2000-01-27 12:00:00 299.699993\n", | |
"12837 15.075 39.674999 2000-01-28 12:00:00 299.589993\n", | |
"12838 15.075 39.674999 2000-01-29 12:00:00 299.559993\n", | |
"12839 15.075 39.674999 2000-01-30 12:00:00 299.629993\n", | |
"... ... ... ... ...\n", | |
"1207038 22.875 38.974998 2000-12-02 12:00:00 300.369993\n", | |
"1207039 22.875 38.974998 2000-12-03 12:00:00 300.339993\n", | |
"1207040 22.875 38.974998 2000-12-04 12:00:00 300.389993\n", | |
"1207041 22.875 38.974998 2000-12-05 12:00:00 300.489993\n", | |
"1207042 22.875 38.974998 2000-12-06 12:00:00 300.519993\n", | |
"1207043 22.875 38.974998 2000-12-07 12:00:00 300.379993\n", | |
"1207044 22.875 38.974998 2000-12-08 12:00:00 300.259993\n", | |
"1207045 22.875 38.974998 2000-12-09 12:00:00 300.219993\n", | |
"1207046 22.875 38.974998 2000-12-10 12:00:00 300.039993\n", | |
"1207047 22.875 38.974998 2000-12-11 12:00:00 299.879993\n", | |
"1207048 22.875 38.974998 2000-12-12 12:00:00 299.559993\n", | |
"1207049 22.875 38.974998 2000-12-13 12:00:00 299.499993\n", | |
"1207050 22.875 38.974998 2000-12-14 12:00:00 299.409993\n", | |
"1207051 22.875 38.974998 2000-12-15 12:00:00 299.159993\n", | |
"1207052 22.875 38.974998 2000-12-16 12:00:00 299.159993\n", | |
"1207053 22.875 38.974998 2000-12-17 12:00:00 299.149993\n", | |
"1207054 22.875 38.974998 2000-12-18 12:00:00 299.159993\n", | |
"1207055 22.875 38.974998 2000-12-19 12:00:00 299.129993\n", | |
"1207056 22.875 38.974998 2000-12-20 12:00:00 299.089993\n", | |
"1207057 22.875 38.974998 2000-12-21 12:00:00 298.869993\n", | |
"1207058 22.875 38.974998 2000-12-22 12:00:00 298.859993\n", | |
"1207059 22.875 38.974998 2000-12-23 12:00:00 298.789993\n", | |
"1207060 22.875 38.974998 2000-12-24 12:00:00 298.699993\n", | |
"1207061 22.875 38.974998 2000-12-25 12:00:00 298.679993\n", | |
"1207062 22.875 38.974998 2000-12-26 12:00:00 298.699993\n", | |
"1207063 22.875 38.974998 2000-12-27 12:00:00 298.729993\n", | |
"1207064 22.875 38.974998 2000-12-28 12:00:00 298.689993\n", | |
"1207065 22.875 38.974998 2000-12-29 12:00:00 298.649993\n", | |
"1207066 22.875 38.974998 2000-12-30 12:00:00 298.659993\n", | |
"1207067 22.875 38.974998 2000-12-31 12:00:00 298.719993\n", | |
"\n", | |
"[798978 rows x 4 columns]\n" | |
] | |
} | |
], | |
"source": [ | |
"print data.dropna()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"source": [ | |
"**Write** (with `data.dropna()` method) the new clean dataframe to a output **.csv** file." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"data.dropna().to_csv(csv_dir + csv_cleaned_out, index = False)" | |
] | |
}, | |
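{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A minimal follow-up to the row-limit note a few slides above, assuming `data` is still loaded: count the rows kept by `dropna()` and compare them with Microsoft Excel's native limit of 1 048 576 rows per sheet." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# Optional check: number of rows kept after dropna(), compared with\n", | |
"# Excel's native limit of 1,048,576 rows per sheet\n", | |
"rows_kept = len(data.dropna())\n", | |
"print rows_kept, 'rows kept -', ('fits' if rows_kept < 1048576 else 'does not fit'), 'in a single Excel sheet'" | |
] | |
}, | |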
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## TO SUM UP\n", | |
"---\n", | |
"\n", | |
"## Convert NetCDF file(s) to CSV file(s) - with option convert K to C°" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"- ### containing only valid Ocean Sea Surface Temperature (in K but converted to C°)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"- ### for the bouding box 15, 22, 39, 40\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"- ### from January 2000 to December 2004" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Variable name of the temperature you want to convert from Kelvin to Celcius°\n", | |
"# If no conversion is required, then set the value of 'var_name' to ''\n", | |
"# e.g. var_name = ''\n", | |
"var_name = 'analysed_sst'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"scrolled": true, | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2000_cleaned_Cdegree.csv has been successfully processed - OK\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2001_cleaned_Cdegree.csv has been successfully processed - OK\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2002_cleaned_Cdegree.csv has been successfully processed - OK\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2003_cleaned_Cdegree.csv has been successfully processed - OK\n", | |
"CMEMS_Yearly_Temperature_Dailymean_010_011_2004_cleaned_Cdegree.csv has been successfully processed - OK\n" | |
] | |
} | |
], | |
"source": [ | |
"for filename in os.listdir(files_to_convert):\n", | |
" csv_cleaned_out = filename[:-3] + '_cleaned_Cdegree.csv'\n", | |
" ds = xr.open_dataset(files_to_convert+filename)\n", | |
" df = (ds.to_dataframe()).dropna()\n", | |
" if var_name:\n", | |
" df[var_name] = df[var_name].apply(lambda x: x - 273,15)\n", | |
" df.to_csv(csv_dir + csv_cleaned_out)\n", | |
" print csv_cleaned_out + ' has been successfully processed - OK'" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's check the content of '{{csv_cleaned_out}}'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"lat,lon,time,analysed_sst\r\n", | |
"15.0749998093,39.6749992371,2004-01-01 12:00:00,27.4099932872\r\n", | |
"15.0749998093,39.6749992371,2004-01-02 12:00:00,27.149993293\r\n", | |
"15.0749998093,39.6749992371,2004-01-03 12:00:00,27.0899932943\r\n", | |
"15.0749998093,39.6749992371,2004-01-04 12:00:00,27.0699932948\r\n", | |
"15.0749998093,39.6749992371,2004-01-05 12:00:00,27.1599932928\r\n", | |
"15.0749998093,39.6749992371,2004-01-06 12:00:00,27.329993289\r\n", | |
"15.0749998093,39.6749992371,2004-01-07 12:00:00,27.3899932876\r\n", | |
"15.0749998093,39.6749992371,2004-01-08 12:00:00,27.4899932854\r\n", | |
"15.0749998093,39.6749992371,2004-01-09 12:00:00,27.5399932843\r\n" | |
] | |
} | |
], | |
"source": [ | |
"!head $csv_dir/$csv_cleaned_out" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2000_cleaned.csv\n", | |
"\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>lat</th>\n", | |
" <th>lon</th>\n", | |
" <th>time</th>\n", | |
" <th>analysed_sst</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-01 12:00:00</td>\n", | |
" <td>300.209993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-02 12:00:00</td>\n", | |
" <td>300.199993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-03 12:00:00</td>\n", | |
" <td>300.209993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-04 12:00:00</td>\n", | |
" <td>300.309993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-05 12:00:00</td>\n", | |
" <td>300.369993</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2000_cleaned_Cdegree.csv\n", | |
"\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>lat</th>\n", | |
" <th>lon</th>\n", | |
" <th>time</th>\n", | |
" <th>analysed_sst</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-01 12:00:00</td>\n", | |
" <td>27.209993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-02 12:00:00</td>\n", | |
" <td>27.199993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-03 12:00:00</td>\n", | |
" <td>27.209993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-04 12:00:00</td>\n", | |
" <td>27.309993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2000-01-05 12:00:00</td>\n", | |
" <td>27.369993</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2001_cleaned_Cdegree.csv\n", | |
"\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>lat</th>\n", | |
" <th>lon</th>\n", | |
" <th>time</th>\n", | |
" <th>analysed_sst</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2001-01-01 12:00:00</td>\n", | |
" <td>27.899993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2001-01-02 12:00:00</td>\n", | |
" <td>27.869993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2001-01-03 12:00:00</td>\n", | |
" <td>27.809993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2001-01-04 12:00:00</td>\n", | |
" <td>27.769993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2001-01-05 12:00:00</td>\n", | |
" <td>27.649993</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2002_cleaned_Cdegree.csv\n", | |
"\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>lat</th>\n", | |
" <th>lon</th>\n", | |
" <th>time</th>\n", | |
" <th>analysed_sst</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2002-01-01 12:00:00</td>\n", | |
" <td>27.749993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2002-01-02 12:00:00</td>\n", | |
" <td>27.299993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2002-01-03 12:00:00</td>\n", | |
" <td>27.199993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2002-01-04 12:00:00</td>\n", | |
" <td>27.239993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2002-01-05 12:00:00</td>\n", | |
" <td>27.359993</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2003_cleaned_Cdegree.csv\n", | |
"\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>lat</th>\n", | |
" <th>lon</th>\n", | |
" <th>time</th>\n", | |
" <th>analysed_sst</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2003-01-01 12:00:00</td>\n", | |
" <td>27.539993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2003-01-02 12:00:00</td>\n", | |
" <td>27.629993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2003-01-03 12:00:00</td>\n", | |
" <td>27.569993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2003-01-04 12:00:00</td>\n", | |
" <td>27.499993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2003-01-05 12:00:00</td>\n", | |
" <td>27.509993</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"\n", | |
"\n", | |
"\n", | |
"Header (5 first lines) of CMEMS_Yearly_Temperature_Dailymean_010_011_2004_cleaned_Cdegree.csv\n", | |
"\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/html": [ | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>lat</th>\n", | |
" <th>lon</th>\n", | |
" <th>time</th>\n", | |
" <th>analysed_sst</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2004-01-01 12:00:00</td>\n", | |
" <td>27.409993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2004-01-02 12:00:00</td>\n", | |
" <td>27.149993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2004-01-03 12:00:00</td>\n", | |
" <td>27.089993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2004-01-04 12:00:00</td>\n", | |
" <td>27.069993</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>15.075</td>\n", | |
" <td>39.674999</td>\n", | |
" <td>2004-01-05 12:00:00</td>\n", | |
" <td>27.159993</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>" | |
], | |
"text/plain": [ | |
"<IPython.core.display.HTML object>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"\n", | |
"\n", | |
"\n" | |
] | |
} | |
], | |
"source": [ | |
"from IPython.display import display, HTML\n", | |
"\n", | |
"for filename in os.listdir(csv_dir):\n", | |
" if 'cleaned' in filename:\n", | |
" print '\\nHeader (5 first lines) of ' + filename + '\\n'\n", | |
" df = pd.read_csv(csv_dir + filename)\n", | |
" display(HTML((df.head(5)).to_html()))\n", | |
" print \"\\n\\n\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "slide" | |
} | |
}, | |
"source": [ | |
"## Manipulate [pandas](https://github.com/pydata/pandas) DataFrames within notebook - Qgrid" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"source": [ | |
"\n", | |
"\n", | |
"### __Important__\n", | |
"<hr>\n", | |
"\n", | |
"The **bash command (pip)** should be executed **only once** (to get the python module in the system), **not everytime you open this notebook**.\n", | |
"\n", | |
"<br>\n", | |
"If you **already have** this module, you can **go to** the **next slide**." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"%%bash\n", | |
"pip install qgrid" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": true, | |
"slideshow": { | |
"slide_type": "subslide" | |
} | |
}, | |
"outputs": [], | |
"source": [ | |
"import qgrid" | |
] | |
}, | |
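{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note: if the grid below does not render (as in the stored output further down, which reports 'Failed to display Jupyter Widget'), the notebook widget extensions may need to be enabled. The commands below are only a sketch based on the ipywidgets and qgrid installation instructions; run them once, restart the notebook, and adapt them to your Jupyter setup if needed." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"%%bash\n", | |
"# Enable the Jupyter widgets and qgrid notebook extensions (run once)\n", | |
"jupyter nbextension enable --py --sys-prefix widgetsnbextension\n", | |
"jupyter nbextension enable --py --sys-prefix qgrid" | |
] | |
}, | |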
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"scrolled": false, | |
"slideshow": { | |
"slide_type": "fragment" | |
} | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"application/vnd.jupyter.widget-view+json": { | |
"model_id": "e7b3cf4ab202464f8a143796763a7495", | |
"version_major": 2, | |
"version_minor": 0 | |
}, | |
"text/html": [ | |
"<p>Failed to display Jupyter Widget of type <code>QgridWidget</code>.</p>\n", | |
"<p>\n", | |
" If you're reading this message in the Jupyter Notebook or JupyterLab Notebook, it may mean\n", | |
" that the widgets JavaScript is still loading. If this message persists, it\n", | |
" likely means that the widgets JavaScript library is either not installed or\n", | |
" not enabled. See the <a href=\"https://ipywidgets.readthedocs.io/en/stable/user_install.html\">Jupyter\n", | |
" Widgets Documentation</a> for setup instructions.\n", | |
"</p>\n", | |
"<p>\n", | |
" If you're reading this message in another frontend (for example, a static\n", | |
" rendering on GitHub or <a href=\"https://nbviewer.jupyter.org/\">NBViewer</a>),\n", | |
" it may mean that your frontend doesn't currently support widgets.\n", | |
"</p>\n" | |
], | |
"text/plain": [ | |
"QgridWidget(grid_options={'enableTextSelectionOnCells': True, 'editable': True, 'explicitInitialization': True, 'minVisibleRows': 8, 'enableColumnReorder': False, 'defaultColumnWidth': 150, 'maxVisibleRows': 15, 'fullWidthRows': True, 'rowHeight': 28, 'autoEdit': False, 'syncColumnCellResize': True, 'forceFitColumns': True}, precision=5)" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"qgrid.show_grid(df)" | |
] | |
} | |
], | |
"metadata": { | |
"celltoolbar": "Slideshow", | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.13" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |