bennyistanto · December 10, 2023 16:36
diff --git a/Propagation_Met2Hyd_Drought_Using_CCA.ipynb b/Propagation_Met2Hyd_Drought_Using_CCA.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "id": "57f3fa7d-d129-4e81-b828-56e45609ce4b",
   "metadata": {},
   "source": [
    "# Drought Propagation from Meteorological to Hydrological using Cross-Correlation\n",
    "\n",
    "The notebook explain on how to analyzing the propagation of **meteorological drought** (Standardized Precipitation Index - SPI) to **hydrological drought** (Standardized Streamflow Index - SSI) using **Cross-Correlation Analysis (CCA)** at the pixel level. It required two input netCDF files, SPI - generated from `1_Steps_to_Generate_SPI_Using_CHIRPS_Data.ipynb` and SSI - generated from `2_Steps_to_Generate_SSI_Using_GloFAS-ERA5_Data.ipynb`.\n",
    "\n",
    "As background, we have SPI and SSI data which has same temporal resolution (monthly) and cover the same period (1981-2022), also has same various time scale: `1`, `2`, `3`, `6`, `9`, `12`, `18`, `24`, `36`, `48`, `60` and `72-month`. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c87e2d9f-3baa-4ec8-a43d-b37bacb043f2",
   "metadata": {},
   "source": [
    "The analysis **ideally** will take some process.\n",
    "\n",
    "1. **Choice of Time Scale:**\n",
    "\n",
    "    Let's start with the 3, 6, 9 and 12-month timescale. \n",
    "\n",
    "2. **Handling Seasonality:**\n",
    "\n",
    "    While splitting the dataset by season can be complex due to varying wet and dry seasons across regions, we might consider a simpler approach: analyze the entire dataset first, then conduct a separate analysis for regions with different climatic patterns. Another method is to detrend the data to remove seasonality, but this might also remove some meaningful variability related to the drought dynamics. **For simplicity, we will not do this part.**\n",
    "\n",
    "3. **Noise Filtering:**\n",
    "\n",
    "    Employing noise filtering techniques like Singular Spectrum Analysis (SSA) or a moving average can indeed help in isolating the underlying trends and patterns in our data. This step is crucial for enhancing the signal-to-noise ratio in our datasets.\n",
    "\n",
    "4. **Cross-Correlation Analysis:**\n",
    "\n",
    "    Perform the cross-correlation for each pixel across the entire time series. This will involve computing the correlation for different lags (e.g., 1 month, 2 months, up to a reasonable maximum that makes sense for our study).\n",
    "\n",
    "5. **Output and Visualization**\n",
    "\n",
    "    **Lag Map:** This will be particularly insightful, showing the spatial distribution of the lag between meteorological and hydrological droughts.\n",
    "\n",
    "    **Strength Map:** This map will be useful in understanding where the correlations are strongest, indicating regions where meteorological conditions are good predictors of hydrological conditions.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b10af3b-7509-42a9-a1a3-2c51d70253cd",
   "metadata": {},
   "source": [
    "We will utilise some [Python](https://www.python.org/) library like `xarray`, `pandas`, `numpy`, `statsmodel`, `matplotlib`, `cartopy`, `tqdm` and other necessary tools. Assume we are working inside a dedicated Python/Conda environment, so it will not break other configuration.\n",
    "\n",
    "## 0. Working Directory\n",
    "\n",
    "For this exercise, I am working on these folder `/Temp/drought/prop/` (applied to Mac/Linux machine) or `Z:/Temp/drought/prop/` (applied to Windows machine) directory. I have some folder inside this directory:\n",
    "\n",
    "1. `01_event` # Result from converting the SPI or SSI into drought event.\n",
    "2. `02_magnitude` # Result from converting the SPI or SSI and drought event into drought magnitude (cumulative SPI or SPI value during drought event).\n",
    "3. `03_filtered` # Result from noise filtering for drought magnitude dataset.\n",
    "4. `04_correlation` # Result from cross-correlation using filtered dataset for each lag.\n",
    "5. `05_frequency` # Result from maximum/mode correlation and on which lag the maximum/mode correlation is happen.\n",
    "6. `images` # In this folder I put some screenshot file as illustration, used in this notebook.\n",
    "\n",
    "Feel free to use your own preferences for this setting/folder arrangements.\n",
    "\n",
    "This step-by-step guide was tested using Windows 11 with WSL2 - Ubuntu 22 enabled running on Thinkpad T480 2019, i7-8650U 1.9GHz, 64 GB 2400 MHz DDR4."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "baedd9a1-d76a-461a-8b66-cf1d55ae4257",
   "metadata": {},
   "source": [
    "## 1. Preprocessing\n",
    "\n",
    "The drought characteristics originally following the method proposed by Yevjevich in [1967](https://www.engr.colostate.edu/ce/facultystaff/yevjevich/papers/HydrologyPapers_n23_1967.pdf) and has been employed to recognize the feature of droughts. The paper from Le, et al in [2019](https://www.researchgate.net/publication/333171255_Space-time_variability_of_drought_over_Vietnam) provide better explanation about it: duration, severity, intensity, and interarrival.\n",
    "\n",
    "![Drought](./prop/images/drought-runtheory.png)\n",
    "\n",
    "The drought condition is set when the SPI or SSI value negative, or less than `-1.2`. Focusing on drought conditions could be a more relevant approach for our analysis compare to using all SPI and SSI data, as it has dry and wet condition. By concentrating on these periods, we can potentially gain more insight into the correlation between meteorological and hydrological droughts.\n",
    "\n",
    "**Masking for Drought Event:**\n",
    "\n",
    "Create masks for both SPI and SSI datasets where only values less than -1.2 are retained. This step isolates the drought conditions.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6c5a6b10-a1f0-402e-a10e-6283d8996c67",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing time scale: 3 months\n",
      "  Loading SPI data from './met/05_spi/idn_cli_spi_gamma_3_month.nc'\n",
      "  Calculating drought events for SPI...\n",
      "File saved as './prop/01_event/idn_cli_drought_event_spi_3_month.nc'.\n",
      "  Loading SSI data from './hyd/08_ssi/idn_cli_ssi_gamma_3_month.nc'\n",
      "  Calculating drought events for SSI...\n",
      "File saved as './prop/01_event/idn_cli_drought_event_ssi_3_month.nc'.\n",
      "Drought event calculation completed for SPI and SSI, time scale 3 months\n",
      "Processing time scale: 6 months\n",
      "  Loading SPI data from './met/05_spi/idn_cli_spi_gamma_6_month.nc'\n",
      "  Calculating drought events for SPI...\n",
      "File saved as './prop/01_event/idn_cli_drought_event_spi_6_month.nc'.\n",
      "  Loading SSI data from './hyd/08_ssi/idn_cli_ssi_gamma_6_month.nc'\n",
      "  Calculating drought events for SSI...\n",
      "File saved as './prop/01_event/idn_cli_drought_event_ssi_6_month.nc'.\n",
      "Drought event calculation completed for SPI and SSI, time scale 6 months\n",
      "Processing time scale: 9 months\n",
      "  Loading SPI data from './met/05_spi/idn_cli_spi_gamma_9_month.nc'\n",
      "  Calculating drought events for SPI...\n",
      "File saved as './prop/01_event/idn_cli_drought_event_spi_9_month.nc'.\n",
      "  Loading SSI data from './hyd/08_ssi/idn_cli_ssi_gamma_9_month.nc'\n",
      "  Calculating drought events for SSI...\n",
      "File saved as './prop/01_event/idn_cli_drought_event_ssi_9_month.nc'.\n",
      "Drought event calculation completed for SPI and SSI, time scale 9 months\n",
      "Processing time scale: 12 months\n",
      "  Loading SPI data from './met/05_spi/idn_cli_spi_gamma_12_month.nc'\n",
      "  Calculating drought events for SPI...\n",
      "File saved as './prop/01_event/idn_cli_drought_event_spi_12_month.nc'.\n",
      "  Loading SSI data from './hyd/08_ssi/idn_cli_ssi_gamma_12_month.nc'\n",
      "  Calculating drought events for SSI...\n",
      "File saved as './prop/01_event/idn_cli_drought_event_ssi_12_month.nc'.\n",
      "Drought event calculation completed for SPI and SSI, time scale 12 months\n",
      "Completed!\n"
     ]
    }
   ],
   "source": [
    "import xarray as xr\n",
    "import os\n",
    "import numpy as np\n",
    "\n",
    "# Function to save dataset with automatic overwrite\n",
    "def save_dataset(ds, filename):\n",
    "    ds.to_netcdf(filename)\n",
    "    print(f\"File saved as '{filename}'.\")\n",
    "\n",
    "# Define folders and subset file path\n",
    "spi_folder = './met/05_spi/'\n",
    "ssi_folder = './hyd/08_ssi/'\n",
    "event_folder = './prop/01_event/'\n",
    "subset_file = './subset/idn_subset_chirps.nc'\n",
    "\n",
    "# Ensure the event folder exists\n",
    "os.makedirs(event_folder, exist_ok=True)\n",
    "\n",
    "# Load the land mask from the subset file\n",
    "land_mask = xr.open_dataset(subset_file)['land']\n",
    "\n",
    "# Define time scales\n",
    "time_scales = [3, 6, 9, 12]\n",
    "\n",
    "# Process datasets for both SPI and SSI for each time scale\n",
    "for time_scale in time_scales:\n",
    "    print(f\"Processing time scale: {time_scale} months\")\n",
    "\n",
    "    # SPI file processing\n",
    "    spi_file = spi_folder + f'idn_cli_spi_gamma_{time_scale}_month.nc'\n",
    "    print(f\"  Loading SPI data from '{spi_file}'\")\n",
    "    ds_spi = xr.open_dataset(spi_file)\n",
    "    print(\"  Calculating drought events for SPI...\")\n",
    "    ds_spi['drought_event'] = xr.where(ds_spi[f'spi_gamma_{time_scale}_month'] < -1.2, 1, 0)\n",
    "    ds_spi['drought_event'] = ds_spi['drought_event'].where(land_mask == 1, other=np.nan)\n",
    "    drought_event_spi_file = event_folder + f'idn_cli_drought_event_spi_{time_scale}_month.nc'\n",
    "    save_dataset(ds_spi, drought_event_spi_file)\n",
    "\n",
    "    # SSI file processing\n",
    "    ssi_file = ssi_folder + f'idn_cli_ssi_gamma_{time_scale}_month.nc'\n",
    "    print(f\"  Loading SSI data from '{ssi_file}'\")\n",
    "    ds_ssi = xr.open_dataset(ssi_file)\n",
    "    print(\"  Calculating drought events for SSI...\")\n",
    "    ds_ssi['drought_event'] = xr.where(ds_ssi[f'ssi_gamma_{time_scale}_month'] < -1.2, 1, 0)\n",
    "    ds_ssi['drought_event'] = ds_ssi['drought_event'].where(land_mask == 1, other=np.nan)\n",
    "    drought_event_ssi_file = event_folder + f'idn_cli_drought_event_ssi_{time_scale}_month.nc'\n",
    "    save_dataset(ds_ssi, drought_event_ssi_file)\n",
    "\n",
    "    print(f\"Drought event calculation completed for SPI and SSI, time scale {time_scale} months\")\n",
    "\n",
    "print(\"Completed!\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "946ae98b-a062-4708-a9ad-5f2a7781d7e3",
   "metadata": {},
   "source": [
    "**Calculate Drought Magnitude:**\n",
    "\n",
    "Compute the absolute cumulative values during drought events for both datasets. This gives a measure of drought magnitude, which may be more meaningful for correlation analysis than using raw SPI/SSI values.\n",
    "\n",
    "This script reads the drought event data and the absolute values, calculates the cumulative magnitude during drought periods, and writes the results to new NetCDF files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0ded7d04-fe20-4b57-a096-c7f741053bbe",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading land mask...\n",
      "\n",
      "Processing time scale: 3 months\n",
      "  Processing SPI data...\n",
      "    Opening drought event file: ./prop/01_event/idn_cli_drought_event_spi_3_month.nc\n",
      "    Opening SPI data file: ./met/05_spi/idn_cli_spi_gamma_3_month.nc\n",
      "    Applying land mask to drought event data...\n",
      "    Calculating drought magnitude...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "      Processing: 100%|████████████████████████████████████████████████████████████████| 503/503 [49:36<00:00,  5.92s/ timestep]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Create a new DataArray for drought magnitude...\n",
      "    Saving drought magnitude file: ./prop/02_magnitude/idn_cli_drought_magnitude_spi_3_month.nc\n",
      "  Drought magnitude calculations completed for SPI, time scale 3 months\n",
      "  Processing SSI data...\n",
      "    Opening drought event file: ./prop/01_event/idn_cli_drought_event_ssi_3_month.nc\n",
      "    Opening SSI data file: ./hyd/08_ssi/idn_cli_ssi_gamma_3_month.nc\n",
      "    Applying land mask to drought event data...\n",
      "    Calculating drought magnitude...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "      Processing: 100%|████████████████████████████████████████████████████████████████| 503/503 [46:09<00:00,  5.51s/ timestep]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Create a new DataArray for drought magnitude...\n",
      "    Saving drought magnitude file: ./prop/02_magnitude/idn_cli_drought_magnitude_ssi_3_month.nc\n",
      "  Drought magnitude calculations completed for SSI, time scale 3 months\n",
      "\n",
      "Processing time scale: 6 months\n",
      "  Processing SPI data...\n",
      "    Opening drought event file: ./prop/01_event/idn_cli_drought_event_spi_6_month.nc\n",
      "    Opening SPI data file: ./met/05_spi/idn_cli_spi_gamma_6_month.nc\n",
      "    Applying land mask to drought event data...\n",
      "    Calculating drought magnitude...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "      Processing: 100%|████████████████████████████████████████████████████████████████| 503/503 [50:21<00:00,  6.01s/ timestep]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Create a new DataArray for drought magnitude...\n",
      "    Saving drought magnitude file: ./prop/02_magnitude/idn_cli_drought_magnitude_spi_6_month.nc\n",
      "  Drought magnitude calculations completed for SPI, time scale 6 months\n",
      "  Processing SSI data...\n",
      "    Opening drought event file: ./prop/01_event/idn_cli_drought_event_ssi_6_month.nc\n",
      "    Opening SSI data file: ./hyd/08_ssi/idn_cli_ssi_gamma_6_month.nc\n",
      "    Applying land mask to drought event data...\n",
      "    Calculating drought magnitude...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "      Processing: 100%|████████████████████████████████████████████████████████████████| 503/503 [46:37<00:00,  5.56s/ timestep]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Create a new DataArray for drought magnitude...\n",
      "    Saving drought magnitude file: ./prop/02_magnitude/idn_cli_drought_magnitude_ssi_6_month.nc\n",
      "  Drought magnitude calculations completed for SSI, time scale 6 months\n",
      "\n",
      "Processing time scale: 9 months\n",
      "  Processing SPI data...\n",
      "    Opening drought event file: ./prop/01_event/idn_cli_drought_event_spi_9_month.nc\n",
      "    Opening SPI data file: ./met/05_spi/idn_cli_spi_gamma_9_month.nc\n",
      "    Applying land mask to drought event data...\n",
      "    Calculating drought magnitude...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "      Processing: 100%|████████████████████████████████████████████████████████████████| 503/503 [50:24<00:00,  6.01s/ timestep]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Create a new DataArray for drought magnitude...\n",
      "    Saving drought magnitude file: ./prop/02_magnitude/idn_cli_drought_magnitude_spi_9_month.nc\n",
      "  Drought magnitude calculations completed for SPI, time scale 9 months\n",
      "  Processing SSI data...\n",
      "    Opening drought event file: ./prop/01_event/idn_cli_drought_event_ssi_9_month.nc\n",
      "    Opening SSI data file: ./hyd/08_ssi/idn_cli_ssi_gamma_9_month.nc\n",
      "    Applying land mask to drought event data...\n",
      "    Calculating drought magnitude...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "      Processing: 100%|████████████████████████████████████████████████████████████████| 503/503 [45:55<00:00,  5.48s/ timestep]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Create a new DataArray for drought magnitude...\n",
      "    Saving drought magnitude file: ./prop/02_magnitude/idn_cli_drought_magnitude_ssi_9_month.nc\n",
      "  Drought magnitude calculations completed for SSI, time scale 9 months\n",
      "\n",
      "Processing time scale: 12 months\n",
      "  Processing SPI data...\n",
      "    Opening drought event file: ./prop/01_event/idn_cli_drought_event_spi_12_month.nc\n",
      "    Opening SPI data file: ./met/05_spi/idn_cli_spi_gamma_12_month.nc\n",
      "    Applying land mask to drought event data...\n",
      "    Calculating drought magnitude...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "      Processing: 100%|████████████████████████████████████████████████████████████████| 503/503 [48:52<00:00,  5.83s/ timestep]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Create a new DataArray for drought magnitude...\n",
      "    Saving drought magnitude file: ./prop/02_magnitude/idn_cli_drought_magnitude_spi_12_month.nc\n",
      "  Drought magnitude calculations completed for SPI, time scale 12 months\n",
      "  Processing SSI data...\n",
      "    Opening drought event file: ./prop/01_event/idn_cli_drought_event_ssi_12_month.nc\n",
      "    Opening SSI data file: ./hyd/08_ssi/idn_cli_ssi_gamma_12_month.nc\n",
      "    Applying land mask to drought event data...\n",
      "    Calculating drought magnitude...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "      Processing: 100%|████████████████████████████████████████████████████████████████| 503/503 [48:17<00:00,  5.76s/ timestep]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Create a new DataArray for drought magnitude...\n",
      "    Saving drought magnitude file: ./prop/02_magnitude/idn_cli_drought_magnitude_ssi_12_month.nc\n",
      "  Drought magnitude calculations completed for SSI, time scale 12 months\n",
      "\n",
      "All processes completed!\n"
     ]
    }
   ],
   "source": [
    "import xarray as xr\n",
    "import numpy as np\n",
    "from tqdm import tqdm\n",
    "import os\n",
    "\n",
    "# Define folders and subset file path\n",
    "spi_folder = './met/05_spi/'\n",
    "ssi_folder = './hyd/08_ssi/'\n",
    "event_folder = './prop/01_event/'\n",
    "magnitude_folder = './prop/02_magnitude/'\n",
    "subset_file = './subset/idn_subset_chirps.nc'\n",
    "\n",
    "# Ensure the magnitude folder exists\n",
    "os.makedirs(magnitude_folder, exist_ok=True)\n",
    "\n",
    "# Load the land mask from the subset file\n",
    "print(\"Loading land mask...\")\n",
    "land_mask = xr.open_dataset(subset_file)['land']\n",
    "\n",
    "# Define time scales\n",
    "time_scales = [3, 6, 9, 12]\n",
    "\n",
    "# Calculate drought magnitude for each time scale\n",
    "for time_scale in time_scales:\n",
    "    print(f\"\\nProcessing time scale: {time_scale} months\")\n",
    "    \n",
    "    # Process SPI and SSI files\n",
    "    for data_type, folder in [('spi', spi_folder), ('ssi', ssi_folder)]:\n",
    "        print(f\"  Processing {data_type.upper()} data...\")\n",
    "\n",
    "        # Open the drought event data\n",
    "        event_file = event_folder + f'idn_cli_drought_event_{data_type}_{time_scale}_month.nc'\n",
    "        print(f\"    Opening drought event file: {event_file}\")\n",
    "        ds_drought = xr.open_dataset(event_file)\n",
    "\n",
    "        # Open the corresponding SPI/SSI data\n",
    "        data_file = folder + f'idn_cli_{data_type}_gamma_{time_scale}_month.nc'\n",
    "        print(f\"    Opening {data_type.upper()} data file: {data_file}\")\n",
    "        ds_data = xr.open_dataset(data_file)\n",
    "\n",
    "        # Transpose datasets to respect original dimensions\n",
    "        ds_drought = ds_drought.transpose('time', 'lat', 'lon')\n",
    "        ds_data = ds_data.transpose('time', 'lat', 'lon')\n",
    "\n",
    "        # Apply land mask to drought event data\n",
    "        print(\"    Applying land mask to drought event data...\")\n",
    "        ds_drought['drought_event'] = ds_drought['drought_event'].where(land_mask == 1, other=np.nan)\n",
    "\n",
    "        # Preallocate an empty array for drought_magnitude\n",
    "        drought_magnitude_arr = np.empty_like(ds_drought['drought_event'].values, dtype=float)\n",
    "        drought_magnitude_arr[:] = np.nan\n",
    "\n",
    "        # Calculate drought magnitude\n",
    "        print(\"    Calculating drought magnitude...\")\n",
    "        for i in tqdm(range(1, len(ds_drought.time)), desc=\"      Processing\", unit=\" timestep\"):\n",
    "            drought_event = ds_drought['drought_event'].isel(time=i).values\n",
    "            data_value = ds_data[f'{data_type}_gamma_{time_scale}_month'].isel(time=i).values\n",
    "            drought_magnitude_arr[i] = np.where(drought_event == 1, abs(data_value) + drought_magnitude_arr[i-1], 0)\n",
    "\n",
    "        # Create a new DataArray for drought magnitude\n",
    "        print(\"    Create a new DataArray for drought magnitude...\")\n",
    "        drought_magnitude_da = xr.DataArray(drought_magnitude_arr, \n",
    "                                            coords={'time': ds_drought.time, 'lat': ds_drought.lat, 'lon': ds_drought.lon}, \n",
    "                                            dims=['time', 'lat', 'lon'],\n",
    "                                            name=f'drought_magnitude_{data_type}')\n",
    "\n",
    "        # Apply land mask to the calculated drought magnitude\n",
    "        drought_magnitude_da = drought_magnitude_da.where(land_mask == 1, other=np.nan)\n",
    "\n",
    "        # Set attributes according to CF Conventions\n",
    "        drought_magnitude_da.attrs['standard_name'] = f'drought_magnitude_{data_type}'\n",
    "        drought_magnitude_da.attrs['long_name'] = f'Drought Magnitude for {data_type.upper()}, {time_scale}-month timescale'\n",
    "        drought_magnitude_da.attrs['units'] = '1'  # Assuming unitless\n",
    "\n",
    "        # Create a new dataset\n",
    "        ds_magnitude = xr.Dataset({f'drought_magnitude_{data_type}': drought_magnitude_da})\n",
    "\n",
    "        # Save the new drought magnitude variable to a new netCDF file\n",
    "        output_file = magnitude_folder + f'idn_cli_drought_magnitude_{data_type}_{time_scale}_month.nc'\n",
    "        print(f\"    Saving drought magnitude file: {output_file}\")\n",
    "        ds_magnitude.to_netcdf(output_file)\n",
    "\n",
    "        print(f\"  Drought magnitude calculations completed for {data_type.upper()}, time scale {time_scale} months\")\n",
    "\n",
    "print(\"\\nAll processes completed!\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c32afb77-c802-4f61-aaa1-25203be06d1a",
   "metadata": {},
   "source": [
    "## 2. Noise Filtering with Singular Spectrum Analysis\n",
    "\n",
    "This part of our analysis focuses on applying Singular Spectrum Analysis (SSA) for noise filtering and trend extraction in drought magnitude data in SPI and SSI datasets. In drought propagation analysis, noise filtering with SSA is a critical step for data preparation. SSA effectively separates the underlying signal from the noise in climate datasets, such as Standardized Precipitation Index (SPI) and Standardized Streamflow Index (SSI). This process is crucial for enhancing the clarity and accuracy of the data, which in turn facilitates a more precise understanding of drought patterns and their progression. \n",
    "\n",
    "By filtering out the noise, SSA aids in identifying genuine drought trends and magnitudes, providing a reliable basis for subsequent analysis of drought propagation and its potential impacts. This enhanced data quality is instrumental in developing effective drought management strategies and improving predictive models. \n",
    "\n",
    "Key steps and features of this script include:\n",
    "\n",
    "* **SSA Implementation:** Utilizes the `seasonal_decompose` function from the `statsmodels` library for SSA, effectively decomposing time series data into trend, seasonal, and residual components.\n",
    "* **Data Preparation and Processing:**\n",
    "    * Converts xarray DataArrays to pandas DataFrames for SSA application.\n",
    "    * Handles missing values in SSI data through forward filling, ensuring data continuity.\n",
    "* **Trend Extraction:** Focuses on extracting the trend component from both SPI and SSI data, considered as the crucial signal of interest.\n",
    "* **Data Transformation:** Transforms the extracted trend components back into xarray DataArrays for further analysis.\n",
    "* **Metadata Assignment:** Assigns appropriate metadata to the trend components, enhancing data understanding and context.\n",
    "* **NetCDF File Output:** Saves the filtered datasets as NetCDF files, adhering to the Climate and Forecast (CF) Conventions, making them ready for subsequent analysis or visualization.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "59565b7b-1a5d-4f9c-8169-a3de9bff9029",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Processing SPI data for time scale: 3 months\n",
      "    Load the dataset...\n",
      "    Select the time period...\n",
      "    Convert to pandas DataFrame...\n",
      "    Applying SSA for noise filtering...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "49abd2966d6d4209bcb3695a19e6e37f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/502 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Convert back to xarray DataArray...\n",
      "    Create xarray Dataset...\n",
      "    Saving filtered dataset: ./prop/03_filtered/idn_cli_drought_magnitude_spi_3_month_filtered.nc\n",
      "    Overwriting existing file: ./prop/03_filtered/idn_cli_drought_magnitude_spi_3_month_filtered.nc\n",
      "    File saved as './prop/03_filtered/idn_cli_drought_magnitude_spi_3_month_filtered.nc'.\n",
      "\n",
      "Processing SSI data for time scale: 3 months\n",
      "    Load the dataset...\n",
      "    Select the time period...\n",
      "    Convert to pandas DataFrame...\n",
      "    Applying SSA for noise filtering...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d89ff68e7d2f44ed8536152ddb01282e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/502 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Convert back to xarray DataArray...\n",
      "    Create xarray Dataset...\n",
      "    Saving filtered dataset: ./prop/03_filtered/idn_cli_drought_magnitude_ssi_3_month_filtered.nc\n",
      "    Overwriting existing file: ./prop/03_filtered/idn_cli_drought_magnitude_ssi_3_month_filtered.nc\n",
      "    File saved as './prop/03_filtered/idn_cli_drought_magnitude_ssi_3_month_filtered.nc'.\n",
      "\n",
      "Processing SPI data for time scale: 6 months\n",
      "    Load the dataset...\n",
      "    Select the time period...\n",
      "    Convert to pandas DataFrame...\n",
      "    Applying SSA for noise filtering...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0c6043217a844ba790602e2ad2d54367",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/499 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Convert back to xarray DataArray...\n",
      "    Create xarray Dataset...\n",
      "    Saving filtered dataset: ./prop/03_filtered/idn_cli_drought_magnitude_spi_6_month_filtered.nc\n",
      "    Overwriting existing file: ./prop/03_filtered/idn_cli_drought_magnitude_spi_6_month_filtered.nc\n",
      "    File saved as './prop/03_filtered/idn_cli_drought_magnitude_spi_6_month_filtered.nc'.\n",
      "\n",
      "Processing SSI data for time scale: 6 months\n",
      "    Load the dataset...\n",
      "    Select the time period...\n",
      "    Convert to pandas DataFrame...\n",
      "    Applying SSA for noise filtering...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "ac15ed7c84ec4edd8085237ec135517e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/499 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Convert back to xarray DataArray...\n",
      "    Create xarray Dataset...\n",
      "    Saving filtered dataset: ./prop/03_filtered/idn_cli_drought_magnitude_ssi_6_month_filtered.nc\n",
      "    Overwriting existing file: ./prop/03_filtered/idn_cli_drought_magnitude_ssi_6_month_filtered.nc\n",
      "    File saved as './prop/03_filtered/idn_cli_drought_magnitude_ssi_6_month_filtered.nc'.\n",
      "\n",
      "Processing SPI data for time scale: 9 months\n",
      "    Load the dataset...\n",
      "    Select the time period...\n",
      "    Convert to pandas DataFrame...\n",
      "    Applying SSA for noise filtering...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2e8c1675e32f49dd9006dd0d5b837f00",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/496 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Convert back to xarray DataArray...\n",
      "    Create xarray Dataset...\n",
      "    Saving filtered dataset: ./prop/03_filtered/idn_cli_drought_magnitude_spi_9_month_filtered.nc\n",
      "    Overwriting existing file: ./prop/03_filtered/idn_cli_drought_magnitude_spi_9_month_filtered.nc\n",
      "    File saved as './prop/03_filtered/idn_cli_drought_magnitude_spi_9_month_filtered.nc'.\n",
      "\n",
      "Processing SSI data for time scale: 9 months\n",
      "    Load the dataset...\n",
      "    Select the time period...\n",
      "    Convert to pandas DataFrame...\n",
      "    Applying SSA for noise filtering...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "447abb9602d043749e78220d1b199401",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/496 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Convert back to xarray DataArray...\n",
      "    Create xarray Dataset...\n",
      "    Saving filtered dataset: ./prop/03_filtered/idn_cli_drought_magnitude_ssi_9_month_filtered.nc\n",
      "    File saved as './prop/03_filtered/idn_cli_drought_magnitude_ssi_9_month_filtered.nc'.\n",
      "\n",
      "Processing SPI data for time scale: 12 months\n",
      "    Load the dataset...\n",
      "    Select the time period...\n",
      "    Convert to pandas DataFrame...\n",
      "    Applying SSA for noise filtering...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e8f1e97000e14b47803ad8e0968b40d1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/493 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Convert back to xarray DataArray...\n",
      "    Create xarray Dataset...\n",
      "    Saving filtered dataset: ./prop/03_filtered/idn_cli_drought_magnitude_spi_12_month_filtered.nc\n",
      "    File saved as './prop/03_filtered/idn_cli_drought_magnitude_spi_12_month_filtered.nc'.\n",
      "\n",
      "Processing SSI data for time scale: 12 months\n",
      "    Load the dataset...\n",
      "    Select the time period...\n",
      "    Convert to pandas DataFrame...\n",
      "    Applying SSA for noise filtering...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e3d8a22c4a564c2a9de20599c4b27fd7",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/493 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    Convert back to xarray DataArray...\n",
      "    Create xarray Dataset...\n",
      "    Saving filtered dataset: ./prop/03_filtered/idn_cli_drought_magnitude_ssi_12_month_filtered.nc\n",
      "    File saved as './prop/03_filtered/idn_cli_drought_magnitude_ssi_12_month_filtered.nc'.\n",
      "\n",
      "All processing completed.\n"
     ]
    }
   ],
   "source": [
    "import xarray as xr\n",
    "import pandas as pd\n",
    "from tqdm.auto import tqdm\n",
    "from statsmodels.tsa.seasonal import seasonal_decompose\n",
    "import os\n",
    "\n",
    "# Register tqdm with pandas to enable progress_apply\n",
    "tqdm.pandas()\n",
    "\n",
    "# Function to apply SSA\n",
    "def apply_ssa(data, freq):\n",
    "    result = seasonal_decompose(data.dropna(), model='additive', period=freq)\n",
    "    return result.trend\n",
    "\n",
    "# Function to save dataset with CF Conventions\n",
    "def save_dataset(dataset, filename, data_type, time_scale):\n",
    "    if os.path.exists(filename):\n",
    "        print(f\"    Overwriting existing file: {filename}\")\n",
    "        os.remove(filename)\n",
    "\n",
    "    # Set attributes according to CF Conventions\n",
    "    dataset[f'drought_magnitude_{data_type}_filtered'].attrs = {\n",
    "        'standard_name': f'drought_magnitude_{data_type}_filtered',\n",
    "        'long_name': f'Drought Magnitude of {data_type.capitalize()}, {time_scale}-month (Filtered)',\n",
    "        'units': '1',  # SPI and SSI are unitless\n",
    "        'comment': 'Filtered using Singular Spectrum Analysis'\n",
    "    }\n",
    "    dataset.to_netcdf(filename)\n",
    "    print(f\"    File saved as '{filename}'.\")\n",
    "\n",
    "# Define folders and subset file path\n",
    "magnitude_folder = './prop/02_magnitude/'\n",
    "subset_file = './subset/idn_subset_chirps.nc'\n",
    "filtered_folder = './prop/03_filtered/'\n",
    "\n",
    "# Ensure the filtered folder exists\n",
    "os.makedirs(filtered_folder, exist_ok=True)\n",
    "\n",
    "# Load the land mask from the subset file\n",
    "land_mask = xr.open_dataset(subset_file)['land']\n",
    "\n",
    "# Define time scales and data types\n",
    "time_scales = [3, 6, 9, 12]\n",
    "data_types = ['spi', 'ssi']\n",
    "\n",
    "for time_scale in time_scales:\n",
    "    for data_type in data_types:\n",
    "        print(f\"\\nProcessing {data_type.upper()} data for time scale: {time_scale} months\")\n",
    "\n",
    "        # Load the dataset\n",
    "        print(\"    Load the dataset...\")\n",
    "        file_path = magnitude_folder + f'idn_cli_drought_magnitude_{data_type}_{time_scale}_month.nc'\n",
    "        ds = xr.open_dataset(file_path)\n",
    "        ds[f'drought_magnitude_{data_type}'] = ds[f'drought_magnitude_{data_type}'].where(land_mask == 1, other=np.nan)\n",
    "\n",
    "        # Select data starting from the appropriate month\n",
    "        print(\"    Select the time period...\")\n",
    "        start_year = '1981'\n",
    "        start_date = f\"{start_year}-{str(time_scale).zfill(2)}-01\"\n",
    "        ds_selected = ds.sel(time=slice(start_date, '2022-12-31'))\n",
    "\n",
    "        # Convert to pandas DataFrame\n",
    "        print(\"    Convert to pandas DataFrame...\")\n",
    "        df = ds_selected.to_dataframe().unstack('time')[f'drought_magnitude_{data_type}']\n",
    "        df_filled = df.fillna(method='ffill').fillna(method='bfill')\n",
    "\n",
    "        # Apply SSA with tqdm progress bar\n",
    "        print(\"    Applying SSA for noise filtering...\")\n",
    "        df_filtered = df_filled.progress_apply(lambda x: apply_ssa(x, freq=12), axis=0)\n",
    "\n",
    "        # Convert back to xarray DataArray\n",
    "        print(\"    Convert back to xarray DataArray...\")\n",
    "        df_filtered = df_filtered.stack().reset_index()\n",
    "        df_filtered = df_filtered.rename(columns={'level_0': 'lat', 'level_1': 'lon', 0: 'filtered_data'})\n",
    "        filtered_da = df_filtered.set_index(['time', 'lat', 'lon']).to_xarray()['filtered_data']\n",
    "\n",
    "        # Realign the land mask with the filtered data\n",
    "        aligned_land_mask = land_mask.reindex_like(filtered_da, method='nearest')\n",
    "\n",
    "        # Apply the aligned land mask to the filtered data\n",
    "        filtered_da_masked = filtered_da.where(aligned_land_mask == 1, other=np.nan)\n",
    "\n",
    "        # Create xarray Dataset\n",
    "        print(\"    Create xarray Dataset...\")\n",
    "        ds_filtered = xr.Dataset({f'drought_magnitude_{data_type}_filtered': filtered_da_masked})\n",
    "\n",
    "        # Save the filtered dataset as NetCDF file with attributes\n",
    "        output_file = filtered_folder + f'idn_cli_drought_magnitude_{data_type}_{time_scale}_month_filtered.nc'\n",
    "        print(f\"    Saving filtered dataset: {output_file}\")\n",
    "        save_dataset(ds_filtered, output_file, data_type, time_scale)\n",
    "\n",
    "print(\"\\nAll processing completed.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff978ecb-9d93-4173-ab0e-06579c27904a",
   "metadata": {},
   "source": [
    "## 3. Cross-Correlation Analysis\n",
    "\n",
    "Cross-Correlation Analysis, especially when applied to data refined through Singular Spectrum Analysis (SSA) noise filtering, is pivotal in understanding drought propagation. This technique examines the relationship between different drought indicators across various time scales. \n",
    "\n",
    "By utilizing data filtered through SSA, which isolates the core signal from noise, Cross-Correlation Analysis can more accurately determine the time lag and intensity with which meteorological droughts (indicated by SPI) translate into hydrological droughts (indicated by SSI). This approach is essential for predicting the onset and progression of drought conditions, enabling timely decision-making and effective resource management to mitigate the adverse impacts of droughts.\n",
    "\n",
    "Key steps and features of this script include:\n",
    "\n",
    "* **Loading Data:** Imports meteorological and hydrological drought indices from NetCDF files.\n",
    "* **Land Mask Application:** Utilizes a land mask to focus analysis on terrestrial areas, excluding ocean or sea regions.\n",
    "* **Cross-Correlation Calculation:**\n",
    "    * Computes the correlation for various time lags (up to 12 months).\n",
    "    * Analyzes each grid cell separately, ensuring detailed spatial analysis.\n",
    "* **Result Processing:** Identifies the time lag with the maximum correlation at each location, providing insights into drought propagation.\n",
    "* **Data Array Creation:** Constructs xarray DataArrays for both lag time and correlation strength, including appropriate metadata.\n",
    "* **NetCDF Output:** Saves the processed data as a NetCDF file, suitable for further analysis or visualization.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "2ac9816e-20bb-45d5-9ed7-98f204cdf522",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing SPI (Time Scale: 3) and SSI (Time Scale: 3)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 3 & SSI 3: 100%|██████████████████████████████████████████████████████| 12/12 [20:01<00:00, 100.14s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 3 & 3 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 3) and SSI (Time Scale: 6)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 3 & SSI 6: 100%|███████████████████████████████████████████████████████| 12/12 [19:30<00:00, 97.51s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 3 & 6 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 3) and SSI (Time Scale: 9)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 3 & SSI 9: 100%|███████████████████████████████████████████████████████| 12/12 [19:13<00:00, 96.13s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 3 & 9 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 3) and SSI (Time Scale: 12)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 3 & SSI 12: 100%|██████████████████████████████████████████████████████| 12/12 [19:09<00:00, 95.78s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 3 & 12 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 6) and SSI (Time Scale: 3)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 6 & SSI 3: 100%|███████████████████████████████████████████████████████| 12/12 [19:21<00:00, 96.78s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 6 & 3 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 6) and SSI (Time Scale: 6)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 6 & SSI 6: 100%|███████████████████████████████████████████████████████| 12/12 [19:13<00:00, 96.14s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 6 & 6 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 6) and SSI (Time Scale: 9)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 6 & SSI 9: 100%|███████████████████████████████████████████████████████| 12/12 [19:18<00:00, 96.52s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 6 & 9 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 6) and SSI (Time Scale: 12)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 6 & SSI 12: 100%|██████████████████████████████████████████████████████| 12/12 [19:09<00:00, 95.83s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 6 & 12 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 9) and SSI (Time Scale: 3)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 9 & SSI 3: 100%|███████████████████████████████████████████████████████| 12/12 [19:19<00:00, 96.62s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 9 & 3 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 9) and SSI (Time Scale: 6)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 9 & SSI 6: 100%|███████████████████████████████████████████████████████| 12/12 [19:13<00:00, 96.15s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 9 & 6 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 9) and SSI (Time Scale: 9)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 9 & SSI 9: 100%|███████████████████████████████████████████████████████| 12/12 [19:11<00:00, 95.93s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 9 & 9 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 9) and SSI (Time Scale: 12)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 9 & SSI 12: 100%|██████████████████████████████████████████████████████| 12/12 [19:07<00:00, 95.63s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 9 & 12 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 12) and SSI (Time Scale: 3)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 12 & SSI 3: 100%|██████████████████████████████████████████████████████| 12/12 [19:10<00:00, 95.88s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 12 & 3 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 12) and SSI (Time Scale: 6)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 12 & SSI 6: 100%|██████████████████████████████████████████████████████| 12/12 [19:08<00:00, 95.74s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 12 & 6 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 12) and SSI (Time Scale: 9)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 12 & SSI 9: 100%|██████████████████████████████████████████████████████| 12/12 [19:06<00:00, 95.58s/lag]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 12 & 9 processing completed. Output saved.\n",
      "Processing SPI (Time Scale: 12) and SSI (Time Scale: 12)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing lags for SPI 12 & SSI 12: 100%|█████████████████████████████████████████████████████| 12/12 [19:11<00:00, 95.95s/lag]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time scale 12 & 12 processing completed. Output saved.\n",
      "\n",
      "Cross-correlation analysis completed for all time scale combinations.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "import xarray as xr\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import os\n",
    "from tqdm import tqdm\n",
    "\n",
    "# Define folders\n",
    "filtered_folder = './prop/03_filtered/'  # Filtered drought magnitude\n",
    "correlation_base_folder = './prop/04_correlation/'\n",
    "subset_file = './subset/idn_subset_chirps.nc'\n",
    "\n",
    "# Load the land mask from the subset file\n",
    "land_mask = xr.open_dataset(subset_file)['land']\n",
    "\n",
    "# Define time scales and lags\n",
    "time_scales = [3, 6, 9, 12]\n",
    "max_lag = 12\n",
    "\n",
    "# Function to calculate cross-correlation\n",
    "def calculate_cross_correlation(magnitude_spi_data, magnitude_ssi_data, lag):\n",
    "    corr_values = np.full((magnitude_spi_data.shape[1], magnitude_spi_data.shape[2]), np.nan)\n",
    "    for lat in range(magnitude_spi_data.shape[1]):\n",
    "        for lon in range(magnitude_spi_data.shape[2]):\n",
    "            if land_mask[lat, lon]:  # Only calculate for land points\n",
    "                magnitude_spi_series = magnitude_spi_data[:-lag or None, lat, lon]\n",
    "                magnitude_ssi_series = magnitude_ssi_data[lag:, lat, lon]\n",
    "                valid_idx = ~np.isnan(magnitude_spi_series) & ~np.isnan(magnitude_ssi_series)\n",
    "                if valid_idx.any():\n",
    "                    corr = np.corrcoef(magnitude_spi_series[valid_idx], magnitude_ssi_series[valid_idx])[0, 1]\n",
    "                    corr_values[lat, lon] = corr\n",
    "    return corr_values\n",
    "\n",
    "# Function to align datasets to the larger time scale\n",
    "def align_to_larger_time_scale(ds1, ds2, time_scale1, time_scale2):\n",
    "    larger_time_scale = max(time_scale1, time_scale2)\n",
    "    start_year = '1981'\n",
    "    start_date = f\"{start_year}-{str(larger_time_scale).zfill(2)}-01\"\n",
    "    end_date = '2022-12-31'\n",
    "    ds1_aligned = ds1.sel(time=slice(start_date, end_date))\n",
    "    ds2_aligned = ds2.sel(time=slice(start_date, end_date))\n",
    "    return ds1_aligned, ds2_aligned\n",
    "\n",
    "# Process each time scale combination\n",
    "for magnitude_spi_time_scale in time_scales:\n",
    "    for magnitude_ssi_time_scale in time_scales:\n",
    "        time_scale_folder = f\"spi{magnitude_spi_time_scale:02d}_ssi{magnitude_ssi_time_scale:02d}\"\n",
    "        correlation_folder = os.path.join(correlation_base_folder, time_scale_folder)\n",
    "        os.makedirs(correlation_folder, exist_ok=True)\n",
    "\n",
    "        print(f\"Processing SPI (Time Scale: {magnitude_spi_time_scale}) and SSI (Time Scale: {magnitude_ssi_time_scale})\")\n",
    "\n",
    "        # Load drought magnitude datasets\n",
    "        magnitude_spi_file = filtered_folder + f'idn_cli_drought_magnitude_spi_{magnitude_spi_time_scale}_month_filtered.nc'\n",
    "        magnitude_ssi_file = filtered_folder + f'idn_cli_drought_magnitude_ssi_{magnitude_ssi_time_scale}_month_filtered.nc'\n",
    "        magnitude_spi_ds = xr.open_dataset(magnitude_spi_file)\n",
    "        magnitude_ssi_ds = xr.open_dataset(magnitude_ssi_file)\n",
    "\n",
    "        # Align datasets to the larger time scale\n",
    "        magnitude_spi_data, magnitude_ssi_data = align_to_larger_time_scale(\n",
    "            magnitude_spi_ds[f'drought_magnitude_spi_filtered'],\n",
    "            magnitude_ssi_ds[f'drought_magnitude_ssi_filtered'],\n",
    "            magnitude_spi_time_scale,\n",
    "            magnitude_ssi_time_scale\n",
    "        )\n",
    "\n",
    "        # Apply land mask\n",
    "        magnitude_spi_data = magnitude_spi_data.where(land_mask == 1)\n",
    "        magnitude_ssi_data = magnitude_ssi_data.where(land_mask == 1)\n",
    "\n",
    "        # Calculate and save correlation for each lag\n",
    "        for lag in tqdm(range(1, max_lag + 1), desc=f\"Processing lags for SPI {magnitude_spi_time_scale} & SSI {magnitude_ssi_time_scale}\", unit=\"lag\"):\n",
    "            corr_values = calculate_cross_correlation(magnitude_spi_data.values, magnitude_ssi_data.values, lag)\n",
    "            corr_da = xr.DataArray(corr_values, coords={'lat': magnitude_spi_data.lat, 'lon': magnitude_spi_data.lon}, dims=['lat', 'lon'])\n",
    "            corr_ds = xr.Dataset({'correlation': corr_da})\n",
    "            output_file = os.path.join(correlation_folder, f\"idn_cli_correlation_{time_scale_folder}_lag{lag:02d}.nc\")\n",
    "            corr_ds.to_netcdf(output_file)\n",
    "\n",
    "        print(f\"Time scale {magnitude_spi_time_scale} & {magnitude_ssi_time_scale} processing completed. Output saved.\")\n",
    "\n",
    "print(\"\\nCross-correlation analysis completed for all time scale combinations.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63846d1d-9e4d-4072-8bfd-f97024b8d443",
   "metadata": {},
   "source": [
    "## 4. Frequency Analysis\n",
    "\n",
    "In the context of drought propagation analysis, frequency analysis plays a critical role in identifying the most prominent patterns of correlation between meteorological and hydrological drought indicators over time. By classifying cross-correlation values into distinct ranges (e.g., 0.0-0.1, 0.1-0.2, etc.) and analyzing these across different lag times, researchers can pinpoint the range that most frequently occurs. \n",
    "\n",
    "This approach helps in understanding the typical strength of correlation and the temporal shift (lag) between the onset of meteorological drought (as indicated by SPI) and its subsequent impact on hydrological conditions (as indicated by SSI). The most frequent range provides insights into the commonality of correlation strengths, while the corresponding lag sheds light on the typical delay between atmospheric changes and their effects on hydrological systems. \n",
    "\n",
    "**Let's convert the result into GeoTIFF files**\n",
    "\n",
    "Getting statistical value within the netCDF files is faster, but it can be very slow if it required multiple arrays to be process. So far doing the similar analysis using GeoTIFF files as input is better.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "dc93733c-a866-49d8-9859-e33ae9eb7867",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi03_ssi03\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 16.04it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi03_ssi06\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 17.87it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi03_ssi09\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 19.45it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi03_ssi12\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 13.20it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi06_ssi03\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 17.72it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi06_ssi06\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 18.00it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi06_ssi09\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 19.18it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi06_ssi12\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 15.43it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi09_ssi03\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 13.41it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi09_ssi06\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 20.53it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi09_ssi09\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 17.20it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi09_ssi12\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 19.14it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi12_ssi03\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 12.69it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi12_ssi06\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 28.97it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi12_ssi09\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:01<00:00, 18.50it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Processing folder: spi12_ssi12\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Converting files: 100%|█████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 26.64it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Conversion to GeoTIFF completed for all files.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import rasterio\n",
    "import xarray as xr\n",
    "from rasterio.transform import Affine\n",
    "from tqdm import tqdm\n",
    "\n",
    "def convert_nc_to_geotiff(nc_file, geotiff_file):\n",
    "    with xr.open_dataset(nc_file) as ds:\n",
    "        # Assuming 'correlation' is the variable to be converted\n",
    "        data = ds['correlation'].values\n",
    "        lat = ds['lat'].values\n",
    "        lon = ds['lon'].values\n",
    "\n",
    "        # Calculate resolution\n",
    "        res_lat = (lat[-1] - lat[0]) / (len(lat) - 1)\n",
    "        res_lon = (lon[-1] - lon[0]) / (len(lon) - 1)\n",
    "\n",
    "        # Check the order of the latitudes to determine if we need to flip the array\n",
    "        if lat[0] > lat[-1]:  # latitudes are in descending order\n",
    "            data = data[::-1, :]  # flip the data array vertically\n",
    "            transform = Affine.translation(lon[0] - res_lon / 2, lat[-1] - res_lat / 2) * Affine.scale(res_lon, -res_lat)\n",
    "        else:\n",
    "            transform = Affine.translation(lon[0] - res_lon / 2, lat[0] - res_lat / 2) * Affine.scale(res_lon, res_lat)\n",
    "\n",
    "        # Write to GeoTIFF\n",
    "        with rasterio.open(\n",
    "            geotiff_file,\n",
    "            'w',\n",
    "            driver='GTiff',\n",
    "            height=data.shape[0],\n",
    "            width=data.shape[1],\n",
    "            count=1,\n",
    "            dtype=str(data.dtype),\n",
    "            crs='+proj=latlong',\n",
    "            transform=transform,\n",
    "        ) as new_dataset:\n",
    "            new_dataset.write(data, 1)\n",
    "\n",
    "correlation_base_folder = './prop/04_correlation/'\n",
    "\n",
    "# Iterate over each time scale folder\n",
    "for time_scale_folder in os.listdir(correlation_base_folder):\n",
    "    folder_path = os.path.join(correlation_base_folder, time_scale_folder)\n",
    "    if os.path.isdir(folder_path):\n",
    "        print(f\"Processing folder: {time_scale_folder}\")\n",
    "        for nc_file in tqdm(os.listdir(folder_path), desc=\"Converting files\"):\n",
    "            if nc_file.endswith('.nc'):\n",
    "                nc_file_path = os.path.join(folder_path, nc_file)\n",
    "                geotiff_file_path = nc_file_path.replace('.nc', '.tif')\n",
    "                convert_nc_to_geotiff(nc_file_path, geotiff_file_path)\n",
    "\n",
    "print(\"\\nConversion to GeoTIFF completed for all files.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f3304f5-1a47-47c3-8f90-fadd51dde0f4",
   "metadata": {},
   "source": [
    "**Calculate the statistics**\n",
    "\n",
    "Calculate the maximum and the most frequent correlation, along with lag where the correlation is observe.\n",
    "\n",
    "The script below will do:\n",
    "\n",
    "* **Classify Correlation Values:** Classify the correlation values into bins (e.g., 0.0-0.1, 0.1-0.2, ... up to 1.0) for each lag.\n",
    "* **Find the Most Frequent Range:** Determine the most frequently occurring range across all lags.\n",
    "* **Identify the Most Frequent Lag:** Within the most frequent range, identify the most frequent lag.\n",
    "* **Save the Results:** Save the most frequent range and corresponding lag as NetCDF files following CF conventions.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "2f57b6b8-edb6-4c82-90c7-04317d661445",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing time scale combinations:   0%|                                                                | 0/16 [00:00<?, ?it/s]/var/folders/9k/rftc85d17g90vcrwv_ctcd140000gp/T/ipykernel_78724/511869154.py:54: RuntimeWarning: invalid value encountered in cast\n",
      "  bin_indices = np.floor(corr_data * 10).astype(np.int32)\n",
      "Processing time scale combinations: 100%|███████████████████████████████████████████████████████| 16/16 [00:10<00:00,  1.49it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Frequency analysis completed for all time scale combinations.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "import rasterio\n",
    "from tqdm import tqdm\n",
    "\n",
    "# Define folders\n",
    "correlation_base_folder = './prop/04_correlation/'\n",
    "temp_output_folder = './prop/05_frequency/temp/'\n",
    "max_frequency_folder = './prop/05_frequency/max/'\n",
    "mode_frequency_folder = './prop/05_frequency/mode/'\n",
    "\n",
    "# Ensure the output folder exists\n",
    "os.makedirs(temp_output_folder, exist_ok=True)\n",
    "\n",
    "# Time scale combinations\n",
    "time_scale_combinations = [\n",
    "    \"spi03_ssi03\", \"spi06_ssi03\", \"spi09_ssi03\", \"spi12_ssi03\",\n",
    "    \"spi03_ssi06\", \"spi06_ssi06\", \"spi09_ssi06\", \"spi12_ssi06\",\n",
    "    \"spi03_ssi09\", \"spi06_ssi09\", \"spi09_ssi09\", \"spi12_ssi09\",\n",
    "    \"spi03_ssi12\", \"spi06_ssi12\", \"spi09_ssi12\", \"spi12_ssi12\"\n",
    "]\n",
    "\n",
    "def process_time_scale_folder(time_scale_folder):\n",
    "    folder_path = os.path.join(correlation_base_folder, time_scale_folder)\n",
    "\n",
    "    # Initialize arrays to store maximum and most frequent values\n",
    "    max_corr_value = None\n",
    "    max_corr_lag = None\n",
    "    freq_corr_bins = None\n",
    "    freq_corr_lag = None\n",
    "\n",
    "    # Process each lag file\n",
    "    for lag in range(1, 13):\n",
    "        file_path = os.path.join(folder_path, f\"idn_cli_correlation_{time_scale_folder}_lag{str(lag).zfill(2)}.tif\")\n",
    "        if os.path.exists(file_path):\n",
    "            with rasterio.open(file_path) as src:\n",
    "                corr_data = src.read(1)\n",
    "\n",
    "                # Initialize arrays on the first iteration\n",
    "                if max_corr_value is None:\n",
    "                    max_corr_value = np.copy(corr_data)\n",
    "                    max_corr_lag = np.full(corr_data.shape, 1, dtype=np.float32)  # Initialize with 1\n",
    "                    freq_corr_bins = np.zeros((corr_data.shape[0], corr_data.shape[1], 10), dtype=np.int32)\n",
    "                    freq_corr_lag = np.full(corr_data.shape, np.nan, dtype=np.float32)  # Initialize as float32 with NaN\n",
    "                    profile = src.profile\n",
    "\n",
    "                # Update maximum correlation and lag\n",
    "                valid_mask = ~np.isnan(corr_data)\n",
    "                new_max = (corr_data > max_corr_value) & valid_mask\n",
    "                max_corr_value[new_max] = corr_data[new_max]\n",
    "                max_corr_lag[new_max] = lag\n",
    "\n",
    "                # Update frequency bins\n",
    "                bin_indices = np.floor(corr_data * 10).astype(np.int32)\n",
    "                bin_indices = np.clip(bin_indices, 0, 9)  # Ensure bin indices are within the range [0, 9]\n",
    "                np.add.at(freq_corr_bins, (np.arange(corr_data.shape[0])[:, None], np.arange(corr_data.shape[1]), bin_indices), 1)\n",
    "\n",
    "    # Calculate the most frequent correlation range and its lag\n",
    "    freq_corr_value = np.nanmax(freq_corr_bins, axis=2) / 12.0\n",
    "    freq_corr_lag = np.argmax(freq_corr_bins, axis=2) + 1\n",
    "\n",
    "    # Save results in the temporary folder\n",
    "    for name, data in [('max_corr_value', max_corr_value), ('max_corr_lag', max_corr_lag),\n",
    "                       ('freq_corr_value', freq_corr_value), ('freq_corr_lag', freq_corr_lag)]:\n",
    "        output_file = os.path.join(temp_output_folder, f'idn_cli_{name}_{time_scale_folder}.tif')\n",
    "        with rasterio.open(output_file, 'w', **profile) as dst:\n",
    "            dst.write(data, 1)\n",
    "\n",
    "for time_scale_folder in tqdm(time_scale_combinations, desc=\"Processing time scale combinations\"):\n",
    "    process_time_scale_folder(time_scale_folder)\n",
    "\n",
    "print(\"\\nFrequency analysis completed for all time scale combinations.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84bf600e-4219-49a2-ae0b-d79bc5bc8f72",
   "metadata": {},
   "source": [
    "**Remove the NaN**\n",
    "\n",
    "From above process, the output still have unwanted value, usually in non land areas. Let's remove it.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "fe835ca4-49c2-4676-9272-b075c8b93d08",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Cleaned GeoTIFF outputs saved to respective folders.\n",
      "\n",
      "Temporary files have been removed.\n"
     ]
    }
   ],
   "source": [
    "import rasterio\n",
    "import os\n",
    "import shutil\n",
    "\n",
    "# Define folders\n",
    "max_frequency_folder = './prop/05_frequency/max/'\n",
    "mode_frequency_folder = './prop/05_frequency/mode'\n",
    "temp_output_folder = './prop/05_frequency/temp/'\n",
    "\n",
    "# Ensure the output folders exist\n",
    "os.makedirs(max_frequency_folder, exist_ok=True)\n",
    "os.makedirs(mode_frequency_folder, exist_ok=True)\n",
    "\n",
    "# Define folders and combinations\n",
    "correlation_base_folder = './prop/04_correlation/'\n",
    "time_scale_combinations = [\n",
    "    \"spi03_ssi03\", \"spi06_ssi03\", \"spi09_ssi03\", \"spi12_ssi03\",\n",
    "    \"spi03_ssi06\", \"spi06_ssi06\", \"spi09_ssi06\", \"spi12_ssi06\",\n",
    "    \"spi03_ssi09\", \"spi06_ssi09\", \"spi09_ssi09\", \"spi12_ssi09\",\n",
    "    \"spi03_ssi12\", \"spi06_ssi12\", \"spi09_ssi12\", \"spi12_ssi12\"\n",
    "]\n",
    "\n",
    "# Choose a sample file from one of the time scale combinations to create the land mask\n",
    "sample_time_scale = time_scale_combinations[0]  # Using the first time scale combination for the sample\n",
    "sample_folder_path = os.path.join(correlation_base_folder, sample_time_scale)\n",
    "\n",
    "# Find the first .tif file in the sample folder\n",
    "for file in os.listdir(sample_folder_path):\n",
    "    if file.endswith('.tif'):\n",
    "        sample_file_path = os.path.join(sample_folder_path, file)\n",
    "        break\n",
    "\n",
    "# Load the sample file to create a land mask\n",
    "with rasterio.open(sample_file_path) as mask_src:\n",
    "    land_mask = ~np.isnan(mask_src.read(1))  # Non-land areas will be NaN\n",
    "\n",
    "# Iterate through the files in the temp_output_folder\n",
    "for filename in os.listdir(temp_output_folder):\n",
    "    if filename.endswith('.tif'):\n",
    "        file_path = os.path.join(temp_output_folder, filename)\n",
    "\n",
    "        with rasterio.open(file_path) as src:\n",
    "            data = src.read(1)\n",
    "            profile = src.profile\n",
    "\n",
    "            # Apply the land mask to the data\n",
    "            data[~land_mask] = np.nan  # Set non-land areas to NaN\n",
    "\n",
    "            # Determine the output folder\n",
    "            if 'max_corr' in filename:\n",
    "                output_folder = max_frequency_folder\n",
    "            elif 'freq_corr' in filename:\n",
    "                output_folder = mode_frequency_folder\n",
    "            else:\n",
    "                continue  # Skip files that don't match the criteria\n",
    "\n",
    "            # Save the cleaned data\n",
    "            output_file_path = os.path.join(output_folder, filename)\n",
    "            with rasterio.open(output_file_path, 'w', **profile) as dst:\n",
    "                dst.write(data, 1)\n",
    "\n",
    "print(\"\\nCleaned GeoTIFF outputs saved to respective folders.\")\n",
    "\n",
    "# Remove all files in the temp_output_folder\n",
    "for filename in os.listdir(temp_output_folder):\n",
    "    file_path = os.path.join(temp_output_folder, filename)\n",
    "    try:\n",
    "        if os.path.isfile(file_path) or os.path.islink(file_path):\n",
    "            os.unlink(file_path)\n",
    "        elif os.path.isdir(file_path):\n",
    "            shutil.rmtree(file_path)\n",
    "    except Exception as e:\n",
    "        print(f'Failed to delete {file_path}. Reason: {e}')\n",
    "\n",
    "print(\"\\nTemporary files have been removed.\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be0feb35-b3ef-4fc3-afd5-9ee201b8d6e4",
   "metadata": {},
   "source": [
    "## 5. Visualisation\n",
    "\n",
    "The visualization script is designed to illustrate the results of the cross-correlation analysis between meteorological (SPI) and hydrological (SSI) droughts. Key features of the script include:\n",
    "\n",
    "* **Lag Map Creation:** This map displays the time lag (in months) between meteorological and hydrological droughts across the study area. It helps identify regions where hydrological responses to meteorological changes are immediate or delayed.\n",
    "\n",
    "* **Strength Map Generation:** This map shows the strength of the correlation between meteorological and hydrological droughts. It highlights areas with a strong predictive relationship, indicating regions sensitive to meteorological changes.\n",
    "\n",
    "* **Spatial Insight:** Both maps together provide a spatial understanding of drought dynamics, highlighting areas of resilience and vulnerability.\n",
    "\n",
    "* **Practical Application:** The visualizations aid in drought management by pinpointing critical areas for focused attention and intervention.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "id": "0bbde80d-714c-4093-976d-2325a55469cf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Creating plot on the Maximum Correlation for ['spi03_ssi03', 'spi06_ssi03', 'spi09_ssi03', 'spi12_ssi03', 'spi03_ssi06', 'spi06_ssi06', 'spi09_ssi06', 'spi12_ssi06', 'spi03_ssi09', 'spi06_ssi09', 'spi09_ssi09', 'spi12_ssi09', 'spi03_ssi12', 'spi06_ssi12', 'spi09_ssi12', 'spi12_ssi12']\n",
      "Saving the plot...\n",
      "Saving the plot...\n",
      "Saving the plot...\n",
      "Saving the plot...\n",
      "Creating plot on the Most Frequent Correlation for ['spi03_ssi03', 'spi06_ssi03', 'spi09_ssi03', 'spi12_ssi03', 'spi03_ssi06', 'spi06_ssi06', 'spi09_ssi06', 'spi12_ssi06', 'spi03_ssi09', 'spi06_ssi09', 'spi09_ssi09', 'spi12_ssi09', 'spi03_ssi12', 'spi06_ssi12', 'spi09_ssi12', 'spi12_ssi12']\n",
      "Saving the plot...\n",
      "Saving the plot...\n",
      "Saving the plot...\n",
      "Saving the plot...\n",
      "Completed!\n"
     ]
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import cartopy.crs as ccrs\n",
    "import rasterio\n",
    "import os\n",
    "import numpy as np\n",
    "import matplotlib.colors as mcolors\n",
    "\n",
    "def plot_maps(folder, combinations, file_prefix, main_title, output_folder):\n",
    "    # Normalization for the colorbars\n",
    "    value_norm = mcolors.BoundaryNorm(boundaries=np.linspace(0, 1.0, 11), ncolors=10)\n",
    "    lag_norm = mcolors.BoundaryNorm(boundaries=np.arange(1, 13), ncolors=12)\n",
    "\n",
    "    # Define color maps for discrete intervals\n",
    "    value_cmap = plt.get_cmap('inferno', 10)\n",
    "    lag_cmap = plt.get_cmap('coolwarm', 12)\n",
    "\n",
    "    for i in range(0, len(combinations), 4):\n",
    "        subset_combinations = combinations[i:i+4]\n",
    "        # Adjust figsize to make the plot more compact\n",
    "        fig, axs = plt.subplots(nrows=4, ncols=2, figsize=(10, 10), subplot_kw={'projection': ccrs.PlateCarree()})\n",
    "\n",
    "        for j, time_scale_folder in enumerate(subset_combinations):\n",
    "            # Load data\n",
    "            value_file = os.path.join(folder, f'idn_cli_{file_prefix}_value_{time_scale_folder}.tif')\n",
    "            lag_file = os.path.join(folder, f'idn_cli_{file_prefix}_lag_{time_scale_folder}.tif')\n",
    "\n",
    "            with rasterio.open(value_file) as src:\n",
    "                value_data = src.read(1)\n",
    "                extent = [src.bounds.left, src.bounds.right, src.bounds.bottom, src.bounds.top]\n",
    "\n",
    "            with rasterio.open(lag_file) as src:\n",
    "                lag_data = src.read(1)\n",
    "\n",
    "            # Plot max_corr_value\n",
    "            ax_value = axs[j, 0]\n",
    "            ax_value.set_extent(extent, crs=ccrs.PlateCarree())\n",
    "            ax_value.coastlines()\n",
    "            im_value = ax_value.imshow(value_data, extent=extent, transform=ccrs.PlateCarree(), cmap=value_cmap, norm=value_norm, origin='upper')\n",
    "            ax_value.set_title(f'{time_scale_folder} - Strength', fontsize=10)\n",
    "            \n",
    "            # Plot max_corr_lag\n",
    "            ax_lag = axs[j, 1]\n",
    "            ax_lag.set_extent(extent, crs=ccrs.PlateCarree())\n",
    "            ax_lag.coastlines()\n",
    "            im_lag = ax_lag.imshow(lag_data, extent=extent, transform=ccrs.PlateCarree(), cmap=lag_cmap, norm=lag_norm, origin='upper')\n",
    "            ax_lag.set_title(f'{time_scale_folder} - Lag', fontsize=10)\n",
    "\n",
    "        plt.suptitle(main_title, fontsize=12, y=0.98)\n",
    "\n",
    "        # Configure and place the colorbar\n",
    "        cbar_ax_value = fig.add_axes([0.125, 0.07, 0.35, 0.01])\n",
    "        cbar_ax_lag = fig.add_axes([0.575, 0.07, 0.35, 0.01])\n",
    "        cbar_value = fig.colorbar(im_value, cax=cbar_ax_value, orientation='horizontal', spacing='proportional')\n",
    "        cbar_lag = fig.colorbar(im_lag, cax=cbar_ax_lag, orientation='horizontal', spacing='proportional')\n",
    "\n",
    "        cbar_value.set_label('Correlation Strength', fontsize=10)\n",
    "        cbar_lag.set_label('Lag (months)', fontsize=10)\n",
    "\n",
    "        # Adjust layout to make room for colorbar and reduce spaces between rows\n",
    "        plt.subplots_adjust(left=0.05, right=0.95, top=0.95, hspace=0.05, wspace=0.1)\n",
    "        # plt.tight_layout(rect=[0, 0.03, 1, 0.95], pad=1.8)\n",
    "\n",
    "        print('Saving the plot...')\n",
    "        plt.savefig(os.path.join(output_folder, f'idn_cli_{file_prefix}_combination_{i//4 + 1}.png'), dpi=300)\n",
    "        plt.close()\n",
    "\n",
    "# Define folders and combinations\n",
    "max_frequency_folder = './prop/05_frequency/max/'\n",
    "mode_frequency_folder = './prop/05_frequency/mode/'\n",
    "time_scale_combinations = [\n",
    "    \"spi03_ssi03\", \"spi06_ssi03\", \"spi09_ssi03\", \"spi12_ssi03\",\n",
    "    \"spi03_ssi06\", \"spi06_ssi06\", \"spi09_ssi06\", \"spi12_ssi06\",\n",
    "    \"spi03_ssi09\", \"spi06_ssi09\", \"spi09_ssi09\", \"spi12_ssi09\",\n",
    "    \"spi03_ssi12\", \"spi06_ssi12\", \"spi09_ssi12\", \"spi12_ssi12\"\n",
    "]\n",
    "output_folder = './prop/images/'\n",
    "os.makedirs(output_folder, exist_ok=True)\n",
    "\n",
    "# Plotting for max_corr\n",
    "print(f'Creating plot on the Maximum Correlation for {time_scale_combinations}')\n",
    "plot_maps(max_frequency_folder, time_scale_combinations, 'max_corr', \n",
    "          'The Strength and Lag of the Maximum Correlation between Meteorological and Hydrological Droughts', output_folder)\n",
    "\n",
    "# Plotting for freq_corr\n",
    "print(f'Creating plot on the Most Frequent Correlation for {time_scale_combinations}')\n",
    "plot_maps(mode_frequency_folder, time_scale_combinations, 'freq_corr', \n",
    "          'The Strength and Lag of the Most Frequent Correlation between Meteorological and Hydrological Droughts', output_folder)\n",
    "\n",
    "print('Completed!')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1cee39f-bc8b-45fd-b928-7327a3464527",
   "metadata": {},
   "source": [
    "End of notebook"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
 }