Earth Engine Image Classification
Created October 8, 2024 21:09
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "55fb1b6f", | |
"metadata": {}, | |
"source": [ | |
"# Image Classification\n", | |
"\n", | |
"## Unsupervised classification\n", | |
"\n", | |
"Unsupervised classification is a machine learning technique that groups similar pixels into classes based on their spectral characteristics. Earth Engine provides the `ee.Clusterer` class for performing unsupervised classification. The supported clustering algorithms include `wekaKMeans`, `wekaXMeans`, `wekaCascadeKMeans`, `wekaCobweb`, and `wekaLVQ`. The general workflow for performing unsupervised classification in Earth Engine is as follows:\n",
"\n", | |
"1. Prepare a multiband image for classification.\n", | |
"2. Generate training samples from the image.\n", | |
"3. Initialize a clusterer and adjust the parameters as needed.\n", | |
"4. Train the clusterer using the training samples.\n", | |
"5. Apply the clusterer to the image.\n", | |
"6. Label the clusters as needed.\n", | |
"7. Export the classified image.\n", | |
"\n", | |
"The following example demonstrates how to perform unsupervised classification on a Landsat 9 image. First, filter the Landsat 9 image collection by a region of interest and date range, select the least cloudy image, and choose the seven spectral bands. Note that a scaling factor of 0.0000275 and an offset of -0.2 are applied to the image to convert the DN values to surface reflectance:"
] | |
}, | |
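The scaling mentioned above follows the Landsat Collection 2 Level-2 convention: reflectance = DN × 0.0000275 − 0.2. A minimal sketch of the conversion for a single pixel value:

```python
# Landsat Collection 2 Level-2 scaling: reflectance = DN * 0.0000275 - 0.2
def dn_to_reflectance(dn):
    """Convert a Collection 2 Level-2 surface reflectance DN to reflectance."""
    return dn * 0.0000275 - 0.2

# A DN of 10000 corresponds to a reflectance of about 0.075.
print(dn_to_reflectance(10000))
```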
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "d8e33cb6", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# %pip install geemap geedim" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "4cfc231a", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import ee\n", | |
"import geemap" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "500bd678", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"ee.Authenticate()\n", | |
"ee.Initialize()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "ec6ae105", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"m = geemap.Map()\n", | |
"\n", | |
"point = ee.Geometry.Point([-88.0664, 41.9411])\n", | |
"\n", | |
"image = (\n", | |
" ee.ImageCollection('LANDSAT/LC09/C02/T1_L2')\n", | |
" .filterBounds(point)\n", | |
" .filterDate('2022-01-01', '2022-12-31')\n", | |
" .sort('CLOUD_COVER')\n", | |
" .first()\n", | |
" .select('SR_B[1-7]')\n", | |
")\n", | |
"\n", | |
"region = image.geometry()\n", | |
"image = image.multiply(0.0000275).add(-0.2).set(image.toDictionary())\n", | |
"vis_params = {'min': 0, 'max': 0.3, 'bands': ['SR_B5', 'SR_B4', 'SR_B3']}\n", | |
"\n", | |
"m.center_object(region, 8)\n", | |
"m.add_layer(image, vis_params, \"Landsat-9\")\n", | |
"m" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e9c614f3", | |
"metadata": {}, | |
"source": [ | |
"Next, use the `get_info()` function to inspect the image metadata in a tree structure:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "b7c7498d", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"geemap.get_info(image)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "bade5fe6", | |
"metadata": {}, | |
"source": [ | |
"You can also get an image property using the `get()` method. For example, to get the image acquisition date:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "f9e1b363", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"image.get('DATE_ACQUIRED').getInfo()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "f6cd3cc5", | |
"metadata": {}, | |
"source": [ | |
"To check the image cloud cover:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "b794ae99", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"image.get('CLOUD_COVER').getInfo()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ade557b2", | |
"metadata": {}, | |
"source": [ | |
"With the image ready, you can now generate training samples. There are several ways to define the region in which training samples will be generated:\n",
"\n",
"- Draw a shape (e.g., a rectangle) on the map and then use `region = m.user_roi`\n",
"- Define a geometry, such as `region = ee.Geometry.Rectangle([xmin, ymin, xmax, ymax])`\n",
"- Create a buffer around a point, such as `region = ee.Geometry.Point([x, y]).buffer(v)`\n",
"- If you don't define a region, the image footprint will be used by default\n",
"\n",
"The following code generates 5,000 training samples from the image and adds them to the map. Note that the `region` parameter is not specified, so the image footprint is used as the region:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "7b1da7a2", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"training = image.sample(\n", | |
" **{\n", | |
" # \"region\": region,\n", | |
" 'scale': 30,\n", | |
" 'numPixels': 5000,\n", | |
" 'seed': 0,\n", | |
" 'geometries': True, # Set this to False to ignore geometries\n", | |
" }\n", | |
")\n", | |
"\n", | |
"m.add_layer(training, {}, 'Training samples')\n", | |
"m" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ac9c872b", | |
"metadata": {}, | |
"source": [ | |
"To inspect the attribute table of the training data, use the `ee_to_df()` function on the first few features of the collection:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "24deb7bd", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"geemap.ee_to_df(training.limit(5))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "0fb3082a", | |
"metadata": {}, | |
"source": [ | |
"The training data is ready. Next, you need to initialize a clusterer and train it using the training data. The following code initializes a `wekaKMeans` clusterer and trains it using the training data, specifying the number of clusters (e.g., 5):"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "7353f022", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"n_clusters = 5\n", | |
"clusterer = ee.Clusterer.wekaKMeans(n_clusters).train(training)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "66c65be4", | |
"metadata": {}, | |
"source": [ | |
"With the clusterer trained, you can now apply it to the image. The following code applies the clusterer to the image and adds the classified image to the map with a random color palette:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "755caf17", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"result = image.cluster(clusterer)\n", | |
"m.add_layer(result.randomVisualizer(), {}, 'clusters')\n", | |
"m" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "adafe380", | |
"metadata": {}, | |
"source": [ | |
"Note that the value and color of each cluster are randomly assigned. Use the Inspector tool to inspect the value of each cluster and label the clusters as needed. Define a legend dictionary with pairs of cluster labels and colors, which can be used to create a legend for the classified image:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "4228b5c8", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"legend_dict = {\n", | |
" 'Open Water': '#466b9f',\n", | |
" 'Developed, High Intensity': '#ab0000',\n", | |
" 'Developed, Low Intensity': '#d99282',\n", | |
" 'Forest': '#1c5f2c',\n", | |
" 'Cropland': '#ab6c28'\n", | |
"}\n",
"\n", | |
"palette = list(legend_dict.values())\n", | |
"\n", | |
"m.add_layer(\n", | |
" result, {'min': 0, 'max': 4, 'palette': palette}, 'Labelled clusters'\n", | |
")\n", | |
"m.add_legend(title='Land Cover Type', legend_dict=legend_dict, position='bottomright')\n",
"m" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ae680b89", | |
"metadata": {}, | |
"source": [ | |
"The unsupervised classification result is shown in {numref}`ch06_unsupervised_classification`.\n", | |
"\n", | |
"Finally, you can export the classified image to your computer. Specify the image region, scale, and output file path as needed:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "6f6caacb", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"geemap.download_ee_image(result, filename='unsupervised.tif', region=region, scale=90)"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "a0d1b301", | |
"metadata": {}, | |
"source": [ | |
"## Supervised classification\n", | |
"\n", | |
"Supervised classification is a machine learning technique that uses labeled training data to classify an image. The training data is used to train a classifier, which is then applied to the image to generate a classified image. Earth Engine provides the `ee.Classifier` class for performing supervised classification. The supported supervised classification algorithms include: [Classification and Regression Trees (CART)](https://en.wikipedia.org/wiki/Decision_tree_learning), [Support Vector Machine (SVM)](https://en.wikipedia.org/wiki/Support_vector_machine), [Random Forest](https://en.wikipedia.org/wiki/Random_forest), [Naive Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier), and [Gradient Tree Boost](https://en.wikipedia.org/wiki/Gradient_boosting). The general workflow for supervised classification is as follows:\n", | |
"\n", | |
"1. Prepare an image for classification.\n", | |
"2. Collect training data. Each training sample should have a class label and a set of properties storing numeric values for the predictors.\n", | |
"3. Initialize a classifier and set its parameters as needed.\n", | |
"4. Train the classifier using the training data.\n", | |
"5. Apply the classifier to the image.\n", | |
"6. Perform accuracy assessment.\n", | |
"7. Export the classified image.\n", | |
"\n", | |
"In this section, you will learn how to perform supervised classification using the CART algorithm. You can easily adapt the code to other supervised classification algorithms, such as SVM and Random Forest. We will use labeled training data from the [USGS National Land Cover Database (NLCD)](https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD_RELEASES_2019_REL_NLCD) dataset and train a CART classifier using the training data. The trained classifier will then be applied to the Landsat-8 image to generate a classified image.\n",
"\n", | |
"First, filter the [Landsat 8 image collection](https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2) to select a cloud-free image acquired in 2019 for your region of interest:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "7744225e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"m = geemap.Map()\n", | |
"point = ee.Geometry.Point([-122.4439, 37.7538])\n", | |
"\n", | |
"image = (\n", | |
" ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')\n", | |
" .filterBounds(point)\n", | |
" .filterDate('2019-01-01', '2020-01-01')\n", | |
" .sort('CLOUD_COVER')\n", | |
" .first()\n", | |
" .select('SR_B[1-7]')\n", | |
")\n", | |
"\n", | |
"image = image.multiply(0.0000275).add(-0.2).set(image.toDictionary())\n", | |
"vis_params = {'min': 0, 'max': 0.3, 'bands': ['SR_B5', 'SR_B4', 'SR_B3']}\n", | |
"\n", | |
"m.center_object(point, 8)\n", | |
"m.add_layer(image, vis_params, \"Landsat-8\")\n", | |
"m" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ebe29bb7", | |
"metadata": {}, | |
"source": [ | |
"Use the `get_info()` function to check the image properties:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "3d1b024c", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"geemap.get_info(image)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "52f8013f", | |
"metadata": {}, | |
"source": [ | |
"To get a specific image property, use the `get()` method with the property name as the argument. For example, to retrieve the image acquisition date:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "82abc9ca", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"image.get('DATE_ACQUIRED').getInfo()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "2947735b", | |
"metadata": {}, | |
"source": [ | |
"To check the cloud cover of the image:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "bb65c0d2", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"image.get('CLOUD_COVER').getInfo()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "2834ad7b", | |
"metadata": {}, | |
"source": [ | |
"Next, create a training dataset from the NLCD dataset, which is a 30-m resolution dataset covering the conterminous United States. The NLCD dataset contains 21 land cover classes. A detailed description of each NLCD land cover type can be found at <https://bit.ly/3EyvacV>. The following code filters the NLCD dataset to select the land cover image of interest and clips the dataset to the region of interest (the footprint of the selected Landsat image in the previous step):" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "aa203ea5", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"nlcd = ee.Image('USGS/NLCD_RELEASES/2019_REL/NLCD/2019')\n", | |
"landcover = nlcd.select('landcover').clip(image.geometry())\n", | |
"m.add_layer(landcover, {}, 'NLCD Landcover')\n", | |
"m" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "89e58fbd", | |
"metadata": {}, | |
"source": [ | |
"With the land cover image ready, we can now sample the image to collect training data with land cover labels. Similar to the unsupervised classification introduced above, you can specify a region of interest, scale, and number of points to sample. The following code samples 5000 points from the land cover image:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "b7b4abf8", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"points = landcover.sample(\n", | |
" **{\n", | |
" 'region': image.geometry(),\n", | |
" 'scale': 30,\n", | |
" 'numPixels': 5000,\n", | |
" 'seed': 0,\n", | |
" 'geometries': True,\n", | |
" }\n", | |
")\n", | |
"\n", | |
"m.add_layer(points, {}, 'training', False)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "c13599ff", | |
"metadata": {}, | |
"source": [ | |
"Note that the resulting number of training samples may be less than the specified number of points. This is because the sampling algorithm will discard pixels with no data values. To check the number of training samples:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "369a8a6f", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"print(points.size().getInfo())" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "5c037f8c", | |
"metadata": {}, | |
"source": [ | |
"Revise the number of points to sample in the previous step if needed.\n", | |
"\n", | |
"Next, we will add the spectral bands of the Landsat image to the training data. Note that the training data created from the previous step already contains the land cover labels (i.e., the `landcover` property). The following code adds the seven spectral bands of the Landsat image to the training data:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "9c7e8798", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"bands = ['SR_B1', 'SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']\n", | |
"label = 'landcover'\n", | |
"features = image.select(bands).sampleRegions(\n", | |
" **{'collection': points, 'properties': [label], 'scale': 30}\n", | |
")" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "58a8899e", | |
"metadata": {}, | |
"source": [ | |
"Display the attribute table of the training data using the `ee_to_df()` function:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "50ddd0e4", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"geemap.ee_to_df(features.limit(5))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "76ca5d84", | |
"metadata": {}, | |
"source": [ | |
"The training dataset is ready. You can now train a classifier using the training data. The following code initializes a [CART classifier](https://developers.google.com/earth-engine/apidocs/ee-classifier-smilecart) and trains it using the training data:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "bb844524", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"params = {\n",
"    'features': features,\n",
"    'classProperty': label,\n",
"    'inputProperties': bands,\n",
"}\n",
"classifier = ee.Classifier.smileCart(maxNodes=None).train(**params)"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "4e488e8d", | |
"metadata": {}, | |
"source": [ | |
"The `features` parameter specifies the training data. The `classProperty` parameter specifies the property name of the training data that contains the class labels. The `inputProperties` parameter specifies the property names of the training data that contain the predictor values.\n", | |
"\n", | |
"All Earth Engine classifiers have a `train()` function to train the classifier using the training data. The CART classifier has a `maxNodes` parameter to specify the maximum number of nodes in the tree. The default value is `None`, which means that the tree will be grown until all leaves are pure or until all leaves contain fewer than 5 training samples.\n",
"\n", | |
"Now that the classifier has been trained, you can apply it to the Landsat image to generate a classified image. Make sure you use the same spectral bands as the training data. The following code applies the trained classifier to the selected Landsat image and adds the classified image with a random color palette to the map:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "d3dfa244", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"classified = image.select(bands).classify(classifier).rename('landcover')\n", | |
"m.add_layer(classified.randomVisualizer(), {}, 'Classified')\n", | |
"m" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3f45d515", | |
"metadata": {}, | |
"source": [ | |
"To compare the classified image with the reference NLCD land cover image, it is best to use the same color palette. To set the color palette of an Earth Engine image with a predefined palette, set the `bandname_class_values` and `bandname_class_palette` properties of the image. For example, the NLCD land cover band has the `landcover_class_values` and `landcover_class_palette` properties. When the land cover band is added to the map, the color palette is applied automatically, so users don't have to specify it manually. To check the color palette of the NLCD land cover image:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "8a6720d5", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"geemap.get_info(nlcd)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "eaab7f04", | |
"metadata": {}, | |
"source": [ | |
"We can use the same approach to set the color palette of the classified image. Note that in the previous step, we already renamed the classified image band to `landcover`. Therefore, we can use the `landcover_class_values` and `landcover_class_palette` properties to set the color palette of the classified image. The following code sets the color palette of the classified image:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "8011f6a0", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"class_values = nlcd.get('landcover_class_values')\n", | |
"class_palette = nlcd.get('landcover_class_palette')\n", | |
"classified = classified.set({\n", | |
" 'landcover_class_values': class_values,\n", | |
" 'landcover_class_palette': class_palette\n", | |
"})" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "df8ff2a1", | |
"metadata": {}, | |
"source": [ | |
"The classified image should now have the same color palette as the NLCD land cover image. Add the classified image and associated legend to the map:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "d4b93e1e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"m.add_layer(classified, {}, 'Land cover')\n", | |
"m.add_legend(title=\"Land cover type\", builtin_legend='NLCD')\n", | |
"m" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "fc03ebc2", | |
"metadata": {}, | |
"source": [ | |
"Use the layer control widget to change the opacity of the classified image and compare it visually with the NLCD land cover image. You might need to use full-screen mode, as the legend may block the view of the layer control widget.\n",
"\n", | |
"Finally, you can export the classified image to your computer:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "b6a6ba03", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"geemap.download_ee_image(\n", | |
" classified,\n",
" filename='supervised.tif',\n", | |
" region=image.geometry(),\n", | |
" scale=30\n", | |
" )" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "61d0f965", | |
"metadata": {}, | |
"source": [ | |
"## Accuracy assessment\n", | |
"\n", | |
"After performing image classification, you may want to assess the accuracy of the classification. Earth Engine provides several functions for assessing the accuracy of a classification. In this section, we will classify a Sentinel-2 image using [random forest](https://en.wikipedia.org/wiki/Random_forest) and assess the accuracy of the classification.\n", | |
"\n", | |
"First, filter the Sentinel-2 image collection and select an image for your region of interest:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "f63fe03a", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"m = geemap.Map()\n", | |
"point = ee.Geometry.Point([-122.4439, 37.7538])\n", | |
"\n", | |
"img = (\n", | |
" ee.ImageCollection('COPERNICUS/S2_SR')\n", | |
" .filterBounds(point)\n", | |
" .filterDate('2020-01-01', '2021-01-01')\n", | |
" .sort('CLOUDY_PIXEL_PERCENTAGE')\n", | |
" .first()\n", | |
" .select('B.*')\n", | |
")\n", | |
"\n", | |
"vis_params = {'min': 100, 'max': 3500, 'bands': ['B11', 'B8', 'B3']}\n", | |
"\n", | |
"m.center_object(point, 9)\n", | |
"m.add_layer(img, vis_params, \"Sentinel-2\")\n", | |
"m" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "d3b95aeb", | |
"metadata": {}, | |
"source": [ | |
"The [ESA 10-m WorldCover](https://developers.google.com/earth-engine/datasets/catalog/ESA_WorldCover_v100) can be used to create labeled training data. First, we need to remap the land cover class values to a 0-based sequential series so that we can create a confusion matrix later:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "420d632a", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"lc = ee.Image('ESA/WorldCover/v100/2020')\n", | |
"classValues = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100]\n", | |
"remapValues = ee.List.sequence(0, 10)\n", | |
"label = 'lc'\n", | |
"lc = lc.remap(classValues, remapValues).rename(label).toByte()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b30dc93a", | |
"metadata": {}, | |
"source": [ | |
"Next, add the ESA land cover as a band of the Sentinel-2 reflectance image and sample 100 pixels at a 10m scale from each land cover class within the region of interest:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "2e3a1c42", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sample = img.addBands(lc).stratifiedSample(**{\n", | |
" 'numPoints': 100,\n", | |
" 'classBand': label,\n", | |
" 'region': img.geometry(),\n", | |
" 'scale': 10,\n", | |
" 'geometries': True\n", | |
"})" | |
] | |
}, | |
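Conceptually, stratified sampling draws up to a fixed number of samples from each class. A pure-Python sketch of the idea, using hypothetical class labels and counts (`ee.Image.stratifiedSample()` does this per pixel on the server):

```python
import random

random.seed(0)

# Hypothetical labeled pixels: 500 'water' pixels and 300 'tree' pixels.
pixels = [('water', i) for i in range(500)] + [('tree', i) for i in range(300)]

# Group pixel ids by class, then draw up to 100 samples from each class.
by_class = {}
for label, pixel_id in pixels:
    by_class.setdefault(label, []).append(pixel_id)

sample = {
    label: random.sample(ids, min(100, len(ids)))
    for label, ids in by_class.items()
}
print({label: len(ids) for label, ids in sample.items()})
```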
{ | |
"cell_type": "markdown", | |
"id": "191430b1", | |
"metadata": {}, | |
"source": [ | |
"Add a random value field to the sample and use it to approximately split 80% of the features into a training set and 20% into a validation set:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "af21fdcc", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"sample = sample.randomColumn()\n", | |
"trainingSample = sample.filter('random <= 0.8')\n", | |
"validationSample = sample.filter('random > 0.8')" | |
] | |
}, | |
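The split is approximate because each feature receives an independent uniform random value, so the proportions only converge to 80/20 as the sample grows. A quick local illustration of the same idea, using 1,000 hypothetical samples:

```python
import random

random.seed(42)

# Assign each of 1000 hypothetical samples a uniform random value, then split.
values = [random.random() for _ in range(1000)]
training = [v for v in values if v <= 0.8]
validation = [v for v in values if v > 0.8]

# The counts are close to 800/200 but vary from seed to seed.
print(len(training), len(validation))
```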
{ | |
"cell_type": "markdown", | |
"id": "cd5b8262", | |
"metadata": {}, | |
"source": [ | |
"With the training data ready, we can train a random forest classifier using the training data.\n", | |
"\n", | |
"The following code trains a random forest classifier with 10 trees:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "75243563", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"trainedClassifier = ee.Classifier.smileRandomForest(numberOfTrees=10).train(**{\n", | |
" 'features': trainingSample,\n", | |
" 'classProperty': label,\n", | |
" 'inputProperties': img.bandNames()\n", | |
"})" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "c1cb7ffb", | |
"metadata": {}, | |
"source": [ | |
"To get information about the trained classifier:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "af64d3ad", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"print('Results of trained classifier', trainedClassifier.explain().getInfo())" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "bd5e8acc", | |
"metadata": {}, | |
"source": [ | |
"To get a confusion matrix and overall accuracy for the training sample:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "e3e4e434", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"trainAccuracy = trainedClassifier.confusionMatrix()\n", | |
"trainAccuracy.getInfo()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "03ba6671", | |
"metadata": {}, | |
"source": [ | |
"```text\n", | |
"[[81, 0, 0, 0, 1, 0, 0, 0, 0],\n", | |
" [0, 83, 1, 0, 0, 0, 0, 0, 0],\n", | |
" [1, 1, 73, 2, 0, 0, 0, 0, 0],\n", | |
" [0, 0, 1, 77, 0, 0, 0, 0, 0],\n", | |
" [0, 0, 0, 1, 81, 1, 0, 0, 0],\n", | |
" [0, 1, 2, 3, 2, 70, 0, 0, 2],\n", | |
" [0, 0, 0, 0, 0, 0, 0, 0, 0],\n", | |
" [0, 0, 0, 0, 0, 0, 0, 71, 0],\n", | |
" [1, 0, 0, 1, 0, 4, 0, 0, 71]]\n", | |
"```\n", | |
"\n", | |
"The horizontal axis of the confusion matrix corresponds to the input classes, and the vertical axis corresponds to the output classes. The rows and columns start at class 0 and increase sequentially up to the maximum class value, so some rows or columns might be empty if the input classes aren't 0-based or sequential. That's the reason why we remapped the ESA land cover class values to a 0-based sequential series earlier. Note that your confusion matrix may look slightly different from the one shown above as the training data is randomly sampled.\n", | |
"\n", | |
"The overall accuracy essentially tells us what proportion of all the reference sites was mapped correctly. It is usually expressed as a percent, with 100% being a perfect classification in which all reference sites were classified correctly."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "4eba21ed", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"trainAccuracy.accuracy().getInfo()" | |
] | |
}, | |
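For intuition, overall accuracy is just the sum of the diagonal of the confusion matrix divided by the total number of samples. A pure-Python sketch using a small illustrative 3×3 matrix (not the actual matrix above):

```python
# Rows are reference classes, columns are predicted classes (illustrative values).
matrix = [
    [81, 0, 1],
    [2, 73, 0],
    [1, 4, 71],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(matrix)))
overall_accuracy = correct / total
print(round(overall_accuracy, 4))  # 0.9657
```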
{ | |
"cell_type": "markdown", | |
"id": "015601ea", | |
"metadata": {}, | |
"source": [ | |
"The Kappa Coefficient is generated from a statistical test to evaluate the accuracy of a classification. Kappa essentially evaluates how well the classification performed compared to randomly assigning values, i.e., did the classification do better than random? The Kappa Coefficient ranges from -1 to 1. A value of 0 indicates that the classification is no better than random, a negative value indicates that it is significantly worse than random, and a value close to 1 indicates that it is significantly better than random."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "3501e191", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"trainAccuracy.kappa().getInfo()" | |
] | |
}, | |
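The kappa computation can also be reproduced locally: the expected chance agreement is derived from the row and column marginals of the confusion matrix, and kappa rescales the observed accuracy against it. A sketch with an illustrative 3×3 matrix:

```python
# Rows are reference classes, columns are predicted classes (illustrative values).
matrix = [
    [81, 0, 1],
    [2, 73, 0],
    [1, 4, 71],
]

n = len(matrix)
total = sum(sum(row) for row in matrix)
row_sums = [sum(row) for row in matrix]
col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

observed = sum(matrix[i][i] for i in range(n)) / total
# Chance agreement from the marginals: sum of (row_i * col_i) / total^2.
expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total**2
kappa = (observed - expected) / (1 - expected)
print(round(kappa, 4))
```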
{ | |
"cell_type": "markdown", | |
"id": "a16d8a3d", | |
"metadata": {}, | |
"source": [ | |
"To get a confusion matrix and overall accuracy for the validation sample:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "9460b3dd", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"validationSample = validationSample.classify(trainedClassifier)\n", | |
"validationAccuracy = validationSample.errorMatrix(label, 'classification')\n", | |
"validationAccuracy.getInfo()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "9c692acc", | |
"metadata": {}, | |
"source": [ | |
"```text\n", | |
"[[13, 1, 3, 0, 1, 0, 0, 0, 0],\n", | |
" [0, 11, 2, 0, 1, 0, 0, 0, 2],\n", | |
" [1, 0, 12, 7, 1, 2, 0, 0, 0],\n", | |
" [0, 3, 6, 9, 3, 1, 0, 0, 0],\n", | |
" [2, 0, 3, 0, 10, 2, 0, 0, 0],\n", | |
" [1, 1, 1, 3, 7, 6, 0, 0, 1],\n", | |
" [0, 0, 0, 0, 0, 0, 0, 0, 0],\n", | |
" [0, 0, 0, 0, 0, 0, 0, 29, 0],\n", | |
" [2, 2, 2, 0, 2, 2, 0, 0, 13]]\n", | |
"```\n", | |
"\n", | |
"To compute the overall accuracy for the validation sample:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "b95c085e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"validationAccuracy.accuracy().getInfo()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "35553000", | |
"metadata": {}, | |
"source": [ | |
"Producer's Accuracy, also known as the Sensitivity or Recall, is a measure of how well a classifier correctly identifies positive instances. It is the ratio of true positive classifications to the total number of actual positive instances. This metric is used to evaluate the performance of a classifier when the focus is on minimizing false negatives (i.e., instances that are actually positive but are classified as negative)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "714c5020", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"validationAccuracy.producersAccuracy().getInfo()" | |
] | |
}, | |
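Producer's accuracy is computed per class as the diagonal element of the confusion matrix divided by its row total (the number of reference samples in that class). A minimal local sketch with an illustrative matrix:

```python
# Rows are reference classes, columns are predicted classes (illustrative values).
matrix = [
    [13, 1, 3],
    [0, 11, 2],
    [1, 0, 12],
]

# Producer's accuracy (recall) per class: correct / reference total for the class.
producers = [
    row[i] / sum(row) if sum(row) > 0 else 0.0
    for i, row in enumerate(matrix)
]
print([round(p, 4) for p in producers])
```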
{ | |
"cell_type": "markdown", | |
"id": "ae161d14", | |
"metadata": {}, | |
"source": [ | |
"On the other hand, Consumer's Accuracy, also known as User's Accuracy or Precision, is a measure of how reliable the classification is for the map user: of the instances assigned to a class, how many actually belong to it. It is the ratio of true positive classifications to the total number of instances classified as positive. This metric is used to evaluate the performance of a classifier when the focus is on minimizing false positives (i.e., instances that are actually negative but are classified as positive)."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "92cdbc2b", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"validationAccuracy.consumersAccuracy().getInfo()" | |
] | |
}, | |
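Consumer's accuracy is the column-wise counterpart: the diagonal element divided by its column total (the number of samples assigned to that class). A sketch with an illustrative matrix:

```python
# Rows are reference classes, columns are predicted classes (illustrative values).
matrix = [
    [13, 1, 3],
    [0, 11, 2],
    [1, 0, 12],
]

n = len(matrix)
col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

# Consumer's accuracy (precision) per class: correct / predicted total for the class.
consumers = [
    matrix[j][j] / col_sums[j] if col_sums[j] > 0 else 0.0
    for j in range(n)
]
print([round(c, 4) for c in consumers])
```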
{ | |
"cell_type": "markdown", | |
"id": "e970bad6", | |
"metadata": {}, | |
"source": [ | |
"The confusion matrices can be saved as CSV files:"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "b75dd521", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import csv\n", | |
"\n", | |
"with open(\"training.csv\", \"w\", newline=\"\") as f:\n", | |
" writer = csv.writer(f)\n", | |
" writer.writerows(trainAccuracy.getInfo())\n", | |
"\n", | |
"with open(\"validation.csv\", \"w\", newline=\"\") as f:\n", | |
" writer = csv.writer(f)\n", | |
" writer.writerows(validationAccuracy.getInfo())" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "87777b00", | |
"metadata": {}, | |
"source": [ | |
"If the validation accuracy is acceptable, the trained classifier can then be applied to the entire image:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "15129c08", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"imgClassified = img.classify(trainedClassifier)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "f8fa7ce1", | |
"metadata": {}, | |
"source": [ | |
"Lastly, add the resulting data layers (e.g., the Sentinel-2 image, the classified image, and the training and validation samples) to the map. Use the layer control widget to change the opacity of the classified image and compare it to the ESA WorldCover layer."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "8ba15931", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"classVis = {\n", | |
" 'min': 0,\n", | |
" 'max': 10,\n", | |
" 'palette': ['006400', 'ffbb22', 'ffff4c', 'f096ff', 'fa0000', 'b4b4b4',\n",
" 'f0f0f0', '0064c8', '0096a0', '00cf75', 'fae6a0']\n", | |
"}\n", | |
"m.add_layer(lc, classVis, 'ESA Land Cover', False)\n", | |
"m.add_layer(imgClassified, classVis, 'Classified')\n", | |
"m.add_layer(trainingSample, {'color': 'black'}, 'Training sample')\n", | |
"m.add_layer(validationSample, {'color': 'white'}, 'Validation sample')\n", | |
"m.add_legend(title='Land Cover Type', builtin_legend='ESA_WorldCover')\n", | |
"m.center_object(img)\n", | |
"m" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |