
@danhammer
Last active December 17, 2015 10:18
A list of items that we have checked off for the port of FORMA to Earth Engine.

We continue to work with Thau and colleagues to port the FORMA algorithm to Earth Engine. It is clear that the exact FORMA algorithm cannot be implemented on EE without considerable effort from the Google team; there is currently no logistic classifier, for example, among other constraints. However, we are working with Thau et al. to make minor modifications to the algorithm that balance fidelity against ease and speed of implementation. Many of these modifications do not seem like concessions at all, but rather enhancements to the current methodology. With that said, the following is a list, to date, of the items that we have completed in the ongoing effort to implement the FORMA algorithm with the EE API.

  1. Built the cluster infrastructure to map across partitions, and specifically to map the training algorithm across each tropical ecoregion. The objective is to build a classifier image for each ecoregion, independent of any other ecoregion. We had briefly experimented with the map() method in Earth Engine, but it will not be sufficient for our objective. Instead, we built the capacity to pass an ecoregion identifier to a Python function on our own cluster, hitting the EE servers in parallel. We then hit another constraint, however: the volume of API calls that we're allowed. We are now working with Thau on a workaround.
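Since each ecoregion is classified independently, the fan-out can be sketched as a simple parallel map from our own cluster. The `train_ecoregion` function below is a hypothetical stand-in for the call that actually hits the EE servers; only the parallel-dispatch pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor

def train_ecoregion(ecoregion_id):
    # Hypothetical stand-in for the Earth Engine request that builds
    # a classifier image for a single ecoregion.
    return (ecoregion_id, "classifier-image-%d" % ecoregion_id)

def classify_all(ecoregion_ids, workers=4):
    # Each ecoregion is independent of the others, so the requests
    # can be issued in parallel; threads suit the network-bound calls.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(train_ecoregion, ecoregion_ids))
```

The API-call quota mentioned above would throttle this loop in practice; `workers` is the knob that would need to respect that limit.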

  2. Wrote a script to perform the classification of pixels based on available data sets. With the considerable data already in Earth Engine, the move to 250m resolution was technically simple, although we are still trying to understand the statistical validity of mixing spatial resolutions (our training data will remain at 500m resolution). We do not have the FIRMS (fire) data, as it has not yet been ingested into Earth Engine. The neighborhood effects are written, but we are trying to devise a good testing suite, since we cannot visually check the results in the Playground (a feature that still has to be implemented for neighborhood effects, says Noel). We have moved to the Pegasos algorithm rather than the logistic classifier, since the logistic classifier is not part of the suite of existing EE classifiers. The Pegasos algorithm yields a continuous index of binary classification intensity, much like the logistic probability output.
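For reference, the core of Pegasos is a stochastic sub-gradient update for a linear SVM; the raw margin it produces is the continuous index mentioned above. This is a minimal textbook sketch of the algorithm itself, not the EE implementation, and the hyperparameters are illustrative only.

```python
import random

def pegasos_train(samples, labels, lam=0.01, iters=1000, seed=0):
    # Pegasos: at step t, pick one sample, use learning rate
    # eta = 1 / (lam * t), and apply the hinge-loss sub-gradient.
    # Labels are +1 / -1; returns the linear weight vector.
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        scale = 1.0 - eta * lam  # shrink from the L2 regularizer
        if margin < 1.0:
            w = [scale * wj + eta * y * xj for wj, xj in zip(w, x)]
        else:
            w = [scale * wj for wj in w]
    return w

def pegasos_score(w, x):
    # The signed margin: a continuous classification-intensity index,
    # analogous to (but not on the same scale as) a logistic probability.
    return sum(wj * xj for wj, xj in zip(w, x))
```

Unlike a logistic output, the margin is unbounded; mapping it to [0, 1] would require an extra calibration step if a probability-like score is needed downstream.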

  3. Built an ensemble framework to circumvent the 1 million training point limit within EE. The training points, as defined by the EE API, are all pixels within each ecoregion. The 1 million point limit is a binding constraint, since many ecoregions are much larger (as seen in the following figure). Instead, if we can repeatedly sample from the ecoregion, then we will be able to bootstrap or ensemble a classifier image. We then run into the limit on API calls, however, since each ecoregion would require thousands of calls. This would effectively be a distributed denial-of-service attack on the EE servers, against which standard Google infrastructure is well protected. Thau suggested that the Google team can, internally, circumvent one or more of these constraints. They have access that we do not.
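The bootstrap idea reduces to a small pattern: draw repeated subsamples that each fit under the per-call point limit, train one model per subsample, and average the model outputs. A minimal sketch, with a caller-supplied `train_fn` standing in for the EE training call:

```python
import random

def bootstrap_ensemble(pixels, train_fn, n_models=5, sample_size=100, seed=0):
    # Each subsample stays under the per-call training-point limit;
    # train_fn(sample) must return a callable model. The returned
    # predictor averages the member models' continuous outputs.
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Sample with replacement (a bootstrap draw).
        sample = [pixels[rng.randrange(len(pixels))]
                  for _ in range(sample_size)]
        models.append(train_fn(sample))
    def predict(x):
        return sum(m(x) for m in models) / float(len(models))
    return predict
```

Note that this trades one oversized training call for `n_models` smaller ones, which is exactly where the API-call quota described above starts to bind.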

  4. Wrote a process to rip apart and upload EE images to the current GFW site, given an HTML pointer from EE. Once the output image has been calculated on the EE platform, we will now be able to immediately display the data on the current site -- a seamless correspondence between EE and the existing GFW infrastructure. We have not extensively tested the process, since we do not have actual FORMA output from EE; but we worked closely with Andrew Hill at Vizzuality on the script. The process should work for the large images that are calculated from EE.
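The "rip apart" step amounts to tiling a large raster into pieces small enough to upload independently. A toy sketch of that tiling, operating on a 2-D list of pixel values rather than a real EE download, and assuming a simple row/column tiling scheme:

```python
def tile_image(image, tile_size):
    # Split a raster (a 2-D list of pixel values) into square tiles,
    # keyed by the (row, col) offset of each tile's top-left corner.
    # Edge tiles may be smaller than tile_size.
    tiles = {}
    height, width = len(image), len(image[0])
    for row0 in range(0, height, tile_size):
        for col0 in range(0, width, tile_size):
            tiles[(row0, col0)] = [row[col0:col0 + tile_size]
                                   for row in image[row0:row0 + tile_size]]
    return tiles
```

In the real pipeline the pieces would then be pushed to the GFW tile store; that upload step is omitted here.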

  5. Documentation and setup for multiple platforms. If the forma-ee repository is ever made public, then there is sufficient documentation for anyone to run the process. The documentation is subject to change in parallel with changes to the EE API, which is not yet fully stable.

There are many other, smaller tasks that have been accomplished; those listed above are the most significant. There are now many questions about the long-term interaction between GFW and the Google team, since we will not be able to directly adjust the algorithm. We had hoped that only the component functions would be maintained by the Google team, and not the entire workflow. The existing constraints have removed our control over the maintenance and refinement of the algorithm. How will we be involved in maintaining and refining it? Will we rely on the Google team in perpetuity? Questions to be answered.
