@ldodds
Last active January 27, 2021 18:30
Meteostat Interpolation Notes

Code for the hourly endpoints in Meteostat API can be found here:

Overall Flow

After initially accepting the request, connecting to the database, parsing parameters, etc., the bulk of the work is done in the Fetch method.

This does the following, focusing only on temperatures:

  • parse the start/end dates, adding the supplied timezone
  • fetch the list of nearby stations from the database (see below)
  • use the list of stations to assemble the SQL query that fetches observations from the database. This is a big union across several sets of tables, apparently one per data source. If the query takes more than 5 seconds, it is aborted
  • the query results are processed into a raw variable, which is a hash of station -> date -> hour -> results
  • if there are no results, return
  • create a PHP DatePeriod to iterate over hours between the dates requested. This iteration builds up the data to be returned
  • get the "score" for temp reading
  • interpolate the reading
  • add the interpolated reading to the data to be returned
  • return the array of readings
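
The grouping and hourly iteration steps above can be sketched as follows. This is a minimal illustration in Python, not the PHP from the API: the names `group_rows` and `hours_between`, the row layout, and the choice to include the end hour are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_rows(rows):
    """Nest flat (station, timestamp, temp) rows as station -> date -> hour -> result.

    The row layout is hypothetical; the real query returns more columns.
    """
    raw = defaultdict(lambda: defaultdict(dict))
    for station, ts, temp in rows:
        raw[station][ts.date().isoformat()][ts.hour] = {"temp": temp}
    return raw


def hours_between(start, end):
    """Yield each whole hour from start to end (end included, by assumption)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(hours=1)


rows = [
    ("A", datetime(2021, 1, 1, 0), 3.1),
    ("A", datetime(2021, 1, 1, 1), 2.8),
    ("B", datetime(2021, 1, 1, 1), 4.0),
]
raw = group_rows(rows)
hours = list(hours_between(datetime(2021, 1, 1, 0), datetime(2021, 1, 1, 2)))
```

Each hour produced by the iteration is then looked up in `raw` for every candidate station, which is where the scoring and interpolation steps come in.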

There are two areas to look at more closely:

Querying for Stations

Summary: find up to 6 nearby stations that have readings, ordering by a distance based score.

This is the query that finds stations. $this->lat and $this->lon refer to the coordinates provided by the user.

$this->alt is the altitude, either supplied by the user or a default which is calculated from the database:

SELECT `stations`.`id`                                AS `id`, 
       `stations`.`altitude`                          AS `altitude`, 
       Round(St_distance_sphere(Point(".$this->lon.", ".$this->lat."), Point( 
                   `stations`.`longitude`, 
                   `stations`.`latitude`)) / 1000, 1) AS `distance`, 
       Round(( St_distance_sphere(Point(".$this->lon.", ".$this->lat."), Point( 
                       `stations`.`longitude`, 
                       `stations`.`latitude`)) / 1000 ) * ( 
             Abs(`stations`.`altitude` - ". $this->alt .") 
             + 10 ), 1)                               AS `score` 
FROM   `stations` 
       INNER JOIN `stations_inventory` 
               ON `stations`.`id` = `stations_inventory`.`station` 
WHERE  `stations`.`latitude` IS NOT NULL 
       AND Abs(`stations`.`latitude`) <= 90 
       AND `stations`.`longitude` IS NOT NULL 
       AND Abs(`stations`.`longitude`) <= 180 
       AND `stations_inventory`.`hourly_start` IS NOT NULL 
       AND `stations_inventory`.`hourly_end` > Date('". $this->start ."') 
       AND Abs(`stations`.`altitude` - ". $this->alt .") < 350 
HAVING `distance` < 60 
ORDER  BY `score` 
LIMIT  6 

A couple of things to note:

  • distance is the distance between the station and the coords provided by the user. This is converted from meters to kms
  • the distance is used to limit results to stations that are less than 60km away
  • this is further limited to 6 stations, ordered by score
  • the score is used in the code when doing interpolation.

It is calculated as: the distance of the station from the requested point in km * (the absolute difference in altitude between the station and the requested altitude + 10)
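
As a worked illustration of the score and the filters in the query, here is a small Python sketch. It is an approximation under stated assumptions: `station_score` and `pick_stations` are hypothetical names, the distances are taken as already computed in km, and the inventory and date conditions from the WHERE clause are left out.

```python
def station_score(distance_km, station_alt, target_alt):
    """Score as in the query: distance in km * (|altitude difference| + 10)."""
    return round(distance_km * (abs(station_alt - target_alt) + 10), 1)


def pick_stations(stations, target_alt, max_km=60, max_alt_diff=350, limit=6):
    """Apply the distance and altitude filters, then keep the lowest scores."""
    candidates = [
        dict(s, score=station_score(s["distance"], s["altitude"], target_alt))
        for s in stations
        if s["distance"] < max_km and abs(s["altitude"] - target_alt) < max_alt_diff
    ]
    # ORDER BY score is ascending, so the smallest scores come first
    return sorted(candidates, key=lambda s: s["score"])[:limit]


stations = [
    {"id": "A", "distance": 12.5, "altitude": 140},  # 12.5 * (40 + 10)
    {"id": "B", "distance": 3.0, "altitude": 300},   # 3.0 * (200 + 10)
    {"id": "C", "distance": 75.0, "altitude": 100},  # too far away
    {"id": "D", "distance": 20.0, "altitude": 500},  # altitude differs by 400
]
picked = pick_stations(stations, target_alt=100)
```

In this made-up data, station B is much closer than A but ends up with a slightly higher score because of its altitude difference, which shows how the two factors trade off.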

Calculating scores

Summary: calculate a weighting to be applied to observations from up to four of the stations found earlier (the first four, in the query's score order, that have a reading)

The getScore function is used to build up a score.

It:

  • loops through the stations found earlier
  • if the station has an observation for temperature on that day, its score value from the SQL query is added to rawScore, and its id is recorded
  • when four stations have been found, the loop exits
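
A rough Python sketch of this first pass, assuming the station -> date -> hour -> results hash described earlier. The function name `first_pass` and the lookup by both date and hour are assumptions; the notes only say the station must have a temperature observation.

```python
def first_pass(stations, raw, date, hour, max_stations=4):
    """Sum the scores of the first stations (in query order) that have a temp."""
    raw_score = 0.0
    used = []
    for station in stations:  # assumed to be ordered by score already
        reading = raw.get(station["id"], {}).get(date, {}).get(hour, {})
        if reading.get("temp") is not None:
            raw_score += station["score"]
            used.append(station["id"])
        if len(used) == max_stations:
            break  # stop once four stations have contributed
    return raw_score, used


stations = [{"id": f"S{i}", "score": 100.0 * i} for i in range(1, 7)]
raw = {
    "S1": {"2021-01-01": {0: {"temp": 3.0}}},
    "S2": {"2021-01-01": {0: {"temp": None}}},  # no temperature reading
    "S3": {"2021-01-01": {0: {"temp": 2.5}}},
    "S4": {"2021-01-01": {0: {"temp": 2.0}}},
    "S5": {"2021-01-01": {0: {"temp": 1.5}}},
    "S6": {"2021-01-01": {0: {"temp": 1.0}}},
}
raw_score, used = first_pass(stations, raw, "2021-01-01", 0)
```

Here S2 is skipped because it has no temperature, and S6 is never reached because four stations have already been found.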

To calculate the final score, the code then

  • loops through the stations found earlier
  • if the station has temperature readings on the day, and was identified in the first step
  • a tempscore is calculated for the station. If the rawScore is the same as the station's score (which will happen if only a single station is found), then it's set to that value, otherwise it's the difference between the rawScore and the database score

The final score returned is the total tempscore, which should be the same as rawScore
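
The second pass can be sketched in Python as below, taking the description above literally. `temp_scores` is a hypothetical name and the input is simply a mapping of station id to database score for the stations kept in the first pass. Read this way, the sum of the tempscore values equals rawScore for one or two stations and is (n - 1) * rawScore for n stations in general, so the sketch returns the individual values rather than assuming a total.

```python
def temp_scores(used_scores):
    """Map station id -> tempscore, following the rule described in the notes."""
    raw_score = sum(used_scores.values())
    result = {}
    for station_id, score in used_scores.items():
        if raw_score == score:
            # only one station contributed, so it keeps its own score
            result[station_id] = score
        else:
            # otherwise the station gets rawScore minus its own score
            result[station_id] = raw_score - score
    return result


single = temp_scores({"A": 100})
pair = temp_scores({"A": 100, "B": 300})
triple = temp_scores({"A": 100, "B": 300, "C": 400})
```

Because each station's own score is subtracted, the station with the lowest database score (the closest one) ends up with the largest tempscore.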

The score, then, is basically a weighting that's based on how close the station is to the desired coordinates.

The interpolation

Summary: calculate an adjusted reading by adding together weighted values of observations from the four stations found earlier, plus an altitude-related modifier. If there are stations which are very close by, then their observations are used more or less directly.

The interpolation varies across different measurements. But for temperatures it's done as follows:

  • find those stations that have a tempscore (so a maximum of 4 of the possible 6 returned, with those ranked first by score preferred)
  • calculate a constant modifier. For temperatures this is 2/3 * the altitude difference between the station and the request point / 100. This takes into account the average altitude adjustment of 6C per 1000m of altitude difference, i.e. ~(2/3)C per 100m.
  • then, if the altitude difference is less than 100m and the station is closer than 9km, return observed temperature * constant modifier
  • otherwise, calculate the value as the sum of observed temperature * constant modifier * the tempscore weighting
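
The altitude modifier and the "very close station" test can be illustrated with a short Python sketch. The function names are hypothetical, and the sign convention of the altitude difference is an assumption, since the notes do not say which altitude is subtracted from which. How the modifier is then combined with the observed temperature is deliberately left out.

```python
def altitude_modifier(altitude_diff_m):
    """Constant modifier for temperature: 2/3 * altitude difference / 100."""
    return (2 / 3) * altitude_diff_m / 100


def is_close_enough(altitude_diff_m, distance_km):
    """True when the station is within 100 m of altitude and closer than 9 km."""
    return abs(altitude_diff_m) < 100 and distance_km < 9


modifier = altitude_modifier(150)       # roughly 1.0 for a 150 m difference
direct = is_close_enough(40, 3.5)       # small altitude gap, nearby station
weighted = is_close_enough(40, 12.0)    # same altitude gap, but too far away
```

With these thresholds a station 3.5 km away and 40 m higher would take the direct path, while the same station at 12 km would go through the weighted sum instead.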

So the final interpolation is calculated as a percentage weighting of the observed values, based on the geographic distance between the queried point and the stations whose observations are in the database.
