Code for the hourly endpoints in Meteostat API can be found here:
Overall Flow
After accepting the request, connecting to the database, and parsing parameters, the bulk of the work is done in the Fetch method.
This does the following, focusing only on temperatures:
- parse the start/end dates, adding the supplied timezone
- fetch the list of nearby stations from the database (see below)
- use the list of stations to assemble the SQL query that fetches observations from the database. This is a big union across several sets of tables, apparently one per data source. If the query takes more than 5 seconds, it is aborted
- the query results are processed into a raw variable, which is a hash of station -> date -> hour -> results
- if there are no results, return
- create a PHP DatePeriod to iterate over the hours between the dates requested. This iteration builds up the data to be returned
- get the "score" for the temp reading
- interpolate the reading
- add the interpolated reading to the data to be returned
- return the array of readings
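The hour-by-hour loop over the nested raw structure can be sketched as follows. This is a minimal Python illustration, not the API's PHP code: the names iter_hours and collect, the 'temp' key, the date string format, and the choice to include the end hour are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def iter_hours(start, end):
    """Yield every hour from start to end (end included, which is an assumption)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(hours=1)


def collect(raw, station_ids, start, end):
    """Gather the temperatures available for each requested hour.

    raw is assumed to map station id -> 'YYYY-MM-DD' -> hour -> {'temp': value}.
    """
    if not raw:
        return []  # no results: return early
    data = []
    for moment in iter_hours(start, end):
        date, hour = moment.strftime("%Y-%m-%d"), moment.hour
        temps = {}
        for sid in station_ids:
            reading = raw.get(sid, {}).get(date, {}).get(hour, {})
            if reading.get("temp") is not None:
                temps[sid] = reading["temp"]
        # In the real flow this is where scoring and interpolation would happen.
        data.append({"time": moment, "temps": temps})
    return data
```

Each entry in the returned list corresponds to one requested hour, even when no station has a reading for it.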
There are two areas to look at more closely:
Querying for Stations
Summary: find up to 6 nearby stations that have readings, ordering by a distance based score.
This is the query that finds stations. $this->lat and $this->lon refer to the coordinates provided by the user. $this->alt is the altitude, either supplied by the user or a default calculated from the database:
SELECT `stations`.`id` AS `id`,
`stations`.`altitude` AS `altitude`,
Round(St_distance_sphere(Point(".$this->lon.", ".$this->lat."), Point(
`stations`.`longitude`,
`stations`.`latitude`)) / 1000, 1) AS `distance`,
Round(( St_distance_sphere(Point(".$this->lon.", ".$this->lat."), Point(
`stations`.`longitude`,
`stations`.`latitude`)) / 1000 ) * (
Abs(`stations`.`altitude` - ". $this->alt .")
+ 10 ), 1) AS `score`
FROM `stations`
INNER JOIN `stations_inventory`
ON `stations`.`id` = `stations_inventory`.`station`
WHERE `stations`.`latitude` IS NOT NULL
AND Abs(`stations`.`latitude`) <= 90
AND `stations`.`longitude` IS NOT NULL
AND Abs(`stations`.`longitude`) <= 180
AND `stations_inventory`.`hourly_start` IS NOT NULL
AND `stations_inventory`.`hourly_end` > Date('". $this->start ."')
AND Abs(`stations`.`altitude` - ". $this->alt .") < 350
HAVING `distance` < 60
ORDER BY `score`
LIMIT 6
A couple of things to note:
- distance is the distance between the station and the coords provided by the user, converted from meters to km
- the distance is used to limit results to stations that are less than 60km away
- this is further limited to 6 stations, ordered by score
- the score is used in the code when doing interpolation. It is calculated as: the distance of the station from the requested point in km * (the difference in altitude between the station and the requested altitude + 10)
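To make the ranking concrete, here is a rough Python sketch of the same selection logic. It is an illustration only: rank_stations, distance_km and EARTH_RADIUS_M are hypothetical names, the haversine formula with an approximate Earth radius stands in for St_distance_sphere, and the stations_inventory date filters are left out.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # approximate mean Earth radius (assumption)


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a)) / 1000


def rank_stations(stations, lat, lon, alt, max_km=60, max_alt_diff=350, limit=6):
    """Filter by altitude difference and distance, then order by score."""
    ranked = []
    for s in stations:
        alt_diff = abs(s["altitude"] - alt)
        dist = distance_km(lat, lon, s["latitude"], s["longitude"])
        # Mirrors the WHERE (altitude) and HAVING (rounded distance) conditions.
        if alt_diff >= max_alt_diff or round(dist, 1) >= max_km:
            continue
        ranked.append({
            "id": s["id"],
            "altitude": s["altitude"],
            "distance": round(dist, 1),
            # score = distance in km * (altitude difference + 10)
            "score": round(dist * (alt_diff + 10), 1),
        })
    ranked.sort(key=lambda s: s["score"])
    return ranked[:limit]
```

Because the altitude difference has 10 added before multiplying, a station at exactly the requested altitude still gets a score that grows with distance.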
Calculating scores
Summary: calculate a weighting to be applied to observations from up to four of the stations found earlier (taken in score order, so the best-ranked stations with data are used).
The getScore function is used to build up a score.
It:
- loops through the stations found earlier
- if the station has an observation for temperature at that day and hour, its score value from the SQL query is added to rawScore, and its id is recorded
- when four stations have been found, the loop exits
To calculate the final score, the code then:
- loops through the stations found earlier
- if the station has temperature readings on the day, and was identified in the first step, a tempscore is calculated for the station. If the rawScore is the same as the station's score (which will happen if only a single station is found), then it's set to that value; otherwise it's the difference between the rawScore and the database score
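The final score returned is the total of the tempscore values. With a single station this equals rawScore; with n contributing stations, each tempscore is in general rawScore minus that station's score, so the total comes to (n - 1) * rawScore.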
The score, then, is basically a weighting that's based on how close the station is to the desired coordinates.
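The two passes described above can be sketched in Python as follows. This is a hedged reconstruction from the description, not the PHP source: get_score, has_temp and the returned temp_scores dictionary are names invented for the example.

```python
def get_score(stations, has_temp, max_stations=4):
    """Return (total score, per-station tempscore) for one hour.

    stations: dicts with 'id' and 'score', in the order the query returned them.
    has_temp: set of station ids that have a temperature for this hour.
    """
    raw_score = 0
    valid = []
    # First pass: sum the SQL scores of the first stations that have data.
    for s in stations:
        if s["id"] in has_temp:
            raw_score += s["score"]
            valid.append(s["id"])
        if len(valid) == max_stations:
            break

    total = 0
    temp_scores = {}
    # Second pass: invert each score so that closer stations weigh more.
    for s in stations:
        if s["id"] in has_temp and s["id"] in valid:
            if raw_score == s["score"]:
                temp_score = s["score"]
            else:
                temp_score = raw_score - s["score"]
            temp_scores[s["id"]] = temp_score
            total += temp_score
    return total, temp_scores
```

For three stations with SQL scores 10, 30 and 60, this gives tempscores of 90, 70 and 40, so the station with the lowest (closest) SQL score ends up with the largest share.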
The interpolation
Summary: calculate an adjusted reading by adding together weighted values of observations from the four stations found earlier, plus an altitude-related modifier. If a station is very close by, its observation is used more or less directly.
The interpolation varies across different measurements, but for temperatures it's done as follows:
- find those stations that have a tempscore (so a maximum of 4 of the possible 6 returned, with the best-ranked stations preferred)
- calculate a constant modifier. For temperatures this is 2/3 * the altitude difference between the station and the request point / 100. This takes into account the average altitude adjustment of 6C per 1000m of altitude difference, i.e. ~(2/3)C per 100m
- then, if the altitude difference is less than 100m and the station is closer than 9km, return observed temperature + constant modifier
- otherwise, calculate the value as the sum of (observed temperature + constant modifier) * the tempscore weighting
So the final interpolation is a percentage weighting of the observed values, based on the geographic distance between the queried point and the stations whose observations are in the database.
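As a rough illustration of the temperature case, the following Python sketch puts the pieces together. It rests on several assumptions that are not confirmed by the source: the modifier is added to the observation (as the summary's "plus an altitude related modifier" suggests), the altitude difference is taken as station altitude minus requested altitude, and the weighting is each station's tempscore divided by the total score. The name interpolate_temp and its parameters are hypothetical.

```python
def interpolate_temp(stations, observations, temp_scores, total_score, alt):
    """Interpolate one temperature value for the requested point.

    stations: dicts with 'id', 'altitude' (m) and 'distance' (km), in score order.
    observations: station id -> observed temperature for this hour.
    temp_scores / total_score: per-station tempscore and their sum.
    alt: altitude of the requested point in metres.
    """
    if not total_score:
        return None  # nothing to weight
    value = 0.0
    for s in stations:
        if s["id"] not in temp_scores:
            continue
        # Roughly 2/3 of a degree per 100 m of altitude difference (assumed sign).
        modifier = (2 / 3) * (s["altitude"] - alt) / 100
        adjusted = observations[s["id"]] + modifier
        # A station that is close in both altitude and distance is used directly.
        if abs(s["altitude"] - alt) < 100 and s["distance"] < 9:
            return adjusted
        # Otherwise it contributes its weighted share (assumed: tempscore / total).
        value += adjusted * temp_scores[s["id"]] / total_score
    return value
```

With tempscores of 90, 70 and 40 (total 200), the three stations contribute 45%, 35% and 20% of the result respectively.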