Skip to content

Instantly share code, notes, and snippets.

@matsukaz
Last active September 27, 2021 11:07
Show Gist options
  • Save matsukaz/a145c2553a0faa59e32ad7c25e6a92f7 to your computer and use it in GitHub Desktop.
Save matsukaz/a145c2553a0faa59e32ad7c25e6a92f7 to your computer and use it in GitHub Desktop.
Find a geolocation of an IP address in BigQuery

This query is to find geolocation of an IP address including latitude, longitude, city and country.

Legacy SQL doesn't support range conditions such as BETWEEN when using JOIN, so we need to filter data by WHERE. This means if IP address does not match any of the data inside geolite_city_bq_b2b, records will not be able to receive.

Use Standard SQL if you want to receive records no matter you succeed to find geolocation or not.

Please refer to the following post for more detail.

https://cloudplatform.googleblog.com/2014/03/geoip-geolocation-with-google-bigquery.html

1 192.188.171.9
2 202.41.146.198
3 103.83.231.23
4 202.64.101.91
5 43.240.52.112
6 103.12.247.112
7 217.228.79.15
8 35.10.42.69
9 86.40.207.143
10 66.71.60.27
11 72.136.125.125
12 68.144.88.194
13 194.197.79.18
14 69.119.90.108
15 83.235.250.67
bq mk <dataset>.log id:integer,ip:string
$ bq load --source_format=CSV <dataset>.log test_data.csv
#standardSQL
SELECT
id,
IFNULL(city, 'Other') AS city,
IFNULL(countryLabel, 'Other') AS countryLabel,
latitude,
longitude
FROM (
SELECT
id,
NET.IPV4_TO_INT64(NET.IP_FROM_STRING(ip)) AS clientIpNum,
TRUNC(NET.IPV4_TO_INT64(NET.IP_FROM_STRING(ip))/(256*256)) AS classB
FROM
`<project>.<dataset>.log` ) AS a
LEFT OUTER JOIN
`fh-bigquery.geocode.geolite_city_bq_b2b` AS b
ON
a.classB = b.classB
AND a.clientIpNum BETWEEN b.startIpNum AND b.endIpNum
ORDER BY
id ASC
Row id city countryLabel latitude longitude
1 1 Other Other null null
2 2 Taiwan 23.5000 121.0000
3 3 Other Other null null
4 4 Hong Kong 22.2500 114.1667
5 5 Japan 35.6900 139.6900
6 6 Bangladesh 24.0000 90.0000
7 7 Heiden Germany 51.8333 6.9333
8 8 East Lansing United States 42.7283 -84.4882
9 9 Buncrana Ireland 55.1333 -7.4500
10 10 Collegeville United States 40.1879 -75.4254
11 11 Canada 60.0000 -95.0000
12 12 Calgary Canada 51.0833 -114.0833
13 13 Helsinki Finland 60.1756 24.9342
14 14 Bellmore United States 40.6726 -73.5364
15 15 Greece 39.0000 22.0000
#legacySQL
SELECT
id,
city,
countryLabel,
latitude,
longitude
FROM (
SELECT
id,
INTEGER(PARSE_IP(ip)) AS clientIpNum,
INTEGER(PARSE_IP(ip)/(256*256)) AS classB
FROM
[<project>:<dataset>.log] ) AS a
LEFT OUTER JOIN
[fh-bigquery:geocode.geolite_city_bq_b2b] AS b
ON
a.classB = b.classB
WHERE
a.clientIpNum BETWEEN b.startIpNum AND b.endIpNum
ORDER BY
id ASC
Row id city countryLabel latitude longitude
1 2 Taiwan 23.5000 121.0000
2 4 Hong Kong 22.2500 114.1667
3 5 Japan 35.6900 139.6900
4 6 Bangladesh 24.0000 90.0000
5 7 Heiden Germany 51.8333 6.9333
6 8 East Lansing United States 42.7283 -84.4882
7 9 Buncrana Ireland 55.1333 -7.4500
8 10 Collegeville United States 40.1879 -75.4254
9 11 Canada 60.0000 -95.0000
10 12 Calgary Canada 51.0833 -114.0833
11 13 Helsinki Finland 60.1756 24.9342
12 14 Bellmore United States 40.6726 -73.5364
13 15 Greece 39.0000 22.0000
14 16 Somerset United States 40.4976 -74.4885
15 17 United Kingdom 51.5000 -0.1300
#standardSQL
SELECT
id,
geoip.Country_Name AS country
FROM (
SELECT
id,
NET.IPV4_TO_INT64(NET.IP_FROM_STRING(ip)) AS ip,
TRUNC(NET.IPV4_TO_INT64(NET.IP_FROM_STRING(ip))/(256*256*256)) AS ip_class_a
FROM
`<project>.<dataset>.log` ) AS main
LEFT OUTER JOIN
`fh-bigquery.geocode.geolite_country_bq` AS geoip
ON
main.ip_class_a = geoip.classA
AND main.ip BETWEEN geoip.From_IP_Code
AND geoip.To_IP_Code
ORDER BY
id
Row id country
1 1 NA
2 2 Taiwan
3 3 NA
4 4 Hong Kong
5 5 Japan
6 6 Bangladesh
7 7 Germany
8 8 United States
9 9 Ireland
10 10 United States
11 11 Canada
12 12 Canada
13 13 Finland
14 14 United States
15 15 Greece
SELECT
id,
geoip.Country_Name AS country
FROM (
SELECT
id,
INTEGER(PARSE_IP(ip)) AS ip,
INTEGER(PARSE_IP(ip)/(256*256*256)) AS ip_class_a
FROM
[<project>:<dataset>.log] ) AS main
JOIN EACH [fh-bigquery:geocode.geolite_country_bq] AS geoip
ON
main.ip_class_a = geoip.classA
WHERE
main.ip BETWEEN geoip.From_IP_Code
AND geoip.To_IP_Code
ORDER BY
id
Row id country
1 1 NA
2 2 Taiwan
3 3 NA
4 4 Hong Kong
5 5 Japan
6 6 Bangladesh
7 7 Germany
8 8 United States
9 9 Ireland
10 10 United States
11 11 Canada
12 12 Canada
13 13 Finland
14 14 United States
15 15 Greece
@matsukaz
Copy link
Author

@mbyrne00
Thanks for your information!
Yeah, if you want to treat the record which doesn't match with geo data or just city name was lost as a same data, that nice idea :)

@quindici
Ah, fh-bigquery.geocode.geolite_city_bq_b2b may not be maintained any more...
I know it's really hard to map geo location from ip address, so I think we should use this query just for reference level data, not the exact place.

@byanjati
Copy link

neat!

@cheikhdjibrilla
Copy link

Can you please provide this table schema and info fh-bigquery.geocode.geolite_city_bq_b2b ?
in your first code you wrote this a.classB = b.classB. How do we get b.classB with the geoip dataset in which only the network ip is present?

@matsukaz
Copy link
Author

@cheikhdjibrilla
Unfortunately, it looks like google removed the table from public data.
I don't remember the structure of the table schema, but the original data was provided by MAXMIND.
May be you can find the table schema from the following site.
https://dev.maxmind.com/geoip/geoip2/geolite2/

@cheikhdjibrilla
Copy link

Thank you for your replay.

@dinesarun
Copy link

Did you able to find the table schema @cheikhdjibrilla

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment