With BigQuery's new remote user defined functions (in preview) it is now possible to bring the power of Google Maps to your analytic data warehouse. Using Google Maps API endpoints in Cloud Functions called by BigQuery you can:
- Geocode Addresses
- Determine drive time distance between locations
- Supplementing address or location data with Google Map's data such as elevation or place descriptions
By enriching location datasets in BigQuery you can accomplish advanced spatial analysis including:
- Route and drive time optimization
- Geographic clustering and cohort analysis
For example, the following query will calculate the drive time between a set of distribution centers and a list of users, allowing organizations to optimize distribution center deliveries.
WITH shipping AS (
SELECT
CONCAT(users.first_name, " ", users.last_name) AS destination_name,
CONCAT('{\"lat\": ', users.latitude, ', \"lng\": ', users.longitude, '}') AS destination,
dc.name AS origin_name,
CONCAT('{\"lat\": ', dc.latitude, ', \"lng\": ', dc.longitude, '}') AS origin
FROM `looker-private-demo`.thelook.users
CROSS JOIN `looker-private-demo`.thelook.distribution_centers AS dc
WHERE users.state="Tennessee"
)
SELECT
bq_udf_test.drivetime(shipping.origin, shipping.destination) AS best_drivetime,
shipping.destination_name,
shipping.origin_name
FROM shipping
ORDER BY 1, 2
This example calculates 10K route times in approx 30 seconds!
The supporting drivetime function is a very simple cloud function built around 2 Python functions, the functional psuedo-code is below:
# This function handles the formatting, accepting a BQ request, calling our get_drivetime function, and formatting the result
def drivetime_from_addresses(request):
# initiate a googlemaps client using
# a runtime environment variable that injects a cloud secret
api_key = os.environ.get('MAPS_GEOCODE_API_KEY', 'No Maps API Key Available')
gmaps = googlemaps.Client(key=api_key)
# get the BQ request
return_value = []
request_json = request.get_json()
rows = request_json['calls']
# iterate through rows with two columns: origin, destination
for row in rows:
return_value.append(get_drivetime(row[0], row[1], gmaps))
# format the result as BQ expects
replies = [float(x) for x in return_value]
return_json = json.dumps( { "replies" : replies} )
return(return_json)
# get drivetime in seconds for an origin, destination pair
def get_drivetime(origin, destination, gmaps):
print(origin)
print(destination)
content = gmaps.distance_matrix([origin], [destination])
drivetime_sec = content['rows'][0]['elements'][0]['duration']['value']
return(json.dumps(drivetime_sec))
The following steps go into more detail, using an example that shows how to geocode addresses to retrieve their latitude and longitude.
- Create a BigQuery dataset including a table with address information, or use:
CREATE TABLE your_dataset.addresses AS
SELECT
CONCAT(street, " ",
SPLIT(client.address,'|')[OFFSET(0)], " ",
SPLIT(client.address,'|')[OFFSET(1)], " ",
SPLIT(client.address,'|')[OFFSET(2)]
) AS address
FROM `looker-private-demo.retail_banking.client` AS client
LIMIT 100
-
Enable the Google Maps API and create an API key of interest (e.g. the geocoding API). Add the API key as a Cloud Secret.
-
Create and deploy a cloud function with the example main.py file and the requirements.txt file, enabling access to the Maps API key secret. Use
lat_lon_from_addresses
as the entrypoint.
- Follow the steps to enable an external BigQuery connection, sourced from this document:
bq mk --connection --display_name='geocode fn conn' --connection_type=CLOUD_RESOURCE
--project_id=my-project-id --location=US geocode_fn_conn
- Give the service account from step 4 the cloud invoker permission on your function from step 3
- In BigQuery, register your cloud function as a user defined function:
CREATE FUNCTION `your-gcp-project.your_dataset`.geocode(call STRING) RETURNS STRING
REMOTE WITH CONNECTION `your-gcp-project.us.geocode_fn_conn`
OPTIONS (endpoint = 'https://your-cloud-function-endpoint-url/geocode')
- In BigQuery, use your new cloud function:
WITH result AS (
SELECT
address,
`your-gcp-project.your_dataset`.geocode(address) AS location
FROM `your-gcp-project.your_dataset.addresses`
)
SELECT
address,
json_extract_scalar(location, '$.lat') AS lat,
json_extract_scalar(location, '$.lng') AS lng
FROM result
This pattern is highly extensible to other data enrichment tasks. Update the Cloud Function to use any Maps API endpoint, or copy and deploy a second function to chain together tasks.