Skip to content

Instantly share code, notes, and snippets.

@bilus
Created March 6, 2020 11:21
Show Gist options
  • Save bilus/d4993db95de9885189ed61aba03f3070 to your computer and use it in GitHub Desktop.
Save bilus/d4993db95de9885189ed61aba03f3070 to your computer and use it in GitHub Desktop.

radiodns

Broadcast data

Link raw data to remote ids

SELECT n.message.data.remote_content_id, n.message.data.remote_broadcast_id, n.message.data.source, r.message as rawData, n.message as normalizedData
FROM `conrad-production-api.conradEvents.StaticDataRaw` r
LEFT JOIN `conrad-production-api.conradEvents.StaticDataNormalized` n
  ON (n.message.data.update_id = JSON_EXTRACT_SCALAR(r.message, "$.data.update_id"))

Link remote ids to record ids

Use StaticDataEnrichedUpdates

SELECT key, message.data.delta.remote_broadcast_id, message.data.broadcast_id, message.data.delta.remote_content_id, message.data.content_id
       FROM `conrad-production-api.conradEvents.StaticDataEnrichedUpdates`
       WHERE (message.data.broadcast_id IS NOT NULL OR message.data.content_id IS NOT NULL)
WITH Mappings AS (SELECT key, message.data.delta.remote_broadcast_id AS remoteBroadcastId, message.data.broadcast_id AS broadcastId, message.data.delta.remote_content_id AS remoteContentId, message.data.content_id AS contentId, message.data.delta.source AS source
       FROM `conrad-production-api.conradEvents.StaticDataEnrichedUpdates`
       WHERE (message.data.broadcast_id IS NOT NULL OR message.data.content_id IS NOT NULL))
SELECT remoteBroadcastId, source, broadcastId FROM Mappings LIMIT 100

We are getting ~2000 records.

Alternative solutions

Load content mappings & broadcast records from matcher database every day.

Generate initial data and then use events.

Ask Zach if our hypothesis is true

Hypothesis: We only have ~2000 records ^ because the events only contain CHANGES of mapping of remote ids. ****

Create daily snapshots of raw data in airflow (manually at first?)

Link daily snapshots of raw data to daily snapshots of broadcast records to coverage areas.

Link inside daily snapshots the day’s fmlist to radiodns.

Content data?

Minimal plan

Export content & broadcast mappings from matcher db to CSV and add it InitialBroadcast(Content)MappingSyntheticEvents table

timestamp, source, remoteBroadcastId, broadcastId (or content…)

Link raw data to remote ids to record ids

SELECT n.message.data.remote_content_id, n.message.data.remote_broadcast_id, n.message.data.source, r.message as rawData, n.message as normalizedData
FROM `conrad-production-api.conradEvents.StaticDataRaw` r
LEFT JOIN `conrad-production-api.conradEvents.StaticDataNormalized` n
  ON (n.message.data.update_id = JSON_EXTRACT_SCALAR(r.message, "$.data.update_id"))

Link synthetic events to raw data using remote ids using the latest raw data version; MAX(timestamp)

General idea

SELECT u.broadcastId, u.baselineData.country, timestamp FROM `conrad-analytics-staging.ProductionStatic.BroadcastUpdates` u
RIGHT JOIN
  (SELECT broadcastId, MAX(timestamp) AS timestamp FROM `conrad-analytics-staging.ProductionStatic.BroadcastUpdates` GROUP BY broadcastId)
  USING (broadcastId, timestamp) WHERE broadcastId = "196422" LIMIT 100

Of course, you need to use the proper table + remote id and source tuples to group and match.

Link to CoverageAreas via DailyFactSnapshots using broadcast id and updateId.

Find the centroid of coverage areas and generate the pin map table

LatestBroadcastLocations lat, lng, broadcastId

LatestBroadcastDataSources broadcastId, source

Content table

Flatten everything

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment