SELECT n.message.data.remote_content_id, n.message.data.remote_broadcast_id, n.message.data.source, r.message as rawData, n.message as normalizedData
FROM `conrad-production-api.conradEvents.StaticDataRaw` r
LEFT JOIN `conrad-production-api.conradEvents.StaticDataNormalized` n
ON (n.message.data.update_id = JSON_EXTRACT_SCALAR(r.message, "$.data.update_id"))
Use StaticDataEnrichedUpdates
SELECT key, message.data.delta.remote_broadcast_id, message.data.broadcast_id, message.data.delta.remote_content_id, message.data.content_id
FROM `conrad-production-api.conradEvents.StaticDataEnrichedUpdates`
WHERE (message.data.broadcast_id IS NOT NULL OR message.data.content_id IS NOT NULL)
WITH Mappings AS (SELECT key, message.data.delta.remote_broadcast_id AS remoteBroadcastId, message.data.broadcast_id AS broadcastId, message.data.delta.remote_content_id AS remoteContentId, message.data.content_id AS contentId, message.data.delta.source AS source
FROM `conrad-production-api.conradEvents.StaticDataEnrichedUpdates`
WHERE (message.data.broadcast_id IS NOT NULL OR message.data.content_id IS NOT NULL))
SELECT remoteBroadcastId, source, broadcastId FROM Mappings LIMIT 100
We are getting ~2000 records.
Hypothesis: We only have ~2000 records ^ because the events only contain CHANGES of mapping of remote ids. ****
Export content & broadcast mappings from matcher db to CSV and add it InitialBroadcast(Content)MappingSyntheticEvents table
SELECT n.message.data.remote_content_id, n.message.data.remote_broadcast_id, n.message.data.source, r.message as rawData, n.message as normalizedData
FROM `conrad-production-api.conradEvents.StaticDataRaw` r
LEFT JOIN `conrad-production-api.conradEvents.StaticDataNormalized` n
ON (n.message.data.update_id = JSON_EXTRACT_SCALAR(r.message, "$.data.update_id"))
Link synthetic events to raw data using remote ids using the latest raw data version; MAX(timestamp)
General idea
SELECT u.broadcastId, u.baselineData.country, timestamp FROM `conrad-analytics-staging.ProductionStatic.BroadcastUpdates` u
RIGHT JOIN
(SELECT broadcastId, MAX(timestamp) AS timestamp FROM `conrad-analytics-staging.ProductionStatic.BroadcastUpdates` GROUP BY broadcastId)
USING (broadcastId, timestamp) WHERE broadcastId = "196422" LIMIT 100
Of course, you need to use the proper table + remote id and source tuples to group and match.