Skip to content

Instantly share code, notes, and snippets.

@yalisassoon
yalisassoon / example_metrics_views.sql
Created June 9, 2026 12:42
genie_instructions_pt_2 example_metrics_views
-- Update these to point to your schemas, then run the whole script.
DECLARE OR REPLACE VARIABLE derived_schema STRING DEFAULT 'your_catalog.derived'; -- schema with the snowplow_unified dbt output
DECLARE OR REPLACE VARIABLE semantics_schema STRING DEFAULT 'your_catalog.semantics'; -- schema for the new metric views, can be the same
EXECUTE IMMEDIATE 'CREATE SCHEMA IF NOT EXISTS ' || semantics_schema;
@yalisassoon
yalisassoon / genie_instructions.md
Last active June 9, 2026 12:41
genie instructions.md

Snowplow Unified Digital (Web) is a dbt package that converts raw Snowplow behavioral events into three analytics-ready entities: views, sessions, and users. It standardizes and enriches event data (e.g., content, device, geo, traffic attribution) and computes engagement metrics such as engaged time and scroll depth, producing a reliable canonical layer for product analytics, funnel/journey reporting, experimentation, and ML/LLM use.

The core tables relate hierarchically: snowplow_unified_views has one row per content view (page/screen), capturing view-level context and engagement; snowplow_unified_sessions aggregates all activity within a visit into one row per session, rolling up view and event behaviors into session metrics; snowplow_unified_users aggregates sessions into one row per user, providing user traits and lifetime/rolling metrics, including optional identity stitching (e.g., stitched_user_id) across devices and login states.

Together, these tables replace repeated event-level aggreg

name: SNOWPLOW_UNIFIED_VIEWS
description: |
Snowplow Unified Digital (Web) is a dbt package that converts raw Snowplow behavioral events into three analytics-ready entities: views, sessions, and users. It standardizes and enriches event data (e.g., content, device, geo, traffic attribution) and computes engagement metrics such as engaged time and scroll depth, producing a reliable canonical layer for product analytics, funnel/journey reporting, experimentation, and ML/LLM use.
The core tables relate hierarchically:
`snowplow_unified_views` has one row per content view (page/screen), capturing view-level context and engagement;
`snowplow_unified_sessions` aggregates all activity within a visit into one row per session, rolling up view and event behaviors into session metrics;
`snowplow_unified_users` aggregates sessions into one row per user, providing user traits and lifetime/rolling metrics, including optional identity stitching (e.g., `stitched_user_id`) across devices and login states.
To
{
"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
"description": "Meta-schema for self-describing JSON schema",
"self": {
"vendor": "com.snowplowanalytics.self-desc",
"name": "schema",
"format": "jsonschema",
"version": "2-0-0"
},
"allOf": [
{
"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
"description": "Schema for a knock out event",
"self": {
"vendor": "com.gamescompany.fighting",
"name": "knock_out",
"format": "jsonschema",
"version": "1-0-0"
},
"metadata": {
-- first let's extract the "entries" array
select admission_data->>'entries' from joint_records limit 1;
-- Following query attempts to explode JSON based on the position of objects in the entries array
-- Note that the results show you cannot depend on the position because some entries have more json objects than others
-- Therefore we're gonig to have to find a way to identify the object in the array with the key e.g. `BabyCryTriage` and then return the corresponding value
select
admission_data->'entries'->0->'key' as "is_BabyCryTriage?",
admission_data->'entries'->0->'values'->0->'label' as "BabyCryTriage",
@yalisassoon
yalisassoon / neotree-deduplicate.sql
Last active February 25, 2020 09:04
Neotree db explore
-- First we create a view where we deduplicate the sessions
create materialized view deduplicated_sessions as
(
with latest_sessions as (
select
scriptid,
uid,
max(id) as id -- This takes the most recent upload
-- of the session as the deduplicated record.
-- We could replace with min(id) to take the
[{
"schema": "iglu:com.optimizely/experiment/jsonschema/1-0-0",
"data": {
"id": "3623930456",
"name": "",
"code": "PDP-Desktop-Rearrange colour size",
"manual": true,
"conditional": false,
"variationIds": [
"3646160533",
@yalisassoon
yalisassoon / raw-optimizely-data-object.json
Created January 20, 2016 19:11
Example Optimizely data object (raw)
{
"audiences": {
"3528620749": {
"conditions": [
"and",
[
"or",
[
"not",
[
@yalisassoon
yalisassoon / Upgrade atomic.events
Created April 20, 2015 15:59
SQL upgrade to release 64 from release 62
-- Copyright (c) 2015 Snowplow Analytics Ltd. All rights reserved.
--
-- This program is licensed to you under the Apache License Version 2.0,
-- and you may not use this file except in compliance with the Apache License Version 2.0.
-- You may obtain a copy of the Apache License Version 2.0 at http://www.apache.org/licenses/LICENSE-2.0.
--
-- Unless required by applicable law or agreed to in writing,
-- software distributed under the Apache License Version 2.0 is distributed on an
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the Apache License Version 2.0 for the specific language governing permissions and limitations there under.