Skip to content

Instantly share code, notes, and snippets.

@luisdelatorre012
Last active July 21, 2025 16:14
Show Gist options
  • Save luisdelatorre012/434e377a643483498fcd62dbb4e65dda to your computer and use it in GitHub Desktop.
Save luisdelatorre012/434e377a643483498fcd62dbb4e65dda to your computer and use it in GitHub Desktop.
Asking O3 to help me with setting up the multi-key relationship

You:

I'm building a model in power bi. I have two tables, and there is a one-to-one relationship between them based on the combination of these columns: branch_origin_key, branch_dest_key, outbound_revenue_customer_key How do I get around the limitation that relationships between tables in power bi can only be based on one column?

ChatGPT:

Thought for 13 seconds

Power BI’s model designer can only link two tables on one column, so you have to collapse your three-column key into a single surrogate key (or refactor the model). Here are the two common ways:

Approach How to do it Pros Cons
Surrogate key in each fact table 1. Add a calculated column in both tables:

```DAX
CompositeKey = COMBINEVALUES(" | ",branch_origin_key, branch_dest_key, outbound_revenue_customer_key)```

2. Set the relationship on CompositeKey (cardinality = 1-to-1, cross-filter = Both, if that’s what you need). | • Quick—no extra table
• Works with Import or DirectQuery | | Bridge / dimension table | 1. In Power Query (or in the data source) create a table that contains every distinct combination of the three columns and add the same CompositeKey.
2. Relate each fact table to this new table on CompositeKey.
3. Optionally hide the surrogate key column and keep the three originals to use in slicers. | • Keeps the model in star-schema style
• Lets you put the three keys in one slicer table
• Avoids repeating long string key in multiple large fact tables | • One extra table
• Slightly more up-front work |

Tips & gotchas

  • Uniqueness – run a quick measure to be sure each table’s CompositeKey column has no duplicates:

    HasDupes =
        VAR total   = COUNTROWS ( 'Table' )
        VAR distinct= DISTINCTCOUNT ( 'Table'[CompositeKey] )
        RETURN IF ( total = distinct, "OK", "Duplicates!" )
    
  • Performance – if your model is big, consider creating a numeric surrogate key in the source (e.g., an integer identity) instead of the text that COMBINEVALUES produces.

  • Why not relationships on all three columns? – Power BI’s engine (VertiPaq) stores relationships as a single hash lookup, so multi-column joins would break that design.

  • When to merge instead – since it’s 1-to-1, you could skip the relationship entirely and merge the two tables in Power Query; just keep the columns you actually need.

Pick whichever path best matches your data-volume and modeling preferences—most teams start with the COMBINEVALUES column because it’s fast to implement, then move to a proper surrogate key or bridge table if the model grows.

You:

let's do this. Performance – if your model is big, consider creating a numeric surrogate key in the source (e.g., an integer identity) instead of the text that COMBINEVALUES produces. I'm using sql server. my two tables are named air_housebills and aggregated_data

ChatGPT:

Thought for 5 seconds

Below is a pragmatic pattern you can run once in SQL Server to give both air_housebills and aggregated_data the same numeric surrogate key that represents
(branch_origin_key, branch_dest_key, outbound_revenue_customer_key).
It keeps the tables unchanged except for a single INT column each and a small bridge table.


1 Create a bridge (dimension) table with an identity key

-- 1. Holds every distinct 3-column combination only once
CREATE TABLE dbo.BranchDestCustBridge
(
    SurrogateKey   INT IDENTITY(1,1) PRIMARY KEY,
    branch_origin_key          INT  NOT NULL,
    branch_dest_key            INT  NOT NULL,
    outbound_revenue_customer_key INT NOT NULL,
    CONSTRAINT uq_bridge UNIQUE
        (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
);

IDENTITY makes SurrogateKey a compact 4-byte integer; the UNIQUE constraint guarantees you never get duplicate rows.


2 Populate the bridge from the two source tables

INSERT INTO dbo.BranchDestCustBridge
        (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM (
        SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM dbo.air_housebills
        UNION      -- UNION removes duplicates across both tables
        SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM dbo.aggregated_data
     ) AS u;

Run this once (or re-run as an UP‐SERT in a nightly job if the combo list grows).


3 Add a surrogate-key column to each fact table

-- 3a. Add nullable column first
ALTER TABLE dbo.air_housebills     ADD SurrogateKey INT NULL;
ALTER TABLE dbo.aggregated_data    ADD SurrogateKey INT NULL;

-- 3b. Fill it by looking up the bridge
UPDATE h
SET    SurrogateKey = b.SurrogateKey
FROM   dbo.air_housebills h
JOIN   dbo.BranchDestCustBridge b
  ON   b.branch_origin_key   = h.branch_origin_key
 AND   b.branch_dest_key     = h.branch_dest_key
 AND   b.outbound_revenue_customer_key = h.outbound_revenue_customer_key;

UPDATE a
SET    SurrogateKey = b.SurrogateKey
FROM   dbo.aggregated_data a
JOIN   dbo.BranchDestCustBridge b
  ON   b.branch_origin_key   = a.branch_origin_key
 AND   b.branch_dest_key     = a.branch_dest_key
 AND   b.outbound_revenue_customer_key = a.outbound_revenue_customer_key;

-- 3c. Make it NOT NULL and add an index
ALTER TABLE dbo.air_housebills  ALTER COLUMN SurrogateKey INT NOT NULL;
ALTER TABLE dbo.aggregated_data ALTER COLUMN SurrogateKey INT NOT NULL;

CREATE INDEX IX_air_housebills_SKey  ON dbo.air_housebills  (SurrogateKey);
CREATE INDEX IX_aggregated_SKey      ON dbo.aggregated_data (SurrogateKey);

Optionally add a FOREIGN KEY back to the bridge for data integrity.


4 Use it in Power BI

  1. Import the three tables.

  2. Create one-to-many relationships:

    • BranchDestCustBridge[SurrogateKey] → air_housebills[SurrogateKey]

    • BranchDestCustBridge[SurrogateKey] → aggregated_data[SurrogateKey]

  3. Hide the SurrogateKey columns in the report view if you don’t want users to see them.

  4. If report users need to slice by branch or customer, expose those three natural-key columns from the bridge (they live only once).


Why this is fast

Benefit Reason
4-byte lookup instead of long VARCHAR VertiPaq compression is optimal on integers; memory footprint drops sharply.
Single join Power BI relationships are hash tables on one column—exactly what SurrogateKey gives you.
Indexes keep ETL cheap On large tables the SurrogateKey cluster-friendly index speeds lookups for updates/inserts.

Alternative: hashed computed column

If you cannot add a bridge table (e.g., read-only source), you can:

ALTER TABLE dbo.air_housebills
ADD CompositeHash AS
    (CONVERT(BIGINT,
             CHECKSUM(branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
            )) PERSISTED;
-- Repeat for aggregated_data and build a relationship on CompositeHash

…but CHECKSUM collisions, even though rare, are possible, so the identity-bridge approach above is safer and easier to maintain.

You:

I'm a little confused about step 3, "3 Add a surrogate-key column to each fact table". I'm going to run a monthly job that truncates all the tables and replaces the data. that doesn't seem like it will work with step 3.

ChatGPT:

Thought for 29 seconds

How the surrogate-key pattern plays with a monthly TRUNCATE + reload

The key idea is that the SurrogateKey column is just another data column.
When you TRUNCATE the fact tables you wipe all columns (including SurrogateKey), then you reload them and repopulate that column exactly the same way each month. Nothing in Step 3 assumes the tables stay untouched—only that you repopulate the key after each load.

Below are three practical variants you can choose from, depending on how your ETL job already works.


Variant A – keep the bridge table persistent (recommended)

  1. Do not truncate the bridge.
    It holds the master list of key combinations and their identity numbers.
    Keep it growing over time; an INT identity can store >2 billion rows.

  2. ETL sequence each month

    BEGIN TRAN;
    
    -- 1. Truncate and reload the two staging tables
    TRUNCATE TABLE dbo.air_housebills;
    TRUNCATE TABLE dbo.aggregated_data;
    INSERT INTO dbo.air_housebills      (...) SELECT ... FROM source1;
    INSERT INTO dbo.aggregated_data     (...) SELECT ... FROM source2;
    
    -- 2. Up-sert new combos into the bridge (only inserts run)
    MERGE dbo.BranchDestCustBridge AS tgt
    USING (
           SELECT DISTINCT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
           FROM (
                   SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
                   FROM dbo.air_housebills
                   UNION ALL
                   SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
                   FROM dbo.aggregated_data
                ) s
          ) AS src
    ON  src.branch_origin_key           = tgt.branch_origin_key
    AND src.branch_dest_key             = tgt.branch_dest_key
    AND src.outbound_revenue_customer_key = tgt.outbound_revenue_customer_key
    WHEN NOT MATCHED BY TARGET THEN
         INSERT (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
         VALUES (src.branch_origin_key, src.branch_dest_key, src.outbound_revenue_customer_key);
    
    -- 3. Populate SurrogateKey in the facts
    UPDATE h
    SET    SurrogateKey = b.SurrogateKey
    FROM   dbo.air_housebills h
    JOIN   dbo.BranchDestCustBridge b
      ON   b.branch_origin_key   = h.branch_origin_key
     AND   b.branch_dest_key     = h.branch_dest_key
     AND   b.outbound_revenue_customer_key = h.outbound_revenue_customer_key;
    
    UPDATE a
    SET    SurrogateKey = b.SurrogateKey
    FROM   dbo.aggregated_data a
    JOIN   dbo.BranchDestCustBridge b
      ON   b.branch_origin_key   = a.branch_origin_key
     AND   b.branch_dest_key     = a.branch_dest_key
     AND   b.outbound_revenue_customer_key = a.outbound_revenue_customer_key;
    
    COMMIT;

Why this is easiest: the identity values in the bridge never change, so incremental refresh in Power BI or snapshotting across months still works.


Variant B – rebuild the bridge every month

If you must truncate the bridge as well (e.g., corporate policy says “truncate everything”), do the steps in this order so the keys line up:

  1. Truncate the bridge.

  2. Insert distinct combos into the bridge (same INSERT … SELECT … UNION as before).

  3. Truncate & reload the fact tables while joining to the freshly built bridge to pick up SurrogateKey during the insert:

    TRUNCATE TABLE dbo.air_housebills;
    
    INSERT INTO dbo.air_housebills (SurrogateKey, col1, col2, …)
    SELECT b.SurrogateKey, s.col1, s.col2, …
    FROM   Staging_air_housebills s         -- raw data already loaded into staging
    JOIN   dbo.BranchDestCustBridge b
           ON b.branch_origin_key   = s.branch_origin_key
          AND b.branch_dest_key     = s.branch_dest_key
          AND b.outbound_revenue_customer_key = s.outbound_revenue_customer_key;

Power BI doesn’t care that the identity numbers are different each month—as long as you do a full refresh after the reload.


Variant C – deterministic hash instead of bridge

If you have read-only access and cannot create or maintain a bridge, you can stamp a deterministic numeric hash in each fact table:

ALTER TABLE dbo.air_housebills
ADD SurrogateKey AS
    (ABS(CONVERT(BIGINT,
        CONVERT(BINARY(8), HASHBYTES('SHA1',
              CONCAT(branch_origin_key, '|',
                     branch_dest_key,   '|',
                     outbound_revenue_customer_key)), 1)
        ))) PERSISTED;

Pros: no join, no extra table.
Cons: slightly longer key (BIGINT), a vanishingly small risk of collisions, and you lose the star-schema “dimension” benefits.


Which one should I pick?

Scenario Best variant
You can keep a small helper table and you’d like lightning-fast joins, easy slicers, and minimal risk Variant A
Corporate ETL wipes everything every load and you can tweak the load scripts Variant B
Data source is read-only, or adding extra tables is impossible Variant C

Power BI model steps stay the same

Regardless of which loading variant you choose, in Power BI you still:

  1. Import the bridge/dimension (unless you went with Variant C).

  2. Import the two fact tables.

  3. Create one-to-many relationships on SurrogateKey.

  4. Refresh the model after each monthly ETL run.

That’s all—the monthly truncate/reload cycle just means your ETL has to repopulate the SurrogateKey column every time.

You:

I like variant A. show me the full steps 1-4 assuming that I will use variant A.

ChatGPT:

Thought for 5 seconds

Below is a complete, end-to-end recipe for Variant A – persistent bridge table.
Follow it once to set everything up, then reuse the “Monthly ETL” block each time you reload.


Step 1 Create the bridge (dimension) table once

/* 1. Master list of the three natural-key columns */
CREATE TABLE dbo.BranchDestCustBridge
(
    SurrogateKey   INT IDENTITY(1,1) PRIMARY KEY,
    branch_origin_key              INT  NOT NULL,
    branch_dest_key                INT  NOT NULL,
    outbound_revenue_customer_key  INT  NOT NULL,
    CONSTRAINT uq_BranchDestCust UNIQUE
            (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
);
/* Optional but useful for ETL lookups */
CREATE INDEX IX_BranchDestCust_3cols
    ON dbo.BranchDestCustBridge (branch_origin_key,
                                 branch_dest_key,
                                 outbound_revenue_customer_key);

Why keep it forever?
Every unique combination gets one identity number and never changes, so Power BI relationships—and any historical snapshots you keep—stay stable.


Step 2 Initial load of the bridge once

INSERT INTO dbo.BranchDestCustBridge
        (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM (
        SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM dbo.air_housebills

        UNION        -- UNION removes duplicates across the two tables
        SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM dbo.aggregated_data
     ) AS u;

After this point the bridge contains every combo that already exists in either fact table.


Step 3 Add surrogate-key column to each fact table once

ALTER TABLE dbo.air_housebills     ADD SurrogateKey INT NULL;
ALTER TABLE dbo.aggregated_data    ADD SurrogateKey INT NULL;

/* Fill it for the rows currently in the tables */
UPDATE h
SET    SurrogateKey = b.SurrogateKey
FROM   dbo.air_housebills h
JOIN   dbo.BranchDestCustBridge b
  ON   b.branch_origin_key   = h.branch_origin_key
 AND   b.branch_dest_key     = h.branch_dest_key
 AND   b.outbound_revenue_customer_key = h.outbound_revenue_customer_key;

UPDATE a
SET    SurrogateKey = b.SurrogateKey
FROM   dbo.aggregated_data a
JOIN   dbo.BranchDestCustBridge b
  ON   b.branch_origin_key   = a.branch_origin_key
 AND   b.branch_dest_key     = a.branch_dest_key
 AND   b.outbound_revenue_customer_key = a.outbound_revenue_customer_key;

/* Lock it down and index it */
ALTER TABLE dbo.air_housebills     ALTER COLUMN SurrogateKey INT NOT NULL;
ALTER TABLE dbo.aggregated_data    ALTER COLUMN SurrogateKey INT NOT NULL;

ALTER TABLE dbo.air_housebills
    ADD CONSTRAINT fk_AHB_SKey FOREIGN KEY (SurrogateKey)
        REFERENCES dbo.BranchDestCustBridge (SurrogateKey);
ALTER TABLE dbo.aggregated_data
    ADD CONSTRAINT fk_AGG_SKey FOREIGN KEY (SurrogateKey)
        REFERENCES dbo.BranchDestCustBridge (SurrogateKey);

CREATE INDEX IX_AHB_SKey ON dbo.air_housebills     (SurrogateKey);
CREATE INDEX IX_AGG_SKey ON dbo.aggregated_data    (SurrogateKey);

You’re now ready to build the Power BI model, but the keys also need to be refreshed after each monthly reload…


Step 3 (bis) Monthly ETL sequence (truncate + reload + key refresh)

Run this block every time you replace the data.

BEGIN TRAN;

/* 0. Truncate the two fact tables.  (Bridge is NOT truncated) */
TRUNCATE TABLE dbo.air_housebills;
TRUNCATE TABLE dbo.aggregated_data;

/* 1. Reload fact tables from your external sources */
INSERT INTO dbo.air_housebills (branch_origin_key, branch_dest_key,
                                outbound_revenue_customer_key, …other columns…)
SELECTFROM source1;          -- your real load here

INSERT INTO dbo.aggregated_data (branch_origin_key, branch_dest_key,
                                 outbound_revenue_customer_key, …)
SELECTFROM source2;

/* 2. Up-sert any brand-new combos into the bridge */
MERGE dbo.BranchDestCustBridge AS tgt
USING (
        /* Distinct list across both freshly loaded tables */
        SELECT DISTINCT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM (
                SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
                FROM dbo.air_housebills
                UNION ALL
                SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
                FROM dbo.aggregated_data
             ) s
      ) AS src
ON  src.branch_origin_key           = tgt.branch_origin_key
AND src.branch_dest_key             = tgt.branch_dest_key
AND src.outbound_revenue_customer_key = tgt.outbound_revenue_customer_key
WHEN NOT MATCHED BY TARGET THEN
     INSERT (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
     VALUES (src.branch_origin_key, src.branch_dest_key,
             src.outbound_revenue_customer_key);

/* 3. Stamp SurrogateKey in both fact tables */
UPDATE h
SET    SurrogateKey = b.SurrogateKey
FROM   dbo.air_housebills h
JOIN   dbo.BranchDestCustBridge b
  ON   b.branch_origin_key   = h.branch_origin_key
 AND   b.branch_dest_key     = h.branch_dest_key
 AND   b.outbound_revenue_customer_key = h.outbound_revenue_customer_key;

UPDATE a
SET    SurrogateKey = b.SurrogateKey
FROM   dbo.aggregated_data a
JOIN   dbo.BranchDestCustBridge b
  ON   b.branch_origin_key   = a.branch_origin_key
 AND   b.branch_dest_key     = a.branch_dest_key
 AND   b.outbound_revenue_customer_key = a.outbound_revenue_customer_key;

COMMIT;

Each month:

  • fact tables are emptied and refilled;

  • the bridge only grows if new combos appear;

  • SurrogateKey values stay stable for existing combos; brand-new combos get new identity values.


Step 4 Create the relationships in Power BI (one-time model design)

  1. Get data

    • Import BranchDestCustBridge, air_housebills, and aggregated_data.
  2. In Model view create two relationships:

    From table Column To table Column Cardinality Cross-filter
    BranchDestCustBridge SurrogateKey air_housebills SurrogateKey 1 : * Both (if you want bidirectional)
    BranchDestCustBridge SurrogateKey aggregated_data SurrogateKey 1 : * Both
  3. Hide SurrogateKey in the report view (optional).

  4. Expose the three natural-key columns in the bridge if users need them for slicers.

  5. After every monthly ETL run, refresh the Power BI dataset (gateway or manual).


What you gain

  • Compact memory & fast joins – Power BI looks up 4-byte integers instead of multi-column strings.

  • Stable keys – Historical snapshots or incremental refresh partitions stay valid month-to-month.

  • Clean star schema – The bridge doubles as a slicer table if you need to filter by branch or customer.

That’s the entire workflow for Variant A. Once you put these scripts into your ETL pipeline and schedule the dataset refresh, everything runs hands-off each month.

You:

Step 3 seems like it won’t work. The surrogate keys are non-null at this point, so you can’t update them after an insert.

ChatGPT:

Thought for 52 seconds

Good catch. You’re absolutely right: once SurrogateKey is NOT NULL (and especially if a FK exists), you cannot insert rows without supplying that value and then update it later. My earlier “insert then update” monthly ETL block would fail under those constraints.

Below is the corrected Variant A (persistent bridge) workflow that loads SurrogateKey at insert time. I’ll show the full end-to-end steps again with that fix incorporated.


Variant A (Persistent Bridge) – Corrected Full Workflow

Overview of the load pattern

  1. Load raw data into staging tables (no SurrogateKey columns).

  2. Upsert any new (branch_origin_key, branch_dest_key, outbound_revenue_customer_key) combos into the persistent bridge.

  3. Insert into the production fact tables joining to the bridge to pick up SurrogateKey in the INSERT.

  4. (Optional) swap/rename partitions instead of TRUNCATE if you want minimal downtime; below I’ll keep TRUNCATE for clarity.


Step 1 – Create the bridge (run once)

CREATE TABLE dbo.BranchDestCustBridge
(
    SurrogateKey   INT IDENTITY(1,1) PRIMARY KEY,
    branch_origin_key              INT  NOT NULL,
    branch_dest_key                INT  NOT NULL,
    outbound_revenue_customer_key  INT  NOT NULL,
    CONSTRAINT uq_BranchDestCust UNIQUE
        (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
);
GO

CREATE INDEX IX_BranchDestCust_3cols
ON dbo.BranchDestCustBridge (branch_origin_key, branch_dest_key, outbound_revenue_customer_key);
GO

Step 2 – Initial bridge seed (run once, before introducing SurrogateKey into facts)

INSERT INTO dbo.BranchDestCustBridge
        (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM (
        SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM dbo.air_housebills
        UNION
        SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM dbo.aggregated_data
     ) AS u;
GO

Step 3 – Add SurrogateKey columns to fact tables (run once)

IMPORTANT: We’ll create them NULLABLE now, populate them, then set NOT NULL.
After that, future reloads must insert SurrogateKey.

ALTER TABLE dbo.air_housebills     ADD SurrogateKey INT NULL;
ALTER TABLE dbo.aggregated_data    ADD SurrogateKey INT NULL;
GO

/* Backfill current rows */
UPDATE h
SET    SurrogateKey = b.SurrogateKey
FROM   dbo.air_housebills AS h
JOIN   dbo.BranchDestCustBridge AS b
  ON   b.branch_origin_key   = h.branch_origin_key
 AND   b.branch_dest_key     = h.branch_dest_key
 AND   b.outbound_revenue_customer_key = h.outbound_revenue_customer_key;

UPDATE a
SET    SurrogateKey = b.SurrogateKey
FROM   dbo.aggregated_data AS a
JOIN   dbo.BranchDestCustBridge AS b
  ON   b.branch_origin_key   = a.branch_origin_key
 AND   b.branch_dest_key     = a.branch_dest_key
 AND   b.outbound_revenue_customer_key = a.outbound_revenue_customer_key;

/* Enforce integrity */
ALTER TABLE dbo.air_housebills     ALTER COLUMN SurrogateKey INT NOT NULL;
ALTER TABLE dbo.aggregated_data    ALTER COLUMN SurrogateKey INT NOT NULL;

ALTER TABLE dbo.air_housebills
    ADD CONSTRAINT fk_AHB_SKey FOREIGN KEY (SurrogateKey)
        REFERENCES dbo.BranchDestCustBridge (SurrogateKey);

ALTER TABLE dbo.aggregated_data
    ADD CONSTRAINT fk_AGG_SKey FOREIGN KEY (SurrogateKey)
        REFERENCES dbo.BranchDestCustBridge (SurrogateKey);

CREATE INDEX IX_AHB_SKey ON dbo.air_housebills  (SurrogateKey);
CREATE INDEX IX_AGG_SKey ON dbo.aggregated_data (SurrogateKey);
GO

At this point your production fact tables are keyed and enforced.


Step 4 – Monthly ETL Reload (corrected)

Goal: Truncate facts, reload from source each month, and supply SurrogateKey values during insert. To do that we (a) load to staging, (b) upsert bridge, (c) insert into prod with join.

If your current ETL already lands data in staging tables (recommended), follow the pattern below.
If not, create temp (#) or staging tables in the job to mimic this.

4.1 Staging tables (if not already present)

You can use permanent staging tables or truncate-and-replace them each cycle.

-- Example staging schemas; adjust as needed
IF OBJECT_ID('stg.air_housebills_raw','U') IS NULL
BEGIN
    CREATE TABLE stg.air_housebills_raw
    (
        branch_origin_key              INT  NOT NULL,
        branch_dest_key                INT  NOT NULL,
        outbound_revenue_customer_key  INT  NOT NULL,
        /* other source columns ... */
        /* types must match or be castable */
    );
END;

IF OBJECT_ID('stg.aggregated_data_raw','U') IS NULL
BEGIN
    CREATE TABLE stg.aggregated_data_raw
    (
        branch_origin_key              INT  NOT NULL,
        branch_dest_key                INT  NOT NULL,
        outbound_revenue_customer_key  INT  NOT NULL,
        /* other source columns ... */
    );
END;
GO

4.2 Monthly ETL script

BEGIN TRY
    BEGIN TRAN;

    /* A. Refresh staging from source systems */
    TRUNCATE TABLE stg.air_housebills_raw;
    TRUNCATE TABLE stg.aggregated_data_raw;

    -- Replace these with your real source pulls
    INSERT INTO stg.air_housebills_raw (branch_origin_key, branch_dest_key, outbound_revenue_customer_key /*, ...*/)
    SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key /*, ...*/
    FROM   SourceSystem1.dbo.air_housebills_extract;  -- example

    INSERT INTO stg.aggregated_data_raw (branch_origin_key, branch_dest_key, outbound_revenue_customer_key /*, ...*/)
    SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key /*, ...*/
    FROM   SourceSystem2.dbo.aggregated_data_extract; -- example

    /* B. Upsert new combos into the persistent bridge */
    ;WITH AllCombos AS (
        SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM stg.air_housebills_raw
        UNION ALL
        SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM stg.aggregated_data_raw
    ),
    DistCombos AS (
        SELECT DISTINCT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
        FROM AllCombos
    )
    INSERT INTO dbo.BranchDestCustBridge (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
    SELECT d.branch_origin_key, d.branch_dest_key, d.outbound_revenue_customer_key
    FROM DistCombos AS d
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.BranchDestCustBridge AS b
        WHERE b.branch_origin_key   = d.branch_origin_key
          AND b.branch_dest_key     = d.branch_dest_key
          AND b.outbound_revenue_customer_key = d.outbound_revenue_customer_key
    );
    /* (Avoid MERGE if you prefer; above pattern is concurrency-safe and simple.) */

    /* C. Truncate production fact tables */
    TRUNCATE TABLE dbo.air_housebills;
    TRUNCATE TABLE dbo.aggregated_data;

    /* D. Insert into production facts WITH SurrogateKey lookup */
    INSERT INTO dbo.air_housebills
            (SurrogateKey,
             branch_origin_key, branch_dest_key, outbound_revenue_customer_key,
             /* other columns in target order */)
    SELECT b.SurrogateKey,
           s.branch_origin_key, s.branch_dest_key, s.outbound_revenue_customer_key,
           /* other columns from staging in matching order */
    FROM   stg.air_housebills_raw AS s
    JOIN   dbo.BranchDestCustBridge AS b
       ON  b.branch_origin_key   = s.branch_origin_key
      AND  b.branch_dest_key     = s.branch_dest_key
      AND  b.outbound_revenue_customer_key = s.outbound_revenue_customer_key;

    INSERT INTO dbo.aggregated_data
            (SurrogateKey,
             branch_origin_key, branch_dest_key, outbound_revenue_customer_key,
             /* other columns */)
    SELECT b.SurrogateKey,
           s.branch_origin_key, s.branch_dest_key, s.outbound_revenue_customer_key,
           /* other columns */
    FROM   stg.aggregated_data_raw AS s
    JOIN   dbo.BranchDestCustBridge AS b
       ON  b.branch_origin_key   = s.branch_origin_key
      AND  b.branch_dest_key     = s.branch_dest_key
      AND  b.outbound_revenue_customer_key = s.outbound_revenue_customer_key;

    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    THROW;
END CATCH;
GO

Why this works: Every insert into the production fact tables already has a valid SurrogateKey, so NOT NULL and FOREIGN KEY constraints are satisfied at insert time—no post-insert update required.


Step 5 – Power BI model (unchanged)

Create the relationships from BranchDestCustBridge[SurrogateKey] to each fact’s SurrogateKey (1:*). Refresh after each monthly reload.


Sanity checks you can add to the ETL

1. Missing bridge rows (should be zero):

SELECT COUNT(*) AS MissingInBridge
FROM stg.air_housebills_raw s
LEFT JOIN dbo.BranchDestCustBridge b
  ON  b.branch_origin_key   = s.branch_origin_key
 AND b.branch_dest_key     = s.branch_dest_key
 AND b.outbound_revenue_customer_key = s.outbound_revenue_customer_key
WHERE b.SurrogateKey IS NULL;

Run the same for aggregated_data_raw. If anything > 0, the bridge upsert step failed.

2. Post-load null keys (should be zero):

SELECT COUNT(*) FROM dbo.air_housebills  WHERE SurrogateKey IS NULL;
SELECT COUNT(*) FROM dbo.aggregated_data WHERE SurrogateKey IS NULL;

3. Referential integrity is already enforced by FK; but if you delay FKs for load performance, run checks then re-enable.


Quick decision guide

If you… Do this
Already have ETL staging tables Use them exactly as shown; simplest.
Pull directly from external sources with a single INSERT…SELECT Inline the bridge lookup in that SELECT; still must upsert bridge first.
Need very fast loads Bulk insert to staging; bulk upsert bridge; batch insert with SK join; optional minimal logging in bulk mode.

Final reminder

  • You correctly identified the NOT NULL insertion issue. The fix is: always include SurrogateKey in the INSERT (or keep the column nullable, which weakens integrity—don’t recommend). *

  • Because the bridge persists, SurrogateKey values remain stable month to month for existing combos; new combos simply get new identity values.

Let me know how your current monthly load is structured (SSIS? ADF? T-SQL Agent job? Python?), and I’ll tailor the staging + insert scripts to match exactly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment