You:
I'm building a model in power bi. I have two tables, and there is a one-to-one relationship between them based on the combination of these columns: branch_origin_key, branch_dest_key, outbound_revenue_customer_key How do I get around the limitation that relationships between tables in power bi can only be based on one column?
ChatGPT:
Thought for 13 seconds
Power BI’s model designer can only link two tables on one column, so you have to collapse your three-column key into a single surrogate key (or refactor the model). Here are the two common ways:
Approach | How to do it | Pros | Cons |
---|---|---|---|
Surrogate key in each fact table | 1. Add a calculated column in both tables: |
```DAX
CompositeKey = COMBINEVALUES(" | ",branch_origin_key, branch_dest_key, outbound_revenue_customer_key)```
2. Set the relationship on CompositeKey
(cardinality = 1-to-1, cross-filter = Both, if that’s what you need). | • Quick—no extra table
• Works with Import or DirectQuery |
| Bridge / dimension table | 1. In Power Query (or in the data source) create a table that contains every distinct combination of the three columns and add the same CompositeKey
.
2. Relate each fact table to this new table on CompositeKey
.
3. Optionally hide the surrogate key column and keep the three originals to use in slicers. | • Keeps the model in star-schema style
• Lets you put the three keys in one slicer table
• Avoids repeating long string key in multiple large fact tables | • One extra table
• Slightly more up-front work |
-
Uniqueness – run a quick measure to be sure each table’s
CompositeKey
column has no duplicates:HasDupes = VAR total = COUNTROWS ( 'Table' ) VAR distinct= DISTINCTCOUNT ( 'Table'[CompositeKey] ) RETURN IF ( total = distinct, "OK", "Duplicates!" )
-
Performance – if your model is big, consider creating a numeric surrogate key in the source (e.g., an integer identity) instead of the text that
COMBINEVALUES
produces. -
Why not relationships on all three columns? – Power BI’s engine (VertiPaq) stores relationships as a single hash lookup, so multi-column joins would break that design.
-
When to merge instead – since it’s 1-to-1, you could skip the relationship entirely and merge the two tables in Power Query; just keep the columns you actually need.
Pick whichever path best matches your data-volume and modeling preferences—most teams start with the COMBINEVALUES
column because it’s fast to implement, then move to a proper surrogate key or bridge table if the model grows.
You:
let's do this. Performance – if your model is big, consider creating a numeric surrogate key in the source (e.g., an integer identity) instead of the text that COMBINEVALUES produces. I'm using sql server. my two tables are named air_housebills and aggregated_data
ChatGPT:
Thought for 5 seconds
Below is a pragmatic pattern you can run once in SQL Server to give both air_housebills and aggregated_data the same numeric surrogate key that represents
(branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
.
It keeps the tables unchanged except for a single INT column each and a small bridge table.
-- 1. Holds every distinct 3-column combination only once
CREATE TABLE dbo.BranchDestCustBridge
(
SurrogateKey INT IDENTITY(1,1) PRIMARY KEY,
branch_origin_key INT NOT NULL,
branch_dest_key INT NOT NULL,
outbound_revenue_customer_key INT NOT NULL,
CONSTRAINT uq_bridge UNIQUE
(branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
);
IDENTITY
makes SurrogateKey
a compact 4-byte integer; the UNIQUE constraint guarantees you never get duplicate rows.
INSERT INTO dbo.BranchDestCustBridge
(branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM (
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM dbo.air_housebills
UNION -- UNION removes duplicates across both tables
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM dbo.aggregated_data
) AS u;
Run this once (or re-run as an UP‐SERT in a nightly job if the combo list grows).
-- 3a. Add nullable column first
ALTER TABLE dbo.air_housebills ADD SurrogateKey INT NULL;
ALTER TABLE dbo.aggregated_data ADD SurrogateKey INT NULL;
-- 3b. Fill it by looking up the bridge
UPDATE h
SET SurrogateKey = b.SurrogateKey
FROM dbo.air_housebills h
JOIN dbo.BranchDestCustBridge b
ON b.branch_origin_key = h.branch_origin_key
AND b.branch_dest_key = h.branch_dest_key
AND b.outbound_revenue_customer_key = h.outbound_revenue_customer_key;
UPDATE a
SET SurrogateKey = b.SurrogateKey
FROM dbo.aggregated_data a
JOIN dbo.BranchDestCustBridge b
ON b.branch_origin_key = a.branch_origin_key
AND b.branch_dest_key = a.branch_dest_key
AND b.outbound_revenue_customer_key = a.outbound_revenue_customer_key;
-- 3c. Make it NOT NULL and add an index
ALTER TABLE dbo.air_housebills ALTER COLUMN SurrogateKey INT NOT NULL;
ALTER TABLE dbo.aggregated_data ALTER COLUMN SurrogateKey INT NOT NULL;
CREATE INDEX IX_air_housebills_SKey ON dbo.air_housebills (SurrogateKey);
CREATE INDEX IX_aggregated_SKey ON dbo.aggregated_data (SurrogateKey);
Optionally add a FOREIGN KEY back to the bridge for data integrity.
-
Import the three tables.
-
Create one-to-many relationships:
-
BranchDestCustBridge[SurrogateKey] → air_housebills[SurrogateKey]
-
BranchDestCustBridge[SurrogateKey] → aggregated_data[SurrogateKey]
-
-
Hide the SurrogateKey columns in the report view if you don’t want users to see them.
-
If report users need to slice by branch or customer, expose those three natural-key columns from the bridge (they live only once).
Benefit | Reason |
---|---|
4-byte lookup instead of long VARCHAR | VertiPaq compression is optimal on integers; memory footprint drops sharply. |
Single join | Power BI relationships are hash tables on one column—exactly what SurrogateKey gives you. |
Indexes keep ETL cheap | On large tables the SurrogateKey cluster-friendly index speeds lookups for updates/inserts. |
If you cannot add a bridge table (e.g., read-only source), you can:
ALTER TABLE dbo.air_housebills
ADD CompositeHash AS
(CONVERT(BIGINT,
CHECKSUM(branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
)) PERSISTED;
-- Repeat for aggregated_data and build a relationship on CompositeHash
…but CHECKSUM
collisions, even though rare, are possible, so the identity-bridge approach above is safer and easier to maintain.
You:
I'm a little confused about step 3, "3 Add a surrogate-key column to each fact table". I'm going to run a monthly job that truncates all the tables and replaces the data. that doesn't seem like it will work with step 3.
ChatGPT:
Thought for 29 seconds
The key idea is that the SurrogateKey column is just another data column.
When you TRUNCATE
the fact tables you wipe all columns (including SurrogateKey), then you reload them and repopulate that column exactly the same way each month. Nothing in Step 3 assumes the tables stay untouched—only that you repopulate the key after each load.
Below are three practical variants you can choose from, depending on how your ETL job already works.
-
Do not truncate the bridge.
It holds the master list of key combinations and their identity numbers.
Keep it growing over time; an INT identity can store >2 billion rows. -
ETL sequence each month
BEGIN TRAN; -- 1. Truncate and reload the two staging tables TRUNCATE TABLE dbo.air_housebills; TRUNCATE TABLE dbo.aggregated_data; INSERT INTO dbo.air_housebills (...) SELECT ... FROM source1; INSERT INTO dbo.aggregated_data (...) SELECT ... FROM source2; -- 2. Up-sert new combos into the bridge (only inserts run) MERGE dbo.BranchDestCustBridge AS tgt USING ( SELECT DISTINCT branch_origin_key, branch_dest_key, outbound_revenue_customer_key FROM ( SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key FROM dbo.air_housebills UNION ALL SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key FROM dbo.aggregated_data ) s ) AS src ON src.branch_origin_key = tgt.branch_origin_key AND src.branch_dest_key = tgt.branch_dest_key AND src.outbound_revenue_customer_key = tgt.outbound_revenue_customer_key WHEN NOT MATCHED BY TARGET THEN INSERT (branch_origin_key, branch_dest_key, outbound_revenue_customer_key) VALUES (src.branch_origin_key, src.branch_dest_key, src.outbound_revenue_customer_key); -- 3. Populate SurrogateKey in the facts UPDATE h SET SurrogateKey = b.SurrogateKey FROM dbo.air_housebills h JOIN dbo.BranchDestCustBridge b ON b.branch_origin_key = h.branch_origin_key AND b.branch_dest_key = h.branch_dest_key AND b.outbound_revenue_customer_key = h.outbound_revenue_customer_key; UPDATE a SET SurrogateKey = b.SurrogateKey FROM dbo.aggregated_data a JOIN dbo.BranchDestCustBridge b ON b.branch_origin_key = a.branch_origin_key AND b.branch_dest_key = a.branch_dest_key AND b.outbound_revenue_customer_key = a.outbound_revenue_customer_key; COMMIT;
Why this is easiest: the identity values in the bridge never change, so incremental refresh in Power BI or snapshotting across months still works.
If you must truncate the bridge as well (e.g., corporate policy says “truncate everything”), do the steps in this order so the keys line up:
-
Truncate the bridge.
-
Insert distinct combos into the bridge (same
INSERT … SELECT … UNION
as before). -
Truncate & reload the fact tables while joining to the freshly built bridge to pick up SurrogateKey during the insert:
TRUNCATE TABLE dbo.air_housebills; INSERT INTO dbo.air_housebills (SurrogateKey, col1, col2, …) SELECT b.SurrogateKey, s.col1, s.col2, … FROM Staging_air_housebills s -- raw data already loaded into staging JOIN dbo.BranchDestCustBridge b ON b.branch_origin_key = s.branch_origin_key AND b.branch_dest_key = s.branch_dest_key AND b.outbound_revenue_customer_key = s.outbound_revenue_customer_key;
Power BI doesn’t care that the identity numbers are different each month—as long as you do a full refresh after the reload.
If you have read-only access and cannot create or maintain a bridge, you can stamp a deterministic numeric hash in each fact table:
ALTER TABLE dbo.air_housebills
ADD SurrogateKey AS
(ABS(CONVERT(BIGINT,
CONVERT(BINARY(8), HASHBYTES('SHA1',
CONCAT(branch_origin_key, '|',
branch_dest_key, '|',
outbound_revenue_customer_key)), 1)
))) PERSISTED;
Pros: no join, no extra table.
Cons: slightly longer key (BIGINT), a vanishingly small risk of collisions, and you lose the star-schema “dimension” benefits.
Scenario | Best variant |
---|---|
You can keep a small helper table and you’d like lightning-fast joins, easy slicers, and minimal risk | Variant A |
Corporate ETL wipes everything every load and you can tweak the load scripts | Variant B |
Data source is read-only, or adding extra tables is impossible | Variant C |
Regardless of which loading variant you choose, in Power BI you still:
-
Import the bridge/dimension (unless you went with Variant C).
-
Import the two fact tables.
-
Create one-to-many relationships on
SurrogateKey
. -
Refresh the model after each monthly ETL run.
That’s all—the monthly truncate/reload cycle just means your ETL has to repopulate the SurrogateKey column every time.
You:
I like variant A. show me the full steps 1-4 assuming that I will use variant A.
ChatGPT:
Thought for 5 seconds
Below is a complete, end-to-end recipe for Variant A – persistent bridge table.
Follow it once to set everything up, then reuse the “Monthly ETL” block each time you reload.
/* 1. Master list of the three natural-key columns */
CREATE TABLE dbo.BranchDestCustBridge
(
SurrogateKey INT IDENTITY(1,1) PRIMARY KEY,
branch_origin_key INT NOT NULL,
branch_dest_key INT NOT NULL,
outbound_revenue_customer_key INT NOT NULL,
CONSTRAINT uq_BranchDestCust UNIQUE
(branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
);
/* Optional but useful for ETL lookups */
CREATE INDEX IX_BranchDestCust_3cols
ON dbo.BranchDestCustBridge (branch_origin_key,
branch_dest_key,
outbound_revenue_customer_key);
Why keep it forever?
Every unique combination gets one identity number and never changes, so Power BI relationships—and any historical snapshots you keep—stay stable.
INSERT INTO dbo.BranchDestCustBridge
(branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM (
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM dbo.air_housebills
UNION -- UNION removes duplicates across the two tables
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM dbo.aggregated_data
) AS u;
After this point the bridge contains every combo that already exists in either fact table.
ALTER TABLE dbo.air_housebills ADD SurrogateKey INT NULL;
ALTER TABLE dbo.aggregated_data ADD SurrogateKey INT NULL;
/* Fill it for the rows currently in the tables */
UPDATE h
SET SurrogateKey = b.SurrogateKey
FROM dbo.air_housebills h
JOIN dbo.BranchDestCustBridge b
ON b.branch_origin_key = h.branch_origin_key
AND b.branch_dest_key = h.branch_dest_key
AND b.outbound_revenue_customer_key = h.outbound_revenue_customer_key;
UPDATE a
SET SurrogateKey = b.SurrogateKey
FROM dbo.aggregated_data a
JOIN dbo.BranchDestCustBridge b
ON b.branch_origin_key = a.branch_origin_key
AND b.branch_dest_key = a.branch_dest_key
AND b.outbound_revenue_customer_key = a.outbound_revenue_customer_key;
/* Lock it down and index it */
ALTER TABLE dbo.air_housebills ALTER COLUMN SurrogateKey INT NOT NULL;
ALTER TABLE dbo.aggregated_data ALTER COLUMN SurrogateKey INT NOT NULL;
ALTER TABLE dbo.air_housebills
ADD CONSTRAINT fk_AHB_SKey FOREIGN KEY (SurrogateKey)
REFERENCES dbo.BranchDestCustBridge (SurrogateKey);
ALTER TABLE dbo.aggregated_data
ADD CONSTRAINT fk_AGG_SKey FOREIGN KEY (SurrogateKey)
REFERENCES dbo.BranchDestCustBridge (SurrogateKey);
CREATE INDEX IX_AHB_SKey ON dbo.air_housebills (SurrogateKey);
CREATE INDEX IX_AGG_SKey ON dbo.aggregated_data (SurrogateKey);
You’re now ready to build the Power BI model, but the keys also need to be refreshed after each monthly reload…
Run this block every time you replace the data.
BEGIN TRAN;
/* 0. Truncate the two fact tables. (Bridge is NOT truncated) */
TRUNCATE TABLE dbo.air_housebills;
TRUNCATE TABLE dbo.aggregated_data;
/* 1. Reload fact tables from your external sources */
INSERT INTO dbo.air_housebills (branch_origin_key, branch_dest_key,
outbound_revenue_customer_key, …other columns…)
SELECT … FROM source1; -- your real load here
INSERT INTO dbo.aggregated_data (branch_origin_key, branch_dest_key,
outbound_revenue_customer_key, …)
SELECT … FROM source2;
/* 2. Up-sert any brand-new combos into the bridge */
MERGE dbo.BranchDestCustBridge AS tgt
USING (
/* Distinct list across both freshly loaded tables */
SELECT DISTINCT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM (
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM dbo.air_housebills
UNION ALL
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM dbo.aggregated_data
) s
) AS src
ON src.branch_origin_key = tgt.branch_origin_key
AND src.branch_dest_key = tgt.branch_dest_key
AND src.outbound_revenue_customer_key = tgt.outbound_revenue_customer_key
WHEN NOT MATCHED BY TARGET THEN
INSERT (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
VALUES (src.branch_origin_key, src.branch_dest_key,
src.outbound_revenue_customer_key);
/* 3. Stamp SurrogateKey in both fact tables */
UPDATE h
SET SurrogateKey = b.SurrogateKey
FROM dbo.air_housebills h
JOIN dbo.BranchDestCustBridge b
ON b.branch_origin_key = h.branch_origin_key
AND b.branch_dest_key = h.branch_dest_key
AND b.outbound_revenue_customer_key = h.outbound_revenue_customer_key;
UPDATE a
SET SurrogateKey = b.SurrogateKey
FROM dbo.aggregated_data a
JOIN dbo.BranchDestCustBridge b
ON b.branch_origin_key = a.branch_origin_key
AND b.branch_dest_key = a.branch_dest_key
AND b.outbound_revenue_customer_key = a.outbound_revenue_customer_key;
COMMIT;
Each month:
-
fact tables are emptied and refilled;
-
the bridge only grows if new combos appear;
-
SurrogateKey values stay stable for existing combos; brand-new combos get new identity values.
-
Get data
- Import
BranchDestCustBridge
,air_housebills
, andaggregated_data
.
- Import
-
In Model view create two relationships:
From table Column To table Column Cardinality Cross-filter BranchDestCustBridge SurrogateKey air_housebills SurrogateKey 1 : * Both (if you want bidirectional) BranchDestCustBridge SurrogateKey aggregated_data SurrogateKey 1 : * Both -
Hide
SurrogateKey
in the report view (optional). -
Expose the three natural-key columns in the bridge if users need them for slicers.
-
After every monthly ETL run, refresh the Power BI dataset (gateway or manual).
-
Compact memory & fast joins – Power BI looks up 4-byte integers instead of multi-column strings.
-
Stable keys – Historical snapshots or incremental refresh partitions stay valid month-to-month.
-
Clean star schema – The bridge doubles as a slicer table if you need to filter by branch or customer.
That’s the entire workflow for Variant A. Once you put these scripts into your ETL pipeline and schedule the dataset refresh, everything runs hands-off each month.
You:
Step 3 seems like it won’t work. The surrogate keys are non-null at this point, so you can’t update them after an insert.
ChatGPT:
Thought for 52 seconds
Good catch. You’re absolutely right: once SurrogateKey is NOT NULL
(and especially if a FK exists), you cannot insert rows without supplying that value and then update it later. My earlier “insert then update” monthly ETL block would fail under those constraints.
Below is the corrected Variant A (persistent bridge) workflow that loads SurrogateKey at insert time. I’ll show the full end-to-end steps again with that fix incorporated.
-
Load raw data into staging tables (no SurrogateKey columns).
-
Upsert any new (branch_origin_key, branch_dest_key, outbound_revenue_customer_key) combos into the persistent bridge.
-
Insert into the production fact tables joining to the bridge to pick up SurrogateKey in the INSERT.
-
(Optional) swap/rename partitions instead of TRUNCATE if you want minimal downtime; below I’ll keep TRUNCATE for clarity.
CREATE TABLE dbo.BranchDestCustBridge
(
SurrogateKey INT IDENTITY(1,1) PRIMARY KEY,
branch_origin_key INT NOT NULL,
branch_dest_key INT NOT NULL,
outbound_revenue_customer_key INT NOT NULL,
CONSTRAINT uq_BranchDestCust UNIQUE
(branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
);
GO
CREATE INDEX IX_BranchDestCust_3cols
ON dbo.BranchDestCustBridge (branch_origin_key, branch_dest_key, outbound_revenue_customer_key);
GO
INSERT INTO dbo.BranchDestCustBridge
(branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM (
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM dbo.air_housebills
UNION
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM dbo.aggregated_data
) AS u;
GO
IMPORTANT: We’ll create them NULLABLE now, populate them, then set
NOT NULL
.
After that, future reloads must insert SurrogateKey.
ALTER TABLE dbo.air_housebills ADD SurrogateKey INT NULL;
ALTER TABLE dbo.aggregated_data ADD SurrogateKey INT NULL;
GO
/* Backfill current rows */
UPDATE h
SET SurrogateKey = b.SurrogateKey
FROM dbo.air_housebills AS h
JOIN dbo.BranchDestCustBridge AS b
ON b.branch_origin_key = h.branch_origin_key
AND b.branch_dest_key = h.branch_dest_key
AND b.outbound_revenue_customer_key = h.outbound_revenue_customer_key;
UPDATE a
SET SurrogateKey = b.SurrogateKey
FROM dbo.aggregated_data AS a
JOIN dbo.BranchDestCustBridge AS b
ON b.branch_origin_key = a.branch_origin_key
AND b.branch_dest_key = a.branch_dest_key
AND b.outbound_revenue_customer_key = a.outbound_revenue_customer_key;
/* Enforce integrity */
ALTER TABLE dbo.air_housebills ALTER COLUMN SurrogateKey INT NOT NULL;
ALTER TABLE dbo.aggregated_data ALTER COLUMN SurrogateKey INT NOT NULL;
ALTER TABLE dbo.air_housebills
ADD CONSTRAINT fk_AHB_SKey FOREIGN KEY (SurrogateKey)
REFERENCES dbo.BranchDestCustBridge (SurrogateKey);
ALTER TABLE dbo.aggregated_data
ADD CONSTRAINT fk_AGG_SKey FOREIGN KEY (SurrogateKey)
REFERENCES dbo.BranchDestCustBridge (SurrogateKey);
CREATE INDEX IX_AHB_SKey ON dbo.air_housebills (SurrogateKey);
CREATE INDEX IX_AGG_SKey ON dbo.aggregated_data (SurrogateKey);
GO
At this point your production fact tables are keyed and enforced.
Goal: Truncate facts, reload from source each month, and supply SurrogateKey values during insert. To do that we (a) load to staging, (b) upsert bridge, (c) insert into prod with join.
If your current ETL already lands data in staging tables (recommended), follow the pattern below.
If not, create temp (#) or staging tables in the job to mimic this.
You can use permanent staging tables or truncate-and-replace them each cycle.
-- Example staging schemas; adjust as needed
IF OBJECT_ID('stg.air_housebills_raw','U') IS NULL
BEGIN
CREATE TABLE stg.air_housebills_raw
(
branch_origin_key INT NOT NULL,
branch_dest_key INT NOT NULL,
outbound_revenue_customer_key INT NOT NULL,
/* other source columns ... */
/* types must match or be castable */
);
END;
IF OBJECT_ID('stg.aggregated_data_raw','U') IS NULL
BEGIN
CREATE TABLE stg.aggregated_data_raw
(
branch_origin_key INT NOT NULL,
branch_dest_key INT NOT NULL,
outbound_revenue_customer_key INT NOT NULL,
/* other source columns ... */
);
END;
GO
BEGIN TRY
BEGIN TRAN;
/* A. Refresh staging from source systems */
TRUNCATE TABLE stg.air_housebills_raw;
TRUNCATE TABLE stg.aggregated_data_raw;
-- Replace these with your real source pulls
INSERT INTO stg.air_housebills_raw (branch_origin_key, branch_dest_key, outbound_revenue_customer_key /*, ...*/)
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key /*, ...*/
FROM SourceSystem1.dbo.air_housebills_extract; -- example
INSERT INTO stg.aggregated_data_raw (branch_origin_key, branch_dest_key, outbound_revenue_customer_key /*, ...*/)
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key /*, ...*/
FROM SourceSystem2.dbo.aggregated_data_extract; -- example
/* B. Upsert new combos into the persistent bridge */
;WITH AllCombos AS (
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM stg.air_housebills_raw
UNION ALL
SELECT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM stg.aggregated_data_raw
),
DistCombos AS (
SELECT DISTINCT branch_origin_key, branch_dest_key, outbound_revenue_customer_key
FROM AllCombos
)
INSERT INTO dbo.BranchDestCustBridge (branch_origin_key, branch_dest_key, outbound_revenue_customer_key)
SELECT d.branch_origin_key, d.branch_dest_key, d.outbound_revenue_customer_key
FROM DistCombos AS d
WHERE NOT EXISTS (
SELECT 1
FROM dbo.BranchDestCustBridge AS b
WHERE b.branch_origin_key = d.branch_origin_key
AND b.branch_dest_key = d.branch_dest_key
AND b.outbound_revenue_customer_key = d.outbound_revenue_customer_key
);
/* (Avoid MERGE if you prefer; above pattern is concurrency-safe and simple.) */
/* C. Truncate production fact tables */
TRUNCATE TABLE dbo.air_housebills;
TRUNCATE TABLE dbo.aggregated_data;
/* D. Insert into production facts WITH SurrogateKey lookup */
INSERT INTO dbo.air_housebills
(SurrogateKey,
branch_origin_key, branch_dest_key, outbound_revenue_customer_key,
/* other columns in target order */)
SELECT b.SurrogateKey,
s.branch_origin_key, s.branch_dest_key, s.outbound_revenue_customer_key,
/* other columns from staging in matching order */
FROM stg.air_housebills_raw AS s
JOIN dbo.BranchDestCustBridge AS b
ON b.branch_origin_key = s.branch_origin_key
AND b.branch_dest_key = s.branch_dest_key
AND b.outbound_revenue_customer_key = s.outbound_revenue_customer_key;
INSERT INTO dbo.aggregated_data
(SurrogateKey,
branch_origin_key, branch_dest_key, outbound_revenue_customer_key,
/* other columns */)
SELECT b.SurrogateKey,
s.branch_origin_key, s.branch_dest_key, s.outbound_revenue_customer_key,
/* other columns */
FROM stg.aggregated_data_raw AS s
JOIN dbo.BranchDestCustBridge AS b
ON b.branch_origin_key = s.branch_origin_key
AND b.branch_dest_key = s.branch_dest_key
AND b.outbound_revenue_customer_key = s.outbound_revenue_customer_key;
COMMIT TRAN;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK TRAN;
THROW;
END CATCH;
GO
Why this works: Every insert into the production fact tables already has a valid SurrogateKey, so NOT NULL
and FOREIGN KEY
constraints are satisfied at insert time—no post-insert update required.
Create the relationships from BranchDestCustBridge[SurrogateKey]
to each fact’s SurrogateKey
(1:*). Refresh after each monthly reload.
1. Missing bridge rows (should be zero):
SELECT COUNT(*) AS MissingInBridge
FROM stg.air_housebills_raw s
LEFT JOIN dbo.BranchDestCustBridge b
ON b.branch_origin_key = s.branch_origin_key
AND b.branch_dest_key = s.branch_dest_key
AND b.outbound_revenue_customer_key = s.outbound_revenue_customer_key
WHERE b.SurrogateKey IS NULL;
Run the same for aggregated_data_raw
. If anything > 0, the bridge upsert step failed.
2. Post-load null keys (should be zero):
SELECT COUNT(*) FROM dbo.air_housebills WHERE SurrogateKey IS NULL;
SELECT COUNT(*) FROM dbo.aggregated_data WHERE SurrogateKey IS NULL;
3. Referential integrity is already enforced by FK; but if you delay FKs for load performance, run checks then re-enable.
If you… | Do this |
---|---|
Already have ETL staging tables | Use them exactly as shown; simplest. |
Pull directly from external sources with a single INSERT…SELECT | Inline the bridge lookup in that SELECT; still must upsert bridge first. |
Need very fast loads | Bulk insert to staging; bulk upsert bridge; batch insert with SK join; optional minimal logging in bulk mode. |
-
You correctly identified the NOT NULL insertion issue. The fix is: always include SurrogateKey in the INSERT (or keep the column nullable, which weakens integrity—don’t recommend). *
-
Because the bridge persists, SurrogateKey values remain stable month to month for existing combos; new combos simply get new identity values.
Let me know how your current monthly load is structured (SSIS? ADF? T-SQL Agent job? Python?), and I’ll tailor the staging + insert scripts to match exactly.