Skip to content

Instantly share code, notes, and snippets.

@dbist
Last active March 7, 2023 20:01
Show Gist options
  • Select an option

  • Save dbist/3a4ebddcf7d94c9653a4ef79b7d1f884 to your computer and use it in GitHub Desktop.

Select an option

Save dbist/3a4ebddcf7d94c9653a4ef79b7d1f884 to your computer and use it in GitHub Desktop.

Correcting my misconceptions with REGIONAL BY ROW tables


This tutorial will dive deeper into nuances with CockroachDB REGIONAL BY ROW tables. I learned the hard way about the limitations and to pay close attention to the documentation on the topic.


Previous articles on multiregion

CockroachDB Multi-Region Abstractions for MongoDB Developers With FerretDB


Motivation

I was working with a prospect on a use case where my knowledge of CockroachDB multiregion absractions was put to the test. After trying and failing to gain low latency write behavior from geographically distributed table, I reached out to the engineering team who corrected my understanding. This is my way of documenting my understanding and committing it to memory.

High Level Steps

  • Provision a multiregion CockroachDB cluster
  • Demonstrate the problem
  • Demonstrate the solution
  • Workaround for non-uuid primary keys

Step by step instructions

Provision a multiregion CockroachDB cluster

You will need access to a multiregion CockroachDB cluster. The easiest and most cost effective way is signing up for CockroachDB Serverless. That's what I am using in my demonstration. My serverless cluster spans three regions in GCP.

serverless_mr

We can connect to the regional endpoints using the Connect modal. The default connection string given will be localized as CockroachDB Serverless automatically detects the end user locality. To access the cluster from the remote regions, you must append the regions to the connection string. In essence, the host artem-serverless-mr-26.h4f.cockroachlabs.cloud becomes artem-serverless-mr-26.h4f.gcp-us-east1.cockroachlabs.cloud, artem-serverless-mr-26.h4f.gcp-us-west2.cockroachlabs.cloud, and artem-serverless-mr-26.h4f.gcp-europe-west1.cockroachlabs.cloud.

We can verify the gateway region we're connecting from by querying the helper function select gateway_region(); from each of the connections.

  gateway_region
------------------
  gcp-us-east1

  gateway_region
------------------
  gcp-us-west2

  gateway_region
--------------------
  gcp-europe-west1

Demonstrate the problem

Considering an inventory system moving to CockroachDB and leveraging multiregion abstractions, typically the challenge is getting the primary key correct to optimize for writes. The following examples should demonstrate where the challenges are and how to work around them.

To get started, we're going to create a database called demo and enable multiregion abstractions on it.

CREATE DATABASE demo;
ALTER DATABASE demo SET PRIMARY REGION "gcp-us-east1";
ALTER DATABASE demo ADD region "gcp-us-west2";
ALTER DATABASE demo ADD region "gcp-europe-west1";

SET override_multi_region_zone_config = true;

Feel free to review our docs for more information but in summary, we're enrolling our database into multiregion abstractions and setting gcp-us-east1 as the primary region as it's the closest region to my location.

Then I'm passing a property to override certain zone configurations. This step is optional unless you are coming back to make further changes after.

Finally, we want to pin the leaseholders to the US region so that schema changes in multiregion context complete faster. You can learn more here.

ALTER DATABASE system CONFIGURE ZONE USING constraints = '{"+region=gcp-us-east1": 1}', lease_preferences = '[[+region=gcp-us-east1]]';

Let's create a table to demonstrate the problem

CREATE TABLE demo_int_pk (productID PRIMARY KEY, count) AS SELECT unordered_unique_rowid() AS productID, generate_series(1, 10) AS count;

Make the table REGIONAL BY ROW

ALTER TABLE demo_int_pk SET LOCALITY REGIONAL BY ROW;

The primary key on the table is an integer type. This problem is not specific to integers, you will have similar behavior with other types too, except for uuid but I'm jumping ahead a bit.

Let's look at the schema of the table

  CREATE TABLE public.demo_int_pk (
      productid INT8 NOT NULL,
      count INT8 NULL,
      crdb_region demo.public.crdb_internal_region NOT VISIBLE NOT NULL DEFAULT default_to_database_primary_region(gateway_region())::demo.public.crdb_internal_region,
      CONSTRAINT demo_int_pk_pkey PRIMARY KEY (productid ASC)
  ) LOCALITY REGIONAL BY ROW

We can confirm the PK is indeed an integer. We can also see a new column called crdb_region which was added when we enabled REGIONAL BY ROW.

Let's look at the data

 SELECT crdb_region, * FROM demo_int_pk;
  crdb_region  |      productid      | count
---------------+---------------------+--------
  gcp-us-east1 | 1021351265574322179 |     5
  gcp-us-east1 | 2174272770181169155 |     9
  gcp-us-east1 | 2750733522484592643 |     1
  gcp-us-east1 | 3327194274788016131 |     7
  gcp-us-east1 | 3903655027091439619 |     3
  gcp-us-east1 | 5633037284001710083 |     6
  gcp-us-east1 | 6209498036305133571 |     2
  gcp-us-east1 | 6785958788608557059 |    10
  gcp-us-east1 | 7938880293215404035 |     8
  gcp-us-east1 | 8515341045518827523 |     4

Notice the crdb_region column reports the US East region which is the region where I executed the insert statement and the rows are homed in. Let's attempt to do an update on the inventory count for one of the records.

WITH update_demo_int_pk AS (
    UPDATE demo_int_pk SET count = 100 WHERE productID = 2750733522484592643 RETURNING crdb_region, count
) SELECT gateway_region(), crdb_region, count FROM update_demo_int_pk;
  gateway_region | crdb_region  | count
-----------------+--------------+--------
  gcp-us-east1   | gcp-us-east1 |   100

Time: 40ms total (execution 11ms / network 29ms)

It takes 11ms to update the count using the US East connection. Notice I am using gateway_region() function, which will confirm I am using the US East connection. Let's now update the record from another region. I am going to use gcp-us-west2 region.

WITH update_demo_int_pk AS (
    UPDATE demo_int_pk SET count = 100 WHERE productID = 2750733522484592643 RETURNING crdb_region, count
) SELECT gateway_region(), crdb_region, count FROM update_demo_int_pk;
  gateway_region | crdb_region  | count
-----------------+--------------+--------
  gcp-us-west2   | gcp-us-east1 |   100

Time: 1.280s total (execution 1.204s / network 0.076s)

The query took 1.2 seconds to execute. This is not surprising considering the record is physically stored in the US East region. Let's look at the plan of the query:

distribution: local
  vectorized: true

  • root
  │
  ├── • render
  │   │
  │   └── • scan buffer
  │         estimated row count: 1
  │         label: buffer 2 (update_demo_int_pk)
  │
  └── • subquery
      │ id: @S1
      │ original sql: UPDATE demo_int_pk SET count = 100 WHERE productid = 2750733522484592643 RETURNING crdb_region, count
      │ exec mode: all rows
      │
      └── • buffer
          │ label: buffer 2 (update_demo_int_pk)
          │
          └── • update
              │ estimated row count: 1
              │ table: demo_int_pk
              │ set: count
              │
              └── • render
                  │
                  └── • union all
                      │ estimated row count: 1limit: 1
                      │
                      ├── • scan
                      │     estimated row count: 1 (10% of the table; stats collected 19 seconds ago; using stats forecast for 12 minutes in the future)
                      │     table: demo_int_pk@demo_int_pk_pkey
                      │     spans: [/'gcp-us-west2'/2750733522484592643 - /'gcp-us-west2'/2750733522484592643]
                      │
                      └── • scan
                            estimated row count: 1 (10% of the table; stats collected 19 seconds ago; using stats forecast for 12 minutes in the future)
                            table: demo_int_pk@demo_int_pk_pkey
                            spans: [/'gcp-europe-west1'/2750733522484592643 - /'gcp-europe-west1'/2750733522484592643] [/'gcp-us-east1'/2750733522484592643 - /'gcp-us-east1'/2750733522484592643]

if you read the bottom part where we union all the results of the scans in the gcp-us-west2 and gcp-europe-west1 regions to validate the record is unique across all regions.

Let's add a record in the US West region, we're expecting the write to be quick.

INSERT INTO demo_int_pk (productID, count) VALUES (unordered_unique_rowid(), 1) RETURNING gateway_region(), crdb_region, productID, count;
  gateway_region | crdb_region  |      productid      | count
-----------------+--------------+---------------------+--------
  gcp-us-west2   | gcp-us-west2 | 5746135747274211350 |     1

Time: 296ms total (execution 217ms / network 78ms)

It takes 217ms to insert eventhough we are writing from the US West region.

The story is the same if we were to insert from the EU:

   gateway_region  |   crdb_region    |      productid      | count
-------------------+------------------+---------------------+--------
  gcp-europe-west1 | gcp-europe-west1 | 1222035725603831819 |     1

Time: 338ms total (execution 250ms / network 88ms)

Updating a record from the outside region will also be costly:

WITH update_demo_int_pk AS (
    UPDATE demo_int_pk SET count = 100 WHERE productID = 2750733522484592643 RETURNING crdb_region, count
) SELECT gateway_region(), crdb_region, count FROM update_demo_int_pk;
  gateway_region | crdb_region  | count
-----------------+--------------+--------
  gcp-us-west2   | gcp-us-east1 |   100

  Time: 354ms total (execution 277ms / network 76ms)

Just to explain the statement; I updated the record physically stored in the US East (crdb_region) going through the US West endpoint (gateway_region).

We now have very fast writes from the local region but very slow writes from everywhere else.

Let's now focus on how we can make the scenario better. The hard lesson for me was to realize that our docs call out an optimization that I've failed to see originally.

Note: When using DEFAULT gen_random_uuid() on columns in REGIONAL BY ROW tables, uniqueness checks on those columns are disabled by default for performance purposes. CockroachDB assumes uniqueness based on the way this column generates UUIDs. To enable this check, you can modify the sql.optimizer.uniqueness_checks_for_gen_random_uuid.enabled cluster setting. Note that while there is virtually no chance of a collision occurring when enabling this setting, it is not truly zero.

With that, let's create a new table with PRIMARY KEY being uuid.

Demonstrate the solution

CREATE TABLE demo_uuid_pk (productID PRIMARY KEY, count) AS SELECT gen_random_uuid() AS productID, generate_series(1, 10) AS count;
ALTER TABLE demo_uuid_pk SET LOCALITY REGIONAL BY ROW;

The create statement is below

  CREATE TABLE public.demo_uuid_pk (
      productid UUID NOT NULL,
      count INT8 NULL,
      crdb_region demo.public.crdb_internal_region NOT VISIBLE NOT NULL DEFAULT default_to_database_primary_region(gateway_region())::demo.public.crdb_internal_region,
      CONSTRAINT demo_uuid_pk_pkey PRIMARY KEY (productid ASC)
  ) LOCALITY REGIONAL BY ROW

The data looks like so

  crdb_region  |              productid               | count
---------------+--------------------------------------+--------
  gcp-us-east1 | 229f4258-cac2-4b06-815f-5dd8be8483c7 |     9
  gcp-us-east1 | 525845a9-30ab-42dd-95bb-fb1d72fc7137 |     8
  gcp-us-east1 | 58b174b5-c2f6-433e-a373-74879d81ff92 |     5
  gcp-us-east1 | 5b10da77-e750-431d-a7ea-f0b5f0d61c77 |     3
  gcp-us-east1 | 930e64db-fbae-4743-9419-c274853b4cb3 |     2
  gcp-us-east1 | 994f1714-5d0e-4b9a-8cbb-ab0c94efe226 |     4
  gcp-us-east1 | a5b70013-b8cb-4ba1-86f5-1f892285cf99 |     7
  gcp-us-east1 | d3778386-153b-4431-ab8c-7d9ee9215bec |    10
  gcp-us-east1 | de6b91f8-f5a8-4575-ae3d-fe1b516ae906 |     6
  gcp-us-east1 | f5fa80ea-4fbf-4234-86b7-5592f1bfdda2 |     1

Let's update the inventory from the US East region

WITH update_demo_uuid_pk AS (
    UPDATE demo_uuid_pk SET count = 10 WHERE productID = '229f4258-cac2-4b06-815f-5dd8be8483c7' RETURNING crdb_region, count
) SELECT gateway_region(), crdb_region, count FROM update_demo_uuid_pk;
  gateway_region | crdb_region  | count
-----------------+--------------+--------
  gcp-us-east1   | gcp-us-east1 |    10

Time: 42ms total (execution 12ms / network 30ms)

This is similar to our original results, let's add records in the US West region and update the inventory.

INSERT INTO demo_uuid_pk (productID, count) SELECT gen_random_uuid(), generate_series(1, 10) RETURNING gateway_region(), crdb_region, productID, count;
  gateway_region | crdb_region  |              productid               | count
-----------------+--------------+--------------------------------------+--------
  gcp-us-west2   | gcp-us-west2 | cfb86888-e9b5-4328-9c60-0c102035b5e6 |     1
  gcp-us-west2   | gcp-us-west2 | b243affe-47e9-47b2-a7ba-cbc966d42797 |     2
  gcp-us-west2   | gcp-us-west2 | 82dc438b-d819-4a52-85b8-5ac4ecf4c725 |     3
  gcp-us-west2   | gcp-us-west2 | 4a7bda92-b37a-461a-8f76-ea33dba899dc |     4
  gcp-us-west2   | gcp-us-west2 | 55954dba-6892-4c8a-9f84-2a70b7b6f204 |     5
  gcp-us-west2   | gcp-us-west2 | 95c7dbf6-093a-4afb-af71-196833ab469c |     6
  gcp-us-west2   | gcp-us-west2 | 01689a64-9ae5-408a-b150-590f51efde31 |     7
  gcp-us-west2   | gcp-us-west2 | 7387bac4-e5c6-43b3-9bf6-149774de7ba5 |     8
  gcp-us-west2   | gcp-us-west2 | 6f0f8abe-6735-41d1-a549-755daddfa14a |     9
  gcp-us-west2   | gcp-us-west2 | 3d8c9ad6-d8ac-4551-b25a-067b8efd2db4 |    10

Time: 143ms total (execution 66ms / network 77ms)

Notice the execution is 66ms, which is 5x better than 250ms in the earlier attempts with integers.

Similarly, updates are just as fast

WITH update_demo_uuid_pk AS (
    UPDATE demo_uuid_pk SET count = 10 WHERE productID = 'e19aa60b-d449-478d-8867-7ecabe6eb361' RETURNING crdb_region, count
) SELECT gateway_region(), crdb_region, count FROM update_demo_uuid_pk;
  gateway_region | crdb_region  | count
-----------------+--------------+--------
  gcp-us-west2   | gcp-us-west2 |    10

Time: 86ms total (execution 10ms / network 77ms)

Notice the execution is now 10ms, which is significantly faster.

For posterity, here are results for inserting and updating in the EU:

   gateway_region  |   crdb_region    |              productid               | count
-------------------+------------------+--------------------------------------+--------
  gcp-europe-west1 | gcp-europe-west1 | f27db778-20ba-4458-b75f-0f3a6d46c6ab |     1
  gcp-europe-west1 | gcp-europe-west1 | fbff7233-468f-49d1-a7a2-719f851fed54 |     2
  gcp-europe-west1 | gcp-europe-west1 | 01709624-1cf6-4f72-8e8c-6188db8e517a |     3
  gcp-europe-west1 | gcp-europe-west1 | 21c4999a-d4e4-47c4-a034-ad005dd9e9b2 |     4
  gcp-europe-west1 | gcp-europe-west1 | 64ea0759-129d-4174-88cb-020ef7e947a8 |     5
  gcp-europe-west1 | gcp-europe-west1 | 946daa03-3932-4694-a7bc-a87b0fa5d242 |     6
  gcp-europe-west1 | gcp-europe-west1 | c1cbe899-d981-4223-993c-0d5987b9320b |     7
  gcp-europe-west1 | gcp-europe-west1 | 5dcca454-2b66-4dd1-aee2-d540eac6b710 |     8
  gcp-europe-west1 | gcp-europe-west1 | 1a6209c4-d804-432e-a239-60916c2f889b |     9
  gcp-europe-west1 | gcp-europe-west1 | 2df436d4-0f3f-40c3-8303-bec776e591c7 |    10

Time: 115ms total (execution 27ms / network 88ms)
WITH update_demo_uuid_pk AS (
    UPDATE demo_uuid_pk SET count = 10 WHERE productID = 'f27db778-20ba-4458-b75f-0f3a6d46c6ab' RETURNING crdb_region, count
) SELECT gateway_region(), crdb_region, count FROM update_demo_uuid_pk;
   gateway_region  |   crdb_region    | count
-------------------+------------------+--------
  gcp-europe-west1 | gcp-europe-west1 |    10

Time: 100ms total (execution 11ms / network 88ms)

The reason this is so fast is because we're using uuid type for our primary key. The chance of collision for uuid PK across all regions is minimal, therefore we skip uniqueness checks in the remote regions.

Workaround for non-uuid primary keys

You may be asking what if we would like to maintain the old type for primary keys? Well, in that case, we do have one more way to maintain backwards compatibility to your legacy primary keys and still leveraging great performance. We can take advantage of composite primary keys where one column is going to be your legacy non-uuid primary key and the other being a locality column like crdb_region.

CREATE TABLE demo_composite_pk (productID PRIMARY KEY, count) AS SELECT unordered_unique_rowid() AS productID, generate_series(1, 10) AS count;

Now we're going to add a new column region which you've seen before but this time, we are going to add it as a default expression. The reason being, it will force to evaluate gateway_region() which is the function we determine the locality of the write statement, at the insert time. The additional benefit to this approach is that region column in PK will avoid the uniqueness checks globally.

ALTER TABLE demo_composite_pk ADD COLUMN region crdb_internal_region NOT NULL DEFAULT gateway_region()::crdb_internal_region;

CockroachDB has a unique capability to changing primary keys online. Here is a highly simplified example on a small table and in many cases this is immensely useful! Changin PK in production is not a frequent occurence but it's good to know there are products that can do that.

ALTER TABLE demo_composite_pk ALTER PRIMARY KEY USING COLUMNS (region, productID);

Finally, let's set the table to REGIONAL BY ROW

ALTER TABLE demo_composite_pk SET LOCALITY REGIONAL BY ROW;

The final schema is below

  CREATE TABLE public.demo_composite_pk (
      productid INT8 NOT NULL,
      count INT8 NULL,
      region demo.public.crdb_internal_region NOT NULL DEFAULT gateway_region()::demo.public.crdb_internal_region,
      crdb_region demo.public.crdb_internal_region NOT VISIBLE NOT NULL DEFAULT default_to_database_primary_region(gateway_region())::demo.public.crdb_internal_region,
      CONSTRAINT demo_composite_pk_pkey PRIMARY KEY (region ASC, productid ASC),
      UNIQUE INDEX demo_composite_pk_productid_key (productid ASC)
  ) LOCALITY REGIONAL BY ROW

Notice there's an additional unique index on the productid. It is not just the remnants of the legacy PK (CockroachDB promotes a former PK to Unique Index when PK changes to a new one). Additionally, it serves another purpose to maintain uniqueness for productids.

Let's test this from the US East:

INSERT INTO demo_composite_pk (productID, count) SELECT unordered_unique_rowid(), generate_series(1, 10) RETURNING gateway_region(), crdb_region, productID, count;
  gateway_region | crdb_region  |      productid      | count
-----------------+--------------+---------------------+--------
  gcp-us-east1   | gcp-us-east1 | 5199323410302500866 |     1
  gcp-us-east1   | gcp-us-east1 | 8658087924123041794 |     2
  gcp-us-east1   | gcp-us-east1 |  299407015723401218 |     3
  gcp-us-east1   | gcp-us-east1 | 4911093034150789122 |     4
  gcp-us-east1   | gcp-us-east1 | 2605250024937095170 |     5
  gcp-us-east1   | gcp-us-east1 | 7216936043364483074 |     6
  gcp-us-east1   | gcp-us-east1 | 1452328520330248194 |     7
  gcp-us-east1   | gcp-us-east1 | 6064014538757636098 |     8
  gcp-us-east1   | gcp-us-east1 | 3758171529543942146 |     9
  gcp-us-east1   | gcp-us-east1 | 8369857547971330050 |    10

Time: 352ms total (execution 329ms / network 23ms)

Let's update a row homed in the US East region

WITH demo_composite_pk AS (
    UPDATE demo_composite_pk SET count = 10 WHERE productID = '5199323410302500866' RETURNING crdb_region, count
) SELECT gateway_region(), crdb_region, count FROM demo_composite_pk;
  gateway_region | crdb_region  | count
-----------------+--------------+--------
  gcp-us-east1   | gcp-us-east1 |    10

Time: 35ms total (execution 12ms / network 22ms)

Allright, we're not seeing anything we haven't seen before, let's insert a row in the US West region and then update it.

  gateway_region | crdb_region  |      productid      | count
-----------------+--------------+---------------------+--------
  gcp-us-west2   | gcp-us-west2 |  695146676766375958 |     1
  gcp-us-west2   | gcp-us-west2 | 6459754199800610838 |     2
  gcp-us-west2   | gcp-us-west2 | 4153911190586916886 |     3
  gcp-us-west2   | gcp-us-west2 | 8765597209014304790 |     4
  gcp-us-west2   | gcp-us-west2 |  406916300614664214 |     5
  gcp-us-west2   | gcp-us-west2 | 5018602319042052118 |     6
  gcp-us-west2   | gcp-us-west2 | 2712759309828358166 |     7
  gcp-us-west2   | gcp-us-west2 | 7324445328255746070 |     8
  gcp-us-west2   | gcp-us-west2 | 1559837805221511190 |     9
  gcp-us-west2   | gcp-us-west2 | 6171523823648899094 |    10

Time: 986ms total (execution 724ms / network 262ms)
WITH demo_composite_pk AS (
    UPDATE demo_composite_pk SET count = 10 WHERE productID = '6459754199800610838' RETURNING crdb_region, count
) SELECT gateway_region(), crdb_region, count FROM demo_composite_pk;
  gateway_region | crdb_region  | count
-----------------+--------------+--------
  gcp-us-west2   | gcp-us-west2 |    10

Time: 85ms total (execution 10ms / network 75ms)

This is good, we now have similar update performance from the US East and the West regions. What about EU?

   gateway_region  |   crdb_region    |      productid      | count
-------------------+------------------+---------------------+--------
  gcp-europe-west1 | gcp-europe-west1 | 3574678019714056204 |     1
  gcp-europe-west1 | gcp-europe-west1 | 2998217267410632716 |     2
  gcp-europe-west1 | gcp-europe-west1 | 7609903285838020620 |     3
  gcp-europe-west1 | gcp-europe-west1 | 1845295762803785740 |     4
  gcp-europe-west1 | gcp-europe-west1 | 6456981781231173644 |     5
  gcp-europe-west1 | gcp-europe-west1 | 4151138772017479692 |     6
  gcp-europe-west1 | gcp-europe-west1 | 8762824790444867596 |     7
  gcp-europe-west1 | gcp-europe-west1 | 5015829900472614924 |     8
  gcp-europe-west1 | gcp-europe-west1 | 2709986891258920972 |     9
  gcp-europe-west1 | gcp-europe-west1 | 7321672909686308876 |    10

Time: 1.046s total (execution 0.885s / network 0.162s)
WITH demo_composite_pk AS (
    UPDATE demo_composite_pk SET count = 10 WHERE productID = '3574678019714056204' RETURNING crdb_region, count
) SELECT gateway_region(), crdb_region, count FROM demo_composite_pk;
   gateway_region  |   crdb_region    | count
-------------------+------------------+--------
  gcp-europe-west1 | gcp-europe-west1 |    10

Time: 129ms total (execution 14ms / network 116ms)

This is great, we're able to maintain backwards compatibility to legacy keys and take advantage of local writes.

It's worth calling out that having a partition column as part of PK is not absolutely necessary. As long as the field you're partitioning by region has a unique constraint, you should be good to go. To emphasize this point, let's do the following, let's drop the unique constraint on the productid field.

DROP INDEX demo_composite_pk_productid_key;   

Execute an update on the table from any region

   gateway_region  |   crdb_region    | count
-------------------+------------------+--------
  gcp-europe-west1 | gcp-europe-west1 |    10

Time: 411ms total (execution 157ms / network 254ms)

Latency goes up immediately, we can no longer maintain uniqueness across regions and must visit each region.

distribution: local
  vectorized: true

  • root
  │
  ├── • render
  │   │
  │   └── • scan buffer
  │         estimated row count: 0
  │         label: buffer 2 (demo_composite_pk)
  │
  └── • subquery
      │ id: @S1
      │ original sql: UPDATE demo_composite_pk SET count = 10 WHERE productid = '3574678019714056204' RETURNING crdb_region, count
      │ exec mode: all rows
      │
      └── • buffer
          │ label: buffer 2 (demo_composite_pk)
          │
          └── • update
              │ estimated row count: 0
              │ table: demo_composite_pk
              │ set: count
              │
              └── • render
                  │
                  └── • scan
                        estimated row count: 0 (0.26% of the table; stats collected 16 minutes ago; using stats forecast for 5 hours in the future)
                        table: demo_composite_pk@demo_composite_pk_pkey
                        spans: [/'gcp-europe-west1'/'gcp-europe-west1'/3574678019714056204 - /'gcp-europe-west1'/'gcp-europe-west1'/3574678019714056204] [/'gcp-europe-west1'/'gcp-us-east1'/3574678019714056204 - /'gcp-europe-west1'/'gcp-us-east1'/3574678019714056204] [/'gcp-europe-west1'/'gcp-us-west2'/3574678019714056204 - /'gcp-europe-west1'/'gcp-us-west2'/3574678019714056204] [/'gcp-us-east1'/'gcp-europe-west1'/3574678019714056204 - /'gcp-us-east1'/'gcp-europe-west1'/3574678019714056204] … (5 more)
                        locking strength: for update

Let's add it back using the example in the link above

ALTER TABLE demo_composite_pk ADD CONSTRAINT productid_unique UNIQUE (productid);

Let's take a look at the partitions for productid across every region. We are going to filter the output of the following command as it is verbose.

SHOW PARTITIONS FROM TABLE demo_composite_pk;

changing it to

SELECT table_name, partition_name, column_names, index_name FROM [SHOW PARTITIONS FROM TABLE demo_composite_pk];
     table_name     |  partition_name  | column_names |                index_name
--------------------+------------------+--------------+-------------------------------------------
  demo_composite_pk | gcp-europe-west1 | crdb_region  | demo_composite_pk@demo_composite_pk_pkey
  demo_composite_pk | gcp-us-east1     | crdb_region  | demo_composite_pk@demo_composite_pk_pkey
  demo_composite_pk | gcp-us-west2     | crdb_region  | demo_composite_pk@demo_composite_pk_pkey
  demo_composite_pk | gcp-europe-west1 | crdb_region  | demo_composite_pk@productid_unique
  demo_composite_pk | gcp-us-east1     | crdb_region  | demo_composite_pk@productid_unique
  demo_composite_pk | gcp-us-west2     | crdb_region  | demo_composite_pk@productid_unique

Conclusion

In summary, if your application has a choice to migrate and pay the one-time cost to capture all of the benefits of what a multiregion database can offer, then opt for uuid PK. If your app is not that flexible, make sure you add a deterministic partition column and add a unique index on it. In my experiments I still see better overall performance with uuid but it's the next best alternative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment