This is an unexpected third part to an interesting experiment with FerretDB and CockroachDB. This time, we're going to expand on the previous article.
Using CockroachDB for FerretDB backend
Migrating MongoDB collections to CockroachDB with FerretDB
Once I completed my first two articles, I realized there are a lot of possibilities exposed by proxying MongoDB collections through FerretDB backed by CockroachDB. CockroachDB has unique data domiciling capabilities available through multiregion abstractions, inverted and partial indexes, computed columns and of course strong consistency. Today, we're going to discuss unique constraints in MongoDB and CockroachDB.
- Start a 9 node multiregion cluster (CockroachDB Dedicated)
- Start FerretDB (Docker)
- Unique Indexes
- Considerations
- Conclusion
I am going to use the same CockroachDB Dedicated cluster from the previous article. Please refer to the previous article for the detailed steps. You can get a 30 day trial of CockroachDB Dedicated following this link.
I'll be using the same compose file from the previous article, however, as we will be discussing multiregion, I will make changes to the compose file and discuss the steps below.
FerretDB is under development and many features may still be unavailable. In the previous article I've migrated a MongoDB sample_mflix dataset which relies on unique indexes for some of the collections. As of this writing, I've not been able to validate unique constraints work in FerretDB. I've concluded that it doesn't work. It inhibits some of the capabilities we'd like to discuss and I will demonstrate how we can leverage CockroachDB for globally strong consistent unique constraints.
If you've not done so before, restore the sample_mflix.users collection
mongorestore --archive=sampledata.archive --nsInclude=sample_mflix.users --numInsertionWorkersPerCollection=1002022-09-12T16:29:24.266-0400 sample_mflix.users 28.9KB
2022-09-12T16:29:26.156-0400 sample_mflix.users 28.9KB
2022-09-12T16:29:26.156-0400 finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-12T16:29:26.200-0400 restoring indexes for collection sample_mflix.users from metadata
2022-09-12T16:29:26.200-0400 index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-12T16:29:26.203-0400 185 document(s) restored successfully. 0 document(s) failed to restore.notice a unique constraint was restored along with the data, looking at the dataset description, indeed there's a unique constraint on the email field. The schema in CockroachDB is not aware of the constraint:
CREATE TABLE sample_mflix.users_5e7cc513 (
_jsonb JSONB NULL,
rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),
CONSTRAINT users_5e7cc513_pkey PRIMARY KEY (rowid ASC)
);If we would attempt to restore this file again, the constraint would not fire, in fact, it would append to the existing rows.
Originally we restored 185 records, if we rerun the restore, we should see 370 records
2022-09-12T16:38:01.207-0400 sample_mflix.users 28.9KB
2022-09-12T16:38:01.207-0400 finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-12T16:38:01.247-0400 restoring indexes for collection sample_mflix.users from metadata
2022-09-12T16:38:01.247-0400 index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-12T16:38:01.251-0400 185 document(s) restored successfully. 0 document(s) failed to restore.ferretdb=> select count(*) from sample_mflix.users_5e7cc513 ;
count
-------
370It's obvious the unique constraint is not enforced. What does FerretDB show:
sample_mflix> db.getCollectionInfos( )
[ { name: 'users', type: 'collection' } ]sample_mflix> db.users.getIndexes()
MongoServerError: no such command: 'listIndexes'Let's try to add it explicitly
sample_mflix> db.users.createIndex( { "email": 1 }, { unique: true } )
email_1let's attempt to insert a row using FerretDB.
sample_mflix> db.users.findOne({ })
{
_id: ObjectId("59b99db4cfa9a34dcd7885b6"),
name: 'Ned Stark',
email: 'sean_bean@gameofthron.es',
password: '$2b$12$UREFwsRUoyF0CRqGNK0LzO0HM/jLhgUCNNIJ9RJAqMUQ74crlJ1Vu'
}We're going to add another record with the same email address
sample_mflix> db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })
{
acknowledged: true,
insertedId: ObjectId("631f99c3cc8ad2b643a9f459")
}Insert was accepted
I will give MongoDB benefit of a doubt that unique constraints do indeed work. I think the issue unique indexes don't work in FerretDB is because it's not implemented yet.
Let's enforce the constraint on the CockroachDB side:
We will need to create a computed column on the email field in the jsonb field, we're also specifying UNIQUE constraint as part of the command.
ALTER TABLE sample_mflix.users_5e7cc513 ADD COLUMN email STRING NOT NULL UNIQUE AS ((_jsonb->>'email')::STRING) VIRTUAL;If you added Ned Stark's email address previously or restored the users collection twice as I suggested previously, CockroachDB will detect the unique constraint violation.
ERROR: failed to ingest index entries during backfill: duplicate key value violates unique constraint "users_5e7cc513_email_key"
DETAIL: Key (email)=('sean_bean@gameofthron.es') already exists.You have a choice to truncate the table or delete all violations manually.
TRUNCATE TABLE sample_mflix.users_5e7cc513 ;
ALTER TABLE sample_mflix.users_5e7cc513 ADD COLUMN email STRING NOT NULL UNIQUE AS ((_jsonb->>'email')::STRING) VIRTUAL;The schema now looks like so:
CREATE TABLE sample_mflix.users_5e7cc513 (
_jsonb JSONB NULL,
rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),
email STRING NOT NULL AS (_jsonb->>'email':::STRING) VIRTUAL,
CONSTRAINT users_5e7cc513_pkey PRIMARY KEY (rowid ASC),
UNIQUE INDEX users_5e7cc513_email_key (email ASC)
) LOCALITY REGIONAL BY TABLE IN PRIMARY REGION;At this point we can attempt to restore the collection again, this time around, CockroachDB will enforce the uniqueness on the email field.
2022-09-14T12:43:59.689-0400 finished restoring sample_mflix.users (185 documents, 0 failures)
2022-09-14T12:43:59.715-0400 restoring indexes for collection sample_mflix.users from metadata
2022-09-14T12:43:59.715-0400 index: &idx.IndexDocument{Options:primitive.M{"name":"email_1", "unique":true, "v":2}, Key:primitive.D{primitive.E{Key:"email", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-09-14T12:43:59.718-0400 185 document(s) restored successfully. 0 document(s) failed to restore.Let's attempt to insert a record using mongosh
db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })MongoServerError: [pool.go:283 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] [insert.go:57 pgdb.InsertDocument] ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key" (SQLSTATE 23505)We have a violation! But just to be 100% sure, let's attempt to restore the table again
2022-09-12T16:56:55.884-0400 Failed: sample_mflix.users: error restoring from archive 'sampledata.archive': (InternalError) [pool.go:283 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] [insert.go:57 pgdb.InsertDocument] ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key" (SQLSTATE 23505)
2022-09-12T16:56:55.884-0400 0 document(s) restored successfully. 0 document(s) failed to restore.We can stop here but I would like to make this concept stick as there are significant benefits to having a multi-region database with strong global consistency. The unique constraint is enforced immediately upon creation across all of the regions, availability zones and nodes. If we were to access the database from say aws-us-east-2, we'd get the same violation. Let's tear down the FerretDB compose environment and create three new mongosh clients, each accessing the regional (aws-us-east-1, aws-us-east-2, aws-us-west-2) cluster endpoints.
version: "3"
services:
ferretdb-us-east-1:
image: ghcr.io/ferretdb/ferretdb:latest
hostname: 'ferretdb-us-east-1'
container_name: 'ferretdb-us-east-1'
restart: 'on-failure'
command:
[
'-listen-addr=:27017',
## Dedicated multiregion cluster
'-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-east-1.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
]
ports:
- 27017:27017
volumes:
- /Users/artem/.postgresql/root.crt:/certs/root.crt
- /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt
ferretdb-us-east-2:
image: ghcr.io/ferretdb/ferretdb:latest
hostname: 'ferretdb-us-east-2'
container_name: 'ferretdb-us-east-2'
restart: 'on-failure'
command:
[
'-listen-addr=:27017',
## Dedicated multiregion cluster
'-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-east-2.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
]
ports:
- 27019:27017
volumes:
- /Users/artem/.postgresql/root.crt:/certs/root.crt
- /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt
ferretdb-us-west-2:
image: ghcr.io/ferretdb/ferretdb:latest
hostname: 'ferretdb-us-west-2'
container_name: 'ferretdb-us-west-2'
restart: 'on-failure'
command:
[
'-listen-addr=:27017',
## Dedicated multiregion cluster
'-postgresql-url=postgresql://artem:password@artem-mr-7xw.aws-us-west-2.cockroachlabs.cloud:26257/ferretdb?sslmode=verify-full&sslrootcert=/certs/artem-mr-ca.crt'
]
ports:
- 27021:27017
volumes:
- /Users/artem/.postgresql/root.crt:/certs/root.crt
- /Users/artem/Library/CockroachCloud/certs/artem-mr-ca.crt:/certs/artem-mr-ca.crt I've created services ferretdb-us-east-1, ferretdb-us-east-2 and ferretdb-us-west-2, the connection strings coincide with the regional endpoints, exposing the host ports on 27017, 27019 and 27021 respectively. Feel free to validate the -postgresql-url for each of the service.
docker compose up -d
docker compose psNAME COMMAND SERVICE STATUS PORTS
ferretdb-us-east-1 "/ferretdb -listen-a…" ferretdb-us-east-1 running 0.0.0.0:27017->27017/tcp
ferretdb-us-east-2 "/ferretdb -listen-a…" ferretdb-us-east-2 running 0.0.0.0:27019->27017/tcp
ferretdb-us-west-2 "/ferretdb -listen-a…" ferretdb-us-west-2 running 0.0.0.0:27021->27017/tcpI'm not familiar with Mongo CLI enough to demonstrate that each endpoint is accessing the regional instance, I will show it in SQL shortly. In the meantime, let's attempt to insert a record
Access the mongosh pointing to the us-west-2 endpoint
mongosh mongodb://localhost:27021Insert a record
sample_mflix> db.users.insertOne({name: 'Stark Sr.', email: 'sean_bean@gameofthron.es' })
MongoServerError: [pool.go:283 pgdb.(*Pool).InTransaction] [msg_insert.go:108 pg.(*Handler).insert.func1] [insert.go:57 pgdb.InsertDocument] ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key" (SQLSTATE 23505)It failed and proves that CockroachDB indexes are consistent across geographies. You may be asking why am I working so hard to make this point. I was reading MongoDB documentation and found the following note
For replica sets and sharded clusters, using a rolling procedure to create a unique index requires that you stop all writes to the collection during the procedure. If you cannot stop all writes to the collection during the procedure, do not use the rolling procedure. Instead, build your unique index on the collection by:
issuing
db.collection.createIndex()on the primary for a replica set, orissuing
db.collection.createIndex()on the mongos for a sharded cluster.
Now please correct me if I'm misinterpreting the note but it says you have to stop writes to the collection when you create an index in MongoDB. Additionally, if stopping writes is not an option, you have to create an index on the mongos which as I understand are follower replicas in Mongo speak. That is absolutely not the case with CockroachDB and goes against the CockroachDB consistency guarantees. The unique index is global and consistent across all of the nodes, availability zones and regions. You create it once and you don't stop the writes to the table. That said, let me show you this example in SQL.
Let me connect to the us-west-2 region and intentionally attempt to violate the constraint
cockroach sql --url "postgresql://artem:password@artem-mr-7xw.aws-us-west-2.cockroachlabs.cloud:26257/ferretdb2?sslmode=verify-full&sslrootcert=$HOME/Library/CockroachCloud/certs/artem-mr-ca.crt"Let's confirm the region is us-west-2
select gateway_region(); gateway_region
----------------
aws-us-west-2Let's see the network round trip time from my host
select 1; ?column?
------------
1
(1 row)
Time: 69ms total (execution 1ms / network 68ms)That seems accurate, there's about 70ms RTT from where I am in New Jersey.
Let's attempt to insert a record
INSERT INTO sample_mflix.users_5e7cc513 ("_jsonb") VALUES (
'{"email": "sean_bean@gameofthron.es"}'
);ERROR: duplicate key value violates unique constraint "users_5e7cc513_email_key"
SQLSTATE: 23505
DETAIL: Key (email)=('sean_bean@gameofthron.es') already exists.
CONSTRAINT: users_5e7cc513_email_keyThis wraps up our unique constraint experiments with the MongoDB collections. I admit some of the capabilities have not been tested on the MongoDB side because we're proxying via FerretDB. However, I hope I've given you enough evidence having a proper relational database with strict schema validation protects users from data quality issues. I'm looking forward to new experiments with FerretDB!