MongoDB guide

Running mongodb

docker run -d -v $(pwd)/data/single:/data/db -v $(pwd)/config/single/mongod.conf:/etc/mongod.conf -p 27017:27017 --name mongo-server mongo:3.4 mongod -f /etc/mongod.conf

mongod - the server itself
mongo - mongodb cli (you can run js scripts)

Defaults worth knowing:

  • Default data directory: /data/db
  • Default DB: test
  • Default port: 27017 (config parameters net:port and net:bindIp)
  • WiredTiger is the default storage engine starting in MongoDB 3.2
  • There is a config parameter, directoryPerDB, to tell MongoDB to create a directory for each DB's data
  • There is an HTTP manager you can activate in the config (net:http:enabled) to view basic info about the server; not recommended for production. Its default port is 28017
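A minimal mongod.conf reflecting these settings might look like this (a sketch: port, bindIp and dbPath are the defaults named above; directoryPerDB is opt-in):

net:
  port: 27017
  bindIp: 127.0.0.1
storage:
  dbPath: /data/db
  directoryPerDB: true
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log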

Connecting to a mongo server

mongo --host <hostname> --port <port-number>

docker run -it --rm --link mongo-server:mongo-server --name mongo-cli mongo:3.4 mongo --host mongo-server --port 27017

docker run -it --link some-mongo:mongo --rm mongo:3.4 sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"'

Displaying the last N lines from the log

tail -n <N> /var/log/mongodb/mongod.log

In the config, you can set MongoDB to write its logs to syslog.
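For example, the relevant systemLog section would be (a sketch):

systemLog:
  destination: syslog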

Shutdown the db

mongo
use admin
db.shutdownServer()

If the file /var/lib/mongod/mongodb.lock is more than 0 bytes after a shutdown, the shutdown was not clean and you may have lost data or suffered corruption. You may want to start mongod on a different port, using the dbpath of the crashed instance, to allow MongoDB to recover itself. Once recovery is done, shut it down cleanly and reopen it for normal operations.
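A sketch of that recovery flow (port and dbpath are examples; journaled storage engines replay their journal on startup):

mongod --dbpath /var/lib/mongod --port 27018   # recover away from client traffic
mongo --port 27018 --eval "db.getSiblingDB('admin').shutdownServer()"   # clean shutdown
mongod -f /etc/mongod.conf   # reopen for normal operations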

Starting mongodb with a custom config file

mongod --config /data/config/mongod.conf

Checking the status of the server

db.serverStatus()
db.serverStatus().dur

Backing data up in Mongodb

mongodump is used to back data up from a running mongo server.

mongodump
mongodump --host <hostname> --port <port-number> --out <backup-directory>
mongodump --db <database-name>
mongodump --db <database-name> --collection <collection-name>
mongodump --username <mongodb-user> --password <password>

It creates a folder per DB, one .bson file per collection with the data and one .json file per collection with metadata. During the backup operation, different documents are exported at different times, so the dump may miss operations applied to documents after they were exported. To avoid this, specify --oplog so all documents are up-to-date as of the time the last document is exported (a point-in-time backup); however, this requires a replica set and does not work on a standalone MongoDB instance. Make sure you secure the backup folder from unwanted access.
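For example, a point-in-time dump of a replica set member (host, port and directory are placeholders):

mongodump --host <hostname> --port <port-number> --oplog --out <backup-directory>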

Restoring data in mongodb

mongorestore is used to restore data from a backup.

mongorestore <backup-directory>
mongorestore --host <hostname> --port <port-number> <backup-directory>
mongorestore --drop <backup-directory>
mongorestore --drop --collection <collection-name> --db <db-name> <collection-bson-file-path>
mongorestore --db <target-db> <db-backup-directory>
mongorestore --oplogReplay --port <new-db-port> <backup-directory>

Use --drop to delete all existing collections in the target DB before restoring the data from the backup file. Without --drop, all existing data will remain even if it's not part of the backup. Use --oplogReplay to replay operations that occurred during the backup (a point-in-time restore); it is available for replica set dumps created with the --oplog option. You should practice restoring routinely: master the syntax, ensure backup file integrity and know how long a restore takes.

Importing data into Mongodb

mongoimport is used to import data into a MongoDB collection. File formats supported: json (default), csv and tsv.

mongoimport --db <target-db> --collection <target-collection> <json-file>
mongoimport --db <target-db> --collection <target-collection> --upsert <json-file>
mongoimport --db <target-db> --collection <target-collection> --upsert --upsertFields <comma-separated-fields> <json-file>
mongoimport --type csv --headerline --db <target-db> --collection <target-collection> <csv-file>

Use --upsert to tell mongoimport to insert new documents and update existing ones; otherwise, documents from the import file that collide with existing ones will be rejected. By default, MongoDB matches documents based on _id; if the source file doesn't have an _id field, use --upsertFields to tell mongoimport which other fields in the import file identify documents. When importing CSV or TSV files, if the first line contains the field names then add --headerline and the data starts on the second line, one document per line. If the file doesn't contain the field names, you need to specify them on the command line using --fields as a comma-separated list or using the --fieldFile option, one field name per line, as sketched below.
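A sketch of importing a headerless CSV, with hypothetical field and collection names:

mongoimport --type csv --fields name,email,age --db demo --collection users users.csv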

Exporting data out of Mongodb

mongoexport is used to export data out of mongodb. By default, it exports data in json format.

mongoexport --db <target-db> --collection <target-collection> --out <export-file-name>
mongoexport --db <target-db> --collection <target-collection> --fields <comma-separated-list> --out <export-file-name>
mongoexport --db <target-db> --collection <target-collection> --query "<query-object>" --out <export-file-name>

Use --fields to export only certain fields from a collection. Use --type=<file-type> to export data in another format, e.g. csv. When exporting to csv files, you need to provide the list of fields. Use --query to export only documents that match certain criteria.
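Putting those options together, a sketch exporting a hypothetical users collection to CSV (field names and query are placeholders):

mongoexport --db demo --collection users --type=csv --fields name,email --query '{"active": true}' --out active-users.csv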

Indexing

Indexes are a specialized data structure (a b-tree) that contains pointers to the actual documents in Mongo, so searches are much faster. Collections may have more than one index; Mongo will select the best one when executing queries. All collections have an index on the _id field by default.

db.<collection-name>.getIndexes()
db.system.indexes.find()

creating an index on a collection. 1 means ascending, -1 means descending. Multiple-field (compound) indexes can have up to 31 fields. The max number of indexes per collection is 64.

db.<collection-name>.ensureIndex({field-name:1})
db.<collection-name>.ensureIndex({field-name:1,field-name-2:-1,field-name-3:1})
db.<collection-name>.ensureIndex({'<main-field-name>.<sub-field-name>':1})
db.<collection-name>.ensureIndex({field-name:1},{sparse:true})
db.<collection-name>.stats()

Use the option {sparse:true} if the index's field is not present in most documents, so the index created is smaller: it does not contain a NULL entry pointing to every document where the field does not exist. The smaller the index, the faster it is.

dropping an index from a collection

db.<collection-name>.dropIndex('<index-name>')

displaying the execution plan of a query. Read the data below the winningPlan field; a stage equal to COLLSCAN (full collection scan) is not good. keyPattern displays information about the index used.

db.<collection-name>.find({query-document}).explain()
db.<collection-name>.find({query-document}).explain('executionStats')
db.<collection-name>.find({query-document}).explain(true)
db.<collection-name>.find({query-document}).sort({sorting-doc}).hint({indexes-doc}).explain(true)

Use the function hint({indexes-doc}) to force MongoDB to use specific indexes in your query, e.g. {promo:1}. The field executionStats.executionTimeMillis tells you how many ms MongoDB took to define and execute the query plan; the time to send the data over the network is not part of this. When sorting, if the direction of the index matches the sort required, there is no need for an extra sort step in the query execution plan. If you have multiple-field indexes, queries will use those indexes only if they use the fields in the same order as the index's fields, from left to right.
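A sketch tying these together, assuming a hypothetical products collection with an index on promo:

db.products.ensureIndex({promo:1})
db.products.find({promo:true}).hint({promo:1}).explain('executionStats')
// inspect executionStats.executionTimeMillis and the winningPlan stages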

creating unique indexes. They work on single and array values. If you want to enforce uniqueness only on documents that actually have the index's field, you need to add the sparse option. They don't work across shards.

db.<collection-name>.ensureIndex({field-name:1},{unique:true})

Removing documents automatically after a certain time (a TTL index). The field should be a date data type, and the index must be on a single field.

db.<collection-name>.ensureIndex({field-name:1},{expireAfterSeconds:N})

Creating indexes in the background instead of in the foreground. In the foreground, access to the collection is not allowed while the index builds. In the background, access to the collection is allowed but the index creation is slower.

db.<collection-name>.ensureIndex({field-name:1},{background:true})

Compacting a given collection: it defragments the collection and rebuilds all its indexes. It's a blocking operation, so it's only recommended during maintenance windows. Behavior depends on the storage engine, check the docs!

db.runCommand({compact:'<collection-name>'})

Replica sets

"two is one, and one is none" A mongo replica set is a group of mongod servers working together, each server has a copy of all the data. There is only one primary, the only one that accepts writes, and a group of secondary servers, they are read-only.

The primary server keeps an oplog where it records all write operations. The oplog is a capped collection (its size is fixed) where MongoDB appends changes until it reaches the end and then overwrites the oldest entries (cyclically), which makes writes very fast. Secondary servers read operations from the oplog and apply them to their own copy of the data.
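You can peek at the oplog from the shell on a replica set member (a quick sketch; the oplog lives in the local database):

use local
db.oplog.rs.find().sort({$natural:-1}).limit(1)
db.getReplicationInfo()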

Starting mongod in replica set mode

mongod --replSet <replica-set-name>

Initializing a replica set

rs.initiate()

Getting the replica set configuration

var cfg = rs.config()

Changing the current configuration (it only works from a primary node)

rs.reconfig(cfg)

In order to be elected as primary, a server needs to receive more than 50% of all votes from the other servers. In a configuration with 1 primary and 1 secondary, if the primary goes down, the secondary cannot become primary automatically since it doesn't get more than 50% of the votes, just 1 out of 2. For these cases, you can create an "arbiter": a server that doesn't hold any data but can participate in primary elections. All of this happens automatically. If, due to errors or a network partition, a majority cannot be obtained, you can start a new secondary or arbiter and the replica set will get back to normal.

Adding more members to the replica set

mongod --replSet <replica-set-name>
mongod --replSet <replica-set-name>
mongo
rs.initiate()
rs.add("<hostname>:<mongod-port>")

Adding an Arbiter to the replica set

mongod --replSet <replica-set-name>
rs.add("<hostname>:<mongod-port>", true)

Connecting to another mongodb server from the cli

db = connect("<hostname>:<mongod-port>/<db-name>")

Reading from a secondary server in a replica set

db.setSlaveOk()
db.<collection-name>.find()

Giving more priority to a given server at election time (default = 1)

var cfg = rs.config()
cfg.members[0].priority = 10
cfg
rs.reconfig(cfg)

Determining what mongo server you're connected to

db.getMongo()

Stepping the primary down for N seconds, default 60 seconds.

rs.stepDown(<seconds>)

Preventing an eligible secondary from becoming primary for N seconds

rs.freeze(<seconds>)

Making a secondary server hidden so applications don't connect to it. It will never become primary (priority=0). Useful for reporting purposes.

var cfg = rs.config()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
cfg
rs.reconfig(cfg)

MongoDB chaining is a feature that allows secondary servers to sync data from another secondary based on ping performance between servers: they will sync from the closest one. If you have secondary servers in 2 DCs, data has to go over the wire only once from DC1 to DC2; the rest of the secondaries in DC2 will sync from a secondary in DC2 rather than from the primary in DC1, saving bandwidth.
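Chaining is controlled by the replica set setting chainingAllowed, enabled by default (a sketch of turning it off):

var cfg = rs.config()
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)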

Durability. There are different write concern levels (a shell example follows the list):

  • Acknowledged. The default: the server confirms it received the write, but there is no durability guarantee.
  • Journaled. Minimum durability guaranteed: the write has reached the on-disk journal. The j parameter.
  • Multi-member. Requires acknowledgment from multiple members of a replica set. The w parameter.
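A sketch of requesting stronger guarantees on a single write (collection and document are hypothetical):

db.orders.insert({item: 'abc'}, {writeConcern: {w: 'majority', j: true, wtimeout: 5000}})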

Sharding

Monitoring

Mongo logs a lot of events to the mongod.log file; it's good practice to review it when trying to solve an issue or on a scheduled basis.

show logs
show log global
db.setLogLevel(verbosity, topic)
db.setLogLevel(4, 'replication')
db.setLogLevel(-1, 'replication')

topic possible values: accessControl, command, control, geo, index, network, query, replication, storage, journal, write. verbosity: 0 (default) to 5; -1 resets a topic to inherit the global level.

Query profiler. Detect slow queries

show profile
db.setProfilingLevel(level, threshold)
db.setProfilingLevel(1, 20)
db.system.profile.findOne({op:'query', ns:'<db-name>.<collection-name>'})

level: 0 (default, profiling off), 1 (capture only slow queries that exceed the threshold) or 2 (capture all operations). threshold: in milliseconds.
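A quick sketch for listing the slowest captured operations:

db.system.profile.find().sort({millis:-1}).limit(5)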

mongostat. Runtime statistics from mongod servers. Continuously polls and displays values.

mongostat --host <hostname> --port <port-number>
mongostat --host <hostname> --port <port-number> --rowcount N

mongotop. Shows where mongod spends most of the time. Continuously polls and displays values.

mongotop --host <hostname> --port <port-number>

MongoDB's database/collection disk and memory usage estimates

db.stats()
db.stats(1024)     // KB
db.stats(1048576)  // MB

db.<collection-name>.stats(1024)     // KB
db.<collection-name>.stats(1048576)  // MB

MongoDB server runtime status

db.serverStatus()
db.serverStatus().<section-name>
db.serverStatus().dur
db.serverStatus().mem

MongoDB Cloud manager solution https://www.mongodb.com/cloud/cloud-manager

What metrics to monitor:

  • https://www.datadoghq.com/blog/monitoring-mongodb-performance-metrics-wiredtiger/
  • https://www.datadoghq.com/blog/collecting-mongodb-metrics-and-statistics/
  • https://www.datadoghq.com/blog/monitor-mongodb-performance-with-datadog/

Security

Reduce the surface area attackers can take advantage of: network, files and mongo data access.

  • Don't make mongodb listen on all possible IPs of the VM or physical machine. You can restrict that with the config parameter bindIp.
  • Firewall rules to allow traffic only on needed ports.
  • No public access to mongoDb directly, only through applications
  • SSL connections to protect data and commands over the wire. Configuration parameter net/ssl; net/ssl/mode can be requireSSL|allowSSL|preferSSL|disabled.
  • Allow only certain users access to the mongodb's data and backup files (W/R permissions).
  • No support for encryption at rest from the current storage engines (as of 6/22/2017).
  • Use keyfiles (security/keyFile and security/clusterAuthMode: sendKeyFile) to identify valid members of a replica set and protect intra-cluster communication. You don't want to replicate your data to an unknown server. A mongod.conf sketch pulling these settings together appears at the end of this section.
  • Authentication and Authorization (security/authorization: enabled). Users without proper credentials are rejected. MongoDB has a predefined list of roles with permissions to do different things in the DB; use those to properly manage users, e.g. root, userAdminAnyDatabase, userAdmin, read, readWrite. Follow the principle of least privilege. In order to create the first user, you need to connect from the server where MongoDB is running, not from client computers. Make sure the first user is a super user so you can create more users later.
mongo
use admin
db.createUser({user-config-object})
exit
mongo --username <username> --password <password> --authenticationDatabase <db-name-where-user-was-defined>
db.grantRolesToUser("<super-admin-user>", ["readWrite"])
show users
db.system.users.find()

Here is an example of a user configuration object

var root = {user: 'root', pwd: '12345', roles: ['root']}
var rpt = {user: 'reporter', pwd: '12345', roles: [{role: 'read', db: 'demo'}]}
var app = {user: 'webApp', pwd: '12345', roles: [{role: 'readWrite', db: 'demo'}]}

Removing users and revoking roles from users. When the DB is not specified when granting/revoking roles, the current DB is assumed.

db.dropUser("<user-name>")
db.revokeRoleFromUser("<user-name>", ["<role-name>"])

Logging in and out. When logging in you need to make sure you are in the DB where the user was defined; by default this is admin. However, you can create users within any DB, which is different from the DB the user has permission to read/write. This allows you to have multiple users with the same name, provided they are created in different DBs.

db.logout()
use admin
db.auth('<username>','<password>')
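Pulling the bullet points above together, a mongod.conf enabling these protections might look like this (a sketch; the paths and SSL mode are examples):

net:
  bindIp: 127.0.0.1
  ssl:
    mode: requireSSL
    PEMKeyFile: /etc/ssl/mongodb.pem
security:
  keyFile: /opt/mongodb/keyfile
  clusterAuthMode: sendKeyFile
  authorization: enabled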

Miscellaneous

Hardware:

  • CPU. Least critical; limited effect on overall performance. Useful if you need compression or encryption at the storage level.
  • Memory. More memory means a larger working set. A working set is all the data MongoDB needs to satisfy an operation: make sure there is enough memory for yours (documents used, indexes and intermediary data). Data not found in memory has to be loaded from disk, and you'll see a high page-fault rate when memory is insufficient. Scale up and scale out, if needed.
  • Disk. I/O is crucial for performance. SSDs provide better performance for random access.

You can set up a secondary server in a replica set with cloud storage like AWS EBS, so your data is stored outside your datacenter in case of total failure. A good backup strategy is still required.

Storage engines: WiredTiger (recommended), or you can plug in third-party engines if you need special features the built-in engine doesn't provide, e.g. encryption.

Docker

openssl rand -base64 741 > keyfile
sudo chown mongodb:mongodb /opt/mongodb/keyfile
sudo chmod 0600 /opt/mongodb/keyfile

docker-compose up

docker exec -it `docker ps -qf name=mongodb-s1` bash -c 'mongo'

rs.initiate({ _id: "ec-prehire", members: [{ _id: 1, host: "mongodb-s1:27017" }, { _id: 2, host: "mongodb-s2:27018" }, { _id: 3, host: "mongodb-s3:27019" }], settings: { getLastErrorDefaults: { w: "majority", wtimeout: 30000 }}})

rs.status()

admin = db.getSiblingDB("admin")
admin.createUser(
  {
    user: "adminUser",
    pwd: "password",
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
  }
)

db.getSiblingDB("admin").auth("adminUser", "password")
OR
mongo -u "adminUser" -p "password" --authenticationDatabase "admin"

use admin
db.grantRolesToUser("adminUser", ["readWrite"])

db.getSiblingDB("admin").createUser(
  {
    "user" : "adminCluster",
    "pwd" : "password",
    roles: [ { "role" : "clusterAdmin", "db" : "admin" } ]
  }
)

db.getSiblingDB("admin").createUser(
  {
    "user" : "jrAppUser",
    "pwd" : "password",
    roles: [ { "role" : "readWrite", "db" : "job_requirements" } ]
  }
)

db.getSiblingDB("admin").createUser(
  {
    "user" : "jrAppReadUser",
    "pwd" : "password",
    roles: [ { "role" : "read", "db" : "job_requirements" } ]
  }
)

db.getSiblingDB("admin").createUser(
  {
    "user" : "restoreUser",
    "pwd" : "password",
    roles: [ { "role" : "restore", "db" : "admin" } ]
  }
)

docker exec -it `docker ps -qf name=mongodb-s1` bash -c 'mongo -u "demoUser" -p "password" --authenticationDatabase "admin"'

Connecting to a remote docker server

export DOCKER_HOST=10.21.100.240

workers: .241, .242, .243

ssh [email protected]

sudo su

docker run -d \
  -v /run/docker/plugins/:/run/docker/plugins/ \
  -v /volumes/local-persist/:/var/lib/docker/plugin-data/ \
  -v /volumes/mongodb/data/prehire/:/volumes/mongodb/data/prehire/ \
  cwspear/docker-local-persist-volume-plugin

docker volume create --name mongodb-data-prehire -o mountpoint=/volumes/mongodb/data/prehire -d local-persist

docker volume create --name mongodb-data-prehire-s1 -o mountpoint=/volumes/mongodb/data/prehire/m1 -d local-persist

docker volume create --name mongodb-data-prehire-s2 -o mountpoint=/volumes/mongodb/data/prehire/m2 -d local-persist

docker volume create --name mongodb-data-prehire-s3 -o mountpoint=/volumes/mongodb/data/prehire/m3 -d local-persist

OR

docker volume create --driver=vsphere --name=mongodb-data-prehire-s1 -o size=3gb
docker volume create --driver=vsphere --name=mongodb-data-prehire-s2 -o size=3gb
docker volume create --driver=vsphere --name=mongodb-data-prehire-s3 -o size=3gb

docker run -it --rm -v mongodb-data-prehire-s3:/redis-data alpine:3.4 sh

Moving files from your local machine to the server

http://www.binarytides.com/linux-tar-command/

tar -cvzf mongod.tar.gz ./config/replicaset/

scp mongod.tar.gz [email protected]:/home/zcebjobs

tar -xvzf mongod.tar.gz -C /tmp/mongod/

Copying files from your local computer to an existing volume

docker run --rm -v $(pwd)/config/replicaset/:/incoming \
  -v mongodb-data-prehire:/config alpine:3.4 sh -c 'cp -rp /incoming/* /config'

docker run -it --rm -v $(pwd)/config/replicaset:/incoming \
  -v mongodb-data-prehire:/config alpine:3.4 sh

You should run this on the docker server and then copy the files manually:

docker run -it --rm -v /tmp/mongod/config/replicaset:/config-files \
  -v mongodb-data-prehire-s1:/config alpine:3.4 sh

cp /config-files/s3/* /config

The following doesn't work as written because docker run executes cp directly, with no shell to expand the /config-files/* glob; wrapping it in sh -c fixes it:

docker run --rm -v /tmp/mongod/config/replicaset:/config-files -v mongodb-data-prehire:/config alpine:3.4 sh -c 'cp -rp /config-files/* /config'

Run a new container to double check the content was copied successfully:

docker run -it --rm -v mongodb-data-prehire:/config alpine:3.4 sh

Deploying a stack in docker cluster

scp docker-compose-prod.yml [email protected]:/home/zcebjobs

docker stack deploy --compose-file docker-compose-prod.yml mongodb-prehire

docker stack ps mongodb-prehire

docker stack rm mongodb-prehire

docker stack services mongodb-prehire

docker logs `docker ps -qf name=mongodb-prehire_mongodb-s1`

docker exec -it `docker ps -qf name=mongodb-prehire_mongodb-s1` bash

docker exec -it `docker ps -qf name=mongodb-prehire_mongodb-s1` bash -c 'mongo'

docker run -it --rm --name mongo-cli mongo:3.4 mongo --host 10.21.100.241 --port 27017

db = connect("mongodb-s3:27019")
db.getSiblingDB("admin").auth("adminUser", "password")
db.getSiblingDB("admin").auth("adminCluster", "password")
db.getSiblingDB("admin").auth("jrAppUser", "password")

Backing data up and restoring the JR DB on another server

ssh [email protected]
mongodump --db job_requirements
tar -cvzf jr.tar.gz ./dump
scp jr.tar.gz [email protected]:/home/zcebjobs
sshdockertest
tar -xvzf jr.tar.gz -C ./mongo/
docker run -it --rm -v /home/zcebjobs/mongo:/backup -v mongodb-data-prehire-s1:/config alpine:3.4 sh
cp -r /backup/ /config/backup
exit
docker exec -it `docker ps -qf name=mongodb-prehire_mongodb-s1` bash
mongorestore --host localhost --port 27018 -u restoreUser -p password --authenticationDatabase admin /data/db/backup/dump
