docker run -d -v
mongod - the server itself
mongo - mongodb cli (you can run js scripts)
default data directory: /data/db
default DB: test
default port: 27017 (config parameters net.port and net.bindIp)
WiredTiger is the default storage engine starting in MongoDB 3.2
There is a config parameter, directoryPerDB, that tells MongoDB to create a separate directory for each database's data files.
There is an HTTP status interface you can activate in the config (net.http.enabled) to view basic info about the server; not recommended for production. The default port is 28017.
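A minimal mongod.conf sketch pulling the parameters above together (values are illustrative, adjust to your environment):
net:
  port: 27017
  bindIp: 127.0.0.1
storage:
  dbPath: /data/db
  directoryPerDB: true
  engine: wiredTiger
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log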
mongo --host <hostname> --port <port-number>
docker run -it --rm --link mongo-server:mongo-server --name mongo-cli mongo:3.4 mongo --host mongo-server --port 27017
docker run -it --link some-mongo:mongo --rm mongo:3.4 sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"'
tail -n <N> /var/log/mongodb/mongod.log
In the config, you can also set MongoDB to write the logs to syslog (systemLog.destination).
mongo
use admin
db.shutdownServer()
If the mongod.lock file in the data directory (e.g. /var/lib/mongodb/mongod.lock) is more than 0 bytes after a shutdown, the shutdown was not clean and you may have lost or corrupted data. You may want to start mongod on a different port using the dbpath of the crashed instance to allow MongoDB to recover itself. Once that is done, shut it down cleanly and open it for normal operations.
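A recovery sketch, assuming the crashed instance used dbpath /data/db and normally listens on 27017 (paths and ports are assumptions):
mongod --dbpath /data/db --port 37017
mongo --port 37017 --eval "db.getSiblingDB('admin').shutdownServer()"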
mongod --config /data/config/mongod.conf
db.serverCmdLineOpts()
db.serverStatus().dur
mongodump is used to back up data from a running mongo server.
mongodump
mongodump --host <hostname> --port <port-number> --out <backup-directory>
mongodump --db <database-name>
mongodump --db <database-name> --collection <collection-name>
mongodump --username <mongodb-user> --password <password>
It creates a folder per DB, with one .bson file per collection containing the data and one .json file per collection containing metadata.
During the backup operation different documents are exported at different times, so the dump may miss operations applied to documents after they were exported. To avoid this, you can specify --oplog so that all documents are consistent as of the time the last document is exported (point-in-time backup); however, this requires a replica set and does not work on a standalone MongoDB instance.
Make sure you secure the backup folder from unwanted access.
mongorestore is used to restore data from a backup.
mongorestore <backup-directory>
mongorestore --host <hostname> --port <port-number> <backup-directory>
mongorestore --drop <backup-directory>
mongorestore --drop --collection <collection-name> --db <db-name> <collection-bson-file-path>
mongorestore --db <target-db> <db-backup-directory>
mongorestore --oplogReplay --port <new-db-port> <backup-directory>
Use --drop to delete all existing collections in the target DB before restoring the data from the backup files. Without --drop, all existing data remains even if it's not part of the backup.
Use --oplogReplay to replay operations that occurred during the backup (point-in-time restore). Only available for replica set dumps created with the --oplog option.
You should practice restoring routinely: master the syntax, ensure backup file integrity and know how long a restore takes.
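A point-in-time backup/restore round-trip sketch, assuming a replica set member on localhost:27017 and a ./backup output directory (both are assumptions):
mongodump --host localhost --port 27017 --oplog --out ./backup
mongorestore --host localhost --port 27017 --oplogReplay --drop ./backup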
mongoimport is used to import data into a MongoDB collection. File formats supported: JSON (default), CSV and TSV.
mongoimport --db <target-db> --collection <target-collection> <json-file>
mongoimport --db <target-db> --collection <target-collection> --upsert <json-file>
mongoimport --db <target-db> --collection <target-collection> --upsert --upsertFields <comma-separated-fields> <json-file>
mongoimport --type csv --headerline --db <target-db> --collection <target-collection> <csv-file>
Use --upsert to tell mongoimport to insert new documents and update existing ones; otherwise, documents from the import file that already exist are rejected. By default, MongoDB matches documents based on _id; if the source file doesn't have the _id field, we can tell mongoimport (--upsertFields) to use other fields in the import file to identify documents when importing.
When importing CSV or TSV files, if the first line contains the names of the fields then add --headerline and the data starts on the second line, one document per line. If the file doesn't contain the field names, you need to specify them on the command line using --fields as a comma-separated list or using the --fieldFile option, one field name per line.
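A sketch of importing a headerless CSV file, assuming a demo DB, a users collection and a users.csv file (all hypothetical):
mongoimport --type csv --fields name,email,age --db demo --collection users users.csv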
mongoexport is used to export data out of MongoDB. By default, it exports data in JSON format.
mongoexport --db <target-db> --collection <target-collection> --out <export-file-name>
mongoexport --db <target-db> --collection <target-collection> --fields <comma-separated-list> --out <export-file-name>
mongoexport --db <target-db> --collection <target-collection> --query "<query-object>" --out <export-file-name>
Use --fields to export only certain fields from a collection.
Use --type=<file-type> to export data in another format, e.g. CSV. When exporting to CSV files, you need to provide the list of fields.
Use --query to export only documents that match certain criteria.
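A sketch of a filtered CSV export, assuming a demo DB, a users collection and an age field (all hypothetical):
mongoexport --db demo --collection users --type=csv --fields name,email --query '{"age": {"$gte": 21}}' --out adults.csv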
Indexes are a specialized data structure (B-tree) that contains pointers to the actual documents in Mongo, so searches are much faster. Collections may have more than one index; Mongo will select the best one when executing queries.
All collections have an index on the _id field by default.
db.<collection-name>.getIndexes()
db.system.indexes.find()
creating an index on a collection. 1 means ascending, -1 means descending. Compound (multiple-field) indexes can have up to 31 fields. Max number of indexes per collection is 64.
db.<collection-name>.ensureIndex({field-name:1})
db.<collection-name>.ensureIndex({field-name:1,field-name-2:-1,field-name-3:1})
db.<collection-name>.ensureIndex({'<main-field-name>.<sub-field-name>':1})
db.<collection-name>.ensureIndex({field-name:1},{sparse:true})
db.<collection-name>.stats()
Use the option {sparse:true} if the index's field is not present in most documents, so the index created is smaller: it does not contain a null entry pointing to every document where the field does not exist. The smaller the index, the faster it is.
dropping an index from a collection
db.<collection-name>.dropIndex('<index-name>')
displaying the execution plan of a query. Read the data below the winningPlan field; a stage equal to COLLSCAN is not good. keyPattern displays information about the index used.
db.<collection-name>.find({query-document}).explain()
db.<collection-name>.find({query-document}).explain('executionStats')
db.<collection-name>.find({query-document}).explain(true)
db.<collection-name>.find({query-document}).sort({sorting-doc}).hint({indexes-doc}).explain(true)
Use the function hint({indexes-doc}) to force MongoDB to use a specific index in your query, e.g. {promo:1}.
There is a field executionStats.executionTimeMillis that tells you how many ms MongoDB took to define and execute the query plan. The time to send the data over the network is not part of this.
When sorting, if the direction of the index matches the sort required, there is no need for an extra sort step in the query execution plan.
If you have compound (multiple-field) indexes, queries will use those indexes only if they use the fields in the same order as the index's fields, from left to right (leftmost prefix).
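A quick prefix sketch, assuming a hypothetical users collection with lastName and firstName fields:
db.users.ensureIndex({lastName:1, firstName:1})
db.users.find({lastName:'Smith'}).explain('executionStats')   // can use the index (leftmost prefix)
db.users.find({firstName:'Ann'}).explain('executionStats')    // cannot use it, expect a COLLSCAN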
creating unique indexes. They work on single and array values. If you want to enforce uniqueness only on documents that actually have the index's field, you need to add the sparse option. They don't work across shards.
db.<collection-name>.ensureIndex({field-name:1},{unique:true})
Removing documents automatically after a certain time (TTL index). The indexed field must be a date type and the TTL index must be on a single field.
db.<collection-name>.ensureIndex({field-name:1},{expireAfterSeconds:N})
Creating indexes in the background instead of in the foreground. In the foreground, access to the collection is not allowed. In the background, access to the collection is allowed but the index creation is slower.
db.<collection-name>.ensureIndex({field-name:1},{background:true})
Compacting a given collection defragments the collection and rebuilds all its indexes. It's a blocking operation and is only recommended during maintenance windows. Behavior depends on the storage engine, check the docs!
db.runCommand({compact:'<collection-name>'})
"two is one, and one is none" A mongo replica set is a group of mongod servers working together, each server has a copy of all the data. There is only one primary, the only one that accepts writes, and a group of secondary servers, they are read-only.
The primary server keeps an oplog where it records all write operations. The oplog is a capped collection (the size of the collection is fixed) where MongoDB writes changes up to the end and then overwrites the oldest documents (cyclical), which is very fast for writes. Secondary servers read operations from the oplog and apply them to their own copy of the data.
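A quick way to peek at the oplog from the primary (a sketch; the oplog is the capped collection oplog.rs in the local database):
use local
db.oplog.rs.find().sort({$natural:-1}).limit(1).pretty()
rs.printReplicationInfo()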
Starting mongod in replica set mode
mongod --replSet <replica-set-name>
Initializing a replica set
rs.initiate()
Getting the replica set configuration
var cfg = rs.config()
Changing the current configuration (it only works from a primary node)
rs.reconfig(cfg)
In order to be elected as primary, a server needs to receive more than 50% of all votes from the other servers. In a configuration with 1 primary and 1 secondary server, if the primary goes down, the secondary cannot become primary automatically since it doesn't get more than 50% of the votes, just 1 out of 2. For these cases, you can create an "arbiter": a server that doesn't hold any data but can participate in primary elections. All of this happens automatically. If, due to errors or a network partition, a majority cannot be obtained, you can start a new secondary or arbiter and the replica set will get back to normal.
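A sketch of initializing a 3-member set with an arbiter in one step (hostnames are hypothetical):
rs.initiate({_id: '<replica-set-name>', members: [{_id: 0, host: 'mongo1:27017'}, {_id: 1, host: 'mongo2:27017'}, {_id: 2, host: 'arbiter1:27017', arbiterOnly: true}]})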
Adding more members to the replica set
mongod --replSet <replica-set-name>
mongod --replSet <replica-set-name>
mongo
rs.initiate()
rs.add("<hostname>:<mongod-port>")
Adding an Arbiter to the replica set
mongod --replSet <replica-set-name>
rs.add("<hostname>:<mongod-port>", true)
Connecting to another mongodb server from the cli
db = connect("<hostname>:<mongod-port>/<db-name>")
Reading from a secondary server in a replica set
db.setSlaveOk()
db.<collection-name>.find()
Giving more priority to a given server at election time (default = 1)
var cfg = rs.config()
cfg.members[0].priority = 10
cfg
rs.reconfig(cfg)
Determining what mongo server you're connected to
db.getMongo()
Stepping down the primary so it stops being primary for N seconds, default 60 seconds.
rs.stepDown(<seconds>)
Preventing an eligible secondary from becoming primary for N seconds
rs.freeze(<seconds>)
Making a secondary server hidden so no applications connect to it. It will never become primary (priority=0). Useful for reporting purposes.
var cfg = rs.config()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
cfg
rs.reconfig(cfg)
MongoDB chaining is a feature that allows MongoDB secondary servers to sync data from another secondary server based on ping performance between servers; they will sync from the closest one. If you have secondary servers in 2 DCs, data has to go over the wire only once from DC1 to DC2, and the rest of the secondary servers in DC2 will sync from a secondary server in DC2 and not from the primary in DC1, saving bandwidth.
Durability. There are different write concern levels:
- Acknowledged. The default (w:1). The primary acknowledged the write, but there is no durability guarantee.
- Journaled. Minimum durability guarantee: the write has reached the on-disk journal. j parameter
- Multi-member. Requires the acknowledgement from multiple members in a replica set. w parameter
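A sketch of specifying a write concern per operation, assuming a hypothetical orders collection:
db.orders.insert({item: 'abc', qty: 1}, {writeConcern: {w: 'majority', j: true, wtimeout: 5000}})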
Mongo logs a lot of events to the mongod.log file; it's good practice to review it when trying to solve an issue or on a scheduled basis.
show logs
show log global
db.setLogLevel(verbosity, topic)
db.setLogLevel(4, 'replication')
db.setLogLevel(-1, 'replication')
Possible topic values: accessControl, command, control, geo, index, network, query, replication, storage, journal, write. Verbosity: 0 (default) to 5; -1 resets a topic to inherit from its parent.
Query profiler. Detect slow queries
show profile
db.setProfilingLevel(level, threshold)
db.setProfilingLevel(1, 20)
db.system.profile.findOne({op:'query', ns:'<db-name>.<collection-name>'})
level: 0 (default, off) to 2 (all operations). threshold: with level 1, capture only slow queries that exceed the threshold in milliseconds.
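A sketch for digging into recent slow operations captured by the profiler (threshold and limit values are illustrative):
db.system.profile.find({millis: {$gt: 20}}).sort({ts: -1}).limit(5).pretty()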
mongostat. Runtime statistics from mongod servers. Continuously polls and displays values.
mongostat --host <hostname> --port <port-number>
mongostat --host <hostname> --port <port-number> --rowcount N
mongotop. Shows where mongod spends most of the time. Continuously polls and displays values.
mongotop --host <hostname> --port <port-number>
MongoDB's database/collection disk and memory usage estimates
db.stats()
db.stats(1024) #KB
db.stats(1048576) #MB
db.<collection-name>.stats(1024) #KB
db.<collection-name>.stats(1048576) #MB
MongoDB server runtime status
db.serverStatus()
db.serverStatus().<section-name>
db.serverStatus().dur
db.serverStatus().mem
MongoDB Cloud manager solution https://www.mongodb.com/cloud/cloud-manager
What metrics to monitor https://www.datadoghq.com/blog/monitoring-mongodb-performance-metrics-wiredtiger/ https://www.datadoghq.com/blog/collecting-mongodb-metrics-and-statistics/ https://www.datadoghq.com/blog/monitor-mongodb-performance-with-datadog/
Reduce the surface area attackers can take advantage of: network, files and mongo data access.
- Don't make MongoDB listen on all possible IPs of the VM or physical machine. You can restrict that with the config parameter net.bindIp.
- Firewall rules to allow traffic only on needed ports.
- No public access to MongoDB directly, only through applications.
- SSL connections to protect data and commands over the wire. Configuration parameter net.ssl; net.ssl.mode can be requireSSL | allowSSL | preferSSL | disabled.
- Allow only certain users access to MongoDB's data and backup files (R/W permissions).
- No support for encryption at rest from current storage engines (as of 6/22/2017).
- Use keyfiles (security.keyFile and security.clusterAuthMode: sendKeyFile) to identify valid members of a replica set and protect intra-cluster communication. You don't want to replicate your data to an unknown server.
- Authentication and Authorization (security.authorization: enabled). Users without proper credentials are rejected. MongoDB has a predefined list of roles with permissions to do different things in the DB; use those to properly manage users, e.g. root, userAdminAnyDatabase, userAdmin, read, readWrite. Follow the principle of least privilege. In order to create the first user, you need to connect from the server itself where MongoDB is running, not from client computers. Make sure the first user is a super user so you can create more users later.
mongo
use admin
db.createUser({user-config-object})
exit
mongo --username <username> --password <password> --authenticationDatabase <db-name-where-user-was-defined>
db.grantRolesToUser("<super-admin-user>", ["readWrite"])
show users
db.system.users.find()
Here is an example of a user configuration object
var root = {user: 'root', pwd: '12345', roles: ['root']}
var rpt = {user: 'reporter', pwd: '12345', roles: [{ role: 'read', db: 'demo'}]}
var app = {user: 'webApp', pwd: '12345', roles: [{ role: 'readWrite', db: 'demo'}]}
Removing users and revoking roles from users. When DB is not specified when granting/revoking roles, it assumes the current DB.
db.dropUser("<user-name>")
db.revokeRolesFromUser("<user-name>", ["<role-name>"])
Logging in and out. When logging in you need to make sure you are in the DB where the user was defined; by default this is admin. However, you can create users within any DB, which is different from the DB the user has permissions to read/write. This allows you to have multiple users with the same name provided they are created in different DBs.
db.logout()
use admin
db.auth('<username>','<password>')
Hardware:
- CPU. Least critical; limited effect on overall performance. Useful if you need compression or encryption at the storage level.
- Memory. More memory, larger working set. A working set is all the data MongoDB needs to satisfy an operation. Have enough for your working set (documents used, indexes and intermediary data); data not found in memory needs to be loaded from disk, and you'll see a high page-fault rate when there isn't enough memory. Scale up and scale out if needed.
- Disk. I/O crucial for performance. SSD provides better performance for random access.
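A quick sketch for checking memory pressure from the shell (field availability varies by platform and storage engine):
db.serverStatus().mem
db.serverStatus().extra_info.page_faults
db.serverStatus().wiredTiger.cache['bytes currently in the cache']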
You can set up a secondary server in a replica set with cloud storage like AWS EBS, so your data is stored outside your datacenter in case of total failures. A good backup strategy is still required.
Storage engines: WiredTiger (recommended), or you can plug in third-party engines if you need special features the built-in engine doesn't provide, e.g. encryption.
openssl rand -base64 741 > keyfile
sudo mv keyfile /opt/mongodb/keyfile
sudo chown mongodb:mongodb /opt/mongodb/keyfile
sudo chmod 0600 /opt/mongodb/keyfile
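A config sketch wiring the keyfile into mongod (the path comes from the commands above; the replica set name is a placeholder):
security:
  keyFile: /opt/mongodb/keyfile
  authorization: enabled
replication:
  replSetName: <replica-set-name>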
docker-compose up
docker exec -it `docker ps -qf name=mongodb-s1` bash -c 'mongo'
rs.initiate({ _id: "ec-prehire", members: [{ _id: 1, host: "mongodb-s1:27017" }, { _id: 2, host: "mongodb-s2:27018" }, { _id: 3, host: "mongodb-s3:27019" }], settings: { getLastErrorDefaults: { w: "majority", wtimeout: 30000 }}})
rs.status()
admin = db.getSiblingDB("admin")
admin.createUser(
{
user: "adminUser",
pwd: "password",
roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
}
)
db.getSiblingDB("admin").auth("adminUser", "password")
OR
mongo -u "adminUser" -p "password" --authenticationDatabase "admin"
use admin
db.grantRolesToUser("adminUser", ["readWrite"])
db.getSiblingDB("admin").createUser(
{
"user" : "adminCluster",
"pwd" : "password",
roles: [ { "role" : "clusterAdmin", "db" : "admin" } ]
}
)
db.getSiblingDB("admin").createUser(
{
"user" : "jrAppUser",
"pwd" : "password",
roles: [ { "role" : "readWrite", "db" : "job_requirements" } ]
}
)
db.getSiblingDB("admin").createUser(
{
"user" : "jrAppReadUser",
"pwd" : "password",
roles: [ { "role" : "read", "db" : "job_requirements" } ]
}
)
db.getSiblingDB("admin").createUser(
{
"user" : "restoreUser",
"pwd" : "password",
roles: [ { "role" : "restore", "db" : "admin" } ]
}
)
docker exec -it `docker ps -qf name=mongodb-s1` bash -c 'mongo -u "demoUser" -p "password" --authenticationDatabase "admin"'
export DOCKER_HOST=10.21.100.240
workers: .241, .242, .243
ssh [email protected] ctgisfisf
sudo su
docker run -d \
  -v /run/docker/plugins/:/run/docker/plugins/ \
  -v /volumes/local-persist/:/var/lib/docker/plugin-data/ \
  -v /volumes/mongodb/data/prehire/:/volumes/mongodb/data/prehire/ \
  cwspear/docker-local-persist-volume-plugin
docker volume create --name mongodb-data-prehire -o mountpoint=/volumes/mongodb/data/prehire -d local-persist
docker volume create --name mongodb-data-prehire-s1 -o mountpoint=/volumes/mongodb/data/prehire/m1 -d local-persist
docker volume create --name mongodb-data-prehire-s2 -o mountpoint=/volumes/mongodb/data/prehire/m2 -d local-persist
docker volume create --name mongodb-data-prehire-s3 -o mountpoint=/volumes/mongodb/data/prehire/m3 -d local-persist
OR
docker volume create --driver=vsphere --name=mongodb-data-prehire-s1 -o size=3gb
docker volume create --driver=vsphere --name=mongodb-data-prehire-s2 -o size=3gb
docker volume create --driver=vsphere --name=mongodb-data-prehire-s3 -o size=3gb
docker run -it --rm -v mongodb-data-prehire-s3:/redis-data alpine:3.4 sh
http://www.binarytides.com/linux-tar-command/
tar -cvzf mongod.tar.gz ./config/replicaset/
scp mongod.tar.gz [email protected]:/home/zcebjobs
tar -xvzf mongod.tar.gz -C /tmp/mongod/
docker run --rm -v $(pwd)/config/replicaset/:/incoming \
  -v mongodb-data-prehire:/config alpine:3.4 cp -rp /incoming/* /config
docker run -it --rm -v $(pwd)/config/replicaset:/incoming \
  -v mongodb-data-prehire:/config alpine:3.4 sh
You should run this on the docker server and then copy the files manually
docker run -it --rm -v /tmp/mongod/config/replicaset:/config-files \
  -v mongodb-data-prehire-s1:/config alpine:3.4 sh
cp /config-files/s3/* /config
I'm not sure why this doesn't work:
docker run --rm -v /tmp/mongod/config/replicaset:/config-files -v mongodb-data-prehire:/config alpine:3.4 cp -rp /config-files/* /config
exit
Run a new container to double-check the content was copied successfully:
docker run -it --rm -v mongodb-data-prehire:/config alpine:3.4 sh
scp docker-compose-prod.yml [email protected]:/home/zcebjobs
docker stack deploy --compose-file docker-compose-prod.yml mongodb-prehire
docker stack ps mongodb-prehire
docker stack rm mongodb-prehire
docker stack services mongodb-prehire
docker logs `docker ps -qf name=mongodb-prehire_mongodb-s1`
docker exec -it `docker ps -qf name=mongodb-prehire_mongodb-s1` bash
docker exec -it `docker ps -qf name=mongodb-prehire_mongodb-s1` bash -c 'mongo'
docker run -it --rm --name mongo-cli mongo:3.4 mongo --host 10.21.100.241 --port 27017
db = connect("mongodb-s3:27019")
db.getSiblingDB("admin").auth("adminUser", "password")
db.getSiblingDB("admin").auth("adminCluster", "password")
db.getSiblingDB("admin").auth("jrAppUser", "password")
ssh [email protected]
mongodump --db job_requirements
tar -cvzf jr.tar.gz ./dump
scp jr.tar.gz [email protected]:/home/zcebjobs
sshdockertest
tar -xvzf jr.tar.gz -C ./mongo/
docker run -it --rm -v /home/zcebjobs/mongo:/backup -v mongodb-data-prehire-s1:/config alpine:3.4 sh
cp -r /backup/ /config/backup
exit
docker exec -it `docker ps -qf name=mongodb-prehire_mongodb-s1` bash
mongorestore --host localhost --port 27018 -u restoreUser -p password --authenticationDatabase admin /data/db/backup/dump