@calvinh8
Last active October 27, 2024 17:55
MongoDB Setup Guide for AWS EC2 Instances with Auth Enabled

MongoDB AWS EC2 Setup Guide

You can find my latest update on my blog: https://medium.com/@calvin.hsieh/steps-to-install-mongodb-on-aws-ec2-instance-62db66981218

Credits:

Big thanks to Elad Nava and Shane Rainville, whose articles made this guide possible. If I have violated the original posts' copyright in any way, please contact me.

Disclaimer

  • You should at least read Elad's article before the Get Started section
  • You should have some basic knowledge about the nature of AWS and MongoDB before continuing this guide.

Why make another guide if there are already good ones like the two above?

I kept hitting walls for days while trying to configure MongoDB properly with a replica set and authentication enabled. After a lot of Google searching, I finally found a solution. Without authentication and a firewall, a MongoDB server is extremely vulnerable to anyone on the public internet.

The first article clearly teaches you, step by step, how to set up MongoDB on AWS EC2. However, MongoDB has gone through some updates and changes since then, so I included those changes in this guide.

The second article shows you how to properly set up internal authentication between replica set members and enable authentication.

I saw a need to combine them into one guide, and I hope it saves you time if you are about to set up a MongoDB server, especially with authentication enabled!

AWS EC2 Setup

First, prepare the AWS EC2 instances that will run MongoDB, and make sure you have your own domain name.

1. Launch the instances

  • Launch 3 brand new Ubuntu Server 16.04 LTS instances in EC2 console.
  • Pick i3 instances if you need NoSQL-optimized instances; otherwise, use m3.medium or m4.large
  • Make sure each instance is in a different availability zone
  • Create new security group, mongodb-cluster
    • Configure all three instances to use it
    • Allow SSH on port 22 from your IP only
    • Allow port 27017 from the mongodb-cluster security group and your IP
    • So that both your IP and the replica set members have access to each other's mongod process listening on port 27017
  • Label each instance you created as follows (replace example.com with your own domain name):
  • Data - db1.example.com
  • Data - db2.example.com
  • Arbiter - arbiter1.example.com

2. Request 3 Elastic IPs

Attach one of the requested IPs to each instance so that your replica set members keep the same public IP throughout their lifetime.

3. Setup DNS Records

Go to your domain's DNS console and add CNAME records for db1, db2, arbiter1. For each record, enter each instance's Public DNS hostname, visible in the EC2 instances dashboard.

Configure Server

We need to adjust the underlying OS on each server so that it behaves nicely with MongoDB.

1. Set the Hostname

SSH into each server and set its hostname so that when we initialize the replica set, members will be able to understand how to reach one another:

sudo bash -c 'echo db1.example.com > /etc/hostname && hostname -F /etc/hostname'

Make sure to modify db1.example.com and set it to each server's DNS hostname.

2. Increase OS Limits

MongoDB needs to be able to create file descriptors when clients connect and to spawn a large number of processes in order to operate effectively. The default file and process limits shipped with Ubuntu are too low for MongoDB.

Modify them by editing the limits.conf file:

sudo nano /etc/security/limits.conf

Add the following lines to the end of the file:

* soft nofile 64000
* hard nofile 64000
* soft nproc 32000
* hard nproc 32000

Next, create a file called 90-nproc.conf in /etc/security/limits.d/:

sudo nano /etc/security/limits.d/90-nproc.conf

Paste the following lines into the file:

* soft nproc 32000
* hard nproc 32000

3. Disable Transparent Huge Pages

Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages.

However, database workloads often perform poorly with THP, because they tend to have sparse rather than contiguous memory access patterns. You should disable THP to ensure best performance with MongoDB.

Run the following commands to create an init script that will automatically disable THP on system boot:

sudo nano /etc/init.d/disable-transparent-hugepages

Paste the following inside it:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          disable-transparent-hugepages
# Required-Start:    $local_fs
# Required-Stop:
# X-Start-Before:    mongod mongodb-mms-automation-agent
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Disable Linux transparent huge pages
# Description:       Disable Linux transparent huge pages, to improve
#                    database performance.
### END INIT INFO

case $1 in
  start)
    if [ -d /sys/kernel/mm/transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/transparent_hugepage
    elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/redhat_transparent_hugepage
    else
      return 0
    fi

    echo 'never' > ${thp_path}/enabled
    echo 'never' > ${thp_path}/defrag

    unset thp_path
    ;;
esac

Make it executable:

sudo chmod 755 /etc/init.d/disable-transparent-hugepages

Set it to start automatically on boot:

sudo update-rc.d disable-transparent-hugepages defaults

4. Configure the File System

By default, Linux updates a file's last access time (atime) whenever the file is accessed. Because MongoDB reads and writes its data files constantly, these updates create unnecessary overhead and degrade performance. We can disable them by editing the fstab file:

sudo nano /etc/fstab

Add the noatime flag directly after defaults:

LABEL=cloudimg-rootfs   /        ext4   defaults,noatime,discard        0 0

In addition, the default disk read ahead settings on EC2 are not optimized for MongoDB. The number of blocks to read ahead should be adjusted to approximately 32 blocks (or 16 KB) of data. We can achieve this by adding a crontab entry that will execute when the system boots up:

sudo crontab -e

Choose nano by pressing 2 if this is your first time editing the crontab, and then append the following to the end of the file:

@reboot /sbin/blockdev --setra 32 /dev/xvda1
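
As a quick sanity check on the numbers above: blockdev --setra counts in 512-byte sectors, so a value of 32 corresponds to 16 KB. A minimal sketch of that arithmetic follows; the function name readahead_kib is made up for illustration and is not part of blockdev.

```shell
#!/bin/sh
# Convert a blockdev read-ahead value (given in 512-byte sectors) to KiB.
# readahead_kib is a hypothetical helper name used only in this sketch.
readahead_kib() {
  echo $(( $1 * 512 / 1024 ))
}

readahead_kib 32   # prints 16
```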

5. Reboot

Reboot the instance

sudo reboot

6. Repeat Steps 1 - 5

Repeat steps 1 to 5 on all replica set members.

Verify Server Configuration

After rebooting, you can check whether the new hostname is in effect by running:

hostname

Check that the OS limits have been increased by running:

ulimit -u # max number of processes
ulimit -n # max number of open file descriptors

The first command should output 32000, the second 64000.
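
If you would rather script this check than compare the numbers by eye, something like the following sketch could work. The helper name limit_ok is an assumption for illustration; it treats "unlimited" as passing and otherwise compares numerically against the value set in limits.conf.

```shell
#!/bin/sh
# limit_ok ACTUAL EXPECTED: succeed if ACTUAL is "unlimited" or >= EXPECTED.
# Hypothetical helper, not a standard command.
limit_ok() {
  [ "$1" = "unlimited" ] && return 0
  [ "$1" -ge "$2" ]
}

# Compare this shell's open-file limit with the 64000 configured earlier.
if limit_ok "$(ulimit -n)" 64000; then
  echo "nofile limit OK"
else
  echo "nofile limit too low: $(ulimit -n)"
fi
```

The same function can be applied to the process limit by passing it the output of ulimit -u and 32000.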

Check whether the Transparent Huge Pages feature was disabled successfully by issuing the following commands:

cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

For both commands, the correct output resembles:

always madvise [never]
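
The word in square brackets is the active mode. A small sketch for pulling it out automatically is shown below; thp_selected is a hypothetical helper name, and the loop simply skips files that the running kernel does not expose.

```shell
#!/bin/sh
# thp_selected LINE: print the active THP mode, i.e. the word in [brackets].
# Hypothetical helper used only in this sketch.
thp_selected() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z+]*\)].*/\1/p'
}

for f in enabled defrag; do
  path=/sys/kernel/mm/transparent_hugepage/$f
  # Skip quietly on kernels that do not expose THP at this path.
  [ -r "$path" ] || continue
  echo "$f: $(thp_selected "$(cat "$path")")"
done
```

On a correctly configured member, both lines should report never.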

Check that noatime was successfully configured:

cat /proc/mounts | grep noatime

It should print a line similar to:

/dev/xvda1 / ext4 rw,noatime,discard,data=ordered 0 0
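
For a scripted version of this check, the sketch below reads the mount options of the root file system from /proc/mounts and looks for noatime as an exact option name. The names has_opt and root_opts are assumptions made for this example.

```shell
#!/bin/sh
# has_opt OPTIONS NAME: succeed if NAME is one of the comma-separated OPTIONS.
# Hypothetical helper used only in this sketch.
has_opt() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fourth field of the last /proc/mounts entry whose mount point is "/".
root_opts=$(awk '$2 == "/" { o = $4 } END { print o }' /proc/mounts 2>/dev/null) || root_opts=""

if has_opt "$root_opts" noatime; then
  echo "noatime is set on /"
else
  echo "noatime is NOT set on /"
fi
```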

In addition, verify that the disk read-ahead value is correct by running:

sudo blockdev --getra /dev/xvda1

It should print 32.

Verify the configuration for all replica set members.

Install MongoDB

Run the following commands to install the latest stable 3.4.x version of MongoDB:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
echo "deb [ arch=amd64 ] http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list

sudo apt-get update
sudo apt-get install -y mongodb-org

These commands will also auto-start mongod, the MongoDB daemon. Repeat this step on all replica set members.

MongoDB Setup

Create keyFile

The keyFile stores the password shared by all nodes. The password allows the nodes to authenticate to each other so that they can replicate changes between each other. This password should be long and very complex. We’ll use the openssl command to generate one.

openssl rand -base64 741 > keyFile

Create the directory where the key will be stored

sudo mkdir -p /opt/mongodb

Copy the file to the new directory

sudo cp keyFile /opt/mongodb

Set the ownership of the keyfile to mongodb.

sudo chown mongodb:mongodb /opt/mongodb/keyFile

Set the appropriate file permissions.

sudo chmod 0600 /opt/mongodb/keyFile

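Copy the same keyFile to all replica set members, then repeat the ownership and permission steps on each one. Every member must use an identical key.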
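
mongod is strict about the keyFile: the key must be 6 to 1024 characters from the base64 set, and the file must not be accessible to group or others. The 741 random bytes generated above encode to 988 base64 characters, which fits. Below is a rough sketch of a pre-flight check for those constraints; keyfile_ok is a hypothetical helper name, and the check is only an approximation of what mongod itself validates.

```shell
#!/bin/sh
# keyfile_ok FILE: hypothetical helper that approximates the keyFile checks:
# no group/other permissions, 6 to 1024 characters, base64 alphabet only.
keyfile_ok() {
  # Characters 5-10 of the ls mode string are the group and other bits.
  [ "$(ls -ld "$1" | cut -c5-10)" = "------" ] || return 1
  # Count characters, ignoring the line breaks that openssl inserts.
  n=$(( $(tr -d '[:space:]' < "$1" | wc -c) ))
  [ "$n" -ge 6 ] && [ "$n" -le 1024 ] || return 1
  # Fail if any character falls outside the base64 alphabet.
  ! tr -d '[:space:]' < "$1" | grep -q '[^A-Za-z0-9+/=]'
}

# Example (run as a user that can read the file):
#   keyfile_ok /opt/mongodb/keyFile && echo "keyFile looks valid"
```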

Setup mongod.conf

Now it's time to configure MongoDB to operate in replica set mode, as well as allow remote access to the server.

sudo nano /etc/mongod.conf

Find and remove bindIp: 127.0.0.1, or prefix it with a # to comment it out:

# network interfaces
net:
  port: 27017
#  bindIp: 127.0.0.1  # remove or comment out this line

Find the commented out security section and uncomment it. Use the path of the keyFile created earlier:

security:
  keyFile: /opt/mongodb/keyFile

Find the commented out replication section and uncomment it. Add the following below, replacing example-replica-set with a name for your replica set:

replication:
  replSetName: example-replica-set

IMPORTANT: Use the same replSetName for ALL replica set members.
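
One way to double-check that every member uses the same name is to print the value from each member's config file and compare the results. The sketch below assumes an unquoted value with no trailing comment; repl_set_name is a made-up helper name.

```shell
#!/bin/sh
# repl_set_name FILE: print the replSetName value from a mongod.conf-style
# file. Hypothetical helper; commented-out lines are ignored.
repl_set_name() {
  sed -n '/^[[:space:]]*replSetName:/{s/^[^:]*:[[:space:]]*//;s/[[:space:]]*$//;p;}' "$1"
}

# Example, to be run on each member:
#   repl_set_name /etc/mongod.conf
```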

Create mongod.service

sudo nano /etc/systemd/system/mongod.service

Write the following to the file:

[Unit]
Description=High-performance, schema-free document-oriented database
After=network.target

[Service]
User=mongodb
ExecStart=/usr/bin/mongod --quiet --config /etc/mongod.conf

[Install]
WantedBy=multi-user.target

Enable mongod.service

sudo systemctl enable mongod.service

Restart MongoDB to apply our changes.

sudo service mongod restart

Repeat for all replica set members.

Initialize the Replica Set

Be sure you have everything set up properly on all replica set members by this point.

Connect to one of the MongoDB instances (preferably db1) using SSH to initialize the replica set and declare its members. Note that you only have to run these commands on one of the members. MongoDB will synchronize the replica set configuration to all of the other members automatically.

Connect to MongoDB via the following command:

mongo

Initialize the replica set:

rs.initiate()

The command will automatically add the current member as the first member of the replica set.

Create Admin Account

The default MongoDB configuration is wide open, meaning anyone can access the stored databases unless your network has firewall rules in place.

Create an admin user to access the database.

mongo

Select admin database.

use admin

Create admin account.

db.createUser( {
    user: "johndoe",
    pwd: "strongPassword",
    roles: [{ role: "root", db: "admin" }]
});

It's recommended not to use special characters in the password, to prevent issues when logging in.

Adding Replica Members

Add the second data member to the replica set:

rs.add("db2.example.com")

And finally, add the arbiter, making sure to pass in true as the second argument (which denotes that the member is an arbiter and not a data member).

rs.add("arbiter1.example.com", true)

Be sure to replace example.com with your own domain name.

Verify Replica Set Status

Take a look at the replica set status by running:

rs.status()

Inspect the members array. Look for one PRIMARY, one SECONDARY, and one ARBITER member. All members should have a health value of 1. If not, make sure the members can talk to each other on port 27017 by using telnet, for example.

Connect MongoDB with Authentication

Using Command Line

mongo -u johndoe -p --authenticationDatabase admin

Enter password when prompted.

The --authenticationDatabase admin option tells the shell to look up the account in the admin database, where it was created.

Using Connection String

mongodb://johndoe:strongPassword@db1.example.com,db2.example.com/dbName?authSource=admin&replicaSet=example-replica-set

Refer to this post for more info on connection string format.

Don't forget to change:

  • user and password to your own
  • example.com to your own domain
  • dbName to your own database name
  • example-replica-set to your own replica set name
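
To avoid typos when swapping in your own values, the string can be assembled from its parts. Note that only the first option follows a ?, and further options are joined with &. This is a minimal sketch: mongo_uri is a hypothetical helper, and it inserts the password as-is, so it assumes a password without characters that would need percent-encoding.

```shell
#!/bin/sh
# mongo_uri USER PASSWORD HOSTS DB REPLSET: assemble a replica set URI.
# Hypothetical helper; no percent-encoding is applied to any argument.
mongo_uri() {
  printf 'mongodb://%s:%s@%s/%s?authSource=admin&replicaSet=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

mongo_uri johndoe strongPassword db1.example.com,db2.example.com dbName example-replica-set
```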

Automated Backup to AWS S3

Credit:

The first script needs extra disk space to hold temporary backup files, while the second script backs up directly to S3 without using extra space. I merged the two and added authentication so that you can back up and restore the database easily.

Check out the backup script at the end of this guide to see how to automate the backups.

Further Configuration

To learn how to set up log rotation, perform maintenance, and more, please visit Elad's post.

Hooray!

You have just deployed a highly available MongoDB replica set on AWS on your own!

#!/bin/sh
# Make sure to:
# 1) Name this file `backup.sh` and place it in /home/ubuntu
# 2) Run sudo apt-get install awscli to install the AWSCLI
# 3) Run aws configure (enter s3-authorized IAM user and specify region)
# 4) Fill in DB host + name
# 5) Create S3 bucket for the backups and fill it in below (set a lifecycle rule to expire files older than X days in the bucket)
# 6) Run chmod +x backup.sh
# 7) Test it out via ./backup.sh
# 8) Set up a daily backup at midnight via `crontab -e`:
# 0 0 * * * /home/ubuntu/backup.sh > /home/ubuntu/backup.log
# DB host (secondary preferred as to avoid impacting primary performance)
HOST=db2.example.com
# DB name
DBNAME=dbName
# S3 bucket name
BUCKET=mongodb/backup
# Current time
TIME=`/bin/date +%m-%d-%Y-%T`
# Username
USERNAME=johndoe
# Password
PASSWORD=strongPassword
# Log
echo "Backing up $HOST/$DBNAME to s3://$BUCKET/ on $TIME";
S3PATH="s3://$BUCKET/"
S3BACKUP=$S3PATH$TIME.gz
S3LATEST=$S3PATH"latest".gz
# Make S3 bucket
/usr/bin/aws s3 mb $S3PATH
# Dump from MongoDB data to S3
/usr/bin/mongodump -h $HOST -d $DBNAME -p $PASSWORD -u $USERNAME --authenticationDatabase "admin" --gzip --archive | aws s3 cp - $S3BACKUP
# Copy the new backup to latest
/usr/bin/aws s3 cp $S3BACKUP $S3LATEST
# All done
echo "Backup available at https://s3.amazonaws.com/$BUCKET/$TIME.gz"
# To restore the database
# aws s3 cp s3://mongodb/backup/latest.gz - | mongorestore -h db1.example.com -d dbName --archive --gzip -u johndoe -p strongPassword --authenticationDatabase admin
@alok102singh

alok102singh commented Jan 2, 2019

@daniyel I am also facing the same problem. Did you find any solution for this?

@nehuru

nehuru commented Jan 24, 2019

Yes, I was able to set up everything properly.

@nehuru

nehuru commented Jan 24, 2019

Nice Article

@jafuentest

This is pure gold

@peter-snr

Used this article to set up my Mongo environment, although I use an AWS private hosted zone and mongo db v4, all works fine. I also set bindIp to the DNS IP address for the server. Otherwise it would initialise to localhost and then cause issues when trying to add the other nodes.

Thank you for this article, very nice.

@parikshittyagi

Thanks a ton for sharing this...., have you deployed such cluster with shard partition?

@calvinh8
Author

Thanks a ton for sharing this...., have you deployed such cluster with shard partition?

I have not deployed cluster with sharding yet; that'll just give me more headaches and time required to maintain on my own if I do that IMO...

@mebibou

mebibou commented Jan 23, 2021

I had to set

net:
  port: 27017
  bindIp: 0.0.0.0

And in /etc/hosts

127.0.0.1 localhost <domain name>

For the configuration to work, otherwise the different hosts could not see each other

@slejnej

slejnej commented Dec 2, 2021

Has anyone been able to create a cluster with private subnets only and without public IPs?
Example:

  • each instance is available only through bastion via 22
  • each instance is available via 27017 through AWS private IP and bastion
  • each WEB instance created via ELB is private and is added to the 27017 exceptions for access
