You can find my latest update on my blog: https://medium.com/@calvin.hsieh/steps-to-install-mongodb-on-aws-ec2-instance-62db66981218
Credits:
- https://eladnava.com/deploy-a-highly-available-mongodb-replica-set-on-aws/
- http://www.serverlab.ca/tutorials/linux/database-servers/how-to-create-mongodb-replication-clusters/
Big thanks to Elad Nava and Shane Rainville for writing the above articles, which made this guide possible. If I have violated the original posts' copyright in any way, please contact me.
Disclaimer
- You should at least read Elad's article before the Get Started section.
- You should have some basic knowledge about the nature of AWS and MongoDB before continuing this guide.
Why make another guide if there are already good ones like the two above?
I found myself hitting walls over and over again for days while trying to properly configure MongoDB with a replica set and auth enabled. After lots of Google searches, I finally found a solution. Without authentication and a firewall, MongoDB is extremely vulnerable to the public.
The first article teaches you clearly, step by step, how to set up MongoDB on AWS EC2. However, MongoDB has gone through some updates and changes since then, so I included those changes in this guide.
The second article shows you how to properly set up internal authentication between replica set members and enable authentication.
I saw a need to combine them into one guide, so I hope I can save you some time if you are about to set up a MongoDB server, especially if you want authentication enabled!
First, prepare the AWS EC2 instances for running MongoDB, and make sure you have your own domain name.
- Launch 3 brand new Ubuntu Server 16.04 LTS instances in the EC2 console.
- Pick i3 instances if you need NoSQL-optimized instances; otherwise, m3.medium or m4.large.
- Make sure each instance is in a different availability zone.
- Create a new security group, mongodb-cluster.
- Configure all three instances to use it.
- Allow SSH on port 22 from your IP only.
- Allow port 27017 from the mongodb-cluster security group and your IP, so that both your IP and the replica set members have access to each other's mongod process listening on port 27017.
- Label each instance you created as follows (replace example.com with your own domain name):
- Data - db1.example.com
- Data - db2.example.com
- Arbiter - arbiter1.example.com
Attach the requested Elastic IPs to each instance, so that your replica set members keep the same public IP throughout their lifetime.
Go to your domain's DNS console and add CNAME records for db1, db2, and arbiter1. For each record, enter the corresponding instance's Public DNS hostname, visible in the EC2 instances dashboard.
We need to make some modifications to each server's underlying OS so that it behaves nicely with MongoDB.
SSH into each server and set its hostname so that when we initialize the replica set, members will be able to understand how to reach one another:
sudo bash -c 'echo db1.example.com > /etc/hostname && hostname -F /etc/hostname'
Make sure to replace db1.example.com with each server's own DNS hostname.
MongoDB needs to be able to create file descriptors when clients connect and spawn a large number of processes in order to operate effectively. The default file and process limits shipped with Ubuntu are not suited to MongoDB.
Modify them by editing the limits.conf file:
sudo nano /etc/security/limits.conf
Add the following lines to the end of the file:
* soft nofile 64000
* hard nofile 64000
* soft nproc 32000
* hard nproc 32000
Next, create a file called 90-nproc.conf in /etc/security/limits.d/:
sudo nano /etc/security/limits.d/90-nproc.conf
Paste the following lines into the file:
* soft nproc 32000
* hard nproc 32000
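If you would rather script these edits than make them by hand in nano, here is a minimal sketch. The helpers append_once and write_limits are my own hypothetical names, not part of any tool. They only add a line when it is missing, so running them twice is harmless. Try it on a scratch file first.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
# Hypothetical helper for illustration only.
append_once() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# write_limits FILE: add the four limits used in this guide to FILE.
write_limits() {
    append_once "$1" '* soft nofile 64000'
    append_once "$1" '* hard nofile 64000'
    append_once "$1" '* soft nproc 32000'
    append_once "$1" '* hard nproc 32000'
}
```

For example, run write_limits ./limits.test and inspect the result. Applying it to /etc/security/limits.conf requires root, so you would need to run the functions from a root shell.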
Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages.
However, database workloads often perform poorly with THP, because they tend to have sparse rather than contiguous memory access patterns. You should disable THP to ensure best performance with MongoDB.
Run the following commands to create an init script that will automatically disable THP on system boot:
sudo nano /etc/init.d/disable-transparent-hugepages
Paste the following inside it:
#!/bin/sh
### BEGIN INIT INFO
# Provides: disable-transparent-hugepages
# Required-Start: $local_fs
# Required-Stop:
# X-Start-Before: mongod mongodb-mms-automation-agent
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Disable Linux transparent huge pages
# Description: Disable Linux transparent huge pages, to improve
# database performance.
### END INIT INFO
case $1 in
  start)
    if [ -d /sys/kernel/mm/transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/transparent_hugepage
    elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/redhat_transparent_hugepage
    else
      return 0
    fi
    echo 'never' > ${thp_path}/enabled
    echo 'never' > ${thp_path}/defrag
    unset thp_path
    ;;
esac
Make it executable:
sudo chmod 755 /etc/init.d/disable-transparent-hugepages
Set it to start automatically on boot:
sudo update-rc.d disable-transparent-hugepages defaults
By default, Linux updates a file's last access time when the file is accessed. When MongoDB performs frequent reads and writes on the filesystem, this creates unnecessary overhead and degrades performance. We can disable this feature by editing the fstab file:
sudo nano /etc/fstab
Add the noatime flag directly after defaults:
LABEL=cloudimg-rootfs / ext4 defaults,noatime,discard 0 0
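The same edit can be previewed with a script before touching the real file. This is only a sketch: add_noatime is a hypothetical helper of mine, and it assumes the root entry's options start with defaults, as in the line above. It prints the modified file to standard output instead of editing in place.

```shell
# add_noatime FILE: print FILE with noatime added to the root filesystem entry.
# Only touches the line whose mount point (2nd field) is "/" and whose
# options (4th field) start with "defaults" and do not mention noatime yet.
# Hypothetical helper for illustration only.
add_noatime() {
    awk '
        $1 !~ /^#/ && $2 == "/" && $4 !~ /noatime/ {
            sub(/^defaults/, "defaults,noatime", $4)
        }
        { print }
    ' "$1"
}
```

A cautious way to use it is add_noatime /etc/fstab > /tmp/fstab.new, then diff the two files before copying the new one into place. Note that awk rewrites the changed line with single spaces between fields, which fstab accepts.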
In addition, the default disk read ahead settings on EC2 are not optimized for MongoDB. The number of blocks to read ahead should be adjusted to approximately 32 blocks (or 16 KB) of data. We can achieve this by adding a crontab entry that will execute when the system boots up:
sudo crontab -e
Choose nano by pressing 2 if this is your first time editing the crontab, and then append the following to the end of the file:
@reboot /sbin/blockdev --setra 32 /dev/xvda1
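In case the numbers look odd: blockdev --setra takes its value in 512-byte sectors, so 32 corresponds to 16 KB. A tiny sketch of the arithmetic, where ra_sectors_to_kib is just a hypothetical helper name:

```shell
# ra_sectors_to_kib SECTORS: convert a blockdev read-ahead value
# (counted in 512-byte sectors) to KiB.
ra_sectors_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

ra_sectors_to_kib 32     # prints 16
ra_sectors_to_kib 256    # prints 128, shown only for comparison
```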
Reboot the instance:
sudo reboot
Repeat the steps above for all replica set members.
After rebooting, you can check whether the new hostname is in effect by running:
hostname
Check that the OS limits have been increased by running:
ulimit -u # max number of processes
ulimit -n # max number of open file descriptors
The first command should output 32000, the second 64000.
Check whether the Transparent Huge Pages feature was disabled successfully by issuing the following commands:
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
For both commands, the correct output resembles:
always madvise [never]
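The value in square brackets is the active one. If you want to check it from a script rather than by eye, here is a small sketch. thp_active is a hypothetical helper name, and it simply extracts whatever sits between the brackets.

```shell
# thp_active: read a THP sysfs line on stdin and print the bracketed (active) value.
# Hypothetical helper for illustration only.
thp_active() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

echo 'always madvise [never]' | thp_active    # prints never
```

On a server you would feed it the real file, for example thp_active < /sys/kernel/mm/transparent_hugepage/enabled, and expect never.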
Check that noatime was successfully configured:
cat /proc/mounts | grep noatime
It should print a line similar to:
/dev/xvda1 / ext4 rw,noatime,discard,data=ordered 0 0
In addition, verify that the disk read-ahead value is correct by running:
sudo blockdev --getra /dev/xvda1
It should print 32.
Verify the configuration for all replica set members.
Run the following commands to install the latest stable 3.4.x version of MongoDB:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
echo "deb [ arch=amd64 ] http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list
sudo apt-get update
sudo apt-get install -y mongodb-org
These commands will also auto-start mongod, the MongoDB daemon. Repeat this step on all replica set members.
The keyFile stores the password used by each node. The password allows the nodes to authenticate to each other so that they can replicate changes between each other. This password should be long and very complex. We’ll use the openssl command to ensure our password is complex.
openssl rand -base64 741 > keyFile
Create the directory where the key will be stored
sudo mkdir -p /opt/mongodb
Copy the file to the new directory
sudo cp keyFile /opt/mongodb
Set the ownership of the keyfile to mongodb.
sudo chown mongodb:mongodb /opt/mongodb/keyFile
Set the appropriate file permissions.
sudo chmod 0600 /opt/mongodb/keyFile
Copy the same keyFile to all replica set members. Every member must use an identical key, so do not generate a new one on each server.
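Why 741 bytes? As far as I know, MongoDB accepts keys between 6 and 1024 base64 characters, and 741 random bytes encode to 988 characters, which fits under that limit. Here is a minimal sketch for sanity-checking a copied key file. keyfile_chars and keyfile_ok are hypothetical helper names, and the check only looks at length, not at ownership or permissions.

```shell
# keyfile_chars FILE: count the key characters, ignoring newlines and other whitespace.
keyfile_chars() {
    tr -d '[:space:]' < "$1" | wc -c | tr -d ' '
}

# keyfile_ok FILE: succeed if the key length is within 6..1024 characters
# (the range assumed in the paragraph above).
keyfile_ok() {
    n=$(keyfile_chars "$1")
    [ "$n" -ge 6 ] && [ "$n" -le 1024 ]
}

# 741 random bytes encode to ceil(741 / 3) * 4 base64 characters:
echo $(( (741 + 2) / 3 * 4 ))    # prints 988
```

Running keyfile_chars /opt/mongodb/keyFile on every member and comparing the numbers is a quick first check that the copy arrived intact. Comparing checksums would be a stronger one.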
Now it's time to configure MongoDB to operate in replica set mode, as well as allow remote access to the server.
sudo nano /etc/mongod.conf
Find and remove bindIp: 127.0.0.1, or prefix it with a # to comment it out:
# network interfaces
net:
  port: 27017
  # bindIp: 127.0.0.1 # remove or comment out this line
Find the commented-out security section and uncomment it. Use the path of the keyFile created earlier:
security:
  keyFile: /opt/mongodb/keyFile
Find the commented-out replication section and uncomment it. Add the following below, replacing example-replica-set with a name for your replica set:
replication:
  replSetName: example-replica-set
IMPORTANT: use the same replSetName for ALL replica set members.
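Since a typo in this file is easy to miss, here is a rough sketch for checking the three changes on each member. check_mongod_conf is a hypothetical helper of mine, and it uses grep patterns rather than a real YAML parser, so treat a passing result as a hint and not as proof.

```shell
# check_mongod_conf FILE: succeed only if FILE has an indented keyFile entry,
# an indented replSetName entry, and no active bindIp limited to 127.0.0.1.
# Hypothetical helper for illustration only; grep-based, not a YAML parser.
check_mongod_conf() {
    conf=$1
    grep -Eq '^[[:space:]]+keyFile:[[:space:]]*/' "$conf" \
        || { echo "missing security.keyFile"; return 1; }
    grep -Eq '^[[:space:]]+replSetName:[[:space:]]*[^[:space:]]' "$conf" \
        || { echo "missing replication.replSetName"; return 1; }
    if grep -Eq '^[[:space:]]*bindIp:[[:space:]]*127\.0\.0\.1' "$conf"; then
        echo "bindIp still limits mongod to localhost"
        return 1
    fi
    echo "ok"
}
```

On a server, check_mongod_conf /etc/mongod.conf should print ok once all three edits are in place.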
Create mongod.service
sudo nano /etc/systemd/system/mongod.service
Write the following to the file:
[Unit]
Description=High-performance, schema-free document-oriented database
After=network.target
[Service]
User=mongodb
ExecStart=/usr/bin/mongod --quiet --config /etc/mongod.conf
[Install]
WantedBy=multi-user.target
Enable mongod.service
sudo systemctl enable mongod.service
Restart MongoDB to apply our changes.
sudo service mongod restart
Repeat for all replica set members.
Be sure you have everything set up properly on all replica set members by this point.
Connect to one of the MongoDB instances (preferably db1) using SSH to initialize the replica set and declare its members. Note that you only have to run these commands on one of the members. MongoDB will synchronize the replica set configuration to all of the other members automatically.
Connect to MongoDB via the following command:
mongo
Initialize the replica set:
rs.initiate()
The command will automatically add the current member as the first member of the replica set.
The default MongoDB configuration is wide open, meaning anyone can access the stored databases unless your network has firewall rules in place.
Create an admin user to access the database.
mongo
Select the admin database:
use admin
Create the admin account:
db.createUser( {
user: "johndoe",
pwd: "strongPassword",
roles: [{ role: "root", db: "admin" }]
});
It's recommended not to use special characters in the password, to prevent issues when logging in.
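Add the second data member to the replica set (if the shell reports that the command is not authorized, authenticate first by running db.auth("johndoe", "strongPassword") in the admin database):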
rs.add("db2.example.com")
And finally, add the arbiter, making sure to pass in true as the second argument (which denotes that the member is an arbiter and not a data member).
rs.add("arbiter1.example.com", true)
Be sure to replace example.com with your own domain name.
Take a look at the replica set status by running:
rs.status()
Inspect the members array. Look for one PRIMARY, one SECONDARY, and one ARBITER member. All members should have a health value of 1. If not, make sure the members can talk to each other on port 27017 by using telnet, for example.
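If telnet is not installed, bash can do a similar reachability check on its own. This is a sketch under a few assumptions: it relies on bash's /dev/tcp feature and the coreutils timeout command, the 3-second limit is arbitrary, and port_open and check_members are hypothetical helper names.

```shell
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_members PORT HOST...: report reachability of PORT on each HOST.
check_members() {
    port=$1
    shift
    for host in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host: reachable"
        else
            echo "$host: NOT reachable"
        fi
    done
}
```

From any member, check_members 27017 db1.example.com db2.example.com arbiter1.example.com should report all three as reachable. If one is not, the security group rules are the first thing I would look at.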
To log in with the admin account, pass --authenticationDatabase admin so that MongoDB looks up the account in the admin database:
mongo -u johndoe -p --authenticationDatabase admin
Enter the password when prompted.
mongodb://johndoe:strongPassword@db1.example.com,db2.example.com/dbName?authSource=admin&replicaSet=example-replica-set
Refer to this post for more info on connection string format.
Don't forget to change:
- user and password to your own
- example.com to your own domain
- dbName to your own database name
- example-replica-set to your own replica set name
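To avoid typos when swapping in your own values, the string can be assembled from its parts. A minimal sketch, where mongo_uri is a hypothetical helper name. It inserts the user and password as given, so any reserved characters in them would have to be percent-encoded beforehand. Note that options after the first ? are joined with &.

```shell
# mongo_uri USER PASSWORD HOSTS DB REPLSET: print a replica set connection string.
# HOSTS is a comma-separated list of data members.
# Hypothetical helper for illustration only; it does no percent-encoding.
mongo_uri() {
    printf 'mongodb://%s:%s@%s/%s?authSource=admin&replicaSet=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

mongo_uri johndoe strongPassword db1.example.com,db2.example.com dbName example-replica-set
```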
Credit:
- https://gist.github.com/eladnava/96bd9771cd2e01fb4427230563991c8d
- https://gist.github.com/caraboides/7679bb73f4f13e36fc2b9dbded3c24c0
The first script requires extra disk space to hold temporary backup files, while the second script backs up directly to S3 without using extra space. So I merged the two, along with authentication, to make it easy to back up and restore the database.
Check out this script to see how to automate the backups.
For how to set up log rotation, perform maintenance, and more, please visit Elad's post.
You have now deployed a highly available MongoDB server on AWS on your own! Hooray!
I have not deployed a cluster with sharding yet; IMO that would just give me more headaches and take more time to maintain on my own...