First and foremost, it is important not to run out of connections from the Rancher Server to the MySQL database server. The number of threads each Rancher Server uses is driven by its thread pool sizes, so max_connections should follow this basic formula:
Number of nodes x Total threads + buffer for non-rancher tasks
- Number of nodes: the number of Rancher Management server nodes.
- Total threads: the sum of all thread pool maximums, found at http(s)://<server>/admin/processes/pools. Default: 150.
- Buffer: non-Rancher tasks such as monitoring, administration, backups, and other ad hoc work require connections as well.
Rancher's recommended absolute minimum for a single Rancher Server: 175
This setting is configured via the max_connections item in the my.cnf file.
Example:
3 Node HA setup with stock thread pool settings:
3 nodes x 150 threads/node + 25 extra = 475
- Note: if tuning has been done on the thread pools, assume 1 connection per thread when recalculating; the resulting value can then be applied as shown below.
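For example, the value computed above could be applied as follows (a minimal sketch; the mysql client flags and credentials are placeholders for your environment):

```sh
# Persist the computed value in my.cnf so it survives restarts:
#   [mysqld]
#   max_connections = 475

# Or apply it at runtime (takes effect immediately but is lost on restart):
mysql -u root -p -e "SET GLOBAL max_connections = 475;"

# Verify the running value:
mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';"
```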
In practice, setting max_connections to 1000 on a normal Linux host should be straightforward and will remove any immediate constraints. A larger number of connections might require OS tuning and the open file limits (the OS ulimit and MySQL's open_files_limit) to be adjusted. See this guide from MySQL: https://dev.mysql.com/doc/refman/5.7/en/too-many-connections.html
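One way to check whether open file limits are a constraint on a Linux host (a sketch assuming a systemd-managed mysqld; paths and credentials may differ in your environment):

```sh
# Open-file limit the running mysqld actually received from the OS:
grep "open files" /proc/$(pidof mysqld)/limits

# What MySQL negotiated at startup; if this is too low, MySQL may silently
# cap max_connections below the configured value:
mysql -u root -p -e "SHOW VARIABLES LIKE 'open_files_limit';"

# If needed, raise the limit (e.g. LimitNOFILE= in the mysqld systemd unit,
# or /etc/security/limits.conf), set open_files_limit in my.cnf, then restart.
```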
You will want to set innodb_buffer_pool_size to roughly 80% of the host's available memory.
Please note these are just guidelines.
Rancher Servers | Compute Nodes | MySQL CPU | MySQL Memory | MySQL IOPS | Disk Size (GB) |
---|---|---|---|---|---|
1 | <=50 | 1-2 vCPU | 2 GB | 50-200 | 20 |
1 | 50-100 | 2 vCPU | 4 GB | 400-1000 | 50 |
1-3 | 100-200 | 4 vCPU | 8 GB | 1000+ | 100 |
1-3 | >200 | 8 vCPU | 16 GB | 6000+ | 200 |
Note: This assumes that the Rancher Server is tuned to handle these workloads as well; see the Rancher Server Tuning guide. Also, disk sizes and IOPS are coupled in some cloud providers, so it might be necessary to use larger disks to obtain the needed IOPS performance.
When running your own database server, we recommend the following: innodb_buffer_pool_size = 80% of the system memory.
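As a rough sizing helper, the 80% figure can be derived directly from /proc/meminfo (a sketch for a Linux host dedicated to MySQL; adjust the percentage if other services share the machine):

```sh
# Compute ~80% of total system memory in bytes and print the my.cnf line
# to place under the [mysqld] section:
TOTAL_KB=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
echo "innodb_buffer_pool_size = $(( TOTAL_KB * 1024 * 80 / 100 ))"

# After restarting MySQL, confirm the running value:
mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"
```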
In large Rancher environments, or ones with a high degree of churn, consider adjusting the settings in the table below. This helps manage the overall size of the database tables; an example of changing a setting through the API follows the table.
Setting | Recommended | Default | Description | URL |
---|---|---|---|---|
main_tables.purge.after.seconds | 7200 | 172800 (2 days) | Main table retention period | /v2-beta/settings/main_tables.purge.after.seconds |
service_log.purge.after.seconds | 7200 | 86400 (1 day) | Retention period for service log entries | /v2-beta/settings/service_log.purge.after.seconds |
events.purge.after.seconds | 7200 | 43200 (12 hrs) | Retention period for events | /v2-beta/settings/events.purge.after.seconds |
audit_log.purge.after.seconds | | 2592000 (30 days) | Audit log retention period | /v2-beta/settings/audit_log.purge.after.seconds |
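These values can be changed through the Rancher API. The sketch below assumes an admin API key pair and the common Rancher pattern of PUT-ing a JSON body with a value field; RANCHER_URL, ACCESS_KEY, and SECRET_KEY are placeholders, and the exact request schema should be confirmed in your server's API UI.

```sh
# Hypothetical example: shorten the event retention period to 7200 seconds.
RANCHER_URL="https://rancher.example.com"   # placeholder
curl -s -u "${ACCESS_KEY}:${SECRET_KEY}" \
     -X PUT \
     -H 'Content-Type: application/json' \
     -d '{"value": "7200"}' \
     "${RANCHER_URL}/v2-beta/settings/events.purge.after.seconds"
```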
Since workloads vary per installation, these are general guidelines for alleviating performance bottlenecks. If you see the following symptoms under load, these solutions may help.
CPU at 5-10% iowait: migrate to faster disks, for example by adding SSDs or moving to SSDs in a RAID configuration to improve available IOPS. If you are using binary logs, you could also store them on a dedicated disk.
CPU at 5-10% softirq: this can have several causes, but increasing the amount of memory and raising innodb_buffer_pool_size may alleviate the condition. Additional cores can also help.
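A few commands that help confirm which of these conditions you are in (assuming the sysstat package is installed on the database host):

```sh
# Per-device utilization and latency; sustained high %util or await points at disk:
iostat -x 5 3

# CPU breakdown including %iowait and %soft (softirq):
mpstat 5 3

# Buffer pool pressure: a high rate of Innodb_buffer_pool_reads (misses that
# went to disk) versus Innodb_buffer_pool_read_requests suggests the buffer
# pool is too small:
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';"
```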