@magnetikonline
Last active October 24, 2024 16:31
Collection of /etc/sysctl.conf networking notes.

Collection of sysctl networking notes

Listing settings & reloading changes

$ sysctl --all
$ sysctl --load
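
Each sysctl key is also exposed as a file under /proc/sys, with the dots in the key replaced by slashes. The helper names below (sysctl_path, sysctl_read) are hypothetical and only a minimal sketch for checking a single live value; they assume the key contains no interface name with a dot in it.

```shell
# Map a sysctl key to its /proc/sys path (dots become slashes).
# Assumption: the key holds no interface name that itself contains a dot.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print the live value of a key, or a note when this kernel does not expose it.
sysctl_read() {
  path="$(sysctl_path "$1")"
  if [ -r "$path" ]; then
    cat "$path"
  else
    echo "unavailable: $1"
  fi
}
```

For example, sysctl_read net.core.somaxconn prints the value currently in effect, which is handy for confirming that a reload picked up an edit.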

Summary count of connections per TCP state

$ netstat --numeric --tcp | tail --lines +3 | \
  awk "{n[\$6]++} END { for(k in n) { print k, n[k]; }}"
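
The same per-state count can be wrapped in a function that reads netstat output from stdin, which makes it easy to run against saved captures. The name count_tcp_states is hypothetical; this is a sketch that assumes the usual two header lines and the state in column six.

```shell
# Count TCP connections per state from `netstat --numeric --tcp` output on stdin.
# Skips the two header lines and only counts rows whose protocol starts with "tcp".
count_tcp_states() {
  awk 'NR > 2 && $1 ~ /^tcp/ { n[$6]++ } END { for (k in n) print k, n[k] }' | sort
}
```

Usage: netstat --numeric --tcp | count_tcp_states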

Settings tweaking in /etc/sysctl.conf

# maximum per-socket receive/send buffer sizes, in bytes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# per-socket receive/send buffers for TCP [min default max]
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
#net.ipv4.tcp_rmem = 4096 87380 16777216
#net.ipv4.tcp_wmem = 4096 65536 16777216
#net.ipv4.tcp_rmem = 4096 16060 262144
#net.ipv4.tcp_wmem = 4096 16384 262144

# port range used by TCP and UDP to choose the local port
# default: net.ipv4.ip_local_port_range = 32768 60999
net.ipv4.ip_local_port_range = 1024 61000

# various timewait socket setting tweaks
# note: net.ipv4.tcp_tw_recycle was removed in Linux 4.12 - leave it disabled
net.ipv4.tcp_tw_reuse = 1
#net.ipv4.tcp_tw_recycle = 1
#net.ipv4.tcp_max_tw_buckets = 400000
#net.ipv4.tcp_max_orphans = 60000

# time an orphaned connection may remain in the FIN_WAIT_2 state before the kernel aborts it
# default: net.ipv4.tcp_fin_timeout = 60
#net.ipv4.tcp_fin_timeout = 30

# note: net.ipv4.tcp_syncookies enabled by default with Ubuntu 12.04LTS+
net.ipv4.tcp_syncookies = 1

# remembered connection requests, without an ACK
# note: will increase automatically in proportion to available memory
# default: net.ipv4.tcp_max_syn_backlog = 128
net.ipv4.tcp_max_syn_backlog = 4096
#net.ipv4.tcp_max_syn_backlog = 8096

# upper limit allowed for a listen() backlog
# maximum established sockets (with an ACK) waiting to be accepted by listening process
# default: net.core.somaxconn = 128 (4096 since Linux 5.4)
# note: when a key is repeated the last assignment wins, so keep only one line active
net.core.somaxconn = 1024
#net.core.somaxconn = 4096
#net.core.somaxconn = 8192

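# give kernel more memory for TCP [min pressure max], measured in pages
#net.ipv4.tcp_mem = 50576 64768 98152

# maximum packets queued on the input side when an interface receives faster than the kernel can process
#net.core.netdev_max_backlog = 2500
#net.core.netdev_max_backlog = 5000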

# note: only need to tweak if connection tracking is used - e.g. stateful iptables rules
# note: newer kernels name this key net.netfilter.nf_conntrack_max
# default: net.ipv4.netfilter.ip_conntrack_max = 65536
#net.ipv4.netfilter.ip_conntrack_max = 1048576
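
A file of this kind tends to collect active and commented-out alternatives over time. Settings are applied in file order, so a later assignment of a key overrides an earlier one. The function below is a rough sketch for spotting keys that are active more than once; sysctl_duplicate_keys is a hypothetical name, and it assumes one key = value pair per line, with comment lines starting with # or ;.

```shell
# List keys assigned more than once in a sysctl.conf-style file read from stdin.
# Output: "<key> <number of active assignments>", one per line.
sysctl_duplicate_keys() {
  awk -F= '
    /^[ \t]*[#;]/ { next }            # skip comment lines
    /^[ \t]*$/    { next }            # skip blank lines
    {
      key = $1
      gsub(/[ \t]/, "", key)          # strip whitespace around the key
      count[key]++
    }
    END { for (k in count) if (count[k] > 1) print k, count[k] }
  ' | sort
}
```

Usage: sysctl_duplicate_keys < /etc/sysctl.conf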

Detailed setting notes

  • net.ipv4.tcp_tw_reuse = 1:

    Allows TIME_WAIT sockets to be reused for new connections when it is safe from a protocol viewpoint. In detail, Linux will reuse an existing connection in the TIME_WAIT state only for a new outgoing connection, and can do so after just one second. It relies on TCP timestamps (net.ipv4.tcp_timestamps) being enabled. Because reuse applies only to outgoing connections, not incoming ones, the practical benefit for a server may be fairly limited.

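  • net.ipv4.tcp_tw_recycle = 1:

    Removed in Linux 4.12. On earlier kernels it enabled fast recycling of TIME_WAIT sockets based on TCP timestamps, which is known to drop connections from clients behind NAT, so it is best left disabled.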

  • net.ipv4.tcp_max_tw_buckets:

    Maximum number of TIME_WAIT sockets held by the system simultaneously. If this number is exceeded, the TIME_WAIT socket is immediately destroyed and a warning is printed. This limit exists only to prevent simple DoS attacks. Do not lower it artificially; rather increase it (probably after increasing installed memory) if network conditions require more than the default value.

  • net.ipv4.tcp_max_orphans:

    Maximum number of TCP sockets not attached to any user file handle, held by the system. If this number is exceeded, orphaned connections are reset immediately and a warning is printed. This limit exists only to prevent simple DoS attacks. Do not rely on it or lower it artificially; rather increase it (probably after increasing installed memory) if network conditions require more than the default value, and tune network services that linger so they kill such states more aggressively. Note: each orphan eats up to 64K of unswappable memory.

  • net.ipv4.tcp_fin_timeout:

    Time an orphaned connection (no longer referenced by any application) may remain in the FIN_WAIT_2 state before it is aborted at the local end. This governs FIN_WAIT_2, not TIME_WAIT. Reducing the value lets TCP/IP release such connections faster, making more resources available for new connections. Can cause issues when set below 25-30 seconds.

  • net.ipv4.tcp_max_syn_backlog:

    Maximum number of remembered connection requests that have not yet received an acknowledgment from the connecting client. The minimum value is 128 for low-memory machines, and it increases in proportion to the machine's memory. If the server suffers from overload, try increasing this number.

  • net.core.somaxconn:

    • An upper limit for the value of the backlog parameter passed to the listen(2) function. If the backlog argument is greater than the value of /proc/sys/net/core/somaxconn, then it is silently truncated to this limit.

    • Note: as per the listen(2) man page, with Linux 2.2 the meaning of backlog changed:

      • It now specifies the queue length for completely established sockets waiting to be accepted.
      • The maximum length of incomplete connection requests is set via net.ipv4.tcp_max_syn_backlog.
    • Details: https://derrickpetzold.com/p/somaxconn/.

    • Raising this value may not be wise.

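    • View the current number of half-open connections (SYN_RECV state). Note that these sit in the incomplete queue sized by net.ipv4.tcp_max_syn_backlog, not in the accept queue capped by net.core.somaxconn: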

       $ netstat --all --numeric --tcp | grep --count "SYN_RECV"
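
To look at the accept queue that net.core.somaxconn caps (fully established sockets waiting for accept()), the listening-socket view from ss can be used: for sockets in the LISTEN state, Recv-Q is the current queue length and Send-Q is the queue limit. The function below is a sketch with a hypothetical name, listen_queue_usage, and assumes the default column order of ss --listening --tcp --numeric.

```shell
# Print "<local address> <current>/<limit>" for each listening TCP socket.
# Reads `ss --listening --tcp --numeric` output from stdin.
# Assumed columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
listen_queue_usage() {
  awk '$1 == "LISTEN" { printf "%s %d/%d\n", $4, $2, $3 }'
}
```

Usage: ss --listening --tcp --numeric | listen_queue_usage

A listener whose current value sits at or near its limit is a candidate for a larger listen() backlog, which in turn is bounded by net.core.somaxconn.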

Control TIME_WAIT and connection tracking timeouts

  • Refer to: http://www.engineyard.com/blog/2012/linux-scalability/, this is good stuff.

  • Tweak ulimit values for open file handles (ulimit -a lists the current limits, ulimit -n sets the open files limit).

  • nf_conntrack_tcp_timeout_time_wait:

    By default, a connection is supposed to stay in the TIME_WAIT state for twice the MSL. Its purpose is to make sure any lost packets that arrive after a connection is closed do not confuse the TCP subsystem. The default maximum segment lifetime (MSL) is 60 seconds, which puts the default TIME_WAIT timeout value at 2 minutes. This means you’ll run out of available ports if you receive more than about 400 requests a second.

  • nf_conntrack_tcp_timeout_established:

    The established connection timeout. Technically this should only apply to connections that are in the ESTABLISHED state and a connection should get out of this state when a FIN packet goes through in either direction - but it seems this does not always happen. So how long do connections stay in this table then? It turns out that the default value for nf_conntrack_tcp_timeout_established is 432000 seconds (around 5 days).

net.netfilter.nf_conntrack_tcp_timeout_time_wait = 15
net.netfilter.nf_conntrack_tcp_timeout_established = 300
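
The port exhaustion estimate in the TIME_WAIT note comes down to dividing the number of usable local ports by the time each one stays tied up. The sketch below makes that arithmetic explicit. max_conn_rate is a hypothetical helper, and the result is only a rough ceiling for connections to a single destination, since it ignores everything else that affects port reuse.

```shell
# Rough ceiling on new connections per second before the local port range runs out.
# Arguments: first port, last port, seconds each port stays unavailable.
max_conn_rate() {
  echo $(( ($2 - $1 + 1) / $3 ))
}
```

With the default range 32768 to 60999 and a 120 second wait, max_conn_rate 32768 60999 120 prints 235. With a 60 second wait it prints 470. Figures quoted elsewhere differ depending on the port range and timeout assumed.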

View maximum/current in-use netfilter connection tracking counts (older kernels expose these as net.ipv4.netfilter.ip_conntrack_max and /proc/net/ip_conntrack):

$ sysctl net.netfilter.nf_conntrack_max
$ sysctl net.netfilter.nf_conntrack_count
$ cat /proc/net/nf_conntrack && wc --lines /proc/net/nf_conntrack
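
The count and maximum are easier to judge as a percentage. The following is a minimal sketch with hypothetical function names; it assumes the nf_conntrack module is loaded and exposes its counters under /proc/sys/net/netfilter.

```shell
# Percentage of the connection tracking table in use (integer division).
# Arguments: current entry count, maximum entries.
conntrack_usage_percent() {
  echo $(( $1 * 100 / $2 ))
}

# Read the live counters when available and report the usage percentage.
conntrack_usage_now() {
  dir=/proc/sys/net/netfilter
  if [ -r "$dir/nf_conntrack_count" ] && [ -r "$dir/nf_conntrack_max" ]; then
    conntrack_usage_percent "$(cat "$dir/nf_conntrack_count")" "$(cat "$dir/nf_conntrack_max")"
  else
    echo "conntrack counters not available"
  fi
}
```

A table that regularly runs close to full suggests either raising the maximum or shortening the conntrack timeouts.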

Miscellaneous

Nginx Plus additions

Part of the following Amazon Web Services AMI: https://aws.amazon.com/marketplace/pp/B00UU272MM

# AMI-ID: ami-5d56a83f // nginx-plus-ami-amazon-linux-hvm-v1.2-20180118.x86_64
net.ipv4.ip_local_port_range = 1024 64999
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_tw_reuse = 1
net.core.netdev_max_backlog = 30000
net.core.somaxconn = 32768
net.ipv4.tcp_max_orphans = 32768
# AMI-ID: ami-aefed3cd // nginx-plus-ami-amazon-linux-hvm-v1.1-20160426.x86_64
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_tw_reuse = 1
net.core.netdev_max_backlog = 30000
net.core.somaxconn = 32768
net.ipv4.tcp_max_orphans = 32768

@smtibaa
smtibaa commented May 18, 2024

Thanks Peter for your insights! Very important subject.
Do you have experience with specific sysctl networking params to support high load through haproxy (deployed on EKS)?
