- List & reloading changes
- Summary details of each connection status
- Settings tweaking in /etc/sysctl.conf
- Miscellaneous
- Further reading
$ sysctl --all
$ sysctl --load
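Individual keys can also be read or changed without listing or reloading everything. A minimal sketch, assuming the procps `sysctl` utility and the usual mapping of dotted keys onto `/proc/sys`. The helper name `sysctl_key_to_path` is hypothetical, and the plain dot-to-slash mapping is an assumption that breaks for keys containing interface names with dots:

```shell
# Map a dotted sysctl key to its /proc/sys path (hypothetical helper).
# Assumption: no path component of the key contains a literal dot.
sysctl_key_to_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Usage (commented out, since writing requires root):
#   sysctl --values net.core.somaxconn               # print only the value
#   cat "$(sysctl_key_to_path net.core.somaxconn)"   # same value read via /proc
#   sudo sysctl --write net.core.somaxconn=1024      # runtime change, lost on reboot
#   sudo sysctl --load=/etc/sysctl.conf              # reload from a specific file
```

Values set with `--write` do not survive a reboot, which is why the persistent settings live in /etc/sysctl.conf.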
$ netstat --numeric --tcp | tail --lines +3 | \
awk "{n[\$6]++} END { for(k in n) { print k, n[k]; }}"
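The same per-state count can be wrapped in a small function, which also makes it easy to try against captured output. This is only a sketch: `count_tcp_states` is a hypothetical name, and it assumes netstat-style lines with the state in column 6.

```shell
# Count sockets per TCP state from netstat-style lines on stdin.
# Assumption: the connection state is the sixth whitespace-separated column.
count_tcp_states() {
  awk '$6 != "" { n[$6]++ } END { for (k in n) print k, n[k] }' | sort
}

# Usage:
#   netstat --numeric --tcp | tail --lines +3 | count_tcp_states
#
# With iproute2's ss the state is the first column and is spelled differently
# (e.g. ESTAB, TIME-WAIT); this assumes a version of ss that supports --no-header:
#   ss --numeric --tcp --all --no-header | awk '{ n[$1]++ } END { for (k in n) print k, n[k] }' | sort
```

The trailing `sort` is there because awk does not guarantee any order when iterating over an array.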
# maximum per-socket receive/send buffer sizes (bytes)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# per-socket receive/send buffers for TCP [min default max]
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
#net.ipv4.tcp_rmem = 4096 87380 16777216
#net.ipv4.tcp_wmem = 4096 65536 16777216
#net.ipv4.tcp_rmem = 4096 16060 262144
#net.ipv4.tcp_wmem = 4096 16384 262144
# port range used by TCP and UDP to choose the local port
# default: net.ipv4.ip_local_port_range = 32768 60999
net.ipv4.ip_local_port_range = 1024 61000
# various timewait socket setting tweaks
net.ipv4.tcp_tw_reuse = 1
#net.ipv4.tcp_tw_recycle = 1
#net.ipv4.tcp_max_tw_buckets = 400000
#net.ipv4.tcp_max_orphans = 60000
# time an orphaned (no longer referenced by any application) connection may stay in FIN_WAIT_2 before it is aborted
# default: net.ipv4.tcp_fin_timeout = 60
#net.ipv4.tcp_fin_timeout = 30
# note: net.ipv4.tcp_syncookies enabled by default with Ubuntu 12.04LTS+
net.ipv4.tcp_syncookies = 1
# remembered connection requests, without an ACK
# note: will increase automatically in proportion to available memory
# default: net.ipv4.tcp_max_syn_backlog = 128
net.ipv4.tcp_max_syn_backlog = 4096
#net.ipv4.tcp_max_syn_backlog = 8096
# upper limit allowed for a listen() backlog
# maximum established sockets (with an ACK) waiting to be accepted by listening process
# default: net.core.somaxconn = 128 (4096 since Linux 5.4)
net.core.somaxconn = 1024
#net.core.somaxconn = 4096
#net.core.somaxconn = 8192
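# give kernel more memory for TCP [low pressure high], measured in pages
#net.ipv4.tcp_mem = 50576 64768 98152
# maximum packets queued on the input side when an interface receives faster than the kernel can process
#net.core.netdev_max_backlog = 2500
#net.core.netdev_max_backlog = 5000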
# note: only need to tweak if connection tracking is used - e.g. stateful iptables rules
# note: newer kernels name this key net.netfilter.nf_conntrack_max
# default: net.ipv4.netfilter.ip_conntrack_max = 65536
#net.ipv4.netfilter.ip_conntrack_max = 1048576
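`sysctl` applies a file line by line, so when a key appears more than once the last uncommented assignment is the one that sticks. With several alternative values kept side by side as above, it helps to check which ones are actually live. A rough sketch, where `effective_settings` is a hypothetical helper and the parsing is deliberately simple (comments start with `#` or `;`, everything else is `key = value`):

```shell
# Print the effective value of each key in a sysctl-style file:
# comments are skipped and the last assignment of a repeated key wins.
effective_settings() {
  awk '
    /^[ \t]*[#;]/ || index($0, "=") == 0 { next }   # skip comments and non-assignments
    {
      eq  = index($0, "=")
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)               # trim surrounding whitespace
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (!(key in last)) order[++n] = key           # remember first-seen order
      last[key] = val                                # later lines overwrite earlier ones
    }
    END { for (i = 1; i <= n; i++) print order[i] " = " last[order[i]] }
  ' "$1"
}

# Usage:
#   effective_settings /etc/sysctl.conf
```

This only reports what the file would set; it does not read the running kernel values.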
- net.ipv4.tcp_tw_reuse = 1: Allows reuse of TIME_WAIT sockets for new connections when it is safe from a protocol viewpoint. In detail, Linux will reuse an existing connection in the TIME_WAIT state for a new outgoing connection only. An outgoing connection in the TIME_WAIT state can be reused after just one second. Again, note that reuse applies only to outgoing connections, not incoming, so the practical use of this for a server might be fairly limited.
- net.ipv4.tcp_tw_recycle = 1:
  - Not worth enabling, and dangerous when clients connect through NAT devices.
  - Warnings around this: see https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux. The option was removed entirely in Linux 4.12.
- net.ipv4.tcp_max_tw_buckets: Maximum number of TIME_WAIT sockets held by the system simultaneously. If this number is exceeded, the TIME_WAIT socket is immediately destroyed and a warning is printed. This limit exists only to prevent simple DoS attacks. You must not lower the limit artificially, but rather increase it (probably after increasing installed memory) if network conditions require more than the default value.
- net.ipv4.tcp_max_orphans: Maximum number of TCP sockets not attached to any user file handle, held by the system. If this number is exceeded, orphaned connections are reset immediately and a warning is printed. This limit exists only to prevent simple DoS attacks. You must not rely on this or lower the limit artificially, but rather increase it (probably after increasing installed memory) if network conditions require more than the default value, and tune network services to linger and kill such states more aggressively. Note: each orphan eats up to 64K of unswappable memory.
- net.ipv4.tcp_fin_timeout: Time an orphaned (no longer referenced by any application) connection will remain in the FIN_WAIT_2 state before it is aborted at the local end. Note that this does not control the length of the TIME_WAIT state. Reducing the value lets the kernel release such connections faster, making more resources available for new connections. Can cause issues when set below 25-30 seconds.
- net.ipv4.tcp_max_syn_backlog: Maximum number of remembered connection requests which have not yet received an acknowledgment from the connecting client. The minimum value is 128 for low memory machines, and it increases in proportion to the memory of the machine. If the server suffers from overload, try increasing this number.
- net.core.somaxconn:
  - An upper limit for the value of the backlog parameter passed to the listen(2) function. If the backlog argument is greater than the value of /proc/sys/net/core/somaxconn, then it is silently truncated to this limit.
  - Note: as per the listen(2) man page, with Linux 2.2 the meaning of backlog changed:
    - It now specifies the queue length for completely established sockets waiting to be accepted.
    - The maximum length of incomplete connection requests is set via net.ipv4.tcp_max_syn_backlog.
  - Details: https://derrickpetzold.com/p/somaxconn/.
  - Raising this value may not be wise.
  - View the current number of half-open connections (SYN_RECV state, still waiting in the incomplete connection queue):
    $ netstat --all --numeric --tcp | grep --count "SYN_RECV"
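The silent truncation described for net.core.somaxconn, and a way to spot listeners whose accept queue is full, can be sketched as below. Both function names are hypothetical. The second relies on `ss` reporting, for sockets in the LISTEN state, the current accept queue length in Recv-Q and its limit in Send-Q, so treat it as an assumption to verify on your kernel and `ss` version.

```shell
# Effective accept-queue length: the listen(2) backlog, capped at somaxconn.
effective_backlog() {
  requested=$1
  somaxconn=$2
  if [ "$requested" -gt "$somaxconn" ]; then
    printf '%s\n' "$somaxconn"   # silently truncated to the system limit
  else
    printf '%s\n' "$requested"
  fi
}

# From ss-style lines on stdin (State Recv-Q Send-Q Local-Address ...), print
# listeners whose accept queue (Recv-Q) has reached its limit (Send-Q).
full_listen_queues() {
  awk '$1 == "LISTEN" && $3 > 0 && $2 >= $3 { print $4, $2 "/" $3 }'
}

# Usage:
#   effective_backlog 4096 "$(cat /proc/sys/net/core/somaxconn)"
#   ss --listening --tcp --numeric | full_listen_queues
```

A listener that shows up here regularly is a hint that the application is not calling accept() fast enough. Raising somaxconn alone only hides that.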
- Refer to: http://www.engineyard.com/blog/2012/linux-scalability/, this is good stuff.
- Review ulimit -a values and raise the open file handle limit (ulimit -n) where needed.
- nf_conntrack_tcp_timeout_time_wait: By default, a connection is supposed to stay in the TIME_WAIT state for twice the MSL. Its purpose is to make sure any lost packets that arrive after a connection is closed do not confuse the TCP subsystem. The default maximum segment lifetime (MSL) is 60 seconds, which puts the default TIME_WAIT timeout value at 2 minutes. This means you’ll run out of available ports if you receive more than about 400 requests a second.
- nf_conntrack_tcp_timeout_established: The established connection timeout. Technically this should only apply to connections that are in the ESTABLISHED state, and a connection should leave this state when a FIN packet goes through in either direction, but it seems this does not always happen. So how long do connections stay in this table then? It turns out that the default value for nf_conntrack_tcp_timeout_established is 432000 seconds (5 days).
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 15
net.netfilter.nf_conntrack_tcp_timeout_established = 300
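The "about 400 requests a second" figure is back-of-envelope arithmetic: usable local ports divided by the seconds each one stays tied up. A sketch of that calculation, with a hypothetical function name and the simplifying assumption that every closed connection holds its port for the full timeout:

```shell
# Sustainable new-connection rate before the local port range is exhausted.
# Assumption: each closed connection occupies one local port for the whole timeout.
# Uses integer division, so results are rounded down.
max_conn_rate() {
  low=$1; high=$2; timeout=$3
  echo $(( (high - low + 1) / timeout ))
}

max_conn_rate 32768 60999 120   # default port range, 2 minute timeout
max_conn_rate 1024 61000 120    # widened ip_local_port_range, same timeout
```

Under this model the default range gives roughly 235 connections per second and the widened range roughly 499, the same order of magnitude as the quoted figure. For outgoing connections the limit applies per destination address and port rather than globally, so real limits are usually less strict than this estimate.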
View maximum/current in use netfilter connection tracking counts:
$ sysctl net.ipv4.netfilter.ip_conntrack_max
$ sysctl net.netfilter.nf_conntrack_count
$ cat /proc/net/ip_conntrack && wc --lines /proc/net/ip_conntrack
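To turn the two counters into something easier to watch, the in-use figure can be expressed as a percentage of the maximum. A small sketch. `conntrack_usage_pct` is a hypothetical helper, and the /proc paths in the usage note assume a kernel that exposes the nf_conntrack names rather than the older ip_conntrack ones:

```shell
# Percentage of the connection tracking table in use (integer, rounded down).
conntrack_usage_pct() {
  count=$1; max=$2
  echo $(( count * 100 / max ))
}

# Usage, assuming the nf_conntrack module is loaded:
#   conntrack_usage_pct \
#     "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#     "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```

A table that sits close to 100% is the situation where raising the maximum or lowering the conntrack timeouts becomes relevant.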
Part of the following Amazon Web Services AMI: https://aws.amazon.com/marketplace/pp/B00UU272MM
# AMI-ID: ami-5d56a83f // nginx-plus-ami-amazon-linux-hvm-v1.2-20180118.x86_64
net.ipv4.ip_local_port_range = 1024 64999
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_tw_reuse = 1
net.core.netdev_max_backlog = 30000
net.core.somaxconn = 32768
net.ipv4.tcp_max_orphans = 32768
# AMI-ID: ami-aefed3cd // nginx-plus-ami-amazon-linux-hvm-v1.1-20160426.x86_64
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_tw_reuse = 1
net.core.netdev_max_backlog = 30000
net.core.somaxconn = 32768
net.ipv4.tcp_max_orphans = 32768
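Both AMI profiles set net.ipv4.tcp_max_orphans = 32768. Taking the earlier note that each orphan eats up to 64K of unswappable memory, the worst case can be estimated as follows (a rough upper bound only; `orphan_mem_mib` is a hypothetical helper):

```shell
# Worst-case unswappable memory held by orphaned sockets, in MiB,
# assuming up to 64 KiB per orphan.
orphan_mem_mib() {
  echo $(( $1 * 64 / 1024 ))
}

orphan_mem_mib 32768   # value used by the AMI profiles
orphan_mem_mib 60000   # commented-out alternative from the sysctl.conf settings
```

That works out to 2048 MiB and 3750 MiB respectively, which is worth comparing against instance memory before copying these values to a smaller machine.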
- Kernel.org references for /proc/sys/net/ipv4/* and /proc/sys/net/netfilter/nf_conntrack_* settings.
- http://agiletesting.blogspot.com/2009/03/haproxy-and-apache-performance-tuning.html
- http://baheyeldin.com/technology/linux/detecting-and-preventing-syn-flood-attacks-web-servers-running-linux.html
- http://comments.gmane.org/gmane.comp.web.haproxy/1384
- http://lartc.org/howto/lartc.kernel.obscure.html (explains tcp_max_orphans & tcp_max_tw_buckets well).
- https://lowlatencyweb.wordpress.com/2012/03/20/500000-requestssec-modern-http-servers-are-fast/
- https://redmine.lighttpd.net/projects/1/wiki/Docs_Performance
- https://www.frozentux.net/documents/ipsysctl-tutorial/
- https://www.mail-archive.com/[email protected]/msg01708.html
- http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1
- http://www.symantec.com/connect/articles/hardening-tcpip-stack-syn-attacks
- https://serverfault.com/questions/400822/tuning-linux-haproxy
- https://serverfault.com/questions/408576/why-does-nf-conntrack-count-keep-increasing
- https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
- https://www.slideshare.net/brendangregg/how-netflix-tunes-ec2-instances-for-performance (slide #33).
- https://man7.org/linux/man-pages/man2/listen.2.html
- https://man7.org/linux/man-pages/man7/tcp.7.html
- https://blog.cloudflare.com/syn-packet-handling-in-the-wild/