Posted on April 25, 2017 at 1:10 pm
There are many guides online about Linux kernel and TCP tuning. Here I have tried to collect the most useful and detailed tips, along with the best guides on the subject, to help a Linux server scale and handle more concurrent connections.
Some time ago I wrote about optimizing Linux Sysctl.conf parameters.
This is a more advanced post about Linux TCP and Kernel optimization.
This is the /etc/sysctl.conf file I use on my servers (Debian 8.7.1), with references and personal comments included:
# Increase number of max open-files
fs.file-max = 150000

# Increase max number of PIDs
kernel.pid_max = 4194303

# Increase range of ports that can be used
net.ipv4.ip_local_port_range = 1024 65535

# https://tweaked.io/guide/kernel/
# Forking servers, like PostgreSQL or Apache, scale to much higher levels of concurrent connections if this is made larger
kernel.sched_migration_cost_ns=5000000

# https://tweaked.io/guide/kernel/
# Various PostgreSQL users have reported (on the postgresql performance mailing list) gains up to 30% on highly concurrent workloads on multi-core systems
kernel.sched_autogroup_enabled = 0

# https://github.com/ton31337/tools/wiki/tcp_slow_start_after_idle---tcp_no_metrics_save-performance
# Avoid falling back to slow start after a connection goes idle
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.tcp_no_metrics_save=0

# https://github.com/ton31337/tools/wiki/Is-net.ipv4.tcp_abort_on_overflow-good-or-not%3F
net.ipv4.tcp_abort_on_overflow=0

# Enable TCP window scaling (enabled by default)
# https://en.wikipedia.org/wiki/TCP_window_scale_option
net.ipv4.tcp_window_scaling=1

# Enables fast recycling of TIME_WAIT sockets.
# (Use with caution according to the kernel documentation! It can break clients behind NAT, and the option was removed in Linux 4.12.)
net.ipv4.tcp_tw_recycle = 1

# Allow reuse of sockets in TIME_WAIT state for new connections
# only when it is safe from the network stack’s perspective.
net.ipv4.tcp_tw_reuse = 1

# Turn on SYN-flood protections
net.ipv4.tcp_syncookies=1

# Only retry creating TCP connections twice
# Minimize the time it takes for a connection attempt to fail
net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_synack_retries=2
net.ipv4.tcp_orphan_retries=2

# How many retries TCP makes on data segments (default 15)
# Some guides suggest reducing this value
net.ipv4.tcp_retries2=8

# Optimize connection queues
# https://www.linode.com/docs/web-servers/nginx/configure-nginx-for-optimized-performance
# Increase the number of packets that can be queued
net.core.netdev_max_backlog = 3240000

# Max number of "backlogged sockets" (connection requests that can be queued for any given listening socket)
net.core.somaxconn = 50000

# Increase max number of sockets allowed in TIME_WAIT
net.ipv4.tcp_max_tw_buckets = 1440000

# Number of packets to keep in the backlog before the kernel starts dropping them
# A sane value is net.ipv4.tcp_max_syn_backlog = 3240000
net.ipv4.tcp_max_syn_backlog = 3240000

# TCP memory tuning
# View memory TCP actually uses with: cat /proc/net/sockstat
# *** These values are auto-created based on your server specs ***
# *** Edit these parameters with caution because they will use more RAM ***
# Changes suggested by IBM on https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Welcome%20to%20High%20Performance%20Computing%20%28HPC%29%20Central/page/Linux%20System%20Tuning%20Recommendations
# Increase the default socket buffer read size (rmem_default) and write size (wmem_default)
# *** Maybe recommended only for high-RAM servers? ***
net.core.rmem_default=16777216
net.core.wmem_default=16777216

# Increase the max socket buffer size (optmem_max), max socket buffer read size (rmem_max), max socket buffer write size (wmem_max)
# 16MB per socket - which sounds like a lot, but will virtually never consume that much
# rmem_max overrides tcp_rmem param, wmem_max overrides tcp_wmem param and optmem_max overrides tcp_mem param
net.core.optmem_max=16777216
net.core.rmem_max=16777216
net.core.wmem_max=16777216

# tcp_mem: Min, Pressure, Max values (units are pages, not bytes)
# tcp_wmem and tcp_rmem: Min, Default, Max values (units are bytes)
# Useful mostly for very high-traffic websites that have a lot of RAM
# Consider that we already set the *_max values to 16777216
# So you may want to comment out these three lines
net.ipv4.tcp_mem=16777216 16777216 16777216
net.ipv4.tcp_wmem=4096 87380 16777216
net.ipv4.tcp_rmem=4096 87380 16777216

# Keepalive optimizations
# By default, the keepalive routines wait for two hours (7200 secs) before sending the first keepalive probe,
# and then resend it every 75 seconds. If no ACK response is received for 9 consecutive times, the connection is marked as broken.
# The default values are: tcp_keepalive_time = 7200, tcp_keepalive_intvl = 75, tcp_keepalive_probes = 9
# We would decrease the default values for tcp_keepalive_* params as follows:
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 9

# The TCP FIN timeout sets the amount of time a port must be inactive before it can be reused for another connection.
# The default is often 60 seconds, but can normally be safely reduced to 30 or even 15 seconds
# https://www.linode.com/docs/web-servers/nginx/configure-nginx-for-optimized-performance
net.ipv4.tcp_fin_timeout = 7
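To make two of the settings above more concrete, here is a rough sketch of the arithmetic behind them. The helper names pages_to_gib and keepalive_detect_secs are hypothetical, and the 4096-byte page size is an assumption (check yours with getconf PAGESIZE).

```shell
#!/bin/sh
# Hypothetical helpers illustrating the arithmetic behind two settings above.

# tcp_mem is expressed in pages: convert a page count to GiB.
# Arguments: pages page_size_in_bytes
pages_to_gib() {
    echo $(( $1 * $2 / 1024 / 1024 / 1024 ))
}

# Seconds before an idle connection to a dead peer is marked as broken.
# Arguments: keepalive_time keepalive_intvl keepalive_probes
keepalive_detect_secs() {
    echo $(( $1 + $2 * $3 ))
}

# tcp_mem max from the file above, with an ASSUMED page size of 4096 bytes
pages_to_gib 16777216 4096

# Kernel defaults quoted in the comments above: 7200 + 75 * 9
keepalive_detect_secs 7200 75 9

# Values from the file above: 600 + 10 * 9
keepalive_detect_secs 600 10 9
```

With these inputs the script prints 64, 7875 and 690. Under the assumed page size, the tcp_mem limit works out to 64 GiB, which is why the caution about RAM in the comments applies. The keepalive change cuts dead-peer detection from a little over two hours to under twelve minutes.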
The following modifications caused many 500 errors, so I removed them:
# Disable TCP SACK (TCP Selective Acknowledgement), DSACK (duplicate TCP SACK), and FACK (Forward Acknowledgement)
# SACK requires enabling tcp_timestamps and adds some packet overhead
# Only advised in cases of packet loss on the network
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_fack = 0

# Disable TCP timestamps
# Can have a performance overhead and is only advised in cases where sack is needed (see tcp_sack)
net.ipv4.tcp_timestamps=0
Run “sysctl -p” to apply the sysctl changes (I also rebooted the server).
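After applying the file, it can help to confirm that the running kernel actually holds the values you wrote. This is a minimal read-only sketch that needs no root. The function names list_sysctl_settings and check_sysctl_file are hypothetical, and it assumes a simple file with one "key = value" pair per line.

```shell
#!/bin/sh
# Hypothetical helper: print each setting of a sysctl-style file as "key value",
# skipping blank lines and lines that start with "#" or ";".
list_sysctl_settings() {
    grep -v -e '^[[:space:]]*[#;]' -e '^[[:space:]]*$' "$1" |
    while IFS='=' read -r key value; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        # Collapse runs of spaces/tabs so multi-value settings compare cleanly.
        value=$(printf '%s' "$value" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//')
        echo "$key $value"
    done
}

# Hypothetical helper: report settings whose running value differs from the file.
# Only reads values (sysctl -n); it never writes anything.
check_sysctl_file() {
    list_sysctl_settings "$1" | while read -r key want; do
        have=$(sysctl -n "$key" 2>/dev/null | tr -s '[:space:]' ' ' | sed -e 's/ $//')
        [ "$have" = "$want" ] || echo "$key: file=$want running=${have:-unavailable}"
    done
}
```

Calling check_sysctl_file /etc/sysctl.conf should print nothing when every value matches. A key reported as "unavailable" usually means the running kernel does not expose that setting.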
Reduce Disk I/O Requests
Another optimization I made on my servers is mounting the /webserver partition with “noatime”, which stops the kernel from updating file access times and so reduces disk I/O. Edit /etc/fstab and add “noatime” to the partition that holds the web server data (vhosts, database, etc.):
UUID=[...] /webserver ext4 defaults,noexec,nodev,nosuid,noatime 0 2
For the changes to take effect, reboot the server or remount the partition:
mount -o remount /webserver
Use “mount” to verify that /webserver has been remounted with the “noatime” attribute.
You can also disable access times on the / partition and on other partitions.
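The verification step can also be scripted. This is a small sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options). The function name has_mount_option is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if mount point $1 lists option $2 in a
# /proc/mounts-style table (file $3, defaulting to /proc/mounts).
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            # If a mount point appears more than once, the last entry wins.
            found = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit !found }
    ' "${3:-/proc/mounts}"
}
```

For example, has_mount_option /webserver noatime && echo "noatime is active" prints the message only if the option is in effect on the live mount.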
Disable Nginx Access Log
Reduce disk I/O by disabling the web server access logs. In Nginx this is done with the “access_log off;” directive.
Concurrent Connections Test
This is a screenshot of the concurrent connections handled with the above changes:
I used https://loader.io/ to stress-test the server.
This is a screenshot without any sysctl.conf changes (a lot of 500 errors):
This is a screenshot without the sysctl.conf “TCP memory tuning”:
References and Links
Here are the guides I used to create the sysctl.conf file:
Find detailed information about all TCP variables:
Useful tips about Linux TCP and kernel optimizations:
Optimizing servers – Tuning the GNU/Linux Kernel
Linux System Tuning Recommendations by IBM
Part 1: Lessons learned tuning TCP and Nginx in EC2
Using TCP keepalive under Linux
Kernel: The “Out of socket memory” error
Sysctl tweaks – Sysctl Network tweaks and settings for VMs / VPS
How to Configure nginx for Optimized Performance by Linode
Updated on April 29, 2017 at 6:47 pm