Nginx as load balancer (much slower than native, am I expecting too much?)

I spent the weekend testing load balancing with various configurations of Docker, nginx, and a few proof-of-concept REST API servers, all to help decide what to use for the future of a project.

Long story short: I have a REST application that handles 134k requests/s (in a test case) when run natively. It's built on a highly optimized framework (Giraffe/Kestrel), and this performance is expected on the hardware I have.

As soon as I put it behind nginx (on the same box) with upstream load balancing, throughput falls to 24k requests/s.
I've tried various configurations, running either four instances of the application (in Docker) or a single one, and it makes no difference.
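For reference, the proxy setup is essentially the textbook upstream config; ports here are illustrative, not my exact values:

```nginx
# Minimal upstream load-balancing sketch; instance ports are hypothetical.
upstream api_backend {
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
    server 127.0.0.1:5003;
    server 127.0.0.1:5004;
}

server {
    listen 80;
    location / {
        proxy_pass http://api_backend;
    }
}
```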

I am using autocannon to measure requests/s, running it from a Mac mini, with the application and nginx running on an i5-6600K dev server.

My question is… am I expecting too much? Is nginx just inherently slow? Is it normal to lose this much performance behind a load balancer? I don't have deep experience with horizontal scaling, but I was under the assumption that at most there would be a latency hit, not a massive throughput hit like this.

Would appreciate your thoughts.

1 Like

I don't think nginx is usually that slow. Might it be Docker's overlay network? I'm not sure. Have you tried Traefik or HAProxy?

1 Like

It depends.

Check the nginx worker processes and worker connections settings in your nginx config, /etc/nginx/nginx.conf on most systems.

By default each defined worker process gets 512 connections (this can be increased). Also check that worker_processes is set to auto.

https://nginx.org/en/docs/ngx_core_module.html#worker_processes

Worker Processes

The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard disk drives that store data, and load pattern. When one is in doubt, setting it to the number of available CPU cores would be a good start (the value “auto” will try to autodetect it).

https://nginx.org/en/docs/ngx_core_module.html#worker_connections

Worker Connections
Sets the maximum number of simultaneous connections that can be opened by a worker process.

It should be kept in mind that this number includes all connections (e.g. connections with proxied servers, among others), not only connections with clients. Another consideration is that the actual number of simultaneous connections cannot exceed the current limit on the maximum number of open files, which can be changed by worker_rlimit_nofile.

This bit right here:

The number of connections is limited by the maximum number of open files (RLIMIT_NOFILE) on your system
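Putting those together, a reasonable starting point looks something like this (the values are a suggestion, not gospel, so adjust to your box):

```nginx
# Sketch of the core tuning knobs discussed above.
worker_processes auto;          # one worker per CPU core
worker_rlimit_nofile 65535;     # raise the open-file limit for workers

events {
    worker_connections 8192;    # default is only 512 per worker
}
```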

5 Likes

24k QPS is slow.

I've never heard of Giraffe/Kestrel before; it seems to be some kind of .NET CLR framework.

What kind of requests are you sending?
Are you trying to run nginx on Windows somehow?

For reference, here’s nginx on my chromebook (linux vm on an i7-10610U / 15W part) serving index.html that comes with nginx:

[email protected]:~$ wrk -d 10 http://localhost
Running 10s test @ http://localhost
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   345.83us  535.09us  18.78ms   91.34%
    Req/Sec    20.39k     3.52k   44.91k    81.59%
  407663 requests in 10.10s, 330.44MB read
Requests/sec:  40362.63
Transfer/sec:     32.72MB

I'd expect around 50k and up on a more powerful CPU than this Chromebook.

2 Likes

I made some notes on this in my own configuration when I was writing about hardening. There are a few things you can do… Let me edit in what I have.

Here's my events block:

## Events Block
events {
        # High Throughput Settings
        worker_connections 65535;
        multi_accept on;
        use epoll;
}

There are some things you can do to optimize the http {} block too, which I assume is where you're load balancing… I'd have to pick mine apart a bit, so I'm going to go see what I have in there. Of course, YMMV.

I have the following additional parameters: (mostly caching and buffer tweaks as well as TCP tweaks and timeouts)

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   90;
    types_hash_max_size 4096;
    client_body_buffer_size 10K;
    client_header_buffer_size 1k;
    client_max_body_size 30G;
    large_client_header_buffers 2 1k;
    server_tokens off;
    client_body_timeout   32;
    client_header_timeout 32;
    reset_timedout_connection on;
    open_file_cache max=200000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

I've also turned on gzip compression for everything compressible:

    gzip on;
    gzip_disable "MSIE [1-6]\.";
    gzip_proxied expired no-cache no-store private auth;
    gzip_comp_level 9;
    gzip_min_length 500; # Reminder: Default 20
    gzip_types 
	text/html
	text/css
	text/javascript
	text/xml
	text/plain
	text/x-component
	application/javascript
	application/json
	application/xml
	application/rss+xml
	font/truetype
	font/opentype
	application/vnd.ms-fontobject
	image/svg+xml;
    proxy_connect_timeout       600;
    proxy_send_timeout          600;
    proxy_read_timeout          600;
    send_timeout                600;

You could also try to move to elliptic-curve encryption as much as possible if you find it's the TLS handshake that is slowing things down. That's in the hardening thing I wrote.
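If you go that route, the relevant directives look something like this (a sketch, not a drop-in config):

```nginx
# Prefer elliptic-curve key exchange for cheaper TLS handshakes.
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ecdh_curve X25519:prime256v1;
ssl_prefer_server_ciphers on;
```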

That's most of my work, and it's solid now.

I hated figuring this out the first time lol


Be aware that

client_max_body_size 30G;

is a setting to match my Nextcloud through the reverse proxy. You don't necessarily need to set this.

Dynamic just informed me in casual chit-chat that you can indeed set this to unlimited if you don't want to tune it.

Also, if the servers aren't equal in grunt, list them in the balancer from most powerful to least. NGINX reads them in order and balances in order. I'm not sure if it round-robins, but this is what I've found.
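For unequal servers, nginx also supports explicit weights in the upstream block (by default it does round-robin across the listed servers). A sketch with made-up hostnames:

```nginx
# Weighted round-robin: bigbox gets ~3x the requests of smallbox.
upstream api_backend {
    server bigbox.example.com:8080 weight=3;
    server smallbox.example.com:8080 weight=1;
}
```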

3 Likes

Gentlemen, thank you for your suggestions. I ended up digging further into nginx than I ever have in 10+ years of setting it up as a reverse proxy in various environments. It's pretty excellent software.

This is a REST-only proxy, so no files are ever read. I tried everything: ignoring the body, turning compression off and on, forcing HTTP/1.1… but it just would not budge past 25k requests/s.
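(For anyone following along, forcing HTTP/1.1 with upstream keepalive looks roughly like this; the upstream name and port are illustrative:)

```nginx
upstream api_backend {
    server 127.0.0.1:5001;
    keepalive 64;                       # pool of idle keepalive connections
}

server {
    location / {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;         # keepalive requires HTTP/1.1
        proxy_set_header Connection ""; # clear the default "Connection: close"
    }
}
```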

I know there are more sophisticated debugging tools that could show what's really slowing it down, but this is essentially a pass-through URL that returns some JSON, a step above 'hello world'.

I was going to reinstall the whole system, since this dev box is Ubuntu Desktop 20.04 that has been distro-upgraded several times; my only guess was that something in the networking stack was misconfigured and out of my depth.

So I tried HAProxy, just to confirm the problem wasn't isolated to nginx, and it handled 114k requests/s, which is reasonable running on the same host. I've ended up going with it for now (it's also as simple to set up as nginx, unlike Traefik). There are drawbacks, but I'll revisit if serving static content becomes a requirement.

Thank you for your help!

1 Like

If it's a dev system and not prod… I hate to suggest it, but why not Fedora or Arch? If you're hell-bent on reinstalling, it might be prudent to install something more “agile”, to say the least. Then again, I don't know the details of your setup; if this conflicts with them, don't follow that advice…

Sounds good, dude. That's territory I'm not familiar with. NGINX is my Swiss Army knife, but if HAProxy works better, use it :wink:

anytime :slight_smile:

2 Likes

I would love to; Arch sounds really cool. I'm keeping it Ubuntu because most of the NVIDIA + TensorFlow packages are built for Ubuntu/Debian. Path of least resistance. In production it's almost always a container image.

HA is a scalpel, no illusions about it.

1 Like