Elastic Load Balancer and EC2 instance bandwidth
So we are working on a caching-related project on EC2. In this scenario high performance is very important.
We set up a Varnish cluster on EC2 to evaluate whether it could replace an existing caching infrastructure in terms of cost and requests per second. Our benchmarks yielded some interesting results. For our caching scenario, the limiting factor appeared to be bandwidth. Varnish is very modest in its CPU/RAM consumption: with uncompressed content, we could easily deliver 500 to 600 requests per second from a small instance while the box stayed around 95% idle.
It turns out we are limited by bandwidth and not by CPU.
In our benchmarks we were only able to push 35 MB/s on small instances, so the actual requests per second depended on the size of the objects we were serving. The limit was always ~35 MB/s. Our typical HTML pages were around 50 to 70 KB, so we couldn’t reach the desired requests per second: the instance was at its bandwidth limit.
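To make the relationship between object size and request rate concrete, here is a minimal back-of-the-envelope sketch. The function name is made up for illustration, and it assumes 1 MB = 1000 KB and ignores HTTP/TCP overhead, so the real ceiling would be somewhat lower.

```python
def max_requests_per_second(bandwidth_mb_per_s: float, object_size_kb: float) -> float:
    """Upper bound on requests/s when the link, not the CPU, is the limit."""
    # Assumption: 1 MB = 1000 KB; protocol overhead is ignored.
    return bandwidth_mb_per_s * 1000 / object_size_kb


# The ~35 MB/s limit and the 50 to 70 KB page sizes come from the benchmarks above.
for size_kb in (50, 70):
    rate = max_requests_per_second(35, size_kb)
    print(f"{size_kb} KB objects: ~{rate:.0f} requests/s")
```

With these numbers the ceiling lands at roughly 500 to 700 requests per second per small instance, no matter how idle the CPU is.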
Usually, when one instance hits its resource limits, you load balance across multiple ones. HAProxy is a fine example of a very robust TCP/HTTP load balancer. The problem, though, is that it will not increase your bandwidth, as all your traffic has to go through this one HAProxy instance. So even when you load balance multiple instances, each capable of pushing ~35 MB/s (~350 MB/s in total with 10 small instances), the bottleneck will still be ~35 MB/s: the load balancer itself.
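The bottleneck argument can be written down as a tiny model. This is only a sketch: the function name is hypothetical, and it assumes the proxy's link is the sole shared hop and that backends scale linearly.

```python
def throughput_behind_proxy(proxy_mb_per_s, backends_mb_per_s):
    """All traffic crosses the proxy, so its link caps the total."""
    return min(proxy_mb_per_s, sum(backends_mb_per_s))


backends = [35] * 10  # ten small instances at ~35 MB/s each
print(sum(backends))                          # combined backend capacity in MB/s
print(throughput_behind_proxy(35, backends))  # what clients can actually pull in MB/s
```

Adding more backends only raises the first number; the second stays pinned at whatever the load balancer instance itself can push.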
So any load balancing that is driven by an EC2 instance will limit the bandwidth. The maximum you can get is the load balancer’s bandwidth. If you need more than that, only network-level load balancing is left. Some more advanced load balancing solutions (ARP/router level) offer features like this.
The question was, can Amazon’s Elastic Load Balancer do this?
After setting up an Elastic Load Balancer and configuring multiple instances as backends, the answer was: no, it can’t.
It seems that the Elastic Load Balancer is also limited to a single connection of at most 1 Gigabit Ethernet (or maybe it is itself just a small EC2 instance?) and thus cannot increase the bandwidth beyond 35 MB/s. This is even more critical if you use larger instances, as putting the load balancer in front of them actually decreases your bandwidth. More on that in a minute.
So with Elastic Load Balancer out of the question, the only available solution on EC2 is DNS Round Robin. There, you’ve heard it. Yes, the old and ugly DNS Round Robin.
DNS Round Robin allows you to increase your bandwidth with every entry/instance you add. The only problem is that it is a bit inflexible: you can’t route the traffic yourself, as DNS clients pick a target/instance as they like. For a small number of instances (maybe 2 to 4) it is tolerable, and it solves our bandwidth problem.
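As a rough illustration of both properties (capacity grows per record, but clients choose for themselves), here is a small simulation. Everything in it is an assumption for the sake of the sketch: the addresses are placeholders from the TEST-NET-1 documentation range, the function names are invented, and clients are modelled as picking a record uniformly at random, which real resolvers and browsers do not necessarily do.

```python
import random
from collections import Counter

# Hypothetical A records for one hostname (TEST-NET-1 documentation addresses).
A_RECORDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
PER_INSTANCE_MB_PER_S = 35  # the small-instance limit from the benchmarks


def simulate_clients(records, n_clients, seed=0):
    """Each client picks one record on its own; the operator has no say."""
    rng = random.Random(seed)  # seeded so repeated runs give the same split
    return Counter(rng.choice(records) for _ in range(n_clients))


def aggregate_capacity(records, per_instance_mb_per_s):
    """No shared hop in front, so capacity grows with every record added."""
    return len(records) * per_instance_mb_per_s


picks = simulate_clients(A_RECORDS, 10_000)
for address, count in sorted(picks.items()):
    print(address, count)
print(aggregate_capacity(A_RECORDS, PER_INSTANCE_MB_PER_S), "MB/s in total")
```

The split across instances comes out roughly even under this assumption, but nothing enforces it, which is the inflexibility described above.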
Further, it seems that the bigger the instance, the more bandwidth you get. Amazon does not guarantee any bandwidth, but on the XL instances we guess that you have a physical server to yourself, so you get that box’s bandwidth to yourself. On the smaller instances the bandwidth is shared and thus can also be worse than in our benchmarks.
So our solution is to use DNS Round Robin for two to three HighCPU medium instances. This proved to be very cost-effective, and the HighCPU medium instances push out more bytes per second than the small instances.
A follow-up post will show the exact numbers for each instance.
— @jweiss
