Playing with Server Architecture Optimization

Optimizing resource usage in the cloud is an involved task. Whether you are staying within a budget or quota, or simply benchmarking a proposed architecture to see how efficient it is, finding the bottleneck can be a time-consuming process. One of the largest advantages the cloud offers is the flexibility to scale, alter, and otherwise experiment with your application stack with ease. Both the DAIR and the Rapid Access Cloud projects provide test cloud infrastructure for just this kind of purpose.

One request we often get when we give talks is for hard numbers and comparisons between different architectures. We're hoping that this blog post can help by giving some concrete numbers and pointing out the most common bottlenecks.

The most common places to look for bottlenecks are CPU, memory usage, disk throughput, and network throughput. Eventually one of the four will be maxed out and the other three will be waiting on it. Our goal is to identify where these slowdowns occur most often and how to optimize them away.

What are we measuring?

In this example we restricted ourselves to the default DAIR quota of four cores and 8GB of RAM. For each of our four architectures we measured the requests per second and noted the average response time. Additionally, we ran iotop (for IO), top (for CPU), and iftop (for network) to help ascertain where our bottlenecks were.

The requests per second and average response time give us a useful number for knowing how many people are able to use our service at a time, while the individual resource measurements helped us draw conclusions about the architecture design.
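As a rough illustration of how these two headline numbers relate to raw measurements, here is a minimal Python sketch. It assumes each completed request has been recorded as a (start, end) pair of timestamps in seconds; the `Summary` and `summarize` names are hypothetical and are not part of siege or of our test scripts.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Summary:
    transactions_per_second: float
    average_response_time: float


def summarize(samples: Sequence[Tuple[float, float]]) -> Summary:
    """Summarize (start, end) timestamps, in seconds, one pair per request."""
    if not samples:
        raise ValueError("need at least one sample")
    first_start = min(start for start, _ in samples)
    last_end = max(end for _, end in samples)
    elapsed = last_end - first_start
    # Throughput is completed requests divided by wall-clock time.
    tps = len(samples) / elapsed if elapsed > 0 else float("inf")
    # Response time is averaged per request, independent of overlap.
    mean_latency = sum(end - start for start, end in samples) / len(samples)
    return Summary(tps, mean_latency)


if __name__ == "__main__":
    print(summarize([(0.0, 1.0), (0.0, 2.0), (1.0, 2.0), (2.0, 4.0)]))
```

Note that the two numbers are independent: overlapping requests raise throughput without changing the average response time.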

Test Details

We did not perform any optimization; every component used its default, out-of-the-box settings on a VM running Ubuntu 14.04. The web server (Apache), application (WordPress with a 1 MB uncompressed front page), and the database (MySQL) are all things that can be easily optimized once some rough benchmark numbers are in hand.

The tests were run using an HTTP benchmark application called siege on the same 10GB network. Cybera ran siege against the front page of the website. For the first column, we held 30 connections open concurrently. For the second column, to emulate production traffic, we used siege's delay feature to add random delays between requests and simulated 60 users accessing the site. This way the connections per second could be compared to the number of active users on the site at that time.
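To make the difference between the two test modes concrete, here is a small Python sketch of a load generator. It is not how siege is implemented; it only mimics the idea of a fixed number of simulated users, optionally pausing for a random delay before each request. The `run_load` function, its parameters, and the `fake_request` stand-in are all hypothetical.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_load(request: Callable[[], None], users: int, requests_per_user: int,
             max_delay: float = 0.0, seed: int = 0) -> List[float]:
    """Return the response time, in seconds, of every request made."""

    def one_user(user_id: int) -> List[float]:
        rng = random.Random(seed + user_id)  # per-user random delays
        timings = []
        for _ in range(requests_per_user):
            if max_delay > 0:
                # Staggered mode: wait a random time before each request.
                time.sleep(rng.uniform(0, max_delay))
            started = time.perf_counter()
            request()
            timings.append(time.perf_counter() - started)
        return timings

    # One thread per simulated user, all running at the same time.
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return [t for timings in per_user for t in timings]


if __name__ == "__main__":
    def fake_request() -> None:
        time.sleep(0.001)  # stand-in for fetching the front page

    concurrent = run_load(fake_request, users=30, requests_per_user=5)
    staggered = run_load(fake_request, users=60, requests_per_user=5,
                         max_delay=0.01)
    print(len(concurrent), len(staggered))
```

In a real run, `fake_request` would be replaced by a function that fetches the page under test, and the delay would be on the order of seconds rather than milliseconds.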

Our results, measured in average transactions per second, are shown in the table below:


30 Connections (Concurrent)

60 Connections (Staggered)

One Big Server



Split Servers






Standard with Cache



You can view the exact configuration of the tests (and run them yourself if you wish) on GitHub:


We used four generic architectures to help highlight the tradeoffs found with splitting a workload across servers.

One Big Server

The first architecture was one large server (4 CPU, 8 GB of RAM) that consumed all of our quota, with every piece (web server, application, and database) installed on it. This gave us a fairly decent baseline of roughly ten transactions per second. (At one page view each, that would be enough for 36,000 users per hour.)
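The hourly figure is simple arithmetic: ten transactions per second times 3,600 seconds. The Python sketch below spells it out and adds one assumption of ours, a hypothetical `requests_per_user` parameter, to show how quickly the number shrinks once each visitor loads more than a single page.

```python
SECONDS_PER_HOUR = 60 * 60


def users_per_hour(transactions_per_second: float,
                   requests_per_user: float = 1.0) -> float:
    """Users served per hour if each user makes `requests_per_user` requests."""
    return transactions_per_second * SECONDS_PER_HOUR / requests_per_user


if __name__ == "__main__":
    # Baseline from the text: about ten transactions per second.
    print(users_per_hour(10))
    # Hypothetical visitor who views four pages instead of one.
    print(users_per_hour(10, requests_per_user=4))
```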


Split Server


Our second architecture split the two components apart, placing the application on its own server and the database on its own server. Each server was sized the same: 2 CPU and 2GB of RAM.

Monitoring our servers (using top, vmstat, and other tools) quickly showed that the database server was seeing very little load, while the application server was CPU constrained (memory was not a factor). Since we had just halved the number of CPUs available to the application, it was not surprising to see our throughput roughly halved, along with a small latency penalty from the new network traffic between the application server and the database.
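The halving can be reasoned about with a back-of-the-envelope model. The Python sketch below assumes the workload is purely CPU bound and scales linearly with core count, and it ignores the added network latency, so treat it as an estimate rather than a prediction. The function names are hypothetical; the input is the rough baseline of ten transactions per second on four cores.

```python
def cpu_seconds_per_request(cores: int, observed_tps: float) -> float:
    """CPU time each request costs if all cores are kept busy."""
    return cores / observed_tps


def predicted_tps(cores: int, cpu_cost: float) -> float:
    """Throughput ceiling for a CPU-bound workload on `cores` cores."""
    return cores / cpu_cost


if __name__ == "__main__":
    # One big server: 4 cores at roughly 10 transactions per second.
    cost = cpu_seconds_per_request(cores=4, observed_tps=10.0)
    # Split servers: the application now has only 2 cores.
    print(predicted_tps(cores=2, cpu_cost=cost))
```

Under those assumptions the application server tops out at about five transactions per second, which matches the "roughly halved" result.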



Our third architecture followed the increasingly common approach of placing a load balancer in front of several application servers, with the other components also placed on their own servers. This encourages horizontal scaling for both performance and reliability, and is the recommended starting point for web application architectures.

Each of the servers was significantly smaller than those in the second architecture (1 CPU and 512MB of RAM), but we still saw similar performance. This further showed that the main bottleneck in our setup was how long the application took to render the page. We also saw a slight penalty from the extra network hop between the load balancer and the application servers. Ideally, with a larger quota, we would create more or larger application servers and keep a smaller database server.
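For readers unfamiliar with what the load balancer is doing, here is a minimal Python sketch of one common policy, round-robin dispatch. We are assuming round-robin purely for illustration; the text above does not depend on a particular policy, and the `RoundRobinBalancer` class and the backend names are hypothetical.

```python
from itertools import cycle
from typing import Iterator, List


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation, one per incoming request."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation: Iterator[str] = cycle(backends)

    def pick(self) -> str:
        # Each call returns the next backend, wrapping around at the end.
        return next(self._rotation)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app-1", "app-2"])  # hypothetical names
    for _ in range(4):
        print(balancer.pick())
```

Adding capacity in this model means adding another entry to the backend list, which is what makes the layout easy to scale horizontally.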

Standard with Cache

Our fourth architecture was the first to focus on the bottleneck itself: the time it takes for the application to create the page, and whether that work needs to be done at all. Since the page was not dynamic, we could introduce some heavy caching (cache the results for 10 minutes) on the load balancer to try to avoid the bottleneck altogether. It did that with flying colours (~100x increase). Instead, we ran into the bottleneck of the test itself: the test server couldn't send requests fast enough to outpace the load balancer.
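The idea behind the cache can be sketched in a few lines of Python. This is an in-process illustration of time-based caching, not the configuration of our load balancer; the `TTLCache` class and its `render` callback are hypothetical, and only the 600-second lifetime comes from the 10-minute setting described above.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve a stored page until it is older than `ttl_seconds`."""

    def __init__(self, render: Callable[[str], str],
                 ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render    # the slow step, e.g. the application
        self._ttl = ttl_seconds  # 600 s = the 10-minute lifetime
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # cache hit: the application is skipped
        body = self._render(path)  # miss or expired: render and store
        self._entries[path] = (now, body)
        return body


if __name__ == "__main__":
    cache = TTLCache(lambda path: "rendered " + path)
    print(cache.get("/"))  # rendered by the application
    print(cache.get("/"))  # served from the cache
```

With a static front page, the application renders once per 10 minutes no matter how many requests arrive, which is why the bottleneck moved from the application to the test client.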


We hope you're able to take at least three things away from this look at architectures with a limited quota:

  1. Larger machines do give better performance, but there are physical and organizational limits that come with their size.

  2. Just because we can throw virtual hardware at a problem doesn't mean that is the best way to solve it.

  3. Try things out, and be sure to measure performance.