josiahsullivan

This is Part 3 of the Cost Modeling Your Cloud series.

 

So how does ServiceNow make rational decisions about what to buy when building our cloud? And how can you?

 

** Disclaimer: Math ahead. Actual costs and data points obfuscated for confidentiality. **

 

Be consistent

 

As a lean organization, we don't have the luxury of a 50-person team dedicated to testing hardware. We had to pare the evaluation process down to the essentials.

 

First, generate a baseline. Simply run a test against whatever you are currently using, and do your best to use a benchmark that resembles your production workload. A 3D rendering benchmark (measured in frames per second) would be largely irrelevant to database performance measured in transactions per second (TPS).
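A minimal Python sketch of what that consistency looks like in practice: run the same workload the same way every time and record the throughput. The workload function here is a placeholder for whatever transaction mix resembles your production system, not a real benchmark.

```
import time

def measure_tps(run_transaction, duration_s=60):
    """Run the same workload for a fixed duration and report transactions/second."""
    completed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        run_transaction()
        completed += 1
    return completed / (time.perf_counter() - start)

# Placeholder workload -- in practice this would be a representative
# INSERT/SELECT mix against a copy of your schema, not a toy loop.
def fake_transaction():
    sum(i * i for i in range(1000))

baseline_tps = measure_tps(fake_transaction, duration_s=5)
print(f"Baseline: {baseline_tps:,.0f} TPS")
```

Whatever the harness looks like, the hard requirement is that the same harness, dataset, and duration are reused for every config that follows.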

 

 

Second, create evaluation sets. Let's face it: one could test and compare an infinite number of configurations, but that would require infinite resources.

 

In 2012, ServiceNow was evaluating more than a thousand possible configs (multiple CPUs, RAM configs, disk form factors, RAID configs, NUMA settings, and file systems) for the 2013 hardware platform. Some of these tests required additional test passes at various load levels (e.g. 1, 2, 4, 8, 16, 32, 64, 128 threads), resulting in tens of thousands of possible test results.
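The back-of-the-envelope arithmetic behind that explosion is worth seeing once. The per-dimension option counts below are hypothetical stand-ins, not the actual 2012 matrix; the shape of the multiplication is the point.

```
from math import prod

# Hypothetical option counts per dimension (not the actual 2012 matrix).
options = {
    "cpu": 5,
    "ram_config": 6,
    "disk_form_factor": 3,
    "raid": 4,
    "numa": 2,
    "filesystem": 3,
}
load_levels = [1, 2, 4, 8, 16, 32, 64, 128]

configs = prod(options.values())       # 5 * 6 * 3 * 4 * 2 * 3 = 2,160 configs
results = configs * len(load_levels)   # 2,160 * 8 = 17,280 potential results
print(f"{configs:,} configs -> {results:,} potential test results")
```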

 

However, by grouping similar changes into evaluation sets and cost modeling each set, we quickly got a sense of which individual changes were most valuable.

 

If you were testing memory configurations, a change set might include the following four configs (a test-plan sketch follows the list):

4x8GB LV 1333 DIMMs
4x8GB 1600 DIMMs
2x16GB LV 1333 DIMMs
2x16GB 1600 DIMMs
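The value of a change set is that only one dimension moves while everything else stays pinned at the baseline, so the total test count grows as a sum of per-dimension options rather than a product. Here is a minimal Python sketch of that test plan, with a second, hypothetical RAID set added purely for illustration.

```
# Each evaluation set varies one dimension against a fixed baseline config.
baseline = {"cpu": "sku_a", "memory": "4x8GB LV 1333", "raid": "raid10"}

eval_sets = {
    "memory": ["4x8GB LV 1333", "4x8GB 1600", "2x16GB LV 1333", "2x16GB 1600"],
    "raid":   ["raid10", "raid5", "raid50"],   # hypothetical second set
}

test_plan = []
for dimension, candidates in eval_sets.items():
    for candidate in candidates:
        test_plan.append(dict(baseline, **{dimension: candidate}))

# 4 + 3 = 7 runs instead of the 4 * 3 = 12 full combinations.
print(len(test_plan), "configs to benchmark against the baseline")
```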

 

Here is a CPU and memory evaluation set built into the cost model from Part 2. This allows you to get a general sense of which config most effectively aligns with your business priorities (cost vs. performance, etc.). You can tell that our workload sees minimal performance gains from 1866 MHz memory (SKUs B and D), and that the increased CAPEX and OPEX make it more expensive per performance unit. This low ROI actually led us to continue running 1600 MHz for the 2014 hardware platform.
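To make "more expensive per performance unit" concrete, here is the shape of the arithmetic behind the evaluation-set screenshot below, mirroring the Part 2 model of amortized CAPEX plus OPEX divided by measured performance. Every number in this sketch is a made-up placeholder (actual figures are obfuscated per the disclaimer); only the structure of the comparison matters.

```
# Hypothetical SKUs: purchase price, yearly run cost, and benchmark result.
skus = {
    "1600 MHz config": {"capex": 9000, "opex_per_year": 1100, "tps": 5000},
    "1866 MHz config": {"capex": 9600, "opex_per_year": 1200, "tps": 5050},
}
service_life_years = 4   # assumption: amortize hardware over four years

for name, s in skus.items():
    annual_cost = s["capex"] / service_life_years + s["opex_per_year"]
    cost_per_tps = annual_cost / s["tps"]
    print(f"{name}: ${annual_cost:,.0f}/yr  ->  ${cost_per_tps:.3f} per TPS")
```

With numbers shaped like these, the faster memory buys a sliver of performance at a disproportionately higher cost per TPS, which is exactly the low-ROI pattern that kept us on 1600 MHz.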

evalsets.png

 

 

Third and finally, build a pilot from the best options in the evaluation sets, and test again.

 

whack-a-mole.jpg

Bottlenecks are like whack-a-mole, and improvements almost never combine linearly. Rather, combining a 2% improvement with another 2% improvement tends to yield 1.5% or 6% more often than 4%. You won't know how much difference the changes actually make until you test the final combination. On paper, we expected to improve performance per dollar by 37% this year, but it ended up closer to 44% when the changes were tested together.
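One way to see why the math misbehaves: in a pipelined system, throughput is set by the slowest stage, so an improvement only pays off once the bottleneck actually moves. A small Python sketch with hypothetical per-stage throughputs:

```
# Throughput of a pipelined system is limited by its slowest stage.
def throughput(stages):
    return min(stages.values())

baseline = {"app": 1000, "db_cpu": 800, "db_disk": 820}   # hypothetical TPS per stage

change_1 = dict(baseline, db_cpu=800 * 1.05)    # +5% CPU, tested alone
change_2 = dict(baseline, db_disk=820 * 1.05)   # +5% disk, tested alone
combined = dict(change_1, db_disk=820 * 1.05)   # both changes together

for name, cfg in [("baseline", baseline), ("change 1", change_1),
                  ("change 2", change_2), ("combined", combined)]:
    gain = throughput(cfg) / throughput(baseline) - 1
    print(f"{name:9s} {throughput(cfg):6.0f} TPS ({gain:+.1%})")
```

Change 1 alone gains 2.5% (the bottleneck shifts to disk), change 2 alone gains nothing, and the combination gains 5%: more than the sum of the parts, and impossible to predict without running the combined pilot.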

 

 

By applying these methods, we are able to run fewer than 100 tests to identify the best configurations out of tens of thousands of possibilities:

  1. Define a baseline
  2. Create and test evaluation sets against the baseline
  3. Build a pilot from the best eval set results and test against the baseline

 

Be smart: don't test everything. But this only works if you can be consistent: test after test, month after month, year after year.