Blog: Validating Azure SQL Server Deployments Using Benchmark Tools

If you have been deploying SQL Server workloads to the Microsoft Azure platform using virtual machines, you may have noticed that not all virtual machines, solid-state drives, and network components are created equal. There might be issues in your deployment templates that could result in SQL Server not performing as well as expected. For example, in a presentation given by André Arko at StrangeLoop 2016, André claims that Netflix benchmarks AWS EC2 instances that they use for production and if it passes, they use it. If the EC2 instance doesn’t pass the benchmark, then they terminate the instance and spin up a replacement. According to André, the reason is that Netflix found that the same instance type could be up to 5 times slower than the baseline they established. For the number of instances that Netflix creates, that represents a significant operating cost. With this in mind, we are now offering a value-added service of validating Azure SQL Server deployments using HammerDB. This way, our customers know that they are getting the best Azure virtual machine possible to drive their production workloads.

This blog post describes a new service we offer customers for using the HammerDB TPC-C benchmark as a “burn-in” process for running production SQL Server 2017 instances on Azure virtual machines. We’ll show how we compare the results against our baseline library and determine if the virtual machine can be used for production or replaced.

Validating Azure SQL Server Deployments Overview

The problem with traditional CPU benchmarks like PassMark, or disk benchmarking tools like Windows Diskspd, or other open-source suites from places like, is that they don’t stress the system the way that SQL Server running an online transaction processing (OLTP) would. At DB Best, we’ve been using HammerDB for benchmarking physical, virtual, and cloud-based deployments of SQL Server for years. We’ve also used it to test deployments of Oracle, MySQL, IBM DB2, PostgreSQL, and Amazon Redshift as well. Depending on whether our customer is running OLTP or data warehouse (DW) solutions, HammerDB is ready for both. HammerDB includes an OLTP benchmark that is patterned after the original TPC-C benchmark. HammerDB also includes a DW benchmark that is based on the original TPC-H benchmark.

We want to make it very clear that HammerDB is not an official benchmarking tool that can be used to publicly compare benchmark results between database products or platforms.

However, we have found that HammerDB is a great tool that provides consistent results on a target platform that we want to deploy SQL Server solutions to for our customers. This way we can make sure they are getting the expected performance for their production deployments. If there is a weakness in the platform infrastructure or a miss-configured SQL Server, HammerDB will catch the problem system.

In addition, we recently did a project with Western Digital and DataON using HammerDB to show how DataOn’s new SQL Server 2017 solutions perform using Windows Server Hyper-Converged Infrastructure (HCI). The DataON engineering team is now using the benchmark testing framework we developed and is using it to burn-in systems before they ship them off to their customers.

One of our solution architects came up with the idea, why not do the same for SQL Server deployments on Azure? We already built out a framework for automating HammerDB to run benchmark machines. The only thing that we need to do is come up with a baseline production environment and then test it against the popular Azure VM instances that we use for customer deployments in various data centers and use these baselines for a thumbs up or down evaluation of the target VM.

The following diagram represents an overview of a single-server production deployment for validating Azure SQL Server deployments using HammerDB.

Azure Benchmark Architecture for SQL Server Validation

Accelerated networking enables single root I/O virtualization (SR-IOV) to a VM, greatly improving its networking performance. This high-performance path bypasses the host from the data path, reducing latency, jitter, and CPU utilization. This provides greater consistency when running the benchmark.

Configuring the HammerDB TPC-C benchmark

HammerDB provides a variety of parameters for configuring the benchmark run. First, we created a tpcc database with 1000 warehouses. This results in a database that is approximately 80 GB in size. The following image shows the distribution of records for the tpcc database.
1000 warehouse tpcc database tables report

We run the benchmark with a modified Fibonacci series of virtual users starting at 3 up to 233. With 1 and 2 virtual users, we don’t get statistically consistent results, so we ignore them in our cost analysis. We run the benchmark with two minutes of ramp-up time followed by three minutes of execution time to determine the transactions per minute for each of the 10 user configurations. The complete run takes 100 mins.

Configuring SQL Server for benchmark

In configuring SQL Server for the benchmark, we aren’t necessarily trying to optimize SQL Server to best run the benchmark. However, we do want to make sure we are getting the most out of the Azure infrastructure. Microsoft provides an article titled Performance guidelines for SQL Server in Azure Virtual Machines that we take into consideration for our benchmark and customer deployments.

Rather than use the Azure templates for SQL Server, we create our own image based on a snapshot of an installed version of SQL Server 2017 with the latest CU. In our case, CU12 build 14.0.3045.24. We then configure various drive configurations that cover the 1.75 TB to 2 TB in size to allow for plenty of growth. Depending on the size of the VM we use the following disk configurations:

  • 5×2 – Five P15 (256 GB) drives striped for the data drive and Two P15 (256 GB) drives striped for the log drive.
  • 7 – Seven P15 (256 GB) drives striped for both data and log files
  • P30 – Two P30 (1024 GB) drives striped for both data and log files
  • P40 – One P40 (2048 GB) drive for both data and log

We used Windows Storage Spaces to stripe the drives. Then we found that Storage Spaces performs a little better (1% to 3%) than using Windows Disk Manager. We are generally ok with combining the data and log drives on the same drive because of the built-in redundancy of Azure storage. However, it all depends on the workload, throughput, VM size, etc., and pointed out in the blog post by the SQL Server Engine team titled Storage Configuration Guidelines for SQL Server on Azure VM.

For the SQL Server database definition, we used multiple data files for the PRIMARY filegroup as follows:

-- --
-- --
-- 1,000 Warehouses TPC-C 100 GB --
-- --


USE master

-- create main database files
FILENAME = 'D:\Data\TPC_C_ROOT.mdf',
SIZE = 20GB,
FILENAME = 'D:\Data\TPC_C_DATA_1.mdf',
SIZE = 20GB,
FILENAME = 'D:\Data\TPC_C_DATA_2.mdf',
SIZE = 20GB,
FILENAME = 'D:\Data\TPC_C_DATA_3.mdf',
SIZE = 20GB,
FILENAME = 'D:\Data\TPC_C_DATA_4.mdf',
SIZE = 20GB,
FILENAME = 'D:\Data\TPC_C_DATA_5.mdf',
SIZE = 20GB,
FILENAME = 'D:\Data\TPC_C_DATA_6.mdf',
SIZE = 20GB,
FILENAME = 'D:\Data\TPC_C_DATA_7.mdf',
SIZE = 20GB,

FILENAME = 'E:\Log\TPCC_LOG_1.ldf',
SIZE = 20GB,

Other configurations include:

  • Moving TEMPDB to the temp storage drive using default sizes and the recommended files number of files based on the size of the VM.
  • Setting MAXDOP=1. We found that not doing this can cause deadlocks at the higher user counts.
  • Setting Recovery Mode on the database to Full. Although we are not taking transaction log file backups during the benchmark, this is likely the way we would configure databases for our customers with OLTP workloads.

Defining the baseline configuration

For purposes of this blog post, we are going to focus on three common configurations that we find are popular depending on the transactional throughput for our customers.

Instance ClassInstance NamevCoresRAM (GB)VM w/Windows $/moTemp storageMax cached IOPS / MBps (size in GiB)Max NICs / Network bandwidth (Mbps)
SmallD4s v3416$274.48328,000 / 64 (100)2 / 2,000
MediumE16s v316128$1,314.0025632,000 / 256 (400)8 / 8,000
LargeE64s v364488$4,953.78864128,000/1024 (1600)8 / 30,000

For our cost analysis, we don’t include the cost of the actual SQL Server license. Since in many cases, our customers transfer their existing negotiated licenses to Azure.

For our storage configurations and pricing as of November 6 2018 in US West 2, we use the following:

Disk ConfigBase Disk SizePrice per monthIOPS per diskMB/secTotal storagePrice per month
5×2 and 7256 GB$34.561,1001251.75 TB$241.92 (7 * $34.56)
P301 TB$122.885,0002002 TB$245.76 (2 * $122.88)
P402 TB$235.527,5002502 TB$235.52

Overview of our test automation process

We aren’t trying to achieve the best possible result for the OLTP benchmark. Basically, we save the tuning of the system when we go to deploy our customer’s database solution. We are simply trying to produce a consistent result that can be compared to our historical baseline for the instance type in the targeted Azure data center.

Here is the overall flow for the automation of the benchmark process.

  • Use an Azure Runbook to configure the HammerDB driver system and SQL Server test VM for a specific Subscription ID, Resource Group, Location, VPN, Subnet, Instance Type, etc.
  • Automation script for the SQL Server test VM completes the SysPrep installation of SQL Server and restores the tpcc database from a backup file with MAXDOP=1 and Recovery Model=FILL
  • Once the SQL Server test VM is ready, the HammerDB launches off the HammerDB CLI with an autopilot run with 3, 5, 8, 13, 55, 89, 144, 233 virtual users. Each test sequence takes 10 mins with a 2-minute ramp-up time and 3 minutes of executing time. The remaining 5 minutes are to let the system stabilize for the next run of virtual users.
  • Copy the files to Azure storage and then processed to extract the TPM metric for each of the virtual user runs
  • Compare the results to our benchmarked results to determine a thumbs up or terminate.
  • If the benchmark result is significantly better than our recorded value, we make a determination whether or not to use it as the new baseline. Azure is constantly working to improve the performance of their systems, so we need a way to easily update our baseline.
  • For the thumbs up system, we drop the test TPCC database and then release the VM to our customer for the actual database deployment.

An example of why Read-Cache for data disk drives matter

One of the recommendations from Microsoft for running OLTP workloads is to use Read-Only caching for the data drives. To see how this makes a difference, we ran our 5×2 configuration with and without Read-Only disk caching for the data drives for our Small, Medium, and Large VM configurations.

The following chart shows a comparison of our baseline data for the large VM, one with No-Cache and another with Read-Only Cache represented in Transactions per Minute as recorded by the HammerDB log.

Azure Large VM E64sv3 No-Cache vs Read-Only Cache

As you can see, using Read-Only cache for data drives performs over twice as good as using premium storage with No-Cache. In order to put this in perspective over a month’s period of time, we convert Transactions Per Minute to Transactions Per Month (B) by multiplying the recorded values for each user run by  525,600 minutes per year divided by 12 months per year divided by 1 billion.

Azure Large VM Txn per Month (B)

Let’s now associate a dollar figure for each user configuration. The E64 v3 large VM for West US 2 runs $4,953.78 per month. The associated 7 P15 drives of 256 GB run $241.92 per month for a total VM price – not including the SQL Server license – of $5,195.70.

We establish the cost per billion transactions over the month, taking the cost of the VM per month divided by the transactions per month (billions) for each user configuration.  Here is what the results look like for the large VM.

Azure Large VM Read-Only Cache vs No Cache for disks

Finally, we need to establish an all up number for the result that can be used for a go – no go result. To do so, we take the sum of the Cost Per One Billion Transactions Per Month for each of the virtual users runs for 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, and then divide by 10. This approach models variable number of users throughout the month running transactions with bursting represented by the higher user counts. The following chart shows our baseline values for the Large, Medium, and Small VM types using the model described.

Financial Impact of running without Read-Only Cache


As you can see, the cost of using Read-Only disk cache for your data drives is significantly less than using the default of No-Cache. When we do our go versus no-go analysis, this is the first check we make which is easily corrected – assuming that the value for the VM and storage is within our baseline criteria. We can then re-run the benchmark with the drives reconfigured.

Disk configurations matter, but so does VM size

We have found in our testing that choosing the right disk combination can make a significant difference in your choice for VMs. Let’s take a look at how the use of the 5×2 (P15), 7 (P15), 2 (P30), and 1 (P40) disk configurations can make a difference in your cost per one billion transactions. In the following charts, we show the baseline TPM (K) results for our comparison of the different disk configurations with the large, medium, and small VM types.

Azure Large VM Results for Different Disk Configurations

Azure Medium VM Results for Different Disk Configurations

Azure Small VM Results for Different Disk Configurations

When we go ahead and compute the monthly cost for one billion transactions, you can see how the different disk configurations align with the different VM sizes.

Combined results for small, medium, and large VMs and disk configurations

As you can see, with the Large VM, the single P40 disk worked best when averaged across the user configurations, even though 2 P30 drives had greater TPM values at the high end. With the Medium VM, the 2 P30 drive configuration had the best value overall because it performed consistently well across the different user configurations. The small VM configuration showed a slight advantage to using the 5×2 P15 disk configuration.

Using the baseline to determine VM and storage suitability

By having baseline values for different VM types and disk configurations, we can establish a way to determine based on the cost per month for one billion transactions if the Azure infrastructure is suitable for our customers. By using a dollar figure, it’s relatively easy to understand the impact of having poor performing infrastructure.

Broader applications of this technique

There are other scenarios where we can use HammerDB to create baseline results for scenarios like:

  • Comparing results against an Always On Availability Group deployment
  • Testing results of SQL Server running in Docker containers on Windows or Linux
  • Comparing results against single and replicated Azure SQL Database and Azure SQL Database Managed Instances for PaaS versus IaaS cost analysis
  • Testing of DW workloads against the baseline for SQL Server 2017 on IaaS, Azure Synapse Analytics, and Azure SQL Database Hyperscale with Clustered Columnstore

Our benchmarking framework is designed to work against all of these scenarios.

Key benefits of this approach

In summary, our customers benefit in the following ways:

  • The virtual machine and storage configurations meet or exceed the baseline configuration for their desired throughput in transactions per minute and value.
  • For customers who subscribe to our managed service, we use our DBMSys platform to continually monitor the VM and perform health checks to validate that the VM is running within our customer’s KPIs.
  • With the combination of the DMO analysis and our benchmark baseline, we can give our customers a better understanding of the cost per transactions per month up front to show the immediate value of running SQL Server on Azure.

If you are interested in how you can benefit from our benchmarking validation, please contact us today!

Share this...
Share on Facebook
Tweet about this on Twitter
Share on LinkedIn