Blog: Capturing performance counters to optimize Microsoft SQL Server performance on AWS EC2 instances

Over the last few months of benchmarking SQL Server performance on AWS, I discovered a way to optimize SQL Server performance running on EC2 instances by paying close attention to several key Windows performance counters. This blog post is the first in my new series called Power BI for the SQL Server DBA. I’m going to show how you can easily capture performance counters to optimize SQL Server performance and cost on AWS EC2 instances. I’m going to start with the basics and grow the Power BI reports during this series. You can find the source code, Power BI reports, and other related files on my GitHub site at https://github.com/db-best-technologies/power-bi-for-sql-dbas.

Here is a summary of what’s coming up in the series. It is subject to change based on feedback and reader demand:

  • Capturing performance counters to optimize Microsoft SQL Server performance on AWS EC2 instances — this blog post
  • Using Power BI to analyze Windows Perfmon data for SQL Server
  • Collecting business-critical transaction throughput for Power BI
  • Using Power BI to gather AWS EC2 price list data
  • Using Power BI to create measures for optimizing SQL Server storage performance
  • Normalizing Perfmon data and the start of a performance data warehouse
  • Incorporating other data like Glenn Berry’s SQL Server Diagnostic Queries
  • Incorporating Query Execution Stats with Query Store and Open Query Store
  • And more…

Why Use Power BI as a SQL Server DBA?

Easy: it’s a great way to communicate with your team about what’s going on with SQL Server.

Imagine having an easy way to point out to your application developers the impact of recently deployed changes to the production server. For example,

I noticed that prior to your latest check-in, CPU utilization didn’t peg out at 100% until the number of users reached 200.

Power BI example showing CPU baseline

But now, the server is pegging out with 20 users.

Power BI example showing CPU bottleneck

I have some ideas on what might be happening.

Better yet, I think I found a way to save money on our AWS bill by changing our storage configuration with our SQL Server EC2 instances.

Why use Windows performance counters to optimize SQL Server performance on AWS EC2 instances?

There are many ways of capturing performance data for SQL Server. These include Extended Events, DMVs (Dynamic Management Views and Functions), SQL trace, and of course Windows performance counters. For Linux, I typically use code from LinuxPSSDiag on Github created by the SQL Server support engineers to get what I need.

I’m using Windows Performance Counters and PowerShell’s Get-Counter cmdlet in this scenario as a lightweight process that can collect data from SQL Server 2005 and later. What do I mean by lightweight? Using this technique, I’ve recorded no measurable impact across hundreds of TPC-C and TPC-H performance runs driven by HammerDB CLI scripts.
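If you haven’t used Get-Counter before, here is a minimal sanity check you can run in a PowerShell window on the server. The counter path and sample settings below are just illustrative and are not part of the collection script:

# Take three one-second samples of total CPU and show the cooked values
Get-Counter -Counter "\Processor(_Total)\% Processor Time" -SampleInterval 1 -MaxSamples 3 |
    Select-Object -ExpandProperty CounterSamples |
    Select-Object Path, CookedValue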

Our objective? With IOPS, throughput, CPU, and memory data in hand, you can make data-based decisions when selecting the EC2 instance and storage configuration, depending on whether you are interested in getting the best performance for the cost or in reducing overall costs while maintaining existing performance.

How are these two things different?

If you are looking for the best performance, it’s all about providing SQL Server with all the resources needed to get the job done as quickly as possible. However, it’s easy to over-provision.

If you are looking for value, you assume that the current performance of SQL Server satisfies the business requirements. The goal, then, is to deliver the same performance without over-provisioning, while making sure you can resize on demand if more scale is needed.

Selecting the performance counters that matter

The set of counters I’ve chosen is a smaller subset of what DB Best normally collects as part of our managed services.

There is lots of guidance from SQL Server experts on routine counters to capture. For example, I found this article useful for the essentials – https://www.sciencedirect.com/topics/computer-science/performance-counter.

In addition, this post goes into great detail on what disk counters mean – https://blogs.technet.microsoft.com/askcore/2012/03/16/windows-performance-monitor-disk-counters-explained/.

However, I haven’t found any published tips on the counters needed to optimize storage to improve performance or to help reduce the cost of storage while maintaining similar performance.

So, let’s look at the counters and how to collect them on the SQL Server under test using PowerShell.

Overview of the PowerShell script for collecting counters

At DB Best, we’ve developed an automation process using the GitLab Community Edition for source control and CI/CD tools. This system drives the creation of test VMs, execution of setup scripts, running of industry-standard benchmarks like TPC-C and TPC-H with HammerDB, collecting results and metrics, and then loading it all into a data model used by Power BI to display the results.

For most of you, this is overkill. Instead, I’ll stick with the Keep It Simple Stupid design approach.

Here is what the first script in the series on GitHub, Blog Post 1 – Capturing Basic Performance Counters for SQL Server.ps1, does:

  1. Setup the variables for the run
  2. Create an array of the counters to collect
  3. Get the first sample before starting the workload
  4. Start the timer and then collect counters for the specified duration

I’ve added comments after each counter as to why I felt the counter was useful. I’ve also grouped the counters into three categories: CPU, Memory, and Disk. I left off Network since my process for optimizing disk performance assumes that the network latency is not a bottleneck.

I’m also assuming that for the initial benchmark the application database under test has separate drives for the data files (E:), log files (L:), tempdb (T:), and backup files (G:). If you happen to have the log files on the same drive as the data files, there are SQL Server:Databases counters that can get the IOPS and B/s values, and I’ve included them in the script.

I’m not including the optimization of the backup and restore process. The process that I’m using can be used to optimize the backup drives as well. But, that’s for a later blog.

Let’s dig into each section of the PowerShell script.

Setup the variables for the run

This script sets the variables for the collection process. I use the $Task variable to create a unique file name for the output CSV file. I also have code in steps 3 and 4 that adds the $Task value to each row of the CSV file. This way, I can eventually compare two runs against each other in Power BI.

####################################
# 1. Setup the variables for the run
####################################
$Task = "Blog-B1"   # Unique identifier for the test run. Used as part of the output file name for the results.
$perfmon_outfile = "C:\Temp\Task-$($Task)-PerfMon-Capture.csv"  # Name for the output file.

# Specify the number of seconds to collect the data and how often to check
$Timeout = 600      # Defines the number of seconds to capture the counters. Ex. 600 = 10 mins, 3600 = 1 hr, 28800 = 8 hr
$CheckEvery = 60    # Defines the number of seconds to wait before checking again

You will want to adjust the values for $Timeout and $CheckEvery based on the length of time you want to capture data. For example, if you want to collect data over a month, you might want to collect every hour.
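That scenario might use values like the sketch below; adjust them to whatever window and interval fit your workload:

# Example: collect roughly one sample per hour for 30 days
$Timeout    = 2592000   # 30 days x 24 hours x 3600 seconds
$CheckEvery = 3600      # Wait one hour between samples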

Create an array of counters to collect

A few things to note before I start regarding the counters and why I chose them:

  • CPU Categories.

    IMHO, getting the vCPU count right yields the greatest cost savings on all platforms. Just a handful of counters provides the essential data for helping to determine where to start in selecting an EC2 instance.

  • Memory Counter Categories.

    The memory counters for Windows and SQL Server will give you an idea as to whether memory is a bottleneck. If SQL Server is consuming all the server’s available memory, it can result in an increase in smaller IO reads and writes, placing pressure on disk IO.

  • Disk Counter Categories.

    I’ll call this the second most influential set of counters for optimizing SQL Server performance and cost on EC2 instances. Between the different storage options and features like burstable IO, if you don’t have the right data, it’s easy to make a poor decision.

########################################################
# 2. Create an array of the counters you want to collect
########################################################
$Counters = @(

# Processor
  "\Processor(_Total)\% Processor Time"     # CPU usage provides a good way to identify patterns to investigate.
, "\Processor(*)\% Processor Time"          # This is helpful to see what is going on with individual vCPU trends.
, "\Processor(_total)\% Privileged Time"    # % time on kernel operations. If value is high, check AWS for EC2 driver patches.
, "\Processor(_total)\% User Time"          # % time spent on applications like SQL Server.

# \SQLServer:Workload Group Stats
, "\SQLServer:Workload Group Stats(*)\CPU usage %"  # % time SQL Server is spending on a specific Workload Group like default.

# Memory Counter Categories

# - Memory
, "\Memory\Available Kbytes"    # The Kbytes counter aligns nicely with SQL Server's (KB) scale.
, "\Memory\Committed Bytes"     # If Committed bytes is greater than physical memory, then more RAM will help.

# - Paging File
, "\Paging File(_Total)\% Usage"    # This is not really a Memory counter. A high value for the % Usage would indicate memory pressure.

# - SQL Server:Memory Manager
, "\SQLServer:Memory Manager\Database Cache Memory (KB)"# This is basically the buffer pool.
, "\SQLServer:Memory Manager\Free Memory (KB)"          # Represents the amount of memory SQL Server has available to use
, "\SQLServer:Memory Manager\Target Server Memory (KB)" # The amount of memory that SQL Server thinks it needs at the time
, "\SQLServer:Memory Manager\Total Server Memory (KB)"  # An approximation of how much the database engine is using.


# Disk Counter Categories

# IOPS counters - Reported as the average of the interval where the interval is greater than 1 second.
, "\LogicalDisk(_Total)\Disk Reads/sec"          # Read operations where SQL Server has to load data into buffer pool
, "\LogicalDisk(_Total)\Disk Writes/sec"         # Write operations where SQL Server has to harden data to disk
, "\LogicalDisk(_Total)\Disk Transfers/sec"      # Transfers (AKA IOPS) is approximately the sum of the Read/sec and Writes/sec

# Throughput counters - Bytes/sec - Reported as the average of the interval where the interval is greater than 1 second.
, "\LogicalDisk(_Total)\Disk Read Bytes/sec"     # Read throughput
, "\LogicalDisk(_Total)\Disk Write Bytes/sec"    # Write throughput
, "\LogicalDisk(_Total)\Disk Bytes/sec"          # Total throughput

# Block sizes for IO - Reported as an average for the interval. These are useful to look at over time
# to see the block sizes SQL Server is using.
, "\LogicalDisk(_Total)\Avg. Disk Bytes/Read"     # Read IO block size
, "\LogicalDisk(_Total)\Avg. Disk Bytes/Write"    # Write IO block size
, "\LogicalDisk(_Total)\Avg. Disk Bytes/Transfer" # Read + Write IO block size

# Latency counter - Avg. Disk sec/Transfer represents IO latency.
# This really isn't needed for the optimization, but it does verify volume configuration.
, "\LogicalDisk(_Total)\Avg. Disk sec/Transfer" # For gp2 drives, this value is generally around .001 sec (1 ms) or less.
                                                # SQL Server's sys.dm_io_virtual_file_stats reports the related stalls as io_stall_read_ms and io_stall_write_ms.

# Physical counters - We collect the same counters as the LogicalDisk, but the values are reported
# by drive letter. Same comments above apply.
, "\PhysicalDisk(* *)\Disk Reads/sec"
, "\PhysicalDisk(* *)\Disk Writes/sec"
, "\PhysicalDisk(* *)\Disk Transfers/sec"
, "\PhysicalDisk(* *)\Disk Read Bytes/sec"
, "\PhysicalDisk(* *)\Disk Write Bytes/sec"
, "\PhysicalDisk(* *)\Disk Bytes/sec"
, "\PhysicalDisk(* *)\Avg. Disk Bytes/Read"
, "\PhysicalDisk(* *)\Avg. Disk Bytes/Write"
, "\PhysicalDisk(* *)\Avg. Disk Bytes/Transfer"
, "\PhysicalDisk(* *)\Avg. Disk sec/Transfer"

# SQL Server:Databases - We can collect specific counters for the log operations if we want to later
#                        move the database log files to another volume.
, "\SQLServer:Databases(*)\Log Flushes/sec"          # Shows Write IOPS for all database log files.
, "\SQLServer:Databases(*)\Log Bytes Flushed/sec"    # Shows Write Bytes/sec for all database log files.

)

Get the first sample before starting the workload

The code below shows a general approach for capturing the performance counters defined in the $Counters array. Ideally, I could have created this as a PowerShell function, but I’ll save that for another day. The code captures one set of counters and formats the output using [pscustomobject] for exporting to a CSV file. For a great description of how PSCustomObject works, check out Kevin Marquette’s blog post PowerShell: Everything you wanted to know about PSCustomObject.

######################################################
# 3. Get the first sample before starting the workload
######################################################
Get-Counter -Counter $Counters | ForEach-Object {   # Loops thru each performance counter in $Counters
    $_.CounterSamples | ForEach-Object {            # Take the array of CounterSamples to build a custom object
        [pscustomobject]@{                          # Define the [pscustomobject] as follows:
            "Task ID" = $Task                       # Task ID using the $task in step 1
             "Event Date Time (UTC)" = $_.TimeStamp # Event Date Time (UTC) using the TimeStamp for the collection
             "Performance Counter" = $_.Path        # Performance Counter using the counter path
             Value = $_.CookedValue                 # Value using the CookedValue based on the counter type.
        }
    }
} | `
Export-Csv -Path "$perfmon_outfile" -NoTypeInformation # Create the result CSV file from the data.

I used this technique rather than using the Export-Counter cmdlet because I could add additional properties at a low cost of processing at the time of collection. In this case, the Task ID that I used to uniquely identify test runs. I was also able to change the heading names for properties to something a little friendlier for Power BI.

In addition, since I’m using the same code within a timer loop, using the Continuous, SampleInterval, and MaxSamples parameters didn’t make sense.
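For comparison, a minimal Export-Counter version would look something like the sketch below (the output file name is just illustrative, and Export-Counter ships with Windows PowerShell 5.1). It works, but there is no natural place to add the Task ID column or rename the headings:

# Sketch: same counters written straight to a CSV file with Export-Counter.
# The default property names are used and no Task ID column is added.
Get-Counter -Counter $Counters -SampleInterval 60 -MaxSamples 10 |
    Export-Counter -Path "C:\Temp\Task-$($Task)-Raw.csv" -FileFormat CSV -Force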

As a general rule of thumb, the less you have to transform the data using either Power Query M functions or DAX (Data Analysis Expressions), the better the performance in Power BI for ad-hoc filtering.

Start the timer and then collect counters for the specified duration

I picked up this technique from Adam Bertram’s blog post Using a Stopwatch in PowerShell, which illustrates how to use the .NET System.Diagnostics.Stopwatch class to create and use a timer object in PowerShell. The loop below collects the performance counters every 60 seconds over a 600-second period based on the $CheckEvery and $Timeout variables, appending each set of samples to $perfmon_outfile.

##############################################
# 4. Start the timer and then collect counters.
##############################################

# Start the timer using the Stopwatch Class within the .NET Framework
# https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.stopwatch?view=netframework-4.8
$timersql = [Diagnostics.Stopwatch]::StartNew()

while ( $timersql.Elapsed.TotalSeconds -lt $Timeout )  # Loop while time remains
{
    Write-Host "Time remaining = $( $Timeout - $timersql.Elapsed.TotalSeconds )"
    # Time to sleep based on the value for $CheckEvery in seconds.
    # The wait is done here to make sure that the initial performance counters are captured.
    Start-Sleep -Seconds $CheckEvery

    # The wait is over, get the next set of performance counters
    Get-Counter -Counter $Counters | ForEach-Object {
        $_.CounterSamples | ForEach-Object {
            [pscustomobject]@{
                "Task ID" = $Task
                "Event Date Time (UTC)" = $_.TimeStamp
                "Performance Counter" = $_.Path
                Value = $_.CookedValue
            }
        }
    } | Export-Csv -Path "$perfmon_outfile" -NoTypeInformation -Append  # Results are appended to the CSV file

}

# That's it!
Write-Host "Go to the file $($perfmon_outfile) to see the results."

Once the loop is complete, the file named in $perfmon_outfile is ready for analysis using Power BI.
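Before switching over to Power BI, a quick spot check from the same PowerShell session confirms that rows landed for every counter. This isn’t part of the collection script, just a sanity check:

# Spot check: count the rows collected for each performance counter
Import-Csv -Path $perfmon_outfile |
    Group-Object "Performance Counter" |
    Sort-Object Count -Descending |
    Select-Object Count, Name -First 10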

Using VSCode with the PowerShell Extension for running the performance counter script

When debugging PowerShell scripts, I like to use the simplicity of PowerShell ISE. However, Microsoft no longer supports the ISE. As a replacement, Microsoft recommends using Visual Studio Code (VSCode) with the PowerShell Extension. All the better, since VSCode has direct integration with Git. Also, VSCode has some great extensions for SQL Server like Microsoft’s own mssql. The mssql and PowerShell extension links above include the steps for installing everything.
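If you prefer the command line, both extensions can also be installed with the VSCode CLI. The identifiers below are the Marketplace IDs as I know them; double-check them if the install fails:

# Install the PowerShell and mssql extensions from a command prompt
code --install-extension ms-vscode.PowerShell
code --install-extension ms-mssql.mssql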

Since I’m using an Administrator account for my EC2 instance, my examples show running VSCode as Administrator.

Opening the PowerShell script

I’m going to assume that you are not familiar with GitHub. Here is a simple approach to getting the file.

In your browser, go to https://github.com/db-best-technologies/power-bi-for-sql-dbas/tree/master/blog-1-capture-counters-using-powershell, right-click on the Blog Post 1 – Capturing Disk Based Performance Counters for SQL Server.ps1 file name, click on “Save Link As”, and then save the file onto your hard drive. You can then open the file in VSCode.
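If you are comfortable with Git, cloning the whole repository is another option. The URL comes from the GitHub link at the top of this post, and the folder name matches the link above:

# Clone the repository and open the blog post 1 folder in VSCode
git clone https://github.com/db-best-technologies/power-bi-for-sql-dbas.git
code .\power-bi-for-sql-dbas\blog-1-capture-counters-using-powershell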

Once loaded, you have options to select text within the PowerShell script and then Run Selection using the F8 key. I like to do this with code I’m not familiar with to experiment with various features.

Post 1 Run Selection Example 2

VSCode also opens CSV files

PowerShell creates a CSV file in your C:\Temp directory as defined in line 18. I uploaded an example CSV file from a TPC-C run using HammerDB with increasing sets of virtual users with SQL Server 2017. You can find it at Task-Blog-B1-PerfMon-Capture.csv.

The great thing about VSCode is that there seems to be an extension for everything. For example, the Excel Viewer extension by GrapeCity displays CSV files inside of VSCode for a read-only, Excel-like experience.

About the actual benchmark environment

For the benchmark run, I used an r5d.xlarge EC2 instance with a 300 GB EBS volume (E:) for the 250 GB data file, a 25 GB EBS volume (L:) for the 10 GB log file, and the local SSD (T:) for the tempdb files.

The test ran 9 series of virtual users: 3, 5, 8, 13, 21, 34, 55, 89, and 144 users. Each series consisted of the following:

  • 2 minutes of ramp-up time to add users executing the TPC-C stored procedures that generate transactions
  • 3 minutes of a timed execution period where the system should be running at a steady-state
  • 3 minutes of time for active users to complete any transactions and then terminate the user connections

Here is an example of the run profile that looks at PhysicalDisk(E: and L:)\Disk Transfers/sec values.

Post 1 Disk Transfers Per Sec

In my next blog post, I’ll show how I used Power BI to create the chart above and go through some techniques that I’ve found helpful in optimizing EBS storage for SQL Server.
