Diagnosing CPU faults and performance with HammerDB

An often overlooked aspect of database benchmarking is that it should be used to stress test databases on all new hardware environments before they enter production. In this post we will look at an example of issue diagnosis and resolution with the desktop CPU Intel® Core™ i9-14900K as well as examples of measuring database performance with both the performance and efficient cores with this CPU. We will look at some initial TPROC-C tests with SQL Server 2022 and Windows 11 with what we learned from the configuration.

APPCRASH and Diagnosis

To begin with, running in the default environment with a low number of Virtual Users works as expected, however as soon as we start to increase the number of Virtual Users, HammerDB crashes with an example error as below.

Faulting application name: wish90.exe, version: 9.0.2.0, time stamp: 0x6749d6f6
Faulting module name: tcl90.dll, version: 9.0.2.0, time stamp: 0x6749d49c
Exception code: 0xc0000005
Fault offset: 0x00000000001577ec
Faulting process id: 0x2510
Faulting application start time: 0x1DB44CF52D8545D
Faulting application path: E:\HammerDB-5.0-Win\HammerDB-5.0\bin\wish90.exe
Faulting module path: E:\HammerDB-5.0-Win\HammerDB-5.0\bin\tcl90.dll
Report Id: 364bd54e-7da0-4984-9d78-eb6c4ee343ea
Faulting package full name: 
Faulting package-relative application ID:

Looking in the Windows event logs we can see associated errors related to CPU instability.

A corrected hardware error has occurred.

Reported by component: Processor Core
Error Source: Corrected Machine Check
Error Type: Internal parity error
Processor APIC ID: 32

whilst every workload we run resulting in application crashes or BSOD with errors such as CLOCK_WATCHDOG_TIMEOUT, clearly highlighting a problem with our configuration.

Whereas a user might initially look to errors in the application as the cause of the issue, searching for the errors we see in the system logs leads us to the following link Vmin Shift Instability Root Cause specifically related to the Intel® Core™ i9-14900K with the resolution given as follows “For all Intel® Core™ 13th/14th Gen desktop processor users: the 0x12B microcode update must be loaded via BIOS update and has been distributed to system and motherboard manufacturers to incorporate into their BIOS.”

We have therefore diagnosed and resolved an issue with database performance through stress testing the platform with HammerDB.

Testing the Solution

Now we have updated the system BIOS with the 0x12B microcode update, the system or application no longer crashes and the schema build completes as expected.

We can now run a test for a longer duration with the system exhibiting stability

at 2M SQL Server TPM

and a highly respectable throughput of 800K+ NOPM for a desktop environment.

CPU affinity and calibration

However we can see from the HammerDB CPU metrics that of the 32 logical CPUs available the first 16 cores appear to be underutilised during the test. (Another important CPU check is to identify individual cores with high levels of system utilisation, that could identify individual cores overburdened with interrupts). Additionally if we check the configuration of the i9-14900K we can see that there are 24 cores in this CPU of which 8 are performance cores,  16 efficient cores and hyper-threading  is only available on Performance-cores, therefore the first 16 logical CPUs on the system could be the 2 x logical CPUs of the performance cores.

With SQL Server we can check this by manually setting the Processor Affinity

and using a single-threaded CPU test as described in the HammerDB documentation.

In this case we create and run the following routine on SQL Server

USE [tpcc]
GO
SET ANSI_NULLS ON
GO
CREATE PROCEDURE [dbo].[CPUSIMPLE] 
AS
   BEGIN
      DECLARE
         @n numeric(16,6) = 0,
         @a DATETIME,
         @b DATETIME
      DECLARE
         @f int
      SET @f = 1
      SET @a = CURRENT_TIMESTAMP
      WHILE @f <= 10000000 
         BEGIN
      SET @n = @n % 999999 + sqrt(@f)
            SET @f = @f + 1
         END
         SET @b = CURRENT_TIMESTAMP
         PRINT ‘Timing = ‘ + ISNULL(CAST(DATEDIFF(MS, @a, @b)AS VARCHAR),”)
         PRINT ‘Res = ‘+ ISNULL(CAST(@n AS VARCHAR),”)
   END

Running this test firstly on firstly CPUs 0-15 and then 16-31 confirms our expectations that with a single-thread 0-15 completes the test in just over 5 seconds

Timing = 5350
Res = 873729.721235

(1 row affected)

Completion time: 2024-12-03T13:22:00.6322394+00:00

and 12 seconds on CPUs 16-31

Timing = 12013
Res = 873729.721235

(1 row affected)

Completion time: 2024-12-03T13:22:59.1543056+00:00

Using this knowledge what if we set for example the CPU affinity to the first 24 CPUs only. Re-running the test show we are now using the performance cores

and seeing a boost in performance

and when the test completes we have now reached the symbolic 1M NOPM in a desktop environment, importantly without any system instabilities.

As we have run the CPU single-threaded tests in T-SQL for completeness we can also run the same test within HammerDB itself, with an elapsed time of just over 1 second showing that the client side logic of HammerDB offers high levels of performance compared to database stored procedures.

We can also verify from system tools that the CPU is reaching the levels of frequency expected from the configuration.

(Note that as the test completes so quickly you can increase the number of iterations in the loop to observe the boost in frequency for longer.)

Summary

What we observed from these tests is that ALL hardware and software environments should be stress tested, up to and beyond expected operating conditions to identify potential issues. In this example we found a repeatable way with HammerDB to identify and resolve a CPU instability issue, however issues can also occur for example at the memory and I/O layers, stress testing and observing system logs is the best practice to identify these before the configuration is deployed in production.

We also used the HammerDB CPU metrics to observe the distribution of the CPU utilisation across the available CPUs and use SQL Server CPU affinity to improve performance by 17% over the default configuration using the knowledge we had of this particular CPU’s configuration.

Author