In the recent MySQL 8.0.16 release there is a new variable for the InnoDB storage engine called innodb_spin_wait_pause_multiplier, described as providing “greater control over the duration of spin-lock polling delays that occur when a thread waits to acquire a mutex or rw-lock” so that “delays can be tuned more finely to account for differences in PAUSE instruction duration on different processor architectures”.
This post aims to address what this is about and whether you really need to be concerned about the difference in instructions on different processor architectures for MySQL, by testing the performance with HammerDB. Note that the main developer of HammerDB is an Intel employee (#IAMINTEL); however, HammerDB is a personal open source project and has no optimization whatsoever for a database running on any particular architecture. Fortunately the HammerDB TPC-C/OLTP workload intentionally has a great deal of contention between threads and is therefore ideal for testing spin-locks.
So the PAUSE instruction is an optimization over NOP when a thread is waiting to acquire a lock, and is particularly important for the power and performance of spin-locks on x86 CPUs. However in the Skylake microarchitecture (you can see a list of CPUs here) the PAUSE instruction changed, and the documentation says “the latency of the PAUSE instruction in prior generation microarchitectures is about 10 cycles, whereas in Skylake microarchitecture it has been extended to as many as 140 cycles” and “as the PAUSE latency has been increased significantly, workloads that are sensitive to PAUSE latency will suffer some performance loss.” The impact on a database workload is not necessarily straightforward, however, as it depends on how much time that workload actually spends in spin-locks; fortunately, as noted, HammerDB deliberately induces a great deal of contention and is therefore at the top end of workloads that do.
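As a quick way to see this difference on your own hardware, the loop below times PAUSE with the TSC. This is purely my own illustrative sketch (not from the MySQL source), and since the TSC does not tick at the core frequency under frequency scaling the result is only an approximation:

#include <cstdint>
#include <cstdio>
#include <x86intrin.h> // __rdtsc and _mm_pause on GCC/Clang for x86

// Estimate PAUSE latency in TSC cycles by timing a long loop of PAUSE
// instructions and dividing by the iteration count. Loop overhead and
// the TSC/core frequency ratio make this a rough approximation only.
int main() {
  const uint64_t iterations = 100000000;
  uint64_t start = __rdtsc();
  for (uint64_t i = 0; i < iterations; i++) {
    _mm_pause(); // compiles to the x86 PAUSE instruction
  }
  uint64_t end = __rdtsc();
  printf("approx TSC cycles per PAUSE: %.1f\n",
         (double)(end - start) / (double)iterations);
  return 0;
}

On pre-Skylake parts this should report a figure in the region of the documented 10 cycles, and on Skylake roughly an order of magnitude higher.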
Also it is interesting to note that the impact of this change was not observed in other databases or in other MySQL storage engines such as MyRocks; the only noticeable impact with HammerDB workloads occurred in MySQL with InnoDB, where in the source file ut0ut.cc in the directory storage/innobase/ut the following section calls UT_RELAX_CPU, which is defined to issue the PAUSE instruction on x86.
for (i = 0; i < delay * 50; i++) {
  j += i;
  UT_RELAX_CPU();
}
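For reference, on x86 with GCC the UT_RELAX_CPU macro boils down to the PAUSE instruction, roughly as follows (simplified from storage/innobase/include/ut0ut.h; the exact definition varies by platform and compiler, with other architectures falling back to different hints):

/* asm volatile ensures the compiler does not optimize the hint away. */
#define UT_RELAX_CPU() __asm__ __volatile__("pause")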
Note that this fixed value of 50 is multiplied by the parameter innodb_spin_wait_delay, which has a default of 6 (although InnoDB actually selects a value at random up to this setting), so a single wait could be calling PAUSE up to 300 times. innodb_spin_wait_delay has always been configurable in recent versions, but as of MySQL 8.0.16 the multiplier is now configurable as well through innodb_spin_wait_pause_multiplier, rather than requiring modification of the source code. However, as noted, many factors affect spin-locks, including how much time you actually spend in locks in the first place, and therefore the best way to determine how much time is worth spending on fine-tuning spin-locks in the real world is through testing.
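To make the interaction between the two settings concrete, here is a sketch of the logic described above (my own illustration in the style of ut_delay, not the actual InnoDB source):

#include <cstdint>
#include <random>
#include <immintrin.h> // _mm_pause

// InnoDB picks a random delay up to innodb_spin_wait_delay and then
// spins delay * innodb_spin_wait_pause_multiplier times, issuing PAUSE
// on each iteration; with the defaults of 6 and 50 that is at most
// 6 * 50 = 300 PAUSE instructions per wait.
void spin_wait(uint64_t spin_wait_delay, uint64_t pause_multiplier) {
  static thread_local std::mt19937_64 rng{std::random_device{}()};
  uint64_t delay = rng() % (spin_wait_delay + 1); // random up to the setting
  for (uint64_t i = 0; i < delay * pause_multiplier; i++) {
    _mm_pause(); // the x86 PAUSE instruction
  }
}

This also shows why the Skylake change matters: at roughly 10 cycles per PAUSE a full 300-PAUSE wait costs around 3,000 cycles, whereas at up to 140 cycles it can cost over 40,000, so being able to lower the multiplier without recompiling is a reasonable lever to expose.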
So to test I took a system with Skylake CPUs and all storage on a P4800X SSD:
model name : Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
The OS was:
Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-23-generic x86_64)
I downloaded and installed the pre-built Linux binary from here:
mysql-8.0.16-linux-glibc2.12-x86_64
and set the following my.cnf
[mysqld]
datadir=/home/mysql/mysql-8.0.16/mysql-8.0.16-linux-glibc2.12-x86_64/data
language=/home/mysql/mysql-8.0.16/mysql-8.0.16-linux-glibc2.12-x86_64/share/english
default_authentication_plugin=mysql_native_password
socket=/tmp/mysql.sock
port=3307
bind_address=127.0.0.1
# general
max_connections=4000
table_open_cache=8000
table_open_cache_instances=16
back_log=1500
default_password_lifetime=0
ssl=0
performance_schema=OFF
max_prepared_stmt_count=128000
skip_log_bin=1
character_set_server=latin1
collation_server=latin1_swedish_ci
transaction_isolation=REPEATABLE-READ
# files
innodb_file_per_table
innodb_log_file_size=1024M
innodb_log_files_in_group=32
innodb_open_files=4000
# buffers
innodb_buffer_pool_size=64000M
innodb_buffer_pool_instances=16
innodb_log_buffer_size=64M
# tune
innodb_doublewrite=0
innodb_thread_concurrency=0
innodb_flush_log_at_trx_commit=0
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
join_buffer_size=32K
sort_buffer_size=32K
innodb_use_native_aio=1
innodb_stats_persistent=1
innodb_spin_wait_delay=6
innodb_spin_wait_pause_multiplier=50
innodb_max_purge_lag_delay=300000
innodb_max_purge_lag=0
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_checksum_algorithm=none
innodb_io_capacity=4000
innodb_io_capacity_max=20000
innodb_lru_scan_depth=9000
innodb_change_buffering=none
innodb_read_only=0
innodb_page_cleaners=4
innodb_undo_log_truncate=off
# perf special
innodb_adaptive_flushing=1
innodb_flush_neighbors=0
innodb_read_io_threads=16
innodb_write_io_threads=16
innodb_purge_threads=4
innodb_adaptive_hash_index=0
# monitoring
innodb_monitor_enable='%'
Initially I set innodb_doublewrite=0 and innodb_flush_log_at_trx_commit=0 as shown above, and then deleted and rebuilt the schema to re-test with these parameters turned on.
I installed HammerDB 3.1 and used the client library from MySQL 5.7 installed in the home directory
libmysqlclient.so.20
and added this to the library path as follows:
export LD_LIBRARY_PATH=/home/mysql:$LD_LIBRARY_PATH
I created a HammerDB build file and called it using the source command with the hammerdbcli tool.
HammerDB-3.1$ cat innodbbuild400.tcl
#!/bin/tclsh
puts "SETTING CONFIGURATION"
dbset db mysql
diset connection mysql_host 127.0.0.1
diset connection mysql_port 3307
diset tpcc mysql_count_ware 400
diset tpcc mysql_partition true
diset tpcc mysql_num_vu 64
diset tpcc mysql_storage_engine innodb
print dict
buildschema
Once the schema was built I shut down and restarted the database and configured a test script as follows:
HammerDB-3.1$ cat innodbtest1.tcl
#!/bin/tclsh
proc runtimer { seconds } {
  set x 0
  set timerstop 0
  while {!$timerstop} {
    incr x
    after 1000
    if { ![ expr {$x % 60} ] } {
      set y [ expr $x / 60 ]
      puts "Timer: $y minutes elapsed"
    }
    update
    if { [ vucomplete ] || $x eq $seconds } { set timerstop 1 }
  }
  return
}
puts "SETTING CONFIGURATION"
dbset db mysql
diset connection mysql_host 127.0.0.1
diset connection mysql_port 3307
diset tpcc mysql_driver timed
diset tpcc mysql_rampup 1
diset tpcc mysql_duration 2
vuset logtotemp 1
loadscript
puts "SEQUENCE STARTED"
foreach z { 1 2 4 8 16 24 32 40 48 56 64 72 80 88 } {
  puts "$z VU TEST"
  vuset vu $z
  vucreate
  vurun
  runtimer 240
  vudestroy
  after 20000
}
puts "TEST SEQUENCE COMPLETE"
and then ran the test, leaving it unattended until it reported the message “TEST SEQUENCE COMPLETE”. In particular, note that the parameter innodb_spin_wait_pause_multiplier remained at the default throughout.
HammerDB-3.1$ ./hammerdbcli
HammerDB CLI v3.1
Copyright (C) 2003-2018 Steve Shaw
Type "help" for a list of commands
The xml is well-formed, applying configuration
hammerdb>source innodbtest1.tcl
The results in graph form were as follows:
with a peak performance of the following:
Vuser 1:64 Active Virtual Users configured
Vuser 1:TEST RESULT : System achieved 2159378 MySQL TPM at 713488 NOPM
So we are generating more than 2 million MySQL transactions per minute (TPM) and more than 700,000 new orders per minute (NOPM) without having modified any of the spin-loop parameters.
Now that the parameter is configurable I could also run some experiments:
mysql> show global variables like '%pause%';
+-----------------------------------+-------+
| Variable_name                     | Value |
+-----------------------------------+-------+
| innodb_spin_wait_pause_multiplier | 50    |
+-----------------------------------+-------+
1 row in set (0.00 sec)

mysql> set global innodb_spin_wait_pause_multiplier=25;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like '%pause%';
+-----------------------------------+-------+
| Variable_name                     | Value |
+-----------------------------------+-------+
| innodb_spin_wait_pause_multiplier | 25    |
+-----------------------------------+-------+
1 row in set (0.00 sec)
What I found was that when reducing innodb_spin_wait_pause_multiplier significantly, to a value such as 5, I could achieve marginally improved performance up to 40 virtual users (643423 NOPM at a value of 5 compared to 626630 NOPM at 50); however, beyond this point performance was significantly lower and at 64 virtual users was only 278909 NOPM. The optimal performance remained at the default.
Further testing involved doubling the number of warehouses with the same results; however, with a schema that was not partitioned, reducing innodb_spin_wait_pause_multiplier to a lower value did bring some benefit.
The key finding was remarkably consistent with that of spin-lock configuration on other databases: in the vast majority of use cases you simply do not have to worry about modifying these parameters from the default, regardless of the processor architecture. However, there are many factors that influence how much time is spent in spin-locks in the first place before these parameters even come into play, such as the speed of the memory and storage, the configuration of the schema with partitioning, and the operating system. In particular, the number of concurrent users is going to be a significant factor.
Nevertheless, if you have an application generating millions of transactions per minute and the opportunity to test and optimize, the availability of more parameters to fine-tune a database for a particular application is surely welcome, as with each release MySQL continues to push the boundaries of performance over previous releases.