3. Choosing a Database for running TPC-H workloads

3. Choosing a Database for running TPC-H workloads

TPC-H workloads run complex queries scanning large volumes of data and therefore require the use of database features such as parallel query and in-memory column stores to maximise performance. With the available HammerDB TPC-H based workloads the three databases that support these features are the Enterprise Editions of Oracle, SQL Server and Db2 and therefore these databases will deliver the best experience for building and running TPC-H. Over time there has been improvement with open-source and open-source derived databases in the ability to run TPC-H workloads. For example PostgreSQL supports Parallel Query and the PostgreSQL derived versions of Amazon Redshift and Greenplum offer further accelerated query solutions.MySQL does not support an analytic storage engine however the MariaDB column store storage is best suited for running analytic tests against MySQL. Nevertheless it is known that with some or all of the open source solutions a number of queries either fail or are extrmemly long running due to the limitations of the databases themselves (and not HammerDB) therefore these workloads should be viewed in an experimental manner as they will not result in the ability to generate a QphH value.

3.1. Oracle

The Oracle database is fully featured for running TPC-H based workloads and presents two options for configuring the database either row oriented parallel query or the In-Memory Column Store (IM column store). Both of these configurations are able to run a full TPC-H workload and are configured on the database as opposed to configuring with HammerDB.

3.2. Microsoft SQL Server

SQL Server is able to support a full TPC-H workload and offers row oriented parallel query as well as in-memory column store configured. The clustered columnstore build is selected through the HammerDB Build Options.

Figure 9.9. Clustered Columnstore

Clustered Columnstore

3.3. Db2

Db2 can support a full TPCH workload through row oriented parallel query and Db2 BLU in-memory column store. The column store is selected through the Db2 Organize by options.

Figure 9.10. Db2 Organize By

Db2 Organize By

3.4. PostgreSQL

PostgreSQL supports standard row oriented parallel query. This offers significant performance improvement over single-threaded queries however not all queries at all schema sizes are expected to complete without database error and some run for a significant length of time. Options are also available to run the PostgreSQL queries against a Greenplum database. It is important to be aware that because of the Greenplum MPP architecture there is significant overhead in processing INSERT operations and therefore data should be loaded in bulk after generating with the HammerDB datagen operation.

Figure 9.11. PostgreSQL TPC-H

PostgreSQL TPC-H

3.5. MySQL / MariaDB

MySQL does not support row oriented parallel query or a column store configuration and therefore queries run against a MySQL database are expected to be long-running. However the MySQL compatible MariaDB supports a separate installation of a column store based database which offers significantly improved query times. However some queries will result in database errors or long running queries. This option is selected with the Data Warehouse Storage Engine Option.

Figure 9.12. MySQL MariaDB TPC-H


3.6. Redis

Redis does not support analytic workloads and therefore HammerDB does not have a TPC-H workload for Redis.