Configuring compaction settings for SSTables

Keep the Cassandra cluster healthy by tuning compaction throughput for write-intensive workloads.

Cassandra might write multiple versions of a row to different SSTables. Often, each version has a unique set of columns that Cassandra stores with a different timestamp. As a result, the SSTables grow in size, and retrieving a complete row of data might require reading from an increasing number of SSTables. To keep the cluster healthy, Cassandra periodically merges SSTables and discards old data through a process called compaction.
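The merge described above can be illustrated with a minimal Python sketch. It is a simplified model, not Cassandra's actual implementation: it assumes each SSTable holds a partial version of one row as a mapping from column name to a (value, timestamp) pair, keeps the cell with the newest timestamp for each column, and ignores tombstones, TTLs, and tie-breaking. The sample data and the merge_row_versions name are hypothetical.

```python
from typing import Dict, List, Tuple

# One row version: column name -> (value, write timestamp).
RowVersion = Dict[str, Tuple[str, int]]


def merge_row_versions(versions: List[RowVersion]) -> RowVersion:
    """Merge versions of one row, keeping the newest cell for each column."""
    merged: RowVersion = {}
    for version in versions:
        for column, (value, timestamp) in version.items():
            # Keep this cell only if the column is new or the cell is newer.
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return merged


# Three hypothetical SSTables, each holding part of the same row.
sstable_versions = [
    {"name": ("Ann", 100), "city": ("Oslo", 100)},
    {"city": ("Bergen", 200)},
    {"email": ("ann@example.com", 150)},
]

compacted = merge_row_versions(sstable_versions)
print(compacted)
```

Before the merge, a read of the complete row touches all three SSTables. After the merge, a single structure holds the row, and the older city value is discarded.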
Note: By default, Pega Platform provides a compaction throughput of 16 MB per second for Cassandra 2.1.20, and 1024 MB per second for Cassandra 3.11.3 (8 concurrent compactors). For highly write-intensive workloads, you can increase the default compaction throughput to at least 256 MB per second.
  1. For every Decision Data Store (DDS) node, add the following dynamic system settings.
    1. In the Pega-Engine ruleset, set the number of concurrent compactors by adding the prconfig/dnode/yaml/concurrent_compactors/default property with a value equal to the number of CPU cores.
    2. In the Pega-Engine ruleset, configure the compaction throughput by adding the prconfig/dnode/yaml/compaction_throughput_mb_per_sec/default property with the following value: 256.

      For more information, see Using dynamic system settings.

      Determining the most appropriate compaction throughput setting is an iterative process. You can use the nodetool utility to adjust the compaction throughput on one node at a time without restarting the node. Changes that you make this way are reverted when the node restarts. For more information about the nodetool commands for compaction throughput, see the Apache Cassandra documentation.

  2. Restart all DDS nodes.
    For more information, see Managing decision management nodes.
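
The values from the procedure above can be assembled with a short Python sketch. It only builds the setting names and values and the nodetool command lines; it does not create dynamic system settings in Pega Platform or run anything against Cassandra. The function names are hypothetical, and the sketch assumes that you pass in the CPU core count of the DDS node. The nodetool subcommands setcompactionthroughput and compactionstats are standard Cassandra commands, but confirm their behavior for your Cassandra version in the Apache Cassandra documentation.

```python
from typing import Dict, List

# Setting names as given in the procedure (owning ruleset: Pega-Engine).
CONCURRENT_COMPACTORS = "prconfig/dnode/yaml/concurrent_compactors/default"
COMPACTION_THROUGHPUT = "prconfig/dnode/yaml/compaction_throughput_mb_per_sec/default"


def compaction_settings(cpu_cores: int, throughput_mb: int = 256) -> Dict[str, str]:
    """Return the dynamic system settings to add for one DDS node."""
    if cpu_cores < 1 or throughput_mb < 1:
        raise ValueError("cpu_cores and throughput_mb must be positive")
    return {
        CONCURRENT_COMPACTORS: str(cpu_cores),
        COMPACTION_THROUGHPUT: str(throughput_mb),
    }


def nodetool_trial_commands(throughput_mb: int) -> List[List[str]]:
    """Commands for trying a throughput value on one running node.

    The value set this way is lost when the node restarts.
    """
    return [
        ["nodetool", "setcompactionthroughput", str(throughput_mb)],
        ["nodetool", "compactionstats"],  # check pending compactions afterward
    ]


# Example: a hypothetical DDS node with 8 CPU cores.
for name, value in compaction_settings(cpu_cores=8).items():
    print(f"{name} = {value}")

for command in nodetool_trial_commands(256):
    print(" ".join(command))
```

A reasonable workflow is to try candidate values with nodetool on a single node first, and to add the dynamic system settings and restart the DDS nodes only after a value proves suitable.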