Apache Cassandra is the primary example of a backing technology that underpins the Decision Data Store data set. The following sections provide an overview of the most important Cassandra features in terms of scalability, data distribution, consistency, and architecture.
Cassandra handles the database operations for Pega Platform decision management by providing fast access to the data that is essential in making next-best-action decisions in both batch and real time.
- Distributed and decentralized
- Cassandra is a distributed system, which means that it is capable of running on multiple machines while appearing to users as a unified whole. Every node in a Cassandra cluster is identical. No single node performs organizational operations that are distinct from any other node. Instead, Cassandra features a peer-to-peer protocol and uses gossip to synchronize and maintain a list of nodes that are alive or dead.
- Elastically scalable
- The responsibility for data storage and processing is shared across many machines, which reduces the reliance on any single machine. Instead of hosting all data on a single server or replicating all of the data on every server in a cluster, Cassandra partitions the data horizontally and hosts each portion on separate nodes.
- Consistent
- In Cassandra, a read operation returns the most recently written value. For fault tolerance, data is typically replicated across the cluster. You can control how many replicas must respond before an operation completes by setting the consistency level relative to the replication factor.
- The replication factor is the number of nodes in the cluster to which you want to propagate updates made through add, update, or delete operations. It determines how much performance you give up in order to gain more consistency.
- The consistency level controls how many replicas in the cluster must acknowledge a write operation, or respond to a read operation, in order to be successful.
- For example, you can set the consistency level to a number equal to the replication factor to gain stronger consistency at the cost of synchronous blocking operations, which wait for all nodes to be updated in order to declare success.
- Row and column-oriented
- In Cassandra, rows do not need to have the same number of columns. Instead, column families arrange columns into tables and are controlled by keyspaces. A keyspace is a logical namespace that holds the column families, as well as certain configuration properties.
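The relationship between the consistency level and the replication factor described above can be sketched with simple arithmetic. The following Python example is illustrative only: the function names `quorum` and `read_sees_latest_write` are hypothetical and are not part of Cassandra or Pega Platform. It relies on the commonly cited rule that a read is guaranteed to reach at least one replica holding the latest write when the number of replicas acknowledging the write plus the number of replicas answering the read is greater than the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Return the smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_acks: int,
                           read_acks: int) -> bool:
    """Return True when every read set must overlap every write set.

    write_acks and read_acks are the numbers of replicas that the chosen
    consistency levels require for a write and for a read.
    """
    for acks in (write_acks, read_acks):
        if not 1 <= acks <= replication_factor:
            raise ValueError("acks must be between 1 and the replication factor")
    return write_acks + read_acks > replication_factor


rf = 3  # assumed replication factor, chosen only for illustration

# Writes wait for all replicas, reads ask a single replica.
print(read_sees_latest_write(rf, write_acks=rf, read_acks=1))

# Writes and reads both wait for a majority of replicas.
print(read_sees_latest_write(rf, quorum(rf), quorum(rf)))

# Writes and reads both wait for a single replica.
print(read_sees_latest_write(rf, write_acks=1, read_acks=1))
```

With a replication factor of 3, the first two combinations overlap and the third does not, which is the trade-off between blocking on more replicas and gaining stronger consistency.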
Decision Data Store
The Decision Data Store is the repository for analytical data from a variety of sources and is deployed as part of the Pega Platform node cluster. The Decision Data Store consists of nodes that connect to an external Cassandra cluster using one-to-many relationships, as shown in the following figure:
Each node that comprises the Decision Data Store handles data in JSON format for each customer, from different sources. The data is distributed and replicated around the cluster, and is stored in the node file system.
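As a rough illustration of how records can be distributed and replicated around a cluster, the following Python sketch places customer identifiers on a simplified token ring. It is an assumption-laden model rather than the Cassandra or Pega Platform implementation: the `Ring` class, the node names, and the customer identifiers are hypothetical, each node owns a single token, and MD5 stands in for the partitioner (Cassandra uses a Murmur3-based partitioner by default and typically assigns multiple virtual nodes per machine).

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used only for illustration)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """A simplified token ring with one token per node."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the owning node, then the next nodes clockwise on the ring."""
        size = len(self._ring)
        # The owner is the first node whose token is >= the key's token.
        start = bisect.bisect_left(self._tokens, token(partition_key)) % size
        return [self._ring[(start + i) % size][1]
                for i in range(self.replication_factor)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
for customer_id in ("CUST-1001", "CUST-1002", "CUST-1003"):
    print(customer_id, ring.replicas(customer_id))
```

Each customer identifier always maps to the same set of three distinct nodes, so no single node holds all of the data and the loss of one node leaves two other copies available.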
It is also possible to configure a Decision Data Store node cluster that uses a Cassandra database in embedded mode, as shown in the following figure:
This type of configuration is deprecated starting in Pega Platform version 8.6 and therefore does not ensure future compatibility. It is not recommended for new deployments. However, it is still supported for systems that have been updated from earlier versions of Pega Platform to the current version.
Supported configurations
Pega only supports Cassandra distributions that are based on genuine Apache Cassandra. Note that while many distributions state that they are Cassandra compatible, there may be some caveats that cause incompatibility with Pega Platform. Pega only supports Instaclustr and DataStax Enterprise (DSE). You can find examples of such distributions on the Apache Cassandra website.
In Pega Cloud environments, the Cassandra database is preconfigured. No action is required.
In on-premises and client-managed cloud environments, you have the following deployment options for Cassandra:
- External
- In this model, you install and operate your own Cassandra cluster and connect it to Pega Platform. This option is recommended because it ensures future compatibility. For more information, see Connecting to an external Cassandra database through the Decision Data Store service.
- Managed (embedded, internal)
- In this model, the Cassandra Java virtual machine (JVM) on each node that is designated to host the Decision Data Store is started and stopped by the JVM that hosts the Pega Platform instance. This type of configuration is deprecated starting in Pega Platform version 8.6 and therefore does not ensure future compatibility. It is not recommended for new deployments. For more information, see: