Cassandra overview

Updated on March 11, 2021

Apache Cassandra is the primary example of a backing technology that underpins the Decision Data Store data set. The following sections provide an overview of the most important Cassandra features in terms of scalability, data distribution, consistency, and architecture.

Apache Cassandra

Cassandra handles the database operations for Pega decision management by providing fast access to the data that is essential in making next-best-action decisions in both batch and real time.

Distributed and decentralized: Cassandra is a distributed system, which means that it is capable of running on multiple machines while appearing to users as a unified whole. Every node in a Cassandra cluster is identical. No single node performs organizational operations that are distinct from any other node. Instead, Cassandra features a peer-to-peer protocol and uses gossip to synchronize and maintain a list of nodes that are alive or dead.
Elastically scalable: The responsibility for data storage and processing is shared across many machine environments, to reduce the reliance on any one environment. Instead of hosting all data on a single server or replicating all of the data on all servers in a cluster, Cassandra divides portions of the data horizontally and hosts it separately.
Consistent: In Cassandra, a read operation returns the most recently written value. For fault tolerance reasons, data is typically replicated across the cluster. You can control the number of replicas to block for all updates by setting the consistency level against the replication factor.
- The replication factor is the number of nodes in the cluster to which you want to propagate updates through add, update, or delete operations, and determines how much performance you give up in order to gain more consistency.
- The consistency level controls how many replicas in the cluster must acknowledge a write operation, or respond to a read operation, in order to be successful.
- For example, you can set the consistency level to a number equal to the replication factor to gain stronger consistency at the cost of synchronous blocking operations, which wait for all nodes to be updated in order to declare success.
Row and column-oriented: In Cassandra, rows do not need to have the same number of columns. Instead, column families arrange columns into tables and are controlled by keyspaces. A keyspace is a logical namespace that holds the column families, as well as certain configuration properties.
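
The trade-off between replication factor and consistency level described in the list above comes down to simple arithmetic. The following Python sketch is illustrative only and is not Pega or Cassandra code; the function names `quorum` and `reads_see_latest_write` are hypothetical. It assumes the commonly cited overlap rule: a read is guaranteed to reach at least one replica that holds the latest successful write when the number of replicas that acknowledge the write plus the number of replicas that answer the read exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Return the size of a majority of replicas: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """Check whether every read set must overlap every write set.

    write_replicas is the number of replicas that must acknowledge a write;
    read_replicas is the number of replicas that must respond to a read.
    """
    for count in (write_replicas, read_replicas):
        if not 1 <= count <= replication_factor:
            raise ValueError(
                "replica count must be between 1 and the replication factor"
            )
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print(q)                                  # 2
    print(reads_see_latest_write(rf, q, q))   # True: 2 + 2 > 3
    print(reads_see_latest_write(rf, 1, 1))   # False: 1 + 1 <= 3
    print(reads_see_latest_write(rf, rf, 1))  # True: writes wait for all replicas
```

The last case corresponds to the example in the list: setting the write consistency level equal to the replication factor gives the strongest guarantee, but every write then waits for all replicas to respond.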

Decision Data Store

The Decision Data Store is the repository for analytical data from a variety of sources and is deployed as part of the Pega Platform node cluster. The Decision Data Store consists of nodes that connect to an external Cassandra cluster using one-to-many relationships, as shown in the following figure:

Figure: Pega Platform connections to an external Cassandra database. The Pega Platform outer cluster contains DDS nodes that connect to external Cassandra nodes.

Each node that comprises the Decision Data Store handles data in JSON format for each customer, from different sources. The data is distributed and replicated around the cluster, and is stored in the node file system.
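
To make the idea of data being distributed and replicated around the cluster more concrete, the following Python sketch places keys on a simplified token ring. It is a teaching model under stated assumptions, not the Cassandra or Pega implementation: the `TokenRing` class, the `token_for` helper, and the node names are hypothetical, each node owns a single token, and MD5 stands in for the partitioner hash.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Hash a string to an integer token (MD5 is only a stand-in here)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    """Simplified ring: one token per node, replicas on the next nodes clockwise."""

    def __init__(self, nodes, replication_factor):
        nodes = list(nodes)
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError(
                "replication factor must be between 1 and the number of nodes"
            )
        self.replication_factor = replication_factor
        # Sort nodes by token so that the ring can be searched with bisect.
        self._ring = sorted((token_for(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def replicas_for(self, key: str):
        """Return the nodes that hold copies of the given key."""
        # The first replica is the first node whose token is greater than
        # the key token, wrapping around at the end of the ring.
        start = bisect.bisect_right(self._tokens, token_for(key)) % len(self._ring)
        return [
            self._ring[(start + offset) % len(self._ring)][1]
            for offset in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    print(ring.replicas_for("customer-42"))  # three distinct node names
```

Because placement depends only on the hash of the key and the sorted tokens, any node can compute where a record lives without consulting a central coordinator, which matches the decentralized design described earlier.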

It is also possible to configure a Decision Data Store node cluster that uses a Cassandra database in embedded mode, as shown in the following figure:

Figure: Decision Data Store node cluster using an internal Cassandra database. The Pega Platform outer cluster contains the DDS node cluster that connects to the internal Cassandra servers.

This type of configuration is deprecated starting in Pega Platform version 8.6 and therefore does not ensure future compatibility. It is not recommended for new deployments. However, it is still supported for systems that have been updated from earlier versions of Pega Platform to the current version.

Supported configurations

Pega Platform supports the following Cassandra configurations:

Apache open source Cassandra
- As proven with the Instaclustr offering
- Apache open source (DIY)
DataStax Enterprise

Deployment options

In Pega Cloud environments, the Cassandra database is preconfigured. No action is required.

In on-premises and client-managed cloud environments, you have the following deployment options for Cassandra:

External

In this model, you need to install and operate your own Cassandra cluster, and connect it to Pega Platform. This option is recommended as it ensures future compatibility. For more information, see Connecting to an external Cassandra database.

Managed (embedded, internal)

In this model, the JVM that hosts the Pega Platform instance starts and stops the Cassandra Java virtual machine (JVM) on the nodes that are designated to host the Decision Data Store. This type of configuration is deprecated starting in Pega Platform version 8.6 and therefore does not ensure future compatibility. It is not recommended for new deployments.

For more information, see the Apache Cassandra documentation.
