Configuring the Stream service

Configure the Stream service to ingest, route, and deliver high volumes of data such as web clicks, transactions, sensor data, and customer interaction history.

Distribution and replication of the stream data records ensure scalability and fault tolerance of the Stream service. The service runs as a cluster on one or more servers.

Note: This procedure applies only to on-premises deployments.

Stream node type

When planning your deployment, assign the Stream node type to at least two and at most four nodes in one Pega cluster.

If you plan to have more than four Stream nodes, contact Global Customer Support for assistance with your deployment.

Important:
  • Enable Stream nodes by configuring the node type as -DNodeType=Stream.
  • Do not mix Stream nodes with other node types.
  • Run a single JVM on a physical server to increase the resilience of the entire deployment.

For more information, see Assigning node types to nodes for on-premises environments.

Node identification

Each Pega node is identified with a Node ID that must be unique in the cluster. If the same Node ID is already used in the cluster, the node fails to start.

By default, a node ID is generated from certain system setting values. However, as a best practice, set the node ID manually to reflect the node’s intended purpose, so that you can identify nodes and their purposes at a glance. To set the node ID, use a JVM argument, as shown in the following example:

-Didentification.nodeid=stream-node-1

Important: Preserve the node ID across server restarts so that the node can be identified as a previously known node.

Data replication

The Stream service replicates every record across a configurable number of servers. If a server in the cluster fails, the service automatically fails over to the replicas, so messages remain available.

By default, the Stream service keeps two replicas of each record. If you increase the number of Stream nodes from two to three or four, change the data replication setting to match the number of Stream nodes. You can change the setting in the prconfig.xml file on every Pega node, or by using the following dynamic system setting:

Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyReplicationFactor/default
Value: <number_of_stream_nodes>
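
The prconfig.xml form of this setting might look like the following sketch. The entry name is an assumption, derived from the dynamic system setting purpose by dropping the prconfig/ prefix and the /default suffix, which is the pattern that the other settings in this topic follow. The value 3 assumes a cluster with three Stream nodes.

```xml
<!-- Assumed entry name, derived from the dynamic system setting purpose -->
<!-- The value 3 assumes three Stream nodes; add the entry on every Pega node -->
<env name="dsm/services/stream/pyReplicationFactor" value="3" />
```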

Data files location

By default, the Stream service stores its data in the java_ee_server_root/kafka-data folder. Change this location to a folder that you can monitor and secure against accidental data deletion.

Important: Do not use network attached storage or shared folders to store your stream data.

To change the default directory for a single server, in the prconfig.xml file, add the following entry: <env name="dsm/services/stream/pyBaseLogPath" value="/data/kafka-data" />

To change the default directory for all servers in the cluster, create a dynamic system setting with the following options:

Owning Ruleset: Pega-Engine
Setting Purpose: prconfig/dsm/services/stream/pyBaseLogPath/default
Value: /data/kafka-data

Ensure that you have at least 100 GB of disk space available to accommodate standard background processing activities.

Apache Kafka distribution location

When the Stream service is enabled in Pega Platform, the Apache Kafka distribution is unpacked in the following directory: java_ee_server_root/kafka-version

If the default location is write-protected, change it in one of the following ways:

  • In the prconfig.xml file, add the following entry: <env name="dsm/services/stream/pyUnpackBasePath" value="/opt/kafka" />
  • Create a dynamic system setting with the following options:
    Owning Ruleset: Pega-Engine
    Setting Purpose: prconfig/dsm/services/stream/pyUnpackBasePath/default
    Value: /opt/kafka

Operating system

Deploy the Stream service on Linux or any other Unix system.

Running Stream nodes on Windows might cause issues and is not recommended in production environments.

The Stream service uses file descriptors for data files and open connections. Set the limit on open file descriptors to at least 100000. With a lower limit, the Stream service might exhaust the available descriptors and fail. For information about raising the limit (ulimit), see your operating system documentation.

Clock synchronization

Ensure that the clocks on Stream nodes do not drift apart and stay synchronized to within 30 seconds of each other. An effective way to synchronize clocks across all Pega Platform nodes is to use the Network Time Protocol (NTP).

Multiple JVMs on a single host

Do not run multiple Stream service JVMs on a single host. Doing so reduces overall cluster resiliency and data availability if the entire host fails.

However, if such a setup is required, configure dedicated, non-conflicting ports for each Stream service JVM.

The Stream service uses the following IP addresses and ports for internal communication. Assign a distinct set of ports to each JVM on a single host.

<!-- IP and port for communication between Pega nodes -->
<env name="dsm/services/stream/pyBrokerAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyBrokerPort" value="9092"/>

<!-- IP and port for configuration management -->
<env name="dsm/services/stream/pyKeeperAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyKeeperPort" value="2181"/>

<!-- Port for local Kafka management. Kafka JMX always runs on localhost -->
<env name="dsm/services/stream/pyJmxPort" value="9999"/>

<!-- Port for HTTP streaming -->
<env name="dsm/services/stream/pyPort" value="7003"/>
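
As an illustration, a second Stream service JVM on the same host could use a port set like the one in the following sketch. The port numbers are arbitrary assumptions, chosen only so that they do not collide with the values above; use any free ports on your host. {IP_ADDRESS} remains a placeholder.

```xml
<!-- Hypothetical prconfig.xml entries for a second JVM on the same host -->
<!-- Every port differs from the first JVM's set (9092, 2181, 9999, 7003) -->
<env name="dsm/services/stream/pyBrokerAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyBrokerPort" value="9093"/>

<env name="dsm/services/stream/pyKeeperAddress" value="{IP_ADDRESS}"/>
<env name="dsm/services/stream/pyKeeperPort" value="2182"/>

<env name="dsm/services/stream/pyJmxPort" value="10000"/>

<env name="dsm/services/stream/pyPort" value="7004"/>
```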

JVM heap size

You are unlikely to need to increase the default JVM heap settings for your Stream service. However, if you need to do so, use one of the following methods:

  • Add an entry in the prconfig.xml file.

    For example: <env name="dsm/services/stream/pyHeapOptions" value="-Xmx4G -Xms4G" />

  • Create a dynamic system setting with the following options:
    Owning Ruleset: Pega-Engine
    Setting Purpose: prconfig/dsm/services/stream/pyHeapOptions/default
    Value: for example, -Xmx4G -Xms4G

Multiple availability zones

Spread your Stream nodes across multiple availability zones (AZs). To distribute data replicas evenly across the zones, configure the AZ names in one of the following ways:

  • Add an entry in the prconfig.xml file.

    For example: <env name="dsm/services/stream/server_properties/broker.rack" value="AZ-1" />

  • Create a dynamic system setting with the following options:
    Owning Ruleset: Pega-Engine
    Setting Purpose: prconfig/dsm/services/stream/server_properties/broker.rack/default
    Value: for example, AZ-1
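
For example, in a cluster of three Stream nodes spread over three zones, the prconfig.xml file on each node could carry a different zone name, as in the following sketch. The three-node layout and the names AZ-2 and AZ-3 are assumptions made for illustration. Each entry belongs in a separate file on a different server.

```xml
<!-- prconfig.xml on the Stream node in the first zone -->
<env name="dsm/services/stream/server_properties/broker.rack" value="AZ-1" />

<!-- prconfig.xml on the Stream node in the second zone (separate file, assumed zone name) -->
<env name="dsm/services/stream/server_properties/broker.rack" value="AZ-2" />

<!-- prconfig.xml on the Stream node in the third zone (separate file, assumed zone name) -->
<env name="dsm/services/stream/server_properties/broker.rack" value="AZ-3" />
```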