Kafka data sets in decision management

Updated on September 10, 2021

Beginning with Pega® Platform 7.3, you can connect to Apache Kafka servers through a dedicated data set rule. Apache Kafka is a fault-tolerant and scalable platform that you can use as a data source for real-time analysis of customer records, such as messages and calls, as they occur. The most efficient way of using Kafka data sets in your application is through Data Flow rules that include event strategies.

Pega Platform includes Kafka client version 0.10.0.1, which is compatible with Kafka server version 0.10.

To establish a connection between Pega Platform and an Apache Kafka server, create the following components in your application:

Kafka configuration instance
Kafka data set rule

Kafka configuration instances

A Kafka configuration is a data instance that is created in the Data-Admin-Kafka class of your application. Its purpose is to define a client connection between Pega Platform and an external Apache Kafka server or a cluster of servers.

Figure: Kafka configuration instance

For more information, see Creating a Kafka configuration instance.
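
To make the idea of a client connection more concrete, the following minimal sketch builds the kind of connection properties that a Kafka client typically needs. It is not how Pega Platform stores a Kafka configuration instance; the function name, host names, and client ID are hypothetical. Only the property names `bootstrap.servers` and `client.id` are standard Kafka client settings.

```python
def build_client_config(hosts, client_id="example-client"):
    """Build Kafka client connection properties from (host, port) pairs.

    Hypothetical helper for illustration only; the host list and client ID
    are assumptions, not values taken from Pega Platform.
    """
    if not hosts:
        raise ValueError("at least one Kafka broker is required")
    # Kafka clients accept a comma-separated list of host:port entries.
    servers = ",".join(f"{host}:{port}" for host, port in hosts)
    return {
        "bootstrap.servers": servers,
        "client.id": client_id,
    }


config = build_client_config(
    [("kafka1.example.com", 9092), ("kafka2.example.com", 9092)]
)
print(config["bootstrap.servers"])
```

Listing more than one broker lets a client still reach the cluster when a single server is unavailable, which is why a configuration can point to a cluster of servers rather than one host.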

Kafka data sets

Each Kafka server or server cluster that you connect to stores streams of records in categories that are called topics. For each topic that you want to access from Pega Platform, you must create a Kafka data set rule. When you configure a Kafka data set, you can select an existing topic in the target Kafka configuration instance, or you can create a topic if the Kafka cluster is configured to create topics automatically. Optionally, you can also specify the partition keys to apply to the data in distributed data flow runs, and whether to read historical Kafka records, that is, records that were published before the start of the real-time data flow run that references this Kafka data set.
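
The choice between reading historical records and reading only new records can be illustrated with a plain Kafka consumer setting. The sketch below is an assumption about how such a choice might map to a generic Kafka consumer, not a description of Pega Platform internals; the function name, group ID, and `read_historical` flag are hypothetical. The property `auto.offset.reset` is a standard Kafka consumer setting, and it takes effect only when the consumer group has no committed offset for the topic.

```python
def build_consumer_config(bootstrap_servers, group_id, read_historical):
    """Build consumer properties for one topic reader.

    Hypothetical helper; read_historical mirrors the idea of reading records
    that were published before the run started.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # "earliest" starts from the oldest retained record,
        # "latest" starts from records published after the consumer joins.
        "auto.offset.reset": "earliest" if read_historical else "latest",
    }


historical = build_consumer_config("kafka1.example.com:9092", "example-run", True)
realtime = build_consumer_config("kafka1.example.com:9092", "example-run", False)
print(historical["auto.offset.reset"], realtime["auto.offset.reset"])
```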

Figure: Kafka data set

For more information, see Creating a Kafka data set.

Kafka data sets in data flows

You can use Kafka data sets either as a source or a destination in a Data Flow rule. You can run data flows that reference Kafka data sets only in real-time mode. Because Kafka servers support partitioning, you can distribute data flow runs to process data across all the nodes that are configured as part of the Data Flow service, increasing the throughput and resiliency of data flow processing.
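
The way partitioning allows work to be spread across nodes can be sketched as follows. This is a simplified illustration under stated assumptions: the node names are hypothetical, the round-robin assignment is not necessarily the strategy that the Data Flow service uses, and the CRC32 hash is a stand-in for the murmur2 hash that the default Kafka partitioner applies to record keys.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a partition key to a partition number.

    Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, nodes):
    """Spread partitions over nodes in round-robin order (an assumed strategy)."""
    assignment = {node: [] for node in nodes}
    for partition in range(num_partitions):
        assignment[nodes[partition % len(nodes)]].append(partition)
    return assignment


nodes = ["node-a", "node-b", "node-c"]  # hypothetical Data Flow nodes
assignment = assign_partitions(6, nodes)
print(assignment)

# Records with the same key always map to the same partition,
# so one node sees all records for that key.
print(partition_for("customer-42", 6) == partition_for("customer-42", 6))
```

Because each partition is read by exactly one node in this sketch, adding nodes (up to the number of partitions) increases parallelism, and the partitions of a failed node can be reassigned to the remaining nodes.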
