Ensure that your Stream service operates without causing any errors by regularly monitoring the status of your Stream nodes, partitions, disk space, CPU usage, and database availability.
You can do a simple, manual status check of the Stream service by going to the Stream landing page.
In the header of Dev Studio, click.
- Status of each stream node is NORMAL.
- There are no offline partitions.
- There are no under-replicated partitions.
You might see some under-replicated partitions from time to time; however, the value must stay at 0 most of the time. If you constantly see under-replicated partitions, the Stream service might be undersized. In this case, check the CPU usage of your Stream nodes. If CPU usage averages above 80%, consider scaling up your Stream node cluster.
- There is exactly one node for which the Is controller parameter is set to 1.
If you see no controller nodes, or more than one controller node, perform a rolling restart of the Stream nodes. A controller node is responsible for managing the states of partitions and replicas, and for performing administrative tasks such as reassigning partitions in case of a failure.
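The manual checks above can also be expressed as a small script. The following is a minimal sketch, assuming you have already copied the landing page values into a simple data structure. The `StreamNode` class, its field names, and the `check_stream_nodes` function are hypothetical and are not part of any Pega API.

```python
from dataclasses import dataclass


@dataclass
class StreamNode:
    """Hypothetical snapshot of one node as shown on the Stream landing page."""
    name: str
    status: str                       # expected to be "NORMAL"
    offline_partitions: int           # expected to be 0
    under_replicated_partitions: int  # expected to be 0 most of the time
    is_controller: int                # 1 for the controller node, otherwise 0


def check_stream_nodes(nodes):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for node in nodes:
        if node.status != "NORMAL":
            problems.append(f"{node.name}: status is {node.status}, expected NORMAL")
        if node.offline_partitions > 0:
            problems.append(f"{node.name}: {node.offline_partitions} offline partition(s)")
        if node.under_replicated_partitions > 0:
            problems.append(
                f"{node.name}: {node.under_replicated_partitions} under-replicated partition(s)"
            )
    # Exactly one node in the cluster must report Is controller = 1.
    controllers = sum(1 for node in nodes if node.is_controller == 1)
    if controllers != 1:
        problems.append(
            f"expected exactly 1 controller node, found {controllers}; "
            "consider a rolling restart"
        )
    return problems
```

Because occasional under-replicated partitions are acceptable, treat a single report of them as a warning and act only if they persist across repeated checks.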
For more information, see Operating the Stream service.
Monitor Pega Platform instances by associating a health check with every node in a cluster. Take unhealthy nodes out of the service and restart them.
For more information, see Verifying that an instance is running.
The Stream service uses the disk to store data. Monitor disk usage to prevent running out of disk space. If a Stream node runs out of disk space, it stops functioning until space is available again. This situation leads to data loss and overall service instability.
Monitor the available disk space and take action, for example, by increasing disk space, when free space drops below 30%.
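As an illustration of the 30% threshold, the following minimal sketch uses the Python standard library function `shutil.disk_usage` to check free space on a volume. The function names and the path that you pass in are assumptions; point the check at the volume where your Stream node actually stores its data.

```python
import shutil

# From the guidance above: act when free space drops below 30%.
FREE_SPACE_THRESHOLD = 0.30


def free_fraction(total_bytes, free_bytes):
    """Return free space as a fraction of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def disk_needs_attention(path, threshold=FREE_SPACE_THRESHOLD):
    """Return True when the volume that contains path has too little free space."""
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) < threshold
```

In practice, you would run such a check on a schedule on every Stream node and raise an alert when it returns True.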
Monitor the CPU usage on every Stream service node. If the average CPU usage is above 70%, consider provisioning a bigger node.
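The CPU rule can be sketched in the same way. This example assumes that you already collect CPU usage samples, in percent, from your own monitoring tool. The function name and the sampling approach are hypothetical.

```python
from statistics import mean

# From the guidance above: consider a bigger node when average usage exceeds 70%.
CPU_THRESHOLD_PERCENT = 70.0


def cpu_needs_bigger_node(samples, threshold=CPU_THRESHOLD_PERCENT):
    """Return True when the average of the CPU samples (in percent) exceeds the threshold."""
    if not samples:
        raise ValueError("at least one CPU sample is required")
    return mean(samples) > threshold
```

Averaging over a window, rather than reacting to a single sample, avoids resizing a node because of a short spike.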
The Stream service relies on the availability of the Pega Platform database. A slow or unavailable database might have a serious impact on the Stream service.
Monitor queries to the following tables:
If queries take more than 1 second, consider tuning your database.
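One way to spot slow queries is to time them at the call site. The following minimal sketch wraps any callable that runs a query and flags it when it exceeds the 1-second threshold. The `timed_query` function and the `run_query` callable are hypothetical; substitute whatever database client you use.

```python
import time

# From the guidance above: queries that take more than 1 second suggest tuning.
SLOW_QUERY_SECONDS = 1.0


def timed_query(run_query, threshold=SLOW_QUERY_SECONDS):
    """Run a query callable and return (result, elapsed_seconds, is_slow)."""
    start = time.perf_counter()
    result = run_query()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed > threshold
```

Many databases also provide their own slow-query logging, which might be a better fit than timing queries in the client.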
If the database becomes unavailable, whether planned or unplanned, consider restarting your Stream nodes, especially if they appear unhealthy.