Stream service node status information

Monitor the performance of each Stream node that is part of the Stream service by checking the metrics on the Stream tab. Detailed status information about a node is helpful when you need to troubleshoot the node.

  • Node ID - The identification number of the node in the cluster.

  • Disk usage - The disk space used by the Stream service on this node.

  • Free disk space - The remaining disk space that is allocated to this node.

  • Partition
    • Total - The number of partitions created in the Stream service.

    • Under-replicated - The number of partitions with at least one replica that is not synchronized with the leader. For example, under-replication can occur when a Stream node fails.
      Note: When you notice under-replicated partitions, check the status of your Stream nodes and troubleshoot them.
    • Offline - The number of partitions that do not have a leader. A partition can lose its leader when all brokers that host replicas of the partition are down, or when no synchronized replica can take leadership because of message count issues. When a partition is offline, the Stream service does not process messages for that partition.
      Note: When you notice offline partitions, check the status of your Stream nodes and troubleshoot them.
    • Leaders - The number of leaders that handle all of the read and write requests across all partitions. A partition can have only one leader. For more information, see the Apache Kafka documentation.

  • Incoming byte rate - The amount of traffic that comes in to the Stream service, shown as the incoming byte rate over specified periods of time and as the overall mean value.

  • Outgoing byte rate - The amount of traffic that leaves the Stream service, shown as the outgoing byte rate over specified periods of time and as the overall mean value.

  • Incoming message rate - The number of incoming records over specified periods of time and the overall mean value.

  • Processors
    • Network processors idle time - The average fraction of time that the network processor is idle.

    • Request handler threads idle time - The average fraction of time that the request handler threads are idle.

    Note: The idle time can have a value between 0 and 1, where 0 means that the processor is 100% busy and 1 means that the processor is 100% free. When the idle time is lower than 0.3, the processor is more than 70% busy, and a warning is displayed on the Stream tab. Check what is causing such high demand on the processor and consider adding Stream nodes.
  • Metrics
    • Replication max lag - The amount of elapsed time that the replica is allowed before it is considered to be out of synchronization. A replica can fall out of synchronization if it does not contact the leader for more messages.

    • Is controller - When its value is 1, the node is the active controller in this cluster. There can be only one active controller in the cluster.

    For more information about the node metrics, see the Apache Kafka documentation.
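
The checks described in the notes above can be sketched in a few lines of Python. This is only an illustration, assuming that you copy the values from the Stream tab by hand. The names NodeStatus, busy_percent, and findings are hypothetical and are not part of any product API. The 0.3 threshold comes from the idle time note.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the idle time note: below 0.3 a warning is displayed.
IDLE_WARNING_THRESHOLD = 0.3


@dataclass
class NodeStatus:
    """Hypothetical container for values read from the Stream tab."""
    node_id: int
    under_replicated: int   # Partition > Under-replicated
    offline: int            # Partition > Offline
    network_idle: float     # Network processors idle time (0..1)
    handler_idle: float     # Request handler threads idle time (0..1)


def busy_percent(idle_fraction: float) -> float:
    """Convert an idle fraction (0 = fully busy, 1 = fully free) to a busy percentage."""
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle time must be between 0 and 1")
    return round((1.0 - idle_fraction) * 100.0, 1)


def findings(status: NodeStatus) -> List[str]:
    """Return the items to investigate for a node; an empty list means none."""
    result = []
    if status.offline > 0:
        result.append(f"{status.offline} offline partition(s): check the Stream nodes")
    if status.under_replicated > 0:
        result.append(f"{status.under_replicated} under-replicated partition(s): check the Stream nodes")
    for label, idle in (
        ("network processors", status.network_idle),
        ("request handler threads", status.handler_idle),
    ):
        if idle < IDLE_WARNING_THRESHOLD:
            result.append(f"{label} are {busy_percent(idle)}% busy: consider adding Stream nodes")
    return result


if __name__ == "__main__":
    # Example values only; they are not taken from a real node.
    node = NodeStatus(node_id=1, under_replicated=2, offline=0,
                      network_idle=0.25, handler_idle=0.8)
    for line in findings(node):
        print(line)
```

An idle time of exactly 0.3 does not produce a finding in this sketch, because the note describes the warning as applying when the value is lower than 0.3.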