Recovering a node

On-premises clients can restart an unavailable node by performing the node recovery procedure. This task is not supported for nodes running in Pega Cloud Services environments or client-managed cloud environments, where the deployment automatically manages any required node recovery.

  1. Decommission the failed node:
    1. In the header of Dev Studio, click Configure > Decisioning > Infrastructure > Services.
    2. Select the service with the failed node by clicking the corresponding tab.
    3. For the failed node, in the Execute list, select Decommission.
  2. Fix the root cause of the failure.
    For example, replace the failed hardware parts or the entire node.
  3. Recover the data:
    • If the data was previously owned by the failed node and is available on replica nodes, delete the Cassandra commit log and data folders on the failed node.
    • If the data was previously owned by the failed node and is not available on any replica node, perform data recovery from a backup file.
  4. Restart the node and add it back to the applicable service.
    For more information, see Enabling decision management services.
  5. Run the nodetool repair operation.
  6. Remove unused key ranges by running the nodetool cleanup operation on all decision management nodes.
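The last two steps can be scripted. The following is a minimal sketch, not a supported Pega utility. It assumes that nodetool is on the PATH, that the nodes accept remote nodetool connections through the standard -h option without authentication, and that the repair targets the recovered node (the procedure does not state where to run it). The host names and all function names are hypothetical. By default the script only prints the commands so that they can be reviewed first.

```python
import subprocess
from typing import List, Sequence


def repair_command(host: str) -> List[str]:
    """Build the nodetool repair command for one node."""
    return ["nodetool", "-h", host, "repair"]


def cleanup_commands(hosts: Sequence[str]) -> List[List[str]]:
    """Build one nodetool cleanup command per decision management node."""
    return [["nodetool", "-h", host, "cleanup"] for host in hosts]


def recovery_plan(recovered: str, all_nodes: Sequence[str]) -> List[List[str]]:
    """Order the commands as in the procedure: repair first, then cleanup."""
    return [repair_command(recovered)] + cleanup_commands(all_nodes)


def run_plan(plan: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the plan as soon as one command fails.
            subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    # Hypothetical host names; replace with the nodes of your service.
    nodes = ["dds-node-1", "dds-node-2", "dds-node-3"]
    run_plan(recovery_plan("dds-node-2", nodes))
```

Calling run_plan with dry_run=False executes the commands in sequence and stops at the first failure, so the cleanup does not start if the repair did not complete.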