The value of operating applications on a Platform-as-a-Service, such as Pega Cloud Services, is that Pega shares the responsibility for service reliability with you. This differs from on-premises deployments, and even from infrastructure-as-a-service, where clients bear most, if not all, of the burden of maintaining uptime.
To deliver a highly available service, the Pega Cloud Service reliability team monitors the health of Pega Cloud Services environments with 24x7x365 support from strategically located Service Reliability Centers and a Service Operations Center. The team, a collection of Pega application experts, database administrators, service reliability engineers, and security engineers, delivers Pega-specific expertise to manage your Pega Cloud Services environments. To ensure service reliability, the team manages client communications, performs routine maintenance, and responds to automatically generated alerts, routine maintenance activity, and service requests tracked with the GCS ticketing system, which is built using Pega Platform™.
Pega performs several types of monitoring, including:
- The health of the environment – see Monitoring architecture
- The infrastructure resources – see Infrastructure monitoring
- The applications – see Pega Predictive Diagnostic Cloud
The Pega Cloud Service reliability team uses industry-leading monitoring tools to translate infrastructure and application activity into real-time dashboards.
Pega Cloud Services monitoring tools gather data at each level of the client’s environment: the client’s private networks and subnets, database, application servers, the Pega Platform™ layer, and client-configured applications. Monitoring tools transform the raw infrastructure and application data into useful models. Data models are then translated into real-time service impact maps, which are designed to capture Key Performance Indicators (KPIs) of the health of Pega Cloud Services environments.
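As a rough illustration of how per-layer data might be rolled up into a single environment-level indicator, the following sketch takes a health status for each layer named above and reports the worst one. The `Health` levels, the layer keys, and the worst-status rule are all assumptions made for this example; they do not describe the actual Pega Cloud Services tooling or its data models.

```python
from enum import IntEnum
from typing import Mapping


class Health(IntEnum):
    """Hypothetical health levels, ordered from best to worst."""
    OK = 0
    DEGRADED = 1
    DOWN = 2


def rollup(layer_health: Mapping[str, Health]) -> Health:
    """Return the worst status reported by any layer (assumed rollup rule)."""
    return max(layer_health.values())


# Illustrative readings for the layers listed in the text (values are made up).
layer_health = {
    "network": Health.OK,
    "database": Health.DEGRADED,
    "application_server": Health.OK,
    "pega_platform": Health.OK,
    "client_application": Health.OK,
}

environment_status = rollup(layer_health)
print(environment_status.name)
```

A worst-status rollup is only one possible design; a real service impact map could weight layers differently or track several KPIs per layer.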
The Pega Cloud Service reliability team views the health of all environments in a situational awareness dashboard. This dashboard displays real-time environment metrics for Pega Cloud Services environments, which allows service reliability engineers to perform root cause analyses on system activity.
The Pega Cloud Service reliability team continuously monitors all client environments for infrastructure component issues and failures, system performance degradation, and resource utilization issues. Examples of component and system performance monitoring include:
- Server responsiveness – Proactive, regular ping to verify server responsiveness
- CPU utilization – Proactive monitoring with alert when utilization exceeds threshold
- Memory utilization (buffered and cached) – Proactive monitoring with alert when utilization exceeds threshold
- Disk utilization – Proactive monitoring with alert when utilization exceeds threshold
- Application server accessibility – wget/curl, alert if no response
- Application server – wget/curl and JMX, alert if no response
- Application server heap utilization – wget/curl, alert when utilization exceeds threshold
- Database query – Proactive, regular database query; alert if no response
- Database table space utilization – Proactive, regular database query; alert when utilization exceeds threshold
- Database errors – Proactive, regular database query; alert if error returned
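The checks above follow two patterns: alert when a probe gets no response, and alert when a reading exceeds a threshold. The following minimal sketch shows both patterns. The `Check` class, the metric names, and the threshold values are hypothetical and chosen only for illustration; the source does not state the actual thresholds or tooling that Pega uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    name: str
    # None means the check alerts only when there is no response.
    threshold: Optional[float] = None


def evaluate(check: Check, reading: Optional[float]) -> Optional[str]:
    """Return an alert message, or None when the reading is healthy.

    A reading of None models a probe that received no response.
    """
    if reading is None:
        return f"{check.name}: no response"
    if check.threshold is not None and reading > check.threshold:
        return f"{check.name}: {reading:.1f} exceeds threshold {check.threshold:.1f}"
    return None


# Hypothetical checks and readings; thresholds are assumptions.
checks = [
    Check("cpu_utilization_pct", 85.0),
    Check("disk_utilization_pct", 90.0),
    Check("app_server_http"),
]
readings = {
    "cpu_utilization_pct": 91.2,
    "disk_utilization_pct": 40.0,
    "app_server_http": None,  # probe timed out
}

alerts = [
    message
    for check in checks
    if (message := evaluate(check, readings[check.name])) is not None
]
for message in alerts:
    print(message)
```

In this sketch, the CPU reading and the unresponsive application server each produce an alert, while the disk reading stays below its threshold and produces none.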
Pega database administrators custom-configure system alerts to catch early signs of database CPU usage spikes, database storage issues, and database connectivity issues. Alerts automatically notify our 24x7x365 on-call operations team when database thresholds are reached so that our service reliability engineers can remediate potential issues.
The Pega Cloud Services reliability team has strategically located security and network operations facilities around the world. Our security and networking experts monitor and manage your environment 24x7x365 using a follow-the-sun operations methodology. For more information on how Pega Cloud Services ensures your environment is secure, see Layered distributed denial of service protection in Pega Cloud Services.
The Pega Cloud Services reliability team monitors each infrastructure component of a client's environment, while clients and the Pega Cloud Services reliability team share the responsibility of monitoring application health using Pega Predictive Diagnostic Cloud (PDC). PDC is an intelligent agent that predicts, prioritizes, and notifies administrators about the health of your Pega applications. It leverages artificial intelligence to provide operations teams with a prioritized list of action plans that help ensure your system’s reliability. This combination of Pega-specific expertise and industry-leading monitoring tools enables our clients to focus on developing healthy, high-performing applications. For details, see Monitoring your application using Pega PDC.
Clients can also access PDC to monitor the health of their Pega applications. Pega Platform™ provides application guardrails that notify development teams immediately when an application configuration does not meet Pega development best practices. PDC monitors both your production environment and your sandbox environments, so you can understand the health of your Pega applications before you go live.
Pega application guardrails help ensure your applications are in sync with the compliance requirements of your Pega Cloud Services support agreement. For more information, see Improving your compliance score.