skapj
ServiceNow Employee

Whether you are part of IT operations, a BCM practitioner, a compliance specialist, or a member of the security team, one key question is likely to cross your mind: how does ServiceNow provide service resiliency for our cloud services?

 

Backup and Recovery

First of all, unlike a traditional IT infrastructure setup, where backups are used to recover services at a secondary or DR site, ServiceNow does not rely on our backup and recovery process for cloud service resiliency, i.e. disaster recovery.

 

The primary purpose of our backup and recovery mechanism is to give our customers the capability to recover their data in scenarios such as a customer inadvertently deleting data, or a customer's data integration or automation being misconfigured or malfunctioning. Either scenario could leave data unusable or inaccessible.

 

Frequency of Backups

Full backups are taken weekly, with differential backups taken daily in between. Backups are retained for a minimum of 14 days. An extended 28-day retention period is also available for purchase.
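
To make the weekly-full plus daily-differential pattern concrete, here is a minimal Python sketch of which backups a restore to a given day would draw on. It is illustrative only and not ServiceNow's actual tooling: the function name `backups_needed` is hypothetical, the choice of Sunday for the weekly full is an assumption (the schedule day is not stated here), and the sketch only checks that the restore point itself falls within the standard 14-day window.

```python
from datetime import date, timedelta

FULL_BACKUP_WEEKDAY = 6  # assumption: Sunday (Monday == 0); the actual day is not stated
RETENTION_DAYS = 14      # standard retention period


def backups_needed(restore_day: date, today: date) -> list[tuple[str, date]]:
    """Return the (kind, day) backups a restore to restore_day would use.

    A differential backup holds every change since the last full backup,
    so a restore needs at most one full plus one differential.
    """
    if restore_day > today or (today - restore_day).days > RETENTION_DAYS:
        raise ValueError("restore point is outside the retention window")
    # Days elapsed since the most recent weekly full backup.
    days_since_full = (restore_day.weekday() - FULL_BACKUP_WEEKDAY) % 7
    chain = [("full", restore_day - timedelta(days=days_since_full))]
    if days_since_full:
        chain.append(("differential", restore_day))
    return chain
```

Under these assumptions, a restore to a Wednesday would use the previous Sunday's full backup plus that Wednesday's differential, while a restore to a Sunday needs only the full backup.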

 

Security of Backups

All backups are encrypted with AES-256, written to disk (gone are the days of tape backups, labelling, packing, scheduling transport, etc.), and stored onsite only, within the same data center pair in which the customer's instances are hosted. Backups are not sent offsite, and no third-party service provider is involved.

 

So how does ServiceNow handle Disaster Recovery, and what is AHA?

For the Gen X among us, AHA is not the band, often considered a one-hit wonder, that rose to fame in the mid-80s. ServiceNow AHA stands for Advanced High Availability: ServiceNow's data centers are arranged in pairs, with all customer production data hosted in both data centers simultaneously and kept in sync using asynchronous database replication. Within each regional data center pair, there is no concept of a fixed primary location for any customer instance.

 

Geographic Resiliency

Both data centers in a pair are geographically dispersed and run in an active-active setup, so service can be recovered quickly by either a transfer or a failover between the sites.

 

RTO and RPO 

Leveraging ServiceNow AHA, our RTO is 2 hours and our RPO is 1 hour. ServiceNow tests our ISCP (Cloud DRP) annually and provides both the SOP and the test report via CORE; the results have consistently come in well within the RTO and RPO targets.
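
For readers less familiar with the two metrics, the following Python sketch shows how a single failover event could be measured against them. It is a hedged illustration, not how ServiceNow evaluates its tests: the function `evaluate_failover` and its three timestamp parameters are hypothetical names. RTO bounds how long service is unavailable; RPO bounds how much recent data may be lost, which with asynchronous replication corresponds to the replication lag at the moment of the outage.

```python
from datetime import datetime, timedelta

RTO = timedelta(hours=2)  # recovery time objective
RPO = timedelta(hours=1)  # recovery point objective


def evaluate_failover(last_replicated: datetime,
                      outage_start: datetime,
                      service_restored: datetime) -> dict:
    """Compare one (hypothetical) failover event against the RTO and RPO."""
    downtime = service_restored - outage_start       # measured against RTO
    data_loss_window = outage_start - last_replicated  # measured against RPO
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "meets_rto": downtime <= RTO,
        "meets_rpo": data_loss_window <= RPO,
    }
```

For example, an outage at 10:00 with the last replicated transaction at 09:58 and service restored at 10:25 would show 25 minutes of downtime and a 2-minute data loss window, both within target.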

 

Availability SLA

ServiceNow customer production instances (of the Subscription Service) have a contracted Availability SLA of 99.8% during a calendar month.
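
As a rough feel for what 99.8% means in practice, this small Python sketch converts the percentage into a downtime budget for a given calendar month. It is raw arithmetic only; the function name `downtime_budget_minutes` is hypothetical, and how availability is actually measured is defined by the contract, not by this sketch.

```python
import calendar

SLA = 0.998  # 99.8% availability per calendar month


def downtime_budget_minutes(year: int, month: int, sla: float = SLA) -> float:
    """Minutes of unavailability in a month that still satisfy the SLA."""
    days_in_month = calendar.monthrange(year, month)[1]
    total_minutes = days_in_month * 24 * 60
    return round(total_minutes * (1 - sla), 2)
```

For a 30-day month this works out to 43,200 minutes × 0.002 = 86.4 minutes, a little under an hour and a half.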

 

Conclusion

In summary, ServiceNow data centers and cloud-based infrastructure are designed to be highly available, with redundant components and multiple network paths to avoid single points of failure. ServiceNow data centers are arranged in pairs, with all customer production data hosted in both data centers simultaneously, so that AHA can provide quick service recovery and geographic resiliency. The standard backup mechanism is also in place to provide data recovery in certain situations, complementing our AHA capability for a holistic setup.

 

I hope that this blog post gives you a quick, high-level overview of the ServiceNow backup and recovery mechanism and our AHA cloud resilience capability, and possibly answers a few FAQs in this area at the same time. For more information and details, the following resources are available:

 

 

Thank you for your time, and have a good day!
