As part of our “Experts Insights” conversation series, we speak with recognised industry experts to seek their insights, knowledge, experience, and advice on key business and technology themes across a broad range of industry and market sectors.
This CEO conversation is with industry expert and thought leader Dez Blanchfield, CEO of Promereon Group. We focus our discussion on the critical topic of Disaster Recovery and the many aspects surrounding this mission critical capability, which all organisations should have in place but all too often do not.
The following is what Mr Blanchfield had to share on the key questions we put to him on the topic of Disaster Recovery. Please join the conversation on your social network platform of choice by following the hashtag #CEOconversations, post your questions and feedback, and engage with some likes and shares to be part of the conversation.
What are the advantages and disadvantages of using the cloud for disaster recovery?
One of the greatest advantages of using cloud platforms for disaster recovery is the flexibility that comes with “pay as you go” as a Service ( aaS ) offerings. You can scale up as you grow, and your environments can be built as required using modern Design Patterns and DevOps orchestration, whether manual or automated.
Another key advantage of using cloud platforms for disaster recovery is the ability to quickly replicate an environment in the cloud for testing, training, or any number of requirements where a full or partial copy of your production environment might be used, e.g. upgrades, version validation, new feature updates, and stress or load testing. Afterwards the copy can be “burned down” instantly, and you only pay for what you used while it was stood up and running.
One of the biggest disadvantages of using cloud platforms for disaster recovery is often the interconnect between on-premises environments and cloud platforms, and the latency between them. This can often impact an organisation where there are high volumes of data, or simply a large dataset which may take a long time to copy or move into a cloud platform, e.g. moving a large multi-terabyte backup of a database to a disaster recovery environment hosted on a cloud platform could take hours or even days to copy “up” into the cloud.
If an organisation is offline and suffering an outage, and copying a database back to a cloud platform to fail over to their disaster recovery environment is delayed by hours or days of data upload or copy time before the disaster recovery can be executed, it would in effect render the disaster recovery solution of little or no value.
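To put rough numbers on the upload problem described above, here is a minimal back-of-the-envelope sketch in Python. The 10 TB backup size, the link speeds, and the 80% link efficiency are illustrative assumptions, not figures from the interview, and the function name is hypothetical.

```python
def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to copy size_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal link rate actually achieved.
    """
    size_bits = size_tb * 1e12 * 8                # decimal terabytes -> bits
    effective_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    return size_bits / effective_bps / 3600


# Hypothetical 10 TB database backup over three assumed link speeds.
for mbps in (100, 1_000, 10_000):
    print(f"{mbps:>6} Mbps: {transfer_hours(10, mbps):.1f} hours")
```

Under these assumptions the copy takes about 278 hours (more than 11 days) at 100 Mbps, about 28 hours at 1 Gbps, and just under 3 hours at 10 Gbps, which is why the size of the dataset and the interconnect need to be considered before an outage rather than during one.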
Is the public cloud “tough enough” (to be used as a target) for disaster recovery?
Yes, the majority of cloud platforms are indeed “tough enough” to be used to implement and host disaster recovery environments and supporting infrastructure. Cloud providers by design build their data centers and core hosting infrastructure to the highest levels of availability. In most cases cloud providers operate to five nines availability for their own protection in the provision of hosting infrastructure services such as Infrastructure as a Service ( IaaS ) or Platform as a Service ( PaaS ).
As a result of this laser focus on the availability of the underpinning infrastructure offered by the majority of cloud providers, organisations that leverage the core strengths of cloud platforms, through world's best practice design principles and design patterns suited to the best features of cloud services, can in turn gain significant commercial and technical advantage by employing IaaS and PaaS offerings combined with the capabilities of DevOps automation and orchestration.
It is entirely likely that, for the most part, cloud providers have more durable data center, network, storage, server compute, monitoring, maintenance, update, upgrade and support capabilities than most organisations have themselves in any combination of in-house or 3rd party computer room and/or data center environments. Cloud hosting providers design to support millions of customers on their platforms, at scale, in high performance architectures. It is unlikely any enterprise of any form can claim to be even remotely similar in capabilities with their own in-house or 3rd party technology.
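For readers unfamiliar with the “five nines” figure mentioned in this answer, the short Python sketch below converts an availability level into the downtime it permits per year. It assumes a 365-day year, and the function name is hypothetical; it illustrates the arithmetic only and says nothing about any particular provider's actual service levels.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime permitted by an availability of `nines` nines (5 -> 99.999%)."""
    unavailable_fraction = 10 ** -nines
    return MINUTES_PER_YEAR * unavailable_fraction


for n in (3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes per year")
```

Three nines allows roughly 526 minutes (close to nine hours) of downtime a year, four nines about 53 minutes, and five nines a little over 5 minutes.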
What are the biggest DR challenges for enterprises and large organizations?
Choosing the most appropriate design pattern is probably the biggest challenge in disaster recovery for large organisations. For example, should they design to an Active-Active model or an Active-Passive model, and should they do batch data updates or live database syncing?
Other key challenges in providing disaster recovery capabilities include the likes of:
- Sizing and scaling to meet or match ongoing growth and workloads of an organisation, as and when the organisation needs to scale up or down depending on commercial or market forces, i.e. scaling up for holiday periods if you are in retail, or scaling down during off peak market or seasonal periods
- Protection and controls around core Information Technology Security Management ( ITSM ) measures around the disaster recovery environment, i.e. ensuring the disaster recovery environment security is equal to the production environment’s ITSM requirements
- Maintaining consistency between production environments which experience ongoing updates in new features, developments, updates, maintenance or patching – keeping a disaster recovery environment in-sync with production as production changes is a significant challenge
- Continued support and sponsorship from the Board down through C-Suite and Lines of Business is often a real challenge. Many organisations find it difficult to understand the need for continued investment in disaster recovery if they do not experience issues or outages in production environments. It is only when a major outage in a production environment comes about that the failure to invest in and maintain bona fide disaster recovery capabilities is realised, by which time it can often be a case of too little too late.
What are your disaster recovery predictions for the next 12 to 18 months?
Global data protection and privacy regulations are already driving significant change in how organisations manage and protect their data and supporting systems and infrastructure, which will in my opinion cause organisations and businesses of every type, shape and size, be it enterprise, government or not for profit, small, medium or large, to bring an entirely more formal focus to disaster recovery.
I truly believe we will even see a dedicated resource and role be defined. In the same way we have seen Cyber Security bring about the need for the likes of a Chief Risk Officer, Data Protection Officer, and Chief Information Security Officer, the likes of the EU GDPR will in my opinion bring about a need for the creation of a role such as Chief Disaster Recovery Officer, and an entirely new business practice built around the protection of organisational resilience, supported by bona fide disaster recovery capabilities beyond anything we have seen to date.
What would you include in a disaster recovery checklist?
In any disaster recovery checklist I always include the following three key components:
- Clear understanding of the uptime and availability required to ensure the business can continue operations, usually measured in the form of a Service Level Agreement ( SLA ) of some form between the I.T. department and the business or client.
- Regular monitoring, measures and review tools and processes by which the disaster recovery solution is maintained and managed. A disaster recovery solution which is not managed is likely to quickly fall out of sync with the business requirement and fail to meet the required SLA agreed with the business or client.
- Scheduled ongoing fail-over from production to disaster recovery to prove that the disaster recovery solution not only works, but can be failed to, and then restored from. Ideally at least once a year, if not two or three times a year, the production environments should be failed over in a managed process to the disaster recovery environment and left to run for a period of time, after which it is clear that all systems are still functioning as required should the business need to remain on the disaster recovery environment for any period of time. Any business which cannot regularly execute a full fail-over from production to disaster recovery, run for a sustained period of time, and then restore back to production cannot truly say it has a fully functional and effective disaster recovery solution.
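The third checklist item above lends itself to a simple automated check. The following is a minimal Python sketch, not a definitive implementation: the `FailoverDrill` record, the `drill_programme_ok` function, and the 24-hour minimum run time are all hypothetical assumptions introduced for illustration. Only the “at least once a year” cadence comes from the checklist itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class FailoverDrill:
    """Hypothetical record of one managed fail-over exercise."""
    performed_on: date
    hours_run_on_dr: float        # how long the business ran on the DR environment
    restored_to_production: bool  # whether fail-back to production succeeded


def drill_programme_ok(drills, today, max_gap_days=365, min_hours_on_dr=24.0):
    """True if a full drill (fail over, sustained run, fail back) is recent enough.

    max_gap_days reflects "at least once a year"; min_hours_on_dr is an assumed
    threshold for "a sustained period of time".
    """
    cutoff = today - timedelta(days=max_gap_days)
    return any(
        d.performed_on >= cutoff
        and d.hours_run_on_dr >= min_hours_on_dr
        and d.restored_to_production
        for d in drills
    )


history = [FailoverDrill(date(2024, 1, 10), 48.0, True)]
print(drill_programme_ok(history, today=date(2024, 6, 1)))  # recent full drill
print(drill_programme_ok(history, today=date(2025, 6, 1)))  # drill now over a year old
```

The first call prints `True` and the second prints `False`, since by June 2025 the only recorded drill is more than a year old. A check like this could feed the regular monitoring and review process described in the second checklist item.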
What is your best (or worst) IT disaster recovery story?
The worst disaster I have witnessed over the last three decades involved a boutique printing business. Due to a mechanical failure in a high speed large format printing press, a fire broke out and burned the entire business premises to the ground, destroying all physical and logical components of the business: all of their card and paper stock, inks, office environments and technology, including self hosted production and disaster recovery environments.
They had hosted their disaster recovery environment in the same building they operated out of, in the same computer room in which the production business systems infrastructure and environments were hosted.
Once the fire destroyed the production business systems environment, it also in turn destroyed the mirrored disaster recovery environment in the same racks in the same computer room on-premises. This was unfortunately the worst possible outcome and the single worst disaster recovery story I’ve witnessed in three decades of working in the technology industry.
The best success story I have had the privilege of designing, developing and witnessing go into action in a live, real scenario was one where a client engaged me to design and implement an updated disaster recovery capability for a family business in the food industry.
Just two weeks after completing the implementation and deployment of the newly designed, built and activated disaster recovery solution, the customer allowed a software vendor to deploy an upgrade of a mission critical Enterprise Resource Planning ( ERP ) platform. The software update failed part way through, crashed the entire ERP system, and corrupted the production copy of data and databases, taking the business offline with a need to fail over to the newly implemented disaster recovery solution.
The new failover process, a manual process to “switch” from the production ERP platform to the new disaster recovery ERP environment ran seamlessly and smoothly, in mere minutes, avoiding what could have been a lengthy outage of a mission critical business system.
The outcome was the business was able to execute a manually triggered failover from the corrupted production instance, to the disaster recovery instance of the ERP platform in mere minutes, so seamlessly that staff were not aware that there had been a failover to the disaster recovery environment and staff were able to continue to do their jobs and the business was able to continue to function with zero downtime and zero impact to operations or customers.
The success of the disaster recovery plan, strategy and solution allowed the business to avoid what could have been a very bad outcome had the new disaster recovery solution I had designed and implemented not done what it had been designed to do, namely to instantly pick up the production workload with a real-time, live-synced, up-to-date mirror copy of the production system, software and databases. A very good outcome for the business, and indeed a very rewarding outcome for me, having designed and implemented the solution a mere two weeks prior to the incident.
Follow Mr Blanchfield on LinkedIn, Twitter and Threads, and engage with him as part of our CEO Conversations series via the #CEOconversations hashtag, and be sure to click “Follow” to be informed of each new article in our CEO Conversations series and supporting upcoming events.