What is disaster recovery?
Disaster recovery is the process and planning associated with building a data strategy that proactively acknowledges risk.
Data is now more valuable and important than ever before, so businesses need to ‘hope for the best, prepare for the worst.’
For this reason, organisations must understand disaster recovery (DR) in the context of their entire IT strategy. Doing so shows that DR sits within a wider discussion of:
- Cloud backup
- Business continuity
These components must be tailored and practical, so that they protect and, where needed, extend your infrastructure.
Cloud-based disaster recovery
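Part of this process is assessing your current disaster recovery provision against two key metrics: your Recovery Time Objective (RTO), the maximum acceptable time to restore service after an outage, and your Recovery Point Objective (RPO), the maximum acceptable data loss measured in time. These objectives then feed into your organisation’s change management process.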
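To make that comparison concrete, here is a minimal Python sketch, assuming a provision that takes periodic backups and whose restore time has been measured in a DR test. The `DrProvision`, `DrObjectives` and `assess` names are hypothetical, and the model is deliberately simple: worst-case data loss is taken as one backup interval (ignoring how long each backup takes to complete), and the measured restore time stands in for achievable downtime.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrProvision:
    """Hypothetical summary of an existing backup and restore setup."""
    backup_interval: timedelta   # time between successive backups
    restore_duration: timedelta  # time taken to restore service in a DR test


@dataclass
class DrObjectives:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable data loss, measured in time


def assess(provision: DrProvision, objectives: DrObjectives) -> dict[str, bool]:
    # A failure just before the next scheduled backup loses roughly one full
    # interval of data, so the interval approximates the worst-case data loss.
    worst_case_data_loss = provision.backup_interval
    # The restore time measured in a DR test approximates achievable downtime.
    worst_case_downtime = provision.restore_duration
    return {
        "meets_rpo": worst_case_data_loss <= objectives.rpo,
        "meets_rto": worst_case_downtime <= objectives.rto,
    }


if __name__ == "__main__":
    nightly = DrProvision(backup_interval=timedelta(hours=24),
                          restore_duration=timedelta(hours=3))
    targets = DrObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
    print(assess(nightly, targets))  # {'meets_rpo': False, 'meets_rto': True}
```

Here a nightly backup comfortably meets a 4-hour RTO but cannot meet a 1-hour RPO; closing that gap means backing up or replicating more often, not restoring faster.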
In practice, this means translating what we collectively know about DR into a mindset that also covers data sovereignty and security objectives.
While DR may appear to concern only selected data, it directly affects what your workforce is able to do.
If you run an internal application from a private data centre, an outage could cost you revenue.
Unfortunately, this happens far too frequently, particularly with widely used SaaS applications, leaving people checking status websites for updates.
Ideally, IT should prevent such situations from occurring in the first place.
Disaster recovery benefits
Beyond the benefits of DR described above, there are other key advantages worth acknowledging.
Rather than thinking in purely technical terms, it is also valuable to consider the economic threats.
If your cloud backup services are offsite, your data is housed away from any disaster that might strike your office.
Offsite cloud backup also means you are ready to recover quickly from a disaster without paying for expensive standby systems. Because this recovery is automated, both the CTO and the CFO can be far more forward-thinking in their technology planning.
Disaster recovery in Azure
As businesses increasingly need to operate globally, it becomes important for cloud providers to offer built-in failover and disaster recovery capabilities.
Azure covers this with ease.
Microsoft have created regional and global failover options, hot and cold standby models, and rolling reboot capabilities that work out of the box. These capabilities go far beyond a plain storage option.
This has led industry-leading professionals to remark:
“We conduct tests to refine our business continuity plans in Azure Site Recovery, which runs in Microsoft Azure, as frequently as we want. This will give a new level of comfort to the business, especially around Tier 1 applications that directly influence the company’s ability to generate revenue or that protect the integrity of our brand through a consistent, always-available, quality service in the shops and online.”
Disaster recovery examples
The list of DR incidents is long and spans every industry; here are three noted by Dirk Anderson, CTO at RedPixie.
Each began with a fairly simple error, yet brought companies, large and small, to a temporary standstill, with both technical and business impact.
1. Admin accidentally deletes the DNS primary zone
For such a minor accident, the technical impact was significant: every node on the network (over 15,000 desktops, switches, servers, appliances, printers and storage devices) kept running, but none of them could communicate with one another.
Because the company was an investment and asset management business, the business impact was sizeable:
- The event happened in the middle of the UK trading day and at the start of the NYC trading day
- The entire business couldn’t trade
- Many disaster recovery and business continuity management (BCM) processes, including this organisation’s, did not work and could not be invoked, because they also relied on DNS.
Recovery took over 4 hours, but the human cost will stay with some of those involved for much longer.
Recovery method: an authoritative Active Directory (AD) restore, plus some manual work (using PsExec and DHCP changes) to get the AD/DNS nodes to see one another.
Lessons learned:
- Have fewer DNS/AD admins
- Aggressive AD replication (2 mins) is both a positive and a negative: it spreads good changes quickly, but it spreads mistakes just as fast
- Keep audit records of AD and DNS changes
- Consider AD lag sites or other AD/DNS recovery methods that do not rely on AD/DNS itself (to avoid a chicken-and-egg scenario)
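The trade-off between fast replication and lag sites can be illustrated with a toy simulation. This Python sketch is a hypothetical model, not how AD replication or AD lag sites are actually implemented: a zone is a dictionary of record names to IP addresses, and each `Replica` applies incoming changes only after a configurable lag in minutes. A replica with no lag loses the zone as soon as the mistake is made, while a lagged copy still holds the old records long enough to be frozen and used for recovery, without depending on the broken AD/DNS itself.

```python
from collections import deque
from typing import Optional

# (record name, new IP address), or (record name, None) to delete the record
Change = tuple[str, Optional[str]]


class Replica:
    """Toy model of a DNS replica that applies changes `lag` minutes late.

    lag=0 models fast, aggressive replication; lag>0 models a lag site.
    Hypothetical illustration only.
    """

    def __init__(self, records: dict[str, str], lag: int = 0) -> None:
        self.records = dict(records)  # private copy of the zone
        self.lag = lag
        self._queue: deque = deque()  # (due minute, change) pairs

    def receive(self, now: int, change: Change) -> None:
        # Queue the change so it becomes due `lag` minutes after it was made.
        self._queue.append((now + self.lag, change))

    def tick(self, now: int) -> None:
        # Apply every queued change that is due, in the order it arrived.
        while self._queue and self._queue[0][0] <= now:
            _, (name, value) = self._queue.popleft()
            if value is None:
                self.records.pop(name, None)
            else:
                self.records[name] = value

    def freeze(self) -> None:
        # Incident response: discard changes that have not been applied yet.
        self._queue.clear()


if __name__ == "__main__":
    zone = {"www": "10.0.0.10", "mail": "10.0.0.20"}
    live = Replica(zone, lag=0)       # normal replica
    lag_site = Replica(zone, lag=60)  # hypothetical 60-minute lag site

    # Minute 0: an admin accidentally deletes every record in the zone.
    for name in zone:
        live.receive(0, (name, None))
        lag_site.receive(0, (name, None))
    live.tick(0)
    lag_site.tick(0)
    print(live.records)      # {}
    print(lag_site.records)  # {'www': '10.0.0.10', 'mail': '10.0.0.20'}

    # Minute 10: the mistake is spotted. Freeze the lag site and restore from it.
    lag_site.freeze()
    live.records = dict(lag_site.records)
    print(live.records)      # {'www': '10.0.0.10', 'mail': '10.0.0.20'}
```

The lag length is the design choice: it must be longer than the time it takes to notice a mistake, but every minute of lag also means the copy is that much older when you restore from it.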
2. Imbalanced PSUs took down a whole trading floor for 4 hours
While a single blade server was being slid into its chassis, almost half a data centre went quiet because the power load across the four PSUs was completely imbalanced.
As a result, a trading floor was down for over 4 hours.
3. Testing scripts
A new IP address management (IPAM) system was being implemented and tested, and as part of this a DR script was used to change a single DNS entry.
However, due to human error, the script performed a replace instead of an amend, and the company’s entire DNS suddenly consisted of that single entry.
Restoring DNS took some time, and the incident put additional focus on DNS recovery points.
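The difference between amending and replacing is easy to see in code. In this minimal Python sketch, a DNS zone is modelled as a dictionary of record names to IP addresses; `amend`, `replace`, `guarded_apply` and `ChangeRejected` are hypothetical names, not part of any IPAM or DNS product’s API. The guard shows one possible safety check a DR script could run before committing: refuse any change that would remove more records than expected.

```python
from collections.abc import Mapping


class ChangeRejected(Exception):
    """Raised when a proposed DNS change looks broader than intended."""


def amend(zone: Mapping[str, str], name: str, value: str) -> dict[str, str]:
    # Amend: copy the zone and change only the requested record.
    updated = dict(zone)
    updated[name] = value
    return updated


def replace(zone: Mapping[str, str], name: str, value: str) -> dict[str, str]:
    # Replace: `zone` is ignored and the whole zone becomes just the supplied
    # record, which is how an entire zone can end up with a single entry.
    return {name: value}


def guarded_apply(zone: Mapping[str, str], proposed: Mapping[str, str],
                  max_removed: int = 0) -> dict[str, str]:
    # Refuse any change that removes more records than expected
    # (zero for a single-record edit) before it is committed.
    removed = set(zone) - set(proposed)
    if len(removed) > max_removed:
        raise ChangeRejected(
            f"change would remove {len(removed)} record(s): {sorted(removed)}")
    return dict(proposed)


if __name__ == "__main__":
    zone = {"www": "10.0.0.10", "mail": "10.0.0.20", "vpn": "10.0.0.30"}
    # {'www': '10.0.1.10', 'mail': '10.0.0.20', 'vpn': '10.0.0.30'}
    print(guarded_apply(zone, amend(zone, "www", "10.0.1.10")))
    try:
        guarded_apply(zone, replace(zone, "www", "10.0.1.10"))
    except ChangeRejected as err:
        # blocked: change would remove 2 record(s): ['mail', 'vpn']
        print("blocked:", err)
```

A check like this would not stop every human error, but it turns a replace-instead-of-amend mistake into a rejected change rather than a zone with a single entry.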
Getting DR right
As these examples show, products that address such issues are essential for keeping businesses running and reducing potentially fatal downtime.
Using a cloud provider makes these protections available to businesses of all sizes:
“There was no way we could deliver disaster recovery at the size or scale that Azure can. We could never manage the capital investment or the processes that Microsoft has in place. We’re happy to let Microsoft innovate and bring new data centre service offerings to the table so that we can focus on running our business.”