Tuesday, October 06, 2015

Oracle VM - selecting a DR layer

When developing a new solution, including both the application, data-store and infrastructure components, one of the questions to ask is on which layer to build resilience against failure. On which level of the stack will you protect against failure of a component and on which level will your disaster recovery focus. In essence the answer is quite simple, you should ensure that disaster recovery is safeguarded as high as possible in the stack. The true answer is a very complex answer which includes disaster recovery, high availability and maximum availability components. Building a solution which is resilient against failure is a very complex process in which every component needs to be taken into account. However, making sure that you have disaster recovery as high up in the stack as possible will make your life much more easy.

As an example we take the below image which shows a application centered disaster recovery solution based within a virtualized environment with Oracle VM.

Within this solution applications will run in a active active setup in both site A as well as site B. Information between the two sites is kept in sync by making use of the MAA maximum Availability Architecture principles from Oracle. This means that when a site fails the application will still be able to function as it will run on the other site. Users should not face any downtime and should not even be aware that one of the two sites has been lost due to a disaster.

The application centered disaster recovery solution is the most resilient solution against disasters and the loss of a site. However, in some cases it is not feasible to run a architecture as shown above and you would still like to be able to perform a disaster recovery of the virtual machines running within your deployment. A solution to this is making use of block replication on a storage level and allowing your recovery site (site B) to start the VM's in case your site A is lost.


Within this model you will replicate all storage associated with the VM's from site A to a storage repository within site B. In essence this is an exact copy of the VM, however, on site B the machine is in a stopped state. This is also represented in the diagram below where you can more clearly see the replication of storage on the two sides. For this solution you can use storage block replication in a way that your storage appliance is supporting.


In case of a failure you have to ensure that all machines are stopped on site A, after this you can make the storage on site B readable and writable and start the virtual machines. This might not be the most ideal solution in comparison with disaster recovery in the higher levels of the stack, however, in case you are forced to ensure disaster recovery on a infrastructure / VM layer instead of a application level this is a solution that can be used. 

For more information, also view the slidedeck below.

No comments: