Data is such a valuable commodity in the financial world that algorithmic risk platforms and grid compute platforms are in constant demand. The same is true beyond finance, anywhere there is a recognised need for distributed processing.
These platforms tend to run in production on one or more racks of server kit in a data centre. The architecture will usually have been built up over time by a variety of people using a variety of methods.
The result covers everything from shadow IT through to expensive vendor kit and appliances.
The need to change
At RedPixie, we’ve helped many customers transition these activities to the cloud.
The transition typically starts with grid compute platforms. These workloads are among the first considered when a company defines its cloud strategy, because the economic benefits are stark.
Here’s a common example:
‘I only use 2000 cores for 10 hours a month. Wouldn’t it be great if I only paid for the time I was actually using them, rather than for the racks of 60-odd servers I’m constantly paying for today?
The money is the easy part. What else do we need to consider when moving this kind of architecture to the cloud?’
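To put rough numbers on that question, here is a minimal sketch of the utilisation arithmetic. The 2000 cores and 10 hours come from the example above; the 730-hour month is our assumption, and the calculation deliberately ignores prices, storage and data transfer, so treat it as an illustration rather than a costing model.

```python
# Figures from the example above; HOURS_PER_MONTH is an assumption.
CORES = 2000
BUSY_HOURS_PER_MONTH = 10
HOURS_PER_MONTH = 730  # assumed average month (8760 hours / 12)


def utilisation(busy_hours: float, total_hours: float) -> float:
    """Fraction of the paid-for time in which the cores do useful work."""
    return busy_hours / total_hours


def break_even_price_ratio(busy_hours: float, total_hours: float) -> float:
    """How many times more a pay-per-use core-hour may cost than an
    always-on core-hour before pay-per-use stops being cheaper."""
    return total_hours / busy_hours


used_core_hours = CORES * BUSY_HOURS_PER_MONTH
paid_core_hours = CORES * HOURS_PER_MONTH

print(f"core-hours used per month:     {used_core_hours:,}")
print(f"core-hours paid for per month: {paid_core_hours:,}")
print(f"utilisation: {utilisation(BUSY_HOURS_PER_MONTH, HOURS_PER_MONTH):.2%}")
ratio = break_even_price_ratio(BUSY_HOURS_PER_MONTH, HOURS_PER_MONTH)
print(f"break-even price ratio: {ratio:.0f}x")
```

Under these assumptions the grid uses 20,000 of the 1,460,000 core-hours it pays for each month, a utilisation of about 1.37%. Put another way, a pay-per-use core-hour could cost up to 73 times as much as an always-on one before the on-premises racks became the cheaper option.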
Key challenges with moving grid compute platforms to the cloud
With that change in mind, the key challenges around a traditional stack are storage, compute nodes, networking and so on.
Most notably, the storage layer has been a recurring issue until now.
Many companies have purchased some form of storage network appliance with Fibre Channel connections and fast spinning disks or SSDs.
However, this type of appliance cannot simply be “lifted and shifted” to the cloud, so some engineering work is required.
Cloud migration example
We have recently tested a number of Software Defined Storage (SDS) solutions in Microsoft Azure to solve the storage issue noted above, and the results are positive.
We are able to drive the 10Gb/s networking close to its limit in a number of different configurations (reading and writing at around 1000MB/s), and new accelerated networking instances are available with 25Gb/s networking.
This is high-performance territory. With 100Gb/s on the horizon, it will be interesting to see where this goes, and whether we will ever be able to use RDMA for storage networking rather than just for MPI.
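For a sense of scale, the sketch below converts the link speeds mentioned above into raw byte rates and compares the roughly 1000MB/s we observed against a 10Gb/s link. It assumes decimal units (1 Gb = 1000 Mb) and ignores protocol overhead, so real-world ceilings will be somewhat lower than the raw figures.

```python
def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert a link speed in Gb/s to MB/s (decimal units, 8 bits per byte)."""
    return gbit_per_s * 1000 / 8


def link_utilisation(observed_mbyte_per_s: float, link_gbit_per_s: float) -> float:
    """Observed throughput as a fraction of the raw line rate."""
    return observed_mbyte_per_s / gbit_to_mbyte_per_s(link_gbit_per_s)


for speed in (10, 25, 100):
    print(f"{speed} Gb/s link: {gbit_to_mbyte_per_s(speed):.0f} MB/s raw line rate")

print(f"1000 MB/s on 10 Gb/s: {link_utilisation(1000, 10):.0%} of raw line rate")
```

On these assumptions a 10Gb/s link tops out at 1250MB/s, so 1000MB/s is about 80% of the raw line rate, while 25Gb/s and 100Gb/s links correspond to 3125MB/s and 12500MB/s respectively.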
Most other elements of the stack are straightforward to move to the cloud unless there are extreme requirements.
What benefits are there to migrating to the cloud?
The aforementioned economics are often compelling for these systems. We won’t dig into them further here as this is well-trodden territory; put ‘cloud economics’ into your favourite search engine.
When building these systems in the cloud, you should always aim for a ‘one-click’ deployment, in which the entire stack is deployed with a single click.
The stack is defined as Infrastructure as Code (IaC), and even a very complicated stack can usually be deployed, from scratch, in 30 minutes or less.
This opens up the possibility of giving individual analysts or developers their own copy of the entire stack (we get asked about this a lot) or of using IaC to simply re-deploy the whole stack as a disaster recovery strategy. It also drives consistency across environments (e.g. QA, Production and DR).
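The idea behind that consistency can be illustrated without any particular IaC tool. The sketch below is not a real deployment template: the resource kinds, names, sizes and the `render_stack` helper are all hypothetical. It simply shows one shared definition being expanded into per-environment stacks, with only the differences kept separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    environment: str
    kind: str       # hypothetical resource kind, e.g. "vm-scale-set"
    base_name: str
    size: str

    @property
    def name(self) -> str:
        """Concrete resource name, prefixed with the environment."""
        return f"{self.environment}-{self.base_name}"


# One template shared by every environment (hypothetical resource list).
TEMPLATE = (
    ("vnet", "grid-network", "standard"),
    ("storage", "grid-sds", "standard"),
    ("vm-scale-set", "grid-compute", "standard"),
)

# Only per-environment differences live here (hypothetical values).
OVERRIDES = {
    "prod": {"grid-compute": "large"},
    "dr": {"grid-compute": "large"},
}


def render_stack(environment: str) -> list[Resource]:
    """Expand the shared template into the resources for one environment."""
    overrides = OVERRIDES.get(environment, {})
    return [
        Resource(environment, kind, base_name, overrides.get(base_name, size))
        for kind, base_name, size in TEMPLATE
    ]


def shape(stack: list[Resource]) -> list[tuple[str, str]]:
    """The environment-independent shape of a stack: kinds and base names."""
    return [(r.kind, r.base_name) for r in stack]


for env in ("qa", "prod", "dr", "analyst-jane"):
    for resource in render_stack(env):
        print(resource.name, resource.kind, resource.size)
```

Because every environment is rendered from the same template, QA, Production, DR and an individual analyst's copy all have the same shape by construction; an environment that needs to differ does so only through an explicit override.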
The best IT guys know that automation is their friend, and this is especially true in large-scale cloud, where you are driven to automate first rather than as an afterthought.
In addition to the post-deployment scripts that build your stack, we also look to automate: VM and service start-up and shutdown based on a tagging framework, enforcement of policy (naming conventions, tags, etc.), auto-scaling of virtual machine size, auto-scaling of service size (SQL as a service, for example), reporting of orphaned resources and so on.
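As a rough illustration of three of these automations, the sketch below works on plain in-memory records rather than any real cloud API. The `schedule` and `owner` tags, the naming pattern and the `Vm` and `Disk` types are all hypothetical conventions chosen for the example; a real implementation would read this data from the cloud provider's inventory.

```python
import re
from dataclasses import dataclass, field
from datetime import time
from typing import Optional


@dataclass
class Vm:
    name: str
    tags: dict = field(default_factory=dict)


@dataclass
class Disk:
    name: str
    attached_to: Optional[str] = None  # VM name, or None if unattached


# Hypothetical policy: environment prefix, then lower-case dash-separated parts.
NAME_PATTERN = re.compile(r"^(qa|prod|dr)-[a-z0-9]+(-[a-z0-9]+)*$")
REQUIRED_TAGS = {"owner", "schedule"}


def should_be_running(vm: Vm, now: time) -> bool:
    """True if `now` is inside the VM's 'schedule' tag, e.g. '08:00-18:00'.
    VMs without the tag are left running."""
    schedule = vm.tags.get("schedule")
    if schedule is None:
        return True
    start_text, end_text = schedule.split("-")
    start, end = time.fromisoformat(start_text), time.fromisoformat(end_text)
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # window that spans midnight


def policy_violations(vm: Vm) -> list[str]:
    """Naming and tagging problems for one VM (empty list if compliant)."""
    problems = []
    if not NAME_PATTERN.match(vm.name):
        problems.append(f"{vm.name}: name does not match convention")
    missing = REQUIRED_TAGS - set(vm.tags)
    if missing:
        problems.append(f"{vm.name}: missing tags {sorted(missing)}")
    return problems


def orphaned_disks(disks: list[Disk], vms: list[Vm]) -> list[str]:
    """Names of disks that are unattached or attached to a VM that is gone."""
    vm_names = {vm.name for vm in vms}
    return [d.name for d in disks if d.attached_to not in vm_names]


vms = [
    Vm("prod-grid-node-01", {"owner": "risk", "schedule": "08:00-18:00"}),
    Vm("TestBox"),
]
disks = [
    Disk("prod-grid-node-01-os", "prod-grid-node-01"),
    Disk("old-data", "decommissioned-vm"),
    Disk("scratch"),
]

for vm in vms:
    print(vm.name, "running at 20:00:", should_be_running(vm, time(20, 0)))
    print(vm.name, "violations:", policy_violations(vm))
print("orphaned disks:", orphaned_disks(disks, vms))
```

Run on a timer, checks like these would drive the start-up and shutdown actions and feed the policy and orphaned-resource reports. Leaving untagged VMs running is a deliberately conservative choice for the sketch; a stricter policy might shut them down or flag them instead.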
Manageability & simplicity
Well-engineered cloud deployments will also look to utilise PaaS and SaaS more, which makes for a more manageable platform and puts some of the onus of managing the overall system in the hands of the vendor.
Summary: migrating grid compute platforms
We have talked about a lift-and-shift approach here, but there will obviously be times when re-architecting is also necessary, either first or at a later stage.
A number of workloads will benefit immediately from re-architecting to take advantage of the cloud, while others can simply be transitioned as they are and re-engineered later through existing development and release cycles.