For a 24/7/365 platform like GameSparks, we need a mechanism that lets us rapidly and safely deploy new versions of our software to production while avoiding any downtime for our global user community.
Even though they probably don’t realise it, the end-users of our platform are mobile gamers from all around the world. Here in GameSparks Operations, our aim is to keep their gaming experience as smooth and connected as possible, so it isn’t acceptable to put up a maintenance page for an hour while we roll out a new software release!
The GameSparks platform was designed from the start so that any individual component can be updated or scaled in a transparent, rolling manner without any impact on the other parts of the platform or our users.
To do this, we decouple as much as possible by making each component independent, stateless and self-configuring. The diagram below shows the different segments of the GameSparks platform:
The “Game Management Segment” of the GameSparks platform is the part that you, the developer, use when you’re configuring or updating your game in the portal, or checking on the behaviour of your user base through our analytics dashboards. Since this part of the platform is decoupled from our “Game Segments” – the services that your games communicate with – we can upgrade it independently without any knock-on effect.
In the same way, our “Live” game services – the WebSockets API your real users interact with – are completely segregated from our “Preview” services – the ones you develop and test against. We’ll always keep these two segments running the same version of our API so you don’t get any surprises when you publish your game, but we do scale them very differently.
All of the GameSparks services have been designed to be as stateless as possible, which enables us to scale them out and contract them quickly and efficiently in response to load spikes. So if your game goes viral, we can have the capacity to support it online within minutes. And if this success is only short-lived, we quickly decommission any excess capacity to keep our costs under control and our prices low.
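To make the scale-out and scale-in decision concrete, here is a minimal Python sketch of the kind of calculation involved. It is an illustration only, not GameSparks' actual tooling: the function names, the 25% headroom and the two-node floor are all assumptions chosen for the example.

```python
import math


def desired_node_count(current_load, capacity_per_node, min_nodes=2, headroom=0.25):
    """Return how many stateless nodes are needed for the given load.

    All values are hypothetical: load and capacity share one arbitrary
    unit (for example, requests per second).
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    # Add spare headroom so a further spike does not immediately saturate the pool.
    needed = math.ceil(current_load * (1 + headroom) / capacity_per_node)
    # Never drop below the floor kept for redundancy.
    return max(min_nodes, needed)


def scaling_action(current_nodes, target_nodes):
    """Describe the change needed to move the pool to the target size."""
    delta = target_nodes - current_nodes
    if delta > 0:
        return ("provision", delta)
    if delta < 0:
        return ("decommission", -delta)
    return ("hold", 0)
```

Because the nodes hold no state, the same calculation covers both directions: a viral spike produces a "provision" action, and the quiet period afterwards produces a "decommission" action with nothing to migrate first.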
We use Puppet as our configuration management tool, so whenever we provision a new node, Puppet configures it in exactly the same manner as all of its counterparts in the environment. Aside from our database nodes – which take a little longer due to data synchronisation – we can have every other type of node up, running, fully configured and processing traffic in less than 5 minutes.
Once a node is up and running, we almost never need to log on to it. We know it is configured to a known, tested baseline, and it becomes operationally ready automatically by adding itself to our monitoring systems when it boots for the first time. Each node also continuously ships its logs to our central Logstash/Elasticsearch/Kibana analysis platform, so we can investigate most problems without logging on to dozens of servers – more on that in a future blog post!
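The self-registration idea can be sketched in a few lines of Python. This is a hedged illustration rather than a description of our real monitoring stack: MonitoringRegistry is a hypothetical in-memory stand-in for a monitoring system's host inventory, and register_on_boot is an invented name for whatever runs on first boot.

```python
import socket


class MonitoringRegistry:
    """In-memory stand-in for a monitoring system's host inventory (hypothetical)."""

    def __init__(self):
        self._hosts = {}

    def register(self, hostname, role):
        # Registration is idempotent: re-registering a host just updates its role.
        self._hosts[hostname] = role

    def hosts(self):
        # Return a copy so callers cannot mutate the inventory directly.
        return dict(self._hosts)


def register_on_boot(registry, role, hostname=None):
    """Add this node to the registry; intended to run once at first boot."""
    # Fall back to the machine's own hostname when none is supplied.
    hostname = hostname or socket.gethostname()
    registry.register(hostname, role)
    return hostname
```

Keeping registration idempotent means a reboot or a repeated configuration run cannot leave duplicate entries behind, which matters when nodes are created and destroyed frequently.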
Don’t Get Attached To Servers!
All of these techniques and approaches mean that our servers have a relatively short lifespan. When we deploy a new software release, we don’t deploy it to our existing servers. Instead, we replace them with completely new VMs running the new release, identical to the ones on which we tested it.
As a result, we can perform final testing on the new deployment before we open it up to the public and also rapidly roll back to the old nodes if we encounter any unforeseen problems.
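The replace-then-switch flow described above can be sketched as follows. This is a simplified Python model under stated assumptions, not our deployment tooling: the Deployment class, the smoke_test callback and the node names are all hypothetical.

```python
class Deployment:
    """Tracks which pool of nodes serves public traffic (hypothetical model)."""

    def __init__(self, live_nodes):
        self.live = list(live_nodes)
        self.previous = []

    def release(self, new_nodes, smoke_test):
        """Switch traffic to new_nodes only if every node passes smoke_test."""
        new_nodes = list(new_nodes)
        # Final testing happens before any public traffic reaches the new pool.
        if not all(smoke_test(node) for node in new_nodes):
            return False
        # Keep the old pool around so that a rollback is a simple swap.
        self.previous, self.live = self.live, new_nodes
        return True

    def roll_back(self):
        """Return traffic to the pool that was live before the last release."""
        if not self.previous:
            raise RuntimeError("no previous pool to roll back to")
        self.live, self.previous = self.previous, []
```

A short usage example: starting from Deployment(["old-1", "old-2"]), a call to release(["new-1", "new-2"], smoke_test) either leaves the old nodes live (if any check fails) or swaps the pools, and roll_back() restores the old nodes without redeploying anything to them.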
We also don’t have any months- or years-old servers that need to be patched, or that carry manual configuration changes somebody forgot to make persistent. Anybody who has been a Sys Admin knows how often this can happen! Instead, if we want to patch, we test a new baseline build and then roll it out into our live environment – simple!
If you’re interested in learning more about how we deploy and operate the GameSparks platform then please leave a comment below.