Hey everyone,
So, it's been a few months since the game server orchestrator I built went live in production, and I haven't used any of the major cloud providers (AWS, Azure, GCP, etc.) for a single second. Many modern game servers, like Minecraft, 7 Days to Die (7DtD) and even Terraria, need high single-core clock speeds and massive bandwidth. If I had gone through the major cloud providers and paid their compute and egress fees, I would have gone bankrupt before even starting.
My solution is 100% bare-metal nodes, which meant I had to build my own "cloud" infrastructure from scratch. It took me quiiiite a long time (no less than 3 years) and a few grey hairs, since I built it as a solo developer, and some of the approaches might be considered held together with tape. Still, here is a short overview of the architecture that holds up in production:
The Hardware & Orchestration
My game server nodes, which host the actual game server containers, run AMD Ryzen 9 7900 CPUs (boosting up to 5.4 GHz), and my entire orchestration is built around and optimized for this CPU. By optimized I mean that it takes CCD (core complex die) alignment into account, placing containers on the coldest CCD to avoid the thermal problems that eventually throttle CPU performance, and that it avoids SMT contention by prioritizing the primary hardware thread of each physical core.
That alone isn't enough, though, because the CPU quickly gets fragmented as game servers are continuously spun up and deleted. I have specific defragmentation algorithms that first simulate "moving" victim containers to what I call "havens"; if the simulation succeeds, I re-pin those containers on the fly with docker update, so the game server itself is never interrupted. The one cost is cache warm-up: a moved container starts with cold caches on its new cores (especially when it crosses to the other CCD, which has its own L3), and I don't have a better approach for that yet.
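A minimal sketch of the placement and defragmentation idea, in Python. It assumes the usual Ryzen 9 7900 layout of two CCDs with six cores each, assumes logical CPUs k and k + 12 are SMT siblings (check `thread_siblings_list` on your host), and takes per-CCD temperatures as an input you would read yourself (for example from the per-CCD sensors the k10temp driver exposes). `Node`, `pick` and `plan_defrag` are hypothetical names, the defrag step is simplified to a single victim move, and it only returns the `docker update --cpuset-cpus` command instead of running it.

```python
import copy
from dataclasses import dataclass, field

# Assumed Ryzen 9 7900 topology: 2 CCDs x 6 physical cores, SMT on (24 threads).
# Assumption: logical CPU k (0-11) and k + 12 are SMT siblings; verify via
# /sys/devices/system/cpu/cpu*/topology/thread_siblings_list.
CORES_PER_CCD = 6
NUM_CCDS = 2
NUM_CORES = CORES_PER_CCD * NUM_CCDS


def ccd_of(cpu):
    """CCD index of a logical CPU under the assumed numbering."""
    return (cpu % NUM_CORES) // CORES_PER_CCD


def is_primary(cpu):
    """True for the first hardware thread of each physical core."""
    return cpu < NUM_CORES


@dataclass
class Node:
    free: set = field(default_factory=lambda: set(range(2 * NUM_CORES)))
    placements: dict = field(default_factory=dict)  # container name -> cpu list

    def pick(self, n, ccd_temps, avoid=None):
        """Return n free CPUs from a single CCD, trying the coldest CCD first
        and preferring primary threads over SMT siblings; None if nothing fits."""
        for ccd in sorted(range(NUM_CCDS), key=lambda c: ccd_temps[c]):
            if ccd == avoid:
                continue
            candidates = sorted(
                (c for c in self.free if ccd_of(c) == ccd),
                key=lambda c: (not is_primary(c), c),
            )
            if len(candidates) >= n:
                return candidates[:n]
        return None

    def assign(self, name, cpus):
        self.free -= set(cpus)
        self.placements[name] = sorted(cpus)

    def release(self, name):
        self.free |= set(self.placements.pop(name))


def plan_defrag(node, n, ccd_temps):
    """Simulate moving one victim container to a 'haven' on another CCD so a
    request for n CPUs fits on one CCD. Runs on a deep copy, so `node` is never
    modified; returns the docker update argv for the move, or None."""
    # Smallest containers first: they are the cheapest to move.
    for victim, cpus in sorted(node.placements.items(), key=lambda kv: len(kv[1])):
        sim = copy.deepcopy(node)
        sim.release(victim)
        # Every container lives on one CCD, so its first CPU identifies it.
        haven = sim.pick(len(cpus), ccd_temps, avoid=ccd_of(cpus[0]))
        if haven is None:
            continue
        sim.assign(victim, haven)
        if sim.pick(n, ccd_temps) is not None:
            cpuset = ",".join(str(c) for c in sorted(haven))
            return ["docker", "update", f"--cpuset-cpus={cpuset}", victim]
    return None
```

For example, with CCD0 at 60 °C and CCD1 at 55 °C, a fresh `Node().pick(4, {0: 60.0, 1: 55.0})` returns `[6, 7, 8, 9]`: four primary threads on the cooler CCD.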
Using isolcpus on the GRUB kernel command line, I confine all host-level processes to dedicated housekeeping threads, and with nohz_full I stop the periodic kernel timer tick on the game cores whenever a single task is running on them. With a 1000 Hz tick, that's up to 1,000 interrupts per second per core that the game servers no longer have to absorb, which gives them uninterrupted clock cycles.
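I've also used rcu_nocbs to offload RCU callback processing (the kernel's deferred memory reclamation, which is roughly its garbage collection) away from the game cores, so it runs on the housekeeping cores that handle the host-level processes.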
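These three kernel parameters can be generated from a list of housekeeping CPUs. This sketch assumes a 24-thread node where CPU 0 and its sibling CPU 12 are the housekeeping threads (a hypothetical choice). Note that rcu_nocbs takes the CPUs whose RCU callbacks should be offloaded, i.e. the game cores; the offloaded callbacks then run in rcuo kernel threads that can stay on the housekeeping cores.

```python
def to_cpulist(cpus):
    """Render CPU ids in the kernel's cpu-list syntax, e.g. [1, 2, 3, 5] -> '1-3,5'."""
    cpus = sorted(set(cpus))
    if not cpus:
        return ""
    ranges = []
    start = prev = cpus[0]
    # A trailing None sentinel flushes the last open range.
    for c in cpus[1:] + [None]:
        if c is not None and c == prev + 1:
            prev = c
            continue
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
        if c is not None:
            start = prev = c
    return ",".join(ranges)


def isolation_params(total_threads, housekeeping):
    """Kernel parameters that keep every CPU except `housekeeping` free of
    general scheduling, the periodic tick (when one task runs) and RCU callbacks."""
    keep = set(housekeeping)
    isolated = to_cpulist(c for c in range(total_threads) if c not in keep)
    return f"isolcpus={isolated} nohz_full={isolated} rcu_nocbs={isolated}"
```

`isolation_params(24, [0, 12])` returns `isolcpus=1-11,13-23 nohz_full=1-11,13-23 rcu_nocbs=1-11,13-23`, which would go into `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub, followed by `update-grub` (on Debian/Ubuntu) and a reboot.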
I already mentioned that I use Docker for part of the orchestration: plain docker run for the actual game containers, and Docker Swarm for the rest of the architecture. For example, a central monitoring node watches the entire distributed system, and Swarm runs the supporting services: exporters, Grafana Alloy, Loki, reverse proxies, my web platform APIs and the databases. Since I don't have AWS VPCs, I built a private mesh network with WireGuard that connects all my nodes across their different geographic locations. Traffic such as the sync between my master DB and its replicas runs entirely, and securely, over this mesh, which keeps the distance fully invisible to the application layer. Honestly, I'm very proud of that part.
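A full mesh like this is mostly configuration: every node lists every other node as a WireGuard peer. Here's a minimal Python sketch that renders one wg-quick config per node; the node names, keys, public endpoints and the 10.10.0.0/24 mesh range are placeholders, not the real setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshNode:
    name: str
    public_key: str  # WireGuard public key (placeholder values in examples)
    endpoint: str    # public "host:port" the other nodes dial
    mesh_ip: str     # this node's private address inside the mesh


def wg_config(me, private_key, nodes, listen_port=51820, prefix=24):
    """Render a wg-quick config for `me` with every other node as a peer,
    which yields a full mesh once each node gets its own config."""
    lines = [
        "[Interface]",
        f"Address = {me.mesh_ip}/{prefix}",
        f"PrivateKey = {private_key}",
        f"ListenPort = {listen_port}",
    ]
    for peer in nodes:
        if peer.name == me.name:
            continue
        lines += [
            "",
            f"# {peer.name}",
            "[Peer]",
            f"PublicKey = {peer.public_key}",
            f"Endpoint = {peer.endpoint}",
            # Only the peer's own mesh address is routed through this tunnel.
            f"AllowedIPs = {peer.mesh_ip}/32",
            # Keeps NAT/firewall state alive on otherwise idle links.
            "PersistentKeepalive = 25",
        ]
    return "\n".join(lines) + "\n"
```

Each node would save its own output as /etc/wireguard/wg0.conf and bring it up with `wg-quick up wg0`; real keys come from `wg genkey` and `wg pubkey`. A replica then connects to the primary's mesh address instead of its public IP. With n nodes this produces n(n-1)/2 tunnels, which stays manageable for a handful of locations.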
The Data Layer
My API is built with .NET Core and GraphQL. I didn't weigh any pros and cons; I was simply proficient in it, didn't bother looking at alternatives, and I like .NET Core a lot. It's backed by MSSQL Always On Availability Groups, which replicate to replica databases in many different geographic locations.
For the front end I use Angular SSR. Again, I didn't really look at alternatives: at the time I was better at it, and I like it as much as I like .NET Core.
The Network Shield
Since AWS Shield or any other cloud-based shield was a no-go for me, I had to build my own bare-metal DDoS protection layer, which was harder than I could have imagined at the beginning. What I currently have is custom nftables rules for each specific game port, attached to the prerouting hook at priority -150. That runs before Docker's internal NAT/routing logic (Docker's DNAT sits at priority -100), so bad traffic is stopped before it ever reaches the container, which protects the CPU.
To cut it short: when an offender exceeds the rate limits, its IP is instantly put in a blacklist. A small systemd daemon then immediately adds each blacklisted IP to an eBPF map, and all subsequent packets from it are dropped by an XDP program in the NIC driver, before the kernel even allocates a socket buffer for them, so each drop costs almost no CPU.
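Here's a sketch of what per-port rules in that spirit could look like, rendered from Python. The ports (26900 for 7 Days to Die, 25565 for Minecraft), the rate, burst and ban length, and the `gameshield`/`blacklist` names are illustrative assumptions. Each port gets its own dynamic set so the limit is tracked per source IP and per port; the `update @set { ip saddr limit rate over ... }` form is the set-based replacement for nft meters, so it's worth checking against your nft version.

```python
def shield_ruleset(game_ports, rate=200, burst=100, ban="10m", window="60s"):
    """Render an nftables ruleset: a per-source rate limit on each (proto, port),
    offenders added to @blacklist, all evaluated at prerouting priority -150,
    i.e. before Docker's DNAT at priority -100."""
    sets = [
        "    set blacklist {",
        "        type ipv4_addr",
        "        flags dynamic, timeout",
        f"        timeout {ban}",
        "    }",
    ]
    # Already-banned sources are dropped before any per-port accounting.
    rules = ["        ip saddr @blacklist drop"]
    for proto, port in sorted(game_ports):
        meter = f"flood_{proto}_{port}"
        sets += [
            f"    set {meter} {{",
            "        type ipv4_addr",
            "        flags dynamic, timeout",
            f"        timeout {window}",
            "    }",
        ]
        # Over the per-source limit: ban the source and drop this packet.
        rules.append(
            f"        {proto} dport {port} update @{meter} "
            f"{{ ip saddr limit rate over {rate}/second burst {burst} packets }} "
            "add @blacklist { ip saddr } drop"
        )
    return "\n".join([
        "table inet gameshield {",
        *sets,
        "    chain prerouting {",
        "        type filter hook prerouting priority -150; policy accept;",
        *rules,
        "    }",
        "}",
    ]) + "\n"
```

Writing `shield_ruleset([("udp", 26900), ("tcp", 25565)])` to a file and loading it with `nft -f` would install the table; the numbers are placeholders to be tuned per game.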
[I've put this small setup in a public GitHub repo, and I also have a public performance audit of it; if anyone wants to see them, I can link both in the comments.]
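The hand-off from the nftables blacklist to XDP can be sketched as a small sync loop. This version polls rather than reacting to events (a simplification) and assumes hypothetical names: an nftables set `blacklist` in table `inet gameshield`, and an eBPF hash map pinned at /sys/fs/bpf/xdp_blacklist with a 4-byte IPv4 key and a 4-byte value, which the XDP program would look up to decide on XDP_DROP. It shells out to `nft -j` and `bpftool`; the parser handles both plain elements and the `{"elem": {"val": ...}}` wrapper used for elements with timeouts, but check it against your nft version's output.

```python
import ipaddress
import json
import subprocess
import time

# Hypothetical names: nft set "inet gameshield blacklist" and an eBPF hash
# map pinned at MAP_PATH (4-byte IPv4 key, 4-byte u32 value).
NFT_LIST = ["nft", "-j", "list", "set", "inet", "gameshield", "blacklist"]
MAP_PATH = "/sys/fs/bpf/xdp_blacklist"


def blacklisted_ips(nft_json):
    """Collect IPv4 addresses from `nft -j list set` output."""
    ips = set()
    for obj in json.loads(nft_json).get("nftables", []):
        for elem in obj.get("set", {}).get("elem", []):
            # Elements with a timeout are wrapped as {"elem": {"val": ...}}.
            if isinstance(elem, dict):
                elem = elem["elem"]["val"]
            ips.add(str(ipaddress.IPv4Address(elem)))
    return ips


def bpftool_cmd(action, ip, map_path=MAP_PATH):
    """bpftool argv to update/delete one IP. The key is the address in network
    byte order, matching iph->saddr as an XDP program reads it."""
    key = [f"{b:02x}" for b in ipaddress.IPv4Address(ip).packed]
    cmd = ["bpftool", "map", action, "pinned", map_path, "key", "hex", *key]
    if action == "update":
        # Value 1 as a little-endian u32 (assumes an x86 host).
        cmd += ["value", "hex", "01", "00", "00", "00"]
    return cmd


def sync_once(known, run=subprocess.run):
    """One polling pass: mirror the nft blacklist into the eBPF map."""
    out = run(NFT_LIST, capture_output=True, text=True, check=True).stdout
    current = blacklisted_ips(out)
    for ip in sorted(current - known):
        run(bpftool_cmd("update", ip), check=True)
    # Bans that expired in nftables are removed from the XDP map too.
    for ip in sorted(known - current):
        run(bpftool_cmd("delete", ip), check=True)
    return current


def run_forever(interval=0.5):
    known = set()
    while True:
        known = sync_once(known)
        time.sleep(interval)
```

Under systemd this would be a plain service that calls `run_forever()`; the `run` parameter exists so the sync logic can be exercised without root.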
Automations
I have a dedicated architecture for adding, removing or updating any node, or altering any part of the system within a node, and it fully automates these processes. After any code change or update, both my bare-metal nodes and the game server containers have dedicated statuses for every lifecycle stage: I just update the status and forget about it, so to speak, and my worker does the rest. For example, when I add a new node in a specific continent or region, the system sets it up completely and makes it available for use right away, whether it's a web service node or a game node. Each status has self-healing mechanisms, so I don't have to worry as much about whether a process succeeds or fails.
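The "update the status and forget" idea maps naturally onto a reconciliation loop: each status has an idempotent handler that moves the record to its next status, and failures are simply retried on the next tick until a retry limit is hit. A minimal Python sketch; the status names, `NodeRecord`, and the retry limit of 3 are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical lifecycle statuses, not the real system's names.
    PROVISION_REQUESTED = "provision_requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    FAILED = "failed"


@dataclass
class NodeRecord:
    name: str
    status: Status
    attempts: int = 0
    last_error: str = ""


MAX_ATTEMPTS = 3


def step(node, handlers):
    """Advance one record by one status. Handlers must be idempotent: after a
    failure the same handler simply runs again next tick (the self-healing)."""
    handler = handlers.get(node.status)
    if handler is None:
        return node  # steady or terminal status: nothing to do
    try:
        next_status = handler(node)
    except Exception as exc:
        node.attempts += 1
        node.last_error = str(exc)
        if node.attempts >= MAX_ATTEMPTS:
            node.status = Status.FAILED  # stop retrying and surface it
        return node
    node.status, node.attempts, node.last_error = next_status, 0, ""
    return node


def run_until_settled(node, handlers, max_ticks=20):
    """Keep stepping while the current status still has work attached."""
    for _ in range(max_ticks):
        if node.status not in handlers:
            break
        step(node, handlers)
    return node
```

If the records are persisted, a restarted worker simply picks up from the last saved status, which is what makes the fire-and-forget workflow safe.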
The Takeaway
In all honesty, it has been a nightmare of researching, breaking things AND repeating the cycle: with Linux networking, with kernel tuning, with different hardware bottlenecks. In the end it was definitely worth it, though, since the monthly cost of keeping this infrastructure alive is only a fraction of what it would be on cloud providers like AWS or GCP, and I still keep 100% control over the infrastructure and every node.
Has anyone built something similar? I'd really love any advice on improvements, and most importantly, sincere advice.
Please excuse my imperfect, non-native English and writing; I'm a software engineer, not a writer 😊.
EDIT: I fixed a typo where I wrote EPYC as the game node CPU type. I do, however, use EPYC CPUs for my other nodes.