Running Agones in production takes consideration, from planning your launch to figuring out the best course of action for cluster and Agones upgrades. On this page, we’ve collected some general best practices. We also have cloud specific pages for:

- Google Kubernetes Engine (GKE): Best practices for running Agones on Google Kubernetes Engine (GKE).

If you are interested in submitting best practices for your cloud provider / on-prem, please contribute!
Separation of Agones from GameServer nodes
When running in production, Agones should be scheduled on a dedicated pool of nodes, distinct from where Game Servers are scheduled, for better isolation and resiliency. By default, Agones prefers to be scheduled on nodes labeled with agones.dev/agones-system=true and tolerates the node taint agones.dev/agones-system=true:NoExecute. If no dedicated nodes are available, Agones will run on regular nodes. See taints and tolerations for more information about Kubernetes taints and tolerations.
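As a minimal sketch, the commands below label and taint an existing node so the Agones control plane prefers it and other workloads are kept off it. The node name is a placeholder; on most cloud providers you would instead set this label and taint on a dedicated node pool at creation time.

```bash
# Label a node so the Agones control plane prefers to schedule on it,
# and taint it so other workloads (including game servers) are kept off.
# "agones-system-node-1" is a placeholder node name.
kubectl label node agones-system-node-1 agones.dev/agones-system=true
kubectl taint node agones-system-node-1 agones.dev/agones-system=true:NoExecute
```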
See Creating a Cluster for initial setup on your cloud provider.
Allocate Across Clusters
Agones supports Multi-cluster Allocation, allowing you to allocate from a set of clusters rather than relying on a single cluster as a potential point of failure (a minimal policy sketch follows the list below). There are several other options for multi-cluster allocation:
- Anthos Service Mesh can be used to route allocation traffic to different clusters based on arbitrary criteria. See Global Multiplayer Demo for an example where the match maker influences which cluster the allocation is routed to.
- Allocation Endpoint can be deployed on Cloud Run to proxy allocation requests.
- Or peruse the Third Party Examples
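As a hedged example, with Agones' built-in multi-cluster allocation a remote cluster is registered by creating a GameServerAllocationPolicy resource in the cluster that receives allocation requests. All names, namespaces, and the endpoint address below are placeholders, and the referenced secret (the client certificate for the remote allocator) must be created separately as described in the Multi-cluster Allocation guide.

```bash
# A sketch of a GameServerAllocationPolicy pointing at a second cluster.
# Everything here is a placeholder; adjust to your own clusters and secrets.
cat <<EOF | kubectl apply -f -
apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: allocate-to-cluster-b
  namespace: cluster-a-gameservers
spec:
  priority: 1   # lower number means higher priority
  weight: 100   # relative weight among policies with the same priority
  connectionInfo:
    allocationEndpoints:
      - 203.0.113.10   # allocator service endpoint of the remote cluster (placeholder)
    clusterName: cluster-b
    namespace: cluster-b-gameservers
    secretName: allocator-client-to-cluster-b  # client cert/key for the remote allocator
EOF
```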
You should consider spreading your game servers in two ways:
- Across geographic fault domains (GCP regions, AWS availability zones, separate datacenters, etc.): This is desirable for geographic fault isolation, but also for optimizing client latency to the game server.
- Within a fault domain: Kubernetes Clusters are single points of failure. A single misconfigured RBAC rule, an overloaded Kubernetes Control Plane, etc. can prevent new game server allocations, or worse, disrupt existing sessions. Running multiple clusters within a fault domain also allows for easier upgrades.