AWS EKS Best Practices Guide

Cloud Credentials

Select the Correct AWS Account

  • Principle of Least Privilege: Ensure the AWS credentials you use have only the permissions necessary to create and manage EKS clusters and associated resources (VPC, subnets, security groups, IAM roles, etc.). AWS recommends using IAM Access Analyzer to identify resources shared with external entities and refine permissions. Reference: AWS EKS IAM Security

  • Billing Awareness: Double-check that you are using the intended AWS account, especially in multi-account setups, so that resources and their costs are billed to the correct account. AWS recommends using AWS Organizations for centralized billing and cost management. Reference: EKS Billing

  • AWS Organization / Multi-Account Strategy: If your organization has a multi-account strategy (e.g., separate Dev, Staging, Prod accounts), make sure you are deploying to the right environment to keep resources isolated and manageable. AWS recommends using AWS Control Tower for setting up multi-account environments with guardrails. Reference: Multi-Regional EKS with Centralized Control

  • Temporary Credentials: AWS recommends using temporary credentials with appropriate session duration rather than long-term access keys. Reference: IAM Temporary Credentials
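The account and credential checks above can be automated before anything is created. The following is a minimal sketch, not a definitive implementation: the `EXPECTED_ACCOUNTS` mapping, the account IDs, and both helper functions are hypothetical. The `identity` dictionary is assumed to have the shape returned by the STS GetCallerIdentity call (for example `boto3.client("sts").get_caller_identity()`), which includes `Account` and `Arn` keys.

```python
# Hypothetical mapping of environment name -> expected AWS account ID.
EXPECTED_ACCOUNTS = {
    "dev": "111111111111",
    "prod": "222222222222",
}


def check_account(identity: dict, environment: str) -> None:
    """Raise if the caller identity does not belong to the intended account."""
    expected = EXPECTED_ACCOUNTS[environment]
    actual = identity["Account"]
    if actual != expected:
        raise RuntimeError(
            f"Refusing to deploy to {environment}: credentials belong to "
            f"account {actual}, expected {expected}"
        )


def uses_temporary_credentials(identity: dict) -> bool:
    """Heuristic (an assumption of this sketch): ARNs issued by the STS
    service, such as assumed-role sessions, indicate temporary credentials,
    while IAM user ARNs indicate long-term access keys."""
    service = identity["Arn"].split(":")[2]
    return service == "sts"
```

Running both checks at the start of a deployment script makes it harder to create a cluster in the wrong account or with long-term keys by accident.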


Basic Cluster Info

Cluster Name

  • Uniqueness: EKS cluster names must be unique per region per account. Once created, the cluster name cannot be changed. Reference: Create EKS Cluster

  • Naming Conventions:

    • Use a clear naming pattern, for example: company-environment-appname-eks or staging-frontend-eks.

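    • Avoid disallowed characters. EKS cluster names may contain only alphanumeric characters (case-sensitive), hyphens (-), and underscores (_); they must start with an alphanumeric character and be at most 100 characters long.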

    • Include environment labels if relevant (e.g., prod, dev) to easily identify your cluster's purpose.

    • AWS recommends using a consistent tagging strategy across all resources for better organization and cost tracking. Reference: EKS Scaling and Naming
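The naming rules above can be checked in code before a cluster is created, which matters because the name cannot be changed afterwards. This is a small sketch: the pattern and the 100-character limit follow the constraints documented for the EKS CreateCluster `name` parameter, while the helper names and the company-environment-appname-eks layout are illustrative assumptions taken from the example pattern above.

```python
import re

# Starts with an alphanumeric character, then alphanumerics, "-" or "_".
CLUSTER_NAME_RE = re.compile(r"[0-9A-Za-z][A-Za-z0-9\-_]*")
MAX_NAME_LENGTH = 100


def is_valid_cluster_name(name: str) -> bool:
    """Check a candidate name against the character and length rules."""
    return (
        len(name) <= MAX_NAME_LENGTH
        and CLUSTER_NAME_RE.fullmatch(name) is not None
    )


def build_cluster_name(company: str, environment: str, app: str) -> str:
    """Assemble a name following the company-environment-appname-eks pattern."""
    name = f"{company}-{environment}-{app}-eks".lower()
    if not is_valid_cluster_name(name):
        raise ValueError(f"invalid EKS cluster name: {name!r}")
    return name
```

For example, `build_cluster_name("acme", "prod", "frontend")` returns `acme-prod-frontend-eks`, while a name containing a space or starting with a hyphen is rejected.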

Region

  • Resource Availability: Confirm that your chosen region supports EKS and all the complementary services/features you plan to use (e.g., managed node groups, specific instance types). Not all AWS regions support all EKS features. Reference: EKS Service Quotas

  • Proximity & Latency: Choose a region close to your users or to other AWS services you rely on, reducing latency. AWS recommends using Amazon CloudFront for content delivery to further reduce latency for global users. Reference: Multi-Regional EKS Operations

  • Cost & Compliance: Costs and data residency regulations differ between regions. Pick a region that meets your compliance or data residency needs, if applicable. Use the AWS Pricing Calculator to estimate costs across regions. Reference: AWS Regional Pricing

  • High Availability: Some regions have more Availability Zones (AZs) than others. AWS recommends deploying across at least three AZs for production workloads to improve redundancy for your EKS clusters. Reference: EKS Network Requirements

  • Kubernetes Version: AWS recommends using the latest supported Kubernetes version for new clusters and staying within one or two minor versions of the latest for existing clusters. Reference: EKS Kubernetes Versions
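The version guidance above reduces to a simple comparison of minor versions. The sketch below assumes versions are given as "major.minor" strings and that the caller supplies the latest supported version; the function names and the example version numbers are illustrative, not taken from the EKS API.

```python
def minor_versions_behind(current: str, latest: str) -> int:
    """Return how many minor versions `current` trails `latest`."""
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    lat_major, lat_minor = (int(part) for part in latest.split(".")[:2])
    if cur_major != lat_major:
        raise ValueError("major versions differ; compare manually")
    return lat_minor - cur_minor


def is_within_policy(current: str, latest: str, max_behind: int = 2) -> bool:
    """True if the cluster is at most `max_behind` minor versions behind."""
    return 0 <= minor_versions_behind(current, latest) <= max_behind
```

With an assumed latest version of 1.30, a cluster on 1.28 passes the default policy and a cluster on 1.26 does not.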


Node Configuration

This section typically configures the networking and node groups that will run your workloads.

VPC Selection

  • Existing vs. New VPC:

    • Existing VPC: Make sure it is properly set up with private subnets, NAT gateways, and the necessary routing for EKS. Validate that it has enough IP address space (CIDR blocks) to handle the expected number of pods and nodes. AWS recommends a minimum CIDR block size of /24 for each subnet. Reference: EKS Network Requirements

    • New VPC: If auto-generating a new VPC, ensure you specify a sufficiently large CIDR range and consider private/public subnet segmentation. AWS recommends using the EKS VPC quick start template for proper configuration. Reference: Creating a VPC for EKS

  • IP Address Management: EKS assigns IP addresses to each pod (if using the AWS VPC CNI). If your IP space is limited, you may encounter IP exhaustion. Plan the CIDR blocks accordingly. AWS recommends using custom networking and secondary CIDR blocks for large deployments. Reference: EKS Custom Networking

  • VPC Limits: Keep in mind AWS default VPC and subnet limits. If you plan on multiple clusters, check that you're not nearing any resource quotas (VPC count, NAT gateways, route tables, etc.). AWS recommends using AWS Service Quotas to monitor and request increases when needed. Reference: VPC Quotas

  • VPC Endpoints: AWS recommends using VPC Endpoints for private connectivity to AWS services, reducing data transfer costs and improving security. Reference: Private EKS Clusters
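To make the IP planning advice above concrete, the sketch below uses Python's standard `ipaddress` module to carve a VPC CIDR into subnets and estimate how many addresses each one offers. It relies on AWS reserving five addresses in every subnet. The function names and the 10.0.0.0/16 range are illustrative assumptions, and the capacity check is deliberately rough: it counts one IP per node and per pod and ignores any addresses the VPC CNI keeps warm.

```python
import ipaddress
from itertools import islice

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses in a subnet that can go to nodes and pods."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def carve_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list:
    """Return the first `count` subnets of size /new_prefix in the VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [str(s) for s in islice(vpc.subnets(new_prefix=new_prefix), count)]


def fits(cidr: str, nodes: int, pods: int) -> bool:
    """Rough check: every node and every pod consumes one subnet address."""
    return nodes + pods <= usable_ips(cidr)
```

Under these assumptions a /24 subnet offers 251 usable addresses and a /20 offers 4091, so `carve_subnets("10.0.0.0/16", 20, 3)` gives three subnets with room for several thousand pods each.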

Subnet IDs

  • Minimum Two Subnets: EKS requires at least two subnets in different Availability Zones for high availability of the control plane and the worker nodes. For production environments, AWS recommends using three or more AZs. Reference: EKS Network Requirements

  • Private vs. Public Subnets:

  • Sufficient AZ Spread: Select subnets that span at least two (preferably three) AZs for better fault tolerance. AWS recommends evenly distributing workloads across all available AZs. Reference: EKS Data Plane Reliability

  • Tagging: Load balancer provisioning in EKS discovers subnets through specific Kubernetes tags (kubernetes.io/role/elb on public subnets and kubernetes.io/role/internal-elb on private subnets). Ensure your subnets are properly tagged if you're reusing them. Reference: EKS Network Load Balancing

  • CIDR Planning: AWS recommends allocating subnet CIDR blocks large enough to accommodate node and pod growth (a /24 or a larger block, i.e. a shorter prefix, per subnet). Reference: EKS Pod Networking
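The subnet requirements above (at least two AZs, discovery tags present) lend themselves to a pre-flight check. The sketch below is one possible shape for it. It assumes each subnet is a dictionary with the `SubnetId`, `AvailabilityZone`, and `Tags` keys used by the EC2 DescribeSubnets response, and that the tags of interest are the load balancer role tags `kubernetes.io/role/elb` and `kubernetes.io/role/internal-elb`. The function names are hypothetical.

```python
ROLE_TAGS = ("kubernetes.io/role/elb", "kubernetes.io/role/internal-elb")


def tag_keys(subnet: dict) -> set:
    """Collect the tag keys of a subnet; a missing Tags list means no tags."""
    return {tag["Key"] for tag in subnet.get("Tags", [])}


def validate_subnets(subnets: list, min_azs: int = 2) -> list:
    """Return a list of findings; an empty list means no issue was detected."""
    findings = []
    azs = {subnet["AvailabilityZone"] for subnet in subnets}
    if len(azs) < min_azs:
        findings.append(
            f"subnets span {len(azs)} AZ(s), at least {min_azs} required"
        )
    for subnet in subnets:
        if not tag_keys(subnet) & set(ROLE_TAGS):
            findings.append(
                f"{subnet['SubnetId']} has no load balancer role tag"
            )
    return findings
```

Passing `min_azs=3` applies the stricter production recommendation. A missing role tag is only a problem for subnets that should host load balancers, so treat that finding as a warning rather than a hard failure.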

Node Group Configuration

Node groups define the EC2 instances (workers) that run your Kubernetes workloads.

  1. Min Node, Desired Node, Max Node

    • Right-Sizing:

      • Min Node should be at least 1–2 to ensure the cluster can handle baseline workloads. AWS recommends having at least one node per AZ for high availability. Reference: EKS Cluster Autoscaling

      • Desired Node is what you expect to run normally. AWS recommends setting this based on your typical workload requirements plus a buffer for unexpected spikes.

      • Max Node should be large enough to handle peak traffic without exhausting capacity. AWS recommends setting appropriate service quotas to ensure you can scale to this limit. Reference: EKS Managed Node Groups

    • Autoscaling: Make sure the range is realistic so that the cluster can scale up/down efficiently based on workload. AWS recommends using Cluster Autoscaler with appropriate scan interval and scale-down utilization threshold settings. Reference: EKS Autoscaling
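The sizing rules above can be expressed as a short sanity check. This is a sketch under stated assumptions: the function names are hypothetical, the one-node-per-AZ rule is applied to the minimum size, and the dictionary keys mirror the `minSize`, `desiredSize`, and `maxSize` fields of the EKS managed node group scaling configuration.

```python
def check_scaling(min_size: int, desired_size: int, max_size: int,
                  az_count: int) -> list:
    """Return findings; an empty list means the range looks sane."""
    findings = []
    if min_size < 0 or max_size < 1:
        findings.append("sizes must be non-negative and max_size at least 1")
    if not (min_size <= desired_size <= max_size):
        findings.append("expected min_size <= desired_size <= max_size")
    if min_size < az_count:
        findings.append(
            f"min_size {min_size} cannot place one node in each of "
            f"{az_count} AZs"
        )
    return findings


def scaling_config(min_size: int, desired_size: int, max_size: int) -> dict:
    """Build a dict using the key names of the node group scaling config."""
    return {
        "minSize": min_size,
        "desiredSize": desired_size,
        "maxSize": max_size,
    }
```

For a three-AZ cluster, `check_scaling(3, 3, 9, 3)` reports nothing, while `check_scaling(1, 2, 6, 3)` flags that a minimum of one node cannot cover every AZ.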

  2. Instance Family / Instance Type / Arch Type

    • Workload Requirements:

      • Choose an instance type (e.g., t3.large, m5.large, etc.) that balances CPU, memory, and network performance for your applications. AWS recommends using the Amazon EC2 Instance Selector tool to identify optimal instance types. Reference: EC2 Instance Selector

      • For specialized workloads (e.g., GPU or high-memory), pick the corresponding instance families (p2, p3, g4, r5, etc.). AWS provides optimized AMIs for GPU workloads on EKS. Reference: EKS GPU Support

    • ARM vs. x86: If your applications can run on ARM (Graviton) architectures, using arm64 can sometimes be more cost-effective. AWS Graviton processors can provide up to 40% better price/performance compared to equivalent x86-based instances. Reference: EKS Graviton Support

    • Reserved vs. Spot: If using Spot, consider using multiple instance types to improve availability. AWS recommends using Spot Instances for stateless, fault-tolerant workloads and Reserved Instances for predictable, long-running workloads. Reference: EKS Cost Optimization

    • Bottlerocket: AWS recommends considering Bottlerocket, a purpose-built Linux-based operating system for running containers, for improved security and reduced operational overhead. Reference: EKS Bottlerocket
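Instance type choice also affects pod density when the AWS VPC CNI assigns VPC addresses to pods, because each instance type supports a limited number of network interfaces (ENIs) and addresses per interface. The sketch below applies the commonly used formula for the default secondary-IP mode, without prefix delegation. The `ENI_LIMITS` table holds illustrative values for the two instance types mentioned above and should be confirmed against the EC2 documentation; the function name is hypothetical.

```python
# (max network interfaces, IPv4 addresses per interface) per instance type.
# Illustrative values; verify against the EC2 documentation before relying
# on them.
ENI_LIMITS = {
    "t3.large": (3, 12),
    "m5.large": (3, 10),
}


def max_pods(instance_type: str) -> int:
    """Estimate the pod limit of a node in secondary-IP mode."""
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    # One address on each ENI is the interface's own primary IP. The +2
    # accounts for host-network pods that do not need a pod IP.
    return enis * (ips_per_eni - 1) + 2
```

With these values, `max_pods("t3.large")` is 35 and `max_pods("m5.large")` is 29, so the smaller burstable type can host more pods even though it may have less sustained CPU.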

  3. Capacity Type

    • On-Demand: More reliable but higher cost. AWS recommends using On-Demand for critical production workloads that require guaranteed availability. Reference: EKS Managed Node Groups

    • Spot: Cheaper but can be interrupted. Best practice is to use a mixed strategy (Spot + On-Demand) for cost optimization while maintaining reliability. AWS provides a Spot interruption handler for graceful termination. Reference: EKS Spot Instances

    • If using Spot, ensure you have a fallback On-Demand node group or at least capacity-optimized Spot allocation strategies to reduce the risk of interruption. AWS recommends implementing the Node Termination Handler for graceful pod evacuation upon Spot instance termination notice. Reference: AWS Node Termination Handler

    • Karpenter: AWS recommends considering Karpenter as a flexible, high-performance Kubernetes cluster autoscaler that helps improve application availability and cluster efficiency. Reference: EKS Karpenter Best Practices
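The Spot guidance above (several instance types, plus an On-Demand fallback) can be reviewed mechanically. The following sketch assumes each node group is described by a dictionary with `name`, `capacityType` ("ON_DEMAND" or "SPOT"), and `instanceTypes` keys, modeled on the EKS managed node group fields. The `review_node_groups` function and the threshold of three instance types are assumptions of this example.

```python
def review_node_groups(node_groups: list, min_spot_types: int = 3) -> list:
    """Return findings about Spot diversification and On-Demand fallback."""
    findings = []
    has_on_demand = any(
        group["capacityType"] == "ON_DEMAND" for group in node_groups
    )
    spot_groups = [g for g in node_groups if g["capacityType"] == "SPOT"]
    for group in spot_groups:
        distinct_types = set(group["instanceTypes"])
        if len(distinct_types) < min_spot_types:
            findings.append(
                f"{group['name']}: only {len(distinct_types)} Spot instance "
                f"type(s), consider at least {min_spot_types}"
            )
    if spot_groups and not has_on_demand:
        findings.append("no On-Demand node group available as fallback")
    return findings
```

A plan with one On-Demand group and one Spot group listing three or more instance types produces no findings; a Spot-only plan with a single instance type produces two.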

  4. IAM Roles and Permissions (often part of Node Group creation though not shown in the screenshot):

  5. Storage Configuration:


Final Check & Deploy

Cost Optimization

Additional Resources
