AWS EKS Best Practices Guide
Principle of Least Privilege: Ensure the AWS credentials you use have only the permissions necessary to create and manage EKS clusters and associated resources (VPC, subnets, security groups, IAM roles, etc.). AWS recommends using IAM Access Analyzer to identify resources shared with external entities and refine permissions.
Billing Awareness: Double-check you are using the intended AWS account so that resources (and subsequent costs) are billed to the correct account (especially in multi-account setups). AWS recommends using AWS Organizations for centralized billing and cost management.
AWS Organization / Multi-Account Strategy: If your organization has a multi-account strategy (e.g., separate Dev, Staging, Prod accounts), make sure you are deploying to the right environment to keep resources isolated and manageable. AWS recommends using AWS Control Tower for setting up multi-account environments with guardrails.
Temporary Credentials: AWS recommends using temporary credentials with appropriate session duration rather than long-term access keys.
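As a concrete illustration of the temporary-credentials recommendation, here is a minimal boto3 sketch that assumes an IAM role and uses the resulting short-lived session for EKS calls. The role ARN, session name, duration, and region are placeholders; named CLI profiles, IAM Identity Center (SSO), or instance roles achieve the same goal.

```python
import boto3

# Placeholder role ARN -- use a role scoped to EKS/VPC/IAM actions only.
ROLE_ARN = "arn:aws:iam::123456789012:role/eks-cluster-admin"

sts = boto3.client("sts")

# Request short-lived credentials (1 hour here) instead of long-term access keys.
resp = sts.assume_role(
    RoleArn=ROLE_ARN,
    RoleSessionName="eks-provisioning",
    DurationSeconds=3600,
)
creds = resp["Credentials"]

# Build a session from the temporary credentials and use it for all EKS work.
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
    region_name="us-east-1",  # assumption: adjust to your target region
)
eks = session.client("eks")
print(eks.list_clusters()["clusters"])
```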
Uniqueness: EKS cluster names must be unique per region per account. Once created, the cluster name cannot be changed.
Naming Conventions:
Use a clear naming pattern, for example company-environment-appname-eks or staging-frontend-eks.
Avoid disallowed characters (AWS typically restricts certain special characters); stick to alphanumerics, hyphens (-), and underscores (_).
Include environment labels if relevant (e.g., prod, dev) to easily identify your cluster's purpose.
AWS recommends using a consistent tagging strategy across all resources for better organization and cost tracking.
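As a small sketch of the naming convention above, the helper below builds a name following the company-environment-appname-eks pattern and checks it against this guide's allowed character set. The pattern and the 100-character cap are this guide's conventions and an assumption, not an authoritative AWS validation rule, so a name that passes here could still be rejected by the EKS API.

```python
import re

# Allowed characters per this guide's convention: alphanumerics, "-" and "_",
# starting with an alphanumeric character.
NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_-]*$")

def build_cluster_name(company: str, environment: str, app: str) -> str:
    """Build a name like company-environment-appname-eks and sanity-check it."""
    name = f"{company}-{environment}-{app}-eks"
    if not NAME_RE.match(name):
        raise ValueError(f"Cluster name contains disallowed characters: {name}")
    if len(name) > 100:  # assumption: keep names well under service limits
        raise ValueError(f"Cluster name too long: {name}")
    return name

print(build_cluster_name("acme", "staging", "frontend"))  # acme-staging-frontend-eks
```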
This section typically configures the networking and node groups that will run your workloads.
Existing vs. New VPC:
Private vs. Public Subnets:
Node groups define the EC2 instances (workers) that run your Kubernetes workloads.
Min Node, Desired Node, Max Node
Right-Sizing:
Desired Node is the node count you expect to run under normal conditions. AWS recommends setting this based on your typical workload requirements plus a buffer for unexpected spikes.
Instance Family / Instance Type / Arch Type
Workload Requirements:
Capacity Type
IAM Roles and Permissions (typically configured as part of node group creation):
Storage Configuration:
Validation: Before clicking "Deploy Cluster", verify all fields and confirm:
You have selected the correct AWS account and region.
The cluster name adheres to naming best practices.
The VPC and subnets are correctly configured for high availability and enough IP space.
The node group autoscaling configuration matches your expected workload.
The instance type, family, and capacity type are cost-effective and suitable for your application needs.
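Parts of this checklist can be scripted. Below is a minimal boto3 sketch that confirms the active credentials belong to the intended account and that the session targets the intended region before anything is deployed; the expected account ID and region are placeholders.

```python
import boto3

EXPECTED_ACCOUNT = "123456789012"  # placeholder: your intended account ID
EXPECTED_REGION = "us-east-1"      # placeholder: the region you intend to deploy to

session = boto3.Session()
identity = session.client("sts").get_caller_identity()

# Fail fast if the credentials or region do not match the intended target.
assert identity["Account"] == EXPECTED_ACCOUNT, (
    f"Wrong AWS account: {identity['Account']} (expected {EXPECTED_ACCOUNT})"
)
assert session.region_name == EXPECTED_REGION, (
    f"Wrong region: {session.region_name} (expected {EXPECTED_REGION})"
)
print(f"Deploying as {identity['Arn']} in {session.region_name}")
```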
Logging & Monitoring: After creation, enable or confirm you have control-plane logging, container logs shipping to CloudWatch, and relevant metrics set up for better observability. AWS recommends:
Enabling all EKS control plane logs (API server, audit, authenticator, controller manager, scheduler)
Implementing Container Insights for comprehensive monitoring
Setting up Prometheus and Grafana for advanced metrics
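For the control-plane logging item, a minimal boto3 sketch is shown below; it enables all five log types on an existing cluster (the cluster name is a placeholder). Container Insights and Prometheus/Grafana are typically installed in-cluster (for example via the CloudWatch agent or Helm) and are not covered here.

```python
import boto3

eks = boto3.client("eks")

# Enable all five control plane log types on an existing cluster.
eks.update_cluster_config(
    name="staging-frontend-eks",  # placeholder cluster name
    logging={
        "clusterLogging": [
            {
                "types": [
                    "api",
                    "audit",
                    "authenticator",
                    "controllerManager",
                    "scheduler",
                ],
                "enabled": True,
            }
        ]
    },
)
```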
Security Hardening: Restrict public access to the EKS control plane unless required. Use security groups and network policies to limit traffic. AWS recommends:
Enabling private API endpoint access for production clusters
Implementing Kubernetes Network Policies for pod-to-pod communication controls
Using the AWS Security Groups for Pods feature to apply fine-grained security group rules
Implementing encryption for secrets and EBS volumes
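A hedged boto3 sketch for the endpoint-access and secrets-encryption items follows; the cluster name and KMS key ARN are placeholders. Network Policies and Security Groups for Pods are configured in-cluster (via a network policy engine and the VPC CNI) and are not shown here.

```python
import boto3

eks = boto3.client("eks")

# Restrict the Kubernetes API server endpoint: private access only.
eks.update_cluster_config(
    name="prod-frontend-eks",  # placeholder cluster name
    resourcesVpcConfig={
        "endpointPublicAccess": False,
        "endpointPrivateAccess": True,
    },
)

# Envelope-encrypt Kubernetes secrets with a KMS key (ARN is a placeholder).
# Note: EKS processes cluster updates one at a time, so in practice wait for
# the endpoint update above to finish before issuing this call.
eks.associate_encryption_config(
    clusterName="prod-frontend-eks",
    encryptionConfig=[
        {
            "resources": ["secrets"],
            "provider": {"keyArn": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE"},
        }
    ],
)
```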
Resource Availability: Confirm that your chosen region supports EKS and all the complementary services/features you plan to use (e.g., managed node groups, specific instance types). Not all AWS regions support all EKS features.
Proximity & Latency: Choose a region close to your users or to other AWS services you rely on, reducing latency. AWS recommends using Amazon CloudFront for content delivery to further reduce latency for global users.
Cost & Compliance: Different regions can have different costs and data residency regulations. Make sure to pick the region that meets compliance or data residency needs if applicable. AWS provides the AWS Pricing Calculator to estimate costs across regions.
High Availability: Some regions have more Availability Zones (AZs) than others. AWS recommends deploying across at least three AZs for production workloads to improve redundancy for your EKS clusters.
Kubernetes Version: AWS recommends using the latest supported Kubernetes version for new clusters and staying within one or two minor versions of the latest for existing clusters.
Existing VPC: Make sure it is properly set up with private subnets, NAT gateways, and the necessary routing for EKS. Validate that it has enough IP address space (CIDR blocks) to handle the expected number of pods and nodes. AWS recommends a minimum CIDR block size of /24 for each subnet.
New VPC: If auto-generating a new VPC, ensure you specify a sufficiently large CIDR range and consider private/public subnet segmentation. AWS recommends using the EKS VPC quick start template for proper configuration.
IP Address Management: EKS assigns IP addresses to each pod (if using the AWS VPC CNI). If your IP space is limited, you may encounter IP exhaustion. Plan the CIDR blocks accordingly. AWS recommends using custom networking and secondary CIDR blocks for large deployments.
VPC Limits: Keep in mind AWS default VPC and subnet limits. If you plan on multiple clusters, check that you're not nearing any resource quotas (VPC count, NAT gateways, route tables, etc.). AWS recommends using AWS Service Quotas to monitor and request increases when needed.
VPC Endpoints: AWS recommends using VPC Endpoints for private connectivity to AWS services, reducing data transfer costs and improving security.
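As an illustration of the VPC-endpoint recommendation, here is a boto3 sketch that creates a gateway endpoint for S3 and an interface endpoint for the ECR API in an existing VPC. All IDs and the region embedded in the service names are placeholders, and private clusters typically also need endpoints for ecr.dkr, ec2, sts, and logs.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

VPC_ID = "vpc-0123456789abcdef0"              # placeholder
ROUTE_TABLE_IDS = ["rtb-0123456789abcdef0"]   # placeholder
SUBNET_IDS = ["subnet-aaa", "subnet-bbb"]     # placeholder private subnets
SG_IDS = ["sg-0123456789abcdef0"]             # placeholder security group

# Gateway endpoint for S3 (no hourly charge, attached to route tables).
ec2.create_vpc_endpoint(
    VpcId=VPC_ID,
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=ROUTE_TABLE_IDS,
)

# Interface endpoint for the ECR API so nodes can pull images privately.
ec2.create_vpc_endpoint(
    VpcId=VPC_ID,
    ServiceName="com.amazonaws.us-east-1.ecr.api",
    VpcEndpointType="Interface",
    SubnetIds=SUBNET_IDS,
    SecurityGroupIds=SG_IDS,
    PrivateDnsEnabled=True,
)
```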
Minimum Two Subnets: EKS requires at least two subnets in different Availability Zones for high availability of the control plane and the worker nodes. For production environments, AWS recommends using three or more AZs.
Private Subnets for Worker Nodes: It's generally recommended to place worker nodes in private subnets for security. AWS recommends this approach for all production workloads.
Public Subnets for Load Balancers: If you plan to expose services publicly, typically you attach load balancers to public subnets. AWS recommends using AWS Load Balancer Controller for optimal load balancer configuration.
Sufficient AZ Spread: Select subnets that span at least two (preferably three) AZs for better fault tolerance. AWS recommends evenly distributing workloads across all available AZs.
Tagging: EKS automatically looks for subnets tagged with the appropriate Kubernetes tags, so ensure your subnets are properly tagged if you're reusing them. AWS provides specific tagging requirements for subnet discovery (see the sketch below).
CIDR Planning: AWS recommends allocating sufficiently large subnet CIDR blocks to accommodate node and pod growth (/24 or larger for each subnet is recommended).
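The subnet tagging requirement mentioned above can be applied with a short boto3 sketch like the one below, assuming a cluster named prod-frontend-eks and placeholder subnet IDs: public subnets get the kubernetes.io/role/elb tag for internet-facing load balancers, private subnets get kubernetes.io/role/internal-elb, and both carry the shared cluster tag.

```python
import boto3

ec2 = boto3.client("ec2")

CLUSTER = "prod-frontend-eks"                        # placeholder cluster name
PUBLIC_SUBNETS = ["subnet-pub-a", "subnet-pub-b"]    # placeholder subnet IDs
PRIVATE_SUBNETS = ["subnet-priv-a", "subnet-priv-b"] # placeholder subnet IDs

# Public subnets: discoverable for internet-facing load balancers.
ec2.create_tags(
    Resources=PUBLIC_SUBNETS,
    Tags=[
        {"Key": "kubernetes.io/role/elb", "Value": "1"},
        {"Key": f"kubernetes.io/cluster/{CLUSTER}", "Value": "shared"},
    ],
)

# Private subnets: discoverable for internal load balancers.
ec2.create_tags(
    Resources=PRIVATE_SUBNETS,
    Tags=[
        {"Key": "kubernetes.io/role/internal-elb", "Value": "1"},
        {"Key": f"kubernetes.io/cluster/{CLUSTER}", "Value": "shared"},
    ],
)
```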
Min Node should be at least 1–2 to ensure the cluster can handle baseline workloads. AWS recommends having at least one node per AZ for high availability.
Max Node should be large enough to handle peak traffic without exhausting capacity. AWS recommends setting appropriate service quotas to ensure you can scale to this limit.
Autoscaling: Make sure the range is realistic so that the cluster can scale up/down efficiently based on workload. AWS recommends using Cluster Autoscaler with appropriate scan interval and scale-down utilization threshold settings.
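As a sketch of how these min/desired/max values map onto a managed node group, the boto3 call below creates a node group with a 2-3-10 scaling range; the cluster name, subnets, node role ARN, and instance type are placeholders. The Cluster Autoscaler then adjusts the actual node count within this range.

```python
import boto3

eks = boto3.client("eks")

eks.create_nodegroup(
    clusterName="staging-frontend-eks",          # placeholder
    nodegroupName="general-purpose",
    subnets=["subnet-priv-a", "subnet-priv-b"],  # placeholder private subnets
    nodeRole="arn:aws:iam::123456789012:role/eks-node-role",  # placeholder
    instanceTypes=["m5.large"],
    capacityType="ON_DEMAND",
    scalingConfig={
        "minSize": 2,      # baseline capacity (at least one node per AZ)
        "desiredSize": 3,  # expected steady-state node count
        "maxSize": 10,     # ceiling for peak traffic
    },
)
```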
Choose an instance type (e.g., t3.large, m5.large) that balances CPU, memory, and network performance for your applications. AWS recommends using the Amazon EC2 Instance Selector tool to identify optimal instance types.
For specialized workloads (e.g., GPU or high-memory), pick the corresponding instance families (p2, p3, g4, r5, etc.). AWS provides optimized AMIs for GPU workloads on EKS.
ARM vs. x86: If your applications can run on ARM (Graviton) architectures, using arm64 can sometimes be more cost-effective. AWS Graviton processors can provide up to 40% better price/performance compared to equivalent x86-based instances.
Reserved vs. Spot: If using Spot, consider using multiple instance types to improve availability. AWS recommends using Spot Instances for stateless, fault-tolerant workloads and Reserved Instances for predictable, long-running workloads.
Bottlerocket: AWS recommends considering Bottlerocket, a purpose-built Linux-based operating system for running containers, for improved security and reduced operational overhead.
On-Demand: More reliable but higher cost. AWS recommends using On-Demand for critical production workloads that require guaranteed availability.
Spot: Cheaper but can be interrupted. Best practice is to use a mixed strategy (Spot + On-Demand) for cost optimization while maintaining reliability. AWS provides Spot interruption handler for graceful termination.
If using Spot, ensure you have a fallback On-Demand node group or at least capacity-optimized Spot allocation strategies to reduce the risk of interruption. AWS recommends implementing the Node Termination Handler for graceful pod evacuation upon Spot instance termination notice.
Karpenter: AWS recommends considering Karpenter as a flexible, high-performance Kubernetes cluster autoscaler that helps improve application availability and cluster efficiency.
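One common pattern from the capacity-type guidance above is an On-Demand node group for baseline capacity plus a Spot node group spanning several interchangeable instance types; a hedged boto3 sketch of the Spot side follows (cluster name, subnets, and role ARN are placeholders). Managed node groups use a capacity-optimized Spot allocation strategy and drain interrupted nodes, while self-managed groups would pair this with the Node Termination Handler mentioned above.

```python
import boto3

eks = boto3.client("eks")

# Spot node group: several interchangeable instance types improve the chance
# that capacity is available in each AZ when one Spot pool is interrupted.
eks.create_nodegroup(
    clusterName="staging-frontend-eks",          # placeholder
    nodegroupName="spot-workers",
    capacityType="SPOT",
    instanceTypes=["m5.large", "m5a.large", "m6i.large", "m4.large"],
    subnets=["subnet-priv-a", "subnet-priv-b"],  # placeholder private subnets
    nodeRole="arn:aws:iam::123456789012:role/eks-node-role",  # placeholder
    scalingConfig={"minSize": 0, "desiredSize": 2, "maxSize": 20},
    labels={"capacity": "spot"},  # lets you steer fault-tolerant pods here
)
```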
Ensure the node group has an appropriate IAM role (AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, etc.) to allow workers to communicate with the control plane. AWS recommends creating dedicated IAM roles for each node group.
Use IRSA (IAM Roles for Service Accounts) to grant pods the least privilege they need. AWS recommends this approach over providing broad permissions to the entire node.
AWS recommends implementing IAM Access Analyzer to identify unintended access to your resources and data.
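To make the IRSA recommendation concrete, the sketch below builds the web-identity trust policy that ties an IAM role to a single Kubernetes service account through the cluster's OIDC provider; the account ID, OIDC issuer ID, namespace, service account, and role name are all placeholders. In practice, eksctl or the console can generate this for you.

```python
import json
import boto3

ACCOUNT_ID = "123456789012"                       # placeholder
OIDC_ID = "EXAMPLED539D4633E53DE1B71EXAMPLE"      # placeholder OIDC issuer ID
REGION = "us-east-1"                              # placeholder
NAMESPACE, SERVICE_ACCOUNT = "default", "my-app"  # placeholder service account

oidc_provider = f"oidc.eks.{REGION}.amazonaws.com/id/{OIDC_ID}"

# Trust policy: only pods running as this service account can assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{oidc_provider}:sub": f"system:serviceaccount:{NAMESPACE}:{SERVICE_ACCOUNT}",
                    f"{oidc_provider}:aud": "sts.amazonaws.com",
                }
            },
        }
    ],
}

iam = boto3.client("iam")
iam.create_role(
    RoleName="my-app-irsa-role",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# Afterwards, attach only the narrowly scoped policies the pod actually needs.
```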
AWS recommends using EBS CSI Driver for persistent storage needs in EKS clusters.
For shared file storage, AWS recommends EFS CSI Driver which supports ReadWriteMany access mode.
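The EBS CSI driver can be installed as an EKS managed add-on; a minimal boto3 sketch follows, where the cluster name and the IRSA role ARN for the driver are placeholders. The EFS CSI driver is installed similarly (as an add-on where available, or via Helm) when ReadWriteMany volumes are needed.

```python
import boto3

eks = boto3.client("eks")

# Install the EBS CSI driver as a managed add-on, bound to an IRSA role that
# carries the driver's EBS permissions (role creation not shown here).
eks.create_addon(
    clusterName="staging-frontend-eks",  # placeholder
    addonName="aws-ebs-csi-driver",
    serviceAccountRoleArn="arn:aws:iam::123456789012:role/ebs-csi-irsa",  # placeholder
    resolveConflicts="OVERWRITE",
)
```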
AWS recommends using eksctl or AWS CloudFormation templates for repeatable, version-controlled cluster deployments.
Tagging & Resource Management: Apply consistent tags (e.g., Environment=Production, Application=MyApp, Owner=TeamName) to the cluster, node groups, and VPC resources for better visibility and cost tracking. AWS recommends implementing tag-based access control and cost allocation.
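Tags can be applied at creation time or retroactively; the boto3 sketch below tags an existing cluster by ARN with the example tags above (the ARN is a placeholder). Node groups, VPCs, and subnets would be tagged with the same keys through their own tagging APIs.

```python
import boto3

eks = boto3.client("eks")

# Apply the guide's example tags to an existing cluster (ARN is a placeholder).
eks.tag_resource(
    resourceArn="arn:aws:eks:us-east-1:123456789012:cluster/prod-frontend-eks",
    tags={
        "Environment": "Production",
        "Application": "MyApp",
        "Owner": "TeamName",
    },
)
```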
Right-sizing Resources: AWS recommends regularly analyzing resource utilization and adjusting instance types and quantities to match actual needs.
Spot Instances: Use Spot Instances for non-critical workloads to save up to 90% compared to On-Demand pricing.
Savings Plans: Consider Savings Plans for predictable workloads; Compute Savings Plans can save up to 66% and EC2 Instance Savings Plans up to 72% compared to On-Demand pricing.
Cluster Scaling: Implement efficient cluster scaling to adjust capacity based on demand, avoiding over-provisioning.
Cost Monitoring: Use AWS Cost Explorer and Kubernetes cost allocation tags to track and analyze EKS-related spending.
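As an example of tag-based cost tracking, the boto3 sketch below queries Cost Explorer for monthly unblended cost grouped by the Environment tag; the date range is a placeholder, and the tag must first be activated as a cost allocation tag in the Billing console before it appears in results.

```python
import boto3

ce = boto3.client("ce")

# Monthly cost grouped by the Environment cost allocation tag.
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-04-01"},  # placeholder range
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "Environment"}],
)

for period in resp["ResultsByTime"]:
    for group in period["Groups"]:
        amount = group["Metrics"]["UnblendedCost"]["Amount"]
        print(period["TimePeriod"]["Start"], group["Keys"][0], amount)
```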