Deploying Keycloak on AWS ECS with Fargate using Terraform
Introduction
Keycloak is a popular open-source Identity and Access Management solution that provides single sign-on, identity federation, social login, and more. Deploying Keycloak in a production environment requires careful planning to ensure security, scalability, and high availability.
In this comprehensive guide, we'll walk through deploying Keycloak on AWS Elastic Container Service (ECS) with Fargate using Terraform. This serverless approach eliminates the need to manage the underlying infrastructure, allowing you to focus on your application.
Why This Approach?
ECS with Fargate provides a serverless container platform, so there are no EC2 instances to provision, patch, or manage. Combined with Terraform for infrastructure as code, this approach offers a good balance of control, scalability, and operational simplicity.
Architecture Overview
Before diving into the implementation details, let's understand the architecture we'll be building:
The architecture follows AWS best practices with a focus on security, high availability, and scalability:
Infrastructure Components
- Keycloak runs as a containerized application in ECS Fargate
- Aurora PostgreSQL provides a highly available database backend
- Application Load Balancer (ALB) distributes traffic across multiple instances
Security & Reliability
- Security groups and network ACLs control traffic flow
- Auto-scaling ensures the system can handle varying loads
- CloudWatch provides monitoring and alerting
Prerequisites
Before You Begin
Make sure you have all the prerequisites in place before starting the deployment process. Missing requirements can lead to errors or security issues in your infrastructure.
Before we begin, ensure you have the following:
- AWS Account with appropriate permissions
- Terraform (version 1.0 or later; see the version-pinning sketch below)
- AWS CLI configured with appropriate credentials
- A registered domain name (optional, but recommended for production)
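The Terraform and provider versions can also be pinned in code so everyone runs compatible tooling. This is a minimal sketch; the provider constraint is an assumption and should be aligned with whatever versions you have tested:
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      # Assumption: pick a constraint compatible with the modules used below
      version = ">= 4.0"
    }
  }
}

provider "aws" {
  region = "us-east-1" # matches the Availability Zones used throughout this guide
}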
VPC Setup
We'll start by creating a Virtual Private Cloud (VPC) with public and private subnets across multiple Availability Zones:
VPC Configuration Explained
The following Terraform code creates the foundation of our infrastructure: a Virtual Private Cloud (VPC) with both public and private subnets. This network isolation is crucial for security, as it allows us to place our database and Keycloak containers in private subnets while keeping only the load balancer in public subnets.
We're using the AWS VPC module which simplifies the creation of a properly configured VPC with NAT gateways for outbound internet access from private subnets.
We're also creating a security group for the Application Load Balancer that will allow HTTPS (port 443) and HTTP (port 80) traffic from the internet. The HTTP traffic will be redirected to HTTPS for secure communication.
BEST PRACTICE This setup follows AWS best practices for a production-grade environment with multiple Availability Zones for high availability.
# VPC Configuration
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.0"

  name = "keycloak-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = false # For production, use multiple NAT gateways
  one_nat_gateway_per_az = true

  enable_vpn_gateway = false

  # Enable DNS support for the VPC
  enable_dns_hostnames = true
  enable_dns_support   = true

  # Add tags for better resource management
  tags = {
    Environment = "production"
    Project     = "keycloak"
    Terraform   = "true"
  }
}

# Security Groups
resource "aws_security_group" "alb" {
  name        = "keycloak-alb-sg"
  description = "Security group for Keycloak ALB"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTPS from internet"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP from internet (for redirects)"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
RDS (Aurora PostgreSQL) Setup
Keycloak requires a database to store its configuration and user data. We'll use Amazon Aurora PostgreSQL for its high availability and performance:
Database Infrastructure
Next, we'll set up the database infrastructure for Keycloak. We're using Amazon Aurora PostgreSQL, which is a fully managed, PostgreSQL-compatible relational database that provides the performance and availability of commercial-grade databases at a fraction of the cost.
This is critical for Keycloak as it stores all user data, authentication configurations, and session information.
The code below creates:
- A database subnet group that spans multiple Availability Zones for high availability
- A security group that only allows traffic from the Keycloak service
- An Aurora PostgreSQL cluster with two instances
SECURITY NOTE For production environments, you should use AWS Secrets Manager instead of hardcoding database credentials in Terraform variables.
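As a sketch of that recommendation (resource and secret names here are illustrative, and the main snippets below keep the simpler variable for readability), you could generate the password and store it in Secrets Manager instead:
# Generate a random master password and keep it in Secrets Manager (illustrative)
# Requires the hashicorp/random provider
resource "random_password" "keycloak_db" {
  length  = 32
  special = false
}

resource "aws_secretsmanager_secret" "keycloak_db" {
  name = "keycloak/database-password" # hypothetical secret name
}

resource "aws_secretsmanager_secret_version" "keycloak_db" {
  secret_id     = aws_secretsmanager_secret.keycloak_db.id
  secret_string = random_password.keycloak_db.result
}

# The cluster below could then reference the generated value instead of var.database_password:
#   master_password = random_password.keycloak_db.result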
# Database Subnet Group
resource "aws_db_subnet_group" "keycloak" {
  name       = "keycloak-db-subnet-group"
  subnet_ids = module.vpc.private_subnets

  tags = {
    Name = "Keycloak DB Subnet Group"
  }
}

# Database Security Group
resource "aws_security_group" "database" {
  name        = "keycloak-database-sg"
  description = "Security group for Keycloak database"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description     = "PostgreSQL from Keycloak service"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.keycloak.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Aurora PostgreSQL Cluster
resource "aws_rds_cluster" "keycloak" {
  cluster_identifier      = "keycloak-cluster"
  engine                  = "aurora-postgresql"
  engine_version          = "13.7"
  database_name           = "keycloak"
  master_username         = "keycloak"
  master_password         = var.database_password # Use AWS Secrets Manager in production
  backup_retention_period = 7
  preferred_backup_window = "03:00-04:00"
  db_subnet_group_name    = aws_db_subnet_group.keycloak.name
  vpc_security_group_ids  = [aws_security_group.database.id]
  skip_final_snapshot     = true # Change for production

  tags = {
    Name = "Keycloak Aurora Cluster"
  }
}

# Aurora PostgreSQL Instances
resource "aws_rds_cluster_instance" "keycloak" {
  count                = 2 # Create 2 instances for high availability
  identifier           = "keycloak-instance-${count.index}"
  cluster_identifier   = aws_rds_cluster.keycloak.id
  instance_class       = "db.r5.large"
  engine               = "aurora-postgresql"
  engine_version       = "13.7"
  db_subnet_group_name = aws_db_subnet_group.keycloak.name

  tags = {
    Name = "Keycloak Aurora Instance ${count.index}"
  }
}
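Both the database security group above and the ECS service later on reference aws_security_group.keycloak, which the snippets don't define. A minimal sketch of that missing security group, allowing inbound traffic to the Keycloak containers only from the ALB, might look like this:
# Security group for the Keycloak ECS tasks (referenced above but not shown elsewhere in this guide)
resource "aws_security_group" "keycloak" {
  name        = "keycloak-service-sg"
  description = "Security group for Keycloak ECS tasks"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description     = "HTTP from the ALB only"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}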
ECS Cluster and Task Definition
Now, let's create an ECS cluster and task definition for running Keycloak:
ECS Infrastructure
Now we'll set up the ECS (Elastic Container Service) infrastructure to run Keycloak. We're using AWS Fargate, which is a serverless compute engine that allows you to run containers without managing servers.
This eliminates the need to provision and maintain EC2 instances, making it easier to scale and operate Keycloak.
The following code creates:
- An ECS cluster to group and manage your container tasks
- The necessary IAM roles for task execution
- A task definition that specifies how Keycloak should run
KEY CONFIGURATION The task definition includes important settings like CPU/memory allocations, container image, environment variables for database connection, admin credentials, and health checks.
# ECS Cluster
resource "aws_ecs_cluster" "keycloak" {
  name = "keycloak-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Name = "Keycloak ECS Cluster"
  }
}

# IAM Role for ECS Task Execution
resource "aws_iam_role" "ecs_task_execution_role" {
  name = "keycloak-ecs-task-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        },
        Action = "sts:AssumeRole"
      }
    ]
  })
}

# Attach the AWS managed policy for ECS task execution
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role_policy" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# ECS Task Definition
resource "aws_ecs_task_definition" "keycloak" {
  family                   = "keycloak"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "1024"
  memory                   = "2048"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn

  container_definitions = jsonencode([
    {
      name      = "keycloak"
      image     = "quay.io/keycloak/keycloak:20.0.3"
      essential = true
      portMappings = [
        {
          containerPort = 8080
          hostPort      = 8080
          protocol      = "tcp"
        }
      ]
      environment = [
        { name = "KC_DB", value = "postgres" },
        # Reference the Aurora cluster endpoint instead of hardcoding it
        { name = "KC_DB_URL", value = "jdbc:postgresql://${aws_rds_cluster.keycloak.endpoint}:5432/keycloak" },
        { name = "KC_DB_USERNAME", value = "keycloak" },
        { name = "KC_DB_PASSWORD", value = var.database_password },
        { name = "KEYCLOAK_ADMIN", value = "admin" },
        { name = "KEYCLOAK_ADMIN_PASSWORD", value = var.keycloak_admin_password },
        { name = "KC_HOSTNAME", value = var.keycloak_hostname },
        { name = "KC_PROXY", value = "edge" },
        { name = "KC_HEALTH_ENABLED", value = "true" },
        { name = "KC_HTTP_ENABLED", value = "true" },
        { name = "KC_HTTP_PORT", value = "8080" },
        { name = "KC_CACHE", value = "ispn" },
        # The kubernetes cache stack relies on DNS-based discovery; pair it with
        # service discovery on ECS if Infinispan clustering is required
        { name = "KC_CACHE_STACK", value = "kubernetes" }
      ]
      # "start --optimized" only makes sense with a custom image pre-built with
      # these options; with the stock image, run a plain "start"
      command = ["start"]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          # The log group is created explicitly in the next section
          "awslogs-group"         = "/ecs/keycloak"
          "awslogs-region"        = "us-east-1"
          "awslogs-stream-prefix" = "keycloak"
        }
      }
      healthCheck = {
        # Note: recent official Keycloak images may not ship curl; adjust this
        # check (or rely on the ALB health check) if the binary is missing
        command     = ["CMD-SHELL", "curl -f http://localhost:8080/health/ready || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }
    }
  ])

  tags = {
    Name = "Keycloak Task Definition"
  }
}
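If you adopt the Secrets Manager approach sketched in the database section, the passwords can also be injected as ECS secrets rather than plain environment variables. The snippet below is illustrative and assumes the hypothetical aws_secretsmanager_secret.keycloak_db resource from that sketch:
# Inside container_definitions, the password environment entries could be replaced with:
#   secrets = [
#     {
#       name      = "KC_DB_PASSWORD"
#       valueFrom = aws_secretsmanager_secret.keycloak_db.arn
#     }
#   ]

# The task execution role then needs permission to read that secret (illustrative policy)
resource "aws_iam_role_policy" "ecs_task_execution_secrets" {
  name = "keycloak-ecs-secrets-access"
  role = aws_iam_role.ecs_task_execution_role.id

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect   = "Allow",
        Action   = ["secretsmanager:GetSecretValue"],
        Resource = [aws_secretsmanager_secret.keycloak_db.arn]
      }
    ]
  })
}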
Next, let's create an ECS service to run the Keycloak task with auto-scaling:
ECS Service & Auto-Scaling
With our ECS cluster and task definition in place, we now need to create an ECS service that will maintain and scale the desired number of Keycloak instances. This service ensures that the specified number of tasks are running at all times and integrates with the Application Load Balancer for traffic distribution.
We're also setting up auto-scaling to automatically adjust capacity based on demand, which is essential for handling traffic spikes efficiently.
The following code creates:
- An ECS service that runs two Keycloak instances across multiple Availability Zones
- CloudWatch logs for monitoring
- An auto-scaling policy that adds more instances when CPU utilization exceeds 70%
PERFORMANCE TIP The scale-out cooldown is set to just 60 seconds to quickly respond to traffic increases, while scale-in cooldown is 300 seconds to prevent rapid fluctuations.
# ECS Service
resource "aws_ecs_service" "keycloak" {
  name            = "keycloak-service"
  cluster         = aws_ecs_cluster.keycloak.id
  task_definition = aws_ecs_task_definition.keycloak.arn
  launch_type     = "FARGATE"
  desired_count   = 2

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.keycloak.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.keycloak.arn
    container_name   = "keycloak"
    container_port   = 8080
  }

  health_check_grace_period_seconds = 300

  deployment_controller {
    type = "ECS"
  }

  # Configure service discovery if needed

  depends_on = [aws_lb_listener.keycloak_https]

  tags = {
    Name = "Keycloak ECS Service"
  }
}

# CloudWatch Log Group for Keycloak
resource "aws_cloudwatch_log_group" "keycloak" {
  name              = "/ecs/keycloak"
  retention_in_days = 30

  tags = {
    Name = "Keycloak Logs"
  }
}

# Auto Scaling for Keycloak
resource "aws_appautoscaling_target" "keycloak" {
  max_capacity       = 10
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.keycloak.name}/${aws_ecs_service.keycloak.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Auto Scaling Policy based on CPU Utilization
resource "aws_appautoscaling_policy" "keycloak_cpu" {
  name               = "keycloak-cpu-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.keycloak.resource_id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
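The same target-tracking mechanism works for other metrics as well. For example, a memory-based policy could sit alongside the CPU policy; the 80% target below is an assumption to tune for your workload:
# Optional: scale on memory utilization as well (target value is an assumption)
resource "aws_appautoscaling_policy" "keycloak_memory" {
  name               = "keycloak-memory-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.keycloak.resource_id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    target_value       = 80
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}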
Application Load Balancer (ALB) Setup
Let's set up an Application Load Balancer to distribute traffic to the Keycloak instances:
Load Balancer Configuration
An Application Load Balancer (ALB) is essential for distributing traffic across multiple Keycloak instances, providing high availability, and enabling SSL termination. The ALB sits in the public subnets and routes traffic to Keycloak containers running in private subnets, adding an important layer of security.
This architecture follows the security principle of defense in depth, with multiple layers of protection.
The following code creates:
- An internet-facing Application Load Balancer
- A target group that routes traffic to the Keycloak instances
- Listeners for both HTTPS and HTTP traffic (with HTTP redirecting to HTTPS)
SECURITY FEATURE The HTTPS listener uses an SSL certificate from AWS Certificate Manager (ACM) to provide secure communication.
# Application Load Balancer
resource "aws_lb" "keycloak" {
  name               = "keycloak-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = module.vpc.public_subnets

  enable_deletion_protection = true # Recommended for production

  tags = {
    Name = "Keycloak ALB"
  }
}

# Target Group for Keycloak
resource "aws_lb_target_group" "keycloak" {
  name        = "keycloak-target-group"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    enabled             = true
    interval            = 30
    path                = "/health/ready"
    port                = "traffic-port"
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 5
    protocol            = "HTTP"
    matcher             = "200"
  }

  tags = {
    Name = "Keycloak Target Group"
  }
}

# HTTPS Listener
resource "aws_lb_listener" "keycloak_https" {
  load_balancer_arn = aws_lb.keycloak.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.keycloak.arn
  }
}

# HTTP to HTTPS Redirect
resource "aws_lb_listener" "keycloak_http" {
  load_balancer_arn = aws_lb.keycloak.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
Route 53 and ACM Setup
Finally, let's configure DNS and SSL certificates:
DNS and SSL Configuration
The final piece of our infrastructure is DNS configuration and SSL certificate management. Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service that will route users to your Keycloak instance. AWS Certificate Manager (ACM) provides free SSL certificates that can be used with AWS services like the Application Load Balancer.
Using a custom domain with proper SSL certificates is essential for production environments, as it provides a professional appearance and secure communication channel.
The following code creates:
- An SSL certificate (if needed)
- DNS records to point your domain to the Application Load Balancer
- The necessary DNS validation records if you're creating a new certificate
RELIABILITY FEATURE The alias record in Route 53 automatically updates if your ALB's IP address changes.
# ACM Certificate (if not already existing)
resource "aws_acm_certificate" "keycloak" {
  count = var.create_certificate ? 1 : 0

  domain_name       = var.keycloak_hostname
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Name = "Keycloak Certificate"
  }
}

# Route 53 Record for Keycloak
resource "aws_route53_record" "keycloak" {
  zone_id = var.route53_zone_id
  name    = var.keycloak_hostname
  type    = "A"

  alias {
    name                   = aws_lb.keycloak.dns_name
    zone_id                = aws_lb.keycloak.zone_id
    evaluate_target_health = true
  }
}

# Certificate validation records (if creating a new certificate)
resource "aws_route53_record" "keycloak_validation" {
  for_each = var.create_certificate ? {
    for dvo in aws_acm_certificate.keycloak[0].domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  } : {}

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = var.route53_zone_id
}
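If you create the certificate here rather than supplying an existing var.acm_certificate_arn, an aws_acm_certificate_validation resource ties the certificate to its validation records so Terraform waits for issuance before using it. A sketch:
# Wait for DNS validation to complete (only when creating a new certificate)
resource "aws_acm_certificate_validation" "keycloak" {
  count = var.create_certificate ? 1 : 0

  certificate_arn         = aws_acm_certificate.keycloak[0].arn
  validation_record_fqdns = [for record in aws_route53_record.keycloak_validation : record.fqdn]
}

# The HTTPS listener could then use whichever certificate applies, e.g.:
#   certificate_arn = var.create_certificate ? aws_acm_certificate_validation.keycloak[0].certificate_arn : var.acm_certificate_arn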
Terraform Modules
For better organization, you can structure your Terraform code using modules. Here's a suggested module structure:
Modular Infrastructure Code
As your infrastructure grows in complexity, it's a good practice to organize your Terraform code into modules. Modules are containers for multiple resources that are used together and allow you to treat a group of resources as if they were a single resource.
This modular approach improves code reusability, readability, and maintainability.
Benefits of using modules include:
- Encapsulation of specific infrastructure components
- Easier collaboration in teams
- Better code organization and maintenance
- Improved reusability across projects
# modules/vpc/main.tf
# modules/database/main.tf
# modules/ecs/main.tf
# modules/alb/main.tf
# modules/dns/main.tf
# modules/monitoring/main.tf

# Root main.tf
module "vpc" {
  source = "./modules/vpc"
  # ...
}

module "database" {
  source = "./modules/database"
  vpc_id = module.vpc.vpc_id
  # ...
}

module "ecs" {
  source      = "./modules/ecs"
  vpc_id      = module.vpc.vpc_id
  db_endpoint = module.database.endpoint
  # ...
}

# ... other modules
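For the wiring above to work, each module must expose the values its consumers reference. For example, the database module could publish its cluster endpoint as an output; this sketch of a modules/database/outputs.tf simply mirrors the resources shown earlier:
# modules/database/outputs.tf (illustrative)
output "endpoint" {
  description = "Writer endpoint of the Keycloak Aurora cluster"
  value       = aws_rds_cluster.keycloak.endpoint
}

output "security_group_id" {
  description = "Security group protecting the database"
  value       = aws_security_group.database.id
}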
Deployment
To deploy the infrastructure, follow these steps:
Deployment Process
With all our Terraform code in place, we're ready to deploy the infrastructure. Terraform follows a declarative approach where you define the desired state of your infrastructure, and Terraform figures out how to achieve that state.
The deployment process involves initializing Terraform, planning the changes, and applying them.
The commands below will:
- Initialize the working directory and download provider plugins
- Plan the changes to show what actions Terraform will take
- Apply the changes to create the infrastructure
- Verify the deployment with output variables
IMPORTANT Always review the plan carefully before applying changes to ensure they match your expectations.
# Initialize Terraform
terraform init

# Plan the deployment
terraform plan -var-file=prod.tfvars -out=tfplan

# Apply the deployment
terraform apply tfplan

# Verify deployment
terraform output
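The plan step references a prod.tfvars file. A hypothetical example covering the variables used throughout this guide (all values are placeholders) might look like:
# prod.tfvars (placeholder values)
database_password       = "change-me" # prefer Secrets Manager, as noted earlier
keycloak_admin_password = "change-me"
keycloak_hostname       = "auth.example.com"
route53_zone_id         = "Z0123456789ABCDEFGHIJ"
create_certificate      = true
acm_certificate_arn     = "" # supply an existing certificate ARN if create_certificate is false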
After deployment, your Keycloak instance will be available at the domain you configured (e.g., auth.example.com).
Testing the Deployment
Verification Steps
After deploying your infrastructure, it's crucial to thoroughly test your Keycloak deployment to ensure everything is working as expected. This includes testing functionality, security, and high availability.
Don't skip testing! A well-tested deployment prevents issues when users start accessing the system.
1. Basic Functionality
- Access the Keycloak admin console using configured credentials
- Create a test realm and user
- Configure a simple client application
2. Advanced Testing
- Test authentication flows
- Verify logs in CloudWatch
- Test high availability by simulating failures
Conclusion
In this guide, we've covered how to deploy Keycloak on AWS ECS with Fargate using Terraform. This approach provides a scalable, highly available, and secure identity management solution without the overhead of managing the underlying infrastructure.
Key Benefits Summary
This architecture provides a production-ready Keycloak deployment that follows AWS best practices for security, scalability, and reliability. The serverless approach with Fargate minimizes operational overhead while maintaining full control over your identity management system.
Infrastructure
- Serverless operation with Fargate
- High availability across multiple AZs
- Automatic scaling based on demand
Security
- Secure network configuration
- SSL/TLS encryption
- Private subnets for sensitive components
Management
- Managed database with Aurora PostgreSQL
- Infrastructure as Code with Terraform
- Monitoring with CloudWatch
This solution is suitable for production environments and can be further enhanced with additional monitoring, alerting, and security features based on your specific requirements.