
Deploying Keycloak on AWS ECS with Fargate using Terraform

February 7, 2024
8 min read
AWS
Terraform
Keycloak
DevOps
Infrastructure

Introduction

Keycloak is a popular open-source Identity and Access Management solution that provides single sign-on, identity federation, social login, and more. Deploying Keycloak in a production environment requires careful planning to ensure security, scalability, and high availability.

In this comprehensive guide, we'll walk through deploying Keycloak on AWS Elastic Container Service (ECS) with Fargate using Terraform. This serverless approach eliminates the need to manage the underlying infrastructure, allowing you to focus on your application.

Why This Approach?

Using ECS with Fargate provides a serverless container platform that eliminates the need to provision and manage servers. Combined with Terraform for infrastructure as code, this approach keeps the whole deployment version-controlled and repeatable, while AWS manages the compute capacity underneath the containers.

Architecture Overview

The architecture we'll build follows AWS best practices, with a focus on security, high availability, and scalability:

Infrastructure Components

  • Keycloak runs as a containerized application in ECS Fargate
  • Aurora PostgreSQL provides a highly available database backend
  • Application Load Balancer (ALB) distributes traffic across multiple instances

Security & Reliability

  • Security groups restrict traffic between the load balancer, the Keycloak tasks, and the database
  • Auto-scaling ensures the system can handle varying loads
  • CloudWatch provides monitoring and alerting

Prerequisites

Before You Begin

Make sure you have all the prerequisites in place before starting the deployment process. Missing requirements can lead to errors or security issues in your infrastructure.

Before we begin, ensure you have the following:

  • AWS Account with appropriate permissions
  • Terraform (version 1.0 or later)
  • AWS CLI configured with appropriate credentials
  • A registered domain name with a Route 53 hosted zone (the ACM certificate, HTTPS listener, and DNS record below all depend on it)
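The prerequisites above only require Terraform 1.0 or later; pinning that requirement and the AWS provider version in a versions.tf makes runs reproducible across machines. The sketch below is an assumption, not part of the original setup: the ~> 4.0 provider constraint is our choice, because the 3.x releases of the VPC module used in the next section predate AWS provider 5.x, and the region matches the us-east-1 values hard-coded throughout this guide.

```hcl
# versions.tf (sketch; the version constraints are assumptions to adjust)
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumed: the 3.x VPC module used below predates AWS provider 5.x
      version = "~> 4.0"
    }
  }
}

# Region matches the hard-coded us-east-1 AZs and log settings in this guide
provider "aws" {
  region = "us-east-1"
}
```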

VPC Setup

We'll start by creating a Virtual Private Cloud (VPC) with public and private subnets across multiple Availability Zones:

VPC Configuration Explained

The following Terraform code creates the foundation of our infrastructure: a Virtual Private Cloud (VPC) with both public and private subnets. This network isolation is crucial for security, as it allows us to place our database and Keycloak containers in private subnets while keeping only internet-facing components (the load balancer and the NAT gateways) in public subnets.

We're using the community terraform-aws-modules/vpc module, which simplifies creating a properly configured VPC with NAT gateways. The NAT gateways give the private subnets outbound internet access, which the Keycloak tasks need in order to pull their image from quay.io.

We're also creating a security group for the Application Load Balancer that will allow HTTPS (port 443) and HTTP (port 80) traffic from the internet. The HTTP traffic will be redirected to HTTPS for secure communication.

BEST PRACTICE This setup follows AWS best practices for a production-grade environment with multiple Availability Zones for high availability.

# VPC Configuration
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.0"

  name = "keycloak-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = false  # For production, use multiple NAT gateways
  one_nat_gateway_per_az = true

  enable_vpn_gateway = false

  # Enable DNS support for the VPC
  enable_dns_hostnames = true
  enable_dns_support   = true

  # Add tags for better resource management
  tags = {
    Environment = "production"
    Project     = "keycloak"
    Terraform   = "true"
  }
}

# Security Groups
resource "aws_security_group" "alb" {
  name        = "keycloak-alb-sg"
  description = "Security group for Keycloak ALB"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTPS from internet"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP from internet (for redirects)"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
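The database security group and the ECS service below both reference aws_security_group.keycloak, but the configuration never defines it, so terraform plan would stop with an undeclared-resource error. A minimal sketch of that group follows. It admits port 8080 only from the ALB security group; the self-referencing rule on port 7800 is an assumption based on the default JGroups TCP port Keycloak uses for cache clustering, and your discovery setup may need additional ports.

```hcl
# Security group for the Keycloak Fargate tasks (referenced as
# aws_security_group.keycloak by the database SG and the ECS service)
resource "aws_security_group" "keycloak" {
  name        = "keycloak-service-sg"
  description = "Security group for Keycloak ECS tasks"
  vpc_id      = module.vpc.vpc_id

  # Only the load balancer may reach the Keycloak HTTP port
  ingress {
    description     = "Keycloak HTTP from the ALB only"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Assumed: default JGroups TCP port, open only between Keycloak tasks
  ingress {
    description = "JGroups clustering between Keycloak tasks"
    from_port   = 7800
    to_port     = 7800
    protocol    = "tcp"
    self        = true
  }

  # Outbound to the database, NAT gateways (image pulls), and AWS APIs
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```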

RDS (Aurora PostgreSQL) Setup

Keycloak requires a database to store its configuration and user data. We'll use Amazon Aurora PostgreSQL for its high availability and performance:

Database Infrastructure

Next, we'll set up the database infrastructure for Keycloak. We're using Amazon Aurora PostgreSQL, a fully managed, PostgreSQL-compatible relational database that replicates its storage across three Availability Zones and can fail over to a replica instance.

This is critical for Keycloak, as the database holds realms, clients, users, credentials, and offline sessions; regular user sessions live in Keycloak's in-memory Infinispan caches.

The code below creates:

  • A database subnet group that spans multiple Availability Zones for high availability
  • A security group that only allows traffic from the Keycloak service
  • An Aurora PostgreSQL cluster with two instances

SECURITY NOTE For production environments, you should use AWS Secrets Manager instead of hardcoding database credentials in Terraform variables.

# Database Subnet Group
resource "aws_db_subnet_group" "keycloak" {
  name       = "keycloak-db-subnet-group"
  subnet_ids = module.vpc.private_subnets

  tags = {
    Name = "Keycloak DB Subnet Group"
  }
}

# Database Security Group
resource "aws_security_group" "database" {
  name        = "keycloak-database-sg"
  description = "Security group for Keycloak database"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description     = "PostgreSQL from Keycloak service"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.keycloak.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Aurora PostgreSQL Cluster
resource "aws_rds_cluster" "keycloak" {
  cluster_identifier      = "keycloak-cluster"
  engine                  = "aurora-postgresql"
  engine_version          = "13.7"  # Check this version is still offered; Aurora retires old minor versions
  database_name           = "keycloak"
  master_username         = "keycloak"
  master_password         = var.database_password  # Use AWS Secrets Manager in production
  storage_encrypted       = true  # Encrypt data at rest with the default aws/rds KMS key
  backup_retention_period = 7
  preferred_backup_window = "03:00-04:00"
  db_subnet_group_name    = aws_db_subnet_group.keycloak.name
  vpc_security_group_ids  = [aws_security_group.database.id]
  skip_final_snapshot     = true  # For production, set to false and set final_snapshot_identifier

  tags = {
    Name = "Keycloak Aurora Cluster"
  }
}

# Aurora PostgreSQL Instances
resource "aws_rds_cluster_instance" "keycloak" {
  count                = 2  # Create 2 instances for high availability
  identifier           = "keycloak-instance-${count.index}"
  cluster_identifier   = aws_rds_cluster.keycloak.id
  instance_class       = "db.r5.large"
  engine               = aws_rds_cluster.keycloak.engine          # Stay in sync with the cluster
  engine_version       = aws_rds_cluster.keycloak.engine_version
  db_subnet_group_name = aws_db_subnet_group.keycloak.name

  tags = {
    Name = "Keycloak Aurora Instance ${count.index}"
  }
}
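One way to act on the security note above is to let Terraform generate the master password and store it in AWS Secrets Manager, so ECS can inject it into the container at startup. This is a sketch under assumptions: the secret name keycloak/db-password and the policy name are hypothetical, and the generated password is still recorded in Terraform state, so keep the state backend encrypted and access-controlled. The comments at the end show where the new values plug into the existing cluster and task definition.

```hcl
# Generate the master password instead of passing it in as a variable
resource "random_password" "keycloak_db" {
  length  = 32
  special = true
  # Aurora master passwords may not contain '/', '"', '@' or spaces
  override_special = "!#$%^&*()-_=+[]{}:?"
}

resource "aws_secretsmanager_secret" "keycloak_db" {
  name = "keycloak/db-password" # hypothetical name; must be unique per account and region
}

resource "aws_secretsmanager_secret_version" "keycloak_db" {
  secret_id     = aws_secretsmanager_secret.keycloak_db.id
  secret_string = random_password.keycloak_db.result
}

# Allow the ECS task execution role to read the secret so ECS can inject it
resource "aws_iam_role_policy" "keycloak_db_secret" {
  name = "keycloak-db-secret-read" # hypothetical policy name
  role = aws_iam_role.ecs_task_execution_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = aws_secretsmanager_secret.keycloak_db.arn
    }]
  })
}

# To use the generated password:
#   in aws_rds_cluster.keycloak:  master_password = random_password.keycloak_db.result
#   in the Keycloak container definition, drop the KC_DB_PASSWORD environment entry and add:
#   secrets = [{ name = "KC_DB_PASSWORD", valueFrom = aws_secretsmanager_secret.keycloak_db.arn }]
```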

ECS Cluster and Task Definition

Now, let's create an ECS cluster and task definition for running Keycloak:

ECS Infrastructure

Now we'll set up the ECS (Elastic Container Service) infrastructure to run Keycloak. We're using AWS Fargate, which is a serverless compute engine that allows you to run containers without managing servers.

This eliminates the need to provision and maintain EC2 instances, making it easier to scale and operate Keycloak.

The following code creates:

  • An ECS cluster to group and manage your container tasks
  • The IAM task execution role that lets ECS pull the container image and write logs
  • A task definition that specifies how Keycloak should run

KEY CONFIGURATION The task definition includes important settings like CPU/memory allocations, container image, environment variables for database connection, admin credentials, and health checks.

# ECS Cluster
resource "aws_ecs_cluster" "keycloak" {
  name = "keycloak-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Name = "Keycloak ECS Cluster"
  }
}

# IAM Role for ECS Task Execution
resource "aws_iam_role" "ecs_task_execution_role" {
  name = "keycloak-ecs-task-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        },
        Action = "sts:AssumeRole"
      }
    ]
  })
}

# Attach the AWS managed policy for ECS task execution
resource "aws_iam_role_policy_attachment" "ecs_task_execution_role_policy" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# ECS Task Definition
resource "aws_ecs_task_definition" "keycloak" {
  family                   = "keycloak"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "1024"
  memory                   = "2048"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn

  container_definitions = jsonencode([
    {
      name      = "keycloak"
      image     = "quay.io/keycloak/keycloak:20.0.3"
      essential = true
      portMappings = [
        {
          containerPort = 8080
          hostPort      = 8080
          protocol      = "tcp"
        }
      ]
      environment = [
        { name = "KC_DB", value = "postgres" },
        # Build the JDBC URL from the cluster's writer endpoint instead of hardcoding it
        { name = "KC_DB_URL", value = "jdbc:postgresql://${aws_rds_cluster.keycloak.endpoint}:5432/keycloak" },
        { name = "KC_DB_USERNAME", value = "keycloak" },
        # Variables must be unquoted; inside quotes they become the literal text "var....".
        # Values set here are visible in the task definition; prefer the `secrets` field with Secrets Manager.
        { name = "KC_DB_PASSWORD", value = var.database_password },
        { name = "KEYCLOAK_ADMIN", value = "admin" },
        { name = "KEYCLOAK_ADMIN_PASSWORD", value = var.keycloak_admin_password },
        { name = "KC_HOSTNAME", value = var.keycloak_hostname },
        { name = "KC_PROXY", value = "edge" },
        { name = "KC_HEALTH_ENABLED", value = "true" },
        { name = "KC_HTTP_ENABLED", value = "true" },
        { name = "KC_HTTP_PORT", value = "8080" },
        { name = "KC_CACHE", value = "ispn" },
        # The "kubernetes" stack discovers peers via DNS_PING against a Kubernetes headless
        # service, which ECS does not provide, so tasks will not form a cluster as written.
        # Replace it with an ECS-compatible discovery setup (for example JDBC_PING in a
        # custom Infinispan config) before relying on more than one task.
        { name = "KC_CACHE_STACK", value = "kubernetes" }
      ]
      # "--optimized" skips the build step and expects an image pre-built with `kc.sh build`
      # using the build options above (KC_DB, KC_HEALTH_ENABLED, KC_CACHE, ...). With the
      # stock image, plain "start" lets Keycloak run that build at startup (slower boot).
      command = ["start"]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          # Use the log group Terraform creates; AmazonECSTaskExecutionRolePolicy does not
          # grant logs:CreateLogGroup, so awslogs-create-group is not used here.
          "awslogs-group"         = aws_cloudwatch_log_group.keycloak.name
          "awslogs-region"        = "us-east-1"
          "awslogs-stream-prefix" = "keycloak"
        }
      }
      # Assumes curl exists in the image. Recent Keycloak images are based on UBI micro and
      # may not ship curl; if so, remove this block and rely on the ALB health check.
      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:8080/health/ready || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }
    }
  ])

  tags = {
    Name = "Keycloak Task Definition"
  }
}

Next, let's create an ECS service to run the Keycloak task with auto-scaling:

ECS Service & Auto-Scaling

With our ECS cluster and task definition in place, we now need to create an ECS service that will maintain and scale the desired number of Keycloak instances. This service ensures that the specified number of tasks are running at all times and integrates with the Application Load Balancer for traffic distribution.

We're also setting up auto-scaling to automatically adjust capacity based on demand, which is essential for handling traffic spikes efficiently.

The following code creates:

  • An ECS service that runs two Keycloak instances across multiple Availability Zones
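  • A CloudWatch Logs log group with 30-day retention for the container logs
  • A target-tracking auto-scaling policy that keeps average CPU utilization near 70%, adding tasks when it is exceeded and removing them when load drops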

PERFORMANCE TIP The scale-out cooldown is set to just 60 seconds to quickly respond to traffic increases, while scale-in cooldown is 300 seconds to prevent rapid fluctuations.

# ECS Service
resource "aws_ecs_service" "keycloak" {
  name            = "keycloak-service"
  cluster         = aws_ecs_cluster.keycloak.id
  task_definition = aws_ecs_task_definition.keycloak.arn
  launch_type     = "FARGATE"
  desired_count   = 2

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.keycloak.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.keycloak.arn
    container_name   = "keycloak"
    container_port   = 8080
  }

  health_check_grace_period_seconds = 300

  deployment_controller {
    type = "ECS"
  }

  # Configure service discovery if needed

  # Auto scaling changes desired_count at runtime; ignore it so that
  # terraform apply does not reset the service back to 2 tasks
  lifecycle {
    ignore_changes = [desired_count]
  }

  depends_on = [aws_lb_listener.keycloak_https]

  tags = {
    Name = "Keycloak ECS Service"
  }
}

# CloudWatch Log Group for Keycloak
resource "aws_cloudwatch_log_group" "keycloak" {
  name              = "/ecs/keycloak"
  retention_in_days = 30

  tags = {
    Name = "Keycloak Logs"
  }
}

# Auto Scaling for Keycloak
resource "aws_appautoscaling_target" "keycloak" {
  max_capacity = 10
  min_capacity = 2
  # Derive the ID from the resources so the target is only created after the service exists
  resource_id        = "service/${aws_ecs_cluster.keycloak.name}/${aws_ecs_service.keycloak.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Auto Scaling Policy based on CPU Utilization
resource "aws_appautoscaling_policy" "keycloak_cpu" {
  name               = "keycloak-cpu-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.keycloak.resource_id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
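Keycloak runs on the JVM and keeps user sessions in in-memory Infinispan caches, so memory pressure can rise while CPU stays below the 70% target. If that matches your workload, a second target-tracking policy on memory is one option. The 80% target below is an assumed starting point, not a tested value. When several target-tracking policies are attached, Application Auto Scaling scales out if any of them calls for it and scales in only when all of them allow it.

```hcl
# Optional second target-tracking policy on average memory utilization
resource "aws_appautoscaling_policy" "keycloak_memory" {
  name               = "keycloak-memory-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.keycloak.resource_id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    target_value       = 80 # assumed threshold; tune against your JVM heap settings
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
```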

Application Load Balancer (ALB) Setup

Let's set up an Application Load Balancer to distribute traffic to the Keycloak instances:

Load Balancer Configuration

An Application Load Balancer (ALB) is essential for distributing traffic across multiple Keycloak instances, providing high availability, and enabling SSL termination. The ALB sits in the public subnets and routes traffic to Keycloak containers running in private subnets, adding an important layer of security.

This architecture follows the security principle of defense in depth, with multiple layers of protection.

The following code creates:

  • An internet-facing Application Load Balancer
  • A target group that routes traffic to the Keycloak instances
  • Listeners for both HTTPS and HTTP traffic (with HTTP redirecting to HTTPS)

SECURITY FEATURE The HTTPS listener uses an SSL certificate from AWS Certificate Manager (ACM) to provide secure communication.

# Application Load Balancer
resource "aws_lb" "keycloak" {
  name               = "keycloak-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = module.vpc.public_subnets

  enable_deletion_protection = true  # Blocks deletion, including terraform destroy, until set back to false

  tags = {
    Name = "Keycloak ALB"
  }
}

# Target Group for Keycloak
resource "aws_lb_target_group" "keycloak" {
  name        = "keycloak-target-group"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"  # Required for Fargate tasks using awsvpc networking

  health_check {
    enabled             = true
    interval            = 30
    path                = "/health/ready"
    port                = "traffic-port"
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 5
    protocol            = "HTTP"
    matcher             = "200"
  }

  tags = {
    Name = "Keycloak Target Group"
  }
}

# HTTPS Listener
resource "aws_lb_listener" "keycloak_https" {
  load_balancer_arn = aws_lb.keycloak.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
  certificate_arn   = var.acm_certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.keycloak.arn
  }
}

# HTTP to HTTPS Redirect
resource "aws_lb_listener" "keycloak_http" {
  load_balancer_arn = aws_lb.keycloak.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
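The overview lists CloudWatch alerting, but the configuration so far only ships logs and Container Insights metrics; it defines no alarms. A minimal sketch of one useful alarm follows: it fires when the target group reports any unhealthy Keycloak task for three consecutive minutes. The SNS topic variable is hypothetical; point it at a topic you already own.

```hcl
# Hypothetical input: an existing SNS topic that should receive alarm notifications
variable "alarm_sns_topic_arn" {
  description = "SNS topic ARN for Keycloak alarms"
  type        = string
}

# Alarm when any Keycloak target behind the ALB is reported unhealthy
resource "aws_cloudwatch_metric_alarm" "keycloak_unhealthy_hosts" {
  alarm_name          = "keycloak-unhealthy-hosts"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "UnHealthyHostCount"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 3
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"

  dimensions = {
    TargetGroup  = aws_lb_target_group.keycloak.arn_suffix
    LoadBalancer = aws_lb.keycloak.arn_suffix
  }

  alarm_actions = [var.alarm_sns_topic_arn]
}
```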

Route 53 and ACM Setup

Finally, let's configure DNS and SSL certificates:

DNS and SSL Configuration

The final piece of our infrastructure is DNS configuration and SSL certificate management. Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service that will route users to your Keycloak instance. AWS Certificate Manager (ACM) provides free SSL certificates that can be used with AWS services like the Application Load Balancer.

Using a custom domain with a valid TLS certificate is essential for production environments: clients get a trusted, encrypted connection, and Keycloak gets a stable public hostname, which appears in the issuer URL of the tokens it issues.

The following code creates:

  • An ACM certificate, created only when var.create_certificate is true
  • DNS records to point your domain to the Application Load Balancer
  • The necessary DNS validation records if you're creating a new certificate

RELIABILITY FEATURE The alias record in Route 53 automatically updates if your ALB's IP address changes.

# ACM Certificate (if not already existing)
resource "aws_acm_certificate" "keycloak" {
  count = var.create_certificate ? 1 : 0

  domain_name       = var.keycloak_hostname
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Name = "Keycloak Certificate"
  }
}

# Route 53 Record for Keycloak
resource "aws_route53_record" "keycloak" {
  zone_id = var.route53_zone_id
  name    = var.keycloak_hostname
  type    = "A"

  alias {
    name                   = aws_lb.keycloak.dns_name
    zone_id                = aws_lb.keycloak.zone_id
    evaluate_target_health = true
  }
}

# Certificate validation records (if creating a new certificate)
resource "aws_route53_record" "keycloak_validation" {
  for_each = var.create_certificate ? {
    for dvo in aws_acm_certificate.keycloak[0].domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  } : {}

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = var.route53_zone_id
}
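When create_certificate is true, nothing above waits for ACM to finish DNS validation, and the HTTPS listener still reads var.acm_certificate_arn. A sketch of the missing piece, using the provider's aws_acm_certificate_validation resource, follows; the trailing comment shows one way to make the listener use whichever certificate applies.

```hcl
# Wait until ACM has validated the newly requested certificate
resource "aws_acm_certificate_validation" "keycloak" {
  count = var.create_certificate ? 1 : 0

  certificate_arn         = aws_acm_certificate.keycloak[0].arn
  validation_record_fqdns = [for record in aws_route53_record.keycloak_validation : record.fqdn]
}

# In aws_lb_listener.keycloak_https, the certificate can then be chosen with:
#   certificate_arn = var.create_certificate ? aws_acm_certificate_validation.keycloak[0].certificate_arn : var.acm_certificate_arn
```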

Terraform Modules

For better organization, you can structure your Terraform code using modules. Here's a suggested module structure:

Modular Infrastructure Code

As your infrastructure grows in complexity, it's a good practice to organize your Terraform code into modules. Modules are containers for multiple resources that are used together and allow you to treat a group of resources as if they were a single resource.

This modular approach improves code reusability, readability, and maintainability.

Benefits of using modules include:

  • Encapsulation of specific infrastructure components
  • Easier collaboration in teams
  • Better code organization and maintenance
  • Improved reusability across projects
# modules/vpc/main.tf
# modules/database/main.tf
# modules/ecs/main.tf
# modules/alb/main.tf
# modules/dns/main.tf
# modules/monitoring/main.tf

# Root main.tf
module "vpc" {
  source = "./modules/vpc"
  # ...
}

module "database" {
  source = "./modules/database"
  vpc_id = module.vpc.vpc_id
  # ...
}

module "ecs" {
  source      = "./modules/ecs"
  vpc_id      = module.vpc.vpc_id
  db_endpoint = module.database.endpoint
  # ...
}

# ... other modules
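The root module reads module.database.endpoint and passes vpc_id in, which only works if the database module declares a matching output and input; a module's resources and arguments are otherwise invisible to its caller. Assuming modules/database/main.tf contains the aws_rds_cluster.keycloak resource from earlier, those declarations could look like this (two files shown in one block):

```hcl
# modules/database/outputs.tf
output "endpoint" {
  description = "Writer endpoint of the Aurora PostgreSQL cluster"
  value       = aws_rds_cluster.keycloak.endpoint
}

# modules/database/variables.tf
variable "vpc_id" {
  description = "ID of the VPC that hosts the database"
  type        = string
}
```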

Deployment

To deploy the infrastructure, follow these steps:

Deployment Process

With all our Terraform code in place, we're ready to deploy the infrastructure. Terraform follows a declarative approach where you define the desired state of your infrastructure, and Terraform figures out how to achieve that state.

The deployment process involves initializing Terraform, planning the changes, and applying them.

The commands below will:

  1. Initialize the working directory and download provider plugins
  2. Plan the changes to show what actions Terraform will take
  3. Apply the changes to create the infrastructure
  4. Verify the deployment with output values

IMPORTANT Always review the plan carefully before applying changes to ensure they match your expectations.

# Initialize Terraform
terraform init

# Plan the deployment
terraform plan -var-file=prod.tfvars -out=tfplan

# Apply the deployment
terraform apply tfplan

# Verify deployment
terraform output
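terraform plan -var-file=prod.tfvars only succeeds if every var.* referenced in the configuration is declared, and the snippets in this guide never declare theirs. A sketch of a variables.tf covering them follows; the descriptions and defaults are our assumptions. Marking the passwords sensitive keeps them out of plan output, although they are still stored in state.

```hcl
# variables.tf: declarations for the inputs referenced in this guide
variable "database_password" {
  description = "Aurora master password (prefer Secrets Manager in production)"
  type        = string
  sensitive   = true
}

variable "keycloak_admin_password" {
  description = "Initial Keycloak admin password"
  type        = string
  sensitive   = true
}

variable "keycloak_hostname" {
  description = "Public hostname for Keycloak, e.g. auth.example.com"
  type        = string
}

variable "route53_zone_id" {
  description = "ID of the Route 53 hosted zone that serves keycloak_hostname"
  type        = string
}

variable "create_certificate" {
  description = "Request and DNS-validate a new ACM certificate"
  type        = bool
  default     = false
}

variable "acm_certificate_arn" {
  description = "ARN of an existing ACM certificate, used when create_certificate is false"
  type        = string
  default     = null
}
```

A matching prod.tfvars then sets values such as keycloak_hostname and route53_zone_id; keep the passwords out of version control, for example by supplying them through TF_VAR_ environment variables.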

After deployment, your Keycloak instance will be available at the hostname you set in var.keycloak_hostname (e.g., auth.example.com).
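The last deployment step runs terraform output, which prints nothing unless the configuration declares output values. A minimal outputs.tf sketch, with output names of our choosing, could expose the public URL, the ALB's DNS name, and the database endpoint:

```hcl
# outputs.tf: values printed by `terraform output`
output "keycloak_url" {
  description = "Public URL of the Keycloak deployment"
  value       = "https://${var.keycloak_hostname}"
}

output "alb_dns_name" {
  description = "DNS name of the Application Load Balancer"
  value       = aws_lb.keycloak.dns_name
}

output "db_endpoint" {
  description = "Writer endpoint of the Aurora cluster"
  value       = aws_rds_cluster.keycloak.endpoint
}
```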

Testing the Deployment

Verification Steps

After deploying your infrastructure, it's crucial to thoroughly test your Keycloak deployment to ensure everything is working as expected. This includes testing functionality, security, and high availability.

Don't skip testing! A well-tested deployment prevents issues when users start accessing the system.

1. Basic Functionality

  • Access the Keycloak admin console using configured credentials
  • Create a test realm and user
  • Configure a simple client application

2. Advanced Testing

  • Test authentication flows
  • Verify logs in CloudWatch
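  • Test high availability by simulating failures, for example stopping one task with aws ecs stop-task or forcing an Aurora failover with aws rds failover-db-cluster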
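Part of the basic functionality check can be automated with a Terraform check block, which is evaluated on every plan and apply. This sketch assumes Terraform 1.5 or later (check blocks are newer than the 1.0 minimum in the prerequisites) and the hashicorp/http provider. It requests the master realm's OpenID Connect discovery document; a failed assertion is reported as a warning and does not stop the run, so it complements rather than replaces the manual tests above.

```hcl
# Post-deployment smoke test (requires Terraform >= 1.5 and the hashicorp/http provider)
check "keycloak_oidc_discovery" {
  # Scoped data source: errors here surface as warnings, not failures
  data "http" "oidc_discovery" {
    url = "https://${var.keycloak_hostname}/realms/master/.well-known/openid-configuration"
  }

  assert {
    condition     = data.http.oidc_discovery.status_code == 200
    error_message = "Keycloak OIDC discovery endpoint did not return HTTP 200."
  }
}
```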

Conclusion

In this guide, we've covered how to deploy Keycloak on AWS ECS with Fargate using Terraform. This approach provides a scalable, highly available, and secure identity management solution without the overhead of managing the underlying infrastructure.

Key Benefits Summary

This architecture provides a production-ready Keycloak deployment that follows AWS best practices for security, scalability, and reliability. The serverless approach with Fargate minimizes operational overhead while maintaining full control over your identity management system.

Infrastructure

  • Serverless operation with Fargate
  • High availability across multiple AZs
  • Automatic scaling based on demand

Security

  • Secure network configuration
  • SSL/TLS encryption
  • Private subnets for sensitive components

Management

  • Managed database with Aurora PostgreSQL
  • Infrastructure as Code with Terraform
  • Monitoring with CloudWatch

This solution is suitable for production environments and can be further enhanced with additional monitoring, alerting, and security features based on your specific requirements.

Need help deploying Keycloak on AWS?
I offer expert consultation on AWS infrastructure, Keycloak deployment, and Terraform implementation. Let's discuss how I can help you build a secure, scalable authentication system.