Introduction

Keycloak is a popular open-source identity and access management solution that provides single sign-on, identity federation, social login, and much more. Deploying Keycloak in a production environment requires careful planning to ensure security, scalability, and high availability.

This guide walks through deploying Keycloak on Amazon Elastic Container Service (ECS) with AWS Fargate using Terraform. Because Fargate runs containers without servers for you to manage, you can focus on Keycloak itself rather than on the hosts underneath it.

Why This Approach?

With Fargate, you don't provision, patch, or scale EC2 instances; you declare the CPU and memory each task needs and AWS runs it. Defining the rest of the infrastructure in Terraform keeps it versioned, reviewable, and reproducible across environments.

Architecture Overview

Before diving into implementation details, here is the architecture we'll be building. It follows AWS best practices with a focus on security, high availability, and scalability:

Infrastructure Components

  • Keycloak runs as a containerized application in ECS Fargate
  • Aurora PostgreSQL provides a highly available database backend
  • Application Load Balancer (ALB) distributes traffic across multiple instances

Security and Reliability

  • Security groups and network ACLs control traffic flow
  • Auto scaling adjusts the number of Keycloak tasks as load varies
  • CloudWatch provides monitoring and alerting

Prerequisites

Before You Begin

Make sure the following are in place before you start. Missing requirements can lead to deployment errors or security gaps in your infrastructure:

  • AWS account with appropriate permissions
  • Terraform (version 1.0 or newer)
  • AWS CLI configured with appropriate credentials
  • Registered domain name (optional but recommended for production)
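The Terraform snippets in this guide reference several input variables (`var.database_password`, `var.keycloak_admin_password`, `var.keycloak_hostname`, `var.route53_zone_id`) without declaring them. A minimal sketch of those declarations follows. The descriptions and the `us-east-1` region (inferred from the availability zones used later) are assumptions, not part of the original configuration:

```hcl
# Assumed baseline: Terraform 1.0+ (per the prerequisites) and a single region.
terraform {
  required_version = ">= 1.0"
}

provider "aws" {
  region = "us-east-1" # assumption: matches the AZs used in the VPC module
}

# Input variables referenced by the resources in this guide.
variable "database_password" {
  description = "Master password for the Aurora PostgreSQL cluster"
  type        = string
  sensitive   = true
}

variable "keycloak_admin_password" {
  description = "Password for the initial Keycloak admin user"
  type        = string
  sensitive   = true
}

variable "keycloak_hostname" {
  description = "Public hostname for Keycloak, for example auth.example.com"
  type        = string
}

variable "route53_zone_id" {
  description = "ID of the Route 53 hosted zone that contains keycloak_hostname"
  type        = string
}
```

Values for these variables would typically live in a file such as `prod.tfvars` that is kept out of version control, since it contains secrets.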

VPC Setup

Let's start by creating a Virtual Private Cloud (VPC) with public and private subnets across multiple availability zones:

# VPC Configuration
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 3.0"

  name = "keycloak-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = false # For production use multiple NAT gateways
  one_nat_gateway_per_az = true

  enable_vpn_gateway = false

  # Enable DNS support for VPC
  enable_dns_hostnames = true
  enable_dns_support   = true

  # Add tags for better resource management
  tags = {
    Environment = "production"
    Project     = "keycloak"
    Terraform   = "true"
  }
}

# Security Groups
resource "aws_security_group" "alb" {
  name        = "keycloak-alb-sg"
  description = "Security group for Keycloak ALB"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTPS from internet"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTP from internet (for redirects)"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
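The database security group in the next section refers to `aws_security_group.keycloak`, which the guide never defines. A minimal sketch of that group for the Keycloak tasks might look like the following; the group name and descriptions are assumptions, and the rule admits traffic only from the ALB on the container port used later (8080):

```hcl
# Security group for the Keycloak ECS tasks (name is an assumption).
resource "aws_security_group" "keycloak" {
  name        = "keycloak-service-sg"
  description = "Security group for Keycloak ECS tasks"
  vpc_id      = module.vpc.vpc_id

  # Only the load balancer may reach the Keycloak HTTP port.
  ingress {
    description     = "Keycloak HTTP from the ALB"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Outbound access for the database, the image registry, and AWS APIs.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Referencing the ALB's security group rather than a CIDR range keeps the tasks unreachable from anything else in the VPC.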

RDS Setup (Aurora PostgreSQL)

Keycloak requires a database to store configuration and user data. We'll use Amazon Aurora PostgreSQL for its high availability and performance:

# Database subnet group
resource "aws_db_subnet_group" "keycloak" {
  name       = "keycloak-db-subnet-group"
  subnet_ids = module.vpc.private_subnets

  tags = {
    Name = "Keycloak DB Subnet Group"
  }
}

# Database security group
resource "aws_security_group" "database" {
  name        = "keycloak-database-sg"
  description = "Security group for Keycloak database"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description     = "PostgreSQL from Keycloak service"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.keycloak.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Aurora PostgreSQL cluster
resource "aws_rds_cluster" "keycloak" {
  cluster_identifier      = "keycloak-cluster"
  engine                  = "aurora-postgresql"
  engine_version          = "13.7"
  database_name           = "keycloak"
  master_username         = "keycloak"
  master_password         = var.database_password # Use AWS Secrets Manager in production
  backup_retention_period = 7
  preferred_backup_window = "03:00-04:00"
  db_subnet_group_name    = aws_db_subnet_group.keycloak.name
  vpc_security_group_ids  = [aws_security_group.database.id]
  skip_final_snapshot     = true # Change for production
}
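An `aws_rds_cluster` on its own only defines the cluster; it needs at least one `aws_rds_cluster_instance` before it can accept connections. A minimal sketch with a writer and one reader follows. The instance count, identifiers, and the `db.t3.medium` class are assumptions to adjust for your workload:

```hcl
# Aurora instances for the cluster: index 0 becomes the writer, the rest readers.
resource "aws_rds_cluster_instance" "keycloak" {
  count = 2 # assumption: one writer plus one reader for failover

  identifier           = "keycloak-instance-${count.index}"
  cluster_identifier   = aws_rds_cluster.keycloak.id
  engine               = aws_rds_cluster.keycloak.engine
  engine_version       = aws_rds_cluster.keycloak.engine_version
  instance_class       = "db.t3.medium" # assumption: size for your load
  db_subnet_group_name = aws_db_subnet_group.keycloak.name
}
```

Because the private subnets span three availability zones, Aurora can place the two instances in different zones and promote the reader if the writer fails.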

ECS Cluster and Task Definition

Now let's create the ECS cluster and task definition to run Keycloak:

# ECS Cluster
resource "aws_ecs_cluster" "keycloak" {
  name = "keycloak-cluster"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = {
    Name = "Keycloak ECS Cluster"
  }
}

# ECS Task Definition
resource "aws_ecs_task_definition" "keycloak" {
  family                   = "keycloak"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "1024"
  memory                   = "2048"
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn

  container_definitions = jsonencode([
    {
      name      = "keycloak"
      image     = "quay.io/keycloak/keycloak:20.0.3"
      essential = true
      portMappings = [
        {
          containerPort = 8080
          hostPort      = 8080
          protocol      = "tcp"
        }
      ]
      environment = [
        { name = "KC_DB", value = "postgres" },
        # Build the JDBC URL from the cluster endpoint instead of hardcoding it
        { name = "KC_DB_URL", value = "jdbc:postgresql://${aws_rds_cluster.keycloak.endpoint}:5432/keycloak" },
        { name = "KC_DB_USERNAME", value = "keycloak" },
        # Reference the variables directly; quoting them would pass the literal text "var...."
        # Plain environment values are visible in the task definition, so prefer Secrets Manager in production
        { name = "KC_DB_PASSWORD", value = var.database_password },
        { name = "KEYCLOAK_ADMIN", value = "admin" },
        { name = "KEYCLOAK_ADMIN_PASSWORD", value = var.keycloak_admin_password },
        # TLS terminates at the ALB, so Keycloak serves plain HTTP behind it
        { name = "KC_HOSTNAME", value = var.keycloak_hostname },
        { name = "KC_PROXY", value = "edge" },
        # Expose /health for the load balancer health check
        { name = "KC_HEALTH_ENABLED", value = "true" }
      ]
      # "--optimized" only works with an image pre-built with these options;
      # the stock image needs a plain "start" so it can rebuild on boot
      command = ["start"]
    }
  ])
}
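The task definition refers to `aws_iam_role.ecs_task_execution_role`, and the commands later in this guide refer to a service named `keycloak-service`, but neither is defined anywhere. The sketch below fills both gaps and adds the auto scaling mentioned in the architecture overview. The role name, task counts, grace period, and 70% CPU target are assumptions; `aws_security_group.keycloak` is the service security group that the database rules already reference, and the target group and listener are the ones defined in the ALB section:

```hcl
# Execution role: lets ECS pull the image and write logs on the task's behalf.
data "aws_iam_policy_document" "ecs_tasks_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ecs_task_execution_role" {
  name               = "keycloak-ecs-task-execution" # assumed name
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume.json
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
  role       = aws_iam_role.ecs_task_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Service: keeps the desired number of tasks running behind the ALB.
resource "aws_ecs_service" "keycloak" {
  name            = "keycloak-service"
  cluster         = aws_ecs_cluster.keycloak.id
  task_definition = aws_ecs_task_definition.keycloak.arn
  launch_type     = "FARGATE"
  desired_count   = 2 # assumption

  # Keycloak is slow to start; don't fail health checks during boot.
  health_check_grace_period_seconds = 300

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.keycloak.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.keycloak.arn
    container_name   = "keycloak"
    container_port   = 8080
  }

  depends_on = [aws_lb_listener.keycloak]

  lifecycle {
    ignore_changes = [desired_count] # let auto scaling own the task count
  }
}

# Auto scaling: target tracking on average CPU utilization.
resource "aws_appautoscaling_target" "keycloak" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.keycloak.name}/${aws_ecs_service.keycloak.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 6
}

resource "aws_appautoscaling_policy" "keycloak_cpu" {
  name               = "keycloak-cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.keycloak.service_namespace
  resource_id        = aws_appautoscaling_target.keycloak.resource_id
  scalable_dimension = aws_appautoscaling_target.keycloak.scalable_dimension

  target_tracking_scaling_policy_configuration {
    target_value = 70
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```

Running more than one task also assumes the Keycloak nodes can discover each other for cache clustering, which this sketch does not configure.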

Application Load Balancer (ALB) Setup

Let's configure the Application Load Balancer to distribute traffic across Keycloak instances:

# Application Load Balancer
resource "aws_lb" "keycloak" {
  name               = "keycloak-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = module.vpc.public_subnets

  enable_deletion_protection = true # Keep enabled in production; disable before destroying a test stack

  tags = {
    Name = "Keycloak ALB"
  }
}

# Target Group
resource "aws_lb_target_group" "keycloak" {
  name        = "keycloak-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip" # Required for Fargate tasks in awsvpc mode

  health_check {
    enabled             = true
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 5
    interval            = 30
    path                = "/health" # Requires Keycloak's health endpoints (KC_HEALTH_ENABLED=true)
    matcher             = "200"
    port                = "traffic-port"
    protocol            = "HTTP"
  }

  tags = {
    Name = "Keycloak Target Group"
  }
}

# Listener
resource "aws_lb_listener" "keycloak" {
  load_balancer_arn = aws_lb.keycloak.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  # Use the validated certificate so the listener is created only after DNS validation completes
  certificate_arn = aws_acm_certificate_validation.keycloak.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.keycloak.arn
  }
}
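The ALB security group opens port 80 "for redirects", but only the HTTPS listener is defined. A sketch of the matching HTTP listener, which answers every request with a permanent redirect to HTTPS, could look like this (the resource name `keycloak_http_redirect` is an assumption):

```hcl
# HTTP listener that redirects all traffic to HTTPS.
resource "aws_lb_listener" "keycloak_http_redirect" {
  load_balancer_arn = aws_lb.keycloak.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
```

Host, path, and query string are preserved by default, so a request for `http://<host>/admin/` lands on `https://<host>/admin/`.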

Route 53 and ACM Setup

Finally, let's configure DNS and SSL certificates:

# SSL Certificate
resource "aws_acm_certificate" "keycloak" {
  domain_name       = var.keycloak_hostname
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Name = "Keycloak SSL Certificate"
  }
}

# Route 53 record for Keycloak
resource "aws_route53_record" "keycloak" {
  zone_id = var.route53_zone_id
  name    = var.keycloak_hostname
  type    = "A"

  alias {
    name                   = aws_lb.keycloak.dns_name
    zone_id                = aws_lb.keycloak.zone_id
    evaluate_target_health = true
  }
}

# Certificate validation
resource "aws_acm_certificate_validation" "keycloak" {
  certificate_arn         = aws_acm_certificate.keycloak.arn
  validation_record_fqdns = [for record in aws_route53_record.keycloak_validation : record.fqdn]
}
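The certificate validation resource iterates over `aws_route53_record.keycloak_validation`, which is not defined above. A common way to create those records is to loop over the certificate's `domain_validation_options`. The sketch below assumes the validation records belong in the same hosted zone as the Keycloak hostname; the 60-second TTL is also an assumption:

```hcl
# DNS records that prove domain ownership to ACM.
resource "aws_route53_record" "keycloak_validation" {
  for_each = {
    for dvo in aws_acm_certificate.keycloak.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  zone_id         = var.route53_zone_id
  name            = each.value.name
  type            = each.value.type
  records         = [each.value.record]
  ttl             = 60
  allow_overwrite = true
}
```

Keying the map by domain name gives each record a stable address in state, so adding a subject alternative name later creates one new record instead of recreating all of them.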

Deployment

To deploy the infrastructure, follow these steps:

# Initialize Terraform
terraform init

# Plan the deployment
terraform plan -var-file=prod.tfvars -out=tfplan

# Apply the deployment
terraform apply tfplan

# Verify the deployment
terraform output

# Check Keycloak service status
aws ecs describe-services --cluster keycloak-cluster --services keycloak-service
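`terraform output` prints only values that the configuration declares as outputs, and none are declared in the snippets above. A small sketch of outputs that would be useful here; the output names are assumptions:

```hcl
# Values printed by `terraform output` after apply.
output "keycloak_url" {
  description = "Public URL of the Keycloak deployment"
  value       = "https://${var.keycloak_hostname}"
}

output "alb_dns_name" {
  description = "DNS name of the load balancer (the alias target of the Route 53 record)"
  value       = aws_lb.keycloak.dns_name
}

output "database_endpoint" {
  description = "Writer endpoint of the Aurora cluster"
  value       = aws_rds_cluster.keycloak.endpoint
}
```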

Deployment Tips

The initial deployment may take 10-15 minutes as AWS provisions all resources. Monitor the ECS service in the AWS console to track container startup progress. Keycloak typically takes 2-3 minutes to fully initialize after the container starts.

Testing Your Deployment

Once deployment is complete, it's crucial to verify that everything is working correctly:

1. Basic Functionality

  • Access Keycloak admin console
  • Create a test realm and user
  • Configure a simple client application

2. Advanced Testing

  • Test authentication flows
  • Verify logs in CloudWatch
  • Test high availability scenarios

Quick Verification Commands

# Check if Keycloak is responding
# (no -k flag: a failed TLS handshake here means the certificate or DNS needs attention)
curl https://your-keycloak-domain.com/health

# Test admin console access
curl https://your-keycloak-domain.com/admin/

# Check ECS service health
aws ecs describe-services --cluster keycloak-cluster --services keycloak-service --query 'services[0].runningCount'

# View container logs
aws logs tail /ecs/keycloak --follow
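The `aws logs tail /ecs/keycloak` command assumes the container ships its logs to a CloudWatch log group with that name, which the task definition shown earlier does not configure. One way to wire this up is sketched below. The 30-day retention, the `keycloak` stream prefix, the `us-east-1` region, and the `keycloak_log_configuration` local are all assumptions:

```hcl
# Log group that the `aws logs tail` command reads from.
resource "aws_cloudwatch_log_group" "keycloak" {
  name              = "/ecs/keycloak"
  retention_in_days = 30 # assumption
}

# Container-level log settings for the awslogs driver.
locals {
  keycloak_log_configuration = {
    logDriver = "awslogs"
    options = {
      "awslogs-group"         = aws_cloudwatch_log_group.keycloak.name
      "awslogs-region"        = "us-east-1" # assumption: the region used in this guide
      "awslogs-stream-prefix" = "keycloak"
    }
  }
}
```

To take effect, the container definition would need an extra attribute, `logConfiguration = local.keycloak_log_configuration`, next to `environment` and `command`. The task execution role also needs permission to create log streams and write log events, which the AWS managed `AmazonECSTaskExecutionRolePolicy` grants.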

Conclusion

In this guide, we've covered how to deploy Keycloak on AWS ECS with Fargate using Terraform. This approach provides a scalable, highly available, and secure identity management solution without the overhead of managing underlying infrastructure.

Key Benefits Summary

This architecture provides a production-ready Keycloak deployment that follows AWS best practices for security, scalability, and reliability. The serverless approach with Fargate minimizes operational overhead while maintaining full control over your identity management system.

Infrastructure

  • Serverless operations with Fargate
  • High availability across multiple AZs
  • Auto scaling based on demand

Security

  • Secure network configuration
  • SSL/TLS encryption
  • Private subnets for sensitive components

Management

  • Managed database with Aurora PostgreSQL
  • Infrastructure as code with Terraform
  • Monitoring with CloudWatch

Next Steps

  • Customize Keycloak themes and configurations for your organization
  • Integrate with your existing applications using OIDC or SAML
  • Set up monitoring and alerting for production use
  • Implement backup and disaster recovery procedures