  • redis-pubsub, event-router Node Provisioning: Hands-on Log
    이코에코(Eco²)/Logs 2025. 12. 27. 18:35

    Added two new nodes to the existing 18-node cluster:

    • k8s-event-router (t3.small) - Redis Streams → Pub/Sub Bridge
    • k8s-redis-pubsub (t3.small) - Realtime Event Broadcast
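    To make the "Streams → Pub/Sub bridge" role concrete, here is a minimal sketch of one Event Router pass. It is not the project's code. It assumes a redis-py client opened with decode_responses=True (default RESP2 reply shape) and stream entries that carry a job_id field. The group and consumer names, the hash layout of scan:state:{job_id}, and the function name route_once are all illustrative. XPENDING-based reclaim is left out.

```python
"""Hypothetical sketch of one Event Router pass: Redis Streams -> Pub/Sub."""
import json
from typing import Any


def route_once(client: Any, streams: list[str], group: str, consumer: str,
               count: int = 100, block_ms: int = 1000) -> int:
    """Read new entries for this consumer, update state, PUBLISH, then XACK.

    `client` is assumed to be redis.Redis(decode_responses=True) or any object
    exposing the same xreadgroup/hset/publish/xack methods.
    """
    # ">" asks for entries never delivered to any consumer in this group.
    batches = client.xreadgroup(group, consumer, {s: ">" for s in streams},
                                count=count, block=block_ms)
    routed = 0
    for stream, entries in batches or []:  # empty/None when the block times out
        for entry_id, fields in entries:
            job_id = fields["job_id"]  # assumed field name
            # Latest snapshot for reconnecting SSE clients (hash layout assumed).
            client.hset(f"scan:state:{job_id}", mapping=fields)
            # Fire & forget: Redis drops the message if nobody is subscribed.
            client.publish(f"sse:events:{job_id}", json.dumps(fields))
            # Ack last, so a crash before this point leaves the entry pending
            # (reclaimable) instead of lost.
            client.xack(stream, group, entry_id)
            routed += 1
    return routed
```

    Acknowledging only after the state write and PUBLISH gives at-least-once delivery. A crashed router leaves entries in the group's pending list rather than losing them. That same property is why the gateway still has to filter duplicates.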

    Architecture background

    ┌──────────────────────────────────────────────────────────────────────────────┐
    │                          SSE HA Architecture (Full)                          │
    ├──────────────────────────────────────────────────────────────────────────────┤
    │                                                                              │
    │  ┌──────────┐                                                                │
    │  │  Client  │ ◄─────────────────────────────────────────────────────┐        │
    │  └────┬─────┘                                                       │        │
    │       │ POST /scan                                                  │ SSE    │
    │       ▼                                                             │        │
    │  ┌──────────┐    ┌───────────┐    ┌────────────┐                    │        │
    │  │ Scan API │───►│ RabbitMQ  │───►│ Worker     │                    │        │
    │  │ (HPA)    │    │ (Cluster) │    │ (KEDA)     │                    │        │
    │  └──────────┘    └───────────┘    └─────┬──────┘                    │        │
    │                                         │ XADD (idempotent)         │        │
    │                                         ▼                           │        │
    │  ┌────────────────────────────────────────────────────────────────┐ │        │
    │  │               Redis Streams (k8s-redis-streams)                │ │        │
    │  │  scan:events:0, scan:events:1, scan:events:2, scan:events:3    │ │        │
    │  └────────────────────────────────────────────────────────────────┘ │        │
    │                                         │ XREADGROUP                │        │
    │                                         ▼                           │        │
    │  ┌────────────────────────────────────────────────────────────────┐ │        │
    │  │        Event Router (k8s-event-router) [NEW - Phase 6]         │ │        │
    │  │  • Consumer Group: XREADGROUP + XACK                           │ │        │
    │  │  • State Update: scan:state:{job_id}                           │ │        │
    │  │  • XPENDING Reclaim: failure recovery                          │ │        │
    │  └────────────────────────────────────────────────────────────────┘ │        │
    │                                         │ PUBLISH                   │        │
    │                                         ▼                           │        │
    │  ┌────────────────────────────────────────────────────────────────┐ │        │
    │  │        Redis Pub/Sub (k8s-redis-pubsub) [NEW - Phase 6]        │ │        │
    │  │  • Channels: sse:events:{job_id}                               │ │        │
    │  │  • Fire & Forget (dropped if no subscribers)                   │ │        │
    │  └────────────────────────────────────────────────────────────────┘ │        │
    │                                         │ SUBSCRIBE                 │        │
    │                                         ▼                           │        │
    │  ┌────────────────────────────────────────────────────────────────┐ │        │
    │  │            SSE Gateway (k8s-sse-gateway) [Existing]            │─┘        │
    │  │  • Pub/Sub subscription: sse:events:{job_id}                   │          │
    │  │  • State recovery: scan:state:{job_id}                         │          │
    │  │  • seq-based duplicate filtering                               │          │
    │  │  • HPA: horizontal scaling (no Consistent Hash needed)         │          │
    │  └────────────────────────────────────────────────────────────────┘          │
    │                                                                              │
    │  ▶ New nodes: k8s-event-router, k8s-redis-pubsub                             │
    │  ▶ Existing node: k8s-sse-gateway (switched to Pub/Sub subscription)         │
    └──────────────────────────────────────────────────────────────────────────────┘
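    The gateway's "seq-based duplicate filtering" can be pictured as a per-job high-water mark. The minimal sketch below assumes two things. First, every event carries an integer seq that increases per job_id. Second, the gateway first replays the snapshot from scan:state:{job_id} and records its seq. The class and method names are hypothetical.

```python
class SeqFilter:
    """Per-job monotonic seq filter for one SSE Gateway process (hypothetical)."""

    def __init__(self) -> None:
        self._last: dict[str, int] = {}

    def seed(self, job_id: str, seq: int) -> None:
        """Record the seq of the snapshot read from scan:state:{job_id}."""
        self._last[job_id] = max(seq, self._last.get(job_id, seq))

    def accept(self, job_id: str, seq: int) -> bool:
        """True if the event is newer than anything already sent for job_id."""
        last = self._last.get(job_id)
        if last is not None and seq <= last:
            return False  # replayed or duplicated event: drop it
        self._last[job_id] = seq
        return True

    def forget(self, job_id: str) -> None:
        """Drop bookkeeping once the job's SSE stream is closed."""
        self._last.pop(job_id, None)


f = SeqFilter()
f.seed("j1", 3)  # snapshot from scan:state:j1 had seq=3
print([s for s in (2, 3, 4, 4, 5) if f.accept("j1", s)])  # [4, 5]
```

    Because the filter keys only on (job_id, seq), any gateway replica can serve any job. That is consistent with the diagram's note that HPA scaling needs no consistent hashing.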

    Phase 1: Pre-check

    Existing vCPU status

    $ terraform output cluster_info
    {
      "total_nodes" = 18
      "total_vcpu" = 40
      "phase" = "Phase 1-5 Complete - 18-Node Architecture"
    }

    Resources to be added

    Node              Instance type  vCPU  Memory  Purpose
    k8s-event-router  t3.small       2     2 GiB   XREADGROUP → PUBLISH
    k8s-redis-pubsub  t3.small       2     2 GiB   Pub/Sub broadcast

    Phase 2: Terraform configuration

    2.1 Add kubelet_profiles

    # main.tf - add to locals.kubelet_profiles
    "k8s-event-router" = "--node-labels=role=event-router,domain=event-router,service=event-router,workload=event-router,tier=integration,phase=5 --register-with-taints=domain=event-router:NoSchedule"
    "k8s-redis-pubsub" = "--node-labels=role=infrastructure,domain=data,infra-type=redis-pubsub,redis-cluster=pubsub,workload=cache,tier=data,phase=1 --register-with-taints=domain=data:NoSchedule"

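    Each profile string is handed to kubelet as extra arguments, so the labels and taints a node will register with are already fixed here. The hypothetical helper below decodes a profile value into (labels, taints). The decoded result can then be compared against the kubectl output in Phase 9. The function name is an assumption; only the two kubelet flags come from the config above.

```python
import shlex


def parse_kubelet_args(args: str) -> tuple[dict[str, str], list[str]]:
    """Return (labels, taints) encoded in a kubelet_profiles value."""
    labels: dict[str, str] = {}
    taints: list[str] = []
    for token in shlex.split(args):
        flag, _, value = token.partition("=")  # split only at the first "="
        if flag == "--node-labels":
            for pair in value.split(","):
                key, _, val = pair.partition("=")
                labels[key] = val
        elif flag == "--register-with-taints":
            taints.extend(value.split(","))
    return labels, taints


labels, taints = parse_kubelet_args(
    "--node-labels=role=event-router,domain=event-router,service=event-router,"
    "workload=event-router,tier=integration,phase=5 "
    "--register-with-taints=domain=event-router:NoSchedule"
)
print(labels["tier"], taints)
```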
    2.2 EC2 module definitions

    # main.tf - Phase 6: HA Event Architecture
    
    # Event Router Node
    module "event_router" {
      source = "./modules/ec2"
    
      instance_name        = "k8s-event-router"
      instance_type        = "t3.small"
      ami_id               = data.aws_ami.ubuntu.id
      subnet_id            = module.vpc.public_subnet_ids[2]  # AZ-c
      security_group_ids   = [module.security_groups.cluster_sg_id]
      key_name             = aws_key_pair.k8s.key_name
      iam_instance_profile = aws_iam_instance_profile.k8s.name
    
      root_volume_size = 20
      root_volume_type = "gp3"
    
      user_data = templatefile("${path.module}/user-data/common.sh", {
        hostname           = "k8s-event-router"
        kubelet_extra_args = local.kubelet_profiles["k8s-event-router"]
      })
    
      tags = {
        Role     = "worker"
        Workload = "event-router"
        Domain   = "event-router"
        Phase    = "6"
      }
    }
    
    # Redis Pub/Sub Node
    module "redis_pubsub" {
      source = "./modules/ec2"
    
      instance_name        = "k8s-redis-pubsub"
      instance_type        = "t3.small"
      ami_id               = data.aws_ami.ubuntu.id
      subnet_id            = module.vpc.public_subnet_ids[0]  # AZ-a
      security_group_ids   = [module.security_groups.cluster_sg_id]
      key_name             = aws_key_pair.k8s.key_name
      iam_instance_profile = aws_iam_instance_profile.k8s.name
    
      root_volume_size = 10  # Pub/Sub only, no persistence
      root_volume_type = "gp3"
    
      user_data = templatefile("${path.module}/user-data/common.sh", {
        hostname           = "k8s-redis-pubsub"
        kubelet_extra_args = local.kubelet_profiles["k8s-redis-pubsub"]
      })
    
      tags = {
        Role         = "worker"
        Workload     = "cache"
        RedisCluster = "pubsub"
        Phase        = "6"
      }
    }

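    Phase 3: Terraform Plan (using -target)

    ⚠️ Important: -target limits the run to the two new modules so the existing infrastructure is left untouched. Terraform itself warns that -target is meant for exceptional situations, so the plan is reviewed before applying.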

    $ terraform plan \
      -var="dockerhub_password=${DOCKERHUB_TOKEN:-dummy}" \
      -target=module.event_router \
      -target=module.redis_pubsub \
      -out=ha-event-nodes.plan
    
    Plan: 2 to add, 0 to change, 0 to destroy.
    
    Changes to Outputs:
      + event_router_instance_id  = (known after apply)
      + event_router_private_ip   = (known after apply)
      + event_router_public_ip    = (known after apply)
      + redis_pubsub_instance_id  = (known after apply)
      + redis_pubsub_private_ip   = (known after apply)
      + redis_pubsub_public_ip    = (known after apply)

    Checkpoints:

    • 2 to add - only the 2 new nodes are added
    • 0 to change - no existing resource is modified
    • 0 to destroy - nothing is deleted
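    These three checkpoints can be enforced in a script instead of by eye. The sketch below reads the human-readable "Plan:" summary line, which is a presentation format and not a stable interface. For a sturdier check, `terraform show -json ha-event-nodes.plan` exposes the planned actions as JSON. The function names are hypothetical.

```python
import re


def plan_counts(plan_text: str) -> dict[str, int]:
    """Extract add/change/destroy counts from the 'Plan: ...' summary line."""
    for line in plan_text.splitlines():
        line = line.strip()
        if line.startswith("Plan:"):
            counts = {}
            for verb in ("add", "change", "destroy"):
                m = re.search(rf"(\d+) to {verb}\b", line)
                counts[verb] = int(m.group(1)) if m else 0
            return counts
    raise ValueError("no 'Plan:' summary line found (maybe 'No changes.')")


def is_additive_only(plan_text: str, expected_add: int) -> bool:
    """True only for 'N to add, 0 to change, 0 to destroy' with N == expected_add."""
    return plan_counts(plan_text) == {"add": expected_add, "change": 0, "destroy": 0}


print(is_additive_only("Plan: 2 to add, 0 to change, 0 to destroy.", 2))  # True
```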

    Phase 4: Terraform Apply

    $ terraform apply "ha-event-nodes.plan"
    
    module.redis_pubsub.aws_instance.this: Creating...
    module.event_router.aws_instance.this: Creating...
    module.redis_pubsub.aws_instance.this: Still creating... [10s elapsed]
    module.event_router.aws_instance.this: Still creating... [10s elapsed]
    module.redis_pubsub.aws_instance.this: Creation complete after 13s [id=i-0f70d6b9ac5dde237]
    module.event_router.aws_instance.this: Creation complete after 13s [id=i-091c54e48fca2b4f4]
    
    Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

    Elapsed time: 13 s (both instances created in parallel)

    Created instances

    Node              Instance ID          Public IP     Private IP
    k8s-event-router  i-091c54e48fca2b4f4  3.39.195.102  10.0.3.14
    k8s-redis-pubsub  i-0f70d6b9ac5dde237  43.201.54.31  10.0.1.248

    Phase 5: Refresh outputs (-refresh-only)

    ⚠️ Caution: after an apply with -target, outputs are not fully updated

    Symptom

    cluster_info output right after the apply:

    "total_nodes" = 17   # expected: 20
    "total_vcpu" = 38    # expected: 44
    "phase" = "Phase 1-4 Complete - 17-Node..."  # not updated

    Fix

    $ terraform apply -refresh-only -auto-approve -var="dockerhub_password=dummy"

    Result after the refresh

    cluster_info = {
      "total_nodes" = 20
      "total_vcpu" = 44
      "total_memory_gb" = 60
      "phase" = "Phase 1-6 Complete - 20-Node HA Architecture"
    }
    
    node_roles = {
      ...
      "event_router" = "Event Router - Streams→Pub/Sub Bridge (t3.small, 2GB) - Phase 6"
      "redis_pubsub" = "Redis Pub/Sub - Realtime Broadcast (t3.small, 2GB) - Phase 6"
      ...
    }
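    A small post-refresh check can catch stale outputs before anyone relies on them. This sketch compares the JSON printed by `terraform output -json cluster_info` with the values this change should produce. The EXPECTED mapping and the function name are assumptions. Values are compared as strings in case the output is declared as map(string).

```python
import json

# Values this change should produce (taken from the refreshed output above).
EXPECTED = {"total_nodes": 20, "total_vcpu": 44}


def stale_outputs(raw_json: str, expected: dict) -> dict:
    """Return {key: (actual, expected)} for every value that does not match."""
    actual = json.loads(raw_json)
    return {
        key: (actual.get(key), want)
        for key, want in expected.items()
        if str(actual.get(key)) != str(want)
    }


# Usage (requires terraform on PATH):
#   import subprocess
#   raw = subprocess.run(["terraform", "output", "-json", "cluster_info"],
#                        capture_output=True, text=True, check=True).stdout
#   assert not stale_outputs(raw, EXPECTED), "run terraform apply -refresh-only"
```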

    Phase 6: Verify SSH connectivity

    $ ssh -i ~/.ssh/sesacthon.pem ubuntu@3.39.195.102 "hostname && uptime"
    k8s-event-router
     18:16:50 up 2 min,  0 users,  load average: 0.16, 0.11, 0.04
    
    $ ssh -i ~/.ssh/sesacthon.pem ubuntu@43.201.54.31 "hostname && uptime"
    k8s-redis-pubsub
     18:16:52 up 2 min,  0 users,  load average: 0.12, 0.13, 0.05

    Phase 7: Ansible - node setup

    Update the inventory

    $ terraform output -raw ansible_inventory > ../ansible/inventory/hosts.ini

    Install containerd and kubeadm

    $ cd ansible && ansible-playbook -i inventory/hosts.ini \
      playbooks/setup-new-nodes.yml -l "event_router,redis_pubsub"
    
    PLAY RECAP *********************************************************************
    k8s-event-router  : ok=37   changed=18   unreachable=0   failed=0
    k8s-redis-pubsub  : ok=38   changed=18   unreachable=0   failed=0
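    ansible-playbook already exits non-zero when a host fails. When the recap is captured into a log like this one, though, a quick parse makes the "failed=0, unreachable=0" check explicit. The regex below assumes the usual one-line-per-host "host : key=value ..." recap layout. The helper names are hypothetical.

```python
import re

# "<host> : ok=37 changed=18 unreachable=0 failed=0 ..." (one line per host)
RECAP_LINE = re.compile(r"^(\S+)\s*:\s*((?:\w+=\d+\s*)+)$")


def parse_recap(text: str) -> dict[str, dict[str, int]]:
    """Map each host in a PLAY RECAP block to its counters."""
    hosts = {}
    for line in text.splitlines():
        m = RECAP_LINE.match(line.strip())
        if m:
            pairs = (kv.split("=") for kv in m.group(2).split())
            hosts[m.group(1)] = {k: int(v) for k, v in pairs}
    return hosts


def all_hosts_ok(text: str) -> bool:
    """True if at least one host was parsed and none failed or was unreachable."""
    stats = parse_recap(text)
    return bool(stats) and all(
        s.get("failed", 0) == 0 and s.get("unreachable", 0) == 0
        for s in stats.values()
    )
```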

    Installation result

    k8s-event-router:
      - containerd: active
      - kubeadm: v1.28.4
      - kubelet: Kubernetes v1.28.4
    
    k8s-redis-pubsub:
      - containerd: active
      - kubeadm: v1.28.4
      - kubelet: Kubernetes v1.28.4

    Phase 8: Worker Join

    Generate a join token

    $ ssh ubuntu@13.209.44.249 "kubeadm token create --print-join-command"
    kubeadm join 10.0.1.21:6443 --token l5b0yw.kja4c9un8t015qv6 \
      --discovery-token-ca-cert-hash sha256:3dbda1db23abe18e97d7e7a0d20b57acd0f7751fdb56844575c52cc13bd95d9e

    Run the join on each node

    # Event Router
    $ ssh ubuntu@3.39.195.102 "sudo kubeadm join 10.0.1.21:6443 ..."
    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.
    
    # Redis Pub/Sub
    $ ssh ubuntu@43.201.54.31 "sudo kubeadm join 10.0.1.21:6443 ..."
    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.

    Phase 9: Verify node status

    Transition to Ready

    $ kubectl get nodes k8s-event-router k8s-redis-pubsub
    NAME               STATUS   ROLES    AGE   VERSION
    k8s-event-router   Ready    <none>   57s   v1.28.4
    k8s-redis-pubsub   Ready    <none>   53s   v1.28.4
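    For interactive use, `kubectl wait --for=condition=Ready node/k8s-event-router --timeout=120s` blocks until the node is Ready. In a script, the same information sits in the Ready condition of `kubectl get nodes <names> -o json`. The hypothetical helper below reads that condition. It handles both the List returned for several names and the single Node object returned for one name.

```python
import json


def node_readiness(nodes_json: str) -> dict[str, bool]:
    """Map node name -> True if its Ready condition has status "True"."""
    doc = json.loads(nodes_json)
    # Several names -> a List with "items"; a single name -> one Node object.
    items = [doc] if doc.get("kind") == "Node" else doc.get("items", [])
    readiness = {}
    for item in items:
        conditions = item.get("status", {}).get("conditions", [])
        readiness[item["metadata"]["name"]] = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
    return readiness
```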

    Verify labels

    $ kubectl get nodes k8s-event-router --show-labels
    LABELS:
      domain=event-router
      role=event-router
      service=event-router
      tier=integration
      workload=event-router
    
    $ kubectl get nodes k8s-redis-pubsub --show-labels
    LABELS:
      domain=data
      infra-type=redis-pubsub
      redis-cluster=pubsub
      role=infrastructure
      tier=data

    Verify taints

    $ kubectl describe nodes k8s-event-router k8s-redis-pubsub | grep -A1 Taints
    k8s-event-router:
      Taints: domain=event-router:NoSchedule
    
    k8s-redis-pubsub:
      Taints: domain=data:NoSchedule
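    With a NoSchedule taint on both nodes, only pods carrying a matching toleration can be scheduled there. A toleration only permits scheduling; steering pods onto these nodes still takes a nodeSelector or node affinity on the labels above. The matching rule can be modelled in a few lines. This is a simplified, hypothetical model: the class and function names are mine, and it ignores tolerationSeconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes default operator
    value: str = ""
    effect: str = ""  # empty = tolerate every effect


def parse_taint(spec: str) -> Taint:
    """Parse "key=value:Effect", the form used by --register-with-taints."""
    key_value, effect = spec.rsplit(":", 1)
    key, _, value = key_value.partition("=")
    return Taint(key, value, effect)


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key in ("", taint.key)  # empty key + Exists matches any taint
    return tol.key == taint.key and tol.value == taint.value
```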

    Phase 10: Deploy Redis Pub/Sub

    Create the RedisFailover CR

    $ kubectl apply -f workloads/redis/base/pubsub-redis-failover.yaml
    redisfailover.databases.spotahome.com/pubsub-redis created

    Pod status (60 s later)

    $ kubectl get pods -n redis -o wide | grep pubsub
    NAME                                 READY   STATUS    RESTARTS   AGE     NODE
    rfr-pubsub-redis-0                   3/3     Running   0          2m18s   k8s-redis-pubsub
    rfr-pubsub-redis-1                   3/3     Running   0          2m17s   k8s-redis-pubsub
    rfr-pubsub-redis-2                   3/3     Running   0          2m17s   k8s-redis-pubsub
    rfs-pubsub-redis-559f7789f9-j2sff    2/2     Running   0          2m16s   k8s-redis-pubsub
    rfs-pubsub-redis-559f7789f9-s7r2m    2/2     Running   0          2m18s   k8s-redis-pubsub
    rfs-pubsub-redis-559f7789f9-wlmpx    2/2     Running   0          2m17s   k8s-redis-pubsub

    Deployment result

    Component                             Replicas  Status      Node
    Redis: 1 master + 2 replicas (rfr-*)  3         ✅ Running  k8s-redis-pubsub
    Sentinel (rfs-*)                      3         ✅ Running  k8s-redis-pubsub

    Service info

    ubuntu@k8s-master:~$ kubectl get svc -n redis | grep pubsub
    rfr-pubsub-redis    ClusterIP   None             <none>        9121/TCP    13m
    rfs-pubsub-redis    ClusterIP   10.102.2.2       <none>        26379/TCP   13m

    Connection test

    $ kubectl run redis-test --rm -it --image=redis:7-alpine -- \
      redis-cli -h rfr-pubsub-redis.redis.svc.cluster.local ping
    PONG

    Environment variable

    REDIS_PUBSUB_URL=redis://rfr-pubsub-redis.redis.svc.cluster.local:6379/0
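    redis-py's `Redis.from_url` accepts this URL form directly. The hypothetical helper below only shows how the pieces break down. One thing to keep in mind: Redis Pub/Sub is independent of the database number. The trailing `/0` affects key commands but not which channels a client publishes to or hears.

```python
from urllib.parse import urlparse


def parse_redis_url(url: str) -> tuple[str, int, int]:
    """Split redis://host:port/db into (host, port, db)."""
    parts = urlparse(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    db = int(parts.path.lstrip("/") or 0)  # missing path -> db 0
    return parts.hostname or "localhost", parts.port or 6379, db


print(parse_redis_url("redis://rfr-pubsub-redis.redis.svc.cluster.local:6379/0"))
```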

    Redis 4-node cluster status (GitOps)

    Full Redis layout after provisioning:

    Cluster        Node               Purpose              Redis + Sentinel
    auth-redis     k8s-redis-auth     Blacklist + OAuth    1 + 1
    streams-redis  k8s-redis-streams  SSE event ledger     1 + 1
    cache-redis    k8s-redis-cache    Celery + cache       1 + 1
    pubsub-redis   k8s-redis-pubsub   Realtime broadcast   3 + 3

    RedisFailover status

    ubuntu@k8s-master:~$ kubectl get redisfailover -n redis
    NAME            NAME            REDIS   SENTINELS   AGE
    auth-redis      auth-redis      1       1           28h
    cache-redis     cache-redis     1       1           28h
    pubsub-redis    pubsub-redis    3       3           11m
    streams-redis   streams-redis   1       1           28h

    Final cluster state

    ubuntu@k8s-master:~$ kubectl get nodes -o wide
    NAME                  STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
    k8s-api-auth          Ready    <none>          21h   v1.28.4   10.0.1.53     <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-api-character     Ready    <none>          21h   v1.28.4   10.0.1.244    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-api-chat          Ready    <none>          21h   v1.28.4   10.0.1.49     <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-api-image         Ready    <none>          21h   v1.28.4   10.0.3.183    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-api-location      Ready    <none>          21h   v1.28.4   10.0.2.236    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-api-my            Ready    <none>          21h   v1.28.4   10.0.2.56     <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-api-scan          Ready    <none>          30d   v1.28.4   10.0.3.219    <none>        Ubuntu 22.04.5 LTS   6.8.0-1043-aws   containerd://2.1.5
    k8s-event-router      Ready    <none>          19m   v1.28.4   10.0.3.14     <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-ingress-gateway   Ready    <none>          21h   v1.28.4   10.0.1.150    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-logging           Ready    <none>          10d   v1.28.4   10.0.3.59     <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.0
    k8s-master            Ready    control-plane   30d   v1.28.4   10.0.1.21     <none>        Ubuntu 22.04.5 LTS   6.8.0-1043-aws   containerd://2.1.5
    k8s-monitoring        Ready    <none>          21h   v1.28.4   10.0.2.84     <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-postgresql        Ready    <none>          30d   v1.28.4   10.0.1.211    <none>        Ubuntu 22.04.5 LTS   6.8.0-1043-aws   containerd://2.1.5
    k8s-rabbitmq          Ready    <none>          21h   v1.28.4   10.0.2.148    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-redis-auth        Ready    <none>          30h   v1.28.4   10.0.2.43     <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-redis-cache       Ready    <none>          30h   v1.28.4   10.0.2.202    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-redis-pubsub      Ready    <none>          19m   v1.28.4   10.0.1.248    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-redis-streams     Ready    <none>          30h   v1.28.4   10.0.2.215    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-sse-gateway       Ready    <none>          18h   v1.28.4   10.0.2.195    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-worker-ai         Ready    <none>          21h   v1.28.4   10.0.1.127    <none>        Ubuntu 22.04.5 LTS   6.8.0-1044-aws   containerd://2.2.1
    k8s-worker-storage    Ready    <none>          30d   v1.28.4   10.0.3.246    <none>        Ubuntu 22.04.5 LTS   6.8.0-1043-aws   containerd://2.1.5

    Total nodes: 21 (1 master + 20 workers)


    vCPU summary

    Instance type  Count  vCPU each  Total vCPU
    t3.xlarge      2      4          8
    t3.large       2      2          4
    t3.medium      6      2          12
    t3.small       11     2          22
    Total          21     -          46
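    The totals in this table can be re-derived from the per-type counts. The vCPU figures are the standard AWS specs for the t3 family. The dictionary and function names are illustrative.

```python
# vCPUs per instance type (AWS t3 family).
VCPU_PER_TYPE = {"t3.xlarge": 4, "t3.large": 2, "t3.medium": 2, "t3.small": 2}

# Node count per instance type, as listed in the vCPU summary table.
FLEET = {"t3.xlarge": 2, "t3.large": 2, "t3.medium": 6, "t3.small": 11}


def fleet_totals(fleet: dict[str, int], vcpu_per_type: dict[str, int]) -> tuple[int, int]:
    """Return (node count, total vCPU) for a fleet of instance types."""
    nodes = sum(fleet.values())
    vcpus = sum(count * vcpu_per_type[itype] for itype, count in fleet.items())
    return nodes, vcpus


print(fleet_totals(FLEET, VCPU_PER_TYPE))  # (21, 46)
```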

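    Key takeaways

    1. Run -refresh-only after a -target apply
      • Applying only some modules leaves the outputs stale
      • A follow-up -refresh-only apply is required to sync the full state
    2. Labels and taints applied automatically via kubelet_extra_args
      • Set in the Terraform user-data, they are registered by kubelet when the node joins
      • No manual labeling needed
    3. Parallel instance creation
      • Both instances were created in parallel within 13 s
      • Terraform creates independent resources concurrently (default -parallelism=10), which saves time over sequential creation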

ABOUT ME

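🎓 B.S., Dept. of Information and Computer Engineering, Pusan National University (부산대학교 정보컴퓨터공학과): 2017.03 - 2023.08
☁️ Rakuten Symphony Jr. Cloud Engineer: 2024.12.09 - 2025.08.31
🏆 2025 AI SeSACthon (새싹톤) Excellence Award: 2025.10.30 - 2025.12.02
🌏 Advancing the 이코에코(Eco²) backend/infrastructure: 2025.12 - Present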
