  • 이코에코(Eco²) Streams & Scaling for SSE #2: Declarative Deployment of a 3-Node Redis Cluster (GitOps)
    이코에코(Eco²) / Event Streams & Scaling · 2025. 12. 26. 14:52

    Overview

    With the EC2 nodes provisioned, Redis is now deployed to the Kubernetes cluster declaratively.
    The Spotahome Redis Operator automates the Redis + Sentinel HA setup.


    Operator Selection: Spotahome vs Bitnami

    Candidate Comparison

    Criterion             Spotahome Redis Operator          Bitnami Redis (Helm)
    Deployment model      CRD + Operator                    Helm chart
    HA setup              Redis + Sentinel (automatic)      manual configuration required
    Failover              automatic (managed by Sentinel)   manual or external tooling
    Resource management   declarative, per CR               values.yaml
    GitOps fit            high (CR = YAML)                  medium (Helm values)
    Maintenance           reconciled by the Operator        managed by hand

    Why Spotahome Was Chosen

    https://github.com/spotahome/redis-operator

    1. Declarative management: a single RedisFailover CR configures both Redis and Sentinel
    2. Automatic failover: when the master fails, Sentinel automatically promotes a replica
    3. GitOps fit: integrates naturally with the ArgoCD App-of-Apps pattern
    4. Minimal operational burden: the Operator continuously reconciles actual state toward the desired state

    Spotahome Redis Operator Architecture

    ┌──────────────────────────────────────────────────────────────┐
    │                      Kubernetes Cluster                      │
    │                                                              │
    │  ┌────────────────────────────────────────────────────────┐  │
    │  │  redis-operator (Deployment)                           │  │
    │  │  ├─ Watch: RedisFailover CRs                           │  │
    │  │  ├─ Create: StatefulSet, Service, ConfigMap            │  │
    │  │  └─ Reconcile: sync desired and current state          │  │
    │  └────────────────────────────────────────────────────────┘  │
    │                               │                              │
    │                               ▼                              │
    │  ┌────────────────────────────────────────────────────────┐  │
    │  │  RedisFailover CR (auth-redis)                         │  │
    │  │                                                        │  │
    │  │  ┌─────────────────┐   ┌─────────────────┐             │  │
    │  │  │ Redis Master    │   │ Redis Replica   │             │  │
    │  │  │ (rfr-auth-redis)│   │ (rfr-auth-redis)│             │  │
    │  │  └────────┬────────┘   └────────┬────────┘             │  │
    │  │           │                     │                      │  │
    │  │  ┌────────┴─────────────────────┴────────┐             │  │
    │  │  │            Sentinel Cluster           │             │  │
    │  │  │  ├─ monitors the master               │             │  │
    │  │  │  ├─ decides failover (quorum)         │             │  │
    │  │  │  └─ redirects clients                 │             │  │
    │  │  └───────────────────────────────────────┘             │  │
    │  └────────────────────────────────────────────────────────┘  │
    └──────────────────────────────────────────────────────────────┘

    Resources Created

    CRD             Resources created
    RedisFailover   StatefulSet (Redis), Deployment (Sentinel), Service, ConfigMap

    Service Naming Convention

    rfr-<name>    # Redis Master/Replica Service
    rfs-<name>    # Sentinel Service

    e.g. rfr-auth-redis.redis.svc.cluster.local:6379
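
    Once the cluster is up, Sentinel itself can report which pod currently holds the master role. A minimal sketch using a throwaway pod, assuming the Spotahome default monitored master name mymaster (verify with SENTINEL masters if it differs):

    # Ask Sentinel for the current master address via the rfs-* Service.
    kubectl -n redis run redis-cli --rm -it --image=redis:7 --restart=Never -- \
      redis-cli -h rfs-auth-redis.redis.svc.cluster.local -p 26379 \
      SENTINEL get-master-addr-by-name mymaster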


    RedisFailover CR Design

    auth-redis (security data)

    apiVersion: databases.spotahome.com/v1
    kind: RedisFailover
    metadata:
      name: auth-redis
      namespace: redis
      labels:
        app: auth-redis
        purpose: auth
    spec:
      sentinel:
        replicas: 3      # guarantees quorum (2/3)
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
        customConfig:
          - down-after-milliseconds 5000
          - failover-timeout 10000
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: redis-cluster
                      operator: In
                      values: [auth]
    
      redis:
        replicas: 3      # 1 master + 2 replicas
        resources:
          limits:
            cpu: 300m
            memory: 512Mi
        storage:
          persistentVolumeClaim:
            spec:
              accessModes: [ReadWriteOnce]
              storageClassName: gp3
              resources:
                requests:
                  storage: 1Gi
        customConfig:
          - "maxmemory 256mb"
          - "maxmemory-policy noeviction"  # 보안 데이터 보호
          - "appendonly yes"               # AOF 영속성
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: redis-cluster
                      operator: In
                      values: [auth]
        exporter:
          enabled: true
          image: oliver006/redis_exporter:v1.62.0

    Design rationale:

    • replicas: 3: Sentinel quorum (2/3) plus one Redis master and two replicas
    • noeviction: the JWT blacklist and OAuth state must never be evicted
    • PVC: preserves data when a replica is promoted after a master failure (a quick check is sketched below)
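
    These settings can be confirmed on the running instance; a sketch, using the pod and container names from the deployment output later in this post:

    # Verify AOF persistence and the eviction policy on auth-redis,
    # and confirm the PVC created from the storage spec is Bound.
    kubectl -n redis exec rfr-auth-redis-0 -c redis -- redis-cli CONFIG GET appendonly
    kubectl -n redis exec rfr-auth-redis-0 -c redis -- redis-cli CONFIG GET maxmemory-policy
    kubectl -n redis get pvc | grep auth-redis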

    streams-redis (SSE events)

    apiVersion: databases.spotahome.com/v1
    kind: RedisFailover
    metadata:
      name: streams-redis
      namespace: redis
    spec:
      sentinel:
        replicas: 1      # dev: minimal setup
      redis:
        replicas: 1
        resources:
          limits:
            cpu: 200m
            memory: 512Mi
        storage:
          emptyDir: {}   # ephemeral (cleaned up via TTL)
        customConfig:
          - "maxmemory 256mb"
          - "maxmemory-policy noeviction"  # 이벤트 유실 방지
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: redis-cluster
                      operator: In
                      values: [streams]
        exporter:
          enabled: true

    Design rationale:

    • replicas: 1 (dev): keeps cost down; scaled out to 3 in prod
    • emptyDir: SSE events are removed automatically by a 1-hour TTL, so persistence is unnecessary (illustrated below)
    • noeviction: evicting events before they are processed would break the SSE stream
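
    The TTL-based clean-up assumed above can be illustrated directly with redis-cli (the stream key name here is hypothetical, not the application's actual key):

    # Append a demo event to a stream, then expire the whole key after one hour.
    kubectl -n redis exec rfr-streams-redis-0 -c redis -- \
      redis-cli XADD sse:events:demo '*' type notification payload '{"msg":"hi"}'
    kubectl -n redis exec rfr-streams-redis-0 -c redis -- \
      redis-cli EXPIRE sse:events:demo 3600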

    cache-redis (Celery results)

    apiVersion: databases.spotahome.com/v1
    kind: RedisFailover
    metadata:
      name: cache-redis
      namespace: redis
    spec:
      sentinel:
        replicas: 1
      redis:
        replicas: 1
        resources:
          limits:
            cpu: 300m
            memory: 768Mi
        storage:
          emptyDir: {}
        customConfig:
          - "maxmemory 512mb"
          - "maxmemory-policy allkeys-lru"  # 메모리 부족 시 eviction
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: redis-cluster
                      operator: In
                      values: [cache]
        exporter:
          enabled: true

    Design rationale:

    • allkeys-lru: Celery results become stale after a while, so eviction is acceptable
    • emptyDir: cache data is ephemeral and can be regenerated after a restart

    Eviction Policy Comparison

    Redis instance   Policy        Rationale
    auth-redis       noeviction    evicting the JWT blacklist would let expired tokens be reused
    streams-redis    noeviction    evicting unprocessed events breaks the SSE stream
    cache-redis      allkeys-lru   stale Celery results can be evicted without harm

    ArgoCD Sync-wave Strategy

    Dependency order

    Sync-wave 24: PostgreSQL
    Sync-wave 27: Redis Operator (CRD + Deployment)
    Sync-wave 28: RedisFailover CRs (auth, streams, cache)
    Sync-wave 29+: applications (which depend on Redis)

    ArgoCD Application Configuration

    # clusters/dev/apps/27-redis-operator.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: dev-redis-operator
      namespace: argocd
      annotations:
        argocd.argoproj.io/sync-wave: "27"
    spec:
      source:
        repoURL: https://spotahome.github.io/redis-operator
        chart: redis-operator
        targetRevision: 3.3.0
        helm:
          values: |
            replicas: 1
            image:
              repository: quay.io/spotahome/redis-operator
              tag: v1.3.0
            resources:
              requests:
                cpu: 100m
                memory: 128Mi
              limits:
                cpu: 200m
                memory: 256Mi
            # place on the control-plane node
            nodeSelector:
              role: control-plane
            tolerations:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
                effect: NoSchedule
            serviceAccount:
              create: true
            monitoring:
              enabled: false
      destination:
        server: https://kubernetes.default.svc
        namespace: redis-operator
    
    ---
    # clusters/dev/apps/28-redis-cluster.yaml
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: dev-redis-cluster
      namespace: argocd
      annotations:
        argocd.argoproj.io/sync-wave: "28"
    spec:
      source:
        repoURL: https://github.com/eco2-team/backend.git
        path: workloads/redis/dev
        targetRevision: develop
      destination:
        server: https://kubernetes.default.svc
        namespace: redis

    Why the sync-wave gap matters:

    • 27 → 28: the Operator must register the CRD before any CR is created
    • if a CR is applied first, the sync fails with a no matches for kind "RedisFailover" error

    Kustomize Structure

    workloads/redis/
    ├── base/
    │   ├── auth-redis-failover.yaml
    │   ├── streams-redis-failover.yaml
    │   ├── cache-redis-failover.yaml
    │   └── kustomization.yaml
    ├── dev/
    │   └── kustomization.yaml        # replicas: 1 patch
    └── prod/
        └── kustomization.yaml        # replicas: 3 patch
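
    Before ArgoCD syncs anything, each overlay can be rendered locally to confirm the patches apply as intended; a minimal sketch, assuming the layout above:

    # Dev overlay should show replicas: 1, prod should keep the base value of 3.
    kubectl kustomize workloads/redis/dev | grep 'replicas:'
    kubectl kustomize workloads/redis/prod | grep 'replicas:'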

    Dev Environment Patch

    # workloads/redis/dev/kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    resources:
      - ../base
    
    patches:
      # disable HA (dev: replicas 1)
      - patch: |
          - op: replace
            path: /spec/sentinel/replicas
            value: 1
          - op: replace
            path: /spec/redis/replicas
            value: 1
        target:
          group: databases.spotahome.com
          version: v1
          kind: RedisFailover

    Prod Environment

    # workloads/redis/prod/kustomization.yaml
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    
    resources:
      - ../base
    
    # base already sets replicas: 3, so no extra patch is needed

    Environment Variable Mapping

    External Secrets → Deployment

    # workloads/secrets/external-secrets/dev/api-secrets.yaml
    apiVersion: external-secrets.io/v1beta1
    kind: ExternalSecret
    spec:
      data:
        - secretKey: AUTH_REDIS_BLACKLIST_URL
          remoteRef:
            key: eco2/dev/api
            property: AUTH_REDIS_BLACKLIST_URL
            # value: redis://rfr-auth-redis.redis.svc.cluster.local:6379/0

    Per-Service Redis Mapping

    Environment variable         Redis instance      DB   Purpose
    AUTH_REDIS_BLACKLIST_URL     rfr-auth-redis      0    JWT blacklist
    AUTH_REDIS_OAUTH_STATE_URL   rfr-auth-redis      3    OAuth state
    REDIS_STREAMS_URL            rfr-streams-redis   0    SSE events
    CELERY_RESULT_BACKEND        rfr-cache-redis     0    Celery task results
    IMAGE_REDIS_URL              rfr-cache-redis     6    image cache
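
    A quick in-cluster smoke test (sketch) that the mapped endpoints resolve and answer PING, using a throwaway pod and the DNS names from the table above:

    # PING each Redis endpoint that the services will use.
    kubectl -n redis run redis-smoke --rm -it --image=redis:7 --restart=Never -- sh -c '
      for url in redis://rfr-auth-redis.redis.svc.cluster.local:6379/0 \
                 redis://rfr-streams-redis.redis.svc.cluster.local:6379/0 \
                 redis://rfr-cache-redis.redis.svc.cluster.local:6379/0; do
        echo "$url -> $(redis-cli -u "$url" PING)"
      done'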

    Deployment Verification (observed)

    ArgoCD Sync

    ubuntu@k8s-master:~$ kubectl get applications -n argocd | grep redis
    dev-redis-cluster                Synced        Healthy
    dev-redis-operator               Synced        Healthy

    Redis Operator

    ubuntu@k8s-master:~$ kubectl get pods -n redis-operator -o wide
    NAME                                  READY   STATUS    RESTARTS   AGE   IP                NODE         NOMINATED NODE   READINESS GATES
    dev-redis-operator-5fc99ccfcf-ckqqz   1/1     Running   0          22m   192.168.235.236   k8s-master   <none>           <none>

    RedisFailover 상태

    ubuntu@k8s-master:~$ kubectl get redisfailover -n redis
    NAME            NAME            REDIS   SENTINELS   AGE
    auth-redis      auth-redis      1       1           26m
    cache-redis     cache-redis     1       1           26m
    streams-redis   streams-redis   1       1           26m

    Pod 확인

    ubuntu@k8s-master:~$ kubectl get pods -n redis -o wide
    NAME                                 READY   STATUS    RESTARTS   AGE   IP                NODE                NOMINATED NODE   READINESS GATES
    rfr-auth-redis-0                     3/3     Running   0          26m   192.168.75.196    k8s-redis-auth      <none>           <none>
    rfr-cache-redis-0                    3/3     Running   0          26m   192.168.46.132    k8s-redis-cache     <none>           <none>
    rfr-streams-redis-0                  3/3     Running   0          26m   192.168.169.196   k8s-redis-streams   <none>           <none>
    rfs-auth-redis-66bf8f9657-dzp7v      2/2     Running   0          26m   192.168.75.195    k8s-redis-auth      <none>           <none>
    rfs-cache-redis-7845fbdd47-l27dr     2/2     Running   0          26m   192.168.46.131    k8s-redis-cache     <none>           <none>
    rfs-streams-redis-7d9c9986d9-twjdx   2/2     Running   0          26m   192.168.169.195   k8s-redis-streams   <none>           <none>

    Note: rfr-* pods run 3/3 (Redis + exporter + Sentinel sidecar); rfs-* pods run 2/2 (Sentinel + exporter).
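
    The container names behind those READY counts can be listed directly (the rfs-* pod name below comes from this deployment; yours will carry a different hash suffix):

    kubectl -n redis get pod rfr-auth-redis-0 \
      -o jsonpath='{.spec.containers[*].name}{"\n"}'
    kubectl -n redis get pod rfs-auth-redis-66bf8f9657-dzp7v \
      -o jsonpath='{.spec.containers[*].name}{"\n"}'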

    Services

    ubuntu@k8s-master:~$ kubectl get svc -n redis
    NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
    rfr-auth-redis      ClusterIP   None             <none>        9121/TCP    26m
    rfr-cache-redis     ClusterIP   None             <none>        9121/TCP    26m
    rfr-streams-redis   ClusterIP   None             <none>        9121/TCP    26m
    rfs-auth-redis      ClusterIP   10.96.118.174    <none>        26379/TCP   26m
    rfs-cache-redis     ClusterIP   10.111.135.35    <none>        26379/TCP   26m
    rfs-streams-redis   ClusterIP   10.108.155.145   <none>        26379/TCP   26m
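
    Because the exporter is enabled, the rfr-* Services expose the redis_exporter port (9121). A quick sketch to confirm metrics are served (redis_up should report 1):

    kubectl -n redis port-forward pod/rfr-auth-redis-0 9121:9121 &
    sleep 2
    curl -s http://localhost:9121/metrics | grep '^redis_up'
    kill %1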

    Memory Usage and Policy

    ubuntu@k8s-master:~$ for pod in rfr-auth-redis-0 rfr-streams-redis-0 rfr-cache-redis-0; do
      kubectl exec -n redis $pod -c redis -- redis-cli INFO memory | grep used_memory_human
      kubectl exec -n redis $pod -c redis -- redis-cli CONFIG GET maxmemory-policy
    done
    used_memory_human:872.70K
    maxmemory-policy
    noeviction
    used_memory_human:893.45K
    maxmemory-policy
    noeviction
    used_memory_human:970.57K
    maxmemory-policy
    allkeys-lru
    Instance        Used memory   maxmemory-policy
    auth-redis      872.66K       noeviction
    streams-redis   893.45K       noeviction
    cache-redis     926.23K       allkeys-lru

    Troubleshooting

    1. CRD Not Registered

    error: unable to recognize "redisfailover.yaml": 
    no matches for kind "RedisFailover" in version "databases.spotahome.com/v1"

    Fix: the Redis Operator must be deployed first (check the sync-wave order); the check below confirms the CRD is registered.
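
    A sketch of the pre-flight check:

    # The CRD must exist before any RedisFailover CR is applied.
    kubectl get crd redisfailovers.databases.spotahome.com
    kubectl api-resources --api-group=databases.spotahome.com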

    2. Helm Values Format Error

    # Error: YAML parse error
    Failed to load target state: yaml: line 4: did not find expected node content

    Cause: the valid values keys of the Spotahome Redis Operator Helm chart differ between chart versions

    Key                      Valid?   Description
    replicas                 O        number of Operator pods
    replicaCount             X        not a key in this chart version
    monitoring.enabled       O        Prometheus metrics
    serviceMonitor.enabled   X        not a key in this chart version

    Fix: inspect the valid keys with helm show values spotahome/redis-operator, as sketched below.
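
    For example (chart repo URL and version taken from the Application above):

    helm repo add spotahome https://spotahome.github.io/redis-operator
    helm repo update
    helm show values spotahome/redis-operator --version 3.3.0 | less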

    3. PVC Pending

    kubectl get pvc -n redis

    Fix: check that the StorageClass exists and that the node affinity can be satisfied; see the commands below.
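
    Typical diagnostics (sketch):

    # Events on the claim usually name the missing StorageClass or an
    # unsatisfiable scheduling constraint.
    kubectl -n redis describe pvc | tail -n 20
    kubectl get storageclass
    kubectl get nodes -L redis-cluster   # node label used by the affinity rules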


