Eco² Streams & Scaling for SSE #2: Declarative 3-Node Redis Cluster Deployment (GitOps)

mango_fr 2025. 12. 26. 14:52

Overview

With the EC2 nodes provisioned, we now deploy Redis declaratively onto the Kubernetes cluster.
The Spotahome Redis Operator automates the Redis + Sentinel HA setup.


Operator Selection: Spotahome vs Bitnami

Candidate Comparison

Criterion            Spotahome Redis Operator          Bitnami Redis (Helm)
Deployment model     CRD + Operator                    Helm chart
HA setup             Redis + Sentinel (automatic)      Manual configuration required
Failover             Automatic (managed by Sentinel)   Manual or external tooling
Resource management  Declarative, per CR               values.yaml
GitOps fit           High (CR = YAML)                  Medium (Helm values)
Maintenance          Reconciled by the Operator        Managed by hand

Why Spotahome

https://github.com/spotahome/redis-operator

 


  1. Declarative management: a single RedisFailover CR defines the entire Redis + Sentinel topology
  2. Automatic failover: when the master fails, Sentinel automatically promotes a replica
  3. GitOps fit: integrates naturally with the ArgoCD App-of-Apps pattern
  4. Minimal operational burden: the Operator continuously reconciles actual state toward desired state

Spotahome Redis Operator Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Kubernetes Cluster                        │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  redis-operator (Deployment)                         │   │
│  │  ├─ Watch: RedisFailover CRs                         │   │
│  │  ├─ Create: StatefulSet, Service, ConfigMap          │   │
│  │  └─ Reconcile: sync desired and actual state         │   │
│  └──────────────────────────────────────────────────────┘   │
│                           │                                  │
│                           ▼                                  │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  RedisFailover CR (auth-redis)                       │   │
│  │                                                      │   │
│  │  ┌─────────────────┐  ┌─────────────────┐            │   │
│  │  │ Redis Master    │  │ Redis Replica   │            │   │
│  │  │ (rfr-auth-redis)│  │ (rfr-auth-redis)│            │   │
│  │  └────────┬────────┘  └────────┬────────┘            │   │
│  │           │                    │                     │   │
│  │  ┌────────┴────────────────────┴────────┐            │   │
│  │  │           Sentinel Cluster           │            │   │
│  │  │  ├─ Monitor the master               │            │   │
│  │  │  ├─ Decide failover (quorum)         │            │   │
│  │  │  └─ Redirect clients                 │            │   │
│  │  └──────────────────────────────────────┘            │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Generated Resources

CRD            Generated resources
RedisFailover  StatefulSet (Redis), Deployment (Sentinel), Service, ConfigMap

Service Naming Convention

rfr-<name>    # Redis Master/Replica Service
rfs-<name>    # Sentinel Service

Example: rfr-auth-redis.redis.svc.cluster.local:6379
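
Before wiring applications up, you can sanity-check the topology by asking Sentinel for the current master. A minimal sketch, assuming the default master name mymaster that the operator registers with Sentinel:

# query Sentinel for the current master address from inside the cluster
kubectl run redis-check --rm -it --restart=Never --image=redis:7-alpine -n redis -- \
  redis-cli -h rfs-auth-redis -p 26379 SENTINEL get-master-addr-by-name mymaster
# expected output: the master Pod IP and port 6379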


RedisFailover CR 설계

auth-redis (security data)

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: auth-redis
  namespace: redis
  labels:
    app: auth-redis
    purpose: auth
spec:
  sentinel:
    replicas: 3      # quorum guarantee (2/3)
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
    customConfig:
      - down-after-milliseconds 5000
      - failover-timeout 10000
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: redis-cluster
                  operator: In
                  values: [auth]

  redis:
    replicas: 3      # 1 master + 2 replicas
    resources:
      limits:
        cpu: 300m
        memory: 512Mi
    storage:
      persistentVolumeClaim:
        spec:
          accessModes: [ReadWriteOnce]
          storageClassName: gp3
          resources:
            requests:
              storage: 1Gi
    customConfig:
      - "maxmemory 256mb"
      - "maxmemory-policy noeviction"  # 보안 데이터 보호
      - "appendonly yes"               # AOF 영속성
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: redis-cluster
                  operator: In
                  values: [auth]
    exporter:
      enabled: true
      image: oliver006/redis_exporter:v1.62.0

Design rationale:

  • replicas: 3: satisfies the Sentinel quorum (2/3) and provides one Redis master plus two replicas
  • noeviction: JWT blacklist and OAuth state entries must never be dropped
  • PVC: preserves data when a replica is promoted after a master failure
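
To confirm these settings actually landed, a quick check against the running instance (the Pod name is taken from the dev environment shown later) might look like:

# verify AOF persistence and the replication role on the auth-redis Pod
kubectl exec -n redis rfr-auth-redis-0 -c redis -- redis-cli CONFIG GET appendonly
kubectl exec -n redis rfr-auth-redis-0 -c redis -- redis-cli INFO replication | grep -E 'role|connected_slaves'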

streams-redis (SSE events)

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: streams-redis
  namespace: redis
spec:
  sentinel:
    replicas: 1      # dev: minimal setup
  redis:
    replicas: 1
    resources:
      limits:
        cpu: 200m
        memory: 512Mi
    storage:
      emptyDir: {}   # ephemeral (auto-cleaned via TTL)
    customConfig:
      - "maxmemory 256mb"
      - "maxmemory-policy noeviction"  # 이벤트 유실 방지
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: redis-cluster
                  operator: In
                  values: [streams]
    exporter:
      enabled: true

Design rationale:

  • replicas: 1 (dev): keeps cost down; scaled out to 3 in prod
  • emptyDir: SSE events are auto-deleted by a 1-hour TTL, so persistence is unnecessary
  • noeviction: evicting events before they are consumed would break the SSE stream
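
As an illustration of the TTL-based cleanup (the stream key name here is hypothetical, not taken from the actual codebase), a producer would set an expiry on the stream key after appending:

# append an SSE event, then let the whole stream key expire after 1 hour
redis-cli XADD sse:events:user42 '*' type notification payload '{"msg":"hello"}'
redis-cli EXPIRE sse:events:user42 3600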

cache-redis (Celery results)

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: cache-redis
  namespace: redis
spec:
  sentinel:
    replicas: 1
  redis:
    replicas: 1
    resources:
      limits:
        cpu: 300m
        memory: 768Mi
    storage:
      emptyDir: {}
    customConfig:
      - "maxmemory 512mb"
      - "maxmemory-policy allkeys-lru"  # 메모리 부족 시 eviction
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: redis-cluster
                  operator: In
                  values: [cache]
    exporter:
      enabled: true

Design rationale:

  • allkeys-lru: Celery results become useless after a while, so eviction is acceptable
  • emptyDir: cache data is ephemeral and can be regenerated after a restart
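
If you want to watch the LRU policy in action, the eviction counter in INFO stats is the signal to monitor:

# evicted_keys increments whenever allkeys-lru removes entries under memory pressure
kubectl exec -n redis rfr-cache-redis-0 -c redis -- redis-cli INFO stats | grep evicted_keys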

Eviction Policy Comparison

Redis instance   Policy        Rationale
auth-redis       noeviction    Dropping JWT blacklist entries would let expired tokens be reused
streams-redis    noeviction    Dropping unconsumed events breaks the SSE stream
cache-redis      allkeys-lru   Stale Celery results are safe to evict

ArgoCD Sync-wave Strategy

Dependency Order

Sync-wave 24:  PostgreSQL
Sync-wave 27:  Redis Operator (CRD + Deployment)
Sync-wave 28:  RedisFailover CRs (auth, streams, cache)
Sync-wave 29+: Applications (which depend on Redis)

ArgoCD Application Configuration

# clusters/dev/apps/27-redis-operator.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dev-redis-operator
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "27"
spec:
  source:
    repoURL: https://spotahome.github.io/redis-operator
    chart: redis-operator
    targetRevision: 3.3.0
    helm:
      values: |
        replicas: 1
        image:
          repository: quay.io/spotahome/redis-operator
          tag: v1.3.0
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
        # place on the control-plane node
        nodeSelector:
          role: control-plane
        tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
        serviceAccount:
          create: true
        monitoring:
          enabled: false
  destination:
    server: https://kubernetes.default.svc
    namespace: redis-operator

---
# clusters/dev/apps/28-redis-cluster.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dev-redis-cluster
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "28"
spec:
  source:
    repoURL: https://github.com/eco2-team/backend.git
    path: workloads/redis/dev
    targetRevision: develop
  destination:
    server: https://kubernetes.default.svc
    namespace: redis

Why the sync-wave gap matters:

  • 27 → 28: the Operator must register the CRD before any CR is created
  • If a CR is created first, the sync fails with no matches for kind "RedisFailover"

Kustomize Structure

workloads/redis/
├── base/
│   ├── auth-redis-failover.yaml
│   ├── streams-redis-failover.yaml
│   ├── cache-redis-failover.yaml
│   └── kustomization.yaml
├── dev/
│   └── kustomization.yaml        # replicas: 1 patch
└── prod/
    └── kustomization.yaml        # replicas: 3 patch

Dev Environment Patch

# workloads/redis/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../base

patches:
  # disable HA (dev: replicas 1)
  - patch: |
      - op: replace
        path: /spec/sentinel/replicas
        value: 1
      - op: replace
        path: /spec/redis/replicas
        value: 1
    target:
      group: databases.spotahome.com
      version: v1
      kind: RedisFailover

Prod Environment

# workloads/redis/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../base

# base already sets replicas: 3, so no additional patch is needed

Environment Variable Mapping

External Secrets → Deployment

# workloads/secrets/external-secrets/dev/api-secrets.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-secrets             # name/namespace assumed for illustration
  namespace: api
spec:
  secretStoreRef:
    name: aws-secrets-manager   # store name assumed for illustration
    kind: ClusterSecretStore
  target:
    name: api-secrets
  data:
    - secretKey: AUTH_REDIS_BLACKLIST_URL
      remoteRef:
        key: eco2/dev/api
        property: AUTH_REDIS_BLACKLIST_URL
        # value: redis://rfr-auth-redis.redis.svc.cluster.local:6379/0

Per-Service Redis Mapping

Environment variable         Redis instance      DB   Purpose
AUTH_REDIS_BLACKLIST_URL     rfr-auth-redis      0    JWT blacklist
AUTH_REDIS_OAUTH_STATE_URL   rfr-auth-redis      3    OAuth state
REDIS_STREAMS_URL            rfr-streams-redis   0    SSE events
CELERY_RESULT_BACKEND        rfr-cache-redis     0    Task results
IMAGE_REDIS_URL              rfr-cache-redis     6    Image cache
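
On the consuming side, a Deployment would pull these values from the synced Secret. A minimal sketch, assuming the target Secret is named api-secrets as in the ExternalSecret above:

containers:
  - name: api
    env:
      - name: AUTH_REDIS_BLACKLIST_URL
        valueFrom:
          secretKeyRef:
            name: api-secrets            # Secret created by the ExternalSecret
            key: AUTH_REDIS_BLACKLIST_URL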

Deployment Verification (measured)

ArgoCD Sync

ubuntu@k8s-master:~$ kubectl get applications -n argocd | grep redis
dev-redis-cluster                Synced        Healthy
dev-redis-operator               Synced        Healthy

Redis Operator

ubuntu@k8s-master:~$ kubectl get pods -n redis-operator -o wide
NAME                                  READY   STATUS    RESTARTS   AGE   IP                NODE         NOMINATED NODE   READINESS GATES
dev-redis-operator-5fc99ccfcf-ckqqz   1/1     Running   0          22m   192.168.235.236   k8s-master   <none>           <none>

RedisFailover Status

ubuntu@k8s-master:~$ kubectl get redisfailover -n redis
NAME            NAME            REDIS   SENTINELS   AGE
auth-redis      auth-redis      1       1           26m
cache-redis     cache-redis     1       1           26m
streams-redis   streams-redis   1       1           26m

Pods

ubuntu@k8s-master:~$ kubectl get pods -n redis -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP                NODE                NOMINATED NODE   READINESS GATES
rfr-auth-redis-0                     3/3     Running   0          26m   192.168.75.196    k8s-redis-auth      <none>           <none>
rfr-cache-redis-0                    3/3     Running   0          26m   192.168.46.132    k8s-redis-cache     <none>           <none>
rfr-streams-redis-0                  3/3     Running   0          26m   192.168.169.196   k8s-redis-streams   <none>           <none>
rfs-auth-redis-66bf8f9657-dzp7v      2/2     Running   0          26m   192.168.75.195    k8s-redis-auth      <none>           <none>
rfs-cache-redis-7845fbdd47-l27dr     2/2     Running   0          26m   192.168.46.131    k8s-redis-cache     <none>           <none>
rfs-streams-redis-7d9c9986d9-twjdx   2/2     Running   0          26m   192.168.169.195   k8s-redis-streams   <none>           <none>

Note: rfr-* Pods show 3/3 (Redis + exporter + Sentinel sidecar); rfs-* Pods show 2/2 (Sentinel + exporter).

Services

ubuntu@k8s-master:~$ kubectl get svc -n redis
NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
rfr-auth-redis      ClusterIP   None             <none>        9121/TCP    26m
rfr-cache-redis     ClusterIP   None             <none>        9121/TCP    26m
rfr-streams-redis   ClusterIP   None             <none>        9121/TCP    26m
rfs-auth-redis      ClusterIP   10.96.118.174    <none>        26379/TCP   26m
rfs-cache-redis     ClusterIP   10.111.135.35    <none>        26379/TCP   26m
rfs-streams-redis   ClusterIP   10.108.155.145   <none>        26379/TCP   26m

Memory Usage and Policies

ubuntu@k8s-master:~$ for pod in rfr-auth-redis-0 rfr-streams-redis-0 rfr-cache-redis-0; do
  kubectl exec -n redis $pod -c redis -- redis-cli INFO memory | grep used_memory_human
  kubectl exec -n redis $pod -c redis -- redis-cli CONFIG GET maxmemory-policy
done
used_memory_human:872.70K
maxmemory-policy
noeviction
used_memory_human:893.45K
maxmemory-policy
noeviction
used_memory_human:970.57K
maxmemory-policy
allkeys-lru

Instance        used_memory_human   maxmemory-policy
auth-redis      872.70K             noeviction
streams-redis   893.45K             noeviction
cache-redis     970.57K             allkeys-lru

Troubleshooting

1. CRD Not Registered

error: unable to recognize "redisfailover.yaml": 
no matches for kind "RedisFailover" in version "databases.spotahome.com/v1"

Resolution: the Redis Operator must be deployed first (check the sync-wave ordering).
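
You can verify the CRD registration directly before the wave-28 sync runs:

kubectl get crd redisfailovers.databases.spotahome.com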

2. Helm Values Format Error

# Error: YAML parsing failure
Failed to load target state: yaml: line 4: did not find expected node content

Cause: the Spotahome Redis Operator Helm chart's values keys differ between chart versions.

Key                     Valid?   Description
replicas                O        Operator Pod count (correct key)
replicaCount            X        Key does not exist in this chart
monitoring.enabled      O        Prometheus metrics (correct key)
serviceMonitor.enabled  X        Key does not exist in this chart

Resolution: check the valid keys with helm show values spotahome/redis-operator.
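
For reference, pulling the chart's default values for the pinned version looks like this:

helm repo add spotahome https://spotahome.github.io/redis-operator
helm repo update
helm show values spotahome/redis-operator --version 3.3.0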

3. PVC Pending

kubectl get pvc -n redis

Resolution: check that the StorageClass exists and that the node affinity rules can be satisfied.
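
The usual diagnosis path:

kubectl describe pvc -n redis            # Events show provisioning failures
kubectl get storageclass                 # confirm gp3 exists with a valid provisioner
kubectl get nodes -l redis-cluster=auth  # confirm a node matches the affinity rule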

