Eco² Streams & Scaling for SSE #2: Declarative 3-Node Redis Cluster Deployment (GitOps)
Eco² / Event Streams & Scaling · 2025. 12. 26. 14:52
Overview
After provisioning the EC2 nodes, we deploy Redis declaratively onto the Kubernetes cluster. The Spotahome Redis Operator automates the Redis + Sentinel HA setup.
Operator Selection: Spotahome vs. Bitnami
Candidate Comparison

| Criterion | Spotahome Redis Operator | Bitnami Redis (Helm) |
|---|---|---|
| Deployment model | CRD + Operator | Helm chart |
| HA setup | Redis + Sentinel (automatic) | manual configuration required |
| Failover | automatic (Sentinel-managed) | manual, or external tooling |
| Resource management | declarative, per CR | values.yaml |
| GitOps fit | high (CR = YAML) | medium (Helm values) |
| Maintenance | Operator reconciles continuously | managed by hand |

Why Spotahome

https://github.com/spotahome/redis-operator
(Redis Operator creates/configures/manages high availability Redis with Sentinel automatic failover atop Kubernetes.)
- Declarative management: a single RedisFailover CR defines both Redis and Sentinel
- Automatic failover: on master failure, Sentinel automatically promotes a replica
- GitOps fit: integrates naturally with the ArgoCD App-of-Apps pattern
- Minimal operational burden: the Operator continuously reconciles actual state toward the desired state
Spotahome Redis Operator Architecture

```
┌──────────────────────────────────────────────────────┐
│ Kubernetes Cluster                                   │
│                                                      │
│  redis-operator (Deployment)                         │
│   ├─ Watch:     RedisFailover CRs                    │
│   ├─ Create:    StatefulSet, Service, ConfigMap      │
│   └─ Reconcile: sync current state to desired state  │
│                        │                             │
│                        ▼                             │
│  RedisFailover CR (auth-redis)                       │
│                                                      │
│   ┌─────────────────┐      ┌─────────────────┐       │
│   │  Redis Master   │      │  Redis Replica  │       │
│   │ (rfr-auth-redis)│      │ (rfr-auth-redis)│       │
│   └────────┬────────┘      └────────┬────────┘       │
│            │                        │                │
│   ┌────────┴────────────────────────┴────────┐       │
│   │            Sentinel Cluster              │       │
│   │  ├─ monitors the master                  │       │
│   │  ├─ decides failover (quorum)            │       │
│   │  └─ redirects clients                    │       │
│   └──────────────────────────────────────────┘       │
└──────────────────────────────────────────────────────┘
```

Resources Created
| CRD | Created resources |
|---|---|
| RedisFailover | StatefulSet (Redis), StatefulSet (Sentinel), Service, ConfigMap |

Service Naming Convention
```
rfr-<name>   # Redis master/replica Service
rfs-<name>   # Sentinel Service
```

Example:
rfr-auth-redis.redis.svc.cluster.local:6379
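As an illustration of the naming convention above, a small helper (hypothetical, not part of the operator) can derive the in-cluster endpoints from a CR name:

```python
def redis_service_dns(cr_name, namespace="redis", sentinel=False):
    """Build the in-cluster endpoint implied by the operator's naming rule.

    rfr-<name> -> Redis master/replica Service (port 6379)
    rfs-<name> -> Sentinel Service (port 26379)
    Helper name and signature are illustrative only.
    """
    prefix = "rfs" if sentinel else "rfr"
    port = 26379 if sentinel else 6379
    return f"{prefix}-{cr_name}.{namespace}.svc.cluster.local:{port}"

print(redis_service_dns("auth-redis"))
# rfr-auth-redis.redis.svc.cluster.local:6379
print(redis_service_dns("auth-redis", sentinel=True))
# rfs-auth-redis.redis.svc.cluster.local:26379
```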
RedisFailover CR Design
auth-redis (security data)

```yaml
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: auth-redis
  namespace: redis
  labels:
    app: auth-redis
    purpose: auth
spec:
  sentinel:
    replicas: 3  # quorum guarantee (2/3)
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
    customConfig:
      - down-after-milliseconds 5000
      - failover-timeout 10000
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: redis-cluster
                  operator: In
                  values: [auth]
  redis:
    replicas: 3  # 1 master + 2 replicas
    resources:
      limits:
        cpu: 300m
        memory: 512Mi
    storage:
      persistentVolumeClaim:
        spec:
          accessModes: [ReadWriteOnce]
          storageClassName: gp3
          resources:
            requests:
              storage: 1Gi
    customConfig:
      - "maxmemory 256mb"
      - "maxmemory-policy noeviction"  # protect security data
      - "appendonly yes"               # AOF persistence
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: redis-cluster
                  operator: In
                  values: [auth]
  exporter:
    enabled: true
    image: oliver006/redis_exporter:v1.62.0
```

Design rationale:
- replicas: 3: satisfies the Sentinel quorum (2/3) and provides 1 Redis master + 2 replicas
- noeviction: JWT blacklist and OAuth state entries must never be dropped
- PVC: data survives when a replica is promoted after a master failure
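The replicas: 3 choice can be sanity-checked against the majority rule Sentinel's quorum is based on (a sketch of the arithmetic only; the operator configures the actual quorum value for you):

```python
def sentinel_quorum(replicas):
    """Majority of Sentinels that must agree the master is down.

    With 3 Sentinels, 2 must agree, which still works after losing
    one Sentinel; with 1 Sentinel (dev), there is no failure tolerance.
    """
    return replicas // 2 + 1

assert sentinel_quorum(3) == 2  # prod: tolerates the loss of 1 Sentinel
assert sentinel_quorum(1) == 1  # dev: no tolerance
```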
streams-redis (SSE events)

```yaml
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: streams-redis
  namespace: redis
spec:
  sentinel:
    replicas: 1  # dev: minimal setup
  redis:
    replicas: 1
    resources:
      limits:
        cpu: 200m
        memory: 512Mi
    storage:
      emptyDir: {}  # volatile (cleaned up via TTL)
    customConfig:
      - "maxmemory 256mb"
      - "maxmemory-policy noeviction"  # prevent event loss
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: redis-cluster
                  operator: In
                  values: [streams]
  exporter:
    enabled: true
```

Design rationale:
- replicas: 1 (dev): cost saving; scaled out to 3 in prod
- emptyDir: SSE events expire via a 1-hour TTL, so persistence is unnecessary
- noeviction: if an unprocessed event is evicted, the SSE stream breaks
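The "TTL instead of persistence" argument can be sketched with a toy store (a plain Python dict standing in for Redis; the key names are illustrative, the 1-hour TTL comes from the text):

```python
EVENT_TTL = 3600  # seconds; matches the 1-hour TTL described above

store = {}  # key -> (payload, expiry timestamp)

def put_event(key, payload, now):
    """Store an SSE event together with its expiry time."""
    store[key] = (payload, now + EVENT_TTL)

def get_event(key, now):
    """Return the payload, or None once the TTL has elapsed."""
    item = store.get(key)
    if item is None or item[1] <= now:
        store.pop(key, None)  # expired entries are dropped on access
        return None
    return item[0]

t0 = 0.0
put_event("sse:evt:1", "user-signed-up", t0)
assert get_event("sse:evt:1", t0 + 10) == "user-signed-up"
assert get_event("sse:evt:1", t0 + 3601) is None  # auto-cleaned after 1h
```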
cache-redis (Celery results)

```yaml
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: cache-redis
  namespace: redis
spec:
  sentinel:
    replicas: 1
  redis:
    replicas: 1
    resources:
      limits:
        cpu: 300m
        memory: 768Mi
    storage:
      emptyDir: {}
    customConfig:
      - "maxmemory 512mb"
      - "maxmemory-policy allkeys-lru"  # evict under memory pressure
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: redis-cluster
                  operator: In
                  values: [cache]
  exporter:
    enabled: true
```

Design rationale:
- allkeys-lru: Celery results become stale after a while, so eviction is acceptable
- emptyDir: cache data is volatile and can be regenerated after a restart
Eviction Policy Comparison

| Redis instance | Policy | Rationale |
|---|---|---|
| auth-redis | noeviction | dropping a JWT blacklist entry would let an expired token be reused |
| streams-redis | noeviction | dropping an unprocessed event breaks the SSE stream |
| cache-redis | allkeys-lru | stale Celery results are safe to evict |
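The behavioral difference between the two policies in the table can be modeled with a toy store (a sketch only: it budgets key counts rather than bytes, unlike real Redis):

```python
from collections import OrderedDict

class TinyRedis:
    """Toy model of the two maxmemory policies compared above.

    noeviction  -> writes fail once the budget is exhausted
    allkeys-lru -> the least-recently-used key is evicted instead
    """
    def __init__(self, max_keys, policy):
        self.max_keys, self.policy = max_keys, policy
        self.data = OrderedDict()

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.max_keys:
            if self.policy == "noeviction":
                raise MemoryError("OOM command not allowed")  # like Redis' OOM reply
            self.data.popitem(last=False)  # drop the LRU entry
        self.data[key] = value
        self.data.move_to_end(key)

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # reads refresh recency
            return self.data[key]
        return None

auth = TinyRedis(2, "noeviction")
auth.set("jwt:blacklist:a", 1); auth.set("jwt:blacklist:b", 1)
try:
    auth.set("jwt:blacklist:c", 1)
except MemoryError:
    pass  # auth data is never silently dropped

cache = TinyRedis(2, "allkeys-lru")
cache.set("celery:r1", 1); cache.set("celery:r2", 1)
cache.set("celery:r3", 1)            # evicts celery:r1
assert cache.get("celery:r1") is None
```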
ArgoCD Sync-wave Strategy
Dependency Order

```
Sync-wave 24:  PostgreSQL
Sync-wave 27:  Redis Operator (CRD + Deployment)
Sync-wave 28:  RedisFailover CRs (auth, streams, cache)
Sync-wave 29+: applications (depend on Redis)
```

ArgoCD Application Definitions
```yaml
# clusters/dev/apps/27-redis-operator.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dev-redis-operator
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "27"
spec:
  source:
    repoURL: https://spotahome.github.io/redis-operator
    chart: redis-operator
    targetRevision: 3.3.0
    helm:
      values: |
        replicas: 1
        image:
          repository: quay.io/spotahome/redis-operator
          tag: v1.3.0
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
        # place on the control-plane node
        nodeSelector:
          role: control-plane
        tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
        serviceAccount:
          create: true
        monitoring:
          enabled: false
  destination:
    server: https://kubernetes.default.svc
    namespace: redis-operator
---
# clusters/dev/apps/28-redis-cluster.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dev-redis-cluster
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "28"
spec:
  source:
    repoURL: https://github.com/eco2-team/backend.git
    path: workloads/redis/dev
    targetRevision: develop
  destination:
    server: https://kubernetes.default.svc
    namespace: redis
```

What the sync-wave gap means:
- 27 → 28: the Operator registers the CRD before any CR is created
- If a CR is applied first, the API server returns a no matches for kind "RedisFailover" error
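The ordering ArgoCD derives from these annotations can be sketched as a simple sort (manifests trimmed to the relevant fields; ArgoCD treats a missing annotation as wave 0):

```python
# Minimal stand-ins for the three Application manifests above.
apps = [
    {"metadata": {"name": "dev-redis-cluster",
                  "annotations": {"argocd.argoproj.io/sync-wave": "28"}}},
    {"metadata": {"name": "dev-redis-operator",
                  "annotations": {"argocd.argoproj.io/sync-wave": "27"}}},
    {"metadata": {"name": "dev-postgresql",
                  "annotations": {"argocd.argoproj.io/sync-wave": "24"}}},
]

def wave(app):
    # A missing sync-wave annotation defaults to wave 0.
    return int(app["metadata"]["annotations"]
               .get("argocd.argoproj.io/sync-wave", "0"))

order = [a["metadata"]["name"] for a in sorted(apps, key=wave)]
assert order == ["dev-postgresql", "dev-redis-operator", "dev-redis-cluster"]
```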
Kustomize Layout

```
workloads/redis/
├── base/
│   ├── auth-redis-failover.yaml
│   ├── streams-redis-failover.yaml
│   ├── cache-redis-failover.yaml
│   └── kustomization.yaml
├── dev/
│   └── kustomization.yaml   # replicas: 1 patch
└── prod/
    └── kustomization.yaml   # replicas: 3 patch
```

Dev Environment Patch
```yaml
# workloads/redis/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
patches:
  # Disable HA in dev (replicas: 1)
  - patch: |-
      - op: replace
        path: /spec/sentinel/replicas
        value: 1
      - op: replace
        path: /spec/redis/replicas
        value: 1
    target:
      group: databases.spotahome.com
      version: v1
      kind: RedisFailover
```

Prod Environment
```yaml
# workloads/redis/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base
# base already sets replicas: 3, so no extra patch is needed
```
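What the dev overlay's replace ops do to the base manifest can be sketched with plain dicts (a sketch, not Kustomize itself; paths are the same JSON6902 paths as in the patch):

```python
# A trimmed stand-in for one base RedisFailover manifest.
base = {"spec": {"sentinel": {"replicas": 3}, "redis": {"replicas": 3}}}

def apply_replace(doc, path, value):
    """Apply a single 'op: replace' at a /-separated path, in place."""
    *parents, leaf = path.lstrip("/").split("/")
    for key in parents:
        doc = doc[key]
    doc[leaf] = value

for op in ({"op": "replace", "path": "/spec/sentinel/replicas", "value": 1},
           {"op": "replace", "path": "/spec/redis/replicas", "value": 1}):
    apply_replace(base, op["path"], op["value"])

# Both replica counts are now 1, as in the dev overlay.
assert base == {"spec": {"sentinel": {"replicas": 1}, "redis": {"replicas": 1}}}
```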
Environment Variable Mapping
External Secrets → Deployment

```yaml
# workloads/secrets/external-secrets/dev/api-secrets.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
spec:
  data:
    - secretKey: AUTH_REDIS_BLACKLIST_URL
      remoteRef:
        key: eco2/dev/api
        property: AUTH_REDIS_BLACKLIST_URL
        # value: redis://rfr-auth-redis.redis.svc.cluster.local:6379/0
```

Per-Service Redis Mapping
| Environment variable | Redis instance | DB | Purpose |
|---|---|---|---|
| AUTH_REDIS_BLACKLIST_URL | rfr-auth-redis | 0 | JWT blacklist |
| AUTH_REDIS_OAUTH_STATE_URL | rfr-auth-redis | 3 | OAuth state |
| REDIS_STREAMS_URL | rfr-streams-redis | 0 | SSE events |
| CELERY_RESULT_BACKEND | rfr-cache-redis | 0 | task results |
| IMAGE_REDIS_URL | rfr-cache-redis | 6 | image cache |
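Client libraries resolve these URLs into a host, port, and DB index; a minimal sketch using only the standard library:

```python
from urllib.parse import urlparse

def redis_target(url):
    """Split a redis:// URL into (host, port, db) the way clients do.

    The DB index is the numeric path segment (defaults to 0),
    and the port defaults to 6379.
    """
    u = urlparse(url)
    db = int(u.path.lstrip("/") or 0)
    return u.hostname, u.port or 6379, db

host, port, db = redis_target(
    "redis://rfr-auth-redis.redis.svc.cluster.local:6379/0")
assert host == "rfr-auth-redis.redis.svc.cluster.local"
assert (port, db) == (6379, 0)
```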
Deployment Verification (Measured)
ArgoCD Sync

```
ubuntu@k8s-master:~$ kubectl get applications -n argocd | grep redis
dev-redis-cluster    Synced   Healthy
dev-redis-operator   Synced   Healthy
```

Redis Operator
```
ubuntu@k8s-master:~$ kubectl get pods -n redis-operator -o wide
NAME                                  READY   STATUS    RESTARTS   AGE   IP                NODE         NOMINATED NODE   READINESS GATES
dev-redis-operator-5fc99ccfcf-ckqqz   1/1     Running   0          22m   192.168.235.236   k8s-master   <none>           <none>
```

RedisFailover Status
```
ubuntu@k8s-master:~$ kubectl get redisfailover -n redis
NAME            NAME            REDIS   SENTINELS   AGE
auth-redis      auth-redis      1       1           26m
cache-redis     cache-redis     1       1           26m
streams-redis   streams-redis   1       1           26m
```

Pod Check
```
ubuntu@k8s-master:~$ kubectl get pods -n redis -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP                NODE                NOMINATED NODE   READINESS GATES
rfr-auth-redis-0                     3/3     Running   0          26m   192.168.75.196    k8s-redis-auth      <none>           <none>
rfr-cache-redis-0                    3/3     Running   0          26m   192.168.46.132    k8s-redis-cache     <none>           <none>
rfr-streams-redis-0                  3/3     Running   0          26m   192.168.169.196   k8s-redis-streams   <none>           <none>
rfs-auth-redis-66bf8f9657-dzp7v      2/2     Running   0          26m   192.168.75.195    k8s-redis-auth      <none>           <none>
rfs-cache-redis-7845fbdd47-l27dr     2/2     Running   0          26m   192.168.46.131    k8s-redis-cache     <none>           <none>
rfs-streams-redis-7d9c9986d9-twjdx   2/2     Running   0          26m   192.168.169.195   k8s-redis-streams   <none>           <none>
```

Note: rfr-* pods are 3/3 (Redis + exporter + Sentinel sidecar); rfs-* pods are 2/2 (Sentinel + exporter).

Services
```
ubuntu@k8s-master:~$ kubectl get svc -n redis
NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
rfr-auth-redis      ClusterIP   None             <none>        9121/TCP    26m
rfr-cache-redis     ClusterIP   None             <none>        9121/TCP    26m
rfr-streams-redis   ClusterIP   None             <none>        9121/TCP    26m
rfs-auth-redis      ClusterIP   10.96.118.174    <none>        26379/TCP   26m
rfs-cache-redis     ClusterIP   10.111.135.35    <none>        26379/TCP   26m
rfs-streams-redis   ClusterIP   10.108.155.145   <none>        26379/TCP   26m
```

Memory Usage and Policies
```
ubuntu@k8s-master:~$ for pod in rfr-auth-redis-0 rfr-streams-redis-0 rfr-cache-redis-0; do
    kubectl exec -n redis $pod -c redis -- redis-cli INFO memory | grep used_memory_human
    kubectl exec -n redis $pod -c redis -- redis-cli CONFIG GET maxmemory-policy
done
used_memory_human:872.70K
maxmemory-policy noeviction
used_memory_human:893.45K
maxmemory-policy noeviction
used_memory_human:970.57K
maxmemory-policy allkeys-lru
```

| Instance | Used memory | maxmemory-policy |
|---|---|---|
| auth-redis | 872.66K | noeviction |
| streams-redis | 893.45K | noeviction |
| cache-redis | 926.23K | allkeys-lru |
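INFO output is just key:value lines, so a small parser (an illustrative helper, using the measured 872.70K value from above as sample input) shows how such a summary table could be scripted:

```python
def parse_info(raw):
    """Parse 'key:value' lines as emitted by redis-cli INFO,
    skipping the '# Section' comment headers."""
    out = {}
    for line in raw.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            out[key] = value.strip()
    return out

sample = "# Memory\nused_memory:893645\nused_memory_human:872.70K\n"
info = parse_info(sample)
assert info["used_memory_human"] == "872.70K"
```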
Troubleshooting
1. CRD Not Registered

```
error: unable to recognize "redisfailover.yaml": no matches for kind "RedisFailover" in version "databases.spotahome.com/v1"
```

Fix: the Redis Operator must be deployed first (check the sync-wave order).
2. Malformed Helm Values

```
# error: YAML parsing failure
Failed to load target state: yaml: line 4: did not find expected node content
```

Cause: the Spotahome Redis Operator Helm chart's values keys differ between chart versions.

| Key | Valid? | Description |
|---|---|---|
| replicas | O | number of Operator pods (correct key) |
| replicaCount | X | key does not exist |
| monitoring.enabled | O | Prometheus metrics (correct key) |
| serviceMonitor.enabled | X | key does not exist |

Fix: check the valid keys with helm show values spotahome/redis-operator.

3. PVC Pending
```
kubectl get pvc -n redis
```

Fix: verify that the StorageClass exists and that the node affinity can be satisfied.