ADR: Info Service 3-Tier Memory Architecture

Eco² (이코에코) Knowledge Base / Plans, 2026-01-17 02:39

https://developers.naver.com/docs/serviceapi/search/news/news.md#%EB%89%B4%EC%8A%A4

Status: In-Progress
Date: 2026-01-17
Related ADR: info-service-adr.md

Review history: Drafted the Info Service ADR; analyzed the limits of the Redis-only design and proposed the 3-tier architecture

1. Overview

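The first draft of the Info service's news pipeline (collect → store → cache, Redis-only) had a clear limitation: it could not implement and operate infinite scroll while staying within the external API call limits. We therefore added a persistence layer (Postgres) and separated the read and write stores, evolving the memory hierarchy into three tiers.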

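1.1 Core Design Principles

Principle	Description
Write/Read separation	The Worker collects and stores; the API only reads
Eventual Consistency	Refreshing every 5 minutes balances freshness and stability
Graceful Degradation	Fall back to Postgres on a Redis miss
Infinite Scroll	Cursor-based pagination keeps scrolling uninterrupted across TTL expiry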

1.2 Current State (AS-IS)

Redis Only (TTL 1h)
├── news:feed:{category}  → Sorted Set
├── news:article:{id}     → Hash
└── Problems:
    ├─ The cursor breaks when the TTL expires
    ├─ Data can be lost during a 30-minute scroll session
    └─ "True infinite scroll" is impossible

1.3 Target State (TO-BE)

3-Tier Architecture
├── Celery Beat (every 5 minutes)
│   └─ Triggers info_worker
├── Postgres (Source of Truth)
│   └─ Durable storage, backbone for cursor-based pagination
├── Redis (Hot Cache)
│   └─ feed: 1h TTL, article: 24h TTL
└── Info API (read-only)
    └─ Cache-Aside + Postgres Fallback

2. Architecture Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                        3-Tier Architecture                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   ┌─────────────┐                                                   │
│   │ Celery Beat │  ← scheduled every 5 minutes                      │
│   └──────┬──────┘                                                   │
│          │                                                           │
│          ▼                                                           │
│   ┌─────────────┐                                                   │
│   │ info_worker │  ← news collection worker                         │
│   │ (Collector) │                                                   │
│   └──────┬──────┘                                                   │
│          │                                                           │
│          │  1. Fetch from News APIs (Naver, NewsData)               │
│          │  2. Dedup & Classify                                     │
│          │  3. UPSERT to Postgres                                   │
│          │  4. Warm Redis Cache                                     │
│          ▼                                                           │
│   ┌─────────────┐         ┌─────────────┐         ┌─────────────┐  │
│   │External APIs│────────▶│  Postgres   │────────▶│    Redis    │  │
│   │  (Source)   │         │  (Truth)    │         │   (Cache)   │  │
│   └─────────────┘         └─────────────┘         └─────────────┘  │
│                                  │                       │          │
│                                  │   Fallback           │ Primary  │
│                                  │                       │          │
│                                  ▼                       ▼          │
│                           ┌─────────────────────────────────┐       │
│                           │          info (API)             │       │
│                           │          (Reader)               │       │
│                           └─────────────────────────────────┘       │
│                                        │                            │
│                                        ▼                            │
│                           ┌─────────────────────────────────┐       │
│                           │           Client                │       │
│                           │     (Infinite Scroll UI)        │       │
│                           └─────────────────────────────────┘       │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

3. Component Design

3.1 Info Worker (Collector)

Role: periodic news collection, DB persistence, cache warming

apps/info_worker/
├── setup/
│   ├── config.py           # Pydantic Settings
│   ├── celery.py           # Celery app config
│   └── dependencies.py     # DI Container
├── domain/
│   └── entities/           # NewsArticle (reused from existing code)
├── application/
│   ├── commands/
│   │   └── collect_news_command.py
│   ├── ports/
│   │   ├── news_source.py         # External API Port
│   │   ├── news_repository.py     # Postgres Port
│   │   └── news_cache.py          # Redis Port
│   └── services/
│       └── news_aggregator.py     # Merge/classify logic
├── infrastructure/
│   ├── integrations/
│   │   ├── naver/                 # NaverNewsClient
│   │   └── newsdata/              # NewsDataClient
│   ├── persistence/
│   │   └── postgres_news_repository.py
│   └── cache/
│       └── redis_news_cache.py
└── presentation/
    └── tasks/
        └── collect_news_task.py   # Celery Task

3.2 Info API (Reader)

Role: read from cache/DB and return the response (simplifies the existing structure)

# AS-IS: FetchNewsCommand handles both collection and retrieval
class FetchNewsCommand:
    async def execute(self, request):
        if not await self._news_cache.is_fresh(request.category):
            await self._refresh_cache(request.category)  # also responsible for collection
        return await self._get_from_cache(request)

# TO-BE: FetchNewsCommand only reads
class FetchNewsCommand:
    async def execute(self, request):
        # 1. Read from Redis (primary)
        result = await self._news_cache.get_articles(request)
        if result.articles:
            return result

        # 2. Postgres fallback
        result = await self._news_repository.get_articles(request)

        # 3. Background cache warming (optional)
        if result.articles:
            await self._trigger_cache_warm(request.category)

        return result
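The read-only flow above can be exercised end to end with in-memory stand-ins. The following is a minimal sketch, not the service's actual code: `InMemoryCache`, `InMemoryRepository`, and `NewsResult` are hypothetical names introduced only for illustration, and cache warming is awaited inline rather than dispatched in the background.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class NewsResult:
    articles: list = field(default_factory=list)
    source: str = "redis"


class InMemoryCache:
    """Hypothetical stand-in for the Redis port; starts empty, like expired keys."""

    def __init__(self) -> None:
        self._feeds: dict = {}

    async def get_articles(self, category: str) -> NewsResult:
        return NewsResult(list(self._feeds.get(category, [])), source="redis")

    async def warm(self, category: str, articles: list) -> None:
        self._feeds[category] = list(articles)


class InMemoryRepository:
    """Hypothetical stand-in for the Postgres port."""

    def __init__(self, rows: dict) -> None:
        self._rows = rows

    async def get_articles(self, category: str) -> NewsResult:
        return NewsResult(list(self._rows.get(category, [])), source="postgres")


class FetchNewsCommand:
    def __init__(self, cache: InMemoryCache, repository: InMemoryRepository) -> None:
        self._cache = cache
        self._repository = repository

    async def execute(self, category: str) -> NewsResult:
        # 1. Redis first
        cached = await self._cache.get_articles(category)
        if cached.articles:
            return cached
        # 2. Postgres fallback on a miss
        result = await self._repository.get_articles(category)
        # 3. Warm the cache (inline here; the ADR triggers this in the background)
        if result.articles:
            await self._cache.warm(category, result.articles)
        return result


async def demo() -> list:
    repository = InMemoryRepository({"environment": [{"id": "naver_abc123"}]})
    command = FetchNewsCommand(InMemoryCache(), repository)
    first = await command.execute("environment")   # cache miss
    second = await command.execute("environment")  # served from the warmed cache
    return [first.source, second.source]


print(asyncio.run(demo()))  # ['postgres', 'redis']
```

The first request falls through to the repository and the second is served from the cache, which is the behavior the `meta.source` field is meant to expose.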

3.3 Postgres Schema

-- ============================================================
-- Table: news_articles
-- Purpose: durable storage of news articles and cursor-based pagination
-- ============================================================

CREATE TABLE news_articles (
    -- PK: URL-based hash (deduplication)
    id              TEXT PRIMARY KEY,

    -- Basic info
    url             TEXT NOT NULL,
    title           TEXT NOT NULL,
    snippet         TEXT NOT NULL,

    -- Source info
    source          TEXT NOT NULL,      -- 'naver', 'newsdata'
    source_name     TEXT NOT NULL,
    source_icon_url TEXT,

    -- Media
    thumbnail_url   TEXT,
    video_url       TEXT,

    -- Classification
    category        TEXT NOT NULL,      -- 'environment', 'energy', 'ai'
    keywords        TEXT[],
    ai_tag          TEXT,

    -- Timestamps
    published_at    TIMESTAMPTZ NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    -- Soft delete (if needed)
    is_deleted      BOOLEAN NOT NULL DEFAULT FALSE,

    -- Constraints
    CONSTRAINT uq_news_articles_url UNIQUE (url)
);

-- ============================================================
-- Index: cursor-based pagination (core)
-- Query: WHERE (published_at, id) < (cursor) ORDER BY ... DESC
-- ============================================================

CREATE INDEX idx_news_cursor
    ON news_articles (category, published_at DESC, id DESC)
    WHERE is_deleted = FALSE;

-- Lookup by source
CREATE INDEX idx_news_source
    ON news_articles (source, published_at DESC);

-- Recent articles (for cache warming)
CREATE INDEX idx_news_recent
    ON news_articles (published_at DESC)
    WHERE is_deleted = FALSE;

3.4 Redis Cache Design

Dual-TTL strategy:

Key pattern	Type	TTL	Purpose
`news:feed:{category}`	Sorted Set	1h	Feed index (IDs of the latest N articles)
`news:article:{id}`	Hash	24h	Article detail data
`news:lock:{category}`	String	30s	Lock that prevents cache stampedes
`news:meta:{category}`	Hash	1h	Metadata (last_refresh, etc.)
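The `news:lock:{category}` key can be taken with a single `SET ... NX EX 30`, so that only one caller refreshes a category at a time. The sketch below assumes a client exposing redis-py's `set(name, value, nx=..., ex=...)`; `try_acquire_refresh_lock` and `FakeRedis` are hypothetical names, and the fake exists only so the example runs without a Redis server.

```python
import time
from typing import Optional

LOCK_TTL_SECONDS = 30  # matches the 30s TTL of news:lock:{category}


def lock_key(category: str) -> str:
    return f"news:lock:{category}"


def try_acquire_refresh_lock(client, category: str) -> bool:
    """Return True only for the caller that wins the lock.

    `client.set(..., nx=True, ex=30)` corresponds to `SET key 1 NX EX 30`:
    the key is written only if it does not exist and expires on its own,
    so a crashed holder cannot block refreshes for more than 30 seconds.
    """
    return bool(client.set(lock_key(category), "1", nx=True, ex=LOCK_TTL_SECONDS))


class FakeRedis:
    """Tiny in-memory stand-in (hypothetical) that mimics SET with NX and EX."""

    def __init__(self) -> None:
        self._expires_at: dict = {}

    def set(self, name: str, value: str, nx: bool = False, ex: Optional[int] = None):
        now = time.monotonic()
        held = self._expires_at.get(name, 0.0) > now
        if nx and held:
            return None  # redis-py returns None when the NX condition fails
        self._expires_at[name] = now + (ex if ex is not None else float("inf"))
        return True


client = FakeRedis()
print(try_acquire_refresh_lock(client, "environment"))  # True: first caller wins
print(try_acquire_refresh_lock(client, "environment"))  # False: lock is still held
print(try_acquire_refresh_lock(client, "energy"))       # True: locks are per category
```

Callers that lose the lock would simply skip the refresh and serve whatever the cache or Postgres currently holds.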

Stale-While-Revalidate implementation:

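# Structure of news:meta:{category}
{
    "last_refresh": "2026-01-17T10:00:00Z",
    "article_count": "150",
    "is_stale": "false"  # used to detect Worker failures
}

# Read path (the default limit of 20 is an assumption)
async def get_articles(self, category: str, limit: int = 20) -> list[NewsArticle]:
    # 1. Read the newest `limit` IDs from the feed (ZREVRANGE bounds are inclusive)
    article_ids = await self._redis.zrevrange(f"news:feed:{category}", 0, limit - 1)

    # 2. On a hit, serve immediately, stale or not
    if article_ids:
        # 3. If stale, request a background refresh
        meta = await self._redis.hgetall(f"news:meta:{category}")
        if self._is_stale(meta):
            await self._trigger_background_refresh(category)
        # _load_articles is a hypothetical helper that reads news:article:{id} for each ID
        return await self._load_articles(article_ids)

    # 4. Cache miss: fall back to Postgres
    return await self._news_repository.get_articles(category)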
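The read path relies on a staleness check over the `news:meta:{category}` hash, which the ADR does not spell out. One possible shape is sketched below; the 10-minute threshold (two missed 5-minute Beat cycles) and the function name `is_stale` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: treat the cache as stale once two 5-minute Beat cycles were missed.
STALE_AFTER = timedelta(minutes=10)


def is_stale(meta: dict, now: datetime) -> bool:
    """Decide staleness from the news:meta:{category} hash (all values are strings)."""
    if meta.get("is_stale") == "true":
        return True
    last_refresh = meta.get("last_refresh")
    if not last_refresh:
        return True  # no metadata at all: assume the Worker has not run
    # fromisoformat() on older Python versions does not accept a trailing "Z"
    refreshed_at = datetime.fromisoformat(last_refresh.replace("Z", "+00:00"))
    return now - refreshed_at > STALE_AFTER


meta = {"last_refresh": "2026-01-17T10:00:00Z", "article_count": "150", "is_stale": "false"}
print(is_stale(meta, datetime(2026, 1, 17, 10, 4, tzinfo=timezone.utc)))   # False
print(is_stale(meta, datetime(2026, 1, 17, 10, 11, tzinfo=timezone.utc)))  # True
```

Passing `now` explicitly keeps the check deterministic in tests; production code would pass `datetime.now(timezone.utc)`.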

4. Cursor-based Pagination

4.1 Cursor Encoding Format

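from datetime import datetime, timezone

# Format: {published_at_ms}_{article_id}
# Example: "1737100800000_naver_abc123"

def encode_cursor(published_at: datetime, article_id: str) -> str:
    """Encode a cursor."""
    ts_ms = int(published_at.timestamp() * 1000)
    return f"{ts_ms}_{article_id}"

def decode_cursor(cursor: str) -> tuple[datetime, str]:
    """Decode a cursor."""
    # Split on the first underscore only: article IDs such as "naver_abc123"
    # contain underscores themselves, so splitting from the right would corrupt both parts.
    ts_str, article_id = cursor.split("_", 1)
    published_at = datetime.fromtimestamp(int(ts_str) / 1000, tz=timezone.utc)
    return published_at, article_id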

4.2 Postgres Queries

-- First page (no cursor)
SELECT * FROM news_articles
WHERE category = $1 AND is_deleted = FALSE
ORDER BY published_at DESC, id DESC
LIMIT $2;

-- Next page (with cursor)
SELECT * FROM news_articles
WHERE category = $1
  AND is_deleted = FALSE
  AND (published_at, id) < ($2, $3)  -- cursor tuple comparison
ORDER BY published_at DESC, id DESC
LIMIT $4;
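The tuple comparison in the second query is what keeps pages stable when several articles share the same `published_at`. The sketch below replays both queries against an in-memory SQLite database as a stand-in for Postgres (SQLite also supports row-value comparison); the trimmed-down table, the integer timestamps, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news_articles ("
    " id TEXT PRIMARY KEY, category TEXT NOT NULL,"
    " published_at INTEGER NOT NULL, is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO news_articles (id, category, published_at) VALUES (?, ?, ?)",
    [
        ("naver_a", "environment", 1000),
        ("naver_b", "environment", 1000),  # same timestamp as naver_a: id breaks the tie
        ("naver_c", "environment", 900),
        ("naver_d", "environment", 800),
        ("naver_e", "energy", 950),
    ],
)

FIRST_PAGE = """
SELECT published_at, id FROM news_articles
WHERE category = ? AND is_deleted = 0
ORDER BY published_at DESC, id DESC
LIMIT ?
"""
NEXT_PAGE = """
SELECT published_at, id FROM news_articles
WHERE category = ? AND is_deleted = 0
  AND (published_at, id) < (?, ?)
ORDER BY published_at DESC, id DESC
LIMIT ?
"""


def fetch_all_pages(category: str, page_size: int) -> list:
    """Walk the feed page by page, using the last row of each page as the cursor."""
    pages = []
    page = conn.execute(FIRST_PAGE, (category, page_size)).fetchall()
    while page:
        pages.append([article_id for _, article_id in page])
        last_ts, last_id = page[-1]
        page = conn.execute(NEXT_PAGE, (category, last_ts, last_id, page_size)).fetchall()
    return pages


print(fetch_all_pages("environment", 2))
# [['naver_b', 'naver_a'], ['naver_c', 'naver_d']]
```

Even with a page size of 1, `naver_a` is still returned after `naver_b` despite the identical timestamp, which a cursor on `published_at` alone would skip.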

4.3 API Response

{
  "articles": [
    {
      "id": "naver_abc123",
      "title": "Ministry of Environment changes its recycling separation policy",
      "url": "https://...",
      "thumbnail_url": "https://...",
      "published_at": "2026-01-17T10:00:00Z"
    }
  ],
  "next_cursor": "1768644000000_naver_abc123",
  "has_more": true,
  "meta": {
    "total_cached": 150,
    "cache_expires_in": 3400,
    "source": "redis"  // or "postgres"
  }
}
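One common way to fill `next_cursor` and `has_more` without a separate COUNT query is to fetch `limit + 1` rows and use the extra row only as a signal. This is a sketch under that assumption, not necessarily how the Info API does it; `build_page` is a hypothetical helper, and the `total_cached` and `cache_expires_in` meta fields are omitted.

```python
from datetime import datetime, timezone
from typing import Optional


def encode_cursor(published_at: datetime, article_id: str) -> str:
    return f"{int(published_at.timestamp() * 1000)}_{article_id}"


def build_page(rows: list, limit: int, source: str) -> dict:
    """Build the response body from up to `limit + 1` rows, newest first."""
    has_more = len(rows) > limit
    page = rows[:limit]
    next_cursor: Optional[str] = None
    if has_more and page:
        # The cursor points at the last row actually returned to the client.
        last = page[-1]
        next_cursor = encode_cursor(last["published_at"], last["id"])
    return {
        "articles": [
            {**row, "published_at": row["published_at"].isoformat()} for row in page
        ],
        "next_cursor": next_cursor,
        "has_more": has_more,
        "meta": {"source": source},
    }


rows = [
    {"id": "naver_a", "published_at": datetime(2026, 1, 17, 10, 0, tzinfo=timezone.utc)},
    {"id": "naver_b", "published_at": datetime(2026, 1, 17, 9, 0, tzinfo=timezone.utc)},
    {"id": "naver_c", "published_at": datetime(2026, 1, 17, 8, 0, tzinfo=timezone.utc)},
]
response = build_page(rows, limit=2, source="postgres")
print(response["has_more"], response["next_cursor"])
```

Note that `isoformat()` renders UTC as `+00:00` rather than the `Z` suffix shown in the example response; a real serializer would normalize that.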

5. Celery Beat Configuration (Scheduling)

5.1 Beat Schedule

# apps/info_worker/setup/celery.py

from celery import Celery

from info_worker.setup.config import settings  # assumed: Settings instance exposed by setup/config.py

celery_app = Celery("info_worker")

celery_app.conf.update(
    broker_url=settings.celery_broker_url,
    result_backend=settings.celery_result_backend,
    task_routes={
        "info.collect_news": {"queue": "info.collect_news"},
        # Route the NewsData task to the same queue: the worker only consumes info.collect_news
        "info.collect_news_newsdata": {"queue": "info.collect_news"},
    },
    beat_schedule={
        # Collect all categories (every 5 minutes)
        "collect-news-all": {
            "task": "info.collect_news",
            "schedule": 300.0,  # 5 minutes
            "kwargs": {"category": "all"},
        },
        # Separate NewsData schedule (every 30 minutes, to respect its rate limit)
        "collect-news-newsdata": {
            "task": "info.collect_news_newsdata",
            "schedule": 1800.0,  # 30 minutes
            "kwargs": {"category": "all"},
        },
    },
)

5.2 Rate Limit Strategy

Source	Daily limit	Interval	Runs per day	Status
Naver	25,000	5 min	288	OK
NewsData	200	30 min	48	OK
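The run counts in the table follow directly from the schedule intervals, and dividing each daily limit by its run count gives the number of API requests a single run may issue. The arithmetic can be checked with a few lines; the per-run budget is a derived figure, not one stated in the ADR, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# (daily call limit, schedule interval in seconds), taken from the table above
QUOTAS = {
    "naver": (25_000, 300),
    "newsdata": (200, 1800),
}


def runs_per_day(interval_seconds: int) -> int:
    return SECONDS_PER_DAY // interval_seconds


def calls_per_run_budget(daily_limit: int, interval_seconds: int) -> int:
    """Largest number of API calls each run can make without exceeding the daily limit."""
    return daily_limit // runs_per_day(interval_seconds)


for source, (daily_limit, interval) in QUOTAS.items():
    print(source, runs_per_day(interval), calls_per_run_budget(daily_limit, interval))
# naver 288 86
# newsdata 48 4
```

With these numbers a NewsData run can afford at most 4 requests, which matters if `category="all"` fans out into one request per category.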

# NewsData runs on its own schedule
@celery_app.task(name="info.collect_news_newsdata")
def collect_news_newsdata_task(category: str) -> dict:
    """NewsData-only collection task (every 30 minutes)."""
    # Check the rate limit
    status = rate_limiter.check_and_consume("newsdata")
    if not status.is_allowed:
        logger.warning("NewsData rate limited, skipping")
        return {"status": "skipped", "reason": "rate_limited"}

    # Call NewsData only
    command = CollectNewsCommand(
        news_sources=[newsdata_client],  # NewsData only
        news_repository=repository,
        news_cache=cache,
    )
    return command.execute(category)
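The task above depends on a `rate_limiter.check_and_consume(...)` call that returns an object with `is_allowed`. The ADR does not show that component, so the following is only a guess at its shape: a per-source daily counter kept in memory. `DailyRateLimiter` and `RateLimitStatus` are hypothetical names, and a multi-process deployment would need the counter in shared storage such as Redis rather than in a Python dict.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RateLimitStatus:
    is_allowed: bool
    remaining: int


class DailyRateLimiter:
    """In-memory daily quota per source (illustrative only)."""

    def __init__(self, daily_limits: dict) -> None:
        self._limits = daily_limits
        self._used: dict = {}

    def check_and_consume(self, source: str, today: Optional[date] = None) -> RateLimitStatus:
        day = today or date.today()
        key = (source, day)  # a new day starts from a fresh counter
        used = self._used.get(key, 0)
        limit = self._limits[source]
        if used >= limit:
            return RateLimitStatus(is_allowed=False, remaining=0)
        self._used[key] = used + 1
        return RateLimitStatus(is_allowed=True, remaining=limit - used - 1)


rate_limiter = DailyRateLimiter({"naver": 25_000, "newsdata": 200})
status = rate_limiter.check_and_consume("newsdata")
print(status.is_allowed, status.remaining)  # True 199
```

Checking and consuming in one call avoids the gap between "is there quota left" and "use it", which is where two concurrent tasks could otherwise both pass the check.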

6. Data Flow

6.1 Write Path (Worker → Postgres → Redis)

┌─────────────────────────────────────────────────────────────────┐
│                    collect_news_task                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. Check rate limits                                           │
│     └─ Naver: 25,000/day, NewsData: 200/day                     │
│     └─ Skip a source when its limit is exceeded                 │
│                                                                  │
│  2. Call news APIs (in parallel)                                │
│     └─ asyncio.gather(naver.fetch(), newsdata.fetch())          │
│                                                                  │
│  3. Merge and deduplicate                                       │
│     └─ URL-based dedup (id = hash(url))                         │
│                                                                  │
│  4. Classify by category                                        │
│     └─ Keyword matching + NewsData ai_tag                       │
│                                                                  │
│  5. Extract OG images (Naver articles)                          │
│     └─ Only articles without a thumbnail                        │
│     └─ Semaphore(10) caps concurrent requests                   │
│                                                                  │
│  6. Postgres UPSERT                                             │
│     └─ INSERT ... ON CONFLICT (url) DO UPDATE                   │
│     └─ Existing articles: refresh updated_at only               │
│                                                                  │
│  7. Warm the Redis cache                                        │
│     └─ feed: ZADD (latest 200)                                  │
│     └─ article: HSET (per article)                              │
│     └─ meta: update last_refresh                                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
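Steps 3 and 5 of the write path can be sketched as follows. The ADR only says `id = hash(url)`; deriving the ID as a source prefix plus a truncated SHA-256 digest is an assumption chosen to resemble IDs like `naver_abc123`, and `fetch_og_image` is a hypothetical coroutine standing in for the real OG-image scraper.

```python
import asyncio
import hashlib


def article_id(source: str, url: str) -> str:
    """Stable ID from the URL (assumed scheme: '<source>_<first 12 hex chars of SHA-256>')."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
    return f"{source}_{digest}"


def dedup_by_url(articles: list) -> list:
    """Keep the first article seen for each URL and attach its derived ID."""
    seen = set()
    unique = []
    for article in articles:
        if article["url"] in seen:
            continue
        seen.add(article["url"])
        unique.append({**article, "id": article_id(article["source"], article["url"])})
    return unique


async def fill_thumbnails(articles: list, fetch_og_image, max_concurrency: int = 10) -> None:
    """Fetch OG images only for articles without a thumbnail, at most 10 at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fill(article: dict) -> None:
        if article.get("thumbnail_url"):
            return
        async with semaphore:
            article["thumbnail_url"] = await fetch_og_image(article["url"])

    await asyncio.gather(*(fill(article) for article in articles))


async def fake_fetch_og_image(url: str) -> str:
    await asyncio.sleep(0)  # placeholder for the HTTP request
    return f"{url}/og.png"


merged = dedup_by_url(
    [
        {"source": "naver", "url": "https://example.com/a"},
        {"source": "newsdata", "url": "https://example.com/a"},  # duplicate URL, dropped
        {"source": "naver", "url": "https://example.com/b", "thumbnail_url": "https://example.com/b.png"},
    ]
)
asyncio.run(fill_thumbnails(merged, fake_fetch_og_image))
print([(a["source"], a["thumbnail_url"]) for a in merged])
```

Because the ID is a pure function of the source and URL, re-collecting the same article yields the same ID, which is what lets the UPSERT in step 6 stay idempotent.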

6.2 Read Path (API → Redis → Postgres)

┌─────────────────────────────────────────────────────────────────┐
│                    fetch_news_command                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. Query the Redis cache (Primary)                             │
│     ├─ ZREVRANGEBYSCORE feed:{category}                         │
│     ├─ HMGET article:{id}...                                    │
│     └─ Hit → go to step 4 (return response)                     │
│                                                                  │
│  2. Postgres Fallback (Cache Miss)                              │
│     └─ SELECT ... WHERE (published_at, id) < cursor             │
│     └─ Cursor-based pagination                                  │
│                                                                  │
│  3. (Optional) Trigger background cache warming                 │
│     └─ Request a cache refresh via a Celery task                │
│                                                                  │
│  4. Return the response                                         │
│     └─ articles + next_cursor + meta                            │
│     └─ meta.source = "redis" | "postgres"                       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

7. Deployment

7.1 Kubernetes Resources

# 1. Info Worker Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: info-worker
  namespace: info
spec:
  replicas: 1  # Single instance recommended (prevents duplicate collection)
  selector:
    matchLabels:
      app: info-worker
  template:
    metadata:
      labels:
        app: info-worker  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: info-worker
        image: eco2:info-worker
        command: [celery]
        args:
        - -A
        - info_worker.setup.celery:celery_app
        - worker
        - --loglevel=info
        - -Q
        - info.collect_news
        - -c
        - "4"  # Concurrency (for OG image extraction)
        env:
        - name: CELERY_BROKER_URL
          valueFrom:
            secretKeyRef:
              name: info-secret
              key: CELERY_BROKER_URL
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: info-secret
              key: DATABASE_URL

# 2. Info Beat (merge into the existing celery-beat or run separately)
# - replicas: 1 (required: prevents duplicate scheduling)
# - strategy: Recreate

7.2 RabbitMQ Topology

# Exchange
apiVersion: rabbitmq.com/v1beta1
kind: Exchange
metadata:
  name: info-direct
  namespace: rabbitmq
spec:
  name: info.direct
  type: direct
  vhost: eco2
  rabbitmqClusterReference:
    name: rfr-rabbitmq

---
# Queue
apiVersion: rabbitmq.com/v1beta1
kind: Queue
metadata:
  name: info-collect-news
  namespace: rabbitmq
spec:
  name: info.collect_news
  vhost: eco2
  durable: true
  rabbitmqClusterReference:
    name: rfr-rabbitmq

---
# Binding
apiVersion: rabbitmq.com/v1beta1
kind: Binding
metadata:
  name: info-collect-news-binding
  namespace: rabbitmq
spec:
  source: info.direct
  destination: info.collect_news
  destinationType: queue
  routingKey: info.collect_news
  vhost: eco2
  rabbitmqClusterReference:
    name: rfr-rabbitmq

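8. Migration Plan

Phase 1: Create the Postgres schema

Create the news_articles table (DDL)
Create the indexes (cursor, source, recent)
Migrate existing Redis data (optional)

Phase 2: Implement the Info Worker

Create the apps/info_worker/ directory structure
Implement CollectNewsCommand (move the existing logic)
Define the Celery task (collect_news_task.py)
Configure the Beat schedule
Write the K8s manifests

Phase 3: Update the Info API

Simplify FetchNewsCommand to read-only
Add NewsRepositoryPort (Postgres)
Implement the Postgres fallback logic
Change the cursor encoding to {ts}_{id}

Phase 4: Deploy and verify

Deploy to the development environment
Integration tests (Write → Read path)
Verify rate limit behavior
Deploy to production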

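9. Alternatives Considered

9.1 Redis Only (keep the current state)

Pros:

Simple architecture
Fast responses

Cons:

Data is lost when the TTL expires
True infinite scroll is impossible
The cache stays stale while the Worker is down

Decision: Rejected. It cannot meet the infinite scroll requirement.

9.2 Postgres Only (remove Redis)

Pros:

Simpler architecture
Guaranteed data consistency
Fewer components to operate

Cons:

Higher first-page latency
Higher DB load

Decision: Rejected. Response times are expected to degrade.

9.3 Postgres 18+ Caching (replacing Redis)

Pros:

Single datastore
Simpler cache invalidation
Lower operational complexity

Cons:

Requires Postgres 18+ (currently on 15)
Not yet proven in production

Decision: Deferred. Revisit when Postgres is upgraded.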

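10. Consequences

Positive

True infinite scroll becomes possible
Data survives TTL expiry
Separating the API from the Worker improves scalability
Rate limit handling becomes systematic
Fault isolation (a Worker outage does not affect the API)

Negative

Higher infrastructure complexity (Postgres added)
Operational overhead for the Worker
Data consistency must be managed
More monitoring points

Risks

Risk	Impact	Mitigation
NewsData rate limit exceeded	Global news goes missing	Separate 30-minute schedule
Postgres outage	Service unavailable on a cache miss	Extend Redis TTLs as a buffer
Worker outage	Cache stays stale	Detect via the is_stale flag
Duplicate collection	Higher DB load	Idempotent handling via UPSERT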

11. Implementation Checklist

P0: Required (Critical)

Create the Postgres news_articles table and indexes
Create the basic info_worker structure
Implement CollectNewsCommand
Configure the Celery Beat schedule
K8s Deployment manifest

P1: Important (High Priority)

Refactor FetchNewsCommand to read-only
Implement PostgresNewsRepository
Change the cursor encoding to the {ts}_{id} format
Implement the Postgres fallback logic
Write the RabbitMQ Topology CRs

P2: Recommended (Medium Priority)

Implement Stale-While-Revalidate
Separate per-source schedules for the Rate Limiter
Manage the news:meta:{category} metadata
Background cache warming trigger

P3: Optional (Low Priority)

Migrate existing Redis data to Postgres
Worker health check endpoint
Grafana dashboard (collection metrics)
DLQ reprocessing schedule

12. References
