이코에코(Eco²) Agent #9: Vision Processing

이코에코(Eco²)/Agent 2026. 1. 14. 17:02

Adding Vision to the Chat Worker: image-based waste classification

1. Overview

1.1 AS-IS: the existing Vision pipeline (scan_worker)

The original Vision pipeline lives in scan_worker, where it runs as a chain of Celery tasks:

┌────────────────────────────────────────────────────────────────────┐
│                    AS-IS: scan_worker pipeline                     │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌────────────┐    ┌────────────┐    ┌────────────┐    ┌─────────┐ │
│  │ vision_task│───▶│ rule_task  │───▶│answer_task │───▶│reward   │ │
│  │  (Stage 1) │    │  (Stage 2) │    │  (Stage 3) │    │_task    │ │
│  └────────────┘    └────────────┘    └────────────┘    └─────────┘ │
│        │                 │                 │                       │
│        ▼                 ▼                 ▼                       │
│  ┌────────────┐    ┌────────────┐    ┌────────────┐                │
│  │ VisionStep │    │  RuleStep  │    │ AnswerStep │                │
│  │            │    │            │    │            │                │
│  │ GPT Vision │    │ JSON lookup│    │ JSON output│                │
│  └────────────┘    └────────────┘    └────────────┘                │
│                                                                    │
│  ClassifyContext (data passed between tasks)                       │
│  ├─ task_id, user_id, image_url                                    │
│  ├─ classification: { major, middle, minor }                       │
│  ├─ lite_rag_result: { disposal_common, disposal_detail }          │
│  └─ answer: { disposal_steps, insufficiencies, user_answer }       │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

Key files:

apps/scan_worker/presentation/tasks/vision_task.py: Celery task
apps/scan_worker/application/classify/steps/vision_step.py: Vision step
apps/scan_worker/infrastructure/llm/gpt/vision.py: GPT Vision adapter
apps/scan_worker/infrastructure/retrievers/json_regulation.py: rule-based retrieval

AS-IS characteristics:

# Celery task chain (parenthesized so the multi-line expression is valid Python)
pipeline = (
    vision_task.s(task_id, user_id, image_url)
    | rule_task.s()
    | answer_task.s()
    | reward_task.s()
)

# Answer output: structured JSON (heavyweight)
{
  "disposal_steps": { "step1": "...", "step2": "..." },
  "insufficiencies": ["The label must be removed"],
  "user_answer": "PET bottles..."
}

1.2 TO-BE: LangGraph-based redesign (chat_worker)

In chat_worker, the flow is rebuilt as a LangGraph pipeline:

┌────────────────────────────────────────────────────────────────────┐
│                    TO-BE: chat_worker pipeline                     │
├────────────────────────────────────────────────────────────────────┤
│                                                                    │
│  ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐    │
│  │ intent │──▶│vision? │──▶│ router │──▶│  rag   │──▶│ answer │    │
│  │  node  │   │  node  │   │  node  │   │  node  │   │  node  │    │
│  └────────┘   └────────┘   └────────┘   └────────┘   └────────┘    │
│       │            │            │            │            │        │
│       ▼            ▼            ▼            ▼            ▼        │
│  ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐    │
│  │LLMPort │   │Vision  │   │cond.   │   │Retriev-│   │LLMPort │    │
│  │        │   │Model   │   │routing │   │erPort  │   │+Stream │    │
│  │        │   │Port    │   │        │   │        │   │        │    │
│  └────────┘   └────────┘   └────────┘   └────────┘   └────────┘    │
│                                                                    │
│  State dict (data passed between nodes)                            │
│  ├─ job_id, user_id, message, image_url                            │
│  ├─ classification_result: { major, middle, minor }                │
│  ├─ disposal_rules: { ... }  (LocalAssetRetriever)                 │
│  └─ answer: "natural-language text (token streaming)"              │
│                                                                    │
└────────────────────────────────────────────────────────────────────┘

Key files:

apps/chat_worker/infrastructure/orchestration/langgraph/nodes/vision_node.py
apps/chat_worker/application/ports/vision/vision_model.py: VisionModelPort
apps/chat_worker/infrastructure/llm/vision/openai_vision.py: GPT-5.2 adapter
apps/chat_worker/infrastructure/retrieval/local_asset_retriever.py: rule-based retrieval

TO-BE characteristics:

# LangGraph pipeline
graph = StateGraph(dict)
graph.add_node("intent", intent_node)
graph.add_node("vision", vision_node)
graph.add_node("rag", rag_node)
graph.add_node("answer", answer_node)

# Answer output: natural-language text (streamable)
"That's a PET bottle! It goes in the bin for clear PET bottles.
Remove the label, empty it, then flatten it before putting it out."

1.3 AS-IS vs TO-BE comparison

Item	AS-IS (scan_worker)	TO-BE (chat_worker)
Orchestration	Celery task chain	LangGraph StateGraph
Data passing	ClassifyContext	State dict
Vision	VisionStep	vision_node
Retrieval	json_regulation.py	LocalAssetRetriever
Answer format	Structured JSON	Natural-language text
SSE streaming	❌ Not possible	✅ Per token
Conditional routing	❌ Fixed chain	✅ Intent-based branching
Testing	Hard (depends on Celery)	Easy (mock injection)

What stays the same:

✅ Vision classification result (classification_result)
✅ Rule-based retrieval (LocalAssetRetriever ≈ json_regulation.py)
✅ Three-level classification scheme (major/middle/minor)
✅ situation_tags handling

What improved:

🔄 Celery → LangGraph (flexible branching)
🔄 JSON output → natural language (SSE streaming)
🔄 Fixed pipeline → intent-based dynamic routing

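1.5 Prompt structure improvements

The answer-generation prompt in scan_worker returned structured JSON:

// Output format of scan_worker/answer_generation_prompt.txt
{
  "disposal_steps": { "step1": "...", "step2": "..." },
  "insufficiencies": ["The label must be removed"],
  "user_answer": "PET bottles go in the clear PET bottle bin..."
}

Problems:

Heavy response structure: the frontend has to parse and reassemble it
No SSE streaming: the JSON can only be parsed once it is complete
Wasted tokens: tokens are spent on JSON keys and structure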
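To make the streaming problem concrete, here is a minimal sketch. The payloads and the chunk size are made up for illustration and do not come from the project. It shows that a JSON response cannot be parsed until its last chunk arrives, while plain text chunks are usable as soon as they come in.

```python
import json

# Hypothetical payloads for illustration only.
json_answer = json.dumps({"user_answer": "PET bottles go in the clear PET bin."})
text_answer = "PET bottles go in the clear PET bin."


def chunks(s: str, size: int = 8) -> list:
    """Split a string into fixed-size pieces, standing in for streamed tokens."""
    return [s[i:i + size] for i in range(0, len(s), size)]


def parseable_prefixes(payload: str) -> list:
    """For each received prefix, report whether json.loads accepts it."""
    results, received = [], ""
    for piece in chunks(payload):
        received += piece
        try:
            json.loads(received)
            results.append(True)
        except json.JSONDecodeError:
            results.append(False)
    return results


# One flag per received chunk: only the final prefix is valid JSON.
json_status = parseable_prefixes(json_answer)

# Text needs no parsing: every chunk can be rendered the moment it arrives.
rendered = "".join(chunks(text_answer))
```

With the JSON format the frontend has to buffer everything before it can show anything, which is the limitation the natural-language prompt removes.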

Improved prompt (chat_worker/waste_answer_prompt.txt):

# Output
Answer in natural language. Markdown formatting is allowed.

# Prohibited
- Do not output JSON

┌─────────────────────────────────────────────────────────────┐
│                 Prompt structure comparison                 │
├─────────────────────────────────────────────────────────────┤
│  scan_worker (before):                                      │
│  - Output: structured JSON                                  │
│  - Streaming: ❌ not possible (needs the complete JSON)     │
│  - Frontend: parse JSON → map to UI                         │
│                                                             │
│  chat_worker (after):                                       │
│  - Output: natural-language text + Markdown                 │
│  - Streaming: ✅ per-token SSE                              │
│  - Frontend: render as-is                                   │
└─────────────────────────────────────────────────────────────┘

1.6 SSE streaming response design

After Vision classification, the answer has to be streamed token by token:

┌─────────────────────────────────────────────────────────────┐
│                     SSE streaming flow                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Vision Node                                                │
│  └─ stage: "vision" (15% → 25%)                             │
│       └─ publish classification result (JSON, once)         │
│                                                             │
│  Answer Node                                                │
│  └─ stage: "answer" (75%)                                   │
│       └─ token: "PET" " bottles" " go" "..." (streamed)     │
│                                                             │
│  Done                                                       │
│  └─ stage: "done" (100%)                                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

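Key considerations:

Event type	Content	Streamed?
`stage:vision`	Classification result (major/middle/minor)	❌ Published once
`stage:answer`	Answer generation started	❌ Published once
`token`	Answer tokens	✅ Per token
`stage:done`	Completion	❌ Published once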
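As a rough illustration of the table above, the sketch below serializes these events as Server-Sent Event frames. The `event:` / `data:` framing is the standard SSE wire format. Everything else is an assumption for illustration: the helper names `format_sse` and `build_stream`, the payload field names, and carrying the stage name inside `data` rather than in the event name. The project's actual gateway may frame events differently.

```python
import json


def format_sse(event: str, data: dict) -> str:
    """Serialize one SSE frame: event name, JSON data line, then a blank line."""
    payload = json.dumps(data, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"


def build_stream(classification: dict, tokens: list) -> list:
    """Assemble the frames for one Vision request.

    Stage events are emitted once each; token events are emitted per token.
    """
    frames = [
        format_sse("stage", {"stage": "vision", "progress": 25, "result": classification}),
        format_sse("stage", {"stage": "answer", "progress": 75}),
    ]
    frames += [format_sse("token", {"token": t}) for t in tokens]
    frames.append(format_sse("stage", {"stage": "done", "progress": 100}))
    return frames


frames = build_stream(
    {"major": "recyclable waste", "middle": "plastics", "minor": "PET bottle"},
    ["PET", " bottles", " go", "..."],
)
```

Keeping stage events and token events as separate event types lets the frontend update a progress bar from one and append text from the other without inspecting payloads.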

LLM streaming setup:

# Streaming call inside answer_node
async for chunk in llm.stream(prompt):
    await event_publisher.notify_token(
        task_id=job_id,
        token=chunk.content,
    )
1.4 Redesign goals

┌─────────────────────────────────────────────────────────────┐
│              Clean Architecture redesign goals              │
├─────────────────────────────────────────────────────────────┤
│  1. Decouple LLM dependencies via Port/Adapter              │
│  2. Unify the pipeline as LangGraph nodes                   │
│  3. Publish progress events (SSE)                           │
│  4. Stay compatible with the existing text flow             │
└─────────────────────────────────────────────────────────────┘

2. Architecture design

2.1 Pipeline flow changes

Before (text only):

START → intent → router → [waste_rag/character/location/general] → answer → END

After (with Vision):

START → intent → [vision?] → router → [waste_rag/...] → answer → END

┌────────────────────────────────────────────────────────────────────┐
│                       Flow with Vision added                       │
│                                                                    │
│   ┌─────────┐      ┌──────────┐      ┌──────────┐                  │
│   │  START  │─────▶│  intent  │─────▶│image_url?│                  │
│   └─────────┘      └──────────┘      └────┬─────┘                  │
│                                           │                        │
│                              ┌────────────┴────────────┐           │
│                              │ yes                     │ no        │
│                              ▼                         │           │
│                        ┌──────────┐                    │           │
│                        │  vision  │                    │           │
│                        │(classify)│                    │           │
│                        └─────┬────┘                    │           │
│                              │                         │           │
│                              ▼                         │           │
│                        ┌─────────────────────────┐     │           │
│                        │         router          │◀────┘           │
│                        │ (intent-based routing)  │                 │
│                        └───────────┬─────────────┘                 │
│                                    │                               │
│            ┌───────────┬───────────┼───────────┐                   │
│            ▼           ▼           ▼           ▼                   │
│       ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐               │
│       │ waste  │  │  char  │  │  loc   │  │general │               │
│       │  _rag  │  │ (gRPC) │  │ (gRPC) │  │(pass)  │               │
│       └────┬───┘  └────┬───┘  └────┬───┘  └────┬───┘               │
│            │           │           │           │                   │
│            └───────────┴─────┬─────┴───────────┘                   │
│                              │                                     │
│                         ┌────▼────┐                                │
│                         │ answer  │                                │
│                         └────┬────┘                                │
│                              │                                     │
│                         ┌────▼────┐                                │
│                         │   END   │                                │
│                         └─────────┘                                │
└────────────────────────────────────────────────────────────────────┘

Flow summary:

Vision flow:  intent → vision → router → waste_rag → answer
Text flow:    intent → router → [waste/char/loc/general] → answer

Conditional routing logic:

image_url present and classification_result absent → Vision node
Otherwise → straight to the router

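2.3 Design decision: keep intent-based branching after Vision

Alternatives considered:

Option	Description	Pros	Cons
A. image_url → straight to waste_rag	Always run RAG when an image is present	Simple, optimized for the 90%+ case	Inflexible
B. Vision → Router → intent branching ✅	Branch on intent even when an image is present	Flexible	Slightly more complex

Choice: Option B (keep the current approach)

Rationale:

1. [📷 photo of a PET bottle] + "How do I throw this away?"
   → intent: waste → waste_rag ✅ (90%+ of cases)

2. [📷 photo of a PET bottle] + "Where is a recycling center near here?"
   → intent: location → location_node
   → classification_result stays in state (available for later use)

3. [📷 photo of a PET bottle] + "Hi Eco~"
   → intent: character → character_node

A user can send an image while asking a question with a different intent. For example, they might send a photo of a PET bottle and ask "Where is a recycling center near here?"

In that case the Vision result (classification_result) is still stored in state and can be used later if needed. Keeping intent-based branching lets the pipeline handle a wider range of user scenarios.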

# factory.py
def route_after_intent(state):
    if state.get("image_url") and not state.get("classification_result"):
        return "vision"
    return "router"  # fall through to intent-based branching

# After Vision, still pass through the router and branch on intent
graph.add_edge("vision", "router")
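The three scenarios above can be traced without calling any model. The sketch below reuses the routing rule from factory.py. The `INTENT_TO_NODE` table and the `planned_path` helper are hypothetical and exist only to illustrate the branching; the real router_node may map intents differently.

```python
def route_after_intent(state: dict) -> str:
    """Same rule as in factory.py: run Vision only for an unclassified image."""
    if state.get("image_url") and not state.get("classification_result"):
        return "vision"
    return "router"


# Hypothetical intent-to-node table; the real router_node may differ.
INTENT_TO_NODE = {
    "waste": "waste_rag",
    "location": "location",
    "character": "character",
}


def planned_path(state: dict) -> list:
    """Trace which nodes a request would visit, without calling any model."""
    path = ["intent"]
    if route_after_intent(state) == "vision":
        path.append("vision")
    path += ["router", INTENT_TO_NODE.get(state.get("intent"), "general"), "answer"]
    return path


with_image = {"image_url": "https://example.com/pet.jpg", "intent": "location"}
text_only = {"message": "Hi Eco~", "intent": "character"}
# A state that already carries a classification skips Vision on re-entry.
already_classified = {**with_image, "classification_result": {"minor": "PET bottle"}}
```

Checking `classification_result` in the routing rule means a state that is re-run (for example after a retry) does not pay for a second Vision call.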

2.2 Clean Architecture layer separation

┌─────────────────────────────────────────────────────────────┐
│                   Vision layer structure                    │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              Application Layer (Port)                 │  │
│  │                                                       │  │
│  │   ┌─────────────────────────────────────────────┐     │  │
│  │   │            VisionModelPort (ABC)            │     │  │
│  │   │                                             │     │  │
│  │   │   + analyze_image(image_url, user_input)    │     │  │
│  │   │     → dict[classification, tags, conf]     │     │  │
│  │   └─────────────────────────────────────────────┘     │  │
│  └───────────────────────────────────────────────────────┘  │
│                              │                              │
│                              │ implements                   │
│                              ▼                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │            Infrastructure Layer (Adapters)            │  │
│  │                                                       │  │
│  │   ┌─────────────────┐     ┌─────────────────┐         │  │
│  │   │ OpenAIVision    │     │ GeminiVision    │         │  │
│  │   │    Client       │     │    Client       │         │  │
│  │   │                 │     │                 │         │  │
│  │   │  - gpt-5.2      │     │  - gemini-3-    │         │  │
│  │   │  - detail: high │     │    flash-preview│         │  │
│  │   └─────────────────┘     └─────────────────┘         │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Core principle: the Application Layer must not know about any concrete LLM implementation.
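A minimal sketch of that principle follows. The two adapters are stand-ins that return canned data instead of calling a provider, and `describe_item` is a hypothetical application-level function, not one from the project. The point is only that the caller depends on `VisionModelPort` and works unchanged with either adapter.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Optional


class VisionModelPort(ABC):
    """Port owned by the application layer."""

    @abstractmethod
    async def analyze_image(self, image_url: str, user_input: Optional[str] = None) -> dict:
        ...


class FakeOpenAIVision(VisionModelPort):
    """Stand-in adapter; a real one would call the OpenAI API here."""

    async def analyze_image(self, image_url, user_input=None):
        return {"classification": {"minor_category": "PET bottle"}, "provider": "openai"}


class FakeGeminiVision(VisionModelPort):
    """Stand-in adapter; a real one would call the Gemini API here."""

    async def analyze_image(self, image_url, user_input=None):
        return {"classification": {"minor_category": "PET bottle"}, "provider": "gemini"}


async def describe_item(vision: VisionModelPort, image_url: str) -> Any:
    """Application-level code: it sees only the port, never a concrete client."""
    result = await vision.analyze_image(image_url)
    return result["classification"]["minor_category"]


# The same application code runs against both adapters.
labels = [
    asyncio.run(describe_item(adapter, "https://example.com/pet.jpg"))
    for adapter in (FakeOpenAIVision(), FakeGeminiVision())
]
```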

3. Decisions

3.1 Vision model choice: GPT-5.2 vs Gemini 3 Flash

Criterion	GPT-5.2	Gemini 3 Flash Preview
Cost	~$0.002/image (low)	~$0.0001/image
Latency	~1-2 s	~0.5-1 s
Accuracy	Very high	Very high
Structured output	Structured Outputs	response_schema
Korean	Excellent	Excellent
Notes	Unified multimodal	Unified multimodal

Model lineup (as of January 2026):

OpenAI: gpt-5.2 (unified text + image multimodal)

Google: gemini-3-flash-preview (unified text + image multimodal)

GPT-5.2 series: gpt-5.2, gpt-5.2-pro, gpt-5.2-instant, gpt-5-mini

Decision: support both, swappable at runtime through a factory

# dependencies.py
from typing import Literal

def create_vision_client(
    provider: Literal["openai", "gemini"] = "openai",
) -> VisionModelPort:
    """Build the Vision adapter for the requested provider (OpenAI by default)."""
    settings = get_settings()
    if provider == "gemini":
        return GeminiVisionClient(model=settings.gemini_default_model)
    return OpenAIVisionClient(model=settings.openai_default_model)

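3.2 Image detail level: high vs low

┌────────────────────────────────────────────────────────────┐
│              GPT-5.2 Vision detail comparison              │
├────────────────────────────────────────────────────────────┤
│  detail: high  ✅ selected                                 │
│  - Processed as high-resolution tiles                      │
│  - Tokens: proportional to image size                      │
│  - Cost: high                                              │
│  - Use: fine-grained classification, label/text reading    │
├────────────────────────────────────────────────────────────┤
│  detail: low                                               │
│  - Single low-resolution image                             │
│  - Tokens: minimal                                         │
│  - Cost: much lower                                        │
│  - Use: when only coarse classification is needed          │
└────────────────────────────────────────────────────────────┘

Decision: detail: high, because accurate waste classification needs high resolution

Rationale:

Waste classification depends on reading labels, material markings, and fine shape details
Details matter, such as telling "PP" from "PE" or recognizing recycling marks
Classification accuracy takes priority over cost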

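3.3 Vision node placement: before vs after Intent

Option	Intent → Vision (always)	Intent → Vision? (conditional)
Pros	Simple flow	Avoids unnecessary Vision calls
Cons	Always calls Vision	Conditional routing adds complexity
Cost	High	Low

Decision: call Vision conditionally, after Intent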

def route_after_intent(state: dict) -> str:
    if state.get("image_url") and not state.get("classification_result"):
        return "vision"
    return "router"

4. Implementation details

4.1 VisionModelPort (Application Layer)

# application/ports/vision/vision_model.py
from abc import ABC, abstractmethod
from typing import Any

class VisionModelPort(ABC):
    """Vision model port: abstract interface for image classification."""

    @abstractmethod
    async def analyze_image(
        self,
        image_url: str,
        user_input: str | None = None,
    ) -> dict[str, Any]:
        """Analyze an image and return the classification result.

        Returns:
            {
                "classification": {
                    "major_category": "recyclable waste",
                    "middle_category": "plastics",
                    "minor_category": "PET bottle",
                },
                "situation_tags": ["needs rinsing", "label removal needed"],
                "confidence": 0.95,
            }
        """
        pass

4.2 OpenAI Vision Client (Infrastructure Layer)

# infrastructure/llm/vision/openai_vision.py
import os

from openai import AsyncOpenAI

class OpenAIVisionClient(VisionModelPort):
    """OpenAI GPT-5.2 Vision client."""

    def __init__(self, model: str = "gpt-5.2", api_key: str | None = None):
        self._model = model
        # Async client, so the awaited call below does not block the event loop
        self._client = AsyncOpenAI(api_key=api_key or os.environ.get("OPENAI_API_KEY"))
        self._prompt = self._load_prompt()

    async def analyze_image(self, image_url: str, user_input: str | None = None):
        response = await self._client.beta.chat.completions.parse(
            model=self._model,
            messages=[
                {"role": "system", "content": self._prompt},
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": user_input or "Please classify this waste item."},
                        {
                            "type": "image_url",
                            "image_url": {"url": image_url, "detail": "high"},
                        },
                    ],
                },
            ],
            response_format=VisionResult,  # Pydantic structured output
        )
        return response.choices[0].message.parsed.model_dump()

4.3 Vision Node (Orchestration Layer)

# infrastructure/orchestration/langgraph/nodes/vision_node.py
def create_vision_node(vision_model: VisionModelPort, event_publisher: ProgressNotifierPort):
    async def vision_node(state: dict) -> dict:
        job_id = state.get("job_id", "")
        image_url = state.get("image_url")

        # Skip when there is no image
        if not image_url:
            return state

        # 1. Publish the progress event
        await event_publisher.notify_stage(
            task_id=job_id,
            stage="vision",
            status="processing",
            progress=15,
            message="🔍 Analyzing image...",
        )

        try:
            # 2. Call the Vision model
            result = await vision_model.analyze_image(image_url, state.get("message"))

            # 3. Publish the completion event
            major_category = result.get("classification", {}).get("major_category", "unknown")
            await event_publisher.notify_stage(
                task_id=job_id,
                stage="vision",
                status="completed",
                progress=25,
                result={"major_category": major_category},
                message=f"✅ Classification done: {major_category}",
            )

            # 4. Update state
            return {**state, "classification_result": result, "has_image": True}

        except Exception as e:
            # On failure, record the error and let the pipeline continue without a classification
            await event_publisher.notify_stage(
                task_id=job_id, stage="vision", status="failed", message="⚠️ Image analysis failed"
            )
            return {**state, "classification_result": None, "has_image": True, "vision_error": str(e)}

    return vision_node

4.4 Factory changes

# infrastructure/orchestration/langgraph/factory.py
def create_chat_graph(
    llm: LLMClientPort,
    retriever: RetrieverPort,
    event_publisher: ProgressNotifierPort,
    vision_model: VisionModelPort | None = None,  # added
    ...
) -> StateGraph:
    # Create the Vision node
    if vision_model:
        vision_node = create_vision_node(vision_model, event_publisher)
    else:
        async def vision_node(state): return state  # passthrough

    graph = StateGraph(dict)

    # Register nodes
    graph.add_node("intent", intent_node)
    graph.add_node("vision", vision_node)
    graph.add_node("router", router_node)
    ...

    # Conditional edges
    graph.add_conditional_edges("intent", route_after_intent, {
        "vision": "vision",
        "router": "router",
    })
    graph.add_edge("vision", "router")

5. Data flow

5.1 Handling an image request

┌──────────────────────────────────────────────────────────────────────────┐
│                  Data flow for a request with an image                   │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────┐    POST /chat                              ┌──────────────┐
│   Frontend   │ ─────────────────────────────────────────▶ │   Chat API   │
│              │    {                                       │              │
│  📷 + "How do│      session_id: "abc",                    │              │
│  I throw this│      message: "How do I throw this away?", │              │
│  away?"      │      image_url: "https://..."              │              │
│              │    }                                       │              │
└──────────────┘                                            └──────┬───────┘
                                                                   │
                                                                   │ RabbitMQ
                                                                   ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                               Chat Worker                                │
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────────┐    │
│  │                        LangGraph Pipeline                        │    │
│  │                                                                  │    │
│  │  State: {                                                        │    │
│  │    job_id: "xyz",                                                │    │
│  │    message: "How do I throw this away?",                         │    │
│  │    image_url: "https://...",     ◀── passed from the Frontend    │    │
│  │  }                                                               │    │
│  │         │                                                        │    │
│  │         ▼                                                        │    │
│  │  ┌─────────────┐                                                 │    │
│  │  │   Intent    │  → intent: "waste"                              │    │
│  │  └──────┬──────┘                                                 │    │
│  │         │                                                        │    │
│  │         │  image_url present → branch to vision                  │    │
│  │         ▼                                                        │    │
│  │  ┌─────────────┐                                                 │    │
│  │  │   Vision    │  → call GPT-5.2 / Gemini Vision                 │    │
│  │  └──────┬──────┘                                                 │    │
│  │         │                                                        │    │
│  │  State update:                                                   │    │
│  │  {                                                               │    │
│  │    ...                                                           │    │
│  │    classification_result: {        ◀── Vision result stored      │    │
│  │      classification: {                                           │    │
│  │        major_category: "recyclable waste",                       │    │
│  │        middle_category: "plastics",                              │    │
│  │        minor_category: "PET bottle",                             │    │
│  │      },                                                          │    │
│  │      situation_tags: ["needs rinsing"],                          │    │
│  │      confidence: 0.95,                                           │    │
│  │    },                                                            │    │
│  │    has_image: true,                                              │    │
│  │  }                                                               │    │
│  │         │                                                        │    │
│  │         ▼                                                        │    │
│  │  ┌─────────────┐                                                 │    │
│  │  │  waste_rag  │  → look up rules with classification_result     │    │
│  │  └──────┬──────┘                                                 │    │
│  │         │                                                        │    │
│  │         ▼                                                        │    │
│  │  ┌─────────────┐                                                 │    │
│  │  │   Answer    │  → generate answer from result + rules          │    │
│  │  └─────────────┘                                                 │    │
│  │                                                                  │    │
│  └──────────────────────────────────────────────────────────────────┘    │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

5.2 SSE event sequence

┌─────────────────────────────────────────────────────────────────────┐
│                   SSE event sequence with Vision                    │
└─────────────────────────────────────────────────────────────────────┘

 Frontend                    Chat Worker                   SSE Gateway
    │                            │                              │
    │  POST /chat (with image)   │                              │
    │───────────────────────────▶│                              │
    │                            │                              │
    │  { job_id, stream_url }    │                              │
    │◀───────────────────────────│                              │
    │                            │                              │
    │  GET /events (SSE)         │                              │
    │──────────────────────────────────────────────────────────▶│
    │                            │                              │
    │                            │  stage: "queued"  (0%)       │
    │◀──────────────────────────────────────────────────────────│
    │                            │                              │
    │                            │  stage: "intent"  (10%)      │
    │◀──────────────────────────────────────────────────────────│
    │                            │                              │
    │                            │  stage: "vision"  (15%)      │
    │◀──────────────────────────────────────────────────────────│
    │   🔍 Analyzing image...    │                              │
    │                            │                              │
    │                            │  (Vision API call, ~2 s)     │
    │                            │                              │
    │                            │  stage: "vision"  (25%)      │
    │◀──────────────────────────────────────────────────────────│
    │   ✅ Classification done:  │                              │
    │      recyclable waste      │                              │
    │                            │                              │
    │                            │  stage: "waste_rag" (50%)    │
    │◀──────────────────────────────────────────────────────────│
    │                            │                              │
    │                            │  stage: "answer"   (75%)     │
    │◀──────────────────────────────────────────────────────────│
    │                            │                              │
    │                            │  token: "PET" " bottles" ... │
    │◀──────────────────────────────────────────────────────────│
    │                            │                              │
    │                            │  stage: "done"    (100%)     │
    │◀──────────────────────────────────────────────────────────│
    │                            │                              │

6. Directory structure

apps/chat_worker/
├── application/
│   └── ports/
│       └── vision/
│           ├── __init__.py
│           └── vision_model.py        # VisionModelPort (ABC)
│
└── infrastructure/
    ├── llm/
    │   ├── clients/                   # existing LLM clients
    │   └── vision/
    │       ├── __init__.py
    │       ├── openai_vision.py       # OpenAIVisionClient
    │       └── gemini_vision.py       # GeminiVisionClient
    │
    └── orchestration/
        └── langgraph/
            ├── factory.py             # adds the vision_model parameter
            └── nodes/
                └── vision_node.py     # Vision node implementation

7. Testing strategy

7.1 Unit test (mock Vision)

class MockVisionModel(VisionModelPort):
    async def analyze_image(self, image_url, user_input=None):
        return {
            "classification": {
                "major_category": "recyclable waste",
                "middle_category": "plastics",
                "minor_category": "PET bottle",
            },
            "situation_tags": ["needs rinsing"],
            "confidence": 0.95,
        }

async def test_vision_node_with_image():
    node = create_vision_node(MockVisionModel(), MockNotifier())
    state = {"job_id": "test", "image_url": "https://example.com/pet.jpg"}

    result = await node(state)

    assert result["classification_result"] is not None
    assert result["has_image"] is True
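The test above uses a `MockNotifier` that is not shown in the post. One possible shape is sketched below, assuming the notifier only needs the `notify_stage` and `notify_token` methods with the keyword arguments that vision_node and answer_node pass. The actual ProgressNotifierPort interface may have more methods or different signatures.

```python
import asyncio


class MockNotifier:
    """Hypothetical test double for ProgressNotifierPort: records every call."""

    def __init__(self) -> None:
        self.stages = []
        self.tokens = []

    async def notify_stage(self, task_id, stage, status, progress=None, result=None, message=None):
        # Keep only the fields a test is likely to assert on.
        self.stages.append(
            {"task_id": task_id, "stage": stage, "status": status, "progress": progress}
        )

    async def notify_token(self, task_id, token):
        self.tokens.append(token)


async def _demo() -> MockNotifier:
    notifier = MockNotifier()
    # The same two calls vision_node makes on the success path.
    await notifier.notify_stage(task_id="test", stage="vision", status="processing", progress=15)
    await notifier.notify_stage(task_id="test", stage="vision", status="completed", progress=25)
    return notifier


notifier = asyncio.run(_demo())
```

Recording calls instead of discarding them lets a test assert on the 15% → 25% progress sequence as well as on the returned state.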

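7.2 Integration test (Vision → RAG flow)

async def test_vision_to_rag_flow():
    # vision_node, rag_node, and retriever are assumed to be built from mocks in the test setup
    # Vision analysis
    state = await vision_node({"image_url": "https://...", "message": "What is this?"})

    # RAG retrieval (uses the Vision result)
    state = await rag_node(state)

    # Check that the rules were looked up from the classification result
    assert retriever.search_called
    assert state["disposal_rules"] is not None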

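8. Performance optimization

8.1 Image processing optimizations

Optimization	How	Effect
detail: high	OpenAI parameter	Higher accuracy
Image caching	Avoid re-classifying the same image	Fewer API calls
Parallel processing	Run Intent and Vision concurrently (future)	Lower latency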
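The "parallel processing" row is listed as future work. A rough sketch of what it could look like with `asyncio.gather` follows. Both model calls are placeholders with simulated latency, and `intent_and_vision` is a hypothetical helper rather than part of the current graph.

```python
import asyncio
from typing import Optional


async def classify_intent(message: str) -> str:
    """Placeholder for the intent LLM call (latency simulated with sleep)."""
    await asyncio.sleep(0.05)
    return "waste"


async def analyze_image(image_url: str) -> dict:
    """Placeholder for the Vision API call (latency simulated with sleep)."""
    await asyncio.sleep(0.05)
    return {"minor_category": "PET bottle"}


async def intent_and_vision(message: str, image_url: Optional[str]) -> dict:
    """Start both calls at once; skip Vision when there is no image."""
    if image_url is None:
        return {"intent": await classify_intent(message), "classification_result": None}
    intent, classification = await asyncio.gather(
        classify_intent(message), analyze_image(image_url)
    )
    return {"intent": intent, "classification_result": classification}


state = asyncio.run(
    intent_and_vision("How do I throw this away?", "https://example.com/pet.jpg")
)
text_state = asyncio.run(intent_and_vision("Hi Eco~", None))
```

Because the Vision call is still gated on `image_url`, text-only requests keep the current cost profile; only image requests would trade a sequential wait for a concurrent one.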

8.2 Cost estimate

┌────────────────────────────────────────────────────────────┐
│               Vision cost estimate (monthly)               │
├────────────────────────────────────────────────────────────┤
│  Assumptions:                                              │
│  - 1,000 image requests per day                            │
│  - 30,000 per month                                        │
│                                                            │
│  GPT-5.2 (detail: high):                                   │
│  - Unified multimodal model (text + image)                 │
│  - Output: ~100 tokens                                     │
│  - Monthly: ~$10-20                                        │
│                                                            │
│  Gemini 3 Flash Preview:                                   │
│  - Unified multimodal model                                │
│  - Free tier: sufficient daily quota                       │
│  - Monthly: ~$2-5                                          │
└────────────────────────────────────────────────────────────┘

Note: prices are as of January 2026 and may change.

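9. Future extensions

9.1 Multiple image support

# Current: single image
image_url: str | None

# Future: multiple images
image_urls: list[str] | None

9.2 Caching image classifications

Classification results for the same image (keyed by hash) are cached:

import hashlib
import json

class VisionCache:
    async def get_or_analyze(self, image_url: str) -> dict:
        # sha256 requires bytes, so the URL is encoded first (the key is a hash of the URL)
        cache_key = f"vision:{hashlib.sha256(image_url.encode()).hexdigest()[:16]}"
        cached = await redis.get(cache_key)
        if cached:
            return json.loads(cached)

        result = await vision_model.analyze_image(image_url)
        await redis.setex(cache_key, 3600, json.dumps(result))  # expire after 1 hour
        return result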
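The snippet above keys the cache on the URL. If the same image can arrive under different URLs, hashing the image bytes is an alternative. The sketch below shows that variant, with an in-memory dict standing in for Redis and a placeholder in place of the Vision call. `InMemoryVisionCache` and `fake_analyze` are illustrative names, not project code.

```python
import asyncio
import hashlib
import json


class InMemoryVisionCache:
    """Cache keyed by a hash of the image bytes (a dict stands in for Redis)."""

    def __init__(self, analyze):
        self._analyze = analyze  # async callable: bytes -> dict
        self._store = {}
        self.misses = 0

    @staticmethod
    def key_for(image_bytes: bytes) -> str:
        return f"vision:{hashlib.sha256(image_bytes).hexdigest()[:16]}"

    async def get_or_analyze(self, image_bytes: bytes) -> dict:
        key = self.key_for(image_bytes)
        if key in self._store:
            return json.loads(self._store[key])
        # Cache miss: run the analysis once and store the serialized result.
        self.misses += 1
        result = await self._analyze(image_bytes)
        self._store[key] = json.dumps(result)
        return result


async def fake_analyze(image_bytes: bytes) -> dict:
    """Placeholder for the Vision call."""
    return {"minor_category": "PET bottle", "size": len(image_bytes)}


async def _demo():
    cache = InMemoryVisionCache(fake_analyze)
    first = await cache.get_or_analyze(b"same-image-bytes")
    second = await cache.get_or_analyze(b"same-image-bytes")  # served from cache
    await cache.get_or_analyze(b"different-image")
    return cache, first, second


cache, first, second = asyncio.run(_demo())
```

Hashing bytes costs a download before the lookup, so keying on the URL remains the cheaper choice when URLs are stable and unique per image.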

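10. Summary

Item	Decision	Reason
Vision placement	Conditional, after Intent	Avoids unnecessary calls
Default model	GPT-5.2 (high)	Accuracy first
detail level	high	High resolution for accurate classification
Result storage	state.classification_result	Used by the RAG node
Event publishing	vision stage (15%→25%)	Real-time progress display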

Commit info

Commit: da24a5dbfa0e4eb3c4d76f4e0f2c5e9e7a8b1c3d

feat(chat-worker): add Vision processing support

- Add VisionModelPort (application/ports/vision)
- Add OpenAIVisionClient, GeminiVisionClient (infrastructure/llm/vision)
- Add VisionNode for image classification (orchestration/langgraph/nodes)
- Update factory.py: intent → [vision?] → router flow
- Update dependencies.py: create_vision_client factory

Flow:
1. Intent node processes message
2. If image_url exists, Vision node analyzes image
3. classification_result stored in state
4. RAG/Answer nodes use classification result

Changed Files (10)

Key files:

application/ports/vision/vision_model.py: VisionModelPort (ABC)
infrastructure/llm/vision/openai_vision.py: OpenAI GPT-5.2 client
infrastructure/llm/vision/gemini_vision.py: Gemini Vision client
infrastructure/orchestration/langgraph/nodes/vision_node.py: Vision node
infrastructure/orchestration/langgraph/factory.py: graph flow changes
setup/dependencies.py: adds the create_vision_client factory

