  • Scan API(LLM x2) Performance Test - VU 800
    이코에코(Eco²)/Performance 2026. 1. 27. 16:35

    Date: 2026-01-27 16:10 KST (07:10 UTC)
    Test Type: k6 Scan Polling Test (Retest)
    Target VUs: 800
    OpenAI Tier: 4 (TPM 4,000,000)
    Snapshot: https://snapshots.raintank.io/dashboard/snapshot/VV9dOtFB57B6cDhqAFaPwQU93kds3nlO

     



    Executive Summary

    The VU 800 test achieved a 99.7% success rate. Compared with VU 700, the success rate actually rose by 0.5%p, and the 4 failures are presumed to be polling timeouts. VU 800 runs stably within the Tier 4 TPM limit.

     

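    | Metric | Value | vs. VU 700 | Assessment |
    |---|---|---|---|
    | Success rate | 99.7% | +0.5%p | PASS |
    | Failures | 4 | -7 | PASS |
    | HTTP 429 errors | 0 | - | OK |
    | E2E P95 | 144.6s | +18% | FAIL (SLA) |
    | Throughput | 367.3 req/m | +12% | Improved |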

     

     


    1. Test Configuration

    test_info:
      target_vus: 800
      test_script: k6-scan-polling-test.js
      endpoint: https://api.dev.growbin.app/api/v1/scan
      timestamp: 2026-01-27T07:10:03.398Z
      poll_timeout: 300s
      max_poll_attempts: 150
      openai_tier: 4
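
The polling settings above imply a fixed time budget per task. Assuming the ~2 s poll interval reported in section 2.3, 150 attempts cover the full 300 s timeout. The following is a minimal Python sketch of such a bounded polling loop; `poll_until_done`, `fetch_status`, and the `"completed"`/`"failed"` status strings are hypothetical names for illustration and are not taken from k6-scan-polling-test.js.

```python
import time
from typing import Callable

def poll_until_done(
    fetch_status: Callable[[], str],
    max_attempts: int = 150,      # max_poll_attempts from test_info
    interval_s: float = 2.0,      # assumed ~2 s poll interval
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, int]:
    """Poll until a terminal status is seen or the attempt budget runs out.

    Returns (status, attempts_used). Raises TimeoutError after max_attempts.
    """
    terminal = {"completed", "failed"}  # hypothetical status values
    for attempt in range(1, max_attempts + 1):
        status = fetch_status()
        if status in terminal:
            return status, attempt
        sleep(interval_s)
    raise TimeoutError(f"no terminal status after {max_attempts * interval_s:.0f}s")

# Worst-case wait implied by the configuration: 150 attempts x 2 s = 300 s.
POLL_BUDGET_S = 150 * 2.0
```

Under these assumptions, a task that needs longer than 300 s end to end is counted as a polling timeout, whatever its eventual outcome on the server.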

    Test Timeline

    | Phase | Time (UTC) | Time (KST) | Event |
    |---|---|---|---|
    | Test Start | 07:07:03 | 16:07:03 | Ramp-up begins |
    | Worker Scale-up | 07:07:15 | 16:07:15 | 1 → 3 replicas |
    | Steady State | 07:07:33 | 16:07:33 | VU 800 reached |
    | Test End | 07:10:03 | 16:10:03 | JSON results saved |

     


    2. Results

    2.1 Throughput & Success Rate

    | Metric | VU 800 (Retest) | VU 700 | VU 1000 | Delta (vs 700) |
    |---|---|---|---|---|
    | Total Submitted | 1,386 | 1,496 | 1,592 | -7.4% |
    | Total Completed | 1,378 | 1,313 | 1,494 | +4.9% |
    | Total Failed | 4 | 11 | 53 | -64% |
    | Success Rate | 99.7% | 99.2% | 96.6% | +0.5%p |
    | Reward Rate | 0.0% | 0.0% | 0.0% | - |
    | Throughput | 367.3 req/m | 329.1 req/m | 378.7 req/m | +12% |
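
The success rates in this table are consistent with completed / (completed + failed) rather than completed / submitted. The report does not state the formula, so treat this as an inference. A short Python check using the table's figures (the function name is illustrative):

```python
def success_rate(completed: int, failed: int) -> float:
    """Success rate in percent, counting only tasks that reached a final state."""
    return 100.0 * completed / (completed + failed)

# (completed, failed) per VU level, copied from the table
runs = {800: (1378, 4), 700: (1313, 11), 1000: (1494, 53)}

for vu, (completed, failed) in runs.items():
    print(f"VU {vu}: {success_rate(completed, failed):.1f}%")
```

Note that at VU 800 the submitted count (1,386) exceeds completed plus failed (1,382) by 4 tasks; the report does not say how those were classified.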

    2.2 Latency Distribution

    | Metric | Value | VU 700 | Delta | Target | Status |
    |---|---|---|---|---|---|
    | Scan Submit P95 | 734ms | 444ms | +65% | < 500ms | FAIL |
    | Poll P95 | 2,110ms | 1,283ms | +64% | < 500ms | FAIL |
    | E2E P95 | 144.6s | 122.3s | +18% | < 30s | FAIL |
    | E2E Average | 97.9s | 89.9s | +9% | < 20s | FAIL |
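
The Delta and Status columns follow from simple arithmetic on the measured values. A minimal Python sketch that recomputes them from the table (the data structure and function name are illustrative; the numbers are the report's):

```python
# metric: (vu800_value, vu700_value, target), all in milliseconds
latency_ms = {
    "Scan Submit P95": (734, 444, 500),
    "Poll P95": (2_110, 1_283, 500),
    "E2E P95": (144_600, 122_300, 30_000),
    "E2E Average": (97_900, 89_900, 20_000),
}

def evaluate(value: float, baseline: float, target: float) -> tuple[float, str]:
    """Return (percent change vs. baseline, PASS/FAIL against the target)."""
    delta_pct = 100.0 * (value - baseline) / baseline
    status = "PASS" if value < target else "FAIL"
    return delta_pct, status

for name, (value, baseline, target) in latency_ms.items():
    delta, status = evaluate(value, baseline, target)
    print(f"{name}: {delta:+.0f}% {status}")
```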

    2.3 Polling Statistics

    | Metric | Value | VU 700 | Delta |
    |---|---|---|---|
    | Total Poll Requests | 52,845 | 57,955 | -9% |
    | Avg Polls per Task | 38.1 | 39.7 | -4% |
    | Poll Interval | ~2s | ~2s | - |

    3. Error Analysis

    3.1 Error Summary

    | Error Type | Count | Status |
    |---|---|---|
    | HTTP 429 (Rate Limit) | 0 | OK |
    | Rate Limit Retries | 0 | OK |
    | Quota Exhausted | 0 | OK |
    | Answer Failed | 0 | OK |
    | Vision Failed | 0 | OK |

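    3.2 Failure Breakdown (4 cases)

    | Cause | Count | Share | Description |
    |---|---|---|---|
    | Polling Timeout | ~4 | 100% | E2E > 300s |
    | Rate Limit | 0 | 0% | Within Tier 4 TPM 4M |
    | Quota exhausted | 0 | 0% | Auto-recharge |
    | Total | 4 | 100% | - |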

    3.3 TPM Usage Analysis

    VU 800 TPM Estimation:
    ├─ Completed tasks: 1,378
    ├─ Estimated tokens/task: ~5,000
    ├─ Total tokens (3min): ~6,890,000
    ├─ TPM Average: ~2,297,000 tokens/min
    ├─ TPM Limit: 4,000,000 (Tier 4)
    └─ Usage: 57% of limit (Safe Zone)
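
The estimate above can be reproduced in a few lines. This is a minimal Python sketch using the report's own inputs; the ~5,000 tokens per task figure is the report's rough estimate, and `estimate_tpm` is an illustrative helper name.

```python
def estimate_tpm(completed_tasks: int, tokens_per_task: int,
                 duration_min: float, tpm_limit: int) -> tuple[float, float]:
    """Return (average tokens per minute, usage as a percentage of the limit)."""
    total_tokens = completed_tasks * tokens_per_task
    tpm_avg = total_tokens / duration_min
    return tpm_avg, 100.0 * tpm_avg / tpm_limit

tpm_avg, usage_pct = estimate_tpm(
    completed_tasks=1_378,
    tokens_per_task=5_000,   # the report's rough per-task estimate
    duration_min=3,          # 07:07:03 to 07:10:03 UTC
    tpm_limit=4_000_000,     # Tier 4
)
print(f"{tpm_avg:,.0f} tokens/min, {usage_pct:.0f}% of limit")
```

This is an average over the 3-minute window; short bursts within the window may sit above it.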

    4. Infrastructure Metrics

    4.1 KEDA Scaling Events

    | Time | Component | Replicas | Trigger |
    |---|---|---|---|
    | 07:07:15 | scan-worker | 1 → 3 | scan.vision queue > 10 |
    | 07:07:27 | scan-api | 1 → 2 | CPU threshold |
    | 07:07:57 | scan-api | 2 → 3 | CPU threshold |

    4.2 Worker Resource Usage (Post-Test)

    | Pod | CPU | Memory | Node |
    |---|---|---|---|
    | scan-worker-78f8ccdc9-5srpg | 156m | 416Mi | k8s-worker-ai |
    | scan-worker-78f8ccdc9-g2d9v | 139m | 582Mi | k8s-worker-ai-2 |
    | scan-worker-78f8ccdc9-m569v | 206m | 597Mi | k8s-worker-ai-2 |
    | scan-worker-canary | 136m | 480Mi | k8s-worker-ai |

    4.3 Worker Restart During Test

    | Event | Time | Description |
    |---|---|---|
    | Restart | ~07:06:00 | scan-worker-78f8ccdc9-5srpg restarted |
    | Reason | - | Readiness probe failure (celery inspect ping timeout) |
    | Impact | - | Some in-flight tasks may have been lost |

    5. VU Progression Summary

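    5.1 Performance Comparison by VU (Tier 4)

    | VU | Submitted | Completed | Failed | Success Rate | E2E P95 | Rate Limit |
    |---|---|---|---|---|---|---|
    | 600 | 1,408 | 1,401 | 4 | 99.7% | 108.3s | 0 |
    | 700 | 1,496 | 1,313 | 11 | 99.2% | 122.3s | 0 |
    | 800 | 1,386 | 1,378 | 4 | 99.7% | 144.6s | 0 |
    | 1000 | 1,592 | 1,494 | 53 | 96.6% | 166.7s | 0 |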

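    5.2 Recommended Operating Ranges (Tier 4)

    | Zone | VU | Success Rate | E2E P95 | TPM Usage |
    |---|---|---|---|---|
    | Green Zone | 50-400 | 99.9%+ | < 65s | < 40% |
    | Yellow Zone | 400-600 | 99.5%+ | 65-110s | 40-55% |
    | Orange Zone | 600-800 | 99%+ | 110-145s | 55-60% |
    | Red Zone | 800-1000 | 96%+ | 145-170s | 60-70% |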
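
The zone boundaries overlap: 400, 600, and 800 each appear in two rows. The measured E2E P95 at VU 600 (108.3s) and VU 800 (144.6s) falls in the Yellow and Orange latency bands respectively, so the sketch below assumes a boundary value belongs to the lower zone. That reading is an inference from the data, and `classify_zone` is an illustrative name.

```python
# (zone, inclusive upper VU bound), taken from the operating-range table
ZONES = [("Green", 400), ("Yellow", 600), ("Orange", 800), ("Red", 1000)]

def classify_zone(vus: int) -> str:
    """Map a VU count to its operating zone; boundary values go to the lower zone."""
    if vus < 50:
        raise ValueError("below the characterized range (50-1000 VUs)")
    for zone, upper in ZONES:
        if vus <= upper:
            return zone
    raise ValueError("above the characterized range (50-1000 VUs)")

for vu in (600, 700, 800, 1000):
    print(vu, classify_zone(vu))
```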

    6. Comparison: First Attempt vs Retest

    | Metric | First Attempt | Retest | Improvement |
    |---|---|---|---|
    | Success Rate | 0.0% | 99.7% | +99.7%p |
    | Completed | 0 | 1,378 | - |
    | Scan Submit P95 | 10,002ms | 734ms | -93% |
    | System Status | Crashed | Healthy | Recovered |

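    6.1 Success Factors

    | Factor | Description |
    |---|---|
    | Warm Workers | 3 replicas + canary already in Ready state |
    | Empty Queues | No leftovers from the previous test |
    | KEDA Pre-scaled | Cold start avoided |
    | System Stabilized | Stabilized after the pod restart |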

    7. Recommendations

    7.1 Current Status Assessment

    | Item | Status | Notes |
    |---|---|---|
    | Rate Limit | OK | 57% of TPM used |
    | Success rate | Good | 99.7% > 95% SLA |
    | Latency | Exceeded | E2E P95 144.6s > 30s |

    7.2 Recommendation: Prevent Cold Starts

    # scan-worker KEDA ScaledObject
    spec:
      minReplicaCount: 2  # Changed from 1
      maxReplicaCount: 5  # Changed from 3

    7.3 VU 900 Test Projection

    | VU | Expected Success Rate | Expected TPM Usage | Risk |
    |---|---|---|---|
    | 900 | ~98% | ~65% | Medium-High |
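
The report does not say how the VU 900 figures were derived. One simple way to arrive at similar numbers, offered here only as an assumption, is to interpolate the success rate linearly between the VU 800 and VU 1000 measurements and to scale the VU 800 TPM usage proportionally with VU count:

```python
def lerp(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linear interpolation of y at x between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Success rate: interpolate between the VU 800 and VU 1000 measurements.
success_900 = lerp(900, 800, 99.7, 1000, 96.6)

# TPM usage: scale the VU 800 estimate proportionally with VUs.
tpm_usage_800 = 100.0 * (1_378 * 5_000 / 3) / 4_000_000
tpm_usage_900 = tpm_usage_800 * 900 / 800

print(f"VU 900: ~{success_900:.0f}% success, ~{tpm_usage_900:.0f}% TPM usage")
```

Throughput did not grow strictly with VU count across the measured runs, so linear scaling is at best a rough guide.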

    8. Appendix

    8.1 Raw Test Output

    {
      "test_info": {
        "target_vus": 800,
        "duration_seconds": 1769497578.2772892
      },
      "results": {
        "total_submitted": 1386,
        "total_completed": 1378,
        "total_failed": 4,
        "success_rate": "99.7%",
        "reward_rate": "0.0%"
      },
      "latency": {
        "scan_submit_p95": "734ms",
        "poll_p95": "2110ms",
        "e2e_p95": "144.6s",
        "e2e_avg": "97.9s"
      },
      "polling": {
        "total_poll_requests": 52845,
        "avg_polls_per_task": "38.1"
      },
      "throughput": {
        "requests_per_minute": "367.3 req/m"
      }
    }
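
The `duration_seconds` value in the raw output is far too large to be a test duration; the timeline in section 1 spans 180 seconds. Its magnitude suggests a Unix epoch timestamp instead. That reading is an assumption, not something the report confirms. A quick Python check:

```python
from datetime import datetime, timezone

raw_duration_seconds = 1769497578.2772892  # copied from the raw output

# Interpret the value as seconds since the Unix epoch (assumption).
as_timestamp = datetime.fromtimestamp(raw_duration_seconds, tz=timezone.utc)
print(as_timestamp.isoformat())

# Duration implied by the test timeline instead.
start = datetime(2026, 1, 27, 7, 7, 3, tzinfo=timezone.utc)
end = datetime(2026, 1, 27, 7, 10, 3, tzinfo=timezone.utc)
timeline_duration_s = (end - start).total_seconds()
print(timeline_duration_s)
```

Interpreted this way, the value lands on the test date (2026-01-27 UTC), which would point to the script recording a timestamp where a duration was intended.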

    8.3 Prometheus Query Reference

    # Time Range
    start: 2026-01-27T07:07:03Z
    end: 2026-01-27T07:10:03Z
    
    # Worker CPU
    sum(rate(container_cpu_usage_seconds_total{namespace="scan",pod=~"scan-worker.*"}[1m])) by (pod)
    
    # Worker Memory
    sum(container_memory_working_set_bytes{namespace="scan",pod=~"scan-worker.*"}) by (pod) / 1024 / 1024
    
    # Queue Depth
    rabbitmq_queue_messages{queue=~"scan.*"}
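
To pull these series for the test window programmatically, the queries can be sent to the Prometheus HTTP API endpoint `/api/v1/query_range`, which takes `query`, `start`, `end`, and `step` parameters. The following minimal Python sketch only builds the request URL; the base URL `http://prometheus.example:9090` and the 15 s step are placeholders, not values from this environment.

```python
from urllib.parse import urlencode

PROMETHEUS_BASE = "http://prometheus.example:9090"  # hypothetical address

def query_range_url(query: str, start: str, end: str, step: str = "15s") -> str:
    """Build a /api/v1/query_range URL for the given PromQL query and window."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{PROMETHEUS_BASE}/api/v1/query_range?{params}"

queue_depth_url = query_range_url(
    'rabbitmq_queue_messages{queue=~"scan.*"}',
    start="2026-01-27T07:07:03Z",
    end="2026-01-27T07:10:03Z",
)
print(queue_depth_url)
```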

    8.4 Related Files

    • Test Script: e2e-tests/performance/k6-scan-polling-test.js
    • Result JSON (Retest): k6-scan-polling-vu800-2026-01-27T07-10-03-398Z.json
    • Result JSON (Failed): k6-scan-polling-vu800-2026-01-27T06-50-46-515Z.json
    • VU 700 Report: docs/blogs/tests/2026-01-27-scan-load-test-vu700.md
    • VU 1000 Tier 4 Report: docs/blogs/tests/2026-01-27-scan-load-test-vu1000-tier4.md

    9. Conclusion

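    VU 800 Test Result Summary

    | Item | First Attempt | Retest |
    |---|---|---|
    | Success rate | 0.0% (CRITICAL) | 99.7% (PASS) |
    | Rate Limit | N/A (System Crash) | 0 |
    | Failure cause | Cascading Failure | Polling Timeout (4 cases) |
    | TPM usage | N/A | ~57% (Safe Zone) |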

     


ABOUT ME

🎓 B.S., Computer Science and Engineering, Pusan National University: 2017.03 - 2023.08
☁️ Rakuten Symphony Jr. Cloud Engineer: 2024.12.09 - 2025.08.31
🏆 Excellence Award, 2025 AI 새싹톤 (SeSACTHON): 2025.10.30 - 2025.12.02
🌏 Enhancing the 이코에코(Eco²) backend and infrastructure: 2025.12 - Present
