ABOUT ME

-

Today
-
Yesterday
-
Total
-
  • Scan API(LLM x2) Performance Test - VU 1000 (Tier 4)
    이코에코(Eco²)/Performance 2026. 1. 27. 15:28

     

    Date: 2026-01-27 15:12 KST (06:12 UTC)
    Test Type: k6 Scan Polling Test (Tier 4 Validation)
    Target VUs: 1000
    Purpose: OpenAI Tier 4 승격 후 Rate Limit 해소 검증
    Snapshot: https://snapshots.raintank.io/dashboard/snapshot/QpoxrHswLzzh7CpkepGlP7IZkymTZTcX

     

    Grafana

     

    snapshots.raintank.io


    Executive Summary

     

    OpenAI Tier 4 승격 + Auto-recharge 설정 후 VU 1000 재테스트 결과, 96.6% 성공률을 달성하여 이전 테스트(92.5%) 대비 4.1%p 개선되었습니다. Rate Limit(TPM) 에러와 Quota 소진 에러가 완전히 해소되었으며, 53건의 실패는 Worker 재시작 및 Polling Timeout으로 인한 것으로 분석됩니다.

    지표 Tier 4 (현재) Tier 3 (이전) 개선 평가
    성공률 96.6% 92.5% +4.1%p PASS
    실패 건수 53 97 -45% 개선
    HTTP 429 에러 0 1,017 -100% 해소
    Quota 소진 0 166 -100% 해소
    E2E P95 166.7s 175.4s -5% 개선
    Throughput 378.7 req/m 301.1 req/m +26% 개선

     


    1. Test Configuration

    test_info:
      target_vus: 1000
      test_script: k6-scan-polling-test.js
      endpoint: https://api.dev.growbin.app/api/v1/scan
      timestamp: 2026-01-27T06:12:42.942Z
      poll_timeout: 300s
      max_poll_attempts: 150
      openai_tier: 4  # Upgraded from Tier 3
      auto_recharge: enabled

    Test Timeline

    Phase Time (UTC) Time (KST) Event
    Test Start 06:09:42 15:09:42 Ramp-up 시작
    Worker Scale-up 06:09:50 15:09:50 1 → 3 replicas
    Steady State 06:10:12 15:10:12 VU 1000 도달
    Worker Restart 06:10:53 15:10:53 Memory pressure
    Test End 06:12:42 15:12:42 JSON 결과 저장
    Worker Scale-down 06:17:00 15:17:00 3 → 1 replicas

    2. Test Results

    2.1 Throughput & Success Rate

    Metric Tier 4 Tier 3 (이전) Delta VU 600
    Total Submitted 1,592 1,645 -3.2% 1,408
    Total Completed 1,494 1,204 +24% 1,401
    Total Failed 53 97 -45% 4
    Success Rate 96.6% 92.5% +4.1%p 99.7%
    Reward Rate 0.0% 0.0% - 0.0%
    Throughput 378.7 req/m 301.1 req/m +26% 358.6 req/m

    2.2 Latency Distribution

    Metric Tier 4 Tier 3 (이전) Delta Target Status
    Scan Submit P95 863ms 787ms +10% < 500ms FAIL
    Poll P95 2,547ms 2,772ms -8% < 500ms FAIL
    E2E P95 166.7s 175.4s -5% < 30s FAIL
    E2E Average 117.6s 121.4s -3% < 20s FAIL

    2.3 Polling Statistics

    Metric Tier 4 Tier 3 (이전) Delta
    Total Poll Requests 73,070 74,922 -2.5%
    Avg Polls per Task 46.2 44.9 +3%
    Poll Interval ~2s ~2s -

    3. Rate Limit Analysis (Tier 4 vs Tier 3)

    3.1 OpenAI API Limits Comparison

    Limit Type Tier 3 Tier 4 증가율
    TPM (Tokens Per Minute) 2,000,000 4,000,000 +100%
    RPM (Requests Per Minute) 10,000 10,000 0%

    3.2 Error Comparison

    Error Type Tier 4 (현재) Tier 3 (이전) 개선
    HTTP 429 Total 0 1,017 -100%
    Rate Limit (TPM) 0 361 -100%
    Quota Exhausted 0 166 -100%
    Answer Failed 0 225 -100%
    Vision Failed 0 19 -100%

    3.3 TPM Usage Analysis

    Tier 4 Test TPM Estimation:
    ├─ Completed tasks: 1,494
    ├─ Estimated tokens/task: ~5,000
    ├─ Total tokens (3min): ~7,470,000
    ├─ TPM Average: ~2,490,000 tokens/min
    ├─ TPM Limit: 4,000,000
    └─ Usage: 62% of limit (Safe Zone)

    4. Failure Analysis (53건)

    4.1 Failure Breakdown

    원인 건수 비율 설명
    Worker 재시작 ~30 57% 06:10:53 UTC 재시작으로 인한 Task 손실
    Polling Timeout ~20 38% E2E > 300s
    기타 ~3 5% 네트워크, Scale-down 시점
    Total 53 100% -

    4.2 Worker Restart Analysis

    ┌─────────────────────────────────────────────────────────────────────────┐
    │                    Worker Restart Timeline                               │
    ├─────────────────────────────────────────────────────────────────────────┤
    │                                                                          │
    │  06:09:42 ─── Test Start                                                │
    │      │                                                                   │
    │  06:09:50 ─── KEDA Scale-up (1 → 3)                                     │
    │      │        Workers: wtxmw, nt895, rbkz6                              │
    │      │                                                                   │
    │  06:10:53 ─── Worker Restart (Memory Pressure)                          │
    │      │        wtxmw restarted                                           │
    │      │        → In-flight tasks lost (~30)                              │
    │      │                                                                   │
    │  06:12:42 ─── Test End                                                  │
    │      │                                                                   │
    │  06:17:00 ─── KEDA Scale-down (3 → 1)                                   │
    │              "All metrics below target"                                  │
    │                                                                          │
    └─────────────────────────────────────────────────────────────────────────┘

    4.3 Rate Limit vs Worker Restart: 실패 원인 비교

    테스트 Rate Limit 실패 Worker 재시작 실패 Timeout 실패
    Tier 3 VU 1000 80건 (82%) 5건 (5%) 12건 (12%)
    Tier 4 VU 1000 0건 (0%) 30건 (57%) 20건 (38%)

    5. Infrastructure Metrics

    5.1 KEDA Scaling Events

    Time (UTC) Component Replicas Trigger
    06:09:50 scan-worker 1 → 3 scan.vision queue > 10
    06:09:55 scan-api 1 → 3 CPU threshold
    06:17:00 scan-worker 3 → 1 All metrics below target

    5.2 RabbitMQ Queue Depth

    Queue Peak Tier 3 Peak Delta
    scan.vision 532 795 -33%
    scan.rule 560 509 +10%
    scan.answer 221 312 -29%
    scan.reward 138 164 -16%

    분석: Queue depth 감소 → Rate Limit 미발생으로 처리 속도 향상

    5.3 Worker Resource Usage

    Pod CPU Peak Memory Peak Note
    scan-worker-wtxmw 0.88 cores 1,026MB Original (restarted)
    scan-worker-nt895 0.89 cores 553MB Scale-up
    scan-worker-rbkz6 0.89 cores 550MB Scale-up
    scan-worker-canary 0.89 cores 1,047MB Canary

    5.4 Redis Memory Usage

    Instance Peak Tier 3 Peak Delta
    rfr-cache 204MB 167MB* +22%
    rfr-streams 59MB 58MB +2%
    rfr-pubsub 4MB 2MB -

    *Tier 3 테스트는 VU 600 기준


    6. Prometheus Queries Used

    6.1 CPU & Memory

    # Worker CPU Usage
    sum(rate(container_cpu_usage_seconds_total{namespace="scan",pod=~"scan-worker.*",container="scan-worker"}[1m])) by (pod)
    
    # Worker Memory Usage
    sum(container_memory_working_set_bytes{namespace="scan",pod=~"scan-worker.*",container="scan-worker"}) by (pod) / 1024 / 1024

    6.2 Queue & Scaling

    # RabbitMQ Queue Depth
    rabbitmq_queue_messages{queue=~"scan.*"}
    
    # HPA Replica Count
    kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler=~".*scan.*"}

    6.3 Test Time Range

    Start: 2026-01-27T06:09:42Z (epoch: 1769494182)
    End:   2026-01-27T06:12:42Z (epoch: 1769494362)

    7. Tier 4 Upgrade Impact Summary

    7.1 Before vs After

    지표 Tier 3 VU 1000 Tier 4 VU 1000 효과
    성공률 92.5% 96.6% +4.1%p
    Rate Limit 에러 1,017건 0건 해소
    Quota 소진 166건 0건 해소
    Throughput 301.1 req/m 378.7 req/m +26%
    E2E P95 175.4s 166.7s -5%

    7.2 Key Findings

    1. Rate Limit 완전 해소: TPM 4M 한도로 VU 1000 부하 수용
    2. Quota 소진 방지: Auto-recharge로 잔액 자동 충전
    3. 처리량 26% 향상: Rate Limit 대기 없이 연속 처리
    4. Queue Depth 감소: 평균 25% 감소로 처리 효율 향상
    5. 잔여 실패 원인: Worker 재시작(57%) + Polling Timeout(38%)

    8. Appendix

    8.1 Raw Test Output

    {
      "test_info": {
        "target_vus": 1000,
        "duration_seconds": 1769494126.207721
      },
      "results": {
        "total_submitted": 1592,
        "total_completed": 1494,
        "total_failed": 53,
        "success_rate": "96.6%",
        "reward_rate": "0.0%"
      },
      "latency": {
        "scan_submit_p95": "863ms",
        "poll_p95": "2547ms",
        "e2e_p95": "166.7s",
        "e2e_avg": "117.6s"
      },
      "polling": {
        "total_poll_requests": 73070,
        "avg_polls_per_task": "46.2"
      },
      "throughput": {
        "requests_per_minute": "378.7 req/m"
      }
    }

    8.2 Worker Log Summary (Current Container)

    Metric Main Worker Canary Total
    scan.vision succeeded 174 205 379
    scan.rule succeeded 214 - 214+
    scan.answer succeeded 250 272 522
    scan.reward succeeded 302 - 302+
    HTTP 429 errors 0 0 0
    Rate limit retries 0 0 0

    8.3 Related Files

    • Test Script: e2e-tests/performance/k6-scan-polling-test.js
    • Result JSON: k6-scan-polling-vu1000-2026-01-27T06-12-42-942Z.json
    • Previous Test Report: docs/blogs/tests/2026-01-27-scan-load-test-vu1000.md
    • Error Analysis: docs/blogs/tests/2026-01-27-error-analysis.md

    9. Conclusion

    Tier 4 승격 효과 확정

    항목 결과
    Rate Limit 해소 TPM 4M으로 VU 1000 수용
    Quota 소진 방지 Auto-recharge 정상 작동
    성공률 개선 92.5% → 96.6% (+4.1%p)
    처리량 향상 301.1 → 378.7 req/m (+26%)

    잔여 과제

    1. Worker 안정성: Memory limit 증가로 재시작 방지 필요
    2. Polling Timeout: E2E P95 166.7s로 SLA(30s) 초과 지속
    3. 95% SLA 달성: Worker 재시작 방지 시 달성 가능

    댓글

ABOUT ME

🎓 부산대학교 정보컴퓨터공학과 학사: 2017.03 - 2023.08
☁️ Rakuten Symphony Jr. Cloud Engineer: 2024.12.09 - 2025.08.31
🏆 2025 AI 새싹톤 우수상 수상: 2025.10.30 - 2025.12.02
🌏 이코에코(Eco²) 백엔드/인프라 고도화 중: 2025.12 - Present

Designed by Mango