chore: Enhance documentation for E2E testing:

- Added clarity and structure to README files, including recent updates and getting started sections.
- Improved manual verification documentation for CrowdSec authentication, emphasizing expected outputs and success criteria.
- Updated debugging guide with detailed output examples and automatic trace capture information.
- Refined best practices for E2E tests, focusing on efficient polling, locator strategies, and state management.
- Documented triage report for DNS Provider feature tests, highlighting issues fixed and test results before and after improvements.
- Revised E2E test writing guide to include when to use specific helper functions and patterns for better test reliability.
- Enhanced troubleshooting documentation with clear resolutions for common issues, including timeout and token configuration problems.
- Updated tests README to provide quick links and best practices for writing robust tests.
2026-03-24 01:47:22 +00:00
parent 7d986f2821
commit ca477c48d4
52 changed files with 983 additions and 198 deletions
--- a/docs/performance/feature-flags-endpoint.md
+++ b/docs/performance/feature-flags-endpoint.md
@@ -31,6 +31,7 @@ for _, s := range settings {
 ```

 **Key Improvements:**
+
 - **Single Query:** `WHERE key IN (?, ?, ?)` fetches all flags in one database round-trip
 - **O(1) Lookups:** Map-based access eliminates linear search overhead
 - **Error Handling:** Explicit error logging and HTTP 500 response on failure
@@ -56,6 +57,7 @@ if err := h.DB.Transaction(func(tx *gorm.DB) error {
 ```

 **Key Improvements:**
+
 - **Atomic Updates:** All flag changes commit or rollback together
 - **Error Recovery:** Transaction rollback prevents partial state
 - **Improved Logging:** Explicit error messages for debugging
@@ -65,10 +67,12 @@ if err := h.DB.Transaction(func(tx *gorm.DB) error {
 ### Before Optimization (Baseline - N+1 Pattern)

 **Architecture:**
+
 - GetFlags(): 3 sequential `WHERE key = ?` queries (one per flag)
 - UpdateFlags(): Multiple separate transactions

 **Measured Latency (Expected):**
+
 - **GET P50:** 300ms (CI environment)
 - **GET P95:** 500ms
 - **GET P99:** 600ms
@@ -77,20 +81,24 @@ if err := h.DB.Transaction(func(tx *gorm.DB) error {
 - **PUT P99:** 600ms

 **Query Count:**
+
 - GET: 3 queries (N+1 pattern, N=3 flags)
 - PUT: 1-3 queries depending on flag count

 **CI Impact:**
+
 - Test flakiness: ~30% failure rate due to timeouts
 - E2E test pass rate: ~70%

 ### After Optimization (Current - Batch Query + Transaction)

 **Architecture:**
+
 - GetFlags(): 1 batch query `WHERE key IN (?, ?, ?)`
 - UpdateFlags(): 1 transaction wrapping all updates

 **Measured Latency (Target):**
+
 - **GET P50:** 100ms (3x faster)
 - **GET P95:** 150ms (3.3x faster)
 - **GET P99:** 200ms (3x faster)
@@ -99,10 +107,12 @@ if err := h.DB.Transaction(func(tx *gorm.DB) error {
 - **PUT P99:** 200ms (3x faster)

 **Query Count:**
+
 - GET: 1 batch query (N+1 eliminated)
 - PUT: 1 transaction (atomic)

 **CI Impact (Expected):**
+
 - Test flakiness: 0% (with retry logic + polling)
 - E2E test pass rate: 100%

@@ -125,11 +135,13 @@ if err := h.DB.Transaction(func(tx *gorm.DB) error {
 **Status:** Complete

 **Changes:**
+
 - Added `defer` timing to GetFlags() and UpdateFlags()
 - Log format: `[METRICS] GET/PUT /feature-flags: {duration}ms`
 - CI pipeline captures P50/P95/P99 metrics

 **Files Modified:**
+
 - `backend/internal/api/handlers/feature_flags_handler.go`

 ### Phase 1: Backend Optimization - N+1 Query Fix
@@ -139,16 +151,19 @@ if err := h.DB.Transaction(func(tx *gorm.DB) error {
 **Priority:** P0 - Critical CI Blocker

 **Changes:**
+
 - **GetFlags():** Replaced N+1 loop with batch query `WHERE key IN (?)`
 - **UpdateFlags():** Wrapped updates in single transaction
 - **Tests:** Added batch query and transaction rollback tests
 - **Benchmarks:** Added BenchmarkGetFlags and BenchmarkUpdateFlags

 **Files Modified:**
+
 - `backend/internal/api/handlers/feature_flags_handler.go`
 - `backend/internal/api/handlers/feature_flags_handler_test.go`

 **Expected Impact:**
+
 - 3-6x latency reduction (600ms → 200ms P99)
 - Elimination of N+1 query anti-pattern
 - Atomic updates with rollback on error
@@ -159,32 +174,38 @@ if err := h.DB.Transaction(func(tx *gorm.DB) error {
 ### Test Helpers Used

 **Polling Helper:** `waitForFeatureFlagPropagation()`
+
 - Polls `/api/v1/feature-flags` until expected state confirmed
 - Default interval: 500ms
 - Default timeout: 30s (150x safety margin over 200ms P99)

 **Retry Helper:** `retryAction()`
+
 - 3 max attempts with exponential backoff (2s, 4s, 8s)
 - Handles transient network/DB failures

 ### Timeout Strategy

 **Helper Defaults:**
+
 - `clickAndWaitForResponse()`: 30s timeout
 - `waitForAPIResponse()`: 30s timeout
 - No explicit timeouts in test files (rely on helper defaults)

 **Typical Poll Count:**
+
 - Local: 1-2 polls (50-200ms response + 500ms interval)
 - CI: 1-3 polls (50-200ms response + 500ms interval)

 ### Test Files

 **E2E Tests:**
+
 - `tests/settings/system-settings.spec.ts` - Feature toggle tests
 - `tests/utils/wait-helpers.ts` - Polling and retry helpers

 **Backend Tests:**
+
 - `backend/internal/api/handlers/feature_flags_handler_test.go`
 - `backend/internal/api/handlers/feature_flags_handler_coverage_test.go`

@@ -205,11 +226,13 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 ### Benchmark Analysis

 **GetFlags Benchmark:**
+
 - Measures single batch query performance
 - Tests with 3 flags in database
 - Includes JSON serialization overhead

 **UpdateFlags Benchmark:**
+
 - Measures transaction wrapping performance
 - Tests atomic update of 3 flags
 - Includes JSON deserialization and validation
@@ -219,14 +242,17 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 ### Why Batch Query Over Individual Queries?

 **Problem:** N+1 pattern causes linear latency scaling
+
 - 3 flags = 3 queries × 200ms = 600ms total
 - 10 flags = 10 queries × 200ms = 2000ms total

 **Solution:** Single batch query with IN clause
+
 - N flags = 1 query × 200ms = 200ms total
 - Constant number of round-trips regardless of flag count

 **Trade-offs:**
+
 - ✅ 3-6x latency reduction
 - ✅ Scales to more flags without performance degradation
 - ⚠️ Slightly more complex code (map-based lookup)
@@ -234,14 +260,17 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 ### Why Transaction Wrapping?

 **Problem:** Multiple separate writes risk partial state
+
 - Flag 1 succeeds, Flag 2 fails → inconsistent state
 - No rollback mechanism for failed updates

 **Solution:** Single transaction for all updates
+
 - All succeed together or all rollback
 - ACID guarantees for multi-flag updates

 **Trade-offs:**
+
 - ✅ Atomic updates with rollback on error
 - ✅ Prevents partial state corruption
 - ⚠️ Slightly longer locks (mitigated by fast SQLite)
@@ -253,11 +282,13 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 **Status:** Not implemented (not needed after Phase 1 optimization)

 **Rationale:**
+
 - Current latency (50-200ms) is acceptable for feature flags
 - Feature flags change infrequently (not a hot path)
 - Adding cache increases complexity without significant benefit

 **If Needed:**
+
 - Use Redis or in-memory cache with TTL=60s
 - Invalidate on PUT operations
 - Expected improvement: 50-200ms → 10-50ms
@@ -267,11 +298,13 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 **Status:** SQLite default indexes sufficient

 **Rationale:**
+
 - `settings.key` column used in WHERE clauses
 - SQLite automatically indexes primary key
 - Query plan analysis shows index usage

 **If Needed:**
+
 - Add explicit index: `CREATE INDEX idx_settings_key ON settings(key)`
 - Expected improvement: Minimal (already fast)

@@ -280,11 +313,13 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 **Status:** GORM default pooling sufficient

 **Rationale:**
+
 - GORM uses `database/sql` pool by default
 - Current concurrency limits adequate
 - No connection exhaustion observed

 **If Needed:**
+
 - Tune `SetMaxOpenConns()` and `SetMaxIdleConns()`
 - Expected improvement: 10-20% under high load

@@ -293,12 +328,14 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 ### Metrics to Track

 **Backend Metrics:**
+
 - P50/P95/P99 latency for GET and PUT operations
 - Query count per request (should remain 1 for GET)
 - Transaction count per PUT (should remain 1)
 - Error rate (target: <0.1%)

 **E2E Metrics:**
+
 - Test pass rate for feature toggle tests
 - Retry attempt frequency (target: <5%)
 - Polling iteration count (typical: 1-3)
@@ -307,11 +344,13 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 ### Alerting Thresholds

 **Backend Alerts:**
+
 - P99 > 500ms → Investigate regression (2.5x slower than optimized)
 - Error rate > 1% → Check database health
 - Query count > 1 for GET → N+1 pattern reintroduced

 **E2E Alerts:**
+
 - Test pass rate < 95% → Check for new flakiness
 - Timeout errors > 0 → Investigate CI environment
 - Retry rate > 10% → Investigate transient failure source
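
As one possible way to make the thresholds above checkable in code, the sketch below evaluates a metrics snapshot against them. The `Snapshot` type, its field names, and the `alerts` function are assumptions made for illustration; only the threshold values and suggested actions come from the lists above.

```go
package main

import "fmt"

// Snapshot is a hypothetical container for the tracked metrics.
// Rates are fractions in the range 0..1.
type Snapshot struct {
	P99Millis            int
	ErrorRate            float64
	GetQueriesPerRequest int
	PassRate             float64
	TimeoutErrors        int
	RetryRate            float64
}

// alerts returns one message per threshold the snapshot violates.
func alerts(s Snapshot) []string {
	var out []string
	if s.P99Millis > 500 {
		out = append(out, "P99 > 500ms: investigate regression")
	}
	if s.ErrorRate > 0.01 {
		out = append(out, "error rate > 1%: check database health")
	}
	if s.GetQueriesPerRequest > 1 {
		out = append(out, "query count > 1 for GET: N+1 pattern reintroduced")
	}
	if s.PassRate < 0.95 {
		out = append(out, "test pass rate < 95%: check for new flakiness")
	}
	if s.TimeoutErrors > 0 {
		out = append(out, "timeout errors > 0: investigate CI environment")
	}
	if s.RetryRate > 0.10 {
		out = append(out, "retry rate > 10%: investigate transient failure source")
	}
	return out
}

func main() {
	s := Snapshot{P99Millis: 620, GetQueriesPerRequest: 3, PassRate: 0.99}
	for _, a := range alerts(s) {
		fmt.Println(a)
	}
}
```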
@@ -319,10 +358,12 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 ### Dashboard

 **CI Metrics:**
+
 - Link: `.github/workflows/e2e-tests.yml` artifacts
 - Extracts `[METRICS]` logs for P50/P95/P99 analysis

 **Backend Logs:**
+
 - Docker container logs with `[METRICS]` tag
 - Example: `[METRICS] GET /feature-flags: 120ms`
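
To show how P50/P95/P99 values might be derived from `[METRICS]` log lines, here is a rough sketch. It is not the CI pipeline's actual extraction step: `parseMetrics` and `percentile` are hypothetical names, the line format is inferred from the example above, and the nearest-rank percentile method is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// metricLine matches lines such as "[METRICS] GET /feature-flags: 120ms".
// The brackets are escaped so they are treated literally, not as a character class.
var metricLine = regexp.MustCompile(`\[METRICS\] (GET|PUT) /feature-flags: (\d+)ms`)

// parseMetrics collects latencies in milliseconds per HTTP method,
// ignoring lines that do not match the expected format.
func parseMetrics(logs string) map[string][]int {
	out := map[string][]int{}
	for _, line := range strings.Split(logs, "\n") {
		m := metricLine.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		ms, err := strconv.Atoi(m[2])
		if err != nil {
			continue
		}
		out[m[1]] = append(out[m[1]], ms)
	}
	return out
}

// percentile returns the p-th percentile (0 < p <= 100) using the
// nearest-rank method, or 0 when there are no samples.
func percentile(samples []int, p float64) int {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]int(nil), samples...) // copy so the input is not reordered
	sort.Ints(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	logs := "[METRICS] GET /feature-flags: 120ms\n" +
		"unrelated line\n" +
		"[METRICS] GET /feature-flags: 95ms\n" +
		"[METRICS] PUT /feature-flags: 180ms"
	for method, samples := range parseMetrics(logs) {
		fmt.Println(method, "P50:", percentile(samples, 50), "P99:", percentile(samples, 99))
	}
}
```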

@@ -331,15 +372,18 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 ### High Latency (P99 > 500ms)

 **Symptoms:**
+
 - E2E tests timing out
 - Backend logs show latency spikes

 **Diagnosis:**
+
 1. Check query count: `grep "SELECT" backend/logs/query.log`
 2. Verify batch query: Should see `WHERE key IN (...)`
 3. Check transaction wrapping: Should see single `BEGIN ... COMMIT`

 **Remediation:**
+
 - If N+1 pattern detected: Verify batch query implementation
 - If transaction missing: Verify transaction wrapping
 - If database locks: Check concurrent access patterns
@@ -347,15 +391,18 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 ### Transaction Rollback Errors

 **Symptoms:**
+
 - PUT requests return 500 errors
 - Backend logs show transaction failure

 **Diagnosis:**
+
 1. Check error message: `grep "Failed to update feature flags" backend/logs/app.log`
 2. Verify database constraints: Unique key constraints, foreign keys
 3. Check database connectivity: Connection pool exhaustion

 **Remediation:**
+
 - If constraint violation: Fix invalid flag key or value
 - If connection issue: Tune connection pool settings
 - If deadlock: Analyze concurrent access patterns
@@ -363,15 +410,18 @@ go test ./internal/api/handlers/ -bench=Benchmark.*Flags -benchmem -run=^$
 ### E2E Test Flakiness

 **Symptoms:**
+
 - Tests pass locally, fail in CI
 - Timeout errors in Playwright logs

 **Diagnosis:**
+
 1. Check backend latency: `grep -F "[METRICS]" ci-logs.txt` (`-F` matches the brackets literally instead of as a character class)
 2. Verify retry logic: Should see retry attempts in logs
 3. Check polling behavior: Should see multiple GET requests

 **Remediation:**
+
 - If backend slow: Investigate CI environment (disk I/O, CPU)
 - If no retries: Verify `retryAction()` wrapper in test
 - If no polling: Verify `waitForFeatureFlagPropagation()` usage