- Added initial feature flag state verification before tests to ensure a stable starting point.
- Implemented retry logic with exponential backoff for toggling feature flags, improving resilience against transient failures.
- Introduced `waitForFeatureFlagPropagation` utility to replace hard-coded waits with condition-based verification for feature flag states.
- Added advanced test scenarios for handling concurrent toggle operations and retrying on network failures.
- Updated existing tests to utilize the new retry and propagation utilities for better reliability and maintainability.
# Playwright E2E Test Timeout Fix - Feature Flags Endpoint

## 1. Introduction

### Overview

This plan addresses timeout failures in the Playwright E2E tests for the feature flags endpoint (`/feature-flags`), which occur consistently in CI environments. The tests in `tests/settings/system-settings.spec.ts` time out while waiting for API responses during feature toggle operations.

### Problem Statement

Four tests are timing out in CI:

1. `should toggle Cerberus security feature`
2. `should toggle CrowdSec console enrollment`
3. `should toggle uptime monitoring`
4. `should persist feature toggle changes`

All tests follow the same pattern:

- Click toggle → Wait for PUT `/feature-flags` (currently 15s timeout)
- Wait for subsequent GET `/feature-flags` (currently 10s timeout)
- Both operations frequently exceed their timeouts in CI
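
The pattern above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `system-settings.spec.ts`: `toggleAndWait`, `isFeatureFlagsResponse`, and the three `*Like` interfaces are hypothetical names. The interfaces mirror only the parts of Playwright's `Page`, `Locator`, and `Response` that the sketch relies on, so it compiles without Playwright installed.

```typescript
// Minimal structural stand-ins (assumptions) for the Playwright objects used below.
interface ResponseLike {
  url(): string;
  request(): { method(): string };
}
interface PageLike {
  waitForResponse(
    predicate: (response: ResponseLike) => boolean,
    options?: { timeout?: number },
  ): Promise<ResponseLike>;
}
interface ClickableLike {
  click(): Promise<void>;
}

// Timeouts quoted in the problem statement.
const PUT_TIMEOUT_MS = 15000;
const GET_TIMEOUT_MS = 10000;

// Builds a predicate matching a /feature-flags response for one HTTP method.
function isFeatureFlagsResponse(method: "GET" | "PUT") {
  return (response: ResponseLike): boolean =>
    response.url().includes("/feature-flags") &&
    response.request().method() === method;
}

// Hypothetical helper: click a toggle, wait for the PUT, then wait for the GET.
async function toggleAndWait(page: PageLike, toggle: ClickableLike): Promise<void> {
  // Start listening for the PUT before clicking so the response is not missed.
  const putResponse = page.waitForResponse(isFeatureFlagsResponse("PUT"), {
    timeout: PUT_TIMEOUT_MS,
  });
  await Promise.all([putResponse, toggle.click()]);

  // Wait for the subsequent GET, mirroring the sequence in the bullets above.
  await page.waitForResponse(isFeatureFlagsResponse("GET"), {
    timeout: GET_TIMEOUT_MS,
  });
}
```

Because each wait carries its own fixed timeout, a slow PUT or a slow GET fails the test on its own, which matches the failures described for CI.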

### Root Cause Analysis

Investigation points to three contributing causes:

1. **Backend N+1 Query Pattern** (PRIMARY)
   - `GetFlags()` makes 3 separate SQLite queries (one per feature flag)
   - `UpdateFlags()` makes additional individual queries per flag
   - Each toggle operation requires: 3 queries (PUT) + 3 queries (GET) = 6 DB operations minimum

2. **CI Environment Characteristics**
   - Slower disk I/O compared to local development
   - SQLite on CI runners lacks shared memory optimizations
   - No database query caching layer
   - Sequential query execution compounds latency

3. **Test Pattern Amplification**
   - Tests explicitly set lower timeouts (15s, 10s) than helper defaults (30s)
   - Immediate GET after PUT doesn't allow for state propagation
   - No retry logic for transient failures
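
The third cause notes that the tests have no retry logic for transient failures. A minimal sketch of retry with exponential backoff is shown below. `retryWithBackoff`, `backoffDelayMs`, and the option names are hypothetical and not taken from the test suite. The `sleep` option is injectable only so the sketch can be exercised without real delays.

```typescript
// Delay before the retry that follows attempt `attempt` (0-based):
// baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number;
  maxDelayMs: number;
  sleep?: (ms: number) => Promise<void>; // assumption: injectable for tests
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `action` until it succeeds or `maxAttempts` is exhausted,
// then rethrows the last error.
async function retryWithBackoff<T>(
  action: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  if (options.maxAttempts < 1) {
    throw new Error("maxAttempts must be at least 1");
  }
  const sleep = options.sleep ?? defaultSleep;
  let lastError: unknown;
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // No delay after the final failed attempt.
      if (attempt < options.maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, options.baseDelayMs, options.maxDelayMs));
      }
    }
  }
  throw lastError;
}
```

One design consideration: the retried action should be idempotent. A request that sets a flag to an explicit value is safe to repeat, whereas repeating a click that flips a toggle could leave the flag in the opposite state.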

### Objectives

1. **Immediate**: Increase timeouts and add strategic waits to fix CI failures
2. **Short-term**: Improve test reliability with better wait strategies
3. **Long-term**: Document backend performance optimization opportunities
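
As one illustration of a "better wait strategy", a test can poll the flag state until it matches the expected value instead of sleeping for a fixed interval. The sketch below is an assumption-laden example: `waitForFlagState`, `FlagState`, and `PollOptions` are hypothetical names, and `fetchFlags` stands in for whatever call the tests use to read `/feature-flags`.

```typescript
type FlagState = Record<string, boolean>;

interface PollOptions {
  timeoutMs: number;
  intervalMs: number;
  now?: () => number; // assumption: injectable clock for tests
  sleep?: (ms: number) => Promise<void>; // assumption: injectable for tests
}

// Polls `fetchFlags` until every key in `expected` has the expected value,
// or throws once the timeout has elapsed. Errors from `fetchFlags` propagate.
async function waitForFlagState(
  fetchFlags: () => Promise<FlagState>,
  expected: FlagState,
  options: PollOptions,
): Promise<FlagState> {
  const now = options.now ?? (() => Date.now());
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const deadline = now() + options.timeoutMs;

  while (true) {
    const current = await fetchFlags();
    const matches = Object.keys(expected).every(
      (key) => current[key] === expected[key],
    );
    if (matches) {
      return current;
    }
    if (now() >= deadline) {
      throw new Error(
        `Flags did not reach ${JSON.stringify(expected)}; last seen ${JSON.stringify(current)}`,
      );
    }
    await sleep(options.intervalMs);
  }
}
```

A condition-based wait like this returns as soon as the state matches, so it costs little on fast machines while tolerating slower CI runs up to the timeout. Playwright's built-in `expect.poll` covers similar ground and may be preferable where it fits.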