E2E Test Reorganization Implementation
Problem Statement
CI E2E tests were timing out at 20 minutes even with 8 shards per browser (24 total shards) because:
- Cross-Shard Contamination: Security enforcement tests that enable/disable Cerberus were randomly distributed across shards, causing ACL and rate limit failures in non-security tests
- Global State Interference: Tests modifying global security state (Cerberus middleware) were running in parallel, causing unpredictable test failures
- Uneven Distribution: Random shard distribution didn't account for test dependencies and sequential requirements
Solution Architecture
Test Isolation Strategy
Reorganized tests into two categories with dedicated job execution:
Category 1: Security Enforcement Tests (Isolated Serial Execution)
- Location: `tests/security-enforcement/`
- Job Names: `e2e-chromium-security`, `e2e-firefox-security`, `e2e-webkit-security`
- Sharding: 1 shard per browser (no sharding within security tests)
- Environment: `CHARON_SECURITY_TESTS_ENABLED: "true"`
- Timeout: 30 minutes (allows for sequential execution)
- Test Files:
  - `rate-limit-enforcement.spec.ts`
  - `crowdsec-enforcement.spec.ts`
  - `emergency-token.spec.ts` (break glass protocol)
  - `combined-enforcement.spec.ts`
  - `security-headers-enforcement.spec.ts`
  - `waf-enforcement.spec.ts`
  - `acl-enforcement.spec.ts`
  - `zzz-admin-whitelist-blocking.spec.ts` (`test.describe.serial`)
  - `zzzz-break-glass-recovery.spec.ts` (`test.describe.serial`)
  - `emergency-reset.spec.ts`
Execution Flow (as specified by user):
- Enable Cerberus security module
- Run tests requiring security ON (ACL, WAF, rate limiting, etc.)
- Execute break glass protocol test (`emergency-token.spec.ts`)
- Run tests requiring security OFF (verify bypass)
Category 2: Non-Security Tests (Parallel Sharded Execution)
- Job Names: `e2e-chromium` (Shard 1-4), `e2e-firefox` (Shard 1-4), `e2e-webkit` (Shard 1-4)
- Sharding: 4 shards per browser (12 total shards)
- Environment: `CHARON_SECURITY_TESTS_ENABLED: "false"` (Cerberus OFF by default)
- Timeout: 20 minutes per shard
- Test Directories:
  - `tests/core`
  - `tests/dns-provider-crud.spec.ts`
  - `tests/dns-provider-types.spec.ts`
  - `tests/emergency-server`
  - `tests/integration`
  - `tests/manual-dns-provider.spec.ts`
  - `tests/monitoring`
  - `tests/security` (UI/dashboard tests, not enforcement)
  - `tests/settings`
  - `tests/tasks`
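The point of 4 shards per browser is that each non-security test file runs in exactly one shard of its browser's job. A rough mental model of `--shard=i/n` is a contiguous split of an ordered file list into `n` chunks whose sizes differ by at most one. This model is an assumption for illustration and is not Playwright's exact algorithm, which distributes groups of tests and may differ in detail. The helper `shardOf` and the placeholder file names are hypothetical.

```typescript
// Simplified model of `--shard=index/total` (assumption, not Playwright's
// implementation): split an ordered list of test files into `total`
// contiguous chunks whose sizes differ by at most one.
function shardOf(files: readonly string[], index: number, total: number): string[] {
  if (total < 1 || index < 1 || index > total) {
    throw new RangeError(`invalid shard ${index}/${total}`);
  }
  const base = Math.floor(files.length / total); // minimum files per shard
  const extra = files.length % total; // the first `extra` shards take one more
  const start = (index - 1) * base + Math.min(index - 1, extra);
  const size = base + (index <= extra ? 1 : 0);
  return files.slice(start, start + size);
}

// Ten placeholder spec files standing in for the non-security suite.
const files: string[] = [];
for (let i = 1; i <= 10; i++) {
  files.push(`spec-${i}.spec.ts`);
}

for (let shard = 1; shard <= 4; shard++) {
  console.log(`shard ${shard}/4: ${shardOf(files, shard, 4).join(", ")}`);
}
```

Under this model, concatenating shards 1 through 4 gives back the original list with no file repeated. That is the property that keeps a test from running twice or being skipped.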
Job Distribution
Before:
```
Total: 24 shards (8 per browser)
├── Chromium: 8 shards (all tests randomly distributed)
├── Firefox: 8 shards (all tests randomly distributed)
└── WebKit: 8 shards (all tests randomly distributed)
```
Issues:
- Security tests randomly distributed across all shards
- Cerberus state changes affecting parallel test execution
- ACL/rate limit failures in non-security tests
After:
```
Total: 15 jobs
├── Security Enforcement (3 jobs)
│   ├── Chromium Security: 1 shard (serial execution, 30min timeout)
│   ├── Firefox Security: 1 shard (serial execution, 30min timeout)
│   └── WebKit Security: 1 shard (serial execution, 30min timeout)
│
└── Non-Security (12 shards)
    ├── Chromium: 4 shards (parallel, Cerberus OFF, 20min timeout)
    ├── Firefox: 4 shards (parallel, Cerberus OFF, 20min timeout)
    └── WebKit: 4 shards (parallel, Cerberus OFF, 20min timeout)
```
Benefits:
- Security tests isolated, run serially without cross-shard interference
- Non-security tests always run with Cerberus OFF (default state)
- Reduced total job count from 24 to 15
- Clear separation of concerns
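The job counts and the percentage reduction follow from simple arithmetic over the three browsers. Here is a small sketch that reproduces them; the names `BROWSERS` and `countJobs` are illustrative only.

```typescript
// Illustrative model of the job matrix (names are hypothetical).
const BROWSERS: readonly string[] = ["chromium", "firefox", "webkit"];

// Jobs = browsers x (sharded jobs + dedicated security jobs) per browser.
function countJobs(shardsPerBrowser: number, securityJobsPerBrowser: number): number {
  return BROWSERS.length * (shardsPerBrowser + securityJobsPerBrowser);
}

const before = countJobs(8, 0); // 3 x 8 = 24
const after = countJobs(4, 1); // 3 x (4 + 1) = 15
const reductionPercent = ((before - after) / before) * 100; // 9 / 24 = 37.5%

console.log(`${before} -> ${after} jobs (${reductionPercent}% fewer)`);
```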
Implementation Details
Workflow Changes
Security Enforcement Jobs (New)
Created dedicated jobs for security enforcement tests:
```yaml
e2e-{browser}-security:
  name: E2E {Browser} (Security Enforcement)
  timeout-minutes: 30
  env:
    CHARON_SECURITY_TESTS_ENABLED: "true"
  strategy:
    matrix:
      shard: [1] # Single shard
      total-shards: [1]
  steps:
    - name: Run Security Enforcement Tests
      run: npx playwright test --project={browser} tests/security-enforcement/
```
Key Changes:
- Single shard per browser (no parallel execution within security tests)
- Explicitly targets the `tests/security-enforcement/` directory
- 30-minute timeout to accommodate serial execution
- `CHARON_SECURITY_TESTS_ENABLED: "true"` enables Cerberus middleware
Non-Security Jobs (Updated)
Updated existing browser jobs to exclude security enforcement tests:
```yaml
e2e-{browser}:
  name: E2E {Browser} (Shard ${{ matrix.shard }}/${{ matrix.total-shards }})
  timeout-minutes: 20
  env:
    CHARON_SECURITY_TESTS_ENABLED: "false" # Cerberus OFF
  strategy:
    matrix:
      shard: [1, 2, 3, 4] # 4 shards
      total-shards: [4]
  steps:
    - name: Run {Browser} tests (Non-Security)
      # Directory filters end in "/" because Playwright matches these arguments
      # against each test file's path; a bare "tests/security" would also
      # select files under tests/security-enforcement/.
      run: |
        npx playwright test --project={browser} \
          tests/core/ \
          tests/dns-provider-crud.spec.ts \
          tests/dns-provider-types.spec.ts \
          tests/emergency-server/ \
          tests/integration/ \
          tests/manual-dns-provider.spec.ts \
          tests/monitoring/ \
          tests/security/ \
          tests/settings/ \
          tests/tasks/ \
          --shard=${{ matrix.shard }}/${{ matrix.total-shards }}
```
Key Changes:
- Reduced from 8 shards to 4 shards per browser
- Explicitly lists test directories (excludes `tests/security-enforcement/`)
- `CHARON_SECURITY_TESTS_ENABLED: "false"` keeps Cerberus OFF by default
- 20-minute timeout per shard (sufficient for non-security tests)
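Excluding `tests/security-enforcement/` by listing paths only works if none of the listed paths also matches it. The Playwright CLI documentation describes positional filter arguments as regular expressions matched against each test file's path. Under that assumption (a plain substring match behaves the same for these inputs), `tests/security` also selects the enforcement specs, while `tests/security/` with a trailing slash does not. The helper `selectedBy`, the `/repo` prefix and the UI spec file name are hypothetical.

```typescript
// Assumption: each positional filter is treated as a regular expression and
// tested against the test file's path. `selectedBy` is a hypothetical helper.
function selectedBy(filter: string, filePath: string): boolean {
  return new RegExp(filter).test(filePath);
}

// One enforcement spec from this document and one hypothetical UI spec.
const enforcementFile = "/repo/tests/security-enforcement/acl-enforcement.spec.ts";
const uiFile = "/repo/tests/security/example-dashboard.spec.ts";

// Without a trailing slash, "tests/security" is a prefix of
// "tests/security-enforcement", so the enforcement spec is selected too.
console.log(selectedBy("tests/security", enforcementFile)); // true
// With a trailing slash, only files under tests/security/ match.
console.log(selectedBy("tests/security/", enforcementFile)); // false
console.log(selectedBy("tests/security/", uiFile)); // true
```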
Environment Variable Strategy
| Job Type | Variable | Value | Purpose |
|---|---|---|---|
| Security Enforcement | `CHARON_SECURITY_TESTS_ENABLED` | `"true"` | Enable Cerberus middleware for enforcement tests |
| Non-Security | `CHARON_SECURITY_TESTS_ENABLED` | `"false"` | Keep Cerberus OFF to prevent ACL/rate limit interference |
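On the consuming side, a test helper has to turn this string variable into a boolean. The sketch below treats only an explicit `"true"` as enabled, so an unset or misspelled value falls back to the default of Cerberus OFF. The function name `securityTestsEnabled` and the trimming and case rules are assumptions, not the project's actual code.

```typescript
// Hypothetical helper: read CHARON_SECURITY_TESTS_ENABLED from an
// environment map. Only "true" (case-insensitive, surrounding whitespace
// ignored) enables security tests; anything else means Cerberus OFF.
type Env = { [name: string]: string | undefined };

function securityTestsEnabled(env: Env): boolean {
  const raw = env["CHARON_SECURITY_TESTS_ENABLED"];
  return raw !== undefined && raw.trim().toLowerCase() === "true";
}

console.log(securityTestsEnabled({ CHARON_SECURITY_TESTS_ENABLED: "true" })); // true
console.log(securityTestsEnabled({ CHARON_SECURITY_TESTS_ENABLED: "false" })); // false
console.log(securityTestsEnabled({})); // false
```

In a Node-based helper the argument would typically be `process.env`. Failing closed to OFF matches the intent that non-security shards never see ACL or rate limiting.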
Benefits
1. Test Isolation
- Security enforcement tests run independently without affecting other shards
- No cross-shard contamination from global state changes
- Clear separation between enforcement tests and regular functionality tests
2. Predictable Execution
- Security tests execute serially in a controlled environment
- Proper test execution order: enable → tests ON → break glass → tests OFF
- Non-security tests always start with Cerberus OFF (default state)
3. Performance Optimization
- Reduced total job count from 24 to 15 (37.5% reduction)
- Eliminated test failures caused by ACL/rate limit interference
- Balanced shard durations to stay under timeout limits
4. Maintainability
- Explicit test path listing makes it clear which tests run where
- Security enforcement tests are clearly identified and isolated
- Easy to add new test categories without affecting security tests
5. Debugging
- Failures in security enforcement jobs are clearly isolated
- Non-security test failures can't be caused by security middleware interference
- Clearer artifact naming: `playwright-report-{browser}-security` vs `playwright-report-{browser}-{shard}`
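Because all 15 jobs upload reports in the same workflow run, their artifact names must not collide. This matters in particular if the workflow uses `actions/upload-artifact@v4`, which does not allow uploading to the same artifact name twice in one run. The sketch below generates every name under the scheme above and checks that they are unique. The helper names are hypothetical.

```typescript
// Hypothetical helpers reproducing the artifact naming scheme:
//   security job: playwright-report-{browser}-security
//   sharded job:  playwright-report-{browser}-{shard}
function artifactName(browser: string, shard: number | "security"): string {
  return `playwright-report-${browser}-${shard}`;
}

function allArtifactNames(browsers: readonly string[], shardsPerBrowser: number): string[] {
  const names: string[] = [];
  for (const browser of browsers) {
    names.push(artifactName(browser, "security"));
    for (let shard = 1; shard <= shardsPerBrowser; shard++) {
      names.push(artifactName(browser, shard));
    }
  }
  return names;
}

// True when no name appears twice.
function allUnique(names: readonly string[]): boolean {
  return names.every((name, i) => names.indexOf(name) === i);
}

const names = allArtifactNames(["chromium", "firefox", "webkit"], 4);
console.log(names.length, allUnique(names)); // 15 true
```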
Testing Strategy
Test Execution Order (User-Specified)
For security enforcement tests, the execution follows this sequence:
1. Enable Security Module
   - Tests that enable Cerberus middleware
2. Tests Requiring Security ON
   - ACL enforcement verification
   - WAF rule enforcement
   - Rate limiting enforcement
   - CrowdSec integration enforcement
   - Security headers enforcement
   - Combined enforcement scenarios
3. Break Glass Protocol
   - `emergency-token.spec.ts`: Emergency bypass testing
4. Tests Requiring Security OFF
   - Verify bypass functionality
   - Test default (Cerberus disabled) behavior
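The four-phase order can be made checkable by encoding the phases as an ordered list. A sequence is then rejected if it ever steps backwards or skips a phase. This could be applied, for example, to a record of which phase each executed test belonged to. The phase labels and `followsRequiredOrder` are hypothetical; nothing in the workflow above produces such a record.

```typescript
// Hypothetical phase labels, in the required execution order.
const PHASES = ["enable-security", "security-on", "break-glass", "security-off"] as const;
type Phase = (typeof PHASES)[number];

// True when the observed phases never step backwards and every phase
// occurs at least once (consecutive repeats of a phase are allowed).
function followsRequiredOrder(observed: readonly Phase[]): boolean {
  const seen: boolean[] = PHASES.map(() => false);
  let lastRank = -1;
  for (const phase of observed) {
    const rank = PHASES.indexOf(phase);
    if (rank < lastRank) return false; // an earlier phase after a later one
    lastRank = rank;
    seen[rank] = true;
  }
  return seen.every((s) => s);
}

console.log(followsRequiredOrder(["enable-security", "security-on", "security-on", "break-glass", "security-off"])); // true
console.log(followsRequiredOrder(["enable-security", "break-glass", "security-on", "security-off"])); // false
```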
Test File Naming Convention
Security enforcement tests use prefixes for ordering:
- Regular tests: `*-enforcement.spec.ts`
- Serialized tests: `zzz-*-blocking.spec.ts` (`test.describe.serial`)
- Final tests: `zzzz-*-recovery.spec.ts` (`test.describe.serial`)
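This naming convention makes Playwright run the `zzz-` and `zzzz-` files last within the single security shard, provided the shard runs with a single worker: Playwright runs test files in alphabetical order only when parallel execution is disabled.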
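The resulting order can be checked by sorting the file names listed under Category 1. This assumes a single worker (for example Playwright's `--workers=1` flag or `workers: 1` in the config) and a plain lexicographic comparison of file names. The sketch uses JavaScript's default sort, which compares UTF-16 code units; whether Playwright compares names in exactly this way is an assumption.

```typescript
// Security enforcement spec files, as listed under Category 1.
const securitySpecs: string[] = [
  "rate-limit-enforcement.spec.ts",
  "crowdsec-enforcement.spec.ts",
  "emergency-token.spec.ts",
  "combined-enforcement.spec.ts",
  "security-headers-enforcement.spec.ts",
  "waf-enforcement.spec.ts",
  "acl-enforcement.spec.ts",
  "zzz-admin-whitelist-blocking.spec.ts",
  "zzzz-break-glass-recovery.spec.ts",
  "emergency-reset.spec.ts",
];

// Default sort compares UTF-16 code units: "-" (0x2D) sorts before
// "z" (0x7A), so "zzz-..." precedes "zzzz-...".
const runOrder: string[] = securitySpecs.slice().sort();

runOrder.forEach((file, i) => console.log(`${i + 1}. ${file}`));
```

Under this assumption the two `zzz` files do run last. The remaining names fall in plain alphabetical order, so `emergency-token.spec.ts` (the break glass test) sorts ahead of `rate-limit-enforcement.spec.ts` and `waf-enforcement.spec.ts`.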
Migration Impact
CI Pipeline Changes
Before:
- 24 parallel jobs (8 shards × 3 browsers)
- Random test distribution
- Frequent failures due to security middleware interference
After:
- 15 jobs (3 security + 12 non-security)
- Deterministic test distribution
- Security tests isolated to prevent interference
Execution Time
Estimated Timings:
- Security enforcement jobs: ~25 minutes each (serial execution)
- Non-security shards: ~15 minutes each (parallel execution)
- Total pipeline time: ~30 minutes (parallel job execution)
Previous Timings:
- All shards: Exceeding 20 minutes with frequent timeouts
- Total pipeline time: Failing due to timeouts
Validation Checklist
- Security enforcement tests run serially without cross-shard interference
- Non-security tests complete within 20-minute timeout
- All browsers (Chromium, Firefox, WebKit) have dedicated security enforcement jobs
- `CHARON_SECURITY_TESTS_ENABLED` correctly set for each job type
- Test artifacts clearly named by category (security vs shard number)
- CI pipeline completes successfully without timeout errors
- No ACL/rate limit failures in non-security test shards
Future Improvements
Potential Optimizations
1. Further Shard Balancing
   - Profile individual test execution times
   - Redistribute tests across shards to balance duration
   - Consider 5-6 shards if any shard approaches the 20-minute timeout
2. Test Grouping
   - Group similar test types together for better cache utilization
   - Consider browser-specific test isolation (e.g., Firefox-specific tests)
3. Dynamic Sharding
   - Use Playwright's built-in test duration data for intelligent distribution
   - Automatically adjust shard count based on test additions
4. Parallel Security Tests
   - If security tests grow significantly, consider splitting into sub-categories
   - Example: WAF tests, ACL tests, rate limit tests in separate shards
   - Requires careful state management to avoid interference
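Shard balancing and dynamic sharding both come down to assigning files to shards by measured duration. A common greedy heuristic for this is Longest Processing Time first (LPT): take files from longest to shortest and place each one on the currently lightest shard. The sketch below assumes per-file durations are available, for instance exported from earlier runs. The durations and file names are made-up placeholders, and `balanceShards` is hypothetical. Since `--shard` does its own splitting, applying such an assignment would presumably mean passing each shard's file list to its job explicitly.

```typescript
// Hypothetical input: per-file durations, e.g. exported from earlier CI runs.
interface TimedSpec {
  file: string;
  seconds: number;
}

// Longest Processing Time first: place the longest remaining spec on the
// shard with the smallest running total. Greedy, so near-balanced rather
// than guaranteed optimal.
function balanceShards(specs: readonly TimedSpec[], shardCount: number): TimedSpec[][] {
  if (shardCount < 1) throw new RangeError("shardCount must be at least 1");
  const shards: TimedSpec[][] = [];
  const totals: number[] = [];
  for (let i = 0; i < shardCount; i++) {
    shards.push([]);
    totals.push(0);
  }
  const byLongest = specs.slice().sort((a, b) => b.seconds - a.seconds);
  for (const spec of byLongest) {
    let target = 0;
    for (let i = 1; i < shardCount; i++) {
      if (totals[i] < totals[target]) target = i; // lightest shard so far
    }
    shards[target].push(spec);
    totals[target] += spec.seconds;
  }
  return shards;
}

function shardSeconds(shards: TimedSpec[][]): number[] {
  return shards.map((shard) => shard.reduce((sum, spec) => sum + spec.seconds, 0));
}

// Made-up durations in seconds, for illustration only.
const timings: TimedSpec[] = [
  { file: "a.spec.ts", seconds: 300 },
  { file: "b.spec.ts", seconds: 240 },
  { file: "c.spec.ts", seconds: 200 },
  { file: "d.spec.ts", seconds: 180 },
  { file: "e.spec.ts", seconds: 120 },
  { file: "f.spec.ts", seconds: 90 },
  { file: "g.spec.ts", seconds: 60 },
  { file: "h.spec.ts", seconds: 30 },
];

console.log(shardSeconds(balanceShards(timings, 3)).join(", ")); // 390, 420, 410
```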
Related Documentation
- User request: "We need to make sure all the security tests are ran in the same shard...Cerberus should be off by default so all the other tests in other shards arent hitting the acl or rate limit and failing"
- Test execution flow specified by user: "enable security → tests requiring security ON → break glass protocol → tests requiring security OFF"
- Original issue: Tests timing out at 20 minutes even with 6 shards due to cross-shard security middleware interference
Rollout Plan
Phase 1: Implementation ✅
- Create dedicated security enforcement jobs for all browsers
- Update non-security jobs to exclude security-enforcement directory
- Set `CHARON_SECURITY_TESTS_ENABLED` appropriately for each job type
- Document changes and strategy
Phase 2: Validation (In Progress)
- Run full CI pipeline to verify no timeout errors
- Validate security enforcement tests execute in correct order
- Confirm non-security tests don't hit ACL/rate limit failures
- Monitor execution times to ensure shards stay under timeout limits
Phase 3: Optimization (TBD)
- Profile test execution times per shard
- Adjust shard distribution if any shard approaches timeout
- Consider further optimizations based on real-world execution data
Conclusion
This reorganization addresses the root cause of CI timeout and test interference issues by:
- Isolating security enforcement tests in dedicated serial jobs
- Separating concerns between security testing and functional testing
- Ensuring non-security tests always run with Cerberus OFF (default state)
- Preventing cross-shard contamination from global security state changes
The implementation follows the user's explicit requirements and keeps the setup easy to follow through descriptive job names, per-job environment variable configuration, and explicit test path specifications.