fix(ci): reorganize E2E tests for improved isolation and execution stability

This commit is contained in:
GitHub Actions
2026-02-05 01:47:22 +00:00
parent 6aea2380b0
commit 39b5b8a928
2 changed files with 785 additions and 18 deletions

View File

@@ -0,0 +1,322 @@
# E2E Test Reorganization Implementation
## Problem Statement
CI E2E tests were timing out at 20 minutes even with 8 shards per browser (24 total shards) because:
1. **Cross-Shard Contamination**: Security enforcement tests that enable/disable Cerberus were randomly distributed across shards, causing ACL and rate limit failures in non-security tests
2. **Global State Interference**: Tests modifying global security state (Cerberus middleware) were running in parallel, causing unpredictable test failures
3. **Uneven Distribution**: Random shard distribution didn't account for test dependencies and sequential requirements
## Solution Architecture
### Test Isolation Strategy
Reorganized tests into two categories with dedicated job execution:
#### **Category 1: Security Enforcement Tests (Isolated Serial Execution)**
- **Location**: `tests/security-enforcement/`
- **Job Names**:
- `e2e-chromium-security`
- `e2e-firefox-security`
- `e2e-webkit-security`
- **Sharding**: 1 shard per browser (no sharding within security tests)
- **Environment**: `CHARON_SECURITY_TESTS_ENABLED: "true"`
- **Timeout**: 30 minutes (allows for sequential execution)
- **Test Files**:
- `rate-limit-enforcement.spec.ts`
- `crowdsec-enforcement.spec.ts`
- `emergency-token.spec.ts` (break glass protocol)
- `combined-enforcement.spec.ts`
- `security-headers-enforcement.spec.ts`
- `waf-enforcement.spec.ts`
- `acl-enforcement.spec.ts`
- `zzz-admin-whitelist-blocking.spec.ts` (test.describe.serial)
- `zzzz-break-glass-recovery.spec.ts` (test.describe.serial)
- `emergency-reset.spec.ts`
**Execution Flow** (as specified by user):
1. Enable Cerberus security module
2. Run tests requiring security ON (ACL, WAF, rate limiting, etc.)
3. Execute break glass protocol test (`emergency-token.spec.ts`)
4. Run tests requiring security OFF (verify bypass)
#### **Category 2: Non-Security Tests (Parallel Sharded Execution)**
- **Job Names**:
- `e2e-chromium` (Shard 1-4)
- `e2e-firefox` (Shard 1-4)
- `e2e-webkit` (Shard 1-4)
- **Sharding**: 4 shards per browser (12 total shards)
- **Environment**: `CHARON_SECURITY_TESTS_ENABLED: "false"`**Cerberus OFF by default**
- **Timeout**: 20 minutes per shard
- **Test Directories**:
- `tests/core`
- `tests/dns-provider-crud.spec.ts`
- `tests/dns-provider-types.spec.ts`
- `tests/emergency-server`
- `tests/integration`
- `tests/manual-dns-provider.spec.ts`
- `tests/monitoring`
- `tests/security` (UI/dashboard tests, not enforcement)
- `tests/settings`
- `tests/tasks`
### Job Distribution
**Before**:
```
Total: 24 shards (8 per browser)
├── Chromium: 8 shards (all tests randomly distributed)
├── Firefox: 8 shards (all tests randomly distributed)
└── WebKit: 8 shards (all tests randomly distributed)
Issues:
- Security tests randomly distributed across all shards
- Cerberus state changes affecting parallel test execution
- ACL/rate limit failures in non-security tests
```
**After**:
```
Total: 15 jobs
├── Security Enforcement (3 jobs)
│ ├── Chromium Security: 1 shard (serial execution, 30min timeout)
│ ├── Firefox Security: 1 shard (serial execution, 30min timeout)
│ └── WebKit Security: 1 shard (serial execution, 30min timeout)
└── Non-Security (12 shards)
├── Chromium: 4 shards (parallel, Cerberus OFF, 20min timeout)
├── Firefox: 4 shards (parallel, Cerberus OFF, 20min timeout)
└── WebKit: 4 shards (parallel, Cerberus OFF, 20min timeout)
Benefits:
- Security tests isolated, run serially without cross-shard interference
- Non-security tests always run with Cerberus OFF (default state)
- Reduced total job count from 24 to 15
- Clear separation of concerns
```
## Implementation Details
### Workflow Changes
#### Security Enforcement Jobs (New)
Created dedicated jobs for security enforcement tests:
```yaml
e2e-{browser}-security:
name: E2E {Browser} (Security Enforcement)
timeout-minutes: 30
env:
CHARON_SECURITY_TESTS_ENABLED: "true"
strategy:
matrix:
shard: [1] # Single shard
total-shards: [1]
steps:
- name: Run Security Enforcement Tests
run: npx playwright test --project={browser} tests/security-enforcement/
```
**Key Changes**:
- Single shard per browser (no parallel execution within security tests)
- Explicitly targets `tests/security-enforcement/` directory
- 30-minute timeout to accommodate serial execution
- `CHARON_SECURITY_TESTS_ENABLED: "true"` enables Cerberus middleware
#### Non-Security Jobs (Updated)
Updated existing browser jobs to exclude security enforcement tests:
```yaml
e2e-{browser}:
name: E2E {Browser} (Shard ${{ matrix.shard }}/${{ matrix.total-shards }})
timeout-minutes: 20
env:
CHARON_SECURITY_TESTS_ENABLED: "false" # Cerberus OFF
strategy:
matrix:
shard: [1, 2, 3, 4] # 4 shards
total-shards: [4]
steps:
- name: Run {Browser} tests (Non-Security)
run: |
npx playwright test --project={browser} \
tests/core \
tests/dns-provider-crud.spec.ts \
tests/dns-provider-types.spec.ts \
tests/emergency-server \
tests/integration \
tests/manual-dns-provider.spec.ts \
tests/monitoring \
tests/security \
tests/settings \
tests/tasks \
--shard=${{ matrix.shard }}/${{ matrix.total-shards }}
```
**Key Changes**:
- Reduced from 8 shards to 4 shards per browser
- Explicitly lists test directories (excludes `tests/security-enforcement/`)
- `CHARON_SECURITY_TESTS_ENABLED: "false"` keeps Cerberus OFF by default
- 20-minute timeout per shard (sufficient for non-security tests)
### Environment Variable Strategy
| Job Type | Variable | Value | Purpose |
|----------|----------|-------|---------|
| Security Enforcement | `CHARON_SECURITY_TESTS_ENABLED` | `"true"` | Enable Cerberus middleware for enforcement tests |
| Non-Security | `CHARON_SECURITY_TESTS_ENABLED` | `"false"` | Keep Cerberus OFF to prevent ACL/rate limit interference |
## Benefits
### 1. **Test Isolation**
- Security enforcement tests run independently without affecting other shards
- No cross-shard contamination from global state changes
- Clear separation between enforcement tests and regular functionality tests
### 2. **Predictable Execution**
- Security tests execute serially in a controlled environment
- Proper test execution order: enable → tests ON → break glass → tests OFF
- Non-security tests always start with Cerberus OFF (default state)
### 3. **Performance Optimization**
- Reduced total job count from 24 to 15 (37.5% reduction)
- Eliminated failed tests due to ACL/rate limit interference
- Balanced shard durations to stay under timeout limits
### 4. **Maintainability**
- Explicit test path listing makes it clear which tests run where
- Security enforcement tests are clearly identified and isolated
- Easy to add new test categories without affecting security tests
### 5. **Debugging**
- Failures in security enforcement jobs are clearly isolated
- Non-security test failures can't be caused by security middleware interference
- Clearer artifact naming: `playwright-report-{browser}-security` vs `playwright-report-{browser}-{shard}`
## Testing Strategy
### Test Execution Order (User-Specified)
For security enforcement tests, the execution follows this sequence:
1. **Enable Security Module**
- Tests that enable Cerberus middleware
2. **Tests Requiring Security ON**
- ACL enforcement verification
- WAF rule enforcement
- Rate limiting enforcement
- CrowdSec integration enforcement
- Security headers enforcement
- Combined enforcement scenarios
3. **Break Glass Protocol**
- `emergency-token.spec.ts` - Emergency bypass testing
4. **Tests Requiring Security OFF**
- Verify bypass functionality
- Test default (Cerberus disabled) behavior
### Test File Naming Convention
Security enforcement tests use prefixes for ordering:
- Regular tests: `*-enforcement.spec.ts`
- Serialized tests: `zzz-*-blocking.spec.ts` (test.describe.serial)
- Final tests: `zzzz-*-recovery.spec.ts` (test.describe.serial)
This naming convention ensures Playwright executes tests in the correct order even within the single security shard.
## Migration Impact
### CI Pipeline Changes
**Before**:
- 24 parallel jobs (8 shards × 3 browsers)
- Random test distribution
- Frequent failures due to security middleware interference
**After**:
- 15 jobs (3 security + 12 non-security)
- Deterministic test distribution
- Security tests isolated to prevent interference
### Execution Time
**Estimated Timings**:
- Security enforcement jobs: ~25 minutes each (serial execution)
- Non-security shards: ~15 minutes each (parallel execution)
- Total pipeline time: ~30 minutes (parallel job execution)
**Previous Timings**:
- All shards: Exceeding 20 minutes with frequent timeouts
- Total pipeline time: Failing due to timeouts
## Validation Checklist
- [ ] Security enforcement tests run serially without cross-shard interference
- [ ] Non-security tests complete within 20-minute timeout
- [ ] All browsers (Chromium, Firefox, WebKit) have dedicated security enforcement jobs
- [ ] `CHARON_SECURITY_TESTS_ENABLED` correctly set for each job type
- [ ] Test artifacts clearly named by category (security vs shard number)
- [ ] CI pipeline completes successfully without timeout errors
- [ ] No ACL/rate limit failures in non-security test shards
## Future Improvements
### Potential Optimizations
1. **Further Shard Balancing**
- Profile individual test execution times
- Redistribute tests across shards to balance duration
- Consider 5-6 shards if any shard approaches 20-minute timeout
2. **Test Grouping**
- Group similar test types together for better cache utilization
- Consider browser-specific test isolation (e.g., Firefox-specific tests)
3. **Dynamic Sharding**
- Use Playwright's built-in test duration data for intelligent distribution
- Automatically adjust shard count based on test additions
4. **Parallel Security Tests**
- If security tests grow significantly, consider splitting into sub-categories
- Example: WAF tests, ACL tests, rate limit tests in separate shards
- Requires careful state management to avoid interference
## Related Documentation
- User request: "We need to make sure all the security tests are ran in the same shard...Cerberus should be off by default so all the other tests in other shards arent hitting the acl or rate limit and failing"
- Test execution flow specified by user: "enable security → tests requiring security ON → break glass protocol → tests requiring security OFF"
- Original issue: Tests timing out at 20 minutes even with 6 shards due to cross-shard security middleware interference
## Rollout Plan
### Phase 1: Implementation ✅
- [x] Create dedicated security enforcement jobs for all browsers
- [x] Update non-security jobs to exclude security-enforcement directory
- [x] Set `CHARON_SECURITY_TESTS_ENABLED` appropriately for each job type
- [x] Document changes and strategy
### Phase 2: Validation (In Progress)
- [ ] Run full CI pipeline to verify no timeout errors
- [ ] Validate security enforcement tests execute in correct order
- [ ] Confirm non-security tests don't hit ACL/rate limit failures
- [ ] Monitor execution times to ensure shards stay under timeout limits
### Phase 3: Optimization (TBD)
- [ ] Profile test execution times per shard
- [ ] Adjust shard distribution if any shard approaches timeout
- [ ] Consider further optimizations based on real-world execution data
## Conclusion
This reorganization addresses the root cause of CI timeout and test interference issues by:
- **Isolating** security enforcement tests in dedicated serial jobs
- **Separating** concerns between security testing and functional testing
- **Ensuring** non-security tests always run with Cerberus OFF (default state)
- **Preventing** cross-shard contamination from global security state changes
The implementation follows the user's explicit requirements and maintains clarity through clear job naming, environment variable configuration, and explicit test path specifications.