Charon/docs/plans/current_spec.md
GitHub Actions 103f0e0ae9 fix: resolve WAF integration failure and E2E ACL deadlock
Fix integration scripts using wget-style curl options after Alpine→Debian
migration (PR #550). Add Playwright security test helpers to prevent ACL
from blocking subsequent tests.

Fix curl syntax in 5 scripts: -q -O- → -sf
Create security-helpers.ts with state capture/restore
Add emergency ACL reset to global-setup.ts
Fix fixture reuse bug in security-dashboard.spec.ts
Add security-helpers.md usage guide
Resolves WAF workflow "httpbin backend failed to start" error
2026-01-25 14:09:38 +00:00

# WAF Integration Workflow Fix: wget-style curl Syntax Migration
**Plan ID**: WAF-2026-001
**Status**: 📋 PENDING
**Priority**: High
**Created**: 2026-01-25
**Scope**: Fix integration test scripts using incorrect wget-style curl syntax
---
## Problem Summary
After migrating the Docker base image from Alpine to Debian Trixie (PR #550), the WAF integration workflow is failing. The root cause is **not** a missing `wget` command, but rather several integration test scripts using **wget-style options with curl** that don't work correctly.
### Root Cause
Multiple scripts use `curl -q -O-` which is **wget syntax, not curl syntax**:
| Syntax | Tool | Meaning |
|--------|------|---------|
| `-q` | **wget** | Quiet mode |
| `-q` | **curl** | **Not quiet mode** - `-q` (`--disable`) only tells curl to skip its config file, and only when it is the first argument |
| `-O-` | **wget** | Output to stdout |
| `-O-` | **curl** | **Wrong** - `-O` (`--remote-name`) saves the response to a file named after the remote path; a trailing `-` does not select stdout |
The correct curl equivalents are:
| wget | curl | Notes |
|------|------|-------|
| `wget -q` | `curl -s` | Silent mode |
| `wget -O-` | `curl` (no flag needed) | stdout is curl's default output |
| `wget -q -O- URL` | `curl -s URL` | Full equivalent |
| `wget -O filename` | `curl -o filename` | Note: lowercase `-o` in curl |
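
To catch any remaining occurrences before they reach CI, the pattern described above can be searched for directly. The following is a minimal sketch: the function name `find_wget_style_curl` is hypothetical, and the regex only covers the `curl -q -O...` shape listed in this plan, not every possible wget-style mistake.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list lines where curl is called with wget-style flags.
# Matches "curl -q -O", which covers both "-O-" and "-O <file>" variants.
find_wget_style_curl() {
  # -n prints line numbers; -E enables extended regular expressions.
  # When several files are passed, grep also prefixes each match with its file name.
  grep -nE 'curl[[:space:]]+-q[[:space:]]+-O' "$@"
}
```

For example, `find_wget_style_curl scripts/*.sh` would list candidate lines; an exit status of 1 means nothing matched.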
---
## Files Requiring Changes
### Priority 1: Integration Test Scripts (Blocking WAF Workflow)
| File | Line | Current Code | Issue |
|------|------|--------------|-------|
| [scripts/waf_integration.sh](../../scripts/waf_integration.sh#L205) | 205 | `curl -q -O- http://${BACKEND_CONTAINER}/get` | wget syntax |
| [scripts/cerberus_integration.sh](../../scripts/cerberus_integration.sh#L214) | 214 | `curl -q -O- http://${BACKEND_CONTAINER}/get` | wget syntax |
| [scripts/rate_limit_integration.sh](../../scripts/rate_limit_integration.sh#L190) | 190 | `curl -q -O- http://${BACKEND_CONTAINER}/get` | wget syntax |
| [scripts/crowdsec_startup_test.sh](../../scripts/crowdsec_startup_test.sh#L178) | 178 | `curl -q -O- http://127.0.0.1:8085/health` | wget syntax |
### Priority 2: Utility Scripts
| File | Line | Current Code | Issue |
|------|------|--------------|-------|
| [scripts/install-go-1.25.5.sh](../../scripts/install-go-1.25.5.sh#L18) | 18 | `curl -q -O "$TMPFILE" "URL"` | Wrong syntax - `-O` doesn't take an argument in curl |
---
## Detailed Fixes
### Fix 1: scripts/waf_integration.sh (Line 205)
**Current (broken):**
```bash
if docker exec ${CONTAINER_NAME} sh -c "curl -q -O- http://${BACKEND_CONTAINER}/get 2>/dev/null || curl -s http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then
```
**Fixed:**
```bash
if docker exec ${CONTAINER_NAME} sh -c "curl -sf http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then
```
**Notes:**
- `-s` = silent (no progress meter)
- `-f` = fail on HTTP errors (status 400 and above): the error body is not printed and curl returns a non-zero exit code
- Removed redundant fallback since the fix makes the command work correctly
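
Because `-sf` turns an HTTP error into a non-zero exit code, the same check can drive a bounded readiness loop. This is a sketch only: `wait_for_http` is a hypothetical helper, the attempt count and delay are assumed defaults, and it calls `curl` directly rather than through `docker exec` as the integration scripts do.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a URL with `curl -sf` until it answers or attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  local url=$1
  local attempts=${2:-10}
  local delay=${3:-1}
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s hides the progress meter; -f makes HTTP errors return a non-zero exit code
    if curl -sf "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # Every attempt failed
  return 1
}
```

A caller could use it as `wait_for_http "http://${BACKEND_CONTAINER}/get" 30 1 || exit 1`.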
---
### Fix 2: scripts/cerberus_integration.sh (Line 214)
**Current (broken):**
```bash
if docker exec ${CONTAINER_NAME} sh -c "curl -q -O- http://${BACKEND_CONTAINER}/get 2>/dev/null || curl -s http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then
```
**Fixed:**
```bash
if docker exec ${CONTAINER_NAME} sh -c "curl -sf http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then
```
---
### Fix 3: scripts/rate_limit_integration.sh (Line 190)
**Current (broken):**
```bash
if docker exec ${CONTAINER_NAME} sh -c "curl -q -O- http://${BACKEND_CONTAINER}/get 2>/dev/null || curl -s http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then
```
**Fixed:**
```bash
if docker exec ${CONTAINER_NAME} sh -c "curl -sf http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then
```
---
### Fix 4: scripts/crowdsec_startup_test.sh (Line 178)
**Current (broken):**
```bash
LAPI_HEALTH=$(docker exec ${CONTAINER_NAME} curl -q -O- http://127.0.0.1:8085/health 2>/dev/null || echo "FAILED")
```
**Fixed:**
```bash
LAPI_HEALTH=$(docker exec ${CONTAINER_NAME} curl -sf http://127.0.0.1:8085/health 2>/dev/null || echo "FAILED")
```
---
### Fix 5: scripts/install-go-1.25.5.sh (Line 18)
**Current (broken):**
```bash
curl -q -O "$TMPFILE" "https://go.dev/dl/${TARFILE}"
```
**Fixed:**
```bash
curl -sSfL -o "$TMPFILE" "https://go.dev/dl/${TARFILE}"
```
**Notes:**
- `-s` = silent
- `-S` = show errors even in silent mode
- `-f` = fail on HTTP errors
- `-L` = follow redirects (important for go.dev downloads)
- `-o filename` = output to specified file (lowercase `-o`)
---
## Verification Commands
After applying fixes, verify each script works:
```bash
# Test WAF integration
./scripts/waf_integration.sh
# Test Cerberus integration
./scripts/cerberus_integration.sh
# Test Rate Limit integration
./scripts/rate_limit_integration.sh
# Test CrowdSec startup
./scripts/crowdsec_startup_test.sh
# Verify Go install script syntax
bash -n ./scripts/install-go-1.25.5.sh
```
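
If running the scripts one by one is tedious, a small wrapper can run them all and report which ones failed. This is a sketch under assumptions: `run_checks` is a hypothetical name, and it treats any non-zero exit code as a failure without inspecting the output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run each script given as an argument and print PASS/FAIL per script.
# Returns 1 if at least one script failed, 0 otherwise.
run_checks() {
  local failed=0
  local script
  for script in "$@"; do
    if bash "$script" >/dev/null 2>&1; then
      echo "PASS $script"
    else
      echo "FAIL $script"
      failed=1
    fi
  done
  return "$failed"
}
```

Example: `run_checks ./scripts/waf_integration.sh ./scripts/cerberus_integration.sh`.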
---
## Behavior Differences: wget vs curl
When migrating from wget to curl, be aware of these differences:
| Behavior | wget | curl |
|----------|------|------|
| Output destination | File by default | stdout by default |
| Follow redirects | Yes by default | Requires `-L` flag |
| Retry on failure | Built-in retry | Requires `--retry N` |
| Progress display | Text progress bar | Progress meter (use `-s` to hide) |
| HTTP error handling | Non-zero exit on 404 | Requires `-f` for non-zero exit on HTTP errors |
| Quiet mode | `-q` | `-s` (silent) |
| Output to file | `-O filename` (uppercase) | `-o filename` (lowercase) |
| Save with remote name | Default behavior (no flag) | `-O` (uppercase, no arg) |
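
The differences in the table mean that a curl call needs several explicit flags to behave like a plain wget download. The sketch below spells them out; `fetch_like_wget` is a hypothetical wrapper and the default of 3 retries is an assumption, not something the existing scripts use.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: download URL to FILE with wget-like behavior.
# Usage: fetch_like_wget URL FILE [RETRIES]
fetch_like_wget() {
  # -s  silent, -S still show errors, -f fail on HTTP errors, -L follow redirects
  # --retry N  retry transient failures (wget retries by default, curl does not)
  # -o FILE    write to the named file (lowercase -o)
  curl -sSfL --retry "${3:-3}" -o "$2" "$1"
}
```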
---
## Execution Checklist
- [ ] **Fix 1**: Update `scripts/waf_integration.sh` line 205
- [ ] **Fix 2**: Update `scripts/cerberus_integration.sh` line 214
- [ ] **Fix 3**: Update `scripts/rate_limit_integration.sh` line 190
- [ ] **Fix 4**: Update `scripts/crowdsec_startup_test.sh` line 178
- [ ] **Fix 5**: Update `scripts/install-go-1.25.5.sh` line 18
- [ ] **Verify**: Run each integration test locally
- [ ] **CI**: Confirm WAF integration workflow passes
---
## Notes
1. **Deprecated Scripts**: Several affected scripts are marked deprecated (will be removed in v2.0.0). However, they are still used by CI workflows, so fixes are required.
2. **Skill-Based Replacements**: The `.github/skills/scripts/` directory was checked and contains no wget usage - those scripts already use correct curl syntax.
3. **Docker Compose Files**: All health checks in docker-compose files already use correct curl syntax (`curl -f`, `curl -fsS`).
4. **Dockerfile**: The main Dockerfile correctly installs `curl` and uses correct curl syntax in the HEALTHCHECK instruction.
---
# Previous Plan (Archived)
The previous Git & Workflow Recovery Plan has been archived below.
---
# Git & Workflow Recovery Plan (ARCHIVED)
**Plan ID**: GIT-2026-001
**Status**: ✅ ARCHIVED
**Priority**: High
**Created**: 2026-01-25
**Scope**: Git recovery, Renovate fix, Workflow simplification
---
## Problem Summary
1. **Git State**: Feature branch `feature/beta-release` is in a broken rebase state
2. **Renovate**: Targeting feature branches creates orphaned PRs and merge conflicts
3. **Propagate Workflow**: Overly complex cascade (`main → development → nightly → feature/*`) causes confusion
4. **Nightly Branch**: Unnecessary intermediate branch adding complexity
---
## Phase 1: Git Recovery
### Step 1.1 — Abort the Rebase
```bash
# Check current state
git status
# Abort the in-progress rebase
git rebase --abort
# Verify clean state
git status
```
### Step 1.2 — Fetch Latest from Origin
```bash
# Fetch all branches
git fetch origin --prune
# Ensure we're on the feature branch
git checkout feature/beta-release
```
### Step 1.3 — Merge Development into Feature Branch
**Use merge, NOT rebase** to preserve commit history and avoid force-push issues.
```bash
# Merge development into feature/beta-release
git merge origin/development --no-ff -m "Merge development into feature/beta-release"
```
### Step 1.4 — Resolve Conflicts (if any)
Likely conflict files based on Renovate activity:
- `package.json` / `package-lock.json` (version bumps)
- `backend/go.mod` / `backend/go.sum` (Go dependency updates)
- `.github/workflows/*.yml` (action digest pins)
**Resolution strategy:**
```bash
# For package.json - accept development's versions, then run npm install
git checkout --theirs package.json package-lock.json
npm install
git add package.json package-lock.json
# For go.mod/go.sum - accept development's versions, then tidy
git checkout --theirs backend/go.mod backend/go.sum
cd backend && go mod tidy && cd ..
git add backend/go.mod backend/go.sum
# For workflow files - usually safe to accept development
git checkout --theirs .github/workflows/
# Complete the merge
git commit
```
### Step 1.5 — Push the Merged Branch
```bash
git push origin feature/beta-release
```
---
## Phase 2: Renovate Fix
### Problem
Current config in `.github/renovate.json`:
```json
"baseBranches": [
  "development",
  "feature/beta-release"
]
```
This causes:
- Duplicate PRs for the same dependency (one per branch)
- Orphaned branches like `renovate/feature/beta-release-*` when feature merges
- Constant merge conflicts between branches
### Solution
Only target `development`. Changes flow naturally via propagate workflow.
### Old Config (REMOVE)
```json
{
  "baseBranches": [
    "development",
    "feature/beta-release"
  ],
  ...
}
```
### New Config (REPLACE WITH)
```json
{
  "baseBranches": [
    "development"
  ],
  ...
}
```
### File to Edit
**File**: `.github/renovate.json`
**Line**: ~12-15
---
## Phase 3: Propagate Workflow Fix
### Problem
Current workflow in `.github/workflows/propagate-changes.yml`:
```yaml
on:
  push:
    branches:
      - main
      - development
      - nightly  # <-- Unnecessary
```
Cascade logic:
- `main` → `development` ✅ (Correct)
- `development` → `nightly` ❌ (Unnecessary)
- `nightly` → `feature/*` ❌ (Overly complex)
### Solution
Simplify to **only** `main → development` propagation.
### Old Trigger (REMOVE)
```yaml
on:
  push:
    branches:
      - main
      - development
      - nightly
```
### New Trigger (REPLACE WITH)
```yaml
on:
  push:
    branches:
      - main
```
### Old Script Logic (REMOVE)
```javascript
if (currentBranch === 'main') {
  // Main -> Development
  await createPR('main', 'development');
} else if (currentBranch === 'development') {
  // Development -> Nightly
  await createPR('development', 'nightly');
} else if (currentBranch === 'nightly') {
  // Nightly -> Feature branches
  const branches = await github.paginate(github.rest.repos.listBranches, {
    owner: context.repo.owner,
    repo: context.repo.repo,
  });
  const featureBranches = branches
    .map(b => b.name)
    .filter(name => name.startsWith('feature/'));
  core.info(`Found ${featureBranches.length} feature branches: ${featureBranches.join(', ')}`);
  for (const featureBranch of featureBranches) {
    await createPR('development', featureBranch);
  }
}
```
### New Script Logic (REPLACE WITH)
```javascript
if (currentBranch === 'main') {
  // Main -> Development (only propagation needed)
  await createPR('main', 'development');
}
```
### File to Edit
**File**: `.github/workflows/propagate-changes.yml`
---
## Phase 4: Cleanup
### Step 4.1 — Delete Nightly Branch
```bash
# Delete remote nightly branch (if exists)
git push origin --delete nightly 2>/dev/null || echo "nightly branch does not exist"
# Delete local tracking branch
git branch -D nightly 2>/dev/null || true
```
### Step 4.2 — Delete Orphaned Renovate Branches
```bash
# Delete all remote renovate branches targeting feature/beta-release
git fetch origin
git branch -r | grep 'renovate/feature/beta-release' | while read -r branch; do
  # Strip the "origin/" prefix to get the branch name on the remote
  remote_branch="${branch#origin/}"
  echo "Deleting: $remote_branch"
  git push origin --delete "$remote_branch"
done
```
### Step 4.3 — Close Orphaned Renovate PRs
After branches are deleted, any associated PRs will be automatically closed by GitHub.
---
## Execution Checklist
- [ ] **Phase 1**: Git Recovery
  - [ ] 1.1 Abort rebase
  - [ ] 1.2 Fetch latest
  - [ ] 1.3 Merge development
  - [ ] 1.4 Resolve conflicts
  - [ ] 1.5 Push merged branch
- [ ] **Phase 2**: Renovate Fix
  - [ ] Edit `.github/renovate.json` - remove `feature/beta-release` from baseBranches
  - [ ] Commit and push
- [ ] **Phase 3**: Propagate Workflow Fix
  - [ ] Edit `.github/workflows/propagate-changes.yml` - simplify triggers and logic
  - [ ] Commit and push
- [ ] **Phase 4**: Cleanup
  - [ ] 4.1 Delete nightly branch
  - [ ] 4.2 Delete orphaned `renovate/feature/beta-release-*` branches
  - [ ] 4.3 Verify orphaned PRs are closed
---
## Verification
After all phases complete:
```bash
# Confirm no rebase in progress
git status
# Expected: "On branch feature/beta-release" with clean state
# Confirm nightly deleted
git branch -r | grep nightly
# Expected: no output
# Confirm orphaned renovate branches deleted
git branch -r | grep 'renovate/feature/beta-release'
# Expected: no output
# Confirm Renovate config only targets development
cat .github/renovate.json | grep -A2 baseBranches
# Expected: only "development"
```
---
## Rollback Plan
If issues occur:
1. **Git Recovery Failed**:
   ```bash
   git fetch origin
   git checkout feature/beta-release
   git reset --hard origin/feature/beta-release
   ```
2. **Renovate Changes Broke Something**: Revert the commit to `.github/renovate.json`
3. **Propagate Workflow Issues**: Revert the commit to `.github/workflows/propagate-changes.yml`
---
## Archived Spec (Prior Implementation)
# Security Fix: Remove Hardcoded Encryption Keys from Docker Compose Files
**Plan ID**: SEC-2026-001
**Status**: ✅ IMPLEMENTED
**Priority**: Critical (Security)
**Created**: 2026-01-25
**Implemented By**: Management Agent
---
### Summary
Removed hardcoded encryption keys from Docker Compose test files and implemented ephemeral key generation in CI workflows.
### Changes Applied
| File | Change |
|------|--------|
| `.docker/compose/docker-compose.playwright.yml` | Replaced hardcoded key with `${CHARON_ENCRYPTION_KEY:?...}` |
| `.docker/compose/docker-compose.e2e.yml` | Replaced hardcoded key with `${CHARON_ENCRYPTION_KEY:?...}` |
| `.github/workflows/e2e-tests.yml` | Added ephemeral key generation step |
| `.env.test.example` | Added prominent documentation |
### Security Notes
- The old key `ucDWy5ScLubd3QwCHhQa2SY7wL2OF48p/c9nZhyW1mA=` exists in git history
- This key should **NEVER** be used in any production environment
- Each CI run now generates a unique ephemeral key
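
The ephemeral key step is not shown in this plan, so the following is only a sketch of what it could look like. `generate_ephemeral_key` is a hypothetical name; the `openssl rand -base64 32` call matches the one used in the testing commands for this spec, and the `/dev/urandom` fallback is an assumption for environments without openssl.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a fresh random 32-byte key, base64-encoded (44 characters).
generate_ephemeral_key() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -base64 32
  else
    # Assumed fallback when openssl is not installed
    head -c 32 /dev/urandom | base64
  fi
}

# Example use inside a GitHub Actions `run:` step (shown as comments, not executed here):
#   key=$(generate_ephemeral_key)
#   echo "::add-mask::$key"                              # keep the key out of the logs
#   echo "CHARON_ENCRYPTION_KEY=$key" >> "$GITHUB_ENV"   # expose it to later steps
```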
### Testing
```bash
# Verify compose fails without key
unset CHARON_ENCRYPTION_KEY
docker compose -f .docker/compose/docker-compose.playwright.yml config 2>&1
# Expected: "CHARON_ENCRYPTION_KEY is required"
# Verify compose succeeds with key
export CHARON_ENCRYPTION_KEY=$(openssl rand -base64 32)
docker compose -f .docker/compose/docker-compose.playwright.yml config
# Expected: Valid YAML output
```
### References
- **OWASP**: [A02:2021 Cryptographic Failures](https://owasp.org/Top10/A02_2021-Cryptographic_Failures/)
---
# Playwright Security Test Helpers
**Plan ID**: E2E-SEC-001
**Status**: ✅ COMPLETED
**Priority**: Critical (Blocking 230/707 E2E test failures)
**Created**: 2026-01-25
**Completed**: 2026-01-25
**Scope**: Add security test helpers to prevent ACL deadlock in E2E tests
---
## Completion Notes
**Implementation Summary:**
- Created `tests/utils/security-helpers.ts` with full security state management utilities
- Functions implemented: `getSecurityStatus`, `setSecurityModuleEnabled`, `captureSecurityState`, `restoreSecurityState`, `withSecurityEnabled`, `disableAllSecurityModules`
- Pattern enables guaranteed cleanup via Playwright's `test.afterAll()` hook
**Documentation:**
- See [Security Test Helpers Guide](../testing/security-helpers.md) for usage examples
---
## Problem Summary
During E2E testing, if ACL is left enabled from a previous test run (e.g., due to test failure), it can create a **deadlock**:
1. ACL blocks API requests → returns 403 Forbidden
2. Global cleanup can't run → API blocked
3. Auth setup fails → tests skip
4. Manual intervention required to reset volumes
**Root Cause Analysis:**
- `security-dashboard.spec.ts` has tests that toggle ACL, WAF, and Rate Limiting
- The tests attempt to "toggle back" but if a test fails mid-execution, cleanup doesn't run
- Playwright runs `test.afterAll` hooks and fixture teardown even when a test fails, so cleanup placed there is reliable
- The current tests don't use hooks or fixtures for security state management
## Solution Architecture
### API Endpoints (Backend Already Supports)
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/v1/security/status` | GET | Returns current state of all security modules |
| `/api/v1/settings` | POST | Toggle settings with `{ key: "security.acl.enabled", value: "true/false" }` |
### Settings Keys
| Key | Values | Description |
|-----|--------|-------------|
| `security.acl.enabled` | `"true"` / `"false"` | Toggle ACL enforcement |
| `security.waf.enabled` | `"true"` / `"false"` | Toggle WAF enforcement |
| `security.rate_limit.enabled` | `"true"` / `"false"` | Toggle Rate Limiting |
| `security.crowdsec.enabled` | `"true"` / `"false"` | Toggle CrowdSec |
| `feature.cerberus.enabled` | `"true"` / `"false"` | Master toggle for all security |
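
For manual recovery or debugging, the same settings call can be made from a shell. This is a sketch under assumptions: the function names are hypothetical, and authentication via a cookie jar file (`CHARON_COOKIE_JAR`) is a guess, since this plan does not describe how the API authenticates command-line clients.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body for POST /api/v1/settings.
# Usage: settings_payload KEY VALUE   (VALUE is the string "true" or "false")
settings_payload() {
  printf '{"key":"%s","value":"%s"}' "$1" "$2"
}

# Hypothetical helper: toggle one setting on a running instance.
# Usage: set_security_setting BASE_URL KEY VALUE
set_security_setting() {
  # CHARON_COOKIE_JAR is an assumed variable pointing at a curl cookie file
  curl -sf -X POST \
    -H 'Content-Type: application/json' \
    -b "${CHARON_COOKIE_JAR:-cookies.txt}" \
    -d "$(settings_payload "$2" "$3")" \
    "$1/api/v1/settings"
}
```

Example: `set_security_setting http://localhost:8080 security.acl.enabled false`.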
---
## Implementation Plan
### File 1: `tests/utils/security-helpers.ts` (CREATE)
```typescript
/**
 * Security Test Helpers - Safe ACL/WAF/Rate Limit toggle for E2E tests
 *
 * These helpers provide safe mechanisms to temporarily enable security features
 * during tests, with guaranteed cleanup even on test failure.
 *
 * Problem: If ACL is left enabled after a test failure, it blocks all API requests
 * causing subsequent tests to fail with 403 Forbidden (deadlock).
 *
 * Solution: Use Playwright's test.afterAll() with captured original state to
 * guarantee restoration regardless of test outcome.
 *
 * @example
 * ```typescript
 * import { withSecurityEnabled, getSecurityStatus } from './utils/security-helpers';
 *
 * test.describe('ACL Tests', () => {
 *   let cleanup: () => Promise<void>;
 *
 *   test.beforeAll(async ({ request }) => {
 *     cleanup = await withSecurityEnabled(request, { acl: true });
 *   });
 *
 *   test.afterAll(async () => {
 *     await cleanup();
 *   });
 *
 *   test('should enforce ACL', async ({ page }) => {
 *     // ACL is now enabled, test enforcement
 *   });
 * });
 * ```
 */
import { APIRequestContext } from '@playwright/test';

/**
 * Security module status from GET /api/v1/security/status
 */
export interface SecurityStatus {
  cerberus: { enabled: boolean };
  crowdsec: { mode: string; api_url: string; enabled: boolean };
  waf: { mode: string; enabled: boolean };
  rate_limit: { mode: string; enabled: boolean };
  acl: { mode: string; enabled: boolean };
}

/**
 * Options for enabling specific security modules
 */
export interface SecurityModuleOptions {
  /** Enable ACL enforcement */
  acl?: boolean;
  /** Enable WAF protection */
  waf?: boolean;
  /** Enable rate limiting */
  rateLimit?: boolean;
  /** Enable CrowdSec */
  crowdsec?: boolean;
  /** Enable master Cerberus toggle (required for other modules) */
  cerberus?: boolean;
}

/**
 * Captured state for restoration
 */
export interface CapturedSecurityState {
  acl: boolean;
  waf: boolean;
  rateLimit: boolean;
  crowdsec: boolean;
  cerberus: boolean;
}

/**
 * Mapping of module names to their settings keys
 */
const SECURITY_SETTINGS_KEYS: Record<keyof SecurityModuleOptions, string> = {
  acl: 'security.acl.enabled',
  waf: 'security.waf.enabled',
  rateLimit: 'security.rate_limit.enabled',
  crowdsec: 'security.crowdsec.enabled',
  cerberus: 'feature.cerberus.enabled',
};

/**
 * Get current security status from the API
 * @param request - Playwright APIRequestContext (authenticated)
 * @returns Current security status
 */
export async function getSecurityStatus(
  request: APIRequestContext
): Promise<SecurityStatus> {
  const response = await request.get('/api/v1/security/status');
  if (!response.ok()) {
    throw new Error(
      `Failed to get security status: ${response.status()} ${await response.text()}`
    );
  }
  return response.json();
}

/**
 * Set a specific security module's enabled state
 * @param request - Playwright APIRequestContext (authenticated)
 * @param module - Which module to toggle
 * @param enabled - Whether to enable or disable
 */
export async function setSecurityModuleEnabled(
  request: APIRequestContext,
  module: keyof SecurityModuleOptions,
  enabled: boolean
): Promise<void> {
  const key = SECURITY_SETTINGS_KEYS[module];
  const value = enabled ? 'true' : 'false';
  const response = await request.post('/api/v1/settings', {
    data: { key, value },
  });
  if (!response.ok()) {
    throw new Error(
      `Failed to set ${module} to ${enabled}: ${response.status()} ${await response.text()}`
    );
  }
  // Wait a brief moment for Caddy config reload
  await new Promise((resolve) => setTimeout(resolve, 500));
}

/**
 * Capture current security state for later restoration
 * @param request - Playwright APIRequestContext (authenticated)
 * @returns Captured state object
 */
export async function captureSecurityState(
  request: APIRequestContext
): Promise<CapturedSecurityState> {
  const status = await getSecurityStatus(request);
  return {
    acl: status.acl.enabled,
    waf: status.waf.enabled,
    rateLimit: status.rate_limit.enabled,
    crowdsec: status.crowdsec.enabled,
    cerberus: status.cerberus.enabled,
  };
}

/**
 * Restore security state to previously captured values
 * @param request - Playwright APIRequestContext (authenticated)
 * @param state - Previously captured state
 */
export async function restoreSecurityState(
  request: APIRequestContext,
  state: CapturedSecurityState
): Promise<void> {
  const currentStatus = await getSecurityStatus(request);
  // Restore in reverse dependency order (features before master toggle)
  const modules: (keyof SecurityModuleOptions)[] = ['acl', 'waf', 'rateLimit', 'crowdsec', 'cerberus'];
  for (const module of modules) {
    const currentValue = module === 'rateLimit'
      ? currentStatus.rate_limit.enabled
      : module === 'crowdsec'
        ? currentStatus.crowdsec.enabled
        : currentStatus[module].enabled;
    if (currentValue !== state[module]) {
      await setSecurityModuleEnabled(request, module, state[module]);
    }
  }
}

/**
 * Enable security modules temporarily with guaranteed cleanup.
 *
 * Returns a cleanup function that MUST be called in test.afterAll().
 * The cleanup function restores the original state even if tests fail.
 *
 * @param request - Playwright APIRequestContext (authenticated)
 * @param options - Which modules to enable
 * @returns Cleanup function to restore original state
 *
 * @example
 * ```typescript
 * test.describe('ACL Tests', () => {
 *   let cleanup: () => Promise<void>;
 *
 *   test.beforeAll(async ({ request }) => {
 *     cleanup = await withSecurityEnabled(request, { acl: true, cerberus: true });
 *   });
 *
 *   test.afterAll(async () => {
 *     await cleanup();
 *   });
 * });
 * ```
 */
export async function withSecurityEnabled(
  request: APIRequestContext,
  options: SecurityModuleOptions
): Promise<() => Promise<void>> {
  // Capture original state BEFORE making any changes
  const originalState = await captureSecurityState(request);
  // Enable Cerberus first (master toggle) if any security module is requested
  const needsCerberus = options.acl || options.waf || options.rateLimit || options.crowdsec;
  if ((needsCerberus || options.cerberus) && !originalState.cerberus) {
    await setSecurityModuleEnabled(request, 'cerberus', true);
  }
  // Enable requested modules
  if (options.acl) {
    await setSecurityModuleEnabled(request, 'acl', true);
  }
  if (options.waf) {
    await setSecurityModuleEnabled(request, 'waf', true);
  }
  if (options.rateLimit) {
    await setSecurityModuleEnabled(request, 'rateLimit', true);
  }
  if (options.crowdsec) {
    await setSecurityModuleEnabled(request, 'crowdsec', true);
  }
  // Return cleanup function that restores original state
  return async () => {
    try {
      await restoreSecurityState(request, originalState);
    } catch (error) {
      // Log error but don't throw - cleanup should not fail tests
      console.error('Failed to restore security state:', error);
      // Try emergency disable of ACL to prevent deadlock
      try {
        await setSecurityModuleEnabled(request, 'acl', false);
      } catch {
        console.error('Emergency ACL disable also failed - manual intervention may be required');
      }
    }
  };
}

/**
 * Disable all security modules (emergency reset).
 * Use this in global-setup.ts or when tests need a clean slate.
 *
 * @param request - Playwright APIRequestContext (authenticated)
 */
export async function disableAllSecurityModules(
  request: APIRequestContext
): Promise<void> {
  const modules: (keyof SecurityModuleOptions)[] = ['acl', 'waf', 'rateLimit', 'crowdsec'];
  for (const module of modules) {
    try {
      await setSecurityModuleEnabled(request, module, false);
    } catch (error) {
      console.warn(`Failed to disable ${module}:`, error);
    }
  }
}

/**
 * Check if ACL is currently blocking requests.
 * Useful for debugging test failures.
 *
 * @param request - Playwright APIRequestContext
 * @returns True if ACL is enabled and blocking
 */
export async function isAclBlocking(request: APIRequestContext): Promise<boolean> {
  try {
    const status = await getSecurityStatus(request);
    return status.acl.enabled && status.cerberus.enabled;
  } catch {
    // If we can't get status, ACL might be blocking
    return true;
  }
}
```
---
### File 2: `tests/security/security-dashboard.spec.ts` (MODIFY)
**Changes Required:**
1. Import the new security helpers
2. Add `test.beforeAll` to capture initial state
3. Add `test.afterAll` to guarantee cleanup
4. Remove redundant "toggle back" steps in individual tests
5. Group toggle tests in a separate describe block with isolated cleanup
**Exact Changes:**
```typescript
// ADD after existing imports (around line 12)
import {
withSecurityEnabled,
captureSecurityState,
restoreSecurityState,
CapturedSecurityState,
} from '../utils/security-helpers';
```
```typescript
// REPLACE the entire 'Module Toggle Actions' describe block (lines ~80-180)
// with this safer implementation:
test.describe('Module Toggle Actions', () => {
  // Capture state ONCE for this describe block
  let originalState: CapturedSecurityState;

  test.beforeAll(async ({ request }) => {
    originalState = await captureSecurityState(request);
  });

  test.afterAll(async ({ request }) => {
    // CRITICAL: Restore original state even if tests fail.
    // Use the `request` fixture given to this hook instead of one saved from
    // beforeAll: a fixture handed to a hook may be torn down when that hook ends.
    if (originalState) {
      await restoreSecurityState(request, originalState);
    }
  });

  test('should toggle ACL enabled/disabled', async ({ page }) => {
    const toggle = page.getByTestId('toggle-acl');
    const isDisabled = await toggle.isDisabled();
    if (isDisabled) {
      test.info().annotations.push({
        type: 'skip-reason',
        description: 'Toggle is disabled because Cerberus security is not enabled',
      });
      test.skip();
      return;
    }
    await test.step('Toggle ACL state', async () => {
      await page.waitForLoadState('networkidle');
      await toggle.scrollIntoViewIfNeeded();
      await page.waitForTimeout(200);
      await toggle.click({ force: true });
      await waitForToast(page, /updated|success|enabled|disabled/i, 10000);
    });
    // NOTE: Do NOT toggle back here - afterAll handles cleanup
  });

  test('should toggle WAF enabled/disabled', async ({ page }) => {
    const toggle = page.getByTestId('toggle-waf');
    const isDisabled = await toggle.isDisabled();
    if (isDisabled) {
      test.info().annotations.push({
        type: 'skip-reason',
        description: 'Toggle is disabled because Cerberus security is not enabled',
      });
      test.skip();
      return;
    }
    await test.step('Toggle WAF state', async () => {
      await page.waitForLoadState('networkidle');
      await toggle.scrollIntoViewIfNeeded();
      await page.waitForTimeout(200);
      await toggle.click({ force: true });
      await waitForToast(page, /updated|success|enabled|disabled/i, 10000);
    });
    // NOTE: Do NOT toggle back here - afterAll handles cleanup
  });

  test('should toggle Rate Limiting enabled/disabled', async ({ page }) => {
    const toggle = page.getByTestId('toggle-rate-limit');
    const isDisabled = await toggle.isDisabled();
    if (isDisabled) {
      test.info().annotations.push({
        type: 'skip-reason',
        description: 'Toggle is disabled because Cerberus security is not enabled',
      });
      test.skip();
      return;
    }
    await test.step('Toggle Rate Limit state', async () => {
      await page.waitForLoadState('networkidle');
      await toggle.scrollIntoViewIfNeeded();
      await page.waitForTimeout(200);
      await toggle.click({ force: true });
      await waitForToast(page, /updated|success|enabled|disabled/i, 10000);
    });
    // NOTE: Do NOT toggle back here - afterAll handles cleanup
  });

  test('should persist toggle state after page reload', async ({ page }) => {
    const toggle = page.getByTestId('toggle-acl');
    const isDisabled = await toggle.isDisabled();
    if (isDisabled) {
      test.info().annotations.push({
        type: 'skip-reason',
        description: 'Toggle is disabled because Cerberus security is not enabled',
      });
      test.skip();
      return;
    }
    const initialChecked = await toggle.isChecked();
    await test.step('Toggle ACL state', async () => {
      await page.waitForLoadState('networkidle');
      await toggle.scrollIntoViewIfNeeded();
      await page.waitForTimeout(200);
      await toggle.click({ force: true });
      await waitForToast(page, /updated|success|enabled|disabled/i, 10000);
    });
    await test.step('Reload page', async () => {
      await page.reload();
      await waitForLoadingComplete(page);
    });
    await test.step('Verify state persisted', async () => {
      const newChecked = await page.getByTestId('toggle-acl').isChecked();
      expect(newChecked).toBe(!initialChecked);
    });
    // NOTE: Do NOT restore here - afterAll handles cleanup
  });
});
```
---
### File 3: `tests/global-setup.ts` (MODIFY)
**Add Emergency Security Reset:**
```typescript
// ADD at the top of the file
import { request as playwrightRequest } from '@playwright/test';
import type { APIRequestContext } from '@playwright/test';
import { existsSync } from 'fs';
import { STORAGE_STATE } from './constants';

// ADD next to the globalSetup function
async function emergencySecurityReset(baseURL: string): Promise<void> {
  // Only run if auth state exists (meaning we can make authenticated requests)
  if (!existsSync(STORAGE_STATE)) {
    return;
  }
  let authenticatedContext: APIRequestContext | undefined;
  try {
    authenticatedContext = await playwrightRequest.newContext({
      baseURL,
      storageState: STORAGE_STATE,
    });
    // Disable ACL to prevent deadlock from previous failed runs
    const response = await authenticatedContext.post('/api/v1/settings', {
      data: { key: 'security.acl.enabled', value: 'false' },
    });
    if (response.ok()) {
      console.log('✓ Security reset: ACL disabled');
    } else {
      // The request went through but was rejected, so ACL may still be enabled
      console.warn(`⚠️ Security reset returned HTTP ${response.status()}`);
    }
  } catch (error) {
    console.warn('⚠️ Could not reset security state:', error);
  } finally {
    // Always release the request context, even if the POST failed
    await authenticatedContext?.dispose();
  }
}

// CALL inside globalSetup, after auth state is created:
await emergencySecurityReset(process.env.PLAYWRIGHT_BASE_URL || 'http://localhost:8080');
```
---
### File 4: `tests/fixtures/auth-fixtures.ts` (OPTIONAL ENHANCEMENT)
**Add security fixture for tests that need it:**
```typescript
// ADD after existing imports
import {
  withSecurityEnabled,
  SecurityModuleOptions,
  CapturedSecurityState,
  captureSecurityState,
  restoreSecurityState,
} from '../utils/security-helpers';

// ADD to AuthFixtures interface
interface AuthFixtures {
  // ... existing fixtures ...
  /**
   * Security state manager for tests that need to toggle security modules.
   * Automatically captures and restores state.
   */
  securityState: {
    enable: (options: SecurityModuleOptions) => Promise<void>;
    captured: CapturedSecurityState | null;
  };
}

// ADD fixture definition in test.extend
securityState: async ({ request }, use) => {
  let capturedState: CapturedSecurityState | null = null;
  const manager = {
    enable: async (options: SecurityModuleOptions) => {
      capturedState = await captureSecurityState(request);
      const cleanup = await withSecurityEnabled(request, options);
      // Store cleanup for the fixture teardown below
      manager._cleanup = cleanup;
    },
    // Getter so callers see the state captured by enable(), not the initial null
    get captured() {
      return capturedState;
    },
    _cleanup: null as (() => Promise<void>) | null,
  };
  await use(manager);
  // Teardown: runs after the test, even if it failed
  if (manager._cleanup) {
    await manager._cleanup();
  }
},
```
---
## Execution Checklist
### Phase 1: Create Helper Module
- [ ] **1.1** Create `tests/utils/security-helpers.ts` with exact code from File 1 above
- [ ] **1.2** Run TypeScript check: `npx tsc --noEmit`
- [ ] **1.3** Verify helper imports correctly in a test file
### Phase 2: Update Security Dashboard Tests
- [ ] **2.1** Add imports to `tests/security/security-dashboard.spec.ts`
- [ ] **2.2** Replace 'Module Toggle Actions' describe block with new implementation
- [ ] **2.3** Run affected tests: `npx playwright test security-dashboard --project=chromium`
- [ ] **2.4** Verify tests pass AND cleanup happens (check security status after)
### Phase 3: Add Global Safety Net
- [ ] **3.1** Update `tests/global-setup.ts` with emergency security reset
- [ ] **3.2** Run full test suite: `npx playwright test --project=chromium`
- [ ] **3.3** Verify no ACL deadlock occurs across multiple runs
### Phase 4: Validation
- [ ] **4.1** Force a test failure (e.g., add `throw new Error()`) and verify cleanup still runs
- [ ] **4.2** Check security status after failed test: `curl localhost:8080/api/v1/security/status`
- [ ] **4.3** Confirm ACL is disabled after cleanup
- [ ] **4.4** Run full E2E suite 3 times consecutively to verify stability
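
Steps 4.2 and 4.3 can be scripted so the check is repeatable. The sketch below is an assumption-heavy illustration: `acl_enabled_in_status` is a hypothetical helper, it relies on the `acl` object in the status response being flat (as in the `SecurityStatus` interface above), and it uses a regex instead of a real JSON parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the security status JSON on stdin and
# exit 0 if the "acl" object contains "enabled": true, non-zero otherwise.
acl_enabled_in_status() {
  # Join lines first so pretty-printed JSON is matched as well
  tr -d '\n' | grep -Eq '"acl"[[:space:]]*:[[:space:]]*\{[^}]*"enabled"[[:space:]]*:[[:space:]]*true'
}

# Example use after a deliberately failed test run (shown as comments, not executed here):
#   if curl -sf localhost:8080/api/v1/security/status | acl_enabled_in_status; then
#     echo "ACL is still enabled - cleanup did not run"
#   fi
```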
---
## Benefits
1. **No deadlock**: Tests can safely enable/disable ACL with guaranteed cleanup
2. **Cleanup guaranteed**: `test.afterAll` runs even on failure
3. **Realistic testing**: ACL tests use the same toggle mechanism as users
4. **Isolation**: Other tests unaffected by ACL state
5. **Global safety net**: Even if individual cleanup fails, global setup resets state
## Risk Mitigation
| Risk | Mitigation |
|------|------------|
| Cleanup fails due to API error | Emergency fallback disables ACL specifically |
| Global setup can't reset state | Auth state file check prevents errors |
| Tests run in parallel | Each describe block has its own captured state |
| API changes break helpers | Settings keys are centralized in one const |
## Files Summary
| File | Action | Priority |
|------|--------|----------|
| `tests/utils/security-helpers.ts` | **CREATE** | Critical |
| `tests/security/security-dashboard.spec.ts` | **MODIFY** | Critical |
| `tests/global-setup.ts` | **MODIFY** | High |
| `tests/fixtures/auth-fixtures.ts` | **MODIFY** (Optional) | Low |