Charon/docs/plans/current_spec.md
GitHub Actions 103f0e0ae9 fix: resolve WAF integration failure and E2E ACL deadlock
Fix integration scripts using wget-style curl options after Alpine→Debian
migration (PR #550). Add Playwright security test helpers to prevent ACL
from blocking subsequent tests.

Fix curl syntax in 5 scripts: -q -O- → -sf
Create security-helpers.ts with state capture/restore
Add emergency ACL reset to global-setup.ts
Fix fixture reuse bug in security-dashboard.spec.ts
Add security-helpers.md usage guide
Resolves WAF workflow "httpbin backend failed to start" error
2026-01-25 14:09:38 +00:00


WAF Integration Workflow Fix: wget-style curl Syntax Migration

Plan ID: WAF-2026-001
Status: 📋 PENDING
Priority: High
Created: 2026-01-25
Scope: Fix integration test scripts using incorrect wget-style curl syntax


Problem Summary

After migrating the Docker base image from Alpine to Debian Trixie (PR #550), the WAF integration workflow is failing. The root cause is not a missing wget command, but rather several integration test scripts using wget-style options with curl that don't work correctly.

Root Cause

Multiple scripts use curl -q -O- which is wget syntax, not curl syntax:

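| Syntax | Tool | Meaning |
| --- | --- | --- |
| -q | wget | Quiet mode |
| -q | curl | Not quiet mode: -q (--disable) only skips reading .curlrc, and only when it is the first option |
| -O- | wget | Output to stdout |
| -O- | curl | Wrong - -O means "save with remote filename", - is treated as a separate URL |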

The correct curl equivalents are:

| wget | curl | Notes |
| --- | --- | --- |
| wget -q | curl -s | Silent mode |
| wget -O- | curl | No flag needed: stdout is curl's default output |
| wget -q -O- URL | curl -s URL | Full equivalent |
| wget -O filename | curl -o filename | Note: lowercase -o in curl |

Files Requiring Changes

Priority 1: Integration Test Scripts (Blocking WAF Workflow)

| File | Line | Current Code | Issue |
| --- | --- | --- | --- |
| scripts/waf_integration.sh | 205 | curl -q -O- http://${BACKEND_CONTAINER}/get | wget syntax |
| scripts/cerberus_integration.sh | 214 | curl -q -O- http://${BACKEND_CONTAINER}/get | wget syntax |
| scripts/rate_limit_integration.sh | 190 | curl -q -O- http://${BACKEND_CONTAINER}/get | wget syntax |
| scripts/crowdsec_startup_test.sh | 178 | curl -q -O- http://127.0.0.1:8085/health | wget syntax |

Priority 2: Utility Scripts

| File | Line | Current Code | Issue |
| --- | --- | --- | --- |
| scripts/install-go-1.25.5.sh | 18 | curl -q -O "$TMPFILE" "URL" | Wrong syntax - -O doesn't take an argument in curl |
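
To catch the same pattern in scripts not listed above, a small scanner can flag curl invocations that carry -q or -O-. This is a minimal sketch, not part of the repository: `findWgetStyleCurl` and `Finding` are hypothetical names, and the regular expressions only cover the two options discussed in this plan.

```typescript
interface Finding {
  line: number; // 1-based line number within the script
  text: string; // trimmed source line
}

// Hypothetical lint helper: flags lines where a curl invocation carries
// the wget-style options -q or -O-.
function findWgetStyleCurl(script: string): Finding[] {
  const findings: Finding[] = [];
  script.split("\n").forEach((text, index) => {
    // Each curl invocation, up to the next shell separator (|, ; or &)
    const invocations: string[] = text.match(/\bcurl\b[^|;&]*/g) ?? [];
    const suspicious = invocations.some(
      (cmd) => /\s-q\b/.test(cmd) || /\s-O-(\s|$)/.test(cmd)
    );
    if (suspicious) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}

const sample = [
  'curl -q -O- http://127.0.0.1:8085/health',
  'curl -sf http://127.0.0.1:8085/health',
].join("\n");

for (const finding of findWgetStyleCurl(sample)) {
  console.log(`line ${finding.line}: ${finding.text}`);
}
```

Run against the sample, only the first line is reported; the corrected -sf form passes.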

Detailed Fixes

Fix 1: scripts/waf_integration.sh (Line 205)

Current (broken):

if docker exec ${CONTAINER_NAME} sh -c "curl -q -O- http://${BACKEND_CONTAINER}/get 2>/dev/null || curl -s http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then

Fixed:

if docker exec ${CONTAINER_NAME} sh -c "curl -sf http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then

Notes:

  • -s = silent (no progress meter or error messages)
  • -f = fail on HTTP errors: a response with status 400 or above produces a non-zero exit code instead of output
  • Removed the redundant fallback, since the corrected command now works on its own
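
The success rule that -f gives this check (any status below 400 counts as ready) can be sketched as a polling loop. This is an illustration only, assuming the readiness check is retried a fixed number of times; `waitForBackend` and `Probe` are hypothetical names, and the probe is injected so the logic runs without a container.

```typescript
// A probe resolves to an HTTP status code, or rejects on connection errors
type Probe = () => Promise<number>;

// Polls until the probe returns a status below 400, mirroring `curl -f`,
// which exits non-zero for HTTP responses of 400 and above.
async function waitForBackend(
  probe: Probe,
  attempts: number,
  delayMs: number
): Promise<boolean> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const status = await probe();
      if (status < 400) {
        return true;
      }
    } catch {
      // Connection errors count as "not ready yet"
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// Example probe (assumption): a backend that always answers HTTP 200
waitForBackend(async () => 200, 3, 10).then((ready) => {
  console.log(`ready: ${ready}`);
});
```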

Fix 2: scripts/cerberus_integration.sh (Line 214)

Current (broken):

if docker exec ${CONTAINER_NAME} sh -c "curl -q -O- http://${BACKEND_CONTAINER}/get 2>/dev/null || curl -s http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then

Fixed:

if docker exec ${CONTAINER_NAME} sh -c "curl -sf http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then

Fix 3: scripts/rate_limit_integration.sh (Line 190)

Current (broken):

if docker exec ${CONTAINER_NAME} sh -c "curl -q -O- http://${BACKEND_CONTAINER}/get 2>/dev/null || curl -s http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then

Fixed:

if docker exec ${CONTAINER_NAME} sh -c "curl -sf http://${BACKEND_CONTAINER}/get" >/dev/null 2>&1; then

Fix 4: scripts/crowdsec_startup_test.sh (Line 178)

Current (broken):

LAPI_HEALTH=$(docker exec ${CONTAINER_NAME} curl -q -O- http://127.0.0.1:8085/health 2>/dev/null || echo "FAILED")

Fixed:

LAPI_HEALTH=$(docker exec ${CONTAINER_NAME} curl -sf http://127.0.0.1:8085/health 2>/dev/null || echo "FAILED")

Fix 5: scripts/install-go-1.25.5.sh (Line 18)

Current (broken):

curl -q -O "$TMPFILE" "https://go.dev/dl/${TARFILE}"

Fixed:

curl -sSfL -o "$TMPFILE" "https://go.dev/dl/${TARFILE}"

Notes:

  • -s = silent
  • -S = show errors even in silent mode
  • -f = fail on HTTP errors
  • -L = follow redirects (important for go.dev downloads)
  • -o filename = output to specified file (lowercase -o)

Verification Commands

After applying fixes, verify each script works:

# Test WAF integration
./scripts/waf_integration.sh

# Test Cerberus integration
./scripts/cerberus_integration.sh

# Test Rate Limit integration
./scripts/rate_limit_integration.sh

# Test CrowdSec startup
./scripts/crowdsec_startup_test.sh

# Verify Go install script syntax
bash -n ./scripts/install-go-1.25.5.sh

Behavior Differences: wget vs curl

When migrating from wget to curl, be aware of these differences:

| Behavior | wget | curl |
| --- | --- | --- |
| Output destination | File by default | stdout by default |
| Follow redirects | Yes by default | Requires -L flag |
| Retry on failure | Built-in retry | Requires --retry N |
| Progress display | Text progress bar | Progress meter (use -s to hide) |
| HTTP error handling | Non-zero exit on 404 | Requires -f for non-zero exit on HTTP errors |
| Quiet mode | -q | -s (silent) |
| Output to file | -O filename (uppercase) | -o filename (lowercase) |
| Save with remote name | Default behavior (no flag) | -O (uppercase, no arg) |
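
Read as a recipe, the table says which flags a curl call needs before it behaves like a default wget download. The sketch below assembles that argument list; `curlArgsLikeWget` and `DownloadOptions` are hypothetical names, and the default of 3 retries is an arbitrary assumption rather than a value taken from the scripts.

```typescript
interface DownloadOptions {
  url: string;
  outputFile?: string; // omit to let curl write to stdout
  retries?: number;    // assumption: 3 when not specified
}

// Builds curl arguments that opt in to what wget does by default:
// follow redirects (-L), retry (--retry N), fail on HTTP errors (-f).
function curlArgsLikeWget({ url, outputFile, retries = 3 }: DownloadOptions): string[] {
  const args = ["-sS", "-f", "-L", "--retry", String(retries)];
  if (outputFile !== undefined) {
    args.push("-o", outputFile); // lowercase -o takes the filename
  }
  args.push(url);
  return args;
}

console.log(curlArgsLikeWget({ url: "https://example.com/file.tar.gz", outputFile: "file.tar.gz" }).join(" "));
// -sS -f -L --retry 3 -o file.tar.gz https://example.com/file.tar.gz
```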

Execution Checklist

  • Fix 1: Update scripts/waf_integration.sh line 205
  • Fix 2: Update scripts/cerberus_integration.sh line 214
  • Fix 3: Update scripts/rate_limit_integration.sh line 190
  • Fix 4: Update scripts/crowdsec_startup_test.sh line 178
  • Fix 5: Update scripts/install-go-1.25.5.sh line 18
  • Verify: Run each integration test locally
  • CI: Confirm WAF integration workflow passes

Notes

  1. Deprecated Scripts: Several affected scripts are marked deprecated (will be removed in v2.0.0). However, they are still used by CI workflows, so fixes are required.

  2. Skill-Based Replacements: The .github/skills/scripts/ directory was checked and contains no wget usage - those scripts already use correct curl syntax.

  3. Docker Compose Files: All health checks in docker-compose files already use correct curl syntax (curl -f, curl -fsS).

  4. Dockerfile: The main Dockerfile correctly installs curl and uses correct curl syntax in the HEALTHCHECK instruction.


Previous Plan (Archived)

The previous Git & Workflow Recovery Plan has been archived below.


Git & Workflow Recovery Plan (ARCHIVED)

Plan ID: GIT-2026-001
Status: ARCHIVED
Priority: High
Created: 2026-01-25
Scope: Git recovery, Renovate fix, Workflow simplification


Problem Summary

  1. Git State: Feature branch feature/beta-release is in a broken rebase state
  2. Renovate: Targeting feature branches creates orphaned PRs and merge conflicts
  3. Propagate Workflow: Overly complex cascade (main → development → nightly → feature/*) causes confusion
  4. Nightly Branch: Unnecessary intermediate branch adding complexity

Phase 1: Git Recovery

Step 1.1 — Abort the Rebase

# Check current state
git status

# Abort the in-progress rebase
git rebase --abort

# Verify clean state
git status

Step 1.2 — Fetch Latest from Origin

# Fetch all branches
git fetch origin --prune

# Ensure we're on the feature branch
git checkout feature/beta-release

Step 1.3 — Merge Development into Feature Branch

Use merge, NOT rebase, to preserve commit history and avoid force-push issues.

# Merge development into feature/beta-release
git merge origin/development --no-ff -m "Merge development into feature/beta-release"

Step 1.4 — Resolve Conflicts (if any)

Likely conflict files based on Renovate activity:

  • package.json / package-lock.json (version bumps)
  • backend/go.mod / backend/go.sum (Go dependency updates)
  • .github/workflows/*.yml (action digest pins)

Resolution strategy:

# For package.json - accept development's versions, then run npm install
git checkout --theirs package.json package-lock.json
npm install
git add package.json package-lock.json

# For go.mod/go.sum - accept development's versions, then tidy
git checkout --theirs backend/go.mod backend/go.sum
cd backend && go mod tidy && cd ..
git add backend/go.mod backend/go.sum

# For workflow files - usually safe to accept development
git checkout --theirs .github/workflows/

# Complete the merge
git commit

Step 1.5 — Push the Merged Branch

git push origin feature/beta-release

Phase 2: Renovate Fix

Problem

Current config in .github/renovate.json:

"baseBranches": [
  "development",
  "feature/beta-release"
]

This causes:

  • Duplicate PRs for the same dependency (one per branch)
  • Orphaned branches like renovate/feature/beta-release-* when feature merges
  • Constant merge conflicts between branches

Solution

Only target development. Changes flow naturally via propagate workflow.

Old Config (REMOVE)

{
  "baseBranches": [
    "development",
    "feature/beta-release"
  ],
  ...
}

New Config (REPLACE WITH)

{
  "baseBranches": [
    "development"
  ],
  ...
}

File to Edit

File: .github/renovate.json Line: ~12-15


Phase 3: Propagate Workflow Fix

Problem

Current workflow in .github/workflows/propagate-changes.yml:

on:
  push:
    branches:
      - main
      - development
      - nightly  # <-- Unnecessary

Cascade logic:

  • main → development (Correct)
  • development → nightly (Unnecessary)
  • nightly → feature/* (Overly complex)

Solution

Simplify to only main → development propagation.

Old Trigger (REMOVE)

on:
  push:
    branches:
      - main
      - development
      - nightly

New Trigger (REPLACE WITH)

on:
  push:
    branches:
      - main

Old Script Logic (REMOVE)

if (currentBranch === 'main') {
  // Main -> Development
  await createPR('main', 'development');
} else if (currentBranch === 'development') {
  // Development -> Nightly
  await createPR('development', 'nightly');
} else if (currentBranch === 'nightly') {
  // Nightly -> Feature branches
  const branches = await github.paginate(github.rest.repos.listBranches, {
    owner: context.repo.owner,
    repo: context.repo.repo,
  });

  const featureBranches = branches
    .map(b => b.name)
    .filter(name => name.startsWith('feature/'));

  core.info(`Found ${featureBranches.length} feature branches: ${featureBranches.join(', ')}`);

  for (const featureBranch of featureBranches) {
    await createPR('development', featureBranch);
  }
}

New Script Logic (REPLACE WITH)

if (currentBranch === 'main') {
  // Main -> Development (only propagation needed)
  await createPR('main', 'development');
}

File to Edit

File: .github/workflows/propagate-changes.yml


Phase 4: Cleanup

Step 4.1 — Delete Nightly Branch

# Delete remote nightly branch (if exists)
git push origin --delete nightly 2>/dev/null || echo "nightly branch does not exist"

# Delete local tracking branch
git branch -D nightly 2>/dev/null || true

Step 4.2 — Delete Orphaned Renovate Branches

# List all renovate branches targeting feature/beta-release
git fetch origin
git branch -r | grep 'renovate/feature/beta-release' | while read -r branch; do
  remote_branch="${branch#origin/}"
  echo "Deleting: $remote_branch"
  git push origin --delete "$remote_branch"
done

Step 4.3 — Close Orphaned Renovate PRs

After branches are deleted, any associated PRs will be automatically closed by GitHub.


Execution Checklist

  • Phase 1: Git Recovery

    • 1.1 Abort rebase
    • 1.2 Fetch latest
    • 1.3 Merge development
    • 1.4 Resolve conflicts
    • 1.5 Push merged branch
  • Phase 2: Renovate Fix

    • Edit .github/renovate.json - remove feature/beta-release from baseBranches
    • Commit and push
  • Phase 3: Propagate Workflow Fix

    • Edit .github/workflows/propagate-changes.yml - simplify triggers and logic
    • Commit and push
  • Phase 4: Cleanup

    • 4.1 Delete nightly branch
    • 4.2 Delete orphaned renovate/feature/beta-release-* branches
    • 4.3 Verify orphaned PRs are closed

Verification

After all phases complete:

# Confirm no rebase in progress
git status
# Expected: "On branch feature/beta-release" with clean state

# Confirm nightly deleted
git branch -r | grep nightly
# Expected: no output

# Confirm orphaned renovate branches deleted
git branch -r | grep 'renovate/feature/beta-release'
# Expected: no output

# Confirm Renovate config only targets development
grep -A2 baseBranches .github/renovate.json
# Expected: only "development"

Rollback Plan

If issues occur:

  1. Git Recovery Failed:

    git fetch origin
    git checkout feature/beta-release
    git reset --hard origin/feature/beta-release
    
  2. Renovate Changes Broke Something: Revert the commit to .github/renovate.json

  3. Propagate Workflow Issues: Revert the commit to .github/workflows/propagate-changes.yml


Archived Spec (Prior Implementation)

Security Fix: Remove Hardcoded Encryption Keys from Docker Compose Files

Plan ID: SEC-2026-001
Status: IMPLEMENTED
Priority: Critical (Security)
Created: 2026-01-25
Implemented By: Management Agent


Summary

Removed hardcoded encryption keys from Docker Compose test files and implemented ephemeral key generation in CI workflows.

Changes Applied

| File | Change |
| --- | --- |
| .docker/compose/docker-compose.playwright.yml | Replaced hardcoded key with ${CHARON_ENCRYPTION_KEY:?...} |
| .docker/compose/docker-compose.e2e.yml | Replaced hardcoded key with ${CHARON_ENCRYPTION_KEY:?...} |
| .github/workflows/e2e-tests.yml | Added ephemeral key generation step |
| .env.test.example | Added prominent documentation |

Security Notes

  • The old key ucDWy5ScLubd3QwCHhQa2SY7wL2OF48p/c9nZhyW1mA= exists in git history
  • This key should NEVER be used in any production environment
  • Each CI run now generates a unique ephemeral key
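
The workflow's key generation step is not reproduced in this plan. As a sketch of what an ephemeral key of the same shape looks like, assuming Node.js is available, the snippet below draws 32 random bytes and base64-encodes them, which matches the form produced by `openssl rand -base64 32`. The function name `generateEphemeralKey` is hypothetical.

```typescript
import { randomBytes } from "node:crypto";

// 32 random bytes, base64-encoded (44 characters including padding)
function generateEphemeralKey(): string {
  return randomBytes(32).toString("base64");
}

const key = generateEphemeralKey();
console.log(`generated key of ${key.length} characters`);
```

Because each call draws fresh random bytes, a key generated this way is never committed and differs between runs.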

Testing

# Verify compose fails without key
unset CHARON_ENCRYPTION_KEY
docker compose -f .docker/compose/docker-compose.playwright.yml config 2>&1
# Expected: "CHARON_ENCRYPTION_KEY is required"

# Verify compose succeeds with key
export CHARON_ENCRYPTION_KEY=$(openssl rand -base64 32)
docker compose -f .docker/compose/docker-compose.playwright.yml config
# Expected: Valid YAML output

References


Playwright Security Test Helpers

Plan ID: E2E-SEC-001
Status: COMPLETED
Priority: Critical (Blocking 230/707 E2E test failures)
Created: 2026-01-25
Completed: 2026-01-25
Scope: Add security test helpers to prevent ACL deadlock in E2E tests


Completion Notes

Implementation Summary:

  • Created tests/utils/security-helpers.ts with full security state management utilities
  • Functions implemented: getSecurityStatus, setSecurityModuleEnabled, captureSecurityState, restoreSecurityState, withSecurityEnabled, disableAllSecurityModules
  • Pattern enables guaranteed cleanup via Playwright's test.afterAll() hook

Documentation:


Problem Summary

During E2E testing, if ACL is left enabled from a previous test run (e.g., due to test failure), it can create a deadlock:

  1. ACL blocks API requests → returns 403 Forbidden
  2. Global cleanup can't run → API blocked
  3. Auth setup fails → tests skip
  4. Manual intervention required to reset volumes

Root Cause Analysis:

  • security-dashboard.spec.ts has tests that toggle ACL, WAF, and Rate Limiting
  • The tests attempt to "toggle back" but if a test fails mid-execution, cleanup doesn't run
  • A test.afterAll hook runs even when a test in the block fails, which makes it the right place for cleanup
  • The current tests don't use such a hook for security state management
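
The difference between an inline "toggle back" and cleanup that always runs can be shown without Playwright. The sketch below is a simplified simulation: `FakeBackend` is a hypothetical in-memory stand-in for the ACL setting, and try/finally plays the role that an afterAll hook plays in the real suite.

```typescript
// Hypothetical in-memory stand-in for the backend's ACL setting
class FakeBackend {
  private acl = false;
  setAcl(enabled: boolean): void {
    this.acl = enabled;
  }
  isAclEnabled(): boolean {
    return this.acl;
  }
}

// A test body that enables ACL and then fails before it can toggle back
function failingToggleTest(backend: FakeBackend): void {
  backend.setAcl(true);
  throw new Error("assertion failed mid-test");
}

// Inline cleanup: the toggle-back sits after the test logic, so a failure skips it
function runWithInlineCleanup(backend: FakeBackend): void {
  try {
    failingToggleTest(backend);
    backend.setAcl(false); // never reached when the test throws
  } catch {
    // the runner records the failure and moves on
  }
}

// Hook-style cleanup: capture first, restore in a step that always runs
function runWithGuaranteedCleanup(backend: FakeBackend): void {
  const original = backend.isAclEnabled();
  try {
    failingToggleTest(backend);
  } catch {
    // the runner records the failure and moves on
  } finally {
    backend.setAcl(original);
  }
}

const leaky = new FakeBackend();
runWithInlineCleanup(leaky);
console.log(leaky.isAclEnabled()); // true: ACL is stuck on

const safe = new FakeBackend();
runWithGuaranteedCleanup(safe);
console.log(safe.isAclEnabled()); // false: original state restored
```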

Solution Architecture

API Endpoints (Backend Already Supports)

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/v1/security/status | GET | Returns current state of all security modules |
| /api/v1/settings | POST | Toggle settings with { key: "security.acl.enabled", value: "true/false" } |

Settings Keys

| Key | Values | Description |
| --- | --- | --- |
| security.acl.enabled | "true" / "false" | Toggle ACL enforcement |
| security.waf.enabled | "true" / "false" | Toggle WAF enforcement |
| security.rate_limit.enabled | "true" / "false" | Toggle Rate Limiting |
| security.crowdsec.enabled | "true" / "false" | Toggle CrowdSec |
| feature.cerberus.enabled | "true" / "false" | Master toggle for all security |
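
One detail in these tables is easy to miss: the value is the string "true" or "false", not a JSON boolean. A minimal sketch of building the POST body for /api/v1/settings follows; `buildTogglePayload`, `SettingName` and `SettingsPayload` are hypothetical names used only for illustration.

```typescript
// The five keys from the table above
type SettingName = "acl" | "waf" | "rate_limit" | "crowdsec" | "cerberus";

interface SettingsPayload {
  key: string;
  value: "true" | "false"; // string, not boolean
}

function buildTogglePayload(setting: SettingName, enabled: boolean): SettingsPayload {
  // Cerberus lives under the feature.* namespace; the rest under security.*
  const key =
    setting === "cerberus"
      ? "feature.cerberus.enabled"
      : `security.${setting}.enabled`;
  return { key, value: enabled ? "true" : "false" };
}

console.log(JSON.stringify(buildTogglePayload("acl", false)));
// {"key":"security.acl.enabled","value":"false"}
```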

Implementation Plan

File 1: tests/utils/security-helpers.ts (CREATE)

/**
 * Security Test Helpers - Safe ACL/WAF/Rate Limit toggle for E2E tests
 *
 * These helpers provide safe mechanisms to temporarily enable security features
 * during tests, with guaranteed cleanup even on test failure.
 *
 * Problem: If ACL is left enabled after a test failure, it blocks all API requests
 * causing subsequent tests to fail with 403 Forbidden (deadlock).
 *
 * Solution: Use Playwright's test.afterAll() with captured original state to
 * guarantee restoration regardless of test outcome.
 *
 * @example
 * ```typescript
 * import { withSecurityEnabled, getSecurityStatus } from './utils/security-helpers';
 *
 * test.describe('ACL Tests', () => {
 *   let cleanup: () => Promise<void>;
 *
 *   test.beforeAll(async ({ request }) => {
 *     cleanup = await withSecurityEnabled(request, { acl: true });
 *   });
 *
 *   test.afterAll(async () => {
 *     await cleanup();
 *   });
 *
 *   test('should enforce ACL', async ({ page }) => {
 *     // ACL is now enabled, test enforcement
 *   });
 * });
 * ```
 */

import { APIRequestContext } from '@playwright/test';

/**
 * Security module status from GET /api/v1/security/status
 */
export interface SecurityStatus {
  cerberus: { enabled: boolean };
  crowdsec: { mode: string; api_url: string; enabled: boolean };
  waf: { mode: string; enabled: boolean };
  rate_limit: { mode: string; enabled: boolean };
  acl: { mode: string; enabled: boolean };
}

/**
 * Options for enabling specific security modules
 */
export interface SecurityModuleOptions {
  /** Enable ACL enforcement */
  acl?: boolean;
  /** Enable WAF protection */
  waf?: boolean;
  /** Enable rate limiting */
  rateLimit?: boolean;
  /** Enable CrowdSec */
  crowdsec?: boolean;
  /** Enable master Cerberus toggle (required for other modules) */
  cerberus?: boolean;
}

/**
 * Captured state for restoration
 */
export interface CapturedSecurityState {
  acl: boolean;
  waf: boolean;
  rateLimit: boolean;
  crowdsec: boolean;
  cerberus: boolean;
}

/**
 * Mapping of module names to their settings keys
 */
const SECURITY_SETTINGS_KEYS: Record<keyof SecurityModuleOptions, string> = {
  acl: 'security.acl.enabled',
  waf: 'security.waf.enabled',
  rateLimit: 'security.rate_limit.enabled',
  crowdsec: 'security.crowdsec.enabled',
  cerberus: 'feature.cerberus.enabled',
};

/**
 * Get current security status from the API
 * @param request - Playwright APIRequestContext (authenticated)
 * @returns Current security status
 */
export async function getSecurityStatus(
  request: APIRequestContext
): Promise<SecurityStatus> {
  const response = await request.get('/api/v1/security/status');

  if (!response.ok()) {
    throw new Error(
      `Failed to get security status: ${response.status()} ${await response.text()}`
    );
  }

  return response.json();
}

/**
 * Set a specific security module's enabled state
 * @param request - Playwright APIRequestContext (authenticated)
 * @param module - Which module to toggle
 * @param enabled - Whether to enable or disable
 */
export async function setSecurityModuleEnabled(
  request: APIRequestContext,
  module: keyof SecurityModuleOptions,
  enabled: boolean
): Promise<void> {
  const key = SECURITY_SETTINGS_KEYS[module];
  const value = enabled ? 'true' : 'false';

  const response = await request.post('/api/v1/settings', {
    data: { key, value },
  });

  if (!response.ok()) {
    throw new Error(
      `Failed to set ${module} to ${enabled}: ${response.status()} ${await response.text()}`
    );
  }

  // Wait a brief moment for Caddy config reload
  await new Promise((resolve) => setTimeout(resolve, 500));
}

/**
 * Capture current security state for later restoration
 * @param request - Playwright APIRequestContext (authenticated)
 * @returns Captured state object
 */
export async function captureSecurityState(
  request: APIRequestContext
): Promise<CapturedSecurityState> {
  const status = await getSecurityStatus(request);

  return {
    acl: status.acl.enabled,
    waf: status.waf.enabled,
    rateLimit: status.rate_limit.enabled,
    crowdsec: status.crowdsec.enabled,
    cerberus: status.cerberus.enabled,
  };
}

/**
 * Restore security state to previously captured values
 * @param request - Playwright APIRequestContext (authenticated)
 * @param state - Previously captured state
 */
export async function restoreSecurityState(
  request: APIRequestContext,
  state: CapturedSecurityState
): Promise<void> {
  // Read current values in the same shape as the captured state
  const currentState = await captureSecurityState(request);

  // Restore in reverse dependency order (features before master toggle)
  const modules: (keyof SecurityModuleOptions)[] = ['acl', 'waf', 'rateLimit', 'crowdsec', 'cerberus'];

  for (const module of modules) {
    // Only write settings that differ, to avoid needless config reloads
    if (currentState[module] !== state[module]) {
      await setSecurityModuleEnabled(request, module, state[module]);
    }
  }
}

/**
 * Enable security modules temporarily with guaranteed cleanup.
 *
 * Returns a cleanup function that MUST be called in test.afterAll().
 * The cleanup function restores the original state even if tests fail.
 *
 * @param request - Playwright APIRequestContext (authenticated)
 * @param options - Which modules to enable
 * @returns Cleanup function to restore original state
 *
 * @example
 * ```typescript
 * test.describe('ACL Tests', () => {
 *   let cleanup: () => Promise<void>;
 *
 *   test.beforeAll(async ({ request }) => {
 *     cleanup = await withSecurityEnabled(request, { acl: true, cerberus: true });
 *   });
 *
 *   test.afterAll(async () => {
 *     await cleanup();
 *   });
 * });
 * ```
 */
export async function withSecurityEnabled(
  request: APIRequestContext,
  options: SecurityModuleOptions
): Promise<() => Promise<void>> {
  // Capture original state BEFORE making any changes
  const originalState = await captureSecurityState(request);

  // Enable Cerberus first (master toggle) if any security module is requested
  const needsCerberus = options.acl || options.waf || options.rateLimit || options.crowdsec;
  if ((needsCerberus || options.cerberus) && !originalState.cerberus) {
    await setSecurityModuleEnabled(request, 'cerberus', true);
  }

  // Enable requested modules
  if (options.acl) {
    await setSecurityModuleEnabled(request, 'acl', true);
  }
  if (options.waf) {
    await setSecurityModuleEnabled(request, 'waf', true);
  }
  if (options.rateLimit) {
    await setSecurityModuleEnabled(request, 'rateLimit', true);
  }
  if (options.crowdsec) {
    await setSecurityModuleEnabled(request, 'crowdsec', true);
  }

  // Return cleanup function that restores original state
  return async () => {
    try {
      await restoreSecurityState(request, originalState);
    } catch (error) {
      // Log error but don't throw - cleanup should not fail tests
      console.error('Failed to restore security state:', error);
      // Try emergency disable of ACL to prevent deadlock
      try {
        await setSecurityModuleEnabled(request, 'acl', false);
      } catch {
        console.error('Emergency ACL disable also failed - manual intervention may be required');
      }
    }
  };
}

/**
 * Disable all security modules (emergency reset).
 * Use this in global-setup.ts or when tests need a clean slate.
 *
 * @param request - Playwright APIRequestContext (authenticated)
 */
export async function disableAllSecurityModules(
  request: APIRequestContext
): Promise<void> {
  const modules: (keyof SecurityModuleOptions)[] = ['acl', 'waf', 'rateLimit', 'crowdsec'];

  for (const module of modules) {
    try {
      await setSecurityModuleEnabled(request, module, false);
    } catch (error) {
      console.warn(`Failed to disable ${module}:`, error);
    }
  }
}

/**
 * Check if ACL is currently blocking requests.
 * Useful for debugging test failures.
 *
 * @param request - Playwright APIRequestContext
 * @returns True if ACL is enabled and blocking
 */
export async function isAclBlocking(request: APIRequestContext): Promise<boolean> {
  try {
    const status = await getSecurityStatus(request);
    return status.acl.enabled && status.cerberus.enabled;
  } catch {
    // If we can't get status, ACL might be blocking
    return true;
  }
}
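
The capture-then-restore flow can also be exercised without a running backend. The following is a sketch under stated assumptions: `FakeSettingsApi` is a hypothetical in-memory replacement for the settings API, and `enableTemporarily` is a condensed analogue of withSecurityEnabled written for this illustration, not the helper itself.

```typescript
type ModuleName = "acl" | "waf" | "rateLimit" | "crowdsec" | "cerberus";
type ModuleState = Record<ModuleName, boolean>;

// Hypothetical in-memory replacement for the settings API
class FakeSettingsApi {
  private state: ModuleState = {
    acl: false, waf: false, rateLimit: false, crowdsec: false, cerberus: true,
  };
  async read(): Promise<ModuleState> {
    return { ...this.state }; // copy, so callers hold a snapshot
  }
  async write(module: ModuleName, enabled: boolean): Promise<void> {
    this.state[module] = enabled;
  }
}

// Features first, master toggle last
const RESTORE_ORDER: ModuleName[] = ["acl", "waf", "rateLimit", "crowdsec", "cerberus"];

// Capture the original state, enable the requested modules, return a restore closure
async function enableTemporarily(
  api: FakeSettingsApi,
  modules: ModuleName[]
): Promise<() => Promise<void>> {
  const original = await api.read();
  for (const module of modules) {
    await api.write(module, true);
  }
  return async () => {
    const current = await api.read();
    for (const module of RESTORE_ORDER) {
      if (current[module] !== original[module]) {
        await api.write(module, original[module]); // only touch what changed
      }
    }
  };
}

async function demo(): Promise<ModuleState> {
  const api = new FakeSettingsApi();
  const cleanup = await enableTemporarily(api, ["acl", "waf"]);
  try {
    throw new Error("simulated test failure");
  } catch {
    // swallowed only to keep the demo running
  } finally {
    await cleanup();
  }
  return api.read();
}

demo().then((state) => console.log(state));
```

Even though the simulated test throws, the state read back afterwards has acl and waf disabled again.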

File 2: tests/security/security-dashboard.spec.ts (MODIFY)

Changes Required:

  1. Import the new security helpers
  2. Add test.beforeAll to capture initial state
  3. Add test.afterAll to guarantee cleanup
  4. Remove redundant "toggle back" steps in individual tests
  5. Group toggle tests in a separate describe block with isolated cleanup

Exact Changes:

// ADD after existing imports (around line 12)
import {
  withSecurityEnabled,
  captureSecurityState,
  restoreSecurityState,
  CapturedSecurityState,
} from '../utils/security-helpers';

// REPLACE the entire 'Module Toggle Actions' describe block (lines ~80-180)
// with this safer implementation:

test.describe('Module Toggle Actions', () => {
  // Capture state ONCE for this describe block
  let originalState: CapturedSecurityState | undefined;

  test.beforeAll(async ({ request }) => {
    originalState = await captureSecurityState(request);
  });

  // Ask for the request fixture again here instead of reusing the one from
  // beforeAll: a context stored from that hook may already be disposed by
  // the time afterAll runs.
  test.afterAll(async ({ request }) => {
    // CRITICAL: Restore original state even if tests fail
    if (originalState) {
      await restoreSecurityState(request, originalState);
    }
  });

  test('should toggle ACL enabled/disabled', async ({ page }) => {
    const toggle = page.getByTestId('toggle-acl');

    const isDisabled = await toggle.isDisabled();
    if (isDisabled) {
      test.info().annotations.push({
        type: 'skip-reason',
        description: 'Toggle is disabled because Cerberus security is not enabled',
      });
      test.skip();
      return;
    }

    await test.step('Toggle ACL state', async () => {
      await page.waitForLoadState('networkidle');
      await toggle.scrollIntoViewIfNeeded();
      await page.waitForTimeout(200);
      await toggle.click({ force: true });
      await waitForToast(page, /updated|success|enabled|disabled/i, 10000);
    });

    // NOTE: Do NOT toggle back here - afterAll handles cleanup
  });

  test('should toggle WAF enabled/disabled', async ({ page }) => {
    const toggle = page.getByTestId('toggle-waf');

    const isDisabled = await toggle.isDisabled();
    if (isDisabled) {
      test.info().annotations.push({
        type: 'skip-reason',
        description: 'Toggle is disabled because Cerberus security is not enabled',
      });
      test.skip();
      return;
    }

    await test.step('Toggle WAF state', async () => {
      await page.waitForLoadState('networkidle');
      await toggle.scrollIntoViewIfNeeded();
      await page.waitForTimeout(200);
      await toggle.click({ force: true });
      await waitForToast(page, /updated|success|enabled|disabled/i, 10000);
    });

    // NOTE: Do NOT toggle back here - afterAll handles cleanup
  });

  test('should toggle Rate Limiting enabled/disabled', async ({ page }) => {
    const toggle = page.getByTestId('toggle-rate-limit');

    const isDisabled = await toggle.isDisabled();
    if (isDisabled) {
      test.info().annotations.push({
        type: 'skip-reason',
        description: 'Toggle is disabled because Cerberus security is not enabled',
      });
      test.skip();
      return;
    }

    await test.step('Toggle Rate Limit state', async () => {
      await page.waitForLoadState('networkidle');
      await toggle.scrollIntoViewIfNeeded();
      await page.waitForTimeout(200);
      await toggle.click({ force: true });
      await waitForToast(page, /updated|success|enabled|disabled/i, 10000);
    });

    // NOTE: Do NOT toggle back here - afterAll handles cleanup
  });

  test('should persist toggle state after page reload', async ({ page }) => {
    const toggle = page.getByTestId('toggle-acl');

    const isDisabled = await toggle.isDisabled();
    if (isDisabled) {
      test.info().annotations.push({
        type: 'skip-reason',
        description: 'Toggle is disabled because Cerberus security is not enabled',
      });
      test.skip();
      return;
    }

    const initialChecked = await toggle.isChecked();

    await test.step('Toggle ACL state', async () => {
      await page.waitForLoadState('networkidle');
      await toggle.scrollIntoViewIfNeeded();
      await page.waitForTimeout(200);
      await toggle.click({ force: true });
      await waitForToast(page, /updated|success|enabled|disabled/i, 10000);
    });

    await test.step('Reload page', async () => {
      await page.reload();
      await waitForLoadingComplete(page);
    });

    await test.step('Verify state persisted', async () => {
      const newChecked = await page.getByTestId('toggle-acl').isChecked();
      expect(newChecked).toBe(!initialChecked);
    });

    // NOTE: Do NOT restore here - afterAll handles cleanup
  });
});

File 3: tests/global-setup.ts (MODIFY)

Add Emergency Security Reset:

// ADD at the top of the file, with the other imports
import { request as playwrightRequest } from '@playwright/test';
import { existsSync } from 'fs';
import { STORAGE_STATE } from './constants';

// ADD as a helper next to the globalSetup function:

async function emergencySecurityReset(baseURL: string) {
  // Only run if auth state exists (meaning we can make authenticated requests)
  if (!existsSync(STORAGE_STATE)) {
    return;
  }

  try {
    const authenticatedContext = await playwrightRequest.newContext({
      baseURL,
      storageState: STORAGE_STATE,
    });

    try {
      // Disable ACL to prevent deadlock from previous failed runs
      const response = await authenticatedContext.post('/api/v1/settings', {
        data: { key: 'security.acl.enabled', value: 'false' },
      });

      // A non-2xx response (e.g. 403 from ACL itself) means the reset did not apply
      if (!response.ok()) {
        throw new Error(`settings API returned HTTP ${response.status()}`);
      }
      console.log('✓ Security reset: ACL disabled');
    } finally {
      // Release the context whether or not the request succeeded
      await authenticatedContext.dispose();
    }
  } catch (error) {
    console.warn('⚠️ Could not reset security state:', error);
  }
}

// Call at end of globalSetup, after auth state is created:
await emergencySecurityReset(process.env.PLAYWRIGHT_BASE_URL || 'http://localhost:8080');

File 4: tests/fixtures/auth-fixtures.ts (OPTIONAL ENHANCEMENT)

Add security fixture for tests that need it:

// ADD after existing imports
import {
  withSecurityEnabled,
  SecurityModuleOptions,
  CapturedSecurityState,
  captureSecurityState,
  restoreSecurityState,
} from '../utils/security-helpers';

// ADD to AuthFixtures interface
interface AuthFixtures {
  // ... existing fixtures ...

  /**
   * Security state manager for tests that need to toggle security modules.
   * Automatically captures and restores state.
   */
  securityState: {
    enable: (options: SecurityModuleOptions) => Promise<void>;
    captured: CapturedSecurityState | null;
  };
}

// ADD fixture definition in test.extend
securityState: async ({ request }, use) => {
  let capturedState: CapturedSecurityState | null = null;

  const manager = {
    enable: async (options: SecurityModuleOptions) => {
      capturedState = await captureSecurityState(request);
      const cleanup = await withSecurityEnabled(request, options);
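      // Store cleanup so it runs during fixture teardown
      manager._cleanup = cleanup;
    },
    // Getter, so callers see the state captured by enable() rather than the initial null
    get captured() {
      return capturedState;
    },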
    _cleanup: null as (() => Promise<void>) | null,
  };

  await use(manager);

  // Cleanup after test
  if (manager._cleanup) {
    await manager._cleanup();
  }
},

Execution Checklist

Phase 1: Create Helper Module

  • 1.1 Create tests/utils/security-helpers.ts with exact code from File 1 above
  • 1.2 Run TypeScript check: npx tsc --noEmit
  • 1.3 Verify helper imports correctly in a test file

Phase 2: Update Security Dashboard Tests

  • 2.1 Add imports to tests/security/security-dashboard.spec.ts
  • 2.2 Replace 'Module Toggle Actions' describe block with new implementation
  • 2.3 Run affected tests: npx playwright test security-dashboard --project=chromium
  • 2.4 Verify tests pass AND cleanup happens (check security status after)

Phase 3: Add Global Safety Net

  • 3.1 Update tests/global-setup.ts with emergency security reset
  • 3.2 Run full test suite: npx playwright test --project=chromium
  • 3.3 Verify no ACL deadlock occurs across multiple runs

Phase 4: Validation

  • 4.1 Force a test failure (e.g., add throw new Error()) and verify cleanup still runs
  • 4.2 Check security status after failed test: curl localhost:8080/api/v1/security/status
  • 4.3 Confirm ACL is disabled after cleanup
  • 4.4 Run full E2E suite 3 times consecutively to verify stability
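
For steps 4.2 and 4.3, the JSON returned by the status endpoint can be checked mechanically rather than by eye. The sketch below assumes the response has at least the acl and cerberus fields described in the SecurityStatus interface; `aclCouldBlock` and the trimmed `StatusResponse` type are hypothetical.

```typescript
interface ModuleStatus {
  enabled: boolean;
}

// Subset of the status response needed for this check
interface StatusResponse {
  acl: ModuleStatus;
  cerberus: ModuleStatus;
}

// ACL can only block requests when both it and the Cerberus master toggle are on
function aclCouldBlock(rawJson: string): boolean {
  const status = JSON.parse(rawJson) as StatusResponse;
  return status.acl.enabled && status.cerberus.enabled;
}

// Example body (assumption): what a clean state might look like after cleanup
const afterCleanup = '{"acl":{"enabled":false},"cerberus":{"enabled":true}}';
console.log(aclCouldBlock(afterCleanup)); // false
```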

Benefits

  1. No deadlock: Tests can safely enable/disable ACL with guaranteed cleanup
  2. Cleanup guaranteed: test.afterAll runs even on failure
  3. Realistic testing: ACL tests use the same toggle mechanism as users
  4. Isolation: Other tests unaffected by ACL state
  5. Global safety net: Even if individual cleanup fails, global setup resets state

Risk Mitigation

| Risk | Mitigation |
| --- | --- |
| Cleanup fails due to API error | Emergency fallback disables ACL specifically |
| Global setup can't reset state | Auth state file check prevents errors |
| Tests run in parallel | Each describe block has its own captured state |
| API changes break helpers | Settings keys are centralized in one const |

Files Summary

| File | Action | Priority |
| --- | --- | --- |
| tests/utils/security-helpers.ts | CREATE | Critical |
| tests/security/security-dashboard.spec.ts | MODIFY | Critical |
| tests/global-setup.ts | MODIFY | High |
| tests/fixtures/auth-fixtures.ts | MODIFY (Optional) | Low |