E2E Test Troubleshooting

Common issues and solutions for Playwright E2E tests.

Recent Improvements (2026-02)

Test Timeout Issues - RESOLVED

Symptoms: Tests timing out after 30 seconds, config reload overlay blocking interactions

Resolution:

Extended timeout from 30s to 60s for feature flag propagation
Added automatic detection and waiting for config reload overlay
Improved test isolation with proper cleanup in afterEach hooks

If you still experience timeouts:

Rebuild the E2E container: .github/skills/scripts/skill-runner.sh docker-rebuild-e2e
Check Docker logs for health check failures
Verify emergency token is set in .env file

API Key Format Mismatch - RESOLVED

Symptoms: Feature flag tests failing with propagation timeout

Resolution:

Added key normalization to handle both feature.cerberus.enabled and cerberus.enabled formats
Tests now automatically detect and adapt to API response format

Configuration: No manual configuration is needed; normalization is automatic.

Quick Diagnostics

Run these commands first:

# Check emergency token is set
grep CHARON_EMERGENCY_TOKEN .env

# Verify token length
echo -n "$(grep CHARON_EMERGENCY_TOKEN .env | cut -d= -f2)" | wc -c
# Should output: 64

# Check Docker container is running
docker ps | grep charon

# Check health endpoint
curl -f http://localhost:8080/api/v1/health || echo "Health check failed"
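
The grep and cut pipeline above also matches commented-out lines and concatenates values if the variable appears more than once. A slightly more defensive sketch is shown here. The helper names `env_token` and `env_token_length` are hypothetical, and the code assumes the .env file uses plain, unquoted KEY=value lines.

```shell
# env_token FILE: print the value of the last uncommented
# CHARON_EMERGENCY_TOKEN=... line in FILE (hypothetical helper).
env_token() {
  sed -n 's/^CHARON_EMERGENCY_TOKEN=//p' "$1" | tail -n 1 | tr -d '\r'
}

# env_token_length FILE: print the number of characters in that value.
env_token_length() {
  printf '%s' "$(env_token "$1")" | wc -c | tr -d ' '
}
```

With a correctly configured file, `env_token_length .env` should print 64.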

Error: "CHARON_EMERGENCY_TOKEN is not set"

Symptoms

Tests fail immediately with environment configuration error
Error appears in global setup before any tests run

Cause

Emergency token not configured in .env file.

Solution

Generate token:
```
openssl rand -hex 32
```

Add to .env file:

echo "CHARON_EMERGENCY_TOKEN=<paste_token_here>" >> .env

Verify:
```
grep CHARON_EMERGENCY_TOKEN .env
```
Run tests:
```
npx playwright test --project=chromium
```
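
Appending with `>>` leaves a duplicate entry behind if the variable already exists in .env. The following sketch replaces an existing line or appends a new one. `set_env_var` is a hypothetical helper, not part of the project, and it assumes the value contains no newlines (true for hex tokens).

```shell
# set_env_var FILE KEY VALUE: make sure FILE contains exactly one
# KEY=VALUE line (hypothetical helper). The line ends up last in the file.
set_env_var() {
  sev_file=$1
  sev_key=$2
  sev_value=$3
  sev_tmp=$(mktemp) || return 1
  if [ -f "$sev_file" ]; then
    # Keep every line except existing assignments of KEY.
    grep -v "^${sev_key}=" "$sev_file" > "$sev_tmp" || true
  fi
  printf '%s=%s\n' "$sev_key" "$sev_value" >> "$sev_tmp"
  # Write back in place so the file keeps its permissions.
  cat "$sev_tmp" > "$sev_file"
  rm -f "$sev_tmp"
}
```

Example usage: `set_env_var .env CHARON_EMERGENCY_TOKEN "$(openssl rand -hex 32)"`. Because it rewrites the file through a temporary copy, it also avoids the differences between GNU and BSD `sed -i`.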

📖 More Info: See Getting Started - Emergency Token Configuration

Error: "CHARON_EMERGENCY_TOKEN is too short"

Symptoms

Global setup fails with message about token length
Current token length shown in error (e.g., "32 chars, minimum 64")

Cause

Token is shorter than 64 characters (security requirement).

Solution

Regenerate token with correct length:

openssl rand -hex 32  # Generates 64-char hex string

Update .env file:

sed -i "s/CHARON_EMERGENCY_TOKEN=.*/CHARON_EMERGENCY_TOKEN=<new_token>/" .env

Verify length:

echo -n "$(grep CHARON_EMERGENCY_TOKEN .env | cut -d= -f2)" | wc -c
# Should output: 64
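
The arithmetic behind the fix: `openssl rand -hex 32` produces 32 random bytes, and each byte is printed as two hex digits, so the result is 64 characters. A token copied from `openssl rand -hex 16` would be only 32 characters and trigger this error. The check can be sketched as below; `check_token` is a hypothetical helper and its messages are illustrative, not the exact wording used by global setup.

```shell
# check_token TOKEN: succeed only if TOKEN is at least 64 characters long.
# Hypothetical helper; the message text is illustrative.
check_token() {
  ct_len=${#1}
  if [ "$ct_len" -lt 64 ]; then
    echo "token too short: $ct_len chars, minimum 64" >&2
    return 1
  fi
  echo "token length OK: $ct_len chars"
}
```

Example usage: `check_token "$(grep CHARON_EMERGENCY_TOKEN .env | cut -d= -f2)"`.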

Error: "Failed to reset security modules using emergency token"

Symptoms

Security teardown fails
Causes 20+ cascading test failures
Error message about emergency reset

Possible Causes

Token too short (< 64 chars)
Token doesn't match backend configuration
Backend not running or unreachable
Network/container issues

Solution

Step 1: Verify token configuration

# Check token exists and is 64 chars
echo -n "$(grep CHARON_EMERGENCY_TOKEN .env | cut -d= -f2)" | wc -c

# Check backend env matches (if using Docker)
docker exec charon env | grep CHARON_EMERGENCY_TOKEN

Step 2: Verify backend is running

curl http://localhost:8080/api/v1/health
# Should return: {"status":"ok"}

Step 3: Test emergency endpoint directly

curl -X POST http://localhost:8080/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: $(grep CHARON_EMERGENCY_TOKEN .env | cut -d= -f2)" \
  -H "Content-Type: application/json" \
  -d '{"reason":"manual test"}' | jq

Step 4: Check backend logs

# Docker Compose
docker compose logs charon | tail -50

# Docker Run
docker logs charon | tail -50

Step 5: Regenerate token if needed

# Generate new token
NEW_TOKEN=$(openssl rand -hex 32)

# Update .env
sed -i "s/CHARON_EMERGENCY_TOKEN=.*/CHARON_EMERGENCY_TOKEN=${NEW_TOKEN}/" .env

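# Recreate the backend so it picks up the new token
# (docker restart reuses the environment the container was created with)
docker compose up -d --force-recreate charon
# Docker Run: remove the container and run it again with the new value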

# Wait for health
sleep 5 && curl http://localhost:8080/api/v1/health
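
A fixed `sleep 5` can be too short on a slow machine and wastes time on a fast one. A minimal polling sketch is shown here, assuming only a POSIX-style shell with `date +%s`; `wait_until` is a hypothetical helper name.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]: run COMMAND every 2 seconds
# until it succeeds or TIMEOUT_SECONDS have elapsed (hypothetical helper).
wait_until() {
  wu_deadline=$(( $(date +%s) + $1 ))
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$wu_deadline" ]; then
      return 1
    fi
    sleep 2
  done
}
```

Example usage: `wait_until 60 curl -fsS http://localhost:8080/api/v1/health && echo "backend healthy"`. The function returns non-zero on timeout, so it can gate the test run in a script.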

Error: "Blocked by access control list" (403)

Symptoms

Most tests fail with 403 Forbidden errors
Error message contains "Blocked by access control"

Cause

Security teardown did not successfully disable ACL before tests ran.

Solution

Run teardown script manually:

npx playwright test tests/security-teardown.setup.ts

Check teardown output for errors:
- Look for "Emergency reset successful" message
- Verify no error messages about missing token

Verify ACL is disabled:

curl http://localhost:8080/api/v1/security/status | jq
# acl.enabled should be false

If still blocked, manually disable via API:

# Using emergency token
curl -X POST http://localhost:8080/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: $(grep CHARON_EMERGENCY_TOKEN .env | cut -d= -f2)" \
  -H "Content-Type: application/json" \
  -d '{"reason":"manual disable before tests"}'

Run tests again:
```
npx playwright test --project=chromium
```

Tests Pass Locally but Fail in CI/CD

Symptoms

Tests work on your machine
Same tests fail in GitHub Actions
Error about missing emergency token in CI logs

Cause

CHARON_EMERGENCY_TOKEN not configured in GitHub Secrets.

Solution

Navigate to repository settings:
- Go to: https://github.com/<your-org>/<your-repo>/settings/secrets/actions
- Or: Repository → Settings → Secrets and Variables → Actions

Create secret:
- Click "New repository secret"
- Name: CHARON_EMERGENCY_TOKEN
- Value: Generate with openssl rand -hex 32
- Click "Add secret"

Verify secret is set:
- Secret should appear in list (value is masked)
- Cannot view value after creation (security)

Re-run workflow:
- Navigate to Actions tab
- Re-run failed workflow
- Check "Validate Emergency Token Configuration" step passes

📖 Detailed Instructions: See GitHub Setup Guide

Error: "ECONNREFUSED" or "ENOTFOUND"

Symptoms

Tests fail with connection refused errors
Cannot reach localhost:8080 or configured base URL

Cause

Backend container not running or not accessible.

Solution

Check container status:
```
docker ps | grep charon
```

If not running, start it:

# Docker Compose
docker compose up -d

# Docker Run
docker start charon

Wait for health:

timeout 60 bash -c 'until curl -f http://localhost:8080/api/v1/health; do sleep 2; done'

Check logs if still failing:
```
docker logs charon | tail -50
```

Error: Token appears to be a placeholder value

Symptoms

Global setup validation fails
Error mentions "placeholder value"

Cause

Token contains common placeholder strings like:

test-emergency-token
your_64_character
replace_this
0000000000000000
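
To check a token against the strings listed above before running the suite, a substring match is enough. This is a sketch only: `is_placeholder` is a hypothetical helper, and the real validation in global setup may check additional patterns.

```shell
# is_placeholder TOKEN: succeed if TOKEN contains one of the placeholder
# strings listed above (hypothetical helper).
is_placeholder() {
  case $1 in
    *test-emergency-token*|*your_64_character*|*replace_this*|*0000000000000000*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

Example usage: `is_placeholder "$(grep CHARON_EMERGENCY_TOKEN .env | cut -d= -f2)" && echo "replace this token"`.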

Solution

Generate a unique token:
```
openssl rand -hex 32
```

Replace placeholder in .env:

sed -i "s/CHARON_EMERGENCY_TOKEN=.*/CHARON_EMERGENCY_TOKEN=<new_token>/" .env

Verify it's not a placeholder:

grep CHARON_EMERGENCY_TOKEN .env
# Should show a random hex string

Debug Mode

Run tests with full debugging for deeper investigation:

With Playwright Inspector

npx playwright test --debug

Interactive UI for stepping through tests.

With Full Traces

npx playwright test --trace=on

Capture execution traces for each test.

View Trace After Test

npx playwright show-trace test-results/traces/*.zip

Opens trace viewer in browser.

With Enhanced Logging

DEBUG=charon:*,charon-test:* PLAYWRIGHT_DEBUG=1 npx playwright test --project=chromium

Enables all debug output.

Performance Issues

Tests Running Slowly

Symptoms: Tests take > 5 minutes for full suite.

Solutions:

Use sharding to split the suite across parallel jobs (the command below runs only the first of four shards):

npx playwright test --shard=1/4 --project=chromium

Run specific test files:

npx playwright test tests/manual-dns-provider.spec.ts

Skip slow tests during development:

npx playwright test --grep-invert "@slow"

Feature Flag Toggle Tests Timing Out

Symptoms:

Tests in tests/settings/system-settings.spec.ts fail with timeout errors
Error messages mention feature flag toggles (Cerberus, CrowdSec, Uptime, Persist)

Cause:

Backend N+1 query pattern causing 300-600ms latency in CI
Hard-coded waits insufficient for slower CI environments

Solution (Fixed in v2.x):

Backend now uses batch query pattern (3-6x faster: 600ms → 200ms P99)
Tests use condition-based polling with waitForFeatureFlagPropagation()
Retry logic with exponential backoff handles transient failures
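
The same retry-with-exponential-backoff idea can be useful when probing the backend from a shell script. The sketch below is not the project's `waitForFeatureFlagPropagation()` implementation; `retry_backoff` is a hypothetical helper that doubles the delay after each failed attempt.

```shell
# retry_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]:
# run COMMAND until it succeeds, doubling the delay after each failure.
# Returns 1 once MAX_ATTEMPTS attempts have failed (hypothetical helper).
retry_backoff() {
  rb_max=$1
  rb_delay=$2
  shift 2
  rb_attempt=1
  until "$@"; do
    if [ "$rb_attempt" -ge "$rb_max" ]; then
      return 1
    fi
    sleep "$rb_delay"
    rb_delay=$(( rb_delay * 2 ))
    rb_attempt=$(( rb_attempt + 1 ))
  done
}
```

Example usage: `retry_backoff 5 1 curl -fsS http://localhost:8080/api/v1/health` tries up to five times, waiting 1, 2, 4, and 8 seconds between attempts.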

If you still experience issues:

Check backend latency: docker logs charon 2>&1 | grep -F "[METRICS]"
Verify batch query is being used (should see WHERE key IN (...) in logs)
Ensure you're running the latest version with the optimization

📖 See Also: Feature Flags Performance Documentation

Container Startup Slow

Symptoms: Health check timeouts, tests fail before running.

Solutions:

Increase health check timeout:

timeout 120 bash -c 'until curl -f http://localhost:8080/api/v1/health; do sleep 2; done'

Pre-pull Docker image:
```
docker pull wikid82/charon:latest
```

Check Docker resource limits:

docker stats charon
# Ensure adequate CPU/memory

Getting Help

If you're still stuck after trying these solutions:

Check known issues:
- Review E2E Triage Report
- Search GitHub Issues

Collect diagnostic info:

# Environment
echo "OS: $(uname -a)"
echo "Docker: $(docker --version)"
echo "Node: $(node --version)"

# Configuration
echo "Base URL: ${PLAYWRIGHT_BASE_URL:-http://localhost:8080}"
# Checks the .env file, since the token is usually not exported in the shell
echo "Token set: $(grep -q '^CHARON_EMERGENCY_TOKEN=.' .env 2>/dev/null && echo "Yes" || echo "No")"

# Logs
docker logs charon > charon-logs.txt 2>&1
npx playwright test --project=chromium > test-output.txt 2>&1

Open GitHub issue:
- Include diagnostic info above
- Attach charon-logs.txt and test-output.txt
- Describe steps to reproduce
- Tag with testing and e2e labels
Ask in community:
- GitHub Discussions
- Include relevant error messages (mask any secrets!)
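
Before attaching logs, it helps to redact the token mechanically. The filter below is a minimal sketch: `mask_token` is a hypothetical helper, and it only covers the two forms that appear in this guide (the CHARON_EMERGENCY_TOKEN=... assignment and the X-Emergency-Token header), so review the output by hand as well.

```shell
# mask_token: read text on stdin and redact the value of any
# CHARON_EMERGENCY_TOKEN=... assignment or X-Emergency-Token header
# (hypothetical helper; other occurrences of the token are not caught).
mask_token() {
  sed -e 's/\(CHARON_EMERGENCY_TOKEN=\)[^[:space:]]*/\1***REDACTED***/g' \
      -e 's/\(X-Emergency-Token:[[:space:]]*\)[^[:space:]"]*/\1***REDACTED***/g'
}
```

Example usage: `mask_token < charon-logs.txt > charon-logs.masked.txt`.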

Last Updated: 2026-02-02
