Remove overly complex verification logic that was causing all browser
jobs to fail. Browser installation should fail fast and clearly if
there are issues.
Changes:
- Remove multi-line verification scripts from all 3 browser install steps
- Simplify to single command: npx playwright install --with-deps {browser}
- Let install step show actual errors if it fails
- Let test execution show "browser not found" errors if install incomplete
Rationale:
- Previous complex verification (using grep/find) was the failure point
- Simpler approach provides clearer error messages for debugging
- Tests themselves will fail clearly if browsers aren't available
Expected outcome:
- Install steps show actual error messages if they fail
- If install succeeds, tests execute normally
- If install "succeeds" but browser is missing, test step shows clear error
Timeout remains at 45 minutes (accommodates 10-15 min install + execution)
- Changed workflow name to reflect sequential execution for stability.
- Reduced test sharding from 4 to 1 per browser, resulting in 3 total jobs.
- Updated job summaries and documentation to clarify execution model.
- Added new documentation file for E2E CI failure diagnosis.
- Adjusted job summary tables to reflect changes in shard counts and execution type.
workflow_run triggers only fire for push events, not pull_request events,
causing PRs to skip integration and E2E tests entirely. Add dual triggers
to all test workflows so they run for both push (via workflow_run) and
pull_request events, while maintaining single-build architecture.
All workflows still pull pre-built images from docker-build.yml - no
redundant builds introduced. This fixes PR test coverage while preserving
the "Build Once, Test Many" optimization for push events.
Fixes: Build Once architecture (commit 928033ec)
Restructures CI/CD pipeline to eliminate redundant Docker image builds
across parallel test workflows. Previously, every PR triggered 5 separate
builds of identical images, consuming compute resources unnecessarily and
contributing to registry storage bloat.
Registry storage was growing at 20GB/week due to unmanaged transient tags
from multiple parallel builds. While automated cleanup exists, preventing
the creation of redundant images is more efficient than cleaning them up.
Changes CI/CD orchestration so docker-build.yml is the single source of
truth for all Docker images. Integration tests (CrowdSec, Cerberus, WAF,
Rate Limiting) and E2E tests now wait for the build to complete via
workflow_run triggers, then pull the pre-built image from GHCR.
PR and feature branch images receive immutable tags that include commit
SHA (pr-123-abc1234, feature-dns-provider-def5678) to prevent race
conditions when branches are updated during test execution. Tag
sanitization handles special characters, slashes, and name length limits
to ensure Docker compatibility.
Adds retry logic for registry operations to handle transient GHCR
failures, with dual-source fallback to artifact downloads when registry
pulls fail. Preserves all existing functionality and backward
compatibility while reducing parallel build count from 5× to 1×.
Security scanning now covers all PR images (previously skipped),
blocking merges on CRITICAL/HIGH vulnerabilities. Concurrency groups
prevent stale test runs from consuming resources when PRs are updated
mid-execution.
Expected impact: 80% reduction in compute resources, 4× faster
total CI time (120min → 30min), prevention of uncontrolled registry
storage growth, and 100% consistency guarantee (all tests validate
the exact same image that would be deployed).
Closes #[issue-number-if-exists]
- Implemented `diagnose-crowdsec.sh` script for checking CrowdSec connectivity and configuration.
- Added E2E tests for CrowdSec console enrollment, including API checks for enrollment status, diagnostics connectivity, and configuration validation.
- Created E2E tests for CrowdSec diagnostics, covering configuration file validation, connectivity checks, and configuration export.
- Create phase1_diagnostics.md to document findings from test interruptions
- Introduce phase1_validation_checklist.md for pre-deployment validation
- Implement diagnostic-helpers.ts for enhanced logging and state capture
- Enable browser console logging, error tracking, and dialog lifecycle monitoring
- Establish performance monitoring for test execution times
- Document actionable recommendations for Phase 2 remediation