Restructures CI/CD pipeline to eliminate redundant Docker image builds
across parallel test workflows. Previously, every PR triggered 5 separate
builds of identical images, consuming compute resources unnecessarily and
contributing to registry storage bloat.
Registry storage was growing at 20GB/week due to unmanaged transient tags
from multiple parallel builds. While automated cleanup exists, preventing
the creation of redundant images is more efficient than cleaning them up.
Changes CI/CD orchestration so docker-build.yml is the single source of
truth for all Docker images. Integration tests (CrowdSec, Cerberus, WAF,
Rate Limiting) and E2E tests now wait for the build to complete via
workflow_run triggers, then pull the pre-built image from GHCR.
PR and feature branch images receive immutable tags that include commit
SHA (pr-123-abc1234, feature-dns-provider-def5678) to prevent race
conditions when branches are updated during test execution. Tag
sanitization handles special characters, slashes, and name length limits
to ensure Docker compatibility.
Adds retry logic for registry operations to handle transient GHCR
failures, with dual-source fallback to artifact downloads when registry
pulls fail. Preserves all existing functionality and backward
compatibility while reducing parallel build count from 5× to 1×.
Security scanning now covers all PR images (previously skipped),
blocking merges on CRITICAL/HIGH vulnerabilities. Concurrency groups
prevent stale test runs from consuming resources when PRs are updated
mid-execution.
Expected impact: 80% reduction in compute resources, 4× faster
total CI time (120min → 30min), prevention of uncontrolled registry
storage growth, and 100% consistency guarantee (all tests validate
the exact same image that would be deployed).
Closes #[issue-number-if-exists]
- Implemented `diagnose-crowdsec.sh` script for checking CrowdSec connectivity and configuration.
- Added E2E tests for CrowdSec console enrollment, including API checks for enrollment status, diagnostics connectivity, and configuration validation.
- Created E2E tests for CrowdSec diagnostics, covering configuration file validation, connectivity checks, and configuration export.
- Create phase1_diagnostics.md to document findings from test interruptions
- Introduce phase1_validation_checklist.md for pre-deployment validation
- Implement diagnostic-helpers.ts for enhanced logging and state capture
- Enable browser console logging, error tracking, and dialog lifecycle monitoring
- Establish performance monitoring for test execution times
- Document actionable recommendations for Phase 2 remediation
- Print cache contents and Playwright CLI version for diagnostics
- Search for expected browser executables and force reinstall with --force if absent
- Add headless-launch verification via Node to fail fast with clear logs
- Add step to delete ~/.cache/ms-playwright before installing browsers
- Guarantees correct browser version for each run
- Prevents mismatched or missing browser binaries (chromium_headless_shell-1208, etc.)
- Should resolve browser not found errors for all browsers
- Removed restore-keys fallback from Playwright cache
- Only exact cache matches (same package-lock.json hash) are used
- This prevents restoring incompatible browser versions when Playwright updates
- Added cache-hit check to skip install when cache is valid
- Firefox and WebKit were failing because old cache was restored but browsers were incompatible
Job names now show: 'E2E chromium (Shard 1/4)' instead of 'E2E Tests (Shard 1/4)'
Makes it easier to identify which browser/shard is passing or failing
- Artifact names now include browser: playwright-report-{browser}-shard-{N}
- Docker logs include browser: docker-logs-{browser}-shard-{N}
- Install step always runs (idempotent) to ensure version match
- Fixed artifact name conflicts when 3 browsers share same shard number
- Updated summary and PR comment to reflect new naming
- Introduced a new wrapper function for query client to facilitate testing.
- Added comprehensive tests for upload, commit, and cancel operations.
- Improved error handling in tests to capture and assert error states.
- Enhanced session management and state reset functionality in tests.
- Implemented polling behavior tests for import status and preview queries.
- Ensured that upload previews are prioritized over status query previews.
- Validated cache invalidation and state management after commit and cancel actions.