Commit Graph

681 Commits

Author SHA1 Message Date
Jeremy
7c0a29b760 fix: Merge branch 'development' 2026-02-05 19:05:57 +00:00
GitHub Actions
7fc94902e8 fix(ci): remove redundant Playwright browser cache cleanup from workflows 2026-02-05 19:05:57 +00:00
GitHub Actions
b043a97539 fix(ci): remove redundant image tag determination logic from multiple workflows 2026-02-05 19:05:48 +00:00
GitHub Actions
e8584f17c0 git status
rm .github/workflows/crowdsec-integration.yml .github/workflows/rate-limit-integration.yml .github/workflows/waf-integration.yml .github/workflows/e2e-tests.yml
fix(ci): add image_tag input for manual triggers in integration workflows
2026-02-05 19:04:31 +00:00
renovate[bot]
96746ed100 fix(deps): update weekly-non-major-updates 2026-02-05 19:03:37 +00:00
renovate[bot]
43668b4d5c fix(deps): update weekly-non-major-updates 2026-02-05 19:03:08 +00:00
GitHub Actions
7a63e4b9c1 chore: update Go version references from 1.25.6 to 1.25.7 across documentation and scripts 2026-02-05 19:03:08 +00:00
GitHub Actions
21b52959f5 chore: e3e triage 2026-02-05 11:00:56 +00:00
GitHub Actions
39b5b8a928 fix(ci): reorganize E2E tests for improved isolation and execution stability 2026-02-05 01:47:22 +00:00
GitHub Actions
6aea2380b0 fix(ci): increase total shards for parallel test execution in E2E tests 2026-02-05 01:32:18 +00:00
GitHub Actions
5284aff1e5 fix(ci): update shard configuration for parallel test execution in E2E tests 2026-02-05 01:27:59 +00:00
GitHub Actions
140a8bfd0f fix(ci): increase total shards for parallel test execution in E2E tests 2026-02-05 01:02:10 +00:00
GitHub Actions
d708ecb394 fix(ci): update shard configuration for parallel test execution in E2E tests 2026-02-05 01:01:00 +00:00
GitHub Actions
f5892dd89d fix(ci): enable parallel test execution with sharding for E2E tests 2026-02-05 00:56:12 +00:00
GitHub Actions
d4f89ebf73 fix(ci): update conditions for artifact uploads and cleanup steps in E2E tests 2026-02-05 00:24:21 +00:00
GitHub Actions
9eed683a76 fix(ci): update concurrency group name for E2E tests workflow 2026-02-05 00:05:42 +00:00
GitHub Actions
8d393b6e82 fix(ci): simplify test execution commands and remove unnecessary logging for Chromium, Firefox, and WebKit tests 2026-02-04 23:53:17 +00:00
GitHub Actions
f5700c266a fix(ci): increase timeout for Chromium, Firefox, and WebKit tests; add line reporter for cleaner CI output 2026-02-04 23:46:05 +00:00
GitHub Actions
22619326de fix(ci): streamline Playwright configuration and remove preflight setup test 2026-02-04 23:34:48 +00:00
GitHub Actions
7c81c7e3de fix(ci): reduce timeout for Chromium tests to improve CI efficiency 2026-02-04 23:08:51 +00:00
GitHub Actions
57f0919116 fix(ci): enhance logging for environment details and test discovery in Chromium tests 2026-02-04 22:58:06 +00:00
GitHub Actions
f885096ab4 fix(ci): simplify Chromium, Firefox, and WebKit test job names and remove shard references 2026-02-04 21:48:28 +00:00
GitHub Actions
292ca5d170 fix(ci): enhance Playwright debug output for better browser launch diagnostics 2026-02-04 21:43:24 +00:00
GitHub Actions
89dc5650e1 debug(ci): Add Playwright verbose output and reduce job timeout
Investigation Phase:

Problem:
- Tests hang AFTER global setup completes
- No test execution begins (hung before first test)
- Step timeout (15min) doesn't trigger properly
- Job timeout (45min) eventually kills process after 44min

Changes:
1. Added DEBUG=pw:api to all browser jobs
   - Will show exact Playwright API calls
   - Pinpoint where execution hangs (auth setup vs browser launch vs test init)

2. Reduced job timeout: 45min → 20min
   - Fail faster when tests hang
   - Reduces wasted CI resources
   - Still allows normal test execution (local: 1.2min)

Expected Outcome:
- Verbose logs reveal hang location
- Faster feedback loop (20min vs 44min)
- Can identify if issue is:
  * auth.setup.ts hanging
  * Browser process not launching
  * Connection issues to application

Next Steps Based on Logs:
- If browser launch hangs: Add dumb-init (Phase 3)
- If auth setup hangs: Investigate cookie/storage state
- If network hangs: Add localhost loopback routing

Phase: 2.5 of 3 (Diagnostic Logging)
See: docs/plans/ci_hang_remediation.md
2026-02-04 21:11:13 +00:00
GitHub Actions
ff1bb06f60 feat(ci): Add explicit timeout enforcement (Phase 2)
Resource Constraint Management:

Problem:
- Tests hanging indefinitely during execution in CI
- 2-core runners resource-constrained vs local dev machines
- No timeout enforcement allows tests to run forever

Changes:
1. playwright.config.js:
   - Reduced per-test timeout: 90s → 60s (CI only)
   - Comment clarifies CI resource constraints
   - Local dev keeps 90s for debugging

2. .github/workflows/e2e-tests-split.yml:
   - Added timeout-minutes: 15 to all test steps
   - Ensures CI fails explicitly after 15 minutes
   - Prevents workflow hanging until 6-hour GitHub limit

Expected Outcome:
- Tests fail fast with timeout error instead of hanging
- Clearer debugging: timeout vs hang vs test failure
- CI resources freed up faster for other jobs

Phase: 2 of 3 (Resource Constraints)
See: docs/plans/ci_hang_remediation.md
2026-02-04 20:26:17 +00:00
GitHub Actions
eb917a82e6 fix(ci): update health check URL from localhost to 127.0.0.1 for consistency
- workflow explicitly set PLAYWRIGHT_BASE_URL: http://localhost:8080 which overrides all the 127.0.0.1 defaults
2026-02-04 20:06:15 +00:00
GitHub Actions
b94a40f54a fix(ci): adjust GeoIP database download and Playwright dependencies for CI stability 2026-02-04 18:46:09 +00:00
GitHub Actions
707c34b4d6 fix(ci): improve Playwright installation steps by removing redundant system dependency installs and enhancing exit code handling 2026-02-04 17:43:49 +00:00
GitHub Actions
1b66257868 fix(ci): enhance Playwright installation steps with system dependencies and cache checks 2026-02-04 17:27:35 +00:00
GitHub Actions
6e3fcf7824 fix: simplify Playwright browser installation steps
Remove overly complex verification logic that was causing all browser
jobs to fail. Browser installation should fail fast and clearly if
there are issues.

Changes:
- Remove multi-line verification scripts from all 3 browser install steps
- Simplify to single command: npx playwright install --with-deps {browser}
- Let install step show actual errors if it fails
- Let test execution show "browser not found" errors if install incomplete

Rationale:
- Previous complex verification (using grep/find) was the failure point
- Simpler approach provides clearer error messages for debugging
- Tests themselves will fail clearly if browsers aren't available

Expected outcome:
- Install steps show actual error messages if they fail
- If install succeeds, tests execute normally
- If install "succeeds" but browser is missing, test step shows clear error

Timeout remains at 45 minutes (accommodates 10-15 min install + execution)
2026-02-04 17:08:30 +00:00
GitHub Actions
3c0b9fa2b1 fix: resolve Playwright browser executable not found errors in CI
Root causes:
1. Browser cache was restoring corrupted/stale binaries from previous runs
2. 30-minute timeout insufficient for fresh Playwright installation (10-15 min)
   plus Docker/health checks and test execution

Changes:
- Remove browser caching from all 3 browser jobs (chromium, firefox, webkit)
- Increase timeout from 30 → 45 minutes for all jobs
- Add diagnostic logging to browser install steps:
  * Install start/completion timestamps
  * Exit code verification
  * Cache directory inspection on failure
  * Browser executable verification using 'npx playwright test --list'

Benefits:
- Fresh browser installations guaranteed (no cache pollution)
- 15-minute buffer prevents premature timeouts
- Detailed diagnostics to catch future installation issues early
- Consistent behavior across all browsers

Technical notes:
- Browser install with --with-deps takes 10-15 minutes per browser
- GitHub Actions cache was causing more harm than benefit (stale binaries)
- Sequential execution (1 shard per browser) combined with fresh installs
  ensures stable, reproducible CI behavior

Expected outcome:
- Firefox/WebKit failures from missing browser executables → resolved
- Chrome timeout at 30 minutes → resolved with 45 minute buffer
- Future installation issues → caught immediately via diagnostics

Refs: #hofix/ci
QA: YAML syntax validated, pre-commit hooks passed (12/12)
2026-02-04 16:44:47 +00:00
Jeremy
40a37f76ac Merge branch 'main' into hotfix/ci 2026-02-04 11:09:04 -05:00
GitHub Actions
e6c2f46475 fix(e2e): update E2E tests workflow to sequential execution and fix race conditions
- Changed workflow name to reflect sequential execution for stability.
- Reduced test sharding from 4 to 1 per browser, resulting in 3 total jobs.
- Updated job summaries and documentation to clarify execution model.
- Added new documentation file for E2E CI failure diagnosis.
- Adjusted job summary tables to reflect changes in shard counts and execution type.
2026-02-04 16:08:11 +00:00
Jeremy
a845b83ef7 fix: Merge branch 'development' 2026-02-04 16:01:22 +00:00
GitHub Actions
7bb88204d2 fix(ci): remove redundant Playwright browser cache cleanup from workflows 2026-02-04 14:42:17 +00:00
GitHub Actions
73f6d3d691 fix(ci): remove redundant image tag determination logic from multiple workflows 2026-02-04 14:24:11 +00:00
GitHub Actions
80c033b812 fix(ci): standardize image tag step ID across integration workflows 2026-02-04 14:16:02 +00:00
Jeremy
0519b4baed Merge branch 'main' into hotfix/ci 2026-02-04 09:10:32 -05:00
GitHub Actions
8edde88f95 fix(ci): add image_tag input for manual triggers in integration workflows 2026-02-04 14:08:36 +00:00
GitHub Actions
e1c7ed3a13 fix(ci): add manual trigger inputs for Cerberus integration workflow 2026-02-04 13:53:01 +00:00
Jeremy
83a695fbdc Merge branch 'feature/beta-release' into development 2026-02-04 05:26:47 -05:00
GitHub Actions
6938d4634c fix(ci): update workflows to support manual triggers and conditional execution based on Docker build success 2026-02-04 10:07:50 +00:00
GitHub Actions
1267b74ace fix(ci): add pull_request triggers to test workflows for PR coverage
workflow_run triggers only fire for push events, not pull_request events,
causing PRs to skip integration and E2E tests entirely. Add dual triggers
to all test workflows so they run for both push (via workflow_run) and
pull_request events, while maintaining single-build architecture.

All workflows still pull pre-built images from docker-build.yml - no
redundant builds introduced. This fixes PR test coverage while preserving
the "Build Once, Test Many" optimization for push events.

Fixes: Build Once architecture (commit 928033ec)
2026-02-04 05:51:58 +00:00
GitHub Actions
721b533e15 fix(docker-build): enhance feature branch tag generation with improved sanitization 2026-02-04 05:17:19 +00:00
GitHub Actions
1a8df0c732 refactor(docker-build): simplify feature branch tag generation in workflow 2026-02-04 05:00:46 +00:00
GitHub Actions
4a2c3b4631 refactor(docker-build): improve Docker build command handling with array arguments for tags and labels 2026-02-04 04:55:58 +00:00
GitHub Actions
ac39eb6866 refactor(docker-build): optimize Docker build command handling and improve readability 2026-02-04 04:50:48 +00:00
GitHub Actions
6b15aaad08 fix(workflow): enhance Docker build process for PRs and feature branches 2026-02-04 04:46:41 +00:00
GitHub Actions
928033ec37 chore(ci): implement "build once, test many" architecture
Restructures CI/CD pipeline to eliminate redundant Docker image builds
across parallel test workflows. Previously, every PR triggered 5 separate
builds of identical images, consuming compute resources unnecessarily and
contributing to registry storage bloat.

Registry storage was growing at 20GB/week due to unmanaged transient tags
from multiple parallel builds. While automated cleanup exists, preventing
the creation of redundant images is more efficient than cleaning them up.

Changes CI/CD orchestration so docker-build.yml is the single source of
truth for all Docker images. Integration tests (CrowdSec, Cerberus, WAF,
Rate Limiting) and E2E tests now wait for the build to complete via
workflow_run triggers, then pull the pre-built image from GHCR.

PR and feature branch images receive immutable tags that include commit
SHA (pr-123-abc1234, feature-dns-provider-def5678) to prevent race
conditions when branches are updated during test execution. Tag
sanitization handles special characters, slashes, and name length limits
to ensure Docker compatibility.

Adds retry logic for registry operations to handle transient GHCR
failures, with dual-source fallback to artifact downloads when registry
pulls fail. Preserves all existing functionality and backward
compatibility while reducing parallel build count from 5× to 1×.

Security scanning now covers all PR images (previously skipped),
blocking merges on CRITICAL/HIGH vulnerabilities. Concurrency groups
prevent stale test runs from consuming resources when PRs are updated
mid-execution.

Expected impact: 80% reduction in compute resources, 4× faster
total CI time (120min → 30min), prevention of uncontrolled registry
storage growth, and 100% consistency guarantee (all tests validate
the exact same image that would be deployed).

Closes #[issue-number-if-exists]
2026-02-04 04:42:42 +00:00
GitHub Actions
daef23118a test(crowdsec): add LAPI connectivity tests and enhance integration test reporting 2026-02-04 01:56:56 +00:00