- Refactor concurrency settings in `e2e-tests-split.yml` and `codecov-upload.yml` to remove SHA and run_id from group strings, allowing for proper cancellation of in-progress runs.
- Ensure that new pushes to the same branch cancel any ongoing workflow runs, improving CI efficiency and reducing queue times.
- Added URL validation for notification providers to ensure only valid http/https URLs are accepted.
- Implemented tests for URL validation scenarios in the Notifications component.
- Updated translations for error messages related to invalid URLs in multiple languages.
- Introduced new hooks for managing security headers and access lists in tests.
- Enhanced the ProviderForm component to reset state correctly when switching between add and edit modes.
- Improved user feedback with update indicators after saving changes to notification providers.
- Added mock implementations for new hooks in various test files to ensure consistent testing behavior.
- Implemented tests for the emergency server (Tier 2) to validate health checks, security reset functionality, and independent access.
- Created a comprehensive suite for system settings feature toggles, ensuring proper state management and API call metrics reporting.
- Removed redundant feature toggle tests from the system settings spec to maintain clarity and focus.
- Enhanced test isolation by restoring default feature flag states after each test.
- Updated environment variable assignments in multiple workflow files to use double quotes for consistency and to prevent potential issues with variable expansion.
- Refactored echo commands to group multiple lines into a single block for improved readability in the following workflows:
- release-goreleaser.yml
- renovate_prune.yml
- security-pr.yml
- security-weekly-rebuild.yml
- supply-chain-pr.yml
- supply-chain-verify.yml
- update-geolite2.yml
- waf-integration.yml
- weekly-nightly-promotion.yml
- Updated quality-checks.yml to support manual dispatch with frontend checks.
- Modified rate-limit-integration.yml to remove workflow_run triggers and adjust conditions for execution.
- Removed pull request triggers from repo-health.yml, retaining only scheduled and manual dispatch.
- Adjusted security-pr.yml and supply-chain-pr.yml to eliminate workflow_run dependencies and refine execution conditions.
- Cleaned up supply-chain-verify.yml by removing workflow_run triggers and ensuring proper execution conditions.
- Updated waf-integration.yml to remove workflow_run triggers, allowing manual dispatch only.
- Revised current_spec.md to reflect the consolidation of CI workflows into a single pipeline, detailing objectives, research findings, and implementation plans.
Detailed explanation of:
- **Dependency Fix**: Added explicit Chromium installation to Firefox and WebKit security jobs. The authentication fixture depends on Chromium being present, even when testing other browsers, causing previous runs to fail setup.
- **Workflow Isolation**: Explicitly routed `tests/security/` to the dedicated "Security Enforcement" jobs and removed them from the general shards. This prevents false negatives where security config tests fail because the middleware is intentionally disabled in standard test runs.
- **Metadata**: Added `@security` tags to all security specs (`rate-limiting`, `waf-config`, etc.) to align metadata with the new execution strategy.
- **References**: Fixes CI failures in PR
- Updated GO_VERSION to 1.25.7 across all GitHub Actions workflows to fix immediate build failures
- Added custom regex manager to `.github/renovate.json` to explicitly track `GO_VERSION` in YAML files
- Ensures Renovate detects and automerges Go updates for workflows alongside the main project
Fixed syntax errors in playwright.config.js (duplicate identifiers)
Verified all E2E and Integration workflows have correct push triggers
Confirmed immediate feedback loop for feature/hotfix branches
Validated E2E environment by running core test suite (100% pass)
- Replaced deprecated generic tool names with specific VS Code command IDs
- Enabled broad MCP tool access for Management and QA agents
- Scoped DevOps agent to strictly infrastructure and release tools
- aligned Playwright and Trivy tool usage with new MCP namespaces
- Introduced a new workflow for E2E tests that runs tests sequentially to avoid race conditions caused by parallel execution.
- Reduced the number of shards from 4 to 1 per browser, ensuring all tests for each browser run sequentially.
- Updated the existing WAF integration workflow to include pull request triggers for better CI management.
Investigation Phase:
Problem:
- Tests hang AFTER global setup completes
- No test execution begins (hung before first test)
- Step timeout (15min) doesn't trigger properly
- Job timeout (45min) eventually kills process after 44min
Changes:
1. Added DEBUG=pw:api to all browser jobs
- Will show exact Playwright API calls
- Pinpoint where execution hangs (auth setup vs browser launch vs test init)
2. Reduced job timeout: 45min → 20min
- Fail faster when tests hang
- Reduces wasted CI resources
- Still allows normal test execution (local: 1.2min)
Expected Outcome:
- Verbose logs reveal hang location
- Faster feedback loop (20min vs 44min)
- Can identify if issue is:
* auth.setup.ts hanging
* Browser process not launching
* Connection issues to application
Next Steps Based on Logs:
- If browser launch hangs: Add dumb-init (Phase 3)
- If auth setup hangs: Investigate cookie/storage state
- If network hangs: Add localhost loopback routing
Phase: 2.5 of 3 (Diagnostic Logging)
See: docs/plans/ci_hang_remediation.md
Resource Constraint Management:
Problem:
- Tests hanging indefinitely during execution in CI
- 2-core runners resource-constrained vs local dev machines
- No timeout enforcement allows tests to run forever
Changes:
1. playwright.config.js:
- Reduced per-test timeout: 90s → 60s (CI only)
- Comment clarifies CI resource constraints
- Local dev keeps 90s for debugging
2. .github/workflows/e2e-tests-split.yml:
- Added timeout-minutes: 15 to all test steps
- Ensures CI fails explicitly after 15 minutes
- Prevents workflow hanging until 6-hour GitHub limit
Expected Outcome:
- Tests fail fast with timeout error instead of hanging
- Clearer debugging: timeout vs hang vs test failure
- CI resources freed up faster for other jobs
Phase: 2 of 3 (Resource Constraints)
See: docs/plans/ci_hang_remediation.md