chore: refactor end-to-end tests for emergency server and feature toggles
- Implemented tests for the emergency server (Tier 2) to validate health checks, security reset functionality, and independent access.
- Created a comprehensive suite for system settings feature toggles, ensuring proper state management and API call metrics reporting.
- Removed redundant feature toggle tests from the system settings spec to maintain clarity and focus.
- Enhanced test isolation by restoring default feature flag states after each test.
@@ -1,335 +1,188 @@
---
title: "CI Pipeline Reliability and Docker Tagging"
title: "E2E Security Test Isolation"
status: "draft"
scope: "ci/linting, ci/integration, docker/publishing"
notes: Restore Go linting parity, prevent integration-stage cancellation after successful image builds, and correct Docker tag outputs across CI workflows.
scope: "e2e/ci, tests/playwright"
notes: Separate security-toggling Playwright tests from non-security shards to prevent ACL, WAF, and rate-limit contamination.
---

## 1. Introduction

This plan expands the CI scope to address three related gaps: missing Go
lint enforcement, integration jobs being cancelled after a successful
image build, and incomplete Docker tag outputs on Docker Hub. The
intended outcome is a predictable pipeline where linting blocks early,
integration and E2E gates complete reliably, and registries receive the
full tag set required for traceability and stable consumption.
This plan addresses E2E test contamination where security-focused tests are executed in non-security shards. The goal is to isolate tests that toggle Cerberus, ACL, WAF, CrowdSec, or rate limiting so non-security shards remain stable and do not hit global security state changes. The scope includes Playwright test organization and the E2E workflow split.

Objectives:

- Reinstate golangci-lint in the pipeline lint stage.
- Use the fast config that already blocks local commits.
- Ensure golangci-lint config is valid for the version used in CI.
- Remove CI-only leniency so lint failures block merges.
- Prevent integration jobs from being cancelled when image builds have
  already completed successfully.
- Ensure Docker Hub and GHCR receive SHA-only and branch+SHA tags, plus
  latest/dev/nightly tags for main/development/nightly branches.
- Keep CI behavior consistent across pre-commit, Makefile, VS Code
  tasks, and GitHub Actions workflows.
- Identify which Playwright tests in non-security shards toggle or reset security modules.
- Separate security-toggling tests into security-only execution paths.
- Keep non-security shards stable by preventing global security state changes within those shards.
- Preserve current coverage of security behaviors while avoiding cross-shard interference.

## 2. Research Findings

### 2.1 Current CI State (Linting)
### 2.1 Non-Security Shard Inputs

- The main pipeline is [ .github/workflows/ci-pipeline.yml ] and its
  lint job runs repo health, Hadolint, GORM scanner, and frontend lint.
  There is no Go lint step in this pipeline.
- A separate manual workflow, [ .github/workflows/quality-checks.yml ],
  runs golangci-lint with `continue-on-error: true`, which means CI does
  not block on Go lint failures.
The non-security shards in the E2E workflow run a fixed set of directories and files in [ .github/workflows/e2e-tests-split.yml ](../../.github/workflows/e2e-tests-split.yml). The inputs include tests/settings, tests/integration, and tests/emergency-server, which contain security-toggling behavior.

### 2.2 Integration Cancellation Symptoms
### 2.2 Security-Toggling Tests in Settings

- [ .github/workflows/ci-pipeline.yml ] defines workflow-level
  concurrency:
  `group: ci-manual-pipeline-${{ github.workflow }}-${{ github.ref_name }}`
  with `cancel-in-progress: true`.
- Integration jobs depend on `build-image` and gate on
  `inputs.run_integration != false` and
  `needs.build-image.outputs.push_image == 'true'`.
- Integration-gate fails if any dependent integration job reports
  `failure` or `cancelled`, and runs with `if: always()`.
- A workflow-level cancellation after the build-image job completes will
  cancel downstream integration jobs even though the build succeeded.
[ tests/settings/system-settings.spec.ts ](../../tests/settings/system-settings.spec.ts) toggles Cerberus and CrowdSec feature flags via the feature flags API and resets those flags after each test. These tests change global security state and can affect unrelated shards running in parallel.

### 2.3 Current Image Tag Outputs
### 2.3 Emergency Server Tests

- In [ .github/workflows/ci-pipeline.yml ], the `Compute image tags`
  step emits:
  - `DEFAULT_TAG` (sha-<short> or pr-<number>-<short>)
  - latest/dev/nightly tags based on `github.ref_name`
- In [ .github/workflows/docker-build.yml ], `docker/metadata-action`
  emits tags including:
  - `type=raw,value=pr-${{ env.TRIGGER_PR_NUMBER }}-{{sha}}` for PRs
  - `type=sha,format=short` for non-PRs
  - feature branch tag via `steps.feature-tag.outputs.tag`
  - `latest` only when `is_default_branch` is true
  - `dev` only when `env.TRIGGER_REF == 'refs/heads/development'`
- Docker Hub currently shows only PR and SHA-prefixed tags for some
  builds; SHA-only and branch+SHA tags are not emitted consistently.
- Nightly tagging exists in [ .github/workflows/nightly-build.yml ],
  but the main Docker build workflow does not emit a `nightly` tag based
  on branch detection.
[ tests/emergency-server/tier2-validation.spec.ts ](../../tests/emergency-server/tier2-validation.spec.ts) calls the emergency security reset endpoint and validates rate limiting behavior on the emergency server. This directly disables security modules during execution and should be treated as security enforcement coverage.

### 2.4 Global Security Reset in Test Setup

[ tests/global-setup.ts ](../../tests/global-setup.ts) performs an emergency security reset and verifies that ACL and rate limiting are disabled before tests run. This is intended for cleanup, but it reinforces that global security state is shared across shards and is sensitive to security toggles.

Observed behavior in [ tests/global-setup.ts ](../../tests/global-setup.ts):

- Always validates `CHARON_EMERGENCY_TOKEN` and fails fast if missing or invalid.
- Executes pre-auth and authenticated `emergencySecurityReset()`.
- Runs `verifySecurityDisabled()` after the authenticated reset.

This means non-security shards still perform a global security reset even when `CHARON_SECURITY_TESTS_ENABLED` is set to `false` in the workflow.

### 2.5 Security Test Suites Already Isolated

The workflow already routes tests/security and tests/security-enforcement into dedicated security jobs. These suites include explicit security module enablement and enforcement checks, such as rate-limit enforcement in [ tests/security-enforcement/rate-limit-enforcement.spec.ts ](../../tests/security-enforcement/rate-limit-enforcement.spec.ts) and dashboard toggles in [ tests/security/security-dashboard.spec.ts ](../../tests/security/security-dashboard.spec.ts).

### 2.6 Integration Tests Touch Security Domains

Some integration tests create access lists and navigate to security pages, for example [ tests/integration/multi-feature-workflows.spec.ts ](../../tests/integration/multi-feature-workflows.spec.ts). These do not explicitly toggle security modules, but they use security-domain resources that may depend on Cerberus state and should be reviewed for compatibility with Cerberus being disabled.

## 3. Technical Specifications

### 3.1 CI Lint Job (Pipeline)
### 3.1 Security Test Classification Rules

Add a Go lint step to the lint job in
[ .github/workflows/ci-pipeline.yml ]:
Classify a test as security-affecting if it does any of the following:

- Tooling: `golangci/golangci-lint-action`.
- Working directory: `backend`.
- Config: `backend/.golangci-fast.yml`.
- Timeout: match config intent (2m fast, or 5m if parity with other
  pipeline steps is preferred).
- Failures: do not allow `continue-on-error`.
- Calls the emergency security reset endpoint.
- Sets or toggles feature flags related to Cerberus, ACL, WAF, CrowdSec, or rate limiting.
- Enables or disables security modules via settings or admin controls.
- Depends on rate limiting behavior or ACL/WAF enforcement for assertions.

### 3.2 CI Lint Job (Manual Quality Checks)
### 3.2 Isolation Strategy Options

Update [ .github/workflows/quality-checks.yml ] to align with local
blocking behavior:
Option A (preferred): Move security-affecting tests into dedicated security folders

- Remove `continue-on-error: true` from the golangci-lint step.
- Ensure the step points to `backend/.golangci-fast.yml` or runs in
  `backend` so that the config is picked up deterministically.
- Pin golangci-lint version to the same major used in CI pipeline to
  avoid config drift.
- Move or split tests from tests/settings/system-settings.spec.ts into a new security-focused file under tests/security or tests/security-enforcement.
- Move tests/emergency-server to tests/security-enforcement or tests/security, depending on whether they validate enforcement behavior or emergency pathways.
- Keep non-security shards limited to tests that do not mutate security state.

### 3.3 Integration Cancellation Root Cause and Fix
Option B: Use Playwright tags and workflow filters

Investigate and address workflow-level cancellation affecting
integration jobs after `build-image` completes.
- Tag security-affecting tests with a consistent tag such as @security-affecting.
- Update security jobs to run tagged tests and non-security jobs to exclude them using grep or grep-invert.
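The tagging approach in Option B can be pictured with a small model. This is only a sketch, assuming the tag is embedded in the test title as `@security-affecting`; it does not invoke Playwright and instead mimics the regular-expression matching that Playwright's `grep` and `grepInvert` options apply to test titles. The `TestCase` type, the `selectForJob` helper, and any untagged sample titles are hypothetical.

```typescript
// Hypothetical model of a test entry; Playwright matches grep patterns
// against the test title, which is where the tag is assumed to live.
interface TestCase {
  title: string;
}

// Tag proposed in the plan for security-affecting tests.
const SECURITY_TAG = /@security-affecting/;

// Security jobs keep tagged tests (grep); non-security jobs keep the rest
// (grep-invert). Every test lands in exactly one of the two job types.
function selectForJob(
  tests: TestCase[],
  job: "security" | "non-security",
): TestCase[] {
  const wantTagged = job === "security";
  return tests.filter((t) => SECURITY_TAG.test(t.title) === wantTagged);
}
```

In the workflow, the equivalent filter would presumably be `--grep @security-affecting` on the security jobs and `--grep-invert @security-affecting` on the non-security jobs.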

Required investigation steps:
Option C: Update non-security job inputs to explicitly exclude security-affecting files

- Inspect recent CI runs for cancellation reasons in the Actions UI
  (workflow-level cancellation vs job-level failure).
- Confirm whether cancellations coincide with the workflow-level
  concurrency group in [ .github/workflows/ci-pipeline.yml ].
- Verify `inputs.run_integration` values are only populated on
  `workflow_dispatch` events and evaluate the behavior on
  `pull_request` events.
- Verify `needs.build-image.outputs.push_image` and
  `needs.build-image.outputs.image_ref_dockerhub` are set for non-fork
  pull requests and branch pushes.
- Remove tests/settings/system-settings.spec.ts and tests/emergency-server from non-security shard inputs.
- Add those tests to the security job inputs.

Proposed fix (preferred):
Decision: Prefer Option A with a fallback to Option B if the team wants to keep files in their current directories. Option C is acceptable as a short-term mitigation but is less maintainable long-term.

- Remove workflow-level concurrency from
  [ .github/workflows/ci-pipeline.yml ] and instead apply job-level
  concurrency to the build-image job only, keeping cancellation limited
  to redundant builds while allowing downstream integration/E2E/coverage
  jobs to finish.
- Add explicit guards to integration jobs:
  `if: needs.build-image.result == 'success' &&
  needs.build-image.outputs.push_image == 'true' &&
  needs.build-image.outputs.image_ref_dockerhub != '' &&
  (inputs.run_integration != false)`.
- Update the integration-gate logic to treat `skipped` jobs as
  non-fatal and only fail on `failure` or `cancelled` when
  `needs.build-image.result == 'success'` and `push_image == 'true'`.
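The gate rule in the last bullet can be written out as a small decision function. This is a sketch under assumptions rather than the workflow's actual logic: the `JobResult` and `GateInput` types and the `integrationGatePasses` name are hypothetical, and the function follows the rule literally, so a failed `build-image` job is assumed to be reported by that job itself and not by the gate.

```typescript
// Job results as GitHub Actions reports them in `needs.<job>.result`.
type JobResult = "success" | "failure" | "cancelled" | "skipped";

// Hypothetical input shape for the gate decision.
interface GateInput {
  buildResult: JobResult; // needs.build-image.result
  pushImage: boolean; // needs.build-image.outputs.push_image == 'true'
  integrationResults: JobResult[]; // results of the integration jobs
}

// Returns true when the integration gate should pass.
function integrationGatePasses(input: GateInput): boolean {
  const integrationExpected =
    input.buildResult === "success" && input.pushImage;
  if (!integrationExpected) {
    // Integration was not expected to run, so skipped jobs are non-fatal.
    return true;
  }
  // Only `failure` and `cancelled` are fatal; `skipped` is tolerated.
  return !input.integrationResults.some(
    (r) => r === "failure" || r === "cancelled",
  );
}
```

Keeping the decision in one expression makes it easier to check the three cases the plan cares about: all jobs succeeded, a job was cancelled after a successful build, and jobs were skipped because no image was pushed.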
### 3.3 Workflow Separation Rules

Alternative fix (not recommended; does not meet primary objective):
Update [ .github/workflows/e2e-tests-split.yml ](../../.github/workflows/e2e-tests-split.yml) so:

- Keep workflow-level concurrency but change to
  `cancel-in-progress: ${{ github.event_name == 'pull_request' }}` so
  branch pushes and manual dispatches complete all downstream jobs.
- This option still cancels PR runs after successful builds, which
  conflicts with the primary objective of allowing integration gates
  to complete reliably.
- Security jobs explicitly include all security-affecting tests, including those moved from settings and emergency-server.
- Non-security jobs do not include any files or directories that toggle or reset security modules.
- If tags are used, security jobs should run only tagged tests and non-security jobs should invert the tag.

### 3.4 Image Tag Outputs (CI Pipeline)
### 3.4 Test Organization Changes

Update the `Compute image tags` step in
[ .github/workflows/ci-pipeline.yml ] to emit additional tags.
Planned file moves and splits:

Required additions:
- Split tests/settings/system-settings.spec.ts so security-affecting tests move to a dedicated security-focused test file under tests/security.
- Move tests/emergency-server into a security-enforcement folder.
- Review integration tests for dependencies on security module state and move or tag as needed.

- SHA-only tag (short SHA, no prefix):
  `${SHORT_SHA}` for both GHCR and Docker Hub.
- Tag normalization rules for `SANITIZED_BRANCH`:
  - Ensure the tag is non-empty after sanitization.
  - Ensure the first character is `[a-z0-9]`; if it would start with
    `-` or `.`, normalize by trimming leading `-` or `.` and recheck.
  - Replace non-alphanumeric characters with `-` and collapse multiple
    `-` characters into one.
  - Limit the tag length to 128 characters after normalization.
  - Fallback: if the sanitized result is empty or still invalid after
    normalization, use `branch` as the fallback prefix.
- Branch+SHA tag for non-PR events using a sanitized branch name derived
  from `github.ref_name` (lowercase, `/` → `-`, non-alnum → `-`,
  trimmed, collapsed). Example:
  `${SANITIZED_BRANCH}-${SHORT_SHA}`.
- Preserve existing `pr-${PR_NUMBER}-${SHORT_SHA}` for PRs.
- Keep `latest`, `dev`, and `nightly` tags based on:
  `github.ref_name == 'main' | 'development' | 'nightly'`.
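The normalization rules above are easier to check when written as code. The following is a minimal sketch, not the workflow step itself (which would more likely be shell inside `Compute image tags`). The function names are hypothetical, and it assumes the 128-character limit applies to the complete `${SANITIZED_BRANCH}-${SHORT_SHA}` tag, so the branch prefix is truncated to leave room for the SHA suffix.

```typescript
const MAX_TAG_LENGTH = 128; // limit stated in the normalization rules
const FALLBACK_PREFIX = "branch"; // fallback prefix stated in the plan

// Lowercase, map every run of non-alphanumeric characters (including "/"
// and ".") to a single "-", trim "-" from both ends, and cap the length.
function sanitizeBranch(
  refName: string,
  maxLength: number = MAX_TAG_LENGTH,
): string {
  const sanitized = refName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, maxLength)
    .replace(/-+$/g, ""); // truncation can expose a trailing "-"
  // Empty or still-invalid results fall back to the safe prefix.
  return /^[a-z0-9]/.test(sanitized) ? sanitized : FALLBACK_PREFIX;
}

// Builds the branch+SHA tag, reserving space for "-" plus the short SHA.
function branchShaTag(refName: string, shortSha: string): string {
  const budget = MAX_TAG_LENGTH - shortSha.length - 1;
  return `${sanitizeBranch(refName, budget)}-${shortSha}`;
}

console.log(branchShaTag("feature/Add_Login", "abc1234")); // prints "feature-add-login-abc1234"
```

Because every non-alphanumeric character becomes `-` before trimming, a leading `.` or `-` never survives, which covers the first-character rule without a separate recheck loop.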
Concrete list of tests to move from [ tests/settings/system-settings.spec.ts ](../../tests/settings/system-settings.spec.ts) into a new file [ tests/security/system-settings-feature-toggles.spec.ts ](../../tests/security/system-settings-feature-toggles.spec.ts):

Decision point: SHA-only tags for PR builds
- Feature Toggles:
  - "should toggle Cerberus security feature"
  - "should toggle CrowdSec console enrollment"
  - "should toggle uptime monitoring"
  - "should persist feature toggle changes"
  - "should show overlay during feature update"
- Feature Toggles - Advanced Scenarios (Phase 4):
  - "should handle concurrent toggle operations"
  - "should retry on 500 Internal Server Error"
  - "should fail gracefully after max retries exceeded"
  - "should verify initial feature flag state before tests"

- Option A (recommended): publish SHA-only tags only for trusted
  branches (main/development/nightly and non-fork pushes). PR builds
  continue to use `pr-${PR_NUMBER}-${SHORT_SHA}` without SHA-only tags.
- Option B: publish SHA-only tags for PR builds when image push is
  enabled for a non-fork authorized run (e.g., same-repo PRs), in
  addition to PR-prefixed tags.
- Assumption (default until decided): follow Option A to avoid
  ambiguous SHA-only tags for untrusted PR contexts.
Note: The `test.afterEach` feature flag reset and `test.afterAll` API metrics reporting currently tied to toggles should move with the toggle suite into [ tests/security/system-settings-feature-toggles.spec.ts ](../../tests/security/system-settings-feature-toggles.spec.ts) to keep state cleanup scoped to the security job.

Required step-level variables and expressions:
Concrete emergency server file moves:

- Step: `Compute image tags` (id: `tags`).
- Variables: `SHORT_SHA`, `DEFAULT_TAG`, `PR_NUMBER`, `SANITIZED_BRANCH`.
- Expressions:
  - `${{ github.event_name }}`
  - `${{ github.ref_name }}`
  - `${{ github.event.pull_request.number }}`
- Move [ tests/emergency-server/emergency-server.spec.ts ](../../tests/emergency-server/emergency-server.spec.ts) to [ tests/security-enforcement/emergency-server/emergency-server.spec.ts ](../../tests/security-enforcement/emergency-server/emergency-server.spec.ts).
- Move [ tests/emergency-server/tier2-validation.spec.ts ](../../tests/emergency-server/tier2-validation.spec.ts) to [ tests/security-enforcement/emergency-server/tier2-validation.spec.ts ](../../tests/security-enforcement/emergency-server/tier2-validation.spec.ts).

### 3.5 Image Tag Outputs (docker-build.yml)
### 3.5 Error Handling and Edge Cases

Update [ .github/workflows/docker-build.yml ] `Generate Docker metadata`
tags to match the required outputs.
- Parallel shards must not toggle global security state at the same time.
- Tests that require Cerberus enabled must run only in security jobs where Cerberus is enabled by environment or explicit setup.
- If global setup performs a security reset, security jobs must re-enable required modules before assertions.

Required additions:
### 3.6 Global Setup Conditioning (Critical)

- Add SHA-only short tag for all events:
  `type=sha,format=short,prefix=,suffix=`.
- Add branch+SHA short tag for non-PR events using a sanitized branch
  name derived from `env.TRIGGER_REF` or `env.TRIGGER_HEAD_BRANCH`.
- Apply the same tag normalization rules as the CI pipeline
  (`SANITIZED_BRANCH` non-empty, leading character normalized, length
  <= 128, fallback to `branch`).
- Add explicit branch tags for main/development/nightly based on
  `env.TRIGGER_REF` (do not rely on `is_default_branch` for
  workflow_run triggers):
  - `type=raw,value=latest,enable=${{ env.TRIGGER_REF == 'refs/heads/main' }}`
  - `type=raw,value=dev,enable=${{ env.TRIGGER_REF == 'refs/heads/development' }}`
  - `type=raw,value=nightly,enable=${{ env.TRIGGER_REF == 'refs/heads/nightly' }}`
Global setup must not reset security in non-security shards. Add a guard in [ tests/global-setup.ts ](../../tests/global-setup.ts):

Required step names and variables:

- Step: `Compute feature branch tag` (id: `feature-tag`) remains for
  `refs/heads/feature/*`.
- New step: `Compute branch+sha tag` (id: `branch-tag`) for all
  non-PR events using `TRIGGER_REF`.
- Metadata step: `Generate Docker metadata` (id: `meta`).
- Expressions:
  - `${{ env.TRIGGER_EVENT }}`
  - `${{ env.TRIGGER_REF }}`
  - `${{ env.TRIGGER_HEAD_SHA }}`
  - `${{ env.TRIGGER_PR_NUMBER }}`
  - `${{ steps.branch-tag.outputs.tag }}`

### 3.6 Repository Hygiene Review (Requested)

- [ .gitignore ]: No change required for CI updates; no new artifacts
  introduced by the tag changes.
- [ codecov.yml ]: No change required; coverage configuration remains
  correct.
- [ .dockerignore ]: No change required; CI-only YAML edits are already
  excluded from Docker build context.
- [ Dockerfile ]: No change required; tagging logic is CI-only.
- [ Branch tag normalization ]: No new files required; logic should be
  implemented in existing CI steps only.
- Only validate `CHARON_EMERGENCY_TOKEN`, call `emergencySecurityReset()`, and run `verifySecurityDisabled()` when `CHARON_SECURITY_TESTS_ENABLED === 'true'`.
- For non-security shards (`CHARON_SECURITY_TESTS_ENABLED !== 'true'`), skip all security reset logic and continue with health checks and test data cleanup only.
- Preserve existing behavior for security shards so enforcement tests still run against a deterministic baseline.
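The guard described for `tests/global-setup.ts` can be sketched as a pure function that decides which setup steps run for a given environment. This is an illustration under assumptions: `planGlobalSetup`, the `SetupStep` names for the health check and data cleanup, and the step ordering are hypothetical, and the real file would call the existing helpers instead of returning their names.

```typescript
// Environment is passed in explicitly so the decision is easy to test;
// in tests/global-setup.ts this would presumably be `process.env`.
type Env = Record<string, string | undefined>;

// Step names; the first three mirror behavior the plan attributes to the
// current global setup, the last two are placeholder names.
type SetupStep =
  | "validateEmergencyToken"
  | "emergencySecurityReset"
  | "verifySecurityDisabled"
  | "healthCheck"
  | "cleanupTestData";

function planGlobalSetup(env: Env): SetupStep[] {
  const steps: SetupStep[] = [];
  // Security reset logic runs only when the flag is exactly "true".
  if (env.CHARON_SECURITY_TESTS_ENABLED === "true") {
    steps.push(
      "validateEmergencyToken",
      "emergencySecurityReset",
      "verifySecurityDisabled",
    );
  }
  // Every shard still performs health checks and test data cleanup.
  steps.push("healthCheck", "cleanupTestData");
  return steps;
}
```

Comparing against the literal string `"true"` means an unset variable or any other value, such as `"false"` or `"1"`, is treated as a non-security shard, which matches the `!== 'true'` wording in the list above.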

## 4. Implementation Plan

### Phase 1: Playwright Tests (Behavior Baseline)

- Confirm that no UI behavior is affected by CI-only changes.
- Keep this phase as a verification note: E2E is unchanged and can be
  re-run if CI changes surface unexpected side effects.
- Confirm the current security toggle behavior in system settings and emergency server tests.
- Define expected outcomes for toggling Cerberus and CrowdSec so that moved tests retain coverage.

### Phase 2: Pipeline Lint Restoration
### Phase 2: Security-Affecting Test Identification

- Add a Go lint step to the lint job in
  [ .github/workflows/ci-pipeline.yml ].
- Use `backend/.golangci-fast.yml` and ensure the step blocks on
  failure.
- Keep the lint job dependency order intact (repo health → Hadolint →
  GORM scan → Go lint → frontend lint).
- Inventory tests in tests/settings, tests/emergency-server, and tests/integration against the security-affecting rules.
- Create a list of files to move, split, or tag.

### Phase 3: Integration Cancellation Fix
### Phase 3: Test Restructuring

- Remove workflow-level concurrency from
  [ .github/workflows/ci-pipeline.yml ] and add job-level concurrency
  on `build-image` only.
- Add explicit `if` guards to integration jobs based on
  `needs.build-image.result`, `needs.build-image.outputs.push_image`,
  and `needs.build-image.outputs.image_ref_dockerhub`.
- Update `integration-gate` to ignore `skipped` results when integration
  is not expected to run and only fail on `failure` or `cancelled` when
  build-image succeeded and pushed an image.
- Split tests/settings/system-settings.spec.ts to isolate security toggles into [ tests/security/system-settings-feature-toggles.spec.ts ](../../tests/security/system-settings-feature-toggles.spec.ts) using the concrete list above.
- Move emergency server tests into [ tests/security-enforcement/emergency-server/ ](../../tests/security-enforcement/emergency-server/) using the concrete list above.
- If integration tests require security modules enabled, relocate or tag them.

### Phase 4: Docker Tagging Updates
### Phase 4: Workflow Updates

- Update `Compute image tags` in
  [ .github/workflows/ci-pipeline.yml ] to emit SHA-only and
  branch+SHA tags in addition to the existing PR and branch tags.
- Update `Generate Docker metadata` in
  [ .github/workflows/docker-build.yml ] to emit SHA-only, branch+SHA,
  and explicit latest/dev/nightly tags based on `env.TRIGGER_REF`.
- Add tag normalization logic in both workflows to ensure valid Docker
  tag prefixes (non-empty, valid leading character, <= 128 length,
  fallback when sanitized branch is empty or invalid).
- Update non-security shard inputs in [ .github/workflows/e2e-tests-split.yml ](../../.github/workflows/e2e-tests-split.yml):
  - Remove `tests/emergency-server` from non-security job inputs.
  - Keep `tests/settings` but ensure the moved security toggle suite lives under `tests/security` so it is not picked up.
- Update security job inputs to include the relocated emergency server folder:
  - Ensure `tests/security-enforcement/emergency-server` is included (already covered by `tests/security-enforcement/` once moved).
  - Security jobs already include `tests/security/`, which will pick up `tests/security/system-settings-feature-toggles.spec.ts`.
- If tags are adopted, add grep filters to the security and non-security job commands.

### Phase 5: Validation and Guardrails

- Verify CI logs show the golangci-lint version and config in use.
- Confirm integration jobs are no longer cancelled after successful
  builds when new runs are queued.
- Validate that Docker Hub and GHCR tags include:
  - SHA-only short tags
  - Branch+SHA short tags
  - latest/dev/nightly tags for main/development/nightly branches
- Run the security jobs and non-security jobs separately and confirm no security-related tests execute in non-security shards.
- Confirm rate limit and ACL enforcement tests only run under security jobs with Cerberus enabled.
- Capture and review Playwright reports for cross-shard contamination indicators.

## 5. Acceptance Criteria (EARS)

- WHEN a pull request or manual pipeline run executes, THE SYSTEM SHALL
  run golangci-lint in the pipeline lint stage using
  `backend/.golangci-fast.yml`.
- WHEN golangci-lint finds violations, THE SYSTEM SHALL fail the
  pipeline lint stage and block downstream jobs.
- WHEN the manual quality workflow runs, THE SYSTEM SHALL enforce the
  same blocking behavior and fast config as pre-commit.
- WHEN a build-image job completes successfully and image push is
  enabled for a non-fork authorized run, THE SYSTEM SHALL allow
  integration jobs to run to completion without being cancelled by
  workflow-level concurrency.
- WHEN integration jobs are skipped by configuration while image push
  is disabled or not authorized for the run, THE SYSTEM SHALL not mark
  the integration gate as failed.
- WHEN a non-PR build runs on main/development/nightly branches and
  image push is enabled for a non-fork authorized run, THE SYSTEM SHALL
  publish `latest`, `dev`, or `nightly` tags respectively to Docker Hub
  and GHCR.
- WHEN any image is built in CI and image push is enabled for a
  non-fork authorized run, THE SYSTEM SHALL publish SHA-only and
  branch+SHA tags in addition to existing PR or default tags.
- WHEN a non-security E2E shard runs, THE SYSTEM SHALL exclude all tests that toggle or reset Cerberus, ACL, WAF, CrowdSec, or rate limiting.
- WHEN a non-security E2E shard runs, THE SYSTEM SHALL skip the global security reset in [ tests/global-setup.ts ](../../tests/global-setup.ts) unless `CHARON_SECURITY_TESTS_ENABLED` is `true`.
- WHEN a security E2E shard runs, THE SYSTEM SHALL include all tests that toggle or reset security modules and all enforcement tests.
- WHEN security-affecting tests run, THE SYSTEM SHALL execute them only in workflows where Cerberus is enabled.
- WHEN tests are reorganized, THE SYSTEM SHALL preserve existing security coverage without introducing new cross-shard dependencies.
- WHEN integration tests require security modules enabled, THE SYSTEM SHALL route them to security shards or explicitly enable security in their setup.

## 6. Risks and Mitigations

- Risk: CI runtime increases due to added golangci-lint execution.
  Mitigation: use the fast config and keep timeout tight (2m) with
  caching enabled by the action.
- Risk: Config incompatibility with CI golangci-lint version.
  Mitigation: pin the version and log it in CI; validate config format.
- Risk: Reduced cancellation leads to overlapping integration runs.
  Mitigation: keep job-level concurrency on build-image; monitor queue
  time and adjust if needed.
- Risk: Tag proliferation complicates image selection for users.
  Mitigation: document tag matrix in release notes or README once
  verified in CI.
- Risk: Sanitized branch names may collapse to empty or invalid tags.
  Mitigation: enforce normalization rules with a safe fallback prefix
  to keep tag generation deterministic.
- Risk: Moving tests breaks historical references or documentation links. Mitigation: update any references in test comments and plan docs after moves.
- Risk: Tag-based filtering is inconsistent across local and CI runs. Mitigation: document the tag usage in Playwright config and ensure local scripts align with CI filters.
- Risk: Integration tests implicitly rely on Cerberus being enabled. Mitigation: audit integration tests and either enable Cerberus in test setup or move them to security shards.

## 7. Confidence Score

Confidence: 84 percent
Confidence: 78 percent

Rationale: The linting changes are straightforward, but integration
job cancellation behavior depends on workflow-level concurrency and may
require validation in Actions history to select the most appropriate
fix. Tagging changes are predictable once metadata-action inputs are
aligned with branch detection.
Rationale: The security-toggling tests are identifiable and the workflow split is clear, but integration test dependencies on security state require additional verification before final routing.