Charon/docs/plans/archive/dod_remediation_spec.md
2026-02-19 16:34:10 +00:00


Definition of Done Remediation Plan

1. Introduction

Overview

This plan remediates the Definition of Done (DoD) blockers identified during QA validation of the Notifications changes. It prioritizes the high-severity Docker image vulnerability, restores frontend coverage to the 88% gate (focusing on branch coverage), resolves linting failures, and re-runs the inconclusive checks to reach a clean DoD pass.

Objectives

  • Eliminate the GHSA-69x3-g4r3-p962 vulnerability in the runtime image.
  • Restore frontend coverage to >=88% across lines, statements, functions, and branches.
  • Fix markdownlint and hadolint failures.
  • Re-run TypeScript and pre-commit checks with clean output capture.

Scope

  • Inspection of the backend and build-stage dependency graphs to find the source of the nebula dependency.
  • Frontend test coverage targeting Notifications changes.
  • Dockerfile lint compliance fixes.
  • Markdown table formatting fixes.
  • DoD validation re-runs.

2. Research Findings

QA Report Summary

  • Docker image scan failed with GHSA-69x3-g4r3-p962 in github.com/slackhq/nebula@v1.9.7 (fixed in v1.10.3).
  • Frontend coverage below the 88% gate (branches at 78.78%).
  • Markdownlint failure in the table formatting of the tests README (tests/README.md).
  • Hadolint failures: DL3059 and SC2012.
  • TypeScript and pre-commit checks inconclusive.

Repository Evidence

Known Contextual Signals

  • A prior security report indicates that nebula was upgraded to v1.10.3 to address a different CVE, yet the current image scan still detects v1.9.7. This suggests that a build step, most likely in the Caddy or CrowdSec build stage, pulls in a separate, older version.

3. Technical Specifications

3.1 EARS Requirements (DoD Remediation)

  • WHEN the runtime image is scanned, THE SYSTEM SHALL report zero HIGH or CRITICAL vulnerabilities.
  • WHEN frontend coverage is executed, THE SYSTEM SHALL report at least 88% for lines, statements, functions, and branches.
  • WHEN markdownlint runs, THE SYSTEM SHALL report zero lint errors.
  • WHEN hadolint runs, THE SYSTEM SHALL report zero DL3059 or SC2012 findings.
  • WHEN TypeScript checks and pre-commit hooks are executed, THE SYSTEM SHALL report PASS with complete output.

3.2 Dependency Remediation Strategy (Nebula)

  • Identify the module that actually pulls in github.com/slackhq/nebula@v1.9.7 by inspecting every build-stage module graph, starting with the Caddy and CrowdSec build stages.
  • Upgrade the dependency in that source module to v1.10.3 or later and regenerate go.sum.
  • Rebuild the Docker image and confirm the fix via a container scan (Grype/Trivy).
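To find which module requests the vulnerable version, the output of go mod graph (run inside the module being inspected, for example the module a Caddy or CrowdSec builder stage generates) can be searched for the shortest requirement chain. The Python sketch below is illustrative only: it assumes the standard two-column go mod graph output, and the helper name find_chain is hypothetical. The graph lists every requested version, including ones that minimal version selection discards, so a hit shows who asks for v1.9.7, not necessarily which version is built.

```python
from collections import deque
from typing import Optional


def find_chain(graph_text: str, target: str) -> Optional[list[str]]:
    """Return the shortest requirement chain from the main module to `target`.

    `graph_text` is `go mod graph` output: one "requirer requirement" pair per
    line, where the main module is the only node printed without "@version".
    `target` is a full node such as "github.com/slackhq/nebula@v1.9.7".
    """
    edges: dict[str, list[str]] = {}
    root = None
    for line in graph_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        parent, child = parts
        edges.setdefault(parent, []).append(child)
        if root is None and "@" not in parent:
            root = parent  # main module: no @version suffix
    if root is None:
        return None
    # Breadth-first search, so the first chain that reaches `target` is a shortest one.
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for child in edges.get(path[-1], []):
            if child not in seen:
                seen.add(child)
                queue.append(path + [child])
    return None
```

Joining the returned list with " -> " gives a readable path that can be pasted into the "dependency source confirmed" notes.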

3.3 Frontend Coverage Strategy

  • Use the coverage report to pinpoint missing lines/branches in the Notifications flow.
  • Add Vitest unit tests for Notifications.tsx that cover the URL validation branches (invalid protocol, malformed URL, and the allowed empty value), the update indicator timer behavior, and the form reset state.
  • Target frontend unit test files (e.g., frontend/src/pages/__tests__/Notifications.test.tsx) and related helpers; do not rely on Playwright E2E for coverage gates.
  • Verify coverage through the standard frontend coverage task.
  • Note: E2E tests verify behavior but do not contribute to Vitest coverage gates.
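The 88% gate applies to four totals at once, so a small checker makes the failing metric explicit. This sketch assumes Vitest is configured with the json-summary coverage reporter, which writes coverage/coverage-summary.json with a "total" entry holding covered and total counts per metric; the function name coverage_gate is hypothetical. Percentages are recomputed from the raw counts rather than read from the reported pct field, and an empty category counts as fully covered.

```python
# Metrics covered by the DoD gate.
METRICS = ("lines", "statements", "functions", "branches")


def coverage_gate(summary: dict, threshold: float = 88.0) -> dict[str, float]:
    """Return {metric: percent} for each total below `threshold`; an empty dict means PASS.

    `summary` is the parsed coverage-summary.json (json-summary reporter, assumed).
    """
    failures = {}
    for metric in METRICS:
        counts = summary["total"][metric]
        total, covered = counts["total"], counts["covered"]
        # Nothing to cover counts as 100% so empty categories never fail the gate.
        percent = 100.0 if total == 0 else 100.0 * covered / total
        if percent < threshold:
            failures[metric] = round(percent, 2)
    return failures
```

In use, the file is loaded with json.load and the run fails whenever coverage_gate returns a non-empty dict; with the QA numbers above it would report only branches.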

3.4 Lint Fix Strategy

  • Markdownlint: correct table spacing (align column pipes consistently).
  • Hadolint:
    • DL3059: consolidate consecutive RUN steps in affected stages where possible.
    • SC2012: replace ls -la usages with stat or test -e for deterministic existence checks.
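Aligning table pipes is mechanical: every cell in a column is padded to that column's widest entry, and the separator row is redrawn at the same width while keeping any ":" alignment markers. The Python helper below is a hypothetical sketch, not part of markdownlint. It assumes tables with leading and trailing pipes and no escaped pipes inside cells, and it measures width in code points, so wide characters such as emoji may still look misaligned.

```python
import re

# A separator cell: dashes with optional leading/trailing colons (---, :--, :-:).
SEPARATOR = re.compile(r"^:?-+:?$")


def split_row(line: str) -> list[str]:
    # Drop the outer pipes, then split on the inner ones.
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def align_table(lines: list[str]) -> list[str]:
    """Pad a pipe table so every column's pipes line up."""
    rows = [split_row(line) for line in lines]
    ncols = max(len(row) for row in rows)
    rows = [row + [""] * (ncols - len(row)) for row in rows]
    is_separator = [all(SEPARATOR.match(cell) for cell in row) for row in rows]
    # Column width: widest non-separator cell, but at least 3 so "---" still fits.
    widths = [
        max([3] + [len(row[i]) for row, sep in zip(rows, is_separator) if not sep])
        for i in range(ncols)
    ]
    aligned = []
    for row, sep in zip(rows, is_separator):
        if sep:
            cells = []
            for cell, width in zip(row, widths):
                left, right = cell.startswith(":"), cell.endswith(":")
                dashes = "-" * (width - int(left) - int(right))
                cells.append((":" if left else "") + dashes + (":" if right else ""))
        else:
            cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        aligned.append("| " + " | ".join(cells) + " |")
    return aligned
```

Running markdownlint afterwards is still required, since only the linter can confirm the configured table style.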

3.5 Validation Strategy

  • Re-run the TypeScript check and pre-commit hooks, capturing complete output (exit code, stdout, and stderr) so the result is unambiguous.
  • Re-run full DoD sequence (E2E already passing for notifications).
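Clean capture mostly means recording the exit code next to the full stdout and stderr, so a truncated terminal buffer can no longer leave a check inconclusive. A minimal sketch, assuming Python 3.9 or later is available in the dev environment; run_and_capture and the log file names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Optional


def run_and_capture(cmd: list[str], log_path: str, cwd: Optional[str] = None) -> int:
    """Run `cmd`, save exit code plus complete stdout/stderr to `log_path`, return the exit code."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # a stray non-UTF-8 byte must not lose the whole log
    )
    Path(log_path).write_text(
        f"$ {shlex.join(cmd)}\n"
        f"exit code: {result.returncode}\n"
        f"--- stdout ---\n{result.stdout}\n"
        f"--- stderr ---\n{result.stderr}\n",
        encoding="utf-8",
    )
    return result.returncode
```

For example, run_and_capture(["npm", "run", "type-check"], "type-check.log", cwd="frontend"), and likewise for pre-commit run --all-files: a zero return is PASS, anything else is FAIL, and the log file is the evidence.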

4. Implementation Plan

Phase 1: High-Priority Nebula Upgrade (P0)

Status: ACCEPTED RISK (previously BLOCKED). Note: Phases 2-4 proceed under a documented security exception.

Commands

  1. Locate dependency source (module graph):
    • cd backend && go mod why -m github.com/slackhq/nebula
    • rg "slackhq/nebula" -n backend .docker docs configs
    • If dependency is in build-stage modules, inspect Caddy and CrowdSec build steps by capturing build logs or inspecting generated go.mod within the builder stage.
  2. Upgrade to v1.10.3+ at the source module:
    • go get github.com/slackhq/nebula@v1.10.3 (in the module where it is pulled)
    • go mod tidy
  3. Rebuild image and rescan:
    • .github/skills/scripts/skill-runner.sh docker-rebuild-e2e
    • .github/skills/scripts/skill-runner.sh security-scan-docker-image
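Image scanners read the module versions embedded in Go binaries, so it can help to check the shipped binary directly: go version -m prints a binary's embedded module list ("dep" lines, followed by "=>" lines when a replace directive applies). The sketch below parses that output and compares versions numerically, since a plain string comparison would wrongly rank v1.9.7 above v1.10.3. The helper names are hypothetical, pre-release suffixes are ignored, and the binary to inspect (for example one copied out of the image) is an assumption. is_patched also accepts a go list -m line such as "github.com/slackhq/nebula v1.10.3".

```python
import re
from typing import Optional

SEMVER = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from the first vX.Y.Z in `text` (pre-release tags ignored)."""
    match = SEMVER.search(text)
    if match is None:
        raise ValueError(f"no vX.Y.Z version in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_patched(version: str, minimum: str = "v1.10.3") -> bool:
    # Tuple comparison is numeric per component, so (1, 9, 7) < (1, 10, 3).
    return parse_version(version) >= parse_version(minimum)


def dep_version(buildinfo: str, module: str) -> Optional[str]:
    """Version of `module` embedded in a Go binary, from `go version -m` output."""
    rows = [line.strip().split("\t") for line in buildinfo.splitlines()]
    for i, fields in enumerate(rows):
        if len(fields) >= 3 and fields[0] in ("mod", "dep") and fields[1] == module:
            following = rows[i + 1] if i + 1 < len(rows) else []
            # A versioned "=>" line means a replace directive supplied the code.
            if len(following) >= 3 and following[0] == "=>":
                return following[2]
            return fields[2]
    return None
```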

Rollback Plan

  • If the upgrade fails, run git restore backend/go.mod backend/go.sum (or Dockerfile if the patch was applied in a build stage) and rebuild the image.

Checkpoint

  • STOP: If GHSA-69x3-g4r3-p962 persists after the image scan, reassess the dependency source before continuing to Phase 2. Likely sources are the Caddy builder stage or CrowdSec builder stage module graphs.

Files to Modify (Expected)

  • If dependency is in backend module: backend/go.mod and backend/go.sum.
  • If dependency is in a build-stage module (Caddy/CrowdSec builder), update the patching logic in Dockerfile in the relevant build stage.

Expected Outcomes

  • Grype/Trivy reports zero HIGH/CRITICAL vulnerabilities.
  • GHSA-69x3-g4r3-p962 removed from image scan output.
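Both outcomes can be checked mechanically against Grype's JSON output (grype <image> -o json). This sketch assumes the report layout of current Grype releases: a top-level "matches" list whose entries carry vulnerability.id, vulnerability.severity, artifact.name, artifact.version, and optionally relatedVulnerabilities. Trivy's JSON uses a different layout and is not covered, and the function names are hypothetical.

```python
BLOCKING = {"high", "critical"}


def blocking_findings(report: dict) -> list[tuple[str, str, str, str]]:
    """(id, severity, package, version) for every HIGH/CRITICAL match; an empty list means PASS."""
    findings = []
    for match in report.get("matches", []):
        vuln = match.get("vulnerability", {})
        artifact = match.get("artifact", {})
        severity = vuln.get("severity", "")
        if severity.lower() in BLOCKING:
            findings.append(
                (vuln.get("id", ""), severity, artifact.get("name", ""), artifact.get("version", ""))
            )
    return sorted(findings)


def mentions(report: dict, vuln_id: str) -> bool:
    """True if `vuln_id` appears as any match's primary or related vulnerability ID."""
    for match in report.get("matches", []):
        ids = {match.get("vulnerability", {}).get("id")}
        ids.update(related.get("id") for related in match.get("relatedVulnerabilities", []))
        if vuln_id in ids:
            return True
    return False
```

Phase 1 is done when blocking_findings(report) is empty and mentions(report, "GHSA-69x3-g4r3-p962") is False; checking related IDs guards against the advisory reappearing under a CVE alias.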

Risks

  • Dependency upgrade could impact Caddy/CrowdSec build reproducibility or plugin compatibility.
  • If the dependency is tied to a third-party module (xcaddy build), upgrades may require explicit go get overrides.

Phase 2: Frontend Coverage Improvement (P1)

Commands

  1. Run verbose coverage:
    • cd frontend && npm run test:coverage -- --reporter=verbose
  2. Inspect the HTML report:
    • open coverage/lcov-report/index.html
  3. Identify missing lines/branches in Notifications components and related utilities.
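Alongside the HTML report, the json-summary file (assuming that reporter is enabled) has one entry per source file keyed by path, which makes it quick to list the Notifications-related files with the weakest branch coverage. weakest_files is a hypothetical helper; the path match is case-insensitive so both Notifications.tsx and lower-case paths containing "notifications" are included.

```python
def weakest_files(summary: dict, needle: str = "notifications",
                  metric: str = "branches", limit: int = 10) -> list[tuple[str, float, int]]:
    """(path, percent, uncovered count) for files whose path contains `needle`, worst first."""
    rows = []
    for path, metrics in summary.items():
        if path == "total" or needle.lower() not in path.lower():
            continue
        total, covered = metrics[metric]["total"], metrics[metric]["covered"]
        percent = 100.0 if total == 0 else 100.0 * covered / total
        rows.append((path, round(percent, 2), total - covered))
    # Lowest percentage first; ties broken by the larger number of uncovered items.
    rows.sort(key=lambda row: (row[1], -row[2]))
    return rows[:limit]
```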

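Files to Modify (Expected)

  • frontend/src/pages/__tests__/Notifications.test.tsx (Vitest unit tests for Notifications.tsx).
  • Related test helpers, as needed.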

Expected Outcomes

  • Coverage meets or exceeds 88% for lines, statements, functions, branches.
  • Patch coverage reaches 100% for all modified lines (Codecov patch view).
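Patch coverage can be approximated locally before pushing: among the changed lines that lcov marks as executable (DA records), count how many were hit. This is only an approximation of Codecov's patch metric, whose exact treatment of partially covered lines may differ. The sketch assumes coverage/lcov.info exists (the lcov reporter writes it next to the lcov-report directory used above) and that the changed-line map, built from git diff, uses the same path form as the SF: entries; parse_lcov and patch_coverage are hypothetical names.

```python
from typing import Optional


def parse_lcov(text: str) -> dict[str, dict[int, int]]:
    """Map each SF: source file to {line number: hit count} from its DA: records."""
    files: dict[str, dict[int, int]] = {}
    current: Optional[dict[int, int]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = files.setdefault(line[3:], {})
        elif line.startswith("DA:") and current is not None:
            number, hits = line[3:].split(",")[:2]  # DA:<line>,<hits>[,<checksum>]
            current[int(number)] = int(hits)
        elif line == "end_of_record":
            current = None
    return files


def patch_coverage(lcov: dict[str, dict[int, int]], changed: dict[str, set[int]]) -> Optional[float]:
    """Percent of changed executable lines that were hit; None if no changed line is executable."""
    executable = covered = 0
    for path, numbers in changed.items():
        hits = lcov.get(path, {})
        for number in numbers:
            if number not in hits:
                continue  # no DA record: not executable (e.g. blank or comment line)
            executable += 1
            if hits[number] > 0:
                covered += 1
    return None if executable == 0 else 100.0 * covered / executable
```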

Risks

  • Additional tests may require stable mock setup for API calls and timers.
  • Over-mocking can hide real behavior; ensure branch coverage reflects actual runtime behavior.

Checkpoint

  • Verify coverage >=88% before starting lint fixes.

Phase 3: Lint Fixes (P2)

Commands

  1. Markdownlint:
    • npm run lint:markdown
  2. Hadolint:
    • docker run --rm -i hadolint/hadolint < Dockerfile

Files to Modify

  • Markdown table formatting: tests/README.md
  • Dockerfile lint issues:
    • SC2012 replacements: Dockerfile (each RUN instruction that uses ls)
    • DL3059 consolidation of adjacent RUN instructions in the affected stages (specify the exact stage during implementation to limit cache impact to that stage only).
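Before re-running hadolint, a rough pre-scan can list the exact spots to edit: adjacent RUN instructions (the pattern DL3059 reports) and RUN instructions that invoke ls (candidates for the SC2012 fix). This Python sketch is a simplification, not hadolint's logic: it joins backslash continuations and skips blank and comment lines, but ignores heredocs and the escape parser directive, and it flags every ls call, which is a superset of what ShellCheck reports. hadolint remains the source of truth.

```python
import re

# `ls` as a standalone command word (excludes lsof, ls-files, $ls_var, ...).
LS_WORD = re.compile(r"(?<![\w.$-])ls(?![\w.-])")


def _record(start: int, text: str) -> tuple[int, str, str]:
    keyword, *rest = text.split(None, 1)
    return start, keyword.upper(), rest[0].strip() if rest else ""


def instructions(dockerfile: str) -> list[tuple[int, str, str]]:
    """(start line, KEYWORD, arguments) per instruction, with backslash continuations joined."""
    found = []
    buffer, start = "", 0
    for number, raw in enumerate(dockerfile.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank and comment lines, between or inside instructions
        if not buffer:
            start = number
        if line.endswith("\\"):
            buffer += line[:-1] + " "
            continue
        found.append(_record(start, buffer + line))
        buffer = ""
    if buffer.strip():  # file ended on a continuation line
        found.append(_record(start, buffer))
    return found


def consecutive_runs(dockerfile: str) -> list[tuple[int, int]]:
    """Start lines of each adjacent RUN/RUN pair (the pattern behind DL3059)."""
    found = instructions(dockerfile)
    return [(a[0], b[0]) for a, b in zip(found, found[1:]) if a[1] == b[1] == "RUN"]


def ls_candidates(dockerfile: str) -> list[int]:
    """Start lines of RUN instructions that call ls (review each for SC2012)."""
    return [line for line, keyword, args in instructions(dockerfile)
            if keyword == "RUN" and LS_WORD.search(args)]
```

Because FROM ends a stage, a RUN pair reported here always lies within a single stage, which matches the plan to limit consolidation to the affected stages.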

Expected Outcomes

  • Markdownlint passes with zero errors.
  • Hadolint passes with zero DL3059 or SC2012 findings.

Risks

  • Consolidating RUN steps may impact layer caching; ensure build outputs are unchanged.

Phase 4: Validation Re-runs (P3)

Commands

  1. E2E (mandatory first):
    • npx playwright test --project=firefox
  2. Pre-commit (all files):
    • pre-commit run --all-files
  3. TypeScript check:
    • cd frontend && npm run type-check
  4. Other DoD validations (as required):
    • Frontend coverage: scripts/frontend-test-coverage.sh
    • Backend coverage (if impacted): scripts/go-test-coverage.sh
    • Security scans: CodeQL and Trivy/Grype tasks

Order Note

  • Per .github/instructions/testing.instructions.md, E2E is the mandatory first validation. The sequence must be E2E -> pre-commit -> TypeScript -> other validations.
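The required order can be encoded so that no later check runs after an earlier one has failed. A minimal Python sketch: the step list mirrors the Phase 4 commands (the frontend working directory for type-check comes from step 3), run_in_order and DOD_SEQUENCE are hypothetical names, and the remaining DoD validations would be appended after type-check.

```python
import subprocess
from typing import Optional

# (name, command, working directory); E2E first, as the testing instructions require.
DOD_SEQUENCE = [
    ("e2e", ["npx", "playwright", "test", "--project=firefox"], None),
    ("pre-commit", ["pre-commit", "run", "--all-files"], None),
    ("type-check", ["npm", "run", "type-check"], "frontend"),
]


def run_in_order(steps: list[tuple[str, list[str], Optional[str]]]) -> list[tuple[str, int]]:
    """Run steps sequentially and stop at the first non-zero exit code."""
    if not steps or steps[0][0] != "e2e":
        raise ValueError("E2E must be the first validation step")
    results = []
    for name, cmd, cwd in steps:
        code = subprocess.run(cmd, cwd=cwd).returncode
        results.append((name, code))
        if code != 0:
            break  # later checks would only add noise on top of this failure
    return results
```

Calling run_in_order(DOD_SEQUENCE) returns the (step, exit code) pairs actually executed, which maps directly onto the P3 rows of the verification matrix.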

Expected Outcomes

  • TypeScript and pre-commit checks show PASS with complete logs.
  • DoD gates pass with zero blocking findings.

Risks

  • Pre-commit hooks may surface additional lint failures requiring quick fixes.

5. Decision Record

Decision - 2026-02-10

Decision: How to remediate nebula@v1.9.7 in the runtime image.

Context: The image scan finds a high-severity vulnerability in github.com/slackhq/nebula@v1.9.7, even though a go.sum file in the workspace already lists v1.10.3. The source module is unknown but is most likely part of the Caddy or CrowdSec build stages.

Options:

  1. Add a direct dependency override in the source module that pulls in nebula (e.g., go get or a replace directive in the build-stage module).
  2. Add a forced go get github.com/slackhq/nebula@v1.10.3 patch in the Caddy/CrowdSec builder stage after xcaddy generates its go.mod.
  3. Upgrade the dependent plugin or dependency chain to a release that already pins nebula@v1.10.3+.

Rationale: Option 2 offers the most deterministic fix when the dependency is introduced in generated build-stage modules. Option 3 is preferred if a plugin release provides a clean upstream fix without manual overrides.

Impact: Ensures the runtime image is free of the known vulnerability and aligns build-stage dependencies with security requirements.

Review: Reassess if upstream plugins release versions that pin the dependency and allow removal of manual overrides.

6. Acceptance Criteria

  • Docker image scan reports zero HIGH/CRITICAL vulnerabilities and GHSA-69x3-g4r3-p962 is absent.
  • Frontend coverage meets or exceeds 88% for lines, statements, functions, and branches.
  • Markdownlint passes with no table formatting errors.
  • Hadolint passes with no DL3059 or SC2012 findings.
  • TypeScript check and pre-commit hooks complete with PASS output.
  • DoD validation is unblocked and ready for Supervisor review.

7. Verification Matrix

| Phase | Check | Expected Artifact | Status |
| --- | --- | --- | --- |
| P0 | Docker scan | grype-results.json shows 0 HIGH/CRITICAL | ⏸️ |
| P0 | Dependency source confirmed | Builder-stage or module graph notes captured | ⏸️ |
| P1 | Frontend coverage | coverage/lcov-report/index.html shows >=88% | ⏸️ |
| P2 | Markdownlint | npm run lint:markdown passes | ⏸️ |
| P2 | Hadolint | hadolint passes with no DL3059/SC2012 | ⏸️ |
| P3 | E2E | Playwright run passes | ⏸️ |
| P3 | Pre-commit | pre-commit run --all-files passes | ⏸️ |
| P3 | TypeScript | npm run type-check passes | ⏸️ |
| P3 | Coverage (if impacted) | scripts/*-test-coverage.sh passes | ⏸️ |
| P3 | Security scans | CodeQL/Trivy/Grype pass | ⏸️ |