CI Encryption-Key Investigation and Remediation Plan
Context
- Date: 2026-02-17
- Scope: CI failures where backend jobs report that the encryption key was not picked up.
- In-scope files:
  - .github/workflows/quality-checks.yml
  - .github/workflows/codecov-upload.yml
  - scripts/go-test-coverage.sh
  - backend/internal/crypto/rotation_service.go
  - backend/internal/services/dns_provider_service.go
  - backend/internal/services/credential_service.go
Problem Statement
CI backend tests can fail late and ambiguously when CHARON_ENCRYPTION_KEY is missing or malformed. The root causes are context-dependent secret availability, missing preflight validation, and drift between workflow intent and implementation.
Research Findings
Workflow Surface and Risks
| Workflow | Job | Key-sensitive step | Current key source | Main risk |
|---|---|---|---|---|
| .github/workflows/quality-checks.yml | backend-quality | Run Go tests, Run Perf Asserts | ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }} | Empty/malformed input not preflighted |
| .github/workflows/codecov-upload.yml | backend-codecov | Run Go tests with coverage | ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }} | Same key-risk as above |
Backend Failure Surface
- backend/internal/crypto/rotation_service.go: NewRotationService(db *gorm.DB) hard-fails if CHARON_ENCRYPTION_KEY is empty.
- backend/internal/services/dns_provider_service.go: NewDNSProviderService(...) depends on NewRotationService(...) and can degrade to warning-based behavior when key input is bad.
- backend/internal/services/credential_service.go: NewCredentialService(...) has the same dependency pattern.
Script Failure Mode
- scripts/go-test-coverage.sh currently uses set -euo pipefail but does not pre-validate key shape before go test.
- Empty secret expressions become late runtime failures instead of deterministic preflight failures.
Supervisor-Required Constraints (Preserved)
- pull_request_target SHALL NOT be used for secret-bearing backend test execution on untrusted code (fork PRs and Dependabot PRs).
- Same-repo pull_request and workflow_dispatch SHALL require CHARON_ENCRYPTION_KEY_TEST; missing secret SHALL fail fast (no fallback).
- Fork PRs and Dependabot PRs SHALL use workflow-only ephemeral key fallback for backend test execution.
- Key material SHALL NEVER be logged.
- Resolved key SHALL be masked before any potential output path.
- GITHUB_ENV propagation SHALL use safe delimiter write pattern.
- Workflow layer SHALL own key resolution/fallback.
- Script layer SHALL only validate and fail fast; it SHALL NOT generate fallback keys.
- Anti-drift guard SHALL be added so trigger comments and trigger blocks remain aligned.
- Known drift SHALL be corrected: comment in quality-checks.yml about codecov-upload.yml trigger behavior must match actual triggers.
EARS Requirements
Ubiquitous
- THE SYSTEM SHALL fail fast with explicit diagnostics when encryption-key input is required and unavailable or malformed.
- THE SYSTEM SHALL prevent secret-value exposure in logs, summaries, and artifacts.
Event-driven
- WHEN workflow context is trusted (same-repo pull_request or workflow_dispatch), THE SYSTEM SHALL require secrets.CHARON_ENCRYPTION_KEY_TEST.
- WHEN workflow context is untrusted (fork PR or Dependabot PR), THE SYSTEM SHALL generate ephemeral key material in workflow preflight only.
- WHEN workflow context is untrusted, THE SYSTEM SHALL NOT use pull_request_target for secret-bearing backend tests.
Unwanted behavior
- IF CHARON_ENCRYPTION_KEY is empty, non-base64, or decoded length is not 32 bytes, THEN THE SYSTEM SHALL stop before running tests.
- IF trigger comments diverge from workflow triggers, THEN THE SYSTEM SHALL fail anti-drift validation.
Technical Design
Workflow Contract
Both backend jobs (backend-quality, backend-codecov) implement the same preflight sequence:
- Resolve encryption key for backend tests
- Fail fast when required encryption secret is missing
- Validate encryption key format
Preflight Resolution Algorithm
- Detect fork PR context via github.event.pull_request.head.repo.fork.
- Detect Dependabot PR context (actor/repo metadata check).
- Trusted context: require secrets.CHARON_ENCRYPTION_KEY_TEST; fail immediately if empty.
- Untrusted context: generate ephemeral key (openssl rand -base64 32) in workflow only.
- Mask resolved key via ::add-mask::.
- Export via delimiter-based GITHUB_ENV write: a CHARON_ENCRYPTION_KEY<<EOF line, then the <value> line, then a closing EOF line.
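The resolution steps above could be sketched as a small set of Bash helpers for a workflow run step. This is a minimal sketch under stated assumptions, not the workflows' actual implementation: the function names (classify_context, resolve_key, export_key) and the IS_FORK, ACTOR, and SECRET_KEY inputs are hypothetical, Dependabot is assumed to be detectable through an actor value of dependabot[bot], and the inputs are assumed to be passed in through the step's env block. The fixed EOF delimiter follows the pattern listed above; the newline guard is an added precaution.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and inputs are hypothetical, not taken from the workflows.
set -euo pipefail

RESOLVED_KEY=""

# Prints "untrusted" for fork PRs or Dependabot PRs, "trusted" otherwise.
# Assumption: Dependabot is identified by the actor name "dependabot[bot]".
classify_context() {
  local is_fork="$1" actor="$2"
  if [[ "$is_fork" == "true" || "$actor" == "dependabot[bot]" ]]; then
    echo "untrusted"
  else
    echo "trusted"
  fi
}

# Sets RESOLVED_KEY without echoing it.
# Trusted: the secret is mandatory, no fallback. Untrusted: ephemeral key.
resolve_key() {
  local context="$1" secret="${2-}"
  if [[ "$context" == "trusted" ]]; then
    if [[ -z "$secret" ]]; then
      echo "ERROR: CHARON_ENCRYPTION_KEY_TEST is required in trusted contexts" >&2
      return 1
    fi
    RESOLVED_KEY="$secret"
  else
    RESOLVED_KEY="$(openssl rand -base64 32)"
  fi
}

# Masks the key, then appends it to the env file with a delimiter write.
export_key() {
  local key="$1" env_file="$2"
  # A multi-line value could collide with the EOF delimiter, so reject it.
  if [[ "$key" == *$'\n'* ]]; then
    echo "ERROR: resolved key must be a single line" >&2
    return 1
  fi
  echo "::add-mask::${key}"
  {
    printf 'CHARON_ENCRYPTION_KEY<<EOF\n'
    printf '%s\n' "$key"
    printf 'EOF\n'
  } >> "$env_file"
}

# Example wiring inside a workflow step (commented out so the sketch has no side effects):
#   resolve_key "$(classify_context "$IS_FORK" "$ACTOR")" "${SECRET_KEY-}"
#   export_key "$RESOLVED_KEY" "$GITHUB_ENV"
```

Keeping classification, resolution, and export in separate functions makes each boundary testable on its own: the trusted path never reaches openssl, and the untrusted path never reads the secret.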
Script Validation Contract
scripts/go-test-coverage.sh adds strict preflight validation:
- Present and non-empty.
- Base64 decodable.
- Decoded length exactly 32 bytes.
Script constraints:
- SHALL NOT generate keys.
- SHALL NOT select key source.
- SHALL only validate and fail fast with deterministic error messages.
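The three checks above could be sketched as a single Bash function placed ahead of the go test invocation. This is a sketch under stated assumptions rather than the script's actual code: validate_encryption_key is a hypothetical helper name, the error wording is illustrative, and base64 -d is assumed to behave as in GNU coreutils (rejecting characters outside the base64 alphabet). The function only validates; it neither generates a key nor chooses a key source, and it never prints the key value.

```shell
#!/usr/bin/env bash
# Sketch only: validate_encryption_key is a hypothetical helper name.
set -euo pipefail

# Returns 0 when the argument is non-empty, base64-decodable, and decodes
# to exactly 32 bytes. Error messages never include the key itself.
validate_encryption_key() {
  local key="${1-}"
  local decoded_len

  # 1. Present and non-empty.
  if [[ -z "$key" ]]; then
    echo "ERROR: CHARON_ENCRYPTION_KEY is missing or empty" >&2
    return 1
  fi

  # 2. Base64 decodable (assumes GNU coreutils base64).
  if ! printf '%s' "$key" | base64 -d >/dev/null 2>&1; then
    echo "ERROR: CHARON_ENCRYPTION_KEY is not valid base64" >&2
    return 1
  fi

  # 3. Decoded length exactly 32 bytes. The decoded bytes go straight to
  #    wc, so binary key material is never stored in a shell variable.
  decoded_len="$(printf '%s' "$key" | base64 -d | wc -c)"
  if (( decoded_len != 32 )); then
    echo "ERROR: CHARON_ENCRYPTION_KEY must decode to 32 bytes (got ${decoded_len})" >&2
    return 1
  fi
}

# Example call before running tests (commented out so the sketch has no side effects):
#   validate_encryption_key "${CHARON_ENCRYPTION_KEY-}"
```

Because each failure returns a distinct message before any test runs, a malformed key shows up as one attributable preflight error instead of a late service-construction failure.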
Error Handling Matrix
| Condition | Detection layer | Outcome |
|---|---|---|
| Trusted context + missing secret | Workflow preflight | Immediate failure with explicit message |
| Untrusted context + no secret access | Workflow preflight | Ephemeral key path (masked) |
| Malformed key | Script preflight | Immediate failure before go test |
| Trigger/comment drift | Workflow consistency guard | CI failure until synchronized |
Implementation Plan
Phase 1: Workflow Hardening
- Update .github/workflows/quality-checks.yml and .github/workflows/codecov-upload.yml with identical key-resolution and key-validation steps.
- Enforce trusted-context fail-fast and untrusted-context fallback boundaries.
- Add explicit prohibition notes and controls preventing pull_request_target migration for secret-bearing tests.
Phase 2: Script Preflight Hardening
- Update scripts/go-test-coverage.sh to validate key presence/format/length before tests.
- Preserve existing coverage behavior; only harden pre-test guard path.
Phase 3: Anti-Drift Enforcement
- Define one canonical backend-key-bootstrap contract path.
- Add consistency check that enforces trigger/comment parity between quality-checks.yml and codecov-upload.yml.
- Fix known push-only comment mismatch in quality-checks.yml.
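One possible shape for the consistency check is a Bash script that compares a machine-readable comment against the triggers actually declared in the other workflow. Everything specific here is an assumption made for illustration: the marker comment format (# codecov-upload-triggers: ...) is hypothetical and does not exist in the workflows today, the function names are invented, and the awk extraction only handles block-style YAML with two-space indentation under a top-level on: key. It is a sketch of the idea, not a general YAML parser.

```shell
#!/usr/bin/env bash
# Sketch only: the marker comment and all function names are hypothetical.
set -euo pipefail

# Lists trigger names under the top-level "on:" block, sorted, comma-joined.
# Assumes block-style YAML with two-space indentation.
declared_triggers() {
  awk '
    /^on: *$/              { in_on = 1; next }
    in_on && /^[^ #]/      { in_on = 0 }
    in_on && /^  [A-Za-z_]+:/ { sub(/^  /, ""); sub(/:.*/, ""); print }
  ' "$1" | sort | paste -sd, -
}

# Reads the hypothetical marker "# codecov-upload-triggers: a, b",
# normalised to the same sorted, comma-joined form.
documented_triggers() {
  sed -n 's/^# codecov-upload-triggers:[[:space:]]*//p' "$1" \
    | tr ',' '\n' \
    | sed 's/[[:space:]]//g' \
    | sed '/^$/d' \
    | sort \
    | paste -sd, -
}

# Fails when the documented triggers differ from the declared ones.
check_trigger_parity() {
  local comment_file="$1" workflow_file="$2" documented declared
  documented="$(documented_triggers "$comment_file")"
  declared="$(declared_triggers "$workflow_file")"
  if [[ "$documented" != "$declared" ]]; then
    echo "ERROR: trigger drift: documented [${documented}], declared [${declared}]" >&2
    return 1
  fi
}

# Example call from a CI step (commented out so the sketch has no side effects):
#   check_trigger_parity .github/workflows/quality-checks.yml \
#                        .github/workflows/codecov-upload.yml
```

Sorting both lists before comparison keeps the check insensitive to ordering, so only a genuinely added or removed trigger fails the build.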
Validation Plan
Run these scenarios:
- Same-repo PR with valid secret.
- Same-repo PR with missing secret (must fail fast).
- Same-repo PR with malformed secret (must fail fast before tests).
- Fork PR with no secret access (must use ephemeral fallback).
- Dependabot PR with no secret access (must use ephemeral fallback, no pull_request_target).
- workflow_dispatch with valid secret.
Expected results:
- No late ambiguous key-init failures.
- No secret material logged.
- Deterministic and attributable failure messages.
- Trigger docs and trigger config remain synchronized.
Acceptance Criteria
- Backend jobs in quality-checks.yml and codecov-upload.yml no longer fail ambiguously on encryption-key pickup.
- Trusted contexts fail fast if CHARON_ENCRYPTION_KEY_TEST is missing.
- Untrusted contexts use workflow-only ephemeral fallback.
- scripts/go-test-coverage.sh enforces deterministic key preflight checks.
- pull_request_target is explicitly prohibited for secret-bearing backend tests on untrusted code.
- Never-log-key-material and safe GITHUB_ENV propagation are implemented.
- Workflow/script responsibility boundary is enforced.
- Anti-drift guard is present and known trigger-comment mismatch is resolved.
Handoff to Supervisor
- This document is intentionally single-scope and restricted to CI encryption-key investigation/remediation.
- Legacy multi-topic coverage planning content has been removed from this file to maintain coherence.