Charon/docs/plans/archive/ci_codecov_backend_failure_remediation.md
2026-02-19 16:34:10 +00:00

CI Codecov Backend Test Failures - Remediation Plan

Date: 2026-02-16
Status: Investigation Complete - Ready for Implementation
Priority: CRITICAL CI BLOCKER
Workflow: .github/workflows/codecov-upload.yml → backend-codecov job


Executive Summary

CRITICAL: Multiple CI workflows share the same root causes. The investigation shows that the problem affects 3 workflows (2 currently failing, 1 at risk), not just codecov-upload.

Affected Workflows

| Workflow File | Purpose | Job Name(s) | Test Command | Status | Priority |
|---|---|---|---|---|---|
| codecov-upload.yml | Coverage upload to Codecov | backend-codecov | go-test-coverage.sh | Failing | CRITICAL |
| quality-checks.yml | PR quality gates | backend-quality | go-test-coverage.sh + go test -run TestPerf | Failing | CRITICAL |
| benchmark.yml | Performance regression checks | benchmark | go test -bench + go test -run TestPerf | ⚠️ At Risk | HIGH |

Other Workflows Analyzed (NOT affected):

  • e2e-tests-split.yml - Already has CHARON_ENCRYPTION_KEY configured (6+ locations)
  • cerberus-integration.yml - Runs integration scripts, not Go unit tests
  • crowdsec-integration.yml - Runs integration scripts, not Go unit tests
  • All other workflows - Do not run backend Go tests

Root Cause Issues

  1. RotationService Initialization Warnings (Non-blocking but pollutes logs)

    • Multiple services print: "Warning: RotationService initialization failed, using basic encryption: CHARON_ENCRYPTION_KEY is required"
    • Root cause: Missing CHARON_ENCRYPTION_KEY environment variable in ALL 3 affected workflows
    • Impact: Services fall back to basic encryption (no test failures, but warnings appear)
  2. GORM "record not found" Errors (Blocking failures)

    • Source: backend/internal/services/proxyhost_service.go:194
    • Root cause: Tests calling GetByID() without proper test data setup
    • Impact: Tests expecting proxy host records fail with gorm.ErrRecordNotFound

Investigation Findings

1. Encryption Key Requirements

File Analysis: .github/workflows/codecov-upload.yml

Path: /projects/Charon/.github/workflows/codecov-upload.yml
Lines: 43-53 (backend-codecov job)

Current Environment Variables:

env:
  CGO_ENABLED: 1

Missing Variables:

  • CHARON_ENCRYPTION_KEY (required for RotationService)

File Analysis: backend/internal/crypto/rotation_service.go

Path: /projects/Charon/backend/internal/crypto/rotation_service.go
Lines: 63-75

Error Trigger:

func NewRotationService(db *gorm.DB) (*RotationService, error) {
	// Load current key (required)
	currentKeyB64 := os.Getenv("CHARON_ENCRYPTION_KEY")
	if currentKeyB64 == "" {
		return nil, fmt.Errorf("CHARON_ENCRYPTION_KEY is required")
	}
	// ...
}
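The check above only rejects an empty value. As a hedged illustration of a stricter pre-flight check, the sketch below decodes a candidate value and confirms it is 32 bytes long, which is what `openssl rand -base64 32` produces. `checkKey` is a hypothetical helper, and the 32-byte requirement is an assumption; this plan only shows the empty-string check.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

// checkKey is a hypothetical helper (not part of Charon). It assumes a
// usable key is standard base64 that decodes to exactly 32 bytes.
func checkKey(b64 string) error {
	if b64 == "" {
		return errors.New("CHARON_ENCRYPTION_KEY is required")
	}
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return fmt.Errorf("key is not valid base64: %w", err)
	}
	if len(raw) != 32 {
		return fmt.Errorf("key decodes to %d bytes, want 32", len(raw))
	}
	return nil
}

func main() {
	// 32 zero bytes, encoded; for demonstration only, never for real data.
	demo := base64.StdEncoding.EncodeToString(make([]byte, 32))
	fmt.Println(checkKey(demo))
	fmt.Println(checkKey(""))
	fmt.Println(checkKey("AAAA"))
}
```

A check like this could run before the test suite starts, so that a malformed secret is reported once instead of surfacing as repeated fallback warnings.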

File Analysis: Service Dependencies

Affected Services:

  • backend/internal/services/dns_provider_service.go:145 - Calls crypto.NewRotationService(db)
  • backend/internal/services/credential_service.go:72 - Calls crypto.NewRotationService(db)

Fallback Behavior:

rotationService, err := crypto.NewRotationService(db)
if err != nil {
	// Fallback to non-rotation mode
	fmt.Printf("Warning: RotationService initialization failed, using basic encryption: %v\n", err)
}

Test Setup Comparison:

| Test File | Sets CHARON_ENCRYPTION_KEY? | Uses RotationService? |
|---|---|---|
| rotation_service_test.go | Yes (via setupTestKeys()) | Yes |
| dns_provider_service_test.go | No (hardcoded test key) | ⚠️ Tries but falls back |
| credential_service_test.go | No (hardcoded test key) | ⚠️ Tries but falls back |

Example: How Tests Set Encryption Keys

File: backend/internal/crypto/rotation_service_test.go:28-41

func setupTestKeys(t *testing.T) (currentKey, nextKey, legacyKey string) {
	currentKey, err := GenerateNewKey()
	require.NoError(t, err)

	_ = os.Setenv("CHARON_ENCRYPTION_KEY", currentKey)
	t.Cleanup(func() { _ = os.Unsetenv("CHARON_ENCRYPTION_KEY") })

	return currentKey, nextKey, legacyKey
}
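One detail worth noting in the helper above: its cleanup unsets the variable unconditionally. Once CI supplies CHARON_ENCRYPTION_KEY, tests that run later in the same package process may see it missing again. In test code, `t.Setenv` (Go 1.17+) restores the previous value automatically, though it cannot be combined with `t.Parallel`. The sketch below shows the same save-and-restore behaviour using only the standard library; `setEnvScoped` and `demo` are hypothetical names, and the variable name is a stand-in.

```go
package main

import (
	"fmt"
	"os"
)

// setEnvScoped is a hypothetical helper that mimics t.Setenv: it sets
// key=value and returns a function that restores the earlier state.
func setEnvScoped(key, value string) (func(), error) {
	prev, had := os.LookupEnv(key)
	if err := os.Setenv(key, value); err != nil {
		return nil, err
	}
	return func() {
		if had {
			_ = os.Setenv(key, prev) // put the original value back
		} else {
			_ = os.Unsetenv(key) // it was not set before, so remove it
		}
	}, nil
}

// demo shows that a value already present (for example from the CI
// workflow) survives a scoped override.
func demo(key string) (string, string, error) {
	if err := os.Setenv(key, "value-from-ci"); err != nil {
		return "", "", err
	}
	restore, err := setEnvScoped(key, "per-test-value")
	if err != nil {
		return "", "", err
	}
	during := os.Getenv(key)
	restore()
	return during, os.Getenv(key), nil
}

func main() {
	during, after, err := demo("CHARON_DEMO_KEY")
	fmt.Println(during, after, err)
}
```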

File: backend/internal/services/dns_provider_service_test.go:62

// Does NOT set CHARON_ENCRYPTION_KEY
encryptor, err := crypto.NewEncryptionService("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=")

2. ProxyHost "Record Not Found" Errors

File Analysis: backend/internal/services/proxyhost_service.go

Path: /projects/Charon/backend/internal/services/proxyhost_service.go
Lines: 192-197

Error Source:

func (s *ProxyHostService) GetByID(id uint) (*models.ProxyHost, error) {
	var host models.ProxyHost
	if err := s.db.Where("id = ?", id).First(&host).Error; err != nil {
		return nil, err  // Returns gorm.ErrRecordNotFound if no record
	}
	return &host, nil
}

GORM Error Type: gorm.ErrRecordNotFound (not explicitly handled in ProxyHostService)

Test Pattern Analysis

File: backend/internal/services/proxyhost_service_test.go:73-102

Working Test Pattern:

func TestProxyHostService_CRUD(t *testing.T) {
	db := setupProxyHostTestDB(t)
	service := NewProxyHostService(db)

	// Create test data BEFORE calling GetByID
	host := &models.ProxyHost{
		UUID:        "uuid-1",
		DomainNames: "test.example.com",
		ForwardHost: "127.0.0.1",
		ForwardPort: 8080,
	}
	err := service.Create(host)  // Creates record in DB
	assert.NoError(t, err)
	assert.NotZero(t, host.ID)

	// Now GetByID works because record exists
	fetched, err := service.GetByID(host.ID)
	assert.NoError(t, err)
	assert.Equal(t, host.DomainNames, fetched.DomainNames)
}

File: backend/internal/api/handlers/proxy_host_handler_update_test.go:50-60

Helper Function Pattern:

func createTestProxyHost(t *testing.T, db *gorm.DB, name string) models.ProxyHost {
	host := models.ProxyHost{
		UUID:          uuid.NewString(),
		Name:          name,
		DomainNames:   name + ".test.com",
		ForwardScheme: "http",
		ForwardHost:   "localhost",
		ForwardPort:   8080,
		Enabled:       true,
	}
	require.NoError(t, db.Create(&host).Error)
	return host
}

Likely Failure Scenario

Hypothesis: Some tests are calling GetByID() with a hardcoded ID (e.g., GetByID(1)) expecting a record to exist, but:

  • SQLite in-memory DB is empty at test start
  • Test doesn't create the record before calling GetByID()
  • Test previously relied on global seeding that no longer runs
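The hypothesis can be reproduced without a database. The following is a minimal sketch in which `memStore` is a hypothetical in-memory stand-in for the SQLite test DB and `errNotFound` stands in for gorm.ErrRecordNotFound; none of these names come from Charon.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for gorm.ErrRecordNotFound in this sketch.
var errNotFound = errors.New("record not found")

type proxyHost struct {
	ID          uint
	DomainNames string
}

// memStore is a hypothetical in-memory stand-in for the test database.
type memStore struct {
	nextID uint
	rows   map[uint]proxyHost
}

func newMemStore() *memStore {
	return &memStore{nextID: 1, rows: map[uint]proxyHost{}}
}

// create assigns the next ID, as an auto-increment primary key would.
func (s *memStore) create(h *proxyHost) {
	h.ID = s.nextID
	s.nextID++
	s.rows[h.ID] = *h
}

func (s *memStore) getByID(id uint) (*proxyHost, error) {
	h, ok := s.rows[id]
	if !ok {
		return nil, errNotFound
	}
	return &h, nil
}

func main() {
	s := newMemStore()

	// Empty store: a hardcoded ID fails, as in the hypothesis above.
	_, err := s.getByID(1)
	fmt.Println(err)

	// Create first, then look up by the assigned ID rather than a literal.
	h := &proxyHost{DomainNames: "test.example.com"}
	s.create(h)
	got, err := s.getByID(h.ID)
	fmt.Println(got.DomainNames, err)
}
```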

To Identify Failing Tests:

# Search for tests calling GetByID without creating the record first
grep -rn --include='*_test.go' 'GetByID(' backend/

Root Cause Analysis

Why Were Tests Passing Before?

Encryption Key Warnings:

  • Tests have ALWAYS printed these warnings (not a recent regression)
  • Warnings are printed with fmt.Printf (stdout, per the fallback code above) and do not fail tests
  • This is "noise" that should be cleaned up

ProxyHost Errors:

  • Likely Recent Change:
    • A test was recently modified to call GetByID() without proper setup
    • A global test fixture/seed was removed
    • Test database setup order changed
  • Verification Needed: Check recent commits to *_test.go files

CI vs. Local Test Differences

CI Environment (codecov-upload.yml):

  • No environment variables set beyond CGO_ENABLED=1
  • Fresh test database for each test run
  • No .env file loaded

Local Environment:

  • May have .env file with CHARON_ENCRYPTION_KEY set
  • Test setup may differ from CI
  • Local runs might have different test execution order

Key Files Checked:

  • .env.example - Shows CHARON_ENCRYPTION_KEY= (empty, requires generation)
  • scripts/go-test-coverage.sh - Does NOT set CHARON_ENCRYPTION_KEY
  • scripts/setup-e2e-env.sh - Generates key for E2E tests (NOT unit tests)
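Because the difference between CI and local runs comes down to which variables are present, a small pre-flight check can make the gap visible. The sketch below is one possible shape, assuming it would be called from a TestMain or a setup script; `missingEnv` is a hypothetical helper, not an existing Charon function.

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv is a hypothetical helper: it returns the names that are unset
// or empty, matching how NewRotationService treats an empty value.
func missingEnv(names ...string) []string {
	var missing []string
	for _, n := range names {
		if os.Getenv(n) == "" {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	if m := missingEnv("CHARON_ENCRYPTION_KEY"); len(m) > 0 {
		fmt.Println("missing required test environment:", m)
		return
	}
	fmt.Println("test environment looks complete")
}
```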

Remediation Plan

Phase 1: Environment Variable Configuration (WARNING ELIMINATION)

Objective: Eliminate RotationService initialization warnings in CI logs across ALL affected workflows

Implementation Strategy

Single Secret for All Workflows:

  • Use one GitHub Secret: CHARON_ENCRYPTION_KEY_TEST
  • Apply to all 3 workflows consistently
  • Same security model across all test runs

Option A: GitHub Secrets (RECOMMENDED)

Security: Use GitHub Repository Secrets for production-like CI

Implementation:

  1. Generate Test Key:

    # Local execution to generate key
    openssl rand -base64 32
    
  2. Add to GitHub Secrets:

    • Navigate to: Repository → Settings → Secrets and variables → Actions
    • Create new secret: CHARON_ENCRYPTION_KEY_TEST
    • Value: Generated base64 key from step 1
  3. Update ALL 3 Workflows:

    Workflow 1: codecov-upload.yml
    File: .github/workflows/codecov-upload.yml
    Location: Line 53-60 (backend-codecov job, "Run Go tests with coverage" step)

    - name: Run Go tests with coverage
      working-directory: ${{ github.workspace }}
      env:
        CGO_ENABLED: 1
        CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
      run: |
        bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt
        exit "${PIPESTATUS[0]}"
    

    Workflow 2: quality-checks.yml (Test Coverage Step)
    File: .github/workflows/quality-checks.yml
    Location: Line 37-45 (backend-quality job, "Run Go tests" step)

    - name: Run Go tests
      id: go-tests
      working-directory: ${{ github.workspace }}
      env:
        CGO_ENABLED: 1
        CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
      run: |
        bash "scripts/go-test-coverage.sh" 2>&1 | tee backend/test-output.txt
        exit "${PIPESTATUS[0]}"
    

    Workflow 2: quality-checks.yml (Perf Tests Step)
    File: .github/workflows/quality-checks.yml
    Location: Line 115-124 (backend-quality job, "Run Perf Asserts" step)

    - name: Run Perf Asserts
      working-directory: backend
      env:
        # Conservative defaults to avoid flakiness on CI; tune as necessary
        PERF_MAX_MS_GETSTATUS_P95: 500ms
        PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
        PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
        CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
      run: |
        {
          echo "## 🔍 Running performance assertions (TestPerf)"
          go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
        } >> "$GITHUB_STEP_SUMMARY"
        exit "${PIPESTATUS[0]}"
    

    Workflow 3: benchmark.yml (Benchmark Step)
    File: .github/workflows/benchmark.yml
    Location: Line 44 (benchmark job, "Run Benchmark" step)

    - name: Run Benchmark
      working-directory: backend
      env:
        CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
      run: go test -bench=. -benchmem -run='^$' ./... | tee output.txt
    

    Workflow 3: benchmark.yml (Perf Asserts Step)
    File: .github/workflows/benchmark.yml
    Location: Line 74 (benchmark job, "Run Perf Asserts" step)

    - name: Run Perf Asserts
      working-directory: backend
      env:
        PERF_MAX_MS_GETSTATUS_P95: 500ms
        PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
        PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
        CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
      run: |
        echo "## 🔍 Running performance assertions (TestPerf)" >> "$GITHUB_STEP_SUMMARY"
        go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
        exit "${PIPESTATUS[0]}"
    

Summary of Changes:

  • 3 workflow files to modify
  • 5 env sections to update (2 in quality-checks, 2 in benchmark, 1 in codecov-upload)
  • 1 GitHub Secret to create

Pros:

  • Secrets are encrypted at rest
  • Key never appears in logs
  • Matches production security model
  • Consistent across all workflows

Cons:

  • Requires GitHub repository admin access
  • Key rotation requires updating secret (but affects all workflows at once)

Option B: Generate Ephemeral Key (ALTERNATIVE)

Security: Generate temporary key for each CI run

Implementation:

Apply this pattern to all 3 workflows. Each workflow generates its own ephemeral key.

Workflow 1: codecov-upload.yml
File: .github/workflows/codecov-upload.yml
Location: Before "Run Go tests with coverage" step (after "Set up Go")

- name: Generate test encryption key
  id: test-key
  run: |
    TEST_KEY=$(openssl rand -base64 32)
    echo "::add-mask::${TEST_KEY}"
    echo "CHARON_ENCRYPTION_KEY=${TEST_KEY}" >> "$GITHUB_ENV"

- name: Run Go tests with coverage
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"

Workflow 2: quality-checks.yml
File: .github/workflows/quality-checks.yml
Location: Before "Run Go tests" step (after "Repo health check")

- name: Generate test encryption key
  id: test-key
  run: |
    TEST_KEY=$(openssl rand -base64 32)
    echo "::add-mask::${TEST_KEY}"
    echo "CHARON_ENCRYPTION_KEY=${TEST_KEY}" >> "$GITHUB_ENV"

- name: Run Go tests
  id: go-tests
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    bash "scripts/go-test-coverage.sh" 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"

# ... later in the same job ...

- name: Run Perf Asserts
  working-directory: backend
  env:
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    {
      echo "## 🔍 Running performance assertions (TestPerf)"
      go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    } >> "$GITHUB_STEP_SUMMARY"
    exit "${PIPESTATUS[0]}"

Workflow 3: benchmark.yml
File: .github/workflows/benchmark.yml
Location: Before "Run Benchmark" step (after "Set up Go")

- name: Generate test encryption key
  id: test-key
  run: |
    TEST_KEY=$(openssl rand -base64 32)
    echo "::add-mask::${TEST_KEY}"
    echo "CHARON_ENCRYPTION_KEY=${TEST_KEY}" >> "$GITHUB_ENV"

- name: Run Benchmark
  working-directory: backend
  # No step-level env block needed: CHARON_ENCRYPTION_KEY is inherited from $GITHUB_ENV
  run: go test -bench=. -benchmem -run='^$' ./... | tee output.txt

# ... later in the same job ...

- name: Run Perf Asserts
  working-directory: backend
  env:
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    echo "## 🔍 Running performance assertions (TestPerf)" >> "$GITHUB_STEP_SUMMARY"
    go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    exit "${PIPESTATUS[0]}"

Pros:

  • No secrets management needed
  • Key is ephemeral (discarded after run)
  • Simpler to implement
  • Each workflow run gets its own unique key

Cons:

  • Generates new key on every run (minimal overhead ~0.1s)
  • Doesn't test key persistence scenarios
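The shell step in Option B has a direct Go equivalent, which may be handy when a throwaway key should be created from a test helper rather than from the workflow. This is a minimal sketch: `newKey` is a hypothetical name, and it is not claimed to match the `GenerateNewKey()` function referenced earlier, whose implementation is not shown in this plan.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// newKey is a hypothetical helper: 32 random bytes in standard base64,
// the same shape of value that `openssl rand -base64 32` prints.
func newKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf), nil
}

func main() {
	key, err := newKey()
	if err != nil {
		panic(err)
	}
	// Print only the length so the key itself never reaches the logs.
	fmt.Println(len(key))
}
```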

Option C: Hardcoded Test Key (NOT RECOMMENDED)

Security: Hardcode a test-only key in workflow

Implementation:

Apply same hardcoded key to all 3 workflows:

- name: Run Go tests with coverage  # or Run Benchmark, or Run Perf Asserts
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    CHARON_ENCRYPTION_KEY: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="  # Hardcoded test key
  run: |
    bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"

Apply to:

  • .github/workflows/codecov-upload.yml - Line 53 env block
  • .github/workflows/quality-checks.yml - Lines 37 and 115 env blocks
  • .github/workflows/benchmark.yml - Lines 44 and 74 env blocks

Pros:

  • Simplest to implement (just add one line per env block)
  • No secrets management
  • No key generation overhead

Cons:

  • ⚠️ Key visible in workflow file and logs
  • ⚠️ Security audit will flag this
  • ⚠️ Doesn't test real key loading from environment
  • ⚠️ Not recommended for repos with security compliance requirements

Recommendation: Use Option A (GitHub Secrets) for production readiness and security compliance, or Option B (Ephemeral) for simplicity without security concerns. Avoid Option C unless this is a demo/test repository.


Phase 2: Database Seeding/Test Setup (ERROR ELIMINATION)

Objective: Fix ProxyHost "record not found" failures

Step 1: Identify Failing Tests

Action: Run tests locally and capture failures

cd backend
go test -v ./... 2>&1 | tee test-output.txt
grep -i "record not found" test-output.txt

Expected Output:

--- FAIL: TestSomeFunction (0.00s)
    service_test.go:123: Error getting proxy host: record not found
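With many packages, grepping plain output can make it hard to tell which test produced a message. As an alternative sketch, `go test -json` emits one JSON event per line with `Action`, `Package`, `Test`, and `Output` fields, and the small filter below prints each test whose output mentions the error. `matchEvent` is a hypothetical helper. A match only means the message appeared during that test; it does not by itself mean the test failed.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// event mirrors the `go test -json` fields this filter needs.
type event struct {
	Action  string
	Package string
	Test    string
	Output  string
}

// matchEvent reports the qualified test name when the line is an output
// event for a named test and its text contains needle.
func matchEvent(line []byte, needle string) (string, bool) {
	var ev event
	if err := json.Unmarshal(line, &ev); err != nil {
		return "", false // not a JSON event, e.g. build output
	}
	if ev.Action != "output" || ev.Test == "" || !strings.Contains(ev.Output, needle) {
		return "", false
	}
	return ev.Package + "." + ev.Test, true
}

func main() {
	seen := map[string]bool{}
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // allow long output lines
	for sc.Scan() {
		name, ok := matchEvent(sc.Bytes(), "record not found")
		if ok && !seen[name] {
			seen[name] = true
			fmt.Println(name)
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Assuming the file is saved as `scan.go` outside the module's packages, it could be fed with `go test -json ./... | go run scan.go`.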

Step 2: Classify Failures

For each failing test, determine:

  1. Test calls GetByID() without creating record?

    • Fix: Add createTestProxyHost() call before GetByID()
  2. Test expects a specific ID (e.g., ID=1)?

    • Fix: Store the returned ID from Create() and use it in GetByID()
  3. Test relies on global seed data?

    • Fix: Add explicit test data creation in test setup

Step 3: Apply Fixes

Pattern 1: Missing Test Data Creation

Before (Broken):

func TestSomeFunction(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	// Assumes ID=1 exists (WRONG): the in-memory DB is empty at this point
	_, err := service.GetByID(1)
	require.NoError(t, err) // fails with "record not found"
}

After (Fixed):

func TestSomeFunction(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	// Create test data first
	testHost := &models.ProxyHost{
		UUID:        "test-uuid",
		DomainNames: "test.example.com",
		ForwardHost: "localhost",
		ForwardPort: 8080,
	}
	require.NoError(t, service.Create(testHost))

	// Now fetch it by the auto-assigned ID
	host, err := service.GetByID(testHost.ID)
	require.NoError(t, err)
	assert.Equal(t, "test.example.com", host.DomainNames)
}

Pattern 2: Expecting Specific Error

Option A: Handle gorm.ErrRecordNotFound

func TestGetByID_NotFound(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	// Test error handling for non-existent ID
	_, err := service.GetByID(999)
	require.Error(t, err)
	assert.ErrorIs(t, err, gorm.ErrRecordNotFound)
}

Option B: Wrap Error in Service (BETTER)

Modify ProxyHostService.GetByID() to return a domain-specific error:

File: backend/internal/services/proxyhost_service.go:192-197

var ErrProxyHostNotFound = errors.New("proxy host not found")

func (s *ProxyHostService) GetByID(id uint) (*models.ProxyHost, error) {
	var host models.ProxyHost
	if err := s.db.Where("id = ?", id).First(&host).Error; err != nil {
		if errors.Is(err, gorm.ErrRecordNotFound) {
			return nil, ErrProxyHostNotFound
		}
		return nil, err
	}
	return &host, nil
}

Then tests become:

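func TestGetByID_NotFound(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	_, err := service.GetByID(999)
	require.Error(t, err)
	// Unqualified because this test lives in package services, like the
	// examples above; from another package use services.ErrProxyHostNotFound.
	assert.ErrorIs(t, err, ErrProxyHostNotFound)
}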
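The translation step in Option B can be exercised on its own. In the sketch below, `errRecordNotFound` stands in for gorm.ErrRecordNotFound and `translate` is a hypothetical function that isolates the mapping; only `ErrProxyHostNotFound` is a name taken from this plan.

```go
package main

import (
	"errors"
	"fmt"
)

// errRecordNotFound stands in for gorm.ErrRecordNotFound in this sketch.
var errRecordNotFound = errors.New("record not found")

// ErrProxyHostNotFound is the domain-specific error proposed in Option B.
var ErrProxyHostNotFound = errors.New("proxy host not found")

// translate maps the storage-layer "not found" error to the domain error
// and passes every other error (including nil) through unchanged.
func translate(err error) error {
	if errors.Is(err, errRecordNotFound) {
		return ErrProxyHostNotFound
	}
	return err
}

func main() {
	// errors.Is also matches when the storage error has been wrapped.
	wrapped := fmt.Errorf("query failed: %w", errRecordNotFound)
	fmt.Println(errors.Is(translate(wrapped), ErrProxyHostNotFound))

	other := errors.New("connection refused")
	fmt.Println(translate(other) == other)
	fmt.Println(translate(nil) == nil)
}
```

One trade-off to weigh: replacing the error removes the storage error from the chain, so existing callers that check `errors.Is(err, gorm.ErrRecordNotFound)` would stop matching. On Go 1.20 or later, `fmt.Errorf("%w: %w", ErrProxyHostNotFound, err)` keeps both matchable if that compatibility matters.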

Step 4: Add Missing Test Utilities

Create Shared Test Helper:

File: backend/internal/services/testutil/proxyhost_fixtures.go (NEW FILE)

package testutil

import (
	"testing"

	"github.com/Wikid82/charon/backend/internal/models"
	"github.com/google/uuid"
	"github.com/stretchr/testify/require"
	"gorm.io/gorm"
)

// CreateTestProxyHost creates a proxy host with sensible defaults for testing.
func CreateTestProxyHost(t *testing.T, db *gorm.DB, overrides ...func(*models.ProxyHost)) *models.ProxyHost {
	t.Helper()

	host := &models.ProxyHost{
		UUID:          uuid.NewString(),
		Name:          "Test Proxy",
		DomainNames:   "test.example.com",
		ForwardScheme: "http",
		ForwardHost:   "localhost",
		ForwardPort:   8080,
		Enabled:       true,
	}

	// Apply overrides
	for _, override := range overrides {
		override(host)
	}

	require.NoError(t, db.Create(host).Error)
	return host
}

Usage in Tests:

import "github.com/Wikid82/charon/backend/internal/services/testutil"

func TestSomeFunction(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	// Create test data with defaults
	host1 := testutil.CreateTestProxyHost(t, db)

	// Create test data with custom values
	host2 := testutil.CreateTestProxyHost(t, db, func(h *models.ProxyHost) {
		h.Name = "Custom Name"
		h.ForwardPort = 9000
	})

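	// Now use them (every declared variable must be used for the test to compile)
	fetched, err := service.GetByID(host1.ID)
	require.NoError(t, err)
	require.Equal(t, host1.ID, fetched.ID)
	require.NotEqual(t, host1.ID, host2.ID)
}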
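The usage above creates two hosts that share the default DomainNames value. If the schema enforces uniqueness on that column (an assumption; the model definition is not shown in this plan), the second insert would fail. One possible refinement, sketched here with a plain struct instead of models.ProxyHost, is to derive defaults from a counter so each fixture is distinct while overrides still win. All names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// hostSeq numbers fixtures so that default values never collide.
var hostSeq uint64

// proxyHost is a simplified stand-in for models.ProxyHost.
type proxyHost struct {
	Name        string
	DomainNames string
	ForwardPort int
}

// newTestHost builds a host with unique defaults, then applies the
// overrides in order, so a later override wins over an earlier one.
func newTestHost(overrides ...func(*proxyHost)) *proxyHost {
	n := atomic.AddUint64(&hostSeq, 1)
	h := &proxyHost{
		Name:        fmt.Sprintf("Test Proxy %d", n),
		DomainNames: fmt.Sprintf("test-%d.example.com", n),
		ForwardPort: 8080,
	}
	for _, override := range overrides {
		override(h)
	}
	return h
}

func main() {
	a := newTestHost()
	b := newTestHost(func(h *proxyHost) { h.ForwardPort = 9000 })
	fmt.Println(a.DomainNames != b.DomainNames, b.ForwardPort)
}
```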

Phase 3: Validation

Consolidated Implementation Checklist

Phase 1: Multi-Workflow Environment Variable Fix

  • Generate or configure secret:

    • Option A: Generate key with openssl rand -base64 32, add to GitHub Secrets as CHARON_ENCRYPTION_KEY_TEST
    • Option B: Add key generation step to each workflow (ephemeral keys)
    • Option C: Use hardcoded test key (not recommended)
  • Update Workflow 1 (Priority: CRITICAL):

    • File: .github/workflows/quality-checks.yml
    • Location 1: Line 37-45 - Add CHARON_ENCRYPTION_KEY to "Run Go tests" step
    • Location 2: Line 115-124 - Add CHARON_ENCRYPTION_KEY to "Run Perf Asserts" step
    • Verification: Both test steps have the env var
  • Update Workflow 2 (Priority: HIGH):

    • File: .github/workflows/codecov-upload.yml
    • Location: Line 53-60 - Add CHARON_ENCRYPTION_KEY to "Run Go tests with coverage" step
    • Verification: Test step has the env var
  • Update Workflow 3 (Priority: MEDIUM):

    • File: .github/workflows/benchmark.yml
    • Location 1: Line 44 - Add CHARON_ENCRYPTION_KEY to "Run Benchmark" step
    • Location 2: Line 74 - Add CHARON_ENCRYPTION_KEY to "Run Perf Asserts" step
    • Verification: Both test steps have the env var
  • Total changes: 3 files, 5 env blocks updated

Phase 2: Test Data Setup Fixes

  • Identify failing tests with "record not found" errors
  • Fix each test by adding proper test data creation
  • Add testutil.CreateTestProxyHost() helper if needed
  • Verify all tests pass locally

Phase 3: Multi-Workflow Validation

  • Local validation (all tests pass with encryption key set)
  • Push to feature branch
  • Monitor all 3 workflow runs in GitHub Actions
  • Verify each workflow:
    • quality-checks.yml - No warnings, tests pass
    • codecov-upload.yml - No warnings, tests pass, coverage uploaded
    • benchmark.yml - No warnings, benchmarks complete

Phase 3: Validation (Detailed Procedures)

Step 1: Local Validation

Execute Before Pushing:

# 1. Set encryption key locally (matches CI)
export CHARON_ENCRYPTION_KEY="$(openssl rand -base64 32)"

# 2. Run backend tests
cd /projects/Charon
.github/skills/scripts/skill-runner.sh test-backend-coverage

# 3. Verify no warnings in output
# Look for: "Warning: RotationService initialization failed"
# Expected: No warnings

# 4. Verify coverage pass
# Expected: "Coverage requirement met"

# 5. Check for test failures
# Expected: All tests pass

Success Criteria:

  • No "RotationService initialization failed" warnings
  • No "record not found" errors
  • Coverage >= 85%
  • All tests pass

Step 2: CI Validation

Push to Branch and Monitor:

git checkout -b fix/ci-backend-test-failures
git add .github/workflows/codecov-upload.yml
git add .github/workflows/quality-checks.yml
git add .github/workflows/benchmark.yml
git add backend/internal/services/proxyhost_service.go  # If modified
git add backend/internal/services/*_test.go  # Any test fixes
git commit -m "fix(ci): resolve backend test failures across all workflows

- Add CHARON_ENCRYPTION_KEY to quality-checks, codecov-upload, and benchmark workflows
- Fix ProxyHost test data setup in service tests
- Eliminate RotationService initialization warnings

Affected workflows:
- quality-checks.yml (CRITICAL: PR blocker)
- codecov-upload.yml (HIGH: coverage tracking)
- benchmark.yml (MEDIUM: performance regression)

Resolves: backend test job failures across 3 CI workflows"
git push origin fix/ci-backend-test-failures

Monitor All 3 CI Workflows:

  1. Navigate to GitHub Actions → Your PR

  2. Verify these workflow runs appear:

    • Quality Checks (most critical)
    • Upload Coverage to Codecov
    • Go Benchmark (may run later via workflow_run trigger)
  3. For each workflow, verify:

    • No stderr warnings in test execution steps
    • Test output shows all tests passing
    • No "RotationService initialization failed" messages
    • No "record not found" errors
  4. Quality Checks specific checks:

    • "Run Go tests" step succeeds
    • "Run Perf Asserts" step succeeds
    • GORM Security Scanner passes
    • Frontend tests pass (unrelated but monitored)
  5. Codecov Upload specific checks:

    • Backend tests pass
    • Coverage upload succeeds
    • Coverage report appears on PR
  6. Benchmark specific checks:

    • Benchmarks complete without errors
    • Performance assertions pass
    • (Note: Results may only store on main branch pushes)

Expected Duration:

  • quality-checks.yml: ~3-5 minutes
  • codecov-upload.yml: ~3-5 minutes
  • benchmark.yml: ~4-6 minutes

Success Criteria - ALL workflows must:

  • Complete without failures
  • Show no encryption key warnings
  • Show no database record errors
  • Maintain or improve coverage/performance baselines

Dependencies & Risks

Dependencies

Internal:

  • GitHub repository secrets access (for Option A)
  • Ability to modify 3 workflow files: .github/workflows/{codecov-upload,quality-checks,benchmark}.yml
  • Go test environment (local and CI)

External:

  • Codecov service (for coverage upload)
  • GitHub Actions runner availability

Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Tests fail after adding encryption key | Low | Medium | Test locally first with same env var |
| New test failures introduced by fixes | Medium | Medium | Validate each test fix individually |
| Coverage drops below 85% | Low | High | Add tests alongside fixes, not after |
| Codecov upload still fails | Low | High | Verify Codecov token is valid |
| Breaking other tests by modifying ProxyHostService | Low | High | Only add error wrapping, don't change logic |
| Missing affected workflows (incomplete fix) | Low | Critical | Verified all workflows via grep search; only 3 run Go tests |
| Workflow fixes out of sync | Medium | High | Use same env var name (CHARON_ENCRYPTION_KEY) across all workflows |
| Quality checks workflow more critical than codecov | N/A | Critical | Prioritize quality-checks.yml - it blocks PR merges |
| Benchmark workflow fails silently | Low | Medium | Add same fix proactively even if not currently failing |

Multi-Workflow Coordination

Critical Insight: The quality-checks.yml workflow is MORE important than codecov-upload.yml because:

  • Quality checks run on every PR and block merges
  • Codecov upload is informational and doesn't block merges
  • Quality checks includes multiple test types (unit tests + perf tests)

Implementation Priority:

  1. FIRST: Fix quality-checks.yml (most critical - PR blocker)
  2. SECOND: Fix codecov-upload.yml (high priority - coverage tracking)
  3. THIRD: Fix benchmark.yml (proactive - prevent future issues)

Consistency Requirements:

  • All workflows MUST use the same environment variable name: CHARON_ENCRYPTION_KEY
  • If using Option A (GitHub Secrets), all workflows MUST reference the same secret: CHARON_ENCRYPTION_KEY_TEST
  • If using Option B (Ephemeral), all workflows MUST generate keys the same way for consistency

Technical Debt Created

  1. Test Helper Utilities:

    • New testutil package should be documented
    • Consider creating similar helpers for other models
  2. Error Handling Consistency:

    • If wrapping gorm.ErrRecordNotFound, apply same pattern to all services
    • Document error handling conventions
  3. Environment Variable Documentation:

    • Update docs/development.md with required CI env vars
    • Document test key generation process

Stop/Go Rules

Stop Conditions

Phase 1 (Environment Variables):

  • STOP if: Local tests fail after setting CHARON_ENCRYPTION_KEY
    • Action: Investigate why encryption key breaks tests
    • Escalate to: Backend service owners

Phase 2 (Test Fixes):

  • STOP if: More than 5 test files need modifications
    • Action: Consider global test fixture/seed instead
    • Escalate to: Test infrastructure team
  • STOP if: Fixing tests requires production code changes beyond error wrapping
    • Action: Escalate as potential design issue

Phase 3 (Validation):

  • STOP if: CI still fails after local validation passes
    • Action: Compare CI environment vs. local (Go version, SQLite version, etc.)
    • Escalate to: DevOps/CI team

Go Conditions

Phase 1 → Phase 2:

  • GO if: Tests run with no RotationService warnings
  • GO if: Coverage remains >= 85%

Phase 2 → Phase 3:

  • GO if: All identified test failures are fixed
  • GO if: No new test failures introduced

Phase 3 → Complete:

  • GO if: CI run passes with all checks green
  • GO if: Codecov upload succeeds

Success Metrics

Quantitative

  1. RotationService Warnings: 0 occurrences in CI logs
  2. Test Failures: 0 "record not found" errors
  3. Coverage: Maintain >= 85% backend coverage
  4. CI Duration: No increase in test execution time
  5. Test Pass Rate: 100% (all tests pass)

Qualitative

  1. Code Quality: Test fixes follow established patterns
  2. Documentation: Changes are self-explanatory or documented
  3. Maintainability: Future tests can easily create test data
  4. Security: Encryption key handling follows best practices

Timeline Estimate

| Phase | Estimated Duration | Confidence |
|---|---|---|
| Phase 1: Environment Variable (3 workflows) | 45 minutes | High |
| Phase 2: Test Fixes | 1-3 hours | Medium |
| Phase 3: Validation (3 workflows) | 45 minutes | High |
| Total | 2.5-4.5 hours | Medium |

Assumptions:

  • Fewer than 5 tests need fixing
  • No production code changes required (beyond error wrapping)
  • CI environment is stable
  • All 3 workflows can be tested in parallel

Phase 1 Breakdown:

  • Generate/configure secret: 5 minutes
  • Update quality-checks.yml (2 env blocks): 15 minutes
  • Update codecov-upload.yml (1 env block): 10 minutes
  • Update benchmark.yml (2 env blocks): 10 minutes
  • Document changes and verify: 5 minutes

Contingency:

  • If more than 5 tests fail: +2 hours
  • If production code needs refactoring: +4 hours
  • If CI environment has additional issues: +1 hour
  • If workflows have unexpected dependencies: +1 hour

Follow-Up Actions

Immediate (This PR)

  1. Add CHARON_ENCRYPTION_KEY to all 3 CI workflows
  2. Fix all identified test failures
  3. Verify CI passes

Short-Term (Next Sprint)

  1. Test Infrastructure Audit:

    • Document all required environment variables for tests
    • Create standardized test setup utilities (testutil package)
    • Add linting rule to catch missing test data setup
  2. Error Handling Standardization:

    • Define domain-specific errors for all services (not just ProxyHost)
    • Document error handling conventions
    • Apply pattern to all *Service.GetByID() methods
  3. CI Environment Documentation:

    • Document all GitHub Secrets required for workflows
    • Create key rotation procedure
    • Add CI environment variable checklist

Long-Term (Future)

  1. Test Fixture Framework:

    • Evaluate using testfixtures or similar library
    • Create declarative test data setup
    • Reduce boilerplate in test files
  2. Integration Testing:

    • Separate unit tests (fast, mocked) from integration tests (real DB)
    • Use build tags: //go:build integration
    • Run integration tests separately in CI
  3. Service Constructor Refactoring:

    • Make RotationService initialization explicit
    • Allow tests to inject mock RotationService
    • Reduce warning messages in test output
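The constructor refactoring in item 3 could take roughly the following shape. This is only a sketch under stated assumptions: `keyRotator`, `fakeRotator`, `credentialService`, and `store` are hypothetical names, and the real RotationService method set is not shown in this plan.

```go
package main

import (
	"errors"
	"fmt"
)

// keyRotator is a hypothetical interface describing what a service needs
// from RotationService.
type keyRotator interface {
	Encrypt(plaintext []byte) ([]byte, error)
}

// fakeRotator is a test double; it only tags the input, it does not encrypt.
type fakeRotator struct{}

func (fakeRotator) Encrypt(p []byte) ([]byte, error) {
	return append([]byte("enc:"), p...), nil
}

type credentialService struct {
	rotator keyRotator
}

// newCredentialService receives its dependency explicitly and rejects nil,
// instead of building it from the environment and printing a warning.
func newCredentialService(r keyRotator) (*credentialService, error) {
	if r == nil {
		return nil, errors.New("rotator is required")
	}
	return &credentialService{rotator: r}, nil
}

// store delegates to the injected rotator.
func (s *credentialService) store(secret []byte) ([]byte, error) {
	return s.rotator.Encrypt(secret)
}

func main() {
	svc, err := newCredentialService(fakeRotator{})
	if err != nil {
		panic(err)
	}
	out, err := svc.store([]byte("secret"))
	fmt.Println(string(out), err)

	_, err = newCredentialService(nil)
	fmt.Println(err)
}
```

With this shape, tests pass in a fake and never touch CHARON_ENCRYPTION_KEY, so the fallback warning has no reason to appear in unit test output.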

References

Files Analyzed

CI Configuration:

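  • .github/workflows/codecov-upload.yml (workflow definition)
  • .github/workflows/quality-checks.yml (workflow definition)
  • .github/workflows/benchmark.yml (workflow definition)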

Backend Services:

  • backend/internal/crypto/rotation_service.go (encryption key loading)
  • backend/internal/services/dns_provider_service.go (RotationService usage)
  • backend/internal/services/credential_service.go (RotationService usage)
  • backend/internal/services/proxyhost_service.go (GetByID implementation)

Tests:

  • backend/internal/crypto/rotation_service_test.go (key setup pattern)
  • backend/internal/services/dns_provider_service_test.go (test setup)
  • backend/internal/services/credential_service_test.go (test setup)
  • backend/internal/services/proxyhost_service_test.go (CRUD test pattern)
  • backend/internal/api/handlers/proxy_host_handler_update_test.go (test helper)

Documentation:

  • .env.example (environment variable reference)
  • ARCHITECTURE.md (encryption key documentation)
  • docs/guides/dns-providers.md (encryption key usage guide)

External Resources


Appendix A: Workflow Analysis Details

Analysis Methodology

Search Commands Used:

# Find all workflow files
find .github/workflows -name "*.yml"

# Find workflows running Go tests
grep -r "go test\|go-test-coverage\.sh" .github/workflows/*.yml

# Find workflows with encryption key
grep -r "CHARON_ENCRYPTION_KEY" .github/workflows/*.yml

Results:

  • 39 total workflow files in .github/workflows/
  • 3 workflows run Go unit tests (affected by missing encryption key)
  • 1 workflow (e2e-tests-split.yml) already has encryption key configured
  • 2 workflows (cerberus, crowdsec) run integration tests (not affected)
  • 33 workflows don't run backend tests (not affected)
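
The two grep searches can be combined into a single check that flags workflows running Go tests without the key. The sketch below is a rough equivalent, assuming plain substring matching is good enough; the function names are hypothetical, and a match inside a YAML comment would still count, so the output is a list to review rather than a verdict.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// RunsGoTests reports whether workflow text invokes Go tests, using the
// same two patterns as the grep command above.
func RunsGoTests(content string) bool {
	return strings.Contains(content, "go test") ||
		strings.Contains(content, "go-test-coverage.sh")
}

// HasEncryptionKey reports whether the workflow mentions the key at all.
func HasEncryptionKey(content string) bool {
	return strings.Contains(content, "CHARON_ENCRYPTION_KEY")
}

// MissingKey is true for workflows that run Go tests but never set the key.
func MissingKey(content string) bool {
	return RunsGoTests(content) && !HasEncryptionKey(content)
}

func main() {
	// Run from the repository root so the relative glob resolves.
	paths, err := filepath.Glob(".github/workflows/*.yml")
	if err != nil {
		fmt.Fprintln(os.Stderr, "glob:", err)
		os.Exit(1)
	}
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, "read:", err)
			continue
		}
		if MissingKey(string(data)) {
			fmt.Println(p)
		}
	}
}
```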

Workflow-by-Workflow Breakdown

1. quality-checks.yml (CRITICAL)

Purpose: PR quality gates that block merges
Trigger: On every pull_request to main/development
Impact: Most critical - blocks PR approvals
Test Commands:

  • Line 43: bash "scripts/go-test-coverage.sh"
  • Line 123: go test -run TestPerf -v ./internal/api/handlers

Current Status: Failing
Fix Required: Add CHARON_ENCRYPTION_KEY to both test steps
Expected Result: PR checks turn green, allowing merges

2. codecov-upload.yml (HIGH PRIORITY)

Purpose: Upload test coverage to Codecov service
Trigger: On pull_request to main/development + workflow_dispatch
Impact: High - coverage tracking and reporting
Test Commands:

  • Line 58: bash scripts/go-test-coverage.sh

Current Status: Failing
Fix Required: Add CHARON_ENCRYPTION_KEY to test step
Expected Result: Coverage reports appear on PRs

3. benchmark.yml (MEDIUM PRIORITY)

Purpose: Performance regression detection
Trigger: After docker-build.yml completes + workflow_dispatch
Impact: Medium - catches performance regressions
Test Commands:

  • Line 44: go test -bench=. -benchmem -run='^$' ./...
  • Line 74: go test -run TestPerf -v ./internal/api/handlers

Current Status: ⚠️ At risk (may not have failed yet)
Fix Required: Add CHARON_ENCRYPTION_KEY to both test steps (proactive)
Expected Result: Benchmarks run cleanly without warnings

4. e2e-tests-split.yml (ALREADY FIXED)

Purpose: End-to-end Playwright tests
Trigger: Multiple triggers, runs E2E test shards
Status: Already configured correctly

Evidence of correct configuration:

# Lines 280, 481, 690, 894, 1098, 1310 - All identical:
- name: Generate test encryption key
  run: echo "CHARON_ENCRYPTION_KEY=$(openssl rand -base64 32)" >> "$GITHUB_ENV"

Why it's correct: Each shard generates its own ephemeral key before running tests. This is the pattern Option B recommends.
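
The same ephemeral-key idea can also be applied inside Go test setup, so that tests do not depend on the workflow exporting the variable. The helper below is a sketch under assumptions: `GenerateTestKey`, `SetTestEncryptionKey`, `EnvSetter`, and `MapEnv` are hypothetical names, and the key format (32 random bytes, standard base64) simply mirrors `openssl rand -base64 32` from the workflow step above. `*testing.T` satisfies `EnvSetter` through its `Setenv` method, which restores the previous value when the test ends.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"os"
)

const encryptionKeyEnv = "CHARON_ENCRYPTION_KEY"

// EnvSetter is the one method this helper needs; *testing.T provides it.
type EnvSetter interface {
	Setenv(key, value string)
}

// GenerateTestKey returns 32 random bytes in standard base64, the same
// shape as the output of `openssl rand -base64 32`.
func GenerateTestKey() (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", fmt.Errorf("generate test key: %w", err)
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// SetTestEncryptionKey generates a fresh key and exports it through env.
func SetTestEncryptionKey(env EnvSetter) (string, error) {
	key, err := GenerateTestKey()
	if err != nil {
		return "", err
	}
	env.Setenv(encryptionKeyEnv, key)
	return key, nil
}

// MapEnv is an in-memory EnvSetter, handy for exercising the helper.
type MapEnv map[string]string

// Setenv stores the value in the map instead of the process environment.
func (m MapEnv) Setenv(key, value string) { m[key] = value }

// processEnv adapts os.Setenv for use outside of tests.
type processEnv struct{}

// Setenv writes to the real process environment, ignoring the error.
func (processEnv) Setenv(key, value string) { _ = os.Setenv(key, value) }

func main() {
	if _, err := SetTestEncryptionKey(processEnv{}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("key length:", len(os.Getenv(encryptionKeyEnv)))
}
```

In a test this would be called as `SetTestEncryptionKey(t)` before constructing any service that needs the key.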

5. cerberus-integration.yml (NOT AFFECTED)

Purpose: Cerberus security stack integration tests
Test Type: Docker compose with integration scripts
Why not affected: Doesn't run go test - runs scripts/cerberus_integration.sh
Status: No changes needed

6. crowdsec-integration.yml (NOT AFFECTED)

Purpose: CrowdSec bouncer integration tests
Test Type: Docker compose with integration scripts
Why not affected: Doesn't run go test - runs skill-based integration scripts
Status: No changes needed

Why Other Workflows Aren't Affected

Workflows without backend tests:

  • docker-build.yml - Builds images, no test execution
  • codeql.yml - Security scanning only
  • supply-chain-*.yml - SBOM and provenance only
  • release-goreleaser.yml - Release automation
  • docs.yml - Documentation deployment
  • repo-health.yml - Repository maintenance
  • renovate_prune.yml - Dependency management
  • auto-versioning.yml - Version bumping
  • caddy-major-monitor.yml - Upstream monitoring
  • update-geolite2.yml - GeoIP updates
  • nightly-build.yml - Scheduled builds
  • propagate-changes.yml - Branch sync
  • weekly-nightly-promotion.yml - Release promotion
  • gh_cache_cleanup.yml - Cache maintenance

Key Insight: The CI failures only affect workflows that run go test commands, and specifically those that instantiate services requiring RotationService. Integration test workflows use Docker compose and don't instantiate Go services directly in the CI runner.


Sign-Off

Prepared by: Investigation Agent
Reviewed by: Pending (Awaiting supervisor approval)
Approved by: Pending

Next Action: Await approval to proceed with Phase 1 implementation.