CI Codecov Backend Test Failures - Remediation Plan

Date: 2026-02-16
Status: Investigation Complete - Ready for Implementation
Priority: CRITICAL CI BLOCKER
Workflow: .github/workflows/codecov-upload.yml → backend-codecov job

Executive Summary

CRITICAL: Multiple CI workflows are failing with the same root cause. Investigation reveals these failures affect 3 workflows, not just codecov-upload.

Affected Workflows

Workflow File	Purpose	Job Name(s)	Test Command	Status	Priority
`codecov-upload.yml`	Coverage upload to Codecov	`backend-codecov`	`go-test-coverage.sh`	❌ Failing	CRITICAL
`quality-checks.yml`	PR quality gates	`backend-quality`	`go-test-coverage.sh` + `go test -run TestPerf`	❌ Failing	CRITICAL
`benchmark.yml`	Performance regression checks	`benchmark`	`go test -bench` + `go test -run TestPerf`	⚠️ At Risk	HIGH

Other Workflows Analyzed (NOT affected):

✅ e2e-tests-split.yml - Already has CHARON_ENCRYPTION_KEY configured (6+ locations)
✅ cerberus-integration.yml - Runs integration scripts, not Go unit tests
✅ crowdsec-integration.yml - Runs integration scripts, not Go unit tests
✅ All other workflows - Do not run backend Go tests

Root Cause Issues

1. RotationService Initialization Warnings (non-blocking, but they pollute logs)
   - Multiple services print: "Warning: RotationService initialization failed, using basic encryption: CHARON_ENCRYPTION_KEY is required"
   - Root cause: Missing CHARON_ENCRYPTION_KEY environment variable in ALL 3 affected workflows
   - Impact: Services fall back to basic encryption (no test failures, but warnings appear)
2. GORM "record not found" Errors (blocking failures)
   - Source: backend/internal/services/proxyhost_service.go:194
   - Root cause: Tests calling GetByID() without proper test data setup
   - Impact: Tests expecting proxy host records fail with gorm.ErrRecordNotFound

Investigation Findings

1. Encryption Key Requirements

File Analysis: `.github/workflows/codecov-upload.yml`

Path: /projects/Charon/.github/workflows/codecov-upload.yml
Lines: 43-53 (backend-codecov job)

Current Environment Variables:

env:
  CGO_ENABLED: 1

Missing Variables:

CHARON_ENCRYPTION_KEY (required for RotationService)

File Analysis: `backend/internal/crypto/rotation_service.go`

Path: /projects/Charon/backend/internal/crypto/rotation_service.go
Lines: 63-75

Error Trigger:

func NewRotationService(db *gorm.DB) (*RotationService, error) {
	// Load current key (required)
	currentKeyB64 := os.Getenv("CHARON_ENCRYPTION_KEY")
	if currentKeyB64 == "" {
		return nil, fmt.Errorf("CHARON_ENCRYPTION_KEY is required")
	}
	// ...
}

File Analysis: Service Dependencies

Affected Services:

backend/internal/services/dns_provider_service.go:145 - Calls crypto.NewRotationService(db)
backend/internal/services/credential_service.go:72 - Calls crypto.NewRotationService(db)

Fallback Behavior:

rotationService, err := crypto.NewRotationService(db)
if err != nil {
	// Fallback to non-rotation mode
	fmt.Printf("Warning: RotationService initialization failed, using basic encryption: %v\n", err)
}

Test Setup Comparison:

Test File	Sets CHARON_ENCRYPTION_KEY?	Uses RotationService?
`rotation_service_test.go`	✅ Yes (via `setupTestKeys()`)	✅ Yes
`dns_provider_service_test.go`	❌ No (hardcoded test key)	⚠️ Tries but falls back
`credential_service_test.go`	❌ No (hardcoded test key)	⚠️ Tries but falls back

Example: How Tests Set Encryption Keys

File: backend/internal/crypto/rotation_service_test.go:28-41

func setupTestKeys(t *testing.T) (currentKey, nextKey, legacyKey string) {
	currentKey, err := GenerateNewKey()
	require.NoError(t, err)

	_ = os.Setenv("CHARON_ENCRYPTION_KEY", currentKey)
	t.Cleanup(func() { _ = os.Unsetenv("CHARON_ENCRYPTION_KEY") })

	return currentKey, nextKey, legacyKey
}

File: backend/internal/services/dns_provider_service_test.go:62

// Does NOT set CHARON_ENCRYPTION_KEY
encryptor, err := crypto.NewEncryptionService("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=")
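For reference, a key of the shape this plan assumes (standard base64 of 32 random bytes, which is what `openssl rand -base64 32` emits) can be generated and sanity-checked with the Go standard library. This is a minimal sketch: `newTestKey` and `validKey` are hypothetical helpers, not functions from the Charon codebase, and the 32-byte length is an assumption taken from the `openssl` command used elsewhere in this plan.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"os"
)

// newTestKey returns 32 random bytes encoded as standard base64.
// Hypothetical helper, for illustration only.
func newTestKey() (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// validKey reports whether s is standard base64 that decodes to exactly 32 bytes.
func validKey(s string) bool {
	raw, err := base64.StdEncoding.DecodeString(s)
	return err == nil && len(raw) == 32
}

func main() {
	key, err := newTestKey()
	if err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	// os.Setenv only affects this process and the children it starts.
	if err := os.Setenv("CHARON_ENCRYPTION_KEY", key); err != nil {
		fmt.Println("setenv failed:", err)
		return
	}
	fmt.Println("key is well-formed:", validKey(os.Getenv("CHARON_ENCRYPTION_KEY")))
}
```

Inside a test, `t.Setenv("CHARON_ENCRYPTION_KEY", key)` would be the more idiomatic way to set the variable, since it restores the previous value when the test finishes.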

2. ProxyHost "Record Not Found" Errors

File Analysis: `backend/internal/services/proxyhost_service.go`

Path: /projects/Charon/backend/internal/services/proxyhost_service.go
Lines: 192-197

Error Source:

func (s *ProxyHostService) GetByID(id uint) (*models.ProxyHost, error) {
	var host models.ProxyHost
	if err := s.db.Where("id = ?", id).First(&host).Error; err != nil {
		return nil, err  // Returns gorm.ErrRecordNotFound if no record
	}
	return &host, nil
}

GORM Error Type: gorm.ErrRecordNotFound (not explicitly handled in ProxyHostService)

Test Pattern Analysis

File: backend/internal/services/proxyhost_service_test.go:73-102

Working Test Pattern:

func TestProxyHostService_CRUD(t *testing.T) {
	db := setupProxyHostTestDB(t)
	service := NewProxyHostService(db)

	// Create test data BEFORE calling GetByID
	host := &models.ProxyHost{
		UUID:        "uuid-1",
		DomainNames: "test.example.com",
		ForwardHost: "127.0.0.1",
		ForwardPort: 8080,
	}
	err := service.Create(host)  // Creates record in DB
	assert.NoError(t, err)
	assert.NotZero(t, host.ID)

	// Now GetByID works because record exists
	fetched, err := service.GetByID(host.ID)
	assert.NoError(t, err)
	assert.Equal(t, host.DomainNames, fetched.DomainNames)
}

File: backend/internal/api/handlers/proxy_host_handler_update_test.go:50-60

Helper Function Pattern:

func createTestProxyHost(t *testing.T, db *gorm.DB, name string) models.ProxyHost {
	host := models.ProxyHost{
		UUID:          uuid.NewString(),
		Name:          name,
		DomainNames:   name + ".test.com",
		ForwardScheme: "http",
		ForwardHost:   "localhost",
		ForwardPort:   8080,
		Enabled:       true,
	}
	require.NoError(t, db.Create(&host).Error)
	return host
}

Likely Failure Scenario

Hypothesis: Some tests are calling GetByID() with a hardcoded ID (e.g., GetByID(1)) expecting a record to exist, but:

SQLite in-memory DB is empty at test start
Test doesn't create the record before calling GetByID()
Test previously relied on global seeding that no longer runs

To Identify Failing Tests:

# Search for tests calling GetByID without creating the record first
grep -rn --include='*_test.go' 'GetByID(' backend/

Root Cause Analysis

Why Were Tests Passing Before?

Encryption Key Warnings:

Tests have ALWAYS printed these warnings (not a recent regression)
Warnings are printed with fmt.Printf (stdout) and do not fail tests
This is "noise" that should be cleaned up

ProxyHost Errors:

Likely Recent Change:
- A test was recently modified to call GetByID() without proper setup
- A global test fixture/seed was removed
- Test database setup order changed
Verification Needed: Check recent commits to *_test.go files

CI vs. Local Test Differences

CI Environment (codecov-upload.yml):

No environment variables set beyond CGO_ENABLED=1
Fresh test database for each test run
No .env file loaded

Local Environment:

May have .env file with CHARON_ENCRYPTION_KEY set
Test setup may differ from CI
Local runs might have different test execution order

Key Files Checked:

.env.example - Shows CHARON_ENCRYPTION_KEY= (empty, requires generation)
scripts/go-test-coverage.sh - Does NOT set CHARON_ENCRYPTION_KEY
scripts/setup-e2e-env.sh - Generates key for E2E tests (NOT unit tests)
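Because neither CI nor the coverage script sets the key, the gap only shows up as a warning buried in test output. One way to surface it earlier would be a small pre-flight check that lists required variables that are unset or empty. The sketch below is an assumption about how such a check could look; `missingEnv` is a hypothetical helper, and treating an empty value as missing mirrors the `== ""` test in NewRotationService.

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv returns the names that are unset or empty according to lookup.
// lookup has the same shape as os.LookupEnv so a fake can be substituted in tests.
func missingEnv(lookup func(string) (string, bool), names ...string) []string {
	var missing []string
	for _, name := range names {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingEnv(os.LookupEnv, "CHARON_ENCRYPTION_KEY")
	if len(missing) > 0 {
		fmt.Println("missing required test environment variables:", missing)
		return
	}
	fmt.Println("test environment looks complete")
}
```

A `TestMain` function could call a check like this and exit non-zero, which would turn the silent fallback into an explicit failure if that behavior is wanted.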

Remediation Plan

Phase 1: Environment Variable Configuration (WARNING ELIMINATION)

Objective: Eliminate RotationService initialization warnings in CI logs across ALL affected workflows

Implementation Strategy

Single Secret for All Workflows:

Use one GitHub Secret: CHARON_ENCRYPTION_KEY_TEST
Apply to all 3 workflows consistently
Same security model across all test runs

Option A: Set in GitHub Actions (RECOMMENDED)

Security: Use GitHub Repository Secrets for production-like CI

Implementation:

Generate Test Key:

# Local execution to generate key
openssl rand -base64 32

Add to GitHub Secrets:
- Navigate to: Repository → Settings → Secrets → Actions
- Create new secret: CHARON_ENCRYPTION_KEY_TEST
- Value: Generated base64 key from step 1

Update ALL 3 Workflows:

Workflow 1: codecov-upload.yml
File: .github/workflows/codecov-upload.yml
Location: Lines 53-60 (backend-codecov job, "Run Go tests with coverage" step)

- name: Run Go tests with coverage
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
  run: |
    bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"

Workflow 2: quality-checks.yml (Test Coverage Step)
File: .github/workflows/quality-checks.yml
Location: Lines 37-45 (backend-quality job, "Run Go tests" step)

- name: Run Go tests
  id: go-tests
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
  run: |
    bash "scripts/go-test-coverage.sh" 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"

Workflow 2: quality-checks.yml (Perf Tests Step)
File: .github/workflows/quality-checks.yml
Location: Lines 115-124 (backend-quality job, "Run Perf Asserts" step)

- name: Run Perf Asserts
  working-directory: backend
  env:
    # Conservative defaults to avoid flakiness on CI; tune as necessary
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
  run: |
    {
      echo "## 🔍 Running performance assertions (TestPerf)"
      go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    } >> "$GITHUB_STEP_SUMMARY"
    exit "${PIPESTATUS[0]}"

Workflow 3: benchmark.yml (Benchmark Step)
File: .github/workflows/benchmark.yml
Location: Line 44 (benchmark job, "Run Benchmark" step)

- name: Run Benchmark
  working-directory: backend
  env:
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
  run: go test -bench=. -benchmem -run='^$' ./... | tee output.txt

Workflow 3: benchmark.yml (Perf Asserts Step)
File: .github/workflows/benchmark.yml
Location: Line 74 (benchmark job, "Run Perf Asserts" step)

- name: Run Perf Asserts
  working-directory: backend
  env:
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }}  # ADD THIS LINE
  run: |
    echo "## 🔍 Running performance assertions (TestPerf)" >> "$GITHUB_STEP_SUMMARY"
    go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    exit "${PIPESTATUS[0]}"

Summary of Changes:

3 workflow files to modify
5 env sections to update (2 in quality-checks, 2 in benchmark, 1 in codecov-upload)
1 GitHub Secret to create

Pros:

Secrets are encrypted at rest
GitHub redacts the secret value if it is printed to logs
Matches production security model
Consistent across all workflows

Cons:

Requires GitHub repository admin access
Key rotation requires updating secret (but affects all workflows at once)

Option B: Generate Ephemeral Key (ALTERNATIVE)

Security: Generate temporary key for each CI run

Implementation:

Apply this pattern to all 3 workflows. Each workflow generates its own ephemeral key.

Workflow 1: codecov-upload.yml
File: .github/workflows/codecov-upload.yml
Location: Before "Run Go tests with coverage" step (after "Set up Go")

- name: Generate test encryption key
  id: test-key
  run: |
    TEST_KEY=$(openssl rand -base64 32)
    echo "::add-mask::${TEST_KEY}"
    echo "CHARON_ENCRYPTION_KEY=${TEST_KEY}" >> "$GITHUB_ENV"

- name: Run Go tests with coverage
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"

Workflow 2: quality-checks.yml
File: .github/workflows/quality-checks.yml
Location: Before "Run Go tests" step (after "Repo health check")

- name: Generate test encryption key
  id: test-key
  run: |
    TEST_KEY=$(openssl rand -base64 32)
    echo "::add-mask::${TEST_KEY}"
    echo "CHARON_ENCRYPTION_KEY=${TEST_KEY}" >> "$GITHUB_ENV"

- name: Run Go tests
  id: go-tests
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    bash "scripts/go-test-coverage.sh" 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"

# ... later in the same job ...

- name: Run Perf Asserts
  working-directory: backend
  env:
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    {
      echo "## 🔍 Running performance assertions (TestPerf)"
      go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    } >> "$GITHUB_STEP_SUMMARY"
    exit "${PIPESTATUS[0]}"

Workflow 3: benchmark.yml
File: .github/workflows/benchmark.yml
Location: Before "Run Benchmark" step (after "Set up Go")

- name: Generate test encryption key
  id: test-key
  run: |
    TEST_KEY=$(openssl rand -base64 32)
    echo "::add-mask::${TEST_KEY}"
    echo "CHARON_ENCRYPTION_KEY=${TEST_KEY}" >> "$GITHUB_ENV"

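- name: Run Benchmark
  working-directory: backend
  # CHARON_ENCRYPTION_KEY is inherited from $GITHUB_ENV, so no step-level env block is needed
  # (an env key with only a comment under it would be an empty mapping)
  run: go test -bench=. -benchmem -run='^$' ./... | tee output.txt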

# ... later in the same job ...

- name: Run Perf Asserts
  working-directory: backend
  env:
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    echo "## 🔍 Running performance assertions (TestPerf)" >> "$GITHUB_STEP_SUMMARY"
    go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    exit "${PIPESTATUS[0]}"

Pros:

No secrets management needed
Key is ephemeral (discarded after run)
Simpler to implement
Each workflow run gets its own unique key

Cons:

Generates new key on every run (minimal overhead ~0.1s)
Doesn't test key persistence scenarios

Option C: Inline Test Key (NOT RECOMMENDED)

Security: Hardcode a test-only key in workflow

Implementation:

Apply same hardcoded key to all 3 workflows:

- name: Run Go tests with coverage  # or Run Benchmark, or Run Perf Asserts
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    CHARON_ENCRYPTION_KEY: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA="  # Hardcoded test key
  run: |
    bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"

Apply to:

.github/workflows/codecov-upload.yml - Line 53 env block
.github/workflows/quality-checks.yml - Lines 37 and 115 env blocks
.github/workflows/benchmark.yml - Lines 44 and 74 env blocks

Pros:

Simplest to implement (just add one line per env block)
No secrets management
No key generation overhead

Cons:

⚠️ Key visible in workflow file and logs
⚠️ Security audit will flag this
⚠️ Doesn't test real key loading from environment
⚠️ Not recommended for repos with security compliance requirements

Recommendation: Use Option A (GitHub Secrets) for production readiness and security compliance, or Option B (ephemeral key) when simplicity matters more than testing with a persistent key; e2e-tests-split.yml already uses the Option B pattern. Avoid Option C unless this is a demo/test repository.

Phase 2: Database Seeding/Test Setup (ERROR ELIMINATION)

Objective: Fix ProxyHost "record not found" failures

Step 1: Identify Failing Tests

Action: Run tests locally and capture failures

cd backend
go test -v ./... 2>&1 | tee test-output.txt
grep -i "record not found" test-output.txt

Expected Output:

--- FAIL: TestSomeFunction (0.00s)
    service_test.go:123: Error getting proxy host: record not found
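When the output is long, the grep step can be complemented by a small parser that maps each "record not found" line back to the failing test that produced it. The following is a rough sketch under stated assumptions: `recordNotFoundFailures` is a hypothetical helper, it works on plain `go test` text output (with or without `-v`), and it may misattribute lines when tests run in parallel and their output interleaves.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// firstField returns the first whitespace-separated token of s, or "".
func firstField(s string) string {
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// recordNotFoundFailures returns the sorted names of failed tests whose
// output mentions "record not found". It is a text heuristic, not a parser
// for the structured `go test -json` stream.
func recordNotFoundFailures(output string) []string {
	failed := map[string]bool{}    // tests reported via "--- FAIL:"
	mentioned := map[string]bool{} // tests whose output contains the message
	current := ""

	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "=== RUN"):
			current = firstField(strings.TrimPrefix(line, "=== RUN"))
		case strings.HasPrefix(line, "--- FAIL:"):
			current = firstField(strings.TrimPrefix(line, "--- FAIL:"))
			failed[current] = true
		case strings.HasPrefix(line, "--- PASS:"), strings.HasPrefix(line, "--- SKIP:"):
			current = ""
		case current != "" && strings.Contains(line, "record not found"):
			mentioned[current] = true
		}
	}

	var names []string
	for name := range mentioned {
		if failed[name] {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	sample := "--- FAIL: TestSomeFunction (0.00s)\n" +
		"    service_test.go:123: Error getting proxy host: record not found\n"
	fmt.Println(recordNotFoundFailures(sample)) // prints [TestSomeFunction]
}
```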

Step 2: Classify Failures

For each failing test, determine:

Test calls GetByID() without creating record?
- Fix: Add createTestProxyHost() call before GetByID()
Test expects a specific ID (e.g., ID=1)?
- Fix: Store the returned ID from Create() and use it in GetByID()
Test relies on global seed data?
- Fix: Add explicit test data creation in test setup

Step 3: Apply Fixes

Pattern 1: Missing Test Data Creation

Before (Broken):

func TestSomeFunction(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	// Assumes ID=1 exists (WRONG: the in-memory DB starts empty)
	_, err := service.GetByID(1)
	require.NoError(t, err) // fails with "record not found"
}

After (Fixed):

func TestSomeFunction(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	// Create test data first
	testHost := &models.ProxyHost{
		UUID:        "test-uuid",
		DomainNames: "test.example.com",
		ForwardHost: "localhost",
		ForwardPort: 8080,
	}
	require.NoError(t, service.Create(testHost))

	// Now fetch it by the auto-assigned ID
	host, err := service.GetByID(testHost.ID)
	require.NoError(t, err)
	assert.Equal(t, "test.example.com", host.DomainNames)
}

Pattern 2: Expecting Specific Error

Option A: Handle gorm.ErrRecordNotFound

func TestGetByID_NotFound(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	// Test error handling for non-existent ID
	_, err := service.GetByID(999)
	require.Error(t, err)
	assert.ErrorIs(t, err, gorm.ErrRecordNotFound)
}

Option B: Wrap Error in Service (BETTER)

Modify ProxyHostService.GetByID() to return a domain-specific error:

File: backend/internal/services/proxyhost_service.go:192-197

var ErrProxyHostNotFound = errors.New("proxy host not found")

func (s *ProxyHostService) GetByID(id uint) (*models.ProxyHost, error) {
	var host models.ProxyHost
	if err := s.db.Where("id = ?", id).First(&host).Error; err != nil {
		if errors.Is(err, gorm.ErrRecordNotFound) {
			return nil, ErrProxyHostNotFound
		}
		return nil, err
	}
	return &host, nil
}

Then tests become:

func TestGetByID_NotFound(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	_, err := service.GetByID(999)
	require.Error(t, err)
	// Same package as NewProxyHostService, so the sentinel needs no package qualifier
	assert.ErrorIs(t, err, ErrProxyHostNotFound)
}
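One caveat with Option B as written: returning the bare sentinel discards the GORM error, so any existing caller or test that still checks for gorm.ErrRecordNotFound would stop matching. A possible variation keeps both errors in the chain. The sketch below uses only the standard library and rests on assumptions: `errRecordNotFound` stands in for gorm.ErrRecordNotFound, `lookup` and `statusFor` are hypothetical, and the double `%w` form requires Go 1.20 or newer.

```go
package main

import (
	"errors"
	"fmt"
)

// errRecordNotFound stands in for gorm.ErrRecordNotFound in this sketch.
var errRecordNotFound = errors.New("record not found")

// ErrProxyHostNotFound is the domain-specific sentinel from Option B.
var ErrProxyHostNotFound = errors.New("proxy host not found")

// lookup is a hypothetical stand-in for the database query inside GetByID.
func lookup(hosts map[uint]string, id uint) (string, error) {
	name, ok := hosts[id]
	if !ok {
		return "", errRecordNotFound
	}
	return name, nil
}

// getByID translates the storage error into the domain error while keeping
// the original in the chain, so errors.Is matches either sentinel.
func getByID(hosts map[uint]string, id uint) (string, error) {
	name, err := lookup(hosts, id)
	if err != nil {
		if errors.Is(err, errRecordNotFound) {
			return "", fmt.Errorf("%w: %w", ErrProxyHostNotFound, err)
		}
		return "", err
	}
	return name, nil
}

// statusFor shows how a caller, for example an HTTP handler, might branch on
// the domain error without knowing the storage layer's sentinel.
func statusFor(err error) int {
	switch {
	case err == nil:
		return 200
	case errors.Is(err, ErrProxyHostNotFound):
		return 404
	default:
		return 500
	}
}

func main() {
	hosts := map[uint]string{1: "test.example.com"}
	_, err := getByID(hosts, 999)
	fmt.Println(statusFor(err))                     // 404
	fmt.Println(errors.Is(err, errRecordNotFound)) // true
	fmt.Println(err)                               // proxy host not found: record not found
}
```

Whether to keep the storage error visible is a design choice: hiding it decouples callers from GORM, while keeping it avoids breaking checks that already exist.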

Step 4: Add Missing Test Utilities

Create Shared Test Helper:

File: backend/internal/services/testutil/proxyhost_fixtures.go (NEW FILE)

package testutil

import (
	"testing"

	"github.com/Wikid82/charon/backend/internal/models"
	"github.com/google/uuid"
	"github.com/stretchr/testify/require"
	"gorm.io/gorm"
)

// CreateTestProxyHost creates a proxy host with sensible defaults for testing.
func CreateTestProxyHost(t *testing.T, db *gorm.DB, overrides ...func(*models.ProxyHost)) *models.ProxyHost {
	t.Helper()

	host := &models.ProxyHost{
		UUID:          uuid.NewString(),
		Name:          "Test Proxy",
		DomainNames:   "test.example.com",
		ForwardScheme: "http",
		ForwardHost:   "localhost",
		ForwardPort:   8080,
		Enabled:       true,
	}

	// Apply overrides
	for _, override := range overrides {
		override(host)
	}

	require.NoError(t, db.Create(host).Error)
	return host
}

Usage in Tests:

import "github.com/Wikid82/charon/backend/internal/services/testutil"

func TestSomeFunction(t *testing.T) {
	db := setupTestDB(t)
	service := NewProxyHostService(db)

	// Create test data with defaults
	host1 := testutil.CreateTestProxyHost(t, db)

	// Create test data with custom values
	host2 := testutil.CreateTestProxyHost(t, db, func(h *models.ProxyHost) {
		h.Name = "Custom Name"
		h.ForwardPort = 9000
	})

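	// Now use them (every declared variable is read, so the snippet compiles)
	fetched, err := service.GetByID(host1.ID)
	require.NoError(t, err)
	require.Equal(t, host1.Name, fetched.Name)

	custom, err := service.GetByID(host2.ID)
	require.NoError(t, err)
	require.Equal(t, "Custom Name", custom.Name)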
}

Phase 3: Validation

Consolidated Implementation Checklist

Phase 1: Multi-Workflow Environment Variable Fix

Generate or configure secret:
- Option A: Generate key with openssl rand -base64 32, add to GitHub Secrets as CHARON_ENCRYPTION_KEY_TEST
- Option B: Add key generation step to each workflow (ephemeral keys)
- Option C: Use hardcoded test key (not recommended)
Update quality-checks.yml (Priority: CRITICAL):
- File: .github/workflows/quality-checks.yml
- Location 1: Lines 37-45 - Add CHARON_ENCRYPTION_KEY to "Run Go tests" step
- Location 2: Lines 115-124 - Add CHARON_ENCRYPTION_KEY to "Run Perf Asserts" step
- Verification: Both test steps have the env var
Update codecov-upload.yml (Priority: HIGH):
- File: .github/workflows/codecov-upload.yml
- Location: Lines 53-60 - Add CHARON_ENCRYPTION_KEY to "Run Go tests with coverage" step
- Verification: Test step has the env var
Update benchmark.yml (Priority: MEDIUM):
- File: .github/workflows/benchmark.yml
- Location 1: Line 44 - Add CHARON_ENCRYPTION_KEY to "Run Benchmark" step
- Location 2: Line 74 - Add CHARON_ENCRYPTION_KEY to "Run Perf Asserts" step
- Verification: Both test steps have the env var
Total changes: 3 files, 5 env blocks updated

Phase 2: Test Data Setup Fixes

Identify failing tests with "record not found" errors
Fix each test by adding proper test data creation
Add testutil.CreateTestProxyHost() helper if needed
Verify all tests pass locally

Phase 3: Multi-Workflow Validation

Local validation (all tests pass with encryption key set)
Push to feature branch
Monitor all 3 workflow runs in GitHub Actions
Verify each workflow:
- ✅ quality-checks.yml - No warnings, tests pass
- ✅ codecov-upload.yml - No warnings, tests pass, coverage uploaded
- ✅ benchmark.yml - No warnings, benchmarks complete

Phase 3: Validation (Detailed Procedures)

Step 1: Local Validation

Execute Before Pushing:

# 1. Set encryption key locally (matches CI)
export CHARON_ENCRYPTION_KEY=$(openssl rand -base64 32)

# 2. Run backend tests
cd /projects/Charon
.github/skills/scripts/skill-runner.sh test-backend-coverage

# 3. Verify no warnings in output
# Look for: "Warning: RotationService initialization failed"
# Expected: No warnings

# 4. Verify coverage passes
# Expected: "Coverage requirement met"

# 5. Check for test failures
# Expected: All tests pass

Success Criteria:

✅ No "RotationService initialization failed" warnings
✅ No "record not found" errors
✅ Coverage >= 85%
✅ All tests pass

Step 2: CI Validation

Push to Branch and Monitor:

git checkout -b fix/ci-backend-test-failures
git add .github/workflows/codecov-upload.yml
git add .github/workflows/quality-checks.yml
git add .github/workflows/benchmark.yml
git add backend/internal/services/proxyhost_service.go  # If modified
git add backend/internal/services/*_test.go  # Any test fixes
git commit -m "fix(ci): resolve backend test failures across all workflows

- Add CHARON_ENCRYPTION_KEY to quality-checks, codecov-upload, and benchmark workflows
- Fix ProxyHost test data setup in service tests
- Eliminate RotationService initialization warnings

Affected workflows:
- quality-checks.yml (CRITICAL: PR blocker)
- codecov-upload.yml (HIGH: coverage tracking)
- benchmark.yml (MEDIUM: performance regression)

Resolves: backend test job failures across 3 CI workflows"
git push origin fix/ci-backend-test-failures

Monitor All 3 CI Workflows:

Navigate to GitHub Actions → Your PR
Verify these workflow runs appear:
- ✅ Quality Checks (most critical)
- ✅ Upload Coverage to Codecov
- ✅ Go Benchmark (may run later via workflow_run trigger)
For each workflow, verify:
- No unexpected warnings in the output of test execution steps
- Test output shows all tests passing
- No "RotationService initialization failed" messages
- No "record not found" errors
Quality Checks specific checks:
- "Run Go tests" step succeeds
- "Run Perf Asserts" step succeeds
- GORM Security Scanner passes
- Frontend tests pass (unrelated but monitored)
Codecov Upload specific checks:
- Backend tests pass
- Coverage upload succeeds
- Coverage report appears on PR
Benchmark specific checks:
- Benchmarks complete without errors
- Performance assertions pass
- (Note: Results may only store on main branch pushes)

Expected Duration:

quality-checks.yml: ~3-5 minutes
codecov-upload.yml: ~3-5 minutes
benchmark.yml: ~4-6 minutes

Success Criteria - ALL workflows must:

✅ Complete without failures
✅ Show no encryption key warnings
✅ Show no database record errors
✅ Maintain or improve coverage/performance baselines

Dependencies & Risks

Dependencies

Internal:

GitHub repository secrets access (for Option A)
Ability to modify 3 workflow files: .github/workflows/{codecov-upload,quality-checks,benchmark}.yml
Go test environment (local and CI)

External:

Codecov service (for coverage upload)
GitHub Actions runner availability

Risks

Risk	Likelihood	Impact	Mitigation
Tests fail after adding encryption key	Low	Medium	Test locally first with same env var
New test failures introduced by fixes	Medium	Medium	Validate each test fix individually
Coverage drops below 85%	Low	High	Add tests alongside fixes, not after
Codecov upload still fails	Low	High	Verify Codecov token is valid
Breaking other tests by modifying ProxyHostService	Low	High	Only add error wrapping, don't change logic
Missing affected workflows (incomplete fix)	Low	Critical	Verified all workflows via grep search; only 3 run Go tests
Workflow fixes out of sync	Medium	High	Use same env var name (`CHARON_ENCRYPTION_KEY`) across all workflows
Quality checks workflow more critical than codecov	N/A	Critical	Prioritize quality-checks.yml - it blocks PR merges
Benchmark workflow fails silently	Low	Medium	Add same fix proactively even if not currently failing

Multi-Workflow Coordination

Critical Insight: The quality-checks.yml workflow is MORE important than codecov-upload.yml because:

Quality checks run on every PR and block merges
Codecov upload is informational and doesn't block merges
Quality checks includes multiple test types (unit tests + perf tests)

Implementation Priority:

FIRST: Fix quality-checks.yml (most critical - PR blocker)
SECOND: Fix codecov-upload.yml (high priority - coverage tracking)
THIRD: Fix benchmark.yml (proactive - prevent future issues)

Consistency Requirements:

All workflows MUST use the same environment variable name: CHARON_ENCRYPTION_KEY
If using Option A (GitHub Secrets), all workflows MUST reference the same secret: CHARON_ENCRYPTION_KEY_TEST
If using Option B (Ephemeral), all workflows MUST generate keys the same way for consistency

Technical Debt Created

Test Helper Utilities:
- New testutil package should be documented
- Consider creating similar helpers for other models
Error Handling Consistency:
- If wrapping gorm.ErrRecordNotFound, apply same pattern to all services
- Document error handling conventions
Environment Variable Documentation:
- Update docs/development.md with required CI env vars
- Document test key generation process

Stop/Go Rules

Stop Conditions

Phase 1 (Environment Variables):

STOP if: Local tests fail after setting CHARON_ENCRYPTION_KEY
- Action: Investigate why encryption key breaks tests
- Escalate to: Backend service owners

Phase 2 (Test Fixes):

STOP if: More than 5 test files need modifications
- Action: Consider global test fixture/seed instead
- Escalate to: Test infrastructure team
STOP if: Fixing tests requires production code changes beyond error wrapping
- Action: Escalate as potential design issue

Phase 3 (Validation):

STOP if: CI still fails after local validation passes
- Action: Compare CI environment vs. local (Go version, SQLite version, etc.)
- Escalate to: DevOps/CI team

Go Conditions

Phase 1 → Phase 2:

GO if: Tests run with no RotationService warnings
GO if: Coverage remains >= 85%

Phase 2 → Phase 3:

GO if: All identified test failures are fixed
GO if: No new test failures introduced

Phase 3 → Complete:

GO if: CI run passes with all checks green
GO if: Codecov upload succeeds

Success Metrics

Quantitative

RotationService Warnings: 0 occurrences in CI logs
Test Failures: 0 "record not found" errors
Coverage: Maintain >= 85% backend coverage
CI Duration: No increase in test execution time
Test Pass Rate: 100% (all tests pass)

Qualitative

Code Quality: Test fixes follow established patterns
Documentation: Changes are self-explanatory or documented
Maintainability: Future tests can easily create test data
Security: Encryption key handling follows best practices

Timeline Estimate

Phase	Estimated Duration	Confidence
Phase 1: Environment Variable (3 workflows)	45 minutes	High
Phase 2: Test Fixes	1-3 hours	Medium
Phase 3: Validation (3 workflows)	45 minutes	High
Total	2.5-4.5 hours	Medium

Assumptions:

Fewer than 5 tests need fixing
No production code changes required (beyond error wrapping)
CI environment is stable
All 3 workflows can be tested in parallel

Phase 1 Breakdown:

Generate/configure secret: 5 minutes
Update quality-checks.yml (2 env blocks): 15 minutes
Update codecov-upload.yml (1 env block): 10 minutes
Update benchmark.yml (2 env blocks): 10 minutes
Document changes and verify: 5 minutes

Contingency:

If more than 5 tests fail: +2 hours
If production code needs refactoring: +4 hours
If CI environment has additional issues: +1 hour
If workflows have unexpected dependencies: +1 hour

Follow-Up Actions

Immediate (This PR)

✅ Add CHARON_ENCRYPTION_KEY to all 3 affected CI workflows
✅ Fix all identified test failures
✅ Verify CI passes

Short-Term (Next Sprint)

Test Infrastructure Audit:
- Document all required environment variables for tests
- Create standardized test setup utilities (testutil package)
- Add linting rule to catch missing test data setup
Error Handling Standardization:
- Define domain-specific errors for all services (not just ProxyHost)
- Document error handling conventions
- Apply pattern to all *Service.GetByID() methods
CI Environment Documentation:
- Document all GitHub Secrets required for workflows
- Create key rotation procedure
- Add CI environment variable checklist

Long-Term (Future)

Test Fixture Framework:
- Evaluate using testfixtures or similar library
- Create declarative test data setup
- Reduce boilerplate in test files
Integration Testing:
- Separate unit tests (fast, mocked) from integration tests (real DB)
- Use build tags: //go:build integration
- Run integration tests separately in CI
Service Constructor Refactoring:
- Make RotationService initialization explicit
- Allow tests to inject mock RotationService
- Reduce warning messages in test output
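The constructor refactoring could take the shape of plain dependency injection: the service receives its rotation dependency as an argument instead of building one internally and printing a warning on failure. The sketch below is only an illustration of that idea. `Rotator`, `fakeRotator`, and `CredentialStore` are hypothetical names, and the real RotationService method set in Charon may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Rotator is a hypothetical interface covering what a service needs from
// RotationService. It is an assumption, not the actual Charon API.
type Rotator interface {
	Encrypt(plaintext []byte) ([]byte, error)
}

// fakeRotator is a test double that needs no CHARON_ENCRYPTION_KEY.
type fakeRotator struct{}

// Encrypt applies a placeholder transform. It is not real encryption.
func (fakeRotator) Encrypt(p []byte) ([]byte, error) {
	out := make([]byte, len(p))
	for i, b := range p {
		out[i] = b ^ 0xFF
	}
	return out, nil
}

// CredentialStore is a hypothetical service that is handed its Rotator
// explicitly, so construction either succeeds or returns an error.
type CredentialStore struct {
	rot Rotator
}

// NewCredentialStore rejects a missing dependency instead of falling back silently.
func NewCredentialStore(rot Rotator) (*CredentialStore, error) {
	if rot == nil {
		return nil, errors.New("rotator is required")
	}
	return &CredentialStore{rot: rot}, nil
}

// Seal encrypts a secret through the injected Rotator.
func (s *CredentialStore) Seal(secret string) ([]byte, error) {
	return s.rot.Encrypt([]byte(secret))
}

func main() {
	store, err := NewCredentialStore(fakeRotator{})
	if err != nil {
		fmt.Println("construction failed:", err)
		return
	}
	sealed, err := store.Seal("api-token")
	fmt.Println(len(sealed), err)
}
```

With this shape, unit tests pass a fake and never touch the environment, while production wiring decides once, at startup, how a missing key should be handled.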

References

Files Analyzed

CI Configuration:

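.github/workflows/codecov-upload.yml (workflow definition)
.github/workflows/quality-checks.yml (workflow definition)
.github/workflows/benchmark.yml (workflow definition)
.github/workflows/e2e-tests-split.yml (existing encryption key setup)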

Backend Services:

backend/internal/crypto/rotation_service.go (encryption key loading)
backend/internal/services/dns_provider_service.go (RotationService usage)
backend/internal/services/credential_service.go (RotationService usage)
backend/internal/services/proxyhost_service.go (GetByID implementation)

Tests:

backend/internal/crypto/rotation_service_test.go (key setup pattern)
backend/internal/services/dns_provider_service_test.go (test setup)
backend/internal/services/credential_service_test.go (test setup)
backend/internal/services/proxyhost_service_test.go (CRUD test pattern)
backend/internal/api/handlers/proxy_host_handler_update_test.go (test helper)

Documentation:

.env.example (environment variable reference)
ARCHITECTURE.md (encryption key documentation)
docs/guides/dns-providers.md (encryption key usage guide)

Appendix A: Workflow Analysis Details

Analysis Methodology

Search Commands Used:

# Find all workflow files
find .github/workflows -name "*.yml"

# Find workflows running Go tests
grep -r "go test\|go-test-coverage\.sh" .github/workflows/*.yml

# Find workflows with encryption key
grep -r "CHARON_ENCRYPTION_KEY" .github/workflows/*.yml

Results:

39 total workflow files in .github/workflows/
3 workflows run Go unit tests (affected by missing encryption key)
1 workflow (e2e-tests-split.yml) already has encryption key configured
2 workflows (cerberus, crowdsec) run integration tests (not affected)
33 workflows don't run backend tests (not affected)
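The two grep passes above can be combined into one check that flags workflows which run Go tests but never mention the key. The following is a minimal sketch of that logic, assuming a text-level match is good enough: `runsGoTests` and `needsKey` are hypothetical helpers, the sample contents are abbreviated stand-ins for the real files, and a workflow that sets the key on only one of several test steps would still pass this check.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// runsGoTests reports whether the workflow text invokes Go tests directly
// or through scripts/go-test-coverage.sh (the two patterns grepped above).
func runsGoTests(workflow string) bool {
	return strings.Contains(workflow, "go test") ||
		strings.Contains(workflow, "go-test-coverage.sh")
}

// needsKey reports whether a workflow runs Go tests but never mentions
// CHARON_ENCRYPTION_KEY. It matches text only and does not parse YAML.
func needsKey(workflow string) bool {
	return runsGoTests(workflow) && !strings.Contains(workflow, "CHARON_ENCRYPTION_KEY")
}

func main() {
	// Abbreviated stand-ins for file contents, keyed by file name.
	workflows := map[string]string{
		"codecov-upload.yml":       "run: bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt",
		"benchmark.yml":            "run: go test -bench=. -benchmem -run='^$' ./... | tee output.txt",
		"cerberus-integration.yml": "run: scripts/cerberus_integration.sh",
	}

	var flagged []string
	for name, content := range workflows {
		if needsKey(content) {
			flagged = append(flagged, name)
		}
	}
	sort.Strings(flagged) // map iteration order is random, so sort for stable output
	for _, name := range flagged {
		fmt.Println("missing CHARON_ENCRYPTION_KEY:", name)
	}
}
```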

Workflow-by-Workflow Breakdown

1. quality-checks.yml (CRITICAL)

Purpose: PR quality gates that block merges
Trigger: On every pull_request to main/development
Impact: Most critical - blocks PR approvals
Test Commands:

Line 43: bash "scripts/go-test-coverage.sh"
Line 123: go test -run TestPerf -v ./internal/api/handlers

Current Status: ❌ Failing
Fix Required: Add CHARON_ENCRYPTION_KEY to both test steps
Expected Result: PR checks turn green, allowing merges

2. codecov-upload.yml (HIGH PRIORITY)

Purpose: Upload test coverage to Codecov service
Trigger: On pull_request to main/development + workflow_dispatch
Impact: High - coverage tracking and reporting
Test Commands:

Line 58: bash scripts/go-test-coverage.sh

Current Status: ❌ Failing
Fix Required: Add CHARON_ENCRYPTION_KEY to test step
Expected Result: Coverage reports appear on PRs

3. benchmark.yml (MEDIUM PRIORITY)

Purpose: Performance regression detection
Trigger: After docker-build.yml completes + workflow_dispatch
Impact: Medium - catches performance regressions
Test Commands:

Line 44: go test -bench=. -benchmem -run='^$' ./...
Line 74: go test -run TestPerf -v ./internal/api/handlers

Current Status: ⚠️ At risk (may not have failed yet)
Fix Required: Add CHARON_ENCRYPTION_KEY to both test steps (proactive)
Expected Result: Benchmarks run cleanly without warnings
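
One way to check the "without warnings" expectation for the three workflows above is to count occurrences of the fallback warning in captured test output. This is a minimal sketch under stated assumptions: the helper name `count_rotation_warnings` is hypothetical, and the sample log is invented for the demo except for the warning line, which is the message quoted in Root Cause Issues.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_rotation_warnings: reads test output on stdin and prints how many
# lines contain the RotationService fallback warning.
# (Hypothetical helper name, introduced only for this illustration.)
count_rotation_warnings() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c 'RotationService initialization failed' || true
}

# Demo input: only the warning line comes from the plan; the rest is made up.
sample_log='=== RUN   TestExample
Warning: RotationService initialization failed, using basic encryption: CHARON_ENCRYPTION_KEY is required
ok   example/package 0.01s'

printf '%s\n' "$sample_log" | count_rotation_warnings
```

In CI, the same function could be fed from a saved test log, and a non-zero count could be treated as a sign that the key is still missing from that step.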

4. e2e-tests-split.yml (ALREADY FIXED)

Purpose: End-to-end Playwright tests
Trigger: Multiple triggers, runs E2E test shards
Status: ✅ Already configured correctly

Evidence of correct configuration:

# Lines 280, 481, 690, 894, 1098, 1310 - All identical:
- name: Generate test encryption key
  run: echo "CHARON_ENCRYPTION_KEY=$(openssl rand -base64 32)" >> "$GITHUB_ENV"

Why it's correct: Each shard generates its own ephemeral key before running tests, which is the approach described in Option B (Generate Ephemeral Key).
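
The ephemeral-key step can be tried locally before it is copied into another workflow. The sketch below assumes the service expects a base64-encoded 32-byte key, which the `openssl rand -base64 32` pattern suggests but this appendix does not state. The helper names `generate_test_key` and `key_byte_length` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# generate_test_key: prints a throwaway base64 key built from 32 random bytes,
# using the same command as the workflow step above.
generate_test_key() {
  openssl rand -base64 32
}

# key_byte_length KEY: prints how many raw bytes KEY decodes to.
key_byte_length() {
  # -A tells openssl to treat the base64 input as a single line.
  printf '%s' "$1" | openssl base64 -d -A | wc -c | tr -d ' '
}

CHARON_ENCRYPTION_KEY="$(generate_test_key)"
export CHARON_ENCRYPTION_KEY
echo "decoded key length: $(key_byte_length "$CHARON_ENCRYPTION_KEY") bytes"
```

Because the key is regenerated on every run, nothing encrypted in one run can be decrypted in another. That is acceptable for isolated unit tests, but not for any test that depends on persisted encrypted data.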

5. cerberus-integration.yml (NOT AFFECTED)

Purpose: Cerberus security stack integration tests
Test Type: Docker compose with integration scripts
Why not affected: Doesn't run go test - runs scripts/cerberus_integration.sh
Status: ✅ No changes needed

6. crowdsec-integration.yml (NOT AFFECTED)

Purpose: CrowdSec bouncer integration tests
Test Type: Docker compose with integration scripts
Why not affected: Doesn't run go test - runs skill-based integration scripts
Status: ✅ No changes needed

Why Other Workflows Aren't Affected

Workflows without backend tests:

docker-build.yml - Builds images, no test execution
codeql.yml - Security scanning only
supply-chain-*.yml - SBOM and provenance only
release-goreleaser.yml - Release automation
docs.yml - Documentation deployment
repo-health.yml - Repository maintenance
renovate_prune.yml - Dependency management
auto-versioning.yml - Version bumping
caddy-major-monitor.yml - Upstream monitoring
update-geolite2.yml - GeoIP updates
nightly-build.yml - Scheduled builds
propagate-changes.yml - Branch sync
weekly-nightly-promotion.yml - Release promotion
gh_cache_cleanup.yml - Cache maintenance

Key Insight: The CI failures only affect workflows that run go test commands, and specifically those that instantiate services requiring RotationService. Integration test workflows use Docker compose and don't instantiate Go services directly in the CI runner.

Sign-Off

Prepared by: Investigation Agent
Reviewed by: Pending (awaiting supervisor approval)
Approved by: Pending

Next Action: Await approval to proceed with Phase 1 implementation.
