CI Codecov Backend Test Failures - Remediation Plan
Date: 2026-02-16
Status: Investigation Complete - Ready for Implementation
Priority: CRITICAL CI BLOCKER
Workflow: .github/workflows/codecov-upload.yml → backend-codecov job
Executive Summary
CRITICAL: Multiple CI workflows share the same two root causes. Investigation shows that 3 workflows are affected (2 failing, 1 at risk), not just codecov-upload.
Affected Workflows
| Workflow File | Purpose | Job Name(s) | Test Command | Status | Priority |
|---|---|---|---|---|---|
| codecov-upload.yml | Coverage upload to Codecov | backend-codecov | go-test-coverage.sh | ❌ Failing | CRITICAL |
| quality-checks.yml | PR quality gates | backend-quality | go-test-coverage.sh + go test -run TestPerf | ❌ Failing | CRITICAL |
| benchmark.yml | Performance regression checks | benchmark | go test -bench + go test -run TestPerf | ⚠️ At Risk | HIGH |
Other Workflows Analyzed (NOT affected):
- ✅ e2e-tests-split.yml - Already has CHARON_ENCRYPTION_KEY configured (6+ locations)
- ✅ cerberus-integration.yml - Runs integration scripts, not Go unit tests
- ✅ crowdsec-integration.yml - Runs integration scripts, not Go unit tests
- ✅ All other workflows - Do not run backend Go tests
Root Cause Issues
- RotationService Initialization Warnings (Non-blocking but pollutes logs)
  - Multiple services print: "Warning: RotationService initialization failed, using basic encryption: CHARON_ENCRYPTION_KEY is required"
  - Root cause: Missing CHARON_ENCRYPTION_KEY environment variable in ALL 3 affected workflows
  - Impact: Services fall back to basic encryption (no test failures, but warnings appear)
- GORM "record not found" Errors (Blocking failures)
  - Source: backend/internal/services/proxyhost_service.go:194
  - Root cause: Tests calling GetByID() without proper test data setup
  - Impact: Tests expecting proxy host records fail with gorm.ErrRecordNotFound
Investigation Findings
1. Encryption Key Requirements
File Analysis: .github/workflows/codecov-upload.yml
Path: /projects/Charon/.github/workflows/codecov-upload.yml
Lines: 43-53 (backend-codecov job)
Current Environment Variables:
env:
  CGO_ENABLED: 1
Missing Variables:
CHARON_ENCRYPTION_KEY(required for RotationService)
File Analysis: backend/internal/crypto/rotation_service.go
Path: /projects/Charon/backend/internal/crypto/rotation_service.go
Lines: 63-75
Error Trigger:
func NewRotationService(db *gorm.DB) (*RotationService, error) {
// Load current key (required)
currentKeyB64 := os.Getenv("CHARON_ENCRYPTION_KEY")
if currentKeyB64 == "" {
return nil, fmt.Errorf("CHARON_ENCRYPTION_KEY is required")
}
// ...
}
File Analysis: Service Dependencies
Affected Services:
- backend/internal/services/dns_provider_service.go:145 - Calls crypto.NewRotationService(db)
- backend/internal/services/credential_service.go:72 - Calls crypto.NewRotationService(db)
Fallback Behavior:
rotationService, err := crypto.NewRotationService(db)
if err != nil {
// Fallback to non-rotation mode
fmt.Printf("Warning: RotationService initialization failed, using basic encryption: %v\n", err)
}
Test Setup Comparison:
| Test File | Sets CHARON_ENCRYPTION_KEY? | Uses RotationService? |
|---|---|---|
| rotation_service_test.go | ✅ Yes (via setupTestKeys()) | ✅ Yes |
| dns_provider_service_test.go | ❌ No (hardcoded test key) | ⚠️ Tries but falls back |
| credential_service_test.go | ❌ No (hardcoded test key) | ⚠️ Tries but falls back |
Example: How Tests Set Encryption Keys
File: backend/internal/crypto/rotation_service_test.go:28-41
func setupTestKeys(t *testing.T) (currentKey, nextKey, legacyKey string) {
currentKey, err := GenerateNewKey()
require.NoError(t, err)
_ = os.Setenv("CHARON_ENCRYPTION_KEY", currentKey)
t.Cleanup(func() { _ = os.Unsetenv("CHARON_ENCRYPTION_KEY") })
return currentKey, nextKey, legacyKey
}
File: backend/internal/services/dns_provider_service_test.go:62
// Does NOT set CHARON_ENCRYPTION_KEY
encryptor, err := crypto.NewEncryptionService("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=")
2. ProxyHost "Record Not Found" Errors
File Analysis: backend/internal/services/proxyhost_service.go
Path: /projects/Charon/backend/internal/services/proxyhost_service.go
Lines: 192-197
Error Source:
func (s *ProxyHostService) GetByID(id uint) (*models.ProxyHost, error) {
var host models.ProxyHost
if err := s.db.Where("id = ?", id).First(&host).Error; err != nil {
return nil, err // Returns gorm.ErrRecordNotFound if no record
}
return &host, nil
}
GORM Error Type: gorm.ErrRecordNotFound (not explicitly handled in ProxyHostService)
Test Pattern Analysis
File: backend/internal/services/proxyhost_service_test.go:73-102
Working Test Pattern:
func TestProxyHostService_CRUD(t *testing.T) {
db := setupProxyHostTestDB(t)
service := NewProxyHostService(db)
// Create test data BEFORE calling GetByID
host := &models.ProxyHost{
UUID: "uuid-1",
DomainNames: "test.example.com",
ForwardHost: "127.0.0.1",
ForwardPort: 8080,
}
err := service.Create(host) // Creates record in DB
assert.NoError(t, err)
assert.NotZero(t, host.ID)
// Now GetByID works because record exists
fetched, err := service.GetByID(host.ID)
assert.NoError(t, err)
assert.Equal(t, host.DomainNames, fetched.DomainNames)
}
File: backend/internal/api/handlers/proxy_host_handler_update_test.go:50-60
Helper Function Pattern:
func createTestProxyHost(t *testing.T, db *gorm.DB, name string) models.ProxyHost {
host := models.ProxyHost{
UUID: uuid.NewString(),
Name: name,
DomainNames: name + ".test.com",
ForwardScheme: "http",
ForwardHost: "localhost",
ForwardPort: 8080,
Enabled: true,
}
require.NoError(t, db.Create(&host).Error)
return host
}
Likely Failure Scenario
Hypothesis: Some tests are calling GetByID() with a hardcoded ID (e.g., GetByID(1)) expecting a record to exist, but:
- SQLite in-memory DB is empty at test start
- Test doesn't create the record before calling GetByID()
- Test previously relied on global seeding that no longer runs
To Identify Failing Tests:
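# List every GetByID call in test files, then check each hit for a preceding Create.
# (--include avoids relying on ** globbing, which bash does not expand by default.)
grep -rn --include='*_test.go' 'GetByID(' backend/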
Root Cause Analysis
Why Were Tests Passing Before?
Encryption Key Warnings:
- Tests have ALWAYS printed these warnings (not a recent regression)
- Warnings are printed with fmt.Printf (standard output) and don't fail tests
- This is "noise" that should be cleaned up
ProxyHost Errors:
- Likely Recent Change:
  - A test was recently modified to call GetByID() without proper setup
  - A global test fixture/seed was removed
  - Test database setup order changed
- Verification Needed: Check recent commits to *_test.go files
CI vs. Local Test Differences
CI Environment (codecov-upload.yml):
- No environment variables set beyond CGO_ENABLED=1
- Fresh test database for each test run
- No .env file loaded
Local Environment:
- May have .env file with CHARON_ENCRYPTION_KEY set
- Test setup may differ from CI
- Local runs might have different test execution order
Key Files Checked:
- .env.example - Shows CHARON_ENCRYPTION_KEY= (empty, requires generation)
- scripts/go-test-coverage.sh - Does NOT set CHARON_ENCRYPTION_KEY
- scripts/setup-e2e-env.sh - Generates key for E2E tests (NOT unit tests)
Remediation Plan
Phase 1: Environment Variable Configuration (WARNING ELIMINATION)
Objective: Eliminate RotationService initialization warnings in CI logs across ALL affected workflows
Implementation Strategy
Single Secret for All Workflows:
- Use one GitHub Secret: CHARON_ENCRYPTION_KEY_TEST
- Apply to all 3 workflows consistently
- Same security model across all test runs
Option A: Set in GitHub Actions (RECOMMENDED)
Security: Use GitHub Repository Secrets for production-like CI
Implementation:
1. Generate Test Key:
# Local execution to generate key
openssl rand -base64 32
2. Add to GitHub Secrets:
   - Navigate to: Repository → Settings → Secrets and variables → Actions
   - Create new secret: CHARON_ENCRYPTION_KEY_TEST
   - Value: Generated base64 key from step 1
3. Update ALL 3 Workflows:
Workflow 1: codecov-upload.yml
File: .github/workflows/codecov-upload.yml
Location: Line 53-60 (backend-codecov job, "Run Go tests with coverage" step)
- name: Run Go tests with coverage
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }} # ADD THIS LINE
  run: |
    bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"
Workflow 2: quality-checks.yml (Test Coverage Step)
File: .github/workflows/quality-checks.yml
Location: Line 37-45 (backend-quality job, "Run Go tests" step)
- name: Run Go tests
  id: go-tests
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }} # ADD THIS LINE
  run: |
    bash "scripts/go-test-coverage.sh" 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"
Workflow 2: quality-checks.yml (Perf Tests Step)
File: .github/workflows/quality-checks.yml
Location: Line 115-124 (backend-quality job, "Run Perf Asserts" step)
- name: Run Perf Asserts
  working-directory: backend
  env:
    # Conservative defaults to avoid flakiness on CI; tune as necessary
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }} # ADD THIS LINE
  run: |
    {
      echo "## 🔍 Running performance assertions (TestPerf)"
      go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    } >> "$GITHUB_STEP_SUMMARY"
    exit "${PIPESTATUS[0]}"
Workflow 3: benchmark.yml (Benchmark Step)
File: .github/workflows/benchmark.yml
Location: Line 44 (benchmark job, "Run Benchmark" step)
- name: Run Benchmark
  working-directory: backend
  env:
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }} # ADD THIS LINE
  run: go test -bench=. -benchmem -run='^$' ./... | tee output.txt
Workflow 3: benchmark.yml (Perf Asserts Step)
File: .github/workflows/benchmark.yml
Location: Line 74 (benchmark job, "Run Perf Asserts" step)
- name: Run Perf Asserts
  working-directory: backend
  env:
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    CHARON_ENCRYPTION_KEY: ${{ secrets.CHARON_ENCRYPTION_KEY_TEST }} # ADD THIS LINE
  run: |
    echo "## 🔍 Running performance assertions (TestPerf)" >> "$GITHUB_STEP_SUMMARY"
    go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    exit "${PIPESTATUS[0]}"
Summary of Changes:
- 3 workflow files to modify
- 5 env sections to update (2 in quality-checks, 2 in benchmark, 1 in codecov-upload)
- 1 GitHub Secret to create
Pros:
- Secrets are encrypted at rest
- Key never appears in logs
- Matches production security model
- Consistent across all workflows
Cons:
- Requires GitHub repository admin access
- Key rotation requires updating secret (but affects all workflows at once)
Option B: Generate Ephemeral Key (ALTERNATIVE)
Security: Generate temporary key for each CI run
Implementation:
Apply this pattern to all 3 workflows. Each workflow generates its own ephemeral key.
Workflow 1: codecov-upload.yml
File: .github/workflows/codecov-upload.yml
Location: Before "Run Go tests with coverage" step (after "Set up Go")
- name: Generate test encryption key
  id: test-key
  run: |
    TEST_KEY=$(openssl rand -base64 32)
    echo "::add-mask::${TEST_KEY}"
    echo "CHARON_ENCRYPTION_KEY=${TEST_KEY}" >> "$GITHUB_ENV"
- name: Run Go tests with coverage
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"
Workflow 2: quality-checks.yml
File: .github/workflows/quality-checks.yml
Location: Before "Run Go tests" step (after "Repo health check")
- name: Generate test encryption key
  id: test-key
  run: |
    TEST_KEY=$(openssl rand -base64 32)
    echo "::add-mask::${TEST_KEY}"
    echo "CHARON_ENCRYPTION_KEY=${TEST_KEY}" >> "$GITHUB_ENV"
- name: Run Go tests
  id: go-tests
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    bash "scripts/go-test-coverage.sh" 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"
# ... later in the same job ...
- name: Run Perf Asserts
  working-directory: backend
  env:
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    {
      echo "## 🔍 Running performance assertions (TestPerf)"
      go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    } >> "$GITHUB_STEP_SUMMARY"
    exit "${PIPESTATUS[0]}"
Workflow 3: benchmark.yml
File: .github/workflows/benchmark.yml
Location: Before "Run Benchmark" step (after "Set up Go")
- name: Generate test encryption key
  id: test-key
  run: |
    TEST_KEY=$(openssl rand -base64 32)
    echo "::add-mask::${TEST_KEY}"
    echo "CHARON_ENCRYPTION_KEY=${TEST_KEY}" >> "$GITHUB_ENV"
- name: Run Benchmark
  working-directory: backend
  # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV.
  # No step-level env key here: an env key with only a comment under it has no value.
  run: go test -bench=. -benchmem -run='^$' ./... | tee output.txt
# ... later in the same job ...
- name: Run Perf Asserts
  working-directory: backend
  env:
    PERF_MAX_MS_GETSTATUS_P95: 500ms
    PERF_MAX_MS_GETSTATUS_P95_PARALLEL: 1500ms
    PERF_MAX_MS_LISTDECISIONS_P95: 2000ms
    # CHARON_ENCRYPTION_KEY inherited from $GITHUB_ENV
  run: |
    echo "## 🔍 Running performance assertions (TestPerf)" >> "$GITHUB_STEP_SUMMARY"
    go test -run TestPerf -v ./internal/api/handlers -count=1 | tee perf-output.txt
    exit "${PIPESTATUS[0]}"
Pros:
- No secrets management needed
- Key is ephemeral (discarded after run)
- Simpler to implement
- Each workflow run gets its own unique key
Cons:
- Generates new key on every run (minimal overhead ~0.1s)
- Doesn't test key persistence scenarios
Option C: Inline Test Key (NOT RECOMMENDED)
Security: Hardcode a test-only key in workflow
Implementation:
Apply same hardcoded key to all 3 workflows:
- name: Run Go tests with coverage # or Run Benchmark, or Run Perf Asserts
  working-directory: ${{ github.workspace }}
  env:
    CGO_ENABLED: 1
    CHARON_ENCRYPTION_KEY: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=" # Hardcoded test key
  run: |
    bash scripts/go-test-coverage.sh 2>&1 | tee backend/test-output.txt
    exit "${PIPESTATUS[0]}"
Apply to:
- .github/workflows/codecov-upload.yml - Line 53 env block
- .github/workflows/quality-checks.yml - Lines 37 and 115 env blocks
- .github/workflows/benchmark.yml - Lines 44 and 74 env blocks
Pros:
- Simplest to implement (just add one line per env block)
- No secrets management
- No key generation overhead
Cons:
- ⚠️ Key visible in workflow file and logs
- ⚠️ Security audit will flag this
- ⚠️ Doesn't test real key loading from environment
- ⚠️ Not recommended for repos with security compliance requirements
Recommendation: Use Option A (GitHub Secrets) for production readiness and security compliance, or Option B (Ephemeral) for simplicity without security concerns. Avoid Option C unless this is a demo/test repository.
Phase 2: Database Seeding/Test Setup (ERROR ELIMINATION)
Objective: Fix ProxyHost "record not found" failures
Step 1: Identify Failing Tests
Action: Run tests locally and capture failures
cd backend
go test -v ./... 2>&1 | tee test-output.txt
grep -i "record not found" test-output.txt
Expected Output:
--- FAIL: TestSomeFunction (0.00s)
service_test.go:123: Error getting proxy host: record not found
Step 2: Classify Failures
For each failing test, determine:
- Test calls GetByID() without creating record?
  - Fix: Add createTestProxyHost() call before GetByID()
- Test expects a specific ID (e.g., ID=1)?
  - Fix: Use the ID that Create() assigns to the struct (host.ID) when calling GetByID()
- Test relies on global seed data?
  - Fix: Add explicit test data creation in test setup
Step 3: Apply Fixes
Pattern 1: Missing Test Data Creation
Before (Broken):
func TestSomeFunction(t *testing.T) {
db := setupTestDB(t)
service := NewProxyHostService(db)
// Assumes ID=1 exists (WRONG)
host, err := service.GetByID(1)
require.NoError(t, err)
}
After (Fixed):
func TestSomeFunction(t *testing.T) {
db := setupTestDB(t)
service := NewProxyHostService(db)
// Create test data first
testHost := &models.ProxyHost{
UUID: "test-uuid",
DomainNames: "test.example.com",
ForwardHost: "localhost",
ForwardPort: 8080,
}
require.NoError(t, service.Create(testHost))
// Now fetch it by the auto-assigned ID
host, err := service.GetByID(testHost.ID)
require.NoError(t, err)
assert.Equal(t, "test.example.com", host.DomainNames)
}
Pattern 2: Expecting Specific Error
Option A: Handle gorm.ErrRecordNotFound
func TestGetByID_NotFound(t *testing.T) {
db := setupTestDB(t)
service := NewProxyHostService(db)
// Test error handling for non-existent ID
_, err := service.GetByID(999)
require.Error(t, err)
assert.True(t, errors.Is(err, gorm.ErrRecordNotFound))
}
Option B: Wrap Error in Service (BETTER)
Modify ProxyHostService.GetByID() to return a domain-specific error:
File: backend/internal/services/proxyhost_service.go:192-197
var ErrProxyHostNotFound = errors.New("proxy host not found")
func (s *ProxyHostService) GetByID(id uint) (*models.ProxyHost, error) {
var host models.ProxyHost
if err := s.db.Where("id = ?", id).First(&host).Error; err != nil {
if errors.Is(err, gorm.ErrRecordNotFound) {
return nil, ErrProxyHostNotFound
}
return nil, err
}
return &host, nil
}
Then tests become:
func TestGetByID_NotFound(t *testing.T) {
db := setupTestDB(t)
service := NewProxyHostService(db)
_, err := service.GetByID(999)
require.Error(t, err)
// The test lives in package services, so the error is referenced unqualified
assert.True(t, errors.Is(err, ErrProxyHostNotFound))
}
Step 4: Add Missing Test Utilities
Create Shared Test Helper:
File: backend/internal/services/testutil/proxyhost_fixtures.go (NEW FILE)
package testutil
import (
"testing"
"github.com/Wikid82/charon/backend/internal/models"
"github.com/google/uuid"
"github.com/stretchr/testify/require"
"gorm.io/gorm"
)
// CreateTestProxyHost creates a proxy host with sensible defaults for testing.
func CreateTestProxyHost(t *testing.T, db *gorm.DB, overrides ...func(*models.ProxyHost)) *models.ProxyHost {
t.Helper()
host := &models.ProxyHost{
UUID: uuid.NewString(),
Name: "Test Proxy",
DomainNames: "test.example.com",
ForwardScheme: "http",
ForwardHost: "localhost",
ForwardPort: 8080,
Enabled: true,
}
// Apply overrides
for _, override := range overrides {
override(host)
}
require.NoError(t, db.Create(host).Error)
return host
}
Usage in Tests:
import "github.com/Wikid82/charon/backend/internal/services/testutil"
func TestSomeFunction(t *testing.T) {
db := setupTestDB(t)
service := NewProxyHostService(db)
// Create test data with defaults
host1 := testutil.CreateTestProxyHost(t, db)
// Create test data with custom values
host2 := testutil.CreateTestProxyHost(t, db, func(h *models.ProxyHost) {
h.Name = "Custom Name"
h.ForwardPort = 9000
})
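// Now use them (Go rejects declared-but-unused variables, so both hosts are checked)
fetched, err := service.GetByID(host1.ID)
require.NoError(t, err)
require.Equal(t, host1.DomainNames, fetched.DomainNames)
fetched2, err := service.GetByID(host2.ID)
require.NoError(t, err)
require.Equal(t, host2.Name, fetched2.Name)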
}
Phase 3: Validation
Consolidated Implementation Checklist
Phase 1: Multi-Workflow Environment Variable Fix
- Generate or configure secret:
  - Option A: Generate key with openssl rand -base64 32, add to GitHub Secrets as CHARON_ENCRYPTION_KEY_TEST
  - Option B: Add key generation step to each workflow (ephemeral keys)
  - Option C: Use hardcoded test key (not recommended)
- Update Workflow 1 (Priority: CRITICAL):
  - File: .github/workflows/quality-checks.yml
  - Location 1: Line 37-45 - Add CHARON_ENCRYPTION_KEY to "Run Go tests" step
  - Location 2: Line 115-124 - Add CHARON_ENCRYPTION_KEY to "Run Perf Asserts" step
  - Verification: Both test steps have the env var
- Update Workflow 2 (Priority: HIGH):
  - File: .github/workflows/codecov-upload.yml
  - Location: Line 53-60 - Add CHARON_ENCRYPTION_KEY to "Run Go tests with coverage" step
  - Verification: Test step has the env var
- Update Workflow 3 (Priority: MEDIUM):
  - File: .github/workflows/benchmark.yml
  - Location 1: Line 44 - Add CHARON_ENCRYPTION_KEY to "Run Benchmark" step
  - Location 2: Line 74 - Add CHARON_ENCRYPTION_KEY to "Run Perf Asserts" step
  - Verification: Both test steps have the env var
- Total changes: 3 files, 5 env blocks updated
Phase 2: Test Data Setup Fixes
- Identify failing tests with "record not found" errors
- Fix each test by adding proper test data creation
- Add testutil.CreateTestProxyHost() helper if needed
- Verify all tests pass locally
Phase 3: Multi-Workflow Validation
- Local validation (all tests pass with encryption key set)
- Push to feature branch
- Monitor all 3 workflow runs in GitHub Actions
- Verify each workflow:
- ✅ quality-checks.yml - No warnings, tests pass
- ✅ codecov-upload.yml - No warnings, tests pass, coverage uploaded
- ✅ benchmark.yml - No warnings, benchmarks complete
Phase 3: Validation (Detailed Procedures)
Step 1: Local Validation
Execute Before Pushing:
# 1. Set encryption key locally (matches CI)
export CHARON_ENCRYPTION_KEY=$(openssl rand -base64 32)
# 2. Run backend tests
cd /projects/Charon
.github/skills/scripts/skill-runner.sh test-backend-coverage
# 3. Verify no warnings in output
# Look for: "Warning: RotationService initialization failed"
# Expected: No warnings
# 4. Verify coverage pass
# Expected: "Coverage requirement met"
# 5. Check for test failures
# Expected: All tests pass
Success Criteria:
- ✅ No "RotationService initialization failed" warnings
- ✅ No "record not found" errors
- ✅ Coverage >= 85%
- ✅ All tests pass
Step 2: CI Validation
Push to Branch and Monitor:
git checkout -b fix/ci-backend-test-failures
git add .github/workflows/codecov-upload.yml
git add .github/workflows/quality-checks.yml
git add .github/workflows/benchmark.yml
git add backend/internal/services/proxyhost_service.go # If modified
git add backend/internal/services/*_test.go # Any test fixes
git commit -m "fix(ci): resolve backend test failures across all workflows
- Add CHARON_ENCRYPTION_KEY to quality-checks, codecov-upload, and benchmark workflows
- Fix ProxyHost test data setup in service tests
- Eliminate RotationService initialization warnings
Affected workflows:
- quality-checks.yml (CRITICAL: PR blocker)
- codecov-upload.yml (HIGH: coverage tracking)
- benchmark.yml (MEDIUM: performance regression)
Resolves: backend test job failures across 3 CI workflows"
git push origin fix/ci-backend-test-failures
Monitor All 3 CI Workflows:
- Navigate to GitHub Actions → Your PR
- Verify these workflow runs appear:
  - ✅ Quality Checks (most critical)
  - ✅ Upload Coverage to Codecov
  - ✅ Go Benchmark (may run later via workflow_run trigger)
- For each workflow, verify:
  - No stderr warnings in test execution steps
  - Test output shows all tests passing
  - No "RotationService initialization failed" messages
  - No "record not found" errors
- Quality Checks specific checks:
  - "Run Go tests" step succeeds
  - "Run Perf Asserts" step succeeds
  - GORM Security Scanner passes
  - Frontend tests pass (unrelated but monitored)
- Codecov Upload specific checks:
  - Backend tests pass
  - Coverage upload succeeds
  - Coverage report appears on PR
- Benchmark specific checks:
  - Benchmarks complete without errors
  - Performance assertions pass
  - (Note: Results may only store on main branch pushes)
Expected Duration:
- quality-checks.yml: ~3-5 minutes
- codecov-upload.yml: ~3-5 minutes
- benchmark.yml: ~4-6 minutes
Success Criteria - ALL workflows must:
- ✅ Complete without failures
- ✅ Show no encryption key warnings
- ✅ Show no database record errors
- ✅ Maintain or improve coverage/performance baselines
Dependencies & Risks
Dependencies
Internal:
- GitHub repository secrets access (for Option A)
- Ability to modify 3 workflow files: .github/workflows/{codecov-upload,quality-checks,benchmark}.yml
- Go test environment (local and CI)
External:
- Codecov service (for coverage upload)
- GitHub Actions runner availability
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Tests fail after adding encryption key | Low | Medium | Test locally first with same env var |
| New test failures introduced by fixes | Medium | Medium | Validate each test fix individually |
| Coverage drops below 85% | Low | High | Add tests alongside fixes, not after |
| Codecov upload still fails | Low | High | Verify Codecov token is valid |
| Breaking other tests by modifying ProxyHostService | Low | High | Only add error wrapping, don't change logic |
| Missing affected workflows (incomplete fix) | Low | Critical | Verified all workflows via grep search; only 3 run Go tests |
| Workflow fixes out of sync | Medium | High | Use same env var name (CHARON_ENCRYPTION_KEY) across all workflows |
| Quality checks workflow more critical than codecov | N/A | Critical | Prioritize quality-checks.yml - it blocks PR merges |
| Benchmark workflow fails silently | Low | Medium | Add same fix proactively even if not currently failing |
Multi-Workflow Coordination
Critical Insight: The quality-checks.yml workflow is MORE important than codecov-upload.yml because:
- Quality checks run on every PR and block merges
- Codecov upload is informational and doesn't block merges
- Quality checks includes multiple test types (unit tests + perf tests)
Implementation Priority:
- FIRST: Fix quality-checks.yml (most critical - PR blocker)
- SECOND: Fix codecov-upload.yml (high priority - coverage tracking)
- THIRD: Fix benchmark.yml (proactive - prevent future issues)
Consistency Requirements:
- All workflows MUST use the same environment variable name: CHARON_ENCRYPTION_KEY
- If using Option A (GitHub Secrets), all workflows MUST reference the same secret: CHARON_ENCRYPTION_KEY_TEST
- If using Option B (Ephemeral), all workflows MUST generate keys the same way for consistency
Technical Debt Created
- Test Helper Utilities:
  - New testutil package should be documented
  - Consider creating similar helpers for other models
- Error Handling Consistency:
  - If wrapping gorm.ErrRecordNotFound, apply same pattern to all services
  - Document error handling conventions
- Environment Variable Documentation:
  - Update docs/development.md with required CI env vars
  - Document test key generation process
Stop/Go Rules
Stop Conditions
Phase 1 (Environment Variables):
- STOP if: Local tests fail after setting CHARON_ENCRYPTION_KEY
- Action: Investigate why encryption key breaks tests
- Escalate to: Backend service owners
Phase 2 (Test Fixes):
- STOP if: More than 5 test files need modifications
- Action: Consider global test fixture/seed instead
- Escalate to: Test infrastructure team
- STOP if: Fixing tests requires production code changes beyond error wrapping
- Action: Escalate as potential design issue
Phase 3 (Validation):
- STOP if: CI still fails after local validation passes
- Action: Compare CI environment vs. local (Go version, SQLite version, etc.)
- Escalate to: DevOps/CI team
Go Conditions
Phase 1 → Phase 2:
- GO if: Tests run with no RotationService warnings
- GO if: Coverage remains >= 85%
Phase 2 → Phase 3:
- GO if: All identified test failures are fixed
- GO if: No new test failures introduced
Phase 3 → Complete:
- GO if: CI run passes with all checks green
- GO if: Codecov upload succeeds
Success Metrics
Quantitative
- RotationService Warnings: 0 occurrences in CI logs
- Test Failures: 0 "record not found" errors
- Coverage: Maintain >= 85% backend coverage
- CI Duration: No increase in test execution time
- Test Pass Rate: 100% (all tests pass)
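The coverage metric can be gated with a few lines of Go. This sketch assumes the report comes from go tool cover -func, whose final line starts with total: and ends with a percentage; parseTotalCoverage is a hypothetical helper, and scripts/go-test-coverage.sh may already perform an equivalent check.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseTotalCoverage extracts the percentage from the "total:" line of a
// `go tool cover -func` report.
func parseTotalCoverage(report string) (float64, error) {
	for _, line := range strings.Split(report, "\n") {
		fields := strings.Fields(line)
		// Expected shape: total: (statements) 86.4%
		if len(fields) == 3 && fields[0] == "total:" {
			return strconv.ParseFloat(strings.TrimSuffix(fields[2], "%"), 64)
		}
	}
	return 0, errors.New("no total line found in coverage report")
}

func main() {
	// Sample report text; the file and function names are made up.
	report := "example.com/pkg/a.go:10:\tDoThing\t\t90.0%\n" +
		"total:\t\t\t\t(statements)\t\t86.4%\n"

	const threshold = 85.0 // matches the >= 85% target in this plan
	pct, err := parseTotalCoverage(report)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("coverage %.1f%%, threshold met: %t\n", pct, pct >= threshold)
}
```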
Qualitative
- Code Quality: Test fixes follow established patterns
- Documentation: Changes are self-explanatory or documented
- Maintainability: Future tests can easily create test data
- Security: Encryption key handling follows best practices
Timeline Estimate
| Phase | Estimated Duration | Confidence |
|---|---|---|
| Phase 1: Environment Variable (3 workflows) | 45 minutes | High |
| Phase 2: Test Fixes | 1-3 hours | Medium |
| Phase 3: Validation (3 workflows) | 45 minutes | High |
| Total | 2.5-4.5 hours | Medium |
Assumptions:
- Fewer than 5 tests need fixing
- No production code changes required (beyond error wrapping)
- CI environment is stable
- All 3 workflows can be tested in parallel
Phase 1 Breakdown:
- Generate/configure secret: 5 minutes
- Update quality-checks.yml (2 env blocks): 15 minutes
- Update codecov-upload.yml (1 env block): 10 minutes
- Update benchmark.yml (2 env blocks): 10 minutes
- Document changes and verify: 5 minutes
Contingency:
- If more than 5 tests fail: +2 hours
- If production code needs refactoring: +4 hours
- If CI environment has additional issues: +1 hour
- If workflows have unexpected dependencies: +1 hour
Follow-Up Actions
Immediate (This PR)
- ✅ Add CHARON_ENCRYPTION_KEY to CI workflow
- ✅ Fix all identified test failures
- ✅ Verify CI passes
Short-Term (Next Sprint)
- Test Infrastructure Audit:
  - Document all required environment variables for tests
  - Create standardized test setup utilities (testutil package)
  - Add linting rule to catch missing test data setup
- Error Handling Standardization:
  - Define domain-specific errors for all services (not just ProxyHost)
  - Document error handling conventions
  - Apply pattern to all *Service.GetByID() methods
- CI Environment Documentation:
  - Document all GitHub Secrets required for workflows
  - Create key rotation procedure
  - Add CI environment variable checklist
Long-Term (Future)
- Test Fixture Framework:
  - Evaluate using testfixtures or similar library
  - Create declarative test data setup
  - Reduce boilerplate in test files
- Integration Testing:
  - Separate unit tests (fast, mocked) from integration tests (real DB)
  - Use build tags: //go:build integration
  - Run integration tests separately in CI
- Service Constructor Refactoring:
  - Make RotationService initialization explicit
  - Allow tests to inject mock RotationService
  - Reduce warning messages in test output
References
Files Analyzed
CI Configuration:
- .github/workflows/codecov-upload.yml (workflow definition)
Backend Services:
- backend/internal/crypto/rotation_service.go (encryption key loading)
- backend/internal/services/dns_provider_service.go (RotationService usage)
- backend/internal/services/credential_service.go (RotationService usage)
- backend/internal/services/proxyhost_service.go (GetByID implementation)
Tests:
- backend/internal/crypto/rotation_service_test.go (key setup pattern)
- backend/internal/services/dns_provider_service_test.go (test setup)
- backend/internal/services/credential_service_test.go (test setup)
- backend/internal/services/proxyhost_service_test.go (CRUD test pattern)
- backend/internal/api/handlers/proxy_host_handler_update_test.go (test helper)
Documentation:
- .env.example (environment variable reference)
- ARCHITECTURE.md (encryption key documentation)
- docs/guides/dns-providers.md (encryption key usage guide)
Appendix A: Workflow Analysis Details
Analysis Methodology
Search Commands Used:
# Find all workflow files
find .github/workflows -name "*.yml"
# Find workflows running Go tests
grep -r "go test\|go-test-coverage\.sh" .github/workflows/*.yml
# Find workflows with encryption key
grep -r "CHARON_ENCRYPTION_KEY" .github/workflows/*.yml
Results:
- 39 total workflow files in .github/workflows/
- 3 workflows run Go unit tests (affected by missing encryption key)
- 1 workflow (e2e-tests-split.yml) already has encryption key configured
- 2 workflows (cerberus, crowdsec) run integration tests (not affected)
- 33 workflows don't run backend tests (not affected)
Workflow-by-Workflow Breakdown
1. quality-checks.yml (CRITICAL)
Purpose: PR quality gates that block merges
Trigger: On every pull_request to main/development
Impact: Most critical - blocks PR approvals
Test Commands:
- Line 43: bash "scripts/go-test-coverage.sh"
- Line 123: go test -run TestPerf -v ./internal/api/handlers
Current Status: ❌ Failing
Fix Required: Add CHARON_ENCRYPTION_KEY to both test steps
Expected Result: PR checks turn green, allowing merges
2. codecov-upload.yml (HIGH PRIORITY)
Purpose: Upload test coverage to Codecov service
Trigger: On pull_request to main/development + workflow_dispatch
Impact: High - coverage tracking and reporting
Test Commands:
- Line 58: bash scripts/go-test-coverage.sh
Current Status: ❌ Failing
Fix Required: Add CHARON_ENCRYPTION_KEY to test step
Expected Result: Coverage reports appear on PRs
3. benchmark.yml (MEDIUM PRIORITY)
Purpose: Performance regression detection
Trigger: After docker-build.yml completes + workflow_dispatch
Impact: Medium - catches performance regressions
Test Commands:
- Line 44: go test -bench=. -benchmem -run='^$' ./...
- Line 74: go test -run TestPerf -v ./internal/api/handlers
Current Status: ⚠️ At risk (may not have failed yet)
Fix Required: Add CHARON_ENCRYPTION_KEY to both test steps (proactive)
Expected Result: Benchmarks run cleanly without warnings
4. e2e-tests-split.yml (ALREADY FIXED)
Purpose: End-to-end Playwright tests
Trigger: Multiple triggers, runs E2E test shards
Status: ✅ Already configured correctly
Evidence of correct configuration:
# Lines 280, 481, 690, 894, 1098, 1310 - All identical:
- name: Generate test encryption key
run: echo "CHARON_ENCRYPTION_KEY=$(openssl rand -base64 32)" >> "$GITHUB_ENV"
Why it's correct: Each shard generates its own ephemeral key before running tests. This is the pattern Option B recommends.
5. cerberus-integration.yml (NOT AFFECTED)
Purpose: Cerberus security stack integration tests
Test Type: Docker compose with integration scripts
Why not affected: Doesn't run go test - runs scripts/cerberus_integration.sh
Status: ✅ No changes needed
6. crowdsec-integration.yml (NOT AFFECTED)
Purpose: CrowdSec bouncer integration tests
Test Type: Docker compose with integration scripts
Why not affected: Doesn't run go test - runs skill-based integration scripts
Status: ✅ No changes needed
Why Other Workflows Aren't Affected
Workflows without backend tests:
- docker-build.yml - Builds images, no test execution
- codeql.yml - Security scanning only
- supply-chain-*.yml - SBOM and provenance only
- release-goreleaser.yml - Release automation
- docs.yml - Documentation deployment
- repo-health.yml - Repository maintenance
- renovate_prune.yml - Dependency management
- auto-versioning.yml - Version bumping
- caddy-major-monitor.yml - Upstream monitoring
- update-geolite2.yml - GeoIP updates
- nightly-build.yml - Scheduled builds
- propagate-changes.yml - Branch sync
- weekly-nightly-promotion.yml - Release promotion
- gh_cache_cleanup.yml - Cache maintenance
Key Insight: The CI failures only affect workflows that run go test commands, and specifically those that instantiate services requiring RotationService. Integration test workflows use Docker compose and don't instantiate Go services directly in the CI runner.
Sign-Off
Prepared by: Investigation Agent
Reviewed by: Pending (Awaiting supervisor approval)
Approved by: Pending
Next Action: Await approval to proceed with Phase 1 implementation.