chore: git cache cleanup
This commit is contained in:
@@ -0,0 +1,263 @@
|
||||
# CI Test Failures Remediation Plan
|
||||
|
||||
**Date:** 2026-02-16
|
||||
**Status:** READY FOR IMPLEMENTATION
|
||||
**Encryption Key Issue:** ✅ RESOLVED
|
||||
**Remaining Issue:** 9 Test Failures
|
||||
|
||||
---
|
||||
|
||||
## ✅ CONFIRMED: Encryption Key Issue Resolved
|
||||
|
||||
**Verification:**
|
||||
- No "invalid key length" errors in CI logs
|
||||
- No "CHARON_ENCRYPTION_KEY is required" warnings
|
||||
- Secret format validated: `r2Xfh5PUagXEVG1Qhg9Hq3ELfMdtQZx5gX0kvE23BHQ=` decodes to exactly 32 bytes
|
||||
- Coverage: 85.1% ✅ (meets 85% requirement)
|
||||
|
||||
---
|
||||
|
||||
## 🔴 9 FAILING TESTS TO FIX
|
||||
|
||||
### Test Failure Summary
|
||||
|
||||
```
|
||||
FAILED TEST SUMMARY:
|
||||
--- FAIL: TestMain_DefaultStartupGracefulShutdown_Subprocess (0.55s)
|
||||
--- FAIL: TestGetAcquisitionConfig (0.00s)
|
||||
--- FAIL: TestEnsureBouncerRegistration_ConcurrentCalls (0.01s)
|
||||
--- FAIL: TestPluginHandler_ReloadPlugins_WithErrors (0.00s)
|
||||
--- FAIL: TestFetchIndexFallbackHTTP (0.00s)
|
||||
--- FAIL: TestRunScheduledBackup_CleanupFails (0.01s)
|
||||
--- FAIL: TestSecurityService_LogAudit_ChannelFullFallsBackToSyncWrite (0.03s)
|
||||
--- FAIL: TestCredentialService_Delete (0.01s)
|
||||
--- FAIL: TestCredentialService_GetCredentialForDomain_WildcardMatch (0.01s)
|
||||
```
|
||||
|
||||
### Context
|
||||
|
||||
These failures appear to be **pre-existing issues** unrelated to the encryption key work:
|
||||
- User previously mentioned "we need to fix the other failing tests" during CI wait
|
||||
- Backend Dev reported fixing some of these tests locally but they're still failing in CI
|
||||
- This suggests environment differences between local and CI runners
|
||||
|
||||
---
|
||||
|
||||
## 📋 REMEDIATION STRATEGY
|
||||
|
||||
### Phase 1: Local Reproduction (Priority: HIGH)
|
||||
|
||||
**Goal:** Reproduce each failure locally to understand root causes.
|
||||
|
||||
**Actions:**
|
||||
1. Run each failing test individually with verbose output
|
||||
2. Document exact failure messages and stack traces
|
||||
3. Identify patterns (timeout, race condition, environment dependency, etc.)
|
||||
4. Check if failures are deterministic or flaky
|
||||
|
||||
**Commands:**
|
||||
```bash
|
||||
# Run individual failing tests with verbose output
|
||||
cd backend
|
||||
|
||||
# Test 1
|
||||
go test -v -run TestMain_DefaultStartupGracefulShutdown_Subprocess ./cmd/main_test.go
|
||||
|
||||
# Test 2
|
||||
go test -v -run TestGetAcquisitionConfig ./internal/handlers/crowdsec_handler_test.go
|
||||
|
||||
# Test 3
|
||||
go test -v -run TestEnsureBouncerRegistration_ConcurrentCalls ./internal/services/crowdsec_service_test.go
|
||||
|
||||
# Test 4
|
||||
go test -v -run TestPluginHandler_ReloadPlugins_WithErrors ./internal/api/handlers/plugin_handler_test.go
|
||||
|
||||
# Test 5
|
||||
go test -v -run TestFetchIndexFallbackHTTP ./internal/crowdsec/preset_hub_test.go
|
||||
|
||||
# Test 6
|
||||
go test -v -run TestRunScheduledBackup_CleanupFails ./internal/services/backup_service_test.go
|
||||
|
||||
# Test 7
|
||||
go test -v -run TestSecurityService_LogAudit_ChannelFullFallsBackToSyncWrite ./internal/services/security_service_test.go
|
||||
|
||||
# Test 8
|
||||
go test -v -run TestCredentialService_Delete ./internal/services/credential_service_test.go
|
||||
|
||||
# Test 9
|
||||
go test -v -run TestCredentialService_GetCredentialForDomain_WildcardMatch ./internal/services/credential_service_test.go
|
||||
```
|
||||
|
||||
### Phase 2: Root Cause Analysis (Priority: HIGH)
|
||||
|
||||
**Expected Failure Patterns:**
|
||||
|
||||
1. **Subprocess/Concurrency Tests** (3 tests):
|
||||
- TestMain_DefaultStartupGracefulShutdown_Subprocess
|
||||
- TestEnsureBouncerRegistration_ConcurrentCalls
|
||||
- TestSecurityService_LogAudit_ChannelFullFallsBackToSyncWrite
|
||||
- **Hypothesis:** Race conditions or timing issues in CI environment
|
||||
- **Check:** CI has different CPU/timing characteristics than local
|
||||
|
||||
2. **File System Tests** (2 tests):
|
||||
- TestRunScheduledBackup_CleanupFails
|
||||
- TestFetchIndexFallbackHTTP
|
||||
- **Hypothesis:** Permission issues or missing temp directories in CI
|
||||
- **Check:** Temp directory creation/cleanup
|
||||
|
||||
3. **Database/Service Tests** (2 tests):
|
||||
- TestCredentialService_Delete
|
||||
- TestCredentialService_GetCredentialForDomain_WildcardMatch
|
||||
- **Hypothesis:** Now that encryption is working, tests may be revealing actual bugs
|
||||
- **Check:** Encryption/decryption logic with valid keys
|
||||
|
||||
4. **Handler Tests** (2 tests):
|
||||
- TestGetAcquisitionConfig
|
||||
- TestPluginHandler_ReloadPlugins_WithErrors
|
||||
- **Hypothesis:** Mock expectations not matching actual behavior
|
||||
- **Check:** Mock setup vs reality
|
||||
|
||||
### Phase 3: Fix Implementation (Priority: HIGH)
|
||||
|
||||
**Fix Patterns by Category:**
|
||||
|
||||
#### Pattern A: Race Condition Fixes
|
||||
```go
|
||||
// Add proper synchronization
|
||||
var mu sync.Mutex
|
||||
mu.Lock()
|
||||
defer mu.Unlock()
|
||||
|
||||
// Or increase timeouts for CI
|
||||
timeout := 1 * time.Second
|
||||
if os.Getenv("CI") == "true" {
|
||||
timeout = 5 * time.Second
|
||||
}
|
||||
```
|
||||
|
||||
#### Pattern B: Temp Directory Fixes
|
||||
```go
|
||||
// Ensure proper cleanup
|
||||
t.Cleanup(func() {
|
||||
os.RemoveAll(tempDir)
|
||||
})
|
||||
|
||||
// Check directory exists before operations
|
||||
if _, err := os.Stat(dir); os.IsNotExist(err) {
|
||||
t.Skip("Directory not accessible in CI")
|
||||
}
|
||||
```
|
||||
|
||||
#### Pattern C: Timing/Async Fixes
|
||||
```go
|
||||
// Use Eventually pattern for async operations
|
||||
require.Eventually(t, func() bool {
|
||||
// Check condition
|
||||
return result == expected
|
||||
}, 5*time.Second, 100*time.Millisecond)
|
||||
```
|
||||
|
||||
#### Pattern D: Database/Encryption Fixes
|
||||
```go
|
||||
// Ensure encryption key is set in test
|
||||
t.Setenv("CHARON_ENCRYPTION_KEY", "your-test-key")
|
||||
|
||||
// Verify service initialization
|
||||
require.NoError(t, err, "Service should initialize with valid key")
|
||||
```
|
||||
|
||||
### Phase 4: Validation (Priority: CRITICAL)
|
||||
|
||||
**Local Validation:**
|
||||
```bash
|
||||
# Run all tests multiple times to catch flaky tests
|
||||
for i in {1..5}; do
|
||||
echo "Run $i"
|
||||
go test -v -run 'TestMain_DefaultStartupGracefulShutdown_Subprocess|TestGetAcquisitionConfig|TestEnsureBouncerRegistration_ConcurrentCalls|TestPluginHandler_ReloadPlugins_WithErrors|TestFetchIndexFallbackHTTP|TestRunScheduledBackup_CleanupFails|TestSecurityService_LogAudit_ChannelFullFallsBackToSyncWrite|TestCredentialService_Delete|TestCredentialService_GetCredentialForDomain_WildcardMatch' ./...
|
||||
done
|
||||
```
|
||||
|
||||
**CI Validation:**
|
||||
1. Push fixes to feature branch
|
||||
2. Monitor CI workflow runs
|
||||
3. Verify all 9 tests pass in CI
|
||||
4. Verify coverage remains >=85%
|
||||
|
||||
---
|
||||
|
||||
## 🎯 DECISION POINT
|
||||
|
||||
**BEFORE proceeding with fixes, we need to decide:**
|
||||
|
||||
**Option A: Fix Now (Blocking PR #666)**
|
||||
- Investigate and fix all 9 tests
|
||||
- Estimated time: 2-4 hours
|
||||
- Blocks returning to original coverage goal
|
||||
- **Pros:** Clean slate, no test debt
|
||||
- **Cons:** Delays coverage work further
|
||||
|
||||
**Option B: Skip for Now (Unblock Coverage Work)**
|
||||
- Mark failing tests with `t.Skip()` and TODO comments
|
||||
- Return to original coverage goal
|
||||
- Fix tests in separate PR after coverage work complete
|
||||
- **Pros:** Unblocks progress on original goal
|
||||
- **Cons:** Accumulates technical debt
|
||||
|
||||
**Option C: Parallel Work**
|
||||
- Backend Dev fixes tests while other work continues
|
||||
- **Pros:** Maximizes throughput
|
||||
- **Cons:** Coordination overhead
|
||||
|
||||
**RECOMMENDATION:** Need user input on priority:
|
||||
- If coverage patch is time-sensitive → Option B
|
||||
- If CI stability is critical → Option A
|
||||
- If team has bandwidth → Option C
|
||||
|
||||
---
|
||||
|
||||
## 📊 RISK ASSESSMENT
|
||||
|
||||
**Continuing with failing tests:**
|
||||
- ❌ PR #666 cannot merge (CI failures blocking)
|
||||
- ❌ Coverage validation workflow unreliable
|
||||
- ❌ May mask new test failures
|
||||
- ⚠️ Could indicate actual bugs in production code
|
||||
|
||||
**Fixing tests first:**
|
||||
- ✅ Clean CI before coverage expansion
|
||||
- ✅ Higher confidence in code quality
|
||||
- ✅ Easier to isolate coverage-related issues
|
||||
- ⏳ Delays return to original coverage objective
|
||||
|
||||
---
|
||||
|
||||
## 👤 USER ACTION REQUIRED
|
||||
|
||||
**Please decide:**
|
||||
|
||||
1. **Fix all 9 tests NOW before coverage work?** (Option A)
|
||||
2. **Skip/TODO tests to unblock coverage goal?** (Option B)
|
||||
3. **Parallel: Some agent fixes tests while coverage work proceeds?** (Option C)
|
||||
|
||||
**Also confirm:**
|
||||
- Are these tests known to be flaky or is this the first failure?
|
||||
- Were these tests passing before the encryption key changes?
|
||||
- Are there any other blocking issues to address first?
|
||||
|
||||
---
|
||||
|
||||
## 📝 NOTES
|
||||
|
||||
- Encryption key validation verified locally: `echo "r2Xfh5PUagXEVG1Qhg9Hq3ELfMdtQZx5gX0kvE23BHQ=" | base64 -d | wc -c` → 32 bytes ✅
|
||||
- CI logs show no RotationService warnings ✅
|
||||
- Coverage at 85.1% meets requirement ✅
|
||||
- Original conversation goal was: "The final part to a green CI is the codecov patch. We're missing 578 lines of coverage"
|
||||
|
||||
---
|
||||
|
||||
## 🔄 NEXT STEPS (PENDING USER DECISION)
|
||||
|
||||
1. User decides on Option A, B, or C
|
||||
2. If Option A: Begin Phase 1 reproduction
|
||||
3. If Option B: Skip tests with TODO comments, proceed to coverage
|
||||
4. If Option C: Split work: Backend Dev fixes tests, Planning starts coverage plan
|
||||
Reference in New Issue
Block a user