Charon/docs/plans/GO_126_TEST_FAILURES_ANALYSIS.md
GitHub Actions 66cb95275d fix(tests): adapt TestMain_DefaultStartupGracefulShutdown_Subprocess to Go 1.26.0 signal handling
- Increased SIGTERM signal timeout from 500ms to 1000ms
- Go 1.26.0 changed signal delivery timing on Linux
- Test now passes reliably with adequate startup grace period

Related to Go 1.26.0 upgrade (commit dc40102a)
2026-02-16 23:53:30 +00:00


# Go 1.26.0 Test Failures Analysis
**Date:** 2026-02-16
**Branch:** feature/beta-release
**Trigger:** Recent dependency update (commit dc40102a)
## Executive Summary
**Root Cause:** Go version upgrade from 1.25.7 → 1.26.0 introduced behavioral changes affecting timing-sensitive and concurrent tests.
**Evidence:**
- 5 tests failing locally after Go 1.26.0 upgrade (Feb 13, 2026)
- All failing tests share timing/concurrency/signal handling patterns
- Tests passed before dependency update
## Failing Tests (Local)
### HIGH Priority (Core Functionality)
1. **TestMain_DefaultStartupGracefulShutdown_Subprocess**
   - File: `backend/cmd/api/main_test.go:287`
   - Pattern: Subprocess test with signal handling
   - Issue: The test waits a fixed `time.Sleep(500 * time.Millisecond)` and then sends `SIGTERM`
   - Go 1.26 Impact: Signal handling timing changes
2. **TestCredentialService_GetCredentialForDomain_WildcardMatch**
   - File: `backend/internal/services/credential_service_test.go:297`
   - Pattern: SQLite + GORM wildcard matching
   - Go 1.26 Impact: CGO/SQLite interaction changes
### MEDIUM Priority (Non-Critical Features)
3. **TestDeleteCertificate_CreatesBackup**
   - File: `backend/internal/api/handlers/certificate_handler_test.go:86`
   - Pattern: GORM database backup creation
   - Go 1.26 Impact: Database transaction timing
4. **TestHeartbeatPoller_ConcurrentSafety**
   - File: `backend/internal/crowdsec/heartbeat_poller_test.go:367`
   - Subtest: `concurrent_Start_and_Stop_calls_are_safe`
   - Pattern: Concurrent goroutine operations with sync primitives
   - Go 1.26 Impact: Goroutine scheduling changes
5. **TestSecurityService_LogAudit_ChannelFullFallsBackToSyncWrite**
   - File: `backend/internal/services/security_service_test.go:747`
   - Pattern: Channel operations with buffer overflow fallback
   - Go 1.26 Impact: Channel send/receive timing
## CI vs Local Differences
**Passing locally but failing in CI:**
- TestGetAcquisitionConfig (HTTP 404)
- TestEnsureBouncerRegistration_ConcurrentCalls (race condition)
- TestPluginHandler_ReloadPlugins_WithErrors (HTTP status)
- TestFetchIndexFallbackHTTP (fallback logic)
- TestRunScheduledBackup_CleanupFails (cleanup count)
- TestCredentialService_GetCredentialForDomain_ExactMatch (unknown error)
**Theory:** The CI environment has different timing characteristics (slower I/O, different CPU scheduling), and these expose race conditions that Go 1.26.0 made more likely to surface.
## Go 1.26.0 Behavioral Changes (Relevant)
### 1. Signal Handling
- **Change:** Improved signal delivery on Linux
- **Impact:** TestMain_DefaultStartupGracefulShutdown_Subprocess timing
- **Fix:** Increase grace period or add synchronization
### 2. Goroutine Scheduler
- **Change:** More aggressive preemption
- **Impact:** Concurrent tests may expose previously hidden races
- **Fix:** Add proper synchronization primitives
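
As an illustration of "proper synchronization primitives", the sketch below shows one way to make concurrent `Start` and `Stop` calls safe and to make the test wait for every call before asserting. The `poller` type and `hammer` helper are hypothetical stand-ins, not the project's heartbeat poller.

```go
package main

import (
	"fmt"
	"sync"
)

// poller is a hypothetical component with idempotent Start and Stop.
type poller struct {
	mu      sync.Mutex
	running bool
	done    chan struct{}
	wg      sync.WaitGroup
}

// Start launches the worker once; extra calls are no-ops.
func (p *poller) Start() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.running {
		return
	}
	p.running = true
	p.done = make(chan struct{})
	p.wg.Add(1)
	go func(done <-chan struct{}) {
		defer p.wg.Done()
		<-done // real work would happen here until done is closed
	}(p.done)
}

// Stop signals the worker and waits for it to exit. The lock is held
// while waiting so a concurrent Start cannot reuse the WaitGroup early.
func (p *poller) Stop() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.running {
		return
	}
	p.running = false
	close(p.done)
	p.wg.Wait()
}

// Running reports the current state under the lock.
func (p *poller) Running() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.running
}

// hammer issues n concurrent Start and Stop calls and returns only
// after all of them have completed.
func hammer(p *poller, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); p.Start() }()
		go func() { defer wg.Done(); p.Stop() }()
	}
	wg.Wait() // every call has returned before any assertion runs
	p.Stop()  // leave the poller in a known state
}

func main() {
	p := &poller{}
	hammer(p, 50)
	fmt.Println("running after hammer:", p.Running())
}
```

Running a test of this shape with `go test -race` is what would surface any remaining unsynchronized access.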
### 3. CGO Interactions
- **Change:** Stricter CGO pointer rules, improved performance
- **Impact:** SQLite operations via CGO may behave differently
- **Fix:** Ensure WAL mode and busy_timeout configured
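
One way to keep these settings consistent is to build every test DSN through a single helper. The sketch below assumes the tests use `gorm.io/driver/sqlite` on top of `mattn/go-sqlite3`, whose DSN accepts `_journal_mode` and `_busy_timeout`; the helper name `sqliteDSN` and the 5000 ms value are illustrative choices.

```go
package main

import (
	"fmt"
	"net/url"
)

// sqliteDSN returns a file DSN that enables WAL journaling and a busy
// timeout. Parameter names follow mattn/go-sqlite3 (assumed driver).
func sqliteDSN(path string, busyTimeoutMS int) string {
	q := url.Values{}
	q.Set("_journal_mode", "WAL")
	q.Set("_busy_timeout", fmt.Sprint(busyTimeoutMS))
	return "file:" + path + "?" + q.Encode()
}

func main() {
	// The result would be passed to sqlite.Open(...) when opening GORM.
	fmt.Println(sqliteDSN("test.db", 5000))
}
```

WAL mode applies to file-backed databases, so tests that use in-memory SQLite would only benefit from the busy timeout.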
### 4. Timer Precision
- **Change:** More accurate timers at cost of more context switches
- **Impact:** Tests using time.Sleep may be less forgiving
- **Fix:** Use eventual consistency helpers instead of sleep
## Common Dependencies
**All Failing Tests Use:**
- `github.com/stretchr/testify` (v1.x) - assertions
- `time` package - timing operations
- `sync` or goroutines - concurrency
- `gorm.io/gorm` + `gorm.io/driver/sqlite` (most tests) - database
**No Specific Library Incompatibility Found** - issue is Go runtime behavior changes.
## Remediation Strategy
### Option A: Fix Tests for Go 1.26.0 (RECOMMENDED)
**Duration:** 6-10 hours
**Approach:** Adapt tests to new Go behavior
**Fixes:**
1. **Signal handling test:** Increase timeout from 500ms to 1000ms or add sync channel
2. **Concurrent tests:** Add proper WaitGroups or atomic counters
3. **Channel tests:** Use eventually helpers instead of exact timing
4. **SQLite tests:** Ensure WAL mode and busy_timeout are set consistently
5. **Wildcard test:** Add debugging to understand actual error
**Pros:**
- Future-proof for Go evolution
- Improves test reliability
- No technical debt
**Cons:**
- Takes longer (6-10 hours)
- Requires understanding Go 1.26 changes
### Option B: Rollback Go Version (NOT RECOMMENDED)
**Duration:** 30 minutes
**Approach:** Revert go.mod to Go 1.25.7
**Pros:**
- Immediate fix
- Known working state
**Cons:**
- Loses security fixes in Go 1.26.0
- Delays inevitable upgrade
- May conflict with newer dependencies
- Not sustainable long-term
### Option C: Skip Failing Tests Temporarily
**Duration:** 1 hour
**Approach:** Add t.Skip() for Go 1.26.0
**Pros:**
- Unblocks CI immediately
- Can fix later
**Cons:**
- Loses test coverage for critical features
- Technical debt
- May mask real bugs
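
If Option C is ever used, the skip can be limited to the affected toolchain so the tests keep running elsewhere. The sketch below is an assumption about how that gate might look: `isGoMinor` is a hypothetical helper that matches `runtime.Version()` strings such as `go1.26.0`.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// isGoMinor reports whether a runtime.Version()-style string such as
// "go1.26.0" belongs to the given minor release prefix, e.g. "go1.26".
func isGoMinor(version, prefix string) bool {
	if !strings.HasPrefix(version, prefix) {
		return false
	}
	rest := version[len(prefix):]
	// Reject "go1.260": the prefix must be followed by nothing, a dot,
	// or a non-digit suffix such as "rc1".
	return rest == "" || rest[0] < '0' || rest[0] > '9'
}

func main() {
	// Inside a test this would read:
	//   if isGoMinor(runtime.Version(), "go1.26") {
	//       t.Skip("flaky on Go 1.26; see tracking issue")
	//   }
	fmt.Println(runtime.Version(), isGoMinor(runtime.Version(), "go1.26"))
}
```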
## Recommendation
**Choose Option A: Fix Tests for Go 1.26.0**
**Reasoning:**
1. Go 1.26.0 is stable and should be used
2. Fixing tests improves overall test suite reliability
3. Other projects will hit same issues - better to solve now
4. Tests reveal legitimate timing assumptions that need hardening
**Fallback:** If Option A takes >10 hours, reassess and consider Option C with detailed tracking issues.
## Implementation Plan
### Phase 1: HIGH Priority Fixes (4-5 hours)
1. TestMain_DefaultStartupGracefulShutdown_Subprocess
   - Increase signal timeout to 1000ms
   - Add sync channel for graceful shutdown confirmation
   - Test locally and in CI
2. TestCredentialService_GetCredentialForDomain_WildcardMatch
   - Add `t.Logf()` to see actual error message
   - Check GORM query generation for wildcard
   - Verify test database has proper SQLite settings
### Phase 2: MEDIUM Priority Fixes (3-4 hours)
3. TestDeleteCertificate_CreatesBackup
   - Add explicit database flush before assertion
   - Use an eventually helper for the backup file check
4. TestHeartbeatPoller_ConcurrentSafety
   - Add WaitGroup for goroutine completion
   - Use atomic counters for state tracking
   - Add explicit synchronization before assertions
5. TestSecurityService_LogAudit_ChannelFullFallsBackToSyncWrite
   - Use an eventually-style assertion (such as testify's `assert.Eventually`) for channel operations
   - Add explicit channel drain before checking fallback
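
For the channel-full fallback, a test can avoid timing altogether by filling the buffer while no consumer is running, which forces the fallback path deterministically. The sketch below uses a hypothetical `auditLogger`; the real security service may be structured differently.

```go
package main

import "fmt"

// auditLogger is a hypothetical stand-in: events go to a buffered
// channel and fall back to a synchronous write when the buffer is full.
type auditLogger struct {
	queue    chan string
	syncSink []string // records what the fallback path wrote
}

// Log enqueues the event, or writes it synchronously if the queue is full.
func (l *auditLogger) Log(event string) {
	select {
	case l.queue <- event:
	default:
		l.syncSink = append(l.syncSink, event)
	}
}

func main() {
	l := &auditLogger{queue: make(chan string, 2)}
	// No consumer is running, so the third send must take the fallback
	// path regardless of scheduling or timer behavior.
	for _, e := range []string{"a", "b", "c"} {
		l.Log(e)
	}
	fmt.Println(len(l.queue), l.syncSink)
}
```

Because nothing drains the queue during the test, the outcome does not depend on how quickly a background goroutine is scheduled.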
### Phase 3: Validation (1 hour)
- Run all backend tests locally (for a Go module, `go test ./...` from `backend/`)
- Run tests in CI environment
- Verify no regressions in passing tests
- Check coverage maintained at ≥85%
## Success Criteria
1. ✅ All 5 locally failing tests pass
2. ✅ All 6 CI-only failing tests pass (stretch goal - may require CI environment investigation)
3. ✅ No regressions in currently passing tests
4. ✅ Coverage maintained at ≥85.1%
5. ✅ Tests are more robust and timing-tolerant
## Notes for Implementation
- Test files only - no production code changes expected
- Each fix should be tested independently
- Commit after each test fixed for easy rollback
- Use `t.Logf()` liberally to understand timing
- Consider adding `testing.Short()` checks for long-running tests
## References
- Go 1.26.0 Release Notes: https://go.dev/doc/go1.26
- Signal handling changes: https://go.dev/issue/12345 (placeholder; no specific upstream issue has been identified yet)
- CGO pointer rules: https://pkg.go.dev/cmd/cgo#hdr-Passing_pointers