Phase 2.3: Critical Fixes Remediation Plan
Status: Planning - Ready for Execution Created: 2026-02-09 Target Completion: 2026-02-09 (2-3 hours parallel execution) Dependencies: Phase 2.2 discovery complete, Phase 3 E2E security testing blocked until completion
1. Executive Summary
Pre-Execution Validation Checklist
Before proceeding to Phase 2.3a, verify all prerequisites:
- All developers assigned and available
- Database in clean state (fresh container)
- Git workspace clean (no uncommitted changes)
- Code review owners assigned
- Approval authority (Tech Lead) available for sign-off
- Backend Docker build environment ready
- Frontend test environment ready (Node.js, Playwright)
- Auth endpoint verified exists (2.3c pre-check)
If any items unchecked: Resolve before proceeding to Phase 2.3a
Overview
Phase 2.3 addresses three critical blocking issues identified during Phase 2.2 discovery that prevent progression to Phase 3 E2E security testing:
| Issue | Severity | Component | Fix Effort | Blocker? |
|---|---|---|---|---|
| CVE-2024-45337 - golang.org/x/crypto/ssh authorization bypass | CRITICAL | Backend Dependencies | 1 hour | YES - Production blocker |
| InviteUser Email Blocking - Synchronous SMTP blocks HTTP response | HIGH | Backend (user_handler.go) | 2-3 hours | YES - Test suite blocker |
| Test Auth Token Refresh - E2E tests fail with 401 after 30+ min | MEDIUM | Frontend (Playwright fixtures) | 0.5-1 hour | YES - Test execution blocker |
Critical Path & Timeline
Sequential Timeline: 4-5 hours Parallel Timeline: 2-3 hours (recommended) Phase 3 Start Eligible: After ALL three phases complete
Interdependency Analysis:
- ✅ 2.3a and 2.3b are independent (different code areas)
- ✅ 2.3a and 2.3c are independent (different languages/layers)
- ✅ 2.3b and 2.3c are independent (can run in parallel)
- ✅ All three can run simultaneously with different developers
Phase 3 Blocking Dependencies
| Phase | Blocker Type | Consequence if Delayed |
|---|---|---|
| 2.3a | Security compliance | Cannot deploy to production (CVE vulnerability) |
| 2.3b | Functional requirement | User management test suite fails/timeouts |
| 2.3c | Test infrastructure | Phase 3 tests will fail with 401 errors after 30 min |
Decision: All three MUST complete before Phase 3 approval.
2. Phase 2.3a: Dependency Security Update (1 hour)
Priority: 🔴 CRITICAL Owner: Backend Developer Can Run in Parallel: Yes (with 2.3b and 2.3c) Start Time: Immediately Target Completion: 1 hour
Objective
Update golang.org/x/crypto and related dependencies to patch CVE-2024-45337 (SSH authorization bypass), then verify with container security scan.
Root Cause
CVE Details:
- CVE-2024-45337 - golang.org/x/crypto/ssh authorization bypass
- Affected versions: Before v0.31.0
- Risk: Attackers can bypass authorization checks via SSH protocol manipulation
- Impact: If Charon exposes SSH management → complete auth bypass
Current Status
# Current go.mod references:
go list -m all | grep -E 'golang.org/x/(crypto|net|oauth2)|github.com/quic-go'
# Expected output: Old versions (v0.27.0, v0.28.x, v0.x.x)
Steps
Step 1: Update Dependencies (15 min)
File: backend/go.mod
Command: Execute from /projects/Charon/
cd backend
# Update golang.org/x/crypto to latest
go get -u golang.org/x/crypto
# Update related security packages
go get -u golang.org/x/net
go get -u golang.org/x/oauth2
# Update WebRTC/QUIC dependencies (may depend on crypto)
go get -u github.com/quic-go/quic-go
# Cleanup and verify integrity
go mod tidy
go mod verify
Expected Changes:
- golang.org/x/crypto → v0.31.0 or later
- golang.org/x/net → latest (v0.33.0+)
- golang.org/x/oauth2 → latest
- github.com/quic-go/quic-go → latest compatible
Verification:
# Should show updated versions
go list -m all | grep -E 'golang.org/x/(crypto|net|oauth2)|github.com/quic-go'
# Should complete without errors
go mod verify
Step 2: Build & Test Backend (15 min)
Ensure backend compiles with new dependencies:
# Test compilation (without running)
go build -v ./...
# Run backend unit tests
go test -short ./...
# Should complete in <5 min with no errors
Expected Result: Build succeeds, tests pass, no deprecation warnings related to crypto APIs.
Step 3: Rebuild Docker Image (15 min)
File: Dockerfile
Command: Execute from /projects/Charon/
# Clean build (no cache) to ensure new go.mod is used
docker build \
--no-cache \
-t charon:local \
-f Dockerfile \
.
# Expected output:
# ✓ Building backend stage (uses new go.mod)
# ✓ Running `go mod verify`
# ✓ Building binary
# ✓ Final image layers
# Successfully built IMAGE_ID
# Successfully tagged charon:local
Timing: 5-7 minutes for full build
Step 4: Container Security Scan (15 min)
Tool: Trivy (vulnerability scanner)
Command: Execute from /projects/Charon/
# Scan the local image for vulnerabilities
trivy image \
--severity CRITICAL,HIGH \
--exit-code 0 \
--timeout=30m \
charon:local
# Save results to file for review
trivy image \
--format json \
--severity CRITICAL,HIGH \
charon:local > /tmp/trivy-charon-local.json
Expected Output:
charon:local (alpine 3.19)
=======================
Total: 0 vulnerabilities (CRITICAL: 0, HIGH: 0)
Scanned at: 2026-02-09T14:30:00Z
Database updated at: 2026-02-09T14:00:00Z
If vulnerabilities remain:
- ❌ CVE-2024-45337 still present → dependency update failed
- ❌ New vulnerabilities discovered → investigate and update
- → Document in troubleshooting section
- → Debug the dependency chain with: go mod graph | grep crypto
Step 5: Smoke Test Core Functionality (10 min)
Endpoint: POST /api/v1/auth/login
Data: Use default test credentials
# Start or ensure container is running
docker run -d \
--name charon-test \
-p 8080:8080 \
-e CHARON_DB_PATH=/data/charon.db \
charon:local
# Wait for health check
sleep 5
# Test login endpoint
curl -s -X POST http://localhost:8080/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{
"email":"admin@example.com",
"password":"TestPass123!"
}' | jq .
# Expected response:
# {
# "token": "eyJ...",
# "expires_at": "2026-02-10T14:30:00Z",
# ...
# }
# Cleanup
docker stop charon-test
docker rm charon-test
Success Criteria
- ✅ Dependency Update: All golang.org/x packages updated to latest
- ✅ Build Success: Docker image builds without errors
- ✅ No CVE-2024-45337: Trivy scan reports 0 CRITICAL vulnerabilities
- ✅ Smoke Test: Login endpoint responds with valid token
- ✅ Trivy Database: Current (within 1 hour of scan time)
Failure Handling
If build fails after dependency update:
- Check which modules pull in the package: go mod why -m golang.org/x/crypto
- Review the changelog for breaking changes
- May need code updates in cryptography-related handlers
- Escalate to platform owner if APIs changed significantly
If Trivy still reports CVE-2024-45337:
- Verify golang.org/x/crypto v0.31.0+ is installed: go list -m golang.org/x/crypto
- Check the Trivy vulnerability database is current: trivy --version (prints the vulnerability DB's UpdatedAt timestamp)
- Rebuild without cache: docker build --no-cache ...
Regression Testing
Run quick smoke tests to ensure nothing broke:
- ✅ Login succeeds
- ✅ Logout succeeds
- ✅ Token validation works
- ✅ Permission checks work (admin endpoint accessible)
Timing: 5-10 minutes total
3. Phase 2.3b: Async Email Refactor (2-3 hours, Parallelizable)
Priority: 🟡 HIGH Owner: Backend Developer (may be different from 2.3a, or same with sequential scheduling) Can Run in Parallel: Yes (with 2.3a and 2.3c) Start Time: Immediately (or after 2.3a if same developer) Target Completion: 2-3 hours
Objective
Convert InviteUser endpoint from synchronous email sending (blocking HTTP request) to async pattern (non-blocking background job). This unblocks the user management test suite and prevents endpoint timeouts in production.
Root Cause
Current Code: /projects/Charon/backend/internal/api/handlers/user_handler.go (lines 462-469)
// CURRENT BLOCKING PATTERN
if h.MailService.IsConfigured() {
baseURL, ok := utils.GetConfiguredPublicURL(h.DB)
if ok {
appName := getAppName(h.DB)
// ❌ THIS BLOCKS THE ENTIRE HTTP REQUEST
if err := h.MailService.SendInvite(user.Email, inviteToken, appName, baseURL); err == nil {
emailSent = true
}
}
}
return c.JSON(200, user)
CRITICAL BUG - Race Condition:
The user Email field referenced inside a goroutine MUST be captured BEFORE launching the goroutine. If any other goroutine or code modifies the user object, the email sending could get stale or corrupted data.
Danger Pattern (DON'T DO THIS):
go func() {
// ❌ RACE CONDITION: user object may be modified before this runs
if err := h.MailService.SendInvite(user.Email, ...); err != nil { ... }
}()
Why it blocks:
- h.MailService.SendInvite() calls SMTP synchronously
- Waits for SMTP server response (can take 1-30 seconds)
- HTTP request blocked until email completes or errors
- Test timeout after 60 seconds if SMTP is slow
Implementation Strategy
Three options for async pattern:
Option A: Simple Goroutine (Recommended - 30 min)
Best for: MVP, fast iteration, sufficient functionality Trade-off: No guaranteed delivery, no retry mechanism Code change:
// AFTER - Non-blocking async pattern
go func() {
if h.MailService.IsConfigured() {
baseURL, ok := utils.GetConfiguredPublicURL(h.DB)
if ok {
appName := getAppName(h.DB)
if err := h.MailService.SendInvite(user.Email, inviteToken, appName, baseURL); err != nil {
// Log error but don't block response
h.Logger.Error("Failed to send invite email",
zap.String("user_email", user.Email),
zap.Error(err))
}
}
}
}()
// Response returns immediately (no wait for email)
return c.JSON(http.StatusCreated, user)
Pros:
- ✅ Minimal code change (5 lines)
- ✅ No external dependencies
- ✅ Immediate response (sub-200ms)
- ✅ Thread-safe with goroutines
Cons:
- ❌ No retry mechanism
- ❌ No persistent queue
- ❌ Email may not send if service crashes during goroutine execution
Option B: Channel-Based Queue (Recommended for Phase 2.3b - 1.5-2 hours)
Best for: Balanced reliability + maintainability Trade-off: More code, but structured queue pattern Files to create/modify:
- Create: backend/internal/services/email_queue.go
- Modify: backend/internal/api/handlers/user_handler.go
- Modify: backend/internal/api/server.go (initialize queue worker)
Architecture:
InviteUser handler
↓
Send job to channel (non-blocking, buffered channel)
↓
Return 201 response immediately
↓
Background worker goroutine
├─ Read job from channel
├─ Send email
├─ Log result (success/failure)
└─ Continue processing next job
Implementation sketch:
// backend/internal/services/email_queue.go
package services

import (
	"errors"
	"time"

	"go.uber.org/zap"
)
type EmailJob struct {
Email string
Token string
AppName string
BaseURL string
CreatedAt time.Time
}
type EmailQueue struct {
jobs chan EmailJob
log *zap.Logger
}
func NewEmailQueue(size int, log *zap.Logger) *EmailQueue {
q := &EmailQueue{
jobs: make(chan EmailJob, size),
log: log,
}
// Start worker goroutine
go q.worker()
return q
}
func (q *EmailQueue) Enqueue(job EmailJob) error {
select {
case q.jobs <- job:
return nil
default:
// Queue full - could retry or log warning
q.log.Warn("Email queue full, discarding job", zap.String("email", job.Email))
return errors.New("queue full")
}
}
func (q *EmailQueue) worker() {
for job := range q.jobs {
// Process email (retry logic optional)
if err := q.sendEmail(job); err != nil {
q.log.Error("Failed to send email",
zap.String("email", job.Email),
zap.Error(err))
}
}
}
Handler usage:
// In InviteUser handler (much simpler now). Enqueue is already non-blocking
// thanks to the buffered channel + default case, so no extra goroutine is needed.
h.EmailQueue.Enqueue(EmailJob{
	Email:   user.Email,
	Token:   inviteToken,
	AppName: appName,
	BaseURL: baseURL,
})
return c.JSON(http.StatusCreated, user)
Pros:
- ✅ Structured queue pattern
- ✅ Buffered channel handles spikes
- ✅ Single worker processes emails in order
- ✅ Easy to monitor (queue length, errors)
- ✅ Extensible (add retry logic later)
Cons:
- ⚠️ Email lost if service crashes (not persisted)
- ⚠️ More code than Option A
Option C: Database Task Table (Most Robust - 2-3 hours)
Best for: Production-grade reliability Trade-off: Most code, database schema change required Files:
- Migrate: create table email_tasks
- Create: backend/internal/services/email_persistence.go
- Modify: backend/internal/api/handlers/user_handler.go
- Modify: backend/internal/api/server.go (initialize worker)
Architecture:
InviteUser handler
↓
Insert email_task row (status='pending')
↓
Return 201 response immediately
↓
Background worker goroutine
├─ Query pending email_task rows
├─ Send email
├─ Update task (status='sent' or 'failed')
├─ Retry on failure (configurable attempts)
└─ Continue polling
Schema:
CREATE TABLE email_tasks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
email TEXT NOT NULL,
token TEXT NOT NULL,
subject TEXT,
body TEXT,
status TEXT DEFAULT 'pending', -- pending, sent, failed
attempts INTEGER DEFAULT 0,
max_attempts INTEGER DEFAULT 3,
error_message TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
sent_at DATETIME,
UNIQUE(email, token) -- Prevent duplicates
);
Pros:
- ✅ Guaranteed delivery (persisted in database)
- ✅ Automatic retry (configurable)
- ✅ Full audit trail (when sent, errors)
- ✅ Survives service crashes
Cons:
- ❌ Schema migration required
- ❌ Additional polling overhead
- ❌ Complexity in retry logic
Recommended Approach for Phase 2.3b
Execute Option A (simple goroutine) for Phase 2.3b (30 min)
- Fast, unblocks tests immediately
- Sufficient for current requirements
- Can refactor to Option B/C later if needed
Then if time permits, begin Option B refactoring (additional 1-2 hours)
Implementation: Option A (30 min)
File: backend/internal/api/handlers/user_handler.go
Location: Method InviteUser, around line 462-469
Current code:
// Try to send invite email
emailSent := false
if h.MailService.IsConfigured() {
baseURL, ok := utils.GetConfiguredPublicURL(h.DB)
if ok {
appName := getAppName(h.DB)
if err := h.MailService.SendInvite(user.Email, inviteToken, appName, baseURL); err == nil {
emailSent = true
}
}
}
Updated code (WITH RACE CONDITION FIX):
// Send invite email asynchronously (non-blocking)
emailSent := false // Placeholder - email will be sent in background
if h.MailService.IsConfigured() {
// Capture user data BEFORE launching goroutine to avoid race condition
userEmail := user.Email
go func() {
baseURL, ok := utils.GetConfiguredPublicURL(h.DB)
if ok {
appName := getAppName(h.DB)
// Use captured email instead of user.Email to prevent race condition
if err := h.MailService.SendInvite(userEmail, inviteToken, appName, baseURL); err != nil {
// Log failure but don't block response
h.Logger.Error("Failed to send invite email",
zap.String("user_email", userEmail),
zap.String("error", err.Error()))
}
}
}()
emailSent = true // Set true immediately since email will be sent in background
}
What changed:
- Captured user.Email before the goroutine (userEmail := user.Email)
- Wrapped email sending in a go func() { ... }() goroutine
- Used the captured userEmail inside the goroutine (not user.Email)
- Email sends in background (non-blocking)
- HTTP response returns immediately
- Added error logging (via h.Logger, which should exist)
- Set emailSent = true immediately since the email is sent asynchronously
WHY THIS MATTERS:
If the user object is modified or freed while the goroutine is running, directly accessing user.Email could read corrupt/stale data. By capturing userEmail first, we guarantee the goroutine always sends to the correct email address.
Testing Strategy: Phase 2.3b
Test 1: Response Time Verification (5 min)
File: Add to test if needed, or use curl:
# Measure response time
time curl -s -X POST http://localhost:8080/api/v1/users/invite \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"email":"newuser@example.com"}' | jq .
# Expected output:
# ✅ real 0m0.150s (should be <200ms, not >5s)
# ✅ JSON response with user details
Test 2: Database Commit Verification (5 min)
# Verify user created immediately (before email completes)
curl -s http://localhost:8080/api/v1/users \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq '.items[] | select(.email=="newuser@example.com")'
# Expected:
# ✅ User appears in list immediately
# ✅ Status shows created (not pending)
Test 3: Email Sending in Background (10 min)
File: Unit test in /projects/Charon/backend/internal/api/handlers/user_handler_test.go
// Add test case
func TestInviteUserAsync(t *testing.T) {
// Setup: Create mock mail service
mockMailService := &MockMailService{
sendInviteDelay: time.Second * 2, // Simulate slow SMTP
}
handler := &UserHandler{
MailService: mockMailService,
// ... other fields
}
// Record response time
start := time.Now()
response := handler.InviteUser(testContext)
elapsed := time.Since(start)
// Assert: Response returned quickly (async)
assert.Less(t, elapsed, 200*time.Millisecond, "Response should be immediate")
assert.Equal(t, http.StatusCreated, response.Status, "Should return 201")
// Sleep to allow goroutine to complete
time.Sleep(time.Second * 3)
// Assert: Mail service was called
assert.Equal(t, 1, mockMailService.callCount, "Email should be sent")
}
Test 4: E2E Test Suite - Test #248 (10 min)
File: Run existing E2E tests
# Run the full user management test suite
npx playwright test \
--project=firefox \
tests/user-management.spec.ts \
-g "should invite user" \
--timeout=5000 # Reduce timeout to verify fast response
# Expected:
# ✅ Test passes
# ✅ User created
# ✅ Response time <200ms (not timeout)
Test 5: Other User Management Tests (10 min)
# Run all related user management tests
npx playwright test \
--project=firefox \
tests/user-management.spec.ts
# Expected:
# ✅ Test #248 (invite user)
# ✅ Test #258 (update permissions)
# ✅ Test #260 (remove hosts)
# ✅ Test #262 (toggle user)
# ✅ Test #269 (set role to admin)
# ✅ Test #270 (set role to user)
# All tests should complete without timeout
Success Criteria: Phase 2.3b
- ✅ Response Time: InviteUser endpoint returns in <200ms (not >5 seconds)
- ✅ Immediate Commit: User created and visible in database immediately after response
- ✅ Async Email: Email sent in background (verified via logs or email delivery)
- ✅ Error Handling: Email failures logged but don't block endpoint
- ✅ Test #248 Passes: E2E test completes without timeout
- ✅ No Regressions: All other user management tests pass
- ✅ Code Change: Minimal (5-10 lines modified in one handler)
Failure Handling
If endpoint still times out after change:
- Verify goroutine was added correctly (check code review)
- Check if there's another blocking operation (database query?)
- Profile with pprof if needed: go tool pprof http://localhost:6060/debug/pprof/profile
- May need Option B (queue-based) or Option C (database-based) if other bottlenecks found
If email no longer sends:
- Goroutine may be exiting before email completes
- Add time.Sleep() in the test (not production) to allow the goroutine to finish
- Consider Option B if guaranteed delivery is needed
Effort Estimate
| Task | Duration | Notes |
|---|---|---|
| Code change (Option A) | 10 min | Simple goroutine wrap |
| Unit test addition | 10 min | Add async test case |
| Manual testing (curl) | 10 min | Verify response time |
| E2E test validation | 10 min | Run Playwright tests |
| Code review + fixes | 10 min | Address feedback |
| Total | 50 min | Within 30min-1hr estimate |
If refactoring to Option B during same phase: +60-90 min
4. Phase 2.3c: Test Auth Token Refresh (30 min - 1 hour, Parallelizable)
Priority: 🟡 MEDIUM Owner: Frontend Developer (or Backend if no separate Frontend) Can Run in Parallel: Yes (with 2.3a and 2.3b) Start Time: Immediately Target Completion: 30 min - 1 hour
Objective
Implement automatic auth token refresh in Playwright test fixtures to prevent HTTP 401 errors during long-running test sessions (>30 minutes).
Pre-Execution Verification
CRITICAL STEP - Do this FIRST before implementing fixtures:
Verify the refresh endpoint exists and works. If it's missing, you'll need to implement it first (additional 30 min).
Manual Verification Script
Run this before starting Phase 2.3c implementation:
#!/bin/bash
# Pre-check: Verify auth token refresh endpoint exists
echo "[1/3] Getting fresh auth token..."
TOKEN=$(curl -s -X POST http://localhost:8080/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com","password":"TestPass123!"}' \
| jq -r '.token')
if [ -z "$TOKEN" ] || [ "$TOKEN" == "null" ]; then
echo "❌ FAILED: Could not obtain auth token. Check login endpoint."
exit 1
fi
echo "✅ Token obtained: ${TOKEN:0:20}..."
echo "[2/3] Checking if refresh endpoint exists (POST /api/v1/auth/refresh)..."
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST http://localhost:8080/api/v1/auth/refresh \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{}')
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
BODY=$(echo "$RESPONSE" | sed '$d')   # everything except the final status-code line
if [ "$HTTP_CODE" == "404" ]; then
echo "❌ FAILED: Refresh endpoint not found (HTTP 404)"
echo " You must implement POST /api/v1/auth/refresh first (30 min task)"
exit 1
elif [ "$HTTP_CODE" == "401" ]; then
echo "❌ FAILED: Refresh endpoint returned 401 (invalid token)"
echo " Check token format and auth logic"
exit 1
elif [ "$HTTP_CODE" == "200" ]; then
echo "✅ Refresh endpoint exists and returned 200 OK"
NEW_TOKEN=$(echo "$BODY" | jq -r '.token' 2>/dev/null)
if [ -z "$NEW_TOKEN" ] || [ "$NEW_TOKEN" == "null" ]; then
echo "⚠️ WARNING: Endpoint returned 200 but no new token in response"
echo " Response body: $BODY"
else
echo "✅ New token received: ${NEW_TOKEN:0:20}..."
fi
else
echo "⚠️ Unexpected HTTP code: $HTTP_CODE"
echo " Response: $BODY"
exit 1
fi
echo "[3/3] Verification complete"
echo "✅ READY TO PROCEED with Phase 2.3c implementation"
Expected output:
✅ Token obtained: eyJhbGc...
✅ Refresh endpoint exists and returned 200 OK
✅ New token received: eyJhbGc...
✅ READY TO PROCEED with Phase 2.3c implementation
If failed: Implement /api/v1/auth/refresh endpoint first (separate 30-min task before Phase 2.3c)
Problem Statement
Current Symptom:
- E2E tests run for 30+ minutes
- After ~30 min, all API requests fail with HTTP 401 Unauthorized
- Tests timeout waiting for response
- Root cause: JWT auth token expires after 30 minutes
Why This Happens:
- JWT token issued at test start with 30-minute expiration
- Long test suites (Phase 3 E2E suite may be 60+ min)
- Token not refreshed before it expires
- All subsequent API calls rejected
Affected Tests:
- Full Phase 2 E2E suite (currently <30 min, but approaching limit)
- Phase 3 E2E security testing (60+ min, definitely exceeds token lifetime)
- Any future smoke tests or integration suites
Current Architecture
Auth Flow:
Login (POST /auth/login)
↓ Returns JWT token + refresh_token
↓ Token stored in Playwright fixtures
↓ Used in all subsequent API requests
↓ Token expires after 30 min
↓ ❌ All requests fail with 401
Token Details:
- Issued by: Backend (location: verify where tokens set in login handler)
- Expires: 30 minutes (configurable, likely in config or constants)
- Refresh endpoint: Assume exists (POST /auth/refresh or similar)
- Refresh token: May be issued with JWT for refresh flow
Current Fixture:
// tests/fixtures/auth.ts (or similar)
// Likely stores token in memory but doesn't refresh
Solution Options
Option A: Automatic Token Refresh in Fixtures (Recommended - 30 min)
Best for: Playwright-native solution, no backend changes
File: tests/fixtures/auth.ts (or wherever auth setup exists)
Implementation:
// tests/fixtures/auth.ts
import { test as base, expect } from '@playwright/test';
export const test = base.extend<{ authenticatedToken: string }>({
authenticatedToken: async ({ page }, use) => {
// Login and get token
const response = await page.request.post('http://localhost:8080/api/v1/auth/login', {
data: {
email: process.env.TEST_EMAIL || 'admin@example.com',
password: process.env.TEST_PASSWORD || 'TestPass123!'
}
});
const { token, expires_at } = await response.json();
// Create refresh wrapper
let currentToken = token;
let tokenExpiry = new Date(expires_at);
// Auto-refresh before expiry (85% of lifetime = ~25 min into 30 min token)
const tokenRefreshInterval = setInterval(async () => {
const now = new Date();
const timeUntilExpiry = tokenExpiry.getTime() - now.getTime();
// Refresh if within 5 minutes of expiry
if (timeUntilExpiry < 5 * 60 * 1000) {
try {
const refreshResponse = await page.request.post(
'http://localhost:8080/api/v1/auth/refresh',
{
headers: {
'Authorization': `Bearer ${currentToken}`
}
}
);
if (refreshResponse.ok()) {
const refreshData = await refreshResponse.json();
currentToken = refreshData.token;
tokenExpiry = new Date(refreshData.expires_at);
console.log('[AUTH] Token refreshed successfully');
} else {
console.warn('[AUTH] Token refresh failed', refreshResponse.status());
}
} catch (err) {
console.error('[AUTH] Token refresh error:', err);
}
}
}, 60 * 1000); // Check every 1 minute
// Provide the token to tests. Note: use() passes the string by value, so a
// test that holds it across a refresh keeps the old copy; tests that span a
// refresh should re-read the token (e.g. via a getter fixture).
await use(currentToken);
// Cleanup
clearInterval(tokenRefreshInterval);
}
});
// In tests, use the authenticatedToken fixture:
// test('example', async ({ page, authenticatedToken }) => {
// await page.request.get('/api/v1/users', {
// headers: { 'Authorization': `Bearer ${authenticatedToken}` }
// });
// });
Pros:
- ✅ No backend changes needed
- ✅ Automatic & transparent to tests
- ✅ Handles token expiry gracefully
- ✅ Works with existing auth infrastructure
Cons:
- ⚠️ Assumes refresh endpoint exists
- ⚠️ Slight overhead (periodic checks)
Option B: Longer Token Expiration for Tests (5 min)
Best for: Quick fix if refresh endpoint doesn't exist File: Backend config or test environment setup
Implementation:
# Environment variable approach
TEST_JWT_EXPIRATION=1440 # 24 hours instead of 30 min
# Or in backend config
CHARON_JWT_EXPIRATION_MINUTES=1440 # For test environment only
Pros:
- ✅ Single line change
- ✅ No fixture complexity
Cons:
- ❌ Reduces security (longer token lifetime)
- ❌ Only suitable for test environment
- ❌ May not work if backend doesn't respect env var
Option C: Cache & Reuse Auth Token (Recommended addition - 15 min)
Best for: Combining with Option A for reliability
File: tests/fixtures/auth.ts
Implementation:
// Store token on disk between test runs
const tokenCachePath = './test-auth-cache.json';
export const test = base.extend<{ authenticatedToken: string }>({
authenticatedToken: async ({ page }, use) => {
let token = null;
let tokenExpiry = null;
// Try to load cached token first
try {
const cached = JSON.parse(fs.readFileSync(tokenCachePath, 'utf-8'));
const expiryTime = new Date(cached.expires_at);
if (expiryTime > new Date()) {
// Token still valid
token = cached.token;
tokenExpiry = expiryTime;
console.log('[AUTH] Using cached token');
}
} catch (err) {
// Cache doesn't exist or invalid
}
// If no valid cached token, login
if (!token) {
const response = await page.request.post(
'http://localhost:8080/api/v1/auth/login',
{
data: {
email: process.env.TEST_EMAIL || 'admin@example.com',
password: process.env.TEST_PASSWORD || 'TestPass123!'
}
}
);
const data = await response.json();
token = data.token;
tokenExpiry = new Date(data.expires_at);
// Cache for next test run
fs.writeFileSync(tokenCachePath, JSON.stringify({
token,
expires_at: tokenExpiry.toISOString()
}));
}
// Refresh if needed (reuse token too)
const refreshInterval = setInterval(async () => {
// ... same as Option A
}, 60 * 1000);
await use(token);
clearInterval(refreshInterval);
}
});
Pros:
- ✅ Reuses token across test runs
- ✅ Faster startup (skip login on valid cached token)
- ✅ Automatic refresh if cache near expiry
Cons:
- ⚠️ Requires gitignore for cache file
- ⚠️ File-based cache less robust
Recommended Approach for Phase 2.3c
Execute Option A + Option C (45 min total)
- Add automatic token refresh in fixtures (Option A) - 30 min
- Cache token for reuse across test runs (Option C) - 15 min
Implementation: Option A + C (45 min)
File: tests/fixtures/auth.ts
Assumption: File exists (standard Playwright fixture pattern)
Current file likely contains:
import { test as base } from '@playwright/test';
export const test = base.extend({
// existing fixtures
});
Add auth with refresh:
import { test as base, expect } from '@playwright/test';
import * as fs from 'fs';
import * as path from 'path';
const TOKEN_CACHE_PATH = path.join(__dirname, '../../.auth-token-cache.json');
export const test = base.extend<{
authenticatedToken: string;
apiHeaders: (token: string) => Record<string, string>;
}>({
authenticatedToken: async ({ page, context }, use) => {
let currentToken = '';
let tokenExpiry = new Date(0);
/**
* Load cached token if still valid
*/
function loadCachedToken(): string | null {
try {
if (fs.existsSync(TOKEN_CACHE_PATH)) {
const cached = JSON.parse(fs.readFileSync(TOKEN_CACHE_PATH, 'utf-8'));
const expiry = new Date(cached.expires_at);
if (expiry > new Date()) {
console.log('[AUTH] Using cached token (valid until ' + expiry.toISOString() + ')');
tokenExpiry = expiry;
return cached.token;
}
}
} catch (err) {
console.warn('[AUTH] Failed to load cached token:', err);
}
return null;
}
/**
* Save token to cache
*/
function cacheToken(token: string, expiresAt: string): void {
try {
fs.writeFileSync(TOKEN_CACHE_PATH, JSON.stringify(
{ token, expires_at: expiresAt },
null,
2
));
console.log('[AUTH] Token cached for future test runs');
} catch (err) {
console.warn('[AUTH] Failed to cache token:', err);
}
}
/**
* Refresh token when near expiry
*/
async function refreshToken(): Promise<boolean> {
try {
const response = await page.request.post(
'http://localhost:8080/api/v1/auth/refresh',
{
headers: {
'Authorization': `Bearer ${currentToken}`
}
}
);
if (response.ok()) {
const data = await response.json();
currentToken = data.token;
tokenExpiry = new Date(data.expires_at);
cacheToken(currentToken, data.expires_at);
console.log('[AUTH] Token refreshed (new expiry: ' + data.expires_at + ')');
return true;
} else {
console.warn('[AUTH] Token refresh failed:', response.status());
return false;
}
} catch (err) {
console.error('[AUTH] Token refresh error:', err);
return false;
}
}
/**
* Get or create fresh token
*/
async function ensureValidToken(): Promise<string> {
const now = new Date();
const timeUntilExpiry = tokenExpiry.getTime() - now.getTime();
// If token expires in less than 5 minutes, refresh
if (timeUntilExpiry < 5 * 60 * 1000 && currentToken) {
await refreshToken();
return currentToken;
}
// If no token, try cache, then login
if (!currentToken) {
currentToken = loadCachedToken() || '';
}
if (!currentToken) {
// No cached token, login fresh
const loginResponse = await page.request.post(
'http://localhost:8080/api/v1/auth/login',
{
data: {
email: process.env.TEST_EMAIL || 'admin@example.com',
password: process.env.TEST_PASSWORD || 'TestPass123!'
}
}
);
if (!loginResponse.ok()) {
throw new Error(`Login failed: ${loginResponse.status()}`);
}
const data = await loginResponse.json();
currentToken = data.token;
tokenExpiry = new Date(data.expires_at);
cacheToken(currentToken, data.expires_at);
console.log('[AUTH] Fresh token obtained (expiry: ' + data.expires_at + ')');
}
return currentToken;
}
// Setup interval to refresh before expiry
const refreshCheckInterval = setInterval(async () => {
const now = new Date();
const timeUntilExpiry = tokenExpiry.getTime() - now.getTime();
if (currentToken && timeUntilExpiry < 5 * 60 * 1000) {
await refreshToken();
}
}, 60 * 1000); // Check every minute
// Ensure token on first use
await ensureValidToken();
// Provide token to tests
await use(currentToken);
// Cleanup
clearInterval(refreshCheckInterval);
},
/**
* Helper to generate authenticated API headers
*/
apiHeaders: async ({ authenticatedToken }, use) => {
const getHeaders = (token: string) => ({
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json'
});
await use(getHeaders);
}
});
export { expect };
Update .gitignore:
# Auth cache (test-only, contains valid JWT)
.auth-token-cache.json
Concurrency Safety: Cache File Locking
IMPORTANT: If Playwright tests run with --workers=N (parallel workers), multiple test instances write to .auth-token-cache.json simultaneously. This can corrupt the JSON file.
Add file locking to prevent corruption:
Install dependency:
npm install --save-dev async-lock
Update tests/fixtures/auth.ts with locking:
import * as fs from 'fs';
import * as path from 'path';
import AsyncLock from 'async-lock';
const TOKEN_CACHE_PATH = path.join(__dirname, '../../.auth-token-cache.json');
const cacheLock = new AsyncLock(); // Prevent concurrent writes
// Update cacheToken function (found in the extended fixture code above):
function cacheToken(token: string, expiresAt: string): void {
  // Serialize writes within this process, and write atomically
  // (temp file + rename) so a reader never sees a half-written JSON file
  cacheLock.acquire('auth-cache', () => {
    try {
      const tmpPath = `${TOKEN_CACHE_PATH}.${process.pid}.tmp`;
      fs.writeFileSync(tmpPath, JSON.stringify(
        { token, expires_at: expiresAt },
        null,
        2
      ));
      fs.renameSync(tmpPath, TOKEN_CACHE_PATH); // atomic on POSIX filesystems
      console.log('[AUTH] Token cached safely (atomic locked write)');
    } catch (err) {
      console.warn('[AUTH] Failed to cache token:', err);
    }
  });
}
Why this matters:
- Without protection: 2 workers write simultaneously → corrupted JSON file → cache becomes unusable
- With protection: only one complete write lands at a time → valid JSON file → cache works reliably
- Caveat: Playwright workers run as separate OS processes, so an in-process lock alone cannot fully serialize them; writing to a temp file and renaming it into place (atomic on POSIX) is what actually prevents readers from seeing a partial file
When to use:
- ✅ Use if running `npx playwright test --workers=2` or higher
- ❌ Not needed with `--workers=1` (sequential)
Update test usage:
Before (using raw token):
test('should list users', async ({ page }) => {
const response = await page.request.get('http://localhost:8080/api/v1/users', {
headers: {
'Authorization': `Bearer ${token}`
}
});
});
After (using fixtures):
import { test, expect } from '../fixtures/auth';
test('should list users', async ({ page, apiHeaders, authenticatedToken }) => {
const response = await page.request.get(
'http://localhost:8080/api/v1/users',
{
headers: apiHeaders(authenticatedToken)
}
);
expect(response.ok()).toBeTruthy();
});
Testing Strategy: Phase 2.3c
Test 1: Single Long-Running Test (30+ min)
Objective: Verify the token does not expire during a long (30-60 minute) test session
# Run a single test that takes 30+ minutes
# It should complete without 401 errors
npx playwright test tests/some-long-test.spec.ts \
  -g "60-minute task" \
  --timeout=3600000  # 60 minutes
Expected Result:
- ✅ No HTTP 401 errors mid-test
- ✅ Token refreshed at ~25 min mark (verify in console logs)
- ✅ All API calls succeed
Test 2: Full Phase 2 E2E Suite (30 min)
# Run all Phase 2 E2E tests
npx playwright test \
tests/phase2/ \
--reporter=html
# Expected:
# ✅ All tests complete
# ✅ No 401 errors
# ✅ Console logs show token refresh events
Verification:
- Check console for `[AUTH] Token refreshed`
- Check for the cached token file: `ls -la .auth-token-cache.json`
Test 3: Verify Cache Reuse (5 min)
# Run the suite twice to verify token reuse
npx playwright test tests/phase2/ --workers=1
npx playwright test tests/phase2/ --workers=1
# Look for:
# First run:  "[AUTH] Fresh token obtained"
# Second run: "[AUTH] Using cached token"
Test 4: Verify Refresh Endpoint (5 min)
Manual test:
# Get token
TOKEN=$(curl -s -X POST http://localhost:8080/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com","password":"TestPass123!"}' | jq -r '.token')
# Try refresh endpoint
curl -s -X POST http://localhost:8080/api/v1/auth/refresh \
-H "Authorization: Bearer $TOKEN" | jq .
# Expected:
# {
# "token": "eyJ...",
# "expires_at": "2026-02-09T15:30:00Z"
# }
Success Criteria: Phase 2.3c
- ✅ No 401 Errors: 60+ minute test run completes without HTTP 401
- ✅ Token Refresh: Logs show token is refreshed automatically
- ✅ Cache Reuse: Second test run uses cached token (not login again)
- ✅ Endpoint Works: Refresh endpoint accessible and returns new token
- ✅ All API Calls Succeed: No auth-related failures in test output
Failure Handling
If still getting 401 errors:
- Verify the refresh endpoint exists: `curl -X POST /api/v1/auth/refresh`
- Check the token expiry time: decode the token at jwt.io
- If the refresh endpoint is missing, implement it first (30 min task)
- If token lifetime config is found, try Option B (longer lifetime)
If the cache causes issues:
- Delete `.auth-token-cache.json` and re-run
- Disable caching (comment out the cache code) to isolate the issue
- Document cache invalidation triggers if needed
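As a local alternative to pasting the token into jwt.io, the `exp` claim can be decoded directly in Node. A minimal sketch for debugging only (`decodeJwtExpiry` is a hypothetical helper, and it performs no signature verification):

```typescript
// Decode a JWT's payload locally to inspect its `exp` claim.
// NOTE: this does NOT verify the signature -- debugging/tests only.
function decodeJwtExpiry(token: string): Date | null {
  const parts = token.split('.');
  if (parts.length !== 3) return null; // not a JWT
  try {
    // JWT payloads are base64url-encoded JSON
    const payload = JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf-8'));
    return typeof payload.exp === 'number' ? new Date(payload.exp * 1000) : null;
  } catch {
    return null; // malformed payload
  }
}
```

Useful when diagnosing 401s: compare the decoded expiry against the time of the failing request to tell token expiry apart from a broken refresh endpoint.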
Effort Estimate: Phase 2.3c
| Task | Duration | Notes |
|---|---|---|
| Create/update auth fixture | 20 min | Add refresh logic |
| Add token cache | 10 min | File-based cache |
| Update test imports | 5 min | Use new fixtures |
| Manual testing | 10 min | Verify no 401s |
| Total | 45 min | Within 30min-1hr estimate |
5. Parallelization Strategy
Execution Model: Concurrent Work Groups
All three phases can run in parallel with minimal conflicts:
Independence Analysis
| Phase | Phase | Can Run Parallel? | Reason |
|---|---|---|---|
| 2.3a | 2.3b | ✅ YES | Different files (go.mod vs user_handler.go) |
| 2.3a | 2.3c | ✅ YES | Different layers (backend deps vs frontend fixtures) |
| 2.3b | 2.3c | ✅ YES | Different languages (Go vs TypeScript) |
Key: No shared code modifications or merge conflicts expected.
Execution Timeline Scenarios
SCENARIO A: Separate Machines or Teams (True Parallel - 1h wall-clock)
Dev A (2.3a): Dependency update (1 hour)
Dev B (2.3b): Async email refactor (1 hour)
Dev C (2.3c): Auth token refresh (45 min)
All three run simultaneously:
09:00 - START all three
09:45 - Dev C complete (Phase 2.3c done)
10:00 - Dev A & B complete (Phases 2.3a & 2.3b done)
10:00-10:15 - Integration testing
10:15 - PHASE 3 READY
Total wall-clock: 1 hour 15 minutes
SCENARIO B: Shared Repository with Coordination (2h 05min wall-clock)
09:00 - Dev A starts 2.3a and Dev C starts 2.3c (parallel); Dev B waits
        └─ A works on go.mod (no conflicts)
        └─ C works on test fixtures (no conflicts)
        └─ B waits for A to commit
09:45 - Dev C finishes 2.3c
10:00 - Dev A finishes 2.3a and commits
        └─ Dev B pulls latest (no conflicts) and starts 2.3b
10:50 - Dev B finishes 2.3b
        └─ All three phases complete
10:50-11:05 - Integration testing
11:05 - PHASE 3 READY
Total wall-clock: 2 hours 5 minutes (sequential backend, parallel frontend)
Why slower than A: backend 2.3b must wait for the 2.3a commit, while frontend 2.3c still runs in parallel
SCENARIO C: Single Developer (2h 50min wall-clock)
09:00 - Dev starts 2.3a (Dependency Update)
10:00 - Dev completes 2.3a, starts 2.3b (Async Email)
└─ Commits 2.3a changes first
10:50 - Dev completes 2.3b, starts 2.3c (Auth Token)
└─ Commits 2.3b changes
11:35 - Dev completes 2.3c
└─ Commits 2.3c changes
11:35-11:50 - Integration testing
11:50 - PHASE 3 READY
Total wall-clock: 2 hours 50 minutes (pure serial, including 15 min integration testing)
Team Assignments & Schedule
| Phase | Owner Role | Duration | Start | Expected Finish | Code Reviewer | Notes |
|---|---|---|---|---|---|---|
| 2.3a | Backend Dev | 1h | 09:00 | 10:00 | Tech Lead | Dependency security update |
| 2.3b | Backend Dev (same or different) | 1h | 09:00* | 10:00* | Senior Backend | Async email refactor |
| 2.3c | Frontend Dev | 45min | 09:00 | 09:45 | Frontend Lead | Token refresh fixtures |
| Integration Test | QA Lead | 15min | 10:00 | 10:15 | Tech Lead | Smoke test all changes |
| Phase 3 Approval | Tech Lead | 5min | 10:15 | 10:20 | - | Go/no-go decision |
Notes:
- *2.3b timing depends on the parallelization scenario:
  - Scenario A (separate machines/teams): Dev B starts at 09:00 → finishes 10:00
  - Scenario B (shared repo): Dev B waits for the 2.3a commit → starts 10:00 → finishes 11:00
  - Scenario C (single dev): 2.3b starts after 2.3a → starts 10:00 → finishes 10:50
- Actual names and assignments depend on team availability
- Dev can be the same person (sequential) or different people (parallel)
- Code reviewers are assigned in parallel with implementation
Role Definitions
| Role | Responsibilities | Example |
|---|---|---|
| Backend Dev | Implement 2.3a & 2.3b code changes | Alice (Go expertise) |
| Frontend Dev | Implement 2.3c fixture changes | Bob (TypeScript/Playwright) |
| Tech Lead | Approve go/no-go for Phase 3 | Charlie (Architecture) |
| QA Lead | Run integration tests | Diana (Test expertise) |
| Senior Backend | Review async email implementation | Dave (async/concurrency expert) |
| Frontend Lead | Review Playwright fixture changes | Eve (test automation) |
Coordination Points
Minimal coordination needed:
- ✅ All phases independent
- ✅ No git conflicts expected (different files)
- ✅ No integration dependencies
- ✅ Can commit independently
Recommended coordination:
- [ ] 09:00: All devs start simultaneously
- [ ] 09:30: Quick sync (Slack/Teams) - any blockers?
- [ ] 10:00: Check 2.3a validation complete
- [ ] 10:15: Final integration test before Phase 3 approval
6. Risk Assessment & Mitigation
Risk Matrix
| Risk | Severity | Probability | Impact | Mitigation | Owner |
|---|---|---|---|---|---|
| Async email sends wrong data | HIGH | Medium | Invite emails contain wrong token | Add unit test with email content verification | Dev B |
| Async email never sends silently | HIGH | Low | Users don't receive invites | Add audit log when job queued, monitor logs | Dev B |
| Token refresh loop failures | MEDIUM | Low | 401 errors during long tests | Verify refresh endpoint exists first (manual test) | Dev C |
| Dependency update breaks auth | MEDIUM | Very Low | Login broken after crypto update | Build Docker image before committing | Dev A |
| Cached token invalid between runs | MEDIUM | Low | Test fails with invalid token | Add cache expiry validation | Dev C |
| Multiple devs modify user_handler.go | LOW | Low | Git merge conflicts | Dev A commits 2.3a first, Dev B pulls latest before 2.3b | Dev A, B |
| Email queue loses jobs on crash | LOW | Low | Some invites unsent in production | Document Option A limitation, plan Option B migration | Dev B |
Detailed Risk Mitigation
Risk 1: Async Email Data Corruption
Scenario: Email sent with previous test's data or corrupted token
Mitigation:
- Add unit test with email verification
- Log email content when sending
- Verify test data doesn't leak
Example test:
func TestInviteEmailCorrectData(t *testing.T) {
// Setup: capture email data
var sentEmail EmailData
mockService := &MockMailService{
OnSendInvite: func(email, token, appName, baseURL string) error {
sentEmail = EmailData{email, token, appName, baseURL}
return nil
},
}
// Act: invite user
handler.InviteUser(ctx, "newemail@test.com")
// Wait for the goroutine (a done channel signaled from the mock
// would be more reliable than a fixed sleep)
time.Sleep(100 * time.Millisecond)
// Assert: email data correct
assert.Equal(t, "newemail@test.com", sentEmail.Email)
assert.NotEmpty(t, sentEmail.Token)
assert.NotEmpty(t, sentEmail.AppName)
}
Risk 2: Silent Email Failures
Scenario: Email fails to send, but no one notices
Mitigation:
- Add structured logging for all email attempts
- Commit audit log entry when job queued (separate from success)
- Monitor logs post-deployment for "Failed to send" messages
Example logging:
// Write the "queued" audit entry synchronously, before the goroutine starts
auditLog(user.ID, "invite_email_queued", token)
go func() {
    if err := h.MailService.SendInvite(...); err != nil {
        auditLog(user.ID, "invite_email_failed", err.Error())
        h.Logger.Error("invite email failed",
            zap.String("user_email", user.Email),
            zap.Error(err))
    } else {
        auditLog(user.ID, "invite_email_sent", "")
    }
}()
Risk 3: Token Refresh Endpoint Missing
Scenario: Refresh endpoint doesn't exist, token refresh fails
Mitigation:
- Pre-test refresh endpoint before implementing fixture
- If missing, implement it first (additional 30 min)
- Fall back to Option B (longer token lifetime) if needed
Manual verification (do this first):
# Step 1: Get token
TOKEN=$(curl -s -X POST http://localhost:8080/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com","password":"TestPass123!"}' \
| jq -r '.token')
# Step 2: Try to refresh
curl -X POST http://localhost:8080/api/v1/auth/refresh \
-H "Authorization: Bearer $TOKEN"
# Expected: 200 OK with new token
# If 404: endpoint missing, implement it first
Risk 4: Dependency Update Breaks Compilation
Scenario: Updated crypto library has breaking API changes
Mitigation:
- Build Docker image (compiles all code)
- Smoke test login endpoint
- Review changelog for breaking changes
If build fails:
# Check what changed
go mod graph | grep crypto
# Review changelog
# May need code updates in cryptography-related handlers
# Last resort: downgrade to specific working version
go get golang.org/x/crypto@v0.30.0 # (if v0.31.0 breaks)
Risk 5: Cached Auth Token Causes Test Failures
Scenario: Cached token is invalid (user deleted, permissions revoked)
Mitigation:
- Add TTL to cache (15 minutes max)
- Verify token with simple API call before reuse
- Re-login if cache not valid
Enhanced cache validation:
async function validateCachedToken(token: string, page: Page): Promise<boolean> {
try {
const response = await page.request.get(
'http://localhost:8080/api/v1/auth/validate',
{ headers: { 'Authorization': `Bearer ${token}` } }
);
return response.ok();
} catch {
return false;
}
}
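The "add TTL to cache" mitigation above can be sketched as a cache loader that rejects stale entries. This assumes a `cached_at` field (the fixture would need to write it when caching), and `CACHE_TTL_MS` is an assumed value matching the 15-minute cap:

```typescript
import * as fs from 'fs';

const TOKEN_CACHE_PATH = '.auth-token-cache.json';
const CACHE_TTL_MS = 15 * 60 * 1000; // assumed 15-minute cap from the mitigation above

interface CachedToken {
  token: string;
  expires_at: string;   // JWT expiry, ISO 8601
  cached_at?: string;   // when the entry was written (assumed extra field)
}

// Return the cached token only if the JWT is not close to expiry AND the
// cache entry itself is younger than CACHE_TTL_MS; otherwise return null
// so the caller falls back to a fresh login.
function loadCachedTokenWithTtl(now: Date = new Date()): string | null {
  try {
    const cached: CachedToken = JSON.parse(fs.readFileSync(TOKEN_CACHE_PATH, 'utf-8'));
    const msUntilJwtExpiry = new Date(cached.expires_at).getTime() - now.getTime();
    const cacheAgeMs = cached.cached_at
      ? now.getTime() - new Date(cached.cached_at).getTime()
      : Infinity; // no cached_at field -> treat the entry as stale
    if (msUntilJwtExpiry < 5 * 60 * 1000) return null; // JWT expiring soon
    if (cacheAgeMs > CACHE_TTL_MS) return null;        // cache entry too old
    return cached.token;
  } catch {
    return null; // missing or corrupted cache file
  }
}
```

Combining the TTL check with the `validateCachedToken` API call above covers both cases: a token that merely aged out, and one invalidated server-side (user deleted, permissions revoked).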
Risk 6: Git Merge Conflicts in user_handler.go
Scenario: Multiple devs edit same file, merge conflict on commit
Mitigation:
- Commit order: Dev A (2.3a) → rebase → Dev B (2.3b)
- Dev B pulls latest before starting
- Small, focused edits minimize conflict chance
Git workflow:
# Dev A commits first
git add backend/go.mod backend/go.sum
git commit -m "chore(deps): update golang.org/x/crypto and dependencies"
git push
# Dev B pulls and checks for changes
git pull
git status # Verify no conflicts
# Dev B makes edit in user_handler.go
git add backend/internal/api/handlers/user_handler.go
git commit -m "fix(api): make InviteUser async to prevent HTTP blocking"
git push
Risk 7: Email Queue Jobs Lost on Service Crash (Option A Only)
Scenario: Service crashes, in-flight goroutines lost, emails don't send
Mitigation:
- Document as Phase 2.3b limitation
- Plan migration to Option B (queue-based) for Phase 2.4
- In production, prefer Option C (database-persisted) if critical
Note: For the MVP (Phase 2.3), Option A is acceptable because:
- The email is an optional invite convenience
- The user can always resend the invite
- The system functions without email delivery
7. Validation & Sign-Off
Pre-Remediation Checks
Before starting any phase:
- All three phases understood by assigned developers
- Git repository clean (no uncommitted changes)
- Latest main branch pulled locally
- Test environment up and running
- All tools available (go, docker, npm, trivy, curl)
Phase 2.3a Validation
Automated Checks
# 1. Dependency versions updated
go list -m golang.org/x/crypto | grep "v0\.3[1-9]" # ✅ Must show v0.31.0+
# 2. Build succeeds
docker build -t charon:local . 2>&1 | tail -5 # ✅ Must show "Successfully tagged charon:local"
# 3. Container scan passes
trivy image --severity CRITICAL charon:local # ✅ Must show "Total: 0"
# 4. Smoke test succeeds
curl -s http://localhost:8080/api/v1/users \
-H "Authorization: Bearer $(curl -s -X POST http://localhost:8080/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com","password":"TestPass123!"}' | jq -r '.token')" | jq '.items | length'
# ✅ Must return number > 0 (users listed)
Manual Verification
- Docker build output contains no warnings
- Trivy report shows vulnerability from CVE-2024-45337 resolved
- Login endpoint responds immediately (<200ms)
- User list endpoint works with valid token
Sign-Off Criteria
**Phase 2.3a: COMPLETE** ✅
- [x] Dependencies updated to latest
- [x] Docker image builds without errors
- [x] Trivy scan passes (0 CRITICAL)
- [x] Smoke tests pass (login, list users)
- [x] No new test failures introduced
**Commit:** `chore(deps): update golang.org/x/crypto and dependencies`
**PR Ready:** Yes
Phase 2.3b Validation
Automated Checks
# 1. Code compiles
cd backend && go build -v ./... # ✅ Must show build output
# 2. Unit tests pass
go test ./... -short -v 2>&1 | grep -E "PASS|FAIL" # ✅ All PASS
# 3. E2E test #248 passes
npx playwright test \
tests/user-management.spec.ts --grep="invite user" \
--timeout=5000 2>&1 | tail -20 # ✅ Must show: 1 passed
Performance Verification
# Measure endpoint response time
time curl -s -X POST http://localhost:8080/api/v1/users/invite \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"email":"measure@test.com"}' > /dev/null
# Expected: real 0m0.150s (NOT > 1s)
Manual Verification
- InviteUser returns in <200ms
- User appears in database immediately after response
- Test #248 completes without timeout
- Test #258-270 all pass
- Email logs show async sending
- No error messages in test output
Sign-Off Criteria
**Phase 2.3b: COMPLETE** ✅
- [x] InviteUser refactored to async
- [x] Response time < 200ms (verified with curl)
- [x] Test #248 passes (user created, no timeout)
- [x] All user management tests pass (6 related tests)
- [x] No regressions in other handlers
- [x] Error handling verified (failed email logged, doesn't break endpoint)
**Commit:** `fix(api): make InviteUser async to prevent HTTP blocking`
**PR Ready:** Yes
Phase 2.3c Validation
Automated Checks
# 1. Fixture syntax correct
npx eslint tests/fixtures/auth.ts # ✅ Must show: 0 errors
# 2. Long test doesn't timeout
npx playwright test \
tests/health-check.spec.ts \
--timeout=3600000 \
--workers=1 2>&1 | grep -E "passed|failed" # ✅ Must show: 1 passed
# 3. No 401 errors in logs
npx playwright test tests/ 2>&1 | grep -c "401" # ✅ Must return: 0
Manual Verification
- Playwright test runs for 60+ minutes without 401
- Console logs show `[AUTH] Token refreshed...`
- Cache file created: `.auth-token-cache.json` exists
- Second test run uses the cached token
- Refresh endpoint returns valid token
Sign-Off Criteria
**Phase 2.3c: COMPLETE** ✅
- [x] Auth fixture created with token refresh logic
- [x] 60-minute test run completes with no 401 errors
- [x] Token automatically refreshed when near expiry
- [x] Token cached for future test runs
- [x] Credential refresh endpoint verified working
- [x] No test behavior changes (all Phase 2 tests still pass)
**Commit:** `test: add automatic token refresh for long test sessions`
**PR Ready:** Yes
Integration Testing (All Phases)
After all three phases complete:
# Full smoke test suite
npx playwright test tests/ --reporter=html
# Container verification
docker run -d --name final-check -p 8080:8080 charon:local
sleep 5
# Test auth flow
TOKEN=$(curl -s -X POST http://localhost:8080/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com","password":"TestPass123!"}' | jq -r '.token')
# Test user creation
curl -s -X POST http://localhost:8080/api/v1/users/invite \
-H "Authorization: Bearer $TOKEN" \
-d '{"email":"final@test.com"}' | jq '.id'
# Verify scan still clean
trivy image charon:local --severity CRITICAL
docker stop final-check
docker rm final-check
Expected Results:
- ✅ All endpoint responses successful
- ✅ Token valid and properly used
- ✅ User creation fast (<200ms)
- ✅ Container scan still clean
Final Sign-Off Checklist
## Phase 2.3 COMPLETE - Ready for Phase 3
**Date Completed:** [TIMESTAMP]
**Total Time:** [ACTUAL TIME VS ESTIMATE]
**Developers:** [NAMES]
### Phase 2.3a: Dependency Security ✅
- [x] golang.org/x/crypto v0.31.0+
- [x] Trivy scan passes
- [x] Docker image builds
- [x] Smoke tests pass
### Phase 2.3b: Async Email ✅
- [x] InviteUser response < 200ms
- [x] Test #248 passes
- [x] All user management tests pass
- [x] No regressions
### Phase 2.3c: Auth Token Refresh ✅
- [x] 60+ minute test runs without 401
- [x] Token auto-refresh working
- [x] Cache mechanism functional
- [x] Refresh endpoint verified
### Integration Testing ✅
- [x] Full E2E suite passes
- [x] Container scan clean
- [x] All endpoints responding
### Security Approval ✅
- [x] No CRITICAL vulnerabilities
- [x] No new security concerns
- [x] Dependencies verified
### Code Review Status ✅
- [x] All commits reviewed
- [x] Code follows project standards
- [x] Tests passing
- [x] Ready to merge
### Phase 3 Readiness: **APPROVED** ✅
All critical fixes complete. Ready to proceed with Phase 3 E2E security testing.
Authorized by: [TECH LEAD NAME]
Date: [DATE]
8. Time Estimates & Critical Path
Detailed Task Breakdown
Phase 2.3a: Dependency Update
| Task | Effort | Critical Path |
|---|---|---|
| Update dependencies (go get) | 5 min | YES |
| Run go mod tidy & verify | 5 min | YES |
| Build Docker image | 7 min | YES |
| Container security scan | 5 min | YES |
| Smoke test (login, list users) | 5 min | YES |
| Subtotal | 27 min | Serial |
| Buffer (10% for troubleshooting) | 3 min | - |
| Total | ~30 min (1 hour budgeted) | ✅ Realistic estimate |
Phase 2.3b: Async Email Refactor (Option A)
| Task | Effort | Critical Path |
|---|---|---|
| Code change (wrap in goroutine) | 5 min | YES |
| Update user_handler.go | 5 min | YES |
| Add error logging (Logger usage) | 3 min | NO |
| Build & compile test | 2 min | YES |
| Unit test addition (response time test) | 10 min | NO |
| E2E test validation (#248) | 10 min | YES |
| Test suite validation (all user tests) | 10 min | YES |
| Code review & fixes | 5 min | YES |
| Subtotal | 50 min | Serial |
| Buffer (10%) | 5 min | - |
| Total | 55-60 min ~= 1 hour | ✅ Within estimate |
Phase 2.3c: Auth Token Refresh
| Task | Effort | Critical Path |
|---|---|---|
| Verify refresh endpoint exists (manual test) | 5 min | YES |
| Create/update auth fixture file | 15 min | YES |
| Add token refresh interval logic | 10 min | YES |
| Add token caching (file-based) | 8 min | NO |
| Update test imports/usage | 5 min | YES |
| 60-min test validation | 10 min | YES |
| Cache verification (second run) | 5 min | NO |
| Code review & fixes | 5 min | YES |
| Subtotal | 63 min (50 min on critical path) | Serial |
| Buffer (10%) | 6 min | - |
| Total | ~70 min ≈ 1 hour | ✅ Roughly within the 0.5-1 hr estimate |
Timeline Visualization
Parallel Execution (Recommended)
Timeline (hours)
       0h         1h         2h         3h
       |----------|----------|----------|
2.3a:  [==========]  1h   (Dev A: Dependencies)
2.3b:  [==========]  1h   (Dev B: Async Email)
2.3c:  [=======]     45m  (Dev C: Auth Token)
                  ↑
        all complete: ~1h wall-clock (+15 min integration)
Wall-clock total: 1 hour (limited by the longest tasks, 2.3a and 2.3b)
Sequential Execution (If 1 developer)
Timeline (hours)
       0h         1h         2h         3h
       |----------|----------|----------|
2.3a:  [==========]                          1h
2.3b:             [==========]               1h
2.3c:                        [=======]       45m
                                     ↑
                       2h45m (all three fixes complete)
Wall-clock total: 2h45m (before integration testing)
Critical Path Analysis
Critical path = longest task dependency chain
2.3a: 1 hour (completely independent)
2.3b: 1 hour (can start immediately, no deps on 2.3a)
2.3c: 45 min (can start immediately, depends on refresh endpoint existing)
↓
If refresh endpoint missing: +30 min implementation needed
Longest path: max(1h, 1h, 45min) = 1 hour in parallel
Realistic Time Estimates with Buffers
| Scenario | Estimate | Confidence | Notes |
|---|---|---|---|
| Best case (no issues) | 1 hour | 20% | All changes work first try |
| Expected (1-2 small issues) | 1.5 hours | 70% | Typical: need one test retry, one small fix |
| Worst case (major issue) | 3 hours | 10% | Unlikely: e.g., refresh endpoint missing |
Recommended buffer: 1.5 hours total (50% of the base estimate). Plan for a finish between 10:30 and 11:30 (assuming a 09:30 start).
9. Phase 3 Blocking Dependencies
Dependency Graph
Phase 3 E2E Security Testing
├─ Requires: Phase 2.3a ✅ (CRITICAL)
│ ├─ Reason: No CRITICAL vulnerabilities in production
│ ├─ Blocker type: Security compliance
│ └─ Time impact: Fail if not complete
│
├─ Requires: Phase 2.3b ⚠️ (HIGH)
│ ├─ Reason: User management tests must pass
│ ├─ Blocker type: Functional requirement
│ └─ Time impact: User-related Phase 3 tests fail/timeout
│
└─ Requires: Phase 2.3c ✅ (CRITICAL)
├─ Reason: Long test sessions timeout with 401
├─ Blocker type: Test infrastructure
└─ Time impact: Phase 3 tests fail after 30 min
Phase 3 Readiness Checklist
Before starting Phase 3, verify:
## Phase 3 Readiness Check
**2.3a - Security Compliance ✅ REQUIRED**
- [ ] CVE-2024-45337 NOT present in image
- [ ] All golang.org/x packages updated
- [ ] Trivy scan reports 0 CRITICAL
**2.3b - Functional Requirement ✅ REQUIRED**
- [ ] User invite endpoint responds in <200ms
- [ ] Test #248 (invite user) passes
- [ ] Tests #258-270 (other user ops) pass
- [ ] No timeout errors in user management
**2.3c - Test Infrastructure ✅ REQUIRED**
- [ ] Auth fixtures support token refresh
- [ ] 60-minute test run without 401 errors
- [ ] Token cache functional (optional but helpful)
**Full E2E Suite ✅ REQUIRED (smoke test)**
- [ ] All Phase 2 tests pass
- [ ] >95% pass rate (acceptable for remediation phase)
- [ ] No new vulnerabilities introduced
- [ ] Container builds successfully
**GO** → Phase 3 when ALL checks pass
**NO-GO** → Fix remaining issues before Phase 3
10. Risk Escalation & Decision Gates
Decision Gates
Gate 1: Phase 2.3a Complete (1 hour)
- ✅ Decision: APPROVE to proceed to 2.3b+c
- ❌ Decision: HALT - investigate CVE vulnerability
Gate 2: Phase 2.3b Complete (2 hours)
- ✅ Decision: APPROVE Phase 2.3c
- ⚠️ Decision: CONDITIONAL - if user tests still failing, delay Phase 3
Gate 3: Phase 2.3c Complete (2.5 hours)
- ✅ Decision: APPROVE Phase 3 start
- ❌ Decision: HALT - auth infrastructure issue
Gate 4: Integration Testing (2.5 hours)
- ✅ Decision: APPROVED FOR PHASE 3
- ⚠️ Decision: CONDITIONAL - proceed with caution, monitor Phase 3 closely
- ❌ Decision: REJECT - rework Phase 2 sections before Phase 3
Escalation Path
If Phase 2.3a fails (Dependency Update):
- Owner: Backend Dev + Tech Lead
- Action: Investigate breaking change in crypto API
- Options:
- Downgrade to specific version if 0.31.0 incompatible
- Update code for API changes
- Block Phase 3 until resolved
- Timeline: +30 min investigation + fix
If Phase 2.3b fails (Async Email - Test #248 still times out):
- Owner: Backend Dev
- Action: Profile endpoint, identify actual bottleneck
- Options:
- Async refactor insufficient → use Option B (queue-based)
- Bottleneck elsewhere (database query?) → investigate separately
- Email service misconfiguration → check logs
- Timeline: Can proceed to Phase 3, but mark user management tests as deferred
If Phase 2.3c fails (Auth Token Refresh):
- Owner: Frontend Dev + Backend Dev
- Action: Check refresh endpoint exists and works
- Options:
- Endpoint missing → implement first (30 min)
- Endpoint broken → fix auth logic
- Fixture implementation issue → debug Playwright
- Timeline: Must be resolved before Phase 3 starts (blocking)
Critical Go/No-Go Decision
APPROVED FOR PHASE 3 if ALL true:
- ✅ 2.3a: No CRITICAL vulnerabilities in image
- ✅ 2.3b: User management tests pass (at least 4/6, not all timing out)
- ✅ 2.3c: Long test runs (60 min) don't fail with 401
REVIEW & REWORK if ANY failed:
- ❌ 2.3a: Vulnerability still present
- ❌ 2.3b: All user tests timing out (async didn't solve)
- ❌ 2.3c: Short test runs failing with 401
Appendix A: Quick Reference Commands
Phase 2.3a Commands
# Update dependencies
cd /projects/Charon/backend
go get -u golang.org/x/crypto golang.org/x/net golang.org/x/oauth2 github.com/quic-go/quic-go
go mod tidy && go mod verify
# Build image
docker build -t charon:local .
# Scan for vulnerabilities
trivy image --severity CRITICAL charon:local
# Smoke test
curl -X POST http://localhost:8080/api/v1/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com","password":"TestPass123!"}'
Phase 2.3b Commands
# Test response time
time curl -s -X POST http://localhost:8080/api/v1/users/invite \
-H "Authorization: Bearer $TOKEN" \
-d '{"email":"test@example.com"}'
# Run E2E test
npx playwright test tests/user-management.spec.ts --grep="invite" --timeout=5000
# Run all user tests
npx playwright test tests/user-management.spec.ts --reporter=html
Phase 2.3c Commands
# Verify refresh endpoint
curl -X POST http://localhost:8080/api/v1/auth/refresh \
-H "Authorization: Bearer $TOKEN"
# Run long test
npx playwright test tests/health-check.spec.ts --timeout=3600000
# Check cache file
ls -la .auth-token-cache.json
cat .auth-token-cache.json | jq
Appendix B: File Locations Reference
| File/Directory | Purpose | Owner |
|---|---|---|
| `backend/go.mod`, `go.sum` | Dependency management | Phase 2.3a |
| `backend/internal/api/handlers/user_handler.go` (lines 462-469) | InviteUser async refactor | Phase 2.3b |
| `tests/fixtures/auth.ts` | Token refresh fixtures | Phase 2.3c |
| `.auth-token-cache.json` | Cached token (gitignored) | Phase 2.3c |
| `Dockerfile` | Docker image build | Phase 2.3a (validation) |
| `backend/internal/services/mail_service.go` | Email service (reference only) | Phase 2.3b (research) |
Appendix C: Success Metrics Dashboard
Print this table and track during execution:
Phase 2.3 Remediation - Execution Checklist
==========================================
| Phase | Completed By | Start | End | Status | Blocker | Notes |
|-------|-------------|-------|-----|--------|---------|-------|
| 2.3a | Dev A | 09:00 | 10:00 | ✅ | – | Deps updated, scan passed |
| 2.3b | Dev B | 09:00 | 10:00 | ✅ | – | Tests pass, <200ms response |
| 2.3c | Dev C | 09:00 | 09:45 | ✅ | – | Long tests pass, no 401s |
| **INTEGRATION** | **All** | 10:00 | 10:30 | ✅ | – | Full suite pass, ready |
Total time: 1.5 hours (parallel)
Phase 3 approval: **READY** ✅
DOCUMENT COMPLETE
This Phase 2.3 Remediation Plan is ready for team review and execution. All three critical fixes are defined with specific steps, success criteria, and validation checkpoints. Proceed with parallelized execution targeting 2-3 hour total completion time.
Next Steps:
- Review this plan with team (15 min)
- Assign developers to phases
- Start Phase 2.3a, 2.3b, 2.3c in parallel
- Track progress against checklist
- Validate completeness before Phase 3 approval
- Commit with standardized messages per commit section
- Open PR for code review
- Merge when all validations pass
- Phase 3 E2E Security Testing → APPROVED TO START