Charon/docs/plans/archive/phase_2_fix_plan.md
akanealw eec8c28fb3
changed perms
2026-04-22 18:19:14 +00:00


# Phase 2 E2E Failure Fix Plan
## 1. Introduction
This plan analyzes Phase 2 E2E failures from the remediation checklist and
prioritizes fixes that unblock the most tests. It focuses on shared root
causes, dependency clusters, and ownership for targeted remediation.
## 2. Research Findings
### 2.1 Source of Truth
Primary input: [E2E_REMEDIATION_CHECKLIST.md](../../E2E_REMEDIATION_CHECKLIST.md)
(Phase 2A, 2B, 2C failures).
### 2.2 Failure Clusters
- Core UI Docker integration: 2 failures on missing/blocked connection source
control.
- Settings notifications: 7 failures with timeouts or page context closure.
- Settings strict-mode collisions: 5 failures from over-broad selectors.
- Tasks log viewing: 12 timeouts waiting for log responses.
- Caddy import sessions: 3 failures (import results and missing session banner).
- Monitoring real-time logs: 19 failures with WebSocket status stuck at
Disconnected.
- Wait-helpers: 1 failure waiting for URL string match.
## 3. Root Cause Categorization
### 3.1 Failure Buckets (54 total)
| Bucket | Count | Examples |
| --- | --- | --- |
| Backend API issues | 24 | Notifications CRUD/timeouts, system settings save, Caddy import results, log viewing API timeouts |
| Frontend UI issues | 3 | Docker integration control missing, certificate email validation state |
| WebSocket issues | 19 | Real-time logs never connect (Disconnected state persists) |
| Test infrastructure issues | 6 | Strict-mode collisions (selectors), wait-helpers URL timeout |
| Admin access/permissions issues | 2 | Guest visibility of backup button, permissions uncheck disabled |
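
As a quick arithmetic check on the table above, the bucket counts can be tallied and compared against the stated total of 54. The sketch below only restates numbers from the table; the variable names are illustrative and not part of the codebase.

```typescript
// Bucket counts copied from the table in section 3.1.
const failureBuckets: Record<string, number> = {
  "Backend API issues": 24,
  "Frontend UI issues": 3,
  "WebSocket issues": 19,
  "Test infrastructure issues": 6,
  "Admin access/permissions issues": 2,
};

// Sum every bucket to cross-check the stated total.
const totalFailures = Object.values(failureBuckets).reduce(
  (sum, count) => sum + count,
  0,
);

// Share of all failures per bucket, rounded to a whole percent.
const bucketShare = Object.entries(failureBuckets).map(([bucket, count]) => ({
  bucket,
  count,
  percent: Math.round((count / totalFailures) * 100),
}));
```

The two largest buckets (backend API and WebSocket) together account for 43 of the 54 failures, which is why they lead the priority ranking.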
### 3.2 Root Cause Patterns
- Logs viewing failures (12) all time out on `page.waitForResponse`, indicating
that a shared logs API endpoint is not returning or is blocked in Docker mode.
- Real-time logs failures (19) all show Disconnected, indicating that the
WebSocket handshake or server-side streaming is not established for `/api/v1/logs`.
- Caddy import failures cluster on missing import session artifacts (no banner
and zero parsed imports), suggesting a shared import-session persistence or
retrieval issue.
- Settings notifications failures cluster on timeouts and context closure,
suggesting API routes or navigation errors when provider lists/templates
are queried or mutated.
- Strict-mode collisions in settings and monitoring point to test selectors
resolving to multiple nodes, indicating that the selectors need to be narrowed.
- Admin access failures show inconsistent RBAC enforcement between UI
visibility and server-side enforcement.
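
The strict-mode pattern above can be illustrated without a browser. Playwright's strict mode raises an error when a locator used for an action resolves to more than one element; the sketch below mimics that rule with a plain array. The `PageElement` shape, the `resolveStrict` helper, and the element names are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical, simplified stand-in for rendered page elements.
interface PageElement {
  role: string;
  name: string;
}

// Mimics strict-mode resolution: exactly one match, or an error.
function resolveStrict(
  elements: PageElement[],
  matches: (el: PageElement) => boolean,
): PageElement {
  const found = elements.filter(matches);
  if (found.length !== 1) {
    throw new Error(
      `strict mode violation: resolved to ${found.length} elements`,
    );
  }
  return found[0];
}

// Hypothetical page content; the names are invented for illustration.
const settingsPage: PageElement[] = [
  { role: "heading", name: "System Status" },
  { role: "button", name: "Refresh System Status" },
  { role: "button", name: "Save" },
];

// Over-broad: a substring match hits both the heading and the button.
const broad = (el: PageElement) => el.name.includes("System Status");

// Tightened: role plus exact name identifies a single element.
const tight = (el: PageElement) =>
  el.role === "heading" && el.name === "System Status";
```

Calling `resolveStrict(settingsPage, broad)` throws because two elements match, while `resolveStrict(settingsPage, tight)` returns the heading. In the real tests the equivalent change would be a role-based or exact-match locator in place of a loose text selector.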
## 4. Technical Specifications
### 4.1 Priority Ranking (Max Impact First)
1. WebSocket connection failures for real-time logs (19 tests blocked)
2. Logs API timeouts for static log viewing (12 tests blocked)
3. Notifications settings API timeouts/context closure (7 tests blocked)
4. Caddy import session persistence/results (3 tests blocked)
5. Docker integration UI controls missing (2 tests blocked)
6. Strict-mode collisions and wait-helpers (6 tests blocked)
7. Admin access/permissions mismatches (2 tests blocked)
### 4.2 Fix Batches
#### Critical Fixes (Block multiple suites)
- WebSocket connection / event delivery
  - Affected tests: 19 (monitoring/real-time-logs)
  - Root cause: WebSocket never reaches Connected; likely backend
    upgrade/streaming path or proxy config issue.
  - Recommendation: Backend Dev
- Logs API timeouts
  - Affected tests: 12 (tasks/logs-viewing)
  - Root cause: log listing endpoints timing out or blocked in container mode.
  - Recommendation: Backend Dev
- Notifications settings API timeouts
  - Affected tests: 7 (settings/notifications)
  - Root cause: provider/template APIs not responding or UI navigation error
    closing the page context.
  - Recommendation: Backend Dev with Frontend Dev support
- Caddy import session persistence
  - Affected tests: 3 (tasks/caddy-import-*)
  - Root cause: import sessions not persisted or banner data not returned.
  - Recommendation: Backend Dev
#### Secondary Fixes (Quick wins or infra)
- Docker integration UI controls
  - Affected tests: 2 (core/proxy-hosts Docker integration)
  - Root cause: missing/hidden form control for connection source.
  - Recommendation: Frontend Dev
- Strict-mode collisions and wait helpers
  - Affected tests: 6 (settings + monitoring + wait-helpers)
  - Root cause: selectors match multiple elements or URL helper too strict.
  - Recommendation: Playwright Dev
- Admin access/permissions mismatches
  - Affected tests: 2 (tasks/backups guest UI, settings permission uncheck)
  - Root cause: UI visibility vs RBAC mismatch or disabled inputs.
  - Recommendation: Backend Dev with Frontend Dev support
## 5. Effort and Impact Estimates
| Category | Effort | Impact | Notes |
| --- | --- | --- | --- |
| WebSocket connection | L | Very High | Unblocks 19 monitoring tests |
| Logs API timeouts | M | High | Unblocks 12 task tests |
| Notifications API timeouts | M | High | Unblocks 7 settings tests |
| Caddy import sessions | M | Medium | Unblocks 3 task tests |
| Docker integration UI | S | Medium | Unblocks 2 core tests |
| Strict-mode + wait helpers | S | Medium | Unblocks 6 tests |
| Admin access mismatches | S | Low | Unblocks 2 tests |
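
To make the impact column concrete, the running total of unblocked tests can be computed in the table's row order. The sketch below uses only the counts from the Notes column; the names are illustrative.

```typescript
// Rows copied from the table above, in the order listed.
const fixCategories: Array<{ category: string; unblocks: number }> = [
  { category: "WebSocket connection", unblocks: 19 },
  { category: "Logs API timeouts", unblocks: 12 },
  { category: "Notifications API timeouts", unblocks: 7 },
  { category: "Caddy import sessions", unblocks: 3 },
  { category: "Docker integration UI", unblocks: 2 },
  { category: "Strict-mode + wait helpers", unblocks: 6 },
  { category: "Admin access mismatches", unblocks: 2 },
];

// Running total of tests unblocked after each category is fixed.
let running = 0;
const cumulativeUnblocked = fixCategories.map(({ category, unblocks }) => {
  running += unblocks;
  return { category, cumulative: running };
});
```

The first two rows alone cover 31 tests, and all seven rows sum to 51. That is three fewer than the 54 failures counted in section 3.1, so the remaining three should be reconciled against the remediation checklist.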
## 6. Implementation Plan
### Phase 1: WebSocket and Logs APIs
1. Verify that the `/api/v1/logs` WebSocket handshake completes and the server-side stream starts.
2. Validate static logs API endpoints and response time in Docker mode.
3. Confirm UI connects to correct WebSocket endpoint for app/security modes.
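
For step 3, one thing worth checking is how the UI derives the WebSocket URL from the page origin, since browsers generally refuse an insecure `ws://` connection from an HTTPS page. The helper below is a minimal sketch of that derivation. The function name `toWebSocketUrl` and the `localhost:8080` origin are assumptions for illustration, not the application's actual code or port.

```typescript
// Converts an http(s) origin into the matching ws(s) URL for a path.
function toWebSocketUrl(pageOrigin: string, path: string): string {
  let wsOrigin: string;
  if (pageOrigin.startsWith("https://")) {
    wsOrigin = "wss://" + pageOrigin.slice("https://".length);
  } else if (pageOrigin.startsWith("http://")) {
    wsOrigin = "ws://" + pageOrigin.slice("http://".length);
  } else {
    throw new Error(`unsupported origin: ${pageOrigin}`);
  }
  // Avoid a double slash when the origin ends with "/".
  const base = wsOrigin.endsWith("/") ? wsOrigin.slice(0, -1) : wsOrigin;
  const suffix = path.startsWith("/") ? path : "/" + path;
  return base + suffix;
}

// Assumed local origin; the path comes from section 3.2.
const logsSocketUrl = toWebSocketUrl("http://localhost:8080", "/api/v1/logs");
```

Comparing a value like `logsSocketUrl` with the URL the browser actually requests (visible in the network panel or a Playwright trace) helps separate a wrong-endpoint problem from a failed upgrade on the server or proxy.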
### Phase 2: Notifications and Caddy Import Sessions
1. Validate notification providers CRUD endpoints and template endpoints.
2. Ensure notification routes do not crash the page context.
3. Validate import-session persistence and banner retrieval endpoints.
### Phase 3: UI and Test Infrastructure Quick Wins
1. Restore Docker integration connection source control visibility.
2. Tighten selectors in strict-mode failures (system status, user management,
uptime monitor).
3. Adjust wait-helpers URL matching to handle expected navigation timing.
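
For step 3, if the helper turns out to be too strict because it compares full URL strings, one option is to compare normalized paths instead. The sketch below is a hypothetical matcher, not the existing wait-helpers code: `normalizePath` and `urlMatches` are invented names, and it addresses only the comparison, not navigation timing.

```typescript
// Hypothetical helper: reduces a URL to its path, ignoring origin,
// query string, hash fragment, and trailing slashes.
function normalizePath(url: string): string {
  // Strip scheme and host if present.
  const withoutOrigin = url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]*/i, "");
  // Drop the query string and hash fragment.
  const pathOnly = withoutOrigin.split(/[?#]/)[0];
  // Remove trailing slashes, keeping "/" for the root.
  const trimmed = pathOnly.replace(/\/+$/, "");
  return trimmed === "" ? "/" : trimmed;
}

// Accepts either a path string (compared after normalization)
// or a regular expression (tested against the full URL).
function urlMatches(actual: string, expected: string | RegExp): boolean {
  if (expected instanceof RegExp) {
    return expected.test(actual);
  }
  return normalizePath(actual) === normalizePath(expected);
}
```

With this approach, `urlMatches("http://localhost:8080/settings/?tab=1", "/settings")` is true even though the raw strings differ. A predicate like this could then be passed to the existing polling or `waitForURL` logic, which would still own the timeout.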
### Phase 4: RBAC Consistency
1. Ensure guest users cannot see Create Backup UI controls.
2. Ensure permission management inputs reflect actual capability and are
enabled for admin flows.
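
One way to keep UI visibility and server-side enforcement from drifting apart is to derive both from the same capability check. The sketch below shows the idea in TypeScript for illustration only. The `Role` and `Action` names, the capability table, and the 201/403 status codes are assumptions, and the real backend check would live in the server code, not in the frontend.

```typescript
type Role = "admin" | "guest";
type Action = "backup:create" | "permissions:edit";

// Hypothetical single source of truth for capabilities.
const capabilities: Record<Role, ReadonlySet<Action>> = {
  admin: new Set<Action>(["backup:create", "permissions:edit"]),
  guest: new Set<Action>(),
};

function can(role: Role, action: Action): boolean {
  return capabilities[role].has(action);
}

// UI side: render the control only when the role is allowed.
function showCreateBackupButton(role: Role): boolean {
  return can(role, "backup:create");
}

// Server side: the same check decides the HTTP status (assumed codes).
function createBackupStatus(role: Role): number {
  return can(role, "backup:create") ? 201 : 403;
}
```

With both paths reading the same table, a guest neither sees the Create Backup control nor succeeds if the request is sent directly, which is the combination the two failing tests expect.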
## 7. Acceptance Criteria (EARS)
- WHEN the real-time logs page loads, THE SYSTEM SHALL establish a WebSocket
connection and report Connected status within the test timeout.
- WHEN static logs are requested, THE SYSTEM SHALL return log data within
the test timeout for pagination, filtering, and download flows.
- WHEN notification providers/templates are managed, THE SYSTEM SHALL respond
to CRUD requests without page context closure or timeouts.
- WHEN a Caddy import session exists, THE SYSTEM SHALL return the session
banner and import results for review flows.
- WHEN a guest user accesses backups, THE SYSTEM SHALL hide Create Backup
controls and enforce server-side RBAC.
- WHEN strict-mode selectors are used, THE SYSTEM SHALL present a unique
element for each targeted control in settings and monitoring pages.
## 8. Delegation Recommendations
- Backend Dev
  - WebSocket connection and streaming
  - Logs API timeouts
  - Notifications APIs
  - Caddy import session persistence
  - RBAC enforcement for backups and permissions
- Frontend Dev
  - Docker integration UI control visibility
  - UI state handling for notifications if backend responses are valid
- Playwright Dev
  - Strict-mode selector refinements
  - wait-helpers URL matching reliability
## 9. Confidence Score
Confidence: 78 percent
Rationale: Failure clusters are clear and repeated across suites, but root
causes still require endpoint-level confirmation in backend logs and
WebSocket diagnostics.