fix: standardize agent names and add Management agent for orchestration

GitHub Actions
2025-12-05 15:46:31 +00:00
parent d2740fafcc
commit 220cfb585a
10 changed files with 503 additions and 83 deletions


@@ -1,98 +1,216 @@
## 📋 Plan: Security Hardening, User Gateway & Identity
<!--
This file is a placeholder for the current plan. The `Planning` agent must write the detailed plan here (see docs/plans/sample_orchestration_plan.md for a sample).
Subagents will read this file as the single source of truth for the feature implementation.
-->
### 🧐 UX & Context Analysis
<!--
CURRENT SPEC: Aggregated Host Statuses (Uptime) — Endpoint + Dashboard Widget
- Replace this file with the feature spec and Handoff JSON contract for implementing
'Aggregated Host Statuses': an API endpoint grouping uptime monitors by host and
a dashboard widget that shows aggregated host-level health and quick drill-down.
- This document should be used as the single source of truth for developers and handoff.
-->
This plan expands on the initial security hardening to include a full **Identity Provider (IdP)** feature set. This allows Charon to manage users, invite them via email, and let them log in using external providers (SSO), while providing seamless access to downstream apps.
# Current Plan: Aggregated Host Statuses
#### 1. The User Gateway (Forward Auth)
* **Scenario:** Admin shares `jellyseerr.example.com` with a friend.
* **Flow:**
1. Friend visits `jellyseerr.example.com`.
2. Redirected to Charon Login.
3. Logs in via **Plex / Google / GitHub** OR Local Account.
4. Charon verifies access.
5. Charon redirects back to Jellyseerr, injecting `X-Forwarded-User: friend@email.com`.
6. **Magic:** Jellyseerr (configured for header auth) sees the header and logs the friend in automatically. **No second login.**
This feature adds a backend endpoint that returns aggregated health information for upstream hosts
and a frontend Dashboard widget to display the aggregated view. The goal is to provide host-level
health at-a-glance to help identify server-wide outages and quickly navigate to affected services.
#### 2. User Onboarding (SMTP & Invites)
* **Problem:** Admin shouldn't set passwords manually.
* **Solution:** Admin enters email -> Charon sends Invite Link -> User clicks link -> User sets Password & Name.
## Summary
- Endpoint: `GET /api/v1/uptime/hosts/aggregated` (authenticated)
- Backend: Service method + handler + route + GORM query, small in-memory cache, server-side filters
- Frontend: API client, custom React Query hook, `HostStatusesWidget` in Dashboard, demo/test pages
- Acceptance: Auth respects accessible hosts, accurate counts, performance (fast aggregate queries)
#### 3. User-Centric Permissions (Allow/Block Lists)
* **Concept:** Instead of managing groups, Admin manages permissions *per user*.
* **UX:**
* Go to **Users** -> Edit User -> **Permissions** Tab.
* **Mode:** Toggle between **"Allow All (Blacklist)"** or **"Deny All (Whitelist)"**.
* **Exceptions:** Multi-select list of Proxy Hosts.
* *Example:* Set Mode to "Deny All", select "Jellyseerr". User can ONLY access Jellyseerr.
* *Example:* Set Mode to "Allow All", select "Home Assistant". User can access everything EXCEPT Home Assistant.
## Handoff JSON contract (Truth)
Request: `GET /api/v1/uptime/hosts/aggregated`
- Query Params (optional):
- `status` (string): filter results by host status: up|down|pending|maintenance
- `q` (string): search text (host or name)
- `sort_by` (string): `monitor_count|down_count|avg_latency|last_check` (default: `down_count`)
- `order` (string): `asc|desc` (default: `desc`)
- `page` (int): pagination page (default 1)
- `per_page` (int): items per page (default 50)
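The defaults and allowed values listed above can be sketched as a small normalization step. This is an illustrative TypeScript sketch only: the helper names and the `MAX_PER_PAGE` cap are assumptions (the plan states only the default of 50), and the real validation would live in the Go handler.

```typescript
// Hypothetical helper names; only the parameter names and defaults come from the plan.
type HostStatus = "up" | "down" | "pending" | "maintenance";
type SortField = "monitor_count" | "down_count" | "avg_latency" | "last_check";

interface AggregationQuery {
  status?: HostStatus;
  q?: string;
  sort_by: SortField;
  order: "asc" | "desc";
  page: number;
  per_page: number;
}

const HOST_STATUSES: readonly HostStatus[] = ["up", "down", "pending", "maintenance"];
const SORT_FIELDS: readonly SortField[] = ["monitor_count", "down_count", "avg_latency", "last_check"];
const MAX_PER_PAGE = 200; // assumed upper bound, not specified in the plan

// Return the raw value only if it is one of the allowed literals.
function oneOf<T extends string>(allowed: readonly T[], raw: string | undefined): T | undefined {
  const index = allowed.indexOf(raw as T);
  return index >= 0 ? allowed[index] : undefined;
}

// Parse a positive integer, falling back to the default on missing or invalid input.
function positiveInt(raw: string | undefined, fallback: number): number {
  const parsed = Number(raw);
  return raw !== undefined && Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

function normalizeAggregationQuery(raw: Record<string, string | undefined>): AggregationQuery {
  const result: AggregationQuery = {
    sort_by: oneOf(SORT_FIELDS, raw["sort_by"]) ?? "down_count",
    order: raw["order"] === "asc" ? "asc" : "desc",
    page: positiveInt(raw["page"], 1),
    per_page: Math.min(positiveInt(raw["per_page"], 50), MAX_PER_PAGE),
  };
  const statusFilter = oneOf(HOST_STATUSES, raw["status"]);
  if (statusFilter !== undefined) result.status = statusFilter;
  const search = (raw["q"] ?? "").trim();
  if (search !== "") result.q = search;
  return result;
}
```

Invalid values fall back to the documented defaults rather than producing an error; whether the endpoint should instead reject them with a 400 is a design choice the plan leaves open.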
### 🤝 Handoff Contract (The Truth)
#### 1. Auth Verification (Internal API for Caddy)
* **Endpoint:** `GET /api/auth/verify`
* **Response Headers:**
* `X-Forwarded-User`: The user's email or username.
* `X-Forwarded-Groups`: (Future) User roles/groups.
#### 2. SMTP Configuration
Response: 200 JSON
```json
// POST /api/settings/smtp
{
"host": "smtp.gmail.com",
"port": 587,
"username": "admin@example.com",
"password": "app-password",
"from_address": "Charon <no-reply@example.com>",
"encryption": "starttls" // none, ssl, starttls
"aggregated_hosts": [
{
"id": "uuid",
"host": "10.0.0.12",
"name": "web-01",
"status": "down",
"monitor_count": 3,
"counts": { "up": 1, "down": 2, "pending": 0, "maintenance": 0 },
"avg_latency_ms": 257,
"last_check": "2025-12-05T09:54:54Z",
"last_status_change": "2025-12-05T09:53:44Z",
"affected_monitors": [
{ "id": "mon-1", "name": "example-api", "status": "down", "last_check": "2025-12-05T09:54:54Z" },
{ "id": "mon-2", "name": "webapp", "status": "down", "last_check": "2025-12-05T09:52:14Z" }
],
"uptime_24h": 99.3
}
],
"meta": { "page": 1, "per_page": 50, "total": 1 }
}
```
#### 3. User Permissions
Notes:
- All timestamps are ISO 8601 UTC.
- Field names use snake_case (server -> frontend contract per project guidelines).
- Only accessible hosts are returned to the authenticated caller (utilize existing auth handlers).
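The contract above maps onto TypeScript types along these lines. This is a sketch, not the project's actual type file: `AggregatedHostsResponse`, `findContractIssues`, and `isUtcTimestamp` are hypothetical names, `latency` is marked optional because only one of the plan's samples includes it, and the check that `counts` sums to `monitor_count` is an assumption inferred from the sample data.

```typescript
type MonitorStatus = "up" | "down" | "pending" | "maintenance";

interface AggregatedHostCounts { up: number; down: number; pending: number; maintenance: number; }

interface AffectedMonitor {
  id: string;
  name: string;
  status: MonitorStatus;
  last_check: string; // ISO 8601 UTC
  latency?: number;   // assumption: optional
}

interface AggregatedHost {
  id: string;
  host: string;
  name: string;
  status: MonitorStatus;
  monitor_count: number;
  counts: AggregatedHostCounts;
  avg_latency_ms: number;
  last_check: string;         // ISO 8601 UTC
  last_status_change: string; // ISO 8601 UTC
  affected_monitors: AffectedMonitor[];
  uptime_24h?: number;        // only present where heartbeat history exists
}

interface AggregatedHostsResponse {
  aggregated_hosts: AggregatedHost[];
  meta: { page: number; per_page: number; total: number };
}

// True for parseable timestamps that carry the UTC "Z" suffix.
function isUtcTimestamp(value: string): boolean {
  return value.charAt(value.length - 1) === "Z" && !Number.isNaN(Date.parse(value));
}

// Collect human-readable problems instead of throwing, so tests can assert on them.
function findContractIssues(host: AggregatedHost): string[] {
  const issues: string[] = [];
  const c = host.counts;
  if (c.up + c.down + c.pending + c.maintenance !== host.monitor_count) {
    issues.push("counts do not add up to monitor_count");
  }
  if (!isUtcTimestamp(host.last_check)) issues.push("last_check is not ISO 8601 UTC");
  if (!isUtcTimestamp(host.last_status_change)) issues.push("last_status_change is not ISO 8601 UTC");
  host.affected_monitors.forEach((monitor) => {
    if (!isUtcTimestamp(monitor.last_check)) issues.push("monitor " + monitor.id + " has a non-UTC last_check");
  });
  return issues;
}

// The sample host from the contract above.
const sampleHost: AggregatedHost = {
  id: "uuid", host: "10.0.0.12", name: "web-01", status: "down", monitor_count: 3,
  counts: { up: 1, down: 2, pending: 0, maintenance: 0 },
  avg_latency_ms: 257,
  last_check: "2025-12-05T09:54:54Z",
  last_status_change: "2025-12-05T09:53:44Z",
  affected_monitors: [
    { id: "mon-1", name: "example-api", status: "down", last_check: "2025-12-05T09:54:54Z" },
    { id: "mon-2", name: "webapp", status: "down", last_check: "2025-12-05T09:52:14Z" },
  ],
  uptime_24h: 99.3,
};
console.log(findContractIssues(sampleHost));
```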
## Backend Requirements
1. Database
- Ensure index on `uptime_monitors(uptime_host_id)`, `uptime_monitors(status)`, and `uptime_monitors(last_check)`.
- No model changes required for `UptimeHost` or `UptimeMonitor` unless we want an `avg_latency` column cached (optional).
2. Service (in `internal/services/uptime_service.go`)
- Add method: `GetAggregatedHostStatuses(filters AggregationFilter) ([]AggregatedHost, error)`.
- Implementation detail:
- Query should join `uptime_hosts` and `uptime_monitors` and run a `GROUP BY uptime_host_id`.
- Use a SELECT that computes: monitor_count, up_count, down_count, pending_count, maintenance_count, avg_latency, last_check (MAX), last_status_change (MAX).
- Provide a parameter to include a limited list of affected monitors (e.g. top N by last_check) and optional `uptime_24h` calculation where a heartbeat history exists.
- Return GORM structs matching the `AggregatedHost` DTO.
3. Handler (in `internal/api/handlers/uptime_handler.go`)
- Add `func (h *UptimeHandler) AggregatedHosts(c *gin.Context)` that:
- Binds query params; validates and normalizes them.
- Calls `service.GetAggregatedHostStatuses(filters)`.
- Filters the results using `authMiddleware` (maintain accessible hosts list or `authHandler.GetAccessibleHosts` logic).
- Caches the result for `CHARON_UPTIME_AGGREGATION_TTL` (default 30s). Cache strategy: package global in `services` with simple `sync.Map` + TTL.
- Produces a 200 JSON with the contract above.
- Add unit tests and integration tests verifying results and auth scoping.
4. Routes
- Register under protected group in `internal/api/routes/routes.go`:
- `protected.GET("/uptime/hosts/aggregated", uptimeHandler.AggregatedHosts)`
5. Observability
- Add a Prometheus counter/metric: `charon_uptime_aggregated_requests_total` (labels: status, cache_hit true/false).
- Add logs for aggregation errors.
6. Security
- Ensure only authenticated users can access aggregated endpoint.
- Respect `authHandler.GetAccessibleHosts` (or similar) to filter hosts the user should see.
7. Tests
- Unit tests for service logic calculating aggregates (mock DB / in-memory DB fixtures).
- Handler integration tests using the testdb and router that verify JSON response structure, pagination, filters, and auth filtering.
- Perf tests: a basic benchmark to ensure the aggregation query completes within an acceptable time for 10k monitors (e.g. < 200ms outside dev environments; document the specifics).
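The per-host aggregates described in the Service item (monitor count, per-status counts, average latency, latest `last_check` and `last_status_change`) amount to a group-by over monitor rows. The sketch below shows that logic in TypeScript for illustration only; the plan's actual implementation is a GORM `GROUP BY` query in Go, and the `MonitorRow` field names here are assumptions.

```typescript
type MonitorStatus = "up" | "down" | "pending" | "maintenance";

// Hypothetical row shape; the real column names may differ.
interface MonitorRow {
  uptime_host_id: string;
  status: MonitorStatus;
  latency_ms: number;
  last_check: string;         // ISO 8601 UTC
  last_status_change: string; // ISO 8601 UTC
}

interface HostAggregate {
  uptime_host_id: string;
  monitor_count: number;
  counts: Record<MonitorStatus, number>;
  avg_latency_ms: number;
  last_check: string;
  last_status_change: string;
}

// Same-format ISO 8601 UTC strings sort chronologically, so string comparison acts as MAX().
function laterOf(a: string, b: string): string {
  return a > b ? a : b;
}

function aggregateByHost(rows: MonitorRow[]): HostAggregate[] {
  const groups = new Map<string, { agg: HostAggregate; latencySum: number }>();
  for (const row of rows) {
    let group = groups.get(row.uptime_host_id);
    if (group === undefined) {
      group = {
        agg: {
          uptime_host_id: row.uptime_host_id,
          monitor_count: 0,
          counts: { up: 0, down: 0, pending: 0, maintenance: 0 },
          avg_latency_ms: 0,
          last_check: row.last_check,
          last_status_change: row.last_status_change,
        },
        latencySum: 0,
      };
      groups.set(row.uptime_host_id, group);
    }
    group.agg.monitor_count += 1;
    group.agg.counts[row.status] += 1;
    group.latencySum += row.latency_ms;
    group.agg.last_check = laterOf(group.agg.last_check, row.last_check);
    group.agg.last_status_change = laterOf(group.agg.last_status_change, row.last_status_change);
  }
  const result: HostAggregate[] = [];
  groups.forEach((group) => {
    // Every group has at least one row, so the division is safe.
    group.agg.avg_latency_ms = Math.round(group.latencySum / group.agg.monitor_count);
    result.push(group.agg);
  });
  return result;
}
```

Grouping monitor rows alone never produces a host with zero monitors. If such hosts should appear in the response, the query would probably need to start from `uptime_hosts` with an outer join, which this sketch does not model.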
## Frontend Requirements
1. API client changes (`frontend/src/api/uptime.ts`)
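- Add `export const getAggregatedHosts = async (params?: AggregationQueryParams) => client.get<AggregatedHostsResponse>('/uptime/hosts/aggregated', { params }).then(r => r.data)` (the endpoint returns the `{ aggregated_hosts, meta }` envelope from the contract, not a bare array).
- Add new TypeScript types for `AggregatedHost`, `AggregatedHostCounts`, `AffectedMonitor`, and the `AggregatedHostsResponse` envelope.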
2. React Query Hook (`frontend/src/hooks/useAggregatedHosts.ts`)
- `useAggregatedHosts` should accept params similar to query params (filters), and accept `enabled` flag.
- Use TanStack Query with `refetchInterval: 30_000` and `staleTime: 30_000` to match backend TTL.
3. Dashboard Widget (`frontend/src/components/Dashboard/HostStatusesWidget.tsx`)
- Shows high-level summary: total hosts, down_count, up_count, pending.
- Clickable host rows navigate to the uptime or host detail page.
- Visuals: small status badge, host name, counts, avg latency, last check time.
- Accessible: all interactive elements keyboard and screen-reader navigable.
- Fallback: if the aggregated endpoint is not found or returns 403, display a short explanatory message with a link to uptime page.
4. Dashboard Page Update (`frontend/src/pages/Dashboard.tsx`)
- Add `HostStatusesWidget` to the Dashboard layout (prefer 2nd column near `UptimeWidget`).
5. Tests
- Unit tests for `HostStatusesWidget` rendering different states.
- Mock API responses for `useAggregatedHosts` using the existing test utilities.
- Add Storybook story if used in repo (optional).
6. Styling
- Keep styling consistent with `UptimeWidget` (dark-card, status badges, mini bars).
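The widget summary and the hook settings above could be derived roughly as follows. This is a sketch under assumptions: `summarizeHosts`, `topAffectedHosts`, and `aggregatedHostsQueryOptions` are hypothetical names, the summary counts hosts by their host-level `status` (the plan does not say whether it means hosts or monitors), and the options object only mirrors TanStack Query's `queryKey`, `refetchInterval`, `staleTime`, and `enabled` fields without importing the library.

```typescript
type HostStatus = "up" | "down" | "pending" | "maintenance";

// Subset of the AggregatedHost contract that the widget needs.
interface WidgetHost {
  id: string;
  name: string;
  status: HostStatus;
  counts: { up: number; down: number; pending: number; maintenance: number };
}

interface WidgetSummary { total: number; up: number; down: number; pending: number; }

// Count hosts by host-level status; maintenance hosts only contribute to the total.
function summarizeHosts(hosts: WidgetHost[]): WidgetSummary {
  const summary: WidgetSummary = { total: hosts.length, up: 0, down: 0, pending: 0 };
  for (const host of hosts) {
    if (host.status === "up") summary.up += 1;
    else if (host.status === "down") summary.down += 1;
    else if (host.status === "pending") summary.pending += 1;
  }
  return summary;
}

// Top N hosts by number of down monitors, with ties broken by name for a stable order.
function topAffectedHosts(hosts: WidgetHost[], limit: number): WidgetHost[] {
  return hosts
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.counts.down - a.counts.down || a.name.localeCompare(b.name))
    .slice(0, limit);
}

const AGGREGATION_TTL_MS = 30_000; // matches the backend's default 30s cache TTL

// Plain options object that a useAggregatedHosts hook could pass on to useQuery.
function aggregatedHostsQueryOptions(params: Record<string, string | number>, enabled: boolean = true) {
  return {
    queryKey: ["uptime", "hosts", "aggregated", params] as const,
    refetchInterval: AGGREGATION_TTL_MS,
    staleTime: AGGREGATION_TTL_MS,
    enabled,
  };
}
```

Putting `params` in the query key means each filter combination is cached separately on the client, which seems to fit the server-side filters described in the contract.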
## Acceptance Criteria
1. API
- `GET /api/v1/uptime/hosts/aggregated` returns aggregated host objects in the correct format.
- Query params `status`, `q`, `sort_by`, `order`, `page`, `per_page` work as expected.
- The endpoint respects user-specific host access permissions.
- Endpoint adheres to TTL caching; cache invalidation occurs after the TTL expires or when an underlying monitor status change triggers it.
2. Backend Tests
- Unit tests cover all aggregation branches and logic (e.g. zero-monitor host, mixed statuses, all down host).
- Integration tests validate auth-scoped responses.
3. Frontend UI
- Widget displays host-level counts and shows a list of top N hosts with status badges.
- Clicking a host navigates to the uptime or host detail page.
- Widget refreshes according to TTL and reacts to manual refreshes.
- UI has automated tests covering rendering with typical API responses, filtering and pagination UI behavior.
4. Performance
- Aggregation query responds within acceptable time for typical deployments (document target; e.g. < 200ms for 5k monitors), or we add a follow-up plan to add precomputation.
## Example API Contract (Sample Request + Response)
Request:
```http
GET /api/v1/uptime/hosts/aggregated?sort_by=down_count&order=desc&page=1&per_page=20
Authorization: Bearer <token>
```
Response:
```json
// POST /api/users
{
"email": "friend@example.com",
"role": "user",
"permission_mode": "deny_all", // or "allow_all"
"permitted_hosts": [1, 4, 5] // List of ProxyHost IDs to treat as exceptions
"aggregated_hosts": [
{
"id": "39b6f7c2-2a5c-47d7-9c9d-1d7f1977dabc",
"host": "10.0.10.12",
"name": "production-web-1",
"status": "down",
"monitor_count": 3,
"counts": {"up": 1, "down": 2, "pending": 0, "maintenance": 0},
"avg_latency_ms": 257,
"last_check": "2025-12-05T09:54:54Z",
"last_status_change": "2025-12-05T09:53:44Z",
"affected_monitors": [
{"id":"m-01","name":"api.example","status":"down","last_check":"2025-12-05T09:54:54Z","latency":105},
{"id":"m-02","name":"www.example","status":"down","last_check":"2025-12-05T09:52:14Z","latency":401}
],
"uptime_24h": 98.77
}
],
"meta": {"page":1,"per_page":20,"total":1}
}
```
### 🏗️ Phase 1: Security Hardening (Quick Wins)
1. **Secure Headers:** `Content-Security-Policy`, `Strict-Transport-Security`, `X-Frame-Options`.
2. **Cookie Security:** `HttpOnly`, `Secure`, `SameSite=Strict`.
## Error cases
- 401 Unauthorized — Invalid or missing token.
- 403 Forbidden — Caller lacks host access.
- 500 Internal Server Error — DB / aggregation error.
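On the frontend, these status codes could be mapped to widget states, including the fallback the widget requirements describe for a missing or forbidden endpoint. A minimal sketch follows; the state names, the message texts, and the `/uptime` link path are all assumptions made for illustration.

```typescript
// Hypothetical widget states; the plan only names the status codes and the fallback behaviour.
type WidgetState =
  | { kind: "ready" }
  | { kind: "signin" }
  | { kind: "fallback"; message: string; linkTo: string }
  | { kind: "error"; message: string };

const UPTIME_PAGE_PATH = "/uptime"; // assumed route of the existing uptime page

function widgetStateForHttpStatus(code: number): WidgetState {
  if (code >= 200 && code < 300) return { kind: "ready" };
  // 401: the token is missing or invalid, so the user has to authenticate again.
  if (code === 401) return { kind: "signin" };
  // 403 or 404: show a short explanation with a link to the uptime page.
  if (code === 403 || code === 404) {
    return {
      kind: "fallback",
      message: "Aggregated host statuses are not available for this account.",
      linkTo: UPTIME_PAGE_PATH,
    };
  }
  // 500 and anything unexpected: generic error state.
  return { kind: "error", message: "Could not load aggregated host statuses." };
}
```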
### 🏗️ Phase 2: Backend Core (User & SMTP)
1. **Models:**
* `User`: Add `InviteToken`, `InviteExpires`, `PermissionMode` (string), `Permissions` (Many-to-Many with ProxyHost).
* `ProxyHost`: Add `ForwardAuthEnabled` (bool).
* `Setting`: Add keys for `smtp_host`, `smtp_port`, etc.
2. **Logic:**
* `internal/services/mail`: Implement SMTP sender.
* `internal/api/handlers/user.go`: Add `InviteUser` handler and Permission logic.
## Observability & Operational Notes
- Metrics: `charon_uptime_aggregated_requests_total`, `charon_uptime_aggregated_cache_hits_total`.
- Cache TTL: default 30s via `CHARON_UPTIME_AGGREGATION_TTL` env var.
- Logging: Rate-limited errors and aggregation durations logged to the general logger.
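The TTL cache behaviour can be illustrated with a small in-memory cache. The plan's version is a Go package-level `sync.Map`, so the class below is only a TypeScript sketch of the same idea; `TtlCache` and its injectable clock are hypothetical, and the TTL is passed in milliseconds because the plan does not specify the format of `CHARON_UPTIME_AGGREGATION_TTL`.

```typescript
// Minimal TTL cache sketch with an injectable clock so expiry can be tested deterministically.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or undefined if it is missing or has expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Drop everything, e.g. when a monitor status change is observed.
  invalidate(): void {
    this.entries.clear();
  }
}
```

One design point to consider: if the cache is shared across callers, it likely needs to hold the unfiltered aggregate (with per-user host filtering applied after the cache lookup) or be keyed per user, so that one caller's accessible-host view is never served to another.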
### 🏗️ Phase 3: SSO Implementation
1. **Library:** Use `github.com/markbates/goth` or `golang.org/x/oauth2`.
2. **Models:** `SocialAccount` (UserID, Provider, ProviderID, Email).
3. **Routes:**
* `GET /auth/:provider`: Start OAuth flow.
* `GET /auth/:provider/callback`: Handle return, create/link user, set session.
## Follow-ups & Optional Enhancements
1. Add an endpoint-level `since` parameter that returns delta/trend information (e.g. change in down_count in last 24 hours).
2. Background precompute task (materialized aggregated table) for very large installations.
3. Add a configuration to show `affected_monitors` collapsed/expanded per host for faster page loads.
### 🏗️ Phase 4: Forward Auth Integration
1. **Caddy:** Configure `forward_auth` directive to point to Charon API.
2. **Logic:** `VerifyAccess` handler:
* Check if User is logged in.
* Fetch User's `PermissionMode` and `Permissions`.
* If `allow_all`: Grant access UNLESS host is in `Permissions`.
* If `deny_all`: Deny access UNLESS host is in `Permissions`.
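The `VerifyAccess` decision above reduces to a single exception-list check. The sketch below expresses it in TypeScript for illustration, although the handler itself would be Go; `GatewayUser` and `canAccessHost` are hypothetical names, and the field names follow the `permission_mode` and `permitted_hosts` keys from the user permissions payload.

```typescript
type PermissionMode = "allow_all" | "deny_all";

// A logged-in user's gateway permissions; null stands for "not logged in".
interface GatewayUser {
  permission_mode: PermissionMode;
  permitted_hosts: number[]; // ProxyHost IDs treated as exceptions to the mode
}

function canAccessHost(user: GatewayUser | null, hostId: number): boolean {
  if (user === null) return false; // not logged in: always deny
  const isException = user.permitted_hosts.indexOf(hostId) >= 0;
  // allow_all grants access unless the host is listed; deny_all denies unless it is listed.
  return user.permission_mode === "allow_all" ? !isException : isException;
}
```

With `deny_all` and `permitted_hosts: [1, 4, 5]`, only hosts 1, 4, and 5 are reachable; with `allow_all` and the same list, every host except those three is reachable.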
## Short List of Files To Change
- Backend:
- backend/internal/services/uptime_service.go (add aggregation method)
- backend/internal/api/handlers/uptime_handler.go (add handler method)
- backend/internal/api/routes/routes.go (register new route)
- backend/internal/services/uptime_service_test.go (add tests)
- backend/internal/api/handlers/uptime_handler_test.go (add handler tests)
- backend/internal/models/uptime.go / uptime_host.go (index recommendations or small schema updates if needed)
### 🎨 Phase 5: Frontend Implementation
1. **Settings:** New "SMTP" and "SSO" tabs in Settings page.
2. **User List:** "Invite User" button.
3. **User Edit:** New "Permissions" tab with "Allow/Block" toggle and Host selector.
4. **Login Page:** Add "Sign in with Google/Plex/GitHub" buttons.
- Frontend:
- frontend/src/api/uptime.ts (add `getAggregatedHosts`)
- frontend/src/hooks/useAggregatedHosts.ts (new hook)
- frontend/src/components/Dashboard/HostStatusesWidget.tsx (new widget)
- frontend/src/pages/Dashboard.tsx (add widget)
- frontend/src/components/__tests__/HostStatusesWidget.test.tsx (new tests)
### 📚 Phase 6: Documentation
1. **SSO Guides:** How to get Client IDs from Google/GitHub.
2. **Header Auth:** Guide on configuring Jellyseerr/Grafana to trust Charon.
---
If you want, I can now scaffold the backend service method + handler and the frontend API client and widget as a follow-up PR.