CrowdSec Hub Presets Sync & Apply Plan (feature/beta-release)

Current State (what exists today)

  • Backend: backend/internal/api/handlers/crowdsec_handler.go exposes ListPresets (returns curated list from backend/internal/crowdsec/presets.go) and a stubbed PullAndApplyPreset that only validates slug and returns preview or HTTP 501 when apply=true. No real hub sync or apply.
  • Backend uses CommandExecutor for cscli decisions only; no hub pull/install logic and no cache/backups beyond file write backups in WriteFile and import flow.
  • Frontend: frontend/src/pages/CrowdSecConfig.tsx calls pullAndApplyCrowdsecPreset then falls back to local writeCrowdsecFile apply. Preset catalog merges backend list with frontend/src/data/crowdsecPresets.ts. Errors 501/404 are surfaced as info to keep local apply working. Overview toggle/start/stop already wired to startCrowdsec/stopCrowdsec.
  • Docs: docs/cerberus.md still notes CrowdSec integration is a placeholder; no hub sync described.

Incident Triage: CrowdSec preset pull/apply 502/500 (feature/beta-release)

  • Logs to pull first: backend app/GIN logs under /app/data/logs/charon.log (or data/logs/charon.log in dev) via backend/cmd/api/main.go; look for warnings "crowdsec preset pull failed" / "crowdsec preset apply failed" emitted in backend/internal/api/handlers/crowdsec_handler.go. Access logs will also show 502/500 for POST /api/v1/admin/crowdsec/presets/pull and /apply.
  • Routes and code paths: handlers PullPreset and ApplyPreset live in backend/internal/api/handlers/crowdsec_handler.go and delegate to HubService.Pull/Apply in backend/internal/crowdsec/hub_sync.go with cache helpers in backend/internal/crowdsec/hub_cache.go. Data dir used is data/crowdsec with cache under data/crowdsec/hub_cache from backend/internal/api/routes/routes.go.
  • Quick checks before repro:
    • (1) Cerberus is enabled (feature.cerberus.enabled setting or FEATURE_CERBERUS_ENABLED/CERBERUS_ENABLED env); otherwise the handler returns 404 early.
    • (2) cscli is on PATH and executable (HubService uses the real executor and calls cscli version/cscli hub install).
    • (3) Outbound HTTPS to https://hub.crowdsec.net is reachable (fallback after cscli hub list).
    • (4) Cache dir data/crowdsec/hub_cache is writable and contains per-slug metadata.json, bundle.tgz, preview.yaml.
    • (5) Backup path is writable (apply renames data/crowdsec to data/crowdsec.backup.<ts>).
  • Likely 502 on pull: hub cache unavailable or init failed (cache dir permission), invalid slug, hub index fetch errors (cscli hub list -o json or direct GET /api/index.json), download blocked/size >25MiB, preview/download HTTP non-200, or cache write errors. Handler logs warning and returns 502 with error string.
  • Likely 500 on apply: backup rename fails, cscli install fails with no cache fallback (if pull never succeeded or cache expired/missing), cache read errors (metadata.json/bundle.tgz unreadable), tar extraction rejects symlinks/unsafe paths, or rollback after extract failure. Handler writes CrowdsecPresetEvent (if DB reachable) with backup path and returns 500 with backup hint.
  • Validation steps during triage: verify cache entry freshness (TTL 24h) via metadata.json timestamps; confirm cscli hub install <slug> succeeds manually; if cscli missing, ensure prior pull populated cache; test hub egress with curl to hub index and archive URLs; check file ownership/permissions on data/crowdsec and data/crowdsec/hub_cache; confirm log lines around warnings for exact error message; inspect backup directory to restore if partial apply.
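
The freshness check in the validation steps can be scripted rather than eyeballed. The following is a minimal Go sketch, assuming metadata.json stores an RFC 3339 fetched_at field next to the etag; the JSON key names and the helper names expiresAt and isFresh are hypothetical and should be checked against hub_cache.go.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// cacheTTL is the 24h TTL named in the triage notes.
const cacheTTL = 24 * time.Hour

// cacheMetadata holds the subset of metadata.json needed here.
// The JSON key names are assumptions, not taken from hub_cache.go.
type cacheMetadata struct {
	ETag      string    `json:"etag"`
	FetchedAt time.Time `json:"fetched_at"` // RFC 3339 timestamp
}

// expiresAt parses raw metadata.json content and returns the moment the
// cache entry stops being fresh (fetched_at + ttl).
func expiresAt(raw []byte, ttl time.Duration) (time.Time, error) {
	var m cacheMetadata
	if err := json.Unmarshal(raw, &m); err != nil {
		return time.Time{}, fmt.Errorf("parse metadata: %w", err)
	}
	if m.FetchedAt.IsZero() {
		return time.Time{}, errors.New("metadata has no fetched_at")
	}
	return m.FetchedAt.Add(ttl), nil
}

// isFresh reports whether the entry is still within its TTL at time now.
func isFresh(raw []byte, now time.Time, ttl time.Duration) (bool, error) {
	exp, err := expiresAt(raw, ttl)
	if err != nil {
		return false, err
	}
	return now.Before(exp), nil
}
```

During triage, read data/crowdsec/hub_cache/<slug>/metadata.json with os.ReadFile and pass the bytes to isFresh together with time.Now() and cacheTTL. A false result means apply cannot rely on the cached bundle if cscli is missing.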

Goal

Implement real CrowdSec Hub preset sync + apply on backend (using cscli or direct hub index) with caching, validation, backups, rollback, and wire the UI to new endpoints so operators can preview/apply hub items with clear status/errors.

Backend Plan (handlers, helpers, storage)

  1. Route adjustments (gin group under /admin/crowdsec in backend/internal/api/handlers/crowdsec_handler.go):
    • Replace stub endpoint with POST /admin/crowdsec/presets/pull → fetch hub item and cache; returns metadata + preview + cache key/etag.
    • Add POST /admin/crowdsec/presets/apply → apply previously pulled item by cache key/slug; performs backup + cscli install + optional restart.
    • Keep GET /admin/crowdsec/presets but include hub/etag info and whether cached locally.
    • Optional: GET /admin/crowdsec/presets/cache/:slug → raw preview/download for UI.
  2. Hub sync helper (new backend/internal/crowdsec/hub_sync.go):
    • Provide type HubClient interface { FetchIndex(ctx) (HubIndex, error); FetchPreset(ctx, slug) (PresetBundle, error) } with real impl using either: a) cscli hub list -o json and cscli hub update + cscli hub install <item> (preferred if cscli present), or b) direct fetch of https://hub.crowdsec.net/ or GitHub raw .index.json + tarball download.
    • Validate downloads: size limits, tarball path traversal guard, checksum/etag compare, basic YAML validation.
  3. Caching (new backend/internal/crowdsec/hub_cache.go):
    • Cache pulled bundles under ${DataDir}/hub_cache/<slug>/ with index metadata (etag, fetched_at, source URL) and preview YAML.
    • Expose LoadCachedPreset(slug) and StorePreset(slug, bundle); evict stale on TTL (configurable, default 24h) or when etag changes.
  4. Apply flow (extend handler):
    • Pull: fetch index, resolve slug, download bundle to cache, return preview + warnings (missing cscli, requires restart, etc.).
    • Apply: before modifying anything, back up DataDir to backupDir := DataDir + ".backup." + timestamp (mirroring the current write/import backups). Then install by one of two paths: a) if cscli is available, run cscli hub update and cscli hub install <slug> (or collection path), optionally followed by a cscli decisions list sanity check, using CommandExecutor with a context timeout; b) if cscli is absent, extract the bundle into DataDir with sanitized paths, preserving permissions. In both cases, write an audit record to the DB table crowdsec_preset_events (new model in backend/internal/models).
    • On failure: restore backup (rename back), surface error + backup path.
  5. Status and restart:
    • After apply, optionally call h.Executor.Stop/Start if running to reload config; or cscli service reload when available. Return reload_performed flag.
  6. Validation & security hardening:
    • Enforce Cerberus enablement check (isCerberusEnabled) on all new routes.
    • Path sanitization with filepath.Clean, limit tar extraction to DataDir, reject symlinks/abs paths.
    • Timeouts on all external calls; default 10s pull, 15s apply.
    • Log with context: slug, etag, source, backup path; redact secrets.
  7. Migration of curated list:
    • Keep curated presets in backend/internal/crowdsec/presets.go but add Source: "hub" for hub-backed items and include RequiresHub true when not bundled.
    • ListPresets should merge curated + live hub index when available, mark availability per slug (cached, remote-only, local-bundled).
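
The merge described in step 7 could look roughly like the Go sketch below. The Preset struct, the mergePresets helper, the "curated" source value, and the rule that a cached entry wins over the other two states are all assumptions made for illustration; only the three availability labels and the Source/RequiresHub fields come from the plan.

```go
package main

import "sort"

// Availability labels follow the three states named in the plan.
const (
	AvailCached       = "cached"
	AvailRemoteOnly   = "remote-only"
	AvailLocalBundled = "local-bundled"
)

// Preset is a hypothetical, trimmed view of a catalog entry.
type Preset struct {
	Slug         string
	Source       string // "hub" for hub-backed items; "curated" is an assumed value
	RequiresHub  bool   // true when the preset is not bundled locally
	Availability string
}

// mergePresets combines the curated list with slugs from a live hub index.
// hubSlugs may be nil when the hub is unreachable; cached reports which
// slugs currently have a local cache entry.
func mergePresets(curated []Preset, hubSlugs []string, cached map[string]bool) []Preset {
	bySlug := make(map[string]Preset, len(curated)+len(hubSlugs))
	for _, p := range curated {
		bySlug[p.Slug] = p
	}
	// Hub entries not in the curated list are added as hub-only presets.
	for _, slug := range hubSlugs {
		if _, ok := bySlug[slug]; !ok {
			bySlug[slug] = Preset{Slug: slug, Source: "hub", RequiresHub: true}
		}
	}
	out := make([]Preset, 0, len(bySlug))
	for _, p := range bySlug {
		switch {
		case cached[p.Slug]:
			p.Availability = AvailCached
		case p.RequiresHub:
			p.Availability = AvailRemoteOnly
		default:
			p.Availability = AvailLocalBundled
		}
		out = append(out, p)
	}
	// Sort by slug so the API response is stable across calls.
	sort.Slice(out, func(i, j int) bool { return out[i].Slug < out[j].Slug })
	return out
}
```

Passing a nil hubSlugs slice returns only the curated entries, which matches the intent that the catalog stays usable when the hub index cannot be fetched.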

Frontend Plan (API wiring + UX)

  1. API client updates in frontend/src/api/presets.ts:
    • Replace pullAndApplyCrowdsecPreset with pullCrowdsecPreset({ slug }) and applyCrowdsecPreset({ slug, cache_key }); include response typing for preview/status/errors.
    • Add getCrowdsecPresetCache(slug) if backend exposes cache preview.
  2. CrowdSec config page frontend/src/pages/CrowdSecConfig.tsx:
    • Use new mutations: pull to show preview + metadata (etag, fetched_at, source); disable local fallback unless backend says apply_supported=false.
    • Show status strip (success/error) and backup path from apply response; surface reload flag and errors inline.
    • Gate preset actions when Cerberus disabled; show tooltip if hub unreachable.
    • Keep local backup + manual file apply as last-resort only when backend explicitly returns 501/NotImplemented.
  3. Overview page frontend/src/pages/Security.tsx:
    • No UI change except error surfacing when start/stop fails due to hub apply requiring reload; show toast from handler message.
  4. Import page frontend/src/pages/ImportCrowdSec.tsx:
    • Add note linking to presets apply so users prefer presets over raw package imports.

Hub Fetch/Validate/Apply Flow (detailed)

  1. Pull
    • Handler: CrowdsecHandler.PullPreset(ctx) (new) calls HubClient.FetchPreset → HubCache.StorePreset → returns {preset, preview_yaml, etag, cache_key, fetched_at}.
    • If hub unavailable, return 503 with message; UI shows retry/cached copy option.
  2. Apply
    • Handler: CrowdsecHandler.ApplyPreset(ctx) loads cache by slug/cache_key → backupCurrentConfig() → InstallPreset() (cscli or manual) → optional restart → returns {status:"applied", backup, reloaded:true/false}.
    • On error: restore backup, include {status:"failed", backup, error}.
  3. Caching & rollback
    • Cache directory per slug with checksum file; TTL enforced on pull; apply uses cached bundle unless force_refetch flag.
    • Backups stored with timestamp; keep last N (configurable). Provide restoration note in response for UI.
  4. Validation
    • Tarball extraction guard: reject absolute paths, .., symlinks; limit total size.
    • YAML sanity: parse key scenario/collection files to confirm they are readable; treat issues as logged warnings rather than blockers, unless a file fails to parse at all.
    • Require explicit apply=true separate from pull; no implicit apply on pull.
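
The extraction guard from the validation step can be sketched with the Go standard library as follows. The helper names safeTarget, extractBundle, and writeFile are hypothetical, the 25 MiB cap is borrowed from the download limit in the triage notes (whether extraction uses the same cap is an assumption), and files are written with a fixed mode for brevity instead of preserving permissions.

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// maxBundleBytes caps the total extracted size (assumed: same 25 MiB as the download limit).
const maxBundleBytes = 25 << 20

// safeTarget maps a tar entry name to a path under destDir, rejecting
// empty names, absolute paths, and any ".." component.
func safeTarget(destDir, name string) (string, error) {
	if name == "" || filepath.IsAbs(name) || strings.HasPrefix(name, "/") {
		return "", fmt.Errorf("unsafe path %q", name)
	}
	for _, part := range strings.Split(filepath.ToSlash(name), "/") {
		if part == ".." {
			return "", fmt.Errorf("path traversal in %q", name)
		}
	}
	return filepath.Join(destDir, filepath.Clean(name)), nil
}

// writeFile copies exactly size bytes from src into target, creating parent dirs.
func writeFile(src io.Reader, target string, size int64) error {
	if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
		return err
	}
	f, err := os.OpenFile(target, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)
	if err != nil {
		return err
	}
	if _, err := io.CopyN(f, src, size); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// extractBundle unpacks a gzip-compressed tarball into destDir and stops
// at the first unsafe entry.
func extractBundle(r io.Reader, destDir string) error {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return fmt.Errorf("open gzip: %w", err)
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	var total int64
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return fmt.Errorf("read tar: %w", err)
		}
		target, err := safeTarget(destDir, hdr.Name)
		if err != nil {
			return err
		}
		switch hdr.Typeflag {
		case tar.TypeXGlobalHeader:
			// Archive-level metadata record; nothing to write.
		case tar.TypeDir:
			if err := os.MkdirAll(target, 0o755); err != nil {
				return err
			}
		case tar.TypeReg:
			total += hdr.Size
			if total > maxBundleBytes {
				return errors.New("bundle exceeds size limit")
			}
			if err := writeFile(tr, target, hdr.Size); err != nil {
				return err
			}
		default:
			// Symlinks, hard links, devices, and other types are rejected.
			return fmt.Errorf("unsupported entry type for %q", hdr.Name)
		}
	}
}
```

Because extractBundle returns on the first rejected entry, a partially written DataDir is possible; that is the case the backup and restore step in the apply flow is meant to cover.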

Security Considerations

  • Only allow these endpoints when Cerberus is enabled and the user is authenticated with admin scope.
  • Use CommandExecutor to shell out to cscli; restrict PATH and working dir; do not pass user-controlled args without whitelist.
  • Network egress: if hub URL configurable, validate scheme is https and host is allowlisted (crowdsec official or configured mirror).
  • Rate limit pull/apply (simple in-memory token bucket) to avoid abuse.
  • Logging: include slug and etag, omit file contents; redact download URLs if they contain tokens (unlikely).
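
As an illustration of the network egress rule, the Go sketch below validates a configurable hub URL before any request is made. The function name validateHubURL and the defaultHubHosts map are hypothetical; only hub.crowdsec.net is taken from this document, and a configured mirror would have to be added to the allowlist explicitly.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// defaultHubHosts is an assumed allowlist containing only the official hub host.
var defaultHubHosts = map[string]bool{"hub.crowdsec.net": true}

// validateHubURL accepts raw only if it parses, uses https, and names an
// allowlisted host. Matching is exact, so look-alike hosts such as
// hub.crowdsec.net.evil.example are rejected.
func validateHubURL(raw string, allowed map[string]bool) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse hub URL: %w", err)
	}
	if u.Scheme != "https" {
		return nil, fmt.Errorf("hub URL scheme must be https, got %q", u.Scheme)
	}
	// Hostname strips any port; compare case-insensitively.
	host := strings.ToLower(u.Hostname())
	if !allowed[host] {
		return nil, fmt.Errorf("hub host %q is not allowlisted", host)
	}
	return u, nil
}
```

Running this check once at configuration load and again just before each fetch would keep a later settings change from bypassing the allowlist.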

Required Tests

  • Backend unit/integration:
    • backend/internal/api/handlers/crowdsec_handler_test.go: success and error cases for PullPreset (hub reachable/unreachable, invalid slug), ApplyPreset (cscli success, cscli missing fallback, apply fails and restores backup), ListPresets merging cached hub entries.
    • backend/internal/crowdsec/hub_sync_test.go: parse index JSON, validate tar extraction guards, TTL eviction.
    • backend/internal/crowdsec/hub_cache_test.go: store/load/evict logic and checksum verification.
    • backend/internal/api/handlers/crowdsec_exec_test.go: ensure executor timeouts/commands constructed for cscli hub calls.
  • Frontend unit/UI:

Docs Updates

  • Update docs/cerberus.md CrowdSec section with new hub preset flow, backup/rollback notes, and requirement for cscli availability when using hub.
  • Update docs/features.md to list “CrowdSec Hub presets sync/apply (admin)” and mention offline curated fallback.
  • Add short troubleshooting entry in docs/troubleshooting/crowdsec.md (new) for hub unreachable, checksum mismatch, or cscli missing.

Migration Notes

  • Existing curated presets remain but are marked as bundled; UI should continue to show them even if hub unreachable.
  • Stub endpoint POST /admin/crowdsec/presets/pull/apply is replaced by separate pull and apply; frontend must switch to new API paths before backend removal to avoid 404.
  • Backward compatibility: keep returning 501 from old endpoint until frontend merged; remove once new routes live and tested.