CrowdSec Hub Presets Sync & Apply Plan (feature/beta-release)
Current State (what exists today)
- Backend: backend/internal/api/handlers/crowdsec_handler.go exposes `ListPresets` (returns curated list from backend/internal/crowdsec/presets.go) and a stubbed `PullAndApplyPreset` that only validates slug and returns preview or HTTP 501 when `apply=true`. No real hub sync or apply.
- Backend uses `CommandExecutor` for `cscli decisions` only; no hub pull/install logic and no cache/backups beyond file write backups in `WriteFile` and import flow.
- Frontend: frontend/src/pages/CrowdSecConfig.tsx calls `pullAndApplyCrowdsecPreset` then falls back to local `writeCrowdsecFile` apply. Preset catalog merges backend list with frontend/src/data/crowdsecPresets.ts. Errors 501/404 are surfaced as info to keep local apply working. Overview toggle/start/stop already wired to `startCrowdsec`/`stopCrowdsec`.
- Docs: docs/cerberus.md still notes CrowdSec integration is a placeholder; no hub sync described.
Incident Triage: CrowdSec preset pull/apply 502/500 (feature/beta-release)
- Logs to pull first: backend app/GIN logs under `/app/data/logs/charon.log` (or `data/logs/charon.log` in dev) via backend/cmd/api/main.go; look for warnings "crowdsec preset pull failed" / "crowdsec preset apply failed" emitted in backend/internal/api/handlers/crowdsec_handler.go. Access logs will also show 502/500 for POST `/api/v1/admin/crowdsec/presets/pull` and `/apply`.
- Routes and code paths: handlers `PullPreset` and `ApplyPreset` live in backend/internal/api/handlers/crowdsec_handler.go and delegate to `HubService.Pull/Apply` in backend/internal/crowdsec/hub_sync.go with cache helpers in backend/internal/crowdsec/hub_cache.go. Data dir used is `data/crowdsec` with cache under `data/crowdsec/hub_cache` from backend/internal/api/routes/routes.go.
- Quick checks before repro: (1) Cerberus enabled (`feature.cerberus.enabled` setting or `FEATURE_CERBERUS_ENABLED`/`CERBERUS_ENABLED` env) or handler returns 404 early; (2) `cscli` on PATH and executable (`HubService` uses real executor and calls `cscli version`/`cscli hub install`); (3) outbound HTTPS to https://hub.crowdsec.net reachable (fallback after `cscli hub list`); (4) cache dir writable `data/crowdsec/hub_cache` and contains per-slug `metadata.json`, `bundle.tgz`, `preview.yaml`; (5) backup path writable (apply renames `data/crowdsec` to `data/crowdsec.backup.<ts>`).
- Likely 502 on pull: hub cache unavailable or init failed (cache dir permission), invalid slug, hub index fetch errors (`cscli hub list -o json` or direct GET `/api/index.json`), download blocked/size >25MiB, preview/download HTTP non-200, or cache write errors. Handler logs warning and returns 502 with error string.
- Likely 500 on apply: backup rename fails, `cscli` install fails with no cache fallback (if pull never succeeded or cache expired/missing), cache read errors (`metadata.json`/`bundle.tgz` unreadable), tar extraction rejects symlinks/unsafe paths, or rollback after extract failure. Handler writes `CrowdsecPresetEvent` (if DB reachable) with backup path and returns 500 with `backup` hint.
- Validation steps during triage: verify cache entry freshness (TTL 24h) via `metadata.json` timestamps; confirm `cscli hub install <slug>` succeeds manually; if cscli missing, ensure prior pull populated cache; test hub egress with curl to hub index and archive URLs; check file ownership/permissions on `data/crowdsec` and `data/crowdsec/hub_cache`; confirm log lines around warnings for exact error message; inspect backup directory to restore if partial apply.
Goal
Implement real CrowdSec Hub preset sync + apply on backend (using cscli or direct hub index) with caching, validation, backups, rollback, and wire the UI to new endpoints so operators can preview/apply hub items with clear status/errors.
Backend Plan (handlers, helpers, storage)
- Route adjustments (gin group under `/admin/crowdsec` in backend/internal/api/handlers/crowdsec_handler.go):
  - Replace stub endpoint with `POST /admin/crowdsec/presets/pull` → fetch hub item and cache; returns metadata + preview + cache key/etag.
  - Add `POST /admin/crowdsec/presets/apply` → apply previously pulled item by cache key/slug; performs backup + cscli install + optional restart.
  - Keep `GET /admin/crowdsec/presets` but include hub/etag info and whether cached locally.
  - Optional: `GET /admin/crowdsec/presets/cache/:slug` → raw preview/download for UI.
- Hub sync helper (new backend/internal/crowdsec/hub_sync.go):
  - Provide `type HubClient interface { FetchIndex(ctx) (HubIndex, error); FetchPreset(ctx, slug) (PresetBundle, error) }` with real impl using either: a) `cscli hub list -o json` and `cscli hub update` + `cscli hub install <item>` (preferred if cscli present), or b) direct fetch of https://hub.crowdsec.net/ or GitHub raw `.index.json` + tarball download.
  - Validate downloads: size limits, tarball path traversal guard, checksum/etag compare, basic YAML validation.
- Caching (new backend/internal/crowdsec/hub_cache.go):
  - Cache pulled bundles under `${DataDir}/hub_cache/<slug>/` with index metadata (etag, fetched_at, source URL) and preview YAML.
  - Expose `LoadCachedPreset(slug)` and `StorePreset(slug, bundle)`; evict stale on TTL (configurable, default 24h) or when etag changes.
- Apply flow (extend handler):
  - `Pull`: fetch index, resolve slug, download bundle to cache, return preview + warnings (missing cscli, requires restart, etc.).
  - `Apply`: before modify, run `backupDir := DataDir + ".backup." + timestamp` (mirror current write/import backups). Then: a) If cscli available: `cscli hub update`, `cscli hub install <slug>` (or collection path), maybe `cscli decisions list` sanity check. Use `CommandExecutor` with context timeout. b) If cscli absent: extract bundle into DataDir with sanitized paths; preserve permissions. c) Write audit record to DB table `crowdsec_preset_events` (new model in backend/internal/models).
  - On failure: restore backup (rename back), surface error + backup path.
- Status and restart:
  - After apply, optionally call `h.Executor.Stop/Start` if running to reload config; or `cscli service reload` when available. Return `reload_performed` flag.
- Validation & security hardening:
  - Enforce `Cerberus` enablement check (`isCerberusEnabled`) on all new routes.
  - Path sanitization with `filepath.Clean`, limit tar extraction to DataDir, reject symlinks/abs paths.
  - Timeouts on all external calls; default 10s pull, 15s apply.
  - Log with context: slug, etag, source, backup path; redact secrets.
- Migration of curated list:
  - Keep curated presets in backend/internal/crowdsec/presets.go but add `Source: "hub"` for hub-backed items and include `RequiresHub` true when not bundled.
  - `ListPresets` should merge curated + live hub index when available, mark availability per slug (cached, remote-only, local-bundled).
Frontend Plan (API wiring + UX)
- API client updates in frontend/src/api/presets.ts:
  - Replace `pullAndApplyCrowdsecPreset` with `pullCrowdsecPreset({ slug })` and `applyCrowdsecPreset({ slug, cache_key })`; include response typing for preview/status/errors.
  - Add `getCrowdsecPresetCache(slug)` if backend exposes cache preview.
- CrowdSec config page frontend/src/pages/CrowdSecConfig.tsx:
  - Use new mutations: `pull` to show preview + metadata (etag, fetched_at, source); disable local fallback unless backend says `apply_supported=false`.
  - Show status strip (success/error) and backup path from apply response; surface reload flag and errors inline.
  - Gate preset actions when Cerberus disabled; show tooltip if hub unreachable.
  - Keep local backup + manual file apply as last-resort only when backend explicitly returns 501/NotImplemented.
- Overview page frontend/src/pages/Security.tsx:
  - No UI change except error surfacing when start/stop fails due to hub apply requiring reload; show toast from handler message.
- Import page frontend/src/pages/ImportCrowdSec.tsx:
  - Add note linking to presets apply so users prefer presets over raw package imports.
Hub Fetch/Validate/Apply Flow (detailed)
- Pull
  - Handler: `CrowdsecHandler.PullPreset(ctx)` (new) calls `HubClient.FetchPreset` → `HubCache.StorePreset` → returns `{preset, preview_yaml, etag, cache_key, fetched_at}`.
  - If hub unavailable, return 503 with message; UI shows retry/cached copy option.
- Apply
  - Handler: `CrowdsecHandler.ApplyPreset(ctx)` loads cache by slug/cache_key → `backupCurrentConfig()` → `InstallPreset()` (cscli or manual) → optional restart → returns `{status:"applied", backup, reloaded:true/false}`.
  - On error: restore backup, include `{status:"failed", backup, error}`.
- Caching & rollback
  - Cache directory per slug with checksum file; TTL enforced on pull; apply uses cached bundle unless `force_refetch` flag.
  - Backups stored with timestamp; keep last N (configurable). Provide restoration note in response for UI.
- Validation
  - Tarball extraction guard: reject absolute paths, `..`, symlinks; limit total size.
  - YAML sanity: parse key scenario/collection files to ensure readable; log warning not blocker unless parse fails.
  - Require explicit `apply=true` separate from pull; no implicit apply on pull.
Security Considerations
- Only allow these endpoints when Cerberus enabled and user authenticated to admin scope.
- Use `CommandExecutor` to shell out to cscli; restrict PATH and working dir; do not pass user-controlled args without whitelist.
- Network egress: if hub URL configurable, validate scheme is https and host is allowlisted (crowdsec official or configured mirror).
- Rate limit pull/apply (simple in-memory token bucket) to avoid abuse.
- Logging: include slug and etag, omit file contents; redact download URLs if they contain tokens (unlikely).
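Two of the checks above, allowlisting arguments before they reach cscli and validating a configurable hub URL, are small enough to sketch directly. The slug pattern below is an assumed shape for hub item names such as `crowdsecurity/sshd`, not a rule published by CrowdSec, and `validateSlug` and `validateHubURL` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// slugPattern is an assumed shape for preset slugs: lowercase "name" or
// "author/name", each part starting with an alphanumeric character. Requiring
// an alphanumeric first character keeps a slug from being parsed as a flag.
var slugPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9_-]*(/[a-z0-9][a-z0-9_-]*)?$`)

// validateSlug rejects anything outside the assumed slug shape before the
// value is passed as a command argument or used as a cache directory name.
func validateSlug(slug string) error {
	if len(slug) > 128 || !slugPattern.MatchString(slug) {
		return fmt.Errorf("invalid preset slug %q", slug)
	}
	return nil
}

// validateHubURL accepts only https URLs without embedded credentials whose
// host is present in allowedHosts.
func validateHubURL(raw string, allowedHosts map[string]bool) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse hub url: %w", err)
	}
	if u.Scheme != "https" {
		return fmt.Errorf("hub url must use https, got %q", u.Scheme)
	}
	if u.User != nil {
		return fmt.Errorf("hub url must not contain credentials")
	}
	host := strings.ToLower(u.Hostname())
	if !allowedHosts[host] {
		return fmt.Errorf("hub host %q is not allowlisted", host)
	}
	return nil
}

func main() {
	allowed := map[string]bool{"hub.crowdsec.net": true}
	fmt.Println(validateSlug("crowdsecurity/sshd"))
	fmt.Println(validateSlug("--help"))
	fmt.Println(validateHubURL("https://hub.crowdsec.net/api/index.json", allowed))
	fmt.Println(validateHubURL("http://hub.crowdsec.net/api/index.json", allowed))
}
```

A configured mirror would be supported by adding its host to the allowlist map rather than by relaxing the scheme or credential checks.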
Required Tests
- Backend unit/integration:
  - `backend/internal/api/handlers/crowdsec_handler_test.go`: success and error cases for `PullPreset` (hub reachable/unreachable, invalid slug), `ApplyPreset` (cscli success, cscli missing fallback, apply fails and restores backup), `ListPresets` merging cached hub entries.
  - `backend/internal/crowdsec/hub_sync_test.go`: parse index JSON, validate tar extraction guards, TTL eviction.
  - `backend/internal/crowdsec/hub_cache_test.go`: store/load/evict logic and checksum verification.
  - `backend/internal/api/handlers/crowdsec_exec_test.go`: ensure executor timeouts/commands constructed for cscli hub calls.
- Frontend unit/UI:
  - frontend/src/pages/tests/CrowdSecConfig.test.tsx: pull shows preview, apply success shows backup path/reload flag, hub failure falls back to cached/local message, Cerberus disabled disables actions.
  - frontend/src/api/tests/presets.test.ts: client hits new endpoints and maps response.
  - frontend/src/pages/tests/Security.test.tsx: start/stop toasts remain correct when apply errors bubble.
Docs Updates
- Update docs/cerberus.md CrowdSec section with new hub preset flow, backup/rollback notes, and requirement for cscli availability when using hub.
- Update docs/features.md to list “CrowdSec Hub presets sync/apply (admin)” and mention offline curated fallback.
- Add short troubleshooting entry in docs/troubleshooting/crowdsec.md (new) for hub unreachable, checksum mismatch, or cscli missing.
Migration Notes
- Existing curated presets remain but are marked as bundled; UI should continue to show them even if hub unreachable.
- Stub endpoint `POST /admin/crowdsec/presets/pull/apply` is replaced by separate `pull` and `apply`; frontend must switch to new API paths before backend removal to avoid 404.
- Backward compatibility: keep returning 501 from old endpoint until frontend merged; remove once new routes live and tested.