chore: remove generated hub index files from repo

GitHub Actions
2025-12-11 05:27:11 +00:00
parent 97c2ef9b71
commit 8687a05ec0
25 changed files with 1899 additions and 188 deletions

@@ -1,82 +1,153 @@
# CrowdSec Preset Apply Cache Miss — Bot Mitigation Essentials
**Date:** December 11, 2025
**Incident:** `CrowdSec preset add error: Apply failed: load cache: load cache for bot-mitigation-essentials: cache miss. Backup created at data/crowdsec.backup.20251210-193359`
# CrowdSec Preset Matching Fix
## Context Snapshot
- **Observed error path:** `HubService.Apply()` → `loadCacheMeta()` → `HubCache.Load()` returns `ErrCacheMiss`, while apply already created a backup at `data/crowdsec.backup.*`, indicating we fell through the cscli path and then the manual cache path without a cached bundle.
- **Key components in play:**
- Cache layer: [backend/internal/crowdsec/hub_cache.go](backend/internal/crowdsec/hub_cache.go) (`Store`, `Load`, `List`, `Exists`, `Touch`)
- Hub orchestration: [backend/internal/crowdsec/hub_sync.go](backend/internal/crowdsec/hub_sync.go) (`Pull`, `Apply`, `loadCacheMeta`, `runCSCLI`, `extractTarGz`)
- HTTP surface: [backend/internal/api/handlers/crowdsec_handler.go](backend/internal/api/handlers/crowdsec_handler.go) (`PullPreset`, `ApplyPreset`, `ListPresets`, `GetCachedPreset`)
- Coverage and repro baselines: [backend/internal/crowdsec/hub_pull_apply_test.go](backend/internal/crowdsec/hub_pull_apply_test.go), [backend/internal/api/handlers/crowdsec_pull_apply_integration_test.go](backend/internal/api/handlers/crowdsec_pull_apply_integration_test.go)
- **Hypotheses to validate:**
1. **Cache never created** for slug `bot-mitigation-essentials` (e.g., hub index didn't contain slug, slug mismatch, or pull failure masked by fallback logging).
2. **Cache existed but expired/evicted** (24h TTL default in `NewHubCache`, `ErrCacheExpired` treated as miss) before apply.
3. **cscli path failed** and manual path fell back to cache that was missing; backup already created → rollback not restoring correctly on miss.
4. **Slug naming drift** between curated presets and hub index (e.g., `crowdsecurity/bot-mitigation-essentials` vs `bot-mitigation-essentials`).
## Problem
The user reports "preset not found in hub" for all three curated presets:
1. `honeypot-friendly-defaults`
2. `crowdsecurity/base-http-scenarios`
3. `geolocation-aware`
## Plan (phased; minimize requests)
### Phase 1 — Fast Forensics (no new mutations)
- Inspect logs for the failing apply to capture:
- `crowdsec preset apply failed` entries in [backend/internal/api/handlers/crowdsec_handler.go](backend/internal/api/handlers/crowdsec_handler.go) (ensure we log `cache_key`, `backup_path`, `hub_base_url`).
- Prior `preset pulled and cached successfully` entries for the same slug to see if pull ever succeeded.
- Check cache filesystem state without new pulls:
- List `data/hub_cache/` and `backend/data/hub_cache/` for `bot-mitigation-essentials` to confirm presence of `metadata.json`, `bundle.tgz`, `preview.yaml`.
- Read `metadata.json` to confirm `retrieved_at` vs TTL and `cache_key`.
- Confirm whether curated presets include the slug:
- Inspect `ListCuratedPresets()` in [backend/internal/crowdsec/presets.go](backend/internal/crowdsec/presets.go) (if present) and compare to hub index slugs.
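The `retrieved_at` versus TTL check above can be sketched as a small helper. This is a minimal illustration, not the project's code: the `cacheMeta` struct, the `cacheStateAt` and `mustParse` names, and the RFC 3339 timestamp encoding are assumptions; only the `cache_key` and `retrieved_at` field names and the 24h default come from this plan.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// defaultTTL mirrors the 24h default this plan attributes to NewHubCache.
const defaultTTL = 24 * time.Hour

// cacheMeta holds the two metadata.json fields named in the plan.
// The exact JSON layout and timestamp format are assumptions.
type cacheMeta struct {
	CacheKey    string    `json:"cache_key"`
	RetrievedAt time.Time `json:"retrieved_at"`
}

// mustParse is a hypothetical helper that parses an RFC 3339 timestamp or panics.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// cacheStateAt reports "hit" or "expired" for a metadata document at time now,
// together with the TTL that remains (zero once expired).
func cacheStateAt(metaJSON []byte, now time.Time, ttl time.Duration) (string, time.Duration, error) {
	var meta cacheMeta
	if err := json.Unmarshal(metaJSON, &meta); err != nil {
		return "", 0, fmt.Errorf("parse metadata.json: %w", err)
	}
	remaining := ttl - now.Sub(meta.RetrievedAt)
	if remaining <= 0 {
		return "expired", 0, nil
	}
	return "hit", remaining, nil
}

func main() {
	// Example metadata; the cache_key value is a placeholder.
	meta := []byte(`{"cache_key":"example-key","retrieved_at":"2025-12-10T19:00:00Z"}`)
	state, remaining, err := cacheStateAt(meta, mustParse("2025-12-11T05:00:00Z"), defaultTTL)
	fmt.Println(state, remaining, err)
}
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to compare against timestamps taken from the failing apply's logs.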
## Root Cause Analysis
### Phase 2 — Reproduce with Minimal Requests
- Execute one controlled pull + apply sequence for `bot-mitigation-essentials` only:
1. `POST /api/v1/admin/crowdsec/presets/pull {slug}` — capture response `cache_key`, `etag`, and verify cache files written.
2. `POST /api/v1/admin/crowdsec/presets/apply {slug}` — watch for fallback message `load cache for ... cache miss`.
- Capture logs around these calls to see which path ran:
- `HubService.Apply()` branch (`hasCSCLI`, `runCSCLI` success/fail, then `loadCacheMeta`).
- `HubCache.Load()` result (hit/expired/miss).
- Validate backup rollback: ensure `data/crowdsec.backup.*` is restored when cache miss occurs.
### 1. `crowdsecurity/base-http-scenarios`
This preset **exists** in the CrowdSec Hub (verified via `curl`), but the application fails to find it.
- **Cause**: The `fetchIndexHTTPFromURL` function in `backend/internal/crowdsec/hub_sync.go` attempts to unmarshal the index JSON into a `HubIndex` struct.
- The `HubIndex` struct expects a JSON object with an `"items"` field (compiled format).
- The raw hub index (from `raw.githubusercontent.com`) uses a "Map of Maps" structure (source format) with keys like `"collections"`, `"parsers"`, etc., and **no** `"items"` field.
- `json.Unmarshal` succeeds but leaves `idx.Items` empty (nil).
- The code assumes success and returns the empty index, bypassing the fallback to `parseRawIndex`.
- `findIndexEntry` then searches an empty list and returns false.
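The silent failure described above can be reproduced in isolation. The sketch below uses simplified stand-ins for `HubIndex` and its item type (the real field set is an assumption) and a hypothetical `decodeCompiled` helper; it only demonstrates that `json.Unmarshal` ignores unknown keys, so a source-format document decodes without error and leaves `Items` empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndexItem and HubIndex are simplified stand-ins for the project's types.
type IndexItem struct {
	Name string `json:"name"`
}

type HubIndex struct {
	Items []IndexItem `json:"items"`
}

// decodeCompiled mimics only the first step of fetchIndexHTTPFromURL:
// decoding the payload as the compiled ("items") format.
func decodeCompiled(data []byte) (HubIndex, error) {
	var idx HubIndex
	err := json.Unmarshal(data, &idx)
	return idx, err
}

func main() {
	// Source-format payload: sections keyed by type, no "items" field.
	// The entry body here is illustrative, not copied from the real hub index.
	raw := []byte(`{"collections":{"crowdsecurity/base-http-scenarios":{}}}`)

	idx, err := decodeCompiled(raw)
	// err is nil and Items is empty, so an error check alone cannot detect the format mismatch.
	fmt.Println(err == nil, len(idx.Items))
}
```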
### Phase 3 — Code Fix Design (targeted, low-risk)
- **Cache resilience:**
- In `HubService.Apply()`, when `runCSCLI` fails **and** `loadCacheMeta` returns `ErrCacheMiss`, attempt a single `Pull()` retry (hub available) before failing, but guard with context and size limits.
- When `ErrCacheExpired`, auto-evict + repull once to refresh.
- **Slug correctness & curated mapping:**
- Ensure curated preset slug list includes `crowdsecurity/bot-mitigation-essentials` (verify file [backend/internal/crowdsec/presets.go](backend/internal/crowdsec/presets.go)).
- In `findIndexEntry` (hub_sync.go), consider accepting slug without namespace by matching suffix when unique to avoid hub miss.
- **Better guidance and rollback:**
- In `ApplyPreset` handler, if cache miss occurs after backup creation, ensure rollback succeeds and return `backup` + actionable guidance (e.g., "Pull preset again; cache missing").
- Add explicit log when rollback triggers due to cache miss, including backup path and slug.
- **TTL visibility:**
- Add `retrieved_at` and TTL remaining to `GetCachedPreset` and `ListPresets` outputs to help UI warn about expired cache.
- **CSCLI guardrails:**
- If `cscli` is not found or returns non-zero, include stderr in logs and surface a friendlier hint in the error payload.
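The cache-resilience idea above (one repull on miss or expiry, then fail) could look roughly like this. It is a sketch under stated assumptions: `loadWithSingleRepull` and `errHubDown` are hypothetical names, the cache and hub are reduced to plain function values, and the context and size-limit guards the plan calls for are omitted for brevity. Eviction of an expired entry is also left to the `pull` callback.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	// Sentinel errors named in the plan; redefined here so the sketch is self-contained.
	ErrCacheMiss    = errors.New("cache miss")
	ErrCacheExpired = errors.New("cache expired")
	// errHubDown is a hypothetical error used only to exercise the failure path.
	errHubDown = errors.New("hub unreachable")
)

// loadWithSingleRepull loads a cached bundle and, on a miss or expiry,
// pulls exactly once before trying the cache a second and final time.
func loadWithSingleRepull(slug string, load func(string) (string, error), pull func(string) error) (string, error) {
	bundle, err := load(slug)
	if err == nil {
		return bundle, nil
	}
	if !errors.Is(err, ErrCacheMiss) && !errors.Is(err, ErrCacheExpired) {
		return "", err // unrelated failure: do not retry
	}
	if pullErr := pull(slug); pullErr != nil {
		return "", fmt.Errorf("repull %q after %v: %w", slug, err, pullErr)
	}
	return load(slug)
}

func main() {
	cache := map[string]string{}
	load := func(slug string) (string, error) {
		if b, ok := cache[slug]; ok {
			return b, nil
		}
		return "", ErrCacheMiss
	}
	pull := func(slug string) error {
		cache[slug] = "bundle.tgz"
		return nil
	}
	bundle, err := loadWithSingleRepull("crowdsecurity/bot-mitigation-essentials", load, pull)
	fmt.Println(bundle, err)
}
```

Because the retry is a single, explicit second call rather than a loop, a hub outage still surfaces as one clear error instead of repeated requests.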
### 2. `honeypot-friendly-defaults` & `geolocation-aware`
These presets are defined with `Source: "charon-curated"` and `RequiresHub: false`.
- **Cause**: They do not exist in the CrowdSec Hub. The "preset not found" error is correct behavior if `Hub.Pull` is called for them.
- **Implication**: The frontend or handler should not be attempting to `Pull` these presets from the Hub, or the backend should handle them differently (e.g., by generating local configuration).
### Phase 4 — Tests & Repro Harness
- Add regression tests:
- `HubService` unit: `Apply` with `ErrCacheMiss` triggers single repull then succeeds (mock HTTP + cache).
- Integration handler: simulate missing cache after pull (evict between pull/apply) → expect repull or clear error and rollback confirmed.
- Slug normalization test: `bot-mitigation-essentials` (no namespace) maps to `crowdsecurity/bot-mitigation-essentials` when hub index only has the namespaced entry.
- Backup rollback test: ensure `data/crowdsec` restored on cache-miss failure.
- Extend logging assertions in existing tests to validate `cache_key` and `backup` presence in error responses.
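For the slug normalization case, the matching rule under test might be sketched as follows. `matchSlug` is a hypothetical helper, not the existing `findIndexEntry`; it assumes the index can be reduced to a list of slug strings and that a bare slug should resolve only when exactly one namespaced entry ends with it.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSlug returns the index slug matching want. An exact match wins;
// otherwise a namespaced entry ending in "/"+want is accepted only if it is unique.
func matchSlug(indexSlugs []string, want string) (string, bool) {
	for _, s := range indexSlugs {
		if s == want {
			return s, true
		}
	}
	match := ""
	for _, s := range indexSlugs {
		if strings.HasSuffix(s, "/"+want) {
			if match != "" {
				return "", false // ambiguous: more than one namespace offers this slug
			}
			match = s
		}
	}
	return match, match != ""
}

func main() {
	index := []string{
		"crowdsecurity/base-http-scenarios",
		"crowdsecurity/bot-mitigation-essentials",
	}
	fmt.Println(matchSlug(index, "bot-mitigation-essentials"))
}
```

Rejecting ambiguous suffix matches keeps the fallback from silently selecting a preset from the wrong namespace.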
## Implementation Plan
### Phase 5 — Observability & UX polish
- Add a lightweight cache status endpoint or extend `ListPresets` to include `cache_state: [hit|expired|miss]` per slug.
- Frontend (CrowdSecConfig.tsx) follow-up (future PR): surface cache age, "repull" CTA on cache miss, and show backup path when apply fails. (Keep frontend changes out of this fix unless necessary.)
### 1. Fix Index Parsing in `backend/internal/crowdsec/hub_sync.go`
Modify `fetchIndexHTTPFromURL` to correctly detect the raw index format.
- **Current Logic**:
```go
if err := json.Unmarshal(data, &idx); err != nil {
	// Try parsing as raw index
	if rawIdx, rawErr := parseRawIndex(data, target); rawErr == nil { ... }
}
```
- **New Logic**:
```go
if err := json.Unmarshal(data, &idx); err != nil || len(idx.Items) == 0 {
	// If unmarshal failed OR resulted in empty items (likely raw index format),
	// try parsing as raw index.
	if rawIdx, rawErr := parseRawIndex(data, target); rawErr == nil {
		return rawIdx, nil
	}
	// If both failed, return original error (or new error if unmarshal succeeded but empty)
}
```
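A self-contained version of the new logic, including the "both failed" branch that the snippet leaves as a comment, might look like the sketch below. The types are simplified, `parseRawIndexSketch` is a hypothetical stand-in for the real `parseRawIndex` (its `target` parameter is dropped), and `errEmptyIndex` is an assumed name for the error returned when decoding succeeds but yields no items.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sort"
)

// Simplified stand-ins for the project's index types.
type IndexItem struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

type HubIndex struct {
	Items []IndexItem `json:"items"`
}

// errEmptyIndex is a hypothetical error for "decoded fine, but nothing in it".
var errEmptyIndex = errors.New("hub index contained no items")

// parseRawIndexSketch flattens the source format (section -> name -> entry)
// into a list of items. Entry bodies are ignored in this sketch.
func parseRawIndexSketch(data []byte) (HubIndex, error) {
	var raw map[string]map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return HubIndex{}, err
	}
	var idx HubIndex
	for section, entries := range raw {
		for name := range entries {
			idx.Items = append(idx.Items, IndexItem{Name: name, Type: section})
		}
	}
	if len(idx.Items) == 0 {
		return HubIndex{}, errEmptyIndex
	}
	// Map iteration order is random; sort for stable output.
	sort.Slice(idx.Items, func(i, j int) bool { return idx.Items[i].Name < idx.Items[j].Name })
	return idx, nil
}

// decodeIndex tries the compiled format first and falls back to the raw
// format when decoding fails or produces no items.
func decodeIndex(data []byte) (HubIndex, error) {
	var idx HubIndex
	err := json.Unmarshal(data, &idx)
	if err == nil && len(idx.Items) > 0 {
		return idx, nil
	}
	if rawIdx, rawErr := parseRawIndexSketch(data); rawErr == nil {
		return rawIdx, nil
	}
	if err != nil {
		return HubIndex{}, fmt.Errorf("decode hub index: %w", err)
	}
	return HubIndex{}, errEmptyIndex
}

func main() {
	raw := []byte(`{"collections":{"crowdsecurity/base-http-scenarios":{}}}`)
	idx, err := decodeIndex(raw)
	fmt.Println(idx.Items, err)
}
```

Returning a distinct error for the empty case means callers never receive a nil error together with an empty index, which is the condition that made `findIndexEntry` fail silently.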
### Phase 6 — Verification Checklist (one pass)
1. `go test ./backend/internal/crowdsec ./backend/internal/api/handlers -run 'Pull|Apply' -v` (or focused test names added above). Quote the pattern so the shell does not treat `|` as a pipe.
2. `cd backend && go test ./...` to ensure no regressions.
3. Manual: pull + apply `crowdsecurity/bot-mitigation-essentials` twice; second apply should hit cache without backup churn.
4. Confirm logs show cache hit and no `cache miss` warnings; backup directory not recreated on cache hit.
5. Validate data directories remain git-ignored (`/data/`, `/backend/data/`, backups under `/data/backups/`).
### 2. Verify `parseRawIndex`
Ensure `parseRawIndex` correctly handles the `collections` section and extracts the `crowdsecurity/base-http-scenarios` entry.
- The existing implementation iterates over the map and should correctly extract entries.
- `sanitizeSlug` is verified to handle the slug correctly.
## Config File Review
- **.gitignore** — already ignores `/data/` and `/data/backups/`; covers cache/backup artifacts (`backend/data/`). No change needed.
- **.dockerignore** — excludes `data/` and `backend/data/`, keeping hub cache/backup out of build context. No change needed.
- **.codecov.yml** — excludes `backend/data/**`; cache/backup coverage not expected. No change needed.
- **Dockerfile** — installs `cscli`; ensure version is recent enough for hub pulls (currently `CROWDSEC_VERSION=1.7.4`). No adjustments required for this fix, but verify the image still includes cscli after build.
### 3. (Future/Separate Task) Handle Charon-Curated Presets
- The handler `PullPreset` currently calls `Hub.Pull` blindly.
- It should check `RequiresHub` from the preset definition.
- If `RequiresHub` is false, it should skip the Hub pull and potentially perform a local "install" (or return success if no action is needed).
- *Note: This plan focuses on fixing the matching issue for the hub-based preset.*
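The `RequiresHub` check described above could be sketched like this. The `Preset` struct is reduced to the fields this plan mentions, and `pullIfRequired` plus the `hubPull` callback are hypothetical names standing in for the handler logic and `Hub.Pull`; the local "install" step is deliberately left out.

```go
package main

import "fmt"

// Preset carries only the fields referenced in this plan.
type Preset struct {
	Slug        string
	Source      string
	RequiresHub bool
}

// pullIfRequired calls hubPull only for presets that live in the hub.
// It reports whether a pull was actually performed.
func pullIfRequired(p Preset, hubPull func(slug string) error) (bool, error) {
	if !p.RequiresHub {
		// Charon-curated preset: nothing to fetch from the hub.
		return false, nil
	}
	if err := hubPull(p.Slug); err != nil {
		return false, fmt.Errorf("pull %s from hub: %w", p.Slug, err)
	}
	return true, nil
}

func main() {
	hubPull := func(slug string) error {
		fmt.Println("pulling", slug)
		return nil
	}
	presets := []Preset{
		{Slug: "honeypot-friendly-defaults", Source: "charon-curated", RequiresHub: false},
		{Slug: "crowdsecurity/base-http-scenarios", Source: "hub", RequiresHub: true},
	}
	for _, p := range presets {
		pulled, err := pullIfRequired(p, hubPull)
		fmt.Println(p.Slug, pulled, err)
	}
}
```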
## Deliverables
- Patch for cache-miss resilience and slug normalization in `HubService.Apply()` and helpers.
- Error/logging improvements in `ApplyPreset` handler.
- Regression tests covering cache-miss + repull, slug normalization, and rollback behavior.
- Optional: cache-status enrichment for UI consumption (if small and low-risk).
## Verification Steps
1. Run `curl` to fetch the raw index (already done).
2. Apply the fix to `hub_sync.go`.
3. Run `go test ./backend/internal/crowdsec/...` to verify the fix.
4. Attempt to pull `crowdsecurity/base-http-scenarios` again.
# CrowdSec Presets UI Improvements
## Problem
The current CrowdSec Presets UI uses a simple native `<select>` dropdown. As the number of presets grows (especially with the Hub integration), this becomes unwieldy. Users cannot search for presets, sort them, or easily distinguish between curated and Hub presets.
## Goals
1. **Search**: Allow users to filter presets by title, description, or slug.
2. **Sort**: Allow users to sort presets by Alphabetical order, Type, or Source.
3. **UI**: Replace the `<select>` with a more robust, scrollable list view with search and sort controls.
## Implementation Plan
### 1. State Management
Modify `frontend/src/pages/CrowdSecConfig.tsx` to add state for search and sort.
```typescript
const [searchQuery, setSearchQuery] = useState('')
const [sortBy, setSortBy] = useState<'alpha' | 'type' | 'source'>('alpha')
```
### 2. Filtering and Sorting Logic
Update the `presetCatalog` logic or create a derived `filteredPresets` list.
* **Filter**: Check if `searchQuery` is included in `title`, `description`, or `slug` (case-insensitive).
* **Sort**:
* `alpha`: Sort by `title` (A-Z).
* `type`: Sort by `type` (if available, otherwise fallback to title). *Note: The current `CrowdsecPreset` type might need to expose `type` (collection, scenario, etc.) if it's not already clear. If not available, we might infer it or skip this sort option for now.*
* `source`: Sort by `source` (e.g., `charon-curated` vs `hub`).
### 3. UI Components
Replace the `<select>` element with a custom UI block.
* **Search Input**: A standard text input at the top.
* **Sort Controls**: A small dropdown or set of buttons to toggle sort order.
* **List View**: A scrollable `div` (max-height constrained) rendering the list of filtered presets.
* Each item should show the `title` and maybe a small badge for `source` or `status` (installed/cached).
* Clicking an item selects it (updates `selectedPresetSlug`).
* The selected item should be visually highlighted.
### 4. Detailed Design
```tsx
<div className="space-y-2">
  <div className="flex gap-2">
    <Input
      placeholder="Search presets..."
      value={searchQuery}
      onChange={(e) => setSearchQuery(e.target.value)}
      className="flex-1"
    />
    <select
      value={sortBy}
      onChange={(e) => setSortBy(e.target.value as typeof sortBy)}
      className="..."
    >
      <option value="alpha">Name (A-Z)</option>
      <option value="source">Source</option>
    </select>
  </div>

  <div className="border border-gray-700 rounded-lg max-h-60 overflow-y-auto bg-gray-900">
    {filteredPresets.map(preset => (
      <div
        key={preset.slug}
        onClick={() => setSelectedPresetSlug(preset.slug)}
        className={`p-2 cursor-pointer hover:bg-gray-800 ${selectedPresetSlug === preset.slug ? 'bg-blue-900/30 border-l-2 border-blue-500' : ''}`}
      >
        <div className="font-medium">{preset.title}</div>
        <div className="text-xs text-gray-400 flex justify-between">
          <span>{preset.slug}</span>
          <span>{preset.source}</span>
        </div>
      </div>
    ))}
  </div>
</div>
```
## Verification Steps
1. Verify search filters the list correctly.
2. Verify sorting changes the order of items.
3. Verify clicking an item selects it and updates the preview/details view below.
4. Verify the UI handles empty search results gracefully.
# Documentation Updates
## Tasks
- [x] Update `docs/features.md` with new CrowdSec integration details (Hub Presets, Console Enrollment).
- [x] Update `docs/security.md` with instructions for using the new UI and Console Enrollment.
- [x] Create `docs/reports/crowdsec_integration_summary.md` summarizing all changes.