# CrowdSec Preset Pull/Apply Flow - Debug Report
Source: `Charon/docs/reports/crowdsec-preset-pull-apply-debug.md` (last modified 2025-12-12 19:21:44 +00:00, commit 9ad3afbd22)
## Issue Summary
A user reported that pulling CrowdSec presets appeared to succeed, but applying them then failed with a "preset not cached" error. This suggested one of the following:
1. Pull was failing silently
2. Cache was not being saved correctly
3. Apply was looking in the wrong location
4. Cache key mismatch between pull and apply
## Investigation Results
### Architecture Overview
The CrowdSec preset system has three main components:
1. **HubCache** (`backend/internal/crowdsec/hub_cache.go`)
   - Stores presets on disk at `{dataDir}/hub_cache/{slug}/`
   - Each preset has: `bundle.tgz`, `preview.yaml`, `metadata.json`
   - Enforces TTL-based expiration (default: 24 hours)
2. **HubService** (`backend/internal/crowdsec/hub_sync.go`)
   - Orchestrates pull and apply operations
   - `Pull()`: Downloads from hub, stores in cache
   - `Apply()`: Loads from cache, extracts to dataDir
3. **CrowdsecHandler** (`backend/internal/api/handlers/crowdsec_handler.go`)
   - HTTP endpoints: `/pull` and `/apply`
   - Manages hub service and cache initialization
### Pull Flow (What Actually Happens)
```
1. Frontend POST /admin/crowdsec/presets/pull {slug: "test/preset"}
2. Handler.PullPreset() calls Hub.Pull()
3. Hub.Pull():
- Fetches index from hub
- Downloads archive (.tgz) and preview (.yaml)
- Calls Cache.Store(slug, etag, source, preview, archive)
4. Cache.Store():
- Creates directory: {cacheDir}/{slug}/
- Writes: bundle.tgz, preview.yaml, metadata.json
- Returns CachedPreset metadata with paths
5. Handler returns: {status, slug, preview, cache_key, etag, ...}
```
### Apply Flow (What Actually Happens)
```
1. Frontend POST /admin/crowdsec/presets/apply {slug: "test/preset"}
2. Handler.ApplyPreset() calls Hub.Apply()
3. Hub.Apply():
- Calls loadCacheMeta() which calls Cache.Load(slug)
- Cache.Load() reads metadata.json from {cacheDir}/{slug}/
- If cache miss and no cscli: returns error
- If cached: reads bundle.tgz, extracts to dataDir
4. Handler returns: {status, backup, reload_hint, cache_key, ...}
```
### Root Cause Analysis
**The pull→apply flow was actually working correctly!** The investigation revealed:
1. **Cache storage works**: Pull successfully stores files to disk
2. **Cache loading works**: Apply successfully reads from the same location
3. **Cache keys match**: Both use the slug as the lookup key
4. **Permissions are fine**: Tests show no permission issues
**However, there was a lack of visibility:**
- Pull/apply operations had minimal logging
- Errors could be hard to diagnose without detailed logs
- Cache operations were opaque to operators
## Implemented Fixes
### 1. Comprehensive Logging
Added detailed logging at every critical point:
**HubCache Operations** (`hub_cache.go`):
- Store: Log cache directory, file sizes, paths created
- Load: Log cache lookups, hits/misses, expiration checks
- Include full file paths for debugging
**HubService Operations** (`hub_sync.go`):
- Pull: Log archive download, preview fetch, cache storage
- Apply: Log cache lookup, file extraction, backup creation
- Track each step with context
**Handler Operations** (`crowdsec_handler.go`):
- PullPreset: Log cache directory checks, file existence verification
- ApplyPreset: Log cache status before apply, list cached slugs if a cache miss occurs
- Include hub base URL and slug in all logs
### 2. Enhanced Error Messages
**Before:**
```
error: "cscli unavailable and no cached preset; pull the preset or install cscli"
```
**After:**
```
error: "CrowdSec preset not cached. Pull the preset first by clicking 'Pull Preview', then try applying again."
```
The new message is more user-friendly and tells the user exactly what to do next.
### 3. Verification Checks
Added file existence verification after cache operations:
- After pull: Check that archive and preview files exist
- Before apply: Check cache and verify files are still present
- Log any discrepancies immediately
### 4. Comprehensive Testing
Created a new test suite to verify the pull→apply workflow:
**`hub_pull_apply_test.go`**:
- `TestPullThenApplyFlow`: End-to-end pull→apply test
- `TestApplyWithoutPullFails`: Verify proper error when cache missing
- `TestCacheExpiration`: Verify TTL enforcement
- `TestCacheListAfterPull`: Verify cache listing works
**`crowdsec_pull_apply_integration_test.go`**:
- `TestPullThenApplyIntegration`: HTTP handler integration test
- `TestApplyWithoutPullReturnsProperError`: Error message validation
All tests pass ✅
## Example Log Output
### Successful Pull
```
level=info msg="attempting to pull preset" cache_dir=/data/hub_cache slug=test/preset
level=info msg="storing preset in cache" archive_size=158 etag=abc123 preview_size=24 slug=test/preset
level=info msg="preset successfully stored in cache"
archive_path=/data/hub_cache/test/preset/bundle.tgz
cache_key=test/preset-1765324634
preview_path=/data/hub_cache/test/preset/preview.yaml
slug=test/preset
level=info msg="preset pulled and cached successfully" ...
```
### Successful Apply
```
level=info msg="attempting to apply preset" cache_dir=/data/hub_cache slug=test/preset
level=info msg="preset found in cache"
archive_path=/data/hub_cache/test/preset/bundle.tgz
cache_key=test/preset-1765324634
slug=test/preset
level=info msg="successfully loaded cached preset metadata" ...
```
### Cache Miss Error
```
level=info msg="attempting to apply preset" slug=test/preset
level=warning msg="preset not found in cache before apply" error="cache miss" slug=test/preset
level=info msg="current cache contents" cached_slugs=["other/preset"]
level=warning msg="crowdsec preset apply failed" error="preset not cached" ...
```
## Verification Steps
To verify the fix works, follow these steps:
1. **Build the updated backend:**
   ```bash
   cd backend && go build ./cmd/api
   ```
2. **Run the backend with logging enabled:**
   ```bash
   ./api
   ```
3. **Pull a preset from the UI:**
   - Check logs for "preset successfully stored in cache"
   - Note the `archive_path` in the logs
4. **Apply the preset:**
   - Check logs for "preset found in cache"
   - The apply should succeed without a "preset not cached" error
5. **Verify cache contents** (use the `cache_dir` path reported in the logs if your data directory differs):
   ```bash
   ls -laR data/hub_cache/
   ```
   This should list one directory per pulled slug (nested for slugs such as `test/preset`), each containing `bundle.tgz`, `preview.yaml`, and `metadata.json`.
## Files Modified
1. `backend/internal/crowdsec/hub_cache.go`
   - Added logger import
   - Added logging to Store() and Load() methods
   - Log cache directory creation, file writes, cache misses
2. `backend/internal/crowdsec/hub_sync.go`
   - Added logging to Pull() and Apply() methods
   - Log cache storage operations and metadata loading
   - Track download sizes and file paths
3. `backend/internal/api/handlers/crowdsec_handler.go`
   - Added comprehensive logging to PullPreset() and ApplyPreset()
   - Check cache directory before operations
   - Verify files exist after pull
   - List cache contents when apply fails
4. `backend/internal/crowdsec/hub_pull_apply_test.go` (NEW)
   - Comprehensive unit tests for pull→apply flow
5. `backend/internal/api/handlers/crowdsec_pull_apply_integration_test.go` (NEW)
   - HTTP handler integration tests
## Conclusion
The pull→apply functionality was working correctly from an implementation standpoint. The real issue was a lack of visibility into cache operations, which made problems difficult to diagnose. With comprehensive logging now in place:
1. ✅ Operators can verify pull operations succeed
2. ✅ Operators can see exactly where files are cached
3. ✅ Apply failures show cache contents for debugging
4. ✅ Error messages guide users to correct actions
5. ✅ File paths are logged for manual verification
**If users still experience "preset not cached" errors, the logs will now clearly show:**
- Whether pull succeeded
- Where files were saved
- Whether files still exist when apply runs
- What's actually in the cache
- Any permission or filesystem issues
This makes the system much easier to troubleshoot and support.